2605.18147 2026-05-19 cs.LG

Foundation Models for Credit Risk Prediction: A Game Changer?

Bart Baesens, Andreas Goethals, Stefan Lessmann, Simon De Vos, Cristián Bravo, David Martens, Victor Medina-Olivares, Christophe Mues, Maria Oskarsdóttir, Seppe vanden Broucke, Tim Verdonck, Wouter Verbeke

AI summary: This paper studies the use of foundation models for credit risk prediction, examines their ability to improve predictive performance in small-data settings, and benchmarks them against a range of methods, finding that they outperform competitors on probability of default (PD) and loss given default (LGD) modeling tasks.

Abstract

Predictive models play a pivotal role in credit risk management, guiding critical decisions through accurate estimation of default probabilities and losses. Extensive research has introduced new modeling techniques, complemented by large-scale benchmarking studies consolidating the state-of-the-art. Today, quasi-standards such as gradient-boosting models paired with SHAP explainers have emerged, yet continuous improvement of risk models remains a top priority. Concurrently, rapid advancements in AI, most notably large language models, have disrupted predictive modeling paradigms. Foundation models, pretrained on extensive datasets from diverse domains, have demonstrated remarkable performance by leveraging prior knowledge. While prevalent in natural language processing and computer vision, foundation models for tabular data have only recently emerged. We conjecture that pretraining on out-of-domain data is particularly beneficial in small-data settings, such as SME lending or specialized corporate portfolios, and may help address longstanding challenges including low default portfolios and class imbalance. This paper benchmarks recently proposed tabular foundation models against a broad set of competitors, including established and advanced machine learning techniques, across two core tasks: PD and LGD modeling. Our evaluation encompasses various datasets, performance indicators, and experimental conditions. We find that tabular foundation models generally perform best across datasets and tasks. Moreover, they offer significant improvement in predictive performance as dataset size shrinks. These results are remarkable given that the models are tested out-of-the-box, without hyperparameter tuning, ensuring ease of use and mitigating computational costs.

2605.18144 2026-05-19 cs.AI

Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine

Christiaan G. A. Viviers, Koen de Bruin, Mirre M. Trines, Ayla M. Hokke, Roy van der Meel, Avi Schroeder, Twan Lammers, Willem J. M. Mulder, Fons van der Sommen

AI summary: This work presents pArticleMap, a system that combines article embeddings, similarity-graph analysis, sparse frontier extraction, structured evidence-pack retrieval, and an audited large language model (LLM) workflow to support research-direction selection and hypothesis generation in nanomedicine. By generating and scoring citation-grounded hypotheses, it provides evidence-grounded research assistance.

Abstract

Nanomedicine research spans delivery chemistry, immunology, imaging, biomaterials, and disease-specific translational science, yet its conceptual design space remains fragmented across a large and heterogeneous literature. To date, artificial intelligence in nanomedicine has focused primarily on property prediction and formulation optimization, with much less attention to evidence-grounded discovery support at the level of research direction selection. We introduce pArticleMap, a literature-mapping and research-hypothesis-generation system that combines article embeddings, similarity-graph analysis, sparse frontier extraction, structured evidence-pack retrieval, and an audited large-language-model (LLM) workflow for grounded ideation. Rather than forecasting future concept co-occurrence, pArticleMap targets low-density article-level bridge regions and cluster interfaces, then generates and scores citation-grounded hypotheses with large language models in an agentic setup. We evaluate the system with a retrospective realization benchmark (generate later literature under a historical cutoff) and a blinded human reader assessment layer across cue-conditioned nanomedicine tasks. Across 4 selected retrospective bundles, pArticleMap generated ideas and selected task-retained hypotheses (winner ideas) under the benchmark protocol. For task-level retained hypotheses, a pooled gold recovery rate of 10.8% was obtained, with a recall@10 of 15.9% and a future-neighborhood rate of 61.0%, indicating that the system often reached the correct forward-looking neighborhood (paper ideas) even without exact paper-level recovery. Human-agent agreement is modest overall, indicating that internal scoring is useful as a support signal but does not replace expert judgment. These results position pArticleMap as a conservative, evidence-grounded research assistant for nanomedicine.
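The abstract above reports a recall@10 figure without defining it. As a rough illustration only, the sketch below uses the standard retrieval definition (the share of gold items found among the top k ranked candidates); the function name `recall_at_k` and the paper identifiers are hypothetical, and the benchmark's own protocol may differ.

```python
def recall_at_k(ranked, gold, k=10):
    """Fraction of gold items that appear among the first k ranked candidates.

    Standard retrieval definition, assumed here; not taken from the paper.
    """
    gold_set = set(gold)
    if not gold_set:
        raise ValueError("gold set must not be empty")
    top_k = set(ranked[:k])
    return len(top_k & gold_set) / len(gold_set)


# Hypothetical example: four ranked candidate papers, two gold papers.
ranked = ["p3", "p7", "p1", "p9"]
gold = {"p1", "p2"}
score = recall_at_k(ranked, gold, k=3)
```

Under this definition a gold paper that never appears in the ranking still counts in the denominator, so the score cannot reach 1.0 unless every gold item is retrieved.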

2605.18143 2026-05-19 cs.AI

Generative AI and the Productivity Divide: Human-AI Complementarities in Education

Lihi Idan, Bharat Anand

AI summary: This study examines how the productivity effects of generative AI vary across users and finds that AI Interaction Competence (AIC) is the key determinant of how much users benefit. A conceptual-map scaffolding intervention reduces inequality, which points to pairing GenAI access with AIC micro-training and standard procedures to capture value consistently.

Abstract

Generative Artificial Intelligence (GenAI) is transforming how firms create, process, and apply knowledge, yet little is known about the heterogeneity of its productivity effects across users. We report results from a randomized controlled experiment in which participants (analogs of early-career knowledge workers) were assigned to self-study a technical domain using either traditional resources or large-language-model (LLM) assistance. On average, GenAI access significantly increased task performance, but the distribution of gains was highly uneven. Improvements were predicted not by GPA or prior knowledge but by AI Interaction Competence (AIC), the ability to elicit, filter, and verify model outputs. High-AIC participants realized outsized gains; low-AIC participants saw limited or even negative marginal returns. A scaffolding intervention (conceptual maps) reduced outcome variance, indicating that standardized workflows can mitigate inequality in AI-mediated performance. We interpret these findings through the lens of human-AI complementarities: GenAI raises mean productivity while introducing a new axis of capability inequality. Managerially, firms should pair GenAI access with short AIC micro-training and simple standard operating procedures to capture value consistently and avoid uneven adoption outcomes.

2605.18132 2026-05-19 cs.CV cs.AI

Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

Sihan Ma, Siyuan Liang, Dacheng Tao

AI summary: This work proposes a method for determining which generative model created a given 3D asset. It builds the first passive source attribution benchmark and finds that generative 3D models leave stable fingerprints, establishing a new benchmark for trustworthy 3D content provenance.

Abstract

Generative 3D models are deployed in gaming, robotics, and immersive creation, making source attribution critical: given a 3D asset, can we identify whether and which generative model created it? This problem faces two core challenges: dispersed attribution signals, where 3D fingerprints are distributed across multi-view, geometric, and frequency-domain cues; and realistic deployment constraints, where scarce labels, degraded prompts, and mixed real/synthetic assets undermine attribution reliability. To systematically study this problem, we construct, to the best of our knowledge, the first passive source attribution benchmark for modern generated assets, covering 22 representative 3D generators under standard, few-shot, and realistic deployment protocols. Based on this benchmark, we find that generative 3D models leave two types of stable fingerprints: cross-view inconsistency and structural artifacts reflected in geometric statistics and frequency-domain cues. To capture these dispersed signals, we propose a hierarchical multi-view multi-modal Transformer that fuses appearance, geometric, and frequency-domain features within each view and models global relationships across views. Extensive experiments demonstrate strong performance, achieving 97.22% accuracy under full supervision and 77.17% accuracy with only 1% training data, corresponding to fewer than five samples per generator. These results show that modern 3D generators leave stable and attributable fingerprints, establishing a new benchmark and methodological foundation for trustworthy 3D content provenance.

2605.18130 2026-05-19 cs.CV

Rad-VLSM: A Cross-Modal Framework with Semantics-Assisted Prompting for Medical Segmentation and Diagnosis

Fengyi Zhang, Xujie Zeng, Mohan Liu, Zengyi Wang, Yalong Jiang

AI summary: This paper proposes the Rad-VLSM framework, which uses a semantics-guided prompting mechanism to improve the accuracy of medical image segmentation and diagnosis, addressing the tendency of existing models to be distracted by background tissue and irrelevant visual correlations.

Abstract

Medical image segmentation is more clinically valuable when it supports diagnosis rather than merely producing lesion masks. However, diagnostically relevant lesion cues are often subtle and localized, while existing models may be distracted by background tissues, acoustic artifacts, and irrelevant visual correlations. To address this problem, we propose Rad-VLSM, a two-stage cross-modal framework for semantics-assisted lesion focusing, robust segmentation, and visually grounded diagnosis. In the first stage, a BLIP-2-based vision-language alignment module identifies lesion-related candidate regions under semantic guidance and converts them into box prompts. In the second stage, these prompts are fed into a SAM-based multitask network, where a multi-candidate region aggregation strategy improves prompt stability and guides lesion segmentation. The predicted masks are then used as spatial priors for diagnosis, and a visual-radiomics fusion head integrates lesion-aware visual features with selected radiomics descriptors. By using semantic information for localization rather than direct prediction, Rad-VLSM reduces text-to-diagnosis dependence and grounds diagnosis in lesion-level evidence. Experiments on a private clinical breast ultrasound dataset and public benchmarks show that Rad-VLSM achieves strong segmentation and diagnostic performance with favorable generalization.

2605.18128 2026-05-19 cs.AI

POST: Prior-Observation Adversarial Learning of Spatio-Temporal Associations for Multivariate Time Series Anomaly Detection

Suofei Zhang, Yaxuan Zheng, Haifeng Hu

AI summary: This paper proposes a new framework that unifies spatio-temporal modeling through joint prior-observation adversarial learning to address spatial over-generalization in multivariate time series anomaly detection. It reports new state-of-the-art results on public datasets and a purpose-built benchmark for both time-wise detection and spatial localization.

Abstract

Existing Multivariate Time Series Anomaly Detection (MTSAD) frameworks increasingly rely on integrating Graph Neural Networks (GNNs) with sequence models to capture complex spatio-temporal dependencies. However, less attention is paid to the spatial over-generalization problem, where unconstrained structural modeling indiscriminately reconstructs anomalies, inevitably degrading detection recall. To tackle this problem, we propose a novel framework that unifies spatio-temporal modeling through a joint prior-observation adversarial learning paradigm. In the spatial dimension, the model alternately learns adjacency matrices as structural prior and models the association discrepancy between prior and data-driven observation in a minimax manner during training. Such adversarial optimization not only improves the model sensitivity for time-wise detection, but also enables the model to localize anomalies to specific channels. To systematically evaluate this anomaly localization capability, we further construct a synthetic benchmark equipped with precise channel-wise annotations. Extensive experiments across public datasets and our dedicated benchmark demonstrate that the proposed framework establishes a new state-of-the-art in both time-wise detection and spatial localization tasks. Our code, pre-trained models, and benchmark are publicly available at https://github.com/anocodetest1/POST.

2605.18115 2026-05-19 cs.CV

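The Rumble Hits the Newsstand: A Data Analysis of Global Landslide-Related News Coverage and Spatial Bias

Brielen Madureira, Andreas Niekler, Marc Keuschnigg, Mariana Madruga de Brito

AI summary: By analyzing nearly 60,000 news articles on 5,500 landslide events over 25 years, this paper examines how German newspapers report on landslides worldwide and reveals that events in Southern and Western Europe are over-reported, providing a reference for studying unequal media attention to international disasters.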

Comments Work in progress

2605.18104 2026-05-19 cs.AI cs.CR

An Efficient Path to Multilingual Large Language Models: Language Expansion by Integrating the Post-Training PARAM$Δ$ into an Upcycled MoE

Hao Zhou, Tianhao Li, Zhijun Wang, Shuaijie She, Linjuan Wu, Hao-Ran Wei, Baosong Yang, Jiajun Chen, Shujian Huang

AI summary: This paper proposes an efficient method that converts a dense model into an MoE architecture and assigns different languages to different experts, improving multilingual LLM performance without a complex alignment stage while preserving original capabilities.

Abstract

Expanding Large Language Models (LLMs) to new languages is a costly endeavor, demanding extensive Continued Pre-Training (CPT) and data-intensive alignment. While recent data-free merging techniques attempt to bypass alignment by fusing a multilingual CPT-enhanced model with its instruct counterpart, they are plagued by a critical trade-off: mitigating parameter conflicts to preserve original abilities inevitably dilutes new language acquisition, and vice versa. To resolve this conflict, we introduce \method, which upcycles a dense model into a Mixture-of-Experts (MoE) architecture, allocating different experts to different languages. Alignment ability is then transferred by grafting an MoE-expanded parameter delta ($Δ_{\text{post}}$) onto the CPT-enhanced base model, bypassing the complex alignment phase. Experiments demonstrate \method's superiority even against baselines with similar FLOPs or number of parameters; it improves performance on expanded languages while effectively preserving original capabilities. We further show our approach is highly applicable across different models and post-training deltas.
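The delta-grafting step can be pictured with a toy sketch. It assumes, as is common in parameter-delta work but not stated in the abstract, that the post-training delta is the element-wise difference between an instruct model and its base model, and it leaves out the MoE expansion entirely. The helper names `parameter_delta` and `graft` and the tiny weight dictionaries are hypothetical.

```python
def parameter_delta(instruct, base):
    """Element-wise difference instruct - base for every named parameter (assumed definition)."""
    return {name: [i - b for i, b in zip(instruct[name], base[name])] for name in base}


def graft(cpt, delta):
    """Add a parameter delta onto a continued-pre-training (CPT) checkpoint."""
    return {name: [c + d for c, d in zip(cpt[name], delta[name])] for name in cpt}


# Toy weights: one parameter tensor flattened to a list of floats.
base = {"w": [1.0, 2.0]}
instruct = {"w": [1.5, 2.5]}
cpt = {"w": [3.0, 4.0]}

merged = graft(cpt, parameter_delta(instruct, base))
```

No alignment data is touched in this step, which is the sense in which such merging is "data-free"; how the delta is expanded across experts is specific to the paper and not reproduced here.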

2605.18082 2026-05-19 cs.LG

pyforce-1.0.0: Python Framework for data-driven model Order Reduction of multi-physiCs problEms

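Stefano Riva, Yantao Luo, Carolina Introini, Antonio Cammi

AI summary: This paper presents pyforce-1.0.0, a framework that applies data-driven reduced-order modeling techniques to multi-physics problems, mainly in nuclear engineering, with improvements to sensor placement optimization and the integration of measured data that enhance knowledge of the physical system.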

Comments Github Repo: https://github.com/ERMETE-Lab/ROSE-pyforce

2605.18079 2026-05-19 cs.LG cs.CC cs.CL

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Moritz Brösamle, Stephan Eckstein

AI summary: This paper studies the expressive power of low-precision softmax transformers with chain-of-thought. It constructs hardmax transformers with ternary activations and well-separated attention scores that simulate Turing machines, converts these constructions into equivalent softmax transformers, and analyzes the efficiency of the recently proposed summarized chain-of-thought paradigm for simulating Turing machines.

Comments Accepted to ICML 2026

Abstract

Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard transformer decoders with softmax attention and rounding of activations and attention weights, while allowing depth and width to grow logarithmically with the context length. As an intermediate step, we construct hardmax transformers with ternary activations and well-separated attention scores that simulate Turing machines using Chain-of-Thought (CoT). This lets us convert the constructions to equivalent softmax transformers without the unrealistic parameter magnitudes or activation precision that prior approaches would require. Using the same technique, we analyze a recently proposed summarized CoT paradigm and show that it simulates Turing machines more efficiently, with model size scaling logarithmically in a space bound rather than a time bound. We empirically test predictions made by our results on a Sudoku reasoning task and find better alignment with learnability than for prior high-precision results. Our code is available at https://github.com/moritzbroe/transformer-expressivity.
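The link between well-separated attention scores and rounding can be seen in a small numerical sketch. It assumes a simple fixed-point rounding of attention weights (the paper's rounding model may differ), and the function names are hypothetical: when the top score leads by a wide margin, the rounded softmax weights coincide with hardmax, while close scores do not collapse.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def round_to(x, bits):
    """Round x to a fixed-point grid with `bits` fractional bits (assumed rounding model)."""
    scale = 2 ** bits
    return round(x * scale) / scale


def low_precision_softmax(scores, bits=4):
    """Softmax followed by rounding of every attention weight."""
    return [round_to(w, bits) for w in softmax(scores)]


def hardmax(scores):
    """Uniform weight on the maximal scores, zero elsewhere."""
    m = max(scores)
    winners = [1.0 if s == m else 0.0 for s in scores]
    n = sum(winners)
    return [w / n for w in winners]


separated = [10.0, 0.0, -3.0]   # large gap between the top score and the rest
close = [1.0, 0.0]              # small gap

same_as_hardmax = low_precision_softmax(separated) == hardmax(separated)
still_soft = low_precision_softmax(close)
```

With a gap of 10 the non-maximal weights fall below half a grid step and round to zero, so no extreme parameter magnitudes are needed to make softmax behave like hardmax at this precision.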

2605.18078 2026-05-19 cs.LG

KVDrive: A Multi-Tier KV Cache Management System for Long-Context LLM Inference

Jian Lin, Jiazhi Mi, Zicong Hong, Haodong Wang, Qianli Liu, Haodyue Zhang, Peng Li, Song Guo

AI summary: This paper presents KVDrive, a multi-tier KV cache management system for long-context LLM inference that jointly orchestrates cache placement, pipeline scheduling, and cross-tier coordination to achieve high-throughput inference under a limited GPU budget while preserving accuracy.

Abstract

Supporting long-context LLMs is challenging due to the substantial memory demands of the key-value (KV) cache. Existing offloading systems store the full cache in host memory and selectively fetch critical entries during decoding, but this strategy quickly hits a ceiling: sparsity cannot be pushed further without degrading accuracy. As a result, when context length and batch size grow, the volume of KV transfers rises sharply and becomes the dominant source of decoding latency. We present KVDrive, a holistic multi-tier KV cache management system spanning GPU memory, host DRAM, and SSD. Unlike prior work that pursues greater sparsity through algorithmic refinements, KVDrive tackles the problem from a systems perspective - jointly orchestrating cache placement, pipeline scheduling, and cross-tier coordination to sustain high-throughput inference under tight GPU budgets. KVDrive advances three fundamental capabilities: it adapts cache management to attention behavior to maximize reuse and minimize redundant data movement; it restructures the decoding pipeline to overlap I/O- and CPU/GPU compute-bound stages, eliminating stalls across heterogeneous resources; and it harmonizes data movement across memory tiers to unlock scalable long-context inference far beyond GPU and DRAM limits. We have implemented a fully functional prototype of KVDrive and evaluated it on long-context benchmarks with popular LLMs. The system achieves up to 1.74x higher throughput compared to state-of-the-art works while preserving accuracy.
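To make the GPU / DRAM / SSD tiering concrete, here is a toy sketch of a three-tier cache. It is not KVDrive's policy: the abstract says placement adapts to attention behavior, whereas this sketch uses plain least-recently-used demotion and promote-on-access as a stand-in. The class `TieredKVCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy three-tier cache: entries demote gpu -> dram -> ssd in LRU order."""

    ORDER = ["gpu", "dram", "ssd"]

    def __init__(self, gpu_slots, dram_slots):
        self.tiers = {name: OrderedDict() for name in self.ORDER}
        # The SSD tier is treated as unbounded in this sketch.
        self.capacity = {"gpu": gpu_slots, "dram": dram_slots, "ssd": None}

    def _insert(self, tier, key, value):
        entries = self.tiers[tier]
        entries[key] = value
        entries.move_to_end(key)  # mark as most recently used
        cap = self.capacity[tier]
        if cap is not None and len(entries) > cap:
            old_key, old_value = entries.popitem(last=False)  # least recently used
            lower = self.ORDER[self.ORDER.index(tier) + 1]
            self._insert(lower, old_key, old_value)  # demote one tier down

    def put(self, key, value):
        for entries in self.tiers.values():
            entries.pop(key, None)  # drop any stale copy
        self._insert("gpu", key, value)

    def get(self, key):
        """Return (value, tier_it_was_found_in) and promote the entry to the GPU tier."""
        for name in self.ORDER:
            if key in self.tiers[name]:
                value = self.tiers[name].pop(key)
                self._insert("gpu", key, value)
                return value, name
        raise KeyError(key)

    def where(self, key):
        for name in self.ORDER:
            if key in self.tiers[name]:
                return name
        return None


cache = TieredKVCache(gpu_slots=2, dram_slots=2)
for block in range(5):
    cache.put(block, f"kv{block}")
placement = [cache.where(b) for b in range(5)]
```

Every promotion here triggers a synchronous demotion chain, which is exactly the kind of stall the abstract says KVDrive avoids by overlapping I/O with compute.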

2605.18068 2026-05-19 cs.LG cs.AI

Improving Spatio-Temporal Residual Error Propagation by Mitigating Over-Squashing

Seyed Mohamad Moghadas, Esther Rodrigo Bonet, Bruno Cornelis, Adrian Munteanu

AI summary: This paper proposes Teger, a module that improves error-correlated autoregressive forecasting through a spatial curvature-aware graph rewiring mechanism, improving the continuous ranked probability score of spatio-temporal forecasts.

Abstract

Residual error propagation remains a fundamental problem in recurrent models, where small prediction inaccuracies compound over time and degrade long-horizon performance. Accurately modeling the correlation structure of such residuals is critical for reliable uncertainty quantification in probabilistic multivariate time-series forecasting. While recent time-series deep models efficiently parametrize time-varying contemporaneous correlations, they often assume temporal independence of errors and neglect spatial correlation across the observed network. In this paper, we introduce Teger, a structured uncertainty module that overcomes the spatial and temporal limitations of error-correlated autoregressive forecasting. Teger proposes a spatial curvature-aware graph rewiring mechanism that explicitly strengthens information-bottleneck edges identified by discrete Forman curvature. The component is integrated into a low-rank-plus-diagonal covariance head, preserving tractable inference via the Woodbury identity. Teger is backbone-agnostic, requiring only the latent state produced by any autoregressive encoder. We provide theoretical support for Teger and experimentally evaluate it on LSTM, Transformer, and xLSTM backbones across four real-world spatio-temporal datasets, showing consistent improvement in Continuous Ranked Probability Score (CRPS). We further provide a formal theoretical analysis connecting curvature-aware rewiring to (i) over-squashing alleviation, (ii) improved spectral connectivity, (iii) reduced effective resistance, and (iv) improved covariance calibration bounds.
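For readers unfamiliar with why a low-rank-plus-diagonal covariance stays tractable, the standard Woodbury identity and matrix determinant lemma are written out below. The symbols are assumptions for illustration, not the paper's notation: $D$ is an $n \times n$ positive diagonal matrix, $U$ is an $n \times r$ factor with $r \ll n$, and $\Sigma = D + U U^{\top}$.

```latex
\Sigma^{-1}
  = \left(D + U U^{\top}\right)^{-1}
  = D^{-1} - D^{-1} U \left(I_r + U^{\top} D^{-1} U\right)^{-1} U^{\top} D^{-1},
\qquad
\det \Sigma = \det\!\left(I_r + U^{\top} D^{-1} U\right) \det D .
```

Only an $r \times r$ matrix has to be inverted, so evaluating a Gaussian log-likelihood costs on the order of $n r^{2}$ operations rather than $n^{3}$.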

2605.18067 2026-05-19 cs.CL

PPAI: Enabling Personalized LLM Agent Interoperability for Collaborative Edge Intelligence

Zile Wang, Qianli Liu, Kaibin Guo, Haodong Wang, Jian Lin, Zicong Hong, Song Guo

AI summary: This paper presents PPAI, a system that enables collaboration among users based on agent specialization, addresses the challenges of a dynamic agent pool and load balancing, and improves task accuracy while reducing latency.

Abstract

Deploying large language models (LLMs) on edge devices enables personalized LLM agents for various users. The growing availability of diverse personalized agents presents a unique opportunity for peer-to-peer (P2P) collaboration, wherein each user can delegate tasks beyond the local agent's expertise to remote agents better suited to the specific query. This paper introduces PPAI, the first personalized LLM agent interoperability system, which enables users to collaborate with each other based on agent specialization. However, compared with existing P2P systems, the ever-changing pool of agents and their interchangeable capacity introduce new challenges in matching queries to agents and balancing loads. Therefore, we propose a scalable query-agent pair scoring mechanism based on prototypes to identify suitable agents within a P2P network with churn. Moreover, we propose a multi-agent interoperability Bayesian game to balance local demand and global efficiency when changes in remote agent load occur too quickly to be observed. Finally, we implement a prototype of PPAI and demonstrate that it substantially broadens the range of tasks that can be carried out while maintaining load balance. On average, it achieves an accuracy improvement of up to 7.96% across multiple tasks, while reducing latency by 16.34% compared to the baseline.

2605.18063 2026-05-19 cs.CV cs.LG

The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting

Corentin Dumery, Niki Amini-Naieni, Shervin Naini, Pascal Fua

AI summary: This paper presents the MixCount dataset, which uses an automatic generation pipeline to address the shortage of data for mixed-object scenes in open-vocabulary object counting, and demonstrates substantial gains on real-world benchmarks.

Comments Co-first authors. Dataset and project page https://corentindumery.github.io/projects/mixcount.html

Abstract

Object counting is a foundational vision task with over a decade of dedicated research, yet state-of-the-art models still fail systematically in the mixed-object setting that dominates real-world applications such as industrial inspection and product sorting. We show that this gap is strongly driven by limitations in existing training and evaluation data: real counting datasets are prohibitively expensive to annotate and suffer from labeling noise, while existing synthetic alternatives lack diversity and realism. We address this with MixCount, a dataset and benchmark for mixed-object counting designed to target the failure modes of current counting models. To overcome the high cost of constructing and labeling such data, we develop an automatic generation pipeline that synthesizes images, fine-grained textual descriptions, and pixel-perfect counting annotations at scale, eliminating the labeling ambiguity that plagues prior datasets. Evaluating state-of-the-art counting models on MixCount exposes severe degradation in the mixed-object setting. More importantly, training these models on our synthesized data yields substantial gains on real-world benchmarks, reducing MAE by 20.14% on FSC-147 and by 18.3% on PairTally. These results establish MixCount as both a benchmark and a training dataset for fine-grained counting, and demonstrate that our pipeline, which produces effectively unlimited labeled data, helps address a long-standing bottleneck in counting models.

2605.18060 2026-05-19 cs.CV

Embedded ConvNet Ensembles: A Lightweight Approach to Recognize Arabic Handwritten Characters

Mohsine El Khayati, Rachid Elouahbi, Abdelillah Semma

AI summary: This paper proposes combining lightweight embedded ConvNets with ensemble learning for Arabic handwritten character recognition, and shows experimentally that lightweight models achieve competitive accuracy and that ensembling improves performance further.

Comments Accepted in the IEEE 15th Image, Video, and Multidimensional Signal Processing Workshop 2026

Abstract

Arabic Handwritten Character Recognition (AHCR) has recently advanced significantly with deep Convolutional Neural Networks (ConvNets). However, many models in the literature are deep and computationally expensive in terms of parameters and FLOPs, limiting their deployment on resource-constrained devices, which are increasingly common. This study addresses this gap by proposing a combination of lightweight embedded ConvNet models and ensemble learning techniques. Extensive experiments were conducted to identify best practices in AHCR, considering training hyperparameters, learning strategies, model choices, and ensemble methods. Results show that embedded models can achieve accuracy comparable to, or even surpassing, heavier architectures. Ensemble learning further enhances performance with only modest computational overhead, particularly under challenging training scenarios. Among the ensembling strategies, soft voting yielded the best overall results.
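Soft voting, which the abstract reports as the best ensembling strategy, averages the class probabilities of the individual models before picking a class; hard voting instead counts each model's top choice. The sketch below contrasts the two on made-up probabilities for three hypothetical models and three classes; it is an illustration of the general technique, not the authors' code.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average per-class probabilities across models; return (class_index, averaged_probs)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


def hard_vote(prob_lists):
    """Majority vote over each model's most probable class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_lists]
    return Counter(votes).most_common(1)[0][0]


# Made-up outputs: one confident model, two weakly disagreeing models.
probs = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]

soft_class, averaged = soft_vote(probs)
hard_class = hard_vote(probs)
```

In this example the two rules disagree: hard voting follows the two weak votes for class 1, while soft voting lets the confident model tip the average toward class 0. The extra cost over a single model is one forward pass per ensemble member plus the averaging.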
