arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.22270 2026-05-22 cs.LG q-bio.PE

Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting

Sijie Ruan, Jinyu Li, Jia Wei, Zenghao Xu, Jie Bao, Junshi Xu, Junyang Qiu, Shuliang Wang, Xiaoxiao Wang, Hanning Yuan

Affiliations: Beijing Institute of Technology; Zhejiang Provincial Center for Disease Control and Prevention; JD Technology; The University of Hong Kong; China Mobile Internet

AI summary: This paper proposes STOEP, a hybrid framework that combines implicit spatio-temporal priors with explicit expert priors. It improves spatio-temporal epidemic forecasting accuracy by dynamically adjusting regional dependencies, amplifying weak signals, and forecasting mechanistically.

Comments 12 pages, 10 figures, accepted to IJCAI 2026

Abstract

Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at a provincial CDC in China to facilitate downstream applications.

2602.20845 2026-05-22 cs.CV

FLIM Networks with Bag of Feature Points

João Deltregia Martinelli, Marcelo Luis Rodrigues Filho, Felipe Crispim da Rocha Salvagnini, Gilson Junior Soares, Jefersson A. dos Santos, Alexandre X. Falcão

Affiliations: Institute of Computing, UNICAMP, Campinas, Brazil; School of Computer Science, University of Sheffield, Sheffield, United Kingdom

AI summary: This paper proposes FLIM-BoFP, a more efficient filter estimation method, applied to parasite detection in microscopy images, with advantages in efficiency, effectiveness, and generalization over FLIM-Cluster and other state-of-the-art baselines.

Comments Accepted at the 28th Iberoamerican Congress on Pattern Recognition (CIARP 2025). To appear in Lecture Notes in Computer Science (LNCS), Springer

Abstract

Convolutional networks require extensive image annotation, which can be costly and time-consuming. Feature Learning from Image Markers (FLIM) tackles this challenge by estimating encoder filters (i.e., kernel weights) from user-drawn markers on discriminative regions of a few representative images without traditional optimization. Such an encoder combined with an adaptive decoder comprises a FLIM network fully trained without backpropagation. Prior research has demonstrated their effectiveness in Salient Object Detection (SOD), being significantly lighter than existing lightweight models. This study revisits FLIM SOD and introduces FLIM-Bag of Feature Points (FLIM-BoFP), a considerably faster filter estimation method. The previous approach, FLIM-Cluster, derives filters through patch clustering at each encoder's block, leading to computational overhead and reduced control over filter locations. FLIM-BoFP streamlines this process by performing a single clustering at the input block, creating a bag of feature points, and defining filters directly from mapped feature points across all blocks. The paper evaluates the benefits in efficiency, effectiveness, and generalization of FLIM-BoFP compared to FLIM-Cluster and other state-of-the-art baselines for parasite detection in optical microscopy images.

2602.18141 2026-05-22 cs.LG

Geometry-Induced Diffusion on Graphs: A Learnable Weighted Laplacian for Spectral GNNs

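Mia Zosso, Ali Hariri, Victor Kawasaki-Borruat, Pierre-Gabriel Berlureau, Pierre Vandergheynst

Affiliations: École Polytechnique Fédérale de Lausanne (EPFL); École Normale Supérieure – PSL

AI summary: This paper proposes mu-ChebNet, a simple spectral GNN architecture that learns a node-wise weight function mu to modify the graph Laplacian, changing the propagation geometry without altering the graph topology. This promotes preferred routes for information propagation and helps long-range signals avoid highly contractive bottlenecks without repeated layer stacking.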

Abstract

Long-range graph tasks are challenging for Graph Neural Networks (GNNs): global mechanisms such as attention or rewiring schemes can be computationally expensive, while deep local propagation is prone to vanishing gradients, oversmoothing, and oversquashing. The introduced mu-ChebNet architecture is a simple spectral GNN that learns a node-wise weight function mu before applying ChebNet-style filters. The learned weighting mu induces a modified graph Laplacian which effectively changes the propagation geometry without altering the graph topology. This task-dependent geometry promotes preferred routes for information propagation, thereby helping long-range signals avoid highly contractive bottlenecks, and obviating the need for repeated layer stacking. In practice, we replace the fixed graph Laplacian L by a learned operator L_mu, keeping the proposed mu-ChebNet architecture lightweight while making propagation task-adaptive. Furthermore, we provide a spectral analysis demonstrating how mu modulates propagation dynamics, and empirically observe improved performance on both synthetic long-range reasoning tasks and real-world graph benchmarks. The learned weight function is not only interpretable, but also offers a lightweight alternative to attention and rewiring for adaptive graph propagation.
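The idea of filtering with a learned operator L_mu instead of the fixed Laplacian L can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the form of L_mu, so the symmetric reweighting `M^{-1/2} (D - A) M^{-1/2}` with `M = diag(mu)` is an assumption, and the function names `weighted_laplacian` and `cheb_filter` are hypothetical. Here mu is a fixed positive vector, whereas in the paper it is learned.

```python
import numpy as np

def weighted_laplacian(adj, mu):
    """ASSUMED form of L_mu: M^{-1/2} (D - A) M^{-1/2} with M = diag(mu), mu > 0."""
    lap = np.diag(adj.sum(axis=1)) - adj           # combinatorial Laplacian L = D - A
    inv_sqrt = 1.0 / np.sqrt(mu)
    return inv_sqrt[:, None] * lap * inv_sqrt[None, :]

def cheb_filter(lap, x, theta):
    """Apply sum_k theta[k] * T_k(L_scaled) @ x using the Chebyshev recurrence."""
    lam_max = np.linalg.eigvalsh(lap)[-1]           # largest eigenvalue (lap is symmetric)
    scaled = 2.0 * lap / lam_max - np.eye(lap.shape[0])  # map spectrum into [-1, 1]
    t_prev, t_curr = x, scaled @ x                  # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # T_k = 2 * L_scaled * T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * scaled @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out

# Path graph on four nodes; mu is a fixed example vector here (learned in the paper).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
mu = np.array([1.0, 2.0, 0.5, 4.0])
signal = np.arange(4.0)
filtered = cheb_filter(weighted_laplacian(adj, mu), signal, [0.5, 0.3, 0.2])
```

With `mu` set to all ones, the assumed operator reduces to the ordinary Laplacian, so the sketch falls back to a standard ChebNet-style filter; the graph topology (`adj`) is never modified.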

2602.17517 2026-05-22 cs.CV

Depth Augmented and FE Free 3D/2D Liver Registration for Laparoscopic Liver AR

Hanyuan Zhang, Lucas He, Runlong He, Weixi Yi, Abdolrahim Kadkhodamohammadi, Danail Stoyanov, Brian R. Davidson, Evangelos B. Mazomenos, Matthew J. Clarkson

Affiliations: UCL Hawkes Institute, University College London, London WC1E 6BT, UK; Division of Surgery and Interventional Science, University College London, London WC1E 6BT, UK; Unit for Lifelong Health and Ageing at UCL, University College London, London WC1E 7HB, UK; Medtronic plc., London, UK

AI summary: This study proposes a depth-augmented, finite-element-free 3D/2D liver registration method that combines robust rigid initialization with patient-specific non-rigid refinement to improve 3D-to-2D registration accuracy for AR in laparoscopic liver surgery.

Abstract

Augmented reality (AR) guidance in laparoscopic liver surgery requires accurate registration of preoperative 3D models to intraoperative 2D video, but remains challenging due to partial visibility, specularities, and tissue deformation. Existing methods often rely on contour-based rigid initialization and finite-element (FE) models for deformable registration, increasing modeling and engineering complexity. We present a depth-augmented, FE-free 3D-2D registration pipeline that combines robust rigid initialization with patient-specific non-rigid refinement. For rigid alignment, we adapt the RefineNet module of FoundationPose to laparoscopic liver scenes by using multi-class contour maps and monocular depth for relative pose refinement. For deformable alignment, we construct a patient-specific statistical deformation model from non-rigid ICP (NICP) correspondences and optimize pose and shape parameters using a coarse-to-fine L-BFGS-B strategy. On a public clinical laparoscopic liver dataset, the proposed method achieves a mean target registration error (TRE) of 14.73 mm under a controlled manual-contour setting designed to isolate registration performance. Ablation studies show that monocular depth improves rigid initialization over contour-only inputs, while tumor-mapping analysis indicates that good surface alignment does not necessarily translate into lower target localization error. On an external dataset without ground truth, the method produces visually plausible overlays for qualitative assessment. These results suggest that depth-augmented pose refinement and FE-free statistical deformation modeling provide a promising alternative to FE-based pipelines for controlled 3D-2D liver registration in surgical AR.

2602.17385 2026-05-22 cs.AI

Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

Angelo Porrello, Pietro Buzzega, Felix Dangel, Thomas Sommariva, Riccardo Salami, Lorenzo Bonicelli, Simone Calderara

Affiliations: University of Modena and Reggio Emilia; Vector Institute

AI summary: This paper proposes a dataless method that frames representation drift regularization as a curvature matrix approximation problem to address cross-task interference among task vectors in task arithmetic, achieving state-of-the-art results in task addition and negation.

Comments Accepted to ICLR 2026

Abstract

Task Arithmetic yields a modular, scalable way to adapt foundation models. Combining multiple task vectors, however, can lead to cross-task interference, causing representation drift and degraded performance. Representation drift regularization provides a natural remedy to disentangle task vectors; however, existing approaches typically require external task data, conflicting with modularity and data availability constraints (e.g., privacy requirements). We propose a dataless approach by framing regularization against representation drift as a curvature matrix approximation problem. This allows us to leverage well-established techniques; in particular, we adopt Kronecker-Factored Approximate Curvature and obtain a practical regularizer that achieves state-of-the-art results in task addition and negation. Our method has constant complexity in the number of tasks and promotes robustness to task vector rescaling, eliminating the need for held-out tuning.
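For readers unfamiliar with task arithmetic, the operations the abstract refers to (task vectors, task addition, and task negation) can be sketched as below. This only illustrates the basic arithmetic on weights; it does not include the paper's Kronecker-factored curvature regularizer. The helper names and the dict-of-arrays model representation are assumptions made for the example.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: element-wise difference between fine-tuned and pre-trained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vectors(pretrained, vectors, alpha=1.0):
    """Add scaled task vectors to the pre-trained weights.

    alpha > 0 performs task addition; alpha < 0 performs task negation.
    """
    merged = {name: weights.copy() for name, weights in pretrained.items()}
    for vector in vectors:
        for name in merged:
            merged[name] += alpha * vector[name]
    return merged

# Toy "models" with a single parameter tensor each (hypothetical values).
base = {"w": np.array([1.0, 2.0])}
tuned_a = {"w": np.array([1.5, 2.0])}
tuned_b = {"w": np.array([1.0, 3.0])}

tv_a = task_vector(base, tuned_a)
tv_b = task_vector(base, tuned_b)
added = apply_task_vectors(base, [tv_a, tv_b])          # task addition
negated = apply_task_vectors(base, [tv_a], alpha=-1.0)  # task negation of task A
```

In this toy case the two task vectors touch different coordinates, so they combine without interference; the cross-task interference discussed in the abstract arises when the updates overlap.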

2602.13372 2026-05-22 cs.AI cs.LG

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

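Simon Rosen, Siddarth Singh, Ebenezer Gelo, Helen Sarah Robertson, Ibrahim Suder, Victoria Williams, Benjamin Rosman, Geraud Nangue Tasse, Steven James

Affiliations: University of the Witwatersrand

AI summary: This paper presents the MoralityGym benchmark, which represents moral norms as ordered deontic constraints to evaluate hierarchical moral alignment in sequential decision-making agents. It comprises 98 ethical-dilemma problems and brings insights from psychology and philosophy into the evaluation of ethical decision-making.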

Comments Accepted at AAMAS 2026

Journal ref Proc of the 25th International Conference on Autonomous Agents and Multiagent Systems AAMAS 2026, Paphos, Cyprus, May 25 to 29, 2026, IFAAMAS

Abstract

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.

2602.12952 2026-05-22 cs.LG cs.AI cs.CV

Transporting Task Vectors across Different Architectures without Training

Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Angelo Porrello, Simone Calderara

Affiliations: AImageLab, University of Modena and Reggio Emilia

AI summary: This paper proposes Theseus, a method that transports task updates between models of different widths through functional matching, without training or backpropagation, and shows improvements on vision and language models.

Comments Accepted at the International Conference on Machine Learning (ICML), 2026

Abstract

Adapting large pre-trained models to downstream tasks often produces task-specific parameter updates that are expensive to relearn for every model variant. While recent work has shown that such updates can be transferred between models with identical architectures, transferring them across models of different widths remains unexplored. In this work, we introduce Theseus, a training-free method for transporting task updates across heterogeneous-width models. Rather than matching parameters, we characterize a task update by the functional effect it induces on intermediate representations. We formalize task-vector transport as a functional matching problem on observed activations and show that, after aligning representation spaces via orthogonal Procrustes analysis, it admits a stable closed-form solution that preserves the geometry of the update. We evaluate Theseus on vision and language models across different widths, showing consistent improvements over baselines without additional training or backpropagation. Our results show that task updates can be meaningfully transferred across architectures when task identity is defined functionally rather than parametrically. Code is available at https://github.com/apanariello4/merge-and-rebase.
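The alignment step mentioned in the abstract, orthogonal Procrustes analysis, has a well-known closed-form solution via the SVD. The sketch below shows only that classical step for two activation matrices of equal width, which is a simplifying assumption; it is not the Theseus transport rule itself, and the function name is hypothetical.

```python
import numpy as np

def orthogonal_procrustes(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F.

    Classical closed form: with source.T @ target = U S V^T, the minimizer is R = U V^T.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# Synthetic check: activations related by a known orthogonal map (hypothetical data).
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(50, 4))                     # 50 samples, 4 features
true_rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
acts_b = acts_a @ true_rotation
recovered = orthogonal_procrustes(acts_a, acts_b)
```

Because the solution is restricted to orthogonal matrices, it preserves norms and angles in the representation space, which is consistent with the abstract's remark about preserving the geometry of the update.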

2602.12506 2026-05-22 cs.LG

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

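Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal

Affiliations: Apple; OpenAI

AI summary: This paper studies the robustness and chain-of-thought (CoT) consistency of RL-finetuned VLMs on visual reasoning tasks. It finds that textual perturbations, such as misleading captions or incorrect CoT traces, substantially reduce robustness and confidence, while closed models maintain greater robustness and reasoning consistency, suggesting that the gap stems from shortcomings in current open-source RL finetuning rather than an inherent limitation of the task.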

Comments ICML 2026

Abstract

Reinforcement learning (RL) finetuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision-language models (VLMs). While RL-tuned VLMs improve on visual reasoning benchmarks, they remain vulnerable to weak visual grounding, hallucinations, and over-reliance on textual cues. We show that simple, controlled textual perturbations, including misleading captions or incorrect chain-of-thought (CoT) traces, cause substantial drops in robustness and confidence, and that these effects are more pronounced when CoT consistency is taken into account across open-source multimodal reasoning models. In contrast, closed models exhibit similar failure modes but maintain markedly greater robustness and reasoning consistency, suggesting that the gap reflects a shortcoming in current open-source RL finetuning rather than an inherent limitation of the task. To better understand these vulnerabilities, we further analyze RL finetuning dynamics and uncover an accuracy-faithfulness trade-off: finetuning raises benchmark accuracy, but can simultaneously erode the reliability of the accompanying CoT and its robustness to contextual shifts. Although adversarial augmentation improves robustness, it does not by itself prevent faithfulness drift. Incorporating a faithfulness-aware reward can restore alignment between answers and reasoning, but when paired with augmentation, training risks collapsing onto shortcut strategies and robustness remains elusive. Together, these findings highlight the limitations of accuracy-only evaluations and motivate training and assessment protocols that jointly emphasize correctness, robustness, and the faithfulness of visually grounded reasoning.

2602.10894 2026-05-22 cs.LG cs.AI

Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

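Kazuki Ota, Takayuki Osa, Motoki Omura, Tatsuya Harada

Affiliations: The University of Tokyo, Japan; RIKEN Center for Advanced Intelligence Project, Japan

AI summary: This paper revisits policy optimization with reverse Kullback-Leibler regularization and entropy regularization, analyzing the combination in two-player zero-sum settings both theoretically and empirically. It provides new convergence guarantees, verified through numerical experiments on synthetic games, and derives a practical model-free reinforcement learning algorithm whose training efficiency is validated in comprehensive experiments on five board games.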

Comments Accepted at ICML 2026

Abstract

Two-player games such as board games have long been used as traditional benchmarks for reinforcement learning. This work revisits a policy optimization method with reverse Kullback-Leibler regularization and entropy regularization and analyzes this combination in two-player zero-sum settings from theoretical and empirical perspectives. From a theoretical perspective, we investigate the stability of the policy update rule in two theoretical settings: game-theoretic normal-form games and finite-length games. We provide novel convergence guarantees and verify our theoretical results through numerical experiments on synthetic games. From an empirical perspective, we derive a practical model-free reinforcement learning algorithm based on the regularized policy optimization. We validate the training efficiency of our algorithm through comprehensive experiments on five board games: Animal Shogi, Gardner Chess, Go, Hex, and Othello. Experimental results show that our agent learns more efficiently than existing methods across environments.

2602.10085 2026-05-22 cs.AI

CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

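Richard Bornemann, Pierluigi Vito Amadori, Antoine Cully

Affiliations: Imperial College London; Sony Interactive Entertainment

AI summary: This study proposes CODE-SHARP, a framework in which foundation models autonomously discover and evolve skills as hierarchical reward programs, enabling a generalist agent policy to be trained from scratch with reinforcement learning and to learn long-horizon skills efficiently without pre-defined rewards.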

Comments Preprint

Abstract

A core quality of general intelligence is the ability to open-endedly expand and evolve its set of mastered skills autonomously. While recent Foundation Model (FM) driven approaches have shown promising results towards this goal, they typically rely on significant human-in-the-loop engineering, limiting their transferability to novel environments. To address this, we introduce Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs (CODE-SHARP), a framework that leverages FMs to open-endedly grow and evolve an archive of Python programs encoding skills to train a generalist agent policy entirely from scratch via reinforcement learning, directly from source code. These programs, termed Skills as Hierarchical Reward Programs (SHARPs), each encode a local success condition and a set of prerequisites delegated to previously discovered SHARPs. At runtime, SHARPs dynamically route the agent through their prerequisite chain based on the current state, rewarding each completion along the way, requiring the agent to learn only the marginal behaviour each new SHARP introduces, enabling efficient learning of long-horizon skills without any pre-defined rewards. On Craftax-Classic and XLand, agents trained fully autonomously by CODE-SHARP outperform previous works by 6x and 2.6x in median performance and are the only agents capable of crafting iron tools and mining diamonds. Scaled to Craftax-Extended, CODE-SHARP trains a generalist agent on over 90 discovered SHARPs, enabling the agent to solve challenging long-horizon tasks zero-shot, matching agents trained on ground-truth rewards.
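The SHARP structure described above (a local success condition plus prerequisites delegated to earlier SHARPs, with routing through the prerequisite chain and a reward for each completion) can be sketched in a few lines. Everything here is hypothetical: the `Sharp` class, the inventory-style `State`, the `reward` function, and the three example skills are illustrative assumptions and are not taken from the CODE-SHARP code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical state: item name -> count held by the agent

@dataclass
class Sharp:
    name: str
    success: Callable[[State], bool]                  # local success condition
    prerequisites: List["Sharp"] = field(default_factory=list)

    def active_target(self, state: State) -> "Sharp":
        """Route through the prerequisite chain to the skill to pursue right now."""
        for prerequisite in self.prerequisites:
            if not prerequisite.success(state):
                return prerequisite.active_target(state)
        return self

def reward(skill: Sharp, prev_state: State, next_state: State) -> float:
    """Pay 1.0 when the currently active target becomes satisfied by a transition."""
    target = skill.active_target(prev_state)
    newly_done = target.success(next_state) and not target.success(prev_state)
    return 1.0 if newly_done else 0.0

# Hypothetical three-step chain: wood -> pickaxe -> stone.
collect_wood = Sharp("collect_wood", lambda s: s.get("wood", 0) >= 1)
craft_pickaxe = Sharp("craft_pickaxe", lambda s: s.get("pickaxe", 0) >= 1, [collect_wood])
mine_stone = Sharp("mine_stone", lambda s: s.get("stone", 0) >= 1, [craft_pickaxe])
```

In this sketch, training on `mine_stone` first rewards collecting wood, then crafting the pickaxe, then mining, so a policy that already handles the earlier skills only has to learn the last step.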

2602.10009 2026-05-22 cs.AI cs.HC

Discovering High Level Patterns from Simulation Traces

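Sean Memery, Kartic Subr

Affiliations: University of Edinburgh

AI summary: This paper proposes an unsupervised learning approach based on program synthesis that translates simulation traces into a sparse representation of high-level patterns, improving the ability of large language models to reason about physical systems.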

Abstract

Large Language Models (LLMs) are unable to reliably reason about specific physical systems. Attempts to imbue LLMs with knowledge of the necessary physics concepts have shown great promise, but explainability and validation remain open challenges. An emerging alternative is tooling, where LLMs can query physical simulators and use the resulting simulation traces as context for validation. This approach suffers from poor scalability since simulation traces contain large volumes of fine-grained numerical and semantic data. We show that translating simulation traces to a sparse representation of "high-level" structural patterns leads to more effective interpretation by LLMs. We propose an unsupervised learning scheme to perform this translation, or annotation, via program synthesis. Our learning results in a library of programs that act as pattern detectors which can translate simulation traces to sparse, annotated pattern sequences. The detected patterns may optionally be guided by human experts via string labels (rigid collision, stretching spring, etc.). We show, using a recent physics benchmark, that such annotated representations are more amenable to natural language reasoning about specific physical systems. The synthesized programs serve as transparent, explainable functions that map system states to a sparse and efficient annotation space. As an example application, we show how goals within physical systems that are specified in natural language may be converted to reward programs which are maximized to find solutions.

2602.09851 2026-05-22 cs.LG

CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization

Beicheng Xu, Keyao Ding, Wei Liu, Yupeng Lu, Bin Cui

Affiliations: School of CS & Key Lab of High Confidence Software Technologies (MOE), Peking University, Beijing, China; School of CS & Beijing Key Laboratory of Software Hardware Cooperative Artificial Intelligence Systems, Peking University, Beijing, China

AI summary: This paper proposes CoFEH, a framework that combines LLM-driven feature engineering (FE) with Bayesian hyperparameter optimization (HPO) for robust end-to-end AutoML. It addresses the rigid search spaces and lack of domain awareness of traditional methods and introduces a mutual conditioning mechanism to strengthen the interplay between FE and HPO.

Comments Accepted at KDD 2026. Extended version with full appendices

Abstract

Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which operate within rigid search spaces and lack domain awareness. While Large Language Models (LLMs) offer a promising alternative to generate unbounded operators with semantic reasoning, existing methods focus on isolated subtasks such as feature generation, falling short of free-form FE pipelines. Moreover, they are rarely coupled with hyperparameter optimization (HPO) of the downstream ML model, leading to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (TOT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that adaptively interleaves FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH outperforms both traditional and LLM-based baselines in both standalone FE and joint FE+HPO settings.

2602.08064 2026-05-22 cs.LG cs.AI cs.CL

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

Tianyu Li, Dongchen Han, Zixuan Cao, Haofeng Huang, Mengyu Zhou, Ming Chen, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang, Gao Huang

Affiliations: Leap Lab, Tsinghua University; Qwen Large Model Application Team, Alibaba; Institute for Interdisciplinary Information Sciences, Tsinghua University

AI summary: This paper proposes SiameseNorm, a two-stream architecture that couples Pre-Norm and Post-Norm through shared residual blocks, improving model performance while maintaining training stability across architectures and modalities.

Comments Accepted to ICML 2026; camera-ready version; revised presentation and added additional experimental results

Abstract

The long-standing tension between Pre- and Post-Norm remains an open problem in Transformer architecture, reflecting a fundamental trade-off between training stability and representational capacity. Prior attempts to combine their strengths have made progress, but often show limited robustness across training settings, restricting their broader applicability. We revisit this dilemma, showing that single-stream architectures struggle to reconcile Pre-Norm's stable identity-gradient propagation with Post-Norm's normalization of the main residual path. To address this structural tension, we propose SiameseNorm, a simple yet effective two-stream architecture that remains compatible with Pre-Norm training recipes. SiameseNorm couples Pre-Norm-like and Post-Norm-like streams through shared residual blocks, allowing each residual block to receive optimization signals from both pathways with negligible overhead. Extensive experiments on 400M and 1.3B dense language models, 15B MoE models, Vision Transformers, and Diffusion Transformers show that SiameseNorm consistently improves performance while maintaining strong training stability across architectures and modalities. Code is available at https://github.com/Qwen-Applications/SiameseNorm.
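As background for the Pre-Norm versus Post-Norm tension, the two standard residual-block layouts can be written side by side. This sketch shows only the two conventional single-stream blocks, not the SiameseNorm two-stream architecture; the layer norm omits the learnable gain and bias for brevity, and the toy sublayer is an assumption.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learnable gain or bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_norm_block(x, sublayer):
    """Pre-Norm: normalize the sublayer input; the residual path stays an identity."""
    return x + sublayer(layer_norm(x))

def post_norm_block(x, sublayer):
    """Post-Norm: normalize after the residual addition, so the main path is normalized."""
    return layer_norm(x + sublayer(x))

# Toy sublayer standing in for attention or an MLP (hypothetical weights).
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(8, 8))
sublayer = lambda h: np.tanh(h @ weights)

x = rng.normal(size=(3, 8))
pre_out = pre_norm_block(x, sublayer)
post_out = post_norm_block(x, sublayer)
```

The difference is visible in the code: in `pre_norm_block` the input `x` reaches the output unchanged through the addition, while in `post_norm_block` every output passes through `layer_norm`, which is the trade-off between identity-gradient propagation and normalization of the main residual path that the abstract describes.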

2602.07340 2026-05-22 cs.LG

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

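Yonghui Yang, Wenjian Tao, Jilong Liu, Xingyu Zhu, Junfeng Fang, Weibiao Huang, Le Wu, Richang Hong, Tat-Seng Chua

Affiliations: National University of Singapore; Hefei University of Technology; ST Engineering Ltd., Singapore

AI summary: This paper revisits the robustness of LLM safety alignment from an optimization geometry perspective and proposes ShaPO, a framework that enforces worst-case alignment objectives through selective geometry control over an alignment-critical parameter subspace, improving safety robustness.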

Abstract

Safety alignment of large language models remains brittle under domain shift and noisy preference supervision. Most existing robust alignment methods focus on uncertainty in alignment data, while overlooking optimization-induced fragility in preference-based objectives. In this work, we revisit robustness for LLM safety alignment from an optimization geometry perspective, and argue that robustness failures cannot be addressed by data-centric methods alone. We propose ShaPO, a geometry-aware preference optimization framework that enforces worst-case alignment objectives via selective geometry control over an alignment-critical parameter subspace. By avoiding uniform geometry constraints, ShaPO mitigates the over-regularization that can harm robustness under distribution shift. We instantiate ShaPO at two levels: token-level ShaPO stabilizes likelihood-based surrogate optimization, while reward-level ShaPO enforces reward-consistent optimization under noisy supervision. Across diverse safety benchmarks and noisy preference settings, ShaPO consistently improves safety robustness over popular preference optimization methods. Moreover, ShaPO composes cleanly with data-robust objectives, yielding additional gains and empirically supporting the proposed optimization-geometry perspective. The code is available at https://github.com/liujilong0116/ShaPO.

2602.06995 2026-05-22 cs.RO cs.CV cs.IT cs.MA math.IT

When Simultaneous Localization and Mapping Meets Wireless Communications: A Survey

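Konstantinos Gounis, Sotiris A. Tegos, Dimitrios Tyrovolas, Panagiotis D. Diamantoulakis, George K. Karagiannidis

Affiliations Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki

AI summary This survey covers recent progress at the intersection of SLAM and wireless communications, focusing on the bidirectional influence in visual SLAM (V-SLAM) integration. It summarizes key concepts such as wireless signal propagation, geometric channel modeling, and radio frequency (RF)-based localization and sensing, describes how image processing techniques can detect landmarks and predict optimal paths for wireless channels, and analyzes the techniques, challenges, and future directions of the field.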

Abstract

This paper surveys the state-of-the-art in the nexus of SLAM and Wireless Communications, attributing the bidirectional impact of each with a focus on visual SLAM (V-SLAM) integration. We provide an overview of key concepts related to wireless signal propagation, geometric channel modeling, and radio frequency (RF)-based localization and sensing. In addition to this, we show image processing techniques that can detect landmarks, proactively predicting optimal paths for wireless channels. Several dimensions are considered, including the prerequisites, techniques, background, and future directions and challenges of the intersection between SLAM and wireless communications. We analyze estimation and control approaches such as Bayesian filters, feature-based pose estimation, perception-aware motion control, spatial methods for signal processing such as vector fields, and key technological aspects. We expose techniques and items towards enabling a highly effective retrieval of the autonomous robot state. Among other interesting findings, we observe that monocular V-SLAM would benefit from RF relevant information, as the latter can serve as a proxy for the scale ambiguity resolution. Conversely, we find that wireless communications in the context of 5G and beyond can potentially benefit from visual odometry that is central in SLAM. Moreover, we examine other sources besides the camera for SLAM and describe the twofold relation with wireless communications. Finally, integrated solutions performing joint communications and SLAM appear to be in their infancy: theoretical and practical advancements are required to add higher-level localization and semantic perception capabilities to RF and multi-antenna technologies.
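The abstract's remark that RF information can act as a proxy for resolving monocular scale ambiguity can be made concrete with a small worked example. This is a minimal sketch, not a method from the survey: it assumes a hypothetical setup in which an RF transmitter is also a landmark in the unitless visual map, and that an RF range measurement (in metres) to that transmitter is available. The function name `metric_scale` is illustrative.

```python
import math


def metric_scale(camera_pos, anchor_pos, rf_range_m):
    """Scale factor that maps unitless monocular map coordinates to metres.

    camera_pos and anchor_pos are in the monocular map frame (arbitrary scale);
    rf_range_m is the RF-measured camera-to-anchor distance in metres.
    """
    visual_distance = math.dist(camera_pos, anchor_pos)
    if visual_distance == 0.0:
        raise ValueError("camera and anchor coincide; scale is unobservable")
    return rf_range_m / visual_distance


# The map says the anchor is 0.5 units away; RF ranging says 2.0 m.
scale = metric_scale((0.0, 0.0, 0.0), (0.3, 0.4, 0.0), rf_range_m=2.0)
```

Multiplying every map coordinate by `scale` would then express the trajectory in metres. In practice a single noisy range is not enough, and one would fuse many measurements in a filter or optimizer.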

2602.06676 2026-05-22 cs.CV

Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction

Bo Du, Xiaochen Ma, Xuekang Zhu, Zhe Yang, Chaogun Niu, Chenfan Qu, Mingqi Fang, Zhenming Wang, Jingjing Liu, Jian Liu, Ji-Zhe Zhou

Affiliations Sichuan University; The Hong Kong University of Science and Technology; University of Science and Technology of China; South China University of Technology

AI summary This paper proposes SICA, a new monolithic fake image detection model that uses semantic-induced constrained adaptation to resolve the tension between unification and discriminability in artifact feature space reconstruction. Experiments show that it outperforms 15 existing methods.

Abstract

Fake Image Detection (FID), aiming at unified detection across four image forensic subdomains, is critical in real-world forensic scenarios. Compared with ensemble approaches, monolithic FID models are theoretically more promising, but to date, consistently yield inferior performance in practice. In this work, we identify the intrinsic distinctness of artifacts across subdomains, a critical barrier we term the "Ji-Zhe phenomenon". Driven by this phenomenon, we diagnose the cause of this underperformance for the first time: the collapse of the artifact feature space. The core challenge for developing a practical monolithic FID model thus boils down to the "unified-yet-discriminative" reconstruction of the artifact feature space. To address this paradoxical challenge, we hypothesize that high-level semantics can serve as a structural prior for the reconstruction, and further propose Semantic-Induced Constrained Adaptation (SICA), the first monolithic FID paradigm. Extensive experiments on our OpenMMSec dataset demonstrate that SICA outperforms 15 state-of-the-art methods and reconstructs the target unified-yet-discriminative artifact feature space in a near-orthogonal manner, thus firmly validating our hypothesis. The code and dataset are available at: https://github.com/venus-guangjian/SICA_OpenMMSec.

2602.05873 2026-05-22 cs.LG

Large-scale Score-based Variational Posterior Inference for Bayesian Deep Neural Networks

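Minyoung Kim

Affiliations Samsung AI Center

AI summary This paper proposes a variational posterior inference method for large-scale Bayesian deep neural networks that combines a score matching loss with a proximal penalty term and avoids reparametrized sampling, enabling efficient training of large-scale neural networks.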

Abstract

Bayesian (deep) neural networks (BNN) are often more attractive than the vanilla point-estimate deep learning in various aspects including uncertainty quantification, robustness to noise, resistance to overfitting, and more. The variational inference (VI) is one of the most widely adopted approximate inference methods. Whereas the ELBO-based variational free energy method is a dominant choice in the literature, in this paper we introduce a score-based alternative for BNN variational inference. Score-based VI can address the known issue of mode collapsing in ELBO-based VI. Although several score-based VI methods have been proposed in the community, most are not adequate for large-scale BNNs for various computational and technical reasons. We propose a novel scalable VI method where the learning objective combines the score matching loss and the proximal penalty term in iterations, which helps our method avoid the reparametrized sampling, and allows for noisy unbiased mini-batch scores through stochastic gradients. This in turn makes our method scalable to large-scale neural networks including Vision Transformers. On several benchmarks including visual recognition and time-series forecasting with large-scale deep networks, we empirically show the effectiveness of our approach.

2602.05536 2026-05-22 cs.LG cs.AI cs.CL cs.CV

When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging

Yayuan Li, Ze Peng, Jian Zhang, Jintao Guo, Yue Duan, Yinghuan Shi

Affiliations National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Institute of Brain-Computer Interface, Nanjing University, Nanjing 210023, China

AI summary This paper studies the over-accumulation of shared knowledge in model merging and proposes SVC, which calibrates singular values to restore spectral balance, improving the performance of model merging and Task Arithmetic.

Comments Accepted by ICML 2026

Abstract

Model merging combines multiple fine-tuned models into a single model by adding their weight updates, providing a lightweight alternative to retraining. Existing methods primarily target resolving conflicts between task updates, leaving the failure mode of over-counting shared knowledge unaddressed. We show that when tasks share aligned spectral directions (i.e., overlapping singular vectors), a simple linear combination repeatedly accumulates these directions, inflating the singular values and biasing the merged model toward shared subspaces. To mitigate this issue, we propose Singular Value Calibration (SVC), a training-free and data-free post-processing method that quantifies subspace overlap and rescales inflated singular values to restore a balanced spectrum. Across vision and language benchmarks, SVC consistently improves strong merging baselines and achieves state-of-the-art performance. Furthermore, by modifying only the singular values, SVC improves the performance of Task Arithmetic by 13.0%. Code is available at https://github.com/lyymuwu/SVC.
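The failure mode described in the abstract, where summing task updates that share a singular direction inflates the corresponding singular value, can be reproduced on a toy example. The sketch below only illustrates that inflation; it does not implement SVC, whose calibration rule is not given in the abstract. The rank-1 "task updates" and all helper names are hypothetical, and the top singular value is estimated with plain power iteration to keep the example dependency-free.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def top_singular_value(m, iters=200):
    """Largest singular value of m via power iteration on m^T m."""
    mt = [list(col) for col in zip(*m)]
    v = [1.0] * len(m[0])
    for _ in range(iters):
        w = matvec(mt, matvec(m, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return math.sqrt(sum(x * x for x in matvec(m, v)))


def rank_one(u, v, sigma):
    """sigma * u v^T for unit vectors u and v."""
    return [[sigma * ui * vj for vj in v] for ui in u]


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Two task updates that share the same singular direction.
task_a = rank_one([1.0, 0.0, 0.0], [0.0, 1.0], 2.0)
task_b = rank_one([1.0, 0.0, 0.0], [0.0, 1.0], 3.0)
aligned = top_singular_value(add(task_a, task_b))

# Two task updates that use orthogonal directions.
task_c = rank_one([0.0, 1.0, 0.0], [1.0, 0.0], 3.0)
orthogonal = top_singular_value(add(task_a, task_c))
```

With aligned directions the merged update's top singular value is the sum of the two inputs' values, while with orthogonal directions it stays at the larger of the two. That gap is the imbalance a spectrum-calibration step would aim to remove.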

2602.05304 2026-05-22 cs.LG cs.SY eess.SY math.OC

A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms

Feng Zhu, Robert W. Heath, Aritra Mitra

Affiliations Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA; Department of Electrical and Computer Engineering, University of California, San Diego, USA

AI summary This paper presents a unified convergence analysis for the SAG, SAGA, and IAG algorithms. It bounds delays with simple concentration tools and designs a new Lyapunov function, yielding high-probability bounds that extend to non-convex objectives and Markov sampling.

Comments To appear at the 43rd International Conference on Machine Learning (ICML)

Abstract

Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is known to be notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
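For readers unfamiliar with the algorithms being analyzed, here is a minimal sketch of the standard SAGA update on a toy finite-sum problem with smooth, strongly convex components. It illustrates the algorithm only, not the paper's analysis. The one-dimensional quadratics, the step size 1/(3L), and the function name `saga` are choices made for this example.

```python
import random


def saga(a, c, steps=5000, seed=0):
    """Minimize (1/n) * sum_i 0.5 * a[i] * (x - c[i])**2 with SAGA."""
    n = len(a)
    rng = random.Random(seed)

    def grad(i, x):
        # Gradient of the i-th component function.
        return a[i] * (x - c[i])

    step = 1.0 / (3.0 * max(a))  # 1/(3L), L = largest component curvature
    x = 0.0
    table = [grad(i, x) for i in range(n)]  # last gradient seen per component
    table_mean = sum(table) / n
    for _ in range(steps):
        j = rng.randrange(n)
        g_new = grad(j, x)
        # SAGA direction: fresh gradient, minus the stale one, plus the table mean.
        x -= step * (g_new - table[j] + table_mean)
        # Refresh the stored gradient and keep the running mean in sync.
        table_mean += (g_new - table[j]) / n
        table[j] = g_new
    return x


a = [1.0, 2.0, 3.0, 1.5, 2.5]
c = [0.0, 1.0, -1.0, 2.0, 0.5]
x_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)  # closed-form minimizer
x_hat = saga(a, c)
```

The stored gradients are "delayed" copies evaluated at older iterates, which is the quantity the abstract's delay bound refers to. SAG uses the same table but weights the correction term by 1/n, and IAG visits the components in a deterministic cyclic order instead of sampling `j` at random.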

2602.04768 2026-05-22 cs.LG cs.AI

Billion-Scale Graph Foundation Models

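Maya Bechler-Speicher, Yoel Gottlieb, Andrey Isakov, David Abensur, Ami Tavory, Daniel Haimovich, Ido Guy, Udi Weinsberg

Affiliations Meta

AI summary This paper proposes GraphBFF, an end-to-end recipe for building billion-parameter graph foundation models for large-scale heterogeneous graphs. It introduces the GraphBFF Transformer architecture, presents neural scaling laws for heterogeneous graphs, and reports strong performance on multiple downstream tasks.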

Abstract

Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending this paradigm to general, real-world graphs is challenging. In this work, we present Graph Billion-Foundation-Fusion (GraphBFF): an end-to-end recipe for building billion-parameter Graph Foundation Models (GFMs) for large-scale heterogeneous graphs. Central to the recipe is the GraphBFF Transformer, a flexible and scalable architecture designed for practical billion-scale GFMs. Using the GraphBFF, we present neural scaling laws for heterogeneous graphs and show that loss decreases predictably as either model capacity or training data scales, depending on which factor is the bottleneck. The GraphBFF framework provides concrete methodologies for data batching, pretraining, and fine-tuning for building GFMs at scale. We demonstrate the effectiveness of the framework over a real-world billion-scale graph, with an evaluation of a billion-parameter GraphBFF Transformer following the proposed recipe. Across ten diverse, real-world downstream tasks on graphs unseen during training, spanning node- and link-level classification and regression, GraphBFF consistently outperforms baselines, with large margins of up to 31 PRAUC points, including in few-shot settings. Finally, we discuss key challenges and open opportunities for making GFMs a practical and principled foundation for graph learning at industrial scale.

2602.03784 2026-05-22 cs.CL

Fix the Structural Bottleneck: Context Compression via Explicit Information Transmission

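Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He

Affiliations King’s College London; Tsinghua University; Imperial College London; The Alan Turing Institute

AI summary This paper revisits context compression from a structural perspective, identifies two key bottlenecks in standard LLM-based compressors, and proposes ComprExIT, a framework based on explicit information transmission. Experiments across multiple datasets show improved F1 scores and lower computational cost.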

Abstract

Long-context LLM agents often struggle with growing token, memory, and latency costs, making efficient context compression essential for practical deployment. Existing LLM-as-a-compressor methods remain noticeably inferior to using the full context. We find that this gap partly stems from their inability to preserve contextual information effectively. In this work, we revisit context compression from a structural perspective and identify two key bottlenecks in standard LLM-based compressors: limited coordination among compression tokens during information aggregation, and layerwise dilution that weakens useful signals from intermediate hidden states. To address these limitations, we propose ComprExIT, a new context compression framework based on explicit information transmission. ComprExIT adaptively selects features across frozen LLM layers, then allocates information from anchors to compression slots through a globally coordinated transport plan. Experiments on 12 datasets show that ComprExIT consistently outperforms strong soft-compression baselines, improving average F1 by up to 18.5%, while adding only ~1% trainable parameters and achieving more than 2x faster compression than the fastest baselines. The code will be released upon acceptance.

2602.02709 2026-05-22 cs.AI

ATLAS: A Multi-LLM Training Framework for EvoDPO with Adaptive Reference Evolution

Ujin Jeon, Jiyong Kwon, Madison Ann Sullivan, Caleb Eunho Lee, Guang Lin

Affiliations School of Electrical and Computer Engineering; Purdue University West Lafayette; School of Mechanical Engineering; Department of Mathematics; Department of Computer Science; Department of Mathematics and Mechanical Engineering

AI summary This paper proposes the ATLAS framework, which uses adaptive reference evolution to address the overly conservative updates or training stagnation caused by fixed reference models in multi-LLM agent systems. It combines supporter-driven exploration with EvoDPO-driven stability to improve long-horizon, evaluator-driven self-improvement.

Abstract

Recent multi-LLM agent systems have shown promising capabilities for automated problem-solving, yet they predominantly rely on frozen agents or static fine-tuning pipelines. To address this limitation, our primary contribution is ATLAS (Adaptive Task-distributed Learning for Agentic Self-evolution), a multi-agent framework where specialized meta-agents collaboratively train and refine an active agent toward a domain-specific policy. A core challenge in iterative preference learning within these pipelines is the reliance on fixed reference models, which typically leads to overly conservative updates or training stagnation. To overcome this, the framework's algorithmic engine utilizes Evolving Direct Preference Optimization (EvoDPO). EvoDPO employs an inspection agent to perform adaptive, proxy-KL gated reference policy updates based on continuous training telemetry. We evaluate this full framework across a diverse set of challenging environments, including non-stationary contextual bandits, partial differential equations (PINNs), and combinatorial optimization tasks (TSP, Bin Packing). Through comparison against fixed-reference, adaptive-reference, and external automated-discovery baselines, our results suggest that ATLAS combines supporter-driven exploration with EvoDPO-driven stability to improve long-horizon evaluator-driven self-improvement.
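To see why the choice of reference model matters in this setting, it helps to look at the standard Direct Preference Optimization (DPO) loss that EvoDPO's name builds on. The sketch below shows only that standard per-pair loss. The proxy-KL gating rule and the inspection agent are not specified in the abstract and are not reproduced here. The function name, the `beta` default, and the log-probability values are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    All four arguments are sequence log-probabilities: the policy's and the
    reference model's scores for the chosen and the rejected response.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    if z < -30.0:
        # For very negative z, log(1 + exp(-z)) is approximately -z.
        return -z
    # -log(sigmoid(z)) written as log(1 + exp(-z)).
    return math.log1p(math.exp(-z))


# Policy equal to the reference: zero margin, loss is log(2).
at_reference = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# Policy favours the chosen response more than the reference does.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

The loss is defined relative to the reference, so moving the reference to the current policy resets the margin to zero for every pair. That is the lever an adaptive reference update pulls, and deciding when to pull it is the part the paper's gating mechanism addresses.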

2602.02112 2026-05-22 cs.LG cs.AI cs.CL

Unifying Masked Diffusion Models with Various Generation Orders and Beyond

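Chunsan Hong, Sanghyun Lee, Jong Chul Ye

Affiliations Graduate School of AI, KAIST, South Korea

AI summary This paper proposes the Order-Expressive Masked Diffusion Model (OeMDM) and the Learnable-Order Masked Diffusion Model (LoMDM), which unify diffusion generative processes with different generation orders and learn the generation order and the diffusion backbone through a single objective, improving text generation performance.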

Comments Accepted at ICML 2026

Abstract

Masked diffusion models (MDMs) are a potential alternative to autoregressive models (ARMs) for language generation, but generation quality depends critically on the generation order. Prior work either hard-codes an ordering (e.g., blockwise left-to-right) or learns an ordering policy for a pretrained MDM, which incurs extra cost and can yield suboptimal solutions due to the two-stage optimization. Motivated by this, we propose order-expressive masked diffusion model (OeMDM) for a broad class of diffusion generative processes with various generation orders, enabling the interpretation of MDM, ARM, and block diffusion in a single framework. Furthermore, building on OeMDM, we introduce learnable-order masked diffusion model (LoMDM), which jointly learns the generation ordering and diffusion backbone through a single objective from scratch, enabling the diffusion model to generate text in context-dependent ordering. Empirically, we confirm that LoMDM outperforms various discrete diffusion models across multiple language modeling benchmarks.

2602.01334 2026-05-22 cs.CV

What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom

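Yan Ma, Weiyu Zhang, Tianle Li, Linge Du, Xuyang Shen, Pengfei Liu

Affiliations Shanghai Jiao Tong University; Fudan University; Peking University; The Chinese University of Hong Kong

AI summary This paper studies what vision tool-use reinforcement learning learns in the crop-and-zoom setting. It introduces the MED framework to disentangle intrinsic capability changes from tool-induced effects and finds that improvements are driven mainly by intrinsic learning, while tool-use RL mostly reduces tool-induced harm rather than mastering the tool.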

Comments ICML 2026 camera ready. Code: https://github.com/GAIR-NLP/Med

Abstract

Vision tool-use reinforcement learning (RL) can equip vision language models with visual operators such as crop-and-zoom and achieves strong performance gains, yet it remains unclear whether these gains are driven by improvements in tool use or evolving intrinsic capabilities. We introduce MED (Measure-Explain-Diagnose), a coarse-to-fine framework that disentangles intrinsic capability changes from tool-induced effects, decomposes the tool-induced performance difference into gain and harm terms, and probes the mechanisms driving their evolution. Across checkpoint-level analyses in the crop-and-zoom setting on two VLMs with different tool priors and six benchmarks, we find that improvements are dominated by intrinsic learning, while tool-use RL mainly reduces tool-induced harm (e.g., fewer call-induced errors and weaker tool schema interference) and yields limited progress in tool-based correction of intrinsic failures. Overall, in the crop-and-zoom setting studied here, current vision tool-use RL learns to coexist safely with tools rather than master them.

2602.00688 2026-05-22 cs.LG

Provably Protecting Fine-Tuned LLMs from Training Data Extraction while Preserving Utility

Tom Segal, Asaf Shabtai, Yuval Elovici

Affiliations Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel

AI summary This paper proposes SCP-Δ_r, a Near Access Freeness (NAF)-based algorithm that operates on relative probabilities and uses a base model to smooth low-impact tokens, achieving better theoretical bounds and effective empirical protection against training data extraction attacks with minimal performance loss.

Comments 21 pages, 5 figures

Abstract

Fine-tuning large language models (LLMs) on sensitive datasets raises privacy concerns, as training data extraction (TDE) attacks can expose highly confidential information. Existing defenses against such attacks either lack formal privacy guarantees or incur substantial utility degradation. We observe that fine-tuning induces widespread probability shifts, yet preserving only a small subset of influential token-level deviations is sufficient; the remaining shifts can be aggressively smoothed with minimal impact on utility. Motivated by this insight, we propose SCP-$Δ_r$, a Near Access Freeness (NAF)-based algorithm that operates on relative probabilities and explicitly smooths low-impact tokens using a base model. SCP-$Δ_r$ achieves orders-of-magnitude better theoretical bounds than existing NAF based methods and provides strong empirical protection against TDE attacks with minimal performance loss.

2601.23224 2026-05-22 cs.CV

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Xiangyu Zeng, Zhiqiu Zhang, Yuhan Zhu, Xinhao Li, Zikang Wang, Changlian Ma, Qingyu Zhang, Zizheng Huang, Kun Ouyang, Tianxiang Jiang, Ziang Yan, Yi Wang, Hongjie Zhang, Yali Wang, Limin Wang

Affiliations Nanjing University; Shanghai AI Laboratory; Shanghai Jiao Tong University; Peking University; University of Science and Technology of China; Zhejiang University; SIAT, Chinese Academy of Sciences

AI summary This work proposes Video-o3, a framework that improves long-video multi-hop reasoning through iterative discovery of salient visual clues, fine-grained inspection of key segments, and adaptive termination. Experiments show 72.1% accuracy on MLVU and 46.5% on Video-Holmes.

Comments 27 pages, 15 figures, 15 tables

Abstract

Existing multimodal large language models for long-video understanding predominantly rely on uniform sampling and single-turn inference, limiting their ability to identify sparse yet critical evidence amid extensive redundancy. We introduce Video-o3, a novel framework that supports iterative discovery of salient visual clues, fine-grained inspection of key segments, and adaptive termination once sufficient evidence is acquired. Technically, we address two core challenges in interleaved tool invocation. First, to mitigate attention dispersion induced by the heterogeneity of reasoning and tool-calling, we propose Task-Decoupled Attention Masking, which isolates per-step concentration while preserving shared global context. Second, to control context length growth in multi-turn interactions, we introduce a Verifiable Trajectory-Guided Reward that balances exploration coverage with reasoning efficiency. To support training at scale, we further develop a data synthesis pipeline and construct Seeker-173K, comprising 173K high-quality tool-interaction trajectories for effective supervised and reinforcement learning. Extensive experiments show that Video-o3 substantially outperforms state-of-the-art methods, achieving 72.1% accuracy on MLVU and 46.5% on Video-Holmes. These results demonstrate Video-o3's strong multi-hop evidence-seeking and reasoning capabilities, and validate the effectiveness of native tool invocation in long-video scenarios.

2601.20205 2026-05-22 cs.LG

Hyperparameter Transfer with Mixture-of-Expert Layers

Tianze Jiang, Blake Bordelon, Cengiz Pehlevan, Boris Hanin

Affiliations Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA; Center of Mathematical Sciences and Applications, Harvard University, Cambridge, MA, USA; John A. Paulson School of Engineering and Applied Sciences, Center for Brain Science, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA

AI summary This paper proposes a new parameterization for hyperparameter transfer in transformer models with Mixture-of-Experts layers when scaling model width, depth, number of experts, and expert (hidden) size. The parameterization is justified by a dynamical mean-field theory analysis, and experiments show reliable hyperparameter transfer across model scales.

Comments ICML 2026

Abstract

Mixture-of-Experts (MoE) layers have emerged as an important tool in scaling up modern neural networks by decoupling total trainable parameters from activated parameters in the forward pass for each token. However, sparse MoEs add complexity to training due to (i) new trainable parameters (router weights) that, like all other parameter groups, require hyperparameter (HP) tuning; (ii) new architecture scale dimensions (number of and size of experts) that must be chosen and potentially taken large. To make HP selection cheap and reliable, we propose a new parameterization for transformer models with MoE layers when scaling model width, depth, number of experts, and expert (hidden) size. Our parameterization is justified by a novel dynamical mean-field theory (DMFT) analysis. When varying different model dimensions trained at a fixed token budget, we find empirically that our parameterization enables reliable HP transfer across models from 51M to over 2B total parameters. We further take HPs identified from sweeping small models on a short token horizon to train larger models on longer horizons and report performant model behaviors.
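The decoupling of total from activated parameters mentioned in the abstract can be illustrated with a small sketch of top-k routing. This is a generic illustration, not the paper's parameterization. It assumes one common routing variant (softmax over the k selected router logits) and a two-matrix feed-forward expert without biases, and all names and sizes are hypothetical.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k largest router logits."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in order)
    exps = [math.exp(router_logits[i] - peak) for i in order]
    total = sum(exps)
    # Softmax restricted to the selected experts, so the weights sum to 1.
    return [(i, e / total) for i, e in zip(order, exps)]


def parameter_counts(d_model, d_expert, n_experts, k):
    """(total, activated-per-token) parameters of one MoE feed-forward layer."""
    per_expert = 2 * d_model * d_expert  # up- and down-projection matrices
    router = d_model * n_experts         # router weights
    return router + n_experts * per_expert, router + k * per_expert


routes = top_k_route([0.1, 2.0, -1.0, 1.0], k=2)
total_params, active_params = parameter_counts(d_model=8, d_expert=16,
                                               n_experts=4, k=2)
```

Adding experts grows `total_params` while `active_params` depends only on `k` and the expert size, which is why the number of experts and the expert size become separate scaling dimensions that need their own hyperparameter treatment.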

2601.20107 2026-05-22 cs.CV cs.CL cs.IR

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval

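Zhuchenyang Liu, Ziyu Hu, Yao Zhang, Yu Xiao

Affiliations Aalto University

AI summary This paper proposes Structural Anchor Pruning (SAP), a training-free multi-vector compression method with three components: Score Retention, SR-guided window selection, and a visual in-degree centrality scorer. Without any per-model parameter tuning, it prunes more than 90% of visual tokens while retaining over 90% of NDCG@5.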

Comments methodology revision and new title

Abstract

Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive multi-vector index storage overhead. Existing training-free pruning methods either rely on heuristic layer choices or degrade sharply under aggressive compression, leading prior work to argue that effective high-compression pruning requires query-dependent training. We challenge this view with Structural Anchor Pruning (SAP), a self-calibrating, training-free, and query-agnostic index-time pruning framework with three components: (i) Score Retention (SR), a white-box per-layer compression diagnostic; (ii) SR-guided window selection, a procedure that automatically locates the structural pruning region for any backbone with no per-model hyperparameters; and (iii) a visual in-degree centrality scorer that identifies anchor patches within the selected window. On the ViDoRe v1/v2 benchmarks across three architectures spanning 18, 28, and 36 backbone layers, SAP retains over 90% of NDCG@5 while pruning more than 90% of visual tokens, without any per-model parameter tuning. Our layer-resolved SR analysis reveals an Alignment-Aggregation Divergence: the document's visual structure is preserved as a stable "Structural Plateau" within the backbone, but the final layers reshape this representation into a sparse, query-aligned form that is no longer suitable for pruning. This is the mechanistic reason SAP succeeds where final-layer methods fail.
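The idea behind an in-degree centrality scorer can be sketched on a toy attention matrix: a patch that receives a lot of attention from other patches is treated as an anchor and kept. This is a hedged illustration of the general technique only. How SAP actually aggregates attention across heads and layers inside its selected window is not stated in the abstract, and the function names and the example matrix are hypothetical.

```python
def in_degree_scores(attention):
    """Attention received by each token: column sums of the attention matrix.

    attention[i][j] is how much token i attends to token j (rows sum to 1).
    """
    n_tokens = len(attention[0])
    return [sum(row[j] for row in attention) for j in range(n_tokens)]


def keep_top_fraction(scores, keep_ratio):
    """Indices of the highest-scoring tokens, returned in original order."""
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


attention = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
scores = in_degree_scores(attention)
kept = keep_top_fraction(scores, keep_ratio=0.5)
```

Because the scores depend only on the document's own attention pattern, the selection can run once at index time without seeing any query, which matches the query-agnostic setting the abstract describes.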

2601.07603 2026-05-22 cs.CV

UIKA: Fast Universal Head Avatar from Pose-Free Images

UIKA:从无位姿图像快速生成通用头部化身

Zijian Wu, Boyao Zhou, Liangxiao Hu, Hongyu Liu, Yuan Sun, Xuan Wang, Xun Cao, Yujun Shen, Hao Zhu

发表机构 * Nanjing University(南京大学) Ant Group(蚂蚁集团) HKUST(香港科技大学) Xi’an Jiaotong University(西安交通大学)

AI总结 本文提出UIKA,一种前馈式方法,可从任意数量的无位姿输入(包括单张图像、多视角采集和智能手机拍摄的视频)生成可动画的高斯头部化身。与传统化身方法不同,UIKA从模型表示、网络设计和数据准备三个角度重新思考这一任务:引入UV引导的化身建模策略,设计可学习的UV token,并聚合所有输入视角的UV信息,将其解码为规范高斯属性。

Comments CVPR 2026 Highlight. Code: https://github.com/ant-research/UIKA

详情
AI中文摘要

我们提出UIKA,一种前馈式的可动画高斯头部模型,可由任意数量的无位姿输入(包括单张图像、多视角采集和智能手机拍摄的视频)生成。传统的化身方法需要影棚级的多视角采集系统,并通过长时间的优化过程重建针对特定人物的模型;与之不同,我们从模型表示、网络设计和数据准备三个角度重新思考这一任务。首先,我们引入了UV引导的化身建模策略,其中每张输入图像都配有逐像素的面部对应关系估计。借助这种对应关系估计,我们可以将每个有效像素的颜色从屏幕空间重投影到UV空间,而UV空间与相机位姿和人物表情无关。此外,我们设计了可学习的UV token,注意力机制可以同时在屏幕层面和UV层面作用于这些token。借助从所有输入视角聚合的UV信息,学习到的UV token可以解码为规范(canonical)高斯属性。为了训练我们的大型化身模型,我们还准备了一个大规模、身份丰富的合成训练数据集。我们的方法在单目和多视角设置下均显著优于现有方法。

英文摘要

We present UIKA, a feed-forward animatable Gaussian head model from an arbitrary number of pose-free inputs, including a single image, multi-view captures, and smartphone-captured videos. Unlike the traditional avatar method, which requires a studio-level multi-view capture system and reconstructs a human-specific model through a long-time optimization process, we rethink the task through the lenses of model representation, network design, and data preparation. First, we introduce a UV-guided avatar modeling strategy, in which each input image is associated with a pixel-wise facial correspondence estimation. Such correspondence estimation allows us to reproject each valid pixel color from screen space to UV space, which is independent of camera pose and character expression. Furthermore, we design learnable UV tokens on which the attention mechanism can be applied at both the screen and UV levels. The learned UV tokens can be decoded into canonical Gaussian attributes using aggregated UV information from all input views. To train our large avatar model, we additionally prepare a large-scale, identity-rich synthetic training dataset. Our method significantly outperforms existing approaches in both monocular and multi-view settings.

2601.04537 2026-05-22 cs.LG cs.CL

Linear Dynamics in the RLVR Training of Large Language Models

大语言模型RLVR训练中的线性动力学

Tianle Wang, Jiayu Liu, Zhongyuan Wu, Shenghao Jin, Wei Chen, Hao Xu, Ning Miao

发表机构 * Department of Data Science, City University of Hong Kong(香港城市大学数据科学系) Hong Kong Institute of AI for Science, City University of Hong Kong(香港城市大学人工智能科学研究院) Li Auto Inc. Beihang University(北京航空航天大学)

AI总结 本文研究了可验证奖励强化学习(RLVR)在大语言模型训练中的内部动态,发现在多种模型家族、RL算法和训练配置下,RLVR都会进入线性区域;通过受控实验和理论分析表明,这种线性源于训练信号高方差、含噪声的特性,并且具有预测性和实用价值。

Comments Major revision: substantially reorganized the manuscript and added a theoretical explanation section. The replacement is intended for the same arXiv paper; the core topic and contribution remain the same

详情
AI中文摘要

可验证奖励强化学习(RLVR)为面向推理的大语言模型(LLMs)带来了显著的性能提升,但其内部训练动态在很大程度上仍是一个黑箱。在本文中,我们对RLVR进行了全面的轨迹级分析,并揭示出一个显著的规律:在各种模型家族、RL算法和训练配置下,RLVR始终会进入一个稳健的线性区域,其中参数权重和输出对数概率(经教师强制评估严格测量)都以高度线性的方式演变(R²>0.7)。通过受控实验和理论分析,我们证明这种线性并非偶然,而是源于RLVR训练信号高方差、含噪声的特性,这一特性起到低通滤波器的作用,使优化集中于一个稳定的低维漂移方向。此外,我们表明这种线性结构不仅具有描述性,还具有很强的预测性和可操作性。具体而言,权重空间外推借助周期性的重新校准(re-grounding),在性能上与标准RL优化相当,同时实现了6.1倍的训练加速。与此同时,输出空间外推作为一种轻量级干预手段,有效规避了训练后期的模型崩溃,在数学和代码基准上持续优于标准RL,平均性能提升4.2%。我们的代码可在 https://github.com/Miaow-Lab/RLVR-Linearity 获取。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gains in reasoning-oriented large language models (LLMs), yet its internal training dynamics remain largely a black box. In this work, we perform a comprehensive trajectory-level analysis of RLVR and uncover a striking regularity: across various model families, RL algorithms, and training configurations, RLVR consistently enters a robust linear regime, where both parameter weights and output log-probabilities, measured rigorously via teacher-forced evaluation, evolve in a highly linear manner ($R^2 > 0.7$). Through controlled experiments and theoretical analysis, we demonstrate that this linearity is not a coincidence, but stems from the high-variance, noisy nature of RLVR training signals, which act as a low-pass filter to concentrate optimization along a stable, low-dimensional drift. Moreover, we show that this linear structure is not merely descriptive but powerfully predictive and actionable. Specifically, weight-space extrapolation matches the performance of standard RL optimization while achieving a 6.1x training speedup through periodic re-grounding. Meanwhile, output-space extrapolation serves as a lightweight intervention that effectively bypasses late-stage model collapse, consistently outperforming standard RL across mathematical and coding benchmarks, with an average performance improvement of 4.2%. Our code is available at https://github.com/Miaow-Lab/RLVR-Linearity.
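
To make the "linear regime" and "extrapolation" wording concrete, the sketch below fits a straight line to one scalar quantity tracked across training checkpoints, reports the R² of that fit, and extrapolates to a later step. It is a minimal sketch under stated assumptions: the checkpoint values are invented toy numbers, the helpers `linear_fit` and `extrapolate` are hypothetical, and the periodic re-grounding described in the abstract is not implemented. The authors' actual procedure is in their repository.

```python
def linear_fit(steps, values):
    """Ordinary least-squares line through (step, value) pairs.
    Returns (slope, intercept, r_squared)."""
    n = len(steps)
    t_mean = sum(steps) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in steps)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(steps, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(steps, values))
    ss_tot = sum((y - y_mean) ** 2 for y in values)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


def extrapolate(steps, values, future_step):
    """Predict the value at future_step from the fitted line."""
    slope, intercept, _ = linear_fit(steps, values)
    return intercept + slope * future_step


# Toy trajectory of a single weight over four checkpoints (made-up numbers).
steps = [0, 100, 200, 300]
weight = [0.50, 0.52, 0.54, 0.56]

slope, intercept, r2 = linear_fit(steps, weight)
predicted = extrapolate(steps, weight, future_step=600)
print(f"R^2 = {r2:.3f}, predicted weight at step 600 = {predicted:.3f}")
```

On real checkpoints the same fit would be applied per parameter (or per log-probability), and an R² above a chosen threshold would indicate that linear extrapolation is plausible for that quantity.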