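arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.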

2602.22270 2026-05-22 cs.LG q-bio.PE

Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting

Sijie Ruan, Jinyu Li, Jia Wei, Zenghao Xu, Jie Bao, Junshi Xu, Junyang Qiu, Shuliang Wang, Xiaoxiao Wang, Hanning Yuan

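Affiliations: Beijing Institute of Technology; Zhejiang Provincial Center for Disease Control and Prevention; JD Technology; The University of Hong Kong; China Mobile Internet

AI summary: This paper proposes STOEP, a hybrid framework that combines implicit spatio-temporal priors with explicit expert priors. It improves spatio-temporal epidemic forecasting accuracy by dynamically adjusting regional dependencies, amplifying weak signals, and applying mechanistic forecasting.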

Comments 12 pages, 10 figures, accepted to IJCAI 2026

Abstract

Spatio-temporal epidemic forecasting is critical for public health management, yet existing methods often struggle with insensitivity to weak epidemic signals, over-simplified spatial relations, and unstable parameter estimation. To address these challenges, we propose the Spatio-Temporal priOr-aware Epidemic Predictor (STOEP), a novel hybrid framework that integrates implicit spatio-temporal priors and explicit expert priors. STOEP consists of three key components: (1) Case-aware Adjacency Learning (CAL), which dynamically adjusts mobility-based regional dependencies using historical infection patterns; (2) Space-informed Parameter Estimating (SPE), which employs learnable spatial priors to amplify weak epidemic signals; and (3) Filter-based Mechanistic Forecasting (FMF), which uses an expert-guided adaptive thresholding strategy to regularize epidemic parameters. Extensive experiments on real-world COVID-19 and influenza datasets demonstrate that STOEP outperforms the best baseline by 11.1% in RMSE. The system has been deployed at a provincial CDC in China to facilitate downstream applications.

2602.20845 2026-05-22 cs.CV

FLIM Networks with Bag of Feature Points

João Deltregia Martinelli, Marcelo Luis Rodrigues Filho, Felipe Crispim da Rocha Salvagnini, Gilson Junior Soares, Jefersson A. dos Santos, Alexandre X. Falcão

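Affiliations: Institute of Computing, UNICAMP, Campinas, Brazil; School of Computer Science, University of Sheffield, Sheffield, United Kingdom

AI summary: This paper proposes FLIM-BoFP, a more efficient filter estimation method for parasite detection in microscopy images, and reports advantages in efficiency, effectiveness, and generalization over FLIM-Cluster and other state-of-the-art baselines.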

Comments Accepted at the 28th Iberoamerican Congress on Pattern Recognition (CIARP 2025). To appear in Lecture Notes in Computer Science (LNCS), Springer

DOI: 10.1007/978-3-032-23176-5_19

Abstract

Convolutional networks require extensive image annotation, which can be costly and time-consuming. Feature Learning from Image Markers (FLIM) tackles this challenge by estimating encoder filters (i.e., kernel weights) from user-drawn markers on discriminative regions of a few representative images without traditional optimization. Such an encoder combined with an adaptive decoder comprises a FLIM network fully trained without backpropagation. Prior research has demonstrated their effectiveness in Salient Object Detection (SOD), being significantly lighter than existing lightweight models. This study revisits FLIM SOD and introduces FLIM-Bag of Feature Points (FLIM-BoFP), a considerably faster filter estimation method. The previous approach, FLIM-Cluster, derives filters through patch clustering at each encoder's block, leading to computational overhead and reduced control over filter locations. FLIM-BoFP streamlines this process by performing a single clustering at the input block, creating a bag of feature points, and defining filters directly from mapped feature points across all blocks. The paper evaluates the benefits in efficiency, effectiveness, and generalization of FLIM-BoFP compared to FLIM-Cluster and other state-of-the-art baselines for parasite detection in optical microscopy images.

2602.18141 2026-05-22 cs.LG

Geometry-Induced Diffusion on Graphs: A Learnable Weighted Laplacian for Spectral GNNs

Mia Zosso, Ali Hariri, Victor Kawasaki-Borruat, Pierre-Gabriel Berlureau, Pierre Vandergheynst

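Affiliations: École Polytechnique Fédérale de Lausanne (EPFL); École Normale Supérieure – PSL

AI summary: This paper proposes mu-ChebNet, a simple spectral GNN architecture that learns a node-wise weight function mu to modify the graph Laplacian. This changes the propagation geometry without altering the graph topology, promoting preferred routes for information propagation and helping long-range signals avoid highly contractive bottlenecks without repeated layer stacking.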

Abstract

Long-range graph tasks are challenging for Graph Neural Networks (GNNs): global mechanisms such as attention or rewiring schemes can be computationally expensive, while deep local propagation is prone to vanishing gradients, oversmoothing, and oversquashing. The introduced mu-ChebNet architecture is a simple spectral GNN that learns a node-wise weight function mu before applying ChebNet-style filters. The learned weighting mu induces a modified graph Laplacian which effectively changes the propagation geometry without altering the graph topology. This task-dependent geometry promotes preferred routes for information propagation, thereby helping long-range signals avoid highly contractive bottlenecks, and obviating the need for repeated layer stacking. In practice, we replace the fixed graph Laplacian L by a learned operator L_mu, keeping the proposed mu-ChebNet architecture lightweight while making propagation task-adaptive. Furthermore, we provide a spectral analysis demonstrating how mu modulates propagation dynamics, and empirically observe improved performance on both synthetic long-range reasoning tasks and real-world graph benchmarks. The learned weight function is not only interpretable, but also offers a lightweight alternative to attention and rewiring for adaptive graph propagation.
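As a rough illustration of the idea in this abstract, the sketch below applies a ChebNet-style polynomial filter to a graph signal using a node-weighted Laplacian. The abstract does not state how mu enters the operator, so the form L_mu = M^{-1/2} (D - A) M^{-1/2} with M = diag(mu), the function names, and the toy path graph are all assumptions made for illustration, not the authors' construction.

```python
from math import sqrt

def weighted_laplacian(adj, mu):
    """Hypothetical L_mu = M^{-1/2} (D - A) M^{-1/2}, with M = diag(mu), mu > 0."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[((deg[i] if i == j else 0.0) - adj[i][j]) / sqrt(mu[i] * mu[j])
             for j in range(n)] for i in range(n)]

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def cheb_filter(lap, x, theta):
    """Return sum_k theta[k] * T_k(L_tilde) x using the Chebyshev recurrence."""
    # Gershgorin bound on the largest eigenvalue, used to rescale the spectrum into [-1, 1].
    bound = max(sum(abs(v) for v in row) for row in lap)

    def scaled(v):
        # L_tilde v = (2 / bound) * L v - v
        return [2.0 * lv / bound - vi for lv, vi in zip(matvec(lap, v), v)]

    t_prev, out = x, [theta[0] * xi for xi in x]   # T_0 x = x
    if len(theta) == 1:
        return out
    t_curr = scaled(x)                             # T_1 x = L_tilde x
    out = [o + theta[1] * t for o, t in zip(out, t_curr)]
    for k in range(2, len(theta)):
        # T_k x = 2 * L_tilde T_{k-1} x - T_{k-2} x
        t_next = [2.0 * s - p for s, p in zip(scaled(t_curr), t_prev)]
        out = [o + theta[k] * t for o, t in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out

# Toy path graph 0-1-2-3 with a unit impulse at node 0 (illustrative only).
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [1.0, 0.0, 0.0, 0.0]
theta = [0.5, 0.3, 0.2]  # fixed here; these would be learned filter coefficients
uniform = cheb_filter(weighted_laplacian(adj, [1.0] * 4), x, theta)
reweighted = cheb_filter(weighted_laplacian(adj, [1.0, 0.25, 0.25, 1.0]), x, theta)
```

In this toy setup the sparsity pattern of the operator is the same for both weightings, so a degree-2 filter still reaches only two hops from node 0; only the values along existing edges change, which is one way to read "changing the propagation geometry without altering the graph topology".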

2602.17517 2026-05-22 cs.CV

Depth Augmented and FE Free 3D/2D Liver Registration for Laparoscopic Liver AR

Hanyuan Zhang, Lucas He, Runlong He, Weixi Yi, Abdolrahim Kadkhodamohammadi, Danail Stoyanov, Brian R. Davidson, Evangelos B. Mazomenos, Matthew J. Clarkson

Affiliations: UCL Hawkes Institute, University College London, London WC1E 6BT, UK; Division of Surgery and Interventional Science, University College London, London WC1E 6BT, UK; Unit for Lifelong Health and Ageing at UCL, University College London, London WC1E 7HB, UK; Medtronic plc., London, UK

AI summary: This study proposes a depth-augmented, finite-element-free 3D/2D liver registration method that combines robust rigid initialization with patient-specific non-rigid refinement to improve 3D-to-2D registration accuracy for AR in laparoscopic liver surgery.

Abstract

Augmented reality (AR) guidance in laparoscopic liver surgery requires accurate registration of preoperative 3D models to intraoperative 2D video, but remains challenging due to partial visibility, specularities, and tissue deformation. Existing methods often rely on contour-based rigid initialization and finite-element (FE) models for deformable registration, increasing modeling and engineering complexity. We present a depth-augmented, FE-free 3D-2D registration pipeline that combines robust rigid initialization with patient-specific non-rigid refinement. For rigid alignment, we adapt the RefineNet module of FoundationPose to laparoscopic liver scenes by using multi-class contour maps and monocular depth for relative pose refinement. For deformable alignment, we construct a patient-specific statistical deformation model from non-rigid ICP (NICP) correspondences and optimize pose and shape parameters using a coarse-to-fine L-BFGS-B strategy. On a public clinical laparoscopic liver dataset, the proposed method achieves a mean target registration error (TRE) of 14.73 mm under a controlled manual-contour setting designed to isolate registration performance. Ablation studies show that monocular depth improves rigid initialization over contour-only inputs, while tumor-mapping analysis indicates that good surface alignment does not necessarily translate into lower target localization error. On an external dataset without ground truth, the method produces visually plausible overlays for qualitative assessment. These results suggest that depth-augmented pose refinement and FE-free statistical deformation modeling provide a promising alternative to FE-based pipelines for controlled 3D-2D liver registration in surgical AR.
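For readers unfamiliar with the metric quoted above, a mean target registration error is commonly computed as the average Euclidean distance between target points after registration and their reference positions. The sketch below shows that computation under this common definition; the function name and the point coordinates are hypothetical and are not taken from the paper or its dataset.

```python
from math import dist  # available in Python 3.8+

def mean_tre(registered_targets, true_targets):
    """Mean Euclidean distance over paired 3D targets, in the unit of the inputs (e.g. mm)."""
    if len(registered_targets) != len(true_targets) or not true_targets:
        raise ValueError("need two non-empty point lists of equal length")
    errors = [dist(p, q) for p, q in zip(registered_targets, true_targets)]
    return sum(errors) / len(errors)

# Hypothetical target positions in millimetres (not data from the paper).
registered = [(10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
truth = [(13.0, 4.0, 0.0), (0.0, 5.0, 12.0)]
tre = mean_tre(registered, truth)  # per-target errors are 5.0 and 12.0, so the mean is 8.5
```

Because the metric is evaluated at internal targets rather than on the visible surface, a low surface alignment error does not by itself imply a low TRE, which matches the tumor-mapping observation in the abstract.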

2602.17385 2026-05-22 cs.AI

Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

Angelo Porrello, Pietro Buzzega, Felix Dangel, Thomas Sommariva, Riccardo Salami, Lorenzo Bonicelli, Simone Calderara

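Affiliations: University of Modena and Reggio Emilia; Vector Institute

AI summary: This paper proposes a dataless approach that frames regularization against representation drift as a curvature matrix approximation problem, addressing cross-task interference between task vectors in task arithmetic and achieving state-of-the-art results in task addition and negation.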

Comments Accepted to ICLR 2026

Abstract

Task Arithmetic yields a modular, scalable way to adapt foundation models. Combining multiple task vectors, however, can lead to cross-task interference, causing representation drift and degraded performance. Representation drift regularization provides a natural remedy to disentangle task vectors; however, existing approaches typically require external task data, conflicting with modularity and data availability constraints (e.g., privacy requirements). We propose a dataless approach by framing regularization against representation drift as a curvature matrix approximation problem. This allows us to leverage well-established techniques; in particular, we adopt Kronecker-Factored Approximate Curvature and obtain a practical regularizer that achieves state-of-the-art results in task addition and negation. Our method has constant complexity in the number of tasks and promotes robustness to task vector rescaling, eliminating the need for held-out tuning.

2602.13372 2026-05-22 cs.AI cs.LG

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Simon Rosen, Siddarth Singh, Ebenezer Gelo, Helen Sarah Robertson, Ibrahim Suder, Victoria Williams, Benjamin Rosman, Geraud Nangue Tasse, Steven James

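Affiliations: University of the Witwatersrand

AI summary: This paper introduces the MoralityGym benchmark, which represents moral norms as ordered deontic constraints to evaluate hierarchical moral alignment in sequential decision-making agents. It presents 98 ethical-dilemma problems and integrates insights from psychology and philosophy into the evaluation of ethical decision-making.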

Comments Accepted at AAMAS 2026

Journal ref Proc of the 25th International Conference on Autonomous Agents and Multiagent Systems AAMAS 2026, Paphos, Cyprus, May 25 to 29, 2026, IFAAMAS

DOI: 10.65109/SAKL6648

Abstract

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.

2602.12952 2026-05-22 cs.LG cs.AI cs.CV

Transporting Task Vectors across Different Architectures without Training

Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Angelo Porrello, Simone Calderara

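Affiliations: AImageLab, University of Modena and Reggio Emilia

AI summary: This paper proposes Theseus, a method that uses functional matching to transport task updates between models of different widths without training or backpropagation, and shows improvements on vision and language models.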

Comments Accepted at the International Conference on Machine Learning (ICML), 2026

Abstract

Adapting large pre-trained models to downstream tasks often produces task-specific parameter updates that are expensive to relearn for every model variant. While recent work has shown that such updates can be transferred between models with identical architectures, transferring them across models of different widths remains unexplored. In this work, we introduce Theseus, a training-free method for transporting task updates across heterogeneous-width models. Rather than matching parameters, we characterize a task update by the functional effect it induces on intermediate representations. We formalize task-vector transport as a functional matching problem on observed activations and show that, after aligning representation spaces via orthogonal Procrustes analysis, it admits a stable closed-form solution that preserves the geometry of the update. We evaluate Theseus on vision and language models across different widths, showing consistent improvements over baselines without additional training or backpropagation. Our results show that task updates can be meaningfully transferred across architectures when task identity is defined functionally rather than parametrically. Code is available at https://github.com/apanariello4/merge-and-rebase.

2602.12506 2026-05-22 cs.LG

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal

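Affiliations: Apple; OpenAI

AI summary: This paper studies the robustness and chain-of-thought (CoT) consistency of RL-finetuned VLMs on visual reasoning tasks. It finds that textual perturbations and CoT inconsistency substantially reduce robustness and confidence, whereas closed models maintain greater robustness and reasoning consistency, suggesting that the gap stems from shortcomings in current open-source RL finetuning rather than an inherent limitation of the task.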

Comments ICML 2026

Abstract

Reinforcement learning (RL) finetuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision-language models (VLMs). While RL-tuned VLMs improve on visual reasoning benchmarks, they remain vulnerable to weak visual grounding, hallucinations, and over-reliance on textual cues. We show that simple, controlled textual perturbations, including misleading captions or incorrect chain-of-thought (CoT) traces, cause substantial drops in robustness and confidence, and that these effects are more pronounced when CoT consistency is taken into account across open-source multimodal reasoning models. In contrast, closed models exhibit similar failure modes but maintain markedly greater robustness and reasoning consistency, suggesting that the gap reflects a shortcoming in current open-source RL finetuning rather than an inherent limitation of the task. To better understand these vulnerabilities, we further analyze RL finetuning dynamics and uncover an accuracy-faithfulness trade-off: finetuning raises benchmark accuracy, but can simultaneously erode the reliability of the accompanying CoT and its robustness to contextual shifts. Although adversarial augmentation improves robustness, it does not by itself prevent faithfulness drift. Incorporating a faithfulness-aware reward can restore alignment between answers and reasoning, but when paired with augmentation, training risks collapsing onto shortcut strategies and robustness remains elusive. Together, these findings highlight the limitations of accuracy-only evaluations and motivate training and assessment protocols that jointly emphasize correctness, robustness, and the faithfulness of visually grounded reasoning.

2602.10894 2026-05-22 cs.LG cs.AI

Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

Kazuki Ota, Takayuki Osa, Motoki Omura, Tatsuya Harada

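Affiliations: The University of Tokyo, Japan; RIKEN Center for Advanced Intelligence Project, Japan

AI summary: This paper revisits policy optimization with reverse Kullback-Leibler regularization and entropy regularization, analyzing the combination theoretically and empirically in two-player zero-sum settings. It provides new convergence guarantees, verifies them through numerical experiments on synthetic games, derives a practical model-free reinforcement learning algorithm from the regularized policy optimization, and validates its training efficiency through comprehensive experiments on five board games.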

Comments Accepted at ICML 2026

Abstract

Two-player games such as board games have long been used as traditional benchmarks for reinforcement learning. This work revisits a policy optimization method with reverse Kullback-Leibler regularization and entropy regularization and analyzes this combination in two-player zero-sum settings from theoretical and empirical perspectives. From a theoretical perspective, we investigate the stability of the policy update rule in two theoretical settings: game-theoretic normal-form games and finite-length games. We provide novel convergence guarantees and verify our theoretical results through numerical experiments on synthetic games. From an empirical perspective, we derive a practical model-free reinforcement learning algorithm based on the regularized policy optimization. We validate the training efficiency of our algorithm through comprehensive experiments on five board games: Animal Shogi, Gardner Chess, Go, Hex, and Othello. Experimental results show that our agent learns more efficiently than existing methods across environments.
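To make the kind of update discussed in this abstract concrete, the sketch below runs a KL- and entropy-regularized policy update in self-play on rock-paper-scissors, a normal-form zero-sum game. It assumes the per-step objective max_p <p, q> - KL(p || pi_old) / eta + tau * H(p), whose maximizer over the simplex has the closed form p ∝ (pi_old * exp(eta * q))^(1 / (1 + eta * tau)). The objective, step sizes, and function names are illustrative assumptions and are not claimed to match the paper's exact update rule.

```python
from math import exp, log

# Row player's payoff matrix for rock-paper-scissors (the column player receives the negative).
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def regularized_step(pi, q, eta, tau):
    """Closed-form maximizer of <p, q> - KL(p || pi) / eta + tau * H(p) over the simplex."""
    logits = [(log(p) + eta * v) / (1.0 + eta * tau) for p, v in zip(pi, q)]
    m = max(logits)                      # subtract the max for numerical stability
    w = [exp(l - m) for l in logits]
    z = sum(w)
    return [x / z for x in w]

def self_play(p1, p2, eta=0.1, tau=0.5, steps=500):
    """Simultaneous updates for both players against each other's current policy."""
    for _ in range(steps):
        q1 = [sum(A[i][j] * p2[j] for j in range(3)) for i in range(3)]
        q2 = [-sum(A[i][j] * p1[i] for i in range(3)) for j in range(3)]
        p1, p2 = regularized_step(p1, q1, eta, tau), regularized_step(p2, q2, eta, tau)
    return p1, p2

# Start both players far from the uniform equilibrium.
p1, p2 = self_play([0.7, 0.2, 0.1], [0.1, 0.1, 0.8])
```

With these settings both policies settle near the uniform strategy, which is the equilibrium of rock-paper-scissors; without the entropy term (tau = 0) the same simultaneous update is known to cycle rather than settle, which is one motivation for studying the stability of such update rules.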

2602.10085 2026-05-22 cs.AI

CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

Richard Bornemann, Pierluigi Vito Amadori, Antoine Cully

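Affiliations: Imperial College London; Sony Interactive Entertainment

AI summary: This study proposes the CODE-SHARP framework, in which foundation models autonomously discover and evolve skills as hierarchical reward programs. This allows a generalist agent policy to be trained from scratch with reinforcement learning, without pre-defined rewards, and to learn long-horizon skills efficiently.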

Comments Preprint

Abstract

A core quality of general intelligence is the ability to open-endedly expand and evolve its set of mastered skills autonomously. While recent Foundation Model (FM) driven approaches have shown promising results towards this goal, they typically rely on significant human-in-the-loop engineering, limiting their transferability to novel environments. To address this, we introduce Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs (CODE-SHARP), a framework that leverages FMs to open-endedly grow and evolve an archive of Python programs encoding skills to train a generalist agent policy entirely from scratch via reinforcement learning, directly from source code. These programs, termed Skills as Hierarchical Reward Programs (SHARPs), each encode a local success condition and a set of prerequisites delegated to previously discovered SHARPs. At runtime, SHARPs dynamically route the agent through their prerequisite chain based on the current state, rewarding each completion along the way, requiring the agent to learn only the marginal behaviour each new SHARP introduces, enabling efficient learning of long-horizon skills without any pre-defined rewards. On Craftax-Classic and XLand, agents trained fully autonomously by CODE-SHARP outperform previous works by 6x and 2.6x in median performance and are the only agents capable of crafting iron tools and mining diamonds. Scaled to Craftax-Extended, CODE-SHARP trains a generalist agent on over 90 discovered SHARPs, enabling the agent to solve challenging long-horizon tasks zero-shot, matching agents trained on ground-truth rewards.
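The abstract describes each SHARP as a Python program with a local success condition and prerequisites delegated to earlier SHARPs. A minimal sketch of what such a structure might look like follows; the class layout, the dictionary-based state, the skill names, and the unit reward are hypothetical choices for illustration and are not taken from the CODE-SHARP code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, int]  # hypothetical state: inventory item counts

@dataclass
class Sharp:
    name: str
    success: Callable[[State], bool]                   # local success condition
    prerequisites: List["Sharp"] = field(default_factory=list)

    def route(self, state: State) -> Optional["Sharp"]:
        """Return the skill the agent should pursue now, or None if this skill is done."""
        if self.success(state):
            return None
        for prereq in self.prerequisites:
            pending = prereq.route(state)              # delegate to earlier skills
            if pending is not None:
                return pending
        return self                                    # all prerequisites met

def step_reward(skill: Sharp, prev_state: State, state: State) -> float:
    """Reward 1.0 when the skill routed to in prev_state is completed in state."""
    target = skill.route(prev_state)
    return 1.0 if target is not None and target.success(state) else 0.0

# Hypothetical three-skill chain.
collect_wood = Sharp("collect_wood", lambda s: s.get("wood", 0) >= 1)
make_pickaxe = Sharp("make_pickaxe", lambda s: s.get("pickaxe", 0) >= 1, [collect_wood])
mine_stone = Sharp("mine_stone", lambda s: s.get("stone", 0) >= 1, [make_pickaxe])
```

For example, `mine_stone.route({})` returns the `collect_wood` skill, while `mine_stone.route({"pickaxe": 1})` returns `mine_stone` itself. A satisfied prerequisite is never re-expanded, so in this sketch only the new behaviour at the end of the chain remains to be learned.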

2602.10009 2026-05-22 cs.AI cs.HC

Discovering High Level Patterns from Simulation Traces

Sean Memery, Kartic Subr

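Affiliations: University of Edinburgh

AI summary: This paper proposes an unsupervised learning approach based on program synthesis that translates simulation traces into a sparse representation of high-level patterns, improving the ability of large language models to reason about physical systems.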

Abstract

Large Language Models (LLMs) are unable to reliably reason about specific physical systems. Attempts to imbue LLMs with knowledge of the necessary physics concepts have shown great promise, but explainability and validation remain open challenges. An emerging alternative is tooling, where LLMs can query physical simulators and use the resulting simulation traces as context for validation. This approach suffers from poor scalability since simulation traces contain large volumes of fine-grained numerical and semantic data. We show that translating simulation traces to a sparse representation of "high-level" structural patterns leads to more effective interpretation by LLMs. We propose an unsupervised learning scheme to perform this translation, or annotation, via program synthesis. Our learning results in a library of programs that act as pattern detectors which can translate simulation traces to sparse, annotated pattern sequences. The detected patterns may optionally be guided by human experts via string labels (rigid collision, stretching spring, etc.). We show, using a recent physics benchmark, that such annotated representations are more amenable to natural language reasoning about specific physical systems. The synthesized programs serve as transparent, explainable functions that map system states to a sparse and efficient annotation space. As an example application, we show how goals within physical systems that are specified in natural language may be converted to reward programs which are maximized to find solutions.

2602.09851 2026-05-22 cs.LG

CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization

Beicheng Xu, Keyao Ding, Wei Liu, Yupeng Lu, Bin Cui

Affiliations: School of CS & Key Lab of High Confidence Software Technologies (MOE), Peking University, Beijing, China; School of CS & Beijing Key Laboratory of Software-Hardware Cooperative Artificial Intelligence Systems, Peking University, Beijing, China

AI summary: This paper proposes CoFEH, a framework that combines LLM-driven feature engineering (FE) with Bayesian hyperparameter optimization (HPO) for robust end-to-end AutoML. It addresses the rigid search spaces and lack of domain awareness of traditional methods, and introduces a mutual conditioning mechanism that improves the coordination between FE and HPO.

Comments Accepted at KDD 2026. Extended version with full appendices

DOI: 10.1145/3770855.3817664

Abstract

Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which operate within rigid search spaces and lack domain awareness. While Large Language Models (LLMs) offer a promising alternative to generate unbounded operators with semantic reasoning, existing methods focus on isolated subtasks such as feature generation, falling short of free-form FE pipelines. Moreover, they are rarely coupled with hyperparameter optimization (HPO) of the downstream ML model, leading to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (TOT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that adaptively interleaves FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH outperforms both traditional and LLM-based baselines in both standalone FE and joint FE+HPO settings.

2602.08064 2026-05-22 cs.LG cs.AI cs.CL

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

Tianyu Li, Dongchen Han, Zixuan Cao, Haofeng Huang, Mengyu Zhou, Ming Chen, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang, Gao Huang

Affiliations: Leap Lab, Tsinghua University; Qwen Large Model Application Team, Alibaba; Institute for Interdisciplinary Information Sciences, Tsinghua University

AI summary: This paper proposes SiameseNorm, a two-stream architecture that couples Pre-Norm and Post-Norm through shared residual blocks, improving model performance while maintaining training stability across a range of architectures and modalities.

Comments Accepted to ICML 2026; camera-ready version; revised presentation and added additional experimental results

Abstract

The long-standing tension between Pre- and Post-Norm remains an open problem in Transformer architecture, reflecting a fundamental trade-off between training stability and representational capacity. Prior attempts to combine their strengths have made progress, but often show limited robustness across training settings, restricting their broader applicability. We revisit this dilemma, showing that single-stream architectures struggle to reconcile Pre-Norm's stable identity-gradient propagation with Post-Norm's normalization of the main residual path. To address this structural tension, we propose SiameseNorm, a simple yet effective two-stream architecture that remains compatible with Pre-Norm training recipes. SiameseNorm couples Pre-Norm-like and Post-Norm-like streams through shared residual blocks, allowing each residual block to receive optimization signals from both pathways with negligible overhead. Extensive experiments on 400M and 1.3B dense language models, 15B MoE models, Vision Transformers, and Diffusion Transformers show that SiameseNorm consistently improves performance while maintaining strong training stability across architectures and modalities. Code is available at https://github.com/Qwen-Applications/SiameseNorm.

2602.07340 2026-05-22 cs.LG

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

Yonghui Yang, Wenjian Tao, Jilong Liu, Xingyu Zhu, Junfeng Fang, Weibiao Huang, Le Wu, Richang Hong, Tat-Seng Chua

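Affiliations: National University of Singapore; Hefei University of Technology; ST Engineering Ltd., Singapore

AI summary: This paper revisits the robustness of LLM safety alignment from an optimization geometry perspective and proposes ShaPO, a framework that improves safety robustness by enforcing worst-case alignment objectives via selective geometry control over the alignment-critical parameter subspace.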

Abstract

Safety alignment of large language models remains brittle under domain shift and noisy preference supervision. Most existing robust alignment methods focus on uncertainty in alignment data, while overlooking optimization-induced fragility in preference-based objectives. In this work, we revisit robustness for LLM safety alignment from an optimization geometry perspective, and argue that robustness failures cannot be addressed by data-centric methods alone. We propose ShaPO, a geometry-aware preference optimization framework that enforces worst-case alignment objectives via selective geometry control over alignment-critical parameter subspace. By avoiding uniform geometry constraints, ShaPO mitigates the over-regularization that can harm robustness under distribution shift. We instantiate ShaPO at two levels: token-level ShaPO stabilizes likelihood-based surrogate optimization, while reward-level ShaPO enforces reward-consistent optimization under noisy supervision. Across diverse safety benchmarks and noisy preference settings, ShaPO consistently improves safety robustness over popular preference optimization methods. Moreover, ShaPO composes cleanly with data-robust objectives, yielding additional gains and empirically supporting the proposed optimization-geometry perspective. The code is available at https://github.com/liujilong0116/ShaPO.

2602.06995 2026-05-22 cs.RO cs.CV cs.IT cs.MA math.IT

When Simultaneous Localization and Mapping Meets Wireless Communications: A Survey

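Konstantinos Gounis, Sotiris A. Tegos, Dimitrios Tyrovolas, Panagiotis D. Diamantoulakis, George K. Karagiannidis

Affiliations: Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki

AI summary: This paper surveys recent advances at the intersection of SLAM and wireless communications, focusing on their bidirectional impact in visual SLAM (V-SLAM) integration. It covers key concepts in wireless signal propagation, geometric channel modeling, and radio frequency (RF)-based localization and sensing, shows how image processing techniques can detect landmarks and predict optimal paths for wireless channels, and analyzes the techniques, challenges, and future directions of the field.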

Abstract

This paper surveys the state-of-the-art in the nexus of SLAM and Wireless Communications, attributing the bidirectional impact of each with a focus on visual SLAM (V-SLAM) integration. We provide an overview of key concepts related to wireless signal propagation, geometric channel modeling, and radio frequency (RF)-based localization and sensing. In addition to this, we show image processing techniques that can detect landmarks, proactively predicting optimal paths for wireless channels. Several dimensions are considered, including the prerequisites, techniques, background, and future directions and challenges of the intersection between SLAM and wireless communications. We analyze estimation and control approaches such as Bayesian filters, feature-based pose estimation, perception-aware motion control, spatial methods for signal processing such as vector fields, and key technological aspects. We expose techniques and items towards enabling a highly effective retrieval of the autonomous robot state. Among other interesting findings, we observe that monocular V-SLAM would benefit from RF relevant information, as the latter can serve as a proxy for the scale ambiguity resolution. Conversely, we find that wireless communications in the context of 5G and beyond can potentially benefit from visual odometry that is central in SLAM. Moreover, we examine other sources besides the camera for SLAM and describe the twofold relation with wireless communications. Finally, integrated solutions performing joint communications and SLAM appear to be in their infancy: theoretical and practical advancements are required to add higher-level localization and semantic perception capabilities to RF and multi-antenna technologies.

2602.06676 2026-05-22 cs.CV

Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction

Bo Du, Xiaochen Ma, Xuekang Zhu, Zhe Yang, Chaogun Niu, Chenfan Qu, Mingqi Fang, Zhenming Wang, Jingjing Liu, Jian Liu, Ji-Zhe Zhou

Affiliations: Sichuan University; The Hong Kong University of Science and Technology; University of Science and Technology of China; South China University of Technology

AI summary: This paper proposes SICA, a monolithic fake image detection model that uses semantic-induced constrained adaptation to resolve the tension between unity and discriminability in artifact feature space reconstruction; experiments show it outperforms 15 existing methods.

Abstract

Fake Image Detection (FID), aiming at unified detection across four image forensic subdomains, is critical in real-world forensic scenarios. Compared with ensemble approaches, monolithic FID models are theoretically more promising, but to date, consistently yield inferior performance in practice. In this work, we identify the intrinsic distinctness of artifacts across subdomains, a critical barrier we term the "Ji-Zhe phenomenon". Driven by this phenomenon, we diagnose the cause of this underperformance for the first time: the collapse of the artifact feature space. The core challenge for developing a practical monolithic FID model thus boils down to the "unified-yet-discriminative" reconstruction of the artifact feature space. To address this paradoxical challenge, we hypothesize that high-level semantics can serve as a structural prior for the reconstruction, and further propose Semantic-Induced Constrained Adaptation (SICA), the first monolithic FID paradigm. Extensive experiments on our OpenMMSec dataset demonstrate that SICA outperforms 15 state-of-the-art methods and reconstructs the target unified-yet-discriminative artifact feature space in a near-orthogonal manner, thus firmly validating our hypothesis. The code and dataset are available at: https://github.com/venus-guangjian/SICA_OpenMMSec.

2602.05873 2026-05-22 cs.LG

Billion-Scale Graph Foundation Models

Maya Bechler-Speicher, Yoel Gottlieb, Andrey Isakov, David Abensur, Ami Tavory, Daniel Haimovich, Ido Guy, Udi Weinsberg

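Affiliations: Meta

AI summary: This paper presents GraphBFF, an end-to-end recipe for building billion-parameter graph foundation models for large-scale heterogeneous graphs. It introduces the GraphBFF Transformer architecture, presents neural scaling laws for heterogeneous graphs, and reports strong performance across multiple downstream tasks.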

Abstract

Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending this paradigm to general, real-world graphs is challenging. In this work, we present Graph Billion-Foundation-Fusion (GraphBFF): an end-to-end recipe for building billion-parameter Graph Foundation Models (GFMs) for large-scale heterogeneous graphs. Central to the recipe is the GraphBFF Transformer, a flexible and scalable architecture designed for practical billion-scale GFMs. Using the GraphBFF, we present neural scaling laws for heterogeneous graphs and show that loss decreases predictably as either model capacity or training data scales, depending on which factor is the bottleneck. The GraphBFF framework provides concrete methodologies for data batching, pretraining, and fine-tuning for building GFMs at scale. We demonstrate the effectiveness of the framework over a real-world billion-scale graph, with an evaluation of a billion-parameter GraphBFF Transformer following the proposed recipe. Across ten diverse, real-world downstream tasks on graphs unseen during training, spanning node- and link-level classification and regression, GraphBFF consistently outperforms baselines, with large margins of up to 31 PRAUC points, including in few-shot settings. Finally, we discuss key challenges and open opportunities for making GFMs a practical and principled foundation for graph learning at industrial scale.

2602.03784 2026-05-22 cs.CL

Fix the Structural Bottleneck: Context Compression via Explicit Information Transmission

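Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He

Affiliations: King's College London; Tsinghua University; Imperial College London; The Alan Turing Institute

AI summary: This paper revisits context compression from a structural perspective, identifies two key bottlenecks in standard LLM-based compression methods, and proposes ComprExIT, a framework that improves compression through explicit information transmission. Experiments across multiple datasets show higher F1 scores at lower computational cost.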

Abstract

Long-context LLM agents often struggle with growing token, memory, and latency costs, making efficient context compression essential for practical deployment. Existing LLM-as-a-compressor methods remain noticeably inferior to using the full context. We find that this gap partly stems from their inability to preserve contextual information effectively. In this work, we revisit context compression from a structural perspective and identify two key bottlenecks in standard LLM-based compressors: limited coordination among compression tokens during information aggregation, and layerwise dilution that weakens useful signals from intermediate hidden states. To address these limitations, we propose ComprExIT, a new context compression framework based on explicit information transmission. ComprExIT adaptively selects features across frozen LLM layers, then allocates information from anchors to compression slots through a globally coordinated transport plan. Experiments on 12 datasets show that ComprExIT consistently outperforms strong soft-compression baselines, improving average F1 by up to 18.5%, while adding only ~1% trainable parameters and achieving more than 2x faster compression than the fastest baselines. The code will be released upon acceptance.

2602.02709 2026-05-22 cs.AI

Linear Dynamics in the RLVR Training of Large Language Models

Tianle Wang, Jiayu Liu, Zhongyuan Wu, Shenghao Jin, Wei Chen, Hao Xu, Ning Miao

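Affiliations: Department of Data Science, City University of Hong Kong; Hong Kong Institute of AI for Science, City University of Hong Kong; Li Auto Inc.; Beihang University

AI summary: This paper studies the internal dynamics of reinforcement learning with verifiable rewards (RLVR) in large language model training. It finds that RLVR enters a linear regime across a range of models and training configurations, and shows through experiments and theoretical analysis that this linearity stems from the high-variance, noisy nature of the training signal and is both predictive and practically useful.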

Comments Major revision: substantially reorganized the manuscript and added a theoretical explanation section. The replacement is intended for the same arXiv paper; the core topic and contribution remain the same

Abstract

Reinforcement learning with verifiable rewards (RLVR) has driven significant performance gains in reasoning-oriented large language models (LLMs), yet its internal training dynamics remain largely a black box. In this work, we perform a comprehensive trajectory-level analysis of RLVR and uncover a striking regularity: across various model families, RL algorithms, and training configurations, RLVR consistently enters a robust linear regime, where both parameter weights and output log-probabilities, measured rigorously via teacher-forced evaluation, evolve in a highly linear manner ($R^2 > 0.7$). Through controlled experiments and theoretical analysis, we demonstrate that this linearity is not a coincidence, but stems from the high-variance, noisy nature of RLVR training signals, which act as a low-pass filter to concentrate optimization along a stable, low-dimensional drift. Moreover, we show that this linear structure is not merely descriptive but powerfully predictive and actionable. Specifically, weight-space extrapolation matches the performance of standard RL optimization while achieving a 6.1x training speedup through periodic re-grounding. Meanwhile, output-space extrapolation serves as a lightweight intervention that effectively bypasses late-stage model collapse, consistently outperforming standard RL across mathematical and coding benchmarks, with an average performance improvement of 4.2%. Our code is available at https://github.com/Miaow-Lab/RLVR-Linearity.

Prior Knowledge-enhanced Spatio-temporal Epidemic Forecasting

FLIM Networks with Bag of Feature Points

Geometry-Induced Diffusion on Graphs: A Learnable Weighted Laplacian for Spectral GNNs

Depth Augmented and FE Free 3D/2D Liver Registration for Laparoscopic Liver AR

Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Transporting Task Vectors across Different Architectures without Training

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

Discovering High Level Patterns from Simulation Traces

CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

When Simultaneous Localization and Mapping Meets Wireless Communications: A Survey

Can We Build a Monolithic Model for Fake Image Detection? SICA: Semantic-Induced Constrained Adaptation for Unified-Yet-Discriminative Artifact Feature Space Reconstruction

Large-scale Score-based Variational Posterior Inference for Bayesian Deep Neural Networks

When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging

A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms

Billion-Scale Graph Foundation Models

Fix the Structural Bottleneck: Context Compression via Explicit Information Transmission

ATLAS: A Multi-LLM Training Framework for EvoDPO with Adaptive Reference Evolution

Unifying Masked Diffusion Models with Various Generation Orders and Beyond

What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom

Provably Protecting Fine-Tuned LLMs from Training Data Extraction while Preserving Utility

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Hyperparameter Transfer with Mixture-of-Expert Layers

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval

UIKA: Fast Universal Head Avatar from Pose-Free Images

Linear Dynamics in the RLVR Training of Large Language Models