arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.01311 2026-05-29 cs.CL

Catalyst-Agent: Autonomous heterogeneous catalyst screening with an LLM Agent

Achuth Chandrasekhar, Janghoon Ock, Amir Barati Farimani

Affiliations: Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA

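AI summary: Proposes Catalyst-Agent, an LLM-powered AI agent built on MCP servers that explores materials databases through the OPTIMADE API and computes adsorption energies with the UMA model, enabling closed-loop autonomous catalyst screening with a 33-41% success rate on the ORR, NRR, and CO2RR reactions.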

Abstract

The discovery of novel catalysts tailored for particular applications is a major challenge for the twenty-first century. Traditional methods for this include time-consuming and expensive experimental trial-and-error approaches in labs based on chemical theory or heavily computational first-principles approaches based on density functional theory. Recent studies show that deep learning models like graph neural networks (GNNs) can significantly speed up the screening of catalyst materials by many orders of magnitude, with very high accuracy and fidelity. In this work, we introduce Catalyst-Agent, a Model Context Protocol (MCP) server-based, LLM-powered AI agent. It can explore vast material databases using the OPTIMADE API, make structural modifications, calculate adsorption energies using Meta FAIRchem's UMA (GNN) model via FAIRchem's AdsorbML workflow and slab construction, and make useful material suggestions to the researcher in a closed-loop manner, including structural modifications to refine near-miss candidates. It is tested on three pivotal reactions: the oxygen reduction reaction (ORR), the nitrogen reduction reaction (NRR), and the CO2 reduction reaction (CO2RR). Catalyst-Agent achieves a success rate of 33-41% among all the materials it chooses and evaluates, and manages to converge in 1-4 trials per successful material on average. This work demonstrates the potential of AI agents to exercise their planning capabilities and tool use for autonomous catalyst screening workflows.
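
The screening loop described above ultimately ranks candidates by adsorption energy. As a minimal sketch, the conventional definition subtracts the clean-slab and isolated-adsorbate reference energies from the energy of the combined system. The function names, the target window, and the numbers below are illustrative assumptions; they are not Catalyst-Agent's actual tools, and FAIRchem's referencing scheme may differ.

```python
def adsorption_energy(e_slab_with_adsorbate: float, e_slab: float, e_adsorbate: float) -> float:
    """Conventional adsorption energy in eV: E(slab+ads) - E(slab) - E(ads)."""
    return e_slab_with_adsorbate - e_slab - e_adsorbate


def is_hit(e_ads: float, target: float, tolerance: float) -> bool:
    """Hypothetical success criterion: the energy lies within a window around a target."""
    return abs(e_ads - target) <= tolerance


# Illustrative energies in eV (made up for this example, not taken from the paper).
e_ads = adsorption_energy(-105.2, -100.0, -4.5)
hit = is_hit(e_ads, target=-0.5, tolerance=0.3)
```

A near-miss candidate in this picture is one whose energy falls just outside the window, which is where the abstract's structural modifications would come in.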

2602.23258 2026-05-29 cs.AI cs.CL

AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Yutong Wang, Siyuan Xiong, Xuebo Liu, Wenkang Zhou, Liang Ding, Miao Zhang, Min Zhang

Affiliations: Harbin Institute of Technology, Shenzhen; Alibaba Group

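AI summary: Proposes AgentDropoutV2, a test-time framework that uses a retrieval-augmented rectifier to correct errors and prunes irreparable outputs, dynamically optimizing information flow in multi-agent systems and significantly improving performance on math and code benchmarks.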

Abstract

While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information from individual agents. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their adaptability. We propose AgentDropoutV2 (ADv2), a test-time rectify-or-reject pruning framework that dynamically optimizes MAS information flow. Acting as an active firewall, ADv2 intercepts agent outputs and employs a retrieval-augmented rectifier to iteratively correct errors. This rectification is guided by an indicator pool, which is constructed offline by distilling error patterns from historical MAS failure trajectories. Irreparable outputs are subsequently pruned to prevent error propagation. Empirical results demonstrate that ADv2 significantly boosts performance on both fixed and dynamic MAS frameworks, achieving average accuracy gains of 6.39 and 2.28 percentage points on extensive math and code benchmarks, respectively. Furthermore, ADv2 exhibits remarkable adaptivity, dynamically modulating rectification efforts based on task difficulty to resolve a wide spectrum of error patterns. Our code is released at https://github.com/TonySY2/AgentDropoutV2.
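
The rectify-or-reject control flow can be sketched roughly as follows. This is only an illustration of the loop the abstract describes: `find_error` stands in for retrieval against the indicator pool and `rectify` for the rectifier call, and both names, their signatures, and the `max_rounds` budget are assumptions rather than the released ADv2 API.

```python
from typing import Callable, Optional


def rectify_or_reject(
    output: str,
    find_error: Callable[[str], Optional[str]],
    rectify: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a repaired output, or None if it is still faulty after max_rounds."""
    for _ in range(max_rounds):
        hint = find_error(output)  # hypothetical: matched error indicator, or None if clean
        if hint is None:
            return output  # passes the check, forward to downstream agents
        output = rectify(output, hint)  # hypothetical: one rectification step
    # Budget exhausted: keep the output only if the last repair succeeded, else prune it.
    return output if find_error(output) is None else None


# Toy stand-ins: an answer is "faulty" until it carries a unit.
def needs_unit(text: str) -> Optional[str]:
    return None if text.endswith(" m") else "missing unit"


repaired = rectify_or_reject("42", needs_unit, lambda text, hint: text + " m")
pruned = rectify_or_reject("42", needs_unit, lambda text, hint: text)
```

Returning `None` for a pruned message lets the caller drop it from the information flow instead of passing a known-bad output along.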

2602.21565 2026-05-29 cs.LG

Routing by Reaching: Composition of Pre-trained GFlowNets for Multi-Objective Generation

Seokwon Yoon, Youngbin Choi, Seunghyuk Cho, Seungbeom Lee, MoonJeong Park, Dongwoo Kim

Affiliations: Department of Computer Science & Engineering, POSTECH, South Korea; Graduate School of Artificial Intelligence, POSTECH, South Korea

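AI summary: Proposes a framework that composes pre-trained GFlowNets at inference time, adapting quickly to multi-objective generation tasks without fine-tuning or retraining; proves exact recovery of the target distribution under linear scalarization and quantifies approximation quality for nonlinear operators through a distortion factor.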

Comments Appears in the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Generative Flow Networks (GFlowNets) learn to sample diverse candidates in proportion to a reward function, making them well-suited for scientific discovery, where exploring multiple promising solutions is crucial. Further extending GFlowNets to multi-objective settings has attracted growing interest as real-world applications often involve multiple, conflicting objectives. However, existing approaches require joint training for each combination of objectives, meaning that any change in the objective set necessitates retraining from scratch. We propose a framework that composes pre-trained GFlowNets at inference time, enabling rapid adaptation without fine-tuning or retraining. Importantly, our framework is flexible, capable of handling diverse reward combinations ranging from linear scalarization to complex nonlinear operators, which are often handled separately in previous literature. We prove that our method exactly recovers the target distribution for linear scalarization, and quantify the approximation quality for nonlinear operators through a distortion factor. Experiments on a synthetic 2D grid and real-world molecule generation tasks demonstrate that our approach achieves performance comparable to baselines.
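
One way to see why exact recovery is plausible for linear scalarization: if each pre-trained sampler draws from a distribution proportional to its own reward R_i with partition function Z_i, then mixing those distributions with coefficients proportional to w_i * Z_i yields exactly the distribution proportional to sum_i w_i * R_i. The toy check below verifies this identity on a four-element space. It is a sketch of the distribution-level argument only, under the assumption that the Z_i are known; it is not the paper's routing procedure, and all names are illustrative.

```python
def normalize(reward):
    """Return the distribution proportional to `reward` and its partition function."""
    z = sum(reward)
    return [r / z for r in reward], z


def mixture_of_experts(rewards, weights):
    """Mix per-objective distributions with coefficients proportional to w_i * Z_i."""
    dists, zs = zip(*(normalize(r) for r in rewards))
    coeffs = [w * z for w, z in zip(weights, zs)]
    total = sum(coeffs)
    return [
        sum(c / total * d[x] for c, d in zip(coeffs, dists))
        for x in range(len(rewards[0]))
    ]


def scalarized_target(rewards, weights):
    """Distribution proportional to the linear scalarization sum_i w_i * R_i(x)."""
    combined = [
        sum(w * r[x] for w, r in zip(weights, rewards))
        for x in range(len(rewards[0]))
    ]
    return normalize(combined)[0]


rewards = [[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 1.0, 2.0]]  # toy rewards over four objects
weights = [0.3, 0.7]
mixed = mixture_of_experts(rewards, weights)
target = scalarized_target(rewards, weights)
```

No such closed-form mixture exists for a general nonlinear combination of rewards, which is consistent with the abstract offering only an approximation bound in that case.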

2602.20141 2026-05-29 cs.AI

Recurrent Structural Policy Gradient for Partially Observable Mean Field Games

Clarisse Wibault, Johannes Forkel, Sebastian Towers, Tiphaine Wibault, Juan Duque, George Whittle, Andreas Schaab, Yucheng Yang, Chiyuan Wang, Maike Osborne, Benjamin Moll, Jakob Foerster

Affiliations: FLAIR, University of Oxford; MLRG, University of Oxford; ifo Institute, LMU Munich; Mila, Québec AI Institute; University of Zurich; Peking University; London School of Economics

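AI summary: For partially observable mean field games, proposes RSPG, the first history-aware hybrid structural method, which computes expected returns by exploiting low-dimensional state and action spaces and known transition dynamics, converging an order of magnitude faster than model-free RL methods.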

Abstract

Mean Field Games (MFGs) provide a principled framework for modelling interactions in large population systems. However, algorithmic progress has been limited since model-free methods are high variance and exact methods scale poorly. Recent Hybrid Structural Methods (HSMs) reduce variance while maintaining tractability by leveraging low-dimensional individual state and action spaces and known transition dynamics to compute the exact expected return conditioned on Monte Carlo rollouts of common noise. However, HSMs have not been extended to partially observable settings. We propose Recurrent Structural Policy Gradient (RSPG), the first history-aware HSM for MFGs with public partial information. RSPG achieves an order-of-magnitude faster convergence than model-free RL methods while learning history-aware behaviour, unlike current HSMs. To facilitate research into MFGs, we also introduce MFAX, our JAX-based framework for MFGs that supports both analytic and sample-based mean-field updates. MFAX and usage examples can be found at https://clarisse-wibault.github.io/rspg/.

2602.18527 2026-05-29 cs.CV cs.AI cs.SD

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

Zhan Liu, Changli Tang, Yuxin Wang, Zhiyuan Zhu, Youjun Chen, Yiwen Shao, Tianzi Wang, Lei Ke, Zengrui Jin, Chao Zhang

Affiliations: Tsinghua University; Zhejiang University; The Chinese University of Hong Kong; Tencent AI Lab

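AI summary: Proposes JAEGER, a framework that extends audio-visual large language models to 3D space by integrating RGB-D observations with multi-channel first-order ambisonics for joint spatial grounding and reasoning, and introduces the neural intensity vector (Neural IV) to make direction-of-arrival estimation more robust.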

Comments Accepted to ICML 2026

Abstract

Current audio-visual large language models (AV-LLMs) are predominantly restricted to 2D perception, relying on RGB video and monaural audio. This design choice introduces a fundamental dimensionality mismatch that precludes reliable source localization and spatial reasoning in complex 3D environments. We address this limitation by presenting JAEGER, a framework that extends AV-LLMs to 3D space, to enable joint spatial grounding and reasoning through the integration of RGB-D observations and multi-channel first-order ambisonics. A core contribution of our work is the neural intensity vector (Neural IV), a learned spatial audio representation that encodes robust directional cues to enhance direction-of-arrival estimation, even in adverse acoustic scenarios with overlapping sources. To facilitate large-scale training and systematic evaluation, we propose SpatialSceneQA, a benchmark of 61k instruction-tuning samples curated from simulated physical environments. Extensive experiments demonstrate that our approach consistently surpasses 2D-centric baselines across diverse spatial perception and reasoning tasks, underscoring the necessity of explicit 3D modelling for advancing AI in physical environments. Our source code, pre-trained model checkpoints, and datasets are available at https://github.com/liuzhan22/JAEGER.
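
For context on the name, the classical (non-learned) intensity vector estimates direction of arrival from first-order ambisonics by correlating the omnidirectional channel W with the directional channels X and Y. The sketch below encodes a single plane-wave source and recovers its azimuth. It illustrates only the hand-crafted baseline idea, not the learned Neural IV; channel gain normalization is omitted, and the function names are assumptions.

```python
import math


def encode_foa(signal, azimuth_rad, elevation_rad=0.0):
    """Encode a mono plane wave into first-order ambisonic channels (W, X, Y, Z)."""
    cx = math.cos(azimuth_rad) * math.cos(elevation_rad)
    cy = math.sin(azimuth_rad) * math.cos(elevation_rad)
    cz = math.sin(elevation_rad)
    w = list(signal)  # omnidirectional channel (gain convention omitted)
    return w, [s * cx for s in signal], [s * cy for s in signal], [s * cz for s in signal]


def intensity_azimuth(w, x, y):
    """Azimuth of the time-averaged intensity vector (sum of W*X, sum of W*Y)."""
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    return math.atan2(iy, ix)


signal = [math.sin(0.1 * n) for n in range(200)]  # toy source signal
w, x, y, z = encode_foa(signal, math.radians(60.0))
estimate_deg = math.degrees(intensity_azimuth(w, x, y))
```

With several overlapping sources the averaged vector points somewhere between them, which is the adverse case the abstract says a learned representation handles more robustly.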

2602.18196 2026-05-29 cs.LG

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

Xiuying Wei, Caglar Gulcehre

Affiliations: CLAIRE lab at EPFL, Lausanne, Switzerland

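AI summary: Proposes RAT+, an architecture that combines dense pretraining with recurrence-augmented attention so that a model can be switched flexibly to sparse dilated attention at inference time, sharply reducing compute and cache overhead while preserving accuracy.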

Comments Accepted by ICML2026

Abstract

Structured dilated attention has an appealing inference-time efficiency knob: it reduces the FLOPs of attention and the KV cache size by a factor of the dilation size D, while preserving long-range connectivity. While prior work studies it by training each configuration from scratch, directly sparsifying a pretrained attention model into a dilated pattern leads to severe accuracy degradation, preventing flexible reuse across inference scenarios. We introduce RAT+, a dense-pretraining architecture that augments attention with full-sequence recurrence and active recurrence learning. A single RAT+ model is pretrained densely once and can then be flexibly switched at inference time to dilated attention (optionally with local windows) or hybrid layer/head compositions, requiring only a short 1B-token resolution adaptation rather than retraining separate sparse models. At 1.5B parameters trained on 100B tokens, RAT+ closely matches dense accuracy at D = 16, and drops by about 2-3 points at D = 64 on commonsense reasoning and LongBench tasks. We further scale to 2.6B and 7.6B parameters and observe even more promising performance (e.g., a 1-point average accuracy loss with a 64x reduction in attention FLOPs and KV cache size). Code is available at https://github.com/wimh966/rat-plus.
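
To make the factor-of-D saving concrete, the sketch below counts how many key/value entries a query at position t reads under dense attention versus a stride-D pattern. The specific layout (keeping the last position of every D-token chunk) is an assumption chosen for illustration; the abstract does not specify which positions RAT+ retains or how recurrence summarizes the skipped tokens.

```python
def dense_key_positions(t):
    """All positions a causal dense-attention query at position t can read."""
    return list(range(t + 1))


def dilated_key_positions(t, dilation):
    """Positions <= t on a stride-`dilation` grid (assumed: last token of each chunk)."""
    return [p for p in range(t + 1) if (p + 1) % dilation == 0]


t, dilation = 63, 16
dense = dense_key_positions(t)
sparse = dilated_key_positions(t, dilation)
reduction = len(dense) / len(sparse)
```

Because both attention FLOPs and KV cache size scale with the number of keys read, the same ratio applies to both under this counting.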

2602.16610 2026-05-29 cs.CL cs.AI cs.LG

Who can we trust? LLM-as-a-jury for Comparative Assessment

Mengjie Qian, Guangzhi Sun, Mark J. F. Gales, Kate M. Knill

Affiliations: Department of Engineering, University of Cambridge, UK

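AI summary: To address inconsistent judgements and varying reliability when LLMs act as evaluators, proposes BT-sigma, a model that introduces a discriminator parameter to jointly infer item rankings and judge reliability, outperforming averaging-based aggregation.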

Comments Accepted to ICML 2026

Abstract

Large language models (LLMs) are increasingly applied as automatic evaluators for natural language generation assessment, often using pairwise comparative judgements. Existing approaches typically rely on single judges or aggregate multiple judges assuming equal reliability. In practice, LLM judges vary substantially in performance across tasks and evaluation aspects, and their judgement probabilities may be biased and inconsistent. Furthermore, human-labelled supervision for judge calibration may be unavailable. We first empirically demonstrate that inconsistencies in LLM comparison probabilities exist and show that they limit the effectiveness of direct probability-based ranking. To address this, we study the LLM-as-a-jury setting and propose BT-sigma, a judge-aware extension of the Bradley-Terry model that introduces a discriminator parameter for each judge to jointly infer item rankings and judge reliability from pairwise comparisons alone. Experiments on benchmark NLG evaluation datasets show that BT-sigma consistently outperforms averaging-based aggregation methods, and that the learned discriminators strongly correlate with independent measures of the cycle consistency of LLM judgements. Further analysis reveals that BT-sigma can be interpreted as an unsupervised calibration mechanism that improves aggregation by modelling judge reliability.
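
A judge-aware Bradley-Terry model can be sketched as follows. The abstract does not give the exact parameterization, so the form below, in which each judge k scales the skill difference by its own sigma_k, is an assumption made for illustration, as are the function names and toy data.

```python
import math


def win_probability(score_i, score_j, sigma):
    """Assumed form: P(i beats j | judge) = sigmoid((s_i - s_j) / sigma_judge)."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j) / sigma))


def negative_log_likelihood(scores, sigmas, comparisons):
    """Cross-entropy between judge probabilities and model probabilities.

    comparisons: tuples (judge, i, j, p), where p is the judge's stated
    probability that item i is better than item j.
    """
    nll = 0.0
    for judge, i, j, p in comparisons:
        q = win_probability(scores[i], scores[j], sigmas[judge])
        nll -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return nll


# Toy data: judge 0 is confident, judge 1 is close to a coin flip.
comparisons = [(0, 0, 1, 0.9), (1, 0, 1, 0.6)]
sigmas = [1.0, 4.0]
nll_correct_order = negative_log_likelihood([1.0, 0.0], sigmas, comparisons)
nll_reversed_order = negative_log_likelihood([0.0, 1.0], sigmas, comparisons)
```

Under this form, a large sigma pulls a judge's predicted probabilities toward 0.5, so that judge's comparisons contribute less to the inferred ranking. Minimizing the loss jointly over scores and sigmas needs no human labels, which matches the unsupervised-calibration reading in the abstract.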

2602.16449 2026-05-29 cs.LG cs.AI stat.ML

GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation

Nicolas Salvy, Hugues Talbot, Bertrand Thirion

Affiliations: Inria, Palaiseau, France

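AI summary: To address the hubness phenomenon in the high-dimensional embedding spaces used for generative model evaluation, proposes GICDM, a method based on the Iterative Contextual Dissimilarity Measure that corrects neighborhood estimation with a multi-scale extension, restoring reliable metrics and improving alignment with human assessment.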

Comments Forty-third International Conference on Machine Learning, 2026

Abstract

Generative model evaluation commonly relies on high-dimensional embedding spaces to compute distances between samples. We show that dataset representations in these spaces are affected by the hubness phenomenon, which distorts nearest-neighbor relationships and biases distance-based metrics. Building on the classical Iterative Contextual Dissimilarity Measure (ICDM), we introduce Generative ICDM (GICDM), a method to correct neighborhood estimation for both real and generated data. We introduce a multi-scale extension to improve empirical behavior. Extensive experiments on synthetic and real benchmarks demonstrate that GICDM resolves hubness-induced failures, restores reliable metric behavior, and improves alignment with human assessment.
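
Hubness is commonly diagnosed through the k-occurrence count, that is, how often each point appears among the k nearest neighbors of the others; a strongly right-skewed count distribution signals hubs. The sketch below computes that standard diagnostic on a toy point set. It is not an implementation of ICDM or GICDM, and the helper names are illustrative.

```python
def k_occurrences(points, k):
    """Count how often each point is among the k nearest neighbors of the others."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Squared Euclidean distances to all other points; ties break by index.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; positive values indicate a right-skewed distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# A central point surrounded by four others acts as a hub for k = 1.
points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
counts = k_occurrences(points, k=1)
hubness_score = skewness(counts)
```

Nearest-neighbor-based evaluation metrics inherit this distortion, since a few hub points end up dominating the neighbor lists.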

2602.15382 2026-05-29 cs.CL cs.CV cs.LG

The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems

Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He, Feijie Wu, Hoin Jung, Matt Fredrikson, Xiaoqian Wang, Jing Gao

Affiliations: Purdue University; Contextual AI; Carnegie Mellon University; Georgia Institute of Technology

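AI summary: Proposes the Vision Wormhole framework, in which a universal visual codec maps reasoning traces into a shared continuous space to enable latent state transfer between heterogeneous VLMs without per-pair translators, reducing alignment complexity and improving efficiency.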

Comments Preprint. Work in progress

Abstract

Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain bottlenecked by discrete text communication, which imposes runtime overhead and information quantization loss. While latent state transfer offers an alternative, existing approaches either assume homogeneous sender-receiver architectures or rely on pair-specific learned translators, limiting scalability across diverse model families with disjoint manifolds. We reconceptualize the visual interface of Vision-Language Models (VLMs), trained for natural images, as a continuous communication channel between heterogeneous agents, and instantiate this idea as the Vision Wormhole: a Universal Visual Codec maps reasoning traces into a shared continuous reference space and injects them into the receiver's visual pathway, yielding cross-architecture latent state transfer without per-pair translators. The framework adopts a hub-and-spoke topology that reduces alignment complexity from $O(N^2)$ to $O(N)$, and is trained by label-free teacher-student distillation against the text channel, requiring no parallel hidden-state supervision. Extensive experiments across heterogeneous VLM families (Qwen-VL, Gemma, SmolVLM2, LFM2.5-VL) and nine reasoning benchmarks show that the Vision Wormhole reduces end-to-end wall-clock time across most evaluated settings and yields positive macro-average $\Delta$-accuracy.
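
The complexity claim can be illustrated by counting learned components. The sketch assumes one directed translator per ordered sender-receiver pair in the pairwise scheme, and one encoder plus one decoder per model in the hub-and-spoke scheme; the factor of two and the function names are assumptions, not details from the paper.

```python
def pairwise_translators(n_models: int) -> int:
    """One directed translator per ordered (sender, receiver) pair: N * (N - 1)."""
    return n_models * (n_models - 1)


def hub_and_spoke_adapters(n_models: int) -> int:
    """Assumed: each model needs one encoder into and one decoder out of the shared space."""
    return 2 * n_models


component_counts = {
    n: (pairwise_translators(n), hub_and_spoke_adapters(n)) for n in (4, 10, 50)
}
```

Adding a new model family then costs a constant number of adapters instead of one translator per existing model in each direction.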

2602.15239 2026-05-29 cs.LG

Size Transferability of Graph Transformers with Convolutional Positional Encodings

Javier Porras-Valenzuela, Zhiyang Wang, Xiaotao Shang, Yusu Wang, Alejandro Ribeiro

Affiliations: Department of Electrical and Systems Engineering; University of Pennsylvania; University of California San Diego

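AI summary: Uses GNN positional encodings to connect graph Transformers with manifold neural networks, proves that models trained on small graphs generalize to larger graphs, and validates scalability on standard benchmarks and a real-world shortest-path estimation task over terrains.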

Abstract

Transformers have achieved remarkable success across domains, motivating the rise of Graph Transformers (GTs) as attention-based architectures for graph-structured data. A key design choice in GTs is the use of Graph Neural Network (GNN)-based positional encodings to incorporate structural information. In this work, we study GTs through the lens of manifold limit models for graph sequences and establish a theoretical connection between GTs with GNN positional encodings and Manifold Neural Networks (MNNs). Building on transferability results for GNNs under manifold convergence, we show that GTs inherit transferability guarantees from their positional encodings. In particular, GTs trained on small graphs provably generalize to larger graphs under mild assumptions. We complement our theory with extensive experiments on standard graph benchmarks, demonstrating that GTs exhibit scalable behavior on par with GNNs. To illustrate the efficiency of transferable GTs in a real-world scenario, we implement GTs for shortest path distance estimation over terrains. Our results provide new insights into GTs and suggest practical directions for training them efficiently in large-scale settings.

2602.12304 2026-05-29 cs.SD cs.AI cs.MM eess.AS

OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

Maomao Li, Zhen Li, Kaipeng Zhang, Guosheng Yin, Zhifeng Li, Dong Xu

Affiliations: The University of Hong Kong; Shanda AI Research Tokyo; XIntelligence Technology Co., Limited

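AI summary: Proposes OmniCustom, a DiT-based zero-shot audio-video customization framework that uses a reference image and reference audio to generate synchronized video preserving identity and timbre, with the spoken content specified by text.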

Comments code: https://github.com/OmniCustom-project/OmniCustom

Abstract

Existing mainstream video customization methods focus on generating identity-consistent videos based on given reference images and textual prompts. Benefiting from the rapid advancement of joint audio-video generation, this paper proposes a more compelling new task: sync audio-video customization, which aims to synchronously customize both video identity and audio timbre. Specifically, given a reference image $I^{r}$ and a reference audio $A^{r}$, this novel task requires generating videos that maintain the identity of the reference image while imitating the timbre of the reference audio, with spoken content freely specifiable through user-provided textual prompts. To this end, we propose OmniCustom, a powerful DiT-based audio-video customization framework that can synthesize a video following reference image identity, audio timbre, and text prompts all at once in a zero-shot manner. Our framework is built on three key contributions. First, identity and audio timbre control are achieved through separate reference identity and audio LoRA modules that operate through self-attention layers within the base audio-video generation model. Second, we introduce a contrastive learning objective alongside the standard flow matching objective. It uses predicted flows conditioned on reference inputs as positive examples and those without reference conditions as negative examples, thereby enhancing the model's ability to preserve identity and timbre. Third, we train OmniCustom on our constructed large-scale, high-quality audio-visual human dataset. Extensive experiments demonstrate that OmniCustom outperforms existing methods in generating audio-video content with consistent identity and timbre fidelity. Project page: https://omnicustom-project.github.io/page/.

2602.11171 2026-05-29 cs.CL cs.AI

A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search

Baek Seong-Eun, Lee Jung-Mok, Kim Sung-Bin, Tae-Hyun Oh

Affiliations: Grad. School of AI, POSTECH, Pohang, Korea; School of EE, KAIST, Daejeon, Korea; School of Computing, KAIST, Daejeon, Korea

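AI summary: Proposes a Bayesian optimization framework that draws on the domain knowledge of pre-trained LLMs, mapping hyperparameters into a continuous space through language prompts and using subset-based proxy training and evaluation; in about 30 iterations it finds LoRA hyperparameters that improve performance by more than 20% over standard hyperparameters.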

Comments Accepted at ICML 2026

Abstract

Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) offers a resource-efficient way to personalize or specialize them. However, LoRA is highly sensitive to hyperparameter choices, and exhaustive hyperparameter search is computationally expensive. To address this, we propose a Bayesian Optimization (BO) framework that leverages the domain knowledge of pre-trained LLMs to efficiently search for LoRA hyperparameters. Our approach repurposes a pre-trained LLM as a discrete-to-continuous mapping module to link hyperparameters and their domain knowledge to a continuous vector space, where BO is conducted. We design and control the mapping via language prompting, providing a domain-aware textual prompt that describes the relationships among hyperparameters and their respective roles. This allows us to explicitly inject domain knowledge about LoRA into the LLM in natural language. We also introduce an additional learnable token to capture residual information that is difficult to describe linguistically in the prompt. This helps BO sample more high-performing hyperparameters. In addition, by leveraging the strong correlation observed between the performance obtained from full and subset training datasets in LoRA training regimes, we introduce proxy training and evaluation using a data subset. This significantly improves the efficiency of our method. We demonstrate that the hyperparameters our method discovers in only about 30 iterations achieve more than a 20% performance improvement over standard hyperparameters found from about 45,000 combinations. Project page: https://baekseongeun.github.io/lora-bo/

2602.11065 2026-05-29 cs.CL cs.AI

S-MARC: Causal Streaming Reasoning for Full-Duplex Conversational Behavior Modeling

Dingkun Zhou, Shuchang Pan, Jiachen Lian, Siddharth Banerjee, Sarika Pasumarthy, Dhruv Hebbar, Siddhant Patel, Zeyi Austin Li, Kan Jen Cheng, Sanay Bordia, Krish Patel, Akshaj Gupta, Tingle Li, Gopala Anumanchipalli

Affiliations: University of California, Berkeley; Zhejiang University; South China University of Technology

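AI summary: Proposes S-MARC, a streaming, causal, and hierarchical framework that models the intent-to-action pathway to predict high-level communicative functions and low-level interaction behaviors, and builds a high-quality corpus, achieving robust behavior detection and interpretable reasoning in full-duplex conversation.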

Abstract

Human conversation is organized by an implicit chain of thought and manifests as temporally structured conversational behaviors. Capturing this perceptual pathway is critical for building natural full-duplex interactive systems. We propose S-MARC (Streaming Causal Modeling and Reasoning for Conversation), a streaming, causal, and hierarchical framework for conversational behavior modeling and reasoning. By formalizing the intent-to-action pathway, S-MARC predicts high-level communicative functions and low-level interaction behaviors while modeling their causal and temporal dependencies. To support this setting, we construct a high-quality corpus that pairs controllable, event-rich duplex dialogue data with behavior labels. S-MARC organizes streaming predictions into a continuously evolving graph structure, generates concise justifications for its decisions, and dynamically optimizes its reasoning process. Experiments on synthetic and real duplex dialogues show that S-MARC achieves robust behavior detection, produces interpretable reasoning chains, and establishes a benchmark foundation for conversational reasoning in full-duplex spoken dialogue systems.

2602.10637 2026-05-29 cs.LG cond-mat.stat-mech physics.chem-ph stat.ML

Coarse-Grained Boltzmann Generators

Weilong Chen, Bojun Zhao, Jan Eckwert, Julija Zavadlav

Affiliations: Professorship of Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Germany; Atomistic Modeling Center, Munich Data Science Institute, Technical University of Munich, Germany

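AI summary: Proposes Coarse-Grained Boltzmann Generators (CG-BGs), a framework that combines flow-based generative models with importance sampling and reweights samples with a learned potential of mean force (PMF), enabling equilibrium sampling of large molecular systems at reduced computational cost.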

Comments Accepted at ICML 2026

Abstract

Sampling equilibrium molecular configurations from the Boltzmann distribution is a longstanding challenge. Boltzmann Generators (BGs) address this by combining exact-likelihood generative models with importance sampling, but practical scalability is limited. Meanwhile, coarse-grained surrogates enable the modeling of larger systems by reducing effective dimensionality, yet often lack a reweighting procedure required to ensure asymptotically correct statistics. In this work, we propose Coarse-Grained Boltzmann Generators (CG-BGs), a framework for reduced-order generative modeling with importance sampling in coarse-grained coordinate space. CG-BGs generate samples using a flow-based model and reweight them using a learned potential of mean force (PMF). We show that the PMF can be learned from rapidly converged trajectories via enhanced sampling force matching. Experiments demonstrate that CG-BGs capture solvent-mediated interactions in highly reduced representations while substantially reducing computational cost relative to atomistic BGs, providing a practical route toward equilibrium sampling of larger molecular systems.
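
The reweighting step follows the usual importance-sampling recipe: a sample x drawn from the flow with density q(x) receives a weight proportional to exp(-U(x)/kT) / q(x), where U plays the role of the learned PMF. The sketch below computes self-normalized weights in log space together with the effective sample size. The inputs are toy numbers and the function names are assumptions, not part of the authors' code.

```python
import math


def importance_weights(reduced_energies, log_q):
    """Self-normalized weights w_i proportional to exp(-u_i) / q(x_i), with u_i = U(x_i) / kT."""
    log_w = [-u - lq for u, lq in zip(reduced_energies, log_q)]
    shift = max(log_w)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def effective_sample_size(weights):
    """Kish effective sample size for normalized weights: 1 / sum(w_i^2)."""
    return 1.0 / sum(w * w for w in weights)


# If the flow already matches the target up to a constant, all weights are equal.
energies = [0.5, 1.0, 2.0]
matched_log_q = [-0.5 - 3.0, -1.0 - 3.0, -2.0 - 3.0]
matched_weights = importance_weights(energies, matched_log_q)

# A flow that is uniform over the samples gets uneven weights and a smaller ESS.
uniform_weights = importance_weights(energies, [0.0, 0.0, 0.0])
```

The effective sample size is a convenient check on how well the generator covers the target: it equals the sample count for a perfect match and shrinks as the weights concentrate on a few samples.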

2602.10520 2026-05-29 cs.LG

Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models

Jonathan Williams, Esin Tureci

Affiliations: Department of Computer Science, Princeton University, Princeton NJ, U.S.A

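AI summary: Addresses the failure of standard reinforcement learning objectives such as GRPO to improve reasoning in looped language models (LoopLMs), which stems from rewarding only the final latent state; proposes RLTT, a framework that distributes reward across the full latent reasoning trajectory for dense, trajectory-level credit assignment, significantly improving mathematical reasoning and generalizing to non-mathematical tasks.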

Comments ICML 2026

英文摘要

Looped Language Models (LoopLMs) perform multi-step latent reasoning prior to token generation and outperform conventional LLMs on reasoning benchmarks at smaller parameter budgets. However, attempts to further improve LoopLM reasoning with reinforcement learning have failed: standard objectives such as Group Relative Policy Optimization (GRPO) only assign credit to the final latent state, creating a fundamental mismatch with the model's internal computation. To resolve this, we introduce RLTT (Reward Latent Thought Trajectories), a reinforcement learning framework that distributes reward across the full latent reasoning trajectory. RLTT provides dense, trajectory-level credit assignment without relying on external verifiers and can directly replace GRPO with negligible overhead. Across extensive experiments with Ouro-1.4B/2.6B-Thinking under identical training and inference conditions, RLTT yields statistically significant improvements over GRPO on challenging mathematical reasoning benchmarks, improving mean accuracy over MATH-500, AIME24/26, and BeyondAIME by +5.8% on the 1.4B scale, and +10.9% on the 2.6B scale. Despite being trained exclusively on mathematics, RLTT also transfers effectively to non-mathematical reasoning benchmarks, demonstrating the effectiveness of trajectory-level credit assignment for reinforcement learning in LoopLMs. Code is available at https://github.com/jonwill8/RLTT.git.
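
For context on the GRPO baseline named in the abstract, the sketch below shows the group-relative advantage that GRPO-style objectives compute from the rewards of several sampled completions for one prompt. It is a minimal illustration under assumptions: the function name and the `eps` constant are hypothetical, implementations differ on the choice of standard deviation, and the abstract does not specify how RLTT spreads this signal over latent steps, so that part is not shown.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its sampling group.

    rewards: scalar rewards for completions sampled from the same prompt
    eps: small constant (an assumption) that avoids division by zero
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]
```

When every completion in a group receives the same reward, all advantages are zero and the group contributes no learning signal.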

2602.08979 2026-05-29 cs.SD cs.CL

Beyond Transcripts: A Renewed Perspective on Audio Chaptering

超越文本:音频章节划分的新视角

Fabian Retkowski, Maike Züfle, Thai Binh Nguyen, Jan Niehues, Alexander Waibel

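Affiliations * Karlsruhe Institute of Technology; Carnegie Mellon University

AI summary A systematic study of audio chaptering that proposes the audio-only architecture AudioSeg, analyzes the factors that affect performance, and formalizes evaluation protocols. AudioSeg substantially outperforms text-based approaches, pauses are the most effective acoustic feature, and multimodal LLMs show promise on shorter audio.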

Comments Accepted at ACL 2026 (Main Conference)

详情
AI中文摘要

音频章节划分是将长音频分割成连贯部分的任务,对于导航播客、讲座和视频越来越重要。尽管其相关性,研究仍然有限且基于文本,留下了关于利用音频信息、处理ASR错误以及无转录评估的关键问题未解决。我们通过三个贡献来解决这些空白:(1)基于文本模型与声学特征、一种新颖的仅音频架构(AudioSeg,操作于学习到的音频表示)以及多模态大模型的系统比较;(2)影响性能因素的经验分析,包括转录质量、声学特征、持续时间和说话人组成;(3)形式化的评估协议,对比依赖转录的文本空间协议与转录不变的时间空间协议。我们在YTSeg上的实验表明,AudioSeg显著优于基于文本的方法,停顿提供了最大的声学增益,而MLLMs受限于上下文长度和指令遵循能力较弱,但MLLMs在较短的音频上显示出潜力。

英文摘要

Audio chaptering, the task of segmenting long-form audio into coherent sections, is increasingly important for navigating podcasts, lectures, and videos. Despite its relevance, research remains limited and text-based, leaving key questions unresolved about leveraging audio information, handling ASR errors, and transcript-free evaluation. We address these gaps through three contributions: (1) a systematic comparison between text-based models with acoustic features, a novel audio-only architecture (AudioSeg) operating on learned audio representations, and multimodal LLMs; (2) empirical analysis of factors affecting performance, including transcript quality, acoustic features, duration, and speaker composition; and (3) formalized evaluation protocols contrasting transcript-dependent text-space protocols with transcript-invariant time-space protocols. Our experiments on YTSeg reveal that AudioSeg substantially outperforms text-based approaches, pauses provide the largest acoustic gains, and MLLMs remain limited by context length and weak instruction following, yet MLLMs are promising on shorter audio.

2602.08783 2026-05-29 cs.AI cs.CL

Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure

潜在思维链中的动态:因果结构的实证研究

Zirui Li, Xuefeng Bai, Kehai Chen, Yizhi Li, Jian Yang, Chenghua Lin, Min Zhang

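Affiliations * Harbin Institute of Technology, Shenzhen, China; University of Manchester, United Kingdom; Beihang University, China

AI summary Models latent chain-of-thought steps as variables in a structural causal model and applies interventions to them, revealing the causal structure of latent chain-of-thought, how influence propagates across steps, and how it differs from explicit chain-of-thought.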

Comments Accepted to ICML 2026; 25 pages, 23 figures

详情
AI中文摘要

潜在或连续思维链方法用若干内部潜在步骤替代显式文本推理,但这些中间计算难以通过基于相关性的探针进行评估。本文将潜在思维链视为表示空间中的可操控因果过程,将潜在步骤建模为结构因果模型(SCM)中的变量,并通过逐步do-干预分析其效应。我们研究了两种代表性范式(即Coconut和CODI)在数学和通用推理任务上的表现,以探讨三个关键问题:(1)哪些步骤对正确性具有因果必要性,以及答案何时可早期解码;(2)影响如何在步骤间传播,以及这种结构与显式CoT相比如何;(3)中间轨迹是否保留竞争性答案模式,以及输出级承诺与步骤间表示级承诺的差异。我们发现潜在步骤预算更像分阶段功能而非同质化额外深度,并具有非局部路由特性,同时识别出早期输出偏差与后期表示承诺之间的持续差距。这些结果促使我们采用模式条件化和稳定性感知分析,以及相应的训练/解码目标,作为解释和改进潜在推理系统的更可靠工具。代码见https://github.com/J1mL1/causal-latent-cot。

英文摘要

Latent or continuous chain-of-thought methods replace explicit textual rationales with a number of internal latent steps, but these intermediate computations are difficult to evaluate beyond correlation-based probes. In this paper, we view latent chain-of-thought as a manipulable causal process in representation space by modeling latent steps as variables in a structural causal model (SCM) and analyzing their effects through step-wise do-interventions. We study two representative paradigms (i.e., Coconut and CODI) on both mathematical and general reasoning tasks to investigate three key questions: (1) which steps are causally necessary for correctness and when answers become decodable early; (2) how influence propagates across steps and how this structure compares to explicit CoT; and (3) whether intermediate trajectories retain competing answer modes and how output-level commitment differs from representational commitment across steps. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing, and we identify a persistent gap between early output bias and late representational commitment. These results motivate mode-conditional and stability-aware analyses, together with corresponding training/decoding objectives, as more reliable tools for interpreting and improving latent reasoning systems. Code is available at https://github.com/J1mL1/causal-latent-cot.

2602.06791 2026-05-29 cs.LG cond-mat.dis-nn cond-mat.stat-mech

Rare Event Analysis of Large Language Models

大型语言模型的罕见事件分析

Jake McAllister Dorman, Edward Gillman, Dominic C. Rose, Jamie F. Mair, Juan P. Garrahan

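Affiliations * School of Physics and Astronomy, University of Nottingham

AI summary Presents an end-to-end framework for the systematic analysis of rare events in large language models, covering theory, efficient generation strategies, probability estimation, and error analysis, and illustrates it with concrete examples.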

Comments ICML 2026 Oral Spotlight

详情
AI中文摘要

作为概率模型,大型语言模型(LLMs)在推理过程中会显示罕见事件:即远离典型但高度显著的行为。根据定义,所有罕见事件都难以观察,但LLM使用的巨大规模意味着在开发过程中完全未观察到的事件在部署中可能变得突出。在此,我们提出了一个用于系统分析LLMs中罕见事件的端到端框架。我们提供了一个实用的实现,涵盖理论、高效生成策略、概率估计和误差分析,并通过具体示例加以说明。我们概述了扩展到其他模型和背景的应用,强调了这里提出的概念和技术的通用性。

英文摘要

As probabilistic models, large language models (LLMs) display rare events during inference: behaviour that is far from typical but highly significant. By definition, all rare events are hard to observe, but the enormous scale of LLM usage means that events completely unobserved during development are likely to become prominent in deployment. Here we present an end-to-end framework for the systematic analysis of rare events in LLMs. We provide a practical implementation spanning theory, efficient generation strategies, probability estimation and error analysis, which we illustrate with concrete examples. We outline extensions and applications to other models and contexts, highlighting the generality of the concepts and techniques presented here.

2602.06036 2026-05-29 cs.CL

DFlash: Block Diffusion for Flash Speculative Decoding

DFlash:用于快速推测解码的块扩散模型

Jian Chen, Yesheng Liang, Zhijian Liu

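Affiliations * Z-Lab

AI summary Proposes DFlash, a framework that drafts tokens in parallel with a lightweight block diffusion model conditioned on context features from the target model, yielding high-quality drafts and high acceptance rates. It achieves over 6x lossless acceleration across a range of models and tasks, with up to 2.5x higher speedup than the state-of-the-art speculative decoding method EAGLE-3.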

Comments Accepted at ICML 2026. Camera-ready version. Code: https://github.com/z-lab/dflash

详情
AI中文摘要

自回归大型语言模型(LLMs)性能强大,但需要固有的顺序解码,导致高推理延迟和低GPU利用率。推测解码通过使用快速草稿模型来缓解这一瓶颈,其输出由目标LLM并行验证;然而,现有方法仍然依赖于自回归草稿生成,这仍然是顺序的,限制了实际加速。扩散LLMs通过实现并行生成提供了一种有希望的替代方案,但当前的扩散模型通常性能不如自回归模型。在本文中,我们介绍了DFlash,一种采用轻量级块扩散模型进行并行草稿生成的推测解码框架。通过在单次前向传播中生成草稿标记,并将草稿模型条件化于从目标模型提取的上下文特征,DFlash实现了高效草稿生成,具有高质量输出和更高的接受率。实验表明,DFlash在多种模型和任务上实现了超过6倍的无损加速,比最先进的推测解码方法EAGLE-3提供高达2.5倍的加速提升。

英文摘要

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel generation, but current diffusion models typically underperform compared with autoregressive models. In this paper, we introduce DFlash, a speculative decoding framework that employs a lightweight block diffusion model for parallel drafting. By generating draft tokens in a single forward pass and conditioning the draft model on context features extracted from the target model, DFlash enables efficient drafting with high-quality outputs and higher acceptance rates. Experiments show that DFlash achieves over 6x lossless acceleration across a range of models and tasks, delivering up to 2.5x higher speedup than the state-of-the-art speculative decoding method EAGLE-3.
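
The verification step of speculative decoding described above can be sketched for the greedy-decoding case. This is a minimal illustration and not DFlash's implementation: it assumes the target model has already scored the whole draft in one parallel pass, and the function name and token-list interface are hypothetical.

```python
def accept_draft_tokens(draft_tokens, target_tokens):
    """Greedy verification of a drafted block.

    draft_tokens: tokens proposed by the draft model
    target_tokens: the target model's greedy choice at each position, computed in
        one parallel pass over prefix + draft; it has len(draft_tokens) + 1 entries
    Returns the tokens actually emitted this step.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok != target_tokens[i]:
            break  # first disagreement: discard the rest of the draft
        accepted.append(tok)
    # The target always contributes one token: the correction at the first
    # mismatch, or a bonus token when the whole draft was accepted.
    accepted.append(target_tokens[len(accepted)])
    return accepted
```

Because every emitted token equals the target model's own greedy choice, the output matches plain greedy decoding of the target, which is the sense in which the speedup is lossless. Longer accepted prefixes mean fewer target passes per generated token.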

2602.05961 2026-05-29 cs.LG stat.ML

Discrete diffusion samplers and bridges: Off-policy algorithms and applications in latent spaces

离散扩散采样器与桥:离策略算法及其在潜在空间中的应用

Arran Carter, Sanghyeok Choi, Kirill Tamogashev, Víctor Elvira, Esmeralda S. Whitammer

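Affiliations * University of Edinburgh; CIFAR Fellow

AI summary Proposes off-policy training techniques that improve discrete diffusion samplers, introduces data-to-energy Schrödinger bridge training for the discrete domain for the first time, and applies the samplers to data-free posterior sampling in the discrete latent spaces of image generative models.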

Comments ICML 2026. Code: https://github.com/mmacosha/offpolicy-discrete-diffusion-samplers-and-bridges

详情
AI中文摘要

从仅在相差一个归一化常数的意义下已知的分布 $p(x) \propto e^{-\mathcal{E}(x)}$ 中采样是统计学中一个重要且具有挑战性的问题。近年来,出现了一类新的摊销采样算法,通常称为扩散采样器,能够从未归一化的密度中快速高效地采样。这类算法在连续空间采样任务中已被广泛研究;然而,它们在离散空间问题中的应用仍基本未被探索。尽管该领域已取得一些进展,但离散扩散采样器并未充分利用连续空间采样中常用的思想。在本文中,我们提出通过引入离散扩散采样器的离策略训练技术来弥合这一差距。我们证明这些技术在已有和新颖的合成基准上提高了离散采样器的性能。接下来,我们将离散扩散采样器推广到两个任意分布之间的桥接任务,首次为离散域引入了数据到能量薛定谔桥训练。最后,我们展示了所提出的扩散采样器在图像生成模型的离散潜在空间中进行无数据后验采样的应用。

英文摘要

Sampling from a distribution $p(x) \propto e^{-\mathcal{E}(x)}$ known up to a normalising constant is an important and challenging problem in statistics. Recent years have seen the rise of a new family of amortised sampling algorithms, commonly referred to as diffusion samplers, that enable fast and efficient sampling from an unnormalised density. Such algorithms have been widely studied for continuous-space sampling tasks; however, their application to problems in discrete space remains largely unexplored. Although some progress has been made in this area, discrete diffusion samplers do not take full advantage of ideas commonly used for continuous-space sampling. In this paper, we propose to bridge this gap by introducing off-policy training techniques for discrete diffusion samplers. We show that these techniques improve the performance of discrete samplers on both established and new synthetic benchmarks. Next, we generalise discrete diffusion samplers to the task of bridging between two arbitrary distributions, introducing data-to-energy Schrödinger bridge training for the discrete domain for the first time. Lastly, we showcase the application of the proposed diffusion samplers to data-free posterior sampling in the discrete latent spaces of image generative models.

2602.03357 2026-05-29 cs.LG math.OC

Achieving Linear Speedup for Composite Federated Learning

实现复合联邦学习的线性加速

Kun Huang, Shi Pu, Karl Henrik Johansson

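Affiliations * KTH Royal Institute of Technology; The Chinese University of Hong Kong, Shenzhen

AI summary Proposes FedNMap, a normal map-based method that handles the nonsmooth term through normal map updates and mitigates data heterogeneity with a local correction strategy. It is the first to achieve linear speedup in both the number of clients and the number of local updates for nonconvex composite federated learning.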

Comments 38 pages, 19 figures

详情
AI中文摘要

本文提出了FedNMap,一种基于法向映射的复合联邦学习方法,其中目标函数由光滑损失和可能非光滑的正则化项组成。FedNMap利用基于法向映射的更新方案来处理非光滑项,并采用局部校正策略来减轻客户端间数据异质性的影响。在标准假设下,包括光滑局部损失、正则化项的弱凸性以及有界随机梯度方差,FedNMap在非凸损失下(无论是否满足Polyak-Łojasiewicz条件)实现了关于客户端数和本地更新次数的线性加速。据我们所知,这是首个为非凸复合联邦学习建立线性加速的算法。数值实验证实了我们的理论发现,并展示了FedNMap的线性加速性能。

英文摘要

This paper proposes FedNMap, a normal map-based method for composite federated learning, where the objective consists of a smooth loss and a possibly nonsmooth regularizer. FedNMap leverages a normal map-based update scheme to handle the nonsmooth term and incorporates a local correction strategy to mitigate the impact of data heterogeneity across clients. Under standard assumptions, including smooth local losses, weak convexity of the regularizer, and bounded stochastic gradient variance, FedNMap achieves linear speedup with respect to both the number of clients and the number of local updates for nonconvex losses, both with and without the Polyak-Łojasiewicz condition. To the best of our knowledge, this is the first algorithm establishing linear speedup for nonconvex composite federated learning. Numerical experiments corroborate our theoretical findings and demonstrate the linear speedup of FedNMap.
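
To make the normal map idea concrete, the sketch below applies a plain single-machine normal map iteration to a toy composite problem: a separable quadratic loss plus an L1 regularizer. It is an illustration under assumptions, not FedNMap itself: there are no clients, local updates, or correction terms, and the step size, the parameter `lam`, and all function names are hypothetical choices for this toy case.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |x| in one dimension."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def normal_map(z, grad_f, lam, mu):
    """Normal map F(z) = grad_f(x) + (z - x) / lam, with x = prox_{lam * mu * ||.||_1}(z)."""
    x = [soft_threshold(zi, lam * mu) for zi in z]
    g = grad_f(x)
    return [gi + (zi - xi) / lam for gi, zi, xi in zip(g, z, x)], x

def solve_l1_least_squares(b, mu, lam=1.0, step=0.5, iters=200):
    """Minimize 0.5 * ||x - b||^2 + mu * ||x||_1 by iterating z <- z - step * F(z)."""
    grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]
    z = [0.0] * len(b)
    for _ in range(iters):
        F, _ = normal_map(z, grad_f, lam, mu)
        z = [zi - step * Fi for zi, Fi in zip(z, F)]
    # The solution of the composite problem is the prox of the auxiliary variable.
    return [soft_threshold(zi, lam * mu) for zi in z]
```

The iteration runs on the auxiliary variable `z`, while the iterate of interest is `prox(z)`; a zero of the normal map corresponds to a stationary point of the composite objective. For this toy problem the answer is known in closed form (soft-thresholding of `b`), which makes the sketch easy to check.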

2602.02849 2026-05-29 cs.AI

AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) Agents

AutoSizer: 通过大语言模型代理自动调整模拟和混合信号电路的尺寸

Xi Yu, Dmitrii Torbunov, Soumyajit Mandal, Yihui Ren

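Affiliations * Artificial Intelligence Department, Brookhaven National Laboratory, Upton, NY; Instrumentation Department, Brookhaven National Laboratory, Upton, NY 11973

AI summary Proposes AutoSizer, a reflective LLM-driven meta-optimization framework whose two-loop structure unifies circuit understanding, adaptive search-space construction, and optimization orchestration, achieving higher solution quality, faster convergence, and higher success rates in analog and mixed-signal circuit sizing.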

详情
AI中文摘要

模拟和混合信号(AMS)集成电路的设计仍然严重依赖专家知识,其中晶体管尺寸调整由于非线性行为、高维设计空间和严格的性能约束而成为主要瓶颈。现有的电子设计自动化(EDA)方法通常将尺寸调整视为静态黑箱优化,导致解决方案效率低下且鲁棒性不足。尽管大语言模型(LLM)展现出强大的推理能力,但它们并不适合AMS尺寸调整中的精确数值优化。为弥补这一差距,我们提出AutoSizer,一种反射式LLM驱动的元优化框架,以闭环方式统一电路理解、自适应搜索空间构建和优化编排。它采用双循环优化框架,内循环负责电路尺寸调整,外循环分析优化动态和约束,从仿真反馈中迭代优化搜索空间。我们进一步引入AMS-SizingBench,一个包含SKY130 CMOS技术中24种不同AMS电路的开源基准,旨在评估在基于仿真器的现实约束下的自适应优化策略。实验表明,AutoSizer在不同电路难度下实现了更高的解质量、更快的收敛速度和更高的成功率,优于传统优化方法和现有的基于LLM的代理。

英文摘要

The design of Analog and Mixed-Signal (AMS) integrated circuits remains heavily reliant on expert knowledge, with transistor sizing a major bottleneck due to nonlinear behavior, high-dimensional design spaces, and strict performance constraints. Existing Electronic Design Automation (EDA) methods typically frame sizing as static black-box optimization, resulting in inefficient and less robust solutions. Although Large Language Models (LLMs) exhibit strong reasoning abilities, they are not suited for precise numerical optimization in AMS sizing. To address this gap, we propose AutoSizer, a reflective LLM-driven meta-optimization framework that unifies circuit understanding, adaptive search-space construction, and optimization orchestration in a closed loop. It employs a two-loop optimization framework, with an inner loop for circuit sizing and an outer loop that analyzes optimization dynamics and constraints to iteratively refine the search space from simulation feedback. We further introduce AMS-SizingBench, an open benchmark comprising 24 diverse AMS circuits in SKY130 CMOS technology, designed to evaluate adaptive optimization policies under realistic simulator-based constraints. AutoSizer experimentally achieves higher solution quality, faster convergence, and higher success rate across varying circuit difficulties, outperforming both traditional optimization methods and existing LLM-based agents.

2602.02103 2026-05-29 cs.LG cs.CL

How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning

LLMs 能提前多远规划?揭示思维链推理中的潜在视界

Liyan Xu, Mo Yu, Fandong Meng, Jie Zhou

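Affiliations * WeChat AI, Tencent Inc

AI summary Uses the probing method Tele-Lens to study the latent planning ability of LLMs during chain-of-thought reasoning and finds a myopic horizon. Building on this, the paper hypothesizes that a sparse set of pivot positions can improve uncertainty estimation, and shows that CoT bypass can be recognized automatically.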

Comments Accepted to ICML 2026

详情
AI中文摘要

思维链推理已成为激发大型语言模型多步推理的核心机制。然而,近期证据呈现一种矛盾:隐藏状态似乎在 CoT 完全展开之前就已经编码了未来的推理,而显式步骤对于需要组合计算的任务仍然至关重要。为了加深对 LLM 内部状态与其言语化推理轨迹之间关系的理解,我们通过探测方法 Tele-Lens 研究了 LLMs 的潜在规划强度,该方法应用于跨不同任务领域的隐藏状态。我们的实证结果表明,LLMs 表现出短视视界,主要进行增量转换,而没有精确的全局规划。利用这一特性,我们提出了一个增强 CoT 不确定性估计的假设,并通过实验验证了一组稀疏的枢轴位置可以有效地代表整个路径的不确定性。我们进一步强调了利用 CoT 动态的重要性,并证明了可以在不降低性能的情况下实现 CoT 绕过的自动识别。我们的代码、数据和模型发布于 https://github.com/lxucs/tele-lens。

英文摘要

Chain-of-thought (CoT) reasoning has become a central mechanism for eliciting multi-step reasoning in Large Language Models (LLMs). Yet recent evidence presents a tension: hidden states appear to already encode future reasoning before CoT fully unfolds, while explicit steps remain crucial for tasks requiring compositional computation. To deepen the understanding of the relationship between an LLM's internal states and its verbalized reasoning trajectories, we investigate the latent planning strength of LLMs through our probing method, Tele-Lens, applied to hidden states across diverse task domains. Our empirical results indicate that LLMs exhibit a myopic horizon, primarily conducting incremental transitions without precise global planning. Leveraging this characteristic, we propose a hypothesis on enhancing uncertainty estimation of CoT, and validate that a sparse set of pivot positions can effectively represent the uncertainty of the entire path. We further underscore the significance of exploiting CoT dynamics, and demonstrate that automatic recognition of CoT bypass can be achieved without performance degradation. Our code, data and models are released at https://github.com/lxucs/tele-lens.

2602.01869 2026-05-29 cs.AI

Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

Skill-Pro: 通过非参数PPO从经验中学习可复用技能以用于LLM智能体

Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, Jun Wang

Affiliations * Key Laboratory of Interdisciplinary Research of Computation and Economics; Shanghai University of Finance and Economics; Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, Chinese Academy of Sciences; University of Bristol; Peking University; University College London

AI summary Proposes Skill-Pro, a framework that uses Non-Parametric PPO to learn reusable procedural skills automatically from interaction experience without parameter updates, enabling efficient experience reuse and long-term autonomy.

Comments Accepted at ICML 2026 (spotlight); 22 Pages, 6 Figures, 5 Tables

详情
AI中文摘要

基于LLM的智能体在序列决策中表现出色,但通常依赖即时推理,即使在重复场景中也会重新推导解决方案。这种经验重用不足导致计算冗余和不稳定性。为弥补这一差距,我们提出Skill-Pro,一个使智能体能够从交互经验中自主学习可复用程序性技能而无需参数更新的框架。通过形式化Skill-MDP,Skill-Pro将被动的情节叙述转化为由激活、执行和终止条件定义的可执行技能,以确保可执行性。为了实现可靠的可重用性而不降低能力,我们引入非参数PPO,它利用语义梯度进行高质量候选生成,并使用PPO Gate进行稳健的技能验证。通过基于分数的维护,Skill-Pro维持紧凑、高质量的程序性记忆。在域内、跨任务和跨智能体场景下的实验结果表明,Skill-Pro实现了卓越的重用率和在极端内存压缩下的显著增益。可视化的进化轨迹和技能分布进一步揭示了Skill-Pro如何透明地积累、精炼和重用程序性知识以促进长期自主性。

英文摘要

LLM-driven agents excel at sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This insufficient experience reuse leads to computational redundancy and instability. To bridge this gap, we propose Skill-Pro, a framework enabling agents to autonomously learn reusable procedural skills from interaction experiences without parameter updates. By formalizing a Skill-MDP, Skill-Pro transforms passive episodic narratives into executable Skills defined by activation, execution, and termination conditions to ensure executability. To achieve reliable reusability without capability degradation, we introduce Non-Parametric PPO, which leverages semantic gradients for high-quality candidate generation and a PPO Gate for robust Skill verification. Through score-based maintenance, Skill-Pro sustains compact, high-quality procedural memory. Experimental results across in-domain, cross-task, and cross-agent scenarios demonstrate that Skill-Pro achieves superior reuse rates and significant gains with extreme memory compression. Visualized evolutionary trajectories and Skill distributions further reveal how Skill-Pro transparently accumulates, refines, and reuses procedural knowledge to facilitate long-term autonomy.

2602.01456 2026-05-29 cs.LG cs.CV

Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations

Rectified LpJEPA:具有稀疏和最大熵表示的联合嵌入预测架构

Yilun Kuang, Yash Dagade, Tim G. J. Rudner, Randall Balestriero, Yann LeCun

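Affiliations * New York University; Duke University; University of Toronto; Brown University

AI summary Proposes the Rectified Distribution Matching Regularization (RDMReg) loss, which aligns representations to a Rectified Generalized Gaussian distribution to obtain sparse, maximum-entropy representations and thereby improve Joint-Embedding Predictive Architectures (JEPA).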

Comments ICML 2026

详情
AI中文摘要

联合嵌入预测架构(JEPA)学习视角不变表示,并采用基于投影的分布匹配来防止崩溃。现有方法将表示正则化为各向同性高斯分布,但固有地偏向密集表示,未能捕捉高效表示中观察到的稀疏性关键特性。我们引入了Rectified Distribution Matching Regularization (RDMReg),这是一种切片双样本分布匹配损失,将表示对齐到Rectified Generalized Gaussian (RGG)分布。RGG通过整流显式控制期望的$\ell_0$范数,而其连续截断部分在期望$\ell_p$范数和支撑约束下具有最大熵特性。将RDMReg应用于JEPA得到Rectified LpJEPA,它严格推广了先前基于高斯的JEPA。实验表明,Rectified LpJEPA学习到稀疏、非负的表示,具有有利的稀疏性-性能权衡,并在图像分类基准上取得了有竞争力的下游性能,表明RDMReg可以在保留任务相关信息的同时强制执行稀疏性。

英文摘要

Joint-Embedding Predictive Architectures (JEPA) learn view-invariant representations and admit projection-based distribution matching for collapse prevention. Existing approaches regularize representations towards isotropic Gaussian distributions, but inherently favor dense representations and fail to capture the key property of sparsity observed in efficient representations. We introduce Rectified Distribution Matching Regularization (RDMReg), a sliced two-sample distribution-matching loss that aligns representations to a Rectified Generalized Gaussian (RGG) distribution. RGG enables explicit control over the expected $\ell_0$ norm through rectification, while its continuous truncated component admits a maximum-entropy characterization under expected $\ell_p$ norm and support constraints. Equipping JEPAs with RDMReg yields Rectified LpJEPA, which strictly generalizes prior Gaussian-based JEPAs. Empirically, Rectified LpJEPA learns sparse, non-negative representations with favorable sparsity-performance trade-offs and competitive downstream performance on image classification benchmarks, showing that RDMReg can enforce sparsity while preserving task-relevant information.

2601.23156 2026-05-29 cs.LG cs.FL

Unsupervised Hierarchical Skill Discovery

无监督层次化技能发现

Damion Harvey, Geraud Nangue Tasse, Benjamin Rosman, Branden Ingram, Steven James

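Affiliations * University of the Witwatersrand; Neural Discovery (MIND) Institute

AI summary Proposes a grammar-based unsupervised method that segments unlabelled trajectories into skills and builds a hierarchy over them. It outperforms existing baselines in pixel-based environments such as Craftax and Minecraft and accelerates downstream reinforcement learning tasks.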

Comments Accepted to ICML 2026. 27 pages. 15 figures

详情
AI中文摘要

我们考虑强化学习中无监督技能分割和层次结构发现的问题。虽然最近的方法试图将轨迹分割为可重用的技能或选项,但大多数依赖于动作标签、奖励或手工注释,限制了其适用性。我们提出了一种方法,将未标记的轨迹分割成技能,并使用基于语法的方法在它们之上诱导出层次结构。得到的层次结构既捕获了低级行为,也捕获了它们组合成高级技能的过程。我们在高维、基于像素的环境中评估了我们的方法,包括Craftax和完整、未修改版本的Minecraft。使用技能分割、重用和层次质量的指标,我们发现我们的方法始终比现有基线产生更结构化和语义上有意义的层次结构。此外,作为概念验证,我们证明了这些发现的层次结构加速并稳定了下游强化学习任务的学习。

英文摘要

We consider the problem of unsupervised skill segmentation and hierarchical structure discovery in reinforcement learning. While recent approaches have sought to segment trajectories into reusable skills or options, most rely on action labels, rewards, or handcrafted annotations, limiting their applicability. We propose a method that segments unlabelled trajectories into skills and induces a hierarchical structure over them using a grammar-based approach. The resulting hierarchy captures both low-level behaviours and their composition into higher-level skills. We evaluate our approach in high-dimensional, pixel-based environments, including Craftax and the full, unmodified version of Minecraft. Using metrics for skill segmentation, reuse, and hierarchy quality, we find that our method consistently produces more structured and semantically meaningful hierarchies than existing baselines. Furthermore, as a proof of concept, we demonstrate that these discovered hierarchies accelerate and stabilise learning on downstream reinforcement learning tasks.

2601.22274 2026-05-29 cs.LG

Server-Proximal Aggregation for Federated Domain-Incremental Learning under Partial Participation: Task-Uniform Convergence and Backward Transfer

部分参与下联邦域增量学习的服务器近端聚合:任务均匀收敛与反向迁移

Longtao Xu, Jian Li

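Affiliations * Stony Brook University, New York

AI summary For Federated Domain-Incremental Learning (FDIL), where clients are heterogeneous, tasks arrive sequentially, and the label space is fixed, the paper proposes the memory-free algorithm SPECIAL, which curbs cumulative drift with a lightweight server-side proximal term. It provides a backward knowledge transfer (BKT) guarantee and the first non-convex convergence rate under partial participation, O((E/NT)^(1/2)).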

Comments Accepted at ICML 2026

详情
AI中文摘要

现实联邦系统很少在静态数据上运行:输入分布漂移,而隐私规则禁止原始数据共享。我们将此设置研究为联邦域增量学习(FDIL),其中(i)客户端是异构的,(ii)任务顺序到达且域不断变化,但(iii)标签空间保持不变。在现实部署下,FDIL仍然缺少两个理论支柱:反向知识迁移(BKT)的保证以及在部分参与下所有任务序列上的收敛速率。我们引入SPECIAL(服务器近端高效持续聚合学习),一种简单的、无记忆的FDIL算法,它在标准FedAvg中添加了一个单服务器端“锚点”:在每一轮中,服务器通过一个轻量近端项,将均匀采样的参与客户端的更新推向先前的全局模型。该锚点无需重放缓冲区、合成数据或任务特定头部即可抑制累积漂移,保持通信和模型大小不变。我们的理论表明,SPECIAL(i)保留了早期任务:BKT界限将先前任务损失的任意增加限制为一个漂移控制项,该漂移控制项随着更多轮次、本地周期和参与客户端而缩小;(ii)在所有任务上高效学习:首个针对部分参与下FDIL的通信高效非凸收敛速率,O((E/NT)^(1/2)),其中E为本地周期数,T为通信轮数,N为每轮参与客户端数,与单任务FedAvg匹配,同时明确区分优化方差和任务间漂移。实验结果进一步证明了SPECIAL的有效性。

英文摘要

Real-world federated systems seldom operate on static data: input distributions drift while privacy rules forbid raw-data sharing. We study this setting as Federated Domain-Incremental Learning (FDIL), where (i) clients are heterogeneous, (ii) tasks arrive sequentially with shifting domains, yet (iii) the label space remains fixed. Two theoretical pillars remain missing for FDIL under realistic deployment: a guarantee of backward knowledge transfer (BKT) and a convergence rate that holds across the sequence of all tasks with partial participation. We introduce SPECIAL (Server-Proximal Efficient Continual Aggregation for Learning), a simple, memory-free FDIL algorithm that adds a single server-side "anchor" to vanilla FedAvg: in each round, the server nudges the aggregated update of the uniformly sampled participating clients toward the previous global model with a lightweight proximal term. This anchor curbs cumulative drift without replay buffers, synthetic data, or task-specific heads, keeping communication and model size unchanged. Our theory shows that SPECIAL (i) preserves earlier tasks: a BKT bound caps any increase in prior-task loss by a drift-controlled term that shrinks with more rounds, local epochs, and participating clients; and (ii) learns efficiently across all tasks: the first communication-efficient non-convex convergence rate for FDIL with partial participation, O((E/NT)^(1/2)), with E local epochs, T communication rounds, and N participating clients per round, matching single-task FedAvg while explicitly separating optimization variance from inter-task drift. Experimental results further demonstrate the effectiveness of SPECIAL.
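
One way to picture a server-side proximal anchor is as a closed-form blend between the FedAvg average and the previous global model. The sketch below is an assumption made for illustration, not the update rule from the paper: it solves a quadratic proximal problem with a hypothetical strength `mu`, and models are plain lists of floats.

```python
def fedavg(client_models):
    """Coordinate-wise mean of the participating clients' model vectors."""
    n = len(client_models)
    return [sum(ws) / n for ws in zip(*client_models)]

def server_proximal_step(client_models, prev_global, mu):
    """Blend the FedAvg average with the previous global model.

    Solves argmin_w 0.5 * ||w - avg||^2 + 0.5 * mu * ||w - prev_global||^2,
    whose closed form is (avg + mu * prev_global) / (1 + mu).
    mu is an assumed anchor strength; mu = 0 recovers plain FedAvg.
    """
    avg = fedavg(client_models)
    return [(a + mu * p) / (1.0 + mu) for a, p in zip(avg, prev_global)]
```

A larger `mu` keeps the new global model closer to the previous one, which limits drift across tasks at the cost of slower adaptation to the current task. The extra work happens only on the server, so communication and model size are unchanged.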

2601.21909 2026-05-29 cs.AI cs.CL

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

从元思维到执行:面向通用且可靠的大语言模型推理的认知对齐后训练

Shaojie Wang, Liang Zhang

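Affiliations * Hong Kong University of Science and Technology (Guangzhou)

AI summary Proposes a cognitively inspired two-stage post-training framework: supervised learning on Chain-of-Meta-Thought to acquire general strategies, followed by Confidence-Calibrated Reinforcement Learning to improve execution reliability. It improves in-distribution and out-of-distribution performance by 2.10% and 3.86%, respectively.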

详情
AI中文摘要

当前的大语言模型后训练方法通过监督微调(SFT)后接基于结果的强化学习(RL)来优化完整的推理轨迹。虽然有效,但仔细审视发现一个根本差距:这种方法与人类实际解决问题的方式不一致。人类认知自然地将问题解决分解为两个不同的阶段:首先获取跨问题泛化的抽象策略(即元知识),然后将其适应到具体实例。相比之下,通过将完整轨迹视为基本单元,当前方法本质上是问题中心的,将抽象策略与问题特定的执行纠缠在一起。为了解决这种错位,我们提出了一个认知启发的框架,明确地模仿人类认知的两阶段过程。具体而言,元思维链(CoMT)将监督学习聚焦于抽象推理模式而不涉及具体执行,从而能够获取可泛化的策略。然后,置信度校准强化学习(CCRL)通过中间步骤上的置信度感知奖励来优化任务适应,防止过度自信的错误级联并提高执行可靠性。在四个模型和十个基准上的实验表明,与标准方法相比,分布内和分布外分别提升了2.10%和3.86%,同时对教师模型选择、优化方法和符号扰动的变化保持高度鲁棒。

英文摘要

Current LLM post-training methods optimize complete reasoning trajectories through Supervised Fine-Tuning (SFT) followed by outcome-based Reinforcement Learning (RL). While effective, a closer examination reveals a fundamental gap: this approach does not align with how humans actually solve problems. Human cognition naturally decomposes problem-solving into two distinct stages: first acquiring abstract strategies (i.e., meta-knowledge) that generalize across problems, then adapting them to specific instances. In contrast, by treating complete trajectories as basic units, current methods are inherently problem-centric, entangling abstract strategies with problem-specific execution. To address this misalignment, we propose a cognitively-inspired framework that explicitly mirrors the two-stage human cognitive process. Specifically, Chain-of-Meta-Thought (CoMT) focuses supervised learning on abstract reasoning patterns without specific executions, enabling acquisition of generalizable strategies. Confidence-Calibrated Reinforcement Learning (CCRL) then optimizes task adaptation via confidence-aware rewards on intermediate steps, preventing overconfident errors from cascading and improving execution reliability. Experiments across four models and ten benchmarks show 2.10% and 3.86% improvements in-distribution and out-of-distribution respectively over standard methods, while remaining highly robust to variations in teacher model selection, optimization methods, and symbolic perturbations.

2601.21568 2026-05-29 cs.LG

Bridging Functional and Representational Similarity via Usable Information

通过可用信息桥接功能相似性与表征相似性

Antonio Almudévar, Alfonso Ortega

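Affiliations * ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain

AI summary Proposes a unified framework based on usable information that gives a theoretical and empirical synthesis across three dimensions (functional similarity, representational similarity, and the relationship between them) and shows that representational similarity is sufficient but not necessary for functional similarity.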

详情
AI中文摘要

我们提出了一个通过可用信息量化表征之间相似性的统一框架,在三个关键维度上提供了严格的理论和实证综合。首先,针对功能相似性,我们建立了拼接性能与条件互信息之间的形式化联系。我们进一步揭示拼接本质上是非对称的,证明稳健的功能比较需要双向分析而非单向映射。其次,关于表征相似性,我们发现基于重构的指标和标准工具(如CKA、RSA)在特定约束下充当可用信息的估计量。关键的是,我们表明相似性是相对于预测族的能力而言的:对刚性观察者而言不同的表征,对更具表达力的观察者可能是相同的。第三,我们证明表征相似性是功能相似性的充分非必要条件。我们通过任务粒度层次统一这些概念:复杂任务上的相似性保证了任何更粗粒度衍生任务上的相似性,将表征相似性确立为最大粒度的极限:输入重构。

英文摘要

We present a unified framework for quantifying the similarity between representations through the lens of usable information, offering a rigorous theoretical and empirical synthesis across three key dimensions. First, addressing functional similarity, we establish a formal link between stitching performance and conditional mutual information. We further reveal that stitching is inherently asymmetric, demonstrating that robust functional comparison necessitates a bidirectional analysis rather than a unidirectional mapping. Second, concerning representational similarity, we find that reconstruction-based metrics and standard tools (e.g., CKA, RSA) act as estimators of usable information under specific constraints. Crucially, we show that similarity is relative to the capacity of the predictive family: representations that appear distinct to a rigid observer may be identical to a more expressive one. Third, we demonstrate that representational similarity is sufficient but not necessary for functional similarity. We unify these concepts through a task-granularity hierarchy: similarity on a complex task guarantees similarity on any coarser derivative, establishing representational similarity as the limit of maximum granularity: input reconstruction.
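
As a reference point for the standard tools named in the abstract, the sketch below computes linear CKA between two sets of representations of the same inputs. It is a dependency-free illustration of the usual linear CKA formula, not the paper's usable-information estimator, and the helper names are hypothetical.

```python
import math

def _centered_gram(X):
    """Gram matrix of X (n samples x d features) after centering each feature."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    return [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)]
            for i in range(n)]

def _frobenius_inner(A, B):
    """Sum of elementwise products of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    """Linear CKA between representations X and Y of the same n inputs."""
    K, L = _centered_gram(X), _centered_gram(Y)
    return _frobenius_inner(K, L) / math.sqrt(
        _frobenius_inner(K, K) * _frobenius_inner(L, L))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, so two layers that differ only by a permutation of units score 1. This fixed invariance class is one concrete instance of the abstract's point that similarity depends on the capacity of the observer.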

2601.21564 2026-05-29 cs.LG

Representation Unlearning: Forgetting through Information Compression

表示遗忘:通过信息压缩实现遗忘

Antonio Almudévar, Alfonso Ortega

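Affiliations * ViVoLab, Aragón Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain

AI summary Proposes Representation Unlearning, a framework that performs unlearning directly in the model's representation space by learning an information-bottleneck transformation, without modifying model parameters, achieving reliable forgetting, utility retention, and computational efficiency.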

详情
AI中文摘要

机器遗忘旨在消除特定训练数据对模型的影响,这一需求由隐私法规和鲁棒性关注驱动。现有方法通常修改模型参数,但此类更新可能不稳定、计算成本高且受局部近似限制。我们引入表示遗忘,一个直接在模型表示空间中执行遗忘的框架。我们不修改模型参数,而是学习一个对表示施加信息瓶颈的变换:最大化与保留数据的互信息,同时抑制关于待遗忘数据的信息。我们推导出使这一目标可处理的变分替代,并展示如何在两种实际场景中实例化:当保留和遗忘数据都可用时,以及在仅能访问遗忘数据的零样本设置中。在多个基准上的实验表明,与以参数为中心的基线相比,表示遗忘实现了更可靠的遗忘、更好的效用保持和更高的计算效率。

英文摘要

Machine unlearning seeks to remove the influence of specific training data from a model, a need driven by privacy regulations and robustness concerns. Existing approaches typically modify model parameters, but such updates can be unstable, computationally costly, and limited by local approximations. We introduce Representation Unlearning, a framework that performs unlearning directly in the model's representation space. Instead of modifying model parameters, we learn a transformation over representations that imposes an information bottleneck: maximizing mutual information with retained data while suppressing information about data to be forgotten. We derive variational surrogates that make this objective tractable and show how they can be instantiated in two practical regimes: when both retain and forget data are available, and in a zero-shot setting where only forget data can be accessed. Experiments across several benchmarks demonstrate that Representation Unlearning achieves more reliable forgetting, better utility retention, and greater computational efficiency than parameter-centric baselines.