arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.20341 2026-06-09 cs.LG cs.AI cs.CR cs.PF Version update

Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions

Ali Mahdavi, Azadeh Zamanifar, Amirfarhad Farhadi, Omid Kashefi

Affiliations: Department of Computer Engineering, SRC, Islamic Azad University, Tehran, Iran; School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran; Meta, CA, USA

AI summary: This paper proposes HF-KCU, which approximates the influence function through conjugate gradient iterations in Krylov subspaces to enable data deletion in collaborative optimization, reducing computational complexity and improving privacy protection.

Abstract

Federated learning systems must support data deletion requests to comply with privacy regulations, yet retraining from scratch after each deletion is computationally prohibitive. We present HF-KCU, a method that removes a client's contribution by approximating the influence function through conjugate gradient iterations in Krylov subspaces, reducing complexity from O(d^3) to O(kd), where k << d. A causal weighting mechanism ensures that only clients holding the deleted data receive parameter updates, preventing spurious changes to unaffected clients. Our method is designed to handle bounded adversarial perturbations to the Hessian and gradient, providing graceful degradation under realistic threat models. We validate HF-KCU across convolutional (ResNet-18, SimpleCNN) and transformer (ViT-Lite) architectures on CIFAR-10, MNIST, and Fashion-MNIST. On CIFAR-10 under Dirichlet (alpha=0.5) partitioning, HF-KCU achieves a 47.75x speedup over retraining while maintaining test accuracy within 0.60 percentage points of the retrained baseline (71.16% vs. 71.76%). Membership inference attacks on the forget set yield a success rate of 0.499, matching the retrained model and confirming effective privacy restoration. We provide convergence guarantees showing that the Krylov approximation error decreases as O((κ^{1/2} - 1)/(κ^{1/2} + 1)), where κ is the Hessian condition number. The causal weighting mechanism ensures surgical updates, where only clients holding deleted data are modified, preserving model quality for unaffected participants and avoiding the instability of gradient-based approaches in asynchronous federated settings. This design provides interpretability, as each update is directly traceable to the influence of the deleted data. The method's efficiency and precision make it suitable for production federated systems where deletion requests arrive asynchronously and computational budgets are constrained.
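The Krylov-subspace step mentioned in this abstract can be illustrated with a small conjugate gradient solver that approximates H^{-1} g using only Hessian-vector products. This is a minimal sketch, not the authors' HF-KCU implementation: the names `conjugate_gradient` and `hvp`, the toy 2x2 Hessian, and the stopping tolerance are assumptions made for illustration.

```python
def conjugate_gradient(hvp, g, k, tol=1e-10):
    """Approximately solve H x = g using at most k Hessian-vector products.

    hvp is a hypothetical callable returning H @ v for a symmetric
    positive-definite H; the matrix itself is never inverted.
    """
    x = [0.0] * len(g)
    r = list(g)                      # residual g - H x for the start x = 0
    p = list(r)                      # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(k):
        if rs_old < tol:             # residual already negligible
            break
        hp = hvp(p)
        alpha = rs_old / sum(pi * hi for pi, hi in zip(p, hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy data (assumed values): H = [[4, 1], [1, 3]], g = [1, 2].
H = [[4.0, 1.0], [1.0, 3.0]]


def hvp(v):
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]


x = conjugate_gradient(hvp, [1.0, 2.0], k=2)  # exact solution is [1/11, 7/11]
```

The solver touches H only through k products, so its cost is governed by the iteration budget k rather than by a full matrix inversion.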

2605.19674 2026-06-09 cs.AI Version update

Beyond Rational Illusion: Behaviorally Realistic Strategic Classification

Xinpeng Lv, Yunxin Mao, Renzhe Xu, Chunyuan Zheng, Yikai Chen, Haoxuan Li, Yang Shi, Jinxuan Yang, Zhouchen Lin, Yuanlong Chen, Yuanxing Zhang, Shaowu Yang, Wenjing Yang, Haotian Wang

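Affiliations: University of Science and Technology of China

AI summary: This paper proposes a behaviorally realistic strategic classification framework grounded in prospect theory, addressing strategic manipulation by real-world decision-makers who are subject to psychological biases.

Comments: Accepted by ICML 2026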

Abstract

Strategic classification (SC) studies the interaction between decision models and agents who strategically manipulate their features for favorable outcomes. Existing SC frameworks typically rely on the idealized assumption that agents are strictly rational. However, evidence from behavioral economics and psychology consistently shows that real-world decision-making is often shaped by cognitive biases, deviating from pure rationality. To formalize this limitation, we identify and define a new problem setting, termed the behaviorally realistic strategic classification problem, where agents' strategic manipulations deviate from full rationality due to psychological biases. To address it, we propose the Prospect-Guided Strategic Framework (Pro-SF), a principled framework grounded in prospect theory for modeling and learning under behaviorally realistic strategic responses. Specifically, to capture behaviorally realistic strategic manipulations, our framework reformulates the Stackelberg-style interaction between agents and the decision-maker by incorporating three key mechanisms inspired by prospect theory: the asymmetry between benefits and costs, different subjective reference points, and non-rational probability distortion. Experiments on synthetic and real-world datasets establish Pro-SF as a behaviorally grounded approach to strategic classification, bridging machine learning and behavioral economics for more reliable deployment in the real world.
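The three mechanisms named in this abstract have standard counterparts in prospect theory, sketched below. The value and probability-weighting functions follow the commonly cited Tversky-Kahneman forms with their usual parameter estimates (0.88, 2.25, 0.61). Whether Pro-SF uses these exact forms is not stated in the abstract, so the function names, parameters, and the `manipulates` decision rule are assumptions for illustration only.

```python
def value(x, reference=0.0, alpha=0.88, loss_aversion=2.25):
    """Subjective value of outcome x relative to a reference point.

    Losses are scaled by loss_aversion, so they loom larger than equal gains.
    """
    delta = x - reference
    if delta >= 0:
        return delta ** alpha
    return -loss_aversion * ((-delta) ** alpha)


def weight(p, gamma=0.61):
    """Inverse-S probability distortion: small p is overweighted, large p underweighted."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))


def manipulates(benefit, cost, p_success, reference=0.0):
    """Hypothetical agent rule: manipulate when the distorted gain exceeds the felt loss."""
    gain = weight(p_success) * value(benefit, reference)
    loss = value(-cost, reference)   # a negative number
    return gain + loss > 0
```

Under these assumed parameters, an agent facing a certain benefit equal to its cost declines to manipulate, whereas a strictly rational agent would be indifferent.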

2605.19662 2026-06-09 cs.AI Version update

When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

Xinpeng Lv, Yunxin Mao, Renzhe Xu, Chunyuan Zheng, Yikai Chen, Haoxuan Li, Jinxuan Yang, Kun Kuang, Yuanlong Chen, Mingyang Geng, Wanrong Huang, Shixuan Liu, Shaowu Yang, Wenjing Yang, Zhouchen Lin, Haotian Wang

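Affiliations: University of Science and Technology of China; Tsinghua University

AI summary: This paper studies how well tabular foundation models generalize to strategic tabular data and proposes SPN, a strategy-aware prior alignment framework that improves robustness and predictive performance in strategic environments.

Comments: Accepted by ICML 2026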

Abstract

Tabular foundation models based on pretrained prior-data fitted networks (PFNs) have shown strong generalization on diverse tabular tasks, but they are typically designed for non-strategic settings where data distributions are independent of deployed classifiers. In many real-world decision scenarios, however, individuals may strategically modify their features after deployment to obtain favorable outcomes, inducing a post-deployment distribution shift. This paper studies whether PFN-style tabular foundation models can generalize to such strategic tabular data. We show that strategic manipulation creates a mismatch between the non-strategic prior learned during pretraining and the post-manipulation strategic prior, which leads to systematic prediction bias. To address this issue, we propose the Strategic Prior-data Fitted Network (SPN), an inference-time strategy-aware framework that adapts tabular foundation models to strategic environments without retraining. SPN constructs strategic in-context examples to approximate post-manipulation inputs and aligns PFN predictions with the induced strategic distribution. Experiments on real-world and synthetic tabular datasets show that SPN consistently improves robustness and predictive performance under strategic manipulation compared with both tabular foundation models and classical tabular methods.

2605.19266 2026-06-09 cs.CL cs.AI Version update

FormalASR: End-to-End Spoken Chinese to Formal Text

Wanyi Ning, Yinshang Guo, Haitao Qian, Jiyuan Cheng, Weiyuan Feng, Yufei Zhang

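Affiliations: arXiv

AI summary: This paper proposes FormalASR, an end-to-end model that converts spoken Chinese into formal written text. By building large-scale spoken-to-formal datasets and fine-tuning Qwen3-ASR, it achieves up to 37.4% relative CER reduction over verbatim baselines while also improving ROUGE-L and BERTScore, providing a lightweight on-device solution.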

Abstract

Automatic speech recognition (ASR) systems are typically optimized for verbatim transcription, which preserves disfluencies, filler words, and informal spoken structures that are often unsuitable for downstream writing-oriented applications. A common workaround is a two-stage ASR+LLM pipeline for post-editing, but this design increases latency and memory cost and is difficult to deploy on-device. We present FormalASR, two compact end-to-end models (0.6B and 1.7B) that directly transcribe spoken Chinese into formal written text. To enable this setting, we build WenetSpeech-Formal and Speechio-Formal, two large-scale spoken-to-formal datasets constructed by LLM-based rewriting and quality filtering. We then fine-tune Qwen3-ASR at two scales (0.6B and 1.7B) with supervised fine-tuning. Experiments on WenetSpeech-Formal and Speechio-Formal show that FormalASR achieves up to 37.4% relative CER reduction over verbatim baselines, while also improving ROUGE-L and BERTScore. FormalASR requires no post-processing LLM at deployment time, providing a lightweight, on-device solution for spoken-to-formal transcription.
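The headline metric in this abstract, relative CER reduction, can be made concrete with a short sketch. Character error rate is the character-level edit distance divided by the reference length, and the relative reduction compares two CER values. The function names and the example sentence are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # delete a reference character
                            curr[j - 1] + 1,           # insert a hypothesis character
                            prev[j - 1] + (r != h)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of hyp against a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


# Hypothetical example: a verbatim transcript keeps a filler character
# that the formal reference omits, which counts as one insertion.
verbatim_cer = cer("我们明天开会", "嗯我们明天开会")
```

Scored against a formal reference, every retained filler or disfluency in a verbatim transcript counts as an error, which is why verbatim baselines start from a higher CER on this task.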

2605.19228 2026-06-09 cs.CL cs.AI cs.IT cs.LG math.IT Version update

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

Xiaoou Liu, Tiejin Chen, Dengjia Zhang, Yaqing Wang, Lu Cheng, Hua Wei

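Affiliations: University of Science and Technology of China

AI summary: This paper proposes Stepwise Confidence Attribution (SCA) for diagnosing failures of black-box large language models in multi-step reasoning. It scores the confidence of generated reasoning traces using the Information Bottleneck principle and validates the approach on mathematical reasoning and multi-hop question answering tasks.

Comments: Accepted by ICML 2026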

Abstract

Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing methods are restricted to final answers or require internal model access. In this paper, we introduce Stepwise Confidence Attribution (SCA), a framework for closed-source LLMs that assigns step-level confidence based only on generated reasoning traces. SCA applies the Information Bottleneck principle: steps aligning with consensus structures across correct solutions receive high confidence, while deviations are flagged as potentially erroneous. We propose two complementary methods: (1) NIBS, a non-parametric IB approach measuring consistency without graph structures, and (2) GIBS, a graph-based IB model that learns subgraphs through a differentiable mask to capture logical variability. Extensive experiments on mathematical reasoning and multi-hop question answering show that SCA reliably identifies low-confidence steps strongly correlated with reasoning errors. Moreover, using step-level confidence to guide self-correction improves the correction success rate by up to 13.5% over answer-level feedback.

2605.18643 2026-06-09 cs.LG cs.AI cs.CL Version update

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Xingtai Lv, Li Sheng, Kaiyan Zhang, Yichen You, Siyan Gao, Xueheng Luo, Yuxin Zuo, Yuchen Fan, Junlin Yang, Ganqu Cui, Bingning Wang, Fan Yang, Youbang Sun, Ning Ding, Bowen Zhou

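Affiliations: Frontis.AI; Kuaishou Technology; Shanghai AI Lab; TsinghuaC3I/ZEDA

AI summary: This paper proposes ZEDA, a framework that uses self-distillation to convert post-trained static MoE models into efficient dynamic MoE models, substantially reducing expert FLOPs and speeding up inference.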

Abstract

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE models underexplored. Enabling such adaptation would directly reduce inference costs by allowing easy tokens to bypass unnecessary experts during serving. This paper introduces Zero-Expert Self-Distillation Adaptation (ZEDA), a low-cost framework that transforms post-trained static MoE models into efficient dynamic ones. To stabilize this architectural conversion, ZEDA injects parameter-free zero-output experts into each MoE layer and adapts the augmented model through two-stage self-distillation, utilizing the original MoE as a frozen teacher and applying a group-level balancing loss. On Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks spanning math, code, and instruction following, ZEDA eliminates over 50% of expert FLOPs at marginal accuracy loss. It outperforms the strongest dynamic MoE baseline by 6.1 and 4.0 points on the two models, and delivers an end-to-end inference speedup of about 1.20x.
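One way to picture the parameter-free zero-output experts in this abstract is as extra router slots that contribute nothing when selected, so a token routed to them skips the corresponding expert computation. The sketch below assumes a softmax router with top-k selection and no weight renormalization. These choices, and the names `route` and `moe_layer`, are illustrative assumptions rather than details confirmed by the abstract.

```python
import math


def route(logits, num_real, top_k):
    """Select the top_k router slots and keep only those backed by real experts.

    Slots with index >= num_real are hypothetical zero-output experts:
    choosing one uses up a top-k slot but triggers no expert computation.
    """
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [(i, probs[i]) for i in chosen if i < num_real]


def moe_layer(x, experts, logits, top_k):
    """Weighted sum over the selected real experts; zero experts add nothing."""
    out = [0.0] * len(x)
    active = route(logits, len(experts), top_k)
    for idx, w in active:
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, len(active)   # second value counts the experts actually executed
```

When the router prefers the zero slots for an easy token, the number of executed experts falls below `top_k`, which is the source of the FLOP savings in this simplified picture.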

2605.06317 2026-06-09 cs.CV cs.AI Version update

NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps

Dijia Zhan, Jinyi Li, Chenxi Zheng, Shaoyu Huang, Yong Li, Jie Tang, Xuemiao Xu

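Affiliations: South China University of Technology

AI summary: This paper proposes a vision-language navigation method based on top-down maps. Its NavOne framework performs single-step global path planning over multi-modal maps, substantially improving navigation efficiency and performance.

Comments: 10 pages, 7 figures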

Abstract

Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path planning problem on pre-built top-down maps, supported by our newly constructed R2R-TopDown dataset. To solve this, we introduce NavOne, a unified framework that directly predicts dense path probabilities over multi-modal maps in a single end-to-end forward pass. NavOne features a Top-Down Map Fuser for joint multi-modal map representation, and extends Attention Residuals for spatial-aware depth mixing. Extensive experiments on R2R-TopDown show that NavOne achieves state-of-the-art performance among map-based VLN methods, with a planning-stage speedup of 8x over existing map-based baselines and 80x over egocentric methods, enabling highly efficient global navigation.

2605.17609 2026-06-09 cs.LG Version update

Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification

Shaddin Dughmi, Mahdi Haghifam, Yusuf Hakan Kalayci

Affiliations: University of Southern California; Northwestern University; University of Chicago; Toyota Technological Institute at Chicago; Simons Institute for the Theory of Computing; Data Science Institute at the University of Chicago

AI summary: This paper proposes an adaptive generate-rank-verify method that adaptively generates and verifies candidate answers under an unknown distribution to find a positive example at bounded cost, and supports it with theoretical analysis and experiments on mathematical reasoning and competitive programming.

Comments: 33 pages, 6 figures, 4 tables. Changes compared to v1: updated the related work section

Abstract

Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier labels until it finds a positive example. For a fixed prompt, the generator and reward model induce two unknown objects: a distribution over reward scores and a score-conditioned success function. When these quantities are known, we characterize the distribution-aware optimal policy using a dynamic programming approach. In the realistic and practical setting where both the score distribution and success function are unknown, we propose ADAP, a shellwise adaptive generate-rank-verify algorithm that progressively increases the number of sampled responses and top-ranked verifications. Under the monotonicity assumption that higher reward scores are no less likely to pass verification, we show that ADAP achieves expected cost within a constant factor of the distribution-aware optimum. We complement this result with learning-theoretic lower bounds, based on a centered star number, showing that structural assumptions on the score-label relationship are necessary. Experiments on mathematical reasoning and competitive programming validate the predicted advantage over both fixed non-adaptive policies and difficulty-adaptive baselines.

2605.17301 2026-06-09 cs.CL cs.AI Version update

ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation

Chenyu Wang, Yueyuan Li, Yingmin Liu, Yang Shu

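Affiliations: Zhejiang University

AI summary: This work proposes ConflictRAG, a framework that detects and resolves knowledge conflicts in retrieval-augmented generation through a two-stage conflict detection module, an Entropy-TOPSIS framework, and a Conflict-Aware RAG Score. Experiments show that it outperforms existing methods in conflict-detection F1 and correctness.

Comments: 6 pages, 6 figures, submitted to IEEE SMC 2026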

Abstract

Retrieval-Augmented Generation (RAG) systems implicitly assume mutual consistency among retrieved documents, an assumption that frequently fails in practice. We present ConflictRAG, a conflict-aware RAG framework that detects, classifies, and resolves knowledge conflicts prior to answer generation. The framework introduces three contributions: (1) a two-stage conflict detection module combining a lightweight embedding-based MLP classifier with selective LLM refinement, reducing API costs by 62% while maintaining 90.8% detection accuracy; (2) an Entropy-TOPSIS framework for data-driven source credibility assessment, improving selection accuracy by 7.1% over manual heuristics; and (3) a Conflict-Aware RAG Score (CARS) for diagnostic evaluation of conflict-handling capabilities. Experiments on three benchmarks against six baselines demonstrate 88.7% conflict-detection F1 and consistent correctness gains of 5.3% to 6.1% over the strongest conflict-aware baseline, with the pipeline transferring effectively across backbone LLMs.
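Entropy-TOPSIS, named in contribution (2) of this abstract, is a standard pairing of two decision-analysis steps: entropy weighting derives criterion weights from the data, and TOPSIS ranks alternatives by closeness to an ideal point. The sketch below shows the textbook version. The credibility criteria, the example scores, and the treatment of every criterion as a benefit are assumptions, and the paper's actual feature set is not given in the abstract.

```python
import math


def entropy_weights(matrix):
    """Weight each criterion by how much it varies across sources (lower entropy, more weight).

    Assumes positive entries and at least one criterion that is not constant.
    """
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]


def topsis(matrix, weights):
    """Closeness of each row to the ideal row; all criteria are treated as benefits."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(r[j] for r in v) for j in range(n)]
    worst = [min(r[j] for r in v) for j in range(n)]
    scores = []
    for r in v:
        d_best, d_worst = math.dist(r, best), math.dist(r, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical sources scored on three made-up criteria
# (authority, recency, agreement with other documents).
sources = [[0.9, 0.8, 0.7], [0.4, 0.9, 0.2], [0.2, 0.1, 0.1]]
weights = entropy_weights(sources)
scores = topsis(sources, weights)
```

Because the weights come from the observed spread of each criterion, no hand-tuned importance values are needed, which matches the "data-driven" framing in the abstract.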

2605.17289 2026-06-09 cs.LG cs.AI Version update

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi

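Affiliations: University of Maryland

AI summary: This paper proposes LEAP, a learnable end-to-end unstructured pruning method that replaces the conventional parameterization with a Bernoulli-via-Gumbel-sigmoid relaxation, improving the end-to-end accuracy of unstructured pruning. Experiments show average zero-shot accuracy gains across multiple LLM families.

Comments: Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference (AdaptFM)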

Abstract

Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not port to the unstructured setting. We introduce LEAP, which replaces this intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by +2.59 points on average over ADMM, the best layer-wise baseline in our sweep.
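The per-weight Bernoulli-via-Gumbel-sigmoid relaxation mentioned in this abstract can be sketched in a few lines. Each weight has a keep-logit; adding logistic noise (the difference of two Gumbel draws) and applying a tempered sigmoid gives a soft sample that would be differentiable in the logit inside an autodiff framework. This is a generic illustration of the relaxation, not LEAP's training code: the function names, the temperature default, and the 0.5 threshold are assumptions.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gumbel_sigmoid(logit, tau=1.0, rng=random):
    """Relaxed Bernoulli sample for one weight's keep-logit.

    The difference of two Gumbel(0, 1) draws is Logistic(0, 1) noise,
    sampled here as log(u) - log(1 - u) for uniform u.
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit + noise) / tau)


def sample_mask(logits, tau=1.0, rng=random):
    """Return the soft mask and the hard 0/1 mask obtained by thresholding at 0.5."""
    soft = [gumbel_sigmoid(l, tau, rng) for l in logits]
    hard = [1 if s > 0.5 else 0 for s in soft]
    return soft, hard
```

In this construction the hard mask keeps a weight with probability sigmoid(logit), so the number of mask parameters grows with the number of weights rather than with the number of valid mask patterns.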

2605.16928 2026-06-09 cs.CL cs.AI Version update

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Yanke Zhou, Yiduo Li, Hanlin Tang, Maohua Li, Kan Liu, Tao Lan, Lin Qu, Yuan Yao, Xiaoxing Ma

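Affiliations: Nanjing University; Alibaba Group

AI summary: This paper proposes RTPurbo, which exploits a model's intrinsic sparsity to obtain efficient sparse attention within a small number of training steps, substantially improving inference efficiency while keeping accuracy nearly lossless.

Comments: 20 pages, 9 figures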

Abstract

Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesirable trade-off among efficiency, training cost, and accuracy. In this work, we show that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only minimal adaptation. Our approach is built on three observations: (1) only a small subset of attention heads truly requires full long-context processing; (2) long-range retrieval is governed primarily by a low-dimensional subspace, allowing relevant tokens to be retrieved efficiently with a 16-dimensional indexer; and (3) the useful token budget is strongly query-dependent, making dynamic top-p selection more suitable than fixed top-k sparsification. Based on these insights, we propose RTPurbo, which retains the full KV cache only for retrieval heads and introduces a lightweight token indexer for sparse attention. By exploiting the model's intrinsic sparsity, RTPurbo achieves sparsification with only a few hundred training steps. Experiments on long-context benchmarks and reasoning tasks show that RTPurbo preserves near-lossless accuracy while delivering substantial efficiency gains, including up to a 9.36x prefill speedup at 1M context and about a 2.01x decode speedup. These results suggest that strong sparse inference can be obtained from standard full-attention training without expensive native sparse pretraining.
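Observation (3) in this abstract contrasts dynamic top-p selection with fixed top-k. A minimal sketch of top-p selection over per-token relevance scores follows. The scores stand in for the output of the token indexer, and the function name and the default threshold are assumptions for illustration.

```python
import math


def top_p_tokens(scores, p=0.9):
    """Return indices of the smallest token set whose softmax mass reaches p."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    order = sorted(range(len(scores)), key=lambda i: exps[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += exps[i] / total
        if mass >= p:            # enough probability mass is covered
            break
    return sorted(kept)
```

A sharply peaked query keeps very few tokens while a diffuse query keeps many, so the budget adapts to the query where a fixed top-k would not.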

2605.16823 2026-06-09 cs.LG Version update

VQ-Atom: Semantic Discretization of Local Atomic Environments for Molecular Representation Learning

Takayuki Kimura

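Affiliations: Atoms as Language, LLC

AI summary: This paper proposes VQ-Atom, a semantic discretization framework for molecular representation learning that converts continuous atom-level graph representations into discrete tokens corresponding to local chemical environments, improving the learning of molecular representations.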

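AI-generated abstract (translated from Chinese)

Molecular representation learning has become a core approach in AI-driven drug discovery, but existing molecular tokenizations such as SMILES remain primarily syntactic and do not naturally align with chemically meaningful substructures. In this paper, we introduce VQ-Atom, a semantic discretization framework that converts continuous atom-level graph representations into discrete tokens corresponding to local chemical environments. Using graph neural network embeddings and vector quantization, atoms are assigned to codebook entries that represent chemically meaningful atomic contexts. These discrete tokens define a molecular language suitable for Transformer-based pretraining. We evaluate VQ-Atom on protein-ligand interaction prediction under a protein-cold split, without relying on 3D structural information. Experimental results show that VQ-Atom consistently improves predictive performance over conventional tokenization methods, indicating that semantically grounded discretization can substantially enhance molecular representation learning. Our findings suggest that tokenization design itself plays a key role in enabling effective language modeling in the chemical domain.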

Abstract

Large language models succeed by combining large-scale pretraining with meaningful discrete tokens. In molecular machine learning, SMILES is widely used as a token representation, but it is primarily a linearization format for molecular graphs rather than a semantic decomposition of chemistry. We propose VQ-Atom, a semantic tokenization framework that assigns discrete atom-level tokens based on local chemical environments via vector quantization. Unlike SMILES tokens, VQ-Atom tokens encode graph-local chemical context and are aligned with molecular structure. On protein-cold drug-target interaction prediction using the KIBA dataset, VQ-Atom improves global ranking performance, achieving an AUROC of 0.79 and substantially outperforming both SMILES-based and continuous molecular representations under an identical downstream architecture. Furthermore, VQ-Atom enables approximately 3 times faster downstream training than continuous atom-level representations by replacing per-atom continuous features with reusable discrete tokens. These results suggest that molecular tokenization is not merely a preprocessing step, but a central design choice. In particular, well-structured tokens can encode substantial chemical semantics, reducing the burden on downstream learning. VQ-Atom can be interpreted as defining a molecular language, where tokens correspond to chemically meaningful atomic environments, suggesting that token design may constitute an additional axis of machine learning research alongside architecture, objectives, and optimization.
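The vector quantization step in this abstract amounts to a nearest-neighbor lookup in a learned codebook. The sketch below shows only that assignment step. The two-dimensional embeddings and the three-entry codebook are toy values, and how VQ-Atom trains its encoder and codebook is not described in the abstract, so everything here is an assumption for illustration.

```python
def quantize(embedding, codebook):
    """Return the index of the codebook vector nearest to the embedding."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(embedding, codebook[k]))


def tokenize_atoms(atom_embeddings, codebook):
    """Map each atom's continuous embedding to a discrete token id."""
    return [quantize(e, codebook) for e in atom_embeddings]


# Toy codebook and per-atom embeddings (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
tokens = tokenize_atoms([[0.1, -0.1], [0.9, 1.2], [4.0, 6.0]], codebook)
```

Atoms whose embeddings fall near the same codebook entry share a token id, which is what makes the tokens reusable across molecules.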

2605.16309 2026-06-09 cs.AI cs.LG cs.MA Version update

ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

Safayat Bin Hakim, Keyan Guo, Wenkai Tan, Alvaro Velasquez, Shouhuai Xu, Houbing Herbert Song

Affiliations: University of Maryland, Baltimore County; University at Buffalo; University of Colorado Boulder; University of Colorado Colorado Springs

AI summary: ANNEAL adapts LLM agents through governed symbolic patch learning to address recurring failures. Its core mechanism, FDKA, localizes the responsible operator and generates typed patches, achieving persistent structural repairs and outperforming existing methods.

Comments: Code implementation: https://github.com/sbhakim/anneal-agents

Abstract

LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge (operator schemas, preconditions, and constraints) remains unrepaired. Existing self-evolving approaches address this gap by updating prompts, memory, or model weights, but none directly repair the symbolic structures that encode how tasks are executed, and few provide the governance guarantees required for safe deployment. We introduce ANNEAL, a neuro-symbolic agent that converts recurring failures into governed symbolic edits of a process knowledge graph without modifying foundation model weights. Its core mechanism, Failure-Driven Knowledge Acquisition (FDKA), localizes the responsible operator, synthesizes a typed patch through constrained LLM generation, and validates the proposal via multi-dimensional scoring, symbolic guardrails, and canary testing before commit. Every accepted edit carries full provenance and deterministic rollback capability. Across four domains and 27 multi-seed runs, ANNEAL is the only evaluated system that commits persistent structural repairs: strong baselines such as ReAct and Reflexion achieve high episodic recovery yet retain 72-100% holdout failure rates on recurring faults, whereas ANNEAL reduces these to 0% in the tested recurring-failure settings. Ablation confirms that removing FDKA eliminates all structural repairs and drops success rate by up to 26.7 percentage points. These results suggest that governed symbolic repair offers a complementary paradigm to weight-level and prompt-level adaptation for persistent fault elimination.

2605.15690 2026-06-09 cs.LG Version update

FRWKV+: Periodic-Aware Adaptive Gating for Frequency-Space Linear Time Series Forecasting

Qingyuan Yang, Dongyue Chen, Da Teng, Junhua Xiao, Jiaji Pan, Shizhuo Deng

Affiliations: College of Information Science and Engineering, Northeastern University; Foshan Graduate School of Innovation, Northeastern University; National Frontiers Science Center for Industrial Intelligence and Systems Optimization

AI summary: This paper proposes FRWKV-Plus, which introduces a cross-branch spectral gate and a trust-gated residual correction to improve the accuracy and efficiency of frequency-space time series forecasting. Experiments show strong results on multiple benchmark datasets.

Abstract

Accurate and efficient long-term multivariate time series forecasting requires capturing recurring temporal structure while keeping inference cheap across many variables and horizons. Frequency-space models represent long-range and periodic variation compactly, but they typically process the real and imaginary spectral components as weakly coupled streams and treat periodic cues as ordinary input features, even when such cues are unreliable. This paper proposes FRWKV-Plus, a lightweight periodic-aware frequency-space forecasting model built on the efficient FRWKV backbone. FRWKV-Plus introduces a cross-branch spectral gate that reweights each spectral branch using a summary of its sibling branch, and a trust-gated residual correction that converts compact within-period context into a bounded, sign-flexible adjustment of these gates under a learned, data-dependent trust score. By construction, the correction is identity-preserving at initialization and strictly bounded, so periodic evidence can refine but never dominate or invert the base interaction. On seven standard benchmarks, FRWKV-Plus is consistently competitive with strong linear, frequency-domain, recurrent-style, and Transformer-based forecasters while preserving the lightweight profile of the backbone. Controlled three-seed ablations show that each component contributes, that the benefit is modest on strongly periodic data and pronounced on the harder Exchange and ILI datasets, and that the within-period context is the most influential single component. The implementation is publicly available at https://github.com/yangqingyuan-byte/FRWKV-plus.

2605.15491 2026-06-09 cs.LG cs.AI cs.PF Version update

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

Ghosted Layers: 无约束激活对齐用于恢复层剪枝的LLM

Vincent-Daniel Yun, Junhyuk Jo, Sai Praneeth Karimireddy, Sunwoo Lee

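Affiliations * University of Southern California; Inha University

AI summary This paper proposes Ghosted Layers, which uses unconstrained optimization to address the activation distribution mismatch that follows layer pruning, improving LLM accuracy and perplexity without sacrificing efficiency.

Details
AI Chinese abstract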

层剪枝从大型语言模型中移除整个Transformer解码器块,但导致后续存活层接收到的隐藏状态分布与训练时分布不匹配,从而引起显著性能下降。我们提出Ghosted Layers,一种无需训练的恢复模块,通过解决边界激活对齐问题来解决此问题。我们的方法从少量校准集推导出闭合形式的最优线性算子,以重建由剪枝层引入的激活差异。我们展示该解决方案对应于对齐目标的无约束最优解,而现有方法受限于有限算子子空间内的约束解。在多个LLM backbone和剪枝策略上的实验表明,我们的方法在保持层剪枝效率增益的同时,一致提升了准确性和perplexity,优于先前的无训练基线。官方代码仓库:https://github.com/daniel-eai/ghosted_layers_official_repository/.

English abstract

Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a mismatch between the hidden state received by the next surviving layer and the distribution it was trained to process, leading to significant performance degradation. We propose Ghosted Layers, a training-free recovery module that addresses this issue by solving a boundary activation alignment problem. Our method derives a closed-form optimal linear operator from a small calibration set to reconstruct the activation discrepancy introduced by the pruned layers. We show that this solution corresponds to the unconstrained optimum of the alignment objective, whereas existing methods are restricted to constrained solutions over limited operator subspaces. Experiments across multiple LLM backbones and pruning strategies demonstrate that our method consistently improves accuracy and perplexity over prior training-free baselines, while preserving the efficiency gains of layer pruning. Official code repository: https://github.com/daniel-eai/ghosted_layers_official_repository/.
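
A closed-form linear operator fitted on a calibration set can be illustrated with ordinary least squares. The sketch below is a generic instance of that idea under stated assumptions, not the paper's exact objective: it assumes the operator maps the hidden state entering the pruned block to the residual that the pruned block would have added, and it adds a tiny ridge term (my addition) for numerical safety. The names `solve`, `fit_linear_operator`, and `apply_operator`, and the toy calibration data, are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    `a` is an n x n list of rows and `b` an n x k list of rows.
    """
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        head = m[col][col]
        m[col] = [v / head for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_linear_operator(inputs, targets, ridge=1e-8):
    """Unconstrained least squares: W = (X^T X + ridge * I)^-1 X^T Y."""
    d, k = len(inputs[0]), len(targets[0])
    gram = [[sum(x[i] * x[j] for x in inputs) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    cross = [[sum(x[i] * y[j] for x, y in zip(inputs, targets))
              for j in range(k)] for i in range(d)]
    return solve(gram, cross)


def apply_operator(x, w):
    """Row vector times matrix: returns x @ W."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


# Toy calibration set: hidden states before the pruned block, and the
# residual the pruned block would have contributed (here exactly linear).
calib_inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
calib_residuals = [[2.0, 0.0], [1.0, -1.0], [3.0, -1.0], [3.0, 1.0]]
w = fit_linear_operator(calib_inputs, calib_residuals)
```

Because `W` is a full matrix with no structural restriction, the fit is the unconstrained optimum of this toy objective; restricting `W` to, say, a scalar multiple of the identity would give one of the constrained alternatives the abstract contrasts against.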

2605.15466 2026-06-09 cs.CV Version update

Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction

以实体为中心的世界模型:交互感知的掩码用于因果视频预测

Santosh Kumar Paidi

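Affiliations * Genentech, Inc.

AI summary This paper proposes IA-JEPA, which uses a self-supervised, motion-centric masking strategy to prioritize physical interactions, improving accuracy on causal reasoning tasks, and shows that the approach generalizes to real-world human actions and physical puzzles.

Comments 12 pages, 4 figures

Details
AI Chinese abstract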

从未标记视频中学习预测性世界模型是人工智能的基础挑战。尽管联合嵌入预测架构(JEPA)在语义分类中设定了新基准,但它们往往缺乏物理感知,无法捕捉下游推理所需的因果动态。我们假设这源于标准的基于块的掩码策略,这些策略优先考虑视觉纹理而非罕见但信息丰富的运动事件。我们提出交互感知JEPA(IA-JEPA),利用自监督的运动中心掩码策略,优先考虑物理交互。通过专门针对碰撞或动量转移的实体,我们迫使架构重建潜在轨迹而非静态背景特征。在CLEVRER基准上评估,IA-JEPA在因果推理任务中达到14.26%的准确率,显著高于标准块掩码基线的3.22%。关键的是,我们证明IA-JEPA通过诱导更高熵、更具判别性的潜在空间(+10%熵增)打破了标准自监督的“静态偏见”,并线性化物理能量(R²=0.43)。我们展示这种交互偏见可推广到真实世界的人类动作(Something-Something V2)和零样本物理谜题(PHYRE-Lite)。我们的结果提供了一条可扩展的、完全自监督的路径,以构建开始内部化物理世界因果结构的基础世界模型。

English abstract

Learning predictive world models from unlabelled video is a foundational challenge in artificial intelligence. While Joint Embedding Predictive Architectures (JEPA) have set new benchmarks in semantic classification, they often remain physics-blind, failing to capture the causal dynamics necessary for downstream reasoning. We hypothesize that this stems from standard patch-based masking strategies, which prioritize visual texture over rare but informative kinematic events. We propose Interaction-Aware JEPA (IA-JEPA), which utilizes a self-supervised motion-centric masking strategy to prioritize physical interactions. By specifically targeting entities engaged in collisions or momentum transfers, we force the architecture to reconstruct latent trajectories rather than static background features. Evaluated on the CLEVRER benchmark, IA-JEPA achieves 14.26% accuracy on causal reasoning tasks, a significant lead over the 3.22% achieved by standard patch-masked baselines. Crucially, we demonstrate that IA-JEPA breaks the "static bias" of standard self-supervision by inducing a higher-entropy, more discriminative latent space (+10% entropy gain) that linearizes physical energy ($R^2=0.43$). We show that this interaction bias generalizes to real-world human actions (Something-Something V2) and zero-shot physical puzzles (PHYRE-Lite). Our results provide a scalable, fully self-supervised path toward building foundational world models that begin to internalize the causal structure of the physical world.

2605.14531 2026-06-09 cs.CL Version update

Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

语言生成作为最优控制:潜在控制空间中的闭环扩散

ZiYi Dong, Yuliang Huang, Weijian Deng, Xiangyang Ji, Liang Lin, Pengxu Wei

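Affiliations * University of Science and Technology of China

AI summary This paper reformulates language generation as a stochastic optimal control problem, analyzes autoregressive and diffusion models from a unified theoretical perspective, explains their limitations, and proposes a flow-matching-based closed-loop controller for efficient text generation.

Details
AI Chinese abstract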

本工作将语言生成重新表述为随机最优控制问题,提供统一的理论视角来分析自回归和扩散模型,并解释其局限性(效率-保真度悖论、不可逆误差传播、优化可行性与保真度)在轨迹奇异性、共轭状态消失和梯度缺失的组合下的表现。为解决这些问题,我们近似求解哈密顿-雅可比-贝尔曼(HJB)方程,得到一个作为闭环控制器的最优策略。为避免直接求解HJB PDE的不可行性,我们采用流匹配作为最优轨迹求解器,在校正的潜在控制空间中。这使我们的Manta-LM配备全局积分算子能够近似全局向量场,从而实现同时实现高保真文本生成和高效、低成本并行采样的模型。实验表明,我们的方法在语言建模和条件生成任务中表现强劲,同时表现出改进的稳定性、效率和可控性。

English abstract

This work reformulates language generation as a stochastic optimal control problem, providing a unified theoretical perspective from which to analyze autoregressive and diffusion models and to explain their limitations (Efficiency-Fidelity Paradox, Irreversibility Error Propagation, Optimization Tractability and Fidelity) in terms of a combination of trajectory singularity, adjoint state vanishing, and gradient absence. To address these issues, we approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation, yielding an optimal policy that acts as a closed-loop controller. To bypass the intractability of directly solving the HJB PDE, we employ Flow Matching as the optimal trajectory solver within the rectified latent control space. This allows our Manta-LM, with its Global Integral Operator, to approximate the global vector field, yielding a model that simultaneously achieves high-fidelity text generation and efficient, low-cost parallel sampling. Empirically, our method achieves strong performance on language modeling and conditional generation tasks, while exhibiting improved stability, efficiency, and controllability.

2604.26498 2026-06-09 cs.LG q-bio.QM Version update

Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

大模型真的在药物发现中胜出吗?AI驱动的分子性质和活性预测中模型规模的基准评估

Jinjiang Guo, Sheng Ding

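Affiliations * Global Health Drug Discovery Institute; School of Pharmaceutical Sciences

AI summary Evaluating 26 ADME, toxicity, and bioactivity endpoints, this paper finds that classical machine learning performs best on most tasks, that larger models show limited competitiveness on some of the harder splits, and that performance depends on the fit between model, task, and validation scenario rather than on scale alone.

Comments Improved benchmark design and reproducibility, replaced restricted datasets with public benchmarks in primary analyses, and added sensitivity analyses supporting the interpretation of model scaling and evaluation protocol effects in molecular prediction

Details
AI Chinese abstract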

分子基础模型和大语言模型的快速发展促使人们以规模为中心看待AI在药物发现中的应用,认为更大的预训练模型将取代紧凑的化学信息学模型。我们测试了这一假设,涵盖26个ADME、毒性及生物活性端点,共165,541个端点级别化合物标签记录。基准测试包含78个端点和分割条目,通过随机、Murcko骨架和结构分离的5折交叉验证协议评估,代表递增的化学泛化难度。在156个任务和指标比较中,传统机器学习(ML)提供了最大的最佳表现份额(47.4%),其次是预训练分子序列模型(28.8%)、图神经网络(21.8%)和基于LLM的SAR基线(1.9%)。传统ML在随机分割插值中占优,并总体上是最大的胜利家族。GNN和序列模型在部分更难的分割中具有竞争力,但其严格胜利份额在固定最终窗口读取下减少,表明对训练设置和模型选择的敏感性。配对Bootstrap分析显示,模型间的小数值差异不应被视为决定性胜利。训练折叠中的SAR知识提高了GPT5.5-SAR和Opus4.7-SAR指标,但并未使基于规则的推理成为监督预测器的通用替代品。紧凑的专业模型仍高度有效,预测性能取决于模型、任务和验证场景之间的适配性,而非规模本身。

English abstract

The rapid growth of molecular foundation models and large language models (LLMs) has encouraged a scale-centred view of AI in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models. We test this assumption across 26 ADME, toxicity and bioactivity endpoints, covering 165,541 endpoint-level compound label records. The benchmark contains 78 endpoint and split entries evaluated under random, Murcko scaffold and structure-separated 5-fold cross-validation protocols, representing increasing chemical generalization difficulty. Across 156 task and metric comparisons, classical machine learning (ML) provides the largest share of best-performing entries (47.4%), followed by pretrained molecular sequence models (28.8%), graph neural networks (21.8%) and LLM-based SAR baselines (1.9%). Classical ML dominates random-split interpolation and remains the largest winner family overall. GNN and sequence models are competitive in selected harder splits, but their strict winner shares decrease under a fixed final-window readout, indicating sensitivity to training settings and model selection. Paired bootstrap analyses show that small numerical differences between individual models should not be read as decisive victories. SAR knowledge from training folds improves GPT5.5-SAR and Opus4.7-SAR metrics but does not make rule-based reasoning a universal substitute for supervised predictors. Compact specialized models remain highly effective, and predictive performance depends on the fit among model, task and validation scenario, not on scale alone.
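
The counts in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above. The per-family win counts and the "two metrics per entry" reading are inferred from those figures, not stated by the authors, and the short family labels are mine.

```python
endpoints = 26            # ADME, toxicity and bioactivity endpoints (from the abstract)
protocols = 3             # random, Murcko scaffold, structure-separated
entries = endpoints * protocols          # 78 endpoint-and-split entries
comparisons = 156                        # task-and-metric comparisons (from the abstract)
metrics_per_entry = comparisons / entries  # inferred, not stated in the abstract

# Reported shares of best-performing entries, in percent.
shares = {
    "classical ML": 47.4,
    "sequence models": 28.8,
    "graph neural networks": 21.8,
    "LLM-based SAR": 1.9,
}

# Inferred integer win counts that would produce the reported shares.
counts = {family: round(share / 100 * comparisons) for family, share in shares.items()}

# Shares recomputed from the inferred counts, rounded as in the abstract.
recomputed = {family: round(100 * n / comparisons, 1) for family, n in counts.items()}
```

Under these assumptions the inferred counts sum to exactly 156 and reproduce every reported percentage, so the quoted shares are internally consistent even though they add up to 99.9% after rounding.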

2605.13768 2026-06-09 cs.LG cs.AI cs.IT math.IT Version update

High-Rate Quantized Matrix Multiplication II

高速率量化矩阵乘法II

Or Ordentlich, Yury Polyanskiy

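Affiliations * Hebrew University of Jerusalem; MIT

AI summary This paper studies high-rate quantized matrix multiplication when the covariance matrix of the columns of the second factor is known, uses waterfilling to improve LLM quantization methods, and characterizes the performance of the WaterSIC scheme relative to the information-theoretic limit.

Details
AI Chinese abstract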

本文是关于量化矩阵乘法(MatMul)工作的第二部分。在第一部分中,我们考虑了无校准量化的情况,而在这里,我们讨论了在第二因子列协方差矩阵$Σ_X$已知的情况下的情形。这种情形出现在广泛应用的LLM后训练量化任务中。权重量化与加权均方误差(WMSE)源编码问题相关,其经典的(反向)水填充解决定了如何在向量的坐标之间分配速率。我们展示了如何利用水填充来改进实际的LLM量化算法(GPTQ),目前这些算法平均分配速率。最近的一种方案(称为``WaterSIC'')仅使用标量INT量化器进行分析,其高速率性能被证明为(a)基无关(即由$Σ_X$的行列式决定,因此不同于现有方案,不受随机旋转的影响);(b)在信息论极限下的性能与$\frac{2πe}{12}$(或0.25 bit/entry)的乘法因子内。GPTQ的性能受基的选择影响,但对于随机旋转和实际的$Σ_X$来自Llama-3-8B,我们发现其性能在0.1 bit(取决于层类型)以内,表明GPTQ结合随机旋转也接近最优,至少在高速率范围内。

English abstract

This is the second part of the work investigating quantized matrix multiplication (MatMul). In part I we considered the case of calibration-free quantization, whereas here we discuss the setting where the covariance matrix $Σ_X$ of the columns of the second factor is available. This setting arises in the ubiquitous task of weight-only post-training quantization of LLMs. Weight-only quantization is related to the problem of weighted mean squared error (WMSE) source coding, whose classical (reverse) waterfilling solution dictates how one should distribute rate between coordinates of the vector. We show how waterfilling can be used to improve practical LLM quantization algorithms (GPTQ), which at present allocate rate equally. A recent scheme (known as "WaterSIC") that only uses scalar INT quantizers is analyzed, and its high-rate performance is shown to be (a) basis-free (i.e., characterized by the determinant of $Σ_X$ and thus, unlike existing schemes, immune to random rotations); and (b) within a multiplicative factor of $\frac{2πe}{12}$ (or 0.25 bit/entry) of the information-theoretic distortion limit. GPTQ's performance, in turn, is affected by the choice of basis, but for a random rotation and actual $Σ_X$ from Llama-3-8B we find it to be within 0.1 bit (depending on the layer type) of WaterSIC, suggesting that GPTQ with random rotation is also near optimal, at least in the high-rate regime.
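
For readers unfamiliar with reverse waterfilling, the sketch below shows the textbook version for independent Gaussian components under plain MSE: components with variance above a common "water level" receive rate, the rest receive none. It illustrates the classical allocation rule the abstract refers to and is not the paper's WMSE scheme or WaterSIC itself; the function name, the bisection search, and the example variances are assumptions. The last line checks the 0.25 bit/entry figure quoted for the $\frac{2πe}{12}$ factor.

```python
import math


def reverse_waterfill(variances, total_rate, iters=200):
    """Textbook reverse waterfilling for independent Gaussian components.

    Finds the water level theta such that the per-component rates
    0.5 * log2(var / theta), taken only where var > theta, sum to total_rate.
    Returns (theta, rates) with rates in bits.
    """
    def rate_at(theta):
        return sum(0.5 * math.log2(v / theta) for v in variances if v > theta)

    lo, hi = 0.0, max(variances)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rate_at(mid) > total_rate:
            lo = mid   # spending too many bits: raise the water level
        else:
            hi = mid   # within budget: try a lower water level
    theta = hi
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    return theta, rates


# Three components with unequal variances and a 2-bit total budget.
theta, rates = reverse_waterfill([4.0, 1.0, 0.25], total_rate=2.0)

# High-rate gap of scalar integer quantization: 0.5 * log2(2*pi*e / 12) bits.
gap_bits = 0.5 * math.log2(2 * math.pi * math.e / 12)
```

In this example the weakest component falls below the water level and gets no bits, while the other two share the budget unequally; an equal split would instead give each component the same rate regardless of its variance.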

2605.11212 2026-06-09 cs.CL Version update

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

ReVision:通过时间视觉冗余减少扩展计算机使用代理

Amirhossein Abaskohi, Yuhang He, Peter West, Giuseppe Carenini, Pranit Chawla, Vibhav Vineet

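Affiliations * University of British Columbia; Microsoft Research

AI summary ReVision removes redundant visual patches, reducing token usage while improving success rate, so that agents can handle longer trajectories.

Details
AI Chinese abstract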

计算机使用代理(CUAs)依赖于图形用户界面的视觉观察,每个截图被编码为大量视觉token。随着交互轨迹增长,token成本迅速增加,限制了在固定上下文和计算预算下可纳入的历史量。这导致使用历史时性能提升有限,不同于其他领域。我们通过引入ReVision解决这一效率问题,该方法用于训练多模态语言模型,在轨迹中去除冗余视觉片段,使用学习的片段选择器比较连续截图的片段表示,同时保留模型所需的时空结构。在三个基准测试(OSWorld、WebTailBench和AgentNetBench)中,当使用Qwen2.5-VL-7B处理包含5个历史截图的轨迹时,ReVision平均减少46%的token使用,同时将成功率提高3%。这建立了明显的效率提升,使代理能用更少token处理更长轨迹。通过这一改进效率,我们重新审视CUAs中历史的作用,发现当去除冗余时,性能随更多过去观察的纳入而持续提升。

English abstract

Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token cost increases rapidly, limiting the amount of history that can be incorporated under fixed context and compute budgets. As a result, unlike in other domains, using history has so far yielded little or no performance improvement. We address this inefficiency by introducing ReVision, which is used to train multimodal language models on trajectories where redundant visual patches are removed using a learned patch selector that compares patch representations across consecutive screenshots while preserving the spatial structure required by the model. Across three benchmarks, OSWorld, WebTailBench, and AgentNetBench, when processing trajectories with 5 history screenshots using Qwen2.5-VL-7B, ReVision reduces token usage by 46% on average while improving success rate by 3% over the no-drop baseline. This establishes a clear efficiency gain, enabling agents to process longer trajectories with fewer tokens. With this improved efficiency, we revisit the role of history in CUAs and find that performance continues to improve as more past observations are incorporated when redundancy is removed.
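
The idea of dropping patches that repeat across consecutive screenshots can be sketched with a fixed threshold. Note the swap: ReVision uses a learned patch selector over patch representations, whereas this illustration substitutes a simple mean-absolute-difference rule on raw values. The function name `select_patches`, the threshold, and the flat-list patch format are all hypothetical.

```python
def select_patches(prev_patches, curr_patches, threshold=0.01):
    """Keep only the patches that changed since the previous screenshot.

    Both arguments are lists of equally sized flat patches (lists of floats)
    in the same grid order. Each kept patch is returned together with its
    grid index, so its original position is still known after dropping.
    """
    kept = []
    for index, (old, new) in enumerate(zip(prev_patches, curr_patches)):
        mean_abs_diff = sum(abs(a - b) for a, b in zip(old, new)) / len(new)
        if mean_abs_diff > threshold:
            kept.append((index, new))
    return kept


# Three patches per screenshot; only the middle one changes between frames.
previous = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
current = [[0.0, 0.0], [1.0, 5.0], [2.0, 2.0]]
kept = select_patches(previous, current)
token_saving = 1 - len(kept) / len(current)
```

Carrying the grid index alongside each surviving patch is one simple way to keep positional information intact; how the actual method preserves spatial structure is not specified in the abstract.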

2605.12213 2026-06-09 cs.AI Version update

Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

面向目标的推理用于基于RAG的记忆在对话型代理LLM系统中

Jiazhou Liang, Armin Toroghi, Yifan Simon Liu, Faeze Moradi Kalarde, Liam Gallagher, Scott Sanner

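Affiliations * University of Toronto; Vector Institute for Artificial Intelligence

AI summary This paper proposes Goal-Mem, a framework that uses goal-oriented reasoning to improve RAG-based memory on complex tasks, with notable gains on multi-hop reasoning and implicit inference.

Details
AI Chinese abstract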

基于LLM的对话型AI代理在长时间范围内维持一致行为存在困难,因为上下文有限。虽然RAG方法通过外部记忆模块存储交互并进行检索来克服这一限制,但其在回答具有挑战性的问题(如多跳、常识推理)上的有效性最终取决于代理对检索信息的推理能力。然而,现有方法通常基于语义相似性检索原始用户语句,缺乏对缺失中间事实的显式推理,且常返回无关或不足的证据。本文引入Goal-Mem,一种面向目标的推理框架,通过从用户语句作为目标进行逆向推导。而非逐步扩展检索上下文,Goal-Mem将每个目标分解为原子子目标,进行针对性记忆检索以满足每个子目标,并迭代识别在中间目标无法解决时应从记忆中检索哪些信息。我们通过自然语言逻辑(NLL)形式化这一过程,该逻辑系统结合了FOL的推理可验证性和自然语言的表达性。通过在两个数据集上进行广泛实验,并与九个强大的记忆基线进行比较,我们证明Goal-Mem在多个任务中表现更优,尤其在需要多跳推理和隐含推理的任务中效果显著。

English abstract

LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in external memory modules and performing retrieval from them, their effectiveness in answering challenging questions (e.g., multi-hop, commonsense) ultimately depends on the agent's ability to reason over the retrieved information. However, existing methods typically retrieve memory based on semantic similarity to the raw user utterance, which lacks explicit reasoning about missing intermediate facts and often returns evidence that is irrelevant or insufficient for grounded reasoning. In this work, we introduce Goal-Mem, a goal-oriented reasoning framework for RAG-based agentic memory that performs explicit backward chaining from the user's utterance as a goal. Rather than progressively expanding from retrieved context, Goal-Mem decomposes each goal into atomic subgoals, performs targeted memory retrieval to satisfy each subgoal, and iteratively identifies what information from memory should be retrieved when intermediate goals cannot be resolved. We formalize this process in Natural Language Logic, a logical system that combines the verifiability of reasoning provided by first-order logic (FOL) with the expressivity of natural language. Through extensive experiments on two datasets and comparisons with nine strong memory baselines, we show that Goal-Mem consistently improves performance, particularly on tasks requiring multi-hop reasoning and implicit inference.
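
Backward chaining itself is a classical technique, and a toy version helps show what "decomposing a goal into subgoals and retrieving memory for each" means. The sketch below is a plain symbolic backward chainer with exact string matching. Goal-Mem's LLM-driven decomposition, retrieval, and Natural Language Logic are not reproduced here, and the function name `prove`, the rule format, and the example facts are hypothetical.

```python
def prove(goal, memory, rules, depth=0, max_depth=5):
    """Backward-chain from `goal`.

    memory : set of stored facts (stands in for retrieved memory entries)
    rules  : dict mapping a goal to a list of alternative subgoal lists
    Returns the ordered list of facts and derived goals that support `goal`,
    or None when the goal cannot be resolved from memory.
    """
    if goal in memory:                 # the goal is directly stored in memory
        return [goal]
    if depth >= max_depth:             # guard against cyclic rule sets
        return None
    for subgoals in rules.get(goal, []):
        trace = []
        for sub in subgoals:
            sub_trace = prove(sub, memory, rules, depth + 1, max_depth)
            if sub_trace is None:      # this decomposition fails: try the next one
                break
            trace.extend(sub_trace)
        else:
            return trace + [goal]      # every subgoal was satisfied
    return None


memory = {"user adopted a dog in March", "dogs need daily walks"}
rules = {
    "user needs a daily walking routine": [
        ["user owns a dog", "dogs need daily walks"],
    ],
    "user owns a dog": [["user adopted a dog in March"]],
}
trace = prove("user needs a daily walking routine", memory, rules)
```

The two-hop example shows why retrieval keyed only on the top-level question can miss the evidence: the supporting fact "user adopted a dog in March" shares little wording with the goal and is reached only through the intermediate subgoal.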

2605.11855 2026-06-09 cs.LG cs.AI cs.AR Version update

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

提升为超低功耗应用设计的可并行递归神经网络的性能和学习稳定性

Julien Brandoit, Arthur Fyon, Damien Ernst, Guillaume Drion

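Affiliations * University of Cambridge

AI summary This paper proposes the CMRU and αCMRU, whose cumulative update formulation restores gradient flow while preserving persistent memory, improving convergence stability and reducing initialization sensitivity; they perform well across diverse benchmarks, especially on tasks requiring discrete long-range retention.

Comments Accepted as a spotlight at ICML 2026. This work has been the subject of patent applications under numbers EP26175243.0 and EP26175248.9

Details
AI Chinese abstract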

序列学习主要由Transformer和可并行递归神经网络(如状态空间模型)主导,但学习长期依赖仍具挑战性,最先进的设计以更高的功耗换取性能。Bistable Memory Recurrent Unit(BMRU)被引入以实现超低功耗RNNs的软硬件协同设计:具有滞后特性的量化状态提供持久记忆并直接映射到模拟基本单元。然而,BMRU在复杂序列任务上性能落后于可并行RNNs。本文识别出在状态更新期间出现的梯度阻塞是关键限制,并提出累积更新公式以恢复梯度流并保持持久记忆,从而在时间维度上形成跳跃连接。这导致了累积记忆递归单元(CMRU)及其放松变体αCMRU。实验表明,累积公式显著提高了收敛稳定性并减少了初始化敏感性。CMRU和αCMRU在小模型规模下在多种基准上与线性递归单元(LRUs)和最小门控递归单元(minGRUs)匹配或超越,尤其在需要离散长距离保留的任务中表现突出,同时CMRU保留量化状态、持久记忆和抗噪声动态,这些对于模拟实现至关重要。

English abstract

Sequence learning is dominated by Transformers and parallelizable recurrent neural networks (RNNs) such as state-space models, yet learning long-term dependencies remains challenging, and state-of-the-art designs trade power consumption for performance. The Bistable Memory Recurrent Unit (BMRU) was introduced to enable hardware-software co-design of ultra-low power RNNs: quantized states with hysteresis provide persistent memory while mapping directly to analog primitives. However, BMRU performance lags behind parallelizable RNNs on complex sequential tasks. In this paper, we identify gradient blocking during state updates as a key limitation and propose a cumulative update formulation that restores gradient flow while preserving persistent memory, creating skip-connections through time. This leads to the Cumulative Memory Recurrent Unit (CMRU) and its relaxed variant, the $α$CMRU. Experiments show that the cumulative formulation dramatically improves convergence stability and reduces initialization sensitivity. The CMRU and $α$CMRU match or outperform Linear Recurrent Units (LRUs) and minimal Gated Recurrent Units (minGRUs) across diverse benchmarks at small model sizes, with particular advantages on tasks requiring discrete long-range retention, while the CMRU retains quantized states, persistent memory, and noise-resilient dynamics essential for analog implementation.

2605.08384 2026-06-09 cs.CL Version update

jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers

jina-embeddings-v5-omni: 通过锁定对齐塔实现几何保持嵌入

Florian Hönicke, Michael Günther, Andreas Koukounas, Mohammad Kalim Akram, Scott Martens, Saba Sturua, Han Xiao

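Affiliations * Jina by Elastic

AI summary This paper proposes GELATO, which builds multimodal embeddings from frozen, aligned towers, yielding a unified semantic space with efficient training while keeping text embeddings unchanged.

Comments 11 pages, 9 figures, 5 tables

Details
AI Chinese abstract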

在本文中,我们介绍了GELATO(通过锁定对齐塔实现几何保持嵌入),一种新型的多模态嵌入模型。我们基于VLM式架构,非文本编码器被调整以生成语言模型的输入,进而生成所有输入类型的嵌入。我们展示了结果:jina-embeddings-v5-omni套件,一对模型将文本、图像、音频和视频输入编码到单一语义嵌入空间。GELATO扩展了两个Jina Embeddings v5文本模型,通过添加图像和音频编码器支持额外模态。骨干文本嵌入模型和新增的非文本模态编码器保持冻结。我们仅训练连接组件,代表联合模型总权重的0.35%。因此,训练比全参数重新训练要高效得多。此外,语言模型保持基本不变,对文本输入生成与Jina Embeddings v5文本模型完全相同的嵌入。我们的评估表明,GELATO产生的结果与最先进的方法相媲美,几乎与更大的多模态嵌入模型具有同等性能。

English abstract

In this work, we introduce GELATO (Geometry-preserving Embeddings via Locked Aligned TOwers), a novel approach to multimodal embedding models. We build on the VLM-style architecture, in which non-text encoders are adapted to produce input for a language model, which in turn generates embeddings for all varieties of input. We present the result: the jina-embeddings-v5-omni suite, a pair of models that encode text, image, audio, and video input into a single semantic embedding space. GELATO extends the two Jina Embeddings v5 Text models to support additional modalities by adding encoders for images and audio. The backbone text embedding models and the added non-text modality encoders remain frozen. We train only the connecting components, which represent 0.35% of the total weights of the joint model. Training is therefore much more efficient than full-parameter retraining. Additionally, the language model remains effectively unaltered, producing exactly the same embeddings for text inputs as the Jina Embeddings v5 Text models. Our evaluations show that GELATO produces results that are competitive with the state of the art, yielding nearly equal performance to larger multimodal embedding models.

2605.11484 2026-06-09 cs.AI Version update

Engagement Process: Rethinking the Temporal Interface of Action and Observation

参与过程:重新思考动作与观察的时间接口

Jialian Li, Yuchen Cao, Junhong Liu, Weiran Guo, Xutao Wang, Jiaming Song, Jiahao Zhang, Jie Chen

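Affiliations * University of Science and Technology of China

AI summary This paper proposes the Engagement Process (EP), a model whose explicit temporal interface handles actions and observations that interact on different time scales, supports multi-rate coordination and subsystem composition, exposes hidden temporal behaviors, and lets policies adapt to explicit time costs.

Details
AI Chinese abstract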

在数字和物理环境中完成任务日益涉及复杂的时序交互,其中动作和观察在不同的时间尺度上展开,而非与固定观察-动作步骤对齐。为了建模此类交互,我们提出参与过程(EP),一种继承POMDP决策理论结构的交互形式,使时间在动作-观察接口中显式化。EP将动作和观察表示为沿时间解耦的事件流,而非在固定决策步骤上配对更新。此接口捕捉单agent的时间问题,如决策延迟、延迟反馈和持续动作,同时支持更丰富的agent侧组织、多速率协调和子系统间的组合交互。在玩具、LLM-agent和学习实验中,EP揭示了由基于步骤的接口隐藏的时间行为,并使策略在显式时间成本下适应。

English abstract

Task completion in digital and physical environments increasingly involves complex temporal interaction, where actions and observations unfold over different time scales rather than align with fixed observation-action steps. To model such interactions, we propose Engagement Process (EP), an interaction formalism that inherits the decision-theoretic structure of POMDPs while making time explicit in the action-observation interface. EP represents actions and observations as decoupled event streams along time, rather than updates paired at fixed decision steps. This interface captures single-agent timing issues such as deliberation latency, delayed feedback, and persistent actions, while supporting richer agent-side organization, multi-rate coordination, and compositional interaction among subsystems. Across toy, LLM-agent, and learning experiments, EP exposes temporal behaviors hidden by step-based interfaces and enables policies to adapt under explicit time costs.
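
One way to picture "decoupled event streams along time" is as two independently timestamped lists that are merged only for inspection, instead of being paired step by step. The sketch below is purely illustrative and is not the EP formalism: the event format `(time, payload)`, the function name `merge_streams`, and the example timings are assumptions.

```python
import heapq


def merge_streams(actions, observations):
    """Interleave two time-sorted event streams into one timeline.

    Each stream is a list of (time, payload) pairs sorted by time. Events
    are tagged with their kind so the merged timeline shows that several
    observations may arrive between two actions, or none at all.
    """
    tagged_actions = [(t, "action", p) for t, p in actions]
    tagged_observations = [(t, "observation", p) for t, p in observations]
    # Merge by timestamp only; ties keep actions before observations.
    return list(heapq.merge(tagged_actions, tagged_observations,
                            key=lambda event: event[0]))


# Actions are issued slowly (deliberation latency); frames arrive faster.
actions = [(0.0, "click"), (2.5, "type")]
observations = [(0.4, "frame 1"), (1.1, "frame 2"), (3.0, "frame 3")]
timeline = merge_streams(actions, observations)
kinds = [kind for _, kind, _ in timeline]
```

In a fixed-step interface the two frames that arrive between the first and second action would have to be collapsed into one observation or assigned to artificial no-op steps; keeping the streams separate avoids that choice.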

2601.23286 2026-06-09 cs.CV cs.AI cs.LG Version update

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

VideoGPA: 通过几何先验知识蒸馏实现3D一致的视频生成

Hongyang Du, Junjie Ye, Xiaoyan Cong, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

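Affiliations * University of California, Berkeley

AI summary VideoGPA improves the 3D consistency of video generation by distilling geometry priors, using a data-efficient self-supervised framework to guide video diffusion models and markedly improving temporal stability, geometric plausibility, and motion coherence.

Comments 8 pages, 5 figures, ICML 2026

Details
AI Chinese abstract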

尽管最近的视频扩散模型(VDMs)能产生视觉上令人印象深刻的结果,但它们在保持3D结构一致性方面存在根本性困难,常导致物体变形或空间漂移。我们假设这些失败是因为标准去噪目标缺乏显式的几何一致性激励。为此,我们引入VideoGPA(视频几何偏好对齐),一种数据高效的自监督框架,利用几何基础模型自动推导密集偏好信号,通过直接偏好优化(DPO)引导VDMs。该方法有效将生成分布引导至内在3D一致性,而无需人工标注。VideoGPA通过最少的偏好对显著提升了时间稳定性、几何合理性与运动一致性,在大量实验中一致优于最先进基线。

English abstract

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, geometric plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.
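
For reference, the standard DPO objective on a single preference pair can be written in a few lines. This is the generic log-likelihood form of DPO, shown as a sketch. How VideoGPA adapts it to diffusion models (for example through a denoising-error surrogate for the log-probabilities) is not described in the abstract, and the function name, argument names, and `beta` value here are assumptions.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*     : log-probabilities under the model being trained
    ref_logp_* : log-probabilities under the frozen reference model
    Returns -log(sigmoid(margin)), where the margin measures how much more
    the trained model favors the preferred sample than the reference does.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(m)) = softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# No preference learned yet: model equals reference, loss is log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Model has shifted probability toward the geometrically preferred sample.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a setup like the one described, the preferred and rejected samples would be chosen by the geometry foundation model's consistency signal rather than by human raters; only the pair labels change, not the loss.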

2412.01324 2026-06-09 cs.RO Version update

Integrated Hierarchical Decision-Making in Inverse Kinematic Planning and Control

集成化分层决策在逆运动学规划与控制中

Kai Pfeiffer, Quan Zhang, Yuqing Chen, Gordon Boateng, Yuquan Wang, Vincent Bonnet, Abderrahmane Kheddar

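Affiliations * University of Cambridge

AI summary This paper proposes an efficient nonlinear programming framework that integrates hierarchical decision-making with whole-body inverse kinematic planning and control, addressing the simultaneous selection of end-effector locations during inverse kinematic planning.

Comments Accepted paper to "Robotics: Science and Systems" (2026)

Details
AI Chinese abstract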

本文提出了一种新颖且高效的非线性规划框架,紧密整合分层决策与全身逆运动学规划与控制。决策在机器人领域诸多方面起核心作用,从稀疏逆运动学控制(使用最少的关节)到同时选择多个候选端效应器位置的逆运动学规划。当前方法常依赖混合整数非线性规划的大量计算,将决策与逆运动学分离(有时用可达性方法近似),或使用高效但不够灵活的ℓ1范数线性稀疏规划方法,未解决底层非线性问题。相比之下,所提出的稀疏分层非线性规划求解器通过利用稀疏分层结构和ℓ0范数(在机器人领域很少使用)实现了高效、灵活和准确。该求解器有效处理了文献中未解决的复杂非线性分层决策问题,例如同时从大量候选中优先选择端效应器位置的逆运动学规划,或同时选择双臂抓取位置的逆运动学控制。

English abstract

This work presents a novel and efficient nonlinear programming framework that tightly integrates hierarchical decision-making with whole-body inverse kinematic planning and control. Decision-making plays a central role in many aspects of robotics, from sparse inverse kinematic control with a minimal number of joints, to inverse kinematic planning while simultaneously selecting a discrete end-effector location from multiple candidates. Current approaches often rely on heavy computations using mixed-integer nonlinear programming, separate decision-making from inverse kinematics (sometimes approximated by reachability methods), or employ efficient but less versatile $\ell_1$-norm formulations of linear sparse programming, without addressing the underlying nonlinear problem formulations. In contrast, the proposed sparse hierarchical nonlinear programming solver is efficient, versatile, and accurate by exploiting sparse hierarchical structure and leveraging the $\ell_0$-norm, which is rarely used in robotics. The solver efficiently tackles complex nonlinear hierarchical decision-making problems previously unaddressed in the literature, such as inverse kinematic planning with simultaneous prioritized selection of end-effector locations from a large set of candidates, or inverse kinematic control with simultaneous selection of bi-manual grasp locations on a randomly rotated box.

2605.04913 2026-06-09 cs.CL cs.LG Version update

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

重新思考局部学习:一种更便宜更快的LLM后训练配方

Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang, Junhao Su

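Affiliations * Independent Researcher; D4 Lab; Southeast University

AI summary This paper proposes LoPT, a local-learning post-training strategy that places a gradient boundary at the transformer midpoint, lowering memory cost, improving training efficiency, and retaining pretrained capabilities.

Comments 35 pages

Details
AI Chinese abstract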

LLM后训练通常通过完整深度传播任务梯度。尽管这种端到端结构简单通用,但将其任务适应与完整深度激活存储、长距离反向依赖和直接任务梯度访问预训练表示耦合在一起。我们主张这种完整深度反向耦合可能不必要的昂贵和侵入性,尤其是在后训练监督远比预训练狭窄时。为此,我们提出LoPT:局部学习后训练,一种简单的后训练策略,使梯度达到成为显式设计选择。LoPT在transformer中点放置单一梯度边界:后半部分块从任务目标学习,而前半部分块通过轻量级特征重建目标进行更新,以保留有用的表示并保持接口兼容性。LoPT缩短了任务引起的反向路径,同时限制了狭窄任务梯度对早期层表示的直接干扰。大量实验表明,LoPT在较低的内存成本、较高的训练效率和更好的保留预训练能力方面实现了竞争性性能。我们的代码可在:https://github.com/HumyuShi/LoPT获取。

English abstract

LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we propose LoPT: Local-Learning Post-Training, a simple post-training strategy that makes gradient reach an explicit design choice. LoPT places a single gradient boundary at the transformer midpoint: the second-half block learns from the task objective, while the first-half block is updated by a lightweight feature-reconstruction objective to preserve useful representations and maintain interface compatibility. LoPT shortens the task-induced backward path while limiting direct interference from narrow task gradients on early-layer representations. Extensive experiments demonstrate that LoPT achieves competitive performance with lower memory cost, higher training efficiency and better retention of pretrained capabilities. Our code is available at: https://github.com/HumyuShi/LoPT

2605.06582 2026-06-09 cs.LG cs.CL cs.SD Version update

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

PairAlign:一种通过自对齐的序列标记化框架及其在音频标记化中的应用

Adhiraj Banerjee, Vipul Arora

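Affiliations * Department of Electrical Engineering, Indian Institute of Technology, Kanpur

AI summary PairAlign achieves compact audio tokenization through sequence-level self-alignment, treating tokenization as conditional sequence generation to improve token consistency, length control, and edit similarity.

Comments 57 pages main content, 109 total pages, 9 Figures, pre-print, Under Review

Details
AI Chinese abstract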

许多感官数据的操作——比较、记忆、检索和推理——自然地在离散符号结构上表达。在语言中,这种接口由标记提供;在音频中,必须学习。现有音频标记器依赖于量化、聚类或编解码器重建,将标记局部分配,因此序列一致性、紧凑性、长度控制、终止和编辑相似性很少被直接优化。我们引入PairAlign,一种通过序列级自对齐实现紧凑音频标记化的框架。PairAlign将标记化视为条件序列生成:编码器将语音映射为连续条件,自回归解码器从BOS开始生成标记,学习标记身份、顺序、长度和EOS位置。给定两个保持内容的视图,每个视图的序列在另一个视图的表示下被训练为可能,而无关示例提供竞争序列。这为可扩展的编辑距离保留代理,同时抑制许多对一的坍缩。PairAlign从VQ式标记化开始,并通过EMA教师目标、交叉配对教师强制、前缀损坏、似然对比和长度控制进行优化。在3秒语音上,PairAlign学习紧凑、非退化的序列,具有广泛的词汇使用和强跨视图一致性。在检索测试中,它保留编辑距离搜索,同时将存档标记数量减少55%。连续扫频探针显示其局部重叠低于密集几何标记器,但具有更强的长度控制和在100毫秒移位下的受约束编辑轨迹。PairAlign是一种序列符号预测学习者:像JEPA式目标一样,它从另一个视图预测一个抽象目标作为学习的可变长度符号序列,而不是连续潜在变量。

English abstract

Many operations on sensory data (comparison, memory, retrieval, and reasoning) are naturally expressed over discrete symbolic structures. In language, this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstruction, assigning tokens locally, so sequence consistency, compactness, length control, termination, and edit similarity are rarely optimized directly. We introduce PairAlign, a framework for compact audio tokenization through sequence-level self-alignment. PairAlign treats tokenization as conditional sequence generation: an encoder maps speech to a continuous condition, and an autoregressive decoder generates tokens from BOS, learning token identity, order, length, and EOS placement. Given two content-preserving views, each view's sequence is trained to be likely under the other's representation, while unrelated examples provide competing sequences. This gives a scalable surrogate for edit-distance preservation while discouraging many-to-one collapse. PairAlign starts from VQ-style tokenization and refines it with EMA-teacher targets, cross-paired teacher forcing, prefix corruption, likelihood contrast, and length control. On 3-second speech, PairAlign learns compact, non-degenerate sequences with broad vocabulary usage and strong cross-view consistency. On retrieval tests, it preserves edit-distance search while reducing archive token count by 55%. A continuous-sweep probe shows lower local overlap than a dense geometric tokenizer, but stronger length control and bounded edit trajectories under 100 ms shifts. PairAlign is a sequence-symbolic predictive learner: like JEPA-style objectives, it predicts an abstract target from another view as a learned variable-length symbolic sequence, not a continuous latent.
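
Since the abstract evaluates tokenizers by edit-distance search, it may help to recall how edit distance between two token sequences is computed. The sketch below is the standard Levenshtein dynamic program over arbitrary token lists; it is background for the metric, not part of PairAlign, and the example token IDs are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences.

    Counts the minimum number of insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b`, using one rolling row.
    """
    prev = list(range(len(b) + 1))          # distance from empty prefix of a
    for i, token_a in enumerate(a, start=1):
        curr = [i]                          # distance to empty prefix of b
        for j, token_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                        # delete token_a
                curr[j - 1] + 1,                    # insert token_b
                prev[j - 1] + (token_a != token_b)  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Two hypothetical tokenizations of the same utterance under different views.
view_a = [17, 4, 4, 92, 8]
view_b = [17, 4, 92, 8]
distance = edit_distance(view_a, view_b)
```

The cost of this computation grows with the product of the two sequence lengths, which is one reason shorter token sequences make archive-scale edit-distance search cheaper.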

2605.05136 2026-06-09 cs.CV Version update

CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization

CPCANet:基于共同主成分分析的深度展开方法用于领域泛化

Yu-Hsi Chen, Abd-Krim Seghouane

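Affiliations * The University of Melbourne

AI summary This paper proposes CPCANet, which deep-unfolds the Flury-Gautschi algorithm to perform common principal component analysis, improving domain generalization and reaching state-of-the-art zero-shot transfer on four benchmarks.

Comments 9 pages, 5 tables

Details
AI Chinese abstract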

领域泛化(DG)旨在学习在分布外转移下仍具鲁棒性的表示,并有效推广到未见目标领域。尽管最近的不变学习策略和架构进步已取得良好性能,但通过二阶统计显式发现结构化的领域不变子空间仍被忽视。本文提出CPCANet,一种基于共同主成分分析(CPCA)的新型框架,将迭代的Flury-Gautschi(FG)算法展开为完全可微的神经层。该方法将CPCA的统计特性整合到端到端可训练框架中,强制在不同领域间发现共享子空间,同时保持可解释性。在四个标准DG基准测试中,CPCANet在零样本转移中达到最先进性能。此外,CPCANet架构无关,无需特定数据集调优,提供了一种简单高效的鲁棒表示学习方法以应对分布偏移。代码可在https://github.com/wish44165/CPCANet获取。

English abstract

Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), which unrolls the iterative Flury-Gautschi (FG) algorithm into fully differentiable neural layers. This approach integrates the statistical properties of CPCA into an end-to-end trainable framework, enforcing the discovery of a shared subspace across diverse domains while preserving interpretability. Experiments on four standard DG benchmarks demonstrate that CPCANet achieves state-of-the-art (SOTA) performance in zero-shot transfer. Moreover, CPCANet is architecture-agnostic and requires no dataset-specific tuning, providing a simple and efficient approach to learning robust representations under distribution shift. Code is available at https://github.com/wish44165/CPCANet.

2605.03862 2026-06-09 cs.AI cs.CL Version update

Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards

Correctness Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards

Tianyang Han, Hengyu Shi, Junjie Hu, Xu Yang, Zhiling Wang, Junhao Su

Affiliations * D4 Lab; Independent Researcher

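AI summary This paper proposes the TraceLift framework, which improves reasoning quality through executor-grounded rewards and uses a rubric-based Reasoning Reward Model to assess the reliability and usefulness of reasoning traces.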

Comments 36 pages

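Details
AI abstract (translated from Chinese)

Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone cannot reveal whether a reasoning trace is faithful, reliable, or useful to the model that consumes it. To address this, we propose TraceLift, which treats reasoning as a consumable intermediate artifact. During planner training, the planner generates tagged reasoning. A frozen executor turns this reasoning into the final artifact for verifier feedback, while an executor-grounded reward shapes the intermediate trace. This reward multiplies a rubric-based Reasoning Reward Model score by the uplift measured on the same frozen executor, rewarding traces that are both high-quality and useful. To make reasoning quality directly learnable, we introduce the TRACELIFT-GROUPS dataset, built from math and code seed problems. Each example is a same-problem group containing a high-quality reference trace and multiple plausible flawed traces, whose localized perturbations reduce reasoning quality or solution support while preserving task relevance. Extensive experiments on code and math benchmarks show that the executor-grounded reasoning reward improves the two-stage planner-executor system, suggesting that reasoning supervision should evaluate not only whether a trace looks good but also whether it helps the model that consumes it.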

English abstract

Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for the wrong reasons, overstate reasoning gains by rewarding shortcuts, and propagate flawed intermediate states in multi-step systems. To this end, we propose TraceLift, a planner-executor training framework that treats reasoning as a consumable intermediate artifact. During planner training, the planner emits tagged reasoning. A frozen executor turns this reasoning into the final artifact for verifier feedback, while an executor-grounded reward shapes the intermediate trace. This reward multiplies a rubric-based Reasoning Reward Model (RM) score by measured uplift on the same frozen executor, crediting traces that are both high-quality and useful. To make reasoning quality directly learnable, we introduce TRACELIFT-GROUPS, a rubric-annotated reason-only dataset built from math and code seed problems. Each example is a same-problem group containing a high-quality reference trace and multiple plausible flawed traces with localized perturbations that reduce reasoning quality or solution support while preserving task relevance. Extensive experiments on code and math benchmarks show that this executor-grounded reasoning reward improves the two-stage planner-executor system over execution-only training, suggesting that reasoning supervision should evaluate not only whether a trace looks good, but also whether it helps the model that consumes it. Our code is available at: https://github.com/MasaiahHan/TraceLift
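The abstract describes the trace reward as a rubric-based RM score multiplied by measured uplift on the frozen executor. A minimal sketch of that product follows, with several assumptions that the abstract does not confirm: uplift is taken to be the executor's success rate with the trace minus its success rate without it, negative uplift is clipped to zero, and the RM score lies in [0, 1]. The function name and all numbers are hypothetical.

```python
def executor_grounded_reward(rm_score: float,
                             success_with_trace: float,
                             success_without_trace: float) -> float:
    """Product of a reasoning-quality score and executor uplift.

    Assumptions (not from the source): uplift is a difference of
    success rates, and harmful traces (negative uplift) earn zero.
    """
    uplift = success_with_trace - success_without_trace
    return rm_score * max(uplift, 0.0)


# A well-written trace that also helps the executor earns reward.
helpful = executor_grounded_reward(0.9, 0.8, 0.5)

# A well-written trace that does not help the executor earns nothing.
unhelpful = executor_grounded_reward(0.9, 0.4, 0.5)

# A useful but low-quality trace is discounted by its rubric score.
sloppy = executor_grounded_reward(0.2, 0.8, 0.5)

print(helpful, unhelpful, sloppy)
```

Because the two factors are multiplied, a trace needs both a good rubric score and positive uplift to be credited, which matches the abstract's description of rewarding traces that are both high-quality and useful.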