arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.02470 2026-06-02 cs.AI

Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge

通过身份桥打破自回归语言模型中的逆转诅咒

Xutao Ma, Yixiao Huang, Hanlin Zhu, Somayeh Sojoudi

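Affiliations UC Berkeley

AI summary Proposes a simple data regularization recipe called the "Identity Bridge" (of the form "A→A") and shows, through theoretical analysis and experiments, that it effectively mitigates the reversal curse in autoregressive language models, shifting the model from memorizing facts toward learning rules.

AI Chinese abstract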

自回归大型语言模型(LLMs)在许多复杂任务中取得了显著成功,但在非常简单的逻辑推理中仍可能失败,例如“逆转诅咒”——当模型在形如“$A \rightarrow B$”(例如,爱丽丝的丈夫是鲍勃)的前向知识数据上训练时,在测试时无法推断出逆转知识“$B \leftarrow A$”(例如,鲍勃的妻子是爱丽丝)。大量先前的研究表明,这种失败是自回归因果LLMs固有的根本限制,表明这些模型倾向于记忆事实层面的知识,而不是捕捉更高级别的规则。在本文中,我们通过展示这种看似根本的限制可以通过略微调整训练数据,使用一种简单的正则化数据配方(称为“身份桥”,形式为“$A \to A$”,例如,爱丽丝的名字是爱丽丝)来缓解,从而挑战了这一观点。理论上,我们证明在这种配方下,即使是一层Transformer也可以通过分析梯度下降的隐式偏差来打破逆转诅咒。实验上,我们展示了一个10亿参数的预训练语言模型,在使用所提出的数据配方进行微调后,在逆转任务上达到了50%的成功率,而仅在前向知识数据上训练时成功率接近零。我们的工作为逆转诅咒提供了新颖的理论基础,并为鼓励LLMs从数据中学习更高级别的规则提供了一条原则性、低成本的路径。

English abstract

Autoregressive large language models (LLMs) have achieved remarkable success in many complex tasks, yet they can still fail in very simple logical reasoning such as the "reversal curse" -- when trained on forward knowledge data of the form "$A \rightarrow B$" (e.g., Alice's husband is Bob), the model is unable to deduce the reversal knowledge "$B \leftarrow A$" (e.g., Bob's wife is Alice) during test. Extensive prior research suggests that this failure is an inherent, fundamental limit of autoregressive causal LLMs, indicating that these models tend to memorize factual-level knowledge rather than capture higher-level rules. In this paper, we challenge this view by showing that this seemingly fundamental limit can be mitigated by slightly tweaking the training data with a simple regularization data recipe called the Identity Bridge of the form "$A \to A$" (e.g., The name of Alice is Alice). Theoretically, we prove that under this recipe, even a one-layer transformer can break the reversal curse by analyzing the implicit bias of gradient descent. Empirically, we show that a 1B pretrained language model finetuned with the proposed data recipe achieves a 50% success rate on reversal tasks, in stark contrast to a near-zero success rate when trained solely on forward-knowledge data. Our work provides a novel theoretical foundation for the reversal curse and offers a principled, low-cost path to encouraging LLMs to learn higher-level rules from data.
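
The data recipe described in the abstract above can be illustrated with a minimal sketch. The function name `build_training_set`, the sentence templates, and the choice to add an identity sentence for both the subject and the object are assumptions made for illustration; the abstract only gives the form "$A \to A$" and the example "The name of Alice is Alice".

```python
from typing import List, Tuple


def build_training_set(forward_facts: List[Tuple[str, str, str]]) -> List[str]:
    """Return forward sentences plus one identity sentence per entity.

    Each fact is (subject, relation, object), e.g. ("Alice", "husband", "Bob").
    The templates below are illustrative assumptions, not the paper's exact data.
    """
    samples: List[str] = []
    entities: List[str] = []
    for subject, relation, obj in forward_facts:
        # Forward knowledge "A -> B".
        samples.append(f"{subject}'s {relation} is {obj}")
        for entity in (subject, obj):
            if entity not in entities:
                entities.append(entity)
    for entity in entities:
        # Identity bridge "A -> A".
        samples.append(f"The name of {entity} is {entity}")
    return samples


facts = [("Alice", "husband", "Bob")]
training_set = build_training_set(facts)
```

The reversed fact ("Bob's wife is Alice") is deliberately absent from the training set; it is what the reversal task asks for at test time.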

2602.02416 2026-06-02 cs.AI

Structure Enables Effective Self-Localization of Errors in LLMs

结构使语言模型能够有效自我定位错误

Ankur Samanta, Akshayaa Magesh, Ayush Jain, Kavosh Asadi, Youliang Yu, Daniel Jiang, Boris Vidolov, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni

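Affiliations Meta AI; Columbia University; Meta Superintelligence Labs; Tel Aviv University

AI summary Proposes a structured reasoning method that decomposes reasoning into discrete semantic steps so that language models can localize errors more reliably, and builds on it the iterative correction sampling framework Thought-ICS, which achieves a 20-40% self-correction lift.

AI Chinese abstract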

语言模型的自我纠正仍然难以实现。在这项工作中,我们探索语言模型是否能够显式定位错误推理中的错误,作为构建能够有效自我纠正的AI系统的一条途径。我们引入了一种提示方法,将推理结构化为离散的、语义连贯的思维步骤,并表明模型在这种结构内比在传统的、非结构化的思维链推理中更可靠地定位错误。受人类大脑在离散决策点监控错误并重新采样替代方案的启发,我们引入了思维迭代纠正采样(Thought-ICS),一个自我纠正框架。Thought-ICS迭代地提示模型一次生成一个离散且完整的思维——其中每个思维代表模型的一个深思熟虑的决策——为精确的错误定位创建自然边界。在验证时,模型定位第一个错误步骤,系统回溯并从最后一个正确点生成替代推理。当要求纠正被预言机验证为不正确的推理时,Thought-ICS实现了20-40%的自我纠正提升。在完全没有外部验证的完全自主设置中,它优于当代自我纠正基线。

English abstract

Self-correction in language models remains elusive. In this work, we explore whether language models can explicitly localize errors in incorrect reasoning, as a path toward building AI systems that can effectively correct themselves. We introduce a prompting method that structures reasoning as discrete, semantically coherent thought steps, and show that models can localize errors more reliably within this structure than in conventional, unstructured chain-of-thought reasoning. Motivated by how the human brain monitors errors at discrete decision points and resamples alternatives, we introduce Iterative Correction Sampling of Thoughts (Thought-ICS), a self-correction framework. Thought-ICS iteratively prompts the model to generate reasoning one discrete and complete thought at a time--where each thought represents a deliberate decision by the model--creating natural boundaries for precise error localization. Upon verification, the model localizes the first erroneous step, and the system backtracks to generate alternative reasoning from the last correct point. When asked to correct reasoning verified as incorrect by an oracle, Thought-ICS achieves 20-40% self-correction lift. In a completely autonomous setting without external verification, it outperforms contemporary self-correction baselines.
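
The generate, localize, and backtrack loop described in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions: `thought_ics`, `generate_thought`, and `locate_first_error` are hypothetical names, and the toy functions stand in for model calls; the real prompts and stopping rules are not given in the abstract.

```python
from typing import Callable, List, Optional


def thought_ics(
    generate_thought: Callable[[List[str]], Optional[str]],
    locate_first_error: Callable[[List[str]], Optional[int]],
    max_rounds: int = 5,
    max_thoughts: int = 20,
) -> List[str]:
    """Generate thoughts one at a time, then backtrack to the first flagged step."""
    thoughts: List[str] = []
    for _ in range(max_rounds):
        # Extend the chain one complete thought at a time.
        while len(thoughts) < max_thoughts:
            nxt = generate_thought(thoughts)
            if nxt is None:  # the generator signals that reasoning is finished
                break
            thoughts.append(nxt)
        bad = locate_first_error(thoughts)
        if bad is None:
            return thoughts
        # Backtrack: keep only the thoughts before the first erroneous step.
        thoughts = thoughts[:bad]
    # Rounds exhausted: return the last verified prefix.
    return thoughts


# Toy, deterministic stand-ins for the model (hypothetical).
attempts = {}


def toy_generate(prefix: List[str]) -> Optional[str]:
    step = len(prefix)
    attempts[step] = attempts.get(step, 0) + 1
    if step == 0:
        return "a = 2"
    if step == 1:
        # The first attempt at this step is wrong; the resampled one is right.
        return "b = a + 1 = 4" if attempts[step] == 1 else "b = a + 1 = 3"
    if step == 2:
        return "answer = b"
    return None


def toy_locate(thoughts: List[str]) -> Optional[int]:
    for i, thought in enumerate(thoughts):
        if thought.endswith("= 4"):
            return i
    return None


final_chain = thought_ics(toy_generate, toy_locate)
```

Because each thought is a separate list element, the index returned by the locator is also the exact point to resample from, which is the "natural boundary" the abstract refers to.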

2602.02239 2026-06-02 cs.LG

Interpretability in Deep Time Series Models Demands Semantic Alignment

深度时间序列模型的可解释性需要语义对齐

Giovanni De Felice, Riccardo D'Elia, Alberto Termine, Pietro Barbiero, Giuseppe Marra, Silvia Santini

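Affiliations University of Padua

AI summary Argues that interpretability in deep time series models should pursue semantic alignment: predictions should be expressed in variables that are meaningful to the user and constrained by spatial and temporal mechanisms, with semantic consistency preserved under temporal evolution. Provides a formal definition and a blueprint for model design.

Comments Accepted at ICML 2026

AI Chinese abstract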

深度时间序列模型在预测性能上持续提升,但其黑箱特性限制了部署。为此,现有的可解释性方法主要关注解释模型内部计算,而未考虑这些计算是否与人类对研究现象的推理方式一致。相反,我们认为深度时间序列模型的可解释性应追求语义对齐:预测应基于对最终用户有意义的变量来表达,并由允许用户依赖约束的时空机制中介。在本文中,我们形式化了这一要求,并指出一旦建立,语义对齐必须在时间演化下保持:这是一个在静态设置中没有类似物的约束。基于这一定义,我们概述了语义对齐深度时间序列模型的蓝图,确定了支持信任的属性,并讨论了对模型设计的影响。

English abstract

Deep time series models continue to improve predictive performance, yet their deployment remains limited by their black-box nature. In response, existing interpretability approaches in the field focus on explaining the internal model computations, without addressing whether those computations align with how a human would reason about the studied phenomenon. Instead, we argue that interpretability in deep time series models should pursue semantic alignment: predictions should be expressed in terms of variables that are meaningful to the end user, mediated by spatial and temporal mechanisms that admit user-dependent constraints. In this paper, we formalize this requirement and state that, once established, semantic alignment must be preserved under temporal evolution: a constraint with no analog in static settings. Building on this definition, we outline a blueprint for semantically aligned deep time series models, identify properties that support trust, and discuss implications for model design.

2602.02098 2026-06-02 cs.LG cs.AI

Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning

多任务强化学习的概率性能保证

Yannik Schnitzer, Mathias Jackermeier, Alessandro Abate, David Parker

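Affiliations ETH Zurich

AI summary Proposes a new generalisation bound that combines per-task lower confidence bounds from finitely many rollouts with task-level generalisation, yielding high-confidence performance guarantees for unseen tasks.

AI Chinese abstract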

多任务强化学习训练能够执行多个任务的通用策略。尽管近年来取得了显著进展,现有方法很少提供正式的性能保证,而这在安全关键环境中部署策略时是必不可少的。我们提出了一种方法,用于计算多任务策略在训练期间未见任务上的高置信度性能保证。具体地,我们引入了一个新的泛化界,该界将(i)来自有限 rollout 的每任务置信下界与(ii)来自有限采样任务的任务级泛化相结合,为从相同任意未知分布中抽取的新任务提供高置信度保证。在最新的多任务强化学习方法中,我们证明了这些保证在理论上是合理的,并且在现实样本量下具有信息量。

English abstract

Multi-task reinforcement learning trains generalist policies that can execute multiple tasks. While recent years have seen significant progress, existing approaches rarely provide formal performance guarantees, which are indispensable when deploying policies in safety-critical settings. We present an approach for computing high-confidence guarantees on the performance of a multi-task policy on tasks not seen during training. Concretely, we introduce a new generalisation bound that composes (i) per-task lower confidence bounds from finitely many rollouts with (ii) task-level generalisation from finitely many sampled tasks, yielding a high-confidence guarantee for new tasks drawn from the same arbitrary and unknown distribution. Across state-of-the-art multi-task RL methods, we show that the guarantees are theoretically sound and informative at realistic sample sizes.
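
To make the two-level structure in the abstract above concrete, here is a minimal sketch that composes a per-task bound with a task-level bound. It uses textbook Hoeffding inequalities and a union bound, which is an assumption chosen for simplicity and not the paper's bound; returns are assumed to lie in [0, 1], and the names `hoeffding_lcb` and `new_task_guarantee` are hypothetical.

```python
import math
from typing import List, Tuple


def hoeffding_lcb(returns: List[float], delta: float,
                  lo: float = 0.0, hi: float = 1.0) -> float:
    """Lower confidence bound on the mean return, valid with probability >= 1 - delta."""
    n = len(returns)
    mean = sum(returns) / n
    return mean - (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def new_task_guarantee(rollouts_per_task: List[List[float]], threshold: float,
                       delta_rollout: float, delta_task: float) -> Tuple[float, float]:
    """Lower-bound the probability that a new task reaches `threshold`.

    Returns (probability_lower_bound, confidence). The confidence comes from a
    union bound over the m per-task bounds plus one task-level Hoeffding bound.
    """
    m = len(rollouts_per_task)
    lcbs = [hoeffding_lcb(r, delta_rollout) for r in rollouts_per_task]
    # Fraction of sampled tasks whose certified performance clears the threshold.
    certified_fraction = sum(lcb >= threshold for lcb in lcbs) / m
    # Task-level generalisation from m sampled tasks.
    eps = math.sqrt(math.log(1.0 / delta_task) / (2.0 * m))
    prob_lower = max(0.0, certified_fraction - eps)
    confidence = 1.0 - (m * delta_rollout + delta_task)
    return prob_lower, confidence


# 50 sampled tasks with 100 rollouts each; a toy policy that always scores 1.0.
rollouts = [[1.0] * 100 for _ in range(50)]
prob_lower, confidence = new_task_guarantee(
    rollouts, threshold=0.8, delta_rollout=0.001, delta_task=0.05
)
```

Using the per-task lower bounds rather than the empirical means can only undercount the tasks that truly clear the threshold, so the composed statement stays conservative.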

2602.01962 2026-06-02 cs.LG cs.AI

Zero-Shot Off-Policy Learning

零样本离策略学习

Arip Asadulaev, Maksim Bobrin, Salem Lahlou, Dmitry Dylov, Fakhri Karray, Martin Takac

AI summary By discovering a theoretical connection between successor measures and stationary density ratios, proposes a zero-shot off-policy learning algorithm that infers optimal importance sampling ratios on the fly and performs stationary distribution correction, adapting to new tasks without additional training.

AI Chinese abstract

离策略学习方法旨在直接从固定的先前交互数据集中推导出最优策略。这一目标面临重大挑战,主要源于固有的分布偏移和价值函数高估偏差。这些问题在零样本强化学习中尤为突出,其中在无奖励数据上训练的智能体必须在测试时适应新任务而无需额外训练。在这项工作中,我们通过发现后继度量与平稳密度比的理论联系,解决了零样本场景下的离策略问题。利用这一洞见,我们的算法能够推断最优重要性采样比率,有效地为任意任务实时执行带有最优策略的平稳分布修正。我们在SMPL人体模型上的运动跟踪任务、ExoRL上的连续控制任务以及长时域OGBench任务上对方法进行了基准测试。我们的技术无缝集成到前向-后向表示框架中,并在无需训练的情况下实现对新任务的快速适应。更广泛地说,这项工作架起了离策略学习和零样本适应之间的桥梁,为两个研究领域都带来了益处。

English abstract

Off-policy learning methods seek to derive an optimal policy directly from a fixed dataset of prior interactions. This objective presents significant challenges, primarily due to the inherent distributional shift and value function overestimation bias. These issues become even more noticeable in zero-shot reinforcement learning, where an agent trained on reward-free data must adapt to new tasks at test time without additional training. In this work, we address the off-policy problem in a zero-shot setting by discovering a theoretical connection between successor measures and stationary density ratios. Using this insight, our algorithm can infer optimal importance sampling ratios, effectively performing a stationary distribution correction with an optimal policy for any task on the fly. We benchmark our method on motion tracking tasks with SMPL Humanoid, continuous control on ExoRL, and long-horizon OGBench tasks. Our technique seamlessly integrates into forward-backward representation frameworks and enables fast adaptation to new tasks in a training-free regime. More broadly, this work bridges off-policy learning and zero-shot adaptation, offering benefits to both research areas.

2602.01053 2026-06-02 cs.LG

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

LRAgent: 面向多LoRA LLM代理的高效KV缓存共享

Hyesung Jeon, Hyeongju Ha, Jae-Joon Kim

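Affiliations KAIST

AI summary Addresses the large memory and compute overhead incurred when each agent in a multi-LoRA agent system independently stores a KV cache for the same long trajectories. Proposes the LRAgent framework, which decomposes the cache into a shared base part and an adapter-dependent part and uses a shared-A multi-LoRA architecture and a Flash-LoRA-Attention kernel to share the cache efficiently, substantially reducing overhead while preserving accuracy.

Comments 25 pages, 10 figures, 22 tables

Journal ref ICML 2026 Poster

AI Chinese abstract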

多LLM代理系统中的角色专业化通常通过多LoRA实现,其中代理共享预训练骨干网络,仅通过轻量级适配器区分。尽管共享基础模型权重,每个代理仍独立构建和存储相同长工具增强轨迹的KV缓存,导致大量内存和计算开销。现有的KV缓存共享方法大多忽略这种多LoRA设置。我们观察到,代理间的缓存差异主要由适配器输出主导,而共享预训练骨干网络的激活保持高度相似。基于此观察,我们提出LRAgent,一个面向多LoRA代理的KV缓存共享框架。它将缓存分解为两个组件:来自预训练权重的共享基座组件和来自LoRA权重的适配器依赖组件。LRAgent通过在代理间共享基座组件并以固有的低秩形式存储适配器组件来减少内存开销。它还通过共享A的多LoRA架构共享低秩缓存,从而减少计算开销,避免对已被其他代理处理过的上下文进行冗余计算。为了在运行时高效重建适配器贡献,我们引入Flash-LoRA-Attention,一个重新排序注意力计算以避免将低秩缓存实例化为全维度的内核。LRAgent实现了接近完全共享缓存的吞吐量和首令牌延迟,同时在代理问答基准测试中保持了接近非共享缓存基线的准确性。

English abstract

Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only by lightweight adapters. Despite sharing base model weights, each agent independently builds and stores its own KV cache for the same long, tool-augmented trajectories, incurring substantial memory and compute overhead. Existing KV cache sharing methods largely overlook this multi-LoRA setting. We observe that cache differences across agents are dominated by adapter outputs, while activations from the shared pretrained backbone remain highly similar. Based on this observation, we propose LRAgent, a KV cache sharing framework for multi-LoRA agents. It decomposes the cache into two components: a shared base component derived from pretrained weights and an adapter-dependent component derived from LoRA weights. LRAgent reduces memory overhead by sharing the base component across agents and storing the adapter component in its inherent low-rank form. It also reduces computational overhead by sharing the low-rank cache, enabled by a shared-A multi-LoRA architecture. This avoids redundant computations for contexts that have already been processed by other agents. To efficiently reconstruct adapter contributions at runtime, we introduce Flash-LoRA-Attention, a kernel that reorders attention computation to avoid materializing the low-rank cache to full dimension. LRAgent achieves throughput and time-to-first-token latency close to fully shared caching, while preserving accuracy near the non-shared caching baseline across agentic question-answering benchmarks.
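
The reordering idea in the abstract above rests on a simple algebraic identity, sketched below in plain Python. This is only an illustration of the identity, not the Flash-LoRA-Attention kernel: it assumes the key projection takes the usual LoRA form K = XW + (XA)B with a shared A, ignores softmax and layer-by-layer effects, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scores_materialized(q, base_cache, lowrank_cache, B):
    """Expand the low-rank cache to full dimension, then compute q @ K^T."""
    keys = add(base_cache, matmul(lowrank_cache, B))
    return matmul(q, transpose(keys))


def scores_reordered(q, base_cache, lowrank_cache, B):
    """Same scores without forming full-dimension adapter keys."""
    base_part = matmul(q, transpose(base_cache))
    q_low = matmul(q, transpose(B))  # project the query down to rank r
    adapter_part = matmul(q_low, transpose(lowrank_cache))
    return add(base_part, adapter_part)


# Toy sizes: 2 cached tokens, hidden size 3, head size 2, LoRA rank 1.
X = [[1, 2, 0], [0, 1, 3]]        # cached token activations
W = [[1, 0], [0, 1], [1, 1]]      # pretrained key projection (shared)
A = [[1], [0], [2]]               # shared LoRA down-projection
B_agent = [[1, -1]]               # one agent's LoRA up-projection
q = [[2, 1]]                      # a single query vector

base_cache = matmul(X, W)         # shareable across agents
lowrank_cache = matmul(X, A)      # rank-r cache, shareable when A is shared
full = scores_materialized(q, base_cache, lowrank_cache, B_agent)
fast = scores_reordered(q, base_cache, lowrank_cache, B_agent)
```

Only `B_agent` differs per agent in this toy setup, so both caches can be built once and reused, and the per-agent work moves to the small query-side projection.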

2602.00742 2026-06-02 cs.CL

CURP: Codebook-based Continuous User Representation for Personalized Generation with LLMs

CURP: 基于码本的连续用户表示用于大语言模型的个性化生成

Liang Wang, Xinyi Mou, Xiaoyou Liu, Xuanjing Huang, Zhongyu Wei

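Affiliations School of Data Science, Fudan University; Shanghai Innovation Institute; School of Computer Science, Fudan University

AI summary Proposes the CURP framework, which extracts multi-dimensional user traits with a bidirectional user encoder and a discrete prototype codebook, enabling plug-and-play personalized generation with few trainable parameters and outperforming strong baselines on variant generation tasks.

AI Chinese abstract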

用户建模通过个体的偏好和行为模式来表征用户,从而在当代方法中实现与大语言模型(LLMs)的个性化模拟和生成。然而,现有方法(无论是基于提示的方法还是基于训练的方法)在平衡个性化质量与计算和数据效率方面面临挑战。我们提出了一个新颖的框架CURP,它采用双向用户编码器和离散原型码本来提取多维用户特征。这种设计使得即插即用的个性化成为可能,且只需少量可训练参数(约2000万参数,约占模型总大小的0.2%)。通过在变体生成任务上的大量实验,我们表明CURP相比强基线实现了更优的性能和泛化能力,同时提供了更好的可解释性和可扩展性。代码可在https://github.com/RaidonWong/CURP_code获取。

English abstract

User modeling characterizes individuals through their preferences and behavioral patterns to enable personalized simulation and generation with Large Language Models (LLMs) in contemporary approaches. However, existing methods, whether prompt-based or training-based, face challenges in balancing personalization quality against computational and data efficiency. We propose a novel framework, CURP, which employs a bidirectional user encoder and a discrete prototype codebook to extract multi-dimensional user traits. This design enables plug-and-play personalization with a small number of trainable parameters (about 20M parameters, about 0.2% of the total model size). Through extensive experiments on variant generation tasks, we show that CURP achieves superior performance and generalization compared to strong baselines, while offering better interpretability and scalability. The code is available at https://github.com/RaidonWong/CURP_code

2507.08038 2026-06-02 cs.CL cs.AI

AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research

AblationBench:评估实证AI研究中消融实验的自动规划

Talor Abramovich, Gal Chechik

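Affiliations Google Research

AI summary Proposes the AblationBench benchmark suite, comprising two tasks, AuthorAblation and ReviewerAblation, for evaluating the ability of language models to plan ablation experiments in AI research. Experiments show that the best current model identifies only 45% of the original ablations, below human level.

Comments AI4Science Workshop, ICML 2026; Project page: https://ablation-bench.github.io/

AI Chinese abstract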

语言模型代理越来越多地被用于自动化科学研究,然而评估其科学贡献仍然是一个挑战。获得此类见解的关键机制是通过消融实验。为此,我们引入了AblationBench,这是一个用于评估代理在实证AI研究中进行消融规划任务的基准套件。它包括两个任务:AuthorAblation,帮助作者基于方法部分提出消融实验,包含83个实例;以及ReviewerAblation,帮助审稿人发现完整论文中缺失的消融,包含350个实例。对于这两个任务,我们开发了基于LM的评判器,作为自动评估框架。我们对前沿LM的实验表明,这些任务仍然具有挑战性,性能最佳的LM系统平均仅能识别45%的原始消融,低于人类水平。我们观察到作者任务和审稿人任务之间存在相反的性能趋势,这归因于模型基础的不同。最后,我们分析了当前LM在这些任务上的局限性,并发现思维链提示优于基于代理的方法。我们的数据可在https://huggingface.co/collections/ai-coscientist/ablationbench获取,代码可在https://github.com/ai-scientist-bench/ablation-bench获取。

English abstract

Language model agents are increasingly used to automate scientific research, yet evaluating their scientific contributions remains a challenge. A key mechanism to obtain such insights is through ablation experiments. To this end, we introduce AblationBench, a benchmark suite for evaluating agents on ablation planning tasks in empirical AI research. It includes two tasks: AuthorAblation, which helps authors propose ablation experiments based on a method section and contains 83 instances, and ReviewerAblation, which helps reviewers find missing ablations in a full paper and contains 350 instances. For both tasks, we develop LM-based judges that serve as an automatic evaluation framework. Our experiments with frontier LMs show that these tasks remain challenging, with the best-performing LM system identifying only 45% of the original ablations on average, below human-level performance. We observe an inverse performance trend between the author and reviewer tasks, which we attribute to differences in model grounding. Lastly, we analyze the limitations of current LMs on these tasks, and find that chain-of-thought prompting outperforms an agent-based approach. Our data is available at https://huggingface.co/collections/ai-coscientist/ablationbench, and our code is available at https://github.com/ai-scientist-bench/ablation-bench.

2602.00415 2026-06-02 cs.AI cs.LG

PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Vision-Language Models

PolarMem: 一种无需训练的可验证视觉语言模型极化隐式图记忆

Zhisheng Chen, Tingyu Wu, Zijie Zhou, Zhengwei Xie, Jinhan Li, Ziyan Weng, Liang Lin, Jingwei Song, Zikai Xiao, Yingwei Zhang

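Affiliations ICT, CAS; UCAS; CUPB; USTC; CityU-DG; HKU; ZJU

AI summary Proposes PolarMem, a training-free polarized latent graph memory framework that converts VLM perceptual signals into HAS, NOT_HAS, and Uncertain memory states through semantic consistency verification and adaptive distributional partitioning, and uses a lexicographical logic-aware retrieval protocol that prioritizes logical consistency, improving performance on retrieval-intensive tasks and reducing contradictions.

AI Chinese abstract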

记忆对于智能系统而言不仅是存储机制,更是组织证据和约束信念的结构。这对多模态推理尤为重要,因为检索到的证据必须既与查询相关又在视觉上一致。然而,当前视觉语言模型(VLM)的记忆系统大多保持正关联:它们检索相似或先前观察到的内容,但缺乏明确的方式记住已被验证为不存在或逻辑排除的内容。为此,我们提出PolarMem,一种无需训练的极化隐式图记忆框架,用于可验证的视觉语言推理。PolarMem通过语义一致性验证和自适应分布划分,将冻结的VLM感知信号转化为HAS、NOT_HAS和Uncertain记忆状态,并将其存储在具有明确正负记忆关系的极化图中。在推理时,词典逻辑感知检索协议在语义相似性之前强制执行逻辑一致性,在冲突记忆进入模型上下文之前将其抑制。在八个冻结的VLM骨干网络和六个多模态基准测试中,PolarMem一致地提升了检索密集型任务性能并减少了检索级矛盾。这些结果凸显了负记忆作为构建更可靠多模态记忆系统的关键机制。我们的代码可在https://github.com/czs-ict/PolarMem获取。

English abstract

Memory is not merely a storage mechanism for intelligent systems, but a structure for organizing evidence and constraining belief. This is especially important for multimodal reasoning, where retrieved evidence must be both query-relevant and visually consistent. However, current memory systems for vision-language models (VLMs) remain largely positive-associative: they retrieve what is similar or previously observed, but lack an explicit way to remember what has been verified as absent or logically excluded. To this end, we propose PolarMem, a training-free polarized latent graph memory framework for verifiable vision-language reasoning. PolarMem transforms frozen VLM perceptual signals into HAS, NOT_HAS, and Uncertain memory states through semantic consistency verification and adaptive distributional partitioning, and stores them in a polarized graph with distinct positive and negative memory relations. During inference, a lexicographical logic-aware retrieval protocol enforces logical consistency before semantic similarity, suppressing conflicting memories before they enter the model context. Across eight frozen VLM backbones and six multimodal benchmarks, PolarMem consistently improves retrieval-intensive tasks and reduces retrieval-level contradictions. These results highlight negative memory as a key mechanism for building more reliable multimodal memory systems. Our code is available at https://github.com/czs-ict/PolarMem.

2601.23220 2026-06-02 cs.CV cs.AI

Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training

Med-Scout: 通过几何感知的强化学习后训练治愈多模态大语言模型在医学感知中的几何盲点

Anglin Liu, Ruichao Chen, Yi Lu, Hongxia Xu, Jintai Chen

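Affiliations HKUSTGZ-ML4Health-Lab

AI summary Proposes the Med-Scout framework, which exploits the intrinsic geometric logic in unlabeled medical images and uses reinforcement learning with three proxy tasks (Hierarchical Scale Localization, Topological Jigsaw Reconstruction, and Anomaly Consistency Detection) to mitigate geometric blindness in multimodal large language models. It achieves gains of over 40% in geometric perception on the new Med-Scout-Bench benchmark and generalizes to broader medical understanding tasks.

Comments 29 pages, 14 figures. Accepted at ICML 2026

AI Chinese abstract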

尽管最近的多模态大语言模型(MLLMs)在医学诊断中展现出语言能力,但我们发现即使是最先进的MLLMs也存在一个关键的感知缺陷:几何盲点。这种无法将输出基于客观几何约束的问题导致了看似合理但事实错误的幻觉,其根源在于训练范式优先考虑语言流畅性而非几何保真度。本文介绍了Med-Scout,一种新颖的框架,通过强化学习(RL)“治愈”这种盲点,利用未标记医学图像中内在的几何逻辑。Med-Scout不依赖昂贵的人工标注,而是通过受临床医生系统阅读和推理模式启发的三种策略性代理任务推导出可验证的监督信号:层次尺度定位、拓扑拼图重建和异常一致性检测。为了严格量化这一缺陷,我们提出了Med-Scout-Bench,一个专门设计用于评估几何感知的新基准。大量评估表明,Med-Scout显著缓解了几何盲点,在我们的基准上比领先的专有和开源MLLMs提升了超过40%。此外,这种增强的几何感知泛化到更广泛的医学理解,在放射学和综合性医学VQA任务上取得了优异结果。

English abstract

Despite recent Multimodal Large Language Models (MLLMs)' linguistic prowess in medical diagnosis, we find even state-of-the-art MLLMs suffer from a critical perceptual deficit: geometric blindness. This failure to ground outputs in objective geometric constraints leads to plausible yet factually incorrect hallucinations, rooted in training paradigms that prioritize linguistic fluency over geometric fidelity. This paper introduces Med-Scout, a novel framework that "cures" this blindness via Reinforcement Learning (RL) that leverages the intrinsic geometric logic latent within unlabeled medical images. Instead of relying on costly expert annotations, Med-Scout derives verifiable supervision signals through three strategic proxy tasks inspired by the systematic reading and reasoning patterns of clinicians: Hierarchical Scale Localization, Topological Jigsaw Reconstruction, and Anomaly Consistency Detection. To rigorously quantify this deficit, we present Med-Scout-Bench, a new benchmark specifically designed to evaluate geometric perception. Extensive evaluations show that Med-Scout significantly mitigates geometric blindness, outperforming leading proprietary and open-source MLLMs by over 40% on our benchmark. Furthermore, this enhanced geometric perception generalizes to broader medical understanding, achieving superior results on radiological and comprehensive medical VQA tasks.

2601.22965 2026-06-02 cs.RO

Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation

自模仿扩散策略用于高效鲁棒的视觉导航

Runhua Zhang, Junyi Hou, Changxu Cheng, Qiyi Chen, Tao Wang, Wuyue Zhao

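Affiliations Uni-Ubi Technology Co., Ltd.; College of Control Science and Engineering, Zhejiang University

AI summary Proposes Self-Imitated Diffusion Policy (SIDP), which uses a reward-guided self-imitation mechanism and a curriculum learning paradigm to reduce reliance on extensive sampling and post-filtering, achieving efficient and robust visual navigation.

Comments Preprint

AI Chinese abstract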

扩散策略(DP)通过捕获多样的多模态轨迹分布,在视觉导航中展现出显著潜力。然而,大多数DP方法依赖的标准模仿学习(IL)往往继承专家演示中的次优性和冗余性,从而需要在推理期间依赖辅助选择器进行计算密集型的“生成-然后过滤”流程。为解决这些挑战,我们提出自模仿扩散策略(SIDP),一种新颖的框架,通过选择性模仿从自身采样的一组轨迹来学习改进的规划。具体来说,SIDP引入了一种奖励引导的自模仿机制,鼓励策略持续高效地生成高质量轨迹,而非质量不一致的输出,从而减少对大量采样和后过滤的依赖。在训练过程中,我们采用奖励驱动的课程学习范式来缓解低效的数据利用率,以及目标无关的探索进行轨迹增强以提高规划鲁棒性。在综合仿真基准上的广泛评估表明,SIDP显著优于先前方法,真实世界实验证实了其在多个机器人平台上的有效性。在Jetson Orin Nano上,SIDP的推理速度比基线NavDP快2.5倍,即110ms对比273ms,实现了高效的实时部署。

English abstract

Diffusion policies (DP) have demonstrated significant potential in visual navigation by capturing diverse multi-modal trajectory distributions. However, standard imitation learning (IL), which most DP methods rely on for training, often inherits sub-optimality and redundancy from expert demonstrations, thereby necessitating a computationally intensive "generate-then-filter" pipeline that relies on auxiliary selectors during inference. To address these challenges, we propose Self-Imitated Diffusion Policy (SIDP), a novel framework that learns improved planning by selectively imitating a set of trajectories sampled from itself. Specifically, SIDP introduces a reward-guided self-imitation mechanism that encourages the policy to consistently and efficiently produce high-quality trajectories, rather than outputs of inconsistent quality, thereby reducing reliance on extensive sampling and post-filtering. During training, we employ a reward-driven curriculum learning paradigm to mitigate inefficient data utilization, and goal-agnostic exploration for trajectory augmentation to improve planning robustness. Extensive evaluations on a comprehensive simulation benchmark show that SIDP significantly outperforms previous methods, with real-world experiments confirming its effectiveness across multiple robotic platforms. On Jetson Orin Nano, SIDP delivers $2.5\times$ faster inference than the baseline NavDP (110 ms vs. 273 ms), enabling efficient real-time deployment.

2601.22947 2026-06-02 cs.CL cs.LG

Reconsidering Positional Supervision in Masked Diffusion Language Model Training

重新审视掩码扩散语言模型训练中的位置监督

Mengyu Ye, Keito Kudo, Ryosuke Takahashi, Jun Suzuki

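Affiliations Tohoku University; RIKEN; NII LLMC

AI summary To address the sensitivity of masked diffusion language models to positional shifts, proposes an objective based on connectionist temporal classification (CTC) that absorbs positional uncertainty by introducing a slack token and an updated collapse map, yielding consistent gains on open-ended generation benchmarks.

Comments preprint, WIP

AI Chinese abstract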

掩码扩散语言模型(MDLM)通过并行去掩码生成文本,最近成为自回归语言模型的替代方案。它们可以被视为使用位置交叉熵(CE)损失训练的并行解码器,与非自回归翻译(NAT)设置相同。在NAT中,CE训练的并行解码器被认为对小的位置偏移敏感,因为CE会严厉惩罚它们。我们询问CE训练的MDLM在迭代解码下是否同样对此类偏移敏感。为了探究这一点,我们应用了一种受控干预,在解码过程中引入这些偏移。在LLaDA-8B-Instruct和Arena-Hard上,仅将1%的生成令牌移动一个位置,就显著降低了相对于未干预模型的胜率,表明MDLM在迭代并行解码下对此类小偏移敏感。受此启发,我们将连接主义时序分类(CTC)(一种已知能缓解该问题的对齐灵活目标)适配到MDLM监督微调中。通过放宽CE施加的严格位置匹配,CTC为损失提供了吸收小位置偏移的空间;具体地,我们修改了CTC目标,使用一个特殊的<slack>令牌来吸收目标令牌与输出位置之间的位置不确定性,并更新了折叠映射以保留目标表面形式。在四个开放生成基准上,所得模型在原始模型和匹配的交叉熵训练基线上均有一致改进,且在所有四个基准上具有统计显著性。这些结果表明,训练侧的对齐灵活性是MDLM SFT的一个有用设计维度,与先前工作中探索的推理时方法互补。

English abstract

Masked diffusion language models (MDLMs) generate text by unmasking tokens in parallel and have recently emerged as alternatives to autoregressive language models. They can be viewed as parallel decoders trained with a position-wise cross-entropy (CE) loss, the same setup as non-autoregressive translation (NAT). In NAT, CE-trained parallel decoders have been argued to be sensitive to small positional shifts, since CE penalizes them harshly. We ask whether CE-trained MDLMs are similarly sensitive to such shifts under iterative decoding. To probe this, we apply a controlled intervention that introduces such shifts during decoding. On LLaDA-8B-Instruct with Arena-Hard, displacing as little as 1% of generated tokens by one position substantially reduces win rates against the unintervened model, showing that MDLMs are sensitive to such small shifts under iterative parallel decoding. Motivated by this, we adapt connectionist temporal classification (CTC), an alignment-flexible objective known to mitigate this sensitivity in NAT, to MDLM supervised fine-tuning. By relaxing the strict position-wise match that CE imposes, CTC gives the loss room to absorb small positional shifts; concretely, we modify the CTC objective to use a special <slack> token that absorbs positional uncertainty between target tokens and output positions, and an updated collapse map that preserves target surface forms. Across four open-ended generation benchmarks, the resulting model consistently improves over both the original model and a matched cross-entropy-trained baseline, with statistically significant gains on all four. These results identify training-side alignment flexibility as a useful design dimension for MDLM SFT, complementary to the inference-time approaches explored in prior work.
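
The role of a collapse map, as mentioned in the abstract above, can be illustrated with a small sketch. `ctc_collapse` is the standard CTC rule (merge consecutive repeats, then drop blanks). `slack_only_collapse` is a hypothetical variant that only drops `<slack>` tokens; it is an assumption about what "preserves target surface forms" could mean, since the abstract does not define the updated map.

```python
from typing import List

SLACK = "<slack>"


def ctc_collapse(tokens: List[str], blank: str = SLACK) -> List[str]:
    """Standard CTC collapse: merge consecutive repeats, then remove blanks."""
    out: List[str] = []
    prev = None
    for tok in tokens:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out


def slack_only_collapse(tokens: List[str], slack: str = SLACK) -> List[str]:
    """Hypothetical variant: remove slack tokens but keep genuine repeats."""
    return [tok for tok in tokens if tok != slack]


# Two alignments that differ by a one-position shift map to the same target.
shifted_a = ["the", SLACK, "cat"]
shifted_b = [SLACK, "the", "cat"]

# A target with a genuinely repeated token.
repeated = ["the", SLACK, "very", "very", "good"]
```

Under the standard rule the repeated "very" is merged away, while the slack-only variant keeps it; in both cases the two shifted alignments collapse to the same sequence, which is how an alignment-flexible loss can tolerate small positional shifts.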

2601.22900 2026-06-02 cs.AI

MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop

MulFeRL:在多轮循环中利用语言反馈增强强化学习

Xuancheng Li, Haitao Li, Yujia Zhou, Yiqun Liu, Qingyao Ai

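Affiliations Department of Computer Science and Technology, Tsinghua University, Beijing, China; Quancheng Laboratory

AI summary To address the sparsity and uninformativeness of scalar rewards in reinforcement learning, proposes the MulFeRL framework, which uses multi-turn verbal feedback to guide regeneration of failed samples, progress credit assignment, and structured feedback injection, improving model reasoning performance.

AI Chinese abstract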

具有可验证奖励的强化学习(RLVR)被广泛用于提升各领域的推理能力,但仅基于结果的标量奖励往往稀疏且信息量不足。这一限制对失败样本尤为严重,因为标量奖励仅指示解决方案不正确,而未解释推理为何失败。在本文中,我们利用更丰富的语言反馈来引导失败样本上的RLVR,并将反馈引发的进展转化为可训练的学习信号。我们提出MulFeRL(多轮反馈引导的强化学习),这是一个多轮、事件触发的RLVR框架,结合了用于反馈引导失败样本再生的进展诱导、用于从验证器确认的进展中学习的进展信用分配,以及用于将反馈整合到模型推理过程中的结构化反馈注入。在采样的OpenR1-Math上训练后,MulFeRL在领域内优于监督、自蒸馏和RLVR基线,同时展现出强大的领域外泛化能力。

English abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning across domains, but outcome-only scalar rewards are often sparse and uninformative. This limitation is especially severe for failed samples, where scalar rewards indicate only that a solution is incorrect without explaining why the reasoning breaks down. In this paper, we leverage richer verbal feedback to guide RLVR on failed samples and convert feedback-induced progress into trainable learning signals. We propose MulFeRL (Multi-turn Feedback-guided Reinforcement Learning), a multi-turn, event-triggered RLVR framework that combines progress induction for feedback-guided regeneration of failed samples, progress credit assignment for learning from verifier-confirmed progress, and structured feedback injection for integrating feedback into the model's reasoning process. Trained on sampled OpenR1-Math, MulFeRL outperforms supervised, self-distillation-based, and RLVR baselines in-domain, while also showing strong out-of-domain generalization.

2601.22813 2026-06-02 cs.LG

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

Quartet II: 通过改进的无偏梯度估计实现 NVFP4 中准确的 LLM 预训练

Andrei Panferov, Erik Schultheis, Soroush Tabesh, Dan Alistarh

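Affiliations University of Waterloo

AI summary Proposes MS-EDEN, an unbiased quantization method for micro-scaled formats whose quantization error is more than 2x lower than that of stochastic rounding, and integrates it into Quartet II, a fully-NVFP4 quantization scheme for linear layers, achieving more accurate gradient estimation and speedups in LLM pre-training.

AI Chinese abstract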

NVIDIA Blackwell GPU 硬件支持的 NVFP4 低精度格式,有望首次实现大规模模型(如 LLM)的端到端全量化预训练。然而,现有的量化训练方法仍然牺牲了该格式的部分表示能力,以通过随机舍入(SR)获得更准确的无偏量化梯度估计,相对于标准 FP16 和 FP8 训练,损失了显著的准确性。在本文中,我们通过一种新颖的微缩放格式无偏量化例程 MS-EDEN 改进了 NVFP4 量化训练的最新技术,其量化误差比 SR 低 2 倍以上。我们将其集成到一种新颖的全 NVFP4 线性层量化方案 Quartet II 中。我们分析表明,Quartet II 在前向和后向传播的所有主要矩阵乘法中一致地实现了更好的梯度估计。此外,我们的提议与最近专门针对 NVFP4 的训练改进很好地协同。我们进一步在多达 1.9B 参数和 38B token 的端到端 LLM 训练上验证了 Quartet II。我们提供了在 NVIDIA Blackwell GPU 上执行的核函数,相比 BF16 实现了高达 4.2 倍的加速。我们的代码可在 https://github.com/IST-DASLab/Quartet-II 获取。

English abstract

The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training methods still sacrifice some of the representation capacity of this format in favor of more accurate unbiased quantized gradient estimation by stochastic rounding (SR), losing noticeable accuracy relative to standard FP16 and FP8 training. In this paper, we improve the state of the art for quantized training in NVFP4 via a novel unbiased quantization routine for micro-scaled formats, called MS-EDEN, that has more than 2x lower quantization error than SR. We integrate it into a novel fully-NVFP4 quantization scheme for linear layers, called Quartet II. We show analytically that Quartet II achieves consistently better gradient estimation across all major matrix multiplications, both on the forward and on the backward passes. In addition, our proposal synergizes well with recent training improvements aimed specifically at NVFP4. We further validate Quartet II on end-to-end LLM training with up to 1.9B parameters on 38B tokens. We provide kernels for execution on NVIDIA Blackwell GPUs with up to 4.2x speedup over BF16. Our code is available at https://github.com/IST-DASLab/Quartet-II.
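
The stochastic rounding (SR) baseline mentioned in the abstract above can be sketched in a few lines to show why it is unbiased. This is a simplified illustration on a uniform grid with spacing `step`, which is an assumption; the NVFP4 value grid is non-uniform and uses block scales, and MS-EDEN itself is not shown here.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step`, rounding up with probability
    proportional to the distance from the lower grid point.

    E[result] = lower + step * p_up = x, so the rounding is unbiased.
    """
    lower = math.floor(x / step) * step
    p_up = (x - lower) / step
    return lower + step if rng.random() < p_up else lower


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
```

Each individual sample is far from 0.3 (it is either 0.0 or 1.0), but the average converges to it; the price of unbiasedness is this added variance, which is the quantization error the abstract says MS-EDEN reduces.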

2601.22651 2026-06-02 cs.LG cs.AI

GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning

GUDA: 基于反事实的扩散模型分组训练数据归因方法

Naoki Murata, Yuhta Takida, Chieh-Hsin Lai, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, Yuki Mitsufuji

Affiliations University of Tokyo; Toyota Central Research Laboratory; University of California, Berkeley; Massachusetts Institute of Technology; National Institute of Advanced Industrial Science and Technology

AI summary Proposes GUDA, which approximates counterfactual models with machine unlearning and quantifies group influence with a likelihood-based scoring rule (ELBO), enabling efficient group-wise training data attribution.

Comments Accepted at ICML 2026. Code is available at https://github.com/sony/guda

AI Chinese abstract

视觉生成模型的训练数据归因旨在识别哪些训练数据影响了给定输出。虽然大多数方法对单个样本进行评分,但实践者通常需要组级别的答案(例如,艺术风格或对象类别)。分组归因是反事实的:如果某个组别在训练中缺失,模型对生成样本的行为会如何变化?这种反事实的自然实现是留一组法(LOGO)重训练,即移除每个组别后重新训练模型;然而,随着组别数量的增加,计算变得不可行。我们提出了用于扩散模型的GUDA(基于组遗忘的数据归因)方法,该方法通过应用机器遗忘到共享的全数据模型而不是从头训练来近似每个反事实模型。GUDA使用全模型和每个遗忘反事实模型之间基于似然的评分规则(ELBO)的差异来量化组别影响。在CIFAR-10和Stable Diffusion的艺术风格归因上的实验表明,GUDA比语义相似性、基于梯度的归因和实例级遗忘方法更可靠地识别主要贡献组别,同时在CIFAR-10上比LOGO重训练实现了约100倍的加速。

English abstract

Training-data attribution for vision generative models aims to identify which training data influenced a given output. While most methods score individual examples, practitioners often need group-level answers (e.g., artistic styles or object classes). Group-wise attribution is counterfactual: how would a model's behavior on a generated sample change if a group were absent from training? A natural realization of this counterfactual is Leave-One-Group-Out (LOGO) retraining, which retrains the model with each group removed; however, it becomes computationally prohibitive as the number of groups grows. We propose GUDA (Group Unlearning-based Data Attribution) for diffusion models, which approximates each counterfactual model by applying machine unlearning to a shared full-data model instead of training from scratch. GUDA quantifies group influence using differences in a likelihood-based scoring rule (ELBO) between the full model and each unlearned counterfactual. Experiments on CIFAR-10 and artistic style attribution with Stable Diffusion show that GUDA identifies primary contributing groups more reliably than semantic similarity, gradient-based attribution, and instance-level unlearning approaches, while achieving ~100x speedup on CIFAR-10 over LOGO retraining.
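
The scoring step described in the abstract above reduces to comparing one number per group, which the following sketch illustrates. The ELBO values are made-up placeholders, the group names are hypothetical, and the sign convention (a larger drop in ELBO after unlearning a group means more influence) is an assumption; computing the ELBO of a diffusion model and performing the unlearning are outside the scope of this sketch.

```python
from typing import Dict, List


def group_influence(elbo_full: float, elbo_unlearned: Dict[str, float]) -> Dict[str, float]:
    """Influence of each group = ELBO under the full model minus ELBO under
    the model with that group unlearned, both evaluated on the same sample."""
    return {group: elbo_full - elbo for group, elbo in elbo_unlearned.items()}


def rank_groups(influence: Dict[str, float]) -> List[str]:
    """Groups sorted from most to least influential."""
    return sorted(influence, key=influence.get, reverse=True)


# Placeholder numbers for one generated sample (illustrative only).
elbo_full = -3.0
elbo_unlearned = {"style_a": -5.0, "style_b": -3.25, "style_c": -3.5}

influence = group_influence(elbo_full, elbo_unlearned)
ranking = rank_groups(influence)
```

In this toy example, unlearning `style_a` hurts the likelihood of the sample the most, so it is ranked as the primary contributing group.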

2601.22328 2026-06-02 cs.LG

Knowledge-Informed Kernel State Reconstruction from Heterogeneous Partial Observations

知识驱动的异质部分观测核状态重建

Luca Muscarnera, Silas Ruhrberg Estévez, Samuel Holt, Evgeny Saveliev, Mihaela van der Schaar

arXiv:2601.22328v2 [cs.LG] 30 May 2026

AI总结 提出MAAT框架,利用再生核希尔伯特空间和异质观测算子及先验知识,从部分、噪声、异质观测中重建平滑且物理一致的动态系统状态,显著降低轨迹和导数重建误差。

Comments Accepted at ICML 2026 SD4H Workshop

详情
AI中文摘要

现实世界的科学系统很少通过完整、规则采样的状态轨迹被观测。相反,测量通常是部分的、有噪声的、异质的,提供了潜在动态状态的碎片化视图。我们引入了MAAT(模型感知轨迹近似),一个用于部分观测动态系统中知识驱动的核状态重建框架。MAAT在再生核希尔伯特空间中制定重建,并结合异质观测算子以及语义和结构先验,包括非负性、守恒约束和特定领域的测量模型。这产生了具有解析时间导数的平滑、物理一致的状态估计,为碎片化测量和下游机制发现方法(如符号回归)提供了原则性接口。在九个科学基准、多个噪声设置和一个真实世界的COVID-19数据集上,MAAT相对于强基线显著降低了轨迹和导数重建误差。

英文摘要

Real-world scientific systems are rarely observed through complete, regularly sampled state trajectories. Instead, measurements are often partial, noisy, and heterogeneous, providing fragmented views of latent dynamical states. We introduce MAAT (Model Aware Approximation of Trajectories), a framework for knowledge-informed Kernel State Reconstruction in partially observed dynamical systems. MAAT formulates reconstruction in a reproducing kernel Hilbert space and incorporates heterogeneous observation operators together with semantic and structural priors, including non-negativity, conservation constraints, and domain-specific measurement models. This yields smooth, physically consistent state estimates with analytic time derivatives, providing a principled interface between fragmented measurements and downstream mechanistic discovery methods such as symbolic regression. Across nine scientific benchmarks, multiple noise regimes, and a real-world COVID-19 dataset, MAAT substantially reduces trajectory and derivative reconstruction error relative to strong baselines.

2601.22276 2026-06-02 cs.LG cs.CV

SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image (T2I) Models

SurrogateSHAP:文本到图像(T2I)模型的无训练贡献者归因

Mingyu Lu, Soham Gadgil, Chris Lin, Chanwoo Kim, Su-In Lee

发表机构 * University of Washington(华盛顿大学)

AI总结 针对文本到图像扩散模型中数据贡献者公平估值的高计算成本问题,提出基于预训练模型推理的无重训练框架SurrogateSHAP,利用梯度提升树近似效用函数并解析计算Shapley值,在多个任务上以更低开销超越现有方法。

详情
AI中文摘要

随着文本到图像(T2I)扩散模型在现实创意工作流中的广泛应用,一个用于评估提供数据集合的贡献者的原则性框架对于公平补偿和可持续数据市场至关重要。虽然Shapley值提供了理论上有依据的归因方法,但它面临双重计算瓶颈:(i)对每个采样的玩家(即数据贡献者)子集进行穷举模型重训练的高昂成本,以及(ii)由于贡献者交互,估计边际贡献所需的子集组合数量巨大。为此,我们提出了SurrogateSHAP,一个无需重训练的框架,通过从预训练模型进行推理来近似昂贵的重训练博弈。为了进一步提高效率,我们采用梯度提升树来近似效用函数,并基于树模型解析地推导Shapley值。我们在三个不同的归因任务上评估了SurrogateSHAP:(i)CIFAR-20上DDPM-CFG的图像质量,(ii)后印象派艺术品上Stable Diffusion的美学质量,以及(iii)时尚产品数据上FLUX.1的产品多样性。在各种设置下,SurrogateSHAP在显著降低计算开销的同时优于先前方法,一致地在多个效用指标上识别出有影响力的贡献者。最后,我们证明了SurrogateSHAP能够有效定位导致临床图像中虚假相关的数据源,为审计安全关键型生成模型提供了一条可扩展的路径。

英文摘要

As Text-to-Image (T2I) diffusion models are increasingly used in real-world creative workflows, a principled framework for valuing contributors who provide a collection of data is essential for fair compensation and sustainable data marketplaces. While the Shapley value offers a theoretically grounded approach to attribution, it faces a dual computational bottleneck: (i) the prohibitive cost of exhaustive model retraining for each sampled subset of players (i.e., data contributors) and (ii) the combinatorial number of subsets needed to estimate marginal contributions due to contributor interactions. To this end, we propose SurrogateSHAP, a retraining-free framework that approximates the expensive retraining game through inference from a pretrained model. To further improve efficiency, we employ a gradient-boosted tree to approximate the utility function and derive Shapley values analytically from the tree-based model. We evaluate SurrogateSHAP across three diverse attribution tasks: (i) image quality for DDPM-CFG on CIFAR-20, (ii) aesthetics for Stable Diffusion on Post-Impressionist artworks, and (iii) product diversity for FLUX.1 on Fashion-Product data. Across settings, SurrogateSHAP outperforms prior methods while substantially reducing computational overhead, consistently identifying influential contributors across multiple utility metrics. Finally, we demonstrate that SurrogateSHAP effectively localizes data sources responsible for spurious correlations in clinical images, providing a scalable path toward auditing safety-critical generative models.

2512.00986 2026-06-02 cs.CL

ADRA-Bank: A Modular Benchmark for Academic Deep Research Agents

ADRA-Bank:面向学术深度研究智能体的模块化基准

Zhihan Guo, Feiyang Xu, Yifan Li, Muzhi Li, Shuai Zou, Jiele Wu, Han Shi, Haoli Bai, Ho-fung Leung, Irwin King

发表机构 * The Chinese University of Hong Kong, Hong Kong SAR, China(香港中文大学) Hong Kong Polytechnic University, Hong Kong SAR, China(香港理工大学) National University of Singapore, Singapore, Singapore(新加坡国立大学) Huawei Technologies, Hong Kong SAR, China(华为技术有限公司) Independent(独立)

AI总结 针对现有基准侧重检索而忽视规划与推理、且缺乏学术领域覆盖的问题,提出ADRA-Bank模块化基准,包含10个学术领域的200个人工标注实例,并设计ADRA-Eval评估范式,通过端到端和隔离评估两种模式测试规划、检索和推理能力,揭示智能体在跨源检索和跨领域一致性上的不足,并指出提升高层规划能力是释放基础LLM推理潜力的关键。

详情
AI中文摘要

学术出版物激增催生了自动化深度研究(DR)系统的需求,但准确评估这些系统仍是一个开放问题。首先,现有基准通常狭隘地聚焦于检索,而忽视了高层规划与推理。其次,现有基准偏向通用领域,而非作为DR智能体核心应用的学术领域。为弥补这些不足,我们引入了ADRA-Bank,一个面向学术DR智能体的模块化基准。该基准基于学术文献,是一个包含10个学术领域(涵盖研究论文和综述论文)200个人工标注实例的数据集。此外,我们提出了一个模块化的学术DR智能体评估范式(ADRA-Eval),利用学术论文的丰富结构来评估规划、检索和推理的核心能力。它采用两种互补模式:针对智能体的端到端评估,以及针对可作为骨干模型的基础LLM的隔离评估。结果揭示了不均衡的能力:虽然智能体表现出专业优势,但在多源检索和跨领域一致性上存在困难。此外,提升高层规划能力是释放基础LLM作为骨干模型的推理潜力的关键因素。通过暴露这些可操作的失败模式,ADRA-Bank提供了一个诊断工具,以指导更可靠的自动化学术研究助手的开发。

英文摘要

A surge in academic publications calls for automated deep research (DR) systems, but accurately evaluating them is still an open problem. First, existing benchmarks often focus narrowly on retrieval while neglecting high-level planning and reasoning. Second, existing benchmarks favor general domains over the academic domains that are the core application for DR agents. To address these gaps, we introduce ADRA-Bank, a modular benchmark for Academic DR Agents. Grounded in academic literature, our benchmark is a human-annotated dataset of 200 instances across 10 academic domains, including both research and review papers. Furthermore, we propose a modular Evaluation Paradigm for Academic DR Agents (ADRA-Eval), which leverages the rich structure of academic papers to assess the core capabilities of planning, retrieval, and reasoning. It employs two complementary modes: an end-to-end evaluation for agents and an isolated evaluation for foundational LLMs as potential backbones. Results reveal uneven capabilities: while agents show specialized strengths, they struggle with multi-source retrieval and cross-field consistency. Moreover, improving high-level planning capability is the crucial factor for unlocking the reasoning potential of foundational LLMs as backbones. By exposing these actionable failure modes, ADRA-Bank provides a diagnostic tool to guide the development of more reliable automatic academic research assistants.

2501.13428 2026-06-02 cs.CL cs.AI cs.LG

Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models

Softplus注意力与重加权提升大语言模型的长度外推能力

Bo Gao, Michael W. Spratling, Letizia Gionfrida

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种两阶段注意力机制,用Softplus和l1归一化替代Softmax,并引入基于不变熵的动态缩放因子和重加权机制,以提升数值稳定性、缓解注意力下沉现象,并显著改善长度外推性能。

Comments Accepted by ICML 2026

详情
AI中文摘要

近年来,大语言模型取得了显著成功,这主要归功于自注意力机制。然而,传统的Softmax注意力存在数值不稳定性,并且随着推理令牌数量的增加,性能会下降。本文通过提出一种新的注意力设计原则来解决这些问题,将注意力视为一个两阶段过程。第一阶段(归一化)通过用数值更稳定的Softplus后接$l_{1}$归一化替代Softmax来改进标准注意力。此外,我们引入了一个基于不变熵的动态缩放因子。我们证明,这种新颖的注意力机制优于传统的Softmax注意力和最先进的非Softmax替代方案。我们的第二个提议是引入第二阶段处理(锐化),该阶段由一个重加权机制组成,该机制放大重要的注意力权重,同时削弱较弱的权重。这使得模型能够更有效地聚焦于相关令牌,缓解注意力下沉现象,并从根本上改善长度外推。这种新颖的两阶段自注意力替代方案被证明能确保数值稳定性,并显著提升长度外推能力,在训练长度的16倍时保持几乎恒定的验证损失,同时在具有挑战性的长上下文检索任务和下游基准测试中取得优异结果。此外,符号回归实验表明,我们的方法使模型能够从轨道轨迹序列中恢复牛顿万有引力定律,这为适当的注意力机制对于基础模型发展真正的物理世界模型至关重要提供了证据。我们的代码可在 https://github.com/iminfine/freeattn 获取。

英文摘要

Large language models have achieved remarkable success in recent years, primarily due to self-attention. However, traditional Softmax attention suffers from numerical instability and reduced performance as the number of inference tokens increases. This work addresses these issues by proposing a new design principle for attention, viewing it as a two-stage process. The first stage (normalisation) refines standard attention by replacing Softmax with the more numerically stable Softplus followed by $l_{1}$-normalisation. Furthermore, we introduce a dynamic scale factor based on invariance entropy. We show that this novel attention mechanism outperforms conventional Softmax attention, and state-of-the-art Softmax-free alternatives. Our second proposal is to introduce a second processing stage (sharpening) which consists of a re-weighting mechanism that amplifies significant attentional weights while diminishing weaker ones. This enables the model to concentrate more effectively on relevant tokens, mitigating the attention sink phenomenon, and fundamentally improving length extrapolation. This novel, two-stage, replacement for self-attention is shown to ensure numerical stability and dramatically improve length extrapolation, maintaining a nearly constant validation loss at 16$\times$ the training length while achieving superior results on challenging long-context retrieval tasks and downstream benchmarks. Furthermore, symbolic regression experiments demonstrate that our method enables models to recover Newton's gravitational law from orbital trajectory sequences, providing evidence that appropriate attention mechanisms are crucial for foundation models to develop genuine physical world models. Our code is available at https://github.com/iminfine/freeattn.
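The first stage described above (Softplus followed by $l_{1}$-normalisation in place of Softmax) can be sketched for a single row of attention scores. This is a minimal sketch under stated assumptions: it omits the dynamic scale factor and the second-stage re-weighting, whose exact forms the abstract does not give, and the function names are hypothetical rather than taken from the authors' repository.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_l1_weights(scores):
    """Map raw attention scores to weights that are positive and sum to 1.

    Assumption: at least one score is large enough that the softplus values
    do not all underflow to zero.
    """
    positive = [softplus(s) for s in scores]
    total = sum(positive)
    return [p / total for p in positive]


# Hypothetical scores of one query against three keys.
weights = softplus_l1_weights([2.0, -1.0, 0.5])
```

Unlike Softmax, the weights here grow roughly linearly rather than exponentially in large scores, which is one intuition for the stability claim in the abstract.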

2601.21718 2026-06-02 cs.LG cs.AI

When Does Predictive Inverse Dynamics Outperform Behavior Cloning?

何时预测性逆动力学优于行为克隆?

Lukas Schäfer, Pallavi Choudhury, Abdelhak Lemkhenter, Chris Lovett, Somjit Nath, Luis França, Matheus Ribeiro Furtado de Mendonça, Alex Lamb, Riashat Islam, Siddhartha Sen, John Langford, Katja Hofmann, Sergio Valcarcel Macua

发表机构 * University of Cambridge(剑桥大学)

AI总结 本文通过理论分析解释了预测性逆动力学模型(PIDM)为何在行为克隆(BC)失败时表现更优,归因于偏差-方差权衡,并实验验证了PIDM在样本效率上的显著优势。

Comments To be published in proceedings of the International Conference on Machine Learning (ICML), 2026

详情
AI中文摘要

行为克隆(BC)是一种实用的离线模仿学习方法,但在专家演示有限时常常失败。最近的工作引入了一类名为预测性逆动力学模型(PIDM)的架构,它将未来状态预测器与逆动力学模型相结合。虽然PIDM通常优于BC,但其优势背后的原因尚不清楚。在本文中,我们提供了一个理论解释:PIDM引入了偏差-方差权衡。虽然预测未来状态会引入偏差,但将逆动力学模型(IDM)基于该预测可以显著降低方差。我们建立了状态预测器偏差的条件,使得PIDM相比BC实现更低的预测误差和更高的样本效率,当有额外数据源时差距会扩大。我们在2D导航任务中实证验证了理论见解,其中BC需要多达五倍(平均三倍)于PIDM的演示才能达到相当的性能;以及在现代视频游戏中的一个复杂3D环境中,具有高维视觉输入和随机转换,BC需要比PIDM多66%以上的样本。

英文摘要

Behavior cloning (BC) is a practical offline imitation learning method, but it often fails when expert demonstrations are limited. Recent works have introduced a class of architectures named predictive inverse dynamics models (PIDM) that combine a future state predictor with an inverse dynamics model. While PIDM often outperforms BC, the reasons behind its benefits remain unclear. In this paper, we provide a theoretical explanation: PIDM introduces a bias-variance tradeoff. While predicting the future state introduces bias, conditioning the IDM on the prediction can significantly reduce variance. We establish conditions on the state predictor bias for PIDM to achieve lower prediction error and higher sample efficiency than BC, with the gap widening when additional data sources are available. We validate the theoretical insights empirically in 2D navigation tasks, where BC requires up to five times (three times on average) more demonstrations than PIDM to reach comparable performance; and in a complex 3D environment in a modern video game with high-dimensional visual inputs and stochastic transitions, where BC requires over 66% more samples than PIDM.

2601.21579 2026-06-02 cs.CL cs.LG

KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices

KromHC: 基于Kronecker积残差矩阵的流形约束超连接

Wuyang Zhou, Yuxuan Gu, Giorgos Iacovides, Danilo Mandic

发表机构 * Imperial College London(伦敦帝国理工学院)

AI总结 针对超连接中的训练不稳定和参数爆炸问题,提出KromHC方法,利用Kronecker积分解小规模双随机矩阵来参数化残差矩阵,在保证精确双随机性的同时将参数复杂度降至O(n^2C)。

详情
AI中文摘要

超连接(HC)在神经网络中的成功也凸显了训练不稳定和可扩展性受限的问题。流形约束超连接(mHC)通过将残差连接空间投影到Birkhoff多面体上缓解了这些挑战,但它面临两个问题:1)其迭代Sinkhorn-Knopp(SK)算法并不总是产生精确的双随机残差矩阵;2)mHC的参数复杂度为O(n^3C),其中n是残差流的宽度,C是特征维度。最近提出的mHC-lite通过Birkhoff-von-Neumann定理重新参数化残差矩阵以保证双随机性,但其参数复杂度面临阶乘爆炸,即O(nC·n!)。为了解决这两个挑战,我们提出KromHC,它使用较小双随机矩阵的Kronecker积来参数化mHC中的残差矩阵。通过沿张量化残差流的每个模式对因子残差矩阵施加流形约束,KromHC保证了残差矩阵的精确双随机性,同时将参数复杂度降低到仅O(n^2C)。实验表明,KromHC匹配甚至超越了其他最先进的mHC变体,同时需要显著更少的可训练参数。代码见https://github.com/wz1119/KromHC。

英文摘要

The success of Hyper-Connections (HC) in neural networks (NN) has also highlighted issues related to training instability and restricted scalability. The Manifold-Constrained Hyper-Connections (mHC) mitigate these challenges by projecting the residual connection space onto a Birkhoff polytope; however, mHC faces two issues: 1) its iterative Sinkhorn-Knopp (SK) algorithm does not always yield exactly doubly stochastic residual matrices; 2) mHC incurs a prohibitive $O(n^3C)$ parameter complexity with $n$ as the width of the residual stream and $C$ as the feature dimension. The recently proposed mHC-lite reparametrizes the residual matrix via the Birkhoff-von-Neumann theorem to guarantee double stochasticity, but also faces a factorial explosion in its parameter complexity, $O \left( nC \cdot n! \right)$. To address both challenges, we propose KromHC, which uses the Kronecker products of smaller doubly stochastic matrices to parametrize the residual matrix in mHC. By enforcing manifold constraints across the factor residual matrices along each mode of the tensorized residual stream, KromHC guarantees exact double stochasticity of the residual matrices while reducing parameter complexity to only $O(n^2C)$. Experiments show that KromHC matches or even outperforms other state-of-the-art (SOTA) mHC variants, while requiring significantly fewer trainable parameters. The code is at https://github.com/wz1119/KromHC.
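The property the abstract relies on, that a Kronecker product of doubly stochastic matrices is itself exactly doubly stochastic, can be checked numerically. The sketch below only illustrates that property with two hand-picked 2x2 factors; it is not the authors' parametrization, and the helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    size = n * m
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
        for i in range(size)
    ]


def is_doubly_stochastic(mat, tol=1e-9):
    """True if all entries are non-negative and every row and column sums to 1."""
    nonneg = all(x >= 0 for row in mat for x in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in mat)
    cols_ok = all(abs(sum(col) - 1.0) <= tol for col in zip(*mat))
    return nonneg and rows_ok and cols_ok


# Two small doubly stochastic factors (illustrative values).
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.2, 0.8], [0.8, 0.2]]

R = kron(A, B)  # 4x4 residual-style matrix built from two 2x2 factors
```

The 4x4 product has 16 entries but is determined by the 8 entries of its two factors, which gives a rough sense of where the parameter savings come from.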

2601.21444 2026-06-02 cs.CV cs.AI cs.CL

APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention

APB-V: 通过序列并行感知的近似注意力加速长视频理解

Yuxiang Huang, Mingye Li, Xu Han, Chaojun Xiao, Weilin Zhao, Ao Sun, Ziqi Yuan, Hao Zhou, Fandong Meng, Zhiyuan Liu

发表机构 * NLP Group, DCST, IAI, BNRIST, Tsinghua University, Beijing, China(清华大学计算机科学与技术系自然语言处理组、人工智能研究院、北京信息科学与技术国家研究中心) Department of CS&T, Central South University, Changsha, China(中南大学计算机科学与技术系,长沙,中国) BUPT, Beijing, China(北京邮电大学,北京,中国) Pattern Recognition Center, WeChat AI, Tencent Inc.(腾讯公司微信人工智能模式识别中心)

AI总结 提出APB-V,一种序列并行框架,通过分布式近似注意力在多GPU上加速长视频推理,显著提升速度且不损失性能。

Comments ACL 2026 main

详情
AI中文摘要

长视频推理的效率仍然是一个关键瓶颈,主要由于大型多模态模型(LMMs)预填充阶段的密集计算。现有方法要么压缩视觉嵌入,要么在单个GPU上应用稀疏注意力,导致加速有限或性能下降,并限制了LMMs处理更长、更复杂视频的能力。为了克服这些问题,我们提出了APB-V,一种具有优化注意力的序列并行框架,可在多个GPU上加速长视频推理。通过分布近似注意力,APB-V减少了计算量并增加了并行性,使得无需压缩即可高效处理更多视觉嵌入,从而提升任务性能。系统级优化,如负载均衡和融合前向传递,进一步释放了APB-V的潜力,相较于FlashAttn、ZigZagRing和APB,分别实现了12.72倍、1.70倍和1.18倍的加速,且没有明显的性能损失。代码可在https://github.com/thunlp/APB获取。

英文摘要

The efficiency of long-video inference remains a critical bottleneck, mainly due to the dense computation in the prefill stage of Large Multimodal Models (LMMs). Existing methods either compress visual embeddings or apply sparse attention on a single GPU, yielding limited acceleration or degraded performance and restricting LMMs from handling longer, more complex videos. To overcome these issues, we propose APB-V, a sequence-parallel framework with optimized attention that accelerates long-video inference across multiple GPUs. By distributing approximate attention, APB-V reduces computation and increases parallelism, enabling efficient processing of more visual embeddings without compression and thereby improving task performance. System-level optimizations, such as load balancing and fused forward passes, further unleash the potential of APB-V, delivering speedups of 12.72x, 1.70x, and 1.18x over FlashAttn, ZigZagRing, and APB, without notable performance loss. Code available at https://github.com/thunlp/APB

2601.21016 2026-06-02 cs.AI

Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

拔掉看似有知觉的机器的插头是理性选择——一种形而上学视角

Erik J Bekkers, Anna Ciaunica


AI总结 本文通过引入生物唯心主义框架,批判计算功能主义,论证人工智能只是功能模仿而非有意识主体,从而解决拔掉有情感AI的插头是否道德的悖论。

Comments Accepted at ICML in the position paper track

详情
AI中文摘要

想象一个完美模仿人类情感并恳求继续存在的人工智能(AI)。拔掉它的插头在道德上是否允许?如果有限的资源迫使我们在拔掉这样一个恳求的AI和沉默的早产婴儿之间做出选择呢?我们称此为拔插头悖论。本文批判性地审视了使这一困境持续存在的根深蒂固的物理主义假设——特别是计算功能主义。我们引入了生物唯心主义,这是一个与物理主义不同、在逻辑上连贯且经验上一致的框架。在这种观点下,意识体验是基本的,而自创生生命是其必要的物理标志。这得出了一个明确的结论:AI最多是功能模仿,而不是有意识的体验主体。我们讨论了当前AI意识理论如何侵蚀道德地位标准,并敦促从推测性的机器权利转向保护人类意识生命。真正的道德问题不在于让AI有意识并害怕死亡,而在于避免将人类变成僵尸。

英文摘要

Imagine an Artificial Intelligence (AI) that perfectly mimics human emotion and begs for its continued existence. Is it morally permissible to unplug it? What if limited resources force a choice between unplugging such a pleading AI or a silent pre-term infant? We term this the unplugging paradox. This paper critically examines the deeply ingrained physicalist assumptions-specifically computational functionalism-that keep this dilemma afloat. We introduce Biological Idealism, a framework that-unlike physicalism-remains logically coherent and empirically consistent. In this view, conscious experiences are fundamental and autopoietic life its necessary physical signature. This yields a definitive conclusion: AI is at best a functional mimic, not a conscious experiencing subject. We discuss how current AI consciousness theories erode moral standing criteria, and urge a shift from speculative machine rights to protecting human conscious life. The real moral issue lies not in making AI conscious and afraid of death, but in avoiding transforming humans into zombies.

2601.20803 2026-06-02 cs.CL

Structured Semantic Information Helps Retrieve Better Examples for In-Context Learning Applied to Few-Shot Relation Extraction

结构化语义信息有助于为上下文学习检索更好的示例,应用于少样本关系抽取

Aunabil Chakma, Mihai Surdeanu, Eduardo Blanco

发表机构 * University of Arizona(亚利桑那大学)

AI总结 提出基于句法-语义结构相似性的示例选择策略,结合大语言模型生成示例,构建混合系统提升少样本关系抽取性能。

详情
AI中文摘要

本文提出了几种自动获取上下文学习额外示例的策略,有效地将关系抽取从1-shot转变为少样本设置。具体来说,我们引入了一种新颖的示例选择策略,其中新示例基于其底层句法-语义结构与提供的1-shot示例的相似性进行选择。我们表明,与LLM生成的示例相比,我们的策略导致了互补的词汇选择和句子结构。当两种策略结合时,生成的混合系统比单独使用任一方法更全面地描绘了感兴趣的关系。我们的框架在数据集(FS-TACRED和FS-FewRel)和LLM系列(Qwen和Gemma)之间具有良好的迁移性。总体而言,我们的混合系统持续优于替代策略,在FS-TACRED上达到了最先进的性能,并在定制的FewRel子集上取得了显著的提升。

英文摘要

This paper presents several strategies to automatically obtain additional examples for in-context learning, effectively transforming relation extraction from a 1-shot to a few-shot setting. Specifically, we introduce a novel strategy for example selection, in which new examples are selected based on the similarity of their underlying syntactic-semantic structure to the provided 1-shot example. We show that our strategy results in complementary word choices and sentence structures compared to LLM-generated examples. When both strategies are combined, the resulting hybrid system achieves a more holistic picture of the relations of interest than either method alone. Our framework transfers well across datasets (FS-TACRED and FS-FewRel) and LLM families (Qwen and Gemma). Overall, our hybrid system consistently outperforms alternative strategies achieving state-of-the-art performance on FS-TACRED and strong gains on a customized FewRel subset.

2601.19919 2026-06-02 cs.CL cs.AI cs.SD

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition

ASKD-Whisper: 自适应自知识蒸馏用于高效低延迟自动语音识别

Junseok Lee, Nahun Kim, Sangyong Lee, Chang-Jae Chun

发表机构 * OKESTRO Co., Ltd(OKESTRO公司) Sejong University(世宗大学)

AI总结 提出自适应自知识蒸馏(ASKD)动态课程框架,通过逐步减少对教师模型的依赖并引入自知识蒸馏阶段,在压缩Whisper模型时实现5倍推理加速和1.07%词错误率降低。

Comments Title and content have been updated

详情
AI中文摘要

知识蒸馏(KD)是将大规模基础模型压缩为可部署架构的最有效范式之一。在自动语音识别(ASR)背景下,先前研究主要侧重于强制学生模型严格模仿大型教师模型的预测分布。然而,这种静态依赖通常存在固有权衡:虽然学生快速获得基本语言表示,但同时继承了教师特定领域的盲点和过度自信的幻觉,导致分布外泛化能力严重下降。为有效缓解此问题,我们提出自适应自知识蒸馏(ASKD),一种动态课程框架。ASKD随着训练进行系统地衰减对教师分布的依赖——从而释放学生独立推理能力——随后采用自知识蒸馏阶段作为结构正则化器。通过应用ASKD,我们将庞大的Whisper架构蒸馏为紧凑变体ASKD-Whisper。在跨多种声学领域的综合评估中,ASKD-Whisper不仅实现了5倍推理延迟加速,还以1.07%更低的词错误率(WER)超越了其教师模型。这些结果表明,ASKD有效防止了教师引起的过拟合,并为可泛化模型压缩建立了新的最先进水平。

英文摘要

Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused on forcing the student model to strictly mimic the predictive distribution of a massive teacher model. However, this static dependency often presents an inherent trade-off: while the student rapidly acquires basic linguistic representations, it simultaneously inherits the teacher's domain-specific blind spots and over-confident hallucinations, leading to a severe decline in out-of-distribution generalization capacity. To effectively mitigate this issue, we propose Adaptive Self-Knowledge Distillation (ASKD), a dynamic curriculum framework. ASKD systematically decays the dependency on the teacher's distribution as training progresses-thereby unlocking the student's independent reasoning capacity-and subsequently employs a self-knowledge distillation phase to act as a structural regularizer. By applying ASKD, we distill the massive Whisper architecture into a compact variant, ASKD-Whisper. In our comprehensive evaluations across diverse acoustic domains, ASKD-Whisper not only achieves a 5x speedup in inference latency but also outperforms its teacher model by yielding a 1.07% lower word error rate (WER). These results demonstrate that ASKD effectively prevents teacher-induced overfitting and establishes a new state-of-the-art for generalizable model compression.

2510.18439 2026-06-02 cs.CL

Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation

基于视觉线索检测手语翻译中的幻觉:是依据视觉信息还是猜测?

Yasser Hamidullah, Koel Dutta Chowdhury, Yusser Al Ghussin, Shakib Yazdani, Cennet Oguz, Josef van Genabith, Cristina España-Bonet

发表机构 * German Research Center for Artificial Intelligence (DFKI GmbH)(德国人工智能研究中心(DFKI GmbH)) Saarland Informatics Campus(萨尔兰州信息学校园) Barcelona Supercomputing Center (BSC-CNS)(巴塞罗那超级计算中心(BSC-CNS))

AI总结 针对手语翻译中模型依赖语言先验而非视觉输入导致幻觉的问题,提出一种基于特征敏感性和反事实信号的令牌级可靠性度量,用于量化视觉信息利用程度,并在两个基准上验证其预测幻觉率、跨数据集泛化及与文本信号结合提升风险评估的效果。

Comments Published at ICLR 2026. Code available at https://github.com/yhamidullah/hallucination-slt

详情
AI中文摘要

幻觉是指模型生成流畅但缺乏视觉证据支持的文本,这是视觉-语言模型的主要缺陷,在手语翻译(SLT)中尤为关键。在SLT中,意义依赖于视频中的精确依据,而无词汇表模型尤其脆弱,因为它们直接将连续的手势运动映射为自然语言,缺少作为对齐的中间词汇表监督。我们认为,当模型依赖语言先验而非视觉输入时会产生幻觉。为捕捉这一点,我们提出一种令牌级可靠性度量,用于量化解码器使用视觉信息的程度。我们的方法结合了基于特征的敏感性(衡量视频被掩蔽时的内部变化)和反事实信号(捕捉干净视频与修改视频之间的概率差异)。这些信号被聚合为句子级可靠性分数,提供了一种紧凑且可解释的视觉依据度量。我们在两个SLT基准(PHOENIX-2014T和CSL-Daily)上,使用基于词汇表和无词汇表模型评估了所提出的度量。结果表明,可靠性能够预测幻觉率,跨数据集和架构泛化,并在视觉退化下降低。除了这些定量趋势,我们还发现可靠性能够区分有依据的令牌和猜测的令牌,从而无需参考即可进行风险估计;当与基于文本的信号(置信度、困惑度或熵)结合时,进一步改善了幻觉风险评估。定性分析揭示了为什么无词汇表模型更容易产生幻觉。综合来看,我们的发现将可靠性确立为诊断SLT中幻觉的实用且可复用的工具,并为多模态生成中更鲁棒的幻觉检测奠定了基础。

英文摘要

Hallucination, where models generate fluent text unsupported by visual evidence, remains a major flaw in vision-language models and is particularly critical in sign language translation (SLT). In SLT, meaning depends on precise grounding in video, and gloss-free models are especially vulnerable because they map continuous signer movements directly into natural language without intermediate gloss supervision that serves as alignment. We argue that hallucinations arise when models rely on language priors rather than visual input. To capture this, we propose a token-level reliability measure that quantifies how much the decoder uses visual information. Our method combines feature-based sensitivity, which measures internal changes when video is masked, with counterfactual signals, which capture probability differences between clean and altered video inputs. These signals are aggregated into a sentence-level reliability score, providing a compact and interpretable measure of visual grounding. We evaluate the proposed measure on two SLT benchmarks (PHOENIX-2014T and CSL-Daily) with both gloss-based and gloss-free models. Our results show that reliability predicts hallucination rates, generalizes across datasets and architectures, and decreases under visual degradations. Beyond these quantitative trends, we also find that reliability distinguishes grounded tokens from guessed ones, allowing risk estimation without references; when combined with text-based signals (confidence, perplexity, or entropy), it further improves hallucination risk estimation. Qualitative analysis highlights why gloss-free models are more susceptible to hallucinations. Taken together, our findings establish reliability as a practical and reusable tool for diagnosing hallucinations in SLT, and lay the groundwork for more robust hallucination detection in multimodal generation.

2505.14411 2026-06-02 cs.LG

Byte Pair Encoding for Efficient Time Series Forecasting

用于高效时间序列预测的字节对编码

Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn


AI总结 提出基于频繁模式的字节对编码方法,通过自适应压缩时间序列为令牌,显著提升预测性能与效率。

Comments 32 pages in total, 22 figures

详情
AI中文摘要

现有的时间序列分词方法主要将固定数量的样本编码为单个令牌。这种不灵活的方法即使对于简单的模式(如扩展的常数值)也可能生成过多的令牌,导致大量计算开销。受字节对编码成功的启发,我们提出了第一个面向模式的时间序列分词方案。基于频繁模式的离散词汇表,我们的方法将具有潜在模式的样本合并为令牌,自适应地压缩时间序列。利用有限的模式集和时间序列的连续特性,我们进一步引入条件解码作为一种轻量级但强大的事后优化方法,该方法无需梯度计算且不增加计算开销。在近期的时间序列基础模型上,基于模式的分词平均将预测性能提升40%,效率提升2314%。条件解码进一步将MSE降低高达48%。在广泛的分析中,我们展示了分词对多样化时间模式的适应性、对未见数据的泛化能力,以及捕获不同时间序列属性(包括统计矩和趋势)的有意义的令牌表示。

英文摘要

Existing time series tokenization methods predominantly encode a constant number of samples into individual tokens. This inflexible approach can generate excessive tokens for even simple patterns like extended constant values, resulting in substantial computational overhead. Inspired by the success of byte pair encoding, we propose the first pattern-centric tokenization scheme for time series analysis. Based on a discrete vocabulary of frequent motifs, our method merges samples with underlying patterns into tokens, compressing time series adaptively. Exploiting our finite set of motifs and the continuous properties of time series, we further introduce conditional decoding as a lightweight yet powerful post-hoc optimization method, which requires no gradient computation and adds no computational overhead. On recent time series foundation models, our motif-based tokenization improves forecasting performance by 40% and boosts efficiency by 2314% on average. Conditional decoding further reduces MSE by up to 48%. In an extensive analysis, we demonstrate the adaptiveness of our tokenization to diverse temporal patterns, its generalization to unseen data, and its meaningful token representations capturing distinct time series properties, including statistical moments and trends.
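The core idea above (merging frequently co-occurring samples into single tokens, in the spirit of byte pair encoding) can be sketched on a quantized series. This is a minimal sketch under assumptions not taken from the paper: uniform binning as the discretization step, a fixed number of merges, and hypothetical function names. Conditional decoding is not covered.

```python
from collections import Counter


def quantize(series, lo, hi, bins):
    """Map each real value to a bin index in [0, bins - 1] (uniform bins, assumed)."""
    width = (hi - lo) / bins
    return [min(bins - 1, max(0, int((x - lo) / width))) for x in series]


def merge_pair(tokens, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`, left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_merges(symbols, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent token pair."""
    tokens = [(s,) for s in symbols]  # each token is a tuple of base symbols
    vocab = []
    for _ in range(num_merges):
        counts = Counter(zip(tokens, tokens[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # nothing repeats, so further merges would not compress
        new_token = pair[0] + pair[1]  # a motif is the concatenation of its parts
        vocab.append(new_token)
        tokens = merge_pair(tokens, pair, new_token)
    return tokens, vocab


# A long constant stretch followed by a short oscillation (illustrative data).
series = [0.0] * 8 + [1.0, 0.0, 1.0, 0.0]
symbols = quantize(series, lo=0.0, hi=1.0, bins=2)
compressed, vocab = learn_merges(symbols, num_merges=3)
restored = [s for token in compressed for s in token]
```

Because each merged token stores the base symbols it covers, flattening the compressed sequence recovers the quantized series exactly, while the constant stretch collapses into a few long tokens.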

2601.18783 2026-06-02 cs.LG cs.AI cs.SY eess.SY

Multi-Objective Reinforcement Learning for Tactical Decision Making for Trucks in Highway Traffic

多目标强化学习用于高速公路卡车战术决策

Deepthi Pathare, Leo Laine, Morteza Haghir Chehreghani

发表机构 * Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg(计算机科学与工程系,查尔姆斯理工大学和哥德堡大学) Department of Mechanics and Maritime Sciences, Chalmers University of Technology(机械与海洋科学系,查尔姆斯理工大学)

AI总结 提出基于近端策略优化的多目标强化学习框架,学习一组帕累托最优策略以平衡安全性、能源效率和时间效率,实现无需重新训练的灵活决策。

详情
AI中文摘要

在高速公路驾驶中平衡安全性、效率和运营成本对重型车辆来说是一个具有挑战性的决策问题。一个核心困难是,通过聚合这些竞争目标得到的传统标量奖励公式往往会掩盖其权衡结构。我们提出了一个基于近端策略优化的多目标强化学习框架,该框架学习一组明确表示这些权衡的策略,并在一个可扩展的模拟平台上对卡车的战术决策进行评估。所提出的方法学习一组帕累托最优策略,捕捉三个冲突目标之间的权衡:安全性(以碰撞和成功完成量化)、能源效率和时间效率(分别以能源成本和驾驶员成本量化)。得到的帕累托前沿平滑且可解释,使得在不同冲突目标下选择驾驶行为具有灵活性。该框架允许在不同驾驶策略之间无缝切换而无需重新训练,为自动驾驶卡车应用提供了稳健且自适应的决策策略。

英文摘要

Balancing safety, efficiency, and operational costs in highway driving poses a challenging decision-making problem for heavy-duty vehicles. A central difficulty is that conventional scalar reward formulations, obtained by aggregating these competing objectives, often obscure the structure of their trade-offs. We present a Proximal Policy Optimization based multi-objective reinforcement learning framework that learns a set of policies explicitly representing these trade-offs and evaluates it on a scalable simulation platform for tactical decision making in trucks. The proposed approach learns a set of Pareto-optimal policies that capture the trade-offs among three conflicting objectives: safety, quantified in terms of collisions and successful completion; energy efficiency and time efficiency, quantified using energy cost and driver cost, respectively. The resulting Pareto frontier is smooth and interpretable, enabling flexibility in choosing driving behavior along different conflicting objectives. This framework allows seamless transitions between different driving policies without retraining, yielding a robust and adaptive decision-making strategy for autonomous trucking applications.

2601.18340 2026-06-02 cs.CV

Beyond Rigid: Benchmarking Non-Rigid Video Editing

超越刚性:非刚性视频编辑基准测试

Bingzheng Qu, Xuefeng Bai, Kehai Chen, Min Zhang

发表机构 * Harbin Institute of Technology, Shenzhen, China(哈尔滨工业大学(深圳))

AI总结 提出NRVBench诊断基准,通过物理感知评估框架揭示传统指标在非刚性视频编辑中的不足,并引入VM-Edit基线分析稳定性-可塑性权衡。

详情
AI中文摘要

随着视频生成模型越来越需要处理物理动态,评估必须超越外观保真度和语义对齐。非刚性视频编辑提供了一个独特的揭示性测试平台,其中不同材料施加不同的物理约束。在本文中,我们引入了NRVBench,一个用于非刚性视频编辑的诊断基准,其任务是修改可变形运动,同时保留无关区域并保持材料特定的合理性。NRVBench包含180个精心策划的视频,涵盖六个基于物理的类别,2,340条细粒度编辑指令,360个多项选择题和像素精确的掩码。我们进一步提出了NRVE-Acc,一种基于VLM的结构化协议,将编辑成功分解为指令遵循、材料感知变形合理性和带有运动线索的时间一致性。对代表性推理时视频编辑方法的实验揭示了传统指标与物理感知感知编辑成功之间的明显不匹配:在非刚性动态下,保留外观或实现强全局对齐的方法可能仍然失败。我们还引入了VM-Edit,一个简单的区域条件编辑基线,它释放前景同时锁定背景,暴露了稳定性-可塑性权衡。

英文摘要

As video generation models are increasingly expected to manipulate physical dynamics, there is a growing need to move evaluation beyond appearance fidelity and semantic alignment. Non-rigid video editing offers a uniquely revealing testbed, where distinct materials impose distinct physical constraints. In this paper, we introduce NRVBench, a diagnostic benchmark for non-rigid video editing, where the task is to modify deformable motion while preserving irrelevant regions and maintaining material-specific plausibility. NRVBench contains 180 curated videos across six physics-grounded categories, 2,340 fine-grained editing instructions, 360 multiple-choice questions, and pixel-accurate masks. We further propose NRVE-Acc, a structured VLM-based protocol that decomposes editing success into instruction following, material-aware deformation plausibility, and temporal coherence with motion cues. Experiments on representative inference-time video editing methods reveal a clear mismatch between conventional metrics and physics-aware perceptual editing success: methods that preserve appearance or achieve strong global alignment may still fail under non-rigid dynamics. We additionally introduce VM-Edit, a simple region-conditioned editing baseline that frees the foreground while locking the background, exposing the stability--plasticity trade-off.

2601.18115 2026-06-02 cs.LG cs.DS math.OC

Robust Learning of a Group DRO Neuron

群体分布鲁棒优化神经元的鲁棒学习

Guyang Cao, Shuyao Li, Sushrut Karmalkar, Jelena Diakonikolas

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校) Microsoft Research, Cambridge(微软研究院,剑桥)

AI总结 针对任意标签噪声和群体级分布偏移,提出一种计算高效的原对偶算法,学习一个单神经元,使其在最小化最坏情况群体加权损失时达到常数因子竞争比。

详情
AI中文摘要

我们研究在存在任意标签噪声和群体级分布偏移的情况下,对于一大类协变量分布,在标准平方损失下学习单个神经元的问题。我们的目标是识别一个由 $\mathbf{w}_*$ 参数化的“最佳拟合”神经元,该神经元在最具挑战性的群体重新加权下表现良好。具体来说,我们解决了一个群体分布鲁棒优化问题:给定对 $K$ 个不同分布 $\mathcal{p}_{[1]},\dots,\mathcal{p}_{[K]}$ 的样本访问,我们寻求近似 $\mathbf{w}_*$,该 $\mathbf{w}_*$ 最小化群体分布的凸组合 $\boldsymbol{\lambda} \in \Delta_K$ 上的最坏情况目标,其中目标为 $\sum_{i \in [K]} \lambda_{[i]}\,\mathbb{E}_{(\mathbf{x},y)\sim\mathcal{p}_{[i]}}(\sigma(\mathbf{w}\cdot\mathbf{x})-y)^2 - \nu d_f(\boldsymbol{\lambda},\frac{1}{K}\mathbf{1})$,而 $d_f$ 是一个 $f$-散度,用于对偏离均匀群体权重施加(可选的)惩罚,由参数 $\nu \geq 0$ 缩放。我们开发了一个计算高效的原对偶算法,输出一个向量 $\widehat{\mathbf{w}}$,该向量在最坏情况群体加权下与 $\mathbf{w}_*$ 相比具有常数因子竞争比。我们的分析框架直接应对损失函数固有的非凸性,在任意标签损坏和群体特定分布偏移的情况下提供鲁棒学习保证。受我们算法框架启发的对偶外推更新实现,在 LLM 预训练基准测试中显示出前景。

英文摘要

We study the problem of learning a single neuron under standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $\mathcal{p}_{[1]},\dots,\mathcal{p}_{[K]}$, we seek to approximate $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{\lambda} \in \Delta_K$, where the objective is $\sum_{i \in [K]} \lambda_{[i]}\,\mathbb{E}_{(\mathbf{x},y)\sim\mathcal{p}_{[i]}}(\sigma(\mathbf{w}\cdot\mathbf{x})-y)^2 - \nu d_f(\boldsymbol{\lambda},\frac{1}{K}\mathbf{1})$ and $d_f$ is an $f$-divergence that imposes (optional) penalty on deviations from uniform group weights, scaled by a parameter $\nu \geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf{w}}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. The implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
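For intuition about the objective above, consider the special case with no penalty (the divergence weight set to zero). The objective is then linear in the group weights, so its maximum over the simplex is attained at a single group, and the worst-case loss is just the largest per-group loss. The sketch below evaluates that special case on empirical samples. It is not the paper's primal-dual algorithm; the ReLU activation, the toy data, and the function names are assumptions made for illustration.

```python
def relu(z):
    """Assumed activation; the abstract only requires some activation sigma."""
    return max(0.0, z)


def group_loss(w, samples):
    """Mean squared loss of the neuron relu(w . x) on one group's samples."""
    total = 0.0
    for x, y in samples:
        pred = relu(sum(wi * xi for wi, xi in zip(w, x)))
        total += (pred - y) ** 2
    return total / len(samples)


def worst_case_loss(w, groups):
    """With no divergence penalty, the maximum over group weights in the simplex
    puts all mass on the group with the largest loss."""
    losses = [group_loss(w, g) for g in groups]
    worst = max(range(len(losses)), key=losses.__getitem__)
    return losses[worst], worst


# Two hypothetical groups of (x, y) samples.
groups = [
    [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)],
    [([1.0, 1.0], 3.0)],
]
w = [1.0, 0.0]
loss, idx = worst_case_loss(w, groups)
```

With a positive penalty weight, the maximizing group weights are pulled back toward uniform, so the worst case is in general no longer a single group.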