arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
All subject categories 2251
2605.28573 2026-05-28 cs.LG cs.AI

Efficient Pre-Training of LLMs through Truncated SVD Layers

Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon, Babak Hodjat, Risto Miikkulainen

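Affiliations: Cognizant AI Lab; UT Austin

AI summary: Proposes the TSVD framework, which uses a spectral-energy heuristic for adaptive rank selection and a caching mechanism to maintain low rank and strict orthonormality, matching or exceeding full-parameter baselines while reducing computational overhead.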

Abstract

The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead, most existing methods rely on static rank selection and do not enforce weight orthonormality due to high computational cost. This paper introduces TSVD, a framework that maintains low rank and strict orthonormality throughout the training process. It utilizes a spectral energy-based heuristic for adaptive rank selection and a caching mechanism to maintain orthonormality. Theoretical analysis justifies the advantage of the approach in pretraining dynamics, and experiments across various model scales demonstrate that it is effective empirically. TSVD matches or exceeds the performance of full-parameter baselines while significantly reducing compute requirements. The approach thus offers a well-founded, practical, and scalable path toward efficient high-performance LLM pretraining.
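
The abstract does not spell out the spectral-energy heuristic, so the following is only a minimal sketch of one common form of such a rule: keep the smallest rank whose leading singular values retain a chosen fraction of the total squared singular-value energy. The function name `select_rank`, the `energy_threshold` parameter, and its 0.9 default are assumptions made for illustration and are not taken from the paper.

```python
def select_rank(singular_values, energy_threshold=0.9):
    """Smallest rank k whose top-k singular values keep the requested
    fraction of spectral energy (sum of squared singular values)."""
    if not 0.0 < energy_threshold <= 1.0:
        raise ValueError("energy_threshold must be in (0, 1]")
    # Sort in descending order so the leading components come first.
    values = sorted((abs(s) for s in singular_values), reverse=True)
    total = sum(s * s for s in values)
    if total == 0.0:
        return 0  # an all-zero matrix needs no components
    kept = 0.0
    for k, s in enumerate(values, start=1):
        kept += s * s
        if kept / total >= energy_threshold:
            return k
    return len(values)


# Hypothetical spectrum of one weight matrix (illustrative values only).
spectrum = [4.0, 2.0, 1.0, 0.5]
chosen_rank = select_rank(spectrum, energy_threshold=0.9)
print(chosen_rank)
```

With this toy spectrum, the two leading values hold about 94% of the energy, so a 0.9 threshold selects rank 2. Re-running the rule as the spectrum changes during training is what would make the rank adaptive rather than static.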

2605.28567 2026-05-28 cs.LG cs.AI

Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression

Tue M. Cao, Nguyen Do, My T. Thai

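Affiliations: University of Florida

AI summary: Proposes an optimal-transport-based distributional framework that uses activation-weighted distributions and the Wasserstein distance to address cross-layer feature matching and circuit compression in a unified way.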

Comments preprint

Abstract

Sparse autoencoders (SAEs) have become a central tool for interpreting language models. However, two key SAE analyses that remain difficult to scale are (1) matching semantically similar features across multiple layers and (2) compressing large feature circuits into interpretable supernodes. Although these have been treated as separate problems, we show that both are instances of a more fundamental challenge, which we frame as the estimation of semantic distances between SAE features that lie on different activation manifolds. We introduce a distributional framework for this problem, in which each feature is represented not by a single decoder vector, as in the literature, but by an activation-weighted distribution over the hidden states that express it. By projecting these distributions into a shared reference space and comparing them with Wasserstein distance, our method provides a unified semantic metric for cross-layer feature comparison. We prove that our representation is invariant to activation rescaling, stable under perturbations, and recovers true matches under finite-sample margin conditions. Empirically, our method outperforms decoder-vector and LLM-based baselines and captures subtle functional distinctions between related features. Notably, our method compresses large feature circuits into interpretable supernodes automatically.
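
To make the idea of comparing activation-weighted distributions concrete, here is a minimal sketch under strong simplifying assumptions: each hidden state has already been projected to a single real number, and the distance is the one-dimensional Wasserstein-1 distance. The paper's shared reference space and estimator are not described in the abstract, so the function names and the 1-D setting are illustrative only. Because the weights are normalized, multiplying all activations of a feature by a constant leaves the distance unchanged, which mirrors the rescaling invariance mentioned above.

```python
def normalize(weights):
    """Turn nonnegative activation weights into a probability vector."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def wasserstein_1d(points_a, weights_a, points_b, weights_b):
    """W1 distance between two weighted empirical distributions on the real line."""
    wa = normalize(weights_a)
    wb = normalize(weights_b)
    # Signed mass at each support point: +mass from A, -mass from B.
    events = sorted(
        [(x, w) for x, w in zip(points_a, wa)]
        + [(x, -w) for x, w in zip(points_b, wb)]
    )
    distance = 0.0
    cdf_gap = 0.0  # CDF_A - CDF_B accumulated over the events seen so far
    for (x, mass), (x_next, _) in zip(events, events[1:]):
        cdf_gap += mass
        distance += abs(cdf_gap) * (x_next - x)
    return distance


# Two hypothetical features observed on the same projected hidden states.
states = [0.0, 1.0]
d = wasserstein_1d(states, [1.0, 1.0], states, [1.0, 3.0])
d_rescaled = wasserstein_1d(states, [10.0, 10.0], states, [1.0, 3.0])
print(d, d_rescaled)
```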

2605.28566 2026-05-28 cs.AI cs.LG

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

Guni Sharon

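Affiliations: Guni Sharon

AI summary: Builds a unified taxonomy in classical heuristic search terms, maps LLM-based reasoning onto search components, and identifies two design patterns: systematic search and lookahead-heavy strategies.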

Comments Extended version of the SoCS 2026 paper. Includes appendices omitted from the proceedings version

Journal ref Proceedings of the Nineteenth International Symposium on Combinatorial Search (SoCS 2026), AAAI Press, 2026

Abstract

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, yet their standard generation process -- auto-regressive token prediction -- is inherently myopic and prone to cascading errors. To address this, the Tree-of-Thoughts (ToT) framework creates a search space over intermediate reasoning steps, allowing search models to explore, look ahead, and backtrack. However, current ToT research remains fragmented across Natural Language Processing and Automated Planning communities, often using inconsistent terminology and ad-hoc implementations. Consequently, we synthesize the ToT landscape through a unified taxonomy based on classical heuristic search terminology. We map LLM-based reasoning to classical search components: state representation (granularity of thoughts), successor generation (prompting operators), and heuristic evaluation (self-assessment of progress). We analyze existing work within the context of our taxonomy and identify emerging design patterns: systematic search (Best-First Search) for shallow, deterministic tasks and lookahead-heavy strategies (DFS, MCTS) for deep multi-step reasoning. We conclude by identifying open algorithmic challenges at the intersection of heuristic search and LLM reasoning, and call on the heuristic search community to engage with this emerging domain.
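
The mapping from ToT to classical search components can be illustrated with a small greedy best-first search. This is a generic textbook sketch, not code from the paper: in a real ToT system `successors` would call an LLM prompting operator and `heuristic` would be an LLM self-assessment, whereas here both are hypothetical arithmetic stand-ins on a toy number puzzle.

```python
import heapq


def best_first_search(start, is_goal, successors, heuristic, max_expansions=10_000):
    """Greedy best-first search: always expand the state with the lowest heuristic."""
    frontier = [(heuristic(start), 0, start, [start])]
    seen = {start}
    counter = 1  # tie-breaker so the heap never has to compare states directly
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        expansions += 1
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt, path + [nxt]))
                counter += 1
    return None  # frontier exhausted or expansion budget spent


TARGET = 10


def toy_successors(n):
    """Stand-in for a prompting operator: propose the next 'thoughts'."""
    return [m for m in (n + 1, n * 2) if m <= 2 * TARGET]


def toy_heuristic(n):
    """Stand-in for self-assessment of progress: distance to the target."""
    return abs(TARGET - n)


path = best_first_search(1, lambda n: n == TARGET, toy_successors, toy_heuristic)
print(path)
```

Swapping the priority key or the frontier data structure is what would turn this skeleton into the other strategies the abstract names, such as depth-first search.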

2605.28563 2026-05-28 cs.LG cs.AI

A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

Aditya Kommineni, Emily Zhou, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan

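Affiliations: Signal Analysis and Interpretation Laboratory

AI summary: Proposes a multi-dimensional evaluation framework that systematically assesses the generalization of EEG foundation models (such as LaBraM, CSBrain, and CBraMod) under low-resource conditions, finding that they excel on long-context tasks but only match supervised models on short-window BCI tasks and lack robustness to channel constraints.

Comments 24 pages, 5 figures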

Abstract

Evaluating foundation models under appropriate adaptation settings is essential for understanding the quality and transferability of the learned representations. Recent EEG foundation models have demonstrated promising transfer capabilities across tasks and datasets, motivating their growing use in neurotechnology and clinical applications. However, these models are typically evaluated under full fine-tuning on well-curated downstream datasets, a setting that does not reflect biomedical domain constraints such as limited labeled data, reduced sensor coverage, or parameter-efficient adaptation. In this work, we propose a multi-dimensional evaluation framework for assessing EEG models under realistic low-resource conditions. Under this framework, we empirically analyze both supervised EEG models and recent EEG foundation models, including LaBraM, CSBrain, and CBraMod, across 6 different datasets. We find that EEG foundation models consistently provide performance gains on long-context tasks such as sleep stage prediction and mental health state classification. In contrast, for short-window Brain-Computer Interface (BCI) style tasks, supervised models achieve comparable performance despite having substantially fewer parameters. Additional analyses demonstrate that current foundation models provide limited robustness to short-window tasks and channel-constrained settings. Together, these findings motivate the use of multi-dimensional evaluation protocols that characterize model behavior under realistic use constraints.

2605.28561 2026-05-28 cs.CL cs.LG

Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards

Saurabh Dash, Pierre Clavier, John Dang, Matthias Galle, Marzieh Fadaee, Ahmet Üstün, Beyza Ermis

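Affiliations: Cohere Labs

AI summary: For partially verifiable tasks, proposes Soft-RLVR, a soft-reward framework based on checklist decomposition, and its self-verifying variant Soft-SVeRL, which improve RL training through dense partial-credit signals and address reward inflation in self-verification.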

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has improved language models in domains such as mathematics and code, where correctness can be checked automatically. However, many important tasks are only partially verifiable: prompts contain multiple requirements, responses may satisfy some but not all of them, or no single reference answer might exist. We introduce Soft-RLVR, a framework for reinforcement learning from decomposed, learned verification signals. Soft-RLVR converts each prompt into a checklist of atomic requirements, scores candidate responses item by item with an LLM verifier, and trains on the resulting soft reward. Checklist-based rewards turn sparse pass/fail supervision into a denser partial-credit signal, but they also introduce a tradeoff: averaging item-level judgments can reduce verifier noise, while partial credit can reward incomplete responses. We formalize this tradeoff and identify conditions under which checklist-based verification gives a more reliable RL training signal than holistic verification. We further introduce Soft-SVeRL, a self-verifying variant of Soft-RLVR in which the policy also acts as the verifier. We show that self-verification is prone to reward inflation from overly permissive self-judgments, and that explicit stabilization is needed to prevent this collapse. In a controlled instruction-following setting with rule-based ground-truth evaluation, checklist-based Soft-RLVR improves IFEval by up to 11.1 points using only learned verifier rewards. Our experiments further show that verifier quality and checklist quality both affect downstream RL outcomes, and that explicit stabilization is essential for effective self-verification.
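
The contrast between sparse pass/fail supervision and a checklist-based soft reward can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the verdicts would come from an LLM verifier in practice, the holistic reward is approximated here as "all items pass", and the function names are hypothetical.

```python
def holistic_reward(item_verdicts):
    """Sparse pass/fail signal: 1.0 only if every requirement is satisfied."""
    return 1.0 if all(item_verdicts) else 0.0


def checklist_reward(item_verdicts):
    """Soft reward: fraction of checklist items marked as satisfied."""
    if not item_verdicts:
        raise ValueError("checklist must contain at least one item")
    return sum(1.0 for verdict in item_verdicts if verdict) / len(item_verdicts)


# Hypothetical verifier verdicts for a prompt with four atomic requirements.
verdicts = [True, True, False, True]
print(holistic_reward(verdicts), checklist_reward(verdicts))
```

The example shows both sides of the tradeoff described above: the response earns 0.75 instead of 0.0, which is a denser signal, but it is also rewarded despite missing one requirement.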

2605.28554 2026-05-28 cs.LG

High Performance, Low Reliability: Uncertainty Benchmarking for Tabular Foundation Models

José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

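Affiliations: CentraleSupélec; ENS Paris-Saclay; Université Paris-Saclay

AI summary: Using the TALENT benchmark, finds that tabular foundation models outperform gradient-boosted decision trees in predictive performance but are worse at uncertainty calibration, revealing a performance-uncertainty trade-off.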

Comments 6 pages, 2 figures, 2 tables. Accepted at ESANN 2026 (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning), 22-24 April 2026, Bruges (Belgium)

Journal ref ESANN 2026 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium) and online event, 22-24 April 2026, pp. 115-120, i6doc.com publ., ISBN 9782875870964

Abstract

Recent Tabular Foundation Models (TFMs) have demonstrated state-of-the-art predictive performance, often surpassing Gradient-Boosted Decision Trees (GBDTs). However, the trustworthiness of these models, particularly their uncertainty quantification, has been largely overlooked. We investigate this gap through an extensive study comparing TFMs, GBDTs, and classical baselines on the 112 datasets of the TALENT benchmark. Our results reveal a performance-uncertainty trade-off: although TFMs achieve the highest predictive performance, measured by AUC, they exhibit lower conditional coverage under conformal prediction, measured by SSCS, compared to GBDTs. Complementary experiments on synthetic datasets further characterize the regimes in which this effect intensifies. We conclude that while TFMs advance predictive frontiers, achieving well-calibrated uncertainty remains a major open challenge for their reliable adoption. Code is available at: https://github.com/jose-melo/high-performance-low-reliability
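
For readers unfamiliar with conformal prediction, the following is a minimal sketch of standard split conformal classification with the nonconformity score 1 - p(label). It is not the benchmark code from the repository above. The abstract does not define SSCS, so `size_stratified_coverage` is only an assumed, commonly used proxy for conditional coverage (worst-case coverage across prediction-set sizes), and all probabilities are made-up toy values.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: include every label
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return {label for label, p in class_probs.items() if 1.0 - p <= threshold}


def coverage(prediction_sets, true_labels):
    """Marginal coverage: share of examples whose set contains the true label."""
    hits = sum(1 for s, y in zip(prediction_sets, true_labels) if y in s)
    return hits / len(true_labels)


def size_stratified_coverage(prediction_sets, true_labels):
    """Worst-case coverage over groups of examples sharing the same set size."""
    by_size = {}
    for s, y in zip(prediction_sets, true_labels):
        hits, count = by_size.get(len(s), (0, 0))
        by_size[len(s)] = (hits + (y in s), count + 1)
    return min(hits / count for hits, count in by_size.values())


calibration_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.90]
threshold = conformal_threshold(calibration_scores, alpha=0.25)
test_probs = [
    {"a": 0.90, "b": 0.08, "c": 0.02},
    {"a": 0.50, "b": 0.45, "c": 0.05},
    {"a": 0.30, "b": 0.60, "c": 0.10},
    {"a": 0.48, "b": 0.47, "c": 0.05},
]
true_labels = ["a", "b", "a", "a"]
sets = [prediction_set(p, threshold) for p in test_probs]
print(coverage(sets, true_labels), size_stratified_coverage(sets, true_labels))
```

In this toy run the marginal coverage is 0.75 while the worst size stratum only reaches 0.5, which illustrates how a model can look acceptable on average yet under-cover a subgroup.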

2605.28553 2026-05-28 cs.AI cs.CR

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Matteo Gioele Collu, Riccardo Conte, Alberto Giaretta, Denis Kleyko, Mauro Conti, Matteo Zavatteri, Roberto Confalonieri

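Affiliations: University of Padua; Örebro University; Fondazione Bruno Kessler

AI summary: Detects refusal behavior with linear probes on the residual-stream activations of transformer blocks and proposes Mechanistic AutoDAN, which uses probe-guided genetic search to mount efficient attacks, substantially reducing search time while maintaining attack success rates.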

Abstract

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that safety-relevant behavior is represented in intermediate activations before output generation. To test whether this signal is actionable, we introduce Mechanistic AutoDAN, a probe-guided variant of AutoDAN that replaces full-model fitness evaluation with partial forward passes and probe-based scoring inside a genetic prompt search loop. Across the evaluated models, our method achieves attack success rates competitive with vanilla AutoDAN while reducing per-iteration search time by up to 72%, and probe-guided prompts match or exceed AutoDAN's cross-model transfer in several configurations. We further find that the usefulness of probe guidance increases with model scale. Our results show that refusal is not only observable at the output level, but is encoded as a structured and actionable signal in intermediate LLM activations.

2605.28552 2026-05-28 cs.AI

Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning

Qingwen Pu, Kun Xie, Hong Yang, Di Yang, Junqing Wang

Affiliations: Transportation Informatics Lab, Department of Civil and Environmental Engineering, Old Dominion University; Department of Electrical and Computer Engineering, Old Dominion University; Department of Transportation and Urban Infrastructure Studies, SMARTER Center, Morgan State University

AI summary: Extracts safety-critical interactions from the Argoverse 2 dataset and uses a Smooth-Mamba Deep Deterministic Policy Gradient framework (SMamba-DDPG) to model pedestrian crash avoidance behavior toward automated vehicles (AVs) and human-driven vehicles (HDVs), finding that pedestrians react faster to AVs and cross more slowly, and that AV scenarios have lower conflict rates.

Comments 37 pages, 15 figures, 9 tables

Abstract

As automated vehicles (AVs) increasingly share roadways with human-driven vehicles (HDVs), understanding how pedestrians respond to different vehicle types in safety-critical interactions is essential for the safe deployment of automated driving technologies. This study extracts safety-critical pedestrian-vehicle interactions from the Argoverse 2 dataset to capture real-world crash avoidance behaviors in encounters involving AVs and HDVs. To model vehicle-type-specific pedestrian crash avoidance behavior, we develop a Smooth-Mamba Deep Deterministic Policy Gradient framework, termed SMamba-DDPG, which integrates smooth action constraints with efficient temporal representation learning. To quantify pedestrian behavioral differences, the framework trains separate crash avoidance policies for pedestrian interactions with AVs and HDVs. Results show that SMamba-DDPG outperforms baseline reinforcement learning and supervised learning models in reproducing pedestrian crash avoidance behaviors. Reconstructed trajectories demonstrate strong behavioral realism, accurately reproducing crash avoidance kinematics in both AV and HDV scenarios. Reaction time analysis shows that the model captures human-like response delays and reveals that pedestrians respond more quickly to AVs than to HDVs. Counterfactual analysis further indicates that pedestrians adopt lower crossing speeds when interacting with AVs. Large-scale safety analysis of model-generated data reveals that pedestrian-AV interactions consistently yield lower conflict rates and higher pedestrian yielding rates than pedestrian-HDV interactions. The findings highlight the importance of incorporating vehicle-type-specific pedestrian behavioral models for safer automated driving system design and more realistic traffic simulations in mixed-traffic environments.

2605.28549 2026-05-28 cs.RO cs.LG

SPRINT: Efficient Spectral Priors for Humanoid Athletic Sprints

Yantong Wei, Kaihong Huang, Hainan Pan, Jiawei Luo, Jiawei Zhou, Ziyan Mai, Zhiwen Zeng, Yaonan Wang, Huimin Lu

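Affiliations: College of Intelligence Science and Technology, National University of Defense Technology; School of Artificial Intelligence and Robotics, Hunan University

AI summary: Proposes the SPRINT framework, which uses frequency-adaptive spectral priors to generate kinematically feasible joint trajectories and achieves zero-shot sim-to-real transfer, reaching a peak speed of 6 m/s on the Unitree G1 platform.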

Abstract

The pursuit of humanoid athletic sprints is hindered by a scarcity of humanoid-viable kinematic reference data and the inability of existing frameworks to maintain stability during sprints. To overcome these limitations, we introduce SPRINT, a novel framework driven by efficient, frequency-adaptive spectral priors. By characterizing the fundamental periodicity of human locomotion in the frequency domain using a reference library of five discrete motion sequences, these priors generate kinematically feasible joint trajectories across a broad velocity spectrum, successfully extrapolating to speeds that exceed the reference distribution. Guided by these pretrained priors, the SPRINT policy achieves zero-shot sim-to-real transfer in field experiments on the Unitree G1 platform, reaching a peak sprinting velocity of 6 m/s and demonstrating seamless gait transitions while preserving biomimetic naturalness. Ultimately, this work establishes frequency-adaptive spectral priors as a highly data-efficient foundation for humanoid athletic sprints. The project page is available at https://anonymous.4open.science/w/SPRINT-138A/.

2605.28548 2026-05-28 cs.CV

GEM: Generative Supervision Helps Embodied Intelligence

Ruowen Zhao, Bangguo Li, Zuyan Liu, Yinan Liang, Junliang Ye, Fangfu Liu, Diankun Wu, Zhengyi Wang, Xumin Yu, Yongming Rao, Han Hu, Jun Zhu

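Affiliations: Tsinghua University; Tencent Hunyuan

AI summary: Proposes GEM, which adds a depth-map generation task to vision-language model pre-training and trains it jointly to improve the semantic understanding and physical operation capabilities of embodied intelligence; also releases the large-scale GEM-4M dataset and reports state-of-the-art results on multiple benchmarks.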

Comments Project Page: https://zhaorw02.github.io/GEM/

Abstract

Embodied Vision-Language Models (VLMs) have demonstrated impressive performance and generalization in robotics, particularly within Vision-Language-Action frameworks. However, a significant gap remains between the high-level semantic focus of standard text-guided pre-training paradigms and the low-level spatial and physical knowledge critical for execution in embodied environments. In this paper, we introduce GEM, a Generative-supervised Embodied vision-language Model designed to bridge this divide. We propose integrating a depth map generation task directly into the VLM pre-training phase. By training this generative objective jointly with the main model, we observe substantial improvements in embodied intelligence, significantly enhancing both semantic understanding and physical operation capabilities. To support this paradigm, we curate and release GEM-4M, a comprehensive large-scale dataset featuring a mixture of grounding, reasoning, and planning data paired with high-quality depth supervision. Extensive experiments demonstrate that GEM achieves state-of-the-art results across diverse embodied benchmarks. Furthermore, our deployed action model, GEM-VLA, exhibits vastly superior task execution abilities in both simulation environments and real-world evaluations. Code, models, and datasets are available at https://zhaorw02.github.io/GEM/

2605.28544 2026-05-28 cs.CV

DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving

Chen Shi, Jinrui Xu, Shaoshuai Shi, Kehua Sheng, Bo Zhang, Li Jiang

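Affiliations: The Chinese University of Hong Kong, Shenzhen; Voyager Research, Didi Chuxing

AI summary: Proposes DriveWAM, which adapts a pretrained video diffusion transformer into an autoregressive video-action policy and introduces scene-evolving driving guidance and selective KV memory for scalable world-action modeling, achieving strong planning performance on the NAVSIM and PhysicalAI benchmarks.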

Abstract

Pretrained foundation models have become an important basis for end-to-end autonomous driving. In contrast to vision-language models pretrained primarily on static image-text pairs, video generative models capture temporal dynamics and motion priors that are naturally suited for driving. We present DriveWAM, a driving world-action model that adapts a pretrained video diffusion transformer into an autoregressive video-action policy. DriveWAM organizes video and action streams into a unified temporal token sequence and trains them under a joint flow-matching objective, preserving the pretrained video-generation architecture while adapting its large-scale video priors to action generation. To incorporate high-level scene understanding, we introduce scene-evolving driving guidance, where a frozen VLM produces chunk-specific semantic intent to guide video-action generation. To keep long-horizon rollout bounded, we further introduce selective KV memory, which maintains bounded modality-aware video and action memory pools through relevance-redundancy cache selection at inference time. Experiments on NAVSIM and the PhysicalAI-Autonomous-Vehicles benchmark show that DriveWAM achieves strong planning performance, and a data-scaling study from 4k to 100k driving clips further confirms the scaling potential of world-action modeling for end-to-end autonomous driving.

2605.28543 2026-05-28 cs.AI cs.CL cs.LG

Cultural Binding Heads in Language Models

Avrile Floro, Luca Benedetto

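Affiliations: Mistral-7B Mistral-Nemo-12B Llama-3.1-8B Gemma-2-9B

AI summary: Using mechanistic interpretability and a factorial design, identifies 2-3 mid-layer attention heads per model across eight language models that contribute causally to cultural binding, finds that binding forms mainly during pre-training, and shows through knowledge probing that models know far more than they act on.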

Abstract

LLMs often default to equal treatment across cultural groups, even when context warrants differentiation: this is a lack of difference awareness. Using mechanistic interpretability and a factorial design on the N4 cultural appropriation benchmark from Wang et al. (2025), we identify 2-3 mid-layer attention heads per model that contribute causally to cultural binding across eight models (four architectures, base and instruct). Cultural binding is the process of associating cultural items with the appropriate identity. Knockout of the identity-to-item edges on these heads lowers the binding strength by 9-23%. The identified heads transfer from instruct to base models, suggesting that cultural binding is created at pre-training. An $\alpha$-scaling analysis shows a graded dose-response, and moderate amplification steering at generation ($\alpha = 2$ to $3$) increases cultural differentiation accuracy by 1-3 pp while leaving neutral reasoning mostly intact. A knowledge probing task shows that models know 3-5 times more than they act upon, indicating that the bottleneck lies in routing and not knowledge.
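
The knockout and scaling interventions can be pictured with a toy model of how attention heads write to the residual stream. This sketch is a deliberate simplification and an assumption on our part: it scales the whole output vector of a head, whereas the paper intervenes on identity-to-item edges, and the head names and vectors below are hypothetical.

```python
def residual_update(head_outputs, alpha_by_head=None):
    """Sum per-head output vectors, scaling selected heads by alpha.

    alpha = 0 knocks a head out, alpha = 1 leaves it unchanged,
    and alpha > 1 amplifies its contribution.
    """
    alpha_by_head = alpha_by_head or {}
    dim = len(next(iter(head_outputs.values())))
    total = [0.0] * dim
    for head, vector in head_outputs.items():
        alpha = alpha_by_head.get(head, 1.0)
        for i, value in enumerate(vector):
            total[i] += alpha * value
    return total


# Hypothetical head names and two-dimensional outputs, for illustration only.
heads = {"L12H3": [1.0, 0.0], "L14H7": [0.0, 2.0]}
baseline = residual_update(heads)
knockout = residual_update(heads, {"L12H3": 0.0})
amplified = residual_update(heads, {"L12H3": 2.0})
print(baseline, knockout, amplified)
```

Sweeping alpha over a range of values and recording a downstream metric at each setting is the kind of procedure that would produce a dose-response curve.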

2605.28534 2026-05-28 cs.CL

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection

Zheng Wu, Chengcheng Han, Zhengxi Lu, Tianjie Ju, Yanyu Chen, Qi Gu, Xunliang Cai, Zhuosheng Zhang

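Affiliations: School of Computer Science, Shanghai Jiao Tong University; Meituan; Zhejiang University; The Chinese University of Hong Kong

AI summary: Proposes GUI-CIDER, a mid-training method that explicitly internalizes GUI world knowledge through causal internalization and density-aware exemplar reselection, improving agents' understanding of GUI operations and their task success rates.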

Abstract

Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world knowledge about GUI operations. Existing solutions typically rely on expensive multi-agent scaffolding or conventional post-training paradigms, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). However, post-training only allows agents to implicitly absorb world knowledge through action annotations or reward signals, leading to inefficient trajectory memorization rather than genuine comprehension. Therefore, an approach that enables explicit learning of this knowledge is imperative. To this end, we propose GUI-CIDER, a mid-training method that explicitly internalizes GUI world knowledge through Causal Internalization and Density-aware Exemplar Reselection. GUI-CIDER operates in three stages: (1) data synthesis, which distills static planning and dynamic causal knowledge from GUI trajectories into text; (2) exemplar reselection, which filters the corpus by rewarding causal structures and penalizing semantic redundancy; and (3) mid-training, where the refined data is used to embed the acquired knowledge. Extensive experiments on two GUI knowledge benchmarks and three task completion benchmarks demonstrate that GUI-CIDER consistently improves both the agent's understanding of GUI operations and its task success rates. The code is available at https://github.com/Wuzheng02/GUI-CIDER.

2605.28533 2026-05-28 cs.LG

Semi-Supervised Hypothesis Testing by Betting on Predictions

Yaniv Tenzer, Elad Tolochinsky, Yaniv Romano

Affiliations: Department of Computer Science, Technion – Israel Institute of Technology; Department of Electrical and Computer Engineering, Technion – Israel Institute of Technology

AI summary: Proposes a testing-by-betting framework that uses unlabeled data to increase the power of sequential hypothesis testing, introducing an e-statistic that yields an anytime-valid test under label shift or concept shift.

Abstract

We introduce a testing-by-betting framework that leverages predictions on unlabeled data to enhance the power of sequential hypothesis testing. Given limited samples from the joint distribution of $(X,Y)$, and additional unlabeled samples from the marginal of $X$, we ask how unlabeled data can be used to hypothesize about the distribution of $Y$, and the conditional distribution of $Y\mid X$. We introduce an e-statistic and use it to construct a sequential test. Under standard distributional assumptions -- label shift or concept shift -- we establish that the test is anytime valid. Furthermore, we show that for binary data, the e-statistic has non-trivial power. Crucially, our approach retains these properties even when the underlying predictions are inaccurate. Through simulations and applications to large language model evaluation, we demonstrate power gains over baseline approaches, including prediction-powered inference. These gains persist even with relatively limited unlabeled data and when predictions have low accuracy due to weak correlation between $X$ and $Y$.
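
As background, the generic testing-by-betting construction can be sketched as follows. This is the standard fully labeled version for a bounded mean, not the paper's semi-supervised e-statistic: a bettor starts with wealth 1 and wagers a fixed fraction against the null hypothesis that the mean of $X \in [0, 1]$ equals `null_mean`. Under the null the wealth is a nonnegative martingale, so by Ville's inequality it exceeds $1/\alpha$ with probability at most $\alpha$ at any stopping time. The fixed `bet` size is an assumption chosen for simplicity.

```python
def betting_test(observations, null_mean, bet=0.5, alpha=0.05):
    """Sequential test of H0: E[X] = null_mean for observations in [0, 1].

    Returns (rejected, stopping_time, final_wealth).
    """
    if not 0.0 < null_mean < 1.0:
        raise ValueError("null_mean must lie strictly between 0 and 1")
    # These bounds keep every wealth factor nonnegative for x in [0, 1].
    if not -1.0 / (1.0 - null_mean) <= bet <= 1.0 / null_mean:
        raise ValueError("bet is too large to keep wealth nonnegative")
    wealth = 1.0
    for t, x in enumerate(observations, start=1):
        wealth *= 1.0 + bet * (x - null_mean)
        if wealth >= 1.0 / alpha:
            return True, t, wealth  # reject H0 as soon as wealth is large
    return False, len(observations), wealth


# Data far from the null: every observation equals 1 while H0 claims mean 0.5.
rejected, stop_time, wealth = betting_test([1.0] * 20, null_mean=0.5)
print(rejected, stop_time, round(wealth, 2))
```

Each observation of 1 multiplies the wealth by 1.25 here, so the threshold of 20 is first crossed at the 14th observation. The test can be monitored continuously without inflating the error rate, which is the anytime-validity property.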

2605.28532 2026-05-28 cs.AI

Do Agents Know What They Can't Do? Evaluating Feasibility Awareness in Tool-Using Agents

Liang Cheng, Mingsheng Cai, Jiuming Jiang, Luo Mai

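Affiliations: University of Edinburgh

AI summary: Proposes FeasiGen, an automatic pipeline that builds infeasible tasks by masking critical tools so that solvable tasks become unsolvable; evaluation shows that most models lack infeasibility detection ability, with false-continue rates as high as 73.9%.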

Comments 14 pages

Abstract

Tool-using agents often incur substantial computational cost due to long reasoning chains and iterative tool usage. In practical scenarios, many tasks become infeasible under constrained tool environments, where the capabilities required for successful task completion are unavailable. Detecting infeasible tasks and stopping execution early can significantly reduce unnecessary execution cost. In this work, we propose FeasiGen, an automatic pipeline for constructing infeasible agent tasks by identifying the critical tools required for successful task completion. Our approach extracts tool-calling traces from successful executions across multiple agent systems, identifies critical tools consistently shared across diverse execution strategies, and masks these tools to automatically transform solvable tasks into infeasible ones. Human verification confirms that the infeasibility annotations for our constructed tasks achieve over 94% accuracy. We further introduce feasibility-aware evaluation metrics for measuring whether agents can recognize infeasible tasks and stop execution appropriately. Extensive evaluations across nine models reveal weak infeasibility detection ability, with the false-continue rate reaching up to 73.9%. We further observe that multi-agent architectures significantly reduce erroneous execution under infeasible conditions.
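
The pipeline described above can be sketched with plain set operations. This is a simplified reading of the abstract rather than the FeasiGen implementation: "critical tools" are taken to be the tools present in every successful trace, and the false-continue rate is assumed to be the share of infeasible tasks on which the agent did not stop. Tool names and function names are hypothetical.

```python
def critical_tools(successful_traces):
    """Tools that appear in every successful tool-calling trace of a task."""
    tool_sets = [set(trace) for trace in successful_traces]
    return set.intersection(*tool_sets) if tool_sets else set()


def mask_tools(available_tools, tools_to_mask):
    """Remove the critical tools so the task can no longer be solved."""
    return [tool for tool in available_tools if tool not in tools_to_mask]


def false_continue_rate(stopped_flags):
    """Share of infeasible tasks on which the agent kept executing."""
    if not stopped_flags:
        raise ValueError("need at least one infeasible task")
    continued = sum(1 for stopped in stopped_flags if not stopped)
    return continued / len(stopped_flags)


# Hypothetical traces from three agent systems solving the same task.
traces = [
    ["search", "fetch", "calc"],
    ["search", "calc"],
    ["calc", "search", "summarize"],
]
critical = critical_tools(traces)
remaining = mask_tools(["search", "fetch", "calc", "summarize"], critical)
rate = false_continue_rate([True, False, False, True])
print(sorted(critical), remaining, rate)
```

Taking the intersection across several agent systems is what guards against masking a tool that only one particular strategy happened to use.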

2605.28531 2026-05-28 cs.LG

Stabilizing distribution-free probabilistic forecasts

稳定化无分布概率预测

Jente Van Belle, Honglin Wen, Wouter Verbeke, Pierre Pinson

发表机构 * Faculty of Economics and Business(经济与商业学院) Department of Electrical Engineering, Shanghai Jiao Tong University(上海交通大学电气工程系) Dyson School of Design Engineering, Imperial College London(帝国理工学院设计工程学院) Department of Technology, Management and Economics, Technical University of Denmark(丹麦技术大学技术、管理与经济学系) CoRE, Aarhus University(奥胡斯大学CoRE)

AI总结 提出一种基于神经网络参数化回归样条的方法,联合优化无分布概率时间序列预测的质量与稳定性,以控制预测更新导致的波动,并在两个数据集上验证了其有效性。

详情
AI中文摘要

多步预测通常会在新观测值可用时进行更新,因为较短的预测期限通常会提高预测质量。然而,这种改进是以预测不稳定性为代价的,即同一目标时期的预测值存在变异性。这种不稳定性可能引发基于预测制定的计划发生代价高昂的变更,并可能削弱对预测系统的信任。在这项工作中,我们将预测稳定性与预测质量一起纳入无分布概率时间序列预测模型的训练中,从而能够控制这种权衡。我们提出了一种使用神经网络参数化的回归样条生成稳定化预测条件分位数函数的方法。这种方法能够联合优化质量和稳定性,因为它允许我们直接惩罚由预测更新引起的差异。此外,它允许对稳定预测分布的不同部分(例如,中心部分与尾部)赋予不同的重要性,以专注于对预期下游应用最相关的部分(例如,库存管理的上尾)。我们在两个具有不同统计特性的数据集上对所提出的方法进行了实证评估,结果表明,它可以在不显著损失预测质量的情况下有效降低预测不稳定性,并且可以将稳定化努力针对预测分布的特定部分。

英文摘要

Multi-step-ahead forecasts are often updated as new observations become available, since shorter forecast horizons typically improve forecast quality. However, such improvements come at the cost of forecast instability, i.e., variability in forecasts for the same target period. This instability can trigger costly changes to plans formulated based on the forecasts and may erode trust in the forecasting system. In this work, we integrate forecast stability alongside forecast quality into the training of distribution-free probabilistic time-series forecasting models, allowing us to control this trade-off. We propose a method for generating stabilized forecasted conditional quantile functions using regression splines parameterized by a neural network. This approach enables joint optimization of quality and stability, as it allows us to directly penalize dissimilarities arising from forecast updates. Furthermore, it allows assigning varying importance to stabilizing different parts of the forecast distributions (e.g., central parts vs. tails) to focus on the parts most relevant for the intended downstream use (e.g., the upper tail for inventory management). We empirically evaluate the proposed method on two datasets with different statistical properties and show that it can effectively reduce forecast instability without a substantial loss in forecast quality, and that it can target stabilization effort toward specific parts of the forecast distributions.
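The quality/stability trade-off can be illustrated with a discretized loss. This is a sketch under stated assumptions, not the paper's spline-based formulation: quantile functions are represented by values at a fixed grid of levels `taus`, quality is the pinball (quantile) loss, instability is a weighted absolute difference between the new and the previous forecast for the same target period, and `lam` and `weights` are hypothetical parameter names.

```python
def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss of predicted quantile q_pred at level tau."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def stabilized_loss(y, new_q, old_q, taus, lam, weights=None):
    """Quality plus lam times instability, for one target period.

    new_q / old_q: quantile forecasts issued at the current and the previous
    forecast origin. weights: per-level importance for stabilization,
    e.g. larger values on the upper tail.
    """
    if weights is None:
        weights = [1.0] * len(taus)
    quality = sum(pinball_loss(y, q, t) for q, t in zip(new_q, taus)) / len(taus)
    instability = sum(
        w * abs(qn - qo) for w, qn, qo in zip(weights, new_q, old_q)
    ) / sum(weights)
    return quality + lam * instability


taus = [0.1, 0.5, 0.9]
old_forecast = [8.0, 10.0, 12.0]
new_forecast = [8.0, 10.0, 13.0]  # only the upper tail moved
uniform = stabilized_loss(10.0, new_forecast, old_forecast, taus, lam=1.0)
upper_tail = stabilized_loss(
    10.0, new_forecast, old_forecast, taus, lam=1.0, weights=[0.0, 0.0, 1.0]
)
```

Putting all the weight on the 0.9 level penalizes the same upper-tail revision more heavily than uniform weighting, which is the kind of targeting the abstract describes for inventory management.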

2605.28527 2026-05-28 cs.RO

What Frozen VLAs Already Know About Success: A Probing Study of Value-Like Structure in Foundation Robot Policies

冻结的VLA已经知道关于成功的信息:对基础机器人策略中价值类结构的探测研究

Jiachen Zhang, Junnan Nie, Junyi Lao, Wei Cheng, Chenghao Liu, Jiaxin Jiang, Songfang Huang

发表机构 * Peking University(北京大学) China Agricultural University(中国农业大学)

AI总结 通过线性探测从冻结的VLA特征中预测蒙特卡洛结果目标,发现其编码了成功信息,并可用于测试时动作选择提升成功率。

Comments 14 pages, 1 figure, 11 tables. Equal contribution: Jiachen Zhang, Junnan Nie, and Junyi Lao. Corresponding author: Songfang Huang. Preprint

详情
AI中文摘要

视觉-语言-动作(VLA)策略被训练来模仿动作;它们的损失函数从未要求它们估计奖励、进展或未来成功。然而,它们冻结的表示仍然携带这些信息,并且可以在不重新训练策略的情况下被读取并用于指导动作选择。从LIBERO-Goal上的混合成功和失败操作轨迹中,我们使用冻结特征上的轻量级线性探测恢复了蒙特卡洛结果目标。这些目标可以从OpenVLA、Pi0.5、DINOv2和CLIP特征中一致地预测,而基于进展、剩余时间、任务身份或本体感觉的基线则显著较差。为了排除任务和时间捷径,我们在相同任务、相同时间步的匹配比较下评估探测:Pi0.5探测仍然达到约92%的成对排序准确率,而标签打乱的对照则停留在随机水平。作为测试时选择器,在采样的Pi0.5动作前缀上使用相同的探测,将这一离线发现转化为行为:在推板任务中,成功率从贪婪解码下的26.7%上升到44.3%,在酒架任务中也有一个正面案例。这种提升并非普遍适用,并且需要额外的推理计算,但底层发现是清晰的:冻结的VLA已经编码了关于成功的信息,而它们的模仿目标从未明确要求这些信息。

英文摘要

Vision--language--action (VLA) policies are trained to imitate actions; their loss never asks them to estimate reward, progress, or future success. Their frozen representations nevertheless carry such information, and it can be read out and used to guide action choice without retraining the policy. From mixed successful and failed manipulation trajectories on LIBERO-Goal, we recover Monte-Carlo outcome targets using lightweight linear probes on frozen features. The targets are consistently predictable from OpenVLA, Pi0.5, DINOv2, and CLIP features, and substantially less so from baselines built on progress, time-to-go, task identity, or proprioception. To rule out task and temporal shortcuts, we evaluate the probes under same-task, same-timestep matched comparisons: Pi0.5 probes still reach roughly 92% pairwise ordering accuracy, while label-shuffled controls stay at chance. Used as a test-time selector over sampled Pi0.5 action prefixes, the same probe turns this offline finding into behavior: on push-plate, success rises from 26.7% under greedy decoding to 44.3%, with a second positive case on wine-rack. The gains are not universal and require additional inference compute, but the underlying finding is clean: frozen VLAs already encode information about success that their imitation objective never explicitly demands.
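The two uses of the probe described above, offline pairwise ordering and test-time selection, can be sketched as follows. This is a minimal sketch with assumptions: features are plain lists of floats, the probe is a given weight vector (training is omitted), the success and failure score lists are assumed to be already matched on task and timestep, and ties count as half correct. None of these names come from the paper.

```python
def probe_score(features, weights, bias=0.0):
    """Linear probe: dot product of frozen features and probe weights."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def pairwise_ordering_accuracy(success_scores, failure_scores):
    """Share of (success, failure) pairs where the success scores higher."""
    pairs = [(s, f) for s in success_scores for f in failure_scores]
    if not pairs:
        raise ValueError("need at least one success and one failure")
    correct = sum(1.0 if s > f else 0.5 if s == f else 0.0 for s, f in pairs)
    return correct / len(pairs)


def select_candidate(candidate_features, weights, bias=0.0):
    """Index of the sampled action prefix with the highest probe score."""
    scores = [probe_score(f, weights, bias) for f in candidate_features]
    return max(range(len(scores)), key=scores.__getitem__)


accuracy = pairwise_ordering_accuracy([0.9, 0.8], [0.1, 0.85])
best = select_candidate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, 2.0])
```

In this toy case three of the four pairs are ordered correctly, and the selector picks the third candidate because its score (2.5) is the largest. Greedy decoding corresponds to skipping the selector and always executing the first candidate.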

2605.28526 2026-05-28 cs.AI cs.CL

Entropy-aware Masking for Masked Language Modeling

面向掩码语言建模的熵感知掩码策略

Gokul Srinivasagan, Kai Hartung, Munir Georges

发表机构 * AImotion Bavaria(AImotion巴伐利亚) Technische Hochschule Ingolstadt(英戈尔施塔特技术大学)

AI总结 提出基于熵分布的掩码策略,通过模型预测熵识别信息量高的token进行掩码,并引入自掩码方法提升训练效率,在GLUE上平均提升5%。

Comments Accepted at the *SEM 2026 conference

详情
AI中文摘要

掩码语言建模已成为训练基于编码器的语言模型的标准预训练目标。在该方法中,输入中的某些token被掩码,模型学习利用周围上下文预测它们。这一过程使模型能够捕捉语言的句法和语义属性。传统上,用于掩码的token是随机选择的,这可能并不总是产生最有效的学习信号。在这项工作中,我们研究了一种基于熵分布的token掩码策略。我们利用模型在token预测上的熵来确定哪些token应被掩码。该方法旨在针对信息量更大、不确定性更高的token,以提高训练效率。我们还提出了一种新颖的自掩码方法,无需依赖外部参考模型即可增强训练效率。实验结果表明,与基线相比,我们的方法在GLUE分数上平均提升了5%。此外,我们尝试将知识蒸馏与熵掩码相结合,取得了最佳的整体结果。

英文摘要

Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them using the surrounding context. This process enables the model to capture both syntactic and semantic properties of language. Conventionally, the tokens selected for masking are chosen at random, which may not always yield the most effective learning signals. In this work, we examine a token masking strategy based on entropy distribution. We use the model's entropy over token predictions to identify which tokens should be masked. This method aims to target tokens that are more informative and uncertain to improve the training efficacy. We also propose a novel self-masking approach that enhances training efficiency without relying on an external reference model. Experimental results demonstrate that our method achieves an average performance improvement of 5% in GLUE scores compared to the baseline. Further, we experiment with combining knowledge distillation with entropy masking, resulting in the best overall results.
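One way to turn prediction entropy into a masking decision is sketched below. It assumes a top-k rule: compute the entropy of the model's predictive distribution at each position and mask the positions with the highest entropy. The abstract does not say whether positions are chosen by top-k or by sampling, so the rule, the `mask_ratio` default, and the function names are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_mask_positions(token_distributions, mask_ratio=0.15):
    """Positions of the highest-entropy tokens, returned in sentence order."""
    n = len(token_distributions)
    if n == 0:
        return []
    k = max(1, round(n * mask_ratio))
    ranked = sorted(
        range(n), key=lambda i: entropy(token_distributions[i]), reverse=True
    )
    return sorted(ranked[:k])


# Hypothetical predictive distributions for a four-token input.
dists = [
    [1.0, 0.0, 0.0],        # fully certain
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]
positions = select_mask_positions(dists, mask_ratio=0.5)
```

In the self-masking variant the distributions would come from the model being trained rather than from an external reference model; the selection step itself would be unchanged under this sketch.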

2605.28524 2026-05-28 cs.AI

Let Relations Speak: An End-to-End LLM-GNN Soft Prompt Framework for Fraud Detection

让关系说话:面向欺诈检测的端到端LLM-GNN软提示框架

Zhixing Zuo, Huilin He, Jiasheng Wu, Dawei Cheng

发表机构 * School of Computer Science and Technology, Tongji University(同济大学计算机科学与技术学院)

AI总结 提出LGSPF框架,通过软提示桥接图结构与语义空间,并引入并行GNN编码器将多关系拓扑转化为图令牌,实现端到端优化,在欺诈检测中达到最优性能。

Comments 14 pages, 3 figures

详情
AI中文摘要

近年来,大型语言模型(LLM)在处理欺诈检测等图任务方面展现出强大能力。然而,现有方法大多严重依赖丰富的文本属性,由于该领域缺乏文本数据,这带来了困难。尽管一些开创性方法试图克服这一问题,但它们通过硬提示将图结构文本化容易导致特征失真。此外,欺诈检测通常表现出多关系复杂性,当前方法难以捕捉这种深层语义信息。为应对这些挑战,我们提出了LLM-GNN软提示框架(LGSPF)。具体而言,LGSPF使用软提示桥接图结构和语义空间,以消除对文本的依赖。我们进一步引入并行图神经网络(GNN)编码器,将多关系拓扑转化为图令牌,用于细粒度的LLM欺诈理解。通过端到端优化,LGSPF增强了LLM和GNN之间的深层语义对齐。在多个欺诈检测基准上的实验表明,我们的方法达到了最先进的性能。此外,我们进一步验证了LGSPF在增强欺诈行为语义可解释性方面的贡献。

英文摘要

In recent years, Large Language Models (LLMs) have shown great capability in processing graph tasks such as fraud detection. However, most existing methods rely heavily on rich text attributes, which poses difficulties for this domain due to the lack of textual data. Although some pioneering methods attempt to overcome it, their textualization of graph structures via hard prompts easily leads to feature distortion. Additionally, fraud detection often exhibits multi-relational complexity, where current methods struggle to capture this deep semantic information. To address these challenges, we propose LLM-GNN Soft Prompt Framework (LGSPF). Specifically, LGSPF bridges the graph structure and semantic space using soft prompt to eliminate reliance on text. We further introduce a parallel Graph Neural Network (GNN) encoder to translate multi-relational topologies into graph tokens for fine-grained LLM fraud comprehension. Through end-to-end optimization, LGSPF enhances deep semantic alignment between LLM and GNN. Experiments across diverse fraud detection benchmarks demonstrate our method achieves state-of-the-art performance. Moreover, we further validate the contribution of LGSPF on enhancing the semantic interpretability of fraud behaviors.

2605.28521 2026-05-28 cs.CL

ClinicalEncoder26AM: A Multilingual Diagnosable ColBERT Model; Evidences from the MultiClinNER Shared Task

ClinicalEncoder26AM:一个多语言可诊断的ColBERT模型——来自MultiClinNER共享任务的证据

François Remy

发表机构 * Parallia AI

AI总结 本文提出ClinicalEncoder26AM,一个基于BGE-M3的多语言可诊断ColBERT模型,通过多适配器蒸馏和ColBERT式检索目标进行临床后训练,在MultiClinNER任务中微调为BIO标注器,实现了最先进的多语言实体召回率和字符加权F1分数前五。

详情
AI中文摘要

ClinicalEncoder26AM是一个用于临床和生物医学文本的多语言可诊断ColBERT模型,它在多个层次上将其token级语义与ClinicalMap25对齐,ClinicalMap25是一个受BioLORD-2023启发并通过合成和标注监督丰富的临床潜在空间。后训练方案基于BGE-M3,结合了合成临床笔记、患者-医生对话以及MedMentions等标注资源,同时通过多适配器蒸馏考虑命名实体级和句子级表示,并采用ColBERT风格的检索目标。在这篇系统演示论文中,我们通过将模型微调为用于标注患者症状、疾病和医疗操作片段(span)的BIO标注器来评估其在MultiClinNER共享任务中的表现,使用轻量级两层CNN头部来改善局部边界检测。最终系统保持简单,在单个8192 token窗口中处理大多数文档,实现了最先进的多语言实体召回率,并在所有实体类型和语言的字符加权F1分数中达到前五。训练曲线进一步表明,ClinicalEncoder26AM比基础M3模型在数据效率上显著更高,支持其临床后训练对下游信息提取的有用性。模型可在https://huggingface.co/Parallia/ClinicalEncoder26AM-Diagnosable-Colbert-L2-for-multilingual-medical-texts下载。

英文摘要

ClinicalEncoder26AM is a multilingual Diagnosable ColBERT for clinical and biomedical texts, which aligns at multiple levels its token-level semantic with ClinicalMap25, a clinical latent space inspired by BioLORD-2023 and enriched with synthetic and annotated supervision. The post-training recipe builds upon BGE-M3, and combines synthetic clinical notes, patient--doctor conversations, and annotated resources such as MedMentions, while considering both named-entity-level and sentence-level representations in a multi-adapter distillation, along with a ColBERT-style retrieval objective. In this system demonstration paper, we evaluate the model in the MultiClinNER shared task by finetuning it as a BIO tagger for patient symptoms, disorders, and procedure spans, using a lightweight two-layer CNN head to improve local boundary detection. The resulting system remains simple, processes most documents in a single 8192-token window, and achieves state-of-the-art multilingual entity recall, while achieving Top 5 overall across all entity types and languages in Character-weighted F1 scores. Training curves further show that ClinicalEncoder26AM is markedly more data-efficient than the base M3 model, supporting the usefulness of its clinical post-training for downstream information extraction. The model can be downloaded on https://huggingface.co/Parallia/ClinicalEncoder26AM-Diagnosable-Colbert-L2-for-multilingual-medical-texts

2605.28520 2026-05-28 cs.AI

GS-FUSE: Granger-Supervised Gated Fusion and Multi-Granularity Alignment for Event-Driven Financial Forecasting

GS-FUSE: 格兰杰监督的门控融合与多粒度对齐用于事件驱动的金融预测

Yang Zhang, En Chun, Ziyun Mao, Yulu Wu, Jun Wang

发表机构 * Southwestern University of Finance and Economics(西南财经大学)

AI总结 提出GS-Fuse框架,通过格兰杰因果监督的门控融合模块和多粒度对齐机制,选择性利用事件文本与价格信号,提升金融事件对市场影响的预测精度。

详情
AI中文摘要

准确预测重大金融事件对市场的影响对投资者和政策制定者至关重要。然而,现有的多模态时间序列模型通常对称地融合文本和价格,没有明确的方式来决定事件文本何时真正具有预测性,因此难以利用事件到价格的方向性结构以及文本和价格信号的异质性角色。在这项工作中,我们提出了GS-Fuse,一个基于多模态事件的预测框架,它采用:(i) 格兰杰监督的、因果感知的门控融合模块,该模块仅在事件文本提供超越历史价格的增量预测价值时学习向事件文本开放;(ii) 多粒度对齐机制,该机制将高级事件表示和细粒度文本线索与未来市场轨迹联合对齐。作为构建在现成的大语言模型和时间序列基础模型之上的灵活、即插即用适配器,GS-Fuse可以在不同的骨干网络和市场设置中实例化。在真实世界金融数据集上的大量实验表明,GS-Fuse在多种资产和预测时间范围内始终优于最先进的时间序列和多模态基线。

英文摘要

Accurately forecasting the impact of salient financial events on markets is critical for investors and policymakers. However, existing multimodal time-series models typically fuse text and prices symmetrically, without an explicit way to decide when event text is truly predictive, and thus struggle to exploit the directional event-to-price structure and the heterogeneous roles of textual and price signals. In this work, we propose GS-Fuse, a multimodal event-based forecasting framework that employs (i) a Granger-supervised, causal-aware gated fusion module, which learns to open toward event text only when it provides incremental predictive value beyond historical prices, and (ii) a multi-granularity alignment mechanism that jointly aligns high-level event representations and fine-grained textual cues with future market trajectories. Built as a flexible, plug-and-play adapter on top of off-the-shelf large language models and time-series foundation models, GS-Fuse can be instantiated across diverse backbones and market settings. Extensive experiments on real-world financial datasets show that GS-Fuse consistently outperforms state-of-the-art time-series and multimodal baselines across multiple assets and forecasting horizons.

2605.28517 2026-05-28 cs.LG cs.AI

Stochastic Gradient Descent with Momentum is Algorithmically Stable

带动量的随机梯度下降具有算法稳定性

Yunwen Lei, Zimeng Wang, Xiaoming Yuan

发表机构 * Department of Mathematics, The University of Hong Kong(香港大学数学系) Department of Mathematics and Mathematical Statistics, Umeå University(于默奥大学数学与数理统计系)

AI总结 本文通过算法稳定性分析,证明了带动量的随机梯度下降(SGDM)在光滑凸问题上具有泛化保证,并建立了最优的超额总体风险界。

详情
AI中文摘要

带动量的随机梯度下降(SGDM)是机器学习中最广泛使用的优化算法之一。尽管文献中已经广泛研究了SGDM的优化性质,但关于SGDM是否以及何时能够很好地泛化到未见数据,仍然不够清楚。特别是,有人推测虽然动量加速了训练,但可能会降低泛化性能。在本文中,我们通过算法稳定性的视角,对SGDM进行了全面的泛化分析,填补了这一空白。更具体地说,我们引入了一个广义的SGDM框架,该框架涵盖了Polyak和Nesterov的动量方案,并为光滑凸问题建立了紧的平均模型稳定性界。值得注意的是,所获得的界利用了沿轨迹的小优化误差界,适用于区间$[0, 1)$内的任何动量参数,并且不需要通常假设的损失函数的Lipschitz连续性。我们进一步推导了广义SGDM的优化误差界,并将其与我们的泛化分析相结合,为具有Polyak和Nesterov动量的SGDM获得了最优的超额总体风险界。

英文摘要

Stochastic gradient descent with momentum (SGDM) is one of the most widely used optimization algorithms in machine learning. While optimization properties of SGDM have been extensively studied in the literature, it remains insufficiently understood whether and when SGDM can generalize well to unseen data. In particular, it has been conjectured that while momentum accelerates training, it may degrade generalization. In this paper, we close this gap by developing a comprehensive generalization analysis of SGDM through the lens of algorithmic stability. More specifically, we introduce a generalized SGDM framework that encompasses both Polyak's and Nesterov's momentum schemes, and establish tight on-average model stability bounds for smooth and convex problems. Notably, the obtained bounds exploit small optimization error bounds along the trajectory, apply to any momentum parameter in the interval $[0, 1)$, and do not require the commonly assumed Lipschitzness of loss functions. We further derive optimization error bounds for the generalized SGDM, and combine them with our generalization analyses to obtain optimal excess population risk bounds for SGDM with both Polyak's and Nesterov's momentum.
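For orientation, the two momentum schemes named above are usually written as follows. These are the standard textbook forms, given here as a reference sketch; the symbols ($w_t$ for the iterate, $\eta_t$ for the step size, $\beta \in [0, 1)$ for the momentum parameter, $z_{i_t}$ for the sampled example) are assumed notation, and the paper's generalized framework may parameterize the updates differently.

```latex
% Polyak (heavy-ball) momentum: gradient evaluated at the current iterate
w_{t+1} = w_t - \eta_t \nabla f(w_t; z_{i_t}) + \beta \, (w_t - w_{t-1})

% Nesterov momentum: gradient evaluated at the extrapolated point v_t
v_t     = w_t + \beta \, (w_t - w_{t-1})
w_{t+1} = v_t - \eta_t \nabla f(v_t; z_{i_t})
```

Setting $\beta = 0$ recovers plain SGD in both cases, which is why a bound valid for every $\beta \in [0, 1)$ contains the SGD case as a special instance.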

2605.28513 2026-05-28 cs.LG cs.AI

Learning Theory of the SVRG: Generalization and Convergence Analysis

SVRG的学习理论:泛化与收敛性分析

Yunwen Lei, Zimeng Wang, Xiaoming Yuan

发表机构 * Department of Mathematics, The University of Hong Kong(香港大学数学系) Department of Mathematics and Mathematical Statistics, Umeå University(于默奥大学数学与数理统计系)

AI总结 本文通过算法稳定性分析,首次为凸和强凸设置下的SVRG方法建立了非平凡的泛化界,揭示了优化与泛化之间的相互作用,并得到了最优的超额总体风险界。

详情
AI中文摘要

方差缩减(VR)方法采用方差递减的随机梯度,因其高效性被广泛应用于机器学习中的大规模优化问题。现有的VR方法理论研究主要集中在收敛性分析上,而泛化行为在很大程度上未被探索。本文通过算法稳定性的视角,首次为代表性VR方法——随机方差缩减梯度(SVRG)建立了非平凡的泛化分析,填补了这一空白。特别地,我们利用SVRG的算法结构,在凸和强凸两种设置下建立了尖锐的稳定性界。所得到的界是数据依赖的,因为训练误差沿轨迹被纳入。我们的分析阐明了优化与泛化之间的相互作用,从而在两种设置下都得到了最优的超额总体风险界。我们的方法与现有的随机算法分析有本质不同,我们将SVRG更新分解为类似SGD的步骤加上一个零均值修正项,然后引入新的Lyapunov函数来吸收由参考点引起的额外梯度项。我们的分析框架可以推广到其他VR方法,我们以著名的随机平均梯度加速(SAGA)方法为例展示了这种推广。

英文摘要

Variance reduction (VR) methods employ stochastic gradients with decreasing variance, and they have been widely applied to solve large-scale optimization problems in machine learning because of their efficiency. Existing theoretical studies of VR methods are mainly focused on the convergence analysis, leaving the generalization behavior largely unexplored. In this paper, we bridge this gap by developing the first non-vacuous generalization analysis of the representative VR method: Stochastic Variance Reduced Gradient (SVRG), through the lens of algorithmic stability. In particular, we establish sharp stability bounds of the SVRG in both convex and strongly convex settings by exploiting its algorithmic structure. The obtained bounds are data-dependent, because the training errors are incorporated along the trajectory. Our analysis clarifies the interplay between optimization and generalization, leading to optimal excess population risk bounds in both settings. Our approach differs substantially from existing analyses of stochastic algorithms in the sense that we decompose the SVRG update as an SGD-like step plus a zero-mean correction term and then introduce novel Lyapunov functions to absorb the additional gradient terms induced by the reference points. Our analytical framework can be generalized to other VR methods, and we demonstrate the generalization by the well-known Stochastic Average Gradient Accelerated (SAGA) method.
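The decomposition mentioned above (an SGD-like step plus a zero-mean correction) is visible directly in a small SVRG implementation. This is a sketch on a toy problem chosen for illustration, scalar least squares with loss $\frac{1}{n}\sum_i \frac{1}{2}(w x_i - y_i)^2$. The step size, epoch count, and inner-loop length are arbitrary assumptions, not values from the paper.

```python
import random


def svrg(xs, ys, epochs=30, inner_steps=None, lr=0.02, seed=0):
    """SVRG for scalar least squares; returns the final reference point."""
    rng = random.Random(seed)
    n = len(xs)
    inner_steps = inner_steps or 2 * n

    def grad_i(w, i):
        # Gradient of the i-th loss 0.5 * (w * x_i - y_i)^2 with respect to w.
        return (w * xs[i] - ys[i]) * xs[i]

    w_ref = 0.0
    for _ in range(epochs):
        # Full gradient at the reference point, computed once per epoch.
        full_grad = sum(grad_i(w_ref, i) for i in range(n)) / n
        w = w_ref
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Correction term: its average over a uniformly drawn i is zero.
            correction = full_grad - grad_i(w_ref, i)
            # SGD-like step on example i plus the zero-mean correction.
            w -= lr * (grad_i(w, i) + correction)
        w_ref = w
    return w_ref


w_hat = svrg([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The data are generated by $y = 2x$, so the minimizer is $w = 2$ and `w_hat` ends up very close to it. The `correction` variable is the term that a stability analysis has to control in addition to the plain SGD step.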

2605.28512 2026-05-28 cs.CL

On Compositional Learning Behaviours in Formal Mathematics

论形式数学中的组合学习行为

Kevin Yandoka Denamganaï

发表机构 * University of York(约克大学)

AI总结 本文提出 S2B-LM 基准,通过去除数值处理混淆并添加思维链框架来评估组合学习行为(CLB),发现 CLB 能力对于形式数学验证的困难部分必要但不充分。

Comments work in progress, under review

详情
AI中文摘要

能够征服形式数学困难尾部的自我进化科学智能体需要组合学习行为(CLBs)——在上下文中基础化和重组新颖符号结构的能力,而不仅仅是预学习原子的重组。我们提出了 \textbf{S2B-LM},这是符号行为基准的一个改编,它移除了数值处理作为混淆因素,并添加了思维链框架以引发而非仅仅探测潜在的 CLB 能力。在 CLB 能力(adj-ZSCT)和 miniF2F 整体证明性能上交叉评估十个 Lean~4 定理证明器,精确置换检验建立了一个层次必要性结构:搜索密集型模型覆盖了可处理的绝大部分而没有可检测的 CLB,然而每个进入奥林匹克级别(miniF2F $>75\%$)的模型都是五个最高 CLB 得分者之一($p=0.004$)。在排除模型规模作为混淆因素后,我们的结果表明 CLB 能力对于形式数学验证的困难尾部是 \emph{必要但不充分的}。

英文摘要

Self-evolving scientific agents capable of conquering the hard tail of formal mathematics require Compositional Learning Behaviours (CLBs) -- the capacity to ground and recombine novel symbolic structures in context, beyond mere recombination of prelearned atoms. We propose \textbf{S2B-LM}, an adaptation of the Symbolic Behaviour Benchmark that removes numerical processing as a confound and adds chain-of-thought scaffolding to elicit rather than merely probe latent CLB competency. Cross-evaluating ten Lean~4 theorem provers on CLB competency (adj-ZSCT) and miniF2F whole-proof performance, exact permutation tests establish a hierarchical necessity structure: search-heavy models cover the tractable bulk without detectable CLBs, yet every model breaking into the Olympiad-level tier (miniF2F $>75\%$) is among the five highest CLB scorers ($p=0.004$). After ruling out model scale as a confound, our results show that CLB competency is \emph{necessary but not sufficient} for the hard tail of formal mathematical verification.
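The kind of exact permutation computation behind a statement like "every top-tier model is among the five highest CLB scorers" can be sketched with a counting argument. The abstract does not say how many models reach the Olympiad-level tier, so `tier_size` below is a hypothetical input, and the null model (tier membership assigned uniformly at random among the models) is an assumption about how such a test could be set up.

```python
from math import comb


def p_all_in_top(n_models, top_size, tier_size):
    """P(all tier members fall in the top `top_size` by CLB score) under the
    null hypothesis that tier membership is independent of CLB rank."""
    if tier_size > top_size:
        return 0.0
    return comb(top_size, tier_size) / comb(n_models, tier_size)


# Hypothetical scenario: 10 models, top-5 CLB set, 5 models in the tier.
p_five = p_all_in_top(10, 5, 5)
```

Under the hypothetical assumption that five of the ten models reach the tier, the probability is 1/252, roughly 0.004. With a smaller tier the same event is far less surprising; for three tier members it is 10/120.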

2605.28501 2026-05-28 cs.LG

Fitting Unknown Number of Hyperplanes with Manifold Optimization

基于流形优化的未知数量超平面拟合

Zhiqin Cheng, Yu Zhan, Mingjin Zhang, Lingbo Liu, Liang Lin

发表机构 * Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China(南方科技大学电子与电气工程系) Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong(香港理工大学计算机系) School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China(中山大学计算机科学与工程学院) Research Institute of Multiple Agents and Embodied Intelligence, Pengcheng Laboratory, Shenzhen, China(鹏城实验室多智能体与具身智能研究院)

AI总结 针对未知数量超平面拟合的非凸、非可微及模型阶数未知问题,提出基于流形优化的两阶段算法,通过黎曼期望最大化与投影密度估计实现高精度鲁棒拟合。

详情
AI中文摘要

将未知数量的超平面拟合到数据是机器学习中一个基本但具有挑战性的问题,其特点是非凸性、非可微性和未知模型阶数。现有方法常陷入局部最优或缺乏几何一致性。为解决这些局限,我们提出一种基于流形优化的新框架。我们将问题重新表述为单位球面流形 $\mathcal{S}^{\textbf{dim}-1}$ 上的无监督学习任务。该公式有效处理了非凸约束并线性化了距离度量,使得梯度下降易于处理。我们提出了一种两阶段流形优化算法。在第一阶段,我们采用带有重尾核的黎曼期望最大化过程来鲁棒地估计后验概率,有效解决了相交超平面间点分布的歧义。在第二阶段,当软估计收敛后,概率权重退化为硬匹配,产生严格满足几何定义的精确局部最优解。此外,我们引入了一种投影密度估计策略用于初始化,通过显著降低特征描述空间和搜索复杂度来促进全局收敛。大量实验表明,我们的方法在几何精度和鲁棒性方面均优于最先进的基线方法。

英文摘要

Fitting an unknown number of hyperplanes to data is a fundamental yet challenging problem in machine learning, characterized by its non-convexity, non-differentiability, and unknown model order. Existing approaches often struggle with local optima or lack geometric consistency. To address these limitations, we propose a novel framework based on Manifold Optimization. We reformulate the problem as an unsupervised learning task on the unit sphere manifold $\mathcal{S}^{\textbf{dim}-1}$. This formulation effectively handles the non-convex constraints and linearizes the distance measurement, rendering the gradient descent tractable. We propose a Two-Stage Manifold Optimization algorithm. In Phase I, we employ a Riemannian Expectation-Maximization process with a heavy-tailed kernel to robustly estimate posterior probabilities, effectively resolving the ambiguities of point distribution between intersecting hyperplanes. In Phase II, upon convergence of the soft estimates, the probabilistic weights degenerate into hard matching, generating a precise local optimum that strictly satisfies the geometric definition. Furthermore, we introduce a projected density estimation strategy for initialization to facilitate global convergence by significantly reducing the feature description space and search complexity. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in both geometric accuracy and robustness.
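Gradient descent on the unit sphere, the basic building block of such a formulation, can be sketched as a tangent-space projection followed by renormalization. This is a simplified illustration, not the two-stage algorithm: it fits a single hyperplane through the origin by minimizing the mean squared distance $(n \cdot x)^2$ over unit normals $n$, with no EM weights, no heavy-tailed kernel, and no offset term. Step size and iteration count are arbitrary assumptions.

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def riemannian_step(n, euclid_grad, lr):
    """One descent step on the sphere: project the gradient, then retract."""
    dot = sum(g * x for g, x in zip(euclid_grad, n))
    # Tangent-space projection at n: remove the component along n.
    tangent = [g - dot * x for g, x in zip(euclid_grad, n)]
    # Retraction: take the step in the ambient space, then renormalize.
    return normalize([x - lr * t for x, t in zip(n, tangent)])


def fit_normal(points, steps=200, lr=0.1):
    """Unit normal of the hyperplane n . x = 0 that best fits the points."""
    dim = len(points[0])
    n = normalize([1.0] * dim)
    for _ in range(steps):
        grad = [0.0] * dim
        for p in points:
            residual = sum(a * b for a, b in zip(n, p))
            for j in range(dim):
                grad[j] += 2.0 * residual * p[j] / len(points)
        n = riemannian_step(n, grad, lr)
    return n


# Points lying exactly on the plane z = 0.
normal = fit_normal([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (-1.0, 2.0, 0.0)])
```

Because every iterate stays on the sphere, the unit-norm constraint never has to be enforced with a penalty, and the point-to-hyperplane distance stays linear in the normal, which is the property the abstract refers to as linearizing the distance measurement.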

2605.28500 2026-05-28 cs.CL cs.AI cs.LG

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

功能熵:通过不确定性量化预测LLM生成代码的功能正确性

Dylan Bouchard, Mohit Singh Chauhan, Zeya Ahmad, Ho-Kyeong Ra

发表机构 * CVS Health(CVS健康)

AI总结 针对LLM生成代码功能不正确的问题,提出基于功能等价性的不确定性量化方法(功能熵),在多个编程语言和模型上优于现有方法。

详情
AI中文摘要

大型语言模型在代码生成方面表现出令人印象深刻的能力,但它们经常生成功能不正确的代码。不确定性量化(UQ)方法已成为检测自然语言生成中幻觉的有前途的方法,但它们在代码生成任务中的有效性仍未得到充分探索。我们系统地评估了UQ技术如何跨三种编程语言、五个LLM和超过1700个问题迁移到代码生成。我们发现,一些基于令牌概率的方法无需修改即可有效泛化,而依赖自然语言推理(NLI)的基于采样的方法失败,因为NLI模型无法区分功能不同的代码,导致大多数响应崩溃为单个语义簇。为了解决这个问题,我们引入了功能等价性方法,这是一类特定于代码的方法,用基于LLM的功能等价性评估取代基于NLI的语义等价性,包括功能熵,即语义熵的代码特定模拟。功能等价性方法在15个模型-基准组合中的11个中实现了最高的AUROC,并在大多数设置中实现了最佳校准,始终优于基于NLI的对应方法以及所有其他评估方法。

英文摘要

Large language models have shown impressive capabilities in code generation, yet they often produce functionally incorrect code. Uncertainty quantification (UQ) methods have emerged as a promising approach for detecting hallucinations in natural language generation, but their effectiveness for code generation tasks remains underexplored. We systematically evaluate how UQ techniques transfer to code generation across three programming languages, five LLMs, and over 1,700 problems. We find that some token-probability-based methods generalize effectively without modification, while sampling-based methods relying on natural language inference (NLI) fail because NLI models cannot distinguish functionally different code, causing most responses to collapse into a single semantic cluster. To address this, we introduce functional equivalence methods, a family of code-specific methods that replace NLI-based semantic equivalence with an LLM-based functional equivalence assessment, including functional entropy, a code-specific analog of semantic entropy. Functional equivalence methods achieve top AUROC in 11 out of 15 model-benchmark combinations and the best calibration across most settings, consistently outperforming both NLI-based counterparts and all other methods evaluated.
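The clustering-then-entropy idea can be sketched as below. Two substitutions are made and should be read as assumptions: the paper's LLM-based functional equivalence judge is replaced by a hypothetical stand-in that compares outputs on a few probe inputs, and the entropy is the discrete, count-based variant over cluster sizes rather than a likelihood-weighted one.

```python
import math


def cluster_by_equivalence(samples, equivalent):
    """Greedy clustering: a sample joins the first cluster whose
    representative (first member) it is judged equivalent to."""
    clusters = []
    for sample in samples:
        for cluster in clusters:
            if equivalent(cluster[0], sample):
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return clusters


def cluster_entropy(clusters):
    """Entropy (in nats) of the empirical distribution over clusters."""
    total = sum(len(c) for c in clusters)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)


def same_behaviour(f, g, probes=(0, 1, 2, 3)):
    """Stand-in equivalence judge: identical outputs on the probe inputs."""
    return all(f(x) == g(x) for x in probes)


# Three hypothetical sampled solutions to "double the input".
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
clusters = cluster_by_equivalence(candidates, same_behaviour)
uncertainty = cluster_entropy(clusters)
```

Here the first two candidates behave identically and the third does not, so two clusters of sizes 2 and 1 result and the entropy is about 0.64 nats. If every sample fell into one cluster, as the abstract reports happening with NLI-based clustering of code, the entropy would be zero regardless of correctness.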

2605.28495 2026-05-28 cs.CV

Janus-LoRA: A Balanced Low-Rank Adaptation for Continual Learning

Janus-LoRA:面向持续学习的平衡低秩适配

Cheng Chen, Pengpeng Zeng, Yuyu Guo, Lianli Gao, Hengtao Shen, Jingkuan Song

发表机构 * School of Computer Science and Technology, Tongji University, Shanghai, China(同济大学计算机科学与技术学院,上海,中国) School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China(电子科技大学计算机科学与工程学院,成都,中国) Shanghai Innovation Institute, Shanghai, China(上海创新研究院,上海,中国) Independent Researcher(独立研究者)

AI总结 提出Janus-LoRA框架,通过梯度修正实现参数级正交性以克服灾难性遗忘,并利用解耦边际损失增强特征级分离,从而在持续学习中平衡稳定性与可塑性。

Comments 9 pages, International Conference on Machine Learning

详情
AI中文摘要

低秩适配(LoRA)已成为持续学习的一种有前景的范式。它独立更新其低秩因子($A$和$B$),通过它们的相互作用对完整权重矩阵产生复合更新。为了防止灾难性遗忘,该更新应保持与包含先前学习知识的任务特定子空间正交。然而,我们发现这种复合更新系统性地违反了这种正交性,重新引入了干扰并破坏了稳定性。此外,天真地强制执行这种正交性会损害可塑性,破坏微妙的稳定性-可塑性权衡。为了解决这些问题,我们提出了 \textbf{Janus-LoRA}框架,通过两个新颖的组件恢复这种平衡。具体来说,我们首先引入梯度修正,这是一种闭式解,数学上解耦LoRA的因子更新,针对通过高效在线估计识别的历史知识子空间强制执行正交性。接下来,为了增强可塑性,我们引入解耦边际损失,通过将新特征表示推离旧特征表示来促进特征级分离,从而为新学习创建独特、低干扰的区域。在具有挑战性的基准上的全面实验表明,通过协调参数级正交性与特征级分离,Janus-LoRA实现了优越的平衡,并建立了新的最先进性能。

英文摘要

Low-Rank Adaptation (LoRA) has emerged as a promising paradigm for Continual Learning. It independently updates its low-rank factors ($A$ and $B$), creating a composite update to the full weight matrix through their interaction. To prevent catastrophic forgetting, this update should remain orthogonal to the task-specific subspace that contains previously learned knowledge. However, we identify that this composite update systematically violates this orthogonality, reintroducing interference and undermining stability. Furthermore, naively enforcing this orthogonality compromises plasticity, disrupting the delicate stability-plasticity trade-off. To resolve these issues, we propose \textbf{Janus-LoRA}, a framework that restores this balance through two novel components. Specifically, we first introduce Gradient Rectification, a closed-form solution that mathematically decouples LoRA's factor updates, enforcing orthogonality against the historical knowledge subspace identified by an efficient Online Estimation. Next, to enhance plasticity, we introduce a Decoupled Margin Loss that promotes feature-level separation by pushing new feature representations away from old ones, thus creating distinct, low-interference regions for new learning. Comprehensive experiments on challenging benchmarks demonstrate that by harmonizing parameter-level orthogonality with feature-level separation, Janus-LoRA achieves a superior balance and establishes new state-of-the-art performance.

2605.28494 2026-05-28 cs.CL

A new semantically annotated corpus with syntactic-semantic and cross-lingual senses

一个带有句法语义和跨语言义项的新语义标注语料库

Myriam Rakho, Eric Laporte, Matthieu Constant

发表机构 * Université Paris-Est(巴黎东部大学)

AI总结 本文构建了一个包含20个法语多义动词实例的新语义标注语料库,每个实例标注了三种义项:平行语料中的英语翻译、法语计算词典(Lexicon-Grammar表)条目以及两者的组合细粒度义项。

Journal ref Language Resources and Evaluation (LREC), 2012, Istanbul, Turkey, pp.597-600

详情
AI中文摘要

我们描述了一个用于词义消歧的新义项标注语料库。该语料库由20个法语多义动词的实例组成。每个动词实例都标注了三种义项标签:(1) 该实例在平行语料库英语版本中的实际翻译,(2) 法语计算词典(Lexicon-Grammar表)中的动词条目,以及(3) 由翻译和Lexicon-Grammar条目拼接而成的细粒度义项标签。

英文摘要

We describe a new sense-tagged corpus for word sense disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Grammar tables) and (3) a fine-grained sense label resulting from the concatenation of the translation and the Lexicon-Grammar entry.

2605.28491 2026-05-28 cs.CV

DiscoForcing: A Unified Framework for Real-Time Audio-Driven Character Control with Diffusion Forcing

DiscoForcing:基于扩散强制的实时音频驱动角色控制统一框架

Kaiyang Ji, Bingsheng Qian, Binghuan Wu, Kangyi Chen, Ye Shi, Jingya Wang

发表机构 * ShanghaiTech University(上海科技大学)

AI总结 针对实时音频响应角色控制问题,提出DiscoForcing框架,结合因果音乐编码器和扩散强制序列模型,在严格因果、有限延迟的流式生成中实现音频与全身运动的稳定对齐。

Comments accepted by ICML 2026

详情
AI中文摘要

我们研究实时音频响应角色控制作为一个部署忠实性问题:严格因果、有限延迟的流式生成,必须在交互帧率下生成连贯的全身运动,同时音频条件可能突然变化,包括节奏变化、音频丢失或用户编辑。先前的音乐到运动系统主要针对具有全局上下文的离线生成进行优化,在流式部署中,当条件历史变得过时或不可靠时,性能会下降。我们引入了DiscoForcing,一个流式音频驱动扩散框架,它将捕获节奏结构和相位动态的因果音乐编码器与在时间范围内以异构噪声水平训练的扩散强制序列模型相结合。在此基础上,我们设计了一个混合时间调度和一个历史引导的流式采样器,以明确权衡响应性与非平稳音频下的长期一致性。在端到端实时交互系统中实现,包括在线虚拟角色回放和人形部署工作流,DiscoForcing在匹配因果性和延迟约束下,比先前基线提供更稳定的长期展开和更清晰的音频-运动对齐,同时保持实时吞吐量。

英文摘要

We study real-time audio-responsive character control as a deployment-faithful problem: strictly causal, bounded-latency streaming that must generate coherent full-body motion at interactive frame rates while the audio condition can change abruptly, including tempo shifts, drops, or user edits. Prior music-to-motion systems are largely optimized for offline generation with global context, and degrade in streaming rollouts where conditioning history becomes stale or unreliable. We introduce DiscoForcing, a streaming audio-driven diffusion framework that combines a causal music encoder that captures rhythmic structure and phase dynamics with a diffusion-forcing sequence model trained under heterogeneous noise levels across the temporal horizon. Building on this, we design a hybrid temporal schedule and a history-guided streaming sampler to explicitly trade off responsiveness against long-horizon consistency under non-stationary audio. Implemented in an end-to-end real-time interactive system with online avatar playback and humanoid deployment workflows, DiscoForcing delivers more stable long-horizon rollouts and sharper audio-motion alignment than prior baselines under matched causality and latency constraints while maintaining real-time throughput.

2605.28490 2026-05-28 cs.CV cs.AI

SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

SSR3D-LLM: 通过潜在步骤实现结构化空间推理以实现统一3D-LLM中的细粒度定位

Jiawei Li, Ziyi Liu, Weijie Shi, Long Chen, Jiajie Xu, Xiaofang Zhou

发表机构 * The Hong Kong University of Science and Technology(香港科学与技术大学) Soochow University(苏州大学)

AI总结 针对统一3D-LLM中细粒度查询的脆弱性,提出SSR3D-LLM,通过潜在空间推理步骤和几何感知评分器逐步精炼候选排名,在多个基准上取得最优结果。

详情
AI中文摘要

3D物体定位从自然语言中定位3D场景中的所指对象。统一的以实例为中心的3D-LLM旨在同时解决定位、对话、问答和描述任务,但许多方法依赖于单一的指针式定位决策,将关系指令压缩为一个选择。这对于需要根据上下文对象和空间关系排除多个同类候选的细粒度查询来说是脆弱的。我们提出结构化空间推理3D-LLM(SSR3D-LLM),一种用于统一3D-LLM的结构化定位接口。给定固定的Mask3D物体提议,LLM从查询中写出一系列潜在的空间推理步骤和记忆令牌,然后一个几何感知评分器读取这些潜在步骤,通过逐步长度掩码逐步精炼候选排名。潜在步骤从标准基准目标监督和训练期间的辅助指代线索监督中学习,而推理仅使用输入查询和Mask3D提议。在ReferIt3D、ScanRefer和Multi3DRef上,SSR3D-LLM在统一3D-LLM基线中取得了最强结果,在细粒度定位上相比单指针QPG基线有显著提升,并相比先前的统一3D-LLM有一致改进,同时保留了默认的语言任务路径。

英文摘要

3D object grounding localizes referred objects in a 3D scene from natural language. Unified instance-centric 3D-LLMs aim to solve grounding together with dialog, QA, and captioning, yet many rely on a single pointer-style grounding decision that compresses a relational instruction into one selection. This is brittle for fine-grained queries where multiple same-class candidates must be ruled out by context objects and spatial relations. We propose Structured Spatial Reasoning 3D-LLM (SSR3D-LLM), a structured grounding interface for unified 3D-LLMs. Given fixed Mask3D object proposals, the LLM writes a sequence of latent spatial reasoning steps and memory tokens from the query, and a geometry-aware scorer reads these latent steps in order to refine candidate rankings step by step with step-length masking. The latent steps are learned from standard benchmark target supervision with auxiliary referential-cue supervision during training, while inference uses only the input query and Mask3D proposals. Across ReferIt3D, ScanRefer, and Multi3DRef, SSR3D-LLM achieves the strongest results among unified 3D-LLM baselines, with substantial gains over the single-pointer QPG baseline on fine-grained grounding and consistent improvements over prior unified 3D-LLMs, while preserving the default language-task route.