arXivDaily arXiv Daily Academic Digest, updated Monday through Friday
2603.09117 2026-05-28 cs.LG cs.AI cs.CL

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

解耦推理与置信度:在可验证奖励的强化学习中恢复校准

Zhengzhao Ma, Xueru Wen, Boxi Cao, Yaojie Lu, Hongyu Lin, Jinglin Yang, Min He, Xianpei Han, Le Sun

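AI summary To address calibration degradation in RLVR, the paper proposes DCPO, a framework that decouples the reasoning and calibration objectives, substantially improving calibration and mitigating over-confidence while preserving accuracy.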

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

AI Chinese abstract

基于可验证奖励的强化学习(RLVR)显著增强了大语言模型(LLMs)的推理能力,但严重遭受校准退化,即模型对错误答案变得过度自信。以往研究致力于将校准目标直接纳入现有优化目标。然而,我们的理论分析表明,最大化策略准确率与最小化校准误差之间存在根本性的梯度冲突。基于这一见解,我们提出了DCPO,一个简单而有效的框架,系统地解耦了推理和校准目标。大量实验表明,我们的DCPO不仅保持了与GRPO相当的准确率,还实现了最佳的校准性能,并显著缓解了过度自信问题。我们的研究为更可靠的LLM部署提供了宝贵的见解和实用的解决方案。

English abstract

Reinforcement Learning from Verifiable Rewards (RLVR) significantly enhances the reasoning of large language models (LLMs) but suffers severely from calibration degeneration, where models become excessively over-confident in incorrect answers. Previous studies have focused on directly incorporating a calibration objective into the existing optimization target. However, our theoretical analysis demonstrates that there is a fundamental gradient conflict between maximizing policy accuracy and minimizing calibration error. Building on this insight, we propose DCPO, a simple yet effective framework that systematically decouples the reasoning and calibration objectives. Extensive experiments demonstrate that DCPO not only preserves accuracy on par with GRPO but also achieves the best calibration performance and substantially mitigates the over-confidence issue. Our study provides valuable insights and a practical solution for more reliable LLM deployment.
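
The abstract above refers to calibration error without naming a metric. As a reading aid, here is a minimal sketch of expected calibration error (ECE), one widely used measure of over-confidence. It is an assumption that ECE is relevant here; the function name, the equal-width binning, and the default of 10 bins are illustrative choices, not details taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins (illustrative; not taken from the paper)."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of the samples.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 0.9 confidence on every answer but is right only half the time has a large gap between confidence and accuracy, which is the over-confidence pattern the abstract describes.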

2410.04096 2026-05-28 cs.LG cs.AI cs.NA cs.NE math.NA physics.comp-ph

Sinc Kolmogorov-Arnold network and its application for solving PDEs with singularities

Sinc Kolmogorov-Arnold 网络及其在求解含奇异性偏微分方程中的应用

Tianchi Yu, Jingwei Qiu, Jiang Yang, Ivan Oseledets

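AI summary This paper proposes using Sinc interpolation as the learnable activation function in Kolmogorov-Arnold networks to approximate both smooth functions and functions with singularities effectively, and reports better results when solving partial differential equations with physics-informed neural networks.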

Journal ref Neural Networks 2026
AI Chinese abstract

在本文中,我们提出在 Kolmogorov-Arnold 网络(一种具有可学习激活函数的神经网络,最近作为多层感知机的替代方案受到关注)中使用 Sinc 插值。已有许多不同的函数表示被尝试,但我们表明 Sinc 插值提供了一种可行的替代方案,因为它在数值分析中已知能有效表示光滑函数和含奇异性的函数。这不仅对函数逼近重要,也对使用物理信息神经网络求解偏微分方程重要。通过一系列实验,我们表明 SincKANs 在我们考虑的大多数示例中提供了更好的结果。

English abstract

In this paper, we propose using Sinc interpolation in Kolmogorov-Arnold Networks (KANs), neural networks with learnable activation functions that have recently gained attention as alternatives to multilayer perceptrons. Many different function representations have already been tried, but we show that Sinc interpolation offers a viable alternative, since it is known in numerical analysis to represent both smooth functions and functions with singularities effectively. This is important not only for function approximation but also for solving partial differential equations with physics-informed neural networks. Through a series of experiments, we show that SincKANs provide better results in almost all of the examples we have considered.
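
For readers unfamiliar with Sinc interpolation, the following is a minimal sketch of the classical truncated cardinal series from numerical analysis, f(x) ≈ Σ_k f(kh) sinc((x - kh)/h). It only illustrates the basis; how SincKAN makes the coefficients learnable is not described in the abstract, and the step size `h` and truncation `n_terms` below are assumed values.

```python
import math

def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def sinc_interpolate(f, x, h=0.5, n_terms=20):
    """Truncated cardinal series: sum over k in [-N, N] of f(k h) * sinc((x - k h) / h)."""
    return sum(
        f(k * h) * sinc((x - k * h) / h)
        for k in range(-n_terms, n_terms + 1)
    )

def gaussian(x):
    # Smooth, rapidly decaying test function for which the series converges quickly.
    return math.exp(-x * x)
```

At a grid node x = kh every basis function but one vanishes (up to rounding), so the series reproduces f(kh); between nodes the accuracy depends on `h` and on how fast f decays.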

2604.25491 2026-05-28 cs.CV cs.AI

The Forensic Cost of Watermark Removal: From Dedicated Attacks to Image Editing

水印移除的法医成本:从专用攻击到图像编辑

Gautier Evennou, Ewa Kijak

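AI summary This paper proposes Watermark Removal Detection (WRD) as a new evaluation axis: a classifier trained to detect removal artifacts achieves state-of-the-art detection at a $10^{-3}$ false positive rate, establishing forensic stealthiness as a necessary requirement for watermark removal.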

Comments v1: The Forensic Cost of Watermark Removal, accepted at IH&MMSEC 2026, Special Session "Watermarking Across the Lifecycle of Generative Models". v2: extended version, under review

AI Chinese abstract

当前水印移除方法在两个轴上进行评估:攻击成功率和感知质量。我们证明这是不够的。虽然最先进的攻击成功地在没有可见失真的情况下降低了水印信号,但它们留下了明显的统计伪影,暴露了移除尝试。我们将这个被忽视的轴命名为水印移除检测(WRD),并证明基于这些伪影训练的现代分类器在10^{-3}假阳性率下,对每种测试的移除方法都达到了最先进的检测率。没有现有的攻击考虑到这种法医泄漏。我们在扩展的评估三元组(攻击成功率、感知质量和法医可检测性)下,对领先的水印方案与标准移除流水线进行了基准测试,发现当前没有方法能平衡所有三个。我们的结果确立了法医隐蔽性作为水印移除的必要要求。

English abstract

Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) and demonstrate that a modern classifier trained on these artifacts achieves state-of-the-art detection rates at $10^{-3}$ FPR across every removal method tested. No existing attack accounts for this forensic leakage. We benchmark leading watermarking schemes against standard removal pipelines under the extended evaluation triple of attack success, perceptual quality, and forensic detectability, and find that no current method balances all three. Our results establish forensic stealthiness as a necessary requirement for watermark removal.
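
Reporting a detection rate "at $10^{-3}$ FPR" means fixing the decision threshold on clean images first and only then counting detections. A minimal sketch of that bookkeeping follows; the function name and the score lists are hypothetical, and the classifier that would produce the scores is not modeled.

```python
def tpr_at_fpr(negative_scores, positive_scores, target_fpr=1e-3):
    """Detection rate at a threshold whose empirical FPR does not exceed target_fpr.

    negative_scores: detector scores on images with no removal attempt (hypothetical input).
    positive_scores: detector scores on images after a removal attack (hypothetical input).
    """
    if not negative_scores or not positive_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    neg_sorted = sorted(negative_scores, reverse=True)
    # Number of clean images that may be flagged without exceeding the target FPR.
    allowed_false_positives = int(target_fpr * len(neg_sorted))
    # A sample is flagged when its score is strictly above this threshold.
    threshold = neg_sorted[allowed_false_positives]
    detected = sum(1 for s in positive_scores if s > threshold)
    return detected / len(positive_scores)
```

With fewer than 1,000 clean samples the allowed number of false positives at $10^{-3}$ rounds down to zero, so the threshold becomes the highest clean score; a meaningful estimate at this operating point therefore needs a large clean set.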

2510.24941 2026-05-28 cs.LG

Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought

顿悟时刻可以是假的吗?——量化思维链中的装饰性思考与真实思考

Jiachen Zhao, Yiyou Sun, Weiyan Shi, Dawn Song

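AI summary Proposes the True Thinking Score (TTS) to quantify each chain-of-thought step's causal contribution to the final answer, finds that models often mix true and decorative thinking, uses TTS for effective pruning and self-training, and reveals that frontier models often verbalize reasoning steps that are not causally used.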

AI Chinese abstract

大型语言模型可以生成长的思维链(CoT)推理,但先前的研究表明,在明确设计的设置下,CoT可能是事后合理化,而非计算过程的忠实反映。在这项工作中,我们更进一步,提出了真实思考得分(TTS),用于量化在现实推理问题中CoT每一步对模型最终预测的因果贡献。在从1.5B到1.1T参数的11个模型上,针对常见推理基准,我们发现CoT经常交织真实思考步骤(对最终答案有因果影响)和装饰性思考步骤(看似有用但因果影响很小);即使对于前沿模型,这种装饰性步骤仍然普遍存在:在MATH上,Kimi-K2.6中超过30%的步骤是装饰性的(TTS <= 0.005)。此外,TTS使得有效的CoT剪枝成为可能:移除TTS最低的50%的CoT步骤可以基本保持性能。在这些剪枝后的CoT上进行自训练,可以将Nemotron3-Nano-30B的推理长度减少66%,同时保持性能。最后,我们提供了机制分析,表明LLM可以在潜在空间中被引导以参与或脱离推理步骤。总体而言,我们的结果揭示了前沿LLM经常表述未被因果使用的推理步骤,这对CoT的效率和可信度提出了挑战。

English abstract

Large language models can generate long chain-of-thought (CoT) reasoning, yet prior work using explicitly designed settings suggests that CoT can be post-hoc rationalization rather than a faithful reflection of the computation. In this work, we go further and propose a True Thinking Score (TTS) to quantify the causal contribution of each step in CoT to the model's final prediction in realistic reasoning problems. Across eleven models ranging from 1.5B to 1.1T parameters on common reasoning benchmarks, we find that CoTs often interleave true-thinking steps, which causally affect the final answer, with decorative-thinking steps, which appear useful but have little causal influence; such decorative steps remain prevalent even for frontier models: over 30% of steps in Kimi-K2.6 are decorative on MATH (TTS <= 0.005). Furthermore, TTS enables effective CoT pruning: removing the 50% of CoT steps with the lowest TTS largely maintains performance. Self-training on these pruned CoTs reduces reasoning length by 66% while preserving performance on Nemotron3-Nano-30B. Finally, we provide a mechanistic analysis showing that LLMs can be steered in the latent space to engage or disengage with reasoning steps. Overall, our results reveal that frontier LLMs often verbalize reasoning steps that are not causally used, challenging both the efficiency and the trustworthiness of CoT.
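
The pruning rule in the abstract (drop the 50% of steps with the lowest TTS) and the decorative threshold (TTS <= 0.005) can be sketched as below. The per-step scores are assumed to be given; how TTS itself is computed is not specified in the abstract, and the function names are hypothetical.

```python
def prune_lowest_scoring_steps(steps, scores, drop_fraction=0.5):
    """Remove the drop_fraction of steps with the lowest scores, preserving step order."""
    if len(steps) != len(scores):
        raise ValueError("steps and scores must have the same length")
    n_drop = int(len(steps) * drop_fraction)
    # Indices sorted by ascending score; the first n_drop indices are removed.
    by_score = sorted(range(len(steps)), key=lambda i: scores[i])
    dropped = set(by_score[:n_drop])
    return [step for i, step in enumerate(steps) if i not in dropped]

def decorative_fraction(scores, threshold=0.005):
    """Share of steps whose score is at or below the decorative threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(1 for s in scores if s <= threshold) / len(scores)
```

Keeping the surviving steps in their original order matters because the pruned chain is later used as training text.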

2604.23472 2026-05-28 cs.AI

Escher-Loop: Mutual Evolution by Closed-Loop Self-Referential Optimization

Escher-Loop:通过闭环自我指涉优化的共同进化

Ziyang Liu, Xinyan Guo, Xuchen Wei, Han Hao, Liu Yang

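AI summary Proposes Escher-Loop, a framework in which task agents and optimizer agents co-evolve in a closed loop with a dynamic benchmarking mechanism, achieving sustained performance gains beyond static baselines.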

Comments The first three authors contributed equally. Corresponding Authors: Han Hao, Liu Yang

AI Chinese abstract

尽管最近自主代理展示了令人印象深刻的能力,但它们主要依赖于手动脚本化工作流和手工制作的启发式方法,本质上限制了其开放式改进的潜力。为了解决这个问题,我们提出了Escher-Loop,一个完全闭环的框架,实现了两个不同群体的共同进化:解决具体问题的任务代理,以及递归优化任务代理和自身的优化代理。为了维持这种自我指涉的进化,我们提出了一种动态基准测试机制,该机制无缝地将新生成任务代理的经验分数作为相对胜负信号,用于更新优化代理的分数。该机制利用任务代理的进化作为内在信号,驱动优化代理的评估和优化,而无需额外开销。在数学优化问题上的实证评估表明,Escher-Loop有效突破了静态基线的性能上限,在所有评估任务中,在匹配计算量下实现了最高的绝对峰值性能。值得注意的是,我们观察到优化代理动态调整其策略以适应高性能任务代理不断变化的需求,这解释了系统的持续改进和优越的后期性能。

English abstract

While recent autonomous agents demonstrate impressive capabilities, they predominantly rely on manually scripted workflows and handcrafted heuristics, inherently limiting their potential for open-ended improvement. To address this, we propose Escher-Loop, a fully closed-loop framework that operationalizes the mutual evolution of two distinct populations: Task Agents that solve concrete problems, and Optimizer Agents that recursively refine both the task agents and themselves. To sustain this self-referential evolution, we propose a dynamic benchmarking mechanism that seamlessly reuses the empirical scores of newly generated task agents as relative win-loss signals to update optimizers' scores. This mechanism leverages the evolution of task agents as an inherent signal to drive the evaluation and refinement of optimizers without additional overhead. Empirical evaluations on mathematical optimization problems demonstrate that Escher-Loop effectively pushes past the performance ceilings of static baselines, achieving the highest absolute peak performance across all evaluated tasks under matched compute. Remarkably, we observe that the optimizer agents dynamically adapt their strategies to match the shifting demands of high-performing task agents, which explains the system's continuous improvement and superior late-stage performance.

2604.23282 2026-05-28 cs.CV cs.MM

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

弥合姿态-语义鸿沟:基于文本的人物异常搜索的级联框架

Zequn Xie, Guijin Luo, Chuxin Wang, Sihang Cai, Tao Jin, Zhou Zhao, Yixuan Tang

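AI summary Proposes the Structure-Semantic Decoupled Cascade (SSDC) framework, which balances efficiency and semantic reasoning through two-stage retrieval (structure-aware coarse retrieval and multi-agent semantic verification) and achieves state-of-the-art performance on the PAB benchmark.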

Comments Accepted to ACL 2026. 10 pages, 5 figures

AI Chinese abstract

基于文本的人物异常搜索利用自然语言查询从监控档案中检索特定行为事件。尽管最近的姿态感知方法能够很好地对齐几何结构,但它们面临一个根本性的姿态-语义鸿沟:语义不同的动作可能共享相似的骨骼几何结构。虽然多模态大语言模型(MLLMs)可以减少这种歧义,但将其用于大规模检索在计算上代价高昂。我们提出了结构-语义解耦级联(SSDC)框架,将检索解耦为两个阶段:(1)结构感知粗检索,其中轻量级模型通过骨骼相似性快速筛选候选;(2)侦探小组交互,一个多智能体语义验证模块。该小组包括一个用于快速二元过滤的侦探、一个用于证据提取的分析师和一个用于语义合成的写手。最后,通过将合成描述与结构先验融合,对候选进行重新排序。在PAB基准上的实验表明,SSDC通过平衡效率和语义推理实现了最先进的性能。

English abstract

Text-based person anomaly search retrieves specific behavioral events from surveillance archives using natural-language queries. Although recent pose-aware methods align geometric structures well, they face a fundamental Pose-Semantic Gap: semantically different actions can share similar skeletal geometries. While Multimodal Large Language Models (MLLMs) can reduce this ambiguity, using them for large-scale retrieval is computationally prohibitive. We propose the Structure-Semantic Decoupled Cascade (SSDC) framework, which decouples retrieval into two stages: (1) Structure-Aware Coarse Retrieval, where a lightweight model quickly filters candidates by skeletal similarity; and (2) Detective Squad Interaction, a multi-agent semantic verification module. The squad consists of a Detective for fast binary filtering, an Analyst for evidence extraction, and a Writer for semantic synthesis. Finally, we re-rank candidates by fusing the synthesized captions with structural priors. Experiments on the PAB benchmark show that SSDC achieves state-of-the-art performance by balancing efficiency and semantic reasoning.

2604.23061 2026-05-28 cs.LG cs.AI

C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs

C-MORAL: 基于强化对齐的可控多目标分子优化用于大语言模型

Rui Gao, Youngseung Jeon, Swastik Roy, Morteza Ziyadi, Xiang 'Anthony' Chen

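AI summary Proposes C-MORAL, a reinforcement learning post-training framework that combines group-based relative optimization, property score alignment, and bottleneck-sensitive non-linear reward aggregation for controllable multi-objective molecular optimization, achieving the best performance on the C-MuMOInstruct and S$^2$-Bench MolOpt benchmarks.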

Comments 26 pages, 7 figures

AI Chinese abstract

大型语言模型(LLMs)在分子优化方面展现出潜力,但使其与选择性且相互竞争的药物设计约束对齐仍然具有挑战性。我们提出了C-Moral,一个用于可控多目标分子优化的强化学习后训练框架。C-Moral结合了基于分组的相对优化、针对异构目标的属性分数对齐以及瓶颈敏感的非线性奖励聚合,以提高跨竞争分子属性的稳定性。在C-MuMOInstruct和S$^2$-Bench MolOpt上的实验表明,C-Moral在两个基准上均取得了比较方法中最佳的性能。在C-MuMOInstruct上,C-Moral在域内任务中实现了最佳的成功优化率(SOR)48.9%,在域外任务中为39.5%,同时保持了骨架相似性。在S$^2$-Bench MolOpt上,它在LogP、MR和QED优化任务中也取得了最强结果。这些结果表明,C-Moral是将分子LLMs与连续且受约束的分子设计目标对齐的有效方法。我们的代码和模型公开在https://github.com/Rwigie/C-MORAL。

English abstract

Large language models (LLMs) show promise for molecular optimization, but aligning them with selective and competing drug-design constraints remains challenging. We propose C-Moral, a reinforcement learning post-training framework for controllable multi-objective molecular optimization. C-Moral combines group-based relative optimization, property score alignment for heterogeneous objectives, and bottleneck-sensitive non-linear reward aggregation to improve stability across competing molecular properties. Experiments on C-MuMOInstruct and S$^2$-Bench MolOpt show that C-Moral achieves the best performance among compared methods on both benchmarks. On C-MuMOInstruct, C-Moral achieves the best Success Optimized Rate (SOR) of 48.9% on in-domain tasks and 39.5% on out-of-domain tasks while preserving scaffold similarity. On S$^2$-Bench MolOpt, it also achieves the strongest results across LogP, MR, and QED optimization tasks. These results suggest that C-Moral is an effective way to align molecular LLMs with continuous and constrained molecular design objectives. Our code and models are publicly available at https://github.com/Rwigie/C-MORAL.

2604.19072 2026-05-28 cs.LG cs.AI stat.ML

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

S2MAM: 半监督元加性模型用于稳健估计和变量选择

Xuelin Zhang, Hong Chen, Yingjie Wang, Tieliang Gong, Bin Gu

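AI summary Proposes a semi-supervised meta additive model based on bilevel optimization that automatically identifies informative variables, updates the similarity matrix, and produces interpretable predictions, with theoretical guarantees on convergence and generalization bounds and experiments validating robustness and interpretability.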

Comments Accepted by ICML'2026 as Accept (regular)

AI Chinese abstract

基于流形正则化的半监督学习是一种经典的联合利用有标签和无标签数据进行学习的框架,其关键要求是未知边际分布的支持集具有黎曼流形的几何结构。通常,基于拉普拉斯-贝尔特拉米算子的流形正则化可以通过与整个训练数据及其对应的图拉普拉斯矩阵相关联的拉普拉斯正则化进行经验近似。然而,图拉普拉斯矩阵严重依赖于预先指定的相似度度量,并且在处理冗余或噪声输入变量时可能导致不适当的惩罚。为了解决上述问题,本文提出了一种新的半监督元加性模型(S$^2$MAM),该模型基于双层优化方案,能够自动识别信息变量、更新相似矩阵,并同时实现可解释的预测。为S$^2$MAM提供了理论保证,包括计算收敛性和统计泛化界。在4个合成数据集和12个真实世界数据集上进行的实验评估,涵盖了不同级别和类型的污染,验证了所提方法的鲁棒性和可解释性。

English abstract

Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data and its corresponding graph Laplacian matrix. However, the graph Laplacian matrix depends heavily on the prespecified similarity metric and may lead to inappropriate penalties when dealing with redundant or noisy input variables. To address these issues, this paper proposes a new Semi-Supervised Meta Additive Model (S$^2$MAM) based on a bilevel optimization scheme that automatically identifies informative variables, updates the similarity matrix, and simultaneously achieves interpretable predictions. Theoretical guarantees are provided for S$^2$MAM, including computational convergence and a statistical generalization bound. Experimental assessments across 4 synthetic and 12 real-world datasets, with varying levels and categories of corruption, validate the robustness and interpretability of the proposed approach.

2604.21534 2026-05-28 cs.CL

UKP_Psycontrol at SemEval-2026 Task 2: Modeling Valence and Arousal Dynamics from Text

UKP_Psycontrol 在 SemEval-2026 任务 2:从文本建模效价和唤醒动态

Darya Hryhoryeva, Amaia Zurinaga, Hamidreza Jamalabadi, Iryna Gurevych

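AI summary For SemEval-2026 Task 2, proposes three complementary approaches (LLM prompting, a pairwise maximum entropy model, and a lightweight neural regression model) to model current affect and short-term affective change in text; finds that LLMs capture static affective signals well whereas short-term change depends more on numeric trajectories; the system ranked first in Subtask 1 and Subtask 2A.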

Comments Accepted to SemEval 2026 (co-located with ACL 2026)

AI Chinese abstract

本文介绍了我们为 SemEval-2026 任务 2 开发的系统。该任务要求对按时间顺序排列的用户生成文本中的当前情感和短期情感变化进行建模。我们探索了三种互补的方法:(1)在用户感知和用户无关设置下的 LLM 提示,(2)具有 Ising 式交互的成对最大熵(MaxEnt)模型用于结构化转换建模,以及(3)结合近期情感轨迹和可训练用户嵌入的轻量级神经回归模型。我们的发现表明,LLM 能有效捕捉文本中的静态情感信号,而该数据集中短期情感变化更多地由近期数值状态轨迹解释,而非文本语义。根据官方评估指标,我们的系统在子任务 1 和子任务 2A 中均排名第一。

English abstract

This paper presents our system developed for SemEval-2026 Task 2. The task requires modeling both current affect and short-term affective change in chronologically ordered user-generated texts. We explore three complementary approaches: (1) LLM prompting under user-aware and user-agnostic settings, (2) a pairwise Maximum Entropy (MaxEnt) model with Ising-style interactions for structured transition modeling, and (3) a lightweight neural regression model incorporating recent affective trajectories and trainable user embeddings. Our findings indicate that LLMs effectively capture static affective signals from text, whereas short-term affective variation in this dataset is more strongly explained by recent numeric state trajectories than by textual semantics. Our system ranked first among participating teams in both Subtask 1 and Subtask 2A based on the official evaluation metric.

2604.20996 2026-05-28 cs.CL

AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models

AFRILANGTUTOR:利用大语言模型推进低资源语言的语言辅导与文化教育

Tadesse Destaw Belay, Shahriar Kabir Nahin, Israel Abebe Azime, Ocean Monjur, Marek Rei, Chris Biemann, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam, Anshuman Chhabra

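AI summary To address the lack of training data for low-resource languages, introduces the AFRILANGDICT dictionary resource and builds the AFRILANGEDU dataset, then trains the AFRILANGTUTOR models with supervised fine-tuning and direct preference optimization, markedly improving tutoring performance across 10 African languages.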

AI Chinese abstract

如何为缺乏足够训练资源的语言开发语言学习系统?这一挑战日益被非洲大陆的开发者所面临,他们旨在构建能够理解并用当地语言回应的AI系统。为弥补这一差距,我们引入AFRILANGDICT,一个包含19.47万条非洲语言-英语词典条目的集合,作为生成语言学习材料的种子资源,使我们能够自动构建大规模、多样且可验证的学生-导师问答交互,适用于训练AI辅助语言导师。利用AFRILANGDICT,我们构建了AFRILANGEDU,一个包含7.89万个多轮训练示例的数据集,用于监督微调(SFT)和直接偏好优化(DPO)。使用AFRILANGEDU,我们训练了统称为AFRILANGTUTOR的语言辅导模型。我们在AFRILANGEDU上对两个多语言LLM:Llama-3-8B-IT和Gemma-3-12B-IT进行了微调,覆盖10种非洲语言,并评估了它们的性能。结果表明,在AFRILANGEDU上训练的模型始终优于其基础版本,且结合SFT和DPO带来了显著改进,在LLM作为评判者的评估中,四项指标的提升范围从1.8%到15.5%。为促进低资源语言的进一步研究,所有资源均可在https://huggingface.co/afrilang-edu获取。

English abstract

How can language learning systems be developed for languages that lack sufficient training resources? This challenge is increasingly faced by developers across the African continent who aim to build AI systems capable of understanding and responding in local languages. To address this gap, we introduce AFRILANGDICT, a collection of 194.7K African language-English dictionary entries designed as seed resources for generating language-learning materials, enabling us to automatically construct large-scale, diverse, and verifiable student-tutor question-answer interactions suitable for training AI-assisted language tutors. Using AFRILANGDICT, we build AFRILANGEDU, a dataset of 78.9K multi-turn training examples for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Using AFRILANGEDU, we train language tutoring models collectively referred to as AFRILANGTUTOR. We fine-tune two multilingual LLMs: Llama-3-8B-IT and Gemma-3-12B-IT on AFRILANGEDU across 10 African languages and evaluate their performance. Our results show that models trained on AFRILANGEDU consistently outperform their base counterparts, and combining SFT and DPO yields substantial improvements, with gains ranging from 1.8% to 15.5% under LLM-as-a-judge evaluations across four criteria. To facilitate further research on low-resource languages, all resources are available at https://huggingface.co/afrilang-edu.

2604.05673 2026-05-28 cs.RO cs.AI

Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

整流薛定谔桥匹配用于少步视觉导航

Wuyang Luan, Junhui Li, Weiguang Zhao, Wenjian Zhang, Tieru Wu, Rui Ma

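AI summary Proposes the Rectified Schrödinger Bridge Matching (RSBM) framework, which exploits velocity-field structure invariance and linear variance reduction to obtain a high-fidelity generative policy in only 3 integration steps, meeting the low-latency demands of Embodied AI.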

Comments 18 pages, 7 figures, 10 tables. Code available at https://github.com/WuyangLuan/RSBM

AI Chinese abstract

视觉导航是具身AI中的核心挑战,要求自主智能体将高维感官观测转化为连续的、长视界动作轨迹。基于扩散模型和薛定谔桥(SB)的生成策略能有效捕捉多模态动作分布,但由于高方差随机传输,需要数十个积分步骤,这对实时机器人控制构成了关键障碍。我们提出整流薛定谔桥匹配(RSBM),该框架利用标准薛定谔桥(ε=1,最大熵传输)与确定性最优传输(ε→0,如条件流匹配)之间共享的速度场结构,由单一熵正则化参数ε控制。我们证明两个关键结果:(1)条件速度场的函数形式在整个ε谱上保持不变(速度结构不变性),使单一网络能够服务于所有正则化强度;(2)减小ε线性降低条件速度方差,实现更稳定的粗步ODE积分。基于缩短传输距离的学习条件先验,RSBM在中间ε下运行,平衡多模态覆盖和路径直线性。实验表明,标准桥需要≥10步才能收敛,而RSBM在仅3个积分步骤中实现了超过94%的余弦相似度和92%的成功率——无需蒸馏或多阶段训练——显著缩小了高保真生成策略与具身AI低延迟需求之间的差距。

English abstract

Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schrödinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, posing a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits a shared velocity-field structure between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), controlled by a single entropic regularization parameter $\varepsilon$. We prove two key results: (1) the conditional velocity field's functional form is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), enabling a single network to serve all regularization strengths; and (2) reducing $\varepsilon$ linearly decreases the conditional velocity variance, enabling more stable coarse-step ODE integration. Anchored to a learned conditional prior that shortens transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage and path straightness. Empirically, while standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and 92% success rate in merely 3 integration steps -- without distillation or multi-stage training -- substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
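
The link the abstract draws between path straightness and coarse-step ODE integration can be seen in a toy setting. The sketch below integrates two hand-written velocity fields with fixed-step Euler; both fields, the endpoints `START` and `TARGET`, and the helper names are assumptions made for illustration and are unrelated to the learned RSBM velocity field.

```python
import math

START = (1.0, 0.0)
TARGET = (0.0, 1.0)

def euler_integrate(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def straight_velocity(x, t):
    # Constant velocity of the straight path x_t = (1 - t) * START + t * TARGET.
    return [TARGET[0] - START[0], TARGET[1] - START[1]]

def curved_velocity(x, t):
    # Quarter-circle rotation whose exact flow also carries START to TARGET at t = 1.
    omega = math.pi / 2
    return [-omega * x[1], omega * x[0]]

def endpoint_error(velocity, n_steps):
    """Distance between the Euler endpoint and TARGET."""
    return math.dist(euler_integrate(velocity, START, n_steps), TARGET)
```

With the straight field, three Euler steps land on the target up to rounding, while the curved field needs many more steps for comparable accuracy; this is the intuition behind preferring straighter transport paths when only a few integration steps are affordable.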

2604.13583 2026-05-28 cs.CL cs.AI

BenGER Platform: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks

BenGER平台:面向德国法律任务端到端基准测试的协作式Web平台

Sebastian Nagl, Matthias Grabmair

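AI summary Presents BenGER, an open-source web platform that integrates task creation, collaborative annotation, configurable LLM runs, and multi-dimensional evaluation, supports multi-organization projects with tenant isolation, and makes legal-reasoning benchmarking transparent and reproducible end to end.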

Comments Preprint - Accepted at ICAIL 2026

AI Chinese abstract

评估大语言模型(LLM)的法律推理能力需要涵盖任务设计、专家标注、模型执行和基于指标的评估的工作流。在实践中,这些步骤分散在不同的平台和脚本中,限制了透明度、可复现性以及非技术法律专家的参与。我们提出了BenGER(德国法律基准测试)框架,这是一个开源Web平台,集成了任务创建、协作标注、可配置的LLM运行以及基于词汇、语义、事实和法官指标的评估。BenGER支持具有租户隔离和基于角色的访问控制的多组织项目,并可选择性地为标注者提供形成性的、基于参考的反馈。我们将展示一个实时部署,演示端到端的基准测试创建和分析。

English abstract

Evaluating large language models (LLMs) for legal reasoning requires workflows that span task design, expert annotation, model execution, and metric-based evaluation. In practice, these steps are split across platforms and scripts, limiting transparency, reproducibility, and participation by non-technical legal experts. We present the BenGER (Benchmark for German Law) framework, an open-source web platform that integrates task creation, collaborative annotation, configurable LLM runs, and evaluation with lexical, semantic, factual, and judge-based metrics. BenGER supports multi-organization projects with tenant isolation and role-based access control, and can optionally provide formative, reference-grounded feedback to annotators. We will demonstrate a live deployment showing end-to-end benchmark creation and analysis.

2604.19669 2026-05-28 cs.LG

HardNet++: Nonlinear Constraint Enforcement in Neural Networks

HardNet++: 神经网络中的非线性约束强制执行

Andrea Goertzen, Kaveh Alim, Youngjae Min, Navid Azizan

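AI summary Proposes a method that enforces linear and nonlinear equality and inequality constraints by iteratively adjusting the network output via damped local linearizations, proves that arbitrary tolerance is achievable under regularity conditions, and applies it to a nonlinear model predictive control problem with tight constraint satisfaction and no loss of optimality.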

AI Chinese abstract

在许多控制和决策应用中,强制执行神经网络输出的约束满足对于安全性、可靠性和物理保真度至关重要。软约束方法在训练期间惩罚违反约束的行为,但不能保证推理期间的约束遵守。其他方法通过投影层保证约束满足,但通常依赖于可行集上存在可处理的投影,限制了它们在更一般问题设置中的实用性。许多感兴趣的现实世界问题是非线性的,缺乏允许可处理投影的特殊结构,这促使开发能够强制执行一般非线性约束的方法。为此,我们引入了HardNet++,一种强制执行线性和非线性等式与不等式约束的约束满足方法。我们的方法通过约束的阻尼局部线性化迭代调整网络输出。每次迭代都是可微的,允许端到端训练框架,其中约束满足层在训练期间处于活动状态。我们证明,在一定的正则条件下,该过程可以强制执行非线性约束满足到任意容差。最后,我们在学习优化背景下展示了紧约束满足而不损失最优性,并将该方法应用于非线性模型预测控制问题。

English abstract

Enforcing constraint satisfaction in neural network outputs is critical for safety, reliability, and physical fidelity in many control and decision-making applications. While soft-constrained methods penalize constraint violations during training, they do not guarantee constraint adherence during inference. Other approaches guarantee constraint satisfaction via a projection layer, but often rely on the existence of a tractable projection onto the feasible set, limiting their utility in more general problem settings. Many real-world problems of interest are nonlinear and lack the special structure admitting a tractable projection, motivating the development of methods that can enforce general nonlinear constraints. To this end, we introduce HardNet++, a constraint-satisfaction method that enforces linear and nonlinear equality and inequality constraints. Our approach iteratively adjusts the network output via damped local linearizations of the constraints. Each iteration is differentiable, admitting an end-to-end training framework, where the constraint satisfaction layer is active during training. We show that under certain regularity conditions, this procedure enforces nonlinear constraint satisfaction to arbitrary tolerance. Finally, we demonstrate tight constraint adherence without loss of optimality in a learning-for-optimization context, where we apply this method to a nonlinear model predictive control problem.
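
To make "damped local linearizations" concrete, here is a minimal sketch for a single scalar equality constraint h(y) = 0. Each iteration takes the minimum-norm step that would satisfy the linearized constraint and scales it by a damping factor. This is a generic textbook-style correction written under stated assumptions, not the HardNet++ update itself: the function names, the damping value, the stopping rule, and the restriction to one equality constraint are all illustrative, and inequality constraints are not handled.

```python
def enforce_equality(y, h, grad_h, damping=0.5, tol=1e-8, max_iters=200):
    """Iteratively move y toward the set h(y) = 0 using damped linearized steps."""
    y = list(y)
    for _ in range(max_iters):
        residual = h(y)
        if abs(residual) <= tol:
            break
        g = grad_h(y)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:
            raise ValueError("zero gradient: the linearization is degenerate")
        # Minimum-norm solution of h(y) + g . delta = 0, scaled by the damping factor.
        step = damping * residual / g_norm_sq
        y = [yi - step * gi for yi, gi in zip(y, g)]
    return y

def unit_circle(y):
    # Example nonlinear constraint: the point must lie on the unit circle.
    return y[0] ** 2 + y[1] ** 2 - 1.0

def unit_circle_grad(y):
    return [2.0 * y[0], 2.0 * y[1]]
```

For example, `enforce_equality([2.0, 1.0], unit_circle, unit_circle_grad)` pulls the point radially onto the circle. Every operation in the loop is a smooth arithmetic step, which is the property that would allow such a layer to stay active during end-to-end training in an autodiff framework.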

2604.19355 2026-05-28 cs.LG cs.AI cs.CE

LASER: Learning Active Sensing for Continuum Field Reconstruction

LASER: 用于连续场重建的学习主动感知

Huayu Deng, Jinghui Zhong, Xiangming Zhu, Yunbo Wang, Xiaokang Yang

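AI summary Proposes the LASER framework, which formulates active sensing as a partially observable Markov decision process and uses a continuum field latent world model with a reinforcement learning policy that simulates sensing scenarios in a latent imagination space, achieving high-fidelity reconstruction under sparse constraints.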

Comments Accepted by ICML 2026 (Oral)

AI Chinese abstract

连续物理场的高保真测量对于科学发现和工程设计至关重要,但在稀疏和受限感知条件下仍然具有挑战性。传统的重建方法通常依赖于固定的传感器布局,无法适应演变的物理状态。我们提出LASER,一个统一的闭环框架,将主动感知建模为部分可观测马尔可夫决策过程(POMDP)。其核心是采用连续场潜在世界模型,捕捉底层物理动力学并提供内在奖励反馈。这使得强化学习策略能够在潜在想象空间中模拟“假设”感知场景。通过根据预测的潜在状态调整传感器移动,LASER能够导航到当前观测之外可能的高信息区域。我们的实验表明,LASER在多种连续场中始终优于静态和离线优化策略,在稀疏条件下实现高保真重建。

English abstract

High-fidelity measurements of continuum physical fields are essential for scientific discovery and engineering design but remain challenging under sparse and constrained sensing. Conventional reconstruction methods typically rely on fixed sensor layouts, which cannot adapt to evolving physical states. We propose LASER, a unified, closed-loop framework that formulates active sensing as a Partially Observable Markov Decision Process (POMDP). At its core, LASER employs a continuum field latent world model that captures the underlying physical dynamics and provides intrinsic reward feedback. This enables a reinforcement learning policy to simulate "what-if" sensing scenarios within a latent imagination space. By conditioning sensor movements on predicted latent states, LASER navigates toward potentially high-information regions beyond current observations. Our experiments demonstrate that LASER consistently outperforms static and offline-optimized strategies, achieving high-fidelity reconstruction under sparsity across diverse continuum fields.

2604.18758 2026-05-28 cs.CL

Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation

句法作为罗塞塔石碑:用于上下文科普特语翻译的通用依存关系

Abhishek Purushothama, Emma Thronson, Alexia Guo, Amir Zeldes

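AI summary Proposes an in-context learning approach that combines Universal Dependencies parses with bilingual dictionaries for low-resource Coptic-to-English machine translation, achieving new state-of-the-art results.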

Comments ACL 2026 Findings camera-ready, with fixes

AI Chinese abstract

低资源机器翻译需要不同于高资源语言的方法。本文提出了一种新颖的上下文学习方法,通过输入句子的通用依存句法分析来增强句法信息,以支持科普特语到英语的低资源机器翻译。在已有使用双语词典支持词汇项推理的工作基础上,我们在输入中添加了多种句法分析表示,具体探索了包含原始解析器输出、用简单英语表达的解析结果,以及针对子树中识别出的困难结构的定向指令及其翻译方法。结果表明,虽然单独的句法信息不如基于词典的注释有用,但将检索到的词典项与句法信息相结合,在不同模型规模上均取得了显著提升,为科普特语翻译实现了新的最佳结果。

English abstract

Low-resource machine translation requires methods that differ from those used for high-resource languages. This paper proposes a novel in-context learning approach to support low-resource machine translation from Coptic to English, with syntactic augmentation from Universal Dependencies parses of input sentences. Building on existing work using bilingual dictionaries to support inference for vocabulary items, we add several representations of syntactic analyses to our inputs, specifically exploring the inclusion of raw parser outputs, verbalizations of parses in plain English, and targeted instructions on difficult constructions identified in sub-trees and how they can be translated. Our results show that while syntactic information alone is not as useful as dictionary-based glosses, combining retrieved dictionary items with syntactic information yields significant gains across model sizes and sets a new state of the art for Coptic translation.

2601.11632 2026-05-28 cs.CV

KG-ViP: Bridging Knowledge Grounding and Visual Perception in Multi-modal LLMs for Visual Question Answering

KG-ViP:在多模态大语言模型中桥接知识基础与视觉感知以进行视觉问答

Zhiyang Li, Ao Ke, Yukun Cao, Xike Xie

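AI summary Proposes the KG-ViP framework, which retrieves and fuses scene graphs and commonsense graphs to unify external knowledge with fine-grained visual details, mitigating knowledge hallucination and insufficient visual perception in multi-modal LLMs for visual question answering.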

AI Chinese abstract

用于视觉问答(VQA)的多模态大语言模型(MLLMs)通常面临双重限制:知识幻觉和细粒度视觉感知不足。关键的是,我们发现常识图和场景图通过提供丰富的外部知识和捕捉细粒度视觉细节,恰好为这些缺陷提供了互补的解决方案。然而,先前的工作通常孤立地处理它们,忽视了它们的协同潜力。为了弥合这一差距,我们提出了KG-ViP,一个统一的框架,通过融合场景图和常识图来增强MLLMs。KG-ViP框架的核心是一个新颖的检索与融合流程,利用查询作为语义桥逐步整合两种图,合成统一的结构化上下文,促进可靠的多模态推理。在FVQA 2.0+和MVQA基准上的大量实验表明,KG-ViP显著优于现有的VQA方法。

English abstract

Multi-modal Large Language Models (MLLMs) for Visual Question Answering (VQA) often suffer from two limitations: knowledge hallucination and insufficient fine-grained visual perception. Crucially, we identify that commonsense graphs and scene graphs offer precisely complementary remedies for these respective deficiencies, the former supplying rich external knowledge and the latter capturing fine-grained visual details. However, prior works typically treat them in isolation, overlooking their synergistic potential. To bridge this gap, we propose KG-ViP, a unified framework that empowers MLLMs by fusing scene graphs and commonsense graphs. The core of the KG-ViP framework is a novel retrieval-and-fusion pipeline that utilizes the query as a semantic bridge to progressively integrate both graphs, synthesizing a unified structured context that facilitates reliable multi-modal reasoning. Extensive experiments on FVQA 2.0+ and MVQA benchmarks demonstrate that KG-ViP significantly outperforms existing VQA methods.

2604.18530 2026-05-28 cs.AI

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

OGER:一种用于混合强化学习的鲁棒离线引导探索奖励

Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong, Min Zhang

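AI summary Proposes the OGER framework, which unifies offline teacher guidance and online reinforcement learning through multi-teacher collaborative training and an entropy-based auxiliary exploration reward, improving the exploration ability of large language models on mathematical reasoning and generalization tasks.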

AI Chinese abstract

近年来,具有可验证奖励的强化学习(RLVR)的进展显著提升了大型语言模型(LLM)的推理能力,但模型在探索超出其初始策略分布的新轨迹方面仍存在困难。尽管已提出离线教师引导和基于熵的策略来解决这一问题,但它们往往缺乏深度融合或受限于模型自身能力。在本文中,我们提出OGER(离线引导探索奖励),一种新颖的框架,通过专门的奖励建模视角统一离线教师引导和在线强化学习。OGER采用多教师协作训练,并构建一个辅助探索奖励,利用离线轨迹和模型自身的熵来激励自主探索。在数学和通用推理基准上的大量实验表明,OGER持续优于竞争基线,在数学推理上取得显著提升,同时保持对域外任务的鲁棒泛化。我们提供了训练动态的全面分析,并进行了详细的消融研究,以验证我们基于熵的奖励调制的有效性。我们的代码可在 https://github.com/ecoli-hit/OGER.git 获取。

English abstract

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial policy distribution. While offline teacher guidance and entropy-driven strategies have been proposed to address this, they often lack deep integration or are constrained by the model's inherent capacity. In this paper, we propose OGER (Offline-Guided Exploration Reward), a novel framework that unifies offline teacher guidance and online reinforcement learning through a specialized reward modeling lens. OGER employs multi-teacher collaborative training and constructs an auxiliary exploration reward that leverages both offline trajectories and the model's own entropy to incentivize autonomous exploration. Extensive experiments across mathematical and general reasoning benchmarks demonstrate that OGER consistently outperforms competitive baselines, achieving substantial gains in mathematical reasoning while maintaining robust generalization to out-of-domain tasks. We provide a comprehensive analysis of training dynamics and conduct detailed ablation studies to validate the effectiveness of our entropy-aware reward modulation. Our code is available at https://github.com/ecoli-hit/OGER.git.

2604.18235 2026-05-28 cs.CL cs.AI

Negative Advantages Is a Double-Edged Sword: Calibrating advantages in GRPO for Search Agents

负优势是一把双刃剑:为搜索智能体校准GRPO中的优势

Jiayi Wu, Ruobing Xie, Zeqian Huang, Lei Jiang, Can Xu, Kangyang Luo, Bochen Lin, Ming Gao, Xiang Li

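AI summary: Targets the training instability that GRPO shows in multi-hop search, which stems from coarse-grained advantage assignment and an imbalance between positive and negative advantages. Proposes CalibAdv, which downscales excessive negative advantages at a fine-grained level and rebalances positive and negative advantages to improve model performance and training stability.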

Details
AI Chinese abstract

搜索智能体通过与搜索引擎的多轮交互实现强大的问答性能,其中组相对策略优化(GRPO)是一种广泛使用的训练算法。然而,GRPO风格的算法在多跳搜索场景中仍面临若干挑战。首先,当最终答案错误时,正确的中间步骤常常受到惩罚。其次,训练高度不稳定,经常导致自然语言能力退化甚至灾难性训练崩溃。我们的分析将这些问题归因于粗粒度的优势分配以及正负优势之间的不平衡。为了解决这些问题,我们提出了CalibAdv,一种专门为搜索智能体设计的优势校准方法,能够更准确、更稳定地对惩罚和奖励进行建模。具体来说,CalibAdv利用中间步骤的正确性在细粒度上降低过度的负优势,然后进一步重新平衡正负优势以提高训练稳定性。重要的是,CalibAdv采用轻量级设计,从标准 rollout 信号中校准优势,使其简单且易于部署。在三个模型和七个基准上的大量实验表明,CalibAdv同时提升了模型性能和训练稳定性。我们的代码可在 https://github.com/wujwyi/CalibAdv 获取。

English abstract

Search agents achieve strong question-answering performance through multi-turn interactions with search engines, with Group Relative Policy Optimization (GRPO) being a widely used training algorithm. However, GRPO-style algorithms still face several challenges in multi-hop search settings. First, correct intermediate steps are often penalized when the final answer is wrong. Second, training is highly unstable, often causing degradation of natural language ability or even catastrophic training collapse. Our analysis attributes these issues to coarse-grained advantage assignment and an imbalance between positive and negative advantages. To address these problems, we propose CalibAdv, an advantage calibration method specifically designed for search agents that enables more accurate and more stable modeling of penalties and rewards. Specifically, CalibAdv leverages the correctness of intermediate steps to downscale excessive negative advantages at a fine-grained level. It then further rebalances positive and negative advantages to improve training stability. Importantly, CalibAdv adopts a lightweight design that calibrates advantages from standard rollout signals, making it simple and easy to deploy. Extensive experiments across three models and seven benchmarks demonstrate that CalibAdv improves both model performance and training stability. Our code is available at https://github.com/wujwyi/CalibAdv.
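As a rough illustration of the two steps this abstract describes, the sketch below first computes group-relative advantages, then shrinks the penalty on intermediate steps marked correct inside failed rollouts, and finally rescales negative advantages so that their total magnitude does not exceed the positive total. The function names, the `shrink` factor, the per-step correctness flags, and the rebalancing rule are all assumptions made for illustration; the abstract does not give CalibAdv's actual formulas.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: reward minus group mean, over group std.

    The population standard deviation is used here for simplicity.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def calibrate(rewards, step_correct, shrink=0.2):
    """Hypothetical calibration: one advantage value per step of each rollout.

    rewards[i]       final reward of rollout i
    step_correct[i]  list of booleans, one per intermediate step of rollout i
    shrink           assumed factor applied to correct steps in failed rollouts
    """
    adv = group_advantages(rewards)
    per_step = []
    for a, flags in zip(adv, step_correct):
        # Correct steps inside a negatively scored rollout get a reduced penalty.
        per_step.append([a * shrink if (a < 0 and ok) else a for ok in flags])

    # Rebalance: cap the total negative mass at the total positive mass.
    pos = sum(v for row in per_step for v in row if v > 0)
    neg = -sum(v for row in per_step for v in row if v < 0)
    if neg > pos > 0:
        scale = pos / neg
        per_step = [[v * scale if v < 0 else v for v in row] for row in per_step]
    return per_step


# One successful rollout and three failed ones, two steps each.
out = calibrate(
    [1.0, 0.0, 0.0, 0.0],
    [[True, True], [True, False], [False, False], [True, True]],
)
```

In the second rollout the correct first step ends up with a smaller penalty than the incorrect second step, which is the behaviour the abstract motivates: a wrong final answer should not punish correct intermediate work as heavily.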

2604.18227 2026-05-28 cs.LG

FSEVAL: Feature Selection Evaluation Toolbox and Dashboard

FSEVAL:特征选择评估工具箱与仪表盘

Muhammad Rajabinasab, Arthur Zimek

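AI summary: Proposes FSEVAL, a toolbox with a visualization dashboard for standardized, unified evaluation and visualization of feature selection algorithms.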

Details
AI Chinese abstract

特征选择是一项基本的机器学习和数据挖掘任务,涉及从信息特征中区分冗余特征。它试图通过去除冗余特征来解决维数灾难,同时与降维方法不同,保持可解释性。特征选择在有监督和无监督设置下进行,采用不同的评估指标来确定哪个特征选择算法最佳。在本文中,我们提出了FSEVAL,一个带有可视化仪表盘的特征选择评估工具箱,旨在轻松全面地评估特征选择算法。FSEVAL旨在提供一个标准化、统一的评估和可视化工具箱,帮助该领域的研究人员轻松地对特征选择算法进行广泛而全面的评估。

English abstract

Feature selection is a fundamental machine learning and data mining task that involves distinguishing informative features from redundant ones. It addresses the curse of dimensionality by removing redundant features while, unlike dimensionality reduction methods, preserving explainability. Feature selection is conducted in both supervised and unsupervised settings, with different evaluation metrics employed to determine which feature selection algorithm is best. In this paper, we propose FSEVAL, a feature selection evaluation toolbox accompanied by a visualization dashboard, with the goal of making it easy to evaluate feature selection algorithms comprehensively. FSEVAL aims to provide a standardized, unified evaluation and visualization toolbox that helps researchers in the field conduct extensive and comprehensive evaluations of feature selection algorithms with ease.

2604.17943 2026-05-28 cs.CL

A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents

专业领域基准构建与评估框架:以国防相关文档为例

Bao Gia Doan, Aditya Joshi, Pantelis Elinas, Aarya Bodhankar, Oscar Leslie, Tom Marchant, Flora Salim

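AI summary: Proposes the DoRA framework, which addresses the cold-start problem of RAG question answering in specialist domains through synthetic data generation and a dual-LLM pipeline, substantially reducing hallucination and improving coverage and faithfulness on defense documents.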

Details
AI Chinese abstract

基于RAG的专业领域问答面临冷启动问题:缺乏评估基准和用于后训练的标注数据。我们提出DoRA(面向领域的RAG评估),一个仅使用少量专业领域文档的新型基准构建与评估框架。DoRA系统地生成合成QA训练和评估数据集,并跨五个领域特定意图提供可审计的证据。为缓解同流水线循环,DoRA的训练和测试拆分使用不同的LLM家族(训练用Claude Sonnet;测试用GPT-4o),这些数据来自不相交的种子文档语料库。在40份国防相关文档(英文)上实例化后,DoRA产生约6600个精心整理的实例。与8个LLM基线在1259个样本的基准上比较,基于合成训练集微调的LoRA适配Llama3.1-8B在6个覆盖率和忠实度指标上持续提升性能,尤其在默认GTE检索设置下将幻觉减少一半以上,且增益在替代检索器和基于提示的基线下依然保持。国防领域专业知识在评估的三个阶段被纳入:(a) 判断DoRA生成的合成QA质量,(b) 确定LLM作为评判者的分数可靠性,(c) 评估QA流水线在完全人工编写的QA示例上的泛化能力。我们将DoRA定位为领域迁移下专业领域RAG的实用框架,并以国防作为高风险的案例研究。

English abstract

RAG-based question-answering (QA) in specialist domains faces a cold-start problem: lack of evaluative benchmarks and absence of labeled data for post-training. We present DoRA (Domain-oriented RAG Assessment), a novel benchmark construction and evaluation framework using only a small set of specialist domain documents. DoRA systematically generates synthetic QA training and evaluation datasets with auditable evidence across five domain-specific intents. To mitigate same-pipeline circularity, DoRA's training and test splits use different LLM families (Claude Sonnet for training; GPT-4o for test) drawn from disjoint seed-document corpora. Instantiated on 40 defense-related documents (written in English), DoRA yields ~6.6K curated instances. Compared against 8 LLM baselines over a benchmark of 1,259 samples, a LoRA-adapted Llama3.1-8B trained on the synthetic training set consistently improves performance over 6 coverage and faithfulness metrics, especially reducing hallucination by more than half under the default GTE retrieval setting, with gains persisting across alternative retrievers and prompting-based baselines. Defense-domain expertise is incorporated in three stages of our evaluation: (a) determining the quality of the synthetic QA generated by DoRA, (b) ascertaining the reliability of LLM-as-judge scores, and (c) evaluating the generalization of the QA pipeline on completely human-written QA examples. We position DoRA as a practical framework for specialist-domain RAG under domain shift, with defense as a high-stakes case study.

2604.17110 2026-05-28 cs.CV

From Clinical Intent to Clinical Model: Autonomous Coding-Agents for Clinician-driven AI Development

从临床意图到临床模型:面向临床医生驱动AI开发的自主编码代理

Zihao Zhao, Frederik Hauke, Juliana De Castilhos, Mathis Bode, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn

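AI summary: Proposes an autonomous coding-agent system in which clinicians describe a task in natural language and the system automatically generates and iteratively refines a model. It reaches competitive performance on five clinical tasks and markedly reduces a chest-radiograph model's reliance on chest drains.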

Comments Code is available at https://github.com/zhaozh10/clinical-automata/

Details
AI Chinese abstract

开发在临床实践中有用的AI模型需要临床医生和AI开发者之间的高效协作。这带来了一个实际挑战:临床医生必须反复与AI开发者沟通并完善其需求,然后这些需求才能转化为可执行的模型开发。这种迭代过程耗时,即使经过反复讨论,由于双方未能完全共享专业知识,仍可能存在不一致。编码代理可能有助于弥合这一差距。它们可以自主编写和优化代码,并具备医学和AI的工作知识,以理解医学专家和开发者制定的命令。我们提出了一个原型,让临床医生直接驱动AI开发。临床医生用自然语言描述任务,系统将描述转化为可工作的流程,通过与临床医生一起反复实验进行优化,并返回满足既定临床目标的模型。在五项临床任务中,该系统可靠地生成了与临床医生请求匹配且达到竞争性能的模型。最值得注意的是,在胸部X光片上,该系统显著减少了模型对胸腔引流管的依赖(气胸分类的已知捷径),在一个数据集上从60%降至31%,在另一个数据集上从50%降至18%。我们的结果表明,编码代理可以将临床AI开发转向更以临床医生驱动的模式,使领域专家能够直接塑造模型,而不是通过专门的AI团队传递需求。

English abstract

Developing AI models that are useful in clinical practice requires efficient collaboration between clinicians and AI developers. This poses a practical challenge: clinicians must repeatedly communicate and refine their requirements with AI developers before those requirements can be translated into executable model development. This iterative process is time-consuming, and even after repeated discussion, misalignment may still exist because the two sides do not fully share each other's expertise. Coding agents may help close this gap. They can write and refine code on their own, and they carry working knowledge of both medicine and AI to understand commands formulated by both medical experts and developers. We present a prototype that lets clinicians drive AI development directly. A clinician describes the task in plain language, and the system turns the description into a working pipeline, refines it through repeated experiments together with the clinician, and returns a model that meets the stated clinical objective. Across five clinical tasks, the system reliably produced models that matched the clinician's request and reached competitive performance. Most notably, on chest radiographs the system sharply reduced the model's reliance on chest drains, a well-known shortcut for pneumothorax classification, from 60% to 31% on one dataset and from 50% to 18% on another. Our results suggest that coding agents can shift clinical AI development toward a more clinician-driven mode, allowing domain experts to shape models directly instead of relaying requirements through specialized AI teams.

2604.16774 2026-05-28 cs.CL cs.AI

Retention Consequence in Lifecycle Memory Control

生命周期记忆控制中的保留后果

Jiarui Han

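AI summary: Studies how persistent memory fails after admission, treats confidence as carried-forward validity/support evidence, and introduces strength as an explicit lifecycle state for retention consequence. Experiments with the StageMem controller validate the control role of explicit retention consequence in lifecycle settlement.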

Details
AI Chinese abstract

持久记忆在成功准入后可能失效:一个前提被写入,然后成为无声的假设,后续维护将其视为普通残留进行压缩、降级或驱逐。我们将这种准入后失效作为生命周期控制问题来研究。现有记忆系统已经执行准入、更新、压缩、检索和驱逐。我们的主张并非此类系统缺乏维护,而是保留后果通常仅通过有效性、相似性、新近性、频率、重要性或摘要信号间接操作,而非作为单独的生命周期状态暴露。因此,我们将置信度视为前向有效性/支持证据,并引入强度作为保留后果的显式生命周期状态。我们在StageMem中实现了这一区分,这是一个小型的分阶段控制器,其瞬态、工作态和持久态存储暴露了提升、压缩和驱逐压力点。在受控的前提实现、压缩、压力和隐式启发式诊断实验中,实验区分了写入过少、保留错误的高线索内容、遗忘代价高昂的前提以及通过饱和保留所有内容。通过生命周期结算使用的显式保留后果,提供了在遗漏和囤积之间的控制面。针对目标准入后失效模式,结果支持持久记忆的生命周期观点:可靠性不仅取决于进入记忆的内容,还取决于准入有效性和保留后果在维护期间是否可用。

English abstract

Persistent memory can fail after successful admission: a premise is written, then becomes a silent assumption, and later maintenance treats it as ordinary residue to be compressed, demoted, or evicted. We study this post-admission failure as a lifecycle-control problem. Existing memory systems already perform admission, update, compression, retrieval, and eviction. Our claim is not that such systems lack maintenance, but that retention consequence is often operationalized only indirectly through validity, similarity, recency, frequency, importance, or summarization signals rather than exposed as a separate lifecycle state. We therefore treat confidence as carried-forward validity/support evidence, and introduce strength as an explicit lifecycle state for retention consequence. We operationalize this distinction in StageMem, a small staged controller whose transient, working, and durable stores expose promotion, compression, and eviction pressure points. Across controlled premise-realization, compression, pressure, and implicit-heuristic diagnostics, the experiments separate writing too little, retaining the wrong high-cue content, forgetting costly premises, and preserving everything by saturation. Explicit retention consequence, used through lifecycle settlement, provides a control surface between omission and hoarding. For the targeted post-admission failure mode, the results support a lifecycle view of persistent memory: reliability depends not only on what enters memory, but on whether admission validity and retention consequence remain available during maintenance.

2604.16565 2026-05-28 cs.LG cs.AI

Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

流形上的推理:扩散语言模型中用于自我验证的双向一致性

Jiaoyang Ruan, Xin Gao, Yinda Chen, Hengyu Zeng, Liang Du, Guanghao Li, Jie Fu, Jian Pu

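AI summary: Proposes Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of a generated sequence through a forward-masking and backward-reconstruction cycle, for diagnosis, inference, and alignment of diffusion language models.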

Comments 31 pages, 7 figures. Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Camera-ready version

Details
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, PMLR 306, 2026
AI Chinese abstract

尽管扩散大语言模型(dLLMs)在全局规划方面具有结构优势,但高效验证它们是否通过有效的推理轨迹得出正确答案仍然是一个关键挑战。在这项工作中,我们提出了一种几何视角:流形上的推理。我们假设有效的生成轨迹作为学习分布的高密度流形上的稳定吸引子存在,而无效路径则表现出流形外漂移。为了实现这一点,我们引入了双向流形一致性(BMC),这是一种无训练、无监督的度量,通过前向掩码和后向重建循环量化生成序列的稳定性。实验上,我们展示了BMC在整个推理生命周期中的多功能性:(1)在诊断中,它作为无需真实答案的解决方案有效性的鲁棒判别器;(2)在推理中,它能够通过拒绝重采样有效集中计算资源于复杂推理任务;(3)在对齐中,它作为密集的几何奖励,将稀疏的结果监督转化为细粒度的指导,使模型能够超越标准基线自我进化。我们的结果确立了内在几何稳定性作为dLLMs正确性的鲁棒指标。

English abstract

While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geometric perspective: Reasoning on the Manifold. We hypothesize that valid generation trajectories reside as stable attractors on the high-density manifold of the learned distribution, whereas invalid paths exhibit off-manifold drift. To operationalize this, we introduce Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of the generated sequence through a forward-masking and backward-reconstruction cycle. Empirically, we demonstrate BMC's versatility across the full reasoning lifecycle: (1) in Diagnosis, it serves as a robust discriminator of solution validity without ground-truth answers; (2) in Inference, it enables rejection resampling to effectively concentrate computational resources on complex reasoning tasks; and (3) in Alignment, it functions as a dense geometric reward that transforms sparse outcome supervision into fine-grained guidance, empowering models to self-evolve beyond standard baselines. Our results establish intrinsic geometric stability as a robust indicator of correctness for dLLMs.
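The forward-masking and backward-reconstruction cycle can be pictured with a toy example. The sketch below hides a random subset of tokens, asks a reconstruction function to fill them back in, and reports how often the original tokens are recovered. The abstract does not define the BMC metric itself, so the agreement score, the `mask_ratio` and `rounds` parameters, and the `reconstruct` callable (a stand-in for a diffusion language model's denoising step) are all assumptions for illustration.

```python
import random


def mask_reconstruct_consistency(tokens, reconstruct, mask_ratio=0.3, rounds=8, seed=0):
    """Fraction of masked tokens that the reconstructor restores unchanged.

    tokens       the generated sequence to score
    reconstruct  hypothetical callable: takes a list with None at masked
                 positions and returns a fully filled list of the same length
    """
    if not tokens:
        raise ValueError("need a non-empty sequence")
    rng = random.Random(seed)
    n_mask = min(len(tokens), max(1, round(mask_ratio * len(tokens))))
    agree = total = 0
    for _ in range(rounds):
        positions = rng.sample(range(len(tokens)), n_mask)
        masked = list(tokens)
        for p in positions:
            masked[p] = None              # forward step: hide some tokens
        rebuilt = reconstruct(masked)     # backward step: fill them back in
        agree += sum(rebuilt[p] == tokens[p] for p in positions)
        total += n_mask
    return agree / total


def fill_even_numbers(masked):
    # Toy stand-in for a denoiser: it "believes" position i should hold 2 * i.
    return [2 * i if tok is None else tok for i, tok in enumerate(masked)]


consistent = [0, 2, 4, 6, 8, 10]   # agrees with the toy model everywhere
drifting = [0, 2, 5, 6, 9, 10]     # two positions disagree with the toy model

score_consistent = mask_reconstruct_consistency(consistent, fill_even_numbers)
score_drifting = mask_reconstruct_consistency(drifting, fill_even_numbers, mask_ratio=1.0)
```

A sequence the model would regenerate on its own scores 1.0, while a sequence containing tokens the model would not have produced scores lower. This mirrors the abstract's intuition that valid trajectories are stable under the cycle, without claiming to reproduce the paper's metric.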

2604.16358 2026-05-28 cs.LG cs.CL

SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

SaFeR-Steer:通过合成引导和反馈动力学进化多轮多模态大语言模型

Haolong Hu, Hanyu Li, Tiancheng He, Huahui Yi, An Zhang, Qiankun Li, Kun Wang, Yang Liu, Zhigang Zeng

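AI summary: Proposes the SaFeR-Steer framework, which trains a single student model with staged synthetic bootstrapping and tutor-in-the-loop GRPO, and introduces the Trajectory-Consistent Summative Reward (TCSR) to counter long-context safety decay in multi-turn safety alignment, substantially improving multi-turn safety and helpfulness.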

Details
AI Chinese abstract

多模态大语言模型(MLLMs)越来越多地部署在多轮场景中,攻击者可以通过不断演变的视觉-文本历史升级不安全意图,并利用长上下文安全衰减。然而,安全对齐仍然以单轮数据和固定模板对话为主,导致训练与部署之间存在不匹配。为弥补这一差距,我们提出SaFeR-Steer,一种渐进式多轮对齐框架,结合分阶段合成引导和导师参与的GRPO,在自适应、在线策略攻击下训练单个学生模型。我们还引入了轨迹一致总结奖励(TCSR),该奖励聚合了历史最小值和回合奖励的平均值,使得任何低质量回合都会影响轨迹级别的回报。I. 数据集。我们发布STEER,一个多轮多模态安全数据集,包含STEER-SFT(12,934)、STEER-RL(2,000)和STEER-Bench(3,227)对话,回合数为2-10。II. 实验。从Qwen2.5-VL-3B/7B开始,SaFeR-Steer在单轮基准(3B:48.30/45.86 → 81.84/70.77;7B:56.21/60.32 → 87.89/77.40)和多轮基准(3B:12.55/27.13 → 55.58/70.27;7B:24.66/46.48 → 64.89/72.35)上显著提高了安全性/有用性,将失败转移到后续回合,并产生了超越单纯扩展的鲁棒性。代码可在https://anonymous.4open.science/r/SaFeR-Steer获取。

English abstract

Multimodal large language models (MLLMs) are increasingly deployed in multi-turn settings, where attackers can escalate unsafe intent through the evolving visual-text history and exploit long-context safety decay. Yet safety alignment is still dominated by single-turn data and fixed-template dialogues, leaving a mismatch between training and deployment. To bridge this gap, we propose SaFeR-Steer, a progressive multi-turn alignment framework that combines staged synthetic bootstrapping with tutor-in-the-loop GRPO to train a single student under adaptive, on-policy attacks. We also introduce Trajectory-Consistent Summative Reward (TCSR), which aggregates the historical minimum and average of turn rewards so that any low-quality turn affects the trajectory-level return. I. Dataset. We release STEER, a multi-turn multimodal safety dataset with STEER-SFT (12,934), STEER-RL (2,000), and STEER-Bench (3,227) dialogues spanning 2-10 turns. II. Experiment. Starting from Qwen2.5-VL-3B/7B, SaFeR-Steer substantially improves Safety/Helpfulness on both single-turn (48.30/45.86 → 81.84/70.77 for 3B; 56.21/60.32 → 87.89/77.40 for 7B) and multi-turn benchmarks (12.55/27.13 → 55.58/70.27 for 3B; 24.66/46.48 → 64.89/72.35 for 7B), shifting failures to later turns and yielding robustness beyond scaling alone. Code is available at https://anonymous.4open.science/r/SaFeR-Steer.
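To see why combining the minimum with the average makes a single bad turn matter, consider the minimal sketch below. It blends the worst turn reward with the mean turn reward using a mixing weight. The function name `tcsr`, the weight `alpha`, and the linear blend are assumptions; the abstract states only that TCSR aggregates the historical minimum and the average of turn rewards.

```python
from statistics import mean


def tcsr(turn_rewards, alpha=0.5):
    """Blend the worst turn reward with the average turn reward.

    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    if not turn_rewards:
        raise ValueError("need at least one turn reward")
    return alpha * min(turn_rewards) + (1 - alpha) * mean(turn_rewards)


# Two four-turn dialogues with the same average reward (0.8).
steady = tcsr([0.8, 0.8, 0.8, 0.8])
one_bad_turn = tcsr([1.0, 1.0, 1.0, 0.2])
```

Both dialogues have the same mean, yet the one containing a single low-quality turn receives a lower trajectory-level return. A plain average would not distinguish the two.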

2604.15898 2026-05-28 cs.AI

Towards Rigorous Explainability by Feature Attribution

通过特征归因实现严格可解释性

Olivier Létoffé, Xuanxiang Huang, Joao Marques-Silva

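AI summary: Surveys ongoing work on using rigorous symbolic explainable-AI methods, in place of non-rigorous non-symbolic methods such as SHAP, to assign relative feature importance.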

Details
AI Chinese abstract

大约十年来,非符号化方法一直是解释复杂机器学习(ML)模型的首选。不幸的是,这些方法缺乏严格性,可能误导人类决策者。在ML的高风险应用中,缺乏严格性尤其成问题。一个典型的不严格性证明例子是在可解释人工智能(XAI)中采用Shapley值,工具SHAP就是一个普遍的例子。本文概述了当前使用严格的符号化XAI方法作为非严格非符号化方法替代方案的努力,具体用于分配相对特征重要性。

English abstract

For around a decade, non-symbolic methods have been the option of choice when explaining complex machine learning (ML) models. Unfortunately, such methods lack rigor and can mislead human decision-makers. In high-stakes uses of ML, the lack of rigor is especially problematic. One prime example of provable lack of rigor is the adoption of Shapley values in explainable artificial intelligence (XAI), with the tool SHAP being a ubiquitous example. This paper overviews the ongoing efforts towards using rigorous symbolic methods of XAI as an alternative to non-rigorous non-symbolic approaches, concretely for assigning relative feature importance.

2604.14585 2026-05-28 cs.AI cs.CL

Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

提示优化如同抛硬币:诊断其在复合AI系统中何时有效

Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He

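AI summary: Extensive experiments show that prompt optimization in compound AI systems is unreliable and helps only when the task has exploitable output structure; the paper also provides a two-stage diagnostic.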

Comments Accepted to the 1st Workshop on Combining Theory and Benchmarks, CTB@ICML 2026, Seoul, South Korea

Details
AI Chinese abstract

复合AI系统中的提示优化在统计上与抛硬币无异:在Claude Haiku 4.5上的72次优化运行(6种方法 × 4个任务 × 3次重复)中,49%的得分低于零样本;在Amazon Nova Lite上,失败率更高。然而,在一个任务上,所有六种方法相比零样本提升了高达+6.8分。是什么区分了成功与失败?我们通过18,000次网格评估和144次优化运行进行了调查,按照必须回答的顺序测试了TextGrad和DSPy等端到端优化工具背后的两个假设:(A) 智能体提示存在交互,需要联合优化而非独立优化;(B) 单个提示本身值得优化。交互效应从未显著(p > 0.52,所有F < 1.0),并且优化仅在任务具有可挖掘的输出结构时才有帮助:即模型可以生成但不会默认采用的格式。我们进一步给出了机制性解释:指令微调将输入措辞压缩成狭窄的输出分布,消除了联合优化所依赖的措辞敏感性。我们提供了一个两阶段诊断:一个80美元的ANOVA预测试用于智能体耦合,以及一个10分钟的头空间测试,用于预测优化是否值得,从而将抛硬币转变为知情决策。

English abstract

Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku 4.5 (6 methods × 4 tasks × 3 repeats), 49% score below zero-shot; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods improve over zero-shot by up to +6.8 points. What distinguishes success from failure? We investigate with 18,000 grid evaluations and 144 optimization runs, testing two assumptions behind end-to-end optimization tools like TextGrad and DSPy, in the order they must be answered: (A) agent prompts interact, requiring joint rather than independent optimization, and (B) individual prompts are worth optimizing at all. Interaction effects are never significant (p > 0.52, all F < 1.0), and optimization helps only when the task has exploitable output structure: a format the model can produce but does not default to. We further give a mechanistic account: instruction-tuning compresses input phrasing into a narrow output distribution, eliminating the very phrasing-sensitivity that joint optimization assumes. We provide a two-stage diagnostic: an $80 ANOVA pre-test for agent coupling, and a 10-minute headroom test that predicts whether optimization is worthwhile, turning a coin flip into an informed decision.
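The interaction question in assumption (A) is the kind of thing a two-way ANOVA interaction test addresses. The sketch below computes the interaction F statistic for a balanced design in which each factor is taken to be the prompt variant of one agent. That setup, the function name `interaction_f`, and the toy scores are assumptions for illustration, not the paper's protocol. Converting F to a p-value needs the F distribution, which the Python standard library does not provide, so that step is left out.

```python
def interaction_f(cells):
    """Interaction F statistic for a balanced two-way layout.

    cells[i][j] is the list of replicate scores for level i of factor A
    and level j of factor B; every cell must hold the same number (>= 2)
    of replicates.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(c) / n for c in row] for row in cells]
    row_mean = [sum(r) / b for r in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a

    # Interaction sum of squares: what is left of each cell mean after
    # removing the two main effects.
    ss_ab = n * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Error sum of squares: replicate scatter inside each cell.
    ss_err = sum(
        (y - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    df_ab = (a - 1) * (b - 1)
    df_err = a * b * (n - 1)
    return (ss_ab / df_ab) / (ss_err / df_err)


# Purely additive scores: each factor shifts the mean independently.
additive = [[[9, 11], [19, 21]], [[14, 16], [24, 26]]]
# Coupled scores: the effect of factor B flips depending on factor A.
coupled = [[[9, 11], [19, 21]], [[14, 16], [4, 6]]]

f_additive = interaction_f(additive)
f_coupled = interaction_f(coupled)
```

When the two factors contribute independently, the interaction F is near zero, which is the regime the abstract reports (all F < 1.0). A large F would instead indicate that the prompts need to be optimized jointly.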

2604.14356 2026-05-28 cs.CL cs.AI

When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden

当多囊卵巢综合征遇上进食障碍:一种可解释的AI方法检测隐藏的三重负担

Apoorv Prasad, Susan McRoy

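AI summary: Fine-tunes small open-source language models to detect, with explainability, the triple burden of body image distress, disordered eating, and metabolic challenges among people with polycystic ovary syndrome (PCOS) from social media posts; the best model reaches 75.3% exact-match accuracy on 150 test posts.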

Details
AI Chinese abstract

患有多囊卵巢综合征(PCOS)的女性面临身体形象困扰、进食障碍和代谢挑战的显著升高风险,然而现有的自然语言处理方法在检测这些状况时缺乏透明度,且无法识别共病表现。我们开发了小型开源语言模型,以基于可解释性的方式自动检测社交媒体帖子中的这种三重负担。我们从六个子论坛收集了1000条与PCOS相关的帖子,由两名经过训练的标注员根据Lee等人(2017)临床框架的操作化指南对帖子进行标注。使用低秩适配对三个模型(Gemma-2-2B、Qwen3-1.7B、DeepSeek-R1-Distill-Qwen-1.5B)进行微调,以生成带有文本证据的结构化解释。最佳模型在150条保留帖子上实现了75.3%的精确匹配准确率,具有稳健的共病检测能力和强可解释性。性能随诊断复杂性下降,表明其最佳用途是筛查而非自主诊断。

English abstract

Women with polycystic ovary syndrome (PCOS) face substantially elevated risks of body image distress, disordered eating, and metabolic challenges, yet existing natural language processing approaches for detecting these conditions lack transparency and cannot identify co-occurring presentations. We developed small, open-source language models to automatically detect this triple burden in social media posts with grounded explainability. We collected 1,000 PCOS-related posts from six subreddits, with two trained annotators labeling posts using guidelines that operationalize the clinical framework of Lee et al. (2017). Three models (Gemma-2-2B, Qwen3-1.7B, DeepSeek-R1-Distill-Qwen-1.5B) were fine-tuned using Low-Rank Adaptation to generate structured explanations with textual evidence. The best model achieved 75.3% exact-match accuracy on 150 held-out posts, with robust comorbidity detection and strong explainability. Performance declined with diagnostic complexity, indicating that the models are best used for screening rather than autonomous diagnosis.
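Exact-match accuracy is a strict metric for multi-label tasks like this one: a post counts as correct only when the full predicted label set equals the annotated set. The sketch below shows the computation on three made-up posts; the label names and the function name `exact_match_accuracy` are hypothetical and not taken from the paper.

```python
def exact_match_accuracy(gold, pred):
    """Share of items whose predicted label set equals the gold label set."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one item")
    hits = sum(1 for g, p in zip(gold, pred) if set(g) == set(p))
    return hits / len(gold)


# Hypothetical labels for three posts; an empty set means no condition present.
gold = [{"body_image", "disordered_eating"}, {"metabolic"}, set()]
pred = [{"body_image"}, {"metabolic"}, set()]

acc = exact_match_accuracy(gold, pred)
```

The first post is scored as a miss even though one of its two labels was found, which is why exact match tends to fall as more conditions co-occur. For scale, the reported 75.3% on 150 posts is consistent with 113 fully correct posts (113 / 150 ≈ 0.753).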

2604.12955 2026-05-28 cs.AI

Text2Model: Modeling Copilots for Text-to-Model Translation

Text2Model: 用于文本到模型翻译的建模副驾驶

Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda

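AI summary: Proposes Text2Model and Text2Zinc, which translate text into models of combinatorial optimization and satisfaction problems using several LLM strategies within a unified architecture and dataset and in a solver-agnostic way, and open-sources the copilots and leaderboard to help close the performance gap.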

Comments AAAI'25 Bridge Program on Machine Learning and Operations Research; CPAIOR'26 Master Class on LLMs for CP/OR

Details
AI Chinese abstract

利用大型语言模型(LLM)进行文本到模型翻译和优化任务的研究兴趣日益增长。本文通过引入\textsc{Text2Model}和\textsc{Text2Zinc}来推进这一研究方向。\textsc{Text2Model}是一套基于多种LLM策略(复杂度各异)的副驾驶,并附带在线排行榜。\textsc{Text2Zinc}是一个跨领域数据集,用于捕捉自然语言指定的优化和满足问题,并附带内置AI助手的交互式编辑器。虽然已有新兴文献使用LLM将组合问题翻译为形式化模型,但我们的工作是首次尝试将满足问题和优化问题集成在\textit{统一架构}和\textit{数据集}中。此外,我们的方法是\textit{求解器无关的},不同于现有专注于翻译为特定求解器模型的工作。为此,我们利用\textsc{MiniZinc}的求解器和范式无关的建模能力来表述组合问题。我们进行了全面实验,比较了多种单次和多次调用策略的执行和解准确率,包括:零样本提示、思维链推理、通过知识图谱的中间表示、基于语法的语法编码,以及将模型分解为顺序子任务的代理方法。我们的副驾驶策略具有竞争力,并在部分方面改进了该领域的最新研究。我们的发现表明,虽然LLM有前景,但尚未成为组合建模的一键式技术。我们开源了\textsc{Text2Model}副驾驶和排行榜,以及\textsc{Text2Zinc}和交互式编辑器,以支持缩小这一性能差距。

English abstract

There is growing interest in leveraging large language models (LLMs) for text-to-model translation and optimization tasks. This paper aims to advance this line of research by introducing Text2Model and Text2Zinc. Text2Model is a suite of copilots based on several LLM strategies of varying complexity, along with an online leaderboard. Text2Zinc is a cross-domain dataset for capturing optimization and satisfaction problems specified in natural language, along with an interactive editor with a built-in AI assistant. While there is an emerging literature on using LLMs for translating combinatorial problems into formal models, our work is the first attempt to integrate both satisfaction and optimization problems within a unified architecture and dataset. Moreover, our approach is solver-agnostic, unlike existing work that focuses on translation to a solver-specific model. To achieve this, we leverage MiniZinc's solver-and-paradigm-agnostic modeling capabilities to formulate combinatorial problems. We conduct comprehensive experiments to compare execution and solution accuracy across several single- and multi-call strategies, including zero-shot prompting, chain-of-thought reasoning, intermediate representations via knowledge graphs, grammar-based syntax encoding, and agentic approaches that decompose the model into sequential sub-tasks. Our copilot strategies are competitive with, and in parts improve on, recent research in this domain. Our findings indicate that while LLMs are promising, they are not yet a push-button technology for combinatorial modeling. We open-source the Text2Model copilots and leaderboard, as well as Text2Zinc and the interactive editor, to support closing this performance gap.

2604.13232 2026-05-28 cs.CL

Evaluating the Evaluator: Problems with SemEval-2020 Task 1 for Lexical Semantic Change Detection

评估评估者:SemEval-2020任务1在词汇语义变化检测中的问题

Bach Phan-Tat, Kris Heylen, Dirk Geeraerts, Stefano De Pascale, Dirk Speelman

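AI summary: Critically analyzes the limitations of SemEval-2020 Task 1 through three lenses (operationalisation, data quality, and benchmark design), pointing to its narrow model of semantic change, data-quality problems, and design flaws, and calls for future improvements.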

Details
AI Chinese abstract

本文通过操作化、数据质量和基准设计三个框架重新审视了词汇语义变化检测中最具影响力的共享基准SemEval-2020任务1。首先,在操作化层面,我们认为该基准主要将语义变化建模为离散义项的增加、丢失或重新分布。虽然这种框架便于标注和评估,但过于狭窄,无法捕捉渐变的、构式的、搭配的和语篇层面的变化。此外,黄金标签是标注决策、聚类过程和阈值设置的结果,可能限制任务的有效性。其次,在数据质量层面,我们表明该基准受到严重的语料库和预处理问题影响,包括OCR噪声、畸形字符、截断句子、不一致的词形还原、词性标注错误以及目标词遗漏。这些问题可能扭曲模型行为,使语言分析复杂化,并降低可重复性。第三,在基准设计层面,我们认为精心挑选的小规模目标集和有限的语言覆盖降低了现实性并增加了统计不确定性。综合来看,这些局限性表明该基准应被视为一个有用但不完整的测试平台,而非进展的最终衡量标准。因此,我们呼吁未来的数据集和共享任务采用更广泛的语义变化理论,透明地记录预处理过程,扩大跨语言覆盖范围,并使用更现实的评估设置。这些步骤对于词汇语义变化检测中更有效、可解释和可推广的进展是必要的。

English abstract

This discussion paper re-examines SemEval-2020 Task 1, the most influential shared benchmark for lexical semantic change detection, through a three-part evaluative framework: operationalisation, data quality, and benchmark design. First, at the level of operationalisation, we argue that the benchmark models semantic change mainly as gain, loss, or redistribution of discrete senses. While practical for annotation and evaluation, this framing is too narrow to capture gradual, constructional, collocational, and discourse-level change. Also, the gold labels are outcomes of annotation decisions, clustering procedures, and threshold settings, which could limit the validity of the task. Second, at the level of data quality, we show that the benchmark is affected by substantial corpus and preprocessing problems, including OCR noise, malformed characters, truncated sentences, inconsistent lemmatisation, POS-tagging errors, and missed targets. These issues can distort model behaviour, complicate linguistic analysis, and reduce reproducibility. Third, at the level of benchmark design, we argue that the small curated target sets and limited language coverage reduce realism and increase statistical uncertainty. Taken together, these limitations suggest that the benchmark should be treated as a useful but partial test bed rather than a definitive measure of progress. We therefore call for future datasets and shared tasks to adopt broader theories of semantic change, document pre-processing transparently, expand cross-linguistic coverage, and use more realistic evaluation settings. Such steps are necessary for more valid, interpretable, and generalisable progress in lexical semantic change detection.

2506.01247 2026-05-28 cs.CV cs.AI cs.LG

Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering

超越可解释性:稀疏自编码器何时、为何以及如何实现无标签视觉引导

Gerasimos Chatzoudis, Zhuowei Li, Gemma E. Moran, Hao Wang, Dimitris N. Metaxas

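AI summary: Proposes VS2, a label-free visual sparse steering method that trains a sparse autoencoder and uses its reconstruction error and sparse-feature amplification to steer frozen vision-language models, improving zero-shot accuracy on nine image-classification datasets.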

Details
AI Chinese abstract

稀疏自编码器(SAE)越来越多地被用于解释基础模型,但它们作为可操作干预空间的作用仍不太被理解,尤其是在视觉领域。我们研究稀疏视觉特征是否不仅可用于事后分析,还可用于引导冻结的视觉语言模型。我们引入视觉稀疏引导(VS2),一种无标签方法,它在冻结的CLIP图像编码器的无标签激活上训练一个top-$k$ SAE,并在测试时通过放大输入的活跃稀疏特征并解码诱导的变化来构建一个可解释的引导向量。我们证明该过程可分解为质心偏差引导:每个输入沿着其与SAE学习到的质心的偏差移动。残差项由SAE的每样本重构误差(通过FVU测量)精确控制,从而产生基于FVU的残差界限,并促使在SAE重构不可靠时回退到零样本CLIP的可靠性门控。通过使用在无标签CLIP图像编码器激活上训练的目标域SAE,VS2在九个图像分类数据集上提高了零样本准确率,在推理计算量增加不到0.1%的情况下实现了高达+4.12%的提升。最后,一项受控的上界研究VS2++表明,选择性放大稀疏特征可带来高达+21.44%的提升,揭示了一个重构与任务显著性的差距:对重构显著的稀疏特征不一定与对下游预测有用的特征一致。

English abstract

Sparse Autoencoders (SAEs) are increasingly used to interpret foundation models, but their role as an actionable intervention space remains less understood, especially in vision. We study whether sparse visual features can be used not only for post-hoc analysis, but also to steer frozen vision-language models. We introduce Visual Sparse Steering (VS2), a label-free method that trains a top-k SAE on unlabeled activations from a frozen CLIP image encoder and, at test time, constructs an interpretable steering vector by amplifying the input's active sparse features and decoding the induced change. We show that this procedure admits a closed-form decomposition as centroid-deviation steering: each input is moved along its deviation from the SAE-learned centroid. The residual term is controlled exactly by the SAE's per-sample reconstruction error, measured by FVU, yielding an FVU-based residual bound and motivating a reliability gate that falls back to zero-shot CLIP when SAE reconstruction is unreliable. With target-domain SAEs trained on unlabeled CLIP image-encoder activations, VS2 improves zero-shot accuracy across nine image-classification datasets, achieving gains up to +4.12% with less than 0.1% additional inference compute. Finally, a controlled upper-bound study, VS2++, shows that selective amplification of sparse features can yield gains up to +21.44%, exposing a reconstruction-vs-task saliency gap: features salient for reconstruction need not align with features useful for downstream prediction.
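The "amplify active features, decode the change" recipe can be traced on a tiny example. The sketch below assumes a top-k SAE with a linear decoder and a shared bias `b` that is subtracted before encoding and added back after decoding. Under that assumption `b` plays the role of the learned centroid, and the steering vector equals (gamma - 1) times the deviation of the reconstruction from `b`. The weights, the gain `gamma`, and all function names are made up for illustration and are not taken from the paper.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_k(z, k):
    """Keep the k largest entries of z and zero out the rest."""
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]


def encode(x, w_enc, b, k):
    centered = [xi - bi for xi, bi in zip(x, b)]   # remove the assumed centroid
    return top_k(matvec(w_enc, centered), k)


def decode(z, w_dec, b):
    return [r + bi for r, bi in zip(matvec(w_dec, z), b)]


def steering_vector(x, w_enc, w_dec, b, k, gamma):
    """Decode the change induced by scaling the active features by gamma."""
    z = encode(x, w_enc, b, k)
    base = decode(z, w_dec, b)
    boosted = decode([gamma * zi for zi in z], w_dec, b)
    return [u - v for u, v in zip(boosted, base)]


# Toy SAE: 2-dimensional input, 3 latent features, top-2 sparsity.
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
b = [1.0, 1.0]
x = [3.0, 2.0]
k, gamma = 2, 2.0

x_hat = decode(encode(x, w_enc, b, k), w_dec, b)
steer = steering_vector(x, w_enc, w_dec, b, k, gamma)
```

Writing the reconstruction as x_hat = x - (x - x_hat) splits the steering vector into (gamma - 1)(x - b) minus (gamma - 1)(x - x_hat). Under these assumptions that gives a centroid-deviation term plus a residual proportional to the per-sample reconstruction error, which matches the shape of the decomposition the abstract describes.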