arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.00846 2026-05-29 cs.AI cs.MA

ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations

Navapat Nananukul, Mayank Kejriwal

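AI summary: Proposes ClinicBot, a system that generates accurate, verifiable clinical answers through structured guideline extraction, evidence ranking by clinical importance, and multi-agent collaboration.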

Abstract

Clinical diagnosis requires answers that are accurate, verifiable, and explicitly grounded in official guidelines. While large language models excel at natural language processing, their tendency to hallucinate undermines their utility in high-stakes medical contexts where precision is essential. Existing retrieval-augmented generation (RAG) systems treat all evidence equally, producing noisy context and generic answers misaligned with clinical practice. We present ClinicBot, an AI system that translates guideline recommendations into trustworthy clinical support through three key advances: (1) structured extraction of clinical guidelines into semantic units (recommendations, tables, definitions, narrative) with explicit provenance, (2) evidence prioritization that ranks content by clinical significance and guideline structure rather than textual similarity, and (3) a web-based interface that presents concise, actionable answers with verifiable evidence. We will demonstrate ClinicBot using diabetes questions from real patients and an additional diabetes risk assessment tool that is faithful to the American Diabetes Association (ADA) Standards of Care in Diabetes (2025). The demonstration will illustrate how semantic knowledge extraction and hierarchical evidence ranking can reliably operate in a multi-agent setting to process complex clinical guidelines at scale.
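The evidence-prioritization idea in point (2) of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not ClinicBot's implementation: the four unit types come from the abstract, while the priority ordering, the `EvidenceUnit` class, the `rank_evidence` function, and the placeholder texts are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical priority per semantic unit type (lower number = ranked first).
# The unit types are named in the abstract; this ordering is an assumption.
TYPE_PRIORITY = {"recommendation": 0, "table": 1, "definition": 2, "narrative": 3}


@dataclass
class EvidenceUnit:
    unit_type: str     # one of the keys of TYPE_PRIORITY
    section: str       # provenance label, e.g. a guideline section id
    text: str          # placeholder content, not real guideline text
    similarity: float  # retriever score in [0, 1]


def rank_evidence(units):
    """Order by unit type first; similarity only breaks ties within a type."""
    return sorted(units, key=lambda u: (TYPE_PRIORITY[u.unit_type], -u.similarity))


units = [
    EvidenceUnit("narrative", "sec-A", "placeholder narrative text", 0.91),
    EvidenceUnit("recommendation", "sec-A", "placeholder recommendation text", 0.62),
    EvidenceUnit("table", "sec-B", "placeholder table text", 0.70),
]
ranked = rank_evidence(units)
for u in ranked:
    # Each answer line carries its provenance so a reader can verify it.
    print(f"[{u.section}] {u.unit_type}: {u.text}")
```

The narrative unit has the highest textual similarity but is ranked last, which is the behaviour the abstract contrasts with similarity-only RAG.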

2605.00716 2026-05-29 cs.LG cs.SI

Aitchison Embeddings for Learning Compositional Graph Representations

Nikolaos Nakis, Chrysoula Kosma, Panagiotis Promponas, Michail Chatzianastasis, Giannis Nikolentzos

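AI summary: Proposes a compositional graph embedding framework based on Aitchison geometry that yields interpretable node representations via isometric log-ratio coordinates, performs comparably to strong baselines on node classification and link prediction, and uses subcompositional coherence for dimensionality reduction.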

Comments ICML 2026 Camera-ready version

Abstract

Representation learning is central to graph machine learning, powering tasks such as link prediction and node classification. However, most graph embeddings are hard to interpret, offering limited insight into how learned features relate to graph structure. Many networks naturally admit a role-mixture view, where nodes are best described as mixtures over latent archetypal factors. Motivated by this structure, we propose a compositional graph embedding framework grounded in Aitchison geometry, the canonical geometry for comparing mixtures. Nodes are represented as simplex-valued compositions and embedded via isometric log-ratio (ILR) coordinates, which preserve Aitchison distances while enabling unconstrained optimization in Euclidean space. This yields intrinsically interpretable embeddings whose geometry reflects relative trade-offs among archetypes and supports coherent behavior under component restriction; we consider both fixed and learnable ILR bases. Across node classification and link prediction, our method achieves competitive performance with strong baselines while providing explainability by construction rather than post-hoc. Finally, subcompositional coherence enables principled component restriction: removing and renormalizing subsets preserves a well-defined geometry, which we exploit via subcompositional dimensionality removal to probe how archetype groups influence representations and predictions.
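The claim that ILR coordinates preserve Aitchison distances can be checked numerically. The sketch below uses a fixed Helmert-style orthonormal basis, which is one standard choice and an assumption here; the paper also considers learnable bases, which this example does not cover. All function names are illustrative.

```python
import math


def closure(x):
    """Rescale positive parts so they sum to 1 (a point on the simplex)."""
    s = sum(x)
    return [v / s for v in x]


def clr(x):
    """Centered log-ratio: log parts minus their mean."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [l - mean for l in logs]


def helmert_basis(d):
    """d-1 orthonormal vectors in R^d, each orthogonal to (1, ..., 1)."""
    basis = []
    for i in range(1, d):
        norm = math.sqrt(i * (i + 1))
        basis.append([1.0 / norm] * i + [-i / norm] + [0.0] * (d - i - 1))
    return basis


def ilr(x):
    """Isometric log-ratio coordinates: project clr(x) onto the basis."""
    c = clr(closure(x))
    return [sum(b * v for b, v in zip(row, c)) for row in helmert_basis(len(x))]


def aitchison_distance(x, y):
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


x = [0.7, 0.2, 0.1]  # a node as a mixture over three archetypes
y = [0.2, 0.3, 0.5]
print(aitchison_distance(x, y), euclidean(ilr(x), ilr(y)))  # the two values agree
```

Because the ILR image is an unconstrained vector in Euclidean space of dimension d-1, ordinary gradient-based optimizers can be applied to it directly.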

2605.00553 2026-05-29 cs.LG

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

Minchan Kwon, Sunghyun Baek, Minseo Kim, Jaemyung Yu, Dongyoon Han, Junmo Kim

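AI summary: To address the difficulty of achieving both effectiveness and diversity in LLM red-teaming, proposes Stable-GFlowNet, which stabilizes training by eliminating partition-function estimation and using contrastive trajectory balance, improving attack performance and diversity while preserving the optimal policy.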

Comments ICML 2026 Spotlight

Abstract

Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generative Flow Networks (GFNs) that perform distribution matching are a promising method, but they are notorious for training instability and mode collapse. In particular, unstable rewards in red-teaming accelerate mode collapse. We propose Stable-GFN (S-GFN), which eliminates partition function $Z$ estimation in GFN and reduces training instability. S-GFN avoids $Z$-estimation through pairwise comparisons and employs a robust masking methodology against noisy rewards. Additionally, we propose a fluency stabilizer to prevent the model from getting stuck in local optima that produce gibberish. S-GFN provides more stable training while maintaining the optimal policy of GFN. We demonstrate the overwhelming attack performance and diversity of S-GFN across various settings.
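One way a pairwise comparison can remove the partition function is sketched below. It starts from the standard trajectory-balance residual for an autoregressive sampler, log Z + log P_F(trajectory) - log R(x), and subtracts the residuals of two trajectories so that log Z cancels. Whether S-GFN uses exactly this form is an assumption; the abstract only states that pairwise comparisons avoid $Z$-estimation, and the masking and fluency components are not modelled here.

```python
import math


def tb_residual(log_pf, log_reward, log_z):
    """Standard trajectory-balance residual (tree-structured sampler, P_B = 1)."""
    return log_z + log_pf - log_reward


def pairwise_loss(log_pf_a, log_r_a, log_pf_b, log_r_b):
    """Squared difference of two TB residuals; log Z cancels and is never estimated."""
    return ((log_pf_a - log_r_a) - (log_pf_b - log_r_b)) ** 2


# Toy problem with two terminal objects and known rewards.
rewards = {"a": 1.0, "b": 3.0}
z = sum(rewards.values())

# A sampler that matches the reward distribution, P(x) = R(x) / Z.
matched = {k: math.log(r / z) for k, r in rewards.items()}
loss_matched = pairwise_loss(matched["a"], math.log(rewards["a"]),
                             matched["b"], math.log(rewards["b"]))

# A uniform sampler ignores the rewards and is penalized.
uniform = {k: math.log(0.5) for k in rewards}
loss_uniform = pairwise_loss(uniform["a"], math.log(rewards["a"]),
                             uniform["b"], math.log(rewards["b"]))

print(loss_matched, loss_uniform)
```

The matched sampler drives the pairwise loss to zero, which is consistent with the abstract's statement that the optimal GFN policy is maintained.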

2604.26571 2026-05-29 cs.LG physics.chem-ph physics.data-an

Advancing multi-site emission control: A physics-informed transfer learning framework with mixture of experts for carbon-pollutant synergy

Yuxuan Ying, Hanqing Yang, Kaige Wang, Yu Hu, Zhiming Zheng, Yunliang Jiang, Xiaoqing Lin, Xiaodong Li, Jun Chen

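AI summary: To address the difficulty of transferring data-driven emission models across multi-site municipal solid waste incineration plants, proposes a carbon-pollutant mixture-of-experts (CPMoE) model that combines physical constraints with operating-regime heterogeneity and uses physics-informed transfer learning for cross-site knowledge transfer. It reports high predictive accuracy on 13 source plants and 12 target plants, with risk-index reductions of 3.6-6.3% and simultaneous pollutant co-reductions in 94-100% of evaluated samples.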

Comments Supplementary materials will be released after the final version is finalized

Abstract

Municipal solid waste incineration (MSWI) converts urban waste to energy but simultaneously emits carbon dioxide, carbon monoxide and multiple regulated air pollutants whose formation is tightly coupled within a single combustion system. Controlling these emissions across a network of diverse facilities poses a fundamentally different challenge from optimising a single plant: data-driven models trained at one site capture local statistical patterns that rarely survive transfer to another, because they lack the physical constraints and regime-level structure needed to generalise. Here we show that shared emission-control relationships can be identified across heterogeneous MSWI plants when physical conservation laws, operating-regime heterogeneity and carbon-pollutant coupling are treated jointly. We develop a carbon-pollutant mixture-of-experts (CPMoE) model that routes process observations through regime-specific expert networks under conservation-based regularisation, and combine it with physics-informed transfer learning to adapt a reference model to new facilities. Across 13 plants, CPMoE predicts six major pollutants and a composite system-level risk index with source-domain $R^2$ of 0.668-0.904 and 0.666-0.970, respectively; after transfer to 12 target plants these values remain 0.661-0.842 and 0.610-0.841. Expert-utilisation patterns show that adaptation proceeds through structured regime re-weighting rather than re-learning from scratch. Embedding the transferred model in an offline digital twin and screening candidate operating adjustments against historical process records yields consistent risk-index reductions of 3.6-6.3% with simultaneous pollutant co-reductions in 94-100% of evaluated samples. These findings suggest a practical route toward transferable, system-level decision support for carbon-pollutant co-control in heterogeneous waste-to-energy networks.

2604.25098 2026-05-29 cs.AI cs.CL cs.LG

Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling

Ocean Monjur, Shahriar Kabir Nahin, Anshuman Chhabra

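AI summary: Studies how unstructured pruning affects the test-time scaling performance of reasoning LLMs, finding that it outperforms structured pruning and sometimes even the unpruned model, and examines the role of layer-wise sparsity allocation strategies.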

Abstract

Large Language Models (LLMs) now exhibit remarkable reasoning capabilities through test-time compute scaling (TTS), with impressive performance across math and coding benchmarks. In parallel, research in model compression has developed pruning methods that seek to remove redundant/detrimental parameters without sacrificing task performance. The intersection of these two research advancements lays the foundation for our work. Specific to reasoning LLMs, prior work has shown that structured pruning (methods that remove entire sets of layer blocks) significantly degrades TTS reasoning performance. However, in this work, we revisit this assumption and investigate whether unstructured pruning (methods that carefully remove only certain redundant/detrimental weights) exhibits similar limitations. Surprisingly, our extensive experiments across four reasoning benchmarks on two reasoning LLMs (s1.1-7B and Qwen3-8B) consistently show that unstructured pruning augments TTS performance compared to structured pruning, and at times can even outperform the unpruned full-weight LLMs. Furthermore, we also empirically study the impact of different layer-wise sparsity allocation strategies, which are an important parametric choice for instantiating these unstructured methods. These findings challenge the conventional notion that pruning always reduces TTS performance and, in fact, suggest that carefully undertaken pruning can retain TTS effectiveness.
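The difference between the two pruning families can be illustrated on a toy weight matrix. The sketch assumes a plain magnitude criterion for the unstructured case and whole-row removal as a stand-in for removing blocks; the abstract does not name the specific pruning methods or allocation strategies studied, so `magnitude_prune` and `drop_rows` are hypothetical helpers.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured: zero the `sparsity` fraction of entries with smallest |w|."""
    cells = [(abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)]
    cells.sort()
    k = int(len(cells) * sparsity)  # number of individual weights to remove
    pruned = [row[:] for row in weights]
    for _, i, j in cells[:k]:
        pruned[i][j] = 0.0
    return pruned


def drop_rows(weights, rows_to_drop):
    """Structured stand-in: zero whole rows regardless of individual magnitudes."""
    return [[0.0] * len(row) if i in rows_to_drop else row[:]
            for i, row in enumerate(weights)]


W = [[0.9, -0.05, 0.4],
     [0.02, -0.7, 0.1],
     [-0.3, 0.6, -0.01]]

P = magnitude_prune(W, 0.5)  # removes the 4 smallest of 9 weights
S = drop_rows(W, {1})        # removes row 1, including the large weight -0.7

for row in P:
    print(row)
```

A layer-wise sparsity allocation strategy would amount to choosing a different `sparsity` value per layer instead of one global value.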

2604.24824 2026-05-29 cs.LG

Negative Ontology of True Target for Machine Learning: Towards Evaluation and Learning under Democratic Supervision

Yongquan Yang

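AI summary: Philosophically examines the shift in assumptions about the existence of the true target, proposes a Democratic Supervision framework based on negative ontology, and builds an evaluation and learning system around Multiple Inaccurate True Targets (MIATTs).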

Abstract

This article philosophically examines how shifts in assumptions regarding the existence and non-existence of the true target (TT) give rise to new perspectives and insights for machine learning (ML)-based predictive modeling and, correspondingly, proposes a knowledge system for evaluation and learning under Democratic Supervision. By systematically analysing the existence assumption of the TT in current mainstream ML paradigms, we explicitly adopt a negative ontology perspective, positing that the TT does not objectively exist in the real world, and, grounded in this non-existence assumption, define Democratic Supervision for ML. We further present Multiple Inaccurate True Targets (MIATTs) as an instance-level realization of Democratic Supervision. Building upon MIATTs, we derive principles for the logic-driven generation and assessment of MIATTs, a logical assessment formulation for evaluation with MIATTs, and undefinable true target learning for learning with MIATTs. Based on these components, we establish the evaluation and learning with MIATTs (EL-MIATTs) framework for ML-based predictive modeling. A real-world application demonstrates the potential of the proposed EL-MIATTs framework in supporting education and professional development for individuals, aligning with prior discussions of Democratic Supervision in the fields of education and professional development.

2604.21654 2026-05-29 cs.CV cs.AI

Causal Disentanglement-Inspired Degradation Representation Learning for Full-Reference Image Quality Assessment

Zhen Zhang, Jielei Chu, Tian Zhang, Lin Ma, Fengmao Lv, Weide Liu, Tianrui Li, Yuming Fang

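AI summary: Proposes a new full-reference image quality assessment paradigm based on causal inference and disentangled representation learning that estimates degradation through interventions on latent representations and performs well across multiple settings and cross-domain scenarios.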

Abstract

Existing deep network-based full-reference image quality assessment (FR-IQA) models typically work by performing pairwise comparisons of deep features from the reference and distorted images. In this paper, we approach this problem from a different perspective and propose a novel FR-IQA paradigm based on causal inference and decoupled representation learning. Unlike typical feature comparison-based FR-IQA models, our approach formulates degradation estimation as a causal disentanglement process guided by intervention on latent representations. We first decouple degradation and content representations by exploiting the content invariance between the reference and distorted images. Second, inspired by the human visual masking effect, we design a masking module to model the causal relationship between image content and degradation features, thereby extracting content-influenced degradation features from distorted images. Finally, quality scores are predicted from these degradation features using either supervised regression or label-free dimensionality reduction. Extensive experiments demonstrate that our method achieves highly competitive performance on standard IQA benchmarks across fully supervised, few-label, and label-free settings. Furthermore, we evaluate the approach on diverse non-standard natural image domains with scarce data, including underwater, radiographic, medical, neutron, and screen-content images. Benefiting from its ability to perform scenario-specific training and prediction without labeled IQA data, our method exhibits superior cross-domain generalization compared to existing training-free FR-IQA models.

2604.20443 2026-05-29 cs.CL cs.AI cs.LG

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

Neemesh Yadav, Palakorn Achananuparp, Jing Jiang, Ee-Peng Lim

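AI summary: Introduces the DialToM benchmark, built from natural dialogues with a multiple-choice evaluation framework, which reveals a systematic reasoning asymmetry in LLMs between inferring mental states (Literal ToM) and using them for social forecasting (Functional ToM), and shows a marked capability gap between a domain expert and AI.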

Comments Submitted to EMNLP 2026

Abstract

We introduce DialToM, an annotated Theory of Mind (ToM) benchmark built from naturalistic human-human dialogues using a multiple-choice evaluation framework. Concurrent with recent work showing a gap between explicit mental-state inference and applied ToM in synthetic settings \cite{gu2024simpletom}, we establish a stricter State-Driven Diagnostic Probe in which models must forecast state-consistent dialogue trajectories solely from isolated mental-state profiles without dialogue context. Our evaluation reveals a systematic reasoning asymmetry -- LLMs excel at inferring mental states (Literal ToM) but struggle to leverage them for social forecasting (Functional ToM). Crucially, a domain expert achieves 100% accuracy on this task, proving its validity and establishing a stark human-AI capability gap. Further, a teacher-student reasoning injection probe shows that Gemini 3 Pro -- which establishes the leading baseline -- possesses robust Functional ToM capabilities for context-free forecasting that are transferable to weaker models. DialToM, its evaluation code, and dataset are publicly available at https://github.com/Stealth-py/DialToM.

2604.18847 2026-05-29 cs.AI cs.CL

Human-Guided Harm Recovery for Computer Use Agents

Christy Li, Sky CH-Wang, Andi Peng, Andreea Bobu

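AI summary: Addresses harm recovery after LM agents execute actions on computer systems: a user study defines preference-aligned recovery dimensions, a reward model re-ranks candidate recovery plans, and the BackBench benchmark is introduced. Experiments show the method outperforms baseline agents.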

Abstract

As LM agents gain the ability to execute actions on real computer systems, we need ways to not only prevent harmful actions at scale but also effectively remediate harm when prevention fails. We formalize a solution to this neglected challenge in post-execution safeguards as harm recovery: the problem of optimally steering an agent from a harmful state back to a safe one in alignment with human preferences. We ground preference-aligned recovery through a formative user study that identifies valued recovery dimensions and produces a natural language rubric. Our dataset of 1,130 pairwise judgments reveals context-dependent shifts in attribute importance, such as preferences for pragmatic, targeted strategies over comprehensive long-term approaches. We operationalize these learned insights in a reward model, re-ranking multiple candidate recovery plans generated by an agent scaffold at test time. To evaluate recovery capabilities systematically, we introduce BackBench, a benchmark of 50 computer-use tasks that test an agent's ability to recover from harmful states. Human evaluation shows our reward model scaffold yields higher-quality recovery trajectories than base agents and rubric-based scaffolds. Together, these contributions lay the foundation for a new class of agent safety methods -- ones that confront harm not only by preventing it, but by navigating its aftermath with alignment and intent.

2604.18518 2026-05-29 cs.CV cs.LG

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Jiaqi Wang, Haoge Deng, Ting Pan, Yang Liu, Chengyuan Wang, Fan Zhang, Yonggang Qi, Xinlong Wang

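AI summary: To address unstable training and limited gains when combining uniform discrete diffusion models (UDM) with reinforcement learning (RL), proposes the UDM-GRPO framework, which treats the final clean sample as the action, reconstructs trajectories via the diffusion forward process, and adds Reduced-Step and CFG-Free strategies, substantially improving text-to-image generation performance.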

Comments UDM-GRPO is accepted by ICML 2026 (Spotlight). Code is available at https://github.com/Yovecent/UDM-GRPO

Abstract

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address this, we propose UDM-GRPO, the first framework to integrate UDM with RL. Our method is guided by two key insights: (i) treating the final clean sample as the action provides more accurate and stable optimization signals; and (ii) reconstructing trajectories via the diffusion forward process better aligns probability paths with the pretraining distribution. Additionally, we introduce two strategies, Reduced-Step and CFG-Free, to further improve training efficiency. UDM-GRPO significantly improves base model performance across multiple T2I tasks. Notably, GenEval accuracy improves from $69\%$ to $96\%$ and PickScore increases from $20.46$ to $23.81$, achieving state-of-the-art performance in both continuous and discrete settings. On the OCR benchmark, accuracy rises from $8\%$ to $57\%$, further validating the generalization ability of our method. Code is available at https://github.com/Yovecent/UDM-GRPO.

2604.15864 2026-05-29 cs.RO

Environment-Adaptive Solid-State LiDAR-Inertial Odometry

Zhi Zhang, Chalermchon Satirapod, Bingtao Ma, Changjun Gu

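AI summary: Proposes an environment-adaptive solid-state LiDAR-inertial odometry that integrates local normal-vector constraints with degeneracy-aware map maintenance to address localization drift and map inconsistency caused by geometric degeneracy and unreliable observations in extreme environments.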

Abstract

Solid-state LiDAR-inertial SLAM has attracted significant attention due to its advantages in speed and robustness. However, achieving accurate mapping in extreme environments remains challenging due to severe geometric degeneracy and unreliable observations, which often lead to ill-conditioned optimization and map inconsistencies. To address these challenges, we propose an environment-adaptive solid-state LiDAR-inertial odometry that integrates local normal-vector constraints with degeneracy-aware map maintenance to enhance localization accuracy. Specifically, we introduce local normal-vector constraints to improve the stability of state estimation, effectively suppressing localization drift in degenerate scenarios. Furthermore, we design a degeneration-guided map update strategy to improve map precision. Benefiting from the refined map representation, localization accuracy is further enhanced in subsequent estimation. Experimental results demonstrate that the proposed method achieves superior mapping accuracy and robustness in extreme and perceptually degraded environments, with an average RMSE reduction of up to 12.8% compared to the baseline method.

2604.13519 2026-05-29 cs.CL

ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding

Heming Xia, Yongqi Li, Cunxiao Du, Mingbo Song, Wenjie Li

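AI summary: To reduce tool-calling latency, proposes ToolSpec, a schema-aware, retrieval-augmented speculative decoding method that uses predefined tool schemas to generate accurate drafts, alternates via a finite-state machine between filling deterministic schema tokens and speculatively generating variable fields, and retrieves historical invocations to reuse as drafts, achieving up to a 4.2x speedup.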

Abstract

Tool calling has greatly expanded the practical utility of large language models (LLMs) by enabling them to interact with external applications. As LLM capabilities advance, effective tool use increasingly involves multi-step, multi-turn interactions to solve complex tasks. However, the resulting growth in tool interactions incurs substantial latency, posing a key challenge for real-time LLM serving. Through empirical analysis, we find that tool-calling traces are highly structured, conform to constrained schemas, and often exhibit recurring invocation patterns. Motivated by this, we propose ToolSpec, a schema-aware, retrieval-augmented speculative decoding method for accelerating tool calling. ToolSpec exploits predefined tool schemas to generate accurate drafts, using a finite-state machine to alternate between deterministic schema token filling and speculative generation for variable fields. In addition, ToolSpec retrieves similar historical tool invocations and reuses them as drafts to further improve efficiency. ToolSpec presents a plug-and-play solution that can be seamlessly integrated into existing LLM workflows. Experiments across multiple benchmarks demonstrate that ToolSpec achieves up to a 4.2x speedup, substantially outperforming existing training-free speculative decoding methods.
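The alternation between deterministic schema text and speculative field values can be sketched as below. This is an illustration only and not ToolSpec's implementation: the JSON call layout, the tool name `get_weather`, the helper names, and the character-level acceptance check (real speculative decoding verifies tokens against the target model) are all assumptions.

```python
import json


def build_draft(tool_name, param_names, retrieved_args):
    """Alternate between two states while drafting a tool call.

    "SCHEMA" segments are fully determined by the tool schema.
    "FIELD" segments are guesses, here reused from a retrieved past call.
    """
    segments = [("SCHEMA", '{"name": ' + json.dumps(tool_name) + ', "arguments": {')]
    for i, name in enumerate(param_names):
        separator = ", " if i else ""
        segments.append(("SCHEMA", separator + json.dumps(name) + ": "))
        segments.append(("FIELD", json.dumps(retrieved_args.get(name))))
    segments.append(("SCHEMA", "}}"))
    return segments


def accepted_prefix(draft_text, target_text):
    """Length of the longest draft prefix that matches the target output."""
    n = 0
    for a, b in zip(draft_text, target_text):
        if a != b:
            break
        n += 1
    return n


# Hypothetical retrieved invocation reused as the draft for the variable fields.
segments = build_draft("get_weather", ["city", "unit"],
                       {"city": "Paris", "unit": "celsius"})
draft = "".join(text for _, text in segments)

# Stand-in for what the target model would actually emit.
target = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "kelvin"}}'
n = accepted_prefix(draft, target)
print(draft[:n])
```

In this toy case everything up to the second field value is accepted in one verification step, and only the mismatching value would need to be regenerated by the target model.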

2604.13019 2026-05-29 cs.CV

PrecisionCUA: Iterative Visual Refinement for Pixel-Precise Cursor Grounding in Code Editors

Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu

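AI summary: Proposes PrecisionCUA, which uses iterative refinement with visual feedback to achieve pixel-precise cursor grounding in code editors, significantly improving click precision and task success rate.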

Abstract

Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding interfaces (such as VS Code and Cursor), where sub-pixel accuracy is required to interact with dense IDE elements, remains underexplored. Existing approaches typically rely on single-shot coordinate prediction, which lacks a mechanism for error correction and often fails in high-density interfaces. In this technical report, we conduct an empirical study of pixel-precise cursor localization in coding environments. Instead of a single-step execution, our agent engages in an iterative refinement process, utilizing visual feedback from previous attempts to reach the target element. This closed-loop grounding mechanism allows the agent to self-correct displacement errors and adapt to dynamic UI changes. We evaluate our approach across Claude, Qwen, and GPT on a suite of complex coding benchmarks, demonstrating that multi-turn refinement significantly outperforms state-of-the-art single-shot models in both click precision and overall task success rate. Our results suggest that iterative visual reasoning is a critical component for the next generation of reliable software engineering agents. Code: https://github.com/microsoft/precision-cua-bench/tree/main.

2604.12772 2026-05-29 cs.CV cs.MA

A Multi-Agent Feedback System for Detecting and Describing News Events in Satellite Imagery

Madeline Anderson, Mikhail Klassen, Ash Hoover, Kerri Cahoy

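AI summary: Proposes SkyScraper, an iterative multi-agent workflow that geocodes news articles and synthesizes captions for satellite image sequences, effectively surfacing multi-temporal events and producing a dataset of 5,000 sequences.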

Abstract

Changes in satellite imagery often occur over multiple time steps. Despite the emergence of bi-temporal change captioning datasets, there is a lack of multi-temporal event captioning datasets (at least two images per sequence) in remote sensing. This gap exists because (1) searching for visible events in satellite imagery and (2) labeling multi-temporal sequences require significant time and labor. To address these challenges, we present SkyScraper, an iterative multi-agent workflow that geocodes news articles and synthesizes captions for corresponding satellite image sequences. Our experiments show that SkyScraper successfully finds 5x more events than traditional geocoding methods, demonstrating that agentic feedback is an effective strategy for surfacing new multi-temporal events in satellite imagery. We apply our framework to a large database of global news articles, curating a new multi-temporal captioning dataset with 5,000 sequences. By automatically identifying imagery related to news events, our work also supports journalism and reporting efforts.

2604.11088 2026-05-29 cs.AI cs.CL

Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents

Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He

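AI summary: Large-scale experiments find that random rules improve coding-agent performance as much as expert rules, that every beneficial rule is a negative constraint and every harmful rule a positive directive, and conclude that agents should be configured with constraints rather than guidance.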

Abstract

Random rules improve a coding agent's task performance as much as expert-curated ones (both +13.8 pp on a discriminative subset of SWE-bench Verified), and in our data every individually beneficial rule is a negative constraint ("do not refactor unrelated code"), while every individually harmful one is a positive directive ("follow code style"). We arrive at these findings through the first large-scale controlled study of agent rule files (CLAUDE.md, .cursorrules, and the broader family of agent skills, plugin manifests, and persona definitions): we scrape 679 rule files (25,532 rules) from GitHub and conduct over 5,000 agent runs of Claude Code with Claude Opus 4.6 on SWE-bench Verified. Three patterns emerge. (i) Rule polarity cleanly separates beneficial from harmful rules; we read this through the lens of potential-based reward shaping (PBRS). (ii) Performance gains are largely content-independent: random, shuffled, mismatched-domain, and unconverted-format rule files all match curated rules, pointing to a context priming mechanism. (iii) Individual rules often appear harmful in isolation yet do not visibly accumulate damage in ensemble: pass rates remain stable across rule counts from 0 to 50. These findings expose a hidden reliability risk in the rapidly growing ecosystem of community-authored rules and skills, and they yield a clear principle for safer agent configuration: constrain what agents must not do, rather than prescribing what they should.

2604.11080 2026-05-29 cs.CV cs.AI

ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

ReSpinQuant: 通过子空间残差旋转近似实现高效逐层大模型量化

Suyoung Kim, Sunghyun Wee, Hyeonjin Kim, Kyomin Hwang, Hyunho Lee, Nojun Kwak

AI总结 提出ReSpinQuant框架,通过离线激活旋转融合和高效子空间残差旋转匹配基,解决逐层量化方法在线计算开销大的问题,在W4A4和W3A3量化上达到最优性能。

Comments ICML 2026

AI中文摘要

基于旋转的后训练量化(PTQ)已成为缓解大型语言模型(LLMs)量化中激活值异常值的有前景的解决方案。全局旋转方法通过将激活旋转融合到注意力块和前馈网络块中实现推理效率,但由于受限于在所有层中使用单一可学习旋转矩阵,其表达能力有限。为了解决这一问题,出现了逐层变换方法,通过局部自适应实现了更高的精度。然而,逐层方法无法将激活旋转矩阵融合到权重中,需要在线计算并导致显著开销。在本文中,我们提出ReSpinQuant,一种量化框架,通过利用离线激活旋转融合和使用高效残差子空间旋转匹配基来解决此类开销。这种设计调和了逐层自适应的高表达性与仅可忽略的推理开销。在W4A4和W3A3量化上的大量实验表明,ReSpinQuant实现了最先进的性能,优于全局旋转方法,并以最小开销匹配计算昂贵的逐层方法的精度。

英文摘要

Rotation-based Post-Training Quantization (PTQ) has emerged as a promising solution for mitigating activation outliers in the quantization of Large Language Models (LLMs). Global rotation methods achieve inference efficiency by fusing activation rotations into attention and FFN blocks, but suffer from limited expressivity as they are constrained to use a single learnable rotation matrix across all layers. To tackle this, layer-wise transformation methods emerged, achieving superior accuracy through localized adaptation. However, layer-wise methods cannot fuse activation rotation matrices into weights, requiring online computations and causing significant overhead. In this paper, we propose ReSpinQuant, a quantization framework that resolves such overhead by leveraging offline activation rotation fusion and matching basis using efficient residual subspace rotation. This design reconciles the high expressivity of layer-wise adaptation with only negligible inference overhead. Extensive experiments on W4A4 and W3A3 quantization demonstrate that ReSpinQuant achieves state-of-the-art performance, outperforming global rotation methods and matching the accuracy of computationally expensive layer-wise methods with minimal overhead.
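
The fusion idea that rotation-based PTQ relies on can be illustrated with a tiny example: for an orthogonal matrix R, rotating the activations and folding the transposed rotation into the weights leaves the layer output unchanged, while spreading an activation outlier across channels. The sketch below uses plain Python and a hand-picked 2x2 rotation; the names `x`, `W`, and `R` and all numbers are illustrative assumptions, and nothing here reproduces ReSpinQuant's residual subspace rotation.

```python
import math

def matvec_row(x, m):
    """Row vector times matrix, i.e. x @ m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def matmul(a, b):
    """Matrix product a @ b for lists of rows."""
    return [matvec_row(row, b) for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

# A 45-degree rotation (orthogonal, so R @ R^T is the identity).
theta = math.pi / 4
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

x = [10.0, 0.0]                   # activation with one outlier channel (made up)
W = [[0.5, -1.0], [2.0, 0.25]]    # layer weights (made up)

x_rot = matvec_row(x, R)              # rotated activation: outlier is spread out
W_fused = matmul(transpose(R), W)     # rotation folded into the weights offline

y_ref = matvec_row(x, W)              # original layer output
y_rot = matvec_row(x_rot, W_fused)    # same output computed in the rotated basis
```

Because `W_fused` can be computed once ahead of time, only the activation side needs attention at inference, which is the overhead the abstract discusses for layer-wise methods.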

2604.09629 2026-05-29 cs.CL

HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation

HumorGen: 基于角色蒸馏的大语言模型幽默生成的认知协同

Edward Ajayi, Prasenjit Mitra

AI总结 针对大语言模型标准训练目标与幽默所需意外性之间的矛盾,提出认知协同框架,利用六种认知角色合成多样化喜剧视角数据,微调7B参数学生模型,实验表明认知驱动的数据策展比对齐算法或模型规模更关键。

AI中文摘要

幽默生成对大语言模型(LLMs)构成重大挑战,因为其标准训练目标(下一个词预测)与喜剧所需的意外性和不协调性存在固有冲突。为弥合这一差距,我们引入了认知协同框架,这是一种受幽默心理学理论启发的高质量幽默数据生成方法。利用混合思维(MoT)方法,我们部署了六种认知角色(例如荒诞主义者、愤世嫉俗者)来为给定提示合成多样化的喜剧视角。该框架产生了一个基于理论的数据集,我们使用该数据集微调了一个7B参数的学生模型。我们进一步评估了两种对齐策略:直接偏好优化(DPO)和一种离线组相对变体O-GRPO,发现两者均未优于SFT。然而,我们的7B HumorGen模型变体显著优于更大的指令微调基线,并达到顶级开源权重性能,同时与前沿专有系统保持竞争力。这些结果表明,对于幽默生成,认知驱动的数据策展比对齐算法或模型规模更为关键。

英文摘要

Humor generation poses a significant challenge for Large Language Models (LLMs), because their standard training objective (next-token prediction) inherently conflicts with the surprise and incongruity required for comedy. To bridge this gap, we introduce the Cognitive Synergy Framework, a methodology for generating high-quality humor data inspired by psychological theories of humor. Utilizing a Mixture-of-Thought (MoT) approach, we deploy six cognitive personas (e.g., The Absurdist, The Cynic) to synthesize diverse comedic perspectives for a given prompt. This framework produces a theory-grounded dataset, which we use to fine-tune a 7B-parameter student model. We further evaluate two alignment strategies, Direct Preference Optimization (DPO) and an offline group-relative variant O-GRPO, finding that neither improves over SFT. However, our 7B HumorGen model variants significantly outperform larger instruction-tuned baselines and achieve top-tier open-weight performance while remaining competitive with frontier proprietary systems. These results suggest that cognitively driven data curation is more critical than alignment algorithms or model scale for humor generation.

2604.06805 2026-05-29 cs.CL

Cognitive Loop of Thought: Reversible Hierarchical Markov Chain for Efficient Mathematical Reasoning

思维认知循环:基于可逆层次马尔可夫链的高效数学推理

Jia-Chen Zhang, Yu-Jie Xiong, Zheng Zhou

AI总结 提出基于可逆层次马尔可夫链的思维认知循环框架,通过层次分解和反向验证机制减少冗余、增强推理鲁棒性,在数学推理任务上取得显著提升。

AI中文摘要

多步思维链通过利用显式推理步骤显著提升了大型语言模型的数学推理能力。然而,长思维链的广泛采用往往导致序列长度超出可管理的计算限制。现有方法尝试通过类似马尔可夫链的结构减少KV缓存冗余来缓解这一问题,但引入了两个关键限制:固有的无记忆性(上下文丢失)和有限的反向推理能力。为了解决这些限制,我们提出了一种基于可逆层次马尔可夫链的新型思维链框架,称为思维认知循环,以及一个反向推理数据集CLoT-Instruct。在CLoT中,问题被分解为具有层次依赖关系的子问题。受人类认知过程的启发,我们在每个层次层引入反向验证机制。此外,我们实施了一种剪枝策略:一旦高层子问题得到验证,冗余的低层子问题就会被剪枝以最大化效率。这种方法有效缓解了错误传播并增强了推理鲁棒性。在四个数学基准上的实验证明了我们方法的有效性。值得注意的是,在使用GPT-4o-mini的AddSub数据集上,CLoT达到了99.0%的准确率,分别比传统思维链和思维链自洽性高出4.1%和2.9%。

英文摘要

Multi-step Chain-of-Thought (CoT) has significantly advanced the mathematical reasoning capabilities of LLMs by leveraging explicit reasoning steps. However, the widespread adoption of Long CoT often results in sequence lengths that exceed manageable computational limits. While existing approaches attempt to alleviate this by reducing KV Cache redundancy via Markov chain-like structures, they introduce two critical limitations: inherent memorylessness (loss of context) and limited backward reasoning capability. To address these limitations, we propose a novel Chain-of-Thought framework based on Reversible Hierarchical Markov Chain, termed Cognitive Loop of Thought (CLoT), and a backward reasoning dataset CLoT-Instruct. In CLoT, problems are decomposed into sub-problems with hierarchical dependencies. Inspired by human cognitive processes, we introduce a backward verification mechanism at each hierarchical layer. Furthermore, we implement a pruning strategy: once higher-level sub-problems are verified, redundant lower-level sub-problems are pruned to maximize efficiency. This approach effectively mitigates error propagation and enhances reasoning robustness. Experiments on four mathematical benchmarks demonstrate the effectiveness of our method. Notably, on the AddSub dataset using GPT-4o-mini, CLoT achieves 99.0% accuracy, outperforming traditional CoT and CoT-SC by 4.1% and 2.9%, respectively.

2604.05157 2026-05-29 cs.AI

IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents

IntentScore: 面向计算机使用智能体的意图条件动作评估

Rongqian Chen, Yu Li, Zeyu Fang, Sizhe Tang, Weidong Cao, Tian Lan

AI总结 提出IntentScore,一种基于计划感知的奖励模型,通过对比对齐和边际排序学习评估动作质量,在OSWorld上提升任务成功率6.9个百分点。

AI中文摘要

计算机使用智能体(CUA)利用大型语言模型在桌面环境中执行GUI操作,但它们生成动作时不评估动作质量,导致不可逆的错误级联到后续步骤。我们提出IntentScore,一种计划感知的奖励模型,从跨越三个操作系统的398K离线GUI交互步骤中学习对候选动作进行评分。IntentScore通过两个互补目标进行训练:状态-动作相关性的对比对齐和动作正确性的边际排序。在架构上,它将每个候选的计划意图嵌入动作编码器,从而能够区分具有相似动作但不同理由的候选。IntentScore在保留评估上达到97.5%的成对区分准确率。作为Agent S3在OSWorld(训练中完全未见的环境)上的重排序器,IntentScore将任务成功率提高了6.9个百分点,表明从异构离线轨迹中学到的奖励估计可以泛化到未见过的智能体和任务分布。

英文摘要

Computer-Use Agents (CUAs) leverage large language models to execute GUI operations on desktop environments, yet they generate actions without evaluating action quality, leading to irreversible errors that cascade through subsequent steps. We propose IntentScore, a plan-aware reward model that learns to score candidate actions from 398K offline GUI interaction steps spanning three operating systems. IntentScore trains with two complementary objectives: contrastive alignment for state-action relevance and margin ranking for action correctness. Architecturally, it embeds each candidate's planning intent in the action encoder, enabling discrimination between candidates with similar actions but different rationales. IntentScore achieves 97.5% pairwise discrimination accuracy on held-out evaluation. Deployed as a re-ranker for Agent S3 on OSWorld, an environment entirely unseen during training, IntentScore improves task success rate by 6.9 points, demonstrating that reward estimation learned from heterogeneous offline trajectories generalizes to unseen agents and task distributions.
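
The margin ranking objective and the pairwise discrimination metric named above can be sketched in a few lines. This is the generic hinge-style formulation, written in plain Python with made-up scores; the function names, the margin value, and the data are assumptions for illustration and are not taken from the IntentScore implementation.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss that pushes each correct action's score above
    its paired incorrect action's score by at least `margin`."""
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("need equally sized, non-empty score lists")
    losses = [max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)

def pairwise_accuracy(pos_scores, neg_scores):
    """Fraction of pairs where the correct action is scored strictly higher."""
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("need equally sized, non-empty score lists")
    wins = sum(1 for p, n in zip(pos_scores, neg_scores) if p > n)
    return wins / len(pos_scores)

# Hypothetical reward-model scores for three (correct, incorrect) action pairs.
correct = [2.0, 0.5, 1.0]
incorrect = [0.5, 1.0, 0.8]

loss = margin_ranking_loss(correct, incorrect, margin=1.0)
accuracy = pairwise_accuracy(correct, incorrect)
```

Note that a pair can be ranked correctly and still contribute to the loss when the score gap is smaller than the margin, as the third pair does here.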

2603.27758 2026-05-29 cs.CV

RHO: Robust Holistic OSM-Based Metric Cross-View Geo-Localization

RHO: 基于OSM的鲁棒整体度量跨视角地理定位

Junwei Zheng, Ruize Dai, Ruiping Liu, Zichao Zeng, Yufan Chen, Fangjinhua Wang, Kunyu Peng, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

AI总结 提出RHO模型,利用全景图和OpenStreetMap进行度量跨视角地理定位,通过Split-Undistort-Merge模块处理全景畸变和Position-Orientation Fusion机制提升定位精度,在CV-RHO数据集上相比基线方法性能提升高达20%。

Comments Accepted by CVPR 2026. Project page: https://github.com/InSAI-Lab/RHO

AI中文摘要

度量跨视角地理定位(MCVGL)旨在通过匹配地面和卫星图像来估计3自由度相机位姿(位置和朝向)。在这项工作中,我们研究使用整体全景图和OpenStreetMap(OSM)的鲁棒MCVGL,而非针孔和卫星图像。为此,我们建立了一个大规模MCVGL基准数据集CV-RHO,包含超过270万张图像,涵盖不同天气和光照条件以及传感器噪声。此外,我们提出了一种名为RHO的模型,采用双分支Pin-Pan架构进行精确视觉定位。引入了Split-Undistort-Merge(SUM)模块来解决全景畸变问题,并设计了Position-Orientation Fusion(POF)机制来增强定位精度。大量实验证明了我们CV-RHO数据集的价值和RHO模型的有效性,与最先进的基线方法相比,性能提升高达20%。项目页面:https://github.com/InSAI-Lab/RHO。

英文摘要

Metric Cross-View Geo-Localization (MCVGL) aims to estimate the 3-DoF camera pose (position and heading) by matching ground and satellite images. In this work, instead of pinhole and satellite images, we study robust MCVGL using holistic panoramas and OpenStreetMap (OSM). To this end, we establish a large-scale MCVGL benchmark dataset, CV-RHO, with over 2.7M images under different weather and lighting conditions, as well as sensor noise. Furthermore, we propose a model termed RHO with a two-branch Pin-Pan architecture for accurate visual localization. A Split-Undistort-Merge (SUM) module is introduced to address the panoramic distortion, and a Position-Orientation Fusion (POF) mechanism is designed to enhance the localization accuracy. Extensive experiments prove the value of our CV-RHO dataset and the effectiveness of the RHO model, with a significant performance gain of up to 20% compared with the state-of-the-art baselines. Project page: https://github.com/InSAI-Lab/RHO.

2603.27518 2026-05-29 cs.CL

Over-Refusal and Representation Subspaces: A Mechanistic Analysis of Task-Conditioned Refusal in Aligned LLMs

过度拒绝与表示子空间:对齐大语言模型中任务条件拒绝的机制分析

Utsav Maskey, Mark Dras, Usman Naseem

AI总结 通过分析有害拒绝和过度拒绝的表示几何,发现过度拒绝方向是任务相关的且存在于良性任务表示簇中,解释了为何全局方向消融无法解决过度拒绝,并表明需要任务特定的几何干预。

Comments Preprint

AI中文摘要

经过训练以拒绝有害请求的对齐语言模型也会表现出过度拒绝:它们拒绝看似类似于有害指令的安全指令。一种自然的方法是消融全局拒绝方向,将隐藏状态向量远离或朝向有害拒绝示例,但这只是偶然地纠正了过度拒绝,同时破坏了更广泛的拒绝机制。在这项工作中,我们分析了两种拒绝类型的表示几何,以理解为什么会发生这种情况。我们表明,有害拒绝方向是任务无关的,可以通过单个全局向量捕获,而过度拒绝方向是任务相关的:它们位于良性任务表示簇内,在不同任务之间变化,并跨越更高维的子空间。线性探测表明,两种拒绝类型从早期Transformer层开始就在表示上不同。这些发现提供了机制上的解释,说明为什么仅靠全局方向消融无法解决过度拒绝,并确立了任务特定的几何干预是必要的。

英文摘要

Aligned language models that are trained to refuse harmful requests also exhibit over-refusal: they decline safe instructions that seemingly resemble harmful instructions. A natural approach is to ablate the global refusal direction, steering the hidden-state vectors away from or towards the harmful-refusal examples, but this corrects over-refusal only incidentally while disrupting the broader refusal mechanism. In this work, we analyse the representational geometry of both refusal types to understand why this happens. We show that harmful-refusal directions are task-agnostic and can be captured by a single global vector, whereas over-refusal directions are task-dependent: they reside within the benign task-representation clusters, vary across tasks, and span a higher-dimensional subspace. Linear probing suggests that the two refusal types are representationally distinct from the early transformer layers. These findings provide a mechanistic explanation of why global direction ablation alone cannot address over-refusal, and establish that task-specific geometric interventions are necessary.
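
Direction ablation, as described above, amounts to removing a hidden state's component along a chosen vector. The sketch below shows that projection in plain Python, plus a subspace variant that removes several directions at once after Gram-Schmidt orthogonalization. The subspace function is only an illustration of what a higher-dimensional intervention could look like; it is an assumption, not the authors' procedure, and all vectors are toy values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ablate_direction(hidden, direction):
    """Return hidden minus its projection onto `direction`."""
    norm = math.sqrt(dot(direction, direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    coeff = dot(hidden, unit)
    return [h - coeff * u for h, u in zip(hidden, unit)]

def ablate_subspace(hidden, basis):
    """Remove the component of `hidden` lying in span(basis).

    The basis is first orthogonalized (Gram-Schmidt), so the vectors
    passed in need not be orthogonal or unit length."""
    ortho = []
    for vec in basis:
        residual = list(vec)
        for q in ortho:
            residual = ablate_direction(residual, q)
        if math.sqrt(dot(residual, residual)) > 1e-12:  # skip dependent vectors
            ortho.append(residual)
    out = list(hidden)
    for q in ortho:
        out = ablate_direction(out, q)
    return out

h = [3.0, 4.0, 1.0]                                   # toy hidden state
single = ablate_direction(h, [0.0, 2.0, 0.0])         # one global direction
multi = ablate_subspace(h, [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])  # 2-D subspace
```

After ablation, `single` has no component along the removed direction, and `multi` has none along either basis vector.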

2603.27150 2026-05-29 cs.AI cs.MA

MediHive: A Decentralized Agent Collective for Medical Reasoning

MediHive:用于医学推理的去中心化智能体集体

Xiaoyang Wang, Christopher C. Yang

AI总结 提出一种去中心化多智能体框架MediHive,通过共享记忆池和迭代融合机制,使LLM智能体自主分配角色、进行条件性基于证据的辩论并融合观点,在MedQA和PubMedQA上分别达到84.3%和78.4%的准确率。

Comments Accepted by Journal of Healthcare Informatics Research

AI中文摘要

大型语言模型(LLM)已经革新了医学推理任务,但单智能体系统在处理需要稳健处理不确定性和冲突证据的复杂跨学科问题时常常表现不佳。利用LLM的多智能体系统(MAS)能够实现协作智能,但主流的集中式架构在资源受限环境中存在可扩展性瓶颈、单点故障和角色混淆问题。去中心化MAS(D-MAS)通过点对点交互承诺增强自主性和弹性,但其在高风险医疗领域的应用仍未充分探索。我们提出了MediHive,一种新颖的去中心化多智能体医学问答框架,该框架将共享记忆池与迭代融合机制相结合。MediHive部署基于LLM的智能体,这些智能体自主分配专业角色、进行初始分析、通过条件性基于证据的辩论检测分歧,并在多轮中本地融合同伴见解以达成共识。实验表明,MediHive在MedQA和PubMedQA数据集上分别达到84.3%和78.4%的准确率,优于单LLM和集中式基线。我们的工作推进了用于医学AI的可扩展、容错D-MAS,解决了集中式设计的关键局限性,同时在推理密集型任务中展示了优越性能。

英文摘要

Large language models (LLMs) have revolutionized medical reasoning tasks, yet single-agent systems often falter on complex, interdisciplinary problems requiring robust handling of uncertainty and conflicting evidence. Multi-agent systems (MAS) leveraging LLMs enable collaborative intelligence, but prevailing centralized architectures suffer from scalability bottlenecks, single points of failure, and role confusion in resource-constrained environments. Decentralized MAS (D-MAS) promise enhanced autonomy and resilience via peer-to-peer interactions, but their application to high-stakes healthcare domains remains underexplored. We introduce MediHive, a novel decentralized multi-agent framework for medical question answering that integrates a shared memory pool with iterative fusion mechanisms. MediHive deploys LLM-based agents that autonomously self-assign specialized roles, conduct initial analyses, detect divergences through conditional evidence-based debates, and locally fuse peer insights over multiple rounds to achieve consensus. Empirically, MediHive outperforms single-LLM and centralized baselines on MedQA and PubMedQA datasets, attaining accuracies of 84.3% and 78.4%, respectively. Our work advances scalable, fault-tolerant D-MAS for medical AI, addressing key limitations of centralized designs while demonstrating superior performance in reasoning-intensive tasks.

2603.23971 2026-05-29 cs.CL cs.AI cs.GT cs.LG cs.MA

The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More

价格反转现象:当更便宜的推理模型成本更高时

Lingjiao Chen, Chi Zhang, Yeye He, Ion Stoica, Matei Zaharia, James Zou

AI总结 本文首次系统研究推理模型标价与实际成本的偏差,发现32%的模型对比较中存在价格反转现象,并基于Shapley值建立成本归因框架,揭示思考令牌消耗和交互轮次的高度异质性是主要原因。

AI中文摘要

开发者和消费者越来越根据列出的API价格选择推理模型(RMs)。然而,这些价格在多大程度上准确反映了实际推理成本?我们首次系统研究这一问题,评估了8个前沿RM在12个不同任务上的表现,涵盖竞赛数学、科学问答、代码生成和多领域智能体。我们发现了定价反转现象:在32%的模型对比较中,标价较低的模型实际上产生了更高的总成本,反转幅度高达28倍。例如,Gemini 3 Flash的标价比GPT-5.4便宜80%,但其在所有任务上的实际成本却高出38%。我们基于Shapley值构建了一个正式的成本归因框架,并利用它追溯了思考令牌消耗和交互轮次数量巨大异质性的主要贡献因素:对于同一查询,一个模型可能比另一个模型多使用900%的思考令牌,或多出10倍的环境交互轮次。我们进一步表明,每次查询的成本预测本质上是困难的:同一查询的重复运行产生的思考令牌变化高达9.7倍,为任何预测器建立了不可约的噪声底限。因此,我们提出成本分布预测作为一个开放挑战。我们的发现表明,列出的API定价是实际成本的不可靠代理,呼吁进行成本感知的模型选择和透明的每次请求成本监控。

英文摘要

Developers and consumers increasingly choose reasoning models (RMs) based on their listed API prices. However, how accurately do these prices reflect actual inference costs? We conduct the first systematic study of this question, evaluating 8 frontier RMs across 12 diverse tasks covering competition math, science QA, code generation, and multi-domain agents. We uncover the pricing reversal phenomenon: in 32% of model-pair comparisons, the model with a lower listed price actually incurs a higher total cost, with reversal magnitude reaching up to 28x. For example, Gemini 3 Flash's listed price is 80% cheaper than GPT-5.4's, yet its actual cost across all tasks is 38% higher. We build a formal cost attribution framework based on Shapley value, and leverage it to trace the dominating contributors to vast heterogeneity in thinking token consumption and number of interaction turns: on the same query, one model may use 900% more thinking tokens than another, or 10x more turns of environment interactions. We further show that per-query cost prediction is fundamentally difficult: repeated runs of the same query yield thinking token variation up to 9.7x, establishing an irreducible noise floor for any predictor. Thus, we propose cost distribution prediction as an open challenge. Our findings demonstrate that listed API pricing is an unreliable proxy for actual cost, calling for cost-aware model selection and transparent per-request cost monitoring.
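
To make the Shapley-based attribution idea concrete, here is a minimal sketch that splits the cost gap between two models across three factors. The cost model (price per million tokens times tokens per turn times turns), the factor names, and every number are hypothetical and chosen only to produce a price reversal; the paper's actual framework may define players and the value function differently.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions
    over all orderings of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical per-query profiles: B lists a lower price but thinks longer
# and takes more turns.
MODEL_A = {"price": 10.0, "tokens": 1000.0, "turns": 2.0}
MODEL_B = {"price": 2.0, "tokens": 4000.0, "turns": 5.0}

def cost(cfg):
    """USD cost: price per million tokens * tokens per turn * turns."""
    return cfg["price"] * cfg["tokens"] * cfg["turns"] / 1e6

def gap_value(coalition):
    """Cost change when the factors in `coalition` switch from A's to B's values."""
    mixed = {k: (MODEL_B[k] if k in coalition else MODEL_A[k]) for k in MODEL_A}
    return cost(mixed) - cost(MODEL_A)

phi = shapley_values(list(MODEL_A), gap_value)
total_gap = cost(MODEL_B) - cost(MODEL_A)
```

By the efficiency property of the Shapley value, the three attributions in `phi` sum to `total_gap`. In this toy setup the listed price contributes negatively (it lowers cost), while token consumption and turn count contribute positively and outweigh it.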

2603.22348 2026-05-29 cs.LG cs.GT

Learning Safely Without Knowing the World: COMPASS-Hedge

在不了解世界的情况下安全学习:COMPASS-Hedge

Ting Hu, Luanda Cai, Emmanouil-Vasileios Vlatakis-Gkaragkounis

AI总结 提出COMPASS-Hedge算法,通过自适应伪遗憾缩放和基于阶段的激进策略,首次在全信息在线学习中同时实现对抗环境下的极小化最优遗憾、随机环境下的实例最优遗憾以及相对于基准策略的常数遗憾,且无需先验知识。

AI中文摘要

在线学习算法常常面临一个基本的三难困境:在对抗性和随机性设置之间平衡遗憾保证,并提供相对于固定比较器的基线安全性。虽然现有方法在其中一个或两个领域表现出色,但它们通常无法在不牺牲最优速率或需要问题相关参数的神谕访问的情况下统一所有三个目标。在这项工作中,我们通过引入COMPASS-Hedge来弥合这一差距。据我们所知,我们的算法是第一个全信息任意时间方法,同时实现(达到对数因子):i)对抗环境中的极小化最优遗憾;ii)随机环境中实例最优、间隙相关的遗憾;以及iii)相对于指定基准策略的$\tilde{\mathcal{O}}(1)$遗憾。关键是,COMPASS-Hedge是无参数的,不需要事先了解环境的性质或随机次优间隙的大小。我们的方法依赖于自适应伪遗憾缩放和基于阶段的激进策略的新颖结合,以及比较器感知的混合策略。据我们所知,这提供了全信息设置中的第一个“三世界最优”保证,确立了基线安全性不必以最坏情况鲁棒性或随机效率为代价。

英文摘要

Online learning algorithms often face a fundamental trilemma: balancing regret guarantees between adversarial and stochastic settings and providing baseline safety against a fixed comparator. While existing methods excel in one or two of these regimes, they typically fail to unify all three without sacrificing optimal rates or requiring oracle access to problem-dependent parameters. In this work, we bridge this gap by introducing COMPASS-Hedge. To the best of our knowledge, our algorithm is the first full-information anytime method to simultaneously achieve, up to logarithmic factors: i) minimax-optimal regret in adversarial environments; ii) instance-optimal, gap-dependent regret in stochastic environments; and iii) $\tilde{\mathcal{O}}(1)$ regret relative to a designated baseline policy. Crucially, COMPASS-Hedge is parameter-free and requires no prior knowledge of the environment's nature or the magnitude of the stochastic suboptimality gaps. Our approach hinges on a novel integration of adaptive pseudo-regret scaling and phase-based aggression, coupled with a comparator-aware mixing strategy. To the best of our knowledge, this provides the first "best-of-three-world" guarantee in the full-information setting, establishing that baseline safety does not have to come at the cost of worst-case robustness or stochastic efficiency.
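
For context, the classic Hedge (exponential weights) algorithm that the method's name alludes to can be sketched as follows. This is the textbook fixed-learning-rate version for losses in [0, 1], not COMPASS-Hedge itself: it has none of the adaptive scaling, phases, or comparator-aware mixing described above, and it needs the horizon T to tune `eta`. The loss sequence is made up.

```python
import math

def hedge(loss_rounds, eta):
    """Run Hedge on a list of per-round loss vectors (values in [0, 1]).

    Returns the learner's cumulative expected loss and the distribution
    it played in the final round."""
    n = len(loss_rounds[0])
    log_w = [0.0] * n          # log-weights, kept in log space for stability
    total = 0.0
    probs = [1.0 / n] * n
    for losses in loss_rounds:
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        z = sum(weights)
        probs = [w / z for w in weights]
        total += sum(p * l for p, l in zip(probs, losses))
        log_w = [lw - eta * l for lw, l in zip(log_w, losses)]
    return total, probs

T, n = 200, 3
# Expert 0 is steadily good, expert 1 alternates, expert 2 is mediocre.
losses = [[0.2, 1.0 if t % 2 else 0.0, 0.6] for t in range(T)]

eta = math.sqrt(8 * math.log(n) / T)   # standard tuning for a known horizon
learner_loss, final_p = hedge(losses, eta)
best_expert_loss = min(sum(r[i] for r in losses) for i in range(n))
regret = learner_loss - best_expert_loss
bound = math.sqrt(T * math.log(n) / 2)  # worst-case regret bound for this eta
```

With this tuning the regret is guaranteed to stay below `bound` for any loss sequence in [0, 1], which is the adversarial guarantee; the stochastic and baseline-safety guarantees in the abstract require the additional machinery the paper introduces.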

2603.18859 2026-05-29 cs.AI cs.CL cs.LG

RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

RewardFlow: 面向大语言模型智能体强化学习的拓扑感知状态图奖励传播

Xiao Feng, Bo Han, Zhanke Zhou, Jiaqi Fan, Jiangchao Yao, Ka Ho Li, Dahai Yu, Michael Kwok-Po Ng

AI总结 提出RewardFlow方法,通过构建状态图进行拓扑感知的奖励传播,为智能体推理提供无标注的密集奖励,显著提升强化学习性能。

AI中文摘要

强化学习在增强大语言模型智能体推理方面展现出潜力,但稀疏的终端奖励阻碍了细粒度优化。过程奖励建模提供了一种替代方案,但带来了高计算成本、奖励黑客风险和标注瓶颈。我们引入RewardFlow,一种用于估计智能体推理中状态级奖励的轻量级方法。通过构建捕获轨迹内在拓扑结构的状态图,RewardFlow执行拓扑感知的传播以估计每个状态对成功的贡献,从而产生有原则的、无标注的密集奖励。用于强化学习优化时,RewardFlow在四个智能体基准测试中显著优于先前基线:在基于文本的任务上平均成功率提高6.2%,在视觉推理上跨三个模型尺度比最强基线提高29.7%,在DeepResearch上准确率提高10%,同时具有卓越的鲁棒性和训练效率。RewardFlow的实现已在https://github.com/tmlr-group/RewardFlow公开。

英文摘要

Reinforcement learning (RL) shows promise for enhancing LLM agentic reasoning, yet sparse terminal rewards hinder fine-grained optimization. Process reward modeling offers an alternative but incurs high computational costs, reward hacking risks, and annotation bottlenecks. We introduce RewardFlow, a lightweight method for estimating state-level rewards in agentic reasoning. By constructing state graphs that capture the intrinsic topological structure of trajectories, RewardFlow performs topology-aware propagation to estimate each state's contribution to success, yielding principled, annotation-free dense rewards. Used for RL optimization, RewardFlow substantially outperforms prior baselines across four agentic benchmarks: +6.2% average success rate on text-based tasks, +29.7% on visual reasoning over the strongest baseline across three model scales, and +10% accuracy on DeepResearch, with superior robustness and training efficiency. The implementation of RewardFlow is publicly available at https://github.com/tmlr-group/RewardFlow.

2603.16673 2026-05-29 cs.RO cs.AI cs.LG

When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

机器人何时应该思考?基于强化学习的资源感知推理在具身机器人决策中的应用

Jun Liu, Pu Zhao, Zhenglun Kong, Xuan Shen, Peiyan Dong, Fan Yang, Lin Cui, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Gaowen Liu, Yanzhi Wang, Dong Huang

AI总结 提出RARRL框架,通过强化学习学习高层编排策略,使具身代理能自适应决定是否调用LLM推理、选择推理角色及分配计算预算,以平衡推理开销与任务成功率。

AI中文摘要

具身机器人系统越来越依赖基于大语言模型(LLM)的代理来支持与环境交互过程中的高级推理、规划和决策。然而,调用LLM推理会引入大量的计算延迟和资源开销,这可能会中断动作执行并降低系统可靠性。过多的推理可能延迟动作,而推理不足则常常导致错误决策和任务失败。这引出了具身代理的一个基本问题:代理何时应该推理,何时应该行动?在这项工作中,我们提出了RARRL(基于强化学习的资源感知推理),一个用于具身代理资源感知编排的分层框架。RARRL不是学习低级控制策略,而是学习一个在代理决策层运行的高级编排策略。该策略使代理能够根据当前观察、执行历史和剩余资源,自适应地决定是否调用推理、使用哪个推理角色以及分配多少计算预算。大量实验,包括使用来自ALFRED基准测试的经验延迟配置文件进行评估,表明与固定或启发式推理策略相比,RARRL在减少执行延迟和增强鲁棒性的同时,持续提高了任务成功率。这些结果表明,自适应推理控制对于构建可靠且高效的具身机器人代理至关重要。

英文摘要

Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decision-making during interactions with the environment. However, invoking LLM reasoning introduces substantial computational latency and resource overhead, which can interrupt action execution and reduce system reliability. Excessive reasoning may delay actions, while insufficient reasoning often leads to incorrect decisions and task failures. This raises a fundamental question for embodied agents: when should the agent reason, and when should it act? In this work, we propose RARRL (Resource-Aware Reasoning via Reinforcement Learning), a hierarchical framework for resource-aware orchestration of embodied agents. Rather than learning low-level control policies, RARRL learns a high-level orchestration policy that operates at the agent's decision-making layer. This policy enables the agent to adaptively determine whether to invoke reasoning, which reasoning role to employ, and how much computational budget to allocate based on current observations, execution history, and remaining resources. Extensive experiments, including evaluations with empirical latency profiles derived from the ALFRED benchmark, show that RARRL consistently improves task success rates while reducing execution latency and enhancing robustness compared with fixed or heuristic reasoning strategies. These results demonstrate that adaptive reasoning control is essential for building reliable and efficient embodied robotic agents.

2603.08142 2026-05-29 cs.RO

Multifingered force-aware control for humanoid robots

人形机器人的多指力感知控制

Pasquale Marra, Gabriele M. Caddeo, Ugo Pattacini, Lorenzo Natale

AI总结 提出一种基于力估计的多指手人形机器人控制方案,通过调整躯干、手臂、手腕和手指的运动重新分配力,以维持与物体的稳定接触,在平衡任务中成功率达82.7%。

Comments This work has been accepted for publication in ICRA 2026

AI中文摘要

在本文中,我们研究了具有多指手的机器人平台中的力感知控制和力分配问题。给定目标以及来自触觉传感器的力估计,我们设计了一个控制器,能够调整躯干、手臂、手腕和手指的运动,重新分配力以维持与不同质量分布或不稳定接触的物体的稳定接触。为了估计力,我们使用五个Xela磁传感器与压头交互,收集触觉信号和地面真实力测量数据集,并训练力估计器。然后,我们引入一种基于模型的控制方案,该方案最小化压力中心(CoP)与指尖接触多边形质心之间的距离。由于我们的方法依赖于估计的力而非原始触觉信号,因此它有可能应用于任何能够进行力估计的传感器。我们在一个包含五个物体的平衡任务上验证了我们的框架,成功率达到82.7%,并在多物体场景中进一步评估,准确率达到80%。代码和数据可在此处找到:https://github.com/hsp-iit/multifingered-force-aware-control。

英文摘要

In this paper, we address force-aware control and force distribution in robotic platforms with multi-fingered hands. Given a target goal and force estimates from tactile sensors, we design a controller that adapts the motion of the torso, arm, wrist, and fingers, redistributing forces to maintain stable contact with objects of varying mass distribution or unstable contacts. To estimate forces, we collect a dataset of tactile signals and ground-truth force measurements using five Xela magnetic sensors interacting with indenters, and train force estimators. We then introduce a model-based control scheme that minimizes the distance between the Center of Pressure (CoP) and the centroid of the fingertips contact polygon. Since our method relies on estimated forces rather than raw tactile signals, it has the potential to be applied to any sensor capable of force estimation. We validate our framework on a balancing task with five objects, achieving an 82.7% success rate, and further evaluate it in multi-object scenarios, achieving 80% accuracy. Code and data can be found here: https://github.com/hsp-iit/multifingered-force-aware-control.
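
The quantity the controller minimizes can be illustrated with a small planar example: the CoP is the force-weighted average of the contact points, and the polygon centroid follows from the shoelace formula. The sketch assumes 2-D contact positions, scalar normal forces, and a simple (non-self-intersecting) polygon with vertices given in order; these simplifications, the function names, and the numbers are assumptions, and the linked repository should be consulted for the actual controller.

```python
import math

def center_of_pressure(points, forces):
    """Force-weighted average of 2-D contact points."""
    total = sum(forces)
    if total <= 0:
        raise ValueError("total normal force must be positive")
    x = sum(f * p[0] for p, f in zip(points, forces)) / total
    y = sum(f * p[1] for p, f in zip(points, forces)) / total
    return (x, y)

def polygon_centroid(vertices):
    """Area centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return (cx / (3 * area2), cy / (3 * area2))

def cop_error(points, forces):
    """Distance between the CoP and the contact-polygon centroid."""
    cop = center_of_pressure(points, forces)
    centroid = polygon_centroid(points)
    return math.hypot(cop[0] - centroid[0], cop[1] - centroid[1])

# Four fingertip contacts on a 2 x 2 square (units arbitrary).
contacts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
balanced = cop_error(contacts, [1.0, 1.0, 1.0, 1.0])
skewed = cop_error(contacts, [3.0, 1.0, 1.0, 3.0])
```

Equal forces put the CoP on the centroid, while loading the two left-hand contacts more heavily shifts it away; a controller of the kind described would act to reduce that offset.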

2603.01006 2026-05-29 cs.SD cs.AI cs.LG cs.MM

AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching

AG-REPA:音频流匹配中表示对齐的因果层选择

Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu

AI总结 提出AG-REPA方法,通过前向门控消融量化各层对速度场的因果贡献,实现稀疏层选择和自适应加权对齐,在音频流匹配中优于传统REPA基线。

Comments Accepted to ICML 2026. 17 pages, 4 figures, 12 tables

AI中文摘要

表示对齐(REPA)通过将中间隐藏状态与预训练教师特征对齐来改进生成流模型的训练,但在令牌条件音频流匹配中,其有效性关键取决于监督层的选择,而监督层通常基于深度启发式地选择。在这项工作中,我们引入了归因引导的表示对齐(AG-REPA),一种用于音频流匹配中表示对齐的新型因果层选择策略。首先,我们发现最能存储语义/声学信息(高教师空间相似性)的层不一定是那些对驱动生成的速度场贡献最大的层,我们称之为存储-贡献分离(SCD)。为了将这一见解转化为可操作的训练指导,我们提出了一种前向门控消融(FoG-A),通过预测速度场中的诱导变化来量化每个层的因果贡献,从而实现稀疏层选择和自适应加权对齐。在统一的语音和通用音频训练(LibriSpeech + AudioSet)中,在不同的令牌条件拓扑下,AG-REPA始终优于REPA基线。总体而言,我们的结果表明,当对齐应用于因果主导的驱动速度场的层时,而不是应用于表示丰富但功能被动的层时,对齐最为有效。

英文摘要

REPresentation Alignment (REPA) improves the training of generative flow models by aligning intermediate hidden states with pretrained teacher features, but its effectiveness in token-conditioned audio Flow Matching critically depends on the choice of supervised layers, which is typically made heuristically based on depth. In this work, we introduce Attribution-Guided REPresentation Alignment (AG-REPA), a novel causal layer selection strategy for representation alignment in audio Flow Matching. First, we find that the layers that best store semantic/acoustic information (high teacher-space similarity) are not necessarily the layers that contribute most to the velocity field that drives generation, a phenomenon we call Store-Contribute Dissociation (SCD). To turn this insight into actionable training guidance, we propose a forward-only gate ablation (FoG-A) that quantifies each layer's causal contribution via the induced change in the predicted velocity field, enabling sparse layer selection and adaptive weighting for alignment. Across unified speech and general-audio training (LibriSpeech + AudioSet) under different token-conditioning topologies, AG-REPA consistently outperforms REPA baselines. Overall, our results show that alignment is most effective when applied to the causally dominant layers that drive the velocity field, rather than to layers that are representationally rich but functionally passive.

2603.00991 2026-05-29 cs.AI cs.PL

Tracking Capabilities for Safer Agents

追踪能力(Capabilities)以实现更安全的智能体

Martin Odersky, Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham

AI总结 提出通过Scala 3的捕获检查类型系统静态追踪能力,构建基于编程语言的智能体安全约束,防止信息泄露和恶意副作用。

AI中文摘要

通过工具调用与现实世界交互的AI智能体带来了基本的安全挑战:智能体可能泄露私人信息、导致意外副作用或通过提示注入被操纵。为应对这些挑战,我们建议将智能体置于基于编程语言的“安全约束”中:智能体不直接调用工具,而是以能力安全的语言(支持捕获检查的Scala 3)中的代码表达其意图。能力是程序变量,用于调节对感兴趣的效果和资源的访问。Scala的类型系统静态追踪能力,提供对智能体行为的细粒度控制。特别是,它支持局部纯度,即强制子计算无副作用的能力,防止智能体处理机密数据时的信息泄露。我们展示了通过利用具有追踪能力的强类型系统,可以构建可扩展的智能体安全约束。实验表明,智能体能够生成能力安全的代码,而任务性能没有显著损失,同时类型系统可靠地防止了信息泄露和恶意副作用等不安全行为。

英文摘要

AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenges, we propose to put the agent in a programming-language-based "safety harness": instead of calling tools directly, agents express their intentions as code in a capability-safe language: Scala 3 with capture checking. Capabilities are program variables that regulate access to effects and resources of interest. Scala's type system tracks capabilities statically, providing fine-grained control over what an agent can do. In particular, it enables local purity, the ability to enforce that sub-computations are side-effect-free, preventing information leakage when agents process classified data. We demonstrate that extensible agent safety harnesses can be built by leveraging a strong type system with tracked capabilities. Our experiments show that agents can generate capability-safe code with no significant loss in task performance, while the type system reliably prevents unsafe behaviors such as information leakage and malicious side effects.

2603.00454 2026-05-29 cs.LG cs.AI

Rooted Absorbed Prefix Trajectory Balance with Submodular Replay for GFlowNet Training

基于子模重放的根吸收前缀轨迹平衡用于GFlowNet训练

Xi Wang, Wenbo Lu, Shengjie Wang

AI总结 针对GFlowNet的模式坍塌问题,提出RapTB目标函数(通过根锚定子轨迹监督和吸收后缀备份提供密集前缀学习信号)和SubM子模重放策略(促进高奖励和多样性),在分子生成等任务中提升优化性能和多样性。

AI中文摘要

生成流网络(GFlowNets)能够微调大型语言模型以近似奖励比例的后验分布,但仍容易出现模式坍塌,表现为前缀坍塌和长度偏差。我们将此归因于两个因素:(i)对早期前缀的信用分配较弱,以及(ii)有偏的重放导致偏移的、非代表性的训练流分布。我们提出根吸收前缀轨迹平衡(RapTB),该目标函数将子轨迹监督锚定在根节点,并通过吸收后缀备份将终端奖励传播到中间前缀,从而提供密集的前缀级学习信号。为了减轻重放引起的分布偏移,我们进一步引入SubM,一种子模重放刷新策略,同时促进高奖励和多样性。实验表明,在使用SMILES字符串的分子生成等任务中,RapTB结合SubM持续提升优化性能和分子多样性,同时保持高有效性。

英文摘要

Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remain prone to mode collapse, manifesting as prefix collapse and length bias. We attribute this to two factors: (i) weak credit assignment to early prefixes, and (ii) biased replay that induces a shifted, non-representative training flow distribution. We propose Rooted absorbed prefix Trajectory Balance (RapTB), an objective that anchors subtrajectory supervision at the root and propagates terminal rewards to intermediate prefixes via absorbed suffix-based backups, providing dense prefix-level learning signals. To mitigate replay-induced distribution shift, we further introduce SubM, a submodular replay refresh strategy that promotes both high reward and diversity. Empirically, on tasks such as molecule generation with LLMs using SMILES strings, RapTB combined with SubM consistently improves optimization performance and molecular diversity while preserving high validity.
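
As background for the objective's name, the standard trajectory balance (TB) loss for GFlowNets is given below for a complete trajectory $\tau = (s_0 \to s_1 \to \dots \to s_n = x)$. This is the baseline formulation only; the abstract does not spell out the rooted, absorbed-prefix modification, so none of it is reconstructed here. The symbols ($Z_\theta$ for the learned partition function, $P_F$ and $P_B$ for the forward and backward policies, $R$ for the reward) follow common usage and may differ from the paper's notation.

```latex
\mathcal{L}_{\mathrm{TB}}(\tau)
  = \left(
      \log \frac{Z_\theta \prod_{t=1}^{n} P_F(s_t \mid s_{t-1}; \theta)}
                {R(x) \prod_{t=1}^{n} P_B(s_{t-1} \mid s_t; \theta)}
    \right)^{2}
```

Because the reward enters only through the terminal state $x$, every prefix of the trajectory shares a single learning signal, which is consistent with the weak credit assignment to early prefixes that the abstract identifies.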