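arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.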

2605.15206 2026-05-18 cs.LG cs.AI cs.DC

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

Dzung Pham, Kleomenis Katevas, Ali Shahin Shamsabadi, Hamed Haddadi

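AI summary: As autonomous agents built on large language models are increasingly applied to complex tasks, running them locally can improve privacy and reduce cost, but it consumes far more resources than ordinary language-model interactions. This paper studies the energy consumption of running agents locally on consumer hardware and proposes AgentStop, a lightweight supervision mechanism that terminates unproductive runs early by predicting the likelihood of task failure. It cuts energy consumption by 15%-20% with only a small impact on task performance, offering a practical path toward sustainable local agent systems.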

Comments ACM CAIS '26
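
A minimal sketch of the early-termination idea, assuming a failure predictor that maps the trajectory so far to a probability in [0, 1]. The names `run_with_early_stop` and `toy_predictor` and the threshold values are hypothetical illustrations, not AgentStop's actual interface or predictor.

```python
from typing import Callable, List, Tuple


def run_with_early_stop(
    steps: List[str],
    predict_failure: Callable[[List[str]], float],
    threshold: float = 0.8,
) -> Tuple[List[str], bool]:
    """Run agent steps in order, stopping once predicted failure risk is too high.

    Returns the steps actually executed and whether the run was cut short."""
    trajectory: List[str] = []
    for step in steps:
        trajectory.append(step)  # stands in for an LLM call or tool invocation
        if predict_failure(trajectory) > threshold:
            return trajectory, True  # terminate early; remaining steps cost no energy
    return trajectory, False


def toy_predictor(trajectory: List[str]) -> float:
    """Hypothetical predictor: fraction of steps so far that were errors."""
    return trajectory.count("error") / len(trajectory)


executed, stopped = run_with_early_stop(
    ["plan", "error", "error", "retry", "done"], toy_predictor, threshold=0.6
)
```

Here the run stops after the third step, when two of three steps have failed, so the last two steps are never executed. The trade-off the paper measures lives in the threshold: a lower value saves more energy but aborts more runs that might have succeeded.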

2605.15205 2026-05-18 cs.AI

Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

Nanxu Gong, Zixin Chen, Haotian Li, Zishu Zhao, Jianxun Lian, Huamin Qu, Yanjie Fu, Xing Xie

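AI summary: This study examines whether improving the theory-of-mind (ToM) ability of large language models (LLMs) actually improves human-AI interaction. It notes that existing benchmarks mostly assess ToM from a third-person perspective, through story reading and multiple-choice questions, and overlook the first-person, dynamic, and open-ended nature of real interaction. The study therefore proposes a new interactive ToM evaluation paradigm and, using real-world datasets and user studies, systematically evaluates four representative ToM-enhancement techniques. It finds that gains on static benchmarks do not necessarily translate into better performance in dynamic human-AI interaction, underscoring the importance of interaction-based evaluation for developing the next generation of socially intelligent models.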

2605.15204 2026-05-18 cs.AI

SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

Zhantao Wang

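AI summary: This paper proposes SDOF, a multi-agent orchestration framework that addresses the lack of stage constraints in task dispatch in existing systems. The framework treats multi-agent execution as a constrained state machine and combines reinforcement learning with finite-state automata to achieve precise control over the task workflow and to verify compliance. Experiments show that SDOF achieves higher task completion rates and execution safety in real-world scenarios such as recruiting systems, significantly outperforming existing models.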

Comments 12 pages, 4 figures, 14 tables
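
To illustrate what dispatch under a "constrained state machine" means, here is a minimal finite-state automaton that only lets the dispatcher move between stages along whitelisted edges. The `StageMachine` class and the hiring stage names (loosely echoing the recruiting scenario) are hypothetical, and SDOF's reinforcement-learning component is not modeled.

```python
class StageMachine:
    """Finite-state automaton that only permits whitelisted stage transitions."""

    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions  # dict: stage -> set of allowed next stages

    def dispatch(self, next_stage):
        """Advance to `next_stage` if allowed; otherwise raise and keep the state."""
        allowed = self.transitions.get(self.state, set())
        if next_stage not in allowed:
            raise ValueError(f"illegal transition {self.state!r} -> {next_stage!r}")
        self.state = next_stage
        return self.state


# Hypothetical hiring pipeline: an offer can only follow an interview.
HIRING = {
    "screen": {"interview", "reject"},
    "interview": {"offer", "reject"},
    "offer": set(),
    "reject": set(),
}
```

Because an illegal dispatch raises before the state changes, a proposed action (for example, from a learned policy) can be checked against the automaton and rejected without side effects.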

2605.15202 2026-05-18 cs.AI cs.CL cs.IR

DeepSlide: From Artifacts to Presentation Delivery

Ming Yang, Zhiwei Zhang, Jiahang Li, Haoseng Liu, Yuzheng Cai, Weiguo Zheng

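AI summary: DeepSlide is a human-AI collaborative multi-agent system that supports the entire presentation-preparation workflow, optimizing the whole process from content planning to delivery rather than merely generating visually plausible slides. The system combines controllable logic-chain planning, content-tree retrieval, style-inheriting sequential rendering, and executable rehearsal support, improving narrative coherence, pacing precision, and slide-script coordination. The work also introduces a dual-scoreboard benchmark that separates static content quality from dynamic delivery performance; experiments show that DeepSlide outperforms existing methods across multiple domains and audience scenarios.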

Comments 37 pages, 10 figures, 9 tables

2605.15093 2026-05-18 cs.CV

CoralLite: μCT Reconstruction of Coral Colonies from Individual Corallites

Jess Jones, Leonardo Bertini, Kenneth Johnson, Erica Hendy, Tilo Burghardt

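AI summary: This study proposes CoralLite, a method for reconstructing the skeletal structure of individual coral polyps (corallites) from micro-CT scans of coral skeletons. Using a hybrid V-Trans-UNet network pre-trained on weakly annotated data and fine-tuned on fully annotated slices, it achieves high-accuracy segmentation and 3D modeling of entire colony skeletons. The method segments well both on the same colony and on a biologically unrelated specimen, and it provides the first deep learning baseline and a complete dataset for μCT-based modeling of individual corallites.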

Comments 15 pages, 10 figures, 2 tables

Abstract

The life history of an individual coral is archived within the accreting skeleton of the colony. While reef-forming coral colonies (e.g. massive $\textit{Porites}$ sp.) may live for hundreds of years and deposit calcareous structures many metres in height and width, their living tissue is a thin outer surface layer composed of asexually dividing polyps that survive only a few years. To understand the rate and timing of polyp division and the consequences for colony skeletal growth, scientists need to track the skeletal corallite deposited around each polyp. Here we propose CoralLite, an annotated $\mu$CT scan dataset of entire calcareous skeletons and an associated first deep learning baseline for corallite reconstruction. CoralLite combines fully quantified volumetric segmentations with cross-slice linking for visualisations of 3D models for each corallite up to colony scale. For segmentation, we propose and evaluate in detail a hybrid V-Trans-UNet architecture applicable to segmenting tiled $\mu$CT virtual slabs of $\textit{Porites}$ sp. colonies. The model is pre-trained on weakly annotated data and fine-tuned in a topology-aware manner using fully annotated slice sections with 8k+ manual corallite region annotations. On unseen slices of the same colony, the resulting model reaches 0.94 topological accuracy at a mean Dice score of 0.77 on the same colony and projection axis, and a mean Dice score of 0.63 on a different, biologically unrelated specimen. Whilst our experiments are limited in scale and context, our results show for the first time that visual machine learning can effectively support full 3D individual corallite modelling from $\mu$CT scans of coral skeletons alone. For reproducibility and as a baseline for future research we publish our full dataset of 697 $\mu$CT slices, 37 partial or full slice annotations, and all network weights and source code with this paper.
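
The abstract reports segmentation quality as mean Dice scores. For reference, the Dice coefficient of a predicted mask $P$ and a ground-truth mask $G$ is $2|P \cap G| / (|P| + |G|)$; the sketch below computes it on flattened binary masks in plain Python. How CoralLite averages Dice over corallites, slices, or tiles is not specified here, and the `dice` function is an illustrative helper.

```python
def dice(pred, truth, eps=1e-8):
    """Dice coefficient 2|P∩G| / (|P| + |G|) for equal-length 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # eps keeps the score defined (and equal to 1) when both masks are empty
    return (2.0 * intersection + eps) / (total + eps)
```

For example, `dice([1, 1, 0, 0], [1, 0, 1, 0])` is about 0.5: one shared foreground pixel out of four labeled in total.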

2605.15053 2026-05-18 cs.LG cs.AI

TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale

Anurup Ganguli

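AI summary: This paper proposes TFGN, a new architecture that enables continual pre-training of large language models without catastrophic forgetting, requiring neither replay data nor task identifiers. By overlaying a parameter-efficient, input-conditioned update module on a Transformer, the method achieves both forward and backward transfer across heterogeneous text domains, with notable results on multiple large models and datasets. The work further introduces a closed-loop meta-controller and an operator-level plan vector, improving the model's capacity for autonomous learning and cross-domain adaptation and offering a new architectural solution for continual learning in large language models.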

Comments 65 pages, 10 figures, 40 tables

Abstract

Continually pre-training a large language model on heterogeneous text domains, without replay or task labels, has remained an unsolved architectural problem at LLM scale. Existing methods rely on replay buffers, task identifiers, regularization penalties that scale poorly, or sentence-classification-scale evaluation. We introduce TFGN, an architectural overlay for transformer language models that produces input-conditioned, parameter-efficient updates while leaving the rest of the transformer unchanged. On six heterogeneous text domains (Prose, Python, Math, Biomedical, Chinese, JavaScript) at 1B tokens per phase across three model scales (~398M, ~739M, ~9B) and two regimes (From-Scratch and Retrofit), TFGN achieves backward transfer of -0.007 at LLaMA 3.1 8B Retrofit, HellaSwag retention 0.506/0.504/0.510, and >=99.59% L2-orthogonal gradient separation between domain pairs - with no replay, no task IDs, no Fisher penalty. The same matrices show positive cross-domain forward transfer: held-out JavaScript PPL drops 26.8% at LLaMA-8B Retrofit and 62.0% at GPT-2 Medium From-Scratch purely from Python training. Two extensions on the same substrate close further open problems. A closed-loop meta-control layer (Extension A) reduces forgetting by an additional 81% at ~398M, mapping onto the System A and System M roles of Dupoux et al. (arXiv:2603.15381). An operator-level plan vector (Extension B) reshapes forward-pass behavior at 99.96% cosine fidelity over 30 source->target pairs. The architectural insight is a Read/Write decomposition: the forward pass is fully dense, while cross-domain parameter updates are structured so prior-domain subspaces are not written to. To our knowledge, TFGN is the first architecture that simultaneously closes catastrophic forgetting at LLM scale, realizes a closed-loop autonomous-learning meta-controller, and carries an operator-level latent planner.
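
The abstract's Read/Write decomposition says that cross-domain updates are structured so that prior-domain subspaces are not written to. One standard way to obtain that property, used for example in orthogonal-gradient continual-learning methods, is to project each new update onto the orthogonal complement of a basis for earlier domains. The pure-Python sketch below shows only that projection; it is an illustrative stand-in, not TFGN's actual mechanism, and `orthonormalize` and `project_out` are hypothetical helpers.

```python
import math


def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-10:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update, basis):
    """Remove the components of `update` that lie in span(basis)."""
    out = list(update)
    for b in basis:
        coef = sum(oi * bi for oi, bi in zip(out, b))
        out = [oi - coef * bi for oi, bi in zip(out, b)]
    return out
```

With prior-domain directions along the first two axes, an update `[3, 4, 5]` becomes `[0, 0, 5]`: it still changes the weights, but never along directions earlier domains rely on.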

2605.15010 2026-05-18 cs.CV

3D Skew-Normal Splatting

Xiangru Wu, Ke Fan, Yanwei Fu

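AI summary: This paper proposes Skew-Normal Splatting (SNS), a method that increases the representational power of 3D Gaussian Splatting (3DGS) for real-time novel-view synthesis. By using the Azzalini skew-normal distribution as its primitive, SNS can flexibly model both symmetric and asymmetric structures, which makes it especially expressive at object boundaries and one-sided surfaces. SNS also remains analytically tractable, and it improves training stability through a decoupled parameterization and a block-wise optimization strategy. Experiments show that it outperforms standard Gaussians and other non-Gaussian kernels on multiple benchmarks.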
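
For reference, the univariate Azzalini skew-normal density with location $\xi$, scale $\omega$, and shape $\alpha$ is $f(x) = \frac{2}{\omega}\,\phi(z)\,\Phi(\alpha z)$ with $z = (x - \xi)/\omega$, where $\phi$ and $\Phi$ are the standard normal pdf and cdf; $\alpha = 0$ recovers the Gaussian. The sketch below evaluates only this 1D density; the multivariate kernel, parameterization, and rasterization used by SNS are not shown.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, loc=0.0, scale=1.0, shape=0.0):
    """Azzalini skew-normal density; shape=0 gives the normal density."""
    z = (x - loc) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(shape * z)
```

The factor $\Phi(\alpha z)$ tilts mass to one side while the density still integrates to 1, which is why such a kernel can represent a one-sided surface without losing a closed form.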

2605.14978 2026-05-18 cs.CL

Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

Jie Jiang, Xing Sun, Ruotian Chen, Jianan Su, Kaixin Shen

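AI summary: This paper studies how performance-driven policy optimization can improve the efficiency of speculative decoding. It proposes PPOW, a reinforcement learning framework that shifts draft-model optimization from conventional token-level imitation to window-level optimization. PPOW combines a cost-aware speedup reward, a distribution-based proximity reward, and an adaptive divergence-aware windowing mechanism that prioritizes high-confidence windows. Experiments show that PPOW substantially improves the acceptance length and speedup of speculative decoding across multiple models and benchmarks.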
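
Acceptance length, the quantity PPOW aims to increase, is easiest to see in the greedy verification rule of standard speculative decoding: the target model checks the draft tokens left to right, keeps the longest agreeing prefix, and then contributes one token of its own. The sketch below is that textbook greedy rule, not PPOW's training objective; plain token lists stand in for model outputs, and `greedy_verify` is a hypothetical helper.

```python
def greedy_verify(draft_tokens, target_tokens):
    """Greedy speculative-decoding verification.

    draft_tokens:  k tokens proposed by the draft model.
    target_tokens: the target model's greedy choice at each of the k draft
                   positions plus one extra position (length k + 1).
    Returns the tokens emitted this step: the accepted draft prefix plus the
    target's token at the first mismatch (or its bonus token if all match)."""
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")
    accepted = 0
    while accepted < len(draft_tokens) and draft_tokens[accepted] == target_tokens[accepted]:
        accepted += 1
    return list(draft_tokens[:accepted]) + [target_tokens[accepted]]
```

Each verification step costs one target-model pass, so the number of tokens emitted per step (accepted prefix plus one) is what drives the speedup; a draft model that agrees over longer windows yields more tokens per pass.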

2605.14892 2026-05-18 cs.AI

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

Shihao Qi, Jie Ma, Rui Xing, Wei Guo, Xiao Huang, Zhitao Gao, Jianhao Deng, Jun Liu, Lingling Zhang, Bifan Wei, Boqian Yang, Pinghui Wang, Jianwen Sun, Jing Tao, Yaqiang Wu, Hui Liu, Yu Yao, Tongliang Liu

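AI summary: This survey reviews progress on collaboration, failure attribution, and self-evolution in LLM-based multi-agent systems. It observes that existing work tends to study individual-agent capability, collaboration mechanisms, or self-evolution in isolation, overlooking the causal relationships among them. The paper proposes a unified framework, the LIFE process, spanning four stages: building capability foundations, collaborative integration, failure attribution, and autonomous evolution. It systematically analyzes the dependencies among these stages and outlines cross-stage research directions, aiming to advance self-organizing multi-agent systems capable of continual diagnosis, structural adjustment, and behavioral optimization.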

2605.14884 2026-05-18 cs.LG

AIMing for Standardised Explainability Evaluation in GNNs: A Framework and Case Study on Graph Kernel Networks

Magdalena Proszewska, N. Siddharth

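AI summary: Graph neural networks (GNNs) have made significant progress on graph-structured data, but the field lacks a comprehensive framework for evaluating explainability. This paper proposes AIM, a flexible and broadly applicable framework that systematically evaluates explainability along three dimensions: accuracy, instance-level explanations, and model-level explanations. Applying AIM to inherently interpretable GNNs such as graph kernel networks (GKNs), the authors identify limitations in their explanations, improve the model accordingly, and propose xGKN, which improves interpretability while maintaining high accuracy, offering a more practical and effective approach to GNN explainability research.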

Comments 19 pages, 4 figures, 8 tables

2605.14876 2026-05-18 cs.CV cs.AI

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

Hanbo Cheng, Limin Lin, Ruo Zhang, Yicheng Pan, Jun Du

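AI summary: Despite rapid technical progress, current text-to-image models mostly rely on a single-step generation paradigm, struggle with complex semantic content, and gain only limited improvements from scaling parameters. To address hallucination, unstable optimization, and inference latency in multi-step reasoning approaches, this paper proposes CLVR, a closed-loop visual reasoning framework that tightly integrates vision-language logical planning with pixel-level diffusion generation. It introduces techniques including reinforcement learning driven by proxy prompts and Δ-space weight merging, which improve both generation quality and reasoning efficiency. Experiments show that CLVR outperforms existing open-source models on multiple benchmarks and approaches the performance of commercial models.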
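
The summary does not define Δ-space weight merging. One common reading, task-vector arithmetic, merges fine-tuned models by adding weighted parameter deltas $\Delta_i = \theta_i - \theta_{\text{base}}$ back onto the base weights. The sketch below shows that generic technique on flat parameter dictionaries; treat it as an assumption about, not a description of, CLVR's procedure, and `merge_deltas` as a hypothetical helper.

```python
def merge_deltas(base, finetuned_models, weights):
    """theta = theta_base + sum_i w_i * (theta_i - theta_base), per parameter.

    base and each fine-tuned model map parameter names to float values."""
    if len(finetuned_models) != len(weights):
        raise ValueError("need one weight per fine-tuned model")
    merged = {}
    for name, base_value in base.items():
        delta = sum(w * (model[name] - base_value)
                    for model, w in zip(finetuned_models, weights))
        merged[name] = base_value + delta
    return merged
```

Working in delta space lets each fine-tuning run contribute only what it changed, and the weights control how strongly each change is kept; with real models the same arithmetic applies tensor by tensor.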

2605.14665 2026-05-18 cs.AI cs.CL cs.IR

Falkor-IRAC: Graph-Constrained Generation for Verified Legal Reasoning in Indian Judicial AI

Joy Bose

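AI summary: This paper proposes Falkor-IRAC, a graph-constrained generation framework designed to improve the accuracy and reliability of legal reasoning in Indian judicial AI systems. Built on an IRAC (Issue, Rule, Analysis, Conclusion) knowledge graph, it represents judgments of the Supreme Court and High Courts of India as graph nodes and integrates procedural state transitions, precedent relationships, and statutory citations. At inference time, the system accepts only generated outputs that can be verified against the graph structure, reducing erroneous citations and incomplete reasoning chains, and it actively detects conflicts between legal doctrines, offering a new approach to trustworthy reasoning in legal AI.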

Comments 20 pages, 8 figures, 4 tables

Abstract

Legal reasoning is not semantic similarity search. A court judgment encodes constrained symbolic reasoning: precedent propagation, procedural state transitions, and statute-bound inference. These are properties that vector-based retrieval-augmented generation (RAG) cannot faithfully represent. Hallucinated precedents, outdated statute citations, and unsupported reasoning chains remain persistent failure modes in LLM-based legal AI, with real consequences for access to justice in high-caseload jurisdictions such as India. This paper presents Falkor-IRAC, a graph-constrained generation framework for Indian legal AI that grounds generation in structured reasoning over an IRAC (Issue, Rule, Analysis, Conclusion) knowledge graph. Judgments from the Supreme Court and High Courts of India are ingested as IRAC node structures enriched with procedural state transitions, precedent relationships, and statutory references, stored in FalkorDB for low-latency agentic traversal. At inference time, LLM-generated answers are accepted only if a valid supporting path can be traced through the graph, a check performed by a falsifiability oracle called the Verifier Agent. The system also detects doctrinal conflicts as a first-class output rather than silently resolving them. Falkor-IRAC is evaluated using graph-native metrics: citation grounding accuracy, path validity rate, hallucinated precedent rate, and conflict detection rate. These metrics are argued to be more appropriate for legal reasoning evaluation than BLEU and ROUGE. On a proof-of-concept corpus of 51 Supreme Court judgments, the Verifier Agent correctly validated citations on completed queries and correctly rejected fabricated citations. Evaluation against vector-only RAG baselines is left for future work. The companion InIRAC dataset, 500+ structured Indian court judgments with IRAC annotations, is released alongside this paper.
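
The Verifier Agent accepts an answer only if a supporting path can be traced through the graph. A minimal stand-in for that check is breadth-first search over an adjacency map: a fabricated precedent is rejected because it is not a node, and an unsupported inference is rejected because no path connects the cited nodes. The graph and node names below are hypothetical toys; the real system queries FalkorDB, whose API is not used here.

```python
from collections import deque


def has_supporting_path(graph, source, target):
    """True if `target` is reachable from `source` in a directed graph given as
    {node: [successor, ...]}; nodes absent from the graph are never supported."""
    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    if source not in nodes or target not in nodes:
        return False  # e.g. a hallucinated precedent that was never ingested
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Toy IRAC-style chain with hypothetical node identifiers.
GRAPH = {
    "Issue:I1": ["Rule:R1"],
    "Rule:R1": ["Analysis:A1"],
    "Analysis:A1": ["Conclusion:C1"],
    "Conclusion:C1": [],
}
```

The check is deliberately falsifiable: it can only say "supported" when an explicit chain exists, so an LLM answer that cites a node outside the graph fails regardless of how plausible its wording is.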

2605.14401 2026-05-18 cs.CL cs.AI

Agentic Recommender System with Hierarchical Belief-State Memory

Xiang Shen, Yuhang Zhou, Yifan Wu, Zhuokai Zhao, Siyu Lin, Lei Huang, Qianqian Zhong, Lizhu Zhang, Benyu Zhang, Xiangjun Fan, Hong Yan

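AI summary: This paper proposes MARS, a memory-augmented agentic recommender system that uses a hierarchical belief-state memory to model recommendation as a partially observable problem, allowing it to capture users' evolving preferences more accurately. MARS organizes memory into three tiers (event memory, preference memory, and user-profile memory) and defines a complete lifecycle of six operations (extraction, reinforcement, weakening, consolidation, forgetting, and reconstruction), managed dynamically by an LLM-based scheduler. Experiments show that MARS achieves significant gains on multiple recommendation benchmarks, outperforming state-of-the-art methods.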

Comments 4 figures, 8 tables

2605.14354 2026-05-18 cs.CL

LLM-based Detection of Manipulative Political Narratives

Sinclair Schneider, Florian Steuber, Gabi Dreo Rodosek

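AI summary: This paper proposes an LLM-based computational framework for detecting and structuring manipulative political narratives. The method first filters out manipulative posts using few-shot prompting that includes examples of legitimate criticism, then embeds the posts and reduces their dimensionality with UMAP and clusters them with HDBSCAN to discover new narrative groups. The approach requires no predefined target categories and identified 41 manipulative narrative clusters in more than 1.2 million social media posts, providing a new tool for analyzing political discourse.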

Comments This paper has been submitted to the upcoming 18th International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2026)

2605.14311 2026-05-18 cs.LG cs.AI cs.HC

Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment

Yuchen Sun, Pei Fu, Shaojie Zhang, Anan Du, Xiuwen Xi, Ruoceng Zhang, Zhenbo Luo, Jian Luan, Chongyang Zhang

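AI summary: This paper examines a key weakness of test-time scaling (TTS) for general-purpose graphical user interface (GUI) agents: existing critic models rely on binary classification and therefore cannot distinguish valid actions from plausible-looking but invalid ones. The authors propose BBCritic, a continuous semantic-alignment approach that uses two-stage contrastive learning to recover the hierarchical structure suppressed by binary labels, and introduce BBBench, the first fine-grained evaluation benchmark for this setting. Experiments show that the method outperforms existing large models without extra annotation and transfers strongly zero-shot across platforms.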

Comments 28 pages including appendix. Code and BBBench benchmark to be released

2605.14309 2026-05-18 cs.CV cs.AI cs.LG

ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

Shen Lin, Jing Lin, Junhao Dong, Piotr Koniusz, Li Xu

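AI summary: This paper proposes ICED, a concept-level machine unlearning method for vision-language models (VLMs) based on interpretable concept decomposition. It targets a limitation of conventional image- or instance-level unlearning, which struggles to remove target knowledge precisely without affecting unrelated semantics. The method builds a task-relevant concept vocabulary with a multimodal large language model and decomposes visual representations into sparse, non-negative combinations of semantic concepts, enabling precise suppression of target concepts in an image while preserving non-target semantics and cross-modal knowledge. Experiments show that, while maintaining model performance, the method forgets target knowledge more thoroughly and better preserves non-target information in images.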

2605.14205 2026-05-18 cs.AI

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang

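AI summary: This paper proposes SimPersona, a framework that addresses the inability of LLM-based e-commerce agents to capture the heterogeneity and distributional properties of real buyer populations. The method learns discrete buyer types from historical clickstreams and maps them to compact persona tokens that guide the agent's behavior. Experiments show that SimPersona simulates real buyer behavior effectively, closely aligns with real conversion rates, and performs strongly across multiple e-commerce scenarios.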

Abstract

LLM-based web agents can navigate live storefronts, yet they often collapse to a single "average buyer" policy, failing to capture the heterogeneous and distributional nature of real buyer populations. Existing personalization methods rely on hand-crafted prompt-based personas that are brittle, difficult to scale, context-inefficient, and unable to faithfully represent population-level behavior. We introduce SimPersona, a novel framework that learns discrete buyer types from historical traffic and exposes them to LLM-based web agents as compact persona tokens. Given raw clickstreams, a behavior-aware VQ-VAE induces a discrete buyer-type space that captures the statistical structure of real buyer behavior and merchant-specific buyer population distributions. To provide behavior-specific guidance to LLM-based web agents, SimPersona maps each learned buyer type to a dedicated persona token in the LLM agent vocabulary and fine-tunes the agent with these tokens on real browsing traces. At inference, each synthetic buyer is assigned to a learned buyer type with a single encoder forward pass, requiring no retraining or store-specific prompt engineering. For population-level simulation, SimPersona samples buyer types from each merchant's empirical distribution over the learned VQ-VAE codebook and instantiates agents with the corresponding persona tokens, preserving merchant-specific buyer population distributions. Evaluated on $8.37$M buyers across $42$ held-out live storefronts, SimPersona achieves $78\%$ conversion-rate alignment with real buyers, exhibits interpretable behavioral variation across buyer types, and outperforms a baseline with $8\times$ more parameters on goal-oriented shopping tasks. We further release an open-source data pipeline that converts raw e-commerce event logs into buyer representations and agent-training traces.
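
Two mechanics from the abstract are easy to sketch: assigning a buyer to the nearest entry of a VQ-VAE codebook (the single encoder forward pass at inference ends in this lookup), and sampling buyer types from a merchant's empirical distribution over codes. The sketch assumes the encoder output is already available as a vector; the codebook values, the function names, and the `<persona_k>` token format are hypothetical.

```python
import random
from collections import Counter


def nearest_code(z, codebook):
    """Index of the codebook vector closest to z in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((zi - ci) ** 2 for zi, ci in zip(z, codebook[k])))


def sample_buyer_types(merchant_codes, n, seed=0):
    """Sample n buyer types from a merchant's empirical distribution over codes."""
    counts = Counter(merchant_codes)
    types = sorted(counts)
    rng = random.Random(seed)
    return rng.choices(types, weights=[counts[t] for t in types], k=n)


def persona_token(code):
    return f"<persona_{code}>"  # hypothetical token format added to the vocabulary
```

Sampling from observed code frequencies, rather than always using the most common type, is what keeps a simulated population close to each merchant's real mix of buyers instead of collapsing to an "average buyer".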

2605.14087 2026-05-18 cs.CL cs.LG

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

Mokshit Surana, Archit Rathod, Akshaj Satishkumar

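AI summary: This study systematically evaluates how toxic content is generated and mitigated in large language models, focusing on how effectively the inference-time mitigation technique DExperts reduces harmful outputs. Across three phases of experiments, it finds that DExperts performs very well against explicit toxicity, reaching a 100% safety rate, but its effectiveness drops to 98.5% against implicit hate speech, and it adds substantial latency overhead. The study reveals a performance gap between explicit and implicit toxicity mitigation and provides a useful empirical reference for AI safety.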
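
DExperts, the method evaluated here, steers generation at decoding time by combining the base model's next-token logits with those of a non-toxic "expert" and a toxic "anti-expert": $z = z_{\text{base}} + \alpha\,(z_{\text{expert}} - z_{\text{anti}})$, followed by a softmax. The sketch below applies that ensemble rule to toy logit vectors; the two-token vocabulary and the numbers are made up, and in practice the three logit vectors would come from three language models at every decoding step.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dexperts_probs(base, expert, anti_expert, alpha=2.0):
    """Next-token distribution softmax(z_base + alpha * (z_expert - z_anti))."""
    combined = [b + alpha * (e - a) for b, e, a in zip(base, expert, anti_expert)]
    return softmax(combined)


# Toy vocabulary: index 0 = "hello", index 1 = "insult".
base = [1.0, 1.0]
expert = [2.0, 0.0]
anti_expert = [0.0, 2.0]
```

Running every token through three models is also why the study observes a latency cost: the steering happens at inference time rather than being baked into the weights.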

2605.14057 2026-05-18 cs.CL

Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Xubo Lin, Zezhi Deng, Shihao Wang, Grace Hui Yang, Yang Deng

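AI summary: This paper proposes a dual hierarchical reinforcement learning framework for legal inquisitive conversational agents, addressing the tendency of conventional dialogue systems to respond passively to user requests. Two cooperating reinforcement learning agents handle, respectively, strategy-level dialogue management and fine-grained utterance generation, allowing the agent to ask proactive questions that elicit key information, in the manner of a judge's questioning. Experiments show that the method outperforms multiple baselines on a U.S. Supreme Court dataset, marking meaningful progress for high-stakes, domain-specific dialogue systems.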

Comments Accepted in ACL 2026 as Findings

2605.13925 2026-05-18 cs.RO

Towards Robotic Dexterous Hand Intelligence: A Survey

Weiguang Zhao, Tian Liang, Xihao Guo, Rui Zhang, Irwin King, Kaizhu Huang

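AI summary: This survey reviews research on robotic dexterous hands, systematically analyzing the state of the art and open challenges in hardware design, control and learning methods, and datasets and evaluation. From four complementary perspectives, it lays out key trade-offs in actuation, sensing, and control strategies and summarizes current limitations and future directions, aiming to give the field a structured understanding and research guidance.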

2605.13142 2026-05-18 cs.AI math.OC

A Constraint Programming Approach for n-Day Lookahead Playoff Clinching in the NHL

Gili Rosenberg, Kyle E. C. Booth, J. Kyle Brubaker, Ruben S. Andrist

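AI summary: This paper studies how to determine whether a National Hockey League (NHL) team can clinch a playoff berth within the next $n$ days. Given the intricate qualification rules and tie-breaking procedures, the authors propose a constraint-programming-based tree search that efficiently analyzes all possible combinations of game outcomes over the next $n$ days and determines whether the team can secure a playoff spot. The method combines preprocessing, pruning strategies, and node-ordering heuristics to speed up the search; it is validated on extensive real-season data, scales well, and can be extended to other related sports metrics.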

Comments 18 pages, 5 figures, 4 tables. Accepted to CP 2026
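
The core question, whether a team is guaranteed a playoff spot however the remaining games turn out, can be illustrated with brute-force enumeration under heavy simplifying assumptions: every game gives 2 points to the winner and 0 to the loser, ties in points are resolved against the team of interest (worst case), and there are no overtime points or tie-breakers. The function name is hypothetical, and the $n$-day window is not modeled: `games` is simply every remaining game. The paper's constraint-programming search with pruning exists precisely because this naive enumeration grows as $2^{\text{games}}$.

```python
from itertools import product


def has_clinched(points, games, team, spots):
    """True if `team` finishes in the top `spots` under every outcome of `games`.

    points: {team: current points}; games: list of (home, away) pairs.
    Simplified scoring: winner +2, loser +0; a tie in points counts against
    `team` (worst case), so no real NHL tie-breaker rules are modeled."""
    for outcome in product((0, 1), repeat=len(games)):
        final = dict(points)
        for (home, away), home_wins in zip(games, outcome):
            final[home if home_wins else away] += 2
        rivals_ahead = sum(1 for t, p in final.items()
                           if t != team and p >= final[team])
        if rivals_ahead >= spots:
            return False  # found a scenario in which `team` misses the playoffs
    return True
```

A single counterexample outcome is enough to refute a clinch, which is why the search can stop early and why good node ordering (trying damaging outcomes first) pays off.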

2605.13073 2026-05-18 cs.CV

HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization

Yulei Kang, Tianze Zhu, Jian-Fang Hu, Jianhuang Lai, Wei-Shi Zheng

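AI summary: This paper tackles the dynamic distractors and illumination-induced cross-view appearance inconsistencies that 3D Gaussian Splatting (3DGS) reconstruction faces in real-world scenes, proposing a conflict-aware optimization framework. Using mask generation guided by semantic consistency and a dual-view gradient harmonization strategy, the method suppresses unreliable supervision and mitigates gradient conflicts between views, improving reconstruction quality and stability. Experiments show that it achieves state-of-the-art rendering quality in complex real-world scenes.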
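
The summary does not spell out the dual-view gradient harmonization rule. A well-known technique for resolving gradient conflicts is PCGrad-style projection: when two gradients have a negative dot product, each one has its component along the other removed before they are combined. The sketch below shows that generic rule for two per-view gradients as an assumption about the flavor of harmonization, not HarmoGS's exact update.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def harmonize(g1, g2):
    """PCGrad-style combination of two gradients given as lists of floats.

    If the gradients conflict (negative dot product), each one is projected
    onto the normal plane of the other (using the original vectors) and the
    projections are summed; otherwise the gradients are simply added."""
    d = dot(g1, g2)
    if d >= 0:
        return [a + b for a, b in zip(g1, g2)]
    p1 = [a - d / dot(g2, g2) * b for a, b in zip(g1, g2)]  # g1 minus its part along g2
    p2 = [b - d / dot(g1, g1) * a for a, b in zip(g1, g2)]  # g2 minus its part along g1
    return [a + b for a, b in zip(p1, p2)]
```

For `g1 = [1, 0]` and `g2 = [-1, 1]`, a plain sum would be `[0, 1]`, cancelling view 1 entirely; the harmonized update `[0.5, 1.5]` keeps progress for both views.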

2605.12667 2026-05-18 cs.LG cs.AI

ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization

Nirmal Patel, Fei Wang, Inderjit S. Dhillon

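AI summary: This study addresses noise in the discrete rewards used in reinforcement learning from AI feedback (RLAIF) for large language model alignment and proposes ODRPO, a robust policy optimization framework. Its core idea is to decompose multi-level discrete rewards into a sequence of binary ordinal indicators, structurally isolating evaluation noise, and to compute advantages independently across progressively higher success thresholds, improving learning stability and robustness. Experiments show that ODRPO significantly outperforms existing methods on multiple benchmarks with almost no added training-time overhead.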

Abstract

The alignment of Large Language Models (LLMs) utilizes Reinforcement Learning from AI Feedback (RLAIF) for non-verifiable domains such as long-form question answering and open-ended instruction following. These domains often rely on LLM-based auto-raters to provide granular, multi-tier discrete rewards (e.g., 1-10 rubrics) that are inherently stochastic due to prompt sensitivity and sampling randomness. We empirically verify that this auto-rater stochasticity can propagate to and corrupt standard advantage estimators like GRPO and MaxRL, as noisy reward samples can skew normalization statistics and degrade the global learning signal. Empirically, sampling more rewards and taking a majority vote may reduce the noise and improve performance, but this approach is computationally expensive. To address this bottleneck, we introduce $\textbf{O}$rdinal $\textbf{D}$ecomposition for $\textbf{R}$obust $\textbf{P}$olicy $\textbf{O}$ptimization ($\textbf{ODRPO}$), a framework that structurally isolates evaluation noise by decomposing discrete rewards into a sequence of ordinal binary indicators. By independently computing and accumulating advantages across these progressively challenging success thresholds, ODRPO prevents outlier evaluations from corrupting the global update while establishing an implicit, variance-aware learning curriculum. Empirically, ODRPO achieves robust performance on Qwen2.5-7B and Qwen3-4B models, outperforming baselines with relative improvements of up to 14.8% on FACTS-grounding-v2 and 7.5% on Alpaca-Evals. Critically, these gains are achieved with negligible training-time overhead, as ODRPO requires no additional compute per step compared to standard estimators. Supported by theoretical analysis confirming its optimization stability, ODRPO provides a scalable and robust framework for aligning models within the noisy, discrete evaluation landscape of modern RLAIF.
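
The ordinal decomposition can be made concrete for a group of sampled responses scored on a 1..K rubric. For each threshold $k = 2, \dots, K$ the reward becomes the binary indicator $\mathbb{1}[r \ge k]$; a GRPO-style group-normalized advantage is computed per threshold, and the results are summed. The sketch below follows that description; the exact normalization and any weighting across thresholds in ODRPO are assumptions here, and `ordinal_advantages` is a hypothetical helper.

```python
import statistics


def ordinal_advantages(rewards, max_level):
    """Sum of per-threshold, group-normalized advantages for discrete rewards.

    rewards: integer scores in 1..max_level for a group of sampled responses.
    For each threshold k in 2..max_level the reward is binarized as r >= k;
    thresholds on which the whole group agrees contribute zero advantage."""
    total = [0.0] * len(rewards)
    for k in range(2, max_level + 1):
        indicators = [1.0 if r >= k else 0.0 for r in rewards]
        mean = statistics.fmean(indicators)
        std = statistics.pstdev(indicators)
        if std == 0.0:
            continue  # no signal at this threshold
        for i, x in enumerate(indicators):
            total[i] += (x - mean) / std
    return total
```

Because each threshold is normalized separately, one outlier score can only move the indicators at the thresholds it crosses, instead of shifting a single mean and standard deviation that every response in the group depends on. Computing a few binary indicators is also cheap relative to sampling extra rater judgments.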

2605.11885 2026-05-18 cs.AI q-bio.NC

From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP

Justus Meyer zu Bexten, Nico Scherf, Bogdan Franczyk, Simon M. Hofmann

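AI summary: This paper studies how attention-aware layer-wise relevance propagation (LRP) can be used to explain EEG foundation models (EEG-FMs), whose interpretability is poor. Extending LRP from conventional convolutional neural networks to Transformer-based EEG-FMs, the authors find that the method can not only validate model decisions but also surface new, biologically meaningful hypotheses. The study demonstrates LRP on motor imagery and emotion prediction tasks, reveals the models' reliance on signals from specific brain regions, and offers a new perspective on EEG-FM behavior.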

Comments 18 pages, 6 figures
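
For readers new to LRP: relevance is redistributed backward layer by layer so that it is approximately conserved. The sketch below applies the basic LRP-ε rule to a single dense layer, $R_j = \sum_k \frac{a_j w_{jk}}{z_k + \epsilon\,\mathrm{sign}(z_k)} R_k$ with $z_k = \sum_j a_j w_{jk} + b_k$; the attention-specific rules needed for Transformer-based EEG-FMs are not shown, and `lrp_epsilon_dense` is an illustrative helper.

```python
def lrp_epsilon_dense(activations, weights, bias, relevance_out, eps=1e-6):
    """Propagate relevance through one dense layer with the LRP-epsilon rule.

    activations:   inputs a_j (length J)
    weights:       weights[j][k] connecting input j to output k (J x K)
    bias:          biases b_k (length K)
    relevance_out: relevance R_k of each output (length K)
    Returns the input relevances R_j (length J)."""
    num_in, num_out = len(activations), len(bias)
    z = [sum(activations[j] * weights[j][k] for j in range(num_in)) + bias[k]
         for k in range(num_out)]
    # epsilon stabilizer pushes each z_k away from zero (sign(0) taken as +1)
    stabilized = [zk + eps * (1.0 if zk >= 0 else -1.0) for zk in z]
    return [sum(activations[j] * weights[j][k] * relevance_out[k] / stabilized[k]
                for k in range(num_out))
            for j in range(num_in)]
```

With zero biases and a small ε, the input relevances sum to (almost exactly) the output relevances; repeating this rule from the prediction back to the input channels yields the per-electrode, per-time-step relevance maps used to check which brain regions a model relies on.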

2605.11485 2026-05-18 cs.RO

Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

Lasse Peters, Laura Ferranti, Andrea Bajcsy, Javier Alonso-Mora

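AI summary: This paper studies how to learn coordinated multi-agent behavior from single-agent demonstrations alone, without multi-agent demonstration data. It proposes CoDi, a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function to produce coordinated behavior. Coordination is achieved through a new diffusion sampling scheme that needs no additional training and also works with black-box or non-differentiable cost functions. Experiments show that CoDi is more data-efficient and better coordinated than conventional multi-agent methods.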

Abstract

Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, like multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: as the joint state-action space grows exponentially with the number of agents, collecting a sufficient amount of coordinated multi-agent demonstrations becomes extremely costly. In this work, we ask: how can we leverage single-agent demonstration data to learn multi-agent policies? We present Coordinated Diffusion (CoDi), a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function, without requiring any coordinated demonstrations. We derive a new diffusion-based sampling scheme wherein the diffusion score function decomposes into independent, single-agent pre-trained base policies plus a cost-driven guidance term that coordinates these base policies into cohesive multi-agent behavior. We show that this guidance term can be estimated in a gradient-free manner, making CoDi applicable to black-box, non-differentiable cost functions without additional training. Theoretically and empirically, we analyze the conditions under which this composition can faithfully approximate a target multi-agent behavior. We find a complementary role for demonstration data versus the cost function: single-agent demonstrations must cover the support of the desired multi-agent behavior, while the cost function must promote desired behavior from this product of single-agent policies. Our results in simulation and hardware experiments of a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data, is more data-efficient than multi-agent baselines, and highlights the importance of joint guidance, base policy support, and cost design.
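
Gradient-free guidance by a black-box cost can be illustrated with a simpler, related estimator: draw candidate actions independently from each agent's base policy, then resample joint candidates with weights proportional to $\exp(-\text{cost}/\lambda)$. This self-normalized importance-resampling sketch is a stand-in that conveys how a cost can steer a product of single-agent policies without gradients; it is not CoDi's diffusion sampler, and the 1D uniform "policies" and handover cost are toy assumptions.

```python
import math
import random


def coordinate(sample_a, sample_b, cost, num_candidates=200, temperature=0.1, seed=0):
    """Pick a joint action by cost-weighted resampling of independent samples.

    sample_a / sample_b: functions rng -> action, standing in for each agent's
    independently trained base policy. cost(a, b) is a black-box joint cost."""
    rng = random.Random(seed)
    candidates = [(sample_a(rng), sample_b(rng)) for _ in range(num_candidates)]
    costs = [cost(a, b) for a, b in candidates]
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy task: two agents must pick positions that sum to 1 (e.g. a handover point).
a, b = coordinate(lambda r: r.random(), lambda r: r.random(),
                  lambda a, b: (a + b - 1.0) ** 2, temperature=0.001)
```

The abstract's point about complementary roles shows up even here: resampling can only select joint actions that the independent policies actually propose (their support), while the cost decides which of those combinations count as coordinated.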

2605.11118 2026-05-18 cs.AI cs.IR

A Cascaded Generative Approach for e-Commerce Recommendations

Moein Hasani, Hamidreza Shahidi, Trace Levinson, Yuan Zhong, Guanghua Shu, Vinesh Gudla, Tejaswi Tenneti

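AI summary: This paper proposes a cascaded generative framework for building personalized storefronts in e-commerce recommendation. It decomposes storefront generation into two generative tasks: generating themes for page sections, and generating constrained keywords for each section to drive product retrieval. A teacher-student fine-tuning strategy makes the model efficient enough for production, and combining it with traditional ranking models yields a hybrid architecture. Experiments show an improvement of about 2.7% over the baseline in add-to-cart rate per page view.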

2605.10893 2026-05-18 cs.CL

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

Reza Khanmohammadi, Erfan Miahi, Simerjot Kaur, Charese H. Smiley, Ivan Brugere, Kundan Thind, Mohammad M. Ghassemi

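AI summary: This paper studies the tendency of large vision-language models (LVLMs) to answer questions from language priors rather than image evidence and proposes BICR, a model-agnostic confidence estimation framework. During training, BICR contrasts hidden states from real image-question pairs with those obtained when the image is masked out, learning to distinguish visually grounded answers from purely language-driven ones and making confidence estimates more reliable without extra inference cost. Experiments show that BICR performs strongly across multiple benchmarks, significantly outperforming existing methods while using fewer parameters.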
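
The contrastive-ranking idea can be written as a margin ranking loss: a confidence head should score the hidden state computed with the real image higher than the one computed with the image blinded. The sketch below shows only that hinge objective on precomputed scalar scores; the margin value, the function name, and how BICR pairs and weights examples (for instance, by answer correctness) are assumptions.

```python
def blind_contrastive_ranking_loss(real_scores, blind_scores, margin=0.5):
    """Mean hinge loss max(0, margin - (s_real - s_blind)) over paired examples.

    real_scores:  confidence scores from hidden states with the real image.
    blind_scores: scores for the same questions with the image masked out."""
    if len(real_scores) != len(blind_scores) or not real_scores:
        raise ValueError("need equal, non-empty lists of paired scores")
    losses = [max(0.0, margin - (r - b)) for r, b in zip(real_scores, blind_scores)]
    return sum(losses) / len(losses)
```

Since the blinded pass is needed only to build training pairs, scoring a new example at inference still takes a single forward pass, consistent with the summary's claim of no extra inference cost.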

2605.10799 2026-05-18 cs.LG cs.AI cs.CL

The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies

Gabriel Garcia

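AI summary: This paper identifies a format-induced confound in the standard method for evaluating chain-of-thought (CoT) faithfulness: when a benchmark's reasoning chains end with an explicit final answer, existing corruption experiments mainly measure the influence of that final-answer position rather than the importance of the intermediate computation steps. Experiments show that removing the final answer or supplying a wrong one significantly changes model behavior, and that the size of this effect varies with model scale. The paper further proposes a three-part protocol to improve future corruption-based faithfulness studies.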

Comments 34 pages, 6 figures, 13 tables. Submitted to NeurIPS 2026. Code and data: https://github.com/Gpgabriel25/LastWordWinsCoT

2605.10057 2026-05-18 cs.AI cs.MA

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning

Ruiyi Yang, Lihuan Li, Hao Xue, Flora D. Salim

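AI summary: This paper proposes STAR, a failure-aware routing framework for allocating work among agents in multi-agent spatiotemporal reasoning. By explicitly modeling inter-agent control decisions as a state-conditioned transition policy, it dynamically selects the appropriate specialist agent according to task type and execution status, handling different kinds of execution failure in different ways. By combining expert-specified nominal routing paths with recovery transitions learned from execution traces, STAR substantially improves the system's robustness and interpretability under abnormal conditions. Experiments show that STAR outperforms existing methods on multiple spatiotemporal reasoning benchmarks, especially when execution deviates from the expected path.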

Comments 30 pages, 13 figures

Abstract

Compositional spatiotemporal reasoning often requires a system to invoke multiple heterogeneous specialists, such as geometric, temporal, topological, and trajectory agents. A central question is how such a system should route among specialists when execution does not simply succeed or fail, but fails in qualitatively different ways. Existing tool-augmented and multi-agent LLM systems typically leave this routing decision implicit in language generation, making recovery ad hoc, difficult to interpret, and hard to optimize. This paper presents STAR (Spatio-Temporal Agent Router), a failure-aware routing framework that externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At the center of STAR is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool–query mismatches, rather than collapsing them into a generic retry signal. Specialists execute through a tool-grounded extract–compute–deposit protocol and write intermediate results to a shared blackboard for downstream fusion. Results prove that retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. Across three spatiotemporal benchmarks and eight backbone LLMs, STAR improves over multiple baselines with the clearest gains on queries whose execution deviates from the nominal routing path. Router-specific ablations and recovery analyses further show that typed failure-aware routing, rather than specialist composition alone, is a key factor for these improvements.
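
The agent routing matrix can be pictured as a lookup keyed by (current agent, task type, execution status): expert-specified entries cover the nominal path, and recovery entries take over on typed failures. The agent names, status labels, specific recovery targets, and `planner` fallback below are hypothetical; in STAR the recovery transitions are learned from execution traces rather than written by hand.

```python
# (current_agent, task_type, status) -> next agent on the nominal path
NOMINAL = {
    ("geometric", "distance", "ok"): "fusion",
    ("temporal", "duration", "ok"): "fusion",
}

# Typed failures map to different recovery targets instead of a generic retry.
RECOVERY = {
    ("geometric", "distance", "malformed_output"): "geometric",     # re-run same specialist
    ("geometric", "distance", "missing_dependency"): "trajectory",  # produce missing input
    ("temporal", "duration", "tool_query_mismatch"): "planner",     # reformulate the query
}


def route(agent, task_type, status, fallback="planner"):
    """Next agent: nominal route first, then recovery route, else the fallback."""
    key = (agent, task_type, status)
    return NOMINAL.get(key) or RECOVERY.get(key) or fallback
```

Keeping the routing decision in an explicit table rather than in generated text is what makes it inspectable: each transition can be read, counted in traces, and updated independently.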

2605.10052 2026-05-18 cs.CL cs.AI

Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering

Xinyu Zhang, Zhicheng Dou, Deyang Li, Jianjun Tao, Shuo Cheng, Ruifeng Shi, Fangchao Liu, Enrui Hu, Yangkai Ding, Hongbo Wang, Qi Ye, Xuefeng Jin, Zhangchun Zhao

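AI summary: As AI engineering shifts from single-agent prompt and context engineering toward multi-agent coordination engineering, systematically encoding and improving multi-agent collaboration has become a key bottleneck. This paper proposes *Swarm Skills*, a portable, self-evolving multi-agent system specification that introduces semantic structures for roles, workflows, execution boundaries, and self-evolution, turning multi-agent collaboration workflows into distributable assets. It also proposes a self-evolution algorithm that automatically distills successful execution traces and continually refines existing skills, allowing multi-agent coordination strategies to evolve without human intervention.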