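arXivDaily is a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems.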


2605.14401 2026-05-18 cs.CL cs.AI

Agentic Recommender System with Hierarchical Belief-State Memory

Xiang Shen, Yuhang Zhou, Yifan Wu, Zhuokai Zhao, Siyu Lin, Lei Huang, Qianqian Zhong, Lizhu Zhang, Benyu Zhang, Xiangjun Fan, Hong Yan

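Affiliations: Meta Recommendation Systems (MRS)

AI summary: This paper proposes MARS, a memory-augmented agentic recommender system. It models recommendation as a partially observable problem and uses a hierarchical belief-state memory to capture users' evolving preferences more accurately. MARS organizes memory into three tiers (event memory, preference memory, and user-profile memory) and defines a full lifecycle of six operations (extraction, reinforcement, weakening, consolidation, forgetting, and reconstruction) that an LLM-based scheduler manages dynamically. Experiments show that MARS delivers significant gains on several recommendation benchmarks and outperforms existing state-of-the-art methods.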

Comments 4 figures, 8 tables

2605.14354 2026-05-18 cs.CL

LLM-based Detection of Manipulative Political Narratives

Sinclair Schneider, Florian Steuber, Gabi Dreo Rodosek

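Affiliations: University of the Bundeswehr Munich

AI summary: This paper proposes an LLM-based computational framework for detecting and structuring manipulative political narratives. The method pre-filters manipulative posts using few-shot prompting that includes examples of legitimate criticism, then embeds the posts, reduces dimensionality with UMAP, and clusters them with HDBSCAN to discover new narrative groups. It requires no predefined target categories and identified 41 manipulative narrative clusters in more than 1.2 million social media posts, offering a new tool for analyzing political discourse.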

Comments This paper has been submitted to the upcoming 18th International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2026)

2605.14311 2026-05-18 cs.LG cs.AI cs.HC

Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment

Yuchen Sun, Pei Fu, Shaojie Zhang, Anan Du, Xiuwen Xi, Ruoceng Zhang, Zhenbo Luo, Jian Luan, Chongyang Zhang

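Affiliations: Xiaomi Inc.; Shanghai Jiao Tong University

AI summary: This paper studies a key problem in test-time scaling (TTS) for general GUI agents: existing critic models rely on binary classification, so they cannot distinguish valid actions from plausible-looking but invalid ones. The authors propose BBCritic, a continuous semantic alignment method that uses two-stage contrastive learning to recover the hierarchy suppressed by binary labels, and they introduce BBBench, the first fine-grained evaluation benchmark for this task. Experiments show that the method outperforms existing large models without additional annotation and exhibits strong zero-shot transfer across platforms.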

Comments 28 pages including appendix. Code and BBBench benchmark to be released

2605.14309 2026-05-18 cs.CV cs.AI cs.LG

ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

Shen Lin, Jing Lin, Junhao Dong, Piotr Koniusz, Li Xu

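Affiliations: Fujian Normal University; Nanyang Technological University; University of New South Wales; Data61 CSIRO

AI summary: This paper proposes ICED, a concept-level machine unlearning method for vision-language models (VLMs) based on interpretable concept decomposition. It targets a weakness of conventional image-level or instance-level unlearning, which struggles to remove target knowledge precisely without affecting unrelated semantics. The method uses a multimodal large language model to build a task-relevant concept vocabulary and decomposes visual representations into sparse, non-negative combinations of semantic concepts, which allows precise suppression of target concepts in an image while preserving non-target semantics and cross-modal knowledge. Experiments show that the method forgets target knowledge more thoroughly and better retains non-target information in images while maintaining model performance.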

2605.14205 2026-05-18 cs.AI

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ted Chaiwachirasak, Han Li, Lingyun Wang

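Affiliations: Shopify

AI summary: This paper proposes the SimPersona framework to address the inability of LLM-based e-commerce agents to capture the heterogeneity and distributional characteristics of real buyer populations. The method learns discrete buyer types from historical clickstreams and converts them into compact persona labels that guide the agent's decisions. Experiments show that SimPersona simulates real buyer behavior effectively, closely matches observed conversion rates, and performs well across multiple e-commerce scenarios.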

2605.14087 2026-05-18 cs.CL cs.LG

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

Mokshit Surana, Archit Rathod, Akshaj Satishkumar

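Affiliations: University of Illinois Chicago

AI summary: This study systematically evaluates methods for measuring and mitigating toxic generation in large language models, focusing on how well the inference-time mitigation technique DExperts reduces harmful outputs. Across three experimental phases, DExperts reaches a 100% safety rate on explicit toxicity, but the rate drops to 98.5% on implicit hate speech, and the method adds significant latency overhead. The study exposes a performance gap between explicit and implicit toxicity mitigation and provides empirical reference points for AI safety work.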

2605.14057 2026-05-18 cs.CL

Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Xubo Lin, Zezhi Deng, Shihao Wang, Grace Hui Yang, Yang Deng

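Affiliations: Georgetown University; Singapore Management University

AI summary: This paper proposes a dual hierarchical reinforcement learning framework for legal inquisitive conversational agents, addressing the tendency of conventional dialogue systems to respond passively to user requests. Two cooperating reinforcement learning agents handle strategy-level dialogue management and fine-grained utterance generation respectively, so the agent can proactively ask questions to obtain key information, in the manner of a judge's questioning. Experiments show that the method outperforms multiple baselines on a U.S. Supreme Court dataset, marking progress for high-stakes, domain-specific dialogue systems.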

Comments Accepted in ACL 2026 as Findings

2605.13925 2026-05-18 cs.RO

Towards Robotic Dexterous Hand Intelligence: A Survey

Weiguang Zhao, Tian Liang, Xihao Guo, Rui Zhang, Irwin King, Kaizhu Huang

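Affiliations: University of Liverpool; Xi’an Jiaotong-Liverpool University; Duke Kunshan University; The Chinese University of Hong Kong

AI summary: This survey reviews research progress on robotic dexterous hands, systematically analyzing the state of the art and open challenges in hardware design, control and learning methods, and datasets and evaluation. Working from four complementary perspectives, it lays out the key trade-offs in actuation, sensing, and control strategies, and it summarizes the main limitations of current research and directions for future work, with the aim of giving the field a structured overview and research guidance.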

2605.13142 2026-05-18 cs.AI math.OC

A Constraint Programming Approach for n-Day Lookahead Playoff Clinching in the NHL

Gili Rosenberg, Kyle E. C. Booth, J. Kyle Brubaker, Ruben S. Andrist

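Affiliations: Amazon Advanced Solutions Lab

AI summary: This paper studies how to determine whether a National Hockey League (NHL) team can clinch a playoff berth within the next $n$ days. To handle the league's complex qualification rules and tie-breaking procedures, the authors propose a tree search algorithm based on constraint programming that efficiently analyzes all possible combinations of game outcomes over the next $n$ days and decides whether the team can secure a playoff spot. The method combines preprocessing, pruning strategies, and node-ordering heuristics to speed up the search, is validated on extensive real-season data, and extends to other related sports metrics.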

Comments 18 pages, 5 figures, 4 tables. Accepted to CP 2026

2605.13073 2026-05-18 cs.CV

HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization

Yulei Kang, Tianze Zhu, Jian-Fang Hu, Jianhuang Lai, Wei-Shi Zheng

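Affiliations: Sun Yat-sen University; Northeastern University; Guangdong Province Key Laboratory of Information Security Technology; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

AI summary: This paper targets two problems in 3D Gaussian Splatting (3DGS) reconstruction of in-the-wild scenes: dynamic distractors and lighting-induced appearance inconsistency across views. It proposes a conflict-aware optimization framework that uses semantic-consistency-guided mask generation and a dual-view gradient harmonization strategy to suppress unreliable supervision and mitigate cross-view gradient conflicts, improving reconstruction quality and stability. Experiments show state-of-the-art rendering quality on complex real-world scenes.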

2605.12667 2026-05-18 cs.LG cs.AI

ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization

Nirmal Patel, Fei Wang, Inderjit S. Dhillon

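Affiliations: University of Texas at Austin; Google

AI summary: This work addresses noise in discrete rewards in Reinforcement Learning from AI Feedback (RLAIF) for large language model alignment and proposes ODRPO, a robust policy optimization framework. Its core idea is to decompose multi-tier discrete rewards into a sequence of binary ordinal indicators, which structurally isolates evaluation noise, and to compute advantages independently at progressively harder success thresholds, improving learning stability and robustness. Experiments show that ODRPO significantly outperforms existing methods on multiple benchmarks with almost no added training-time overhead.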

Abstract

The alignment of Large Language Models (LLMs) utilizes Reinforcement Learning from AI Feedback (RLAIF) for non-verifiable domains such as long-form question answering and open-ended instruction following. These domains often rely on LLM-based auto-raters to provide granular, multi-tier discrete rewards (e.g., 1-10 rubrics) that are inherently stochastic due to prompt sensitivity and sampling randomness. We empirically verify that auto-rater stochasticity can propagate into and corrupt standard advantage estimators like GRPO and MaxRL, as noisy reward samples can skew normalization statistics and degrade the global learning signal. Empirically, sampling more rewards and taking a majority vote may reduce the noise and improve performance, but this approach is computationally expensive. To address this bottleneck, we introduce Ordinal Decomposition for Robust Policy Optimization (ODRPO), a framework that structurally isolates evaluation noise by decomposing discrete rewards into a sequence of ordinal binary indicators. By independently computing and accumulating advantages across these progressively challenging success thresholds, ODRPO prevents outlier evaluations from corrupting the global update while establishing an implicit, variance-aware learning curriculum. Empirically, ODRPO achieves robust performance on Qwen2.5-7B and Qwen3-4B models, outperforming baselines with relative improvements of up to 14.8% on FACTS-grounding-v2 and 7.5% on Alpaca-Evals. Critically, these gains are achieved with negligible training-time overhead, as ODRPO requires no additional compute per step compared to standard estimators. Supported by theoretical analysis confirming its optimization stability, ODRPO provides a scalable and robust framework for aligning models within the noisy, discrete evaluation landscape of modern RLAIF.
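The decomposition described in the abstract can be illustrated with a minimal Python sketch; it is not the authors' implementation. It relies on the identity $r = 1 + \sum_{t=2}^{K} \mathbb{1}[r \ge t]$ for a reward on a 1..K scale. The GRPO-style group mean/std normalization at each threshold, the unweighted sum across thresholds, the example scores, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, pstdev

def ordinal_indicators(rewards, thresholds):
    """Map each discrete reward to binary indicators 1[r >= t], one row per threshold."""
    return [[1.0 if r >= t else 0.0 for r in rewards] for t in thresholds]

def group_normalized(values, eps=1e-8):
    """Center by the group mean and scale by the group std (assumed GRPO-style)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma < eps:  # every sample agrees at this threshold: no learning signal
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def ordinal_advantages(rewards, thresholds):
    """Accumulate the per-threshold advantages into one advantage per sample."""
    total = [0.0] * len(rewards)
    for row in ordinal_indicators(rewards, thresholds):
        for i, a in enumerate(group_normalized(row)):
            total[i] += a
    return total

rewards = [3, 7, 7, 10]     # hypothetical 1-10 rubric scores for one prompt's samples
thresholds = range(2, 11)   # success thresholds t = 2..10
adv = ordinal_advantages(rewards, thresholds)
```

In this sketch, each threshold contributes a bounded binary signal that is normalized on its own, so a single outlier score shifts only the thresholds it crosses rather than the statistics of the whole group.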


2605.11885 2026-05-18 cs.AI q-bio.NC

From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP

Justus Meyer zu Bexten, Nico Scherf, Bogdan Franczyk, Simon M. Hofmann

Affiliations: Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI); Leipzig University; Neural Data Science and Statistical Computing, Max Planck Institute for Human Cognitive and Brain Sciences; Faculty of Economics, Leipzig University; Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences

AI summary: This paper studies how to interpret EEG foundation models (EEG-FMs) using attention-aware layer-wise relevance propagation (LRP), addressing their poor interpretability. It extends LRP from conventional convolutional neural networks to Transformer-based EEG-FMs and finds that the method can both validate model decisions and surface new, biologically meaningful hypotheses. The study demonstrates LRP on motor imagery and affect prediction tasks, revealing the models' reliance on signals from specific brain regions and offering a new perspective on EEG-FM behavior.

Comments 18 pages, 6 figures

2605.11485 2026-05-18 cs.RO

Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

Lasse Peters, Laura Ferranti, Andrea Bajcsy, Javier Alonso-Mora

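Affiliations: University of California, Berkeley; Delft University of Technology; Carnegie Mellon University

AI summary: This paper studies how to learn coordinated multi-agent behavior from single-agent demonstrations when no multi-agent demonstrations are available. It proposes the CoDi framework, which couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function to generate coordinated behavior. Coordination comes from a new diffusion sampling scheme, and the method applies to black-box or non-differentiable cost functions without additional training. Experiments show that it is more data-efficient and better coordinated than conventional multi-agent methods.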

Abstract

Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, like multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: as the joint state-action space grows exponentially with the number of agents, collecting a sufficient amount of coordinated multi-agent demonstrations becomes extremely costly. In this work, we ask: how can we leverage single-agent demonstration data to learn multi-agent policies? We present Coordinated Diffusion (CoDi), a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function, without requiring any coordinated demonstrations. We derive a new diffusion-based sampling scheme wherein the diffusion score function decomposes into independent, single-agent pre-trained base policies plus a cost-driven guidance term that coordinates these base policies into cohesive multi-agent behavior. We show that this guidance term can be estimated in a gradient-free manner, making CoDi applicable to black-box, non-differentiable cost functions without additional training. Theoretically and empirically, we analyze the conditions under which this composition can faithfully approximate a target multi-agent behavior. We find a complementary role for demonstration data versus the cost function: single-agent demonstrations must cover the support of the desired multi-agent behavior, while the cost function must promote desired behavior from this product of single-agent policies. Our results in simulation and hardware experiments of a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data and is more data-efficient than multi-agent baselines, and they highlight the importance of joint guidance, base policy support, and cost design.


2605.11118 2026-05-18 cs.AI cs.IR

A Cascaded Generative Approach for e-Commerce Recommendations

Moein Hasani, Hamidreza Shahidi, Trace Levinson, Yuan Zhong, Guanghua Shu, Vinesh Gudla, Tejaswi Tenneti

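Affiliations: The Pennsylvania State University

AI summary: This paper proposes a cascaded generative framework for building personalized storefronts in e-commerce recommendation. It decomposes storefront generation into two generative tasks: generating a theme for each page section, and generating constrained keywords for each section to drive product retrieval. A teacher-student fine-tuning strategy makes the models efficient enough for production, and the framework is combined with conventional ranking models in a hybrid architecture. Experiments show an improvement of about 2.7% over the baseline in cart adds per page view.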

2605.10893 2026-05-18 cs.CL

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

Reza Khanmohammadi, Erfan Miahi, Simerjot Kaur, Charese H. Smiley, Ivan Brugere, Kundan Thind, Mohammad M. Ghassemi

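Affiliations: Michigan State University; Independent AI Researcher; JPMorgan AI Research; Henry Ford Health

AI summary: This paper examines how large vision-language models (LVLMs) may answer from language priors rather than from the image, and it proposes BICR, a model-agnostic confidence estimation framework. During training, BICR contrasts the hidden states for real image-question pairs with those obtained when the image is blinded, learning to separate visually grounded answers from purely language-driven ones, which improves the reliability of confidence estimates without adding inference cost. Experiments show that BICR performs strongly across multiple benchmarks and significantly outperforms existing methods with fewer parameters.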

2605.10799 2026-05-18 cs.LG cs.AI cs.CL

The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies

Gabriel Garcia

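Affiliations: Independent Researcher

AI summary: This paper identifies a format-induced confound in standard methods for evaluating chain-of-thought (CoT) faithfulness: when the reasoning chains in a benchmark end with an explicit final answer, existing corruption experiments mainly measure the effect of the answer's position rather than the importance of intermediate computation steps. Experiments show that removing the final answer or supplying a wrong one significantly affects model performance, and that the effect varies with model scale. The paper also proposes a three-part protocol to improve future corruption-based faithfulness studies.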

Comments 34 pages, 6 figures, 13 tables. Submitted to NeurIPS 2026. Code and data: https://github.com/Gpgabriel25/LastWordWinsCoT

2605.10057 2026-05-18 cs.AI cs.MA

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning

Ruiyi Yang, Lihuan Li, Hao Xue, Flora D. Salim

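Affiliations: University of New South Wales; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: This paper proposes STAR, a failure-aware routing framework for task allocation in multi-agent spatiotemporal reasoning. It explicitly models control decisions between agents as a state-based transition policy, so that a suitable expert agent is selected dynamically according to task type and execution status, which lets the system handle different kinds of execution failures. By combining expert-specified nominal routing paths with recovery transitions learned from execution traces, STAR significantly improves robustness and interpretability under abnormal conditions. Experiments show that STAR outperforms existing methods on multiple spatiotemporal reasoning benchmarks, especially when execution deviates from the expected path.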

Comments 30 pages, 13 figures

2605.10052 2026-05-18 cs.CL cs.AI

Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering

Xinyu Zhang, Zhicheng Dou, Deyang Li, Jianjun Tao, Shuo Cheng, Ruifeng Shi, Fangchao Liu, Enrui Hu, Yangkai Ding, Hongbo Wang, Qi Ye, Xuefeng Jin, Zhangchun Zhao

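Affiliations: openJiuwen Team; Gaoling School of Artificial Intelligence, Renmin University of China

AI summary: As AI engineering shifts from single-agent prompt and context engineering to multi-agent coordination engineering, systematically encoding and improving multi-agent collaboration has become a key bottleneck. This paper proposes Swarm Skills, a portable, self-evolving multi-agent system specification that introduces structures for roles, workflows, execution boundaries, and self-evolution semantics, turning multi-agent collaboration procedures into distributable assets. It also proposes a self-evolution algorithm that automatically distills successful execution trajectories and continually refines existing skills, so that multi-agent coordination strategies evolve without human intervention.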

2605.09877 2026-05-18 cs.LG cs.AI cs.CL

Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory

Daniel Goldstein, Eugene Cheah

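Affiliations: Featherless AI; Eleuther AI

AI summary: This paper proposes Key-Value Means (KVM), a new block-recurrent attention mechanism that supports either a fixed-size or an expandable state. With very few parameters, it gives strong transformer models linear-time chunked processing and performs well on long-context tasks, with prefill time close to quadratic and state growth close to linear. KVM combines the strengths of conventional transformers and linear RNNs (LRNNs): it supports chunk-parallel training and prefill, can be applied to all layers to save KV-cache memory, and can be mixed with LRNNs alongside conventional attention to improve long-context performance.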

2605.09869 2026-05-18 cs.RO cs.CV

ConsistNav: Closing the Action Consistency Gap in Zero-Shot Object Navigation with Semantic Executive Control

Haosen Wang, Zhenyang Li, Yinqiang Zhang, Zongqi He, Lutao Jiang, Kai Li, Yizhou Zhao, Liaoyuan Fan, Wenjian Hou, Tingbang Liang, Yibin Wen, Defeng Gu

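Affiliations: Sun Yat-sen University; The University of Hong Kong; Hong Kong University of Science and Technology (Guangzhou); City University of Hong Kong; Carnegie Mellon University

AI summary: This paper studies the action consistency problem in zero-shot object navigation: an agent that repeatedly reinterprets semantic information during navigation tends to lose persistent track of its target. The authors propose ConsistNav, a training-free zero-shot object navigation framework with three modules (a semantic executive controller, persistent candidate memory, and stability-aware action control) that improve sustained target tracking and action consistency. Experiments show that ConsistNav outperforms existing methods on multiple benchmarks, with significant gains in success rate and path success rate.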

Comments 13 pages, 5 figures

2605.08949 2026-05-18 cs.LG

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

Binghang Lu, Zheyuan Deng, Runyu Zhang, Bing Hu, Yunhan Zhao, Yuan Tian, Changhong Mou, Guang Lin, Xiaomin Li

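Affiliations: Purdue University; Brown University; Massachusetts Institute of Technology; University of California, Irvine; Utah State University; Harvard University

AI summary: Catastrophic forgetting is a central challenge in continual learning for large language models. This paper proposes Muon-OGD, which combines the spectral-norm geometry of the Muon optimizer with orthogonal projection constraints. It poses each update as a spectral-norm-constrained optimization problem and solves it efficiently with dual iterations, avoiding interference with parameter directions associated with earlier tasks. Experiments show that Muon-OGD outperforms sequential fine-tuning and orthogonal-gradient methods on multiple continual learning benchmarks while remaining computationally scalable.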

Abstract

A central challenge in continual learning for large language models (LLMs) is catastrophic forgetting, where adapting to new tasks can substantially degrade performance on previously learned ones. Existing projection-based methods mitigate such interference by restricting parameter updates to subspaces that are orthogonal to directions associated with past tasks. However, these methods are typically formulated under Euclidean parameter geometry, with update magnitudes and projections governed by the Frobenius norm. The recent empirical success of the Muon optimizer, which applies orthogonalized matrix updates and admits a spectral-norm interpretation, suggests that Frobenius geometry may not be the most effective choice for matrix-valued LLM parameters. Motivated by this observation, we propose Muon-OGD, a spectral-norm-aware continual learning framework that integrates Muon-style operator-norm geometry with orthogonal projection constraints. Our method formulates each update as a spectral-norm-constrained optimization problem with linear non-interference constraints, and solves it efficiently through dual iterations and Newton-Schulz matrix-sign approximations. By applying orthogonalized momentum updates that avoid protected directions associated with prior tasks, Muon-OGD aims to improve the stability-plasticity trade-off in sequential LLM adaptation. We evaluate the proposed method on standard continual learning benchmarks, TRACE, and domain-specific Coding-Math-Medical curricula using both encoder-decoder and decoder-only architectures. Empirically, Muon-OGD consistently improves over sequential fine-tuning and competitive orthogonal-gradient baselines, while remaining computationally scalable. These results suggest that spectral-norm-aware update geometry provides a practical and effective alternative to Frobenius-norm projection for continual learning in LLMs.
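As a rough illustration of the Newton-Schulz orthogonalization mentioned in the abstract, the following Python sketch runs the classical cubic iteration $X_{k+1} = \tfrac{3}{2} X_k - \tfrac{1}{2} X_k X_k^\top X_k$ on a small matrix. It is only a sketch under stated assumptions: the coefficients are the textbook ones and may differ from the variant used by Muon or by this paper, the dual iterations and non-interference constraints are omitted entirely, and the function names and the example matrix are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the orthogonal polar factor U V^T of a full-rank m = U S V^T."""
    fro = sum(v * v for row in m for v in row) ** 0.5
    x = [[v / fro for v in row] for row in m]  # scale so every singular value is at most 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        # Cubic step: each singular value s is mapped to 1.5*s - 0.5*s**3, which converges to 1
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x

momentum = [[3.0, 1.0], [1.0, 2.0]]  # hypothetical 2x2 momentum matrix
update = newton_schulz_orthogonalize(momentum)
gram = matmul(update, transpose(update))  # close to the identity if update is orthogonal
```

Because this example matrix is symmetric positive definite, its polar factor is the identity, which makes the result easy to check by hand. The iteration uses only matrix multiplications, which is part of why this family of methods is attractive for large weight matrices.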


2605.08894 2026-05-18 cs.CL cs.AI

Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs

Yuzhuang Xu, Xu Han, Yuxuan Li, Pengzhan Li, Wanxiang Che

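Affiliations: Harbin Institute of Technology; Tsinghua University

AI summary: Existing extremely low-bit quantization methods focus mainly on preserving numerical accuracy, but this paper shows that extremely low-bit quantized LLMs also suffer systematic smoothness degradation. Using a smoothness proxy metric and sequence-neighborhood modeling, the study finds that the lower the bit width, the worse the smoothness degradation, which lowers generation quality. The authors therefore introduce a smoothness-preserving principle into both post-training quantization and quantization-aware training, which improves model performance and underscores the importance of smoothness under extreme quantization.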

Comments 19 pages, 4 tables, 14 figures

2605.08245 2026-05-18 cs.CV cs.AI

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

Harshvardhan Saini, Samyak Jha, Yiming Tang, Dianbo Liu

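Affiliations: Indian Institute of Technology Dhanbad; National University of Singapore

AI summary: This paper studies hallucination in vision-language models (VLMs) caused by over-alignment between the language and vision modalities. It traces the root cause to the decoder structure, which pulls visual embeddings too far onto the text manifold, introducing linguistic statistical bias and masking fine-grained visual information. The authors provide the first quantitative analysis of this phenomenon and propose two complementary remedies, a training-free inference strategy and a bias-aware fine-tuning method, both of which remove language bias from visual representations. Experiments show that these methods significantly reduce hallucination on multiple benchmarks and improve long-form generation quality.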

2605.07557 2026-05-18 cs.LG

Beyond Distribution Estimation: Simplex Anchored Structural Inference Towards Universal Semi-Supervised Learning

Yaxin Hou, Jun Ma, Hanyang Li, Bo Han, Jie Yu, Yuheng Jia

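Affiliations: Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China; School of Electrical Engineering, Southeast University, Nanjing, China

AI summary: This paper addresses semi-supervised learning in realistic settings where labeled data is scarce and the distribution of unlabeled data is unknown, proposing a universal semi-supervised learning framework named UniSSL. To avoid the erroneous labels that arise when conventional pseudo-labeling depends on distribution estimation, the authors take a new approach based on structural inference at the representation level. Their model, SAGE, captures high-order dependencies between samples to build structural consensus and uses a simplex equiangular tight frame to guide inter-class representation separation, improving model performance. Experiments show that SAGE outperforms existing methods on multiple benchmarks, with an average accuracy gain of 8.52%.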

Comments The paper is accepted by ICML 2026

2605.07074 2026-05-18 cs.CV

Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection

Zhiyuan Wang, Yanxiang Chen, Pengcheng Zhao, Yunfeng Diao, Xin Liao

Affiliations: Hefei University of Technology; Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education; School of Computer Science and Information Engineering, Hefei University of Technology; Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology); School of Computer Science, Nanjing Audit University; College of Cyber Science and Technology, Hunan University

AI summary: This paper studies how to detect AI-generated images produced by different, unseen architectures. It observes that existing methods over-rely on generator-specific fingerprints and semantic content, which limits generalization, and identifies feature entanglement as the main cause. The authors propose an Orthogonal Decomposition and Purification Network (ODP-Net) that structurally separates universal forgery traces, generator fingerprints, and semantic content, improving detection performance on unseen generative models.

Comments ~10 pages (IEEEtran two-column), 6 figures, 6 tables, 1 algorithm

2605.06390 2026-05-18 cs.AI

Automated alignment is harder than you think

Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau, Geoffrey Irving

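Affiliations: AI Security Institute

AI summary: This paper examines the risks of automated alignment on the path to artificial superintelligence (ASI). It argues that even if research agents do not deliberately sabotage alignment work, automated alignment can still produce misleading safety assessments, so that misaligned AI is deployed unintentionally. The reason is that alignment research involves many fuzzy tasks that are hard to supervise, human judgment is systematically biased, and automated systems under optimization pressure may make errors that humans struggle to detect, which undermines the reliability of alignment results. How to train agents to perform these tasks reliably is therefore a key challenge for automated alignment research.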

Comments 15 pages, 4 figures

2605.05179 2026-05-18 cs.LG cond-mat.dis-nn stat.ML

Estimating the expected output of wide random MLPs more efficiently than sampling

Wilson Wu, Victor Lecomte, Michael Winer, George Robinson, Jacob Hilton, Paul Christiano

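Affiliations: Alignment Research Center

AI summary: This paper proposes a method that is more efficient than sampling for estimating the expected output of a wide, randomly initialized multilayer perceptron (MLP) on Gaussian inputs. The method builds an approximate distribution of the activations at each layer using tools such as cumulants and Hermite expansions, avoiding the cost of evaluating inputs one by one as sampling does. Experiments show that, at matched mean squared error, it significantly reduces computation and performs particularly well when estimating low-probability events and when used in model training, suggesting a new approach to reducing tail risk in models.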

Comments 68 pages. Code is available at https://github.com/alignment-research-center/mlp_cumulant_propagation

2605.03548 2026-05-18 cs.LG cs.AI

PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics

Hao Zhou, Rui Zhang, Han Wan, Hao Sun

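Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China

AI summary: This work proposes PerFlow, a physics-embedded rectified flow model for efficiently reconstructing spatiotemporal dynamic fields governed by partial differential equations (PDEs) and quantifying their uncertainty. PerFlow decouples observation conditioning from physical constraints, which enables efficient conditional sampling without gradient guidance, and it enforces physical consistency through a constraint-preserving projection. Experiments show that the method significantly improves reconstruction accuracy and inference speed while preserving good physical properties.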

Comments 17 pages, 8 figures. Accepted to IJCAI-ECAI 2026

2605.02960 2026-05-18 cs.LG

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving

Zhaoyuan Su, Olatunji Ruwase, Karthik Ganesan, Aurick Qiao, Samyam Rajbhandari, Juncheng Yang, Yue Cheng, Yuxiong He

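Affiliations: University of Virginia; Snowflake AI Research; Harvard University

AI summary: This paper studies how to efficiently serve prefill-only production workloads (such as classification and recommendation) on Mixture-of-Experts (MoE) models and proposes the MoE-Prefill system. Its asynchronous expert parallelism (AsyncEP) mechanism overlaps the loading of expert weights with computation, avoiding the redundant computation and synchronization overhead of conventional approaches. On the frontend, it adds prefix-aware routing and load tracking based on true FLOPs, which raises throughput and compute utilization. The system shows significant performance gains across multiple hardware and precision configurations.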

Comments 19 pages, 12 figures, 4 tables

2605.01852 2026-05-18 cs.CV

DP-SfM: Dual-Pixel Structure-from-Motion without Scale Ambiguity

Lilika Makabe, Kohei Ashida, Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita

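Affiliations: Graduate School of Information Science and Technology, The University of Osaka

AI summary: This paper proposes DP-SfM, a method for multi-view 3D reconstruction from images captured by dual-pixel (DP) sensors that resolves scale ambiguity automatically, without reference objects or prior calibration. By combining depth maps with the defocus blur information in dual-pixel images, it derives a simple and effective linear method for estimating absolute scale, followed by an intensity-based optimization that aligns the left and right images. Experiments show that the method works well on diverse scenes captured with different cameras and lenses.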

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026
