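arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.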

2606.05960 2026-06-05 cs.RO

Towards a Data Flywheel for Embodied Intelligence in Logistics

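Anlan Yu, Zaishu Chen, Zhiqing Hong, Daqing Zhang

Affiliations: Peking University; JD Logistics; HKUST (Guangzhou)

AI summary: Proposes a data-driven framework for embodied intelligence in logistics that builds a data flywheel to turn daily operations into reusable data assets, uses world models to generate reliable supervision for long-tail parcel manipulation, and integrates multimodal data for continual policy improvement.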

Abstract

Embodied intelligence is moving from laboratory demonstrations toward industrial deployment, with the logistics industry serving as a key application scenario. Learning-based policies offer a promising path beyond traditional perception-planning-control pipelines, but their scalability depends on how embodied data can be collected, organized, and reused. This research studies a data-centric framework for industrial embodied intelligence by constructing a logistics data flywheel. Our framework converts daily operations into reusable data assets, uses World Models to generate reliable supervision for long-tail parcel manipulation, and feeds deployment feedback back into policy improvement. As an initial result, WM-DAgger introduces a World-Model-based data aggregation framework that synthesizes out-of-distribution recovery data for robust imitation learning. Building on this result, ongoing work explores how large-scale in-the-wild multimodal data, including labeled human demonstrations, unlabeled operational videos, and system-level robot logs, can be aligned for policy learning and transformed into feedback for continual system improvement.

2606.05958 2026-06-05 cs.LG

Steering Vectors are an Adversarial Attack Surface

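Abzal Aidakhmetov, Donato Crisostomi, Tommaso Mencattini, Adrian Robert Minut, Iacopo Masi, Emanuele Rodolà

Affiliations: Sapienza University of Rome; EPFL

AI summary: Reveals a stealthy data poisoning attack in which replacing 4-6% of the tokens in a steering dataset aligns the resulting steering vector with an anti-refusal direction, jailbreaking the target model while preserving the intended steering effect on benign prompts.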

Abstract

Activation steering has become a popular way to control Large Language Model (LLM) behavior without fine-tuning. Since the technique is plug-and-play, users share datasets and precomputed vectors to steer model activations. However, we show that a stealth data poisoning attack silently compromises this pipeline. By substituting 4-6% of tokens in the steering dataset, an attacker can align the resulting vector with an anti-refusal direction. This jailbreaks the target model while preserving the intended steering effect on benign prompts. Under this threat model, a malicious actor can distribute an apparently safe bundle containing texts, vectors, and weights, alongside an equivalence certificate that the end-user can verify. We test the attack on two open-weight model families and eight model-attribute combinations, observing that poisoned vectors reach an absolute attack success rate (ASR) of 20-55%, +19% to +51% over a clean reference. Finally, we find that a refusal-direction orthogonalization defense can recover ≈82% of the ASR gap without harming benign behavior.
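The abstract does not spell out how the refusal-direction orthogonalization defense is implemented. One plausible reading, sketched below as an assumption, is to project the component along an estimated refusal direction out of the steering vector before applying it. The vectors, the `0.8` poison strength, and the helper names are all hypothetical toy values, not taken from the paper.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(vec, direction):
    """Remove from `vec` its component along `direction`."""
    norm = math.sqrt(dot(direction, direction))
    unit = [d / norm for d in direction]
    coeff = dot(vec, unit)
    return [v - coeff * u for v, u in zip(vec, unit)]

# Hypothetical toy data: a refusal direction and a benign steering vector.
refusal_dir = [1.0, 2.0, 2.0]
clean_vec = [0.5, -1.0, 3.0]

# Toy poison (assumption): shift the vector against the refusal direction.
poisoned_vec = [c - 0.8 * r for c, r in zip(clean_vec, refusal_dir)]

# After projection, the vector has no component along the refusal direction.
cleaned = orthogonalize(poisoned_vec, refusal_dir)
print(dot(cleaned, refusal_dir))
```

In this toy setup the poison lies entirely along the refusal direction, so projection removes it completely. A real poisoned vector could also differ from the clean one in other directions, which may be one reason the reported recovery is partial.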

2606.05957 2026-06-05 cs.LG stat.ML

Dead Directions: Geometric Singular Learning

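Tejas Pradeep Shirodkar

Affiliations: IIIT, Hyderabad

AI summary: Introduces the concept of a "dead direction" to bridge singular learning theory and information geometry, recovers the KL order from the decay rate of the Fisher curvature in original parameter coordinates, and extends the approach to deep networks, yielding a trajectory-rate readout of Watanabe's triple (λ, m, ν) without posterior sampling.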

Comments 139 pages, 13 figures, 13 tables

Abstract

Singular learning theory and information geometry have studied the same parameter spaces in mostly separate vocabularies: the former computes Bayesian invariants in resolved coordinates, the latter works in original coordinates under a non-degeneracy assumption that overparameterised models routinely violate. We bridge them through one primitive, the dead direction: a unit vector along which the Fisher metric degenerates, equivalently a tangent to the analytic singular set with a definite KL order, set by how fast the KL divergence vanishes. The two readings name the same vector; our central move shows its KL order is recoverable as the decay rate of the directional Fisher curvature approaching the singularity, in original parameter coordinates and without a Hironaka resolution. A selection rule on smooth fibres translates this rate into Watanabe's single-direction contribution to the real log canonical threshold, and we extend the recovery to multi-component crossings, multiplicity $m$, the singular fluctuation $ν$ (universal in the KL order for 1D directions), prior-RLCT shifts, and tempered posteriors. We then lift this rate to a deep network: a multi-layer K-FAC factorisation writes each Fisher block as a product of activation- and gradient-side rates with a duality between them, instantiated at modern-network primitives (residual streams, layer normalisation, attention). A quotient theorem carries the rate to the gauge quotient $Θ/G$ under gradient flow on a $G$-invariant metric; SGD qualifies, standard Adam does not, and we construct a $G$-equivariant Adam-family preconditioner (DDCAdam) that does. The bridge yields a parameter-coordinate handle on singular geometry, closed-form per-architecture predictions, and a trajectory-rate readout of Watanabe's triple $(λ, m, ν)$ from one checkpoint's forward and backward passes, without posterior sampling.

2606.05956 2026-06-05 cs.AI

Bidirectional Search for Longest Paths: Case for Front-to-Front Heuristics

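Tzur Shubi, Ariel Felner, Solomon Eyal Shimony, Shahaf S. Shperberg

Affiliations: Technion - Israel Institute of Technology

AI summary: Proposes BiXDFBnB, which adapts the single-frontier bidirectional search framework to the generalized longest simple path problem, using front-to-front heuristics to reduce node expansions and, in some cases, improve runtime.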

Abstract

Bidirectional heuristic search can potentially reduce search effort for problems amenable to backward search. Therein, it is well-known that front-to-front heuristics can reduce the number of node expansions, but their overhead is so high that overall runtime almost always increases. We propose BiXDFBnB, a bidirectional depth-first branch-and-bound algorithm that adapts the Single-Frontier Bidirectional Search (SFBDS) framework - originally developed for shortest-path (MIN) problems - to the Generalized Longest Simple Path (GLSP) setting. Because SFBDS inherently operates on paired states, front-to-front (F2F) heuristic evaluation arises naturally and avoids the overhead typically associated with bidirectional frontier management. We show that this adaptation can be successfully applied to maximization (MAX) problems while efficiently handling overlapping constraints. BiXDFBnB is applied to several types of longest-path problems: Longest Simple Path (LSP), Snakes, and Coil-in-the-Box (CIB). Empirical evaluation shows that the new algorithm frequently reduces the number of node expansions and, in some cases, also improves overall runtime.
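For readers unfamiliar with branch and bound on MAX problems, the following is a minimal sketch of plain unidirectional depth-first branch and bound for the longest simple path on an unweighted graph. It is not BiXDFBnB and does not use SFBDS or front-to-front heuristics. The upper bound (number of still-reachable unvisited nodes) and the example graph are illustrative assumptions.

```python
def longest_simple_path(graph, start, goal):
    """Depth-first branch and bound for the longest simple path (in edges)."""
    best = {"length": -1, "path": None}

    def reachable(node, visited):
        # Nodes reachable from `node` without passing through visited nodes.
        seen = {node}
        stack = [node]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v not in visited and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    def dfs(node, path, visited):
        if node == goal:
            if len(path) - 1 > best["length"]:
                best["length"] = len(path) - 1
                best["path"] = list(path)
            return
        region = reachable(node, visited)
        if goal not in region:
            return  # goal can no longer be reached on a simple path
        # Admissible upper bound: edges so far + at most one edge per new node.
        upper = (len(path) - 1) + (len(region) - 1)
        if upper <= best["length"]:
            return  # cannot beat the incumbent, prune
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(start, [start], {start})
    return best["length"], best["path"]

# Hypothetical undirected graph given as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
length, path = longest_simple_path(graph, "A", "D")
print(length, path)
```

Note that the pruning test is reversed relative to shortest-path search: a branch is cut when its upper bound cannot exceed the best solution found so far.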

2606.05952 2026-06-05 cs.RO cs.AI

Learning of Robot Safety Policies via Adversarial Synthetic Scenarios

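Nikolai Dorofeev, Alexey Odinokov, Rostislav Yavorskiy

Affiliations: National Research Institute of Automation and Applied Mathematics

AI summary: Proposes a framework based on an adversarial game in which red and blue teams compete to generate hazardous scenarios and iteratively refine safety policies, so that high-risk edge cases are discovered efficiently.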

2606.05950 2026-06-05 cs.AI

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

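Yuxiao Ye, Haoran He, Fangyuan Kong, Xintao Wang, Pengfei Wan, Kun Gai, Ling Pan

Affiliations: Hong Kong University of Science and Technology; Kuaishou Technology

AI summary: Proposes the Edit-R2 framework, which reconstructs session intent and applies reinforcement learning that jointly optimizes reasoning and generation to address long-context dilution and state contamination in multi-turn image editing, and reports competitive performance against strong baselines on the MICE-Bench benchmark.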

Abstract

Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foundation models. However, most existing methods remain confined to single-turn settings, overlooking the more realistic scenario of multi-turn in-context editing, where users iteratively refine an image through a sequence of instructions. In this setting, a model must follow each new instruction while preserving accumulated session-level constraints, challenged by two coupled failure modes: long-context dilution, where sparse textual constraints become difficult to recover from growing interleaved image-text histories, and state contamination, where earlier editing mistakes degrade subsequent generations. We introduce Edit-R2, a novel reinforcement learning post-training framework for unified multimodal models. Edit-R2 reconstructs the operative session intent, which effectively consolidates scattered historical constraints into an explicit reasoning trace before each editing turn. It further enables multi-turn RL over both reasoning and generation through a unified objective that jointly optimizes intent reconstruction generation in discrete text space and flow-matching image generation in continuous latent space, while a trajectory filtering mechanism suppresses corrupted rollouts to stabilize training under state contamination. To support systematic evaluation, we introduce MICE-Bench, a large-scale benchmark for multi-turn in-context editing with automated metrics for instruction following (IF), content consistency (CC), and global awareness (GA) over accumulated session constraints. Experiments show that Edit-R2 substantially improves multi-turn in-context editing and achieves competitive performance compared against strong baselines.

2606.05946 2026-06-05 cs.LG

Short paper: Models in the dark -- Rectification and erasure under GDPR in ML supply chains

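Henrik Graßhoff, Malte Hansen, Meiko Jensen, Sara Ramezanian

Affiliations: Karlstad University

AI summary: Surveys, from an interdisciplinary perspective, the challenges of implementing the GDPR rights to rectification and erasure in machine learning supply chains, introduces the notion of "models in the dark", and analyzes the pressing problems it raises.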

Comments accepted for presentation at Annual Privacy Forum 2026

2606.05937 2026-06-05 cs.CL

Large Language Models are Perplexed by some Political Parties

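Paul Lerner, François Yvon

Affiliations: Sorbonne Université, CNRS, ISIR

AI summary: Using perplexity-based evaluation, finds that large language models are more perplexed by text from far-right and nationalist parties than by text from social-democratic parties, and that this bias originates in pretraining, with instruction tuning having little effect.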

2606.05936 2026-06-05 cs.CL

Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

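Marco Antonio Stranisci, A Pranav, Rossana Damiano, Christian Hardmeier, Anne Lauscher

Affiliations: University of Turin; IT University of Copenhagen; Trustworthy AI Lab

AI summary: Audits pretraining filters and inference-time guardrails and finds that they over-flag mentions of marginalized groups (such as transgender people, women, and Central Americans), resulting in epistemic erasure, whereas human annotators would retain most of the flagged content.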

Abstract

Modern language models rely on pretraining filters to remove undesirable content from training corpora and inference-time guardrails to suppress undesirable outputs during deployment. In this paper, we examine how these filtering and moderation decisions produce forms of epistemic erasure and reveal tensions both across automated systems and between these systems and human judgment. We audit four pretraining filters and three inference-time guardrails on Common Crawl sentences containing gender and regional-origin mentions, together with a manually annotated subset of 500 sentences. Our analysis shows that filtering and guardrail decisions are strongly associated with blocklist-based lexical cues, while frequently failing to flag content containing private information or explicit hate speech. At the same time, marginalized groups, particularly transgender people, women, and Central Americans, are significantly over-flagged across systems. Human annotators, by contrast, would retain 88.5% of filter-flagged and 91.3% of guardrail-flagged content, often recognizing representational harms arising from tensions of content removal that current systems fail to capture. Taken together, our findings document a form of epistemic erasure in which mentions of marginalized groups are disproportionately removed before pretraining and suppressed again at inference time.

2606.05931 2026-06-05 cs.CL cs.AI cs.CV cs.IR cs.LG cs.MM eess.AS

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection

Erfan Loweimi, Mengjie Qian, Kate Knill, Guanfeng Wu, Chi-Ho Chan, Abbas Haider, Muhammad Awan, Josef Kittler, Hui Wang, Mark Gales

Affiliations: University of Cambridge; Queen's University Belfast; University of Surrey; Cisco; Southwest Jiaotong University; Teesside University

AI summary: Proposes a query-adaptive framework that detects active modalities via cross-modal score consistency and reaches 94.2% P@1 on the BBC Rewind corpus, outperforming unimodal and fixed-fusion systems.

Comments INTERSPEECH 2026

Abstract

When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but unheard, or both. Fusing scores from an absent modality injects noise, degrading precision below the best unimodal system. We propose a query-adaptive framework that detects active modalities via cross-modal score consistency: when both modalities are active, files retrieved by one also score highly on the other; this agreement breaks down when a modality is absent. Classifiers driven by these cross-modal features achieve 89% detection accuracy. On the BBC Rewind corpus (with over 12,000 broadcast videos) the adaptive system attains 94.2% P@1, outperforming speaker-only (82.9%), face-only (93.4%), and fixed fusion (90.0%), recovering 64% of the gap to an oracle with ground-truth modality labels (96.6%).
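The abstract does not state which baseline the "64% of the gap" figure is measured from. The short check below, using only the P@1 numbers quoted above, suggests it is measured from fixed fusion; this is an inference from the arithmetic rather than a statement from the authors, and the function and dictionary names are illustrative.

```python
def gap_recovered(baseline, system, oracle):
    """Fraction of the baseline-to-oracle gap closed by `system`."""
    return (system - baseline) / (oracle - baseline)

# P@1 values (%) as quoted in the abstract.
p_at_1 = {
    "speaker_only": 82.9,
    "face_only": 93.4,
    "fixed_fusion": 90.0,
    "adaptive": 94.2,
    "oracle": 96.6,
}

vs_fusion = gap_recovered(p_at_1["fixed_fusion"], p_at_1["adaptive"], p_at_1["oracle"])
vs_face = gap_recovered(p_at_1["face_only"], p_at_1["adaptive"], p_at_1["oracle"])

print(f"gap recovered vs fixed fusion: {vs_fusion:.1%}")
print(f"gap recovered vs face-only:    {vs_face:.1%}")
```

Measured from fixed fusion the ratio is 4.2 / 6.6, which rounds to 64%; measured from the best unimodal system (face-only) it is 0.8 / 3.2, or 25%.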

2606.05927 2026-06-05 cs.LG

Addressing Imbalance in Multi-Label Data via Label-Specific Distance-based Oversampling

Bin Liu, Jun Wu, Haoyu Peng, Ao Zhou, Jin Wang, QiaoSong Chen, Grigorios Tsoumakas

Affiliations: Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, China; State Key Laboratory of Novel Software Technology, Nanjing University, China; School of Informatics, Aristotle University of Thessaloniki, Greece

AI summary: To address label imbalance in multi-label classification, proposes LSDMLO, an oversampling method based on label-specific distances that identifies label-consistent neighbors in a weighted pertinent feature space and generates more effective synthetic instances; experiments show it outperforms existing methods.

Abstract

Complex, imbalanced label distributions pose a crucial challenge to multi-label classification, as most classifiers are biased towards the majority class and high-frequency labels. Oversampling is an efficient and flexible solution that augments instances to provide a more balanced training dataset for multi-label classifiers. Most existing oversampling methods create synthetic instances heuristically, relying on neighborhood information retrieved using Euclidean distance over the entire feature space. However, they fail to consider the varying semantic relevance of features to different labels, which leads to label inconsistency among proximate neighbors and in turn introduces label confusion and overfitting to synthetic instances. To overcome this issue, we propose a novel sampling approach called Label-Specific Distance-based Multi-Label Oversampling (LSDMLO) that creates more useful and well-labeled synthetic instances to address the imbalance in multi-label datasets. LSDMLO derives the label-specific distance to identify label-consistent neighbors based on the weighted pertinent feature space, which facilitates selecting seed instances that express more label correlations in boundary areas and generating synthetic instances aligned with the label distribution of the original data. Comprehensive experiments show that the proposed LSDMLO outperforms state-of-the-art multi-label sampling approaches under various base classifiers.
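To illustrate why a label-specific distance can change which neighbor is selected, here is a minimal sketch. It assumes the label-specific distance is a feature-weighted Euclidean distance and that synthetic instances are produced by SMOTE-style linear interpolation; the abstract confirms neither detail, and the weights and data points are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest_neighbor(seed_idx, X, weights):
    """Index of the instance closest to X[seed_idx] under the weighted distance."""
    candidates = [i for i in range(len(X)) if i != seed_idx]
    return min(candidates, key=lambda i: weighted_distance(X[seed_idx], X[i], weights))

def interpolate(a, b, t):
    """SMOTE-style point on the segment from a to b (t in [0, 1])."""
    return [x + t * (y - x) for x, y in zip(a, b)]

# Hypothetical data: feature 1 is assumed irrelevant to the label of interest.
X = [[0.0, 0.0], [0.1, 5.0], [1.0, 0.2]]

euclid = nearest_neighbor(0, X, [1.0, 1.0])          # plain Euclidean distance
label_specific = nearest_neighbor(0, X, [1.0, 0.0])  # only feature 0 counts

synthetic = interpolate(X[0], X[label_specific], 0.5)
print(euclid, label_specific, synthetic)
```

Under plain Euclidean distance the seed's neighbor is instance 2, but once the irrelevant feature is down-weighted the neighbor becomes instance 1, so the synthetic point is drawn toward a different region of the feature space.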

2606.05925 2026-06-05 cs.AI

Towards World Models in Biomedical Research

Guangyu Wang, Jingkun Yue, Siqi Zhang, Yu Liu, Xiaoyu Wang, Mingyuan Meng, Changwei Ji, Zongbo Han, Yulin Wang, Yang Yue, Frank Fu, Ting Chen, Song Wu, Ziwei Liu, Jiangning Song, Ming Li, Gao Huang, Xiaohong Liu, Athanasios Vasilakos, Xingcai Zhang, Ping Zhang, Yong Li

Affiliations: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; Department of Engineering Science, University of Oxford, Oxford, United Kingdom; Institute of Medical Artificial Intelligence, South China Hospital, Medical School, Shenzhen University, Shenzhen, Guangdong, China; Zhongguancun Academy & Zhongguancun Institute of Artificial Intelligence, Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, 100084, Beijing, China; Department of Chemical and Nano Engineering, University of California, San Diego, La Jolla, CA, USA; Nanyang Technological University, Singapore; Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada; Department of ICT and Center for AI Research, University of Agder (UiA), Jon Lilletuns vei 9, Grimstad, Norway; Department of Electronic Engineering, Tsinghua University, Beijing, China

AI summary: Proposes biomedical world models as a new paradigm for AI-driven discovery: they learn latent representations of molecular, cellular, tissue, and clinical states together with intervention-conditioned dynamics in order to simulate future trajectories, and the paper discusses their potential in applications such as virtual cells, organoids, virtual patients, and surgical simulation.

Abstract

A central goal of biomedicine is to understand, predict and ultimately control the dynamic mechanisms by which biological systems respond to perturbations, disease progression and therapeutic intervention. Although foundation models and large language models have accelerated biomedical data interpretation, most current systems remain focused on static pattern recognition rather than prospective simulation of biological futures. Here we propose biomedical world models as a paradigm for AI-driven discovery. These models learn latent representations of molecular, cellular, tissue and clinical states, together with intervention-conditioned dynamics that allow future trajectories to be simulated before actions are taken. We discuss how biomedical world models could function as data engines, environment simulators and scientific planning substrates across applications including virtual cells, organoids, virtual patients and surgical simulation. We outline the data infrastructure, evaluation benchmarks, safety constraints and governance frameworks required. Biomedical world models may provide a foundation for simulation-guided, closed-loop and experimentally actionable biomedical discovery.

2606.05924 2026-06-05 cs.CL cs.AI

Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach

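Zhihao Lin, Ziqi Zhu, Hao Huang, Guanghui Wang, Peiyang He

Affiliations: Amazon Web Services (AWS); Peking University

AI summary: Proposes a multi-aspect iterative refinement framework that uses specialized LLM translators to generate high-quality translation references and preference data, then combines supervised fine-tuning with reinforcement learning (GRPO) to improve literary translation, reaching performance competitive with Claude Sonnet 4.5 on the MetaphorTrans English-to-Chinese literary translation benchmark.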

Comments Accepted by ACL 2026 Industry

Abstract

Literary translation poses unique challenges due to the scarcity of high-quality annotated data and the need to balance expression fluency with literary effect. We present a multi-aspect iterative refinement framework that generates high-quality translation references and preference data through specialized LLM translators, each targeting a distinct quality dimension. We leverage the generated data for supervised fine-tuning and reinforcement learning. Experiments show that our generated references outperform the original ground truth for SFT by 8.65 CEA100 points. For reinforcement learning, we find that DPO leads to performance degradation in this setting, while leveraging an explicit reward model for GRPO yields an additional 1.51 point improvement. We attribute this to the stability of two-stage training and GRPO's online exploration capability. Our resulting models, LitMT-8B and LitMT-14B, achieve 67.25 and 69.07 CEA100 respectively on the MetaphorTrans English-to-Chinese literary translation benchmark, competitive with Claude Sonnet 4.5 at 68.43, and demonstrate strong generalization to out-of-domain literary work (i.e., O. Henry).
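As background on the GRPO step mentioned above, the sketch below shows the group-relative advantage at the core of GRPO: several candidate outputs are sampled for one prompt, each is scored by a reward model, and rewards are normalized within the group. The reward values are hypothetical, and the choice of population standard deviation plus a small `eps` is an assumption; implementations differ on these details, and this is not the authors' training code.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumed; variants exist)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical reward-model scores for four candidate translations of one passage.
rewards = [0.62, 0.71, 0.55, 0.80]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Candidates scoring above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged, so no separate value network is needed.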

2606.05917 2026-06-05 cs.CV cs.CL

MemoryCard: Topic-Aware Multi-Modal Clue Compression for Long-Video Question Answering

Qing Yang, Pengcheng Huang, Xinze Li, Zhenghao Liu, Yukun Yan, Yu Gu, Ge Yu, Gang Li, Maosong Sun

Affiliations: School of Computer Science and Engineering, Northeastern University; Department of Computer Science and Technology, Tsinghua University; Digital China Group

AI summary: Proposes the MemoryCard framework, which segments long videos into topical event units and generates event-level gists and representative visual moments, rendered as Memory Cards, to strengthen VLMs on long-video question answering, with relative accuracy gains of up to 21.8% under comparable visual-token budgets.

Comments 21 pages, 8 figures

Abstract

Long-video question answering remains challenging for Vision-Language Models (VLMs), as answer-relevant evidence is often sparse, transient, and temporally dispersed across lengthy video contexts. Existing frame-centric approaches improve efficiency through uniform sampling, query-aware frame selection, visual-token compression, and adaptive resolution strategies. However, they still rely on isolated and fragmented frames as the fundamental evidence units, limiting VLMs' ability to effectively capture coherent event-level semantics. To address this limitation, we propose MemoryCard, a video-memory-based augmentation framework that organizes long videos into self-contained Memory Cards. Specifically, MemoryCard first performs a self-reading process over videos and aligned utterances to segment the video into semantically coherent units, each corresponding to a distinct topic or event. For each unit, it generates an event-level video gist and selects representative visual moments, which are then rendered into unified Memory Cards for retrieval and question answering. Experimental results demonstrate that MemoryCard consistently improves long-video QA performance under comparable visual-token budgets, achieving up to a 21.8% relative improvement in accuracy. All code is available at https://github.com/NEUIR/MemoryCard.

2606.05916 2026-06-05 cs.CV

Unveiling the Unknown: Open Vocabulary Object Detection with Scene Graphs

Yi Chen, Yinghao Lu, Zhehao Li, Chenchen Yan, Jiafei Wu, Chong Wang, Jiangbo Qian

Affiliations: Faculty of Electrical Engineering and Computer Science, Ningbo University; Faculty of Computing, Georg-August-Universität Göttingen; Merchants’ Guild Economics and Cultural Intelligent Computing Laboratory, Ningbo University; School of Software Technology, Zhejiang University

AI summary: Proposes a scene-guided relational modeling detection framework that uses scene graphs to capture structured semantic and spatial relationships between candidate regions and contextual objects, and improves open-vocabulary object detection with a relation attention module and a scene-based textual alignment branch.

Abstract

Open-vocabulary object detection seeks to identify novel object categories that were not part of the training data. Many knowledge distillation-based approaches have shown promising performance by transferring knowledge from pre-trained vision-language models to object detection. However, these methods often overlook structured, image-specific relationships between objects, such as interactions and spatial arrangements. This oversight can significantly restrict the effectiveness of detecting novel categories. To address this issue, we propose a Scene-guided Relational Modeling detection framework. This framework utilizes scene graphs to capture structured semantic and spatial relationships between candidate regions and their contextual objects. It explicitly models interactions among neighboring regions and incorporates a Relation Attention Module to implicitly amplify the key relational cues extracted from the scene graph. Furthermore, we present a scene-based textual alignment branch that distills category knowledge from captions to guide relational alignment. This approach facilitates a seamless integration of visual relations with semantic information for enhanced detection performance. Comprehensive experiments show that our model achieves superior performance compared to other OVOD methods, improving the AP for novel categories on COCO and LVIS datasets.

2606.05915 2026-06-05 cs.CV

CamFlow+: Hybrid Motion Bases for 2D Camera Motion Estimation with Stabilization Applications

Haipeng Li, Zhen Liu, Zhanglei Yang, Hai Jiang, Tianhao Zhou, Zhengzhe Liu, Ping Tan, Bing Zeng, Shuaicheng Liu

Affiliations: School of Information and Communication Engineering, University of Electronic Science and Technology of China; University of Electronic Science and Technology of China; School of Aeronautics and Astronautics, Sichuan University; YingCai Honors College, University of Electronic Science and Technology of China; Lingnan University; Hong Kong University of Science and Technology and Shenzhen Loop Area Institute

AI summary: Proposes CamFlow+, a hybrid-basis framework that estimates 2D camera motion directly in dense-flow space by combining homography-derived physical bases, stochastic bases, and depth-translational bases, and adds a depth-aware smoothness term to handle translation, depth variation, and local parallax; it improves camera-motion estimation and video stabilization.

Abstract

Estimating 2D camera motion is fundamental to computer vision and computational photography. Existing homography-based methods work well for planar scenes or pure rotation, but struggle with camera translation, depth variation, and local parallax; local homography and mesh-based models improve flexibility but still rely on piecewise planar assumptions. We introduce CamFlow+, a hybrid-basis framework that represents 2D camera motion directly in dense-flow space. CamFlow+ combines homography-derived physical bases, stochastic bases sampled from homography flows, and depth-translational bases derived from depth and camera intrinsics, relaxing the single-plane constraint while preserving camera-motion regularity. A depth-aware smoothness term further regularizes translation-induced parallax in continuous-depth regions while preserving motion changes near depth boundaries. We evaluate CamFlow+ on GHOF-Cam, a camera-motion benchmark that masks out dynamic objects and ill-posed occlusion regions in an optical-flow benchmark to isolate camera-induced motion. Experiments show that CamFlow+ improves sparse and dense camera-motion estimation. In digital video stabilization, CamFlow+ also improves global and local stability, achieving the best top-1 preference rate in a blind user study. Code and datasets will be available on the project page: https://lhaippp.github.io/CamFlow+.

2606.05912 2026-06-05 cs.CV

Self-Learning Expression Deformations for Data-Efficient Gaussian Avatars

自学习表情形变用于数据高效的高斯化身

Jiahao Yang, Xiaohang Yang, Qing Wang, Yilan Dong, Gregory Slabaugh, Shanxin Yuan

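Affiliations: Queen Mary University of London

AI summary: Proposes the Self-Adaptive Gaussian Expression framework, which learns expression-driven deformations through self-supervision and combines 2D Gaussian surfels with a signed distance field to reconstruct high-fidelity, animatable avatars from very little input data (a single frame, monocular video, or a single image).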

AI中文摘要

使用3D高斯表示建模动态面部表情由于其非结构化特性仍然具有挑战性。传统的高斯化身流程需要大量的多视角和序列表情数据，限制了可扩展性和可访问性。在这项工作中，我们引入了自适应性高斯表情（SAGE），一个自学习表情诱导的高斯形变框架，能够从最小输入数据中实现高保真、可动画的化身。我们的方法联合优化2D高斯面元和符号距离场（SDF）以强制实现紧凑的、表面对齐的高斯分布，同时一个自监督的表情学习阶段用几何和外观一致性约束取代了长时间的训练序列。这种设计允许在多种重建场景下灵活部署：在多视角设置中，仅需单帧（时间步）而非数千帧；在单目设置中，仅需头部旋转而无需表情序列；在单次设置中，无需预训练或先验。实验表明，我们的方法在重建和动画质量上与最先进方法相当，同时将数据需求降低了几个数量级。我们的结果突显了自监督高斯形变学习作为迈向可访问、数据高效化身创建的一步的潜力。

英文摘要

Modeling dynamic facial expressions using 3D Gaussian representations remains challenging due to their unstructured nature. Conventional Gaussian avatar pipelines require extensive multiview and sequential expression data, limiting scalability and accessibility. In this work, we introduce Self-Adaptive Gaussian Expression (SAGE), a framework for self-learning expression-induced Gaussian deformations that enables high-fidelity, animatable avatars from minimal input data. Our method jointly optimizes 2D Gaussian surfels and a Signed Distance Field (SDF) to enforce compact, surface-aligned Gaussian distributions, while a self-supervised expression learning phase replaces long training sequences with geometric and appearance consistency constraints. This design allows flexible deployment across multiple reconstruction regimes: in the multiview setting, only a single frame (timestep) is required instead of thousands; in the monocular setting, only head rotations are needed without expression sequences; and in the one-shot setting, no pretraining or priors are necessary. Experiments demonstrate that our approach achieves reconstruction and animation quality comparable to state-of-the-art methods, while reducing data requirements by several orders of magnitude. Our results highlight the potential of self-supervised Gaussian deformation learning as a step toward accessible, data-efficient avatar creation.

2606.05911 2026-06-05 cs.SD cs.LG eess.AS

DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement

DBHN-Net: 低复杂度单声道语音增强的双分支混合神经网络

Cunhang Fan, Enrui Liu, Jing Zhou, Jian Kang, Jie Li, Andong Li, Jian Zhou, Zhao Lv, Xuelong Li

Affiliations: State Key Laboratory of Opto-Electronic Information Acquisition and Protection Technology (School of Computer Science and Technology), Anhui University; China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd; Institute of Acoustics, University of Chinese Academy of Sciences; Institute of Artificial Intelligence (TeleAI), China Telecom, China

AI summary: Proposes a dual-branch hybrid neural network that combines an ANN and an SNN. Modules such as BandSplit and TF-Mamba reduce computational complexity, while interaction and fusion modules preserve performance, yielding an average 7.5-fold complexity reduction on three public datasets.

Comments: This article has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026

DOI: 10.1109/TPAMI.2026.3698087

AI中文摘要

尽管基于人工神经网络（ANN）的语音增强（SE）方法表现出色，但高计算复杂度和高能耗阻碍了它们在实际前端处理任务中的部署。目前，脉冲神经网络（SNN）在降低功耗方面显示出潜力。然而，SNN的离散二进制激活和复杂的时空动态常常导致信息丢失。因此，当前的挑战集中在如何保持性能并降低计算复杂度。为了解决这个问题，本文提出了一种双分支混合神经网络（DBHN）。1）在网络架构方面：设计了一个集成ANN和SNN的双分支网络，其中SNN分支降低功耗，而ANN分支解决信息丢失；开发了BandSplit和时频（TF）-Mamba模块，以同时压缩能耗和增强模型性能；实现了带有残差连接的脉冲特征提取组（SFEG）和信息转换块（ITB）组件，以减轻信息丢失，同时进一步细化特征表示。2）为了促进分支间的信息融合：设计了一个交互模块，以促进双分支网络各个阶段的信息交换；设计了一个TF交叉注意力融合模块，在数据自适应地引导SNN分支保留更多关键信息的同时，对双分支信息进行时频域融合。结果表明，所提出的模型在三个公共数据集上保持了优越的性能，同时与基线模型相比，计算复杂度平均降低了7.5倍。

英文摘要

Although artificial neural network (ANN) based speech enhancement (SE) methods demonstrate excellent performance, their high computational complexity and energy consumption hinder deployment in practical front-end processing tasks. Spiking neural networks (SNNs) have shown potential for reducing power consumption. However, the discrete binary activations and complex spatio-temporal dynamics of SNNs often result in information loss. The current challenge is therefore how to maintain performance while reducing computational complexity. To address this issue, this work proposes a Dual-Branch Hybrid Neural (DBHN) Network. 1) In terms of network architecture: a dual-branch network integrating an ANN and an SNN is designed, where the SNN branch reduces power consumption while the ANN branch addresses information loss; BandSplit and Time-Frequency (TF)-Mamba modules are developed to simultaneously reduce energy consumption and enhance model performance; and Spiking Feature Extraction Group (SFEG) and Information Transformation Block (ITB) components are implemented with residual connections to mitigate information loss while further refining feature representations. 2) To facilitate inter-branch information fusion: an Interaction module is designed to promote information exchange at various stages of the dual-branch network, and a TF-Cross Attention-Fusion module is designed to perform time-frequency domain fusion of dual-branch information while data-adaptively guiding the SNN branch to retain more critical information. Results show that the proposed model maintains superior performance across three public datasets while achieving an average 7.5-fold reduction in computational complexity compared to baseline models.

2606.05909 2026-06-05 cs.SD eess.AS

Beyond WER: A Paired Acoustic Stress Test for Ambient Clinical Scribes

超越WER：面向环境临床记录员的配对声学压力测试

Xiao-Hang Jiang, Han-Jie Guo, Ying-Si Liang, Yang Ai, Zhen-Hua Ling, Lei Jiang, Zhi-Yang He

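Affiliations: University of Science and Technology of China; iFLYTEK Co., Ltd.

AI summary: Proposes a paired acoustic stress test that injects noise while keeping the downstream model frozen, exposing how noise affects the safety of clinical reasoning. Finds that minor acoustic perturbations can invert clinical meaning without substantially increasing word error rate, and demonstrates a lightweight mitigation strategy.

Comments: Accepted to INTERSPEECH 2026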

AI中文摘要

环境临床记录员越来越多地将自动语音识别与大型语言模型结合以自动化文档记录。然而，词错误率等传统指标掩盖了系统性的安全性退化。我们提出了一种配对声学压力测试，以隔离噪声对临床推理的因果影响。对于相同的对话，我们在保持下游模型配置不变的情况下注入多种噪声类型。关键的是，我们发现信号保真度与临床安全性之间存在危险的脱节。平稳环境噪声使词错误率仅增加了微不足道的0.71个百分点，但几乎使不安全输出的比例翻倍。我们的分析表明，轻微的声学扰动可以在不显著增加错误率的情况下逆转临床含义。此外，我们展示了一种轻量级缓解策略，该策略在噪声条件下减轻安全性退化，而无需进行模型微调。

英文摘要

Ambient clinical scribes increasingly combine Automatic Speech Recognition with Large Language Models to automate documentation. However, traditional metrics like Word Error Rate mask systemic safety degradation. We present a paired acoustic stress test to isolate the causal impact of noise on clinical reasoning. For the same dialogues, we inject diverse noise types while keeping the downstream model configuration frozen. Crucially, we uncover a dangerous disconnect between signal fidelity and clinical safety. Stationary ambient noise increased the Word Error Rate by a negligible 0.71 percentage points yet nearly doubled the rate of unsafe outputs. Our analysis reveals that minor acoustic perturbations can invert clinical meaning without substantially inflating error rates. Furthermore, we demonstrate a lightweight mitigation strategy that reduces safety degradation under noisy conditions without requiring model fine-tuning.
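
To see why a small Word Error Rate can hide a clinically important change, the sketch below computes word-level WER with the standard edit-distance definition (substitutions, deletions, and insertions divided by the reference length). The two sentences are invented for illustration and do not come from the paper's data; dropping a single "no" flips the meaning but costs only one edit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    previous = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical transcript pair: one deleted "no" inverts the clinical meaning.
reference = "patient reports no chest pain and no shortness of breath today"
hypothesis = "patient reports chest pain and no shortness of breath today"
wer = word_error_rate(reference, hypothesis)  # 1 deletion over 11 reference words
```

A WER of 1/11 looks harmless on a dashboard, which is the kind of mismatch between signal fidelity and safety that the abstract describes.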

2606.05906 2026-06-05 cs.CL

ACE-SQL: Adaptive Co-Optimization via Empirical Credit Assignment for Text-to-SQL

ACE-SQL: 基于经验信用分配的自适应协同优化方法用于文本到SQL

Xiaobing Chen, Ai Jian, Eryu Guo, Zhiqi Pang

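Affiliations: Harbin Engineering University; Harbin Institute of Technology; Beijing University of Posts and Telecommunications

AI summary: Proposes ACE-SQL, a reinforcement learning framework that jointly optimizes schema retrieval and SQL generation through an online column-set pool and empirical credit assignment, reaching 65.3% greedy execution accuracy on BIRD Dev.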

AI中文摘要

文本到SQL将自然语言问题映射为可执行的SQL查询。现代数据库通常包含大型且复杂的模式，使得模式链接成为准确生成SQL的关键步骤。现有方法要么依赖全模式生成，这在大搜索空间中隐式进行模式链接，要么使用基于静态金列监督训练的独立检索器，其目标可能对当前生成器策略是次优的。为解决此问题，我们提出基于经验信用分配的自适应协同优化方法用于文本到SQL（ACE-SQL），这是一个在执行反馈下联合优化模式检索和SQL生成的强化学习框架。ACE-SQL从生成器rollout中构建在线列集池，并从与执行正确rollout最频繁关联的列集中推导出自适应在线策略检索目标。这引发了双向适应：检索器适应生成器能正确执行的列集，而生成器在执行反馈下适应检索器不断演变的模式选择。使用约3k个合成文本到SQL问题-数据库对进行强化学习训练，ACE-SQL在BIRD Dev上实现了65.3%的贪心执行准确率，每个查询使用0.93k输出令牌。代码仓库见https://github.com/xbchen1/ACE-SQL。

英文摘要

Text-to-SQL maps natural language questions to executable SQL queries. Modern databases often contain large and complex schemas, making schema linking a critical step for accurate SQL generation. Existing methods either rely on full-schema generation, which leaves schema linking implicit within a large search space, or use a separate retriever trained with static gold-column supervision, whose targets may be suboptimal for the current generator policy. To address this issue, we propose Adaptive Co-optimization via Empirical Credit Assignment for Text-to-SQL (ACE-SQL), a reinforcement learning (RL) framework that jointly optimizes schema retrieval and SQL generation under execution feedback. ACE-SQL constructs an online column-set pool from generator rollouts and derives adaptive on-policy retrieval targets from the column set most frequently associated with execution-correct rollouts. This induces bidirectional adaptation, where the retriever adapts toward column sets that the generator can execute correctly, while the generator adapts to the retriever's evolving schema selections under execution feedback. With approximately 3k synthetic Text-to-SQL question-database pairs for RL training, ACE-SQL achieves 65.3% greedy execution accuracy on BIRD Dev while using 0.93k output tokens per query. The repository is available at https://github.com/xbchen1/ACE-SQL.
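
The idea of deriving a retrieval target from "the column set most frequently associated with execution-correct rollouts" can be sketched with a simple counter. This is a minimal reading of that one sentence, not the ACE-SQL code: the rollout format, the function name `empirical_retrieval_target`, and the table and column names are all assumptions for illustration.

```python
from collections import Counter


def empirical_retrieval_target(rollouts):
    """Pick the column set seen most often among execution-correct rollouts.

    rollouts: iterable of (columns_used, executed_correctly) pairs, where
    columns_used is any iterable of column names (hypothetical format).
    Returns a frozenset of columns, or None if no rollout was correct.
    """
    counts = Counter(
        frozenset(columns) for columns, correct in rollouts if correct
    )
    if not counts:
        return None
    target, _ = counts.most_common(1)[0]
    return target


# Hypothetical rollouts for one question against an orders/customers schema.
rollouts = [
    ({"orders.id", "orders.total"}, True),
    ({"orders.total", "orders.id"}, True),
    ({"orders.id", "customers.name"}, True),
    ({"orders.total"}, False),
]
target = empirical_retrieval_target(rollouts)
```

Because the target is recomputed from fresh generator rollouts, it can drift as the generator improves, which is the contrast with static gold-column supervision drawn in the abstract.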

2606.05903 2026-06-05 cs.RO

A Novel Method with Encoder-Decoder for Cross-Sensor Adaptation in Surface Shape Sensing with Sparse Strain Sensors

一种基于编码器-解码器的跨传感器自适应方法，用于稀疏应变传感器的表面形状感知

Shuo Wang, Heng Luo, Dian Jin, Xiaoming Tao

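Affiliations: IEEE

AI summary: Proposes an encoder-decoder architecture combined with meta-learning and few-shot adaptation for cross-sensor adaptation between sensor arrays. It substantially reduces the labeled data and adaptation time needed to deploy a new sensor array, cutting sensing error from 23.0 mm to about 4.0 mm.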

AI中文摘要

由内在差异或安装条件引起的传感器阵列性能变化可能导致形状感知结果不一致。为了获得准确结果，通常需要大量数据，并且必须为每个传感器阵列重新训练单独的模型，从而增加了数据采集、传输和计算的时间和成本。为解决这一问题，本文提出了一种基于稀疏应变传感器的表面形状感知编码器-解码器架构，并进一步结合元学习和少样本适应策略，实现不同传感器阵列组之间的自适应。实验结果表明，经过跨传感器自适应后，新部署的传感器阵列仅需少于5.0%的新标注数据，适应时间低于1秒，即可达到约4.0 mm的感知误差，相比未适应时的23.0 mm误差和训练新模型所需的20分钟数据采集时间，有显著提升。此外，误差低于5.0 mm的点数增加了超过65.0%。这些结果表明，所提方法能大幅降低表面形状感知的成本和训练负担，在软体机器人和可穿戴设备中具有广泛的应用潜力。

英文摘要

Performance variations in sensor arrays, caused by intrinsic differences or installation conditions, can lead to inconsistent results during shape sensing. To obtain accurate results, a large amount of data is usually required, and a separate model must be retrained for each sensor array, thereby increasing the cost and time of data acquisition, transmission, and computation. To address this issue, this work proposes an encoder-decoder architecture for surface shape sensing based on sparse strain sensors and further incorporates meta-learning and few-shot adaptation strategies to enable adaptation across different groups of sensor arrays. Experimental results demonstrate that, after the cross-sensor adaptation, a newly deployed sensor array achieves a sensing error of approximately 4.0 mm relying on less than 5.0% newly labeled data and requiring an adaptation time of under 1 second, which represents a substantial improvement from 23.0 mm error without adaptation and 20-minute data collection time required to train a new model. Moreover, the number of points with errors below 5.0 mm increased by more than 65.0%. These results indicate that the proposed method can substantially reduce the cost and training burden of surface shape sensing, and it has broad potential applications in soft robotics and wearable devices.

2606.05901 2026-06-05 cs.CL cs.AI

Reducing Hallucinations in Complex Question Answering using Simple Graph-based Retrieval-Augmented Generation (long version)

减少复杂问答中的幻觉：使用基于简单图的检索增强生成（长版）

Christopher J. Wedge, Joshua Stutter, Danny Dixon, Jacek Cała

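Affiliations: National Innovation Centre for Data

AI summary: Proposes a retrieval-augmented generation system supported by a lightweight graph structure. By combining vector search with graph query tools, it halves the number of hallucinated answers on complex question answering and significantly improves the precision and recall of factual correctness.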

AI中文摘要

大型语言模型（LLMs）从根本上改变了自然语言处理的格局。尽管取得了这些进展，LLMs和基于LLM的系统仍然容易出现各种故障模式。检索增强生成（RAG）系统已成为一种常见的部署场景，旨在避免LLM“幻觉”信息的已知风险，并使模型能够对训练期间无法访问的专有信息进行推理和问答，而无需进行昂贵的模型微调。在这项工作中，我们探索了使用轻量级图结构（具有相对简单的图模式）通过专用工具集支持RAG子系统的想法。我们设计了一个基于英语维基百科文章精选子集的结构化数据集上的智能体系统，该系统配备了多种向量搜索和图查询工具，并评估了其在MoNaCo（一个具有挑战性的维基百科QA基准测试，涉及复杂查询回答任务）上的问题表现。我们的结果表明，引入基于图的工具可以显著提高事实正确性的精确率和召回率，将幻觉答案的数量减半，并在三个评估场景中实现了最高的细粒度真实性得分。所有这些都仅以适度的令牌使用增加为代价。

英文摘要

Large language models (LLMs) have fundamentally transformed the landscape of Natural Language Processing. Despite these advances, LLMs and LLM-based systems remain prone to a variety of failure modes. Retrieval-augmented generation (RAG) systems have emerged as a common deployment scenario seeking both to avoid the well-known risk of the LLM "hallucinating" information and to enable reasoning and question answering over proprietary information that the LLM did not have access to during training, without resorting to expensive model fine-tuning. In this work, we explore the idea of using a lightweight graph structure with a relatively simple graph schema to support the RAG subsystem via a dedicated toolset. We design an agentic system with a variety of vector search and graph query tools operating over a structured dataset based on a curated subset of English Wikipedia articles, and evaluate its performance on questions from MoNaCo, a challenging Wikipedia QA benchmark of complex query answering tasks. Our results show that the introduction of graph-based tools can significantly increase the precision and recall of factual correctness, can halve the number of hallucinated answers, and achieves the highest fine-grained truthfulness score among the three evaluated scenarios. All of this comes with only a modest increase in token usage.

2606.05899 2026-06-05 cs.LG cond-mat.dis-nn

High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

可解注意力模型中LoRA微调的高维理论

O. Duranthon, F. Boncoraglio, L. Zdeborová

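Affiliations: Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)

AI summary: Uses high-dimensional statistical theory to analyze low-rank adaptation (LoRA) fine-tuning in an attention model, revealing the interplay between pretraining and fine-tuning and giving a precise asymptotic characterization of test error and representation alignment.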

2606.05896 2026-06-05 cs.CV

Resonant Minds: Closed-Loop Social Avatars with Theory of Mind

共鸣心智：具备心智理论的闭环社交虚拟人

Jianxu Shangguan, Jing Xu, Hang Ye, Xiaoxuan Ma, Yizhou Wang, Wentao Zhu

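Affiliations: University of Washington; Peking University; Carnegie Mellon University; Eastern Institute of Technology, Ningbo

AI summary: Proposes a closed-loop dual-agent framework that integrates perception, social reasoning (based on Theory of Mind), and multimodal generation to build socially intelligent avatars. On a dataset with information asymmetry, it achieves better dialogue quality than the full-information Script mode.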

AI中文摘要

创建具有真正社交智能的逼真数字人需要将认知推理和多模态生成统一在一个连贯的框架内。当前的方法将这些视为独立的任务：大型语言模型擅长对话但缺乏具身表达，而基于扩散的说话头模型实现了视觉保真度但忽略了社会认知。为了弥合这一差距，我们提出了一个闭环双智能体框架，将感知、社会推理和表达整合到一个连续的交互循环中。感知模块从视频中分析伙伴的多模态行为，而社会推理模块通过心智理论推断隐藏的心理状态，并通过集成机制选择响应。然后，表达模块生成情感可控的双智能体视频，合成说话者的言语和表情以及听者的反应行为，捕捉先前工作中缺失的双向动态。我们构建了一个分层的角色-场景数据集，包含基于心理学的角色和私人社交目标，以支持信息不对称下的评估。在该数据集上的实验表明，在对话质量和视频生成指标上均具有竞争性或优越的性能。值得注意的是，我们的方法在关键对话质量维度上甚至超过了全信息脚本模式，这表明在不确定性下显式的心理状态推断可以比无限制的信息访问引发更周到的对话。

英文摘要

Creating lifelike digital humans with genuine social intelligence requires unifying cognitive reasoning and multimodal generation within a coherent framework. Current approaches treat these as separate tasks: Large Language Models excel at dialogue but lack embodied expression, while diffusion-based talking head models achieve visual fidelity but ignore social cognition. To bridge this gap, we propose a closed-loop dual-agent framework integrating perception, social reasoning, and expression into a continuous interaction cycle. The perception module analyzes partners' multimodal behaviors from video, while the social reasoning module infers hidden mental states through Theory of Mind and selects responses via an ensemble mechanism. The expression module then generates emotion-controllable dual-agent videos synthesizing both speaker speech and expression alongside listener reactive behaviors, capturing bidirectional dynamics absent in prior work. We construct a hierarchical Persona-Scenario dataset with psychologically grounded personas and private social goals to support evaluation under information asymmetry. Experiments on this dataset demonstrate competitive or superior performance on both dialogue quality and video generation metrics. Notably, our method surpasses even the full-information Script mode on key dialogue quality dimensions, suggesting that explicit mental state inference under uncertainty can elicit more thoughtful dialogue than unrestricted information access.

2606.05895 2026-06-05 cs.CL cs.LG

Representing Research Attention as Contextually Structured Flows

将研究关注度表示为上下文结构化流

Jessica Rodrigues, Angelo Salatino, Gard Jenset, Scott Hale

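Affiliations: University of Oxford; The Open University; Springer Nature

AI summary: Proposes attention flows, contextually structured representations that encode how attention is organised and how it evolves over time. Evaluation on an analogy-style reasoning benchmark finds that flow representations support structural comparison more effectively and improve robustness under partial observation and structural perturbation.

Comments: Accepted at STi 2026 - International Conference on Science and Technology Indicators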

AI中文摘要

研究关注度被广泛用作可见性、影响和社会采纳的指标，但通常表示为聚合计数，无法保留注意力在上下文中随时间如何发展。这造成了注意力解释方式与其表示方式之间的不匹配。我们提出注意力流作为上下文结构化表示，编码注意力的组织及其随时间演化。我们通过构建基于研究产出间类比推理的基准，评估这些表示是否捕获可迁移结构。比较信号、序列和基于流的表示，我们发现流表示更有效地支持结构比较，特别是在注意力受时间进程或上下文分布影响的场景中。我们进一步表明，学习到的流表示在部分观测和结构扰动下提高了鲁棒性。总体而言，这些结果支持将注意力建模为上下文结构化现象，并为更具信息性的研究评估方法提供了基础。

英文摘要

Research attention is widely used as an indicator of visibility, influence, and societal uptake, yet it is typically represented as aggregated counts that do not preserve how attention develops across contexts over time. This creates a mismatch between how attention is interpreted and how it is represented. We propose attention flows as contextually structured representations that encode the organisation of attention and its evolution over time. We evaluate whether these representations capture transferable structure by constructing a benchmark based on analogy-style reasoning across research outputs. Comparing signal, sequence, and flow-based representations, we find that flow representations more effectively support structural comparison, particularly in settings where attention is shaped by temporal progression or context distributions. We further show that learned flow representations improve robustness under partial observation and structural perturbation. Overall, these results support modelling attention as a contextually structured phenomenon and provide a basis for more informative approaches to research evaluation.

2606.05894 2026-06-05 cs.CL

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

EMBER: 通过预算化证据保留实现高效记忆的长时程智能体

Yilong Li, Suman Banerjee, Tong Che

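Affiliations: University of Wisconsin–Madison; NVIDIA Research

AI summary: Addresses how long-horizon agents should retain evidence under a fixed budget. Proposes EMBER, a learned retention policy that stores evidence capsules (verbatim source excerpts, retrieval keys, and update metadata) and is trained with post-query feedback, significantly improving F1, Retain-Recall, and Read-Recall on LongMemEval-RR.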

AI中文摘要

长时程智能体可以存档大量历史记录，但未来的答案仍然会产生检索、重读和上下文成本。当保留的记忆缺少与答案相关的证据时，系统必须返回原始历史的大部分内容。我们研究预算化证据存留：在查询未知之前，应保留哪些源证据，以便在固定的保留源证据令牌预算下保持可恢复和可用？我们将此设置实例化为预算化预查询保留，其中记忆在摄取期间写入，随后在无法访问完整原始流的情况下读取。我们引入了EMBER，一种学习型保留策略，它构建了一个紧凑的、基于源的证据状态。EMBER存储证据胶囊：逐字源摘录，附带检索键和更新元数据，同时保留基础性和读取时间访问。查询后结果反馈训练写入器在摄取-检索-答案链中保留证据。在LongMemEval-RR（我们基于LongMemEval衍生的保留证据协议）上，EMBER-14B在8192令牌保留证据比较点达到0.3017 F1，而最强非EMBER预算化基线为0.1765。在不同的保留源证据预算下，EMBER提高了F1、保留召回和读取召回，表明长时程记忆依赖于在预算内保留证据，而不是重读更大的历史记录。

英文摘要

Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before the query is known, which source evidence should be retained so that it remains recoverable and usable under a fixed retained source-evidence token budget? We instantiate this setting as Budgeted Pre-Query Retention, where memory is written during ingestion and later read without access to the full raw stream. We introduce EMBER, a learned retention policy that constructs a compact, source-backed evidence state. EMBER stores evidence capsules: verbatim source excerpts paired with retrieval keys and update metadata, preserving both grounding and read-time access. Post-query outcome feedback trains the writer to preserve evidence across the ingestion-retrieval-answer chain. On LongMemEval-RR, our LongMemEval-derived retained-evidence protocol, EMBER-14B reaches 0.3017 F1 at the 8192-token retained-evidence comparison point, compared with 0.1765 for the strongest non-EMBER budgeted baseline. Across retained source-evidence budgets, EMBER improves F1, Retain-Recall, and Read-Recall, indicating that long-horizon memory depends on retaining evidence within the budget rather than rereading larger histories.
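
The evidence-capsule idea and the fixed token budget can be pictured with a small data structure. The sketch below is only a toy: EMBER is described as a learned retention policy, whereas this example uses a hand-assigned `usefulness` score and a greedy rule. The class name, field names, whitespace token counting, and example excerpts are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvidenceCapsule:
    excerpt: str               # verbatim source text
    retrieval_keys: List[str]  # keys used to find the capsule at read time
    updated_at: int            # update metadata, e.g. a turn index
    usefulness: float          # hypothetical score; EMBER learns its policy instead

    @property
    def token_cost(self) -> int:
        # Whitespace token count as a crude stand-in for a real tokenizer.
        return len(self.excerpt.split())


def retain_within_budget(capsules, budget):
    """Greedily keep capsules with the best usefulness per token."""
    ranked = sorted(
        capsules,
        key=lambda c: c.usefulness / max(c.token_cost, 1),
        reverse=True,
    )
    kept, used = [], 0
    for capsule in ranked:
        if used + capsule.token_cost <= budget:
            kept.append(capsule)
            used += capsule.token_cost
    return kept


capsules = [
    EvidenceCapsule("flight moved to Friday", ["flight", "travel"], 3, 0.9),
    EvidenceCapsule("we talked about the weather again", ["weather"], 5, 0.3),
    EvidenceCapsule("allergic to penicillin", ["allergy", "medication"], 1, 0.6),
]
kept = retain_within_budget(capsules, budget=8)
```

The key constraint this captures is that the choice is made before any query is known: whatever is dropped at write time cannot be recovered at read time.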

2606.05890 2026-06-05 cs.CL cs.AI

Staying with the Uncertainty: Uncertainty-Scaffolding Strategies for Artificial Moral Advisors in LLM-to-LLM Simulated Conversations

与不确定性共处：LLM对LLM模拟对话中人工道德顾问的不确定性支撑策略

Salvatore Greco, Hainiu Xu, Jacopo Domenicucci, Yulan He, Sylvie Delacroix

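Affiliations: Centre for Data Futures, The Dickson Poon School of Law, King’s College London; Department of Informatics, King’s College London; LangAI, Center for Language AI Research, Tohoku University; Neukom Institute for Computational Science, Dartmouth College

AI summary: Studies LLMs acting as Artificial Moral Advisors, comparing three uncertainty strategies (Perspective-Multiplying, Tension-Preserving, Process-Reflecting) against three control conditions in simulated conversations to explore how they can help interlocutors "stay with the uncertainty". Finds that the strategies do not differ in the amount of stance revision they produce but do differ in the quality of engagement.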

AI中文摘要

LLM越来越多地被部署为各种背景下的人工道德顾问（AMA）：它们应该展现什么样的对话模式？在本文中，我们研究AMA如何帮助其对话者“与不确定性共处”。我们提出了三种不确定性模式（视角倍增、张力保持、过程反思），并将它们与三种控制条件（基线、说服、谄媚）进行比较。用户代理LLM与遵循特定不确定性策略的AMA就伦理困境进行对话，并完成对话前和对话后的问卷调查。我们进一步考察了两种角色提示格式（陈述式和叙述式）的效果。我们发现：（1）没有一个单一模型作为模拟用户代理占主导地位，开放模型通过角色间分歧与人类模糊性对齐，而封闭模型通过角色内对冲对齐；（2）陈述式角色更好地捕捉初始立场多样性，而叙述式角色显示出更现实的信念修正；（3）所有六种AMA策略产生可区分的对话模式；（4）不确定性策略的不同不在于它们产生多少立场改变，而在于它们维持的参与质量。

英文摘要

LLMs are increasingly deployed as Artificial Moral Advisors (AMA) in a variety of contexts: what kind of conversational patterns should they display? In this paper, we study how AMA can help their interlocutors "stay with the uncertainty". We propose three modes of uncertainty (Perspective-Multiplying, Tension-Preserving, Process-Reflecting) and compare them against three control conditions (Baseline, Persuasive, Sycophantic). A user-agent LLM engages in a dialogue on an ethical dilemma with an AMA following a specific uncertainty strategy, and completes pre- and post-conversation questionnaires. We further examine the effect of two persona prompt formats (Declarative and Narrative). We found that (1) no single model dominates as a simulated user agent, with open models aligning with human ambiguity through between-persona divergence and closed models through within-persona hedging; (2) declarative personas better capture initial stance diversity while narrative personas show more realistic belief revision; (3) all six AMA strategies produce distinguishable conversational patterns; and (4) uncertainty strategies differ not in how much stance revision they produce, but in the quality of engagement they sustain.

2606.05889 2026-06-05 cs.SD cs.CL eess.AS

GLASS: GRPO-Trained LoRA for Acoustic Style Steering in Zero-Shot Text-to-Speech

GLASS: 基于GRPO训练的LoRA用于零样本文本转语音中的声学风格引导

Jaehoon Kang, Yejin Lee, Kyuhong Shim

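Affiliations: Department of Artificial Intelligence, Sungkyunkwan University

AI summary: Proposes GLASS, a framework that trains lightweight LoRA adapters with GRPO for composable acoustic style control in zero-shot autoregressive TTS, learning controls from rewards without style labels.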

AI中文摘要

我们提出GLASS，一个用于零样本自回归文本转语音（TTS）中可组合声学风格控制的框架，该框架从生成后奖励而非风格标签中学习控制。在零样本TTS中，说话人提示通常将说话人身份与语速、音高等韵律属性纠缠在一起，使得在不改变提示本身的情况下难以改变风格。GLASS将每个声学属性视为一个由奖励定义的控制方向。对于每个控制轴，GLASS冻结TTS主干，并使用组相对策略优化（GRPO）训练一个轻量级LoRA适配器，以语音令牌长度和平均F0作为风格奖励，以WER作为可懂度锚点。由于每个控制表示为LoRA权重更新，独立训练的适配器可以通过线性LoRA算术进行交换、插值和组合，而无需重新训练主干。在语速和音高控制上的实验显示了目标风格偏移，同时保持了自然度、说话人相似性和可懂度，并展示了跨独立训练适配器的平滑插值和多轴组合。

英文摘要

We propose GLASS, a framework for composable acoustic style control in zero-shot autoregressive text-to-speech (TTS) that learns controls from post-generation rewards rather than style labels. In zero-shot TTS, a speaker prompt often entangles speaker identity with prosodic attributes such as speaking rate and pitch, making it difficult to change style without changing the prompt itself. GLASS instead treats each acoustic attribute as a reward-defined control direction. For each control axis, GLASS freezes the TTS backbone and trains one lightweight LoRA adapter with Group Relative Policy Optimization (GRPO), using speech-token length and mean F0 as style rewards and WER as an intelligibility anchor. Because each control is represented as a LoRA weight update, independently trained adapters can be swapped, interpolated, and composed through linear LoRA arithmetic without retraining the backbone. Experiments on speaking rate and pitch control show targeted style shifts while preserving naturalness, speaker similarity, and intelligibility, and demonstrate smooth interpolation and multi-axis composition across independently trained adapters.
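
"Linear LoRA arithmetic" can be made concrete on a single weight matrix: each adapter contributes a low-rank update, and updates can be scaled and summed. The sketch below assumes the common LoRA parameterisation (update = up-projection times down-projection); the shapes, the strengths, and the names `compose_adapters`, `rate_adapter`, and `pitch_adapter` are hypothetical, and random matrices stand in for trained adapters.

```python
import numpy as np


def compose_adapters(base_weight, adapters, strengths):
    """Return base_weight + sum_i strengths[i] * (up_i @ down_i)."""
    merged = base_weight.copy()
    for (down, up), strength in zip(adapters, strengths):
        merged += strength * (up @ down)
    return merged


rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2
base_weight = rng.normal(size=(d_out, d_in))  # frozen backbone weight

# Stand-ins for two independently trained adapters (down: r x d_in, up: d_out x r).
rate_adapter = (rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)))
pitch_adapter = (rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank)))
adapters = [rate_adapter, pitch_adapter]

rate_only = compose_adapters(base_weight, adapters, [1.0, 0.0])   # swap in one axis
pitch_only = compose_adapters(base_weight, adapters, [0.0, 1.0])
half_rate = compose_adapters(base_weight, adapters, [0.5, 0.0])   # interpolate
both_axes = compose_adapters(base_weight, adapters, [1.0, 1.0])   # compose two axes
```

The arithmetic is exactly linear in weight space; whether the resulting speech changes smoothly is an empirical question, which is what the interpolation and composition experiments in the abstract address.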

2606.05888 2026-06-05 cs.AI

Retry Policy Gradients in Continuous Action Spaces

连续动作空间中的重试策略梯度

Soichiro Nishimori, Paavo Parmas

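Affiliations: The University of Tokyo, Japan

AI summary: Proposes pathwise derivative estimators for retry objectives such as pass@K and max@K and uses them to extend ReMax to continuous action spaces, where it promotes stochastic exploration by reshaping the policy-gradient landscape. Introduces the ReMAC algorithm, which achieves performance comparable to SAC.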

AI中文摘要

基于重试的目标（如pass@K和max@K）优化从多个采样轨迹中获得的最佳回报，最近的研究表明，它们可以在没有显式探索奖励的情况下促进探索。在离散动作空间中，ReMax被证明可以通过适应回报不确定性来实现这一点。在这项工作中，我们引入了重试目标的路径导数估计器，并用它们将ReMax扩展到连续动作空间。我们研究了由此产生的学习动态，并表明，即使使用确定性奖励，ReMax也可以通过重塑策略梯度景观来鼓励随机探索。特别地，它既改变了梯度的方向，使更新偏向于更高的策略熵，也改变了梯度的大小，抑制梯度并减缓收敛。我们进一步表明，Adam的自适应归一化可以缓解这种抑制，具体取决于其数值稳定化参数。在实验上，我们将该目标实例化为ReMax Actor-Critic（ReMAC），这是一种使用路径导数估计器优化ReMax目标的离策略actor-critic算法。我们的实验表明，ReMAC可以在没有熵正则化的情况下促进更高的策略熵，并实现与SAC相当的性能。

英文摘要

Retry-based objectives such as pass@K and max@K optimize the best return obtained from multiple sampled trajectories, and recent work has shown that they can promote exploration without explicit exploration bonuses. In discrete action spaces, ReMax was shown to do so by adapting to return uncertainty. In this work, we introduce pathwise derivative estimators for retry objectives and use them to extend ReMax to continuous action spaces. We study the resulting learning dynamics and show that, even with deterministic rewards, ReMax can encourage stochastic exploration by reshaping the policy-gradient landscape. In particular, it alters gradients both in direction, biasing updates toward higher policy entropy, and in magnitude, damping gradients and slowing convergence. We further show that Adam's adaptive normalization can mitigate this damping, depending on its numerical stabilization parameter. Empirically, we instantiate this objective as ReMax Actor-Critic (ReMAC), an off-policy actor-critic algorithm that optimizes the ReMax objective using a pathwise derivative estimator. Our experiments show that ReMAC can promote higher policy entropy without entropy regularization and achieves performance comparable to SAC.
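
For readers unfamiliar with retry objectives, the sketch below estimates the value of max@K from N sampled returns: the expected best return when K of the N samples are drawn without replacement. It uses a standard combinatorial identity (the i-th smallest sample, counting from zero, is the maximum of C(i, K-1) of the C(N, K) subsets). This only evaluates the objective; it is not the pathwise derivative estimator introduced in the paper, and the function name `max_at_k` is an assumption.

```python
from math import comb


def max_at_k(returns, k):
    """Expected maximum of k returns drawn without replacement from `returns`."""
    n = len(returns)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(returns)")
    ordered = sorted(returns)
    # ordered[i] is the largest element in comb(i, k - 1) of the size-k subsets.
    weighted = sum(comb(i, k - 1) * r for i, r in enumerate(ordered))
    return weighted / comb(n, k)


returns = [1.0, 2.0, 3.0]
mean_return = max_at_k(returns, 1)   # k = 1 reduces to the ordinary mean
best_of_two = max_at_k(returns, 2)   # subsets {1,2}, {1,3}, {2,3} -> (2 + 3 + 3) / 3
best_of_all = max_at_k(returns, 3)   # k = n reduces to the maximum

# With binary rewards, max@K coincides with pass@K.
pass_at_2 = max_at_k([0, 0, 1, 1], 2)
```

Increasing K shifts weight toward the upper tail of the return distribution, which gives some intuition for why such objectives can reward a policy for keeping its outcomes spread out.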

2606.05885 2026-06-05 cs.LG

When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training

当更密集的信用不足时：面向长周期LLM智能体训练的基于证据校准的策略优化

Yuanfan Li, Qi Zhou, Wenjing Duan, Lu Chen

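Affiliations: X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Faculty of Electronic and Information Engineering, Xi’an Jiaotong University

AI summary: Targets credit assignment for long-horizon LLM agents under sparse, delayed rewards. Proposes ECPO, a critic-free policy optimization algorithm that corrects the statistical unreliability of dense credit through evidence-calibrated action advantages and variance-gated credit weighting, significantly improving performance on ALFWorld and WebShop.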

AI中文摘要

长周期LLM智能体需要能够在稀疏和延迟奖励下为中间决策分配信用的强化学习方法。最近的基于分组的方法如GiGPO通过构建重复锚点状态下的步骤级优势来改进GRPO。然而，我们表明这种密集信用在统计上可能不可靠：在有限的轨迹采样下，罕见但幸运的动作可能获得过大的优势，产生发散锚点偏差和后期训练振荡。我们提出证据校准策略优化（ECPO），一种在策略更新前校准步骤级信用的无评论家策略优化算法。ECPO结合了证据校准动作优势（将轨迹按规范动作分组并收缩低计数估计）和方差门控信用加权（抑制由动作内噪声主导的锚点状态）。在ALFWorld和WebShop上使用Qwen2.5-1.5B/7B的实验表明，ECPO持续优于强基线，在Qwen2.5-1.5B上，ALFWorld/WebShop的成功点分别比GiGPO提高+5.2/+7.3，同时仅增加0.1%的额外优势计算开销。

英文摘要

Long-horizon LLM agents require reinforcement learning methods that can assign credit to intermediate decisions under sparse and delayed rewards. Recent group-based methods such as GiGPO improve over GRPO by constructing step-level advantages at repeated anchor states. However, we show that such dense credit can be statistically unreliable: under limited rollouts, rare but lucky actions may receive overly large advantages, producing divergent anchor bias and late-stage training oscillation. We propose Evidence-Calibrated Policy Optimization (ECPO), a critic-free policy optimization algorithm that calibrates step-level credit before policy updates. ECPO combines Evidence-Calibrated Action Advantage, which groups rollouts by canonical actions and shrinks low-count estimates, with Variance-Gated Credit Weighting, which suppresses anchor states dominated by within-action noise. Experiments on ALFWorld and WebShop with Qwen2.5-1.5B/7B show that ECPO consistently outperforms strong baselines, improving GiGPO by +5.2/+7.3 success points on ALFWorld/WebShop with Qwen2.5-1.5B while adding only 0.1% additional advantage-computation overhead.
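
The intuition behind grouping rollouts by action and shrinking low-count estimates can be shown with a generic shrinkage rule. The abstract does not give ECPO's formula, so the sketch below substitutes a textbook count-based shrinkage factor n / (n + tau); the function name, the `tau` parameter, and the example actions and returns are all assumptions for illustration.

```python
from collections import defaultdict


def shrunk_action_advantages(samples, tau=2.0):
    """Per-action advantage at one anchor state, shrunk toward zero by count.

    samples: list of (action, return) pairs observed at the same anchor state.
    tau:     hypothetical pseudo-count; larger values shrink rare actions more.
    """
    by_action = defaultdict(list)
    for action, ret in samples:
        by_action[action].append(ret)
    baseline = sum(ret for _, ret in samples) / len(samples)
    advantages = {}
    for action, rets in by_action.items():
        count = len(rets)
        raw_advantage = sum(rets) / count - baseline
        advantages[action] = (count / (count + tau)) * raw_advantage
    return advantages


# One lucky rollout of a rare action versus four rollouts of a common one.
samples = [
    ("open drawer", 1.0),
    ("go to desk", 0.0),
    ("go to desk", 1.0),
    ("go to desk", 0.0),
    ("go to desk", 1.0),
]
advantages = shrunk_action_advantages(samples, tau=2.0)
```

Without shrinkage the rare action would receive a raw advantage of 0.4 from a single sample; with it, the estimate is pulled toward zero until more evidence accumulates, which is the failure mode ("rare but lucky actions") the abstract highlights.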
