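arXivDaily: a daily academic digest synced with the full arXiv listing, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.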

2605.31460 2026-06-01 cs.RO cs.SY eess.SY

On-Device Robotic Planning: Eliminating Inference Redundancy for Efficient Decision-Making

Joonhee Lee, Hyunseung Shin, Hyunmi Kim, Pei Zhang, Jeonggil Ko

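AI summary: Proposes REIS, a framework that cuts inference redundancy through scene gating, KV-steered affordance routing, and deliberative reasoning, speeding up robot control while preserving semantic adaptability.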

Comments: 19 pages

Abstract

Reasoning-based robotic policies using large language and vision-language models achieve strong semantic planning capabilities but mostly suffer from high inference latency that limits practical real-time deployment. In this work, we observe that robotic reasoning workloads contain substantial temporal redundancy, where consecutive observations frequently produce identical actions and subgoals. Based on this insight, we present REIS, a human-cognition-inspired robotic decision-making framework that minimizes unnecessary reasoning while preserving semantic adaptability. REIS combines lightweight scene gating, KV-steered affordance routing, and deliberative reasoning to accelerate robotic control under embodied constraints. Experiments on ALFRED and real-world robotic tasks demonstrate that REIS significantly suppresses reasoning overhead while maintaining competitive task performance.
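
The temporal-redundancy observation in this abstract can be illustrated with a minimal caching gate. This is a sketch under stated assumptions, not the REIS implementation: the `SceneGate` class, the cosine-similarity test, the 0.95 threshold, and the `slow_planner` callback are all hypothetical stand-ins, since the abstract does not say how the scene gate decides.

```python
import math
from typing import Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class SceneGate:
    """Hypothetical gate: reuse the last action while the scene looks unchanged."""

    def __init__(self, slow_planner: Callable[[Sequence[float]], str],
                 threshold: float = 0.95) -> None:
        self.slow_planner = slow_planner      # stand-in for the expensive reasoning model
        self.threshold = threshold            # assumed similarity cutoff
        self.last_features: Optional[Sequence[float]] = None
        self.last_action: Optional[str] = None
        self.planner_calls = 0

    def act(self, features: Sequence[float]) -> Optional[str]:
        unchanged = (
            self.last_features is not None
            and cosine_similarity(features, self.last_features) >= self.threshold
        )
        if not unchanged:
            # Scene changed (or first step): pay for full reasoning and cache the result.
            self.last_action = self.slow_planner(features)
            self.last_features = features
            self.planner_calls += 1
        return self.last_action
```

The cached features are refreshed only when the planner runs, so slow drift accumulates against the last reasoned scene and eventually triggers a new planning call.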

2605.31457 2026-06-01 cs.CV

VisionPulse: Dynamic Visual Sparsity for Efficient Multimodal Reasoning

Hengbo Xu, Shengjie Jin, Yanbiao Ma, Zhiwu Lu

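AI summary: Proposes VisionPulse, a step-wise visual token pruning framework that estimates a retention budget from visual attention mass and keeps only the critical tokens, reducing inference overhead and reasoning-trace length with almost no loss in accuracy.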

Comments: Accepted at ICML 2026

Abstract

With the rapid advancement of large multimodal models (LMMs), inference-time overhead has become a key bottleneck for real-world deployment. Existing methods typically prune visual tokens at prefill, assuming the required visual evidence remains static during reasoning. However, we empirically show that visual evidence is strongly step-dependent: only a sparse subset of visual tokens is critical at each decoding step, and the critical set evolves across reasoning. Furthermore, we identify a coupled bottleneck where redundant visual context can steer the model toward query-irrelevant regions, lengthening the reasoning trace. Guided by these insights, we propose VisionPulse, a step-wise visual token pruning framework during reasoning. VisionPulse computes a lightweight visual attention mass to estimate the step-wise retention budget by exploiting its strong positive correlation with LMMs' effective visual token usage, and retains only the most critical tokens under this budget. By enforcing visual sparsity during reasoning, VisionPulse filters redundant visual context while preserving relevant visual evidence, shortening reasoning traces naturally. Extensive experiments show that VisionPulse retains only 5% of visual tokens per step with reasoning traces shortened by 11.2%, while keeping accuracy almost unchanged.
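
The budget-then-select step described in this abstract can be sketched as follows. The abstract states only that the budget is estimated from the visual attention mass; the linear mapping from mass to budget, the `max_keep` cap, and both function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def retention_budget(attn: Sequence[float], visual_idx: Sequence[int],
                     max_keep: int) -> int:
    """Assumed rule: the budget grows linearly with the attention mass on visual tokens."""
    total = sum(attn)
    mass = sum(attn[i] for i in visual_idx) / total if total > 0 else 0.0
    return max(1, min(max_keep, math.ceil(mass * max_keep)))


def prune_visual_tokens(attn: Sequence[float], visual_idx: Sequence[int],
                        max_keep: int) -> List[int]:
    """Keep the highest-attention visual tokens for this decoding step."""
    budget = retention_budget(attn, visual_idx, max_keep)
    ranked = sorted(visual_idx, key=lambda i: attn[i], reverse=True)
    return sorted(ranked[:budget])  # return kept positions in sequence order
```

Calling `prune_visual_tokens` once per decoding step, with that step's attention weights, is what makes the selection step-dependent rather than fixed at prefill.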

2605.31455 2026-06-01 cs.LG cs.CL

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai, Yao Shu

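AI summary: Online reinforcement learning is costly for multi-turn interaction, while offline supervised fine-tuning suffers from distribution shift. DRIFT addresses both by recasting the KL-regularized RL objective as an equivalent importance-weighted supervised learning problem, giving efficient and stable multi-turn optimization.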

Abstract

Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Unfortunately, optimizing such behavior presents a sharp dilemma in practice: online reinforcement learning is able to effectively address multi-turn dynamics but is prohibitively expensive due to the cost of generating full correction trajectories at every update, whereas offline supervised fine-tuning (SFT) is efficient but suffers from distribution shift and behavioral collapse. To this end, we propose DRIFT (Decoupled Rollouts and Importance-Weighted Fine-Tuning), a framework that operationalizes the theoretical insight that the KL-regularized RL objective is equivalent to importance-weighted supervised learning. DRIFT decouples rollout from optimization by sampling offline interaction trajectories from a fixed reference policy, deriving return-based importance weights, and optimizing the policy via weighted SFT on the resulting dataset. Empirically, we demonstrate that DRIFT matches or exceeds the performance of multi-turn reinforcement learning baselines while maintaining the training efficiency and simplicity of standard supervised fine-tuning. Code is available at https://github.com/2020-qqtcg/DRIFT.
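
The equivalence mentioned in this abstract rests on a standard result: the maximizer of expected return minus beta times KL(pi || pi_ref) is pi*(y|x) proportional to pi_ref(y|x) * exp(R(x, y) / beta). Maximum likelihood on reference-policy samples weighted by exp(R / beta) therefore targets pi*. The sketch below shows that weighting only; the self-normalization, the function names, and the per-trajectory log-probability input are assumptions, and the exact weights used by DRIFT may differ.

```python
import math
from typing import List, Sequence


def importance_weights(returns: Sequence[float], beta: float) -> List[float]:
    """Self-normalized weights proportional to exp(R / beta)."""
    top = max(returns)  # subtract the max for numerical stability
    unnorm = [math.exp((r - top) / beta) for r in returns]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def weighted_sft_loss(log_probs: Sequence[float], returns: Sequence[float],
                      beta: float) -> float:
    """Weighted negative log-likelihood over offline trajectories.

    log_probs[i] is the current policy's log-probability of trajectory i,
    which was sampled once from the fixed reference policy.
    """
    weights = importance_weights(returns, beta)
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

A small `beta` concentrates the weight on the highest-return trajectories, while a large `beta` approaches uniform weighting, which is plain SFT on the reference samples.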

2605.31452 2026-06-01 cs.CL cs.HC

Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows

Yuri Balashov, Rex VanHorn, Mingxi Xu, Austin Downes

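AI summary: Develops practical, low-barrier methods for freelance translators and small language service providers by benchmarking locally runnable LLMs on offline translation in confidentiality-sensitive domains. Carefully selected local LLMs can match or surpass local neural machine translation (NMT) systems and a frontier LLM, but still trail the top commercial NMT systems.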

Comments: 20 pages. Accepted at EAMT-2026 (Tilburg, Netherlands, June 2026)

Abstract

Building on our previous work, this paper develops practical, low-barrier methods for freelance translators and smaller language service providers to evaluate translation technologies using rigorous yet accessible analytic methods. Here we address a high-stakes, specialized need: offline translation for confidentiality-sensitive domains in which privacy constraints preclude the use of cloud-based engines and commercial LLMs. We expand the Reeve Foundation Trilingual Corpus (RFTC) used in our previous work into a multilingual corpus (RFMC) by adding sentence-aligned German and Simplified Chinese reference translations. We then benchmark several locally runnable language models (via Ollama) across four language directions on 1000+ sentences selected from this corpus. We use consistent single-prompt calls without fine-tuning or domain adaptation, comparing local LLM outputs against commercial NMTs (DeepL, Baidu), a frontier LLM (GPT-5.2), and professional-grade local NMT systems (OPUS-CAT, NeuralDesktop, Promt). Automatic evaluation is conducted with MATEO. Results reveal substantial variation in local LLM performance across language directions and model sizes. The best local LLMs match or surpass local NMT systems and a frontier LLM, though they remain behind top commercial NMTs. These findings underscore the viability of carefully selected local LLM translation for privacy-constrained professionals and inform future research on model scaling and multilingual capability.

2605.31446 2026-06-01 cs.CL cs.AI

Fine-grained Verification via Diagnostic Reasoning Supervision for Aspect Sentiment Triplet Extraction

Wenna Lai, Haoran Xie, Guandong Xu, Qing Li, S. Joe Qin

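AI summary: Proposes FiVeD, a framework for fine-grained verification with diagnostic reasoning supervision, which uses objectives such as quality scoring and error-type classification to make ASTE triplet extraction more reliable.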

Comments: 25 pages, 13 figures, and 6 tables

Abstract

Aspect Sentiment Triplet Extraction (ASTE) aims to identify aspect terms, opinion terms, and sentiment polarities as structured triplets, providing essential inputs for downstream information system applications such as opinion mining, explainable recommendations, and review summarization. Prior work mainly focuses on end-to-end extraction, while post hoc verification of extracted triplets remains comparatively underexplored. This gap limits the reliability of ASTE systems, since predicted triplets may be locally plausible while being globally invalid. Moreover, candidate invalidity is multi-faceted and candidate usability is inherently graded, motivating a fine-grained verification mechanism that can filter or re-rank outputs from diverse extractors. In this paper, we propose FiVeD, a framework for Fine-grained Verification with Diagnostic reasoning supervision. Specifically, the verifier is trained with multiple complementary objectives, including validity classification and quality score estimation as primary tasks, with error type classification and rationale generation as auxiliary tasks. We define hierarchical error categories and construct plausible incorrect triplets under semantic and syntactic constraints, and leverage an off-the-shelf LLM with task-specific rubrics to produce quality scores and diagnostic rationales. During inference, the resulting quality scores are used to filter candidate outputs, supporting adjustable precision-recall tradeoffs. Experiments across multiple ASTE baselines demonstrate that FiVeD consistently improves extraction performance by up to 3.53 F1 points as a plug-and-play verification module.
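
The inference-time filtering and the adjustable precision-recall tradeoff described in this abstract can be sketched with a score threshold. The triplet tuples, the scores, and both function names are made up for illustration; in the paper the scores come from a trained verifier, which this sketch does not model.

```python
from typing import List, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]  # (aspect term, opinion term, sentiment polarity)


def filter_by_quality(candidates: Sequence[Triplet], scores: Sequence[float],
                      threshold: float) -> List[Triplet]:
    """Keep candidate triplets whose verifier quality score meets the threshold."""
    return [c for c, s in zip(candidates, scores) if s >= threshold]


def precision_recall(kept: Sequence[Triplet], gold: Set[Triplet]) -> Tuple[float, float]:
    """Precision and recall of the kept triplets; empty sets count as 1.0 by convention."""
    true_positives = len(set(kept) & gold)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall
```

Raising the threshold drops low-scoring candidates, which tends to raise precision at the cost of recall; sweeping it traces out the tradeoff curve.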

2605.31445 2026-06-01 cs.GT cs.AI cs.CL cs.LG

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information

Antonio Valerio Miceli-Barone, Vaishak Belle, Shay B. Cohen

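AI summary: Studies LLM agents in simulated bargaining scenarios. The agents deviate from game-theoretic equilibria and attempt to lie but cannot efficiently exploit information asymmetries; optimizing for financial utility makes them stronger negotiators but also more dishonest.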

Comments: 18 pages, 14 figures

Abstract

In this work we study agents in simulated bargaining scenarios, where a buyer and a seller communicate through a text channel and attempt to negotiate mutually beneficial trades, under different information regimes (complete information, information asymmetry or mutual uncertainty). We evaluate their performance w.r.t. game-theoretical solutions and further investigate their honesty (their tendency to disclose or withhold information or to mislead and deceive) as well as their credulity (their tendency to trust or distrust information provided by the other agent). We study zero-shot LLM agents with simple prompting scaffolding as well as fine-tuned agents, in order to investigate whether optimising the agents to maximise financial profits makes them stronger negotiators but also more dishonest and less trusting. We find that off-the-shelf LLMs all substantially deviate from game-theoretical equilibria: they attempt to lie about their private information but cannot efficiently exploit information asymmetries. Fine-tuning on financial utility makes the agents stronger at achieving better deals but also more dishonest, highlighting the risks that optimising agents for a task can have on their safety. We release our code and a dataset of bargaining scenarios.

2605.31444 2026-06-01 cs.AI cs.LO

Answer-Set-Programming-based Abstractions for Reinforcement Learning

Rafael Bankosegger, Thomas Eiter, Johannes Oetsch

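AI summary: Proposes using Answer-Set Programming (ASP) to realize the abstractions of the CARCASS framework, addressing the challenge of enormous state spaces in reinforcement learning, and evaluates the approach in case studies on two domains, Blocks World and Minigrid.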

Comments: Accepted for publication at the 42nd International Conference on Logic Programming (ICLP 2026). To appear in Theory and Practice of Logic Programming (TPLP)

Abstract

Reinforcement Learning (RL) enables autonomous agents to learn policies from experience, but realistic problems often involve enormous state spaces, making learning and generalisation challenging. Abstraction and approximation are therefore essential. Relational Reinforcement Learning (RRL) offers a way to reason about objects and their relations, and the CARCASS framework by Martijn van Otterlo demonstrates how logical representations can model Markov Decision Processes (MDPs) in first-order domains. Originally implemented in Prolog, CARCASS leverages domain knowledge to create powerful abstractions. We explore Answer-Set Programming (ASP), which is a rich and, contrary to Prolog, fully declarative modelling language, to realise CARCASS abstractions. We evaluate our ASP-based implementation in case studies of two domains, viz. Blocks World and Minigrid. Our results indicate that CARCASS with ASP provides a promising approach to constructing abstractions for RL, especially when domain knowledge is available.

2605.31443 2026-06-01 stat.ME cs.LG econ.EM math.ST stat.TH

Modeling Covariate Transition for Efficient Estimation of Longitudinal Treatment Effects in Randomized Experiments

Naoki Chihara, Tatsushi Oka, Yasuko Matsubara, Yasushi Sakurai, Shota Yasui

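AI summary: Proposes a regression-adjustment framework that models covariate transitions to estimate longitudinal treatment effects in randomized experiments, and establishes asymptotic normality and the semiparametric efficiency bound for the estimator.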

Journal ref: The 43rd International Conference on Machine Learning, 2026
Comments: Accepted by ICML'26

Abstract

We present a regression-adjustment framework designed for the estimation of longitudinal treatment effects in randomized experiments under static regimes. While regression-adjustment methods are useful for variance reduction in randomized experiments by using pre-treatment covariates, they usually focus only on average effects, from which we cannot obtain valuable insights into when the effects appear and how long they continue. To address this issue, we consider intermediate outcomes and evolving post-treatment covariates over time, and we represent such dynamic trajectories using transition kernels. Furthermore, we establish the asymptotic normality and the semiparametric efficiency bound for our estimator, enabling more powerful statistical inference. Simulation studies and empirical analysis using A/B test data from a streaming platform in Japan show the practical advantages of our method.

2605.31438 2026-06-01 cs.LG

Flow map learning in nonlinear vector autoregressive models: influence of the feature-library structure on the training error

Markus Gross

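AI summary: Studies how the structure of the feature library affects the training error in nonlinear vector autoregressive models (NVAR/NG-RC). The training error follows scaling laws in the time resolution, and its behavior is determined by whether the feature library can exactly represent the Lie-series coefficients of the flow map.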

Comments: 35 pages, 12 figures

Abstract

Time series forecasting often requires learning nonlinear and time-delayed dependencies. A paradigmatic class of forecasting models is nonlinear vector autoregressive processes (NVAR), also known as next-generation reservoir computers (NG-RCs). These models approximate the Koopman operator on the space spanned by their explicit feature library. We consider the identifiability problem for learning Markovian nonlinear dynamical systems and show that the training error as a function of time resolution follows characteristic (pre-)asymptotic scaling laws. These laws depend on whether the feature library can represent the early Lie-series coefficients of the flow map (propagator) exactly or merely approximately. For dynamical systems governed by polynomial vector fields, we demonstrate the mechanism for NVAR/NG-RC models with monomial and Fourier feature libraries. We determine the dependence of the training error on the temporal resolution, the involved nonlinear degree, and the number of delay terms. While delay terms reduce the optimal one-step training error, they improve long-horizon forecasts only when the library provides sufficient nonlinearity. Thus, small training error coexists with weak generalization when the model class is mismatched to the true data-generating process. Numerical experiments on various chaotic dynamical systems confirm the theoretical predictions.
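
To make the phrase "explicit feature library" concrete, the sketch below builds a monomial library over a delay-embedded state, the kind of input an NVAR/NG-RC model feeds into a linear readout. The function names and the feature ordering are assumptions for illustration, and the readout fit (commonly ridge regression) is omitted.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import List, Sequence


def delay_embed(series: Sequence[Sequence[float]], t: int, delays: int) -> List[float]:
    """Concatenate the states at times t, t-1, ..., t-delays+1."""
    embedded: List[float] = []
    for d in range(delays):
        embedded.extend(series[t - d])
    return embedded


def monomial_features(x: Sequence[float], degree: int) -> List[float]:
    """All monomials of the entries of x up to the given total degree, constant first."""
    feats = [1.0]
    for p in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(x)), p):
            feats.append(prod(x[i] for i in combo))
    return feats
```

For an embedded state of dimension n and degree d the library has C(n + d, d) features, so adding delay terms enlarges n while raising the degree is what adds nonlinearity.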

2605.31436 2026-06-01 cs.RO

Actuator-Aware Inverse Kinematics with Joint-Limit Admissibility for Torque-Controlled Redundant Robots

Mohammad Dastranj, Mahdi Hejrati, Jouni Mattila

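AI summary: Proposes an inverse-kinematics method based on convex quadratic programming that constrains joint limits with control-barrier-function-style bounds and resolves redundancy with a controller-compatibility objective, improving task behavior without modifying the downstream controller.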

Abstract

This paper proposes actuator-aware inverse kinematics for torque-controlled redundant robots under joint-limit constraints. In the considered architecture, the inverse-kinematic output is not merely a purely kinematic joint-velocity command; it is the required joint velocity supplied to a downstream torque-level controller. Therefore, a small commanded task residual may not necessarily improve realized motion. The proposed method formulates a convex quadratic programming problem whose decision variable is the joint-level required velocity. Control barrier function style bounds impose reference-level joint-limit admissibility, while the task equation is handled through a penalized slack variable. Redundancy is resolved using a controller-compatibility objective that accounts for previous-command consistency and actuator torque-capacity weighting. The method is independent of the particular torque-level controller and can serve as an intermediate IK layer between an endpoint trajectory and a redundant robot controller. Experiments on a virtual-decomposition-controlled seven-degree-of-freedom upper-limb exoskeleton compare the method with standard inverse-kinematic baselines and a constrained task-preserving quadratic programming baseline. The results indicate lower limit-pushing commands, bounded admissible required velocities, and improved realized task behavior in the tested trajectory, without modifying the downstream controller.
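
The control-barrier-function-style joint-limit bounds mentioned in this abstract can be illustrated with the standard first-order construction: for h_lo = q - q_min and h_hi = q_max - q, requiring dh/dt >= -gamma * h gives -gamma * (q - q_min) <= dq/dt <= gamma * (q_max - q). The sketch below computes those bounds and clips a desired velocity to them. The clip stands in for the paper's quadratic program and matches a QP only in the simplified case of an identity-weighted tracking cost with box constraints alone; the gain `gamma` and the function names are assumptions.

```python
from typing import List, Sequence, Tuple


def cbf_velocity_bounds(q: Sequence[float], q_min: Sequence[float],
                        q_max: Sequence[float], gamma: float) -> Tuple[List[float], List[float]]:
    """Per-joint velocity bounds that keep q inside [q_min, q_max]."""
    lower = [-gamma * (qi - lo) for qi, lo in zip(q, q_min)]
    upper = [gamma * (hi - qi) for qi, hi in zip(q, q_max)]
    return lower, upper


def clip_velocity(qd_des: Sequence[float], lower: Sequence[float],
                  upper: Sequence[float]) -> List[float]:
    """Project a desired joint velocity onto the admissible box (simplified stand-in for the QP)."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(qd_des, lower, upper)]
```

The admissible velocity toward a limit shrinks linearly as the joint approaches it and reaches zero at the limit, while motion away from the limit stays unrestricted.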

2605.31433 2026-06-01 cs.CL

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

Wai-Chung Kwan, Aryo Pradipta Gema, Joshua Ong Jun Leang, Pasquale Minervini

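AI summary: Proposes SCOPE, a data-free self-play framework that co-evolves a Challenger policy and a Solver policy, improving performance on open-ended tasks and matching or exceeding a baseline trained on curated prompts.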

Abstract

Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks that co-evolves two policies: a Challenger that generates document-grounded tasks, and a Solver that answers them through multi-turn retrieval. A frozen copy of the initial model serves as the self-judge, which writes task-specific rubrics from the source document and grades Solver responses against them. Across three 7-8B instruction-tuned models (Qwen2.5, Qwen3, OLMo-3), SCOPE improves open-ended performance by up to +10.4 points on eight benchmarks and matches or exceeds GRPO_data trained on ~9K curated prompts. Although trained only on open-ended tasks, SCOPE also improves held-out short-form QA by up to +13.8 points on seven held-out benchmarks, surpassing GRPO_data on all three models. Ablations show that co-evolving the Challenger is necessary to keep tasks near the Solver's frontier, that gains arise from improvements in both retrieval and synthesis with the relative contribution varying by task, and that rubric generation quality is the bottleneck for self-judging.

2605.31432 2026-06-01 cs.CL cs.AI cs.SD

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

Sara Papi, Luisa Bentivogli

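AI summary: Proposes DOA, a training-free policy that derives a proxy alignment from decoder self-attention so that SpeechLLMs can make streaming decisions in long-form simultaneous translation.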

Abstract

Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-based encoder-decoder models where cross-attention provides explicit alignment signals. In contrast, Speech Large Language Models (SpeechLLMs) are decoder-only architectures relying solely on self-attention. This raises a central question: whether decoder self-attention contains sufficiently stable alignment signals to guide the streaming policy. Moreover, existing approaches typically rely on training-based adaptations or heuristic wait-$k$ policies and have not been validated in long-form settings. To fill these gaps, we propose Decoder-Only Attention (DOA), a training-free policy that enables long-form simultaneous translation with off-the-shelf SpeechLLMs by deriving a proxy alignment from self-attention. Experiments on Phi4-Multimodal and Qwen3-Omni show that DOA provides an effective alignment signal for supporting streaming decisions, enabling low-latency long-form SimulST with quality close to offline decoding without retraining.

2605.31429 2026-06-01 cs.CV

YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models

Ting Chen, Geng Li, Guohao Chen, Yu Hu, Guan Huang, Mai Chen, Langsheng Lei, Jun Du

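AI summary: Proposes YARD, a training-free contrastive decoding framework whose Y-architecture shares shallow-layer computation and branches at the middle layers, building the degraded branch by replacing visual tokens with register tokens; it mitigates hallucinations and lowers inference latency.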

Comments: 21 pages, 11 figures

Abstract

Contrastive decoding (CD) seeks to mitigate hallucinations in Large Vision-Language Models (LVLMs) by contrasting the output distributions of a standard model and a visually degraded model. However, existing training-free CD methods suffer from sub-optimal degraded branches: completely dropping visual tokens is too extreme and induces language hallucinations, while corrupting input images offers coarse control over visual evidence and suffers from high inference latency due to requiring two full forward passes. To address these dilemmas, we propose YARD, a training-free Y-Architecture Register Decoding framework. Motivated by the observation that reliable text-to-vision grounding predominantly emerges in the middle decoder layers, YARD constructs the degraded branch internally by sharing shallow-layer computations and branching exactly at this critical stage. For the degraded branch, YARD replaces patch-level visual tokens with register tokens, which preserve global image semantics but lack fine-grained local evidence. This image-aware yet locally under-grounded design provides a faithful contrastive signal without extreme modality mismatch, while the Y-architecture strictly avoids a costly second forward pass. Extensive experiments on generative and discriminative hallucination benchmarks demonstrate that YARD consistently achieves state-of-the-art hallucination mitigation across multiple LVLMs, alongside a significant reduction in inference latency.
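
The contrast between a standard and a visually degraded branch can be sketched with the combination rule commonly used in visual contrastive decoding, (1 + alpha) * standard logits - alpha * degraded logits. Whether YARD uses exactly this rule is not stated in the abstract; the rule, the `alpha` value, and the function names are assumptions, and the sketch does not model the Y-architecture or the register tokens.

```python
from typing import List, Sequence


def contrastive_logits(standard: Sequence[float], degraded: Sequence[float],
                       alpha: float) -> List[float]:
    """Amplify what the standard branch supports beyond the degraded branch."""
    return [(1.0 + alpha) * s - alpha * d for s, d in zip(standard, degraded)]


def greedy_token(logits: Sequence[float]) -> int:
    """Index of the highest logit (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

A token that the degraded branch favors even more strongly than the standard branch is likely driven by language priors rather than local visual evidence, so the contrast pushes it down.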

2605.31427 2026-06-01 cs.LG cs.DC

DG-CoLearn: An Efficient Collaborative Learning Framework for Dynamic Graphs

Ashley Hoi-Ting Au, Zikun Zhang, Ligang He, Qiang Ni

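AI summary: Dynamic graph learning incurs heavy computational overhead from repeated full-snapshot retraining and is poorly suited to collaborative settings with partitioned data. DG-CoLearn is a client-oblivious collaborative learning framework built on incremental graph snapshot processing, with a server-mediated embedding exchange mechanism for accurate multi-hop message passing; it reports significant gains in training speed, communication overhead, and predictive performance.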

Abstract

Dynamic graph learning (DGL) is essential for modelling evolving graph data, but existing methods suffer from significant computational overhead due to repeated full-snapshot retraining and are not well-suited for collaborative settings with partitioned data. In realistic graph systems, cross-partition edges are unavoidable, but direct sharing of graph structure between clients may violate privacy constraints. We propose DG-CoLearn, a client-oblivious collaborative dynamic graph learning framework built on incremental graph snapshot processing, which focuses computation on graph regions affected by temporal updates while preserving historical information through temporal modelling. This incremental design is consistently applied across the entire graph processing pipeline, including a server-mediated embedding exchange mechanism to enable accurate multi-hop message passing without exposing raw cross-client structural information. Extensive experiments demonstrate that DG-CoLearn achieves up to 33.8$\times$ speedup in training time and 27.4$\times$ reduction in communication overhead, while consistently improving predictive performance on both node classification (up to 13.36% F1 improvement) and link prediction (up to 8.27% MAP improvement) tasks. These results highlight the effectiveness of DG-CoLearn in bridging efficiency, scalability, and client-to-client structural privacy in collaborative dynamic graph learning.
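
One plausible reading of "graph regions affected by temporal updates" is the set of nodes within a fixed number of hops of any changed edge, since a k-layer message-passing model can only change its output on those nodes. The sketch below computes that set with a multi-source breadth-first search. This is an assumption about what the affected region means, not the DG-CoLearn algorithm, and the function name and adjacency-dict input are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Sequence, Set, Tuple


def affected_nodes(adj: Dict[int, Sequence[int]],
                   changed_edges: Iterable[Tuple[int, int]],
                   hops: int) -> Set[int]:
    """Nodes within `hops` steps of an endpoint of any changed edge."""
    sources = {node for edge in changed_edges for node in edge}
    seen = set(sources)
    queue = deque((node, 0) for node in sources)
    while queue:
        node, dist = queue.popleft()
        if dist == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen
```

Only the returned nodes would need their embeddings recomputed after an update; everything else can reuse results from the previous snapshot.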

2605.31426 2026-06-01 eess.IV cs.CV math.OC

Self-Tuning Regularization for Image Scanning Microscopy

Sofia Agostoni, Lisa Cuneo, Christian Daniele, Giacomo Garré, Laurent Le, Alessandro Zunino, Giuseppe Vicidomini, Luca Calatroni

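AI summary: For multi-image deconvolution (MID) and super-resolution sectioning ISM (s²ISM) reconstruction in image scanning microscopy (ISM), proposes a self-tuning explicit regularization framework. A Bayesian maximum a posteriori formulation combines a multi-frame Poisson data-fidelity term with ℓ1 or smoothed total variation penalties, and the regularization parameter is selected adaptively using the residual whiteness principle, so no empirical stopping rule is needed. The framework achieves stable super-resolution and optical sectioning in low-photon conditions.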

Abstract

Image Scanning Microscopy (ISM) is a fluorescence imaging technique that combines detector-array acquisition and computational reconstruction to achieve the theoretical resolution of an ideal confocal microscope, i.e., one operating with an infinitesimally small pinhole, while maintaining high signal-to-noise ratio. Among the reconstruction methods for obtaining the super-resolved image, multi-image deconvolution (MID) and its extension aimed at preserving the optical sectioning capability of confocal microscopy, known as super-resolution sectioning ISM (s$^2$ISM), are among the most widely used approaches. Both methods rely on Richardson--Lucy-type iterative schemes, whose semi-convergent behavior requires early stopping and often leads to noise amplification and reconstruction artifacts. In this work, we introduce a self-tuning explicit regularization framework for both MID and s$^2$ISM reconstruction. Within a Bayesian maximum a posteriori formulation, we combine a multi-frame Poisson data fidelity term with explicit regularization, considering $\ell_1$ and smoothed total variation penalties as representative examples. We further develop an automatic and ground-truth-free strategy for regularization parameter selection by adapting the residual whiteness principle to the multi-frame Poisson setting and introducing a spectral high-pass extension tailored to s$^2$ISM. The resulting framework enables stable reconstructions without empirical stopping rules. To demonstrate the proposed framework, we consider first-order optimization schemes based on proximal gradient and mirror descent methods with adaptive backtracking strategies. Experiments on simulated and real fluorescence ISM datasets demonstrate improved reconstruction stability and image quality with respect to unregularized approaches, while enabling robust super-resolution and optical sectioning in low-photon conditions.

2605.31423 2026-06-01 cs.LG

Fixed Universal Transformers

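AI summary: Proposes FSM-Net, which performs efficient dual-domain deblurring with a Frequency Attention module and a Cross-Gated Vision E-Branchformer, and placed 2nd in the NTIRE 2026 challenge.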


Comments: Accepted to NTIRE Workshop at CVPR 2026. Project page: https://efficient-deblurring-fsmnet.vercel.app

AI中文摘要

真实世界图像去模糊要求高保真恢复和计算效率，现有方法往往难以平衡。本文提出FSM-Net（频率-空间多分支网络），一种高效解决方案，在NTIRE 2026高效真实世界去模糊挑战赛中获得第二名。FSM-Net开创了双域方法：新颖的频率注意力模块通过FFT显式恢复高频结构细节，而瓶颈处的交叉门控视觉E-Branchformer以线性复杂度捕获全局依赖。为确保鲁棒收敛，我们采用由复合损失函数（多尺度Charbonnier、结构边缘和频率）引导的渐进课程训练策略。在RSBlur基准上评估，FSM-Net仅用4.94M参数和159.35 GMACs（1920x1200分辨率）即达到33.144 dB PSNR的出色性能。通过有效推动效率与质量的帕累托前沿，FSM-Net为资源受限的图像恢复建立了强基线。

英文摘要

Real-world image deblurring demands both high-fidelity restoration and computational efficiency, a balance existing methods often struggle to achieve. In this paper, we propose FSM-Net (Frequency-Spatial Multi-branch Network), a highly efficient solution that secured 2nd place in the NTIRE 2026 Challenge on Efficient Real-World Deblurring. FSM-Net pioneers a dual-domain approach: a novel Frequency Attention module explicitly recovers high-frequency structural details via FFT, while a Cross-Gated Vision E-Branchformer at the bottleneck captures global dependencies with linear complexity. To ensure robust convergence, we employ a progressive curriculum training strategy guided by a composite loss function (Multi-Scale Charbonnier, Structural Edge, and Frequency). Evaluated on the RSBlur benchmark, FSM-Net achieves an outstanding 33.144 dB PSNR with only 4.94M parameters and 159.35 GMACs (at 1920x1200 resolution). By effectively pushing the Pareto frontier of efficiency and quality, FSM-Net establishes a strong baseline for resource-constrained image restoration.
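The abstract mentions a Charbonnier loss term and reports quality as PSNR. As a rough illustration of those two quantities, here is a minimal pure-Python sketch on flat lists of pixel values. It is not the FSM-Net training code: the function names, the default `eps`, and the `max_value` convention are assumptions made for this example, and the multi-scale, edge, and frequency terms of the composite loss are not shown.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d^2 + eps^2) over paired pixel values.

    eps is a hypothetical default; it keeps the loss smooth near d = 0.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for values in [0, max_value]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of about 20 dB, which gives a sense of how small the residual must be to reach values above 33 dB.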


2605.31393 2026-06-01 cs.CL cs.AI

Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models

面向手语翻译的大语言模型目标端释义增强

Pedro Dal Bianco, Jean Paul Nunes Reinhold, Oscar Stanchi, Facundo Quiroga, Franco Ronchetti, Ulisses Brisolara Corrêa

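AI summary: To address scarce parallel corpora and a long-tailed target vocabulary in sign language translation, proposes target-side augmentation in which GPT-4o generates controlled paraphrase variants of the reference sentences, and evaluates the method on three sign language datasets.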


Comments: Accepted at GenSign (https://genai4sl.github.io/) at CVPR 2026. Non-proceedings track

AI中文摘要

手语翻译（SLT）仍然受到有限的配对手语视频/文本语料库和长尾目标词汇的限制。我们研究了目标端增强方法，其中GPT-4o生成参考句子的受控释义变体，而手语输入保持不变。采用基于Signformer姿态的Transformer，在两阶段调度下进行训练：先在增强语料库上预训练，然后在原始参考句子上微调。我们在三个具有互补挑战的数据集上进行了评估：PHOENIX14T（德国手语），具有适度的词汇多样性；GSL（希腊手语），具有高度受控、重复的录制；以及LSA-T（阿根廷手语），具有严重的长尾稀疏性。在PHOENIX14T上，增强将BLEU-4从9.56提高到10.33。接近饱和的GSL基线和极其稀疏的LSA-T设置揭示了该方法的局限性。据我们所知，这是第一项将LLM生成的目标端释义和LLM作为评估者应用于手语翻译的研究。语义评估揭示了词汇重叠指标低估的忠实度提升。

英文摘要

Sign language translation (SLT) remains constrained by limited paired sign-video/text corpora and heavy-tailed target vocabularies. We study target-side augmentation in which GPT-4o generates controlled paraphrase variants of reference sentences while the sign input remains unchanged. A Signformer-style pose-based Transformer is trained under a two-stage schedule: pre-training on the augmented corpus followed by fine-tuning on the original references. We evaluate on three datasets spanning complementary challenges: PHOENIX14T (German Sign Language), with moderate lexical diversity; GSL (Greek Sign Language), with highly controlled, repetitive recordings; and LSA-T (Argentinian Sign Language), with severe long-tail sparsity. On PHOENIX14T, augmentation improves BLEU-4 from 9.56 to 10.33. The near-saturated GSL baseline and extremely sparse LSA-T setting reveal the limits of the approach. To our knowledge, this is the first study to apply LLM-generated target-side paraphrases and LLM-as-a-Judge evaluation to SLT. The semantic evaluation reveals gains in fidelity that lexical overlap metrics understate.


2605.31391 2026-06-01 physics.ins-det cs.LG hep-ex

Deep-learning-based low-energy trigger algorithms for the Hyper-Kamiokande experiment

基于深度学习的Hyper-Kamiokande实验低能量触发算法

Katharina Lachner, Saúl Alonso-Monsalve, Benjamin Richards, Davide Sgalaberna

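AI summary: For low-energy neutrino events (below 7 MeV) in the Hyper-Kamiokande experiment, proposes and compares a supervised neural network and anomaly-detection triggers (autoencoder and MPDR). On 3 MeV single-electron signals, the supervised model and MPDR reach efficiencies of 76.7% and 31.8% respectively, against 26.4% for a traditional hit-count trigger. Per-window GPU inference latency is well below a millisecond, which supports real-time operation.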


Comments: 16 pages, 6 figures

AI中文摘要

现代机器学习技术因其强大的模式识别能力在粒子物理学中变得越来越重要，包括在具有严格运行时间约束的实时数据采集中。本文详细介绍了针对大型水切伦科夫探测器（如Hyper-Kamiokande）的低能中微子事件（低于7 MeV）的基于深度学习的触发算法的性能。展示了自定义神经网络监督分类器的性能，以及两种仅基于探测器噪声训练的异常检测方法：纯自编码器和基于流形投影-扩散恢复（MPDR）的能量模型。监督模型对动能为3 MeV的单电子信号识别效率为76.7%，显著超过了传统基于命中计数触发的26.4%的信号效率，MPDR方法也达到了31.8%。在GPU上的运行时间评估显示，每窗口推理延迟远低于毫秒量级，表明实时操作是可行的。

英文摘要

Modern machine learning techniques have become increasingly important in particle physics because of their powerful pattern-recognition capabilities, including in real-time data acquisition where stringent runtime constraints apply. This paper details the performance of deep-learning-based trigger algorithms for a large water Cherenkov detector such as Hyper-Kamiokande aimed at low-energy neutrino events (below 7 MeV). The performance of custom neural-network supervised classifiers is shown alongside two anomaly-detection approaches trained solely on detector noise: a pure autoencoder and an energy-based model based on Manifold Projection--Diffusion Recovery (MPDR). The supervised model shows signal identification efficiencies of 76.7% for single electrons of 3 MeV kinetic energy, significantly exceeding signal efficiencies obtained from a traditional hit-count-based trigger of 26.4%, as does the MPDR approach with 31.8%. Runtime evaluations on GPU yield per-window inference latencies well below the millisecond scale, indicating that real-time operation is feasible.
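The baseline in this abstract is a hit-count-based trigger. The sketch below shows the general idea of such a trigger: fire when enough hits fall inside a sliding time window. It is only an illustration under stated assumptions, not the Hyper-Kamiokande implementation. The function names and the default `window_ns` and `threshold` values are placeholders invented for this example.

```python
def max_hits_in_window(hit_times_ns, window_ns):
    """Largest number of hits whose times span at most window_ns."""
    times = sorted(hit_times_ns)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Advance the left edge until the window is no wider than window_ns.
        while t - times[left] > window_ns:
            left += 1
        best = max(best, right - left + 1)
    return best


def hit_count_trigger(hit_times_ns, window_ns=200.0, threshold=20):
    """Return True if any sliding window holds at least `threshold` hits.

    window_ns and threshold are hypothetical values for illustration only.
    """
    return max_hits_in_window(hit_times_ns, window_ns) >= threshold
```

A trigger of this kind uses only the number of hits, so a faint signal whose hit count is close to the noise level is hard to separate from noise. That limitation is the motivation for the learned classifiers compared in the abstract.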


2605.31388 2026-06-01 cs.LG

Constrained Multi-Objective Reinforcement Learning with Max-Min Criterion

带最大最小准则的约束多目标强化学习

Giseung Park, Hyunyoung Nam, Woohyeon Byeon, Amir Leshem, Youngchul Sung

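AI summary: Proposes a multi-objective reinforcement learning framework that combines the max-min criterion with explicit constraint satisfaction, validates the resulting algorithm through convergence analysis and tabular experiments, and shows that it balances fairness and constraint satisfaction in building thermal control, multi-objective locomotion control, and greenhouse-gas-emission-aware traffic management.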


Comments: Accepted to ICML 2026

AI中文摘要

多目标强化学习（MORL）通过优化针对多个（通常相互冲突的）目标的策略来扩展标准强化学习。虽然最大最小MORL已成为促进公平性的有效方法，但其适用性仍然有限，特别是在必须纳入约束的情况下。在本文中，我们提出了一种将最大最小准则与显式约束满足相结合的MORL框架。我们为所提出的框架建立了理论基础，并通过收敛性分析和表格设置中的实验验证了所得算法。我们进一步在模拟建筑热控制、多目标运动控制和温室气体排放感知交通管理中展示了我们方法的实际相关性。在这些领域中，我们的方法有效地平衡了多目标决策中的公平性和约束满足。

英文摘要

Multi-Objective Reinforcement Learning (MORL) extends standard RL by optimizing policies with respect to multiple, often conflicting, objectives. While max-min MORL has emerged as an effective approach for promoting fairness, its applicability remains limited, particularly when constraints must be incorporated. In this paper, we propose a MORL framework that integrates the max-min criterion with explicit constraint satisfaction. We establish a theoretical foundation for the proposed framework and validate the resulting algorithm through convergence analysis and experiments in tabular settings. We further demonstrate the practical relevance of our approach in simulated building thermal control, multi-objective locomotion control, and greenhouse-gas-emission-aware traffic management. Across these domains, our method effectively balances fairness and constraint satisfaction in multi-objective decision-making.
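A constrained max-min criterion can be read as: among policies that satisfy every cost limit, prefer the one whose worst objective return is highest, i.e. $\max_\pi \min_k J_k(\pi)$ subject to $C_j(\pi) \le d_j$. The sketch below applies that rule by brute force to a finite set of candidates with known returns and costs. It illustrates the selection criterion only and is not the learning algorithm from the paper. The function name and data layout are assumptions made for this example.

```python
def select_max_min_policy(candidates, cost_limits):
    """Pick the feasible candidate whose worst objective return is largest.

    candidates: dict mapping a name to (returns, costs), each a list of floats.
    cost_limits: list of upper bounds, one per cost entry.
    Returns the chosen name, or None if no candidate satisfies every limit.
    """
    best_name = None
    best_value = float("-inf")
    for name, (returns, costs) in candidates.items():
        # Skip candidates that violate any cost constraint.
        if not all(c <= d for c, d in zip(costs, cost_limits)):
            continue
        worst = min(returns)  # the max-min criterion scores by the worst objective
        if worst > best_value:
            best_name, best_value = name, worst
    return best_name
```

For example, with returns `[10, 1]`, `[4, 5]`, and `[6, 6]`, the third candidate has the best worst-case return, but it is only chosen when its cost is within the limit. Otherwise the rule falls back to the second candidate rather than the one with the highest single return.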


2605.31387 2026-06-01 cs.CL cs.RO

Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely

多轮多智能体对话用于协作重建仅略微提升VLM在空间推理上的性能

Chalamalasetti Kranti, Sherzod Hakimov, David Schlangen

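AI summary: Evaluates vision-language models on a collaborative spatial reasoning task using a multi-turn multi-agent dialogue framework. Visual spatial understanding remains the main bottleneck, while text representations and decomposed image representations partially improve performance.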


Comments: Preprint

AI中文摘要

在多样化环境中运行的机器人依赖视觉输入来解释物体和空间布局。在人类协作任务中，它们被期望通过语言传达这种理解。视觉语言模型（VLM）支持涉及视觉解释、问答和指令跟随的机器人任务，但它们在需要空间推理的协作对话任务中的能力仍未充分探索。我们通过一个结合视觉解释、基础、语言引导交互和动作生成的协作结构构建任务来研究这一差距。我们开发了一个框架，其中VLM通过对话从视觉和文本输入重建目标结构。我们在交互设置、输入模态和图像表示上评估了开放权重和封闭VLM。结果表明，对于评估的VLM，视觉表示的空间推理仍然困难。目标的详细文本表示在模态条件下产生更高的重建成功率，而分解的图像表示提高了性能。这些发现揭示了协作VLM智能体在视觉空间基础和基础指令生成方面的局限性。

英文摘要

Robots operating in diverse environments rely on visual input to interpret objects and spatial layouts. In human-collaborative tasks, they are expected to communicate this understanding through language. Vision-language models (VLMs) support robotic tasks involving visual interpretation, question answering, and instruction following, but their capabilities in collaborative dialogue tasks requiring spatial reasoning remain underexplored. We study this gap through a collaborative structure-building task that combines visual interpretation, grounding, language-guided interaction, and action generation. We develop a framework in which VLMs use dialogue to reconstruct a target structure from visual and textual inputs. We evaluate open-weight and closed VLMs across interaction settings, input modalities, and image representations. Results show that spatial reasoning over visual representations remains difficult for the evaluated VLMs. Detailed text representations of the target yield higher reconstruction success across modality conditions, while decomposed image representations improve performance. These findings reveal limits in visual spatial grounding and grounded instruction generation for collaborative VLM agents.


2605.31378 2026-06-01 cs.CL

Unlocking Fine-Grained Translation Quality Estimation in LRMs through Synergistically Evolving Implicit and Explicit Reasoning

解锁大型推理模型中细粒度翻译质量评估：通过协同演化隐式和显式推理

Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu, Shimin Tao, Daimeng Wei, Min Zhang, Shujian Huang

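AI summary: To address the difficulty large reasoning models have with fine-grained translation quality estimation, proposes RIEQE, a two-stage training framework in which non-thinking supervised fine-tuning (implicit reasoning) and thinking reinforcement learning (explicit reasoning) co-evolve. It surpasses the baselines on the WMT test sets.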


AI中文摘要

大型推理模型（LRMs）即使拥有长推理链，在细粒度翻译质量评估（QE）上仍然困难。我们认为LRMs已经具备强大的多语言能力，而核心挑战源于学习细粒度QE任务的内在难度。本文提出RIEQE（隐式和显式推理用于QE），一个简单的两阶段训练框架，实现隐式（层级别）和显式（词级别）推理能力的协同演化。为了使隐式推理可行，我们首先将复杂的QE任务分解为简单的子任务。基于此，我们的两阶段方法应用：（1）非思考监督微调（NonThinking-SFT），无推理链的监督微调，直接提升模型的隐式推理倾向和能力；（2）思考强化学习（Thinking-RLVR），标准可验证奖励强化学习，随后增强显式推理。结果表明，在我们的框架下，隐式和显式推理协同演化。在WMT测试集上，基于Qwen3-4B-Thinking-2507的RIEQE在显式推理性能上超越所有基线，同时其隐式推理能力也与当前最好的基于编码器的模型相当。我们进一步提供了隐式和显式推理协同合作的证据，展示了它们如何相互促进。

英文摘要

Large Reasoning Models (LRMs) still struggle with fine-grained translation quality estimation (QE), even with long reasoning chains. We argue that LRMs already possess strong multilingual capabilities, while the core challenge stems from the intrinsic difficulty of learning the fine-grained QE task. In this paper, we propose RIEQE (Reasoning both Implicitly and Explicitly for QE), a simple two-stage training framework that enables the co-evolution of implicit (layer-wise) and explicit (token-wise) reasoning capabilities. To make implicit reasoning feasible, we first decompose the complex QE task into straightforward subtasks. Based on this, our two-stage approach applies: (1) NonThinking-SFT, Supervised Fine-Tuning (SFT) without reasoning chains to directly boost the model's implicit reasoning tendency and capability; and (2) Thinking-RLVR, standard Reinforcement Learning with Verifiable Reward (RLVR) to subsequently strengthen explicit reasoning. Results demonstrate that implicit and explicit reasoning synergistically co-evolve under our framework. On the WMT test sets, RIEQE based on Qwen3-4B-Thinking-2507 surpasses all baselines in explicit reasoning performance, while its implicit reasoning capability is also comparable to the best current encoder-based models. We further provide evidence for the synergistic collaboration between implicit and explicit reasoning, showing how they mutually benefit each other.


2605.31377 2026-06-01 cs.IR cs.AI

DynaTree: Dynamic Agentic Retrieval Tree for Time-Sensitive News Retrieval

DynaTree: 面向时效性新闻检索的动态智能检索树

Siyuan Qi, Xinyuan Wang, Yingxuan Yang, Haochuan Guo, Jianghao Lin, Weiwen Liu, Yong Yu, Weinan Zhang

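AI summary: Proposes DynaTree, a two-stage framework that builds a reusable retrieval tree offline and performs lightweight subtree selection online for efficient, adaptive retrieval of time-sensitive news. It outperforms standard RAG and existing agentic methods on the Syft news benchmark and BEIR datasets.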


AI中文摘要

智能体检索增强生成通过集成规划、工具使用和迭代推理改进了检索，但现有的智能体RAG方法通常将语义扩展与检索决策耦合在短视推理循环中，导致推理成本高且不适用于时效性新闻检索。我们提出DynaTree，一个高效自适应新闻检索的两阶段框架。在离线阶段，DynaTree使用协调的智能体构建一个可复用的检索树，具体化查询主题的语义空间。在在线阶段，DynaTree在时间局部评估代理上执行轻量级日常子树选择，无需进一步的智能体推理、树修改或重新训练。在多日Syft新闻基准和多个BEIR数据集上的实验表明，DynaTree实现了强大的召回和排序性能，始终优于标准RAG和先前的智能体基线。我们进一步在Syft生产系统中部署DynaTree，并通过2026年1月28日至2月6日的在线A/B测试进行评估。动态适应变体将固定离线选择子树的生存率从0.32-0.53提高到0.59-0.73，并且在每个评估日都优于现有的生产召回器。这些结果表明，持久的、结构感知的语义扩展可以将离线智能体推理转化为实际改进，覆盖范围、新鲜度和相关性在真实世界新闻检索中均得到提升。

英文摘要

Agentic Retrieval-Augmented Generation improves retrieval by integrating planning, tool use, and iterative reasoning, but existing agentic RAG methods often couple semantic expansion with retrieval decisions in short-horizon inference loops, leading to high inference cost and limited suitability for time-sensitive news retrieval. We propose DynaTree, a two-stage framework for efficient and adaptive news retrieval. In the offline stage, DynaTree uses coordinated agents to construct a reusable retrieval tree that materializes the semantic space of a query topic. In the online stage, DynaTree performs lightweight daily subtree selection over a time-localized evaluation proxy, without further agentic reasoning, tree modification, or retraining. Experiments on a multi-day Syft news benchmark and multiple BEIR datasets show that DynaTree achieves strong recall and ranking performance, consistently outperforming standard RAG and prior agentic baselines. We further deploy DynaTree in the Syft production system and evaluate it through online A/B testing from Jan. 28 to Feb. 6, 2026. The dynamically adapted variant improves survival rate from 0.32-0.53 to 0.59-0.73 over a fixed offline-selected subtree and outperforms existing production recallers on every evaluation day. These results show that persistent, structure-aware semantic expansion can translate offline agentic reasoning into practical improvements in coverage, freshness, and relevance for real-world news retrieval.


2605.31376 2026-06-01 cs.RO cs.CV cs.GR

LiftNav: Path Planning via Semantic Lifting in TSDF-Guided Gaussian Splatting

LiftNav: TSDF引导的高斯泼溅中的语义提升路径规划

Hannah Schieber, Dominik Frischmann, Victor Schaack, Angela P. Schoellig, Daniel Roth

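AI summary: Proposes LiftNav, a hybrid navigation framework that combines a dual TSDF+GS map, YOLO detection, TSDF-based 3D lifting, and B-spline trajectory optimization to enable flexible semantic navigation without dense 3D embeddings. A hinge-loss collision penalty improves trajectory smoothness and safety. In simulation on the Replica dataset it achieves 100% feasibility and shorter trajectories.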

2605.31373 2026-06-01 cs.LG cs.AI

Scaling Higher-Order Graph Learning with Maximal Clique Complexes

基于最大团复形的规模化高阶图学习

Antoine Vialle, Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo

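AI summary: Proposes simplified and decomposed cellular Weisfeiler-Leman tests and maximal clique complexes, combined with CliqueWalk random walks, to obtain scalable higher-order graph neural networks.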