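arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.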

2606.14862 2026-06-16 cs.RO New submission

TacStyle: Personalizing Tactile Robot Policies using Structured Behavior Representations

Kevin Robledo, Matías I. Torres Galaz, Kumar Dixhant Rai, Shelly Sara Ulman, Tasmia Tasrin, Heramb Nemlekar

Affiliations: Department of Computer Science, California State University, Northridge; Department of Mechanical Engineering, California State University, Northridge

AI summary: Proposes organizing user preferences in a structured latent representation and using a foundation model to interpret language instructions, enabling fine-grained adjustment of robot behavior while reducing the need for preference labels.

Comments 14 pages, 5 figures

Abstract

Robotic systems that assist humans should be capable of adapting their behaviors to individual user preferences. For instance, users may want a robot arm to adjust the amount of force it applies while folding their laundry or cleaning furniture. Natural language provides an intuitive way for humans to communicate such preferences. Recent progress in language-conditioned robot policies has shown that robots can successfully use language prompts to determine what task to perform. However, extending the same approach to realize how the task should be performed requires detailed labels describing the preferences or styles of trajectories in the task data. Not only is collecting such annotations challenging, but conditioning directly on these labels may also fail to provide fine-grained control over a continuous range of behaviors. For example, it can be difficult to convey the exact force that a robot must apply through abstract instructions like "apply a bit more pressure than before". Therefore, in this work, we propose using language to reason over preferred behaviors instead of directly generating them. We first learn a structured latent representation that organizes user preferences according to differences in the corresponding trajectories. Then, given a preference prompt, we use a foundation model to interpret this latent space and choose a value that produces the desired behavior. Through both simulation and real-world experiments, we show that selecting robot behaviors from an intuitively structured latent space enables more precise adaptation to user preferences while requiring significantly fewer preference labels than language-conditioned policies.

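2606.15232 2026-06-16 cs.RO New submission

Rethinking Implicit Spatial Representation in Visuomotor Policy Learning

Xiangyu Chen, Yuxuan Hu, Chuhao Zhou, Jianfei Yang

Affiliations: MARS Lab, Nanyang Technological University

AI summary: Re-evaluates the effectiveness of spatial softmax pooling in robot manipulation, finding that it provides a compact, stable spatial representation but is limited by a representational bottleneck, and proposes the PRISM encoder, which improves performance by fusing multi-scale implicit spatial information.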

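Learning New Tasks through Reusable Skills: Skill-Compositional Experts for Embodied Continual Learning

Shuaike Zhang, Shaokun Wang, Haoyu Tang, Jianlong Wu, Liqiang Nie

Affiliations: Harbin Institute of Technology, Shenzhen; Shandong University; Shenzhen Loop Area Institute

AI summary: Proposes the Skill-Compositional Experts (SCE) framework, which decomposes demonstrations into reusable skills via Compositional Skill Grounding (CSG) and learns new tasks with Dual Execution-and-Transition Experts (DETE), effectively mitigating catastrophic forgetting in embodied continual learning.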

Comments 13 pages, 5 figures

Abstract

Embodied Continual Learning (ECL) aims to enable robots to continually acquire new manipulation tasks while retaining previously learned behaviors under closed-loop control. Compared with conventional continual learning, ECL suffers from more severe catastrophic forgetting. Feature drift accumulated under closed-loop control progressively propagates through sequential decision-making, leading to degradation of previously learned behaviors. A key challenge in ECL lies in structured skill reuse across continually evolving tasks, since existing methods primarily focus on skill learning without explicitly organizing them for coherent task execution. To address this issue, we propose SCE, a Skill-Compositional Experts framework for ECL. SCE builds a skill base via Compositional Skill Grounding (CSG), which decomposes task demonstrations into reusable skills. Based on this, Dual Execution-and-Transition Experts (DETE) enable new task learning through skill composition, where one branch ensures skill execution and the other supports transitions between skills for coherent behavior. Experiments on LIBERO benchmarks and real-world manipulation tasks demonstrate that SCE consistently improves retention and overall task performance. Further feature drift analyses and ablation studies verify the effectiveness of our method. Project website: https://eqcy.github.io/sce/.

2606.16178 2026-06-16 cs.RO New submission

Scaling Short-Term Memory of Visuomotor Policies for Long-Horizon Tasks

Rutav Shah, Rajat Kumar Jenamani, Xiaohan Zhang, Lingfeng Sun, Roberto Martín-Martín, Yuke Zhu, Deva Ramanan, Karl Schmeckpeper

Affiliations: University of California, Berkeley; Stanford University; Toyota Research Institute

AI summary: Proposes the PRISM architecture, which extends the short-term memory of visuomotor policies to two minutes through gated attention and hierarchical compression, outperforming existing methods by 5% to 12% on the ReMemBench benchmark.

Comments 14 pages, 9 Figures, 8 Tables

Abstract

Many robotic tasks require short-term memory, whether it's retrieving an object that's no longer visible or turning off an appliance after a set period. Yet, most visuomotor policies trained via imitation learning rely only on immediate sensory input without using past experiences to guide decisions. We present PRISM, a transformer-based architecture for visuomotor policies to effectively use short-term memory via two key components: (i) gated attention, which filters retrieved information to suppress irrelevant details, improving performance by reducing the spurious correlations between the history and current action prediction; and (ii) a hierarchical architecture that first compresses local information into compact tokens and then integrates them to capture temporally extended dependencies, improving its compute and memory footprint. Together, these mechanisms enable us to scale short-term memory in visuomotor policies for up to two minutes. To systematically evaluate memory in visuomotor control, we introduce ReMemBench, a benchmark of eight diverse household manipulation tasks spanning four categories of short-term memory, designed to foster general memory mechanisms rather than siloed, task-specific solutions. PRISM consistently outperforms prior works, including recurrent architectures, transformers, and their variants, achieving an absolute improvement of 5% to 12% over the strongest baseline. On the RoboCasa and LIBERO benchmarks, it achieves absolute improvements of 11% to 15% over its no-memory variant and fine-tuned Vision-Language-Action baselines such as GR00T-N1-3B and OpenVLA, despite not leveraging any large-scale pretraining. Together, PRISM and ReMemBench establish a foundation for developing and evaluating short-term memory-augmented visuomotor policies that scale to long-horizon tasks. Additional materials are available at https://shahrutav.github.io/short-term-memory.
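
As a rough illustration of the gated-attention idea described in this abstract, the sketch below runs single-query softmax attention over a small memory and then scales the retrieved vector with a sigmoid gate computed from the query. The gate parameterization (`gate_w`, `gate_b`) and all function names are assumptions made for illustration; the abstract does not specify PRISM's actual layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gated_attention(query, keys, values, gate_w, gate_b):
    """Attend over memory tokens, then gate the retrieved vector element-wise.

    gate_w (one weight row per output dim) and gate_b are hypothetical
    gate parameters, not taken from the paper.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    retrieved = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    # Sigmoid gate in (0, 1): values near 0 suppress a retrieved feature.
    gate = [1.0 / (1.0 + math.exp(-(dot(row, query) + b)))
            for row, b in zip(gate_w, gate_b)]
    return [g * r for g, r in zip(gate, retrieved)], gate
```

With a strongly negative bias on one output dimension, the gate drives that retrieved feature toward zero regardless of the attention weights, which is the filtering behavior the abstract attributes to gating.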

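2606.16458 2026-06-16 cs.RO New submission

RHO: Your Coding Agent is Secretly a Roboticist

Karim Elmaaroufi, Justin Svegliato, Sarunas Kalade, Graham Schelle, Sanjit A. Seshia, Matei Zaharia

Affiliations: University of California, Berkeley; AMD

AI summary: Proposes the RHO paradigm, which searches at training time over neurosymbolic multi-file policy repositories to achieve efficient zero-shot generalization on robot tasks, reaching success rates of 45% on LIBERO-PRO and 70% on Robosuite, significantly outperforming existing methods.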

Comments 46 pages, 9 figures, 15 tables. Project page: https://rho-robotics.github.io

Abstract

Code-as-Policies (CaP) has shown that large language models (LLMs) can write code to solve robotics tasks by composing perception, planning, and control primitives. Recent CaP systems, however, rely on multi-turn code-generation loops at test time, which is often infeasible for real-time robot control. We introduce Robotics Harness Optimization (RHO), a novel paradigm in which tool-enabled coding agents, at training time, propose and search for interpretable, neurosymbolic multi-file policy repositories (Repositories-as-Policies) that compose these primitives rather than a single prompt, function, or file. RHO searches with reflective feedback from environment reward and execution rather than teleoperation demonstrations. It generalizes to perturbed pick-and-place settings like LIBERO-PRO, where OpenVLA scores 0.0% and $π_{0.5}$ averages 12.83%. Using the same low-level primitives, RHO reaches a 45.0% success rate, 2.5x higher than the strongest multi-turn agentic system, and 3.5x higher than $π_{0.5}$. On Robosuite, RHO sets a new state-of-the-art of 70.0%, exceeding the prior multi-turn record of 68.29% using single-turn execution with no corrective LLM code edits at deployment. When an LLM is used in the control loop, as on RAI's O3DE benchmark, RHO optimizes the deployed agent's multi-file harness of prompts, tools, and control code, improving held-out success from 23.5% to 44.3% with 20% less wall-clock time and 27% fewer tool calls.

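2606.16572 2026-06-16 cs.RO New submission

Steering Generative Reinforcement Learning into Stable Robotic Controller

Yixuan Wang, Shutong Ding, Ke Hu, Tianxiang Gui, Jingya Wang, Ye Shi

Affiliations: ShanghaiTech University

AI summary: Proposes the SteerGenPO framework, which uses latent-space reinforcement learning to turn a trained generative policy into a robust deterministic robot controller; it outperforms baselines on Isaac Lab and Unitree G1 tasks and yields more stable inference-time behavior.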

Abstract

Diffusion and flow-based generative policies provide a powerful policy class for reinforcement learning by inducing rich stochastic exploration through iterative action generation. However, the stochasticity of diffusion policies is not suitable for stable and precise control in high-dimensional robotic systems, where small action variations can accumulate into inconsistent motion and reduced robustness. To address this issue, we propose SteerGenPO, a latent-space reinforcement learning framework that steers a trained generative policy into a robust deterministic robotic controller. The key idea is to replace stochastic latent sampling of the trained generative policy with a learned latent actor that predicts a state-dependent latent input for the generative policies. This separates exploration and control: stochastic generative sampling provides diverse action proposals during policy learning, while deterministic latent steering provides stable and adaptive control at deployment. We evaluate SteerGenPO on six Isaac Lab benchmarks and a Unitree G1 locomotion task. The results show SteerGenPO improves over both classical RL and generative RL baselines, while its deterministic latent steering produces more stable inference-time behaviors and more reliable command responses.
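
The separation between stochastic sampling and deterministic latent steering can be sketched with a toy example. Everything here is a stand-in chosen for illustration: `generative_policy` is a one-line placeholder for a trained diffusion or flow policy, and `latent_actor` is a hypothetical linear actor, not the network used in SteerGenPO.

```python
import math
import random

def generative_policy(state, latent):
    """Placeholder for a trained generative policy: (state, latent) -> action."""
    return math.tanh(0.5 * state + latent)

def sample_action(state, rng):
    """Stochastic use: the latent is drawn from a standard normal prior."""
    return generative_policy(state, rng.gauss(0.0, 1.0))

def latent_actor(state, weight=-0.3, bias=0.1):
    """Hypothetical learned actor that predicts a state-dependent latent."""
    return weight * state + bias

def steered_action(state):
    """Deterministic use: the latent comes from the actor instead of sampling."""
    return generative_policy(state, latent_actor(state))
```

Repeated calls to `steered_action` with the same state return the same action, while `sample_action` varies from call to call; this is the property the abstract relies on for stable control at deployment.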

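2606.16856 2026-06-16 cs.RO New submission

Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning

Tung M. Luu, Hwanhee Kim, Younghwan Lee, Chang D. Yoo

AI summary: Proposes the VOTP framework, which uses video foundation models and optimal transport to generate pseudo-labels, learning effective reward functions from only a small amount of human feedback and substantially reducing labeling cost.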

Comments ICML 2026 (Oral)

Abstract

Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward engineering. Preference-based RL (PbRL) offers a promising alternative by learning reward functions from human feedback, but its scalability is hindered by high labeling costs. Inspired by advances in Video Foundation Models (ViFMs), we present Video-based Optimal Transport Preference (VOTP), a semi-supervised framework that learns effective reward functions from only a handful of labels. By leveraging optimal transport to align visual trajectories within the rich representation space of ViFMs, VOTP effectively generates high-fidelity pseudo-labels for large amounts of unlabeled data, substantially reducing human supervision. Extensive experiments across locomotion and manipulation benchmarks demonstrate the superiority of VOTP, which outperforms state-of-the-art offline PbRL methods under limited feedback budgets. We also showcase the robustness of VOTP in the presence of visual distractors and validate its utility on real robotic tasks, where it learns meaningful rewards with minimal human input.
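
To make the optimal-transport alignment step more concrete, here is a minimal sketch that compares two trajectories of feature vectors with entropic OT (Sinkhorn iterations, uniform marginals) and derives a pseudo-label from the resulting costs. The labeling rule in `pseudo_label` (prefer the segment closer to a labeled preferred reference) is an assumption for illustration; the abstract does not state how VOTP turns alignments into labels, and the feature vectors would come from a video foundation model rather than toy numbers.

```python
import math

def sinkhorn_cost(xs, ys, reg=0.1, iters=200):
    """Entropic OT cost between two sequences of feature vectors (uniform marginals)."""
    n, m = len(xs), len(ys)
    # Squared Euclidean ground cost between every pair of frames.
    cost = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan is diag(u) * kernel * diag(v); return its total cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

def pseudo_label(seg_a, seg_b, preferred_ref):
    """Return 0 if seg_a is closer (in OT cost) to the preferred reference, else 1.

    This decision rule is a hypothetical stand-in, not the paper's method.
    """
    cost_a = sinkhorn_cost(seg_a, preferred_ref)
    cost_b = sinkhorn_cost(seg_b, preferred_ref)
    return 0 if cost_a <= cost_b else 1
```

The plain (non-log-domain) Sinkhorn update used here can underflow when costs are large relative to `reg`, so real feature distances would need rescaling or a log-domain implementation.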

2606.16888 2026-06-16 cs.RO New submission

LOPAL: Local Performance-Aware Active Learning from Imperfect Demonstrations

Johannes Heidersberger, Shail Jadav, Dongheui Lee

Affiliations: Autonomous Systems Lab, Institute of Computer Technology, TU Wien; Institute of Robotics and Mechatronics, German Aerospace Center (DLR)

AI summary: Proposes LOPAL, which exploits local demonstration quality information by encoding trajectories and quality assessments with a Gaussian mixture model and actively collecting corrective data through shared autonomy, improving task performance from imperfect demonstrations.

Comments Accepted for publication in IEEE Robotics and Automation Letters (RAL), 2026

DOI: 10.1109/LRA.2026.3698364

Abstract

Learning from Demonstration (LfD) enables intuitive robot skill acquisition by allowing robots to learn directly from human task demonstrations. However, current methods often fail to address the fact that, due to suboptimal and inconsistent human behavior, quality can vary within a single demonstration. Therefore, we introduce LOPAL (LOcal Performance-aware Active Learning), an active learning approach that leverages this local demonstration quality information. Our approach consists of two synergistic components. First, a local performance-driven LfD method uses a Gaussian Mixture Model (GMM) to encode both the demonstrated trajectories and their associated local quality assessments. This enables the generation of trajectories that outperform the imperfect demonstrations by utilizing complementary local data of high performance. Second, active data acquisition allows improvement beyond the imperfect demonstrations by collecting additional informative samples. In areas missing good data, the user is actively requested to provide corrections through a shared autonomy (SA) mechanism, while the robot autonomously executes the learned behavior. The efficacy of LOPAL was validated in both a simulation and a real-world experiment. The results from a real-world pipe inspection task showed that the proposed approach can achieve up to 27.31% improvement in task performance while also reducing the effort required to collect the demonstrations.

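2606.17011 2026-06-16 cs.RO cs.LG New submission

FlowMPC: Improving Flow Matching Policies with World Models

Chandon Hamel

Affiliations: Stanford University

AI summary: Proposes the FlowMPC framework, which combines a flow-matching imitation policy with a learned world model and uses MPPI planning to improve test-time performance, significantly raising success rates on ManiSkill manipulation tasks.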

Abstract

Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
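
The MPPI step mentioned in this abstract can be sketched as follows: candidate action sequences are scored by a model-predicted return, weighted exponentially, and averaged into a single plan. The 1-D dynamics in `toy_return` are a hypothetical stand-in for a learned world model, and in the setting described above the candidates would be proposed by the FM policy rather than written by hand.

```python
import math

def toy_return(actions, start=0.0, goal=1.0):
    """Hypothetical 1-D world model: x <- x + a, reward = -(x - goal)^2 per step."""
    x, total = start, 0.0
    for a in actions:
        x += a
        total -= (x - goal) ** 2
    return total

def mppi_plan(candidates, rollout_return, temperature=1.0):
    """Exponentially weight candidate action sequences by return and average them."""
    returns = [rollout_return(seq) for seq in candidates]
    best = max(returns)
    # Subtracting the best return keeps the exponentials numerically stable.
    weights = [math.exp((r - best) / temperature) for r in returns]
    total = sum(weights)
    horizon = len(candidates[0])
    return [sum(w * seq[t] for w, seq in zip(weights, candidates)) / total
            for t in range(horizon)]
```

A low temperature makes the plan collapse onto the best-scoring candidate, while a high temperature blends candidates more evenly.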

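2606.16515 2026-06-16 cs.LG cs.AI cs.RO Cross-list

Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning

Swaminathan S K, Damiya Gondha, Theyanesh Eswaramoorthy Rajahkrishnan, Aritra Hazra

AI summary: Proposes Direction-Conditioned Policies (DCP), which use a shared InfoNCE representation to decompose goal reaching into subgoal scoring and direction-conditioned actions; proves direction sufficiency, train-deployment consistency, and a controllable-subspace failure condition; and improves over Contrastive RL on most final metrics across nine environments.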

Comments 17 pages, Accepted to the 2nd Workshop on Compositional Learning at ICML 2026 (Seoul, South Korea)

Abstract

Hamilton-Jacobi-Bellman theory implies that the optimal goal-conditioned action depends on the goal only through the gradient of the goal-reaching distance at the current state, yet standard online GCRL still conditions the actor on the raw goal -- a signal that is geometrically uninformative when the goal is far from the data distribution. We propose Direction-Conditioned Policies (DCP), a fully online method that decomposes goal-reaching into two components sharing one InfoNCE representation $ψ$: a subgoal-scoring step that selects a visited state $z_t$ aligned with the final goal $g$ in $ψ_g$, and a direction-conditioned actor that consumes the unit direction $d_t$ and magnitude $r_t$ from $ψ(s_t)$ to $ψ(z_t)$. The two components train jointly, factor cleanly at deployment (subgoal scoring is removed, while direction conditioning remains with $g$ in place of $z_t$), and admit independent modification at the same $(d_t,r_t)$ interface. We prove three results. First, direction sufficiency under HJB: the optimal action under control-affine dynamics depends on the goal only through the value gradient. Second, a quantitative bound showing that, under mild conditions on the learned representation and assuming the scoring rule returns an on-path $z_t$, the actor's conditioning input at training and at deployment coincide up to representation error and geodesic slack. Third, a controllable-subspace characterization of when directional conditioning fails. Across nine environments, DCP improves over Contrastive RL on most final metrics, with the largest gains on manipulation and obstacle-interaction tasks; a qualitative analysis of the learned $ψ$-distance landscape shows the contrastive representation behaves as an online quasimetric encoding environment topology, and the single failure case (AntSoccer) localizes to a learned-gradient pathology that the theory anticipates.
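
The $(d_t, r_t)$ interface in this abstract is simple enough to sketch directly: given embeddings $ψ(s_t)$ and $ψ(z_t)$, the actor input is the unit direction and the magnitude of their difference. The subgoal scoring rule below (pick the visited state whose direction best aligns with the goal direction) is an assumption for illustration only; the abstract says the selected state is aligned with the goal but does not give the exact score.

```python
import math

def direction_and_magnitude(psi_s, psi_z, eps=1e-8):
    """Unit direction d and magnitude r of the displacement from psi(s) to psi(z)."""
    diff = [z - s for s, z in zip(psi_s, psi_z)]
    r = math.sqrt(sum(x * x for x in diff))
    d = [x / (r + eps) for x in diff]
    return d, r

def score_subgoals(psi_s, psi_g, visited):
    """Pick the visited embedding whose direction from psi(s) best matches the goal direction.

    Hypothetical scoring rule (cosine alignment), not the paper's definition.
    """
    d_goal, _ = direction_and_magnitude(psi_s, psi_g)

    def alignment(psi_z):
        d_z, _ = direction_and_magnitude(psi_s, psi_z)
        return sum(a * b for a, b in zip(d_z, d_goal))

    return max(visited, key=alignment)
```

At deployment, per the abstract, the scoring step is dropped and the same `direction_and_magnitude` call would be made with the goal embedding in place of the subgoal embedding.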

2408.15919 2026-06-16 cs.RO Updated version

ReMoBot: Retrieval-Based Few-Shot Imitation Learning for Mobile Manipulation with Vision Foundation Models

Yuying Zhang, Wenyan Yang, Francesco Verdoja, Ville Kyrki, Joni Pajarinen

Affiliations: School of Electrical Engineering, Aalto University, Espoo, Finland

AI summary: Proposes ReMoBot, a retrieval-based few-shot imitation learning framework that uses vision foundation models to retrieve information from demonstrations, addressing partial observability and limited data in mobile manipulation and achieving high success rates on real-world tasks.

Abstract

Imitation learning (IL) algorithms typically distill demonstrations into parametric policies to mimic expert behavior. However, with limited data and partial observability, such as in egocentric mobile manipulation, existing methods often struggle to generate accurate actions. To address these challenges, we propose ReMoBot, a few-shot, trajectory-conditioned imitation learning framework that directly Retrieves information from demonstrations to solve Mobile manipulation tasks with ego-centric visual observations. Leveraging vision foundation models, ReMoBot identifies relevant expert demonstrations by combining state-level similarity, history-aware trajectory alignment, and action-sequence consistency to disambiguate perceptually similar observations. The agent then selects appropriate control commands based on these retrieved demonstrations in a fully training-free manner. We evaluate ReMoBot on three mobile manipulation tasks using a Boston Dynamics Spot robot in both simulation and real-world settings. After benchmarking five approaches in simulation, we compare our method with two baselines trained directly on real-world data without sim-to-real transfer. With only 20 demonstrations per task, ReMoBot outperforms the baselines, achieving high success rates in Table Uncover (70%) and Gap Cover (80%), while also showing promising performance on the more challenging Curtain Open task in the real-world setting. Furthermore, ReMoBot generalizes across varying robot positions, object sizes, and material properties, highlighting its robustness in real-world deformable mobile manipulation. Additional details are available at: https://sites.google.com/view/remobot/home
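
A training-free retrieval policy of the kind described in this abstract can be sketched in a few lines. The example below covers only a simplified form of the state-level and history-aware matching: it scores each demonstration step by the average cosine similarity between the recent observation window and the demo frames leading up to that step, then returns that step's action. The data layout and function names are assumptions for illustration, the action-sequence consistency term is omitted, and real features would come from a vision foundation model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def retrieve_action(history, demos):
    """Return the action of the demo step whose recent window best matches `history`.

    history: recent observation features, oldest first.
    demos: list of demonstrations, each a list of (feature, action) pairs.
    """
    k = len(history)
    best_score, best_action = -float("inf"), None
    for demo in demos:
        for end in range(k - 1, len(demo)):
            window = demo[end - k + 1 : end + 1]
            score = sum(cosine(h, f) for h, (f, _) in zip(history, window)) / k
            if score > best_score:
                best_score, best_action = score, demo[end][1]
    return best_action
```

With a single-frame history, two demo steps with identical features are indistinguishable and the first match wins; adding one earlier frame resolves the ambiguity, which mirrors the disambiguation of perceptually similar observations mentioned above.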

2506.20668 2026-06-16 cs.RO cs.LG Updated version

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani

Affiliations: Carnegie Mellon University

AI summary: Proposes DemoDiffusion, which uses a single human demonstration and a pre-trained diffusion policy to let robots perform manipulation tasks without task-specific training, reaching an average success rate of 83.8% across 8 tasks.

Comments 11 pages. Published at ICRA 2026

Abstract

We propose DemoDiffusion, a simple method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring task-specific training or paired human-robot data. Our approach is based on two insights. First, the hand motion in a human demonstration provides a useful prior for the robot's end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Unlike approaches based on online reinforcement learning or paired human-robot data, our method enables robust adaptation to new tasks and scenes with minimal effort. In real-world experiments across 8 diverse manipulation tasks, DemoDiffusion achieves an 83.8% average success rate, compared to 13.8% for the pre-trained policy and 52.5% for kinematic retargeting, succeeding even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/

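2509.18428 2026-06-16 cs.RO cs.CV Updated version

Latent Action Pretraining Through World Modeling

Bahey Tharwat, Yara Nasser, Ali Abouzeid, Ian Reid

Affiliations: Mohamed bin Zayed University of Artificial Intelligence; Alexandria University

AI summary: Proposes the LAWM framework, which learns latent action representations from unlabeled video through world modeling, enabling transfer across tasks, environments, and embodiments, and outperforms methods pretrained with ground-truth actions on the LIBERO benchmark and in real-world settings.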

Abstract

Vision-Language-Action (VLA) models have gained popularity for learning robotic manipulation tasks that follow language instructions. State-of-the-art VLAs, such as OpenVLA and $π_{0}$, were trained on large-scale, manually labeled action datasets collected through teleoperation. More recent approaches, including LAPA and villa-X, introduce latent action representations that enable unsupervised pretraining on unlabeled datasets by modeling abstract visual changes between frames. Although these methods have shown strong results, their large model sizes make deployment in real-world settings challenging. In this work, we propose LAWM, a model-agnostic framework to pretrain imitation learning models in a self-supervised way, by learning latent action representations from unlabeled video data through world modeling. These videos can be sourced from robot recordings or videos of humans performing actions with everyday objects. Our framework is able to transfer learned knowledge across tasks, environments, and embodiments. It outperforms models pretrained with ground-truth robot actions and other similar pretraining methods on the LIBERO benchmark and real-world setup, while being efficient and practical for real-world settings.

2602.13197 2026-06-16 cs.RO cs.CV cs.LG Updated version

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

Albert J. Zhai, Kuo-Hao Zeng, Jiasen Lu, Ali Farhadi, Shenlong Wang, Wei-Chiu Ma

Affiliations: University of Illinois Urbana-Champaign; Allen Institute for AI; University of Washington; Cornell University

AI summary: Proposes the Perceive-Simulate-Imitate framework, which filters grasp-trajectory pairs from human videos in simulation to learn task-oriented grasping and post-grasp motion policies, achieving robust manipulation without any robot data.

Comments Transactions on Machine Learning Research (TMLR)

Abstract

The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a modular policy design, leveraging a dedicated grasp generator to produce stable grasps. However, arbitrary stable grasps are often not task-compatible, hindering the robot's ability to perform the desired downstream motion. To address this challenge, we present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data processed by paired grasp-trajectory filtering in simulation. This simulation step extends the trajectory data with grasp suitability labels, which allows for supervised learning of task-oriented grasping capabilities. We show through real-world experiments that our framework can be used to learn precise manipulation skills efficiently without any robot data, resulting in significantly more robust performance than using a grasp generator naively.

2606.10449 2026-06-16 cs.RO Updated version

GuideWalk: Learning Unified Autonomous Navigation and Locomotion for Humanoid Robots across Versatile Terrains

Haoxuan Han, Chen Chen, Linao Gong, Xin Yang, Hao Hu, Junhong Guo, Zhicheng He, Yao Su, Fenghua He

Affiliations: Harbin Institute of Technology; Leju Robotics

AI summary: Proposes the GuideWalk framework, which combines traversability-aware navigation guidance with distillation from terrain-adaptive locomotion teachers to achieve stable, coordinated navigation and locomotion for humanoid robots on complex terrain.

Abstract

Humanoid robots have achieved strong locomotion capabilities, but reliable navigation on versatile terrains remains challenging because obstacle avoidance must be coordinated with dynamically feasible motion. In this work, we present GuideWalk, a unified end-to-end framework that integrates traversability-aware navigation guidance with a terrain-adaptive locomotion teacher for humanoid navigation. Specifically, we introduce a navigation module that provides explicit velocity guidance, decoupling obstacle avoidance from terrain conditions to enable robust planning across diverse environments. We propose a composite teacher distillation scheme, where goal-directed commands and dynamically consistent actions are aggregated and distilled into a single policy. To further improve robustness, the distilled policy is refined with reinforcement learning and an auxiliary behavior cloning objective, which promotes exploration while preserving desirable teacher behaviors. Experiments demonstrate that GuideWalk achieves effective navigation while maintaining stable humanoid locomotion.

2606.13053 2026-06-16 cs.RO cs.AI 版本更新

EV-WM: Event-Verified World Models for Long-Horizon Robotic Manipulation

EV-WM: 面向长时域机器人操作的事件验证世界模型

Kailin Wang, Haoxiang Jie, Yaoyuan Yan, Jiacheng Zhou, Zhiyou Heng

发表机构 * AI Lab, Country Garden Services Group（碧桂园服务集团AI实验室）； Fudan University（复旦大学）； Omni AI

AI总结提出EV-WM框架，通过事件预测和验证增强预训练特征世界模型，实现长时域操作中任务进展信号的可靠评估与规划。

AI中文摘要

预训练特征世界模型为机器人想象提供了有用的基础，但仅凭视觉或潜在预测并不能确定想象的未来是否满足任务相关谓词。长时域操作需要关系性、谓词级和物理基础的进展信号：物体是否移动，抽屉或接触状态是否改变，放置谓词是否满足，以及候选未来是否足够可靠以执行。我们引入了EV-WM，一种面向世界模型规划的谓词基础验证框架。EV-WM在预训练视觉特征空间中展开候选未来，将其解码为结构化事件状态，并使用任务进展、语义一致性、物理可行性和不确定性项进行评分。验证器指导基于采样的规划，门控候选动作，并在接触敏感的LIBERO酒架设置中，从PPO生成的提议中进行选择。在导航、可变形物体、墙壁约束和语言描述的操作研究中，EV-WM表明谓词基础的验证可以使特征空间世界模型规划更可解释，并更好地与任务进展对齐。

英文摘要

Pretrained-feature world models provide a useful substrate for robot imagination, but visual or latent prediction alone does not determine whether an imagined future satisfies task-relevant predicates. Long-horizon manipulation requires progress signals that are relational, predicate-level, and physically grounded: whether an object has moved, whether a drawer or contact state has changed, whether a placement predicate is satisfied, and whether a candidate future is reliable enough for execution. We introduce EV-WM, a predicate-grounded verification framework for world-model planning. EV-WM rolls out candidate futures in pretrained visual-feature space, decodes them into structured event states, and scores them using task-progress, semantic-consistency, physical-feasibility, and uncertainty terms. The verifier guides sampling-based planning, gates candidate actions, and, in the contact-sensitive LIBERO wine-rack setting, selects among PPO-generated proposals. Across navigation, deformable-object, wall-constrained, and language-described manipulation studies, EV-WM shows that predicate-grounded verification can make feature-space world-model planning more interpretable and better aligned with task progress.
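
The scoring and gating step in this abstract can be pictured with a small sketch. It is not the authors' implementation: the class name, the weights, the assumption that every term lies in [0, 1], and the uncertainty threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Verifier terms for one imagined future (assumed to lie in [0, 1])."""
    progress: float      # task-progress term
    consistency: float   # semantic-consistency term
    feasibility: float   # physical-feasibility term
    uncertainty: float   # higher means the rollout is less reliable


def verifier_score(c, weights=(1.0, 0.5, 0.5, 1.0)):
    """Weighted sum of the four terms; uncertainty counts against a candidate.

    The weights are illustrative assumptions, not values from the paper.
    """
    w_prog, w_cons, w_feas, w_unc = weights
    return (w_prog * c.progress + w_cons * c.consistency
            + w_feas * c.feasibility - w_unc * c.uncertainty)


def select_candidate(candidates, max_uncertainty=0.5):
    """Gate out unreliable candidates, then return the index of the best one.

    Returns None when every candidate is gated out.
    """
    allowed = [i for i, c in enumerate(candidates)
               if c.uncertainty <= max_uncertainty]
    if not allowed:
        return None
    return max(allowed, key=lambda i: verifier_score(candidates[i]))
```

Gating before ranking means a high-progress but highly uncertain rollout can never be executed, which matches the abstract's emphasis on whether a candidate future is reliable enough for execution.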

2606.13769 2026-06-16 cs.RO cs.CV cs.LG 版本更新

$\mu_0$: A Scalable 3D Interaction-Trace World Model

$\mu_0$: 一种可扩展的3D交互轨迹世界模型

Seungjae Lee, Yoonkyo Jung, Jusuk Lee, Jonghun Shin, Amir Hossein Shahidzadeh, Yao-Chih Lee, H. Jin Kim, Jia-Bin Huang, Furong Huang

发表机构 * University of Maryland, College Park（马里兰大学帕克分校）； Seoul National University（首尔大学）

AI总结提出基于3D轨迹的可扩展世界模型$\mu_0$，通过预测交互点轨迹实现跨本体机器人学习，无需动作标签，性能媲美有监督模型。

AI中文摘要

能够捕捉动作如何引起物理变化的世界模型使得可扩展的机器人学习成为可能，而无需依赖特定本体的动作标签。像素空间视频模型提供了广泛的视觉先验，但将模型容量消耗在密集外观重建上，而直接动作模型则需要特定本体的标签，阻碍了可扩展性。我们提出$\mu_0$，一种基于3D轨迹的可扩展世界模型。$\mu_0$不是预测密集像素或直接建模动作，而是预测显著交互点（如物体、工具、手和接触区域）的平滑3D轨迹，从而产生一个紧凑、与本体无关的运动接口。为了能够从多样化的视频源进行训练，我们的TraceExtract系统通过选择关键点、构建全局对齐的轨迹以及将运动片段与层次化语言描述关联，自动提取3D监督。这种TraceExtract监督通过将预训练的视觉-语言骨干网络与模块化轨迹专家相结合来预训练$\mu_0$，其中轨迹专家通过B样条控制点表示每个查询并预测未来轨迹。实验表明，$\mu_0$在2D和3D轨迹预测方面均优于基线方法，包括轨迹预测模型和分词VLM方法。由于$\mu_0$是冻结且可重用的，它可以与动作专家配对用于下游机器人本体。尽管是无动作预训练，由此产生的轨迹条件策略在性能上与使用动作监督预训练的VLA模型（如$\pi_0$）相当。这些结果确立了3D轨迹作为跨本体操作的可扩展和可迁移表示。

英文摘要

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $\mu_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $\mu_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. This TraceExtract supervision pretrains $\mu_0$ by combining a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $\mu_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $\mu_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $\pi_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.
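
To make the B-spline trace representation concrete, here is a minimal sketch of how a handful of 3D control points expand into a smooth trajectory. The abstract does not state the spline degree or knot layout, so a uniform cubic B-spline is assumed here, and the function names are hypothetical.

```python
def cubic_bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1.0 - t) ** 3 / 6.0
    b1 = (3.0 * t**3 - 6.0 * t**2 + 4.0) / 6.0
    b2 = (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0
    b3 = t**3 / 6.0
    # The four basis weights sum to 1, so the point is a convex combination.
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_trace(control_points, samples_per_segment=8):
    """Expand 3D control points into a densely sampled trace."""
    if len(control_points) < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    trace = []
    for i in range(len(control_points) - 3):
        segment = control_points[i:i + 4]
        for k in range(samples_per_segment):
            trace.append(cubic_bspline_point(*segment, k / samples_per_segment))
    # Close the curve with the end point of the last segment.
    trace.append(cubic_bspline_point(*control_points[-4:], 1.0))
    return trace
```

Predicting a few control points instead of every waypoint keeps the output compact and guarantees a smooth curve, which is one plausible reason for choosing such a parameterization.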

2602.17997 2026-06-16 cs.LG cs.RO 版本更新

Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly

全脑连接组图模型实现果蝇全身运动控制

Zehao Jin, Yaoye Zhu, Chen Zhang, Yanan Sui

发表机构 * Tsinghua University（清华大学）

AI总结提出Fly-connectomic Graph Model，将果蝇全脑连接组作为图结构控制器，通过深度强化学习驱动仿真果蝇运动，在多种任务中表现稳定且样本效率优于基线。

AI中文摘要

动物在由全脑连接塑造的神经系统控制下执行协调的全身运动。全脑神经连接（即连接组）的映射为建模感觉运动信息流提供了天然的图结构，但其作为具身智能体神经控制器的潜力尚未被充分探索。本文介绍了Fly-connectomic Graph Model，该模型直接将成年果蝇的全脑连接组实例化为图结构神经控制器，通过深度强化学习驱动仿真生物力学果蝇的运动。我们在多种运动任务中实现了稳定的性能，并且与图和非图基线相比，样本效率更高。我们的结果展示了一种通过将全脑布线原理转化为可操作的架构先验来设计有效控制策略的生物启发式方法，同时通过动态信息流提高了可解释性。这项工作还通过提供一个计算平台来研究动物行为背后的感觉运动转换，以及一种推动更贴近自然的智能系统发展的范式，强调了连接神经力学与具身智能的潜力。

英文摘要

Animals perform coordinated whole-body movements under the control of neural systems shaped by brain-wide connectivity. The mapping of the whole-brain neural connections, or the connectomes, provides a natural graph for modeling sensorimotor information flow, yet its potential as a neural controller for embodied agents remains largely unexplored. Here, we introduce the Fly-connectomic Graph Model, which directly instantiates the whole-brain connectome of an adult Drosophila as a graph-structured neural controller for movements of a simulated biomechanical fruit fly via deep reinforcement learning. We achieve stable performance across diverse locomotion tasks, as well as better sample efficiency compared to both graph and non-graph baselines. Our results demonstrate a biologically informed way towards effective control policy design by translating whole-brain wiring principles into actionable architectural priors, while also improving the interpretability through dynamic information flow. This work also highlights the potential to bridge neuromechanics with embodied intelligence by providing a computational platform for investigating the sensorimotor transformation underlying animal behavior and a paradigm to advance the development of more nature-aligned intelligent systems.

2606.14763 2026-06-16 cs.RO cs.LG math.OC 新提交

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

自主智能体导航中学习非线性模型预测控制的贝叶斯优化

Lorenzo Ortolani, Gabriel Voss, Gabriele Beltrami, Francesco Dorati, Tommaso Felice Banfi

发表机构 * Talos Robotics AI

AI总结提出一种无地图框架，结合滚动时域规划与非线性MPC，利用贝叶斯优化自动调参，在仿真和实物四足机器人上实现高效导航。

Comments Published at the IEEE ICRA 2026 Xplore Workshop (Oral), Cross-Disciplinary aspects of Exploration in Robotics, Reinforcement Learning, and Search

AI中文摘要

在动态未知环境中的实时自主导航仍然是移动机器人领域的一个基本挑战。我们提出了一种无地图框架，该框架紧密集成了反应式滚动时域规划与非线性模型预测控制（MPC）。在每个控制周期，构建基于激光雷达的高斯占据表示，并通过A*搜索生成无碰撞轨迹，随后由采用平滑sigmoid障碍屏障的CasADi/IPOPT MPC公式进行跟踪。为了提高对参数敏感性的鲁棒性，我们采用基于树结构Parzen估计器（TPE）的离线贝叶斯优化方案，该方案针对复合导航目标识别出接近最优的控制器参数。此外，使用高斯过程代理分析参数敏感性，并深入了解优化景观。所提出的框架与机器人无关，在仿真中使用Gazebo在Unitree Go2四足机器人上进行评估，随后部署到实体机器人上。实验结果表明，在仿真中调优的参数能有效迁移到硬件上，无需额外调优即可保持相当的性能。完整系统在部署时实现了高达90.0%的导航成功率，并且在仿真环境中评估指标平均提升38.9%。

英文摘要

Real-time autonomous navigation in dynamic, unknown environments remains a fundamental challenge for mobile robotics. We propose a map-free framework that tightly integrates reactive rolling-horizon planning with nonlinear Model Predictive Control (MPC). At each control cycle, a LiDAR-based Gaussian occupancy representation is constructed and used to generate collision-free trajectories via A* search, which are then tracked by a CasADi/IPOPT MPC formulation incorporating a smooth sigmoid obstacle barrier. To improve robustness to parameter sensitivity, we adopt an offline Bayesian optimization scheme based on Tree-structured Parzen Estimators (TPE), which identifies near-optimal controller parameters with respect to a composite navigation objective. In addition, a Gaussian Process surrogate is used to analyze parameter sensitivity and provide insight into the optimization landscape. The proposed framework is robot-agnostic and is evaluated on the Unitree Go2 quadruped in simulation using Gazebo, followed by deployment on the physical robot. Experimental results show that parameters tuned in simulation transfer effectively to hardware, maintaining comparable performance without additional tuning. The full system achieves up to a 90.0% navigation success rate when deployed, along with a 38.9% average improvement in the evaluation metrics across simulated environments.
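
A smooth sigmoid obstacle barrier of the kind mentioned here could look like the sketch below. The exact form used in the paper is not given in the abstract; the logistic shape, the parameter names, and the default values are assumptions, and these are the sort of parameters an offline tuner would search over.

```python
import math


def sigmoid_barrier(distance, safe_distance=0.5, sharpness=10.0, weight=100.0):
    """Smooth penalty that rises from 0 to `weight` as an obstacle gets closer.

    The penalty equals weight / 2 exactly at `safe_distance`. The logistic is
    evaluated in a numerically stable way so large distances cannot overflow.
    """
    z = sharpness * (safe_distance - distance)  # positive inside the margin
    if z >= 0.0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return weight * s


def obstacle_cost(position, obstacles, **barrier_params):
    """Sum the barrier over 2D obstacle points for one predicted position."""
    return sum(sigmoid_barrier(math.dist(position, o), **barrier_params)
               for o in obstacles)
```

Unlike a hard distance constraint, this penalty is differentiable everywhere, which suits gradient-based solvers; the trade-off is that safety then depends on how `sharpness` and `weight` are tuned.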

2606.14794 2026-06-16 cs.RO 新提交

Computing Smooth Geodesics under Two-Sided Curvature Bounds with Applications to Robotics and Image Analysis

计算双侧曲率约束下的光滑测地线及其在机器人和图像分析中的应用

Da Chen, Zhenjiang Li, Jean-Marie Mirebeau, Xuecheng Tai, Jinglin Zhang, Wei Zhang, Laurent D. Cohen

发表机构 * CEREMADE, University Paris Dauphine, University-PSL, CNRS, UMR 7534（巴黎多芬纳大学CEREMADE实验室，巴黎文理研究大学，法国国家科学研究中心，UMR 7534）； Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University, Shandong Academy of Medical Sciences（山东省肿瘤医院放射肿瘤科，山东第一医科大学，山东省医学科学院）； Department of Mathematics, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay（巴黎萨克雷大学数学系，博雷利中心，巴黎萨克雷高等师范学校，法国国家科学研究中心）； Norce（挪威研究中心）； School of Control Science and Engineering, Shandong University（山东大学控制科学与工程学院）

AI总结提出一种基于Hamilton-Jacobi-Bellman偏微分方程框架的曲率有界测地线模型，通过约束曲率上下界实现路径的光滑性和几何控制，并给出离散化求解方案，应用于机器人路径规划和图像曲线结构跟踪。

AI中文摘要

平面曲线的曲率由于与光滑性、刚性和弹性等理想几何特性密切相关，因此作为计算二阶最小路径的关键正则化项。本文解决计算物理和几何中一个更具挑战性的问题：跟踪曲率受任意上下界约束的最小路径。为此，我们提出了一种新的曲率有界测地线模型，该模型在Hamilton-Jacobi-Bellman (HJB) 偏微分方程 (PDE) 框架下开发。它通过强制曲率范围约束，对最小路径提供强大的几何控制，使得路径光滑且具有有界曲率限制。我们还提出了一种包含曲率约束的哈密顿量和HJB PDE的离散化方案，使得能够高效求解模型的数值解。最后，我们展示了所提出的曲率有界测地线模型在机器人路径规划和图像曲线结构跟踪中的应用能力。数值实验表明，所提出的曲率有界测地线模型是寻找满意路径的强大且鲁棒的工具。

英文摘要

Curvature of planar curves serves as a key regularization term for computing second-order minimal paths, owing to its close connection to desirable geometric properties such as smoothness, rigidity, and elasticity. In this paper, we tackle a more challenging problem in computational physics and geometry: tracking minimal paths whose curvature is constrained by arbitrary upper and lower bounds. For that purpose, we propose a new curvature-bounded geodesic model, developed under the Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) framework. It provides strong geometric control over minimal paths by enforcing curvature range constraints, so that the resulting paths are smooth and have bounded curvature. We also present a discretization scheme for the Hamiltonian and the HJB PDE incorporating curvature bounds, enabling efficient numerical solution of the model. Finally, we illustrate the capability of the proposed curvature-bounded geodesic model in applications of robot path planning and tracking of curvilinear structures in images. Numerical experiments demonstrate that the proposed curvature-bounded geodesic model serves as a powerful and robust tool for finding satisfactory paths.
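
The two-sided curvature constraint can be illustrated on a discrete path. The sketch below does not solve the HJB PDE from the paper; it only checks, after the fact, whether a polyline respects lower and upper bounds on signed curvature, using the standard three-point (Menger) estimate. The function names are hypothetical.

```python
import math


def signed_curvature(a, b, c):
    """Signed curvature of the circle through 2D points a, b, c.

    Positive for a left (counterclockwise) turn, negative for a right turn,
    and 0.0 when the three points are collinear.
    """
    cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    if ab == 0.0 or bc == 0.0 or ca == 0.0:
        raise ValueError("points must be distinct")
    return 2.0 * cross / (ab * bc * ca)


def within_curvature_bounds(path, kappa_min, kappa_max):
    """Check kappa_min <= curvature <= kappa_max at every interior vertex."""
    return all(
        kappa_min <= signed_curvature(path[i - 1], path[i], path[i + 1]) <= kappa_max
        for i in range(1, len(path) - 1)
    )
```

With asymmetric bounds such as `kappa_min = 0`, the check accepts only paths that never turn right, which is the kind of behaviour a one-sided bound on the curvature magnitude cannot express.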

2606.15046 2026-06-16 cs.RO 新提交

Exact, Efficient, and Safe Occlusion-Aware Planning Using AH-Polyhedrons

使用AH-多面体的精确、高效且安全的遮挡感知规划

Long Kiu Chung, David Isele, Toktam Mohammadnejad, Faizan M. Tariq, Sangjae Bae, Shreyas Kousik, Jovin D'sa

发表机构 * Honda Research Institute (HRI)（本田研究所）； Georgia Institute of Technology（佐治亚理工学院）

AI总结提出APRO框架，利用博弈论主动感知和AH-多面体可达性分析，通过线性规划实现精确安全验证，解决自动驾驶代客泊车中的遮挡问题，达到100%安全率且实时。

Comments 8 pages, 3 figures

AI中文摘要

安全处理遮挡是动态环境中自主移动机器人面临的基本挑战。这一问题在自动驾驶代客泊车（AVP）中尤为突出，因为交通规则宽松、遮挡频繁且杂乱，过度保守的行为可能导致车辆被困。然而，现有方法要么缺乏形式化安全保证，要么假设智能体遵循道路结构，要么引入保守性，使得AVP的遮挡感知规划仍然是一个开放挑战。在本文中，我们提出APRO（遮挡的AH-多面体可达性），一个基于博弈论主动感知和AH-多面体可达性分析的精确且高效的遮挡感知规划框架，以AVP作为典型用例。我们的关键洞察是将先前工作中基于集合的安全条件重新表述为AH-多面体的并集，从而通过线性规划（LP）实现精确的安全验证，无需在集合计算或道路拓扑假设中引入任何额外的保守性。我们进一步展示了如何将所得安全条件集成到基于优化的规划器或二分搜索方案中，以用于实时应用。我们在仿真和硬件实验中验证了我们的方法，包括在真实停车场数据集上的数据回放。实验结果表明，我们的方法在所有评估场景中始终达到100%的安全率，同时保持实时性能，从而比具有形式化安全保证的现有方法做出更安全、更优的决策。

英文摘要

Safely handling occlusions is a fundamental challenge for autonomous mobile robots operating in dynamic environments. This issue is especially prominent in autonomous valet parking (AVP), where traffic rules are lax, occlusions are frequent and cluttered, and overly conservative behavior can leave vehicles stuck. However, existing methods either lack formal safety guarantees, assume agents follow road structures, or introduce conservatism, leaving occlusion-aware planning for AVP an open challenge. In this paper, we propose APRO (AH-Polyhedron Reachability for Occlusions), an exact and efficient occlusion-aware planning framework based on game-theoretic active perception and AH-polyhedron reachability analysis with AVP as our canonical use case. Our key insight is to reformulate set-based safety conditions in prior work as unions of AH-polyhedrons, enabling exact safety verification through linear programming (LP) without any additional conservatism in set computations or assumptions on road topology. We further show how the resulting safety conditions can be integrated into optimization-based planners or a bisection search scheme for real-time applications. We validate our method in simulation and hardware experiments, including data replay on a real-world parking lot dataset. Experimental results demonstrate that our method consistently achieved a 100% safety rate across all evaluated scenarios while maintaining real-time performance, resulting in safer and more optimal decisions than existing methods with formal safety guarantees.

2606.15317 2026-06-16 cs.RO 新提交

Covariance-Regulated Recursive Koopman Learning for Nonlinear Systems with Uncertain Time-Varying Dynamics

面向不确定时变非线性系统的协方差调控递归Koopman学习

Weibin Gu, Chen Yang, Lu Shi, Chao Gao

发表机构 * Tsinghua University（清华大学）； China University of Petroleum-Beijing at Karamay（中国石油大学（北京）克拉玛依校区）； Xinchen Qihang Inc.（信辰启航有限公司）

AI总结针对离线模型在时变动力学下失效的问题，提出协方差调控递归Koopman学习框架，通过误差死区门控和常迹归一化策略防止协方差爆炸和参数冻结，实现数值稳定的在线建模，并在非完整驱动轮式机器人和扑翼微型飞行器上验证了其跟踪性能。

AI中文摘要

自主机器人的离线模型在训练分布之外的时变动力学下常常失效。Koopman算子理论通过提升提供非线性动力学的线性表示，但其向实时递归估计的过渡可能面临数值脆弱性：使用指数遗忘时低激励下的协方差风涌，以及无遗忘时增益消失。本文提出了一种协方差调控递归Koopman学习（CR-RKL）框架，包含两种互补策略——误差死区门控和常迹归一化——每种策略都能独立防止协方差爆炸和参数冻结，后者还额外保留了不确定性的几何结构。在具有车轮滑移和Stribeck摩擦的非完整差分驱动机器人以及26克仿蝴蝶扑翼微型飞行器上验证，CR-RKL实现了数值稳定且准确的在线建模，当嵌入模型预测控制时，在不确定时变动力学下保持了可靠的跟踪性能。

英文摘要

Offline models for autonomous robots often fail under time-varying dynamics outside their training distribution. Koopman operator theory offers a linear representation of nonlinear dynamics via lifting, but its transition to real-time recursive estimation can suffer from numerical vulnerabilities: covariance windup under low excitation when using exponential forgetting, and vanishing gain without forgetting. This paper introduces a Covariance-Regulated Recursive Koopman Learning (CR-RKL) framework with two complementary strategies, error dead-zone gating and constant-trace normalization, each independently capable of preventing covariance explosion and parameter freezing, with the latter additionally preserving the geometric structure of uncertainty. Validated on a non-holonomic differential-drive robot with wheel slip and Stribeck friction and on a 26-gram butterfly-inspired flapping-wing micro aerial vehicle, CR-RKL achieves numerically stable and accurate online modeling, and when embedded in model predictive control, it maintains reliable tracking performance under uncertain, time-varying dynamics.
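
The two covariance-regulation ideas can be sketched on a textbook recursive least-squares update for a single output row. This is a generic illustration, not the CR-RKL update itself: the abstract treats the two strategies as independent alternatives, whereas this hypothetical function exposes both as optional switches, and all parameter names and defaults are assumptions.

```python
def rls_step(theta, P, phi, y, forgetting=0.98, dead_zone=0.0, trace_target=None):
    """One recursive least-squares update with optional covariance regulation.

    theta: parameter list, P: covariance as a list of rows,
    phi: regressor (e.g. lifted state), y: scalar measurement.
    Returns (theta, P, prediction_error).
    """
    n = len(theta)
    err = y - sum(p * t for p, t in zip(phi, theta))
    if abs(err) <= dead_zone:
        # Dead-zone gating: skip the update, so P is not inflated by
        # forgetting while the data carry no new information.
        return theta, P, err
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    theta = [theta[i] + gain[i] * err for i in range(n)]
    P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
         for i in range(n)]
    if trace_target is not None:
        # Constant-trace normalization: rescale P uniformly so its trace
        # stays fixed, which bounds the gain without changing P's shape.
        scale = trace_target / sum(P[i][i] for i in range(n))
        P = [[scale * v for v in row] for row in P]
    return theta, P, err
```

Because the rescaling is a single scalar applied to the whole matrix, the relative sizes and directions of uncertainty in `P` are unchanged, which is one way to read the abstract's remark about preserving geometric structure.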

2606.15469 2026-06-16 cs.RO 新提交

Learning Context-Aware Neural ODE Dynamics for Adaptive Robotic Control

学习上下文感知的神经ODE动力学用于自适应机器人控制

Shao-Yi Yu, Jen-Wei Wang, Maya Horii, Masayoshi Tomizuka, Vikas Garg

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Aalto University（阿尔托大学）； YaiYai Ltd（YaiYai有限公司）

AI总结提出基于神经ODE的上下文感知动力学模型，通过两阶段训练从状态-动作历史推断环境因素，实现模型预测控制下的自适应，在四旋翼、Sphero BOLT和Fanuc机械臂上验证了时空变化环境下的有效性。

AI中文摘要

部署在不确定和动态变化环境中的机器人系统经常面临接触条件、空气动力学效应和外部干扰的变化，这些挑战了可靠的控制。为了在基于模型的控制下保持有效性，这些系统需要能够适应此类变化的动力学模型，特别是在直接获取完整环境信息受限的情况下。为了实现适应性并促进与模型预测控制的集成，我们提出了一种基于神经常微分方程的上下文感知动力学模型，该模型使用两阶段训练过程从状态-动作历史推断环境因素。我们在多种机器人平台上验证了该方法，包括仿真中的四旋翼，以及真实世界实验中的Sphero BOLT机器人和Fanuc机械臂。结果表明，我们的方法有效地适应了不同任务中时间和空间变化的环境变化。视频可在https://youtu.be/PY0sNyF2rqE 获取，源代码可在https://github.com/syyu410-yu/context-aware-neural-ode-control.git 获取。

英文摘要

Robotic systems deployed in uncertain and dynamically changing environments often face variations in contact conditions, aerodynamic effects, and external disturbances that challenge reliable control. To remain effective under model-based control, these systems require dynamics models that can adapt to such changes, especially when direct access to complete environmental information is limited. To enable adaptability and facilitate integration with model predictive control, we propose a context-aware dynamics model based on neural ordinary differential equations, which infers environmental factors from state-action histories using a two-phase training procedure. We validate the approach across diverse robotic platforms, including a quadrotor in simulation, as well as a Sphero BOLT robot and a Fanuc manipulator in real-world experiments. The results demonstrate that our method effectively adapts to temporally and spatially varying environmental changes across different tasks. Videos are available at https://youtu.be/PY0sNyF2rqE , and the source code is available at https://github.com/syyu410-yu/context-aware-neural-ode-control.git .

2606.15594 2026-06-16 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 新提交

Pixels to Proofs: Probabilistically-Safe Latent World Model Control via Parallel Conformal Robust MPC

从像素到证明：通过并行保形鲁棒MPC实现概率安全的潜在世界模型控制

Devesh Nath, Anutam Srinivasan, Haoran Yin, Ruitong Jiang, Jeffrey Fang, Glen Chou

发表机构 * Georgia Institute of Technology（佐治亚理工学院）

AI总结提出SLS^2框架，结合保形预测与鲁棒模型预测控制，在学习的潜在世界模型中实现基于视觉的安全运动规划，提升目标到达性能与安全性。

AI中文摘要

我们提出了SLS^2，一个使用鲁棒模型预测控制（MPC）在学习的潜在世界模型中进行安全反馈运动规划的框架。我们的方法训练了一个动作条件的联合嵌入世界模型，具有紧凑的马尔可夫潜在状态，通过学习的潜在动力学实现高效的基于梯度的轨迹优化。为了在潜在预测不完美的情况下确保真实系统的安全性，我们采用保形预测来通知GPU加速的系统级综合（SLS）鲁棒MPC方案，以获得校准的潜在误差界限和鲁棒的潜在空间约束集。我们还学习并保形化了一个潜在约束检查器，使SLS规划器能够在闭环执行期间施加概率安全约束。我们在基于视觉的控制任务上评估了我们的方法，与潜在世界模型和安全规划基线相比，它提高了目标到达性能和安全性。

英文摘要

We present SLS^2, a framework for safe feedback motion planning from pixels using robust model predictive control (MPC) in learned latent world models. Our approach trains an action-conditioned joint-embedding world model with compact Markovian latent states, enabling efficient gradient-based trajectory optimization through learned latent dynamics. To enforce safety for the true system despite imperfect latent predictions, we inform a GPU-accelerated system level synthesis (SLS) robust MPC scheme with conformal prediction to obtain calibrated latent error bounds and robust latent-space constraint sets. We further learn and conformalize a latent constraint checker, allowing the SLS planner to impose probabilistic safety constraints during closed-loop execution. We evaluate our method on vision-based control tasks, where it improves both goal-reaching performance and safety over latent world-model and safe-planning baselines.
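
A calibrated error bound of the kind described can be obtained with standard split conformal prediction. The sketch below shows that generic recipe only; how the paper defines its latent error scores and feeds the bound into the robust MPC is not stated in the abstract, and the function name is hypothetical.

```python
import math


def conformal_bound(calibration_errors, alpha=0.1):
    """Split-conformal upper bound on a new prediction error.

    Given n exchangeable calibration errors, the returned value covers a
    fresh error with probability at least 1 - alpha. When n is too small
    for the requested level, the only valid bound is infinity.
    """
    n = len(calibration_errors)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf
    return sorted(calibration_errors)[rank - 1]
```

The `(n + 1)` correction is what turns an empirical quantile into a finite-sample guarantee; a planner could then inflate its constraint sets by the returned bound.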

2606.15654 2026-06-16 cs.RO cs.AI 新提交

PO-PDDL: Learning Symbolic POMDPs from Visual Demonstrations for Robot Planning Under Uncertainty

PO-PDDL: 从视觉演示中学习符号化POMDP以实现不确定性下的机器人规划

Wenjing Tang, Xuanjin Jin, Yuan Liu, Renming Huang, Cewu Lu, Panpan Cai

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Shanghai Innovation Institute（上海创新研究院）

AI总结提出PO-PDDL符号化POMDP框架，通过从机器人执行视频中重建潜在状态轨迹、识别部分可观测性并学习随机转移与观测模型，实现不确定性下的鲁棒任务规划。

AI中文摘要

现实世界的机器人任务规划必须在随机动作执行和部分可观测性下进行，然而为真实机器人领域构建部分可观测马尔可夫决策过程（POMDP）模型仍然困难且劳动密集。我们引入了PO-PDDL，一种POMDP的符号化表述，它保留了规划领域定义语言（PDDL）的关系结构和LLM友好的语法，同时显式建模了部分可观测性、随机性和信念。基于此表述，我们提出了一种用于学习PO-PDDL模型的演示驱动流程。该方法从真实机器人执行视频中重建潜在符号状态轨迹，通过推断状态与视觉观测之间的不一致性识别部分可观测性，并相应地学习随机转移和观测模型。得到的PO-PDDL领域可跨任务重用，并在感知和执行不确定性下实现在线信念空间规划。在真实世界长时域操作任务上的实验表明，我们的方法持续优于现有的PDDL和POMDP模型学习方法，以显著更低的规划成本实现了不确定性下的鲁棒任务规划。

英文摘要

Real-world robot task planning must operate under both stochastic action execution and partial observability, yet constructing Partially Observable Markov Decision Process (POMDP) models for real robotics domains remains difficult and labor-intensive. We introduce PO-PDDL, a symbolic formulation of POMDPs that preserves the relational structure and LLM-friendly syntax of the Planning Domain Definition Language (PDDL), while explicitly modeling partial observability, stochasticity, and beliefs. Building on this formulation, we propose a demonstration-driven pipeline for learning PO-PDDL models. The proposed method reconstructs latent symbolic state trajectories from real-robot execution videos, identifies partial observability via inconsistencies between inferred states and visual observations, and learns stochastic transition and observation models accordingly. The resulting PO-PDDL domains are reusable across tasks and enable online belief-space planning under both perception and execution uncertainty. Experiments on real-world long-horizon manipulation tasks show that our method consistently outperforms existing PDDL and POMDP model-learning approaches, achieving robust task planning under uncertainty with significantly lower planning cost.
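
Online belief-space planning rests on a Bayes update over hidden symbolic states. The sketch below is the textbook discrete POMDP belief update, not PO-PDDL syntax or the authors' planner; the dictionary layout and the cup-in-drawer example in the usage note are hypothetical.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Discrete Bayes filter for a POMDP.

    belief: {state: prob}
    transition: {(state, action): {next_state: prob}}
    observation_model: {next_state: {observation: prob}}
    """
    # Prediction step: push the belief through the stochastic transition model.
    predicted = {}
    for state, p in belief.items():
        for nxt, p_t in transition[(state, action)].items():
            predicted[nxt] = predicted.get(nxt, 0.0) + p * p_t
    # Correction step: weight each state by the observation likelihood.
    unnormalized = {s: p * observation_model[s].get(observation, 0.0)
                    for s, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}
```

For example, with a uniform prior over `"in_drawer"` and `"elsewhere"`, a state-preserving `"open_drawer"` action, and a detector that reports `"see_cup"` with probability 0.9 when the cup is in the drawer and 0.1 otherwise, observing `"see_cup"` moves the belief in `"in_drawer"` to 0.9.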

2606.15896 2026-06-16 cs.RO cs.LG 新提交

LoComposition: Terrain-Adaptive Energy-Efficient Quadruped Locomotion without Gait Priors

LoComposition：无需步态先验的地形自适应高效四足运动

Loukas Kordos, Leonard T. Franz, Simon Rappenecker, Oliver Hausdoerfer, Angela P. Schoellig, Pavel Kolev, Georg Martius

发表机构 * Max Planck Institute for Intelligent Systems（马克斯·普朗克智能系统研究所）； University of Tübingen（图宾根大学）； Technical University of Munich（慕尼黑工业大学）； University of Stuttgart（斯图加特大学）

AI总结提出一种将任务奖励、操作约束、能量最小化和地形感知分离的框架，无需显式步态先验，在四足机器人上实现高效地形自适应运动，运输成本降低56%，违规减少96%。

Comments 17 pages, 5 figures, 10 tables

AI中文摘要

基于学习的四足运动通常依赖于复杂的奖励函数，将任务规范、操作限制、步态偏好和地形适应纠缠在单个优化目标中。我们通过不同的机制处理这些功能：任务规范用奖励，操作限制用约束，步态偏好用能量最小化，以及用外部感知来根据地形难度调整能量使用。我们表明，这些组件共同实现了高效、地形自适应的运动，并且移除每个组件会暴露出不同的失败模式。我们的公式移除了显式的步态先验（包括腾空时间、接触次数和足部间隙目标），转而支持涌现行为。与传统的复杂奖励基线相比，我们的公式在实现相当的地形穿越的同时，将运输成本降低了56%，操作限制违规减少了96%。得到的策略零样本迁移到使用基于LiDAR高程地图的物理Unitree Go2上。项目网站含视频：https://tinyurl.com/locomposition。

英文摘要

Learning-based quadrupedal locomotion typically relies on complex reward formulations that entangle task specification, operational limits, gait preference, and terrain adaptation within a single optimization objective. We instead treat these functions through distinct mechanisms: rewards for task specification, constraints for operational limits, energy minimization for gait preference, and exteroceptive perception for adapting energy use to terrain difficulty. We show that these components jointly enable efficient, terrain-adaptive locomotion, and that removing each component exposes a distinct failure mode. Our formulation removes explicit gait priors (including air-time, contact-count, and foot-clearance targets) in favor of emergent behavior. Compared to a conventional complex-reward baseline, our formulation achieves comparable terrain traversal while reducing cost of transport by 56% and operational-limit violations by 96%. The resulting policies transfer zero-shot to a physical Unitree Go2 using LiDAR-based elevation mapping. Project website with videos: https://tinyurl.com/locomposition.
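
Cost of transport is the headline efficiency metric here. The abstract does not say how it is measured, so the sketch below assumes the common dimensionless definition, energy divided by weight times distance, with energy integrated from sampled power; the function name and arguments are hypothetical.

```python
def cost_of_transport(power_samples, dt, mass, distance, gravity=9.81):
    """Dimensionless cost of transport: E / (m * g * d).

    power_samples: power in watts, sampled every `dt` seconds
    mass: robot mass in kg, distance: distance travelled in metres
    """
    if distance <= 0.0:
        raise ValueError("distance must be positive")
    energy = sum(power_samples) * dt  # rectangle-rule integral, in joules
    return energy / (mass * gravity * distance)
```

Normalizing by weight and distance makes the figure comparable across robots and speeds, which is why a relative reduction can be reported without quoting absolute energy.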

2606.15918 2026-06-16 cs.RO 新提交

Energy-Efficient Arm Reaching for a Humanoid Robot via Deep Reinforcement Learning with Identified Power Models

基于识别功率模型的深度强化学习实现人形机器人节能手臂伸展

Nestor N. Deniz, Simon Parsons, Fernando Auat Cheein

发表机构 * Harper Adams University（哈珀亚当斯大学）； Lincoln Institute for Agri-Food Technology（林肯农业食品技术研究所）； Lincoln Centre for Autonomous Systems（林肯自主系统中心）

AI总结提出一种端到端能量感知强化学习框架，结合物理实验识别的电功率模型与SAC策略，在Unitree G1人形机器人上实现节能手臂伸展，仿真成功率69.9%，实物验证平均能耗71.5 J。

AI中文摘要

在田间执行操作任务（如机器人苹果采摘）的人形机器人面临严重的能量约束，这直接限制了每块电池充电可执行的伸展运动次数。本文针对Unitree G1人形机器人的7自由度左臂，提出了一种端到端的能量感知强化学习框架，该框架结合了基于物理实验识别的电功率模型和在基于Pinocchio的刚体动力学模拟器中训练的Soft Actor-Critic (SAC)策略。RL策略在增量关节位置动作空间上运行，并使用混合星座奖励进行训练，该奖励将四点末端执行器星座距离与扭矩范数能量代理相结合；经过$5\times10^6$次训练后，在运动学模拟中对1000个随机目标达到了69.9%的成功率，成功情节的平均能量为98.16 J。最后，在物理Unitree G1上，该策略在三个独立的10目标批次上进行了验证，实现了平均能量$71.5 \pm 48.3$ J，末端执行器位置误差$2.64 \pm 1.04$ cm，方向误差$6.92 \pm 1.33^\circ$，误差均在4 cm/$8.6^\circ$的训练容差内。这些结果构成了基于能量感知强化学习的人形机器人手臂伸展的第一步。

英文摘要

Humanoid robots performing in-field manipulation tasks, such as robotic apple harvesting, face severe energy constraints that directly limit the number of reaching motions that can be executed per battery charge. This paper presents an end-to-end, energy-aware reinforcement learning framework for the 7-degree-of-freedom left arm of the Unitree G1 humanoid robot, combining a physics-based, experimentally identified electrical power model with a Soft Actor-Critic (SAC) policy trained in a Pinocchio-based rigid-body dynamics simulator. The RL policy operates on an incremental joint-position action space and is trained with a Hybrid Constellation Reward that combines a four-point end-effector constellation distance with a torque-norm energy proxy; after $5\times10^6$ training it reaches a 69.9% success rate over 1000 random targets in kinematic simulation, at a mean energy of 98.16 J on successful episodes. Finally, on the physical Unitree G1, the policy is validated over three independent 10-target batches, achieving a mean energy of $71.5 \pm 48.3$ J, an end-effector position error of $2.64 \pm 1.04$ cm, and an orientation error of $6.92 \pm 1.33^\circ$, with both errors within the 4 cm/$8.6^\circ$ training tolerance. These results constitute a first step toward energy-aware reinforcement-learning-based arm reaching for humanoid robots.

2606.16480 2026-06-16 cs.RO cs.AI cs.SY eess.SY 新提交

HOLO-MPPI: Multi-Scenario Motion Planning via Hierarchical Policy Optimization

HOLO-MPPI：通过分层策略优化的多场景运动规划

Youngjae Min, Jovin D'sa, Faizan M. Tariq, David Isele, Navid Azizan, Sangjae Bae

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Honda Research Institute, USA（本田研究所（美国））

AI总结提出HOLO-MPPI框架，结合离线高层策略学习与在线低层随机最优控制，实现多场景运动规划，无需针对每个场景重新调整参数，在自动驾驶中优于MPPI和端到端RL基线。

AI中文摘要

部署在现实世界中的机器人必须在不同场景下规划运动，而无需针对每个场景重新调整参数。端到端强化学习（RL）可以跨场景泛化，但在分布偏移、奖励错误指定和随机交互下往往变得脆弱。模型预测路径积分（MPPI）控制能够在无梯度的情况下实现强大的实时优化，但其性能依赖于良好形状的采样先验，而手动设计先验无法扩展到多场景部署。我们提出了HOLO-MPPI（高层离线，低层在线MPPI），一种多场景运动规划框架，结合了高层策略学习与低层随机最优控制。离线时，我们学习一个高层策略，在抽象动作空间中提出场景鲁棒的规划，并利用学习的世界模型进行在线推演。在线时，该策略作为数据驱动的先验生成器，根据当前观测和目标参数化MPPI的采样分布。然后MPPI围绕该先验实时优化低层控制序列，以适应局部扰动。我们通过设计有效的高层动作空间和定制模型架构，在自动驾驶中实例化HOLO-MPPI。在多种驾驶场景下的评估表明，HOLO-MPPI在保持实时控制的同时，优于MPPI和端到端RL基线。

英文摘要

Robots deployed in the real world must plan motions across diverse scenarios without per-scenario retuning. End-to-end reinforcement learning (RL) can generalize across scenarios but often becomes brittle under distribution shift, reward misspecification, and stochastic interactions. Model predictive path integral (MPPI) control enables strong real-time refinement without gradients, but its performance depends on a well-shaped sampling prior, while manually designing the priors does not scale to multi-scenario deployment. We present HOLO-MPPI (High-level Offline, Low-level Online MPPI), a multi-scenario motion planning framework that combines high-level policy learning with low-level stochastic optimal control. Offline, we learn a high-level policy that proposes scenario-robust plans in an abstract action space, with a learned world model for online rollout. Online, the policy serves as a data-driven prior generator that parameterizes MPPI's sampling distribution conditioned on the current observation and goal. MPPI then optimizes low-level control sequences around this prior in real time to adapt to local disturbances. We instantiate HOLO-MPPI in autonomous driving by designing an effective high-level action space and tailored model architectures. Our evaluation across diverse driving scenarios shows that HOLO-MPPI improves upon MPPI and end-to-end RL baselines while maintaining real-time control.
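
The online half of this scheme, refining controls around a prior, can be sketched with a bare-bones MPPI update. This is the generic importance-weighted averaging step, not the HOLO-MPPI implementation: `cost_fn` stands in for a dynamics rollout, and the function name, noise scale, and temperature are assumptions.

```python
import math
import random


def mppi_update(prior_mean, cost_fn, num_samples=256, sigma=0.5,
                temperature=1.0, rng=None):
    """One MPPI iteration around a prior control sequence.

    prior_mean: control sequence proposed by some prior (e.g. a learned policy)
    cost_fn: maps a sampled control sequence to a scalar rollout cost
    Returns the cost-weighted average of the sampled sequences.
    """
    rng = rng or random.Random(0)
    samples = [[u + rng.gauss(0.0, sigma) for u in prior_mean]
               for _ in range(num_samples)]
    costs = [cost_fn(s) for s in samples]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [sum(w * s[t] for w, s in zip(weights, samples)) / total
            for t in range(len(prior_mean))]
```

No gradient of `cost_fn` is needed, but the result can only be as good as the region the samples cover, which is why the quality of the prior mean matters so much.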

2606.16542 2026-06-16 cs.RO 新提交

ADAPT: Analytical Disturbance-Aware Policy Training for Humanoid Locomotion

ADAPT: 面向人形机器人运动的解析干扰感知策略训练

Bofan Lyu, Jindou Jia, Kuangji Zuo, Yanshuo Lu, Shijia Han, Gen Li, Boyu Ma, Jingliang Li, Geng Li, Jianfei Yang

发表机构 * MARS Lab, Nanyang Technological University（南洋理工大学MARS实验室）

AI总结提出ADAPT框架，通过解析全身干扰观测器在线估计外力/力矩，无需传感器，提升人形机器人在干扰下的运动精度与鲁棒性。

AI中文摘要

部署在人类中心环境中的人形机器人必须处理力交互任务，其中外部接触会引入意外干扰，破坏运动精度和稳定性。现有的基于学习的方法依赖于广泛的域随机化、特定任务的力目标或基于运动历史的学习型力估计器，每种方法都会在精度、任务可迁移性或分布外鲁棒性上做出妥协。我们提出了解析干扰感知策略训练（ADAPT），这是一个框架，它为人形机器人策略配备了物理基础的干扰观测器。ADAPT的核心是一个解析全身干扰观测器，它利用可访问的机器人动力学在线估计残余力/力矩，无需力/力矩传感器。估计的干扰直接输入策略，使人形机器人获得对外力/力矩的显式、基于物理的感知，能够泛化到各种未见过的场景。在Unitree G1人形机器人上的实验表明，ADAPT在躯干扰动、站立推力和不对称手部负载下实现了比仅基于本体感觉的基线更准确的干扰预测和更强的鲁棒性，即使在分布外干扰下也能改善速度跟踪。此外，ADAPT能够惩罚在下肢关节推断出的干扰，以鼓励更轻快的运动。

英文摘要

Humanoids deployed in human-centered environments must handle force-interactive tasks, where external contacts introduce unexpected disturbances that disrupt locomotion accuracy and stability. Existing learning-based approaches rely on broad domain randomization, task-specific force objectives, or learning-based force estimators from motion history, each of which compromises accuracy, task transferability, or out-of-distribution (OOD) robustness. We present Analytical Disturbance-Aware Policy Training (ADAPT), a framework that equips humanoid policies with a physically grounded disturbance observer. The core of ADAPT is an analytical whole-body disturbance observer that estimates residual force/torque online with the accessible robot dynamics, without requiring force/torque sensors. Fed directly into the policy, the estimated disturbances give the humanoid an explicit, physics-derived sense of external force/torque that can generalize across diverse unseen scenes. Experiments on a Unitree G1 humanoid show that ADAPT achieves accurate disturbance prediction and stronger robustness than a proprioception-only baseline under torso perturbations, standing pushes, and asymmetric hand payloads, with improved velocity tracking even on OOD disturbances. Moreover, ADAPT enables penalizing inferred disturbances at lower-body joints to encourage lighter locomotion.
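
The idea of estimating external force or torque from known dynamics, without a dedicated sensor, can be shown on a single joint. The abstract does not specify the observer, so the sketch below uses the plain inverse-dynamics residual on a damped pendulum; the model, parameter values, and function names are assumptions, and practical observers often avoid the direct use of measured acceleration that this version relies on.

```python
import math


def pendulum_bias(q, qd, mass=1.0, length=0.5, damping=0.1, gravity=9.81):
    """Damping plus gravity torque of a one-joint pendulum model."""
    return damping * qd + mass * gravity * length * math.sin(q)


def disturbance_residual(q, qd, qdd, tau_cmd, inertia=0.25):
    """External torque estimate: inertia * qdd + bias(q, qd) - tau_cmd.

    Whatever torque the model and the commanded input cannot explain is
    attributed to an external disturbance.
    """
    return inertia * qdd + pendulum_bias(q, qd) - tau_cmd
```

Feeding such a residual into the policy observation gives it a physically interpretable signal, in contrast to asking a network to infer forces from motion history alone.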

2606.16564 2026-06-16 cs.RO cs.LG 新提交

Elastic ODYN: Differentiable Optimization for Infeasible Control and Learning in Robotics

Elastic ODYN：面向机器人中不可行控制与学习的可微优化

Aristotelis Papatheodorou, Jose Rojas, Ioannis Havoutis, Carlos Mastalli

发表机构 * University of Oxford（牛津大学）； Heriot-Watt University（赫瑞瓦特大学）

AI总结提出Elastic ODYN，一种通过平滑平方ℓ2弹性松弛处理不可行二次规划（QP）的原始-对偶非内点求解器，支持热启动，在无可行点时收敛到最接近可行解，并基于此开发可微QP层和不可行感知SQP方法，在基准QP、奇异接触力学、可微参数辨识及四足/人形机器人轨迹优化中优于现有方法。

Comments 8 pages, 5 figures, 2 tables

AI中文摘要

机器人系统经常遇到冲突的目标、建模误差和退化接触条件，这些条件使得二次规划（QP）不可行。然而，大多数优化求解器和可微QP层假设可行性，当约束无法同时满足时，会导致数值失败、梯度不稳定或求解器崩溃。我们提出Elastic ODYN，一种原始-对偶非内点QP求解器，通过平滑平方ℓ2弹性松弛处理不可行性。所得公式在病态和退化条件下保持良态，支持热启动，并在无可行点时收敛到最接近可行解。一个轻量级细化阶段从弹性解中恢复有物理意义的对偶变量。基于此框架，我们开发了Elastic OdynLayer，一个在不可行性下具有稳定梯度的可微QP层，以及Elastic OdynSQP，一种不可行感知的SQP方法，通过选择性约束松弛解决不一致的子问题和本质不可行的最优控制任务。我们在基准QP、奇异接触力学、可微参数辨识以及四足和人形机器人轨迹优化上评估该框架。在所有设置中，Elastic ODYN在鲁棒性、热启动性能和收敛可靠性方面始终优于最先进的弹性QP求解器，使得优化、仿真、控制和学习能够超越现有方法的可行性假设。

英文摘要

Robotic systems routinely encounter conflicting objectives, modeling errors, and degenerate contact conditions that render quadratic programs (QPs) infeasible. Yet most optimization solvers and differentiable QP layers assume feasibility, leading to numerical failures, unstable gradients, or solver breakdown when constraints cannot be simultaneously satisfied. We present Elastic ODYN, a primal-dual non-interior-point QP solver that handles infeasibility through smooth squared-$\ell_2$ elastic relaxations. The resulting formulation remains well posed under ill-conditioning and degeneracy, supports warm starting, and converges to closest-to-feasible solutions when no feasible point exists. A lightweight refinement stage recovers physically meaningful dual variables from the elastic solution. Building on this framework, we develop Elastic OdynLayer, a differentiable QP layer with stable gradients under infeasibility, and Elastic OdynSQP, an infeasibility-aware SQP method that resolves inconsistent subproblems and intrinsically infeasible optimal control tasks through selective constraint relaxation. We evaluate the framework on benchmark QPs, singular contact mechanics, differentiable parameter identification, and quadrupedal and humanoid trajectory optimization. Across all settings, Elastic ODYN consistently outperforms state-of-the-art elastic QP solvers in robustness, warm-start performance, and convergence reliability, enabling optimization, simulation, control, and learning beyond the feasibility assumptions of existing methods.
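
The effect of a squared-$\ell_2$ elastic relaxation is easiest to see on a one-variable problem with contradictory equality constraints. The sketch below uses a closed-form penalty solution; it is not the primal-dual non-interior-point algorithm of the paper, and the function name and the penalty weight `rho` are assumptions.

```python
def elastic_scalar_qp(p, q, constraints, rho=1e6):
    """Minimize 0.5*p*x^2 + q*x + 0.5*rho*sum((a*x - b)^2) over scalar x.

    constraints: list of (a, b) pairs encoding a*x = b, possibly inconsistent.
    Returns the minimizer and the residual a*x - b of each constraint.
    """
    curvature = p + rho * sum(a * a for a, _ in constraints)
    if curvature <= 0.0:
        raise ValueError("relaxed problem is not strictly convex")
    # Stationarity: p*x + q + rho * sum(a * (a*x - b)) = 0.
    x = (rho * sum(a * b for a, b in constraints) - q) / curvature
    residuals = [a * x - b for a, b in constraints]
    return x, residuals
```

With the conflicting constraints `x = 0` and `x = 1`, the relaxed problem still has a unique minimizer near `0.5`, the point that splits the violation evenly, whereas the original QP has no solution at all.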

2606.16696 2026-06-16 cs.RO 新提交

VENOM: Versatile Embodied Network for Omni-bodied Motion tracking

VENOM: 用于跨具身（Omni-bodied）运动追踪的多功能具身网络

Siddharth Padmanabhan, Kazuki Miyazawa, Takato Horii

发表机构 * Graduate School of Engineering Science, University of Osaka（大阪大学工学研究科）

AI总结提出VENOM，一种基于GPT的跨具身全身运动追踪模型，在仿真中实现多个人形机器人的全身运动追踪，无需分离上下身控制。

AI中文摘要

仅从演示数据中实现跨多个人形机器人的专家级表现力全身运动追踪，在人形机器人学习中仍然是一个具有挑战性且相对未充分探索的问题。跨具身运动追踪策略主要通过将控制问题分解为上身和下身控制来训练。本文提出VENOM，一种用于仿真中的人形机器人的跨具身全身运动追踪模型。VENOM是一种基于GPT的运动追踪器，在多人形机器人数据上训练，可以追踪整个身体，无需分割为上身和下身控制。我们整理了一个名为VENOM数据集的多人形机器人运动追踪数据集，包含状态、动作和奖励，并在此数据集上训练VENOM和基线模型。在本文中，我们评估了VENOM相对于基线的性能，并表明我们能够实现一个稳定的运动追踪器，其能力优于仅通过监督学习在多人形机器人数据上训练的MLP，并且还表明，尽管缺乏奖励反馈，VENOM与使用非对称演员-评论家强化学习训练的专家的追踪能力紧密匹配。

英文摘要

Achieving expert-level, expressive full-body motion tracking across multiple humanoids solely from demonstration data remains a challenging and relatively underexplored problem in humanoid robot learning. Cross-embodiment motion tracking policies are mostly trained by decoupling the control problem into upper- and lower-body control. This work proposes VENOM, a cross-embodiment full-body motion tracking model for humanoids in simulation. VENOM is a GPT-based motion tracker trained on multi-humanoid data that can track the entire body without splitting control into upper and lower body. We curate a multi-humanoid motion tracking dataset, called the VENOM dataset, that contains states, actions, and rewards, and train VENOM and the baselines on this dataset. In this letter, we evaluate VENOM's performance against baselines and show that it yields a stable motion tracker across different humanoids that is more capable than an MLP trained on multi-humanoid data with supervised learning alone. We also show that, despite the lack of reward feedback, VENOM closely matches the tracking capability of experts trained using asymmetric actor-critic reinforcement learning.

2606.16780 2026-06-16 cs.RO 新提交

DIFF-IPPO: Diffusion-Based Informative Path Planning with Open-Vocabulary Belief Maps

DIFF-IPPO：基于扩散的开放词汇信念地图信息路径规划

Sausar Karaf, Oleg Sautenkov, Mikhail Martynov, Dzmitry Tsetserukou

发表机构 * Intelligent Space Robotics Laboratory, CDE, Skoltech（智能空间机器人实验室，CDE，斯科尔科沃科学技术研究院）

AI总结提出DIFF-IPPO框架，结合开放词汇信念地图生成器与扩散规划器，在非高斯信念图上生成全局轨迹，实现高效目标搜索，检测得分达81.49%-86.55%。

AI中文摘要

探索和物体搜索要求机器人感知环境、识别感兴趣区域，并规划提高目标检测可能性或最大化信息增益的轨迹。许多IPP方法，特别是在连续环境监测中，依赖于高斯过程信念模型，而物体搜索场景通常从语义或开放词汇感知中产生复杂的多模态信念地图。直接基于这种非高斯信念地图的全局轨迹生成仍然相对未被充分探索。尽管基于扩散的规划器为此类分布建模提供了强大能力，但它们在信息路径规划中的应用仍然有限。在这项工作中，我们提出了DIFF-IPPO，一个集成了开放词汇信念地图生成器和基于扩散的规划器的流水线，用于在信念地图上生成全局轨迹。该方法生成的轨迹将传感器覆盖集中在高信念区域，在不同数据集场景下实现了81.49%至86.55%的归一化检测得分。我们在一个模拟的搜索与救援场景中验证了该系统，其中规划器搜索候选建筑区域以定位燃烧的建筑。在此设置中，一个由五架无人机组成的团队使用批处理信念地图条件轨迹生成，在3.5分钟内实现了首次检测。

英文摘要

Exploration and object search require robots to perceive their environment, identify regions of interest, and plan trajectories that improve target-detection likelihood or maximize information gain. Many IPP methods, especially in continuous environmental monitoring, rely on Gaussian-process belief models, while object-search settings often produce complex, multimodal belief maps from semantic or open-vocabulary perception. Global trajectory generation directly conditioned on such non-Gaussian belief maps remains comparatively underexplored. Although diffusion-based planners offer strong capabilities for modeling such distributions, their use in informative path planning remains limited. In this work, we propose DIFF-IPPO, a pipeline that integrates an open-vocabulary belief map generator with a diffusion-based planner for global trajectory generation over belief maps. The method generates trajectories that concentrate sensor coverage over high-belief regions, achieving normalized detection scores between 81.49% and 86.55% across different dataset scenarios. We validate the system in a simulated search-and-rescue scenario where the planner searches candidate building regions to locate a burning building. In this setting, a team of five drones using batched belief-map-conditioned trajectory generation achieves first detections in 3.5 minutes.

2606.16972 2026-06-16 cs.RO cs.SY eess.SY 新提交

When Should a Robot Replan? Regret-Guided Update Scheduling in Time-Varying MDPs

机器人何时应重新规划？时变MDP中的遗憾引导更新调度

Negin Musavi, Gokul Puthumanaillam, Ruben Hernandez, William Schafer, Melkior Ornik

发表机构 * University of Illinois Urbana–Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结针对时变环境下机器人因预算限制无法持续重规划的问题，提出基于动态遗憾的在线更新调度规则，在仿真和实物实验中优于固定预算基线。

AI中文摘要

在非平稳环境中运行的机器人必须随着动态漂移不断调整其策略，但机载能量和计算预算限制了全状态估计和重规划步骤的执行频率。这引出一个问题：在时间轴上，机器人何时应花费其有限的预算？我们在具有已知转移漂移率边界的时变马尔可夫决策过程（TVMDP）中形式化该问题。我们将执行建模为一种“跳过更新”方案，即在选定的更新时间点，智能体通过最大似然估计转移核并计算有限时域策略，而在更新间隔之间，则在传播的状态估计下重用该策略。我们分析了该方案的动态遗憾，并展示了它如何根据TVMDP的性质和跳过长度在跳过区间内增长；由此产生的界限通过一种在线、遗憾引导的更新规则回答了开头的问题，该规则自适应地分配预算。我们在具有时变滑移动力学的模拟火星车导航任务和室内障碍物场中的Crazyflie四旋翼飞行器上评估了该规则。自适应分配优于其他预算基线。

英文摘要

Robots operating in non-stationary environments must continually adapt their policies as the dynamics drift, but onboard energy and compute budgets cap how often a full state estimation and re-planning step can be performed. This raises a question: when, along a horizon, should a robot spend its limited budget? We formulate this problem in time-varying Markov decision processes (TVMDPs) with a known bound on the rate of transition drift. We model execution as a skip-update scheme in which, at chosen update times, the agent estimates the transition kernel by maximum likelihood and computes a finite-horizon policy, and between updates reuses this policy under a propagated state estimate. We analyze the dynamic regret of this scheme and show how it grows during skip intervals in terms of the properties of the TVMDP and the skip lengths; the resulting bound answers the opening question via an online, regret-guided update rule that allocates the budget adaptively. We evaluate the rule in a simulated Mars-rover navigation task with time-varying slip dynamics and on a Crazyflie quadrotor in indoor obstacle fields. Adaptive allocation outperforms other budgeted baselines.
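
The idea of spending a limited replanning budget where the dynamics drift fastest can be sketched with a toy scheduler. The abstract does not state the paper's regret bound or update rule, so everything here is an assumption made for illustration: `drift_bounds[t]` is a hypothetical per-step bound on transition drift, the running sum since the last update stands in for the regret accumulated while skipping, and an update fires when that proxy reaches `threshold` and budget remains.

```python
def schedule_updates(drift_bounds, budget, threshold):
    """Return the time steps at which a budget-limited agent replans.

    The staleness proxy is the sum of assumed drift bounds since the last
    update; it is reset to zero whenever an update is spent.
    """
    update_times = []
    staleness = 0.0
    for t, eps in enumerate(drift_bounds):
        staleness += eps
        if staleness >= threshold and len(update_times) < budget:
            update_times.append(t)
            staleness = 0.0
    return update_times


# Slow drift for five steps, then fast drift for five steps.
drift = [0.125] * 5 + [0.5] * 5
times = schedule_updates(drift, budget=3, threshold=1.0)
print(times)
```

In this run all three updates land in the fast-drift half of the horizon, which is the qualitative behavior an adaptive allocation aims for. A fixed-interval schedule with the same budget would spend part of it during the slow phase.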

2606.13485 2026-06-16 eess.SY cs.HC cs.NE cs.RO cs.SY physics.med-ph 交叉投稿

Impedance MPC with Patient-Torque Estimation for Knee Rehabilitation Exoskeletons

用于膝关节康复外骨骼的阻抗模型预测控制与患者力矩估计

Yongyan Cao, Jinshan Tang

发表机构 * Department of Biomedical Engineering and Engineering Science（生物医学工程与工程科学系）

AI总结提出阻抗模型预测控制框架，结合卡尔曼扰动状态估计患者力矩，实现无偏移跟踪和辅助按需，在500 Hz下满足临床精度标准。

AI中文摘要

膝关节康复外骨骼必须强制执行规定的关节轨迹，同时保持对非自主痉挛和自主患者努力的安全顺从——这是任何固定增益阻抗控制器的目标冲突。我们提出了一种用于膝关节康复外骨骼的阻抗模型预测控制框架，并在串联弹性执行器（SEA）平台上进行了演示：代数前馈将膝关节动力学简化为常系数标量双积分器，而滚动时域二次规划（QP）计算校正力矩，同时强制执行硬性的运动范围、力矩和速度限制（ISO 13482）。由直接基于SEA的力矩传感（通过弹性元件测量的串联弹性弹簧挠度——一种固有的、无EMG的患者力矩估计，而非单独的力传感器）驱动的卡尔曼扰动状态提供了标称无偏移保证，并通过其符号和期望运动方向实现无传感器的辅助按需。常状态矩阵允许离线预计算QP成本逆，从而实现多步时域下的500 Hz运行。在七个控制器基准测试（正弦跟踪、等长保持）中，500 Hz卡尔曼MPC在15 Nm痉挛下实现了0.1 mrad RMS、0.1 mrad稳态、0.2 mrad峰值的无偏移，而相同刚度下的经典阻抗控制器稳态偏移为515 mrad——直接测量通道几乎立即（几个采样周期内）收敛估计。没有估计器时，它实现经典阻抗（4.8 mrad RMS，8.3 mrad稳态）。所有MPC变体均满足87 mrad临床标准；没有经典控制器满足。该架构通过考虑耦合的每个关节QP为20自由度MyoSuite myoLeg设计。

英文摘要

Knee rehabilitation exoskeletons must enforce a prescribed joint trajectory while remaining safely compliant with involuntary spasm and voluntary patient effort; these objectives are in tension for any fixed-gain impedance controller. We present an Impedance Model Predictive Control framework for knee rehabilitation exoskeletons, demonstrated on a series-elastic-actuator (SEA) platform: an algebraic feedforward reduces the knee dynamics to a constant-coefficient scalar double integrator, and a receding-horizon quadratic program (QP) computes corrective torques while enforcing hard range-of-motion, torque, and velocity limits (ISO 13482). A Kalman disturbance state driven by direct SEA-based torque sensing (the series-elastic spring deflection measured through the elastic element, which is an intrinsic, EMG-free patient-torque estimate rather than a separate load cell) gives a nominal offset-free guarantee and, via its sign and the desired-motion direction, sensorless Assist-as-Needed. The constant state matrix permits offline precomputation of the QP cost inverse, enabling 500 Hz operation with a multi-step horizon. Across seven-controller benchmarks (sinusoidal tracking, isometric hold), the 500 Hz Kalman MPC is offset-free (0.1 mrad RMS, 0.1 mrad steady-state, 0.2 mrad peak) under a 15 Nm spasm, versus a 515 mrad steady-state offset for classical impedance at the same stiffness; the direct-measurement channel converges the estimate near-immediately (within a few sampling periods). Without the estimator it realizes a classical impedance (4.8 mrad RMS, 8.3 mrad steady-state). All MPC variants meet the 87 mrad clinical criterion; no classical controller does. The architecture is formulated for the 20 DOF MyoSuite myoLeg via coupling-aware per-joint QPs.
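
The "constant-coefficient scalar double integrator" in the abstract has a simple exact discretization, sketched below for the stated 500 Hz rate. This assumes the feedforward has already cancelled the remaining knee dynamics, so the commanded quantity is an angular acceleration. The function and variable names are illustrative, and the QP, its constraints, and the Kalman disturbance state are not reproduced.

```python
DT = 1.0 / 500.0  # control period for the 500 Hz rate quoted in the abstract


def step_double_integrator(theta, omega, accel, dt=DT):
    """Exact zero-order-hold update for theta'' = accel.

    Equivalent to x_next = A x + B u with A = [[1, dt], [0, 1]] and
    B = [dt**2 / 2, dt], both constant, so they can be precomputed offline.
    """
    theta_next = theta + dt * omega + 0.5 * dt * dt * accel
    omega_next = omega + dt * accel
    return theta_next, omega_next


# Hold a constant 1 rad/s^2 command for one second, starting at rest.
theta, omega = 0.0, 0.0
for _ in range(500):
    theta, omega = step_double_integrator(theta, omega, accel=1.0)
print(theta, omega)  # close to 0.5 rad and 1.0 rad/s
```

Because `A` and `B` do not depend on the state, any matrices built from them over a prediction horizon are also constant, which is consistent with the abstract's remark that the QP cost inverse can be precomputed offline.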

2606.16068 2026-06-16 eess.SY cs.RO cs.SY math.OC 交叉投稿

Anisotropic Template Ansätze for Robust Positive Invariance under State-Dependent Uncertainty

各向异性模板Ansätze用于状态依赖不确定性下的鲁棒正不变性

Abdelrahman Ramadan, Melissa Greeff, Sidney Givigi

发表机构 * Electrical and Computer Engineering, Smith Engineering, and with Ingenuity Labs Research Institute, Queen’s University（皇后大学史密斯工程学院电气与计算机工程系及Ingenuity Labs研究院）； School of Computing, and with Ingenuity Labs Research Institute, Queen’s University（皇后大学计算学院及Ingenuity Labs研究院）

AI总结提出一种基于高斯过程导出的正定矩阵场映射固定椭球模板的方法，建立状态和输入依赖扰动下鲁棒正不变性的充分条件，通过LMI条件实现，仿真显示体积大幅缩减。

AI中文摘要

我们建立了在具有各向异性协方差结构的状态和输入依赖扰动下鲁棒正不变性的充分条件。所提出的ansatz通过高斯过程导出的正定矩阵场映射一个固定的椭球模板，在保留基于有限图的验证的同时，涵盖了标量位似缩放。得到的LMI条件将学习到的场与Schur稳定动力学耦合；一个带有膨胀因子$r=1/(1-\gamma_{\mathrm{cl}})$的各向同性后备方案证明了可容许性。在每个学习周期中，场被冻结，因此在线管（tube）评估仅需一次GP协方差查询和一个小的矩阵平方根，无需在线集合迭代或LMI求解。四旋翼仿真显示，相对于非自适应位似基线，3D速度管体积减少了$195\times$，联合7D速度-控制子空间体积减少了$2.1\times10^5$倍。此扩展版本增加了完整证明、分离的离线/在线复杂度分析以及控制器扫描、收缩和投影面积研究。

英文摘要

We establish sufficient conditions for robust positive invariance under state- and input-dependent disturbances with anisotropic covariance structure. The proposed ansatz maps a fixed ellipsoidal template through a GP-derived positive-definite matrix field, subsuming scalar homothetic scaling while retaining finite graph-based verification. The resulting LMI conditions couple the learned field to Schur-stable dynamics; an isotropic fallback with inflation factor $r=1/(1-\gamma_{\mathrm{cl}})$ proves admissibility. During each learning epoch the field is frozen, so online tube evaluation is one GP covariance query and a small matrix square root, with no online set iteration or LMI solve. Quadrotor simulations show a $195\times$ reduction in 3D velocity-tube volume and a $2.1{\times}10^5$ reduction in the joint 7D velocity-control subspace relative to a non-adaptive homothetic baseline. This extended version adds full proofs, a separated offline/online complexity analysis, and controller-sweep, contraction, and projection-area studies.
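
A factor of the form $1/(1-\gamma_{\mathrm{cl}})$ is what a geometric series produces for a contracting error recursion. The derivation below is a standard argument offered as a plausible reading, not the paper's proof: it assumes the closed-loop error contracts by $\gamma_{\mathrm{cl}} \in [0,1)$ per step in some norm and that the disturbance is bounded by $\bar{w}$ in the same norm. The symbols $e_k$ and $\bar{w}$ are introduced here for illustration.

```latex
% Assumed one-step bound: contraction plus bounded disturbance.
\|e_{k+1}\| \le \gamma_{\mathrm{cl}}\,\|e_k\| + \bar{w}
% Unrolling the recursion k times:
\|e_k\| \le \gamma_{\mathrm{cl}}^{\,k}\,\|e_0\|
          + \bar{w}\sum_{j=0}^{k-1}\gamma_{\mathrm{cl}}^{\,j}
% Letting k grow, the first term vanishes and the series converges:
\limsup_{k\to\infty}\|e_k\|
  \le \bar{w}\sum_{j=0}^{\infty}\gamma_{\mathrm{cl}}^{\,j}
  = \frac{\bar{w}}{1-\gamma_{\mathrm{cl}}}
  = r\,\bar{w},
\qquad r = \frac{1}{1-\gamma_{\mathrm{cl}}}
```

Under these assumptions a norm ball of radius $r\,\bar{w}$ is invariant, and $r$ grows without bound as $\gamma_{\mathrm{cl}}$ approaches 1, which is why an isotropic bound of this kind can be very conservative compared with an anisotropic template.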

2509.00836 2026-06-16 cs.RO 版本更新

One-Step Model Predictive Path Integral for Manipulator Motion Planning Using Configuration Space Distance Fields

基于构型空间距离场的一步模型预测路径积分用于机械臂运动规划

Yulin Li, Tetsuro Miyazaki, Kenji Kawashima

发表机构 * Department of Information Physics and Computing, The University of Tokyo（东京大学信息物理与计算系）

AI总结提出将构型空间距离场与模型预测路径积分结合，利用CDF梯度统一代价函数并缩短规划时域至一步，实现高效避障，在2D环境和7自由度机械臂仿真中成功率近100%，控制频率超750Hz。

AI中文摘要

机械臂的运动规划是机器人学中的一个基本问题。经典的基于优化的方法通常依赖符号距离场（SDF）的梯度来施加避碰约束。然而，这些方法容易陷入局部最小值，并且在SDF梯度消失时可能失败。最近，构型空间距离场（CDF）被提出，它直接在机器人的构型空间中建模距离。与工作空间SDF不同，CDF几乎处处可微，因此提供了可靠的梯度信息。另一方面，无梯度方法如模型预测路径积分（MPPI）控制利用长时域滚动来实现避碰。虽然有效，但这些方法由于大量轨迹样本、重复碰撞检测以及设计具有异质物理单位的代价函数的困难而计算昂贵。在本文中，我们提出了一个将CDF与MPPI集成的框架，以实现机器人在其构型空间中的直接导航。利用CDF梯度，我们统一了关节空间中的MPPI代价，并将时域缩短为一步，大幅削减计算量，同时在实际中保持避碰能力。我们证明，我们的方法在2D环境中实现了接近100%的成功率，并在具有复杂障碍物的具有挑战性的7自由度Franka机械臂仿真中持续获得高成功率。此外，我们的方法达到了超过750Hz的控制频率，显著优于基于优化的方法和标准MPPI基线。这些结果突出了所提出的CDF-MPPI框架在高维运动规划中的有效性和效率。

英文摘要

Motion planning for robotic manipulators is a fundamental problem in robotics. Classical optimization-based methods typically rely on the gradients of signed distance fields (SDFs) to impose collision-avoidance constraints. However, these methods are susceptible to local minima and may fail when the SDF gradients vanish. Recently, Configuration Space Distance Fields (CDFs) have been introduced, which directly model distances in the robot's configuration space. Unlike workspace SDFs, CDFs are differentiable almost everywhere and thus provide reliable gradient information. On the other hand, gradient-free approaches such as Model Predictive Path Integral (MPPI) control leverage long-horizon rollouts to achieve collision avoidance. While effective, these methods are computationally expensive due to the large number of trajectory samples, repeated collision checks, and the difficulty of designing cost functions with heterogeneous physical units. In this paper, we propose a framework that integrates CDFs with MPPI to enable direct navigation in the robot's configuration space. Leveraging CDF gradients, we unify the MPPI cost in joint-space and reduce the horizon to one step, substantially cutting computation while preserving collision avoidance in practice. We demonstrate that our approach achieves nearly 100% success rates in 2D environments and consistently high success rates in challenging 7-DOF Franka manipulator simulations with complex obstacles. Furthermore, our method attains control frequencies exceeding 750 Hz, substantially outperforming both optimization-based and standard MPPI baselines. These results highlight the effectiveness and efficiency of the proposed CDF-MPPI framework for high-dimensional motion planning.
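
The one-step horizon described in the abstract reduces MPPI to a single weighting step over candidate controls. The sketch below shows only that generic weighting, with exponential weights on the cost of the next configuration. It does not use a configuration space distance field: the cost is a plain distance to a goal, and `mppi_one_step`, `GOAL`, and the fixed candidate list are assumptions made so that the example is deterministic.

```python
import math


def mppi_one_step(q, controls, cost_fn, temperature):
    """Return the MPPI-weighted average of candidate one-step controls.

    Each control u is scored by the cost of the configuration q + u, and
    weights are proportional to exp(-cost / temperature).
    """
    costs = [cost_fn(tuple(qi + ui for qi, ui in zip(q, u))) for u in controls]
    c_min = min(costs)  # shift so the best candidate has weight exactly 1
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    return tuple(
        sum(w * u[k] for w, u in zip(weights, controls)) / total
        for k in range(len(q))
    )


GOAL = (2.0, 0.0)


def distance_to_goal(q):
    """Toy cost: Euclidean distance from configuration q to GOAL."""
    return math.dist(q, GOAL)


# Three fixed candidates instead of random samples, to keep the run repeatable.
candidates = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
u_star = mppi_one_step((0.0, 0.0), candidates, distance_to_goal, temperature=0.1)
print(u_star)
```

With a small temperature the result is dominated by the candidate that moves toward the goal. In a real sampler the candidates would be drawn at random around a nominal control, and the cost would include the collision term that the paper builds from CDF gradients.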

2509.20084 2026-06-16 cs.RO 版本更新

C-3TO: Continuous 3D Trajectory Optimization on Neural Euclidean Signed Distance Fields

C-3TO：基于神经欧几里得有符号距离场的连续三维轨迹优化

Guillermo Gil, Jose Antonio Cobano, Luis Merino, Fernando Caballero

发表机构 * Service Robotics Laboratory – Universidad Pablo de Olavide (Seville), Spain（西班牙塞维利亚巴勃罗·德·奥拉维德大学服务机器人实验室）

AI总结提出一种在杂乱环境中利用在线神经欧几里得有符号距离场进行连续三维轨迹优化的框架，通过两阶段非线性优化直接优化五次多项式表示的平滑轨迹，实现安全、高效且可动态执行的轨迹规划。

Comments 8 pages, 5 figures, submitted to and accepted at ICUAS 2026

AI中文摘要

本文提出了一种新颖的框架，用于在杂乱环境中进行连续三维轨迹优化，利用在线神经欧几里得有符号距离场（ESDF）。与先前依赖离散化ESDF网格和插值的方法不同，我们的方法直接优化由五次多项式表示的平滑轨迹，该轨迹定义在连续的神经ESDF上，确保整个轨迹上的精确梯度信息。该框架集成了一个两阶段非线性优化管道，平衡了效率、安全性和平滑性。实验结果表明，C-3TO能够生成碰撞感知且动态可行的轨迹。此外，其在定义局部窗口大小和优化参数方面的灵活性，使得能够轻松适应不同用户的需求，而不影响性能。通过将连续轨迹参数化与持续更新的神经ESDF相结合，C-3TO为空中机器人安全高效的局部重规划建立了稳健且可泛化的基础。

英文摘要

This paper introduces a novel framework for continuous 3D trajectory optimization in cluttered environments, leveraging online neural Euclidean Signed Distance Fields (ESDFs). Unlike prior approaches that rely on discretized ESDF grids with interpolation, our method directly optimizes smooth trajectories represented by fifth-order polynomials over a continuous neural ESDF, ensuring precise gradient information throughout the entire trajectory. The framework integrates a two-stage nonlinear optimization pipeline that balances efficiency, safety, and smoothness. Experimental results demonstrate that C-3TO produces collision-aware and dynamically feasible trajectories. Moreover, its flexibility in defining local window sizes and optimization parameters enables straightforward adaptation to diverse user needs without compromising performance. By combining continuous trajectory parameterization with a continuously updated neural ESDF, C-3TO establishes a robust and generalizable foundation for safe and efficient local replanning in aerial robotics.
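
A fifth-order polynomial is the lowest order that can fix position, velocity, and acceleration at both ends of a segment. The sketch below shows only the rest-to-rest special case, where velocity and acceleration are zero at both endpoints and the blend has the closed form 10τ³ − 15τ⁴ + 6τ⁵. It is not the C-3TO optimizer, which per the abstract optimizes the trajectory against a neural ESDF. The function names are illustrative.

```python
def quintic_blend(tau):
    """Blend s(tau) with s(0)=0, s(1)=1 and zero first and second derivatives
    at both ends."""
    return 10.0 * tau**3 - 15.0 * tau**4 + 6.0 * tau**5


def quintic_position(p0, p1, t, duration):
    """Position at time t on a rest-to-rest quintic segment from p0 to p1."""
    tau = min(max(t / duration, 0.0), 1.0)  # clamp to the segment
    s = quintic_blend(tau)
    return tuple(a + (b - a) * s for a, b in zip(p0, p1))


start, goal = (0.0, 0.0, 0.0), (2.0, 4.0, 6.0)
midpoint = quintic_position(start, goal, t=1.0, duration=2.0)
print(midpoint)
```

The blend is symmetric about τ = 0.5, so the halfway time lands exactly at the geometric midpoint of the segment. A general quintic has six free coefficients per axis, which is what leaves room for an optimizer to trade smoothness against clearance.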

2606.08059 2026-06-16 cs.RO 版本更新

Perceptive Behavior Foundation Model: Adapting Human Motion Priors to Robot-Centric Terrain

感知行为基础模型：将人体运动先验适应到以机器人为中心的地形

Zifan Wang, Yizhao Li, Teli Ma, Qiang Zhang, Yudong Fan, Hao Xu, Shuo Yang, Junwei Liang

发表机构 * Mondo Robotics ； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong University of Science and Technology（香港科技大学）； Artificial General Intelligence Institute, University of Science and Technology of China（中国科学技术大学通用人工智能研究院）

AI总结提出感知行为基础模型（Perceptive BFM），通过地形一致参考合成（TCRS）将人体运动先验适应到机器人局部地形，实现地形感知的人形机器人控制。

AI中文摘要

人形机器人行为基础模型旨在从广泛的人体运动先验中获取可复用的全身控制策略，使单一控制器能够产生多样且富有表现力的行为。然而，现有的以运动为中心的基础策略大多假设参考运动已经与机器人周围环境物理兼容。当演示者、操作者和机器人处于不同环境时，这一假设不再成立：人体运动可能指定了预期行为，但并未指定机器人局部地形所需的落脚点、间隙、身体高度或接触时机。我们引入了感知行为基础模型（Perceptive BFM），这是一种地形感知的人形机器人控制框架，将人体运动先验植根于以机器人为中心的感知。该模型保留原始运动学运动参考作为行为接口，同时利用局部地形观测来调整接触、姿态和时机。为了提供可扩展的地形监督，我们开发了地形一致参考合成（TCRS），通过接触感知的落脚点构建、足部几何感知的摆动优化、支撑感知的根部重建、碰撞修复和多点逆运动学，将面向行走的人体运动片段转换为地形一致的参考。然后，我们训练一个盲适应参考教师，并通过目标帧动作对齐将其地形一致行为迁移到部署的原始参考学生。学生是一个恒等门控（identity-gated）Transformer跟踪器，其地形特征通过残差路径进入，这些路径初始化为保留运动跟踪先验，并仅在需要时训练产生局部修正。

英文摘要

Humanoid behavior foundation models aim to acquire reusable whole-body control policies from broad human motion priors, enabling a single controller to produce diverse and expressive behaviors. However, existing motion-centric foundation policies largely assume that the reference motion is already physically compatible with the robot's surroundings. This assumption breaks when the demonstrator, operator, and robot inhabit different environments: a human motion may specify the intended behavior, but not the footholds, clearance, body height, or contact timing required by the robot's local terrain. We introduce Perceptive Behavior Foundation Model (Perceptive BFM), a terrain-aware humanoid control framework that grounds human motion priors in robot-centric perception. The model preserves raw kinematic motion references as the behavioral interface, while using local terrain observations to adapt contacts, posture, and timing. To provide scalable terrain supervision, we develop terrain-conformal reference synthesis (TCRS), which converts locomotion-oriented human motion clips into terrain-consistent references through contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, and multi-point inverse kinematics. We then train a blind adapted-reference teacher and transfer its terrain-conformal behavior to a deployed raw-reference student through target-frame action alignment. The student is an identity-gated Transformer tracker whose terrain features enter through residual pathways initialized to preserve the motion-tracking prior and trained to produce local corrections only when needed.

2606.14981 2026-06-16 cs.RO cs.AI cs.LG 新提交

ART-Glove：用于接触接地灵巧交互捕获的关节式触觉手套

Changyi Lin, Ding Zhao

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结提出ART-Glove关节式触觉手套，通过16个刚性功能表面和22个解剖对齐关节，同步捕获22自由度关节运动和2048触觉点接触信息，支持下游灵巧机器人学习。

2606.16436 2026-06-16 cs.RO cs.CV 新提交

V2P-Manip: Learning Dexterous Manipulation from Monocular Human Videos

V2P-Manip：从单目人类视频学习灵巧操作

Kaihan Chen, Yanming Shao, Haifeng Ji, Xiaokang Yang, Yao Mu

发表机构 * Zhejiang University（浙江大学）； Shanghai Jiao Tong University（上海交通大学）； Shanghai AI Laboratory（上海人工智能实验室）

AI总结提出V2P-Manip框架，从单目人类演示视频中提取具有视觉保真度和物理合理性的轨迹，通过两阶段精炼实现空间对齐与物理一致性，在TACO和OakInk基准上显著优于先前方法。

AI中文摘要

实现自主机器人灵巧操作需要大规模精确、类人的动作序列。作为昂贵遥操作数据的可扩展补充，从单目视频中提取兼具视觉保真度和物理合理性的轨迹是具身智能的一个有前景的前沿方向。为此，我们引入V2P-Manip，一个高效的框架，旨在直接从人类演示视频中学习灵巧操作策略。我们建立了一个高效、集成的流水线，涵盖3D资产获取、轨迹估计和灵巧策略学习。为了弥合视觉感知与物理约束之间的差距，我们引入了一个两阶段精炼过程，以强制执行空间对齐和物理一致性。在TACO和OakInk基准上的评估表明，我们的方法在姿态精度、对非结构化环境的适应性以及训练效率方面显著优于先前方法。最终，实验结果证实了在多个合成操作任务上平均成功率超过75%，并验证了提取的操作先验在不同灵巧手形态上的适应性。

英文摘要

Achieving autonomous robotic dexterous manipulation requires precise, human-like action sequences at scale. As a scalable supplement to costly teleoperation data, extracting trajectories with both visual fidelity and physical plausibility from monocular videos represents a promising frontier in embodied AI. To this end, we introduce V2P-Manip, an efficient framework designed to learn dexterous manipulation policies directly from human demonstration videos. We establish an efficient, integrated pipeline encompassing 3D asset acquisition, trajectory estimation, and dexterous policy learning. To bridge the gap between visual perception and physical constraints, we introduce a two-stage refinement process to enforce spatial alignment and physical consistency. Evaluations on the TACO and OakInk benchmarks demonstrate that our approach significantly outperforms previous methods in pose accuracy, adaptability to unstructured environments, and training efficiency. Ultimately, experimental results confirm an average success rate of over 75% across multiple synthetic manipulation tasks and validate the adaptability of the extracted manipulation priors across diverse dexterous hand embodiments.

2606.16504 2026-06-16 cs.RO 新提交

APEX: Adaptive Policy Execution for Precise Manipulation

APEX: 用于精确操作的适应性策略执行

Mengfei Zhao, Chenxi Jiang, Tuo An, Jindou Jia, Jianfei Yang

发表机构 * MARS Lab, Nanyang Technological University（南洋理工大学MARS实验室）

AI总结针对策略与控制器间的执行差距，提出即插即用的APEX框架，通过动态可行参考重建和测试时自适应，减少跟踪误差并提升操作成功率。

Comments 20 pages, 9 figures, 4 tables

AI中文摘要

现代模仿学习方法，包括视觉运动策略和视觉-语言-动作（VLA）策略，通常输出高层动作参考，由低层控制器执行。然而，缺乏高阶参考信号以及策略在训练过程中对底层控制动态的不了解，不可避免地导致了执行差距。结果，实际动作系统地偏离策略指令的动作，对精度敏感的操作产生关键影响。先前的工作要么修改策略架构，要么修改低层控制器，两者都需要对预训练策略或封装控制器进行侵入式更改。这引发了一个自然问题：当策略和控制器都被视为不可访问的黑盒时，我们能否弥合执行差距？我们提出了适应性策略执行（APEX），这是一个插入在策略和控制器之间的即插即用框架，从策略输出中重建动态可行的参考，并在测试时根据低层状态反馈进行自适应，具有可证明的收敛保证。广泛的实证研究表明，APEX在演示回放中将控制器引起的跟踪误差减少了41.2%，并在四种视觉运动策略和VLA策略类别上将操作成功率提高了4.8-25.8个百分点。

英文摘要

Modern imitation learning methods, including visuomotor and Vision-Language-Action (VLA) policies, typically output high-level action references that are executed by low-level controllers. However, the absence of higher-order reference signals, together with the policy's lack of awareness of the underlying low-level control dynamics during training, inevitably induces an execution gap. As a result, realized actions deviate systematically from policy-commanded ones, with a critical impact on precision-sensitive manipulation. Prior work either modifies the policy architecture or the low-level controller, both requiring intrusive changes to the pretrained policy or packaged controller. This raises a natural question: when the policy and controller are both treated as inaccessible black boxes, can we bridge the execution gap? We propose Adaptive Policy Execution (APEX), a plug-and-play framework inserted between the policy and the controller that reconstructs a dynamically feasible reference from policy outputs and adapts at test-time according to low-level state feedback, with a provable convergence guarantee. Extensive empirical studies show that APEX reduces controller-induced tracking error by 41.2% on demonstration replay and improves manipulation success by 4.8--25.8 percentage points across four visuomotor and VLA policy classes.

2606.16690 2026-06-16 cs.RO cs.AI cs.CV 新提交

解耦的以对象为中心的视频理解用于生成机器人操作指令

Thanh Nguyen Canh, Thanh-Tuan Tran, Haolan Zhang, Ziyan Gao, Xiem HoangVan, Nak Young Chong

发表机构 * School of Information Science, Japan Advanced Institute of Science and Technology（日本北陆先端科学技术大学院大学信息科学学院）； University of Engineering and Technology, Vietnam National University（越南国立大学工程与技术大学）； Department of Robotics, Hanyang University（汉阳大学机器人学系）

AI总结提出解耦动作识别与对象选择的框架，通过TSM分类动作和对象选择算法识别任务相关对象，结合VLM生成精确指令，在Something-Something V2上显著提升性能。

AI中文摘要

将视频演示翻译为可执行的机器人命令仍然具有挑战性，因为现有方法通常无法识别演示动作中功能涉及的对象。因此，它们可能生成语言上合理但操作上模糊的命令。我们提出了一种以对象为中心的视频理解框架，将动作识别与对象识别解耦，以生成精确的、无语法的操作命令。我们的方法集成了时间移位模块（TSM）用于高效的时空动作分类，以及一种新颖的对象选择（Object Selection）算法，通过基于轨迹的角色分类、模糊检测和重叠最小化来识别任务相关对象。然后，选定的对象由视觉语言模型（VLM）处理，以实现鲁棒的类别识别和零样本泛化。在修改后的Something-Something V2数据集上评估，我们的方法达到了86.79%的动作分类准确率，在标准对象上BLEU-4得分为0.337，在新颖对象上为0.261。这些结果分别比最强的任务特定基线提高了80.2%和143.9%。在METEOR和CIDEr指标上观察到更大的提升，在新颖对象上分别达到157.9%和171.7%。在所有语义指标上，我们的方法始终优于任务特定方法，并与大型通用VLM保持竞争力或超越它们，同时保留了模块化的、以对象为中心的设计。

英文摘要

Translating video demonstrations into executable robot commands remains challenging because existing methods often fail to identify which objects are functionally involved in the demonstrated action. As a result, they may generate commands that are linguistically plausible but operationally ambiguous. We propose an object-centric video understanding framework that decouples action recognition from object identification to generate precise, grammar-free manipulation commands. Our approach integrates Temporal Shift Modules (TSM) for efficient spatio-temporal action classification with a novel Object Selection algorithm that identifies task-relevant objects through trajectory-based role classification, blur detection, and overlap minimization. The selected objects are then processed by Vision-Language Models (VLMs) for robust category recognition and zero-shot generalization. Evaluated on a modified Something-Something V2 dataset, our method achieves 86.79% action classification accuracy and BLEU-4 scores of 0.337 on standard objects and 0.261 on novel objects. These results improve over the strongest task-specific baseline by 80.2% and 143.9%, respectively. Larger gains are observed in METEOR and CIDEr, reaching 157.9% and 171.7% on novel objects. Across all semantic metrics, our approach consistently outperforms task-specific methods and remains competitive with, or surpasses, large general-purpose VLMs while retaining a modular, object-centric design.

2606.09777 2026-06-16 cs.RO 版本更新

AetheRock: An Arm-Worn Robot Teaching System for Force-Guided Vision-Tactile Learning

AetheRock: 一种用于力引导视觉触觉学习的臂戴式机器人教学系统

Hong Li, Yue Xu, Yihan Tang, Yankang Dong, Chenyuan Liu, Chenyang Yu, Xuyang Li, Siyuan Huang, Yujun Shen, Nan Xue, Yong-Lu Li

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Ant Group（蚂蚁集团）； Shanghai Innovation Institute（上海创新研究院）； Beijing Institute for General Artificial Intelligence (BIGAI)（北京通用人工智能研究院）

AI总结提出臂戴式设备AetheRock采集夹爪力、视觉和触觉数据，并设计ForceVT框架利用力和视觉引导触觉学习，解决力感知机器人学习中传感器装配不兼容问题。

详情

AI中文摘要

力和触觉感知在接触密集操作中不可或缺。然而，由于手持或可穿戴设备中触觉和力传感器的不兼容装配，力感知机器人学习面临关键挑战。为解决这些限制，我们首先引入AetheRock用于夹爪力、视觉和触觉数据收集，这是一种臂戴式设备，指尖配备模块化且易于制造的视觉触觉传感器GelSlim-MiniFab，人体手指接触区域配备电阻式压力传感器，定制PCB模块，以及用于舒适和稳健收集的可穿戴套件。在此基础上，我们提出ForceVT，一种表示学习框架，利用力和视觉引导保真度无关的触觉学习，实现在任何触觉情况下的鲁棒推理。实际实验表明，AetheRock实现了合格的数据效率，且ForceVT有效缓解了视觉触觉传感器在制造和使用不一致时的低效问题。总体而言，我们的工作通过创新的硬件设计和算法减轻了夹爪力-视觉-触觉机器人学习的局限性。

英文摘要

Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges due to the incompatible assembly of tactile and force sensors in handheld or wearable devices. To address these limitations, we first introduce AetheRock for gripper-force, vision, and tactile data collection, which is an arm-worn device featuring a modular and easily manufactured visuo-tactile sensor, GelSlim-MiniFab, at the fingertip, a resistive pressure sensor at the human finger contact region, a customized PCB module, and a wearable kit for comfortable and robust collection. Building on this, we propose ForceVT, a representation learning framework that uses force and vision to guide fidelity-agnostic tactile learning, enabling robust inference in any tactile situation. Real-world experiments show that AetheRock achieves qualified data efficiency and that ForceVT effectively alleviates inefficiencies when visuo-tactile sensors exhibit manufacturing and utilization inconsistencies. Overall, our work mitigates the limitations of gripper-force vision-tactile robot learning through innovative hardware design and algorithms.

2601.13565 2026-06-16 cs.CV cs.RO eess.IV 版本更新

Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation

学习细粒度对应与跨视角感知用于开放词汇6D物体姿态估计

Yu Qin, Shimeng Fan, Fan Yang, Zixuan Xue, Zijie Mai, Wenrui Chen, Kailun Yang, Zhiyong Li

发表机构 * School of Artificial Intelligence and Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University（人工智能与机器人学院和机器人视觉感知与控制技术国家工程研究中心，湖南大学）； State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University（自主智能无人系统国家重点实验室，同济大学）； School of Computer Science and Engineering, Hunan University of Science and Technology（计算机科学与工程学院，湖南科技大学）

AI总结提出FiCoP框架，通过物体中心解耦、跨视角全局感知模块和补丁相关预测器，实现空间约束的细粒度对应，显著提升开放世界6D姿态估计的鲁棒性。

Comments Accepted to IEEE Robotics and Automation Letters (RA-L). The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP

AI中文摘要

开放词汇6D物体姿态估计使机器人能够仅凭自然语言指令操控任意未见过的物体。然而，现有方法的一个关键限制是它们依赖于无约束的全局匹配策略。在开放世界场景中，尝试将锚点特征与整个查询图像空间进行匹配会引入过多的歧义，因为目标特征容易与背景干扰物混淆。为解决这一问题，我们提出了细粒度对应姿态估计（FiCoP），这是一个从易受噪声影响的全局匹配过渡到空间约束的补丁级对应的框架。为了系统地消除背景干扰，FiCoP首先采用以物体为中心的解耦步骤，将目标从宏观环境噪声中隔离出来。基于这个局部区域，我们的核心方法创新有两个方面。首先，提出了跨视角全局感知（CPGP）模块，通过显式上下文推理和文本引导的语义注入融合双视图特征，建立结构一致性。其次，我们设计了一个补丁相关预测器（PCP），利用补丁到补丁的相关矩阵作为结构先验。这生成一个精确的块状关联图，作为空间滤波器，强制执行细粒度、抗噪声的匹配。在REAL275和Toyota-Light数据集上的实验表明，与最先进方法相比，FiCoP的平均召回率分别提高了8.0%和6.1%，突显了其在复杂、无约束的开放世界环境中为机器人代理提供鲁棒和泛化感知的能力。源代码将在 https://github.com/zjjqinyu/FiCoP 公开。

英文摘要

Open-vocabulary 6D object pose estimation empowers robots to manipulate arbitrary unseen objects guided solely by natural language. However, a critical limitation of existing approaches is their reliance on unconstrained global matching strategies. In open-world scenarios, trying to match anchor features against the entire query image space introduces excessive ambiguity, as target features are easily confused with background distractors. To resolve this, we propose Fine-grained Correspondence Pose Estimation (FiCoP), a framework that transitions from noise-prone global matching to spatially-constrained patch-level correspondence. To systematically eliminate background interference, FiCoP first employs an object-centric disentanglement step to isolate the target from macro-level environmental noise. Building upon this localized region, our core methodological innovations are twofold. Firstly, a Cross-Perspective Global Perception (CPGP) module is proposed to fuse dual-view features, establishing structural consensus through explicit context reasoning and text-guided semantic injection. Secondly, we design a Patch Correlation Predictor (PCP) that leverages a patch-to-patch correlation matrix as a structural prior. This generates a precise block-wise association map, acting as a spatial filter to enforce fine-grained, noise-resilient matching. Experiments on the REAL275 and Toyota-Light datasets demonstrate that FiCoP improves Average Recall by 8.0% and 6.1%, respectively, compared to the state-of-the-art method, highlighting its capability to deliver robust and generalized perception for robotic agents operating in complex, unconstrained open-world environments. The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP.

2606.14776 2026-06-16 cs.RO cs.LG 新提交

Deep Learning-Based Lunar Crater Terrain Relative Navigation

基于深度学习的月球陨石坑地形相对导航

Batu Candan, Simone Servadio

发表机构 * NASA（美国国家航空航天局）； University of Texas at Austin（德克萨斯大学奥斯汀分校）

AI总结提出一种结合深度学习陨石坑检测器和扩展卡尔曼滤波的地形相对导航算法，在初始位置偏差达5公里时仍能将导航误差降至数百米。

AI中文摘要

准确的位置估计对于未来使用自主飞行器实现月球着陆至关重要，尤其是在地形特征稀疏的危险环境中。本文提出了一种地形相对导航（TRN）算法，该算法结合了我们专门为NASA陨石坑检测挑战问题设计的深度学习陨石坑检测器和扩展卡尔曼滤波（EKF）。我们的检测器分析从轨道获取的单目图像中的陨石坑特征，并通过匈牙利分配方法及基于共识的离群点去除方法，识别它们与全球数据库中陨石坑的匹配。然后，估计的测量值用于优化EKF，其中航天器在月心月固（LCLF）参考系中的姿态估计，结合高度辅助信息，约束径向漂移。仿真结果表明，即使航天器偏离实际位置达5公里，TRN也能从这种情况中恢复，将导航误差降低到几百米。需要注意的是，为了保持陨石坑特征的对应关系，必须将图像分辨率和场景中的尺度与检测器训练集分布相匹配。

英文摘要

Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from the monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by a consensus-based outlier removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the spacecraft's estimated position is up to 5 km off from its actual location, TRN can recover, reducing the navigation error to a few hundred meters. To maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector's training set distribution.
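
The matching step between detected craters and a catalog can be sketched as a one-to-one assignment that minimizes total distance, followed by rejection of implausible pairs. Two simplifications are assumed here and are not from the paper: the assignment is found by brute force over permutations, which returns the same optimum as the Hungarian algorithm but only scales to a handful of craters, and the consensus-based outlier removal is replaced by a plain distance gate. The coordinates and the `gate` value are made up for illustration.

```python
import math
from itertools import permutations


def match_craters(detected, catalog, gate):
    """Return (detected_index, catalog_index) pairs for the minimum-total-
    distance one-to-one assignment, keeping only pairs within `gate`.

    Requires len(detected) <= len(catalog).
    """
    best_cost, best_perm = math.inf, ()
    for perm in permutations(range(len(catalog)), len(detected)):
        cost = sum(math.dist(detected[i], catalog[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [
        (i, j)
        for i, j in enumerate(best_perm)
        if math.dist(detected[i], catalog[j]) <= gate
    ]


detected = [(0.1, 0.0), (5.0, 5.2), (20.0, 20.0)]  # third one is spurious
catalog = [(5.0, 5.0), (0.0, 0.0), (9.0, 9.0)]
matches = match_craters(detected, catalog, gate=1.0)
print(matches)
```

The spurious detection is still assigned to a catalog crater by the optimal matching, because every detection must receive a partner, and is only discarded by the gate. This is why an outlier-rejection stage after the assignment matters before the matches are passed to a filter.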


2606.14879 2026-06-16 cs.RO cs.CV cs.LG new submission

VANDERER: Map-Free Exploration using Future-Aware and Visual-Curiosity-Guided Diffusion Policy

VANDERER: 基于未来感知与视觉好奇心引导扩散策略的无地图探索

Venkata Naren Devarakonda, Raktim Gautam Goswami, Prashanth Krishnamurthy, Farshad Khorrami

Affiliations: Control/Robotics Research Laboratory (CRRL), Department of Electrical and Computer Engineering, NYU Tandon School of Engineering; New York University Abu Dhabi (NYUAD) Center for Artificial Intelligence and Robotics (CAIR)

AI summary: Proposes VANDERER, a framework that uses a visual curiosity module to guide a pre-trained diffusion policy for efficient map-free exploration from monocular images alone, exploring on average 13.4% more area than NoMaD across diverse simulated environments.

AI中文摘要

移动智能体需要高效的探索策略来绘制未知环境并自主规划任务。传统方法依赖于生成占据地图并优化未探索区域的访问顺序。然而，在传感器受限的设置中，例如仅使用单目相机，生成准确的占据地图具有挑战性。为了解决这一问题，我们提出了VANDERER，一个探索框架，它利用视觉好奇心模块（VCM）仅使用单目图像数据来引导预训练的扩散策略。该好奇心模块通过导航世界模型预测所提议动作的结果，并通过好奇心成本对其进行评估。然后，该成本引导扩散过程生成最大化探索的动作。在多种模拟环境中进行评估，VANDERER始终优于现有基线，平均探索面积比NoMaD多13.4%。我们的结果揭示了室外环境中视觉好奇心与几何好奇心之间的直接相关性，表明VANDERER能够有效利用这种关系，在传感器受限的智能体上实现高效探索。

英文摘要

Mobile agents require efficient exploration strategies to map unseen environments and autonomously plan tasks. Traditional methods rely on generating occupancy maps and optimizing the sequence in which unexplored regions are visited. However, in sensor-constrained settings, such as those limited to monocular cameras, generating accurate occupancy maps is challenging. To address this, we propose VANDERER, an exploration framework that leverages a Visual Curiosity Module (VCM) to guide pre-trained diffusion policies using only monocular image data. This curiosity module predicts the outcomes of proposed actions via a navigation world model and evaluates them through a curiosity cost. The cost then guides the diffusion process toward generating actions that maximize exploration. Evaluated across diverse simulated environments, VANDERER consistently outperforms established baselines, exploring an average of 13.4% more area than NoMaD. Our results reveal a direct correlation between visual and geometric curiosity in outdoor environments, demonstrating that VANDERER can effectively leverage this relationship for efficient exploration using sensor-constrained agents.


2606.15154 2026-06-16 cs.RO new submission

Task-Aware Environment Augmentation for Reliable Navigation via Shielded Conditional Diffusion

任务感知的环境增强：通过屏蔽条件扩散实现可靠导航

Bharawee Phoompho, Gokul Puthumanaillam, Yan Miao, Ruben Hernandez, Tim Bretl, Sayan Mitra, Melkior Ornik

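Affiliations: University of Illinois Urbana-Champaign

AI summary: Addresses the reliability of trajectory execution under partial observability with SCoDA, a task-aware environment augmentation method that uses a conditional diffusion model to learn high-performing visual marker layouts and a shielded sampler to guide marker placement, improving trajectory execution reliability and completion time.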

AI中文摘要

在部分可观测条件下的可靠轨迹规划不仅取决于计算可行的几何路径，还取决于机器人在执行该轨迹时是否能接收到信息丰富的观测。现有方法通常保持环境固定，通过信念空间规划、主动定位或增加传感来适应机器人，这往往导致在观测贫乏区域出现代价高昂的不确定性传播和脆弱行为。我们翻转这一视角，解决一个很大程度上开放的问题，即任务感知的环境增强：给定一个已建图的环境、一条规划的任务轨迹和少量视觉标记预算，应在何处增强环境，使得规划轨迹能在不确定性下可靠执行？我们的关键观察是，有用的标记布局由它们沿任务轨迹提供的定位支持定义：少量时机恰当的观测足以防止不确定性在状态估计误差会危及控制的区域累积。基于这一观察，我们提出了SCoDA，即Shielded Conditional Diffusion for Environment Augmentation（屏蔽条件扩散用于环境增强）。SCoDA从数据中学习高性能标记布局的条件分布，以环境、规划轨迹、干扰上下文和期望执行轮廓为条件。其屏蔽采样器推理沿规划执行应进行位姿校正的位置，并将该分布引导至任务相关、有限预算的增强。在模拟基准测试和硬件部署中，我们展示了SCoDA在轨迹执行可靠性和完成时间上优于强基线方法。代码、模型和数据集见：https://scoda-diffusion.github.io/

英文摘要

Reliable trajectory planning under partial observability depends not only on computing a feasible geometric path, but also on whether the robot receives informative observations while executing that trajectory. Existing approaches usually keep the environment fixed and adapt the robot through belief-space planning, active localization, or added sensing, often incurring costly uncertainty propagation and brittle behavior in observation-poor regions. We flip this perspective and address the largely open problem of task-aware environment augmentation: given a mapped environment, a planned task trajectory, and a small budget of visual fiducial markers, where should the environment be augmented so that the planned trajectory can be executed reliably under uncertainty? Our key observation is that useful marker layouts are defined by the localization support they provide along the task trajectory: a small number of well-timed observations can be sufficient to prevent uncertainty from accumulating in regions where state-estimation error would otherwise compromise control. Building on this observation, we present SCoDA, Shielded Conditional Diffusion for Environment Augmentation. SCoDA learns a conditional distribution over high-performing fiducial layouts from data, using the environment, planned trajectory, disturbance context, and desired execution profile as conditioning. Its shielded sampler reasons over where along the planned execution pose corrections should occur, and steers this distribution toward task-relevant, finite-budget augmentations. Across simulated benchmarks and hardware deployments, we show that SCoDA improves trajectory execution reliability and completion time over strong baselines. Code, models and dataset available at: https://scoda-diffusion.github.io/
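The key observation in this abstract, that a few well-timed observations keep uncertainty from accumulating, can be illustrated with a deliberately small model. The sketch below is not SCoDA (there is no diffusion model or shielded sampler here); it assumes a scalar uncertainty that grows by a fixed amount per trajectory step and is reset to zero after a marker is observed, and it brute-forces the best layout for a small budget. All names and the growth model are hypothetical.

```python
from itertools import combinations


def peak_uncertainty(n_steps, markers, growth=1.0):
    """Largest uncertainty reached along the trajectory.

    Uncertainty grows by `growth` each step and is reset to zero right
    after a step at which a marker is observed.
    """
    sigma, peak = 0.0, 0.0
    for t in range(n_steps):
        sigma += growth
        peak = max(peak, sigma)
        if t in markers:
            sigma = 0.0
    return peak


def best_layout(n_steps, budget, growth=1.0):
    """Exhaustively pick the marker steps that minimize peak uncertainty."""
    return min(combinations(range(n_steps), budget),
               key=lambda m: peak_uncertainty(n_steps, set(m), growth))


layout = best_layout(n_steps=12, budget=2)
print(layout, peak_uncertainty(12, set(layout)))
print(peak_uncertainty(12, set()))  # no markers at all
```

In this toy setting two markers cut the peak uncertainty to a third of the marker-free value, which is the budgeted-placement intuition the abstract describes; a learned sampler becomes useful once the environment, disturbances, and visibility make exhaustive search impractical.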


2606.15476 2026-06-16 cs.RO new submission

FARM: Find Anything using Relational Spatial Memory

FARM: 使用关系空间记忆找到任何物体

Siming He, Leo Huang, Adam Lilja, Fabio Hubel, Jonas Frey, Marco Pavone, S. Shankar Sastry, Jitendra Malik, Claire Tomlin

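Affiliations: UC Berkeley; Stanford University

AI summary: Proposes FARM, a system that builds in real time an open-vocabulary, object-level memory with geometry, visual-language descriptors, and viewpoint evidence, and uses VLMs to parse queries alongside explicit spatial constraints. On 44k language queries it improves Recall@5 and Recall@10 by 164% and 224%, and a final VLM reranking stage improves Accuracy@1 by 35%.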

AI中文摘要

在家庭、仓库及其他物体丰富的环境中运行的机器人需要能够按需找到特定物体实例的记忆系统。仅靠物体级记忆往往不够：场景中包含许多看似匹配的物体，用户通过目标与地标及周围物体的关系来指代目标（例如，“飞镖盘下方、海报左侧的高灯”），这要求一种支持通过语义、外观和空间谓词进行检索的关系空间记忆。为此，我们提出了FARM（使用关系空间记忆找到任何物体），该系统以5-10 Hz的实时速度构建一个紧凑的、开放词汇的物体级记忆，包含几何、视觉语言描述和视角证据。在查询时，FARM使用VLM解析查询并评分视觉证据，同时通过物体符号和关系谓词显式地约束空间关系。这种对VLM的结构化使用使得检索比基于帧历史或场景图上下文的端到端推理更准确和鲁棒。在涵盖67个室内外场景（面积从15到15,000平方米）的44k语言查询实验中，FARM的Recall@5和Recall@10相比先前方法分别提升了164%和224%，最终VLM重排序阶段将Accuracy@1提升了35%，同时保持实时运行。我们进一步在四足机器人上使用机载传感器和计算展示了闭环部署。

英文摘要

Robots operating in homes, warehouses, and other object-rich environments need memory systems that can find specific object instances on demand. Object-level memory alone is often insufficient: scenes contain many plausibly matching objects, and users refer to the target through relations to landmarks and surrounding objects (e.g., "the tall lamp below the dartboard and to the left of the poster"), demanding a relational spatial memory that supports retrieval through semantic, appearance, and spatial predicates over objects. To achieve this, we present FARM (Find Anything using Relational Spatial Memory), which builds, in real time at 5-10 Hz, a compact, open-vocabulary, object-level memory with geometry, visual-language descriptors, and viewpoint evidence. At query time, FARM uses VLMs to parse the query and score visual evidence, while grounding spatial constraints explicitly through object symbols and relational predicates. This structured use of VLMs enables more accurate and robust retrieval than end-to-end reasoning over frame histories or scene-graph context. In experiments on 44k language queries spanning 67 indoor and outdoor scenes, ranging from 15 to 15,000 m², FARM improves Recall@5 and Recall@10 over prior methods by 164% and 224%, and a final VLM reranking stage improves Accuracy@1 by 35%, while running in real time. We further demonstrate closed-loop deployment on a quadrupedal robot using onboard sensors and compute.
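To make "grounding spatial constraints explicitly through object symbols and relational predicates" concrete, here is a minimal sketch of predicate-based retrieval over an object memory. It is an illustration under strong assumptions, not FARM's implementation: objects are reduced to a label and two coordinates in a fixed viewing frame, the query is assumed to be already parsed (FARM uses VLMs for that), and `below` and `left_of` are simple coordinate comparisons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str    # unique object symbol in the memory
    label: str   # open-vocabulary category, simplified to a string
    x: float     # horizontal position in an assumed fixed viewing frame
    z: float     # height


def below(a, b):
    return a.z < b.z


def left_of(a, b):
    return a.x < b.x


def retrieve(memory, target_label, constraints):
    """Return names of target-label objects satisfying every constraint.

    `constraints` is a list of (predicate, landmark_label) pairs; a
    constraint holds if it is true for at least one matching landmark.
    """
    hits = []
    for cand in memory:
        if cand.label != target_label:
            continue
        if all(any(pred(cand, lm) for lm in memory if lm.label == lbl)
               for pred, lbl in constraints):
            hits.append(cand.name)
    return hits


memory = [
    Obj("lamp_a", "lamp", x=1.0, z=0.5),
    Obj("lamp_b", "lamp", x=4.0, z=0.5),
    Obj("dartboard_1", "dartboard", x=1.2, z=1.8),
    Obj("poster_1", "poster", x=2.5, z=1.5),
]
# Parsed form of: "the lamp below the dartboard and to the left of the poster"
result = retrieve(memory, "lamp", [(below, "dartboard"), (left_of, "poster")])
print(result)
```

Both lamps match the label, but only one satisfies both relations, which is the disambiguation that object-level matching alone cannot provide.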


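2606.15491 2026-06-16 cs.RO new submission

PolyMerge: Compressing 3D Gaussian Splatting with Polytope Covers for Provably Safe Resource-Constrained Navigation

PolyMerge: 用多面体覆盖压缩3D高斯泼溅以实现可证明安全的资源受限导航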

Jihoon Hong, Chih-Yuan Chiu, Sara Fridovich-Keil, Glen Chou

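Affiliations: Georgia Institute of Technology

AI summary: Proposes PolyMerge, which converts a large 3D Gaussian Splatting model into a cover of convex polytopes guaranteed to contain every obstacle in the original model, and combines it with control barrier functions for real-time safe path planning, validated on a Crazyflie drone.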

DOI: 10.1109/LRA.2026.3692083
Journal ref: IEEE Robotics and Automation Letters, vol. 11, no. 7, pp. 8512-8519, July 2026

AI中文摘要

障碍物避免对于安全导航和运动规划至关重要。最近的辐射场重建方法能够以高保真度进行物体检测和建模，但对于机载感知路径规划而言，仍然过于消耗内存和计算资源。为了解决这些限制，我们提出PolyMerge，将场景的大规模、逼真的3D高斯泼溅（3DGS）模型转换为凸多面体的轻量级表示，这些多面体的并集可证明地过度逼近原始3DGS模型中的所有障碍物。PolyMerge调整多面体数量以权衡保守性和计算成本，并与控制障碍函数（CBF）集成以规划无碰撞路径。我们在Crazyflie无人机的仿真和硬件实验中展示了PolyMerge，该无人机在严重的机载计算约束下使用PolyMerge实时计算并跟踪安全轨迹，在保证安全的同时在速度上优于基线。有关我们的代码和视频，请访问https://athlon76.github.io/PolyMerge-website/。

英文摘要

Obstacle avoidance is essential for safe navigation and motion planning. Recent radiance field reconstruction methods enable object detection and modeling with high fidelity, but remain too memory- and compute-intensive for on-board perception-based path planning. To address these limitations, we propose PolyMerge to convert a large, photorealistic 3D Gaussian Splatting (3DGS) model of a scene into a lightweight representation of convex polytopes whose union provably over-approximates all obstacles in the original 3DGS model. PolyMerge tunes the polytope count to trade off conservativeness and compute cost, and integrates with control barrier functions (CBFs) to plan collision-free paths. We showcase PolyMerge in simulation and hardware experiments on a Crazyflie drone, which uses PolyMerge to compute and follow safe trajectories in real time under severe onboard compute constraints, outperforming baselines in speed while guaranteeing safety. For our code and videos, visit https://athlon76.github.io/PolyMerge-website/.
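The idea of trading polytope count against conservativeness while never losing coverage can be sketched with the simplest convex polytopes available, axis-aligned boxes in 2D. This is only an illustration of the over-approximation property, not the PolyMerge algorithm: each obstacle blob is assumed to be a disc bounded by a box, and pairs of boxes are merged greedily by least added area until a target count is reached. All names and numbers are made up.

```python
def box_of(cx, cy, r):
    """Axis-aligned box (x0, y0, x1, y1) bounding a disc of radius r."""
    return (cx - r, cy - r, cx + r, cy + r)


def merge(a, b):
    """Smallest axis-aligned box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def compress(boxes, target):
    """Greedily merge the pair adding the least area until len <= target."""
    boxes = list(boxes)
    while len(boxes) > max(target, 1):
        best = None
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                m = merge(boxes[i], boxes[j])
                extra = area(m) - area(boxes[i]) - area(boxes[j])
                if best is None or extra < best[0]:
                    best = (extra, i, j, m)
        _, i, j, m = best
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [m]
    return boxes


blobs = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (10.0, 10.0, 1.0), (11.0, 10.0, 0.5)]
original = [box_of(*b) for b in blobs]
cover = compress(original, target=2)
print(cover)
```

Because every merge returns a box containing both inputs, the union of the compressed set always contains the union of the original boxes; a smaller target count gives a cheaper but more conservative obstacle model.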


2606.16400 2026-06-16 cs.RO new submission

SemGeoNav: A Safety-Guided Visual Navigation Approach with Semantic Reasoning and Geometric Planning

SemGeoNav：一种结合语义推理与几何规划的安全引导视觉导航方法

Yu Liu, Zongyang Chen, Yan Guo, Chao Liu, Xianfei Pan

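Affiliations: College of Intelligence Science and Technology, National University of Defense Technology

AI summary: Proposes SemGeoNav, a hierarchical visual navigation framework that fuses the high-level semantic reasoning of end-to-end models with the reliable local planning of geometric methods, achieving robust image-based navigation with markedly better obstacle avoidance and outperforming ViNT and NoMaD on a real robot.

Comments The paper has been accepted by ICGNC 2026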

AI中文摘要

基于学习的视觉导航增强了语义目标到达能力。然而，由于其黑箱特性，纯端到端模型通常缺乏显式的几何约束，导致在开放环境中避障不可预测且不可靠。相反，传统几何规划器确保安全性，但难以处理高维视觉目标。为了解决这些限制，我们提出了SemGeoNav，一种新颖的分层视觉导航框架。它紧密集成了端到端模型的高层语义推理与基于几何方法的可靠局部规划能力，实现了鲁棒的基于图像的导航，同时显著改善了避障。此外，我们引入了一种时间轨迹平滑机制，以确保机器人运动连续稳定。我们在真实环境中的Unitree Go2四足机器人上评估了SemGeoNav。结果表明，SemGeoNav优于现有代表性方法（包括ViNT和NoMaD），实现了更高的成功率和更短的导航时间。

英文摘要

Learning-based visual navigation has enhanced semantic goal-reaching capabilities. However, due to their black-box nature, purely end-to-end models often lack explicit geometric constraints, leading to unpredictable and unreliable obstacle avoidance in open environments. Conversely, traditional geometric planners ensure safety but struggle with high-dimensional visual targets. To address these limitations, we propose SemGeoNav, a novel hierarchical visual navigation framework. It tightly integrates the high-level semantic reasoning of end-to-end models with the reliable local planning ability of geometry-based methods, achieving robust image-based navigation while significantly improving obstacle avoidance. Furthermore, we introduce a temporal trajectory smoothing mechanism to ensure continuous and stable robot motion. We evaluated SemGeoNav on a Unitree Go2 quadruped robot in real-world environments. The results demonstrate that SemGeoNav outperforms existing representative methods, including ViNT and NoMaD, achieving higher success rates and shorter navigation times.


2606.16881 2026-06-16 cs.RO new submission

MapDream: Task-Driven Map Learning for Vision-Language Navigation

MapDream: 面向视觉-语言导航的任务驱动地图学习

Guoxin Lian, Shuo Wang, Yucheng Wang, Yongcai Wang, Maiyue Chen, Kaihui Wang, Bo Zhang, Zhizhong Su, Deying Li, Zhaoxin Fan

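Affiliations: University of Science and Technology of China

AI summary: Proposes MapDream, a framework that jointly learns map construction and action prediction through autoregressive bird's-eye-view generation, achieving state-of-the-art monocular performance on R2R-CE and RxR-CE.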

AI中文摘要

视觉-语言导航（VLN）要求智能体在部分可观测的3D环境中遵循自然语言指令，这促使地图表示能够聚合超出局部感知的空间上下文。然而，现有大多数方法依赖于独立于导航策略构建的手工地图。我们认为，地图应该是由导航目标直接塑造的学习表示，而非详尽的重建。基于这一见解，我们提出MapDream，一种地图在环框架，将地图构建表述为自回归鸟瞰图（BEV）图像合成。该框架联合学习地图生成和动作预测，将环境上下文蒸馏为紧凑的三通道BEV地图，仅保留导航关键的可通行性。监督预训练引导了可靠的地图到控制接口，而自回归设计通过强化微调实现端到端联合优化。在R2R-CE和RxR-CE上的实验取得了最先进的单目性能，验证了任务驱动的生成式地图学习。

英文摘要

Vision-Language Navigation (VLN) requires agents to follow natural language instructions in partially observed 3D environments, motivating map representations that aggregate spatial context beyond local perception. However, most existing approaches rely on hand-crafted maps constructed independently of the navigation policy. We argue that maps should instead be learned representations shaped directly by navigation objectives rather than exhaustive reconstructions. Based on this insight, we propose MapDream, a map-in-the-loop framework that formulates map construction as autoregressive bird's-eye-view (BEV) image synthesis. The framework jointly learns map generation and action prediction, distilling environmental context into a compact three-channel BEV map that preserves only navigation-critical affordances. Supervised pre-training bootstraps a reliable mapping-to-control interface, while the autoregressive design enables end-to-end joint optimization through reinforcement fine-tuning. Experiments on R2R-CE and RxR-CE achieve state-of-the-art monocular performance, validating task-driven generative map learning.


2602.05608 2026-06-16 cs.RO updated version

HiCrowd: Hierarchical Crowd Flow Alignment for Dense Human Environments

HiCrowd：密集人群环境中的分层人群流对齐

Yufei Zhu, Shih-Min Yang, Martin Magnusson, Allan Wang

Affiliations: Robot Navigation and Perception Lab, AASS Research Center, Örebro University, Sweden; Miraikan – The National Museum of Emerging Science and Innovation, Japan

AI summary: Proposes HiCrowd, a hierarchical framework combining reinforcement learning with model predictive control that addresses the freezing robot problem by following crowd flows, improving navigation efficiency and safety on real-world and synthetic datasets.

Comments 2026 IEEE International Conference on Robotics and Automation (ICRA)

AI中文摘要

在密集人群中导航对移动机器人仍是一个重大挑战。关键问题是机器人冻结问题，即机器人难以找到安全运动并被困在人群中。为解决此问题，我们提出HiCrowd，一个将强化学习（RL）与模型预测控制（MPC）相结合的分层框架。HiCrowd利用周围行人运动作为引导，使机器人能够与兼容的人群流对齐。高层RL策略生成一个跟随点，使机器人与合适的人群组对齐，而低层MPC通过短视距规划安全地跟踪该引导。该方法结合了长期人群感知决策与安全短期执行。我们在离线设置（回放记录的人类轨迹）和在线设置（人类轨迹更新以在仿真中对机器人做出反应）中，将HiCrowd与反应式和基于学习的基线进行了比较。在真实世界数据集和合成人群数据集上的实验表明，我们的方法在导航效率和安全性上表现更优，同时减少了冻结行为。我们进一步通过在公共博物馆和大阪2025年世博会的实际部署验证，该方法无需重新训练即可在密集人流中导航，展现出鲁棒且具有社会意识的行为。我们的结果表明，利用人类运动作为引导，而非将人类仅视为动态障碍，为机器人在人群中安全高效导航提供了有力原则。项目代码和演示可在 https://github.com/test-bai-cpu/HiCrowd 获取。

英文摘要

Navigating through dense human crowds remains a significant challenge for mobile robots. A key issue is the freezing robot problem, where the robot struggles to find safe motions and becomes stuck within the crowd. To address this, we propose HiCrowd, a hierarchical framework that integrates reinforcement learning (RL) with model predictive control (MPC). HiCrowd leverages surrounding pedestrian motion as guidance, enabling the robot to align with compatible crowd flows. A high-level RL policy generates a follow point to align the robot with a suitable pedestrian group, while a low-level MPC safely tracks this guidance with short-horizon planning. The method combines long-term crowd-aware decision-making with safe short-term execution. We evaluate HiCrowd against reactive and learning-based baselines in an offline setting (replaying recorded human trajectories) and an online setting (human trajectories are updated to react to the robot in simulation). Experiments on a real-world dataset and a synthetic crowd dataset show that our method outperforms the baselines in navigation efficiency and safety, while reducing freezing behaviors. We further validate the method through real-world deployment in a public museum and at Expo 2025 Osaka, where it navigates dense pedestrian flows without retraining, demonstrating robust and socially aware behavior. Our results suggest that leveraging human motion as guidance, rather than treating humans solely as dynamic obstacles, provides a powerful principle for safe and efficient robot navigation in crowds. Project code and demos are available at https://github.com/test-bai-cpu/HiCrowd.


2603.16273 2026-06-16 cs.RO updated version

GenZ-LIO: Generalizable LiDAR-Inertial Odometry Beyond Confined–Open Boundaries

GenZ-LIO: 超越受限与开放边界的可泛化激光雷达-惯性里程计

Daehan Lee, Hyungtae Lim, Seongjun Kim, Soonbin Rho, Changhyeon Lee, Sanghyun Park, Junwoo Hong, Eunseon Choi, Hyunyoung Jo, Soohee Han

Affiliations: Computational Control Engineering Laboratory (CoCEL), Department of Convergence IT Engineering and Electrical Engineering, Pohang University of Science and Technology (POSTECH), Pohang 37673, South Korea; Laboratory for Information & Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA 02139, USA

AI summary: Proposes GenZ-LIO, a framework that uses scale-aware adaptive voxelization, a hybrid-metric state update, and voxel-pruned correspondence search to counter the loss of robustness and efficiency caused by transitions between confined and open spaces, maintaining stable estimation on 42 sequences.

Comments 21 pages, 12 figures

AI中文摘要

对于巡检、搜索救援和探索等现场机器人任务，激光雷达-惯性里程计（LIO）通过在GNSS拒绝或无结构环境中提供定位和建图，可作为自主性的核心组件。然而，现场部署中常见的受限与开放空间之间的过渡会导致扫描密度和局部几何结构发生显著变化，从而降低LIO的鲁棒性和计算效率。为解决这些问题，我们提出了GenZ-LIO，一个可泛化的LIO框架，旨在适应受限和开放环境中空间尺度的变化。GenZ-LIO包含三个组件：(i) 尺度感知自适应体素化，用于调节跨空间尺度变化的扫描下采样；(ii) 混合度量状态更新，用于在变化的几何结构下组合点到平面和点到点残差；(iii) 体素剪枝对应搜索，用于高效的点到点匹配。我们使用来自九个公共数据集的42个序列以及我们新收集的NarrowWide数据集进行了全面评估，以分析LIO在不同现场场景下空间尺度变化时的性能。在评估的序列中，GenZ-LIO保持稳定的里程计估计而不发散，表明在测试的现场条件下具有实际鲁棒性。源代码和收集的数据集将在发表后公开。

英文摘要

For field robotic missions such as inspection, search-and-rescue, and exploration, light detection and ranging (LiDAR)-inertial odometry (LIO) can serve as a core component of autonomy by providing localization and mapping in GNSS-denied or unstructured environments. However, transitions between confined and open spaces, which are commonly encountered in field deployments, can induce substantial changes in scan density and local geometric structure, thereby reducing the robustness and computational efficiency of LIO. To address these issues, we present GenZ-LIO, a generalizable LIO framework designed to adapt to variations in spatial scale across confined and open environments. GenZ-LIO comprises three components: (i) scale-aware adaptive voxelization for regulating scan downsampling across spatial scale changes, (ii) hybrid-metric state update for combining point-to-plane and point-to-point residuals under varying geometric structure, and (iii) voxel-pruned correspondence search for efficient point-to-point matching. We conduct a comprehensive evaluation using 42 sequences from nine public datasets and our newly collected NarrowWide dataset to analyze LIO performance under spatial scale variations across diverse field scenarios. Across the evaluated sequences, GenZ-LIO maintains stable odometry estimation without divergence, indicating practical robustness under the tested field conditions. The source code and collected dataset will be made publicly available upon publication.
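Component (i), scale-aware adaptive voxelization, can be pictured with a small sketch: choose the voxel size from how far away the scan points are, then keep one point per voxel. The abstract does not give the actual rule, so the proportional-to-median-range heuristic and the `ratio`, `lo`, and `hi` values below are assumptions made purely for illustration.

```python
from math import floor, sqrt
from statistics import median


def adaptive_voxel_size(points, ratio=0.02, lo=0.05, hi=1.0):
    """Pick a voxel edge length proportional to the median point range.

    ratio, lo, and hi are made-up tuning values for this sketch.
    """
    r = median(sqrt(x * x + y * y + z * z) for x, y, z in points)
    return min(hi, max(lo, ratio * r))


def voxel_downsample(points, size):
    """Keep the first point that falls into each voxel of edge length size."""
    kept = {}
    for p in points:
        key = tuple(floor(c / size) for c in p)
        kept.setdefault(key, p)
    return list(kept.values())


# A confined scan (points about 2 m away) and an open scan (about 40 m away).
confined = [(2.02, 0.11, 0.11), (2.03, 0.11, 0.11), (0.11, 2.02, 0.11)]
open_scan = [(40.0, 0.0, 0.0), (0.0, 40.0, 0.0), (0.0, 0.0, 40.0)]

size_confined = adaptive_voxel_size(confined)
size_open = adaptive_voxel_size(open_scan)
sparse = voxel_downsample(confined, size_confined)
print(size_confined, size_open, len(sparse))
```

A fixed voxel size tuned for open terrain would discard most of the structure in a narrow corridor, while one tuned for corridors would leave open-area scans too dense to process quickly; adapting the size per scan is one way to avoid both failure modes.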


2606.04907 2026-06-16 cs.RO updated version

WAM-Nav: Asymmetric Latent World-Action Modeling for Unified Visual Navigation

WAM-Nav：面向统一视觉导航的非对称潜在世界-动作建模

Ning Yang, Yan Huang, Kaiwen Peng, Ziheng He, Kai Wang, Cui Miao, Kailin Lyu, Guo Li, Xiaofeng Wang, Zheng Zhu, Jing Liu, Nianfeng Liu

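Affiliations: Nanjing University; Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; FiveAges; National University of Defense Technology; Tsinghua University; GigaAI

AI summary: Proposes WAM-Nav, a model that jointly learns action generation and latent visual foresight. A shared Diffusion Transformer performs asymmetric joint diffusion over long-horizon actions and short-horizon visual predictions, supported by a dual-stream contextual conditioning mechanism and a goal alignment module. A single policy covers image-goal, point-goal, and no-goal navigation. On the ClutterScenes and InternScenes benchmarks, success rates improve by 15.7% for image-goal and 3.3% for point-goal navigation, and real-world deployment reaches an average task success rate of 85%.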

AI中文摘要

视觉导航需要在复杂的几何和物理约束下生成平滑且无碰撞的轨迹。现有的反应式策略直接将观测映射到动作，缺乏预期推理能力，限制了其主动避障的能力。虽然视觉想象提供了预测性前瞻，但传统的模块化方法将场景预测与策略学习分离，常常导致误差累积和推理效率低下。为了解决这些限制，我们提出了WAM-Nav，一种用于具身视觉导航的潜在世界-动作模型，它联合学习动作生成和潜在视觉预测，从而在不影响推理效率的情况下实现更鲁棒和更具前瞻性的导航决策。具体来说，WAM-Nav利用共享的扩散Transformer进行非对称联合扩散，同时生成长时程动作和短时程视觉预测，减少了多步自回归展开中固有的推理延迟和视觉误差累积。为了进一步促进平滑且一致的轨迹生成，我们引入了一种双流上下文条件机制，将情节级别的自运动历史与顺序视觉观测相结合。结合统一的目标对齐模块，该模块在不同目标类型间保持平衡表示，WAM-Nav在单一策略下自然支持图像目标、点目标和无目标探索。在具有挑战性的ClutterScenes和InternScenes基准上的大量实验证明了WAM-Nav的强大泛化能力，特别是在图像目标和点目标导航中，成功率分别提高了15.7%和3.3%。真实世界部署进一步验证了有效的零样本模拟到现实迁移，在多样化的室内和室外环境中实现了平均85%的任务成功率。

英文摘要

Visual navigation requires generating smooth and collision-free trajectories under complex geometric and physical constraints. Existing reactive policies that directly map observations to actions lack anticipatory reasoning, limiting their ability to proactively avoid obstacles. While visual imagination offers predictive foresight, conventional modular approaches separate scene prediction from policy learning, often leading to error accumulation and inefficient inference. To address these limitations, we propose WAM-Nav, a Latent World-Action Model for embodied visual navigation that jointly learns action generation and latent visual foresight, enabling more robust and foresighted navigation decisions without compromising inference efficiency. Specifically, WAM-Nav utilizes a shared Diffusion Transformer for asymmetric joint diffusion to concurrently generate long-horizon actions and short-horizon visual foresight, reducing the inference latency and visual error accumulation inherent in multi-step autoregressive rollouts. To further encourage smooth and consistent trajectory generation, we introduce a dual-stream contextual conditioning mechanism that integrates episode-level ego-motion history with sequential visual observations. Combined with a unified goal alignment module that preserves balanced representations across goal types, WAM-Nav naturally supports Image-Goal, Point-Goal, and No-Goal exploration within a single policy. Extensive experiments on the challenging ClutterScenes and InternScenes benchmarks demonstrate strong generalization of WAM-Nav, particularly on Image-Goal and Point-Goal navigation, where it improves success rates by 15.7% and 3.3%, respectively. Real-world deployment further validates effective zero-shot sim-to-real transfer, achieving an average 85% task success rate across diverse indoor and outdoor environments.


2511.15645 2026-06-16 cs.CV cs.RO updated version

FDIO: Frequency Decomposed Inertial Odometry

FDIO：频率分解惯性里程计

Shanshan Zhang, Liqin Wu, Wenying Cao, Lingxiang Zheng, Yu Yang

Affiliations: Department of Information and Communication Engineering, National and Local Joint Engineering Research Center of Navigation and Location Based Services, Xiamen University; Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University

AI summary: Targets the coupling of IMU signals in dual-device acquisition settings with frequency decomposed inertial odometry (FDIO), which decomposes signals with a Laplacian pyramid, models low-frequency long-range motion with a Mamba module, and extracts high-frequency local features with multi-scale convolution, lowering the average absolute trajectory error by 33.3% across five datasets.

AI中文摘要

行人惯性里程计（PIO）仅利用惯性测量单元（IMU）采集的加速度和角速度测量值估计自主行人运动，使其在消费级定位应用中具有极高价值。然而，在双设备采集设置下，自由携带的移动设备收集的IMU信号本质上是复合信号，其中人体躯干的全局运动与局部肢体运动引起的扰动耦合在一起。这种耦合使得精确的人体运动建模更具挑战性。为解决这一问题，本文提出了频率分解惯性里程计（FDIO）。该方法首先使用拉普拉斯金字塔将输入IMU信号分解为低频和高频分量。然后采用Mamba模块从低频分量中建模长程运动信息，并使用多尺度卷积模块从高频分量中提取细粒度局部动态特征。在五个公开PIO数据集上的实验表明，FDIO的平均绝对轨迹误差为3.221米，平均相对轨迹误差为2.550米，与RoNIN ResNet基线相比，误差分别降低了33.3%和16.7%。这些结果验证了所提出的频率分解策略的有效性。据我们所知，这项工作是将Mamba和频率分解架构引入惯性里程计的早期尝试之一。

英文摘要

Pedestrian inertial odometry (PIO) estimates autonomous pedestrian motion using only acceleration and angular velocity measurements collected by an inertial measurement unit (IMU), making it highly valuable for consumer-level localization applications. However, under a dual-device acquisition setting, IMU signals collected by a freely carried mobile device are inherently composite signals in which the global motion of the human torso is coupled with perturbations induced by local limb motion. This coupling makes accurate human motion modeling more challenging. To address this issue, this paper proposes frequency decomposed inertial odometry (FDIO). The proposed method first decomposes input IMU signals into low-frequency and high-frequency components using a Laplacian pyramid. It then adopts a Mamba module to model long-range motion information from the low-frequency component and uses a multi-scale convolution module to extract fine-grained local dynamic features from the high-frequency component. Experiments on five public PIO datasets show that FDIO achieves an average absolute trajectory error of 3.221 m and an average relative trajectory error of 2.550 m, reducing the errors by 33.3% and 16.7%, respectively, compared with the RoNIN ResNet baseline. These results validate the effectiveness of the proposed frequency decomposition strategy. To the best of our knowledge, this work is among the first efforts to introduce Mamba and a frequency decomposition architecture into inertial odometry.
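The Laplacian pyramid split mentioned above can be shown on a one-dimensional signal. The sketch below is a single-level pyramid under assumed choices (a [1, 2, 1]/4 blur kernel, decimation by two, sample-repeat upsampling); the paper's kernel and number of levels are not stated in the abstract. The low band is the blurred, decimated signal, and the high band is whatever the upsampled low band fails to explain, so the two together reconstruct the input.

```python
from math import pi, sin


def blur(x):
    """Smooth with a [1, 2, 1] / 4 kernel, replicating the edge samples."""
    n = len(x)
    return [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def upsample(low, n):
    """Expand back to n samples by repeating each sample, then smoothing."""
    return blur([low[min(i // 2, len(low) - 1)] for i in range(n)])


def decompose(signal):
    """One pyramid level: (low-frequency band, high-frequency residual)."""
    low = blur(signal)[::2]
    high = [s - u for s, u in zip(signal, upsample(low, len(signal)))]
    return low, high


def reconstruct(low, high):
    return [u + h for u, h in zip(upsample(low, len(high)), high)]


# Slow "torso" oscillation plus fast alternating "limb" jitter (toy data).
signal = [sin(2 * pi * i / 16) + 0.1 * (-1) ** i for i in range(16)]
low, high = decompose(signal)
restored = reconstruct(low, high)
print(len(low), len(high))
```

In a pipeline like the one described, the low band would feed the long-range sequence model and the high band the local convolutional branch; the reconstruction property means no information is lost by the split itself.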


2606.14969 2026-06-16 cs.RO new submission

HATS: A Human-Agent Teleoperation System for Multi-Arm Data Collection

HATS：用于多臂数据收集的人-智能体遥操作系统

Zesen Lin, Jian-Jian Jiang, Haoming Cen, Xiao-Ming Wu, Dandan Zhang, Wei-Shi Zheng

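Affiliations: School of Computer Science and Engineering, Sun Yat-sen University; Nanyang Technological University; Imperial College London

AI summary: Proposes HATS, in which a single operator assisted by an MLLM agent controls two primary arms and two assistive arms for efficient multi-arm data collection, with performance comparable to expert two-person teams.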

AI中文摘要

许多真实世界的操作场景，例如处理复杂的协作任务和应对大工作空间，需要协调两个以上的机械臂。因此，需要一个有效的多臂遥操作系统来收集训练协调多臂操作策略的示范数据。然而，现有的遥操作框架主要关注单操作员或多操作员设置，面临着单操作员认知负荷与多操作员协调成本之间的实际权衡。为了解决这个问题，我们引入了HATS，一个人类-智能体遥操作系统，使单个人类操作员在基于MLLM的智能体辅助下，能够收集多臂操作任务的数据。我们的系统解耦了控制空间：两个主臂由人类直接遥操作，而两个辅助臂由处理子任务的无训练智能体控制。此外，人类操作员可以在执行过程中使用语音命令来防止碰撞并纠正辅助臂的行为。大量评估表明，HATS在数据收集效率和成功率上与专家双人团队相当。此外，下游策略评估证明了通过HATS收集的数据的有效性和质量。

英文摘要

Many real-world manipulation scenarios, such as handling complex collaborative tasks and dealing with large workspaces, require coordination of more than two robotic arms. Consequently, an effective multi-arm teleoperation system is required to collect demonstrations for training coordinated multi-arm manipulation policies. However, existing teleoperation frameworks mainly focus on single-operator or multi-operator setups, facing a practical trade-off between the cognitive load placed on a single operator and the coordination cost incurred by multiple operators. To address this problem, we introduce HATS, a human-agent teleoperation system that enables a single human operator, assisted by an MLLM-based agent, to collect data for multi-arm manipulation tasks. Our system decouples the control space: two primary arms are directly teleoperated by the human, while two assistive arms are controlled by a training-free agent that handles sub-tasks. In addition, the human operator can use voice commands to prevent collisions and correct assistive arm behaviors during execution. Extensive evaluations demonstrate that HATS achieves data collection efficiency and success rates comparable to expert dual-human teams. Moreover, downstream policy evaluations demonstrate the efficacy and quality of the data collected through HATS.


2606.16600 2026-06-16 cs.RO new submission

WaveSync: Constrained Wavefront Optimization for Synchronized Co-Speech Gestures in Humanoid Robots

WaveSync: 面向人形机器人同步共语手势的约束波前优化

Thang Tran Viet, Thanh Nguyen Canh, Gia Huy Uong, Phuc Van Dinh, Tan Viet Tuyen Nguyen, Xiem HoangVan, Nak Young Chong

Affiliations: University of Engineering and Technology, Vietnam National University; School of Information Science, Japan Advanced Institute of Science and Technology; School of Electronics and Computer Science, University of Southampton; Department of Robotics, Hanyang University

AI summary: Proposes WaveSync, a framework that uses a large language model to decompose semantics and build an importance wave, generates gesture trajectories with Dynamic Movement Primitives, and applies wavefront optimization to synchronize gesture and speech peaks while satisfying kinematic constraints, outperforming baselines across five dialogue scenarios.

AI中文摘要

富有表现力的共语手势对于自然的人机交互至关重要，但在物理人形机器人上生成这些手势非常困难，因为手势动作必须与语音重点对齐，同时满足严格的运动学和动力学约束。与虚拟化身不同，人形机器人无法自由执行快速或重叠的运动，这使得单词级别的同步和硬件安全的运动规划成为一个耦合问题。我们提出了WaveSync，一个混合框架，其中大语言模型将对话响应分解为结构化的语义模式，并为每个单词分配重要性权重，构建连续的语义重要性波。手势轨迹通过动态运动基元进行塑造，在增强表现力的同时确保运动学可行性。波前优化阶段实现手势与语音的峰值到峰值同步，并通过手势持续时间压缩和前向传播解决剩余的运动学违规。基于五个对话场景的实验评估表明，我们的方法实现了高同步精度，并在客观和主观评估中优于三个基线。WaveSync中的每个组件在生成富有表现力、语义基础且符合运动学要求的手势中都发挥了必要作用。代码、资源和视频可在 https://github.com/pairs-lab/WaveSync 获取。

英文摘要

Expressive co-speech gestures are crucial for natural human-robot interaction, but generating them on physical humanoid robots is difficult because gesture strokes must align with speech emphasis while satisfying strict kinematic and dynamic constraints. Unlike virtual avatars, humanoid robots cannot freely execute rapid or overlapping motions, making word-level synchronization and hardware-safe motion planning a coupled problem. We present WaveSync, a hybrid framework in which a Large Language Model decomposes dialogue responses into structured semantic schemas and assigns per-word importance weights, constructing a continuous Semantic Importance Wave. Gesture trajectories are shaped through Dynamic Movement Primitives, enforcing kinematic feasibility while enhancing expressiveness. A Wavefront Optimization stage aligns peak-to-peak gesture-speech synchronization and resolves residual kinematic violations through gesture-duration compression and forward propagation. Experimental evaluation based on five dialogue scenarios shows that our method achieves high synchronization accuracy and outperforms three baselines in both objective and subjective evaluations. Each component in WaveSync plays a necessary role in producing gestures that are expressive, semantically grounded, and kinematically compliant. The code, resources, and videos are available at https://github.com/pairs-lab/WaveSync.
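One way to picture a continuous importance wave built from per-word weights, and the peak alignment that follows, is sketched below. The abstract does not specify how the wave is constructed, so the Gaussian bump per word, the `width` parameter, the grid search for the peak, and the example timings and weights are all assumptions for illustration only; kinematic constraints are not modeled.

```python
from math import exp


def importance_wave(words, t, width=0.15):
    """Sum of Gaussian bumps, one per word: words = [(mid_time_s, weight)]."""
    return sum(w * exp(-((t - c) ** 2) / (2 * width ** 2)) for c, w in words)


def peak_time(words, duration, dt=0.01):
    """Time of the highest point of the wave on a regular grid."""
    times = [i * dt for i in range(int(round(duration / dt)) + 1)]
    return max(times, key=lambda t: importance_wave(words, t))


def gesture_start(stroke_offset, words, duration):
    """Start time that places the gesture stroke on the wave peak.

    stroke_offset is the delay from gesture onset to its stroke; the
    result is clamped so the gesture never starts before the utterance.
    """
    return max(0.0, peak_time(words, duration) - stroke_offset)


# Hypothetical utterance: four words with LLM-style importance weights.
words = [(0.2, 0.1), (0.6, 0.2), (1.0, 1.0), (1.5, 0.3)]
peak = peak_time(words, duration=2.0)
start = gesture_start(0.3, words, duration=2.0)
print(peak, start)
```

On a real robot the start time found this way may be infeasible given joint velocity limits, which is where the duration compression and forward propagation steps described in the abstract would come in.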


2602.02773 2026-06-16 cs.RO updated version

Bimanual High-Density EMG Control for In-Home Mobile Manipulation by Users with Quadriplegia

用于四肢瘫痪用户居家移动操作的双侧高密度肌电控制

Jehan Yang, Eleanor Hodgson, Cindy Sun, Zackory Erickson, Doug Weber

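Affiliations: University of California, Berkeley; University of Washington

AI summary: For users with cervical spinal cord injury, proposes bimanual high-density EMG (HDEMG) forearm sleeves combined with a shared autonomy framework for real-time gesture-based control of a mobile manipulator, with effectiveness on daily tasks validated in a twelve-day in-home study.

Comments 17 pages, 20 figures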

AI中文摘要

家中的移动操作器可以使颈椎脊髓损伤（cSCI）患者执行他们原本无法自行完成的日常家务任务。然而，这些用户的瘫痪通常限制了他们对传统机器人控制界面（如操纵杆或键盘）的访问。在这项工作中，我们介绍并部署了首个系统，该系统使四肢瘫痪用户能够利用来自身体瘫痪部位的运动意图，通过双侧高密度肌电（HDEMG）控制移动操作器。我们开发了一对定制的、织物集成的HDEMG前臂袖套，佩戴在双臂上，捕获来自临床瘫痪自由度的残余神经肌肉活动，并支持基于手势的实时机器人控制。我们在（n=2）名cSCI用户中实现了基于运动意图的高分类准确率，最高达到98.0%。其次，通过集成视觉、语言和运动规划模块，我们引入了一个共享自主框架，支持稳健且用户驱动的遥操作，特别有利于家庭环境中导航密集型任务。最后，为了在真实环境中演示该系统，我们进行了一项为期12天的居家用户研究，评估可穿戴EMG界面在每日机器人控制中的实时使用。这些系统组件共同实现了在真实家庭环境中执行日常生活活动（ADL）和其他家务任务的有效机器人控制。

英文摘要

Mobile manipulators in the home can enable people with cervical spinal cord injury (cSCI) to perform daily physical household tasks that they could not otherwise do themselves. However, paralysis in these users often limits access to traditional robot control interfaces such as joysticks or keyboards. In this work, we introduce and deploy the first system that enables a user with quadriplegia to control a mobile manipulator using intent from paralyzed parts of their body, using bimanual high-density electromyography (HDEMG). We develop a pair of custom, fabric-integrated HDEMG forearm sleeves, worn on both arms, that capture residual neuromotor activity from clinically paralyzed degrees of freedom and support real-time gesture-based robot control. We achieve high classification accuracies based on motor intent across (n = 2) users with cSCI, achieving up to 98.0%. Second, by integrating vision, language, and motion planning modules, we introduce a shared autonomy framework that supports robust and user-driven teleoperation, with particular benefits for navigation-intensive tasks in home environments. Finally, to demonstrate the system in the wild, we present a twelve-day in-home user study evaluating real-time use of the wearable EMG interface for daily robot control. Together, these system components enable effective robot control for performing activities of daily living (ADLs) and other household tasks in a real home environment.


2510.07063 2026-06-16 cs.HC cs.RO updated version

Artists' Views on Robotics Involvement in Painting Productions

艺术家对机器人参与绘画创作的看法

Francesca Cocchella, Nilay Roy Choudhury, Eric Chen, Patrícia Alves-Oliveira

发表机构 * CONTACT Unit, Italian Institute of Technology（意大利理工学院CONTACT研究组）； University of Michigan（密歇根大学）

AI总结通过八位抽象艺术家与机器人合作绘画的实证研究，发现人机协作更富趣味性和反思性，提供更大自主性，并激发克服系统限制的新策略。

Comments 10 pages, 9 figures, submitted to RAM special issue: Arts and Robotics

2606.15021 2026-06-16 cs.RO 新提交

Steering Autoregressive Vision-Language-Action Policies via Action Token Intervention

通过动作令牌干预引导自回归视觉-语言-动作策略

Jason Chan, Jonathan C. Kao

发表机构 * University of California, Los Angeles（加州大学洛杉矶分校）

AI总结提出Token Steering方法，在推理时通过干预动作令牌空间动态引导VLA模型轨迹生成，无需训练或微调，显著提升家务操作任务成功率。

Comments 9 pages, 5 figures

AI中文摘要

我们提出Token Steering (TS)，一种通过直接干预动作令牌空间来动态引导自回归视觉-语言-动作（VLA）模型生成轨迹的方法。TS将低维用户输入注入模型的原生动作令牌表示，允许用户在无需修改底层视觉-语言模型（VLM）架构的情况下影响轨迹生成。由于TS完全在推理时运行，因此不需要额外的训练或微调。用户输入引导而非覆盖预训练策略，允许用户影响机器人动作，同时保留VLA学习的灵巧性、平滑性和任务先验。我们在两个家务操作任务——物体放置后关闭抽屉和状态感知物体交换——上评估TS，成功率分别从10.0%提高到72.5%，从16.7%提高到93.8%。通过实现对机器人基础模型的轻量级、直观引导，我们的界面有潜力改善消费环境中的交互，并拓宽对有限身体控制个体的可及性。项目网站：https://jasontchan.github.io/token-steering/ 。

英文摘要

We present Token Steering (TS), a method for dynamically steering trajectories generated by an autoregressive vision-language-action (VLA) model through direct intervention in the action-token space. TS injects low-dimensional user inputs into the model's native action-token representation, allowing users to influence trajectory generation without modifying the underlying vision-language model (VLM) architecture. Because TS operates entirely at inference time, it requires no additional training or finetuning. User inputs guide rather than override the pretrained policy, allowing users to influence robot actions while preserving the dexterity, smoothness, and task priors learned by the VLA. We evaluate TS on two household manipulation tasks -- drawer closing after object placement and state-aware object swapping -- and improve success rates from 10.0% to 72.5% and from 16.7% to 93.8%, respectively. By enabling lightweight, intuitive steering over robot foundation models, our interface has the potential to improve human-robot interaction in consumer environments and broaden accessibility for individuals with limited physical control. Project website: https://jasontchan.github.io/token-steering/ .

2606.15285 2026-06-16 cs.RO 新提交

Acting While Understanding: Asynchronous Semantic-Action Decoupling for Real-Time Vision-Language-Action Models

理解中行动：面向实时视觉-语言-动作模型的异步语义-动作解耦

Shenhao Yan, Ge Wang, Qi Liu, Weilin Meng, Jiahao Yang, Chengsi Yao, Fan Feng, Xiaoguang Ma, Yiming Zhao, Yatong Han

发表机构 * Northeastern University（东北大学）； Ising AI ； CUHK-Shenzhen（香港中文大学（深圳））

AI总结提出异步语义-动作解耦框架，分离VLAs中的语义理解与动作生成，通过低频率语义更新和高频率动作推理实现高频闭环控制，在LIBERO和真实机器人上验证了高达35.6Hz的动作模块吞吐率。

AI中文摘要

视觉-语言-动作模型（VLAs）在机器人操作中展现出强大的任务理解和泛化能力，但全模型推理的高计算成本限制了其在低延迟、高频率闭环控制中的部署。我们提出一种异步语义-动作解耦框架，该框架沿现有VLAs的内部语义-动作接口分离语义理解与动作生成，无需重新设计视觉-语言骨干网络或引入外部规划器。低频理解模块异步更新可复用的语义条件，而高频动作模块持续输出控制动作，无需重复调用完整模型。为缓解陈旧语义与当前执行状态之间的时间不匹配，我们进一步引入历史动作条件化和时间错位训练，提供短时域执行上下文，并在陈旧语义条件下提高反馈控制鲁棒性。在LIBERO上使用$π_{0.5}$和UniVLA进行的实验，以及使用UniVLA的真实机器人部署表明，所提框架实现了高达35.6 Hz的服务端动作模块推理吞吐率，并提供了一条低侵入性路径，无需以控制速率运行完整VLA推理即可实现高频闭环控制。

英文摘要

Vision-Language-Action models (VLAs) have demonstrated strong task understanding and generalization in robotic manipulation, yet the high computational cost of full-model inference limits their deployment in low-latency, high-frequency closed-loop control. We propose an asynchronous semantic-action decoupling framework that separates semantic understanding from action generation along the internal semantic-action interface of existing VLAs, without redesigning the vision-language backbone or introducing an external planner. A low-frequency understanding module asynchronously updates reusable semantic conditions, while a high-frequency action module continuously outputs control actions without repeatedly invoking the full model. To mitigate the temporal mismatch between stale semantics and the current execution state, we further introduce historical action conditioning and time-misalignment training, which provide short-horizon execution context and improve feedback control robustness under stale semantic conditions. Experiments on LIBERO with $π_{0.5}$ and UniVLA, together with real-robot deployment using UniVLA, show that the proposed framework achieves up to 35.6 Hz server-side action-module inference throughput and offers a low-intrusion path to high-frequency closed-loop control without running full VLA inference at control rate.
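
The two-rate pattern described in this abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `slow_understand`, `fast_act`, `semantic_period`, and `history_len` are hypothetical names, and the asynchronous update is simulated synchronously by refreshing the semantic condition only every few control steps.

```python
from collections import deque


def slow_understand(observation):
    """Hypothetical stand-in for the expensive, low-frequency semantic module."""
    return {"goal": observation * 2.0}


def fast_act(semantic, observation, history):
    """Hypothetical stand-in for the cheap, high-frequency action module.

    It conditions on the (possibly stale) semantic condition, the current
    observation, and a short window of past actions.
    """
    previous = history[-1] if history else 0.0
    return 0.5 * (semantic["goal"] - observation) + 0.1 * previous


def run(observations, semantic_period=4, history_len=3):
    history = deque(maxlen=history_len)  # short-horizon action context
    semantic = None
    semantic_calls = 0
    actions = []
    for step, obs in enumerate(observations):
        # Refresh the reusable semantic condition only every `semantic_period` steps.
        if step % semantic_period == 0:
            semantic = slow_understand(obs)
            semantic_calls += 1
        # The action module runs at every control step without the full model.
        action = fast_act(semantic, obs, history)
        history.append(action)
        actions.append(action)
    return actions, semantic_calls


actions, calls = run([float(i) for i in range(12)])
print(len(actions), calls)  # prints: 12 3
```

In this sketch the action module runs twelve times while the semantic module runs only three times. A real system would presumably run the two modules in separate threads or processes.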

2606.15631 2026-06-16 cs.RO cs.AI 新提交

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

检索，不重新训练：在测试时将视觉语言动作模型扩展到新任务

Jeongeun Park, Juhan Park, Taekyung Kim, Sungjoon Choi, Dongyoon Han, Sangdoo Yun

发表机构 * NAVER AI Lab（NAVER AI实验室）； Korea University（高丽大学）

AI总结提出检索增强策略，通过一次训练冻结模型，部署时仅添加检索数据即可适应新任务，无需逐任务微调，在跨本体泛化中优于基线。

Comments https://recap-robot.github.io/

AI中文摘要

将视觉-语言-动作（VLA）策略扩展到新任务通常需要特定任务的遥操作演示和逐任务微调，这使得适应在数据收集和计算方面成本高昂。在本文中，我们表明这种目标侧的逐任务适应成本可以被检索所取代。我们的检索增强策略在目标本体（查询）和更廉价的本体（池，例如人手视频）的配对演示上训练一次，然后冻结。新任务在部署时通过将池侧演示追加到检索池来添加。冻结策略在每个控制步骤都以检索到的轨迹为条件，因此新任务是通过索引数据而非更新参数来吸收的。只有在面对新的、未见过的本体时才需要微调，而不是每个新任务都需要。我们表明，检索带来的改进并不局限于某一特定骨干网络，对标准VLA策略同样有效，但其效果在基于视频生成的世界动作模型（WAM）Cosmos Policy中尤为显著。在这种设置中，检索提供了粗粒度的任务进展，而WAM的未来图像目标提供了额外的视觉一致性信号，从而增强了以检索为条件的动作。在PushT上，我们研究了检索如何为跨本体泛化到未见目标角度提供可复用的高层运动先验；在RoboTwin 2.0上，我们的方法在未见任务上优于跨本体基线；我们还在真实机器人上演示了该方法。

英文摘要

Extending a vision-language-action (VLA) policy to a new task typically requires task-specific teleoperated demonstrations and per-task fine-tuning, making adaptation costly in both data collection and compute. In this paper, we show that this target-side per-task adaptation cost can be replaced by retrieval. Our retrieval-augmented policy is trained once on paired demonstrations from the target embodiment (query) and a cheaper embodiment (pool, e.g., human-hand video), then frozen. New tasks are added at deployment by appending pool-side demonstrations to a retrieval pool. The frozen policy conditions on retrieved trajectories at every control step, so new tasks are absorbed by indexing data rather than updating parameters. Fine-tuning is needed only to take on a new, unseen embodiment, not for each new task. We show that retrieval improves policies beyond a specific backbone, including standard VLA policies, but its effect is especially pronounced in Cosmos Policy, a video-generation-based world-action model (WAM). In this setting, retrieval supplies coarse task progression, while the WAM's future-image objective provides an additional visual consistency signal that strengthens the retrieval-conditioned actions. On PushT, we study how retrieval provides a reusable high-level motion prior for cross-embodiment generalization to unseen goal angles, while on RoboTwin 2.0 our method outperforms cross-embodiment baselines on unseen tasks, and we additionally demonstrate the method on a real robot.
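
The idea of absorbing new tasks by indexing data rather than updating parameters can be sketched as follows. This is an illustrative toy, not the paper's method: `RetrievalPool`, `frozen_policy`, the 2-D embeddings, and the nearest-neighbour rule are all assumptions chosen to keep the example self-contained.

```python
import math


class RetrievalPool:
    """Hypothetical pool of (embedding, trajectory, task) demonstrations."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, trajectory, task):
        # Adding a task only appends data; no model parameters change.
        self.entries.append((embedding, trajectory, task))

    def retrieve(self, query):
        # Nearest neighbour by Euclidean distance (an assumed retrieval rule).
        return min(self.entries, key=lambda entry: math.dist(entry[0], query))


def frozen_policy(observation, retrieved_trajectory, step):
    """Hypothetical frozen policy: move toward the retrieved waypoint for this step."""
    index = min(step, len(retrieved_trajectory) - 1)
    target = retrieved_trajectory[index]
    return [t - o for t, o in zip(target, observation)]


pool = RetrievalPool()
pool.add([0.0, 1.0], [[0.0, 0.0], [1.0, 0.0]], "push")

# A new task arrives at deployment: append one demonstration, nothing is retrained.
pool.add([1.0, 0.0], [[0.0, 0.0], [0.0, 1.0]], "stack")

_, trajectory, task = pool.retrieve([0.9, 0.1])
action = frozen_policy([0.0, 0.0], trajectory, step=1)
print(task, action)
```

The policy function is never modified between the two `add` calls; only the pool grows, which is the property the abstract emphasizes.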

2606.15768 2026-06-16 cs.RO cs.AI 新提交

少思考，早行动：视觉-语言-动作模型中带早退的强化潜在推理

Dianqiao Lei, Lianlei Shan

AI总结提出AVA-VLA框架，通过强化学习去噪和早退策略优化潜在推理轨迹，在LIBERO上实现6倍推理加速和98.3%平均成功率。

Comments Accepted at ICML 2026

AI中文摘要

现有的视觉-语言-动作（VLA）模型主要依赖显式的思维链（CoT）推理来桥接感知和动作。虽然有效，但这种范式在多步骤任务中面临高计算成本和错误传播的问题。在本文中，我们提出了自适应变量对齐VLA（AVA-VLA），一种新颖的潜在推理VLA框架，将推理建模为一系列不可观测的潜在变量，绕过了显式文本生成的需求。然而，潜在轨迹本质上容易受到噪声干扰，并且容易与下游目标不对齐。为了解决这个问题，我们引入了一种基于强化学习的去噪机制，将潜在状态生成视为一个顺序决策过程，通过任务级奖励优化推理轨迹。此外，我们结合了一种早退策略，根据状态置信度自适应地终止推理，实现了深度和效率之间的动态权衡。在具身决策基准上的大量实验表明，AVA-VLA相比显式CoT方法实现了6倍的推理加速，同时在LIBERO上达到了98.3%的平均成功率，在效率和长时域稳定性上均优于全推理基线。

英文摘要

Existing Vision-Language-Action (VLA) models predominantly rely on explicit Chain-of-Thought (CoT) reasoning to bridge perception and action. While effective, this paradigm suffers from high computational costs and error propagation in multi-step tasks. In this paper, we propose Adaptive Variable Alignment VLA (AVA-VLA), a novel Latent Reasoning VLA framework that models reasoning as a sequence of unobservable latent variables, bypassing the need for explicit text generation. However, latent trajectories are inherently susceptible to noise interference and misalignment with downstream objectives. To address this, we introduce a Reinforcement Learning-based Denoising mechanism that treats latent state generation as a sequential decision process, optimizing reasoning trajectories via task-level rewards. Furthermore, we incorporate an Early-Exit Strategy that adaptively terminates reasoning based on state confidence, enabling a dynamic trade-off between depth and efficiency. Extensive experiments on embodied decision benchmarks demonstrate that AVA-VLA achieves a 6x inference speedup over explicit CoT methods while attaining a 98.3% average success rate on LIBERO, improving both efficiency and long-horizon stability over full-reasoning baselines.
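
The confidence-based early exit mentioned in this abstract can be sketched as a bounded refinement loop. The sketch covers only the early-exit part, not the reinforcement-learning denoising. The function names, the scalar "latent state", and the threshold are hypothetical; the paper's actual confidence estimate is not described here.

```python
def reason_with_early_exit(step_fn, confidence_fn, state, max_steps=8, threshold=0.9):
    """Refine a latent state until confidence passes `threshold` or steps run out."""
    used = 0
    for _ in range(max_steps):
        state = step_fn(state)
        used += 1
        if confidence_fn(state) >= threshold:
            break  # confident enough: skip the remaining reasoning steps
    return state, used


def refine(state):
    # Toy latent update: each step halves the distance to 1.0.
    return state + 0.5 * (1.0 - state)


def confidence(state):
    # Toy confidence: the state value itself.
    return state


state, used = reason_with_early_exit(refine, confidence, 0.0)
print(state, used)  # prints: 0.9375 4
```

Raising the threshold trades more reasoning steps for higher confidence, which mirrors the depth-versus-efficiency trade-off the abstract describes.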

2606.15714 2026-06-16 cs.CL cs.RO 交叉投稿

Act on What You See: 在视觉-语言-动作模型中解锁安全社交导航

Qingzi Wang, Xiyang Wu, Guangyao Shi, Dianwei Chen, Xianfeng Yang, Dinesh Manocha

发表机构 * University of Maryland（马里兰大学）； University of Southern California（南加州大学）

AI总结提出SALSA框架，通过两阶段无标注后训练（社交行为对齐和时间安全对齐），使预训练VLA模型利用已有表征实现安全社交导航，减少86.4%的近距离碰撞。

AI中文摘要

安全社交导航要求机器人区分行人与普通障碍物，并在危险迫近前做出反应。我们表明，预训练的视觉-语言-动作（VLA）模型已在其内部表征中编码了行人-物体区分和未来碰撞信号，但行为克隆未能将这些信号转化为社交上合适的动作。为解决这一不匹配问题，我们提出SALSA，一个两阶段无标注后训练框架：（1）社交行为对齐将中间层社交特征桥接到动作头，并在反事实人-物场景对上训练以打破视觉显著性捷径；（2）时间安全对齐提供自动生成的未来风险监督，实现预期性碰撞避免。在SCAND和实际部署中，SALSA将近距离碰撞减少86.4%，并将社交反事实准确率从53%提升至93%，表明通过教导VLA策略利用其已拥有的表征来行动，可以实现更安全的社交导航。这些结果表明，通过更好地对齐潜在表征与动作生成，预训练VLA策略可被调整用于更安全的社交导航。

英文摘要

Safe social navigation requires robots to distinguish people from ordinary obstacles and to react before danger becomes imminent. We show that pretrained Vision-Language-Action (VLA) models already encode pedestrian-object distinctions and future collision signals in their internal representations, but behavior cloning fails to translate these signals into socially appropriate actions. To address this mismatch, we propose SALSA, a two-stage annotation-free post-training framework: (1) social behavioral alignment bridges intermediate-layer social features to the action head and trains on counterfactual human-object scene pairs to break visual saliency shortcuts; (2) temporal safety alignment provides automatically generated future-risk supervision to enable anticipatory collision avoidance. On SCAND and real-world deployment, SALSA reduces near-collisions by 86.4% and improves social counterfactual accuracy from 53% to 93%, demonstrating that safer social navigation can be achieved by teaching VLA policies to act on representations they already possess. These results show that pretrained VLA policies can be adapted for safer social navigation by better aligning their latent representations with action generation.

2511.18960 2026-06-16 cs.LG cs.CV cs.RO 版本更新

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

AVA-VLA: 通过主动视觉注意力改进视觉-语言-动作模型

Lei Xiao, Jifeng Li, Juntao Gao, Feiyang Ye, Yan Jin, Jingjing Qian, Jing Zhang, Yong Wu, Xiaoyuan Yu

发表机构 * LiAuto Inc.（理想汽车）； Beijing University of Technology（北京工业大学）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结针对VLA模型忽视历史信息的问题，提出AVA-VLA框架，利用循环状态近似信念并引入主动视觉注意力动态重加权视觉令牌，在LIBERO和CALVIN等基准上取得最优性能。

Comments Accepted at CVPR 2026 (Highlight)

AI中文摘要

视觉-语言-动作（VLA）模型最近在具身任务中取得了显著进展，但大多数方法在每个时间步独立处理视觉观察。这种历史无关的设计将机器人操作视为马尔可夫决策过程，而现实中的机器人控制本质上是部分可观测的，需要对过去的交互进行推理。为了解决这一不匹配，我们从部分可观测马尔可夫决策过程的角度重新表述VLA策略学习，并提出AVA-VLA，一种以循环状态为条件生成动作的框架，该状态作为智能体对任务历史信念的神经近似。基于此循环状态，我们引入了主动视觉注意力（AVA），它动态地重新加权当前观测中的视觉令牌，以关注在给定指令和执行历史的情况下最相关的区域。大量实验表明，AVA-VLA在标准机器人基准测试（包括LIBERO和CALVIN）上达到了最先进的性能，并有效迁移到真实世界的双臂操作任务。这些结果证明了具备时序依据的主动视觉处理在改善机器人序列决策中VLA性能方面的有效性。项目页面：https://liauto-dsr.github.io/AVA-VLA-Page 。

英文摘要

Vision-Language-Action (VLA) models have shown remarkable progress in embodied tasks recently, but most methods process visual observations independently at each timestep. This history-agnostic design treats robot manipulation as a Markov Decision Process, even though real-world robotic control is inherently partially observable and requires reasoning over past interactions. To address this mismatch, we reformulate VLA policy learning from a Partially Observable Markov Decision Process perspective and propose AVA-VLA, a framework that conditions action generation on a recurrent state that serves as a neural approximation to the agent's belief over task history. Built on this recurrent state, we introduce Active Visual Attention (AVA), which dynamically reweights visual tokens in the current observation to focus on regions most relevant given both the instruction and execution history. Extensive experiments show that AVA-VLA achieves state-of-the-art performance on standard robotic benchmarks, including LIBERO and CALVIN, and transfers effectively to real-world dual-arm manipulation tasks. These results demonstrate the effectiveness of temporally grounded active visual processing for improving VLA performance in robotic sequential decision-making. The project page is available at https://liauto-dsr.github.io/AVA-VLA-Page.

2606.13578 2026-06-16 cs.CL cs.AI cs.LG cs.MM cs.RO 版本更新

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

LabVLA：在科学实验室中落地视觉-语言-动作模型

Baochang Ren, Xinjie Liu, Xi Chen, Yanshuo Liu, Chenxi Li, Daqi Gao, Zeqin Su, Jintao Xing, Zirui Xue, Rui Li, Xiangyu Zhao, Shuofei Qiao, Minting Pan, Wangmeng Zuo, Lei Bai, Dongzhan Zhou, Ningyu Zhang, Huajun Chen

发表机构 * Zhejiang University（浙江大学）； Shanghai AI Laboratory（上海人工智能实验室）； Harbin Institute of Technology（哈尔滨工业大学）

AI总结针对科学实验室中机器人执行协议面临的数据和实体瓶颈，提出模拟数据引擎RoboGenesis和两阶段训练策略LabVLA，在LabUtopia基准上取得最高平均成功率。

Comments Work in progress. Project website at https://zjunlp.github.io/LabVLA/

AI中文摘要

科学实验室越来越依赖AI系统来推理实验，但物理实验操作仍超出其能力范围。AI可以帮助阅读文献、生成假设和规划协议，但实验台前的协议执行仍需人类操作员。视觉-语言-动作（VLA）模型为书面协议与机器人执行之间提供了一种可能的接口，但现有策略主要在家庭和桌面演示上训练，很少遇到科学实验室中的仪器、透明液体或固定协议工作流。弥补这一差距需要实验室特定的监督和统一的学习框架，以适应执行实验协议所使用的不同机器人实体。因此，我们将数据和实体视为与模型设计并列的核心瓶颈。为解决数据方面的问题，我们构建了RoboGenesis，这是一个基于模拟的工作流和数据引擎，能够从原子技能组合配置的实验室工作流，验证和过滤 rollout，并跨支持的机器人配置文件导出结构化演示。在策略方面，我们提出了LabVLA，采用两阶段训练方案：首先进行FAST动作标记预训练，使Qwen3-VL-4B-Instruct骨干网络在学习任何连续控制之前具备动作意识；然后进行流匹配后训练，在知识隔离下附加一个DiT动作专家。在LabUtopia基准上，LabVLA在分布内和分布外设置下均达到了所有评估基线中最高的平均成功率。

英文摘要

Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan protocols, yet the execution of those protocols at the bench still requires a human operator. Vision-Language-Action (VLA) models provide one possible interface between written protocols and robot execution, but existing policies are trained mostly on household and tabletop demonstrations and rarely encounter the instruments, transparent liquids, or fixed protocol workflows found in scientific laboratories. Closing this gap requires both laboratory-specific supervision and a unified learning framework that can accommodate the diverse robot embodiments used to execute experimental protocols. We therefore identify data and embodiment as central bottlenecks alongside model design. To address the data side, we build RoboGenesis, a simulation-based workflow and data engine that composes configured laboratory workflows from atomic skills, validates and filters rollouts, and exports structured demonstrations across supported robot profiles. On the policy side, we present LabVLA, trained with a two-stage recipe: FAST action token pretraining first makes the Qwen3-VL-4B-Instruct backbone action aware before any continuous control is learned, and flow matching posttraining then attaches a DiT action expert under knowledge insulation. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among all evaluated baselines under both in-distribution and out-of-distribution settings.

2606.14882 2026-06-16 cs.RO 新提交

DynaHMRC: Decentralized Heterogeneous Multi-Robot Collaboration for Dynamic Tasks with Large Language Models

DynaHMRC: 基于大语言模型的动态任务去中心化异构多机器人协作

Wenhao Yu, Yu'ang Xie, Yifan Duan, Jie Peng, Guanting Ye, Ka-Veng Yuen, Yanyong Zhang, Jianmin Ji

发表机构 * University of Science and Technology of China (USTC)（中国科学技术大学）； University of Macau (UM)（澳门大学）

AI总结提出DynaHMRC去中心化框架，每个机器人作为角色感知的LLM智能体，通过四阶段闭环流程（自我描述、任务分配与领导竞标、领导者选举、反思执行）实现动态异构多机器人协作，并构建基准测试验证其高效性和可扩展性。

AI中文摘要

大型语言模型（LLMs）为机器人提供了更丰富的任务理解和适应性，使其在协调长期任务中的异构多机器人系统方面具有前景。尽管有这种潜力，但仍存在几个挑战尚未充分探索：（1）集中式LLM调度器随着团队规模和环境复杂性的增加而扩展性差。单个模型必须处理过多的上下文信息，长上下文近似可能降低推理质量；（2）现有任务公式未能充分考虑动态设置，而对不断变化的任务条件的鲁棒适应对于实际部署至关重要；（3）领域特定数据稀缺限制了专门的机器人推理，使得专有通用模型在专家任务上效率低下。为了解决这些限制，我们提出了DynaHMRC，一个去中心化框架，其中每个机器人充当角色感知的LLM智能体。这种设计减轻了单模型上下文瓶颈，并支持跨异构团队配置的灵活协作。DynaHMRC将协作组织为四阶段闭环过程：自我描述、带有领导竞标的任务分配、领导者选举和反思执行，由可执行的机器人接口支持。我们进一步开发了一个基准测试，涵盖三个任务族、四种动态变化和六种团队配置，以系统研究动态任务建模。此外，我们进行了实证分析，以指导领域特定专家数据集的构建，并微调预训练LLM以提高专业能力。实验表明，DynaHMRC在更少的动作和通信步骤下实现了比强基线更高的成功率，同时在评估的设置中随着团队规模的增长显示出有希望的可扩展性趋势。

英文摘要

Large language models (LLMs) provide robots with richer task understanding and adaptability, making them promising for coordinating heterogeneous multi-robot systems in long-horizon tasks. Despite this potential, several challenges remain underexplored: (1) Centralized LLM schedulers scale poorly as team size and environmental complexity increase. A single model must process excessive contextual information, and long-context approximation may degrade reasoning quality; (2) Existing task formulations insufficiently consider dynamic settings, while robust adaptation to evolving task conditions is essential for real-world deployment; (3) Domain-specific data scarcity limits specialized robotic reasoning, making proprietary general-purpose models inefficient for expert tasks. To address these limitations, we propose DynaHMRC, a decentralized framework in which each robot acts as a role-aware LLM agent. This design mitigates the single-model context bottleneck and supports flexible collaboration across heterogeneous team configurations. DynaHMRC organizes collaboration as a four-stage closed-loop process: self-description, task allocation with leadership bidding, leader election, and reflective execution, supported by executable robot interfaces. We further develop a benchmark covering three task families, four dynamic variations, and six team configurations to systematically study dynamic task modeling. In addition, we conduct an empirical analysis to guide the construction of domain-specific expert datasets and fine-tune pretrained LLMs to improve specialized competence. Experiments show that DynaHMRC achieves higher success rates than strong baselines with fewer action and communication steps, while demonstrating promising scalability trends as team size grows within the evaluated settings.

2606.15255 2026-06-16 cs.RO 新提交

OSDAG: Online Scheduling for Efficient Multi-Robot Collaboration

OSDAG: 面向高效多机器人协作的在线调度

Thanh Nguyen Canh, Thang Tran Viet, Phuc Van Dinh, Xiem HoangVan, Nak Young Chong

发表机构 * Japan Advanced Institute of Science and Technology（日本北陆先端科学技术大学院大学）； University of Engineering and Technology, Vietnam National University（越南国立大学工程技术大学）； Hanyang University（汉阳大学）

AI总结提出OSDAG框架，结合LLM任务推理与DAG在线调度，通过一次性分解指令为依赖图并实时分配任务，相比对话式方法推理速度提升5-15倍，完工时间（makespan）相比顺序基线最多缩短38%。

AI中文摘要

协调异构多机器人系统（MRS）完成复杂、长周期任务需要灵活的高层推理和高效的低层调度。现有的基于LLM的方法解决了推理方面，但引入了两个关键瓶颈：（1）执行过程中重复的LLM推理，随着智能体数量增加而增加延迟；（2）离线、预提交的调度，即使存在独立工作，也会迫使机器人等待顺序排列的前驱任务而闲置。本文提出了OSDAG，一种新颖的框架，将基于LLM的任务推理与有向无环图（DAG）表示和约束感知的在线调度相结合。LLM被调用一次，将自然语言指令分解为带有依赖注释的任务图，然后轻量级在线调度器实时将就绪任务分配给空闲智能体。DAG表示编码了前驱和资源约束，确保正确性同时暴露所有可用的并行性。在五个基准场景上的实验表明，与基于对话的方法相比，OSDAG的推理时间快5-15倍，与顺序基线相比，完成时间最多减少38%，并保持有竞争力的成功率。在双臂操作任务上的仿真和真实世界实验验证了所提方法在高效多机器人协调中的有效性和实用性。网站和资源可在 http://thanhnguyencanh.github.io/LLM_DAG4MultiRobot 获取。

英文摘要

Coordinating heterogeneous multi-robot systems (MRS) for complex, long-horizon tasks requires both flexible high-level reasoning and efficient low-level scheduling. Existing LLM-based approaches address the reasoning side but introduce two critical bottlenecks: (1) repeated LLM inference during execution, which inflates latency with agent count, and (2) offline, pre-committed scheduling, which forces robots to idle while waiting for sequentially ordered predecessors even when independent work is available. This paper presents OSDAG, a novel framework that integrates LLM-based task reasoning with Directed Acyclic Graph (DAG) representation and constraint-aware online scheduling. The LLM is invoked once to decompose a natural-language instruction into a dependency-annotated task graph, and a lightweight online scheduler then allocates ready tasks to idle agents in real time. The DAG representation encodes both precedence and resource constraints, ensuring correctness while exposing all available parallelism. Experiments across five benchmark scenarios demonstrate that OSDAG achieves 5-15x faster reasoning time compared to dialogue-based methods, reduces makespan by up to 38% over sequential baselines, and maintains competitive success rates. Both simulation and real-world experiments on dual-arm manipulation tasks validate the effectiveness and practicality of the proposed approach for efficient multi-robot coordination. The website and resources are available at http://thanhnguyencanh.github.io/LLM_DAG4MultiRobot
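
The scheduling idea in this abstract (decompose once into a dependency graph, then hand ready tasks to idle agents as they free up) can be sketched as a small event-driven simulation. This is a toy under stated assumptions, not the OSDAG implementation: the task names, durations, and the first-come assignment rule are hypothetical, and resource constraints are omitted.

```python
import heapq


def online_schedule(tasks, deps, num_agents):
    """Greedy online scheduling of a task DAG.

    tasks: {name: duration}; deps: {name: set of predecessor names}.
    Returns (makespan, log) where log holds (task, agent, start_time).
    """
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    ready = sorted(t for t, d in remaining.items() if not d)
    idle = list(range(num_agents))
    running = []  # min-heap of (finish_time, task, agent)
    done = set()
    log = []
    now = 0.0
    while len(done) < len(tasks):
        # Hand every ready task to an idle agent; nobody waits on unrelated work.
        while ready and idle:
            task, agent = ready.pop(0), idle.pop(0)
            heapq.heappush(running, (now + tasks[task], task, agent))
            log.append((task, agent, now))
        if not running:
            raise ValueError("dependency cycle or unknown predecessor")
        # Advance time to the next task completion.
        now, finished, agent = heapq.heappop(running)
        done.add(finished)
        idle.append(agent)
        idle.sort()
        # Release successors whose predecessors are now all complete.
        for t, d in remaining.items():
            if finished in d:
                d.remove(finished)
                if not d:
                    ready.append(t)
    return now, log


tasks = {"pick_a": 2, "pick_b": 3, "place_a": 1, "place_b": 1, "inspect": 2}
deps = {"place_a": {"pick_a"}, "place_b": {"pick_b"}, "inspect": {"place_a", "place_b"}}
makespan, log = online_schedule(tasks, deps, num_agents=2)
print(makespan, sum(tasks.values()))  # prints: 6.0 9
```

With two agents the toy graph finishes in 6 time units, versus 9 if the same tasks ran strictly one after another. The precedence edges still force `inspect` to wait for both placements.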

2606.15550 2026-06-16 cs.RO 新提交

Robots as Tokens: Unified Diffusion Transformer for Coordinated Multi-Robot Trajectory Generation

机器人作为令牌：面向协调多机器人轨迹生成的统一扩散Transformer

Ruofei Bai, Jie Chen, Yuxin Cai, Jun Li, Wei-Yun Yau, Lihua Xie

发表机构 * Nanyang Technological University（南洋理工大学）； Agency for Science, Technology and Research（新加坡科技研究局）； National University of Singapore（新加坡国立大学）

AI总结提出Roken框架，将每个机器人表示为离散令牌，通过扩散Transformer直接生成满足安全和连通性约束的多机器人轨迹，无需迭代后处理。

Comments 23 pages, 13 figures; Project page: https://bairuofei.github.io/roken-project-page/

AI中文摘要

生成模型在语言和视觉生成中的成功激发了其在生成式机器人规划中的广泛应用。然而，现有工作大多聚焦于单机器人规划，或以顺序方式生成多机器人轨迹并通过迭代后处理解决机器人间冲突。本文研究协调多机器人轨迹（作为一种特殊的时空分布）是否可以通过生成模型以前馈方式学习和生成。我们提出Roken（Robots as Tokens），一种统一的扩散Transformer，直接生成同时满足（个体）安全和（全局）连通性约束的多机器人轨迹。Roken的核心设计是将每个机器人表示为一个离散令牌，使它们能够通过自注意力自然交互，并通过交叉注意力关注地图令牌以获取环境布局。我们进一步引入基于贝叶斯定理的多个辅助任务，提供多尺度时空监督以高效学习条件分布。训练时，Roken吸收来自不同团队规模的多样化专家轨迹。推理时，Roken作为一个多功能多机器人规划器，可处理单机器人规划、协调多机器人轨迹生成，以及通过固定部分机器人令牌作为条件进行条件轨迹生成。在多种杂乱环境中的实验表明，Roken能够生成协调的多机器人轨迹，以高成功率执行连通性约束的目标导航任务，优于用于生成训练数据集的基线方法。Roken在混合团队规模训练后展现出良好的可扩展性，并对未见或部分观测环境具有泛化能力，验证了其从多样化数据中学习并执行多种任务的潜力。

英文摘要

The success of generative models in language and visual generation has inspired extensive applications to generative robot planning. However, most existing works either focus on single-robot planning, or generate multi-robot trajectories in a sequential manner with iterative post-processing to resolve inter-robot conflicts. In this work, we investigate whether coordinated multi-robot trajectories, as a special spatiotemporal distribution, can be learned and generated with a generative model in a feed-forward manner. We propose Robots as Tokens (Roken), a unified diffusion transformer that directly generates multi-robot trajectories that satisfy both (individual) safety and (global) connectivity constraints. The core design of Roken is to represent each robot as a discrete token, allowing them to naturally interact with each other through self-attention, and cross-attend to map tokens for environment layouts. We further introduce several auxiliary tasks based on Bayes' theorem to provide multi-scale spatial-temporal supervision for efficient learning of the conditional distribution. In training, Roken absorbs diverse expert trajectories from different team sizes. During inference, Roken behaves as a versatile multi-robot planner that can handle single-robot planning, coordinated multi-robot trajectory generation, and conditional trajectory generation by fixing some robot tokens as conditions. Experiments in diverse cluttered environments show that Roken can generate coordinated multi-robot trajectories to perform connectivity-constrained goal navigation tasks with high success rates, outperforming the baseline method used to generate the training dataset. Roken also demonstrates good scalability after training with mixed team sizes, and shows generalization to unseen or partially observed environments, verifying its potential to learn from diverse data and perform versatile tasks.

2606.16490 2026-06-16 cs.RO 新提交

Robots that Collaborate: Sequential Asymmetric Imitation for Learning Coupled Robot Policies

协作机器人：用于学习耦合机器人策略的序列非对称模仿

Yincong Chen, Ranpeng Qiu, Zihao Li, Yanan Zhou, Guoqiang Ren, Weiming Zhi

发表机构 * Zeno AI ； University of Sydney（悉尼大学）

AI总结提出序列非对称模仿（SAI），通过单操作员课程学习耦合多机器人行为，无需同步双操作员演示或显式通信，在真实双机器人操作任务中提升成功率与相位同步。

AI中文摘要

协作移动操作要求机器人与部分可观测的伙伴协调，同时通过共享物体进行物理交互。这很困难，因为失败通常不是由于局部技能差，而是由于不合时宜的等待、让步、拉动、释放或重新定位。我们通过两个双臂移动操作器与刚性和可变形物体耦合来研究这个问题。我们提出序列非对称模仿（SAI），一种单操作员课程，用于学习耦合的多机器人行为，无需同步双操作员演示或显式机器人间通信。SAI 首先从与顺从人类伙伴的单侧演示中训练机器人 A，然后针对已部署的机器人 A 策略训练机器人 B，最后在协调失败附近使用稀疏干预来优化机器人 A。这种分阶段过程使策略暴露于越来越真实的伙伴行为，包括延迟、相位不匹配、让步不足和交互冲突。在真实世界的双机器人操作任务中，SAI 在任务成功率、相位同步和伙伴条件性让步方面优于独立模仿和课程消融基线。这些结果表明，物理耦合协作可以通过模仿课程的结构来学习，而不是通过同步多操作员演示或显式协调机制。项目页面：http://cyc0429.github.io/sai-project-page/

英文摘要

Collaborative mobile manipulation requires robots to coordinate with a partially observed partner while physically interacting through shared objects. This is difficult because failures often arise not from poor local skills, but from mistimed waiting, yielding, pulling, releasing, or repositioning. We study this problem with two bimanual mobile manipulators coupled through rigid and deformable objects. We propose Sequential Asymmetric Imitation (SAI), a single-teleoperator curriculum for learning coupled multi-robot behaviors without synchronized dual-operator demonstrations or explicit inter-robot communication. SAI trains Robot A from unilateral demonstrations with a compliant human partner, trains Robot B against the deployed Robot A policy, and then refines Robot A using sparse interventions near coordination failures. This staged process exposes the policies to increasingly realistic partner behaviors, including delay, phase mismatch, insufficient yielding, and interaction conflict. Across real-world dual-robot manipulation tasks, SAI improves task success, phase synchronization, and partner-contingent yielding over independent imitation and curriculum-ablation baselines. These results suggest that physically coupled collaboration can be learned through the structure of the imitation curriculum, rather than through synchronized multi-operator demonstrations or explicit coordination mechanisms. Project page: http://cyc0429.github.io/sai-project-page/

2606.16116 2026-06-16 eess.SY cs.MA cs.RO cs.SY math.DS 交叉投稿

Distributed Safe Consensus Under Asymmetric Input and Time-Varying Output Constraints

非对称输入与时变输出约束下的分布式安全一致性

Abhinav Sinha, Shashi Ranjan Kumar

发表机构 * Guidance, Autonomy, Learning, and Control for Intelligent Systems (GALACxIS) Lab, Department of Aerospace Engineering and Engineering Mechanics, University of Cincinnati（辛辛那提大学航空航天工程与工程力学系智能系统制导、自主、学习与控制实验室）； Intelligent Systems and Control (ISaC) Lab, Department of Aerospace Engineering, Indian Institute of Technology Bombay（印度理工学院孟买分校航空航天工程系智能系统与控制实验室）

AI总结针对单积分多智能体系统，提出一种结合障碍坐标变换的分布式控制律，同时满足非对称执行器约束和时变输出安全约束，实现渐近同步。

AI中文摘要

本文研究了在连通无向图上的单积分多智能体系统中，同时存在非对称执行器约束和输出安全约束下的安全分布式一致性问题。每个智能体配备一个连续可微的非对称执行器动力学，将命令控制信号映射到实际输入，同时使后者严格保持在规定的允许区间内。为了解决输出安全性问题，在公共时变安全区间上引入障碍坐标变换，并在变换后的坐标中设计分布式同步律。所得到的控制器将基于图的协调层与执行器侧跟踪层相结合，从而同时实现输入可容许性、安全输出集的前向不变性和渐近同步。对于初始条件的紧致可容许集，证明了闭环解是完整的，所有信号有界，执行器输入始终严格在其非对称界限内，并且智能体输出始终保持在规定的安全区间内。此外，变换后的同步误差指数收敛到零，原始智能体输出渐近同步到嵌入公共安全区间中的设计者选择的可容许轨迹。数值仿真验证了所提出的框架，并展示了在非对称执行器界限和时变输出约束下的安全一致性。

英文摘要

This paper studies safe distributed consensus for single-integrator multi-agent systems over connected undirected graphs under simultaneous asymmetric actuator constraints and output safety constraints. Each agent is equipped with a continuously differentiable asymmetric actuator dynamics that maps a commanded control signal to the realized plant input while keeping the latter strictly inside a prescribed admissible interval. To address output safety, a barrier-coordinate transformation is introduced over a common time-varying safe interval, and a distributed synchronization law is designed in the transformed coordinates. The resulting controller integrates a graph-based coordination layer with an actuator-side tracking layer, thereby enabling simultaneous enforcement of input admissibility, forward invariance of the safe output set, and asymptotic synchronization. For compact admissible sets of initial conditions, it is shown that the closed-loop solution is complete, all signals remain bounded, the actuator inputs remain strictly within their asymmetric bounds, and the agent outputs remain inside the prescribed safe interval for all time. Moreover, the transformed synchronization errors converge exponentially to zero, and the original agent outputs asymptotically synchronize to a designer-selected admissible trajectory embedded in the common safe interval. Numerical simulations validate the proposed framework and demonstrate safe consensus under both asymmetric actuation bounds and time-varying output constraints.
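
A much-simplified numerical illustration of graph-based consensus with asymmetric input bounds is given below. It is only a sketch and deliberately differs from the paper: it uses hard clipping instead of the continuously differentiable actuator dynamics, it omits the barrier-coordinate transformation and the time-varying safe interval, and the graph, bounds, and step size are hypothetical values chosen for the example.

```python
def simulate_consensus(x0, edges, u_min, u_max, dt=0.05, steps=400):
    """Euler simulation of single-integrator agents on an undirected graph.

    Each agent applies the standard neighbour-difference command, clipped to
    the asymmetric interval [u_min, u_max] (a simplification of the paper).
    """
    n = len(x0)
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    x = list(x0)
    lowest_u, highest_u = float("inf"), float("-inf")
    for _ in range(steps):
        inputs = []
        for i in range(n):
            command = sum(x[j] - x[i] for j in neighbors[i])
            applied = min(max(command, u_min), u_max)  # asymmetric saturation
            inputs.append(applied)
        lowest_u = min(lowest_u, min(inputs))
        highest_u = max(highest_u, max(inputs))
        x = [xi + dt * ui for xi, ui in zip(x, inputs)]
    return x, (lowest_u, highest_u)


# Path graph 0 - 1 - 2 with asymmetric bounds: agents may speed up faster than slow down.
final, (u_low, u_high) = simulate_consensus(
    [0.0, 1.0, 4.0], edges=[(0, 1), (1, 2)], u_min=-0.5, u_max=2.0
)
print(max(final) - min(final), u_low, u_high)
```

In this toy run the applied inputs never leave the prescribed interval and the agents' states still agree in the end. The paper establishes this kind of behaviour rigorously, and additionally enforces output safety, which the sketch does not model.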

2511.21957 2026-06-16 cs.RO cs.MA 版本更新

RSPECT: Robust and Scalable Planner for Energy-Aware Coordination of UAV-UGV Teams in Aerial Monitoring

RSPECT：空中监控中无人机-无人车团队能量感知协调的鲁棒可扩展规划器

Cahit Ikbal Er, Amin Kashiri, Yasin Yazicioglu

发表机构 * Department of Electrical and Computer Engineering, Northeastern University（东北大学电气与计算机工程系）

AI总结针对无人机与作为移动充电站的无人车在长期空中监控任务中的鲁棒规划问题，提出可扩展启发式算法RSPECT，通过混合整数规划建模并保证规划可行性与鲁棒性。

Comments Accepted to the Journal of Intelligent & Robotic Systems (JINT)

详情

AI中文摘要

我们考虑能量受限的无人机（UAV）和作为移动充电站的无人车（UGV）的鲁棒规划，以执行长期空中监控任务。具体而言，给定一组需要无人机访问的点以及无人机-无人车团队的期望最终位置，目标是在不确定性（例如，未知障碍物/地形、风）下，找到一个无需重大修订即可实现的鲁棒计划（车辆轨迹），以最小时间完成此任务。我们提供了该问题的形式化描述，将其建模为混合整数规划（MIP），该问题是NP难的。由于精确求解方法对此类问题在计算上难以处理，我们提出了RSPECT，一种可扩展且高效的启发式算法。我们给出了关于算法复杂度以及所得计划的可行性和鲁棒性的理论结果。我们还通过仿真和实验展示了我们方法的性能。

英文摘要

We consider the robust planning of energy-constrained unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which act as mobile charging stations, to perform long-horizon aerial monitoring missions. More specifically, given a set of points to be visited by the UAVs and desired final positions of the UAV-UGV teams, the objective is to find a robust plan (the vehicle trajectories) that can be realized without a major revision in the face of uncertainty (e.g., unknown obstacles/terrain, wind) to complete this mission in minimum time. We provide a formal description of this problem as a mixed-integer program (MIP), which is NP-hard. Since exact solution methods are computationally intractable for such problems, we propose RSPECT, a scalable and efficient heuristic. We provide theoretical results on the complexity of our algorithm and the feasibility and robustness of resulting plans. We also demonstrate the performance of our method via simulations and experiments.

2512.13090 2026-06-16 cs.RO 版本更新

Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion

基于热启发的扩散模型实现从视觉和语言的多机器人运动规划

Jebeom Chae, Junwoo Chang, Seungho Yeom, Yujin Kim, Jongeun Choi

发表机构 * Department of Artificial Intelligence, Yonsei University（延世大学人工智能系）； School of Mechanical Engineering, Yonsei University（延世大学机械工程学院）

AI总结提出LHD框架，结合CLIP语义先验与碰撞避免扩散核，实现语言条件化的多机器人无碰撞轨迹规划，在成功率上优于先前方法并降低延迟。

Comments 8 pages, 6 figures, accepted by IEEE Robotics and Automation Letters (RA-L)

Journal ref: IEEE Robotics and Automation Letters, vol. 11, no. 6, pp. 7118-7125, June 2026

AI中文摘要

扩散模型通过捕捉可行轨迹的多模态分布，近来已成为机器人运动规划的强大工具。然而，它们向具有灵活的、以语言为条件的任务规范的多机器人场景的扩展仍然有限。此外，当前基于扩散的方法在推理过程中计算成本高，并且由于需要显式构建环境表示、又缺乏几何可达性推理机制，难以泛化。为了解决这些限制，我们提出了语言条件化热启发扩散（LHD），一个端到端的基于视觉的框架，用于生成以语言为条件的无碰撞轨迹。LHD将来自视觉语言模型（VLM）CLIP的语义先验与作为物理归纳偏置的避碰扩散核相结合，使规划器能够严格在可达工作空间内解释语言命令。这自然地处理了可达性意义上的分布外（OOD）场景：引导机器人前往符合语义意图的可达替代目标，同时消除了推理时对显式障碍物信息的需求。在多种受真实世界启发的地图上的广泛评估以及真实机器人实验表明，LHD在成功率上持续优于先前基于扩散的规划器，同时降低了规划延迟。项目页面：https://jebeom.github.io/lhd_project_page/

英文摘要

Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-conditioned Heat-inspired Diffusion (LHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LHD integrates semantic priors from CLIP, a vision-language model (VLM), with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution (OOD) scenarios -- in terms of reachability -- by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency. Project page is available at: https://jebeom.github.io/lhd_project_page/

2509.07561 2026-06-16 cs.MA cs.RO 版本更新

Bio-inspired decision making in robot swarms under biases

偏差下机器人群体中的生物启发式决策

Raina Zakir, Timoteo Carletti, Marco Dorigo, Andreagiovanni Reina

发表机构 * IRIDIA, Université Libre de Bruxelles（布鲁塞尔自由大学IRIDIA实验室）； Department of Mathematics and Namur Institute for Complex Systems, naXys, University of Namur（纳慕尔大学数学系及纳慕尔复杂系统研究所naXys）； Centre for the Advanced Study of Collective Behaviour, Universität Konstanz（康斯坦茨大学集体行为高级研究中心）； Department of Computer and Information Science, Universität Konstanz（康斯坦茨大学计算机与信息科学系）； Department of Collective Behaviour, Max Planck Institute of Animal Behaviour（马克斯·普朗克动物行为研究所集体行为系）

AI总结研究在存在非社会偏差时，直接切换与交叉抑制两种意见动力学机制对机器人群体决策性能的影响，发现交叉抑制在偏差条件下更优。

AI中文摘要

极简机器人群体提供了一种可扩展、鲁棒且成本效益高的方法来执行复杂任务，有望改变医疗、灾难响应和环境监测等领域的应用。然而，协调这种去中心化系统仍然是一个基本挑战，特别是当机器人在通信、计算和内存方面受限时。在我们的研究中，单个机器人在感知环境时经常出错，但群体能够快速可靠地就$n$个离散选项中的最佳选项达成共识。我们比较了直接切换和交叉抑制这两种典型的意见动力学机制，它们是简单而有效的集体信息处理规则，在从神经群体到昆虫群体的生物系统中广泛存在。我们通过考虑影响意见动力学的非社会偏差，推广了现有的平均场模型。使用直接切换的群体在没有非社会动力学时能可靠地选择最佳选项，但一旦引入此类偏差，其性能下降，常常导致决策死锁。相比之下，受生物启发的交叉抑制在广泛的偏差条件下实现了更快、更一致、更准确、更鲁棒和更可扩展的决策。我们的研究结果为极简群体的协调提供了理论和实践见解，并扩展到了生物学和工程学中广泛类别的去中心化决策系统。

英文摘要

Minimalistic robot swarms offer a scalable, robust, and cost-effective approach to performing complex tasks with the potential to transform applications in healthcare, disaster response, and environmental monitoring. However, coordinating such decentralised systems remains a fundamental challenge, particularly when robots are constrained in communication, computation, and memory. In our study, individual robots frequently make errors when sensing the environment, yet the swarm can rapidly and reliably reach consensus on the best among $n$ discrete options. We compare two canonical mechanisms of opinion dynamics -- direct-switch and cross-inhibition -- which are simple yet effective rules for collective information processing observed in biological systems across scales, from neural populations to insect colonies. We generalise the existing mean-field models by considering asocial biases influencing the opinion dynamics. While swarms using direct-switch reliably select the best option in the absence of asocial dynamics, their performance deteriorates once such biases are introduced, often resulting in decision deadlocks. In contrast, bio-inspired cross-inhibition enables faster, more cohesive, accurate, robust, and scalable decisions across a wide range of biased conditions. Our findings provide theoretical and practical insights into the coordination of minimal swarms that extend to a broad class of decentralised decision-making systems in biology and engineering.
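
To make the two mechanisms concrete, the sketch below integrates a simple two-option mean-field model with forward Euler steps. It is not the paper's model: the rate equations, the parameter names (`q_a`, `q_b`, `bias`), and the way the asocial term enters (spontaneous, quality-blind opinion changes at rate `bias`) are assumptions chosen for illustration only.

```python
def direct_switch(q_a, q_b, bias, x_a=0.5, dt=0.01, steps=20000):
    """Return the final fraction of agents holding option A.

    Social rule: on contact, a B-agent adopts A at rate q_a and an
    A-agent adopts B at rate q_b.
    Hypothetical asocial rule: any agent flips its opinion at rate `bias`.
    """
    for _ in range(steps):
        x_b = 1.0 - x_a
        social = (q_a - q_b) * x_a * x_b
        asocial = bias * (x_b - x_a)
        x_a += dt * (social + asocial)
    return x_a


def cross_inhibition(q_a, q_b, bias, x_a=0.4, x_b=0.4, dt=0.01, steps=20000):
    """Return the final fractions (x_a, x_b); the rest is uncommitted.

    Social rule: contact with the rival opinion makes an agent uncommitted
    instead of converting it; uncommitted agents are recruited by A or B.
    Hypothetical asocial rule: committed agents drop their opinion at rate
    `bias`, and uncommitted agents pick each option at rate `bias`.
    """
    for _ in range(steps):
        x_u = 1.0 - x_a - x_b
        d_a = q_a * x_a * x_u - q_b * x_a * x_b + bias * (x_u - x_a)
        d_b = q_b * x_b * x_u - q_a * x_a * x_b + bias * (x_u - x_b)
        x_a += dt * d_a
        x_b += dt * d_b
    return x_a, x_b


# Option A is better (q_a > q_b). Without bias both rules settle on A.
unbiased_ds = direct_switch(1.0, 0.5, bias=0.0)
unbiased_ci = cross_inhibition(1.0, 0.5, bias=0.0)

# With this toy bias, direct-switch stalls at a weak majority:
# the fixed point solves x^2 + x - 1 = 0, i.e. x is about 0.618.
biased_ds = direct_switch(1.0, 0.5, bias=0.5)
```

In this toy setting the biased direct-switch run ends far from consensus, which mirrors the deadlock behaviour the abstract describes. The sketch makes no claim about how the two mechanisms compare under the paper's actual bias model.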

2606.15251 2026-06-16 cs.RO cs.AI cs.LG 新提交

Driving, Fast or Slow? Neuro-Symbolic Guidance for Motion Prediction in Multi-Modal Ground Mobility

驾驶，快或慢？多模态地面移动中运动预测的神经符号引导

Simon Kohaut, Felix Divo, Julius Hahnewald, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

发表机构 * Artificial Intelligence and Machine Learning Lab, TU Darmstadt（达姆施塔特工业大学人工智能与机器学习实验室）； Honda Research Institute（本田研究所）； Hessian Center for AI (hessian.AI)（黑森州人工智能中心）； Centre for Cognitive Science（认知科学中心）； German Center for AI (DFKI)（德国人工智能研究中心）； Uncertainty in Artificial Intelligence Lab, TU Eindhoven（埃因霍温理工大学人工智能不确定性实验室）

AI总结提出TraCS框架，通过神经符号方法将交通规则编码为概率一阶逻辑，增强黑盒运动预测模型的可解释性和合规性，在Argoverse 2上持续提升SOTA性能。

AI中文摘要

准确且可解释的异构交通空间（包括行人、自行车、汽车和卡车）运动预测对于安全的自主导航至关重要。然而，最先进的方法仍然是黑盒，缺乏对现实世界移动的监管和行为约束的显式编码。我们提出Trajectory Compliance-Shaping (TraCS)，一种神经符号框架，通过可解释的概率一阶逻辑增强现有的黑盒运动预测骨干网络。为此，TraCS采用智能体代码生成流水线，弥合交通规则的自然语言描述与概率运动预测之间的差距。此外，TraCS采用反应式数据流推理引擎，随着场景演变维护并高效更新合规性景观。为防止TraCS过度自信地将骨干网络的预测引导到错误方向，我们提出一种神经置信度评分，作为上下文感知的合规性信号衰减。我们在Argoverse 2基准上展示了TraCS如何持续改进最先进的预测骨干网络，表明概率和符号合规性推理是纯神经运动预测的广泛适用且计算高效的补充。

英文摘要

Accurate and interpretable motion prediction for heterogeneous traffic spaces, including pedestrians, bicycles, cars, and trucks, is essential for safe autonomous navigation. Nevertheless, state-of-the-art approaches remain predominantly black-box, lacking explicit encoding of the regulatory and behavioral constraints of real-world mobility. We propose Trajectory Compliance-Shaping (TraCS), a neuro-symbolic framework that augments existing black-box motion prediction backbones with interpretable and probabilistic first-order logic. To do so, TraCS employs an agentic code-generation pipeline to bridge the gap between natural-language descriptions of traffic regulations and probabilistic motion prediction. Furthermore, TraCS employs a reactive data-streaming inference engine that maintains and efficiently updates compliance landscapes as scenes evolve. To prevent TraCS from overconfidently steering the backbone's predictions in the wrong direction, we propose a neural confidence rating learned as a context-aware attenuation of the compliance signal. We demonstrate on the Argoverse 2 benchmark how TraCS consistently improves state-of-the-art prediction backbones, showing that probabilistic and symbolic compliance reasoning is a broadly applicable and computationally efficient complement to purely neural motion predictors.

2606.16042 2026-06-16 cs.RO cs.AI 新提交

Leveraging Deep Learning for Object and Position Recognition of Load Carriers for Autonomous Logistics Vehicles

利用深度学习实现自主物流车辆对载具的物体与位置识别

Christoph Legat, Tobias Miller, Marco Riess

发表机构 * Research Group on Cognitive Autonomy & Predictive Intelligence, Technical University of Applied Sciences, Augsburg, Germany（认知自主与预测智能研究组，奥格斯堡应用技术大学，德国）； Grenzebach Maschinenbau GmbH, Asbach-Bäumenheim, Germany（Grenzebach Maschinenbau GmbH，德国阿斯巴赫-博伊门海姆）

AI总结提出基于深度学习的框架，通过卷积神经网络从RGBD数据中识别载具上的预定义地标并计算其位姿，实现自主物流车辆对载具的检测与定位，实验验证了工业环境下的可靠性。

Comments 6 pages, 6 figures, IFAC World Congress 2026, © 2026 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND

AI中文摘要

本工作探索了在移动机器人中利用人工智能实现载具的自主检测和位姿估计，以便自动拾取。设计了一个深度神经网络，从RGBD数据中识别载具上的预定义地标；然后利用这些地标计算载具的位姿。该网络直接处理RGBD图像以估计地标位置，这些位置构成了确定载具位置的基础。该方法在大量实验中得到了验证，并包含软件和硬件实现。提出了一个基于深度学习的框架，用于检测载具并估计其位姿，以应用于自主物流车辆。我们的方法使用卷积神经网络从RGBD输入中识别载具上的特征参考点，并通过将这些推断出的地标与先验几何知识相结合来计算其位姿。实验表明，所得精度足以在工业环境中可靠地检测载具，证实了该方法适用于自主内部物流应用。

英文摘要

This work explores the use of artificial intelligence in mobile robotics to achieve autonomous detection and pose estimation of load carriers for automated pickup. A deep neural network is designed to recognize predefined landmarks on the carrier from RGBD data; these landmarks are then used to compute the carrier's pose. The network operates directly on RGBD images to estimate landmark positions, which form the basis for determining the carrier's location. The approach is validated in extensive experiments and comprises both software and hardware implementations. A deep learning-based framework is presented to detect load carriers and estimate their pose for use with autonomous logistics vehicles. Our method uses a convolutional neural network to identify characteristic reference points on the carrier from RGBD input and computes its pose by combining these inferred landmarks with prior geometric knowledge. Experiments show that the resulting accuracy is sufficient for reliable load carrier detection in industrial environments, confirming the suitability of the method for autonomous intralogistics applications.

2606.16513 2026-06-16 cs.RO 新提交

Agile Fall Recovery for Quadrotors with Bidirectional Thrust via Reinforcement Learning

基于强化学习的双向推力四旋翼敏捷坠落恢复

Anke Zhao, Yuhang Zhong, Kenghou Hoi, Junyu Mou, Junjie Wang, Lijie Wang, Jialiang Hou, Fei Gao

发表机构 * Institute of Cyber-Systems and Control, College of Control Science and Engineering, Zhejiang University（浙江大学控制科学与工程学院智能系统与控制研究所）； Differential Robotics

AI总结提出基于强化学习的框架，利用轻量级机载传感器实现四旋翼从任意地面姿态恢复至稳定悬停，通过非对称演员-评论家架构和增量非线性动态逆控制器解决部分可观测性和传感器失效问题，仿真和实验验证了零样本迁移和鲁棒性。

AI中文摘要

自主坠落恢复是四旋翼在现实环境中运行的关键能力，因为碰撞或故障可能导致飞行器以任意姿态停在地面上。该问题具有挑战性，因为恢复必须在有限的机载感知、受限的自由空间、地面接触以及存在未知干扰的情况下实现。本文提出了一种基于强化学习的框架，用于四旋翼从任意地面姿态自主恢复至稳定悬停，仅使用轻量级机载传感器。为了解决严重的部分可观测性和间歇性传感器失效问题，我们在非对称演员-评论家架构中训练了一个循环策略，并利用增量非线性动态逆（INDI）控制器跟踪策略输出。结合电机响应和光流的高保真仿真，整体训练框架显著缩小了仿真到现实的差距。仿真消融研究验证了主要设计选择的重要性，而真实世界实验展示了在不同初始姿态、风干扰和额外负载下的零样本迁移和鲁棒恢复。这些结果表明，无需明确的状态估计，仅使用有限且不可靠的机载传感即可实现敏捷的四旋翼坠落恢复。

英文摘要

Autonomous fall recovery is a critical capability for quadrotors operating in real-world environments, where collisions or failures may leave the vehicle resting on the ground in an arbitrary attitude. This problem is challenging because recovery must be achieved under limited onboard sensing, in constrained free space, with ground contact, and in the presence of unknown disturbances. In this letter, we present an RL-based framework for autonomous fall recovery of a quadrotor from arbitrary ground attitudes to stable hover using only lightweight onboard sensors. To address severe partial observability and intermittent sensor invalidity, we train a recurrent policy within an asymmetric actor--critic architecture, leveraging an Incremental Nonlinear Dynamic Inversion (INDI) controller to track the policy output. Combined with high-fidelity simulations of motor response and optical flow, the overall training framework significantly reduces the sim-to-real gap. Simulation ablation studies validate the importance of the main design choices, while real-world experiments demonstrate zero-shot transfer and robust recovery under different initial attitudes, wind disturbances, and additional payloads. These results demonstrate that agile quadrotor fall recovery can be achieved without explicit state estimation using only limited and unreliable onboard sensing.

2606.16621 2026-06-16 cs.RO 新提交

Reinforcement Learning with Inner-loop Dynamics Estimator for Aerial Manipulation under Uncertainty

基于内环动力学估计器的强化学习在不确定性下的空中操纵

Shivansh Pratap Singh, Samaksh Ujjwal, Ishita Chaudhary, V R Vasudevan, Rishabh Dev Yadav, Spandan Roy

发表机构 * International Institute of Information Technology Hyderabad（海德拉巴国际信息技术学院）； University of Manchester（曼彻斯特大学）

AI总结提出一种分层控制框架，将强化学习外环与内环动力学估计器相结合，实现直接任务驱动控制，在硬件实验中降低末端执行器跟踪误差并提高任务成功率。

AI中文摘要

空中操纵器能够在难以到达的环境中进行物理交互；然而，在快速臂运动、载荷变化及相关未知动态不确定性下，直接全身空中操纵的组合问题在很大程度上仍未解决。我们提出了一种分层控制框架，结合强化学习（RL）与内环动力学估计器来解决这一问题。RL外环将期望的六自由度（DOF）末端执行器目标映射到协调的全身指令，从而实现直接任务驱动控制，而无需在策略层依赖完全准确的耦合动态模型。内环随后跟踪这些指令，同时通过动力学估计器方案在执行过程中补偿瞬态惯性偏移和不确定性，无需系统模型知识。我们通过硬件实验，在变化的载荷条件下，在配备3自由度操纵器的定制四旋翼飞行器上验证了所提出的方法。与RL+PID和RL+INDI+PID基线相比，所提出的方法在测试的硬件条件下降低了末端执行器跟踪误差并提高了任务成功率。这些结果表明，将学习到的全身协调与基于估计器的低层补偿相结合，提高了在变化操作条件下空中操纵的精度和鲁棒性。

英文摘要

Aerial manipulators enable physical interaction in hard-to-reach environments; however, direct whole-body aerial manipulation under rapid arm motion, payload changes, and related unknown dynamic uncertainty remains largely unsolved. We present a hierarchical control framework that combines Reinforcement Learning (RL) with an inner-loop dynamics estimator to address this problem. The RL outer loop maps desired 6-degrees-of-freedom (DOF) end-effector targets to coordinated whole-body commands, enabling direct task-driven control without relying on a fully accurate coupled dynamic model in the policy layer. An inner loop then tracks these commands while compensating for transient inertial shifts and uncertainty during execution via a dynamics estimator scheme without requiring system model knowledge. We validate the proposed approach on a custom quadrotor equipped with a 3-DOF manipulator through hardware experiments under varying payload conditions. Compared with RL+PID and RL+INDI+PID baselines, the proposed method reduces end-effector tracking error and improves task success rate across the tested hardware conditions. These results show that combining learned whole-body coordination with estimator-based low-level compensation improves the precision and robustness of aerial manipulation under changing operating conditions.

2606.16735 2026-06-16 cs.RO 新提交

Pride and Prejudice: Toward an Information-Theoretic Framework for Mutually Communicative Driver Behavior Modeling

傲慢与偏见：迈向相互通信的驾驶员行为建模的信息论框架

Tingjun Li, Nan Xu, Shuo Feng, Hassan Askari, Bruno Henrique Groenner Barbosa, Konghui Guo

发表机构 * State Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University（吉林大学汽车底盘集成与仿生国家重点实验室）； Beijing National Research Center for Information Science and Technology, Tsinghua University（清华大学北京信息科学与技术国家研究中心）； Department of Engineering, Brock University（布鲁克大学工程系）； Department of Automatics, Federal University of Lavras（拉夫拉斯联邦大学自动化系）

AI总结针对自动驾驶与人类驾驶车辆间意图误读导致的安全与效率问题，提出基于信息论的隐式相互通信模型，结合贝叶斯说服博弈与信息论奖励，在NGSIM数据集上降低强制换道预测误差达20%。

Comments 16 pages, 10 figures. Accepted for the IEEE Transactions on Intelligent Transportation Systems (T-ITS), June 2026

AI中文摘要

当自动驾驶车辆（AV）和人类驾驶车辆（HV）误读彼此的意图时，混合自主驾驶会变得不安全且低效。我们将此问题研究为换道中的隐式相互通信。所提出的框架建模了自车如何在认知不确定性下既表达自身意图又探测对方驾驶员的偏好。它结合了用于主动信号传递的带有虚拟特征的k级贝叶斯说服博弈、用于相互通信的信息论奖励以及通信能力的自适应权重。我们进一步引入了Pride-Inquiry (P-I) 和 Pride-Prejudice (P-P) 平面来分析通信强度和倾向。该模型使用基于通信的多智能体逆强化学习算法（C-MIRL）在自然主义NGSIM数据集上进行校准。与非通信基线相比，所提出的模型将强制换道的预测误差降低了高达20%，同时保持了强大的泛化能力。驾驶员在环问卷得分与校准后的通信变量呈正相关，支持了模型的主观有效性。学习到的奖励进一步表明，询问和倾听能力比单纯的骄傲和表达贡献更大，并且询问偏好在不同驾驶员之间变化更强烈。这些结果支持在交互驾驶中对相互通信和认知不确定性进行显式建模。

英文摘要

Mixed autonomy driving becomes unsafe and inefficient when autonomous vehicles (AVs) and human-driven vehicles (HVs) misread each other's intentions. We study this problem as implicit mutual communication in lane changes. The proposed framework models how the ego vehicle both expresses its intent and probes the other driver's preference under epistemic uncertainty. It combines a level-k Bayesian persuasion game with virtual features for proactive signaling, information-theoretic rewards for mutual communication, and adaptive weights of communication affordances. We further introduce the Pride-Inquiry (P-I) and Pride-Prejudice (P-P) planes to analyze communication intensity and tendency. The model is calibrated with a Communication-Based Multi-Agent Inverse Reinforcement Learning algorithm (C-MIRL) on the naturalistic NGSIM dataset. Compared with the non-communicative baseline, the proposed model reduces the prediction error of mandatory lane changes by up to 20% while maintaining strong generalization. Driver-In-the-Loop questionnaire scores are positively correlated with the calibrated communication variables, supporting the subjective validity of the model. The learned rewards further show that inquiry and listening affordances contribute more than pride and expression alone, and that inquiry preference varies more strongly across drivers. These results support explicit modeling of mutual communication and epistemic uncertainty in interactive driving.

2606.14716 2026-06-16 cs.CV cs.AI cs.RO 交叉投稿

公海航行智能航行模型

Hanna Krasowski, Stefan Schärdinger, Murat Arcak, Matthias Althoff

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Technical University of Munich（慕尼黑工业大学）

AI总结提出首个智能航行模型（ISM），模拟遵守海上交通规则的船舶，结合模型预测控制实现航点跟踪，在交互仿真中达到约97%的目标到达率且无碰撞。

AI中文摘要

自主船舶有望提升海上贸易的安全性和可靠性。为促进自主船舶的发展，需要仿真来模拟与其他船舶的真实交互。然而，由于环境非结构化、交通规则粗略以及船舶类型差异大，模拟真实的交互式海上交通具有挑战性。目前，尚无用于严格基准测试自主船舶算法的交互式海上环境仿真标准。本文首次提出智能航行模型（ISM），该模型模拟公海航行中遵守规则的船舶。ISM船舶根据海上交通规则对其他交通参与者做出反应，同时解决由航点表征的运动规划任务。具体而言，ISM监控适用规则，据此生成合规航点，并利用模型预测控制跟踪航点。我们在两种环境中评估ISM：仅含ISM船舶的交互式交通，以及混合交通（部分船舶轨迹来自记录的真实海上交通数据或为关键场景手工设计）。结果表明，包含多种类型ISM船舶的仿真具有规则合规性和可扩展性。我们测试了4,049个关键交通场景。在ISM船舶的交互式交通中，未发生碰撞，目标到达率约为97%。

英文摘要

Autonomous vessels potentially enhance safety and reliability of seaborne trade. To facilitate the development of autonomous vessels, simulations are required to model realistic interactions with other vessels. However, modeling realistic interactive maritime traffic is challenging due to the unstructured environment, coarsely specified traffic rules, and largely varying vessel types. Currently, there is no standard for simulating interactive maritime environments in order to rigorously benchmark autonomous vessel algorithms. In this paper, we introduce the first intelligent sailing model (ISM), which simulates rule-compliant vessels for navigation on the open sea. An ISM vessel reacts to other traffic participants according to maritime traffic rules while at the same time solving a motion planning task characterized by waypoints. In particular, the ISM monitors the applicable rules, generates rule-compliant waypoints accordingly, and uses model predictive control to track the waypoints. We evaluate the ISM in two environments: interactive traffic with only ISM vessels and mixed traffic where some vessel trajectories are from recorded real-world maritime traffic data or handcrafted for criticality. Our results show that simulations with many ISM vessels of different vessel types are rule-compliant and scalable. We tested 4,049 critical traffic scenarios. For interactive traffic with ISM vessels, no collisions occurred while goal-reaching rates of about 97 percent were achieved.

2510.12560 2026-06-16 cs.CV cs.LG cs.RO 版本更新

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

CoIRL-AD：面向自动驾驶的潜在世界模型中的协作-竞争模仿-强化学习

Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong

发表机构 * University of Science and Technology of China（中国科学技术大学）； Tsinghua University（清华大学）

AI总结提出CoIRL-AD框架，通过解耦模仿学习与强化学习、利用潜在世界模型进行长时程奖励估计以及引入竞争机制，在离线训练中提升自动驾驶的鲁棒性，尤其在跨城市泛化和长尾场景中表现优异。

Comments 19 pages, 22 figures, ICML 2026

AI中文摘要

基于模仿学习（IL）训练的端到端自动驾驶模型通常泛化能力较差，尤其是在专家演示稀疏的长尾场景中。强化学习（RL）可以提供互补的任务级监督，但在没有交互模拟器的离线设置中，将RL应用于真实世界的自动驾驶具有挑战性，因为数据集主要由专家动作主导，行为多样性有限。我们提出CoIRL-AD，一个竞争性的双策略框架，在统一的离线训练机制下整合IL和RL。CoIRL-AD将模仿和奖励优化解耦到不同的智能体中，以缓解目标冲突，使用想象的未来轨迹进行长时程奖励估计，并引入竞争机制，选择性地传递有益行为，同时使RL保持与专家驾驶行为一致。在nuScenes基准上的实验表明，CoIRL-AD在强IL基线上持续提升鲁棒性，尤其在跨城市泛化和长尾场景中取得了显著改进。代码可在以下网址获取：https://github.com/SEU-zxj/CoIRL-AD

英文摘要

End-to-end autonomous driving models trained with imitation learning (IL) often generalize poorly, particularly in long-tail scenarios where expert demonstrations are sparse. Reinforcement learning (RL) can provide complementary task-level supervision, but applying RL to real-world autonomous driving is challenging in offline settings without interactive simulators, where datasets are dominated by expert actions and provide limited behavioral diversity. We propose CoIRL-AD, a competitive dual-policy framework that integrates IL and RL under a unified offline training regime. CoIRL-AD decouples imitation and reward optimization into separate actors to alleviate objective conflicts, uses imagined future rollouts for long-horizon reward estimation, and introduces a competition mechanism that selectively transfers beneficial behaviors while keeping RL anchored to expert-like driving. Experiments on the nuScenes benchmark show that CoIRL-AD consistently improves robustness over strong IL-based baselines, with especially large gains in cross-city generalization and long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.

2512.24838 2026-06-16 cs.CV cs.RO 版本更新

CropTrack: A Tracking with Re-Identification Framework for Precision Agriculture

CropTrack: 面向精准农业的跟踪与重识别框架

Md Ahmed Al Muzaddid, Jordan A. James, William J. Beksi

发表机构 * Department of Computer Science and Engineering, The University of Texas at Arlington（计算机科学与工程系，德克萨斯大学阿灵顿分校）

AI总结针对农业场景中物体外观相似、频繁遮挡导致跟踪困难的问题，提出结合外观与运动信息的MOT框架CropTrack，通过重排序增强外观关联、一对多关联冲突解决和指数移动平均原型特征库，显著提升身份保持和关联精度。

Comments 8 pages, 5 figures, and 4 tables

AI中文摘要

农业环境中的多目标跟踪（MOT）由于重复模式、相似物体外观、突然光照变化和频繁遮挡而面临重大挑战。该领域的当代跟踪器依赖物体运动而非外观进行关联。然而，当目标经历频繁且强烈的遮挡时，它们难以维持物体身份。物体外观的高度相似性使得在农业场景中集成基于外观的关联变得非平凡。为解决此问题，我们提出CropTrack，一种基于外观和运动信息结合的新型MOT框架。CropTrack集成了重排序增强的外观关联、基于外观冲突解决策略的一对多关联以及指数移动平均原型特征库，以改进基于外观的关联。在公开可用的农业MOT数据集上评估，CropTrack展示了一致的身份保持，优于传统的基于运动的跟踪方法。与现有技术相比，CropTrack在关联准确性和识别精度得分上取得了显著提升，同时身份切换次数更低。

英文摘要

Multiple-object tracking (MOT) in agricultural environments presents major challenges due to repetitive patterns, similar object appearances, sudden illumination changes, and frequent occlusions. Contemporary trackers in this domain rely on the motion of objects rather than appearance for association. Nevertheless, they struggle to maintain object identities when targets undergo frequent and strong occlusions. The high similarity of object appearances makes integrating appearance-based association nontrivial for agricultural scenarios. To solve this problem, we propose CropTrack, a novel MOT framework based on the combination of appearance and motion information. CropTrack integrates a reranking-enhanced appearance association, a one-to-many association with an appearance-based conflict resolution strategy, and an exponential moving average prototype feature bank to improve appearance-based association. Evaluated on publicly available agricultural MOT datasets, CropTrack demonstrates consistent identity preservation, outperforming traditional motion-based tracking methods. Compared to the state of the art, CropTrack achieves significant gains in association accuracy and identification precision scores with a lower number of identity switches.
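
The exponential moving average prototype feature bank mentioned above can be pictured with a minimal sketch. The class name `PrototypeBank`, the `momentum` value, and the cosine-similarity lookup are assumptions for illustration; the abstract does not specify CropTrack's actual update rule or matching procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class PrototypeBank:
    """Keeps one exponentially averaged appearance prototype per track ID."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # weight given to the existing prototype
        self.prototypes = {}

    def update(self, track_id, feature):
        """Blend a new appearance feature into the track's prototype."""
        old = self.prototypes.get(track_id)
        if old is None:
            self.prototypes[track_id] = list(feature)
        else:
            m = self.momentum
            self.prototypes[track_id] = [
                m * o + (1.0 - m) * f for o, f in zip(old, feature)
            ]
        return self.prototypes[track_id]

    def best_match(self, feature):
        """Return the track ID whose prototype is most similar, or None if empty."""
        if not self.prototypes:
            return None
        return max(
            self.prototypes,
            key=lambda tid: cosine_similarity(self.prototypes[tid], feature),
        )


bank = PrototypeBank(momentum=0.9)
bank.update(1, [1.0, 0.0])
proto = bank.update(1, [0.0, 1.0])   # becomes [0.9, 0.1]
bank.update(2, [0.0, 1.0])
match = bank.best_match([1.0, 0.1])  # closest to track 1
```

A high momentum keeps the prototype stable through short occlusions, at the cost of adapting slowly when appearance truly changes.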

2602.07343 2026-06-16 cs.CV cs.AI cs.LG cs.RO 版本更新

Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation

通过文字看道路：一种语言引导的RGB-T驾驶场景分割框架

Ruturaj Reddy, Hrishav Bakul Barua, Junn Yong Loo, Thanh Thi Nguyen, Ganesh Krishnasamy

发表机构 * National University of Singapore（新加坡国立大学）； University of Technology Sydney（悉尼科技大学）

AI总结提出CLARITY框架，利用视觉语言模型先验动态调整RGB-T融合策略，并引入暗目标语义保留和层次化解码器，在MFNet数据集上达到62.3% mIoU和77.5% mAcc的新SOTA。

AI中文摘要

在恶劣光照、照明和阴影条件下，道路场景的鲁棒语义分割仍然是自动驾驶应用的核心挑战。RGB-热融合是一种标准方法，但现有方法在所有条件下统一应用静态融合策略，导致模态特定噪声在网络中传播。因此，我们提出CLARITY，它根据检测到的场景条件动态调整融合策略。在视觉语言模型（VLM）先验的引导下，网络学习根据光照状态调节每种模态的贡献，同时利用对象嵌入进行分割，而不是应用固定的融合策略。我们进一步引入了两种机制：一种保留有效的暗对象语义，这些语义在先前的噪声抑制方法中被错误丢弃；另一种是层次化解码器，它在不同尺度上强制结构一致性，以锐化薄对象的边界。在MFNet数据集上的实验表明，CLARITY建立了新的最先进水平（SOTA），实现了62.3%的mIoU和77.5%的mAcc。

英文摘要

Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remains a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply static fusion strategies uniformly across all conditions, allowing modality-specific noise to propagate throughout the network. Hence, we propose CLARITY, which dynamically adapts its fusion strategy to the detected scene condition. Guided by vision-language model (VLM) priors, the network learns to modulate each modality's contribution based on the illumination state while leveraging object embeddings for segmentation, rather than applying a fixed fusion policy. We further introduce two mechanisms: one that preserves valid dark-object semantics that prior noise-suppression methods incorrectly discard, and a hierarchical decoder that enforces structural consistency across scales to sharpen boundaries on thin objects. Experiments on the MFNet dataset demonstrate that CLARITY establishes a new state-of-the-art (SOTA), achieving 62.3% mIoU and 77.5% mAcc.
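
For readers unfamiliar with the two reported metrics, the sketch below computes mIoU and mAcc from a confusion matrix using their standard definitions. The function name and the choice to skip classes absent from the ground truth are assumptions; benchmark toolkits differ on that convention, and this is not the evaluation code used for the MFNet results.

```python
def miou_and_macc(confusion):
    """Return (mean IoU, mean per-class accuracy).

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Classes with no ground-truth pixels are skipped (an assumed convention).
    """
    n = len(confusion)
    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true other
        if tp + fn == 0:
            continue
        ious.append(tp / (tp + fp + fn))
        accs.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Two-class toy example: IoU = 8/11 and 9/12, accuracy = 0.8 and 0.9.
miou, macc = miou_and_macc([[8, 2], [1, 9]])
```

mAcc ignores false positives, which is why it is typically higher than mIoU on the same predictions.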

2602.14780 2026-06-16 cs.MA cs.CY cs.RO cs.SY eess.SY 版本更新

ROSA: Roundabout Optimized Speed Advisory with Multi-Agent Trajectory Prediction in Multimodal Traffic

ROSA: 多模式交通中基于多智能体轨迹预测的环岛优化速度建议

Anna-Lena Schlamp, Jeremias Gerner, Klaus Bogenberger, Werner Huber, Stefanie Schmidtner

发表机构 * IEEE

AI总结提出ROSA系统，结合Transformer多智能体轨迹预测与协调速度引导，提升环岛多模式混合交通的效率与安全，预测精度优于前人工作。

Comments 8 pages, 1 figure, 4 tables. Copyright 2026 IEEE. This is the accepted manuscript for 2025 IEEE International Conference on Intelligent Transportation Systems (ITSC), not the final published version

AI中文摘要

基于物理的Unitree G1人形机器人手臂电力消耗模型识别

Nestor N. Deniz, Sebastian Vega, Simon Parsons, Fernando Auat Cheein

发表机构 * Harper Adams University（哈珀亚当斯大学）； Lincoln Institute for Agri-Food Technology（林肯农业食品技术研究所）； Lincoln Centre for Autonomous Systems（林肯自主系统中心）

AI总结提出一种基于物理的线性参数模型，用于预测Unitree G1人形机器人左臂的电力消耗，通过实验数据识别参数，在897条轨迹上达到R²=0.933，并在未见速度轨迹上验证泛化能力。

AI中文摘要

精确预测电力消耗对于电池供电人形机器人的能量感知运动规划、电池管理和热监测至关重要。本文提出了一个基于物理的线性参数模型，用于Unitree G1人形机器人七自由度左臂的电力消耗。所提出的公式将执行器损耗项与基线扭矩校正相结合，该校正捕捉重力补偿负载的变化，并能够准确预测负净功率轨迹。引入成对交互项来模拟多关节同时运动期间的功率耦合。模型参数从物理Unitree G1上收集的实验数据中识别，使用板载功率测量作为回归目标。在覆盖单关节和协调手臂运动、多个速度水平的897条轨迹上，识别模型实现了$R^2 = 0.933$，RMSE为1.07 W。在46条以先前未见速度执行的轨迹上验证，得到$R^2 = 0.965$，显示出在识别数据集之外的强泛化能力。对识别参数的分析揭示了手臂上不同的功耗特性，粘性摩擦主导大多数关节（肩部俯仰和所有三个腕关节），铜损主导肩部偏航和肘部，而肩部滚动则独特地由库仑摩擦主导。

英文摘要

Accurate prediction of electrical power consumption is essential for energy-aware motion planning, battery management, and thermal monitoring in battery-powered humanoid robots. This letter presents a physics-based, linear-in-parameters model for the electrical power consumption of the seven-degree-of-freedom left arm of the Unitree G1 humanoid robot. The proposed formulation combines actuator loss terms with a baseline-torque correction that captures changes in gravity-compensation load and enables accurate prediction of negative net power trajectories. Pairwise interaction terms are introduced to model power coupling during simultaneous multi-joint motion. Model parameters are identified from experimental data collected on a physical Unitree G1 using onboard power measurements as the regression target. Across 897 trajectories covering single-joint and coordinated arm motions at multiple speed levels, the identified model achieves $R^2 = 0.933$ with an RMSE of 1.07 W. Validation on 46 trajectories executed at previously unseen speeds yields $R^2 = 0.965$, demonstrating strong generalisation beyond the identification dataset. Analysis of the identified parameters reveals distinct power-consumption characteristics across the arm, with viscous friction dominating most joints (shoulder pitch and all three wrist joints), copper losses dominating shoulder yaw and the elbow, and shoulder roll uniquely dominated by Coulomb friction.
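
Because the model is linear in its parameters, identification reduces to ordinary least squares. The sketch below illustrates this for a single hypothetical joint on synthetic data. The regressors (torque squared for copper loss, speed squared for viscous friction, absolute speed for Coulomb friction), the "true" coefficients, and the noise level are all assumptions; the paper's baseline-torque correction and pairwise interaction terms are not modelled here.

```python
import random


def loss_features(tau, omega):
    """Assumed loss regressors: copper, viscous friction, Coulomb friction."""
    return [tau * tau, omega * omega, abs(omega)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit(rows, targets):
    """Least squares via the normal equations (A^T A) theta = A^T y."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def r_squared(y, y_hat):
    """Coefficient of determination."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic "measurements": mechanical power tau*omega plus losses plus noise.
random.seed(0)
true_theta = [0.8, 0.3, 0.5]  # made-up coefficients, not identified values
samples = [(random.uniform(-5, 5), random.uniform(-2, 2)) for _ in range(400)]
rows = [loss_features(t, w) for t, w in samples]
power = [
    t * w + sum(p * f for p, f in zip(true_theta, row)) + random.gauss(0, 0.02)
    for (t, w), row in zip(samples, rows)
]

# Regress the non-mechanical part of the power on the loss features.
targets = [p - t * w for p, (t, w) in zip(power, samples)]
theta_hat = fit(rows, targets)
predicted = [
    t * w + sum(p * f for p, f in zip(theta_hat, row))
    for (t, w), row in zip(samples, rows)
]
r2 = r_squared(power, predicted)
```

Comparing the magnitudes of the fitted loss terms joint by joint is what allows a statement such as "viscous friction dominates" to be read off the identified parameters.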

2606.15997 2026-06-16 cs.RO cs.SY eess.SY 新提交

Friction Characterization of a Cable-Driven Differential Actuation System for Lower-Limb Exoskeletons

下肢外骨骼用缆绳驱动差动驱动系统的摩擦特性

Alberto Maria Nobili, Fabio Salsedo, Alessandro Filippeschi

发表机构 * Institute of Mechanical Intelligence, Department of Excellence in Robotics and AI, Sant'Anna School of Advanced Studies（机械智能研究所、机器人及人工智能卓越系，圣安娜高等研究学院）； Wearable Robotics s.r.l.（可穿戴机器人有限公司）

AI总结提出一种用于下肢外骨骼髋-膝关节屈伸的差动驱动架构，通过电机与关节间的线性差动映射实现协同扭矩共享，并开发基于模型的摩擦估计策略实现无传感器扭矩估计。

Comments Accepted for presentation IEEE RAS/EMBS 11th International Conference on Biomedical Robotics and Biomechatronics

2606.16876 2026-06-16 cs.RO 新提交

ExoTraj: A General Lower-limb Exoskeleton Assistance Policy for Complex Environments

ExoTraj：面向复杂环境的通用下肢外骨骼辅助策略

Xiao-Yin Liu, Guotao Li, Long Sun, Xu Liang, Zeng-Guang Hou

发表机构 * The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所多模态人工智能系统国家重点实验室）； The School of Artificial Intelligence, University of Chinese Academy of Sciences（中国科学院大学人工智能学院）； CASIA-MUST Joint Laboratory of Intelligence Science and Technology, Institute of Systems Engineering, Macau University of Science and Technology（澳门科技大学系统工程研究所中科院自动化所-澳门科技大学智能科学与技术联合实验室）； The School of Automation and Intelligence, Beijing Jiaotong University（北京交通大学自动化与智能学院）

AI总结提出ExoTraj统一策略，通过快速流匹配实现多模态特征到轨迹的精确预测，并利用模型预测控制优化力矩，在复杂户外场景中实现自适应辅助，无需昂贵运动捕捉系统。

Comments 28 pages, 19 figures, project page: https://xiaoyinliu0714.github.io/Home_ExoTraj/

AI中文摘要

在动态外骨骼场景中，自适应力矩预测需要昂贵的运动捕捉系统，这在复杂户外环境中不可行。轨迹预测已成为解决该问题的有效方法之一。然而，外骨骼轨迹预测的核心挑战有两个：建立从多模态特征到轨迹信息的映射；构建从轨迹到力矩的映射。对于前者，现有方法大多仅执行单步预测并忽略受试者间轨迹变异性，从而限制了轨迹优化空间和预测泛化能力。为此，本文提出一种快速流匹配方法，能够实现精确的轨迹预测和更好的实时性能泛化，其中轨迹生成误差和编码观测用于指导训练方向。对于第二个挑战，由于人-机器人系统的高动态性以及感知与控制之间的强耦合，简单的控制方法难以基于预测轨迹实现高效辅助。本文利用模型预测控制并设计了一种新颖的优化目标来优化力矩，确保外骨骼实现舒适且鲁棒的辅助。通过整合上述两个组件，开发了统一策略ExoTraj，使其能够在复杂户外场景中实现自适应辅助，而无需高昂的数据采集成本。实验结果表明，与传统方法相比，ExoTraj在在线阶段将跨受试者预测误差降低了14.0%，并保持对外部噪声的鲁棒性。相对于零力矩条件，ExoTraj分别将代谢率降低11.5-24.4%，心率降低1.7-19.5%，峰值肌肉激活水平降低10.9-41.3%。

英文摘要

Adaptive torque prediction in dynamic exoskeleton scenarios requires expensive motion capture systems, which are infeasible in complex outdoor environments. Trajectory prediction has emerged as one of the effective approaches to address such an issue. However, the core challenges of exoskeleton trajectory prediction are twofold: establishing the mapping from multi-modal features to trajectory information; constructing the mapping from trajectory to torque. For the former, most existing methods perform only single-step prediction and neglect inter-subject trajectory variability, thereby limiting the trajectory optimization space and prediction generalization. To address this, this paper proposes a fast flow matching method that enables accurate trajectory prediction and better generalization for real-time performance, where trajectory generation errors and encoded observations are used to guide the training direction. For the second challenge, due to the high dynamics of the human-robot system and the strong coupling between perception and control, simple control methods struggle to achieve efficient assistance based on the predicted trajectory. This paper utilizes model predictive control and designs a novel optimization objective to optimize torque, ensuring the exoskeleton achieves comfortable and robust assistance. By integrating the above two components, the unified policy, denoted as ExoTraj, is developed to enable adaptive assistance in complex outdoor scenarios without high data acquisition cost. Experimental results show that compared to traditional methods, ExoTraj reduces cross-subject prediction error by 14.0% during the online phase and maintains robustness against external noise. Relative to the zero torque condition, ExoTraj decreases metabolic rate by 11.5-24.4%, heart rate by 1.7-19.5%, and peak muscle activation levels by 10.9-41.3%, respectively.

2606.16657 2026-06-16 eess.SP cs.RO 交叉投稿

Towards mm-Level Accurate UWB Radar: High-Accuracy Phase-Based Obstacle Detection through Multi-Channel Fusion

迈向毫米级精度的UWB雷达：通过多通道融合实现基于相位的高精度障碍物检测

Jelle De Moerloose, Adnan Shahid, Eli De Poorter

AI总结提出一种在无源UWB雷达中利用相位信息进行距离估计的框架，通过多通道融合实现厘米级精度，中位误差1.69 cm，比仅用幅度的方法提升显著。

Comments 13 pages, Submitted to IEEE Transactions On Wireless Communications

AI中文摘要

对于自主导引车、机器人及环境表征等应用，使用超宽带（UWB）雷达进行精确、无标签的距离估计至关重要。对于基于标签的定位系统，基于相位的UWB信号处理技术已展现出亚波长测距精度，但这些方法不适用于无源（无标签）雷达设置，因为其反射弱、多径条件复杂且缺乏已知的飞行时间（ToF）首径参考。本文首次证明，在完全无源的UWB雷达设置中可以有效利用相位信息。我们提出一种信号处理框架，通过将基于幅度的粗估计与跨多个频率通道的高分辨率相位变化相结合，提取可靠的距离信息。通过参考视距分量的相位测量，该方法补偿了硬件引起的相位漂移，而多通道频率分集的使用则能够消除周期性相位信息的模糊性，并提高对特定频率信道退化（如菲涅尔区）的鲁棒性。所提方法在配备使用DW3000设备的双基地UWB雷达的机器人上进行了验证，并在真实的金属工业环境中进行了评估。实验结果表明，我们的工作即使在高速下也能持续达到厘米级精度，中位误差为1.69 cm，显著优于仅依赖幅度信息的现有约10 cm精度的UWB雷达方法。我们进一步展示了多通道融合如何利用不相关的信道退化，相比单通道操作将误差降低超过40%，并概述了如何将相位建模与融合推向亚厘米级精度。

英文摘要

Accurate, tag-free distance estimation with ultrawideband (UWB) radar is essential for applications such as autonomous guided vehicles, robotics, and environment characterization. For tag-based localization systems, phase-based UWB signal processing techniques have demonstrated sub-wavelength ranging precision, but these approaches are not applicable to passive (tagless) radar setups with weak reflections, mixed multipath conditions, and the absence of a known time-of-flight (ToF) first-path reference. This paper demonstrates for the first time that phase information can be effectively exploited in a fully passive UWB radar setting. We introduce a signal processing framework that extracts reliable distance information by combining coarse amplitude-based estimates with high-resolution phase changes across multiple frequency channels. By referencing phase measurements to the line-of-sight component, the method compensates for hardware-induced phase drift, while the use of multichannel frequency diversity enables disambiguation of periodic phase information and improves robustness against frequency-specific channel degradation such as Fresnel zones. The proposed approach is validated on a robot equipped with a bistatic UWB radar using DW3000 devices and evaluated in a realistic metallic industrial environment. Experimental results show that our work consistently achieves centimeter-level accuracy even at high speeds, with a median error of 1.69 cm, significantly outperforming existing UWB radar approaches with ~10 cm accuracy that rely only on amplitude information. We further show how multi-channel fusion exploits uncorrelated channel degradation to reduce the error by more than 40% compared to single-channel operation, and outline how phase modeling and fusion can be pushed toward sub-centimeter accuracy.
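The idea of resolving periodic phase with a coarse estimate and several frequency channels can be sketched in a few lines. The snippet below is a simplified illustration and not the authors' framework: it assumes ideal, drift-free phases modeled as 2π · (path length) / λ and runs a brute-force search around the coarse estimate. The function name, the search window, and the two carrier frequencies are assumptions made for this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def refine_path_length(coarse_m, phases_by_freq, window_m=0.08, step_m=1e-4):
    """Return the path length near coarse_m that best matches all channel phases.

    phases_by_freq maps carrier frequency (Hz) to measured phase (rad).
    Illustrative brute-force search, not the method from the paper.
    """
    n_steps = int(round(2 * window_m / step_m))
    best_len, best_cost = coarse_m, float("inf")
    for k in range(n_steps + 1):
        cand = coarse_m - window_m + k * step_m
        # Wrapped phase mismatch summed over channels (0 when all phases agree).
        cost = sum(
            1.0 - math.cos(2 * math.pi * cand * f / C - phi)
            for f, phi in phases_by_freq.items()
        )
        if cost < best_cost:
            best_len, best_cost = cand, cost
    return best_len


# Synthetic check: ideal phases for a 3.2174 m path on two assumed carriers.
true_len = 3.2174
freqs = (6.4896e9, 7.9872e9)
phases = {f: (2 * math.pi * true_len * f / C) % (2 * math.pi) for f in freqs}
estimate = refine_path_length(3.27, phases)  # coarse guess is about 5 cm off
```

A single channel would leave several equally good candidates inside the search window, one per wavelength; requiring both channels to agree is what removes that ambiguity in this toy setting.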


2511.06998 2026-06-16 cs.RO updated version

Raspi$^2$USBL: An open-source Raspberry Pi-Based Passive Inverted Ultra-Short Baseline Positioning System for Underwater Robotics

Jin Huang, Yingqiang Wang, Ying Chen

Affiliations: State Key Laboratory of Ocean Sensing, Ocean College of Zhejiang University; School of Oceanography, Shanghai Jiao Tong University

AI summary: Presents a low-cost, Raspberry Pi-based passive inverted ultra-short baseline positioning system that localizes underwater using a passive acoustic receiver and an active beacon; tests in an anechoic tank, a freshwater lake, and the open sea show 0.1% slant-range accuracy and 0.1° bearing accuracy.

Abstract

Precise underwater positioning remains a fundamental challenge for underwater robotics because global navigation satellite system (GNSS) signals cannot penetrate the sea surface. This paper presents Raspi$^2$USBL, a Raspberry Pi-based passive inverted ultra-short baseline (piUSBL) positioning system that provides a low-cost, accessible, and reproducible platform for underwater robotic research. The system consists of a passive acoustic receiver and an active beacon. The receiver integrates a hydrophone array, multichannel preamplifier, oven-controlled crystal oscillator (OCXO), Raspberry Pi 5, and MCC-series data acquisition (DAQ) board. The beacon integrates a matching network, power amplifier, and transmitting transducer. An open-source C++ framework supports clock synchronization and triggering for one-way travel-time (OWTT) messaging, while performing matched filtering, array beamforming, and adaptive gain control to estimate the time of flight (TOF) and direction of arrival (DOA). The system was validated in an anechoic tank, a freshwater lake, and open-sea trials. Results demonstrate a slant-range accuracy better than 0.1%, a bearing accuracy within 0.1°, and stable performance over distances up to 1.3 km. These findings show that low-cost, system-level reproducible hardware can deliver research-grade underwater positioning accuracy. By releasing the software framework and providing a reproducible hardware architecture, Raspi$^2$USBL offers a reference platform that lowers the entry barrier for underwater robotics laboratories and promotes reproducible research in underwater acoustic navigation and swarm robotics.
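As a rough worked example of what the reported figures mean in metres, the sketch below converts a one-way travel time and a direction of arrival into a relative position, then scales the stated 0.1% and 0.1° bounds to the 1.3 km maximum range. The nominal sound speed of 1500 m/s, the function names, and the axis convention are assumptions for illustration; none of this is taken from the released C++ framework.

```python
import math

SOUND_SPEED = 1500.0  # m/s, assumed nominal value; real sound speed varies


def slant_range(tof_s, c=SOUND_SPEED):
    """One-way travel time to slant range, assuming synchronized clocks."""
    return c * tof_s


def relative_position(range_m, bearing_deg, elevation_deg):
    """Spherical (range, bearing, elevation) to x, y, z (assumed convention)."""
    b, e = math.radians(bearing_deg), math.radians(elevation_deg)
    horiz = range_m * math.cos(e)
    return (horiz * math.cos(b), horiz * math.sin(b), range_m * math.sin(e))


r = slant_range(1300.0 / SOUND_SPEED)        # travel time for a 1.3 km path
range_err = 0.001 * r                        # 0.1% of the slant range
cross_err = r * math.tan(math.radians(0.1))  # effect of a 0.1 degree bearing error
```

Under these assumptions, the stated bounds correspond to about 1.3 m along the line of sight and about 2.3 m across it at 1.3 km.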


2604.00768 2026-06-16 cs.RO cs.HC updated version

SimWeaver: Zero-Shot RGB Sim-to-Real for Deformable Manipulation

Wenkang Hu, Haoran Wang, Yitong Li, Liu Liu, Mengao Zhao, Lai Jiang, Xincheng Tang, Junhang Wei, Zhengjie Shu, Zhendong Wang, Zhizhong Su, Huamin Wang, Ruigang Yang

Affiliations: Shanghai Jiao Tong University; Horizon Robotics; Style3D Research

AI summary: Presents the SimWeaver framework, which trains zero-shot RGB VLA policies on 200 simulated demonstrations per task and reaches 91% average real-world success across 5 deformable tasks, without teleoperation or per-task calibration.

Abstract

RGB sim-to-real for deformable manipulation has remained largely unsolved without real-world fine-tuning. We present SimWeaver, which trains zero-shot RGB VLA policies on 200 simulated demonstrations per task, reaching above 80% per-task and 91% average real-world success across 5 diverse deformable tasks including plastic-bag manipulation, without teleoperation or per-task calibration. SimWeaver combines a reliable measurement-backed simulator (SimWeaver-Sim) with an extensible asset framework supporting single-image generation (SimWeaver-Asset), a deterministic topology-aware trajectory synthesizer (SimWeaver-Syn), and a sim-to-real protocol with ISP-aware photometric augmentation (SimWeaver-Real). On silk grasping, the sim-trained policy reaches 100% under visual distribution shifts where real-data baselines drop to 9-70%, at two orders of magnitude lower per-trajectory cost. We will release SimWeaver and a representative asset subset. Project page: https://simweaver.github.io/


2606.15431 2026-06-16 cs.RO new submission

A Corridor-Scale CARLA-VISSIM Co-Simulation Framework for Multi-Intersection Urban Traffic

Sima Ashayer, Austin Haris, Mina Sartipi

Affiliations: University of Tennessee at Chattanooga

AI summary: Presents a bidirectional, step-synchronized CARLA-VISSIM co-simulation framework that couples microscopic traffic logic with high-fidelity 3D rendering, validated on a corridor of roughly 15 intersections along MLK Boulevard in Chattanooga, Tennessee, and supporting mixed-authority, perception-ready corridor-level traffic studies.

Abstract

This paper presents an implemented CARLA-VISSIM co-simulation framework for an urban corridor comprising approximately fifteen connected intersections centered on Martin Luther King Jr. Boulevard in Chattanooga, Tennessee. The system integrates CARLA 0.10.0 (Unreal Engine 5) with PTV VISSIM 2026 through a bidirectional, step-synchronized interface that couples VISSIM's microscopic vehicle, pedestrian, and signal-controller logic with CARLA's high-fidelity 3D rendering. A LiDAR-derived elevation model and RoadRunner-based High Definition (HD) map provide terrain-accurate road geometry deployed consistently across both simulators. The framework incorporates explicit actor ownership, mirrored lifecycle management, coordinate reconciliation, and a latest-state-per-actor update policy, enabling stable interaction between VISSIM-controlled traffic and a CARLA-controlled ego vehicle. A corridor-scale case study demonstrates consistent traffic-signal mirroring, synchronized vehicle-pedestrian interactions, and stable mixed-authority operation under peak loads of approximately 100 vehicles and 100 pedestrians. The deployment captures the interaction of the five signalized intersections along MLK Street and their connecting upstream and downstream intersections, revealing synchronization challenges unique to multi-intersection corridors. Results indicate that this MLK-centered corridor provides an effective testbed for verifying cross-simulator consistency and that the proposed architecture supports reliable, perception-ready co-simulation for corridor-level traffic studies.


2606.15930 2026-06-16 cs.RO cs.AI new submission

ControlMap: Controllable High-Definition Map Generation for Traffic Scenario Simulation

Marwan Farag, Steffen Wäldele, Yu Yao

Affiliations: University of Stuttgart; Robert Bosch GmbH; Motional, Inc

AI summary: Presents a data-driven pipeline based on latent diffusion and ControlNet for controllable HD map generation, supporting spatial guidance, adjustable conditioning strength, and city-level style transfer, and introduces new metrics for adherence to the control signal and map realism.

Abstract

Simulation is central to validating autonomous driving systems, yet current pipelines are limited by insufficient scenario diversity due to costly High Definition (HD) map creation. Scaling HD maps requires expensive data collection and manual processing. Moreover, existing generative models lack the fine-grained control necessary to target specific road topologies during generation. This paper presents a data-driven pipeline for controllable HD map generation using latent diffusion and ControlNet for spatial conditioning. To our knowledge, we are the first to inject spatial guidance signals into a diffusion model for HD map synthesis. Furthermore, our model supports adjustable conditioning strength through classifier-free guidance and city-level style transfer via city label conditioning. To complement existing metrics, we introduce two novel metrics to evaluate adherence to the control signal and similarity to ground-truth maps. Experiments demonstrate that our model generates realistic HD maps that faithfully follow input road topologies while accurately preserving city-specific details.
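Classifier-free guidance, which the abstract names as the mechanism for adjustable conditioning strength, is commonly written as an extrapolation between an unconditional and a conditional noise prediction. The sketch below shows that standard combination rule on plain Python lists. The function name and toy values are illustrative, and how ControlMap connects this to its ControlNet branch is not described in the abstract.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_u = [0.0, 1.0, -2.0]  # toy unconditional prediction
eps_c = [1.0, 1.0, 0.0]   # toy conditional prediction
no_guidance = cfg_combine(eps_u, eps_c, 0.0)  # equals eps_u
plain_cond = cfg_combine(eps_u, eps_c, 1.0)   # equals eps_c
stronger = cfg_combine(eps_u, eps_c, 2.0)     # extrapolates past eps_c
```

A scale of 0 ignores the condition, 1 reproduces the conditional prediction, and larger values push the sample further toward the condition.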


2606.16208 2026-06-16 cs.RO new submission

ATHENA: Accelerated Multi-Task Heterogeneous Influence Functions for Robot Data Curation

Tao Xu, Jiaxin Wang, Runhao Zhang, Jiayi Guan, Xianchao Zeng, Weixi Song, Xinyu Zhou, Zhetao Chen, Guang Chen, Yong-Lu Li

Affiliations: Tongji University; Shanghai Innovation Institute; Xi'an Jiaotong University; Shanghai Jiao Tong University

AI summary: Presents ATHENA, which accelerates influence-function computation using the Kronecker structure of gradients and a rank-r random truncated approximation, enabling data curation for multitask VLA models; in simulated and real-robot tasks it matches or exceeds full-data fine-tuning with less data.

Abstract

In robot imitation learning, influence functions provide a principled approach to quantify each demonstration's effect on robot task outcomes, yet scaling them to billion-parameter Vision-Language-Action (VLA) models is limited by computational and multitask bottlenecks. To this end, we propose ATHENA, an influence function framework tailored for multitask VLA data curation at a billion-parameter scale. Concretely, it leverages the Kronecker structure of linear-layer gradients to reduce projection cost, and approximates dense Hessian inversion with a rank-r Random Truncated Approximation, achieving about a 313.4x speedup in influence computation. Furthermore, ATHENA formulates global and local interactive influence to balance data curation across 50 jointly trained tasks. Extensive evaluations on RoboTwin 2.0 and real-robot deployment, covering 9.34 and 6.90 hours of demonstrations, respectively, show that ATHENA matches or exceeds full-data joint fine-tuning using only 50% of demonstrations in simulation and 66.7% of data across six real-robot tasks. Overall, ATHENA demonstrates its effectiveness for data curation in billion-parameter multitask VLA fine-tuning.
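The Kronecker structure mentioned in the abstract comes from the fact that, for a linear layer y = Wx, the per-sample gradient with respect to W is the outer product of the output-side gradient and the input. One consequence, sketched below, is that the inner product of two such gradients factorizes into two small dot products, so the full gradient matrix never has to be formed. This shows the general identity only; ATHENA's projection and Hessian approximation are not reproduced, and all names are illustrative.

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def grad_inner_factored(delta1, x1, delta2, x2):
    """<delta1 x1^T, delta2 x2^T> computed as (delta1 . delta2) * (x1 . x2)."""
    return dot(delta1, delta2) * dot(x1, x2)


def grad_inner_dense(delta1, x1, delta2, x2):
    """Same quantity via explicit flattened outer products, for comparison only."""
    g1 = [d * x for d in delta1 for x in x1]
    g2 = [d * x for d in delta2 for x in x2]
    return dot(g1, g2)


d1, x1 = [1.0, -2.0], [0.5, 1.0, 2.0]
d2, x2 = [3.0, 1.0], [2.0, 0.0, 1.0]
fast = grad_inner_factored(d1, x1, d2, x2)
slow = grad_inner_dense(d1, x1, d2, x2)
```

For a layer with m outputs and n inputs, the factored form costs on the order of m + n multiplications instead of m · n, which is the kind of saving that matters at billion-parameter scale.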


2606.16447 2026-06-16 cs.RO cs.AI new submission

Self-Driving Negotiator: An Interactive, Verifiable Benchmark for Social Negotiation and Theory of Mind under Hidden Intent

Ashutosh Kumar

Affiliations: Owl Autonomous Imaging, Inc

AI summary: Presents a text-based, multi-turn, procedurally generated environment for measuring implicit social coordination in autonomous driving, which depends on inferring hidden intent; rewards and diagnostics are computed from privileged simulator state, and the best current model reaches an average success rate of only 0.68.

Abstract

Autonomous driving is full of tiny social negotiations: a driver presses forward, another yields, a pedestrian fakes toward the curb, or a lane vehicle chooses whether to open a merge gap. Such interactions require inferring hidden intent from behavior under partial observability and then acting safely and efficiently. Existing autonomous-driving language benchmarks mostly focus on perception, visual question answering, or open-loop planning, while existing language-agent negotiation benchmarks typically make the negotiation explicit in text. Self-Driving Negotiator bridges the gap between the two: a text-only, multi-turn, procedurally generated environment for measuring implicit social coordination in driving. Agents generate specific driving actions. Reward and diagnostics are computed from the privileged simulator state, not from the explanation of the model. This report covers task design, reward and anti-gaming invariants, validated scenarios, non-LLM baselines, and a six-model inference leaderboard. Current models are far removed from the scripted expert. The best average success rate across three scenarios is 0.68; contested merge is statistically flat across models; and difficulty tiers separate cue-following from true wait-for-commitment behavior.


2606.15522 2026-06-16 cond-mat.mtrl-sci cs.RO cross-list

NIMO: A Software Platform for Closed-Loop Materials Exploration with Diverse AI Algorithms

Ryo Tamura, Naruki Yoshikawa, Koji Tsuda, Shoichi Matsuda

Affiliations: Center for Basic Research on Materials, National Institute for Materials Science; Graduate School of Frontier Sciences, The University of Tokyo; Center for Green Research on Energy and Environmental Materials, National Institute for Materials Science

AI summary: Presents NIMO, an open-source platform that integrates 12 AI algorithms through modular AI-robot decoupling, a candidate-pool architecture, and a unified interface, enabling autonomous materials exploration across laboratories.

Comments 29 pages, 5 figures

Abstract

Self-driving laboratories (SDLs), where artificial intelligence proposes subsequent experiments and robotic systems execute them, are rapidly becoming the vanguard of materials discovery. A critical bottleneck, however, lies in seamlessly bridging diverse AI algorithms tailored for specific exploration goals with the heterogeneous robotic hardware found across different laboratories. Here, we present NIMO, an open-source software platform designed to dissolve this barrier through three core paradigms: a modular AI-robot decoupling mediated via simple CSV file exchange, a discrete candidate-pool architecture that seamlessly absorbs domain knowledge, and a unified Python interface pre-loaded with twelve distinct AI algorithms. In this Perspective, we review the operational principles of each algorithm alongside six diverse SDL implementations driven by NIMO, covering electrolyte discovery, organic synthesis, thin-film exploration, fuel-cell process informatics, coffee-ring phase exploration, and legacy liquid-handling automation. One of these also demonstrates NIMO's seamless interoperability with the IvoryOS orchestration framework. To democratize autonomous science, we also introduce a no-code desktop application that enables intuitive, human-in-the-loop exploration for non-programmers. NIMO is freely available at https://github.com/NIMS-DA/nimo, offering a versatile, plug-and-play foundation to accelerate autonomous materials exploration across diverse experimental landscapes.


2606.16202 2026-06-16 cs.CV cs.AI cs.RO cross-list

EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

Hyunjin Kim, Ri-Zhao Qiu, Guangqi Jiang, Xiaolong Wang

Affiliations: UC San Diego

AI summary: Presents EgoPhys, a framework that uses generalizable priors to build physical digital twins of deformable objects from egocentric RGB video; it predicts spring stiffness fields without test-time optimization and outperforms baselines in reconstruction, future prediction, and zero-shot generalization.

Comments Project Page: https://hjhyunjinkim.github.io/EgoPhys

Abstract

Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys overcomes the limitations of existing methods to enable controllable deformable digital twin generation from egocentric videos by distilling per-object inverse-physics solutions into a compact codebook, enabling prediction of dense spring stiffness fields for unseen objects without per-spring test-time optimization. Trained with generalizable priors from diverse egocentric interactions, EgoPhys outperforms baselines in reconstruction, future prediction, and zero-shot generalization. To support training and evaluation, we curate an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. We deploy EgoPhys on a real xArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation to aid in deformable-object planning, highlighting egocentric RGB observations as a scalable path toward real-to-sim pipelines.


2509.14548 2026-06-16 cs.RO cs.HC updated version

SimCoachCorpus: A naturalistic dataset with language and trajectories for embodied teaching

Emily Sumner, Deepak E. Gopinath, Laporsha Dees, Patricio Reyes Gomez, Xiongyi Cui, Andrew Silva, Jean Costa, Allison Morgan, Mariah Schrum, Tiffany L. Chen, Avinash Balachandran, Guy Rosman

Affiliations: Toyota Research Institute (Cambridge, MA 02139; Los Altos, CA 94022)

AI summary: To fill the gap in datasets that combine language and physical action in embodied interaction, the authors build SimCoachCorpus, which contains driving data from 29 participants in a race car simulator (15 coached, 14 uncoached) with synchronized vehicle state, coach feedback, and cognitive load, for studying motor learning and linguistic phenomena and for training teaching models.

Comments This is an extended version of a paper accepted to KDD Datasets & Benchmarks Track 2026

Abstract

High-quality curated datasets are essential for training and evaluating AI approaches, but are often lacking in embodied interactive domains where language and physical action are intertwined. In particular, few datasets capture how people acquire motor skills in embodied tasks through verbal instruction over time. To address this gap, we introduce SimCoachCorpus: a unique dataset of race car simulator driving that enables the investigation of rich phenomena during guided and unguided motor skill acquisition. In this dataset, 29 humans were asked to drive in a driving simulator around a race track for approximately ninety minutes. Fifteen participants received one-on-one instruction from a professional performance driving coach, and 14 participants drove without coaching instruction. SimCoachCorpus includes features such as vehicle state and inputs, map (track boundaries and race-line), and cone landmarks. Additionally, these are synchronized with the coach's concurrent verbal feedback and additional terminal feedback at the end of each lap. We also provide high-quality annotations of high-level coaching categories for each concurrent feedback utterance, ratings on students' compliance with coaching advice, and self-reported cognitive load and emotional state of participants (gathered from surveys during the study). The final dataset includes over 20,000 concurrent feedback utterances, over 400 terminal feedback utterances, and over 40 hours of interactive driving data. Our naturalistic interactive dataset can be used to investigate motor learning dynamics, explore linguistic phenomena, and train computational models of teaching and learning. We demonstrate applications of this dataset for in-context learning, imitation learning, and topic modeling. Data is hosted at https://doi.org/10.7910/DVN/W7VTKZ and code is available at https://github.com/ToyotaResearchInstitute/sim_coach_corpus


2411.19567 2026-06-16 cs.SE cs.RO updated version

DynNPC: Finding More Violations Induced by ADS in Simulation Testing through Dynamic NPC Behavior Generation

You Lu, Yifan Tian, Dingji Wang, Bihuan Chen, Xin Peng

Affiliations: College of Computer Science and Artificial Intelligence, Fudan University

AI summary: Presents DynNPC, a framework in which NPC vehicles generate driving strategies dynamically during simulation execution based on traffic signals and ego-vehicle behavior, producing more violation scenarios induced by the autonomous driving system (ADS) and improving testing efficiency.

Comments Accepted by TOSEM 2026

Abstract

Recently, a number of simulation testing approaches have been proposed to generate diverse driving scenarios for autonomous driving systems (ADSs) testing. However, the behaviors of NPC vehicles in these scenarios generated by previous approaches are predefined and mutated before simulation execution, ignoring traffic signals and the behaviors of the Ego vehicle. Thus, a large number of the violations they found are induced by unrealistic behaviors of NPC vehicles, revealing no bugs of ADSs. Besides, the vast scenario search space of NPC behaviors during the iterative mutations limits the efficiency of previous approaches. To address these limitations, we propose a novel scenario-based testing framework, DynNPC, to generate more violation scenarios induced by the ADS. Specifically, DynNPC allows NPC vehicles to dynamically generate behaviors using different driving strategies during simulation execution based on traffic signals and the real-time behavior of the Ego vehicle. We compare DynNPC with state-of-the-art scenario-based testing approaches. Our evaluation has demonstrated the effectiveness and efficiency of DynNPC in finding more violation scenarios induced by the ADS.


2502.19544 2026-06-16 cs.LG cs.RO updated version

Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data

Yi Zhao, Aidan Scannell, Wenshuai Zhao, Yuxin Hou, Tianyu Cui, Le Chen, Dieter Büchler, Arno Solin, Juho Kannala, Joni Pajarinen

Affiliations: Aalto University; University of Edinburgh; ELLIS Institute Finland; Deep Render; Imperial College London; Max Planck Institute for Intelligent Systems; CIFAR AI Chair; University of Alberta; Alberta Machine Intelligence Institute (Amii); University of Oulu

AI summary: Uses non-curated offline data that is reward-free, of mixed quality, and collected across multiple embodiments, and addresses distribution shift with experience rehearsal and execution guidance, markedly improving the sample efficiency of online reinforcement learning.

Abstract

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two techniques: (i) experience rehearsal and (ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.


2603.05876 2026-06-16 cs.CV cs.RO updated version

Systematic Evaluation of Novel View Synthesis for Video Place Recognition

Muhammad Zawad Mahmud, Samiha Islam, Damian Lyons

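AI summary: Systematically evaluates the effect of synthesized novel views on video place recognition, finding that a small number of synthetic views improves recognition performance and that the magnitude of viewpoint change matters less than the number of views added and the type of imagery.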

Comments Submitted to IEEE IROS 2026

2606.16022 2026-06-16 cs.RO new submission

Robust Conformal CBF and CLF Controllers via Iterative Policy Updates

Omid Mirzaeedodangeh, Eliot Shekhtman, Nikolai Matni, Lars Lindemann

Affiliations: Automatic Control Laboratory, ETH Zürich; Computer and Information Science, University of Pennsylvania; Electrical and Systems Engineering, University of Pennsylvania

AI summary: Embedding conformal prediction in robust control loses safety/stability guarantees under distribution shift; this work proposes a framework that iteratively updates the policy, combining adversarially robust conformal prediction with a distribution shift budget to maintain guarantees across episodes.

Abstract

Conformal prediction (CP) has been used to obtain probabilistic bounds on the error between a learned dynamics model and the true but unknown system. Such CP bounds can then be embedded into robust control Lyapunov function (CLF) and control barrier function (CBF) frameworks. However, such an approach does not retain stability/safety guarantees because of the distribution shift between the closed-loop trajectory distribution under the deployed CLF/CBF policy and the trajectory distribution from which the CP bound and its guarantees were derived. To address this issue, we propose an episodic framework that iteratively updates the robust conformal CLF/CBF policy while maintaining stability/safety guarantees across episodes. We achieve this by (1) using adversarially robust conformal prediction, and (2) quantifying a distribution shift budget that allows us to control how much the model error can increase across policy updates. This distribution shift budget is derived via a closed-loop trajectory sensitivity analysis, yielding an implicit and an explicit update rule for the CP bound. We analyze convergence of our algorithm, which we demonstrate on three case studies. To the best of our knowledge, these are the first results that provide stability/safety guarantees for robust conformal CBF/CLF policies.
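For readers unfamiliar with how a conformal bound on model error is obtained, the sketch below shows the standard split conformal quantile: given n calibration errors, the bound is the ⌈(n+1)(1−α)⌉-th smallest value. This is the textbook construction under exchangeability, which is exactly the assumption that closed-loop distribution shift breaks. The adversarially robust variant and the shift budget from the paper are not implemented here, and the names and data are illustrative.

```python
import math


def conformal_bound(scores, alpha):
    """Split conformal quantile of nonconformity scores at miscoverage alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


# Toy calibration errors between a learned model and the true system.
errors = [0.05, 0.01, 0.09, 0.03, 0.07, 0.02, 0.08, 0.04, 0.06]
bound_80 = conformal_bound(errors, 0.2)   # 8th smallest of 9 scores
bound_99 = conformal_bound(errors, 0.01)  # not attainable with 9 scores
```

With nine calibration points, an 80% bound is finite while a 99% bound is not, which illustrates why the size of the calibration set limits the confidence level that can be certified.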


2606.16467 2026-06-16 cs.CR cs.RO cross-list

A Formal Resilience Framework for Cyber-Physical Embodied Systems under Device-Level Cyberattacks

Alberto Giaretta

Affiliations: Department of Computer Science, Örebro University, Sweden; AI, Robotics and Cybersecurity Center (ARC), Örebro University, Sweden

AI summary: Proposes a formal dependability framework that incorporates intrusion detection system information into resilience evaluation predicates, to analyze how cyberattacks affect task execution and embodiment preservation in embodied CPSs and to guide the deployment of mitigation strategies.

Comments 8 pages, 2 tables

Abstract

In cyber-physical systems (CPSs), fault tolerance is traditionally achieved by analysing sensor and actuator outputs, detecting progressive drift or sudden failures, and initiating suitable tolerance mechanisms. Reasonable under general failure models, this approach fails to capture nuanced disruptions caused by cyberattacks, which may employ subtle strategies. This is particularly critical in embodied CPSs, where computational and physical devices not only have an active role in task completion, but also in embodiment preservation (that is, maintaining the system's physical integrity). To prevent structural physical damage, embodied CPSs require a framework that enables proactive response to cyberattacks. This paper proposes a formal dependability framework that incorporates IDS information into resilience evaluation predicates, enabling assessment of tolerance to disruption and degradation. The framework supports structured reasoning about how cyberattacks affect task execution and embodiment preservation, and whether mitigation strategies must be deployed. Analytical examples demonstrate its analytical capability and soundness, establishing a theoretical foundation for dependable and secure embodied CPSs.


2601.15459 2026-06-16 cs.RO updated version

Neural Minimum-Distance Estimation for Collision-Aware Operation of Multi-Arm Laparoscopy Surgical Robots Through Learning-from-Simulation

Sarvin Ghiasi, Majid Roshanfar, Jake Barralet, Liane S. Feldman, Amir Hooshiar

发表机构 * Surgical Performance Enhancement and Robotics (SuPER) Centre, Department of Surgery（外科性能增强与机器人中心（SuPER）中心，外科部）； The Wilfred and Joyce Posluns Centre for Image Guided Innovation & Therapeutic Intervention (PCIGITI)（威廉与乔伊斯·波斯伦中心（PCIGITI）影像引导创新与治疗干预中心）； The Hospital for Sick Children (SickKids)（儿童医院（SickKids））

AI总结提出结合分析建模、实时仿真与深度残差神经网络的框架，用于多臂手术机器人最小距离估计与碰撞预警，模型在验证集上R²=0.940，RMSE=42.0 mm。

DOI: 10.3390/s26123744
Journal ref: Sensors 2026, 26(12), 3744

AI中文摘要

本研究提出了一个集成框架，通过解决多臂操纵器之间的最小距离估计和相关的碰撞感知警告，提高腹腔镜手术中机械臂的安全性和操作效率。通过结合分析建模、实时仿真和机器学习，该框架为确保机器人安全操作提供了稳健的解决方案。开发了一个分析模型，基于关节配置估计机械臂之间的最小距离，提供理论计算作为验证工具和基准。为补充这一点，创建了一个3D仿真环境，模拟两个7自由度Kinova机械臂（Kinova inc., Boisbriand, QC, Canada），生成了用于距离估计和碰撞警告的多样化配置数据集。利用这些见解，训练了一个以关节配置为输入的深度残差神经网络模型。在保留的验证集上，模型达到了R²=0.940，RMSE=42.0 mm，MAE=28.7 mm，且平均偏差接近零，展示了强大的预测准确性和在整个工作空间中的一致泛化能力。该框架旨在作为早期碰撞警告层，当预测的臂间距离低于0.2 m阈值时触发警告，考虑到Kinova Gen3（Kinova inc., Boisbriand, QC, Canada）的横截面半径，这对应于大约50 mm的表面到表面间隙。这项工作展示了将分析建模与机器学习相结合以提高多臂机器人系统精度和可靠性的有效性。

英文摘要

This study presents an integrated framework for enhancing the safety and operational efficiency of robotic arms in laparoscopic surgery by addressing minimum distance estimation between multi-arm manipulators and the associated collision-aware warning. By combining analytical modeling, real-time simulation, and machine learning, the framework offers a robust solution for ensuring safe robotic operations. An analytical model was developed to estimate the minimum distances between robotic arms based on their joint configurations, offering theoretical calculations that serve as both a validation tool and a benchmark. To complement this, a 3D simulation environment was created to model two 7-DOF Kinova robotic arms (Kinova inc., Boisbriand, QC, Canada), generating a diverse dataset of configurations for distance estimation and collision warning. Using these insights, a deep residual neural network model was trained with joint configurations as inputs. On the held-out validation set, the model achieves R² = 0.940, RMSE = 42.0 mm, MAE = 28.7 mm, and a near-zero mean bias, demonstrating strong predictive accuracy and consistent generalization across the workspace. The framework is intended as an early collision warning layer, where a warning is triggered when the predicted inter-arm distance falls below a 0.2 m threshold, which corresponds to a surface-to-surface clearance of approximately 50 mm given the Kinova Gen3 (Kinova inc., Boisbriand, QC, Canada) cross-sectional radius. This work demonstrates the effectiveness of combining analytical modeling with machine learning to enhance the precision and reliability of multi-arm robotic systems.
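
The relation between the 0.2 m warning threshold and the roughly 50 mm surface-to-surface clearance can be sketched as simple arithmetic. This is an illustration only: it assumes both links are modeled as cylinders of equal radius, and the 0.075 m radius is a hypothetical value back-computed from the two figures quoted in the abstract, not a number taken from the paper or from Kinova documentation. The function names are likewise made up for this sketch.

```python
WARNING_THRESHOLD_M = 0.2  # inter-arm distance threshold quoted in the abstract


def surface_clearance(center_distance_m: float, radius_a_m: float, radius_b_m: float) -> float:
    """Clearance between two cylindrical links, given the distance between their axes."""
    return center_distance_m - (radius_a_m + radius_b_m)


def should_warn(predicted_distance_m: float, threshold_m: float = WARNING_THRESHOLD_M) -> bool:
    """Trigger the early warning when the predicted distance falls below the threshold."""
    return predicted_distance_m < threshold_m


# Hypothetical link radius (assumption): 0.2 m - 2 * 0.075 m leaves about 0.05 m.
ASSUMED_RADIUS_M = 0.075
clearance_at_threshold = surface_clearance(
    WARNING_THRESHOLD_M, ASSUMED_RADIUS_M, ASSUMED_RADIUS_M
)
```

Under these assumptions the warning fires while about 50 mm of free space still separates the link surfaces, which is the margin the abstract describes.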

2606.12978 2026-06-16 cs.RO cs.CV cs.SY eess.SY 版本更新

Trajectory-Level Redirection Attacks on Vision-Language-Action Models

针对视觉-语言-动作模型的轨迹级重定向攻击

Gokul Puthumanaillam, Vardhan Dongre, Pranay Thangeda, Hooshang Nayyeri, Dilek Hakkani-Tür, Melkior Ornik

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结本文发现VLA模型存在轨迹级漏洞：看似保留原始指令的对抗性提示，能重定向机器人最终物理结果，并提出了命令保持的轨迹重定向威胁模型和同策略（on-policy）提示搜索方法。

AI中文摘要

视觉-语言-动作（VLA）策略将自然语言引入闭环机器人控制，使机器人能够直接根据文本指令执行操作任务。同一接口使文本在控制中反复发挥作用，因为提示在每个重新规划步骤中都被重复使用，而每个以提示为条件的动作都会改变策略后续所依据的观测。现有的VLA攻击研究的是对抗性提示，这些提示诱发特定的低级动作，或使此类动作在变化的图像中持续存在。我们识别出一种更强的轨迹级故障模式：一个提示仍然“看起来”指定了预期任务，却重定向了最终的物理结果。我们在数学上将这种设置形式化为“命令保持的轨迹重定向”，这是一种仅涉及提示的威胁模型：攻击者在回合开始前选定一个提示，所有策略和环境组件保持不变，并且提示必须保持接近良性指令，同时不含目标词和纠正性语言。为了找到这样的提示，我们引入了一种同策略（on-policy）提示搜索方法，该方法利用推演（rollout）来发现扰动，使其闭环行为跟踪目标任务，同时满足命令保持约束。在仿真和硬件上的实验表明，接近良性的提示扰动可以将VLA推演重定向到攻击者指定的目标。这些结果暴露了VLA指令落地（instruction grounding）中的轨迹级漏洞：看似保留预期命令的文本仍然可以让对手控制机器人的最终物理结果。项目网站：https://vla-redirection-attack.github.io/

英文摘要

Vision-language-action (VLA) policies bring natural language into closed-loop robot control, enabling robots to execute manipulation tasks directly from text instructions. The same interface gives text a recurring role in control because the prompt is reused at every replanning step, and each prompt-conditioned action changes the future observations on which the policy acts. Existing VLA attacks study adversarial prompts that elicit targeted low-level actions or make such actions persist across changing images. We identify a stronger trajectory-level failure mode: a prompt that still "appears" to specify the intended task but redirects the final physical outcome. We mathematically formalize this setting as "command-preserving trajectory redirection", a prompt-only threat model in which the attacker chooses one prompt before the episode, all policy and environment components remain fixed, and the prompt must stay close to the benign instruction while omitting target words and correction language. To find such prompts, we introduce an on-policy prompt search method that uses rollouts to discover perturbations whose closed-loop behavior tracks a target task while satisfying the command-preserving constraints. Experiments in simulation and on hardware show that near-benign prompt perturbations can redirect VLA rollouts to attacker-specified targets. These results expose a trajectory-level vulnerability in VLA instruction grounding: text that appears to preserve the intended command can still give an adversary control over the robot's final physical outcome. Project website: https://vla-redirection-attack.github.io/

2606.14238 2026-06-16 cs.RO cs.AI 版本更新

When and How Severely: Scenario-Specific Safety Envelopes for Driving VLAs

何时以及多严重：驾驶VLA的场景特定安全包络

Abhinaw Priyadershi, Jelena Frtunikj

发表机构 * NVIDIA Corporation（英伟达公司）； NVIDIA GmbH（英伟达德国有限公司）

AI总结针对ISO 21448下VLA驾驶规划器的安全认证，提出二维安全包络方法，通过GMM识别六种严重性等级，揭示场景特定风险差异。

AI中文摘要

迈向下一代医疗：医疗具身AI在感知、决策与行动中的综述

Cheng Zhang, Qing Cai, Xingzheng Wu, Xun Yang, Xiaojun Chang, Bingkun Bao, Liqiang Nie, Xinwang Liu, Yi Yang

发表机构 * School of Information Science and Engineering, Ocean University of China（中国海洋大学信息科学与工程学院）； Innovation School of Artificial Intelligence, Hefei University of Technology（合肥工业大学人工智能创新学院）； School of Information Science and Technology, University of Science and Technology of China（中国科学技术大学信息科学技术学院）； School of Computer Science and Information Engineering, Hefei University of Technology（合肥工业大学计算机与信息工程学院）； School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)（哈尔滨工业大学（深圳）计算机科学与技术学院）； College of Computer Science and Technology, National University of Defense Technology（国防科技大学计算机科学与技术学院）； ReLER Laboratory, CCAI, Zhejiang University（浙江大学CCAI ReLER实验室）

AI总结本文系统综述医疗具身AI的核心组件，强调感知、决策与行动的协调集成，并分析临床实践中的挑战与未来方向。

Comments 19 pages, 9 figures

AI中文摘要

基础模型在提升医疗效率方面表现出色，广泛应用于各类医疗场景。然而，它们在感知、理解和与物理世界交互方面的能力有限，严重制约了其在真实临床工作流中的有效性，而临床工作流中安全关键的决策和物理执行紧密耦合。近年来，具身人工智能（AI）作为一种有前景的物理交互范式出现，使智能体能够在复杂医疗环境中操作。随着该领域研究的迅速扩展，理解智能体如何在临床环境中作为集成的端到端系统运行变得日益关键。然而，现有关于医疗具身AI的综述大多强调单个方面或功能组件，缺乏统一的系统级组织。为支持和巩固最新进展，我们系统调查了医疗具身AI的核心组件，特别关注感知、决策与行动的协调集成。我们进一步回顾了代表性医疗应用和相关数据集，并分析了真实临床实践中遇到的主要挑战。最后，我们讨论了这一快速发展领域未来研究的关键方向。相关项目见 https://github.com/VMVLab/Medical_Embodied_AI_Paper_List。

英文摘要

Foundation models have demonstrated impressive performance in enhancing healthcare efficiency across a wide range of medical applications. Nevertheless, their limited ability to perceive, understand, and interact with the physical world significantly constrains their effectiveness in real-world clinical workflows, where safety-critical decision-making and physical execution are tightly coupled. Recently, embodied artificial intelligence (AI) has emerged as a promising physical-interactive paradigm for intelligent healthcare, enabling agents to operate in complex medical environments. As research in this area rapidly expands, understanding how intelligent agents function as integrated, end-to-end systems in clinical environments becomes increasingly critical. However, existing surveys on medical embodied AI largely emphasize individual aspects or functional components, lacking a unified system-level organization of the field. To support and consolidate recent advances, we systematically survey the core components of medical embodied AI, with a particular emphasis on the coordinated integration of perception, decision-making, and action. We further review representative medical applications and relevant datasets, and we analyze the major challenges encountered in real-world clinical practice. Finally, we discuss key directions for future research in this rapidly evolving field. The associated project can be found at https://github.com/VMVLab/Medical_Embodied_AI_Paper_List.

2601.08514 2026-06-16 cs.RO 版本更新

Simplifying ROS2 controllers with a modular architecture for robot-agnostic reference generation

简化ROS2控制器：用于机器人无关参考生成的模块化架构

Davide Risi, Vincenzo Petrone, Antonio Langella, Lorenzo Pagliara, Enrico Ferrentino, Pasquale Chiacchio

发表机构 * Department of Information Engineering, Electrical Engineering and Applied Mathematics (DIEM), University of Salerno（信息工程、电气工程与应用数学系（DIEM），萨勒诺大学）

AI总结提出一种模块化ROS2架构，通过专用参考生成器解耦参考处理与控制逻辑，减少重复代码并提升跨平台复用性，在UR和Franka机器人上验证了可靠跟踪与流水线构建效率。

Comments 5 pages, 7 figures

AI中文摘要

本文介绍了一种新颖的ROS2模块化架构，该架构将获取、验证和插值参考所需的逻辑与跟踪这些参考的控制律解耦。该设计包含一个名为参考生成器的专用组件，它从外部节点（如规划器）接收参考（以单点或轨迹的形式），并通过现有的ros2_control链式机制以控制器的采样周期向下游控制器写入单点参考。这种分离消除了控制器中重复的参考处理代码，并提高了跨机器人平台的可重用性。我们实现了两个参考生成器：一个用于处理关节空间参考，另一个用于笛卡尔参考，以及一组新的控制器（带重力补偿的PD控制器、笛卡尔位姿控制器和导纳控制器），并在仿真和真实的Universal Robots及Franka Emika机械臂上验证了该方法。结果表明：(i) 在所有测试场景中参考均被可靠跟踪；(ii) 参考生成器减少了跨链式控制器的重复参考处理代码，有利于构建和复用复杂的控制器流水线；(iii) 控制器实现仅专注于控制律。

英文摘要

This paper introduces a novel modular architecture for ROS2 that decouples the logic required to acquire, validate, and interpolate references from the control laws that track them. The design includes a dedicated component, named Reference Generator, that receives references, in the form of either single points or trajectories, from external nodes (e.g., planners), and writes single-point references at the controller's sampling period via the existing ros2_control chaining mechanism to downstream controllers. This separation removes duplicated reference-handling code from controllers and improves reusability across robot platforms. We implement two reference generators: one for handling joint-space references and one for Cartesian references, along with a set of new controllers (PD with gravity compensation, Cartesian pose, and admittance controllers) and validate the approach on simulated and real Universal Robots and Franka Emika manipulators. Results show that (i) references are tracked reliably in all tested scenarios, (ii) reference generators reduce duplicated reference-handling code across chained controllers to favor the construction and reuse of complex controller pipelines, and (iii) controller implementations remain focused only on control laws.
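
The validate-and-interpolate step described above can be sketched in a few lines. The snippet below is a plain-Python illustration rather than the authors' component: it does not use the ros2_control API, it assumes linear interpolation in joint space, and all names (`validate`, `sample_reference`, `generate_references`) are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Waypoint = Tuple[float, Sequence[float]]  # (time in seconds, joint positions)


def validate(trajectory: List[Waypoint]) -> None:
    """Reject empty trajectories, non-increasing timestamps, and mixed dimensions."""
    if not trajectory:
        raise ValueError("trajectory is empty")
    dim = len(trajectory[0][1])
    for (t0, _), (t1, q1) in zip(trajectory, trajectory[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        if len(q1) != dim:
            raise ValueError("all waypoints must have the same dimension")


def sample_reference(trajectory: List[Waypoint], t: float) -> List[float]:
    """Linearly interpolate the trajectory at time t, clamping outside its time range."""
    times = [wp[0] for wp in trajectory]
    if t <= times[0]:
        return list(trajectory[0][1])
    if t >= times[-1]:
        return list(trajectory[-1][1])
    i = bisect_right(times, t)  # index of the first waypoint strictly after t
    t0, q0 = trajectory[i - 1]
    t1, q1 = trajectory[i]
    alpha = (t - t0) / (t1 - t0)
    return [a + alpha * (b - a) for a, b in zip(q0, q1)]


def generate_references(trajectory: List[Waypoint], period_s: float) -> List[List[float]]:
    """Emit one single-point reference per controller sampling period."""
    if period_s <= 0.0:
        raise ValueError("period_s must be positive")
    validate(trajectory)
    start = trajectory[0][0]
    steps = int(round((trajectory[-1][0] - start) / period_s))
    return [sample_reference(trajectory, start + k * period_s) for k in range(steps + 1)]
```

A downstream controller in this sketch would consume one entry of the returned list per cycle, so it needs no trajectory-handling code of its own.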

2603.24350 2026-06-16 cs.RO cs.AI cs.LG 版本更新

Evidence of an Emergent "Self" in Continual Robot Learning

持续机器人学习中涌现的“自我”证据

Adidev Jhunjhunwala, Judah Goldfeder, Hod Lipson

发表机构 * Creative Machines Lab, Department of Mechanical Engineering, Columbia University（创意机器实验室，机械工程系，哥伦比亚大学）； Creative Machines Lab, Department of Computer Science, Columbia University（创意机器实验室，计算机科学系，哥伦比亚大学）

AI总结通过比较恒定任务与持续学习下机器人的认知结构，发现持续学习机器人形成显著更稳定的不变子网络，该子网络对适应性至关重要，为量化智能系统自我概念提供原则性方法。

Comments 44 pages, 24 figures, includes supplementary materials

AI中文摘要

理解自我意识的一个关键挑战是，如何以原则性的方式量化一个智能系统是否具有“自我”概念，以及如果存在，如何将“自我”与其他认知结构区分开来。我们提出，可以通过寻找认知过程中相对于快速获得的认知技能变化较小的不变部分来隔离“自我”——因为我们的自我是我们经验中最持久的方面。我们利用这一原则分析了两种条件下机器人的认知结构：一个机器人学习恒定任务，而另一个在可变任务下进行持续学习。我们发现，经历持续学习的机器人形成了一个不变子网络，该子网络比对照组显著更稳定（p < 0.001），并且该子网络在功能上也很重要：保留它有助于适应，而破坏它会损害性能。我们在跨越运动控制和操作的三种不同机器人上验证了这一模式。

英文摘要

A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self", and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive skills - because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions: One robot learns a constant task, while a second undergoes continual learning under variable tasks. We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important: preserving it aids adaptation while damaging it impairs performance. We validate this pattern across three different robots spanning locomotion and manipulation.

2604.03386 2026-06-16 cs.RO cs.NE 版本更新

Activity-Dependent Plasticity in Morphogenetically-Grown Recurrent Networks

形态发生生长递归网络中的活动依赖可塑性

Sergii Medvid, Andrii Valenia, Mykola Glybovets

发表机构 * National University of Kyiv-Mohyla Academy（基辅-莫吉拉学院国立大学）

AI总结研究形态发生生长递归网络中Hebbian与反Hebbian可塑性的作用，发现反Hebbian可塑性显著优于Hebbian，且协同进化独立发现该模式，表明反Hebbian优势对小递归网络具有普适性。

Comments 8 pages, 6 figures. Camera-ready version; accepted at GECCO 2026 Companion (EvoSelf workshop)

DOI: 10.1145/3795101.3814700

AI中文摘要

神经架构搜索的发育方法通过自组织从紧凑基因组生长出功能网络，但所得网络以固定的生长后权重运行。我们在50,000个形态发生生长的递归控制器（CartPole和Acrobot上超过5M种配置）中刻画了Hebbian和反Hebbian可塑性，然后测试协同进化实验——其中可塑性参数编码在基因组中并与发育架构共同进化——是否独立恢复这些模式。我们的刻画显示：（1）对于胜任网络，反Hebbian可塑性显著优于Hebbian（Cohen's d = 0.53-0.64）；（2）遗憾（在最佳固定设置下损失的oracle改进比例）达到52-100%；（3）在非平稳条件下，可塑性的作用从微调转变为真正的适应。协同进化独立发现这些模式：在CartPole上，70%的运行进化出反Hebbian可塑性（p = 0.043）；在Acrobot上，进化发现接近零的eta且符号混合——与刻画完全匹配。随机RNN对照表明，反Hebbian优势对小递归网络具有普适性，但拓扑依赖程度是发育特异的：形态发生生长网络的遗憾比具有匹配拓扑统计的随机图高2-6倍。

英文摘要

Developmental approaches to neural architecture search grow functional networks from compact genomes through self-organisation, but the resulting networks operate with fixed post-growth weights. We characterise Hebbian and anti-Hebbian plasticity across 50,000 morphogenetically grown recurrent controllers (5M+ configurations on CartPole and Acrobot), then test whether co-evolutionary experiments -- where plasticity parameters are encoded in the genome and evolved alongside the developmental architecture -- recover these patterns independently. Our characterisation reveals that (1) anti-Hebbian plasticity significantly outperforms Hebbian for competent networks (Cohen's d = 0.53-0.64), (2) regret (fraction of oracle improvement lost under the best fixed setting) reaches 52-100%, and (3) plasticity's role shifts from fine-tuning to genuine adaptation under non-stationarity. Co-evolution independently discovers these patterns: on CartPole, 70% of runs evolve anti-Hebbian plasticity (p = 0.043); on Acrobot, evolution finds near-zero eta with mixed signs -- exactly matching the characterisation. A random-RNN control shows that anti-Hebbian dominance is generic to small recurrent networks, but the degree of topology-dependence is developmental-specific: regret is 2-6x higher for morphogenetically grown networks than for random graphs with matched topology statistics.
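
The difference between Hebbian and anti-Hebbian plasticity comes down to the sign of the learning rate. The sketch below assumes the plain rate-based rule Δw_ij = η · post_i · pre_j; the abstract does not spell out the exact update used in the paper, so treat the rule and the function name `plasticity_step` as illustrative assumptions. Only the symbol eta is taken from the abstract.

```python
from typing import List


def plasticity_step(
    weights: List[List[float]],
    pre: List[float],
    post: List[float],
    eta: float,
) -> List[List[float]]:
    """Return updated weights with w_ij += eta * post_i * pre_j.

    eta > 0 is Hebbian (co-active units strengthen their connection),
    eta < 0 is anti-Hebbian (co-active units weaken it),
    eta = 0 leaves the post-growth weights fixed.
    """
    return [
        [w + eta * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]
```

With co-active units (`pre = post = [1.0]`), a weight of 0.5 moves up for eta = 0.1 and down for eta = -0.1, which is the contrast the characterisation measures.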

2604.16592 2026-06-16 cs.RO cs.AI cs.CV cs.ET 版本更新

Human Cognition in Machines: A Unified Perspective of World Models

机器中的人类认知：世界模型的统一视角

Timothy Rupprecht, Pu Zhao, Amir Taherin, Arash Akbari, Arman Akbari, Yumei He, Tooba Imtiaz, Sean Duffy, Juyi Lin, Yixiao Chen, Rahul Chowdhury, Enfu Nan, Yixin Shen, Yifan Cao, Haochen Zeng, Weiwei Chen, Geng Yuan, Jennifer Dy, Sarah Ostadabbas, Xuan Zhang, David Kaeli, Edmund Yeh, Yanzhi Wang

发表机构 * Northeastern University（东北大学）； EmbodyX Inc.（EmbodyX公司）； Tulane University（杜兰大学）； Cornell University（康奈尔大学）； University of Georgia（佐治亚大学）

AI总结提出统一框架整合记忆、感知等认知功能，指出动机和元认知研究不足，并引入认识型（epistemic）世界模型新类别。

AI中文摘要

本报告通过区分先前工作在认知功能上的创新来审视世界模型。许多工作声称其世界模型具有近乎人类般的认知能力。评估这些主张需要基于人类和机器认知理论的第一原理。在迈向类人世界模型的过程中，我们提出了一个概念性的统一框架，该框架完全整合了所有认知功能（即记忆、感知、语言、推理、想象、动机和元认知），并指出现有研究的空白，以指导未来技术的发展。特别是，我们发现动机（尤其是内在动机）和元认知的研究仍然严重不足，并提出了基于主动推理和全局工作空间理论的具体方向来填补这些空白。我们还引入了认识型（epistemic）世界模型，这是一个新的类别，涵盖在结构化知识上运行的、面向科学发现的智能体框架。我们的分类法应用于视频、具身和认识型世界模型，提出了先前分类法未涉及的研究方向。

英文摘要

This report on world models distinguishes prior works by the cognitive functions they innovate on. Many works claim an almost human-like cognitive capability in their world models. Evaluating these claims requires a proper grounding in first principles from human and machine cognition theory. In moving towards human-like world models, we present a unified conceptual framework for world models that fully incorporates all the cognitive functions (i.e., memory, perception, language, reasoning, imagining, motivation, and metacognition) and identify gaps in existing research as a guide for the future state of the art. In particular, we find that motivation (especially intrinsic motivation) and metacognition remain drastically under-researched, and we propose concrete directions to address these gaps informed by active inference and global workspace theory. We also introduce epistemic world models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied to video, embodied, and epistemic world models, suggests research directions where prior taxonomies have not.

2601.08056 2026-06-16 q-bio.NC cs.RO 版本更新

The embodied brain: Bridging the brain, body, and behavior with biorealistic neuromechanical models

具身大脑：通过生物真实神经力学模型连接大脑、身体与行为

Sibo Wang-Chen, Pavan Ramdya

发表机构 * EPFL（洛桑联邦理工学院）

AI总结本文综述生物真实神经力学模型，通过将人工神经控制器嵌入模拟环境中的身体模型，揭示神经、身体与环境交互的行为控制算法，并推动神经科学、机器人学和机器学习之间的交流。

Comments 18 pages, 4 figures (including 1 graphical abstract), 1 table

AI中文摘要

动物行为反映了神经系统、身体和环境之间的相互作用。因此，必须考虑生物力学和环境背景，以理解行为控制的算法。将人工神经控制器嵌入模拟环境中的身体模型的计算模型，是用于此目的的有力工具。在这里，我们回顾了生物真实神经力学模型的进展，同时强调了即将到来的新兴机遇。我们首先展示了这些模型如何能够推断出难以通过实验测量的生物物理变量。通过系统性扰动，可以通过这些模型生成新的可实验检验的假设。然后，我们考察了神经力学模型如何促进神经科学、机器人学和机器学习之间的交流，并展示了它们在医疗保健中的应用。我们设想，将实验研究与对其神经力学替代物的主动探测相结合，将显著加速神经科学的进展。

英文摘要

Animal behavior reflects interactions between the nervous system, body, and environment. Therefore, biomechanics and environmental context must be considered to understand algorithms for behavioral control. Computational models that embed artificial neural controllers within body models in simulated environments are a powerful tool for this purpose. Here, we review advances in biorealistic neuromechanical models while also highlighting emerging opportunities. We first show how these models enable inference of biophysical variables that are difficult to measure experimentally. Through systematic perturbation of these models, one can generate new experimentally testable hypotheses. We then examine how neuromechanical models facilitate the exchange between neuroscience, robotics, and machine learning, and showcase their applications in healthcare. We envision that coupling experimental studies with active probing of their neuromechanical surrogates will significantly accelerate progress in neuroscience.

2605.25006 2026-06-16 cs.RO cs.LG cs.NE 版本更新

Convex-Neural RRT*: Fast and Reliable Learning-Guided Sampling for High-Quality Robot Path Planning

Convex-Neural RRT*：面向高质量机器人路径规划的快速可靠学习引导采样

Hichem Cheriet, Badra Khellat Kihel, Samira Chouraqui, Bara J. Emran

AI总结提出Convex-Neural RRT*算法，通过神经网络预测高质量路径附近的凸候选区域来引导采样，在多种环境中相比神经引导变体减少30-75%计算时间，路径长度平均减少约5%，成功率超99%。

DOI: 10.1109/ACCESS.2026.3703346

AI中文摘要

基于采样的机器人路径规划算法在不同障碍物配置的环境中提供了概率完备性和强经验收敛性。然而，在实践中，这些方法通常需要多次迭代才能获得高质量解。本文提出了Convex-Neural RRT*，一种增强的RRT*变体，它结合神经引导来预测高质量路径附近的信息性航点区域。从这些预测中提取凸候选区域，使规划器能够将探索集中在几何相关区域，同时保持全局探索。该算法在三种环境类型和18个基准地图上与Neural RRT*、Neural Informed RRT*、经典RRT*和LTA*进行了评估。实验结果表明，与神经引导变体相比，Convex-Neural RRT*减少了30-75%的计算时间，相对于LTA*减少了高达88-98%，同时与经典RRT*相比，平均路径长度减少了约5%，在复杂环境中改进更大。该方法在不同障碍物密度下保持了超过99%的整体成功率。这些发现表明，凸引导神经采样在计算效率和解质量之间提供了有效平衡，支持其在时间敏感的机器人导航任务中的适用性。

英文摘要

Sampling-based algorithms for robot path planning offer probabilistic completeness and strong empirical convergence properties across environments with diverse obstacle configurations. However, in practice, these methods often require many iterations to obtain high-quality solutions. This paper proposes Convex-Neural RRT*, an enhanced RRT* variant that incorporates neural guidance to predict informative waypoint regions near high-quality paths. Convex candidate regions are extracted from these predictions, enabling the planner to concentrate exploration on geometrically relevant areas while preserving global exploration. The proposed algorithm is evaluated against Neural RRT*, Neural Informed RRT*, classical RRT*, and LTA* across three environment types and 18 benchmark maps. Experimental results show that Convex-Neural RRT* reduces computation time by 30-75% compared to neural-guided variants and up to 88-98% relative to LTA*, while achieving an average path length reduction of approximately 5% compared to classical RRT*, with larger improvements observed in complex environments. The method also maintains an overall success rate above 99% across varying obstacle densities. These findings indicate that convex-guided neural sampling provides an effective balance between computational efficiency and solution quality, supporting its applicability to time-sensitive robotic navigation tasks.
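
The idea of concentrating samples in convex candidate regions while preserving global exploration can be sketched as a mixture sampler. The code below is an assumption-laden illustration, not the paper's algorithm: it works in 2D, takes the convex regions as given polygons (in the paper they come from a neural predictor), and the mixing parameter `guide_prob` and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def sample_in_convex_polygon(vertices: Sequence[Point], rng: random.Random) -> Point:
    """Uniform sample inside a convex polygon (vertices in order) via fan triangulation."""
    anchor = vertices[0]
    triangles = [(anchor, vertices[i], vertices[i + 1]) for i in range(1, len(vertices) - 1)]
    areas = [triangle_area(*tri) for tri in triangles]
    a, b, c = rng.choices(triangles, weights=areas, k=1)[0]
    u, v = rng.random(), rng.random()
    if u + v > 1.0:  # reflect the point back into the triangle
        u, v = 1.0 - u, 1.0 - v
    return (
        a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
        a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]),
    )


def guided_sample(
    regions: List[Sequence[Point]],
    bounds: Tuple[Tuple[float, float], Tuple[float, float]],
    guide_prob: float,
    rng: random.Random,
) -> Point:
    """With probability guide_prob sample a convex region, otherwise the whole workspace."""
    if regions and rng.random() < guide_prob:
        return sample_in_convex_polygon(rng.choice(regions), rng)
    (xmin, xmax), (ymin, ymax) = bounds
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
```

Keeping `guide_prob` below 1 leaves a uniform component in the sampler, which is one common way to retain the exploration that RRT*-style planners rely on.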

2602.21954 2026-06-16 physics.soc-ph cs.RO 版本更新

The Swarm Intelligence Freeway-Urban Trajectories (SWIFTraj) Dataset -- Part II: A Graph-Based Approach for Trajectory Connection

蜂群智能高速公路-城市轨迹（SWIFTraj）数据集——第二部分：基于图的方法用于轨迹连接

Xinkai Ji, Pan Liu, Ying Yang, Yu Han

发表机构 * Hong Kong University of Science and Technology - Guangzhou（香港科技大学（广州））

AI总结本文提出基于图的方法，解决无人机群轨迹连接中的时间对齐和车辆匹配问题，通过模拟和真实数据验证，实现高精度轨迹连接。

DOI: 10.26599/COMMTR.2026.9640035

AI中文摘要

在本系列论文第一部分中，我们介绍了SWIFTraj，一个通过无人机群收集的新开源车辆轨迹数据集。该数据集有两个显著特点：首先，通过连接连续无人机视频中的轨迹，提供长距离连续轨迹，最长超过4.5公里；其次，涵盖由高速公路及其连接的城市道路组成的综合交通网络。从无人机群获取如此长距离的连续轨迹具有挑战性，因为需要在多个视频之间进行准确的时间对齐，并且无人机的空间分布不规则。为解决这些挑战，本文提出了一种新颖的基于图的方法用于连接无人机群捕获的车辆轨迹。构建了一个无向图来表示灵活的无人机布局，并开发了一种基于轨迹匹配成本最小化的自动时间对齐方法，以估计视频之间的最佳时间偏移。为了关联不同视频中同一辆车的轨迹，使用匈牙利算法建立了车辆匹配表。所提出的方法在模拟和真实数据上进行了评估。真实世界实验的结果显示，时间对齐误差在三个视频帧内，对应约0.1秒，且车辆匹配的F1分数约为0.99。这些结果证明了所提出方法在解决无人机轨迹连接中的关键挑战的有效性，并突显了其在大规模车辆轨迹收集中的潜力。

英文摘要

In Part I of this companion paper series, we introduced SWIFTraj, a new open-source vehicle trajectory dataset collected using an unmanned aerial vehicle (UAV) swarm. The dataset has two distinctive features. First, by connecting trajectories across consecutive UAV videos, it provides long-distance continuous trajectories, with the longest exceeding 4.5 km. Second, it covers an integrated traffic network consisting of both freeways and their connected urban roads. Obtaining such long-distance continuous trajectories from a UAV swarm is challenging, due to the need for accurate time alignment across multiple videos and the irregular spatial distribution of UAVs. To address these challenges, this paper proposes a novel graph-based approach for connecting vehicle trajectories captured by a UAV swarm. An undirected graph is constructed to represent flexible UAV layouts, and an automatic time alignment method based on trajectory matching cost minimization is developed to estimate optimal time offsets across videos. To associate trajectories of the same vehicle observed in different videos, a vehicle matching table is established using the Hungarian algorithm. The proposed approach is evaluated using both simulated and real-world data. Results from real-world experiments show that the time alignment error is within three video frames, corresponding to approximately 0.1 s, and that the vehicle matching achieves an F1-score of about 0.99. These results demonstrate the effectiveness of the proposed method in addressing key challenges in UAV-based trajectory connection and highlight its potential for large-scale vehicle trajectory collection.
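
Time alignment by matching-cost minimization can be sketched as a search over candidate frame offsets. The snippet below is a simplified stand-in, not the paper's method: it aligns a single vehicle's 1-D position track seen in two videos, uses mean squared position difference as the cost, and assumes a 30 fps frame rate (consistent with three frames being about 0.1 s, but not stated in the abstract). All names are hypothetical.

```python
from typing import List

ASSUMED_FPS = 30.0  # assumption: three frames correspond to roughly 0.1 s


def matching_cost(ref: List[float], other: List[float], offset: int, min_overlap: int = 5) -> float:
    """Mean squared position gap, pairing ref[i] with other[i - offset]."""
    pairs = [
        (ref[i], other[i - offset])
        for i in range(len(ref))
        if 0 <= i - offset < len(other)
    ]
    if len(pairs) < min_overlap:
        return float("inf")  # too little overlap to trust the cost
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def estimate_offset(ref: List[float], other: List[float], max_offset: int) -> int:
    """Frame offset k minimizing the cost, so that other[j] aligns with ref[j + k]."""
    candidates = range(-max_offset, max_offset + 1)
    return min(candidates, key=lambda k: matching_cost(ref, other, k))


def frames_to_seconds(frames: int, fps: float = ASSUMED_FPS) -> float:
    return frames / fps
```

A full pipeline would aggregate this cost over many vehicles and over every pair of neighbouring videos in the UAV graph before the Hungarian-algorithm matching step; the sketch covers only the single-pair offset search.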

2603.11729 2026-06-16 cs.DS cs.AI cs.RO 版本更新

Adapting Dijkstra for Buffers and Unlimited Transfers

为缓冲区和无限换乘调整Dijkstra算法

Denys Katkalo, Andrii Rohovyi, Toby Walsh

发表机构 * University of Oxford（牛津大学）

AI总结本文提出Transfer Aware Dijkstra (TAD)算法，通过扫描完整行程序列而非单条边，解决了带缓冲区时间的无限换乘路径规划中传统Dijkstra过滤失效的问题，并在伦敦和瑞士网络上实现比MR快两倍以上的速度且保持最优性。

Comments v4: clarified RAPTOR description in the Background section

AI中文摘要

近年来，基于RAPTOR的算法被认为是无需预处理即可处理无限换乘路径规划的最先进技术。然而，这一地位很大程度上源于路由研究的演进，其中基于Dijkstra的解决方案被基于时间表的算法取代，而缺乏系统性的比较。在这项工作中，我们重新审视了经典的基于Dijkstra的无限换乘公共交通路由方法，并证明时间依赖Dijkstra (TD-Dijkstra) 优于MR。然而，高效的TD-Dijkstra实现依赖于在预处理期间过滤被支配的连接，这假设乘客总是可以切换到更快的连接。我们表明，当站点有缓冲时间时，这种过滤是不可靠的，因为它无法区分可以无需等待继续乘坐的在座乘客和必须遵守缓冲时间的换乘乘客。为了解决这一限制，我们引入了Transfer Aware Dijkstra (TAD)，这是一种修改后的算法，它扫描整个行程序列而不是单条边，从而正确处理缓冲时间，同时保持相对于MR的性能优势。我们在伦敦和瑞士网络上的实验表明，与MR相比，无论是否有缓冲时间，我们都可以在两个网络上实现超过两倍的速度提升，同时产生最优结果。

英文摘要

In recent years, RAPTOR-based algorithms have been considered the state of the art for path-finding with unlimited transfers without preprocessing. However, this status largely stems from the evolution of routing research, where Dijkstra-based solutions were superseded by timetable-based algorithms without a systematic comparison. In this work, we revisit classical Dijkstra-based approaches for public transit routing with unlimited transfers and demonstrate that Time-Dependent Dijkstra (TD-Dijkstra) outperforms MR. However, efficient TD-Dijkstra implementations rely on filtering dominated connections during preprocessing, which assumes passengers can always switch to a faster connection. We show that this filtering is unsound when stops have buffer times, as it cannot distinguish between seated passengers who may continue without waiting and transferring passengers who must respect the buffer. To address this limitation, we introduce Transfer Aware Dijkstra (TAD), a modification that scans entire trip sequences rather than individual edges, correctly handling buffer times while maintaining performance advantages over MR. Our experiments on the London and Switzerland networks show that we can achieve more than a twofold speedup over MR while producing optimal results on both networks, with and without buffer times.
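
Why scanning a whole trip matters when stops have buffer times can be shown with a small earliest-arrival search. This is a simplified sketch inspired by the description above, not the authors' TAD implementation: it omits footpath transfers and all preprocessing, assumes no buffer applies at the source stop, and every name and data layout is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

StopEvent = Tuple[str, int, int]  # (stop, arrival time, departure time)


def earliest_arrivals(
    trips: List[List[StopEvent]],
    buffers: Dict[str, int],
    source: str,
    start_time: int,
) -> Dict[str, int]:
    """Dijkstra over stops; boarding a trip relaxes every later stop of that trip."""
    visits: Dict[str, List[Tuple[int, int]]] = {}
    for t, trip in enumerate(trips):
        for pos, (stop, _, _) in enumerate(trip):
            visits.setdefault(stop, []).append((t, pos))

    best = {source: start_time}
    queue = [(start_time, source)]
    while queue:
        time, stop = heapq.heappop(queue)
        if time > best.get(stop, float("inf")):
            continue  # stale queue entry
        # Assumption: the buffer applies when changing vehicles, not at the source.
        ready = time if stop == source else time + buffers.get(stop, 0)
        for t, pos in visits.get(stop, []):
            if trips[t][pos][2] < ready:
                continue  # the vehicle leaves before boarding is possible
            # Seated passengers ride on, so no buffer is charged along the trip.
            for next_stop, arrival, _ in trips[t][pos + 1:]:
                if arrival < best.get(next_stop, float("inf")):
                    best[next_stop] = arrival
                    heapq.heappush(queue, (arrival, next_stop))
    return best
```

In a per-edge formulation, a passenger who reaches an intermediate stop looks the same whether they are seated or transferring, so a buffer at that stop would wrongly block them from continuing on the vehicle they are already on. Relaxing the full trip at boarding time avoids that confusion.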

2509.16370 2026-06-16 math.OC cs.MS cs.RO cs.SY eess.SY 版本更新

Dual-Regularized Riccati Recursions for Interior-Point Optimal Control

用于内点最优控制的对偶正则化Riccati递归

João Sousa-Pinto, Dominique Orban

发表机构 * IMT School for Advanced Studies, Lucca（利卡大学高级研究学院）

AI总结本文提出对偶正则化线性二次调节器问题的闭式扩展Riccati递归，通过顺序和并行方法分别在O(N)和O(log N)时间内求解，并证明在满足特定惯性条件时，每个非零原始步都是增广障碍-拉格朗日价值函数（merit function）的下降方向。

AI中文摘要

我们推导了顺序和并行Riccati递归的闭式扩展，用于求解对偶正则化线性二次调节器(LQR)问题，分别具有O(N)顺序时间和O(log(N))并行时间。我们展示，当使用正则化原始-对偶内点方法、通过多重打靶法求解光滑、带约束、非凸的离散时间最优控制问题时，这些子问题就会出现，即使存在分阶段的等式或不等式约束，也不需要对约束雅可比矩阵施加任何秩要求。我们证明，当Newton-KKT矩阵满足某些惯性条件时，每个非零原始步都是增广障碍-拉格朗日价值函数（merit function）的下降方向。我们通过对偶正则化Riccati主元（pivot）的正定性（比标准LQR正定性要求更弱的条件）来刻画这些惯性条件，从而获得所需惯性的低成本证书。我们提供了采用MIT许可的C++和JAX实现，以及在Lean中对结果的完整形式化。我们将所提算法与领先的最优控制和非线性规划求解器在复杂轨迹优化问题上进行了基准对比，结果表明其在中等规模问题上具有竞争力，并且随着时域长度、问题维度和约束数量的增加获得显著优势。

英文摘要

We derive closed-form extensions of the sequential and parallel Riccati recursions for solving dual-regularized linear-quadratic regulator (LQR) problems, with $O(N)$ sequential time and $O(\log(N))$ parallel time, respectively. We show that these subproblems arise when using regularized primal-dual interior-point methods to solve smooth, constrained, non-convex, discrete-time optimal control problems via multiple-shooting, even in the presence of stagewise equality or inequality constraints, and without imposing any rank requirements on constraint Jacobians. We prove that, when certain inertia conditions on the Newton-KKT matrix are met, each nonzero primal step is a descent direction of an augmented barrier-Lagrangian merit function. We characterize these inertia conditions in terms of the positive-definiteness of the dual-regularized Riccati pivots (a weaker condition than the standard LQR positive-definiteness requirements), thereby yielding inexpensive certificates of the required inertia. We provide MIT-licensed implementations of our methods in C++ and in JAX, as well as a full formalization of our results in Lean. We benchmark our algorithm against leading optimal control and nonlinear programming solvers on complex trajectory optimization problems, establishing competitive performance on moderate problems and substantial gains as the horizon length, problem dimension, and constraint count increase.
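
For background, the standard sequential Riccati recursion that the paper extends can be written out for a scalar system. The sketch below is the textbook unregularized LQR recursion, not the dual-regularized version derived in the paper; the function name and the scalar setting are simplifications chosen for illustration. The positivity check on the pivot mirrors, in the simplest case, the role the abstract assigns to positive-definite Riccati pivots.

```python
from typing import List, Tuple


def riccati_backward(
    a: float, b: float, q: float, r: float, q_final: float, horizon: int
) -> Tuple[float, List[float]]:
    """Backward pass for x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.

    Returns the cost-to-go coefficient P_0 and the gains K_0..K_{N-1},
    where the optimal control is u_k = -K_k * x_k. Runs in O(N) time.
    """
    p = q_final
    gains: List[float] = []
    for _ in range(horizon):
        pivot = r + b * b * p  # scalar analogue of the Riccati pivot
        if pivot <= 0.0:
            raise ValueError("non-positive Riccati pivot")
        k = a * b * p / pivot
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # gains were collected from the last stage to the first
    return p, gains
```

For a = b = q = r = 1 the recursion converges to the golden ratio, the positive root of P² − P − 1 = 0, which gives a quick sanity check on the implementation.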

1. 机器人学习与模仿强化学习 25 篇

TacStyle: Personalizing Tactile Robot Policies using Structured Behavior Representations

Rethinking Implicit Spatial Representation in Visuomotor Policy Learning

A Hybrid Model-Based and Model-Free Framework for Active Multi-View Viewpoint Optimization in Sonar Target Recognition

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

Perfect Demo Makes Poor Teacher: Learning Robust Alignment from Critical Motion Segments

Learning New Tasks via Reusable Skills: Skill-Compositional Experts for Embodied Continual Learning

Scaling Short-Term Memory of Visuomotor Policies for Long-Horizon Tasks

RHO: Your Coding Agent is Secretly a Roboticist

Steering Generative Reinforcement Learning into Stable Robotic Controller

Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning

LOPAL: Local Performance-Aware Active Learning from Imperfect Demonstrations

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining

QPILOTS: Efficient Test-Time Q-Steering for Flow Policies

FlowMPC: Improving Flow Matching policies with World Models

Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning

ReMoBot: Retrieval-Based Few-Shot Imitation Learning for Mobile Manipulation with Vision Foundation Models

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

Latent Action Pretraining Through World Modeling

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

GuideWalk: Learning Unified Autonomous Navigation and Locomotion for Humanoid Robots across Versatile Terrains

EV-WM: Event-Verified World Models for Long-Horizon Robotic Manipulation

$μ_0$: A Scalable 3D Interaction-Trace World Model

Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly

2. 运动规划、控制与动力学 20 篇

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

Computing Smooth Geodesics under Two-Sided Curvature Bounds with Applications to Robotics and Image Analysis

Exact, Efficient, and Safe Occlusion-Aware Planning Using AH-Polyhedrons

Covariance-Regulated Recursive Koopman Learning for Nonlinear Systems with Uncertain Time-Varying Dynamics

Learning Context-Aware Neural ODE Dynamics for Adaptive Robotic Control

Pixels to Proofs: Probabilistically-Safe Latent World Model Control via Parallel Conformal Robust MPC

PO-PDDL: Learning Symbolic POMDPs from Visual Demonstrations for Robot Planning Under Uncertainty

LoComposition: Terrain-Adaptive Energy-Efficient Quadruped Locomotion without Gait Priors

Energy-Efficient Arm Reaching for a Humanoid Robot via Deep Reinforcement Learning with Identified Power Models

HOLO-MPPI: Multi-Scenario Motion Planning via Hierarchical Policy Optimization

ADAPT: Analytical Disturbance-Aware Policy Training for Humanoid Locomotion

Elastic ODYN: Differentiable Optimization for Infeasible Control and Learning in Robotics

VENOM: Versatile Embodied Network for Omni-bodied Motion tracking

DIFF-IPPO: Diffusion-Based Informative Path Planning with Open-Vocabulary Belief Maps

When Should a Robot Replan? Regret-Guided Update Scheduling in Time-Varying MDPs

Impedance MPC with Patient-Torque Estimation for Knee Rehabilitation Exoskeletons

Anisotropic Template Ansätze for Robust Positive Invariance under State-Dependent Uncertainty

One-Step Model Predictive Path Integral for Manipulator Motion Planning Using Configuration Space Distance Fields

C-3TO: Continuous 3D Trajectory Optimization on Neural Euclidean Signed Distance Fields

Perceptive Behavior Foundation Model: Adapting Human Motion Priors to Robot-Centric Terrain

3. 操作、抓取与灵巧手 18 篇

Inference-time Policy Steering via Vision and Touch

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects

Seam-to-Graph Reconstruction for Garment Configuration Alignment

Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands

GeoTLM: Geometry-aware Tactile-Language Models for Contact Motion Orientation Reasoning of Dynamic Objects

A Deployment Case Study in Robotic Apparel Automation: Digital Twin Integration, Interoperability, and Workforce Enablement

TopoRetarget: Interaction-Preserving Retargeting for Dexterous Manipulation

ART-Glove: Articulated Tactile Glove for Contact-Grounded Dexterous Interaction Capture

V2P-Manip: Learning Dexterous Manipulation from Monocular Human Videos

APEX: Adaptive Policy Execution for Precise Manipulation

PATCH: Action-Chunk-Conditioned Latent Patch Innovation Monitoring for Robot Manipulation

Task-Error Residual Learning for Real-Robot Five-Ball Juggling

Human Universal Grasping

T-Rex: Tactile-Reactive Dexterous Manipulation

Phase-Localized Curation Does Not Help: A Negative Result on Per-Phase Metric Selection for Demonstration Filtering

Decoupled Object-Centric Video Understanding for Generating Robotic Manipulation Commands

AetheRock: An Arm-Worn Robot Teaching System for Force-Guided Vision-Tactile Learning

Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation

4. Navigation, Localization, and SLAM (21 papers)

Deep Learning-Based Lunar Crater Terrain Relative Navigation

VANDERER: Map-Free Exploration using Future-Aware and Visual-Curiosity-Guided Diffusion Policy

Task-Aware Environment Augmentation for Reliable Navigation via Shielded Conditional Diffusion

FARM: Find Anything using Relational Spatial Memory

FD-SLAM: Fast Dense Radar-Inertial SLAM with Frequency-Domain Loop Closure and Pose Graph Optimization

Can Causal Models Enhance Robot Navigation? Online Causal Adaptation for Real-Robot Navigation

FlashNav: Ultra-Fast Policy Training for Robot Navigation within 20 Seconds

A Smart-Scheduled Hybrid (SSH) EKF-FGO State Estimation

PolyMerge: Compressing 3D Gaussian Splats with Polytope Coverings for Provably Safe Resource-Constrained Navigation

SemGeoNav: A Safety-Guided Visual Navigation Approach with Semantic Reasoning and Geometric Planning

SGM-SLAM: Scene Graph Matching for Data-Efficient Distributed SLAM

Binary Tracking for Spatial QA and Navigation with Open Vision-Language Models

CrossMaps: Confidence-Aware Open-Vocabulary Semantic Mapping for Rover Navigation