2605.25813 2026-05-26 cs.RO (updated version)

Extending Embodied Question Answering from Perception to Decision

Xicheng Gong, Qiwei Li, Peiran Xu, Yadong Mu

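Affiliations: Peking University; XYZ Embodied AI

AI summary: Proposes EQA-Decision, a large-scale embodied question answering dataset, and RoboDecision, a baseline model. The dataset systematically covers four dimensions (static scene construction, spatial understanding, task dynamics reasoning, and instant decision), and the unified framework evaluates perception, reasoning, and action-level decision-making in embodied environments.

Comments: 11 pages, 4 figures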

Abstract

Embodied Question Answering (EQA) connects perception, reasoning, and interaction within embodied environments. However, existing datasets and benchmarks remain fragmented, each focusing on a limited subset of reasoning skills such as spatial understanding or procedural reasoning, without offering a unified large-scale framework for comprehensive evaluation. We present EQA-Decision, a large-scale embodied QA dataset that systematically covers four complementary dimensions of embodied reasoning: static scene construction, spatial understanding, task dynamics reasoning, and instant decision. The dataset contains over four million question-answer pairs with hierarchical annotations across diverse embodied scenarios. In addition, we develop RoboDecision, a strong baseline model aligned with the EQA-Decision Benchmark, providing a unified framework that jointly evaluates perception, reasoning, and action-level decision-making in embodied environments. Results demonstrate that EQA-Decision effectively benchmarks and enhances VLM capabilities in spatial and interaction reasoning, providing a solid foundation for advancing embodied intelligence research.


2605.25790 2026-05-26 cs.RO (updated version)

HoLoArm: Deformable Arms for Collision-Tolerant Quadrotor Flight

Quang Ngoc Pham, Jonas Eschmann, Yang Zhou, Alejandro Ojeda Olarte, Giuseppe Loianno, Van Anh Ho

Affiliations: Japan Advanced Institute of Science and Technology; University of California, Berkeley; New York University

AI summary: Inspired by the nodus structure of dragonfly wings, proposes HoLoArm, a quadrotor with compliant arms. Combined with a reinforcement learning control policy, it deforms passively and recovers quickly, maintaining stable flight at collision speeds up to 7.6 m/s.

Comments: 8 pages, 15 figures, 1 table. Accepted at the IEEE Robotics and Automation Letters (RA-L) and the IEEE International Conference on Robotics and Automation (ICRA), 2026

DOI: 10.1109/LRA.2026.3656783
Journal ref: IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3582-3589, March 2026

Abstract

The increasing use of drones in human-centric applications highlights the need for designs that can survive collisions and recover rapidly, minimizing risks to both humans and the environment. We present HoLoArm, a quadrotor with compliant arms inspired by the nodus structure of dragonfly wings. This design provides natural flexibility and resilience while preserving flight stability, which is further reinforced by the integration of a Reinforcement Learning (RL) control policy that enhances both recovery and hovering performance. Experimental results demonstrate that HoLoArm can passively deform in any direction, including the axial one, and recover within 0.3-0.6 s depending on the direction and level of the impact. The drone can survive collisions at speeds up to 7.6 m/s and carry a 540 g payload while maintaining stable flight. This work contributes to the morphological design of soft aerial robots with high agility and reliable safety, enabling operation in cluttered and human-shared environments, and lays the groundwork for future fully soft drones that integrate compliant structures with intelligent control.


2605.25685 2026-05-26 cs.RO (updated version)

HumanFlow -- Diffusion-Driven MAV Navigation Among Humans via Tightly-Coupled Motion Tracking, Forecasting, and Control

Simon Schaefer, Joshua Näf, Stefan Leutenegger

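Affiliations: Technical University of Munich; MCML; MIRMI; ETH Zurich

AI summary: Proposes HumanFlow, a latent diffusion model that unifies human motion tracking and forecasting and exploits 3D scene context, giving accurate and efficient motion estimates under heavy occlusion. Tight coupling with control enables collision-free MAV navigation among humans.

Comments: Accepted to Robotics: Science and Systems (RSS), 2026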

Abstract

Robust and accurate perception of humans in their 3D scene context is essential for integrating robots into everyday environments. Existing approaches, however, often fail to predict plausible and accurate human motion estimates that are consistent with the surrounding scene, especially in the presence of heavy occlusions or partial visibility. This can limit both safety and efficiency for robotic operations. We introduce HumanFlow, a latent diffusion model that unifies human motion tracking and forecasting, conditioned on the 3D scene context. We show that our human motion model produces smooth and accurate predictions under challenging conditions, including heavy occlusions, and outperforms state-of-the-art methods in tracking accuracy while being significantly more efficient. Furthermore, we show how HumanFlow's latent space can be tightly coupled with control by conditioning a flow-matching-based, approximate MPC policy on these representations. We validate our policy in simulation with real human trajectories for MAV social navigation, demonstrating superior navigation performance and remaining collision-free, even under partial observability of the human.


2605.25672 2026-05-26 cs.RO (updated version)

EXPO-FT: Sample-Efficient Reinforcement Learning Fine-Tuning for Vision-Language-Action Models

Perry Dong, Kuo-Han Hung, Tian Gao, Dorsa Sadigh, Chelsea Finn

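Affiliations: Stanford University

AI summary: Proposes EXPO-FT, a system for sample-efficient reinforcement learning fine-tuning of pretrained VLA policies. It achieves perfect performance (30/30 successes) on several high-precision manipulation tasks with an average of only 19.1 minutes of online robot data.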

Abstract

The ability to efficiently and reliably learn new tasks has been a foundational challenge in robotics. Vision-Language-Action (VLA) models have demonstrated strong generalization across diverse manipulation tasks, yet pretrained policies consistently fall short of the reliability required for real-world deployment. Reinforcement learning (RL) fine-tuning offers a promising path to bridge this gap, but existing approaches either train from scratch without fully leveraging pretrained priors, or fine-tune VLAs without achieving the sample efficiency and success rates that practical deployment demands. We present EXPO-FT, a system for stable, sample-efficient RL finetuning of pretrained VLA policies that closes this gap. Our system solves a suite of challenging manipulation tasks, including routing string lights and inserting the plug to light it up, striking a pool ball into a pocket, and inserting a flower into a wine bottle, each requiring combinations of high precision, dynamic actions, and robustness to varied initial states. Our system achieves perfect task performance (30/30 successes) across all evaluated tasks within an average of 19.1 minutes of online robot data, outperforming both prior RL-from-scratch and VLA finetuning approaches. We release an open-source codebase with the aim of facilitating broader adoption of RL finetuning of VLA models in robotics.


2605.25423 2026-05-26 cs.RO (updated version)

OPAL: Omnidirectional Path-efficient Aerial 3D expLoration

Yoga Satwik Chappidi, Avideh Zakhor

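Affiliations: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

AI summary: Proposes OPAL, a framework that replaces compute-heavy global tour planning with 360-degree yaw rotations at ambiguous branch points, giving computationally simple autonomous exploration with short paths and high coverage.

Comments: Submitted to IEEE Robotics and Automation Letters (RA-L)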

Abstract

Autonomous exploration is critical for robots mapping unknown environments. Desirable characteristics of exploration algorithms include compute efficiency and small traversed distance during the exploration process. Motivated by these, we present Omnidirectional Path-efficient Aerial 3D expLoration (OPAL), an exploration framework centered on deliberate 360-degree yaw rotation at ambiguous branch points rather than compute-heavy global tour planning. We devise multiple variants of OPAL to determine the frontier-selection strategy once the yaw pan is completed. One variant is model-free, while others use large language models (LLMs) or vision-language models (VLMs). We characterize the performance of these variants while varying the vicinity search radius to include frontiers in the selection process. Through simulations we find that although the time-consuming in-place yaw rotation increases total exploration time relative to more computationally complex baselines such as EDEN and FALCON, OPAL is computationally simpler and achieves shorter travel distances and higher coverage-versus-distance area under the curve. We also show that adjusting the frontier-selection search radius enables a tradeoff between travel distance and total exploration time. We verify our results on a Modal AI drone in two indoor environments by comparing OPAL against FALCON, and find the traveled distance for a variant of OPAL to be as much as 25% lower than that of FALCON.
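
The role of the vicinity search radius can be illustrated with a small sketch. The abstract does not give OPAL's selection logic, so the rule below (take the highest-gain frontier inside the radius, otherwise the nearest frontier) is a hypothetical stand-in, and `select_frontier`, the `gain` values, and the coordinates are all assumed for illustration.

```python
import math

def select_frontier(position, frontiers, radius):
    """Pick a frontier from a list of ((x, y, z), gain) pairs.

    Hypothetical rule, not OPAL's actual one: among frontiers within
    `radius` of the robot, take the one with the largest gain; if none
    lies inside the radius, fall back to the nearest frontier.
    """
    if not frontiers:
        return None
    nearby = [f for f in frontiers if math.dist(position, f[0]) <= radius]
    if nearby:
        return max(nearby, key=lambda f: f[1])
    return min(frontiers, key=lambda f: math.dist(position, f[0]))

# Three frontiers at increasing distance with increasing (assumed) gain.
frontiers = [((1.0, 0.0, 0.0), 2.0), ((3.0, 0.0, 0.0), 9.0), ((8.0, 0.0, 0.0), 20.0)]
origin = (0.0, 0.0, 0.0)
small = select_frontier(origin, frontiers, radius=1.5)  # only the closest qualifies
large = select_frontier(origin, frontiers, radius=5.0)  # a farther, richer one wins
```

In this toy setting a small radius keeps the robot local, while a larger radius admits farther frontiers with more gain, which is one way a radius can trade travel distance against other objectives.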


2605.25414 2026-05-26 cs.RO (updated version)

How to Mitigate the Distribution Shift Problem in Robotics Control: A Robust and Adaptive Approach Based on Offline to Online Imitation Learning

Hyung-Suk Yoon, Seung-Woo Seo

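Affiliations: Department of Electronic and Computer Engineering, Seoul National University, Seoul, South Korea

AI summary: Proposes a robust-offline-to-adaptive-online imitation learning framework that mitigates distribution shift by using a discriminator to broaden state-action coverage in the offline phase and self-supervised imitation learning in the online phase.

Comments: 8 pages, 2 figures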

Abstract

Distribution shift in imitation learning refers to the problem that the agent cannot plan proper actions for a state that has not been visited during the training. This problem can be largely attributed to the inherently narrow state-action coverage provided by expert demonstrations over the full environment. In this paper, we propose a robust offline to adaptive online imitation learning framework that handles the distribution shift problem in a lifelong, multi-phase scheme. In the offline learning phase, we leverage supplementary demonstrations to broaden the state-action coverage of the policy by utilizing a discriminator to effectively train the policy with supplementary demonstrations, thereby enhancing the robustness of the policy to distribution shift. In the subsequent online inference phase, our framework detects the occurrence of distribution shift and conducts self-supervised imitation learning from online experiences to adapt the policy to the online environments. Through extensive evaluations in MuJoCo environments, we demonstrate that our method exhibits better robustness to distribution shift and better adaptation performance to online environments than the baseline algorithms, which indicates superior performance of our framework against the distribution shift.


2605.25401 2026-05-26 cs.RO (updated version)

Path Following Control System of Line-of-Sight Guidance for Robotic Dolphin with Multi-Link Mechanism in Underwater Simulator

Takumi Asada, Takao Oki, Hideo Furuhashi, Kenta Tabata, Renato Miyagusuku, Koichi Ozaki

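Affiliations: Utsunomiya University; Aichi Institute of Technology

AI summary: For a multi-link biomimetic autonomous underwater vehicle (BAUV), proposes a path-following control system based on line-of-sight guidance, and uses an underwater simulator to determine parameters and evaluate control methods.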

DOI: 10.1109/sii64115.2026.11404481
Journal ref: 2026 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2026. p. 844-849

Abstract

Biomimetic autonomous underwater vehicles (BAUVs) with multi-link mechanisms are widely used in aquatic life observation and environmental surveys because of their low power consumption and high maneuverability. An environmental survey requires a path-following system that automatically tracks specified points. However, path-following systems for BAUVs are limited, and their evaluation on multi-link mechanism robots has not yet been clarified. Because the model differs depending on the type of biomimetics, a path-following system for a BAUV requires prior simulation. In this study, we propose a path-following system for BAUVs with a multi-link mechanism and evaluate it in underwater simulation. The results show that it was possible to design a path-following system suitable for BAUVs, determine parameters using a simulator, and evaluate control methods.
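
For readers unfamiliar with line-of-sight (LOS) guidance, the sketch below shows the textbook lookahead-based law for a straight path segment in the horizontal plane. It is not taken from the paper: the function name `los_heading`, the 2D simplification, and the lookahead value are assumptions, and the paper's variant for a multi-link BAUV may differ.

```python
import math

def los_heading(pos, wp_a, wp_b, lookahead):
    """Lookahead-based LOS guidance for the segment wp_a -> wp_b.

    Returns (desired_heading, cross_track_error) in radians and path units.
    """
    # Direction of the path segment.
    alpha = math.atan2(wp_b[1] - wp_a[1], wp_b[0] - wp_a[0])
    # Signed lateral offset of the vehicle from the path.
    cross_track = (-(pos[0] - wp_a[0]) * math.sin(alpha)
                   + (pos[1] - wp_a[1]) * math.cos(alpha))
    # Steer toward a point `lookahead` ahead along the path.
    heading = alpha + math.atan2(-cross_track, lookahead)
    return heading, cross_track

# Vehicle 1 unit to the left of a path running along the x-axis.
heading, error = los_heading((2.0, 1.0), (0.0, 0.0), (10.0, 0.0), lookahead=1.0)
```

A shorter lookahead distance steers more aggressively back to the path, while a longer one gives smoother convergence; this is the kind of parameter a simulator can help tune before deployment.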


2605.25393 2026-05-26 cs.RO (updated version)

Decision-Making with Lightweight Confidence-Aware Language Model for Autonomous Driving

Ruoyu Yao, Ruiguo Zhong, Pei Liu, Mingxing Peng, Rui Yang, Jun Ma

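Affiliations: The Hong Kong University of Science and Technology (Guangzhou)

AI summary: Proposes a decision-making framework built on a lightweight confidence-aware language model. Multi-agent collaboration generates confidence-annotated decision demonstrations that are distilled into a dual-head lightweight model, achieving state-of-the-art success rates and low latency on nuPlan.

Comments: 8 pages, 3 figures, ITSC 2026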

Abstract

Large Language Models (LLMs) and Multimodal LLMs (MLLMs) have demonstrated immense potential in autonomous driving (AD) by offering human-like reasoning and open-world generalization. However, the excessive computational overhead and high inference latency of these massive models severely hinder their deployment in resource-constrained AD systems. To address this challenge, we propose a novel decision-making framework utilizing a lightweight confidence-aware language model, which bridges the gap between complex multimodal intention reasoning and efficient inference. Specifically, we design a multi-agent collaborative workflow, comprising action voting, confidence assessment, and summarization agents, to generate high-quality, confidence-annotated decision demonstrations via explicit Chain-of-Thought (CoT) reasoning. These demonstrations are then distilled into a lightweight language model featuring a dual-head architecture, enabling the joint prediction of decision probabilities and the generation of textual rationales. The distillation is realized via a confidence-aware fine-tuning strategy coupled with Retrieval Augmented Generation (RAG) to enhance the model's adaptability and data efficiency. Comprehensive closed-loop experiments on the nuPlan benchmark demonstrate that our approach achieves state-of-the-art (SOTA) success rates in both regular and long-tail scenarios while maintaining low inference latency.


2605.25362 2026-05-26 cs.RO (updated version)

Prior Policy Guided Dual-Agent Coordinated Manipulation Planning of Spacecraft-Manipulator System

Yuhui Hu, Dong Zhou, Kaihong Ouyang, Zhongliang Yu, Jianfeng Lv, Xiangyu Shao

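Affiliations: School of Astronautics; School of Automation; School of Information Science and Engineering

AI summary: To address the attitude stabilization problem caused by strong coupling between a space manipulator and its base, proposes a prior-policy-guided dual-agent coordinated manipulation planning framework. A timestep-level expert switching mechanism improves deep reinforcement learning efficiency, achieving high-precision end-effector reaching and base attitude stabilization.

Comments: 36 pages, 13 figures, 6 tables. Under review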

Abstract

The strong dynamic coupling between the manipulator and the base poses a significant challenge to maintaining spacecraft attitude stability, potentially compromising mission safety. In this paper, we propose a Dual-Agent Coordinated Manipulation Planning (DACMP) framework that simultaneously achieves high-precision end-effector pose reaching for a 6-DoF space manipulator and attitude stabilization of the base spacecraft. To enhance learning efficiency, we present a prior policy-guided Deep Reinforcement Learning algorithm incorporating the Timestep-level Expert Switching Guidance (TESG) mechanism, thereby promoting global convergence and improving task success rates. Extensive experiments demonstrate that DACMP significantly outperforms baseline DRL algorithms in terms of task success rate and control precision. Furthermore, the robustness of DACMP is validated under various challenging scenarios, including system constraints, environmental disturbances, and perception uncertainties. The code and simulation configurations are available on GitHub: https://github.com/HIT-YuhuiHu/DACMP.


2605.25346 2026-05-26 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC (updated version)

Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers

Keyi Shen, Glen Chou

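Affiliations: MIT

AI summary: Proposes a JAX-based parallel differentiable reachability framework that combines Taylor-model flowpipe construction with CROWN linear bound propagation and supports GPU batching and automatic differentiation. It is used for certified training and reachability-aware MPC, achieving online planning with certified reachable-set over-approximations under bounded uncertainty on non-prehensile manipulation and quadrotor tasks.

Comments: Robotics: Science and Systems XXII (RSS 2026)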

Abstract

Neural network (NN) dynamics models and control policies achieve strong performance in robotics, but providing sound guarantees under uncertainty remains difficult, especially for closed-loop NN systems. Existing reachability tools provide formal over-approximations, yet are often non-differentiable, overly conservative, or too slow for modern learning and online planning pipelines. To address this, we present a parallelizable, differentiable reachability framework in JAX for continuous- and discrete-time systems with analytical and NN-based dynamics and controllers. Our framework combines Taylor-model flowpipe construction with CROWN-style linear bound propagation through a unified representation that preserves affine dependencies while supporting GPU-batched computation and automatic differentiation. Building on this reachability primitive, we develop (i) a certified training method that encourages reachability-friendly dynamics models and controllers, and (ii) a reachability-aware sampling-based MPC scheme with gradient-based refinement. Experiments on non-prehensile manipulation and quadrotor tasks, including hardware and higher-dimensional evaluations (up to 72D), demonstrate practical online planning while maintaining certified reachable-set over-approximations under bounded uncertainty.
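
To give a feel for what a reachable-set over-approximation through a neural network layer looks like, here is a minimal sketch using plain interval arithmetic. This is deliberately a much simpler (and looser) technique than the Taylor-model and CROWN-style bounds the abstract describes, and it is written in plain Python rather than JAX; the weights and function names are illustrative assumptions.

```python
def affine_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b with interval arithmetic."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        low = high = bias
        for w, a, c in zip(row, lo, hi):
            # A positive weight maps lower to lower; a negative weight swaps ends.
            low += w * a if w >= 0 else w * c
            high += w * c if w >= 0 else w * a
        out_lo.append(low)
        out_hi.append(high)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

# Hypothetical one-layer network and input uncertainty box [-1, 1]^2.
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
pre_lo, pre_hi = affine_bounds(W, b, [-1.0, -1.0], [1.0, 1.0])
post_lo, post_hi = relu_bounds(pre_lo, pre_hi)
```

Every output the layer can produce for inputs in the box lies inside `[post_lo, post_hi]`, but the box ignores dependencies between coordinates, which is the conservatism that tighter representations aim to reduce.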


2605.25313 2026-05-26 cs.LG cs.AI cs.RO stat.ML (updated version)

UWM-JEPA: Predictive World Models That Imagine in Belief Space

Santosh Kumar Radha, Oktay Goktas

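Affiliations: AgentField AI

AI summary: For partially observed environments, proposes UWM-JEPA, which uses a density-matrix latent and a unitary predictor to preserve the joint-state spectrum in belief space, retaining uncertainty during long-horizon blind rollout and substantially outperforming vector-latent baselines.

Comments: 14 pages, 6 figures, 7 tables. Code and data: https://github.com/santoshkumarradha/uwm-jepa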

Abstract

World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactual actions. Joint Embedding Predictive Architectures (JEPAs) do this in latent space, but a vector-valued latent has no internal structure for carrying the belief over hidden continuations through blind rollout. We introduce the Unitary World Model JEPA (UWM-JEPA), a JEPA world model with a density-matrix latent on a joint system-environment space and a learned unitary predictor. The construction preserves the joint-state spectrum exactly during rollout, so the predictor itself cannot dissipate the represented uncertainty. On a hidden-velocity indicator task requiring five-step forward simulation under a given action sequence with the target observation masked, UWM-JEPA reaches 0.77 accuracy and degrades monotonically as actions are perturbed; a parameter-matched LSTM-JEPA trained under the same counterfactual-target objective and action head collapses to majority-class accuracy (0.53) under every action condition. Under blind rollout, UWM-JEPA loses fewer than ten points of probe R^2 at short horizons while vector-latent baselines lose forty-one and sixty-eight; both nevertheless tie on a held-out context probe, locating the separation in the predictor rather than the encoder. Action sensitivity itself requires training against counterfactual rather than teacher-forced targets, a finding that applies beyond the unitary parameterisation. For JEPA world models to imagine under partial observability, latent geometry and predictor dynamics matter, not frozen context-encoding capacity alone.
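
The claim that a unitary predictor cannot dissipate uncertainty follows from a standard linear-algebra fact: conjugating a density matrix by a unitary leaves its eigenvalues unchanged. The sketch below checks this numerically for a 2x2 case. It is not the authors' code; the fixed unitary from `unitary(theta, phi)` is a hypothetical stand-in for a learned predictor, and the rollout ignores actions and the system-environment split.

```python
import cmath
import math

def matmul(A, B):
    """Multiply two small complex matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def unitary(theta, phi):
    """A 2x2 unitary matrix (assumed stand-in for a learned predictor)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -cmath.exp(1j * phi) * s],
            [cmath.exp(-1j * phi) * s, c]]

def step(rho, U):
    """One rollout step: rho -> U rho U^dagger."""
    return matmul(matmul(U, rho), dagger(U))

def spectrum_2x2(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = complex(rho[0][0] + rho[1][1]).real
    det = complex(rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    gap = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - gap) / 2.0, (tr + gap) / 2.0

rho = [[0.8, 0.0], [0.0, 0.2]]  # mixed state: belief spread over two outcomes
U = unitary(0.7, 0.3)
for _ in range(5):              # five blind rollout steps
    rho = step(rho, U)
low, high = spectrum_2x2(rho)   # stays at (0.2, 0.8) up to rounding
```

The off-diagonal entries of `rho` change at every step, yet the spectrum, and therefore the entropy of the represented belief, stays fixed.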


2605.25293 2026-05-26 cs.CV cs.AI cs.RO (updated version)

Neuromorphic LiDAR-based Bird's Eye View Object Detection using Energy-efficient Spiking Neural Networks

Sambit Mohapatra, Senthil Yogamani, Heinrich Gotzig, Patrick Mader

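Affiliations: Valeo, Germany; Valeo, Ireland; TU Ilmenau, Germany

AI summary: Proposes an end-to-end spiking encoder-decoder network for object detection in bird's eye view representations of LiDAR point clouds, trained with surrogate gradient backpropagation. It achieves high accuracy on the KITTI benchmark with a 3.33x reduction in synaptic operation energy.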

Abstract

Autonomous driving perception demands accurate and efficient processing of three-dimensional sensor data under strict power constraints. Traditional convolutional neural networks achieve strong detection accuracy but are computationally intensive, limiting their suitability for deployment on resource-constrained neuromorphic platforms. Spiking neural networks offer a compelling alternative through event-driven sparse computation, yet their application to complex real-world perception tasks such as three-dimensional object detection remains limited. In this work, we propose an end-to-end spiking encoder-decoder network for object detection in bird's eye view representations of LiDAR point clouds, trained using surrogate gradient backpropagation. We train two variants: a membrane potential variant that reads continuous neuron state at the output stage for maximum accuracy, achieving $92.05$/$87.04$/$86.51$ AP at $\mathrm{IoU}\!=\!0.5$ (Easy/Moderate/Hard), and, a fully binary spiking variant that operates exclusively on spike trains at every layer for direct neuromorphic deployment. We evaluate four input spike encoding strategies and demonstrate that allowing the network to learn spike representations directly from data outperforms hand-crafted Poisson, latency, and z-axis encoding schemes on the KITTI benchmark, where sequential frames are unavailable and the BEV input is presented repeatedly across timesteps as a proxy for temporal streaming. A block-wise energy analysis demonstrates a $3.33\times$ reduction in synaptic operation energy over an equivalent CNN under conservative loop-based operation. Together, these results demonstrate the viability of spiking neural networks for accurate and energy-efficient neuromorphic perception in autonomous driving.
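
As background for the spiking and surrogate-gradient terminology, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron and defines a fast-sigmoid surrogate derivative. The abstract does not specify the neuron model, reset rule, or surrogate function, so the leak factor `beta`, the soft reset, and the `slope` parameter are all assumptions chosen for illustration.

```python
def lif_run(currents, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a list of input currents."""
    v, spikes = 0.0, []
    for i in currents:
        v = beta * v + i                    # leaky integration of the input
        s = 1 if v >= threshold else 0      # binary spike (non-differentiable step)
        spikes.append(s)
        v -= s * threshold                  # soft reset after a spike
    return spikes

def surrogate_grad(v, threshold=1.0, slope=5.0):
    """Fast-sigmoid surrogate for the derivative of the spike step function.

    Used in place of the true derivative (zero almost everywhere) in backprop.
    """
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2

spikes = lif_run([0.6, 0.6, 0.0, 0.6])
```

The forward pass emits only binary events, which is the source of the sparse, event-driven computation; the smooth surrogate exists only so that gradients can flow during training.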


2605.25279 2026-05-26 cs.RO (updated version)

Learning from Trial and Error: Reflective Test-Time Planning for Embodied LLMs

Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Leonidas Guibas, Jiajun Wu, Yejin Choi

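Affiliations: Stanford University; Northwestern University

AI summary: Proposes Reflective Test-Time Planning, which combines two modes, reflection-in-action and reflection-on-action, with retrospective reflection so that embodied agents self-correct and accumulate experience at test time, significantly improving long-horizon task performance.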

Abstract

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with zero-shot generalization to photorealistic HM3D environments and real-robot experiments on a Franka Panda arm. Ablations confirm that reflection-in-action and reflection-on-action are mutually dependent, and that retrospective reflection achieves better credit assignment than step-wise external feedback at lower computational overhead. Qualitative analyses further highlight behavioral correction through reflection.
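
The reflection-in-action step (generate several candidate actions, score them, execute the best) has the shape of a best-of-N selection. The sketch below shows only that shape. The `propose` and `score` callables are hypothetical placeholders for the policy and the internal reflection model, and the toy integer task is invented for illustration.

```python
def reflect_in_action(state, propose, score, n_candidates=4):
    """Best-of-N selection: sample candidates, score each, return the best.

    `propose(state, k)` and `score(state, action)` are assumed interfaces,
    standing in for an action policy and an internal reflection model.
    """
    candidates = [propose(state, k) for k in range(n_candidates)]
    scored = [(score(state, action), action) for action in candidates]
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, scored

# Toy task: the "state" is a target number and actions are guesses.
target = 7
best, scored = reflect_in_action(
    target,
    propose=lambda s, k: 3 * k,        # candidates 0, 3, 6, 9
    score=lambda s, a: -abs(s - a),    # closer guesses score higher
)
```

In the paper's setting the scored candidates would also be natural inputs to later updates, since reflection-on-action adjusts both the scorer and the policy after execution; that training step is not sketched here.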


2602.06508 2026-05-26 cs.RO (updated version)

World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy

Xiaokang Liu, Zechen Bai, Hai Ci, Kevin Yuchen Ma, Mike Zheng Shou

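Affiliations: Show Lab, National University of Singapore

AI summary: Proposes World-VLA-Loop, a framework in which a state-aware video world model jointly predicts future frames and binary rewards, and a co-evolving paradigm iteratively refines the VLA policy, reducing reliance on real-environment interaction.

Comments: 16 pages, 9 figures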

Abstract

Reinforcement learning (RL) can refine Vision-Language-Action (VLA) policies beyond behavior cloning, but real-world RL remains expensive due to extensive rollouts, resets, supervision, and safety risks. Action-conditioned video world models offer an option to train in virtual environments, yet they exhibit imprecise action following, particularly on subtle near-success failures. They also lack native reward signals for RL, and computing rewards based on inaccurate visual predictions remains unreliable. We introduce World-VLA-Loop, structured around two foundational designs and a higher-level co-evolving paradigm. We first curate SANS, deliberately mixing successful and near-success trajectories to improve action-outcome alignment. Then, we train a state-aware video world model that jointly predicts future frames and binary rewards from diffusion latents. It couples reward estimation to the generator rather than a separate module, which in turn benefits visual prediction. Since VLA behavior shifts during RL, a fixed simulator can misalign with the updated policy; World-VLA-Loop therefore closes the loop by using the refined world model for iterative VLA post-training while feeding rollouts from each improved policy back to augment and fine-tune the world model. Across simulation and real-robot experiments, World-VLA-Loop substantially improves VLA performance while reducing reliance on costly physical interaction.


2510.20390 2026-05-26 cs.RO (updated version)

NeuralTouch: Neural Descriptors for Precise Sim-to-Real Tactile Robot Control

Yijiong Lin, Bowen Deng, Keju Pu, Chenghua Lu, Max Yang, Efi Psomopoulou, Nathan F. Lepora

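Affiliations: Department of Engineering Mathematics and Bristol Robotics Laboratory

AI summary: Proposes NeuralTouch, a multimodal framework combining Neural Descriptor Fields (NDF) with tactile sensing. A deep reinforcement learning policy uses tactile feedback to refine the grasp pose, enabling precise and generalizable robot manipulation.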

DOI: 10.1109/TMECH.2026.3687919
Journal ref: IEEE/ASME Transactions on Mechatronics, 2026

Abstract

Grasping accuracy is a critical prerequisite for precise object manipulation, often requiring careful alignment between the robot hand and object. Neural Descriptor Fields (NDF) offer a promising vision-based method to generate grasping poses that generalize across object categories. However, NDF alone can produce inaccurate poses due to imperfect camera calibration, incomplete point clouds, and object variability. Meanwhile, tactile sensing enables more precise contact, but existing approaches typically learn policies limited to simple, predefined contact geometries. In this work, we introduce NeuralTouch, a multimodal framework that integrates NDF and tactile sensing to enable accurate, generalizable grasping through gentle physical interaction. Our approach leverages NDF to implicitly represent the target contact geometry, from which a deep reinforcement learning (RL) policy is trained to refine the grasp using tactile feedback. This policy is conditioned on the neural descriptors and does not require explicit specification of contact types. We validate NeuralTouch through ablation studies in simulation and zero-shot transfer to real-world manipulation tasks--such as peg-out-in-hole and bottle lid opening--without additional fine-tuning. Results show that NeuralTouch significantly improves grasping accuracy and robustness over baseline methods, offering a general framework for precise, contact-rich robotic manipulation.

2510.06351 2026-05-26 cs.RO version update

A Formal gatekeeper Framework for Safe Dual Control with Active Exploration

具有主动探索的安全双重控制的正式门控框架

Kaleb Ben Naveed, Devansh R. Agrawal, Dimitra Panagou

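Affiliations: Department of Robotics, University of Michigan; Department of Aerospace Engineering, University of Michigan

AI summary: Proposes a framework that integrates robust planning with active exploration. A gatekeeper mechanism permits exploration only when it yields a verifiable improvement without compromising safety, balancing safety against uncertainty reduction.

Comments Accepted at American Control Conference (ACC) 2026

Details

AI Chinese abstract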

在模型不确定性下规划安全轨迹是一个基本挑战。鲁棒规划通过考虑最坏情况来确保安全，但忽略了不确定性降低，导致过于保守的行为。在名义任务期间主动实时降低不确定性定义了双重控制问题。大多数方法通过在成本中添加加权探索项来解决这一问题，调整以平衡名义目标和不确定性降低，但没有正式考虑何时探索是有益的。此外，某些方法强制安全性，而其他方法则没有。我们提出了一个框架，将鲁棒规划与正式保证下的主动探索集成如下：关键创新和贡献在于，仅在探索提供可验证改进且不牺牲安全时才进行探索。为实现这一点，我们利用我们早期关于门控器作为安全验证架构的工作，并将其扩展，使其生成既安全又信息丰富的轨迹，从而降低不确定性和任务成本，或将其保持在用户定义的预算内。通过参数不确定性下四旋翼飞行器在线双重控制的仿真案例研究评估了该方法。

English abstract

Planning safe trajectories under model uncertainty is a fundamental challenge. Robust planning ensures safety by considering worst-case realizations, yet ignores uncertainty reduction and leads to overly conservative behavior. Actively reducing uncertainty on-the-fly during a nominal mission defines the dual control problem. Most approaches address this by adding a weighted exploration term to the cost, tuned to trade off the nominal objective and uncertainty reduction, but without formal consideration of when exploration is beneficial. Moreover, safety is enforced in some methods but not in others. We propose a framework that integrates robust planning with active exploration under formal guarantees as follows: The key innovation and contribution is that exploration is pursued only when it provides a verifiable improvement without compromising safety. To achieve this, we utilize our earlier work on gatekeeper as an architecture for safety verification, and extend it so that it generates both safe and informative trajectories that reduce uncertainty and the cost of the mission, or keep it within a user-defined budget. The methodology is evaluated via simulation case studies on the online dual control of a quadrotor under parametric uncertainty.

2510.03827 2026-05-26 cs.CV cs.RO version update

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

LIBERO-PRO：超越记忆的视觉-语言-动作模型鲁棒与公平评估

Xueyang Zhou, Yangming Xu, Guiyao Tie, Yongchao Chen, Guowen Zhang, Duanfeng Chu, Pan Zhou, Lichao Sun

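Affiliations: Huazhong University of Science and Technology; College of AI, Tsinghua University; Wuhan University of Technology; Lehigh University

AI summary: To address memorization bias in LIBERO benchmark evaluation, proposes the extended LIBERO-PRO benchmark, which applies reasonable perturbations along four dimensions (manipulated objects, initial states, task instructions, and environments). It reveals that existing VLA models collapse from over 90% to 0.0% and calls for robust evaluation practices.

Comments 10 pages, 7 figures, 0 tables

Details

AI Chinese abstract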

LIBERO已成为评估视觉-语言-动作（VLA）模型的广泛采用的基准；然而，其当前的训练和评估设置存在问题，常常导致性能估计膨胀，并阻碍公平的模型比较。为了解决这些问题，我们引入了LIBERO-PRO，一个扩展的LIBERO基准，系统性地评估模型在四个维度（操作对象、初始状态、任务指令和环境）的合理扰动下的性能。实验结果表明，尽管现有模型在标准LIBERO评估下达到90%以上的准确率，但在我们的泛化设置下，其性能骤降至0.0%。关键的是，这种差异暴露了模型依赖于对训练集中动作序列和环境布局的死记硬背，而非真正的任务理解或环境感知。例如，当目标对象被替换为无关物品时，模型仍持续执行抓取动作；即使给出被破坏的指令甚至混乱的令牌，其输出也保持不变。这些发现揭示了当前评估实践中的严重缺陷，我们呼吁社区放弃误导性方法，转而采用对模型泛化和理解能力的鲁棒评估。我们的代码可在 https://github.com/Zxy-MLlab/LIBERO-PRO 获取。

English abstract

LIBERO has emerged as a widely adopted benchmark for evaluating Vision-Language-Action (VLA) models; however, its current training and evaluation settings are problematic, often leading to inflated performance estimates and preventing fair model comparison. To address these issues, we introduce LIBERO-PRO, an extended LIBERO benchmark that systematically evaluates model performance under reasonable perturbations across four dimensions: manipulated objects, initial states, task instructions, and environments. Experimental results reveal that, although existing models achieve over 90% accuracy under the standard LIBERO evaluation, their performance collapses to 0.0% under our generalized setting. Crucially, this discrepancy exposes the models' reliance on rote memorization of action sequences and environment layouts from the training set, rather than genuine task understanding or environmental perception. For instance, models persist in executing grasping actions when the target object is replaced with irrelevant items, and their outputs remain unchanged even when given corrupted instructions or even messy tokens. These findings expose the severe flaws in current evaluation practices, and we call on the community to abandon misleading methodologies in favor of robust assessments of model generalization and comprehension. Our code is available at: https://github.com/Zxy-MLlab/LIBERO-PRO.

2510.01348 2026-05-26 cs.RO version update

Kilometer-Scale GNSS-Denied UAV Navigation via Heightmap Gradients: A Winning System from the SPRIN-D Challenge

基于高程图梯度的千米级GNSS拒止无人机导航：SPRIN-D挑战优胜系统

Michal Werner, David Čapek, Tomáš Musil, Ondřej Franěk, Tomáš Báča, Martin Saska

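Affiliations: Faculty of Electrical Engineering, Czech Technical University in Prague

AI summary: To counter drift during long-range UAV flight in GNSS-denied environments, proposes a lightweight localization method that corrects drift via heightmap gradient-template matching, and achieves 9 km waypoint navigation in the SPRIN-D challenge.

Comments 8 pages

Details

AI Chinese abstract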

在GNSS拒止环境中实现可靠的长距离无人机飞行具有挑战性：集成里程计会导致漂移，在未探索区域无法进行闭环检测，且嵌入式平台计算能力有限。我们提出了一套完全机载的无人机系统，专为SPRIN-D Funke Fully Autonomous Flight Challenge开发，该挑战要求在没有GNSS或先验密集地图的情况下，在低于25米AGL（离地高度）的高度完成9公里长距离航点导航。该系统集成了感知、建图、规划和控制，并采用一种轻量级漂移校正方法，通过梯度模板匹配将激光雷达导出的局部高程图与先验地理数据高程图进行匹配，并在聚类粒子滤波器中融合里程计证据。在竞赛部署中，该系统在城区、森林和开阔地形中执行了千米级飞行，相对于原始里程计显著减少了漂移，同时在仅CPU硬件上实时运行。我们描述了系统架构、定位流程和竞赛评估，并报告了现场部署中的实际经验，为GNSS拒止无人机自主性的设计提供了参考。

English abstract

Reliable long-range flight of unmanned aerial vehicles (UAVs) in GNSS-denied environments is challenging: integrating odometry leads to drift, loop closures are unavailable in previously unseen areas and embedded platforms provide limited computational power. We present a fully onboard UAV system developed for the SPRIN-D Funke Fully Autonomous Flight Challenge, which required 9 km long-range waypoint navigation below 25 m AGL (Above Ground Level) without GNSS or prior dense mapping. The system integrates perception, mapping, planning, and control with a lightweight drift-correction method that matches LiDAR-derived local heightmaps to a prior geo-data heightmap via gradient-template matching and fuses the evidence with odometry in a clustered particle filter. Deployed during the competition, the system executed kilometer-scale flights across urban, forest, and open-field terrain and reduced drift substantially relative to raw odometry, while running in real time on CPU-only hardware. We describe the system architecture, the localization pipeline, and the competition evaluation, and we report practical insights from field deployment that inform the design of GNSS-denied UAV autonomy.
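
The drift-correction idea in this abstract, matching a local heightmap against a prior heightmap through their gradients, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the brute-force search, and the sum-of-squared-differences cost are assumptions chosen for clarity, and the clustered particle filter that fuses the result with odometry is not reproduced.

```python
import numpy as np

def gradient_magnitude(heightmap):
    """Slope magnitude per cell; a constant altitude offset cancels out."""
    gy, gx = np.gradient(np.asarray(heightmap, dtype=float))
    # Drop the one-cell border, where np.gradient falls back to one-sided differences.
    return np.hypot(gx, gy)[1:-1, 1:-1]

def match_heightmap(prior, local):
    """Slide `local` over `prior` and return the (row, col) offset with the
    smallest sum of squared gradient differences, plus that cost.
    Hypothetical helper: exhaustive search, no rotation or scale handling."""
    g_prior = gradient_magnitude(prior)
    g_local = gradient_magnitude(local)
    h, w = g_local.shape
    best_cost, best_offset = np.inf, (0, 0)
    for r in range(g_prior.shape[0] - h + 1):
        for c in range(g_prior.shape[1] - w + 1):
            cost = float(np.sum((g_prior[r:r + h, c:c + w] - g_local) ** 2))
            if cost < best_cost:
                best_cost, best_offset = cost, (r, c)
    return best_offset, best_cost
```

Comparing gradients rather than raw heights is what makes the match tolerant of an unknown vertical bias between the LiDAR map and the prior map. A real system would likely restrict the search to a window around the odometry estimate.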

2509.23651 2026-05-26 cs.RO version update

HeLoM: Hierarchical Learning for Whole-Body Loco-Manipulation by a Hexapod Robot

HeLoM: 六足机器人全身移动操作的分层学习

Xinrong Yang, Peizhuo Li, Hongyi Li, Yifeng Peng, Arhaan Jain, Junkai Lu, Linnan Chang, Yuhong Cao, Yifeng Zhang, Ge Sun, Guillaume Sartoretti

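Affiliations: MARMoT Lab, Department of Mechanical Engineering, National University of Singapore; Center for X-mechanics, Zhejiang University

AI summary: Proposes HeLoM, a hierarchical framework that coordinates multi-limb control so a hexapod robot can stably push heavy or irregular objects; validated in simulation and real-world experiments.

Details

AI Chinese abstract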

在自然界中，动物经常需要移动/操纵与自身重量/大小相当的物体。与抓取和搬运相比，推动提供了一种更直接、高效的非抓取操纵策略，避免了复杂的抓取设计，同时利用直接接触在交互过程中调节物体的姿态。然而，实现有效的推动既需要足够的操纵能力，也需要稳定的全身协调，这在处理重型或不规则物体时尤其具有挑战性。为了解决这些挑战，我们提出了HeLoM，一种基于学习的六足机器人分层全身操纵框架，该框架利用协调的多肢控制，并适用于多足机器人系统。受多足昆虫合作策略的启发，我们的框架利用多个接触点和高度自由度，在物体交互过程中实现高效、动态的全身协调。HeLoM的高层规划器规划推动行为，而其低层控制器保持运动稳定性并生成动态一致的关节动作。这种设计使机器人能够通过协调的前肢交互和支撑性的后肢推进，在执行连续可控的推动行为的同时保持平衡。我们通过仿真和实物实验验证了HeLoM的有效性。结果表明，我们的框架能够在现实世界中稳定地将不同尺寸和未知物理属性的物体推动到指定的目标姿态。

English abstract

In nature, animals often need to move/manipulate objects comparable in weight/size to their own bodies. Compared to grasping and carrying, pushing provides a more straightforward and efficient non-prehensile manipulation strategy, avoiding complex grasp design while leveraging direct contact to regulate an object's pose during interaction. Achieving effective pushing, however, requires both sufficient manipulation capability and stable whole-body coordination, which is particularly challenging when dealing with heavy or irregular objects. To address these challenges, we propose HeLoM, a learning-based hierarchical whole-body manipulation framework for hexapod robots that exploits coordinated multi-limb control and is applicable to multi-legged robotic systems. Inspired by the cooperative strategies of multi-legged insects, our framework leverages multiple contact points and high degrees of freedom to enable efficient and dynamic whole-body coordination during object interaction. HeLoM's high-level planner plans pushing behaviors, while its low-level controller maintains locomotion stability and generates dynamically consistent joint actions. This design enables the robot to maintain balance while executing continuous and controllable pushing behaviors through coordinated foreleg interaction and supportive hind-leg propulsion. We validate the effectiveness of HeLoM through both simulation and real-world experiments. Results show that our framework can stably push objects of varying sizes and unknown physical properties to designated goal poses in the real world.

2509.17057 2026-05-26 cs.RO version update

RoboManipBaselines: A Unified Framework for Imitation Learning in Robotic Manipulation across Real and Simulation Environments

RoboManipBaselines：面向真实与仿真环境的机器人操作模仿学习统一框架

Masaki Murooka, Tomohiro Motoda, Ryoichi Nakajo, Hanbit Oh, Koshi Makihara, Keisuke Shirai, Tetsuya Ogata, Yukiyasu Domae

Affiliations: Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST); CNRS-AIST JRL (Joint Robotics Laboratory), IRL; Institute for AI and Robotics, Future Robotics Organization, Waseda University; AI Robot Association (AIRoA)

AI summary: Presents RoboManipBaselines, an open-source framework that uniformly supports the full imitation learning pipeline for robotic manipulation (data collection, policy training, and rollout) in both simulation and real-world environments, validated through benchmarks and research applications.

Comments Added a Limitations section in response to comments from reviewers

Details

DOI: 10.1109/ACCESS.2026.3691054
Journal ref: IEEE Access 2026

AI Chinese abstract

我们提出RoboManipBaselines，一个用于机器人操作模仿学习研究的开源软件框架。该框架支持完整的模仿学习流程，包括数据收集、策略训练和部署，覆盖仿真和真实环境。其设计强调通过一致的工作流程实现集成，跨不同环境和机器人平台的通用性，通过易于添加新机器人、任务和策略的可扩展性，以及通过使用公开数据集进行评估的可重复性。RoboManipBaselines系统地实现了模仿学习的核心组件：环境、数据集和策略。通过统一接口，该框架支持多种仿真器和真实机器人环境，以及多模态传感器和多种策略模型。我们进一步在仿真和真实环境中进行了基准评估，并介绍了多项研究应用，包括数据增强、与触觉模型的集成、交互式机器人系统、3D感知评估和硬件扩展。这些结果表明，RoboManipBaselines为利用模仿学习推进机器人操作的研究和实验验证提供了有用的基础。https://isri-aist.github.io/RoboManipBaselines-ProjectPage

English abstract

We present RoboManipBaselines, an open-source software framework for imitation learning research in robotic manipulation. The framework supports the entire imitation learning pipeline, including data collection, policy training, and rollout, across both simulation and real-world environments. Its design emphasizes integration through a consistent workflow, generality across diverse environments and robot platforms, extensibility for easily adding new robots, tasks, and policies, and reproducibility through evaluations using publicly available datasets. RoboManipBaselines systematically implements the core components of imitation learning: environment, dataset, and policy. Through a unified interface, the framework supports multiple simulators and real robot environments, as well as multimodal sensors and a wide variety of policy models. We further present benchmark evaluations in both simulation and real-world environments and introduce several research applications, including data augmentation, integration with tactile models, interactive robotic systems, 3D sensing evaluation, and hardware extensions. These results demonstrate that RoboManipBaselines provides a useful foundation for advancing research and experimental validation in robotic manipulation using imitation learning. https://isri-aist.github.io/RoboManipBaselines-ProjectPage

2505.11758 2026-05-26 cs.CV cs.AI cs.GR cs.RO version update

Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning

具有预测性提示和负学习的可泛化视觉语言少样本适应

Sriram Mandalika

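Affiliations: Hasso Plattner Institute, University of Potsdam

AI summary: Proposes the SCAN framework, which handles negative class signals in few-shot adaptation of vision-language models through query-adaptive negative routing, LLM-bootstrapped contrastive prompts, and an adaptive fusion weight, with an average gain of 4.61% across 11 benchmarks.

Details

AI Chinese abstract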

视觉语言模型的少样本适应在推理时如何处理负类信号方面仍然存在根本性限制。现有方法对所有查询应用统一的负抑制，忽略了最具破坏性的混淆是查询特定的，并且随支持集几何形状而变化。我们提出SCAN（选择性混淆感知负样本），一个通过三个针对性贡献解决这一问题的框架。在推理中，查询自适应负路由将抑制限制在每个查询最易混淆的前K个类别，无需额外参数。通用负文本模板被替换为LLM引导的对比提示，描述易混淆类别对之间的区分属性，在关键处锐化文本决策边界。基于支持集Fisher可判别性估计的无参数自适应融合权重消除了手动调整视觉语言权衡的需要。在11个标准基准上评估，SCAN在16-shot设置下平均优于先前的基于提示和基于适配器的方法4.61%，在类间混淆最严重的细粒度数据集上提升高达7.70%。SCAN在分布偏移下也表现出强泛化性，在四个ImageNet OOD变体上平均提升2.95%，并在显著标签噪声下保持稳健性能，在50%标签损坏下的准确率仍超过最强竞争方法的干净基线。

English abstract

Few-shot adaptation of vision-language models remains fundamentally limited by how negative class signals are handled at inference. Existing methods apply uniform negative suppression across all queries, ignoring that the most damaging confusions are query-specific and shift with support-set geometry. We introduce SCAN (Selective Confusion-Aware Negatives), a framework that addresses this gap through three targeted contributions. In inference, query-adaptive negative routing restricts suppression to the top-K most confusable classes per query, requiring zero additional parameters. Generic negative text templates are replaced with LLM-bootstrapped contrastive prompts that describe discriminative attributes between confusable class pairs, sharpening the textual decision boundary where it matters most. A parameter-free adaptive fusion weight estimated from support-set Fisher discriminability removes the need for manual tuning of the vision-language trade-off. Evaluated across 11 standard benchmarks, SCAN consistently outperforms prior prompt-based and adapter-based methods by an average of 4.61% at 16-shot, with gains of up to 7.70% on fine-grained datasets where inter-class confusion is most severe. SCAN also generalizes strongly under distribution shift, improving by 2.95% on average across four ImageNet OOD variants, and maintains robust performance under significant label noise, with accuracy under 50% label corruption still exceeding the clean baseline of the strongest competing method.
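
As a rough illustration of restricting negative suppression to the top-K most confusable classes per query, the sketch below shows one plausible reading of the idea. It is not SCAN's published formulation: the function name, the use of the query's own positive logits as the confusability measure, and the subtraction with weight `lam` are all assumptions.

```python
import numpy as np

def route_negatives(pos_logits, neg_logits, k=3, lam=0.5):
    """Subtract negative-prompt scores only for the k classes the query
    scores highest on; all other classes keep their positive logit.
    Hypothetical helper; `lam` is an assumed suppression weight."""
    pos = np.asarray(pos_logits, dtype=float)
    neg = np.asarray(neg_logits, dtype=float)
    top_k = np.argsort(pos)[-k:]      # per-query confusable set (assumption)
    mask = np.zeros_like(pos)
    mask[top_k] = 1.0
    return pos - lam * mask * neg     # no learned parameters involved
```

Under uniform suppression, every class would be penalized by its negative score. Here the classes outside the query's top-K are left untouched, so the selection changes from query to query.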

2502.16205 2026-05-26 cs.RO version update

Multi-View Consistent 3D Gaussian Head Avatars without Multi-View Generation

Aviral Chharia, Fernando De la Torre

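Affiliations: Carnegie Mellon University

AI summary: Proposes MVCHead, a method that learns 3D Gaussian head models directly from randomly sampled 2D images, achieving multi-view consistency through hierarchical state-space blocks and an SE(3) multi-view critic, without multi-view data or 3D supervision.

Comments CVPR 2026; Project Website: https://humansensinglab.github.io/MVCHead/

Details

Journal ref: CVPR, Denver, CO, USA, 2026, pp. 40163-40174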

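AI Chinese abstract (English translation)

High-fidelity 3D Gaussian head avatar generation is essential for applications such as AR/VR, telepresence, and digital humans. Existing methods rely on multi-view datasets, 3D capture, or intermediate 2D view synthesis. In contrast, we learn conditional and unconditional 3D head models solely from randomly sampled 2D images, without multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-pass state-space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose hierarchical state-space (HiSS) blocks that progressively refine Gaussians from coarse to fine while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan and propose a hierarchical bidirectional state scan (HiBiSS) that aligns the recurrence with the axes along which multi-view inconsistency is strongest. Finally, we design an SE(3) multi-view critic that judges whether a set of self-renders comes from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale, ready-to-use dataset of 3D Gaussian head assets for training and evaluating 3D head models. Project page and code: https://humansensinglab.github.io/MVCHead/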

X-DiffVLA: A Cross-Embodied Diffusion Action Head for Vision-Language-Action Models

Boyu Li, Chaoyi Xu, Haoqi Yuan, Xinrun Xu, Börje F. Karlsson, Dongbin Zhao, Haoran Li, Zongqing Lu

Affiliations: SKL-MAIS, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence; BeingBeyond; School of Computer Science, Peking University

AI summary: To address the challenge of learning universal policies from cross-embodied data, proposes X-DiffVLA, which uses a diffusion model and Embodiment Forcing to transfer knowledge across heterogeneous end-effectors, improving by 15.3% on RoboCasa and 12.5% on Isaac Gym.

Details

AI Chinese abstract

从跨具身数据中学习通用策略仍然是机器人学中的基本挑战。尽管视觉-语言-动作（VLA）模型在大型多样化数据集上进行了预训练，但它们通常依赖于具身特定的微调才能在下游任务中实现强性能。这一要求严重限制了它们的泛化能力，并阻碍了执行相似任务的具身之间的知识迁移。为了克服这些限制，我们聚焦于共享机器人基座和异构末端执行器的跨具身设置，并提出X-DiffVLA，一种具有统一跨具身动作头的基于扩散的VLA模型。X-DiffVLA能够利用扩散模型的生成优势来捕捉跨具身数据集中的多样性和潜在相关性。具体地，我们引入了具身强制（Embodiment Forcing），一种无分类器引导技术，以隐式地将动作生成导向具身特定的功能组件，无需显式监督即可捕捉细粒度的结构细微差别。此外，设计了形态树扩散（Morphological Tree Diffusion）方法来增强不同末端执行器之间的行为相关性，最大化异构演示的可迁移性。在RoboCasa和Isaac Gym上的实验结果覆盖了从夹爪到灵巧手的多种具身，表明X-DiffVLA达到了最先进的性能，分别提升了15.3%和12.5%。真实世界评估进一步验证了所提出框架的鲁棒性及其在可扩展跨具身策略学习中的有效性。

English abstract

Learning universal policies from cross-embodied data remains a fundamental challenge in robotics. Although Vision-Language-Action (VLA) models are pre-trained on large and diverse datasets, they typically rely on embodiment-specific fine-tuning to achieve strong performance in downstream tasks. This requirement severely limits their generalization capability and restricts knowledge transfer across embodiments performing similar tasks. To overcome these limitations, we focus on cross-embodied settings with shared robotic bases and heterogeneous end-effectors, and propose X-DiffVLA, a diffusion-based VLA model featuring a unified cross-embodied action head. X-DiffVLA can leverage the generative strengths of diffusion models to capture both the diversity and latent correlations in cross-embodied datasets. Specifically, we introduce Embodiment Forcing, a classifier-free guidance technique to implicitly steer action generation toward embodiment-specific functional components, capturing fine-grained structural nuances without explicit supervision. In addition, a Morphological Tree Diffusion approach is designed to strengthen behavioral correlations across diverse end-effectors, maximizing the transferability of heterogeneous demonstrations. Experimental results across RoboCasa and Isaac Gym, covering different embodiments from grippers to dexterous hands, show that X-DiffVLA achieves state-of-the-art performance, with improvements of 15.3% and 12.5%, respectively. Real-world evaluations further validate the robustness of the proposed framework and its effectiveness in scalable cross-embodied policy learning.
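
The abstract describes Embodiment Forcing as a classifier-free guidance technique. The sketch below shows only the standard classifier-free guidance combination step, in a common parametrization. How X-DiffVLA conditions on the embodiment and how it picks the guidance scale are not stated in the abstract, so the function name and the reading of the "conditional" branch as embodiment-conditioned are assumptions.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one.
    Assumption: the condition would be an embodiment descriptor."""
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With `guidance_scale = 0` the result is the unconditional prediction, with `1` it is the conditional one, and values above `1` push the denoising step further toward the conditioned behaviour.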

2605.25041 2026-05-26 cs.RO version update

RAMBA: 4D Radar Mapping by Bundle Adjustment

RAMBA: 通过束调整的4D雷达建图

Jianzhu Huai, Yiwen Chen, Binliang Wang

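Affiliations: State Key Lab of Info Engineering in Surveying, Mapping and Remote Sensing

AI summary: Proposes the RAMBA framework, which uses bundle adjustment to jointly refine radar frame states with covariance-weighted geometric residuals, IMU preintegration factors, and radar ego-velocity constraints, producing globally consistent 4D radar maps.

Comments 5 pages, 2 figures, to present in ISPRS2026 Thematic Session 10 on Radar Perception

Details

AI Chinese abstract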

4D雷达在机器人建图中越来越有吸引力，因为它提供距离、方位角、仰角和多普勒测量，同时在恶劣可见度条件下保持鲁棒性。尽管最近的雷达和雷达-惯性里程计方法已经实现了有前景的在线状态估计性能，但4D雷达的离线全局地图优化仍未得到充分探索。本文提出了RAMBA，一种用于全局一致4D雷达建图的雷达束调整框架。给定来自雷达-惯性里程计前端的初始位姿和雷达帧，RAMBA使用协方差加权几何残差、IMU预积分因子和雷达自速度约束联合优化雷达帧状态。几何残差通过跨选定帧形成基于体素的对应关系，并用点协方差加权每个残差，将成对GICP扩展到多帧优化。为了提高对漂移和重访的鲁棒性，RAMBA在对应关系形成过程中强制时间一致性，同时明确支持闭环约束。在ColoRadar和SNAIL Radar数据集上的实验表明，与雷达-惯性里程计和位姿图优化基线相比，RAMBA提高了地图一致性并通常提升了轨迹精度。

English abstract

4D radar is increasingly attractive for robotic mapping because it provides range, azimuth, elevation, and Doppler measurements while remaining robust in adverse visibility conditions. Although recent radar and radar-inertial odometry methods have achieved promising online state estimation performance, offline global map refinement for 4D radar remains underexplored. This paper presents RAMBA, a radar bundle-adjustment framework for globally consistent 4D radar mapping. Given initial poses and radar frames from a radar-inertial odometry front-end, RAMBA jointly refines radar frame states using covariance-weighted geometric residuals, IMU preintegration factors, and radar ego-velocity constraints. The geometric residuals extend pairwise GICP to a multi-frame optimization by forming voxel-based correspondences across selected frames and weighting each residual with point covariances. To improve robustness against drift and revisits, RAMBA enforces temporal consistency during correspondence formation while explicitly supporting loop-closure constraints. Experiments on the ColoRadar and SNAIL Radar datasets show that RAMBA improves map consistency and usually enhances trajectory accuracy over radar-inertial odometry and pose-graph optimization baselines.

2605.24985 2026-05-26 cs.RO cs.LG physics.comp-ph version update

Learning, locomotion, and navigation of soft synthetic snakes in three-dimensional, heterogeneous environments

软体合成蛇在三维异质环境中的学习、运动与导航

Xiaotian Zhang, Ali Albazroun, Tixian Wang, Songyuan Cui, Prashant G. Mehta, Mattia Gazzola

Affiliations: Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana–Champaign; Department of Mechanical and Aerospace Engineering, Hong Kong University of Science and Technology; Department of Mechanical Science and Engineering, University of Illinois Urbana–Champaign

AI summary: Proposes a reinforcement learning framework built on bio-inspired actuation and sensing models that lets soft synthetic snakes autonomously navigate unstructured 3D terrain, with robustness validated in high-fidelity environments.

Comments 14 pages, 5 figures

Details

AI Chinese abstract

无肢陆地动物表现出卓越的运动多样性和控制能力，目前尚无法被工程对应物所超越。在这里，我们引入了一个计算框架，使软体合成蛇能够导航非结构化的、异质的三维地形。我们的方法基于仿生驱动和感知模型，这些模型降低了高自由度连续体固有的控制复杂性。这些模型被集成到强化学习架构中，以推导出穿越环境的策略。训练首先在简化的同质地形中进行，以学习运动基元。然后，这些基元被组合成针对复杂地形的自适应策略。我们通过将蛇部署在从真实世界成像重建的高保真三维环境中来展示鲁棒性，实现了可靠的导航。总体而言，这项工作为自然地形中连续系统的控制提供了一个物理真实的仿真平台和实用见解。

English abstract

Limbless terrestrial animals exhibit exceptional locomotor versatility and control, currently unmatched by engineered counterparts. Here, we introduce a computational framework that enables soft synthetic snakes to navigate unstructured, heterogeneous 3D terrains. Our approach is grounded in bio-inspired actuation and sensing models that reduce the control complexity inherent to high-degree-of-freedom, continuum bodies. These models are integrated into a reinforcement learning architecture to derive environment-traversing policies. Training first occurs in simplified, homogeneous terrains to learn locomotion primitives. These are then composed into adaptive strategies for complex landscapes. We demonstrate robustness by deploying a snake in high-fidelity 3D environments reconstructed from real-world imaging, achieving reliable navigation. Overall, this work provides a physically-realistic simulation platform and practical insights for the control of continuum systems in natural terrains.

2605.24980 2026-05-26 cs.RO version update

Loosely Coupled Factor Graph Optimization for Pseudolite-Augmented Navigation

松耦合因子图优化用于伪卫星增强导航

Chih-Chun Chen, Lipeng Tan, Shiyu Bai, Heike Vallery

Affiliations: Federal Ministry for Economic Affairs and Energy, Germany; German Federal Ministry of Research, Technology, and Space (BMFTR); Robotics Institute Germany (RIG)

AI summary: Proposes a loosely coupled factor graph optimization framework that fuses GNSS/pseudolite least-squares solutions with IMU data, reducing mean 3D error by 22.8% to 41.3% relative to the standard least-squares method in low-visibility environments.

2605.24975 2026-05-26 cs.RO cs.AI cs.LG version update

Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

弥合差距：实现软演员-评论家算法用于高性能腿部运动

Gianluca Sabatini, Chenhao Li, Marco Hutter

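Affiliations: ETH Zurich

AI summary: Identifies the root causes of Soft Actor-Critic (SAC) underperforming in parallel training and proposes modifications, including policy initialization, timeout-aware critic targets, and multi-step return estimation, that bring it to performance comparable with Proximal Policy Optimization (PPO) on legged locomotion tasks.

Details

AI Chinese abstract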

近端策略优化（PPO）由于其在IsaacLab等大规模并行仿真环境中的鲁棒性和可扩展性，已成为训练腿部机器人的事实标准。然而，其基于策略的性质使其天生样本效率低下，阻碍了其在真实硬件上的持续适应和微调。相比之下，软演员-评论家（SAC）是一种可以重用过去经验的离策略算法，使其成为模拟到现实迁移工作流程的自然候选，其中同一算法既可用于仿真，也可用于真实机器人的在线学习。尽管有这些优势，SAC在大规模并行训练设置中始终未能匹配PPO的经验性能。本工作确定了这一差距的根本原因，并引入了针对性的修改，包括策略初始化、超时感知评论家目标和多步回报估计，使SAC能够稳定地大规模训练。在多个腿部机器人平台和多样化的运动任务上评估，我们的方法完全弥合了与PPO的性能差距。

English abstract

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it inherently sample-inefficient, preventing its use for continuous adaptation and fine-tuning on real hardware. Soft Actor-Critic (SAC), by contrast, is an off-policy algorithm that can reuse past experience, making it a natural candidate for sim-to-real transfer workflows where the same algorithm can be used both in simulation and for online learning on the real robot. Despite these advantages, SAC has consistently failed to match PPO's empirical performance in massively parallel training settings. This work identifies the root causes of this gap and introduces targeted modifications, covering policy initialization, timeout-aware critic targets, and multi-step return estimation, that enable SAC to train stably at scale. Evaluated across multiple legged robot platforms and diverse locomotion tasks, our approach closes the performance gap with PPO entirely.
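
Two of the modifications named in this abstract, timeout-aware critic targets and multi-step return estimation, correspond to a well-known pattern: accumulate n discounted rewards, stop without bootstrapping on a true termination, and bootstrap from the critic when an episode is merely cut off by a time limit. The sketch below illustrates that general pattern only. The function name and argument layout are hypothetical, the paper's exact targets may differ, and SAC's entropy term is assumed to be already folded into `bootstrap_values`.

```python
def n_step_target(rewards, terminated, truncated, bootstrap_values, gamma=0.99):
    """n-step critic target over a non-empty window of transitions.

    rewards[i], terminated[i], truncated[i] describe step i of the window;
    bootstrap_values[i] is the critic's estimate for the state reached after
    step i (hypothetical layout, assumed for this sketch).
    """
    target, discount = 0.0, 1.0
    for r, term, trunc, v_next in zip(rewards, terminated, truncated, bootstrap_values):
        target += discount * r
        discount *= gamma
        if term:
            # True termination: there is no future return to add.
            return target
        if trunc:
            # Timeout: the episode was cut short, so keep the critic's estimate.
            return target + discount * v_next
    # Window ended normally: bootstrap from the last reached state.
    return target + discount * bootstrap_values[-1]
```

Treating a timeout like a termination would teach the critic that value drops to zero at the time limit, which biases targets in environments that reset on a fixed horizon.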

2605.24950 2026-05-26 cs.RO cs.LG version update

Enhanced INS/GNSS State Estimation: Leveraging GNSS-Based Acceleration Measurements

Gal Versano, Itzik Klein

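Affiliations: Autonomous Navigation and Sensor Fusion Lab, Hatter Department of Marine Technologies, Charney School of Marine Sciences, University of Haifa

AI summary: Proposes extracting vehicle acceleration information from past GNSS measurements and a motion model, and integrating it into the INS/GNSS filter to improve positioning robustness and accuracy. On two real-world unmanned ground vehicle datasets, the average position root-mean-square error improves by 11.40% and 20.74%, respectively.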

2605.24761 2026-05-26 cs.CV cs.RO version update

Drift-Resistant Navigation World Model with Anchored Epipolar Guidance

抗漂移导航世界模型与锚定对极引导

Po-Chien Luan, Zimin Xia, Wuyang Li, Yang Gao, Alexandre Alahi

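Affiliations: EPFL

AI summary: Proposes a drift-resistant navigation world model that uses anchor-guided rollout and bidirectional epipolar geometric constraints to mitigate both perceptual and geometric drift, improving long-horizon visual quality, geometric consistency, and multi-view coherence.

Details

AI Chinese abstract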

我们提出抗漂移导航世界模型，这是一种生成模型，可减轻传统基于滚动的导航世界模型中的感知漂移和几何漂移。现有方法递归地将生成内容馈送到后续步骤，导致噪声累积和预测退化，即感知漂移。同时，它们的预测通常偏离智能体的运动，导致几何漂移。我们通过将世界模型预测重新设计为锚定引导滚动来解决这两种漂移。我们不顺序滚动每一帧，而是首先预测稀疏的未来锚点，作为稳定的长期目标，然后生成每个块内的中间帧，这些帧以过去上下文和未来锚点为条件。重要的是，这些稀疏锚点还提供几何约束，由双向对极几何支持，以定位中间帧中相应内容应出现的位置。在四个基准上的实验表明，在长期视觉质量、几何一致性和多视图连贯性方面，相对于强基线有一致的改进。这些提升进一步转化为相同规划器下下游规划性能的提高，突显了抗漂移、几何感知预测对于可靠导航世界模型的重要性。

English abstract

We propose Drift-Resistant Navigation World Model, a generative model that mitigates both perceptual drift and geometric drift in conventional rollout-based navigation world models. Existing methods recursively feed generated content into subsequent steps, causing noise accumulation and degraded predictions, i.e., perceptual drift. Meanwhile, their predictions often deviate from the agent's motion, resulting in geometric drift. We address both types of drift by redesigning world-model prediction as an anchor-guided rollout. Instead of rolling out every frame sequentially, we first predict sparse future anchors that serve as stable long-range targets, and then generate intermediate frames within each chunk conditioned on both past context and future anchors. Importantly, these sparse anchors also provide geometric constraints, supported by bidirectional epipolar geometry, to localize where corresponding content should appear in the intermediate frames. Experiments on four benchmarks demonstrate consistent improvements over strong baselines in long-horizon visual quality, geometric consistency, and multi-view coherence. These gains further translate into improved downstream planning performance under the same planners, highlighting the importance of drift-resistant, geometry-aware prediction for reliable navigation world models.

2605.24760 2026-05-26 cs.RO version update

Geometric Workspace Analysis and Transmission-Aware Dynamics of a Serial Spherical Tool for Microsurgery

显微外科用串行球形工具的几何工作空间分析与传动感知动力学

Anestis Mablekos-Alexiou, Lyndon da Cruz, Christos Bergeles

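Affiliations: Moorfields Eye Hospital NHS Foundation Trust; King’s College London

AI summary: Proposes a kinematic and transmission-aware design framework for a serial spherical mechanism for microsurgery (with an additional translational degree of freedom), enabling fast design evaluation through an analytical workspace formulation and a transmission-aware dynamics method.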

2605.24731 2026-05-26 eess.SY cs.RO cs.SY version update

Passivity-based Semi-autonomous Rotational Motion Navigation for Rigid-body Networks: Stability and Human Passivity Analysis

基于无源性的刚体网络半自主旋转运动导航：稳定性与人体无源性分析

Reiji Terunuma, Yuta Nakamura, Takeshi Hatanaka

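Affiliations: Institute of Science Tokyo

AI summary: Proposes a passivity-based semi-autonomous attitude control framework that achieves human-robot interaction stability for multi-robot systems on SO(3) through a virtual leader and stealthy control, and proves closed-loop stability under the human passivity assumption.

Comments This work is to be submitted to the 6th Workshop on Cyber-Physical Human Systems (CPHS2026) for possible publication

Details

AI Chinese abstract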

本文提出了一种新颖的基于无源性的半自主姿态控制框架，特别关注定义在特殊正交群$SO(3)$上的姿态运动学。虽然人机交互有助于成功执行复杂任务，但确保$SO(3)$流形上人在回路系统的稳定性仍然是一个尚未解决的挑战。我们首先提出了一种新的控制架构，其中多机器人系统通过所谓的隐身控制保持反馈给人类操作员的平均信息的不变性，并且人类干预通过虚拟领导者进行调解，该虚拟领导者通过基于无源性的姿态同步律与机器人耦合。然后，我们在假设人类表现为无源系统的条件下，严格证明了所提出的在回路系统的闭环稳定性。为支持这一分析，进行了仿真研究，将人类操作员识别为动态系统，并检查了所识别模型的无源性特性。

English abstract

This paper presents a novel passivity-based semi-autonomous attitude control framework, with a particular focus on attitude kinematics defined on the special orthogonal group $SO(3)$. While human-robot interaction facilitates the successful execution of complex tasks, ensuring stability of human-in-the-loop systems on the $SO(3)$ manifold remains a largely unsolved challenge. We first propose a new control architecture in which a multi-robot system preserves invariance of the average information fed back to the human operator through so-called stealthy control, and the human intervention is mediated through a virtual leader, which is coupled with the robots via a passivity-based attitude synchronization law. We then rigorously prove closed-loop stability of the proposed human-in-the-loop system under the assumption that the human behaves as a passive system. To support this analysis, simulation studies are conducted to identify the human operator as a dynamical system, and to examine passivity properties of the identified model.

2605.24690 2026-05-26 cs.RO cs.LG version update

Sum of Costs Diffusion with Dynamic Guidance for Motion Planning

运动规划的动态引导代价和扩散模型

Aysu Aylin Kaplan, Özgür Erkent

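Affiliations: Computer Engineering Department, Hacettepe University

AI summary: Proposes a diffusion-based motion planning method with strong generalization that guides the denoising process with the gradient of the total collision cost and dynamically selects the step at which guidance starts, achieving the best performance on the Mπnets dataset.

Comments Accepted at the Frontiers of Optimization for Robotics Workshop at the IEEE International Conference on Robotics and Automation (ICRA), 2026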

2605.24643 2026-05-26 cs.RO cs.SY eess.SY version update

Towards Low-Gravity Planetary Exploration using Reinforcement Learning for Walking, Jumping, and In-flight Attitude Control

面向低重力行星探测的强化学习行走、跳跃与飞行姿态控制

Jørgen Anker Olsen, Kostas Alexis

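Affiliations: Autonomous Robots Lab, NTNU

AI summary: Uses reinforcement learning to develop walking, vertical jumping, forward jumping, and in-flight attitude control policies for a quadruped robot in Martian low gravity, enabling it to clear obstacles and land safely; simulation and experiments validate the policies.

Comments 16 pages, 16 figures

Details

AI Chinese abstract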

本文提出了用于行星探测场景中动态四足运动的强化学习策略。基于采用五杆腿设计的任务优化四足机器人，我们开发了针对行走、垂直跳跃、前向跳跃和飞行姿态控制的强化学习策略，这些策略明确针对火星上的低重力环境进行了调整。这些策略共同使机器人能够通过协调跳跃和精确的飞行中重新定向来克服比自身更大的障碍物，实现安全着陆。我们通过单轴重新定向测试在Olympus四足机器人上展示了姿态控制策略的Sim2Real迁移，而所有运动策略均在仿真中进行了验证。一个完整的火星探测任务场景展示了在复杂地形上协调策略部署的能力。实验结果显示，在2.6秒内完成90°姿态重新定向，仿真表明在火星重力条件下可实现3.1米的垂直跳跃和3.9米的前向跳跃。- 补充视频：https://www.youtube.com/watch?v=qlSJ3P87A4A

English abstract

This paper presents reinforcement learning (RL) policies for dynamic quadrupedal locomotion in planetary exploration scenarios. Building on a task-optimized quadruped with a 5-bar leg design, we develop RL policies for walking, vertical jumping, forward jumping, and in-flight attitude control, explicitly tailored to the reduced gravity on Mars. These policies jointly enable such robots to overcome obstacles larger than themselves through coordinated jumping and precise in-flight reorientation for safe landings. We demonstrate Sim2Real transfer of the attitude control policy on the Olympus quadruped through single-axis reorientation tests, while all locomotion policies are validated in simulation. A complete Mars exploration mission scenario demonstrates coordinated policy deployment across challenging terrain. Experimental results show 90° attitude reorientation in 2.6 seconds, with simulations demonstrating 3.1 meter vertical jumps and 3.9 meter forward jumps under Martian gravity conditions. Supplementary video: https://www.youtube.com/watch?v=qlSJ3P87A4A

2605.24642 2026-05-26 cs.CV cs.RO version update

Understanding the Impact of Geometric Foundation Models on Vision-Language-Action Models

理解几何基础模型对视觉-语言-动作模型的影响

Yurou Yang, Muyuan Lin, Roberto Martin-Martin, Martin Labrie, Shreekant Gayaka, Cheng-Hao Kuo, Luca Carlone

发表机构 * Amazon Personal Robotics Group（亚马逊个人机器人小组）； University of Texas at Austin（德克萨斯大学奥斯汀分校）； Massachusetts Institute of Technology（麻省理工学院）

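AI summary: This paper quantifies the "geometric gap" between vision-language-action models (VLAs) and geometric foundation models (GFMs) through linear-probing analysis, compares three architectures for injecting geometric information, and studies how non-architectural factors affect geometric VLA performance.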

AI中文摘要

近期工作探索了视觉-语言-动作模型（VLA）与用于3D重建的几何基础模型（GFM）（如VGGT）交叉领域的新机遇。虽然由此产生的几何VLA通常表现出改进的性能，但仍不清楚：(i) 现代VLA是否已经具备足够的几何理解能力，(ii) 将几何理解注入VLA的最佳架构是什么，以及(iii) 其他影响几何VLA的设计选择的效果。在本文中，我们针对特定的VLA（GR00T-N1.5）和GFM（VGGT）进行了严格的实验分析，以阐明这些问题。我们的第一个贡献是通过基于线性探测的严格分析，形式化了先前工作中关于当前VLA缺乏几何理解的直觉。该分析首次量化了VLA与GFM之间的“几何差距”。我们的第二个贡献是识别并比较了将GFM与VLA桥接的不同策略。我们实现了三种不同的架构，它们在将几何信息注入VLA的方式上有所不同，同时尽可能保持低级实现细节相似，以确保公平比较。最后，我们分析了非架构选择（例如，训练数据、相机数量、重建质量）对几何VLA性能的影响。

英文摘要

Recent work explores new opportunities at the intersection of vision-language-action models (VLAs) and geometric foundation models (GFMs) for 3D reconstruction, such as VGGT. While the resulting geometric VLAs often show improved performance, it remains unclear (i) whether modern VLAs already have sufficient geometric understanding to start with, (ii) what the best architecture is for injecting geometric understanding into a VLA, and (iii) how other design choices affect geometric VLAs. In this paper, we provide a rigorous experimental analysis to shed light on these questions, for a specific choice of VLA (GR00T-N1.5) and GFM (VGGT). Our first contribution is to formalize prior work's intuition that current VLAs lack geometric understanding, by providing a rigorous analysis based on linear probing. The analysis quantifies, for the first time, the "geometric gap" between VLAs and GFMs. Our second contribution is to identify and compare different strategies to bridge GFMs with VLAs. We implement three different architectures, which differ in the way they inject geometry into the VLA, while keeping low-level implementation details as similar as possible, to ensure a fair comparison. Finally, we analyze the impact of non-architectural choices (e.g., training data, number of cameras, reconstruction quality) on the performance of the geometric VLAs.

2605.24622 2026-05-26 cs.RO cs.CV 版本更新

PoseRefer: Pathway-Local Parameters for Semantically Grounded Reference Resolution

PoseRefer: 用于语义基础指代消解的通路-局部参数

Anna Deichler

发表机构 * KTH Royal Institute of Technology（皇家理工学院）

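AI summary: Proposes the PoseRefer architecture, which decouples the pose and text pathways and uses frozen MiniLM category embeddings to reach 31.9% top-1 accuracy on the MM-Conv dataset, and shows that fusion accuracy can be confounded by category-representation artifacts.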

Comments ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy: From Environment Understanding and Reasoning to Safe Interaction

AI中文摘要

一个机器人解析“把杯子放在那个上面”必须融合手势、语言和场景几何，然而3D基础基准测试仅部分捕获了这一情况：描述是事后编写的，手势是模板化的，或者指向是为相机摆拍的。MM-Conv从二元VR交互中捕获自然的伴随语音手势，同时包含全身动作捕捉和3D场景图。我们使用它来评估姿态-语言融合，采用解耦的后期融合架构，其中姿态和文本通路不共享任何学习参数。这两个选择共同使得通过受控消融更容易隔离类别、姿态和文本的贡献。使用冻结的MiniLM类别嵌入的融合在每种指代类型上都超过了仅姿态和最佳文本通路，达到31.9%的top-1。学习到的标量门根据文本通路是否有类别访问权限而在相反策略之间切换。这是一个可靠性诊断：除非通路在架构上解耦，否则语义基础系统的融合准确性声明与类别表示伪影无法区分。

英文摘要

A robot resolving "put the cup on that one" must fuse gesture, language, and scene geometry, yet 3D grounding benchmarks only partially capture this regime: descriptions are written post-hoc, gestures are templated, or pointing is staged for the camera. MM-Conv captures natural co-speech gesture from dyadic VR interaction alongside full-body motion capture and 3D scene graphs. We use it to evaluate pose-language fusion with a decoupled late-fusion architecture in which pose and text pathways share no learned parameters. The two choices together make category, pose, and text contributions easier to isolate through controlled ablations. Fusion with frozen MiniLM category embeddings exceeds pose alone and the best text-only pathway on every reference type, reaching 31.9% top-1. The learned scalar gate flips between opposing policies depending on whether the text pathway has category access. This is a reliability diagnostic: fusion-accuracy claims for semantic grounding systems are indistinguishable from category-representation artifacts unless pathways are architecturally decoupled.
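
To make the idea of late fusion with a single scalar gate concrete, here is a minimal sketch. It assumes each pathway independently produces one score per candidate object and that the gate is a sigmoid of one learned scalar; the blending formula, the function names, and the example scores are all hypothetical and are not taken from the paper.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(pose_scores: List[float], text_scores: List[float],
                gate_logit: float) -> List[float]:
    """Blend per-candidate scores from two independent pathways with one scalar gate."""
    if len(pose_scores) != len(text_scores):
        raise ValueError("both pathways must score the same candidate objects")
    gate = sigmoid(gate_logit)  # gate near 1 trusts pose, gate near 0 trusts text
    return [gate * p + (1.0 - gate) * t for p, t in zip(pose_scores, text_scores)]


def top1(scores: List[float]) -> int:
    """Index of the highest-scoring candidate."""
    return max(range(len(scores)), key=scores.__getitem__)


pose = [0.1, 0.7, 0.2]   # hypothetical pose-pathway scores for three objects
text = [0.6, 0.3, 0.1]   # hypothetical text-pathway scores for the same objects

balanced_choice = top1(fuse_scores(pose, text, gate_logit=0.0))
text_heavy_choice = top1(fuse_scores(pose, text, gate_logit=-4.0))
print(balanced_choice, text_heavy_choice)
```

Because the pathways share nothing but the gate, changing the gate value alone changes which object is selected, which is the kind of behaviour a controlled ablation can isolate.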

2605.24592 2026-05-26 cs.RO 版本更新

MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots

MuGen: 人形机器人的多技能生成式运动控制器

Yusen Feng, Xiang Wang, Heyuan Yao, Zixi Kang, Xinyu Huo, Boyang Yu, Pengyun Qiu, Ruijie Zhao, Baoquan Chen, Libin Liu

发表机构 * Peking University（北京大学）

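AI summary: Proposes the MuGen framework, which uses a VQ-VAE and teacher-student policy distillation to learn a generative motion representation from heterogeneous human motion data, enabling humanoid robots to perform multi-skill locomotion and imitate unseen motions.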

2605.19430 2026-05-26 cs.RO 版本更新

Neuromorphic Control of a Flapping-Wing Robot on Resource-Constrained Hardware

资源受限硬件上扑翼机器人的神经形态控制

Rim El Filali, Chenrui Feng, Chao Gao, Weibin Gu

发表机构 * Institute for AI Industry Research (AIR)（人工智能产业研究院）； Tsinghua University（清华大学）； Department of Computer Science and Technology（计算机科学与技术系）； Xinchen Qihang Inc.（新晨科技有限公司）

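AI summary: For a butterfly-inspired flapping-wing robot weighing less than 30 g, proposes a hierarchical neuromorphic control framework that deploys two lightweight spiking neural networks on a low-cost ESP32 microcontroller for state estimation and control. Trained by imitation learning, it achieves stable pitch and heading tracking in untethered flight, with 36% lower latency and 18% lower power than a conventional artificial neural network.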

AI中文摘要

扑翼微型飞行器（FWMAV）具有卓越的机动性和气动效率，但由于非线性动力学和严格的大小、重量和功率（SWaP）约束（例如重量小于30克的蝴蝶仿生机器人），给机载控制带来了重大挑战。为此，我们提出了一种层次化神经形态控制框架，能够在广泛可用、资源受限的ESP32微控制器（单价约5美元）上实现完全机载的闭环飞行。具体而言，我们的方法在机载部署了两个轻量级脉冲神经网络（SNN）：一个用于从原始传感器反馈进行状态估计，另一个通过调节中央模式发生器（CPG）进行翅膀驱动控制。通过模仿学习训练，该系统在无系留真实飞行中实现了稳定的俯仰和航向角跟踪。实验结果进一步表明，与传统人工神经网络（ANN）基线相比，基于SNN的控制器推理延迟降低了36%（从1059微秒降至680微秒），功耗降低了18%（从0.033瓦降至0.027瓦），证明了无需专用硬件的脉冲计算可行性。据我们所知，这项工作首次展示了FWMAV自主飞行的完全机载神经形态控制，突显了SNN在严格SWaP约束下实现节能自主的潜力。视觉摘要：http://bit.ly/4nI8ECY 代码：https://anonymous.4open.science/r/Espikify-76E3/

英文摘要

Flapping-Wing Micro Aerial Vehicles (FWMAVs) provide exceptional maneuverability and aerodynamic efficiency but pose significant challenges for onboard control due to nonlinear dynamics and stringent Size, Weight, and Power (SWaP) constraints, as exemplified by a butterfly-inspired robot weighing less than 30 g. To this end, we present a hierarchical neuromorphic control framework that enables fully onboard, closed-loop flight on a widely available, resource-constrained ESP32 microcontroller with a unit cost of approximately $5. Specifically, our method deploys two lightweight Spiking Neural Networks (SNNs) onboard: one for state estimation from raw sensory feedback and another for control via modulation of a Central Pattern Generator (CPG) for wing actuation. Trained by imitation learning, the system achieves stable pitch and heading angle tracking during untethered real-world flight. Experimental results further reveal that the SNN-based controller reduces inference latency by 36% (1059 µs to 680 µs) and power by 18% (0.033 W to 0.027 W) compared to the conventional Artificial Neural Network (ANN) baseline, demonstrating the viability of spike-based computation without specialized hardware. To the best of our knowledge, this work constitutes the first demonstration of fully onboard neuromorphic control for autonomous flight of an FWMAV, highlighting the potential of SNNs to enable energy-efficient autonomy under stringent SWaP constraints. Visual abstract: http://bit.ly/4nI8ECY Code: https://anonymous.4open.science/r/Espikify-76E3/
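
The reported percentages can be checked directly from the raw figures in the abstract. The short sketch below does that, and additionally derives an energy-per-inference figure; that last quantity is an assumption made here (power multiplied by latency, treating the reported power as the average draw during one inference) and is not a number stated in the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


latency_cut = relative_reduction(1059.0, 680.0)   # microseconds, from the abstract
power_cut = relative_reduction(0.033, 0.027)      # watts, from the abstract

# Assumption: energy per inference = power * latency, treating the reported
# power as the average draw during one inference (W * us = uJ).
ann_energy_uj = 0.033 * 1059.0
snn_energy_uj = 0.027 * 680.0
energy_cut = relative_reduction(ann_energy_uj, snn_energy_uj)

print(f"latency reduction: {latency_cut:.1%}")
print(f"power reduction: {power_cut:.1%}")
print(f"derived energy-per-inference reduction: {energy_cut:.1%}")
```

The first two values round to the 36% and 18% quoted in the abstract; the derived energy figure holds only under the stated assumption.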

2605.17268 2026-05-26 cs.AI cs.CV cs.RO 版本更新

Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation in Autonomous Driving Models

VLA 推理是否忠实？自动驾驶模型中因果链的安全性探究

Nicanor Mayumu, Xiaoheng Deng, Patrick Mukala

发表机构 * School of Computer Science and Engineering（计算机科学与工程学院）； Central South University（中南大学）； School of Computer Science（计算机科学学院）； University of Wollongong in Dubai（伍伦贡大学迪拜分校）

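AI summary: An analysis of 300 VLA inferences finds that the faithfulness between the output reasoning and the trajectory is only 42.5%, with many missed pedestrians, fragile trajectories, and reasoning-action inconsistencies; the paper also proposes an information-theoretic formalization of faithfulness and a safety architecture.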

Comments Accept (Poster), CVPR 2026 Workshop DriveX NonArchival Track

2605.15971 2026-05-26 cs.RO 版本更新

OHP-RL: Online Human Preference as Guidance in Reinforcement Learning for Robot Manipulation

OHP-RL：在线人类偏好作为机器人操作强化学习中的指导

Yunyang Mo, Jian Li, Qiwei Wu, Yihang Kang, Renjing Xu

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

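AI summary: Proposes the OHP-RL framework, which treats human interventions as preference information and uses a state-dependent preference gate to adaptively regulate policy learning, achieving high success rates, fast convergence, and low human intervention on contact-rich manipulation tasks with a Franka robot.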

AI中文摘要

虽然强化学习使机器人能够自主获取技能，但其在实际部署中受到低效和不安全探索的严重限制。人类在环干预提供了一种实用的解决方案，但现有方法通常将这些干预作为辅助训练信号，未能充分捕捉它们提供的关于何时以及如何引导自主性的更丰富信息。人类干预通常编码了在安全和任务约束下对行为的相对偏好，而不是规定要模仿的精确动作。受此观点启发，我们提出在线人类偏好作为强化学习中的指导（OHP-RL），这是一个利用人类干预作为偏好信息来指导策略学习的框架。OHP-RL引入了一个状态依赖的偏好门，自适应地调节人类干预应在何时以及多大程度上塑造策略学习。这种设计使智能体能够从间歇性和不完美的人类反馈中受益，同时保持自主探索和稳定的策略优化。我们在Franka机器人上的三个具有挑战性的真实世界接触丰富操作任务中评估了OHP-RL。在所有任务中，OHP-RL始终实现了高成功率、更快的收敛以及比先前方法显著更低的人类干预努力。此外，学习到的策略在整个训练过程中表现出更稳定和与人类一致的行为。

英文摘要

While reinforcement learning (RL) enables robots to acquire skills autonomously, its real-world deployment is severely limited by inefficient and unsafe exploration. Human-in-the-loop interventions offer a practical solution, yet existing methods typically exploit these interventions as auxiliary training signals, without fully capturing the richer information they provide about when and how autonomy should be guided. Human interventions often encode relative preferences over behavior under safety and task constraints, rather than prescribing exact actions to imitate. Motivated by this perspective, we propose Online Human Preference as Guidance in Reinforcement Learning (OHP-RL), a framework that leverages human interventions as preference information to guide policy learning. OHP-RL introduces a state-dependent preference gate that adaptively regulates when and to what extent human interventions should shape policy learning. This design enables the agent to benefit from intermittent and imperfect human feedback while preserving autonomous exploration and stable policy optimization. We evaluate OHP-RL on three challenging real-world contact-rich manipulation tasks on a Franka robot. Across all tasks, OHP-RL consistently achieves strong success rates, faster convergence, and substantially lower human intervention effort than prior approaches. Moreover, the learned policies exhibit more stable and human-aligned behavior throughout training.

2605.05182 2026-05-26 cs.RO cs.SY eess.SY 版本更新

Altitude-Adaptive Vision-Only Geo-Localization for UAVs in GPS-Denied Environments

GPS拒止环境下无人机的高度自适应纯视觉地理定位

Xingyu Shao, Mengfan He, Chunyu Li, Liangzheng Sun, Ziyang Meng

发表机构 * Department of Precision Instrument, Tsinghua University（清华大学精密仪器系）； School of Aerospace Engineering, Beijing Institute of Technology（北京理工大学航天工程学院）； School of Instrumentation Science and Opto-electronics Engineering, Beijing Information Science and Technology University（北京信息科技大学仪器科学与光电工程学院）

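AI summary: To address the scale mismatch caused by altitude changes in UAV visual place recognition, proposes a monocular altitude-adaptive geo-localization framework that estimates relative altitude via a frequency-domain transform and uses it to normalize image scale, performs coarse localization with a classification-then-retrieval visual place recognition module, and introduces a quality-adaptive margin classifier to improve retrieval robustness.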

AI中文摘要

为了解决无人机视觉位置识别中由高度大幅变化引起的尺度不匹配问题，我们提出了一种仅依赖单目视觉的高度自适应地理定位框架。该方法首先通过将输入图像转换到频域，并将高度估计建模为回归作为分类问题，从单张下视图像中估计相对高度。然后利用估计的高度将查询图像裁剪到规范尺度，之后通过分类-检索视觉位置识别模块进行粗定位。为了在图像质量变化的情况下提高检索鲁棒性，我们进一步引入了质量自适应边缘分类器，并通过加权坐标估计对最终位置进行精化，该估计基于前k个检索候选。在两个合成数据集和两个真实飞行数据集上的实验表明，相对高度估计模块在显著高度变化下，下游检索性能有显著提升。与使用相同检索流程但未进行高度归一化相比，我们的视觉位置识别模块通过高度自适应使平均R@1和R@5分别提高了41.50和56.83个百分点，完整系统在报告的工作站硬件上以13.3帧/秒运行。这些结果表明，相对高度估计为跨高度无人机地理定位提供了有效的尺度先验，并在无需辅助距离传感器或时间输入的情况下支持GPS拒止环境下的粗初始化。

英文摘要

To address the scale mismatch caused by large altitude variations in UAV visual place recognition, we propose a monocular vision-only altitude-adaptive geo-localization framework. The method first estimates relative altitude from a single downward-looking image by transforming the input into the frequency domain and formulating altitude estimation as a regression-as-classification (RAC) problem. The estimated altitude is then used to crop the query image to a canonical scale, after which a classification-then-retrieval visual place recognition module performs coarse localization. To improve retrieval robustness under varying image quality, we further introduce a quality-adaptive margin classifier (QAMC) and refine the final location by weighted coordinate estimation over the top retrieved candidates. Experiments on two synthetic datasets and two real-flight datasets show that the relative altitude estimation (RAE) module yields clear overall improvements in downstream retrieval performance under significant altitude changes. With our visual place recognition module, altitude adaptation improves average R@1 and R@5 by 41.50 and 56.83 percentage points, respectively, compared with using the same retrieval pipeline without altitude normalization, and the full system runs at 13.3 frames/s on the reported workstation hardware. These results indicate that relative altitude estimation provides an effective scale prior for cross-altitude UAV geo-localization and supports GPS-denied coarse initialization without auxiliary range sensors or temporal inputs.
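
Two steps of the pipeline described above lend themselves to a small illustration: cropping the query image to a canonical scale from the estimated altitude, and refining the location as a weighted average over the top retrieved candidates. The sketch below assumes a nadir-pointing pinhole camera over flat ground (so the ground footprint scales linearly with altitude) and softmax weights over similarity scores; both choices, along with all function names and the `temperature` parameter, are assumptions made here and may differ from the paper's actual design.

```python
import math
from typing import List, Tuple


def canonical_crop_box(width: int, height: int, est_altitude: float,
                       canonical_altitude: float) -> Tuple[int, int, int, int]:
    """Centered (left, top, right, bottom) crop matching the canonical ground footprint.

    Assumes a nadir-pointing pinhole camera over flat ground, so the footprint
    grows linearly with altitude. Images taken below the canonical altitude
    are returned uncropped.
    """
    ratio = min(1.0, canonical_altitude / est_altitude)
    crop_w, crop_h = round(width * ratio), round(height * ratio)
    left, top = (width - crop_w) // 2, (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def weighted_location(candidates: List[Tuple[float, Tuple[float, float]]],
                      temperature: float = 0.1) -> Tuple[float, float]:
    """Softmax-weighted mean of candidate coordinates; weights come from similarity."""
    best = max(sim for sim, _ in candidates)
    weights = [math.exp((sim - best) / temperature) for sim, _ in candidates]
    total = sum(weights)
    x = sum(w * xy[0] for w, (_, xy) in zip(weights, candidates)) / total
    y = sum(w * xy[1] for w, (_, xy) in zip(weights, candidates)) / total
    return x, y


# Hypothetical query taken at twice the canonical altitude.
box = canonical_crop_box(1000, 800, est_altitude=200.0, canonical_altitude=100.0)
# Two hypothetical candidates with equal similarity: the result is their midpoint.
location = weighted_location([(0.9, (0.0, 0.0)), (0.9, (2.0, 4.0))])
print(box, location)
```

At twice the canonical altitude the crop keeps the central half of each image dimension, which is the scale normalization the retrieval stage relies on.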

2602.03983 2026-05-26 cs.RO cs.CV 版本更新

Efficient Long-Horizon Vision-Language-Action Models via Static-Dynamic Disentanglement

通过静态-动态解耦实现高效长程视觉-语言-动作模型

Weikang Qiu, Huashuo Lei, Tinglin Huang, Rex Ying

发表机构 * Yale University（耶鲁大学）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

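AI summary: Proposes the DySta framework, which disentangles visual inputs into multi-level static and dynamic tokens to shorten the context and reuse the KV cache, enabling efficient multi-frame integration and inference and markedly improving performance on benchmarks and real-world tasks.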

AI中文摘要

视觉-语言-动作（VLA）模型最近成为通用机器人控制的一种有前景的范式。基于视觉-语言模型（VLM）架构，VLA模型根据视觉观察和语言指令预测动作，在任务中实现了强大的性能和泛化能力。然而，VLA模型面临两个主要挑战：输入帧的有限上下文窗口，以及由于二次注意力复杂性和大参数数量导致的低效推理。为此，我们提出了DySta，一个将视觉输入解耦为多级静态和动态令牌的框架，使得（1）在帧间保留静态令牌的单一副本以显著减少上下文长度，以及（2）通过轻量级重缓存门（仅在必要时更新）重用静态令牌的键值（KV）缓存。这种设计实现了高效的多帧集成和高效推理。此外，我们引入了一个新的基准测试，更有效地评估VLA模型的多帧集成能力。实验表明，DySta在我们的基准测试中各项指标上提高了24.5%的多帧集成能力，在真实世界记忆依赖任务中绝对成功率达到23.3%，同时在模拟基准测试中推理速度提升2.0倍（成功率+2.3%），在真实世界通用任务中推理速度提升2.2倍（成功率+10.6%）。

英文摘要

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for generalist robotic control. Built upon vision-language model (VLM) architectures, VLAs predict actions conditioned on visual observations and language instructions, achieving strong performance and generalization across tasks. However, VLAs face two major challenges: a limited context window for input frames and inefficient inference due to quadratic attention complexity and large parameter counts. To this end, we propose DySta, a framework that disentangles visual inputs into multi-level static and dynamic tokens, which enables (1) retaining a single copy of static tokens across frames to significantly reduce context length, and (2) reusing the key-value (KV) cache of static tokens through a lightweight recache gate that updates only when necessary. This design enables efficient multi-frame integration and inference. In addition, we introduce a new benchmark that more effectively evaluates the multi-frame integration ability of VLAs. Experiments show that DySta improves multi-frame integration by 24.5% across metrics on our benchmark and 23.3% in absolute success rate on real-world memory-dependent tasks, while accelerating inference by 2.0x (with +2.3% success rate) on simulation benchmarks and 2.2x (with +10.6% success rate) on real-world general tasks.
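
The recache idea can be sketched with a toy cache that recomputes the key/value pair of the static tokens only when they have drifted from the version that was cached. The drift test used here (a fixed threshold on cosine distance) is a stand-in chosen for illustration; the paper's gate may be learned or defined differently, and `StaticKVCache`, `compute_kv`, and `threshold` are hypothetical names.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
KV = Tuple[Vector, Vector]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class StaticKVCache:
    """Reuses the key/value pair of static tokens until they drift too far."""

    def __init__(self, compute_kv: Callable[[Vector], KV], threshold: float = 0.05):
        self.compute_kv = compute_kv      # hypothetical, expensive projection
        self.threshold = threshold        # hypothetical drift tolerance
        self.reference: Optional[Vector] = None
        self.kv: Optional[KV] = None
        self.recomputes = 0

    def get(self, static_tokens: Vector) -> KV:
        stale = (self.reference is None
                 or cosine_distance(static_tokens, self.reference) > self.threshold)
        if stale:  # gate fires: refresh the cached key/value pair
            self.kv = self.compute_kv(static_tokens)
            self.reference = list(static_tokens)
            self.recomputes += 1
        return self.kv


cache = StaticKVCache(lambda tokens: (list(tokens), list(tokens)))
for frame_tokens in ([1.0, 0.0], [1.0, 0.01], [0.0, 1.0]):
    cache.get(frame_tokens)
print(cache.recomputes)
```

In this toy run the second frame is close enough to the first to reuse the cache, while the third frame triggers a refresh, so the expensive projection runs for only two of the three frames.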

2512.00375 2026-05-26 cs.RO 版本更新

DPNet: Doppler LiDAR Motion Planning for Highly-Dynamic Environments

DPNet: 面向高动态环境的多普勒激光雷达运动规划

Wei Zuo, Zeyi Ren, Chengyang Li, Yikun Wang, Mingle Zhao, Shuai Wang, Wei Sui, Fei Gao, Yik-Chung Wu, Chengzhong Xu

发表机构 * The University of Hong Kong（香港大学）； Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences（中国科学院深圳先进技术研究所）； University of Macau（澳门大学）； D-Robotics ； Zhejiang University（浙江大学）

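AI summary: Proposes DPNet, which tracks fast obstacles with a Doppler Kalman neural network and uses Doppler-tuned model predictive control to achieve high-frequency, high-accuracy motion planning in highly dynamic environments.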

Comments Accepted to IEEE Robotics and Automation Letters in April, 2026

DOI: 10.1109/LRA.2026.3685933

AI中文摘要

现有的运动规划方法由于对环境变化理解不足，常常难以应对快速移动的障碍物。为了解决这一问题，我们提出将运动规划器与多普勒激光雷达集成，后者不仅提供测距测量，还提供瞬时点速度。然而，由于高精度和高频率的要求，这种集成并非易事。为此，我们引入了多普勒规划网络（DPNet），通过基于多普勒模型的学习来跟踪和应对快速障碍物。我们首先提出了一种多普勒卡尔曼神经网络（D-KalmanNet），用于在部分可观测的高斯状态空间模型下跟踪障碍物状态。然后，我们利用预测的障碍物运动构建了一个多普勒调谐模型预测控制（DT-MPC）框架用于自我运动规划，实现了控制器参数的运行时自动调优。这两个模块使得DPNet能够从最少数据中学习快速环境变化，同时保持轻量级，在跟踪和规划中实现高频率和高精度。在高保真模拟器和真实世界数据集上的实验表明，DPNet优于广泛的基准方案。代码可在 https://github.com/UUwei-zuo/DPNet 获取。

英文摘要

Existing motion planning methods often struggle with rapid-motion obstacles due to an insufficient understanding of environmental changes. To address this, we propose integrating motion planners with Doppler LiDARs, which provide not only ranging measurements but also instantaneous point velocities. However, this integration is nontrivial due to the requirements of high accuracy and high frequency. To this end, we introduce Doppler Planning Network (DPNet), which tracks and reacts to rapid obstacles via Doppler model-based learning. We first propose a Doppler Kalman neural network (D-KalmanNet) to track obstacle states under a partially observable Gaussian state space model. We then leverage the predicted motions of obstacles to construct a Doppler-tuned model predictive control (DT-MPC) framework for ego-motion planning, enabling runtime auto-tuning of controller parameters. These two modules allow DPNet to learn fast environmental changes from minimal data while remaining lightweight, achieving high frequency and high accuracy in both tracking and planning. Experiments on a high-fidelity simulator and real-world datasets demonstrate the superiority of DPNet over extensive benchmark schemes. Code is available at https://github.com/UUwei-zuo/DPNet

2511.19211 2026-05-26 cs.RO 版本更新

Soft Pneumatic Grippers: Topology optimization, 3D-printing and Experimental validation

软体气动夹爪：拓扑优化、3D打印与实验验证

Prabhat Kumar, Chandra Prakash, Josh Pinskier, David Howard, Matthijs Langelaar

发表机构 * Department of Mechanical and Aerospace Engineering, Indian Institute of Technology Hyderabad, Telangana 502285, India（机械与航空航天工程系，印度理工学院海得拉巴分校，特伦甘纳邦 502285，印度）； CSIRO Robotics, Pullenvale, QLD 4069, Australia（澳大利亚CSIRO机器人部，Pullenvale，QLD 4069，澳大利亚）； Faculty of Mechanical Engineering, Delft University of Technology, Mekelweg 2, Delft, 2628CD, Zuid-Holland, The Netherlands（代尔夫特理工大学机械工程学院，Mekelweg 2，代尔夫特，2628CD，南荷兰省，荷兰）

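AI summary: Proposes a topology optimization framework for soft pneumatic grippers that accounts for the design-dependent nature of the load; a 2D soft arm unit is optimized, 3D-printed, and experimentally validated, and is shown to outperform a conventional rectangular design.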

Comments 11 Figures

AI中文摘要

本文提出了一种系统性的拓扑优化框架，用于设计软体气动夹爪（SPG），明确考虑了驱动载荷的设计依赖性。载荷使用达西定律并添加排水项进行建模。通过将问题表述为使用鲁棒公式的柔顺机构设计问题，优化了一个2D软臂单元。该问题被设定为最小-最大优化，其中考虑了蓝图设计和侵蚀设计的输出变形。对蓝图部分施加体积约束，对侵蚀部分施加应变能约束。采用MMA求解优化问题并获得优化的软单元。使用Ogden材料模型进行有限元分析证实，优化后的2D单元在气动载荷下优于传统的矩形设计。将优化后的2D单元拉伸得到3D模块，并组装十个这样的单元以形成软臂。分析了优化臂在不同压力载荷下的变形曲线。对四个臂进行3D打印，并与支撑结构集成以实现所提出的SPG。在具有不同重量、尺寸、刚度和形状的物体上展示了SPG的抓取性能。

英文摘要

This paper presents a systematic topology optimization framework for designing a soft pneumatic gripper (SPG), explicitly considering the design-dependent nature of the actuating load. The load is modeled using Darcy's law with an added drainage term. A 2D soft arm unit is optimized by formulating it as a compliant mechanism design problem using the robust formulation. The problem is posed as a min-max optimization, where the output deformations of blueprint and eroded designs are considered. A volume constraint is imposed on the blueprint part, while a strain-energy constraint is enforced on the eroded part. The Method of Moving Asymptotes (MMA) is employed to solve the optimization problem and obtain the optimized soft unit. Finite element analysis with the Ogden material model confirms that the optimized 2D unit outperforms a conventional rectangular design under pneumatic loading. The optimized 2D unit is extruded to obtain a 3D module, and ten such units are assembled to create a soft arm. Deformation profiles of the optimized arm are analysed under different pressure loads. Four arms are 3D-printed and integrated with a supporting structure to realize the proposed SPG. The gripping performance of the SPG is demonstrated on objects with different weights, sizes, stiffness, and shapes.

2510.23509 2026-05-26 cs.RO 版本更新

Logic-Guided Socially-aware Robot Navigation World Model

逻辑引导的社会感知机器人导航世界模型

Weizheng Wang, Obi Ike, Soyun Choi, Sungeun Hong, Aniket Bera, Byung-Cheol Min

发表机构 * School of Applied and Creative Computing, Purdue University（应用与创意计算学院，普渡大学）； Department of Computer Science, Purdue University（计算机科学系，普渡大学）； Department of Applied Artificial Intelligence, Sungkyunkwan University（应用人工智能系，成均馆大学）； Department of Computer Science and Department of Intelligent Systems Engineering, Indiana University Bloomington（计算机科学系和智能系统工程系，印第安纳大学布卢明顿分校）

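AI summary: Proposes NaviWM, which combines a structured world model with a logic-driven reasoning chain to strengthen the ability of large language models to generate socially compliant and physically safe navigation decisions in dynamic human spaces.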

AI中文摘要

社交机器人导航越来越依赖大语言模型进行推理、路径规划以及在动态人类空间中实现移动。然而，仅依赖LLM进行规划往往会导致不可预测和不安全的行为，尤其是在动态人类空间中，原因是物理基础有限且逻辑一致性弱。在这项工作中，我们引入了NaviWM，一种社会感知的机器人导航世界模型，它通过结构化世界模型和逻辑驱动的思维链过程增强LLM推理。NaviWM由两个主要组件组成：（1）一个时空世界模型，捕捉环境中智能体的位置、速度和活动；（2）一个演绎推理模块，通过多步、基于逻辑的推理过程引导LLM。这种集成使机器人能够在明确定义的约束（如个人空间、碰撞避免和时机）下生成既社交合规又物理安全的导航决策。与基于提示或微调的先前方法不同，NaviWM将社会规范编码为一阶逻辑，从而实现可解释和可验证的推理。实验表明，NaviWM提高了成功率并减少了社交违规，尤其是在拥挤环境中。这些结果证明了将形式推理与LLM结合用于鲁棒社交导航的好处。本工作的更多实验细节和演示视频可在以下网址找到：https://sites.google.com/view/NaviWM。

英文摘要

Social robot navigation increasingly relies on large language models for reasoning, path planning, and enabling movement in dynamic human spaces. However, relying solely on LLMs for planning often leads to unpredictable and unsafe behaviors, especially in dynamic human spaces, due to limited physical grounding and weak logical consistency. In this work, we introduce NaviWM, a socially-aware robot Navigation World Model that augments LLM reasoning with a structured world model and a logic-driven chain-of-thought process. NaviWM consists of two main components: (1) a spatial-temporal world model that captures the positions, velocities, and activities of agents in the environment, and (2) a deductive reasoning module that guides LLMs through a multi-step, logic-based inference process. This integration enables the robot to generate navigation decisions that are both socially compliant and physically safe, under well-defined constraints such as personal space, collision avoidance, and timing. Unlike previous methods based on prompting or fine-tuning, NaviWM encodes social norms as first-order logic, enabling interpretable and verifiable reasoning. Experiments show that NaviWM improves success rates and reduces social violations, particularly in crowded environments. These results demonstrate the benefit of combining formal reasoning with LLMs for robust social navigation. Additional experimental details and demo videos for this work can be found at: https://sites.google.com/view/NaviWM.
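
As an illustration of what encoding a social norm as a checkable logical rule can look like, the sketch below tests the universally quantified statement "for every agent and every sampled time, the robot stays at least a personal-space radius away" against a candidate robot velocity. The constant-velocity prediction, the 1.2 m radius, the sampling step, and every name in the code are assumptions made here; the paper's actual rule set and world model are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Agent:
    x: float
    y: float
    vx: float
    vy: float


def keeps_personal_space(robot_xy: Tuple[float, float], robot_vel: Tuple[float, float],
                         agents: List[Agent], radius: float = 1.2,
                         horizon: float = 2.0, dt: float = 0.5) -> bool:
    """True if, for all agents and all sampled times, distance >= radius.

    Encodes the rule  forall a, forall t: dist(robot(t), a(t)) >= radius
    under a constant-velocity prediction for the robot and the agents.
    """
    steps = int(round(horizon / dt))
    for k in range(steps + 1):
        t = k * dt
        rx = robot_xy[0] + robot_vel[0] * t
        ry = robot_xy[1] + robot_vel[1] * t
        for a in agents:
            if math.hypot(a.x + a.vx * t - rx, a.y + a.vy * t - ry) < radius:
                return False
    return True


# Hypothetical pedestrian walking toward the robot along the x axis.
pedestrian = Agent(x=2.0, y=0.0, vx=-1.0, vy=0.0)
head_on = keeps_personal_space((0.0, 0.0), (1.0, 0.0), [pedestrian])
sidestep = keeps_personal_space((0.0, 0.0), (0.0, 1.0), [pedestrian])
print(head_on, sidestep)
```

A rule written this way yields a yes/no verdict that can be inspected and verified, which is the property the abstract attributes to logic-based norms as opposed to prompting alone.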

2510.01389 2026-05-26 cs.RO cs.AI cs.LG 版本更新

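DELTA: Design, Control, and Motion Strategies of a Deformable Multilink Multirotor for Air-Ground Hybrid Locomotion and Manipulation

DELTA：可变形多连杆多旋翼飞行器的设计、控制与运动策略——用于空地混合运动与操作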

Kazuki Sugihara, Moju Zhao, Takuzumi Nishio, Kei Okada, Masayuki Inaba

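AI summary: This paper proposes DELTA, a new multilink multirotor robot with thrusters mounted on each link and joint actuation, which enables terrestrial rolling, aerial flight, and manipulation in multiple environments, and designs a real-time control method based on nonlinear optimization together with motion strategies that account for contact constraints.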

Comments 20 pages, 31 figures

AI中文摘要

近年来，多模态运动能力使机器人能够在陆地和空中领域机动。然而，大多数此类机器人仅设计用于运动，很少具备实际任务所需的操作能力。通过添加机械臂，地面机器人可以执行操作，一些带有机械臂的无人机已展示了空中操作能力。尽管如此，这类多旋翼无法直接用于地面操作，且这种配置本身不适合空地混合运动。这是因为其推进器集中式结构难以同时实现足够的操作自由度（DoF）以及带接触和变形的稳定运动。因此，在本工作中，我们开发了一种新型多连杆多旋翼机器人，每个连杆上装有推进器，并能够与环境接触。该机器人可以利用关节驱动，在多种环境中执行地面滚动运动、空中飞行运动以及操作。首先，我们介绍了所提出机器人的最小配置设计。我们还描述了运动学模型，并基于该模型提出了每个组件的设计。其次，我们提出了一种基于非线性优化的实时控制方法，该方法考虑了接触和关节运动，可应用于各种多旋翼。第三，我们提出了包含空地混合多连杆多旋翼特有接触约束的运动策略，并基于多接触模型分析了操作能力的局限性。最后，我们使用实现的样机展示了两个领域中的多种运动。据我们所知，这是多连杆多旋翼首次展示空地混合运动与操作。

英文摘要

In recent years, multimodal locomotion capabilities have enabled robots to maneuver in both terrestrial and aerial domains. However, most of these robots are designed only for locomotion, and few possess the manipulation capabilities required for practical tasks. By adding a manipulator, ground robots can perform manipulation, and some drones with robotic arms have demonstrated aerial manipulation. Nonetheless, such multirotors cannot be directly used for manipulation on the ground, and this configuration itself is unsuitable for air-ground hybrid locomotion. This is because their thruster-centralized structure makes it difficult to achieve both sufficient degrees of freedom (DoF) for manipulation and stable motion with contact and transformation. Therefore, in this work, we develop a new multilink multirotor that has thrusters on each link and can make contact with the environment. This robot can perform terrestrial rolling locomotion, aerial flight locomotion, and manipulation in multiple environments using joint actuation. First, we introduce a minimal configuration design of the proposed robot. We also describe a kinematic model and propose a design for each component based on this model. Second, we propose a real-time control method based on nonlinear optimization that considers contact and joint motion, which can be applied to various multirotors. Third, we propose motion strategies that include contact constraints specific to air-ground hybrid multilink multirotors, and analyze the limitations of manipulation capabilities based on a multi-contact model. Finally, we demonstrate a variety of motions in both domains using the implemented prototype. To the best of our knowledge, this is the first demonstration of air-ground hybrid locomotion and manipulation by a multilink multirotor.

2105.01215 2026-05-26 cs.RO 版本更新

Lidar Scan Registration Robust to Extreme Motions

对极端运动鲁棒的激光雷达扫描配准

Simon-Pierre Deschênes, Dominic Baril, Vladimír Kubelka, Philippe Giguère, François Pomerleau

发表机构 * Northern Robotics Laboratory（北方机器人实验室）

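AI summary: To address registration failures caused by point cloud skew under extreme motion, proposes a de-skewing method that accounts for trajectory motion uncertainty and environment geometry; at peak accelerations of 200 m/s^2 and 800 rad/s^2, it reduces translation error by 9.26% and rotation error by 21.84%.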

Comments 8 pages, 8 figures, published in 2021 18th Conference on Robots and Vision (CRV), Burnaby, Canada

DOI: 10.1109/CRV52889.2021.00014
Journal ref: 2021 18th Conference on Robots and Vision (CRV), 2021, pp. 17-24

AI中文摘要

配准算法，如迭代最近点（ICP），在过去几十年中已被证明在移动机器人定位算法中有效。然而，当机器人承受极端速度和加速度时，它们容易失败。例如，这种运动可能在碰撞后发生，导致点云严重畸变。虽然过去已经探索了点云去畸变方法以提高定位和建图精度，但这些方法仍然依赖于高精度的里程计系统或理想的导航条件。在本文中，我们提出了一种方法，考虑了用于去畸变点云的轨迹的剩余运动不确定性以及环境几何，以提高当前配准算法的鲁棒性。我们在一个产生200 m/s^2和800 rad/s^2峰值加速度的3D地图测试台上将我们的方法与其他三种解决方案进行了比较。在这些极端场景中，我们证明了我们的方法将平移误差降低了9.26%，旋转误差降低了21.84%。所提出的方法具有足够的通用性，可以无需调整地集成到许多加权ICP的变体中，并支持在更恶劣地形中的定位鲁棒性。

英文摘要

Registration algorithms, such as Iterative Closest Point (ICP), have proven effective in mobile robot localization algorithms over the last decades. However, they are susceptible to failure when a robot sustains extreme velocities and accelerations. For example, this kind of motion can happen after a collision, causing a point cloud to be heavily skewed. While point cloud de-skewing methods have been explored in the past to increase localization and mapping accuracy, these methods still rely on highly accurate odometry systems or ideal navigation conditions. In this paper, we present a method taking into account the remaining motion uncertainties of the trajectory used to de-skew a point cloud along with the environment geometry to increase the robustness of current registration algorithms. We compare our method to three other solutions in a test bench producing 3D maps with peak accelerations of 200 m/s^2 and 800 rad/s^2. In these extreme scenarios, we demonstrate that our method decreases the error by 9.26% in translation and by 21.84% in rotation. The proposed method is generic enough to be integrated into many variants of weighted ICP without adaptation and supports localization robustness in harsher terrains.

2605.24495 2026-05-26 cs.RO 版本更新

Elevator-LIO: Robust LiDAR-Inertial Odometry for Multi-Floor Navigation under Elevator-Induced Non-Inertial Motion

Elevator-LIO：电梯引起的非惯性运动下多层导航的鲁棒激光雷达-惯性里程计

Yifan Zhang, Yudong Huang, Yuchong Zhang, Changze Li, Haoran Liu, Ming Yang, Tong Qin

发表机构 * Shanghai Jiao Tong University（上海交通大学）

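AI summary: Proposes the Elevator-LIO framework, which achieves continuous localization inside elevators through a decoupled state-estimation model and a mode-dependent iterated error-state Kalman filter, suppresses vertical drift with event-triggered zero-velocity and zero-acceleration updates, and uses adaptive voxel downsampling to keep the number of effective points stable.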

Comments 16 pages, 10 figures, 5 tables

AI中文摘要

本文提出了Elevator-LIO，一种旨在电梯行驶过程中实现机器人连续定位的激光雷达-惯性里程计框架，从而支持跨楼层机器人任务。为了解决非惯性框架下的状态估计问题，Elevator-LIO建立了一个解耦的状态估计模型，分别对机器人相对于电梯的运动和电梯自身的运动进行建模，并将其嵌入到模式依赖的迭代误差状态卡尔曼滤波器框架中。该框架在普通室内环境中退化为常规LIO估计，同时在电梯非惯性环境中实现电梯相关状态的传播和约束更新，从而实现连续稳定的定位。电梯模式管理器利用激光雷达测距统计和估计状态检测电梯进出事件，并在电梯停止时引入事件触发的零速度和零加速度更新，以抑制累积的垂直漂移。此外，本文采用自适应体素降采样策略，在环境尺度显著变化时保持有效点数的稳定。我们在包含79次电梯乘坐的20个真实世界序列上进行了广泛实验，包括大尺度空间、长垂直行程、动态行人干扰和镜面反射等实际挑战。结果表明，Elevator-LIO在所有序列中保持连续定位精度，其中17个序列的终端高度误差低于1厘米。相比之下，现有代表性定位系统在这些电梯序列上表现不佳。在Hilti 2022/2023数据集上的测试进一步表明，所提方法在标准室内场景中仍具有竞争力。项目页面位于https://xiaofan4122.github.io/Elevator_LIO_Page/。

英文摘要

This paper presents Elevator-LIO, a LiDAR-inertial odometry framework designed to achieve continuous robot localization during elevator travel, thereby supporting cross-floor robotic tasks. To address the state-estimation problem in non-inertial frames, Elevator-LIO establishes a decoupled state-estimation model that separately models the robot motion relative to the elevator and the elevator motion itself, and embeds it into a mode-dependent iterated error-state Kalman filter framework. This framework degenerates to conventional LIO estimation in ordinary indoor environments, while enabling the propagation and constrained update of elevator-related states in elevator non-inertial environments, thereby achieving continuous and stable localization. An elevator mode manager detects elevator entry and exit events using LiDAR ranging statistics and estimated states, and introduces event-triggered zero-velocity and zero-acceleration updates when the elevator stops to suppress accumulated vertical drift. In addition, this paper adopts an adaptive voxel downsampling strategy to maintain a stable number of effective points under significant environmental scale changes. We conduct extensive experiments on 20 real-world sequences containing 79 elevator rides, including practical challenges such as large-scale spaces, long vertical travel, dynamic pedestrian interference, and mirror reflections. The results show that Elevator-LIO maintains continuous localization accuracy in all sequences, with terminal height error below 1 cm in 17 sequences. In contrast, existing representative localization systems perform poorly on these elevator sequences. Tests on the Hilti 2022/2023 datasets further show that the proposed method remains competitive in standard indoor scenarios. The project page is available at https://xiaofan4122.github.io/Elevator_LIO_Page/.
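
The event-triggered zero-velocity update mentioned above can be illustrated with a one-dimensional Kalman update. This is a deliberately reduced sketch: the paper uses an iterated error-state Kalman filter over a full state, whereas the code below tracks only a scalar vertical velocity, and the noise value and example numbers are hypothetical.

```python
from typing import Tuple


def zero_velocity_update(velocity: float, variance: float,
                         meas_noise: float = 1e-4) -> Tuple[float, float]:
    """Scalar Kalman update with the pseudo-measurement 'velocity = 0'."""
    gain = variance / (variance + meas_noise)
    new_velocity = velocity + gain * (0.0 - velocity)
    new_variance = (1.0 - gain) * variance
    return new_velocity, new_variance


# Hypothetical drifted estimate at the moment the elevator is detected to have stopped.
v_est, v_var = 0.20, 0.04            # m/s and (m/s)^2
v_new, var_new = zero_velocity_update(v_est, v_var)
print(f"{v_new:.5f} m/s, variance {var_new:.6f}")
```

Because the pseudo-measurement is far more certain than the drifted estimate, the update pulls the velocity almost to zero and shrinks its variance, which stops the height estimate from continuing to integrate a spurious velocity.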

2605.24449 2026-05-26 cs.RO cs.LG 版本更新

Vision-Guided Outdoor Flight and Obstacle Evasion via Reinforcement Learning

基于强化学习的视觉引导户外飞行与避障

Shiladitya Dutta, Aayush Gupta, Varun Saran, Avideh Zakhor

发表机构 * College of Engineering, Department of Electrical Engineering and Computer Science, University of California Berkeley（加州大学伯克利分校工程学院电气工程与计算机科学系）

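AI summary: Proposes a sensorimotor policy based on stereo-vision depth and visual-inertial odometry, trained in simulation with reinforcement learning and privileged learning, that transfers zero-shot to unseen outdoor environments and drone platforms for autonomous obstacle-avoiding navigation.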

Comments Published in IEEE Robotics and Automation Letters, vol 11, no 2. Presented at the IEEE International Conference on Robotics and Automation 2026

DOI: 10.1109/LRA.2025.3641120

AI中文摘要

尽管四旋翼飞行器凭借其全向机动性拥有令人印象深刻的穿越能力，但在复杂环境中需要持续的人工操控限制了其在GNSS和遥测信号缺失场景中的应用。为此，我们提出了一种新颖的传感器运动策略，该策略使用立体视觉深度和视觉惯性里程计（VIO）在未知环境中自主穿越障碍物以到达目标点。该策略由一个预训练的自编码器作为感知前端，后接一个规划与控制LSTM网络，输出速度指令，可由现成的商用无人机执行。我们利用强化学习和特权学习范式，通过两阶段过程在仿真中训练该策略：1）以全局运动规划器生成的优化轨迹作为监督骨干进行初始训练；2）在课程环境中进一步微调。为弥合仿真到现实的差距，我们采用领域随机化和奖励塑造来创建对噪声和领域偏移具有鲁棒性的策略。在户外实验中，我们的方法成功实现了对训练中从未遇到的障碍环境和无人机平台的零样本迁移。

英文摘要

Although quadcopters boast impressive traversal capabilities enabled by their omnidirectional maneuverability, the need for continuous pilot control in complex environments impedes their application in GNSS- and telemetry-denied scenarios. To this end, we propose a novel sensorimotor policy that uses stereo-vision depth and visual-inertial odometry (VIO) to autonomously navigate through obstacles in an unknown environment to reach a goal point. The policy comprises a pre-trained autoencoder as the perception head followed by a planning and control LSTM network, which outputs velocity commands that can be followed by an off-the-shelf commercial drone. We leverage reinforcement and privileged learning paradigms to train the policy in simulation through a two-stage process: 1) initial training with optimal trajectories generated by a global motion planner acting as a supervisory backbone, 2) further fine-tuning in a curriculum environment. To bridge the sim-to-real gap, we employ domain randomization and reward shaping to create a policy that is robust to both noise and domain shift. In outdoor experiments, our approach achieves successful zero-shot transfer to both obstacle environments and a drone platform that were never encountered during training.

2605.24436 2026-05-26 cs.MA cs.LG cs.RO 版本更新

A Reinforcement Learning Inspired Latent Yield Based Adaptive Algorithm Switching Mechanism

一种受强化学习启发的基于潜在收益的自适应算法切换机制

Jayprakash S. Nair, Jimson Mathew, Shivashankar B. Nair

发表机构 * Indian Institute of Technology Patna（印度理工学院巴特那分校）； Indian Institute of Technology Guwahati（印度理工学院古瓦哈提分校）

AI总结针对在线或动态环境中算法选择困难的问题，提出一种受强化学习启发的潜在收益方法，通过封装奖励和惩罚触发探索与利用，实现自适应算法切换，并在排序算法和机器人避障任务中验证了有效性。

Comments Accepted and published in the Proceedings of the 29th European Conference on Applications of Evolutionary Computation (EvoApplications 2026), held as part of EvoStar 2026, Toulouse, France, April 8 to 10, 2026. Lecture Notes in Computer Science (LNCS), Springer Nature Switzerland

详情

DOI: 10.1007/978-3-032-23604-3_8
Journal ref: Applications of Evolutionary Computation, EvoApplications 2026, LNCS, Springer Nature Switzerland, 2026

AI中文摘要

对于给定的问题实例，选择最合适的算法仍然是一项具有挑战性的任务，尤其是在问题特征随时间演变的在线或动态环境中。仅依赖瞬时性能指标可能导致反应性和不稳定的行为，通常会导致次优的算法切换。本文介绍了一种计算高效的方法，用于聚合算法在多个问题实例上的性能，该方法对实例特征的剧烈变化具有相当的免疫性。受强化学习（RL）固有特征的启发，该技术将奖励和惩罚封装到一个潜在收益中，进而触发利用和探索，从而产生自适应算法切换。所提出的技术采用受遗传算法启发的岛屿模型，以促进并行探索和算法种群之间的性能交换，这些算法种群栖息在局部库中。在排序算法和机器人避障任务上的实验评估证明了该方法的可行性和有效性，突显了其在自适应算法选择至关重要的领域中的潜力。

英文摘要

Selecting the most suitable algorithm for a given problem instance remains a challenging task, particularly in online or dynamic environments where problem characteristics evolve over time. Relying solely on instantaneous performance metrics can result in a reactive and unstable behaviour, often leading to suboptimal algorithm switching. This paper introduces a computationally efficient approach for aggregating an algorithm's performance across multiple problem instances that is fairly immune to erratic variations in instance features. Inspired by features inherent to Reinforcement Learning (RL), this technique encapsulates rewards and penalties into a latent yield that, in turn, triggers exploitation and exploration, consequently resulting in adaptive algorithm switching. The proposed technique employs island models, inspired by Genetic Algorithms, to facilitate parallel exploration and performance exchanges among algorithm populations inhabiting local repertoires. Experimental evaluations on sorting algorithms and robotic obstacle avoidance tasks demonstrate the feasibility and effectiveness of the approach, highlighting its potential in domains where adaptive algorithm selection is critical.

2605.24433 2026-05-26 cs.RO cs.LG 版本更新

Smoother Action Chunking Flow Policy via Prior-Corrected Orthogonal Trust-Region Guidance

基于先验校正的正交信任区域引导的平滑动作块流策略

Kai Fang, Hailong Pei, Xuemin Chi

发表机构 * South China University of Technology（华南理工大学）； Zhejiang University（浙江大学）

AI总结提出POTR方法，通过先验校正权重和正交信任区域约束，改善流匹配机器人策略中动作块推理的边界不连续性和横向扰动，提升成功率和运动平滑性。

详情

AI中文摘要

流匹配机器人策略通常使用动作块推理进行高效的闭环控制，但块边界可能引入不连续的动作转换。现有的RTC引导通过在去噪过程中注入校正信号来改善连续性，但其权重调度在中间时间步较弱，且无约束的校正方向可能引入横向扰动。我们提出POTR，一种先验校正的正交信任区域引导方法。首先，我们将数据先验尺度$σ_d$纳入RTC引导权重，产生更强的中间时间校正。其次，我们将引导向量分解为与去噪速度平行和垂直的分量，并将垂直分量约束在信任区域内。在LIBERO上使用$π_{0.5}$，与RTC相比，POTR提高了成功率，并持续减少了块边界不连续性、加速度和加加速度。消融实验表明，先验校正权重提供了主要的校正增益，而正交信任区域进一步提高了稳定性。

英文摘要

Flow-matching robot policies commonly use action-chunking inference for efficient closed-loop control, but chunk boundaries can introduce discontinuous action transitions. Existing RTC guidance improves continuity by injecting correction signals during denoising, yet its weight schedule is weak at intermediate timesteps and its unconstrained correction direction may introduce transverse perturbations. We propose POTR, a **p**rior-corrected **o**rthogonal **t**rust-**r**egion guidance method. First, we incorporate a data-prior scale $σ_d$ into the RTC guidance weight, yielding stronger intermediate-time correction. Second, we decompose the guidance vector into components parallel and perpendicular to the denoising velocity, and constrain the perpendicular component within a trust region. On LIBERO with $π_{0.5}$, POTR improves success rate and consistently reduces chunk-boundary discontinuity, acceleration, and jerk compared with RTC. Ablations show that the prior-corrected weight provides the main correction gain, while the orthogonal trust region further improves stability.
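
The second step of the abstract (splitting the guidance vector relative to the denoising velocity and bounding the perpendicular part) can be illustrated with a small sketch. This is not the authors' implementation: the function name `constrain_guidance`, the parameter `rho`, and the choice of a trust-region radius proportional to the velocity norm are assumptions made only for illustration.

```python
import math

def constrain_guidance(g, v, rho=0.5):
    """Split guidance g into parts parallel and perpendicular to velocity v,
    then shrink the perpendicular part so its norm is at most rho * ||v||.

    The radius rule (rho * ||v||) is a hypothetical choice, not taken from the paper.
    """
    vv = sum(x * x for x in v)
    if vv == 0.0:
        return list(g)  # no direction to project onto; leave g unchanged
    scale = sum(a * b for a, b in zip(g, v)) / vv
    g_par = [scale * x for x in v]                # component along v
    g_perp = [a - p for a, p in zip(g, g_par)]    # component orthogonal to v
    perp_norm = math.sqrt(sum(x * x for x in g_perp))
    radius = rho * math.sqrt(vv)
    if perp_norm > radius:
        # Rescale only the transverse part; the parallel part is untouched.
        g_perp = [x * radius / perp_norm for x in g_perp]
    return [p + q for p, q in zip(g_par, g_perp)]
```

For example, with `g = [1.0, 2.0]`, `v = [1.0, 0.0]` and `rho = 0.5`, the parallel part `[1.0, 0.0]` is kept and the perpendicular part is shortened from length 2 to length 0.5.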

2605.24394 2026-05-26 cs.RO 版本更新

RoboHitch: Learning Visual Affordance from Disordered Keypoints for Hitch Knots Tying

RoboHitch: 从无序关键点学习视觉可供性用于系结

Jiahui Zuo, Boyang Zhang, Fumin Zhang

发表机构 * Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology（香港科技大学电子及计算机工程学系）

AI总结提出RoboHitch框架，利用无序3D关键点和RGB图像从人类演示中学习系结，通过动态图自编码器和卷积自编码器融合特征，预测抓取和放置可供性，实现遮挡下的系结。

详情

AI中文摘要

由于复杂的动力学和频繁的自遮挡，可变形线性物体的机器人操作面临重大挑战。现有的机器人打结方法通常依赖于有序关键点和显式边缘连接的精确拓扑状态跟踪。这种依赖使得它们在打结过程中因重复弯曲和交叉导致的跟踪漂移和拓扑不匹配而容易失败。为了解决这些限制，我们引入了RoboHitch，一个新颖的框架，它仅使用无序的3D关键点和RGB图像从人类演示中学习执行系结。这消除了对显式拓扑顺序的需求，允许更灵活的操作。我们的方法采用动态图自编码器从未跟踪的关键点中提取几何特征，并辅以卷积自编码器捕获必要的视觉上下文。然后，双向交叉注意力机制融合这些模态，共同预测抓取和放置可供性，促进对绳子状态的隐式推理，并在遮挡下实现系结。真实世界实验证明了我们方法的有效性和泛化能力，成功完成了自遮挡场景中的系结。

英文摘要

Robotic manipulation of deformable linear objects (DLOs) presents significant challenges due to complex dynamics and frequent self-occlusions. Existing robotic knot tying methods typically rely on precise topological state tracking with ordered keypoints and explicit edge connectivity. This reliance makes them prone to failures due to tracking drift and topology mismatch caused by repeated bending and crossings during knot formation. To address these limitations, we introduce RoboHitch, a novel framework that learns to perform hitch knot tying from human demonstrations using only disordered 3D keypoints and RGB images. This eliminates the need for explicit topological order, allowing for more flexible manipulation. Our method employs a dynamic Graph Autoencoder to extract geometric features from untracked keypoints, complemented by a Convolutional Autoencoder that captures essential visual context. A bidirectional cross-attention mechanism then fuses these modalities to jointly predict pick and place affordances, facilitating implicit reasoning about the rope's state and enabling knot tying under occlusion. Real-world experiments demonstrate the effectiveness and generalizability of our approach, successfully completing hitch knots in scenarios with self-occlusions.

2605.24350 2026-05-26 cs.RO cs.HC 版本更新

PACT: Proactive Asking for Continual Task Assistance in Human-Robot Collaboration

PACT：人机协作中持续任务辅助的主动询问

Chengbo He, Sheng Li, Chenyang Ma, Bochao Zou, Li Sun, Jiansheng Chen, Junliang Xing, Yuanchun Shi, Huimin Ma

发表机构 * University of Science and Technology Beijing（北京科技大学）； University of Oxford（牛津大学）； Tsinghua University（清华大学）

AI总结提出PACT框架，通过强化学习在部分观测下决定何时主动询问用户以澄清任务，从而在跨日人机协作中逐步提高辅助准确性和澄清效用。

详情

AI中文摘要

ECo-MoE: 基于具身条件的专家混合提升机器人的可进化性

Yibin Wang, Muhan Li, Zihan Guo, Sam Kriegman

发表机构 * Northwestern University, Evanston, IL, USA（西北大学，美国伊利诺伊州埃文斯顿）

AI总结提出一种机器人进化与学习联合优化模型，通过混合控制专家和潜在设计向量分布，在统一模块化框架中平衡个体策略与通用控制器的效率，实现不同体型的适应性行为并支持知识复用与进化引导。

详情

AI中文摘要

在本文中，我们介绍了一种机器人进化与学习的模型，该模型联合优化潜在设计向量（基因型）的分布和混合控制专家（神经模块），这些专家由每个解码设计（表型）的潜在坐标门控。这为协同设计算法提供了一种可扩展的替代方案，这些算法要么为每个机器人训练一个单独的策略（效率低下），要么为所有机器人训练一个单一通用控制器（导致过于保守的结构和行为）。我们的方法介于这两个极端之间，将祖先知识保存在一个统一但模块化的框架中，其中不同的身体结构激活和停用不同的学习感觉运动回路组合，以实现目标导向行为。这使得控制器的一部分可以被彻底改造，以更好地适应新出现的物种设计，而不会破坏其他专家模块中包含的来之不易的知识。它还允许预训练的专家策略直接插入到混合中，从而将进化引导到包含所需形态特征的潜在空间中原本未探索的区域。我们将这个过程称为“演示进化”，并探索如何利用它将自由形态进化引导到由预训练模型定义的规范结构。视频和代码可在以下网址找到：https://eco-moe.github.io。

英文摘要

In this paper, we introduce a model of evolution and learning in robots that co-optimizes a distribution of latent design vectors (genotypes) and a mixture of control experts (neural modules), which are gated by the latent coordinates of each decoded design (phenotype). This provides a scalable alternative to co-design algorithms that either train an individual policy for every robot, which is inefficient, or a monolithic universal controller for all robots, which results in overly conservative structures and behaviors. Our approach lies somewhere between these two extremes, preserving ancestral knowledge in a unified yet modular framework in which different body plans activate and deactivate different combinations of learned sensorimotor circuits for goal-directed behavior. This allows one part of the controller to be overhauled to better suit new species of designs as they emerge without disrupting the hard-earned knowledge contained within other expert modules. It also allows pretrained expert policies to be directly plugged into the mixture, which can steer evolution into otherwise unexplored areas of latent space containing desired morphological traits. We refer to this process as "evo by demo" and explore how it may be used to guide freeform evolution toward canonical structures defined by the pretrained model. Videos and code can be found at: https://eco-moe.github.io.

2605.24203 2026-05-26 cs.RO 版本更新

Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance

Afford-VLA：通过内化可操作性实现动作对齐的视觉规划

Runze Wang, Yuqian Fu, Yu Li, Tao Lin, Tianwen Qian, Mohamed Elhoseiny, Bo Zhao, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue

发表机构 * Fudan University（复旦大学）； KAUST（阿卜杜拉国王科技大学）； SJTU（上海交通大学）； East China Normal University（华东师范大学）

AI总结提出Afford-VLA框架，通过内化任务条件可操作性作为显式视觉规划接口，利用可学习<AFF>令牌查询交互区域并解码为紧凑嵌入以直接条件化动作生成，在多个模拟基准上取得最先进性能。

Comments 20 pages

详情

AI中文摘要

视觉-语言-动作（VLA）模型在通用机器人操作中展现出巨大潜力，但仍受限于空间推理不足，特别是在复杂视觉场景中确定交互位置。虽然近期工作引入多种形式的视觉规划来解决此问题，但现有方法要么依赖全局几何线索、符号中间表示，要么依赖外部生成的视觉信号，这些往往与下游动作预测弱耦合。本文重新审视VLA系统中的视觉规划，认为有效的规划应是局部的、视觉锚定的、内部生成的且直接与动作对齐。基于此洞察，我们提出Afford-VLA，一个统一框架，将任务条件可操作性内化为VLA模型中的显式视觉规划接口。具体而言，我们引入可学习<AFF>令牌来查询任务相关交互区域，从多模态特征解码可操作性掩码，并将其转换为紧凑嵌入，直接条件化动作生成。此设计使可操作性在VLA内部生成和利用，形成紧密耦合的感知-动作通路。为进一步支持此集成，我们采用训练策略，使可操作性通路与动作预测联合优化，提高其对下游控制的有效性。我们在多个模拟基准（包括LIBERO、LIBERO-Plus和SimplerEnv）上评估方法，取得一致的最先进性能，并展示了强大的真实世界结果。这些发现表明，将可操作性内化为动作对齐的视觉规划为改进VLA系统提供了强大范式。

英文摘要

Vision-language-action (VLA) models have shown strong potential for generalist robot manipulation, yet they remain limited by insufficient spatial reasoning, particularly in determining where to interact in complex visual scenes. While recent efforts introduce various forms of visual planning to address this issue, existing approaches either rely on global geometric cues, symbolic intermediate representations, or externally generated visual signals, which are often weakly coupled with downstream action prediction. In this work, we revisit visual planning in VLA systems and argue that effective planning should be local, visually grounded, internally generated, and directly aligned with action. Based on this insight, we propose Afford-VLA, a unified framework that internalizes task-conditioned affordance as an explicit visual planning interface within VLA models. Concretely, we introduce learnable <AFF> tokens to query task-relevant interaction regions, decode affordance masks from multimodal features, and convert them into compact embeddings that directly condition action generation. This design enables affordance to be both generated and utilized within the VLA, forming a tightly coupled perception-action pathway. To further support this integration, we adopt a training strategy that allows the affordance pathway to be jointly optimized with action prediction, improving its effectiveness for downstream control. We evaluate our method on multiple simulation benchmarks, including LIBERO, LIBERO-Plus, and SimplerEnv, achieving consistent state-of-the-art performance, along with strong real-world results. These findings demonstrate that internalizing affordance as action-aligned visual planning provides a powerful paradigm for improving VLA systems.

2605.24127 2026-05-26 cs.RO 版本更新

Investigating the Effect of a Series Elastic Actuation Retrofit to Black-Box Actuators

研究串联弹性驱动改造对黑箱执行器的影响

Ivan Tregear, Ayhan Aktas, Ferdinando Rodriguez y Baena

发表机构 * Imperial College London, Mechanical Engineering Department（伦敦帝国理工学院机械工程系）

AI总结通过为黑箱执行器加装串联弹性元件，利用有限元分析设计扭转弹性元件，实现了高保真力测量，将开环力控制带宽从10.32 Hz提升至30.32 Hz，提升2.93倍，且性能优于成本更高的商用传感器。

Comments Related GitHub repo available here: https://github.com/ITregear/SeriesElasticActuation-FYP

详情

AI中文摘要

在机器人应用中，执行器通常设计为具有最小间隙的刚性结构，以确保精度和可重复性。然而，这限制了柔顺性，导致在不确定环境中可能造成损坏和较差的力控制。串联弹性驱动（SEA）引入柔顺性以增强扰动抑制，并能够通过胡克定律进行力测量，但会降低系统带宽。一个定制的串联弹性（SE）元件被加装到一个黑箱执行器上，以减轻间隙和静摩擦等非线性。集成SE元件实现了高保真力测量，提高了力控制带宽和性能。通过有限元（FE）分析设计了一个扭转SE元件，其刚度为2155.4 Nm/rad。测量了原始电机和SEA集成配置的开环力控制带宽，同时利用SEA和商用力传感器的反馈评估了闭环带宽。SEA模块将带宽从10.32 Hz提高到30.32 Hz，提升了2.93倍。此外，尽管成本仅为25英镑，其性能仍比商用传感器高出7.63%。

英文摘要

In robotic applications, actuators are typically designed to be stiff with minimal backlash to ensure precision and repeatability. However, this limits compliance, leading to potential damage and poor force control in uncertain environments. Series Elastic Actuation (SEA) introduces compliance to enhance disturbance rejection and enable force measurement via Hooke's Law but reduces system bandwidth. A custom Series Elastic (SE) element was retrofitted to a black-box actuator to mitigate non-linearities like backlash and static friction. Integrating the SE element enabled high-fidelity force measurements, improving force control bandwidth and performance. A torsional SE element was designed through Finite Element (FE) analysis, yielding a stiffness of 2155.4 Nm/rad. Open-loop force control bandwidth was measured for the original motor and the SEA-integrated configuration, while closed-loop bandwidth was assessed using feedback from the SEA and a commercial force sensor. The SEA module increased bandwidth from 10.32 Hz to 30.32 Hz, a 2.93X improvement. Additionally, it outperformed the commercial sensor by 7.63% despite costing 25 GBP, a fraction of the price.
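
The two quantitative statements in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the reported stiffness and bandwidth figures; the function names and the example deflection of 0.001 rad are hypothetical. Because the element is torsional, Hooke's Law gives a torque as stiffness times angular deflection.

```python
STIFFNESS_NM_PER_RAD = 2155.4  # torsional stiffness reported in the abstract

def torque_from_deflection(deflection_rad, stiffness=STIFFNESS_NM_PER_RAD):
    """Hooke's Law for a torsional spring: torque = stiffness * angular deflection."""
    return stiffness * deflection_rad

def bandwidth_ratio(before_hz, after_hz):
    """Ratio of the bandwidth after the retrofit to the bandwidth before it."""
    return after_hz / before_hz

# Hypothetical deflection of 1 mrad, chosen only to show the scale of the reading.
example_torque = torque_from_deflection(0.001)

# Reported open-loop bandwidths: 10.32 Hz (original) and 30.32 Hz (with SEA).
ratio = bandwidth_ratio(10.32, 30.32)
```

Under these numbers a 1 mrad deflection corresponds to roughly 2.16 Nm, and the bandwidth ratio evaluates to about 2.94, which the abstract reports as 2.93X.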

2605.24125 2026-05-26 cs.RO 版本更新

Anisotropic Diffusion-Driven Ergodic Coverage in Multi-Robot Systems

多机器人系统中各向异性扩散驱动的遍历覆盖

Thales C. Silva, Anoop Kiran, Nora Ayanian

发表机构 * Department of Computer Science（计算机科学系）； Brown University（布朗大学）

AI总结提出一种基于Perona-Malik各向异性扩散的遍历搜索方法，通过非均匀误差传播生成势场，引导多机器人系统实现更灵活的遍历覆盖。

详情

DOI: 10.1109/MRS66243.2025.11357259

AI中文摘要

我们考虑在多机器人系统中结合势场和遍历搜索的问题。传统的遍历搜索算法使用遍历性度量来考虑不同尺度下的期望分布。最近，提出了一种热方程驱动的遍历方法，增加了遍历度量平滑的灵活性。然而，这种方法作为各向同性扩散，无论期望分布如何变化，都会在所有方向上均匀传播误差。我们引入了遍历性问题的一类通用各向异性扩散公式，为遍历搜索生成势场。我们证明该方法推广了先前考虑径向基函数和热方程解来表示目标密度分布与覆盖轨迹之间差异的结果。在我们的解决方案中，智能体运动使用Perona-Malik扩散解的梯度进行引导，并且我们的公式将热方程作为特例。我们通过一系列不同场景的仿真来展示该方法。

英文摘要

We consider the problem of combining potential field and ergodic search on multi-robot systems. Traditional ergodic search algorithms use metrics for ergodicity that account for the desired distribution at different scales. Recently, a heat equation-driven ergodic approach was proposed, which adds flexibility to the smoothing of the ergodic metric. However, such an approach, as it is an isotropic diffusion, propagates the error uniformly in all directions, regardless of changes in the desired distribution. We introduce a general class of anisotropic diffusion formulation of the ergodicity problem, which generates a potential field for the ergodic search. We demonstrate that this approach generalizes previous results, which consider radial basis functions and the solution of the heat equation to represent the difference between the goal density distribution and the covered trajectories. In our solution, the agent movement is directed using the gradient of the solution of the Perona-Malik diffusion, and our formulation includes the heat equation as a special case. We demonstrate the methodology with a series of simulations in different scenarios.
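
The relationship between isotropic (heat equation) and Perona-Malik diffusion mentioned in the abstract can be sketched on a small grid. This is an illustrative explicit update, not the paper's formulation: the function name, the step size `lam`, the edge parameter `kappa`, and the conductivity $1/(1+(d/\kappa)^2)$ (one of the classic Perona-Malik choices) are assumptions. Setting the conductivity to 1 recovers a plain heat-equation step, which mirrors the "special case" remark.

```python
def diffusion_step(u, lam=0.2, kappa=None):
    """One explicit diffusion step on a 2D grid (list of lists of floats).

    kappa=None  -> conductivity 1 everywhere (isotropic heat equation step).
    kappa=float -> Perona-Malik conductivity 1 / (1 + (d / kappa)**2), which
                   damps diffusion across large differences d.
    Boundaries are zero-flux: out-of-range neighbours are clamped to the cell itself.
    """
    rows, cols = len(u), len(u[0])

    def conductivity(d):
        return 1.0 if kappa is None else 1.0 / (1.0 + (d / kappa) ** 2)

    out = [row[:] for row in u]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni = min(max(i + di, 0), rows - 1)
                nj = min(max(j + dj, 0), cols - 1)
                d = u[ni][nj] - u[i][j]
                total += conductivity(d) * d
            out[i][j] = u[i][j] + lam * total
    return out
```

On a step profile such as `[[0.0, 0.0, 1.0, 1.0]]`, the heat-equation step smears the edge noticeably, while a small `kappa` leaves it almost intact. In the paper, agents follow the gradient of the diffused field; that part is not shown here.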

2605.24111 2026-05-26 cs.RO cs.AI 版本更新

MASt3R-Nav: WayPixel Navigation in Relative 3D Maps

MASt3R-Nav: 相对3D地图中的WayPixel导航

Vansh Garg, Rohit Jayanti, Krish Pandya, Sarthak Chittawar, Siddharth Tourani, Muhammad Haris Khan, Sourav Garg, Madhava Krishna

发表机构 * Robotics Research Center, IIIT-Hyderabad, India（印度海得拉巴国际信息技术学院机器人研究中心）； University of Heidelberg（海德堡大学）； MBZUAI（穆罕默德·本·扎耶德人工智能大学）

AI总结提出一种基于像素相对连接性的地图表示，通过相对3D坐标系中的像素对应构建地图，并利用像素级图进行全局路径规划，训练控制器预测轨迹，实现高精度导航。

Comments 2026 IEEE International Conference on Robotics & Automation (ICRA)

详情

AI中文摘要

视觉导航能力与其底层世界表示紧密相关。与需要全局一致几何的传统3D地图不同，图像或物体相对拓扑图几乎完全放弃了几何理解，但这以牺牲导航能力为代价，通常仅限于教-重复模式。本文提出一种新颖的地图表示，即像素相对连接性，它在几何上精确但不需要全局几何一致性。受近期3D基础图像匹配进展的启发，我们通过基于单个图像对相对3D坐标系中像素对应的图像间连接性，从图像序列构建地图。然后，我们利用该像素级图通过近似和稀疏化图像内像素连接性来执行全局路径规划。由此，我们推导出“WayPixel Costmap”表示，并训练一个以此条件化的控制器来预测轨迹展开。我们展示了这种基于相对几何的密集像素级成本图比其图像级和物体级对应物是更精确的控制预测条件变量。这实现了一个高能力的导航系统，通过在模拟器中的四种导航任务和真实世界演示中得到验证。

英文摘要

Visual navigation ability is strongly tied to its underlying representation of the world. Unlike classical 3D maps that require globally-consistent geometry, image- or object-relative topological graphs almost entirely do away with geometric understanding. But this comes at the cost of navigation capability, often limiting it to merely teach-and-repeat. In this work, we propose a novel map representation in the form of pixel-relative connectivity, which is geometrically accurate but does not require global geometric consistency. Inspired by recent progress in 3D grounded image matching, we construct a map from an image sequence through inter-image connectivity based on pixel correspondences in the relative 3D coordinate systems of individual image pairs. We then use this pixel-level graph to perform global path planning by approximating and sparsifying intra-image pixel connectivity. Through this, we derive a "WayPixel Costmap" representation and train a controller conditioned on it to predict a trajectory rollout. We show that this dense pixel-level costmap based on relative geometry is a more accurate conditioning variable for control prediction than its image- and object-level counterparts. This enables a highly capable navigation system, as validated on four types of navigation tasks in the simulator and through real-world demonstrations.

2605.24074 2026-05-26 cs.CV cs.RO 版本更新

WideDepth: Millimeter-Accurate Benchmark for Fisheye Depth Estimation

WideDepth: 用于鱼眼深度估计的毫米级精度基准

Ilia Indyk, Ignat Penshin, Ivan Sosin, Maxim Monastyrny, Aleksei Valenkov, Ilya Makarov

发表机构 * Robotics Center（机器人中心）； AXXX ； Trusted AI Research Center, RAS（可信人工智能研究中心，俄罗斯科学院）

AI总结提出首个室内鱼眼深度估计数据集WideDepth，包含101个场景的5K高分辨率立体对和毫米级真值，并引入基于LiDAR的立体鱼眼图像生成方法，评估多种模型，微调后性能提升高达62%。

Comments Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

详情

AI中文摘要

鱼眼相机在机器人领域的近场操作、导航和沉浸式感知中应用日益广泛，但缺乏具有精确真值的室内深度基准。为此，我们引入WideDepth——首个用于鱼眼深度估计的室内数据集，包含101个场景的5K高分辨率立体对，标注了毫米级地面真值深度和视差。我们的数据集还包括在水平和垂直立体设置中，不同视场和基线下的配对针孔和鱼眼样本。我们进一步提出一种方法，将针孔训练的立体模型适配到鱼眼图像，并引入一种基于高分辨率LiDAR扫描的新型立体鱼眼图像生成流程。利用这些方法，我们在基准上全面评估了最先进的单目深度、立体匹配和深度补全模型。此外，我们提供了18K LiDAR导出的稀疏深度训练样本，在微调基于针孔的立体模型时，鱼眼数据性能提升高达62%。总之，我们基准的高精度和多功能性为推进鱼眼深度估计和机器人感知研究奠定了坚实基础。项目页面：https://ilyaind.github.io/WideDepth

英文摘要

Fisheye cameras are increasingly adopted in robotics for near-field manipulation, navigation, and immersive perception, yet indoor depth benchmarks with accurate ground truth are still missing. To address this, we introduce WideDepth - the first indoor dataset for fisheye depth estimation, featuring 101 scenes containing 5K high-resolution stereo pairs labeled with millimeter-level ground truth depth and disparity. Our dataset also includes paired pinhole and fisheye samples across varying fields of view and baselines in both horizontal and vertical stereo setups. We further propose a method to adapt pinhole-trained stereo models to fisheye images and introduce a novel stereo fisheye image generation pipeline based on high-resolution LiDAR scans. Leveraging these methods, we thoroughly evaluate state-of-the-art monocular depth, stereo matching, and depth completion models on our benchmark. Additionally, we provide 18K LiDAR-derived sparse depth training samples, achieving up to a 62% performance boost on fisheye data when fine-tuning pinhole-based stereo models. In summary, the high precision and versatility of our benchmark set a strong foundation for advancing research in fisheye depth estimation and robotics perception. Project page: https://ilyaind.github.io/WideDepth

2605.24044 2026-05-26 cs.RO cs.SE cs.SY eess.SY 版本更新

RED: Adaptive Real-Time DAG Scheduling for Robotic Inference under Environmental Dynamics

RED：面向环境动态的自适应实时DAG调度用于机器人推理

Zexin Li, Tao Ren, Johnathan Liu, Xiaoxi He, Cong Liu

发表机构 * University of California, Riverside（加州大学河滨分校）； University of Pittsburgh（匹兹堡大学）； University of Maryland, Baltimore County（马里兰大学巴尔的摩县分校）； University of Macau（澳门大学）

AI总结提出RED框架，通过截止时间感知调度器和MIMONet结构对齐，在资源受限机器人平台上实现多任务深度神经网络工作负载的实时调度，适应环境动态并保证端到端时序约束。

Comments Extension version of RTSS'23

详情

AI中文摘要

部署在动态环境中的机器人必须应对环境驱动的变化，这些变化会在运行时重塑计算：新任务可能出现，优先级关系可能改变，整体工作负载结构会演变，所有这些都会降低性能，特别是在资源紧张和实时预算下需要多任务推理时。我们提出RED，一个用于资源受限机器人平台上多任务深度神经网络工作负载的实时调度框架，它适应机器人环境动态（RED），同时在建模假设下保留端到端时序保证。RED的核心是一个截止时间感知调度器，它分配中间子截止时间，从而能够适应由不可预测条件引起的计算图演变和异步推理。该框架还支持灵活部署MIMONet（多输入多输出神经网络），这种网络常用于多任务机器人，通过权重共享缓解内存压力。RED通过工作负载细化和图重构过程显式利用这种共享参数属性，将MIMONet结构与可调度性要求对齐，提高兼容性和效率。我们在NVIDIA Jetson系列平台和Apple M系列MacBook上实现RED，并在代表真实机器人场景的导航导向工作负载上进行评估。实验表明，在吞吐量、截止时间满足率、抗干扰鲁棒性、适应性和运行时开销方面，RED持续优于现有方法。

英文摘要

Robots deployed in dynamic environments must contend with environment-driven changes that reshape computation at runtime: new tasks may appear, precedence relations can shift, and overall workload structure evolves, all of which degrade performance, especially when multi-task inference is required under tight resource and real-time budgets. We present RED, a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms that adapts to Robotic Environmental Dynamics (RED) while preserving end-to-end timing guarantees under modeling assumptions. The core of RED is a deadline-aware scheduler that assigns intermediate sub-deadlines, allowing it to accommodate evolving computation graphs and asynchronous inference induced by unpredictable conditions. The framework also supports flexible deployment of MIMONet (multi-input multi-output neural networks), commonly used in multi-tasking robots to alleviate memory pressure through weight sharing. RED explicitly leverages this shared-parameter property via a workload refinement and graph-reconstruction procedure that aligns MIMONet structure with schedulability requirements, improving compatibility and efficiency. We implement RED on NVIDIA Jetson family platforms and on an Apple M-series MacBook and evaluate it on navigation-oriented workloads representative of real robotic scenarios. Experiments show consistent gains over existing methods in throughput, deadline satisfaction, robustness to interference, adaptability, and runtime overhead.

2605.24004 2026-05-26 cs.AI cs.CV cs.LG cs.RO 版本更新

Reason--Imagine--Act: Closed-Loop LLM Decision Making with World Models for Autonomous Driving

推理--想象--行动：基于世界模型的闭环LLM自动驾驶决策

Zhengqi Sun, Yiwen Sun, Boxuan Liu, Tailai Chen, Tianxu Guo, Jiabin Liu

发表机构 * Department of Information Management, Peking University, Beijing 100871, China ； School of Intelligence Science and Technology, Peking University, Beijing 100871, China ； State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing 100080, China ； Yuanpei College, Peking University, Beijing 100871, China ； China Agricultural University, Beijing, China ； CRSC Research & Design Institute Group Co., Ltd., Beijing, China

AI总结提出Reason--Imagine--Act (RIA)闭环框架，结合LLM推理器与动作条件世界模型进行在线安全验证，在CARLA点目标协议下实现80.05%路线完成率、51.10%到达率和0.20%碰撞率。

Comments Accepted by the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026). 8 pages, 2 figures

详情

AI中文摘要

大型语言模型（LLM）在自动驾驶中具有潜力，但仅基于语义的决策策略可能在动态交通中产生物理上不安全的行为。现有方法要么在没有显式动力学验证的情况下进行在线语言推理，要么主要在离线流程中使用世界模型，在决策时语义意图与物理可行性之间存在差距。我们提出了Reason--Imagine--Act (RIA)，一个闭环框架，将LLM推理器与动作条件世界模型耦合，用于在线安全验证。在每一步，LLM提出一个动作模板和候选子动作，世界模型执行短时域展开，安全评分器选择最安全的可执行动作并反馈给下一步推理。在统一的CARLA点目标协议（1000个回合）下，RIA实现了80.05%的路线完成率、51.10%的到达率和0.20%的碰撞率。在相同的闭环接口下，RIA在核心闭环指标上始终优于无训练基线，包括CARLA TM和MADA。为便于复现，代码可在https://github.com/pku-smart-city/source_code/tree/main/RIA获取。

英文摘要

Large language models (LLMs) are promising for autonomous driving, but semantics-only decision policies can yield physically unsafe behavior in dynamic traffic. Existing methods either perform online language reasoning without explicit dynamics verification or use world models mainly in offline pipelines, leaving a gap between semantic intent and physical feasibility at decision time. We propose Reason--Imagine--Act (RIA), a closed-loop framework that couples an LLM reasoner with an action-conditioned world model for online safety verification. At each step, the LLM proposes an action template and candidate sub-actions, the world model performs short-horizon rollouts, and a safety scorer selects the safest executable action with feedback to the next reasoning step. Under a unified CARLA point-goal protocol (1000 episodes), RIA achieves 80.05% route completion, 51.10% arrival rate, and 0.20% collision rate. Under the same closed-loop interface, RIA consistently outperforms training-free baselines, including CARLA TM and MADA, on core closed-loop metrics. For reproducibility, code is available at https://github.com/pku-smart-city/source_code/tree/main/RIA.
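
The propose, imagine, select loop described in the abstract can be sketched with toy stand-ins. Everything here is hypothetical and far simpler than the paper's system: the LLM proposal is replaced by a fixed list of candidate accelerations, the world model by 1D car-following kinematics, and the safety scorer by the minimum gap to a lead vehicle over a short rollout.

```python
def rollout_min_gap(gap, ego_speed, lead_speed, accel, horizon=10, dt=0.1):
    """Toy 'world model': roll out 1D kinematics and return the smallest gap seen."""
    min_gap = gap
    for _ in range(horizon):
        ego_speed = max(0.0, ego_speed + accel * dt)  # ego cannot reverse
        gap += (lead_speed - ego_speed) * dt          # lead keeps constant speed
        min_gap = min(min_gap, gap)
    return min_gap

def select_action(candidates, gap, ego_speed, lead_speed):
    """Toy 'safety scorer': pick the candidate whose rollout keeps the largest minimum gap."""
    return max(
        candidates,
        key=lambda accel: rollout_min_gap(gap, ego_speed, lead_speed, accel),
    )

# Candidate sub-actions (accelerations in m/s^2); in the paper these come from the LLM.
candidates = [-2.0, 0.0, 2.0]
chosen = select_action(candidates, gap=5.0, ego_speed=10.0, lead_speed=8.0)
```

With the ego vehicle closing on a slower lead vehicle, the braking candidate keeps the largest gap and is selected. In a closed loop, the chosen action and its score would be fed back into the next proposal step.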

2605.23987 2026-05-26 cs.AI cs.RO 版本更新

VILAS：一种集成软抓取的VLA低成本机器人操作架构

Zijian An, Hadi Khezam, Bill Cai, Ran Yang, Shijie Geng, Yiming Feng, Yue Zheng, Lifeng Zhou

发表机构 * Drexel University（德雷塞尔大学）； Virginia Seafood Agricultural Research and Extension Center（弗吉尼亚海产农业研究与推广中心）； Amazon Store Foundation AI (SFAI)（亚马逊商店基础人工智能（SFAI））

AI总结提出VILAS低成本模块化机器人操作平台，集成软抓取机构，支持端到端VLA策略学习与部署，并在葡萄抓取任务中验证有效性。

详情

AI中文摘要

我们提出了VILAS，一个完全低成本、模块化的机器人操作平台，旨在支持端到端视觉-语言-动作（VLA）策略学习并在可访问硬件上部署。该系统集成了Fairino FR5协作臂、Jodell RG52-50电动夹爪和双摄像头感知模块，通过基于ZMQ的通信架构，在单一框架内统一协调遥操作、数据收集和策略部署。为了在不依赖显式力传感的情况下安全操作易碎物体，我们设计了一种基于kirigami的软柔性夹爪扩展件，在压缩载荷下产生可预测变形，提供对脆弱目标的温和且可重复接触。我们在VILAS平台上部署并评估了三种最先进的VLA模型：pi_0、pi_0.5和GR00T N1.6。所有模型均使用通过我们的遥操作流水线收集的相同演示数据集，从公开发布的预训练检查点进行微调。在葡萄抓取任务上的实验验证了所提系统的有效性，证实了有能力的操作策略可以在低成本模块化硬件上成功训练和部署。我们的结果进一步为当前VLA模型在真实环境中的部署特性提供了实践见解。

英文摘要

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.

2603.09458 2026-05-26 cs.RO 版本更新

Stein Variational Ergodic Surface Coverage with SE(3) Constraints

Stein变分遍历曲面覆盖与SE(3)约束

Jiayun Li, Yufeng Jin, Sangli Teng, Dejian Gong, Georgia Chalvatzaki

发表机构 * Department of Computer Science, TU Darmstadt（达姆施塔特工业大学计算机科学系）； Honda Research Institute Europe GmbH（本田欧洲研究院）； University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种基于预条件SE(3) Stein变分梯度下降的采样即优化方法，用于生成满足SE(3)约束的遍历轨迹，实现复杂3D点云曲面的高质量覆盖。

详情

AI中文摘要

曲面操作任务要求机器人生成能够全面覆盖复杂3D曲面同时保持精确末端执行器姿态的轨迹。现有的遍历轨迹优化（TO）方法在覆盖任务中表现出色，但由于非凸优化景观以及采样即优化（SAO）技术中对SE(3)约束处理不足，在处理点云目标时存在困难。在这项工作中，我们引入了一种预条件SE(3) Stein变分梯度下降（SVGD）方法用于SAO遍历轨迹生成。我们提出的方法包含多项创新。首先，我们将点云遍历覆盖重新表述为流形感知的采样问题。其次，我们推导了SE(3)特定的SVGD粒子更新。第三，我们开发了一个预条件子以加速TO收敛。与基于优化的强基线和SAO基线相比，我们的基于采样的框架在保持SE(3)几何结构的同时，一致地识别出更优的局部最优解。在3D点云曲面覆盖基准测试和机器人曲面绘制任务上的实验表明，相对于现有的TO和SAO方法，我们的方法在我们的设置中以可接受的计算量实现了更优的覆盖质量，并在真实机器人实验中得到了验证。

英文摘要

Surface manipulation tasks require robots to generate trajectories that comprehensively cover complex 3D surfaces while maintaining precise end-effector poses. Existing ergodic trajectory optimization (TO) methods demonstrate success in coverage tasks, while struggling with point-cloud targets due to the nonconvex optimization landscapes and the inadequate handling of SE(3) constraints in sampling-as-optimization (SAO) techniques. In this work, we introduce a preconditioned SE(3) Stein Variational Gradient Descent (SVGD) approach for SAO ergodic trajectory generation. Our proposed approach comprises multiple innovations. First, we reformulate point-cloud ergodic coverage as a manifold-aware sampling problem. Second, we derive SE(3)-specific SVGD particle updates, and, third, we develop a preconditioner to accelerate TO convergence. Our sampling-based framework consistently identifies superior local optima compared to strong optimization-based and SAO baselines while preserving the SE(3) geometric structure. Experiments on a 3D point-cloud surface coverage benchmark and robotic surface drawing tasks demonstrate that our method achieves superior coverage quality with tractable computation in our setting relative to existing TO and SAO approaches, and is validated in real-world robot experiments.

2602.02839 2026-05-26 cs.RO 版本更新

Language Movement Primitives: Grounding Language Models in Robot Motion

语言运动基元：将语言模型锚定在机器人运动中

Yinlong Dai, Benjamin A. Christie, Daniel J. Evans, Dylan P. Losey, Simon Stepputtis

发表机构 * Collab, Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061（Collab实验室，机械工程系，弗吉尼亚理工大学，布莱克斯堡，VA 24061）； TEA Lab, Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061（TEA实验室，机械工程系，弗吉尼亚理工大学，布莱克斯堡，VA 24061）

AI总结提出语言运动基元（LMP）框架，通过将视觉语言模型（VLM）推理与动态运动基元（DMP）参数化结合，实现零样本机器人操作任务。

详情

AI中文摘要

尽管在基于基础模型的通用问题解决方面取得了显著进展，但使机器人能够根据自然语言指令执行新颖的操作任务仍然是机器人学中的一个基本挑战。大型视觉和语言模型（VLM）能够处理高维输入数据以理解视觉场景和语言，并将任务分解为一系列逻辑步骤；然而，它们难以将这些步骤锚定在具体的机器人运动中。另一方面，机器人基础模型输出动作命令，但在成功执行新颖任务之前需要领域内的微调或经验。其核心仍然存在将抽象任务推理与低级运动控制连接起来的基本挑战。为了解决这一脱节，我们提出了语言运动基元（LMP），这是一个将VLM推理锚定在动态运动基元（DMP）参数化中的框架。我们的关键洞察是，DMP提供了少量可解释的参数，而VLM可以设置这些参数来指定多样、连续且稳定的轨迹。换句话说：VLM可以推理自由形式的自然语言任务描述，并将其期望的运动语义锚定到DMP中——弥合了高级任务推理与低级位置和速度控制之间的鸿沟。基于这种VLM和DMP的结合，我们制定了LMP流程，用于零样本机器人操作，通过生成一系列DMP运动有效完成桌面操作问题。在31个真实世界操作任务中，我们展示了LMP实现了65%的任务成功率，而最佳基线的成功率为35%。请访问我们的网站查看视频：https://collab.me.vt.edu/lmp

英文摘要

Enabling robots to perform novel manipulation tasks from natural language instructions remains a fundamental challenge in robotics, despite significant progress in generalized problem solving with foundational models. Large vision and language models (VLMs) are capable of processing high-dimensional input data for visual scene and language understanding, as well as decomposing tasks into a sequence of logical steps; however, they struggle to ground those steps in embodied robot motion. On the other hand, robotics foundation models output action commands, but require in-domain fine-tuning or experience before they are able to perform novel tasks successfully. At its core, there still remains the fundamental challenge of connecting abstract task reasoning with low-level motion control. To address this disconnect, we propose Language Movement Primitives (LMPs), a framework that grounds VLM reasoning in Dynamic Movement Primitive (DMP) parameterization. Our key insight is that DMPs provide a small number of interpretable parameters, and VLMs can set these parameters to specify diverse, continuous, and stable trajectories. Put another way: VLMs can reason over free-form natural language task descriptions, and semantically ground their desired motions into DMPs -- bridging the gap between high-level task reasoning and low-level position and velocity control. Building on this combination of VLMs and DMPs, we formulate our LMP pipeline for zero-shot robot manipulation that effectively completes tabletop manipulation problems by generating a sequence of DMP motions. Across 31 real-world manipulation tasks, we show that LMP achieves 65% task success as compared to 35% for the best performing baseline. See videos at our website: https://collab.me.vt.edu/lmp
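
The claim that a DMP exposes a small set of interpretable parameters can be made concrete with a minimal one-dimensional rollout. This sketch follows the standard discrete DMP formulation (a critically damped spring toward the goal plus a phase-gated forcing term) integrated with Euler steps. It is not the LMP pipeline: the function name, gains, basis-function placement, and the idea that a VLM would supply `goal` and `weights` are assumptions for illustration.

```python
import math

def dmp_rollout(y0, goal, weights, tau=1.0, dt=0.001,
                alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Integrate a 1D discrete DMP and return the position trajectory.

    y0, goal : start and goal positions (the interpretable endpoints)
    weights  : forcing-term weights that shape the path between them
    """
    n = len(weights)
    # Basis centres spaced along the decaying phase variable x in (0, 1].
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centers]  # heuristic widths (assumption)
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(round(1.5 * tau / dt))):  # run a little past tau
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        total = sum(psi)
        forcing = 0.0
        if total > 0.0:
            mix = sum(p * w for p, w in zip(psi, weights)) / total
            forcing = mix * x * (goal - y0)  # vanishes as the phase x decays
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        traj.append(y)
    return traj
```

With all weights at zero the rollout is a smooth reach from `y0` to `goal`; nonzero weights bend the path while the spring term still pulls the motion toward the goal as the phase decays, which is the stability property the abstract relies on.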

2510.20955 2026-05-26 cs.LG cs.RO updated version

Approximating Safety Feedback Without a Safety Oracle via Model Predictive Control

Jeff Pflueger, Michael Everett

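Affiliations: Northeastern University

AI summary: Proposes a method that uses a simulator and the Model Predictive Path Integral (MPPI) algorithm to approximate a safety function under reversibility and positive-invariance assumptions, avoiding hand-designed safety feedback.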

Comments 8 pages, 5 figures

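AI abstract (translated from Chinese)

Safe decision-making algorithms for mobile robot control typically require feedback that verifies the safety of proposed actions. This feedback is assumed to be directly available during development or deployment of the control system, either as an explicit constraint formulation or as a hand-labeled safety dataset; both can be inaccurate or time-consuming to produce. Many recently developed simulators can handle complex interactions and diverse environments. These environments carry implicit safety constraints that may be hard to model. By leveraging one of these simulators, we can construct a proxy for a safety function, bypassing the need for hand-designed feedback to capture these constraints. We propose an algorithm that approximates safety using reversibility and a positive-invariance assumption on the unsafe state space. The method uses the Model Predictive Path Integral (MPPI) algorithm to establish this reversibility and verify a proposed action. First, the action is projected through the simulator to a future state. Then, if MPPI can find a path back to a previous state in the trajectory, that state is guaranteed to lie outside the unsafe (positively invariant) set. Experimental results show that the proposed algorithm can approximate the performance of a safety oracle while avoiding the classification of unsafe states as safe.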

English abstract

Safe decision-making algorithms for control of mobile robots often require the existence of feedback to verify the safety of proposed actions. This feedback is assumed to be directly available during the development or deployment of the control system. It can take the form of either an explicit constraint formulation or a set of hand-labeled safety data, both of which can be inaccurate or time consuming to produce. Many recently developed simulators can handle complex interactions and varied environments. These environments have implicit safety constraints that may be hard to model. By leveraging one of these simulators, we can construct a proxy for a safety function that bypasses the need for hand designed feedback in capturing these constraints. We present an algorithm that approximates safety by using reversibility and a positive-invariance assumption on the unsafe state space. This method employs the Model-Predictive Path Integral algorithm (MPPI) to establish this reversibility and verify a proposed action. First the action is projected via the simulator to a future state. Then if MPPI can find a path back to a previous state in the trajectory, that state is guaranteed to be outside the unsafe (positive invariant) set. Experimental results demonstrate that the proposed algorithm can approximate the performance of a safety oracle while avoiding classification of unsafe states as safe.
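
The two-step check described in the abstract (project the action through a simulator, then try to plan a way back) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D simulator `step`, the absorbing "pit" at `PIT_EDGE`, the tolerance, and the heavily simplified MPPI-style sampler, which keeps only a terminal cost and the exponentially weighted update of the nominal controls.

```python
import numpy as np

PIT_EDGE = 1.0  # hypothetical: states with x >= PIT_EDGE are unsafe and absorbing

def step(x, u, dt=0.1):
    """Toy simulator: a 1-D point that cannot move once it is in the pit."""
    if x >= PIT_EDGE:
        return x  # positive invariance: the unsafe set cannot be left
    return x + float(np.clip(u, -1.0, 1.0)) * dt

def rollout(x, controls):
    """Apply a control sequence and return the final state."""
    for u in controls:
        x = step(x, u)
    return x

def best_return_distance(x_start, x_target, rng, horizon=15,
                         samples=64, iters=5, lam=0.1, sigma=0.5):
    """Simplified MPPI-style search; returns the closest approach to x_target."""
    nominal = np.zeros(horizon)
    best = abs(x_start - x_target)
    for _ in range(iters):
        noise = rng.normal(0.0, sigma, size=(samples, horizon))
        costs = np.array([abs(rollout(x_start, nominal + eps) - x_target)
                          for eps in noise])
        best = min(best, float(costs.min()))
        # Exponentially weight low-cost samples and shift the nominal controls.
        weights = np.exp(-(costs - costs.min()) / lam)
        nominal = nominal + (weights / weights.sum()) @ noise
    return best

def action_is_reversible(x_prev, u, rng, tol=0.05):
    """Accept an action only if a path back to the previous state is found."""
    x_next = step(x_prev, u)  # project the action through the simulator
    return best_return_distance(x_next, x_prev, rng) <= tol

rng = np.random.default_rng(0)
far_from_pit = action_is_reversible(0.5, 1.0, rng)   # stays outside the pit
into_the_pit = action_is_reversible(0.95, 1.0, rng)  # crosses PIT_EDGE
```

In this toy setting an action that lands in the pit can never be judged reversible, because every rollout from an absorbing state stays put. A failed search does not prove a state is unsafe, though, so the check errs on the conservative side.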

2510.16435 2026-05-26 cs.RO cs.CL cs.HC updated version

What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics

Lennart Wachowiak, Andrew Coles, Gerard Canal, Oya Celiktutan

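Affiliations: King's College London, CDT in Safe and Trusted AI; King's College London

AI summary: The paper collects 1,893 questions from 100 participants to build a dataset of user questions for household robots, covering 12 categories and 70 subcategories, to help roboticists identify the key types of questions robots need to answer.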

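AI abstract (translated from Chinese)

With large language models and conversational interfaces now widely used in human-robot interaction, a robot's ability to answer user questions matters more than ever. We therefore introduce a dataset of 1,893 user questions for household robots, collected from 100 participants and organized into 12 categories and 70 subcategories. Most work in explainable robotics focuses on why-questions. In contrast, our dataset covers many question types, from questions about simple execution details to questions about how the robot would act in hypothetical scenarios, giving roboticists valuable insight into which questions their robots need to be able to answer. To collect the dataset, we created 15 video stimuli and 7 text stimuli depicting robots performing various household tasks. We then asked participants on Prolific which questions they would want to ask the robot in each depicted situation. In the final dataset, the most frequent categories are questions about task execution details (21.4%), the robot's capabilities (12.6%), and performance assessments (10.7%). Although questions about how robots would handle potentially difficult scenarios and ensure correct behavior are less frequent, users rank them as the most important for robots to be able to answer. We also find that people who identify as robotics novices ask different questions than more experienced users. Novices are more likely to ask about simple facts, such as what the robot did or the current state of the environment. As robots enter environments shared with humans and language becomes central to instruction and interaction, this dataset provides a valuable foundation for (i) identifying the information robots need to log and expose to conversational interfaces, (ii) benchmarking question-answering modules, and (iii) designing explanation strategies that match user expectations.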

English abstract

With the growing use of large language models and conversational interfaces in human-robot interaction, robots' ability to answer user questions is more important than ever. We therefore introduce a dataset of 1,893 user questions for household robots, collected from 100 participants and organized into 12 categories and 70 subcategories. Most work in explainable robotics focuses on why-questions. In contrast, our dataset provides a wide variety of questions, from questions about simple execution details to questions about how the robot would act in hypothetical scenarios -- thus giving roboticists valuable insights into what questions their robot needs to be able to answer. To collect the dataset, we created 15 video stimuli and 7 text stimuli, depicting robots performing varied household tasks. We then asked participants on Prolific what questions they would want to ask the robot in each portrayed situation. In the final dataset, the most frequent categories are questions about task execution details (21.4%), the robot's capabilities (12.6%), and performance assessments (10.7%). Although questions about how robots would handle potentially difficult scenarios and ensure correct behavior are less frequent, users rank them as the most important for robots to be able to answer. Moreover, we find that users who identify as novices in robotics ask different questions than more experienced users. Novices are more likely to inquire about simple facts, such as what the robot did or the current state of the environment. As robots enter environments shared with humans and language becomes central to giving instructions and interaction, this dataset provides a valuable foundation for (i) identifying the information robots need to log and expose to conversational interfaces, (ii) benchmarking question-answering modules, and (iii) designing explanation strategies that align with user expectations.
