arXivDaily arXiv每日学术速递 周一至周五更新

视觉与机器人

机器人 / 具身智能

机器人、具身智能、机器人学习、操作、导航和具身世界模型。

今日/当前日期收录 42 信号源:cs.RO, cs.AI, cs.CV, cs.LG
2606.19357 2026-06-19 cs.RO cs.AI 新提交 95%

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

Physical Atari: 一个用于机器人实时强化学习的鲁棒且可访问的平台

Khurram Javed, Joseph Modayil, Gloria Kennickell, Richard S. Sutton, John Carmack

发表机构 * Keen Technologies University of Alberta, Canada(阿尔伯塔大学,加拿大) Openmind Research Institute(Openmind研究机构)

专题命中 机器人学习 :机器人实时强化学习平台,验证算法在物理世界学习

AI总结 提出Physical Atari平台,通过机器人操作Atari控制器和实时渲染游戏帧,实现物理世界中的强化学习研究,验证了算法可直接在机器人上学习,并指出分布偏移会显著降低策略性能。

Comments To appear at RLC 2026

详情
AI中文摘要

我们构建了一个名为Robotroller的机器人,它能够操作Atari CX40+控制器,以及一个名为Atari Devbox的设备,该设备在屏幕上渲染来自Arcade Learning Environment的游戏帧和奖励信号。Robotroller和Atari Devbox,连同现成的摄像头和台式计算机,构成一个可用于研究物理世界中强化学习算法的系统。我们将整个系统称为Physical Atari。在本文中,我们详细介绍了使Physical Atari成为一个鲁棒且可访问平台的关键决策。为了使系统鲁棒,我们设计了Robotroller,使得所有运动都通过轴承完成,从而减少磨损。此外,我们编写了软件,以高频监控伺服电机的状态并进行干预以限制应力。为了使系统可访问,我们使用了价格合理的现成组件和可通过消费级3D打印机制造的零件。Physical Atari的建造成本低于1000美元,并且已用于数周不间断的强化学习实验,未出现任何机械故障。我们用它验证了强化学习算法可以直接在机器人上学习,并表明即使学习和部署之间的微小分布偏移也会显著降低策略的性能。我们的结果强调了设备端适应对于在机器人上获得强性能的重要性。

英文摘要

We built a robot called the Robotroller that actuates an Atari CX40+ controller and a device called the Atari Devbox that renders the game frame and the reward signal from the Arcade Learning Environment on a screen. The Robotroller and the Atari Devbox, together with an off-the-shelf camera and a desktop computer, constitute a system that can be used to study reinforcement learning algorithms in the physical world. We call the full system Physical Atari. In this paper, we detail the key decisions that make Physical Atari a robust and accessible platform. To make the system robust, we designed the Robotroller so that all movement is done through bearings, which reduces wear. Additionally, we wrote software that monitors the state of the servos at a high frequency and intervenes to limit stress. To make the system accessible, we used affordable off-the-shelf components and parts that can be manufactured using consumer 3D printers. Physical Atari can be built for under $1,000 and has been used for weeks of non-stop reinforcement learning experiments without any mechanical failures. We used it to validate that reinforcement learning algorithms can learn directly on robots and show that even small distribution shifts between learning and deployment can significantly degrade the performance of policies. Our results underscore the importance of on-device adaptation for strong performance on robots.

2606.19729 2026-06-19 cs.RO cs.AI 新提交 90%

VOiLA: Vectorized Online Planning with Learned Diffusion Model for POMDP Agents

VOiLA: 基于学习扩散模型的向量化在线规划用于POMDP智能体

Marcus Hoerger, Rishikesh Joshi, Rahul Shome, Ian Manchester, Hanna Kurniawati

发表机构 * Australian National University(澳大利亚国立大学) The University of Sydney(悉尼大学)

专题命中 机器人学习 :提出POMDP在线规划框架,用于机器人规划。

AI总结 提出VOiLA框架,利用条件扩散模型学习POMDP模型,通过蒸馏加速采样并与向量化在线规划器集成,在三个基准任务和实物机器人上实现高效在线规划。

Comments Submitted to the 2026 International Symposium of Robotics Research (ISRR)

详情
AI中文摘要

不确定性下的规划是自主机器人的关键能力。部分可观测马尔可夫决策过程(POMDP)为此提供了强大框架。尽管基于POMDP的规划已取得显著进展,但其在现实问题中的应用常受限于难以获得准确的POMDP模型。我们提出VOiLA(Vectorized Online planning wIth Learned diffusion model for POMDP Agents),一个学习任务无关POMDP模型以实现在不确定性下在线规划的框架。VOiLA使用条件扩散模型学习转移和观测采样器,并学习用于基于粒子的信念更新的观测似然模型。为实现高效在线规划,扩散采样器被蒸馏为紧凑的前馈生成器,并与VOPP(一种利用GPU并行化的在线POMDP规划器)集成。实验结果表明,蒸馏策略将采样成本降低了近三个数量级,使学习到的生成式POMDP模型对在线规划实用。在三个基准问题上的评估表明,VOiLA在使用不到10%训练数据的情况下,性能达到或优于递归软演员-评论家算法,并且对未见环境配置的泛化能力更强。实物机器人评估表明,VOiLA仅使用模拟数据学习模型,并在10次运行中全部成功完成任务。

英文摘要

Planning under uncertainty is an essential capability for autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for such a capability. Although POMDP-based planning has advanced significantly, its application to real-world problems is often limited by the difficulty of obtaining faithful POMDP models. We present Vectorized Online planning wIth Learned diffusion model for POMDP Agents (VOiLA), a framework that learns task-agnostic POMDP models for online planning under uncertainty. VOiLA learns transition and observation samplers using conditional diffusion models and learns observation-likelihood models for particle-based belief updates. To enable efficient online planning, the diffusion samplers are distilled into compact feedforward generators and integrated with Vectorized Online POMDP Planner (VOPP), an online POMDP planner designed to leverage GPU parallelization. Experimental results indicate the distillation strategy reduces sampling cost by up to nearly three orders of magnitude, making learned generative POMDP models practical for online planning. Evaluation of VOiLA on three benchmark problems indicate that VOiLA achieves equal or better performance than Recurrent Soft Actor Critic while using less than 10% training data, and generalizes much better to unseen environment configurations. Physical robot evaluation indicates VOiLA uses the models learned using only simulated data and generates a policy that successfully accomplish the task in 10 of 10 runs.

2606.19728 2026-06-19 cs.RO cs.AI 新提交 90%

Bidirectional Tutoring for Developmental Motor Learning in Robots: Co-Developed Interaction Dynamics Support Stable Learning

机器人发展性运动学习的双向辅导:共同发展的交互动力学支持稳定学习

Rui Fukushima, Jun Tani

发表机构 * Okinawa Institute of Science and Technology Graduate University(冲绳科学技术大学院大学)

专题命中 机器人学习 :提出双向辅导框架用于机器人运动技能学习。

AI总结 提出双向辅导框架,通过人类或AI导师与机器人动态适应,利用自由能原理神经网络实现稳定序列学习,在物体操作任务中验证了行为一致性和泛化能力。

Comments 16 pages, 14 figures

详情
AI中文摘要

众所周知,婴儿通过与照顾者的密集互动来发展运动技能。尽管这种社会互动对人类发展至关重要,但机器人的运动技能学习通常被视为单向过程,机器人被动接受导师的演示。这忽视了社会互动的一个关键特性:它本质上是双向的,导师和学习者相互动态适应。在这种互动中,机器人的过往经验可能作为先验约束,塑造共同发展轨迹的动态。我们假设双向辅导允许这些约束引导形成一致的行为模式,从而保持行为一致性并支持泛化,而单向互动缺乏此类约束,导致更广泛、更不一致的行为模式。为检验这一假设,我们使用实体人形机器人进行了两个物体操作实验:一个涉及人机互动,另一个采用AI导师通过自适应干预机制与真实机器人互动,以检验在更受控条件下是否会出现类似效果。我们使用基于自由能原理的神经网络并扩展生成回放来实现发展性学习框架,该框架支持从单个辅导情节中进行稳定的逐序列学习。在两种设置中,双向辅导促进了行为一致性和阶段性泛化,同时机器人逐渐需要更少的导师指导。这些结果表明,双向辅导作为一种具身和社会化方法,为机器人的发展性运动学习提供了有效支架。

英文摘要

Infants are well known to develop their motor skills through dense interaction with caregivers. Although such social interaction is crucial for human development, motor-skill learning in robots is often treated as a unidirectional process in which robots passively receive demonstrations from tutors. This overlooks a key property of social interaction: it is inherently bidirectional, with tutor and learner dynamically adapting to each other. In such interactions, the robot's past experiences may function as prior constraints that shape the dynamics of their co-developed trajectories. We hypothesize that bidirectional tutoring allows such constraints to guide the formation of consistent behavioral patterns that preserve behavioral coherence and support generalization, whereas unidirectional interaction lacks such constraints and leads to broader, less consistent behavioral patterns. To examine this hypothesis, we conducted two experiments with a physical humanoid robot performing an object manipulation task: one involving human-robot interaction and another employing an AI tutor interacting with the real robot through an adaptive intervention mechanism designed to examine whether similar effects would emerge under more controlled conditions. We implement the developmental learning framework using a free-energy-principle-based neural network extended with generative replay, which supports stable sequence-by-sequence learning from single tutored episodes. Across both settings, bidirectional tutoring fostered consistent behaviors and stage-wise generalization, while the robot gradually required less tutor guidance. These results suggest that bidirectional tutoring, as an embodied and socially grounded approach, provides an effective scaffold for developmental motor learning in robots.

2606.19699 2026-06-19 cs.RO cs.LG cs.SY eess.SY 新提交 90%

Comparative Study on Agility, Efficiency, and Impact Absorption of Bipedal Robots with Active Toes

具有主动脚趾的双足机器人敏捷性、效率和冲击吸收的比较研究

Joong-Gil Kim, Wontae Ye, Geunwoo Cho, Seong-Ho Yun, Se-Hyoung Cho, Yong-Jae Kim

发表机构 * School of Electrical, Electronics and Communication Engineering, Korea University of Technology and Education(韩国技术教育大学电气、电子与通信工程学院) Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology(韩国科学技术研究院人工智能与机器人研究所) Robot Innovation Hub, WIRobotics Inc.(WIRobotics公司机器人创新中心)

专题命中 机器人学习 :比较双足机器人有无主动脚趾的性能。

AI总结 提出一种14自由度双足机器人,模拟人类脚趾的轻量、高扭矩、坚固特性,通过高保真仿真训练环境,对比有无主动脚趾的配置,发现脚趾机器人以1.33米/秒行走时,CoT降低17.5%,脚跟冲击力降低5.0%,路径偏差平均和最大分别降低25.0%和34.0%。

Comments 6 pages, 7 figures

详情
AI中文摘要

人类腿部表现出高效率、敏捷性和冲击吸收能力,其中脚趾在这些能力中起着关键作用。尽管已经有许多尝试在机器人中实现类似人类的脚趾,但它们尚未完全复制人类特征,也没有严格验证其益处。我们提出了一种14自由度的双足机器人,模拟人类脚趾的轻量、高扭矩、坚固特性。为了定量分析主动脚趾在敏捷性、效率和冲击吸收方面的有效性,我们开发了一个高保真仿真训练环境,该环境反映了具有耦合传动和精确功耗的实际执行器。为了确保有和没有主动脚趾的配置之间的公平比较,我们设计了一个最小化强化学习奖励函数,并对两者应用了相同的训练程序。仿真结果表明,在1.33米/秒行走时,与无脚趾配置相比,配备脚趾的机器人将CoT降低了17.5%,脚跟冲击力降低了5.0%。在敏捷性测试中,平均和最大路径偏差分别降低了25.0%和34.0%。

英文摘要

Human legs exhibit high efficiency, agility, and impact absorption, with toes playing a crucial role in these capabilities. While many attempts have been made to implement human-like toes in robots, they have not fully replicated human characteristics nor rigorously validated their benefits. We propose a 14-DOF biped robot emulating human toes' lightweight, high-torque, robust nature. To quantitatively analyze the effectiveness of the active toes in terms of agility, efficiency, and impact absorption, we developed a high-fidelity simulation training environment that reflects actual actuators with coupled transmissions and accurate power consumption. To ensure a fair comparison between configurations with and without active toes, we designed a minimal RL reward function and applied an identical training procedure to both. The simulation results indicate that, at 1.33 m/s walking, the toe-equipped robot reduced CoT by 17.5% and heel-strike GRF by 5.0% compared with the toe-ablation configuration. On the agility test, average and maximum path deviation decreased by 25.0% and 34.0%, respectively.

2606.19419 2026-06-19 cs.RO cs.AI 新提交 90%

Playful Agentic Robot Learning

趣味性具身机器人学习

Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg, Yutong Bai, David M. Chan, Ion Stoica, Angjoo Kanazawa, Jiahui Lei, Haiwen Feng, Trevor Darrell

发表机构 * University of California, Berkeley(加州大学伯克利分校) Impossible Research

专题命中 机器人学习 :机器人通过自主探索学习可复用技能。

AI总结 提出RATs框架,让机器人通过自主探索学习可复用技能,在LIBERO-PRO和MolmoSpaces上分别提升20.6和17.0个百分点。

Comments Project page: https://playful-rats.github.io/

详情
AI中文摘要

当前的具身机器人系统可以编写可执行的代码即策略程序、观察反馈并在多次尝试中修正行为,但它们仍然主要是任务驱动的:可复用技能仅在明确指令后获得。我们研究趣味性具身机器人学习,其中具身编码代理在下游任务到来之前,将自主导向的趣味性作为持续技能学习阶段。我们引入RATs,即专为趣味性技能获取设计的机器人代理团队。在趣味性阶段,RATs提出新颖且可学习的探索性任务,规划并执行机器人代码策略,验证中间进展,诊断失败,通过密集的步骤级反馈进行重试,并将成功执行提炼到持久代码技能库中。在测试时,代理从该冻结库中重用相关技能以帮助解决新任务。在LIBERO-PRO和MolmoSpaces上的实验表明,与无趣味性和随机趣味性基线相比,趣味性学习技能在保留的下游任务上分别提升了20.6和17.0个百分点(相对于CaP-Agent0)。此外,学习到的技能可以通过简单地检索到上下文中插入到其他推理时代码即策略代理中,无需微调基础模型,即可在RoboSuite和真实世界迁移中分别提升8.9和8.8个百分点。

英文摘要

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.

2511.16223 2026-06-19 cs.RO 90%

DynaMimicGen: A Data Generation Framework for Robot Learning of Dynamic Tasks

DynaMimicGen:一种用于机器人动态任务学习的数据生成框架

Vincenzo Pomponi, Paolo Franceschi, Stefano Baraldo, Loris Roveda, Oliver Avram, Luca Maria Gambardella, Anna Valente

发表机构 * Institute of Systems and Technologies for Sustainable Production (ISTePS)(可持续生产系统与技术研究所) Department of Innovative Technologies (DTI)(创新技术系) University of Applied Science and Arts of Southern Switzerland (SUPSI)(瑞士南部应用科学与艺术大学) Istituto Dalle Molle di studi sull’intelligenza artificiale (IDSIA)(达莫尔智能研究 institute) Department of Mechanical Engineering(机械工程系) Politecnico di Milano (PoliMi)(米兰理工学院) Faculty of Informatics(信息学院) Università della Svizzera Italiana (USI)(瑞士意大利大学)

专题命中 机器人学习 :提出DynaMimicGen框架生成动态任务数据用于机器人学习。

AI总结 本文提出DynaMimicGen框架,通过少量人类示范生成数据,支持动态任务学习,产生适应性强的轨迹,提升机器人在复杂环境中的表现。

详情
AI中文摘要

学习稳健的操作策略通常需要大量且多样化的数据集,但收集这些数据耗时费力且不适用于动态环境。本文引入DynaMimicGen(D-MG),一种可扩展的数据生成框架,能够在极少量人类监督下训练策略,同时支持动态任务设置。仅需少量人类示范,D-MG首先将示范分割为有意义的子任务,然后利用动态运动片段(DMPs)来适应和推广演示行为到新颖且动态变化的环境。改进了依赖静态假设或简单轨迹插值的先前方法,D-MG生成平滑、真实且任务一致的笛卡尔轨迹,能够实时适应任务执行过程中物体姿态、机器人状态或场景几何的变化。我们的方法支持不同场景——包括场景布局、物体实例和机器人配置——使其适用于静态和高度动态的操作任务。我们证明机器人代理通过模仿学习在D-MG生成的数据上实现了在长时间跨度和接触丰富的基准测试中的强大表现,包括立方体堆叠和将杯子放入抽屉等任务,即使在不可预测的环境变化下也是如此。通过消除对大量人类示范的需求并使动态设置的泛化成为可能,D-MG提供了一种强大而高效的替代手动数据收集方法,为可扩展的自主机器人学习铺平道路。

英文摘要

Learning robust manipulation policies typically requires large and diverse datasets, the collection of which is time-consuming, labor-intensive, and often impractical for dynamic environments. In this work, we introduce DynaMimicGen (D-MG), a scalable dataset generation framework that enables policy training from minimal human supervision while uniquely supporting dynamic task settings. Given only a few human demonstrations, D-MG first segments the demonstrations into meaningful sub-tasks, then leverages Dynamic Movement Primitives (DMPs) to adapt and generalize the demonstrated behaviors to novel and dynamically changing environments. Improving prior methods that rely on static assumptions or simplistic trajectory interpolation, D-MG produces smooth, realistic, and task-consistent Cartesian trajectories that adapt in real time to changes in object poses, robot states, or scene geometry during task execution. Our method supports different scenarios - including scene layouts, object instances, and robot configurations - making it suitable for both static and highly dynamic manipulation tasks. We show that robot agents trained via imitation learning on D-MG-generated data achieve strong performance across long-horizon and contact-rich benchmarks, including tasks like cube stacking and placing mugs in drawers, even under unpredictable environment changes. By eliminating the need for extensive human demonstrations and enabling generalization in dynamic settings, D-MG offers a powerful and efficient alternative to manual data collection, paving the way toward scalable, autonomous robot learning.

2606.20521 2026-06-19 cs.CV 新提交 85%

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

HumanScale: 以自我为中心的人类视频在具身预训练中可超越真实机器人数据

Juncheng Ma, Jianxin Bi, Yufan Deng, Xuanran Zhai, Kewei Zhang, Ye Huang, Bo Liang, Shukai Gong, Jiankai Tu, Xiaotian Tang, Jiaxin Li, Kaiqi Chen, Duomin Wang, Yuqi Wang, Bingyi Kang, Eric Huang, Zhiyang Dou, Zhen Dong, Enze Xie, Wojciech Matusik, Tat-Seng Chua, Daquan Zhou

发表机构 * PKU(北京大学) NUS(新加坡国立大学) MIT(麻省理工学院) UCSB(加州大学圣塔芭芭拉分校) NVIDIA(英伟达)

专题命中 机器人学习 :人类视频用于具身基础模型预训练

AI总结 本文通过系统比较发现,经过精心设计的过滤和标注流程,以自我为中心的人类视频在具身基础模型预训练中不仅可行,而且性能优于遥操作真实机器人数据,验证了“预训练于人类视频+少量机器人数据适配”的可扩展范式。

Comments Github: https://github.com/DAGroup-PKU/HumanNet/

详情
AI中文摘要

具身基础模型有望像大型语言模型一样从数据扩展中受益,但面临更严重的数据瓶颈。遥操作真实机器人轨迹因其精确的动作监督和具身对齐而仍然是主要的预训练来源,但其可扩展性受限于高采集成本、获取难度以及低行为和环境多样性。这些限制引发了对以自我为中心的人类视频作为可扩展、成本显著更低且更多样化的具身模型预训练替代方案的兴趣。然而,与遥操作真实机器人数据相比,其有效性仍未得到充分探索。为了解决这个问题,我们在固定的后训练和验证协议下,进行了一项系统研究,比较以自我为中心的人类视频和遥操作真实机器人轨迹作为具身基础模型的预训练数据源。令人惊讶的是,我们发现经过精心设计的过滤和标注流程处理的以自我为中心的数据,不仅是模型预训练的可行替代品,而且可以带来更优的性能。在相同预训练数据量下,在以自我为中心数据上预训练的模型在真实机器人动作预测上的验证损失降低了24%,在分布内和分布外真实机器人任务执行上的成功率分别提高了52.5%和90%。这一发现验证了具身基础模型的一种可扩展范式:在以自我为中心的人类视频上预训练以学习多样化的世界表征,然后使用少量标注的真实机器人数据进行适配以实现动作空间对齐。我们希望这项研究能鼓励对以自我为中心数据的更广泛探索,并在昂贵的机器人数据收集之前为数据质量评估提供指导。

英文摘要

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalability is limited by high collection cost, acquisition difficulty, and low behavioral and environmental diversity. These limitations have sparked interest in egocentric human video as a scalable, substantially lower-cost, and more diverse alternative for embodied model pretraining. However, its effectiveness compared to teleoperated real-robot data remains underexplored. To address this question, we conduct a systematic study comparing egocentric human video and teleoperated real-robot trajectories as pretraining data sources for embodied foundation models, under fixed post-training and validation protocols. Surprisingly, we find that egocentric data, when processed through a carefully designed filtering and labeling pipeline, is not merely a viable substitute for model pretraining but can lead to superior performance. With the same amount of pretraining data, models pretrained on egocentric data achieve a 24% lower validation loss on real-robot action prediction, as well as 52.5% and 90% higher success rates on in-distribution and out-of-distribution real-robot task execution, respectively. This finding verifies a scalable paradigm for embodied foundation models: pretrain on egocentric human video to learn diverse world representations, then adapt with a small amount of labeled real-robot data for action-space alignment. We hope this study encourages broader exploration of egocentric data and offers guidance for data quality assessment before costly robot data collection.

2606.20495 2026-06-19 cs.RO 新提交 85%

Increasing Resilience of Continuum Robots via Motion Planning Algorithms

通过运动规划算法提高连续体机器人的韧性

Oxana Shamilyan, Ievgen Kabin, Zoya Dyka, Oleksandr Sudakov, Peter Langendoerfer

发表机构 * IHP – Leibniz-Institut für innovative Mikroelektronik(莱布尼茨创新微电子研究所) BTU Cottbus-Senftenberg(科特博斯-塞芬堡工业大学) Technical Center, National Academy of Sciences of Ukraine(乌克兰国家科学院技术中心)

专题命中 机器人学习 :研究连续体机器人的运动规划算法

AI总结 本文实验研究运动规划算法对连续体机器人韧性的影响,通过改进遗传算法和A*算法,结合层次分析法评估路径质量,发现遗传算法生成更多样化路径,提升机器人韧性。

详情
AI中文摘要

本文介绍了针对韧性连续体机器人的运动规划实验研究。我们主要关注多准则决策、其在路径规划算法中的应用、对生成路径的影响以及执行时间。为此,我们使用了两种著名的路径规划算法,即遗传算法和A*算法,并通过添加层次分析法算法来评估生成路径的质量,对其进行了修改。在我们的实验中,层次分析法考虑了四个不同的准则,即距离、电机损伤、机器人手臂的机械损伤和精度,每个准则都被认为有助于连续体机器人的韧性。使用不同的准则对于延长连续体机器人的维护操作时间是必要的。我们使用两种不同的机器人模拟环境进行了实验。尽管我们显著简化了机器人模型及其环境,但我们仍然基于真实机器人原型实现了环境的一些特征。特别地,其中一个环境包含单路径点和多路径点,另一个环境仅包含多路径点。结果表明,与A*算法相比,遗传算法的性能时间不依赖于环境的基数。它生成更多样化的路径,从而提高了机器人的韧性。

英文摘要

This paper presents an experimental study of motion planning for resilient continuum robots. In this study we mainly focused on multi-criteria decision-making, its application for path-planning algorithms, impact on the generated path and execution time. To do this, we used two well-known algorithms for path planning, namely Genetic algorithm and A star algorithm, and modified them by adding the Analytical Hierarchy Process algorithm to evaluate the quality of the paths generated. In our experiment the Analytical Hierarchy Process considers four different criteria, i.e. distance, motors damage, mechanical damage of the robot's arm and accuracy, each considered to contribute to the resilience of a continuum robot. The use of different criteria is necessary to increase the time to maintenance operations of the continuum robot. We conducted the experiments using two different simulated environments of the robot. Although we significantly simplified the robot's model and its environment, we still implemented some of the features of the environment based on the real robot prototype. In particular, one of the environments has single- as well as multi-path points, and other consists of the multi-path points only. The results show that, in contrast to A star, the performance time of Genetic algorithm does not depend on the environment's cardinality. It generates more diverse paths, which increases the robot's resilience.

2606.20389 2026-06-19 cs.RO 新提交 85%

CoLI: A Reproducible Platform for Continuum Robot Learning via Monolithic 3D Printing and Isomorphic Teleoperation

CoLI: 通过整体3D打印和同构遥操作实现连续体机器人学习的可复现平台

Ziyuan Tang, Chenxi Xiao*

发表机构 * School of Information Science and Technology at ShanghaiTech University(上海科技大学信息科学与技术学院)

专题命中 机器人学习 :连续体机器人学习平台,支持模仿学习和遥操作。

AI总结 提出一种基于多材料3D打印和同构遥操作的连续体机器人平台,简化制造流程并实现无奇异映射控制,支持模仿学习自主控制,通过硬件表征和操作任务验证其可复现性和学习就绪性。

Comments 8 pages, 7 figures, 1 table, accepted by IROS2026

详情
AI中文摘要

连续体机器人因其高自由度、柔顺结构和操作安全性,在操作任务中展现出巨大潜力。然而,复杂的制造和组装过程、具有挑战性的运动学建模以及缺乏直观的控制接口,导致其在研究和实际应用中的可复现性受到阻碍。为解决这些问题,我们提出了一种新颖的开源连续体机器人设计。该平台采用多材料3D打印实现简化的制造流程,使机械臂能够作为整体柔顺结构制造,且组装工作量最小。控制通过同构遥操作接口实现,该接口建立了直接的执行器级映射,无需显式运动学建模,并提供无奇异映射。基于该硬件设计,平台进一步支持基于模仿学习的自主控制。通过硬件表征和一系列操作任务对所提出的系统进行了评估。实验结果表明,该平台提供了一个可复现的、学习就绪的连续体机器人系统,加速了连续体机器人社区的算法开发和系统基准测试。

英文摘要

Continuum robots offer strong potential for manipulation tasks due to their high degrees of freedom, compliant structures, and operational safety. However, their adoption in both research and practical applications has been hindered by reproducibility issues arising from complex fabrication and assembly processes, challenging kinematic modeling, and a lack of intuitive control interfaces. To address these challenges, we present a novel open-source continuum robot design. The platform features a simplified fabrication pipeline enabled by multi-material 3D printing, allowing the arm to be fabricated as a monolithic compliant structure with minimal assembly. Control is achieved through an isomorphic teleoperation interface that establishes a direct actuator-level mapping, eliminating the need for explicit kinematic modeling and providing a singularity-free mapping. Building on this hardware design, the platform further supports imitation-learning-based autonomous control. The proposed system is evaluated through hardware characterization and a set of manipulation tasks. Experimental results demonstrate that the platform provides a reproducible, learning-ready continuum robot system, accelerating algorithmic development and systematic benchmarking for the continuum robotics community.

2606.20365 2026-06-19 cs.RO cs.MA 新提交 85%

An Infrastructure-less, Control-Independent Solution to Relative Localisation of a Team of Mobile Robots using Ranging Measurements

基于测距的移动机器人团队相对定位的无基础设施、控制无关解决方案

Paolo Golinelli, Tommaso Faraci, Daniele Fontanelli

发表机构 * Department of Industrial Engineering, University of Trento(特伦托大学工业工程系) Department of Information Engineering and Computer Science, University of Trento(特伦托大学信息工程与计算机科学系)

专题命中 机器人学习 :移动机器人团队协作定位算法

AI总结 提出一种无锚点、完全去中心化的协作定位算法,仅依赖局部里程计、稀疏测距和短程通信,无需控制机器人运动即可实现团队可观测性,采用多假设贝叶斯框架保证鲁棒性。

详情
AI中文摘要

定位机器人团队的能力对于从非结构化环境中的机器人舰队到协作控制和导航任务等应用至关重要。在此类场景中,固定基础设施通常不可用,部署必须快速灵活,系统要求必须最小化。我们提出了一种去中心化协作定位算法,同时解决了所有这些挑战。该方法无锚点、完全去中心化,并且与大多数现有方法不同,不需要控制机器人运动来确保团队可观测性。它仅依赖局部里程计、稀疏的代理间测距测量和短程通信,这些在实践中广泛可用。该算法采用多假设贝叶斯框架,维护所有可行解集,确保在瞬态不可观测条件下的鲁棒性。此外,通过信息共享,每个代理都能受益于整个群体的估计,即使在部分连接条件下也是如此。

英文摘要

The ability to localise teams of robots is essential for applications ranging from robotic fleets in unstructured environments to cooperative control and navigation tasks. In such contexts, fixed infrastructure is often unavailable, deployments must be fast and flexible, and system requirements must be minimal. We present a decentralised cooperative localisation algorithm that addresses all these challenges at once. The method is anchor-less, fully decentralised, and, unlike most existing approaches, does not require controlling the robots motion to ensure team observability. It relies only on local odometry, sparse inter-agent ranging measurements, and short-range communication, all of which are widely available in practice. The algorithm adopts a multi-hypothesis Bayesian framework that maintains the entire set of feasible solutions, ensuring robustness under transient unobservable conditions. Moreover, through information sharing, each agent benefits from the estimates of the entire group, even in partially connected conditions.

2606.20209 2026-06-19 cs.RO cs.AI 新提交 85%

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

FlowMaps: 使用流匹配建模长期多模态物体动态

Francesco Argenziano, Miguel Saavedra-Ruiz, Sacha Morin, Charlie Gauthier, Daniele Nardi, Liam Paull

发表机构 * Sapienza University of Rome(罗马大学) Université de Montréal(蒙特利尔大学) Mila - Quebec AI Institute(米拉-魁北克人工智能研究所)

专题命中 机器人学习 :FlowMaps建模物体动态,提升机器人导航性能。

AI总结 提出FlowMaps模型,通过潜在流匹配学习物体位置的多模态时空分布,预测动态物体未来位置,提升机器人在变化家庭环境中的导航性能。

详情
AI中文摘要

对3D场景的联合空间和时间理解是部署在日常家庭环境中的机器人的关键要求。这些智能体不仅必须理解和导航空间布局,还必须推理这些空间如何随时间演变。特别是,人类每天与物体互动,导致物体在整个环境中改变位置,使机器人难以可靠地将当前观察与先前看到的物体关联起来。然而,这些互动并非随机:人类的习惯和日常行为在物体位置上产生了时空一致的模式,机器人智能体可以学习这些模式,然后将其用于下游任务,如导航。为此,我们引入了FlowMaps,一种潜在流匹配模型,用于估计连续3D空间中动态物体未来位置的多模态分布。通过学习物体之间的隐式依赖关系及其时间演变,FlowMaps预测物体位置在人类过去互动条件下的可能变化,同时支持在具有相似物体习惯的未见环境中的泛化。为了展示该方法的实用性,我们在模拟和真实环境中将FlowMaps部署到下游的动态物体导航任务中。在超过600个回合中,FlowMaps优于最先进的方法,表明通过连续、多模态的时空分布建模物体动态可以改善机器人在变化家庭环境中的搜索和导航。代码和附加材料可在此https URL获取。

英文摘要

Joint spatial and temporal understanding of 3D scenes is a crucial requirement for robots deployed in everyday household environments. Such agents must not only comprehend and navigate spatial layouts, but also reason about how these spaces evolve over time. In particular, humans interact with objects daily, causing them to change position throughout the environment and making it difficult for robots to reliably associate current observations with previously seen objects. However, these interactions are not random: human habits and routines induce spatio-temporally consistent patterns in object locations, which robotic agents can potentially learn and then exploit for downstream tasks such as navigation. To this end, we introduce FlowMaps, a latent flow matching model for estimating multimodal distributions over the future locations of dynamic objects in a continuous 3D space. By learning the implicit dependencies among objects and their temporal evolution, FlowMaps predicts likely changes in object locations conditioned on past human interactions, while supporting generalization across previously unseen environments that share similar object routines. To demonstrate the utility of this method, we deploy FlowMaps in a downstream dynamic Object Navigation task in both simulated and real-world environments. Across more than 600 episodes, FlowMaps outperforms state-of-the-art approaches, showing that modeling object dynamics through continuous, multimodal spatio-temporal distributions improves robotic search and navigation in changing household environments. Code and additional material is available at https://fra-tsuna.github.io/flowmaps/.

2606.20150 2026-06-19 cs.RO 新提交 85%

Robust Assembly State Reasoning from Action Recognition for Human-Robot Collaboration

面向人机协作的基于动作识别的鲁棒装配状态推理

James Fant-Male, Roel Pieters

发表机构 * Cognitive Robotics group, Unit of Automation Technology and Mechanical Engineering, Tampere University(坦佩雷大学自动化技术与机械工程系认知机器人组)

专题命中 机器人学习 :人机协作中的装配状态推理。

AI总结 研究从动作识别输入跟踪装配状态的方法,比较逻辑、HMM和神经网络方法,发现最优方法因任务而异,逻辑方法在多变场景更鲁棒。

Comments Preprint accepted to the 35th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026). 8 pages, 9 figures, 3 tables

详情
AI中文摘要

人类动作识别(HAR)在人机协作(HRC)研究中经常被用于理解已执行的动作以及协作任务的状态。然而,从HAR准确跟踪装配状态尚未得到充分研究,并且在现实场景中并非易事。本研究系统性地调查并比较了使用动作识别输入跟踪装配状态的方法。使用两个不同数据集和五种状态跟踪方法(包括基于逻辑的、隐马尔可夫模型(HMM)和神经网络(NN)方法)进行的调查表明,最优方法在不同任务中并不统一,并且不同方法在不同情况下会失败。测试使用具有不同噪声水平的模拟输入和来自HAR模型的真实输入进行。结果表明,NN和HMM方法在变异性有限的任务中表现良好,但在其他场景中,基于逻辑的方法可能更鲁棒。对于没有额外传感的重复动作任务,建模预期动作持续时间的方法也很重要。

英文摘要

Human Action Recognition (HAR) is frequently investigated in Human-Robot Collaboration (HRC) research to understand what actions have been performed and hence the state of a collaborative task. Accurately tracking an assembly state from HAR is however not fully investigated, and in realistic scenarios is not a trivial task. This research systematically investigates and compares methods for tracking assembly state using action recognition inputs. Investigations using two diverse datasets and five state tracking approaches, including logic-based, Hidden Markov Model (HMM), and neural network (NN) methods, show that optimal approaches are not uniform across different tasks and that different methods fail under different circumstances. Testing is performed using both simulated inputs with varying noise levels and realistic inputs from a HAR model. Results show NN and HMM methods can perform well in tasks with limited variability, but for other scenarios logic-based approaches can be more robust. Methods which model expected action duration are also important for tasks with repeated actions where no additional sensing is provided.

2606.20104 2026-06-19 cs.LG cs.AI 新提交 85%

Sensorimotor World Models: Perception for Action via Inverse Dynamics

传感器运动世界模型:通过逆动力学实现面向行动感知

Petr Ivashkov, Randall Balestriero, Bernhard Schölkopf

发表机构 * Max Planck Institute for Intelligent Systems(马克斯·普朗克智能系统研究所) Department of Computer Science, Brown University(布朗大学计算机科学系) ELLIS Institute(ELLIS研究所) ETH Zürich(苏黎世联邦理工学院)

专题命中 机器人学习 :世界模型用于机器人控制

AI总结 提出传感器运动世界模型(SMWM),通过逆动力学正则化端到端训练潜空间世界模型,防止表示崩溃并学习与行动对齐的紧凑表示,在2D和3D控制任务中实现竞争性规划性能。

详情
AI中文摘要

面向行动的感知表明,世界的表示不应仅由视觉保真度决定,而应由其与行动的相关性决定。同时,潜在的JEPA风格世界模型主张从高维观测中学习紧凑的预测状态以促进未来状态的预测,但这些模型的端到端训练并非易事,因为如果我们的唯一目标是构建易于预测的潜在状态,表示可能会崩溃。我们引入了一种传感器运动世界模型(SMWM):一种通过逆动力学正则化进行端到端训练的潜在世界模型。这一单一正则化解决了两个问题:它防止表示崩溃并诱导与行动对齐的表示。通过迫使潜在状态保留关于转换背后行动的信息,它使模型偏向于环境中可控的自由度,同时丢弃不可控的干扰因素。这产生了从离线、无奖励轨迹中训练的稳定潜在世界模型,无需冻结编码器、指数移动平均或复杂的潜在正则化。实验表明,SMWM学习了紧凑、可解释的潜在空间,并在简单的2D和3D控制任务中实现了竞争性的规划性能。

英文摘要

Perception for action suggests that representations of the world should be shaped not by visual fidelity alone, but by their relevance for actions. At the same time, latent JEPA-style world models advocate learning compact predictive states from high-dimensional observations to facilitate the prediction of future states, but end-to-end training of these models is nontrivial because representations may collapse if our only goal is to construct a latent state that is easy to predict. We introduce a sensorimotor world model (SMWM): a latent world model trained end-to-end with inverse dynamics regularization. This single regularizer addresses both issues: it prevents representation collapse and induces action-aligned representations. By forcing latent states to preserve information about the action underlying a transition, it biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors. This yields stable latent world models trained from offline, reward-free trajectories, without frozen encoders, exponential moving averages, or complex latent regularizers. Empirically, SMWM learns compact, interpretable latent spaces and enables competitive planning performance across simple 2D and 3D control tasks.

2606.20056 2026-06-19 cs.RO 新提交 85%

VFILC: Accurate Frequency Extrapolations in Imitation Learning via Sampling Frequency ILC

VFILC: 通过采样频率迭代学习控制实现模仿学习中的精确频率外推

Nozomu Masuya, Toshiaki Tsuji, Sho Sakaino

发表机构 * Grad. School of Science Technology University of Tsukuba Tsukuba, Japan Engineering Saitama University Saitama, Japan Information Engineering University of Tsukuba Tsukuba, Japan

专题命中 机器人学习 :提出模仿学习方法用于机器人速度外推。

AI总结 提出VFILC方法,结合可变频率模仿学习与前馈-反馈迭代学习控制,在三种任务中实现精确的速度外推,频率误差降低最高81%。

Comments 8 pages, 17 figures. Accepted at IROS 2026

详情
AI中文摘要

传统的基于神经网络(NN)的变速度运动模仿学习方法要么局限于内插速度,要么在外推超出训练速度范围时产生不可预测的运动。可变频率模仿学习(VFIL)通过将NN模型的采样频率与运动频率相关联,实现了速度的外推,但其开环配置导致频率误差,特别是在外推的高频设置中。本研究提出了基于VFIL和迭代学习控制(ILC)的可变频率模仿学习与迭代学习控制(VFILC),包含前馈和反馈两部分,前者利用VFIL的优势,后者调整频率误差。实验结果表明,所提方法成功且精确地外推了运动速度,并在所有三个任务中减少了频率误差;特别是在以训练数据中平均速度的两倍进行外推时,与简单前馈VFIL相比,反馈在擦拭任务中将频率误差显著降低了81%,在摇晃任务中降低了50%。即使在受复杂摩擦特性影响的接触密集混合任务的内插频率下,所提方法相比VFIL也将精度提高了27%。

英文摘要

Conventional neural network (NN)-based imitation learning methods for variable-speed motion either restricted their scope to interpolated speeds, or generated unpredictable motions when extrapolating beyond trained velocity ranges. Variable-frequency imitation learning (VFIL) enabled extrapolations of speeds by linking the NN model's sampling frequency to the motion frequency, whereas its open-loop configuration caused frequency errors, especially in the extrapolated high-frequency settings. This study proposes variable-frequency imitation learning with iterative learning control (VFILC) based on a combination of VFIL and iterative learning control (ILC) with both feedforward and feedback parts, the former taking advantage of VFIL and the latter adjusting the frequency errors. The experimental results showed that the proposed method successfully and accurately extrapolated motion speeds and reduced frequency errors in all three tasks, and that the feedback especially reduced the frequency errors by a remarkable 81% in the wiping task and 50% in the shaking task, both compared to simple feedforward VFIL, when extrapolating at double the average speed in the training data. The proposed method also improved accuracy by 27% compared with VFIL even at an interpolated frequency for a contact-rich mixing task affected by complex friction traits.

2606.20048 2026-06-19 cs.RO 新提交 85%

MirrorDuo: Reflection-Consistent Visuomotor Learning from Mirrored Demonstration Pairs

MirrorDuo:基于镜像演示对的反射一致视觉运动学习

Zheyu Zhuang, Ruiyu Wang, Giovanni Luca Marchetti, Florian T. Pokorny, Danica Kragic

发表机构 * Division of Robotics, Perception and Learning(机器人、感知与学习 division)

专题命中 机器人学习 :提出镜像演示增强行为克隆,用于机器人学习。

AI总结 提出MirrorDuo方法,通过反射一致性为每个原始演示生成镜像副本,实现数据增强,在相同数据预算下显著提升行为克隆性能,并支持零/少样本技能迁移。

Comments Published in CoRL 2025

Journal ref CoRL 2025

详情
AI中文摘要

基于图像的行为克隆利用从无处不在的RGB相机捕获的演示。然而,它仍然受到收集多样化演示成本的限制,特别是在工作空间变化中泛化。我们提出MirrorDuo,一种基于反射的公式,操作于图像、本体感受和完整的6自由度末端执行器动作元组,为每个原始演示生成镜像对应物,有效实现“收集一个,免费获得一个”。它可以作为现有学习管道(如标准行为克隆或扩散策略)的数据增强策略,或作为反射等变策略网络的结构先验。通过利用原始域和镜像域之间的重叠,当演示均匀分布在工作空间两侧时,MirrorDuo在相同数据预算下实现了显著改进的性能。当演示仅限于一侧时,MirrorDuo能够在目标布局中仅使用零或五个演示实现向镜像工作空间的高效技能迁移。

英文摘要

Image-based behaviour cloning leverages demonstrations captured from ubiquitous RGB cameras. However, it remains constrained by the cost of collecting diverse demos, especially for generalizing across workspace variations. We propose MirrorDuo, a reflection-based formulation that operates on image, proprioception, and full 6-DoF end-effector action tuples, generating a mirrored counterpart for each original demonstration, effectively achieving "collect one, get one for free". It can be applied as a data augmentation strategy for existing learning pipelines, such as standard behaviour cloning or diffusion policy, or as a structural prior for reflection-equivariant policy networks. By leveraging the overlap between the original and mirrored domains, MirrorDuo achieves significantly improved performance under the same data budget when demonstrations are evenly distributed across both sides of the workspace. When demonstrations are confined to one side, MirrorDuo enables efficient skill transfer to the mirrored workspace with as few as zero or five demos in the target arrangement.

2606.19990 2026-06-19 cs.AI 新提交 85%

Reward as An Agent for Embodied World Models

奖励作为具身世界模型的智能体

Pu Li, Zhigang Lin, Qiang Wu, Yongxuan Lv, Fei Wang, Shan You

发表机构 * ACE Robotics(ACE机器人)

专题命中 机器人学习 :提出奖励智能体框架用于具身世界模型

AI总结 提出奖励智能体框架和动态感知 rollout 多样化方法,通过鲁棒验证支持更广泛探索,缓解奖励黑客问题,提升世界模型性能。

详情
AI中文摘要

虽然强化学习已成为改进世界模型的有前景工具,现有方法大多依赖于训练分布附近的保守 rollout,限制了探索、行为多样性和更丰富的动态发现。在这项工作中,我们挑战这种保守范式。我们认为核心限制不是探索本身,而是缺乏支持更广泛探索的可靠验证策略。没有可靠的验证,扩展的探索极易受到奖励黑客攻击,即策略利用不完美的奖励而未能实现真正的改进。为了评估这一动机,我们在具身世界模型中实例化我们的方法,其中物理合理性和任务完成性为复杂动态下的可扩展强化学习提供了严格的测试平台。在验证方面,我们引入奖励作为智能体,一种主动评估生成行为以提供鲁棒奖励信号并减轻分布偏移下奖励黑客攻击的智能体奖励框架。在探索方面,我们通过 DynDiff-GRPO 引入动态感知 rollout 多样化,显式扩展动作空间探索以多样化轨迹、拓宽状态-动作覆盖范围,并鼓励超越保守 rollout 机制的更丰富具身行为。通过将奖励作为智能体与 DynDiff-GRPO 统一,我们在更可靠的奖励基础上实现强化学习,并大幅多样化采样,有效缓解奖励黑客攻击,同时在多个开源世界模型上取得显著的精度提升,从而证明当基于鲁棒验证时,更广泛的探索可以成功扩展。

英文摘要

While RL has become a promising tool for refining world models, existing methods largely rely on conservative rollouts near the training distribution, limiting exploration, behavioral diversity, and richer dynamic discovery. In this work, we challenge this conservative paradigm. We argue that the core limitation is not exploration itself, but the lack of reliable verification strategies to support broader exploration. Without reliable verification, expanded exploration becomes highly susceptible to reward hacking, where policies exploit imperfect rewards without achieving genuine improvement. To evaluate this motivation, we instantiate our method in embodied world models, where physical plausibility, and task completion provide a rigorous testbed for scalable RL under complex dynamics. On the verification side, we introduce Reward as an Agent, an agentic reward framework that actively evaluates generated behaviors to provide robust reward signals and mitigate reward hacking under distribution shifts. On the exploration side, we introduce Dynamic-Aware Rollout Diversification through DynDiff-GRPO, which explicitly expands action-space exploration to diversify trajectories, broaden state-action coverage, and encourage richer embodied behaviors beyond conservative rollout regimes. By unifying Reward as an Agent with DynDiff-GRPO, we enable RL on a more reliable reward foundation with substantially diversified sampling, effectively mitigating reward hacking while yielding significant accuracy gains across multiple open-source world models, thereby demonstrating that broader exploration can scale successfully when grounded in robust verification.

2606.19928 2026-06-19 cs.RO 新提交 85%

SWAP: Symmetric Equivariant World-Model for Agile Robot Parkour

SWAP: 用于敏捷机器人跑酷的对称等变世界模型

Kaixin Lan, Ze Wang, Hongyi Li, Lei Jiang, Chaojie Fu, Chengkai Su, Choi Lam Wong, Yongbin Jin, Hongtao Wang

发表机构 * Center for X-Mechanics, Zhejiang University(浙江大学交叉力学中心) ZJU-Hangzhou Global Scientific and Technology Innovation Center(浙江大学杭州国际科创中心) Mirrorme Technology Co., Ltd.(魔镜科技有限公司)

专题命中 机器人学习 :提出对称等变世界模型用于四足机器人跑酷

AI总结 提出SWAP框架,将对称等变性嵌入世界模型和演员-评论家网络,实现四足机器人跑酷记录突破(跨越2.13米间隙、攀爬1.63米平台),并展现出对未见镜像地形的几何泛化与零样本迁移能力。

详情
AI中文摘要

虽然潜在世界模型能够实现极限跑酷所需的主动预测,但其纯数据驱动的特性迫使它们将左右对称交互冗余编码为独立模式。这增加了学习负担并阻碍了几何规律性的捕获,限制了潜在空间对下游策略的效率。为了解决这个问题,我们提出了SWAP,一个端到端的等变对称世界模型。该框架将对称性直接嵌入到世界模型和演员-评论家网络中。在真实世界测试中,机器人跨越了2.13米的间隙并攀爬了1.63米的高台,打破了四足机器人跑酷的记录。此外,该框架对未见过的镜像地形展现出鲁棒的几何泛化能力,并在多种户外环境中具有卓越的零样本迁移能力。这些结果表明,对称等变性是推动学习型腿式运动物理极限的有效结构先验。

英文摘要

While latent world models enable the proactive predictions required for extreme parkour, their purely data-driven nature forces them to redundantly encode left-right symmetric interactions as independent patterns. This inflates the learning burden and hinders the capture of geometric regularities, restricting the latent space's efficiency for downstream policies. To address this, we propose SWAP, an end-to-end equivariant symmetric world model. This framework embeds symmetry directly into both the world model and the actor-critic networks. In real-world tests, the robot leaps across a 2.13 m gap and climbs a 1.63 m platform, breaking records for quadruped parkour. Furthermore, the framework exhibits robust geometric generalization to unseen mirrored terrains and exceptional zero-shot transferability across diverse outdoor environments. These results demonstrate that symmetry equivariance is an effective structural prior for pushing the physical boundaries of learned legged locomotion.

2606.19774 2026-06-19 cs.RO 新提交 85%

Start Right, Arrive Right: Asynchronous Execution via Initial Noise Selection

开始正确,到达正确:通过初始噪声选择实现异步执行

Trong-Bao Ho, Quang-Tan Nguyen, Thien-Loc Ha, Gia-Binh Nguyen, Viet-Thanh Nguyen, Long Dinh, Minh N. Vu, Duy M. H. Nguyen, An Thai Le, Ngo Anh Vien

发表机构 * VinRobotics VinUniversity DFKI(德国人工智能研究中心) University of Stuttgart(斯图加特大学) IMPRS-IS(国际马克斯·普朗克智能系统研究学院)

专题命中 机器人学习 :通过初始噪声选择解决机器人异步执行中的动作块不一致。

AI总结 针对流式策略异步执行中的动作块边界不一致问题,提出无需训练的PAINT方法,通过初始噪声选择而非轨迹引导实现前缀一致性,在12个模拟和6个真实操作任务中提升执行一致性与任务性能。

Comments First version 19 pages, project site: https://paint-action-chunking.github.io

详情
AI中文摘要

动作分块使机器人策略能够产生时间上连贯的行为,但基于流的策略生成多步动作序列会产生延迟,与实时控制不兼容。在异步执行下,机器人继续执行当前块的同时生成下一个块,即使微小延迟也会在块边界造成不一致。现有方法通过将生成导向已执行的动作前缀来解决此问题。我们则表明,通过在生成开始前选择合适的初始噪声即可实现前缀一致性,使得未经修改的流ODE能够生成连贯的下一块。这将异步推理重新定义为噪声选择问题而非轨迹引导问题。我们提出\textbf{PAINT},一种无需训练的方法,通过后向欧拉反演找到此噪声,并通过重绘规则构建最终块。总之,\texttt{PAINT}不需要梯度、重新训练或策略修改;然而它在\textit{12个模拟基准}和\textit{6个真实世界操作任务}(涵盖单臂、双臂和人形机器人)上提高了执行一致性和任务性能。网站:~\href{ this https URL }{\texttt{ this https URL }}。

英文摘要

Action chunking enables robot policies to produce temporally coherent behavior, but generating multi-step action sequences with flow-based policies incurs latency that is incompatible with real-time control. Under asynchronous execution, the robot continues executing the current chunk while the next one is generated, causing even minor delays to create inconsistencies at chunk boundaries. Existing methods address this problem by steering generation toward the already executed action prefix. We instead show that prefix consistency can be achieved by selecting an appropriate initial noise before generation begins, allowing the unmodified flow ODE to produce a coherent next chunk. This reframes asynchronous inference as a noise selection problem rather than a trajectory steering problem. We introduce \textbf{PAINT}, a training-free method that finds this noise via backward Euler inversion and constructs the final chunk through a repainting rule. In summary, \texttt{PAINT} requires no gradients, retraining, or policy modification; yet it improves execution consistency and task performance across \textit{12 simulated benchmarks} and \textit{6 real-world manipulation tasks} spanning single-arm, bimanual, and humanoid embodiments. Website: ~\href{https://paint-action-chunking.github.io}{\texttt{https://paint-action-chunking.github.io}}.

2606.19752 2026-06-19 cs.RO cs.AI 新提交 85%

Temporal Self-Imitation Learning

时间自我模仿学习

Yinsen Jia, Boyuan Chen

发表机构 * Duke University(杜克大学)

专题命中 机器人学习 :时间自我模仿学习提升长时域机器人操作效率。

AI总结 提出时间自我模仿学习框架,通过挖掘高效成功轨迹并转化为可重用监督信号,提升长时域机器人操作任务的学习效率与鲁棒性。

详情
AI中文摘要

基于奖励塑形训练的长时域机器人操作策略仍可能通过低效交互利用密集奖励,而训练过程中稀有高效行为可能被遗忘。我们认为时间效率本身为强化学习提供了强大且未充分利用的自我监督源。我们引入时间自我模仿学习(TSIL),一种强化学习框架,挖掘学习过程中产生的时间高效成功轨迹,并将其转化为可重用的监督信号以改进未来策略。TSIL通过从快速成功轨迹中提取配置条件自适应时间目标逐步优化学习,并通过效率加权自我模仿学习保留和重放高效行为。在15个不同的长时域操作任务中,TSIL持续提升了学习效率、任务完成效率、快速成功行为的重访率以及对不稳定训练条件的鲁棒性。更广泛地,我们的结果表明,成功行为的时间结构本身为强化学习提供了超越人工奖励塑形的可扩展自我监督信号。

英文摘要

Long-horizon robot manipulation policies trained with reward shaping can still exploit dense rewards through inefficient interaction, while rare efficient behaviors may be forgotten during training. We argue that temporal efficiency itself provides a powerful and underutilized source of self-supervision for reinforcement learning. We introduce Temporal Self-Imitation Learning (TSIL), a reinforcement learning framework that mines temporally efficient successful trajectories generated during learning and converts them into reusable supervision for future policy improvement. TSIL progressively refines learning using configuration-conditioned adaptive temporal targets derived from fast successful trajectories, while preserving and replaying efficient behaviors through efficiency-weighted self-imitation learning. Across 15 distinct long-horizon manipulation tasks, TSIL consistently improves learning efficiency, task-completion efficiency, revisitation of fast successful behaviors, and robustness to unstable training conditions. More broadly, our results suggest that the temporal structure of successful behavior itself provides a scalable self-supervisory signal for reinforcement learning beyond manually engineered reward shaping alone.

2606.19633 2026-06-19 cs.RO cs.AI 新提交 85%

CTS-MoE: Implicit Terrain Adaptation via Mixture-of-Experts for Perceptive Locomotion

CTS-MoE: 基于混合专家模型的隐式地形适应感知运动

Francisco Affonso, Matheus P. Angarola, Ana Luiza Mineiro, Aditya Potnis, Marcelo Becker, Girish Chowdhary

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of São Paulo(圣保罗大学)

专题命中 机器人学习 :提出CTS-MoE用于感知运动,隐式地形适应。

AI总结 针对非连续地形上的感知运动问题,提出CTS-MoE方法,通过密集混合专家策略与感知门控组合共享行为,并用多批评家防止价值干扰,实现端到端训练和隐式地形适应,在仿真和硬件上优于基线。

详情
AI中文摘要

在不连续地形(如楼梯、间隙和障碍物)上的感知腿式运动需要自适应行为,因为单一的保守步态无法产生应对突然拓扑变化所需的预期动作。将该问题视为多任务强化学习,会在共享与分离之间引入张力。任务使用共同的运动基础但具有冲突的奖励,因此策略必须共享行为同时避免价值干扰。先前的工作只解决了其中一方面:整体策略牺牲了专业化,而分层子策略牺牲了跨过渡和未知地形的泛化能力。我们提出CTS-MoE,它结合了密集混合专家执行器与基于感知的门控来组合共享行为,以及具有任务特定价值头的多批评家来防止干扰。该模型在单阶段并发教师-学生设置中进行端到端训练,处理部分可观测性并避免顺序蒸馏,任务标签仅在训练期间使用。部署时,路由仅依赖于感知,从而无需高层选择器或地形分类器即可实现地形适应。在仿真和硬件上对Unitree Go1进行的实验(涵盖已知和未知地形)显示了任务感知的专业化,与整体基线相比,跟踪误差更低,成功率更高。项目网站:此https URL。

英文摘要

Perceptive legged locomotion over discontinuous terrain (e.g., stairs, gaps, and obstacles) requires adaptive behavior, as a single conservative gait cannot produce the anticipatory maneuvers needed for abrupt topology changes. Cast as multi-task reinforcement learning, this problem introduces a tension between sharing and separation. Tasks use a common locomotion base but have conflicting rewards, so a policy must share behavior while avoiding value interference. Prior work addresses only one side, with monolithic policies sacrificing specialization and hierarchical sub-policies sacrificing generalization across transitions and unseen terrain. We propose CTS-MoE, which combines a dense mixture-of-experts actor with perception-based gating to compose shared behaviors and a multi-critic with task-specific value heads to prevent interference. The model is trained end-to-end in a single-stage concurrent teacher-student setup that handles partial observability and avoids sequential distillation, with task labels used only during training. At deployment, routing depends solely on perception, allowing terrain adaptation without a high-level selector or terrain classifier. Experiments on a Unitree Go1 in simulation and on hardware across seen and unseen terrains show task-aware specialization, with lower tracking error and higher success rates than monolithic baselines. Project Website: https://cts-moe.github.io/ .

2606.19598 2026-06-19 cs.RO 新提交 85%

Fail-RAG : A Retrieval Augmented Generation Informed Framework for Robot Failure Identification

Fail-RAG:一种基于检索增强生成的机器人故障识别框架

Ameya Salvi, Jie Hu

发表机构 * Hitachi America, Ltd.(日立美国有限公司)

专题命中 机器人学习 :针对仓库机器人操作故障检测,属于机器人学习

AI总结 提出Fail-RAG框架,利用检索增强生成和视觉语言模型,通过嵌入故障图像和上下文信息并查询数据库,实现机器人操作故障的高效检测,在仓库自动化任务中平均检测准确率提升25个百分点。

详情
AI中文摘要

工业自动化正经历由技术突破和社会变革驱动的机器人演进:向通用机器人、具身和物理人工智能发展,以及劳动力短缺的加剧。智能自主机器人不仅需要按计划运动,还需对意外事件做出反应。本研究聚焦于仓库中物料搬运机器人的意外事件,将其定义为故障,并开发检测机器人操作故障的方法。由于环境和任务的动态性,故障形式可能变化,基于规则的检测方法可能失效。我们提出'Fail-RAG',一种基于检索增强生成(RAG)的故障检测框架,其中故障图像和上下文信息被嵌入,并通过计算相似度查询故障数据库。进一步使用视觉语言模型(VLM)按照指令模板分析故障并提供细节。通过使用固定机械臂和移动操作器在仓库自动化常见任务中进行仿真和物理实验,评估了Fail-RAG的性能。与使用现成VLM相比,Fail-RAG在五种机器人操作类型上的平均故障检测准确率提高了25个百分点,表明其在真实世界故障检测中的有效性。

英文摘要

Industry automation is witnessing an evolution in robotics driven by both technological breakthroughs and societal changes: progress towards generalist robots, embodied and physical artificial intelligence (AI), and increasing labor shortage in manufacturing.An intelligent autonomous robot needs to not only act according to planned motions but also react to any unexpected events. In this study, we focus on such unexpected events in warehouses where robots are used for material handling. Specifically, we refer to any unexpected events as failures and develop methods to detect robot operations related failures. Rule-based detection methods may break since the form of failures could change due to the dynamic nature of both environments and tasks. We propose 'Fail-RAG', a Retrieval Augmented Generation (RAG)-based failure detection framework where failure images and context information are embedded and queried against a failure database by calculating their similarities. Vision-Language Models (VLMs) are further used to analyze failures and provide details by following our instruction template. We evaluated the performance of Fail-RAG by conducting both simulation and physical experiments using fixed robot arms and a mobile manipulator for multiple tasks that are common in warehouse automation. Fail-RAG achieved 25 percentage point higher failure detection accuracy on average across five types of robot operations compared to using off-the-shelf VLMs, indicating its effectiveness for real-world failure detection.

2606.19531 2026-06-19 cs.CV cs.RO 新提交 85%

ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?

ImageWAM:世界动作模型真的需要视频生成,还是只需要图像编辑?

Yuyang Zhang, Wenyao Zhang, Zekun Qi, He Zhang, Haitao Lin, Jingbo Zhang, Yao Mu, Xiaokang Yang, Wenjun Zeng, Xin Jin

发表机构 * Shanghai Jiao Tong University(上海交通大学) Eastern Institute of Technology(东方理工学院) Tencent Robotics X(腾讯机器人X) Tsinghua University(清华大学) Zhongguancun Academy(中关村学院)

专题命中 机器人学习 :用图像编辑模型进行机器人动作预测

AI总结 提出ImageWAM框架,利用预训练图像编辑模型替代视频生成进行机器人动作预测,通过编辑去噪的KV缓存作为世界动作上下文,在多个模拟和真实实验中优于基线,计算量降至1/6,延迟降至1/4。

Comments Project Page: https://zhangwenyao1.github.io/ImageWAM/

详情
AI中文摘要

世界动作模型(WAMs)通常依赖视频生成来桥接视觉世界建模和机器人控制。然而,基于视频的WAMs面临三个耦合的限制:密集的多帧未来令牌使得推理成本高昂,完整的视频预测将容量花费在与动作无关的时间和外观细节上,以及长期未来想象可能引入误导动作预测的错误。这些问题提出了一个简单的问题:世界动作模型真的需要视频生成吗?我们提出ImageWAM,一个简单的WAM框架,将预训练的图像编辑模型重新用于机器人动作预测。与视频生成相比,图像编辑提供了更匹配的先验:它只需要建模目标帧变换,关注与动作相关的当前到目标视觉差异,并通过编辑预训练将任务指令接地到局部视觉变化。在实践中,ImageWAM在推理时不解码目标帧;相反,它根据图像编辑去噪产生的KV缓存条件化一个流匹配动作专家,将其用作紧凑的世界动作上下文。ImageWAM在多个模拟和真实世界实验中优于标准VLA基线和匹配的竞争性WAM,且无需额外的策略预训练。它还将FLOPs降低到基于视频的WAMs的1/6,延迟降低到1/4。注意力分析进一步表明,编辑缓存聚焦于任务相关的变化区域,支持图像编辑作为基于视频的世界动作建模的有效替代方案。

英文摘要

World Action Models (WAMs) commonly rely on video generation to bridge visual world modeling and robot control. However, video-based WAMs face three coupled limitations: dense multi-frame future tokens make inference costly, full video prediction spends capacity on action-irrelevant temporal and appearance details, and long-horizon future imagination may introduce errors that mislead action prediction. These issues raise a simple question: Does world action model really need video generation? We propose ImageWAM, a simple WAM framework that repurposes pretrained image editing models for robot action prediction. In contrast to video generation, image editing provides a better-matched prior: it only needs to model a target-frame transformation, focuses on action-relevant current-to-target visual differences, and grounds task instructions to localized visual changes through edit pretraining. In practice, ImageWAM does not decode the target frame at inference time; instead, it conditions a flow-matching action expert on the KV caches produced by image-editing denoising, using them as a compact world-action context. ImageWAM outperforms standard VLA baselines and matching competitive WAMs without additional policy pretraining across different simulator and real-world experiments. It also reduces FLOPs to 1/6 and latency to 1/4 of video-based WAMs. Attention analysis further shows that editing caches focus on task-relevant change regions, supporting image editing as an effective alternative to video-based world-action modeling.

2510.05013 2026-06-19 stat.ML cs.LG 85%

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

通过自我探索的机器人好奇心驱动行为与语言发展

Theodore Jerome Tinker, Kenji Doya, Jun Tani

发表机构 * Okinawa Institute of Science and Technology(冲绳科学技术大学院大学)

专题命中 机器人学习 :好奇心驱动的机器人动作与语言学习

AI总结 本研究通过好奇心驱动的机器人自我探索,结合Q学习实现主动推理,揭示了组合泛化、快速学习、先配对后组合以及异常处理导致的U型发展模式,为人类高效语言习得提供解释。

Comments 27 pages, 22 pages of supplementary material

详情
AI中文摘要

婴儿通过极少的经验就能泛化习得语言,而大型语言模型需要数十亿的训练标记。人类高效发展的基础是什么?我们通过实验研究了这一问题,其中机器人代理通过好奇心驱动的自我探索学习执行与祈使句(例如,推红色立方体)相关的动作。我们的方法使用Q学习摊销主动推理,实现内在动机的发展性学习。模拟揭示了与发展心理学观察相对应的关键发现。i) 随着组合元素规模的增加,泛化能力显著提高。ii) 好奇心驱动的探索能够加速学习。iii) 句子和动作的机械配对先于组合泛化。iv) 异常处理导致U型发展表现,这种模式类似于儿童语言学习中的表征重述。这些结果表明,好奇心驱动的主动推理解释了内在动机的感觉运动-语言学习如何支持人类和人工代理中的可扩展组合泛化和异常处理。

英文摘要

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

2606.20549 2026-06-19 cs.RO 新提交 80%

Generating Robot Hands from Human Demonstrations

从人类演示生成机器人手

Sha Yi, Nicklas Hansen, Xueqian Bai, Carmelo Sferrazza, Michael T. Tolley, Xiaolong Wang

发表机构 * University of California San Diego(加州大学圣迭戈分校) Amazon Frontier AI & Robotics(亚马逊前沿人工智能与机器人)

专题命中 机器人学习 :从人类演示生成机器人手的设计框架。

AI总结 提出数据驱动框架,利用人类日常操作中超过400万帧指尖运动数据,通过逆运动学匹配指尖位置,优化树状结构机器人手的设计,生成通用6自由度手和低自由度任务专用手,并训练强化学习智能体加速设计搜索。

详情
AI中文摘要

机器人学习在控制学习方面取得了快速进展,但学习机器人的物理身体仍然困难得多,因为同时搜索设计和控制会产生一个非常大的组合问题。在这里,我们提出了一个数据驱动的框架,用于从人类演示生成机器人手。我们不是为每个候选设计学习一个复杂的控制器,而是使用制造后使用的相同简单控制策略来生成机器人手设计:通过逆运动学匹配指尖位置。利用来自日常操作的超过400万帧人类指尖运动数据,我们的算法优化树状结构机器人手以再现所需的目标运动。该框架产生了一个6自由度(DoF)通用手和具有空间四杆仿生关节的低自由度任务专用手。为了加速设计搜索,我们训练了一个强化学习(RL)智能体来提出好的手设计和关节角度,将搜索时间从数小时减少到数分钟。我们直接将机制制作为具有打印就绪关节的一体式铰接结构。在真实世界实验中,6自由度手实现了高度精确的遥操作指尖跟踪,优于现有的商用机器人手,而专门的3自由度手以降低的机械复杂性再现了结构化的人类和合成轨迹。这些结果表明,大规模人类运动数据不仅可以用于训练机器人控制器,还可以作为优化和生成机器人物理实体的参考。

英文摘要

Robot learning has advanced rapidly in learning control, but learning the physical body of a robot remains much more difficult because jointly searching over design and control creates a very large combinatorial problem. Here, we present a data-driven framework for generating robot hands from human demonstrations. Instead of learning a complex controller together with each candidate design, we generate robot hand designs using the same simple control policy used after fabrication: matching fingertip positions through inverse kinematics. Using more than 4 million frames of human fingertip motion from everyday manipulation, our algorithm optimizes tree-structured robot hands to reproduce desired target motions. The framework produced both a 6-degree-of-freedom (DoF) general-purpose hand and lower-DoF task-specific hands with spatial four-bar mimic joints. To accelerate the search over designs, we trained a reinforcement-learning (RL) actor to propose good hand designs and joint angles, reducing search time from hours to minutes. We fabricated the mechanisms directly as one-piece articulated structures with print-in-place joints. In real-world experiments, the 6-DoF hand achieved highly accurate teleoperated fingertip tracking better than available commercial robot hands, whereas the specialized 3-DoF hands reproduced structured human and synthetic trajectories with reduced mechanical complexity. These results showed that large-scale human motion data can be used not only to train robot controllers but also as a reference for optimizing and generating the physical embodiment of robots.

2606.20428 2026-06-19 cs.RO 新提交 80%

ARC: Adaptive Robust Joint State and Covariance Estimation

ARC:自适应鲁棒联合状态与协方差估计

Alexandre Hadji-Thomas, Andrew Stirling, James R. Forbes

专题命中 机器人学习 :鲁棒状态估计,用于机器人传感器数据处理

AI总结 提出统一块坐标下降框架,结合自适应鲁棒损失、迭代重加权最小二乘状态更新和最小加权协方差行列式估计器,实现离群值下状态与协方差的自适应联合估计。

Comments Submitted to information IEEE Robotics and Automation Letters (RA-L), June 2026. 8 pages, 7 figures, 1 table

详情
AI中文摘要

传感器测量经常受到离群值和非高斯噪声的污染。这些传感器数据中的缺陷会导致经典状态估计器产生有偏且不可靠的状态和不确定性估计。鲁棒估计器拒绝或降低离群值的权重,但不进行测量协方差估计,而联合状态和协方差估计器假设高斯残差和固定的损失形状参数。将这两种能力整合到一个框架中,可以在存在离群值的情况下同时估计状态和协方差。本文提出了一种统一的块坐标下降框架,该框架结合了范数感知自适应鲁棒损失、迭代重加权最小二乘状态更新和最小加权协方差行列式协方差估计器,产生了一个自调谐的联合状态和协方差估计器。该框架在蒙特卡洛模拟和真实世界超宽带定位实验(在杂乱的视距外环境中)中进行了评估。结果表明,所提出的估计器能够一致地恢复真实的内点测量协方差,并在状态估计精度上达到或超过所有基线方法,且无需任何手动参数调整。

英文摘要

Sensor measurements are frequently corrupted by outliers and non-Gaussian noise. These imperfections in the sensor data can cause classical state estimators to generate biased and unreliable state and uncertainty estimates. Robust estimators reject or downweight outliers but do not perform measurement covariance estimation, whereas joint state and covariance estimators assume Gaussian residuals and fixed loss shape parameters. Integrating these two capabilities into a single framework is an opportunity to simultaneously estimate both state and covariance in the presence of outliers. This paper proposes a unified Block-Coordinate Descent framework that combines a norm-aware adaptive robust loss, an Iteratively Reweighted Least-Squares state update, and a Minimum Weighted Covariance Determinant covariance estimator, yielding a self-tuning joint state and covariance estimator. The framework is evaluated in a Monte-Carlo simulation and on real-world ultra-wideband localization experiments in cluttered non-line-of-sight environments. Results show that the proposed estimator consistently recovers the true inlier measurement covariance and matches or exceeds the state estimation accuracy of all baselines, without requiring any manual parameter tuning.

2606.20197 2026-06-19 cs.RO 新提交 80%

Stable Transformer-Actor-Critic Model Predictive Control: A Contraction Analysis Approach

稳定的Transformer-Actor-Critic模型预测控制:一种收缩分析方法

Antonio Marino, Valerio Modugno, Marco Cognetti

发表机构 * University of Cambridge(剑桥大学) University College London(伦敦大学学院) LAAS-CNRS(Laas--cnrs)

专题命中 机器人学习 :Transformer-Actor-Critic MPC用于无人机控制。

AI总结 提出一种Transformer-Actor-Critic MPC架构,通过证明Transformer满足增量输入-状态稳定性并利用黎曼收缩理论分析互联动力学,将理论界作为训练正则化项,实现可证明鲁棒的控制策略。

详情
AI中文摘要

Actor-Critic模型预测控制(MPC)有效解决了复杂的非凸控制问题,但保证这些流程中基于序列的学习模型的闭环稳定性仍然具有挑战性。本文介绍了一种新颖的Transformer-Actor-Critic MPC架构,具有形式化的鲁棒性保证。首先,我们证明了Transformer网络可以满足全局增量输入-状态稳定性($\delta$ISS)。然后,我们利用黎曼收缩理论分析物理对象与预测神经网络之间的互联动力学。最后,我们将这些理论界作为训练正则化项,以产生可证明鲁棒的策略。该框架在非线性3D无人机模型上进行了验证,执行目标到达和避障机动。

英文摘要

Actor-Critic Model Predictive Control (MPC) effectively addresses complex, non-convex control problems, but guaranteeing the closed-loop stability of sequence-based learning models within these pipelines remains challenging. This paper introduces a novel Transformer-Actor-Critic MPC architecture with formal robustness guarantees. First, we prove that Transformer networks can satisfy global incremental Input-to-State Stability ($δ$ISS). We then leverage Riemannian contraction theory to analyze the interconnected dynamics between the physical plant and the predictive neural network. Finally, we integrate these theoretical bounds as a training regularizer to yield a certifiably robust policy. The framework is validated on a nonlinear 3D drone model executing target-reaching and obstacle-avoidance maneuvers.

2606.20031 2026-06-19 cs.RO cs.AI 新提交 80%

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

一种用于机器人移动履行系统高效路径规划的神经形态强化学习框架

Junzhe Xu, Zecui Zeng, Lusong Li, Yuetong Fang, Renjing Xu

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) JD Explore Academy(京东探索研究院)

专题命中 机器人学习 :神经形态强化学习用于机器人路径规划。

AI总结 提出SDQN-RMFS框架,通过ANN到SNN的转换和硬标签知识蒸馏,在神经形态芯片上实现超低功耗路径规划,相比GPU能耗降低11281倍,延迟减少近一半。

详情
AI中文摘要

动态环境变化、受限工作空间和严格的实时约束使得机器人移动履行系统(RMFS)中的路径规划对传统的搜索和基于规则的方法来说是一个具有挑战性的问题,这些方法通常遭受高计算复杂性和长决策延迟。虽然强化学习(RL)已成为一种强大的替代方案,但在资源受限的硬件上以极端的能源效率部署学习到的策略仍然是一个开放的挑战。我们提出了SDQN-RMFS,一个端到端的框架,实现了从全精度人工神经网络(ANN)训练的RL策略到神经形态芯片的高保真部署。通过仅在稀疏事件触发时进行计算,该框架实现了超低功耗的RMFS路径规划。我们的全栈流水线操作如下:首先通过碰撞允许策略高效训练ANN策略以密集化信息轨迹,然后通过硬标签知识蒸馏方法将其转换为脉冲神经网络(SNN)。这有效地解决了输出分布不匹配问题,在保持策略能力的同时显著降低了推理延迟。硬件实验表明,与高性能GPU基线相比,能耗节省高达11281倍,延迟几乎减少两倍,同时决策质量与原始训练策略相当。这些结果确立了物理神经形态推理作为大规模RMFS运营的实用且能源可持续的途径。

英文摘要

Dynamic environmental changes, confined workspaces, and stringent real-time constraints make pathfinding in Robotic Mobile Fulfillment Systems (RMFS) a challenging problem for conventional search- and rule-based methods, which typically suffer from high computational complexity and long decision latency. While reinforcement learning (RL) has emerged as a powerful alternative, deploying learned policies with extreme energy efficiency on resource-constrained hardware remains an open challenge. We present SDQN-RMFS, an end-to-end framework that achieves high-fidelity deployment of an RL-trained policy from a full-precision artificial neural network (ANN) through to a neuromorphic chip. By computing only when triggered by sparse events, this framework unlocks ultra-low-power RMFS pathfinding. Our full-stack pipeline operates as follows: an ANN policy is first efficiently trained via a collision-allowing strategy to densify informative trajectories, and then converted into a spiking neural network (SNN) via a hard-label knowledge distillation approach. This effectively addresses the output distribution mismatch, preserving policy capability across the ANN-to-SNN pipeline while substantially reducing inference latency. Hardware experiments demonstrate up to 11,281$\times$ energy savings and a nearly two-fold reduction in latency compared to a high-performance GPU baseline, while maintaining decision quality on par with the original trained policy. These results establish physical neuromorphic inference as a practical and energy-sustainable pathway for large-scale RMFS operations.

2606.19935 2026-06-19 cs.AI 新提交 80%

PhysDrift: Bridging the Embodiment Gap in Humanoid Co-Speech Motion Generation

PhysDrift: 弥合人形机器人共语动作生成中的具身差距

Zhangzhao Liang, Xiaofen Xing, Mingyue Yang, Wenlve Zhou, Xiangmin Xu

发表机构 * South China University of Technology(华南理工大学) DexForce Technology(DexForce科技公司) Foshan University(佛山大学)

专题命中 机器人学习 :提出人形机器人共语动作生成框架PhysDrift

AI总结 针对人形机器人共语动作生成中人体运动流形与机器人具身约束不匹配的问题,提出IK-EER框架和PhysDrift模型,直接预测可执行关节轨迹,提升运动对齐、物理合理性和实时交互能力。

详情
AI中文摘要

人形机器人需要共语动作,这些动作不仅要富有表现力且与语音对齐,还要在具身约束下物理可执行。现有的共语动作生成流程主要是以人为中心的:首先以人体表示(如SMPL-X)生成动作,随后重定向到人形机器人。在这项工作中,我们识别出这种范式中的基本具身差距,即人体运动流形与人形机器人具身约束之间的不匹配在运动转移和物理执行过程中破坏了具身一致性。通过广泛分析,我们表明尽管重定向可以保留粗粒度的运动语义,但它显著压缩了运动多样性并削弱了韵律-动作同步,限制了富有表现力的人形机器人行为。为解决此问题,我们首先提出IK-EER,一种保留韵律的人形机器人运动策展框架,在重定向过程中联合优化运动学可行性和语音-运动时间对齐。基于策展的机器人原生运动数据集,我们进一步引入PhysDrift,一种具身感知的共语动作生成框架,直接预测可执行的人形机器人关节轨迹,无需依赖中间人体表示。与传统的以人为中心的流程不同,PhysDrift在训练和推理过程中都保持具身一致性,同时加入物理正则化以稳定机器人运动动态。大量实验和真实世界人形机器人部署表明,具身感知的机器人原生生成显著改善了语音-运动对齐、物理合理性、运动平滑性、推理效率和实时交互能力。

英文摘要

Humanoid robots require co-speech motions that are not only expressive and speech-aligned, but also physically executable under embodiment constraints. Existing co-speech generation pipelines are predominantly human-centric: motions are first generated in human-body representations such as SMPL-X and subsequently retargeted to humanoid robots. In this work, we identify a fundamental embodiment gap in this paradigm, where the mismatch between human motion manifolds and humanoid embodiment constraints disrupts embodiment consistency during motion transfer and physical execution. Through extensive analysis, we show that although retargeting can preserve coarse motion semantics, it significantly compresses motion diversity and weakens prosody-motion synchronization, limiting expressive humanoid behaviors. To address this problem, we first propose IK-EER, a prosody-preserving humanoid motion curation framework that jointly optimizes kinematic feasibility and speech-motion temporal alignment during retargeting. Building upon the curated robot-native motion dataset, we further introduce PhysDrift, an embodiment-aware co-speech motion generation framework that directly predicts executable humanoid joint trajectories from speech without relying on intermediate human-body representations. Unlike conventional human-centric pipelines, PhysDrift maintains embodiment consistency throughout both training and inference while incorporating physical regularization to stabilize robot motion dynamics. Extensive experiments and real-world humanoid deployment demonstrate that embodiment-aware robot-native generation substantially improves speech-motion alignment, physical plausibility, motion smoothness, inference efficiency, and real-time interaction capability.

2606.19914 2026-06-19 cs.RO cs.AI 新提交 80%

Co-policy: Responsive Human-Robot Co-Creation for Musical Performances

Co-policy: 响应式人机音乐共创框架

Xuetao Li, Wenke Huang, Mang Ye, Zijian Liu, Jinhua Xie, Jifeng Xuan, Miao Li

发表机构 * School of Computer Science, Wuhan University(武汉大学计算机学院) College of Computing and Data Science, Nanyang Technological University(南洋理工大学计算与数据科学学院) School of Automation, Wuhan University of Technology(武汉理工大学自动化学院) School of Geodesy and Geomatics, Wuhan University(武汉大学测绘学院) School of Robotics, Wuhan University(武汉大学机器人学院)

专题命中 机器人学习 :提出人机音乐共创框架Co-policy

AI总结 提出Co-policy框架,通过语义锚定、约束变分和视觉运动策略实现人机音乐实时共创,在真实钟琴实验中优于扩散策略基线。

详情
AI中文摘要

艺术长期以来一直是人类创造力的关键表达。具身人工智能为生成模型通过物理动作而非无形数字内容参与创造力提供了一条途径。在机器人音乐共创中,将语义音乐理解与实时且可物理执行的表演连接起来具有挑战性。我们提出了Co-policy,一个人机音乐共创框架,它分离了语义意图接地、约束音乐变分和视觉运动执行。为了接地音乐语义,Co-policy使用预推理语义锚点和微调的Qwen-vl规划器(F-Qwen)将语音、实时音乐种子和视觉观察转化为结构化的共创计划。为了支持低延迟执行,Co-policy引入了高斯混合视觉运动策略(GMP),实现为条件混合密度策略,在单次前向传递中将目标音符和视觉上下文映射到多模态机器人动作。与仅复现用户指定音符的机器人回放系统不同,Co-policy在音乐和物理约束下生成互补的音乐响应。真实机器人钟琴实验、消融研究和专家评估显示,与扩散策略和消融基线相比,意图对齐、执行准确性和响应频率均有提升,支持物理接地动作生成作为具身人机共创的关键要求。

英文摘要

Art has long stood as a pivotal expression of human creativity. Embodied artificial intelligence offers a route for generative models to participate in that creativity through physical action rather than disembodied digital content. In robotic music co-creation, it is challenging to connect semantic musical understanding with real-time and physically executable performance. We present Co-policy, a framework for human-robot musical co-creation that separates semantic intent grounding, constrained musical variation, and visuomotor execution. To ground musical semantics, Co-policy uses pre-inference semantic anchors and a fine-tuned Qwen-vl planner (F-Qwen) to transform speech, live musical seeds, and visual observations into structured co-creation plans. To support low-latency execution, Co-policy introduces a Gaussian-Mixture Visuomotor Policy (GMP), implemented as a conditional mixture-density policy that maps target notes and visual context to multimodal robot actions in a single forward pass. Unlike robotic playback systems that merely reproduce user-specified notes, Co-policy generates complementary musical responses under both musical and physical constraints. Real-robot chime experiments, ablations, and expert evaluation show improved intent alignment, execution accuracy, and response frequency over diffusion-policy and ablated baselines, supporting physically grounded action generation as a key requirement for embodied human-AI co-creation.

2606.19711 2026-06-19 cs.RO cs.LG cs.SY eess.SY 新提交 80%

A Differentiable Composite Approximation Framework for Autonomous Underwater Vehicle Maneuvering Modeling from Sea-Trial Data

一种可微复合近似框架:基于海试数据的自主水下航行器机动建模

Aobo Wang, Aifei Xia, Zihao Wang, Lizhu Hao

发表机构 * College of Shipbuilding Engineering, Harbin Engineering University(哈尔滨工程大学船舶工程学院) China Academy of Aerospace Aerodynamics(中国航天空气动力技术研究院) Institute of Artificial Intelligence, Shanghai University(上海大学人工智能研究院) China Ship Scientific Research Center(中国船舶科学研究中心)

专题命中 机器人学习 :提出AUV机动建模的可微复合近似框架。

AI总结 提出可微复合近似框架,结合多项式基与数据自适应基联合校准,并引入转向运动电流估计补偿,提升AUV机动预测精度。

详情
AI中文摘要

基于机载测量的场建模可以生成反映真实运行特性的自主水下航行器(AUV)机动模型。从近似角度看,传统机动模型使用预定义的约束多项式基,而数据驱动模型使用数据自适应基。受此基函数视角启发,本文提出一种可微复合近似公式,其中多项式基分量和数据自适应基分量被视为单个预测器的可微部分并联合校准。开发了一种基于梯度的协同校准方法用于全尺寸AUV机动预测,其中灵敏度感知机制调节有界多项式更新,而神经残差在共享预测目标下捕获剩余非线性差异。为了考虑现场数据中的海流效应,引入了一种基于转向运动的电流估计和补偿程序,以构建电流补偿的学习目标用于训练和滚动预测。该框架使用从7米长AUV在多种机动条件下收集的海试数据进行评估。结果表明,与纯多项式、纯神经网络和冻结先验混合基线相比,所提方法改进了递归轨迹和速度预测,证明了其在基于现场数据的AUV机动建模中的适用性。

英文摘要

Field-based modeling from onboard measurements can produce autonomous underwater vehicle (AUV) maneuvering models that reflect real operating characteristics. From an approximation perspective, conventional maneuvering models use predefined constraint polynomial bases, whereas data-driven models use data-adaptive bases. Motivated by this basis-function view, this paper presents a differentiable composite-approximation formulation, in which the polynomial-basis component and the data-adaptive basis component are treated as differentiable parts of a single predictor and calibrated jointly. A gradient-based co-calibration method is developed for full-scale AUV maneuvering prediction, where a sensitivity-aware mechanism regulates bounded polynomial updates while the neural residual captures remaining nonlinear discrepancies under a shared prediction objective. To account for ocean-current effects in field data, a turning-motion-based current estimation and compensation procedure is incorporated to construct current-compensated learning targets for training and rollout. The framework is evaluated using sea-trial data collected from a 7-meter AUV under multiple maneuvering conditions. Results show that the proposed method improves recursive trajectory and velocity prediction compared with polynomial-only, neural-only, and frozen-prior hybrid baselines, demonstrating its applicability to field-data-based AUV maneuvering modeling.