Robotics / Embodied AI

2606.02800 2026-06-18 cs.CV cs.AI cs.LG cs.MM cs.RO version update, topic 90

Cosmos 3: Omnimodal World Models for Physical AI

NVIDIA: Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski

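Affiliations: NVIDIA

Topic match: robot foundation models. Provides a general-purpose backbone network for embodied agents.

AI summary: Proposes Cosmos 3, an omnimodal world model built on a unified mixture-of-transformers architecture that jointly handles language, image, video, audio, and action sequences, sets a new state of the art on understanding and generation tasks, and offers a scalable general-purpose backbone for embodied agents.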

Abstract

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

2605.05925 2026-06-18 cs.RO version update, topic 90

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

Hyesung Lee, Hyunwoo Jung, Si-Hwan Heo, Sungwook Yang

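Affiliations: Korea Institute of Science and Technology; KAIST; Hanyang University

Topic match: robot manipulation. Proposes the DexSynRefine framework for dexterous robot manipulation.

AI summary: Proposes the DexSynRefine framework, which synthesizes hand-object trajectories with the HOI-MMFP motion prior and combines task-space residual reinforcement learning with contact-dynamics adaptation to turn human-object interaction data into physically feasible dexterous manipulation, raising real-world success rates over kinematic retargeting by 50-70 percentage points across five tasks.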

Comments Project page: https://dexsynrefine.github.io/

Abstract

Learning dexterous manipulation from human-object interaction (HOI) data offers a scalable alternative to robot teleoperation, but HOI demonstrations are typically sparse and purely kinematic, making direct retargeting unreliable under embodiment mismatch and contact-rich dynamics. We present DexSynRefine, a coupled framework that treats HOI data as structured motion priors rather than executable robot actions. DexSynRefine first synthesizes hand-object trajectories conditioned on the task and initial object state using HOI Motion Manifold Flow Primitives (HOI-MMFP), a motion prior for coupled hand-object motion. It then physically grounds them with task-space residual reinforcement learning and adapts execution by inferring missing contact-dynamics context from proprioceptive history. Across five dexterous manipulation tasks, each stage addresses a complementary bottleneck: HOI-MMFP improves trajectory consistency and smoothness, task-space residuals provide the strongest grounding representation among the tested alternatives, and contact-dynamics adaptation enables robust real-world execution. Together, DexSynRefine improves real-world success rates over kinematic retargeting by 50-70 percentage points.
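
Illustration (not from the paper): the task-space residual idea can be sketched as a reference pose from the motion prior plus a small bounded correction from a learned policy. The function name `compose_action`, the 3-D position representation, and the clipping bound `max_residual` are all assumptions made for this sketch; the abstract does not specify the action parameterization.

```python
from typing import List, Sequence


def compose_action(reference: Sequence[float],
                   residual: Sequence[float],
                   max_residual: float = 0.02) -> List[float]:
    """Add a clipped residual correction to a task-space reference pose.

    reference: pose proposed by the motion prior (hypothetical units: metres).
    residual:  correction proposed by a learned policy.
    """
    if len(reference) != len(residual):
        raise ValueError("reference and residual must have the same dimension")
    # Clip each residual component so the policy can only nudge the prior.
    clipped = [max(-max_residual, min(max_residual, r)) for r in residual]
    return [ref + c for ref, c in zip(reference, clipped)]


# The 0.05 correction on the first axis is clipped to 0.02.
target = compose_action([0.30, 0.10, 0.25], [0.05, -0.01, 0.0])
```

Bounding the residual keeps the executed motion close to the synthesized trajectory, which is one plausible reason to learn corrections rather than full actions.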

2601.20381 2026-06-18 cs.RO version update, topic 85

STORM: Slot-based Task-aware Object-centric Representation for robotic Manipulation

Alexandre Chapin, Emmanuel Dellandréa, Liming Chen

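Affiliations: Ecole Centrale de Lyon, LIRIS

Topic match: robot manipulation. Proposes the STORM module for representation learning in robot manipulation.

AI summary: Proposes the STORM module, which uses a multi-phase training strategy to combine a frozen visual foundation model with semantic-aware slots, producing object-centric, task-aware representations that improve generalization to visual distractors and control performance in robot manipulation.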

Abstract

Visual foundation models provide strong perceptual features for robotics, but their dense representations lack explicit object-level structure, limiting robustness and contractility in manipulation tasks. We propose STORM (Slot-based Task-aware Object-centric Representation for robotic Manipulation), a lightweight object-centric adaptation module that augments frozen visual foundation models with a small set of semantic-aware slots for robotic manipulation. Rather than retraining large backbones, STORM employs a multi-phase training strategy: object-centric slots are first stabilized through visual-semantic pretraining using language embeddings, then jointly adapted with a downstream manipulation policy. This staged learning prevents degenerate slot formation and preserves semantic consistency while aligning perception with task objectives. Experiments on object discovery benchmarks and simulated manipulation tasks show that STORM improves generalization to visual distractors and control performance compared to directly using frozen foundation model features or training object-centric representations end-to-end. Our results highlight multi-phase adaptation as an efficient mechanism for transforming generic foundation model features into task-aware object-centric representations for robotic control.

2510.18085 2026-06-18 cs.RO cs.AI cs.MA version update, topic 90

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

Connor Mattson, Varun Raveendra, Ellen Novoseller, Nicholas Waytowich, Vernon J. Lawhern, Daniel S. Brown

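Affiliations: Kahlert School of Computing, University of Utah; DEVCOM Army Research Laboratory

Topic match: robot learning. Multi-robot imitation learning, with robot learning at its core.

AI summary: Proposes R2BC, which trains multi-robot systems through round-robin single-agent demonstrations without requiring demonstrations in the joint action space. It matches or surpasses an oracle baseline trained on privileged synchronized demonstrations across four simulated tasks and is deployed on two physical robot tasks.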

Comments 8 pages, 6 figures. In Proceedings: IEEE International Conference on Robotics & Automation (ICRA 2026)

Abstract

Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
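
Illustration (not from the paper): the round-robin aspect can be sketched as a schedule that assigns each demonstration round to exactly one agent in cyclic order. The function name `round_robin_demos` and the agent identifiers are hypothetical, and the sketch deliberately leaves open what the non-teleoperated agents do during a round, since the abstract does not say.

```python
from typing import Dict, List


def round_robin_demos(agent_ids: List[str], num_rounds: int) -> Dict[str, List[int]]:
    """Assign each demonstration round to one agent in cyclic order."""
    if not agent_ids:
        raise ValueError("need at least one agent")
    schedule: Dict[str, List[int]] = {agent: [] for agent in agent_ids}
    for round_idx in range(num_rounds):
        # The human teleoperates a single agent per round.
        agent = agent_ids[round_idx % len(agent_ids)]
        schedule[agent].append(round_idx)
    return schedule


schedule = round_robin_demos(["robot_a", "robot_b", "robot_c"], 7)
```

Each agent's list of rounds would index the single-agent demonstrations used to train that agent's policy, so no round requires a joint action from the operator.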

2512.11736 2026-06-18 cs.RO version update, topic 85

Bench-Push: Benchmarking Pushing-based Navigation and Manipulation Tasks for Mobile Robots

Ninghan Zhong, Steven Caro, Megnath Ramesh, Rishi Bhatnagar, Avraiem Iskandar, Stephen L. Smith

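Affiliations: Institute for Robotics and Intelligent Machines, Georgia Institute of Technology; Department of Electrical and Computer Engineering, University of Waterloo; Department of Mechanical Engineering, University of Alberta

Topic match: robot learning. Proposes a benchmark for pushing-based mobile robot navigation and manipulation.

AI summary: Proposes Bench-Push, the first unified benchmark for pushing-based mobile robot navigation and manipulation, comprising a range of simulated environments, new evaluation metrics, and baseline implementations, to address the evaluation of robot pushing tasks in environments with movable obstacles.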

Comments Published in CRV 2026

Abstract

Mobile robots are increasingly deployed in cluttered environments with movable objects, posing challenges for traditional methods that prohibit interaction. In such settings, the mobile robot must go beyond traditional obstacle avoidance, leveraging pushing or nudging strategies to accomplish its goals. While research in pushing-based robotics is growing, evaluations rely on ad hoc setups, limiting reproducibility and cross-comparison. To address this, we present Bench-Push, the first unified benchmark for pushing-based mobile robot navigation and manipulation tasks. Bench-Push includes multiple components: 1) a comprehensive range of simulated environments that capture the fundamental challenges in pushing-based tasks, including navigating a maze with movable obstacles, autonomous ship navigation in ice-covered waters, box delivery, and area clearing, each with varying levels of complexity; 2) novel evaluation metrics to capture efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-Push to evaluate example implementations of established baselines across environments. Bench-Push is open-sourced as a Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.

2602.01700 2026-06-18 cs.RO version update, topic 80

Tilt-Ropter: A Fully Actuated Hybrid Aerial-Terrestrial Vehicle with Tilt Rotors and Passive Wheels

Ruoyu Wang, Xuchen Liu, Zongzhou Wu, Zixuan Guo, Wendi Ding, Ben M. Chen

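Affiliations: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong; Faculty of Engineering, The University of Hong Kong; Peng Cheng Laboratory

Topic match: robot learning. Proposes Tilt-Ropter, a hybrid aerial-terrestrial vehicle, which falls under robotics.

AI summary: Proposes Tilt-Ropter, a fully actuated hybrid aerial-terrestrial vehicle that achieves efficient multi-modal locomotion with tilt rotors and passive wheels, and designs a unified nonlinear model predictive controller that achieves low tracking error; ground locomotion consumes 92.8% less power than flight.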

Comments 8 pages, 10 figures. Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026)

Abstract

In this work, we present Tilt-Ropter, a fully actuated hybrid aerial-terrestrial vehicle (HATV) that integrates tilt rotors with passive wheels to enable efficient multi-modal locomotion. Unlike conventional underactuated HATVs, the fully actuated design of Tilt-Ropter allows decoupled force and torque control, improving maneuverability and ground locomotion efficiency. A unified nonlinear model predictive controller (NMPC) is developed to track reference trajectories, enforce non-holonomic constraints, and accommodate contact effects across locomotion modes, while ensuring actuator feasibility through dedicated control allocation. To address complex wheel-ground dynamics, an external wrench estimator is incorporated to provide real-time interaction wrench estimates. The system is validated through simulation and real-world experiments, including seamless air-ground transitions and trajectory tracking tasks. Experimental results demonstrate low tracking errors in both modes and reveal a 92.8% reduction in power consumption during ground locomotion compared to flight, highlighting the platform's suitability for long-duration missions in energy-constrained environments.
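
Worked arithmetic (not from the paper): a 92.8% power reduction means ground locomotion draws 7.2% of flight power. Under the simplifying assumptions of a fixed usable battery energy and constant power draw in each mode, neither of which the abstract states, the endurance ratio is the reciprocal of that fraction. The function name `endurance_ratio` is hypothetical.

```python
def endurance_ratio(power_reduction: float) -> float:
    """Ratio of ground to flight endurance for a fixed energy budget.

    power_reduction is the fractional drop in power relative to flight,
    e.g. 0.928 for the 92.8% figure reported in the abstract.
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    # Endurance = energy / power, so the ratio is 1 / (remaining power fraction).
    return 1.0 / (1.0 - power_reduction)


ratio = endurance_ratio(0.928)
print(f"ground endurance is about {ratio:.1f}x flight endurance")  # about 13.9x
```

This is an upper-level estimate only; real missions mix both modes and battery behaviour is not constant-power.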

2503.08895 2026-06-18 cs.RO version update, topic 80

Mutual Adaptation in Human-Robot Co-Transportation with Human Preference Uncertainty

Al Jaber Mahmud, Weizi Li, Xuan Wang

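Affiliations: George Mason University; University of California, Riverside

Topic match: robot learning. Mutual adaptation in human-robot co-transportation.

AI summary: To address uncertainty in human preference parameters and the balancing of adaptation strategies in human-robot co-transportation, proposes a unified framework that models a probability distribution over human choices, introduces a time-varying stubbornness measure and a coordinated planning model, and adds a pose optimization strategy, improving task performance through mutual adaptation.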

Comments 9 pages, 6 figures

Abstract

Mutual adaptation can enhance overall task performance in human-robot co-transportation by integrating both the robot's and the human's understanding of the environment. While human modeling helps capture humans' subjective preferences, two challenges persist: (i) the uncertainty of human preference parameters and (ii) the need to balance adaptation strategies that benefit both humans and robots. In this paper, we propose a unified framework to address these challenges and improve task performance through mutual adaptation. First, instead of relying on fixed parameters, we model a probability distribution of human choices by incorporating a range of uncertain human preference parameters. Building on this, we introduce a time-varying stubbornness measure and a coordinated planning model, which allows either the robot to lead the team's trajectory or, if a human's preferred path conflicts with the robot's plan and their stubbornness exceeds a threshold, the robot to transition to following the human. Finally, we introduce a pose optimization strategy for low-level control to mitigate the uncertain human behaviors when they are leading. To validate the framework, we design and perform a study with human feedback from twenty human participants. We then demonstrate, through simulations, the effectiveness of our models in enhancing task performance with mutual adaptation and pose optimization.
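
Illustration (not from the paper): the leader/follower switch described in the abstract can be sketched as a simple rule. The robot leads unless the human's preferred path conflicts with its plan and the stubbornness measure exceeds a threshold. The function name `select_leader`, the numeric scale of stubbornness, and the threshold value are assumptions; how stubbornness is actually computed is not given in the abstract.

```python
from typing import List


def select_leader(paths_conflict: bool, stubbornness: float, threshold: float) -> str:
    """Return which teammate leads the shared trajectory at this step."""
    # The robot yields only when both conditions from the abstract hold.
    if paths_conflict and stubbornness > threshold:
        return "human"
    return "robot"


# Hypothetical time-varying stubbornness values over four planning steps.
stubbornness_trace: List[float] = [0.2, 0.4, 0.7, 0.9]
roles = [select_leader(True, s, threshold=0.6) for s in stubbornness_trace]
```

In this trace the robot leads for the first two steps and follows the human once the measure crosses the threshold.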

2511.02036 2026-06-18 cs.RO version update, topic 70

TurboMap: GPU-Accelerated Local Mapping for Visual SLAM

Parsa Hosseininejad, Kimia Khabiri, Shishir Gopinath, Soudabeh Mohammadhashemi, Karthik Dantu, Steven Y. Ko

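Affiliations: Simon Fraser University; University at Buffalo

Topic match: robot learning. SLAM is a core technology for robot perception.

AI summary: To address local mapping latency in visual SLAM, proposes TurboMap, a backend combining GPU parallelization with CPU optimization that restructures map point creation, map point fusion, and keyframe management, achieving 1.3-1.6x speedups while preserving accuracy.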

Comments Accepted for presentation at IROS 2026, preprint

Abstract

In real-time Visual SLAM systems, local mapping must operate under strict latency constraints, as delays degrade map quality and increase the risk of tracking failure. GPU parallelization offers a promising way to reduce latency. However, parallelizing local mapping is challenging due to synchronized shared-state updates and the overhead of transferring large map data structures to the GPU. This paper presents TurboMap, a GPU-parallelized and CPU-optimized local mapping backend that holistically addresses these challenges. We restructure Map Point Creation to enable parallel Keypoint Correspondence Search on the GPU, redesign and parallelize Map Point Fusion, optimize Redundant Keyframe Culling on the CPU, and integrate a fast GPU-based Local Bundle Adjustment solver. To minimize data transfer and synchronization costs, we introduce persistent GPU-resident keyframe storage. Experiments on the EuRoC and TUM-VI datasets show average local mapping speedups of 1.3x and 1.6x, respectively, while preserving accuracy.
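
Worked arithmetic (not from the paper): a speedup factor s corresponds to a fractional time saving of 1 - 1/s. The sketch assumes the reported speedups are ratios of average local mapping time before and after, which the abstract implies but does not define; `latency_reduction` is a hypothetical helper name.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of time saved when a step runs `speedup` times faster."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


for dataset, speedup in (("EuRoC", 1.3), ("TUM-VI", 1.6)):
    saved = latency_reduction(speedup)
    print(f"{dataset}: {saved:.1%} less local mapping time")
```

Under that assumption, the two reported speedups correspond to roughly a quarter and three eighths of the local mapping time being saved.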

2606.01605 2026-06-18 cs.RO version update, topic 85

Embedding Semantic Risk into Distance Fields and CBFs for Online Monocular Safe Control

Dawei Zhang, Nuo Chen, Shuo Liu, Roberto Tron, Zhiwen Fan

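Affiliations: Division of Systems Engineering, Boston University; Department of Mechanical Engineering, Boston University; Department of Electrical and Computer Engineering, Texas A&M University

Topic match: embodied navigation. Monocular safe control, with semantic risk embedded into distance fields for navigation.

AI summary: Proposes an online monocular perception-to-control framework that embeds semantic risk directly into the Euclidean Signed Distance Field (ESDF), encoding risk before control optimization, to enable semantic-aware safe navigation and teleoperation based on Control Barrier Functions (CBFs).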

Abstract

We propose an online monocular perception-to-control framework that embeds semantic risk into the distance field used by Control Barrier Function (CBF)-based safe navigation and teleoperation. Many perception-based safety filters assign the same distance-based safety margin to all mapped obstacles or use semantics only as a downstream controller adjustment, rather than encoding semantic risk in the spatial representation. Our framework instead reasons online about obstacle geometry and class-dependent risk by embedding semantic information directly into the Euclidean Signed Distance Field (ESDF). This design encodes semantic risk before control optimization, so high-risk objects exert a larger spatial influence in the safety field while retaining efficient ESDF queries at runtime. Specifically, a foundation-model-based SLAM front end reconstructs dense 3-D geometry from monocular RGB video, while per-frame semantic segmentation provides pixel-level class labels that are fused into the reconstructed geometry. The resulting geometric-semantic representation is then converted into an ESDF, where semantic labels identify safety-relevant regions and impose class-dependent inflation before field computation. The semantic-aware ESDF provides the local distance values and spatial derivatives required by the CBF controller, while class-dependent gains further regulate the controller response. Extensive simulation and hardware experiments demonstrate online operation at 10-20 Hz and semantic-aware safe behavior in both teleoperation and autonomous navigation.
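
Illustration (not from the paper): a toy version of class-dependent inflation feeding a CBF safety filter. Everything here is an assumption made for the sketch: point obstacles with an analytic distance stand in for a grid ESDF, the inflation radii are invented, the robot is a 2-D single integrator, and the closed-form half-space projection replaces the optimization the authors may use.

```python
import math
from typing import Dict, Sequence, Tuple

Obstacle = Tuple[float, float, str]  # (x, y, semantic class)
Vec2 = Tuple[float, float]

# Hypothetical class-dependent inflation radii in metres (not from the paper).
INFLATION: Dict[str, float] = {"chair": 0.2, "person": 0.8}


def risk_distance(p: Vec2, obstacles: Sequence[Obstacle]) -> float:
    """Distance to the nearest obstacle after class-dependent inflation."""
    return min(math.hypot(p[0] - ox, p[1] - oy) - INFLATION[cls]
               for ox, oy, cls in obstacles)


def gradient(p: Vec2, obstacles: Sequence[Obstacle], eps: float = 1e-4) -> Vec2:
    """Central-difference spatial derivative of the inflated distance."""
    gx = (risk_distance((p[0] + eps, p[1]), obstacles)
          - risk_distance((p[0] - eps, p[1]), obstacles)) / (2.0 * eps)
    gy = (risk_distance((p[0], p[1] + eps), obstacles)
          - risk_distance((p[0], p[1] - eps), obstacles)) / (2.0 * eps)
    return gx, gy


def cbf_filter(p: Vec2, u_nom: Vec2, obstacles: Sequence[Obstacle],
               alpha: float = 1.0) -> Vec2:
    """Minimally change u_nom so that grad(h) . u + alpha * h >= 0."""
    h = risk_distance(p, obstacles)
    gx, gy = gradient(p, obstacles)
    margin = gx * u_nom[0] + gy * u_nom[1] + alpha * h
    norm_sq = gx * gx + gy * gy
    if margin >= 0.0 or norm_sq == 0.0:
        return u_nom  # nominal command already satisfies the constraint
    scale = -margin / norm_sq  # project onto the constraint boundary
    return (u_nom[0] + scale * gx, u_nom[1] + scale * gy)


# Same geometry, different semantic class: drive straight at an obstacle 1 m away.
u_near_person = cbf_filter((0.0, 0.0), (1.0, 0.0), [(1.0, 0.0, "person")])
u_near_chair = cbf_filter((0.0, 0.0), (1.0, 0.0), [(1.0, 0.0, "chair")])
```

With these invented radii, the filtered approach speed is roughly 0.2 toward the person and roughly 0.8 toward the chair, which shows how inflating the field before the controller runs changes behaviour without touching the controller itself.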

2601.07052 2026-06-18 cs.RO version update, topic 80

RSLCPP -- Deterministic Simulations Using ROS 2

Simon Sagmeister, Marcel Weinmann, Phillip Pitschi, Markus Lienkamp

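Affiliations: Technical University of Munich, Germany; School of Engineering & Design, Department of Mobility Systems Engineering, Institute of Automotive Technology; School of Engineering & Design, Department of Engineering Physics and Computation, Institute of Automatic Control

Topic match: other robotics. Deterministic simulation with ROS 2 for robot development.

AI summary: To address the irreproducibility of simulation results caused by the asynchronous, multi-process design of ROS, proposes the RSLCPP library, which achieves reproducible cross-platform simulation through deterministic callback execution, usually without modifying existing node code.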

Comments Accepted for publication at the 'IEEE Robotics and Automation Practice'

DOI: 10.1109/RAP.2026.3704080

Abstract

Simulation is crucial in real-world robotics, offering safe, scalable, and efficient environments for developing a variety of robotic applications. While the Robot Operating System (ROS) has been widely adopted as the backbone of these robotic applications in both academia and industry, its asynchronous, multi-process design complicates reproducibility, especially across varying hardware platforms. Deterministic callback execution cannot be guaranteed when computation times and communication delays vary. This lack of reproducibility complicates scientific benchmarking and continuous integration, where consistent results are essential. To address this, we present a methodology to create deterministic simulations using ROS 2 nodes. Our ROS Simulation Library for C++ (RSLCPP) implements this approach, enabling existing nodes to be combined into a simulation routine that yields reproducible results, usually without requiring any source code changes. We demonstrate that our approach produces identical results across various CPUs and architectures when testing both a synthetic benchmark and a real-world robotics system. RSLCPP is open-sourced at https://github.com/TUMFTM/rslcpp.
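
Illustration (not RSLCPP's API or implementation): one general way to make callback order independent of wall-clock timing is a discrete-event queue keyed by simulated time, with a registration counter as tie-breaker. The class `DeterministicExecutor` and its methods are hypothetical and written in Python for brevity, whereas RSLCPP itself is a C++ library whose mechanism the abstract does not detail.

```python
import heapq
from typing import Callable, List, Tuple

Callback = Callable[[int], None]


class DeterministicExecutor:
    """Runs callbacks in (simulated time, registration order) order."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, Callback]] = []
        self._seq = 0  # unique tie-breaker, so callbacks are never compared
        self.now_ns = 0

    def schedule(self, time_ns: int, callback: Callback) -> None:
        heapq.heappush(self._queue, (time_ns, self._seq, callback))
        self._seq += 1

    def run(self) -> None:
        while self._queue:
            time_ns, _, callback = heapq.heappop(self._queue)
            self.now_ns = time_ns  # simulated clock jumps to the event time
            callback(time_ns)


log: List[Tuple[str, int]] = []
executor = DeterministicExecutor()
executor.schedule(20, lambda t: log.append(("control", t)))
executor.schedule(10, lambda t: log.append(("sensor", t)))
executor.schedule(10, lambda t: log.append(("planner", t)))
executor.run()
```

Because ordering depends only on simulated timestamps and registration order, the resulting `log` is the same on any machine regardless of how long each callback takes to compute.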

2501.06348 2026-06-18 cs.HC cs.RO version update, topic 60

Why Automate This? Exploring Correlations Between Desire for Robotic Automation, Invested Time and Well-Being

Ruchira Ray, Leona Pang, Sanjana Srivastava, Li Fei-Fei, Samantha Shorey, Roberto Martín-Martín

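Affiliations: University of Texas at Austin; Stanford University; University of Pittsburgh

Topic match: other robotics. Explores how preferences for robotic automation correlate with time spent and well-being.

AI summary: Using BEHAVIOR-1K and other datasets, this study finds that time spent on an activity is not a strong predictor of automation preference, whereas happiness and pain are the strongest indicators, and reveals differences by gender and income level.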

Comments 26 pages, 14 figures

Abstract

Understanding the motivations underlying the human inclination to automate tasks is vital for developing robots that fit seamlessly into daily life. Accordingly, we ask: are individuals more inclined to automate activities based on the time they consume or the feelings experienced while performing them? This study explores these preferences and whether they vary across social groups, specifically gender category and income level. Leveraging data from the BEHAVIOR-1K dataset, the American Time-Use Survey, and the American Time-Use Survey Well-Being Module, we investigate the relationship between the desire for robot automation, time spent, and associated feelings: Happiness, Meaningfulness, Sadness, Painfulness, Stressfulness, or Tiredness. Our key findings show that, despite common assumptions, time spent on activities does not strongly predict automation preferences; instead, happiness and pain are the strongest indicators. We also identify differences by gender and economic level: Women prefer to automate stressful activities, whereas men prefer to automate those that make them unhappy; mid-income individuals prioritize automating less enjoyable and meaningful activities, while low and high-income show no significant correlations. We hope our research helps motivate the design of robots that align with user priorities, moving domestic robotics toward more socially relevant solutions. All data and an interactive tool are publicly available at https://robin-lab.cs.utexas.edu/why-automate-this/.

1. Robot foundation models (1 paper)

Cosmos 3: Omnimodal World Models for Physical AI

2. Robot manipulation (2 papers)

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

STORM: Slot-based Task-aware Object-centric Representation for robotic Manipulation

3. Robot learning (5 papers)

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

Bench-Push: Benchmarking Pushing-based Navigation and Manipulation Tasks for Mobile Robots

Tilt-Ropter: A Fully Actuated Hybrid Aerial-Terrestrial Vehicle with Tilt Rotors and Passive Wheels

Mutual Adaptation in Human-Robot Co-Transportation with Human Preference Uncertainty

TurboMap: GPU-Accelerated Local Mapping for Visual SLAM

4. Embodied navigation (1 paper)

Embedding Semantic Risk into Distance Fields and CBFs for Online Monocular Safe Control

5. Other robotics (2 papers)

RSLCPP -- Deterministic Simulations Using ROS 2

Why Automate This? Exploring Correlations Between Desire for Robotic Automation, Invested Time and Well-Being