arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.27314 2026-05-27 cs.RO cs.SY eess.SY Updated version

Riding the Shifting Potential: When Reactive Control Suffices for Multi-Goal Behavior

Vito Mengers, Oliver Brock

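Affiliations: Robotics and Biology Laboratory, Technische Universität Berlin; Science of Intelligence (SCIoI), Cluster of Excellence, Berlin, Germany; Robotics Institute Germany

AI summary: The paper extends the interaction structure of a graph-based world model with nullspace projections and adjusts priorities dynamically to resolve conflicts between objectives. It is demonstrated on navigation around non-convex obstacles and reaches 100% success on planar pushing of non-convex objects, without demonstrations or retraining.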

Abstract

Reactive control is often considered insufficient for multi-objective tasks because conflicting objectives give rise to local minima. We argue this limitation is not inherent but arises from static encodings that fail to reflect how objectives currently interact. We exploit the interaction structure encoded in a graph-based world model by extending it with nullspace projections: conflicts are resolved where they arise by projecting lower-priority gradients into the nullspace of higher-priority ones, with priorities determined continuously from the current state. We demonstrate this in two domains where conflicts between objectives are central: navigation around non-convex obstacles, where static potential fields fundamentally fail, and planar pushing of non-convex objects, where our method achieves $100\%$ success across one hundred configurations versus $0\%$ for the steepest-descent baseline and ${\sim}55\%$ for diffusion policy, without demonstrations or retraining. The same formulation transfers directly to a real robot with additional perceptual and kinematic constraints, accommodating them through the same mechanism.
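
The projection step named in the abstract can be illustrated with a minimal sketch. It assumes a single higher-priority update direction, so the nullspace projector reduces to removing the component of the lower-priority direction that lies along it. The function names and the two example vectors are hypothetical; the paper's state-dependent priorities and graph-based world model are not reproduced here.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_into_nullspace(g_low, g_high, eps=1e-12):
    """Remove from g_low its component along g_high.

    For a single direction g_high this applies the projector
    I - g_high g_high^T / (g_high^T g_high) to g_low.
    """
    norm_sq = dot(g_high, g_high)
    if norm_sq < eps:
        # No higher-priority direction to protect: keep g_low unchanged.
        return list(g_low)
    scale = dot(g_low, g_high) / norm_sq
    return [l - scale * h for l, h in zip(g_low, g_high)]


def combined_step(g_high, g_low):
    """Higher-priority direction plus the projected lower-priority one."""
    g_proj = project_into_nullspace(g_low, g_high)
    return [h + p for h, p in zip(g_high, g_proj)]


# Hypothetical example: the two directions partially oppose each other.
g_avoid = [1.0, 0.0]   # higher priority (e.g. moving away from an obstacle)
g_goal = [-1.0, 1.0]   # lower priority (e.g. moving toward a goal)
g_goal_projected = project_into_nullspace(g_goal, g_avoid)
step = combined_step(g_avoid, g_goal)
```

After projection the lower-priority direction has no component along the higher-priority one, so adding it cannot cancel the higher-priority motion.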

2605.27178 2026-05-27 cs.CV cs.AI cs.LG cs.RO Updated version

FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation

Zihui Zhang, Zhixuan Sun, Yafei Yang, Jinxi Li, Jiahao Chen, Bo Yang

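Affiliations: Shenzhen Research Institute, The Hong Kong Polytechnic University; vLAR Group, The Hong Kong Polytechnic University

AI summary: Proposes FoundObj, a framework that uses semantic and geometric priors from self-supervised 2D/3D foundation models as rewards and guides superpoint merging through reinforcement learning, enabling 3D object segmentation in complex scenes without annotations.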

Comments ICML 2026. Zihui and Zhixuan are co-first authors. Code and data are available at: https://github.com/vLAR-group/FoundObj

Abstract

We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process. In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery agent that incrementally merges suitable neighboring superpoints, guided by our innovative semantic and geometric reward modules. These modules synergistically leverage semantic and geometric priors from self-supervised 2D/3D foundation models, providing complementary feedback to the object discovery agent and enabling robust identification of multi-class objects through reinforcement learning. Extensive experiments on diverse benchmarks demonstrate that our approach consistently outperforms existing baselines. Notably, our method exhibits strong generalization in zero-shot and long-tail scenarios, underscoring its potential for scalable, label-free 3D object segmentation.

2605.27167 2026-05-27 cs.RO Updated version

TCBiRRT: Rapid Motion Planning for Tightly Coupled Dual-arm Space Manipulator Using Task-space Random Expansion

Jiawei Zhang, Xinhao Miao, Jifeng Guo, Qinghua Li, Chengchao Bai

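Affiliations: Harbin Institute of Technology

AI summary: For motion planning of tightly coupled dual-arm space manipulators under closed-chain constraints, proposes TCBiRRT, a task-space constrained bidirectional rapidly-exploring random tree algorithm. It samples and expands nodes directly in task space and combines a path inverse kinematics mapping with a regrasp mechanism, markedly improving planning success rate and speed.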

Comments 12 pages, 9 figures

Abstract

Planning the motion path for a tightly coupled dual-arm space manipulator under closed-chain constraints is a fundamental yet challenging problem in on-orbit assembly of large-scale space structures. The closed-chain constraints significantly reduce the feasible configuration space, making it difficult for existing planners to efficiently generate collision-free motions, especially in cluttered environments. To address this issue, this paper proposes a task-space constrained bidirectional rapidly-exploring random tree algorithm, termed TCBiRRT. Unlike conventional methods that operate in the high-dimensional configuration space, the proposed approach performs random sampling and node expansion directly in the task space defined by the manipulated object pose. A task-space node expansion strategy is developed to generate candidate object motions, which are then mapped to continuous joint paths using a path inverse kinematics algorithm. The method is further integrated with a bidirectional RRT framework and a regrasp mechanism to efficiently connect two random trees. Extensive simulations are conducted in representative on-orbit assembly scenarios with varying levels of environmental complexity. The results demonstrate that TCBiRRT achieves significantly higher success rates and orders-of-magnitude improvements in planning time compared to state-of-the-art planners. The proposed method provides an efficient and robust solution for motion planning of tightly coupled dual-arm space manipulators.

2605.27129 2026-05-27 cs.CV cs.RO Updated version

YOLO26-RipeLoc Lite: A lightweight architecture for tomato ripeness detection and picking point localization in greenhouse robotic harvesting

Rajmeet Singh, Manveen Kaur, Shahpour Alirezaee, Irfan Hussain

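Affiliations: Department of Mechanical Engineering; Khalifa University; University of Windsor

AI summary: Proposes YOLO26-RipeLoc Lite, a lightweight YOLO26-based architecture that uses a lightweight feature pyramid network, a ripeness-aware attention module, and a compact detection head for ripeness classification and center-point localization of greenhouse tomatoes, reaching 92.9% mAP@0.5 with only 2.38M parameters.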

Abstract

In greenhouse tomato production, automated harvesting requires accurate detection of ripe tomatoes, ripeness classification, and precise picking-point localization for robotic end-effectors. This paper proposes YOLO26-RipeLoc Lite, a lightweight deep learning architecture based on YOLO26 for simultaneous detection, ripeness classification, and center-point localization of greenhouse tomatoes. The model introduces three modifications: (1) a Lightweight Feature Pyramid Network (LFPN) with depthwise separable convolutions for efficient multi-scale fusion, (2) a Ripeness-Aware Attention Module (RAAM) with dual pooling and a learnable ripeness bias vector for enhanced color-texture discrimination, and (3) a Compact Detection Head (CDH) with shared convolutions and an integrated center-point regression branch for direct grasp planning. The model is evaluated on a custom dataset of 1,500 images with 6,227 instances (3,566 ripe, 2,661 unripe) from the SILAL greenhouse, Abu Dhabi, UAE. YOLO26-RipeLoc Lite achieves mAP@0.5 of 92.9% (95.2% ripe, 90.6% unripe) with the highest precision (95.2%) among all evaluated architectures using only 2.38M parameters. Post-training BatchNorm pruning at 30% reduces parameters to ~1.8M with negligible accuracy loss. Ablation studies confirm that greenhouse-aware HSV augmentation provides the largest improvement (+2.02 pp mAP@50), backbone freezing achieves peak precision (93.8%), and 3-phase progressive unfreezing yields the best localization quality (mAP@50:95 of 64.6%). Comparisons with YOLOv8n/s, YOLO11n/s, YOLO12n/s, and YOLO26s confirm superior accuracy-efficiency: 2.9 pp higher precision than YOLO12n with 7.0% fewer parameters and integrated center-point localization for robotic end-effector guidance.
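
The abstract credits depthwise separable convolutions for the efficiency of the LFPN. A rough sketch of why they reduce parameter count is given below. The layer sizes are hypothetical and bias terms are ignored; this is the generic weight-count comparison, not the paper's actual layer configuration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
ratio = sep / std

# Dataset counts quoted in the abstract are internally consistent.
ripe, unripe, total = 3566, 2661, 6227
```

For this hypothetical layer the separable variant needs roughly 12% of the weights of the standard convolution.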

2605.27079 2026-05-27 cs.LG cs.AI cs.RO Updated version

Trust Region Q Adjoint Matching

Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin

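Affiliations: KAIST AI; Seoul National University; RLWRLD

AI summary: To address the instability of off-policy reinforcement learning for pretrained flow policies, proposes Trust Region Q-Adjoint Matching, which adaptively controls the path-space KL divergence through projected dual descent for stable fine-tuning and reaches a 68% offline RL success rate across 50 OGBench tasks.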

Abstract

Off-policy reinforcement learning of pretrained flow policies remains challenging because the multi-step sampling process makes optimization unstable. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating the problem as a memoryless stochastic optimal control (SOC) problem with a learned critic. However, QAM inherits a fundamental fragility of critic-guided improvement: small critic errors are amplified when critics are ill-conditioned, often leading to model collapse. This paper introduces Trust Region Q-Adjoint Matching (TRQAM), a stable off-policy fine-tuning algorithm that adaptively controls the path-space KL divergence from the pretrained flow policy through projected dual descent. Specifically, we optimize the trust-region parameter $λ$ in the SOC dynamics and show theoretically that the path-space KL can be expressed as a closed-form function of $λ$. As a result, our method can precisely control the deviation from the pretrained flow policy, achieving stable off-policy RL. In experiments on 50 OGBench tasks, TRQAM consistently outperforms prior work in both offline RL and offline-to-online RL. In particular, TRQAM achieves an overall success rate of 68% in offline RL, substantially improving on the strongest baseline's 46%.
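
The idea of adjusting a trust-region parameter until a KL budget is met can be sketched generically. The abstract does not give the closed-form expression, so `path_kl` below is a hypothetical stand-in that simply decreases as the parameter grows; the budget, step size, and lower bound are also assumptions. The update is a plain projected multiplier step, not the authors' algorithm.

```python
def path_kl(lam, scale=1.0):
    """Hypothetical closed-form KL: larger lam means smaller deviation."""
    return scale / (lam * lam)


def dual_update(lam, kl_budget, step_size=1.0, lam_min=1e-3):
    """One projected step on the trust-region parameter.

    lam grows when the KL exceeds the budget and shrinks otherwise;
    the max(...) projects back onto the feasible set lam >= lam_min.
    """
    violation = path_kl(lam) - kl_budget
    return max(lam_min, lam + step_size * violation)


# Iterate until the hypothetical KL matches the budget.
lam = 1.0
for _ in range(500):
    lam = dual_update(lam, kl_budget=0.25)
```

With this stand-in function the iteration settles where the KL equals the budget, which is the behavior a trust-region controller of this kind is meant to have.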

2605.27046 2026-05-27 cs.RO Updated version

Learning to Balance Motor Thermal Safety and Quadrupedal Locomotion Performance with Residual Policy

Yuhang Wan, Weixian Lin, Letian Qian, Yiqi Zou, Weiwei Wu, Shengwei Wu, Chuanlin Zhao, Xin Luo

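Affiliations: School of Mechanical Science and Engineering, Huazhong University of Science and Technology

AI summary: Proposes a two-stage training framework that combines a whole-body thermal model with a residual policy to prevent motor overheating while preserving locomotion performance, enabling long-duration locomotion under payload.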

Abstract

Motor thermal management is often overlooked in the context of electrically actuated robots, particularly legged robots, but motor overheating is a key factor that limits long-duration locomotion, especially under payload conditions. This paper integrates a whole-body thermal model of a quadruped robot into the reinforcement learning pipeline to update motor temperatures, and proposes a two-stage training framework for motor thermal management. In this framework, a nominal policy is first pre-trained as a locomotion baseline capable of traversing diverse terrains. A residual policy is then trained on top of the nominal policy to provide corrective actions based on the robot's thermal state, ensuring high performance under low-temperature conditions and preventing motor overheating under high-temperature conditions. Simulation results demonstrate that the proposed policy achieves an effective balance between motor thermal safety and locomotion performance. Real-world experiments on a Unitree A1 quadruped robot further validate the approach: under a 3 kg payload, the robot achieves stable locomotion across multiple terrains for over 13 minutes, while the nominal policy alone leads to motor overheating in about 5 minutes.
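
The structure described above, a nominal action corrected by a temperature-dependent residual, can be sketched as follows. Everything here is an assumption made for illustration: the first-order thermal model, its constants, the temperature limits, and the hand-written residual, which stands in for the residual policy that the paper trains with reinforcement learning.

```python
def thermal_step(temp, torque, dt, ambient=25.0, heat_gain=0.5, cooling_rate=0.02):
    """One Euler step of a first-order lumped thermal model (hypothetical constants)."""
    heating = heat_gain * torque ** 2          # heating grows with squared torque
    cooling = cooling_rate * (temp - ambient)  # cooling toward ambient temperature
    return temp + dt * (heating - cooling)


def residual_action(nominal_torque, temp, soft_limit=60.0, hard_limit=80.0):
    """Hand-written stand-in for a learned residual policy.

    Returns zero correction below soft_limit and cancels the nominal
    torque completely at or above hard_limit.
    """
    if temp <= soft_limit:
        return 0.0
    frac = min(1.0, (temp - soft_limit) / (hard_limit - soft_limit))
    return -frac * nominal_torque


def combined_action(nominal_torque, temp):
    """Final command: nominal action plus thermal correction."""
    return nominal_torque + residual_action(nominal_torque, temp)


cool = combined_action(10.0, 40.0)   # cold motor: nominal behavior preserved
warm = combined_action(10.0, 70.0)   # halfway between the limits
hot = combined_action(10.0, 90.0)    # beyond the hard limit
next_temp = thermal_step(25.0, 2.0, dt=1.0)
```

The design point this illustrates is that the nominal policy is left untouched at low temperature, so locomotion performance only degrades when the thermal state requires it.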

2605.27038 2026-05-27 cs.RO Updated version

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving

Jiaxiang Li, Yumao Liu, Ke Ma

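Affiliations: The Hong Kong University of Science and Technology (Guangzhou)

AI summary: Proposes TPS-Drive, a framework that uses task-guided representation purification (an Agent-Centric Tokenizer) to address spatial hallucination and representation interference when VLMs are applied to autonomous driving, enabling precise 3D spatial forecasting and safe planning.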

Abstract

Vision-Language Models (VLMs) provide a promising foundation for autonomous driving planning, yet bridging semantic reasoning and precise 3D spatial forecasting remains a critical challenge. Existing representation strategies generally follow two paths: text-aligned methods flatten continuous spatial states into symbols, which compromises geometric structure and induces "spatial hallucinations"; dense visual methods preserve spatial topology but overwhelm standard tokenizers with redundant background textures, leading to "representation interference". To address these limitations, we introduce TPS-Drive, a novel framework centered on Task-Guided Representation Purification that empowers VLMs to Think in Purified Space. At its core, an Agent-Centric Tokenizer utilizes a task-guided vector quantization mechanism supervised by a frozen 3D detection head, which explicitly reallocates limited codebook capacity from pervasive static backgrounds to critical dynamic agents and effectively isolates spatial redundancy. Leveraging this purified spatial vocabulary, TPS-Drive employs a decoupled reasoning pipeline that sequentially performs scene understanding, future forecasting, and action generation. The framework is optimized via a progressive three-stage training paradigm, culminating in reward-driven refinement that surpasses pure imitation learning. Extensive experiments validate our approach: TPS-Drive achieves accurate agent spatial state forecasting and reduces collision rates in open-loop nuScenes evaluations, while establishing new safety records on the rigorous closed-loop NAVSIMv1 and NAVSIMv2 benchmarks.
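
The tokenizer described above builds on vector quantization. The basic lookup step, mapping a feature to its nearest codebook entry, can be sketched as below. The codebook and feature values are hypothetical, and the task-guided supervision from a frozen 3D detection head, which is the paper's contribution, is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature, codebook):
    """Return (index, code) of the codebook entry nearest to the feature."""
    best = min(range(len(codebook)),
               key=lambda i: squared_distance(feature, codebook[i]))
    return best, codebook[best]


# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
index, code = quantize([0.9, 1.2], codebook)
```

Because the codebook is finite, how its entries are distributed across background and agent features determines what detail survives quantization, which is the capacity question the abstract raises.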

2605.26991 2026-05-27 cs.RO Updated version

Towards Shared Embodied Intelligence in Humanoid Robots through Optimization Development and Testing of the Human Aware ergoCub Robot

Carlotta Sartore, Mohamed Elobaid, Lorenzo Rapetti, Giulio Romualdi, Stefano Dafarra, Nicola A. Piga, Ines Sorrentino, Paolo Maria Vicecone, Silvio Traversaro, Ugo Pattacini, Luca Fiorio, Francesco Draicchio, Giovanna Tranfo, Lorenzo Natale, Marco Maggiali, Daniele Pucci

Affiliations: GenerativeBionics; Artificial and Mechanical Intelligence; School of Computer Science, University of Manchester; Humanoid Sensing and Perception; iCub Tech Facility; DiMEILA, Istituto Nazionale Assicurazione Infortuni sul Lavoro (INAIL)

AI summary: Proposes an architecture that integrates shared intelligence and embodied cognition, optimizing robot hardware and control against human ergonomic metrics to enable physical human-robot collaboration, with the humanoid robot ergoCub as a concrete implementation.

Abstract

Collaboration is central to human behavior, enabling tasks beyond individual capability. This ability arises from coordinating actions through internal representations of others, a concept known as shared intelligence. Additionally, humans are characterized by physical bodies and cognitive abilities that are optimized in response to their environment, a phenomenon referred to as embodied cognition. Designing humanoid robots that collaborate safely and effectively with people requires unifying these principles. Here we propose an architecture that integrates shared intelligence and embodied cognition to enable robots to physically collaborate with humans, where robot hardware and control are optimized for human metrics, using representations of the human body and motion intelligence. The ultimate goal is to achieve a form of shared embodied intelligence. Specifically, our architecture optimizes robot hardware and physical intelligence parameters with respect to human ergonomic metrics. This is accomplished by modeling human-robot interaction as a function of hardware configurations and embedding human models into the robot's physical intelligence. As a concrete implementation, we present the humanoid robot ergoCub, whose morphology and control have been optimized for collaborative tasks with humans. Our approach provides a framework for designing humanoid robots that prioritize human ergonomics at both the hardware and physical intelligence levels, with applications in industrial and assistive robotics.

2605.26944 2026-05-27 cs.RO cs.CV Updated version

Object Pose and Shape Estimation for Grasping: Does it Work?

Pavan Karke, Kushal Shah, Gaurav Singh, Md Faizal Karim, K Madhava Krishna, Rajat Talak

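Affiliations: Robotics Research Center, IIIT Hyderabad; National University of Singapore

AI summary: Compares an end-to-end grasp synthesis method with modular methods that first estimate object pose and shape and then sample grasps, in order to assess how effective existing pose and shape estimation methods are for grasping.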

Comments 9 pages, 8 figures

Abstract

The problem of object pose and shape estimation has seen key advancements lately. Encoder-decoder models (e.g., SAM3D, LRM, CRISP) and diffusion-based models (e.g., InstantMesh, Zero123, SceneComplete) have shown category-agnostic shape encoding capacity and open-set generalizability. In this work, we ask: are object pose and shape estimation methods mature enough that, when used with antipodal grasp sampling, they can outperform end-to-end grasp synthesis methods? We explore this question in detail by scoping our study to parallel-jaw grippers, 7-DoF grasps, and a single-view RGB(-D) image as input. We implement and compare a state-of-the-art end-to-end grasp synthesis method and three modular methods, which first estimate the pose and shape of all objects in the scene and then generate grasps using antipodal sampling. We observe that the modular methods outperform the end-to-end method in all our experiments. The modular methods are able to synthesize plenty of grasps, even for small objects, where the end-to-end method fails. The effectiveness of the modular methods is contingent on the accuracy of the pose and shape estimation, and degrades partially in cluttered scenes, a limitation of existing pose and shape estimation methods. We also analyze the failure modes and run-times of the three modular methods, which use two different approaches to object pose and shape estimation: one based on an encoder-decoder model and the other on a diffusion model. Finally, we demonstrate that single-view object pose and shape estimation methods can be augmented with vision-language models to yield language-conditioned grasps from just a single-view RGB-D image as input. We observe performance comparable to the state-of-the-art LERF-TOGO baseline.
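
Antipodal grasp sampling relies on a geometric test for a pair of contact points. A minimal sketch of one common form of that test is shown below: the line between the two contacts must lie inside both friction cones. It assumes unit outward surface normals and a hypothetical friction coefficient, and it is not taken from the paper's implementation.

```python
import math


def is_antipodal(p1, n1, p2, n2, friction_coeff=0.5):
    """Check whether two contacts form an antipodal pair.

    p1, p2: contact points; n1, n2: unit outward surface normals.
    The grasp axis must lie within the friction cone at both contacts.
    """
    axis = [b - a for a, b in zip(p1, p2)]
    length = math.sqrt(sum(x * x for x in axis))
    if length == 0.0:
        return False
    axis = [x / length for x in axis]
    cos_limit = math.cos(math.atan(friction_coeff))
    # At contact 1 the finger pushes along the axis, against the outward normal.
    align_1 = -sum(a * n for a, n in zip(axis, n1))
    # At contact 2 the finger pushes against the axis, so the outward normal aligns with it.
    align_2 = sum(a * n for a, n in zip(axis, n2))
    return align_1 >= cos_limit and align_2 >= cos_limit


# Opposite faces of a box: a valid antipodal pair.
good = is_antipodal([-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
# Second normal perpendicular to the grasp axis: rejected.
bad = is_antipodal([-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A test of this kind needs surface points and normals, which is why the modular pipelines depend on the quality of the estimated shape.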

2605.26936 2026-05-27 cs.RO Updated version

A Bioinspired Underwater Robot with a Latch-Mediated Soft Bistable Mechanism

Chongze Bi, Wenjie Wu, Zonghao Zuo, Li Wen

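AI summary: Presents a bioinspired soft bistable actuator with an integrated latch mechanism that enables asymmetric energy input and release using a single motor. Combined with fin structures, it provides efficient underwater propulsion and maneuverability; experiments show stable flapping, precise steering, and multiple motion modes.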

Comments 6 pages, 6 figures

Abstract

Underwater robotics has advanced significantly over recent decades; however, the development of miniaturized underwater robots remains limited by the low energy density of traditional power sources. Nature offers compelling solutions: organisms such as mantis shrimps and fleas use latch-mediated spring actuation (LaMSA) systems that achieve rapid movements through a decoupled energy storage and release mechanism. Despite extensive studies of LaMSA, replicating such rapid, asymmetric actuation within simple, compact structures remains challenging. In this work, we introduce a bioinspired, soft bistable actuator with an integrated latch mechanism that enables asymmetric energy input and release using a single motor. Coupled with fin structures, this design facilitates efficient underwater propulsion and maneuverability. Experimental results demonstrate stable periodic flapping, precise steering, a maximum thrust of 0.528 N, an impulse of 0.147 Ns, and a vertical displacement of 30 mm. By modulating fin angles, the robot achieves versatile motions, including vertical ascent, diagonal forward movement, and lateral translation. This study presents a novel, energy-efficient approach for controlling motion in compact underwater robots, paving the way for advanced biomimetic designs with potential applications in exploration, environmental monitoring, and inspection.

2605.26856 2026-05-27 q-bio.NC cs.AI cs.RO Updated version

The Sensation Modulating Network: Haltability as the architectural ground for object-directed phenomenology

G. Nagarjuna, Durgaprasad Karnam

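Affiliations: Indian Institute of Science Education and Research (IISER) Pune; Centre for Educational Technology (CET), Indian Institute of Technology Bombay

AI summary: Proposes the Sensation Modulating Network (SMN) as an architecture for embodied cognition. Through opponent dynamics and a haltability mechanism, it grounds the intentionality of object-directed phenomenology (in Husserl's sense) in structural features of bodily organization, reconciling the debate between cognitivism and 4E cognition.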

Comments 64 pages, main body 38 pages + References 6, Appendices 20 pages, Tables 3, and Figures 21

Abstract

Cognitive science remains split between cognitivism - which accounts for recursion and language but cannot ground formal symbols in meaning - and 4E approaches - which ground cognition in the body but rarely specify the body's architecture in enough detail to support generativity. We argue the impasse stems from an incomplete account of the embodied agent's architecture, and propose one: the Sensation Modulating Network (SMN), the cognitive agent conceived as the whole body, organized at every anatomical scale by opponent dynamics, built from Sensation Modulators that sense and act through one substrate, paired into Coordinated Action Zones routed by a body-wide broadcast network. Three commitments give the SMN its purchase. Haltability - the recruitment of antagonistic affordance into co-activated equilibrium - provides the architectural locus that object-directed phenomenology, in Husserl's sense, requires: opponency enables co-activation, co-activation enables halt, halt enables attention, attention enables intentional directedness, with no module added on top. The dual-signal property of self-modulatable action patterns (SMAPs) makes the self/world distinction a structural feature of the wiring rather than a category the agent applies. And a four-level action-pattern hierarchy - Basal, Haltable, Negotiable, Transactional - gives a single trajectory from autonomic regularity to public conventionalization, locating the conditions for grammar-grounded generativity as architectural transitions. The SMN reconciles the cognitivism-4E debate: recursion lives in the modifiable dynamics of Negotiable Action Patterns, embodiment in the opponent substrate that supports them. A tentative formalism and eight predicted registers (seven testable, one hypothetical), with reference simulations, are given in an appendix.

2605.26831 2026-05-27 cs.CV cs.RO Updated version

OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes

Regina Kurkova, Maxim Popov, Sergey Kolyubin

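Affiliations: Biomechatronics and Energy-Efficient Robotics (BE2R) Lab, ITMO University

AI summary: Extends OSMa-Bench with prompt-generated synthetic indoor scenes for controllable benchmarking, and adds a prompt-grounded VQA question category for stress-testing semantic mapping methods under clutter, small objects, partial occlusions, and lighting variation.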

Comments Code: https://github.com/be2rlab/OSMa-Bench-v2

Abstract

Semantic mapping methods are increasingly used as intermediate scene representations for downstream robotic reasoning and manipulation, yet their evaluation is still largely tied to fixed benchmark datasets with limited coverage of manipulation-relevant corner cases. In this work, we extend OSMa-Bench toward controllable benchmarking with prompt-generated synthetic indoor scenes. Our pipeline automatically generates scene descriptions, synthesizes corresponding environments with SceneSmith, and adapts the resulting assets into an OSMa-Bench-compatible simulation format. This adaptation requires a nontrivial intermediate layer, including semantic normalization, material and texture repair, shader fallback policies, floor handling, navigation setup, and controlled lighting configuration. A key advantage of the proposed setup is that the original scene-generation prompt is known in advance and can therefore serve as an auxiliary semantic specification of the intended scene. We use this property to extend the VQA component of OSMa-Bench with a prompt-grounded question category. The resulting framework supports targeted stress-testing of semantic scene representations under conditions such as clutter, small objects, partial occlusions, and lighting variation, and makes benchmarking more extensible and better aligned with downstream manipulation requirements. Our code is available at https://github.com/be2rlab/OSMa-Bench-v2.

2605.26828 2026-05-27 cs.RO Updated version

Learning Compositional Symbolic Task Rules from Demonstrations with Inductive Logic Programming

Oleh Borys, Karla Stepanova

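Affiliations: Czech Institute of Informatics, Robotics and Cybernetics

AI summary: Proposes a decomposed learning approach based on inductive logic programming that learns symbolic task rules from demonstrations; the rules are interpretable, reusable, and support strong generalization.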

Comments In: ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy: From Environment Understanding and Reasoning to Safe Interaction, Vienna, 2026; In: ICRA 2026, International Joint Workshop on Ontologies, Semantic Maps and Autonomous Robotics Standardization (J-WOSMARS 2026), Vienna, 2026

Abstract

Learning from Demonstration (LfD) should capture not only how a task is executed, but also the high-level task structure that explains the demonstrated behavior. As robots become more autonomous, such task representations must be inspectable, reusable, and human-interpretable. To address this, we study how to represent and learn robotic tasks with inductive logic programming (ILP) by decomposing a complex task into a series of simpler learning objectives at different abstraction (ontological) levels. The system infers symbolic rules from demonstrations and prior (domain) knowledge, and reuses learned rules when learning higher-level task structure. We evaluate the approach in a synthetic block-assembly scenario and show that the learned abstractions are interpretable and support strong generalization to harder, held-out tasks with unseen objects. These results provide preliminary evidence that decomposed ILP is a feasible approach to task-level LfD.

2605.26820 2026-05-27 cs.RO Updated version

Can VLA Models Learn from Real-World Data Continually without Forgetting?

Jiarun Zhu, Yijun Hong, Xiaoquan Sun, Zetian Xu, Mingqi Yuan, Zhiyong Wang, Wenjun Zeng, Jiayu Chen

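Affiliations: HKU; INFIFORCE EIT, Ningbo; HUST; SUSTech; HITSZ

AI summary: Using a newly built real-world continual learning dataset of four sequential manipulation tasks, the study finds empirically that vision-language-action (VLA) models suffer severe catastrophic forgetting when continually learning from heterogeneous real-world demonstrations, and systematically evaluates the key implementation factors of experience replay.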

Abstract

Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously learned behaviors. While pioneering research has studied the continual learning of VLA models in narrowly simulated environments, this challenge remains largely unexplored under realistic conditions. To address this limitation, we construct a real-world continual learning dataset comprising four sequential manipulation tasks, spanning rigid-object pick-and-place, contact-rich pressing, and deformable-object folding. Using this dataset, we conduct comprehensive experiments and find that VLA models suffer significant catastrophic forgetting when continually learning from heterogeneous real-world demonstrations. We then systematically evaluate experience replay and uncover key implementation factors that govern its success. In summary, this work provides the first empirical study of real-world continual VLA learning and offers practical guidance for deploying long-lived robot policies.
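
Experience replay, the method evaluated above, mixes stored samples from earlier tasks into each training batch for the new task. A minimal sketch of that batching step follows. The replay ratio, task names, and the `build_batch` helper are hypothetical; the abstract does not state which replay settings the authors used.

```python
import random


def build_batch(new_task_data, replay_buffer, batch_size, replay_ratio, rng):
    """Mix samples from earlier tasks into a batch for the current task.

    replay_ratio is the target fraction of the batch drawn from the buffer.
    """
    n_replay = min(len(replay_buffer), int(round(batch_size * replay_ratio)))
    n_new = batch_size - n_replay
    batch = rng.sample(replay_buffer, n_replay) + rng.choices(new_task_data, k=n_new)
    rng.shuffle(batch)
    return batch


# Hypothetical demonstrations tagged by task name.
rng = random.Random(0)
old_demos = [("pick", i) for i in range(20)]
new_demos = [("fold", i) for i in range(20)]
batch = build_batch(new_demos, old_demos, batch_size=8, replay_ratio=0.25, rng=rng)
n_old_in_batch = sum(1 for task, _ in batch if task == "pick")
```

Choices such as the replay ratio and buffer contents are the kind of implementation factor the abstract says govern whether replay succeeds.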

2605.26782 2026-05-27 cs.RO cs.HC 版本更新

Manipulating Tangible Virtual Object Dynamics to Promote Learning of Precision Force Generation

操控有形虚拟物体动力学以促进精确力生成的学习

Alberto Garzás-Villar, Alba Riera-Cardona, Alexis Derumigny, J. Micah Prendergast, Jane Murray Cramm, Laura Marchal-Crespo

发表机构 * Department of Cognitive Robotics, Delft University of Technology, Delft, 2628 CD, The Netherlands; Department of Socio-medical Sciences, Erasmus School of Health Policy & Management, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands; Department of Applied Mathematics, Delft University of Technology, Delft, 2628 CD, The Netherlands; Department of Rehabilitation Medicine, Erasmus Medical Center, Rotterdam, 3015 GD, The Netherlands

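AI summary: This study proposes training precise force control by manipulating the dynamics of tangible virtual objects (linear, Gaussian, or antisymmetric Gaussian spring models). The antisymmetric Gaussian group achieved the highest force accuracy during training, but long-term retention did not differ significantly, and participants relied mainly on the learned target elongation rather than the target force.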

详情
AI中文摘要

机器人触觉设备结合虚拟现实为训练精细力生成提供了新机会,这是中风后康复中重要但常被忽视的部分。本研究提出,操控有形虚拟物体的渲染动力学可用于训练精确力控制,同时激活体感系统。我们进行了一项实验,50名健康参与者执行一项类似冰壶的任务,他们必须拉伸虚拟弹簧以产生目标释放力,将石头推至冰面上预定义位置。在训练中,弹簧的力-伸长关系被建模为线性或非线性函数,即高斯或反对称高斯函数,在释放目标力处导数为零。结果表明,反对称高斯组在训练中始终比线性组获得更高的力精度,而高斯组仅在训练后期优于线性组。人格特质分析显示,在高斯动力学下,更高的自由精神得分与较差的表现和减少的任务探索相关,而更高的挑战转化得分与增加的探索相关。尽管存在这些训练效应,但在不同弹簧类型或人格特质之间,长期保留没有显著差异。参与者主要依赖学习到的目标伸长而非目标力,这通过在不同刚度但相同目标力的转移任务中的表现得以证实。虽然这些方法对体感神经康复有前景,但在对神经疾病患者进行测试之前,需要改进以减少对本体感觉线索的依赖。

英文摘要

Robotic haptic devices combined with virtual reality offer novel opportunities to train fine force generation, an essential yet overlooked component of post-stroke rehabilitation. This study proposes that manipulating the rendered dynamics of tangible virtual objects can be leveraged to train precise force control while engaging the somatosensory system. We conducted an experiment with fifty healthy participants who performed a curling-inspired task in which they had to stretch a virtual spring to generate a target release force to propel the stone to a predefined location on the ice sheet. During training, the spring's force-elongation relationship was modeled as either a linear or non-linear function, i.e., a Gaussian or antisymmetric Gaussian (AS-Gaussian) function with zero derivative at the release target force. Results indicate that the AS-Gaussian group consistently achieved higher force accuracy during training than the linear group, while the Gaussian group only outperformed the linear group toward the end of training. Analysis of personality traits revealed that higher Free Spirit scores were associated with poorer performance and reduced task exploration under Gaussian dynamics, whereas higher Transform-of-Challenge scores correlated with increased exploration. Despite these training effects, no significant differences in long-term retention were found across spring types or personality traits. Participants primarily relied on learned target elongation rather than target force, as evidenced by performance in a transfer task with a different stiffness but the same target force. While promising for somatosensory neurorehabilitation, these methods require refinement to reduce reliance on proprioceptive cues before testing with neurological patients.
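To make the "zero derivative at the target" property concrete, here is a small sketch comparing a linear spring with a Gaussian force-elongation profile that peaks at the target. The functional forms and all parameter values (`x_target`, `f_target`, `sigma`) are assumptions chosen for illustration; the abstract gives neither the exact equations nor the antisymmetric variant, which is therefore not sketched.

```python
import math


def linear_force(x, x_target, f_target):
    """Linear spring that yields f_target at elongation x_target."""
    return f_target * x / x_target


def gaussian_force(x, x_target, f_target, sigma):
    """Gaussian profile peaking at x_target, so dF/dx = 0 at the target."""
    return f_target * math.exp(-((x - x_target) ** 2) / (2.0 * sigma ** 2))


def slope_at(force, x, h=1e-6):
    """Central-difference estimate of dF/dx at elongation x."""
    return (force(x + h) - force(x - h)) / (2.0 * h)


# Hypothetical values: 10 cm target elongation, 5 N target force.
X_T, F_T, SIGMA = 0.10, 5.0, 0.03
overshoot = 0.01  # a 1 cm elongation error

linear_error = abs(linear_force(X_T + overshoot, X_T, F_T) - F_T)
gaussian_error = abs(gaussian_force(X_T + overshoot, X_T, F_T, SIGMA) - F_T)
gaussian_slope = slope_at(lambda x: gaussian_force(x, X_T, F_T, SIGMA), X_T)
```

Under these assumed numbers the same elongation error produces a smaller force error with the Gaussian profile, because the force curve is flat at the target.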

2605.26710 2026-05-27 cs.RO 版本更新

Look Further: Socially-Compliant Navigation System in Residential Buildings

看得更远:住宅楼中的社交合规导航系统

Akira Shiba, Marina Obata, Nathan Kau, Zoltan Beck, Rishi Shah, Michael Sudano, Sabrina Lee

发表机构 * Toyota Woven City(丰田织城)

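AI summary: Proposes the Proactive Lane-Changing (PLC) motion pattern, which extends the robot's reaction distance to over 8 meters to improve how people perceive its motion, and significantly improves safety, smoothness, and politeness in the straight-hallway scenario.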

Comments 2025 ACM/IEEE International Conference on Human-Robot Interaction

详情
Journal ref
2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Melbourne, Australia, 2025, pp. 272-282
AI中文摘要

移动机器人对人的反应距离强烈影响人机交互的多种品质。本文聚焦于移动配送机器人在住宅室内走廊环境中的导航。社交导航方法通常侧重于避免令人不适的人机交互,例如机器人侵入某人的个人空间。由于个人空间已被证明仅在几米范围内,社交导航方法通常侧重于解决这些短距离交互。然而,在本工作中,我们证明通过将反应距离扩展到超过8米(远超出典型交互距离),可以改善人类对机器人运动的感知。我们引入了主动变道(PLC)运动模式以及利用该模式在更远距离上对人做出反应的导航系统。该模式包括当机器人在走廊中从中心向侧面导航时,在距离迎面而来的人8米处改变其横向位置。我们进行了一项有42名参与者的用户研究,基于三个服务目标(安全性、流畅性和礼貌性)评估他们对配送机器人的印象。在直走廊场景(正面接近)中,结果显示与文献中典型的运动模式(减速、停止和在靠近人时反应性避障)相比,这三个目标均有显著改善。相比之下,在交叉口(盲角)场景中,没有任何一种方法显著优于其他方法,参与者对机器人运动模式的偏好各不相同。

英文摘要

The distance at which a mobile robot reacts to a person strongly impacts various qualities of the human-robot interaction. In this paper, we focus on the navigation of a mobile delivery robot platform in a residential indoor hallway environment. Social navigation methods typically focus on avoiding uncomfortable human-robot interactions, such as when a robot encroaches on someone's personal space. Since personal space has been shown to be in the range of just a few meters, social navigation methods typically focus on deconflicting and resolving these short-range interactions. In this work, however, we demonstrate that by extending the reaction distance to over eight meters, far beyond the typical interaction distance, we can improve the human's perception of the robot's motion. We introduce the Proactive Lane-Changing (PLC) motion pattern and a navigation system that leverages it to react to people at an increased distance. This pattern consists of changing the robot's lateral position as it navigates down the hallway from the center to the side at an eight-meter distance from an oncoming person. We conducted a user study with 42 participants to assess their impressions of the delivery robot based on three service objectives: safety, smoothness, and politeness. In the straight hallway scenario (Frontal Approach), results showed significant improvement in each of these three objectives compared to typical motion patterns found in the literature: slowing down, stopping, and reactive collision avoidance in the proximity of a person. In contrast, in the intersection (Blind Corner) scenarios, none of the approaches performed significantly better than any other, with participants having a diverse range of preferences among robot motion patterns.

2605.24465 2026-05-27 cs.RO 版本更新

Polymander II: an amphibious salamander-inspired robot with contact and flow sensors

Polymander II:一种带有接触和流量传感器的两栖蝾螈启发机器人

Qiyuan Fu, Sudong Lee, Andrea Grillo, Jonathan Arreguit, Louis Gevers, Josie Hughes, Auke J. Ijspeert

发表机构 * Biorobotics Laboratory, EPFL(生物机器人实验室,洛桑联邦理工学院) CREATE Lab, EPFL(CREATE实验室,洛桑联邦理工学院) Innobridge Services Sàrl(Innobridge Services公司)

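AI summary: Presents an amphibious robot that uses Hall-effect sensors to sense foot contact forces and lateral hydrodynamic forces, enabling environment sensing and feedback control on land and in water.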

Comments This work has been accepted for publication in the 2026 International Conference on Robotics and Automation (ICRA), Vienna, Austria

详情
AI中文摘要

机器人受益于感官信息来协调身体运动、增强对扰动的鲁棒性,并在不同模式间转换以适应各种地形。然而,很少有两栖机器人能够感知与陆地和水中环境的交互。在本文中,我们提出了一种解决方案,使用霍尔效应传感器来感知一种受蝾螈启发的两栖机器人的足部接触力和侧向水动力。通过两条总线,机器人可以同时以超过500 Hz的频率获取这些外部感受信息,并以100 Hz的频率获取本体感受信息,如关节位置和负载。所使用的霍尔效应传感器体积小巧,适合嵌入机器人多个位置,并且对小力具有高灵敏度。此外,由于传感器可以与测量对象分开放置,防水实现相对容易。我们的测试展示了机器人在穿越两栖环境方面的能力,以及其在利用反馈控制执行更复杂运动任务方面的潜力。

英文摘要

Robots benefit from sensory information to coordinate body movement, gain robustness against perturbations, and transition between different modes to adapt to various terrains. However, few amphibious robots can sense interactions with both terrestrial and aquatic environments. In this paper, we present a solution that uses Hall-effect sensors to sense foot contact forces and lateral hydrodynamic forces on a salamander-inspired amphibious robot. With two bus lines, the robot can simultaneously acquire this exteroceptive information at more than 500 Hz and proprioceptive information, such as joint positions and loads, at 100 Hz. The Hall-effect sensors used are compact, making them suitable for embedding in multiple positions within a robot, and exhibit high sensitivity to small forces. Moreover, because the sensor can be positioned separately from the measured object, waterproofing can be implemented with relative ease. Our tests demonstrate the robot's capabilities in traversing amphibious environments and its potential in using feedback control for more complex locomotion tasks.

2605.26682 2026-05-27 cs.RO cs.CV 版本更新

SteelDS: A High-Resolution Video Dataset of E40 Steel Scrap for Object Detection and Instance Segmentation

SteelDS: 用于目标检测和实例分割的E40钢废料高分辨率视频数据集

Melanie Neubauer, Christian Rauch, Gerald Koinig, Alexia Tischberger-Aldrian, Roland Pomberger, Elmar Rueckert

发表机构 * Chair of Cyber-Physical-Systems(信息物理系统教席) Technical University of Leoben(莱奥本技术大学) Chair of Waste Processing Technology and Waste Management(废物处理技术与废物管理教席)

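AI summary: The dataset provides high-resolution annotated video sequences of E40-grade steel and copper scrap on a conveyor belt to support the development of machine learning models for material classification, object detection, and instance segmentation.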

详情
AI中文摘要

该数据集提供了粉碎的E40级钢和铜废料在传送带上的高分辨率、标注视频序列。在受控实验室环境中捕获,数据反映了工业磁选后阶段,通常需要人工干预去除铜污染物。数据集包含五个子集的24,297个标注帧,包含396个钢和101个铜物体,按大小分类。它支持材料分类、目标检测和实例分割的机器学习模型开发。包含物体间距和密度的变化,以模拟真实的工业分拣条件。地面真值标注包括像素级分割掩码和材料类别。该数据集作为评估自动化分拣算法的基准,旨在识别复杂、异质钢废料流中的铜杂质。

英文摘要

This dataset provides high-resolution, annotated video sequences of shredded E40-grade steel and copper scrap on a conveyor belt. Captured in a controlled laboratory environment, the data reflects the industrial post-magnetic sorting stage, where manual intervention is typically required to remove copper contaminants. The dataset comprises 24,297 labeled frames across five subsets, featuring 396 steel and 101 copper objects categorized by size. It supports the development of machine learning models for material classification, object detection, and instance segmentation. Variations in object spacing and density are included to simulate realistic industrial sorting conditions. Ground truth annotations include pixel-wise segmentation masks and material classes. This dataset serves as a benchmark for evaluating automated sorting algorithms aiming to identify copper impurities within complex, heterogeneous steel scrap streams.

2605.26649 2026-05-27 cs.RO 版本更新

On the Generalization Capabilities, Design Choices and Limitations of Keypoint Imitation Learning

关键点模仿学习的泛化能力、设计选择与局限性

Thomas Lips, Marco Moletta, Michael C. Welle, Danica Kragic, Francis wyffels

发表机构 * AI and Robotics Lab, IDLAB-AIRO, Ghent University-imec(人工智能与机器人实验室,IDLAB-AIRO,根特大学-imec) Robotics, Perception and Learning Lab, (RPL), EECS, KTH Royal Institute of Technology(机器人、感知与学习实验室,(RPL),EECS,皇家理工学院) INCAR Robotics AB, Stockholm, Sweden(INCAR机器人公司,斯德哥尔摩,瑞典)

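AI summary: Using over 2000 real-world rollouts, this paper systematically evaluates the generalization capabilities, design choices, and limitations of keypoint imitation learning built on visual foundation models for robotic manipulation. It reports a 75% overall success rate, significantly better than the RGB baseline (47%) and on par with S2-diffusion (73%), while noting that the approach does not outperform alternative representations and inherits the limitations of the foundation models.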

Comments This version was submitted to IROS 2026

详情
AI中文摘要

基于RGB的模仿学习需要大量演示才能泛化到未见过的物体或场景,这促使研究人员探索中间表示以提高机器人操作的泛化能力。视觉基础模型能够通过一次提取关键点来提供这种表示。然而,如何最优地将它们整合到模仿学习中,以及它们何时优于其他表示,仍不清楚。我们结合了以往关键点模仿学习(KIL)研究中的方法,并研究了若干设计选择以提供实用指南。通过2000多次真实世界实验,我们还评估了KIL对未见物体和场景变化的泛化能力。KIL在五个任务上实现了75%的总体成功率,显著优于RGB基线(47%),并与S2扩散模型(73%)相当。最后,我们探讨了用于关键点提取的基础模型的局限性,并将KIL扩展到具有多个物体实例的任务。我们的结果证实KIL是一种数据高效的机器人学习方法,尽管它并未超越其他表示方法,并且继承了用于关键点提取的基础模型的局限性。所有实验视频、演示和结果均可在https://kil-manipulation.github.io/获取。

英文摘要

RGB-based imitation learning requires many demonstrations to generalize to unseen objects or scenes, motivating research into intermediate representations to improve generalization for robotic manipulation. Visual foundation models enable one-shot extraction of keypoints to provide such representation. However, it remains unclear how to integrate them into imitation learning optimally and when they outperform alternative representations. We combine approaches from previous works on keypoint imitation learning (KIL) and investigate several design choices to provide practical guidelines. Using over 2000 real-world rollouts, we also assess the generalization capabilities of KIL to unseen objects and scene variations. KIL achieves a 75% overall success rate across five tasks, significantly outperforming the RGB baseline (47%) and performing on par with S2-diffusion (73%). Finally, we explore the limitations of the foundation models used for keypoint extraction and extend KIL to tasks with multiple object instances. Our results confirm KIL as a data-efficient approach for robot learning, though it does not outperform alternative representations and inherits limitations of the foundation models used for keypoint extraction. All rollout videos, demonstrations, and results are available at https://kil-manipulation.github.io/.

2605.26648 2026-05-27 cs.RO 版本更新

L-Learning: A Lyapunov-Based Approach Leveraging Lagrangian Mechanics for Efficient and Stable Robot Tracking

L-Learning:一种基于李雅普诺夫与拉格朗日力学的机器人高效稳定跟踪方法

Quan Quan, Hao Li

发表机构 * School of Automation Science and Electrical Engineering, Beihang University(北京航空航天大学自动化科学与电气工程学院)

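AI summary: Proposes the L-Learning framework, which combines Lyapunov stability theory with Lagrangian mechanics and learns the system's energy function from data to achieve stable, sample-efficient robot trajectory tracking.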

Comments 9 pages, 4 figures, 4 tables

详情
AI中文摘要

本文提出L-Learning,一种新颖的机器人数据驱动控制框架,将李雅普诺夫稳定性理论与拉格朗日力学相结合,以增强轨迹跟踪性能。传统控制方法在动态和不确定环境中往往性能下降,而数据驱动方法虽然适应性更强,但常受限于高样本复杂度和缺乏严格的稳定性保证。L-Learning通过从数据中显式学习系统能量函数来缓解这些挑战,从而在确保闭环稳定性的同时优化性能。L-Learning具有优越的控制精度、理论稳定性保证和高样本效率,是实际机器人应用中有前景的解决方案。

英文摘要

This paper presents L-Learning, a novel data-driven control framework for robotics that integrates Lyapunov stability theory with Lagrangian mechanics to enhance trajectory tracking performance. Traditional control methods often suffer from performance degradation in dynamic and uncertain environments, whereas data-driven approaches, although more adaptable, are frequently limited by high sample complexity and a lack of rigorous stability guarantees. L-Learning mitigates these challenges by explicitly learning the system's energy function from data, thereby optimizing performance while intrinsically ensuring closed-loop stability. Characterized by superior control accuracy, theoretical stability guarantees, and high sample efficiency, L-Learning represents a promising solution for practical robotic applications.

2605.26638 2026-05-27 cs.RO 版本更新

HyperSim: A Holistic Sim-To-Real Framework For Robust Robotic Manipulation

HyperSim: 一种面向鲁棒机器人操作的整体仿真到现实框架

Junyi Dong, Haotian Luo, Ziwei Xu, Shengwei Bian, Heng Zhang, Sitong Mao, Jingyi Guo, Yang Xu, Wenhao Chen, Qiuyu Feng, Yao Mu, Ping Luo, Shunbo Zhou, Xiaodong Wu

发表机构 * CloudRobo Lab, Huawei Cloud Computing Technologies Co.,Ltd.(华为云计算技术有限公司云机器人实验室) Shanghai Jiao Tong University(上海交通大学) The University of Hong Kong(香港大学)

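AI summary: Proposes the HyperSim framework, which systematically narrows the sim-to-real domain gap through three pillars (high-fidelity environment synthesis, adversarial trajectory generation, and sim-and-real co-training) and achieves sim-to-real success rates of 80% with ACT and 95% with $\pi_0$ across 400 real-world task executions.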

Comments 9 pages, 8 figures

详情
AI中文摘要

扩展数据量和多样性对于泛化具身智能至关重要。虽然合成数据生成为昂贵的物理数据采集提供了一种可扩展的替代方案,但由于域差距,将机器人操作策略从仿真迁移到现实世界(仿真到现实)仍然是一个艰巨的挑战。本文提出了HyperSim,一个涵盖从合成数据生成到策略训练和无缝现实部署的整体框架。为了系统地弥合仿真到现实的差距,HyperSim通过三个核心支柱实现:高保真环境合成、对抗轨迹生成和仿真-现实联合训练。这些模块共同通过增强视觉保真度、扩展数据覆盖范围和强制域不变表示来解决域差异。我们通过一项大规模实证研究严格验证了HyperSim,该研究涉及两个代表性操作模型的400次真实世界任务执行。在三个细粒度指标上评估,我们的完整流程在ACT和π0上分别实现了80%和95%的显著仿真到现实成功率。此外,在我们的对抗轨迹上训练的策略对动态不确定性表现出显著增强的鲁棒性,在物理扰动下实现了35%更高的完成率。

英文摘要

Scaling data volume and diversity is critical for generalizing embodied intelligence. While synthetic data generation offers a scalable alternative to expensive physical data acquisition, transferring robotic manipulation policies from simulation to the real world (sim-to-real) remains a formidable challenge due to the domain gap. This paper presents HyperSim, a holistic framework spanning from synthetic data generation to policy training and seamless real-world deployment. To systematically bridge the sim-to-real gap, HyperSim is realized through three core pillars: high-fidelity environment synthesis, adversarial trajectory generation, and sim-and-real co-training. Collectively, these modules address domain discrepancies by enhancing visual fidelity, expanding data coverage, and enforcing domain-invariant representations. We rigorously validate HyperSim through a large-scale empirical study involving 400 real-world task executions across two representative manipulation models. Assessed across three fine-grained metrics, our complete pipeline achieves remarkable sim-to-real success rates of 80% and 95% with ACT and $\pi_0$, respectively. Furthermore, policies trained on our adversarial trajectories exhibit significantly enhanced robustness against dynamic uncertainties, achieving a 35% higher completion rate under physical perturbations.

2605.26637 2026-05-27 cs.RO 版本更新

Enabling Extensible Embodied Capabilities with Tools

利用工具实现可扩展的具身能力

Xueyang Zhou, Zijia Wang, Qianjiang Li, Yibo Hu, Guiyao Tie, Li Wan, Yidan Liu, Pan Zhou, Lichao Sun, Yongchao Chen

发表机构 * Huazhong University of Science and Technology(华中科技大学) Hebei University of Technology(河北工业大学) Tianjin University(天津大学) Lehigh University(里海大学) College of AI, Tsinghua University(清华大学人工智能学院)

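AI summary: Proposes externalizing capabilities as tools that are invoked dynamically through the standardized Embodied Tool Protocol (ETP). Across simulation and real-world platforms, this improves embodied performance on average by 31% on EB-ALFRED and 36% on EB-Navigation, with substantial gains for cognition and perception but limited gains for execution.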

Comments 51 pages, 20 figures

详情
AI中文摘要

大多数现有的具身智能方法将感知、推理、规划和控制统一在参数化策略中。然而,这些能力本质上是层次化和异质的,使得它们难以在单一模型中可靠地学习和模块化。我们提出了一种能力外部化方法,将异质能力解耦为独立优化的工具,在推理时动态调用。为此,我们引入了具身工具协议(ETP),一种用于具身工具注册、发现、调用和执行的标准化协议,并策划了100多个经过验证的工具,涵盖感知、认知、推理和执行,作为工具库。在此基础上,我们构建了EmbodiedToolBench,以评估工具增强是否提高了具身性能,以及当前模型在工具必要性识别、工具选择、工具执行和工具链组合方面的工具使用能力。在仿真和真实平台上的实验证实,能力外部化一致地提高了具身性能(在EB-ALFRED上平均提升31%,在EB-Navigation上平均提升36%),但揭示了一个明确的边界:在认知和感知方面增益显著,而在执行类能力方面增益有限。此外,我们的分析表明,知道何时、调用哪个以及如何调用工具仍然是所有模型面临的持续挑战,从而凸显了具身工具能力作为未来研究的关键方向。

英文摘要

Most existing embodied intelligence methods formulate perception, reasoning, planning, and control within a unified parameterized policy. Yet these capabilities are inherently hierarchical and heterogeneous, making them difficult to reliably learn and modularize within a single model. We propose a capability externalization approach that decouples heterogeneous capabilities into independently optimized tools, dynamically invoked at inference time. To this end, we introduce Embodied Tool Protocol (ETP), a standardized protocol for embodied tool registration, discovery, invocation, and execution, and curate 100+ validated tools spanning perception, cognition, reasoning, and execution as the tool base. Building on this, we construct EmbodiedToolBench to evaluate both whether tool augmentation improves embodied performance and how well current models use tools across tool-necessity recognition, tool selection, tool execution, and tool-chain composition. Experiments across simulation and real-world platforms confirm that capability externalization consistently improves embodied performance (avg. gain 31% on EB-ALFRED and 36% on EB-Navigation), yet reveal a clear boundary: gains are substantial for cognition and perception but are limited for execution-type capabilities. Moreover, our analysis reveals that knowing when, which, and how to invoke tools remains a persistent challenge across all models, thereby highlighting embodied tool competence as a critical direction for future research.

2605.26627 2026-05-27 eess.SY cs.RO cs.SY 版本更新

Breaking the Epistemic Trap: Active Perception Under Compound Uncertainty

打破认知陷阱:复合不确定性下的主动感知

Chayan Banerjee, Ethan Goan

发表机构 * School of Electrical Engineering and Robotics(电气工程与机器人学学院)

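AI summary: To address reinforcement learning failures in safety-critical domains caused by coupled state-dynamics uncertainty, proposes an Adaptive Safety Architecture built on a mutual-information-based Compound Uncertainty Coefficient and active information-seeking policies.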

详情
AI中文摘要

在安全关键领域部署强化学习,从自动驾驶到医疗决策支持,受到系统遇到不熟悉条件时出现的失败的限制。我们认为,根本瓶颈不是单个挑战,如变化的动力学或不完整的观测,而是它们的协同交互,我们称之为认知陷阱:代理无法在不知道系统动力学的情况下估计其状态,也无法在没有准确状态信息的情况下学习动力学。在模拟运动中的概念验证实验表明,结合这些不确定性导致的失败远严重于单独挑战,性能下降77%,而单独效应相加为46%,展示了传统方法忽略的复合失败模式。这些方法采用被动的认知立场,无法解决这种耦合的不确定性。我们提出将安全重新定义为信息问题,引入一个适应性安全架构,围绕三个贡献构建:复合不确定性系数(κ),一种基于互信息的度量,量化状态-动力学耦合,可在线上计算而无需完整的联合信念推断;由MaxInfoRL目标驱动的信息寻求策略,主动探测系统动力学;以及随认知耦合上升而收紧的机制自适应安全约束。这种范式转变,从被动鲁棒性到主动感知,为在不确定性下运行、识别自身无知并战略性地采取行动解决它的决策系统提供了原则性路径。

英文摘要

Deploying reinforcement learning in safety-critical domains, from autonomous vehicles to medical decision support, is constrained by failures that arise when systems encounter unfamiliar conditions. We argue that the fundamental bottleneck is not individual challenges like changing dynamics or incomplete observations, but their synergistic interaction, which we term the Epistemic Trap: agents cannot estimate their state without knowing system dynamics, nor learn dynamics without accurate state information. Proof-of-concept experiments in simulated locomotion reveal that combining these uncertainties causes failures far worse than either challenge alone: a 77% performance degradation versus the 46% obtained by adding the individual effects, demonstrating compounding failure modes that conventional methods overlook. Such approaches adopt a passive epistemic stance that cannot resolve this coupled uncertainty. We propose reframing safety as an information problem, introducing an Adaptive Safety Architecture built around three contributions: the Compound Uncertainty Coefficient ($\kappa$), a mutual-information-based metric that quantifies state-dynamics coupling and is computable online without full joint belief inference; information-seeking policies governed by a MaxInfoRL objective that actively probe system dynamics; and regime-adaptive safety constraints that tighten as epistemic coupling rises. This paradigm shift, from passive robustness to active perception, offers a principled path toward decision-making systems that operate under uncertainty, recognize their own ignorance, and act strategically to resolve it.
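The abstract does not define the Compound Uncertainty Coefficient beyond calling it mutual-information based, so the sketch below only illustrates the underlying quantity: the mutual information between two discrete variables, here standing in for a state estimate and a dynamics hypothesis. It is not the paper's $\kappa$. The helper `compounding_excess` simply restates the reported 77% versus 46% comparison; both function names are assumptions.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


def compounding_excess(combined_drop, additive_drop):
    """Percentage points by which the combined drop exceeds the additive one."""
    return combined_drop - additive_drop


independent = [[0.25, 0.25], [0.25, 0.25]]  # beliefs carry no shared information
coupled = [[0.5, 0.0], [0.0, 0.5]]          # knowing one variable fixes the other
```

With the figures quoted in the abstract, `compounding_excess(77, 46)` gives 31 percentage points of degradation that the additive picture does not explain.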

2605.26478 2026-05-27 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

基于随机解耦策略梯度的高效在策略视觉强化学习

Haoxiang You, Yilang Liu, Davis Zong, Qian Wang, Teeratham Vitchutripop, Qi Wang, Daniel Rakita, Ian Abraham

发表机构 * Yale University(耶鲁大学) Shanghai Jiao Tong University(上海交通大学) University of Sydney(悉尼大学)

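AI summary: Proposes the stochastic decoupled policy gradient (SDPG), which estimates policy gradients from random perturbations of trajectory rollouts, trains diverse visuomotor control policies end-to-end within a few hours on a single GPU, substantially reduces compute and memory overhead, and outperforms baselines on visual MuJoCo benchmarks.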

详情
AI中文摘要

我们提出了随机解耦策略梯度(SDPG),一种轻量级的视觉强化学习方法,能够在单个NVIDIA RTX 4080 GPU上在数小时内端到端训练多样化的视觉运动控制策略。SDPG通过轨迹滚动的随机扰动估计策略梯度,所需批量渲染环境数量减少几个数量级,并显著降低计算和内存开销。在视觉MuJoCo基准测试中,SDPG在训练时间、内存使用和奖励方面始终优于基线方法。最后,为支持未来研究,我们引入了一套涵盖灵巧操作、具有挑战性的运动控制的逼真视觉机器人基准测试,并在物理硬件上展示了有效的仿真到现实迁移。

英文摘要

We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning (RL) method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, to support future research, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and demonstrate effective sim-to-real transfer on physical hardware.
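The abstract does not spell out how SDPG perturbs rollouts, so the following is a generic zeroth-order (perturbation-based) gradient estimator, shown only to illustrate the idea of estimating a gradient from randomly perturbed evaluations. It perturbs parameters with antithetic Gaussian noise and is not the SDPG algorithm; `toy_return` is a hypothetical stand-in for a rollout return.

```python
import random


def perturbation_gradient(objective, theta, sigma=0.1, n_samples=2000, seed=0):
    """Estimate the gradient of objective at theta from antithetic perturbations."""
    rng = random.Random(seed)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = objective([t + sigma * e for t, e in zip(theta, eps)])
        minus = objective([t - sigma * e for t, e in zip(theta, eps)])
        # Central difference along the random direction eps.
        scale = (plus - minus) / (2.0 * sigma)
        for i in range(dim):
            grad[i] += scale * eps[i] / n_samples
    return grad


def toy_return(theta):
    """Hypothetical return: highest when theta equals a fixed target."""
    target = [1.0, -2.0, 0.5]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


estimate = perturbation_gradient(toy_return, [0.0, 0.0, 0.0])
```

For this quadratic toy objective the true gradient at the origin is `[2.0, -4.0, 1.0]`, and the estimate approaches it as `n_samples` grows. The estimator needs only function evaluations, which is why such schemes can avoid backpropagating through a renderer or simulator.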

2605.26471 2026-05-27 cs.RO 版本更新

Heterogeneous AAV Logistics Task Allocation: A Reinforcement Learning Enhanced Overlapping Coalition Formation Game Approach

异构AAV物流任务分配:一种强化学习增强的重叠联盟形成博弈方法

Yuze Zhou, Jingliang Sun, Junzhi Li, Jianxin Zhong, Zihan Wang, Teng Long

发表机构 * Beijing Institute of Technology(北京理工大学) Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education(教育部飞行器动力学与控制重点实验室)

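AI summary: To address the optimality challenge that stochastically arriving time-sensitive tasks pose for heterogeneous AAV task allocation in dynamic urban logistics, proposes an overlapping coalition formation game enhanced by a transformer-based soft actor-critic network, which improves task-allocation optimality and is proven to converge to a Nash-stable equilibrium.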

Comments 12 pages

详情
AI中文摘要

在动态城市物流中,时间敏感任务的随机出现对异构AAV物流任务分配提出了显著的最优性挑战。为解决这一问题,提出了一种强化学习增强的重叠联盟形成博弈方法。建立了动态任务分配模型,其中全局最优性通过耦合服务质量与资源消耗的广义物流成本进行数学量化。为应对随机订单到达引起的时变任务集,设计了一种基于Transformer的软演员-评论家网络。通过利用多头自注意力编码可变长度的物流状态并捕获任务间的时空依赖关系,学习到的策略自适应地指导联盟更新,取代重叠联盟形成博弈中的启发式规则。在此基础上,异构AAV可以为动态物流任务形成更高效的重叠联盟。所得到的联盟形成过程被证明构成一个精确势博弈,保证了在有限迭代次数内收敛到纳什稳定均衡。数值仿真表明,所提算法在广义物流成本准则下有效提高了任务分配的最优性。在32架AAV和80个任务的场景中,与启发式OCF基线相比,我们的算法实现了39.76%的成本降低。室内飞行实验进一步验证了其实用性。

英文摘要

In dynamic urban logistics, the stochastic emergence of time-sensitive tasks poses a significant optimality challenge for heterogeneous AAV logistics task allocation. To address this problem, a reinforcement learning enhanced overlapping coalition formation game approach is proposed. A dynamic task allocation model is established, where global optimality is mathematically quantified by a generalized logistics cost coupling service quality and resource consumption. To deal with the time-varying task sets induced by stochastic order arrivals, a transformer-based soft actor-critic network is designed. By leveraging multi-head self-attention to encode variable-length logistics states and capture task-wise spatiotemporal dependencies, the learned policy adaptively guides coalition updates, replacing heuristic rules in the overlapping coalition formation game. On this basis, heterogeneous AAVs can form more efficient overlapping coalitions for dynamic logistics tasks. The resulting coalition formation process is proven to constitute an exact potential game, which guarantees convergence to a Nash-stable equilibrium within a finite number of iterations. Numerical simulations demonstrate that the proposed algorithm effectively improves the optimality of task allocation under the generalized logistics cost criterion. In a scenario with 32 AAVs and 80 tasks, our algorithm achieves a 39.76% cost reduction compared with the heuristic OCF baseline. Indoor flight experiments further validate its practicality.

2605.26349 2026-05-27 cs.RO 版本更新

Closing the Loop in Teleoperation: Episode-Level Data Quality Assessment and Feedback for High-Quality Demonstration Collection

在遥操作中闭环:面向高质量演示收集的片段级数据质量评估与反馈

Gokul Narayanan, Yash Shahapurkar, Melih Erdogan, Brian Zhu, Eugen Solowjow

发表机构 * Siemens Corporation(西门子公司)

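AI summary: Proposes a Data Quality Assessment and Feedback framework that gives immediate post-episode feedback grounded in semantic task progress and robot telemetry, helping novice operators improve demonstration quality.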

详情
AI中文摘要

工业自动化正处于关键时刻,物理AI正推动从刚性、手工设计的自动化系统向更灵活、自适应的系统转变。这一转变产生了对大规模、真实世界机器人演示数据的需求,使得遥操作成为越来越重要的数据收集机制。然而,在实践中,高质量的遥操作演示仍然难以获得,因为新手操作员经常产生任务成功但下游使用次优的片段,原因包括低效运动、重复修正或接近机器人关节极限操作。我们提出一个数据质量评估与反馈(DQAF)框架,通过提供基于语义任务进度和机器人遥测的即时后片段反馈,在遥操作中实现闭环。该框架提取质量相关信号,如子任务进度、运动平滑度、停顿、运动学极限,并将其转化为结构化质量评估和可操作的自然语言反馈。与二元成功或失败反馈不同,所提系统解释了片段为何次优,并突出显示下次试验中需要纠正的具体行为。我们通过诊断验证研究和试点用户研究评估该框架。在验证研究中,系统在数据集整理过程中与人类评审员进行比较,产生拒绝原因和可操作的改进反馈。在涉及三个新手操作员的两项操作任务的试点研究中,接收系统即时自动后片段反馈的操作员比未接收的改进更快,更早产生更高质量的演示。

英文摘要

Industrial automation is at a pivotal moment, as Physical AI is driving a transition from rigid, hand-engineered automation systems toward more flexible and adaptive systems. This shift has created a growing demand for large-scale, real-world robot demonstration data, making teleoperation an increasingly important mechanism for data collection. However, high-quality teleoperated demonstrations remain difficult to obtain in practice, as novice operators often produce episodes that are task-successful but suboptimal for downstream use due to inefficient motion, repeated corrections, or operation near robot joint limits. We present a Data Quality Assessment and Feedback (DQAF) framework that closes the loop in teleoperation by providing immediate post-episode feedback grounded in semantic task progress and robot telemetry. The framework extracts quality-relevant signals such as sub-task progress, motion smoothness, stalls, and kinematic limits, and converts them into structured quality assessments and actionable natural-language feedback. Unlike binary success or failure feedback, the proposed system explains why an episode is suboptimal and highlights specific behaviors to correct in the next trial. We evaluate the framework through a diagnostic validation study and a pilot user study. In the validation study, the system is compared with a human reviewer during dataset curation, producing rejection reasons and actionable feedback for improvement. In the pilot study with three novice operators across two manipulation tasks, the operator who received the system's immediate, automated post-episode feedback improved faster than those who did not, producing higher-quality demonstrations sooner.
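As a rough illustration of turning telemetry into quality signals and feedback, the sketch below computes stall counts, joint-limit proximity, and a crude smoothness proxy from a joint-position log, then maps them to short messages. The function names, thresholds, and message wording are all hypothetical; the abstract does not describe how DQAF computes its signals, and the semantic sub-task progress signal is not covered here.

```python
def quality_signals(joint_positions, dt, joint_limits,
                    stall_speed=1e-3, limit_margin=0.05):
    """Per-episode signals from a list of per-timestep joint angles (rad)."""
    n_joints = len(joint_positions[0])
    # Finite-difference velocities and accelerations.
    velocities = [[(b[j] - a[j]) / dt for j in range(n_joints)]
                  for a, b in zip(joint_positions, joint_positions[1:])]
    accelerations = [[(b[j] - a[j]) / dt for j in range(n_joints)]
                     for a, b in zip(velocities, velocities[1:])]
    # A step counts as a stall when every joint is (almost) motionless.
    stall_steps = sum(1 for v in velocities
                      if max(abs(x) for x in v) < stall_speed)
    # A step counts as near a limit when any joint is within the margin.
    near_limit_steps = sum(
        1 for q in joint_positions
        if any(q[j] - lo < limit_margin or hi - q[j] < limit_margin
               for j, (lo, hi) in enumerate(joint_limits)))
    squared = [x * x for a in accelerations for x in a]
    mean_sq_acc = sum(squared) / len(squared) if squared else 0.0
    return {"stall_steps": stall_steps,
            "near_limit_steps": near_limit_steps,
            "mean_sq_acc": mean_sq_acc}  # lower means smoother motion


def feedback(signals, max_stalls=1, max_near_limit=0):
    """Turn signals into short, actionable notes for the operator."""
    notes = []
    if signals["stall_steps"] > max_stalls:
        notes.append("Reduce pauses: the arm stalled for %d steps."
                     % signals["stall_steps"])
    if signals["near_limit_steps"] > max_near_limit:
        notes.append("Keep clear of joint limits: %d steps were within the margin."
                     % signals["near_limit_steps"])
    return notes or ["No issues flagged."]
```

A real system would calibrate these thresholds per robot and task; the point here is only that each note names a specific behavior rather than a bare success or failure flag.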

2605.26348 2026-05-27 cs.RO 版本更新

RCSP: Risk-Sensitive Conjectural Scenario Planning for Safe Dynamic Robot Navigation

RCSP: 面向安全动态机器人导航的风险敏感推测性场景规划

Zhengye Han, Quanyan Zhu

发表机构 * Department of Electrical and Computer Engineering(电气计算机工程系)

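AI summary: Proposes Risk-Sensitive Conjectural Scenario Planning (RCSP), which maintains a lightweight belief, samples future interactions, penalizes high-risk tails, and executes through a local safety check to address the predictive near-miss commitment problem for mobile robots among dynamic obstacles; simulations show improved safety and path quality.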

详情
AI中文摘要

移动机器人在碰撞之前就可能失败:当前安全的速度可能使机器人陷入即将被移动障碍物关闭的通道。我们研究了这种预测性近碰撞承诺问题,并提出了风险敏感推测性场景规划(RCSP),这是一个规划层,它根据合理的短视障碍物未来对候选命令进行评估。RCSP维护一个关于局部运动推测的轻量级信念,采样未来交互,惩罚高风险尾部,并通过局部安全检查执行。在受控的MuJoCo瓶颈任务中,RCSP规划器无碰撞地到达目标,并且与非自适应预测器相比,提供了更高的次要安全性和路径质量点估计,但增加了延迟。在ROS2/Gazebo中,将局部安全层添加到标准Nav2堆栈可减少动态近碰撞失败。在官方DynaBARN/Jackal迁移中,调整后的DWA和TEB在严格的基准成功率上仍然更强,揭示了该方法的边界。这些仿真结果将RCSP定位为一个预测风险模块,在动态瓶颈机制中补充现有的导航堆栈。

英文摘要

Mobile robots can fail before they collide: a velocity that is safe now may commit the robot to a passage that moving obstacles will soon close. We study this predictive near-miss commitment problem and propose Risk-Sensitive Conjectural Scenario Planning (RCSP), a planning layer that evaluates candidate commands against plausible short-horizon obstacle futures. RCSP maintains a lightweight belief over local motion conjectures, samples future interactions, penalizes high-risk tails, and executes through a local safety check. In controlled MuJoCo bottleneck tasks, the RCSP planner reaches the goal without collisions and yields higher secondary safety and path-quality point estimates than a non-adaptive predictor, with additional latency. In ROS2/Gazebo, adding the local safety layer to a standard Nav2 stack reduces dynamic near-miss failures. On official DynaBARN/Jackal transfer, tuned DWA and TEB remain stronger on strict benchmark success, revealing the boundary of the approach. These simulation results position RCSP as a predictive-risk module that complements existing navigation stacks in dynamic bottleneck regimes.
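One common way to "penalize high-risk tails" over sampled futures is conditional value at risk (CVaR), the mean of the worst fraction of sampled costs. The abstract does not say which tail measure RCSP uses, so the sketch below is an assumption-laden illustration; `cvar`, `pick_command`, `risk_weight`, and the toy cost samples are all hypothetical.

```python
import math


def cvar(costs, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of sampled costs."""
    ordered = sorted(costs)
    # Rounding guards against floating-point noise before taking the ceiling.
    k = max(1, math.ceil(round(len(ordered) * (1.0 - alpha), 9)))
    tail = ordered[-k:]
    return sum(tail) / len(tail)


def pick_command(candidates, sample_costs, risk_weight=1.0, alpha=0.9):
    """Choose the command minimizing mean cost plus a weighted tail penalty."""
    def score(command):
        costs = sample_costs(command)
        return sum(costs) / len(costs) + risk_weight * cvar(costs, alpha)
    return min(candidates, key=score)


# Toy sampled futures: "fast" is cheaper on average but has one bad outcome.
SAMPLES = {
    "fast": [1.0] * 9 + [15.0],
    "slow": [3.0] * 10,
}
risk_neutral = pick_command(["fast", "slow"], SAMPLES.get, risk_weight=0.0)
risk_averse = pick_command(["fast", "slow"], SAMPLES.get, risk_weight=1.0)
```

In this toy case the risk-neutral choice is `"fast"` and the tail-penalized choice is `"slow"`, which mirrors the abstract's point that a command that looks safe on average can still commit the robot to a bad outcome.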

2605.26330 2026-05-27 cs.RO 版本更新

NightSight: Passive Computation for Navigation in Dark Using Events

NightSight:利用事件在黑暗中进行被动计算导航

Deepak Singh, Brijan Vaghasiya, Shreyas Khobragade, Nitin Sanket

发表机构 * NVIDIA

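AI summary: Proposes a lightweight perception approach that combines a monocular event camera, a coded aperture lens, and an infrared dot projector, with a convolutional neural network decoding depth-dependent blur signatures into dense depth maps, enabling real-time navigation of small aerial robots in complete darkness.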

Comments 6 pages, 7 figures

详情
AI中文摘要

小型空中机器人由于其敏捷性、低成本以及在大型平台无法进入的杂乱空间中穿行的能力,特别适合在受限和危险环境中进行搜索和救援。然而,在完全黑暗中实现自主导航仍然是一个重大挑战,因为小型空中机器人难以容纳需要大量载荷、功率或计算的感知系统。在这项工作中,我们提出了一种轻量级感知方法,结合单目事件相机、编码孔径镜头和红外点投影仪,以实现在此类条件下的导航。通过编码孔径成像的投影图案会产生深度相关的模糊签名,隐式编码场景几何。我们训练了一个卷积神经网络,仅使用从简单平面墙设置生成的合成数据来将这些签名解码为密集深度图。尽管训练条件有限,该模型能零样本泛化到复杂的真实场景。我们的系统在NVIDIA Jetson Orin Nano上以20 Hz实时运行,展示了其对资源受限平台的适用性。我们进一步分析了不同编码孔径设计对深度估计性能的影响。我们的方法在2.5米范围内实现了高精度(l1误差7.0厘米,2.80%误差)。这些结果突显了结合结构光照明、编码光学和事件传感在完全黑暗中实现鲁棒感知和导航的潜力。

英文摘要

Small aerial robots are particularly well-suited for search and rescue in confined and hazardous environments due to their agility, low cost, and ability to traverse cluttered spaces that are inaccessible to larger platforms. However, enabling autonomous navigation in complete darkness remains a significant challenge, because small aerial robots cannot easily accommodate perception systems that demand substantial payload, power, or computation. In this work, we present a lightweight perception approach that combines a monocular event camera, a coded aperture lens, and an infrared dot projector to enable navigation in such conditions. The projected pattern, when imaged through the coded aperture, produces depth-dependent blur signatures that implicitly encode scene geometry. We train a convolutional neural network to decode these signatures into dense depth maps using only synthetic data generated from a simple planar wall setup. Despite this minimal training regime, the model generalizes zero-shot to complex real-world scenes. Our system operates in real time at 20 Hz on an NVIDIA Jetson Orin Nano, demonstrating suitability for resource-constrained platforms. We further analyze the impact of different coded aperture designs on depth estimation performance. Our approach achieves high accuracy (L1 error of 7.0 cm, 2.80% error) up to a range of 2.5 m. These results highlight the potential of combining structured illumination, coded optics, and event-based sensing for enabling robust perception and navigation in complete darkness.

2605.26286 2026-05-27 cs.MA cs.AI cs.RO 版本更新

Decoupled Delay Compensation: Enhancing Pre-trained MARL Policies via Learned Dynamics Filtering

解耦延迟补偿:通过学习的动力学过滤增强预训练的多智能体强化学习策略

Maxim Mednikov, Oren Gal

发表机构 * University of Haifa(海法大学)

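AI summary: To counter the performance degradation of multi-agent reinforcement learning under stale observations and communication delays, proposes a modular execution-stage state-estimation layer that uses a learned gated transition model and recursive Kalman filtering to estimate the current state from asynchronous measurements. It plugs into pre-trained policies and markedly improves robustness to communication latency and packet loss.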

Comments 8 pages, 7 figures

详情
AI中文摘要

现实世界中的多智能体强化学习系统通常必须在过时观测、随机通信延迟和间歇性丢包下运行。在理想同步条件下训练的策略在这些场景中常常表现出显著的性能下降,因为它们基于过时的反馈行动。我们提出了一种模块化的执行阶段状态估计层,用当前信念状态估计替换延迟的通信观测。该框架将学习的门控转移模型与递归卡尔曼滤波层相结合,从异步测量中估计瞬时状态。该方法的一个主要优势是其模块性:估计器作为预训练策略的即插即用模块,无需修改原始MARL训练算法、架构或奖励结构。在多种多智能体和连续控制基准上的评估表明,所提出的层持续增强了对通信延迟和消息丢失的鲁棒性。在协调密集和动态不稳定的任务中观察到最显著的性能提升,这些任务中时间一致性对控制至关重要。

英文摘要

Real-world multi-agent reinforcement learning (MARL) systems must often operate under stale observations, stochastic communication delays, and intermittent packet loss. Policies trained under idealized synchronous conditions frequently exhibit significant performance degradation in these regimes because they act on outdated feedback. We propose a modular execution-stage state-estimation layer that replaces delayed communicated observations with current belief-state estimates. The framework integrates a learned gated transition model with a recursive Kalman filtering layer to estimate instantaneous states from asynchronous measurements. A primary advantage of this approach is its modularity: the estimator serves as a plug-in for pre-trained policies, requiring no modifications to the original MARL training algorithm, architecture, or reward structure. Evaluation across diverse multi-agent and continuous-control benchmarks demonstrates that the proposed layer consistently enhances robustness to communication latency and message loss. The most significant performance gains are observed in coordination-intensive and dynamically unstable tasks where temporal consistency is critical for control.
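The estimation idea in this abstract can be illustrated with a minimal Kalman filter sketch. It is not the authors' implementation: the learned gated transition model is replaced here by a fixed linear matrix `A`, and the toy constant-velocity system, the noise matrices, and the packet-loss pattern are all assumptions chosen for illustration. When no packet arrives, the filter simply keeps the model prediction as the current belief.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update step; z=None models a dropped or not-yet-arrived packet."""
    # Predict with the transition model (a fixed linear A here, an assumption;
    # the abstract describes a learned gated model instead).
    x = A @ x
    P = A @ P @ A.T + Q
    if z is None:
        return x, P  # no measurement: the prediction is the current belief
    # Standard Kalman measurement update.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy constant-velocity system: state = [position, velocity], position is measured.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[1e-2]])

x_est, P_est = np.zeros(2), np.eye(2)
true_x = np.array([0.0, 1.0])
for t in range(50):
    true_x = A @ true_x
    z = None if t % 3 == 0 else H @ true_x  # every third packet is lost
    x_est, P_est = kalman_step(x_est, P_est, z, A, H, Q, R)
```

A policy that expects synchronous observations could then be fed `x_est` instead of the last received (stale) measurement, which is the plug-in role the abstract describes.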

2605.26284 2026-05-27 cs.RO 版本更新

PhyPush: One Push is All You Need for Sensorless Physical Property Estimation with Physics-Guided Transformers

PhyPush:基于物理引导的Transformer,一次推动即可实现无传感器物理属性估计

Koyo Fujii, Luis Figueredo, Praminda Caleb-Solly, Ivan Boschi, Edoardo Ida', Marco Carricato, Aly Magassouba

发表机构 * School of Computer Science, University of Nottingham(诺丁汉大学计算机科学学院) Dept. of Industrial Engineering, University of Bologna(博洛尼亚大学工业工程系)

AI总结 提出PhyPush框架,利用物理引导的Transformer从单次推动的末端执行器速度估计物体质量和摩擦系数,通过牛顿第二定律和库仑摩擦模型约束提升物理一致性和泛化能力。

Comments Submitted to 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems

详情
AI中文摘要

准确估计物体质量和摩擦是实现可靠自适应机器人操作的基础。尽管交互感知为推断此类属性提供了强大机制,但现有方法大多依赖力/力矩传感器、触觉阵列或多相机运动捕捉系统等专用硬件,限制了可扩展性和部署。本文提出PhyPush,一种物理引导的Transformer框架,仅使用单次推动中运动学推导的末端执行器速度来估计物体的质量和摩擦系数。这只需要标准机械臂上通常可获得的数据。该模型通过物理引导损失融入牛顿第二定律和库仑摩擦模型的约束,提高了物理一致性以及对未见物体和表面的泛化能力。在多样化的仿真和真实世界设置中,PhyPush在具有挑战性的域外条件下始终能实现更准确的质量和摩擦估计。在仿真中,与能够获取完整力信息的基线相比,误差降低超过10%;在真实世界实验中,其表现优于数据驱动损失方法。总体而言,结果表明物理引导学习能够仅依赖单次推动和现成的运动学数据,实现低成本、传感器高效的物理属性估计。

英文摘要

Accurately estimating object mass and friction is fundamental to achieving reliable and adaptive robotic manipulation. Although interactive perception provides a powerful mechanism for inferring such properties, most existing approaches depend on specialized hardware such as force/torque sensors, tactile arrays, or multi-camera motion-capture systems, limiting scalability and deployment. This paper presents PhyPush, a physics-guided Transformer framework that estimates an object's mass and friction coefficient using only kinematically derived end-effector velocity from a single push. This requires only data that are typically available on standard robotic arms. The model incorporates constraints from Newton's second law and the Coulomb friction model through a physics-guided loss, improving physical consistency and generalization to unseen objects and surfaces. Across diverse simulation and real-world setups, PhyPush consistently achieves more accurate mass and friction estimation in challenging out-of-domain conditions. In simulation, it reduces error by over 10% compared with a baseline that has privileged access to full force information, while in real-world experiments, it outperforms a data-driven loss approach. Overall, the results demonstrate that physics-guided learning can enable low-cost, sensor-efficient estimation of physical properties, relying solely on a single push and readily available kinematic data.

2605.26155 2026-05-27 cs.RO cs.AI cs.LG 版本更新

When Does Adaptive Guidance Help? Belief-Aware Privileged Distillation for Autonomous Driving Under Partial Observability

自适应引导何时有帮助?部分可观测条件下自动驾驶的信念感知特权蒸馏

Mehmet Haklidir

发表机构 * TUBITAK BILGEM Artificial Intelligence Institute(土耳其TUBITAK BILGEM人工智能研究所)

AI总结 本文提出信念感知GSAC(BA-GSAC),通过集成分歧动态调节蒸馏系数,系统研究自适应引导在部分可观测自动驾驶中的有效性,发现严重遮挡下系数过早崩溃,并揭示可观测性盲区问题。

Comments 9 pages, 3 figures, 7 tables. Accepted at CVPR 2026 Workshop on Autonomous Driving (WAD)

详情
AI中文摘要

引导软演员-评论家(GSAC)将来自特权全状态教师的知识蒸馏给部分可观测的学生,用于自动驾驶,但使用固定的蒸馏系数λ,而不考虑智能体的不确定性。我们提出信念感知GSAC(BA-GSAC),通过集成分歧调节λ,并将其作为系统实证研究的测试平台,探究:自适应引导何时真正有帮助?在Highway-Env上评估五种策略(固定λ∈{0.01, 0.1}、自适应、线性衰减和普通SAC)在三个POMDP难度级别下,我们发现初步的单种子运行表明在轻度和中度部分可观测性下有收益,但在严重遮挡下(所有方法使用3个种子评估),自适应系数在大约3K步内坍缩到λ_min。我们将其归因于可观测性盲区现象:由于集成预测部分观测,即使在严重遮挡下也能达到低分歧,建模了可见部分但无法检测缺失部分。我们诊断了根本原因并提出了架构修复(使用引导演员的特权访问在完整状态预测上训练集成);虽然此处未验证,但我们表明即使存在当前限制,预热阶段也提供了可测量的稳定性(CV=13.3% vs. 常数λ=0.01的29.8%)。实际上,简单的确定性线性衰减计划在所有指标上实现了最佳的严重POMDP性能(均值116.5,CV=8.9%),表明稳定性收益来自调度效应而非集成。这些发现为设计不确定性感知的师生框架提供了实用指导,并强调了集成预测目标是一个重要的设计选择。

英文摘要

Guided Soft Actor-Critic (GSAC) distills knowledge from a privileged full-state teacher to a partial-observation student for autonomous driving, but uses a fixed distillation coefficient lambda regardless of the agent's uncertainty. We present Belief-Aware GSAC (BA-GSAC), which modulates lambda via ensemble disagreement, and use it as a testbed for a systematic empirical study asking: when does adaptive guidance actually help? Evaluating five strategies (fixed lambda in {0.01, 0.1}, adaptive, linear decay, and vanilla SAC) across three POMDP difficulty levels on Highway-Env, we find that preliminary single-seed runs suggest benefits under mild and moderate partial observability, but under severe occlusion (evaluated with 3 seeds for all methods) the adaptive coefficient collapses to lambda_min within about 3K steps. We trace this to an observability blindness phenomenon: because the ensemble predicts partial observations, it achieves low disagreement even under heavy occlusion, modeling what is visible but unable to detect what is missing. We diagnose the root cause and propose an architectural fix (training the ensemble on full-state predictions using the guiding actor's privileged access); while not validated here, we show that even with current limitations, the warmup phase provides measurable stabilization (CV=13.3% vs. 29.8% for constant lambda=0.01). In fact, a simple deterministic linear decay schedule achieves the best severe-POMDP performance across all metrics (mean 116.5, CV=8.9%), suggesting that the scheduling effect, not the ensemble, drives the stability benefit. These findings provide practical guidance for designing uncertainty-aware teacher-student frameworks and highlight ensemble prediction targets as an important design choice.
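The three quantities discussed in this abstract (a disagreement-driven coefficient, a linear decay schedule, and the coefficient of variation, CV) can be sketched as follows. The abstract does not give the exact mapping from ensemble disagreement to lambda, so the `tanh` squashing, the `scale` parameter, and the default bounds are assumptions made only for illustration.

```python
import numpy as np

def adaptive_lambda(ensemble_preds, lam_min=0.01, lam_max=0.1, scale=1.0):
    """Map ensemble disagreement to a coefficient in [lam_min, lam_max].

    ensemble_preds: array of shape (n_members, dim), one prediction per member.
    The tanh mapping is a hypothetical choice, not taken from the paper.
    """
    # Disagreement: mean per-dimension standard deviation across members.
    disagreement = float(np.mean(np.std(ensemble_preds, axis=0)))
    weight = np.tanh(disagreement / scale)  # higher disagreement, more guidance
    return lam_min + (lam_max - lam_min) * weight

def linear_decay_lambda(step, total_steps, lam_start=0.1, lam_end=0.01):
    """Deterministic schedule that ignores the ensemble entirely."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lam_start + (lam_end - lam_start) * frac

def coefficient_of_variation(returns):
    """CV = population standard deviation divided by the mean."""
    returns = np.asarray(returns, dtype=float)
    return float(np.std(returns) / np.mean(returns))

# If all members agree (for example because they only model the visible part
# of the scene), disagreement is zero and the coefficient sits at lam_min.
agreeing = np.tile(np.array([0.3, -0.2, 0.5]), (5, 1))
lam_collapsed = adaptive_lambda(agreeing)
```

In this sketch the collapse described in the abstract corresponds to `lam_collapsed` equalling `lam_min`: an ensemble that agrees on what is visible gives no signal about what is occluded.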

2605.26151 2026-05-27 physics.med-ph cs.RO 版本更新

Towards Real-World Identification of Fatigued Muscle Groups via Musculoskeletal Simulation

面向真实世界中疲劳肌肉群识别的肌肉骨骼仿真方法

Jenishkumar Chauhan, Samarth Brahmbhatt, Vineet Vashista

发表机构 * Human-Centered Robotics Lab at IIT Gandhinagar(IIT甘地纳加尔人类中心机器人实验室) IIT Gandhinagar(IIT甘地纳加尔)

AI总结 提出一种通过比较真实自由空间运动与仿真肌肉骨骼模型来无接触识别上肢疲劳肌肉群的算法,实验证明能可靠区分多个疲劳肌肉群,并展示了如何配置先进仿真器以缩小仿真到现实的差距。

Comments Video File: https://www.youtube.com/watch?v=scvi3DCD9UY

详情
AI中文摘要

无接触诊断肌肉骨骼疾病有望改善人口健康以及协作环境中的机器人行为。然而,当前的诊断方法需要现场体检,由训练有素的医生通过接触感知各肌肉施加的力。虽然存在仿真工具,但将其用于真实数据诊断的研究尚不充分。本文提出一种识别上肢哪个肌肉群疲劳的算法。该算法将受试者的真实自由空间运动与仿真的肌肉骨骼模型运动进行比较,因此是无接触的:避免了侵入式传感或现场评估的需要。我们的算法使用基于物理的肌肉骨骼模型模拟各种疲劳条件,并从真实和仿真数据中提取诊断运动特征,进行比较以进行诊断。在真实数据上的实验结果表明,所提方法能够可靠地区分多个疲劳肌肉群。此外,通过全面的性能比较,我们展示了如何正确配置最新的先进肌肉骨骼仿真器,以解决疲劳诊断任务中的仿真到现实差距。我们的方法有望推动远程和自动化诊断的进一步研究,显著降低大规模早期检测的门槛。

英文摘要

Contactless diagnosis of musculoskeletal disorders can potentially improve population health as well as robot behaviours in collaborative settings. However, current diagnosis methods require an in-person physical examination in which a trained physician senses, through contact, the force applied by various muscles. Simulation tools exist, but their use for diagnosis with real data is under-explored. In this paper, we propose an algorithm for identifying which upper-limb muscle group is fatigued. Our algorithm compares the real-world free-space motion of the subject with that of a simulated musculoskeletal model, and is therefore contactless, avoiding the need for invasive sensing or in-person assessment. Our algorithm simulates various fatigue conditions using a physics-based musculoskeletal model and extracts diagnostic motion features from both real and simulated data, which are compared for diagnosis. Experimental results on real data demonstrate that the proposed method can reliably distinguish between multiple fatigued muscle groups. Additionally, through comprehensive performance comparisons, we show how recent advanced musculoskeletal simulators can be properly configured to address the sim-to-real gap in the context of the fatigue diagnosis task. Our approach can potentially spur further research in remote and automated diagnosis, significantly lowering the barrier to large-scale and early detection.

2605.25029 2026-05-27 cs.RO 版本更新

ParkingWorld: End-to-End Autonomous Parking Reinforcement Learning from Corrective Experience in 3DGS Simulation

ParkingWorld: 基于3DGS仿真中纠正性经验的端到端自主泊车强化学习

Zhengcheng Yu, Changze Li, Haoran Liu, Tong Qin

发表机构 * Tsinghua University(清华大学) Shanghai Jiaotong University(上海交通大学)

AI总结 提出一种基于纠正性经验的样本高效强化学习框架(CIL-SERL),在逼真的3D高斯溅射(3DGS)仿真器中训练端到端自主泊车策略,通过多级回放缓冲区机制提高成功率、效率和安全性。

Comments 9 pages (including 1 page of Appendix), 6 figures. Will be submitted to RA-L 2026

详情
AI中文摘要

自主泊车需要在狭窄、杂乱且高度受限的环境中进行精确的低速操控,车辆必须避开静态障碍物和复杂的几何边界。与模仿学习不同(模仿学习通常需要大量高质量专家演示才能收敛到稳定策略,且泛化到未见场景的能力有限),传统强化学习方法面临训练开销过大、探索效率低下,甚至在具有挑战性的场景中无法学习可行泊车策略等持续挑战。为解决这些问题,本文提出了一种基于纠正性循环的样本高效强化学习(CIL-SERL)框架,用于端到端自主泊车,该框架完全在逼真的3D高斯溅射(3DGS)泊车模拟器中训练,能够对真实场景进行高保真数字重建。受学习实践中纠错笔记本的启发,我们设计了一种新颖的多级回放缓冲区机制。这些缓冲区将标准RL轨迹、人工纠正干预、失败探索轨迹和基于回滚的纠正段分层组织并存储在不同但相互连接的内存区域中,从而在训练过程中促进结构化采样和有针对性的学习。所提出的框架在3DGS仿真环境和真实车辆平台上进行了系统评估。大量实验结果表明,我们的方法在多种场景下显著提高了泊车成功率、运行效率和安全性,验证了所提出的基于CIL-SERL的端到端自主泊车解决方案的有效性和实际适用性。

英文摘要

Autonomous parking demands precise low-speed maneuvering within narrow, cluttered, and highly constrained environments, where vehicles must navigate tight spaces while avoiding static obstacles and complex geometric boundaries. Unlike imitation learning, which typically requires massive volumes of high-quality expert demonstrations to converge to a stable policy and often suffers from limited generalization to unseen scenarios, traditional reinforcement learning (RL) methods face persistent challenges including excessive training overhead, inefficient exploration, and even failure to learn viable parking strategies in challenging settings. To address these limitations, this paper presents a correction-in-the-loop sample-efficient reinforcement learning (CIL-SERL) framework for end-to-end autonomous parking, which is entirely trained in a photorealistic 3D Gaussian Splatting (3DGS) parking simulator that enables high-fidelity digital reconstruction of real-world scenes. Inspired by error-correction notebooks used in learning practice, we design a novel multi-level replay buffer mechanism. These buffers hierarchically organize and store standard RL rollouts, human corrective interventions, failed exploration trajectories, and rollback-based correction segments in separate yet interconnected memory regions, facilitating structured sampling and targeted learning during training. The proposed framework is systematically evaluated in both the 3DGS simulation environment and a physical vehicle platform. Extensive experimental results demonstrate that our method achieves substantial improvements in parking success rate, operational efficiency, and safety performance across diverse scenarios, validating the effectiveness and practical applicability of the proposed CIL-SERL-based end-to-end autonomous parking solution.

2605.20255 2026-05-27 cs.LG cs.AI cs.HC cs.RO 版本更新

Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty

行人行为不确定性下安全自动驾驶的多智能体强化学习

Prakash Aryan, Kaushik Raghupathruni, Timo Kehrer, Sebastiano Panichella

发表机构 * University of Bern(伯尔尼大学) AI4I, The Italian Institute of Artificial Intelligence(意大利人工智能研究所)

AI总结 本文使用多智能体近端策略优化(MAPPO)联合训练自动驾驶汽车和12个行人,通过隐藏的行人特质模拟乱穿马路行为,相比固定策略基线显著降低了碰撞率,并揭示了速度差异指标可用于检测未预期的乱穿马路行为。

Comments Accepted to ICRA 2026 Workshop "8th Workshop on Long-term Human Motion Prediction"

详情
AI中文摘要

自动驾驶汽车(SDC)的仿真测试通常依赖脚本化行人模型,这些模型无法捕捉真实过街行为的异质性和不确定性,限制了安全评估的真实性,尤其是对于由车辆无法观察到的潜在人格特质支配的乱穿马路行为。我们假设,通过多智能体强化学习(MARL)联合训练行人和SDC,相比针对固定行人策略训练,能产生更真实的交互场景,并且可预测与不可预测过街行为之间的差距可以直接从轨迹中测量。我们使用多智能体近端策略优化(MAPPO)联合训练一个SDC和12个行人:行人移动遵循脚本化的Dijkstra路径规划,而RL策略控制高层的前进/等待决策,乱穿马路概率取决于每个行人在回合开始时采样并隐藏于SDC的特质。在500回合评估中,联合训练的SDC达到78%的目标完成率,碰撞率为14%,而最佳基于规则的基线分别为35%和33%。速度差异指标显示,在近距离(0-3米)范围内,SDC在乱穿马路者附近比在人行横道使用者附近快2.65米/秒,表明乱穿马路遭遇未被预期。乱穿马路占过街事件的13%,但占碰撞的62%,并且联合训练相比单智能体RL减少了30%的碰撞,因为行人学会了在SDC高速接近时等待。

英文摘要

Simulation-based testing of self-driving cars (SDCs) typically relies on scripted pedestrian models that do not capture the heterogeneity and uncertainty of real crossing behavior, limiting the realism of safety assessments, especially for jaywalking, which is governed by latent personality traits the vehicle cannot observe. We hypothesize that jointly training pedestrians and the SDC with multi-agent reinforcement learning (MARL) yields more realistic interaction scenarios than training against fixed pedestrian policies, and that the behavior gap between predictable and unpredictable crossings can be measured directly from trajectories. We co-train an SDC and 12 pedestrians using Multi-Agent Proximal Policy Optimization (MAPPO): pedestrian locomotion follows scripted Dijkstra pathfinding while an RL policy controls high-level go/wait decisions, and jaywalking probability depends on a per-pedestrian trait sampled at episode start and hidden from the SDC. In 500-episode evaluations, the co-trained SDC reached 78% of goals with a 14% collision rate, versus 35%/33% for the best rule-based baseline. A speed differential metric shows the SDC traveled 2.65 m/s faster near jaywalkers than near crosswalk users at close range (0-3 m), indicating jaywalking encounters were not anticipated. Jaywalking was 13% of crossing events but 62% of collisions, and co-training reduced collisions by 30% relative to single-agent RL as pedestrians learned to wait when the SDC approached at speed.

2603.04639 2026-05-27 cs.RO cs.AI 版本更新

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

RoboMME:机器人通用策略的记忆基准与理解

Yinpei Dai, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai

发表机构 * University of Michigan(密歇根大学) Stanford University(斯坦福大学) Figure AI

AI总结 提出RoboMME基准,通过16个操作任务评估VLA模型在长时程和历史依赖场景中的记忆能力,并基于π0.5骨干网络探索14种记忆增强变体,发现记忆表示的有效性高度依赖于任务。

Comments Accepted to ICML 2026

详情
AI中文摘要

记忆对于长时程和历史依赖的机器人操作至关重要。这类任务通常涉及计数重复动作或操作暂时被遮挡的物体。最近的视觉-语言-动作(VLA)模型已开始融入记忆机制;然而,它们的评估仍局限于狭窄、非标准化的设置中。这限制了对记忆的系统理解、比较和进展测量。为应对这些挑战,我们引入了RoboMME:一个大规模标准化基准,用于评估和推进VLA模型在长时程、历史依赖场景中的表现。我们的基准包含16个操作任务,这些任务基于精心设计的分类法构建,该分类法评估时间、空间、对象和程序记忆。我们进一步开发了一套基于π0.5骨干网络的14种记忆增强VLA变体,以系统探索多种集成策略下的不同记忆表示。实验结果表明,记忆表示的有效性高度依赖于任务,每种设计在不同任务中都有独特的优势和局限性。视频和代码可在我们的网站https://robomme.github.io上找到。

英文摘要

Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings. This limits systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experimental results show that the effectiveness of memory representations is highly task-dependent, with each design offering distinct advantages and limitations across different tasks. Videos and code can be found at our website https://robomme.github.io.

2603.12592 2026-05-27 cs.DS cs.AI cs.RO 版本更新

Early Pruning for Public Transport Routing

公共交通路由的早期剪枝

Andrii Rohovyi, Abdallah Abuaisha, Toby Walsh

发表机构 * Department of Computer Science and Engineering, University of New South Wales (UNSW), Sydney, NSW 2033, Australia(新南威尔士大学计算机科学与工程系) Department of Data Science and Artificial Intelligence, Monash University, Melbourne, Australia(莫纳什大学数据科学与人工智能系)

AI总结 提出早期剪枝技术,通过预排序换乘连接并在换乘循环中应用剪枝规则,在不影响最优性的情况下加速公共交通路由算法,实验表明查询时间最多减少57%。

详情
AI中文摘要

公共交通的路由算法,特别是广泛使用的RAPTOR及其变体,在支持无限换乘时,常常在换乘松弛阶段面临性能瓶颈,尤其是在密集的换乘图上。这种低效源于遍历许多潜在的站点间连接(步行、自行车、电动滑板车等)。为了保持可接受的性能,从业者通常限制换乘距离或排除某些换乘选项,这可能会降低路径的最优性并限制向旅客展示的多模式选项。本文介绍了早期剪枝,一种低开销的技术,可以在不影响最优性的情况下加速路由算法。通过按持续时间预排序换乘连接,并在换乘循环内应用剪枝规则,该方法在站点处丢弃较长的换乘,一旦它们无法产生比当前最佳解更早的到达时间。早期剪枝可以以最小的更改集成到现有代码库中,并且只需要一次预处理步骤。该技术在扩展准则设置中保持帕累托最优性,只要额外的优化准则在换乘持续时间上单调非递减。在多个基于RAPTOR的最新解决方案中,包括RAPTOR、ULTRA-RAPTOR、McRAPTOR、BM-RAPTOR、ULTRA-McRAPTOR和UBM-RAPTOR,并在瑞士和伦敦交通网络上测试,我们实现了高达57%的查询时间减少。该方法为交通路径查找算法的效率提供了可推广的改进。

英文摘要

Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during the transfer relaxation phase, especially on dense transfer graphs, when supporting unlimited transfers. This inefficiency arises from iterating over many potential inter-stop connections (walks, bikes, e-scooters, etc.). To maintain acceptable performance, practitioners often limit transfer distances or exclude certain transfer options, which can reduce path optimality and restrict the multimodal options presented to travellers. This paper introduces Early Pruning, a low-overhead technique that accelerates routing algorithms without compromising optimality. By pre-sorting transfer connections by duration and applying a pruning rule within the transfer loop, the method discards longer transfers at a stop once they cannot yield an earlier arrival than the current best solution. Early Pruning can be integrated with minimal changes to existing codebases and requires only a one-time preprocessing step. The technique preserves Pareto-optimality in extended-criteria settings whenever the additional optimization criteria are monotonically non-decreasing in transfer duration. Across multiple state-of-the-art RAPTOR-based solutions, including RAPTOR, ULTRA-RAPTOR, McRAPTOR, BM-RAPTOR, ULTRA-McRAPTOR, and UBM-RAPTOR, tested on the Switzerland and London transit networks, we achieved query time reductions of up to 57%. This approach provides a generalizable improvement to the efficiency of transit pathfinding algorithms.
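The pruning rule described in this abstract can be sketched as a single transfer-relaxation loop. This is one reading of the abstract, not the authors' code: the function name, the data layout, and the use of the best known arrival at the target as the pruning bound are assumptions. Because the transfers are pre-sorted by duration, the loop can stop at the first transfer that cannot beat the bound, since every later one is at least as long.

```python
def relax_transfers(arrival_at_stop, transfers, best_arrival, target_best):
    """Relax the outgoing transfers of one stop with early pruning.

    transfers: list of (duration, to_stop), pre-sorted by duration ascending
               (the one-time preprocessing step).
    best_arrival: dict mapping stop -> earliest known arrival (updated in place).
    target_best: earliest known arrival at the target, used as pruning bound.
    Returns the number of transfers actually examined.
    """
    examined = 0
    for duration, to_stop in transfers:
        arrival = arrival_at_stop + duration
        if arrival >= target_best:
            break  # all remaining transfers are at least as long: prune them
        examined += 1
        if arrival < best_arrival.get(to_stop, float("inf")):
            best_arrival[to_stop] = arrival
    return examined

# One-time preprocessing: sort the transfers of a stop by duration.
transfers = sorted([(12, "C"), (3, "A"), (7, "B"), (20, "D")])
best = {}
examined = relax_transfers(100, transfers, best, target_best=110)
```

Here only the transfers to `A` and `B` are examined. The transfer to `C` would arrive at 112, which is not earlier than the bound of 110, so the loop stops without touching `C` or `D`.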

2508.14422 2026-05-27 eess.SY cs.RO cs.SY math.OC 版本更新

A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control

四旋翼SO(3)姿态控制中在线扰动辨识的切片学习框架

Tianhua Gao, Masashi Izumita, Kohji Tomita, Akiya Kamimura

发表机构 * Intelligent Systems Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Japan(日本产业技术综合研究所智能系统研究所) Graduate School of Systems and Information Engineering, University of Tsukuba, Japan(日本筑波大学系统与信息工程研究生院)

AI总结 提出一种维度分解的几何学习框架Sliced Learning,通过李代数误差表示实现轴空间分解,结合轻量级SANM模块和Lyapunov自适应律,在资源受限MCU上实现400Hz在线扰动辨识。

Comments v4: This version has been accepted for publication in IEEE/ASME Transactions on Mechatronics (TMECH). Supplementary video links have also been added

详情
Journal ref
2026 IEEE/ASME Transactions on Mechatronics
AI中文摘要

本文介绍了一种称为Sliced Learning的维度分解几何学习框架,用于四旋翼几何姿态控制中的扰动辨识。与传统的从状态学习不同,该框架采用从误差学习的策略,使用李代数误差表示作为输入特征,在保持SO(3)结构的同时实现轴空间分解(“切片”)。这与神经科学中观察到的认知控制的几何机制高度一致,其中神经系统在结构化子空间内组织自适应表征以实现认知灵活性和效率。基于该框架,我们开发了轻量级且结构可解释的Sliced Adaptive-Neuro Mapping(SANM)模块。用于在线辨识的高维映射被轴向“切片”为多个低维子映射(“切片”),由浅层神经网络和自适应律实现。这些神经网络和自适应律通过基于Lyapunov的自适应在其各自的共享子空间中在线更新。为了增强可解释性,我们证明了在时变扰动和惯性不确定性下的指数收敛性。据我们所知,Sliced Learning是首批在资源受限的微控制器单元(MCU)(如STM32)上以400Hz频率展示轻量级在线神经自适应的框架之一,并经过了实际实验验证。

英文摘要

This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition ("slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially "sliced" into multiple low-dimensional submappings ("slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
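As a rough illustration of a Lie-algebraic attitude error and its axis-wise decomposition, the sketch below computes the error vector as the logarithm of the relative rotation and splits it into three scalar components. The exact error definition and the SANM networks are not given in the abstract, so the choice of `log(R_d^T R)`, the helper names, and the per-axis dictionary are assumptions. The log map as written is valid only for rotation angles below pi.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: so(3) vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3) + hat(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Rotation matrix -> so(3) vector (valid for rotation angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    skew = (R - R.T) / 2.0  # equals sin(theta) * hat(axis)
    vee = np.array([skew[2, 1], skew[0, 2], skew[1, 0]])
    if theta < 1e-9:
        return vee
    return theta / np.sin(theta) * vee

# Desired and actual attitude; the actual one is offset by a small rotation w.
w = np.array([0.1, -0.2, 0.3])
R_d = exp_so3(np.array([0.0, 0.0, 0.5]))
R = R_d @ exp_so3(w)

e_R = log_so3(R_d.T @ R)  # Lie-algebraic attitude error
slices = {"roll": e_R[0], "pitch": e_R[1], "yaw": e_R[2]}  # one scalar per axis
```

Each entry of `slices` could then feed its own small sub-model, which is the low-dimensional structure the abstract refers to as slicing.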

2601.16578 2026-05-27 cs.RO cs.SY eess.SY 版本更新

Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab

Cyber-Physical Mobility Lab中的零样本多智能体强化学习基准测试

Julius Beerwerth, Jianye Xu, Simon Schäfer, Fynn Belderink, Bassam Alrifaee

发表机构 * Cyber-Physical Mobility Lab(信息物理移动实验室) University of the Bundeswehr Munich(慕尼黑联邦国防军大学) RWTH Aachen University(亚琛工业大学)

AI总结 本文基于Cyber-Physical Mobility Lab构建了一个可复现的基准测试平台,用于评估联网自动驾驶汽车多智能体强化学习策略的仿真到现实迁移,并揭示了性能下降的两个互补来源。

详情
AI中文摘要

我们提出了一个可复现的基准测试,用于评估联网自动驾驶汽车(CAV)的多智能体强化学习(MARL)策略的仿真到现实迁移。该平台基于Cyber-Physical Mobility Lab(CPM Lab)[1],集成了仿真、高保真数字孪生和物理测试平台,能够对MARL运动规划策略进行结构化的零样本评估。我们通过在所有三个领域部署SigmaRL训练的策略[2]来展示其用途,揭示了性能下降的两个互补来源:仿真与硬件控制栈之间的架构差异,以及由环境真实性增加引起的仿真到现实差距。开源设置使得在现实且可复现的条件下,能够系统分析MARL中的仿真到现实挑战。

英文摘要

We present a reproducible benchmark for evaluating sim-to-real transfer of Multi-Agent Reinforcement Learning (MARL) policies for Connected and Automated Vehicles (CAVs). The platform, based on the Cyber-Physical Mobility Lab (CPM Lab) [1], integrates simulation, a high-fidelity digital twin, and a physical testbed, enabling structured zero-shot evaluation of MARL motion-planning policies. We demonstrate its use by deploying a SigmaRL-trained policy [2] across all three domains, revealing two complementary sources of performance degradation: architectural differences between simulation and hardware control stacks, and the sim-to-real gap induced by increasing environmental realism. The open-source setup enables systematic analysis of sim-to-real challenges in MARL under realistic, reproducible conditions.

2604.08059 2026-05-27 cs.RO cs.AI 版本更新

Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study

受治理的能力演化:基于AI组件的系统的生命周期兼容性检查与回滚——以具身智能体为例

Xue Qin, Simin Luan, John See, Zeyd Boukhers, Cong Yang, Zhijun Li

发表机构 * School of Software, Harbin Institute of Technology(哈尔滨工业大学软件学院) School of Computer Science and Technology, Harbin Institute of Technology(哈尔滨工业大学计算机科学与技术学院) School of Mathematical and Computer Sciences, Heriot-Watt University, Malaysia Campus(赫瑞-瓦特大学马来西亚分校数学与计算机科学学院) School of Future Science and Engineering, Soochow University(苏州大学未来科学与工程学院) Fraunhofer Institute for Applied Information Technology(弗劳恩霍夫应用信息技术研究所)

AI总结 针对基于AI组件的系统,提出一种受治理的能力演化框架,通过四类兼容性检查和七阶段升级管线实现安全部署,在具身智能体实验中实现零不安全激活。

Comments 42 pages, 7 figures, 12 tables

详情
AI中文摘要

由版本化AI组件构建的软件系统越来越需要生命周期治理:当能力模块演化到新版本时,宿主系统必须决定新版本是否可以安全激活、应在何种部署条件下运行、如何监控以及何时回滚。现有的软件部署模式(金丝雀发布、蓝绿部署、特性标志和MLOps管线)解决了这一循环的部分问题,但它们是针对无状态Web服务而非驱动现场AI组件的带状态、策略约束运行时设计的。我们将受治理的能力演化形式化为基于AI组件的系统的一等软件生命周期问题,并提出一个分阶段升级框架,其中每个新能力版本被视为受治理的部署候选,而非立即可执行的替换。该框架引入了四类升级兼容性检查(接口、策略、行为、恢复),并将其组织成七阶段管线(候选验证、沙箱评估、影子部署、门控激活、在线监控、回滚、审计)。我们在带有ROS 2中间件的PyBullet操作测试平台上实现了参考原型,并在15个随机种子的6轮能力升级中进行了评估。朴素升级实现了72.9%的任务成功率,但到最后一轮不安全激活率升至60%;受治理升级保持了可比的成功率(67.4%),同时在所有轮次中保持零不安全激活(Wilcoxon p=0.003)。影子部署揭示了40%的升级回归问题,这些问题是单独沙箱评估无法发现的,并且在79.8%的激活后漂移场景中回滚成功。

英文摘要

Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment conditions it should run, how it must be monitored, and when it should be rolled back. Existing software-deployment patterns (canary release, blue-green, feature flags, and MLOps pipelines) address parts of this loop but were designed for stateless web services rather than for stateful, policy-constrained runtimes that drive AI components in the field. We formulate governed capability evolution as a first-class software-lifecycle problem for AI-component-based systems and propose a staged upgrade framework in which every new capability version is treated as a governed deployment candidate rather than an immediately executable replacement. The framework introduces four upgrade compatibility checks (interface, policy, behavioral, recovery) and organizes them into a seven-stage pipeline (candidate validation, sandbox evaluation, shadow deployment, gated activation, online monitoring, rollback, audit). We implement a reference prototype on a PyBullet manipulation testbed with ROS 2 middleware and evaluate it over 6 rounds of capability upgrade with 15 random seeds. Naive upgrade achieves 72.9% task success but drives unsafe activation to 60% by the final round; governed upgrade retains comparable success (67.4%) while maintaining zero unsafe activations across all rounds (Wilcoxon p=0.003). Shadow deployment reveals 40% of upgrade regressions invisible to sandbox evaluation alone, and rollback succeeds in 79.8% of post-activation drift scenarios.
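The gating logic implied by the four compatibility checks can be sketched as below. The abstract names the checks and the pipeline stages but does not say how they map onto each other, so the function name, the check callables, and the stop-at-first-failure policy are hypothetical and serve only as an illustration.

```python
from typing import Callable, Dict, List, Tuple

# The four check categories named in the abstract.
CHECKS = ["interface", "policy", "behavioral", "recovery"]

def governed_upgrade(
    current: str,
    candidate: str,
    checks: Dict[str, Callable[[str], bool]],
) -> Tuple[str, List[Tuple[str, bool]]]:
    """Return (active_version, audit_log).

    The candidate is activated only if every check passes; otherwise the
    current version stays active. Stopping at the first failure is an
    assumption of this sketch.
    """
    audit: List[Tuple[str, bool]] = []
    for name in CHECKS:
        passed = checks[name](candidate)
        audit.append((name, passed))
        if not passed:
            return current, audit  # candidate rejected, nothing is activated
    return candidate, audit

# Hypothetical check functions: version 2.0 fails the behavioral check.
checks = {
    "interface": lambda v: True,
    "policy": lambda v: True,
    "behavioral": lambda v: v != "grasp-2.0",
    "recovery": lambda v: True,
}
active_bad, audit_bad = governed_upgrade("grasp-1.0", "grasp-2.0", checks)
active_ok, audit_ok = governed_upgrade("grasp-1.0", "grasp-2.1", checks)
```

The returned audit log records which checks ran and their outcomes, which loosely mirrors the audit stage at the end of the pipeline.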

2605.00412 2026-05-27 cs.AI cs.RO 版本更新

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

物理原生世界模型:生成式世界建模的哈密顿视角

Sen Cui, Jingheng Ma

发表机构 * Tsinghua University(清华大学)

AI总结 提出哈密顿世界模型,通过结构化潜相空间和哈密顿动力学演化实现物理可靠、动作可控且长期稳定的未来预测,用于具身决策。

详情
AI中文摘要

世界模型最近重新成为具身智能、机器人、自动驾驶和基于模型的强化学习的核心范式。然而,当前的世界模型研究通常由三条部分分离的路线主导:强调视觉未来合成的2D视频生成模型、强调空间重建的3D场景中心模型,以及强调抽象预测表示的JEPA类潜变量模型。每条路线都取得了重要进展,但它们仍然难以提供物理可靠、动作可控且长期稳定的预测以支持具身决策。在本文中,我们认为世界模型的瓶颈不再仅仅是它们能否生成逼真的未来,而是这些未来是否物理上有意义且对动作有用。我们提出哈密顿世界模型作为世界建模的一个物理基础视角。关键思想是将观测编码到结构化的潜相空间中,通过带有控制、耗散和残差项的哈密顿动力学演化潜状态,将预测轨迹解码为未来观测,并利用生成的轨迹进行规划。我们讨论了哈密顿结构如何提高可解释性、数据效率和长期稳定性,同时也指出了在涉及摩擦、接触、非保守力和可变形物体的真实机器人场景中的实际挑战。

英文摘要

World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
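To make the phrase "Hamiltonian-inspired dynamics with control and dissipation" concrete, here is a minimal sketch on a hand-written one-dimensional system. It is not the proposed model: the quadratic Hamiltonian, the damping coefficient `gamma`, the symplectic Euler integrator, and the step size are assumptions, and the learned encoder, decoder, and residual term are omitted.

```python
def step(q, p, u=0.0, gamma=0.0, dt=0.01):
    """One symplectic Euler step for H(q, p) = 0.5*p**2 + 0.5*q**2.

    dq/dt =  dH/dp = p
    dp/dt = -dH/dq - gamma*p + u   (dissipation and control enter here)
    """
    p = p + dt * (-q - gamma * p + u)
    q = q + dt * p
    return q, p

def energy(q, p):
    """Value of the Hamiltonian at the state (q, p)."""
    return 0.5 * p * p + 0.5 * q * q

def rollout(q, p, n_steps, gamma=0.0):
    """Roll the latent state forward without control input."""
    for _ in range(n_steps):
        q, p = step(q, p, gamma=gamma)
    return q, p

# Conservative rollout: energy stays close to its initial value of 0.5.
e_conservative = energy(*rollout(1.0, 0.0, 1000))
# Dissipative rollout: energy decays towards zero.
e_damped = energy(*rollout(1.0, 0.0, 1000, gamma=0.5))
```

The conservative rollout keeps its energy nearly constant over many steps, which is the kind of long-horizon stability the abstract attributes to Hamiltonian structure, while the damped rollout shows how a dissipation term models energy loss such as friction.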

2604.00993 2026-05-27 astro-ph.IM astro-ph.EP cs.LG cs.RO 版本更新

Focal plane wavefront control with model-based reinforcement learning

基于模型的强化学习进行焦平面波前控制

Jalo Nousiainen, Iremsu Taskin, Markus Kasper, Gilles Orban De Xivry, Olivier Absil

发表机构 * European Southern Observatory (ESO)(欧洲南方天文台) STAR Institute, Université de Liège(列日大学STAR研究所)

AI总结 提出基于模型的强化学习算法PO4NCPA,通过顺序相位分集自动校正动态和静态非共路像差,实现高对比度成像中的焦平面波前控制。

Comments 13 pages, 11 figures accepted by A&A

详情
Journal ref
A&A 709, A267 (2026)
AI中文摘要

直接成像潜在宜居系外行星是极大望远镜高对比度成像仪器的主要科学目标之一。大多数此类系外行星轨道靠近其主星,其观测受到快速移动的大气散斑和准静态非共路像差(NCPA)的限制。传统的NCPA校正方法通常使用机械镜面探针,这会在操作期间影响性能。本文提出了基于机器学习的NCPA控制方法,通过利用顺序相位分集自动检测和校正动态及静态NCPA误差。我们将先前用于自适应光学的强化学习工作扩展到焦平面控制。一种新的基于模型的RL算法——NCPA策略优化(PO4NCPA),将焦平面图像解释为输入数据,并通过顺序相位分集确定相位校正,从而在没有先验系统知识的情况下优化非日冕和日冕后点扩散函数。此外,我们通过在受水汽诱导视宁度(动态NCPA)影响的地基望远镜和红外成像仪上数值模拟静态NCPA误差,证明了该方法的有效性。模拟表明,PO4NCPA能够稳健地补偿静态和动态NCPA。在静态情况下,它实现了使用日冕仪时近最优的焦平面光抑制,以及无日冕仪时近最优的斯特列尔比。在动态NCPA情况下,它在这些指标上与结合1步延迟积分器的模态最小二乘重构性能相当。该方法对ELT光瞳、矢量涡旋日冕仪以及在光子和背景噪声下仍然有效。PO4NCPA是无模型的,可直接应用于标准成像以及任何日冕仪。其亚毫秒级的推理时间和性能也使其适用于高对比度成像之外的大气湍流实时低阶校正。

英文摘要

The direct imaging of potentially habitable exoplanets is one prime science case for high-contrast imaging instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPA). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPA errors by leveraging sequential phase diversity. We extend previous work in reinforcement learning for AO to focal plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPA errors on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of the modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, vector vortex coronagraph, and under photon and background noise. PO4NCPA is model-free and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.

2603.28730 2026-05-27 cs.RO cs.CL cs.CV 版本更新

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

SOLE-R1:视频语言推理作为机器人强化学习的唯一奖励

Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza

发表机构 * MIT(麻省理工学院) RAI Institute(机器人智能研究所)

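AI summary: Proposes SOLE-R1, a model that uses video-language spatiotemporal reasoning to generate dense task-progress estimates as the sole reward signal, enabling zero-shot online reinforcement learning without ground-truth rewards, demonstrations, or task-specific tuning.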

详情
AI中文摘要

视觉语言模型(VLM)在各种任务中展现出令人印象深刻的能力,这促使人们努力利用这些模型来监督机器人学习。然而,当在强化学习(RL)中用作评估器时,当今最强的模型在部分可观测性和分布偏移下常常失败,使得策略能够利用感知错误而非解决任务。我们提出SOLE-R1(自观察学习器),一种专门设计用于为在线RL提供唯一奖励信号的视频语言推理模型。仅给定原始视频观测和自然语言目标,SOLE-R1执行每时间步的时空思维链(CoT)推理,并生成可直接用作奖励的密集任务进度估计。为了训练SOLE-R1,我们开发了一个大规模视频轨迹和推理合成流水线,生成与连续进度监督对齐的时间基础CoT轨迹。这些数据与基础的空间和多帧时间推理相结合,并使用混合框架训练模型,该框架将监督微调与可验证奖励的RL相结合。在四个不同的仿真环境和真实机器人设置中,SOLE-R1实现了从随机初始化的零样本在线RL:机器人学习之前未见过的操作任务,无需真实奖励、成功指标、演示或任务特定调优。SOLE-R1在24个未见过的任务上成功,并显著优于强视觉语言奖励器,包括Robometer、RoboReward、ReWiND、GPT-5和Gemini-3-Pro,同时对奖励破解表现出明显更强的鲁棒性。我们在匿名页面发布所有模型、数据、代码和演示:https://philip-mit.github.io/sole-r1/

英文摘要

Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enabling policies to exploit perceptual errors rather than solve the task. We introduce SOLE-R1 (Self-Observing LEarner), a video-language reasoning model explicitly designed to serve as the sole reward signal for online RL. Given only raw video observations and a natural-language goal, SOLE-R1 performs per-timestep spatiotemporal chain-of-thought (CoT) reasoning and produces dense estimates of task progress that can be used directly as rewards. To train SOLE-R1, we develop a large-scale video trajectory and reasoning synthesis pipeline that generates temporally grounded CoT traces aligned with continuous progress supervision. This data is combined with foundational spatial and multi-frame temporal reasoning, and used to train the model with a hybrid framework that couples supervised fine-tuning with RL from verifiable rewards. Across four different simulation environments and a real-robot setting, SOLE-R1 enables zero-shot online RL from random initialization: robots learn previously unseen manipulation tasks without ground-truth rewards, success indicators, demonstrations, or task-specific tuning. SOLE-R1 succeeds on 24 unseen tasks and substantially outperforms strong vision-language rewarders, including Robometer, RoboReward, ReWiND, GPT-5, and Gemini-3-Pro, while exhibiting markedly greater robustness to reward hacking. We release all models, data, code, and demos at the anonymous page: https://philip-mit.github.io/sole-r1/

2603.25415 2026-05-27 cs.AI cs.RO 版本更新

Modernising Reinforcement Learning-Based Navigation for Embodied Semantic Scene Graph Generation

具身语义场景图生成的强化学习导航现代化

Roman Küble, Marco Hüller, Mrunmai Phatak, Rainer Lienhart, Jörg Hähner

发表机构 * Organic Computing Group(有机计算组) Machine Learning and Computer Vision Group(机器学习与计算机视觉组) University of Augsburg(奥格斯堡大学) Am Technologiezentrum 8(技术中心8号) Augsburg, Germany(德国奥格斯堡)

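AI summary: Presents a modular navigation component that modernises decision-making in embodied semantic scene graph generation by replacing the policy-optimisation method and redesigning the discrete action representation, and evaluates how different action sets and policy structures affect scene graph completeness, execution safety, and navigation behaviour.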

详情
AI中文摘要

语义世界模型使具身智能体能够推理对象、关系和空间上下文,超越纯几何表示。在有机计算中,此类模型是在不确定性和资源约束下实现目标驱动自适应的关键。核心挑战是在有限动作预算内获取最大化模型质量和下游实用性的观测。语义场景图(SSG)为此提供了结构紧凑的表示。然而,在有限动作视界内构建SSG需要探索策略,在信息增益与导航成本之间权衡,并决定何时额外动作的收益递减。本文提出了用于具身语义场景图生成的模块化导航组件,并通过替换策略优化方法和重新审视离散动作公式来现代化其决策。我们研究了紧凑和更细粒度的较大离散动作集,并比较了原子动作上的单头策略与动作组件上的分解多头策略。我们评估了课程学习和基于深度的可选碰撞监督,并评估了SSG完整性、执行安全性和导航行为。结果表明,仅替换优化算法在相同奖励塑造下相对于基线将SSG完整性提高了21%。深度主要影响执行安全性(无碰撞运动),而完整性基本保持不变。将现代优化与更细粒度、分解的动作表示相结合,产生了最强的完整性-效率权衡。

英文摘要

Semantic world models enable embodied agents to reason about objects, relations, and spatial context beyond purely geometric representations. In Organic Computing, such models are a key enabler for objective-driven self-adaptation under uncertainty and resource constraints. The core challenge is to acquire observations that maximise model quality and downstream usefulness within a limited action budget. Semantic scene graphs (SSGs) provide a structured and compact representation for this purpose. However, constructing them within a finite action horizon requires exploration strategies that trade off information gain against navigation cost and decide when additional actions yield diminishing returns. This work presents a modular navigation component for Embodied Semantic Scene Graph Generation and modernises its decision-making by replacing the policy-optimisation method and revisiting the discrete action formulation. We study both compact and larger, finer-grained discrete motion sets and compare a single-head policy over atomic actions with a factorised multi-head policy over action components. We evaluate curriculum learning and optional depth-based collision supervision, and assess SSG completeness, execution safety, and navigation behaviour. Results show that replacing the optimisation algorithm alone improves SSG completeness by 21% relative to the baseline under identical reward shaping. Depth mainly affects execution safety (collision-free motion), while completeness remains largely unchanged. Combining modern optimisation with a finer-grained, factorised action representation yields the strongest overall completeness-efficiency trade-off.

2602.21450 2026-05-27 cs.RO cs.SY eess.SY 版本更新

Vector Fields for Path Following on Lie Groups with Application in Robot Control

李群上的路径跟随向量场及其在机器人控制中的应用

Felipe Bartelt, Luciano C. A. Pimenta, Weijia Yao, Vinicius M. Gonçalves

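AI summary: For path following on Lie groups, proposes a general vector-field framework that guarantees convergence to the desired parametric curve from almost all initial conditions together with continuous motion along the path, gives control inputs in a minimal representation on SE(3), and validates the approach in experiments with a robotic manipulator.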

Comments Manuscript revised: new title, reframed abstract and introduction for robotics, and added a coauthor

详情
AI中文摘要

许多机器人系统允许独立控制位置和姿态(位姿),包括全向飞行器、水下机器人和机械臂末端执行器。在许多应用中,这些系统必须遵循连续的位姿序列,从而形成轨迹跟踪或路径跟随问题。与轨迹跟踪相比,路径跟随具有重要的实际优势。我们特别关注李群上的路径跟随问题。将机器人视为在三维空间中运动的刚体,该路径跟随问题可以表述为在矩阵李群SE(3)上设计引导向量场的问题。在本文中,我们开发了一个通用的向量场框架,用于连通矩阵李群上的路径跟随,其中SE(3)是一个重要的特例。所提出的向量场保证从几乎所有初始条件收敛到期望参数曲线,同时确保沿路径连续运动。此外,另一个有趣的特点是,与先前的工作相比,控制输入在表示上是“最小的”,并且更接近工程应用(例如,在SE(3)情况下的身体扭曲)。在建立一般情况后,该框架被专门应用于机器人学中特别感兴趣的SE(3),产生了一种适用于实时机器人控制的高效算法。使用机械臂跟踪复杂位姿路径的实验证明了该方法的有效性。还提供了开源实现。

英文摘要

Many robotic systems allow independent control of position and orientation (pose), including omnidirectional aerial vehicles, underwater robots, and manipulator end-effectors. In many applications, these systems must follow a continuous sequence of poses, leading to either trajectory-tracking or path-following formulations. Compared to trajectory tracking, path following offers important practical advantages. In particular, we focus on the problem of path following on Lie groups. Considering the robots as rigid bodies moving in 3D space, this path-following problem can be posed as one of designing guiding vector fields on the matrix Lie group SE(3). In this paper, we develop a general vector-field framework for path following on connected matrix Lie groups, of which SE(3) is a prominent special case. The proposed vector field guarantees convergence to a desired parametric curve from almost all initial conditions while ensuring continuous motion along the path. Furthermore, as opposed to previous works, the control input is "minimal" in terms of representation and closer to the engineering application (e.g., the body twist in the case of SE(3)). After establishing the general case, the framework is specialized to SE(3), of special interest in robotics, yielding an efficient algorithm suitable for real-time robotic control. Experiments with a robotic manipulator tracking complex pose paths demonstrate the effectiveness of the approach. An open-source implementation is also provided.

2602.17822 2026-05-27 cs.RO 版本更新

Evolution of Safety Requirements in Industrial Robotics: Comparative Analysis of ISO 10218-1/2 (2011 vs. 2025) and Integration of ISO/TS 15066

工业机器人安全要求的演进:ISO 10218-1/2(2011 与 2025 版)比较分析及 ISO/TS 15066 的整合

Daniel Hartmann, Kristýna Hamříková, Aleš Vysocký, Vendula Laciok, Aleš Bernatík

发表机构 * Faculty of Mechanical Engineering, VSB—Technical University of Ostrava(机械工程学院,奥斯特拉瓦技术大学) Faculty of Safety Engineering, VSB—Technical University of Ostrava(安全工程学院,奥斯特拉瓦技术大学)

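AI summary: Compares the ISO 10218:2011 and ISO 10218:2025 standards to analyse how industrial robot safety requirements have evolved in functional safety, cybersecurity, robot classification, and collaborative applications, and how the new edition, with ISO/TS 15066 integrated, establishes a comprehensive framework for the design and operation of modern robotic systems.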

详情
AI中文摘要

工业机器人已成为大型制造企业不可或缺的组成部分。同时,协作机器人日益突出,引入了人机交互的新范式。这些进步促使安全标准进行全面修订,特别是纳入了网络安全和防止未经授权访问网络化机器人系统的要求。本文对 ISO 10218:2011 和 ISO 10218:2025 标准进行了比较分析,考察了其结构、术语、技术要求和附录的演进。分析揭示了功能安全和网络安全方面的显著扩展,引入了机器人和协作应用的新分类,以及技术规范 ISO/TS 15066 的规范性整合。因此,新版本综合了机械、功能和数字安全要求,为现代机器人系统的设计和运行建立了全面框架。

英文摘要

Industrial robotics has established itself as an integral component of large-scale manufacturing enterprises. Simultaneously, collaborative robotics is gaining prominence, introducing novel paradigms of human-machine interaction. These advancements have necessitated a comprehensive revision of safety standards, specifically incorporating requirements for cybersecurity and protection against unauthorized access in networked robotic systems. This article presents a comparative analysis of the ISO 10218:2011 and ISO 10218:2025 standards, examining the evolution of their structure, terminology, technical requirements, and annexes. The analysis reveals significant expansions in functional safety and cybersecurity, the introduction of new classifications for robots and collaborative applications, and the normative integration of the technical specification ISO/TS 15066. Consequently, the new edition synthesizes mechanical, functional, and digital safety requirements, establishing a comprehensive framework for the design and operation of modern robotic systems.

2601.14702 2026-05-27 cs.AI cs.CV cs.RO 版本更新

Drive-P2D: A Progressive Perception-to-Decision Benchmark for VLMs in Autonomous Driving

Drive-P2D:自动驾驶中视觉语言模型的渐进式感知到决策基准

Zecong Tang, Zixu Wang, Yifei Wang, Weitong Lian, Tianjian Gao, Haoran Li, Tengju Ru, Lingyi Meng, Zhejun Cui, Yichen Zhu, Qi Kang, Kaixuan Wang, Yu Zhang

发表机构 * Zhejiang University(浙江大学) The University of Hong Kong(香港大学)

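AI summary: Proposes the Drive-P2D benchmark, which uses a protocol that separates reasoning from answers to evaluate the perception-to-decision capability of vision-language models at the Object, Scene, and Decision levels and to analyse their error modes.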

详情
AI中文摘要

自动驾驶需要在复杂场景中实现可靠的感知和安全的决策。最近的视觉语言模型(VLM)展示了推理和泛化能力,为自动驾驶开辟了新的可能性;然而,现有的基准通常分别评估感知和决策,通过仅选择格式限制故障分析,或通过LLM评分的长格式输出引入评估偏差。为了解决这些问题,我们提出了Drive-P2D,一个渐进式感知到决策基准,包含6650个问题,涵盖目标、场景和决策三个层级。Drive-P2D采用分离的推理与答案协议:最终答案客观评分,而推理则用于分析沿渐进式感知到决策链暴露的错误模式。我们评估了所有场景和高风险场景下的主流VLM,并通过相关性分析和相似场景鲁棒性测试进一步刻画了感知到决策的能力边界。推理进一步揭示了逻辑推理错误和语义特征遗漏等故障模式,我们训练了一个轻量级分析器模型来自动化大规模推理错误模式标注。这些设计共同为构建更安全、更可靠的用于现实世界自动驾驶的VLM提供了实用见解。

英文摘要

Autonomous driving requires reliable perception and safe decision-making in complex scenarios. Recent vision-language models (VLMs) demonstrate reasoning and generalization abilities, opening new possibilities for autonomous driving; however, existing benchmarks often evaluate perception and decision-making separately, limit failure analysis with choice-only formats, or introduce evaluation bias through LLM-scored long-form outputs. To address these issues, we present Drive-P2D, a progressive perception-to-decision benchmark with 6,650 questions across Object, Scene, and Decision levels. Drive-P2D adopts a separated reasoning-and-answer protocol: final answers are scored objectively, while reasoning is analyzed to identify error modes exposed along the progressive perception-to-decision chain. We evaluate mainstream VLMs across all and high-risk scenarios, and further characterize the perception-to-decision capability boundary through correlation analysis and similar-scene robustness testing. Reasoning further exposes failure modes such as logical reasoning errors and semantic feature omissions, and we train a lightweight analyzer model to automate large-scale error-mode annotation of reasoning. Together, these designs provide practical insights for building safer and more reliable VLMs for real-world autonomous driving.

2601.07284 2026-05-27 cs.RO 版本更新

AdaMorph: Unified Motion Retargeting via Embodiment-Aware Adaptive Transformers

AdaMorph: 通过具身感知自适应变换器实现统一运动重定向

Haoyu Zhang, Shibo Jin, Lusong Li, Jun Li, Liang Lin, Xiaodong He, Zecui Zeng

发表机构 * JD Explore Academy(京东探索学院)

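AI summary: Proposes AdaMorph, a unified framework that uses embodiment-aware adaptive transformers to retarget human motion to diverse robot morphologies, with zero-shot generalization to unseen motions.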

详情
AI中文摘要

将人体运动重定向到异构机器人是机器人学中的一个基本挑战,主要由于不同具身之间的严重运动学和动力学差异。现有解决方案通常训练特定于具身的模型,这扩展性差且无法利用共享的运动语义。为了解决这个问题,我们提出了AdaMorph,一个统一的神经重定向框架,使单个模型能够将人体运动适应到多种机器人形态。我们的方法将重定向视为一个条件生成任务。我们将人体运动映射到一个与形态无关的潜在意图空间,并利用双用途提示机制来条件化生成。不同于简单的输入拼接,我们利用自适应层归一化(AdaLN)根据具身约束动态调制解码器的特征空间。此外,我们通过基于课程的训练目标强制执行物理合理性,通过积分确保方向和轨迹一致性。在12个不同的人形机器人上的实验结果表明,AdaMorph有效地统一了跨异构拓扑的控制,在保持源行为动态本质的同时,对未见过的复杂运动表现出强大的零样本泛化能力。

英文摘要

Retargeting human motion to heterogeneous robots is a fundamental challenge in robotics, primarily due to the severe kinematic and dynamic discrepancies between varying embodiments. Existing solutions typically resort to training embodiment-specific models, which scales poorly and fails to exploit shared motion semantics. To address this, we present AdaMorph, a unified neural retargeting framework that enables a single model to adapt human motion to diverse robot morphologies. Our approach treats retargeting as a conditional generation task. We map human motion into a morphology-agnostic latent intent space and utilize a dual-purpose prompting mechanism to condition the generation. Instead of simple input concatenation, we leverage Adaptive Layer Normalization (AdaLN) to dynamically modulate the decoder's feature space based on embodiment constraints. Furthermore, we enforce physical plausibility through a curriculum-based training objective that ensures orientation and trajectory consistency via integration. Experimental results on 12 distinct humanoid robots demonstrate that AdaMorph effectively unifies control across heterogeneous topologies, exhibiting strong zero-shot generalization to unseen complex motions while preserving the dynamic essence of the source behaviors.
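The contrast the abstract draws between input concatenation and Adaptive Layer Normalization can be illustrated with a minimal sketch. This is not AdaMorph's implementation: the function names, the toy weights, and the `(1 + scale)` convention are assumptions chosen for illustration. The idea shown is only that the conditioning vector (here, a stand-in for an embodiment embedding) predicts the scale and shift applied to normalized features, rather than being appended to the input.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weights, bias, cond):
    """Matrix-vector product plus bias: one output per weight row."""
    return [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(weights, bias)]

def ada_ln(x, cond, w_scale, b_scale, w_shift, b_shift):
    """AdaLN: per-feature scale and shift are predicted from the conditioning vector."""
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    # (1 + scale) is an assumed convention so that zero weights reduce to plain layer norm.
    return [(1.0 + s) * h + t for h, s, t in zip(layer_norm(x), scale, shift)]

# Hypothetical toy values: 4 decoder features, 2-D embodiment embeddings.
features = [1.0, 2.0, 3.0, 4.0]
w_scale = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.0, 0.0]]
w_shift = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0], [0.1, 0.1]]
zero_bias = [0.0] * 4

robot_a = ada_ln(features, [1.0, 0.0], w_scale, zero_bias, w_shift, zero_bias)
robot_b = ada_ln(features, [0.0, 1.0], w_scale, zero_bias, w_shift, zero_bias)
print(robot_a)
print(robot_b)
```

The same features are modulated differently for the two embeddings, which is the property the abstract attributes to conditioning on embodiment constraints.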

2506.06016 2026-05-27 eess.SY cs.RO cs.SY 版本更新

Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation

相对姿态与目标角速度估计的等变滤波器

Gil Serrano, Bruno J. Guerreiro, Pedro Lourenço, Rita Cunha

发表机构 * Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa(系统机器人研究所,技术高等学院,里斯本大学) CTS/Uninova and LASI, School of Science and Technology at NOVA University Lisbon(CTS/Uninova 和 LASI,里斯本 NOVA 大学科学与技术学院) GNC Division, Flight Segment and Robotics, GMV(飞行段与机器人部,GMV)

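AI summary: Proposes an Equivariant Filter (EqF) that uses observations of two known, non-collinear vectors to jointly estimate the relative attitude and the target's angular velocity, and validates its performance in simulation and experiments.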

Comments Published in the IEEE Transactions on Aerospace and Electronic Systems, 2026. Open Access article under CC BY 4.0

详情
Journal ref
IEEE Transactions on Aerospace and Electronic Systems, vol. 62, pp. 2965-2979, 2026
AI中文摘要

两个刚体之间相对姿态和角速度的精确估计是航天应用(如航天器交会对接)的基础。在这些场景中,追踪航天器必须利用机载传感器确定目标的姿态和角速度。本文解决了设计等变滤波器(EqF)的挑战,该滤波器能够利用目标坐标系中两个已知非共线向量的带噪观测,可靠地估计相对姿态和目标角速度。为了推导EqF,提出了系统的对称性,并计算了到对称群的等变提升。分析了可观测性和收敛性。仿真展示了滤波器的性能,蒙特卡洛运行产生了统计显著的结果。还研究了低速率测量的影响,并提出了缓解该影响的策略。使用基准标记以及传统相机和事件相机进行测量获取的实验结果进一步验证了该方法,确认了其在现实环境中的有效性。

英文摘要

Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.

2509.18384 2026-05-27 cs.RO cs.FL 版本更新

LAD-VF: LLM-Automatic Differentiation Enables Fine-Tuning-Free Robot Planning from Formal Methods Feedback

LAD-VF:LLM自动微分实现基于形式化方法反馈的无微调机器人规划

Yunhao Yang, Junyuan Hong, Gabriel Jacob Perin, Zhiwen Fan, Li Yin, Zhangyang Wang, Ufuk Topcu

发表机构 * The University of Texas at Austin(德克萨斯大学奥斯汀分校) University of São Paulo(圣保罗大学) Texas A&M University(德克萨斯A&M大学) SylphAI

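AI summary: Proposes LAD-VF, a framework that uses formal verification feedback and LLM automatic differentiation to refine prompts automatically, improving specification compliance in robot planning tasks without fine-tuning and raising success rates from 60% to over 90%.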

Comments Presented at ICRA 2026

详情
AI中文摘要

大型语言模型(LLM)能够将自然语言指令转化为机器人、自动驾驶等领域的可执行动作计划。然而,在物理世界中部署LLM驱动的规划需要严格遵守安全和监管约束,当前模型常因幻觉或弱对齐而违反这些约束。传统的数据驱动对齐方法(如直接偏好优化DPO)需要昂贵的人工标注,而近期基于形式化反馈的方法仍依赖资源密集型的微调。本文提出LAD-VF,一种无需微调的框架,利用形式化验证反馈实现自动化提示工程。通过引入与LLM-AutoDiff集成的形式化验证感知文本损失,LAD-VF迭代优化提示词而非模型参数。这带来三个关键优势:(i) 无需微调的可扩展适应;(ii) 与模块化LLM架构兼容;(iii) 通过可审计的提示词实现可解释的优化。在机器人导航和操作任务中的实验表明,LAD-VF显著提升了规范符合率,将成功率从60%提升至90%以上。因此,我们的方法为可信、形式化验证的LLM驱动控制系统提供了一条可扩展且可解释的路径。

英文摘要

Large language models (LLMs) can translate natural language instructions into executable action plans for robotics, autonomous driving, and other domains. Yet deploying LLM-driven planning in the physical world demands strict adherence to safety and regulatory constraints, which current models often violate due to hallucination or weak alignment. Traditional data-driven alignment methods, such as Direct Preference Optimization (DPO), require costly human labeling, while recent formal-feedback approaches still depend on resource-intensive fine-tuning. In this paper, we propose LAD-VF, a fine-tuning-free framework that leverages formal verification feedback for automated prompt engineering. By introducing a formal-verification-informed text loss integrated with LLM-AutoDiff, LAD-VF iteratively refines prompts rather than model parameters. This yields three key benefits: (i) scalable adaptation without fine-tuning; (ii) compatibility with modular LLM architectures; and (iii) interpretable refinement via auditable prompts. Experiments in robot navigation and manipulation tasks demonstrate that LAD-VF substantially enhances specification compliance, improving success rates from 60% to over 90%. Our method thus presents a scalable and interpretable pathway toward trustworthy, formally verified LLM-driven control systems.

2509.10481 2026-05-27 cs.NI cs.RO cs.SY eess.SP eess.SY 版本更新

Synergetic Empowerment: Wireless Communications Meets Embodied Intelligence

协同赋能:无线通信遇见具身智能

Hongtao Liang, Yihe Diao, YuHang Wu, Fuhui Zhou, Qihui Wu

发表机构 * College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics(南京航空航天大学电子与信息工程学院) College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics(南京航空航天大学人工智能学院)

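AI summary: Proposes a synergetic empowerment framework for wireless communication and embodied intelligence, uses the perception-cognition-execution loop to reveal the duality through which the two reinforce each other, recasts wireless communication from a simple utility into the digital nervous system of a collective intelligence, and describes how isolated agents evolve into a superorganism with emergent capabilities.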

Comments Accepted by IEEE Communications Magazine

详情
AI中文摘要

无线通信正在演变为一个智能体时代,其中具有内在具身智能的大规模智能体不仅是用户,更是积极参与者。无线通信与具身智能的完美结合可以实现协同赋能,极大促进智能体通信的发展。本文介绍了这种协同赋能的概览,将其视为一个共同进化过程,将无线通信从简单的工具转变为集体智能的数字神经系统,同时将孤立的智能体提升为一个统一的超个体,其涌现能力远超个体贡献的总和。此外,我们通过感知-认知-执行(PCE)循环的视角详细阐述了具身智能与无线通信如何相互受益,揭示了一个基本双重性:每个PCE阶段既对网络容量提出挑战,又为系统级优化创造了前所未有的机遇。最后,指出了关键开放问题和未来研究方向。

英文摘要

Wireless communication is evolving into an agent era, where large-scale agents with inherent embodied intelligence are not just users but active participants. The perfect combination of wireless communication and embodied intelligence can achieve a synergetic empowerment and greatly facilitate the development of agent communication. An overview of this synergetic empowerment is presented, framing it as a co-evolutionary process that transforms wireless communication from a simple utility into the digital nervous system of a collective intelligence, while simultaneously elevating isolated agents into a unified superorganism with emergent capabilities far exceeding individual contributions. Moreover, we elaborate how embodied intelligence and wireless communication mutually benefit each other through the lens of the perception-cognition-execution (PCE) loop, revealing a fundamental duality where each PCE stage both challenges network capacity and creates unprecedented opportunities for system-wide optimization. Furthermore, critical open issues and future research directions are identified.

2504.00167 2026-05-27 cs.RO 版本更新

Enhancing Physical Human-Robot Interaction: Recognizing Digits via Intrinsic Robot Tactile Sensing

增强物理人机交互:通过机器人本体触觉感知识别数字

Teresa Sinico, Giovanni Boschetti, Pedro Neto

发表机构 * Dep. Management and Engineering University of Padua(管理与工程系帕多瓦大学) CEMMPRE, Dep. Mechanical Engineering University of Coimbra(CEMMPRE,机械工程系科英布拉大学)

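AI summary: Uses the built-in torque sensors of a collaborative robot to record joint torques and end-effector forces while a person writes digits on a touchpad, performs online digit recognition with 94% accuracy using a bidirectional LSTM network, and demonstrates the approach in a fruit delivery task.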

详情
AI中文摘要

物理人机交互(pHRI)仍然是实现与机器人直观安全交互的关键挑战。当前的进展通常依赖外部触觉传感器作为接口,这增加了机器人系统的复杂性。在本研究中,我们利用协作机器人的本体触觉感知能力,识别用户在安装在机器人法兰上的无仪器触控板上绘制的数字。我们提出了一个数据集,包含机器人关节扭矩信号以及相应的末端执行器(EEF)力和力矩,这些数据来自机器人每个关节的集成扭矩传感器,用户在手写数字(0-9)时采集。pHRI-DIGI-TACT数据集从不同用户收集,以捕捉手写的自然变化。为增强分类鲁棒性,我们开发了一种数据增强技术来处理反转和旋转的数字输入。双向长短期记忆(Bi-LSTM)网络利用数据的时空特性,实现在线数字分类,在各种测试场景中总体准确率达到94%,包括涉及未参与系统训练的用户。该方法在真实机器人上的水果递送任务中实现,展示了其辅助日常生活的潜力。数据集和视频演示可在 https://TS-Robotics.github.io/pHRI-DIGI/ 获取。

英文摘要

Physical human-robot interaction (pHRI) remains a key challenge for achieving intuitive and safe interaction with robots. Current advancements often rely on external tactile sensors as an interface, which increases the complexity of robotic systems. In this study, we leverage the intrinsic tactile sensing capabilities of collaborative robots to recognize digits drawn by humans on an uninstrumented touchpad mounted to the robot's flange. We propose a dataset of robot joint torque signals along with corresponding end-effector (EEF) forces and moments, captured from the robot's integrated torque sensors in each joint, as users draw handwritten digits (0-9) on the touchpad. The pHRI-DIGI-TACT dataset was collected from different users to capture natural variations in handwriting. To enhance classification robustness, we developed a data augmentation technique to account for reversed and rotated digit inputs. A Bidirectional Long Short-Term Memory (Bi-LSTM) network, leveraging the spatiotemporal nature of the data, performs online digit classification with an overall accuracy of 94% across various test scenarios, including those involving users who did not participate in training the system. This methodology is implemented on a real robot in a fruit delivery task, demonstrating its potential to assist individuals in everyday life. Dataset and video demonstrations are available at: https://TS-Robotics.github.io/pHRI-DIGI/.

2009.11997 2026-05-27 cs.LG cs.AI cs.RO 版本更新

Continual Model-Based Reinforcement Learning with Hypernetworks

基于超网络的连续模型强化学习

Yizhou Huang, Kevin Xie, Homanga Bharadhwaj, Florian Shkurti

发表机构 * Division of Engineering Science, University of Toronto, Canada(多伦多大学工程科学系) Department of Computer Science, University of Toronto, Canada(多伦多大学计算机科学系)

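AI summary: Proposes HyperCRL, which uses task-conditional hypernetworks to learn dynamics models continually across a sequence of tasks, avoiding retraining from scratch and keeping storage overhead fixed, and outperforms existing fixed-capacity continual learning alternatives on robot locomotion and manipulation tasks.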

Comments Updated link to project website in the abstract. 7 pages (+2 pages in appendix), 8 figures. In proceedings of the 2021 IEEE International Conference on Robotics and Automation

详情
AI中文摘要

在基于模型的强化学习(MBRL)和模型预测控制(MPC)中,有效规划依赖于学习到的动力学模型的准确性。在MBRL和MPC的许多实例中,该模型被假定为平稳的,并且定期从头开始重新训练,使用从环境交互开始收集的状态转移经验。这意味着训练动力学模型所需的时间——以及计划执行之间的暂停时间——随着收集的经验规模线性增长。我们认为这对于终身机器人学习来说太慢,并提出了HyperCRL,一种使用任务条件超网络在序列任务中持续学习所遇到动力学的方法。我们的方法有三个主要特点:首先,它包括不重新访问先前任务训练数据的动力学学习会话,因此只需存储最近固定大小的状态转移经验;其次,它使用固定容量的超网络来表示非平稳且任务感知的动力学;第三,它优于依赖固定容量网络的现有持续学习替代方案,并且与记忆不断增长的过去经验核心集的基线方法相比具有竞争力。我们展示了HyperCRL在机器人 locomotion 和 manipulation 场景(如推和开门任务)中在连续基于模型的强化学习中的有效性。我们的项目网站(含视频)位于此链接:https://rvl.cs.toronto.edu/blog/hypercrl

英文摘要

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model (and the pause required between plan executions) grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with videos is at https://rvl.cs.toronto.edu/blog/hypercrl
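To make the term "task-conditional hypernetwork" concrete, here is a minimal sketch under stated assumptions. It is not HyperCRL's code: the linear hypernetwork, the linear dynamics model, the task names, and all numeric values are hypothetical. It shows only the structural idea from the abstract: one fixed-capacity hypernetwork maps a per-task embedding to the parameters of a dynamics model, so task-aware dynamics are represented without storing one network per task.

```python
def affine(weights, bias, vec):
    """Matrix-vector product plus bias, one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def hypernetwork(task_embedding, hyper_w, hyper_b):
    """Generate the flat parameter vector of the dynamics model for one task."""
    return affine(hyper_w, hyper_b, task_embedding)

def predict_next_state(params, state, action):
    """Linear dynamics model whose weight matrix is unpacked from the generated params."""
    inputs = state + action
    n_in, n_out = len(inputs), len(state)
    if len(params) != n_in * n_out:
        raise ValueError("parameter vector does not match the model shape")
    rows = [params[i * n_in:(i + 1) * n_in] for i in range(n_out)]
    return affine(rows, [0.0] * n_out, inputs)

# Hypothetical setup: 1-D state, 1-D action, 2-D task embeddings.
# The hypernetwork weights are shared across tasks; only the embedding differs.
hyper_w = [[1.0, 0.0], [0.0, 1.0]]
hyper_b = [0.0, 0.0]
task_embeddings = {"push": [1.0, 0.5], "door": [0.9, -0.2]}

predictions = {}
for name, embedding in task_embeddings.items():
    params = hypernetwork(embedding, hyper_w, hyper_b)
    predictions[name] = predict_next_state(params, state=[2.0], action=[1.0])
    print(name, predictions[name])
```

In a continual-learning setting, adding a task would mean learning a new embedding and updating the shared hypernetwork; the regularization that protects earlier tasks during that update is omitted from this sketch.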