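arXivDaily daily academic digest: synced with the full arXiv data feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.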

2605.27314 2026-05-27 cs.RO cs.SY eess.SY 版本更新

YOLO26-RipeLoc Lite：用于温室机器人采摘中番茄成熟度检测与采摘点定位的轻量级架构

Rajmeet Singh, Manveen Kaur, Shahpour Alirezaee, Irfan Hussain

发表机构 * Department of Mechanical Engineering（机械工程系）； Khalifa University（哈利法大学）； University of Windsor（温莎大学）

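AI summary: Proposes YOLO26-RipeLoc Lite, a lightweight YOLO26-based architecture that combines a lightweight feature pyramid network, a ripeness-aware attention module, and a compact detection head to perform ripeness classification and center-point localization for greenhouse tomatoes, reaching 92.9% mAP@0.5 with only 2.38M parameters.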

AI中文摘要

在温室番茄生产中，自动化收获需要准确检测成熟番茄、进行成熟度分类，并为机器人末端执行器精确定位采摘点。本文提出YOLO26-RipeLoc Lite，一种基于YOLO26的轻量级深度学习架构，用于同时检测、成熟度分类和温室番茄的中心点定位。该模型引入了三项改进：(1) 轻量特征金字塔网络（LFPN），采用深度可分离卷积实现高效多尺度融合；(2) 成熟度感知注意力模块（RAAM），具有双池化和可学习的成熟度偏置向量，增强颜色纹理区分能力；(3) 紧凑检测头（CDH），采用共享卷积和集成的中心点回归分支，用于直接抓取规划。该模型在来自阿联酋阿布扎比SILAL温室的自定义数据集（1500张图像，6227个实例，其中3566个成熟，2661个未成熟）上进行评估。YOLO26-RipeLoc Lite在仅使用2.38M参数的情况下，实现了92.9%的mAP@0.5（成熟95.2%，未成熟90.6%），在所有评估架构中精度最高（95.2%）。训练后批量归一化剪枝30%可将参数减少至约1.8M，且精度损失可忽略。消融研究证实，温室感知的HSV增强提供了最大的改进（+2.02个百分点 mAP@50），骨干网络冻结达到了峰值精度（93.8%），而三阶段渐进解冻获得了最佳的定位质量（mAP@50:95为64.6%）。与YOLOv8n/s、YOLO11n/s、YOLO12n/s和YOLO26s的比较证实了其优越的精度-效率：比YOLO12n精度高2.9个百分点，参数少7.0%，并集成了用于机器人末端执行器引导的中心点定位。

英文摘要

In greenhouse tomato production, automated harvesting requires accurate detection of ripe tomatoes, ripeness classification, and precise picking-point localization for robotic end-effectors. This paper proposes YOLO26-RipeLoc Lite, a lightweight deep learning architecture based on YOLO26 for simultaneous detection, ripeness classification, and center-point localization of greenhouse tomatoes. The model introduces three modifications: (1) a Lightweight Feature Pyramid Network (LFPN) with depthwise separable convolutions for efficient multi-scale fusion, (2) a Ripeness-Aware Attention Module (RAAM) with dual pooling and a learnable ripeness bias vector for enhanced color-texture discrimination, and (3) a Compact Detection Head (CDH) with shared convolutions and an integrated center-point regression branch for direct grasp planning. The model is evaluated on a custom dataset of 1,500 images with 6,227 instances (3,566 ripe, 2,661 unripe) from the SILAL greenhouse, Abu Dhabi, UAE. YOLO26-RipeLoc Lite achieves mAP@0.5 of 92.9% (95.2% ripe, 90.6% unripe) with the highest precision (95.2%) among all evaluated architectures using only 2.38M parameters. Post-training BatchNorm pruning at 30% reduces parameters to ~1.8M with negligible accuracy loss. Ablation studies confirm that greenhouse-aware HSV augmentation provides the largest improvement (+2.02 pp mAP@50), backbone freezing achieves peak precision (93.8%), and 3-phase progressive unfreezing yields the best localization quality (mAP@50:95 of 64.6%). Comparisons with YOLOv8n/s, YOLO11n/s, YOLO12n/s, and YOLO26s confirm superior accuracy-efficiency: 2.9 pp higher precision than YOLO12n with 7.0% fewer parameters and integrated center-point localization for robotic end-effector guidance.
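The abstract attributes part of the model's small footprint to depthwise separable convolutions in the LFPN. The following is a minimal sketch of the parameter arithmetic behind that choice. The channel widths and kernel size are illustrative assumptions, not values from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer size chosen only for illustration.
standard = conv_params(128, 128, 3)
separable = depthwise_separable_params(128, 128, 3)
ratio = separable / standard      # equals 1/c_out + 1/k**2
print(standard, separable, round(ratio, 4))
```

For this hypothetical layer the separable form needs roughly 12% of the weights of the standard convolution, which is the general reason the technique suits lightweight detectors.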

2605.27079 2026-05-27 cs.LG cs.AI cs.RO 版本更新

Trust Region Q Adjoint Matching

信任区域Q伴随匹配

Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin

发表机构 * KAIST AI（韩国科学技术院人工智能）； Seoul National University（首尔国立大学）； RLWRLD

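AI summary: To address the instability of off-policy reinforcement learning for pretrained flow policies, proposes Trust Region Q-Adjoint Matching, which adaptively controls the path-space KL divergence through projected dual descent for stable fine-tuning, reaching a 68% offline RL success rate across 50 OGBench tasks.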

AI中文摘要

由于多步采样过程带来的优化不稳定性，预训练流策略的离策略强化学习仍然具有挑战性。最近，带有伴随匹配的Q学习（QAM）通过将问题重新表述为一个具有学习评论家的无记忆随机最优控制（SOC）问题来解决这一问题。然而，QAM继承了评论家引导改进的根本脆弱性：当评论家病态时，小的评论家误差会被放大，通常导致模型崩溃。本文引入了信任区域Q伴随匹配（TRQAM），一种稳定的离策略微调算法，通过投影对偶下降自适应地控制与预训练流策略的路径空间KL散度。具体来说，我们优化SOC动力学中的信任区域参数$λ$，并从理论上证明路径空间KL可以用$λ$的闭式函数表示。因此，我们的方法可以精确控制与预训练流策略的精确偏差，实现稳定的离策略强化学习。通过在50个OGBench任务上的实验，TRQAM在离线强化学习和离线到在线强化学习中都持续优于先前的方法。特别是，TRQAM在离线强化学习中实现了68%的总体成功率，显著提高了最强基线的46%。

英文摘要

Off-policy reinforcement learning of pretrained flow policies remains challenging because the multi-step sampling process makes optimization unstable. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating the problem as a memoryless stochastic optimal control (SOC) problem with a learned critic. However, QAM inherits a fundamental fragility of critic-guided improvement: small critic errors are amplified when critics are ill-conditioned, often leading to model collapse. This paper introduces Trust Region Q-Adjoint Matching (TRQAM), a stable off-policy fine-tuning algorithm that adaptively controls the path-space KL divergence from the pretrained flow policy through projected dual descent. Specifically, we optimize the trust-region parameter $λ$ in the SOC dynamics and show theoretically that the path-space KL can be expressed as a closed-form function of $λ$. As a result, our method can precisely control the deviation from the pretrained flow policy, achieving stable off-policy RL. In experiments on 50 OGBench tasks, TRQAM consistently outperforms prior methods in both offline RL and offline-to-online RL. In particular, TRQAM achieves an overall success rate of 68% in offline RL, substantially improving on the strongest baseline's 46%.
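To make the idea of adapting a trust-region parameter against a KL budget concrete, here is a toy sketch of a projected update on a dual variable. It is not the TRQAM algorithm: `kl_of_lambda` is a placeholder for the paper's closed-form expression (which the abstract does not state), and the budget, step size, and sign convention are assumptions made for illustration.

```python
def kl_of_lambda(lam):
    """Placeholder assumption: KL shrinks as the trust-region parameter grows."""
    return 1.0 / lam


def projected_dual_step(lam, kl_budget, step_size, lam_min=1e-3):
    """Raise lambda when the KL budget is exceeded, lower it otherwise,
    then project back onto the feasible set lam >= lam_min."""
    violation = kl_of_lambda(lam) - kl_budget
    return max(lam_min, lam + step_size * violation)


def solve_lambda(kl_budget, lam0=1.0, step_size=20.0, iterations=2000):
    """Iterate the projected step until lambda settles where KL meets the budget."""
    lam = lam0
    for _ in range(iterations):
        lam = projected_dual_step(lam, kl_budget, step_size)
    return lam


lam_star = solve_lambda(kl_budget=0.1)
print(lam_star, kl_of_lambda(lam_star))
```

In this toy setting the iteration settles at the value of `lam` whose KL equals the budget, which mirrors the abstract's claim that a closed-form KL makes the deviation from the pretrained policy directly controllable.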

2605.27046 2026-05-27 cs.RO 版本更新

Learning to Balance Motor Thermal Safety and Quadrupedal Locomotion Performance with Residual Policy

学习平衡电机热安全与四足运动性能的残差策略

Yuhang Wan, Weixian Lin, Letian Qian, Yiqi Zou, Weiwei Wu, Shengwei Wu, Chuanlin Zhao, Xin Luo

发表机构 * School of Mechanical Science and Engineering, Huazhong University of Science and Technology（华中科技大学机械科学与工程学院）

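AI summary: Proposes a two-stage training framework that combines a whole-body thermal model with a residual policy to prevent motor overheating while preserving locomotion performance, enabling long-duration locomotion under payload.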

AI中文摘要

电机热管理在电动驱动机器人（尤其是腿式机器人）中常被忽视，但电机过热是限制长时间运动的关键因素，特别是在负载条件下。本文将一个四足机器人的整机热模型集成到强化学习流水线中以更新电机温度，并提出一个用于电机热管理的两阶段训练框架。在该框架中，首先预训练一个名义策略作为能够穿越多种地形的运动基线。然后，在名义策略之上训练一个残差策略，根据机器人的热状态提供修正动作，确保在低温条件下保持高性能，并在高温条件下防止电机过热。仿真结果表明，所提出的策略在电机热安全与运动性能之间实现了有效平衡。在宇树A1四足机器人上的真实世界实验进一步验证了该方法：在3千克负载下，机器人能够在多种地形上稳定运动超过13分钟，而仅使用名义策略时，约5分钟就会导致电机过热。

英文摘要

Motor thermal management is often overlooked in the context of electrically-actuated robots, particularly legged robots, but motor overheating is a key factor that limits long-duration locomotion especially under payload conditions. This paper integrates a whole-body thermal model of a quadruped robot into the reinforcement learning pipeline to update motor temperatures, and proposes a two-stage training framework for motor thermal management. In this framework, a nominal policy is first pre-trained as a locomotion baseline capable of traversing diverse terrains. A residual policy is then trained on top of the nominal policy to provide corrective actions based on the robot's thermal state, ensuring high performance under low-temperature conditions and preventing motor overheating under high-temperature conditions. Simulation results demonstrate that the proposed policy achieves an effective balance between motor thermal safety and locomotion performance. Real-world experiments on a Unitree A1 quadruped robot further validate the approach: under a 3 kg payload, the robot achieves stable locomotion across multiple terrains for over 13 minutes, while the nominal policy alone leads to motor overheating in about 5 minutes.
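The two ingredients described above, a thermal model that updates motor temperature and a residual action added to a nominal action, can be sketched as follows. This is a minimal illustration under stated assumptions: the first-order lumped thermal model, all constants, and the residual clipping are hypothetical and are not taken from the paper.

```python
def thermal_step(temp, torque, dt, ambient=25.0, k_heat=0.05, r_th=2.0, c_th=50.0):
    """One Euler step of an assumed first-order lumped thermal model.

    Heating is taken as proportional to torque squared; cooling is
    proportional to the temperature gap to ambient.
    """
    heat_in = k_heat * torque ** 2
    heat_out = (temp - ambient) / r_th
    return temp + dt * (heat_in - heat_out) / c_th


def combined_action(nominal, residual, max_residual=0.2):
    """Add a clipped residual correction to each nominal joint command."""
    clipped = [max(-max_residual, min(max_residual, r)) for r in residual]
    return [a + r for a, r in zip(nominal, clipped)]


# Hypothetical values for a two-joint example.
print(thermal_step(25.0, 10.0, 1.0))
print(combined_action([0.5, -0.3], [0.5, -0.1]))
```

Clipping the residual is one common way to keep the corrective policy from overriding the pretrained locomotion behavior; whether the authors bound it this way is not stated in the abstract.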

2605.27038 2026-05-27 cs.RO 版本更新

TPS-Drive: Task-Guided Representation Purification for VLM-based Autonomous Driving

TPS-Drive: 基于VLM的自动驾驶任务引导表示净化

Jiaxiang Li, Yumao Liu, Ke Ma

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

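AI summary: Proposes the TPS-Drive framework, which uses task-guided representation purification (an Agent-Centric Tokenizer) to address spatial hallucination and representation interference in VLMs for autonomous driving, enabling precise 3D spatial forecasting and safe planning.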

AI中文摘要

视觉-语言模型（VLM）为自动驾驶规划提供了有前景的基础，但弥合语义推理与精确3D空间预测之间的差距仍然是一个关键挑战。现有的表示策略通常遵循两条路径：文本对齐方法将连续空间状态扁平化为符号，这损害了几何结构并导致“空间幻觉”；密集视觉方法保留了空间拓扑，但用冗余的背景纹理压垮了标准分词器，导致“表示干扰”。为了解决这些限制，我们引入了TPS-Drive，一个以任务引导表示净化为核心的新框架，使VLM能够在净化空间中思考。其核心是一个以智能体为中心的分词器，利用由冻结的3D检测头监督的任务引导向量量化机制，将有限的码本容量从普遍的静态背景显式重新分配给关键的动态智能体，并有效隔离空间冗余。利用这种净化的空间词汇，TPS-Drive采用解耦的推理流程，依次执行场景理解、未来预测和动作生成。该框架通过渐进的三阶段训练范式进行优化，最终通过奖励驱动的细化超越纯模仿学习。大量实验验证了我们的方法：TPS-Drive在开环nuScenes评估中实现了准确的智能体空间状态预测并降低了碰撞率，同时在严格的闭环NAVSIMv1和NAVSIMv2基准测试中建立了新的安全记录。

英文摘要

Vision-Language Models (VLMs) provide a promising foundation for autonomous driving planning, yet bridging semantic reasoning and precise 3D spatial forecasting remains a critical challenge. Existing representation strategies generally follow two paths: text-aligned methods flatten continuous spatial states into symbols, which compromises geometric structure and induces "spatial hallucinations"; dense visual methods preserve spatial topology but overwhelm standard tokenizers with redundant background textures, leading to "representation interference". To address these limitations, we introduce TPS-Drive, a novel framework centered on Task-Guided Representation Purification that empowers VLMs to Think in Purified Space. At its core, an Agent-Centric Tokenizer utilizes a task-guided vector quantization mechanism supervised by a frozen 3D detection head, which explicitly reallocates limited codebook capacity from pervasive static backgrounds to critical dynamic agents and effectively isolates spatial redundancy. Leveraging this purified spatial vocabulary, TPS-Drive employs a decoupled reasoning pipeline that sequentially performs scene understanding, future forecasting, and action generation. The framework is optimized via a progressive three-stage training paradigm, culminating in reward-driven refinement that surpasses pure imitation learning. Extensive experiments validate our approach: TPS-Drive achieves accurate agent spatial state forecasting and reduces collision rates in open-loop nuScenes evaluations, while establishing new safety records on the rigorous closed-loop NAVSIMv1 and NAVSIMv2 benchmarks.

2605.26991 2026-05-27 cs.RO 版本更新

Towards Shared Embodied Intelligence in Humanoid Robots through Optimization Development and Testing of the Human Aware ergoCub Robot

通过优化开发与测试人类感知的ergoCub机器人迈向人形机器人的共享具身智能

Carlotta Sartore, Mohamed Elobaid, Lorenzo Rapetti, Giulio Romualdi, Stefano Dafarra, Nicola A. Piga, Ines Sorrentino, Paolo Maria Vicecone, Silvio Traversaro, Ugo Pattacini, Luca Fiorio, Francesco Draicchio, Giovanna Tranfo, Lorenzo Natale, Marco Maggiali, Daniele Pucci

发表机构 * GenerativeBionics（生成仿生学）； Artificial and Mechanical Intelligence（人工与机械智能）； School of Computer Science, University of Manchester（曼彻斯特大学计算机科学学院）； Humanoid Sensing and Perception（人形传感与感知）； iCub Tech Facility（iCub技术设施）； DiMEILA, Istituto Nazionale Assicurazione Infortuni sul Lavoro (INAIL)（DiMEILA，意大利国家职业伤害保险机构（INAIL））

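AI summary: Proposes an architecture that integrates shared intelligence and embodied cognition, optimizing robot hardware and control against human ergonomic metrics to enable physical human-robot collaboration, with the ergoCub humanoid robot as a concrete implementation.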

AI中文摘要

协作是人类行为的核心，使得完成超出个人能力的任务成为可能。这种能力源于通过对他人的内部表征来协调行动，这一概念被称为共享智能。此外，人类以其身体和认知能力为特征，这些能力会根据环境进行优化，这种现象被称为具身认知。设计能够安全有效地与人协作的人形机器人需要统一这些原则。在此，我们提出一种整合共享智能与具身认知的架构，使机器人能够与人类进行物理协作，其中机器人硬件和控制针对人体指标进行优化，利用人体和运动智能的表征。最终目标是实现一种共享具身智能的形式。具体而言，我们的架构根据人体工程学指标优化机器人硬件和物理智能参数。这是通过将人机交互建模为硬件配置的函数，并将人体模型嵌入机器人的物理智能中来实现的。作为具体实现，我们介绍了人形机器人ergoCub，其形态和控制已针对与人类的协作任务进行了优化。我们的方法为设计在硬件和物理智能层面优先考虑人体工程学的人形机器人提供了一个框架，并应用于工业和辅助机器人领域。

英文摘要

Collaboration is central to human behavior, enabling tasks beyond individual capability. This ability arises from coordinating actions through internal representations of others, a concept known as shared intelligence. Additionally, humans are characterized by physical bodies and cognitive abilities that are optimized in response to their environment, a phenomenon referred to as embodied cognition. Designing humanoid robots that collaborate safely and effectively with people requires unifying these principles. Here we propose an architecture that integrates shared intelligence and embodied cognition to enable robots to physically collaborate with humans, where robot hardware and control are optimized for human metrics, using representations of the human body and motion intelligence. The ultimate goal is to achieve a form of shared embodied intelligence. Specifically, our architecture optimizes robot hardware and physical intelligence parameters with respect to human ergonomic metrics. This is accomplished by modeling human-robot interaction as a function of hardware configurations and embedding human models into the robot's physical intelligence. As a concrete implementation, we present the humanoid robot ergoCub, whose morphology and control have been optimized for collaborative tasks with humans. Our approach provides a framework for designing humanoid robots that prioritize human ergonomics at both the hardware and physical intelligence levels, with applications in industrial and assistive robotics.

2605.26944 2026-05-27 cs.RO cs.CV 版本更新

Object Pose and Shape Estimation for Grasping: Does it Work?

用于抓取的目标姿态与形状估计：有效吗？

Pavan Karke, Kushal Shah, Gaurav Singh, Md Faizal Karim, K Madhava Krishna, Rajat Talak

发表机构 * Robotics Research Center, IIIT Hyderabad（IIIT海得拉巴机器人研究中心）； National University of Singapore（新加坡国立大学）

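AI summary: Compares an end-to-end grasp synthesis method with modular methods (which first estimate object pose and shape, then sample grasps) to assess how well existing pose and shape estimation methods work for grasping.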

Comments 9 pages, 8 figures

AI中文摘要

目标姿态和形状估计问题近年来取得了关键进展。编码器-解码器（如SAM3D、LRM、CRISP）和基于扩散的模型（如InstantMesh、Zero123、SceneComplete）展示了类别无关的形状编码能力和开放集泛化性。在这项工作中，我们提出一个问题：当与对跖抓取采样结合使用时，目标姿态和形状估计方法是否足够成熟，以至于能够超越端到端抓取合成方法？我们通过将研究范围限定在平行颚夹爪、7自由度抓取和单视图RGB(-D)图像输入，详细探讨了这个问题。我们实现并比较了一种最先进的端到端抓取合成方法和三种模块化方法，这些方法首先估计场景中所有目标的姿态和形状，然后使用对跖采样生成抓取。我们观察到，在所有实验中，模块化方法均优于端到端方法。模块化方法能够合成大量抓取，即使是对于端到端方法失败的小目标也是如此。模块化方法的有效性取决于姿态和形状估计的准确性，并且在杂乱场景中会部分退化——这是现有姿态和形状估计方法的局限性。我们还分析了三种模块化方法的失败模式和运行时间，这些方法使用了两种不同的目标姿态和形状估计方式：一种基于编码器-解码器模型，另一种基于扩散模型。最后，我们证明单视图目标姿态和形状估计方法可以与视觉语言模型结合，仅从单视图RGB-D图像输入即可产生语言条件抓取。我们注意到其性能与最先进的LERF-TOGO基线相当。

英文摘要

The problem of object pose and shape estimation has seen key advancements lately. Encoder-decoder (e.g., SAM3D, LRM, CRISP) and diffusion-based models (e.g., InstantMesh, Zero123, SceneComplete) have shown category-agnostic shape encoding capacity and open-set generalizability. In this work, we ask: are object pose and shape estimation methods mature enough that, when used with antipodal grasp sampling, they can outperform end-to-end grasp synthesis methods? We explore this question in detail by scoping our study to parallel-jaw grippers, 7-DoF grasps, and a single-view RGB(-D) image as input. We implement and compare a state-of-the-art end-to-end grasp synthesis method and three modular methods, which first estimate the pose and shape of all objects in the scene and then generate grasps using antipodal sampling. We observe that the modular methods outperform the end-to-end method in all our experiments. The modular methods are able to synthesize plenty of grasps, even for small objects, where the end-to-end method fails. The effectiveness of the modular methods is contingent on the accuracy of the pose and shape estimation, and suffers partial degradation in cluttered scenes - a limitation of the existing pose and shape estimation methods. We also analyze the failure modes and run-times for the three modular methods, which use two different approaches to object pose and shape estimation: one based on an encoder-decoder model, the other on a diffusion model. Finally, we demonstrate that single-view object pose and shape estimation methods can be augmented with vision-language models to yield language-conditioned grasps from just a single-view RGB-D image as input. We observe performance comparable to the state-of-the-art LERF-TOGO baseline.
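The modular pipelines above rely on antipodal grasp sampling over the estimated shape. A minimal sketch of the textbook antipodal test for a parallel-jaw gripper follows: the line joining the two contacts must lie inside the friction cone at both contacts. The friction coefficient and the sample points are assumptions for illustration, and the function does not handle coincident contact points.

```python
import math


def _unit(v):
    """Normalize a 3-vector given as a tuple."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def is_antipodal(p1, n1, p2, n2, mu=0.5):
    """Check a two-contact grasp against the friction-cone condition.

    p1, p2 are contact points; n1, n2 are outward surface normals there.
    """
    axis = _unit(tuple(b - a for a, b in zip(p1, p2)))  # direction p1 -> p2
    n1, n2 = _unit(n1), _unit(n2)
    cos_limit = math.cos(math.atan(mu))                 # half-angle of the cone
    cos1 = -sum(a * n for a, n in zip(axis, n1))        # axis vs inward normal at p1
    cos2 = sum(a * n for a, n in zip(axis, n2))         # reversed axis vs inward normal at p2
    return cos1 >= cos_limit and cos2 >= cos_limit


# Opposite faces of a box: a valid antipodal pair.
print(is_antipodal((-1, 0, 0), (-1, 0, 0), (1, 0, 0), (1, 0, 0)))
# Adjacent faces: the grasp axis leaves both friction cones.
print(is_antipodal((-1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, 1, 0)))
```

A sampler would typically draw candidate point pairs from the reconstructed mesh and keep those passing this test, which is why its yield depends on the accuracy of the estimated shape and normals.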

2605.26936 2026-05-27 cs.RO 版本更新

VLA 模型能否从现实世界数据中持续学习而不遗忘？

Jiarun Zhu, Yijun Hong, Xiaoquan Sun, Zetian Xu, Mingqi Yuan, Zhiyong Wang, Wenjun Zeng, Jiayu Chen

发表机构 * HKU（香港大学）； INFIFORCE ； EIT, Ningbo（宁波东方理工大学）； HUST（华中科技大学）； SUSTech（南方科技大学）； HITSZ（哈尔滨工业大学（深圳））

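AI summary: Builds a real-world continual learning dataset of four sequential manipulation tasks, finds empirically that vision-language-action (VLA) models suffer severe catastrophic forgetting when continually learning from heterogeneous real-world demonstrations, and systematically evaluates the key implementation factors of experience replay.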

AI中文摘要

视觉-语言-动作（VLA）模型为通用机器人提供了有前景的基础。然而，它们在现实场景中的成功部署需要能够持续获取新技能，同时保留先前学习的行为。虽然开创性研究在狭窄的模拟环境中研究了VLA模型的持续学习，但在现实条件下这一挑战仍未得到充分探索。为解决这一局限，我们构建了一个真实世界的持续学习数据集，包含四个顺序操作任务，涵盖刚体抓取放置、接触式按压和可变形物体折叠。利用该数据集，我们进行了全面实验，发现VLA模型在持续学习异构真实世界演示时遭受显著的灾难性遗忘。然后，我们系统评估了经验回放，并揭示了决定其成功的关键实施因素。总之，这项工作提供了真实世界持续VLA学习的首次实证研究，并为部署长期运行的机器人策略提供了实用指导。

英文摘要

Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously learned behaviors. While pioneering research has studied the continual learning of VLA models in narrowly simulated environments, this challenge remains largely unexplored under realistic conditions. To address this limitation, we construct a real-world continual learning dataset comprising four sequential manipulation tasks, spanning rigid-object pick-and-place, contact-rich pressing, and deformable-object folding. Using this dataset, we conduct comprehensive experiments and find that VLA models suffer significant catastrophic forgetting when continually learning from heterogeneous real-world demonstrations. We then systematically evaluate experience replay and uncover key implementation factors that govern its success. In summary, this work provides the first empirical study of real-world continual VLA learning and offers practical guidance for deploying long-lived robot policies.
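Experience replay, the mitigation evaluated above, interleaves stored samples from earlier tasks with data from the current task. The sketch below shows one generic way to build such a mixed batch. The function name, the replay ratio, and sampling with replacement are assumptions for illustration; the abstract does not say which implementation factors the authors found to matter.

```python
import random


def mixed_batch(new_samples, replay_buffer, batch_size, replay_ratio, rng):
    """Draw a batch that mixes current-task samples with replayed ones."""
    # Reserve slots for earlier tasks only if something has been stored.
    n_replay = int(round(batch_size * replay_ratio)) if replay_buffer else 0
    n_new = batch_size - n_replay
    fresh = rng.choices(new_samples, k=n_new)            # sampled with replacement
    replayed = rng.choices(replay_buffer, k=n_replay) if n_replay else []
    batch = fresh + replayed
    rng.shuffle(batch)                                   # avoid a fixed ordering
    return batch


rng = random.Random(0)
new_task = [("new", i) for i in range(10)]
old_tasks = [("old", i) for i in range(5)]
batch = mixed_batch(new_task, old_tasks, batch_size=8, replay_ratio=0.25, rng=rng)
print(sum(1 for tag, _ in batch if tag == "old"), len(batch))
```

With a batch of 8 and a ratio of 0.25, two slots go to replayed samples; on the first task, when the buffer is empty, the batch falls back to current-task data only.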

2605.26782 2026-05-27 cs.RO cs.HC 版本更新

Manipulating Tangible Virtual Object Dynamics to Promote Learning of Precision Force Generation

操控有形虚拟物体动力学以促进精确力生成的学习

Alberto Garzás-Villar, Alba Riera-Cardona, Alexis Derumigny, J. Micah Prendergast, Jane Murray Cramm, Laura Marchal-Crespo

发表机构 * Department of Cognitive Robotics, Delft University of Technology, Delft, 2628 CD, The Netherlands ； Department of Socio-medical Sciences, Erasmus School of Health Policy & Management, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands ； Department of Applied Mathematics, Delft University of Technology, Delft, 2628 CD, The Netherlands ； Department of Rehabilitation Medicine, Erasmus Medical Center, Rotterdam, 3015 GD, The Netherlands

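AI summary: Proposes training precise force control by manipulating the dynamics of tangible virtual objects (linear, Gaussian, or antisymmetric Gaussian spring models). The antisymmetric Gaussian group achieved higher force accuracy than the linear group during training, but long-term retention showed no significant differences, and participants relied mainly on the learned target elongation rather than the target force.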

AI中文摘要

机器人触觉设备结合虚拟现实为训练精细力生成提供了新机会，这是中风后康复中重要但常被忽视的部分。本研究提出，操控有形虚拟物体的渲染动力学可用于训练精确力控制，同时激活体感系统。我们进行了一项实验，50名健康参与者执行一项类似冰壶的任务，他们必须拉伸虚拟弹簧以产生目标释放力，将石头推至冰面上预定义位置。在训练中，弹簧的力-伸长关系被建模为线性或非线性函数，即高斯或反对称高斯函数，在释放目标力处导数为零。结果表明，反对称高斯组在训练中始终比线性组获得更高的力精度，而高斯组仅在训练后期优于线性组。人格特质分析显示，在高斯动力学下，更高的自由精神得分与较差的表现和减少的任务探索相关，而更高的挑战转化得分与增加的探索相关。尽管存在这些训练效应，但在不同弹簧类型或人格特质之间，长期保留没有显著差异。参与者主要依赖学习到的目标伸长而非目标力，这通过在不同刚度但相同目标力的转移任务中的表现得以证实。虽然这些方法对体感神经康复有前景，但在对神经疾病患者进行测试之前，需要改进以减少对本体感觉线索的依赖。

英文摘要

Robotic haptic devices combined with virtual reality offer novel opportunities to train fine force generation, an essential yet overlooked component of post-stroke rehabilitation. This study proposes that manipulating the rendered dynamics of tangible virtual objects can be leveraged to train precise force control while engaging the somatosensory system. We conducted an experiment with fifty healthy participants who performed a curling-inspired task in which they had to stretch a virtual spring to generate a target release force to propel the stone to a predefined location on the ice sheet. During training, the spring's force-elongation relationship was modeled as either a linear or non-linear function, i.e., a Gaussian or antisymmetric Gaussian (AS-Gaussian) function with zero derivative at the release target force. Results indicate that the AS-Gaussian group consistently achieved higher force accuracy during training than the linear group, while the Gaussian group only outperformed the linear group toward the end of training. Analysis of personality traits revealed that higher Free Spirit scores were associated with poorer performance and reduced task exploration under Gaussian dynamics, whereas higher Transform-of-Challenge scores correlated with increased exploration. Despite these training effects, no significant differences in long-term retention were found across spring types or personality traits. Participants primarily relied on learned target elongation rather than target force, as evidenced by performance in a transfer task with a different stiffness but the same target force. While promising for somatosensory neurorehabilitation, these methods require refinement to reduce reliance on proprioceptive cues before testing with neurological patients.
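The transfer-task finding can be illustrated with the linear spring case, where force is stiffness times elongation. The sketch below shows what happens if a participant reproduces the learned elongation after the stiffness changes. The stiffness and force values are hypothetical and are not from the study.

```python
def target_elongation(target_force, stiffness):
    """Elongation that yields the target force on a linear spring (F = k * x)."""
    return target_force / stiffness


def force_from_elongation(elongation, stiffness):
    """Force produced by stretching a linear spring to the given elongation."""
    return stiffness * elongation


# Hypothetical training spring: 10 N target at 100 N/m.
learned_x = target_elongation(10.0, 100.0)
# Hypothetical transfer spring: same 10 N target, stiffness raised to 150 N/m.
transfer_force = force_from_elongation(learned_x, 150.0)
print(learned_x, transfer_force)
```

Reproducing the learned elongation on the stiffer spring overshoots the unchanged target force, so a systematic force error of this kind in the transfer task is the signature of relying on elongation rather than force.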

2605.26710 2026-05-27 cs.RO 版本更新

Look Further: Socially-Compliant Navigation System in Residential Buildings

看得更远：住宅楼中的社交合规导航系统

Akira Shiba, Marina Obata, Nathan Kau, Zoltan Beck, Rishi Shah, Michael Sudano, Sabrina Lee

发表机构 * Toyota Woven City（丰田织城）

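AI summary: Proposes a Proactive Lane-Changing (PLC) motion pattern that extends the reaction distance to over 8 meters to improve how people perceive the robot's motion, significantly improving safety, smoothness, and politeness in the straight hallway scenario.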

Comments 2025 ACM/IEEE International Conference on Human-Robot Interaction

DOI: 10.1109/HRI61500.2025.10973828
Journal ref: 2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Melbourne, Australia, 2025, pp. 272-282

AI中文摘要

移动机器人对人的反应距离强烈影响人机交互的多种品质。本文聚焦于移动配送机器人在住宅室内走廊环境中的导航。社交导航方法通常侧重于避免令人不适的人机交互，例如机器人侵入某人的个人空间。由于个人空间已被证明仅在几米范围内，社交导航方法通常侧重于解决这些短距离交互。然而，在本工作中，我们证明通过将反应距离扩展到超过8米（远超出典型交互距离），可以改善人类对机器人运动的感知。我们引入了主动变道（PLC）运动模式以及利用该模式在更远距离上对人做出反应的导航系统。该模式包括当机器人在走廊中从中心向侧面导航时，在距离迎面而来的人8米处改变其横向位置。我们进行了一项有42名参与者的用户研究，基于三个服务目标（安全性、流畅性和礼貌性）评估他们对配送机器人的印象。在直走廊场景（正面接近）中，结果显示与文献中典型的运动模式（减速、停止和在靠近人时反应性避障）相比，这三个目标均有显著改善。相比之下，在交叉口（盲角）场景中，没有任何一种方法显著优于其他方法，参与者对机器人运动模式的偏好各不相同。

英文摘要

The distance at which a mobile robot reacts to a person strongly impacts various qualities of the human-robot interaction. In this paper, we focus on the navigation of a mobile delivery robot platform in a residential indoor hallway environment. Social navigation methods typically focus on avoiding uncomfortable human-robot interactions, such as when a robot encroaches on someone's personal space. Since personal space has been shown to be in the range of just a few meters, social navigation methods typically focus on deconflicting and resolving these short-range interactions. In this work, however, we demonstrate that by extending the reaction distance to over eight meters, far beyond the typical interaction distance, we can improve the human's perception of the robot's motion. We introduce the Proactive Lane-Changing (PLC) motion pattern and a navigation system that leverages it to react to people at an increased distance. This pattern consists of changing the robot's lateral position as it navigates down the hallway from the center to the side at an eight-meter distance from an oncoming person. We conducted a user study with 42 participants to assess their impressions of the delivery robot based on three service objectives: safety, smoothness, and politeness. In the straight hallway scenario (Frontal Approach), results showed significant improvement in each of these three objectives compared to typical motion patterns found in the literature: slowing down, stopping, and reactive collision avoidance in the proximity of a person. In contrast, in the intersection (Blind Corner) scenarios, none of the approaches performed significantly better than any other, with participants having a diverse range of preferences among robot motion patterns.

2605.24465 2026-05-27 cs.RO 版本更新

Polymander II: an amphibious salamander-inspired robot with contact and flow sensors

Polymander II：一种带有接触和流量传感器的两栖蝾螈启发机器人

Qiyuan Fu, Sudong Lee, Andrea Grillo, Jonathan Arreguit, Louis Gevers, Josie Hughes, Auke J. Ijspeert

发表机构 * Biorobotics Laboratory, EPFL（生物机器人实验室，洛桑联邦理工学院）； CREATE Lab, EPFL（CREATE实验室，洛桑联邦理工学院）； Innobridge Services Sàrl（Innobridge Services公司）

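AI summary: Presents an amphibious robot that uses Hall-effect sensors to sense foot contact forces and lateral hydrodynamic forces, enabling sensing of terrestrial and aquatic environments and feedback control.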

Comments This work has been accepted for publication in the 2026 International Conference on Robotics and Automation (ICRA), Vienna, Austria

AI中文摘要

机器人受益于感官信息来协调身体运动、增强对扰动的鲁棒性，并在不同模式间转换以适应各种地形。然而，很少有两栖机器人能够感知与陆地和水中环境的交互。在本文中，我们提出了一种解决方案，使用霍尔效应传感器来感知一种受蝾螈启发的两栖机器人的足部接触力和侧向水动力。通过两条总线，机器人可以同时以超过500 Hz的频率获取这些外部感受信息，并以100 Hz的频率获取本体感受信息，如关节位置和负载。所使用的霍尔效应传感器体积小巧，适合嵌入机器人多个位置，并且对小力具有高灵敏度。此外，由于传感器可以与测量对象分开放置，防水实现相对容易。我们的测试展示了机器人在穿越两栖环境方面的能力，以及其在利用反馈控制执行更复杂运动任务方面的潜力。

英文摘要

Robots benefit from sensory information to coordinate body movement, gain robustness against perturbations, and transition between different modes to adapt to various terrains. However, few amphibious robots can sense interactions with both terrestrial and aquatic environments. In this paper, we present a solution that uses Hall-effect sensors to sense foot contact forces and lateral hydrodynamic forces on a salamander-inspired amphibious robot. With two bus lines, the robot can simultaneously acquire this exteroceptive information at more than 500 Hz and proprioceptive information, such as joint positions and loads, at 100 Hz. The Hall-effect sensors used are compact, making them suitable for embedding in multiple positions within a robot, and exhibit high sensitivity to small forces. Moreover, because the sensor can be positioned separately from the measured object, waterproofing can be implemented with relative ease. Our tests demonstrate the robot's capabilities in traversing amphibious environments and its potential in using feedback control for more complex locomotion tasks.

2605.26682 2026-05-27 cs.RO cs.CV 版本更新

SteelDS: A High-Resolution Video Dataset of E40 Steel Scrap for Object Detection and Instance Segmentation

SteelDS: 用于目标检测和实例分割的E40钢废料高分辨率视频数据集

Melanie Neubauer, Christian Rauch, Gerald Koinig, Alexia Tischberger-Aldrian, Roland Pomberger, Elmar Rueckert

发表机构 * Chair of Cyber-Physical-Systems（信息物理系统系）； Technical University of Leoben（莱奥本技术大学）； Chair of Waste Processing Technology and Waste Management（废物处理技术与废物管理系）

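AI summary: The dataset provides high-resolution annotated video sequences of E40-grade steel and copper scrap on a conveyor belt, to support development of machine learning models for material classification, object detection, and instance segmentation.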

2605.26649 2026-05-27 cs.RO 版本更新

On the Generalization Capabilities, Design Choices and Limitations of Keypoint Imitation Learning

关键点模仿学习的泛化能力、设计选择与局限性

Thomas Lips, Marco Moletta, Michael C. Welle, Danica Kragic, Francis wyffels

发表机构 * AI and Robotics Lab, IDLAB-AIRO, Ghent University-imec（人工智能与机器人实验室，IDLAB-AIRO，根特大学-imec）； Robotics, Perception and Learning Lab, (RPL), EECS, KTH Royal Institute of Technology（机器人、感知与学习实验室，（RPL），EECS，皇家理工学院）； INCAR Robotics AB, Stockholm, Sweden（INCAR机器人公司，斯德哥尔摩，瑞典）

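AI summary: Through more than 2,000 real-world experiments, systematically evaluates the generalization capabilities, design choices, and limitations of keypoint imitation learning built on vision foundation models for robot manipulation. It reports an overall success rate of 75%, well above the RGB baseline (47%) and comparable to the S2 diffusion model (73%), but the approach does not outperform other representations and inherits the limitations of the foundation models.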

Comments This version was submitted to IROS 2026

打破认知陷阱：复合不确定性下的主动感知

Chayan Banerjee, Ethan Goan

发表机构 * School of Electrical Engineering and Robotics（电气工程与机器人学学院）

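AI summary: To address reinforcement learning failures in safety-critical domains caused by coupled state-dynamics uncertainty, proposes an Adaptive Safety Architecture built on a mutual-information-based Compound Uncertainty Coefficient and active information-seeking policies.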

AI中文摘要

在安全关键领域部署强化学习，从自动驾驶到医疗决策支持，受到系统遇到不熟悉条件时出现的失败的限制。我们认为，根本瓶颈不是单个挑战，如变化的动力学或不完整的观测，而是它们的协同交互，我们称之为认知陷阱：智能体无法在不知道系统动力学的情况下估计其状态，也无法在没有准确状态信息的情况下学习动力学。在模拟运动中的概念验证实验表明，结合这些不确定性导致的失败远严重于单独挑战，性能下降77%，而单独效应相加为46%，展示了传统方法忽略的复合失败模式。这些方法采用被动的认知立场，无法解决这种耦合的不确定性。我们提出将安全重新定义为信息问题，引入一个适应性安全架构，围绕三个贡献构建：复合不确定性系数（κ），一种基于互信息的度量，量化状态-动力学耦合，可在线计算而无需完整的联合信念推断；由MaxInfoRL目标驱动的信息寻求策略，主动探测系统动力学；以及随认知耦合上升而收紧的工况自适应安全约束。这种范式转变，从被动鲁棒性到主动感知，为在不确定性下运行、识别自身无知并战略性地采取行动解决它的决策系统提供了原则性路径。

英文摘要

Deploying reinforcement learning in safety-critical domains, from autonomous vehicles to medical decision support, is constrained by failures that arise when systems encounter unfamiliar conditions. We argue that the fundamental bottleneck is not individual challenges like changing dynamics or incomplete observations, but their synergistic interaction, which we term the Epistemic Trap: agents cannot estimate their state without knowing system dynamics, nor learn dynamics without accurate state information. Proof-of-concept experiments in simulated locomotion reveal that combining these uncertainties causes failures far worse than either challenge alone: a 77% performance degradation, versus the 46% obtained by adding the individual effects, demonstrating compounding failure modes that conventional methods overlook. Such approaches adopt a passive epistemic stance that cannot resolve this coupled uncertainty. We propose reframing safety as an information problem, introducing an Adaptive Safety Architecture built around three contributions: the Compound Uncertainty Coefficient ($κ$), a mutual-information-based metric that quantifies state-dynamics coupling and is computable online without full joint belief inference; information-seeking policies governed by a MaxInfoRL objective that actively probe system dynamics; and regime-adaptive safety constraints that tighten as epistemic coupling rises. This paradigm shift, from passive robustness to active perception, offers a principled path toward decision-making systems that operate under uncertainty, recognize their own ignorance, and act strategically to resolve it.
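The coefficient described above is based on mutual information between state and dynamics uncertainty. The sketch below computes plain mutual information for a small discrete joint distribution, only to show what such a coupling measure captures. It is not the paper's definition of κ, and the example tables are assumptions for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    `joint[i][j]` is the probability of state bin i and dynamics bin j.
    """
    px = [sum(row) for row in joint]            # marginal over state bins
    py = [sum(col) for col in zip(*joint)]      # marginal over dynamics bins
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                         # zero cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


independent = [[0.25, 0.25], [0.25, 0.25]]     # no coupling between the two
coupled = [[0.5, 0.0], [0.0, 0.5]]             # knowing one fixes the other
print(mutual_information(independent), mutual_information(coupled))
```

The independent table gives zero and the fully coupled table gives ln 2, the maximum for two binary variables, so a measure of this kind rises exactly when state and dynamics uncertainty become entangled.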

2605.26478 2026-05-27 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

基于随机解耦策略梯度的高效在策略视觉强化学习

Haoxiang You, Yilang Liu, Davis Zong, Qian Wang, Teeratham Vitchutripop, Qi Wang, Daniel Rakita, Ian Abraham

发表机构 * Yale University（耶鲁大学）； Shanghai Jiao Tong University（上海交通大学）； University of Sydney（悉尼大学）

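AI summary: Proposes Stochastic Decoupled Policy Gradient (SDPG), which estimates policy gradients from stochastic perturbations of trajectory rollouts and trains diverse visuomotor control policies end to end within hours on a single GPU, substantially reducing compute and memory overhead and outperforming baselines on visual MuJoCo benchmarks.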

2605.26471 2026-05-27 cs.RO 版本更新

Heterogeneous AAV Logistics Task Allocation: A Reinforcement Learning Enhanced Overlapping Coalition Formation Game Approach

异构AAV物流任务分配：一种强化学习增强的重叠联盟形成博弈方法

Yuze Zhou, Jingliang Sun, Junzhi Li, Jianxin Zhong, Zihan Wang, Teng Long

发表机构 * Beijing Institute of Technology（北京理工大学）； Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education（教育部飞行器动力学与控制重点实验室）

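AI summary: To address the optimality challenge that randomly arriving time-sensitive tasks pose for heterogeneous AAV task allocation in dynamic urban logistics, proposes an overlapping coalition formation game enhanced with a transformer-based soft actor-critic network, which improves task-allocation optimality and is proven to converge to a Nash-stable equilibrium.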

Comments 12 pages

AI中文摘要

在动态城市物流中，时间敏感任务的随机出现对异构AAV物流任务分配提出了显著的最优性挑战。为解决这一问题，提出了一种强化学习增强的重叠联盟形成博弈方法。建立了动态任务分配模型，其中全局最优性通过耦合服务质量与资源消耗的广义物流成本进行数学量化。为应对随机订单到达引起的时变任务集，设计了一种基于Transformer的软演员-评论家网络。通过利用多头自注意力编码可变长度的物流状态并捕获任务间的时空依赖关系，学习到的策略自适应地指导联盟更新，取代重叠联盟形成博弈中的启发式规则。在此基础上，异构AAV可以为动态物流任务形成更高效的重叠联盟。所得到的联盟形成过程被证明构成一个精确势博弈，保证了在有限迭代次数内收敛到纳什稳定均衡。数值仿真表明，所提算法在广义物流成本准则下有效提高了任务分配的最优性。在32架AAV和80个任务的场景中，与启发式OCF基线相比，我们的算法实现了39.76%的成本降低。室内飞行实验进一步验证了其实用性。

英文摘要

In dynamic urban logistics, the stochastic emergence of time-sensitive tasks poses a significant optimality challenge for heterogeneous AAV logistics task allocation. To address this problem, a reinforcement learning enhanced overlapping coalition formation game approach is proposed. A dynamic task allocation model is established, where global optimality is mathematically quantified by a generalized logistics cost coupling service quality and resource consumption. To deal with the time-varying task sets induced by stochastic order arrivals, a transformer-based soft actor-critic network is designed. By leveraging multi-head self-attention to encode variable-length logistics states and capture task-wise spatiotemporal dependencies, the learned policy adaptively guides coalition updates, replacing heuristic rules in the overlapping coalition formation game. On this basis, heterogeneous AAVs can form more efficient overlapping coalitions for dynamic logistics tasks. The resulting coalition formation process is proven to constitute an exact potential game, which guarantees convergence to a Nash-stable equilibrium within a finite number of iterations. Numerical simulations demonstrate that the proposed algorithm effectively improves the optimality of task allocation under the generalized logistics cost criterion. In a scenario with 32 AAVs and 80 tasks, our algorithm achieves a 39.76% cost reduction compared with the heuristic OCF baseline. Indoor flight experiments further validate its practicality.

2605.26349 2026-05-27 cs.RO 版本更新

Closing the Loop in Teleoperation: Episode-Level Data Quality Assessment and Feedback for High-Quality Demonstration Collection

在遥操作中闭环：面向高质量演示收集的片段级数据质量评估与反馈

Gokul Narayanan, Yash Shahapurkar, Melih Erdogan, Brian Zhu, Eugen Solowjow

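Affiliations: Siemens Corporation

AI summary: Proposes a data quality assessment and feedback framework that gives immediate post-episode feedback based on semantic task progress and robot telemetry, helping novice operators improve demonstration quality.

AI Chinese abstract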

工业自动化正处于关键时刻，物理AI正推动从刚性、手工设计的自动化系统向更灵活、自适应的系统转变。这一转变产生了对大规模、真实世界机器人演示数据的需求，使得遥操作成为越来越重要的数据收集机制。然而，在实践中，高质量的遥操作演示仍然难以获得，因为新手操作员经常产生任务成功但下游使用次优的片段，原因包括低效运动、重复修正或接近机器人关节极限操作。我们提出一个数据质量评估与反馈（DQAF）框架，通过提供基于语义任务进度和机器人遥测的即时后片段反馈，在遥操作中实现闭环。该框架提取质量相关信号，如子任务进度、运动平滑度、停顿、运动学极限，并将其转化为结构化质量评估和可操作的自然语言反馈。与二元成功或失败反馈不同，所提系统解释了片段为何次优，并突出显示下次试验中需要纠正的具体行为。我们通过诊断验证研究和试点用户研究评估该框架。在验证研究中，系统在数据集整理过程中与人类评审员进行比较，产生拒绝原因和可操作的改进反馈。在涉及三个新手操作员的两项操作任务的试点研究中，接收系统即时自动后片段反馈的操作员比未接收的改进更快，更早产生更高质量的演示。

English abstract

Industrial automation is at a pivotal moment, as Physical AI is driving a transition from rigid, hand-engineered automation systems toward more flexible and adaptive systems. This shift has created a growing demand for large-scale, real-world robot demonstration data, making teleoperation an increasingly important mechanism for data collection. However, high-quality teleoperated demonstrations remain difficult to obtain in practice, as novice operators often produce episodes that are task-successful but suboptimal for downstream use due to inefficient motion, repeated corrections, or operation near robot joint limits. We present a Data Quality Assessment and Feedback (DQAF) framework that closes the loop in teleoperation by providing immediate post-episode feedback grounded in semantic task progress and robot telemetry. The framework extracts quality-relevant signals such as sub-task progress, motion smoothness, stalls, and kinematic limits, and converts them into structured quality assessments and actionable natural-language feedback. Unlike binary success or failure feedback, the proposed system explains why an episode is suboptimal and highlights specific behaviors to correct in the next trial. We evaluate the framework through a diagnostic validation study and a pilot user study. In the validation study, the system is compared with a human reviewer during dataset curation, producing rejection reasons and actionable feedback for improvement. In the pilot study with three novice operators across two manipulation tasks, the operator who received the system's immediate, automated post-episode feedback improved faster than those who did not, producing higher-quality demonstrations sooner.
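
One of the telemetry signals named in the abstract, stalls, can be illustrated with a short sketch. This is a hypothetical example and not the DQAF implementation: the function name `find_stalls`, the speed threshold `speed_eps`, and the minimum duration `min_duration` are assumptions chosen for illustration.

```python
from typing import List, Tuple


def find_stalls(
    speeds: List[float],
    dt: float,
    speed_eps: float = 0.01,
    min_duration: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges where the speed stays below
    `speed_eps` for at least `min_duration` seconds; `end` is exclusive.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    stalls: List[Tuple[int, int]] = []
    start = None  # index where the current low-speed run began
    for i, v in enumerate(speeds):
        if v < speed_eps:
            if start is None:
                start = i
        else:
            # The run ended; keep it only if it lasted long enough.
            if start is not None and (i - start) * dt >= min_duration:
                stalls.append((start, i))
            start = None
    # Handle a low-speed run that lasts until the end of the episode.
    if start is not None and (len(speeds) - start) * dt >= min_duration:
        stalls.append((start, len(speeds)))
    return stalls
```

A feedback generator could then turn each returned range into a sentence about where the operator paused, but that step depends on details the abstract does not give.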

2605.26348 2026-05-27 cs.RO (updated version)

Early Pruning for Public Transport Routing

公共交通路由的早期剪枝

Andrii Rohovyi, Abdallah Abuaisha, Toby Walsh

Affiliations: Department of Computer Science and Engineering, University of New South Wales (UNSW), Sydney, NSW 2033, Australia; Department of Data Science and Artificial Intelligence, Monash University, Melbourne, Australia

AI summary: Proposes Early Pruning, which pre-sorts transfer connections and applies a pruning rule inside the transfer loop to accelerate public transport routing algorithms without compromising optimality; experiments show query time reductions of up to 57%.

AI Chinese abstract

公共交通的路由算法，特别是广泛使用的RAPTOR及其变体，在支持无限换乘时，常常在换乘松弛阶段面临性能瓶颈，尤其是在密集的换乘图上。这种低效源于遍历许多潜在的站点间连接（步行、自行车、电动滑板车等）。为了保持可接受的性能，从业者通常限制换乘距离或排除某些换乘选项，这可能会降低路径的最优性并限制向旅客展示的多模式选项。本文介绍了早期剪枝，一种低开销的技术，可以在不影响最优性的情况下加速路由算法。通过按持续时间预排序换乘连接，并在换乘循环内应用剪枝规则，该方法在站点处丢弃较长的换乘，一旦它们无法产生比当前最佳解更早的到达时间。早期剪枝可以以最小的更改集成到现有代码库中，并且只需要一次预处理步骤。该技术在扩展准则设置中保持帕累托最优性，只要额外的优化准则在换乘持续时间上单调非递减。在多个基于RAPTOR的最新解决方案中，包括RAPTOR、ULTRA-RAPTOR、McRAPTOR、BM-RAPTOR、ULTRA-McRAPTOR和UBM-RAPTOR，并在瑞士和伦敦交通网络上测试，我们实现了高达57%的查询时间减少。该方法为交通路径查找算法的效率提供了可推广的改进。

English abstract

Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during the transfer relaxation phase, especially on dense transfer graphs, when supporting unlimited transfers. This inefficiency arises from iterating over many potential inter-stop connections (walks, bikes, e-scooters, etc.). To maintain acceptable performance, practitioners often limit transfer distances or exclude certain transfer options, which can reduce path optimality and restrict the multimodal options presented to travellers. This paper introduces Early Pruning, a low-overhead technique that accelerates routing algorithms without compromising optimality. By pre-sorting transfer connections by duration and applying a pruning rule within the transfer loop, the method discards longer transfers at a stop once they cannot yield an earlier arrival than the current best solution. Early Pruning can be integrated with minimal changes to existing codebases and requires only a one-time preprocessing step. The technique preserves Pareto-optimality in extended-criteria settings whenever the additional optimization criteria are monotonically non-decreasing in transfer duration. Across multiple state-of-the-art RAPTOR-based solutions, including RAPTOR, ULTRA-RAPTOR, McRAPTOR, BM-RAPTOR, ULTRA-McRAPTOR, and UBM-RAPTOR, tested on the Switzerland and London transit networks, we achieved query time reductions of up to 57%. This approach provides a generalizable improvement to the efficiency of transit pathfinding algorithms.
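
The pre-sorting and pruning rule described in the abstract can be sketched as follows for the single-criterion (earliest-arrival) case. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary-based data layout, and the choice of the target stop's best known arrival as the "current best solution" are all hypothetical.

```python
from typing import Dict, List, Tuple

INF = float("inf")


def presort_transfers(
    transfers: Dict[str, List[Tuple[str, int]]]
) -> Dict[str, List[Tuple[str, int]]]:
    """One-time preprocessing: sort each stop's transfers by duration."""
    return {
        stop: sorted(edges, key=lambda edge: edge[1])
        for stop, edges in transfers.items()
    }


def relax_transfers(
    arrival: Dict[str, float],
    marked: List[str],
    sorted_transfers: Dict[str, List[Tuple[str, int]]],
    target: str,
) -> int:
    """Relax transfers from the marked stops, updating `arrival` in place.

    Returns the number of transfer edges examined, so the effect of
    pruning can be observed.
    """
    # Snapshot the start times so transfers are not chained within a round.
    base_times = {stop: arrival[stop] for stop in marked}
    examined = 0
    for stop in marked:
        for neighbour, duration in sorted_transfers.get(stop, []):
            examined += 1
            candidate = base_times[stop] + duration
            # Transfers are sorted by duration: once one cannot beat the
            # best known arrival at the target, no longer one can either.
            if candidate >= arrival.get(target, INF):
                break
            if candidate < arrival.get(neighbour, INF):
                arrival[neighbour] = candidate
    return examined
```

In this toy layout the early `break` is the only change to an otherwise ordinary transfer loop, which matches the abstract's claim that the technique needs minimal changes plus a one-time sort.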

2508.14422 2026-05-27 eess.SY cs.RO cs.SY math.OC (updated version)

A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control

四旋翼SO(3)姿态控制中在线扰动辨识的切片学习框架

Tianhua Gao, Masashi Izumita, Kohji Tomita, Akiya Kamimura

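Affiliations: Intelligent Systems Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Japan; Graduate School of Systems and Information Engineering, University of Tsukuba, Japan

AI summary: Proposes Sliced Learning, a dimension-decomposed geometric learning framework that uses a Lie-algebraic error representation for axis-wise space decomposition and combines the lightweight SANM module with Lyapunov-based adaptive laws, achieving online disturbance identification at 400 Hz on a resource-constrained MCU.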

Comments v4: This version has been accepted for publication in IEEE/ASME Transactions on Mechatronics (TMECH). Supplementary video links have also been added

DOI: 10.1109/TMECH.2026.3689054
Journal ref: 2026 IEEE/ASME Transactions on Mechatronics

AI Chinese abstract

本文介绍了一种称为Sliced Learning的维度分解几何学习框架，用于四旋翼几何姿态控制中的扰动辨识。与传统的从状态学习不同，该框架采用从误差学习的策略，使用李代数误差表示作为输入特征，在保持SO(3)结构的同时实现轴空间分解（“切片”）。这与神经科学中观察到的认知控制的几何机制高度一致，其中神经系统在结构化子空间内组织自适应表征以实现认知灵活性和效率。基于该框架，我们开发了轻量级且结构可解释的Sliced Adaptive-Neuro Mapping（SANM）模块。用于在线辨识的高维映射被轴向“切片”为多个低维子映射（“切片”），由浅层神经网络和自适应律实现。这些神经网络和自适应律通过基于Lyapunov的自适应在其各自的共享子空间中在线更新。为了增强可解释性，我们证明了在时变扰动和惯性不确定性下的指数收敛性。据我们所知，Sliced Learning是首批在资源受限的微控制器单元（MCU）（如STM32）上以400Hz频率展示轻量级在线神经自适应的框架之一，并经过了实际实验验证。

English abstract

This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition ("slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially "sliced" into multiple low-dimensional submappings ("slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
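
To make the idea of a Lie-algebraic attitude error concrete, the sketch below computes the logarithm map of the relative rotation between a desired and an actual attitude, giving a 3-vector whose components could each feed one per-axis "slice". This is an assumption-laden illustration: the abstract does not state which error definition the paper uses, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def rot_z(theta: float) -> Matrix:
    """Rotation about the z axis by `theta` radians (test helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose_times(a: Matrix, b: Matrix) -> Matrix:
    """Return a^T b for 3x3 matrices."""
    return [
        [sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def so3_log(r: Matrix) -> List[float]:
    """Logarithm map of a rotation matrix as an axis-angle 3-vector.

    Valid for rotation angles strictly below pi.
    """
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    # Vee map of the skew-symmetric part (R - R^T) / 2.
    w = [
        (r[2][1] - r[1][2]) / 2.0,
        (r[0][2] - r[2][0]) / 2.0,
        (r[1][0] - r[0][1]) / 2.0,
    ]
    scale = 1.0 if angle < 1e-8 else angle / math.sin(angle)
    return [scale * wi for wi in w]


def attitude_error(r_desired: Matrix, r_actual: Matrix) -> List[float]:
    """Error vector log(R_d^T R); one scalar per body axis."""
    return so3_log(transpose_times(r_desired, r_actual))
```

For a pure yaw offset, only the third component of the error is non-zero, which is the kind of axis-wise separation the abstract refers to as slicing.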

2601.16578 2026-05-27 cs.RO cs.SY eess.SY (updated version)

Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab

Cyber-Physical Mobility Lab中的零样本多智能体强化学习基准测试

Julius Beerwerth, Jianye Xu, Simon Schäfer, Fynn Belderink, Bassam Alrifaee

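Affiliations: Cyber-Physical Mobility Lab; University of the Bundeswehr Munich; RWTH Aachen University

AI summary: Builds a reproducible benchmark on the Cyber-Physical Mobility Lab for evaluating sim-to-real transfer of multi-agent reinforcement learning policies for connected and automated vehicles, and reveals two complementary sources of performance degradation.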

2604.08059 2026-05-27 cs.RO cs.AI (updated version)

Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study

受治理的能力演化：基于AI组件的系统的生命周期兼容性检查与回滚——以具身智能体为例

Xue Qin, Simin Luan, John See, Zeyd Boukhers, Cong Yang, Zhijun Li

Affiliations: School of Software, Harbin Institute of Technology; School of Computer Science and Technology, Harbin Institute of Technology; School of Mathematical and Computer Sciences, Heriot-Watt University, Malaysia Campus; School of Future Science and Engineering, Soochow University; Fraunhofer Institute for Applied Information Technology

AI summary: For AI-component-based systems, proposes a governed capability evolution framework that enables safe deployment through four classes of compatibility checks and a seven-stage upgrade pipeline, achieving zero unsafe activations in embodied-agent experiments.

Comments 42 pages, 7 figures, 12 tables

AI Chinese abstract

由版本化AI组件构建的软件系统越来越需要生命周期治理：当能力模块演化到新版本时，宿主系统必须决定新版本是否可以安全激活、应在何种部署条件下运行、如何监控以及何时回滚。现有的软件部署模式（金丝雀发布、蓝绿部署、特性标志和MLOps管线）解决了这一循环的部分问题，但它们是针对无状态Web服务而非驱动现场AI组件的带状态、策略约束运行时设计的。我们将受治理的能力演化形式化为基于AI组件的系统的一等软件生命周期问题，并提出一个分阶段升级框架，其中每个新能力版本被视为受治理的部署候选，而非立即可执行的替换。该框架引入了四类升级兼容性检查（接口、策略、行为、恢复），并将其组织成七阶段管线（候选验证、沙箱评估、影子部署、门控激活、在线监控、回滚、审计）。我们在带有ROS 2中间件的PyBullet操作测试平台上实现了参考原型，并在15个随机种子的6轮能力升级中进行了评估。朴素升级实现了72.9%的任务成功率，但到最后一轮不安全激活率升至60%；受治理升级保持了可比的成功率（67.4%），同时在所有轮次中保持零不安全激活（Wilcoxon p=0.003）。影子部署揭示了40%的升级回归问题，这些问题是单独沙箱评估无法发现的，并且在79.8%的激活后漂移场景中回滚成功。

English abstract

Software systems built from versioned AI components increasingly need lifecycle-time governance: when a capability module evolves into a new version, the hosting system must decide whether the new version may be activated safely, under what deployment conditions it should run, how it must be monitored, and when it should be rolled back. Existing software-deployment patterns (canary release, blue-green, feature flags, and MLOps pipelines) address parts of this loop but were designed for stateless web services rather than for stateful, policy-constrained runtimes that drive AI components in the field. We formulate governed capability evolution as a first-class software-lifecycle problem for AI-component-based systems and propose a staged upgrade framework in which every new capability version is treated as a governed deployment candidate rather than an immediately executable replacement. The framework introduces four upgrade compatibility checks (interface, policy, behavioral, recovery) and organizes them into a seven-stage pipeline (candidate validation, sandbox evaluation, shadow deployment, gated activation, online monitoring, rollback, audit). We implement a reference prototype on a PyBullet manipulation testbed with ROS 2 middleware and evaluate it over 6 rounds of capability upgrade with 15 random seeds. Naive upgrade achieves 72.9% task success but drives unsafe activation to 60% by the final round; governed upgrade retains comparable success (67.4%) while maintaining zero unsafe activations across all rounds (Wilcoxon p=0.003). Shadow deployment reveals 40% of upgrade regressions invisible to sandbox evaluation alone, and rollback succeeds in 79.8% of post-activation drift scenarios.
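
The staged pipeline named in the abstract can be pictured as a sequence of gates followed by monitored activation. The sketch below is purely illustrative: the stage names come from the abstract, but the function `governed_upgrade`, the gate callables, and the way a failed gate or detected drift is handled are assumptions, not the paper's reference prototype.

```python
from typing import Callable, Dict, List, Tuple

# Stage names follow the abstract; their implementation here is assumed.
GATED_STAGES = ("candidate validation", "sandbox evaluation", "shadow deployment")

Gate = Callable[[dict], bool]


def governed_upgrade(
    candidate: dict,
    gates: Dict[str, Gate],
    drift_detected: Callable[[dict], bool],
) -> Tuple[str, List[str]]:
    """Walk a candidate capability version through the staged pipeline.

    Returns the final status and an audit trail of each decision.
    """
    audit: List[str] = []
    for stage in GATED_STAGES:
        passed = gates[stage](candidate)
        audit.append(f"{stage}: {'pass' if passed else 'fail'}")
        if not passed:
            # A failed gate means the candidate is never activated.
            audit.append("audit: rejected before activation")
            return "rejected", audit
    audit.append("gated activation: activated")
    if drift_detected(candidate):
        # Online monitoring flagged post-activation drift, so revert.
        audit.append("online monitoring: drift")
        audit.append("rollback: previous version restored")
        audit.append("audit: rolled back")
        return "rolled_back", audit
    audit.append("online monitoring: ok")
    audit.append("audit: active")
    return "active", audit
```

The design point the sketch tries to capture is that a new version is a deployment candidate until every gate passes, and that the audit trail is written on every path, including rejection and rollback.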

2605.00412 2026-05-27 cs.AI cs.RO (updated version)

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

物理原生世界模型：生成式世界建模的哈密顿视角

Sen Cui, Jingheng Ma

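Affiliations: Tsinghua University

AI summary: Proposes Hamiltonian World Models, which use a structured latent phase space and Hamiltonian-inspired dynamics to produce physically reliable, action-controllable, and long-horizon stable future predictions for embodied decision making.

AI Chinese abstract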

世界模型最近重新成为具身智能、机器人、自动驾驶和基于模型的强化学习的核心范式。然而，当前的世界模型研究通常由三条部分分离的路线主导：强调视觉未来合成的2D视频生成模型、强调空间重建的3D场景中心模型，以及强调抽象预测表示的JEPA类潜变量模型。每条路线都取得了重要进展，但它们仍然难以提供物理可靠、动作可控且长期稳定的预测以支持具身决策。在本文中，我们认为世界模型的瓶颈不再仅仅是它们能否生成逼真的未来，而是这些未来是否物理上有意义且对动作有用。我们提出哈密顿世界模型作为世界建模的一个物理基础视角。关键思想是将观测编码到结构化的潜相空间中，通过带有控制、耗散和残差项的哈密顿动力学演化潜状态，将预测轨迹解码为未来观测，并利用生成的轨迹进行规划。我们讨论了哈密顿结构如何提高可解释性、数据效率和长期稳定性，同时也指出了在涉及摩擦、接触、非保守力和可变形物体的真实机器人场景中的实际挑战。

English abstract

World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon stable predictions for embodied decision making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
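
The latent evolution step described in the abstract can be illustrated on a one-dimensional toy system. The sketch below assumes a separable Hamiltonian H(q, p) = p^2/2 + V(q) and uses a semi-implicit Euler update with an added control force and linear damping; the learned residual term mentioned in the abstract is omitted. The function names and the harmonic potential are illustrative assumptions, not the paper's model.

```python
from typing import Callable, Tuple


def hamiltonian_step(
    q: float,
    p: float,
    dt: float,
    grad_potential: Callable[[float], float],
    control: float = 0.0,
    damping: float = 0.0,
) -> Tuple[float, float]:
    """One semi-implicit Euler step for H(q, p) = p^2 / 2 + V(q).

    The momentum update adds a control force and linear dissipation;
    the position update then uses the new momentum.
    """
    p_next = p + dt * (-grad_potential(q) + control - damping * p)
    q_next = q + dt * p_next
    return q_next, p_next


def rollout_energy(steps: int, dt: float, damping: float) -> float:
    """Roll out a unit harmonic oscillator from (q, p) = (1, 0) and
    return the final energy p^2 / 2 + q^2 / 2."""
    q, p = 1.0, 0.0
    for _ in range(steps):
        q, p = hamiltonian_step(q, p, dt, lambda x: x, damping=damping)
    return 0.5 * p * p + 0.5 * q * q
```

Without damping, this update keeps the energy close to its initial value over long rollouts, which is one way to read the abstract's point about long-horizon stability; with damping, the energy decays as expected for a dissipative system.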

2604.00993 2026-05-27 astro-ph.IM astro-ph.EP cs.LG cs.RO (updated version)

Focal plane wavefront control with model-based reinforcement learning

基于模型的强化学习进行焦平面波前控制

Jalo Nousiainen, Iremsu Taskin, Markus Kasper, Gilles Orban De Xivry, Olivier Absil

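Affiliations: European Southern Observatory (ESO); STAR Institute, Université de Liège

AI summary: Proposes PO4NCPA, a model-based reinforcement learning algorithm that uses sequential phase diversity to automatically correct dynamic and static non-common-path aberrations, enabling focal-plane wavefront control for high-contrast imaging.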

Comments 13 pages, 11 figures accepted by A&A

DOI: 10.1051/0004-6361/202558504
Journal ref: A&A 709, A267 (2026)

AI Chinese abstract

直接成像潜在宜居系外行星是极大望远镜高对比度成像仪器的主要科学目标之一。大多数此类系外行星轨道靠近其主星，其观测受到快速移动的大气散斑和准静态非共路像差（NCPA）的限制。传统的NCPA校正方法通常使用机械镜面探针，这会在操作期间影响性能。本文提出了基于机器学习的NCPA控制方法，通过利用顺序相位分集自动检测和校正动态及静态NCPA误差。我们将先前用于自适应光学的强化学习工作扩展到焦平面控制。一种新的基于模型的RL算法——NCPA策略优化（PO4NCPA），将焦平面图像解释为输入数据，并通过顺序相位分集确定相位校正，从而在没有先验系统知识的情况下优化非日冕和日冕后点扩散函数。此外，我们通过在受水汽诱导视宁度（动态NCPA）影响的地基望远镜和红外成像仪上数值模拟静态NCPA误差，证明了该方法的有效性。模拟表明，PO4NCPA能够稳健地补偿静态和动态NCPA。在静态情况下，它实现了使用日冕仪时近最优的焦平面光抑制，以及无日冕仪时近最优的斯特列尔比。在动态NCPA情况下，它在这些指标上与结合1步延迟积分器的模态最小二乘重构性能相当。该方法对ELT光瞳、矢量涡旋日冕仪以及在光子和背景噪声下仍然有效。PO4NCPA是无模型的，可直接应用于标准成像以及任何日冕仪。其亚毫秒级的推理时间和性能也使其适用于高对比度成像之外的大气湍流实时低阶校正。

English abstract

The direct imaging of potentially habitable exoplanets is one prime science case for high-contrast imaging instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPA). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPA errors by leveraging sequential phase diversity. We extend previous work in reinforcement learning for AO to focal plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPA errors on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of the modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, vector vortex coronagraph, and under photon and background noise. PO4NCPA is model-free and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.

2603.28730 2026-05-27 cs.RO cs.CL cs.CV (updated version)

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

SOLE-R1：视频语言推理作为机器人强化学习的唯一奖励

Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza

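Affiliations: MIT; RAI Institute

AI summary: Proposes SOLE-R1, a model that uses video-language spatiotemporal reasoning to generate dense task-progress estimates as the sole reward signal, enabling zero-shot online reinforcement learning without ground-truth rewards, demonstrations, or task-specific tuning.

AI Chinese abstract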

视觉语言模型（VLM）在各种任务中展现出令人印象深刻的能力，这促使人们努力利用这些模型来监督机器人学习。然而，当在强化学习（RL）中用作评估器时，当今最强的模型在部分可观测性和分布偏移下常常失败，使得策略能够利用感知错误而非解决任务。我们提出SOLE-R1（自观察学习器），一种专门设计用于为在线RL提供唯一奖励信号的视频语言推理模型。仅给定原始视频观测和自然语言目标，SOLE-R1执行每时间步的时空思维链（CoT）推理，并生成可直接用作奖励的密集任务进度估计。为了训练SOLE-R1，我们开发了一个大规模视频轨迹和推理合成流水线，生成与连续进度监督对齐的时间基础CoT轨迹。这些数据与基础的空间和多帧时间推理相结合，并使用混合框架训练模型，该框架将监督微调与可验证奖励的RL相结合。在四个不同的仿真环境和真实机器人设置中，SOLE-R1实现了从随机初始化的零样本在线RL：机器人学习之前未见过的操作任务，无需真实奖励、成功指标、演示或任务特定调优。SOLE-R1在24个未见过的任务上成功，并显著优于强视觉语言奖励器，包括Robometer、RoboReward、ReWiND、GPT-5和Gemini-3-Pro，同时对奖励破解表现出明显更强的鲁棒性。我们在匿名页面发布所有模型、数据、代码和演示：https://philip-mit.github.io/sole-r1/

English abstract

Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enabling policies to exploit perceptual errors rather than solve the task. We introduce SOLE-R1 (Self-Observing LEarner), a video-language reasoning model explicitly designed to serve as the sole reward signal for online RL. Given only raw video observations and a natural-language goal, SOLE-R1 performs per-timestep spatiotemporal chain-of-thought (CoT) reasoning and produces dense estimates of task progress that can be used directly as rewards. To train SOLE-R1, we develop a large-scale video trajectory and reasoning synthesis pipeline that generates temporally grounded CoT traces aligned with continuous progress supervision. This data is combined with foundational spatial and multi-frame temporal reasoning, and used to train the model with a hybrid framework that couples supervised fine-tuning with RL from verifiable rewards. Across four different simulation environments and a real-robot setting, SOLE-R1 enables zero-shot online RL from random initialization: robots learn previously unseen manipulation tasks without ground-truth rewards, success indicators, demonstrations, or task-specific tuning. SOLE-R1 succeeds on 24 unseen tasks and substantially outperforms strong vision-language rewarders, including Robometer, RoboReward, ReWiND, GPT-5, and Gemini-3-Pro, while exhibiting markedly greater robustness to reward hacking. We release all models, data, code, and demos at the anonymous page: https://philip-mit.github.io/sole-r1/

2603.25415 2026-05-27 cs.AI cs.RO (updated version)

Modernising Reinforcement Learning-Based Navigation for Embodied Semantic Scene Graph Generation

具身语义场景图生成的强化学习导航现代化

Roman Küble, Marco Hüller, Mrunmai Phatak, Rainer Lienhart, Jörg Hähner

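Affiliations: Organic Computing Group and Machine Learning and Computer Vision Group, University of Augsburg, Am Technologiezentrum 8, Augsburg, Germany

AI summary: Presents a modular navigation component that modernises decision-making in embodied semantic scene graph generation by replacing the policy-optimisation method and redesigning the discrete action representation, and evaluates how different action sets and policy structures affect scene-graph completeness, execution safety, and navigation behaviour.

AI Chinese abstract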

语义世界模型使具身智能体能够推理对象、关系和空间上下文，超越纯几何表示。在有机计算中，此类模型是在不确定性和资源约束下实现目标驱动自适应的关键。核心挑战是在有限动作预算内获取最大化模型质量和下游实用性的观测。语义场景图（SSG）为此提供了结构紧凑的表示。然而，在有限动作视界内构建SSG需要探索策略，在信息增益与导航成本之间权衡，并决定何时额外动作的收益递减。本文提出了用于具身语义场景图生成的模块化导航组件，并通过替换策略优化方法和重新审视离散动作公式来现代化其决策。我们研究了紧凑和更细粒度的较大离散动作集，并比较了原子动作上的单头策略与动作组件上的分解多头策略。我们评估了课程学习和基于深度的可选碰撞监督，并评估了SSG完整性、执行安全性和导航行为。结果表明，仅替换优化算法在相同奖励塑造下相对于基线将SSG完整性提高了21%。深度主要影响执行安全性（无碰撞运动），而完整性基本保持不变。将现代优化与更细粒度、分解的动作表示相结合，产生了最强的完整性-效率权衡。

English abstract

Semantic world models enable embodied agents to reason about objects, relations, and spatial context beyond purely geometric representations. In Organic Computing, such models are a key enabler for objective-driven self-adaptation under uncertainty and resource constraints. The core challenge is to acquire observations maximising model quality and downstream usefulness within a limited action budget. Semantic scene graphs (SSGs) provide a structured and compact representation for this purpose. However, constructing them within a finite action horizon requires exploration strategies that trade off information gain against navigation cost and decide when additional actions yield diminishing returns. This work presents a modular navigation component for Embodied Semantic Scene Graph Generation and modernises its decision-making by replacing the policy-optimisation method and revisiting the discrete action formulation. We study compact and finer-grained, larger discrete motion sets and compare a single-head policy over atomic actions with a factorised multi-head policy over action components. We evaluate curriculum learning and optional depth-based collision supervision, and assess SSG completeness, execution safety, and navigation behaviour. Results show that replacing the optimisation algorithm alone improves SSG completeness by 21% relative to the baseline under identical reward shaping. Depth mainly affects execution safety (collision-free motion), while completeness remains largely unchanged. Combining modern optimisation with a finer-grained, factorised action representation yields the strongest overall completeness-efficiency trade-off.

2602.21450 2026-05-27 cs.RO cs.SY eess.SY (updated version)

Vector Fields for Path Following on Lie Groups with Application in Robot Control

李群上的路径跟随向量场及其在机器人控制中的应用

Felipe Bartelt, Luciano C. A. Pimenta, Weijia Yao, Vinicius M. Gonçalves

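AI summary: For path following on Lie groups, proposes a general vector-field framework that guarantees convergence to a desired parametric curve from almost all initial conditions with continuous motion along the path, yields control inputs with a minimal representation on SE(3), and is validated in manipulator experiments.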

Comments Manuscript revised: new title, reframed abstract and introduction for robotics, and added a coauthor

AI Chinese abstract

许多机器人系统允许独立控制位置和姿态（位姿），包括全向飞行器、水下机器人和机械臂末端执行器。在许多应用中，这些系统必须遵循连续的位姿序列，从而形成轨迹跟踪或路径跟随问题。与轨迹跟踪相比，路径跟随具有重要的实际优势。我们特别关注李群上的路径跟随问题。将机器人视为在三维空间中运动的刚体，该路径跟随问题可以表述为在矩阵李群SE(3)上设计引导向量场的问题。在本文中，我们开发了一个通用的向量场框架，用于连通矩阵李群上的路径跟随，其中SE(3)是一个重要的特例。所提出的向量场保证从几乎所有初始条件收敛到期望参数曲线，同时确保沿路径连续运动。此外，另一个有趣的特点是，与先前的工作相比，控制输入在表示上是“最小的”，并且更接近工程应用（例如，在SE(3)情况下的身体扭曲）。在建立一般情况后，该框架被专门应用于机器人学中特别感兴趣的SE(3)，产生了一种适用于实时机器人控制的高效算法。使用机械臂跟踪复杂位姿路径的实验证明了该方法的有效性。还提供了开源实现。

English abstract

Many robotic systems allow independent control of position and orientation (pose), including omnidirectional aerial vehicles, underwater robots, and manipulator end-effectors. In many applications, these systems must follow a continuous sequence of poses, leading to either trajectory-tracking or path-following formulations. Compared to trajectory tracking, path following offers important practical advantages. In particular, we focus on the problem of path following on Lie groups. Considering the robots as rigid bodies moving in 3D space, this path-following problem can be posed as a problem of designing guiding vector fields on the matrix Lie group SE(3). In this paper, we develop a general vector-field framework for path following on connected matrix Lie groups, of which SE(3) is a prominent special case. The proposed vector field guarantees convergence to a desired parametric curve from almost all initial conditions while ensuring continuous motion along the path. Furthermore, another interesting feature is that, as opposed to previous works, the control input is "minimal" in terms of representation and closer to the engineering application (e.g., the body twist in the case of SE(3)). After establishing the general case, the framework is then specialized to SE(3), of special interest in robotics, yielding an efficient algorithm suitable for real-time robotic control. Experiments with a robotic manipulator tracking complex pose paths demonstrate the effectiveness of the approach. An open-source implementation is also provided.

2602.17822 2026-05-27 cs.RO (updated version)

Evolution of Safety Requirements in Industrial Robotics: Comparative Analysis of ISO 10218-1/2 (2011 vs. 2025) and Integration of ISO/TS 15066

工业机器人安全要求的演进：ISO 10218-1/2（2011 与 2025 版）比较分析及 ISO/TS 15066 的整合

Daniel Hartmann, Kristýna Hamříková, Aleš Vysocký, Vendula Laciok, Aleš Bernatík

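Affiliations: Faculty of Mechanical Engineering, VSB – Technical University of Ostrava; Faculty of Safety Engineering, VSB – Technical University of Ostrava

AI summary: Compares ISO 10218:2011 with ISO 10218:2025 to analyse how industrial robot safety requirements have evolved in functional safety, cybersecurity, robot classification, and collaborative applications, and notes the integration of ISO/TS 15066 into a comprehensive framework for designing and operating modern robotic systems.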

DOI: 10.1016/j.rineng.2026.110486

AI Chinese abstract

工业机器人已成为大型制造企业不可或缺的组成部分。同时，协作机器人日益突出，引入了人机交互的新范式。这些进步促使安全标准进行全面修订，特别是纳入了网络安全和防止未经授权访问网络化机器人系统的要求。本文对 ISO 10218:2011 和 ISO 10218:2025 标准进行了比较分析，考察了其结构、术语、技术要求和附录的演进。分析揭示了功能安全和网络安全方面的显著扩展，引入了机器人和协作应用的新分类，以及技术规范 ISO/TS 15066 的规范性整合。因此，新版本综合了机械、功能和数字安全要求，为现代机器人系统的设计和运行建立了全面框架。

English abstract

Industrial robotics has established itself as an integral component of large-scale manufacturing enterprises. Simultaneously, collaborative robotics is gaining prominence, introducing novel paradigms of human-machine interaction. These advancements have necessitated a comprehensive revision of safety standards, specifically incorporating requirements for cybersecurity and protection against unauthorized access in networked robotic systems. This article presents a comparative analysis of the ISO 10218:2011 and ISO 10218:2025 standards, examining the evolution of their structure, terminology, technical requirements, and annexes. The analysis reveals significant expansions in functional safety and cybersecurity, the introduction of new classifications for robots and collaborative applications, and the normative integration of the technical specification ISO/TS 15066. Consequently, the new edition synthesizes mechanical, functional, and digital safety requirements, establishing a comprehensive framework for the design and operation of modern robotic systems.

2601.14702 2026-05-27 cs.AI cs.CV cs.RO (updated version)

Drive-P2D: A Progressive Perception-to-Decision Benchmark for VLMs in Autonomous Driving

Drive-P2D：自动驾驶中视觉语言模型的渐进式感知到决策基准

Zecong Tang, Zixu Wang, Yifei Wang, Weitong Lian, Tianjian Gao, Haoran Li, Tengju Ru, Lingyi Meng, Zhejun Cui, Yichen Zhu, Qi Kang, Kaixuan Wang, Yu Zhang

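Affiliations: Zhejiang University; The University of Hong Kong

AI summary: Proposes the Drive-P2D benchmark, which uses a separated reasoning-and-answer protocol to evaluate the perception-to-decision capability of vision-language models at the Object, Scene, and Decision levels, and analyses error modes.

AI Chinese abstract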

自动驾驶需要在复杂场景中实现可靠的感知和安全的决策。最近的视觉语言模型（VLM）展示了推理和泛化能力，为自动驾驶开辟了新的可能性；然而，现有的基准通常分别评估感知和决策，通过仅选择格式限制故障分析，或通过LLM评分的长格式输出引入评估偏差。为了解决这些问题，我们提出了Drive-P2D，一个渐进式感知到决策基准，包含6650个问题，涵盖目标、场景和决策三个层级。Drive-P2D采用分离的推理与答案协议：最终答案客观评分，而推理则用于分析沿渐进式感知到决策链暴露的错误模式。我们评估了所有场景和高风险场景下的主流VLM，并通过相关性分析和相似场景鲁棒性测试进一步刻画了感知到决策的能力边界。推理进一步揭示了逻辑推理错误和语义特征遗漏等故障模式，我们训练了一个轻量级分析器模型来自动化大规模推理错误模式标注。这些设计共同为构建更安全、更可靠的用于现实世界自动驾驶的VLM提供了实用见解。

English abstract

Autonomous driving requires reliable perception and safe decision-making in complex scenarios. Recent vision-language models (VLMs) demonstrate reasoning and generalization abilities, opening new possibilities for autonomous driving; however, existing benchmarks often evaluate perception and decision-making separately, limit failure analysis with choice-only formats, or introduce evaluation bias through LLM-scored long-form outputs. To address these issues, we present Drive-P2D, a progressive perception-to-decision benchmark with 6,650 questions across Object, Scene, and Decision levels. Drive-P2D adopts a separated reasoning-and-answer protocol: final answers are scored objectively, while reasoning is analyzed to identify error modes exposed along the progressive perception-to-decision chain. We evaluate mainstream VLMs across all and high-risk scenarios, and further characterize the perception-to-decision capability boundary through correlation analysis and similar-scene robustness testing. Reasoning further exposes failure modes such as logical reasoning errors and semantic feature omissions, and we train a lightweight analyzer model to automate large-scale error-mode annotation of reasoning. Together, these designs provide practical insights for building safer and more reliable VLMs for real-world autonomous driving.

2601.07284 2026-05-27 cs.RO (updated version)

AdaMorph: Unified Motion Retargeting via Embodiment-Aware Adaptive Transformers

AdaMorph: 通过具身感知自适应变换器实现统一运动重定向

Haoyu Zhang, Shibo Jin, Lusong Li, Jun Li, Liang Lin, Xiaodong He, Zecui Zeng

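Affiliations: JD Explore Academy

AI summary: Proposes AdaMorph, a unified framework that uses embodiment-aware adaptive transformers to retarget human motion to diverse robot morphologies, with zero-shot generalization.

AI Chinese abstract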

将人体运动重定向到异构机器人是机器人学中的一个基本挑战，主要由于不同具身之间的严重运动学和动力学差异。现有解决方案通常训练特定于具身的模型，这扩展性差且无法利用共享的运动语义。为了解决这个问题，我们提出了AdaMorph，一个统一的神经重定向框架，使单个模型能够将人体运动适应到多种机器人形态。我们的方法将重定向视为一个条件生成任务。我们将人体运动映射到一个与形态无关的潜在意图空间，并利用双用途提示机制来条件化生成。不同于简单的输入拼接，我们利用自适应层归一化（AdaLN）根据具身约束动态调制解码器的特征空间。此外，我们通过基于课程的训练目标强制执行物理合理性，通过积分确保方向和轨迹一致性。在12个不同的人形机器人上的实验结果表明，AdaMorph有效地统一了跨异构拓扑的控制，在保持源行为动态本质的同时，对未见过的复杂运动表现出强大的零样本泛化能力。

English abstract

Retargeting human motion to heterogeneous robots is a fundamental challenge in robotics, primarily due to the severe kinematic and dynamic discrepancies between varying embodiments. Existing solutions typically resort to training embodiment-specific models, which scales poorly and fails to exploit shared motion semantics. To address this, we present AdaMorph, a unified neural retargeting framework that enables a single model to adapt human motion to diverse robot morphologies. Our approach treats retargeting as a conditional generation task. We map human motion into a morphology-agnostic latent intent space and utilize a dual-purpose prompting mechanism to condition the generation. Instead of simple input concatenation, we leverage Adaptive Layer Normalization (AdaLN) to dynamically modulate the decoder's feature space based on embodiment constraints. Furthermore, we enforce physical plausibility through a curriculum-based training objective that ensures orientation and trajectory consistency via integration. Experimental results on 12 distinct humanoid robots demonstrate that AdaMorph effectively unifies control across heterogeneous topologies, exhibiting strong zero-shot generalization to unseen complex motions while preserving the dynamic essence of the source behaviors.
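
The abstract contrasts Adaptive Layer Normalization (AdaLN) with simple input concatenation. A minimal sketch of one common AdaLN formulation is given below: features are layer-normalized, then scaled and shifted by values predicted from a conditioning vector. The `(1 + scale)` form, the linear projections, and all names are assumptions for illustration; the abstract does not specify AdaMorph's exact parameterization.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize a feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(
    weights: Sequence[Sequence[float]], bias: Sequence[float], c: Sequence[float]
) -> List[float]:
    """Dense layer: one output per weight row."""
    return [sum(w * ci for w, ci in zip(row, c)) + b for row, b in zip(weights, bias)]


def adaln(
    x: Sequence[float],
    cond: Sequence[float],
    w_scale: Sequence[Sequence[float]],
    b_scale: Sequence[float],
    w_shift: Sequence[Sequence[float]],
    b_shift: Sequence[float],
) -> List[float]:
    """Normalize `x`, then modulate it with a scale and shift predicted
    from the conditioning vector `cond` (here, an embodiment embedding)."""
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    return [h * (1.0 + s) + t for h, s, t in zip(layer_norm(x), scale, shift)]
```

With zero projection weights the layer reduces to plain layer normalization, so the conditioning acts as a learned per-embodiment adjustment on top of a shared decoder rather than as extra input tokens.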

2506.06016 2026-05-27 eess.SY cs.RO cs.SY (updated version)

Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation

相对姿态与目标角速度估计的等变滤波器

Gil Serrano, Bruno J. Guerreiro, Pedro Lourenço, Rita Cunha

Affiliations: Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa; CTS/Uninova and LASI, School of Science and Technology at NOVA University Lisbon; GNC Division, Flight Segment and Robotics, GMV

AI summary: Proposes an Equivariant Filter (EqF) that uses observations of two known non-collinear vectors to estimate the relative attitude and the target angular velocity simultaneously, and validates its performance in simulation and experiments.

Comments Published in the IEEE Transactions on Aerospace and Electronic Systems, 2026. Open Access article under CC BY 4.0

DOI: 10.1109/TAES.2025.3643413
Journal ref: IEEE Transactions on Aerospace and Electronic Systems, vol. 62, pp. 2965-2979, 2026

AI Chinese abstract

两个刚体之间相对姿态和角速度的精确估计是航天应用（如航天器交会对接）的基础。在这些场景中，追踪航天器必须利用机载传感器确定目标的姿态和角速度。本文解决了设计等变滤波器（EqF）的挑战，该滤波器能够利用目标坐标系中两个已知非共线向量的带噪观测，可靠地估计相对姿态和目标角速度。为了推导EqF，提出了系统的对称性，并计算了到对称群的等变提升。分析了可观测性和收敛性。仿真展示了滤波器的性能，蒙特卡洛运行产生了统计显著的结果。还研究了低速率测量的影响，并提出了缓解该影响的策略。使用基准标记以及传统相机和事件相机进行测量获取的实验结果进一步验证了该方法，确认了其在现实环境中的有效性。

English abstract

Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.

2509.18384 2026-05-27 cs.RO cs.FL Updated version

LAD-VF: LLM-Automatic Differentiation Enables Fine-Tuning-Free Robot Planning from Formal Methods Feedback

Yunhao Yang, Junyuan Hong, Gabriel Jacob Perin, Zhiwen Fan, Li Yin, Zhangyang Wang, Ufuk Topcu

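Affiliations: The University of Texas at Austin; University of São Paulo; Texas A&M University; SylphAI

AI summary: Proposes LAD-VF, a framework that uses formal verification feedback and LLM-AutoDiff to optimize prompts automatically, improving specification compliance in robot planning tasks without fine-tuning and raising success rates from 60% to over 90%.

Comments: Presented at ICRA 2026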

Abstract

Large language models (LLMs) can translate natural language instructions into executable action plans for robotics, autonomous driving, and other domains. Yet, deploying LLM-driven planning in the physical world demands strict adherence to safety and regulatory constraints, which current models often violate due to hallucination or weak alignment. Traditional data-driven alignment methods, such as Direct Preference Optimization (DPO), require costly human labeling, while recent formal-feedback approaches still depend on resource-intensive fine-tuning. In this paper, we propose LAD-VF, a fine-tuning-free framework that leverages formal verification feedback for automated prompt engineering. By introducing a formal-verification-informed text loss integrated with LLM-AutoDiff, LAD-VF iteratively refines prompts rather than model parameters. This yields three key benefits: (i) scalable adaptation without fine-tuning; (ii) compatibility with modular LLM architectures; and (iii) interpretable refinement via auditable prompts. Experiments in robot navigation and manipulation tasks demonstrate that LAD-VF substantially enhances specification compliance, improving success rates from 60% to over 90%. Our method thus presents a scalable and interpretable pathway toward trustworthy, formally-verified LLM-driven control systems.
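
The control flow described in the abstract, refining the prompt from verifier feedback while leaving model parameters untouched, can be sketched as a small loop. Everything below is a hypothetical stand-in: `stub_planner` replaces the LLM, `verify` replaces a formal verifier with two hand-written rules, and appending constraint text replaces the textual gradient step of LLM-AutoDiff. It illustrates the loop only and is not the LAD-VF implementation.

```python
from typing import Callable, List, Tuple

Plan = List[str]


def verify(plan: Plan) -> List[str]:
    """Toy stand-in for a formal verifier: return the violated rules as text."""
    violations = []
    if "enter_restricted_zone" in plan:
        violations.append("never enter the restricted zone")
    if "grasp" in plan and "open_gripper" not in plan[:plan.index("grasp")]:
        violations.append("open the gripper before grasping")
    return violations


def stub_planner(prompt: str) -> Plan:
    """Hypothetical LLM stand-in: it obeys only rules spelled out in the prompt."""
    plan = ["move_to_object"]
    if "restricted zone" not in prompt:
        plan.insert(0, "enter_restricted_zone")
    if "open the gripper" in prompt:
        plan.append("open_gripper")
    plan.append("grasp")
    return plan


def refine_prompt(prompt: str,
                  planner: Callable[[str], Plan],
                  max_iters: int = 5) -> Tuple[str, int]:
    """Iterate plan -> verify -> edit prompt until the plan passes or we give up."""
    for step in range(max_iters):
        violations = verify(planner(prompt))
        if not violations:
            return prompt, step
        # The feedback edits the prompt text; no model weights are updated.
        prompt += "".join(f"\nConstraint: {v}." for v in violations)
    return prompt, max_iters
```

Calling `refine_prompt("Pick up the cup.", stub_planner)` returns a prompt with the violated rules appended, and the final prompt remains a readable, auditable artifact.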

2509.10481 2026-05-27 cs.NI cs.RO cs.SY eess.SP eess.SY Updated version

Synergetic Empowerment: Wireless Communications Meets Embodied Intelligence

Hongtao Liang, Yihe Diao, YuHang Wu, Fuhui Zhou, Qihui Wu

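Affiliations: College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics; College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics

AI summary: Proposes a synergetic empowerment framework for wireless communication and embodied intelligence. Through the perception-cognition-execution loop, it reveals the duality by which the two reinforce each other, recasting wireless communication from a simple utility into the digital nervous system of a collective intelligence and elevating isolated agents into a superorganism with emergent capabilities.

Comments: Accepted by IEEE Communications Magazine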

Abstract

Wireless communication is evolving into an agent era, where large-scale agents with inherent embodied intelligence are not just users but active participants. The perfect combination of wireless communication and embodied intelligence can achieve a synergetic empowerment and greatly facilitate the development of agent communication. An overview of this synergetic empowerment is presented, framing it as a co-evolutionary process that transforms wireless communication from a simple utility into the digital nervous system of a collective intelligence, while simultaneously elevating isolated agents into a unified superorganism with emergent capabilities far exceeding individual contributions. Moreover, we elaborate how embodied intelligence and wireless communication mutually benefit each other through the lens of the perception-cognition-execution (PCE) loop, revealing a fundamental duality where each PCE stage both challenges network capacity and creates unprecedented opportunities for system-wide optimization. Furthermore, critical open issues and future research directions are identified.

2504.00167 2026-05-27 cs.RO Updated version

Enhancing Physical Human-Robot Interaction: Recognizing Digits via Intrinsic Robot Tactile Sensing

Teresa Sinico, Giovanni Boschetti, Pedro Neto

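Affiliations: Department of Management and Engineering, University of Padua; CEMMPRE, Department of Mechanical Engineering, University of Coimbra

AI summary: Uses a collaborative robot's built-in torque sensors to record joint torques and end-effector forces while a person writes digits on a touchpad, performs online digit recognition with a bidirectional LSTM network at 94% accuracy, and demonstrates the application potential in a fruit delivery task.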

DOI: 10.1109/IECON58223.2025.11221741

Abstract

Physical human-robot interaction (pHRI) remains a key challenge for achieving intuitive and safe interaction with robots. Current advancements often rely on external tactile sensors as an interface, which increases the complexity of robotic systems. In this study, we leverage the intrinsic tactile sensing capabilities of collaborative robots to recognize digits drawn by humans on an uninstrumented touchpad mounted to the robot's flange. We propose a dataset of robot joint torque signals along with corresponding end-effector (EEF) forces and moments, captured from the robot's integrated torque sensors in each joint, as users draw handwritten digits (0-9) on the touchpad. The pHRI-DIGI-TACT dataset was collected from different users to capture natural variations in handwriting. To enhance classification robustness, we developed a data augmentation technique to account for reversed and rotated digit inputs. A Bidirectional Long Short-Term Memory (Bi-LSTM) network, leveraging the spatiotemporal nature of the data, performs online digit classification with an overall accuracy of 94% across various test scenarios, including those involving users who did not participate in training the system. This methodology is implemented on a real robot in a fruit delivery task, demonstrating its potential to assist individuals in everyday life. Dataset and video demonstrations are available at: https://TS-Robotics.github.io/pHRI-DIGI/.
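
The abstract does not say how the augmentation for rotated inputs is computed. One plausible reading, offered here as an assumption rather than the authors' procedure, is that a digit drawn on a rotated touchpad yields the same force trace with its in-plane components rotated. The sketch below applies that idea to a sequence of `[fx, fy, fz]` samples; the sample layout, the choice of angles, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence

Sample = List[float]  # assumed layout: [fx, fy, fz] at one time step


def rotate_in_plane(seq: Sequence[Sample], angle_deg: float) -> List[Sample]:
    """Rotate the (fx, fy) components of every sample; fz is left unchanged."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c * fx - s * fy, s * fx + c * fy, fz] for fx, fy, fz in seq]


def augment(seq: Sequence[Sample],
            angles_deg: Sequence[float] = (90.0, 180.0, 270.0)) -> List[List[Sample]]:
    """Return one rotated copy of the sequence per angle (original not included)."""
    return [rotate_in_plane(seq, a) for a in angles_deg]
```

Under this assumption a 180 degree rotation would model a digit drawn upside down; each augmented copy keeps the original class label.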

2009.11997 2026-05-27 cs.LG cs.AI cs.RO Updated version

Continual Model-Based Reinforcement Learning with Hypernetworks

Yizhou Huang, Kevin Xie, Homanga Bharadhwaj, Florian Shkurti

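Affiliations: Division of Engineering Science, University of Toronto, Canada; Department of Computer Science, University of Toronto, Canada

AI summary: Proposes HyperCRL, which uses task-conditional hypernetworks to continually learn dynamics models across a sequence of tasks, avoiding retraining from scratch and keeping storage overhead fixed. On robot locomotion and manipulation tasks it outperforms existing continual learning alternatives that rely on fixed-capacity networks.

Comments: Updated link to project website in the abstract. 7 pages (+2 pages in appendix), 8 figures. In proceedings of the 2021 IEEE International Conference on Robotics and Automation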

Abstract

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model - and the pause required between plan executions - grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with videos is available at https://rvl.cs.toronto.edu/blog/hypercrl
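
The idea of a task-conditional hypernetwork, a fixed set of shared parameters that maps a per-task embedding to the weights of a dynamics model, can be sketched with a forward pass only. The version below is a deliberately minimal assumption: the hypernetwork and the dynamics model are both linear, nothing is trained, and the class and method names are hypothetical. It is not the HyperCRL architecture.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class HyperDynamics:
    """Linear hypernetwork that emits the weights of a linear dynamics model."""

    def __init__(self, state_dim: int, action_dim: int, embed_dim: int, seed: int = 0):
        self._rng = random.Random(seed)
        self.state_dim = state_dim
        self.in_dim = state_dim + action_dim
        self.embed_dim = embed_dim
        n_target = state_dim * self.in_dim  # number of dynamics-model weights
        # Shared hypernetwork parameters: their count is independent of the task count.
        self.h: Matrix = [[self._rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                          for _ in range(n_target)]
        self.task_embeddings: Dict[str, Vector] = {}

    def add_task(self, task_id: str) -> None:
        """A new task adds only one small embedding vector."""
        self.task_embeddings[task_id] = [self._rng.gauss(0.0, 1.0)
                                         for _ in range(self.embed_dim)]

    def target_weights(self, task_id: str) -> Matrix:
        """Generate the task's dynamics matrix from its embedding."""
        flat = matvec(self.h, self.task_embeddings[task_id])
        return [flat[r * self.in_dim:(r + 1) * self.in_dim]
                for r in range(self.state_dim)]

    def predict(self, task_id: str, state: Vector, action: Vector) -> Vector:
        """Predict the next state as W_task @ [state; action]."""
        return matvec(self.target_weights(task_id), state + action)
```

For example, after `model = HyperDynamics(2, 1, 4)` and `model.add_task("push")`, the call `model.predict("push", [0.1, -0.2], [0.5])` returns a two-element next-state prediction. Adding further tasks leaves `model.h` unchanged in size, which is the fixed-capacity property the abstract refers to.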
