Robotics / Embodied AI - arXivDaily Topic

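2605.22748 2026-06-19 cs.RO cs.AI cs.LG cs.MA Updated version 85%

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

Ismail Geles, Leonard Bauersfeld, Markus Wulfmeier, Davide Scaramuzza

Affiliations: Robotics and Perception Group, University of Zurich; Google DeepMind; Nomagic

Topic match (robot learning): multi-agent reinforcement learning for quadrotor racing

AI summary: The paper uses multi-agent reinforcement learning to achieve safe and agile high-speed quadrotor racing. It shows that multi-agent interaction is key to safety in real-world interaction, and its agents outperform a human pilot in high-speed races while lowering the collision rate.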

Comments 12 pages (+4 supplementary). Website: https://rpg.ifi.uzh.ch/marl

Abstract

Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors, including proactive collision avoidance, overtaking, and handling multi-agent physical interactions, including aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50 % compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl

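2604.09795 2026-06-19 eess.SY cs.RO cs.SY Updated version 85%

On Feedback Speed Control for a Planar Tracking

Xincheng Li, Tengyue Liu, Udit Halder

Affiliations: Department of Mechanical and Aerospace Engineering, University of South Florida

Topic match (robot learning): feedback speed control law for planar tracking

AI summary: For the leader-follower planar tracking problem, the paper proposes a feedback speed control law paired with a constant-bearing steering strategy that achieves an abreast formation, proves asymptotic stability, and extends the law to an N-agent chain network.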

Abstract

This paper investigates a planar tracking problem between a leader and follower agent. We propose a novel feedback speed control law, paired with a constant bearing steering strategy, to maintain an abreast formation between the two agents. We prove that the proposed control yields asymptotic stability of the closed-loop system when the steering of the leader is known. For the case when the leader's steering is unavailable to the follower, we show that the system is still input-to-state stable with respect to the leader's steering viewed as an input. Furthermore, we demonstrate that if the leader's steering is periodic, the follower will asymptotically converge to a periodic orbit with the same period. We validate these results through numerical simulations and experimental implementations on mobile robots. Finally, we demonstrate the scalability of the proposed approach by extending the two-agent control law to an N-agent chain network, illustrating its implications for directional information propagation in biological and engineered flocks.

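2512.11173 2026-06-19 cs.RO Updated version 85%

Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance

Tzu-Hsien Lee, Fidan Mahmudova, Karthik Desingh

Affiliations: University of Minnesota, Twin Cities

Topic match (robot learning): imitation learning framework for last-meter navigation of a quadruped robot

AI summary: The paper proposes an object-centric imitation learning framework that uses RGB observations to give a quadruped mobile manipulator precise last-meter navigation without depth or map priors, reaching high success rates in category-level generalization.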

Abstract

Achieving precise positioning of the mobile manipulator's base is essential for successful manipulation actions that follow. Most of the RGB-based navigation systems only guarantee coarse, meter-level accuracy, making them less suitable for the precise positioning phase of mobile manipulation. This gap prevents manipulation policies from operating within the distribution of their training demonstrations, resulting in frequent execution failures. We address this gap by introducing an object-centric imitation learning framework for last-meter navigation, enabling a quadruped mobile manipulator robot to achieve manipulation-ready positioning using only RGB observations from its onboard cameras. Our method conditions the navigation policy on three inputs: goal images, multi-view RGB observations from the onboard cameras, and a text prompt specifying the target object. A language-driven segmentation module and a spatial score-matrix decoder then supply explicit object grounding and relative pose reasoning. Using real-world data from a single object instance within a category, the system generalizes to unseen object instances across diverse environments with challenging lighting and background conditions. To comprehensively evaluate this, we introduce two metrics: an edge-alignment metric, which uses ground truth orientation, and an object-alignment metric, which evaluates how well the robot visually faces the target. Under these metrics, our policy achieves 74.58% success in edge-alignment and 89.42% success in object-alignment when positioning relative to unseen target objects. These results show that precise last-meter navigation can be achieved at a category-level without depth, LiDAR, or map priors, enabling a scalable pathway toward unified mobile manipulation. Project page: https://rpm-lab-umn.github.io/category-level-last-meter-nav/

2510.05013 2026-06-19 stat.ML cs.LG 85%

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

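Theodore Jerome Tinker, Kenji Doya, Jun Tani

Affiliations: Okinawa Institute of Science and Technology

Topic match (robot learning): curiosity-driven learning of robot actions and language

AI summary: Using curiosity-driven robot self-exploration, with active inference amortized via Q-learning, the study reports compositional generalization, faster learning, rote pairing that precedes composition, and a U-shaped developmental pattern caused by exception handling, offering an account of efficient human language acquisition.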

Comments 27 pages, 22 pages of supplementary material

Abstract

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

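2606.20549 2026-06-19 cs.RO New submission 80%

Generating Robot Hands from Human Demonstrations

Sha Yi, Nicklas Hansen, Xueqian Bai, Carmelo Sferrazza, Michael T. Tolley, Xiaolong Wang

Affiliations: University of California San Diego; Amazon Frontier AI & Robotics

Topic match (robot learning): a design framework that generates robot hands from human demonstrations

AI summary: The paper proposes a data-driven framework that uses more than 4 million frames of human fingertip motion from everyday manipulation, matches fingertip positions through inverse kinematics to optimize tree-structured robot hand designs, produces a general-purpose 6-DoF hand and lower-DoF task-specific hands, and trains a reinforcement-learning agent to speed up the design search.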

Abstract

Robot learning has advanced rapidly in learning control, but learning the physical body of a robot remains much more difficult because jointly searching over design and control creates a very large combinatorial problem. Here, we present a data-driven framework for generating robot hands from human demonstrations. Instead of learning a complex controller together with each candidate design, we generate robot hand designs using the same simple control policy used after fabrication: matching fingertip positions through inverse kinematics. Using more than 4 million frames of human fingertip motion from everyday manipulation, our algorithm optimizes tree-structured robot hands to reproduce desired target motions. The framework produced both a 6-degree-of-freedom (DoF) general-purpose hand and lower-DoF task-specific hands with spatial four-bar mimic joints. To accelerate the search over designs, we trained a reinforcement-learning (RL) actor to propose good hand designs and joint angles, reducing search time from hours to minutes. We fabricated the mechanisms directly as one-piece articulated structures with print-in-place joints. In real-world experiments, the 6-DoF hand achieved highly accurate teleoperated fingertip tracking better than available commercial robot hands, whereas the specialized 3-DoF hands reproduced structured human and synthetic trajectories with reduced mechanical complexity. These results showed that large-scale human motion data can be used not only to train robot controllers but also as a reference for optimizing and generating the physical embodiment of robots.
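
The abstract above relies on one simple controller: matching fingertip positions through inverse kinematics. As a rough illustration only, and not the authors' tree-structured hand optimization, the sketch below solves closed-form inverse kinematics for a hypothetical planar two-link finger and checks the result with forward kinematics. The function names, link lengths, and target point are all assumptions made for this example.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Joint angles (theta1, theta2) that place a planar 2-link fingertip at (x, y).

    Returns None if the target is out of reach. Only the solution with a
    non-negative distal angle theta2 is returned.
    """
    r2 = x * x + y * y
    # Law of cosines: cosine of the distal joint angle.
    c2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                          l1 + l2 * math.cos(theta2))
    return theta1, theta2

def forward_kinematics(theta1, theta2, l1, l2):
    """Fingertip position for given joint angles (used to check the IK result)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Hypothetical finger: 4 cm proximal link, 3 cm distal link, target in meters.
angles = two_link_ik(0.05, 0.03, 0.04, 0.03)
tip = forward_kinematics(angles[0], angles[1], 0.04, 0.03)
```

A design search in this spirit would score a candidate hand by how closely `tip` tracks recorded fingertip targets, but the scoring used in the paper is not given in the abstract.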

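2606.20428 2026-06-19 cs.RO New submission 80%

ARC: Adaptive Robust Joint State and Covariance Estimation

Alexandre Hadji-Thomas, Andrew Stirling, James R. Forbes

Topic match (robot learning): robust state estimation for processing robot sensor data

AI summary: The paper proposes a unified block-coordinate descent framework that combines an adaptive robust loss, an iteratively reweighted least-squares state update, and a minimum weighted covariance determinant estimator to jointly and adaptively estimate state and covariance in the presence of outliers.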

Comments Submitted to IEEE Robotics and Automation Letters (RA-L), June 2026. 8 pages, 7 figures, 1 table

Abstract

Sensor measurements are frequently corrupted by outliers and non-Gaussian noise. These imperfections in the sensor data can cause classical state estimators to generate biased and unreliable state and uncertainty estimates. Robust estimators reject or downweight outliers but do not perform measurement covariance estimation, whereas joint state and covariance estimators assume Gaussian residuals and fixed loss shape parameters. Integrating these two capabilities into a single framework is an opportunity to simultaneously estimate both state and covariance in the presence of outliers. This paper proposes a unified Block-Coordinate Descent framework that combines a norm-aware adaptive robust loss, an Iteratively Reweighted Least-Squares state update, and a Minimum Weighted Covariance Determinant covariance estimator, yielding a self-tuning joint state and covariance estimator. The framework is evaluated in a Monte-Carlo simulation and on real-world ultra-wideband localization experiments in cluttered non-line-of-sight environments. Results show that the proposed estimator consistently recovers the true inlier measurement covariance and matches or exceeds the state estimation accuracy of all baselines, without requiring any manual parameter tuning.
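
One ingredient named in the abstract above is an Iteratively Reweighted Least-Squares state update. The sketch below shows that idea in its simplest form: a scalar location estimate with Huber weights. It is a generic textbook IRLS, not the ARC estimator. It assumes a fixed threshold `delta`, whereas the paper adapts the loss and also estimates the measurement covariance. The function name and the sample readings are hypothetical.

```python
import statistics

def huber_irls_location(measurements, delta=1.0, iterations=50, tol=1e-9):
    """Robust scalar location estimate via IRLS with Huber weights."""
    estimate = statistics.median(measurements)  # robust starting point
    for _ in range(iterations):
        weights = []
        for z in measurements:
            residual = abs(z - estimate)
            # Huber weight: full weight for small residuals, downweighted otherwise.
            weights.append(1.0 if residual <= delta else delta / residual)
        updated = sum(w * z for w, z in zip(weights, measurements)) / sum(weights)
        converged = abs(updated - estimate) < tol
        estimate = updated
        if converged:
            break
    return estimate

# Five readings near 1.0 plus one gross outlier (made-up data).
readings = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
robust = huber_irls_location(readings)
plain_mean = statistics.fmean(readings)
```

On this data the plain mean is pulled to 2.5 by the outlier, while the reweighted estimate stays close to the inlier cluster.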

2606.20197 2026-06-19 cs.RO New submission 80%

Stable Transformer-Actor-Critic Model Predictive Control: A Contraction Analysis Approach

Antonio Marino, Valerio Modugno, Marco Cognetti

Affiliations: University of Cambridge; University College London; LAAS-CNRS

Topic match (robot learning): Transformer-Actor-Critic MPC for drone control

AI summary: The paper proposes a Transformer-Actor-Critic MPC architecture. It proves that the Transformer satisfies incremental input-to-state stability, analyzes the interconnected dynamics with Riemannian contraction theory, and uses the resulting theoretical bounds as a training regularizer to obtain a provably robust control policy.

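2606.20031 2026-06-19 cs.RO cs.AI New submission 80%

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

Junzhe Xu, Zecui Zeng, Lusong Li, Yuetong Fang, Renjing Xu

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); JD Explore Academy

Topic match (robot learning): neuromorphic reinforcement learning for robot pathfinding

AI summary: The paper proposes the SDQN-RMFS framework, which uses ANN-to-SNN conversion and hard-label knowledge distillation to run ultra-low-power pathfinding on a neuromorphic chip, with up to 11,281× energy savings relative to a GPU baseline and a nearly two-fold reduction in latency.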

Abstract

Dynamic environmental changes, confined workspaces, and stringent real-time constraints make pathfinding in Robotic Mobile Fulfillment Systems (RMFS) a challenging problem for conventional search- and rule-based methods, which typically suffer from high computational complexity and long decision latency. While reinforcement learning (RL) has emerged as a powerful alternative, deploying learned policies with extreme energy efficiency on resource-constrained hardware remains an open challenge. We present SDQN-RMFS, an end-to-end framework that achieves high-fidelity deployment of an RL-trained policy from a full-precision artificial neural network (ANN) through to a neuromorphic chip. By computing only when triggered by sparse events, this framework unlocks ultra-low-power RMFS pathfinding. Our full-stack pipeline operates as follows: an ANN policy is first efficiently trained via a collision-allowing strategy to densify informative trajectories, and then converted into a spiking neural network (SNN) via a hard-label knowledge distillation approach. This effectively addresses the output distribution mismatch, preserving policy capability across the ANN-to-SNN pipeline while substantially reducing inference latency. Hardware experiments demonstrate up to 11,281× energy savings and a nearly two-fold reduction in latency compared to a high-performance GPU baseline, while maintaining decision quality on par with the original trained policy. These results establish physical neuromorphic inference as a practical and energy-sustainable pathway for large-scale RMFS operations.

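2606.19935 2026-06-19 cs.AI New submission 80%

PhysDrift: Bridging the Embodiment Gap in Humanoid Co-Speech Motion Generation

Zhangzhao Liang, Xiaofen Xing, Mingyue Yang, Wenlve Zhou, Xiangmin Xu

Affiliations: South China University of Technology; DexForce Technology; Foshan University

Topic match (robot learning): proposes PhysDrift, a co-speech motion generation framework for humanoid robots

AI summary: To address the mismatch between human motion manifolds and humanoid embodiment constraints in co-speech motion generation, the paper proposes the IK-EER framework and the PhysDrift model, which directly predicts executable joint trajectories and improves motion alignment, physical plausibility, and real-time interaction.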

Abstract

Humanoid robots require co-speech motions that are not only expressive and speech-aligned, but also physically executable under embodiment constraints. Existing co-speech generation pipelines are predominantly human-centric: motions are first generated in human-body representations such as SMPL-X and subsequently retargeted to humanoid robots. In this work, we identify a fundamental embodiment gap in this paradigm, where the mismatch between human motion manifolds and humanoid embodiment constraints disrupts embodiment consistency during motion transfer and physical execution. Through extensive analysis, we show that although retargeting can preserve coarse motion semantics, it significantly compresses motion diversity and weakens prosody-motion synchronization, limiting expressive humanoid behaviors. To address this problem, we first propose IK-EER, a prosody-preserving humanoid motion curation framework that jointly optimizes kinematic feasibility and speech-motion temporal alignment during retargeting. Building upon the curated robot-native motion dataset, we further introduce PhysDrift, an embodiment-aware co-speech motion generation framework that directly predicts executable humanoid joint trajectories from speech without relying on intermediate human-body representations. Unlike conventional human-centric pipelines, PhysDrift maintains embodiment consistency throughout both training and inference while incorporating physical regularization to stabilize robot motion dynamics. Extensive experiments and real-world humanoid deployment demonstrate that embodiment-aware robot-native generation substantially improves speech-motion alignment, physical plausibility, motion smoothness, inference efficiency, and real-time interaction capability.

2606.19914 2026-06-19 cs.RO cs.AI New submission 80%

Co-policy: Responsive Human-Robot Co-Creation for Musical Performances

Xuetao Li, Wenke Huang, Mang Ye, Zijian Liu, Jinhua Xie, Jifeng Xuan, Miao Li

Affiliations: School of Computer Science, Wuhan University; College of Computing and Data Science, Nanyang Technological University; School of Automation, Wuhan University of Technology; School of Geodesy and Geomatics, Wuhan University; School of Robotics, Wuhan University

Topic match (robot learning): proposes Co-policy, a framework for human-robot musical co-creation

AI summary: The paper proposes the Co-policy framework, which achieves real-time human-robot musical co-creation through semantic anchoring, constrained variation, and a visuomotor policy, and outperforms diffusion-policy baselines in real-robot chime experiments.

Abstract

Art has long stood as a pivotal expression of human creativity. Embodied artificial intelligence offers a route for generative models to participate in that creativity through physical action rather than disembodied digital content. In robotic music co-creation, it is challenging to connect semantic musical understanding with real-time and physically executable performance. We present Co-policy, a framework for human-robot musical co-creation that separates semantic intent grounding, constrained musical variation, and visuomotor execution. To ground musical semantics, Co-policy uses pre-inference semantic anchors and a fine-tuned Qwen-vl planner (F-Qwen) to transform speech, live musical seeds, and visual observations into structured co-creation plans. To support low-latency execution, Co-policy introduces a Gaussian-Mixture Visuomotor Policy (GMP), implemented as a conditional mixture-density policy that maps target notes and visual context to multimodal robot actions in a single forward pass. Unlike robotic playback systems that merely reproduce user-specified notes, Co-policy generates complementary musical responses under both musical and physical constraints. Real-robot chime experiments, ablations, and expert evaluation show improved intent alignment, execution accuracy, and response frequency over diffusion-policy and ablated baselines, supporting physically grounded action generation as a key requirement for embodied human-AI co-creation.

2606.19711 2026-06-19 cs.RO cs.LG cs.SY eess.SY New submission 80%

A Differentiable Composite Approximation Framework for Autonomous Underwater Vehicle Maneuvering Modeling from Sea-Trial Data

Aobo Wang, Aifei Xia, Zihao Wang, Lizhu Hao

Affiliations: College of Shipbuilding Engineering, Harbin Engineering University; China Academy of Aerospace Aerodynamics; Institute of Artificial Intelligence, Shanghai University; China Ship Scientific Research Center

Topic match (robot learning): proposes a differentiable composite approximation framework for AUV maneuvering modeling

AI summary: The paper proposes a differentiable composite approximation framework that jointly calibrates polynomial and data-adaptive bases and adds turning-motion-based ocean-current estimation and compensation, improving AUV maneuvering prediction accuracy.

Abstract

Field-based modeling from onboard measurements can produce autonomous underwater vehicle (AUV) maneuvering models that reflect real operating characteristics. From an approximation perspective, conventional maneuvering models use predefined constraint polynomial bases, whereas data-driven models use data-adaptive bases. Motivated by this basis-function view, this paper presents a differentiable composite-approximation formulation, in which the polynomial-basis component and the data-adaptive basis component are treated as differentiable parts of a single predictor and calibrated jointly. A gradient-based co-calibration method is developed for full-scale AUV maneuvering prediction, where a sensitivity-aware mechanism regulates bounded polynomial updates while the neural residual captures remaining nonlinear discrepancies under a shared prediction objective. To account for ocean-current effects in field data, a turning-motion-based current estimation and compensation procedure is incorporated to construct current-compensated learning targets for training and rollout. The framework is evaluated using sea-trial data collected from a 7-meter AUV under multiple maneuvering conditions. Results show that the proposed method improves recursive trajectory and velocity prediction compared with polynomial-only, neural-only, and frozen-prior hybrid baselines, demonstrating its applicability to field-data-based AUV maneuvering modeling.

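2606.19675 2026-06-19 cs.RO New submission 80%

ForEnt: A Multi-Modal Dataset for Characterizing Quadruped Robot Entrapments in Forest Environments

Natapat Kirdwichai, Danesh Tarapore

Affiliations: University of Southampton

Topic match (robot learning): multi-modal dataset of quadruped robot entrapments in forests, supporting benchmarking

AI summary: To study quadruped robots that topple in forests after becoming entangled in vegetation, the paper presents ForEnt, a multi-modal dataset with RGB-D, LiDAR, proprioceptive data, and third-person video that records 69 entrapment events and supports reproducible benchmarking.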

Comments 8 pages, 7 figures

Abstract

Legged robots are increasingly deployed in forests for ecological surveying and monitoring, yet their autonomy is often interrupted consequent to the challenges posed in traversing forest environments. Forest entrapments, for example, when a robot's legs are ensnared in vines or other vegetation, result in loss of stability and toppling. Such events not only disrupt the mission and require manual intervention, but also risk damage to the robot hardware. To address the absence of a dedicated dataset to investigate these failure modes in forest environments, we present ForEnt, a multi-modal dataset collected with the low-cost Unitree Go2 quadruped across eight forest sites in the Southampton Common Woodlands, UK. For our dataset, over approximately 1.7 km of traversals in 11 sequences were conducted, yielding 69 recorded entrapment events. ForEnt includes time-synchronized RGB-D images, LiDAR scans, proprioceptive data, and third-person video, enabling analysis of terrain factors contributing to entrapment and providing labeled sensor streams for reproducible benchmarking. By supporting the evaluation of entrapment detection strategies, ForEnt lowers the barrier to developing robust quadruped robot deployments in challenging forest environments.

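2606.19656 2026-06-19 cs.RO cs.LG New submission 80%

DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning

Calvin Luo, Chen Sun, Shuran Song

Affiliations: Stanford University; Brown University

Topic match (robot learning): diffusion filtered exploration to improve the sample efficiency of robot finetuning

AI summary: The paper proposes DF-ExpEnse, an exploration technique that uses the multimodal modeling capabilities of a generative control policy together with an ensemble of critics to collect online experience efficiently during finetuning, improving sample efficiency.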

Comments ICML 2026

Abstract

A natural recipe for intelligent robotic decision-making is initializing from pretrained generative control policies, which have summarized offline experience, and adapting them to self-collected online experience. We present DF-ExpEnse, an exploration technique that improves the quality of online experience collection, thus increasing finetuning sample-efficiency. DF-ExpEnse leverages the multimodal modeling capabilities of the generative control policy to create an expressive and tractably evaluatable candidate set. It then utilizes an ensemble of critics to identify the action that best balances quality with high exploration interest. In fleet settings, DF-ExpEnse further enables cross-agent communication to facilitate collaborative exploration as a group. DF-ExpEnse can be seamlessly integrated with existing strategies that finetune pretrained generative control policies via reinforcement learning. We experimentally validate consistent sample-efficiency benefits through DF-ExpEnse across a variety of manipulation and locomotion tasks, compared to default finetuning and alternative action selection schemes. Project can be found at https://df-expense.github.io.
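
The abstract above describes scoring a candidate set with an ensemble of critics to balance quality against exploration interest. The exact scoring rule is not stated there, so the sketch below uses one common choice purely as an assumption: mean critic value plus a bonus proportional to critic disagreement. The function name, the `beta` parameter, and the toy critics are hypothetical.

```python
import statistics

def select_action(candidates, critics, beta=1.0):
    """Pick the candidate maximizing mean critic value + beta * critic disagreement."""
    best_action, best_score = None, float("-inf")
    for action in candidates:
        values = [critic(action) for critic in critics]
        # Mean estimates quality; population std across critics proxies novelty.
        score = statistics.fmean(values) + beta * statistics.pstdev(values)
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Toy 1-D critics that agree near a = 0 and disagree more as a grows.
critics = [
    lambda a: -(a - 1.0) ** 2,
    lambda a: -(a - 1.0) ** 2 + 0.5 * a,
    lambda a: -(a - 1.0) ** 2 - 0.5 * a,
]
candidates = [0.0, 1.0, 2.0]  # stand-ins for actions sampled from a generative policy
greedy_choice = select_action(candidates, critics, beta=0.0)
exploratory_choice = select_action(candidates, critics, beta=5.0)
```

With `beta=0.0` the rule is purely greedy on the mean value. Raising `beta` shifts the choice toward actions on which the critics disagree.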

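2606.19632 2026-06-19 cs.RO cs.AI cs.LG cs.LO cs.MA New submission 80%

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

Ahmad Farooq, Kamran Iqbal

Affiliations: University of Arkansas at Little Rock

Topic match (robot learning): verifying the safety of multi-agent communication policies via decision tree distillation

AI summary: The paper distills multi-agent reinforcement learning policies into interpretable decision trees, formally verifies them with PRISM, confirms empirically that the verified safety properties transfer to the original networks, and reaches 88.9% property satisfaction on a multi-drone coordination task.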

Comments 9 pages, 3 figures, 7 tables. Accepted at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026), Pittsburgh, Pennsylvania, USA, September 27-October 1, 2026

Abstract

Multi-agent reinforcement learning (MARL) enables agents to develop coordination strategies through emergent communication, but neural policies lack the formal safety guarantees required for safety-critical robotic deployment in drone swarms and autonomous vehicle fleets. We present the first end-to-end framework for safety verification of learned multi-agent communication policies through policy abstraction: neural policies are distilled into interpretable decision trees, then formally verified, with empirical validation confirming that verified safety properties transfer to original networks. Our four-stage pipeline consists of domain-specific feature extraction from agent observations, decision tree distillation achieving 97.9% +/- 1.2% fidelity to neural policies, automated translation to PRISM probabilistic model checker specifications with complete feature-to-state-variable correspondence, and compositional verification of Probabilistic Computation Tree Logic (PCTL) properties via pairwise decomposition with union-bound aggregation and empirical neighbor modeling. Evaluating Vector-Quantized Variational Information Bottleneck (VQ-VIB) policies for multi-drone coordination with 5-7 agents, we verify 18 temporal logic properties across safety, liveness, and cooperation, achieving 88.9% property satisfaction with all five safety thresholds satisfied (0.3% collision probability vs. 1% threshold). Monte Carlo validation of original neural policies confirms that verified safety properties transfer with <=0.6 percentage-point deviation (95% CI). Discrete VQ-VIB messages provide +11.6 to +13.6 percentage-point fidelity advantages over continuous methods, enabling 3-4x faster verification. Our framework provides empirically validated safety verification for distilled policy abstractions, serving as a practical bridge between deep MARL and formal safety workflows for multi-robot deployment.
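
The compositional step in the abstract above combines pairwise results with union-bound aggregation. A minimal sketch of that aggregation alone, assuming pairwise collision probabilities are already available (for example from a model checker), is shown below. The agent names and probabilities are made up for illustration and are not results from the paper.

```python
from itertools import combinations

def union_bound_collision(pairwise_prob, agents):
    """Upper-bound P(any pair collides) by the sum of pairwise collision probabilities.

    pairwise_prob maps a sorted (agent_i, agent_j) tuple to that pair's probability.
    """
    total = sum(pairwise_prob[pair] for pair in combinations(sorted(agents), 2))
    return min(1.0, total)  # a probability bound never needs to exceed 1

# Hypothetical three-drone team and per-pair collision probabilities.
agents = ["drone_a", "drone_b", "drone_c"]
pairwise_prob = {
    ("drone_a", "drone_b"): 0.001,
    ("drone_a", "drone_c"): 0.0005,
    ("drone_b", "drone_c"): 0.0005,
}
bound = union_bound_collision(pairwise_prob, agents)
meets_threshold = bound <= 0.01  # compare against an assumed 1% safety threshold
```

The bound is conservative: it holds without any independence assumption between pairs, which is why it suits pairwise decomposition.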

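2606.19512 2026-06-19 cs.RO cs.SY eess.SY New submission 80%

Proprioceptive Invariant State Estimation for Humanoid Robots on Non-Inertial Ground

Falak Mandali, Zijian He, Yan Gu

Affiliations: Purdue University

Topic match (robot learning): state estimation for humanoid robots on non-inertial ground

AI summary: The paper proposes a proprioception-only InEKF method that uses foot-mounted IMUs and kinematic constraints for real-time state estimation of humanoid robots on non-inertial ground, with a 96% speedup in convergence and an 80% reduction in position error.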

Abstract

This paper presents an invariant extended Kalman filtering (InEKF) approach for real-time state estimation of humanoid robots operating on non-inertial ground using only onboard proprioceptive sensing. The proposed approach estimates the robot's base position and velocity relative to the moving ground frame without requiring direct measurements of ground motion or externally mounted sensors. By exploiting kinematic constraints at the stance foot through foot-mounted IMUs, the filter accounts for ground-induced nonlinearities in the process and measurement models while remaining fully proprioceptive. The estimator is formulated to admit a right-invariant measurement model, enabling favorable error dynamics under large initial uncertainties. Observability analysis establishes conditions under which the robot's relative base position and velocity are observable with respect to the non-inertial ground frame. Experiments with the Digit humanoid robot standing and squatting atop a swaying and pitching ground showcase a 96% speedup in convergence rate and an 80% reduction in position estimate errors over existing InEKFs. Walking experiments on a uni-axially rotating ground achieve an average estimation error of less than 9 cm for an initial error of up to 1 m.

2606.19031 2026-06-19 cs.RO New submission 80%

Congestion-Aware Robot Tour Planning in Crowded Environments

Stefano Bernagozzi, Charlie Street, Masoumeh Mansouri, Lorenzo Natale

Affiliations: Istituto Italiano di Tecnologia; Università di Genova; University of Birmingham

Topic match (robot learning): robot tour planning in crowded environments

AI summary: Proposes a probabilistic tour planner that learns a human-flow prediction model and builds a Markov decision process online, routing the robot efficiently through crowded environments and reducing the impact of congestion.

Comments Accepted to IEEE IROS 2026

Journal ref IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2026

Abstract

Autonomous mobile service robots are often required to complete tours that require navigating through a set of locations in an environment. Example domains include guiding people through a shopping mall, delivering packages in a fulfilment centre, or giving guided tours in a museum. However, in crowded environments, the presence of people may negatively impact robot performance. For example, humans will activate robot collision avoidance manoeuvres that slow the robot down. Crowds move stochastically and vary throughout the day. In this paper we present a probabilistic tour planner for crowded environments which explicitly reasons over human congestion. We learn circular linear flow field (CLiFF) maps which predict human trajectories given an initial observation. We then use these predictions to build and solve a Markov decision process online which efficiently routes the robot through the environment. Our approach is scalable enough to re-plan as new people are observed. We evaluate our approach on a real-world crowd dataset in a shopping mall.
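
To make the idea of reasoning over congestion concrete, here is a deliberately simplified sketch. It does not build an MDP or use CLiFF maps as the paper does; it picks the visiting order with the lowest expected travel time by brute-force enumeration, assuming each edge has a free-flow time and a congestion probability. All names, the slowdown factor, and the numbers are hypothetical.

```python
from itertools import permutations


def expected_time(free_time, p_congested, slowdown):
    """Expected traversal time when congestion multiplies the free-flow time."""
    return free_time * ((1.0 - p_congested) + p_congested * slowdown)


def best_tour(start, stops, edges, slowdown=3.0):
    """Return the visiting order of `stops` with the lowest expected total time."""
    def cost(order):
        path = (start,) + order
        return sum(expected_time(*edges[(a, b)], slowdown)
                   for a, b in zip(path, path[1:]))
    return min(permutations(stops), key=cost)


# Hypothetical map: (free-flow seconds, probability the edge is congested).
edges = {
    ("S", "A"): (10.0, 0.9),   # short but usually crowded
    ("S", "B"): (12.0, 0.0),   # slightly longer, always clear
    ("A", "B"): (5.0, 0.1),
    ("B", "A"): (5.0, 0.1),
}
order = best_tour("S", ["A", "B"], edges)
```

Enumeration only works for a handful of stops; an online MDP, as described in the abstract, is what allows re-planning as new people are observed.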

2606.19504 2026-06-19 cs.RO cs.SY eess.SY New submission 75%

Simulating Robotic Locomotion in Sand: Resistive Force Theory in an Open-Source Physics Engine

Ryan Walker Brown, Laura K. Treers, Kathryn A. Daltorio

Affiliations: Case Western Reserve University; University of Vermont

Topic match (robot learning): simulating robot locomotion in sand with integrated resistive force theory

AI summary: Integrates 3D granular resistive force theory (3D RFT) into the MuJoCo physics engine to simulate walking on sand, checks that trends due to foot shape, speed, and loading are preserved, and predicts the walking distance and foot sinkage of a hexapod robot within 20% of experiments.

Comments 12 pages, 7 figures

Abstract

Recent advancements in Resistive Force Theory (RFT) enable approximation of ground reaction forces for locomotion in sand without the computational expense of modeling interactions with individual grains. However, these tools have been absent in 3D physics engines commonly used for robot simulation. We explore if resistive force approximations are sufficient, when integrated with standard dynamics calculations, to provide a stable substrate for a freely walking robot. To determine this, we implement 3D Granular Resistive Force Theory (3D RFT) in a physics simulation engine, MuJoCo. We verify simulations in multiple scenarios to demonstrate that key trends due to end effector shape, speed, and loading are preserved. Our implementation predicts walking distance and foot sinkage of a 12-Degree of Freedom hexapod robot within 20% of experiments in sand. While RFT has inherent approximations, the open source tool described here has potential to help develop new and improved robot designs to traverse granular media substrates.

2606.19408 2026-06-19 cs.LG cs.RO New submission 75%

FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning

Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima

Affiliations: University of Tsukuba; The University of Tokyo

Topic match (robot learning): latent action learning for robot video and decision-making

AI summary: To address the trade-off caused by the fixed-capacity bottleneck in latent action models, proposes FlexLAM, which uses nested dropout to obtain variable-length latent actions. Without new architectures or losses, it matches or surpasses fixed-capacity models under scarce labels and in low-return tasks, and it supports token-budget adjustment at inference time.

Abstract

Latent actions provide a compact interface between action-free video and downstream decision-making, yet existing Latent Action Models (LAMs) force every transition through a fixed-capacity bottleneck. We identify a bottleneck trade-off: overly tight codes can discard transition cues needed for action alignment, while overly loose codes preserve additional transition variation that must be resolved when alignment labels are scarce or narrowly distributed. FlexLAM replaces this fixed capacity with variable-length latent actions trained by nested dropout, yielding prefix-valid codes that capture compact transition structure first and add detail only when needed, without new architectures or losses. A single FlexLAM matches or surpasses separately trained fixed-capacity LAMs at every evaluated token budget under standard scarce-label supervision and under a low-return single-task alignment stress test, indicating that FlexLAM is not merely adjustable at inference time but learns a better latent-action interface at the same token budgets. The same model supports inference-time token-budget adjustment without retraining, and FlexLAM improves Ego4D transition reconstruction. These results suggest that variable-length latent actions are an architecture-free, drop-in upgrade to the fixed-capacity bottleneck in latent action models, latent-action world models, and video-pretrained action interfaces.
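
The prefix-valid codes described above come from nested dropout, which can be sketched in a few lines. This is a generic illustration, not FlexLAM's implementation: it assumes the prefix length is sampled uniformly and that dropped tokens are zeroed, and the token count and width are hypothetical.

```python
import numpy as np


def nested_dropout_mask(num_tokens, rng):
    """Sample a prefix length k in [1, num_tokens] and keep only the first k tokens."""
    k = int(rng.integers(1, num_tokens + 1))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[:k] = True
    return mask, k


def truncate_latent(latent, budget):
    """Inference-time token budget: zero every token after the first `budget`."""
    out = latent.copy()
    out[budget:] = 0.0
    return out


rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 4))            # 8 latent tokens of width 4 (hypothetical)

mask, k = nested_dropout_mask(latent.shape[0], rng)
train_latent = latent * mask[:, None]       # what a decoder would see during training
short_code = truncate_latent(latent, 2)     # same latent, truncated to a 2-token budget
```

Because every training step sees only a prefix, the earliest tokens are pushed to carry the most important information, which is why a truncated code remains usable at inference time.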

2606.20322 2026-06-19 cs.RO New submission 70%

Towards 3D karst underwater scene reconstruction from rotating sonar data

Georgios Evangelos Margaritis, Lionel Lapierre, Simon Rohou, Zhi Yan, Andreas Nüchter, François Goulette

Affiliations: U2IS, ENSTA, Institut Polytechnique de Paris; Lab-STICC, ENSTA, Institut Polytechnique de Paris; Informatics XVII – Robotics, Julius-Maximilians-Universität Würzburg

Topic match (robot learning): underwater reconstruction combining SLAM and deep learning

AI summary: Sonar data are sparse and noisy, and navigation drift makes 3D reconstruction difficult. The paper proposes a pipeline that combines continuous-time SLAM for trajectory correction with two-stage deep-learning surface reconstruction, producing a 3D mesh that can be navigated immersively.

Comments 1st Workshop on Long-term Deployments in the Wild (LoWi)

2606.20272 2026-06-19 cs.RO cs.CV New submission 70%

Efficiently Linking Real Scenes with Synthetic Data Generation for AI-based Cognitive Robotics and Computer Vision Applications

Paul Koch, Vivek Chavan, André Sers, Adem Karakurt, Paul Hofmann, Mohamad Zaher Ziadeh, Jörg Krüger

Affiliations: Fraunhofer IPK; TU Berlin

Topic match (robot learning): discusses synthetic data generation for cognitive robotics

AI summary: Discusses the limitations of current AI vision models in cognitive robotics applications and proposes bridging the domain gap by linking simulation with real-world training data generation.

Comments Accepted and best paper award at MHI-Kolloquium 2024

2606.20246 2026-06-19 cs.RO cs.AI New submission 70%

Finetuning Vision-Language-Action Models Requires Fewer Layers Than You Think

Gia-Binh Nguyen, Trong-Bao Ho, Thien-Loc Ha, Khoa Vo, Philip Lund Møller, Quang T. Nguyen, Long Dinh, Tuan Dam, Vu Duong, Tung M. Luu, Trung Le, Tran Nguyen Le, Minh Vu, An Thai Le, Ngan Le, Daniel Sonntag, James Zou, Jan Peters, Duy M. H. Nguyen, Ngo Anh Vien

Affiliations: Center for AI Research, VinUniversity; VinRobotics; University of Arkansas; Technical University of Denmark; Hanoi University of Science and Technology; KAIST; Monash University; Oldenburg University; DFKI; University of Stuttgart; IMPRS-IS; Stanford University; Technische Universität Darmstadt

Topic match (robot learning): model compression for robot manipulation models

AI summary: Finds layer-wise representational redundancy in VLA models and proposes a training-free compression method that removes redundant layers to cut model depth by up to 50%, giving a 40-50% reduction in training time and up to 30% faster inference while matching or exceeding full-model performance.

Abstract

Vision-Language-Action (VLA) models pre-trained on massive video-robot datasets have revolutionized robotic manipulation, yet their multi-billion parameter architectures impose prohibitive computational burdens during downstream fine-tuning and real-time inference. In this work, we reveal a highly non-trivial architectural characteristic of these continuous control foundation policies (e.g., pi_0, GR00T-N1.5): despite being trained on diverse physical trajectories, they exhibit severe layer-wise representational redundancy. To exploit this, we introduce a structural compression pipeline that is entirely training-free, bypassing the need of existing methods to load full-scale models to learn optimized token reductions or dynamic layer selectors. Instead, using only a single forward pass via Centered Kernel Alignment to identify redundant layer features, we remove twin layers to permanently compress the model depth by up to 50% across both the VLM backbone and the continuous control policy head. Downstream fine-tuning of this streamlined architecture yields a dual acceleration benefit: a 40-50% reduction in training time and up to 30% faster real-time inference, while matching or exceeding full-scale base model performance. We comprehensively validate our method across three simulation benchmarks (LIBERO, RoboCasa, SimplerEnv) and 10 diverse real-world manipulation tasks across 4 unique robotic embodiments. These results prove that advanced VLAs require significantly fewer layers than previously assumed, offering a highly compute-efficient paradigm for scalable robot learning.
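
Centered Kernel Alignment, the similarity measure named in the abstract, has a simple linear form that can be computed from two activation matrices. The sketch below uses linear CKA and flags consecutive layers whose score exceeds a threshold. The threshold, the consecutive-layer rule, and the synthetic activations are assumptions for illustration; the paper's actual selection criterion is not given in the abstract.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activations x (n, p) and y (n, q) over the same n inputs."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def redundant_pairs(layer_acts, threshold=0.95):
    """Indices i where layer i+1 is nearly identical to layer i under linear CKA."""
    return [i for i in range(len(layer_acts) - 1)
            if linear_cka(layer_acts[i], layer_acts[i + 1]) >= threshold]


rng = np.random.default_rng(0)
acts_a = rng.normal(size=(256, 16))                    # synthetic "layer 0" activations
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
acts_b = 2.0 * acts_a @ rotation                       # rotated and rescaled copy of layer 0
acts_c = rng.normal(size=(256, 16))                    # unrelated activations
twins = redundant_pairs([acts_a, acts_b, acts_c])
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so the rotated, rescaled copy scores close to 1 against its source while the unrelated activations score low.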

2606.19889 2026-06-19 cs.CV New submission 70%

SurgVista: Long-Horizon Surgical World Modeling with Plausible Instrument-Tissue Dynamics

Wentao Pan, Wuyang Li, Shengyuan Liu, Xinyu Liu, Hengyu Liu, Yixuan Yuan

Affiliations: The Chinese University of Hong Kong; EPFL; Imperial College London

Topic match (robot learning): world models supporting robot policy learning

AI summary: Proposes SurgVista, a surgical world model that uses Deformation Consistency Regularization and Drift Adaptation Training to address spatial interaction incoherence and temporal fidelity collapse. It consistently outperforms existing methods, with gains widening at longer prediction horizons.

Abstract

Scaling robot policy learning for autonomous surgery is challenging, as expert demonstrations are expensive and in vivo exploration poses substantial safety risks. Surgical world models address this by generating realistic, action-conditioned future frames from an initial observation, but existing methods exhibit two persistent failure modes: spatial interaction incoherence, where visible instrument contact fails to induce spatially consistent tissue deformation, and temporal fidelity collapse, where prediction errors compound across autoregressive rollouts and progressively corrupt visual quality. We present SurgVista, a surgical world model that mitigates both failures through two training recipes. Deformation Consistency Regularization extracts scene-point trajectories from training videos and enforces cross-frame coherence through latent contrastive learning, strengthening physically consistent instrument-tissue dynamics. Drift Adaptation Training mitigates long-horizon drift by perturbing conditioning frames with online prediction residuals and photometric augmentations calibrated to long-horizon drift statistics, sustaining visual fidelity over extended rollouts. To enable rigorous evaluation, we further introduce SurgWorld-Bench, featuring diverse procedure types, long-range rollouts, and decoupled metrics for instrument-motion accuracy and tissue-response fidelity. Extensive experiments show that SurgVista consistently outperforms state-of-the-art methods across visual quality, temporal consistency, and interaction fidelity, with gains widening as the prediction horizon grows.

2605.09383 2026-06-19 cs.RO Version update 70%

Safety-Critical LiDAR-Inertial Odometry with On-Manifold Deterministic Protection Level

Yueqi Zhu, Yan Pan, Chufan Rui, Jiasheng Luo, Shihua Li, Bo Zhou

Affiliations: School of Automation, Southeast University; Key Laboratory of Measurement and Control of CSE, Ministry of Education

Topic match (robot learning): safety-critical odometry for mobile robot navigation

AI summary: Proposes a safety-critical LiDAR-inertial odometry that provides deterministic protection levels through on-manifold deterministic state estimation, to improve navigation safety for mobile robots in safety-critical scenarios.

Abstract

In safety-critical scenarios, the protection level of the autonomous navigation system is crucial for enabling mobile robots to perform safe tasks. However, existing studies on probabilistic navigation systems for robots usually perform offline accuracy evaluations using limited datasets and assume that the results can be applied to unknown real-world environments. As a result, current autonomous mobile robots often lack protection levels for online safety assessment. To fill this gap, we propose a safety-critical LiDAR-inertial odometry (LIO) that provides deterministic protection levels based on on-manifold deterministic state estimation. By adopting the unknown but bounded assumption, we derive a neat closed-form relationship between point cloud noise and the uncertainty of the estimation from the iterated closest point algorithm. Using this relationship, we design an on-manifold ellipsoidal set-membership filter and implement it within the LIO system. Leveraging the properties of the set-membership filter, our system offers the feasible sets of the estimated locations as the deterministic protection levels, serving as safety references for the robots' downstream autonomous operations. The experimental results show that our system can provide effective deterministic online safety references for diverse robots in various environments.
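
How an ellipsoidal feasible set can serve as a protection level is easy to show for the position part alone. The sketch below assumes the set is {x : (x - c)^T P^{-1} (x - c) <= 1} with a symmetric positive-definite shape matrix P, and it ignores the on-manifold treatment of orientation used in the paper. The function names and numbers are hypothetical.

```python
import numpy as np


def contains(center, shape, point):
    """True if `point` lies inside the ellipsoid (x - c)^T P^{-1} (x - c) <= 1."""
    diff = point - center
    return float(diff @ np.linalg.solve(shape, diff)) <= 1.0


def protection_radius(shape):
    """Worst-case distance from the center to the boundary (largest semi-axis)."""
    return float(np.sqrt(np.linalg.eigvalsh(shape).max()))


def extent_along(shape, direction):
    """Half-width of the ellipsoid along a given direction."""
    d = direction / np.linalg.norm(direction)
    return float(np.sqrt(d @ shape @ d))


center = np.array([1.0, 2.0, 0.5])
shape = np.diag([0.04, 0.09, 0.01])   # semi-axes of 0.2 m, 0.3 m, 0.1 m (hypothetical)

radius = protection_radius(shape)
inside = contains(center, shape, center + np.array([0.1, 0.0, 0.0]))
outside = contains(center, shape, center + np.array([0.25, 0.0, 0.0]))
```

A downstream planner could inflate obstacles by `radius`, or by `extent_along` in the direction of travel for a less conservative margin.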

2505.18201 2026-06-19 cs.RO cs.LG Version update 70%

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez

Affiliations: Environmental and Applied Fluid Dynamics, von Karman Institute for Fluid Dynamics; Department of Mechanical Engineering, Vrije Universiteit Brussel; Department of Electromechanical, Systems and Metal Engineering, Ghent University; Aero-Thermo-Mechanics Laboratory, École Polytechnique de Bruxelles, Université Libre de Bruxelles; Experimental Aerodynamics and Propulsion Lab, Universidad Carlos III de Madrid

Topic match (robot learning): hybrid control of flapping-wing drones combining reinforcement learning with a digital twin

AI summary: Proposes a hybrid model-free/model-based control approach for flapping-wing drones in which the reinforcement twinning algorithm combines reinforcement learning with an adaptive digital twin, using transfer learning and a policy referee to improve sample efficiency and control robustness.

Abstract

Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.

2606.19721 2026-06-19 cs.LG cs.AI New submission 60%

OnDeFog: Online Decision Transformer under Frame Dropping

Daiki Yotsufuji, Kenta Nishihara, Shoma Shimizu, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

Topic match (robot learning): reinforcement learning methods applied to robot decision-making

AI summary: To address performance degradation caused by frame dropping, proposes OnDeFog, which combines the DeFog mechanisms with the online decision transformer and learns policies through direct environment interaction. It outperforms ODT at high frame-drop rates and outperforms DeFog on datasets with a large share of low-reward data.

Comments Accepted to PRICAI 2025

DOI: 10.1007/978-981-95-7072-0_10

Abstract

In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the performance degradation caused by frame dropping, the Decision Transformer under Random Frame Dropping (DeFog) was developed by incorporating additional mechanisms into the decision transformer to tackle frame dropping. Although DeFog can mitigate performance degradation in frame-dropping environments, since DeFog is an offline learning method, it struggles to effectively generalize to novel states not adequately represented in the training dataset. In this study, we propose OnDeFog, which integrates the mechanisms in DeFog with the online decision transformer (ODT), an online reinforcement learning method that learns policies through direct environmental interaction. Comprehensive experimental evaluation demonstrates that our proposed OnDeFog achieves superior performance compared to ODT in environments characterized by high dropping frame rate and outperforms DeFog on datasets containing a large amount of low-reward data.
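
The frame-dropping setting described above can be simulated with a small wrapper around the observation stream. This is an illustrative sketch, not the OnDeFog or DeFog code: it assumes the agent reuses the last received state and reward when a frame is lost and tracks how many steps have passed since the last delivery. The class and attribute names are hypothetical.

```python
import random


class FrameDropper:
    """Simulate lost frames: on a drop, repeat the last delivered state and reward."""

    def __init__(self, drop_rate, seed=0):
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)
        self.last_obs = None
        self.last_reward = 0.0
        self.drop_span = 0  # steps since the last delivered frame

    def observe(self, obs, reward):
        # The very first frame is always delivered so there is something to repeat.
        dropped = self.last_obs is not None and self.rng.random() < self.drop_rate
        if dropped:
            self.drop_span += 1
        else:
            self.last_obs, self.last_reward, self.drop_span = obs, reward, 0
        return self.last_obs, self.last_reward, self.drop_span


always_drop = FrameDropper(drop_rate=1.0)
history = [always_drop.observe(f"s{t}", float(t)) for t in range(3)]
```

Feeding the step counter to the policy alongside the stale state lets it tell a fresh observation from an old one, which matters more as the drop rate rises.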
