arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.03940 2026-06-03 eess.IV cs.CV cs.LG cs.RO 版本更新

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

SEAOTTER: 基于传感器嵌入自编码器与一次性转码的高效重建

Dan Jacobellis, Neeraja J. Yadwadkar

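Affiliations: Department of Electrical and Computer Engineering, The University of Texas at Austin

AI summary: Proposes the SEAOTTER framework, which pairs a sensor-embedded autoencoder with a learnable JPEG transcode. At a 200:1 compression ratio it encodes 7 times faster and decodes 3.5 times faster than AVIF and raises ImageNet top-1 accuracy by 8%, while remaining compatible with JPEG.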

AI中文摘要

在机器人系统中,使用低成本、低功耗硬件可以轻松捕获高分辨率的大量视觉数据。然而,当通过JPEG/MPEG等传统编解码器传输时,有限的带宽和机载计算资源阻碍了充分利用。较新的编解码器(如AV1/AVIF)改善了率失真权衡,但需要更多资源进行编码,在没有定制ASIC的情况下不切实际。最近的非对称自编码器在极端功率和带宽约束下提供高质量,但增加了高昂的解码成本,并使用忽略围绕JPEG等标准建立的数十年基础设施的特有格式。为了解决这些限制,我们引入了一种基于传感器嵌入自编码器与一次性转码的高效重建(SEAOTTER)的云机器人压缩框架。由于传感器、云和消费阶段面临非常不同的功率和带宽预算,SEAOTTER结合了学习潜变量的紧凑性和标准JPEG文件的广泛可用性。由于朴素转码会降低性能,我们提出了一种可学习的JPEG颜色和量化变换,能够提高全局、密集和基于视觉语言感知的准确性。使用SEAOTTER,我们为预训练的冻结编码器训练通用和任务感知的转码流水线。在200:1的压缩比下,与AVIF相比,我们观察到编码速度提高7倍,解码速度提高3.5倍,ImageNet top-1准确率提高8%,同时保持与JPEG基础设施的兼容性。我们的代码可从此https URL获取。

英文摘要

In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet, limited bandwidth and on-device compute resources prevent full utilization when transmitted via conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve the rate-distortion trade-off, but demand far more resources for encoding, impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but add prohibitive decoding cost and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that enables increased accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1 and compared to AVIF, we observe 7 times faster encoding, 3.5 times faster decoding, and +8% ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter .
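As a rough sense of scale for the 200:1 operating point quoted above, the arithmetic can be sketched in a few lines. The 24 bits per pixel (8-bit RGB) and the 1920x1080 frame size are assumptions chosen for illustration and are not taken from the paper; the function names are hypothetical.

```python
def bits_per_pixel_after(ratio, raw_bits_per_pixel=24):
    """Bits per pixel left after compressing by `ratio` (assumes 8-bit RGB input)."""
    return raw_bits_per_pixel / ratio


def compressed_size_bytes(width, height, ratio, raw_bits_per_pixel=24):
    """Approximate compressed size in bytes of one frame at the given ratio."""
    raw_bytes = width * height * raw_bits_per_pixel / 8
    return raw_bytes / ratio


if __name__ == "__main__":
    # Hypothetical 1080p frame at the 200:1 ratio quoted in the abstract.
    print(bits_per_pixel_after(200))
    print(compressed_size_bytes(1920, 1080, 200))
```

Under these assumptions a 1080p frame shrinks from about 6.2 MB to about 31 KB, which corresponds to 0.12 bits per pixel.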

2606.03994 2026-06-03 cs.CV cs.RO 版本更新

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

SimuScene: 从单张图像重建仿真就绪的组合式3D场景

Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim, Hyunsoo Cha, Hanbyul Joo

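Affiliations: Seoul National University

AI summary: Proposes SimuScene, a compositional 3D reconstruction pipeline that brings physics simulation into shape and layout estimation. A physics engine diagnoses reconstruction errors and drives their correction, yielding stable, simulation-ready scenes.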

Comments Project Page: https://snuvclab.github.io/SimuScene/

AI中文摘要

从单张图像重建可交互、仿真就绪的3D场景是机器人操作的关键瓶颈。虽然最近的单图像提升器能恢复合理的每个物体形状,但组合它们会产生因物体相互穿透、悬浮或下沉而在物理仿真中崩溃的场景。现有的物理感知方法严格将其作为事后布局修正,而未解决底层几何误差。为此,我们引入SimuScene,一种将物理置于形状和布局估计循环中的组合式3D重建流水线。我们不仅将物理用于布局清理,还在生成过程中利用物理引擎作为诊断测量工具。通过在重力下对重建物体进行诊断性仿真,我们将穿透和支撑失败转化为定量修正信号,驱动重力轴拉伸和非模态形状重采样。这种物理信息反馈循环减轻了累积的重建误差,并产生稳定、仿真就绪的组合式3D场景。大量实验在物理稳定性和几何对齐基准上展示了最先进的性能。我们进一步通过在仿人控制和机器人臂操作任务中部署重建环境来突出SimuScene的实用性。

英文摘要

Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing them yields scenes that collapse under physical simulation due to interpenetrating, hovering, or sinking objects. Existing physics-aware methods address this strictly as a post-hoc layout correction, leaving the underlying geometric errors unresolved. To address this, we introduce SimuScene, a compositional 3D reconstruction pipeline that puts physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we utilize the physics engine as a diagnostic measurement tool during the generative process itself. By diagnostically simulating reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments demonstrate state-of-the-art performance on physical stability and geometric alignment benchmarks. We further highlight SimuScene's utility by deploying reconstructed environments in humanoid control and robot-arm manipulation tasks.

2606.03992 2026-06-03 cs.CV cs.RO 版本更新

Exploring Easy Boosts for Lidar Semantic Scene Completion

探索激光雷达语义场景补全的简易提升方法

Tetiana Martyniuk, Jonathan Seele, Alexandre Boulch, Gilles Puy, Renaud Marlet, Raoul de Charette

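Affiliations: Inria, France; valeo.ai, France; ETH Zurich, Switzerland; LIGM, CNRS, Univ Gustave Eiffel, ENPC, IP Paris, France

AI summary: Studies "free lunch" strategies that need no complex architectural redesign: adding semantic pseudo-labels and visibility information to the input point cloud markedly improves lidar semantic scene completion, letting older models compete with and even surpass state-of-the-art systems.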

Comments Accepted to ICIP 2026

AI中文摘要

本文研究了“免费午餐”策略,以提升激光雷达语义场景补全(SSC)的性能,而无需复杂的架构重新设计。我们首先证明,使用现成分割器为输入点云赋予语义伪标签可以显著提升现有架构的性能。通过将这些模型与 oracle 进行评估,我们确定高质量的语义先验是 mIoU 提升的主要驱动力。此外,我们为输入激光雷达扫描配备了可见性信息,以区分空区域和未知区域,这为测试的架构提供了次要的性能提升。使用这些简单的增强,我们观察到旧模型仍然可以与最先进的系统竞争,甚至超越它们。我们的代码可在 https://github.com/astra-vision/SSC-Priors 获取。

英文摘要

This paper investigates "free lunch" strategies to boost the performance of lidar semantic scene completion (SSC) without requiring complex architectural redesigns. We first demonstrate that endowing input point clouds with semantic pseudo-labels from off-the-shelf segmentors significantly improves the performance of existing architectures. By evaluating these models against an oracle, we establish that high-quality semantic priors are a primary driver of mIoU gains. Furthermore, we equip the input lidar scan with visibility information that distinguishes between empty and unknown spaces, which provides a secondary performance boost across the tested architectures. Using these simple enhancements, we observe that older models remain competitive with state-of-the-art systems, and can even outperform them. Our code is available at https://github.com/astra-vision/SSC-Priors.

2606.03985 2026-06-03 cs.RO cs.AI cs.CV 版本更新

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Humanoid-GPT:扩展数据与结构以实现零样本运动跟踪

Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin, Yunrui Lian, Sikai Liang, Zhikai Zhang, Yu Guan, Jilong Wang, Wenyao Zhang, Xinqiang Yu, He Wang, Li Yi

Affiliations: Tsinghua University; Galbot Inc.; Shanghai Jiao Tong University; Peking University; Shanghai Qi Zhi Institute

AI summary: Proposes Humanoid-GPT, a GPT-style causal Transformer pre-trained on a billion-scale motion corpus for whole-body control; scaling data and model capacity yields zero-shot generalization to unseen motions and tasks.

Comments Accepted at CVPR 2026

AI中文摘要

我们介绍了Humanoid-GPT,一种具有因果注意力的GPT风格Transformer,在十亿级运动语料上训练用于全身控制。与受限于稀缺数据和敏捷性-泛化权衡的先前浅层MLP跟踪器不同,Humanoid-GPT在一个包含所有主要动作捕捉数据集和大规模内部录制的20亿帧重定向语料上预训练。扩展数据和模型容量产生了一个单一的生成式Transformer,它能够跟踪高度动态的行为,同时实现对未见运动和控制任务的前所未有的零样本泛化。大量实验和扩展分析表明,我们的模型建立了新的性能前沿,展示了对未见任务的鲁棒零样本泛化,同时能够跟踪高度动态和复杂的运动。

英文摘要

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks highly dynamic behaviors while achieving unprecedented zero-shot generalization to unseen motions and control tasks. Extensive experiments and scaling analyses show that our model establishes a new performance frontier, demonstrating robust zero-shot generalization to unseen tasks while simultaneously tracking highly dynamic and complex motions.

2606.03954 2026-06-03 cs.CV cs.LG cs.RO 版本更新

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

VLESA: 用于人类活动监测的视觉语言具身安全智能体

Hanjiang Hu, Yiyuan Pan, Jiaxing Li, Xusheng Luo, Alexander Robey, Na Li, Yebin Wang, Changliu Liu

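Affiliations: Carnegie Mellon University; Mitsubishi Electric Research Laboratories; Harvard University

AI summary: Proposes the VLESA framework, which monitors human activity from egocentric video and uses a goal-conditioned safety Q-filter trained via GRPO to trigger real-time safety interventions, achieving higher intervention accuracy on the ASIMOV-2.0 benchmark.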

Comments 18 pages, 5 tables, 5 figures

AI中文摘要

随着AI系统越来越多地协助人类完成物理任务,确保安全变得至关重要——物理动作会带来即时且不可逆转的后果,而数字错误则不会。我们引入了视觉语言具身安全智能体(VLESA),这是一个从自我中心视频监测人类活动,并在预测到危险动作时触发实时安全干预的框架。VLESA处理意图依赖的安全问题,其中相同的动作可能根据上下文而安全或危险。我们引入了一个将自我中心帧与目标条件安全注释配对的数据集,使得能够通过GRPO训练一个目标条件安全Q过滤器,该过滤器在不重新训练的情况下根据推断的意图评估动作。在此基础上,提出了一个意图-动作预测智能体,用于从视频中联合推断目标并预测未来动作。在ASIMOV-2.0基准上,VLESA在精确的地面真值帧处实现了比基线更高的干预准确率,而通过目标条件约束解码,GRPO训练的Q过滤器将动作安全性提高了超过41个百分点。代码可在该网址获取。

英文摘要

As AI systems increasingly assist humans in physical tasks, ensuring safety becomes paramount -- physical actions carry immediate and irreversible consequences that digital errors do not. We introduce the Vision-Language Embodied Safety Agent (VLESA), a framework that monitors human activities from egocentric video and triggers real-time safety interventions when dangerous actions are predicted. VLESA addresses intent-dependent safety where identical actions can be safe or dangerous depending on context. A dataset pairing egocentric frames with goal-conditioned safety annotations is introduced, enabling a goal-conditioned safety Q-filter trained via GRPO that evaluates actions with respect to inferred intent without retraining. On top of that, an intent-action prediction agent is proposed to jointly infer goals and predict future actions from video. On the ASIMOV-2.0 benchmark, VLESA achieves higher intervention accuracy at the exact ground-truth frame compared to baselines, while the GRPO-trained Q-filter improves action safety by over 41 percentage points through goal-conditioned constrained decoding. Code is available at https://github.com/HanjiangHu/VLESA.

2606.03949 2026-06-03 cs.RO 版本更新

Preference-Calibrated Human-in-the-Loop Reinforcement Learning for Robotic Manipulation

偏好校准的人机协同强化学习用于机器人操作

Zeyi Liu, Guangyao Liu, Yinuo Qu, Yuquan Xue, Bofang Jia, Chunhua Yang, Weihua Gui, Keke Huang, Ziwei Wang

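Affiliations: Central South University; Nanyang Technological University; Zhejiang University

AI summary: Proposes the PACT framework, which uses the implicit preference signals induced by interventions for credit reassignment and policy alignment, improving the sample efficiency and performance of human-in-the-loop reinforcement learning.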

Comments Submitted to CoRL2026

AI中文摘要

人机协同强化学习(HIL-RL)通过在线人类干预提高了真实机器人操作中的样本效率。然而,成功的轨迹可能包含偏离期望任务执行路径并迫使人类干预的次优动作。现有的HIL-RL方法通常对所有转换应用一致的信用分配原则,通过次优段均匀传播折扣终端奖励,忽略了每个转换对任务成功的实际贡献。这高估了评论家学习的Q值,并间接误导演员更新朝向次优行为模式。为此,我们提出了PACT,一种偏好校准的演员-评论家训练框架,利用干预引起的隐式偏好信号对识别出的次优段进行信用重分配,同时直接指导策略训练以实现无偏的评论家-演员学习。具体来说,我们首先设计了一个从人类演示中学习并识别次优段进行信用校正的进度模型。然后,从干预状态下的人类动作和重采样策略动作中,我们构建偏好对来定义一个反事实优势,惩罚识别出的次优段的贝尔曼目标,实现方向性信用校准。此外,我们在有界均值空间中直接将策略与人类纠正动作对齐,提供了评论家引导更新之外的额外信号。在五个真实机器人操作任务中,PACT将平均成功率提高了24.5%,并实现了1.3倍的更快收敛,从而提高了强化学习的样本效率和性能。代码可在 https://anonymous.4open.science/r/HILRL-A1X-BC05 获取。

英文摘要

Human-in-the-loop reinforcement learning (HIL-RL) improves sample efficiency in real-robot manipulation through online human intervention. However, successful trajectories may include suboptimal actions that deviate from the desired task-execution path and force human intervention. Existing HIL-RL methods typically apply the consistent credit assignment principle to all transitions, uniformly propagating discounted terminal rewards through suboptimal segments, ignoring the actual contribution of each transition to task success. This overestimates Q-values for critic learning and indirectly misguides actor updates toward suboptimal behavior patterns. To this end, we propose PACT, a Preference-calibrated Actor-Critic Training framework that leverages the implicit preference signals induced by intervention to perform credit reassignment on identified suboptimal segments while directly guiding policy training for unbiased critic-actor learning. Specifically, we first design a progress model that learns from human demonstration and identifies suboptimal segments for credit correction. Then, from the human action and resampled policy action at the intervention state, we build preference pairs to define a counterfactual advantage that penalizes Bellman targets of the identified suboptimal segment, enabling directional credit calibration. Moreover, we directly align the policy with human corrective actions in the bounded mean space, providing an additional signal beyond critic-guided updates. Across five real-robot manipulation tasks, PACT improves the average success rate by 24.5% and achieves 1.3 times faster convergence, thereby improving both RL sample efficiency and performance. Code is available at https://anonymous.4open.science/r/HILRL-A1X-BC05.

2606.03931 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Multi-Robot Bearing-only Pose Estimation via Angle Rigidity

基于角度刚性的多机器人仅方位姿态估计

J. Francisco Presenza, Leonardo J. Colombo, Ignacio Mas, Juan I. Giribet

Affiliations: Institute of Engineering Technology and Sciences "Hilario Fernández Long" (CONICET-UBA); Centre for Automation and Robotics (CSIC-UPM); Artificial Intelligence and Robotics Laboratory, Universidad de San Andrés and CONICET

AI summary: Proposes a distributed bearing-only pose estimator that computes positions from angles between body-frame bearings and then recovers orientations; it requires only angle rigidity and achieves local uniform exponential stability.

AI中文摘要

本文提出了一种新颖的分布式基于方位的姿态估计器,用于时变多机器人系统。该方法利用从体坐标系方位计算出的角度来估计机器人在 $\mathbb{R}^3$ 中的位置,而无需知道其方向。方向在 $\mathrm{SO}(3)$ 中从估计的位置、方位和方位导数中恢复。所提出的观测器仅要求(有向)感知拓扑是 \textit{角度刚性的},这是一个比常用条件(如方位刚性)更弱的条件。在假设部分机器人持续激励运动的情况下,建立了所提出观测器的局部一致指数稳定性。通过仿真评估了该方案的有效性和实用性。

英文摘要

This letter proposes a novel distributed bearing-based pose estimator for time-varying multi-robot systems. The method uses angles computed from body-frame bearings to estimate the robots' positions in $\mathbb{R}^3$ without knowledge of their orientations. The orientations in $\mathrm{SO}(3)$ are recovered from the estimated positions, the bearings, and the bearing derivatives. The proposed observer only requires the (directed) sensing topology to be \textit{angle-rigid}, a weaker condition than the commonly used ones like bearing rigidity. Local uniform exponential stability of the proposed observer is established under the assumption of persistently exciting motions for a subset of robots. Simulations are presented and discussed to evaluate the scheme's effectiveness and practicality.
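The reason angles between body-frame bearings can be used without knowing a robot's orientation is that a rotation preserves the angle between two vectors. The sketch below checks this numerically. It is only an illustration of that invariance, not the estimator from the letter: the helper names are hypothetical, and for brevity the unknown orientation is restricted to a yaw rotation about the z-axis.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def bearing(p_i, p_j):
    """Unit vector pointing from robot i to robot j, in the world frame."""
    return normalize(tuple(b - a for a, b in zip(p_i, p_j)))


def to_body_frame(v, yaw):
    """Express a world-frame vector in a body frame rotated by `yaw` about z."""
    c, s = math.cos(-yaw), math.sin(-yaw)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def angle_between(b1, b2):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(b1, b2))
    return math.acos(max(-1.0, min(1.0, dot)))


if __name__ == "__main__":
    p_i, p_j, p_k = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)
    for yaw in (0.0, 0.7, 2.5):  # orientation of robot i, unknown to the estimator
        b_ij = to_body_frame(bearing(p_i, p_j), yaw)
        b_ik = to_body_frame(bearing(p_i, p_k), yaw)
        print(yaw, angle_between(b_ij, b_ik))
```

The printed angle is the same for every yaw, so the angle at robot i subtended by robots j and k carries position information that does not depend on robot i's heading.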

2606.03905 2026-06-03 cs.RO 版本更新

Semantic-weighted ICP for LiDAR Odometry: Class-Aware Residual Reweighting for Robust Scan Registration

语义加权ICP用于LiDAR里程计:基于类感知残差加权的鲁棒扫描配准

Vasco Carvalho, Tiago Barros, Urbano J. Nunes

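Affiliations: Institute of Robotics and Autonomous Systems, University of Lisbon

AI summary: Proposes a semantic-weighted ICP method that weights residuals by the geometric stability of each semantic class, improving the robustness of LiDAR odometry pose estimation in dynamic and complex environments.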

AI中文摘要

LiDAR里程计是自主机器人系统的基本组成部分,依赖于连续点云之间的几何配准来估计自运动。然而,传统的几何方法在动态或非结构化环境中常常退化,原因是移动物体、稀疏几何特征、植被和语义模糊结构导致不可靠的对应关系。现有工作表明,其中一些限制可以通过在配准过程中引入环境的语义信息来解决。在这项工作中,我们在此基础上进一步表明,并非环境中的所有元素对配准同等重要。因此,我们提出了一种用于LiDAR里程计的语义类加权ICP。所提出的方法不是严格过滤掉属于特定语义类别的点,而是根据其预期的几何稳定性对属于语义类别的点的残差进行加权。这种策略使得信息丰富但可能不稳定的结构能够对配准过程做出贡献,同时减轻动态物体的影响。实验评估在SemanticKITTI和RELLIS-3D数据集上进行,这些数据集包括城市、高速公路、乡村和越野环境。实证结果表明,所提出的语义加权ICP改进了位姿估计,特别是在传统刚性特征稀缺的具有挑战性的越野场景中。此外,分析表明,这种加权策略的有效性高度依赖于环境,受场景的结构和语义组成影响。

英文摘要

LiDAR odometry is a fundamental component of autonomous robotic systems, relying on geometric registration between consecutive point clouds to estimate ego-motion. However, traditional geometric approaches often degrade in dynamic or unstructured environments due to unreliable correspondences caused by moving objects, sparse geometric features, vegetation, and semantically ambiguous structures. Existing works have shown that some of these limitations can be addressed by introducing semantic information from the environment into the registration process. In this work, we build on this and show that not all elements in the environment are equally relevant for registration. Hence, we propose a semantic class-weighted ICP for LiDAR odometry. Instead of strictly filtering out points belonging to specific semantic classes, the proposed approach weights the residuals of points belonging to semantic categories based on their expected geometric stability. This strategy enables informative but potentially unstable structures to contribute to the registration process while mitigating the influence of dynamic objects. The experimental evaluation was conducted on the SemanticKITTI and RELLIS-3D datasets, which include urban, highway, rural, and off-road environments. The empirical results show that the proposed Semantic-weighted ICP improves pose estimation, especially in challenging off-road scenarios where conventional rigid features are scarce. Furthermore, the analysis reveals that the effectiveness of this weighting strategy is highly environment-dependent, influenced by the structural and semantic composition of the scene.
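The idea of weighting residuals by semantic class, instead of discarding whole classes, can be sketched as one weighted alignment step. This is a minimal 2D illustration under stated assumptions, not the authors' implementation: correspondences are taken as already known, the class names and weight values in `CLASS_WEIGHTS` are hypothetical, and the closed-form 2D solution stands in for the 3D solve a real ICP iteration would use.

```python
import math

# Hypothetical per-class weights: stable structures count more, dynamic ones less.
CLASS_WEIGHTS = {"building": 1.0, "vegetation": 0.5, "car": 0.1}


def weighted_rigid_align_2d(src, dst, labels, class_weights=CLASS_WEIGHTS):
    """Return (theta, (tx, ty)) minimizing sum_i w_i * ||R(theta) src_i + t - dst_i||^2."""
    w = [class_weights[label] for label in labels]
    total = sum(w)
    # Weighted centroids of both point sets.
    csx = sum(wi * p[0] for wi, p in zip(w, src)) / total
    csy = sum(wi * p[1] for wi, p in zip(w, src)) / total
    cdx = sum(wi * q[0] for wi, q in zip(w, dst)) / total
    cdy = sum(wi * q[1] for wi, q in zip(w, dst)) / total
    # Weighted dot and cross sums of the centered pairs give the optimal angle.
    dot = cross = 0.0
    for wi, p, q in zip(w, src, dst):
        px, py = p[0] - csx, p[1] - csy
        qx, qy = q[0] - cdx, q[1] - cdy
        dot += wi * (px * qx + py * qy)
        cross += wi * (px * qy - py * qx)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)


def transform(points, theta, t):
    """Apply a 2D rotation by theta followed by translation t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


if __name__ == "__main__":
    src = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
    labels = ["building", "building", "vegetation", "car"]
    dst = transform(src, 0.3, (1.0, -2.0))
    print(weighted_rigid_align_2d(src, dst, labels))
```

With a low weight, a point on a moving car still contributes to the estimate, but it pulls the solution far less than a point on a building would.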

2606.03874 2026-06-03 cs.CV cs.RO 版本更新

DyaPlex: Full-Duplex Speech-Motion Model for Dyadic Interaction

DyaPlex: 用于二元交互的全双工语音-运动模型

Koki Nagano, Hongyu Liu, Seonwook Park, Tianye Li, Amrita Mazumdar, Christian Jacobsen, Shengze Wang, Michael Stengel, Rajarshi Roy, Ka Chun Cheung, Simon See, Shalini De Mello

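Affiliations: NVIDIA; HKUST

AI summary: Proposes DyaPlex, a streaming full-duplex speech-motion model that uses a dual-tower Transformer architecture and a unified dyadic token interleaving mechanism for synchronized multimodal interaction, reaching state-of-the-art performance on monadic and dyadic interaction benchmarks.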

Comments Project page: https://research.nvidia.com/labs/amri/projects/DyaPlex

AI中文摘要

我们提出了DyaPlex,一种用于二元交互的流式全双工语音-运动模型。为了捕捉人类交流的连续性和互惠性,这种全双工能力使智能体能够以流式方式同时感知和生成语音及物理运动。其核心在于,我们的方法利用了基础全双工语音模型的强先验,并集成了新颖的运动通路,从而实现完全同步的多模态交互。具体来说,我们设计了一种双塔Transformer架构,在保持冻结基础语音模型的零样本对话推理能力的同时,构建了深度耦合的流式运动通路。通过引入统一的二元令牌交织机制,并借助时间对齐的语音-运动RoPE引导交叉注意力,我们的模型有效地将自回归运动与丰富的潜在语音特征对齐。在4000小时的Seamless Interaction数据集上训练,我们的模型有效捕捉了跨说话者依赖关系,并在单体和二元人类交互基准上建立了新的最优性能。

英文摘要

We present DyaPlex, a streaming, full-duplex speech-and-motion model designed for dyadic interaction. To capture the continuous and reciprocal nature of human communication, this full-duplex capability empowers the agent to simultaneously perceive and generate both speech and physical motion in a streaming fashion. At its core, our method leverages the strong priors of a foundational full-duplex speech model and integrates a novel motion pathway, thereby achieving fully synchronized multi-modal interaction. Specifically, we design a dual-tower Transformer architecture that preserves the zero-shot conversational reasoning of a frozen base speech model while constructing a deeply coupled, streaming motion pathway. By introducing a unified dyadic token interleaving mechanism and guiding cross-attention via a time-aligned speech-motion RoPE, our model effectively aligns autoregressive motions with rich latent speech features. Trained on the 4,000-hour Seamless Interaction dataset, our model effectively captures cross-speaker dependencies and establishes new state-of-the-art performance across both monadic and dyadic human interaction benchmarks.

2606.03847 2026-06-03 cs.RO 版本更新

Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies

去噪提示何时重新规划:基于流的机器人策略的去噪方差自适应分块

Xiangdong Feng, Yuxuan Cheng, Chen Shi, Boyao Han, Yuxuan Yan, Yitong Hong, Zhuotao Tian, Li Jiang

Affiliations: Beijing Institute of Technology; The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute; Hunan University; Xi’an Jiaotong University; Renmin University of China; Harbin Institute of Technology, Shenzhen

AI summary: To address the fixed execution horizon in flow-based robot policies, proposes DVAC, which uses the variance of clean-action estimates during denoising to adaptively set the execution horizon, lowering replanning frequency while maintaining or improving task success.

AI中文摘要

动作分块已成为基于流的机器人策略的常见推理策略,通过建模演示中的多步时间依赖关系来改善动作连贯性。然而,执行步长通常仍被设为经验固定值,忽略了可预测的自由空间运动和精度关键交互阶段往往需要不同的重新规划频率。在本文中,我们首先证明基于流的策略的去噪过程包含任务阶段的内在信号:干净动作估计在可预测运动阶段保持稳定,但在接触密集或精度敏感操作附近波动更大。受此观察启发,我们提出DVAC(去噪方差自适应分块),一种测试时方法,自适应地决定从每个预测分块中执行多少动作。DVAC测量最终去噪步骤中干净动作估计的方差,执行稳定的低方差前缀,并在提交高方差未来动作之前重新规划。为了跨任务和 rollout 迁移,DVAC进一步使用局部方差尺度的滚动估计来校准阈值。在LIBERO、RoboTwin、CALVIN和真实世界操作上的实验表明,DVAC在提高任务成功率的同时降低了重新规划频率。使用基于$\pi_{0.5}$的策略,DVAC将LIBERO成功率从94.75%提高到98.00%,重新规划减少43.0%,同时在RoboTwin和CALVIN上也取得了总体收益,并提高了真实世界执行效率。

英文摘要

Action chunking has become a common inference strategy for flow-based robot policies, improving action coherence by modeling multi-step temporal dependencies in demonstrations. However, the execution horizon is still typically set as an empirical fixed value, overlooking that predictable free-space motions and precision-critical interaction phases often require different replanning frequencies. In this work, we first show that the denoising process of flow-based policies contains an intrinsic signal of task phases: clean-action estimates remain stable during predictable motion phases, but fluctuate more strongly around contact-rich or precision-sensitive operations. Motivated by this observation, we propose DVAC (Denoising-Variance Adaptive Chunking), a test-time method that adaptively determines how many actions to execute from each predicted chunk. DVAC measures the variance of clean-action estimates over the final denoising steps, executes the stable low-variance prefix, and replans before high-variance future actions are committed. To transfer across tasks and rollouts, DVAC further calibrates the threshold with a rolling estimate of the local variance scale. Experiments on LIBERO, RoboTwin, CALVIN, and real-world manipulation show that DVAC improves task success while reducing replanning frequency. With a $\pi_{0.5}$-based policy, DVAC improves LIBERO success from 94.75% to 98.00% and reduces replanning by 43.0%, while also yielding aggregate gains on RoboTwin and CALVIN and improving real-world execution efficiency.
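The prefix-selection step described above can be sketched as follows. This is a simplified reading of the abstract, not the released method: the data layout `estimates[k][t][d]`, the function names, the fixed threshold, and the `min_exec` floor are all assumptions, and the rolling threshold calibration mentioned in the abstract is left out.

```python
from statistics import pvariance


def per_step_variance(estimates):
    """Variance of clean-action estimates across the final denoising steps.

    estimates[k][t][d] is the estimate from denoising step k for the action
    at chunk position t, dimension d. Returns one value per chunk position,
    averaged over action dimensions.
    """
    num_positions = len(estimates[0])
    num_dims = len(estimates[0][0])
    variances = []
    for t in range(num_positions):
        dim_vars = [pvariance([est[t][d] for est in estimates]) for d in range(num_dims)]
        variances.append(sum(dim_vars) / num_dims)
    return variances


def stable_prefix_length(variances, threshold, min_exec=1):
    """Number of leading actions to execute before the first high-variance one."""
    n = 0
    for v in variances:
        if v > threshold:
            break
        n += 1
    return max(n, min_exec)  # always execute at least min_exec actions


if __name__ == "__main__":
    # Three final denoising steps, a chunk of four 1-D actions (toy numbers).
    estimates = [
        [[0.0], [1.0], [0.0], [0.0]],
        [[0.0], [1.0], [1.0], [2.0]],
        [[0.0], [1.0], [2.0], [4.0]],
    ]
    variances = per_step_variance(estimates)
    print(variances, stable_prefix_length(variances, threshold=0.1))
```

In the toy data the first two actions agree across denoising steps and the last two do not, so only the first two would be executed before replanning.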

2606.03834 2026-06-03 cs.RO 版本更新

Let the Dynamics Flow: Stable Flow Matching Dynamical Systems

让动力学流动:稳定的流匹配动力系统

Rodrigo Pérez-Dattari, Francisco Leiva, Andrea Testa, Leonel Rozo, Javier Ruiz del Solar, Noémie Jaquier

Affiliations: Department of Robotics, Perception, and Learning, KTH Royal Institute of Technology; Advanced Mining Technology Center (AMTC) and Department of Electrical Engineering, Universidad de Chile; Bosch Center for Artificial Intelligence, Renningen, Germany; Italian Institute of Artificial Intelligence (AI4I), Turin, Italy

AI summary: Proposes the Stable Flow Matching Dynamical Systems (SFMDS) framework, which parametrizes dynamical systems via flow matching under Lyapunov stability constraints to generate stable, scalable, multimodal robot motions.

AI中文摘要

流匹配最近已成为模仿学习的一种强大方法,能够实现可扩展、表达力强且多模态的运动策略。然而,将这些生成模型纳入形式化的稳定性保证(确保机器人行为安全和可泛化的前提)仍然是一个重大挑战。虽然将机器人运动建模为动力系统允许这种基于稳定性的归纳偏置,但现有框架难以捕捉复杂机器人任务中固有的丰富动作分布。本文介绍了稳定流匹配动力系统(SFMDS),这是一个弥合高容量生成模型与形式化李雅普诺夫稳定性保证之间差距的新框架。SFMDS通过流匹配参数化动力系统,同时将模型约束到稳定解族。我们提出了两种变体:基于惩罚项的软约束,以及直接嵌入模型架构的硬结构约束。我们还将两种公式扩展到李群。在基准数据集、仿真和类人机器人上的实验表明,SFMDS在低维和高维状态空间中学习稳定、可扩展和多模态的动力系统,从而实现安全且富有表现力的机器人运动生成。

英文摘要

Flow matching has recently emerged as a powerful approach for imitation learning, enabling scalable, expressive, and multimodal motion policies. However, incorporating formal stability guarantees into these generative models, a prerequisite to ensure safe and generalizable robot behaviors, remains a significant challenge. While modeling robot motions as dynamical systems allows for such stability-based inductive biases, existing frameworks struggle to capture the rich action distributions inherent in complex robotic tasks. This paper introduces Stable Flow Matching Dynamical Systems (SFMDS), a novel framework that bridges the gap between high-capacity generative modeling and formal Lyapunov stability guarantees. SFMDS parametrizes dynamical systems via flow matching while simultaneously constraining the model to a family of stable solutions. We propose two variants: a soft constraint based on a penalty term, and a hard structural constraint embedded directly in the model architecture. We further extend both formulations to Lie groups. Experiments on benchmark datasets, in simulation, and on a humanoid robot show that SFMDS learns stable, scalable, and multimodal dynamical systems in low- and high-dimensional state spaces, enabling safe and expressive robot motion generation.
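To make the notion of a hard stability constraint concrete, the sketch below shows one generic construction from the stable-dynamics literature: projecting an arbitrary velocity so that a Lyapunov function decreases at a chosen rate. It is not the SFMDS architecture, whose details the abstract does not give. The quadratic Lyapunov function $V(x) = \tfrac{1}{2}\lVert x - x^*\rVert^2$, the rate `alpha`, and the function name are assumptions.

```python
def project_to_stable(x, f_x, goal, alpha=1.0):
    """Minimally modify velocity f_x so that dV/dt <= -alpha * V at state x.

    Uses V(x) = 0.5 * ||x - goal||^2, whose gradient is (x - goal).
    """
    grad = [xi - gi for xi, gi in zip(x, goal)]
    v = 0.5 * sum(g * g for g in grad)
    grad_sq = 2.0 * v
    if grad_sq == 0.0:
        # At the goal: return zero velocity so the goal is an equilibrium.
        return [0.0] * len(x)
    # Amount by which the decrease condition is violated (<= 0 means satisfied).
    violation = sum(g * f for g, f in zip(grad, f_x)) + alpha * v
    if violation <= 0.0:
        return list(f_x)
    # Remove just enough of the component along the gradient.
    scale = violation / grad_sq
    return [f - scale * g for f, g in zip(f_x, grad)]


if __name__ == "__main__":
    # A velocity pointing away from the goal gets corrected; the tangential part is kept.
    print(project_to_stable((1.0, 0.0), (1.0, 0.5), (0.0, 0.0)))
    # A velocity that already decreases V fast enough is returned unchanged.
    print(project_to_stable((1.0, 0.0), (-2.0, 0.0), (0.0, 0.0)))
```

After projection the inner product of the gradient with the velocity equals $-\alpha V$ whenever a correction was needed, so $V$ decays along trajectories while the component of the learned velocity orthogonal to the gradient is left untouched.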

2606.03798 2026-06-03 cs.RO 版本更新

Optimal Design and Analytical Modeling of a Soft Fin-Ray Effect Gripper Finger Using the Finite Rigid Elements Method

基于有限刚性单元法的软体鳍射线效应夹爪手指的优化设计与解析建模

Sara Adeli, Hassan Sayyaadi

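AI summary: Models and optimizes a soft Fin Ray Effect (FRE) gripper finger with the Finite Rigid Elements Method (FREM) to enable precise force control for gently grasping delicate agricultural products.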

AI中文摘要

受鳍射线启发的软体夹爪为轻柔处理易损、不规则物体(尤其在农业中)提供了有前景的解决方案。本研究旨在设计、制造和建模一种鳍射线效应(FRE)软体夹爪手指,以实现未来应用中的精确力控制。该设计旨在轻柔抓取需要适应性和精确力施加的易损农产品,如番茄。为解决软体机器人固有的挑战,包括非线性行为、无限自由度和可变材料属性,采用有限刚性单元法(FREM)进行建模。该方法在保持解析精度的同时,为后续阶段力控制器的开发提供了可靠基础。使用ANSYS创建了详细的有限元模型(FEM),并通过仿真和实验测试验证了解析结果。基于四个关键标准优化了夹爪手指:尖端位移、总变形、应力分布和接触力。最优手指配置包括长度30毫米、肋间距10毫米、七根肋条角度-15度、肋条厚度1毫米。使用FREM的理论建模预测手指变形误差为3%,而ANSYS数值模型误差为2%。

英文摘要

Fin Ray-inspired soft grippers offer a promising solution for gently handling delicate, irregular objects, especially in agriculture. The objective of this research is to design, fabricate, and model a Fin Ray Effect (FRE) soft gripper finger to enable precise force control in future applications. This design aims to gently grasp delicate agricultural products, such as tomatoes, that require both adaptability and accurate force application. To address the inherent challenges of soft robotics, including nonlinear behavior, infinite degrees of freedom, and variable material properties, the Finite Rigid Elements Method (FREM) was employed for modeling. This method preserves analytical accuracy while providing a reliable foundation for the development of a force controller in later stages. A detailed Finite Element Model (FEM) was created using ANSYS, and the analytical results were validated through simulation and experimental testing. The gripper's fingers were optimized based on four key criteria: tip displacement, total deflection, stress distribution, and contact force. The optimal finger configuration includes a length of 30 mm, rib spacing of 10 mm, seven ribs angled at -15 deg, and a rib thickness of 1 mm. Theoretical modeling using the FREM predicted finger deformation with a 3% error, while the ANSYS numerical model achieved 2% error.

2606.03756 2026-06-03 cs.RO cs.LG 版本更新

Neural Navigation Functions for Zero-Shot Generalizable Motion Planning

神经导航函数用于零样本泛化运动规划

Benjamin D. Shaffer, Pei-An Hsieh, Brooks Kinch, Nathaniel Trask, M. Ani Hsieh

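Affiliations: University of Pennsylvania, United States; Department of Mechanical Engineering and Applied Mechanics; Department of Electrical and Systems Engineering

AI summary: Proposes Neural Navigation Functions (Neural-NF), which embed data-driven adaptation in a structured elliptic planner to achieve zero-shot transfer across unseen environment geometries while guaranteeing collision avoidance, monotonic descent, and a global minimum at the goal.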

Comments 17 pages, 10 figures

AI中文摘要

我们引入了神经导航函数(Neural-NF),一种学习到的反应式导航函数,能够跨未见环境几何进行零样本迁移。Neural-NF将数据驱动适应置于结构化椭圆规划器中,其中导航目标被学习,而规划器结构通过构造得以保留。具体来说,内在的拉普拉斯派生特征被映射到局部PDE系数,求解得到的边值问题在每个目标域上产生全局一致的值函数。对于每个可接受的学习模型,所得策略无碰撞,提供单调下降,并通过构造在目标处具有全局最小值。这为任何参数设置提供了线性可解的最优控制解释。实验上,Neural-NF在多样几何上实现了强大的零样本迁移,并比直接预测值函数的学习规划器性能提升高达5倍。

英文摘要

We introduce Neural Navigation Functions (Neural-NF), a learned reactive navigation function capable of zero-shot transfer across unseen environment geometries. Neural-NF places data-driven adaptation within a structured elliptic planner, where the navigation objective is learned while the planner structure is preserved by construction. Specifically, intrinsic Laplacian-derived features are mapped to local PDE coefficients, and solving the resulting boundary value problem produces a globally consistent value function on each target domain. For every admissible learned model, the resulting policy is collision-free by construction, with monotonic descent and a global minimum at the goal. This admits a linearly-solvable optimal-control interpretation for any parameter setting. Empirically, Neural-NF achieves strong zero-shot transfer across diverse geometries and outperforms learned planners that directly predict the value function by up to $5\times$.
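The structural guarantee of an elliptic planner can be illustrated with the simplest member of that family: a harmonic potential on a grid. The sketch below solves the discrete Laplace equation with fixed, constant coefficients and then follows the potential downhill. It is an assumption-laden toy and not Neural-NF itself, which learns the local PDE coefficients; the grid, obstacle layout, boundary values, and function names are all hypothetical.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def solve_potential(rows, cols, obstacles, goal, sweeps=500):
    """Gauss-Seidel solve of the discrete Laplace equation on a grid.

    Boundary values: 0 at the goal, 1 on obstacles and outside the grid.
    """
    u = [[1.0] * cols for _ in range(rows)]
    u[goal[0]][goal[1]] = 0.0
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal or (r, c) in obstacles:
                    continue  # boundary cells keep their fixed values
                total = 0.0
                for dr, dc in MOVES:
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    total += u[nr][nc] if inside else 1.0
                u[r][c] = total / 4.0  # each free cell is the mean of its neighbors
    return u


def descend(u, start, goal, max_steps=100):
    """Greedy descent: repeatedly step to the lowest-valued neighboring cell."""
    rows, cols = len(u), len(u[0])
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        r, c = path[-1]
        neighbors = [(r + dr, c + dc) for dr, dc in MOVES
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
        best = min(neighbors, key=lambda p: u[p[0]][p[1]])
        if u[best[0]][best[1]] >= u[r][c]:
            break  # no strictly lower neighbor
        path.append(best)
    return path


if __name__ == "__main__":
    wall = {(2, 1), (2, 2), (2, 3)}  # a wall between start and goal
    u = solve_potential(5, 5, wall, goal=(4, 2))
    print(descend(u, start=(0, 2), goal=(4, 2)))
```

Because every free cell equals the average of its neighbors, the converged potential has no local minimum away from the goal, and obstacle cells hold the maximum value. Greedy descent therefore goes around the wall and ends at the goal.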

2606.03694 2026-06-03 cs.RO cs.CV cs.HC 版本更新

Face versus Body Tracking for Human-Robot Interaction: An Egocentric Dataset

面向人机交互的面部与身体跟踪:一个自我中心数据集

Jessica Wenninger, Gabriel Skantze

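Affiliations: Furhat Robotics; University of Naples Federico II; Division of Speech, Music and Hearing, KTH Royal Institute of Technology

AI summary: To address frequent identity switches from a social robot's egocentric viewpoint, presents a custom-annotated egocentric dataset and systematically evaluates detection errors, face versus body tracking, and the effects of extended spatial memory and appearance re-identification; the optimized pipeline reduces identity switches by 49%.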

Comments 8 pages, 5 figures, 3 tables. Accepted to the 35th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026)

AI中文摘要

为了实现有意义的人机交互(HRI),机器人必须通过持续跟踪用户来不断评估参与度。然而,最先进的计算机视觉模型主要针对监控或自动驾驶进行了优化。社交机器人面临独特的自我中心挑战,例如人类跳动、相互遮挡或离开画面。频繁的身份切换(IDSW)会导致机器人在对话中失去立足点。为了解决这个问题,我们引入了一个新颖的、自定义标注的自我中心数据集,通过Furhat机器人收集,以捕捉复杂的社会动态。我们进行了系统评估,将检测错误与跟踪逻辑分离,比较面部与身体跟踪,并评估扩展空间记忆和外观重识别(ReID)的影响。结果表明,增加空间记忆可以缓解长时间遮挡,但在复杂动态事件上失败。集成ReID解决了复杂的切换,但表现出相反的效果:它显著提高了身体跟踪的稳定性,但由于轮廓角度敏感性导致面部IDSW激增。最终,我们的优化管道将IDSW减少了49%,减轻了交互中断。由于标准基准缺乏密集的近距离遮挡,这项工作强调了原生捕捉社会动态对于真正验证HRI感知模型的迫切需求。

英文摘要

To enable meaningful human-robot interaction (HRI), a robot must continuously assess engagement by consistently tracking users over time. State-of-the-art computer vision models, however, are heavily optimized for surveillance or autonomous driving. A social robot faces distinct egocentric challenges, such as humans bouncing, obstructing each other, or leaving the frame. Frequent identity switches (IDSW) cause the robot to lose its footing mid-conversation. To address this, we introduce a novel, custom-annotated egocentric dataset collected via the Furhat robot to capture complex social dynamics. We present a systematic evaluation isolating detection errors from tracking logic, comparing face versus body tracking, and assessing the impact of extended spatial memory and appearance re-identification (ReID). Results indicate that increasing spatial memory mitigates prolonged occlusions but fails on complex dynamic events. Integrating ReID resolves complex switches but exhibits opposing effects: it substantially improves body tracking stability, yet causes facial IDSW to spike due to profile angle sensitivity. Ultimately, our optimized pipeline reduces IDSW by 49%, mitigating interaction breakdowns. Because standard benchmarks lack dense, close-quarter occlusions, this work highlights the critical need for natively captured social dynamics to truly validate HRI perception models.
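The identity-switch (IDSW) count that the evaluation reports can be computed along the following lines. This sketch follows the common multi-object-tracking convention that a switch occurs when a ground-truth person is matched to a different tracker ID than at their previous match. The input format and function name are assumptions, and the paper's exact matching protocol may differ.

```python
def count_identity_switches(frames):
    """Count identity switches over a sequence of frames.

    frames: list of dicts, one per frame, mapping a ground-truth person ID to
    the tracker ID matched to that person in the frame. A person who is
    occluded or out of view is simply absent from that frame's dict.
    """
    last_track = {}
    switches = 0
    for matches in frames:
        for person, track in matches.items():
            if person in last_track and last_track[person] != track:
                switches += 1
            last_track[person] = track
    return switches


if __name__ == "__main__":
    frames = [
        {"anna": 1, "ben": 2},
        {"anna": 1},             # ben is occluded in this frame
        {"anna": 3, "ben": 2},   # anna re-acquired under a new track ID
        {"anna": 3, "ben": 1},   # ben picks up anna's old track ID
    ]
    print(count_identity_switches(frames))
```

In the toy sequence each person changes tracker ID once, giving two switches. Note that the occlusion by itself does not count, only the change of ID when the person is matched again.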

2606.03682 2026-06-03 cs.RO 版本更新

GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language Navigation

GN0:迈向视觉语言导航中生成、评估与策略学习的统一范式

Xinhai Li, Xiaotao Zhang, Yuehao Huang, Jiankun Dong, Tianhang Wang, Sunyao Zhou, Yunzi Wu, Chengnuo Sun, Yunfei Ge, Qizhen Weng, Chi Zhang, Chenjia Bai, Xuelong Li

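AI summary: Proposes the unified GN0 framework, combining the automatically generated large-scale navigation dataset GN-Matrix, a high-fidelity 3DGS-based simulation platform, the BEV benchmark GN-Bench, and the RL-driven navigation foundation model BAE to surpass existing methods on VLN tasks.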

AI中文摘要

具身导航将智能体与物理世界连接起来,是通用机器人智能的基础。导航数据的有限可用性和质量限制了视觉语言导航(VLN)系统的泛化和长时程能力。为解决这一问题,我们整理了多样化的3D场景,并开发了大规模导航数据的自动化流水线,生成了GN-Matrix数据集。基于3D高斯泼溅(3DGS)引擎,我们引入了一个支持交互式漫游和碰撞感知导航的高保真仿真平台。我们进一步提出了GN-Bench,这是首个基于BEV的基准测试,包含用于人机交互评估的动态3DGS化身。为了利用仿真器,我们开发了一个RL驱动的导航基础模型——Break and Establish(BAE)。在监督学习之后,DAgger将模型暴露于滚动生成的状态,打破了狭窄的专家中心分布,并实现了下游RL探索。这一统一的VLN范式整合了基于地图和无地图的任务,包括指令跟随、人类跟随和目标导航。GN-BAE将高保真3DGS渲染的鸟瞰图表示形式化为紧凑记忆,解锁了VLM中的潜在空间推理。在GN-Bench和VLN-CE上的广泛评估表明,GN0优于最先进的VLN方法。总体而言,GN-Matrix提供了一个涵盖数据、仿真和学习的统一框架,推动了研究和工业应用中的具身导航。

英文摘要

Embodied navigation connects intelligent agents with the physical world and is fundamental for general robotic intelligence. Limited availability and quality of navigation data have constrained Vision-and-Language Navigation (VLN) systems' generalization and long-horizon capabilities. To address this, we curate diverse 3D scenes and develop an automated pipeline for large-scale navigation data, resulting in the GN-Matrix dataset. Building on a 3D Gaussian Splatting (3DGS) engine, we introduce a high-fidelity simulation platform supporting interactive roaming and collision-aware navigation. We further propose GN-Bench, the first BEV-based benchmark incorporating dynamic 3DGS avatars for human-robot interaction evaluation. To leverage the simulator, we develop an RL-driven navigation foundation model, Break and Establish (BAE). After supervised learning, DAgger exposes the model to rollout-induced states, breaking narrow expert-centric distributions and enabling downstream RL exploration. This unified VLN paradigm integrates map-based and map-free tasks, including instruction following, human following, and goal navigation. GN-BAE formalizes high-fidelity 3DGS-rendered Bird's Eye View representations as compact memory, unlocking latent spatial reasoning in VLMs. Extensive evaluations on GN-Bench and VLN-CE show that GN0 outperforms state-of-the-art VLN methods. Overall, GN-Matrix offers a unified framework spanning data, simulation, and learning, advancing embodied navigation in research and industrial applications.

2606.03593 2026-06-03 cs.SE cs.RO 版本更新

Making Embodied AI Reliable: A Community Agenda from Testing to Formal Verification

使具身AI可靠:从测试到形式验证的社区议程

Xi Zheng, Dulanga Weerakoon, Yintong Huo, Teresa Yeo, Guy Van Den Broeck, Vijay Ganesh, Daniel Neider, Biplav Srivastava, Ivan Ruchkin, Archan Misra, Corina Pasareanu

发表机构 * University of Waterloo(滑铁卢大学) Princeton University(普林斯顿大学)

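AI summary: Drawing on discussions at the AAAI'26 Bridge Program, this article proposes a neuro-symbolic approach that integrates testing, formal verification, and runtime assurance to address the lifecycle reliability of embodied AI in open-world settings.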

详情
AI中文摘要

具身AI系统越来越多地部署在开放世界环境中,但确保其可靠性仍然是一个根本性挑战。借鉴AAAI'26 Bridge Program关于“通过测试和形式验证使具身AI可靠”的讨论,本文认为具身AI的可靠性本质上是一个生命周期保证问题,源于不确定性、人类交互以及紧密耦合系统组件之间的涌现行为。我们确定了实现可靠具身AI的三个互补方向:(1)基于可信场景的测试,由经过验证的规范和有意义覆盖度量支持;(2)通过系统行为和环境的符号化结构化表示实现的组合验证;(3)能够在部署期间适应不确定性和分布偏移的运行时保证机制。我们不将这些方法视为独立,而是倡导集成保证工作流,通过共享的神经符号表示和系统生命周期中的持续反馈,连接测试、验证和运行时适应。这种集成为构建能够在复杂现实世界中安全可靠运行的值得信赖的具身AI系统提供了基础。

英文摘要

Embodied AI systems are increasingly deployed in open-world environments, yet ensuring their reliability remains a fundamental challenge. Drawing on discussions from the AAAI'26 Bridge Program on "Making Embodied AI Reliable with Testing and Formal Verification", this article argues that reliability in embodied AI is inherently a lifecycle assurance problem arising from uncertainty, human interaction, and emergent behaviors across tightly coupled system components. We identify three complementary directions toward reliable embodied AI: (1) trustworthy scenario-based testing supported by validated specifications and meaningful coverage metrics, (2) compositional verification enabled by structured symbolic representations of system behavior and environmental context, and (3) runtime assurance mechanisms capable of adapting to uncertainty and distribution shifts during deployment. Rather than treating these approaches independently, we advocate integrated assurance workflows that connect testing, verification, and runtime adaptation through shared neuro-symbolic representations and continuous feedback across the system lifecycle. Such integration provides a foundation for building trustworthy embodied AI systems that can operate safely and reliably in complex real-world environments.

2606.03590 2026-06-03 cs.RO 版本更新

CANMOT: Class-Aware Noise Modeling for Multi-Object Tracking in Autonomous Driving

CANMOT: 自动驾驶中多目标跟踪的类别感知噪声建模

Timo Osterburg, Stefan Schütte, Torsten Bertram

发表机构 * Institute of Control Theory and Systems Engineering, TU Dortmund University(控制理论与系统工程研究所,多特蒙德大学)

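AI summary: For multi-object tracking in autonomous driving, proposes CANMOT, a class-aware and object-aligned noise modeling framework that introduces class-specific process and measurement noise covariance matrices, expressed in the object coordinate frame to preserve longitudinal-lateral anisotropy, improving tracking performance and substantially reducing identity switches.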

Comments submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026)

详情
AI中文摘要

基于卡尔曼滤波的多目标跟踪(MOT)因其强大的性能、计算效率和可解释性,仍然是自动驾驶的强基线。在大多数实际系统中,过程噪声和测量噪声协方差是全局定义并在对象类别间共享的,假设异质交通参与者具有相同的不确定性特征。本文重新审视了这一假设,并提出了CANMOT,一种用于基于KF的3D MOT的类别感知和目标对齐的噪声建模框架。引入了类别特定的对角过程与测量协方差矩阵,并可选地在对象坐标系中表达以保持纵向-横向各向异性。在nuScenes基准上的系统实验表明,与最先进方法相比,类别感知和目标对齐的噪声建模提高了跟踪性能,并显著减少了身份切换。此外,使用平均归一化估计误差平方(ANEES)和基于$\chi^2$的违例测试分析了估计不确定性的一致性。结果揭示了标准基于KF的MOT基线存在严重的过度自信。虽然所提出的公式在不修改底层滤波框架的情况下改善了校准,但仍然表现出显著的不一致性,凸显了在该领域进一步研究的必要性。代码可在该https URL获取。

英文摘要

Kalman filter (KF)-based multi-object tracking (MOT) remains a strong baseline for autonomous driving due to its strong performance, computational efficiency, and interpretability. In most practical systems, the process noise and measurement noise covariances are defined globally and shared across object classes, presuming identical uncertainty characteristics across heterogeneous traffic participants. This work revisits this assumption and proposes CANMOT, a class-aware and object-aligned noise modeling framework for KF-based 3D MOT. Class-specific diagonal process and measurement covariance matrices are introduced and optionally expressed in the object coordinate frame to preserve longitudinal-lateral anisotropy. Systematic experiments on the nuScenes benchmark show that class-aware and object-aligned noise modeling improves tracking performance and substantially reduces identity switches compared to state-of-the-art (SotA) methods. In addition, the consistency of the estimated uncertainty is analyzed using the Average Normalized Estimation Error Squared (ANEES) and $\chi^2$-based violation tests. The results reveal severe overconfidence in standard KF-based MOT baselines. While the proposed formulation improves calibration without modifying the underlying filtering framework, it still exhibits substantial inconsistency, highlighting the need for further research in this area. Code is available at https://github.com/rst-tu-dortmund/learned-3d-nms.
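The two ideas in this abstract, object-aligned class-specific covariances and an ANEES consistency check, can be sketched in a few lines. This is a minimal 2D illustration under stated assumptions, not the authors' implementation: the class table `CLASS_SIGMA`, its values, and the function names are hypothetical, and ANEES is taken here as the mean NEES divided by the state dimension, so a consistent filter scores about 1 and an overconfident one scores above 1.

```python
import math

# Hypothetical per-class standard deviations (longitudinal, lateral) in metres.
CLASS_SIGMA = {"car": (1.0, 0.3), "pedestrian": (0.4, 0.4)}

def object_aligned_cov(cls, yaw):
    """Rotate diag(sigma_long^2, sigma_lat^2) from the object frame into the world frame."""
    s_long, s_lat = CLASS_SIGMA[cls]
    a, b = s_long ** 2, s_lat ** 2
    c, s = math.cos(yaw), math.sin(yaw)
    # Closed form of R * diag(a, b) * R^T with R = [[c, -s], [s, c]].
    return [[c * c * a + s * s * b, c * s * (a - b)],
            [c * s * (a - b), s * s * a + c * c * b]]

def nees(err, cov):
    """Normalized estimation error squared e^T P^-1 e for a 2D error and 2x2 covariance."""
    (p00, p01), (p10, p11) = cov
    det = p00 * p11 - p01 * p10
    ex, ey = err
    return (p11 * ex * ex - (p01 + p10) * ex * ey + p00 * ey * ey) / det

def anees(errors, covs, dim=2):
    """Mean NEES divided by the state dimension (assumed normalization)."""
    return sum(nees(e, p) for e, p in zip(errors, covs)) / (len(errors) * dim)
```

With a yaw of zero the covariance stays diagonal, and rotating by 90 degrees swaps the longitudinal and lateral variances, which is the anisotropy a single global covariance cannot express.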

2606.03581 2026-06-03 cs.CV cs.RO 版本更新

UnsOcc: 3D Semantic Occupancy Prediction in Unstructured Scene via Rendering Fusion

UnsOcc:非结构化场景下基于渲染融合的3D语义占用预测

Ye Wu, Ruiqi Song, Baiyong Ding, Nanxin Zeng, Junjie Cheng, Yunfeng Ai

发表机构 * School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院) Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) Waytous Inc.(Waytous公司)

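AI summary: Proposes UnsOcc, a multi-modal framework that uses a rendering-based fusion module and detail-aware auxiliary supervision based on Gaussian Splatting to address difficult cross-modal fusion and long-tail distributions in unstructured scenes, outperforming existing methods on an open-pit mine dataset and nuScenes.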

Comments 8 pages

详情
AI中文摘要

非结构化场景给自动驾驶带来了独特挑战,因为不规则障碍物和稀疏的场景布局削弱了3D目标检测等传统感知方法的有效性。3D语义占用预测因其能够通过为3D空间中的单个体素分配语义标签来提供密集的空间表示而成为研究热点。然而,将3D语义占用预测直接应用于非结构化场景仍然具有挑战性,因为场景稀疏性阻碍了有效的跨模态融合,并且这些场景中更严重的长期尾部分布进一步降低了预测性能。为了验证我们方法的有效性,我们构建了一个从露天矿收集的非结构化场景专用数据集。在此基础上,我们提出了UnsOcc,一种多模态3D语义占用预测框架,提高了在非结构化环境中的鲁棒性。其核心是,我们引入了一个基于渲染的融合模块RenderFusion,通过双向渲染监督增强跨模态特征对齐。此外,我们提出了GSRefinement,一种基于高斯溅射的细节感知辅助监督方法,将稀疏的3D占用预测投影到密集的2D语义分割图中,从而实现对长尾类别的有效监督。在露天矿数据集和nuScenes数据集上的大量实验表明,我们的方法显著优于现有的最先进方法。

英文摘要

Unstructured scenes present unique challenges for autonomous driving, as irregular obstacles and sparse scene layouts undermine the effectiveness of traditional perception methods such as 3D object detection. 3D semantic occupancy prediction has emerged as a prominent focus due to its ability to provide dense spatial representations by assigning semantic labels to individual voxels in 3D space. However, directly applying 3D semantic occupancy prediction to unstructured scenes remains challenging because scene sparsity hinders effective cross-modal fusion and the more severe long-tail distribution in these scenarios further degrades prediction performance. To validate the effectiveness of our approach, we construct a dedicated dataset of unstructured scenes collected from open-pit mines. Based on this, we propose UnsOcc, a multi-modal 3D semantic occupancy prediction framework that improves robustness in unstructured environments. At its core, we introduce a rendering-based fusion module, RenderFusion, which enhances cross-modal feature alignment through bidirectional rendering supervision. Furthermore, we propose GSRefinement, a detail-aware auxiliary supervision method based on Gaussian Splatting that projects sparse 3D occupancy predictions into dense 2D semantic segmentation maps, enabling effective supervision for long-tail categories. Extensive experiments on both the open-pit mine dataset and the nuScenes dataset demonstrate that our method significantly outperforms existing state-of-the-art approaches.

2606.03568 2026-06-03 cs.CV cs.AI cs.LG cs.RO 版本更新

Learned Non-Maximum Suppression for 3D Object Detection

用于3D目标检测的学习型非极大值抑制

Timo Osterburg, Stefan Schütte, Torsten Bertram

发表机构 * Institute of Control Theory and Systems Engineering, TU Dortmund University(控制理论与系统工程研究所,多特蒙德技术大学)

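AI summary: Proposes two learned filtering modules (D2D-Rescore and GossipNet3D) that replace heuristic NMS and exploit relations among detections to improve 3D detection performance, particularly for small objects and rare classes.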

Comments 6 pages, accepted at IEEE Intelligent Vehicles Symposium (IV) 2026

详情
AI中文摘要

后处理是基于激光雷达的3D目标检测中的关键阶段,必须过滤密集且重叠的提议以实现紧凑可靠的感知。本文引入了两个学习型过滤模块,通过利用检测之间的关系来替代启发式非极大值抑制(NMS)。D2D-Rescore采用基于Transformer的检测到检测(D2D)注意力,而GossipNet3D通过鸟瞰图中的局部消息传递将2D GossipNet概念适应到3D。一种与nuScenes评估协议对齐的度量感知匹配策略确保了训练和验证行为的一致性,从而提高了整体检测性能。与CircleNMS相比,两种方法都提高了平均精度(mAP)、nuScenes检测分数(NDS)和真阳性质量,特别是对于小物体和稀有类别,同时增加了最小的计算开销。这些结果表明,学习型的检测级过滤可以在不修改基础网络的情况下增强3D检测器的可靠性,为启发式抑制提供了一种原则性的替代方案。代码可在以下网址获取:https://github.com/rst-tu-dortmund/learned-3d-nms 。

英文摘要

Post-processing is a critical stage in LiDAR-based 3D object detection, where dense and overlapping proposals must be filtered for compact and reliable perception. This work introduces two learned filtering modules that replace heuristic non-maximum suppression (NMS) by leveraging relations among detections. D2D-Rescore employs transformer-based detection-to-detection (D2D) attention, while GossipNet3D adapts the 2D GossipNet concept to 3D through localized message passing in bird's-eye view. A metric-aware matching strategy aligned with the nuScenes evaluation protocol ensures consistent training and validation behavior, improving overall detection performance. Both approaches improve mean average precision (mAP), nuScenes detection score (NDS), and true positive quality compared to CircleNMS, particularly for small and infrequent classes, while adding minimal computational overhead. These results demonstrate that learned, detection-level filtering can enhance 3D detector reliability without modifying the base network, offering a principled alternative to heuristic suppression. Code is available at https://github.com/rst-tu-dortmund/learned-3d-nms .
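For context, the heuristic baseline that the learned modules replace can be sketched as a greedy centre-distance suppression in bird's-eye view. This is a simplified illustration in the spirit of CircleNMS, not the reference implementation; the function name, the `(x, y, score)` tuple layout, and the radius value are assumptions made for the example.

```python
def circle_nms(dets, radius):
    """Greedy BEV suppression.

    dets: list of (x, y, score) tuples.
    Returns the indices of kept detections, highest score first.
    """
    order = sorted(range(len(dets)), key=lambda i: dets[i][2], reverse=True)
    keep = []
    for i in order:
        xi, yi, _ = dets[i]
        # Keep a detection only if its centre lies outside the radius
        # of every higher-scoring detection that was already kept.
        if all((xi - dets[j][0]) ** 2 + (yi - dets[j][1]) ** 2 > radius ** 2
               for j in keep):
            keep.append(i)
    return keep
```

The rule looks only at pairwise distance and score, which is the limitation a learned, relation-aware filter is meant to address: a single fixed radius cannot suit both large vehicles and small, closely spaced objects.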

2606.03556 2026-06-03 cs.RO 版本更新

Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics

部分可观测的对抗性补丁攻击在机器人视觉-语言-动作模型上的应用

Xiaofei Wang, Mingliang Han, Tianyu Hao, Yi Yang, Yun-Bo Zhao, Keke Tang

发表机构 * Department of Automation, University of Science and Technology of China(自动化系,中国科学技术大学) SmartMore Corporation(SmartMore公司) Cyberspace Institute of Advanced Technology, Guangzhou University(广州大学网络空间先进技术研究院) 8th Medical Center of Chinese PLA General Hospital(中国人民解放军总医院第八医学中心) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center(合肥综合性国家科学中心人工智能研究院)

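AI summary: Targeting robotic VLA models, proposes a two-phase attack framework under a partially observable threat model that uses attention maps to localize critical regions and optimizes the patch to disrupt semantic grounding and increase action-trajectory curvature, causing long-horizon task failures.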

Comments Accepted by IEEE Robotics and Automation Letters, 2026

详情
AI中文摘要

视觉-语言-动作(VLA)模型在机器人领域受到关注,但其对对抗性攻击的鲁棒性仍鲜有探索。现有工作表明对抗性补丁可以误导基于VLA的机器人,但假设完全访问整个执行轨迹,这在实践中是不现实的。我们通过制定部分可观测威胁模型来解决这一限制,其中攻击者只能利用轨迹的短前缀来生成固定补丁,应用于所有后续帧。在此设置下,我们提出了一个两阶段框架。首先,我们使用模型的注意力图定位补丁,以识别与完整指令对应的视觉关键区域。然后,我们优化补丁以破坏目标对象的语义接地并增加动作轨迹的曲率,从而在感知和控制中复合故障。在模拟和真实机器人环境中的大量实验表明,我们的方法在部分可观测性下维持对抗效果,诱导长期中断并显著降低任务成功率。

英文摘要

Vision-language-action (VLA) models are gaining attention in robotics, yet their robustness to adversarial attacks remains largely unexplored. Existing work shows that adversarial patches can mislead VLA-based robots but assumes full access to the entire execution trajectory, an unrealistic requirement in practice. We address this limitation by formulating a partially observable threat model, where the adversary can exploit only a short prefix of the trajectory to generate a fixed patch applied to all subsequent frames. Under this setting, we propose a two-phase framework. First, we localize the patch using the model's attention maps to identify visually critical regions that correspond to the full instruction. Then, we optimize the patch to disrupt the semantic grounding of target objects and increase the curvature of action trajectories, thereby compounding failures in both perception and control. Extensive experiments in simulation and real-world robotic environments show that our method sustains adversarial effects under partial observability, inducing long-horizon disruptions and significantly reducing task success rates.

2606.03551 2026-06-03 cs.RO 版本更新

NVIDIA Isaac Sim: Enabling Scalable, GPU-Accelerated Simulation for Robotics

NVIDIA Isaac Sim:实现可扩展的GPU加速机器人仿真

Sicong Gao, Maurice Pagnucco, Tomasz Bednarz, Yang Song

发表机构 * School of Computer Science and Engineering, The University of New South Wales(新南威尔士大学计算机科学与工程学院) NVIDIA USA(NVIDIA美国公司)

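AI summary: This survey systematically reviews the architecture, usage patterns, and limitations of NVIDIA Isaac Sim, focusing on the advantages of its GPU acceleration for large-scale parallel training, synthetic data generation, and physics-accurate modeling, and discusses future directions.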

详情
AI中文摘要

仿真已成为机器人研究的核心基础设施。与以往的仿真器不同,NVIDIA Isaac Sim利用GPU加速实现大规模并行训练和物理精确建模。其合成数据生成流水线缓解了高质量训练数据的稀缺性,支持数据驱动的机器人学习和大规模以仿真为中心的实验。然而,现有综述通常将其视为众多仿真器之一,缺乏对其架构特性、使用模式和局限性的系统分析。本文从系统和应用角度综述Isaac Sim,概述其架构并与广泛使用的仿真器进行比较。我们分析了五个主要领域的代表性研究,总结了常见的使用模式,特别是在数据生成和高保真仿真方面。我们还概述了关键的未来方向和挑战,包括物理开放世界学习、以仿真为中心的培训以及实际可用性约束。

英文摘要

Simulation has become a core infrastructure for robotics research. Unlike previous simulators, NVIDIA Isaac Sim leverages GPU acceleration to enable large-scale parallel training and physics-accurate modeling. Its synthetic data generation pipeline alleviates the scarcity of high-quality training data, supporting data-driven robot learning and large-scale simulation-centric experimentation. However, existing surveys often treat it as one simulator among many, without a systematic analysis of its architectural characteristics, usage patterns, and limitations. This survey reviews Isaac Sim from system and application perspectives, outlining its architecture and comparing it with widely used simulators. We analyze representative studies across five major domains and summarize common usage patterns, particularly in data generation and high-fidelity simulation. We also outline key future directions and challenges, including physics open-world learning, simulation-centric training and practical usability constraints.

2606.03545 2026-06-03 cs.RO 版本更新

Static and Dynamic Representations for Tactile Contact-Angle Estimation with Event-Based Sensors

基于事件传感器的触觉接触角估计的静态与动态表示

Yanhui Lu, Efi Psomopoulou, Benjamin Ward-Cherrier

发表机构 * School of Engineering Mathematics and Technology, University of Bristol(布里斯托大学工程数学与科技学院)

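AI summary: Using event streams from an event-based tactile sensor (NeuroTac), this paper compares three event-derived spatial contour representations (dynamic, static, and their combination) for contact-angle estimation and demonstrates their potential for high-frequency, low-latency tactile angle estimation in robotic manipulation.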

Comments 8 pages, 8 figures. Submitted to IEEE Robotics and Automation Letters (RAL), under review

详情
AI中文摘要

基于事件的触觉传感为接触密集的机器人交互提供了低延迟信号采集。本文研究了使用来自事件触觉传感器(NeuroTac)的事件流进行接触角估计,并比较了三种事件衍生的空间轮廓表示:捕获近期事件活动的动态表示、恢复更持久接触状态的静态表示以及它们的组合表示。在评估的运动场景中,所有表示管道在所有测试采样间隔下的P99处理延迟均低于10毫秒,展示了它们在机器人操作中用于高频基于事件的触觉角度估计的潜力。在特定场景训练下,静态表示始终比动态和组合表示表现略好,在连续传感器滚动期间产生平均总体MAE为0.160°,在随机插入的运动中断期间停止阶段平均MAE为0.251°。它还在速度和压痕深度变化方面表现出比其他两种表示更小的性能波动。

英文摘要

Event-based tactile sensing offers low-latency signal acquisition for contact-rich robotic interaction. This paper investigates contact-angle estimation using event streams from an event-based tactile sensor (NeuroTac) and compares three event-derived spatial contour representations: a dynamic representation capturing recent event activity, a static representation recovering a more persistent contact state, and their combined representation. Across the evaluated motion scenarios, all representation pipelines exhibited P99 processing latency below 10 ms at all tested sampling intervals, demonstrating their potential for high-frequency event-based tactile angle estimation in robotic manipulation. The static representation consistently achieved marginally better performance than the dynamic and combined representations under scenario-specific training, yielding a mean overall MAE of 0.160° during continuous sensor rolling and a stop-phase mean MAE of 0.251° during randomly inserted motion interruptions. It also exhibited smaller performance fluctuations across speed and indentation depth variations than the other two representations.

2606.03536 2026-06-03 cs.RO 版本更新

Bionic Human-Motion Style Transfer for Physically Executable Whole-Body Control of Humanoid Robots

仿人运动风格迁移用于人形机器人物理可执行全身控制

Tianchen Huang, Mingkuan Zhao, Yang Gao, Feiyang Yuan, Junchi Gu, Xiaohu Zhang, Dongdong Zhao, Shi Yan, Yu Wang, Wei Gao, Shiwu Zhang

发表机构 * Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China(人形机器人研究院,精密机械与精密仪器系,中国科学技术大学) School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University(计算机科学与技术学院,电子与信息工程学院,西安交通大学) School of Information Science and Engineering, Lanzhou University(信息科学与工程学院,兰州大学)

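AI summary: Proposes a bionic generation-to-control framework that uses a physics-aware multi-condition latent diffusion model and a preview-based whole-body tracking policy to transfer short human style exemplars to different motion contents, yielding executable and expressive whole-body motion for humanoid robots.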

Comments Project page: https://huangtc233.github.io/bionic-style-transfer/

详情
AI中文摘要

表达性全身运动对于在人类环境中运行的人形机器人至关重要,机器人需要稳定移动的同时呈现可读且可调整的身体行为。然而,大多数表达性运动仍来自固定演示或手动设计的脚本,难以在不同运动内容间复用演示风格。受人体运动风格通过步态节奏、姿态、手臂摆动和身体摇摆传递情感和意图线索的启发,本文提出了一种仿生生成到控制框架,用于人形机器人上的示例驱动风格迁移。给定一个短时人体风格示例和目标内容运动,所提框架生成一个风格化全身参考,保留预期运动内容的同时迁移演示风格。开发了一个物理感知多条件潜扩散模型来融合风格、内容和轨迹条件,并使用无分类器引导在不重新训练的情况下调整风格强度。为提高硬件可执行性,在训练期间对解码后的运动施加接触一致性和时间平滑正则化。生成的参考随后转换为G1兼容的机器人参考,并由基于预览的全身跟踪策略执行,该策略采用聚类和蒸馏策略训练。仿真和Unitree G1实验表明,所提方法可以将短时人体风格示例迁移到多样化的机器人运动内容,与面向动画的风格迁移基线相比减少接触和抖动伪影,并在125次真实机器人试验中达到96.0%的成功率。结果证明了使用短时人体运动示例作为可复用的仿生源实现物理可执行表达性人形运动的可行性。

英文摘要

Expressive whole-body motion is important for humanoid robots operating in human environments, where robots are expected to move stably while presenting readable and adjustable body behaviors. However, most expressive motions are still obtained from fixed demonstrations or manually designed scripts, making it difficult to reuse a demonstrated style across different motion contents. Inspired by the way human motion styles convey affective and intentional cues through gait rhythm, posture, arm swing and body sway, this paper proposes a bionic generation-to-control framework for exemplar-driven style transfer on humanoid robots. Given a short human style exemplar and a target content motion, the proposed framework generates a stylized whole-body reference that preserves the intended motion content while transferring the demonstrated style. A physics-aware multi-condition latent diffusion model is developed to fuse style, content and trajectory conditions, and classifier-free guidance is used to adjust the style intensity without retraining. To improve hardware executability, contact-consistency and temporal-smoothness regularization are imposed on decoded motions during training. The generated references are then converted into G1-compatible robot references and executed by a preview-based whole-body tracking policy trained with a cluster-and-distill strategy. Simulation and Unitree G1 experiments show that the proposed method can transfer short human style exemplars to diverse robot motion contents, reduce contact and jitter artifacts compared with animation-oriented style-transfer baselines, and achieve a 96.0% success rate over 125 reported real-robot trials. The results demonstrate the feasibility of using short human motion exemplars as reusable bionic sources for physically executable expressive humanoid motion.
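The abstract's use of classifier-free guidance to adjust style intensity without retraining follows a standard pattern: the denoiser is evaluated with and without the style condition, and the two predictions are blended with a scalar weight. The sketch below shows one common convention for that blend. The function name and the plain-list representation are assumptions for illustration, and the paper may use a different parameterization of the guidance scale.

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Blend unconditional and conditional noise predictions.

    w = 0 ignores the style condition, w = 1 reproduces the conditional
    prediction, and w > 1 extrapolates toward a stronger style.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

Because `w` is applied only at sampling time, the same trained model can produce subtle or exaggerated styles simply by changing this one number.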

2606.03512 2026-06-03 cs.RO cs.AI 版本更新

SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts

SPADE: 草图引导的路径规划增强扩散专家

Charbel Abi Hana, Tatiana Ghantous, Mikael Khalil, Anthony Rizk

发表机构 * IDEALworks GmbH IMT Atlantique IDEALworks GmbH & Saint Joseph University of Beirut(IDEALworks GmbH及贝鲁特圣约瑟夫大学)

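AI summary: Proposes a framework with diffusion-based augmentation that, through an improved annotation tool and training strategy, improves the generalization and robustness of path planning while retaining real-time operation, substantially lowering pose error and FID.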

详情
AI中文摘要

路径规划对于自主移动机器人(AMR)至关重要。将人类偏好纳入规划的常规方法通常依赖于复杂的奖励工程或硬件密集型解决方案。最近的最先进框架利用模仿学习从专家演示中训练特定行为的路径规划模型。然而,这些方法面临两个关键限制:对未见环境的泛化能力有限,以及演示收集中的鲁棒性较低。为了解决这些挑战,本文介绍了一个增强框架,专注于两个主要贡献:一个基于ROS 2重构的标注工具,以及一种新颖的训练策略,将基于扩散的数据增强集成到基线行为克隆模型中。提供了专家演示数据集,并通过消融研究评估所提出解决方案的鲁棒性。增强方法优于最先进的方法,绝对姿态误差(APE)降低39.1%,Fréchet初始距离(FID)降低33.5%,同时可训练参数减少93.8%。此外,它达到了扩散级别的泛化能力,同时保留了最先进模型的实时、边缘特性。

英文摘要

Path planning is essential for Autonomous Mobile Robots (AMRs). Conventional methods for incorporating human preferences into planning typically rely on either complex reward engineering or hardware-intensive solutions. Recent state-of-the-art frameworks leverage imitation learning to train behavior-specific path planning models from expert demonstrations. However, these approaches face two key limitations: limited generalization to unseen environments and low robustness in demonstration collection. To address these challenges, this work introduces an enhanced framework that focuses on two main contributions: an overhauled annotation tool built on ROS 2, and a novel training strategy that integrates diffusion-based augmentation into baseline behavioral cloning models. A dataset of expert demonstrations is provided and evaluated through ablation studies to assess the robustness of the proposed solution. The enhanced approach outperforms state-of-the-art methods with 39.1% lower Absolute Pose Error (APE) and 33.5% lower Fréchet Inception Distance (FID) while having 93.8% fewer trainable parameters. Moreover, it attains diffusion-level generalization while preserving the real-time, on-edge properties of state-of-the-art models.

2606.03476 2026-06-03 cs.RO 版本更新

Human2Humanoid: Physics-Aware Cross-Morphology Motion Retargeting for Humanoid Robots

Human2Humanoid: 面向人形机器人的物理感知跨形态运动重定向

Tianchen Huang, Feiyang Yuan, Junchi Gu, Shurui Fang, Xiaohu Zhang, Yu Wang, Wei Gao, Shiwu Zhang

发表机构 * Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China(人形机器人研究院,精密机械与精密仪器系,中国科学技术大学)

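AI summary: Proposes Human2Humanoid, an unsupervised motion retargeting framework that handles unpaired data with CycleGAN and a skeleton-aware graph convolutional network, and uses a morphology-invariant end-effector consistency loss and physics-aware feasibility constraints to retarget human motion to humanoid robots with high fidelity.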

Comments Project page: https://huangtc233.github.io/human2humanoid_website/

详情
AI中文摘要

将人体运动重定向到人形机器人对于远程操作、模仿学习和人机交互至关重要。然而,由于人类与机器人在骨骼拓扑、肢体比例和自由度等方面的显著形态差异,以及配对运动数据的稀缺性,这仍然具有挑战性。本文提出了Human2Humanoid,一种无监督运动重定向框架,能够将人体运动高保真地迁移到人形机器人行为。为了在未配对数据下弥合领域差距,我们采用基于CycleGAN的架构,配备骨架感知图卷积网络来捕获拓扑相关的运动特征。为了解决跨域尺度不匹配问题,我们引入了一种形态不变的末端执行器一致性损失,该损失对齐归一化的末端执行器轨迹,以保留跨实体的运动语义。为了提高物理合理性并减少接触伪影,我们施加了显式的物理感知可行性约束,以鼓励再现源运动中的接触模式。实验结果表明,所提出的方法成功地将人体运动重定向到Unitree G1人形机器人,无需配对数据,并且在下游可控性和物理可行性方面均优于现有方法。

英文摘要

Retargeting human motion to humanoid robots is critical for teleoperation, imitation learning and human-robot interaction. However, it remains challenging because of substantial morphological discrepancies between humans and robots, including differences in skeletal topology, limb proportions and degrees of freedom, as well as the scarcity of paired motion data. This paper presents Human2Humanoid, an unsupervised motion retargeting framework that transfers human motions to humanoid robot behaviors with high fidelity. To bridge the domain gap under unpaired data, we adopt a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features. To address cross-domain scale mismatches, we introduce a morphology-invariant end-effector consistency loss that aligns normalized end-effector trajectories to preserve motion semantics across embodiments. To improve physical plausibility and reduce contact artifacts, we impose explicit physics-aware feasibility constraints to encourage reproduction of the contact patterns in the source motion. Experimental results show that the proposed method successfully retargets human motion to the Unitree G1 humanoid robot without paired data, and outperforms existing methods in both downstream controllability and physical feasibility.
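One way to read the morphology-invariant end-effector consistency loss is as a comparison of trajectories after removing body scale. The sketch below is an assumption-laden illustration, not the paper's formulation: it expresses each end-effector position relative to the first frame, divides by a limb length, and takes the mean squared difference. The function names and the choice of normalization are hypothetical.

```python
def normalize_traj(traj, limb_length):
    """Express positions relative to the first frame, in units of limb length."""
    x0, y0, z0 = traj[0]
    return [((x - x0) / limb_length, (y - y0) / limb_length, (z - z0) / limb_length)
            for x, y, z in traj]

def ee_consistency_loss(human_traj, human_limb, robot_traj, robot_limb):
    """Mean squared distance between scale-normalized end-effector trajectories."""
    h = normalize_traj(human_traj, human_limb)
    r = normalize_traj(robot_traj, robot_limb)
    total = sum(sum((a - b) ** 2 for a, b in zip(ph, pr)) for ph, pr in zip(h, r))
    return total / len(h)
```

Under this normalization, a human reaching one full arm length and a shorter-armed robot reaching one full arm length incur zero loss, even though the absolute displacements differ.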

2606.03421 2026-06-03 cs.RO 版本更新

Reliability-Guided Depth Fusion for Glare-Resilient Navigation Costmaps

基于可靠性引导的深度融合用于抗眩光导航代价地图

Shang-En Tsai

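AI summary: To handle depth-measurement noise caused by reflective floors, glass boundaries, and similar surfaces, proposes a costmap construction method based on explicit depth-reliability modeling: DRM-Net predicts per-pixel reliability, and a weighted-and-gated fusion mechanism suppresses erroneous occupancy updates. Experiments show fewer false obstacles while maintaining real-time performance.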

详情
AI中文摘要

反光地面、玻璃边界和光滑室内表面上的镜面眩光经常破坏主动立体RGB-D深度测量,产生空洞和尖峰,这些空洞和尖峰在占据栅格代价地图中累积为持久的幻影障碍物。本文提出一种基于显式深度可靠性建模的抗眩光代价地图构建方法。轻量级深度可靠性地图网络(DRM-Net)预测镜面干扰下的逐像素测量可信度,可靠性引导的加权门控融合(RGF)机制在损坏的测量值累积到地图之前调节占据更新。为了支持鲁棒的训练和评估,该方法使用姿态对齐的多视图参考深度构建来减少循环监督偏差,并通过融合变体消融、参数敏感性分析、跨条件测试、配对导航比较、可靠性地图指标和嵌入式运行时分析进行评估。在配备Intel RealSense D435和Jetson Orin Nano的真实移动机器人平台上的实验表明,所提方法减少了虚假障碍物插入,改善了自由空间保留,并在反光地板、玻璃墙和自然光眩光条件下保持实时吞吐量。这些结果支持将眩光视为测量可靠性问题,而不是密集深度补全问题,用于安全关键的室内导航。

英文摘要

Specular glare on reflective floors, glass boundaries, and glossy indoor surfaces frequently corrupts active-stereo RGB-D depth measurements, producing holes and spikes that accumulate as persistent phantom obstacles in occupancy-grid costmaps. This paper presents a glare-resilient costmap construction method based on explicit depth-reliability modeling. A lightweight Depth Reliability Map network (DRM-Net) predicts per-pixel measurement trustworthiness under specular interference, and a reliability-guided weighted-and-gated fusion (RGF) mechanism modulates occupancy updates before corrupted measurements are accumulated into the map. To support robust training and evaluation, the method uses pose-aligned multi-view reference-depth construction to reduce circular-supervision bias and is evaluated through fusion-variant ablations, parameter-sensitivity analysis, cross-condition tests, paired navigation comparisons, reliability-map metrics, and embedded runtime profiling. Experiments on a real mobile robotic platform equipped with an Intel RealSense D435 and a Jetson Orin Nano show that the proposed method reduces false obstacle insertion, improves free-space preservation, and maintains real-time throughput under reflective-floor, glass-wall, and natural-light glare conditions. These results support treating glare as a measurement-reliability problem rather than as a dense depth-completion problem for safety-critical indoor navigation.
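The weighted-and-gated fusion idea can be illustrated with a standard log-odds occupancy update in which the per-measurement reliability both gates and scales the increment. This is a minimal sketch under assumptions: the gate threshold, the linear weighting, and the function names are hypothetical, and the reliability value stands in for what a network such as DRM-Net would predict.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def update_cell(l_prev, p_hit, reliability, gate=0.3):
    """Reliability-guided log-odds update for one costmap cell.

    l_prev: prior log-odds of occupancy.
    p_hit: inverse sensor model probability for this measurement.
    reliability: predicted trust in [0, 1] for the measurement.
    """
    if reliability < gate:
        # Gate: untrusted measurements never reach the map.
        return l_prev
    # Weight: trusted measurements contribute in proportion to reliability.
    return l_prev + reliability * logit(p_hit)
```

Gating before accumulation matters because log-odds updates are additive: once a glare-induced spike has been written into a cell, later correct measurements must outvote it, whereas a gated spike leaves no trace.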

2606.03392 2026-06-03 cs.RO 版本更新

OpenEAI-Platform: An Open-source Embodied Artificial Intelligence Hardware-Software Unified Platform

OpenEAI-Platform: 一个开源具身人工智能硬件-软件统一平台

Jinyuan Zhang, Luoyi Fan, Leiyu Wang, Yeqiang Wang, Yicheng Zhu, Cewu Lu, Nanyang Ye

发表机构 * Shanghai Innovation Institute(上海创新研究院) Huazhong University of Science and Technology(华中科技大学) Shanghai Jiao Tong University(上海交通大学)

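AI summary: Proposes OpenEAI-Platform, which integrates a low-cost 6+1-DoF robotic arm and a reproducible VLA model; with open-source designs and two-stage training, it outperforms commercial arms on real-world manipulation tasks and matches a large-scale pretrained baseline.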

详情
AI中文摘要

现实世界中的具身AI需要精确的硬件和稳健的视觉-语言-动作(VLA)策略。我们提出OpenEAI-Platform,一个完全开源平台,集成了低成本6+1自由度机械臂(OpenEAI-Arm)和可复现的VLA模型(OpenEAI-VLA)。OpenEAI-Arm提供开源机械设计以实现低制造成本,并采用柔顺控制方法以提高精度。OpenEAI-VLA基于Qwen3-VL-4B,使用扩散Transformer动作头,并仅使用开源机器人和多模态数据集进行两阶段训练。在四个真实操作任务中,OpenEAI-Arm在相同策略下优于两款商用6+1自由度机械臂,而OpenEAI-VLA在仅有限预训练数据下达到了与大规模预训练pi0基线相当的成功率。我们将发布完整的硬件设计、驱动程序、模型以及训练/数据流水线,以支持可复现研究和可扩展数据收集。我们的代码、布局和模型将在论文被接收后发布。

英文摘要

Embodied AI in the real world requires both accurate hardware and robust vision-language-action (VLA) policies. We present OpenEAI-Platform, a fully open-source platform that integrates a low-cost 6+1 degree-of-freedom (dof) robotic arm (OpenEAI-Arm) and a reproducible VLA model (OpenEAI-VLA). OpenEAI-Arm provides open-source mechanical designs for low manufacturing cost and compliant control methods for higher accuracy. OpenEAI-VLA builds on Qwen3-VL-4B and uses a Diffusion Transformer action head, and is trained in two stages with only open-source robot and multimodal datasets. Across four real-world manipulation tasks, OpenEAI-Arm outperforms two commercial 6+1-dof arms under the same policy, and OpenEAI-VLA achieves success rates comparable to the large-scale pretrained pi0 baseline with only limited pretraining data. We will release the full hardware designs, drivers, models, and training/data pipelines to support reproducible research and scalable data collection. Our codes, layouts, and models will be released after the paper is accepted.

2606.03390 2026-06-03 cs.RO 版本更新

Extreme Motion Generation via Hybrid Null-Space Control for Straight-Line Path Following

通过混合零空间控制实现直线路径跟踪的极端运动生成

Xinyi Yuan, Weiwei Wan, Kensuke Harada

发表机构 * Graduate School of Engineering Science, The University of Osaka, Japan(大阪大学工学研究科)

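AI summary: Proposes a hybrid controller that combines a reinforcement learning policy with model-based control, switching near joint limits, to maximize the Cartesian path length of a manipulator along a predefined trajectory; on a 7-DoF Franka FR3 it extends the path length by 27% on average.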

详情
AI中文摘要

这项工作研究了“极端运动生成”,旨在在机械臂工作空间内沿预定义轨迹最大化笛卡尔路径长度。这一目标在工业中很重要,因为路径跟踪是许多任务(如表面涂层和焊接)的基础。更关键的是,极端运动使固定基座机械臂能够在有限可达性下利用运动学能力。然而,这种利用在实践中具有挑战性,因为机械臂必须在执行过程中主动避开安全边界,这本质上是一个长视界问题。因此,我们主张长视界决策应委托给基于学习的策略以最大化利用,而经典模型控制器覆盖近边界区域,其中学习策略由于稀疏数据覆盖而急剧退化。具体来说,我们提出的方法是一个步级混合控制器,根据归一化关节极限距离在基于强化学习的控制器和模型控制器之间切换。初始关节配置通过条件扩散采样获得,基于学习到的运动先验改进了可实现的路径长度。我们在7自由度Franka FR3上对10,000个直线路径跟踪任务评估了所提出的框架,平均滚动长度比基于模型的基线延长了27%。值得注意的是,某些任务产生了朝向运动极端的显著延伸,如统计结果中报告的最大改进所示。本文的项目网站和相关视频可在 https://yuan-xinyi.github.io/extreme-motion-generation/ 找到。

英文摘要

This work studies "extreme motion generation", which aims to maximize the Cartesian path length along a pre-defined trajectory within the manipulator's workspace. This objective is important in industry because path following is fundamental to a wide variety of tasks such as surface coating and welding. More critically, extreme motion enables a fixed-base manipulator to exploit its kinematic capability under limited reachability. However, such exploitation is challenging in practice, as the manipulator must actively avoid the safety boundary throughout execution, which is inherently a long-horizon problem. Accordingly, we claim that long-horizon decision-making should be delegated to a learning-based policy to maximize exploitation, while a classical model-based controller covers the near-boundary region, where the learning policy degrades sharply due to sparse data coverage. In detail, our proposed method is a step-level hybrid controller that switches between an RL-based and a model-based controller according to the normalized joint-limit distance. The initial joint configuration is sampled through conditional diffusion-based sampling, which improves the achievable path length based on the learned motion prior. We evaluate the proposed framework on 10,000 straight-line path-following tasks with a 7-DoF Franka FR3, extending the average rollout length by 27% over the model-based baseline. Notably, certain tasks yield a pronounced extension toward the motion extreme, as reflected in the maximum improvement reported in the statistical results. The project website and related videos of this paper can be found at https://yuan-xinyi.github.io/extreme-motion-generation/.
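The step-level switching rule described in this abstract can be sketched as a per-step check on how close any joint is to its limit. The normalization, the threshold value, and the function names below are assumptions chosen for illustration; the abstract states only that switching depends on a normalized joint-limit distance.

```python
def normalized_limit_distance(q, q_min, q_max):
    """Smallest joint-limit margin, scaled so 1 is mid-range and 0 is at a limit."""
    return min(min(qi - lo, hi - qi) / (0.5 * (hi - lo))
               for qi, lo, hi in zip(q, q_min, q_max))

def select_controller(q, q_min, q_max, threshold=0.1):
    """Pick the controller for the current step.

    Far from limits, the learned policy makes long-horizon decisions;
    near a limit, the model-based controller takes over.
    """
    if normalized_limit_distance(q, q_min, q_max) < threshold:
        return "model_based"
    return "rl"
```

Taking the minimum over joints means a single joint near its limit is enough to hand control to the model-based controller, which matches the motivation that the learned policy is least reliable exactly where its training data is sparsest.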

2606.03385 2026-06-03 cs.RO cs.AI 版本更新

Grasp-Then-Plan with Failure Attribution: A Closed Two-Stage Framework for Precise and Generalizable Robotic Manipulation

先抓取后规划与失败归因:一种用于精确且可泛化机器人操作的闭环两阶段框架

Jiahao Xu, Peiyuan Wang, Hanzhuo Zhang, Zihao Yu, Tianyu Fu, Hao Chen, Xuanhao Xiang, Jianbo Yu, Chenchen Fu, Wanyuan Wang

发表机构 * School of Computer Science and Engineering, Southeast University, China(东南大学计算机科学与工程学院)

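AI summary: Proposes the GTP-FA framework, which uses a task-oriented two-stage grasp-then-plan pipeline and a failure attribution model; it injects task priors and risk penalties into the grasping module and targets high-risk initial states with data collection and fine-tuning in the planning module, substantially raising manipulation task success rates.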

Comments 32 pages, project page: https://sites.google.com/view/gtp-fa/

详情
AI中文摘要

在机器人操作中,抓取与运动规划之间的紧密耦合常常掩盖失败的真实原因,导致低效的试错过程。为了实现高效的长时域操作,我们提出了GTP-FA(先抓取后规划与失败归因),一种面向任务的两阶段抓取-规划框架,该框架生成抓取候选并根据所选抓取执行下游运动规划。给定失败的操作轨迹,我们学习一个失败归因模型,该模型可泛化到未见过的抓取,并生成失败模式的稳定分布以进行诊断引导的优化。基于这些归因结果,我们以诊断驱动的方式优化两个模块:在抓取侧,我们将任务级先验和风险惩罚注入抓取候选评分和优化中,以抑制不稳定或与任务不兼容的抓取;在规划侧,我们通过数据收集和微调针对高风险初始状态,以解决真正的规划瓶颈。我们在仿真和真实机器人实验中评估了所提出的框架,并表明GTP-FA在基于RL、IL、扩散策略和VLA的设置中提升了相应的基础学习器,实现了显著更高的总体任务成功率。

英文摘要

In robotic manipulation, the tight coupling between grasping and motion planning often obscures the true source of failure, leading to inefficient trial-and-error. To enable efficient long-horizon manipulation, we propose GTP-FA (Grasp-Then-Plan with Failure Attribution), a task-oriented two-stage grasp-then-plan framework that generates grasp candidates and performs downstream motion planning conditioned on the selected grasp. Given a failed manipulation trajectory, we learn a failure attribution model that generalizes to unseen grasps and produces a stable distribution over failure modes for diagnosis-guided optimization. Based on these attribution results, we then optimize both modules in a diagnosis-driven manner: on the grasping side, we inject task-level priors and risk penalties into grasp candidate scoring and optimization to suppress unstable or task-incompatible grasps; on the planning side, we target high-risk initial states through data collection and fine-tuning to address genuine planning bottlenecks. We evaluate the proposed framework in both simulation and real-robot experiments, and show that GTP-FA improves the corresponding base learners across RL, IL, diffusion-policy, and VLA-based settings, achieving substantially higher overall task success rates.

2606.03374 2026-06-03 cs.RO 版本更新

eMEM: A Hybrid Spatio-Temporal Memory System For Embodied Agents

eMEM:一种面向具身智能体的混合时空记忆系统

A. Haroon Rasheed, Maria Kabtoul

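AI summary: Proposes eMEM, a hybrid graph-based memory system whose multi-index architecture and tiered consolidation pipeline give embodied agents efficient memory retrieval across space, time, and semantics, scoring a weighted mean of 80.8 on a benchmark built over ProcTHOR-10K scenes.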

详情
AI中文摘要

我们提出eMEM(具身记忆),一种基于混合图的记忆系统,用于在物理环境中运行的具身智能体。当前的智能体记忆架构,如Generative Agents、MemGPT和A-MEM,将记忆视为文本流或知识图谱,但具身智能体需要同时能够按意义、空间和时间进行搜索的记忆。eMEM通过一个统一在单一图模型背后的多索引架构(用于结构化存储的SQLite、用于近似最近邻语义搜索的hnswlib以及用于空间查询的R-tree)填补了这一空白。一个分层整合管道将原始感知观察转化为压缩摘要,模仿生物系统中海马体-新皮层的整合。十个面向智能体的回忆工具暴露了记忆检索原语,包括概念到位置的解析和跨层回忆,作为LLM工具调用的第一类操作。该系统完全嵌入式,与智能体在同一进程中运行。此外,我们引入了eMEM-Bench v1,这是一个我们在ProcTHOR-10K场景上构建的用于具身记忆评估的基准。该基准明确围绕八个认知心理学范式(DRM诱饵、模式分离、模式完成、源监控、上下文依赖检索、长时程干扰、序列位置和增强保留曲线)组织,每个范式都经过选择,使得结果能够对照人类和先前智能体记忆系统的更广泛记忆系统文献进行解释;这是像LoCoMo或OpenEQA这样的表面任务基准无法提供的诊断水平。eMEM在988个探针上获得80.8加权平均分,在模拟延迟从1小时到1年的房间独特项目上保持平稳的保留曲线。我们表明,纯RAG基线(flat_rag消融)在上下文依赖检索上损失30分,在DRM诱饵拒绝上损失29分,分别隔离了多层存储和整合的贡献。我们发布了系统和基准代码。

英文摘要

We present eMEM (Embodied Memory), a hybrid graph-based memory system for embodied agents operating in physical environments. Current agent memory architectures, such as Generative Agents, MemGPT, and A-MEM, treat memory as text streams or knowledge graphs, but embodied agents require memory that is simultaneously searchable by meaning, space, and time. eMEM fills this gap with a multi-index architecture (SQLite for structured storage, hnswlib for approximate nearest neighbour semantic search, and an R-tree for spatial queries) unified behind a single graph model. A tiered consolidation pipeline transforms raw perceptual observations into compressed summaries, mirroring hippocampal-neocortical consolidation in biological systems. Ten agent-facing recall tools expose memory retrieval primitives, including concept-to-location resolution and cross-layer recall, as first-class operations for LLM tool calling. The system is fully embedded and runs in-process alongside the agent. In addition, we introduce eMEM-Bench v1, a benchmark we construct over ProcTHOR-10K scenes for embodied memory evaluation. The benchmark is organised explicitly around eight cognitive-psychology paradigms (DRM lures, pattern separation, pattern completion, source monitoring, context-dependent retrieval, long-horizon interference, serial position, and a foil-augmented retention curve), each chosen so that the result is interpretable against the broader memory-systems literature in humans and prior agent-memory systems; this is a level of diagnosis that surface-task benchmarks like LoCoMo or OpenEQA cannot provide. eMEM scores 80.8 weighted mean over 988 probes, with a flat retention curve at ceiling from 1 h to 1 yr of simulated delay on room-unique items. We show that a pure RAG baseline (the flat_rag ablation) loses 30 pt on context-dependent retrieval and 29 pt on DRM lure rejection, isolating the contributions of multi-layer storage and consolidation, respectively. We release both the system and the benchmark code.
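
The idea of one memory that is searchable by meaning, space, and time can be illustrated with a minimal sketch. This is not the eMEM implementation: the class `ToyMemory`, its schema, and the `recall` signature are hypothetical. To stay dependency-free, the sketch replaces hnswlib with a brute-force cosine ranking and the R-tree with a bounding-box filter in SQL.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyMemory:
    """Hypothetical toy store: SQLite rows plus an in-memory vector table."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE obs (id INTEGER PRIMARY KEY, t REAL, x REAL, y REAL, text TEXT)"
        )
        self.vectors = {}  # observation id -> embedding (stand-in for an ANN index)

    def add(self, t, x, y, text, embedding):
        cur = self.db.execute(
            "INSERT INTO obs (t, x, y, text) VALUES (?, ?, ?, ?)", (t, x, y, text)
        )
        self.vectors[cur.lastrowid] = embedding
        return cur.lastrowid

    def recall(self, query_vec, box, t_range, k=1):
        """Filter by space and time in SQL, then rank survivors by meaning."""
        xmin, ymin, xmax, ymax = box
        rows = self.db.execute(
            "SELECT id, text FROM obs "
            "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND t BETWEEN ? AND ?",
            (xmin, xmax, ymin, ymax, t_range[0], t_range[1]),
        ).fetchall()
        ranked = sorted(
            rows, key=lambda r: cosine(query_vec, self.vectors[r[0]]), reverse=True
        )
        return [text for _, text in ranked[:k]]
```

A real system would presumably query the spatial and vector indexes directly instead of scanning every candidate row; the sketch only shows how the three query axes compose.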

2606.03340 2026-06-03 cs.RO 版本更新

Autonomous Navigation System for Library Service Robot Based on Unitree Go2 Edu

基于 Unitree Go2 Edu 的图书馆服务机器人自主导航系统

Aoduo Li, Haoran Lv, Bingquan Ou, Jianfeng Li, Yingdong Li, Zimeng Li

AI总结 针对图书馆狭窄通道和动态障碍物环境,提出基于 ROS 2 的四足机器人导航系统,融合 RTAB-Map、AMCL/EKF 和 Nav2 实现高成功率定位与避障,地图误差 3.7 cm。

Comments 6 pages, 5 figures, 4 tables. Accepted by WCCIS 2026

详情
AI中文摘要

图书馆需要自主机器人在狭窄通道中安静移动,同时确保读者、椅子、包和手推车周围的安全。本文提出了一套基于 Unitree Go2 Edu 四足机器人的 ROS 2 导航系统,该机器人配备了 4D LiDAR、前置深度相机和 IMU。我们并未假设图书馆是粗糙地形,而是针对实际部署中遇到的移动性不连续问题,包括地面过渡、临时杂乱和部分堵塞通道(低底盘轮式平台在此类场景中适应性较差)。采用 RTAB-Map 进行视觉-LiDAR SLAM,基于 AMCL 和 EKF 的传感器融合实现定位,以及基于 A* 和 DWA 的 Nav2 栈支持路径规划和局部避障。在真实图书馆中,该系统在静态、低密度动态和高密度动态场景下的成功率分别为 100%、96% 和 88%,而针对测量控制距离的地图验证显示平均度量误差为 3.7 cm。

英文摘要

Libraries require autonomous robots to move quietly through narrow aisles while remaining safe around readers, chairs, bags, and carts. This paper presents a ROS 2 navigation system for a Unitree Go2 Edu quadruped equipped with a 4D LiDAR, a front depth camera, and an IMU. Rather than assuming the library is rough terrain, we target the practical mobility discontinuities of real deployments, including floor transitions, temporary clutter, and partially blocked passages where low-clearance wheeled platforms are less tolerant. RTAB-Map is used for visual-LiDAR SLAM, AMCL and EKF-based sensor fusion provide localization, and a Nav2 stack with A* and DWA supports planning and local avoidance. In a real library, the system achieves 100%, 96%, and 88% success rates in static, low-density dynamic, and high-density dynamic scenes, while map validation against surveyed control distances yields a mean metric error of 3.7 cm.
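
The abstract names A* as the global planner. The sketch below shows plain A* on a 4-connected occupancy grid as an illustration of the technique; it is not the Nav2 planner plugin, and the function name `astar` and the grid encoding (0 = free, 1 = blocked) are assumptions made for this example.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (blocked) cells.

    Returns a list of (row, col) cells from start to goal, or None if the
    goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if cost > best_cost[current]:
            continue  # stale heap entry; a cheaper route was already found
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dr, current[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

In a narrow aisle with a blocked row, for example `[[0, 0, 0], [1, 1, 0], [0, 0, 0]]`, the returned path detours through the single free cell in the middle row.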

2606.03335 2026-06-03 cs.RO 版本更新

GPU-Parallel Multi-Task Reinforcement Learning with Demonstration Guided Policy Optimization

GPU并行多任务强化学习与演示引导的策略优化

Rui Zhang, Qiwei Wu, Zhengyu Zhang, Tao Li, Yunrong Guo, Junjie Lai, Renjing Xu, Weihua Zhang

发表机构 * NVIDIA The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州))

AI总结 提出一种将结构化操作任务族转化为GPU并行多任务强化学习基准的构建方法MT-Libero,并设计演示引导策略优化算法DGPO,结合重要性加权PPO与自适应行为克隆,实现异构任务套件的高效并行训练。

详情
AI中文摘要

大规模GPU并行强化学习已经改变了机器人仿真中可训练的内容,但大多数系统仍为每个任务优化一个专家策略。我们提出了一种构建方法,将结构化操作任务族转化为GPU并行多任务强化学习基准,并在Isaac Lab中使用LIBERO资产和任务谓词实例化为MT-Libero。该基准支持在异构任务套件上同时进行强化学习,具有并行渲染、物理随机化以及状态输入或视觉输入策略。为了使这种训练在稀疏成功信号和有限先验数据下变得实用,我们进一步提出了DGPO,一种同策略(on-policy)演示引导方法,它将重要性加权PPO与对匹配演示动作的自适应行为克隆相结合。DGPO实现了对演示任务分布的可调偏好,在保持同策略PPO的稳定性和在线改进优势的同时,优于无先验强化学习和现有的基于演示的方法。

英文摘要

Large-scale GPU-parallel reinforcement learning has changed what can be trained in robot simulation, yet most systems still optimize one specialist policy per task. We propose a construction methodology for turning structured manipulation task families into GPU-parallel multi-task RL benchmarks, and instantiate it as MT-Libero using LIBERO assets and task predicates in Isaac Lab. The resulting benchmark supports simultaneous reinforcement learning over heterogeneous task suites with parallel rendering, physics randomization, and state-input or visual-input policies. To make such training practical under sparse success signals and limited prior data, we further propose DGPO, an on-policy demonstration-guided method that combines importance-weighted PPO with adaptive behavior cloning on matched demonstration actions. DGPO enables a tunable preference toward demonstrated task distributions, outperforming both prior-free RL and existing demonstration-based methods while preserving the stability and online improvement benefits of on-policy PPO.
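
The abstract does not give the DGPO objective, so the following is only a sketch of the general pattern it describes: a clipped PPO surrogate (where the probability ratio acts as the importance weight) plus a behavior-cloning penalty whose weight adapts during training. The adaptation rule in `bc_weight` (shrinking the weight as the success rate rises) and all function names are assumptions for illustration, not the authors' method.

```python
def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s), i.e. the importance weight.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def bc_weight(success_rate, beta_max=1.0):
    """Hypothetical adaptive rule: rely less on demonstrations as success rises."""
    return beta_max * (1.0 - success_rate)


def combined_loss(ratios, advantages, bc_errors, success_rate, eps=0.2):
    """Negative PPO surrogate plus weighted mean-squared BC error.

    bc_errors are per-sample differences between policy actions and the
    matched demonstration actions (scalars here for simplicity).
    """
    ppo = sum(ppo_clip_term(r, a, eps) for r, a in zip(ratios, advantages)) / len(ratios)
    bc = sum(e * e for e in bc_errors) / len(bc_errors)
    return -ppo + bc_weight(success_rate) * bc
```

With this rule the behavior-cloning term vanishes once the policy succeeds reliably, leaving the plain on-policy PPO objective.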

2606.03312 2026-06-03 cs.RO cs.AI 版本更新

RobotValues: Evaluating Household Robots When Human Values Conflict

RobotValues: 当人类价值观冲突时评估家用机器人

Jongwook Han, Hyeongjin Kim, Yohan Jo

发表机构 * Graduate School of Data Science, Seoul National University(首尔国立大学数据科学研究生院)

AI总结 提出RobotValues基准,通过10K个价值冲突场景评估家用机器人规划器,发现视觉语言模型存在默认价值偏好且难以覆盖,表明评估需考虑价值冲突下的行动选择。

详情
AI中文摘要

虽然家用机器人通常基于任务完成度进行评估,但日常家庭环境涉及价值冲突情境,其中机器人应选择优先考虑其他价值观(如人类自主性、效率或社会适宜性)而非任务成功的行动。然而,目前尚无评估机器人在此类场景中价值偏好的基准。我们引入RobotValues,一个在10K个价值冲突场景中评估家用机器人规划器的基准。每个实例包含一个逼真的家庭图像和多个优先考虑不同人类价值观的合理机器人动作。我们通过LLM辅助场景生成、利益相关者基于价值观提取、图像生成和自动质量控制构建RobotValues。使用RobotValues评估机器人领域使用的视觉语言模型,发现模型表现出默认价值偏好,包括安全性和适应性,而低估了隐私优先的行动。当模型被指示优先考虑与其自身偏好冲突的特定价值观时,它们通常无法覆盖默认行动,80%的时间选择了错误行动。这些发现表明,家用机器人评估不仅应衡量任务完成度或安全性合规性,还应衡量当人类价值观冲突时机器人是否能在合理行动中做出选择。

英文摘要

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchmarks for evaluating robots' value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.

2606.03297 2026-06-03 cs.RO 版本更新

SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation

SplitAdapter: 通过因子化自适应的负载感知人形机器人移动操作

Jeonguk Kang, Hanbyel Cho, Sanghyun Kang, Donghan Koo

发表机构 * Future Robot AI Group, Samsung Electronics(三星电子未来机器人人工智能组)

AI总结 针对人形机器人在不同负载和高度下移动操作时负载变化与动力学不匹配的问题,提出SplitAdapter方法,通过冻结预训练策略并扩展负载与动力学感知编码器,结合分割世界模型目标、GRL交叉对抗正则化和分层特征线性调制,显著提升重载条件下的任务成功率。

详情
AI中文摘要

人形机器人的移动操作需要在不同物体质量和拾取/放置高度下实现稳定的全身控制。在仿真到现实的迁移中,物体引起的负载变化和机器人侧的动力学不匹配在物理接触期间相互作用,这尤其具有挑战性。现有的基于历史的自适应方法通常将这些因素压缩到单个潜在表示中,这可能在重载操作下削弱鲁棒性。我们提出SplitAdapter(通过因子化自适应的负载感知人形机器人移动操作),该方法冻结预训练的箱子操作策略,并通过使用分割世界模型目标、基于GRL的交叉对抗正则化和分层特征线性调制(FiLM)训练的物体/负载和动力学感知上下文编码器进行扩展。在仿真到仿真实验和实际部署中,SplitAdapter在物体质量为2、4和6千克以及拾取/放置高度为0、30和60厘米的情况下,相对于基础策略和世界模型FiLM基线提高了完整任务成功率,其中在重载条件下改进最大。

英文摘要

Humanoid loco-manipulation requires stable whole-body control under varying object masses and pickup/placement heights. This becomes particularly challenging in sim-to-real transfer, where object-induced load variation and robot-side dynamics mismatch interact during physical contact. Existing history-based adapters often compress these factors into a single latent representation, which can weaken robustness under heavy-load manipulation. We propose SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation, which freezes a pretrained box manipulation policy and extends it with object/load and dynamics-aware context encoders trained with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation (FiLM). In sim-to-sim experiments and real-world deployment, SplitAdapter improves Full-task success over the base policy and world-model FiLM baselines across object masses of 2, 4, and 6 kg and pickup/placement heights of 0, 30, and 60 cm, with the largest improvements under heavy-load conditions.
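
Feature-wise Linear Modulation scales and shifts each feature using parameters predicted from a context vector: out_i = gamma_i(c) * x_i + beta_i(c). The sketch below shows a single generic FiLM layer with linear predictors; it does not reproduce the hierarchical variant or the encoders from the paper, and every name in it is hypothetical.

```python
def linear(weights, bias, vec):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def film(features, context, w_gamma, b_gamma, w_beta, b_beta):
    """Modulate frozen-policy features with context-dependent scale and shift.

    features: activations from the (frozen) base policy
    context:  latent from a context encoder, e.g. an estimated load
    """
    gamma = linear(w_gamma, b_gamma, context)  # per-feature scale
    beta = linear(w_beta, b_beta, context)     # per-feature shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]
```

Initialising the predictors so that gamma is 1 and beta is 0 leaves the frozen policy's features unchanged, which is one common way to add such an adapter without disturbing the pretrained behavior at the start of training.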

2606.03296 2026-06-03 cs.RO 版本更新

Bridging Predictive Uncertainty and Safe Action: Sample-Conditioned Differentiable Planning for Autonomous Driving

桥接预测不确定性与安全行动:面向自动驾驶的样本条件可微分规划

Chengzhen Meng, Pei Liu, Zhiyu Huang, Chen Lv, Jun Ma

发表机构 * Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology(香港科学与技术大学机器人与自主系统方向) Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology(香港科学与技术大学电子与计算机工程系) Department of Civil and Environmental Engineering, University of California, Los Angeles(加州大学洛杉矶分校土木与环境工程系) School of Mechanical and Aerospace Engineering, Nanyang Technological University(南洋理工大学机械与航空航天工程学院)

AI总结 提出一种样本条件可微分规划框架,通过扩散模型生成多样未来场景并直接输入可微分规划器,利用条件风险价值约束缓解预测不确定性,实现安全、高效、舒适的自动驾驶运动规划。

详情
AI中文摘要

复杂、动态且交互的驾驶环境给自动驾驶带来了重大挑战,主要源于周围交通的普遍不确定性。当前系统的一个基本瓶颈是高度表达性的不确定性建模与可解释、安全的运动规划之间的脱节。在本文中,我们提出了一种新颖的样本条件可微分规划框架,通过将扩散生成的未来轨迹显式纳入优化过程来弥合这一差距。我们的方法不是将预测压缩为单一的确定性未来或依赖黑盒端到端架构,而是利用条件扩散模型生成一组多样化的合理未来场景。关键的是,这些样本直接输入可微分规划器,该规划器通过经验条件风险价值尾部风险约束显式缓解预测不确定性。这使得规划器能够优化一条物理可解释的轨迹,该轨迹对罕见但安全关键的交互具有鲁棒性。此外,我们引入了一种场景上下文的有向图表示,在预测有效性和计算效率方面均带来了显著提升。通过在Waymo Open Motion和Argoverse 2数据集上进行的大量开环和闭环评估,我们的框架在安全性、效率和乘坐舒适性方面显著优于最先进的基线方法。

英文摘要

Complex, dynamic, and interactive driving environments pose significant challenges for autonomous driving, primarily due to the pervasive uncertainty of surrounding traffic. A fundamental bottleneck in current systems is the disconnect between highly expressive uncertainty modeling and interpretable, safe motion planning. In this paper, we propose a novel sample-conditioned differentiable planning framework that bridges this gap by explicitly incorporating diffusion-generated future trajectories into the optimization process. Rather than compressing predictions into a single deterministic future or relying on black-box end-to-end architectures, our approach leverages a conditional diffusion model to generate a diverse set of plausible future scenarios. Crucially, these samples are directly fed into a differentiable planner, which explicitly mitigates predictive uncertainty via an empirical Conditional Value-at-Risk (CVaR) tail-risk constraint. This allows the planner to optimize a physically interpretable trajectory that is robust to rare yet safety-critical interactions. Furthermore, we introduce a directed graph representation for scene context that yields substantial improvements in both predictive effectiveness and computational efficiency. Validated through extensive open-loop and closed-loop evaluations on the Waymo Open Motion and Argoverse 2 datasets, our framework significantly outperforms state-of-the-art baselines in safety, efficiency, and ride comfort.
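
An empirical Conditional Value-at-Risk over sampled futures is the average cost of the worst (1 - alpha) fraction of samples. The sketch below computes it for a list of per-scenario costs, such as collision risks evaluated against diffusion-sampled trajectories. The function names, the `budget` parameter, and the rounding guard are assumptions for illustration; the paper's differentiable formulation is not reproduced here.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of sampled costs."""
    if not costs:
        raise ValueError("need at least one sampled cost")
    ordered = sorted(costs, reverse=True)  # worst (largest) costs first
    # round() guards against float noise such as (1 - 0.7) * 10 = 3.0000000000000004
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


def satisfies_tail_risk(costs, budget, alpha=0.9):
    """Hypothetical constraint check: tail risk must stay within a budget."""
    return empirical_cvar(costs, alpha) <= budget
```

Unlike an average over all samples, this quantity is driven by the rare high-cost scenarios, which is why it suits safety-critical interactions.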

2606.03268 2026-06-03 cs.RO 版本更新

EaDex: A Cross-Embodiment Dexterous Manipulation Framework from Low-Cost Demonstrations

EaDex: 一种基于低成本演示的跨形态灵巧操作框架

Qian Zhao, Xin Tong, Chengdong Wu, Yang Yang, Yingtian Li

发表机构 * Faculty of Robot Science and Engineering, Northeastern University(机器人科学与工程学院,东北大学) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences(深圳先进技术研究所,中国科学院) School of Automation, Nanjing University of Information Science and Technology(自动化学院,南京信息科学技术大学)

AI总结 提出EaDex框架,通过RGB-D相机捕捉人手运动并构建结构化演示数据,结合基于接触奖励的动态演示退火机制,在低成本演示条件下实现多形态灵巧操作的快速学习和训练。

Comments 11 pages, 5 figures, Conference: CoRL 2026, Submitted as Preprint

详情
AI中文摘要

灵巧操作学习长期以来受到数据和训练高成本的阻碍,因为纯强化学习通常需要大规模交互探索,而模仿学习依赖于昂贵的高质量演示。为了解决这个问题,我们提出了EaDex,一种在低成本演示条件下的多形态灵巧操作学习框架,它能够快速生成演示数据,从而减少训练时间以实现高效的灵巧操作。在数据层面,EaDex仅使用单个RGB-D相机捕捉人手运动,并通过基于MANO的手部建模、数据归一化和运动重定向构建结构化演示数据。在学习层面,我们引入了一种基于接触奖励的动态演示退火机制,该机制在演示引导下进行早期探索,并随着接触奖励的积累逐渐过渡到自主优化。使用我们自定义的数据集,我们在三种灵巧手和三种铰接物体打开任务上评估了EaDex,涵盖了九种跨形态操作设置,相比没有演示退火的基线实现了55.3%的相对改进。这些结果验证了所提出的低成本演示流程和动态演示退火策略在灵巧操作学习中的有效性。

英文摘要

Dexterous manipulation learning has long been hindered by the high costs of data and training, as pure reinforcement learning typically requires large-scale interactive exploration and imitation learning depends on high-quality demonstrations that are expensive to collect. To address this problem, we propose EaDex, a multi-embodiment dexterous manipulation learning framework under low-cost demonstration conditions, which enables rapid generation of demonstration data and consequently reduces training time for efficient dexterous manipulation. At the data level, EaDex captures human hand motions using only a single RGB-D camera and constructs structured demonstration data through MANO-based hand modeling, data normalization, and motion retargeting. At the learning level, we introduce a contact-reward-based dynamic demonstration annealing mechanism, which guides early-stage exploration under demonstration and gradually transitions to autonomous optimization with accumulating contact rewards. Using our custom dataset, we evaluate EaDex on three dexterous hands and three articulated object-opening tasks, covering nine cross-embodiment manipulation settings, achieving a 55.3% relative improvement over the baseline without demonstration annealing. These results validate the effectiveness of the proposed low-cost demonstration pipeline and the dynamic demonstration annealing strategy for dexterous manipulation learning.

2606.03265 2026-06-03 cs.RO 版本更新

Wheel-Mounted/GNSS Fusion with AI-Aided Position Updates

基于人工智能辅助位置更新的轮式/GNSS融合定位

Gal Versano, Itzik Klein

发表机构 * Autonomous Navigation and Sensor Fusion Lab(自主导航与传感器融合实验室) Hatter Department of Marine Technologies(海洋技术系) Charney School of Marine Sciences(海洋科学学院) University of Haifa(海法大学)

AI总结 提出一种混合神经惯性导航框架,结合轮式惯性传感器、强制周期轨迹和神经网络,通过误差状态扩展卡尔曼滤波融合GNSS位置更新,实现定位精度提升约46%。

详情
AI中文摘要

精确且鲁棒的定位仍然是自主地面车辆面临的基本挑战。在这项工作中,我们提出了一种混合神经惯性导航框架,该框架集成了轮式惯性传感器、强制周期轨迹以及一个简单高效的神经网络,能够在误差状态扩展卡尔曼滤波中通过GNSS位置更新回归车辆位移。周期轨迹提高了惯性信噪比,使得网络仅利用惯性读数即可估计位移。通过使用多个轮式惯性传感器的真实世界实验验证了该方法。实验结果表明,与标准轮式惯性传感器融合GNSS更新相比,所提方法在定位精度上实现了显著提升,位置均方根误差降低了约46%。

英文摘要

Accurate and robust localization remains a fundamental challenge for autonomous ground vehicles. In this work, we propose a hybrid neural inertial navigation framework that integrates wheel-mounted inertial sensors, enforced periodic trajectories, and a simple, efficient neural network that regresses vehicle displacement, with GNSS position updates fused in an error-state extended Kalman filter. The periodic trajectories increase the inertial signal-to-noise ratio, allowing the network to use only inertial readings to estimate displacement. The approach is validated through real-world experiments using multiple wheel-mounted inertial sensors. Experimental results demonstrate that the proposed method achieves a significant improvement in positioning accuracy, reducing the position root mean squared error by approximately 46% compared to standard wheel-mounted inertial sensor fusion with GNSS updates.
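
The fusion pattern described here (propagate position with a regressed displacement, then correct it with a GNSS fix) can be sketched with a scalar Kalman filter. This is a deliberately simplified one-dimensional, direct-state filter, not the error-state extended Kalman filter used in the paper; the function names and noise parameters are assumptions.

```python
def predict(position, variance, displacement, process_noise):
    """Propagate the estimate with a displacement, e.g. one regressed by a network."""
    return position + displacement, variance + process_noise


def gnss_update(position, variance, gnss_position, gnss_noise):
    """Correct the estimate with a GNSS position measurement."""
    gain = variance / (variance + gnss_noise)       # Kalman gain
    new_position = position + gain * (gnss_position - position)
    new_variance = (1.0 - gain) * variance
    return new_position, new_variance
```

For example, starting from position 0 with variance 1, a GNSS fix at 2 with measurement variance 1 gives a gain of 0.5, so the estimate moves halfway to 1 and its variance halves to 0.5.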

2606.03252 2026-06-03 cs.RO cs.AI 版本更新

AirDreamer: Generalist Drone Navigation with World Models

AirDreamer: 基于世界模型的通用无人机导航

Zian Liu, Andong Yang, Chunkai Yang, Ruidong An, Chao Gao, Guyue Zhou

发表机构 * Institute for AI Industry Research, Tsinghua University, Beijing, China(人工智能产业研究院,清华大学,北京,中国) Department of Electronic Engineering, Tsinghua University, Beijing, China(电子工程系,清华大学,北京,中国) School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China(遥感与信息工程学院,武汉大学,武汉,中国)

AI总结 提出一种结合强化学习策略和世界模型理解的无人机导航框架,通过稀疏奖励函数避免局部最优,在复杂未知环境中实现优于基线5.3%的成功率,并支持零调参的仿真到现实迁移。

Comments 8 pages, 8 figures

详情
AI中文摘要

在未知且杂乱的环境中导航无人机需要可靠地泛化到未见过的场景布局,并理解与机器人能力相关的环境结构。先前的方法假设相同的环境配置,通常严重依赖人工设计的感知管道和预定义规则来引导机器人到达目标。这个过程依赖于环境,且跨环境泛化能力差。受动物导航行为启发,我们设计了一个导航框架,该框架在基于世界模型的环境理解之上使用基于强化学习的策略进行导航,以克服这些问题。此外,我们设计了一个无需手工塑造项的稀疏奖励函数,以避免局部极小值陷阱并鼓励偏航控制行为。在仿真和真实无人机上,我们的方法展现出在复杂未知环境中导航和逃离其他方法失败的局部最优的新兴能力。在具有挑战性的地图上,它比最佳基线实现了5.3%更高的导航成功率。此外,所提出的框架在部署期间无需任何调整即可实现有效的仿真到现实迁移。代码将公开。

英文摘要

Navigating a drone in unseen and cluttered environments requires reliable generalization to unseen scene layouts and understanding of environmental structure relative to the robot's capabilities. Previous methods, which assume the same environment configuration, often rely heavily on human-designed perception pipelines and predefined rules to guide the robot toward the target. This process is environment-dependent and generalizes poorly across environments. Inspired by animal navigation behavior, we design a navigation framework that navigates with a reinforcement-learning-based policy on top of a world-model-based environment understanding to overcome these issues. In addition, a sparse reward function without hand-crafted shaping terms is designed to avoid local minima traps and encourage yaw control behaviors. In simulation and on real drones, our method exhibits emergent capabilities for navigating complex, unseen environments and escaping local optima where other methods fail. In challenging maps, it achieves a 5.3% higher navigation success rate than the best baseline. Furthermore, the proposed framework achieves effective sim-to-real transfer without any tuning during deployment. The code will be publicly available.

2606.03240 2026-06-03 cs.RO 版本更新

GeoAlign: Beyond Semantics with State-Guided Spatial Alignment in VLA Models

GeoAlign: VLA模型中的状态引导空间对齐超越语义

Yizhi Chen, Zhanxiang Cao, Xinyi Peng, Yixiao Zheng, Xiaxi Si, Yiheng Li, Liyun Yan, Keqi Zhu, Xueyun Chen, Shengcheng Fu, Tianyue Zhan, Yufei Jia, Jinming Yao, Yan Xie, Kun Wang, Cewu Lu, Yue Gao

发表机构 * Tongji University(同济大学) Shanghai Innovation Institute(上海创新研究院) Shanghai Jiao Tong University(上海交通大学) Zhejiang University(浙江大学) Jingdezhen Ceramic University(景德镇陶瓷大学) Tsinghua University(清华大学) HONOR(HONOR公司) University of Science and Technology of China(中国科学技术大学)

AI总结 提出GeoAlign架构,通过RGB几何分支的后训练和机器人本体状态引导的几何特征查询,实现几何感知的空间对齐和动态可供性选择,在多个基准上取得高性能。

Comments 20 pages, 9 figures, 8 tables, including appendix

详情
AI中文摘要

当前的视觉-语言-动作(VLA)模型通常优化语义基础,而可执行的操纵需要几何感知的空间对齐和动态可供性选择。我们引入了GeoAlign,一种用于VLA策略学习的状态引导空间对齐架构。GeoAlign使用机器人领域的RGB-D监督对RGB几何分支进行后训练,生成RGB衍生的几何增强后训练(GEP)特征用于策略部署。机器人的本体状态查询GEP特征网格,产生紧凑的、相位相关的几何令牌用于动作预测。GeoAlign在LIBERO上达到99.0%,在三个SimplerEnv-Fractal任务上达到85.3%,在八个几何关键的真实世界ALOHA任务上达到78.8%,消融实验证实了几何后训练和本体状态引导查询的价值。

英文摘要

Current Vision-Language-Action (VLA) models often optimize for semantic grounding, whereas executable manipulation requires geometry-aware spatial alignment and dynamic affordance selection. We introduce GeoAlign, a state-guided spatial alignment architecture for VLA policy learning. GeoAlign post-trains an RGB geometry branch with robot-domain RGB-D supervision, yielding RGB-derived Geometry-Enhanced Post-Trained (GEP) features for policy rollout. The robot's proprioceptive state queries the GEP feature grid, producing compact, phase-dependent geometry tokens for action prediction. GeoAlign achieves 99.0% on LIBERO, 85.3% across three SimplerEnv-Fractal tasks, and 78.8% on eight geometry-critical real-world ALOHA tasks, with ablations confirming the value of geometry post-training and proprioceptive-state-guided querying.

2606.03223 2026-06-03 cs.RO cs.AI 版本更新

BotDirector: Robot Storytelling Across the Symmetrical Reality with Multi-modal Interactions

BotDirector:跨对称现实的多模态交互机器人讲故事

Zhe Sun, Meng Wang, Lei Wang, Yuxi Wang, Wanxin Li, Yujia Peng, Zhenliang Zhang

发表机构 * State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China(国家一般人工智能重点实验室,BIGAI,北京,中国) Peking University, Beijing, China(北京大学,北京,中国)

AI总结 提出一个结合具身交互和自然语言交互的机器人讲故事系统,利用LLM代理将儿童创建的叙事转化为自导航群体机器人的运动序列,支持灵活场景和日常物品。

详情
Journal ref
2026 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)
AI中文摘要

机器人讲故事融合了技术创新和创意表达,以前所未有的方式吸引儿童。然而,技术方面往往对儿童来说过于复杂。我们提出了一个交互式系统,通过具身和自然语言交互促进机器人讲故事。儿童用自己的物品布置游乐场,并与LLM代理一起创建叙事。创建的叙事基于地图和角色转化为运动序列,并由自导航群体机器人执行。该系统增强了机器人讲故事的灵活性,使幼儿能够用日常物品创作机器人戏剧。

英文摘要

Robot storytelling offers a unique blend of technological innovation and creative expression that engages children in unprecedented ways. However, the technical aspects are often too complicated for children. We propose an interactive system that facilitates robot storytelling with tangible and natural language interactions. Children arrange the playground with their own stuff and create narratives with an LLM agent. The created narratives are transformed into a motion sequence based on the map and characters, and the motions are executed by self-navigating swarm robots. This system enhances robot storytelling with flexible scenarios, enabling young children to create robot dramas with everyday objects.

2606.03204 2026-06-03 cs.RO eess.SP 版本更新

Toward Gripper-Integrated Active Electrosense for Pre-Contact Sensing in Underwater Soft Grippers

面向水下软体夹爪预接触感知的夹爪集成主动电感知

Ahsan Tanveer, Muhammad Hamza, Waqar Hussain Afridi, Chen Wang, Guangming Xie

发表机构 * Intelligent Biomimetic Design Lab, School of Advanced Manufacturing and Robotics, State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University(智能仿生设计实验室,先进制造与机器人学院,湍流与复杂系统国家重点实验室,北京大学) National Engineering Research Center of Software Engineering, Peking University(软件工程国家工程研究中心,北京大学) Institute of Ocean Research, Peking University(海洋研究所,北京大学)

AI总结 针对水下视觉受限问题,提出一种集成于软体夹爪的主动电感知方法,通过测量导电介质中电场扰动实现预接触信号检测,实验表明多电极电压读数可检测物体引起的结构化变化。

Comments Extended abstract accepted to the IEEE ICRA 2026 Workshop on Manipulation Robustness

详情
AI中文摘要

水下操作通常发生在因浑浊、眩光和夹爪遮挡导致能见度降低的环境中,这限制了接近和抓取过程中基于视觉感知的可靠性。在这种情况下,软体夹爪非常适合顺应性交互,但通常缺乏在视觉不可靠时指导接近和闭合的机载预接触线索。本扩展摘要探索了主动电感知作为一种轻量级传感模式,通过测量导电介质中施加电场的扰动,在接触前提供类似接近的信号。我们为仿章鱼夹爪设计了离散电极布局,并使用现成硬件记录多通道传感电压。使用悬浮导电球进行的模拟和水槽实验显示,相对于空水基线,多电极电压读数出现了结构化的、依赖于物体的变化,且可检测性随5至20 V的激励和1 mHz至1 kHz的频率而变化。这些发现促使系统研究集成于夹爪的电感知作为水下软体操作补充预接触线索的可行性。

英文摘要

Underwater manipulation often occurs under degraded visibility due to turbidity, glare, and gripper occlusion, limiting the reliability of vision-based perception during approach and grasping. In such settings, soft grippers are well suited for compliant interaction, but they typically lack an onboard pre-contact cue that can guide approach and closure when vision is unreliable. This extended abstract explores active electrosense as a lightweight sensing modality that can provide a proximity-like signal prior to contact by measuring perturbations of an applied electric field in conductive media. We instrument an octopus-inspired gripper with a discrete electrode layout and record multi-channel sensing voltages using off-the-shelf hardware. Simulation and tank experiments with a suspended conductive sphere show structured, object-dependent changes in the multi-electrode voltage readout relative to empty-water baselines, with detectability varying across excitation of 5 to 20 V and frequencies from 1 mHz to 1 kHz. These findings motivate systematic investigation of gripper-integrated electrosense as a complementary pre-contact cue for underwater soft manipulation.

2606.03188 2026-06-03 cs.RO 版本更新

GeoSem-WAM: Geometry- and Semantic-Aware World Action Models

GeoSem-WAM:几何与语义感知的世界动作模型

Fulong Ma, Daojie Peng, Wenjun Yue, Jiahang Cao, Bintao Wang, Qiang Zhang, Jun Ma

发表机构 * HKUST(GZ)(香港科技大学(广州)) HKU(香港大学) USTC(中国科学技术大学) SDU(山东大学) X-Humanoid

AI总结 提出GeoSem-WAM框架,通过几何和语义监督增强潜在表示,在统一潜在空间中联合捕捉场景动态、空间几何和语义上下文,避免测试时显式未来展开或视频生成,提升动作预测准确性和鲁棒性。

详情
AI中文摘要

最近的世界动作模型(WAM)在具身决策中展示了令人印象深刻的能力。然而,它们的有效性是源于推理过程中的显式未来想象,还是由预测训练引起的表示学习,仍是一个未解之谜。新兴证据表明,主要优势在于学习鲁棒的潜在表示,而非在测试时生成未来观测。尽管如此,现有的WAM主要依赖于基于RGB的未来预测,这提供了对复杂环境有限的结构和空间理解。为了解决这个问题,我们提出了一个结构化世界建模框架,通过几何和语义监督增强潜在表示。除了未来的RGB预测,我们的模型引入了两个辅助预测分支,用于未来的几何和语义表示,使其能够在统一的潜在空间中联合捕捉场景动态、空间几何和语义上下文。关键在于,我们的方法通过避免测试时的显式未来展开或视频生成,保持了高效的推理。大量实验表明,纳入结构化世界监督一致地提高了动作预测准确性、场景理解以及在具有挑战性的具身场景下的鲁棒性,突显了其推进可扩展和高效WAM的潜力。

英文摘要

Recent World Action Models (WAMs) have demonstrated impressive capabilities in embodied decision-making. However, whether their effectiveness stems from explicit future imagination during inference or representation learning induced by predictive training remains an open question. Emerging evidence suggests the primary advantage lies in learning robust latent representations rather than generating future observations at test time. Nevertheless, existing WAMs mainly rely on RGB-based future prediction, which provides limited structural and spatial understanding of complex environments. To address this, we propose a structured world modeling framework that enhances latent representations through geometric and semantic supervision. Alongside future RGB prediction, our model introduces two auxiliary prediction branches for future geometry and semantic representations, enabling it to jointly capture scene dynamics, spatial geometry, and semantic context within a unified latent space. Crucially, our approach preserves efficient inference by avoiding explicit future rollout or video generation at test time. Extensive experiments show that incorporating structured world supervision consistently improves action prediction accuracy, scene understanding, and robustness under challenging embodied scenarios, highlighting its potential for advancing scalable and efficient WAMs.

2606.03159 2026-06-03 cs.CV cs.AI cs.RO 版本更新

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

NVIDIA OmniDreams:用于闭环自动驾驶仿真的实时生成式世界模型

NVIDIA: Aarti Basant, Amlan Kar, Despoina Paschalidou, Fangyin Wei, Francesco Ferroni, Guillermo Garcia Cobo, Haithem Turki, Huan Ling, Jaewoo Seo, James Lucas, Jay Zhangjie Wu, Jialiang Wang, Jonathan Lorraine, Jun Gao, Kai He, Katarina Tothova, Kevin Xie, Michał Tyszkiewicz, Qi Wu, Riccardo de Lutio, Ruilong Li, Sanja Fidler, Seung Wook Kim, Tianchang Shen, Tianshi Cao, Tobias Pfaff, William Lew, Xindi Wu, Xuanchi Ren, Yifan Lu, Yuxuan Zhang, Zan Gojcic, Zian Wang

AI总结 提出OmniDreams,一个基于Cosmos扩散模型训练的基础生成式世界模型,通过自回归生成动作条件视频,实现闭环仿真中复杂长尾场景的实时合成,并验证其在策略模型训练中的有效性。

详情
AI中文摘要

随着自动驾驶能力的提升,在长尾场景中安全评估驾驶策略仍是一个关键瓶颈。在闭环仿真中,驾驶策略模型与环境主动交互,其动作动态更新模拟器状态并直接影响下一组生成的传感器观测。尽管近期基于重建的神经模拟器提供了逼真效果,但它们从根本上受限于初始捕获数据,难以泛化到高度动态或新颖场景。为克服这些限制,我们引入了OmniDreams,一个从Cosmos扩散模型进行中期和后训练的基础生成式世界模型,能够自回归地实时生成动作条件视频。通过利用Cosmos丰富的视觉先验以及在21k小时驾驶场景上的中期和后训练,OmniDreams合成了传统模拟器难以捕获的复杂未观测现象,例如极端天气和不可预测的动态智能体行为。关键在于,它自回归地根据过去帧、当前模拟器状态和即时驾驶动作来调节其逼真的传感器生成。在结合Alpamayo 1策略模型和AlpaSim编排器的闭环系统中部署时,OmniDreams充当一个高度响应、反应灵敏的环境,为训练和评估下一代自动驾驶策略提供了可扩展且全面的解决方案。我们还展示了初步结果,表明从OmniDreams后训练的世界-动作模型(WAM)在Physical AI自动驾驶NuRec数据集上取得了强劲性能,超越了基于VLA的Alpamayo 1.5研究策略模型,同时仅使用其1/5的总参数量。这些结果凸显了像OmniDreams这样的实时世界模型也有潜力作为策略架构的骨干网络。

英文摘要

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only 1/5 the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.

2606.03134 2026-06-03 cs.RO cs.LG 版本更新

How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes

无声操作失败的可见性:模拟机器人任务中假成功检测的可观测性研究

Aarav Bedi

发表机构 * Aarav Bedi

AI总结 本研究通过模拟双机械臂ALOHA任务,探讨机器人自身成功检测器标记为成功的任务中,假成功(实际失败但被误判为成功)的可恢复性,发现基于关节数据的检测器在方块转移任务中几乎完全可恢复假成功,而在插销任务中仅部分可恢复,视觉检测器可弥补差距,且可分离性依赖于远低于实际传感器噪声的速度差异。

Comments 4 pages, 3 figures

详情
AI中文摘要

模仿学习策略用于机器人操作时,其训练任务的成功标签质量取决于机器人自身的成功检测器。一种特别有害的错误是假成功:机器人记录为成功但实际任务结果错误的任务。我们针对这些任务提出一个狭窄但实际的问题:一旦任务被标记为成功,推翻该标签所需的信息有多少存在于本体感觉中,又有多少需要视觉?我们在两个双机械臂ALOHA任务上构建模拟测试平台,通过环境扰动而非标签编辑诱发失败,利用检测器从未见过的特权模拟器状态标记每个任务,仅保留机器人标记为成功的任务。然后,我们将限制于本体感觉的检测器与基于视觉的检测器进行比较。我们发现可恢复性范围广泛:在方块转移任务中,假成功几乎完全可从关节数据中恢复,而在插销插入任务中,本体感觉仅恢复部分假成功,视觉检测器则弥补了大部分差距。我们还表明,我们测量的本体感觉可分离性依赖于远低于任何实际传感器噪声水平的速度差异,因此最好将其视为无噪声模拟器夸大的乐观上限。我们发布了生成和评估流程。

英文摘要

Imitation-learning policies for robot manipulation inherit the quality of the success labels attached to their training episodes, and those labels are usually produced by the robot's own success check. A particularly damaging error is the false success: an episode the robot logs as a success when the task outcome was actually wrong. We ask a narrow but practical question about these episodes. Once an episode has already been flagged as a success, how much of the information needed to overturn that label is present in proprioception, and how much requires vision? We build a simulated testbed on two bimanual ALOHA tasks, induce failures through environment perturbations rather than label edits, label every episode by privileged simulator state that the detector never sees, and keep only episodes the robot flagged as successful. We then compare detectors restricted to proprioception against a vision-based detector. We find that recoverability spans a wide range: in cube transfer the false successes are almost fully recoverable from joint data alone, while in peg insertion proprioception recovers only part of them and a vision detector closes most of the gap. We also show that the proprioceptive separability we measure rests on velocity differences far below any realistic sensor noise floor, so it is best read as an optimistic upper bound that a noiseless simulator inflates. We release the generation and evaluation pipeline.

2606.03127 2026-06-03 cs.RO 版本更新

TTT-VLA: Test-Time Latent Prompt Optimization for Vision-Language-Action Models

TTT-VLA:面向视觉-语言-动作模型的测试时潜在提示优化

Wenbo Zhang, Jianxiong Li, Shuai Yang, Sijin Chen, Jiajun Liu, Lingqiao Liu, Xiao Ma

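Affiliations: ByteDance Seed; The University of Adelaide; Tsinghua University; Zhejiang University; The University of Hong Kong; CSIRO Data61

AI summary: Proposes TTT-VLA, a framework that adapts to distribution shift by optimizing a latent prompt at test time without modifying the policy itself, improving task success rates on SimplerEnv in both single- and multi-embodiment settings.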

详情
AI中文摘要

基于大规模数据训练的视觉-语言-动作(VLA)模型取得了显著进展,但在部署时仍易受分布偏移影响。最近的VLA模型表明,提示可以作为引导策略行为的有效接口,但现有的基于提示的引导通常依赖外部指导。这自然引出一个问题:能否通过优化提示来实现VLA的测试时训练(TTT),使得引导接口本身可以从交互中学习和适应?我们通过TTT-VLA来解决这个问题,这是一种基于潜在提示优化(LPO)的测试时训练框架。在训练期间,潜在提示通过额外的代理任务学习,为策略学习提供额外的学习条件信号。在测试时,通过从当前环境收集交互数据,并仅使用代理任务的自监督信号优化这些数据上的潜在提示来执行TTT,而不修改策略本身。在SimperEnv上的实验表明,所提方法在单实体和多实体设置中均能持续提高任务成功率。进一步分析表明,提升主要源于纠正少量关键决策,而非全局改变策略行为。这些结果表明,LPO为基础操作策略的部署时改进提供了一条有效且实用的途径。

英文摘要

Vision-Language-Action (VLA) models trained on large-scale data have made remarkable progress, but they remain vulnerable to distribution shifts at deployment time. Recent VLA models suggest that prompts can serve as an efficient interface for steering policy behavior, but existing prompt-based steering typically relies on external guidance. This raises a natural question: can test-time training (TTT) for VLA be achieved by optimizing a prompt, so that the steering interface itself can be learned and adapted from interaction? We address this question with TTT-VLA, a test-time training framework based on Latent Prompt Optimization (LPO). During training, the latent prompt is learned with an additional proxy task, providing an extra learned conditioning signal for policy learning. At test time, TTT is performed by collecting interaction data from the current environment and optimizing only the latent prompt on those data using the proxy task's self-supervised signal, without modifying the policy itself. Experiments on SimplerEnv demonstrate that the proposed method consistently improves task success rates in both single- and multi-embodiment settings. Further analysis shows that the gains arise primarily from correcting a small number of critical decisions rather than globally altering policy behavior. These results suggest that LPO provides an effective and practical pathway for deployment-time improvement of foundation manipulation policies.

2606.03047 2026-06-03 cs.RO cs.MA 版本更新

ModuLoop: Low-Level Code Generation using Modular Synthesizer and Closed-Loop Debugger for Robotic Control

ModuLoop: 使用模块化合成器和闭环调试器进行机器人控制的低级代码生成

Gina Yoon, Sumin Lee, Joo Yong Sim

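Affiliations: Department of Mechanical Systems Engineering, Sookmyung Women's University

AI summary: Proposes a closed-loop modular code synthesis framework that uses a pre-trained large language model for modular code planning and generation, debugs and refines the result through iterative execution with debugging probes, and is applied to RGB-D camera and robotic arm calibration followed by a pick-and-place task.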

Comments IEEE Robotics and Automation Letters (2025)

详情
AI中文摘要

大型语言模型(LLMs)在包括代码生成和问题解决在内的各个领域展示了令人印象深刻的表现。然而,它们在机器人控制中的应用,特别是在需要精确操作、实时反馈和环境依赖执行的低级任务中,仍然有限。为了解决这一挑战,我们提出了闭环模块化代码合成框架。该框架利用预训练的LLM,无需任何任务特定的微调,执行模块化代码规划和生成,并在迭代执行生成的代码的同时插入调试探针以观察其行为。这种闭环结构促进了系统性的调试和优化,最终生成可执行的控制程序。我们将该框架应用于RGB-D相机和机械臂的标定,验证了其在真实世界环境中的有效性。此外,通过后续的抓取任务,我们不仅展示了标定的准确性,还展示了框架的潜在可扩展性。在两个任务中,该框架都实现了高执行准确性和自主性,说明了使用我们框架进行基于LLM的机器人控制的实用性和可扩展性。

英文摘要

Large Language Models (LLMs) have demonstrated impressive performance across various domains, including code generation and problem solving. However, their application in robotic control, particularly in low-level tasks that require precise manipulation, real-time feedback, and environment-dependent execution, remains limited. To address this challenge, we propose the Closed-Loop Modular Code Synthesizer framework. This framework leverages a pre-trained LLM without any task-specific fine-tuning to perform modular code planning and generation, and iteratively executes the generated code while inserting debugging probes to observe its behavior. This closed-loop structure facilitates systematic debugging and refinement, ultimately producing executable control programs. We apply the proposed framework to the calibration of an RGB-D camera and a robotic arm, validating its effectiveness in real-world settings. Furthermore, through a subsequent pick-and-place task, we demonstrate not only the accuracy of the calibration but also the potential extensibility of the framework. Across both tasks, the framework achieved high execution accuracy and autonomy, illustrating the practicality and scalability of LLM-based robotic control using our framework.

2606.03017 2026-06-03 cs.LG cs.AI cs.RO 版本更新

ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL

ConTraIRL:用于可迁移逆强化学习的分解对比抽象

Yikang Gui, Bikramjit Banerjee, Prashant Doshi

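Affiliations: School of Computing, University of Georgia; School of Computing Sciences & Computer Engineering, The University of Southern Mississippi

AI summary: Proposes ConTraIRL, a framework that uses dual-encoder contrastive learning to decouple latent representations of environment dynamics and task goals, enabling compositional reward transfer and improving sample efficiency and reward recovery for few-shot transfer on continuous control benchmarks.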

详情
AI中文摘要

当策略必须泛化到未见过的环境动态与任务目标组合时,逆强化学习中的奖励迁移不可靠。我们提出用于可迁移逆强化学习的分解对比抽象(ConTraIRL),该框架通过学习这两个因素的解耦潜在表示来实现组合奖励迁移。ConTraIRL采用双编码器架构,将观测映射到分离的动态和目标的潜在空间,并通过双重对比目标进行训练。时间对齐鼓励动态编码器学习目标不变的结构,而目标编码器捕获动态不变的特征。这种分解支持在重组动态-目标设置下的奖励推断。在连续控制基准上的实验表明,对未见过的动态-目标配对进行有效的少样本迁移,与迁移逆强化学习基线相比,提高了样本效率和奖励恢复。

英文摘要

Reward transfer in Inverse Reinforcement Learning (IRL) is unreliable when policies must generalize to unseen combinations of environment dynamics and task goals. We propose Factorized Contrastive Abstractions for Transferable IRL (ConTraIRL), a framework that enables compositional reward transfer by learning decoupled latent representations of these two factors. ConTraIRL uses a dual-encoder architecture that maps observations into separate dynamics and goal latent spaces, trained with a dual contrastive objective. Temporal alignment encourages the dynamics encoder to learn goal-invariant structure, while the goal encoder captures dynamics-invariant features. This factorization supports reward inference under recombined dynamics-goal settings. Experiments on continuous control benchmarks demonstrate effective few-shot transfer to unseen dynamics-goal pairings, improving sample efficiency and reward recovery over transfer IRL baselines.
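The abstract does not specify the form of the dual contrastive objective. As a minimal sketch, assuming an InfoNCE-style loss (a common contrastive loss, not confirmed as the authors' choice), one term of such an objective could look like the following. The function name `info_nce`, the temperature, and the toy latent vectors are illustrative assumptions.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # The positive goes first so that logits[0] is the score we want to be large.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy 2D latents (assumed): the loss is small when the positive matches the anchor.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a dual-encoder setup, one such term would be applied to the dynamics latents and another to the goal latents, with positives chosen to share the factor that encoder should keep. How ConTraIRL selects positives and negatives is not stated in the abstract.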

2606.03003 2026-06-03 cs.LG cs.AI cs.RO 版本更新

Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group

精确等变性在训练中保持,实现跨对称群的零样本泛化

Hongbo Wang

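Affiliations: Department of Mathematics, Stony Brook University

AI summary: A latent world model built from an equivariant encoder and an equivariant predictor has a training loss with a provable symmetry, so fitting the dynamics on only a subset of orientations mathematically determines its behavior on the whole orbit, giving zero-shot generalization across the symmetry group.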

Comments 92 pages, 11 figures. Core paper plus an extended results-log appendix and a forward-looking theory supplement. All experiments are laptop-scale (CPU/MPS), fully seeded and deterministic

详情
AI中文摘要

由等变编码器 $E$ 和等变预测器 $f$ 构建的潜世界模型继承了其训练损失的可证明对称性:当世界的动力学真正承载一个群 $G$,通过正交表示 $\rho(g)$ 作用于潜变量时,单步预测 relMSE 在整个群上精确不变,因此仅在方向的受限切片上拟合动力学,数学上就确定了整个轨道上的动力学(举一反三)。我们在笔记本电脑规模(CPU/MPS,完全设定随机种子)上端到端验证了这一点。[A] 该对称性在真实的 Muon/AdamW + EMA + VICReg 运行中幸存——组合的编码-预测残差在优化后约为 $10^{-6}$,不仅在初始化时,而且在任何优化器下都成立。[B] 单步误差在整个群上平坦至五位小数,而相同假设类别的非等变基线拟合了切片但在分布外失效(2D 中 VN $\times 1.00$ 对比基线 $\times 13.8$,3D 中 $\times 17.2$,整个 $\mathrm{SE}(3)$ 阶梯上 $\times 157$),且等变模型小 $4.5$-$7.4$ 倍。[C] 相同的等距论证提升到闭环:在匹配的等变规划器下,方向 $g$ 处的控制轨迹恰好是所见轨迹应用 $\rho(g)$ 的结果,因此闭环误差在整个群上不变——在真实 PushT 上的 2D/$\mathrm{SO}(2)$ 中浮点地板精确,在 3D/$\mathrm{SE}(3)$ 中统计平坦(不相交的 95% 置信区间)。我们针对 Sutton 的苦涩教训对先验进行了压力测试:增强、暴力规模和软等变性各自最多缩小跨群任务指标,但从未达到浮点地板精确性。由于等变性在复合下封闭,$H$ 步展开在每个视界上保持平坦($\times 1.00$,$\le 2\times 10^{-7}$),而基线的残差随 $H$ 复合。超出范围:任务成功扫描、无规划器不变性和缩放。

英文摘要

A latent world model built from an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry of its training loss: when the world's dynamics genuinely carries a group $G$ acting on latents by an orthogonal representation $\rho(g)$, the one-step prediction relMSE is exactly invariant across the whole group, so fitting the dynamics on a restricted slice of orientations mathematically determines it on the entire orbit (jǔ yī fǎn sān). We verify this end-to-end at laptop scale (CPU/MPS, fully seeded). [A] The symmetry survives a real Muon/AdamW + EMA + VICReg run -- composed encode-then-predict residual $\sim 10^{-6}$ after optimisation, not just at initialisation, and under any optimiser. [B] One-step error is flat to five digits across the group, while a same-hypothesis-class non-equivariant baseline fits the slice but breaks out-of-distribution (VN $\times 1.00$ vs baseline $\times 13.8$ in 2D, $\times 17.2$ in 3D, $\times 157$ over the full $\mathrm{SE}(3)$ ladder), with the equivariant model $4.5$-$7.4\times$ smaller. [C] The same isometry argument lifts to closed loop: under a matching equivariant planner the control trajectory at orientation $g$ is exactly $\rho(g)$ applied to the seen one, so closed-loop error is invariant across the group -- float-floor-exact in 2D/$\mathrm{SO}(2)$ on real PushT and statistically flat in 3D/$\mathrm{SE}(3)$ (disjoint 95% CIs). We stress-test the prior against Sutton's Bitter Lesson: augmentation, brute-force scale, and soft-equivariance each close at most the across-group task metric, never the float-floor exactness. Because equivariance is closed under composition, the $H$-fold rollout stays flat ($\times 1.00$, $\le 2\times 10^{-7}$) at every horizon, while the baseline's residual compounds with $H$. Out of scope: task-success sweeps, planner-free invariance, and scaling.
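The invariance claim can be checked numerically in the simplest case. The sketch below assumes $G = \mathrm{SO}(2)$ acting on a 2D latent by rotation matrices, and uses hand-picked 2x2 linear maps as stand-ins for the true dynamics and for the two predictors. None of these matrices or names come from the paper; they only illustrate why a predictor that commutes with the group action has the same relative error at every orientation.

```python
import math


def rotate(theta, z):
    """Apply the orthogonal representation rho(g) of a planar rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * z[0] - s * z[1], s * z[0] + c * z[1])


def matvec(m, z):
    return (m[0][0] * z[0] + m[0][1] * z[1], m[1][0] * z[0] + m[1][1] * z[1])


def rel_mse(pred, target):
    """Squared error divided by the squared norm of the target."""
    num = sum((p - t) ** 2 for p, t in zip(pred, target))
    den = sum(t ** 2 for t in target)
    return num / den


# Matrices of the form [[a, -b], [b, a]] commute with every planar rotation.
TRUE_DYNAMICS = [[0.9, -0.3], [0.3, 0.9]]
EQUIVARIANT = [[0.8, -0.25], [0.25, 0.8]]  # same structure, imperfect fit
GENERIC = [[0.8, -0.10], [0.25, 0.7]]      # does not commute with rotations


def error_at(model, theta, z=(1.0, 0.5)):
    """One-step relative error when the latent is seen at orientation theta."""
    zg = rotate(theta, z)
    return rel_mse(matvec(model, zg), matvec(TRUE_DYNAMICS, zg))


angles = [k * math.pi / 6 for k in range(12)]
equivariant_errors = [error_at(EQUIVARIANT, t) for t in angles]
generic_errors = [error_at(GENERIC, t) for t in angles]
```

The equivariant predictor is wrong by the same relative amount at every angle, so measuring its error on one slice of orientations fixes it on the whole orbit. The generic predictor's error depends on the angle, which is the failure mode the abstract attributes to the non-equivariant baseline.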

2606.02996 2026-06-03 cs.RO cs.CV cs.HC 版本更新

MARIO: Motion-Augmented Real-Time Multi-Sensor Inertial Odometry

MARIO: 运动增强的实时多传感器惯性里程计

Yiquan Li, Taeyoung Yeon, Chenfeng Gao, Vasco Xu, Xuanyou Liu, Karan Ahuja

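Affiliations: Northwestern University; University of Chicago

AI summary: Proposes MARIO, which constrains motion dynamics with a learned IMU-inferred human pose prior and adds multi-sensor fusion (magnetometer, barometer, secondary IMU). On the Nymeria dataset it reduces positional drift by up to 36% with the pose prior and by up to 42% with fusion, giving accurate and robust inertial odometry for camera-less human tracking.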

Comments CVPR 2026 Findings

详情
AI中文摘要

仅使用惯性测量单元(IMU)的惯性里程计(IO)为增强现实(AR)和可穿戴设备中的人体运动跟踪提供了轻量级解决方案。最近的基于学习的IO方法通过在大规模人体运动数据集上进行预训练,提高了惯性定位的泛化能力。然而,这些方法仍然容易受到漂移和噪声的影响,因为它们没有显式捕捉人体运动动力学,尤其是在日常活动数据集(如Nymeria)上。在这项工作中,我们提出通过学习的IMU推断姿态先验将惯性里程计建立在人体运动学基础上,该先验促进物理一致的运动约束。我们将此姿态先验集成到现有IO架构中,并在具有挑战性的Nymeria数据集上将位置漂移减少高达36%,该数据集比先前工作中使用的数据集大5倍。我们进一步通过传感器融合框架改进了长期性能,该框架整合了商用AR眼镜上已有的轻量级传感器的辅助信号,包括磁力计、气压计和辅助IMU。通过这种融合策略,位置漂移减少了高达42%,提高了在不同运动条件下的鲁棒性和泛化能力。总之,我们的结果通过将人体运动学与多模态传感统一起来,为惯性轻量级里程计引入了新范式,为准确鲁棒的无相机人体跟踪设立了新基准。我们的网站位于此https URL。

英文摘要

Inertial odometry (IO) using only Inertial Measurement Units (IMUs) provides a lightweight solution for human motion tracking in augmented reality (AR) and wearable devices. Recent learning-based IO methods have improved the generalizability of inertial localization through large-scale pretraining on human motion datasets. However, these approaches remain prone to drift and noise because they do not explicitly capture human motion dynamics, especially on daily activity datasets such as Nymeria. In this work, we propose to ground inertial odometry in human kinematics through a learned IMU-inferred pose prior, which promotes physically consistent motion constraints. We integrate this pose prior into existing IO architectures and reduce positional drift by up to 36% on the challenging Nymeria dataset, which is 5x larger than datasets used in prior work. We further improve long-term performance with a sensor-fusion framework that incorporates auxiliary signals from lightweight sensors already available on commercial AR glasses, including magnetometers, barometers, and secondary IMUs. With this fusion strategy, positional drift is reduced by up to 42%, improving robustness and generalization across diverse motion conditions. Together, our results introduce a new paradigm for inertial and lightweight odometry by unifying human motion kinematics with multimodal sensing, setting a new benchmark for accurate and robust camera-less human tracking. Our website is available at https://spice-lab.org/projects/MARIO/.

2606.02979 2026-06-03 cs.CV cs.AI cs.RO 版本更新

Towards Compact Autonomous Driving Perception with Balanced Learning and Multi-sensor Fusion

面向紧凑型自动驾驶感知的平衡学习与多传感器融合

Oskar Natan, Jun Miura

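Affiliations: Department of Computer Science and Engineering, Toyohashi University of Technology; Department of Computer Science and Electronics, Gadjah Mada University

AI summary: Proposes a compact deep multi-task learning model that uses adaptive loss weighting and intermediate sensor fusion to perform semantic segmentation, depth estimation, LiDAR segmentation, and bird's-eye-view projection in a single forward pass for efficient autonomous driving perception.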

Comments This work has been accepted for publication in IEEE Transactions on Intelligent Transportation Systems. https://ieeexplore.ieee.org/document/9712213

详情
AI中文摘要

我们提出了一种新颖的紧凑型深度多任务学习模型,能够在一次前向传播中处理多种自动驾驶感知任务。该模型同时执行多视角语义分割、深度估计、激光雷达分割和鸟瞰投影,无需其他模型支持。我们还提供了一种自适应损失加权算法,以解决因任务众多而出现的学习不平衡问题。通过数据预处理和中间传感器融合技术,该模型可以处理并组合来自RGB摄像头、动态视觉传感器(DVS)和安装在自车多个位置的激光雷达的多种输入模态。因此,可以更好地理解动态变化的环境。基于消融研究,使用我们提出的方法训练的模型变体取得了更好的性能。此外,还进行了比较研究,以阐明其与一些近期模型组合相比的性能和有效性。结果表明,即使参数少得多,我们的模型仍能保持更好的性能。因此,该模型可以更快地推理,并减少GPU内存使用。此外,结果在3个不同的CARLA仿真数据集和1个真实世界的nuScenes-lidarseg数据集上保持一致。为了支持未来的研究,我们在以下网址公开共享代码和其他文件:https://this URL。

英文摘要

We present a novel compact deep multi-task learning model to handle various autonomous driving perception tasks in one forward pass. The model performs multi-view semantic segmentation, depth estimation, light detection and ranging (LiDAR) segmentation, and bird's eye view projection simultaneously without support from other models. We also provide an adaptive loss weighting algorithm to tackle the imbalanced learning that arises from the large number of tasks. Through data pre-processing and intermediate sensor fusion techniques, the model can process and combine multiple input modalities retrieved from RGB cameras, dynamic vision sensors (DVS), and LiDAR placed at several positions on the ego vehicle. Therefore, a better understanding of a dynamically changing environment can be achieved. Based on the ablation study, the model variant trained with our proposed method achieves better performance. Furthermore, a comparative study is conducted to clarify its performance and effectiveness against combinations of some recent models. As a result, our model maintains better performance even with far fewer parameters. Hence, the model can run inference faster with less GPU memory utilization. Moreover, the results tend to be consistent across 3 different CARLA simulation datasets and 1 real-world nuScenes-lidarseg dataset. To support future research, we share code and other files publicly at https://github.com/oskarnatan/compact-perception.

2606.02969 2026-06-03 cs.RO math.OC 版本更新

Hybrid Dynamics Modeling for a Flexible 2-DoF Robotic Arm

柔性2自由度机械臂的混合动力学建模

Maciek Popik, Daniel Yang, Mahdis Bisheban

Affiliations: Dept. of Mechanical and Manufacturing Eng at the Schulich School of Engineering, University of Calgary, Alberta, Canada; Schulich School of Engineering at the University of Calgary; Intelligent Dynamics and Control Lab; University of Calgary

AI summary: To address unmodeled dynamics that rigid-body models miss, the paper models a flexible 2-DoF robotic arm by combining rigid-body dynamics with a Gaussian Mixture Model, uses a purely data-driven regression as a baseline, and compares the torque-prediction accuracy of the approaches.

详情
AI中文摘要

本文研究了三种对柔性连杆2自由度机械臂动力学进行建模的方法,以解决刚体模型无法捕获的未建模动力学。两种物理信息模型将刚体动力学(RBD)公式与高斯混合模型(GMM)相结合,以捕获残差模型误差和连杆柔性。一个基于运动学的回归模型作为纯数据驱动的基线。使用开源数据集,首先通过运动学特征的岭回归估计扭矩预测,而基于物理的基线则根据公布的规格构建,随后使用普通最小二乘回归直接从数据估计相同的参数集。结果表明,基于物理的参数精度最差,而正则化和最小二乘估计器与实测扭矩更吻合。残差分析和误差指标凸显了纯参数模型在柔性连杆系统中的局限性,并强调了正则化和数据驱动辨识的价值,支持了半参数残差学习方法的发展。

英文摘要

This paper examines three approaches for modeling the dynamics of a flexible-link 2-DoF robotic arm to address unmodeled dynamics not captured by rigid-body models. Two physics-informed models combine rigid-body dynamics (RBD) formulations with a Gaussian Mixture Model (GMM) to capture residual model errors and linkage flexibility. A kinematics-based regression model serves as a purely data-driven baseline. Using an open-source dataset, torque predictions are first estimated using Ridge regression on kinematic features, while the physics-based baseline is constructed from published specifications, and ordinary least-squares regression is subsequently used to estimate the same parameter set directly from data. Results show that the physics-based parameters yield the poorest accuracy, while regularized and least-squares estimators align more closely with measured torques. Residual analysis and error metrics highlight the limitations of purely parametric models for flexible-link systems and underscore the value of regularization and data-driven identification, supporting the development of semi-parametric residual learning methods.
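The difference between the ordinary least-squares and Ridge estimators mentioned above can be shown on a toy problem. The sketch below assumes a hypothetical single-joint model, torque = 0.5 * acceleration + 2.0 * sin(angle), with two kinematic features per sample. The model, the feature choice, and the data are illustrative assumptions and are not taken from the paper or its dataset.

```python
import math


def fit_linear(features, torques, ridge=0.0):
    """Solve (X^T X + ridge * I) w = X^T y for two features per sample."""
    a = sum(x[0] * x[0] for x in features) + ridge
    b = sum(x[0] * x[1] for x in features)
    d = sum(x[1] * x[1] for x in features) + ridge
    r0 = sum(x[0] * y for x, y in zip(features, torques))
    r1 = sum(x[1] * y for x, y in zip(features, torques))
    det = a * d - b * b
    # Closed-form inverse of the 2x2 normal-equation matrix.
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)


# Synthetic, noise-free samples from the assumed toy model.
joint_angles = [0.1 * k for k in range(1, 11)]
joint_accels = [0.3 * k * (-1) ** k for k in range(1, 11)]
features = [(acc, math.sin(q)) for q, acc in zip(joint_angles, joint_accels)]
torques = [0.5 * f[0] + 2.0 * f[1] for f in features]

ols = fit_linear(features, torques)               # ridge = 0 gives least squares
ridge = fit_linear(features, torques, ridge=1.0)  # penalty shrinks the weights
```

With noise-free data, least squares recovers the generating parameters, while the Ridge penalty pulls the weights toward zero. On real measurements that shrinkage trades a small bias for lower variance, which is one reason a regularized estimator can track measured torques well even when the parameters are not physically exact.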

2606.02956 2026-06-03 cs.CV cs.LG cs.RO 版本更新

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

自动驾驶的未来之路:KITScenes多模态数据集

Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, Nils Rack, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

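Affiliations: FZI Research Center for Information Technology; Karlsruhe Institute of Technology; University Charles III of Madrid; Delft University of Technology

AI summary: Presents the KITScenes Multimodal dataset, which uses high-fidelity sensors and complete HD maps to address gaps in sensor fidelity, map completeness, and geographic diversity in existing datasets, and introduces four benchmarks to advance spatial learning.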

Comments 28 pages, 21 figures

详情
AI中文摘要

现有的自动驾驶数据集取得了重大进展,但在传感器保真度、地图完整性或地理多样性方面仍存在不足。我们提出了KITScenes多模态数据集,这是一个基于高保真传感器和地图构建的欧洲数据集。我们完全同步的传感器套件结合了高分辨率全局快门相机、超过400米的长距离激光雷达、4D成像雷达以及冗余的GNSS/INS定位。据我们所知,我们的HD地图是任何传感器数据集中最完整的,并通过开源软件上的自动驾驶试验进行了验证。首次在公共数据集中,所有与驾驶相关的交通元素(如交通灯)都以3D方式映射到重投影精确的水平,并具有完整的拓扑连接。我们的数据集记录在街道布局不规则且交通模式混合的城市中,通过拓宽可用的地理多样性来补充现有数据集。我们还引入了四个基准,每个基准都推动了具身AI的空间学习:在线HD地图构建、长距离深度估计、新颖视图合成和端到端驾驶。项目页面:此https URL

英文摘要

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/

2606.02951 2026-06-03 cs.RO cs.AI cs.CL cs.CV cs.HC 版本更新

SCOPE: Real-Time Natural Language Camera Agent at the Edge

SCOPE:边缘实时自然语言相机代理

Nikolaj Hindsbo, Sina Ehsani, Pragyana Mishra

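Affiliations: Armada AI

AI summary: Proposes SCOPE, a modular agent for natural-language control of PTZ cameras that runs perception, planning, and control in real time at the edge, and evaluates latency, accuracy, and error modes in simulation and on a physical camera.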

Comments 9 pages, 4 figures, 6 tables. Accepted at HRI '26 (21st ACM/IEEE International Conference on Human-Robot Interaction), Edinburgh, Scotland, March 16--19, 2026. Code: https://github.com/HindsboNikolaj/SCOPE

详情
Journal ref
Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI '26), ACM, 2026
AI中文摘要

在机器人领域部署语言驱动的代理需要能够反映现实任务需求的评估:自然语言指令与可重复的结果。此类代理必须将语言模型连接到可调用的感知和控制工具,并使用部署关键指标(包括延迟、准确性和错误模式)进行评估。我们提出了SCOPE(用于感知和评估的仿真与相机操作),这是一个模块化代理,用于自然语言、开放词汇的云台变焦(PTZ)相机控制和视觉场景理解,专门为边缘部署设计。SCOPE既可在基于Blender的仿真环境中运行,也可在物理PTZ相机上运行,所有感知、规划和控制均在部署现场使用边缘可访问的计算资源本地执行。我们发布了一个包含536个任务的基准测试,涵盖问答、单步和多步命令、计数、空间推理、描述以及光学字符识别,在基于Blender的仿真环境中提供逼真的PTZ控制功能。执行轨迹与LM作为评判器结合,以评估延迟、准确性和错误模式。我们评估了19种规划器-感知模型组合,将Qwen3小语言模型(SLM)与Moondream和Qwen视觉语言模型(VLM)配对。更强的SLM显著减少了幻觉并改善了工具路由,从而实现了更可靠的闭环行为。一旦使用了足够强大的SLM,感知就成为主要的性能瓶颈。在规划和感知方面,混合专家模型在延迟和内存占用与更小网络相当的情况下,始终匹配或超过密集替代方案。量化在精度损失最小的情况下提供了额外的效率提升,为实时、边缘可行的语言驱动PTZ控制确定了一个实用的、从仿真到现实验证的设计点。

英文摘要

Deploying language-driven agents in robotics requires evaluations that reflect real-world task demands: natural-language instructions with reproducible outcomes. Such agents must connect language models to callable perception and control tools, and be assessed using deployment-critical metrics including latency, accuracy, and error modes. We present SCOPE (Simulation and Camera Operations for Perception and Evaluation), a modular agent for natural-language, open-vocabulary pan-tilt-zoom (PTZ) camera control and visual scene understanding, designed explicitly for edge deployment. SCOPE operates both in a Blender-based simulation environment and on a physical PTZ camera, executing all perception, planning, and control locally at the deployment site using edge-accessible compute. We release a 536-task benchmark spanning QA, single- and multi-step commands, counting, spatial reasoning, descriptions, and optical character recognition in a Blender-based simulation environment that exposes realistic PTZ control affordances. Execution traces are combined with an LM-as-Judge to evaluate latency, accuracy, and error modes. We evaluate 19 planner-perception model combinations pairing Qwen3 small language models (SLMs) with Moondream and Qwen vision-language models (VLMs). Stronger SLMs substantially reduce hallucinations and improve tool routing, leading to more reliable closed-loop behavior. Once a sufficiently capable SLM is used, perception becomes the dominant performance bottleneck. Mixture-of-Experts models on both the planning and perception side consistently match or exceed dense alternatives at latencies and memory footprints comparable to much smaller networks. Quantization provides additional efficiency gains with minimal accuracy degradation, identifying a practical, sim-to-real validated design point for real-time, edge-feasible language-driven PTZ control.

2606.02928 2026-06-03 cs.RO 版本更新

Improved Postural Stability Using a Lightweight Semi-Active Soft Back Support Device Under Standing Perturbations

使用轻量级半主动软背部支撑装置在站立扰动下改善姿势稳定性

Rohan Khatavkar, Jiefeng Sun, Hyunglae Lee

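Affiliations: School for Engineering of Matter, Transport and Energy

AI summary: Presents a lightweight semi-active soft back-support device that combines a pneumatic artificial muscle with an elastic band. By delivering assistive force quickly, it significantly reduces whole-body angular momentum and increases the margin of stability, improving balance recovery after standing perturbations.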

Comments 6 pages, 8 figures, submitted to IROS 2026, the IEEE/RSJ International Conference on Intelligent Robots and Systems

详情
AI中文摘要

老年人在站立时受到扰动(如向前失去平衡)后特别容易跌倒。辅助躯干伸展的背部支撑装置可能通过防止过度躯干屈曲来帮助减轻跌倒风险。先前的研究已经研究了重型背部支撑装置;然而,这些系统由于其附加质量往往对稳定性产生不利影响,这会使身体自然重心发生不利的偏移。相比之下,轻量级被动装置显示出有限的益处,因为它们在向前平衡丧失相关的相对较小的躯干屈曲期间只能产生适度的辅助力。在本研究中,我们评估了一种轻量级半主动软背部支撑装置在站立扰动后对姿势稳定性的影响。我们的装置将一个主动元件(气动人工肌肉)与一个被动弹性带并联。主动元件在扰动后快速提供辅助力,克服了被动装置的局限性。对五名健康个体进行的实验表明,半主动装置显著降低了全身角动量并增加了稳定裕度,表明平衡恢复性能得到改善。这些结果突显了半主动软可穿戴机器人作为站立扰动期间跌倒预防的有效且轻量级策略的前景。

英文摘要

Older adults are particularly susceptible to falls following perturbations during standing, such as forward loss of balance. Back support devices that assist trunk extension may help mitigate fall risk by preventing excessive trunk flexion. Previous studies have investigated heavy back support devices; however, these systems often introduced adverse effects on stability due to their added mass, which shifted the body's natural center of mass unfavorably. In contrast, lightweight passive devices have shown limited benefits, as they can generate only modest assistive forces during the relatively small trunk flexion associated with forward balance loss. In this study, we evaluated the effects of a lightweight semi-active soft back support device on postural stability following standing perturbations. Our device combines an active element (a pneumatic artificial muscle) in parallel with a passive elastic band. The active element rapidly provides assistive force following a perturbation, overcoming the limitations of passive devices. Experiments conducted with five healthy individuals demonstrated that the semi-active device significantly reduced whole-body angular momentum and increased the margin of stability, indicating improved balance recovery performance. These results highlight the promise of semi-active soft wearable robots as an effective and lightweight strategy for fall prevention during standing perturbations.

2606.02888 2026-06-03 cs.RO 版本更新

Impact of a Soft Wearable Back-Support Device on Postural Stability during Trip-Like Perturbations

软性可穿戴背部支撑装置在类似绊倒扰动下对姿势稳定性的影响

Yuanhao Chen, Rohan Khatavkar, Soubhagya Nayak, Jiefeng Sun, Hyunglae Lee

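Affiliations: School for Engineering of Matter, Transport and Energy, Arizona State University

AI summary: Using perturbed-standing and perturbed-walking experiments, the study examines how a soft wearable back-support device affects postural stability under trip-like perturbations. Wearing the device raises the minimum margin of stability, indicating improved reactive balance control and potential for fall prevention.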

Comments 6 pages, 6 figures, to be published in the proceedings of the 2026 11th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob)

详情
AI中文摘要

通过两种实验范式(扰动站立和扰动行走)研究了软性可穿戴背部支撑装置在类似绊倒扰动下对姿势稳定性的增强效果。健康受试者在三种不同的背部支撑条件下完成试验:无装置、低刚度装置、高刚度激活装置。使用最大不稳定点的最小稳定裕度(MOS)量化全身稳定性。结果表明,使用装置时MOS增加,表明姿势稳定性增强。在站立条件下,MOS随装置刚度显著增加;而在行走条件下,两种装置条件相比无装置均改善了MOS,但两者之间无显著差异。这些发现凸显了具有可调刚度的软性可穿戴背部支撑装置在改善对外部扰动的反应性平衡控制方面的潜力,对防跌倒具有重要意义。未来研究应探索个性化刚度优化,并评估在跌倒高风险人群中的有效性。

英文摘要

The effectiveness of a soft wearable back-support device in enhancing postural stability was investigated under trip-like perturbations using two experimental paradigms: perturbed standing and perturbed walking. Healthy subjects completed trials under three different back-support conditions: no device, device worn with low stiffness, and device activated with high stiffness. Whole-body stability was quantified using the minimum Margin of Stability (MOS) at the point of maximal instability. Results demonstrated increased MOS during device use, indicating enhanced postural stability. In standing, MOS increased significantly with device stiffness, whereas in walking, both device conditions improved MOS relative to no device but did not differ significantly from each other. These findings highlight the potential of soft wearable back-support devices with adjustable stiffness to improve reactive balance control against external perturbations, with important implications for fall prevention. Future research should explore personalized stiffness optimization and evaluate efficacy in populations at elevated risk of falls.
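The abstract does not state how the Margin of Stability was computed. A common definition, assumed here, is the inverted-pendulum one: the extrapolated center of mass is the center-of-mass position plus its velocity divided by $\omega_0 = \sqrt{g / l}$, and the margin is the distance from that point to the boundary of the base of support. The sketch below is one-dimensional, and the function names, leg length, and sample trajectory are illustrative assumptions.

```python
import math


def extrapolated_com(com_pos, com_vel, leg_length, gravity=9.81):
    """Center-of-mass position shifted by velocity over the pendulum frequency."""
    omega0 = math.sqrt(gravity / leg_length)
    return com_pos + com_vel / omega0


def margin_of_stability(bos_boundary, com_pos, com_vel, leg_length):
    """Distance from the extrapolated center of mass to the support boundary."""
    return bos_boundary - extrapolated_com(com_pos, com_vel, leg_length)


def minimum_mos(bos_boundary, com_positions, com_velocities, leg_length):
    """Smallest margin over a trial, i.e. the point of maximal instability."""
    return min(
        margin_of_stability(bos_boundary, p, v, leg_length)
        for p, v in zip(com_positions, com_velocities)
    )


# Hypothetical forward-perturbation trial (meters and meters per second).
positions = [0.00, 0.03, 0.06, 0.05]
velocities = [0.10, 0.35, 0.20, -0.05]
worst = minimum_mos(0.20, positions, velocities, leg_length=0.9)
```

Under this definition a larger minimum margin means the extrapolated center of mass stayed further inside the base of support at the worst moment of the trial, which is the sense in which an increased MOS indicates better stability.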

2606.02879 2026-06-03 cs.RO 版本更新

Direct Informed Sampling on Riemannian Manifolds via Loewner Order Lower Bounds

基于Loewner序下界的黎曼流形直接知情采样

Phone Thiha Kyaw, Jonathan Kelly

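Affiliations: Space and Terrestrial Autonomous Robotic Systems (STARS) Laboratory, University of Toronto Institute for Aerospace Studies (UTIAS)

AI summary: Proposes a matrix-valued admissible heuristic that uses the Loewner order to compute the tightest constant lower bound on the metric tensor. The bound maps the Riemannian informed set to a standard prolate hyperspheroid in an isotropic Euclidean space, enabling direct, rejection-free sampling that accelerates convergence of several optimal motion planners.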

Comments Submitted to IEEE Robotics and Automation Letters (RA-L)

详情
AI中文摘要

知情采样技术通过将搜索聚焦于状态空间的有希望区域来加速基于采样的运动规划器,然而大多数现有方法依赖于欧氏启发式,这些启发式在依赖于构型的黎曼度量下变得不可容许。虽然标量特征值下界通过均匀缩放欧氏距离恢复了可容许性,但它们丢弃了度量的方向结构,产生过于保守的知情集。我们提出一种矩阵值可容许启发式,利用对称正定矩阵上的Loewner序计算度量张量最紧的常数下界,同时保留其完整的方向结构。该下界的Cholesky分解定义了一个到各向同性欧氏空间的线性映射,在该空间中黎曼知情集简化为标准的长球超椭球,从而能够使用现有算法进行直接无拒绝采样。在6自由度UR5、7自由度Franka和14自由度PR2上三种不同黎曼度量下的操作任务实验表明,我们的启发式产生的知情集始终比欧氏和标量特征值下界更紧,加速了多种最先进渐近最优规划器的收敛。

英文摘要

Informed sampling techniques accelerate sampling-based motion planners by focusing the search on promising regions of the state space, yet most existing methods rely on Euclidean heuristics that become inadmissible under configuration-dependent Riemannian metrics. While scalar eigenvalue bounds restore admissibility by uniformly scaling the Euclidean distance, they discard the directional structure of the metric, producing overly conservative informed sets. We propose a matrix-valued admissible heuristic that exploits the Loewner order on symmetric positive definite matrices to compute the tightest constant lower bound on the metric tensor while preserving its full directional structure. The Cholesky factorization of this bound defines a linear map to an isotropic Euclidean space in which the Riemannian informed set reduces to a standard prolate hyperspheroid, enabling direct, rejection-free sampling using existing algorithms. Experiments on manipulation tasks with a 6-DoF UR5, 7-DoF Franka, and 14-DoF PR2 under three distinct Riemannian metrics show that our heuristic produces consistently tighter informed sets than both the Euclidean and scalar eigenvalue bounds, accelerating convergence across multiple state-of-the-art asymptotically optimal planners.
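The role of the Loewner order and the Cholesky factor can be illustrated in two dimensions. The sketch below does not compute the tightest bound, which is the paper's contribution. It only checks that a hand-picked constant matrix lies below a toy configuration-dependent metric in the Loewner order, then compares the resulting matrix-valued heuristic with the scalar eigenvalue one. The metric, the candidate bound, and all function names are assumptions made for illustration.

```python
import math


def min_eig_sym2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)


def loewner_leq(lower, upper, tol=1e-12):
    """True if upper - lower is positive semidefinite."""
    diff = [[upper[i][j] - lower[i][j] for j in range(2)] for i in range(2)]
    return min_eig_sym2(diff) >= -tol


def cholesky2(m):
    """Lower-triangular L with m = L L^T for a 2x2 SPD matrix."""
    l00 = math.sqrt(m[0][0])
    l10 = m[1][0] / l00
    l11 = math.sqrt(m[1][1] - l10 * l10)
    return [[l00, 0.0], [l10, l11]]


def matrix_heuristic(lower, x, y):
    """Euclidean norm of L^T (x - y), which equals sqrt(d^T lower d)."""
    l = cholesky2(lower)
    d = (x[0] - y[0], x[1] - y[1])
    return math.hypot(l[0][0] * d[0] + l[1][0] * d[1], l[1][1] * d[1])


def scalar_heuristic(lower, x, y):
    """Euclidean distance scaled by the square root of the smallest eigenvalue."""
    return math.sqrt(min_eig_sym2(lower)) * math.hypot(x[0] - y[0], x[1] - y[1])


def metric(q):
    """Toy configuration-dependent metric tensor (assumed)."""
    return [[2.0 + math.sin(q) ** 2, 0.3], [0.3, 1.0 + math.cos(q) ** 2]]


LOWER = [[2.0, 0.3], [0.3, 1.0]]  # candidate constant lower bound (assumed)
bound_holds = all(loewner_leq(LOWER, metric(0.1 * k)) for k in range(63))
h_matrix = matrix_heuristic(LOWER, (1.0, 0.0), (0.0, 0.0))
h_scalar = scalar_heuristic(LOWER, (1.0, 0.0), (0.0, 0.0))
```

Because the constant matrix lies below the metric everywhere sampled, path length under the true metric is at least the straight-line length under the constant one, so the matrix heuristic never overestimates. It is also never smaller than the scalar heuristic built from the same bound, which is why keeping the directional structure gives a tighter informed set.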

2606.02872 2026-06-03 eess.SY cs.MA cs.RO cs.SY 版本更新

Terminal Time and Angle-Constrained Nonlinear Intercept Guidance

终端时间和角度约束的非线性拦截制导

Shivam Bajpai, Abhinav Sinha

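Affiliations: University of California

AI summary: For the underactuated nonlinear intercept problem with a single control input, proposes a hierarchical sliding-mode guidance law that controls terminal time and angle simultaneously and extends to intercepting constant-velocity targets.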

详情
AI中文摘要

本文考虑使用横向加速度作为唯一控制输入,同时控制拦截器的撞击时间和撞击角度的问题。由于单一控制输入,非线性交战运动学本质上是欠驱动的,这使得制导律综合变得复杂。为了克服这一挑战,开发了一种基于分层滑模的制导律,以同时调节两个终端约束。所提出的架构包括一个两层滑模流形。第一层由分别对应撞击时间和撞击角度误差动力学的两个子滑模面组成,而第二层引入了一个组合两个单独子滑模面的复合滑模流形。然后,设计了一种变增益自适应制导律,以确保对静止目标的带时间和角度约束的拦截,并进一步扩展至拦截常速目标。针对各种交战场景进行了仿真,以证明所提出方法的有效性。

英文摘要

This paper considers the problem of simultaneously controlling an interceptor's impact time and impact angle using its lateral acceleration as the sole control input. With a single control input, the nonlinear engagement kinematics are inherently underactuated, which complicates guidance law synthesis. To overcome this challenge, a hierarchical sliding mode-based guidance law is developed to concurrently regulate the two terminal constraints. The proposed architecture consists of a two-layer sliding manifold. The first layer comprises two sub-sliding surfaces corresponding to the impact time and impact angle error dynamics, respectively, while the second layer introduces a composite sliding manifold that combines the two individual sub-surfaces. Then, a variable-gain adaptive guidance law is designed to ensure time- and angle-constrained interception against a stationary target, which is further extended to intercept a constant-velocity target. Simulations are conducted for various engagement scenarios to attest to the efficacy of the proposed approach.

2606.02796 2026-06-03 cs.RO 版本更新

A Measurement-Driven Digital Twin Architecture for Plant-Level Biomass Estimation and Growth Forecasting in Hydroponic Systems

基于测量驱动的数字孪生架构:用于水培系统中植物级生物量估计与生长预测

Morgan Mayborne, Abhisesh Silwal, George Kantor

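Affiliations: The Robotics Institute, Carnegie Mellon University

AI summary: Proposes a digital twin architecture that combines sensor data with model updates, estimates lettuce mass in real time from RGB-D images with a neural network, and forecasts growth one to four days ahead with an error of about 2 g.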

Comments 7 pages, 6 figures

详情
AI中文摘要

针对密集城市中心的食品分配问题,已开发出水培等替代土壤园艺的方法。本文开发了一种新系统,利用测量信息流和可用模型,持续更新水培环境中单个生菜植株的生长轨迹估计。这些“数字孪生”模型被集成到一个运行中的水培温室中,配备定制园艺和传感器硬件以生长和测量相关信息。为辅助更新模型参数,使用自定义神经网络连续测量植物产量,输入为植物的RGB-D图像。该网络在1300张图像的收集数据集上训练,能够估计质量,误差在真实值的1.5克以内。集成到定制系统后,数字孪生生长预测可近似未来1至4天的产量,保持约2克的预测误差。

英文摘要

Alternatives to soil-based horticulture, such as hydroponics, have been developed to respond to food distribution concerns for dense urban centers. A new system was developed to track an individual lettuce plant's growth in a hydroponic environment, utilizing streams of measured information and available models to continuously update the growth trajectory estimates for a plant. These "digital twin" models were integrated into an operating hydroponic greenhouse, with custom horticultural and sensor hardware to grow and measure relevant information. To aid in updating model parameters, plant yield was continuously measured with a custom neural network, using RGB-D images of the plants as an input. The network, trained on a collected dataset of 1300 images, was able to estimate mass within 1.5 g of the ground-truth value. After integration into the custom system, digital twin growth projections could approximate future yield between one and four days in the future, maintaining around a 2 g forecasting error.

2606.02775 2026-06-03 cs.AI cs.AR cs.DC cs.PF cs.RO 版本更新

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

AURA: 恒定VRAM下机器人策略的动作门控记忆

Josef Chen

发表机构 * KAIKAKU(卡基库)

AI总结 提出AURA-Mem,一种恒定大小、基于动作误差信号门控写入的循环记忆,替代KV缓存,在边缘机器人任务中实现与基线相当的准确率,同时减少5-9倍写入次数。

详情
AI中文摘要

KV缓存是数据中心合适的记忆,但却是机器人错误的记忆。数据中心推理批量处理许多短请求并重置它们,在众多请求中分摊注意力缓存。具身智能体则在带宽受限的边缘硬件上运行一个长且不重置的回合,其中高带宽内存和闪存稀缺,闪存写入寿命有限,内存写入而非计算可能成为约束瓶颈。AURA-Mem(动作效用循环自适应记忆)针对这一场景。它用一个固定大小的循环记忆和一个学习得到的门控包装冻结的视觉-语言-动作骨干网络,该门控仅在当前观测会改变下一个动作时写入:一种知道何时保持沉默的记忆。与基于重建的记忆不同,该门控直接针对闭环动作误差信号进行训练。其推理状态固定为4,224字节,与任务时域长度无关,而KV缓存在100,000步时会增长到其6,061倍。在受控的合成基准测试中,AURA-Mem在准确率上与最佳的O(1)基线相当,同时写入次数减少5.19-6.13倍,在更简单的配置上最多减少9.19倍。预算匹配的随机和周期性调度无法恢复这一增益,从而将收益归因于动作意外(action-surprise)信号。在LIBERO-Long上经过训练的闭环OpenVLA-OFT 7B实验中(每个实验组n=60个回合),门控不会损害成功率:AURA-Mem与无门控基础策略(0.233)相当,并略超过始终写入的KV组(0.217),同时写入次数减少7.0倍且内存恒定。我们还实例化了一个近似信息状态价值损失界限作为方法论演示;在此规模下,该界限是空洞的,并不构成保证。

英文摘要

The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint. AURA-Mem (Action-Utility Recurrent Adaptive Memory) targets this regime. It wraps a frozen vision-language-action backbone with a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action: memory that knows when to stay silent. Unlike reconstruction-based memory, the gate is trained directly against a closed-loop action-error signal. Its inference state is fixed at 4,224 bytes regardless of horizon, while a KV-cache grows to 6,061 times larger at 100,000 steps. On a controlled synthetic benchmark, AURA-Mem matches the best O(1) baseline in accuracy while using 5.19-6.13 times fewer writes, and up to 9.19 times fewer writes on easier configurations. Budget-matched random and periodic schedules do not recover this gain, isolating the benefit to the action-surprise signal. On a trained closed-loop OpenVLA-OFT 7B panel on LIBERO-Long (n=60 episodes per arm), the gate does not hurt success: AURA-Mem matches the ungated base policy (0.233) and slightly exceeds an always-write KV arm (0.217), while using 7.0 times fewer writes and constant memory. We also instantiate an approximate-information-state value-loss bound as a methodology demonstration; at this scale, the bound is vacuous rather than a guarantee.
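
The memory figures quoted above can be sanity-checked with a little arithmetic, and the idea of a write gate can be illustrated with a toy rule. The sketch below is not the AURA-Mem implementation: the fixed threshold, the scalar actions, and the names `should_write` and `count_writes` are assumptions made purely for illustration (the abstract describes a learned gate). Only the 4,224-byte state, the 6,061x ratio, and the 100,000-step horizon come from the abstract.

```python
STATE_BYTES = 4_224   # fixed inference state reported in the abstract
KV_RATIO = 6_061      # reported KV-cache size relative to that state
STEPS = 100_000       # horizon at which the ratio is reported

kv_bytes = STATE_BYTES * KV_RATIO       # implied KV-cache size in bytes
kv_bytes_per_step = kv_bytes / STEPS    # implied average growth per step


def should_write(action_without_obs: float, action_with_obs: float,
                 threshold: float = 0.05) -> bool:
    """Hypothetical gate: write only if the observation changes the action."""
    return abs(action_with_obs - action_without_obs) > threshold


def count_writes(action_pairs, threshold: float = 0.05) -> int:
    """Count the steps that would trigger a memory write under the toy gate."""
    return sum(should_write(a, b, threshold) for a, b in action_pairs)
```

Under these numbers the implied KV-cache is about 25.6 MB at 100,000 steps, or roughly 256 bytes of growth per step, against a constant 4,224 bytes for the recurrent state.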

2606.02767 2026-06-03 cs.RO cs.LG 版本更新

Hybrid Adaptive Kalman Filtering for Data-Efficient Joint Tracking and Classification

混合自适应卡尔曼滤波用于数据高效的联合跟踪与分类

Jiho Lee, Nisar R. Ahmed, Rebecca Russell

发表机构 * Charles Stark Draper Laboratory, Inc.(查尔斯·斯塔克·德雷珀实验室) Ann and H. J. Smead Department of Aerospace Engineering Sciences(安与H.J.斯米德航空航天工程科学系)

AI总结 提出一种自监督混合自适应卡尔曼滤波器,通过仅从测量中学习系统动力学和过程噪声协方差的结构化校正,同时保持滤波器的概率结构,实现低数据和大数据场景下的高精度估计与鲁棒分类。

Comments 8 pages, 4 figures

详情
AI中文摘要

卡尔曼滤波性能对模型失配和噪声协方差调谐高度敏感。基于学习的方法解决了这些局限性,但通常依赖于大量数据集的监督训练,且不能产生一致的不确定性估计。在本文中,我们提出了一种自监督混合自适应卡尔曼滤波器,该滤波器仅从测量中学习系统动力学和过程噪声协方差的结构化校正,同时保持滤波器的概率结构。这使得可以计算创新似然,并随后通过广义贝叶斯推理用于模型分类。在真实世界和模拟数据集上的实验结果表明,在低数据和大数据场景下,估计精度和统计一致性均得到提高,分类性能也表现出鲁棒性。

英文摘要

Kalman filtering performance is highly sensitive to model mismatch and noise covariance tuning. Learning-based approaches address these limitations but typically rely on supervised training with large datasets and do not produce consistent uncertainty estimates. In this paper, we propose a self-supervised Hybrid Adaptive Kalman Filter that learns structured corrections to system dynamics and process noise covariance from measurements alone while preserving the probabilistic structure of the filter. This allows the innovation likelihood to be computed and subsequently used for model classification via generalized Bayesian inference. Experimental results on real-world and simulated datasets demonstrate improved estimation accuracy and statistical consistency as well as robust classification performance across both low-data and large-data scenarios.
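
To make the link between innovation likelihood and model classification concrete, here is a minimal sketch using a textbook scalar Kalman filter and ordinary Bayes' rule. It does not reproduce the paper's learned corrections or its generalized Bayesian update; the function names, the scalar model x_{k+1} = a x_k + w, z_k = x_k + v, and all parameter values are assumptions chosen for illustration.

```python
import math


def kf_log_likelihood(measurements, a, q, r, x0=0.0, p0=1.0):
    """Run a scalar Kalman filter and return the innovation log-likelihood.

    Model: x_{k+1} = a * x_k + w (var q), z_k = x_k + v (var r).
    """
    x, p, log_lik = x0, p0, 0.0
    for z in measurements:
        # Predict step.
        x_pred = a * x
        p_pred = a * p * a + q
        # Innovation and its variance.
        nu = z - x_pred
        s = p_pred + r
        log_lik += -0.5 * (math.log(2.0 * math.pi * s) + nu * nu / s)
        # Update step.
        k = p_pred / s
        x = x_pred + k * nu
        p = (1.0 - k) * p_pred
    return log_lik


def classify(measurements, models, prior=None):
    """Posterior over candidate models from their innovation likelihoods."""
    names = list(models)
    prior = prior or {n: 1.0 / len(names) for n in names}
    log_post = {
        n: math.log(prior[n]) + kf_log_likelihood(measurements, **models[n])
        for n in names
    }
    # Normalise in log space for numerical stability.
    m = max(log_post.values())
    weights = {n: math.exp(v - m) for n, v in log_post.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}
```

Feeding a decaying measurement sequence to two candidates, one with `a=0.9` and one with `a=1.1`, should put nearly all posterior mass on the decaying model, because its innovations stay close to zero.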

2606.02745 2026-06-03 cs.RO cs.LG 版本更新

SeeTraceAct: Visibility-Aware Latent Planning from Cross-Embodiment Demonstration Videos

SeeTraceAct: 跨具身演示视频中的可见性感知潜在规划

Jaehyeon Son, Junhyun Kim, Kyle Kam, Jeremiah Coholich, Seok Joon Kim, Jinhoo Kim, Chris Dongjoo Kim, Jaemin Cho, Dieter Fox, Zsolt Kira

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Allen Institute for AI(Allen人工智能研究所) Johns Hopkins University(约翰霍普金斯大学) University of Washington(华盛顿大学)

AI总结 提出SeeTraceAct框架,通过可见性感知的未来末端执行器轨迹预测增强空间定位,实现基于单次跨具身演示视频的机器人策略泛化,在模拟和真实场景中取得最优成功率。

详情
AI中文摘要

视觉-语言-动作模型(VLA)是有前途的通用机器人策略,但将其适应新任务通常需要昂贵的任务特定遥操作数据。作为替代,我们研究一次性演示条件VLA,其中机器人策略以未见任务的单个演示视频为条件。我们发现,当成功执行需要精确定位小目标区域时,现有的端到端方法往往难以应对。为解决这一限制,我们提出SeeTraceAct,一种演示条件VLA框架,通过可见性感知的未来末端执行器轨迹预测来鼓励精确的空间定位。为实现跨具身演示的可重复评估,我们引入并发布了RoboCasa-DC,这是RoboCasa的演示条件扩展,包含成对的人形机器人视频。在RoboCasa-DC和真实世界基准(Franka Panda臂以人类演示为条件)上的实验表明,SeeTraceAct优于基线,在所有四个RoboCasa-DC设置中实现了最佳成功率,并将真实世界平均成功率提高了12.5个百分点。

英文摘要

Vision-language-action models (VLAs) are promising general-purpose robot policies, but adapting them to new tasks typically requires costly task-specific teleoperation data. As an alternative, we study one-shot demo-conditioned VLAs, where a robot policy is conditioned on a single demonstration video of an unseen task. We find that existing end-to-end approaches often struggle when successful execution requires precisely localizing small target regions. To address this limitation, we propose SeeTraceAct, a demo-conditioned VLA framework that encourages precise spatial grounding through visibility-aware prediction of future end-effector traces. To enable reproducible evaluation with cross-embodiment demonstrations, we introduce and release RoboCasa-DC, a demo-conditioned extension of RoboCasa with episode-paired humanoid videos. Experiments on RoboCasa-DC and a real-world benchmark, where a Franka Panda arm is conditioned on human demonstrations, show that SeeTraceAct outperforms baselines, achieving the best success rate across all four RoboCasa-DC settings and improving real-world average success by 12.5 percentage points.

2606.02677 2026-06-03 cs.RO 版本更新

Motion Planning in Dynamic Environments: A Survey from Classical to Modern Methods

动态环境中的运动规划:从经典到现代方法的综述

Zongyuan Shen, Yaming Ou, Shalabh Gupta, Shancheng Zhao, Dehua Zhou, Gao Wang, Zhongqiang Ren, Junfeng Fan, Long Cheng

发表机构 * College of Information Science and Technology, Jinan University(暨南大学信息科学技术学院) School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院) Department of Electrical and Computer Engineering, University of Connecticut(康涅狄格大学电气与计算机工程系) Global College, Shanghai Jiao Tong University(上海交通大学全球学院) Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所)

AI总结 本文综述了138篇文献,将动态环境中的运动规划方法分为采样、图搜索、模型预测控制、学习及经典局部规划五类,分析了各方法的原理、优缺点及动态环境特有挑战。

详情
AI中文摘要

动态环境中的运动规划要求机器人持续调整路径以应对环境变化,实现安全不间断的导航。尽管许多综述回顾了静态环境中的规划,但针对动态环境的系统综述仍然有限。本文对138篇文献进行了全面综述,主要发表于2015年至2025年,涵盖经典和基于学习的方法。运动规划方法根据采样、图搜索、模型预测控制、学习以及额外的经典局部规划方法(包括速度障碍、势场和动态窗口)的概念分为五类。学习技术包括监督学习和强化学习。我们还讨论了动态感知在运动规划中的作用,涵盖了使用相机、LiDAR和事件传感器检测和建模移动障碍物的技术。该综述分析了每种方法的原理、优势和局限性,特别关注动态环境特有的挑战,如预测不确定性、人机交互和机器人冻结问题。该综述为研究人员提供了对动态环境中运动规划方法的结构化理解。

英文摘要

Motion planning in dynamic environments requires robots to continuously adapt their paths in response to environmental changes for safe and uninterrupted navigation. While many surveys have reviewed planning in static settings, systematic reviews focused on dynamic environments remain limited. This paper presents a comprehensive survey of 138 works, primarily published between 2015 and 2025, spanning both classical and learning-based approaches. The motion planning methods are grouped into five categories based on the concepts of sampling, graph search, model predictive control, learning, and additional classical local planning approaches, including velocity obstacles, potential fields and dynamic windows. The learning techniques include supervised learning and reinforcement learning. We also discuss the role of dynamic perception in motion planning, covering techniques for detecting and modeling moving obstacles using cameras, LiDAR, and event-based sensors. The survey analyzes the principles, strengths, and limitations of each method, with particular attention to challenges unique to dynamic environments, such as prediction uncertainty, human-robot interaction, and the freezing robot problem. The survey provides researchers with a structured understanding of motion planning methods in dynamic environments.

2606.02658 2026-06-03 cs.RO 版本更新

Fixed-Time Dynamic Landing of Quadrotors using Adaptive Unscented Kalman Filtering and Nonlinear Model Predictive Control

基于自适应无迹卡尔曼滤波和非线性模型预测控制的四旋翼飞行器固定时间动态着陆

Mohammadreza Izadi, Zeinab Shayan, Steven Waslander, Reza Faieghi

发表机构 * Autonomous Vehicles Laboratory, Department of Aerospace Engineering, Toronto Metropolitan University(多伦多都会大学航空航天工程系自主车辆实验室) University of Toronto Institute for Aerospace Studies, University of Toronto(多伦多大学航空航天研究所)

AI总结 提出一种结合非线性模型预测控制与实时最小加加速度轨迹规划器及自适应无迹卡尔曼滤波的估计与控制框架,实现多旋翼无人机在移动平台上的固定时间动态着陆,并通过仿真和硬件实验验证了其可重复着陆能力和优于EKF/UKF的速度预测精度。

Comments Accepted to the Conference on Robots and Vision (CRV 2026), Vancouver, Canada

详情
AI中文摘要

本文介绍了一种用于多旋翼无人机在移动平台上动态着陆的估计与控制框架。所提出的方法将非线性模型预测控制与实时最小加加速度轨迹规划器相结合,该规划器强制执行规定的着陆时间,从而在终端下降过程中实现一致的时间安排。为了增强在时变传感质量下的鲁棒性,我们采用了自适应无迹卡尔曼滤波,在线更新过程和测量噪声统计量。此外,我们提供了参考可行性分析,表明在标准跟踪假设下,最小加加速度参考会诱导有界的推力和扭矩指令。所提出的框架在仿真和硬件实验中进行了评估,并表明相对于基于EKF/UKF的方法,实现了可重复的着陆和改进的平台速度预测精度。

英文摘要

This paper introduces an estimation and control framework for dynamic landing of multi-rotor uncrewed aerial vehicles on moving platforms. The proposed method integrates nonlinear model predictive control with a real-time minimum-jerk trajectory planner that enforces a prescribed touchdown time, enabling consistent timing during the terminal descent. To enhance robustness in the presence of time-varying sensing quality, we utilize an adaptive unscented Kalman filter that updates the process and measurement noise statistics online. In addition, we provide a reference feasibility analysis showing that minimum-jerk references induce bounded thrust and torque commands under standard tracking hypotheses. The proposed framework is evaluated in simulation and hardware experiments, and it is shown to achieve repeatable landings and improved platform velocity prediction accuracy relative to EKF/UKF-based methods.
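
As a rough illustration of what a minimum-jerk reference with a prescribed duration looks like, the sketch below evaluates the classic rest-to-rest quintic along one axis. This is a simplified special case with zero boundary velocity and acceleration, not the paper's planner, which must also match the moving platform's state; the function name `min_jerk` and its arguments are assumptions.

```python
def min_jerk(x0: float, xf: float, T: float, t: float):
    """Rest-to-rest minimum-jerk position, velocity, acceleration at time t.

    x0, xf: start and end positions; T: prescribed duration (touchdown time).
    """
    tau = min(max(t / T, 0.0), 1.0)   # normalised time clamped to [0, 1]
    d = xf - x0
    pos = x0 + d * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = d * (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T
    acc = d * (60 * tau - 180 * tau**2 + 120 * tau**3) / T**2
    return pos, vel, acc
```

Because the profile is parameterised by `T`, the end state is reached exactly at the prescribed time; shortening `T` scales peak velocity by 1/T and peak acceleration by 1/T^2, which is why bounded-command feasibility depends on the chosen touchdown time.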

2606.02641 2026-06-03 cs.RO cs.AI 版本更新

CARVE: Certified Affordable Repair of Vetoed Maneuvers via Envelopes for Interactive Driving

CARVE: 通过包络实现交互驾驶中被否决机动的认证可负担修复

Yifan Wang

发表机构 * Yifan Wang(王一帆)

AI总结 针对交互驾驶中规则感知堆栈易忽略的硬规则裕度负值问题,提出CARVE认证层,通过有限格点上的自我与代理战术算子,实现被否决机动的可负担修复认证,并证明其合理性。

Comments 8 pages, 3 figures

详情
AI中文摘要

交互驾驶暴露了规则感知自动驾驶堆栈中容易忽略的失效模式:即使非优先代理的小幅合法让步可恢复可行性,自我候选的硬规则裕度仍可能为负。现有的规则手册、防护和可达性过滤器在否决不安全动作方面表现强劲,而基于预测的规划器则对可能的响应进行建模。两者均未返回运行时证明对象,该对象说明哪个有界多代理编辑修复了机动、谁拥有编辑、请求是否在路权上可负担,以及如果请求未被遵守,自我后备是什么。我们将这一缺失对象形式化为*交互修复认证*,并引入*CARVE*,一个在自我拥有和代理拥有的战术算子有限格点上的无预测认证层。代理拥有的请求仅在\(B_j(s) = \beta(\pi_j)\alpha_j^{\max}(s)\)内可接受,这是一个将运动学可达性与规范优先级分离的合作包络。生成的证书记录了绑定规则、修复类别、修复集、责任加权成本分配和后备。在589个基于Lanelet2几何的INTERACTION重放片段上,CARVE-Greedy接受了98.64%的初始否决机动,恢复了370/378个人类解决错误否决,同时保持了589/589的路权尊重、零优先级代理假阳性以及400/400的负压力否决。我们证明了证书的合理性、结构性的路权尊重、精确的有限格点最小性、后备应急性和责任一致性条件。CARVE不预测也不需要其他驾驶员的合规性;它认证在声明假设下提议的交互是否有界、可归因且规范上可接受。

英文摘要

Interactive driving exposes a failure mode that is easy to miss in rule-aware autonomous-driving stacks: a hard-rule margin can be negative for an ego candidate even though a small lawful accommodation by a non-priority agent would restore feasibility. Existing rulebooks, shields, and reachability filters are strong at vetoing unsafe actions, while prediction-based planners model likely responses. Neither returns a runtime proof object that states which bounded multi-agent edit repairs the maneuver, who owns the edit, whether the request is right-of-way affordable, and what ego fallback remains if the request is not observed. We formulate this missing object as *interactive repair certification* and introduce *CARVE*, a prediction-free certificate layer over a finite lattice of ego-owned and agent-owned tactical operators. Agent-owned requests are admissible only inside \(B_j(s) = \beta(\pi_j)\alpha_j^{\max}(s)\), a cooperation envelope that separates kinematic reachability from normative priority. The resulting certificate records the binding rule, repair category, repair set, responsibility-weighted cost split, and fallback. On 589 Lanelet2-geometry-grounded INTERACTION replay episodes, CARVE-Greedy accepts 98.64% of initially vetoed maneuvers and recovers 370/378 human-resolved false vetoes, while preserving 589/589 right-of-way respect, zero priority-agent false positives, and 400/400 negative-stress vetoes. We prove certificate soundness, structural right-of-way respect, exact finite-lattice minimality, fallback contingency, and blame-consistency conditions. CARVE does not predict or require another driver's compliance; it certifies whether a proposed interaction is bounded, attributable, and normatively admissible under declared assumptions.

2606.03735 2026-06-03 nlin.CD cs.MA cs.RO 版本更新

On dynamic multi-agent pathfinding methods: review, simulations and modifications

动态多智能体路径规划方法:综述、仿真与改进

Gabriel Fejziaj, Salama Hassona, Wieslaw Marszalek

发表机构 * Department of Computer Science, Opole University of Technology(计算机科学系,奥波尔技术大学)

AI总结 本文系统研究动态多智能体路径规划(D-MAPF)中的六种代表性算法,并提出一种基于模板的A**算法,通过离线几何路径生成与在线时间适应解耦,在频繁变化和有限感知环境中提高解质量。

详情
AI中文摘要

本文系统研究了动态多智能体路径规划(D-MAPF)背景下的路径规划算法,该设置结合了动态障碍物、部分可观测性和智能体间冲突。我们在统一的仿真框架内评估了六种代表性算法:Dijkstra、D* Lite、Space-Time A*、WHCA*、M*以及一种新方法A**。提出的A**算法引入了一种基于模板的方法,将离线几何路径生成与在线时间适应解耦。通过预计算多条多样候选路径并使用时空规划动态重新连接,A**在频繁变化和有限感知的环境中提高了解质量。

英文摘要

This paper presents a systematic study of pathfinding algorithms in the context of Dynamic Multi-Agent Pathfinding (D-MAPF), a setting that combines dynamic obstacles, partial observability, and inter-agent conflicts. We evaluate six representative algorithms within a unified simulation framework: Dijkstra, D* Lite, Space-Time A*, WHCA*, M*, and a novel method denoted as A**. The proposed A** algorithm introduces a template-based approach that decouples offline geometric path generation from online temporal adaptation. By precomputing multiple diverse candidate paths and dynamically reconnecting to them using space-time planning, A** improves solution quality in environments with frequent changes and limited sensing.
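
For readers unfamiliar with space-time planning, the following is a minimal sketch of Space-Time A* for a single agent on a 4-connected grid, where a cell's occupancy may depend on the time step. It is a generic textbook variant, not the paper's A** or its simulation framework; the `blocked_at` callback, the `max_t` horizon, and the string grid encoding are assumptions, and swap (edge) conflicts are deliberately not checked.

```python
import heapq


def space_time_astar(grid, start, goal, blocked_at, max_t=50):
    """Plan on a grid whose occupancy depends on time.

    grid: list of strings, '#' marks a static wall.
    blocked_at: function (cell, t) -> bool for dynamic obstacles (assumed known).
    Returns a list of cells indexed by time step, or None if no plan is found.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance, admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f, t, cell)
    parent = {(start, 0): None}
    while open_heap:
        _, t, cell = heapq.heappop(open_heap)
        if cell == goal:
            path, node = [], (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t >= max_t:
            continue
        # Four moves plus waiting in place; each costs one time step.
        for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#" or blocked_at(nxt, t + 1):
                continue
            if (nxt, t + 1) not in parent:
                parent[(nxt, t + 1)] = (cell, t)
                heapq.heappush(open_heap, (t + 1 + h(nxt), t + 1, nxt))
    return None
```

The key difference from ordinary A* is that the search state is the pair (cell, time), so "wait" becomes a legal action and the planner can let a moving obstacle pass instead of detouring.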

2606.01851 2026-06-03 cs.RO 版本更新

PHASOR: Phase-Anchored Universal Action Representations for Humanoid Embodiments

PHASOR: 面向人形本体的相位锚定通用动作表示

Kihyun Kim, Chaeyun Kim, Jongho Shin, Taeyoun Kwon, Junghyun Kim, Mijin Koo, Haon Park

发表机构 * AIM Intelligence Seoul National University(首尔国立大学) LG Electronics(LG电子) MAUM AI OpenMind

AI总结 提出PHASOR方法,通过将动作嵌入空间分解为相位流形和姿态分支,并结合运动语义蒸馏,构建跨本体的通用动作表示,实现人形机器人的跨本体检索和下游任务性能提升。

Comments * Equal contribution

详情
AI中文摘要

学习一个好的动作嵌入空间对于可扩展的机器人策略学习至关重要,但现有方法将动作潜在变量视为任务特定的中间产物,而非第一类表示。由此产生的潜在变量是非结构化的、本体特定的,且与运动语义关联较弱,限制了可解释性、可控性和跨机器人的迁移性。我们将动作嵌入空间本身定位为第一类设计目标,下游策略质量源于表示质量。利用运动的内在周期性,我们将其分解为一个相位流形(通过FFT参数系数捕获循环结构)和一个姿态分支(以非周期性的构型细节对流形进行条件化)。结合运动语义蒸馏,这种分解结构产生了一个跨本体的运动流形,该流形在设计上是可解释且与本体无关的。将多个人形机器人锚定到一个共享的、基于人类数据预训练的流形上,则在不同平台上产生统一的动作嵌入空间,实现了强大的跨本体检索和下游机器人任务的一致性能提升。

英文摘要

Learning a good action embedding space is fundamental to scalable robot policy learning, yet existing methods treat action latents as task-specific intermediates rather than first-class representations. The resulting latents are unstructured, embodiment-specific, and weakly tied to motion semantics, limiting interpretability, controllability, and transferability across robots. We position the action embedding space itself as a first-class design target, with downstream policy quality emerging from representation quality. Exploiting motion's intrinsic periodicity, we factorize it into a phase manifold that captures cyclic structure via FFT-parametric coefficients, together with a pose branch that conditions the manifold on non-periodic configuration detail. Combined with motion-semantic distillation, this factorized structure yields a cross-embodiment motion manifold that is interpretable and embodiment-agnostic by design. Anchoring multiple humanoid robots to a shared human-pretrained manifold then produces a unified action embedding space across diverse platforms, achieving strong cross-embodiment retrieval and consistent gains on downstream robot tasks.
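
One common way to describe cyclic structure with FFT-derived parameters is to summarise a signal by the amplitude, frequency, phase, and offset of its dominant frequency component. The sketch below does this with a plain discrete Fourier transform on a single 1-D channel. It is only an illustration of that general idea, not PHASOR's encoder; the function name, the single-channel setup, and the choice of exactly these four parameters are assumptions.

```python
import cmath
import math


def dominant_phase_params(signal, dt):
    """Return (amplitude, frequency_hz, phase_rad, offset) of the dominant
    non-DC frequency component of a uniformly sampled 1-D signal."""
    n = len(signal)
    offset = sum(signal) / n
    centred = [x - offset for x in signal]
    best_k, best_coeff = 1, 0j
    # Scan positive-frequency bins, excluding DC and the Nyquist bin.
    for k in range(1, (n + 1) // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        if abs(coeff) > abs(best_coeff):
            best_k, best_coeff = k, coeff
    amplitude = 2.0 * abs(best_coeff) / n
    frequency = best_k / (n * dt)
    phase = cmath.phase(best_coeff)   # phase of the equivalent cosine at t = 0
    return amplitude, frequency, phase, offset
```

For a signal of the form A cos(2 pi f t + phi) + c sampled over a whole number of periods, the four returned values recover A, f, phi, and c, which is what makes such a parameterisation compact and interpretable.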

2606.01241 2026-06-03 cs.RO 版本更新

OneVLA: A Unified Framework for Embodied Tasks

OneVLA:面向具身任务的统一框架

Lingfeng Zhang, Xiaoshuai Hao, Yingbo Tang, Lei Zhou, Shuyi Zhang, Jinkun Liu, Hongsheng Li, Chenhao Zhang, Qiang Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding

发表机构 * Tsinghua University(清华大学) Pengcheng Laboratory(鹏城实验室) Xiaomi EV(小米电动车) Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) Peking University(北京大学) HKUST(GZ)(香港科技大学(广州))

AI总结 提出统一架构OneVLA,通过设计统一动作头和渐进式训练策略(含数据构建和思维链微调),在导航与操作任务上实现跨任务正迁移,达到最先进性能。

详情
AI中文摘要

导航和操作是具身智能的基本能力,使机器人能够解释自然语言命令并与环境进行物理交互。然而,当前的视觉-语言-动作(VLA)模型仍受限于任务特定的架构,专门处理导航或操作,这阻碍了通用机器人智能体的发展。为弥补这一差距,我们引入了OneVLA,一个统一架构,将这些不同任务整合到单个连贯框架中。具体来说,我们设计了一个统一的动作头,能够生成导航和操作动作,无需任务特定的变体。此外,我们提出了一种多阶段渐进式训练策略——结合精心构建的数据和思维链(CoT)微调——促进了两个领域之间的强正迁移和相互增强。在模拟和真实环境中的大量实验表明,OneVLA实现了最先进的性能,显著优于专门的单任务和现有的跨任务模型。通过统一这些核心能力,OneVLA为真正的通用机器人系统铺平了道路。模型和源代码将公开发布。

英文摘要

Navigation and manipulation are fundamental capabilities of embodied intelligence, enabling robots to interpret natural language commands and interact physically with their surroundings. However, current Vision-Language-Action (VLA) models remain constrained by task-specific architectures, specializing in either navigation or manipulation, which hinders the development of general-purpose robotic agents. To bridge this gap, we introduce OneVLA, a unified architecture that integrates these distinct tasks into a single, cohesive framework. Specifically, we design a unified action head capable of generating both navigation and manipulation actions without requiring task-specific variants. Furthermore, we propose a multi-stage progressive training strategy, incorporating curated data construction and Chain-of-Thought (CoT) fine-tuning, that facilitates strong positive transfer and mutual reinforcement between the two domains. Extensive experiments in both simulated and real-world environments demonstrate that OneVLA achieves state-of-the-art performance, significantly outperforming both specialized single-task and existing cross-task models. By unifying these core capabilities, OneVLA paves the way for truly general-purpose robotic systems. The model and source code will be publicly released.

2605.31434 2026-06-03 cs.RO 版本更新

Shaft-integrated Force Sensing with Transformer-based Dynamics Compensation for Telesurgery

基于Transformer动力学补偿的轴集成力传感用于远程手术

Shuyuan Yang, Grant Boone, Timo Markert, Sebastian Matich, Andreas Theissler, Martin Atzmueller, Zonghe Chua

发表机构 * Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University(电气、计算机与系统工程系,凯斯西储大学) Department of Mechanical and Aerospace Engineering, Case Western Reserve University(机械与航空航天工程系,凯斯西储大学) Resense GmbH Semantic Information Systems Group, Osnabrück University(语义信息系统组,奥斯纳布吕克大学) Justus Liebig University(吉森大学) German Research Center for Artificial Intelligence (DFKI)(德国人工智能研究中心(DFKI))

AI总结 提出一种将六轴力传感器集成到标准缆驱动手术器械远端的方法,利用Transformer神经网络补偿内部缆力,实现末端执行器力估计,归一化误差低于6%。

Comments The paper was accepted by IEEE Transactions on Medical Robotics and Bionics in May 2026

详情
AI中文摘要

机器人辅助微创手术(RAMIS)增强了外科医生的灵巧性,新平台利用触觉反馈进一步提高性能。这种力信息具有更广泛的潜力,可用于性能评估、触觉定位和手术自主性。这促使需要将力传感集成到RAMIS工具中的可访问方法。本工作提出了一种将六轴商用力传感器集成到标准缆驱动手术器械远端的方法,在保持设备原始机械功能的同时实现末端执行器力测量。所提出的设计强调可重复性和研究应用的可访问性,无需专门的制造工具。Transformer神经网络将力传感器测量值与机器人状态信息相结合,以帮助估计末端执行器施加的力,补偿由驱动引起的内部缆力。我们提出的方法实现了低于6%的归一化误差,并且比纯近端数据驱动传感方法更好地泛化到未见条件。高内部缆力导致传感器饱和并降低轴向力的可观测性,这可能沿工具主轴和更高负载条件下降低性能。鉴于当前性能水平,系统集成性和性能的平衡使得在RAMIS中触觉反馈、技能评估和力信息自主性等及时主题的应用和研究成为可能。视频和代码可在https://enhanced-telerobotics.github.io/shaft_force_sensing/获取。

英文摘要

Robot-Assisted Minimally Invasive Surgery (RAMIS) enhances surgeon dexterity, with newer platforms leveraging haptic feedback to further improve performance. Such force information has broader potential to inform performance assessment, tactile localization, and surgical autonomy. This motivates the need for accessible approaches to integrating force sensing into RAMIS tools. This work presents a method for integrating a six-axis commercial force sensor into the distal end of a standard cable-driven surgical instrument, enabling end-effector force measurement while preserving the original mechanical functionality of the device. The proposed design emphasizes reproducibility and accessibility for research applications, requiring no specialized manufacturing tools. A transformer neural network integrates force sensor measurements with robot state information to aid estimation of applied forces at the end-effector, compensating for internal cable forces arising from actuation. Our proposed approach achieved normalized errors below 6%, and generalized to unseen conditions better than purely proximal data-driven sensing approaches. High internal cable forces caused sensor saturation and reduced axial force observability, which can degrade performance along the tool's major axis and under higher load conditions. Given current levels of performance, the balance of system integrability and performance enables applications and research into timely topics of haptic feedback, skill assessment, and force-informed autonomy in RAMIS. Videos and code are available at https://enhanced-telerobotics.github.io/shaft_force_sensing/.

2605.31067 2026-06-03 cs.RO 版本更新

Seeing Fast and Slow: Bimodal 3D Scene Graphs for Open-set Tasks

快与慢:面向开放集任务的双模态3D场景图

Marcel Bartholomeus Prasetyo, Shrutika Vishal Thengane, A Manicka Praveen, Yi Loo, Malika Meghjani

AI总结 提出BiMoSG方法,通过默认快速模式生成粗粒度3D场景图,并在需要时切换至慢速模式生成细粒度开放词汇3D场景图,实现实时任务执行。

Comments Submission has not been cleared with funding agency

详情
AI中文摘要

开放集任务执行可以显著受益于根据上下文和机器人探索环境时不断变化的信息,在粗粒度和细粒度场景表示之间无缝切换。例如,通常从粗粒度场景表示开始就足够了,只有当机器人遇到可能包含任务相关对象的区域时,才采用更精细、更细粒度的场景表示。因此,在这项工作中,我们提出了BiMoSG,一种用于开放集任务的双模态3D场景图生成方法。BiMoSG默认采用“快速”模式,以高效生成粗粒度3D场景图,并可以切换到“慢速”模式,为任务相关对象生成更精细的开放词汇3D场景图。我们证明,我们提出的3D场景图生成方法显著快于开源的最新方法。这使得我们能够将场景图生成过程与任务执行集成,用于实时部署。

英文摘要

Open-set task execution can significantly benefit from seamlessly switching between coarse and fine scene representations depending on the context and the evolving information as the robot explores the environment. For example, it is often sufficient to start with a coarse scene representation and employ a finer, more granular one only when the robot encounters regions that are likely to contain task-relevant objects. Hence, in this work, we propose BiMoSG, a bimodal 3D scene graph generation approach for open-set tasks. BiMoSG employs a "fast" mode by default to efficiently generate a coarse 3D scene graph and can switch to a "slow" mode for generating a finer open-vocabulary 3D scene graph of task-relevant objects. We demonstrate that our proposed 3D scene graph generation approach is significantly faster than the open-source state-of-the-art approaches. This allows us to integrate the scene graph generation process with task execution for real-time deployment.

2605.26006 2026-06-03 cs.CV cs.GR cs.RO 版本更新

MIND: Multi-Scale Intent Diffusion for Text-Driven Physics-Based Humanoid Control

MIND: 多尺度意图扩散用于文本驱动的基于物理的人形控制

Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Jingya Wang

发表机构 * ShanghaiTech University(上海科技大学) University of Pennsylvania(宾夕法尼亚大学) Bytedance Seed(字节跳动种子) Stanford University(斯坦福大学) InstAdapt

AI总结 提出MIND框架,通过多尺度意图扩散机制将文本命令与低级动作语义对齐,实现基于物理的人形机器人行为生成。

详情
AI中文摘要

使基于物理的人形机器人能够根据高级文本命令执行多样化的行为仍然是一个重大挑战。现有方法通常遵循两阶段范式(结合运动学动作生成与基于物理的跟踪)或端到端模仿学习范式(直接从文本生成动作)。然而,前者受限于运动学生成与基于物理跟踪之间的固有域偏移,而后者则难以弥合文本命令与低级动作之间的巨大模态差距,限制了有效的语义对齐。值得注意的是,人形状态编码了丰富的运动动态,与低级动作相比,这些动态在语义上与文本描述更对齐,因此成为推导行为意图的自然基础。基于这一见解,我们提出了MIND,一种新颖的端到端扩散框架,用于文本驱动的基于物理的人形控制,该框架利用行为意图作为文本命令与低级动作之间的语义桥梁。其核心是,MIND引入了多尺度意图扩散机制,其中整体意图预测器捕获全局行为动态以指导整体行为合成,而即时意图预测器在每一步扩散中提供逐步的细粒度信号以进行局部行为细化。这种分层意图公式化为人形控制施加了结构化的归纳偏置,改善了语义对齐和行为自然性。此外,MIND将人形状态编码到潜在空间中,以实现更有效的语义意图建模。大量实验表明,MIND优于现有方法,并能从文本命令中合成连贯、物理合理且语义对齐的人形行为。我们的代码将发布以促进未来研究。

英文摘要

Enabling physics-based humanoids to execute diverse behaviors from high-level textual commands remains a significant challenge. Existing methods typically follow either a two-stage paradigm that combines kinematic motion generation with physics-based tracking, or an end-to-end imitation-learning paradigm that directly generates actions from text. However, the former suffers from the inherent domain shift between kinematic generation and physics-based tracking, while the latter struggles with the substantial modality gap between textual commands and low-level actions, limiting effective semantic alignment. Notably, humanoid states encode rich motion dynamics that are more semantically aligned with textual descriptions than low-level actions, making them a natural basis for deriving behavioral intent. Building upon this insight, we propose MIND, a novel end-to-end diffusion framework for text-driven physics-based humanoid control that leverages behavioral intent as a semantic bridge between textual commands and low-level actions. At its core, MIND introduces a multi-scale intent diffusion mechanism, where a holistic intent predictor captures global behavioral dynamics to guide overall behavior synthesis, while an immediate intent predictor provides step-wise, fine-grained signals for local behavior refinement at each diffusion step. This hierarchical intent formulation imposes a structured inductive bias for humanoid control, improving semantic alignment and behavioral naturalness. Furthermore, MIND encodes humanoid states into a latent space to enable more effective semantic intent modeling. Extensive experiments demonstrate that MIND outperforms existing methods and synthesizes coherent, physically plausible, and semantically aligned humanoid behaviors from text commands. Project page: https://binlee26.github.io/MIND_page.

2605.30313 2026-06-03 cs.RO 版本更新

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

UniLab: 超越GPU主导范式的机器人强化学习异构架构

Yufei Jia, Zhanxiang Cao, Mingrui Yu, Heng Zhang, Shenyu Chen, Dixuan Jiang, Meng Li, Xiaofan Li, Yiyang Liu, Junzhe Wu, Zheng Li, XiLin Fang, Ting-Yu Tsui, Shengcheng Fu, Haoyang Li, Anqi Wang, Zifan Wang, Dongjie Zhu, Chenyu Cao, Zhenbiao Huang, Ziang Zheng, Jie Lu, Xin Ma, Zhengyang Wei, Xiang Zhao, Tianyue Zhan, Ye He, Yuxiang Chen, Yizhou Jiang, Yue Li, Haizhou Ge, Yuhang Dong, Fan Jia, Ziheng Zhang, Meng Zhang, Xiwa Deng, Zhixing Chen, Hanyang Shao, Chenxin Dong, Yixuan Li, Yizhi Chen, Bokui Chen, Kaifeng Zhang, Hanqing Cui, Yusen Qin, Ruqi Huang, Lei Han, Tiancai Wang, Xiang Li, Yue Gao, Guyue Zhou

发表机构 * THU(清华大学) SJTU(上海交通大学) SII Motphys HITSZ(哈尔滨工业大学(深圳)) BIT(北京理工大学) NEU SUSTech(南方科技大学) TJU(天津大学) DISCOVER Robotics HKUST(GZ)(香港科技大学(广州)) Galbot NUS(新加坡国立大学) WTU HBUT AMD NJU(南京大学) ZJU(浙江大学) Dexmal Sharpa D-Robotics

AI总结 提出UniLab异构CPU-仿真/GPU-学习架构,通过统一运行时解耦CPU并行仿真与GPU策略更新,在相同硬件配置下将端到端训练效率提升3-10倍,并减少对NVIDIA CUDA的依赖。

详情
AI中文摘要

基于仿真的当代机器人控制强化学习日益围绕GPU驻留仿真组织:物理、轨迹收集和学习都放在单个以GPU为中心的执行路径上。这种范式极大地提高了训练速度,但也鼓励了一种默认假设,即高效训练需要物理位于GPU上。我们重新审视这一假设。我们的观点是,在仿真主导的机器人控制中,关键问题不是哪个处理器运行物理,而是仿真吞吐量、策略学习和运行时同步是否形成高效的端到端循环。我们提出了UniLab,一种异构CPU-仿真/GPU-学习架构,通过统一的数据移动、缓冲和同步运行时,将CPU并行仿真与GPU策略更新解耦。UniLab实现为一个完整且可扩展的训练系统,使用MuJoCoUni和MotrixSim CPU批处理物理后端,支持PPO、FastSAC、FlashSAC和APPO。在代表性的基于仿真的机器人控制任务上,UniLab在相同硬件配置下将端到端训练效率提升了3-10倍,同时减少了对基于NVIDIA CUDA的软件栈的依赖,并支持在Apple macOS平台以及AMD ROCm和Intel XPU加速器后端上的跨平台执行。这些结果表明,GPU仿真是高效训练的有效路径,但不是必需的路径,拓宽了机器人强化学习训练可用的实际系统选择。项目页面:https://unilabsim.github.io。

英文摘要

Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved training speed, but it has also encouraged a default assumption that efficient training requires physics to reside on the GPU. We revisit this assumption. Our view is that, in simulation-dominated robot control, the essential question is not which processor runs physics, but whether simulation throughput, policy learning, and runtime synchronization form an efficient end-to-end loop. We present UniLab, a heterogeneous CPU-simulation / GPU-learning architecture that decouples CPU-parallel simulation from GPU policy updates through a unified runtime for data movement, buffering, and synchronization. UniLab is implemented as a complete and extensible training system using MuJoCoUni and MotrixSim CPU-batched physics backends, supporting PPO, FastSAC, FlashSAC, and APPO. On representative simulation-based robot control tasks, UniLab improves end-to-end training efficiency by 3--10$\times$ under the same hardware configuration, while reducing dependence on the NVIDIA CUDA-based software stack and supporting cross-platform execution on the Apple macOS platform and the AMD ROCm and Intel XPU accelerator backends. These results show that GPU simulation is an effective path to efficient training, but not a necessary one, broadening the practical system choices available for robot RL training. Project page: https://unilabsim.github.io.
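
The decoupling described above follows a producer/consumer pattern: simulation workers fill a bounded buffer while the learner drains it. The toy sketch below shows only that pattern with Python threads and a standard queue. It is not UniLab's runtime; the function `run`, the tuple "transitions", and the use of threads in place of CPU-parallel physics and a GPU learner are all assumptions for illustration.

```python
import queue
import threading


def run(num_workers=2, steps_per_worker=5, batch_size=4):
    """Toy rollout/learner loop; returns the number of transitions consumed."""
    buf = queue.Queue(maxsize=8)   # bounded buffer gives back-pressure

    def simulate(worker_id):
        # Stand-in for a CPU physics worker producing rollout batches.
        for step in range(steps_per_worker):
            batch = [(worker_id, step, i) for i in range(batch_size)]
            buf.put(batch)         # blocks while the buffer is full
        buf.put(None)              # sentinel: this worker is done

    workers = [threading.Thread(target=simulate, args=(w,))
               for w in range(num_workers)]
    for w in workers:
        w.start()

    finished, consumed = 0, 0
    while finished < num_workers:
        batch = buf.get()          # stand-in for the learner's data fetch
        if batch is None:
            finished += 1
            continue
        consumed += len(batch)     # a policy update would happen here
    for w in workers:
        w.join()
    return consumed
```

The bounded queue is the design point worth noting: it keeps simulation from running arbitrarily far ahead of learning, so the end-to-end loop is paced by whichever stage is slower rather than by where the physics runs.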

2605.29663 2026-06-03 cs.RO 版本更新

EXACT-MPPI: Exact Signed-Distance Navigation for Arbitrary-Footprint Robots from Point Clouds via Path Integral Control

EXACT-MPPI:通过路径积分控制实现点云中任意足迹机器人的精确有符号距离导航

Chen Peng, Zhikang Ge, Wenwu Lu, Haiming Gao, Stavros Vougioukas, Peng Wei

发表机构 * ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China(浙江大学杭州全球科技创新中心) College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China(浙江大学生物系统工程与食品科学学院) Department of Biological and Agricultural Engineering, University of California, Davis, Davis, California, USA(加州大学戴维斯分校生物与农业工程系)

AI总结 提出EXACT-MPPI框架,将解析精确有符号距离评估器嵌入模型预测路径积分控制器,无需中间地图表示,直接处理点云实现任意形状足迹机器人的安全导航。

详情
AI中文摘要

地面机器人通常携带有效载荷、工具或其他附件,使其有效足迹变成复杂的非凸形状。在杂乱环境中安全导航需要考虑到这种真实几何形状,然而大多数局部规划器使用凸或膨胀代理简化它,并将传感器数据栅格化为占用网格或距离场。当间隙与足迹几何形状相当时,这两种选择都会消除可行运动。我们提出EXACT-MPPI,一种无需训练的局部导航框架,将局部点云观测和稀疏引导直接映射到运动命令,无需任何中间地图表示。该框架将解析的精确有符号距离评估器嵌入模型预测路径积分(MPPI)控制器中。足迹表示为简单多边形,适用于一般凸或凹平面形状,并具有矩形覆盖特化以加速直线足迹的评估,从而实现足迹感知碰撞成本,无需凸分解、膨胀或学习编码器。在每个MPPI rollout期间,观测到的障碍物点被变换到预测的机体坐标系中,并针对足迹进行评估。所有操作在JAX中批处理,利用GPU并行性实现实时滚动时域控制。实验表明,EXACT-MPPI在批处理距离评估上比学习的点到机器人基线更快,在凸足迹规划器失败的地方保留了可行运动,并在密集静态和移动障碍物下保持鲁棒性。相同的框架通过仅更改足迹描述和运动模型即可部署在差速驱动、阿克曼、全向和混合模式平台上,无需针对每个平台进行训练。因此,将精确足迹几何与基于采样的预测控制相结合,为跨不同机器人的足迹感知局部导航提供了一种实用的、无需训练的途径。

英文摘要

Ground robots often carry payloads, implements, or other attachments that turn their effective footprint into complex, non-convex shapes. Navigating safely through clutter then requires reasoning about this true geometry, yet most local planners simplify it with convex or inflated proxies and rasterize sensor data into occupancy grids or distance fields. Both choices eliminate feasible motions when clearance is comparable to the footprint geometry. We present EXACT-MPPI, a training-free local navigation framework that maps local point-cloud observations and sparse guidance directly to motion commands, without any intermediate map representation. The framework embeds an analytic, exact signed-distance evaluator into a Model Predictive Path Integral (MPPI) controller. The footprint is represented as a simple polygon for general convex or concave planar shapes, with a rectangle-cover specialization for faster evaluation of rectilinear footprints, enabling footprint-aware collision costs without convex decomposition, inflation, or learned encoders. During each MPPI rollout, observed obstacle points are transformed into the predicted body frame and evaluated against the footprint. All operations are batched in JAX, leveraging GPU parallelism for real-time receding-horizon control. Experiments show that EXACT-MPPI accelerates batched distance evaluation over a learned point-to-robot baseline, preserves feasible motion where convex-footprint planners fail, and remains robust under dense static and moving obstacles. The same framework deploys on differential-drive, Ackermann, omnidirectional, and hybrid-mode platforms by changing only the footprint description and motion model without per-platform training. Pairing exact footprint geometry with sampling-based predictive control thus offers a practical, training-free path to footprint-aware local navigation across diverse robots.
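
To illustrate why an exact signed distance to a non-convex footprint matters, the sketch below computes the signed distance from a 2-D point to a simple polygon using per-edge closest points and an even-odd ray test. This is a plain-Python scalar version written for clarity, whereas the paper describes batched evaluation in JAX inside an MPPI rollout; the function name and the sign convention (negative inside) are assumptions.

```python
import math


def signed_distance(point, polygon):
    """Signed distance from a 2-D point to a simple polygon.

    Negative inside, positive outside; polygon is a list of (x, y) vertices.
    """
    px, py = point
    inside = False
    min_sq = math.inf
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        # Squared distance from the point to segment AB.
        ex, ey = bx - ax, by - ay
        t = ((px - ax) * ex + (py - ay) * ey) / (ex * ex + ey * ey)
        t = min(1.0, max(0.0, t))
        dx, dy = px - (ax + t * ex), py - (ay + t * ey)
        min_sq = min(min_sq, dx * dx + dy * dy)
        # Even-odd ray casting along +x for the inside test.
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * ex / ey
            if px < x_cross:
                inside = not inside
    dist = math.sqrt(min_sq)
    return -dist if inside else dist
```

With an L-shaped footprint such as `[(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]`, an obstacle point at `(1.5, 1.5)` sits in the notch and gets a positive distance, while a convex-hull or inflated proxy would count it as a collision and rule out that motion.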

2605.25051 2026-06-03 cs.RO 版本更新

A Decentralized LiDAR-SLAM System with Certifiably Optimal Pose Graph Optimization

一种具有可认证最优位姿图优化的去中心化LiDAR-SLAM系统

Baoshan Song, Feng Huang, Li-Ta Hsu

发表机构 * The Hong Kong Polytechnic University(香港理工大学)

AI总结 针对多机器人去中心化LiDAR-SLAM全局一致性问题,提出首个集成可认证最优位姿图优化后端的系统,利用黎曼块坐标下降算法实现全局一致轨迹估计,无需精确初始猜测,轨迹RMSE相比DiSCo-SLAM最高降低48.9%。

Comments In Proceedings of the IEEE International Conference on Robotics & Automation (ICRA'26) 1st Workshop on Robot Meets GNSS and Ranging for Seamless Autonomy, Vienna, Austria, Jun. 5, 2026

详情
AI中文摘要

去中心化多机器人LiDAR-SLAM对于协作任务至关重要,但在保持全局一致性方面面临重大挑战。现有框架主要依赖局部搜索优化或一次性坐标对齐,容易导致次优收敛和长期不一致,尤其是在大规模或退化环境中。为解决这些局限性,本文提出了首个集成最先进的可认证最优位姿图优化(PGO)后端的去中心化LiDAR-SLAM系统。通过利用黎曼块坐标下降(RBCD)算法,我们的系统无需精确初始猜测即可确保全局一致的轨迹估计。实验结果表明,所提出的框架实现了卓越的鲁棒性,与最先进的DiSCo-SLAM相比,轨迹RMSE最高改善了48.9%。

英文摘要

Decentralized multi-robot LiDAR-SLAM is essential for collaborative missions but faces significant challenges in maintaining global consistency. Existing frameworks predominantly rely on local-search optimization or one-time coordinate alignment, which are prone to suboptimal convergence and long-term inconsistency, especially in large-scale or degenerate environments. To address these limitations, this paper presents the first decentralized LiDAR-SLAM system that integrates a state-of-the-art certifiably optimal Pose Graph Optimization (PGO) backend. By leveraging the Riemannian Block Coordinate Descent (RBCD) algorithm, our system ensures globally consistent trajectory estimation without requiring accurate initial guesses. Experimental results demonstrate that the proposed framework achieves superior robustness, improving trajectory RMSE by up to 48.9% compared to the state-of-the-art DiSCo-SLAM.

2605.22018 2026-06-03 cs.CV cs.AI cs.RO 版本更新

FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments

FRED:面向洪水道路环境的多模态自动驾驶数据集

Connor Malone, Sebastien Demmel, Sebastien Glaser

发表机构 * Queensland University of Technology(昆士兰理工大学) ARC Training Centre for Automated Vehicles in Rural and Remote Regions (AVR3)(农村和偏远地区自动化车辆培训中心(AVR3))

AI总结 提出首个针对道路水险场景的多模态自动驾驶数据集FRED,包含相机、LiDAR和IMU数据,并提供语义标签以支持水险检测方法训练与评估。

详情
AI中文摘要

洪水道路环境数据集(FRED)是,据我们所知,首个专门针对道路水险场景数据收集的多模态自动驾驶数据集。该数据集包含来自2.3 MP FLIR Blackfly USB3相机的图像、来自Ouster OS1-64 LiDAR的64线360度点云,以及由Geoflex RTK GNSS校正的iXblue ATLANS-C IMU数据,数据采集自五个不同地点,涵盖洪水期间和洪水之后。数据以两种格式发布:KITTI风格格式,便于与现有数据工具集成;以及RTMaps格式,用于直接回放车辆的数据捕获。我们提供语义标签,以支持用于水险检测的单传感器和传感器融合方法的训练与评估。提供位置和速度数据,以及干燥条件下捕获的数据,以支持可能包含地图的基于位置的检测方法开发,并评估其他任务,如定位和SLAM。

英文摘要

The Flooded Road Environments Dataset (FRED) is, to our knowledge, the first multi-modal autonomous driving dataset specifically targeting the collection of data from scenarios involving water hazards on the road. The dataset contains images from a 2.3 MP FLIR Blackfly USB3 camera, 64-beam 360 degree point clouds from an Ouster OS1-64 LiDAR, and data from an iXblue ATLANS-C IMU corrected by a Geoflex RTK GNSS, from five separate locations captured both during and after flooding events. The data has been released in two formats: a KITTI-style format for easy integration with existing data tools, and the RTMaps format for direct replay of the vehicle's data capture. We provide semantic labels to enable the training and evaluation of both single-sensor and sensor-fusion methods for water hazard detection. Position and velocity, as well as data captured under dry conditions, are provided to enable the development of location-based detection methods that may incorporate maps, and to evaluate other tasks such as localisation and SLAM.

2605.16816 2026-06-03 cs.RO 版本更新

"I'm Not Mad, Just Focused": Understanding Human Emotions in Human-Robot Collaboration

“我没生气,只是专注”:理解人机协作中的人类情绪

Seung Chan Hong, Dana Kulić, Leimin Tian

发表机构 * Faculty of Engineering, Monash University(莫纳什大学工程学院) CSIRO Robotics(CSIRO机器人实验室)

AI总结 提出基于视觉语言模型(VLM)的情绪识别系统,利用上下文理解改善人机协作中的情绪解读,实验表明其语义相似性和情感对齐优于基线CNN系统,且用户偏好情绪自适应机器人行为。

详情
Journal ref
IEEE Robotics and Automation Letters, vol. 11, no. 7, pp. 8260-8267, July 2026
AI中文摘要

人机协作(HRC)可以从机器人解读人类情绪状态的能力中受益。然而,当前HRC中的情绪识别(ER)模型往往表现不足,特别是因为它们依赖于表演数据集和单一模态输入(如面部表情)。我们提出了一种新颖的基于视觉语言模型(VLM)的ER系统,利用上下文理解来改善HRC中的情绪解读。我们首先通过评估VLM-ER系统与现有HRC数据集上人工标注的语义和情感相似性来对其进行评估。然后,在协作配送任务的用户研究中,我们评估了基于VLM-ER系统推断的用户情绪状态来调节机器人行为的效果。结果表明,与基线卷积神经网络系统相比,所提出的VLM-ER系统实现了更高的人工标注语义相似性和正向情感对齐。此外,用户研究中的参与者更喜欢由VLM-ER系统促进的情绪自适应机器人行为。

英文摘要

Human-robot collaboration (HRC) can benefit from robots' abilities to interpret human emotional states. However, current emotion recognition (ER) models in HRC often fall short, particularly due to their reliance on acted datasets and single-modality inputs like facial expressions. We propose a novel vision language model (VLM)-based ER system that leverages contextual understanding to improve emotion interpretation in HRC. We first evaluate the VLM-ER system by assessing its semantic and sentiment similarity with human annotations on an existing HRC dataset. Then, in a user study with a service robot in a collaborative delivery task, we evaluate the effects of modulating the robot's behaviour based on the user's emotional state inferred by the VLM-ER system. The results show that the proposed VLM-ER system achieves higher semantic similarity and positive sentiment alignment with human annotations compared to a baseline convolutional neural network-based system. Further, participants in the user study preferred emotion-adaptive robot behaviour facilitated by the VLM-ER system.

2604.19275 2026-06-03 eess.SY cs.OS cs.RO cs.SY 版本更新

Scheduling Analysis of UAV Flight Control Workloads on PREEMPT_RT Linux Using a Raspberry Pi 5

基于Raspberry Pi 5的PREEMPT_RT Linux上无人机飞行控制工作负载的调度分析

Luiz Giacomossi, Håkan Forsberg, Ivan Tomasic, Baran Çürüklü, Tommaso Cucinotta

发表机构 * Mälardalen University(梅拉达伦大学) ReTiS Lab, Scuola Superiore Sant’Anna(ReTiS实验室,圣安娜高等学院)

AI总结 通过分析Raspberry Pi 5上PREEMPT_RT Linux内核的激活路径对250 Hz控制回路的影响,发现标准内核最差延迟超过9 ms,而PREEMPT_RT将最差延迟降低约88%至225微秒以下,但剩余抖动主要由硬件内存争用引起。

Comments 9 pages, 8 figures, conference

详情
AI中文摘要

现代无人机架构日益趋向于将高级自主性和低级飞行控制统一在单个通用操作系统(GPOS)上。然而,复杂的多核片上系统(SoC)由于共享资源争用引入了显著的时间不确定性。本文对Raspberry Pi 5上的PREEMPT_RT Linux内核进行了架构分析,特别隔离了内核激活路径(延迟执行的SoftIRQ与实时直接激活)对250 Hz控制回路的影响。结果表明,在高负载下,标准内核不适合,最差延迟超过9毫秒。相比之下,PREEMPT_RT将最差延迟降低了近88%,降至225微秒以下,通过强制直接唤醒路径减轻了操作系统噪声。这些发现表明,虽然PREEMPT_RT解决了调度方差问题,但现代SoC上的剩余抖动主要由硬件内存争用驱动。

英文摘要

Modern UAV architectures increasingly aim to unify high-level autonomy and low-level flight control on a single General-Purpose Operating System (GPOS). However, complex multi-core System-on-Chips (SoCs) introduce significant timing indeterminism due to shared resource contention. This paper performs an architectural analysis of the PREEMPT_RT Linux kernel on a Raspberry Pi 5, specifically isolating the impact of kernel activation paths (deferred execution SoftIRQs versus real-time direct activation) on a 250 Hz control loop. Results show that under heavy stress, the standard kernel is unsuitable, exhibiting worst-case latencies exceeding 9 ms. In contrast, PREEMPT_RT reduced the worst-case latency by nearly 88 percent to under 225 microseconds, enforcing a direct wake-up path that mitigates OS noise. These findings demonstrate that while PREEMPT_RT resolves scheduling variance, the residual jitter on modern SoCs is primarily driven by hardware memory contention.
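To put the quoted latencies in context, it helps to compare them with the period of the 250 Hz loop. The short calculation below uses only the figures stated in the abstract (250 Hz, 9 ms, 225 microseconds); the variable and function names are illustrative.

```python
CONTROL_RATE_HZ = 250
period_us = 1_000_000 / CONTROL_RATE_HZ  # 4000 us per control cycle

standard_worst_us = 9_000  # abstract: standard kernel exceeds 9 ms
rt_worst_us = 225          # abstract: PREEMPT_RT stays under 225 us

def fraction_of_period(latency_us):
    """Latency expressed as a multiple of one control period."""
    return latency_us / period_us

print(fraction_of_period(standard_worst_us))  # more than two full periods
print(fraction_of_period(rt_worst_us))        # a few percent of one period
```

A 9 ms wake-up delay spans more than two 4 ms control periods, so at least two consecutive deadlines are missed, whereas 225 microseconds consumes under 6 percent of a single period.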

2603.23117 2026-06-03 cs.CR cs.AI cs.RO 版本更新

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

TRAP: 通过对抗性补丁劫持VLA的CoT推理

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji, Wenyuan Xu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出TRAP攻击,利用对抗性补丁劫持视觉-语言-动作模型的链式推理,实现目标行为操控。

Comments Accepted by ICML 2026

详情
AI中文摘要

通过集成链式推理,视觉-语言-动作模型在机器人操作中展现出强大能力,特别是在提升泛化性和可解释性方面。然而,基于CoT的推理机制的安全性尚未得到充分探索。在本文中,我们证明CoT推理引入了一种新的攻击向量,用于目标行为劫持——例如,导致机器人错误地将刀递给一个人而不是苹果——而无需修改用户的指令。我们首先提供经验证据表明,即使CoT与输入指令在语义上不一致,它仍然强烈主导动作生成。基于这一观察,我们提出TRAP,这是首个针对CoT推理VLA模型的目标行为劫持对抗性攻击。通过针对推理到动作的路径,TRAP使用对抗性补丁(例如,放置在桌子上的桌布)来引导中间CoT推理和下游动作朝向对手定义的行为。在三个代表性推理VLA上的广泛评估,涵盖了不同的CoT推理机制,证明了TRAP的有效性。值得注意的是,我们在现实环境中通过将补丁打印在纸上实现了该攻击。我们的发现凸显了保护VLA系统中CoT推理的紧迫性。项目页面可在https://zhengxian-huang.github.io/TRAP-website/获取。

英文摘要

By integrating Chain-of-Thought (CoT) reasoning, Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, particularly by improving generalization and interpretability. However, the security of CoT-based reasoning mechanisms remains largely unexplored. In this paper, we show that CoT reasoning introduces a novel attack vector for targeted behavior hijacking--for example, causing a robot to mistakenly deliver a knife to a person instead of an apple--without modifying the user's instruction. We first provide empirical evidence that CoT strongly governs action generation, even when it is semantically misaligned with the input instructions. Building on this observation, we propose TRAP, the first targeted behavior-hijacking adversarial attack against CoT-reasoning VLA models. By targeting the reasoning-to-action pathway, TRAP uses an adversarial patch (e.g., a tablecloth placed on the table) to steer intermediate CoT reasoning and downstream actions toward adversary-defined behaviors. Extensive evaluations on three representative reasoning VLAs, spanning distinct CoT reasoning mechanisms, demonstrate the effectiveness of TRAP. Notably, we implemented the patch by printing it on paper in a real-world setting. Our findings highlight the urgent need to secure CoT reasoning in VLA systems. The project page is available at https://zhengxian-huang.github.io/TRAP-website/.

2602.04132 2026-06-03 eess.SY cs.LG cs.RO cs.SY 版本更新

LC-SAC: Lyapunov-Constrained Soft Actor-Critic via Koopman Operator Theory for Trajectory Tracking and Stabilization

LC-SAC: 基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法用于轨迹跟踪与镇定

Dhruv S. Kushwaha, Zoleikha A. Biron

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种结合Koopman算子理论的李雅普诺夫约束软演员-评论家算法,通过线性提升动力学模型和闭环控制李雅普诺夫函数实现轨迹跟踪与镇定,并引入条件风险价值约束处理罕见但严重的失稳事件。

Comments 13 pages, 8 Figures

详情
AI中文摘要

强化学习在解决复杂序列决策问题中取得了显著成功,但其在安全关键物理系统中的应用仍受限于缺乏稳定性保证。标准强化学习算法优先考虑奖励最大化,往往产生可能引起振荡或无界状态发散的策略。本文提出一种基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法。我们通过扩展动态模态分解学习误差动力学的线性提升代理模型,并求解离散代数Riccati方程以获得闭式二次候选控制李雅普诺夫函数。该控制李雅普诺夫函数作为拉格朗日惩罚项被纳入SAC演员更新中,通过条件风险价值目标聚合最坏情况尾部分布,将约束压力集中在罕见但严重的失稳事件上。我们进一步引入三种结构性的EDMD改进:在求解DARE之前对提升的A矩阵进行谱半径归一化、具有物理意义的LQR状态代价,以及强制V(0)=0的值偏置锚点,使得闭式控制李雅普诺夫函数对于更高维的提升模型(如倒立摆和3D四旋翼)是适定的。消融研究表明,硬拉格朗日约束是必要的,将其替换为奖励塑形会导致学习不稳定并在四旋翼任务中导致回报崩溃。

英文摘要

Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. In this work, we propose a Lyapunov-Constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We learn a linear lifted surrogate of the error dynamics via Extended Dynamic Mode Decomposition (EDMD) and solve the Discrete Algebraic Riccati Equation (DARE) to obtain a closed-form quadratic candidate Control Lyapunov Function (CLF). This CLF is incorporated into the SAC actor update as a Lagrangian penalty that aggregates the worst-case tail of violations via a Conditional Value-at-Risk (CVaR) objective, concentrating constraint pressure on rare but severe instability events. We further introduce three structural EDMD refinements (spectral-radius normalization of the lifted A-matrix prior to the DARE solve, a physically meaningful LQR state cost, and a value-bias anchor enforcing V(0)=0) that make the closed-form CLF well-posed for higher-dimensional lifted models such as the cartpole and 3D quadrotor. The ablation study shows that a hard Lagrangian constraint is essential: replacing it with reward shaping (Lyap-RS-SAC) destabilizes learning and collapses return on quadrotor tasks.
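The CVaR aggregation mentioned above can be illustrated with a small sketch. The abstract does not give the exact form of the CLF decrease condition, so the violation term below (with a hypothetical `decay` rate) and the empirical tail-mean estimator are assumptions chosen for illustration only.

```python
import math

def clf_violation(v_now, v_next, decay=0.05):
    """Positive part of V(z') - (1 - decay) * V(z); zero if V decreases enough.

    The decay-rate condition is an assumed, illustrative form.
    """
    return max(0.0, v_next - (1.0 - decay) * v_now)

def cvar(values, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of the values."""
    # Round before ceil so float noise such as 3.0000000000000004 stays at 3.
    k = max(1, math.ceil(round((1.0 - alpha) * len(values), 9)))
    worst = sorted(values, reverse=True)[:k]
    return sum(worst) / k

# Mostly satisfied constraint with two rare, severe violations.
violations = [0.0] * 8 + [0.5, 2.0]
print(sum(violations) / len(violations))  # plain mean dilutes the rare events
print(cvar(violations, alpha=0.8))        # tail mean focuses on them
```

On this batch the plain mean is 0.25 while the tail mean at alpha = 0.8 is 1.25, which shows how a CVaR objective keeps pressure on the rare destabilizing samples instead of averaging them away.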

2602.06219 2026-06-03 cs.RO cs.AI 版本更新

Coupled Local and Global World Models for Efficient First Order RL

耦合局部与全局世界模型的高效一阶强化学习

Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti

发表机构 * Machines in Motion Laboratory, New York University, USA(纽约大学运动机器实验室) LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France(图卢兹大学LAAS-CNRS中心) Artificial and Natural Intelligence Toulouse Institute, Toulouse, France(图卢兹人工智能与自然智能研究所)

AI总结 提出一种通过解耦一阶梯度方法在数据驱动的世界模型内训练策略的方法,结合局部和全局世界模型实现高效梯度计算,在Push-T任务和四足机器人操作任务中显著优于PPO。

Comments Project website: https://coupled-global-local-wm-rl.pages.dev/

详情
AI中文摘要

世界模型为在标准模拟器难以处理的情况下更忠实地捕捉复杂动力学(包括接触和非刚性)以及复杂感官信息(如视觉感知)提供了一条有前景的途径。然而,这些模型的计算复杂度高,对流行的强化学习方法构成了挑战,这些方法已成功用于模拟器解决复杂运动任务,但在操作任务上仍存在困难。本文介绍了一种完全绕过模拟器的方法,在从机器人与真实环境交互中学习到的世界模型内部训练强化学习策略。其核心是通过一种新颖的解耦一阶梯度方法实现大规模扩散模型的策略训练:全尺度世界模型生成准确的前向轨迹,而轻量级潜在空间代理近似其局部动力学以实现高效梯度计算。这种局部与全局世界模型的耦合确保了高保真展开以及计算上可处理的微分。我们在Push-T操作任务上证明了该方法的有效性,其在样本效率上显著优于PPO。我们还通过四足机器人的自我中心物体操作任务进一步评估了该方法。这些结果共同表明,在数据驱动的世界模型内部学习是解决难以建模的图像空间强化学习任务的一条有前景的途径,无需依赖手工设计的物理模拟器。

英文摘要

World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally complex to evaluate, posing a challenge for popular RL approaches that have been successfully used with simulators to solve complex locomotion tasks yet still struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach through an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.
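The decoupling idea (forward states from an expensive model, gradients from a cheap local surrogate) can be shown on a toy scalar system. This is only an analogy under strong assumptions: both functions below are hand-written stand-ins rather than learned models, and the surrogate Jacobians happen to be exact here, which a learned latent-space surrogate would only approximate.

```python
import math

def global_step(x, u):
    """Stand-in for the expensive world model; treated as a black box."""
    return x + 0.1 * math.sin(x) + 0.5 * u

def surrogate_jacobians(x, u):
    """Stand-in for the cheap surrogate: (d next / d x, d next / d u)."""
    return 1.0 + 0.1 * math.cos(x), 0.5

def rollout_and_gradient(x0, actions, target):
    # Forward pass: states come from the black-box model only.
    states = [x0]
    for u in actions:
        states.append(global_step(states[-1], u))
    loss = (states[-1] - target) ** 2
    # Backward pass: chain rule through surrogate Jacobians evaluated
    # along the trajectory that the black-box model produced.
    grads = [0.0] * len(actions)
    adjoint = 2.0 * (states[-1] - target)
    for t in reversed(range(len(actions))):
        a, b = surrogate_jacobians(states[t], actions[t])
        grads[t] = adjoint * b
        adjoint *= a
    return loss, grads
```

The design point is that `global_step` is never differentiated: it is called once per step in the forward pass, and all derivative work is delegated to the lightweight function.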

2512.19347 2026-06-03 cs.RO 版本更新

OMP: One-step Meanflow Policy with Directional Alignment

OMP: 一步均值流策略与方向对齐

Han Fang, Yize Huang, Yuheng Zhao, Paul Weng, Xiao Li, Yutong Ban

发表机构 * School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China(上海交通大学机械工程学院) Global College, Shanghai Jiao Tong University, Shanghai, China(上海交通大学全球学院) Duke Kunshan University, Jiangsu, China(杜克昆山大学)

AI总结 提出一步均值流策略(OMP),通过方向对齐机制和微分推导方程解决均值流在机器人操作中的谱偏差和梯度饥饿问题,实现高保真实时操控。

Comments Accepted as poster of ICML-2026

详情
AI中文摘要

机器人操作日益采用数据驱动的生成策略框架,但该领域面临持续的权衡:扩散模型推理延迟高,而基于流的方法通常需要复杂的架构约束。尽管在图像生成领域,均值流范式提供了单步推理的路径,但其直接应用于机器人领域受到关键理论病理的阻碍,特别是低速度区域中的谱偏差和梯度饥饿。为克服这些限制,我们提出了一步均值流策略(OMP),一种专为高保真实时操作设计的新型框架。我们引入轻量级方向对齐机制,以显式同步预测速度与真实均值速度。此外,我们实现了微分推导方程(DDE)来近似雅可比向量积(JVP)算子,该算子解耦前向和后向传播,显著降低内存复杂度。在Adroit和Meta-World基准上的大量实验表明,OMP在成功率和轨迹精度上优于最先进方法,特别是在高精度任务中,同时保持了单步生成的效率。

英文摘要

Robot manipulation has increasingly adopted data-driven generative policy frameworks, yet the field faces a persistent trade-off: diffusion models suffer from high inference latency, while flow-based methods often require complex architectural constraints. Although the MeanFlow paradigm offers a path to single-step inference in the image-generation domain, its direct application to robotics is impeded by critical theoretical pathologies, specifically spectral bias and gradient starvation in low-velocity regimes. To overcome these limitations, we propose the One-step MeanFlow Policy (OMP), a novel framework designed for high-fidelity, real-time manipulation. We introduce a lightweight directional alignment mechanism to explicitly synchronize predicted velocities with true mean velocities. Furthermore, we implement a Differential Derivation Equation (DDE) to approximate the Jacobian-Vector Product (JVP) operator, which decouples forward and backward passes to significantly reduce memory complexity. Extensive experiments on the Adroit and Meta-World benchmarks demonstrate that OMP outperforms state-of-the-art methods in success rate and trajectory accuracy, particularly in high-precision tasks, while retaining the efficiency of single-step generation.
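One plausible way to picture a directional alignment term is a cosine-based loss between the predicted and target velocity vectors. The abstract does not specify the actual formulation, so the function below is an assumed illustrative form, not OMP's loss.

```python
import math

def directional_alignment_loss(pred, target, eps=1e-8):
    """1 - cosine similarity: 0 when aligned, 1 when orthogonal, 2 when opposed."""
    dot = sum(p * q for p, q in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(q * q for q in target))
    return 1.0 - dot / (norm + eps)

def mse(pred, target):
    """Mean squared error, shown for comparison."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

# Low-velocity regime: both vectors are tiny but point in different directions.
pred, target = (0.01, 0.0), (0.0, 0.01)
print(mse(pred, target))                         # about 1e-4, a very weak signal
print(directional_alignment_loss(pred, target))  # about 1.0, scale-independent
```

In this example the squared error is on the order of 1e-4 while the directional term stays near 1, which is one way a direction-only signal could counteract vanishing gradients when velocities are small.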

2512.22539 2026-06-03 cs.RO cs.CV 版本更新

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

VLA-Arena:一个用于基准测试视觉-语言-动作模型的开源框架

Borong Zhang, Jiahao Li, Jiachen Shen, Yuhao Zhang, Yishuai Cai, Yuanpei Chen, Juntao Dai, Jiaming Ji, Yaodong Yang

AI总结 提出VLA-Arena基准,通过三正交轴(任务结构、语言命令、视觉观察)量化任务难度,系统评估视觉-语言-动作模型的能力边界与失败模式。

Comments Accepted by ICML 2026

详情
AI中文摘要

尽管视觉-语言-动作模型(VLA)正快速向通用机器人策略发展,但定量理解其局限和失败模式仍然困难。为此,我们引入了一个名为VLA-Arena的全面基准。我们提出了一种新颖的结构化任务设计框架,用于在三个正交轴上量化难度:(1)任务结构,(2)语言命令,以及(3)视觉观察。这使我们能够系统地设计具有细粒度难度级别的任务,从而精确测量模型能力边界。对于任务结构,VLA-Arena的170个任务被分为四个维度:安全性、干扰物、外推和长时域。每个任务设计有三个难度级别(L0-L2),仅在L0上进行微调以评估通用能力。正交于此,语言(W0-W4)和视觉(V0-V4)扰动可应用于任何任务,以实现鲁棒性的解耦分析。我们对最先进的VLA进行了广泛评估,揭示了几个关键局限性,包括强烈的记忆化倾向而非泛化、不对称鲁棒性、缺乏对安全约束的考虑,以及无法组合已学技能以完成长时域任务。为了促进针对这些挑战的研究并确保可重复性,我们提供了完整的VLA-Arena框架,包括从任务定义到自动评估的端到端工具链,以及用于微调的VLA-Arena-S/M/L数据集。我们的基准、数据、模型和排行榜可在https://vla-arena.github.io获取。

英文摘要

While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation. This allows us to systematically design tasks with fine-grained difficulty levels, enabling a precise measurement of model capability frontiers. For Task Structure, VLA-Arena's 170 tasks are grouped into four dimensions: Safety, Distractor, Extrapolation, and Long Horizon. Each task is designed with three difficulty levels (L0-L2), with fine-tuning performed exclusively on L0 to assess general capability. Orthogonal to this, language (W0-W4) and visual (V0-V4) perturbations can be applied to any task to enable a decoupled analysis of robustness. Our extensive evaluation of state-of-the-art VLAs reveals several critical limitations, including a strong tendency toward memorization over generalization, asymmetric robustness, a lack of consideration for safety constraints, and an inability to compose learned skills for long-horizon tasks. To foster research addressing these challenges and ensure reproducibility, we provide the complete VLA-Arena framework, including an end-to-end toolchain from task definition to automated evaluation and the VLA-Arena-S/M/L datasets for fine-tuning. Our benchmark, data, models, and leaderboard are available at https://vla-arena.github.io.

2512.18268 2026-06-03 cs.RO cs.CG 版本更新

On The Computational Complexity of Minimum Aerial Photographs for Planar Region Coverage

关于平面区域覆盖的最小航拍照片的计算复杂性

Si Wei Feng

AI总结 研究用正方形和圆形覆盖简单平面多边形的计算复杂性,证明了不可近似性间隙并开发了2.828-最优近似算法。

Comments I have not communicated well with other contributors to the work when submitting this paper

详情
AI中文摘要

随着无人机技术的普及,航拍在环境监测、结构检查、执法等日常场景中变得普遍。该领域的一个核心挑战是在尊重图像分辨率和有限拍摄数量等约束的同时,高效地用照片覆盖目标区域,使照片能够完整捕捉该区域。本文研究了使用正方形和圆形覆盖简单平面多边形的计算复杂性。具体来说,它展示了1.165(对于正方形)和1.25(对于受限正方形中心)的不可近似性间隙,并开发了一个2.828-最优近似算法,表明这些问题在计算上难以近似。本文的直觉可以扩展到航拍之外更广泛的应用,如农药喷洒和战略传感器放置。

英文摘要

With the popularity of drone technologies, aerial photography has become prevalent in many daily scenarios such as environment monitoring, structure inspection, and law enforcement. A central challenge in this domain is the efficient coverage of a target area with photographs that entirely capture the region, while respecting constraints such as the image resolution and the limited number of pictures that can be taken. This work investigates the computational complexity of covering a simple planar polygon using squares and circles. Specifically, it shows inapproximability gaps of $1.165$ (for squares) and $1.25$ (for restricted square centers) and develops a $2.828$-optimal approximation algorithm, demonstrating that these problems are computationally intractable to approximate. The intuitions of this work can extend beyond aerial photography to broader applications such as pesticide spraying and strategic sensor placement.

2512.21235 2026-06-03 cs.RO 版本更新

RoboCade: Gamifying Robot Data Collection

RoboCade: 游戏化机器人数据收集

Suvir Mirchandani, Mia Tang, Jiafei Duan, Jubayer Ibn Hamid, Michael Cho, Dorsa Sadigh

AI总结 提出游戏化远程操作平台RoboCade,通过嵌入视觉反馈、音效、进度条等游戏化元素,吸引普通用户收集演示数据,并证明该数据可提升下游策略成功率16-56%,且用户愉悦度提高24%。

Comments 10 pages, 9 figures. International Conference on Robotics and Automation (ICRA) 2026

详情
AI中文摘要

从人类演示中模仿学习已成为训练自主机器人策略的主流方法。然而,收集演示数据集成本高昂:通常需要访问机器人,并在冗长乏味的过程中持续付出努力。这些因素限制了可用于训练策略的数据规模。我们旨在通过让更广泛的受众参与既易于访问又具有激励性的游戏化数据收集体验来解决这一可扩展性挑战。具体来说,我们开发了一个游戏化远程操作平台RoboCade,以吸引普通用户收集对下游策略训练有益的数据。为此,我们将游戏化策略嵌入系统界面和数据收集任务的设计中。在系统界面中,我们包含视觉反馈、音效、目标可视化、进度条、排行榜和徽章等组件。我们还提出了构建与有用下游目标任务具有重叠结构的游戏化任务的原则。我们在三个操作任务(包括空间排列、扫描和插入)上实例化了RoboCade。为了说明游戏化机器人数据收集的可行性,我们通过平台收集了一个演示数据集,并表明使用该数据共同训练机器人策略可以提高非游戏化目标任务的成功率(+16-56%)。此外,我们进行了一项用户研究,验证了新手用户认为游戏化平台比标准非游戏化平台显著更有趣(+24%)。这些结果凸显了游戏化数据收集作为收集演示数据的一种可扩展、可访问且引人入胜的方法的前景。

英文摘要

Imitation learning from human demonstrations has become a dominant approach for training autonomous robot policies. However, collecting demonstration datasets is costly: it often requires access to robots and needs sustained effort in a tedious, long process. These factors limit the scale of data available for training policies. We aim to address this scalability challenge by involving a broader audience in a gamified data collection experience that is both accessible and motivating. Specifically, we develop a gamified remote teleoperation platform, RoboCade, to engage general users in collecting data that is beneficial for downstream policy training. To do this, we embed gamification strategies into the design of the system interface and data collection tasks. In the system interface, we include components such as visual feedback, sound effects, goal visualizations, progress bars, leaderboards, and badges. We additionally propose principles for constructing gamified tasks that have overlapping structure with useful downstream target tasks. We instantiate RoboCade on three manipulation tasks -- including spatial arrangement, scanning, and insertion. To illustrate the viability of gamified robot data collection, we collect a demonstration dataset through our platform, and show that co-training robot policies with this data can improve success rate on non-gamified target tasks (+16-56%). Further, we conduct a user study to validate that novice users find the gamified platform significantly more enjoyable than a standard non-gamified platform (+24%). These results highlight the promise of gamified data collection as a scalable, accessible, and engaging method for collecting demonstration data.

2511.04421 2026-06-03 cs.RO 版本更新

Temporal Action Selection for Action Chunking

用于动作分块的时间动作选择

Yueyang Weng, Xiaopeng Zhang, Yongjin Mu, Yingcong Zhu, Yanjie Li

发表机构 * Guangdong Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics and School of Intelligence Science and Engineering, the Harbin Institute of Technology Shenzhen, China(广东省智能变形机制与自适应机器人重点实验室和智能科学与工程学院,哈尔滨工业大学深圳学院)

AI总结 提出时间动作选择(TAS)算法,通过缓存多时间步预测的动作块并动态选择最优动作,在保持决策一致性的同时提升反应性,显著提高任务成功率。

详情
AI中文摘要

动作分块是从示范中学习(LfD)领域广泛采用的方法。通过建模多步动作块而非单步动作,动作分块显著增强了对人类专家策略的建模能力。然而,由于动作分块仅在完整动作块执行后才做出单一决策,由此导致的决策频率降低限制了实时观测的利用,削弱了在动态或嘈杂环境中的反应性。现有解决该问题的尝试主要是在反应性和决策一致性之间进行权衡,未能同时实现两者。为解决这一局限,我们提出了一种新颖算法:时间动作选择(TAS),该算法缓存来自多个时间步的预测动作块,并通过轻量级选择器网络动态选择最优动作。TAS在反应性和决策一致性上实现了平衡优化。跨多个任务及不同基础策略架构的实验表明,TAS显著提高了成功率。此外,将TAS作为基础策略与残差强化学习(RL)相结合,既提升了训练效率,也提高了性能上限。在仿真和物理机器人上的实验均证实了该方法的有效性。

英文摘要

Action chunking is a widely adopted approach in Learning from Demonstration (LfD). By modeling multi-step action chunks rather than single-step actions, action chunking significantly enhances modeling capabilities for human expert policies. However, because action chunking makes a single decision only after a complete action block has been executed, the resulting reduction in decision frequency restricts the utilization of real-time observations, impairing reactivity in dynamic or noisy environments. Existing efforts to address this issue have primarily resorted to trading off reactivity against decision consistency, without achieving both. To address this limitation, we propose a novel algorithm, Temporal Action Selection (TAS), which caches predicted action chunks from multiple timesteps and dynamically selects the optimal action through a lightweight selector network. TAS achieves balanced optimization across both reactivity and decision consistency. Experiments across multiple tasks with diverse base policy architectures show that TAS significantly improves success rates. Furthermore, integrating TAS as a base policy with residual reinforcement learning (RL) improves both training efficiency and the performance ceiling. Experiments both in simulation and on physical robots confirm the method's efficacy.
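The caching step described above can be sketched as a small data structure: each cached chunk remembers the timestep at which it was predicted, so every chunk that still covers the current step contributes one candidate action. The class and function names are hypothetical, and the `score` callable merely stands in for the learned selector network, whose inputs and architecture the abstract does not detail.

```python
from collections import deque

class ChunkCache:
    """Keeps the most recent predicted action chunks with their start steps."""

    def __init__(self, max_chunks):
        self.chunks = deque(maxlen=max_chunks)  # items: (start_step, actions)

    def add(self, start_step, actions):
        self.chunks.append((start_step, list(actions)))

    def candidates(self, step):
        """Actions that cached chunks predicted for `step`, oldest chunk first."""
        out = []
        for start, actions in self.chunks:
            offset = step - start
            if 0 <= offset < len(actions):
                out.append(actions[offset])
        return out

def select_action(candidates, score):
    """Pick the candidate with the highest score (placeholder for the selector)."""
    return max(candidates, key=score)

cache = ChunkCache(max_chunks=4)
cache.add(0, [0.0, 0.1, 0.2, 0.3])   # chunk predicted at step 0
cache.add(1, [0.15, 0.25, 0.35])     # chunk predicted at step 1
options = cache.candidates(2)        # one option from each chunk
```

At step 2, the older chunk offers its third action and the newer chunk its second; a selector that favors the fresher prediction gains reactivity, while one that favors the older chunk preserves consistency.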

2511.02417 2026-06-03 cs.CV cs.RO 版本更新

CropCraft: A Procedural World Generator for Robotic Simulation of Agricultural Tasks

CropCraft:用于农业任务机器人仿真的程序化世界生成器

Riccardo Bertoglio, Cyrille Pierre, Johann Laconte, Roland Lenain

发表机构 * Institut National de la Recherche Agronomique(法国国家农业科研院)

AI总结 提出基于Blender和Python的开源程序化世界生成器CropCraft,通过YAML配置生成多样化农田场景,支持间作、葡萄园和杂草田,并生成带标注的3D仿真环境,用于农业机器人感知和导航算法开发。

详情
AI中文摘要

现代农业中农业生态实践的采用要求机器人系统能够在高度多样化和复杂的田间环境中运行。开发和评估此类系统严重依赖仿真,但生成代表农业生态多样性的逼真且可配置的3D环境仍然是一个主要挑战。本文提出了 CropCraft,一个基于 Blender 和 Python 构建的开源程序化世界生成器,旨在生成适用于农业机器人的3D仿真环境。CropCraft 通过简单的 YAML 配置文件生成作物田,支持多种场景,包括间作、葡萄园和杂草丛生的田地。该工具包含一个多生长阶段的3D植物模型库(作物、草和杂草),并使用随机放置算法真实地再现实际田地中观察到的空间变异性。生成的场景可直接导入 Gazebo 仿真器,并包含所有放置元素的地面真值标注,支持感知和导航算法的开发。为了展示 CropCraft 的实际用途,我们将其应用于使用深度学习的作物-杂草语义分割任务。生成了包含10,000张玉米田合成图像的数据集,这些图像具有不同的杂草密度、生长阶段和光照条件,并用于训练多个分割架构。仅使用合成数据训练的模型在真实田间图像上实现了约10%的平均交并比(mIoU)的 sim-to-real 差距,优于先前的先进合成生成方法。我们进一步表明,即使将少量真实图像与合成数据结合,也能提高跨领域的泛化能力,为农业感知任务中合成数据的有效使用提供了新见解。

英文摘要

The adoption of agroecological practices in modern agriculture requires robotic systems capable of operating in highly diverse and complex field environments. Developing and evaluating such systems relies heavily on simulation, yet generating realistic and configurable 3D environments representative of agroecological diversity remains a major challenge. This paper presents CropCraft, an open-source procedural world generator built on Blender and Python, designed to produce 3D simulation environments tailored to agricultural robotics. CropCraft generates crop fields from a simple YAML configuration file, supporting a wide range of scenarios including intercropping, vineyards, and weed-infested fields. The tool includes a library of 3D plant models (crops, grasses, and weeds) at multiple growth stages, and uses stochastic placement algorithms to realistically reproduce the spatial variability observed in real fields. Generated worlds are directly importable into the Gazebo simulator and include ground-truth annotations for all placed elements, supporting both perception and navigation algorithm development. To demonstrate the practical utility of CropCraft, we apply it to the task of crop-weed semantic segmentation using deep learning. A dataset of 10,000 synthetic images of maize fields with varying weed densities, growth stages, and lighting conditions was generated and used to train several segmentation architectures. Models trained exclusively on synthetic data achieve a sim-to-real gap of approximately 10% mean Intersection over Union (mIoU) on real field images, outperforming previous state-of-the-art synthetic generation approaches. We further show that combining even a few real images with synthetic data improves generalization across domains, providing new insights into the effective use of synthetic data for agricultural perception tasks.
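For readers unfamiliar with the metric behind the reported sim-to-real gap, the sketch below computes mean Intersection over Union from flattened label arrays. The three-class labeling (for instance soil, crop, weed) and the sample values are made up for illustration, and the abstract does not state how the gap itself is derived from the per-model mIoU scores.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes that appear in the prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes absent from both arrays
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy per-pixel labels, flattened: 0 = soil, 1 = crop, 2 = weed (hypothetical).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, truth, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.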

2505.08222 2026-06-03 cs.RO cs.AI cs.DC cs.PF 版本更新

Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles

通过自主车辆扩展多智能体强化学习用于水声跟踪

Matteo Gallici, Ivan Masmitja, Mario Martín

发表机构 * KEMLG Research Group, Universitat Politècnica de Catalunya Barcelona, Spain(凯姆尔格研究组,巴塞罗那理工大学,西班牙) Instituto de Ciencias del Mar, Consejo Superior de Investigaciones Científicas, Barcelona, Spain(海洋科学研究所,西班牙国家科学研究委员会,巴塞罗那,西班牙) KEMLG Research Group, Universitat Politècnica de Catalunya (UPC), and with the HPAI group at Barcelona Supercomputing Center (BSC), Barcelona, Spain(凯姆尔格研究组,巴塞罗那理工大学(UPC),以及巴塞罗那超级计算中心(BSC)的HPAI组,巴塞罗那,西班牙)

AI总结 提出一种GPU加速环境(高达30000倍加速)和基于Transformer的MARL架构(TransfMAPPO),实现多目标快速移动场景下的水下跟踪,跟踪误差低于5米。

详情
AI中文摘要

自主车辆(AV)为水下跟踪等科学任务提供了经济高效的解决方案。强化学习(RL)已成为控制AV的强大方法,但扩展到舰队(对于多目标跟踪或快速移动目标至关重要)具有挑战性。多智能体RL(MARL)以样本效率低下而闻名,虽然像Gazebo的LRAUV这样的高保真模拟器提供高达100倍实时速度的单机器人模拟,但在多车辆场景中几乎没有加速,使得MARL训练不切实际。然而,高保真模拟对于测试复杂策略和缩小模拟到现实的差距至关重要。为了解决这些限制,我们开发了一个GPU加速环境,在保持其动力学的同时,实现了比Gazebo高达30000倍的加速。这使得快速、端到端的GPU训练以及无缝转移到Gazebo进行评估成为可能。我们还引入了一种基于Transformer的架构(TransfMAPPO),该架构学习对舰队规模和目标数量不变的策略,从而能够通过课程学习在日益复杂的场景中训练更大的舰队。经过大规模GPU训练后,我们在Gazebo中进行了广泛评估,表明即使面对多个快速移动的目标,我们的方法也能将跟踪误差保持在5米以下。

英文摘要

Autonomous vehicles (AVs) offer a cost-effective solution for scientific missions such as underwater tracking. Reinforcement learning (RL) has emerged as a powerful method for controlling AVs, but scaling to fleets (essential for multi-target tracking or rapidly moving targets) is challenging. Multi-Agent RL (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo's LRAUV provide up to 100x faster-than-real-time single-robot simulations, they offer little speedup in multi-vehicle scenarios, making MARL training impractical. Yet, high-fidelity simulation is crucial to test complex policies and close the sim-to-real gap. To address these limitations, we develop a GPU-accelerated environment that achieves up to 30,000x speedup over Gazebo while preserving its dynamics. This enables fast, end-to-end GPU training and seamless transfer to Gazebo for evaluation. We also introduce a Transformer-based architecture (TransfMAPPO) that learns policies invariant to fleet size and number of targets, enabling curriculum learning to train larger fleets on increasingly complex scenarios. After large-scale GPU training, we perform extensive evaluations in Gazebo, showing our method maintains tracking errors below 5 m even with multiple fast-moving targets.

2505.17659 2026-06-03 cs.RO cs.CV 版本更新

Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

Plan-R1:安全且可行的轨迹规划作为语言建模

Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen

发表机构 * Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所) University of Chinese Academy of Sciences(中国科学院大学)

AI总结 提出Plan-R1两阶段轨迹规划框架,通过原则对齐与行为学习解耦,结合规则奖励和方差解耦GRPO,显著提升自动驾驶规划的安全性和可行性。

Comments Accepted by ICLR2026

详情
AI中文摘要

安全且可行的轨迹规划对于现实世界的自动驾驶系统至关重要。然而,现有的基于学习的规划器严重依赖专家演示,这不仅缺乏明确的安全意识,还可能继承次优人类驾驶数据中的不良行为(如超速)。受大型语言模型成功的启发,我们提出了Plan-R1,一种两阶段轨迹规划框架,将原则对齐与行为学习解耦。在第一阶段,通用轨迹预测器在专家数据上进行预训练,以捕获多样化的、类人的驾驶行为。在第二阶段,使用基于规则的奖励通过组相对策略优化(GRPO)对模型进行微调,明确地将自我规划与安全、舒适和交通规则遵守等原则对齐。这种两阶段范式保留了类人行为,同时增强了安全意识并丢弃了演示中的不良模式。此外,我们识别了直接应用GRPO到规划的一个关键限制:组级归一化消除了跨组的尺度差异,导致罕见、高方差的安全违规组与大量低方差的安全组具有相似的优势,从而抑制了对安全关键目标的优化。为解决此问题,我们提出了方差解耦GRPO(VD-GRPO),用中心化和固定缩放替代归一化以保留绝对奖励幅度,确保安全关键目标在整个训练过程中保持主导地位。在nuPlan基准上的实验表明,Plan-R1显著提高了规划的安全性和可行性,达到了最先进的性能,特别是在现实反应性设置中。我们的代码可在https://github.com/XiaolongTang23/Plan-R1获取。

英文摘要

Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting undesirable behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. In the first stage, a general trajectory predictor is pre-trained on expert data to capture diverse, human-like driving behaviors. In the second stage, the model is fine-tuned with rule-based rewards using Group Relative Policy Optimization (GRPO), explicitly aligning ego planning with principles such as safety, comfort, and traffic rule compliance. This two-stage paradigm retains human-like behaviors while enhancing safety awareness and discarding undesirable patterns from demonstrations. Furthermore, we identify a key limitation of directly applying GRPO to planning: group-wise normalization erases cross-group scale differences, causing rare, high-variance safety-violation groups to receive advantages similar to those of abundant low-variance safe groups, thereby suppressing optimization for safety-critical objectives. To address this, we propose Variance-Decoupled GRPO (VD-GRPO), which replaces normalization with centering and fixed scaling to preserve absolute reward magnitudes, ensuring that safety-critical objectives remain dominant throughout training. Experiments on the nuPlan benchmark demonstrate that Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance, particularly in realistic reactive settings. Our code is available at https://github.com/XiaolongTang23/Plan-R1.
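The normalization issue described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `scale` constant, and the toy reward values are assumptions made for illustration. It contrasts dividing by each group's standard deviation with centering followed by a fixed divisor.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: center, then divide by the group's own std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def vd_grpo_advantages(rewards, scale=1.0):
    """Center only, then divide by a fixed constant (`scale` is a hypothetical name)."""
    mu = mean(rewards)
    return [(r - mu) / scale for r in rewards]

# Toy reward groups (made-up numbers): a safe group with tiny spread,
# and a group where one sampled trajectory incurs a large safety penalty.
safe_group = [0.98, 1.00, 1.02, 1.00]
unsafe_group = [1.0, 1.0, -9.0, 1.0]

for name, fn in (("normalized", grpo_advantages), ("centered", vd_grpo_advantages)):
    peak_safe = max(abs(a) for a in fn(safe_group))
    peak_unsafe = max(abs(a) for a in fn(unsafe_group))
    print(name, round(peak_safe, 3), round(peak_unsafe, 3))
```

In this toy setting the per-group normalization gives both groups advantages of comparable size, while centering with a fixed scale keeps the penalized group's advantages far larger, which is the effect the abstract attributes to VD-GRPO.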

2509.20623 2026-06-03 cs.RO 版本更新

Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation

潜在激活编辑:基于推理时策略精炼的安全多机器人导航

Satyajeet Das, Darren Chiu, Zhehui Huang, Lars Lindemann, Gaurav S. Sukhatme

发表机构 * Department of Computer Science, University of Southern California(南加州大学计算机科学系) Automatic Control Laboratory, ETH Zürich(苏黎世联邦理工学院自动控制实验室)

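AI summary: Proposes the Latent Activation Editing (LAE) framework, which detects and edits intermediate activations online at inference time to lower the collision rate of pre-trained policies without modifying weights or architecture, achieving nearly 90% fewer collisions in quadrotor navigation.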

详情
AI中文摘要

强化学习在协调和导航多个四旋翼等复杂领域取得了显著进展。然而,即使经过良好训练的策略在障碍物密集的环境中仍然容易发生碰撞。通过重新训练或微调来解决这些罕见但关键的安全故障成本高昂,并且有损于先前学到的技能。受大语言模型中的激活引导和计算机视觉中的潜在编辑启发,我们引入了一个推理时潜在激活编辑(LAE)框架,该框架在不修改权重或架构的情况下精炼预训练策略的行为。该框架分两个阶段运行:(i)在线分类器监控中间激活以检测与不良行为相关的状态,(ii)激活编辑模块选择性地修改被标记的激活,将策略转向更安全的区域。在这项工作中,我们专注于提高多四旋翼导航的安全性。我们假设放大策略内部的风险感知可以诱导更安全的行为。我们通过训练一个潜在碰撞世界模型来实例化这一想法,该模型预测未来的碰撞前激活,从而促使更早和更谨慎的避碰响应。大量的仿真和真实Crazyflie实验表明,与未编辑的基线相比,LAE实现了统计上显著的碰撞减少(累计碰撞减少近90%),并显著增加了无碰撞轨迹的比例,同时保持了任务完成。更广泛地说,我们的结果确立了LAE作为一种轻量级范式,可在资源受限的硬件上对学习后的机器人策略进行部署后精炼。

英文摘要

Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy's internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses. Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves a statistically significant reduction in collisions (nearly 90% fewer cumulative collisions compared to the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies.

2509.18068 2026-06-03 cs.RO eess.SP 版本更新

RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

RadarSFD:基于预训练先验的单帧扩散用于雷达点云

Bin Zhao, Nakul Garg

发表机构 * Rice University(莱斯大学)

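AI summary: Proposes RadarSFD, a conditional latent diffusion framework that uses geometric priors from a pretrained monocular depth estimator to reconstruct dense LiDAR-like point clouds from single-frame radar data, without synthetic aperture or multi-frame aggregation.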

Comments Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026). Project page: https://phi-lab-rice.github.io/RadarSFD/

详情
AI中文摘要

毫米波雷达在雾、烟、尘和低光环境下提供稳健的感知,使其适用于尺寸、重量和功率受限的机器人平台。现有的雷达成像方法通常依赖合成孔径或多帧聚合来提高分辨率,这对于小型空中、检测或可穿戴系统不切实际。我们提出RadarSFD,一种条件潜在扩散框架,无需运动或SAR即可从单帧雷达重建密集的LiDAR-like点云。我们的方法将预训练单目深度估计器的几何先验转移到扩散骨干中,通过通道级潜在拼接将其锚定到雷达输入,并使用结合潜在空间和像素空间损失的双空间目标进行正则化。在RadarHD基准上,RadarSFD相对于基线模型实现了最先进的性能。定性结果显示恢复了精细的墙壁和狭窄的间隙,跨新环境的实验证实了强大的泛化能力。消融研究强调了预训练初始化、雷达BEV条件和双空间损失的重要性。这些结果共同为紧凑型机器人系统中的密集点云感知建立了一个实用的单帧、无SAR毫米波雷达流水线。

英文摘要

Millimeter-wave radar provides robust perception in fog, smoke, dust, and low light, making it attractive for size-, weight-, and power-constrained robotic platforms. Existing radar imaging methods typically rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame without motion or SAR. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves state-of-the-art performance against baseline models. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish a practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.

2509.14636 2026-06-03 cs.RO 版本更新

BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots

BEV-ODOM2: 基于PV-BEV融合与密集光流监督的增强型BEV单目视觉里程计用于地面机器人

Yufei Wei, Chenxiao Hu, Wangtao Lu, Sha Lu, Yuxiang Cui, Fuzhang Han, Rong Xiong, Yue Wang

发表机构 * Tsinghua University(清华大学)

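AI summary: To address sparse supervision from pose-only training and information loss during perspective-to-BEV projection in existing BEV methods, proposes the BEV-ODOM2 framework, which uses dense BEV optical flow supervision and PV-BEV fusion to achieve a 40% RTE improvement across four datasets and supports real-time edge deployment.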

详情
AI中文摘要

尺度一致的自我运动估计是自主地面机器人的基础。鸟瞰图(BEV)表示通过提供度量尺度的平面工作空间,自然地解决了单目视觉里程计(MVO)的尺度漂移问题,使得6自由度自我运动简化为更鲁棒的3自由度模型。然而,现有的基于BEV的方法存在两个关键限制:仅从位姿训练得到的稀疏监督信号,以及透视到BEV投影过程中的信息丢失。我们提出了BEV-ODOM2,一个增强框架,无需额外标注即可解决这两个限制。我们的方法引入了(1)直接从3自由度位姿真值构建的密集BEV光流监督,用于像素级指导,以及(2)透视视图(PV)-BEV融合,在投影前计算相关体积以保留6自由度运动线索。增强的旋转采样策略进一步在训练中平衡了不同的运动模式。我们在四个不同空间尺度的数据集上进行了评估:KITTI、Oxford、NCLT和我们新收集的ZJH-VO基准。BEV-ODOM2相比之前的BEV方法实现了40%的RTE提升,在NVIDIA Jetson AGX Orin上的实时推理确认了边缘部署的可行性。源代码和ZJH-VO数据集已公开发布,以促进未来研究。

英文摘要

Scale-consistent ego-motion estimation is fundamental for autonomous ground robots. Bird's-Eye-View (BEV) representation naturally addresses the scale drift problem of monocular visual odometry (MVO) by providing a metric-scaled planar workspace, enabling the simplification of 6-DoF ego-motion to a more robust 3-DoF model. However, existing BEV-based methods suffer from two key limitations: sparse supervision signals from pose-only training, and information loss during perspective-to-BEV projection. We present BEV-ODOM2, an enhanced framework that addresses both limitations without requiring additional annotations. Our approach introduces (1) dense BEV optical flow supervision constructed directly from 3-DoF pose ground truth for pixel-level guidance, and (2) Perspective View (PV)-BEV fusion that computes correlation volumes before projection to preserve 6-DoF motion cues. An enhanced rotation sampling strategy further balances diverse motion patterns during training. We evaluate on four datasets with varied spatial scales: KITTI, Oxford, NCLT, and our newly collected ZJH-VO benchmark. BEV-ODOM2 achieves a 40% RTE improvement over prior BEV-based methods, with real-time inference on an NVIDIA Jetson AGX Orin confirming edge deployment feasibility. The source code and the ZJH-VO dataset are publicly released to facilitate future research.
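One way to picture how a dense BEV flow field can follow from a 3-DoF pose change is sketched below. It assumes a static, planar scene and an ego-centred metric grid; the function name, grid layout, and sign conventions (x forward, y left, counterclockwise-positive yaw) are illustrative assumptions, not details taken from the paper.

```python
import math

def bev_flow(dx, dy, dtheta, half_extent=2, resolution=1.0):
    """Flow (in metres) of static ground points on an ego-centred BEV grid.

    (dx, dy, dtheta) is the ego motion between two frames, expressed in the
    first frame. A static point p seen in the first frame appears at
    R(-dtheta) @ (p - t) in the second frame; the flow is the difference.
    """
    c, s = math.cos(dtheta), math.sin(dtheta)
    flow = {}
    for i in range(-half_extent, half_extent + 1):
        for j in range(-half_extent, half_extent + 1):
            x, y = i * resolution, j * resolution
            xr, yr = x - dx, y - dy          # remove the translation
            x_new = c * xr + s * yr          # rotate by -dtheta
            y_new = -s * xr + c * yr
            flow[(x, y)] = (x_new - x, y_new - y)
    return flow

# Pure forward motion: every cell shifts backwards by the same amount.
print(bev_flow(1.0, 0.0, 0.0)[(2.0, 0.0)])
```

Every grid cell receives a flow vector from a single pose label, which is what makes the supervision dense without extra annotation.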

2508.09606 2026-06-03 cs.RO cs.SY eess.SY 版本更新

BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots

BEAVR:用于机器人的双手、多形态、可访问的虚拟现实遥操作系统

Alejandro Posadas-Nava, Alejandro Carrasco, Richard Linares

发表机构 * Department of Aeronautics and Astronautics, Massachusetts Institute of Technology(航空与航天系,麻省理工学院)

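AI summary: Proposes BEAVR, an open-source bimanual, multi-embodiment VR teleoperation system whose zero-copy streaming architecture and asynchronous "think-act" control loop enable low-latency, real-time multi-robot control and data recording, with compatibility across several visuomotor policies.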

Comments Accepted for presentation at ICCR Kyoto 2025

详情
AI中文摘要

BEAVR是一个用于机器人的开源、双手、多形态虚拟现实(VR)遥操作系统,旨在统一异构机器人平台上的实时控制、数据记录和策略学习。BEAVR使用商用VR硬件实现实时、灵巧的遥操作,支持从7自由度机械臂到全身人形机器人的模块化集成,并直接以LeRobot数据集模式记录同步的多模态演示。我们的系统具有零拷贝流式架构,实现≤35毫秒延迟,一个用于可扩展推理的异步“思考-行动”控制循环,以及一个针对实时多机器人操作优化的灵活网络API。我们在多种操作任务上对BEAVR进行基准测试,并展示其与领先的视觉运动策略(如ACT、DiffusionPolicy和SmolVLA)的兼容性。所有代码公开可用,数据集发布在Hugging Face上(代码、数据集和VR应用可在 https://github.com/ARCLab-MIT/BEAVR-Bot 获取)。

英文摘要

BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face (code, datasets, and VR app are available at https://github.com/ARCLab-MIT/BEAVR-Bot).

2205.15412 2026-06-03 cs.DC cs.MA cs.RO 版本更新

Asynchronous Deterministic Leader Election in Three-Dimensional Programmable Matter

三维可编程物质中的异步确定性领导者选举

Joseph L. Briones, Tishya Chhabra, Joshua J. Daymude, Andréa W. Richa

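AI summary: For three-dimensional programmable matter, proposes a distributed algorithm on the face-centered cubic lattice that deterministically elects a unique leader in O(n) rounds under an unfair sequential adversary, and uses a concurrency control framework to convert it into the first leader election algorithm under an unfair asynchronous adversary.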

Comments 18 pages, 4 figures, 2 tables. Accepted at ICDCN 2023

详情
Journal ref
Proceedings of the 24th International Conference on Distributed Computing and Networking (ICDCN 2023), pp. 38-47
AI中文摘要

经过三十多年的科学努力实现可编程物质(一种可以根据用户输入或对环境响应改变其物理特性的物质),在模块化机器人系统的工程和相应的集体行为算法理论方面都取得了许多进展。然而,虽然模块化机器人的设计通常处理真实三维(3D)空间的挑战,但算法理论仍然主要关注二维(2D)抽象,如平面和平面图。在这项工作中,我们使用面心立方(FCC)晶格来表示空间并定义局部空间方向,为可编程物质的规范阿米巴模型形式化了3D几何空间变体。然后,我们给出了一种用于连通、可收缩的2D或3D几何阿米巴系统中领导者选举的分布式算法,该算法在非公平顺序敌手下确定性选举出恰好一个领导者,时间复杂度为O(n)轮,其中n是系统中的阿米巴数量。接着,我们展示了如何使用阿米巴算法的并发控制框架(DISC 2021)转换该算法,以获得已知的第一个在非公平异步敌手下解决领导者选举的阿米巴算法,适用于2D和3D空间。

英文摘要

Over three decades of scientific endeavors to realize programmable matter, a substance that can change its physical properties based on user input or responses to its environment, there have been many advances in both the engineering of modular robotic systems and the corresponding algorithmic theory of collective behavior. However, while the design of modular robots routinely addresses the challenges of realistic three-dimensional (3D) space, algorithmic theory remains largely focused on 2D abstractions such as planes and planar graphs. In this work, we formalize the 3D geometric space variant for the canonical amoebot model of programmable matter, using the face-centered cubic (FCC) lattice to represent space and define local spatial orientations. We then give a distributed algorithm for leader election in connected, contractible 2D or 3D geometric amoebot systems that deterministically elects exactly one leader in $\mathcal{O}(n)$ rounds under an unfair sequential adversary, where $n$ is the number of amoebots in the system. We then demonstrate how this algorithm can be transformed using the concurrency control framework for amoebot algorithms (DISC 2021) to obtain the first known amoebot algorithm, both in 2D and 3D space, to solve leader election under an unfair asynchronous adversary.
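For readers unfamiliar with the FCC lattice, the sketch below shows one common integer coordinate system for it (points whose coordinates sum to an even number) and the 12 nearest-neighbor directions of each site. The coordinate choice and the names are assumptions for illustration; the paper may label positions and directions differently.

```python
from itertools import product

# Nearest-neighbor offsets: all sign choices and permutations of (1, 1, 0).
FCC_OFFSETS = [d for d in product((-1, 0, 1), repeat=3)
               if sorted(map(abs, d)) == [0, 1, 1]]

def is_fcc_site(p):
    """A point is an FCC site in this convention if x + y + z is even."""
    return sum(p) % 2 == 0

def neighbors(p):
    """The 12 lattice sites adjacent to site p."""
    return [tuple(a + b for a, b in zip(p, d)) for d in FCC_OFFSETS]

print(len(FCC_OFFSETS))
print(neighbors((0, 0, 0))[:3])
```

Each site has 12 neighbors, compared with 6 on the 2D triangular lattice commonly used for planar amoebot systems, so local orientation has to be defined over a larger set of directions.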

2105.02420 2026-06-03 cs.DC cs.ET cs.RO 版本更新

The Canonical Amoebot Model: Algorithms and Concurrency Control

规范变形虫模型:算法与并发控制

Joshua J. Daymude, Andréa W. Richa, Christian Scheideler

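AI summary: Proposes the canonical amoebot model, which formalizes concurrent execution through message passing and adversarial activation models, and gives two approaches to concurrent algorithm design (embedding concurrency control directly, and a lock-based transformation framework), demonstrated on a hexagon formation algorithm.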

Comments 48 pages, 7 figures, 2 tables

详情
Journal ref
Distributed Computing (2023) 36, pp. 159-192
AI中文摘要

变形虫模型将主动可编程物质抽象为称为变形虫的简单计算元素的集合,这些元素局部交互以共同完成协调和移动任务。自2014年在SPAA上引入以来,越来越多的文献针对各种问题调整了其假设;然而,如果没有标准化的假设层次结构,很难在变形虫模型下对结果进行精确的系统比较。我们提出了规范变形虫模型,这是一种更新的形式化,区分了核心模型特征和假设变体族。规范变形虫模型解决的一个关键改进是并发性。现有文献大多隐含地假设变形虫的动作是隔离且可靠的,将分析简化为至多一个变形虫同时活动的顺序设置。然而,真实的可编程物质系统是并发的。规范变形虫模型将所有变形虫通信形式化为消息传递,利用并发执行的对抗性激活模型。在这种对时间的精细处理下,我们采用了两种互补的方法来设计并发算法。我们首先建立了一组在任何并发执行下算法正确性的充分条件,将并发控制直接嵌入算法设计。然后,我们提出了一个使用锁的并发控制框架,将在顺序设置中终止并满足特定约定的变形虫算法转换为在并发设置中表现出等效行为的算法。作为案例研究,我们使用一个简单的六边形形成算法演示了这两种方法。规范变形虫模型和这些互补的并发算法设计方法共同为可编程物质的分布式计算研究开辟了新的方向。

英文摘要

The amoebot model abstracts active programmable matter as a collection of simple computational elements called amoebots that interact locally to collectively achieve tasks of coordination and movement. Since its introduction at SPAA 2014, a growing body of literature has adapted its assumptions for a variety of problems; however, without a standardized hierarchy of assumptions, precise systematic comparison of results under the amoebot model is difficult. We propose the canonical amoebot model, an updated formalization that distinguishes between core model features and families of assumption variants. A key improvement addressed by the canonical amoebot model is concurrency. Much of the existing literature implicitly assumes amoebot actions are isolated and reliable, reducing analysis to the sequential setting where at most one amoebot is active at a time. However, real programmable matter systems are concurrent. The canonical amoebot model formalizes all amoebot communication as message passing, leveraging adversarial activation models of concurrent executions. Under this granular treatment of time, we take two complementary approaches to concurrent algorithm design. We first establish a set of sufficient conditions for algorithm correctness under any concurrent execution, embedding concurrency control directly in algorithm design. We then present a concurrency control framework that uses locks to convert amoebot algorithms that terminate in the sequential setting and satisfy certain conventions into algorithms that exhibit equivalent behavior in the concurrent setting. As a case study, we demonstrate both approaches using a simple algorithm for hexagon formation. Together, the canonical amoebot model and these complementary approaches to concurrent algorithm design open new directions for distributed computing research on programmable matter.

2108.09403 2026-06-03 cs.RO cs.DC 版本更新

Deadlock and Noise in Self-Organized Aggregation Without Computation

无计算的自组织聚合中的死锁与噪声

Joshua J. Daymude, Noble C. Harasha, Andréa W. Richa, Ryan Yiu

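AI summary: Studies deadlock in a computation-free self-organized aggregation algorithm for multi-robot systems, proving that deadlocked configurations exist under deterministic motion, showing that small amounts of error enable deadlock avoidance, and proposing a noisy, discrete version of the algorithm.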

Comments 17 pages, 11 figures

详情
Journal ref
Stabilization, Safety, and Security of Distributed Systems (SSS 2021), pp. 51-65
AI中文摘要

聚合是群体机器人学中的基本行为,要求系统聚集在一个紧凑、连通的集群中。2014年,Gauci等人提出了一种令人惊讶的算法,仅使用二进制视线传感器且无需算术计算或持久内存,即可可靠地实现群体聚合。该算法已被严格证明能够将一个机器人聚合到另一个机器人,但尚不清楚它是否总能像实验和模拟中观察到的那样聚合$n > 2$个机器人的系统。我们证明,当机器人的运动是均匀且确定性的时,对于$n > 3$个机器人,存在死锁构型,使得该算法无法实现聚合。从积极方面看,我们表明该算法(i)对小量误差具有鲁棒性,从而能够避免死锁,并且(ii)在使用锥形视线传感器时,对于$n = 2$的情况,可证明实现线性运行时间加速。最后,我们引入了该算法的一种带噪声的离散改编,更易于进行噪声的严格分析,其模拟结果与原始的连续算法定性一致。

英文摘要

Aggregation is a fundamental behavior for swarm robotics that requires a system to gather together in a compact, connected cluster. In 2014, Gauci et al. proposed a surprising algorithm that reliably achieves swarm aggregation using only a binary line-of-sight sensor and no arithmetic computation or persistent memory. It has been rigorously proven that this algorithm will aggregate one robot to another, but it remained open whether it would always aggregate a system of $n > 2$ robots as was observed in experiments and simulations. We prove that there exist deadlocked configurations from which this algorithm cannot achieve aggregation for $n > 3$ robots when the robots' motion is uniform and deterministic. On the positive side, we show that the algorithm (i) is robust to small amounts of error, enabling deadlock avoidance, and (ii) provably achieves a linear runtime speedup for the $n = 2$ case when using a cone-of-sight sensor. Finally, we introduce a noisy, discrete adaptation of this algorithm that is more amenable to rigorous analysis of noise and whose simulation results align qualitatively with the original, continuous algorithm.
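The "no computation" property can be made concrete with a sketch: the controller is a lookup table from one sensor bit to a pair of wheel speeds. The speed values, the differential-drive kinematics, and all names below are illustrative assumptions rather than parameters confirmed by this abstract.

```python
import math

# One sensor bit -> (left, right) wheel speeds, normalised to [-1, 1].
# These particular values are assumed for illustration only.
CONTROLLER = {
    False: (-0.7, -1.0),  # nothing in the line of sight: reverse along an arc
    True: (1.0, -1.0),    # another robot in sight: rotate on the spot
}

def step(pose, sees_robot, dt=0.1, wheel_base=1.0, v_max=1.0):
    """Advance a differential-drive robot by one time step."""
    x, y, th = pose
    vl, vr = (w * v_max for w in CONTROLLER[sees_robot])
    v = (vl + vr) / 2               # forward speed
    omega = (vr - vl) / wheel_base  # turn rate, counterclockwise positive
    return (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + omega * dt)

pose = (0.0, 0.0, 0.0)
for seen in (False, False, True):
    pose = step(pose, seen)
print(pose)
```

The robot stores nothing between steps and performs no arithmetic on its sensor reading; the arithmetic in `step` belongs to the simulated physics, not to the controller.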

2007.04377 2026-06-03 cs.DC cs.RO 版本更新

Bio-Inspired Energy Distribution for Programmable Matter

可编程物质的仿生能量分布

Joshua J. Daymude, Andréa W. Richa, Jamison W. Weber

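AI summary: Inspired by the growth behavior of Bacillus subtilis biofilms, proposes a communication-based algorithm that inhibits energy usage while starving modules are present, so that all modules in a programmable matter system receive sufficient energy, and extends the amoebot model's spanning forest primitive to self-stabilize under crash failures.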

详情
Journal ref
Proceedings of the 22nd International Conference on Distributed Computing and Networking (ICDCN 2021), pp. 86-95
AI中文摘要

在主动可编程物质系统中,单个模块需要持续的能量供应才能参与系统的集体行为。这些系统通常由至少一个模块可访问的外部能源供电,并依赖模块间的能量传输在整个系统中分配能量。尽管在解决可编程物质硬件中能量管理的挑战方面投入了大量精力,但可编程物质的算法理论在很大程度上忽略了能量使用和分布对算法可行性和效率的影响。在这项工作中,我们提出了一种受枯草芽孢杆菌生物膜生长行为启发的amoebot模型能量分布算法。这些细菌使用化学信号传递其代谢状态并调节整个生物膜中的营养消耗,确保所有细菌获得所需的营养。我们的算法类似地使用通信来在存在饥饿模块时抑制能量消耗,使所有模块能够获得足够的能量以满足其需求。作为一个支持性但独立的结果,我们扩展了amoebot模型成熟的生成树原语,使其在崩溃故障存在时能够自稳定。最后,我们展示了如何利用这一自稳定原语将我们的能量分布算法与现有的amoebot模型算法组合,从而有效地将先前的工作推广到也考虑能量约束。

英文摘要

In systems of active programmable matter, individual modules require a constant supply of energy to participate in the system's collective behavior. These systems are often powered by an external energy source accessible by at least one module and rely on module-to-module power transfer to distribute energy throughout the system. While much effort has gone into addressing challenging aspects of power management in programmable matter hardware, algorithmic theory for programmable matter has largely ignored the impact of energy usage and distribution on algorithm feasibility and efficiency. In this work, we present an algorithm for energy distribution in the amoebot model that is loosely inspired by the growth behavior of Bacillus subtilis bacterial biofilms. These bacteria use chemical signaling to communicate their metabolic states and regulate nutrient consumption throughout the biofilm, ensuring that all bacteria receive the nutrients they need. Our algorithm similarly uses communication to inhibit energy usage when there are starving modules, enabling all modules to receive sufficient energy to meet their demands. As a supporting but independent result, we extend the amoebot model's well-established spanning forest primitive so that it self-stabilizes in the presence of crash failures. We conclude by showing how this self-stabilizing primitive can be leveraged to compose our energy distribution algorithm with existing amoebot model algorithms, effectively generalizing previous work to also consider energy constraints.

1903.02091 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Geometric Adaptive Control with Neural Networks for a Quadrotor UAV in Wind fields

风场中四旋翼无人机的几何自适应神经网络控制

Mahdis Bisheban, Taeyoung Lee

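AI summary: To handle unstructured force and moment disturbances caused by wind, proposes a geometric adaptive controller that adjusts the weights of multilayer neural networks online, achieving uniformly ultimately bounded position and heading tracking errors.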

详情
AI中文摘要

本文提出了一种带有人工神经网络的四旋翼无人机几何自适应控制器。假设四旋翼动力学受到风引起的任意非结构力和力矩的干扰。为了解决这个问题,所提出的控制系统增加了多层神经网络,并根据自适应律在线调整神经网络的权重。利用通用逼近定理,表明未知扰动的影响可以得到缓解。更具体地说,在所提出的控制系统下,位置和航向方向的跟踪误差是一致最终有界的,并且最终界可以任意减小。这些方法直接在特殊欧几里得群上开发,以避免局部参数化固有的复杂性或奇异性。首先通过数值例子说明了所提出控制系统的有效性。然后,通过几个室内飞行实验证明,即使对于激进的、敏捷的机动,所提出的控制器也能成功抑制风扰动的影响。

英文摘要

This paper proposes a geometric adaptive controller for a quadrotor unmanned aerial vehicle with artificial neural networks. It is assumed that the dynamics of a quadrotor are disturbed by arbitrary, unstructured forces and moments caused by wind. To address this, the proposed control system is augmented with multilayer neural networks, and the weights of the neural networks are adjusted online according to an adaptive law. By utilizing the universal approximation theorem, it is shown that the effects of unknown disturbances can be mitigated. More specifically, under the proposed control system, the tracking errors in the position and the heading direction are uniformly ultimately bounded, where the ultimate bound can be reduced arbitrarily. These results are developed directly on the special Euclidean group to avoid complexities or singularities inherent to local parameterizations. The efficacy of the proposed control system is first illustrated by numerical examples. Then, several indoor flight experiments are presented to demonstrate that the proposed controller successfully rejects the effects of wind disturbances even for aggressive, agile maneuvers.
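For reference, the standard textbook notion of uniform ultimate boundedness can be written as follows. The symbol $e(t)$ for the tracking error and the constants $a$, $b$, $c$, $T$ are notation assumed here, not taken from the paper.

```latex
\exists\, b > 0,\ c > 0 \ \text{such that}\ \forall a \in (0, c)\ \ \exists\, T = T(a, b) \ge 0 :
\qquad \|e(t_0)\| \le a \ \Longrightarrow\ \|e(t)\| \le b \quad \forall\, t \ge t_0 + T .
```

Here $b$ is the ultimate bound and "uniformly" means that $T$ does not depend on the initial time $t_0$. The abstract's claim that the bound "can be reduced arbitrarily" says that $b$ can be made as small as desired, presumably through the choice of controller parameters.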

1803.06363 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Geometric Adaptive Control for a Quadrotor UAV with Wind Disturbance Rejection

四旋翼无人机抗风扰动的几何自适应控制

Mahdis Bisheban, Taeyoung Lee

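AI summary: For a quadrotor UAV, proposes a geometric adaptive control scheme based on a multilayer neural network adjusted online to mitigate unknown, unstructured disturbances, uses Lyapunov stability theory on the special Euclidean group to show that the tracking errors are uniformly ultimately bounded, and illustrates wind disturbance rejection with numerical examples.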

详情
AI中文摘要

本文提出了一种四旋翼无人机的几何自适应控制方案,其中未知、非结构扰动的影响通过在线调整的多层神经网络来减轻。利用特殊欧几里得群上的李雅普诺夫稳定性理论分析了所提控制器的稳定性,并证明跟踪误差一致最终有界,且其最终界可任意缩小。给出了四旋翼动力学中风扰动的数学模型,并表明所提自适应控制器能够成功抑制风扰动的影响。这些通过数值示例进行了说明。

英文摘要

This paper presents a geometric adaptive control scheme for a quadrotor unmanned aerial vehicle, where the effects of unknown, unstructured disturbances are mitigated by a multilayer neural network that is adjusted online. The stability of the proposed controller is analyzed with Lyapunov stability theory on the special Euclidean group, and it is shown that the tracking errors are uniformly ultimately bounded with an ultimate bound that can be reduced arbitrarily. A mathematical model of wind disturbance on the quadrotor dynamics is presented, and it is shown that the proposed adaptive controller is capable of rejecting the effects of wind disturbances successfully. These results are illustrated by numerical examples.

1011.1939 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Discrete Partitioning and Coverage Control for Gossiping Robots

面向闲聊机器人的离散分区与覆盖控制

Joseph W. Durham, Ruggero Carli, Paolo Frasca, Francesco Bullo

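AI summary: For non-convex environments, proposes distributed algorithms based on a graph representation and short-range, unreliable pairwise communication that partition and cover the environment with a robot team, and proves convergence to a pairwise-optimal partition.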

Comments Accepted to IEEE TRO. 14 double-column pages, 10 figures. v2 is a thorough revision of v1, including new algorithms and revised mathematical and simulation results

详情
AI中文摘要

我们提出了分布式算法,用于自动部署一组移动机器人以对非凸环境进行分区和覆盖。为处理任意非凸环境,我们将其表示为图。我们的分区和覆盖算法仅需要短程、不可靠的成对“闲聊”通信。该算法包含两个部分:(1) 一个运动协议,确保相邻机器人至少间歇性地通信;(2) 一个成对分区规则,用于在两个机器人通信时更新领地所有权。通过研究图顶点分区空间上的适当动力系统,我们证明了领地所有权在有限时间内收敛到成对最优分区。这一新的平衡集代表了比常见Lloyd类型算法更优的性能。此外,我们详细说明了算法如何在大规模团队和大规模环境中良好扩展,以及计算如何在有限资源下随时运行。最后,我们报告了在复杂环境中的大规模仿真和使用Player/Stage机器人控制系统的硬件实验。

英文摘要

We propose distributed algorithms to automatically deploy a team of mobile robots to partition and provide coverage of a non-convex environment. To handle arbitrary non-convex environments, we represent them as graphs. Our partitioning and coverage algorithm requires only short-range, unreliable pairwise "gossip" communication. The algorithm has two components: (1) a motion protocol to ensure that neighboring robots communicate at least sporadically, and (2) a pairwise partitioning rule to update territory ownership when two robots communicate. By studying an appropriate dynamical system on the space of partitions of the graph vertices, we prove that territory ownership converges to a pairwise-optimal partition in finite time. This new equilibrium set represents improved performance over common Lloyd-type algorithms. Additionally, we detail how our algorithm scales well for large teams in large environments and how the computation can run in an anytime fashion with limited resources. Finally, we report on large-scale simulations in complex environments and hardware experiments using the Player/Stage robot control system.
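A much-simplified sketch of what a pairwise territory update on a graph might look like is given below. It only re-splits the union of two territories by hop distance to two fixed centers. The paper's actual rule, which this abstract does not spell out, presumably also optimizes the centers and keeps territories connected; all names here are hypothetical.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def gossip_update(adj, region_a, center_a, region_b, center_b):
    """Re-split the union of two territories: each vertex goes to the nearer
    center (ties go to robot a). Only the two communicating robots change."""
    da, db = bfs_dist(adj, center_a), bfs_dist(adj, center_b)
    union = region_a | region_b
    new_a = {v for v in union if da[v] <= db[v]}
    return new_a, union - new_a

# Path graph 0-1-2-3-4-5 with a lopsided initial split.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(gossip_update(adj, {0, 1, 2, 3, 4}, 1, {5}, 5))
```

Because the update touches only the territories of the two robots that happen to be in contact, it needs no global synchronization, which matches the short-range, unreliable communication setting the abstract describes.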

1202.0253 2026-06-03 cs.RO cs.SY eess.SY 版本更新

High-speed Flight in an Ergodic Forest

遍历森林中的高速飞行

Sertac Karaman, Emilio Frazzoli

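AI summary: Studies the theoretical foundations of high-speed navigation through a random obstacle field when only the statistics of the obstacle generating process are known, uses ergodic theory and percolation theory to reveal a phase transition in the existence of infinite collision-free trajectories, and derives lower and upper bounds on the critical speed.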

Comments Manuscript submitted to the IEEE Transactions on Robotics

详情
AI中文摘要

受鸟类在密集森林等杂乱环境中飞行的启发,本文研究了一个新颖运动规划问题的理论基础:在仅已知障碍物生成过程统计特性的情况下,通过随机生成的障碍物场进行高速导航。类似于平面森林环境,假设障碍物生成过程决定了圆盘形障碍物的位置和大小。当该过程是遍历的,并且在鸟类动力学的温和技术条件下,证明了通过森林的无限无碰撞轨迹的存在性表现出相变。一方面,如果鸟的飞行速度超过某个临界速度,那么以概率1,不存在无限无碰撞轨迹,即无论控制鸟运动的规划算法如何,鸟几乎必然最终会与某棵树碰撞。另一方面,如果鸟的飞行速度低于该临界速度,那么几乎必然存在至少一条无限无碰撞轨迹。针对齐次泊松森林的特殊情况,考虑鸟动力学的简单模型,推导了临界速度的上下界。对于相同情况,给出了一个等价渗流模型。利用该模型,通过蒙特卡洛模拟近似了相图。本文还通过遍历理论和渗流理论建立了机器人运动规划与统计物理之间的新联系,这可能具有独立的研究价值。

英文摘要

Inspired by birds flying through cluttered environments such as dense forests, this paper studies the theoretical foundations of a novel motion planning problem: high-speed navigation through a randomly-generated obstacle field when only the statistics of the obstacle generating process are known a priori. Resembling a planar forest environment, the obstacle generating process is assumed to determine the locations and sizes of disk-shaped obstacles. When this process is ergodic, and under mild technical conditions on the dynamics of the bird, it is shown that the existence of an infinite collision-free trajectory through the forest exhibits a phase transition. On one hand, if the bird flies faster than a certain critical speed, then, with probability one, there is no infinite collision-free trajectory, i.e., the bird will eventually collide with some tree, almost surely, regardless of the planning algorithm governing the bird's motion. On the other hand, if the bird flies slower than this critical speed, then there exists at least one infinite collision-free trajectory, almost surely. Lower and upper bounds on the critical speed are derived for the special case of a homogeneous Poisson forest considering a simple model for the bird's dynamics. For the same case, an equivalent percolation model is provided. Using this model, the phase diagram is approximated in Monte-Carlo simulations. This paper also establishes novel connections between robot motion planning and statistical physics through ergodic theory and percolation theory, which may be of independent interest.
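To give a feel for the homogeneous Poisson forest used in the special case, the sketch below samples disk-shaped trees in a strip and measures how far a bird that never steers gets before hitting one. It is a toy illustration, not the paper's percolation model or its Monte-Carlo procedure; the function names and all parameter values are assumptions.

```python
import math
import random

def sample_poisson_forest(density, length, width, radius, rng):
    """Tree disks (x, y, r) from a homogeneous Poisson process with the given
    density on the strip [0, length] x [-width/2, width/2]."""
    trees, x = [], 0.0
    while True:
        # Projected onto the x-axis, the points form a 1D Poisson process
        # with rate density * width, so gaps are exponentially distributed.
        x += rng.expovariate(density * width)
        if x > length:
            return trees
        trees.append((x, rng.uniform(-width / 2, width / 2), radius))

def free_flight_distance(trees, length):
    """Distance covered along y = 0 before first touching a tree, capped at length."""
    hits = [x - math.sqrt(r * r - y * y) for x, y, r in trees if abs(y) < r]
    return min([length] + [max(h, 0.0) for h in hits])

rng = random.Random(0)
runs = [free_flight_distance(sample_poisson_forest(0.05, 200.0, 4.0, 0.3, rng), 200.0)
        for _ in range(1000)]
print(sum(runs) / len(runs))
```

A steering bird does better than this straight-line baseline; the paper's question is whether any planner can avoid collision forever at a given speed.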

1205.0207 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Shortest Path Set Induced Vertex Ordering and its Application to Distributed Distance Optimal Multi-agent Formation Path Planning

最短路径集诱导的顶点排序及其在分布式距离最优多智能体编队路径规划中的应用

Jingjin Yu

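AI summary: For distance-optimal path planning that moves indistinguishable agents on a connected graph into an arbitrary goal formation, proposes a scheduling algorithm based on the vertex ordering induced by the shortest path set, and for the first time enables distributed scheduling, while guaranteeing the same convergence time.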

Comments Extended the earlier version to 8 Pages, complete with literature review. One additional section on a distributed scheduling algorithm is added

详情
AI中文摘要

对于在单位边长的连通图上将一组不可区分的智能体移动到任意目标编队的任务,先前的研究表明,使用完全集中式算法可以调度距离最优路径,并具有严格的收敛时间保证。在本研究中,我们表明问题公式实际上在底层图网络上诱导出更基本的顶点排序,这直接导致更直观的调度算法,保证相同的收敛时间且运行更快。更重要的是,这种结构使得在将个体路径分配给智能体后能够实现分布式调度算法,这是以前不可能的。顶点排序也容易扩展到更一般的图——那些具有非单位容量和边长的图——对于这些图,我们再次保证达到期望编队的收敛时间。

英文摘要

For the task of moving a group of indistinguishable agents on a connected graph with unit edge lengths into an arbitrary goal formation, it was previously shown that distance optimal paths can be scheduled to complete with a tight convergence time guarantee, using a fully centralized algorithm. In this study, we show that the problem formulation in fact induces a more fundamental ordering of the vertices on the underlying graph network, which directly leads to a more intuitive scheduling algorithm that assures the same convergence time and runs faster. More importantly, this structure enables a distributed scheduling algorithm once individual paths are assigned to the agents, which was not possible before. The vertex ordering also readily extends to more general graphs - those with non-unit capacities and edge lengths - for which we again guarantee the convergence time until the desired formation is achieved.

1204.5717 2026-06-03 cs.DS cs.RO cs.SY eess.SY 版本更新

Multi-agent Path Planning and Network Flow

多智能体路径规划与网络流

Jingjin Yu, Steven M. LaValle

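AI summary: Reduces multi-agent path planning on graphs to network flow, so that combinatorial network flow algorithms and linear programming techniques apply; proves that when the goals are permutation invariant a feasible solution path set always exists with a longest finish time of at most n+V-1 steps, gives an O(nVE)-time algorithm, and further studies time and distance optimality and their Pareto optimal structure.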

Comments Corrected an inaccuracy on time optimal solution for average arrival time

详情
AI中文摘要

本文将图(路标图)上的多智能体路径规划与网络流问题联系起来,表明前者可以归约到后者,从而使得组合网络流算法以及一般的线性规划技术能够应用于图上的多智能体路径规划问题。利用这一联系,我们证明当目标是置换不变时,该问题总是存在一个可行解路径集,其最长完成时间不超过 $n + V - 1$ 步,其中 $n$ 是智能体数量,$V$ 是底层图的顶点数。然后,我们给出一个完整算法,在 $O(nVE)$ 时间内找到这样的解,其中 $E$ 是图的边数。进一步,我们研究可行解的时间和距离最优性,表明它们具有成对的帕累托最优结构,并再次为优化这两个实际目标提供了高效算法。

英文摘要

This paper connects multi-agent path planning on graphs (roadmaps) to network flow problems, showing that the former can be reduced to the latter, therefore enabling the application of combinatorial network flow algorithms, as well as general linear programming techniques, to multi-agent path planning problems on graphs. Exploiting this connection, we show that when the goals are permutation invariant, the problem always has a feasible solution path set with a longest finish time of no more than $n + V - 1$ steps, in which $n$ is the number of agents and $V$ is the number of vertices of the underlying graph. We then give a complete algorithm that finds such a solution in $O(nVE)$ time, with $E$ being the number of edges of the graph. Taking a further step, we study time and distance optimality of the feasible solutions, show that they have a pairwise Pareto optimal structure, and again provide efficient algorithms for optimizing two of these practical objectives.
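The flavor of the reduction can be sketched with a time-expanded network: one copy of each vertex per time step, unit capacities, and a max-flow test for whether all agents can reach the goal set within T steps. This is a simplified illustration under assumptions, not the paper's construction. In particular it omits any gadget for forbidding head-on swaps along an edge, and every name in it is hypothetical.

```python
from collections import defaultdict

def max_flow(cap, s, t):
    """Ford-Fulkerson with DFS on unit capacities; cap is a dict-of-dicts residual graph."""
    flow = 0
    while True:
        parent, stack = {s: None}, [s]
        while stack and t not in parent:
            u = stack.pop()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    stack.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:   # push one unit along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

def feasible(adj, starts, goals, T):
    """Can all agents occupy the (interchangeable) goals after exactly T steps?"""
    cap = defaultdict(lambda: defaultdict(int))
    for t in range(T + 1):
        for v in adj:
            cap[(v, t, "in")][(v, t, "out")] = 1        # one agent per vertex per step
            if t < T:
                for w in list(adj[v]) + [v]:             # move to a neighbor or wait
                    cap[(v, t, "out")][(w, t + 1, "in")] = 1
    for s in starts:
        cap["src"][(s, 0, "in")] = 1
    for g in goals:
        cap[(g, T, "out")]["sink"] = 1
    return max_flow(cap, "src", "sink") == len(starts)

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}             # path graph with V = 4
n, V = 2, len(adj)
print(feasible(adj, [0, 1], [2, 3], n + V - 1))
```

Treating the goals as a single sink is what encodes permutation invariance: the flow does not care which agent ends at which goal.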

1204.3830 2026-06-03 cs.RO cs.AI cs.SY eess.SY 版本更新

Planning Optimal Paths for Multiple Robots on Graphs

图上多机器人路径规划的最优路径

Jingjin Yu, Steven M. LaValle

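AI summary: Proposes two multiflow-based integer linear programming models that solve for minimum last arrival time and minimum total distance in multi-robot path planning, respectively; the resulting algorithms are complete and guarantee optimal solutions.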

Comments Changed "agents" to "robots"

详情
AI中文摘要

在本文中,我们研究了图上多机器人路径规划(MPP)的最优问题。我们提出了两种基于多流的整数线性规划(ILP)模型,分别计算MPP公式的最小最后到达时间和最小总距离解。这些ILP模型产生的算法是完备的,并保证得到真正的最优解。此外,我们的灵活框架可以轻松适应MPP问题的其他变体。专注于时间最优算法,我们评估了其性能,既作为独立算法,也作为快速解决大规模问题实例的通用启发式方法。计算结果证实了我们方法的有效性。

英文摘要

In this paper, we study the problem of optimal multi-robot path planning (MPP) on graphs. We propose two multiflow-based integer linear programming (ILP) models that compute minimum last arrival time and minimum total distance solutions for our MPP formulation, respectively. The resulting algorithms from these ILP models are complete and guaranteed to yield true optimal solutions. In addition, our flexible framework can easily accommodate other variants of the MPP problem. Focusing on the time optimal algorithm, we evaluate its performance, both as a standalone algorithm and as a generic heuristic for quickly solving large problem instances. Computational results confirm the effectiveness of our method.

1204.3820 2026-06-03 eess.SY cs.AI cs.RO cs.SY 版本更新

Distance Optimal Formation Control on Graphs with a Tight Convergence Time Guarantee

图上具有紧收敛时间保证的距离最优编队控制

Jingjin Yu, Steven M. LaValle

AI总结 针对连通图上单位边距下无碰撞移动多个不可区分智能体到任意目标顶点集的任务,提出一种快速距离最优控制算法,并给出紧收敛时间保证。

Comments Brought to be in-sync with final version submitted to CDC 2012 with only minor updates

详情
AI中文摘要

对于在单位边距的连通图上将一组不可区分智能体无碰撞地移动到任意目标顶点集的任务,我们提出了一种快速距离最优控制算法,引导智能体进入期望编队。此外,我们证明了该算法还提供了紧收敛时间保证(时间最优性和距离最优性无法同时满足)。我们的通用图表述允许该算法应用于诸如具有孔洞(模拟障碍物)的任意维度网格等场景。在线可用的仿真验证了我们的理论发展。

英文摘要

For the task of moving a set of indistinguishable agents on a connected graph with unit edge distance to an arbitrary set of goal vertices, free of collisions, we propose a fast distance optimal control algorithm that guides the agents into the desired formation. Moreover, we show that the algorithm also provides a tight convergence time guarantee (time optimality and distance optimality cannot be simultaneously satisfied). Our generic graph formulation allows the algorithm to be applied to scenarios such as grids with holes (modeling obstacles) in arbitrary dimensions. Simulations, available online, confirm our theoretical developments.

1108.3405 2026-06-03 eess.SY cs.MA cs.RO cs.SY math.OC 版本更新

Hybrid 3-D Formation Control for Unmanned Helicopters

无人直升机的混合三维编队控制

A. Karimoddini, H. Lin, B. M. Chen, T. H. Lee

AI总结 针对无人直升机编队控制,提出一种混合监督控制框架,通过状态空间球形抽象和双相似性实现离散逻辑与连续动态的交互,并嵌入防碰撞机制。

Comments Submitted for publication

详情
AI中文摘要

无人机团队构成典型的网络化信息物理系统,涉及离散逻辑与连续动态的交互。本文提出一种用于无人直升机三维领航-跟随编队控制的混合监督控制框架。所提出的混合控制框架捕捉了决策单元与路径规划器连续动态之间的内部交互,从而提高了系统的整体可靠性。为设计此类混合控制器,提出了一种状态空间的球形抽象作为新的抽象方法。利用划分空间上的多仿射函数性质,得到有限状态离散事件系统模型,该模型被证明与原始连续变量动态系统是双相似的。然后,在离散域中,为抽象模型模块化设计了一个逻辑监督器。由于抽象DES模型与原始无人机动态之间的双相似性,设计的逻辑监督器可通过接口层实现为混合控制器。该监督器驱动无人机动态以满足设计要求。换言之,混合控制器能够从控制视界内的任意初始状态将无人机引导至期望编队,并维持编队。此外,在设计的监督器中嵌入了防碰撞机制。最后,通过为无人直升机开发的硬件在环仿真平台验证了该算法。结果表明了算法的有效性。

英文摘要

Teams of Unmanned Aerial Vehicles (UAVs) form typical networked cyber-physical systems that involve the interaction of discrete logic and continuous dynamics. This paper presents a hybrid supervisory control framework for the three-dimensional leader-follower formation control of unmanned helicopters. The proposed hybrid control framework captures internal interactions between the decision making unit and the path planner continuous dynamics of the system, and hence improves the system's overall reliability. To design such a hybrid controller, a spherical abstraction of the state space is proposed as a new method of abstraction. Utilizing the properties of multi-affine functions over the partitioned space leads to a finite state Discrete Event System (DES) model, which is shown to be bisimilar to the original continuous-variable dynamical system. Then, in the discrete domain, a logic supervisor is modularly designed for the abstracted model. Due to the bisimilarity between the abstracted DES model and the original UAV dynamics, the designed logic supervisor can be implemented as a hybrid controller through an interface layer. This supervisor drives the UAV dynamics to satisfy the design requirements. In other words, the hybrid controller is able to bring the UAVs to the desired formation starting from any initial state inside the control horizon and then maintain the formation. Moreover, a collision avoidance mechanism is embedded in the designed supervisor. Finally, the algorithm has been verified by a hardware-in-the-loop simulation platform, which is developed for unmanned helicopters. The presented results show the effectiveness of the algorithm.

1104.4251 2026-06-03 cs.RO cs.MA cs.SY eess.SY math.OC 版本更新

Distributed Self-Organization Of Swarms To Find Globally $ε$-Optimal Routes To Locally Sensed Targets

群体分布式自组织以找到局部感知目标的全局$ε$-最优路径

Ishanu Chattopadhyay

AI总结 针对大规模群体,提出一种仅利用局部信息的分布式路径规划算法,通过信息渗透和梯度涌现实现接近最优的路径选择,并严格分析了收敛性、鲁棒性、可扩展性及系统参数的影响。

Comments 38 pages 10 Figures

详情
AI中文摘要

在大规模群体的背景下,研究了局部感知目标的近最优分布式路径规划问题。所提出的算法仅使用可以局部查询的信息,并建立了关于收敛性、鲁棒性和可扩展性的严格理论结果,同时分析了系统参数(如个体通信半径和个体速度)对全局性能的影响。该方法的基本思想是让局部信息在整个群体中渗透,使个体能够间接访问全局上下文。通过相邻个体之间的局部信息交换,以分布式方式计算反映个体性能的梯度。研究表明,为了沿着接近最优的路径到达只能局部感知且位置未知的目标,个体只需向其“最佳”邻居移动,其中“最佳”的概念是通过计算底层概率有限状态自动机的状态特定语言度量得到的。理论结果在超过$10^4$个个体的高保真仿真实验中得到了验证。

英文摘要

The problem of near-optimal distributed path planning to locally sensed targets is investigated in the context of large swarms. The proposed algorithm uses only information that can be locally queried; rigorous theoretical results on convergence, robustness, and scalability are established, and the effect of system parameters such as the agent-level communication radius and agent velocities on global performance is analyzed. The fundamental philosophy of the proposed approach is to percolate local information across the swarm, enabling agents to indirectly access the global context. A gradient emerges, reflecting the performance of agents, computed in a distributed manner via local information exchange between neighboring agents. It is shown that to follow near-optimal routes to a target which can be only sensed locally, and whose location is not known a priori, the agents simply need to move towards their "best" neighbor, where the notion of "best" is obtained by computing the state-specific language measure of an underlying probabilistic finite state automaton. The theoretical results are validated in high-fidelity simulation experiments with in excess of $10^4$ agents.
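The "move towards the best neighbor" rule can be illustrated with a much simpler stand-in for the paper's measure. The sketch below does not compute a language measure of a probabilistic automaton; it swaps in a discounted max-propagation (an assumption made purely for illustration), and the names `percolate`, `route`, and `decay` are hypothetical. Each agent reads only its neighbors' values, yet following the resulting gradient yields a hop-shortest route to the sensing agent.

```python
def percolate(neighbors, sensing, decay=0.9, rounds=None):
    """Spread target information through purely local exchanges."""
    value = {a: (1.0 if a in sensing else 0.0) for a in neighbors}
    for _ in range(rounds or len(neighbors)):
        # synchronous round: every agent reads its neighbors' previous values
        value = {
            a: max(value[a],
                   decay * max((value[b] for b in neighbors[a]), default=0.0))
            for a in neighbors
        }
    return value

def route(neighbors, value, start):
    """Follow the locally best neighbor until a sensing agent is reached."""
    path = [start]
    while value[path[-1]] < 1.0:
        here = path[-1]
        best = max(neighbors[here], key=value.get, default=None)
        if best is None or value[best] <= value[here]:
            break  # no neighbor improves: the target is not reachable from here
        path.append(best)
    return path

# Two routes from agent 0 to the sensing agent 3: 0-1-2-3 and 0-4-5-6-3
swarm = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 6],
         4: [0, 5], 5: [4, 6], 6: [5, 3]}
values = percolate(swarm, sensing={3})
print(route(swarm, values, 0))
```

Because each hop multiplies the value by `decay`, a higher value at a neighbor means fewer hops to the target, which is why the greedy local choice is enough in this toy setting.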

1104.1159 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees

不确定环境下具有概率满足保证的LTL控制

Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

AI总结 提出一种最大化任务完成概率的机器人控制策略生成方法,任务由线性时序逻辑公式描述,通过将问题转化为马尔可夫决策过程的最优策略求解,并利用概率模型检验技术给出完整解决方案。

Comments Technical Report accompanying IFAC 2011

详情
AI中文摘要

我们提出一种生成机器人控制策略的方法,该策略最大化完成任务的概率。任务由一组属性的线性时序逻辑(LTL)公式给出,这些属性可以在划分环境的区域中满足。我们假设属性在区域中满足的概率已知,并且机器人只能在当前区域确定命题的真值。受分区抽象相关结果的启发,我们假设运动在图上进行。为了考虑噪声传感器和执行器,我们假设一个控制动作会启用多个具有已知概率的转移。我们证明该问题可以简化为为马尔可夫决策过程(MDP)生成控制策略的问题,使得在其状态上满足LTL公式的概率最大化。我们基于概率模型检验的现有结果,为后一个问题提供了完整解决方案。我们包含一个说明性案例研究。

英文摘要

We present a method to generate a robot control strategy that maximizes the probability of accomplishing a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and the robot can determine the truth value of a proposition only at the current region. Motivated by several results on partition-based abstractions, we assume that the motion is performed on a graph. To account for noisy sensors and actuators, we assume that a control action enables several transitions with known probabilities. We show that this problem can be reduced to the problem of generating a control policy for a Markov Decision Process (MDP) such that the probability of satisfying an LTL formula over its states is maximized. We provide a complete solution for the latter problem that builds on existing results from probabilistic model checking. We include an illustrative case study.
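The final step of such a reduction, maximizing the probability of reaching a set of accepting states in an MDP, can be sketched with plain value iteration. This is an illustration under assumptions rather than the authors' procedure: it presumes the LTL formula has already been folded into the state space so that satisfaction becomes reachability of `goal`, and the data layout `mdp[state][action] = [(prob, next_state), ...]` and the function names are hypothetical.

```python
def max_reach_prob(mdp, goal, iters=1000, tol=1e-12):
    """Value iteration from below for the maximal probability of reaching goal."""
    p = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(iters):
        delta = 0.0
        for s, actions in mdp.items():
            if s in goal or not actions:
                continue  # goal states stay at 1, dead ends stay at 0
            best = max(sum(pr * p[n] for pr, n in outcomes)
                       for outcomes in actions.values())
            delta = max(delta, abs(best - p[s]))
            p[s] = best
        if delta < tol:
            break
    return p

def greedy_policy(mdp, p):
    """Pick, in every state with choices, an action that attains the maximum."""
    return {s: max(acts, key=lambda a: sum(pr * p[n] for pr, n in acts[a]))
            for s, acts in mdp.items() if acts}

# Toy model: a direct but risky action versus a detour through "mid"
mdp = {
    "start": {"short": [(0.7, "goal"), (0.3, "trap")], "long": [(1.0, "mid")]},
    "mid": {"go": [(0.9, "goal"), (0.1, "trap")]},
    "goal": {},
    "trap": {},
}
probs = max_reach_prob(mdp, goal={"goal"})
print(probs["start"], greedy_policy(mdp, probs)["start"])
```

Starting the iteration from zero matters: it converges to the least fixed point of the Bellman equations, which is the reachability probability, whereas other starting points can get stuck on spurious fixed points in models with cycles that avoid the goal.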

1103.4065 2026-06-03 eess.SY cs.RO cs.SY math.OC 版本更新

Probabilistically Safe Vehicle Control in a Hostile Environment

敌对环境中概率安全的车辆控制

Igor Cizelj, Xu Chu Ding, Morteza Lahijanian, Alessandro Pinto, Calin Belta

AI总结 本文提出一种在静态障碍和移动对手的敌对环境中控制车辆的方法,通过将对手运动建模为泊松过程、车辆穿越时间建模为指数分布,并利用马尔可夫决策过程和概率计算树逻辑最大化任务完成概率。

详情
AI中文摘要

本文提出了一种在具有静态障碍和移动对手的敌对环境中控制车辆的方法。车辆需要满足一个任务目标,该目标表示为在分区环境区域上满足的一组属性的时序逻辑规范。我们将对手在环境区域之间的运动建模为泊松过程。此外,我们假设车辆在每个区域的两个面之间穿越所需的时间服从指数分布,并从环境模拟器中获得该指数分布的速率。我们将车辆的运动和对手分布的车辆更新捕获为马尔可夫决策过程。利用概率计算树逻辑中的工具,我们为车辆找到一种控制策略,以最大化完成任务目标的概率。我们通过说明性案例研究展示了我们的方法。

英文摘要

In this paper we present an approach to control a vehicle in a hostile environment with static obstacles and moving adversaries. The vehicle is required to satisfy a mission objective expressed as a temporal logic specification over a set of properties satisfied at regions of a partitioned environment. We model the movements of adversaries in between regions of the environment as Poisson processes. Furthermore, we assume that the time it takes for the vehicle to traverse in between two facets of each region is exponentially distributed, and we obtain the rate of this exponential distribution from a simulator of the environment. We capture the motion of the vehicle and the vehicle's updates of the adversaries' distributions as a Markov Decision Process. Using tools in Probabilistic Computation Tree Logic, we find a control strategy for the vehicle that maximizes the probability of accomplishing the mission objective. We demonstrate our approach with illustrative case studies.
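One reason the Poisson and exponential assumptions are convenient is that they give transition probabilities in closed form. As a worked illustration (the symbols are assumptions, not notation from the paper), let the traversal time $T$ be exponential with rate $\mu$, and let adversaries enter the region as an independent Poisson process with rate $\lambda$. The probability that no adversary arrives during the traversal is then:

```latex
\Pr[\text{no arrival during } T]
  = \int_0^\infty e^{-\lambda t}\,\mu e^{-\mu t}\,dt
  = \frac{\mu}{\lambda + \mu}.
```

For example, with $\mu = 2$ and $\lambda = 1$ the vehicle crosses undisturbed with probability $2/3$. Quantities of this kind could serve as the transition probabilities of the Markov Decision Process mentioned in the abstract.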

1209.2058 2026-06-03 cs.RO cs.DC cs.MA cs.SY eess.SY 版本更新

Safe and Stabilizing Distributed Multi-Path Cellular Flows

安全且稳定的分布式多路径蜂窝流

Taylor T. Johnson, Sayan Mitra

AI总结 针对分区平面中的分布式交通控制问题,提出一种保证实体间最小安全距离并能在单目标下自稳定、多目标下避免死锁的协议,通过临时阻塞和局部地理路由实现安全与进展。

Comments An earlier version of this paper appeared in the 30th IEEE International Conference on Distributed Computing Systems (ICDCS 2010)

详情
AI中文摘要

我们研究了分区平面中的分布式交通控制问题,其中每个分区(单元)内所有实体(机器人、车辆等)的运动是耦合的。在此类系统中建立活性具有挑战性,但这种分析对于将分布式交通控制算法应用于协调机器人群体和智能高速公路系统等场景是必要的。我们提出了一个分布式交通控制协议的正式模型,该模型保证实体之间的最小安全距离,即使某些单元发生故障。一旦新故障停止发生,在单目标情况下,协议保证自稳定,并且具有到目标单元可行路径的实体能够向目标前进。对于多目标情况,故障可能导致系统死锁,因此我们识别了一类非死锁故障,其中所有实体都能向各自目标前进。该算法依赖于两个通用原则:临时阻塞以维护安全性,以及局部地理路由以保证进展。我们的断言式证明可作为其他分布式交通控制协议分析的模板。我们给出了仿真结果,提供了吞吐量作为实体速度、安全距离、单目标路径复杂度、故障恢复率和多目标路径复杂度的函数估计。

英文摘要

We study the problem of distributed traffic control in the partitioned plane, where the movement of all entities (robots, vehicles, etc.) within each partition (cell) is coupled. Establishing liveness in such systems is challenging, but such analysis will be necessary to apply such distributed traffic control algorithms in applications like coordinating robot swarms and the intelligent highway system. We present a formal model of a distributed traffic control protocol that guarantees minimum separation between entities, even as some cells fail. Once new failures cease occurring, in the case of a single target, the protocol is guaranteed to self-stabilize and the entities with feasible paths to the target cell make progress towards it. For multiple targets, failures may cause deadlocks in the system, so we identify a class of non-deadlocking failures where all entities are able to make progress to their respective targets. The algorithm relies on two general principles: temporary blocking for maintenance of safety and local geographical routing for guaranteeing progress. Our assertional proofs may serve as a template for the analysis of other distributed traffic control protocols. We present simulation results that provide estimates of throughput as a function of entity velocity, safety separation, single-target path complexity, failure-recovery rates, and multi-target path complexity.

1302.0450 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Algorithms for leader selection in stochastically forced consensus networks

随机力驱动共识网络中的领导者选择算法

Fu Lin, Makan Fardad, Mihailo R. Jovanović

AI总结 针对随机力驱动共识网络,通过凸松弛和贪婪算法优化领导者选择以最小化均方偏差。

Comments Submitted to IEEE Transactions on Automatic Control

详情
Journal ref
IEEE Trans. Automat. Control (2014), vol. 59, no. 7, pp. 1789-1802
AI中文摘要

我们感兴趣的是分配指定数量的节点作为领导者,以最小化随机力驱动网络中与共识的均方偏差。该问题出现在多个应用中,包括车辆编队控制和传感器网络定位。对于领导者受噪声影响的网络,我们证明了布尔约束(节点要么是领导者,要么不是)是非凸性的唯一来源。通过将这些约束松弛到其凸包,我们得到了全局最优值的下界。我们还使用一种简单但高效的贪婪算法来识别领导者并计算上界。对于领导者完美遵循其期望轨迹的网络,我们以秩约束的形式识别了另一个非凸性来源。移除秩约束并松弛布尔约束得到一个半定规划,为此我们开发了一种适用于大型网络的定制算法。提供了从规则网格到随机图等多个例子,以说明所开发算法的有效性。

英文摘要

We are interested in assigning a pre-specified number of nodes as leaders in order to minimize the mean-square deviation from consensus in stochastically forced networks. This problem arises in several applications including control of vehicular formations and localization in sensor networks. For networks with leaders subject to noise, we show that the Boolean constraints (a node is either a leader or it is not) are the only source of nonconvexity. By relaxing these constraints to their convex hull we obtain a lower bound on the global optimal value. We also use a simple but efficient greedy algorithm to identify leaders and to compute an upper bound. For networks with leaders that perfectly follow their desired trajectories, we identify an additional source of nonconvexity in the form of a rank constraint. Removal of the rank constraint and relaxation of the Boolean constraints yields a semidefinite program for which we develop a customized algorithm well-suited for large networks. Several examples ranging from regular lattices to random graphs are provided to illustrate the effectiveness of the developed algorithms.
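A one-leader-at-a-time greedy rule can be sketched as follows. The objective used here, $J = \operatorname{trace}\big((L + \kappa\,\mathrm{diag}(x))^{-1}\big)$ with $x_i = 1$ for leaders, is an assumed form for the noise-corrupted case, and the names `trace_inv`, `greedy_leaders`, and `kappa` are hypothetical; the paper's algorithm may differ in its details and in how it refines the greedy choice.

```python
def trace_inv(A):
    """Trace of the inverse of A via Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return sum(M[i][n + i] for i in range(n))

def greedy_leaders(L, k, kappa=1.0):
    """Add the leader that lowers the assumed objective most, k times."""
    n = len(L)
    leaders = []

    def cost(selected):
        A = [list(row) for row in L]
        for i in selected:
            A[i][i] += kappa  # a leader gets extra feedback on its own state
        return trace_inv(A)

    for _ in range(k):
        candidates = [i for i in range(n) if i not in leaders]
        leaders.append(min(candidates, key=lambda i: cost(leaders + [i])))
    return leaders, cost(leaders)

# Laplacian of a path graph with five nodes
L = [[1, -1, 0, 0, 0],
     [-1, 2, -1, 0, 0],
     [0, -1, 2, -1, 0],
     [0, 0, -1, 2, -1],
     [0, 0, 0, -1, 1]]
print(greedy_leaders(L, 1))
```

On a path graph the first leader lands in the middle, which matches the intuition that a central node keeps every other node close to a reference. The value returned by the greedy rule is an upper bound on the optimum, since any feasible selection is.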

1303.2912 2026-06-03 cs.AI cs.RO cs.SY eess.SY stat.ML 版本更新

Integrated Pre-Processing for Bayesian Nonlinear System Identification with Gaussian Processes

基于高斯过程的贝叶斯非线性系统辨识的集成预处理

Roger Frigola, Carl Edward Rasmussen

AI总结 提出GP-FNARX模型,通过集成数据预处理与稀疏高斯过程回归,实现从原始数据到辨识模型的自动化流程,并利用边际似然最大化同时优化预处理参数和超参数,获得能报告不确定性的贝叶斯动力学模型。

Comments Proceedings of the 52nd IEEE International Conference on Decision and Control (CDC), Firenze, Italy, December 2013

详情
AI中文摘要

我们介绍了GP-FNARX:一种新的非线性系统辨识模型,基于带有滤波回归量(F)的非线性自回归外生模型(NARX),其中非线性回归问题使用稀疏高斯过程(GP)解决。我们将数据预处理与系统辨识集成到一个完全自动化的流程中,从原始数据到辨识模型。预处理参数和GP超参数均通过最大化概率模型的边际似然来调整。我们获得了系统动力学的贝叶斯模型,该模型能够在数据稀缺的区域报告其不确定性。自动化方法、不确定性建模及其相对较低的计算成本使GP-FNARX成为机器人和自适应控制应用的良好候选方案。

英文摘要

We introduce GP-FNARX: a new model for nonlinear system identification based on a nonlinear autoregressive exogenous model (NARX) with filtered regressors (F) where the nonlinear regression problem is tackled using sparse Gaussian processes (GP). We integrate data pre-processing with system identification into a fully automated procedure that goes from raw data to an identified model. Both pre-processing parameters and GP hyper-parameters are tuned by maximizing the marginal likelihood of the probabilistic model. We obtain a Bayesian model of the system's dynamics which is able to report its uncertainty in regions where the data is scarce. The automated approach, the modeling of uncertainty and its relatively low computational cost make GP-FNARX a good candidate for applications in robotics and adaptive control.

1209.4433 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Transverse Contraction Criteria for Existence, Stability, and Robustness of a Limit Cycle

极限环存在性、稳定性和鲁棒性的横向收缩准则

Ian R. Manchester, Jean-Jacques E. Slotine

AI总结 本文推导了自治系统中轨道稳定极限环存在的微分收缩条件,该条件可表示为逐点线性矩阵不等式,从而可利用凸优化工具(如平方和规划)搜索稳定极限环存在的证书,并将收缩动力学的许多理想性质(如互联下收缩保持)推广到该框架,同时通过引入微分耗散性和横向微分耗散性概念,基于子系统LMI条件建立大规模系统的收缩与横向收缩。

Comments 6 pages, 1 figure. Conference submission

详情
AI中文摘要

本文推导了自治系统中轨道稳定极限环存在的微分收缩条件。该横向收缩条件可表示为逐点线性矩阵不等式(LMI),从而允许使用凸优化工具(如平方和规划)来搜索稳定极限环存在的证书。收缩动力学的许多理想性质被推广到这一框架,包括在一大类互联下收缩的保持。此外,通过引入微分耗散性和横向微分耗散性的概念,可以基于子系统的LMI条件建立大规模系统的收缩和横向收缩。

英文摘要

This paper derives a differential contraction condition for the existence of an orbitally-stable limit cycle in an autonomous system. This transverse contraction condition can be represented as a pointwise linear matrix inequality (LMI), thus allowing convex optimization tools such as sum-of-squares programming to be used to search for certificates of the existence of a stable limit cycle. Many desirable properties of contracting dynamics are extended to this context, including preservation of contraction under a broad class of interconnections. In addition, by introducing the concepts of differential dissipativity and transverse differential dissipativity, contraction and transverse contraction can be established for large scale systems via LMI conditions on component subsystems.

1302.7314 2026-06-03 eess.SY cs.RO cs.SY math.OC 版本更新

Torque Saturation in Bipedal Robotic Walking through Control Lyapunov Function Based Quadratic Programs

基于控制李雅普诺夫函数二次规划的双足机器人行走中的力矩饱和

Kevin Galloway, Koushil Sreenath, Aaron D. Ames, J. W. Grizzle

AI总结 本文提出一种通过凸优化将用户定义的控制输入饱和直接纳入控制李雅普诺夫函数(CLF)行走控制器计算的新方法,并在双足机器人MABEL上实验验证。

详情
AI中文摘要

本文提出了一种新颖的方法,用于将用户定义的控制输入饱和直接纳入基于控制李雅普诺夫函数(CLF)的双足机器人行走控制器的计算中。作者先前的工作已经证明了CLF控制器在稳定双足步行器周期性步态方面的有效性,而当前工作通过提供一种更有效的处理控制饱和的方法扩展了这些结果。这种基于以1 kHz控制更新率运行的凸优化例程的新方法,不仅适用于处理力矩饱和,还适用于将一整个用户定义的约束族纳入CLF控制器的在线计算中。本文最后在双足机器人MABEL上对主要结果进行了实验实现。

英文摘要

This paper presents a novel method for directly incorporating user-defined control input saturations into the calculation of a control Lyapunov function (CLF)-based walking controller for a biped robot. Previous work by the authors has demonstrated the effectiveness of CLF controllers for stabilizing periodic gaits for biped walkers, and the current work expands on those results by providing a more effective means for handling control saturations. The new approach, based on a convex optimization routine running at a 1 kHz control update rate, is useful not only for handling torque saturations but also for incorporating a whole family of user-defined constraints into the online computation of a CLF controller. The paper concludes with an experimental implementation of the main results on the bipedal robot MABEL.
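For a single control input, a quadratic program of this kind has a closed-form solution, which makes the role of the saturation easy to see. The sketch below is a scalar toy version under assumptions, not the controller used on MABEL: it minimizes $u^2 + p\,\delta^2$ subject to the relaxed decrease condition $L_fV + L_gV\,u \le -cV + \delta$ and $|u| \le u_{\max}$, and the names `clf_qp_scalar`, `c`, and `p` are hypothetical.

```python
def clf_qp_scalar(V, LfV, LgV, u_max, c=1.0, p=100.0):
    """Solve min u^2 + p*delta^2  s.t.  LfV + LgV*u <= -c*V + delta, |u| <= u_max."""
    psi = LfV + c * V  # how far the decrease condition is violated at u = 0
    if psi <= 0.0:
        return 0.0, 0.0  # zero input already satisfies the condition
    # stationary point of u^2 + p*(psi + LgV*u)^2
    u = -p * LgV * psi / (1.0 + p * LgV ** 2)
    # the cost is convex in u, so clipping gives the box-constrained minimizer
    u = max(-u_max, min(u_max, u))
    delta = max(0.0, psi + LgV * u)  # remaining violation absorbed by the slack
    return u, delta

print(clf_qp_scalar(V=1.0, LfV=1.0, LgV=2.0, u_max=0.5))
print(clf_qp_scalar(V=1.0, LfV=1.0, LgV=2.0, u_max=10.0))
```

When the bound is active the slack grows, so the Lyapunov function is allowed to decrease more slowly (or even increase briefly) instead of the optimizer demanding an input the actuator cannot deliver. With several inputs and constraints there is no such closed form, and a numerical QP solver is needed at each control update.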

1301.0043 2026-06-03 cs.HC cs.RO cs.SY eess.SY 版本更新

A Framework for Analysing Driver Interactions with Semi-Autonomous Vehicles

分析驾驶员与半自主车辆交互的框架

Siraj Shaikh, Padmanabhan Krishnan

AI总结 提出一个结合人类行为经验模型与环境系统模型的框架,通过模型检验分析驾驶员与半自主车辆交互的安全性,并以驾驶员疲劳为例验证其适用性。

Comments In Proceedings FTSCS 2012, arXiv:1212.6574

详情
Journal ref
EPTCS 105, 2012, pp. 85-99
AI中文摘要

半自主车辆在从采矿到物流再到国防的各种环境中日益发挥关键功能。此类系统的一个关键特征是控制回路中存在人类(驾驶员)。为了确保安全,驾驶员需要了解车辆的自主方面,而车辆内置的自动化功能旨在实现更安全的控制。在本文中,我们提出了一个框架,将描述人类行为的经验模型与环境及系统模型相结合。然后,我们通过模型检验分析这些模型之间的交互,以验证所需的安全属性。目的是分析安全车辆-驾驶员交互的设计。我们通过一个涉及半自主车辆的案例研究证明了我们方法的适用性,其中驾驶员疲劳是安全旅程的关键因素。

英文摘要

Semi-autonomous vehicles are increasingly serving critical functions in various settings from mining to logistics to defence. A key characteristic of such systems is the presence of a human (the driver) in the control loop. To ensure safety, the driver needs to be aware of the autonomous aspects of the vehicle, and the automated features of the vehicle need to be built to enable safer control. In this paper we propose a framework to combine empirical models describing human behaviour with the environment and system models. We then analyse, via model checking, interaction between the models for desired safety properties. The aim is to analyse the design for safe vehicle-driver interaction. We demonstrate the applicability of our approach using a case study involving semi-autonomous vehicles where driver fatigue is a factor critical to a safe journey.

1212.2495 2026-06-03 cs.RO cs.AI cs.SY eess.SY 版本更新

Policy-contingent abstraction for robust robot control

基于策略抽象的鲁棒机器人控制

Joelle Pineau, Geoffrey Gordon, Sebastian Thrun

AI总结 提出一种可扩展的控制算法,使移动机器人系统在充分考虑概率信念的情况下做出高层决策,并成功部署于护理机构。

Comments Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

详情
AI中文摘要

本文提出一种可扩展的控制算法,使已部署的移动机器人系统能够在充分考虑其概率信念的情况下做出高层决策。我们的方法基于分层控制器和分层MDP的丰富文献中的见解。所得到的控制器已成功部署在宾夕法尼亚州匹兹堡附近的一家护理机构中。据我们所知,这项工作是应用POMDP解决高层机器人控制问题的独特实例。

英文摘要

This paper presents a scalable control algorithm that enables a deployed mobile robot system to make high-level decisions under full consideration of its probabilistic belief. Our approach is based on insights from the rich literature of hierarchical controllers and hierarchical MDPs. The resulting controller has been successfully deployed in a nursing facility near Pittsburgh, PA. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems.

1211.4038 2026-06-03 eess.SY cs.RO cs.SY math.OC 版本更新

Stochastic receding horizon control of nonlinear stochastic systems with probabilistic state constraints

具有概率状态约束的非线性随机系统的随机滚动时域控制

Shridhar K. Shah, Herbert G. Tanner, Chetan D. Pahlajani

AI总结 针对受概率状态约束的连续时间随机非线性系统,提出一种将滚动时域参考路径设计与随机最优控制器相结合的实时可实施控制框架,并证明无控制输入约束下的闭环收敛性。

Comments Draft of submission to IEEE Transactions of Automatic Control

详情
AI中文摘要

本文描述了一种针对受概率状态约束的连续时间随机非线性系统的滚动时域控制设计框架。其目标是推导出可在当前移动处理器上实时实现的解决方案。该方法将问题分解为基于系统动力学漂移分量设计滚动时域参考路径,然后实施随机最优控制器以使系统保持接近并跟随参考路径。在某些情况下,随机最优控制器可以闭式获得;在更一般的情况下,预计算的数值解可以实时实现,无需在线计算。假设控制输入无约束,建立了闭环系统的收敛性,并提供了仿真结果以验证理论预测。

英文摘要

The paper describes a receding horizon control design framework for continuous-time stochastic nonlinear systems subject to probabilistic state constraints. The intention is to derive solutions that are implementable in real-time on currently available mobile processors. The approach consists of decomposing the problem into designing receding horizon reference paths based on the drift component of the system dynamics, and then implementing a stochastic optimal controller to allow the system to stay close to and follow the reference path. In some cases, the stochastic optimal controller can be obtained in closed form; in more general cases, pre-computed numerical solutions can be implemented in real-time without the need for on-line computation. The convergence of the closed loop system is established assuming no constraints on control inputs, and simulation results are provided to corroborate the theoretical predictions.

1211.1690 2026-06-03 cs.RO cs.CV cs.LG cs.SY eess.SY 版本更新

Learning Monocular Reactive UAV Control in Cluttered Natural Environments

学习在杂乱自然环境中进行单目反应式无人机控制

Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew Bagnell, Martial Hebert

AI总结 本文使用单目相机和模仿学习训练控制器,使小型四旋翼飞行器能在自然森林环境中以1.5m/s速度自主避障导航。

Comments 8 pages, 10 figures

详情
AI中文摘要

大型无人机的自主导航相对简单,因为可以使用昂贵的传感器和监控设备。相比之下,在杂乱环境中低空飞行的微型飞行器(MAV)的避障仍然是一项具有挑战性的任务。与大型飞行器不同,MAV只能携带非常轻的传感器,如摄像头,这使得通过障碍物的自主导航更具挑战性。本文描述了一个系统,该系统能够使小型四旋翼直升机在自然森林环境中低空自主导航。仅使用单个廉价摄像头感知环境,我们能够保持高达1.5m/s的恒定速度。通过少量人类飞行员演示,我们使用最新的模仿学习技术训练了一个控制器,该控制器通过调整MAV的航向来避免树木。我们在室内更受控的环境和室外真实自然森林环境中展示了系统的性能。

英文摘要

Autonomous navigation for large Unmanned Aerial Vehicles (UAVs) is fairly straightforward, as expensive sensors and monitoring devices can be employed. In contrast, obstacle avoidance remains a challenging task for Micro Aerial Vehicles (MAVs) which operate at low altitude in cluttered environments. Unlike large vehicles, MAVs can only carry very light sensors, such as cameras, making autonomous navigation through obstacles much more challenging. In this paper, we describe a system that navigates a small quadrotor helicopter autonomously at low altitude through natural forest environments. Using only a single cheap camera to perceive the environment, we are able to maintain a constant velocity of up to 1.5m/s. Given a small set of human pilot demonstrations, we use recent state-of-the-art imitation learning techniques to train a controller that can avoid trees by adapting the MAV's heading. We demonstrate the performance of our system in a more controlled environment indoors, and in real natural forest environments outdoors.

1209.5805 2026-06-03 eess.SY cs.RO cs.SY math.OC 版本更新

Memoryless Control Design for Persistent Surveillance under Safety Constraints

安全约束下持久监视的无记忆控制设计

Eduardo Arvelo, Eric Kim, Nuno C. Martins

AI总结 针对有限二维网格中移动机器人的持久监视问题,提出一种基于熵最大化原理的有限参数凸规划方法,设计时间不变无记忆控制策略,在避免进入禁止区域的同时最大化被持久监视的状态数。

详情
AI中文摘要

本文研究在存在禁止区域的有限二维网格中移动的机器人的时间不变无记忆控制策略设计,这些机器人被任务为持久监视该区域。我们将每个机器人建模为一个受控马尔可夫链,其状态包括在网格中的位置和运动方向。目标是找到最少数量的机器人和相关的时间不变无记忆控制策略,保证在不访问禁止状态的情况下持久监视最大数量的状态。我们提出了一种基于熵最大化原理的有限参数凸规划设计方法。提供了数值示例。

英文摘要

This paper deals with the design of time-invariant memoryless control policies for robots that move in a finite two-dimensional lattice and are tasked with persistent surveillance of an area in which there are forbidden regions. We model each robot as a controlled Markov chain whose state comprises its position in the lattice and the direction of motion. The goal is to find the minimum number of robots and an associated time-invariant memoryless control policy that guarantees that the largest number of states are persistently surveilled without ever visiting a forbidden state. We propose a design method that relies on a finitely parametrized convex program inspired by entropy maximization principles. Numerical examples are provided.

1210.0888 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Control Design along Trajectories with Sums of Squares Programming

基于平方和规划的轨迹控制设计

Anirudha Majumdar, Amir Ali Ahmadi, Russ Tedrake

AI总结 提出一种通过平方和规划最大化不变漏斗尺寸的控制设计方法,以形式化保证机器人控制任务的稳定性和安全性。

详情
AI中文摘要

受对具有挑战性的机器人控制任务的控制器稳定性和安全性形式化保证需求的驱动,我们提出了一种控制设计程序,该程序明确寻求最大化通向预定义目标集的不变“漏斗”的尺寸。我们的不变性证明以适当定义的Lyapunov不等式组的平方和证明形式给出。这些证明以及我们提出的多项式控制器可以通过半定优化高效获得。我们的方法可以处理跟踪给定轨迹导致的时变动力学、输入饱和(例如力矩限制),并可扩展到处理动力学和状态的不确定性。所得控制器可用于空间填充反馈运动规划算法,以显著减少轨迹数量填充空间。我们在一个严重力矩受限的欠驱动双摆(Acrobot)上演示了我们的方法,并提供了广泛的仿真和硬件验证。

英文摘要

Motivated by the need for formal guarantees on the stability and safety of controllers for challenging robot control tasks, we present a control design procedure that explicitly seeks to maximize the size of an invariant "funnel" that leads to a predefined goal set. Our certificates of invariance are given in terms of sums of squares proofs of a set of appropriately defined Lyapunov inequalities. These certificates, together with our proposed polynomial controllers, can be efficiently obtained via semidefinite optimization. Our approach can handle time-varying dynamics resulting from tracking a given trajectory, input saturations (e.g. torque limits), and can be extended to deal with uncertainty in the dynamics and state. The resulting controllers can be used by space-filling feedback motion planning algorithms to fill up the space with significantly fewer trajectories. We demonstrate our approach on a severely torque limited underactuated double pendulum (Acrobot) and provide extensive simulation and hardware validation.

1205.3668 2026-06-03 cs.RO cs.SY eess.SY nlin.AO physics.comp-ph 版本更新

Synthesis and Adaptation of Effective Motor Synergies for the Solution of Reaching Tasks

有效运动协同的合成与自适应用于解决到达任务

Cristiano Alessandro, Juan Pablo Carbajal, Andrea d'Avella

AI总结 受肌肉协同假说启发,提出一种通过线性组合少量预定义协同(synergies)生成开环控制器的方法,使智能体能够自主合成并适应有效协同集以解决点对点到达任务,显著降低控制问题维度并保持良好性能。

Comments conference paper

详情
AI中文摘要

受肌肉协同假说的启发,我们提出了一种方法,用于为求解点对点到达任务的智能体生成开环控制器。控制器输出被定义为少量预定义驱动(称为协同)的线性组合。该方法可以从发展视角进行解释,因为它允许智能体自主合成并适应一组有效的协同以适应新的行为需求。该方案极大地降低了控制问题的维度,同时保持了良好的性能水平。该框架在一个平面运动链中进行了评估,并在多个场景中量化了解决方案的质量。

英文摘要

Taking inspiration from the hypothesis of muscle synergies, we propose a method to generate open loop controllers for an agent solving point-to-point reaching tasks. The controller output is defined as a linear combination of a small set of predefined actuations, termed synergies. The method can be interpreted from a developmental perspective, since it allows the agent to autonomously synthesize and adapt an effective set of synergies to new behavioral needs. This scheme greatly reduces the dimensionality of the control problem, while keeping a good performance level. The framework is evaluated in a planar kinematic chain, and the quality of the solutions is quantified in several scenarios.
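The dimensionality reduction comes from the fact that the controller only chooses a handful of weights, while the time profile of each actuation is fixed in advance. A minimal sketch, in which the function name `combine` and the sample data are purely illustrative:

```python
def combine(weights, synergies):
    """Open-loop command u[t] = sum_i weights[i] * synergies[i][t]."""
    horizon = len(synergies[0])
    return [sum(w * s[t] for w, s in zip(weights, synergies))
            for t in range(horizon)]

# Two predefined actuation profiles sampled at three time steps
synergies = [[0, 1, 0],
             [1, 1, 1]]
command = combine([2, -1], synergies)
print(command)
```

Here a three-sample command is described by two numbers; with longer horizons and more actuators the saving grows, since the number of weights stays equal to the number of synergies.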

1203.4345 2026-06-03 eess.SY cs.AI cs.RO cs.SY stat.ML 版本更新

Robust Filtering and Smoothing with Gaussian Processes

基于高斯过程的鲁棒滤波与平滑

Marc Peter Deisenroth, Ryan Turner, Marco F. Huber, Uwe D. Hanebeck, Carl Edward Rasmussen

AI总结 提出一种基于非参数高斯过程模型的非线性随机动态系统鲁棒贝叶斯滤波与平滑算法,通过解析平滑实现鲁棒性,数值实验表明在其它先进方法失效时仍保持稳健。

Comments 7 pages, 1 figure, draft version of paper accepted at IEEE Transactions on Automatic Control

详情
AI中文摘要

我们提出了一种原则性算法,用于在非线性随机动态系统中进行鲁棒贝叶斯滤波和平滑,其中转移函数和测量函数均由非参数高斯过程(GP)模型描述。在信号处理、机器学习、机器人和控制领域,GP通过后验概率分布表示未知系统函数,其重要性日益增加。这种现代的“系统辨识”方式比寻找参数函数表示的点估计更为鲁棒。在本文中,我们提出了一种原则性算法,用于在GP动态系统中进行鲁棒解析平滑,该系统在机器人和控制领域应用日益广泛。我们的数值评估表明,在其它最先进的高斯滤波器和平滑器可能失败的情况下,所提方法具有鲁棒性。

英文摘要

We propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of "system identification" is more robust than finding point estimates of a parametric function representation. In this article, we present a principled algorithm for robust analytic smoothing in GP dynamic systems, which are increasingly used in robotics and control. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.

1207.3434 2026-06-03 cs.AI cs.RO cs.SY eess.SY 版本更新

An Approach to Model Interest for Planetary Rover through Dezert-Smarandache Theory

基于Dezert-Smarandache理论的行星探测器兴趣建模方法

Matteo Ceriotti, Massimiliano Vasile, Giovanni Giardini, Mauro Massari

AI总结 提出一种通过Dezert-Smarandache理论融合有效载荷和导航信息来量化行星探测器目标兴趣度的方法,实现自主目标重分配与科学目标优选。

Comments Journal Of Aerospace Computing, Information, And Communication Vol. 5, Month 2008

详情
AI中文摘要

本文提出了一种为行星探测器目标分配兴趣度的方法。为目标分配兴趣度,使探测器能够自主地转换和重新分配目标。兴趣度由数据融合的有效载荷和导航信息定义。融合产生一个“兴趣地图”,量化探测器周围每个区域的兴趣水平。通过这种方式,规划器可以在有限的人为干预下选择最有趣的科学目标进行分析,并自主重新分配其目标。使用Dezert-Smarandache plausible and paradoxical reasoning理论进行信息融合:该理论允许处理模糊和冲突的数据。特别是,它允许我们直接模拟必须评估特定目标集相关性的科学家的行为。本文展示了所提方法在生成可靠兴趣地图中的应用。

英文摘要

In this paper, we propose an approach for assigning an interest level to the goals of a planetary rover. Assigning an interest level to goals allows the rover to transform and reallocate the goals autonomously. The interest level is defined by data-fusing payload and navigation information. The fusion yields an "interest map" that quantifies the level of interest of each area around the rover. In this way the planner can choose the most interesting scientific objectives to be analyzed, with limited human intervention, and reallocate its goals autonomously. The Dezert-Smarandache Theory of Plausible and Paradoxical Reasoning was used for information fusion: this theory allows dealing with vague and conflicting data. In particular, it allows us to model directly the behavior of the scientists who have to evaluate the relevance of a particular set of goals. The paper shows an application of the proposed approach to the generation of a reliable interest map.

1207.1280 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Probabilistically Safe Control of Noisy Dubins Vehicles

噪声Dubins车辆的概率安全控制

Igor Cizelj, Calin Belta

AI总结 针对噪声Dubins车辆,通过马尔可夫决策过程(MDP)和概率计算树逻辑(PCTL)最大化满足时序逻辑规约的概率,并保证原环境中的满足概率有下界。

Comments Technical Report

详情
AI中文摘要

我们解决了控制随机版本的Dubins车辆的问题,使得在分区环境中一组属性区域上满足时序逻辑规约的概率最大化。我们假设车辆能够确定其在已知环境地图中的精确初始位置。然而,受实际限制启发,我们假设车辆配备有噪声执行器,并且在运动过程中只能使用有限精度的陀螺仪测量其角速度。通过量化和离散化,我们以马尔可夫决策过程(MDP)的形式构建了车辆运动的有限近似。我们允许任务规约为关于环境属性的时序逻辑语句,并使用概率计算树逻辑(PCTL)工具生成最大化满足概率的MDP控制策略。我们将该策略转化为车辆反馈控制策略,并证明车辆在原环境中满足规约的概率由MDP上满足规约的最大概率给出下界。

英文摘要

We address the problem of controlling a stochastic version of a Dubins vehicle such that the probability of satisfying a temporal logic specification over a set of properties at the regions in a partitioned environment is maximized. We assume that the vehicle can determine its precise initial position in a known map of the environment. However, inspired by practical limitations, we assume that the vehicle is equipped with noisy actuators and, during its motion in the environment, it can only measure its angular velocity using a limited accuracy gyroscope. Through quantization and discretization, we construct a finite approximation for the motion of the vehicle in the form of a Markov Decision Process (MDP). We allow for task specifications given as temporal logic statements over the environmental properties, and use tools in Probabilistic Computation Tree Logic (PCTL) to generate an MDP control policy that maximizes the probability of satisfaction. We translate this policy to a vehicle feedback control strategy and show that the probability that the vehicle satisfies the specification in the original environment is bounded from below by the maximum probability of satisfying the specification on the MDP.

1109.2363 2026-06-03 stat.AP cs.RO cs.SY eess.SY math.OC 版本更新

Sensor Management: Past, Present, and Future

传感器管理:过去、现在与未来

Alfred O. Hero, Douglas Cochran

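AI summary: This paper surveys the theory, algorithms, and applications of sensor management, covering its historical development and current state, and looks ahead to future directions.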

Comments 15 pages, 112 references

详情
Journal ref
IEEE Sensors Journal, vol. 11, issue 12, pp. 3064-3075, December 2011
AI中文摘要

传感器系统通常在资源约束下运行,这些约束阻止了所有资源同时使用。当传感系统具有主动管理这些资源的能力时,即能够在部署期间根据先前的测量改变其运行配置,传感器管理就变得相关。当前或近期可能使用传感器管理的系统示例包括自主机器人、监视和侦察网络以及波形捷变雷达。本文概述了传感器管理的理论、算法和应用,如其过去几十年发展至今的状况。

英文摘要

Sensor systems typically operate under resource constraints that prevent the simultaneous use of all resources all of the time. Sensor management becomes relevant when the sensing system has the capability of actively managing these resources; i.e., changing its operating configuration during deployment in reaction to previous measurements. Examples of systems in which sensor management is currently used or is likely to be used in the near future include autonomous robots, surveillance and reconnaissance networks, and waveform-agile radars. This paper provides an overview of the theory, algorithms, and applications of sensor management as it has developed over the past decades and as it stands today.

1204.0133 2026-06-03 eess.SY cs.IT cs.RO cs.SY math.IT 版本更新

Progressive Gaussian Filtering

渐进式高斯滤波

Uwe D. Hanebeck, Jannik Steinbring

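AI summary: Proposes a progressive Bayesian method that continuously tracks the non-Gaussian posterior through ordinary differential equations coupling a Gaussian density with its Dirac mixture approximation, and evaluates it against state-of-the-art filters on the discrete-time cubic sensor problem.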

详情
AI中文摘要

本文提出了一种渐进贝叶斯过程,其中测量信息被连续地纳入给定的先验估计(尽管我们在离散时间步长进行观测)。关键思想是通过采用一种新的耦合密度表示(包括高斯密度及其狄拉克混合近似)来推导一阶常微分方程组。该常微分方程用于通过其最佳匹配高斯近似连续跟踪真实的非高斯后验。通过一个典型的基准示例——离散时间立方传感器问题,将新滤波器的性能与最先进的滤波器进行了比较评估。

英文摘要

In this paper, we propose a progressive Bayesian procedure, where the measurement information is continuously included into the given prior estimate (although we perform observations at discrete time steps). The key idea is to derive a system of ordinary first-order differential equations (ODE) by employing a new coupled density representation comprising a Gaussian density and its Dirac Mixture approximation. The ODE is used for continuously tracking the true non-Gaussian posterior by its best matching Gaussian approximation. The performance of the new filter is evaluated in comparison with state-of-the-art filters by means of a canonical benchmark example, the discrete-time cubic sensor problem.

1203.6243 2026-06-03 eess.SY cs.RO cs.SY 版本更新

Optimal Pruning for Multi-Step Sensor Scheduling

多步传感器调度的最优剪枝

Marco F. Huber

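AI summary: For the linear Gaussian sensor scheduling problem, proposes an information-based pruning algorithm that exploits the sensors' information matrices and the monotonicity of the Riccati equation to minimize the multi-step estimation error in a computationally efficient way.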

Comments 6 pages, 3 figures, 1 algorithm, accepted for publication as technical correspondence in IEEE Transactions on Automatic Control

详情
AI中文摘要

在所考虑的线性高斯传感器调度问题中,仅从一组传感器中选择一个传感器进行测量。为了以计算可行的方式最小化多个时间步上的估计误差,提出了所谓基于信息的剪枝算法。该算法利用传感器的信息矩阵和Riccati方程的单调性,从而能够根据传感器的信息贡献进行排序,并将许多传感器从调度中排除。此外,为分支定界搜索计算了一个紧的下界,进一步提高了剪枝性能。

英文摘要

In the considered linear Gaussian sensor scheduling problem, only one sensor out of a set of sensors performs a measurement. To minimize the estimation error over multiple time steps in a computationally tractable fashion, the so-called information-based pruning algorithm is proposed. It utilizes the information matrices of the sensors and the monotonicity of the Riccati equation. This allows ordering sensors according to their information contribution and excluding many of them from scheduling. Additionally, a tight lower bound is calculated for branch-and-bound search, which further improves the pruning performance.
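To make the scheduling problem concrete, here is a minimal sketch of the exhaustive baseline that pruning is meant to avoid, for a scalar system. The scalar model (x_next = a*x + w, z = h*x + v), the cost (sum of posterior variances), and the function names are illustrative assumptions; this is not the paper's pruning algorithm.

```python
from itertools import product


def riccati_step(p, sensor, a=1.0, q=0.1):
    """One predict/update step of the scalar Riccati recursion."""
    h, r = sensor                              # measurement gain, noise variance
    p_pred = a * a * p + q                     # time update
    return p_pred * r / (h * h * p_pred + r)   # measurement update


def schedule_cost(p0, schedule, sensors, **model):
    """Sum of posterior variances along a schedule of sensor indices."""
    p, total = p0, 0.0
    for idx in schedule:
        p = riccati_step(p, sensors[idx], **model)
        total += p
    return total


def best_schedule(p0, sensors, horizon, **model):
    """Exhaustive search over all len(sensors)**horizon schedules."""
    return min(product(range(len(sensors)), repeat=horizon),
               key=lambda s: schedule_cost(p0, s, sensors, **model))


sensors = [(1.0, 1.0), (1.0, 0.2), (0.5, 0.2)]   # (h, r) pairs
print(best_schedule(1.0, sensors, horizon=3))
```

In this scalar setting the search selects the sensor with the largest h**2 / r at every step, which matches the idea of ordering sensors by information contribution; the exponential size of the search space is what motivates pruning.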

1202.5544 2026-06-03 cs.RO cs.SY eess.SY math.DS math.OC math.PR 版本更新

An Incremental Sampling-based Algorithm for Stochastic Optimal Control

基于增量采样的随机最优控制算法

Vu Anh Huynh, Sertac Karaman, Emilio Frazzoli

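AI summary: For continuous-time, continuous-space stochastic optimal control problems, proposes the incremental Markov Decision Process (iMDP) algorithm, which generates a sequence of discretizations by randomly sampling the state space and applies asynchronous value iteration to approximate an optimal policy to arbitrary accuracy.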

Comments Part of the results have been submitted to the IEEE International Conference on Robotics and Automation (ICRA 2012). Minnesota, USA, May 2012

详情
AI中文摘要

本文考虑一类连续时间、连续空间的随机最优控制问题。基于马尔可夫链近似方法和确定性路径规划中基于采样的算法的最新进展,我们提出了一种名为增量马尔可夫决策过程(iMDP)的新算法,用于增量计算在期望成本意义上任意逼近最优策略的控制策略。该算法的主要思想是通过对状态空间进行随机采样,生成原始问题的一系列有限离散化。在每次迭代中,离散化问题是一个马尔可夫决策过程,作为原始问题的增量细化模型。我们证明,以概率1,(i)每个离散化问题的最优值函数序列一致收敛到原始随机最优控制问题的最优值函数,并且(ii)原始最优值函数可以使用异步值迭代以增量方式高效计算。因此,所提出的算法为连续问题的最优控制策略计算提供了一种随时方法。在存在过程噪声的杂乱环境中,通过运动规划和控制问题展示了所提出方法的有效性。

英文摘要

In this paper, we consider a class of continuous-time, continuous-space stochastic optimal control problems. Building upon recent advances in Markov chain approximation methods and sampling-based algorithms for deterministic path planning, we propose a novel algorithm called the incremental Markov Decision Process (iMDP) to incrementally compute control policies that approximate an optimal policy arbitrarily well in terms of the expected cost. The main idea behind the algorithm is to generate a sequence of finite discretizations of the original problem through random sampling of the state space. At each iteration, the discretized problem is a Markov Decision Process that serves as an incrementally refined model of the original problem. We show that with probability one, (i) the sequence of the optimal value functions for each of the discretized problems converges uniformly to the optimal value function of the original stochastic optimal control problem, and (ii) the original optimal value function can be computed efficiently in an incremental manner using asynchronous value iterations. Thus, the proposed algorithm provides an anytime approach to the computation of optimal control policies of the continuous problem. The effectiveness of the proposed approach is demonstrated on motion planning and control problems in cluttered environments in the presence of process noise.

1202.2185 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Temporal Logic Motion Control using Actor-Critic Methods

使用Actor-Critic方法的时序逻辑运动控制

Xu Chu Ding, Jing Wang, Morteza Lahijanian, Ioannis Ch. Paschalidis, Calin A. Belta

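AI summary: For control from temporal logic specifications in large partitioned environments, proposes an actor-critic approximate dynamic programming framework based on least-squares temporal difference learning, which obtains an approximately optimal policy by optimizing the parameters of a randomized control policy.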

Comments Technical Report which accompanies an ICRA2012 paper

详情
AI中文摘要

本文考虑从以时序逻辑语句形式给出的规范部署机器人的问题,该规范涉及大型分区环境中区域满足的某些属性。我们假设机器人具有噪声传感器和执行器,并将其在环境区域中的运动建模为马尔可夫决策过程(MDP)。机器人控制问题变为寻找在MDP上最大化满足时序逻辑任务概率的控制策略。对于大型环境,获取每个状态-动作对的转移概率以及求解最优策略所需的优化问题通常计算上不可行。为解决这些问题,我们提出了一种基于最小二乘时序差分学习方法的Actor-Critic类型近似动态规划框架。该框架在机器人的样本路径上运行,并针对少量参数优化随机控制策略。转移概率仅在需要时获取。硬件在环仿真证实,参数的收敛转化为近似最优策略。

英文摘要

In this paper, we consider the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov Decision Process (MDP). The robot control problem becomes finding the control policy maximizing the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining transition probabilities for each state-action pair, as well as solving the necessary optimization problem for the optimal policy, is usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters. The transition probabilities are obtained only when needed. Hardware-in-the-loop simulations confirm that convergence of the parameters translates to an approximately optimal policy.

1112.5282 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Observability of Strapdown INS Alignment: A Global Perspective

捷联惯导系统对准的可观测性:全局视角

Yuanxin Wu, Hongliang Zhang, Meiping Wu, Xiaoping Hu, Dewen Hu

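AI summary: Taking a global perspective and using inherent properties such as attitude evolution on the SO(3) manifold, this paper studies the observability of static and tumbling alignment of strapdown inertial navigation systems. It proves that successive rotation about two different axes yields complete observability, whereas rotation about a single axis leaves a finite number of unobservable states.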

Comments 25 pages; IEEE Trans. on Aerospace and Electronic Systems, Jan. 2012

详情
Journal ref
IEEE Trans. on Aerospace and Electronic Systems, 48(1), pp. 78-102, 2012
AI中文摘要

捷联惯导系统(INS)的对准具有很强的非线性,当采用机动(例如翻滚技术)来改善对准时,非线性甚至更严重。由于没有通用的规则来处理非线性系统的可观测性,大多数先前的工作通过隐式假设原始非线性系统和线性化系统具有相同的可观测性特征,来研究相应线性化系统的可观测性。捷联惯导对准是一个具有自身特性的非线性系统。利用捷联惯导的固有属性,例如SO(3)流形上的姿态演化,我们从基本定义出发,开发了一种全局且构造性的方法来研究捷联惯导静态和翻滚对准的可观测性,突出了姿态机动对可观测性的影响。我们证明,考虑未知常值传感器偏差,如果捷联惯导绕两个不同轴连续旋转,则对准将是完全可观测的;如果绕单轴旋转,则对于有限已知不可观测状态(不超过两个)几乎是可观测的。全局视角的可观测性为我们提供了对问题的深入理解和更清晰的图景,揭示了先前关于捷联惯导对准的理论结果的不全面或不一致之处。这些不一致的报告要求对大量文献中所有基于线性化的可观测性研究进行重新审视。我们进行了大量仿真,包括构造理想观测器和扩展卡尔曼滤波器,数值结果与分析一致。这些结论还有助于在实践中设计最优翻滚策略和合适的状态观测器,以最大化对准性能。

英文摘要

Alignment of the strapdown inertial navigation system (INS) is strongly nonlinear, and even more so when maneuvers, e.g., tumbling techniques, are employed to improve the alignment. There is no general rule to attack the observability of a nonlinear system, so most previous works addressed the observability of the corresponding linearized system by implicitly assuming that the original nonlinear system and the linearized one have identical observability characteristics. Strapdown INS alignment is a nonlinear system that has its own characteristics. Using the inherent properties of strapdown INS, e.g., the attitude evolution on the SO(3) manifold, we start from the basic definition and develop a global and constructive approach to investigate the observability of strapdown INS static and tumbling alignment, highlighting the effects of the attitude maneuver on observability. We prove that strapdown INS alignment, considering the unknown constant sensor biases, will be completely observable if the strapdown INS is rotated successively about two different axes and will be nearly observable for finite known unobservable states (no more than two) if it is rotated about a single axis. Observability from a global perspective provides us with insights into and a clearer picture of the problem, shedding light on previous theoretical results on strapdown INS alignment that were not comprehensive or consistent. The reporting of inconsistencies calls for a review of all linearization-based observability studies in the vast literature. Extensive simulations with constructed ideal observers and an extended Kalman filter are carried out, and the numerical results accord with the analysis. The conclusions can also assist in designing the optimal tumbling strategy and the appropriate state observer in practice to maximize the alignment performance.

1111.2258 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Design and Implementation of Prosthetic Arm using Gear Motor Control Technique with Appropriate Testing

使用齿轮电机控制技术的假肢手臂设计与实现及适当测试

Biswarup Neogi, Soumyajit Mukherjee, Soumya Ghosal, Achintya Das, D. N. Tibarewala

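AI summary: This paper presents a hardware design method for a prosthetic arm based on gear motor control, implements arm movement through processor programming, replaces the traditional electromyographic signal with muscle strain, and successfully tests a lightweight prosthetic model.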

Comments 5 Pages,13 Figures

详情
Journal ref
International Journal of Computer Applications in Engineering, Technology and Sciences(IJ-CA-ETS), Volume 3, Issue 1, Page 281-285, 2011
AI中文摘要

人体任何部分的复制过程都始于假肢控制科学。本文重点介绍了一种采用齿轮电机控制技术的假肢手臂硬件设计方法。通过处理器编程,本文展示了假肢控制手臂的运动,并对所设计的假肢模型进行了成功测试。假肢手臂的结构设计用更轻的材料替代了重金属,同时用肌肉应变替代了传统的肌电信号。

英文摘要

Prosthetic control science begins with the replication of a part of the human body. This paper highlights the hardware design technique of a prosthetic arm implemented with gear motor control. The movement of the prosthetic arm is demonstrated through processor programming, and the designed prosthetic model is tested successfully. In the structural design of the prosthetic arm, heavy metal is replaced by lighter material, and the traditional EMG (electromyographic) signal is replaced by muscle strain.

1109.1251 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Synthesis of Distributed Control and Communication Schemes from Global LTL Specifications

从全局LTL规范综合分布式控制与通信方案

Yushan Chen, Xu Chu Ding, Calin Belta

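AI summary: Proposes a technique for synthesizing multi-agent control and communication strategies from a global Linear Temporal Logic (LTL) specification, using concurrency theory to check whether the specification is distributable and LTL model checking to generate individual strategies.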

Comments Technical Report accompanying an accepted paper for CDC2011

详情
AI中文摘要

我们提出了一种技术,用于从由线性时序逻辑(LTL)公式给出的全局任务规范中,综合一组智能体的控制和通信策略,该公式基于一组可由智能体满足的属性。我们考虑一个纯离散场景,其中每个智能体的动态被建模为有限转移系统。所提出的计算框架包括两个主要步骤。首先,我们扩展并发理论的结果,以检查规范是否可在智能体之间分布。其次,我们利用LTL模型检验的思想生成个体控制和通信策略。我们将该方法应用于在机器人城市环境中自动部署一组微型汽车。

英文摘要

We introduce a technique for synthesis of control and communication strategies for a team of agents from a global task specification given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied by the agents. We consider a purely discrete scenario, in which the dynamics of each agent is modeled as a finite transition system. The proposed computational framework consists of two main steps. First, we extend results from concurrency theory to check whether the specification is distributable among the agents. Second, we generate individual control and communication strategies by using ideas from LTL model checking. We apply the method to automatically deploy a team of miniature cars in our Robotic Urban-Like Environment.

1111.1684 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Simulation Techniques and Prosthetic Approach Towards Biologically Efficient Artificial Sense Organs- An Overview

仿真技术与假体方法在生物高效人工感觉器官中的应用综述

Biswarup Neogi, Soumya Ghosal, Soumyajit Mukherjee, Achintya Das, D. N. Tibarewala

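AI summary: This paper surveys the applications of control theory to prosthetic sense organs (vision, taste, and smell), focusing on the key role of simulation and control modeling in evaluating and designing artificial organs.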

Comments 12 Pages

详情
AI中文摘要

本文综述了控制理论在假体感觉器官(包括视觉、味觉和嗅觉)中的应用。如今,仿真方面已成为假体领域研究的中心。在自然生物器官功能失调的患者中,假体器官已有多种成功应用。仿真方面和控制建模对于了解系统性能以及生成人工器官的原始方法不可或缺。本综述主要关注控制技术,迄今为止是对试图模仿生物活性感觉器官效能的人工感觉器官的理论综述和融合。关键词:虚拟现实,假体视觉,人工

英文摘要

This paper presents an overview of the applications of control theory to prosthetic sense organs, including the senses of vision, taste, and smell. Simulation has become central to research in the field of prosthesis. Prosthetic organs have been applied successfully in patients whose natural biological organs are dysfunctional. Simulation and control modeling are indispensable for understanding system performance and for generating original approaches to artificial organs. This overview focuses mainly on control techniques; it is largely a theoretical survey and synthesis of artificial sense organs that try to mimic the efficacy of biologically active sensory organs. Keywords: virtual reality, prosthetic vision, artificial

1108.3221 2026-06-03 eess.SY cs.RO cs.SY math.OC 版本更新

An Optimal Control Approach for the Persistent Monitoring Problem

持续监测问题的最优控制方法

Christos G. Cassandras, Xu Chu Ding, Xuchao Lin

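AI summary: Proposes an optimal control framework that controls the movement of mobile agents to minimize an uncertainty metric over the mission space, shows that the single-agent one-dimensional case reduces to a parametric optimization problem, and solves it with a gradient-based algorithm using Infinitesimal Perturbation Analysis.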

Comments Technical report accompanying the CDC2011 submission

详情
AI中文摘要

我们提出了一种针对持续监测问题的最优控制框架,其目标是通过控制移动代理的运动来最小化给定任务空间中的不确定性度量。对于一维空间中的单个代理,我们证明最优解可以通过一系列切换位置获得,从而将其简化为参数优化问题。利用无穷小扰动分析(IPA),我们通过基于梯度的算法得到了完整解。我们还讨论了一种能够在线获得近优解的滚动时域控制器。我们通过数值示例说明了我们的方法。

英文摘要

We propose an optimal control framework for persistent monitoring problems where the objective is to control the movement of mobile agents to minimize an uncertainty metric in a given mission space. For a single agent in a one-dimensional space, we show that the optimal solution is obtained in terms of a sequence of switching locations, thus reducing it to a parametric optimization problem. Using Infinitesimal Perturbation Analysis (IPA) we obtain a complete solution through a gradient-based algorithm. We also discuss a receding horizon controller which is capable of obtaining a near-optimal solution on-the-fly. We illustrate our approach with numerical examples.

1109.2288 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Heterogeneity for Increasing Performance and Reliability of Self-Reconfigurable Multi-Robot Organisms

异构性提升自重构多机器人组织的性能与可靠性

S. Kernbach, F. Schlachter, R. Humza, J. Liedke, S. Popesku, S. Russo, T. Ranzani, L. Manfredi, C. Stefanini, R. Matthias, Ch. Schwarzer, B. Girault, P. Alschbach, E. Meister, O. Scholz

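AI summary: This paper examines the design choices and performance evaluation of heterogeneity in self-reconfigurable modular robot systems, presenting mechatronic and software designs and experiments intended to demonstrate the usability and performance of a heterogeneous platform.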

详情
Journal ref
IROS11, workshop on "Reconfigurable Modular Robotics", San Francisco, 2011
AI中文摘要

同质性与异构性在模块化机器人系统设计中代表一种众所周知的权衡。本文探讨异构性概念、其基本原理、设计选择及性能评估。我们介绍了自重构系统面临的挑战,展示了异构平台的机电和软件设计进展,并讨论了旨在证明该系统可用性和性能的实验。

英文摘要

Homogeneity and heterogeneity represent a well-known trade-off in the design of modular robot systems. This work addresses the heterogeneity concept, its rationale, design choices, and performance evaluation. We introduce challenges for self-reconfigurable systems, show advances in the mechatronic and software design of heterogeneous platforms, and discuss experiments intended to demonstrate the usability and performance of this system.

1108.6175 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Adaptive Locomotion of Multibody Snake-like Robot

多体蛇形机器人的自适应运动

Eugen Meister, Sergej Stepanenko, Serge Kernbach

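AI summary: For a snake-like robot with 25 degrees of freedom, proposes an adaptive rhythmic control algorithm, studies its behavioral and energetic properties in simulation and on a real robot, and analyzes how the dynamics differ across body segments.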

Comments Multibody Dynamics 2011, ECCOMAS Thematic Conference, J.C. Samin, P. Fisette (eds.) Brussels, Belgium, 4-7 July, 2011

详情
AI中文摘要

本文提出了一种针对具有25个自由度的蛇形机器人的自适应节律控制。该自适应步态控制以算法方式在仿真和真实机器人上实现。我们研究了这种控制的行为和能量特性以及不同身体节段的动力学。结果表明,尽管使用同质发生器,物理约束对相邻身体节段产生不均匀影响。通过对此类动力学进行解析建模,可能导致节律控制的振荡器异质耦合,并影响步态模式发生器的可扩展性和同步效果。

英文摘要

This paper presents an adaptive rhythmic control for a snake-like robot with 25 degrees of freedom. The adaptive gait control is implemented algorithmically in simulation and on a real robot. We investigate the behavioral and energetic properties of this control and the dynamics of different body segments. It turns out that, despite the use of homogeneous generators, physical constraints have an inhomogeneous impact on neighboring body segments. Analytical modeling of such dynamics may result in heterogeneous coupling of oscillators for rhythmic control and may affect the scalability and synchronization of gait pattern generators.

1108.4698 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

最小二乘时序差分演员-评论家方法及其在机器人运动控制中的应用

Reza Moazzez Estanjini, Xu Chu Ding, Morteza Lahijanian, Jing Wang, Calin A. Belta, Ioannis Ch. Paschalidis

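AI summary: For the Markov decision process problem of maximizing the probability of reaching some states while avoiding others, proposes an actor-critic approximate dynamic programming algorithm based on least-squares temporal difference learning and proves its convergence to a stationary point in the parameter space.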

Comments Technical report accompanying an accepted paper to CDC 2011

详情
AI中文摘要

我们考虑为马尔可夫决策过程(MDP)寻找控制策略的问题,以最大化到达某些状态同时避免其他状态的概率。该问题受机器人应用启发,当需要概率机器人运动模型满足时序逻辑任务规范时,这类问题自然出现。我们将该问题转化为随机最短路径(SSP)问题,并开发了一种新的近似动态规划算法来求解。该算法属于演员-评论家类型,使用最小二乘时序差分学习方法。它基于系统的样本路径运行,并在由一组简约参数参数化的预指定类中优化策略。我们证明了其收敛到参数空间中的驻点对应的策略。仿真结果证实了所提解决方案的有效性。

英文摘要

We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. This algorithm is of the actor-critic type and uses a least-squares temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show its convergence to a policy corresponding to a stationary point in the parameter space. Simulation results confirm the effectiveness of the proposed solution.
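For small MDPs, the reach-avoid objective described here can be computed exactly, which gives a useful reference point. The sketch below uses plain value iteration, not the actor-critic method of the paper; the dictionary layout of `transitions` and the three-state example are assumptions made for illustration.

```python
def reach_avoid_values(transitions, goal, avoid, sweeps=200):
    """Maximal probability of reaching `goal` before entering `avoid`.

    transitions[state][action] is a list of (probability, next_state) pairs.
    Goal and avoid states are treated as absorbing.
    """
    values = {s: (1.0 if s in goal else 0.0) for s in transitions}
    for _ in range(sweeps):
        for s, actions in transitions.items():
            if s in goal or s in avoid:
                continue
            # Bellman backup: best action under the current value estimates.
            values[s] = max(sum(p * values[t] for p, t in outcomes)
                            for outcomes in actions.values())
    return values


mdp = {
    "start": {
        "safe": [(0.5, "goal"), (0.5, "start")],   # slow but never trapped
        "risky": [(0.9, "goal"), (0.1, "trap")],   # fast but can fail
    },
    "goal": {},
    "trap": {},
}
values = reach_avoid_values(mdp, goal={"goal"}, avoid={"trap"})
print(values["start"])
```

In this toy example the "safe" action reaches the goal with probability one in the limit, so it dominates the "risky" action even though each single attempt succeeds less often. Exact backups of this kind need every transition probability, which is the cost the sample-path approach in the abstract is designed to avoid.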

1108.5624 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Multi-Robot Searching Algorithm Using Levy Flight and Artificial Potential Field

基于Levy飞行和人工势场的多机器人搜索算法

Donny K. Sutantyo, Serge Kernbach, Valentin A. Nepomnyashchikh, Paul Levi

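AI summary: Proposes a multi-robot search algorithm that combines Levy flight with artificial potential fields, evaluates its efficiency experimentally, and develops a generic framework.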

Comments Eighth IEEE International Workshop on Safety, Security, and Rescue Robotics (SSRR-2010), Bremen, Germany, 26-30 July 2010

详情
AI中文摘要

高效的搜索算法在机器人领域至关重要,特别是对于目标可用性未知且环境条件高度不可预测的探索任务。在非常大的环境中,单个机器人扫描区域或体积是不够的,需要多个机器人参与集体探索。本文提出结合称为Levy飞行的仿生搜索算法和人工势场方法,为多机器人应用实现高效的搜索算法。本工作的主要焦点不仅是通过实验证明概念或衡量算法的效率,而且还要开发一个合适的通用框架,以便在仿真和真实机器人平台上实现。还进行了比较不同搜索算法的几个实验。

英文摘要

An efficient search algorithm is crucial in robotics, especially for exploration missions where the target availability is unknown and the condition of the environment is highly unpredictable. In a very large environment, scanning an area or volume with a single robot is not sufficient; multiple robots should be involved to perform collective exploration. In this paper, we propose combining a bio-inspired search algorithm called Levy flight with the artificial potential field method to obtain an efficient search algorithm for multi-robot applications. The main focus of this work is not only to prove the concept or to measure the efficiency of the algorithm by experiments, but also to develop an appropriate generic framework to be implemented both in simulation and on real robotic platforms. Several experiments that compare different search algorithms are also performed.
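As a rough illustration of the Levy flight component, the sketch below draws step lengths from a heavy-tailed Pareto distribution by inverse-transform sampling and picks headings uniformly at random. The Pareto approximation, the exponent `alpha`, and the function names are assumptions for illustration; the artificial potential field term that would keep robots apart is not modeled here.

```python
import math
import random


def levy_step(rng, alpha=1.5, min_step=1.0):
    """Sample one 2-D displacement with a power-law distributed length."""
    u = 1.0 - rng.random()                      # in (0, 1], never zero
    length = min_step * u ** (-1.0 / alpha)     # Pareto inverse transform
    angle = rng.uniform(0.0, 2.0 * math.pi)     # isotropic heading
    return length * math.cos(angle), length * math.sin(angle)


def levy_walk(n_steps, seed=0, **kwargs):
    """Return the positions visited by a single searcher starting at the origin."""
    rng = random.Random(seed)
    x = y = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        dx, dy = levy_step(rng, **kwargs)
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

Most steps are short, with occasional long relocations; that mix of local search and long jumps is the property usually cited as the reason for using Levy flights in exploration.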

1108.5543 2026-06-03 cs.RO cs.NE cs.SY eess.SY 版本更新

Multi-Robot Organisms: State of the Art

多机器人有机体:最新技术综述

Serge Kernbach, Oliver Scholz, Kanako Harada, Sergej Popesku, Jens Liedke, Humza Raja, Wenguo Liu, Fabio Caparrelli, Jaouhar Jemai, Jiri Havlik, Eugen Meister, Paul Levi

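AI summary: This paper reviews recent developments in artificial multi-robot organisms, covering mechatronics, sensor and computational equipment, and software frameworks, and introduces one of the Grand Challenges for swarm and reconfigurable robotics.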

详情
Journal ref
ICRA2010, workshop on "Modular Robots: State of the Art", pp.1-10, Anchorage, 2010
AI中文摘要

本文代表了人工多机器人有机体领域的最新发展。它简要考虑了机电一体化发展、传感器和计算设备、软件框架,并介绍了群体与可重构机器人领域的一项重大挑战。

英文摘要

This paper presents the state of the art in the field of artificial multi-robot organisms. It briefly considers mechatronic development, sensor and computational equipment, and the software framework, and introduces one of the Grand Challenges for swarm and reconfigurable robotics.

1108.4432 2026-06-03 cs.RO cs.SY eess.SY math.OC physics.comp-ph 版本更新

Exploiting the Passive Dynamics of a Compliant Leg to Develop Gait Transitions

利用柔性腿的被动动力学发展步态转换

Harold Roberto Martinez Salazar, Juan Pablo Carbajal

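AI summary: Analyzes the spring loaded inverted pendulum model as a hybrid dynamical system, identifies stable and unstable regions, exploits the unstable regions to induce gait transitions at constant energy, and shows that simple non-constant angle of attack control policies can render the system almost always stable.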

详情
Journal ref
Phys. Rev. E 6(83),pp 066707, Jun 2011
AI中文摘要

在双足运动领域,弹簧负载倒立摆(SLIP)模型已被提出作为解释多种步态动力学的统一框架。本文对该数学模型及其动力学特性进行了新颖的分析。我们采用混合动力系统的视角来研究动力学,并定义了部分稳定性和可行性等概念。通过这种方法,一方面我们识别了运动的稳定和不稳定区域;另一方面,我们找到了利用运动不稳定区域在恒定能量状态下诱导步态转换的方法。此外,我们证明简单的非恒定攻角控制策略可以使系统几乎始终稳定。

英文摘要

In the area of bipedal locomotion, the spring loaded inverted pendulum (SLIP) model has been proposed as a unified framework to explain the dynamics of a wide variety of gaits. In this paper, we present a novel analysis of the mathematical model and its dynamical properties. We use the perspective of hybrid dynamical systems to study the dynamics and define concepts such as partial stability and viability. With this approach, on the one hand, we identified stable and unstable regions of locomotion. On the other hand, we found ways to exploit the unstable regions of locomotion to induce gait transitions at a constant energy regime. Additionally, we show that simple non-constant angle of attack control policies can render the system almost always stable.

1108.3240 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Multi-robot Deployment From LTL Specifications with Reduced Communication

基于LTL规范的多机器人部署与通信减少

Marius Kloetzer, Xu Chu Ding, Calin Belta

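AI summary: Proposes a hierarchical framework that uses finite abstractions, parallel composition, and motion planning to automatically deploy a team of unicycle robots from a global LTL specification, with an algorithm that reduces inter-robot communication during execution.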

Comments CDC 2011 Technical Report

详情
AI中文摘要

在本文中,我们开发了一个计算框架,用于根据某些感兴趣区域上的LTL公式给出的全局规范,全自动部署一组独轮车机器人。我们的分层方法包括四个步骤:(i) 为每个机器人的运动构建有限抽象,(ii) 抽象模型的并行组合,(iii) 生成满足团队规范的运动;(iv) 将该运动映射到单个机器人的控制和通信策略。本文的主要结果是提出一种算法,用于在过程的第四步减少机器人间的通信量。

英文摘要

In this paper, we develop a computational framework for fully automatic deployment of a team of unicycles from a global specification given as an LTL formula over some regions of interest. Our hierarchical approach consists of four steps: (i) the construction of finite abstractions for the motions of each robot, (ii) the parallel composition of the abstractions, (iii) the generation of a satisfying motion of the team, and (iv) the mapping of this motion to individual robot control and communication strategies. The main result of the paper is an algorithm to reduce the amount of inter-robot communication during the fourth step of the procedure.

1108.2126 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Multi-Modal Local Sensing and Communication for Collective Underwater Systems

多模态本地感知与通信用于集体水下系统

Serge Kernbach, Tobias Dipper, Donny Sutantyo

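AI summary: This paper studies local sensing and communication for collective underwater systems operating in networked and swarm modes, and shows that a specific combination of modal and sub-modal communication enables dedicated cooperation among multiple AUVs.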

详情
Journal ref
Proceedings of the 11th International Conference on Mobile Robots and Competitions, Robotica 2011, Lisbon, pp.96-101, 2011
AI中文摘要

本文致力于研究用于网络和集群模式的集体水下系统中的本地感知与通信。结果表明,同时用于机器人-机器人和机器人-物体检测的模态和子模态通信的特定组合,可以在多个AUV之间创建专用协作。这些技术、平台和实验被简要描述,并使我们能够得出关于集体水下系统中不同信号方法的有用组合的结论。

英文摘要

This paper is devoted to local sensing and communication for collective underwater systems used in networked and swarm modes. It is demonstrated that a specific combination of modal and sub-modal communication, used simultaneously for robot-robot and robot-object detection, can create a dedicated cooperation between multiple AUVs. These technologies, platforms, and experiments are briefly described, and allow us to draw conclusions about useful combinations of different signaling approaches for collective underwater systems.

1006.2165 2026-06-03 stat.ME cs.AI cs.RO cs.SY eess.SY math.OC stat.ML 版本更新

A Probabilistic Perspective on Gaussian Filtering and Smoothing

高斯滤波与平滑的概率视角

Marc Peter Deisenroth, Henrik Ohlsson

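AI summary: This paper unifies Gaussian filtering and smoothing methods from a probabilistic perspective, shows that they differ only in how they compute or approximate the means and covariances of joint probabilities, and on that basis derives the cubature Kalman smoother and a robust filtering and smoothing algorithm based on Gibbs sampling.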

Comments 14 pages. Extended version of conference paper (ACC 2011)

详情
AI中文摘要

我们提出了一个关于高斯滤波与平滑的通用概率视角。这使我们能够证明,常见的高斯滤波/平滑方法仅通过其计算/近似联合概率的均值和协方差的方法来区分。这意味着,通过提供计算这些矩的方法,可以直接推导出新的滤波器和平滑器。基于这一见解,我们推导了容积卡尔曼平滑器,并提出了一种基于吉布斯采样的新型鲁棒滤波与平滑算法。

英文摘要

We present a general probabilistic perspective on Gaussian filtering and smoothing. This allows us to show that common approaches to Gaussian filtering/smoothing can be distinguished solely by their methods of computing/approximating the means and covariances of joint probabilities. This implies that novel filters and smoothers can be derived straightforwardly by providing methods for computing these moments. Based on this insight, we derive the cubature Kalman smoother and propose a novel robust filtering and smoothing algorithm based on Gibbs sampling.
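The central observation, that a Gaussian filter is fixed once the moments of the joint distribution are available, can be sketched in the scalar case. The function names below are illustrative assumptions; `linear_moments` supplies exact moments for a linear measurement model, and replacing it with an approximation (for example linearization or a sigma-point rule) would yield a different filter with the same update step.

```python
def gaussian_update(mean_x, var_x, mean_z, var_z, cov_xz, z):
    """Condition a joint Gaussian over (x, z) on an observed z (scalar case)."""
    gain = cov_xz / var_z
    posterior_mean = mean_x + gain * (z - mean_z)
    posterior_var = var_x - gain * cov_xz
    return posterior_mean, posterior_var


def linear_moments(mean_x, var_x, h, r):
    """Exact joint moments for z = h * x + v with v ~ N(0, r)."""
    mean_z = h * mean_x
    var_z = h * h * var_x + r
    cov_xz = h * var_x
    return mean_z, var_z, cov_xz


# Prior N(0, 1), measurement z = x + v with unit noise variance, observed z = 2.
mean_z, var_z, cov_xz = linear_moments(0.0, 1.0, h=1.0, r=1.0)
print(gaussian_update(0.0, 1.0, mean_z, var_z, cov_xz, z=2.0))
```

With the exact linear moments this reproduces the standard Kalman measurement update; only the moment computation changes between filter variants.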

1106.0708 2026-06-03 math.OC cs.MA cs.RO cs.SY eess.SY 版本更新

Optimal Sensor Configurations for Rectangular Target Detection

矩形目标检测的最优传感器配置

François-Alex Bourque, Bao U. Nguyen

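AI summary: For targets with rectangular symmetry and uniformly distributed orientation, proposes an optimal search strategy that chooses n angles evenly spaced on the half-circle, and gives a lower bound on the probability of not detecting the target.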

Comments 6 pages, 2 figures

详情
AI中文摘要

找到了从多个不同角度观测目标的最优搜索策略。假设目标具有矩形对称性且朝向均匀分布。矩形对称性意味着目标的一侧是其相对侧的镜像。找到最优解通常是一个难题。幸运的是,对称性原理允许找到解析且直观的解。一种这样的最优搜索策略包括在半个圆周上均匀选择n个角度,并给出了未检测到目标的概率的下界。由于不需要目标朝向的先验知识,这种搜索策略也具有鲁棒性,这是搜索和探测任务中的一个理想特性。

英文摘要

Optimal search strategies where targets are observed at several different angles are found. Targets are assumed to exhibit rectangular symmetry and have a uniformly-distributed orientation. By rectangular symmetry, it is meant that one side of a target is the mirror image of its opposite side. Finding an optimal solution is generally a hard problem. Fortunately, symmetry principles allow analytical and intuitive solutions to be found. One such optimal search strategy consists of choosing n angles evenly separated on the half-circle and leads to a lower bound of the probability of not detecting targets. As no prior knowledge of the target orientation is required, such search strategies are also robust, a desirable feature in search and detection missions.
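The evenly separated look angles mentioned in the abstract are easy to write down. The sketch below assumes the half-circle is the interval [0, pi) and that the first look is at angle zero; both conventions, and the name `look_angles`, are assumptions, since the abstract does not fix them.

```python
import math


def look_angles(n):
    """Return n observation angles evenly separated on the half-circle [0, pi)."""
    if n < 1:
        raise ValueError("at least one look angle is required")
    return [k * math.pi / n for k in range(n)]


print([round(math.degrees(a), 1) for a in look_angles(4)])
```

Because the target orientation is uniformly distributed, any common rotation of these angles should perform equally well, which is consistent with the robustness remark in the abstract.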

1105.2254 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Symmetries in observer design: review of some recent results and applications to EKF-based SLAM

观测器设计中的对称性:近期结果综述及在基于EKF的SLAM中的应用

Silvere Bonnabel

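AI summary: This paper reviews the theory of symmetry-preserving observers and recent progress, applies it to Extended Kalman Filter-based Simultaneous Localization and Mapping (EKF SLAM), derives a new symmetry-preserving Extended Kalman Filter with convergence properties, and proves that a special choice of gains ensures global exponential convergence.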

Comments This paper accompanies a presentation to be given at Eighth International Workshop on Robot Motion and Control (RoMoCo'11)

详情
AI中文摘要

本文首先综述了保持对称性的观测器理论,并提及了一些近期结果。然后,我们将该理论应用于基于扩展卡尔曼滤波的同步定位与地图构建(EKF SLAM)。这使我们能够为非线性SLAM问题推导出一种新的(保持对称性的)扩展卡尔曼滤波器,该滤波器具有收敛性质。我们还证明了增益的特殊选择可确保全局指数收敛。

英文摘要

In this paper, we first review the theory of symmetry-preserving observers and mention some recent results. Then, we apply the theory to Extended Kalman Filter-based Simultaneous Localization and Mapping (EKF SLAM). This allows us to derive a new (symmetry-preserving) Extended Kalman Filter for the non-linear SLAM problem that possesses convergence properties. We also prove that a special choice of the gains ensures global exponential convergence.

1103.4342 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

MDP Optimal Control under Temporal Logic Constraints

时序逻辑约束下的MDP最优控制

Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

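AI summary: For Markov Decision Processes (MDPs), proposes a method that automatically generates a control policy under a given Linear Temporal Logic (LTL) specification, introduces an optimizing proposition to minimize expected cost, and synthesizes an optimal or sub-optimal policy with a dynamic programming algorithm.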

Comments Technical report accompanying the CDC2011 submission

详情
AI中文摘要

在本文中,我们开发了一种方法,用于自动生成以马尔可夫决策过程(MDP)建模的动态系统的控制策略。控制规范以线性时序逻辑(LTL)公式给出,该公式基于定义在MDP状态上的一组命题。我们合成一个控制策略,使得MDP几乎必然满足给定规范(如果这样的策略存在)。此外,我们指定一个“优化命题”以重复满足,并制定了一个新的优化准则,即最小化该命题满足之间的期望成本。我们提出了策略最优的充分条件,并开发了一种动态规划算法,该算法在某些条件下合成最优策略,否则合成次优策略。此问题源于需要执行持久性任务的机器人应用,例如环境监测或数据收集。

英文摘要

In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exists. In addition, we designate an "optimizing proposition" to be repeatedly satisfied, and we formulate a novel optimization criterion in terms of minimizing the expected cost in between satisfactions of this proposition. We propose a sufficient condition for a policy to be optimal, and develop a dynamic programming algorithm that synthesizes a policy that is optimal under some conditions, and sub-optimal otherwise. This problem is motivated by robotic applications requiring persistent tasks, such as environmental monitoring or data gathering, to be performed.

1102.3396 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Detecting Separation in Robotic and Sensor Networks

检测机器人与传感器网络中的分离

Chenda Liao, Harshavardhan Chenji, Prabir Barooah, Radu Stoleru, Tamás Kalmár-Nagy

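AI summary: For robotic and sensor networks in which nodes may become separated from the base station through mobility or failure, proposes a distributed algorithm based on an averaging scheme that detects permanent separation by monitoring the convergence of node states.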

详情
AI中文摘要

本文考虑在机器人与传感器网络中监测检测代理与基站分离的问题。这种分离可能由代理的移动和/或故障引起。在静态网络中,分离/切断检测可以通过节点与基站之间传递消息来实现,但对于高移动性网络,由于路由不断变化,这种解决方案不切实际。我们提出了一种分布式算法来检测与基站的分离。该算法包括一个平均化方案,其中每个节点通过与其当前邻居通信来更新一个标量状态。我们证明,如果一个节点永久性地与基站断开连接,其状态收敛到$0$。如果一个节点在平均意义上与基站连接,即使在任何时刻都不连接,我们证明其状态的期望值收敛到一个正数。因此,节点可以通过监测其状态来检测是否已与基站分离。通过仿真、实际系统实现以及涉及静态和移动网络的实验,验证了所提算法的有效性。

英文摘要

In this paper we consider the problem of detecting separation of agents from a base station in robotic and sensor networks. Such separation can be caused by mobility and/or failure of the agents. While separation/cut detection may be performed by passing messages between a node and the base in static networks, such a solution is impractical for networks with high mobility, since routes are constantly changing. We propose a distributed algorithm to detect separation from the base station. The algorithm consists of an averaging scheme in which every node updates a scalar state by communicating with its current neighbors. We prove that if a node is permanently disconnected from the base station, its state converges to $0$. If a node is connected to the base station in an average sense, even if not connected at any instant, we show that the expected value of its state converges to a positive number. Therefore, a node can detect whether it has been separated from the base station by monitoring its state. The effectiveness of the proposed algorithm is demonstrated through simulations, a real system implementation, and experiments involving both static and mobile networks.
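A minimal sketch of an averaging scheme of this kind is shown below on a static graph. The specific update rule (the base holds a fixed positive value, every other node sets its state to the sum of its neighbors' states divided by its degree plus one) is an assumption chosen for illustration and may differ from the rule analyzed in the paper; the node names are hypothetical.

```python
def averaging_round(states, neighbors, base, base_value=1.0):
    """One synchronous update of every node's scalar state."""
    new_states = {}
    for node, nbrs in neighbors.items():
        if node == base:
            new_states[node] = base_value      # the base acts as a constant source
        else:
            new_states[node] = sum(states[j] for j in nbrs) / (len(nbrs) + 1)
    return new_states


def run(neighbors, base, rounds=200, initial=1.0):
    """Iterate the averaging scheme from a uniform initial state."""
    states = {node: initial for node in neighbors}
    for _ in range(rounds):
        states = averaging_round(states, neighbors, base)
    return states


# Nodes a and b can reach the base; nodes c and d form a separated component.
graph = {"base": ["a"], "a": ["base", "b"], "b": ["a"], "c": ["d"], "d": ["c"]}
final = run(graph, base="base")
separated = sorted(n for n, x in final.items() if x < 1e-6)
print(separated)
```

Under this rule the states of the separated nodes decay geometrically toward zero while connected nodes settle at positive values, so a node can flag separation by comparing its own state with a small threshold.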

1003.4831 2026-06-03 cs.RO cs.SY eess.SY physics.med-ph Version update

Ball on a beam: stabilization under saturated input control with large basin of attraction

Yannick Aoustin, Alexander Formal'skii

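AI summary: For two underactuated beam-and-ball systems, one straight and one circular, designs feedback control laws from the Jordan form that account for voltage saturation so that the basin of attraction approaches the controllability domain, and verifies the nonlinear control laws in simulation.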

Details
Journal ref
Multibody System Dynamics 21 (2008) 71-89
English abstract

This article is devoted to the stabilization of two underactuated planar systems, the well-known straight beam-and-ball system and an original circular beam-and-ball system. The feedback control for each system is designed using the Jordan form of its model linearized near the unstable equilibrium. The limits on the voltage fed to the motor are taken into account explicitly. The straight beam-and-ball system has one unstable mode in the motion near the equilibrium point. The proposed control law ensures that the basin of attraction coincides with the controllability domain. The circular beam-and-ball system has two unstable modes near the equilibrium point. Therefore, this device, never considered in the past, is much more difficult to control than the straight beam-and-ball system. The main contribution is a simple new control law whose gain parameters can be adjusted so that, in the linear case, the basin of attraction approaches the controllability domain arbitrarily closely. For both nonlinear systems, simulation results are presented to illustrate the efficiency of the designed nonlinear control laws and to determine the basin of attraction.
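
The link between input saturation and the controllability domain in the abstract above can be seen on the simplest possible case. The sketch below is not the paper's beam-and-ball model or its control law. It assumes a single scalar unstable mode `y' = lam*y + b*u` with `|u| <= u_max`, `lam > 0`, and `b > 0`, for which the states that can be steered to the origin are exactly those with `|y| < b*u_max/lam`. The saturated linear feedback, the parameter values, and the function names are hypothetical choices for illustration.

```python
def saturate(value, limit):
    """Clip value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def simulate_mode(y0, lam=1.0, b=1.0, u_max=1.0, gain=5.0, dt=1e-3, t_end=20.0):
    """Forward-Euler simulation of y' = lam*y + b*u with u = sat(-gain*y).

    Assumes gain * b > lam so the unsaturated closed loop is stable.
    """
    y = y0
    for _ in range(int(t_end / dt)):
        u = saturate(-gain * y, u_max)
        y += dt * (lam * y + b * u)
    return y


# Boundary of the controllability domain, b * u_max / lam, for the defaults above.
boundary = 1.0 * 1.0 / 1.0

inside = simulate_mode(0.9 * boundary)   # starts inside the domain
outside = simulate_mode(1.1 * boundary)  # starts outside the domain

print("inside start ends at", inside)
print("outside start ends at", outside)
```

With these assumed values the trajectory that starts inside the boundary is driven to the origin, while the one that starts outside grows without bound even at full input, which is why no saturated controller can enlarge the basin beyond the controllability domain.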

1010.2247 2026-06-03 math.OC cs.RO cs.SY eess.SY Version update

Regions of Attraction for Hybrid Limit Cycles of Walking Robots

Ian R. Manchester, Mark M. Tobenkin, Michael Levashov, Russ Tedrake

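AI summary: Applies recent results in region-of-attraction analysis for nonlinear hybrid limit cycles to three example systems (the van der Pol oscillator, the rimless wheel, and the compass gait), detailing how sum-of-squares analysis and semidefinite programming are used to find Lyapunov functions for the transverse dynamics, and illustrating optimization of transversal surfaces, handling of impact maps, optimization of the Lyapunov function, and orbitally stabilizing control design.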

Details
English abstract

This paper illustrates the application of recent research in region-of-attraction analysis for nonlinear hybrid limit cycles. Three example systems are analyzed in detail: the van der Pol oscillator, the "rimless wheel", and the "compass gait", the latter two being simplified models of underactuated walking robots. The method used involves decomposition of the dynamics about the target cycle into tangential and transverse components, and a search for a Lyapunov function in the transverse dynamics using sum-of-squares analysis (semidefinite programming). Each example illuminates different aspects of the procedure, including optimization of transversal surfaces, the handling of impact maps, optimization of the Lyapunov function, and orbitally-stabilizing control design.

1008.3760 2026-06-03 cs.RO cs.SY eess.SY math.OC Version update

Formal-language-theoretic Optimal Path Planning For Accommodation of Amortized Uncertainties and Dynamic Effects

Ishanu Chattopadhyay, Anthony Cascone, Asok Ray

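AI summary: Proposes a globally optimal path planning method based on the theory of quantitative measures of formal languages, modeling uncertainty through probabilistic uncontrollable transitions and using search-free combinatorial optimization to maximize the measure of probabilistic regular languages, so that robot navigation maximizes the probability of reaching the goal while minimizing the probability of hitting an obstacle.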

Comments Submitted for review for possible publication elsewhere; journal reference will be added when available

Details
English abstract

We report a globally-optimal approach to robotic path planning under uncertainty, based on the theory of quantitative measures of formal languages. A significant generalization of the language-measure-theoretic path planning algorithm $\nu^\star$ is presented that explicitly accounts for average dynamic uncertainties and estimation errors in plan execution. The notion of the navigation automaton is generalized to include probabilistic uncontrollable transitions, which account for uncertainties by modeling and planning for probabilistic deviations from the computed policy in the course of execution. The planning problem is solved by casting it in the form of a performance maximization problem for probabilistic finite state automata. In essence we solve the following optimization problem: Compute the navigation policy which maximizes the probability of reaching the goal, while simultaneously minimizing the probability of hitting an obstacle. Key novelties of the proposed approach include the modeling of uncertainties using the concept of uncontrollable transitions, and the solution of the ensuing optimization problem using a highly efficient search-free combinatorial approach to maximize quantitative measures of probabilistic regular languages. Applicability of the algorithm in various models of robot navigation has been shown with experimental validation on a two-wheeled mobile robotic platform (SEGWAY RMP 200) in a laboratory environment.