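arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.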

2606.03940 2026-06-03 eess.IV cs.CV cs.LG cs.RO (updated version)

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

Dan Jacobellis, Neeraja J. Yadwadkar

Affiliations: Department of Electrical and Computer Engineering; The University of Texas at Austin

AI summary: Proposes the SEAOTTER framework, which combines a sensor-embedded autoencoder with a learnable JPEG transcode; at a 200:1 compression ratio it encodes 7 times faster and decodes 3.5 times faster than AVIF and improves ImageNet top-1 accuracy by 8%, while remaining JPEG-compatible.

Abstract

In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet, limited bandwidth and on-device compute resources prevent full use of this data when it is transmitted via conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve the rate-distortion trade-off, but demand far more resources for encoding, which is impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but add prohibitive decoding cost and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that enables increased accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1 and compared to AVIF, we observe 7 times faster encoding, 3.5 times faster decoding, and +8% ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter.

2606.03994 2026-06-03 cs.CV cs.RO (updated version)

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim, Hyunsoo Cha, Hanbyul Joo

Affiliations: Seoul National University

AI summary: Proposes SimuScene, a compositional 3D reconstruction pipeline that brings physics simulation into shape and layout estimation, using a physics engine to diagnose reconstruction errors and drive corrections, yielding stable, simulation-ready scenes.

Comments Project Page: https://snuvclab.github.io/SimuScene/

Abstract

Reconstructing interactive, simulation-ready 3D scenes from a single image is a critical bottleneck for robotic manipulation. While recent single-image lifters recover plausible per-object shapes, composing them yields scenes that collapse under physical simulation due to interpenetrating, hovering, or sinking objects. Existing physics-aware methods address this strictly as a post-hoc layout correction, leaving the underlying geometric errors unresolved. To address this, we introduce SimuScene, a compositional 3D reconstruction pipeline that puts physics in the loop of shape and layout estimation. Rather than using physics merely for layout cleanup, we utilize the physics engine as a diagnostic measurement tool during the generative process itself. By diagnostically simulating reconstructed objects under gravity, we convert penetration and support failures into quantitative correction signals that drive gravity-axis stretching and amodal shape resampling. This physics-informed feedback loop mitigates accumulated reconstruction errors and produces a stable, simulation-ready compositional 3D scene. Extensive experiments demonstrate state-of-the-art performance on physical stability and geometric alignment benchmarks. We further highlight SimuScene's utility by deploying reconstructed environments in humanoid control and robot-arm manipulation tasks.

2606.03992 2026-06-03 cs.CV cs.RO (updated version)

Exploring Easy Boosts for Lidar Semantic Scene Completion

Tetiana Martyniuk, Jonathan Seele, Alexandre Boulch, Gilles Puy, Renaud Marlet, Raoul de Charette

Affiliations: Inria, France; valeo.ai, France; ETH Zurich, Switzerland; LIGM, CNRS, Univ Gustave Eiffel, ENPC, IP Paris, France

AI summary: Studies "free lunch" strategies that need no complex architectural redesign: adding semantic pseudo-labels and visibility information to the input point cloud markedly improves lidar semantic scene completion, letting older models compete with or even surpass state-of-the-art systems.

Comments Accepted to ICIP 2026

2606.03985 2026-06-03 cs.RO cs.AI cs.CV (updated version)

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin, Yunrui Lian, Sikai Liang, Zhikai Zhang, Yu Guan, Jilong Wang, Wenyao Zhang, Xinqiang Yu, He Wang, Li Yi

Affiliations: Tsinghua University; Galbot Inc.; Shanghai Jiao Tong University; Peking University; Shanghai Qi Zhi Institute

AI summary: Proposes Humanoid-GPT, a GPT-style causal Transformer pretrained on a billion-scale motion corpus for whole-body control; scaling data and model capacity yields zero-shot generalization to unseen motions and tasks.

Comments Accepted at CVPR 2026

2606.03954 2026-06-03 cs.CV cs.LG cs.RO (updated version)

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

Hanjiang Hu, Yiyuan Pan, Jiaxing Li, Xusheng Luo, Alexander Robey, Na Li, Yebin Wang, Changliu Liu

Affiliations: Carnegie Mellon University; Mitsubishi Electric Research Laboratories; Harvard University

AI summary: Proposes the VLESA framework, which monitors human activity from egocentric video and uses a goal-conditioned safety Q-filter trained via GRPO for real-time safety interventions, achieving higher intervention accuracy on the ASIMOV-2.0 benchmark.

Comments 18 pages, 5 tables, 5 figures

Abstract

As AI systems increasingly assist humans in physical tasks, ensuring safety becomes paramount -- physical actions carry immediate and irreversible consequences that digital errors do not. We introduce the Vision-Language Embodied Safety Agent (VLESA), a framework that monitors human activities from egocentric video and triggers real-time safety interventions when dangerous actions are predicted. VLESA addresses intent-dependent safety where identical actions can be safe or dangerous depending on context. A dataset pairing egocentric frames with goal-conditioned safety annotations is introduced, enabling a goal-conditioned safety Q-filter trained via GRPO that evaluates actions with respect to inferred intent without retraining. On top of that, an intent-action prediction agent is proposed to jointly infer goals and predict future actions from video. On the ASIMOV-2.0 benchmark, VLESA achieves higher intervention accuracy at the exact ground-truth frame compared to baselines, while the GRPO-trained Q-filter improves action safety by over 41 percentage points through goal-conditioned constrained decoding. Code is available at https://github.com/HanjiangHu/VLESA.

2606.03949 2026-06-03 cs.RO (updated version)

Preference-Calibrated Human-in-the-Loop Reinforcement Learning for Robotic Manipulation

Zeyi Liu, Guangyao Liu, Yinuo Qu, Yuquan Xue, Bofang Jia, Chunhua Yang, Weihua Gui, Keke Huang, Ziwei Wang

Affiliations: Central South University; Nanyang Technological University; Zhejiang University

AI summary: Proposes the PACT framework, which uses the implicit preference signals induced by interventions for credit reassignment and policy alignment, improving the sample efficiency and performance of human-in-the-loop reinforcement learning.

Comments Submitted to CoRL2026

Abstract

Human-in-the-loop reinforcement learning (HIL-RL) improves sample efficiency in real-robot manipulation through online human intervention. However, successful trajectories may include suboptimal actions that deviate from the desired task-execution path and force human intervention. Existing HIL-RL methods typically apply the consistent credit assignment principle to all transitions, uniformly propagating discounted terminal rewards through suboptimal segments, ignoring the actual contribution of each transition to task success. This overestimates Q-values for critic learning and indirectly misguides actor updates toward suboptimal behavior patterns. To this end, we propose PACT, a Preference-calibrated Actor-Critic Training framework that leverages the implicit preference signals induced by intervention to perform credit reassignment on identified suboptimal segments while directly guiding policy training for unbiased critic-actor learning. Specifically, we first design a progress model that learns from human demonstration and identifies suboptimal segments for credit correction. Then, from the human action and resampled policy action at the intervention state, we build preference pairs to define a counterfactual advantage that penalizes Bellman targets of the identified suboptimal segment, enabling directional credit calibration. Moreover, we directly align the policy with human corrective actions in the bounded mean space, providing an additional signal beyond critic-guided updates. Across five real-robot manipulation tasks, PACT improves the average success rate by 24.5% and achieves 1.3 times faster convergence, thereby improving both RL sample efficiency and performance. Code is available at https://anonymous.4open.science/r/HILRL-A1X-BC05.
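
The uniform credit assignment that the abstract criticizes can be illustrated with a minimal Python sketch. It assumes a sparse reward given only at the final transition of a successful trajectory; the function name and its arguments are hypothetical and are not taken from the PACT code. The sketch covers only the baseline behavior, not PACT's correction.

```python
def discounted_terminal_targets(num_steps, terminal_reward, gamma):
    """Return-to-go at each step when only the last transition is rewarded.

    Hypothetical helper: step t receives terminal_reward * gamma^(T - 1 - t),
    regardless of whether the action at step t was helpful or suboptimal.
    """
    return [
        terminal_reward * gamma ** (num_steps - 1 - t)
        for t in range(num_steps)
    ]


# Three-step successful trajectory with reward 1.0 and discount 0.5.
targets = discounted_terminal_targets(3, 1.0, 0.5)
print(targets)  # [0.25, 0.5, 1.0]
```

Every step gets a positive target here, including any suboptimal segment that forced an intervention, which is the overestimation the abstract says credit reassignment is meant to address.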

2606.03931 2026-06-03 cs.RO cs.SY eess.SY (updated version)

Multi-Robot Bearing-only Pose Estimation via Angle Rigidity

J. Francisco Presenza, Leonardo J. Colombo, Ignacio Mas, Juan I. Giribet

Affiliations: Institute of Engineering Technology and Sciences "Hilario Fernández Long" (CONICET-UBA); Centre for Automation and Robotics (CSIC-UPM); Artificial Intelligence and Robotics Laboratory, Universidad de San Andrés and CONICET

AI summary: Proposes a distributed bearing-only pose estimator that uses body-frame bearings to compute positions and recover orientations; it requires only an angle-rigidity condition and achieves local uniform exponential stability.

2606.03905 2026-06-03 cs.RO (updated version)

Semantic-weighted ICP for LiDAR Odometry: Class-Aware Residual Reweighting for Robust Scan Registration

Vasco Carvalho, Tiago Barros, Urbano J. Nunes

Affiliations: Institute of Robotics and Autonomous Systems, University of Lisbon

AI summary: Proposes a semantic-weighted ICP method that weights residuals by the geometric stability of each semantic class, improving the robustness of LiDAR odometry pose estimation in dynamic and complex environments.

Abstract

LiDAR odometry is a fundamental component of autonomous robotic systems, relying on geometric registration between consecutive point clouds to estimate ego-motion. However, traditional geometric approaches often degrade in dynamic or unstructured environments due to unreliable correspondences caused by moving objects, sparse geometric features, vegetation, and semantically ambiguous structures. Existing works have shown that some of these limitations can be addressed by introducing semantic information from the environment in the registration process. In this work, we build on this and show that not all elements in the environment are equally relevant for registration. Hence, we propose a semantic class-weighted ICP for LiDAR odometry. Instead of strictly filtering out points belonging to specific semantic classes, the proposed approach weights the residuals of points belonging to semantic categories based on their expected geometric stability. This strategy enables informative but potentially unstable structures to contribute to the registration process while mitigating the influence of dynamic objects. The experimental evaluation was conducted on the SemanticKITTI and RELLIS-3D datasets, which include urban, highway, rural, and off-road environments. The empirical results show that the proposed Semantic-weighted ICP improves pose estimation, especially in challenging off-road scenarios where conventional rigid features are scarce. Furthermore, the analysis reveals that the effectiveness of this weighting strategy is highly environment-dependent, influenced by the structural and semantic composition of the scene.
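
The idea of weighting residuals by semantic class, rather than discarding points, can be sketched in Python for a deliberately simplified case. This is not the paper's algorithm: it assumes correspondences are already matched and solves only for a translation, whereas a full ICP iteration also estimates rotation and re-matches points. The function name, class labels, and weight values are hypothetical.

```python
def weighted_translation(source, target, labels, class_weights):
    """Translation t minimising sum_i w_i * ||p_i + t - q_i||^2.

    source, target: matched point lists (p_i, q_i); labels: semantic class of
    each source point; class_weights: hypothetical per-class stability weights.
    The closed-form minimiser is the weighted mean of (q_i - p_i).
    """
    dims = len(source[0])
    total = 0.0
    accum = [0.0] * dims
    for p, q, label in zip(source, target, labels):
        w = class_weights.get(label, 1.0)  # unknown classes keep full weight
        total += w
        for d in range(dims):
            accum[d] += w * (q[d] - p[d])
    if total == 0.0:
        raise ValueError("all correspondences have zero weight")
    return [a / total for a in accum]


source = [(0, 0, 0), (1, 0, 0), (5, 5, 0)]
target = [(1, 0, 0), (2, 0, 0), (9, 5, 0)]  # the "car" point moved on its own
labels = ["building", "building", "car"]

print(weighted_translation(source, target, labels, {}))            # [2.0, 0.0, 0.0]
print(weighted_translation(source, target, labels, {"car": 0.0}))  # [1.0, 0.0, 0.0]
```

With equal weights the moving car biases the estimate; setting its weight to zero reproduces hard filtering, and intermediate values give the soft down-weighting the abstract describes.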

2606.03874 2026-06-03 cs.CV cs.RO (updated version)

DyaPlex: Full-Duplex Speech-Motion Model for Dyadic Interaction

Koki Nagano, Hongyu Liu, Seonwook Park, Tianye Li, Amrita Mazumdar, Christian Jacobsen, Shengze Wang, Michael Stengel, Rajarshi Roy, Ka Chun Cheung, Simon See, Shalini De Mello

Affiliations: NVIDIA; HKUST

AI summary: Proposes DyaPlex, a streaming full-duplex speech-and-motion model that uses a dual-tower Transformer architecture and a unified dyadic token interleaving mechanism for synchronized multimodal interaction, achieving state-of-the-art performance on monadic and dyadic interaction benchmarks.

Comments Project page: https://research.nvidia.com/labs/amri/projects/DyaPlex

Abstract

We present DyaPlex, a streaming, full-duplex speech-and-motion model designed for dyadic interaction. To capture the continuous and reciprocal nature of human communication, this full-duplex capability empowers the agent to simultaneously perceive and generate both speech and physical motion in a streaming fashion. At its core, our method leverages the strong priors of a foundational full-duplex speech model and integrates a novel motion pathway, thereby achieving fully synchronized multi-modal interaction. Specifically, we design a dual-tower Transformer architecture that preserves the zero-shot conversational reasoning of a frozen base speech model while constructing a deeply coupled, streaming motion pathway. By introducing a unified dyadic token interleaving mechanism and guiding cross-attention via a time-aligned speech-motion RoPE, our model effectively aligns autoregressive motions with rich latent speech features. Trained on the 4,000-hour Seamless Interaction dataset, our model effectively captures cross-speaker dependencies and establishes new state-of-the-art performance across both monadic and dyadic human interaction benchmarks.

2606.03847 2026-06-03 cs.RO (updated version)

Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies

Xiangdong Feng, Yuxuan Cheng, Chen Shi, Boyao Han, Yuxuan Yan, Yitong Hong, Zhuotao Tian, Li Jiang

Affiliations: Beijing Institute of Technology; The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute; Hunan University; Xi’an Jiaotong University; Renmin University of China; Harbin Institute of Technology, Shenzhen

AI summary: Addresses the fixed execution horizon in flow-based robot policies with DVAC, which uses the variance of clean-action estimates during denoising to adaptively set the execution horizon, lowering replanning frequency while maintaining or improving task success.

Abstract

Action chunking has become a common inference strategy for flow-based robot policies, improving action coherence by modeling multi-step temporal dependencies in demonstrations. However, the execution horizon is still typically set as an empirical fixed value, overlooking that predictable free-space motions and precision-critical interaction phases often require different replanning frequencies. In this work, we first show that the denoising process of flow-based policies contains an intrinsic signal of task phases: clean-action estimates remain stable during predictable motion phases, but fluctuate more strongly around contact-rich or precision-sensitive operations. Motivated by this observation, we propose DVAC (Denoising-Variance Adaptive Chunking), a test-time method that adaptively determines how many actions to execute from each predicted chunk. DVAC measures the variance of clean-action estimates over the final denoising steps, executes the stable low-variance prefix, and replans before high-variance future actions are committed. To transfer across tasks and rollouts, DVAC further calibrates the threshold with a rolling estimate of the local variance scale. Experiments on LIBERO, RoboTwin, CALVIN, and real-world manipulation show that DVAC improves task success while reducing replanning frequency. With a $\pi_{0.5}$-based policy, DVAC improves LIBERO success from 94.75% to 98.00% and reduces replanning by 43.0%, while also yielding aggregate gains on RoboTwin and CALVIN and improving real-world execution efficiency.
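
The prefix-selection step described in the abstract can be sketched roughly in Python. This is an illustration under stated assumptions, not the authors' implementation: actions are treated as scalars, the variance is a population variance across the final denoising steps, and the rolling calibration is reduced to a fixed multiple of the recent mean variance. All names and the `scale` default are hypothetical.

```python
from statistics import pvariance


def stable_prefix_length(estimates, threshold):
    """Count the leading actions whose clean-action estimates stay stable.

    estimates[k][h] is the clean-action estimate for action h of the chunk at
    the k-th of the final denoising steps (hypothetical scalar layout).
    """
    horizon = len(estimates[0])
    prefix = 0
    for h in range(horizon):
        variance = pvariance([step[h] for step in estimates])
        if variance > threshold:
            break  # first unstable action: replan before committing to it
        prefix += 1
    # Execute at least one action so the rollout always makes progress.
    return max(prefix, 1)


def calibrated_threshold(recent_variances, scale=2.0):
    """Hypothetical calibration: a multiple of the recent mean variance."""
    return scale * sum(recent_variances) / len(recent_variances)


# Three final denoising steps, chunk of four actions.
estimates = [
    [0.10, 0.20, 0.50, 0.90],
    [0.11, 0.21, 0.10, 0.10],
    [0.09, 0.19, 0.90, 0.50],
]
print(stable_prefix_length(estimates, threshold=0.01))  # 2
```

In this toy chunk the first two actions barely change across denoising steps, so they are executed, and the policy would replan before the third action, whose estimate still fluctuates.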

2606.03834 2026-06-03 cs.RO (updated version)

Let the Dynamics Flow: Stable Flow Matching Dynamical Systems

Rodrigo Pérez-Dattari, Francisco Leiva, Andrea Testa, Leonel Rozo, Javier Ruiz del Solar, Noémie Jaquier

Affiliations: Department of Robotics, Perception, and Learning, KTH Royal Institute of Technology; Advanced Mining Technology Center (AMTC) and Department of Electrical Engineering, Universidad de Chile; Bosch Center for Artificial Intelligence, Renningen, Germany; Italian Institute of Artificial Intelligence (AI4I), Turin, Italy

AI summary: Proposes the Stable Flow Matching Dynamical Systems (SFMDS) framework, which parametrizes dynamical systems via flow matching and imposes Lyapunov stability constraints, enabling stable, scalable, multimodal robot motion generation.

Abstract

Flow matching has recently emerged as a powerful approach for imitation learning, enabling scalable, expressive, and multimodal motion policies. However, incorporating formal stability guarantees into these generative models, a prerequisite to ensure safe and generalizable robot behaviors, remains a significant challenge. While modeling robot motions as dynamical systems allows for such stability-based inductive biases, existing frameworks struggle to capture the rich action distributions inherent in complex robotic tasks. This paper introduces Stable Flow Matching Dynamical Systems (SFMDS), a novel framework that bridges the gap between high-capacity generative modeling and formal Lyapunov stability guarantees. SFMDS parametrizes dynamical systems via flow matching while simultaneously constraining the model to a family of stable solutions. We propose two variants: a soft constraint based on a penalty term, and a hard structural constraint embedded directly in the model architecture. We further extend both formulations to Lie groups. Experiments on benchmark datasets, in simulation, and on a humanoid robot show that SFMDS learns stable, scalable, and multimodal dynamical systems in low- and high-dimensional state spaces, enabling safe and expressive robot motion generation.

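GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Vision-Language Navigation

Xinhai Li, Xiaotao Zhang, Yuehao Huang, Jiankun Dong, Tianhang Wang, Sunyao Zhou, Yunzi Wu, Chengnuo Sun, Yunfei Ge, Qizhen Weng, Chi Zhang, Chenjia Bai, Xuelong Li

AI summary: Proposes GN0, a unified framework that combines an automatically generated large-scale navigation dataset (GN-Matrix), a high-fidelity 3DGS-based simulation platform, and the BEV benchmark GN-Bench with an RL-driven navigation foundation model (BAE), surpassing existing methods on VLN tasks.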

Abstract

Embodied navigation connects intelligent agents with the physical world and is fundamental for general robotic intelligence. Limited availability and quality of navigation data have constrained Vision-and-Language Navigation (VLN) systems' generalization and long-horizon capabilities. To address this, we curate diverse 3D scenes and develop an automated pipeline for large-scale navigation data, resulting in the GN-Matrix dataset. Building on a 3D Gaussian Splatting (3DGS) engine, we introduce a high-fidelity simulation platform supporting interactive roaming and collision-aware navigation. We further propose GN-Bench, the first BEV-based benchmark incorporating dynamic 3DGS avatars for human-robot interaction evaluation. To leverage the simulator, we develop an RL-driven navigation foundation model, Break and Establish (BAE). After supervised learning, DAgger exposes the model to rollout-induced states, breaking narrow expert-centric distributions and enabling downstream RL exploration. This unified VLN paradigm integrates map-based and map-free tasks, including instruction following, human following, and goal navigation. GN-BAE formalizes high-fidelity 3DGS-rendered Bird's Eye View representations as compact memory, unlocking latent spatial reasoning in VLMs. Extensive evaluations on GN-Bench and VLN-CE show that GN0 outperforms state-of-the-art VLN methods. Overall, GN-Matrix offers a unified framework spanning data, simulation, and learning, advancing embodied navigation in research and industrial applications.

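2606.03593 2026-06-03 cs.SE cs.RO (updated version)

NVIDIA Isaac Sim: Enabling Scalable GPU-Accelerated Robotics Simulation

Sicong Gao, Maurice Pagnucco, Tomasz Bednarz, Yang Song

Affiliations: School of Computer Science and Engineering, The University of New South Wales; NVIDIA USA

AI summary: A systematic survey of NVIDIA Isaac Sim's architecture, usage patterns, and limitations, focusing on the advantages of its GPU acceleration for large-scale parallel training, synthetic data generation, and physics-accurate modeling, and discussing future directions.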

Abstract

Simulation has become a core infrastructure for robotics research. Unlike previous simulators, NVIDIA Isaac Sim leverages GPU acceleration to enable large-scale parallel training and physics-accurate modeling. Its synthetic data generation pipeline alleviates the scarcity of high-quality training data, supporting data-driven robot learning and large-scale simulation-centric experimentation. However, existing surveys often treat it as one simulator among many, without a systematic analysis of its architectural characteristics, usage patterns, and limitations. This survey reviews Isaac Sim from system and application perspectives, outlining its architecture and comparing it with widely used simulators. We analyze representative studies across five major domains and summarize common usage patterns, particularly in data generation and high-fidelity simulation. We also outline key future directions and challenges, including physics open-world learning, simulation-centric training and practical usability constraints.

2606.03545 2026-06-03 cs.RO (updated version)

Static and Dynamic Representations for Tactile Contact-Angle Estimation with Event-Based Sensors

Yanhui Lu, Efi Psomopoulou, Benjamin Ward-Cherrier

Affiliations: School of Engineering Mathematics and Technology, University of Bristol

AI summary: Using event streams from an event-based tactile sensor (NeuroTac), compares three event-derived spatial contour representations (dynamic, static, and combined) for contact-angle estimation and shows their potential for high-frequency, low-latency tactile angle estimation in robotic manipulation.

Comments 8 pages, 8 figures. Submitted to IEEE Robotics and Automation Letters (RAL), under review

Abstract

Event-based tactile sensing offers low-latency signal acquisition for contact-rich robotic interaction. This paper investigates contact-angle estimation using event streams from an event-based tactile sensor (NeuroTac) and compares three event-derived spatial contour representations: a dynamic representation capturing recent event activity, a static representation recovering a more persistent contact state, and their combined representation. Across the evaluated motion scenarios, all representation pipelines exhibited P99 processing latency below 10 ms at all tested sampling intervals, demonstrating their potential for high-frequency event-based tactile angle estimation in robotic manipulation. The static representation consistently achieved marginally better performance than the dynamic and combined representations under scenario-specific training, yielding a mean overall MAE of 0.160° during continuous sensor rolling and a stop-phase mean MAE of 0.251° during randomly inserted motion interruptions. It also exhibited smaller performance fluctuations across speed and indentation depth variations than the other two representations.

2606.03536 2026-06-03 cs.RO (updated version)

Bionic Human-Motion Style Transfer for Physically Executable Whole-Body Control of Humanoid Robots

Tianchen Huang, Mingkuan Zhao, Yang Gao, Feiyang Yuan, Junchi Gu, Xiaohu Zhang, Dongdong Zhao, Shi Yan, Yu Wang, Wei Gao, Shiwu Zhang

Affiliations: Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China; School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University; School of Information Science and Engineering, Lanzhou University

AI summary: Proposes a bionic generation-to-control framework that uses a physics-aware multi-condition latent diffusion model and a preview-based whole-body tracking policy to transfer short human style exemplars onto different motion contents, producing executable, expressive whole-body humanoid motion.

Comments Project page: https://huangtc233.github.io/bionic-style-transfer/

Abstract

Expressive whole-body motion is important for humanoid robots operating in human environments, where robots are expected to move stably while presenting readable and adjustable body behaviors. However, most expressive motions are still obtained from fixed demonstrations or manually designed scripts, making it difficult to reuse a demonstrated style across different motion contents. Inspired by the way human motion styles convey affective and intentional cues through gait rhythm, posture, arm swing and body sway, this paper proposes a bionic generation-to-control framework for exemplar-driven style transfer on humanoid robots. Given a short human style exemplar and a target content motion, the proposed framework generates a stylized whole-body reference that preserves the intended motion content while transferring the demonstrated style. A physics-aware multi-condition latent diffusion model is developed to fuse style, content and trajectory conditions, and classifier-free guidance is used to adjust the style intensity without retraining. To improve hardware executability, contact-consistency and temporal-smoothness regularization are imposed on decoded motions during training. The generated references are then converted into G1-compatible robot references and executed by a preview-based whole-body tracking policy trained with a cluster-and-distill strategy. Simulation and Unitree G1 experiments show that the proposed method can transfer short human style exemplars to diverse robot motion contents, reduce contact and jitter artifacts compared with animation-oriented style-transfer baselines, and achieve a 96.0% success rate over 125 reported real-robot trials. The results demonstrate the feasibility of using short human motion exemplars as reusable bionic sources for physically executable expressive humanoid motion.
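
The abstract's use of classifier-free guidance to adjust style intensity without retraining can be illustrated with the standard single-condition guidance rule. The sketch below assumes that form; the paper fuses several conditions (style, content, trajectory) and its exact combination is not given here, so the function name and the single-condition setup are assumptions.

```python
def classifier_free_guidance(uncond, cond, scale):
    """Standard guidance blend: uncond + scale * (cond - uncond), element-wise.

    uncond: model prediction without the style condition;
    cond: prediction with the style condition;
    scale: guidance strength chosen at inference time (no retraining needed).
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]
cond = [1.0, 3.0]
print(classifier_free_guidance(uncond, cond, 0.0))  # [0.0, 1.0]
print(classifier_free_guidance(uncond, cond, 1.0))  # [1.0, 3.0]
print(classifier_free_guidance(uncond, cond, 2.0))  # [2.0, 5.0]
```

A scale of 0 ignores the style condition, 1 reproduces the conditioned prediction, and larger values extrapolate past it, which is how a single trained model can expose a continuous style-intensity knob.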

2606.03512 2026-06-03 cs.RO cs.AI (updated version)

SPADE: Sketch-guided Path Planning Augmented with Diffusion Experts

Charbel Abi Hana, Tatiana Ghantous, Mikael Khalil, Anthony Rizk

Affiliations: IDEALworks GmbH; IMT Atlantique; IDEALworks GmbH & Saint Joseph University of Beirut

AI summary: Proposes a diffusion-augmented framework that, through an improved annotation tool and training strategy, improves the generalization and robustness of path planning while staying real-time, markedly lowering pose error and FID.

DOI: 10.65109/RIHP6974

Abstract

Path planning is essential for Autonomous Mobile Robots (AMRs). Conventional methods for incorporating human preferences into planning typically rely on either complex reward engineering or hardware-intensive solutions. Recent state-of-the-art frameworks leverage imitation learning to train behavior-specific path planning models from expert demonstrations. However, these approaches face two key limitations: limited generalization to unseen environments and low robustness in demonstration collection. To address these challenges, this work introduces an enhanced framework that focuses on two main contributions: an overhauled annotation tool built on ROS 2, and a novel training strategy that integrates diffusion-based augmentation into baseline behavioral cloning models. A dataset of expert demonstrations is provided and evaluated through ablation studies to assess the robustness of the proposed solution. The enhanced approach outperforms state-of-the-art methods with 39.1% lower Absolute Pose Error (APE) and 33.5% lower Fréchet Inception Distance (FID) while having 93.8% fewer trainable parameters. Moreover, it attains diffusion-level generalization while preserving the real-time, on-edge properties of state-of-the-art models.

2606.03476 2026-06-03 cs.RO 版本更新

Human2Humanoid: Physics-Aware Cross-Morphology Motion Retargeting for Humanoid Robots

Human2Humanoid: 面向人形机器人的物理感知跨形态运动重定向

Tianchen Huang, Feiyang Yuan, Junchi Gu, Shurui Fang, Xiaohu Zhang, Yu Wang, Wei Gao, Shiwu Zhang

发表机构 * Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China（人形机器人研究院，精密机械与精密仪器系，中国科学技术大学）

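AI summary: Proposes Human2Humanoid, an unsupervised motion retargeting framework that uses CycleGAN and a skeleton-aware graph convolutional network to handle unpaired data, and achieves high-fidelity retargeting from human motion to humanoid robots through a morphology-invariant end-effector consistency loss and physics-aware feasibility constraints.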

Comments Project page: https://huangtc233.github.io/human2humanoid_website/

AI中文摘要

将人体运动重定向到人形机器人对于远程操作、模仿学习和人机交互至关重要。然而，由于人类与机器人在骨骼拓扑、肢体比例和自由度等方面的显著形态差异，以及配对运动数据的稀缺性，这仍然具有挑战性。本文提出了Human2Humanoid，一种无监督运动重定向框架，能够将人体运动高保真地迁移到人形机器人行为。为了在未配对数据下弥合领域差距，我们采用基于CycleGAN的架构，配备骨架感知图卷积网络来捕获拓扑相关的运动特征。为了解决跨域尺度不匹配问题，我们引入了一种形态不变的末端执行器一致性损失，该损失对齐归一化的末端执行器轨迹，以保留跨实体的运动语义。为了提高物理合理性并减少接触伪影，我们施加了显式的物理感知可行性约束，以鼓励再现源运动中的接触模式。实验结果表明，所提出的方法成功地将人体运动重定向到Unitree G1人形机器人，无需配对数据，并且在下游可控性和物理可行性方面均优于现有方法。

英文摘要

Retargeting human motion to humanoid robots is critical for teleoperation, imitation learning and human-robot interaction. However, it remains challenging because of substantial morphological discrepancies between humans and robots, including differences in skeletal topology, limb proportions and degrees of freedom, as well as the scarcity of paired motion data. This paper presents Human2Humanoid, an unsupervised motion retargeting framework that transfers human motions to humanoid robot behaviors with high fidelity. To bridge the domain gap under unpaired data, we adopt a CycleGAN-based architecture equipped with a skeleton-aware graph convolutional network to capture topology-dependent motion features. To address cross-domain scale mismatches, we introduce a morphology-invariant end-effector consistency loss that aligns normalized end-effector trajectories to preserve motion semantics across embodiments. To improve physical plausibility and reduce contact artifacts, we impose explicit physics-aware feasibility constraints to encourage reproduction of the contact patterns in the source motion. Experimental results show that the proposed method successfully retargets human motion to the Unitree G1 humanoid robot without paired data, and outperforms existing methods in both downstream controllability and physical feasibility.

2606.03421 2026-06-03 cs.RO 版本更新

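Grasp-Then-Plan with Failure Attribution: A Closed-Loop Two-Stage Framework for Precise and Generalizable Robotic Manipulation

先抓取后规划与失败归因：一种用于精确且可泛化机器人操作的闭环两阶段框架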

Jiahao Xu, Peiyuan Wang, Hanzhuo Zhang, Zihao Yu, Tianyu Fu, Hao Chen, Xuanhao Xiang, Jianbo Yu, Chenchen Fu, Wanyuan Wang

发表机构 * School of Computer Science and Engineering, Southeast University, China（东南大学计算机科学与工程学院）

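AI summary: Proposes the GTP-FA framework, which uses a task-oriented two-stage grasp-then-plan pipeline and a failure attribution model; it injects task priors and risk penalties into the grasping module and, on the planning side, collects data and fine-tunes on high-risk initial states, substantially raising the success rate of robotic manipulation tasks.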

Comments 32 pages, project page: https://sites.google.com/view/gtp-fa/

AI中文摘要

在机器人操作中，抓取与运动规划之间的紧密耦合常常掩盖失败的真实原因，导致低效的试错过程。为了实现高效的长时域操作，我们提出了GTP-FA（先抓取后规划与失败归因），一种面向任务的两阶段抓取-规划框架，该框架生成抓取候选并根据所选抓取执行下游运动规划。给定失败的操作轨迹，我们学习一个失败归因模型，该模型可泛化到未见过的抓取，并生成失败模式的稳定分布以进行诊断引导的优化。基于这些归因结果，我们以诊断驱动的方式优化两个模块：在抓取侧，我们将任务级先验和风险惩罚注入抓取候选评分和优化中，以抑制不稳定或与任务不兼容的抓取；在规划侧，我们通过数据收集和微调针对高风险初始状态，以解决真正的规划瓶颈。我们在仿真和真实机器人实验中评估了所提出的框架，并表明GTP-FA在基于RL、IL、扩散策略和VLA的设置中提升了相应的基础学习器，实现了显著更高的总体任务成功率。

英文摘要

In robotic manipulation, the tight coupling between grasping and motion planning often obscures the true source of failure, leading to inefficient trial-and-error. To enable efficient long-horizon manipulation, we propose GTP-FA (Grasp-Then-Plan with Failure Attribution), a task-oriented two-stage grasp-then-plan framework that generates grasp candidates and performs downstream motion planning conditioned on the selected grasp. Given a failed manipulation trajectory, we learn a failure attribution model that generalizes to unseen grasps and produces a stable distribution over failure modes for diagnosis-guided optimization. Based on these attribution results, we then optimize both modules in a diagnosis-driven manner: on the grasping side, we inject task-level priors and risk penalties into grasp candidate scoring and optimization to suppress unstable or task-incompatible grasps; on the planning side, we target high-risk initial states through data collection and fine-tuning to address genuine planning bottlenecks. We evaluate the proposed framework in both simulation and real-robot experiments, and show that GTP-FA improves the corresponding base learners across RL, IL, diffusion-policy, and VLA-based settings, achieving substantially higher overall task success rates.

2606.03374 2026-06-03 cs.RO 版本更新

eMEM: A Hybrid Spatio-Temporal Memory System For Embodied Agents

eMEM：一种面向具身智能体的混合时空记忆系统

A. Haroon Rasheed, Maria Kabtoul

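AI summary: Proposes eMEM, a hybrid graph memory system that gives embodied agents efficient memory retrieval over space, time, and semantics through a multi-index architecture and a tiered consolidation pipeline, and reaches a weighted mean score of 80.8 on the eMEM-Bench v1 benchmark built over ProcTHOR-10K scenes.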

AI中文摘要

我们提出eMEM（具身记忆），一种基于混合图的记忆系统，用于在物理环境中运行的具身智能体。当前的智能体记忆架构，如Generative Agents、MemGPT和A-MEM，将记忆视为文本流或知识图谱，但具身智能体需要同时能够按意义、空间和时间进行搜索的记忆。eMEM通过一个统一在单一图模型背后的多索引架构（用于结构化存储的SQLite、用于近似最近邻语义搜索的hnswlib以及用于空间查询的R-tree）填补了这一空白。一个分层整合管道将原始感知观察转化为压缩摘要，模仿生物系统中海马体-新皮层的整合。十个面向智能体的回忆工具暴露了记忆检索原语，包括概念到位置的解析和跨层回忆，作为LLM工具调用的第一类操作。该系统完全嵌入式，与智能体在同一进程中运行。此外，我们引入了eMEM-Bench v1，这是一个我们在ProcTHOR-10K场景上构建的用于具身记忆评估的基准。该基准明确围绕八个认知心理学范式（DRM诱饵、模式分离、模式完成、源监控、上下文依赖检索、长时程干扰、序列位置和增强保留曲线）组织，每个范式都经过选择，使得结果能够对照人类和先前智能体记忆系统的更广泛记忆系统文献进行解释；这是像LoCoMo或OpenEQA这样的表面任务基准无法提供的诊断水平。eMEM在988个探针上获得80.8加权平均分，在模拟延迟从1小时到1年的房间独特项目上保持平稳的保留曲线。我们表明，纯RAG基线（flat_rag消融）在上下文依赖检索上损失30分，在DRM诱饵拒绝上损失29分，分别隔离了多层存储和整合的贡献。我们发布了系统和基准代码。

英文摘要

We present eMEM (Embodied Memory), a hybrid graph-based memory system for embodied agents operating in physical environments. Current agent memory architectures, such as Generative Agents, MemGPT, and A-MEM, treat memory as text streams or knowledge graphs, but embodied agents require memory that is simultaneously searchable by meaning, space, and time. eMEM fills this gap with a multi-index architecture (SQLite for structured storage, hnswlib for approximate nearest neighbour semantic search, and an R-tree for spatial queries) unified behind a single graph model. A tiered consolidation pipeline transforms raw perceptual observations into compressed summaries, mirroring hippocampal-neocortical consolidation in biological systems. Ten agent-facing recall tools expose memory retrieval primitives, including concept-to-location resolution and cross-layer recall, as first-class operations for LLM tool calling. The system is fully embedded and runs in-process alongside the agent. In addition, we introduce eMEM-Bench v1, a benchmark we construct over ProcTHOR-10K scenes for embodied memory evaluation. The benchmark is organised explicitly around eight cognitive-psychology paradigms (DRM lures, pattern separation, pattern completion, source monitoring, context-dependent retrieval, long-horizon interference, serial position, and a foil-augmented retention curve), each chosen so that the result is interpretable against the broader memory-systems literature in humans and prior agent-memory systems; this is a level of diagnosis that surface-task benchmarks like LoCoMo or OpenEQA cannot provide. eMEM scores a weighted mean of 80.8 over 988 probes, with a flat retention curve at ceiling from 1 h to 1 yr of simulated delay on room-unique items. We show that a pure RAG baseline (the flat_rag ablation) loses 30 points on context-dependent retrieval and 29 points on DRM lure rejection, isolating the contribution of multi-layer storage and consolidation respectively. We release both the system and the benchmark code.
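
To make the idea of memory that is searchable by space and time more concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not the eMEM implementation: the table layout, the `build_memory` and `recall` helpers, and the sample rows are all hypothetical, a plain bounding-box `WHERE` clause stands in for the R-tree, and the semantic (hnswlib) index is omitted entirely.

```python
import sqlite3

def build_memory():
    """Create an in-memory table of observations tagged with position and time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations ("
        "id INTEGER PRIMARY KEY, label TEXT, x REAL, y REAL, t REAL)"
    )
    # Hypothetical observations: (label, x position, y position, timestamp).
    rows = [
        ("mug on desk", 1.0, 2.0, 10.0),
        ("keys on shelf", 4.0, 1.0, 20.0),
        ("mug in sink", 6.0, 5.0, 30.0),
    ]
    conn.executemany(
        "INSERT INTO observations (label, x, y, t) VALUES (?, ?, ?, ?)", rows
    )
    return conn

def recall(conn, x_range, y_range, t_range):
    """Return labels observed inside a bounding box and time window, oldest first."""
    cursor = conn.execute(
        "SELECT label FROM observations "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND t BETWEEN ? AND ? "
        "ORDER BY t",
        (*x_range, *y_range, *t_range),
    )
    return [row[0] for row in cursor.fetchall()]

memory = build_memory()
# Everything seen in the box x in [0, 5], y in [0, 3] between t = 5 and t = 25.
nearby_recent = recall(memory, (0.0, 5.0), (0.0, 3.0), (5.0, 25.0))
```

A real multi-index design would presumably route the spatial part of such a query through a dedicated spatial index and intersect the result with semantic matches; the single table here only shows the shape of a combined space-and-time lookup.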

2606.03340 2026-06-03 cs.RO 版本更新

Autonomous Navigation System for Library Service Robot Based on Unitree Go2 Edu

基于 Unitree Go2 Edu 的图书馆服务机器人自主导航系统

Aoduo Li, Haoran Lv, Bingquan Ou, Jianfeng Li, Yingdong Li, Zimeng Li

发表机构 * Unitree Go2 Edu

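AI summary: For library environments with narrow aisles and dynamic obstacles, proposes a ROS 2 navigation system for a quadruped robot that combines RTAB-Map, AMCL/EKF, and Nav2 to achieve high-success-rate localization and obstacle avoidance, with a mean map error of 3.7 cm.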

Comments 6 pages, 5 figures, 4 tables. Accepted by WCCIS 2026

AI中文摘要

图书馆需要自主机器人在狭窄通道中安静移动，同时确保读者、椅子、包和手推车周围的安全。本文提出了一套基于 Unitree Go2 Edu 四足机器人的 ROS 2 导航系统，该机器人配备了 4D LiDAR、前置深度相机和 IMU。我们并未假设图书馆是粗糙地形，而是针对实际部署中遇到的移动性不连续问题，包括地面过渡、临时杂乱和部分堵塞通道（低底盘轮式平台在此类场景中适应性较差）。采用 RTAB-Map 进行视觉-LiDAR SLAM，基于 AMCL 和 EKF 的传感器融合实现定位，以及基于 A* 和 DWA 的 Nav2 栈支持路径规划和局部避障。在真实图书馆中，该系统在静态、低密度动态和高密度动态场景下的成功率分别为 100%、96% 和 88%，而针对测量控制距离的地图验证显示平均度量误差为 3.7 cm。

英文摘要

Libraries require autonomous robots to move quietly through narrow aisles while remaining safe around readers, chairs, bags, and carts. This paper presents a ROS 2 navigation system for a Unitree Go2 Edu quadruped equipped with a 4D LiDAR, a front depth camera, and an IMU. Rather than assuming the library is rough terrain, we target the practical mobility discontinuities of real deployments, including floor transitions, temporary clutter, and partially blocked passages where low-clearance wheeled platforms are less tolerant. RTAB-Map is used for visual-LiDAR SLAM, AMCL and EKF-based sensor fusion provide localization, and a Nav2 stack with A* and DWA supports planning and local avoidance. In a real library, the system achieves 100%, 96%, and 88% success rates in static, low-density dynamic, and high-density dynamic scenes, while map validation against surveyed control distances yields a mean metric error of 3.7 cm.

2606.03335 2026-06-03 cs.RO 版本更新

GPU-Parallel Multi-Task Reinforcement Learning with Demonstration Guided Policy Optimization

GPU并行多任务强化学习与演示引导的策略优化

Rui Zhang, Qiwei Wu, Zhengyu Zhang, Tao Li, Yunrong Guo, Junjie Lai, Renjing Xu, Weihua Zhang

发表机构 * NVIDIA ； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

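AI summary: Proposes a construction methodology for turning structured manipulation task families into GPU-parallel multi-task reinforcement learning benchmarks, instantiated as MT-Libero, and designs the demonstration-guided policy optimization algorithm DGPO, which combines importance-weighted PPO with adaptive behavior cloning to train efficiently in parallel on heterogeneous task suites.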

AI中文摘要

大规模GPU并行强化学习已经改变了机器人仿真中可训练的内容，但大多数系统仍为每个任务优化一个专家策略。我们提出了一种构建方法，将结构化操作任务族转化为GPU并行多任务强化学习基准，并在Isaac Lab中使用LIBERO资产和任务谓词实例化为MT-Libero。该基准支持在异构任务套件上同时进行强化学习，具有并行渲染、物理随机化以及状态输入或视觉输入策略。为了使这种训练在稀疏成功信号和有限先验数据下变得实用，我们进一步提出了DGPO，一种同策略（on-policy）演示引导方法，它将重要性加权PPO与对匹配演示动作的自适应行为克隆相结合。DGPO实现了对演示任务分布的可调偏好，在保持同策略PPO的稳定性和在线改进优势的同时，优于无先验强化学习和现有的基于演示的方法。

英文摘要

Large-scale GPU-parallel reinforcement learning has changed what can be trained in robot simulation, yet most systems still optimize one specialist policy per task. We propose a construction methodology for turning structured manipulation task families into GPU-parallel multi-task RL benchmarks, and instantiate it as MT-Libero using LIBERO assets and task predicates in Isaac Lab. The resulting benchmark supports simultaneous reinforcement learning over heterogeneous task suites with parallel rendering, physics randomization, and state-input or visual-input policies. To make such training practical under sparse success signals and limited prior data, we further propose DGPO, an on-policy demonstration-guided method that combines importance-weighted PPO with adaptive behavior cloning on matched demonstration actions. DGPO enables a tunable preference toward demonstrated task distributions, outperforming both prior-free RL and existing demonstration-based methods while preserving the stability and online improvement benefits of on-policy PPO.

2606.03312 2026-06-03 cs.RO cs.AI 版本更新

RobotValues: Evaluating Household Robots When Human Values Conflict

RobotValues: 当人类价值观冲突时评估家用机器人

Jongwook Han, Hyeongjin Kim, Yohan Jo

发表机构 * Graduate School of Data Science, Seoul National University（首尔国立大学数据科学研究生院）

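AI summary: Proposes the RobotValues benchmark, which evaluates household robot planners on 10K value-conflict scenarios; it finds that vision-language models have default value preferences that are hard to override, indicating that evaluation should consider action selection under value conflict.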

AI中文摘要

虽然家用机器人通常基于任务完成度进行评估，但日常家庭环境涉及价值冲突情境，其中机器人应选择优先考虑其他价值观（如人类自主性、效率或社会适宜性）而非任务成功的行动。然而，目前尚无评估机器人在此类场景中价值偏好的基准。我们引入RobotValues，一个在10K个价值冲突场景中评估家用机器人规划器的基准。每个实例包含一个逼真的家庭图像和多个优先考虑不同人类价值观的合理机器人动作。我们通过LLM辅助场景生成、利益相关者基于价值观提取、图像生成和自动质量控制构建RobotValues。使用RobotValues评估机器人领域使用的视觉语言模型，发现模型表现出默认价值偏好，包括安全性和适应性，而低估了隐私优先的行动。当模型被指示优先考虑与其自身偏好冲突的特定价值观时，它们通常无法覆盖默认行动，80%的时间选择了错误行动。这些发现表明，家用机器人评估不仅应衡量任务完成度或安全性合规性，还应衡量当人类价值观冲突时机器人是否能在合理行动中做出选择。

英文摘要

While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchmarks for evaluating robots' value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.

2606.03297 2026-06-03 cs.RO 版本更新

SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation

SplitAdapter: 通过因子化自适应的负载感知人形机器人移动操作

Jeonguk Kang, Hanbyel Cho, Sanghyun Kang, Donghan Koo

发表机构 * Future Robot AI Group, Samsung Electronics（三星电子未来机器人人工智能组）

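AI summary: To address the interaction of load variation and dynamics mismatch when humanoid robots perform loco-manipulation under different loads and heights, proposes SplitAdapter, which freezes a pretrained policy and extends it with load- and dynamics-aware encoders, combined with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation, substantially raising task success under heavy-load conditions.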

AI中文摘要

人形机器人的移动操作需要在不同物体质量和拾取/放置高度下实现稳定的全身控制。在仿真到现实的迁移中，物体引起的负载变化和机器人侧的动力学不匹配在物理接触期间相互作用，这尤其具有挑战性。现有的基于历史的自适应方法通常将这些因素压缩到单个潜在表示中，这可能在重载操作下削弱鲁棒性。我们提出SplitAdapter（通过因子化自适应的负载感知人形机器人移动操作），该方法冻结预训练的箱子操作策略，并通过使用分割世界模型目标、基于GRL的交叉对抗正则化和分层特征线性调制（FiLM）训练的物体/负载和动力学感知上下文编码器进行扩展。在仿真到仿真实验和实际部署中，SplitAdapter在物体质量为2、4和6千克以及拾取/放置高度为0、30和60厘米的情况下，相对于基础策略和世界模型FiLM基线提高了完整任务成功率，其中在重载条件下改进最大。

英文摘要

Humanoid loco-manipulation requires stable whole-body control under varying object masses and pickup/placement heights. This becomes particularly challenging in sim-to-real transfer, where object-induced load variation and robot-side dynamics mismatch interact during physical contact. Existing history-based adapters often compress these factors into a single latent representation, which can weaken robustness under heavy-load manipulation. We propose SplitAdapter (Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation), which freezes a pretrained box manipulation policy and extends it with object/load and dynamics-aware context encoders trained with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation (FiLM). In sim-to-sim experiments and real-world deployment, SplitAdapter improves full-task success over the base policy and world-model FiLM baselines across object masses of 2, 4, and 6 kg and pickup/placement heights of 0, 30, and 60 cm, with the largest improvements under heavy-load conditions.
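
For readers unfamiliar with Feature-wise Linear Modulation, the basic operation is a per-channel scale and shift of a feature vector. The sketch below shows only that core operation in plain Python; the function name `film` and all numeric values are illustrative assumptions, and the hierarchical arrangement and the context encoders described in the abstract are not reproduced here.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature channel."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma, and beta must have the same length")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

# Hypothetical values. In a learned system, gamma and beta would be predicted
# from context (for example, a load or dynamics embedding), not hard-coded.
policy_features = [1.0, -2.0, 0.5]
gamma = [2.0, 1.0, 0.0]   # per-channel scale
beta = [0.0, 0.5, 1.0]    # per-channel shift
modulated = film(policy_features, gamma, beta)
```

Because the modulation only rescales and shifts existing features, it is a natural way to condition a frozen policy on extra context without changing the policy's own weights.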

2606.03296 2026-06-03 cs.RO 版本更新

Bridging Predictive Uncertainty and Safe Action: Sample-Conditioned Differentiable Planning for Autonomous Driving

桥接预测不确定性与安全行动：面向自动驾驶的样本条件可微分规划

Chengzhen Meng, Pei Liu, Zhiyu Huang, Chen Lv, Jun Ma

发表机构 * Robotics and Autonomous Systems Thrust, The Hong Kong University of Science and Technology（香港科技大学机器人与自主系统学域）； Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology（香港科技大学电子与计算机工程系）； Department of Civil and Environmental Engineering, University of California, Los Angeles（加州大学洛杉矶分校土木与环境工程系）； School of Mechanical and Aerospace Engineering, Nanyang Technological University（南洋理工大学机械与航空航天工程学院）

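AI summary: Proposes a sample-conditioned differentiable planning framework in which a diffusion model generates diverse future scenarios that are fed directly into a differentiable planner; a Conditional Value-at-Risk constraint mitigates predictive uncertainty, yielding safe, efficient, and comfortable motion planning for autonomous driving.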

AI中文摘要

复杂、动态且交互的驾驶环境给自动驾驶带来了重大挑战，主要源于周围交通的普遍不确定性。当前系统的一个基本瓶颈是高度表达性的不确定性建模与可解释、安全的运动规划之间的脱节。在本文中，我们提出了一种新颖的样本条件可微分规划框架，通过将扩散生成的未来轨迹显式纳入优化过程来弥合这一差距。我们的方法不是将预测压缩为单一的确定性未来或依赖黑盒端到端架构，而是利用条件扩散模型生成一组多样化的合理未来场景。关键的是，这些样本直接输入可微分规划器，该规划器通过经验条件风险价值尾部风险约束显式缓解预测不确定性。这使得规划器能够优化一条物理可解释的轨迹，该轨迹对罕见但安全关键的交互具有鲁棒性。此外，我们引入了一种场景上下文的有向图表示，在预测有效性和计算效率方面均带来了显著提升。通过在Waymo Open Motion和Argoverse 2数据集上进行的大量开环和闭环评估，我们的框架在安全性、效率和乘坐舒适性方面显著优于最先进的基线方法。

英文摘要

Complex, dynamic, and interactive driving environments pose significant challenges for autonomous driving, primarily due to the pervasive uncertainty of surrounding traffic. A fundamental bottleneck in current systems is the disconnect between highly expressive uncertainty modeling and interpretable, safe motion planning. In this paper, we propose a novel sample-conditioned differentiable planning framework that bridges this gap by explicitly incorporating diffusion-generated future trajectories into the optimization process. Rather than compressing predictions into a single deterministic future or relying on black-box end-to-end architectures, our approach leverages a conditional diffusion model to generate a diverse set of plausible future scenarios. Crucially, these samples are directly fed into a differentiable planner, which explicitly mitigates predictive uncertainty via an empirical Conditional Value-at-Risk (CVaR) tail-risk constraint. This allows the planner to optimize a physically interpretable trajectory that is robust to rare yet safety-critical interactions. Furthermore, we introduce a directed graph representation for scene context that yields substantial improvements in both predictive effectiveness and computational efficiency. Validated through extensive open-loop and closed-loop evaluations on the Waymo Open Motion and Argoverse 2 datasets, our framework significantly outperforms state-of-the-art baselines in safety, efficiency, and ride comfort.
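
As a rough illustration of the empirical CVaR idea mentioned in the abstract, the sketch below averages the worst (1 - alpha) fraction of per-sample costs for one candidate plan. This is the generic sample-based estimator, not the authors' planner: the function name `empirical_cvar`, the cost values, and the confidence level are illustrative assumptions.

```python
import math

def empirical_cvar(costs, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of sampled costs (higher = worse)."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)  # worst samples first
    # Tail size; rounding guards against floating-point error before the ceiling.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / k

# Hypothetical risk costs of one candidate trajectory under 10 sampled futures.
sample_costs = [0.1, 0.0, 0.2, 0.1, 0.9, 0.0, 0.3, 0.1, 0.0, 0.7]
tail_risk = empirical_cvar(sample_costs, alpha=0.8)  # averages the 2 worst samples
```

A planner could then require `tail_risk` to stay below a chosen threshold, which penalizes rare but severe outcomes more strongly than an average over all samples would.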

2606.03268 2026-06-03 cs.RO 版本更新

EaDex: A Cross-Embodiment Dexterous Manipulation Framework from Low-Cost Demonstrations

EaDex: 一种基于低成本演示的跨形态灵巧操作框架

Qian Zhao, Xin Tong, Chengdong Wu, Yang Yang, Yingtian Li

发表机构 * Faculty of Robot Science and Engineering, Northeastern University（机器人科学与工程学院，东北大学）； Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences（深圳先进技术研究院，中国科学院）； School of Automation, Nanjing University of Information Science and Technology（自动化学院，南京信息工程大学）

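AI summary: Proposes the EaDex framework, which captures human hand motion with an RGB-D camera to build structured demonstration data and combines it with a contact-reward-based dynamic demonstration annealing mechanism, enabling fast learning and training of multi-embodiment dexterous manipulation from low-cost demonstrations.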

Comments 11 pages, 5 figures, Conference: CoRL 2026, Submitted as Preprint

AI中文摘要

灵巧操作学习长期以来受到数据和训练高成本的阻碍，因为纯强化学习通常需要大规模交互探索，而模仿学习依赖于昂贵的高质量演示。为了解决这个问题，我们提出了EaDex，一种在低成本演示条件下的多形态灵巧操作学习框架，它能够快速生成演示数据，从而减少训练时间以实现高效的灵巧操作。在数据层面，EaDex仅使用单个RGB-D相机捕捉人手运动，并通过基于MANO的手部建模、数据归一化和运动重定向构建结构化演示数据。在学习层面，我们引入了一种基于接触奖励的动态演示退火机制，该机制在演示引导下进行早期探索，并随着接触奖励的积累逐渐过渡到自主优化。使用我们自定义的数据集，我们在三种灵巧手和三种铰接物体打开任务上评估了EaDex，涵盖了九种跨形态操作设置，相比没有演示退火的基线实现了55.3%的相对改进。这些结果验证了所提出的低成本演示流程和动态演示退火策略在灵巧操作学习中的有效性。

英文摘要

Dexterous manipulation learning has long been hindered by the high costs of data and training, as pure reinforcement learning typically requires large-scale interactive exploration and imitation learning depends on high-quality demonstrations that are expensive to collect. To address this problem, we propose EaDex, a multi-embodiment dexterous manipulation learning framework under low-cost demonstration conditions, which enables rapid generation of demonstration data and consequently reduces training time for efficient dexterous manipulation. At the data level, EaDex captures human hand motions using only a single RGB-D camera and constructs structured demonstration data through MANO-based hand modeling, data normalization, and motion retargeting. At the learning level, we introduce a contact-reward-based dynamic demonstration annealing mechanism, which guides early-stage exploration under demonstration and gradually transitions to autonomous optimization with accumulating contact rewards. Using our custom dataset, we evaluate EaDex on three dexterous hands and three articulated object-opening tasks, covering nine cross-embodiment manipulation settings, achieving a 55.3% relative improvement over the baseline without demonstration annealing. These results validate the effectiveness of the proposed low-cost demonstration pipeline and the dynamic demonstration annealing strategy for dexterous manipulation learning.
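
One simple way to picture demonstration annealing driven by contact reward is a weight on the imitation term that shrinks as contact reward accumulates. The sketch below is a hypothetical linear schedule, not the mechanism from the paper: the names `demo_weight` and `total_loss`, the `reward_budget` parameter, and the linear form are all assumptions made for illustration.

```python
def demo_weight(cumulative_contact_reward, reward_budget=100.0):
    """Weight on the demonstration term: 1.0 at the start, 0.0 once the
    accumulated contact reward reaches reward_budget (assumed linear decay)."""
    if reward_budget <= 0:
        raise ValueError("reward_budget must be positive")
    progress = cumulative_contact_reward / reward_budget
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - progress

def total_loss(rl_loss, imitation_loss, cumulative_contact_reward,
               reward_budget=100.0):
    """Reinforcement learning loss plus an annealed imitation term."""
    w = demo_weight(cumulative_contact_reward, reward_budget)
    return rl_loss + w * imitation_loss

# Early in training the imitation term dominates; later it fades out.
early = total_loss(rl_loss=2.0, imitation_loss=4.0, cumulative_contact_reward=0.0)
late = total_loss(rl_loss=2.0, imitation_loss=4.0, cumulative_contact_reward=250.0)
```

Tying the schedule to accumulated contact reward rather than to a fixed step count means the policy keeps its demonstration guidance until it has actually started making useful contact on its own.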

2606.03265 2026-06-03 cs.RO 版本更新

Wheel-Mounted/GNSS Fusion with AI-Aided Position Updates

基于人工智能辅助位置更新的轮式/GNSS融合定位

Gal Versano, Itzik Klein

发表机构 * Autonomous Navigation and Sensor Fusion Lab（自主导航与传感器融合实验室）； Hatter Department of Marine Technologies（海洋技术系）； Charney School of Marine Sciences（海洋科学学院）； University of Haifa（海法大学）

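AI summary: Proposes a hybrid neural inertial navigation framework that combines wheel-mounted inertial sensors, enforced periodic trajectories, and a neural network, fusing GNSS position updates through an error-state extended Kalman filter and improving positioning accuracy by about 46%.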

2606.03252 2026-06-03 cs.RO cs.AI 版本更新

AirDreamer: Generalist Drone Navigation with World Models

AirDreamer: 基于世界模型的通用无人机导航

Zian Liu, Andong Yang, Chunkai Yang, Ruidong An, Chao Gao, Guyue Zhou

发表机构 * Institute for AI Industry Research, Tsinghua University, Beijing, China（人工智能产业研究院，清华大学，北京，中国）； Department of Electronic Engineering, Tsinghua University, Beijing, China（电子工程系，清华大学，北京，中国）； School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China（遥感与信息工程学院，武汉大学，武汉，中国）

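AI summary: Proposes a drone navigation framework that combines a reinforcement learning policy with world-model-based environment understanding, uses a sparse reward function to avoid local optima, achieves a 5.3% higher success rate than the best baseline in complex unseen environments, and supports sim-to-real transfer without tuning.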

Comments 8 pages, 8 figures

AI中文摘要

在未知且杂乱的环境中导航无人机需要可靠地泛化到未见过的场景布局，并理解与机器人能力相关的环境结构。先前的方法假设相同的环境配置，通常严重依赖人工设计的感知管道和预定义规则来引导机器人到达目标。这个过程依赖于环境，且跨环境泛化能力差。受动物导航行为启发，我们设计了一个导航框架，该框架在基于世界模型的环境理解之上使用基于强化学习的策略进行导航，以克服这些问题。此外，我们设计了一个无需手工塑造项的稀疏奖励函数，以避免局部极小值陷阱并鼓励偏航控制行为。在仿真和真实无人机上，我们的方法展现出在复杂未知环境中导航和逃离其他方法失败的局部最优的新兴能力。在具有挑战性的地图上，它比最佳基线实现了5.3%更高的导航成功率。此外，所提出的框架在部署期间无需任何调整即可实现有效的仿真到现实迁移。代码将公开。

英文摘要

Navigating a drone in unseen and cluttered environments requires reliable generalization to unseen scene layouts and understanding of environmental structure relative to the robot's capabilities. Previous methods, which assume the same environment configuration, often rely heavily on human-designed perception pipelines and predefined rules to guide the robot toward the target. This process is environment-dependent and generalizes poorly across environments. Inspired by animal navigation behavior, we design a navigation framework that navigates with a reinforcement-learning-based policy on top of a world-model-based environment understanding to overcome these issues. In addition, a sparse reward function without hand-crafted shaping terms is designed to avoid local minima traps and encourage yaw control behaviors. In simulation and on real drones, our method exhibits emergent capabilities for navigating complex, unseen environments and escaping local optima where other methods fail. In challenging maps, it achieves a 5.3% higher navigation success rate than the best baseline. Furthermore, the proposed framework achieves effective sim-to-real transfer without any tuning during deployment. The code will be publicly available.

2606.03240 2026-06-03 cs.RO 版本更新

GeoAlign: Beyond Semantics with State-Guided Spatial Alignment in VLA Models

GeoAlign: VLA模型中的状态引导空间对齐超越语义

Yizhi Chen, Zhanxiang Cao, Xinyi Peng, Yixiao Zheng, Xiaxi Si, Yiheng Li, Liyun Yan, Keqi Zhu, Xueyun Chen, Shengcheng Fu, Tianyue Zhan, Yufei Jia, Jinming Yao, Yan Xie, Kun Wang, Cewu Lu, Yue Gao

发表机构 * Tongji University（同济大学）； Shanghai Innovation Institute（上海创新研究院）； Shanghai Jiao Tong University（上海交通大学）； Zhejiang University（浙江大学）； Jingdezhen Ceramic University（景德镇陶瓷大学）； Tsinghua University（清华大学）； HONOR（HONOR公司）； University of Science and Technology of China（中国科学技术大学）

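AI summary: Proposes the GeoAlign architecture, which achieves geometry-aware spatial alignment and dynamic affordance selection through post-training of an RGB geometry branch and geometric feature queries guided by the robot's proprioceptive state, reaching strong performance on multiple benchmarks.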

Comments 20 pages, 9 figures, 8 tables, including appendix

2606.03223 2026-06-03 cs.RO cs.AI 版本更新

BotDirector: Robot Storytelling Across the Symmetrical Reality with Multi-modal Interactions

BotDirector：跨对称现实的多模态交互机器人讲故事

Zhe Sun, Meng Wang, Lei Wang, Yuxi Wang, Wanxin Li, Yujia Peng, Zhenliang Zhang

发表机构 * State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China（通用人工智能全国重点实验室，BIGAI，北京，中国）； Peking University, Beijing, China（北京大学，北京，中国）

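AI summary: Proposes a robot storytelling system that combines embodied interaction with natural language interaction, using LLM agents to turn child-created narratives into motion sequences for self-navigating swarm robots, with support for flexible scenes and everyday objects.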

2606.03204 2026-06-03 cs.RO eess.SP 版本更新

Toward Gripper-Integrated Active Electrosense for Pre-Contact Sensing in Underwater Soft Grippers

面向水下软体夹爪预接触感知的夹爪集成主动电感知

Ahsan Tanveer, Muhammad Hamza, Waqar Hussain Afridi, Chen Wang, Guangming Xie

发表机构 * Intelligent Biomimetic Design Lab, School of Advanced Manufacturing and Robotics, State Key Laboratory for Turbulence and Complex Systems, College of Engineering, Peking University（智能仿生设计实验室，先进制造与机器人学院，湍流与复杂系统国家重点实验室，北京大学）； National Engineering Research Center of Software Engineering, Peking University（软件工程国家工程研究中心，北京大学）； Institute of Ocean Research, Peking University（海洋研究所，北京大学）

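AI summary: To address limited underwater visibility, proposes an active electrosense approach integrated into a soft gripper that detects pre-contact signals by measuring electric field perturbations in a conductive medium; experiments show that multi-electrode voltage readings can detect structured, object-induced changes.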

Comments Extended abstract accepted to the IEEE ICRA 2026 Workshop on Manipulation Robustness

AI中文摘要

水下操作通常发生在因浑浊、眩光和夹爪遮挡导致能见度降低的环境中，这限制了接近和抓取过程中基于视觉感知的可靠性。在这种情况下，软体夹爪非常适合顺应性交互，但通常缺乏在视觉不可靠时指导接近和闭合的机载预接触线索。本扩展摘要探索了主动电感知作为一种轻量级传感模式，通过测量导电介质中施加电场的扰动，在接触前提供类似接近的信号。我们为仿章鱼夹爪设计了离散电极布局，并使用现成硬件记录多通道传感电压。使用悬浮导电球进行的模拟和水槽实验显示，相对于空水基线，多电极电压读数出现了结构化的、依赖于物体的变化，且可检测性随5至20 V的激励和1 mHz至1 kHz的频率而变化。这些发现促使系统研究集成于夹爪的电感知作为水下软体操作补充预接触线索的可行性。

英文摘要

Underwater manipulation often occurs under degraded visibility due to turbidity, glare, and gripper occlusion, limiting the reliability of vision-based perception during approach and grasping. In such settings, soft grippers are well suited for compliant interaction, but they typically lack an onboard pre-contact cue that can guide approach and closure when vision is unreliable. This extended abstract explores active electrosense as a lightweight sensing modality that can provide a proximity-like signal prior to contact by measuring perturbations of an applied electric field in conductive media. We instrument an octopus-inspired gripper with a discrete electrode layout and record multi-channel sensing voltages using off-the-shelf hardware. Simulation and tank experiments with a suspended conductive sphere show structured, object-dependent changes in the multi-electrode voltage readout relative to empty-water baselines, with detectability varying across excitation of 5 to 20 V and frequencies from 1 mHz to 1 kHz. These findings motivate systematic investigation of gripper-integrated electrosense as a complementary pre-contact cue for underwater soft manipulation.

2606.03188 2026-06-03 cs.RO 版本更新

GeoSem-WAM: Geometry- and Semantic-Aware World Action Models

GeoSem-WAM：几何与语义感知的世界动作模型

Fulong Ma, Daojie Peng, Wenjun Yue, Jiahang Cao, Bintao Wang, Qiang Zhang, Jun Ma

发表机构 * HKUST(GZ)（香港科技大学（广州））； HKU（香港大学）； USTC（中国科学技术大学）； SDU（山东大学）； X-Humanoid

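AI summary: Proposes the GeoSem-WAM framework, which enhances latent representations with geometric and semantic supervision, jointly capturing scene dynamics, spatial geometry, and semantic context in a unified latent space; it avoids explicit future rollout or video generation at test time and improves action prediction accuracy and robustness.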

AI中文摘要

最近的世界动作模型（WAM）在具身决策中展示了令人印象深刻的能力。然而，它们的有效性是源于推理过程中的显式未来想象，还是由预测训练引起的表示学习，仍是一个未解之谜。新兴证据表明，主要优势在于学习鲁棒的潜在表示，而非在测试时生成未来观测。尽管如此，现有的WAM主要依赖于基于RGB的未来预测，这提供了对复杂环境有限的结构和空间理解。为了解决这个问题，我们提出了一个结构化世界建模框架，通过几何和语义监督增强潜在表示。除了未来的RGB预测，我们的模型引入了两个辅助预测分支，用于未来的几何和语义表示，使其能够在统一的潜在空间中联合捕捉场景动态、空间几何和语义上下文。关键在于，我们的方法通过避免测试时的显式未来展开或视频生成，保持了高效的推理。大量实验表明，纳入结构化世界监督一致地提高了动作预测准确性、场景理解以及在具有挑战性的具身场景下的鲁棒性，突显了其推进可扩展和高效WAM的潜力。

英文摘要

Recent World Action Models (WAMs) have demonstrated impressive capabilities in embodied decision-making. However, whether their effectiveness stems from explicit future imagination during inference or representation learning induced by predictive training remains an open question. Emerging evidence suggests the primary advantage lies in learning robust latent representations rather than generating future observations at test time. Nevertheless, existing WAMs mainly rely on RGB-based future prediction, which provides limited structural and spatial understanding of complex environments. To address this, we propose a structured world modeling framework that enhances latent representations through geometric and semantic supervision. Alongside future RGB prediction, our model introduces two auxiliary prediction branches for future geometry and semantic representations, enabling it to jointly capture scene dynamics, spatial geometry, and semantic context within a unified latent space. Crucially, our approach preserves efficient inference by avoiding explicit future rollout or video generation at test time. Extensive experiments show that incorporating structured world supervision consistently improves action prediction accuracy, scene understanding, and robustness under challenging embodied scenarios, highlighting its potential for advancing scalable and efficient WAMs.

2606.03159 2026-06-03 cs.CV cs.AI cs.RO 版本更新

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

NVIDIA OmniDreams：用于闭环自动驾驶仿真的实时生成式世界模型

NVIDIA: Aarti Basant, Amlan Kar, Despoina Paschalidou, Fangyin Wei, Francesco Ferroni, Guillermo Garcia Cobo, Haithem Turki, Huan Ling, Jaewoo Seo, James Lucas, Jay Zhangjie Wu, Jialiang Wang, Jonathan Lorraine, Jun Gao, Kai He, Katarina Tothova, Kevin Xie, Michał Tyszkiewicz, Qi Wu, Riccardo de Lutio, Ruilong Li, Sanja Fidler, Seung Wook Kim, Tianchang Shen, Tianshi Cao, Tobias Pfaff, William Lew, Xindi Wu, Xuanchi Ren, Yifan Lu, Yuxuan Zhang, Zan Gojcic, Zian Wang

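AI summary: Proposes OmniDreams, a foundation generative world model trained from the Cosmos diffusion model that autoregressively generates action-conditioned video, enabling real-time synthesis of complex long-tail scenarios in closed-loop simulation, and validates its effectiveness for training policy models.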

AI中文摘要

随着自动驾驶能力的提升，在长尾场景中安全评估驾驶策略仍是一个关键瓶颈。在闭环仿真中，驾驶策略模型与环境主动交互，其动作动态更新模拟器状态并直接影响下一组生成的传感器观测。尽管近期基于重建的神经模拟器提供了逼真效果，但它们从根本上受限于初始捕获数据，难以泛化到高度动态或新颖场景。为克服这些限制，我们引入了OmniDreams，一个从Cosmos扩散模型进行中期和后训练的基础生成式世界模型，能够自回归地实时生成动作条件视频。通过利用Cosmos丰富的视觉先验以及在21k小时驾驶场景上的中期和后训练，OmniDreams合成了传统模拟器难以捕获的复杂未观测现象，例如极端天气和不可预测的动态智能体行为。关键在于，它自回归地根据过去帧、当前模拟器状态和即时驾驶动作来调节其逼真的传感器生成。在结合Alpamayo 1策略模型和AlpaSim编排器的闭环系统中部署时，OmniDreams充当一个高度响应、反应灵敏的环境，为训练和评估下一代自动驾驶策略提供了可扩展且全面的解决方案。我们还展示了初步结果，表明从OmniDreams后训练的世界-动作模型（WAM）在Physical AI自动驾驶NuRec数据集上取得了强劲性能，超越了基于VLA的Alpamayo 1.5研究策略模型，同时仅使用其1/5的总参数量。这些结果凸显了像OmniDreams这样的实时世界模型也有潜力作为策略架构的骨干网络。

英文摘要

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only 1/5 the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.


2606.03134 2026-06-03 cs.RO cs.LG 版本更新

How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes

无声操作失败的可见性：模拟机器人任务中假成功检测的可观测性研究

Aarav Bedi

发表机构 * Aarav Bedi

AI总结本研究通过模拟双机械臂ALOHA任务，探讨机器人自身成功检测器标记为成功的任务中，假成功（实际失败但被误判为成功）的可恢复性，发现基于关节数据的检测器在方块转移任务中几乎完全可恢复假成功，而在插销任务中仅部分可恢复，视觉检测器可弥补差距，且可分离性依赖于远低于实际传感器噪声的速度差异。

Comments 4 pages, 3 figures

详情

AI中文摘要

模仿学习策略用于机器人操作时，其训练任务的成功标签质量取决于机器人自身的成功检测器。一种特别有害的错误是假成功：机器人记录为成功但实际任务结果错误的任务。我们针对这些任务提出一个狭窄但实际的问题：一旦任务被标记为成功，推翻该标签所需的信息有多少存在于本体感觉中，又有多少需要视觉？我们在两个双机械臂ALOHA任务上构建模拟测试平台，通过环境扰动而非标签编辑诱发失败，利用检测器从未见过的特权模拟器状态标记每个任务，仅保留机器人标记为成功的任务。然后，我们将限制于本体感觉的检测器与基于视觉的检测器进行比较。我们发现可恢复性范围广泛：在方块转移任务中，假成功几乎完全可从关节数据中恢复，而在插销插入任务中，本体感觉仅恢复部分假成功，视觉检测器则弥补了大部分差距。我们还表明，我们测量的本体感觉可分离性依赖于远低于任何实际传感器噪声水平的速度差异，因此最好将其视为无噪声模拟器夸大的乐观上限。我们发布了生成和评估流程。

英文摘要

Imitation-learning policies for robot manipulation inherit the quality of the success labels attached to their training episodes, and those labels are usually produced by the robot's own success check. A particularly damaging error is the false success: an episode the robot logs as a success when the task outcome was actually wrong. We ask a narrow but practical question about these episodes. Once an episode has already been flagged as a success, how much of the information needed to overturn that label is present in proprioception, and how much requires vision? We build a simulated testbed on two bimanual ALOHA tasks, induce failures through environment perturbations rather than label edits, label every episode by privileged simulator state that the detector never sees, and keep only episodes the robot flagged as successful. We then compare detectors restricted to proprioception against a vision-based detector. We find that recoverability spans a wide range: in cube transfer the false successes are almost fully recoverable from joint data alone, while in peg insertion proprioception recovers only part of them and a vision detector closes most of the gap. We also show that the proprioceptive separability we measure rests on velocity differences far below any realistic sensor noise floor, so it is best read as an optimistic upper bound that a noiseless simulator inflates. We release the generation and evaluation pipeline.
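To make the noise-floor caveat concrete, the sketch below computes a sensitivity index d' for a single velocity feature, with and without additive sensor noise. The function name and every number are illustrative assumptions rather than values from the paper, and the model assumes equal-variance Gaussian classes with independent zero-mean noise.

```python
import math

def d_prime(mean_gap, signal_std, noise_std=0.0):
    """Sensitivity index for two equal-variance Gaussian classes.

    Independent zero-mean sensor noise adds its variance to the
    within-class variance, which shrinks the separability.
    """
    return abs(mean_gap) / math.sqrt(signal_std**2 + noise_std**2)

# Hypothetical numbers: a 1e-4 rad/s gap between true and false successes.
noiseless = d_prime(mean_gap=1e-4, signal_std=1e-5)
noisy = d_prime(mean_gap=1e-4, signal_std=1e-5, noise_std=1e-2)
```

Under these made-up values the classes are cleanly separable in a noiseless simulator but nearly indistinguishable once the noise is two orders of magnitude larger than the gap, which is the sense in which a simulator-only measurement reads as an optimistic upper bound.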


2606.03127 2026-06-03 cs.RO 版本更新

TTT-VLA: Test-Time Latent Prompt Optimization for Vision-Language-Action Models

TTT-VLA：面向视觉-语言-动作模型的测试时潜在提示优化

Wenbo Zhang, Jianxiong Li, Shuai Yang, Sijin Chen, Jiajun Liu, Lingqiao Liu, Xiao Ma

发表机构 * ByteDance Seed（字节跳动种子）； The University of Adelaide（阿德莱德大学）； Tsinghua University（清华大学）； Zhejiang University（浙江大学）； The University of Hong Kong（香港大学）； CSIRO Data61

AI总结提出TTT-VLA框架，通过测试时优化潜在提示来适应分布偏移，无需修改策略本身，在SimplerEnv上提升单本体和多本体设置下的任务成功率。

详情

AI中文摘要

基于大规模数据训练的视觉-语言-动作（VLA）模型取得了显著进展，但在部署时仍易受分布偏移影响。最近的VLA模型表明，提示可以作为引导策略行为的高效接口，但现有的基于提示的引导通常依赖外部指导。这自然引出一个问题：能否通过优化提示来实现VLA的测试时训练（TTT），使得引导接口本身可以从交互中学习和适应？我们通过TTT-VLA来解决这个问题，这是一种基于潜在提示优化（LPO）的测试时训练框架。在训练期间，潜在提示通过额外的代理任务学习，为策略学习提供额外的学习条件信号。在测试时，通过从当前环境收集交互数据，并仅使用代理任务的自监督信号优化这些数据上的潜在提示来执行TTT，而不修改策略本身。在SimplerEnv上的实验表明，所提方法在单本体和多本体设置中均能持续提高任务成功率。进一步分析表明，提升主要源于纠正少量关键决策，而非全局改变策略行为。这些结果表明，LPO为基础操作策略的部署时改进提供了一条有效且实用的途径。

英文摘要

Vision-Language-Action (VLA) models trained on large-scale data have made remarkable progress, but they remain vulnerable to distribution shifts at deployment time. Recent VLA models suggest that prompts can serve as an efficient interface for steering policy behavior, but existing prompt-based steering typically relies on external guidance. This raises a natural question: can test-time training (TTT) for VLA be achieved by optimizing a prompt, so that the steering interface itself can be learned and adapted from interaction? We address this question with TTT-VLA, a test-time training framework based on Latent Prompt Optimization (LPO). During training, the latent prompt is learned with an additional proxy task, providing an extra learned conditioning signal for policy learning. At test time, TTT is performed by collecting interaction data from the current environment and optimizing only the latent prompt on those data using the proxy task's self-supervised signal, without modifying the policy itself. Experiments on SimplerEnv demonstrate that the proposed method consistently improves task success rates in both single- and multi-embodiment settings. Further analysis shows that the gains arise primarily from correcting a small number of critical decisions rather than globally altering policy behavior. These results suggest that LPO provides an effective and practical pathway for deployment-time improvement of foundation manipulation policies.
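The test-time step described above (optimize only the latent prompt, keep the policy frozen) can be sketched as follows. This is a toy stand-in, not the authors' implementation: the proxy task is assumed to be next-observation prediction with a frozen linear head, and `proxy_loss`, `optimize_latent_prompt`, `W_obs`, and `W_z` are hypothetical names.

```python
import numpy as np

def proxy_loss(z, obs, next_obs, W_obs, W_z):
    """Mean squared error of a frozen linear next-observation predictor."""
    pred = obs @ W_obs.T + z @ W_z.T      # z is broadcast over all rows
    return float(np.mean((pred - next_obs) ** 2))

def optimize_latent_prompt(z0, obs, next_obs, W_obs, W_z, lr=0.05, steps=200):
    """Gradient descent on the latent prompt only; W_obs and W_z are never updated."""
    z = z0.astype(float).copy()
    for _ in range(steps):
        err = obs @ W_obs.T + z @ W_z.T - next_obs
        # Gradient of the mean squared error with respect to z.
        grad_z = 2.0 * err.sum(axis=0) @ W_z / err.size
        z -= lr * grad_z
    return z
```

Because only `z` receives updates, the adaptation touches a handful of parameters and leaves the frozen weights bit-identical, which mirrors the abstract's point that the policy itself is not modified.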


2606.03047 2026-06-03 cs.RO cs.MA 版本更新

ModuLoop: Low-Level Code Generation using Modular Synthesizer and Closed-Loop Debugger for Robotic Control

ModuLoop: 使用模块化合成器和闭环调试器进行机器人控制的低级代码生成

Gina Yoon, Sumin Lee, Joo Yong Sim

发表机构 * Department of Mechanical Systems Engineering, Sookmyung Women’s University（淑明女子大学机械系统工程系）

AI总结提出闭环模块化代码合成框架，利用预训练大语言模型进行模块化代码规划与生成，并通过迭代执行和调试探针实现系统调试与优化，成功应用于RGB-D相机与机械臂标定及抓取任务。

Comments IEEE Robotics and Automation Letters (2025)

详情

DOI: 10.1109/LRA.2025.3623437

AI中文摘要

大型语言模型（LLMs）在包括代码生成和问题解决在内的各个领域展示了令人印象深刻的表现。然而，它们在机器人控制中的应用，特别是在需要精确操作、实时反馈和环境依赖执行的低级任务中，仍然有限。为了解决这一挑战，我们提出了闭环模块化代码合成框架。该框架利用预训练的LLM，无需任何任务特定的微调，执行模块化代码规划和生成，并在迭代执行生成的代码的同时插入调试探针以观察其行为。这种闭环结构促进了系统性的调试和优化，最终生成可执行的控制程序。我们将该框架应用于RGB-D相机和机械臂的标定，验证了其在真实世界环境中的有效性。此外，通过后续的抓取任务，我们不仅展示了标定的准确性，还展示了框架的潜在可扩展性。在两个任务中，该框架都实现了高执行准确性和自主性，说明了使用我们框架进行基于LLM的机器人控制的实用性和可扩展性。

英文摘要

Large Language Models (LLMs) have demonstrated impressive performance across various domains, including code generation and problem solving. However, their application in robotic control, particularly in low-level tasks that require precise manipulation, real-time feedback, and environment-dependent execution, remains limited. To address this challenge, we propose the Closed-Loop Modular Code Synthesizer framework. This framework leverages a pre-trained LLM without any task-specific fine-tuning to perform modular code planning and generation, and iteratively executes the generated code while inserting debugging probes to observe its behavior. This closed-loop structure facilitates systematic debugging and refinement, ultimately producing executable control programs. We apply the proposed framework to the calibration of an RGB-D camera and a robotic arm, validating its effectiveness in real-world settings. Furthermore, through a subsequent pick-and-place task, we demonstrate not only the accuracy of the calibration but also the potential extensibility of the framework. Across both tasks, the framework achieved high execution accuracy and autonomy, illustrating the practicality and scalability of LLM-based robotic control using our framework.
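The closed-loop structure (generate, execute, observe, regenerate) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ModuLoop implementation: `closed_loop_synthesize` and `fake_llm` are hypothetical names, the LLM is replaced by a stub, and the only "probe" is the captured exception message.

```python
from typing import Callable, Optional, Tuple

def closed_loop_synthesize(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate code, run it, and feed any failure back into the next attempt."""
    feedback: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        code = generate(task, feedback)
        scope: dict = {}
        try:
            exec(code, scope)          # define the candidate module
            scope["run"]()             # execute its entry point
        except Exception as exc:       # the probe: capture what went wrong
            feedback = f"{type(exc).__name__}: {exc}"
            continue
        return code, attempt
    return None, max_iters

def fake_llm(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed once it sees feedback."""
    if feedback is None:
        return "def run():\n    return 1 / 0\n"
    return "def run():\n    return 42\n"

code, attempts = closed_loop_synthesize(fake_llm, "calibrate camera to arm")
```

Running generated code with `exec` is acceptable for a toy example only; a real robot deployment would need sandboxing and hardware safety limits around the execution step.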


2606.03017 2026-06-03 cs.LG cs.AI cs.RO 版本更新

Towards Compact Autonomous Driving Perception with Balanced Learning and Multi-sensor Fusion

面向紧凑型自动驾驶感知的平衡学习与多传感器融合

Oskar Natan, Jun Miura

发表机构 * Department of Computer Science and Engineering, Toyohashi University of Technology（丰桥技术科学大学计算机科学与工程系）； Department of Computer Science and Electronics, Gadjah Mada University（加查马达大学计算机科学与电子系）

AI总结提出一种紧凑的深度多任务学习模型，通过自适应损失加权和中间传感器融合技术，在单次前向传播中同时处理语义分割、深度估计、激光雷达分割和鸟瞰投影，实现高效自动驾驶感知。

Comments This work has been accepted for publication in IEEE Transactions on Intelligent Transportation Systems. https://ieeexplore.ieee.org/document/9712213

详情

DOI: 10.1109/TITS.2022.3149370

AI中文摘要

我们提出了一种新颖的紧凑型深度多任务学习模型，能够在一次前向传播中处理多种自动驾驶感知任务。该模型同时执行多视角语义分割、深度估计、激光雷达分割和鸟瞰投影，无需其他模型支持。我们还提供了一种自适应损失加权算法，以解决因任务众多而出现的学习不平衡问题。通过数据预处理和中间传感器融合技术，该模型可以处理并组合来自RGB摄像头、动态视觉传感器（DVS）和安装在自车多个位置的激光雷达的多种输入模态。因此，可以更好地理解动态变化的环境。基于消融研究，使用我们提出的方法训练的模型变体取得了更好的性能。此外，还进行了比较研究，以阐明其与一些近期模型组合相比的性能和有效性。结果表明，即使参数少得多，我们的模型仍能保持更好的性能。因此，该模型可以更快地推理，并减少GPU内存使用。此外，结果在3个不同的CARLA仿真数据集和1个真实世界的nuScenes-lidarseg数据集上保持一致。为了支持未来的研究，我们在以下网址公开共享代码和其他文件：https://github.com/oskarnatan/compact-perception 。

英文摘要

We present a novel compact deep multi-task learning model to handle various autonomous driving perception tasks in one forward pass. The model performs multiple views of semantic segmentation, depth estimation, light detection and ranging (LiDAR) segmentation, and bird's eye view projection simultaneously without being supported by other models. We also provide an adaptive loss weighting algorithm to tackle the imbalanced learning issue that occurred due to plenty of given tasks. Through data pre-processing and intermediate sensor fusion techniques, the model can process and combine multiple input modalities retrieved from RGB cameras, dynamic vision sensors (DVS), and LiDAR placed at several positions on the ego vehicle. Therefore, a better understanding of a dynamically changing environment can be achieved. Based on the ablation study, the model variant trained with our proposed method achieves a better performance. Furthermore, a comparative study is also conducted to clarify its performance and effectiveness against the combination of some recent models. As a result, our model maintains better performance even with much fewer parameters. Hence, the model can inference faster with less GPU memory utilization. Moreover, the result tends to be consistent in 3 different CARLA simulation datasets and 1 real-world nuScenes-lidarseg dataset. To support future research, we share codes and other files publicly at https://github.com/oskarnatan/compact-perception.


2606.02969 2026-06-03 cs.RO math.OC 版本更新

Hybrid Dynamics Modeling for a Flexible 2-DoF Robotic Arm

柔性2自由度机械臂的混合动力学建模

Maciek Popik, Daniel Yang, Mahdis Bisheban

发表机构 * Dept. of Mechanical and Manufacturing Eng at the Schulich School of Engineering, University of Calgary, Alberta, Canada（加拿大阿尔伯塔省卡尔加里大学舒立克工程学院机械与制造工程系）； Schulich School of Engineering at the University of Calgary（卡尔加里大学舒立克工程学院）； Intelligent Dynamics and Control Lab（智能动力学与控制实验室）； University of Calgary（卡尔加里大学）

AI总结针对刚性模型无法捕获的未建模动力学，本文结合刚体动力学与高斯混合模型或纯数据驱动回归，对柔性2自由度机械臂进行混合建模，并比较了不同方法的扭矩预测精度。

详情

AI中文摘要

本文研究了三种对柔性连杆2自由度机械臂动力学进行建模的方法，以解决刚体模型无法捕获的未建模动力学。两种物理信息模型将刚体动力学（RBD）公式与高斯混合模型（GMM）相结合，以捕获残差模型误差和连杆柔性。一个基于运动学的回归模型作为纯数据驱动的基线。使用开源数据集，首先通过运动学特征的岭回归估计扭矩预测，而基于物理的基线则根据公布的规格构建，随后使用普通最小二乘回归直接从数据估计相同的参数集。结果表明，基于物理的参数精度最差，而正则化和最小二乘估计器与实测扭矩更吻合。残差分析和误差指标凸显了纯参数模型在柔性连杆系统中的局限性，并强调了正则化和数据驱动辨识的价值，支持了半参数残差学习方法的发展。

英文摘要

This paper examines three approaches for modeling the dynamics of a flexible-link 2-DoF robotic arm to address unmodeled dynamics not captured by rigid-body models. Two physics-informed models combine rigid-body dynamics (RBD) formulations with a Gaussian Mixture Model (GMM) to capture residual model errors and linkage flexibility. A kinematics-based regression model serves as a purely data-driven baseline. Using an open-source dataset, torque predictions are first estimated using Ridge regression on kinematic features, while the physics-based baseline is constructed from published specifications, and ordinary least-squares regression is subsequently used to estimate the same parameter set directly from data. Results show that the physics-based parameters yield the poorest accuracy, while regularized and least-squares estimators align more closely with measured torques. Residual analysis and error metrics highlight the limitations of purely parametric models for flexible-link systems and underscore the value of regularization and data-driven identification, supporting developments of semi-parametric residual learning methods.
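The two data-driven estimators named in the abstract share one closed form, which the sketch below makes explicit. The feature layout `[q, q_dot, q_ddot]`, the synthetic data, and the name `fit_torque_model` are assumptions for illustration; the paper's actual feature set and dataset are not reproduced here.

```python
import numpy as np

def fit_torque_model(features, torque, ridge_lambda=0.0):
    """Solve (X^T X + lambda * I) w = X^T tau; lambda = 0 reduces to OLS."""
    X = np.asarray(features, dtype=float)
    A = X.T @ X + ridge_lambda * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ np.asarray(torque, dtype=float))

# Hypothetical kinematic features per sample: [q, q_dot, q_ddot].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([0.8, 0.1, 0.05])
tau = X @ w_true                      # noise-free synthetic torques

w_ols = fit_torque_model(X, tau)
w_ridge = fit_torque_model(X, tau, ridge_lambda=10.0)
```

On noise-free data OLS recovers the generating weights, while Ridge shrinks them toward zero; the residual `tau - X @ w` is the quantity a semi-parametric model such as a GMM would then be asked to explain.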


2606.02956 2026-06-03 cs.CV cs.LG cs.RO 版本更新

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

自动驾驶的未来之路：KITScenes多模态数据集

Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, Nils Rack, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

发表机构 * FZI Research Center for Information Technology（FZI信息技术研究中心）； Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院）； University Charles III of Madrid（马德里卡洛斯三世大学）； Delft University of Technology（代尔夫特理工大学）

AI总结本文提出KITScenes多模态数据集，通过高保真传感器和完整HD地图，解决现有数据集在传感器精度、地图完整性和地理多样性上的不足，并引入四个基准推动空间学习。

Comments 28 pages, 21 figures

详情

AI中文摘要

现有的自动驾驶数据集取得了重大进展，但在传感器保真度、地图完整性或地理多样性方面仍存在不足。我们提出了KITScenes多模态数据集，这是一个基于高保真传感器和地图构建的欧洲数据集。我们完全同步的传感器套件结合了高分辨率全局快门相机、超过400米的长距离激光雷达、4D成像雷达以及冗余的GNSS/INS定位。据我们所知，我们的HD地图是任何传感器数据集中最完整的，并通过开源软件上的自动驾驶试验进行了验证。首次在公共数据集中，所有与驾驶相关的交通元素（如交通灯）都以3D方式映射到重投影精确的水平，并具有完整的拓扑连接。我们的数据集记录在街道布局不规则且交通模式混合的城市中，通过拓宽可用的地理多样性来补充现有数据集。我们还引入了四个基准，每个基准都推动了具身AI的空间学习：在线HD地图构建、长距离深度估计、新颖视图合成和端到端驾驶。项目页面：https://kitscenes.com/

英文摘要

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/


2606.02951 2026-06-03 cs.RO cs.AI cs.CL cs.CV cs.HC 版本更新

SCOPE: Real-Time Natural Language Camera Agent at the Edge

SCOPE：边缘实时自然语言相机代理

Nikolaj Hindsbo, Sina Ehsani, Pragyana Mishra

发表机构 * Armada AI

AI总结提出SCOPE模块化代理，用于自然语言控制的PTZ相机，在边缘部署实现实时感知、规划与控制，并通过仿真和物理实验评估延迟、准确性和错误模式。

Comments 9 pages, 4 figures, 6 tables. Accepted at HRI '26 (21st ACM/IEEE International Conference on Human-Robot Interaction), Edinburgh, Scotland, March 16--19, 2026. Code: https://github.com/HindsboNikolaj/SCOPE

详情

DOI: 10.1145/3757279.3785641
Journal ref: Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI '26), ACM, 2026

AI中文摘要

在机器人领域部署语言驱动的代理需要能够反映现实任务需求的评估：自然语言指令与可重复的结果。此类代理必须将语言模型连接到可调用的感知和控制工具，并使用部署关键指标（包括延迟、准确性和错误模式）进行评估。我们提出了SCOPE（用于感知和评估的仿真与相机操作），这是一个模块化代理，用于自然语言、开放词汇的云台变焦（PTZ）相机控制和视觉场景理解，专门为边缘部署设计。SCOPE既可在基于Blender的仿真环境中运行，也可在物理PTZ相机上运行，所有感知、规划和控制均在部署现场使用边缘可访问的计算资源本地执行。我们发布了一个包含536个任务的基准测试，涵盖问答、单步和多步命令、计数、空间推理、描述以及光学字符识别，在基于Blender的仿真环境中提供逼真的PTZ控制功能。执行轨迹与LM作为评判器结合，以评估延迟、准确性和错误模式。我们评估了19种规划器-感知模型组合，将Qwen3小语言模型（SLM）与Moondream和Qwen视觉语言模型（VLM）配对。更强的SLM显著减少了幻觉并改善了工具路由，从而实现了更可靠的闭环行为。一旦使用了足够强大的SLM，感知就成为主要的性能瓶颈。在规划和感知方面，混合专家模型在延迟和内存占用与更小网络相当的情况下，始终匹配或超过密集替代方案。量化在精度损失最小的情况下提供了额外的效率提升，为实时、边缘可行的语言驱动PTZ控制确定了一个实用的、从仿真到现实验证的设计点。

英文摘要

Deploying language-driven agents in robotics requires evaluations that reflect real-world task demands: natural-language instructions with reproducible outcomes. Such agents must connect language models to callable perception and control tools, and be assessed using deployment-critical metrics including latency, accuracy, and error modes. We present SCOPE (Simulation and Camera Operations for Perception and Evaluation), a modular agent for natural-language, open-vocabulary pan-tilt-zoom (PTZ) camera control and visual scene understanding, designed explicitly for edge deployment. SCOPE operates both in a Blender-based simulation environment and on a physical PTZ camera, executing all perception, planning, and control locally at the deployment site using edge-accessible compute. We release a 536-task benchmark spanning QA, single- and multi-step commands, counting, spatial reasoning, descriptions, and optical character recognition in a Blender-based simulation environment that exposes realistic PTZ control affordances. Execution traces are combined with an LM-as-Judge to evaluate latency, accuracy, and error modes. We evaluate 19 planner-perception model combinations pairing Qwen3 small language models (SLMs) with Moondream and Qwen vision-language models (VLMs). Stronger SLMs substantially reduce hallucinations and improve tool routing, leading to more reliable closed-loop behavior. Once a sufficiently capable SLM is used, perception becomes the dominant performance bottleneck. Mixture-of-Experts models on both the planning and perception side consistently match or exceed dense alternatives at latencies and memory footprints comparable to much smaller networks. Quantization provides additional efficiency gains with minimal accuracy degradation, identifying a practical, sim-to-real validated design point for real-time, edge-feasible language-driven PTZ control.


2606.02928 2026-06-03 cs.RO 版本更新

Improved Postural Stability Using a Lightweight Semi-Active Soft Back Support Device Under Standing Perturbations

使用轻量级半主动软背部支撑装置在站立扰动下改善姿势稳定性

Rohan Khatavkar, Jiefeng Sun, Hyunglae Lee

发表机构 * School for Engineering of Matter, Transport and Energy（物质、运输与能源工程学院）

AI总结研究提出一种结合气动人工肌肉与弹性带的轻量级半主动软背部支撑装置，通过快速提供辅助力显著降低全身角动量并增加稳定裕度，从而改善站立扰动后的平衡恢复。

Comments 6 pages, 8 figures, submitted to IROS 2026, the IEEE/RSJ International Conference on Intelligent Robots and Systems

详情

AI中文摘要

老年人在站立时受到扰动（如向前失去平衡）后特别容易跌倒。辅助躯干伸展的背部支撑装置可能通过防止过度躯干屈曲来帮助减轻跌倒风险。先前的研究已经研究了重型背部支撑装置；然而，这些系统由于其附加质量往往对稳定性产生不利影响，这会使身体自然重心发生不利的偏移。相比之下，轻量级被动装置显示出有限的益处，因为它们在向前平衡丧失相关的相对较小的躯干屈曲期间只能产生适度的辅助力。在本研究中，我们评估了一种轻量级半主动软背部支撑装置在站立扰动后对姿势稳定性的影响。我们的装置将一个主动元件（气动人工肌肉）与一个被动弹性带并联。主动元件在扰动后快速提供辅助力，克服了被动装置的局限性。对五名健康个体进行的实验表明，半主动装置显著降低了全身角动量并增加了稳定裕度，表明平衡恢复性能得到改善。这些结果突显了半主动软可穿戴机器人作为站立扰动期间跌倒预防的有效且轻量级策略的前景。

英文摘要

Older adults are particularly susceptible to falls following perturbations during standing, such as forward loss of balance. Back support devices that assist trunk extension may help mitigate fall risk by preventing excessive trunk flexion. Previous studies have investigated heavy back support devices; however, these systems often introduced adverse effects on stability due to their added mass, which shifted the body's natural center of mass unfavorably. In contrast, lightweight passive devices have shown limited benefits, as they can generate only modest assistive forces during the relatively small trunk flexion associated with forward balance loss. In this study, we evaluated the effects of a lightweight semi-active soft back support device on postural stability following standing perturbations. Our device combines an active element (a pneumatic artificial muscle) in parallel with a passive elastic band. The active element rapidly provides assistive force following a perturbation, overcoming the limitations of passive devices. Experiments conducted with five healthy individuals demonstrated that the semi-active device significantly reduced whole-body angular momentum and increased the margin of stability, indicating improved balance recovery performance. These results highlight the promise of semi-active soft wearable robots as an effective and lightweight strategy for fall prevention during standing perturbations.


2606.02888 2026-06-03 cs.RO 版本更新

Impact of a Soft Wearable Back-Support Device on Postural Stability during Trip-Like Perturbations

软性可穿戴背部支撑装置在类似绊倒扰动下对姿势稳定性的影响

Yuanhao Chen, Rohan Khatavkar, Soubhagya Nayak, Jiefeng Sun, Hyunglae Lee

发表机构 * School for Engineering of Matter, Transport and Energy, Arizona State University（物质、运输与能源工程学院，亚利桑那州立大学）

AI总结通过扰动站立和行走实验，研究软性可穿戴背部支撑装置在类似绊倒扰动下对姿势稳定性的增强效果，发现装置使用提高了最小稳定裕度，表明其可改善反应性平衡控制，具有防跌倒潜力。

Comments 6 pages, 6 figures, to be published in the proceedings of the 2026 11th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob)

详情

AI中文摘要

通过两种实验范式（扰动站立和扰动行走）研究了软性可穿戴背部支撑装置在类似绊倒扰动下对姿势稳定性的增强效果。健康受试者在三种不同的背部支撑条件下完成试验：无装置、低刚度装置、高刚度激活装置。使用最大不稳定点的最小稳定裕度（MOS）量化全身稳定性。结果表明，使用装置时MOS增加，表明姿势稳定性增强。在站立条件下，MOS随装置刚度显著增加；而在行走条件下，两种装置条件相比无装置均改善了MOS，但两者之间无显著差异。这些发现凸显了具有可调刚度的软性可穿戴背部支撑装置在改善对外部扰动的反应性平衡控制方面的潜力，对防跌倒具有重要意义。未来研究应探索个性化刚度优化，并评估在跌倒高风险人群中的有效性。

英文摘要

The effectiveness of a soft wearable back-support device in enhancing postural stability was investigated under trip-like perturbations using two experimental paradigms: perturbed standing and perturbed walking. Healthy subjects completed trials under three different back-support conditions: no device, device worn with low stiffness, and device activated with high stiffness. Whole-body stability was quantified using the minimum Margin of Stability (MOS) at the point of maximal instability. Results demonstrated increased MOS during device use, indicating enhanced postural stability. In standing, MOS increased significantly with device stiffness, whereas in walking, both device conditions improved MOS relative to no device but did not differ significantly from each other. These findings highlight the potential of soft wearable back-support devices with adjustable stiffness to improve reactive balance control against external perturbations, with important implications for fall prevention. Future research should explore personalized stiffness optimization and evaluate efficacy in populations at elevated risk of falls.
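The abstract does not spell out how the minimum Margin of Stability is computed. A common definition in the balance literature uses the extrapolated center of mass, XCoM = x + v / ω0 with ω0 = sqrt(g / l), and takes the MOS as the distance from the XCoM to the edge of the base of support. The sketch below assumes that definition in one dimension; the function name and the sample values are illustrative, and the paper's exact procedure may differ.

```python
import math

def min_margin_of_stability(com_pos, com_vel, bos_edge, leg_length, g=9.81):
    """Minimum over time of (base-of-support edge - extrapolated center of mass).

    com_pos, com_vel, bos_edge: equal-length sequences along the perturbation
    direction (metres, metres/second, metres).
    leg_length: pendulum length used for the eigenfrequency omega0 (metres).
    """
    omega0 = math.sqrt(g / leg_length)
    return min(b - (x + v / omega0) for x, v, b in zip(com_pos, com_vel, bos_edge))

# Illustrative two-sample trace; leg_length is chosen so that omega0 == 1.
mos = min_margin_of_stability(
    com_pos=[0.0, 0.1], com_vel=[0.0, 0.2], bos_edge=[0.5, 0.5], leg_length=9.81
)
```

A larger minimum MOS means the extrapolated center of mass stayed further inside the base of support at the most unstable instant, which is how the reported increase is read as improved stability.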


2606.02879 2026-06-03 cs.RO 版本更新

Direct Informed Sampling on Riemannian Manifolds via Loewner Order Lower Bounds

基于Loewner序下界的黎曼流形直接知情采样

Phone Thiha Kyaw, Jonathan Kelly

发表机构 * Space and Terrestrial Autonomous Robotic Systems (STARS) Laboratory, University of Toronto Institute for Aerospace Studies (UTIAS)（太空与地面自主机器人系统实验室，多伦多大学航空航天研究所）

AI总结提出一种利用Loewner序计算度量张量最紧常数下界的矩阵值可容许启发式，将黎曼知情集映射为各向同性欧氏空间中的标准长球超椭球，实现直接无拒绝采样，加速多种最优运动规划器收敛。

Comments Submitted to IEEE Robotics and Automation Letters (RA-L)

详情

AI中文摘要

知情采样技术通过将搜索聚焦于状态空间的有希望区域来加速基于采样的运动规划器，然而大多数现有方法依赖于欧氏启发式，这些启发式在依赖于构型的黎曼度量下变得不可容许。虽然标量特征值下界通过均匀缩放欧氏距离恢复了可容许性，但它们丢弃了度量的方向结构，产生过于保守的知情集。我们提出一种矩阵值可容许启发式，利用对称正定矩阵上的Loewner序计算度量张量最紧的常数下界，同时保留其完整的方向结构。该下界的Cholesky分解定义了一个到各向同性欧氏空间的线性映射，在该空间中黎曼知情集简化为标准的长球超椭球，从而能够使用现有算法进行直接无拒绝采样。在6自由度UR5、7自由度Franka和14自由度PR2上三种不同黎曼度量下的操作任务实验表明，我们的启发式产生的知情集始终比欧氏和标量特征值下界更紧，加速了多种最先进渐近最优规划器的收敛。

英文摘要

Informed sampling techniques accelerate sampling-based motion planners by focusing the search on promising regions of the state space, yet most existing methods rely on Euclidean heuristics that become inadmissible under configuration-dependent Riemannian metrics. While scalar eigenvalue bounds restore admissibility by uniformly scaling the Euclidean distance, they discard the directional structure of the metric, producing overly conservative informed sets. We propose a matrix-valued admissible heuristic that exploits the Loewner order on symmetric positive definite matrices to compute the tightest constant lower bound on the metric tensor while preserving its full directional structure. The Cholesky factorization of this bound defines a linear map to an isotropic Euclidean space in which the Riemannian informed set reduces to a standard prolate hyperspheroid, enabling direct, rejection-free sampling using existing algorithms. Experiments on manipulation tasks with a 6-DoF UR5, 7-DoF Franka, and 14-DoF PR2 under three distinct Riemannian metrics show that our heuristic produces consistently tighter informed sets than both the Euclidean and scalar eigenvalue bounds, accelerating convergence across multiple state-of-the-art asymptotically optimal planners.
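The role of the Cholesky factor can be illustrated with a small sketch. It assumes a constant lower bound `G_lb` is already available (finding the tightest one is the paper's contribution and is not reproduced here), and the function names and the scalar baseline built from the smallest eigenvalue of `G_lb` are assumptions made for this example.

```python
import numpy as np

def is_loewner_lower_bound(G, G_lb, tol=1e-9):
    """True if G - G_lb is positive semidefinite, i.e. G_lb <= G in the Loewner order."""
    return bool(np.linalg.eigvalsh(G - G_lb).min() >= -tol)

def matrix_heuristic(x_a, x_b, G_lb):
    """Euclidean distance after the linear map y = L^T x, where G_lb = L L^T."""
    L = np.linalg.cholesky(G_lb)
    return float(np.linalg.norm(L.T @ (np.asarray(x_b) - np.asarray(x_a))))

def scalar_heuristic(x_a, x_b, G_lb):
    """Baseline that keeps only the smallest eigenvalue and drops direction."""
    lam_min = np.linalg.eigvalsh(G_lb).min()
    return float(np.sqrt(lam_min) * np.linalg.norm(np.asarray(x_b) - np.asarray(x_a)))

G_lb = np.diag([1.0, 4.0])
h_matrix = matrix_heuristic([0, 0], [1, 1], G_lb)   # sqrt(1 + 4)
h_scalar = scalar_heuristic([0, 0], [1, 1], G_lb)   # sqrt(1) * sqrt(2)
```

Since `||L^T d||^2 = d^T G_lb d <= d^T G(q) d` whenever `G_lb` is a Loewner lower bound, the mapped Euclidean distance never overestimates the metric cost, and it is at least as large as the scalar bound, which is why the resulting informed set is tighter.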


2606.02872 2026-06-03 eess.SY cs.MA cs.RO cs.SY 版本更新

Terminal Time and Angle-Constrained Nonlinear Intercept Guidance

终端时间和角度约束的非线性拦截制导

Shivam Bajpai, Abhinav Sinha

发表机构 * University of California（加州大学）

AI总结针对单一控制输入下的欠驱动非线性拦截问题，提出基于分层滑模的制导律，同时控制终端时间和角度，并扩展至常速目标拦截。

详情

AI中文摘要

本文考虑使用横向加速度作为唯一控制输入，同时控制拦截器的撞击时间和撞击角度的问题。由于单一控制输入，非线性交战运动学本质上是欠驱动的，这使得制导律综合变得复杂。为了克服这一挑战，开发了一种基于分层滑模的制导律，以同时调节两个终端约束。所提出的架构包括一个两层滑模流形。第一层由分别对应撞击时间和撞击角度误差动力学的两个子滑模面组成，而第二层引入了一个组合两个单独子滑模面的复合滑模流形。然后，设计了一种变增益自适应制导律，以确保对静止目标的带时间和角度约束的拦截，并进一步扩展至拦截常速目标。针对各种交战场景进行了仿真，以证明所提出方法的有效性。

英文摘要

This paper considers the problem of simultaneously controlling an interceptor's impact time and impact angle using its lateral acceleration as the sole control input. With a single control input, the nonlinear engagement kinematics is inherently underactuated, which complicates guidance law synthesis. To overcome this challenge, a hierarchical sliding mode-based guidance law is developed to concurrently regulate the two terminal constraints. The proposed architecture consists of a two-layer sliding manifold. The first layer comprises two sub-sliding surfaces corresponding to the impact time and impact angle error dynamics, respectively, while the second layer introduces a composite sliding manifold that combines the two individual sub-surfaces. Then, a variable-gain adaptive guidance law is designed to ensure time and angle-constrained interception against a stationary target, which is further extended to intercept a constant velocity target. Simulations are conducted for various engagement scenarios to attest to the efficacy of the proposed approach.


2606.02796 2026-06-03 cs.RO 版本更新

A Measurement-Driven Digital Twin Architecture for Plant-Level Biomass Estimation and Growth Forecasting in Hydroponic Systems

基于测量驱动的数字孪生架构：用于水培系统中植物级生物量估计与生长预测

Morgan Mayborne, Abhisesh Silwal, George Kantor

发表机构 * The Robotics Institute, Carnegie Mellon University（卡内基梅隆大学机器人研究所）

AI总结提出一种结合传感器数据和模型更新的数字孪生架构，通过RGB-D图像和神经网络实时估计生菜质量，并实现未来1-4天生长预测，误差约2克。

Comments 7 pages, 6 figures

详情

AI中文摘要

针对密集城市中心的食品分配问题，已开发出水培等替代土壤园艺的方法。本文开发了一种新系统，利用测量信息流和可用模型，持续更新水培环境中单个生菜植株的生长轨迹估计。这些“数字孪生”模型被集成到一个运行中的水培温室中，配备定制园艺和传感器硬件以生长和测量相关信息。为辅助更新模型参数，使用自定义神经网络连续测量植物产量，输入为植物的RGB-D图像。该网络在1300张图像的收集数据集上训练，能够估计质量，误差在真实值的1.5克以内。集成到定制系统后，数字孪生生长预测可近似未来1至4天的产量，保持约2克的预测误差。

英文摘要

Alternatives to soil-based horticulture, such as hydroponics, have been developed to respond to food distribution concerns for dense urban centers. A new system was developed to track an individual lettuce plant's growth in a hydroponic environment, utilizing streams of measured information and available models to continuously update the growth trajectory estimates for a plant. These "digital twin" models were integrated into an operating hydroponic greenhouse, with custom horticultural and sensor hardware to grow and measure relevant information. To aid in updating model parameters, plant yield was continuously measured with a custom neural network, using RGB-D images of the plants as an input. The network, trained on a collected dataset of 1300 images, was able to estimate mass within 1.5 g of the ground-truth value. After integration into the custom system, digital twin growth projections could approximate future yield between one and four days in the future, maintaining around a 2 g forecasting error.


2606.02775 2026-06-03 cs.AI cs.AR cs.DC cs.PF cs.RO 版本更新

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

AURA: 恒定VRAM下机器人策略的动作门控记忆

Josef Chen

发表机构 * KAIKAKU（卡基库）

AI总结提出AURA-Mem，一种恒定大小、基于动作误差信号门控写入的循环记忆，替代KV缓存，在边缘机器人任务中实现与基线相当的准确率，同时减少5-9倍写入次数。

详情

AI中文摘要

KV缓存是数据中心合适的记忆，但却是机器人错误的记忆。数据中心推理批量处理许多短请求并重置它们，在众多请求中分摊注意力缓存。具身智能体则在带宽受限的边缘硬件上运行一个长且不重置的回合，其中高带宽内存和闪存稀缺，闪存写入寿命有限，内存写入而非计算可能成为约束瓶颈。AURA-Mem（动作效用循环自适应记忆）针对这一场景。它用一个固定大小的循环记忆和一个学习得到的门控包装冻结的视觉-语言-动作骨干网络，该门控仅在当前观测会改变下一个动作时写入：一种知道何时保持沉默的记忆。与基于重建的记忆不同，该门控直接针对闭环动作误差信号进行训练。其推理状态固定为4,224字节，无论时间步长如何，而KV缓存则在100,000步时增长到6,061倍。在受控的合成基准测试中，AURA-Mem在准确率上与最佳的O(1)基线相当，同时使用5.19-6.13倍更少的写入，在更简单的配置上最多减少9.19倍写入。预算匹配的随机和周期性调度无法恢复这一增益，从而将收益归因于动作惊喜信号。在LIBERO-Long上训练的闭环OpenVLA-OFT 7B面板（每个实验组n=60个回合）上，门控不会损害成功率：AURA-Mem与无门控基础策略（0.233）相当，并略超过始终写入的KV组（0.217），同时使用7.0倍更少的写入和恒定内存。我们还实例化了一个近似信息状态价值损失界限作为方法论演示；在此规模下，该界限是空洞的而非保证。
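The write rule described above (commit an observation only if it would change the next action) can be sketched with a hand-written gate. This is not AURA-Mem itself: the real gate is learned from a closed-loop action-error signal, whereas the threshold test, the blending update, the class name, and the choice of 1,056 float32 slots (one possible layout that totals 4,224 bytes) are all assumptions made for illustration.

```python
import numpy as np

class ActionGatedMemory:
    """Fixed-size recurrent state that is written only when the action would change."""

    def __init__(self, size=1056, threshold=0.1):
        self.state = np.zeros(size, dtype=np.float32)   # 1056 * 4 bytes = 4224 bytes
        self.threshold = threshold
        self.writes = 0

    def step(self, obs_embedding, policy):
        action_keep = policy(self.state)                 # action if we stay silent
        candidate = 0.9 * self.state + 0.1 * np.asarray(obs_embedding)
        action_write = policy(candidate)                 # action if we commit the write
        if np.linalg.norm(action_write - action_keep) > self.threshold:
            self.state = candidate.astype(np.float32)
            self.writes += 1
        return policy(self.state)

def toy_policy(state):
    """Hypothetical frozen policy head: reads the first two memory slots."""
    return state[:2]

mem = ActionGatedMemory()
mem.step(np.zeros(1056), toy_policy)          # uninformative observation
mem.step(np.full(1056, 10.0), toy_policy)     # observation that shifts the action
```

The state never grows with the number of steps, and the write counter rises only on observations that move the action, which is the property the abstract contrasts with an ever-growing KV cache.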

基于自适应无迹卡尔曼滤波和非线性模型预测控制的四旋翼飞行器固定时间动态着陆

Mohammadreza Izadi, Zeinab Shayan, Steven Waslander, Reza Faieghi

发表机构 * Autonomous Vehicles Laboratory, Department of Aerospace Engineering, Toronto Metropolitan University（多伦多都会大学航空航天工程系自主车辆实验室）； University of Toronto Institute for Aerospace Studies, University of Toronto（多伦多大学航空航天研究所）

AI总结提出一种结合非线性模型预测控制与实时最小加加速度轨迹规划器及自适应无迹卡尔曼滤波的估计与控制框架，实现多旋翼无人机在移动平台上的固定时间动态着陆，并通过仿真和硬件实验验证了其可重复着陆能力和优于EKF/UKF的速度预测精度。

Comments Accepted to the Conference on Robots and Vision (CRV 2026), Vancouver, Canada

详情

DOI: 10.21428/d82e957c.2b2a57b1

AI中文摘要

本文介绍了一种用于多旋翼无人机在移动平台上动态着陆的估计与控制框架。所提出的方法将非线性模型预测控制与实时最小加加速度轨迹规划器相结合，该规划器强制执行规定的着陆时间，从而在终端下降过程中实现一致的时间安排。为了增强在时变传感质量下的鲁棒性，我们采用了自适应无迹卡尔曼滤波，在线更新过程和测量噪声统计量。此外，我们提供了参考可行性分析，表明在标准跟踪假设下，最小加加速度参考会诱导有界的推力和扭矩指令。所提出的框架在仿真和硬件实验中进行了评估，并表明相对于基于EKF/UKF的方法，实现了可重复的着陆和改进的平台速度预测精度。

英文摘要

This paper introduces an estimation and control framework for dynamic landing of multi-rotor uncrewed aerial vehicles on moving platforms. The proposed method integrates nonlinear model predictive control with a real-time minimum-jerk trajectory planner that enforces a prescribed touchdown time, enabling consistent timing during the terminal descent. To enhance robustness in the presence of time-varying sensing quality, we utilize an adaptive unscented Kalman filter that updates the process and measurement noise statistics online. In addition, we provide a reference feasibility analysis showing that minimum-jerk references induce bounded thrust and torque commands under standard tracking hypotheses. The proposed framework is evaluated in simulation and hardware experiments, and it is shown to achieve repeatable landings and improved platform velocity prediction accuracy relative to EKF/UKF-based methods.
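A minimum-jerk reference with a prescribed touchdown time can be sketched per axis as a quintic polynomial fitted to boundary conditions at t = 0 and t = T. This is the textbook construction, not necessarily the planner used in the paper; the function names and the rest-to-rest example are assumptions.

```python
import numpy as np

def quintic_coeffs(p0, v0, a0, pT, vT, aT, T):
    """Coefficients c0..c5 of p(t) = sum(c_k * t**k) matching position,
    velocity, and acceleration at t = 0 and at the fixed final time T."""
    A = np.array([
        [1, 0, 0,    0,      0,       0],
        [0, 1, 0,    0,      0,       0],
        [0, 0, 2,    0,      0,       0],
        [1, T, T**2, T**3,   T**4,    T**5],
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4],
        [0, 0, 2,    6*T,    12*T**2, 20*T**3],
    ], dtype=float)
    b = np.array([p0, v0, a0, pT, vT, aT], dtype=float)
    return np.linalg.solve(A, b)

def evaluate(coeffs, t):
    """Position on the polynomial at time t."""
    return float(sum(c * t**k for k, c in enumerate(coeffs)))

# Rest-to-rest descent of 1 m that must finish exactly at T = 2 s.
c = quintic_coeffs(p0=0.0, v0=0.0, a0=0.0, pT=1.0, vT=0.0, aT=0.0, T=2.0)
```

For a moving platform, `pT` and `vT` would be set from the predicted platform state at the touchdown time, which is where the accuracy of the velocity prediction from the filter matters.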


2606.02641 2026-06-03 cs.RO cs.AI 版本更新

CARVE: Certified Affordable Repair of Vetoed Maneuvers via Envelopes for Interactive Driving

CARVE: 通过包络实现交互驾驶中被否决机动的认证可负担修复

Yifan Wang

发表机构 * Yifan Wang（王一帆）

AI总结针对交互驾驶中规则感知堆栈易忽略的硬规则裕度负值问题，提出CARVE认证层，通过有限格点上的自我与代理战术算子，实现被否决机动的可负担修复认证，并证明其合理性。

Comments 8 pages, 3 figures

详情

AI中文摘要

交互驾驶暴露了规则感知自动驾驶堆栈中容易忽略的失效模式：即使非优先代理的小幅合法让步可恢复可行性，自我候选的硬规则裕度仍可能为负。现有的规则手册、防护和可达性过滤器在否决不安全动作方面表现强劲，而基于预测的规划器则对可能的响应进行建模。两者均未返回运行时证明对象，该对象说明哪个有界多代理编辑修复了机动、谁拥有编辑、请求是否在路权上可负担，以及如果请求未被遵守，自我后备是什么。我们将这一缺失对象形式化为*交互修复认证*，并引入*CARVE*，一个在自我拥有和代理拥有的战术算子有限格点上的无预测认证层。代理拥有的请求仅在$B_j(s) = \beta(\pi_j)\alpha_j^{\max}(s)$内可接受，这是一个将运动学可达性与规范优先级分离的合作包络。生成的证书记录了绑定规则、修复类别、修复集、责任加权成本分配和后备。在589个基于Lanelet2几何的INTERACTION重放片段上，CARVE-Greedy接受了98.64%的初始否决机动，恢复了370/378个人类解决错误否决，同时保持了589/589的路权尊重、零优先级代理假阳性以及400/400的负压力否决。我们证明了证书的合理性、结构性的路权尊重、精确的有限格点最小性、后备应急性和责任一致性条件。CARVE不预测也不需要其他驾驶员的合规性；它认证在声明假设下提议的交互是否有界、可归因且规范上可接受。

英文摘要

Interactive driving exposes a failure mode that is easy to miss in rule-aware autonomous-driving stacks: a hard-rule margin can be negative for an ego candidate even though a small lawful accommodation by a non-priority agent would restore feasibility. Existing rulebooks, shields, and reachability filters are strong at vetoing unsafe actions, while prediction-based planners model likely responses. Neither returns a runtime proof object that states which bounded multi-agent edit repairs the maneuver, who owns the edit, whether the request is right-of-way affordable, and what ego fallback remains if the request is not observed. We formulate this missing object as *interactive repair certification* and introduce *CARVE*, a prediction-free certificate layer over a finite lattice of ego-owned and agent-owned tactical operators. Agent-owned requests are admissible only inside $B_j(s) = \beta(\pi_j)\alpha_j^{\max}(s)$, a cooperation envelope that separates kinematic reachability from normative priority. The resulting certificate records the binding rule, repair category, repair set, responsibility-weighted cost split, and fallback. On 589 Lanelet2-geometry-grounded INTERACTION replay episodes, CARVE-Greedy accepts 98.64% of initially vetoed maneuvers and recovers 370/378 human-resolved false vetoes, while preserving 589/589 right-of-way respect, zero priority-agent false positives, and 400/400 negative-stress vetoes. We prove certificate soundness, structural right-of-way respect, exact finite-lattice minimality, fallback contingency, and blame-consistency conditions. CARVE does not predict or require another driver's compliance; it certifies whether a proposed interaction is bounded, attributable, and normatively admissible under declared assumptions.
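The cooperation envelope is a product of a normative factor and a kinematic one, so an admissibility check is a one-line comparison. The sketch below assumes the request is a scalar deceleration, that β lies in [0, 1], and that a priority agent has β = 0; those readings, the function names, and the numbers are illustrative assumptions rather than definitions taken from the paper.

```python
def cooperation_envelope(beta_priority, alpha_max):
    """B_j(s) = beta(pi_j) * alpha_j^max(s): the normative scaling factor
    times the largest kinematically reachable accommodation."""
    return beta_priority * alpha_max

def request_admissible(requested, beta_priority, alpha_max):
    """An agent-owned request is admissible only inside the envelope."""
    return 0.0 <= requested <= cooperation_envelope(beta_priority, alpha_max)

# Hypothetical values: a non-priority agent that can brake at up to 4 m/s^2.
yielding_ok = request_admissible(1.0, beta_priority=0.3, alpha_max=4.0)
too_much = request_admissible(1.5, beta_priority=0.3, alpha_max=4.0)
priority_agent = request_admissible(0.5, beta_priority=0.0, alpha_max=4.0)
```

Keeping the two factors separate means that an agent who could physically brake harder is still never asked to do so beyond what its priority status allows.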


2606.03735 2026-06-03 nlin.CD cs.MA cs.RO 版本更新

On dynamic multi-agent pathfinding methods: review, simulations and modifications

动态多智能体路径规划方法：综述、仿真与改进

Gabriel Fejziaj, Salama Hassona, Wieslaw Marszalek

发表机构 * Department of Computer Science, Opole University of Technology（计算机科学系，奥波尔技术大学）

AI总结本文系统研究动态多智能体路径规划（D-MAPF）中的六种代表性算法，并提出一种基于模板的A**算法，通过离线几何路径生成与在线时间适应解耦，在频繁变化和有限感知环境中提高解质量。

2606.01851 2026-06-03 cs.RO 版本更新

PHASOR: Phase-Anchored Universal Action Representations for Humanoid Embodiments

PHASOR: 面向人形本体的相位锚定通用动作表示

Kihyun Kim, Chaeyun Kim, Jongho Shin, Taeyoun Kwon, Junghyun Kim, Mijin Koo, Haon Park

发表机构 * AIM Intelligence ； Seoul National University（首尔国立大学）； LG Electronics（LG电子）； MAUM AI ； OpenMind

AI总结提出PHASOR方法，通过将动作嵌入空间分解为相位流形和姿态分支，并结合运动语义蒸馏，构建跨本体的通用动作表示，实现人形机器人的跨本体检索和下游任务性能提升。

Comments * Equal contribution

详情

AI中文摘要

学习一个好的动作嵌入空间对于可扩展的机器人策略学习至关重要，但现有方法将动作潜在变量视为任务特定的中间产物，而非第一类表示。由此产生的潜在变量是非结构化的、本体特定的，且与运动语义关联较弱，限制了可解释性、可控性和跨机器人的迁移性。我们将动作嵌入空间本身定位为第一类设计目标，下游策略质量源于表示质量。利用运动的内在周期性，我们将其分解为一个相位流形（通过FFT参数系数捕获循环结构）和一个姿态分支（将流形条件化为非周期配置细节）。结合运动语义蒸馏，这种分解结构产生了一个跨本体的运动流形，该流形在设计上是可解释且与本体无关的。将多个人形机器人锚定到一个共享的预训练流形上，则在不同平台上产生统一的动作嵌入空间，实现了强大的跨本体检索和下游机器人任务的一致性能提升。

英文摘要

Learning a good action embedding space is fundamental to scalable robot policy learning, yet existing methods treat action latents as task-specific intermediates rather than first-class representations. The resulting latents are unstructured, embodiment-specific, and weakly tied to motion semantics, limiting interpretability, controllability, and transferability across robots. We position the action embedding space itself as a first-class design target, with downstream policy quality emerging from representation quality. Exploiting motion's intrinsic periodicity, we factorize it into a phase manifold that captures cyclic structure via FFT-parametric coefficients, together with a pose branch that conditions the manifold on non-periodic configuration detail. Combined with motion-semantic distillation, this factorized structure yields a cross-embodiment motion manifold that is interpretable and embodiment-agnostic by design. Anchoring multiple humanoid robots to a shared human-pretrained manifold then produces a unified action embedding space across diverse platforms, achieving strong cross-embodiment retrieval and consistent gains on downstream robot tasks.
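
As a rough illustration of what "FFT-parametric coefficients" for a cyclic motion channel could look like, the sketch below recovers frequency, amplitude, phase shift, and offset of the dominant periodic component of a one-dimensional signal with a plain discrete Fourier transform. This is a generic signal-processing example under assumptions made here; the function name and the exact parameter set are hypothetical and are not taken from PHASOR.

```python
import cmath
import math


def dominant_phase_params(signal, sample_rate):
    """Return (frequency_hz, amplitude, phase_rad, offset) of the strongest
    non-DC frequency bin of `signal`, using a naive O(n^2) DFT."""
    n = len(signal)
    offset = sum(signal) / n
    centered = [x - offset for x in signal]

    best_k, best_coeff = 1, 0j
    # Skip the DC bin (k = 0) and stay strictly below the Nyquist bin.
    for k in range(1, (n + 1) // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > abs(best_coeff):
            best_k, best_coeff = k, coeff

    frequency = best_k * sample_rate / n
    amplitude = 2.0 * abs(best_coeff) / n
    phase = cmath.phase(best_coeff)
    return frequency, amplitude, phase, offset


# Synthetic "joint angle": 3 cycles over 64 samples taken at 64 Hz.
joint = [1.5 + 0.8 * math.cos(2 * math.pi * 3 * t / 64 + 0.4) for t in range(64)]
freq, amp, phase, offset = dominant_phase_params(joint, sample_rate=64)
```

The four recovered numbers describe where a motion sits in its cycle independently of the joint layout that produced it, which is the intuition behind treating phase as a shared coordinate across embodiments.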


2606.01241 2026-06-03 cs.RO 版本更新

OneVLA: A Unified Framework for Embodied Tasks

OneVLA：面向具身任务的统一框架

Lingfeng Zhang, Xiaoshuai Hao, Yingbo Tang, Lei Zhou, Shuyi Zhang, Jinkun Liu, Hongsheng Li, Chenhao Zhang, Qiang Zhang, Hangjun Ye, Xiaojun Liang, Long Chen, Wenbo Ding

发表机构 * Tsinghua University（清华大学）； Pengcheng Laboratory（鹏城实验室）； Xiaomi EV（小米电动车）； Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所）； Peking University（北京大学）； HKUST(GZ)（香港科技大学（广州））

AI总结提出统一架构OneVLA，通过设计统一动作头和渐进式训练策略（含数据构建和思维链微调），在导航与操作任务上实现跨任务正迁移，达到最先进性能。

详情

AI中文摘要

导航和操作是具身智能的基本能力，使机器人能够解释自然语言命令并与环境进行物理交互。然而，当前的视觉-语言-动作（VLA）模型仍受限于任务特定的架构，专门处理导航或操作，这阻碍了通用机器人智能体的发展。为弥补这一差距，我们引入了OneVLA，一个统一架构，将这些不同任务整合到单个连贯框架中。具体来说，我们设计了一个统一的动作头，能够生成导航和操作动作，无需任务特定的变体。此外，我们提出了一种多阶段渐进式训练策略——结合精心构建的数据和思维链（CoT）微调——促进了两个领域之间的强正迁移和相互增强。在模拟和真实环境中的大量实验表明，OneVLA实现了最先进的性能，显著优于专门的单任务和现有的跨任务模型。通过统一这些核心能力，OneVLA为真正的通用机器人系统铺平了道路。模型和源代码将公开发布。

英文摘要

Navigation and manipulation are fundamental capabilities of embodied intelligence, enabling robots to interpret natural language commands and interact physically with their surroundings. However, current Vision-Language-Action (VLA) models remain constrained by task-specific architectures, specializing in either navigation or manipulation, which hinders the development of general-purpose robotic agents. To bridge this gap, we introduce OneVLA, a unified architecture that integrates these distinct tasks into a single, cohesive framework. Specifically, we design a unified action head capable of generating both navigation and manipulation actions without requiring task-specific variants. Furthermore, we propose a multi-stage progressive training strategy, incorporating curated data construction and Chain-of-Thought (CoT) fine-tuning, that facilitates strong positive transfer and mutual reinforcement between the two domains. Extensive experiments in both simulated and real-world environments demonstrate that OneVLA achieves state-of-the-art performance, significantly outperforming both specialized single-task and existing cross-task models. By unifying these core capabilities, OneVLA paves the way for truly general-purpose robotic systems. The model and source code will be publicly released.


2605.31434 2026-06-03 cs.RO 版本更新

Shaft-integrated Force Sensing with Transformer-based Dynamics Compensation for Telesurgery

面向远程手术的轴集成力传感与基于Transformer的动力学补偿

Shuyuan Yang, Grant Boone, Timo Markert, Sebastian Matich, Andreas Theissler, Martin Atzmueller, Zonghe Chua

发表机构 * Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University（电气、计算机与系统工程系，凯斯西储大学）； Department of Mechanical and Aerospace Engineering, Case Western Reserve University（机械与航空航天工程系，凯斯西储大学）； Resense GmbH ； Semantic Information Systems Group, Osnabrück University（语义信息系统组，奥斯纳布吕克大学）； Justus Liebig University（吉森大学）； German Research Center for Artificial Intelligence (DFKI)（德国人工智能研究中心（DFKI））

AI总结提出一种将六轴力传感器集成到标准缆驱动手术器械远端的方法，利用Transformer神经网络补偿内部缆力，实现末端执行器力估计，归一化误差低于6%。

Comments The paper was accepted by IEEE Transactions on Medical Robotics and Bionics in May 2026

详情

AI中文摘要

机器人辅助微创手术（RAMIS）增强了外科医生的灵巧性，新平台利用触觉反馈进一步提高性能。这种力信息具有更广泛的潜力，可用于性能评估、触觉定位和手术自主性。这促使需要将力传感集成到RAMIS工具中的可访问方法。本工作提出了一种将六轴商用力传感器集成到标准缆驱动手术器械远端的方法，在保持设备原始机械功能的同时实现末端执行器力测量。所提出的设计强调可重复性和研究应用的可访问性，无需专门的制造工具。Transformer神经网络将力传感器测量值与机器人状态信息相结合，以帮助估计末端执行器施加的力，补偿由驱动引起的内部缆力。我们提出的方法实现了低于6%的归一化误差，并且比纯近端数据驱动传感方法更好地泛化到未见条件。高内部缆力导致传感器饱和并降低轴向力的可观测性，这可能沿工具主轴和更高负载条件下降低性能。鉴于当前性能水平，系统集成性和性能的平衡使得在RAMIS中触觉反馈、技能评估和力信息自主性等及时主题的应用和研究成为可能。视频和代码可在https://enhanced-telerobotics.github.io/shaft_force_sensing/获取。

英文摘要

Robot-Assisted Minimally Invasive Surgery (RAMIS) enhances surgeon dexterity, with newer platforms leveraging haptic feedback to further improve performance. Such force information has broader potential to inform performance assessment, tactile localization, and surgical autonomy. This motivates the need for accessible approaches to integrating force sensing into RAMIS tools. This work presents a method for integrating a six-axis commercial force sensor into the distal end of a standard cable-driven surgical instrument, enabling end-effector force measurement while preserving the original mechanical functionality of the device. The proposed design emphasizes reproducibility and accessibility for research applications, requiring no specialized manufacturing tools. A transformer neural network integrates force sensor measurements with robot state information to aid estimation of applied forces at the end-effector, compensating for internal cable forces arising from actuation. Our proposed approach achieved normalized errors below 6%, and generalized to unseen conditions better than purely proximal data-driven sensing approaches. High internal cable forces caused sensor saturation and reduced axial force observability, which can degrade performance along the tool's major axis and under higher load conditions. Given current levels of performance, the balance of system integrability and performance enables applications and research into timely topics of haptic feedback, skill assessment, and force-informed autonomy in RAMIS. Videos and code are available at https://enhanced-telerobotics.github.io/shaft_force_sensing/.


2605.31067 2026-06-03 cs.RO 版本更新

Seeing Fast and Slow: Bimodal 3D Scene Graphs for Open-set Tasks

快与慢：面向开放集任务的双模态3D场景图

Marcel Bartholomeus Prasetyo, Shrutika Vishal Thengane, A Manicka Praveen, Yi Loo, Malika Meghjani

AI总结提出BiMoSG方法，通过默认快速模式生成粗粒度3D场景图，并在需要时切换至慢速模式生成细粒度开放词汇3D场景图，实现实时任务执行。

Comments Submission has not been cleared with funding agency

详情

AI中文摘要

开放集任务执行可以显著受益于根据上下文和机器人探索环境时不断变化的信息，在粗粒度和细粒度场景表示之间无缝切换。例如，通常从粗粒度场景表示开始就足够了，只有当机器人遇到可能包含任务相关对象的区域时，才采用更精细、更细粒度的场景表示。因此，在这项工作中，我们提出了BiMoSG，一种用于开放集任务的双模态3D场景图生成方法。BiMoSG默认采用“快速”模式，以高效生成粗粒度3D场景图，并可以切换到“慢速”模式，为任务相关对象生成更精细的开放词汇3D场景图。我们证明，我们提出的3D场景图生成方法显著快于开源的最新方法。这使得我们能够将场景图生成过程与任务执行集成，用于实时部署。

英文摘要

Open-set task execution can significantly benefit from seamlessly switching between coarse and fine scene representations depending on the context and the evolving information as the robot explores the environment. For example, it is often sufficient to start with a coarse scene representation and only employ a finer, more granular representation when the robot encounters regions that are likely to contain task-relevant objects. Hence, in this work, we propose BiMoSG, a bimodal 3D scene graph generation approach for open-set tasks. BiMoSG employs a "fast" mode by default to efficiently generate a coarse 3D scene graph and can switch to a "slow" mode for generating a finer open-vocabulary 3D scene graph of task-relevant objects. We demonstrate that our proposed 3D scene graph generation approach is significantly faster than the open-source state-of-the-art approaches. This allows us to integrate the scene graph generation process with task execution for real-time deployment.


2605.26006 2026-06-03 cs.CV cs.GR cs.RO 版本更新

MIND: Multi-Scale Intent Diffusion for Text-Driven Physics-Based Humanoid Control

MIND: 多尺度意图扩散用于文本驱动的基于物理的人形控制

Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Jingya Wang

发表机构 * ShanghaiTech University（上海科技大学）； University of Pennsylvania（宾夕法尼亚大学）； Bytedance Seed（字节跳动种子）； Stanford University（斯坦福大学）； InstAdapt

AI总结提出MIND框架，通过多尺度意图扩散机制将文本命令与低级动作语义对齐，实现基于物理的人形机器人行为生成。

详情

AI中文摘要

使基于物理的人形机器人能够根据高级文本命令执行多样化的行为仍然是一个重大挑战。现有方法通常遵循两阶段范式（结合运动学动作生成与基于物理的跟踪）或端到端模仿学习范式（直接从文本生成动作）。然而，前者受限于运动学生成与基于物理跟踪之间的固有域偏移，而后者则难以弥合文本命令与低级动作之间的巨大模态差距，限制了有效的语义对齐。值得注意的是，人形状态编码了丰富的运动动态，与低级动作相比，这些动态在语义上与文本描述更对齐，因此成为推导行为意图的自然基础。基于这一见解，我们提出了MIND，一种新颖的端到端扩散框架，用于文本驱动的基于物理的人形控制，该框架利用行为意图作为文本命令与低级动作之间的语义桥梁。其核心是，MIND引入了多尺度意图扩散机制，其中整体意图预测器捕获全局行为动态以指导整体行为合成，而即时意图预测器在每一步扩散中提供逐步的细粒度信号以进行局部行为细化。这种分层意图公式化为人形控制施加了结构化的归纳偏置，改善了语义对齐和行为自然性。此外，MIND将人形状态编码到潜在空间中，以实现更有效的语义意图建模。大量实验表明，MIND优于现有方法，并能从文本命令中合成连贯、物理合理且语义对齐的人形行为。我们的代码将发布以促进未来研究。

英文摘要

Enabling physics-based humanoids to execute diverse behaviors from high-level textual commands remains a significant challenge. Existing methods typically follow either a two-stage paradigm that combines kinematic motion generation with physics-based tracking, or an end-to-end imitation-learning paradigm that directly generates actions from text. However, the former suffers from the inherent domain shift between kinematic generation and physics-based tracking, while the latter struggles with the substantial modality gap between textual commands and low-level actions, limiting effective semantic alignment. Notably, humanoid states encode rich motion dynamics that are more semantically aligned with textual descriptions than low-level actions, making them a natural basis for deriving behavioral intent. Building upon this insight, we propose MIND, a novel end-to-end diffusion framework for text-driven physics-based humanoid control that leverages behavioral intent as a semantic bridge between textual commands and low-level actions. At its core, MIND introduces a multi-scale intent diffusion mechanism, where a holistic intent predictor captures global behavioral dynamics to guide overall behavior synthesis, while an immediate intent predictor provides step-wise, fine-grained signals for local behavior refinement at each diffusion step. This hierarchical intent formulation imposes a structured inductive bias for humanoid control, improving semantic alignment and behavioral naturalness. Furthermore, MIND encodes humanoid states into a latent space to enable more effective semantic intent modeling. Extensive experiments demonstrate that MIND outperforms existing methods and synthesizes coherent, physically plausible, and semantically aligned humanoid behaviors from text commands. Project page: https://binlee26.github.io/MIND_page.


2605.30313 2026-06-03 cs.RO 版本更新

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

UniLab: 超越GPU主导范式的机器人强化学习异构架构

Yufei Jia, Zhanxiang Cao, Mingrui Yu, Heng Zhang, Shenyu Chen, Dixuan Jiang, Meng Li, Xiaofan Li, Yiyang Liu, Junzhe Wu, Zheng Li, XiLin Fang, Ting-Yu Tsui, Shengcheng Fu, Haoyang Li, Anqi Wang, Zifan Wang, Dongjie Zhu, Chenyu Cao, Zhenbiao Huang, Ziang Zheng, Jie Lu, Xin Ma, Zhengyang Wei, Xiang Zhao, Tianyue Zhan, Ye He, Yuxiang Chen, Yizhou Jiang, Yue Li, Haizhou Ge, Yuhang Dong, Fan Jia, Ziheng Zhang, Meng Zhang, Xiwa Deng, Zhixing Chen, Hanyang Shao, Chenxin Dong, Yixuan Li, Yizhi Chen, Bokui Chen, Kaifeng Zhang, Hanqing Cui, Yusen Qin, Ruqi Huang, Lei Han, Tiancai Wang, Xiang Li, Yue Gao, Guyue Zhou

发表机构 * THU（清华大学）； SJTU（上海交通大学）； SII ； Motphys ； HITSZ（哈尔滨工业大学（深圳））； BIT（北京理工大学）； NEU（东北大学）； SUSTech（南方科技大学）； TJU（天津大学）； DISCOVER Robotics ； HKUST(GZ)（香港科技大学（广州））； Galbot ； NUS（新加坡国立大学）； WTU（武汉纺织大学）； HBUT（湖北工业大学）； AMD ； NJU（南京大学）； ZJU（浙江大学）； Dexmal ； Sharpa ； D-Robotics

AI总结提出UniLab异构CPU-仿真/GPU-学习架构，通过统一运行时解耦CPU并行仿真与GPU策略更新，在相同硬件配置下将端到端训练效率提升3-10倍，并减少对NVIDIA CUDA的依赖。

详情

AI中文摘要

基于仿真的当代机器人控制强化学习日益围绕GPU驻留仿真组织：物理、轨迹收集和学习都放在单个以GPU为中心的执行路径上。这种范式极大地提高了训练速度，但也鼓励了一种默认假设，即高效训练需要物理位于GPU上。我们重新审视这一假设。我们的观点是，在仿真主导的机器人控制中，关键问题不是哪个处理器运行物理，而是仿真吞吐量、策略学习和运行时同步是否形成高效的端到端循环。我们提出了UniLab，一种异构CPU-仿真/GPU-学习架构，通过统一的数据移动、缓冲和同步运行时，将CPU并行仿真与GPU策略更新解耦。UniLab实现为一个完整且可扩展的训练系统，使用MuJoCoUni和MotrixSim CPU批处理物理后端，支持PPO、FastSAC、FlashSAC和APPO。在代表性的基于仿真的机器人控制任务上，UniLab在相同硬件配置下将端到端训练效率提升了3-10倍，同时减少了对基于NVIDIA CUDA的软件栈的依赖，并支持在Apple macOS平台以及AMD ROCm和Intel XPU加速器后端上的跨平台执行。这些结果表明，GPU仿真是高效训练的有效路径，但不是必需的路径，拓宽了机器人强化学习训练可用的实际系统选择。项目页面：https://unilabsim.github.io。

英文摘要

Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved training speed, but it has also encouraged a default assumption that efficient training requires physics to reside on the GPU. We revisit this assumption. Our view is that, in simulation-dominated robot control, the essential question is not which processor runs physics, but whether simulation throughput, policy learning, and runtime synchronization form an efficient end-to-end loop. We present UniLab, a heterogeneous CPU-simulation / GPU-learning architecture that decouples CPU-parallel simulation from GPU policy updates through a unified runtime for data movement, buffering, and synchronization. UniLab is implemented as a complete and extensible training system using MuJoCoUni and MotrixSim CPU-batched physics backends, supporting PPO, FastSAC, FlashSAC, and APPO. On representative simulation-based robot control tasks, UniLab improves end-to-end training efficiency by 3--10$\times$ under the same hardware configuration, while reducing dependence on the NVIDIA CUDA-based software stack and supporting cross-platform execution on the Apple macOS platform and the AMD ROCm and Intel XPU accelerator backends. These results show that GPU simulation is an effective path to efficient training, but not a necessary one, broadening the practical system choices available for robot RL training. Project page: https://unilabsim.github.io.


2605.29663 2026-06-03 cs.RO 版本更新

EXACT-MPPI: Exact Signed-Distance Navigation for Arbitrary-Footprint Robots from Point Clouds via Path Integral Control

EXACT-MPPI：通过路径积分控制实现点云中任意足迹机器人的精确有符号距离导航

Chen Peng, Zhikang Ge, Wenwu Lu, Haiming Gao, Stavros Vougioukas, Peng Wei

发表机构 * ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China（浙江大学杭州全球科技创新中心）； College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China（浙江大学生物系统工程与食品科学学院）； Department of Biological and Agricultural Engineering, University of California, Davis, Davis, California, USA（加州大学戴维斯分校生物与农业工程系）

AI总结提出EXACT-MPPI框架，将解析精确有符号距离评估器嵌入模型预测路径积分控制器，无需中间地图表示，直接处理点云实现任意形状足迹机器人的安全导航。

详情

AI中文摘要

地面机器人通常携带有效载荷、工具或其他附件，使其有效足迹变成复杂的非凸形状。在杂乱环境中安全导航需要考虑到这种真实几何形状，然而大多数局部规划器使用凸或膨胀代理简化它，并将传感器数据栅格化为占用网格或距离场。当间隙与足迹几何形状相当时，这两种选择都会消除可行运动。我们提出EXACT-MPPI，一种无需训练的局部导航框架，将局部点云观测和稀疏引导直接映射到运动命令，无需任何中间地图表示。该框架将解析的精确有符号距离评估器嵌入模型预测路径积分（MPPI）控制器中。足迹表示为简单多边形，适用于一般凸或凹平面形状，并具有矩形覆盖特化以加速直线足迹的评估，从而实现足迹感知碰撞成本，无需凸分解、膨胀或学习编码器。在每个MPPI rollout期间，观测到的障碍物点被变换到预测的机体坐标系中，并针对足迹进行评估。所有操作在JAX中批处理，利用GPU并行性实现实时滚动时域控制。实验表明，EXACT-MPPI在批处理距离评估上比学习的点到机器人基线更快，在凸足迹规划器失败的地方保留了可行运动，并在密集静态和移动障碍物下保持鲁棒性。相同的框架通过仅更改足迹描述和运动模型即可部署在差速驱动、阿克曼、全向和混合模式平台上，无需针对每个平台进行训练。因此，将精确足迹几何与基于采样的预测控制相结合，为跨不同机器人的足迹感知局部导航提供了一种实用的、无需训练的途径。

英文摘要

Ground robots often carry payloads, implements, or other attachments that turn their effective footprint into complex, non-convex shapes. Navigating safely through clutter then requires reasoning about this true geometry, yet most local planners simplify it with convex or inflated proxies and rasterize sensor data into occupancy grids or distance fields. Both choices eliminate feasible motions when clearance is comparable to the footprint geometry. We present EXACT-MPPI, a training-free local navigation framework that maps local point-cloud observations and sparse guidance directly to motion commands, without any intermediate map representation. The framework embeds an analytic, exact signed-distance evaluator into a Model Predictive Path Integral (MPPI) controller. The footprint is represented as a simple polygon for general convex or concave planar shapes, with a rectangle-cover specialization for faster evaluation of rectilinear footprints, enabling footprint-aware collision costs without convex decomposition, inflation, or learned encoders. During each MPPI rollout, observed obstacle points are transformed into the predicted body frame and evaluated against the footprint. All operations are batched in JAX, leveraging GPU parallelism for real-time receding-horizon control. Experiments show that EXACT-MPPI accelerates batched distance evaluation over a learned point-to-robot baseline, preserves feasible motion where convex-footprint planners fail, and remains robust under dense static and moving obstacles. The same framework deploys on differential-drive, Ackermann, omnidirectional, and hybrid-mode platforms by changing only the footprint description and motion model without per-platform training. Pairing exact footprint geometry with sampling-based predictive control thus offers a practical, training-free path to footprint-aware local navigation across diverse robots.
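
The core geometric query described above, an exact signed distance from an obstacle point (already expressed in the body frame) to a simple, possibly concave footprint polygon, can be sketched in a few lines. This is a plain-Python illustration written for readability; the abstract states that the actual system batches these operations in JAX, and the function names and the L-shaped footprint below are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0
    else:
        # Projection parameter of p onto the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_polygon(p, polygon):
    """Even-odd ray casting; valid for convex and concave simple polygons."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def signed_distance(p, polygon):
    """Negative inside the footprint (collision), positive outside."""
    n = len(polygon)
    d = min(point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
            for i in range(n))
    return -d if inside_polygon(p, polygon) else d


# Hypothetical L-shaped footprint (metres, body frame).
L_FOOTPRINT = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
in_notch = signed_distance((1.5, 1.5), L_FOOTPRINT)   # free space inside the notch
in_body = signed_distance((0.5, 0.5), L_FOOTPRINT)    # inside the footprint
```

A point in the notch of the L lies inside the footprint's bounding box and convex hull, so a convex or inflated proxy would report a collision there, while the exact query returns a positive clearance. That difference is what preserves feasible motions in tight spaces.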


2605.25051 2026-06-03 cs.RO 版本更新

A Decentralized LiDAR-SLAM System with Certifiably Optimal Pose Graph Optimization

一种具有可认证最优位姿图优化的去中心化LiDAR-SLAM系统

Baoshan Song, Feng Huang, Li-Ta Hsu

发表机构 * The Hong Kong Polytechnic University（香港理工大学）

AI总结针对多机器人去中心化LiDAR-SLAM全局一致性问题，提出首个集成可认证最优位姿图优化后端的系统，利用黎曼块坐标下降算法实现全局一致轨迹估计，无需精确初始猜测，轨迹RMSE相比DiSCo-SLAM最高降低48.9%。

Comments In Proceedings of the IEEE International Conference on Robotics & Automation (ICRA'26) 1st Workshop on Robot Meets GNSS and Ranging for Seamless Autonomy, Vienna, Austria, Jun. 5, 2026

2605.22018 2026-06-03 cs.CV cs.AI cs.RO 版本更新

FRED: A Multi-Modal Autonomous Driving Dataset for Flooded Road Environments

FRED：面向洪水道路环境的多模态自动驾驶数据集

Connor Malone, Sebastien Demmel, Sebastien Glaser

发表机构 * Queensland University of Technology（昆士兰理工大学）； ARC Training Centre for Automated Vehicles in Rural and Remote Regions (AVR3)（农村和偏远地区自动化车辆培训中心（AVR3））

AI总结提出首个针对道路水险场景的多模态自动驾驶数据集FRED，包含相机、LiDAR和IMU数据，并提供语义标签以支持水险检测方法训练与评估。

详情

AI中文摘要

洪水道路环境数据集（FRED）是，据我们所知，首个专门针对道路水险场景数据收集的多模态自动驾驶数据集。该数据集包含来自2.3 MP FLIR Blackfly USB3相机的图像、来自Ouster OS1-64 LiDAR的64线360度点云，以及由Geoflex RTK GNSS校正的iXblue ATLANS-C IMU数据，数据采集自五个不同地点，涵盖洪水期间和洪水之后。数据以两种格式发布：KITTI风格格式，便于与现有数据工具集成；以及RTMaps格式，用于直接回放车辆的数据捕获。我们提供语义标签，以支持用于水险检测的单传感器和传感器融合方法的训练与评估。提供位置和速度数据，以及干燥条件下捕获的数据，以支持可能包含地图的基于位置的检测方法开发，并评估其他任务，如定位和SLAM。

英文摘要

The Flooded Road Environments Dataset (FRED) is, to our knowledge, the first multi-modal autonomous driving dataset specifically targeting the collection of data from scenarios involving water hazards on the road. The dataset contains images from a 2.3 MP FLIR Blackfly USB3 camera, 64-beam 360 degree point clouds from an Ouster OS1-64 LiDAR, and data from an iXblue ATLANS-C IMU corrected by a Geoflex RTK GNSS, from five separate locations captured both during and after flooding events. The data has been released in two formats: a KITTI-style format for easy integration with existing data tools, and the RTMaps format for direct replay of the vehicle's data capture. We provide semantic labels to enable the training and evaluation of both single-sensor and sensor-fusion methods for water hazard detection. Position and velocity, as well as data captured under dry conditions, are provided to enable the development of location-based detection methods that may incorporate maps, and to evaluate other tasks such as localisation and SLAM.


2605.16816 2026-06-03 cs.RO 版本更新

"I'm Not Mad, Just Focused": Understanding Human Emotions in Human-Robot Collaboration

“我没生气，只是专注”：理解人机协作中的人类情绪

Seung Chan Hong, Dana Kulić, Leimin Tian

发表机构 * Faculty of Engineering, Monash University（莫纳什大学工程学院）； CSIRO Robotics（CSIRO机器人实验室）

AI总结提出基于视觉语言模型（VLM）的情绪识别系统，利用上下文理解改善人机协作中的情绪解读，实验表明其语义相似性和情感对齐优于基线CNN系统，且用户偏好情绪自适应机器人行为。

详情

DOI: 10.1109/LRA.2026.3694591
Journal ref: IEEE Robotics and Automation Letters, vol. 11, no. 7, pp. 8260-8267, July 2026

AI中文摘要

人机协作（HRC）可以从机器人解读人类情绪状态的能力中受益。然而，当前HRC中的情绪识别（ER）模型往往表现不足，特别是因为它们依赖于表演数据集和单一模态输入（如面部表情）。我们提出了一种新颖的基于视觉语言模型（VLM）的ER系统，利用上下文理解来改善HRC中的情绪解读。我们首先通过评估VLM-ER系统与现有HRC数据集上人工标注的语义和情感相似性来对其进行评估。然后，在协作配送任务的用户研究中，我们评估了基于VLM-ER系统推断的用户情绪状态来调节机器人行为的效果。结果表明，与基线卷积神经网络系统相比，所提出的VLM-ER系统实现了更高的人工标注语义相似性和正向情感对齐。此外，用户研究中的参与者更喜欢由VLM-ER系统促进的情绪自适应机器人行为。

英文摘要

Human-robot collaboration (HRC) can benefit from robots' abilities to interpret human emotional states. However, current emotion recognition (ER) models in HRC often fall short, particularly due to their reliance on acted datasets and single-modality inputs like facial expressions. We propose a novel vision language model (VLM)-based ER system that leverages contextual understanding to improve emotion interpretation in HRC. We first evaluate the VLM-ER system by assessing its semantic and sentiment similarity with human annotations on an existing HRC dataset. Then, in a user study with a service robot in a collaborative delivery task, we evaluate the effects of modulating the robot's behaviour based on the user's emotional state inferred by the VLM-ER system. The results show that the proposed VLM-ER system achieves higher semantic similarity and positive sentiment alignment with human annotations compared to a baseline convolutional neural network-based system. Further, participants in the user study preferred emotion-adaptive robot behaviour facilitated by the VLM-ER system.


2604.19275 2026-06-03 eess.SY cs.OS cs.RO cs.SY 版本更新

Scheduling Analysis of UAV Flight Control Workloads on PREEMPT_RT Linux Using a Raspberry Pi 5

基于Raspberry Pi 5的PREEMPT_RT Linux上无人机飞行控制工作负载的调度分析

Luiz Giacomossi, Håkan Forsberg, Ivan Tomasic, Baran Çürüklü, Tommaso Cucinotta

发表机构 * Mälardalen University（梅拉达伦大学）； ReTiS Lab, Scuola Superiore Sant’Anna（ReTiS实验室，圣安娜高等学院）

AI总结通过分析Raspberry Pi 5上PREEMPT_RT Linux内核的激活路径对250 Hz控制回路的影响，发现标准内核最差延迟超过9 ms，而PREEMPT_RT将最差延迟降低约88%至225微秒以下，但剩余抖动主要由硬件内存争用引起。

Comments 9 pages, 8 figures, conference

详情

AI中文摘要

现代无人机架构日益趋向于将高级自主性和低级飞行控制统一在单个通用操作系统（GPOS）上。然而，复杂的多核片上系统（SoC）由于共享资源争用引入了显著的时间不确定性。本文对Raspberry Pi 5上的PREEMPT_RT Linux内核进行了架构分析，特别隔离了内核激活路径（延迟执行的SoftIRQ与实时直接激活）对250 Hz控制回路的影响。结果表明，在高负载下，标准内核不适合，最差延迟超过9毫秒。相比之下，PREEMPT_RT将最差延迟降低了近88%，降至225微秒以下，通过强制直接唤醒路径减轻了操作系统噪声。这些发现表明，虽然PREEMPT_RT解决了调度方差问题，但现代SoC上的剩余抖动主要由硬件内存争用驱动。

英文摘要

Modern UAV architectures increasingly aim to unify high-level autonomy and low-level flight control on a single General-Purpose Operating System (GPOS). However, complex multi-core System-on-Chips (SoCs) introduce significant timing indeterminism due to shared resource contention. This paper performs an architectural analysis of the PREEMPT_RT Linux kernel on a Raspberry Pi 5, specifically isolating the impact of kernel activation paths (deferred execution SoftIRQs versus real-time direct activation) on a 250 Hz control loop. Results show that under heavy stress, the standard kernel is unsuitable, exhibiting worst-case latencies exceeding 9 ms. In contrast, PREEMPT_RT reduced the worst-case latency by nearly 88 percent to under 225 microseconds, enforcing a direct wake-up path that mitigates OS noise. These findings demonstrate that while PREEMPT_RT resolves scheduling variance, the residual jitter on modern SoCs is primarily driven by hardware memory contention.
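
To put the reported latencies in context, it helps to compare them with the period of the 250 Hz control loop. The short calculation below does only that arithmetic; the helper names are hypothetical, and because the abstract gives bounds ("exceeding 9 ms", "under 225 microseconds"), the resulting ratios are bounds as well.

```python
def control_period_us(rate_hz: float) -> float:
    """Period of a periodic control loop, in microseconds."""
    return 1_000_000.0 / rate_hz


def latency_in_periods(latency_us: float, rate_hz: float) -> float:
    """Wake-up latency expressed as a multiple of the loop period."""
    return latency_us / control_period_us(rate_hz)


period = control_period_us(250.0)                     # 4000 us, i.e. 4 ms per cycle
standard_kernel = latency_in_periods(9_000.0, 250.0)  # more than 2 full periods
preempt_rt = latency_in_periods(225.0, 250.0)         # under 6% of one period
```

A worst-case latency above 9 ms spans more than two 4 ms control periods, so whole control cycles can be missed, whereas a latency below 225 microseconds consumes only a small fraction of a single period.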


2603.23117 2026-06-03 cs.CR cs.AI cs.RO 版本更新

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

TRAP: 通过对抗性补丁劫持VLA的CoT推理

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji, Wenyuan Xu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出TRAP攻击，利用对抗性补丁劫持视觉-语言-动作模型的链式推理，实现目标行为操控。

Comments Accepted by ICML 2026

详情

AI中文摘要

通过集成链式推理，视觉-语言-动作模型在机器人操作中展现出强大能力，特别是在提升泛化性和可解释性方面。然而，基于CoT的推理机制的安全性尚未得到充分探索。在本文中，我们证明CoT推理引入了一种新的攻击向量，用于目标行为劫持——例如，导致机器人错误地将刀递给一个人而不是苹果——而无需修改用户的指令。我们首先提供经验证据表明，即使CoT与输入指令在语义上不一致，它仍然强烈主导动作生成。基于这一观察，我们提出TRAP，这是首个针对CoT推理VLA模型的目标行为劫持对抗性攻击。通过针对推理到动作的路径，TRAP使用对抗性补丁（例如，放置在桌子上的桌布）来引导中间CoT推理和下游动作朝向对手定义的行为。在三个代表性推理VLA上的广泛评估，涵盖了不同的CoT推理机制，证明了TRAP的有效性。值得注意的是，我们在现实环境中通过将补丁打印在纸上实现了该攻击。我们的发现凸显了保护VLA系统中CoT推理的紧迫性。项目页面可在https://zhengxian-huang.github.io/TRAP-website/获取。

英文摘要

By integrating Chain-of-Thought (CoT) reasoning, Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, particularly by improving generalization and interpretability. However, the security of CoT-based reasoning mechanisms remains largely unexplored. In this paper, we show that CoT reasoning introduces a novel attack vector for targeted behavior hijacking--for example, causing a robot to mistakenly deliver a knife to a person instead of an apple--without modifying the user's instruction. We first provide empirical evidence that CoT strongly governs action generation, even when it is semantically misaligned with the input instructions. Building on this observation, we propose TRAP, the first targeted behavior-hijacking adversarial attack against CoT-reasoning VLA models. By targeting the reasoning-to-action pathway, TRAP uses an adversarial patch (e.g., a tablecloth placed on the table) to steer intermediate CoT reasoning and downstream actions toward adversary-defined behaviors. Extensive evaluations on three representative reasoning VLAs, spanning distinct CoT reasoning mechanisms, demonstrate the effectiveness of TRAP. Notably, we implemented the patch by printing it on paper in a real-world setting. Our findings highlight the urgent need to secure CoT reasoning in VLA systems. The project page is available at https://zhengxian-huang.github.io/TRAP-website/.


2602.04132 2026-06-03 eess.SY cs.LG cs.RO cs.SY 版本更新

LC-SAC: Lyapunov-Constrained Soft Actor-Critic via Koopman Operator Theory for Trajectory Tracking and Stabilization

LC-SAC: 基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法用于轨迹跟踪与镇定

Dhruv S. Kushwaha, Zoleikha A. Biron

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种结合Koopman算子理论的李雅普诺夫约束软演员-评论家算法，通过线性提升动力学模型和闭环控制李雅普诺夫函数实现轨迹跟踪与镇定，并引入条件风险价值约束处理罕见但严重的失稳事件。

Comments 13 pages, 8 Figures

详情

AI中文摘要

强化学习在解决复杂序列决策问题中取得了显著成功，但其在安全关键物理系统中的应用仍受限于缺乏稳定性保证。标准强化学习算法优先考虑奖励最大化，往往产生可能引起振荡或无界状态发散的策略。本文提出一种基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法。我们通过扩展动态模态分解学习误差动力学的线性提升代理模型，并求解离散代数Riccati方程以获得闭式二次候选控制李雅普诺夫函数。该控制李雅普诺夫函数作为拉格朗日惩罚项被纳入SAC演员更新中，通过条件风险价值目标聚合最坏情况尾部分布，将约束压力集中在罕见但严重的失稳事件上。我们进一步引入三种结构性的EDMD改进：在求解DARE之前对提升的A矩阵进行谱半径归一化、具有物理意义的LQR状态代价，以及强制V(0)=0的值偏置锚点，使得闭式控制李雅普诺夫函数对于更高维的提升模型（如倒立摆和3D四旋翼）是适定的。消融研究表明，硬拉格朗日约束是必要的，将其替换为奖励塑形会导致学习不稳定并在四旋翼任务中导致回报崩溃。

英文摘要

Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. In this work, we propose a Lyapunov-Constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We learn a linear lifted surrogate of the error dynamics via Extended Dynamic Mode Decomposition (EDMD) and solve the Discrete Algebraic Riccati Equation (DARE) to obtain a closed-form quadratic candidate Control Lyapunov Function (CLF). This CLF is incorporated into the SAC actor update as a Lagrangian penalty that aggregates the worst-case tail of violations via a Conditional Value-at-Risk (CVaR) objective, concentrating constraint pressure on rare but severe instability events. We further introduce three structural EDMD refinements (spectral-radius normalization of the lifted A-matrix prior to the DARE solve, a physically meaningful LQR state cost, and a value-bias anchor enforcing $V(0)=0$) that make the closed-form CLF well-posed for higher-dimensional lifted models such as the cartpole and 3D quadrotor. The ablation study shows that a hard Lagrangian constraint is essential: replacing it with reward shaping (Lyap-RS-SAC) destabilizes learning and collapses return on quadrotor tasks.
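
The CVaR-aggregated Lyapunov penalty described above can be sketched as follows. The quadratic form $V(z) = z^\top P z$ and the use of CVaR over violations follow the abstract; the specific decrease condition $V(z') \le (1-\rho)V(z)$, the decay rate $\rho$, the tail level, and all function names are assumptions chosen here for illustration rather than the paper's exact formulation.

```python
import math


def quadratic_clf(z, P):
    """V(z) = z^T P z for a lifted state z and a symmetric matrix P."""
    n = len(z)
    return sum(z[i] * P[i][j] * z[j] for i in range(n) for j in range(n))


def clf_violation(z, z_next, P, decay=0.1):
    """Amount by which the assumed decrease condition
    V(z_next) <= (1 - decay) * V(z) is violated (0 if satisfied)."""
    return max(0.0, quadratic_clf(z_next, P) - (1.0 - decay) * quadratic_clf(z, P))


def cvar(values, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of `values`."""
    # round() guards against floating-point noise such as 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(values), 9)))
    worst = sorted(values, reverse=True)[:k]
    return sum(worst) / k


def lagrangian_penalty(violations, multiplier, alpha=0.9):
    """Penalty term added to the actor loss: lambda * CVaR_alpha(violations)."""
    return multiplier * cvar(violations, alpha)


# Hypothetical batch: most transitions satisfy the condition, two do not.
batch_violations = [0.0] * 8 + [1.0, 3.0]
penalty = lagrangian_penalty(batch_violations, multiplier=2.0, alpha=0.8)
```

Averaging only the worst tail, instead of the whole batch, keeps the penalty from being diluted by the many transitions that already satisfy the decrease condition.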


2602.06219 2026-06-03 cs.RO cs.AI 版本更新

Coupled Local and Global World Models for Efficient First Order RL

耦合局部与全局世界模型的高效一阶强化学习

Joseph Amigo, Rooholla Khorrambakht, Nicolas Mansard, Ludovic Righetti

发表机构 * Machines in Motion Laboratory, New York University, USA（纽约大学运动机器实验室）； LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France（图卢兹大学LAAS-CNRS中心）； Artificial and Natural Intelligence Toulouse Institute, Toulouse, France（图卢兹人工智能与自然智能研究所）

AI总结提出一种通过解耦一阶梯度方法在数据驱动的世界模型内训练策略的方法，结合局部和全局世界模型实现高效梯度计算，在Push-T任务和四足机器人操作任务中显著优于PPO。

Comments Project website: https://coupled-global-local-wm-rl.pages.dev/

详情

AI中文摘要

世界模型为在标准模拟器难以处理的情况下更忠实地捕捉复杂动力学（包括接触和非刚性）以及复杂感官信息（如视觉感知）提供了一条有前景的途径。然而，这些模型的计算复杂度高，对流行的强化学习方法构成了挑战，这些方法已成功用于模拟器解决复杂运动任务，但在操作任务上仍存在困难。本文介绍了一种完全绕过模拟器的方法，在从机器人与真实环境交互中学习到的世界模型内部训练强化学习策略。其核心是通过一种新颖的解耦一阶梯度方法实现大规模扩散模型的策略训练：全尺度世界模型生成准确的前向轨迹，而轻量级潜在空间代理近似其局部动力学以实现高效梯度计算。这种局部与全局世界模型的耦合确保了高保真展开以及计算上可处理的微分。我们在Push-T操作任务上证明了该方法的有效性，其在样本效率上显著优于PPO。我们还通过四足机器人的自我中心物体操作任务进一步评估了该方法。这些结果共同表明，在数据驱动的世界模型内部学习是解决难以建模的图像空间强化学习任务的一条有前景的途径，无需依赖手工设计的物理模拟器。

英文摘要

World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally expensive to evaluate, posing a challenge for popular RL approaches that have been used successfully with simulators to solve complex locomotion tasks yet still struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach through an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.


2512.19347 2026-06-03 cs.RO 版本更新

OMP: One-step Meanflow Policy with Directional Alignment

OMP: 一步均值流策略与方向对齐

Han Fang, Yize Huang, Yuheng Zhao, Paul Weng, Xiao Li, Yutong Ban

发表机构 * School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China（上海交通大学机械工程学院）； Global College, Shanghai Jiao Tong University, Shanghai, China（上海交通大学全球学院）； Duke Kunshan University, Jiangsu, China（杜克昆山大学）

AI总结提出一步均值流策略（OMP），通过方向对齐机制和微分推导方程解决均值流在机器人操作中的谱偏差和梯度饥饿问题，实现高保真实时操控。

Comments Accepted as poster of ICML-2026

详情

AI中文摘要

机器人操作日益采用数据驱动的生成策略框架，但该领域面临持续的权衡：扩散模型推理延迟高，而基于流的方法通常需要复杂的架构约束。尽管在图像生成领域，均值流范式提供了单步推理的路径，但其直接应用于机器人领域受到关键理论病理的阻碍，特别是低速度区域中的谱偏差和梯度饥饿。为克服这些限制，我们提出了一步均值流策略（OMP），一种专为高保真实时操作设计的新型框架。我们引入轻量级方向对齐机制，以显式同步预测速度与真实均值速度。此外，我们实现了微分推导方程（DDE）来近似雅可比向量积（JVP）算子，该算子解耦前向和后向传播，显著降低内存复杂度。在Adroit和Meta-World基准上的大量实验表明，OMP在成功率和轨迹精度上优于最先进方法，特别是在高精度任务中，同时保持了单步生成的效率。

英文摘要

Robot manipulation has increasingly adopted data-driven generative policy frameworks, yet the field faces a persistent trade-off: diffusion models suffer from high inference latency, while flow-based methods often require complex architectural constraints. Although the MeanFlow paradigm offers a path to single-step inference in the image generation domain, its direct application to robotics is impeded by critical theoretical pathologies, specifically spectral bias and gradient starvation in low-velocity regimes. To overcome these limitations, we propose the One-step MeanFlow Policy (OMP), a novel framework designed for high-fidelity, real-time manipulation. We introduce a lightweight directional alignment mechanism to explicitly synchronize predicted velocities with true mean velocities. Furthermore, we implement a Differential Derivation Equation (DDE) to approximate the Jacobian-Vector Product (JVP) operator, which decouples forward and backward passes to significantly reduce memory complexity. Extensive experiments on the Adroit and Meta-World benchmarks demonstrate that OMP outperforms state-of-the-art methods in success rate and trajectory accuracy, particularly in high-precision tasks, while retaining the efficiency of single-step generation.
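The abstract does not give the form of the directional alignment term. One plausible reading, sketched below purely as an assumption, is a cosine-based penalty between the predicted velocity and the target mean velocity. The function name and the `eps` guard are hypothetical. Such a term depends only on direction, so it does not shrink when the target velocity is small.

```python
import math


def directional_alignment_loss(pred, target, eps=1e-8):
    """Assumed cosine-style alignment: 0 when parallel, 1 when orthogonal, 2 when opposite."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    # eps guards against division by zero for (near-)zero velocities
    return 1.0 - dot / (norm_p * norm_t + eps)


# Usage: scaling the target does not change the loss, only its direction does.
small = directional_alignment_loss([1.0, 0.0], [0.0, 0.001])
large = directional_alignment_loss([1.0, 0.0], [0.0, 10.0])
```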


2512.22539 2026-06-03 cs.RO cs.CV 版本更新

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

VLA-Arena：一个用于基准测试视觉-语言-动作模型的开源框架

Borong Zhang, Jiahao Li, Jiachen Shen, Yuhao Zhang, Yishuai Cai, Yuanpei Chen, Juntao Dai, Jiaming Ji, Yaodong Yang

AI总结提出VLA-Arena基准，通过三正交轴（任务结构、语言命令、视觉观察）量化任务难度，系统评估视觉-语言-动作模型的能力边界与失败模式。

Comments Accepted by ICML 2026

详情

AI中文摘要

Plan-R1：安全且可行的轨迹规划作为语言建模

Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen

发表机构 * Institute of Computing Technology, Chinese Academy of Sciences（中国科学院计算技术研究所）； University of Chinese Academy of Sciences（中国科学院大学）

AI总结提出Plan-R1两阶段轨迹规划框架，通过原则对齐与行为学习解耦，结合规则奖励和方差解耦GRPO，显著提升自动驾驶规划的安全性和可行性。

Comments Accepted by ICLR2026

详情

AI中文摘要

安全且可行的轨迹规划对于现实世界的自动驾驶系统至关重要。然而，现有的基于学习的规划器严重依赖专家演示，这不仅缺乏明确的安全意识，还可能继承次优人类驾驶数据中的不良行为（如超速）。受大型语言模型成功的启发，我们提出了Plan-R1，一种两阶段轨迹规划框架，将原则对齐与行为学习解耦。在第一阶段，通用轨迹预测器在专家数据上进行预训练，以捕获多样化的、类人的驾驶行为。在第二阶段，使用基于规则的奖励通过组相对策略优化（GRPO）对模型进行微调，明确地将自我规划与安全、舒适和交通规则遵守等原则对齐。这种两阶段范式保留了类人行为，同时增强了安全意识并丢弃了演示中的不良模式。此外，我们识别了直接应用GRPO到规划的一个关键限制：组级归一化消除了跨组的尺度差异，导致罕见、高方差的安全违规组与大量低方差的安全组具有相似的优势，从而抑制了对安全关键目标的优化。为解决此问题，我们提出了方差解耦GRPO（VD-GRPO），用中心化和固定缩放替代归一化以保留绝对奖励幅度，确保安全关键目标在整个训练过程中保持主导地位。在nuPlan基准上的实验表明，Plan-R1显著提高了规划的安全性和可行性，达到了最先进的性能，特别是在现实反应性设置中。我们的代码可在https://github.com/XiaolongTang23/Plan-R1获取。

英文摘要

Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting undesirable behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. In the first stage, a general trajectory predictor is pre-trained on expert data to capture diverse, human-like driving behaviors. In the second stage, the model is fine-tuned with rule-based rewards using Group Relative Policy Optimization (GRPO), explicitly aligning ego planning with principles such as safety, comfort, and traffic rule compliance. This two-stage paradigm retains human-like behaviors while enhancing safety awareness and discarding undesirable patterns from demonstrations. Furthermore, we identify a key limitation of directly applying GRPO to planning: group-wise normalization erases cross-group scale differences, causing rare, high-variance safety-violation groups to have similar advantages as abundant low-variance safe groups, thereby suppressing optimization for safety-critical objectives. To address this, we propose Variance-Decoupled GRPO (VD-GRPO), which replaces normalization with centering and fixed scaling to preserve absolute reward magnitudes, ensuring that safety-critical objectives remain dominant throughout training. Experiments on the nuPlan benchmark demonstrate that Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance, particularly in realistic reactive settings. Our code is available at https://github.com/XiaolongTang23/Plan-R1.
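The limitation of group-wise normalization described above can be seen in a small numeric sketch. This is an illustration under assumptions, not the Plan-R1 code: the function names, the `scale` constant, and the reward values are hypothetical. Dividing by the group standard deviation makes a low-stakes group and a high-stakes group produce nearly the same advantages, whereas centering with a fixed scale keeps their magnitudes apart.

```python
from statistics import fmean, pstdev


def normalized_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: center, then divide by the group's own std."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def centered_advantages(rewards, scale=1.0):
    """Centering with a fixed scale (assumed constant) keeps absolute reward gaps."""
    mu = fmean(rewards)
    return [(r - mu) / scale for r in rewards]


# Two hypothetical groups: a small reward gap and a large, safety-critical gap.
mild_group = [1.0, 0.0]
critical_group = [10.0, 0.0]

same_after_norm = (normalized_advantages(mild_group), normalized_advantages(critical_group))
kept_apart = (centered_advantages(mild_group), centered_advantages(critical_group))
```

Under normalization both groups yield advantages of roughly +1 and -1. Under centering the critical group yields +5 and -5 against +0.5 and -0.5, so it contributes a proportionally larger update.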


2509.20623 2026-06-03 cs.RO 版本更新

Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation

潜在激活编辑：基于推理时策略精炼的安全多机器人导航

Satyajeet Das, Darren Chiu, Zhehui Huang, Lars Lindemann, Gaurav S. Sukhatme

发表机构 * Department of Computer Science, University of Southern California（南加州大学计算机科学系）； Automatic Control Laboratory, ETH Zürich（苏黎世联邦理工学院自动控制实验室）

AI总结提出潜在激活编辑（LAE）框架，通过在推理时在线检测并编辑中间激活，在不修改权重或架构的情况下降低预训练策略的碰撞率，在四旋翼导航中实现近90%的碰撞减少。

详情

AI中文摘要

强化学习在协调和导航多个四旋翼等复杂领域取得了显著进展。然而，即使经过良好训练的策略在障碍物密集的环境中仍然容易发生碰撞。通过重新训练或微调来解决这些罕见但关键的安全故障成本高昂，并且有损于先前学到的技能。受大语言模型中的激活引导和计算机视觉中的潜在编辑启发，我们引入了一个推理时潜在激活编辑（LAE）框架，该框架在不修改权重或架构的情况下精炼预训练策略的行为。该框架分两个阶段运行：（i）在线分类器监控中间激活以检测与不良行为相关的状态，（ii）激活编辑模块选择性地修改被标记的激活，将策略转向更安全的区域。在这项工作中，我们专注于提高多四旋翼导航的安全性。我们假设放大策略内部的风险感知可以诱导更安全的行为。我们通过训练一个潜在碰撞世界模型来实例化这一想法，该模型预测未来的碰撞前激活，从而促使更早和更谨慎的避碰响应。大量的仿真和真实Crazyflie实验表明，与未编辑的基线相比，LAE实现了统计上显著的碰撞减少（累计碰撞减少近90%），并显著增加了无碰撞轨迹的比例，同时保持了任务完成。更广泛地说，我们的结果确立了LAE作为一种轻量级范式，可在资源受限的硬件上对学习后的机器人策略进行部署后精炼。

英文摘要

Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy's internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses. Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves a statistically significant reduction in collisions (nearly 90% fewer cumulative collisions compared to the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies.
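The two-stage structure (detect, then edit) can be sketched in a few lines. This is a simplified illustration, not the LAE implementation: the logistic probe weights `w` and `b`, the blending factor `alpha`, and the fixed `target` vector are hypothetical. In the paper the target activation comes from a learned latent collision world model, which is replaced here by a constant for brevity.

```python
import math


def risk_score(h, w, b):
    """Stage (i): an assumed linear probe with a sigmoid on the hidden activation h."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))


def edit_activation(h, w, b, target, threshold=0.5, alpha=0.5):
    """Stage (ii): blend flagged activations toward a target; leave others untouched."""
    if risk_score(h, w, b) < threshold:
        return list(h)
    return [(1.0 - alpha) * hi + alpha * ti for hi, ti in zip(h, target)]


# Usage with made-up numbers: the first activation is not flagged, the second is.
w, b, target = [1.0, 1.0], -1.0, [4.0, 0.0]
untouched = edit_activation([0.0, 0.0], w, b, target)
edited = edit_activation([2.0, 2.0], w, b, target)
```

The policy weights are never modified; only the intermediate activation passed to the downstream layers changes.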


2509.18068 2026-06-03 cs.RO eess.SP 版本更新

RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

RadarSFD：基于预训练先验的单帧扩散用于雷达点云

Bin Zhao, Nakul Garg

发表机构 * Rice University（莱斯大学）

AI总结提出RadarSFD，一种条件潜在扩散框架，利用预训练单目深度估计器的几何先验，从单帧雷达数据重建密集LiDAR-like点云，无需合成孔径或多帧聚合。

Comments Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026). Project page: https://phi-lab-rice.github.io/RadarSFD/

详情

AI中文摘要

毫米波雷达在雾、烟、尘和低光环境下提供稳健的感知，使其适用于尺寸、重量和功率受限的机器人平台。现有的雷达成像方法通常依赖合成孔径或多帧聚合来提高分辨率，这对于小型空中、检测或可穿戴系统不切实际。我们提出RadarSFD，一种条件潜在扩散框架，无需运动或SAR即可从单帧雷达重建密集的LiDAR-like点云。我们的方法将预训练单目深度估计器的几何先验转移到扩散骨干中，通过通道级潜在拼接将其锚定到雷达输入，并使用结合潜在空间和像素空间损失的双空间目标进行正则化。在RadarHD基准上，RadarSFD相对于基线模型实现了最先进的性能。定性结果显示恢复了精细的墙壁和狭窄的间隙，跨新环境的实验证实了强大的泛化能力。消融研究强调了预训练初始化、雷达BEV条件和双空间损失的重要性。这些结果共同为紧凑型机器人系统中的密集点云感知建立了一个实用的单帧、无SAR毫米波雷达流水线。

英文摘要

Millimeter-wave radar provides robust perception in fog, smoke, dust, and low light, making it attractive for size-, weight-, and power-constrained robotic platforms. Existing radar imaging methods typically rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame without motion or SAR. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves state-of-the-art performance against baseline models. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish a practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.


2509.14636 2026-06-03 cs.RO 版本更新

BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots

BEV-ODOM2: 基于PV-BEV融合与密集光流监督的增强型BEV单目视觉里程计用于地面机器人

Yufei Wei, Chenxiao Hu, Wangtao Lu, Sha Lu, Yuxiang Cui, Fuzhang Han, Rong Xiong, Yue Wang

发表机构 * Tsinghua University（清华大学）

AI总结针对现有BEV方法中位姿训练稀疏监督和透视投影信息丢失的问题，提出BEV-ODOM2框架，通过密集BEV光流监督和PV-BEV融合，在四个数据集上实现40%的RTE提升，并支持边缘实时部署。

详情

AI中文摘要

尺度一致的自我运动估计是自主地面机器人的基础。鸟瞰图（BEV）表示通过提供度量尺度的平面工作空间，自然地解决了单目视觉里程计（MVO）的尺度漂移问题，使得6自由度自我运动简化为更鲁棒的3自由度模型。然而，现有的基于BEV的方法存在两个关键限制：仅从位姿训练得到的稀疏监督信号，以及透视到BEV投影过程中的信息丢失。我们提出了BEV-ODOM2，一个增强框架，无需额外标注即可解决这两个限制。我们的方法引入了（1）直接从3自由度位姿真值构建的密集BEV光流监督，用于像素级指导，以及（2）透视视图（PV）-BEV融合，在投影前计算相关体积以保留6自由度运动线索。增强的旋转采样策略进一步在训练中平衡了不同的运动模式。我们在四个不同空间尺度的数据集上进行了评估：KITTI、Oxford、NCLT和我们新收集的ZJH-VO基准。BEV-ODOM2相比之前的BEV方法实现了40%的RTE提升，在NVIDIA Jetson AGX Orin上的实时推理确认了边缘部署的可行性。源代码和ZJH-VO数据集已公开发布，以促进未来研究。

英文摘要

Scale-consistent ego-motion estimation is fundamental for autonomous ground robots. Bird's-Eye-View (BEV) representation naturally addresses the scale drift problem of monocular visual odometry (MVO) by providing a metric-scaled planar workspace, enabling the simplification of 6-DoF ego-motion to a more robust 3-DoF model. However, existing BEV-based methods suffer from two key limitations: sparse supervision signals from pose-only training, and information loss during perspective-to-BEV projection. We present BEV-ODOM2, an enhanced framework that addresses both limitations without requiring additional annotations. Our approach introduces (1) dense BEV optical flow supervision constructed directly from 3-DoF pose ground truth for pixel-level guidance, and (2) Perspective View (PV)-BEV fusion that computes correlation volumes before projection to preserve 6-DoF motion cues. An enhanced rotation sampling strategy further balances diverse motion patterns during training. We evaluate on four datasets with varied spatial scales: KITTI, Oxford, NCLT, and our newly collected ZJH-VO benchmark. BEV-ODOM2 achieves a 40% RTE improvement over prior BEV-based methods, with real-time inference on an NVIDIA Jetson AGX Orin confirming edge deployment feasibility. The source code and the ZJH-VO dataset are publicly released to facilitate future research.
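To make the idea of dense BEV flow built from a 3-DoF pose concrete, the sketch below computes the flow a static ground point would have under a planar rigid ego-motion. It is an illustration under assumed conventions, not the BEV-ODOM2 code: the function name is hypothetical, coordinates are metric ego-frame positions rather than pixel indices, and `(dx, dy, dyaw)` is the ego motion from frame t to t+1 expressed in the frame at time t.

```python
import math


def bev_flow_from_pose(points, dx, dy, dyaw):
    """Flow of static BEV points induced by planar ego-motion (dx, dy, dyaw).

    A static point p seen at time t appears at R(-dyaw) @ (p - [dx, dy])
    in the ego frame at time t+1; the flow is the difference to p.
    """
    c, s = math.cos(dyaw), math.sin(dyaw)
    flow = []
    for x, y in points:
        xr, yr = x - dx, y - dy      # undo the ego translation
        xn = c * xr + s * yr         # rotate by -dyaw
        yn = -s * xr + c * yr
        flow.append((xn - x, yn - y))
    return flow


# Usage: moving 1 m forward shifts every static point 1 m backward in BEV.
forward = bev_flow_from_pose([(2.0, 0.0), (0.0, 3.0)], 1.0, 0.0, 0.0)
```

Every grid cell receives a flow vector from a single pose label, which is how one pose can supply pixel-level supervision without extra annotation.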


2508.09606 2026-06-03 cs.RO cs.SY eess.SY 版本更新

BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots

BEAVR：用于机器人的双手、多形态、可访问的虚拟现实遥操作系统

Alejandro Posadas-Nava, Alejandro Carrasco, Richard Linares

发表机构 * Department of Aeronautics and Astronautics, Massachusetts Institute of Technology（航空与航天系，麻省理工学院）

AI总结提出BEAVR，一个开源的双手多形态VR遥操作系统，通过零拷贝流式架构和异步“思考-行动”控制循环，实现低延迟、多机器人实时控制与数据记录，并兼容多种视觉运动策略。

Comments Accepted for presentation on ICCR Kyoto 2025

详情

DOI: 10.1109/ICCR67607.2025.11372114

AI中文摘要

BEAVR是一个用于机器人的开源、双手、多形态虚拟现实（VR）遥操作系统，旨在统一异构机器人平台上的实时控制、数据记录和策略学习。BEAVR使用商用VR硬件实现实时、灵巧的遥操作，支持从7自由度机械臂到全身人形机器人的模块化集成，并直接以LeRobot数据集模式记录同步的多模态演示。我们的系统具有零拷贝流式架构，实现≤35毫秒延迟，一个用于可扩展推理的异步“思考-行动”控制循环，以及一个针对实时多机器人操作优化的灵活网络API。我们在多种操作任务上对BEAVR进行基准测试，并展示其与领先的视觉运动策略（如ACT、DiffusionPolicy和SmolVLA）的兼容性。所有代码公开可用，数据集发布在Hugging Face上（代码、数据集和VR应用可在 https://github.com/ARCLab-MIT/BEAVR-Bot 获取）。

英文摘要

BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face (code, datasets, and VR app available at https://github.com/ARCLab-MIT/BEAVR-Bot).


2205.15412 2026-06-03 cs.DC cs.MA cs.RO 版本更新

Asynchronous Deterministic Leader Election in Three-Dimensional Programmable Matter

三维可编程物质中的异步确定性领导者选举

Joseph L. Briones, Tishya Chhabra, Joshua J. Daymude, Andréa W. Richa

AI总结针对三维可编程物质，提出基于面心立方晶格的分布式算法，在非公平顺序敌手下O(n)轮内确定性选举唯一领导者，并利用并发控制框架转化为非公平异步敌手下首个领导者选举算法。

Comments 18 pages, 4 figures, 2 tables. Accepted at ICDCN 2023

详情

DOI: 10.1145/3571306.3571389
Journal ref: Proceedings of the 24th International Conference on Distributed Computing and Networking (ICDCN 2023), pp. 38-47

AI中文摘要

经过三十多年的科学努力实现可编程物质（一种可以根据用户输入或对环境响应改变其物理特性的物质），在模块化机器人系统的工程和相应的集体行为算法理论方面都取得了许多进展。然而，虽然模块化机器人的设计通常处理真实三维（3D）空间的挑战，但算法理论仍然主要关注二维（2D）抽象，如平面和平面图。在这项工作中，我们使用面心立方（FCC）晶格来表示空间并定义局部空间方向，为可编程物质的规范阿米巴模型形式化了3D几何空间变体。然后，我们给出了一种用于连通、可收缩的2D或3D几何阿米巴系统中领导者选举的分布式算法，该算法在非公平顺序敌手下确定性选举出恰好一个领导者，时间复杂度为O(n)轮，其中n是系统中的阿米巴数量。接着，我们展示了如何使用阿米巴算法的并发控制框架（DISC 2021）转换该算法，以获得已知的第一个在非公平异步敌手下解决领导者选举的阿米巴算法，适用于2D和3D空间。

英文摘要

Over three decades of scientific endeavors to realize programmable matter, a substance that can change its physical properties based on user input or responses to its environment, there have been many advances in both the engineering of modular robotic systems and the corresponding algorithmic theory of collective behavior. However, while the design of modular robots routinely addresses the challenges of realistic three-dimensional (3D) space, algorithmic theory remains largely focused on 2D abstractions such as planes and planar graphs. In this work, we formalize the 3D geometric space variant for the canonical amoebot model of programmable matter, using the face-centered cubic (FCC) lattice to represent space and define local spatial orientations. We then give a distributed algorithm for leader election in connected, contractible 2D or 3D geometric amoebot systems that deterministically elects exactly one leader in $\mathcal{O}(n)$ rounds under an unfair sequential adversary, where $n$ is the number of amoebots in the system. We then demonstrate how this algorithm can be transformed using the concurrency control framework for amoebot algorithms (DISC 2021) to obtain the first known amoebot algorithm, both in 2D and 3D space, to solve leader election under an unfair asynchronous adversary.


2105.02420 2026-06-03 cs.DC cs.ET cs.RO 版本更新

The Canonical Amoebot Model: Algorithms and Concurrency Control

规范变形虫模型：算法与并发控制

Joshua J. Daymude, Andréa W. Richa, Christian Scheideler

AI总结提出规范变形虫模型，通过消息传递和对抗性激活模型形式化并发执行，并给出两种并发算法设计方法（直接嵌入并发控制和基于锁的转换框架），以六边形形成算法为例验证。

Comments 48 pages, 7 figures, 2 tables

详情

DOI: 10.1007/s00446-023-00443-3
Journal ref: Distributed Computing (2023) 36, pp. 159-192

AI中文摘要

变形虫模型将主动可编程物质抽象为称为变形虫的简单计算元素的集合，这些元素局部交互以共同完成协调和移动任务。自2014年在SPAA上引入以来，越来越多的文献针对各种问题调整了其假设；然而，如果没有标准化的假设层次结构，很难在变形虫模型下对结果进行精确的系统比较。我们提出了规范变形虫模型，这是一种更新的形式化，区分了核心模型特征和假设变体族。规范变形虫模型解决的一个关键改进是并发性。现有文献大多隐含地假设变形虫的动作是隔离且可靠的，将分析简化为至多一个变形虫同时活动的顺序设置。然而，真实的可编程物质系统是并发的。规范变形虫模型将所有变形虫通信形式化为消息传递，利用并发执行的对抗性激活模型。在这种对时间的精细处理下，我们采用了两种互补的方法来设计并发算法。我们首先建立了一组在任何并发执行下算法正确性的充分条件，将并发控制直接嵌入算法设计。然后，我们提出了一个使用锁的并发控制框架，将在顺序设置中终止并满足特定约定的变形虫算法转换为在并发设置中表现出等效行为的算法。作为案例研究，我们使用一个简单的六边形形成算法演示了这两种方法。规范变形虫模型和这些互补的并发算法设计方法共同为可编程物质的分布式计算研究开辟了新的方向。

英文摘要

The amoebot model abstracts active programmable matter as a collection of simple computational elements called amoebots that interact locally to collectively achieve tasks of coordination and movement. Since its introduction at SPAA 2014, a growing body of literature has adapted its assumptions for a variety of problems; however, without a standardized hierarchy of assumptions, precise systematic comparison of results under the amoebot model is difficult. We propose the canonical amoebot model, an updated formalization that distinguishes between core model features and families of assumption variants. A key improvement addressed by the canonical amoebot model is concurrency. Much of the existing literature implicitly assumes amoebot actions are isolated and reliable, reducing analysis to the sequential setting where at most one amoebot is active at a time. However, real programmable matter systems are concurrent. The canonical amoebot model formalizes all amoebot communication as message passing, leveraging adversarial activation models of concurrent executions. Under this granular treatment of time, we take two complementary approaches to concurrent algorithm design. We first establish a set of sufficient conditions for algorithm correctness under any concurrent execution, embedding concurrency control directly in algorithm design. We then present a concurrency control framework that uses locks to convert amoebot algorithms that terminate in the sequential setting and satisfy certain conventions into algorithms that exhibit equivalent behavior in the concurrent setting. As a case study, we demonstrate both approaches using a simple algorithm for hexagon formation. Together, the canonical amoebot model and these complementary approaches to concurrent algorithm design open new directions for distributed computing research on programmable matter.


2108.09403 2026-06-03 cs.RO cs.DC 版本更新

Deadlock and Noise in Self-Organized Aggregation Without Computation

无计算的自组织聚合中的死锁与噪声

Joshua J. Daymude, Noble C. Harasha, Andréa W. Richa, Ryan Yiu

AI总结研究无计算自组织聚合算法在多机器人系统中的死锁问题，证明确定性运动下存在死锁构型，并发现少量误差可避免死锁，同时提出一种离散噪声版本。

Comments 17 pages, 11 figures

详情

DOI: 10.1007/978-3-030-91081-5_4
Journal ref: Stabilization, Safety, and Security of Distributed Systems (SSS 2021), pp. 51-65

AI中文摘要

聚合是群体机器人学中的基本行为，要求系统聚集在一个紧凑、连通的集群中。2014年，Gauci等人提出了一种令人惊讶的算法，仅使用二进制视线传感器且无需算术计算或持久内存，即可可靠地实现群体聚合。该算法已被严格证明能够将一个机器人聚合到另一个机器人，但尚不清楚它是否总能像实验和模拟中观察到的那样聚合$n > 2$个机器人的系统。我们证明，当机器人的运动是均匀且确定性的时，对于$n > 3$个机器人，存在死锁构型，使得该算法无法实现聚合。从积极方面看，我们表明该算法（i）对小量误差具有鲁棒性，从而能够避免死锁，并且（ii）在使用锥形视线传感器时，对于$n = 2$的情况，可证明实现线性运行时间加速。最后，我们引入了该算法的一种带噪声的离散改编，更易于进行噪声的严格分析，其模拟结果与原始的连续算法定性一致。

英文摘要

Aggregation is a fundamental behavior for swarm robotics that requires a system to gather together in a compact, connected cluster. In 2014, Gauci et al. proposed a surprising algorithm that reliably achieves swarm aggregation using only a binary line-of-sight sensor and no arithmetic computation or persistent memory. It has been rigorously proven that this algorithm will aggregate one robot to another, but it remained open whether it would always aggregate a system of $n > 2$ robots as was observed in experiments and simulations. We prove that there exist deadlocked configurations from which this algorithm cannot achieve aggregation for $n > 3$ robots when the robots' motion is uniform and deterministic. On the positive side, we show that the algorithm (i) is robust to small amounts of error, enabling deadlock avoidance, and (ii) provably achieves a linear runtime speedup for the $n = 2$ case when using a cone-of-sight sensor. Finally, we introduce a noisy, discrete adaptation of this algorithm that is more amenable to rigorous analysis of noise and whose simulation results align qualitatively with the original, continuous algorithm.


2007.04377 2026-06-03 cs.DC cs.RO 版本更新

Bio-Inspired Energy Distribution for Programmable Matter

可编程物质的仿生能量分布

Joshua J. Daymude, Andréa W. Richa, Jamison W. Weber

AI总结受枯草芽孢杆菌生物膜生长行为启发，提出一种基于通信的算法，通过抑制饥饿模块的能量消耗，确保可编程物质系统中所有模块获得足够能量，并扩展了amoebot模型的生成树原语以支持崩溃故障自稳定。

详情

DOI: 10.1145/3427796.3427835
Journal ref: Proceedings of the 22nd International Conference on Distributed Computing and Networking (ICDCN 2021), pp. 86-95

AI中文摘要

在主动可编程物质系统中，单个模块需要持续的能量供应才能参与系统的集体行为。这些系统通常由至少一个模块可访问的外部能源供电，并依赖模块间的能量传输在整个系统中分配能量。尽管在解决可编程物质硬件中能量管理的挑战方面投入了大量精力，但可编程物质的算法理论在很大程度上忽略了能量使用和分布对算法可行性和效率的影响。在这项工作中，我们提出了一种受枯草芽孢杆菌生物膜生长行为启发的amoebot模型能量分布算法。这些细菌使用化学信号传递其代谢状态并调节整个生物膜中的营养消耗，确保所有细菌获得所需的营养。我们的算法类似地使用通信来在存在饥饿模块时抑制能量消耗，使所有模块能够获得足够的能量以满足其需求。作为一个支持性但独立的结果，我们扩展了amoebot模型成熟的生成树原语，使其在崩溃故障存在时能够自稳定。最后，我们展示了如何利用这一自稳定原语将我们的能量分布算法与现有的amoebot模型算法组合，从而有效地将先前的工作推广到也考虑能量约束。

英文摘要

In systems of active programmable matter, individual modules require a constant supply of energy to participate in the system's collective behavior. These systems are often powered by an external energy source accessible by at least one module and rely on module-to-module power transfer to distribute energy throughout the system. While much effort has gone into addressing challenging aspects of power management in programmable matter hardware, algorithmic theory for programmable matter has largely ignored the impact of energy usage and distribution on algorithm feasibility and efficiency. In this work, we present an algorithm for energy distribution in the amoebot model that is loosely inspired by the growth behavior of Bacillus subtilis bacterial biofilms. These bacteria use chemical signaling to communicate their metabolic states and regulate nutrient consumption throughout the biofilm, ensuring that all bacteria receive the nutrients they need. Our algorithm similarly uses communication to inhibit energy usage when there are starving modules, enabling all modules to receive sufficient energy to meet their demands. As a supporting but independent result, we extend the amoebot model's well-established spanning forest primitive so that it self-stabilizes in the presence of crash failures. We conclude by showing how this self-stabilizing primitive can be leveraged to compose our energy distribution algorithm with existing amoebot model algorithms, effectively generalizing previous work to also consider energy constraints.


1903.02091 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Geometric Adaptive Control with Neural Networks for a Quadrotor UAV in Wind fields

风场中四旋翼无人机的几何自适应神经网络控制

Mahdis Bisheban, Taeyoung Lee

AI总结针对风场引起的非结构力和力矩扰动，提出一种基于多层神经网络在线调整权重的几何自适应控制器，实现位置和航向跟踪误差的一致最终有界。

详情

AI中文摘要

本文提出了一种带有人工神经网络的四旋翼无人机几何自适应控制器。假设四旋翼动力学受到风引起的任意非结构力和力矩的干扰。为了解决这个问题，所提出的控制系统增加了多层神经网络，并根据自适应律在线调整神经网络的权重。利用通用逼近定理，表明未知扰动的影响可以得到缓解。更具体地说，在所提出的控制系统下，位置和航向方向的跟踪误差是一致最终有界的，并且最终界可以任意减小。这些方法直接在特殊欧几里得群上开发，以避免局部参数化固有的复杂性或奇异性。首先通过数值例子说明了所提出控制系统的有效性。然后，通过几个室内飞行实验证明，即使对于激进的、敏捷的机动，所提出的控制器也能成功抑制风扰动的影响。

英文摘要

This paper proposes a geometric adaptive controller for a quadrotor unmanned aerial vehicle with artificial neural networks. It is assumed that the dynamics of a quadrotor is disturbed by arbitrary, unstructured forces and moments caused by wind. To address this, the proposed control system is augmented with multilayer neural networks, and the weights of neural networks are adjusted online according to an adaptive law. By utilizing the universal approximation theorem, it is shown that the effects of unknown disturbances can be mitigated. More specifically, under the proposed control system, the tracking errors in the position and the heading direction are uniformly ultimately bounded where the ultimate bound can be reduced arbitrarily. These are developed directly on the special Euclidean group to avoid complexities or singularities inherent to local parameterizations. The efficacy of the proposed control system is first illustrated by numerical examples. Then, several indoor flight experiments are presented to demonstrate that the proposed controller successfully rejects the effects of wind disturbances even for aggressive, agile maneuvers.


1803.06363 2026-06-03 math.OC cs.RO cs.SY eess.SY 版本更新

Geometric Adaptive Control for a Quadrotor UAV with Wind Disturbance Rejection

四旋翼无人机抗风扰动的几何自适应控制

Mahdis Bisheban, Taeyoung Lee

AI总结针对四旋翼无人机，提出一种基于在线调整多层神经网络的几何自适应控制方案，以抑制未知非结构扰动，并利用特殊欧几里得群上的李雅普诺夫稳定性理论证明跟踪误差一致最终有界，且可通过数值示例验证其抗风扰动能力。

1011.1939 2026-06-03 cs.RO cs.SY eess.SY math.OC 版本更新

Discrete Partitioning and Coverage Control for Gossiping Robots

面向闲聊机器人的离散分区与覆盖控制

Joseph W. Durham, Ruggero Carli, Paolo Frasca, Francesco Bullo

AI总结针对非凸环境，提出基于图表示和短程不可靠成对通信的分布式算法，实现机器人团队的分区与覆盖控制，并证明收敛到成对最优分区。

Comments Accepted to IEEE TRO. 14 double-column pages, 10 figures. v2 is a thorough revision of v1, including new algorithms and revised mathematical and simulation results

详情

DOI: 10.1109/TRO.2011.2170753

AI中文摘要

我们提出了分布式算法，用于自动部署一组移动机器人以对非凸环境进行分区和覆盖。为处理任意非凸环境，我们将其表示为图。我们的分区和覆盖算法仅需要短程、不可靠的成对“闲聊”通信。该算法包含两个部分：(1) 一个运动协议，确保相邻机器人至少间歇性地通信；(2) 一个成对分区规则，用于在两个机器人通信时更新领地所有权。通过研究图顶点分区空间上的适当动力系统，我们证明了领地所有权在有限时间内收敛到成对最优分区。这一新的平衡集代表了比常见Lloyd类型算法更优的性能。此外，我们详细说明了算法如何在大规模团队和大规模环境中良好扩展，以及计算如何在有限资源下随时运行。最后，我们报告了在复杂环境中的大规模仿真和使用Player/Stage机器人控制系统的硬件实验。

英文摘要

We propose distributed algorithms to automatically deploy a team of mobile robots to partition and provide coverage of a non-convex environment. To handle arbitrary non-convex environments, we represent them as graphs. Our partitioning and coverage algorithm requires only short-range, unreliable pairwise "gossip" communication. The algorithm has two components: (1) a motion protocol to ensure that neighboring robots communicate at least sporadically, and (2) a pairwise partitioning rule to update territory ownership when two robots communicate. By studying an appropriate dynamical system on the space of partitions of the graph vertices, we prove that territory ownership converges to a pairwise-optimal partition in finite time. This new equilibrium set represents improved performance over common Lloyd-type algorithms. Additionally, we detail how our algorithm scales well for large teams in large environments and how the computation can run in an anytime fashion with limited resources. Finally, we report on large-scale simulations in complex environments and hardware experiments using the Player/Stage robot control system.
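The second component, the pairwise partitioning rule, can be illustrated with a simplified sketch. The abstract does not spell out the rule, so the version below is an assumption: when two robots meet, the union of their territories is re-split by assigning each vertex to the closer of the two robots' current centers, measured in hop distance on the graph. The function names are hypothetical, and the choice of centers, which the actual rule would also have to address, is taken as given.

```python
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src to every reachable vertex of the graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pairwise_update(adj, terr_a, terr_b, center_a, center_b):
    """Re-split the union of two territories by proximity to each center."""
    da, db = bfs_dist(adj, center_a), bfs_dist(adj, center_b)
    union = terr_a | terr_b
    new_a = {v for v in union if da[v] <= db[v]}  # ties go to robot A
    return new_a, union - new_a


# Usage on a 6-vertex corridor: an unbalanced split becomes balanced.
corridor = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
new_a, new_b = pairwise_update(corridor, {0}, {1, 2, 3, 4, 5}, 0, 5)
```

Only the two communicating robots change their territories, which is why an update needs nothing more than a single pairwise exchange.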


1202.0253 2026-06-03 cs.RO cs.SY eess.SY 版本更新

High-speed Flight in an Ergodic Forest

遍历森林中的高速飞行

Sertac Karaman, Emilio Frazzoli

AI总结本文研究在仅已知障碍物生成过程统计特性的随机障碍物场中高速导航的理论基础，通过遍历性和渗流理论揭示了无限无碰撞轨迹存在的相变现象，并推导了临界速度的上下界。

Comments Manuscript submitted to the IEEE Transactions on Robotics

详情

DOI: 10.1109/ICRA.2012.6225235

AI中文摘要

受鸟类在密集森林等杂乱环境中飞行的启发，本文研究了一个新颖运动规划问题的理论基础：在仅已知障碍物生成过程统计特性的情况下，通过随机生成的障碍物场进行高速导航。类似于平面森林环境，假设障碍物生成过程决定了圆盘形障碍物的位置和大小。当该过程是遍历的，并且在鸟类动力学的温和技术条件下，证明了通过森林的无限无碰撞轨迹的存在性表现出相变。一方面，如果鸟的飞行速度超过某个临界速度，那么以概率1，不存在无限无碰撞轨迹，即无论控制鸟运动的规划算法如何，鸟几乎必然最终会与某棵树碰撞。另一方面，如果鸟的飞行速度低于该临界速度，那么几乎必然存在至少一条无限无碰撞轨迹。针对齐次泊松森林的特殊情况，考虑鸟动力学的简单模型，推导了临界速度的上下界。对于相同情况，给出了一个等价渗流模型。利用该模型，通过蒙特卡洛模拟近似了相图。本文还通过遍历理论和渗流理论建立了机器人运动规划与统计物理之间的新联系，这可能具有独立的研究价值。

英文摘要

Inspired by birds flying through cluttered environments such as dense forests, this paper studies the theoretical foundations of a novel motion planning problem: high-speed navigation through a randomly-generated obstacle field when only the statistics of the obstacle generating process are known a priori. Resembling a planar forest environment, the obstacle generating process is assumed to determine the locations and sizes of disk-shaped obstacles. When this process is ergodic, and under mild technical conditions on the dynamics of the bird, it is shown that the existence of an infinite collision-free trajectory through the forest exhibits a phase transition. On one hand, if the bird flies faster than a certain critical speed, then, with probability one, there is no infinite collision-free trajectory, i.e., the bird will eventually collide with some tree, almost surely, regardless of the planning algorithm governing the bird's motion. On the other hand, if the bird flies slower than this critical speed, then there exists at least one infinite collision-free trajectory, almost surely. Lower and upper bounds on the critical speed are derived for the special case of a homogeneous Poisson forest considering a simple model for the bird's dynamics. For the same case, an equivalent percolation model is provided. Using this model, the phase diagram is approximated in Monte-Carlo simulations. This paper also establishes novel connections between robot motion planning and statistical physics through ergodic theory and percolation theory, which may be of independent interest.
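A short worked calculation shows why some steering is unavoidable in a Poisson forest. It is a standard void-probability argument added here for illustration and does not reproduce the paper's bounds on the critical speed. Assume, as simplifications not taken from the abstract, a point-sized bird flying a straight segment of length $L$, disks of a fixed radius $r$, and disk centers drawn from a homogeneous Poisson process of intensity $\lambda$.

```latex
% The segment is collision-free iff no disk center lies within distance r of it.
% That region is a stadium of area 2 r L + \pi r^2, so by the Poisson void probability:
\[
  \Pr[\text{no collision over length } L]
    = \exp\!\bigl(-\lambda\,(2 r L + \pi r^{2})\bigr)
    \;\xrightarrow[L \to \infty]{}\; 0 .
\]
```

Straight-line flight therefore collides almost surely. Whether an infinite collision-free trajectory exists depends on how much the bird can maneuver at a given speed, which is the question the phase-transition result answers.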


1205.0207 2026-06-03 cs.RO cs.SY eess.SY 版本更新

Shortest Path Set Induced Vertex Ordering and its Application to Distributed Distance Optimal Multi-agent Formation Path Planning

最短路径集诱导的顶点排序及其在分布式距离最优多智能体编队路径规划中的应用

Jingjin Yu

AI总结针对无向图中不可区分智能体移动到任意目标编队的距离最优路径规划问题，提出一种基于最短路径集诱导顶点排序的集中式算法，并首次实现分布式调度，同时保证相同的收敛时间。

Comments Extended the earlier version to 8 Pages, complete with literature review. One additional section on a distributed scheduling algorithm is added

1204.5717 2026-06-03 cs.DS cs.RO cs.SY eess.SY 版本更新

Multi-agent Path Planning and Network Flow

多智能体路径规划与网络流

Jingjin Yu, Steven M. LaValle

AI总结将图上的多智能体路径规划问题归约到网络流问题，利用组合网络流算法和线性规划技术求解，并证明当目标置换不变时存在最长完成时间不超过 n+V-1 步的可行解路径集，给出 O(nVE) 时间算法，进一步研究时间和距离最优性及其帕累托最优结构。

Comments Corrected an inaccuracy on time optimal solution for average arrival time

详情

AI中文摘要

本文将图（路标图）上的多智能体路径规划与网络流问题联系起来，表明前者可以归约到后者，从而使得组合网络流算法以及一般的线性规划技术能够应用于图上的多智能体路径规划问题。利用这一联系，我们证明当目标是置换不变时，该问题总是存在一个可行解路径集，其最长完成时间不超过 $n + V - 1$ 步，其中 $n$ 是智能体数量，$V$ 是底层图的顶点数。然后，我们给出一个完整算法，在 $O(nVE)$ 时间内找到这样的解，其中 $E$ 是图的边数。进一步，我们研究可行解的时间和距离最优性，表明它们具有成对的帕累托最优结构，并再次为优化这两个实际目标提供了高效算法。

英文摘要

This paper connects multi-agent path planning on graphs (roadmaps) to network flow problems, showing that the former can be reduced to the latter, therefore enabling the application of combinatorial network flow algorithms, as well as general linear program techniques, to multi-agent path planning problems on graphs. Exploiting this connection, we show that when the goals are permutation invariant, the problem always has a feasible solution path set with a longest finish time of no more than $n + V - 1$ steps, in which $n$ is the number of agents and $V$ is the number of vertices of the underlying graph. We then give a complete algorithm that finds such a solution in $O(nVE)$ time, with $E$ being the number of edges of the graph. Taking a further step, we study time and distance optimality of the feasible solutions, show that they have a pairwise Pareto optimal structure, and again provide efficient algorithms for optimizing two of these practical objectives.
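The reduction to network flow can be sketched with a time-expanded graph: one copy of every vertex per time step, unit capacity per vertex copy, and edges for moving to a neighbor or waiting. The code below is a minimal illustration under assumptions, not the paper's algorithm: it uses a plain BFS augmenting-path max flow, it does not model head-on swaps along an edge, and the name `feasible_within` is hypothetical. It only checks whether interchangeable agents can reach the goal set within a given number of steps.

```python
from collections import defaultdict, deque


def feasible_within(adj, starts, goals, horizon):
    """True if all agents can occupy the goal set after `horizon` steps."""
    cap = defaultdict(int)        # residual capacities, keyed by (u, v)
    nbrs = defaultdict(set)       # residual adjacency (both directions)

    def add_edge(u, v):
        cap[(u, v)] += 1
        nbrs[u].add(v)
        nbrs[v].add(u)

    for t in range(horizon + 1):
        for v in adj:
            add_edge((v, t, "in"), (v, t, "out"))          # one agent per vertex per step
            if t < horizon:
                for u in list(adj[v]) + [v]:                # move to a neighbor or wait
                    add_edge((v, t, "out"), (u, t + 1, "in"))
    for s in starts:
        add_edge("source", (s, 0, "in"))
    for g in goals:
        add_edge((g, horizon, "out"), "sink")

    flow = 0
    while True:
        parent = {"source": None}
        queue = deque(["source"])
        while queue and "sink" not in parent:               # BFS for an augmenting path
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            return flow == len(starts)
        v = "sink"
        while parent[v] is not None:                        # push one unit of flow
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Usage: path graph 0-1-2-3 with n = 2 agents and V = 4 vertices.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
n, V = 2, 4
within_bound = feasible_within(path, [0, 1], [2, 3], n + V - 1)
```

In this example the agents need only 2 steps, well under the $n + V - 1 = 5$ bound stated in the abstract, and 1 step is not enough.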

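1204.3830 2026-06-03 cs.RO cs.AI cs.SY eess.SY version update

Planning Optimal Paths for Multiple Robots on Graphs

Jingjin Yu, Steven M. LaValle

AI summary: Proposes two models based on multiflow integer linear programming that solve multi-robot path planning for minimum last arrival time and minimum total distance, respectively; the algorithms are complete and guarantee optimal solutions.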

Comments Changed "agents" to "robots"

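1204.3820 2026-06-03 eess.SY cs.AI cs.RO cs.SY version update

Distance Optimal Formation Control on Graphs with a Tight Convergence Time Guarantee

Jingjin Yu, Steven M. LaValle

AI summary: For the task of moving multiple indistinguishable agents without collision to an arbitrary set of goal vertices on a connected graph with unit edge lengths, proposes a fast distance-optimal control algorithm with a tight convergence time guarantee.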

Comments Brought to be in-sync with final version submitted to CDC 2012 with only minor updates

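1108.3405 2026-06-03 eess.SY cs.MA cs.RO cs.SY math.OC version update

Hybrid 3-D Formation Control for Unmanned Helicopters

A. Karimoddini, H. Lin, B. M. Chen, T. H. Lee

AI summary: For formation control of unmanned helicopters, proposes a hybrid supervisory control framework that couples discrete logic with continuous dynamics through a spherical abstraction of the state space and bisimulation, with an embedded collision avoidance mechanism.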

Comments Submitted for publication

Abstract

Teams of Unmanned Aerial Vehicles (UAVs) form typical networked cyber-physical systems that involve the interaction of discrete logic and continuous dynamics. This paper presents a hybrid supervisory control framework for the three-dimensional leader-follower formation control of unmanned helicopters. The proposed hybrid control framework captures internal interactions between the decision-making unit and the path planner continuous dynamics of the system, and hence improves the system's overall reliability. To design such a hybrid controller, a spherical abstraction of the state space is proposed as a new method of abstraction. Utilizing the properties of multi-affine functions over the partitioned space leads to a finite state Discrete Event System (DES) model, which is shown to be bisimilar to the original continuous-variable dynamical system. Then, in the discrete domain, a logic supervisor is modularly designed for the abstracted model. Due to the bisimilarity between the abstracted DES model and the original UAV dynamics, the designed logic supervisor can be implemented as a hybrid controller through an interface layer. This supervisor drives the UAV dynamics to satisfy the design requirements. In other words, the hybrid controller is able to bring the UAVs to the desired formation starting from any initial state inside the control horizon and then maintain the formation. Moreover, a collision avoidance mechanism is embedded in the designed supervisor. Finally, the algorithm has been verified by a hardware-in-the-loop simulation platform, which is developed for unmanned helicopters. The presented results show the effectiveness of the algorithm.

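1104.4251 2026-06-03 cs.RO cs.MA cs.SY eess.SY math.OC version update

Distributed Self-Organization Of Swarms To Find Globally $ε$-Optimal Routes To Locally Sensed Targets

Ishanu Chattopadhyay

AI summary: For large swarms, proposes a distributed path planning algorithm that uses only local information, achieving near-optimal route selection through information percolation and an emergent gradient, with rigorous analysis of convergence, robustness, scalability, and the effect of system parameters.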

Comments 38 pages 10 Figures

Abstract

The problem of near-optimal distributed path planning to locally sensed targets is investigated in the context of large swarms. The proposed algorithm uses only information that can be locally queried; rigorous theoretical results on convergence, robustness, and scalability are established, and the effect of system parameters such as the agent-level communication radius and agent velocities on global performance is analyzed. The fundamental philosophy of the proposed approach is to percolate local information across the swarm, enabling agents to indirectly access the global context. A gradient emerges, reflecting the performance of agents, computed in a distributed manner via local information exchange between neighboring agents. It is shown that to follow near-optimal routes to a target which can be only sensed locally, and whose location is not known a priori, each agent needs simply to move towards its "best" neighbor, where the notion of "best" is obtained by computing the state-specific language measure of an underlying probabilistic finite state automaton. The theoretical results are validated in high-fidelity simulation experiments with in excess of $10^4$ agents.
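
The "move towards the best neighbor" idea can be sketched on a fixed communication graph. This is only an illustration under stated assumptions: the abstract derives "best" from a language measure of a probabilistic finite state automaton, whereas the sketch below swaps in a much simpler score (a value of 1 at agents that sense the target, decayed by a constant factor per hop). The names `percolate` and `route` and the `decay` parameter are hypothetical.

```python
def percolate(neighbors, sensing, decay=0.9, rounds=50):
    """Spread a score through the swarm using only neighbor-to-neighbor exchange.

    neighbors: dict mapping each agent to the agents within its radius.
    sensing:   set of agents that sense the target directly (score 1.0).
    Stand-in for the paper's language measure: score = decay * best neighbor score.
    """
    value = {a: (1.0 if a in sensing else 0.0) for a in neighbors}
    for _ in range(rounds):
        new = {}
        for agent, nbrs in neighbors.items():
            if agent in sensing:
                new[agent] = 1.0
            else:
                new[agent] = decay * max((value[b] for b in nbrs), default=0.0)
        value = new
    return value

def route(neighbors, value, start, max_hops=100):
    """Follow the emergent gradient by hopping to the best-scoring neighbor."""
    path = [start]
    while value[path[-1]] < 1.0 and len(path) <= max_hops:
        here = path[-1]
        if not neighbors[here]:
            break                      # isolated agent: nowhere to go
        best = max(neighbors[here], key=lambda b: value[b])
        if value[best] <= value[here]:
            break                      # no improving neighbor: target unreachable
        path.append(best)
    return path
```

With this stand-in score, the hop count to the nearest sensing agent determines the value, so the greedy walk follows a shortest path in the communication graph; the paper's measure is more general than that.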

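1104.1159 2026-06-03 math.OC cs.RO cs.SY eess.SY version update

LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees

Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

AI summary: Proposes a method for generating a robot control strategy that maximizes the probability of accomplishing a task specified as a Linear Temporal Logic formula, by reducing the problem to finding an optimal policy for a Markov Decision Process and solving it completely with probabilistic model checking techniques.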

Comments Technical Report accompanying IFAC 2011

Abstract

We present a method to generate a robot control strategy that maximizes the probability of accomplishing a task. The task is given as a Linear Temporal Logic (LTL) formula over a set of properties that can be satisfied at the regions of a partitioned environment. We assume that the probabilities with which the properties are satisfied at the regions are known, and the robot can determine the truth value of a proposition only at the current region. Motivated by several results on partition-based abstractions, we assume that the motion is performed on a graph. To account for noisy sensors and actuators, we assume that a control action enables several transitions with known probabilities. We show that this problem can be reduced to the problem of generating a control policy for a Markov Decision Process (MDP) such that the probability of satisfying an LTL formula over its states is maximized. We provide a complete solution for the latter problem that builds on existing results from probabilistic model checking. We include an illustrative case study.
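
The core computation behind "maximize the probability of satisfaction on an MDP" is a maximum reachability probability. The sketch below is a simplified illustration, not the paper's full procedure: in probabilistic model checking the target set is typically derived from a product of the MDP with an automaton for the LTL formula, while here a plain `goal` set stands in for it. The function name and the data layout are assumptions.

```python
def max_reach_probability(mdp, goal, sweeps=1000, tol=1e-12):
    """Value iteration for the maximum probability of eventually reaching `goal`.

    mdp[state][action] is a list of (probability, next_state) pairs.
    States with no actions are treated as absorbing.
    """
    p = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(sweeps):
        delta = 0.0
        for s in mdp:
            if s in goal or not mdp[s]:
                continue
            best = max(sum(q * p[n] for q, n in outs) for outs in mdp[s].values())
            delta = max(delta, abs(best - p[s]))
            p[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (adequate for this
    # sketch; general MDPs need extra care with zero-progress loops).
    policy = {
        s: max(mdp[s], key=lambda a: sum(q * p[n] for q, n in mdp[s][a]))
        for s in mdp
        if s not in goal and mdp[s]
    }
    return p, policy
```

Starting the iteration from zero outside the goal set matters: the values then increase monotonically towards the least fixed point, which is the reachability probability.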

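1103.4065 2026-06-03 eess.SY cs.RO cs.SY math.OC version update

Probabilistically Safe Vehicle Control in a Hostile Environment

Igor Cizelj, Xu Chu Ding, Morteza Lahijanian, Alessandro Pinto, Calin Belta

AI summary: Proposes a method for controlling a vehicle in a hostile environment with static obstacles and moving adversaries, modeling adversary motion as Poisson processes and vehicle traversal times as exponentially distributed, and using a Markov Decision Process and Probabilistic Computation Tree Logic to maximize the probability of accomplishing the task.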

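1209.2058 2026-06-03 cs.RO cs.DC cs.MA cs.SY eess.SY version update

Safe and Stabilizing Distributed Multi-Path Cellular Flows

Taylor T. Johnson, Sayan Mitra

AI summary: For distributed traffic control in the partitioned plane, proposes a protocol that guarantees minimum separation between entities, self-stabilizes with a single target, and makes progress with multiple targets under a class of non-deadlocking failures, achieving safety and progress through temporary blocking and local geographical routing.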

Comments An earlier version of this paper appeared in the 30th IEEE International Conference on Distributed Computing Systems (ICDCS 2010)

Abstract

We study the problem of distributed traffic control in the partitioned plane, where the movement of all entities (robots, vehicles, etc.) within each partition (cell) is coupled. Establishing liveness in such systems is challenging, but such analysis will be necessary to apply such distributed traffic control algorithms in applications like coordinating robot swarms and the intelligent highway system. We present a formal model of a distributed traffic control protocol that guarantees minimum separation between entities, even as some cells fail. Once new failures cease occurring, in the case of a single target, the protocol is guaranteed to self-stabilize and the entities with feasible paths to the target cell make progress towards it. For multiple targets, failures may cause deadlocks in the system, so we identify a class of non-deadlocking failures where all entities are able to make progress to their respective targets. The algorithm relies on two general principles: temporary blocking for maintenance of safety and local geographical routing for guaranteeing progress. Our assertional proofs may serve as a template for the analysis of other distributed traffic control protocols. We present simulation results that provide estimates of throughput as a function of entity velocity, safety separation, single-target path complexity, failure-recovery rates, and multi-target path complexity.
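
The routing half of the protocol ("local geographical routing for guaranteeing progress") can be sketched as a self-stabilizing distance computation in which each cell looks only at its neighbors. This is a hedged illustration for the single-target case on a rectangular grid of at least two cells; it does not model the blocking signals that provide safety, and the names `grid_neighbors` and `stabilize` are hypothetical.

```python
INF = float("inf")

def grid_neighbors(cell, width, height):
    """4-connected neighbors of a cell inside a width x height grid."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def stabilize(width, height, target, failed, rounds=None):
    """Each live cell repeatedly sets dist = 1 + min(neighbor dist).

    Failed cells always report infinity, so routes re-form around them
    once failures stop. Returns (dist, next_hop) after the given rounds.
    """
    cells = [(x, y) for x in range(width) for y in range(height)]
    dist = {c: INF for c in cells}
    rounds = width * height if rounds is None else rounds
    for _ in range(rounds):                 # synchronous rounds of local updates
        new = {}
        for c in cells:
            if c in failed:
                new[c] = INF
            elif c == target:
                new[c] = 0
            else:
                new[c] = 1 + min(dist[n] for n in grid_neighbors(c, width, height))
        dist = new
    next_hop = {
        c: min(grid_neighbors(c, width, height), key=lambda n: dist[n])
        for c in cells
        if c != target and dist[c] < INF
    }
    return dist, next_hop
```

Cells that end with an infinite distance have no feasible path to the target, which matches the abstract's restriction of the progress guarantee to entities with feasible paths.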

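1302.0450 2026-06-03 math.OC cs.RO cs.SY eess.SY version update

Algorithms for leader selection in stochastically forced consensus networks

Fu Lin, Makan Fardad, Mihailo R. Jovanović

AI summary: For stochastically forced consensus networks, optimizes leader selection through convex relaxation and a greedy algorithm to minimize the mean-square deviation from consensus.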

Comments Submitted to IEEE Transactions on Automatic Control

DOI: 10.1109/TAC.2014.2314223
Journal ref: IEEE Trans. Automat. Control (2014), vol. 59, no. 7, pp. 1789-1802

Abstract

We are interested in assigning a pre-specified number of nodes as leaders in order to minimize the mean-square deviation from consensus in stochastically forced networks. This problem arises in several applications including control of vehicular formations and localization in sensor networks. For networks with leaders subject to noise, we show that the Boolean constraints (a node is either a leader or it is not) are the only source of nonconvexity. By relaxing these constraints to their convex hull we obtain a lower bound on the global optimal value. We also use a simple but efficient greedy algorithm to identify leaders and to compute an upper bound. For networks with leaders that perfectly follow their desired trajectories, we identify an additional source of nonconvexity in the form of a rank constraint. Removal of the rank constraint and relaxation of the Boolean constraints yields a semidefinite program for which we develop a customized algorithm well-suited for large networks. Several examples ranging from regular lattices to random graphs are provided to illustrate the effectiveness of the developed algorithms.
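
A greedy leader selection of the kind mentioned in the abstract can be sketched in a few lines. The objective used here is an assumption made for illustration: for leaders subject to noise, the cost is taken to be the trace of the inverse of the graph Laplacian with a gain `kappa` added to the diagonal entries of the chosen leaders. The helper names and the plain Gauss-Jordan inversion are hypothetical choices suited only to small dense examples.

```python
def path_laplacian(n):
    """Laplacian of an n-node path graph with unit edge weights."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap

def trace_inverse(a):
    """Trace of the inverse of a small nonsingular matrix (Gauss-Jordan)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return sum(m[i][n + i] for i in range(n))

def greedy_leaders(laplacian, k, kappa=1.0):
    """Add one leader at a time, each time picking the largest cost decrease."""
    n = len(laplacian)
    leaders = []

    def cost(chosen):
        a = [row[:] for row in laplacian]
        for i in chosen:
            a[i][i] += kappa        # assumed model: leader feedback gain kappa
        return trace_inverse(a)

    for _ in range(k):
        best = min((i for i in range(n) if i not in leaders),
                   key=lambda i: cost(leaders + [i]))
        leaders.append(best)
    return leaders, cost(leaders)
```

Because the greedy choice returns a feasible leader set, its cost is an upper bound on the optimum, which is the role the abstract assigns to the greedy algorithm; on a five-node path with one leader and `kappa = 1`, this sketch selects the middle node.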

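1303.2912 2026-06-03 cs.AI cs.RO cs.SY eess.SY stat.ML version update

Integrated Pre-Processing for Bayesian Nonlinear System Identification with Gaussian Processes

Roger Frigola, Carl Edward Rasmussen

AI summary: Proposes the GP-FNARX model, which integrates data pre-processing with sparse Gaussian process regression to automate the path from raw data to an identified model, and jointly optimizes pre-processing parameters and hyperparameters by maximizing the marginal likelihood, yielding a Bayesian dynamics model that reports its uncertainty.

Comments Proceedings of the 52nd IEEE International Conference on Decision and Control (CDC), Firenze, Italy, December 2013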

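1209.4433 2026-06-03 math.OC cs.RO cs.SY eess.SY version update

Transverse Contraction Criteria for Existence, Stability, and Robustness of a Limit Cycle

Ian R. Manchester, Jean-Jacques E. Slotine

AI summary: Derives a differential contraction condition for the existence of an orbitally stable limit cycle in an autonomous system. The condition can be expressed as a pointwise linear matrix inequality (LMI), so convex optimization tools such as sum-of-squares programming can search for certificates of a stable limit cycle. Many desirable properties of contracting dynamics, such as preservation of contraction under interconnection, extend to this framework, and notions of differential dissipativity and transverse differential dissipativity establish contraction and transverse contraction of large-scale systems from LMI conditions on subsystems.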

Comments 6 pages, 1 figure. Conference submission

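1302.7314 2026-06-03 eess.SY cs.RO cs.SY math.OC version update

Torque Saturation in Bipedal Robotic Walking through Control Lyapunov Function Based Quadratic Programs

Kevin Galloway, Koushil Sreenath, Aaron D. Ames, J. W. Grizzle

AI summary: Proposes a new method that uses convex optimization to incorporate user-defined control input saturation directly into the computation of control Lyapunov function (CLF) based walking controllers, validated experimentally on the bipedal robot MABEL.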

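1301.0043 2026-06-03 cs.HC cs.RO cs.SY eess.SY version update

A Framework for Analysing Driver Interactions with Semi-Autonomous Vehicles

Siraj Shaikh, Padmanabhan Krishnan

AI summary: Proposes a framework that combines empirical models of human behaviour with environment and system models, analyses the safety of driver interaction with semi-autonomous vehicles via model checking, and demonstrates its applicability on a driver fatigue case study.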

Comments In Proceedings FTSCS 2012, arXiv:1212.6574

DOI: 10.4204/EPTCS.105.7
Journal ref: EPTCS 105, 2012, pp. 85-99

Abstract

Semi-autonomous vehicles are increasingly serving critical functions in various settings from mining to logistics to defence. A key characteristic of such systems is the presence of the human (driver) in the control loop. To ensure safety, the driver needs to be aware of the autonomous aspects of the vehicle, and the vehicle's automated features need to be built to enable safer control. In this paper we propose a framework to combine empirical models describing human behaviour with the environment and system models. We then analyse, via model checking, interaction between the models for desired safety properties. The aim is to analyse the design for safe vehicle-driver interaction. We demonstrate the applicability of our approach using a case study involving semi-autonomous vehicles where driver fatigue is a factor critical to a safe journey.

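1212.2495 2026-06-03 cs.RO cs.AI cs.SY eess.SY version update

Policy-contingent abstraction for robust robot control

Joelle Pineau, Geoffrey Gordon, Sebastian Thrun

AI summary: Proposes a scalable control algorithm that lets a mobile robot system make high-level decisions while fully accounting for probabilistic beliefs, successfully deployed in a nursing facility.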

Comments Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

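1211.4038 2026-06-03 eess.SY cs.RO cs.SY math.OC version update

Stochastic receding horizon control of nonlinear stochastic systems with probabilistic state constraints

Shridhar K. Shah, Herbert G. Tanner, Chetan D. Pahlajani

AI summary: For continuous-time stochastic nonlinear systems subject to probabilistic state constraints, proposes a real-time implementable control framework that combines receding horizon reference path design with a stochastic optimal controller, and proves closed-loop convergence in the absence of control input constraints.

Comments Draft of submission to IEEE Transactions on Automatic Control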

1211.1690 2026-06-03 cs.RO cs.CV cs.LG cs.SY eess.SY version update

Learning Monocular Reactive UAV Control in Cluttered Natural Environments

Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew Bagnell, Martial Hebert

AI summary: Trains a controller with a monocular camera and imitation learning so that a small quadrotor can navigate autonomously and avoid obstacles in natural forest environments at up to 1.5 m/s.

Comments 8 pages, 10 figures

Abstract

Autonomous navigation for large Unmanned Aerial Vehicles (UAVs) is fairly straightforward, as expensive sensors and monitoring devices can be employed. In contrast, obstacle avoidance remains a challenging task for Micro Aerial Vehicles (MAVs) which operate at low altitude in cluttered environments. Unlike large vehicles, MAVs can only carry very light sensors, such as cameras, making autonomous navigation through obstacles much more challenging. In this paper, we describe a system that navigates a small quadrotor helicopter autonomously at low altitude through natural forest environments. Using only a single cheap camera to perceive the environment, we are able to maintain a constant velocity of up to 1.5 m/s. Given a small set of human pilot demonstrations, we use recent state-of-the-art imitation learning techniques to train a controller that can avoid trees by adapting the MAV's heading. We demonstrate the performance of our system in a more controlled environment indoors, and in real natural forest environments outdoors.

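1209.5805 2026-06-03 eess.SY cs.RO cs.SY math.OC version update

Memoryless Control Design for Persistent Surveillance under Safety Constraints

Eduardo Arvelo, Eric Kim, Nuno C. Martins

AI summary: For persistent surveillance by a mobile robot on a finite two-dimensional grid, proposes a finitely parameterized convex program based on the maximum entropy principle to design a time-invariant memoryless control policy that maximizes the number of persistently surveilled states while avoiding forbidden regions.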

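1210.0888 2026-06-03 cs.RO cs.SY eess.SY math.OC version update

Control Design along Trajectories with Sums of Squares Programming

Anirudha Majumdar, Amir Ali Ahmadi, Russ Tedrake

AI summary: Proposes a control design method that uses sums of squares programming to maximize the size of an invariant funnel, giving formal stability and safety guarantees for robot control tasks.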

Abstract

Motivated by the need for formal guarantees on the stability and safety of controllers for challenging robot control tasks, we present a control design procedure that explicitly seeks to maximize the size of an invariant "funnel" that leads to a predefined goal set. Our certificates of invariance are given in terms of sums of squares proofs of a set of appropriately defined Lyapunov inequalities. These certificates, together with our proposed polynomial controllers, can be efficiently obtained via semidefinite optimization. Our approach can handle time-varying dynamics resulting from tracking a given trajectory, input saturations (e.g. torque limits), and can be extended to deal with uncertainty in the dynamics and state. The resulting controllers can be used by space-filling feedback motion planning algorithms to fill up the space with significantly fewer trajectories. We demonstrate our approach on a severely torque limited underactuated double pendulum (Acrobot) and provide extensive simulation and hardware validation.

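1205.3668 2026-06-03 cs.RO cs.SY eess.SY nlin.AO physics.comp-ph version update

Synthesis and Adaptation of Effective Motor Synergies for the Solution of Reaching Tasks

Cristiano Alessandro, Juan Pablo Carbajal, Andrea d'Avella

AI summary: Inspired by the muscle synergy hypothesis, proposes generating open-loop controllers as linear combinations of a small number of predefined synergies, so that an agent can autonomously synthesize and adapt an effective set of synergies for point-to-point reaching tasks, substantially reducing the dimensionality of the control problem while keeping good performance.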

Comments conference paper

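1203.4345 2026-06-03 eess.SY cs.AI cs.RO cs.SY stat.ML version update

Robust Filtering and Smoothing with Gaussian Processes

Marc Peter Deisenroth, Ryan Turner, Marco F. Huber, Uwe D. Hanebeck, Carl Edward Rasmussen

AI summary: Proposes a robust Bayesian filtering and smoothing algorithm for nonlinear stochastic dynamic systems described by non-parametric Gaussian process models, using analytic smoothing; numerical experiments show it remains robust where other state-of-the-art methods can fail.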

Comments 7 pages, 1 figure, draft version of paper accepted at IEEE Transactions on Automatic Control

DOI: 10.1109/TAC.2011.2179426

Abstract

We propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of "system identification" is more robust than finding point estimates of a parametric function representation. In this article, we present a principled algorithm for robust analytic smoothing in GP dynamic systems, which are increasingly used in robotics and control. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.

1207.3434 2026-06-03 cs.AI cs.RO cs.SY eess.SY version update

An Approach to Model Interest for Planetary Rover through Dezert-Smarandache Theory

Matteo Ceriotti, Massimiliano Vasile, Giovanni Giardini, Mauro Massari

AI summary: Proposes a method that uses Dezert-Smarandache Theory to fuse payload and navigation information into a quantified level of interest for planetary rover targets, enabling autonomous goal reallocation and selection of preferred science targets.

Comments Journal Of Aerospace Computing, Information, And Communication Vol. 5, Month 2008

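1207.1280 2026-06-03 cs.RO cs.SY eess.SY version update

Probabilistically Safe Control of Noisy Dubins Vehicles

Igor Cizelj, Calin Belta

AI summary: For a noisy Dubins vehicle, maximizes the probability of satisfying a temporal logic specification using a Markov Decision Process (MDP) and Probabilistic Computation Tree Logic (PCTL), with a guaranteed lower bound on the satisfaction probability in the original environment.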

Comments Technical Report

Abstract

We address the problem of controlling a stochastic version of a Dubins vehicle such that the probability of satisfying a temporal logic specification over a set of properties at the regions in a partitioned environment is maximized. We assume that the vehicle can determine its precise initial position in a known map of the environment. However, inspired by practical limitations, we assume that the vehicle is equipped with noisy actuators and, during its motion in the environment, it can only measure its angular velocity using a limited accuracy gyroscope. Through quantization and discretization, we construct a finite approximation for the motion of the vehicle in the form of a Markov Decision Process (MDP). We allow for task specifications given as temporal logic statements over the environmental properties, and use tools in Probabilistic Computation Tree Logic (PCTL) to generate an MDP control policy that maximizes the probability of satisfaction. We translate this policy to a vehicle feedback control strategy and show that the probability that the vehicle satisfies the specification in the original environment is bounded from below by the maximum probability of satisfying the specification on the MDP.

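1109.2363 2026-06-03 stat.AP cs.RO cs.SY eess.SY math.OC version update

Sensor Management: Past, Present, and Future

Alfred O. Hero, Douglas Cochran

AI summary: Surveys the theory, algorithms, and applications of sensor management, covering its history and current state and looking ahead to future directions.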

Comments 15 pages, 112 references

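1204.0133 2026-06-03 eess.SY cs.IT cs.RO cs.SY math.IT version update

Progressive Gaussian Filtering

Uwe D. Hanebeck, Jannik Steinbring

AI summary: Proposes a progressive Bayesian method that continuously tracks the non-Gaussian posterior through an ordinary differential equation coupling a Gaussian density with a Dirac mixture approximation, and outperforms existing filters on a discrete-time cubic sensor problem.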

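1203.6243 2026-06-03 eess.SY cs.RO cs.SY version update

Optimal Pruning for Multi-Step Sensor Scheduling

Marco F. Huber

AI summary: For linear Gaussian sensor scheduling, proposes an information-based pruning algorithm that exploits the information matrix and the monotonicity of the Riccati equation to minimize multi-step estimation error efficiently.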

Comments 6 pages, 3 figures, 1 algorithm, accepted for publication as technical correspondence in IEEE Transactions on Automatic Control

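1202.5544 2026-06-03 cs.RO cs.SY eess.SY math.DS math.OC math.PR version update

An Incremental Sampling-based Algorithm for Stochastic Optimal Control

Vu Anh Huynh, Sertac Karaman, Emilio Frazzoli

AI summary: For continuous-time, continuous-space stochastic optimal control problems, proposes the incremental Markov Decision Process (iMDP) algorithm, which builds a sequence of discretizations by randomly sampling the state space and applies asynchronous value iteration to approximate an optimal policy arbitrarily well.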

Comments Part of the results have been submitted to the IEEE International Conference on Robotics and Automation (ICRA 2012). Minnesota, USA, May 2012

Abstract

In this paper, we consider a class of continuous-time, continuous-space stochastic optimal control problems. Building upon recent advances in Markov chain approximation methods and sampling-based algorithms for deterministic path planning, we propose a novel algorithm called the incremental Markov Decision Process (iMDP) to compute incrementally control policies that approximate arbitrarily well an optimal policy in terms of the expected cost. The main idea behind the algorithm is to generate a sequence of finite discretizations of the original problem through random sampling of the state space. At each iteration, the discretized problem is a Markov Decision Process that serves as an incrementally refined model of the original problem. We show that with probability one, (i) the sequence of the optimal value functions for each of the discretized problems converges uniformly to the optimal value function of the original stochastic optimal control problem, and (ii) the original optimal value function can be computed efficiently in an incremental manner using asynchronous value iterations. Thus, the proposed algorithm provides an anytime approach to the computation of optimal control policies of the continuous problem. The effectiveness of the proposed approach is demonstrated on motion planning and control problems in cluttered environments in the presence of process noise.

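1202.2185 2026-06-03 cs.RO cs.SY eess.SY math.OC version update

Temporal Logic Motion Control using Actor-Critic Methods

Xu Chu Ding, Jing Wang, Morteza Lahijanian, Ioannis Ch. Paschalidis, Calin A. Belta

AI summary: For control from temporal logic specifications in large partitioned environments, proposes an actor-critic approximate dynamic programming framework based on least-squares temporal difference learning that optimizes the parameters of a randomized control policy to reach an approximately optimal policy.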

Comments Technical Report which accompanies an ICRA2012 paper

Abstract

In this paper, we consider the problem of deploying a robot from a specification given as a temporal logic statement about some properties satisfied by the regions of a large, partitioned environment. We assume that the robot has noisy sensors and actuators and model its motion through the regions of the environment as a Markov Decision Process (MDP). The robot control problem becomes finding the control policy maximizing the probability of satisfying the temporal logic task on the MDP. For a large environment, obtaining transition probabilities for each state-action pair and solving the necessary optimization problem for the optimal policy are usually not computationally feasible. To address these issues, we propose an approximate dynamic programming framework based on a least-squares temporal difference learning method of the actor-critic type. This framework operates on sample paths of the robot and optimizes a randomized control policy with respect to a small set of parameters. The transition probabilities are obtained only when needed. Hardware-in-the-loop simulations confirm that convergence of the parameters translates to an approximately optimal policy.

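1112.5282 2026-06-03 cs.RO cs.SY eess.SY version update

Observability of Strapdown INS Alignment: A Global Perspective

Yuanxin Wu, Hongliang Zhang, Meiping Wu, Xiaoping Hu, Dewen Hu

AI summary: Takes a global perspective, using inherent properties such as the attitude evolution on the SO(3) manifold, to study the observability of strapdown INS static and tumbling alignment; proves that successive rotation about two different axes gives complete observability, while rotation about a single axis leaves a finite number of unobservable states.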

Comments 25 pages; IEEE Trans. on Aerospace and Electronic Systems, Jan. 2012

DOI: 10.1109/TAES.2012.6129622
Journal ref: IEEE Trans. on Aerospace and Electronic Systems, 48(1), pp. 78-102, 2012

Abstract

Alignment of the strapdown inertial navigation system (INS) has strong nonlinearity, even worse when maneuvers, e.g., tumbling techniques, are employed to improve the alignment. There is no general rule to attack the observability of a nonlinear system, so most previous works addressed the observability of the corresponding linearized system by implicitly assuming that the original nonlinear system and the linearized one have identical observability characteristics. Strapdown INS alignment is a nonlinear system that has its own characteristics. Using the inherent properties of strapdown INS, e.g., the attitude evolution on the SO(3) manifold, we start from the basic definition and develop a global and constructive approach to investigate the observability of strapdown INS static and tumbling alignment, highlighting the effects of the attitude maneuver on observability. We prove that strapdown INS alignment, considering the unknown constant sensor biases, will be completely observable if the strapdown INS is rotated successively about two different axes and will be nearly observable for finite known unobservable states (no more than two) if it is rotated about a single axis. Observability from a global perspective provides us with insights into and a clearer picture of the problem, shedding light on previous theoretical results on strapdown INS alignment that were not comprehensive or consistent. The reporting of inconsistencies calls for a review of all linearization-based observability studies in the vast literature. Extensive simulations with constructed ideal observers and an extended Kalman filter are carried out, and the numerical results accord with the analysis. The conclusions can also assist in designing the optimal tumbling strategy and the appropriate state observer in practice to maximize the alignment performance.

1111.2258 2026-06-03 cs.RO cs.SY eess.SY version update

Design and Implementation of Prosthetic Arm using Gear Motor Control Technique with Appropriate Testing

Biswarup Neogi, Soumyajit Mukherjee, Soumya Ghosal, Achintya Das, D. N. Tibarewala

AI summary: Proposes a hardware design for a prosthetic arm based on gear motor control, with arm movement implemented through processor programming and muscle strain used in place of conventional EMG signals; a lightweight prosthetic model was tested successfully.

Comments 5 pages, 13 figures

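1109.1251 2026-06-03 cs.RO cs.SY eess.SY math.OC version update

Synthesis of Distributed Control and Communication Schemes from Global LTL Specifications

Yushan Chen, Xu Chu Ding, Calin Belta

AI summary: Proposes a technique for synthesizing multi-agent control and communication strategies from a global Linear Temporal Logic (LTL) specification, using concurrency theory to check whether the specification is distributable and LTL model checking to generate individual strategies.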

Comments Technical Report accompanying an accepted paper for CDC2011

1111.1684 2026-06-03 cs.RO cs.SY eess.SY version update

Simulation Techniques and Prosthetic Approach Towards Biologically Efficient Artificial Sense Organs- An Overview

Biswarup Neogi, Soumya Ghosal, Soumyajit Mukherjee, Achintya Das, D. N. Tibarewala

AI summary: Reviews applications of control theory to prosthetic sense organs (vision, taste, and smell), focusing on the role of simulation techniques and control modeling in evaluating and designing artificial organs.

Comments 12 Pages

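1108.3221 2026-06-03 eess.SY cs.RO cs.SY math.OC version update

An Optimal Control Approach for the Persistent Monitoring Problem

Christos G. Cassandras, Xu Chu Ding, Xuchao Lin

AI summary: Proposes an optimal control framework that controls the motion of mobile agents to minimize an uncertainty metric over the mission space, and uses infinitesimal perturbation analysis to reduce the problem to parametric optimization.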

Comments Technical report accompanying the CDC2011 submission

1109.2288 2026-06-03 cs.RO cs.SY eess.SY version update

Heterogeneity for Increasing Performance and Reliability of Self-Reconfigurable Multi-Robot Organisms

S. Kernbach, F. Schlachter, R. Humza, J. Liedke, S. Popesku, S. Russo, T. Ranzani, L. Manfredi, C. Stefanini, R. Matthias, Ch. Schwarzer, B. Girault, P. Alschbach, E. Meister, O. Scholz

AI summary: Studies design choices and performance evaluation for heterogeneity in self-reconfigurable modular robot systems, showing through experiments on mechatronic and software design that heterogeneous platforms improve performance and reliability.

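1108.6175 2026-06-03 cs.RO cs.SY eess.SY Version update

Adaptive Locomotion of Multibody Snake-like Robot

Eugen Meister, Sergej Stepanenko, Serge Kernbach

AI summary: For a snake-like robot with 25 degrees of freedom, proposes an adaptive rhythmic control algorithm, studies its behavioral and energetic properties in simulation and real-robot experiments, and analyzes the dynamic differences between body segments.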

Comments Multibody Dynamics 2011, ECCOMAS Thematic Conference, J.C. Samin, P. Fisette (eds.) Brussels, Belgium, 4-7 July, 2011

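1108.4698 2026-06-03 cs.RO cs.SY eess.SY math.OC Version update

Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

Reza Moazzez Estanjini, Xu Chu Ding, Morteza Lahijanian, Jing Wang, Calin A. Belta, Ioannis Ch. Paschalidis

AI summary: For Markov decision process problems of maximizing the probability of reaching certain states while avoiding others, proposes an actor-critic approximate dynamic programming algorithm based on least squares temporal difference learning and proves that it converges to a stationary point in the parameter space.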

Comments Technical report accompanying an accepted paper to CDC 2011

1108.5624 2026-06-03 cs.RO cs.SY eess.SY Version update

Multi-Robot Searching Algorithm Using Levy Flight and Artificial Potential Field

Donny K. Sutantyo, Serge Kernbach, Valentin A. Nepomnyashchikh, Paul Levi

AI summary: Proposes a multi-robot searching algorithm that combines Levy flight with an artificial potential field, verifies its efficiency experimentally, and develops a general framework.

Comments Eighth IEEE International Workshop on Safety, Security, and Rescue Robotics (SSRR-2010), Bremen, Germany, 26-30 July 2010

1108.5543 2026-06-03 cs.RO cs.NE cs.SY eess.SY Version update

Multi-Robot Organisms: State of the Art

Serge Kernbach, Oliver Scholz, Kanako Harada, Sergej Popesku, Jens Liedke, Humza Raja, Wenguo Liu, Fabio Caparrelli, Jaouhar Jemai, Jiri Havlik, Eugen Meister, Paul Levi

AI summary: This paper surveys recent progress in artificial multi-robot organisms, covering mechatronics, sensors and computational equipment, and software frameworks, and presents a grand challenge for the swarm and reconfigurable robotics field.

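1108.4432 2026-06-03 cs.RO cs.SY eess.SY math.OC physics.comp-ph Version update

Exploiting the Passive Dynamics of a Compliant Leg to Develop Gait Transitions

Harold Roberto Martinez Salazar, Juan Pablo Carbajal

AI summary: Analyzes the spring-loaded inverted pendulum model as a hybrid dynamical system to identify stable and unstable regions, exploits the unstable regions to induce gait transitions at constant energy, and proposes a simple variable angle-of-attack control strategy that keeps the system stable almost always.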

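1108.3240 2026-06-03 cs.RO cs.SY eess.SY math.OC Version update

Multi-robot Deployment From LTL Specifications with Reduced Communication

Marius Kloetzer, Xu Chu Ding, Calin Belta

AI summary: Proposes a hierarchical framework that uses finite abstractions, parallel composition, and motion planning to automatically deploy a team of unicycle robots from a global LTL specification, with algorithms designed to reduce inter-robot communication during execution.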

Comments CDC 2011 Technical Report

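1108.2126 2026-06-03 cs.RO cs.SY eess.SY math.OC Version update

Multi-Modal Local Sensing and Communication for Collective Underwater Systems

Serge Kernbach, Tobias Dipper, Donny Sutantyo

AI summary: This paper studies local sensing and communication for the networked and swarm modes of collective underwater systems, where specific combinations of modal and sub-modal communication enable dedicated cooperation among multiple AUVs.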

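1006.2165 2026-06-03 stat.ME cs.AI cs.RO cs.SY eess.SY math.OC stat.ML Version update

A Probabilistic Perspective on Gaussian Filtering and Smoothing

Marc Peter Deisenroth, Henrik Ohlsson

AI summary: This paper unifies Gaussian filtering and smoothing methods from a probabilistic perspective, noting that they differ only in how the means and covariances of joint probabilities are computed or approximated, and on that basis derives a cubature Kalman smoother and robust filtering and smoothing algorithms based on Gibbs sampling.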

Comments 14 pages. Extended version of conference paper (ACC 2011)

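1106.0708 2026-06-03 math.OC cs.MA cs.RO cs.SY eess.SY Version update

Optimal Sensor Configurations for Rectangular Target Detection

François-Alex Bourque, Bao U. Nguyen

AI summary: For targets with rectangular symmetry and uniformly distributed orientation, proposes an optimal search strategy that selects n angles uniformly over half a circle, and gives a lower bound on the probability of non-detection.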

Comments 6 pages, 2 figures

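1105.2254 2026-06-03 math.OC cs.RO cs.SY eess.SY Version update

Symmetries in observer design: review of some recent results and applications to EKF-based SLAM

Silvere Bonnabel

AI summary: This paper reviews the theory of symmetry-preserving observers and its recent advances, applies it to simultaneous localization and mapping based on the extended Kalman filter (EKF SLAM), proposes a new symmetry-preserving extended Kalman filter with convergence properties, and proves that specific gain choices ensure global exponential convergence.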

Comments This paper accompanies a presentation to be given at Eighth International Workshop on Robot Motion and Control (RoMoCo'11)

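1103.4342 2026-06-03 cs.RO cs.SY eess.SY math.OC Version update

MDP Optimal Control under Temporal Logic Constraints

Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

AI summary: For Markov decision processes (MDPs), proposes a method that automatically generates a control policy under a given Linear Temporal Logic (LTL) specification, introduces an optimizing proposition to minimize expected cost, and synthesizes optimal or sub-optimal policies with a dynamic programming algorithm.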

Comments Technical report accompanying the CDC2011 submission

Abstract

In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exists. In addition, we designate an "optimizing proposition" to be repeatedly satisfied, and we formulate a novel optimization criterion in terms of minimizing the expected cost in between satisfactions of this proposition. We propose a sufficient condition for a policy to be optimal, and develop a dynamic programming algorithm that synthesizes a policy that is optimal under some conditions, and sub-optimal otherwise. This problem is motivated by robotic applications requiring persistent tasks, such as environmental monitoring or data gathering, to be performed.


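1102.3396 2026-06-03 cs.RO cs.SY eess.SY Version update

Detecting Separation in Robotic and Sensor Networks

Chenda Liao, Harshavardhan Chenji, Prabir Barooah, Radu Stoleru, Tamás Kalmár-Nagy

AI summary: For robotic and sensor networks in which nodes may become separated from the base station through mobility or failure, proposes a distributed algorithm based on an averaging scheme that detects permanent separation by monitoring the convergence of node states.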

Abstract

In this paper we consider the problem of monitoring and detecting separation of agents from a base station in robotic and sensor networks. Such separation can be caused by mobility and/or failure of the agents. While separation/cut detection may be performed by passing messages between a node and the base in static networks, such a solution is impractical for networks with high mobility, since routes are constantly changing. We propose a distributed algorithm to detect separation from the base station. The algorithm consists of an averaging scheme in which every node updates a scalar state by communicating with its current neighbors. We prove that if a node is permanently disconnected from the base station, its state converges to $0$. If a node is connected to the base station in an average sense, even if not connected at any instant, then we show that the expected value of its state converges to a positive number. Therefore, a node can detect if it has been separated from the base station by monitoring its state. The effectiveness of the proposed algorithm is demonstrated through simulations, a real system implementation, and experiments involving both static and mobile networks.
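
The averaging idea in this abstract can be illustrated with a minimal sketch. The update rule (divide the neighbor sum by degree plus one, with the base station adding a constant source term), the source value, the threshold, and the fixed five-node topology are all assumptions made here for illustration; the abstract does not specify them, and the paper's algorithm also covers mobile networks, which this sketch does not.

```python
# Hypothetical illustration of an averaging scheme for separation detection.
# The update rule and all constants are assumptions, not taken from the paper.

def step(states, neighbors, base, source):
    """One synchronous round: each node combines its neighbors' states.

    Dividing by (degree + 1) makes an isolated group lose mass every round,
    while the base station injects a constant `source` into its own group.
    """
    new_states = {}
    for node, nbrs in neighbors.items():
        total = sum(states[j] for j in nbrs)
        if node == base:
            total += source
        new_states[node] = total / (len(nbrs) + 1)
    return new_states


def run(neighbors, base, source=100.0, rounds=200):
    """Iterate the update from an all-ones initial state."""
    states = {node: 1.0 for node in neighbors}
    for _ in range(rounds):
        states = step(states, neighbors, base, source)
    return states


# Nodes 0-1-2 form a chain attached to the base (node 0); nodes 3-4 are cut off.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
FINAL = run(NEIGHBORS, base=0)

# A node flags itself as separated when its state has decayed to (near) zero.
SEPARATED = [node for node, x in FINAL.items() if x < 1e-6]
```

In this toy topology the states of nodes 3 and 4 halve every round, so they fall below the threshold, while the nodes still attached to the base settle at positive values.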


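1003.4831 2026-06-03 cs.RO cs.SY eess.SY physics.med-ph Version update

Ball on a beam: stabilization under saturated input control with large basin of attraction

Yannick Aoustin, Alexander Formal'skii

AI summary: For two underactuated beam-and-ball systems, one straight and one circular, designs feedback control laws from the Jordan form that account for voltage saturation so that the basin of attraction approaches the controllability domain, and verifies the nonlinear control laws in simulation.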

DOI: 10.1007/s11044-008-9128-0
Journal ref: Multibody System Dynamics 21 (2008) 71-89

Abstract

This article is devoted to the stabilization of two underactuated planar systems, the well-known straight beam-and-ball system and an original circular beam-and-ball system. The feedback control for each system is designed using the Jordan form of its model, linearized near the unstable equilibrium. The limits on the voltage fed to the motor are taken into account explicitly. The straight beam-and-ball system has one unstable mode in the motion near the equilibrium point. The proposed control law ensures that the basin of attraction coincides with the controllability domain. The circular beam-and-ball system has two unstable modes near the equilibrium point. Therefore, this device, never considered in the past, is much more difficult to control than the straight beam-and-ball system. The main contribution is a simple new control law whose gain parameters can be adjusted so that, in the linear case, the basin of attraction approaches the controllability domain arbitrarily closely. For both nonlinear systems, simulation results are presented to illustrate the efficiency of the designed nonlinear control laws and to determine the basin of attraction.
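
To see why a basin of attraction can coincide with the controllability domain when there is a single unstable mode, consider the following scalar sketch. The symbols $y$, $\lambda$, $b$, $u_0$, and $k$ and the specific saturated feedback are assumptions introduced here for illustration; they are not taken from the paper.

```latex
% Assumed scalar unstable mode with bounded input (illustrative notation)
\dot{y} = \lambda y + b u, \qquad \lambda > 0,\; b > 0,\; |u| \le u_0 .

% The strongest restoring input is u = -u_0 \operatorname{sign}(y),
% so |y| can be made to decrease only inside the controllability domain
\lambda |y| < b u_0 \quad\Longleftrightarrow\quad |y| < \frac{b u_0}{\lambda} .

% Saturated linear feedback with gain k > \lambda / b
u = -u_0 \operatorname{sat}\!\left(\frac{k y}{u_0}\right), \qquad
\operatorname{sat}(z) = \max\bigl(-1, \min(1, z)\bigr) .

% Unsaturated region |y| \le u_0 / k: exponentially stable linear dynamics
\dot{y} = (\lambda - b k)\, y, \qquad \lambda - b k < 0 .

% Saturated region u_0 / k < |y| < b u_0 / \lambda: |y| still decreases
\frac{d}{dt} |y| = \lambda |y| - b u_0 < 0 .
```

Under these assumptions every initial state with $|y| < b u_0 / \lambda$ converges to the origin, and no admissible input can recover a state outside that interval, so the basin of attraction equals the controllability domain.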


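1010.2247 2026-06-03 math.OC cs.RO cs.SY eess.SY Version update

Regions of Attraction for Hybrid Limit Cycles of Walking Robots

Ian R. Manchester, Mark M. Tobenkin, Michael Levashov, Russ Tedrake

AI summary: This paper applies recent results on region-of-attraction analysis for nonlinear hybrid limit cycles to three example systems (the van der Pol oscillator, the rimless wheel, and the compass gait). It details the use of sum-of-squares analysis and semidefinite programming to find Lyapunov functions for the transversal dynamics, and illustrates several aspects: optimizing transversal surfaces, handling impact maps, optimizing the Lyapunov function, and orbitally stabilizing control design.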

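1008.3760 2026-06-03 cs.RO cs.SY eess.SY math.OC Version update

Formal-language-theoretic Optimal Path Planning For Accommodation of Amortized Uncertainties and Dynamic Effects

Ishanu Chattopadhyay, Anthony Cascone, Asok Ray

AI summary: Proposes a globally optimal path planning method based on the theory of quantitative measures of formal languages. It models uncertainty through probabilistic uncontrollable transitions and uses search-free combinatorial optimization to maximize the measure of probabilistic regular languages, which maximizes the probability of reaching the goal and minimizes the probability of hitting obstacles in robot navigation.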

Comments Submitted for review for possible publication elsewhere; journal reference will be added when available

Abstract

We report a globally optimal approach to robotic path planning under uncertainty, based on the theory of quantitative measures of formal languages. A significant generalization of the language-measure-theoretic path planning algorithm $\nu^\star$ is presented that explicitly accounts for average dynamic uncertainties and estimation errors in plan execution. The notion of the navigation automaton is generalized to include probabilistic uncontrollable transitions, which account for uncertainties by modeling and planning for probabilistic deviations from the computed policy in the course of execution. The planning problem is solved by casting it in the form of a performance maximization problem for probabilistic finite state automata. In essence, we solve the following optimization problem: compute the navigation policy that maximizes the probability of reaching the goal while simultaneously minimizing the probability of hitting an obstacle. Key novelties of the proposed approach include the modeling of uncertainties using the concept of uncontrollable transitions, and the solution of the ensuing optimization problem using a highly efficient search-free combinatorial approach to maximize quantitative measures of probabilistic regular languages. Applicability of the algorithm in various models of robot navigation has been shown with experimental validation on a two-wheeled mobile robotic platform (SEGWAY RMP 200) in a laboratory environment.
