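arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

Autonomous Driving

Autonomous driving perception, planning, BEV, occupancy prediction, LiDAR, and simulation-based evaluation.

Entries for today/current date: 19. Sources: cs.RO, cs.CV, eess.IV, cs.AI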

1. Simulation and Evaluation: 1 paper

2606.19267 2026-06-18 cs.RO cs.SY eess.SY New submission 95%

A Mixed-Reality Testbed for Autonomous Vehicles

自动驾驶汽车的混合现实测试平台

H. M. Sabbir Ahmad, Ehsan Sabouni, Emrullah Celik, Zean Wan, Damola Ajeyemi, Christos G. Cassandras, Wenchao Li

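Affiliations: Division of Systems Engineering; Boston University; Department of Electrical and Computer Engineering

Topic match: Simulation and evaluation. A mixed-reality testbed for autonomous-driving validation.

AI summary: Proposes a mixed-reality hardware-in-the-loop testbed that integrates physical mobile robots with a high-fidelity simulation environment to validate perception, planning, and control algorithms, and that supports research on multi-agent systems.

Comments: 9 pages, 7 figures, 1 table

Details
AI-generated Chinese abstract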

我们提出了一种用于自动驾驶汽车的混合现实、硬件在环(HIL)测试平台,该平台将物理移动机器人测试平台与高保真仿真环境无缝集成。虚拟仿真能够创建多样化的、安全关键的驾驶场景,以验证最先进的感知、规划和控制算法,同时通过配备多模态传感器的物理机器人在逼真的虚拟环境中增强仿真,进一步促进严格的验证。我们的测试平台还利用无线通信实现车辆连接,并通过物理机器人和虚拟仿真代理的组合容纳大量代理,支持包括网联自动驾驶汽车(CAV)在内的多智能体系统研究。最后,我们提出了一种结合感知、规划和一种新颖的基于控制障碍函数(CBF)的在线学习控制器的安全保证框架,用于CAV。使用所提出框架的实验用于验证和展示测试平台的关键功能以及其在弥合仿真与真实世界硬件部署之间差距方面的整体效用。

English abstract

We propose a mixed-reality, hardware-in-the-loop (HIL) testbed for autonomous vehicles that seamlessly integrates a physical testbed of mobile robots with a high-fidelity simulation environment. The virtual simulation enables the creation of diverse, safety-critical driving scenarios to validate state-of-the-art perception, planning, and control algorithms, while augmenting the simulation with physical robots that carry multimodal sensors and operate in photorealistic virtual environments further facilitates rigorous validation. Our testbed also features vehicular connectivity using wireless communication and can accommodate a large number of agents through the combination of physical robots and virtual simulated agents, supporting research on multi-agent systems including Connected and Autonomous Vehicles (CAVs). Finally, we present a safety-guaranteed framework for CAVs that combines perception, planning, and a novel online learning-based controller using Control Barrier Functions (CBFs). Experiments with the proposed framework validate and demonstrate the key functionalities of the testbed and its overall utility in bridging the gap between simulation and real-world hardware deployment.
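The abstract does not spell out the CBF-based controller, so the following is only a minimal sketch of the generic control-barrier-function idea, not the authors' online learning-based controller. It assumes a hypothetical car-following scenario with barrier h = gap - headway * v_ego and double-integrator ego dynamics; the function name and the `headway` and `alpha` parameters are illustrative assumptions.

```python
def cbf_safe_accel(gap, v_ego, v_lead, u_nom, headway=1.5, alpha=1.0):
    """Filter a nominal acceleration through a control barrier function.

    Hypothetical setup (not from the paper): h = gap - headway * v_ego must
    stay non-negative. With gap' = v_lead - v_ego and v_ego' = u, the CBF
    condition h' + alpha * h >= 0 becomes an upper bound on u.
    """
    h = gap - headway * v_ego
    # h' = (v_lead - v_ego) - headway * u >= -alpha * h  =>  u <= u_bound
    u_bound = (v_lead - v_ego + alpha * h) / headway
    # The QP "min (u - u_nom)^2 subject to u <= u_bound" has this closed form.
    return min(u_nom, u_bound)


# Far from the lead vehicle the nominal command passes through unchanged;
# once the barrier is violated the filter forces braking.
far = cbf_safe_accel(gap=100.0, v_ego=20.0, v_lead=20.0, u_nom=1.0)
close = cbf_safe_accel(gap=20.0, v_ego=20.0, v_lead=15.0, u_nom=1.0)
```

Because the constraint is a single linear bound on a scalar input, no QP solver is needed here; a multi-constraint or multi-input problem would require one.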

2. Perception: 12 papers

2606.19258 2026-06-18 cs.CV cs.RO New submission 90%

CABLE: Cloud-Assisted Bandwidth-efficient LMM-based Encoding for V2X Systems

CABLE: 面向V2X系统的云辅助带宽高效LMM编码框架

Haohua Que, Zhipeng Bao, Qianyi Wu, Handong Yao

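Affiliations: College of Engineering, University of Georgia

Topic match: Perception. A cloud-assisted, bandwidth-efficient perception framework for V2X systems.

AI summary: Proposes the CABLE framework, which propagates cloud segmentation masks on the edge using ego-motion compensation and residual-motion cues to generate a region of interest (ROI), uploads only ROI-masked images, and forms a mask-to-ROI-to-LMM feedback loop, achieving a 73-87% reduction in ROI pixel coverage and a 5-8× LMM prefill speedup across five datasets.

Details
AI-generated Chinese abstract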

云托管的大型多模态模型(LMM)可以为车联网系统提供强大的开放词汇感知能力,但简单地将全分辨率帧从边缘传输到云会导致严重的通信开销和云侧预填充延迟。我们提出了CABLE,一种用于边缘-云感知的云辅助带宽高效LMM编码框架。CABLE在边缘端利用自我运动补偿传播先前的云分割掩码,通过残差运动线索进行细化,并通过走廊包络整合断开区域,形成鲁棒的感兴趣区域(ROI)。仅上传ROI掩码图像,而云分割输出作为下一帧的先验反馈,形成掩码-ROI-LMM反馈循环。在五个数据集(nuScenes、WOD-ZB、Waymo、KITTI和CADC)上的实验表明,该方法在保持感知能力的同时实现了显著的通信节省,相对于全帧推理,ROI像素覆盖减少73-87%,估计LMM预填充加速5-8倍,检测质量略有折衷。

English abstract

Cloud-hosted large multimodal models (LMMs) can provide strong open-vocabulary perception for Vehicle-to-Everything systems, but naively transmitting full-resolution frames from edge to cloud causes severe communication overhead and high cloud-side prefill latency. We present CABLE, a cloud-assisted bandwidth-efficient LMM-based encoding framework for edge-cloud perception. CABLE propagates the previous cloud segmentation mask on the edge using ego-motion compensation, refines it with residual-motion cues, and consolidates disconnected regions via a corridor envelope to form a robust region of interest (ROI). Only ROI-masked images are uploaded, while the cloud segmentation output is fed back as the prior for the next frame, forming a mask-to-ROI-to-LMM feedback loop. Experiments on five datasets (nuScenes, WOD-ZB, Waymo, KITTI, and CADC) show consistent communication savings while largely preserving perception, achieving a 73-87% ROI pixel-coverage reduction with a 5-8× estimated LMM prefill speedup at a modest detection-quality trade-off relative to full-frame inference.
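The mask-to-ROI step can be illustrated with a heavily simplified sketch. It assumes that ego-motion compensation reduces to an integer pixel shift, and it omits the residual-motion refinement and corridor envelope entirely; the function names and the nested-list image format are hypothetical, not CABLE's actual interface.

```python
def shift_mask(mask, dx, dy):
    """Translate a binary mask by (dx, dy) pixels; exposed cells become 0.

    Stand-in for ego-motion compensation (an assumption of this sketch).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[ny][nx] = 1
    return out


def coverage(mask):
    """Fraction of pixels kept by the ROI mask."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))


def apply_roi(image, mask):
    """Zero out every pixel outside the ROI before upload."""
    return [[p if m else 0 for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


prev_mask = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
roi = shift_mask(prev_mask, dx=1, dy=0)   # mask from the previous frame, moved right
frame = [[9] * 4 for _ in range(4)]
upload = apply_roi(frame, roi)
reduction = 1.0 - coverage(roi)           # share of pixels not uploaded
```

In the feedback loop described above, the segmentation returned by the cloud for `upload` would become `prev_mask` for the next frame.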

2606.18824 2026-06-18 cs.CV cs.LG New submission 90%

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

他们将去哪里?从自我中心视频建模多模态行人机动

Yuxuan Xie, Nicolas Pugeault, Chongfeng Wei, Hubert P. H. Shum, Edmond S. L. Ho

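Affiliations: School of Computing Science, University of Glasgow; James Watt School of Engineering, University of Glasgow; Department of Computer Science, Durham University

Topic match: Perception. Pedestrian trajectory prediction from ego-centric video for autonomous driving.

AI summary: Proposes the MMPM framework, which uses a behavior-aware interaction module and a CVAE-based mode-aware trajectory predictor to model crossing and non-crossing pedestrians separately, improving the accuracy of multimodal trajectory prediction from the ego-centric view.

Comments: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

Details
AI-generated Chinese abstract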

从自我中心摄像头进行行人轨迹预测具有挑战性,因为它依赖于与车辆和场景上下文的复杂交互以及行人的意图。通过建模行人历史与未来轨迹的相关性和意图,通常会产生多模态(即多个模式)分布。现有的随机预测器通常从单一单峰分布中采样多个未来轨迹,这可能导致次优的“混合模式”轨迹,这些轨迹位于不同的运动模式之间,并在真实场景中变得不合理。在本文中,我们提出MMPM,一种模态感知框架,基于行人的过马路行为将未来轨迹分布分别建模为语义上有意义的模式。MMPM由两个模块组成:行为感知行人交互模块(PIM),通过引入注视、头部和手势来联合捕捉行人-车辆和行人-环境交互;以及基于CVAE的模态感知轨迹预测器(MTP)模块,分别对过马路和不过马路两种模式的未来轨迹分布进行建模。基于查询的解码器进一步在解码过程中强制执行模态一致性。在PIE和JAAD数据集上的实验表明,我们的方法超越了最先进的基线。我们提出的MTP是模型无关的,可以集成到现有框架如BiTrap-NP和SGNet-ED中,以进一步提高未来轨迹预测性能。我们还引入了一种数据驱动的验证协议,将预测与时空一致的真实轨迹匹配,展示了相比先前工作改进的逐帧位移误差。

English abstract

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. Modelling the correlation and intent that link a pedestrian's historical and future trajectories usually results in a multimodal distribution (i.e. one with multiple modes). Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: a behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head, and hand-gesture cues, and a CVAE-based Mode-aware Trajectory Predictor (MTP) that models the future trajectory distributions of two modes, crossing and not crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on the PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic and can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

2606.18948 2026-06-18 cs.RO New submission 85%

C-ARC: Continuous-Adaptive Range Clustering for Non-Repetitive LiDAR Sensors

C-ARC: 面向非重复式LiDAR传感器的连续自适应范围聚类

Nick B. Schroeder, Jonathan Lichtenfeld, Oskar von Stryk

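Affiliations: Technical University of Darmstadt; Simulation, Systems Optimization and Robotics Group

Topic match: Perception. A real-time LiDAR point-cloud clustering framework for autonomous-driving perception.

AI summary: Proposes the C-ARC framework, which maintains a persistent dual-graph over a sliding window to decouple high-frequency point insertion from on-demand cluster retrieval, and calibrates the grid resolution adaptively with an exponential control loop, enabling real-time clustering of non-repetitive LiDAR point clouds.

Comments: Submitted to IEEE Robotics and Automation Letters. This work has been submitted to the IEEE for possible publication. 8 pages, 7 figures

Details
AI-generated Chinese abstract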

实时LiDAR聚类识别点云中的结构,是许多移动机器人算法的重要前提。当前方法主要针对重复式机械LiDAR传感器开发。近年来,由于成本和外形尺寸小,非重复式LiDAR传感器的使用显著增加。这类基于Risley棱镜的非重复传感器违反了重复式机械传感器的两个关键假设:结构化的扫描线和明确的帧边界。其Rhodonea曲线轨迹产生非均匀点分布,且缺乏旋转周期使得传统扫描线索引无法适用。为满足这些新需求,我们开发了C-ARC,一个连续自适应范围聚类框架,它在滑动窗口上维护一个持久双图,将高频点插入与按需聚类检索解耦。这对于SLAM或跟踪等关键功能至关重要。自适应范围网格分辨率机制在初始化时使用指数控制环校准网格尺寸,无需预先了解扫描模式即可平衡稀疏-碰撞权衡。作为开源的单线程C++17库实现,C-ARC在商用硬件上对Livox Mid-360以20 Hz产生实时聚类输出。在Livox Avia上的评估表明,对于扫描模式高度集中的传感器,无界单元占用是主要限制。自适应分辨率机制还提高了现有基于网格的方法在非重复数据上的聚类质量。

English abstract

Real-time LiDAR clustering identifies structures in point clouds, which is an essential prerequisite for many mobile robotics algorithms. Current methods are mostly developed for repetitive mechanical LiDAR sensors. Recently, the use of non-repetitive LiDAR sensors has increased strongly owing to their low cost and small form factor. Such non-repetitive Risley prism-based sensors violate two key assumptions of repetitive mechanical sensors: structured scan lines and well-defined frame boundaries. Their Rhodonea-curve trajectories produce non-uniform point distributions, and the absence of a rotation cycle renders conventional scan line indexing inapplicable. To meet such new requirements, we developed C-ARC, a Continuous-Adaptive Range Clustering framework that maintains a persistent dual-graph over a sliding window, decoupling high-frequency point insertion from on-demand cluster retrieval. This is crucial for key functionalities like SLAM or tracking. An adaptive range grid resolution mechanism calibrates grid dimensions at initialization using an exponential control loop, balancing the sparsity-collision trade-off without prior knowledge of the scanning pattern. Implemented as an open-source single-threaded C++17 library, C-ARC produces real-time cluster output at 20 Hz on commodity hardware for the Livox Mid-360. Evaluation on the Livox Avia identifies unbounded cell occupancy as the primary limitation for sensors with strongly concentrated scan patterns. The adaptive resolution mechanism additionally improves clustering quality for existing grid-based methods on non-repetitive data.
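The abstract gives no details of the exponential control loop, so the sketch below is only one plausible reading: the cell size is rescaled multiplicatively until the mean number of points per occupied cell reaches a target. The Cartesian 2D grid, the `target`, `gain`, and `tol` parameters, and both function names are assumptions of this illustration, not C-ARC's implementation.

```python
import math


def mean_occupancy(points, cell):
    """Average number of points per occupied grid cell."""
    cells = {}
    for x, y in points:
        key = (int(x // cell), int(y // cell))
        cells[key] = cells.get(key, 0) + 1
    return len(points) / len(cells)


def calibrate_cell(points, target=4.0, cell=1.0, gain=0.5, tol=0.25, max_iter=50):
    """Scale the cell size exponentially until occupancy is close to target.

    The error is measured in log space, so the update is multiplicative.
    Cells that are too small leave the grid sparse; cells that are too
    large make unrelated points collide.
    """
    for _ in range(max_iter):
        err = math.log(target / mean_occupancy(points, cell))
        if abs(err) < tol:
            break
        cell *= math.exp(gain * err)
    return cell


# Synthetic, uniformly spaced calibration scan (hypothetical data).
scan = [(i + 0.5, j + 0.5) for i in range(20) for j in range(20)]
cell_size = calibrate_cell(scan)
```

With `gain=0.5` the loop converges in a single step on uniformly dense 2D data, because occupancy grows with the square of the cell size; real scan patterns are non-uniform and would need several iterations.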

2606.18864 2026-06-18 cs.LG cs.AI New submission 85%

Scaling Learning-based AEB with Massive Unlabeled Data

基于大规模无标签数据的可扩展学习型自动紧急制动

Xiangyu Wang, Yang Zhan, Mengxiang Hao, Chuanchuan Zhong, Yansong Jia, Junjie Zhang, Yu Han, Xin Jiang, Zhen Cao, Ying Wang, Yulun Song, Zhitao Xu

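Affiliations: Li Auto

Topic match: Perception. Automatic emergency braking, with performance improved through semi-supervised learning.

AI summary: Proposes a stabilized meta-feedback semi-supervised learning framework that uses noise-aware decoupling and kinematics-gated pseudo-labeling to exploit massive unlabeled data for automatic emergency braking, achieving a positive-to-false activation ratio above 100:1 and a 35% improvement in accident-free mileage.

Comments: Accepted for presentation at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Details
AI-generated Chinese abstract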

本文研究如何在生产约束下,利用大规模无标签车队数据扩展基于学习的自动紧急制动(AEB)。我们的方法基于元反馈半监督学习(MF-SSL),其中教师模型为无标签驾驶数据生成伪标签,并使用小型有标签锚定集作为安全关键反馈进行更新。在生产中,锚定歧义和有标签-无标签不匹配会放大系统性的伪标签错误,导致误触发。我们提出了一种稳定的MF-SSL框架,包括:(i) 噪声感知解耦,从教师监督更新路径中移除易产生歧义的锚定;(ii) 运动学门控伪标签,结合教师冲突惩罚,抑制无标签数据上由不匹配引起的风险幻觉,同时保持广泛覆盖。大量实验表明,随着无标签数据从1M扩展到1B窗口,模型性能持续提升,在保持舒适性的同时提高了安全性。经过1B数据训练的学生模型已部署到数十万辆车辆上,并在超过10^9公里的行驶中得到验证,实现了超过100:1的正误触发比,且相比仅基于规则的基线,无事故行驶里程提升了35%。

English abstract

This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over $10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.
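The abstract does not define the kinematic gate, so the following is a hypothetical illustration only: a positive pseudo label from the teacher is accepted only when a simple time-to-collision (TTC) check agrees that the scene is risky. The TTC criterion, both thresholds, and the function names are assumptions, and the teacher conflict penalty is not modeled.

```python
import math


def time_to_collision(gap_m, closing_speed_mps):
    """Seconds until contact at constant closing speed; inf if the gap is not shrinking."""
    if closing_speed_mps <= 0.0:
        return math.inf
    return gap_m / closing_speed_mps


def gated_pseudo_label(teacher_prob, gap_m, closing_speed_mps,
                       prob_thresh=0.9, ttc_thresh_s=3.0):
    """Return 1 (brake), 0 (no brake), or None (discard the sample).

    A confident "brake" prediction is kept only if the kinematics also look
    risky, which suppresses teacher hallucinations on benign scenes.
    """
    ttc = time_to_collision(gap_m, closing_speed_mps)
    if teacher_prob >= prob_thresh and ttc <= ttc_thresh_s:
        return 1
    if teacher_prob <= 1.0 - prob_thresh:
        return 0
    return None  # ambiguous or kinematically implausible


labels = [
    gated_pseudo_label(0.95, gap_m=10.0, closing_speed_mps=5.0),   # risky and confident
    gated_pseudo_label(0.95, gap_m=100.0, closing_speed_mps=5.0),  # confident but implausible
    gated_pseudo_label(0.05, gap_m=10.0, closing_speed_mps=5.0),   # confident negative
]
```

Dropping gated samples rather than relabeling them is a design choice of this sketch; a production system might down-weight them instead.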

2606.18687 2026-06-18 cs.CV cs.RO New submission 85%

Spatially Stratified Distillation for Heterogeneous Radar Place Recognition

空间分层蒸馏用于异构雷达位置识别

Sagun Singh Shrestha, Samuel Harding, Abdelwahed Khamis, Saimunur Rahman, Peyman Moghadam

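Affiliations: CSIRO Robotics; University of Queensland

Topic match: Perception. Radar place recognition for autonomous driving.

AI summary: Addresses heterogeneous place recognition between 4D automotive radars and dense spinning radars with spatially stratified distillation (SSD), which derives an asymmetric spatial alignment from physical radar returns, enforcing feature alignment in overlapping regions and down-weighting distillation in sparse regions, and reaches state-of-the-art performance on the HeRCULES dataset.

Comments: IEEE ICRA Workshop on Open Challenges for Rigorous Robot Perception 2026

Details
AI-generated Chinese abstract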

可扩展的全天候位置识别越来越依赖于异构雷达位置识别来桥接不同的硬件平台。一个显著的应用是将来自经济高效的4D汽车雷达的查询与由密集旋转雷达构建的高保真参考地图进行匹配。这一过程从根本上受到4D传感器极端稀疏性(和窄视场)的限制,该传感器仅捕获旋转雷达数据库中存在的结构密度的一小部分。先前的工作通过统一不同的雷达信号来解决这个问题,即将两种信号投影到共同的表示空间。然而,它们在多会话环境中性能下降。在本文中,我们提出了空间分层蒸馏(SSD);一种策略,用直接从物理雷达回波导出的非对称空间对齐取代标准的均匀蒸馏。在两个雷达都有重叠回波的区域,SSD强制进行强特征对齐。关键的是,在4D学生雷达缺乏回波但教师雷达在共享视场内包含有效结构的稀疏区域,SSD应用大幅折扣的蒸馏权重。对最近的HeRCULES数据集的广泛评估表明,SSD显著优于先前的位置识别方法,在其具有挑战性的动态序列上取得了最先进的结果。

English abstract

Scalable, all-weather place recognition increasingly relies on heterogeneous radar place recognition to bridge diverse hardware platforms. A notable application is matching queries from cost-effective 4D automotive radars against high-fidelity reference maps built by dense spinning radars. This process is fundamentally limited by the extreme sparsity (and narrow field-of-view) of the 4D sensor, which captures only a fraction of the structural density present in the spinning radar database. Prior efforts address this issue by unifying the different radar signals, that is, by projecting both signals into a common representational space. Yet they suffer performance degradation in multi-session environments. In this paper, we propose spatially stratified distillation (SSD), a strategy that replaces standard uniform distillation with an asymmetric spatial alignment derived directly from physical radar returns. In regions where both radars exhibit overlapping returns, SSD enforces strong feature alignment. Crucially, in sparse regions where the 4D student lacks returns but the teacher contains valid structure within the shared field of view, SSD applies heavily discounted distillation weights. Extensive evaluations on the recent HeRCULES dataset demonstrate that SSD significantly outperforms prior place recognition methods, achieving state-of-the-art results on its challenging dynamic sequences.
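The stratified weighting described above can be sketched as a per-cell weight map. This is a minimal illustration under stated assumptions, not the authors' loss: cells are flattened into 1D lists, the weights 1.0 and 0.1 are made-up values, cells outside both strata get weight 0, and the squared-error feature loss and function names are hypothetical.

```python
def stratified_weights(student_occ, teacher_occ, shared_fov,
                       w_overlap=1.0, w_sparse=0.1):
    """Per-cell distillation weights from binary radar-return masks."""
    weights = []
    for s, t, fov in zip(student_occ, teacher_occ, shared_fov):
        if s and t:
            weights.append(w_overlap)   # both radars see structure: align strongly
        elif t and fov and not s:
            weights.append(w_sparse)    # teacher-only structure inside the shared FOV
        else:
            weights.append(0.0)         # nothing reliable to distill (assumption)
    return weights


def weighted_distill_loss(student_feat, teacher_feat, weights):
    """Weighted mean squared error between student and teacher features."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    num = sum(w * (s - t) ** 2
              for w, s, t in zip(weights, student_feat, teacher_feat))
    return num / total


w = stratified_weights(student_occ=[1, 0, 0, 1],
                       teacher_occ=[1, 1, 1, 0],
                       shared_fov=[1, 1, 0, 1])
loss = weighted_distill_loss([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 5.0, 5.0], w)
```

Uniform distillation corresponds to setting every weight to 1.0, which would also penalize the student in cells it physically cannot observe.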

2606.19190 2026-06-18 cs.RO New submission 80%

FAST-LIVGO: A Degeneracy-Robust LiDAR-Inertial-Visual-GNSS Fusion Odometry

FAST-LIVGO:一种退化鲁棒的LiDAR-惯性-视觉-GNSS融合里程计

Zhiyu Chen, Chunran Zheng, Jiayu Wen, XiaoLei Zhang, Jiaming Xu, Feng Pan, Yukang Cui

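Affiliations: College of Mechatronics and Control Engineering, Shenzhen University; Department of Mechanical Engineering, The University of Hong Kong; College of Automation, Harbin Engineering University

Topic match: Perception. LiDAR-inertial-visual-GNSS fusion odometry for autonomous driving.

AI summary: Proposes a tightly coupled LiDAR-inertial-visual-GNSS fusion framework based on an error-state iterated Kalman filter, combining a Dynamic Time Warping spatiotemporal alignment module, Doppler and time-differenced carrier-phase observation models, and a degeneracy-aware dual-mode outlier rejection strategy to achieve accurate and robust state estimation in long-term, large-scale, dynamic environments.

Comments: Accepted for presentation at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026)

Details
AI-generated Chinese abstract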

在长期、大规模和高度动态环境中的鲁棒状态估计与建图仍然是机器人领域的关键挑战。现有的LiDAR-惯性-视觉里程计(LIVO)系统在局部精度上表现良好,但在长距离下会累积漂移,并在几何退化或无纹理场景中可能失效。同时,GNSS辅助融合框架通常依赖LiDAR或视觉里程计进行状态预测和异常值拒绝,使其在里程计退化时变得脆弱。为解决这些局限,我们提出一种基于误差状态迭代卡尔曼滤波的紧耦合LiDAR-惯性-视觉-GNSS融合框架。引入基于动态时间规整的在线时空对齐模块以应对高度动态条件。为更好利用GNSS精度,我们开发了基于多普勒频移和固定锚点时间差载波相位的观测模型,在不增加历史锚点状态的情况下提供毫米级相对约束。我们进一步设计了一种退化感知的双模式异常值拒绝策略,根据LIVO退化程度在LIVO先验引导拒绝和GNSS辅助恢复之间切换。在公开M3DGR数据集和自建20 m/s固定翼无人机数据集上的实验表明,我们的系统减少了累积漂移和地图重影,在精度和鲁棒性上优于现有方法。

English abstract

Robust state estimation and mapping in long-term, large-scale, and highly dynamic environments remains a key challenge in robotics. Existing LiDAR-Inertial-Visual Odometry (LIVO) systems achieve strong local accuracy but suffer from accumulated drift over long distances and may fail in geometrically degraded or textureless scenes. Meanwhile, GNSS-aided fusion frameworks often rely on LiDAR or visual odometry for state prediction and outlier rejection, making them vulnerable when odometry degenerates. To address these limitations, we propose a tightly coupled LiDAR-Inertial-Visual-GNSS fusion framework based on an Error-State Iterated Kalman Filter. An online spatiotemporal alignment module using Dynamic Time Warping is introduced for highly dynamic conditions. To better exploit GNSS precision, we develop observation models based on Doppler shifts and fixed-anchor Time-Differenced Carrier Phase, providing millimeter-level relative constraints without augmenting historical anchor states. We further design a degeneracy-aware dual-mode outlier rejection strategy that switches between LIVO-prior-guided rejection and GNSS-aided recovery according to the LIVO degeneracy level. Experiments on the public M3DGR dataset and a custom 20 m/s fixed-wing UAV dataset demonstrate that our system reduces accumulated drift and map ghosting, outperforming state-of-the-art methods in accuracy and robustness.
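For readers unfamiliar with Dynamic Time Warping, the textbook algorithm is sketched below on scalar sequences. How the paper applies DTW to spatiotemporal alignment is not described in the abstract, so this shows only the standard dynamic program; the function name and the absolute-difference cost are choices made for this illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    acc[i][j] is the minimum cumulative cost of aligning a[:i] with b[:j],
    where each step may advance either sequence or both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # advance a only
                                   acc[i][j - 1],      # advance b only
                                   acc[i - 1][j - 1])  # advance both
    return acc[n][m]


# The same speed profile sampled with a one-sample lag aligns at zero cost,
# which is why DTW tolerates time offsets between sensor streams.
lagged = dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

Backtracking through `acc` from the last cell would recover the warping path, from which a time offset between the two streams could be read off.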

2606.18599 2026-06-18 cs.CR cs.AI New submission 80%

MIDS: Detecting Stealthy Masquerade and Tampering Attacks on CAN Bus via Bidirectional Mamba

MIDS:通过双向Mamba检测CAN总线上的隐蔽伪装和篡改攻击

Qiqi Liu, Runhan Song, Lei Cui, Heng Zhang, Yuyan Sun, Limin Sun

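Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Zhongguancun Laboratory

Topic match: Perception. Proposes MIDS to detect CAN bus attacks and safeguard vehicle security.

AI summary: To address the CAN bus's exposure to attacks caused by its lack of encryption and authentication, proposes MIDS, a dual-stream framework that uses bidirectional state-space models to process identifiers and payloads in parallel, reaching an F1 of 96.94% on a Tesla Model 3 dataset, more than 8 percentage points above the baselines.

Details
AI-generated Chinese abstract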

控制器局域网(CAN)协议是现代车辆中电子控制单元(ECU)的主要通信标准,但其缺乏加密和认证,使其面临一系列安全威胁。现有的入侵检测系统主要针对制造型攻击(通过帧注入实现的DoS、模糊测试、ID欺骗),此类攻击中每ID到达间隔统计等检测信号易于获取。我们转而解决更困难的伪装场景,其中内部攻击者在其原始传输时隙原位替换合法帧,保持流量周期性,使基于流量统计的防御失效。我们提出Mamba入侵检测系统(MIDS),一种创新的双流框架,并行处理CAN标识符和载荷,并通过双向选择性状态空间建模重建其联合时间语义。为评估MIDS,我们从物理特斯拉Model 3在三种驾驶模式下收集了超过1亿个CAN帧,并合成了54种伪装攻击变体,涵盖仅ID、仅数据和组合修改。MIDS在该数据集上达到96.94%的F1分数,超过最强可复现基线8个百分点以上,同时保持1.147毫秒的单窗口推理延迟——为实时车载部署留有充足余量。为验证泛化能力,我们进一步在四个公开基准(ROAD、CrySyS、OTIDS、CT&T)上评估MIDS,涵盖伪装和注入场景;在统一的5折协议下,MIDS的F1分数从93.70%到99.61%,超过八个复现基线中最强者最多13.94个百分点。

English abstract

The Controller Area Network (CAN) protocol is the primary communication standard for Electronic Control Units (ECUs) in modern vehicles, but its lack of encryption and authentication exposes it to a range of security threats. Existing intrusion detection systems are largely tuned to fabrication-style attacks (DoS, fuzzing, ID spoofing realised by frame injection), in which detection signals such as per-ID inter-arrival statistics are readily available. We instead address the harder masquerade setting, in which an internal adversary substitutes a legitimate frame in situ at its original transmission slot, preserving traffic periodicity and rendering traffic-statistic defences ineffective. We propose the Mamba Intrusion Detection System (MIDS), an innovative dual-stream framework that processes CAN identifiers and payloads in parallel and reconstructs their joint temporal semantics through bidirectional selective state-space modelling. To evaluate MIDS, we collected over 100 million CAN frames from a physical Tesla Model 3 across three driving regimes and synthesised 54 masquerade attack variants spanning ID-only, data-only, and combined modifications. MIDS attains an F1 of 96.94% on this dataset, exceeding the strongest reproducible baseline by more than 8 percentage points, while sustaining a 1.147 ms single-window inference latency, which leaves ample headroom for real-time onboard deployment. To verify generalisation, we further evaluate MIDS on four public benchmarks (ROAD, CrySyS, OTIDS, CT&T) covering both masquerade and injection scenarios; MIDS attains F1 from 93.70% to 99.61%, outperforming the strongest of eight reproduced baselines by up to 13.94 percentage points under a unified 5-fold protocol.
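Why masquerade attacks evade traffic-statistic defences can be shown with a toy example. The frame format (timestamp in milliseconds, payload string), the 10 ms period, and the helper names are all hypothetical; the sketch only contrasts the two attack styles and does not reproduce MIDS or the paper's dataset.

```python
def inter_arrivals(frames):
    """Gaps between consecutive frame timestamps for one CAN ID."""
    times = [t for t, _ in frames]
    return [b - a for a, b in zip(times, times[1:])]


def inject(frames, t_ms, payload):
    """Fabrication attack: an extra frame is added to the bus."""
    return sorted(frames + [(t_ms, payload)], key=lambda f: f[0])


def masquerade(frames, index, payload):
    """Masquerade attack: a legitimate frame is replaced in its own slot."""
    out = list(frames)
    out[index] = (out[index][0], payload)
    return out


legit = [(0, "A"), (10, "A"), (20, "A"), (30, "A")]   # periodic 10 ms traffic
injected = inject(legit, 15, "X")
masked = masquerade(legit, 2, "X")

gaps_injected = inter_arrivals(injected)   # periodicity is visibly broken
gaps_masked = inter_arrivals(masked)       # identical to the legitimate timing
```

Because the masquerade trace keeps the legitimate timing, a detector has to reason about identifier and payload content over time, which is the motivation the abstract gives for the dual-stream design.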

2606.19307 2026-06-18 cs.RO New submission 70%

Observability and Consistency Analysis for Visual-Inertial Navigation with Anchored Feature Parameterizations

基于锚定特征参数化的视觉惯性导航的可观性与一致性分析

Mitchell Cohen, Vassili Korotkine, James Richard Forbes

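Affiliations: Department of Mechanical Engineering, McGill University

Topic match: Perception. Observability and consistency of visual-inertial navigation.

AI summary: Analyzes the observability and consistency of filtering-based visual-inertial navigation systems (VINS) that use anchored feature representations, showing that the unobservable subspace is independent of the estimated landmark state, which improves consistency, but still depends on the navigation state, so additional consistency-enforcing techniques are required.

Comments: Accepted to IEEE/RSJ IROS. 8 pages, 3 figures, 4 tables

Details
AI-generated Chinese abstract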

本文分析了使用锚定特征表示的基于滤波的视觉惯性导航系统(VINS)的可观性和一致性特性。结果表明,采用锚定地标参数化的VINS的不可观子空间独立于估计的地标状态,从而无需任何额外修改即可改善估计器的一致性。然而,不可观子空间仍然依赖于估计的导航状态,因此需要额外的一致性增强技术。本文提出了两种方法来改善采用锚定特征表示的VINS的一致性。仿真结果表明,与使用全局参考系解析特征的算法相比,所有采用锚定特征参数化的估计器都表现出更好的一致性,特别是在特征初始化可能较差的情况下。在TUM-VI数据集上的真实世界实验表明,仅使用锚定特征表示即可获得与采用全局特征表示的一致性改进估计器相当的性能,证明了在VINS中使用锚定特征参数化的优势。

English abstract

This paper presents an analysis of the observability and consistency properties of filtering-based visual-inertial navigation systems (VINS) that utilize anchored feature representations. The unobservable subspace of VINS with anchored landmark parameterizations is shown to be independent of the estimated landmark state, which leads to improved estimator consistency properties without any additional modifications. However, the unobservable subspace is still found to depend on the estimated navigation state, necessitating additional consistency-enforcing techniques. Two methods to improve the consistency of VINS with anchored feature representations are presented. Simulation results show that all estimators employing anchored feature parameterizations exhibit improved consistency properties compared to algorithms that estimate features resolved in a global reference frame, especially in scenarios where feature initialization may be poor. Real-world experiments on the TUM-VI dataset show that the use of anchored feature representations alone can yield comparable performance to consistency-improved estimators employing a global feature representation, demonstrating the benefit of using anchored feature parameterizations for VINS.

2606.19176 2026-06-18 cs.RO cs.AI cs.SY eess.SY New submission 70%

Hardware- and Vision-in-the-Loop Validation of Deep Monocular Pose Estimation for Autonomous Maritime UAV Flight

用于自主海上无人机飞行的深度单目位姿估计的硬件与视觉在环验证

Maneesha Wickramasuriya, Beomyeol Yu, Jaden Shin, Mason Huslig, Taeyoung Lee, Murray Snyder

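Affiliations: Mechanical and Aerospace Engineering, George Washington University

Topic match: Perception. Monocular pose estimation for UAVs, counted under autonomous-driving perception.

AI summary: Proposes a hardware-validated vision-in-the-loop framework that combines a deep transformer-based monocular pose estimator with a delayed Kalman filter to achieve autonomous indoor flight while emulating photorealistic maritime environments, capturing embedded effects such as perception latency.

Comments: 6 pages, 9 figures

Details
AI-generated Chinese abstract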

船舶上的自主无人机操作需要可靠的基于视觉的相对位姿估计,然而海上验证成本高、依赖天气且风险大。本文提出一个硬件验证的视觉在环框架,能够在模拟逼真海上环境的同时实现完全自主的室内飞行。渲染的海上视图由板载的基于深度变换器的单目位姿估计器处理。延迟的视觉测量与高频率IMU数据通过延迟卡尔曼滤波器融合,为几何控制提供一致的状态估计。该系统捕捉了纯仿真中缺失的关键嵌入式效应,包括感知延迟、异步更新和计算约束。自主起飞、轨迹跟踪和着陆实验证明了稳定的闭环飞行。结果建立了一个安全且硬件真实的中间阶段,用于在船上部署之前开发海上无人机自主性。

English abstract

Autonomous UAV operations on ships require reliable vision-based relative pose estimation, yet at-sea validation is costly, weather-dependent, and risky. This paper presents a hardware-validated vision-in-the-loop framework that enables fully autonomous indoor flight while emulating photorealistic maritime environments. Rendered maritime views are processed onboard by a deep transformer-based monocular pose estimator. Delayed vision measurements are fused with high-rate IMU data using a delayed Kalman filter to provide consistent state estimates for geometric control. The system captures critical embedded effects, including perception latency, asynchronous updates, and computational constraints, that are absent in pure simulation. Autonomous takeoff, trajectory tracking, and landing experiments demonstrate stable closed-loop flight. The results establish a safe and hardware-realistic intermediate stage for developing maritime UAV autonomy prior to shipboard deployment.
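One common way to fuse a delayed measurement is to roll back to the state at the measurement's timestamp, apply the update there, and replay the stored high-rate inputs. The abstract does not say which delayed-filter formulation the authors use, so the scalar sketch below is only an illustration; the class name, the additive-input motion model, and the noise values are assumptions.

```python
class DelayedKF1D:
    """Scalar Kalman filter that accepts measurements stamped in the past."""

    def __init__(self, x0, p0, q, r):
        self.q = q  # process noise variance added per prediction step
        self.r = r  # measurement noise variance
        # history[k] = (state, variance, input used to reach step k)
        self.history = [(x0, p0, None)]

    def predict(self, u):
        """Propagate one step with a high-rate input u (e.g. an IMU-derived increment)."""
        x, p, _ = self.history[-1]
        self.history.append((x + u, p + self.q, u))

    def update_delayed(self, z, k):
        """Fuse measurement z that was taken at step k, then replay later inputs."""
        x, p, u_k = self.history[k]
        gain = p / (p + self.r)
        x, p = x + gain * (z - x), (1.0 - gain) * p
        self.history[k] = (x, p, u_k)
        for i in range(k + 1, len(self.history)):
            u = self.history[i][2]
            x, p = x + u, p + self.q
            self.history[i] = (x, p, u)

    @property
    def state(self):
        """Current (estimate, variance)."""
        return self.history[-1][:2]


kf = DelayedKF1D(x0=0.0, p0=1.0, q=0.0, r=1.0)
for _ in range(3):
    kf.predict(1.0)               # three high-rate steps
kf.update_delayed(z=2.0, k=1)     # vision result arrives two steps late
estimate, variance = kf.state
```

A real implementation would bound the history buffer to the maximum expected latency; this sketch keeps every step for clarity.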

2606.19154 2026-06-18 cs.RO New submission 70%

Viking Hill Dataset: A Lidar-Radar-Camera Dataset for Detection and Segmentation in Forest Scenes

Viking Hill数据集:用于森林场景检测与分割的激光雷达-雷达-相机数据集

Vladimír Kubelka, Oleksandr Kotlyar, Unal Artan, Martin Magnusson

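Affiliations: Örebro University; AASS research centre; Robot Navigation and Perception Lab

Topic match: Perception. A multi-sensor dataset for forest-scene perception, similar to autonomous driving.

AI summary: Presents the first multi-sensor forest dataset that includes 4D imaging radar, provides MinkowskiUNet baselines for semantic segmentation of radar and lidar point clouds, and evaluates how trunk segmentation quality relates to tree size.

Comments: 33 pages, 11 figures

Details
AI-generated Chinese abstract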

在森林冠层下运行的自主机器人需要对树木及周围植被在不同季节条件下进行稳健感知。现有的林业数据集提供带有单棵树标注的激光雷达或相机数据,但均未包含共配准的4D成像雷达——这一模态因其对视觉退化、表面污染和植被遮挡的鲁棒性而日益受到关注。我们介绍了一个由移动机器人收集的多传感器森林数据集,该机器人配备了高分辨率FMCW成像雷达、激光雷达、RGB相机、IMU和RTK-GNSS。该场地在两个不同植被状态的会话中记录,3D立方体标注(包括每棵树的直径估计)为所有三种感知模态提供了共享语义标签。此外,我们提供了使用MinkowskiUNet对雷达和激光雷达点云进行语义分割的基线结果。雷达在主要类别(地面91%,冠层86%)上取得了与激光雷达竞争性的IoU分数,但在几何精细结构(如树干)上落后(56%对74%)。跨模态分析进一步比较了激光雷达和雷达的树干分割与RGB检测模型,而按直径分层的评估揭示了树干分割质量如何随树木尺寸变化。除了分割,共配准的多模态数据和RTK-GNSS辅助参考定位支持冠层下地图构建、定位和传感器融合的研究。数据集和标注工具已公开。

English abstract

Autonomous robots operating under forest canopies need robust perception of trees and surrounding vegetation across varying seasonal conditions. Existing forestry datasets provide lidar or camera data with per-tree annotations, but none include co-registered 4D imaging radar, a modality of growing interest for its resilience to visual degradation, surface contamination, and vegetation occlusion. We introduce a multi-sensor forest dataset collected by a mobile robot equipped with a high-resolution FMCW imaging radar, lidar, RGB camera, IMU, and RTK-GNSS. The site was recorded in two sessions under contrasting vegetation states, and 3D cuboid annotations, including per-tree diameter estimates, provide shared semantic labels across all three perception modalities. Furthermore, we provide baseline results for semantic segmentation of the radar and lidar point clouds using MinkowskiUNet. Radar achieves IoU scores competitive with lidar for dominant classes (ground 91%, canopy 86%) while lagging on geometrically fine structures such as tree trunks (56% vs. 74%). A cross-modality analysis further compares lidar and radar trunk segmentation against an RGB detection model, and a diameter-stratified evaluation reveals how trunk segmentation quality varies with tree size. Beyond segmentation, the co-registered multi-modal data and RTK-GNSS-aided reference positioning support research in mapping, localization, and sensor fusion under canopy. The dataset and annotation tools are publicly available.
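The per-class IoU scores quoted above follow the standard intersection-over-union definition, sketched here for per-point labels. The label strings and the tiny example are invented for illustration and are not drawn from the dataset.

```python
def per_class_iou(pred, truth, classes):
    """Intersection over union for each class, computed over per-point labels.

    Returns None for a class that appears in neither prediction nor ground truth.
    """
    scores = {}
    for c in classes:
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        scores[c] = inter / union if union else None
    return scores


pred = ["ground", "ground", "trunk", "canopy"]
truth = ["ground", "trunk", "trunk", "canopy"]
iou = per_class_iou(pred, truth, ["ground", "trunk", "canopy", "shrub"])
```

A single trunk point mislabeled as ground lowers the IoU of both classes, which is one reason thin structures with few points tend to score lower than large classes.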

2606.17030 2026-06-18 cs.CV New submission 70%

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Qwen-RobotWorld技术报告:通过语言条件视频生成统一具身世界模型

Jie Zhang, Xiaoyue Chen, Anzhe Chen, Dayiheng Liu, Deqing Li, Gengze Zhou, Hale Yin, Haoqi Yuan, Haoyang Li, Jiahao Li, Jiazhao Zhang, Jingren Zhou, Kaiyuan Gao, Kun Yan, Lihan Jiang, Ningyuan Tang, Pei Lin, Qihang Peng, Shengming Yin, Tianhe Wu, Tianyi Yan, Xiao Xu, Yan Shu, Yanran Zhang, Ye Wang, Yi Wang, Yilei Chen, Yixian Xu, Yiyang Huang, Yuxiang Chen, Zekai Zhang, Zhendong Wang, Zixing Lei, Zhixuan Liang, Zihao Liu, Zikai Zhou, Chenxu Lv, Xiong-Hui Chen, Chenfei Wu

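Affiliations: Qwen Team

Topic match: Perception. Predicts future visual trajectories for autonomous-driving scenes.

AI summary: Proposes Qwen-RobotWorld, a language-conditioned video world model that uses natural language as a unified action interface; with a double-stream MMDiT, a large-scale embodied world knowledge corpus, and a progressive training curriculum, it predicts physically consistent future visual trajectories for tasks such as robotic manipulation and autonomous driving and achieves top results on several benchmarks.

Details
AI-generated Chinese abstract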

我们介绍Qwen-RobotWorld,一种用于具身智能的语言条件视频世界模型。以自然语言作为统一动作接口,它从当前观测预测物理上合理的未来视觉轨迹,涵盖机器人操作、自动驾驶、室内导航和人到机器人迁移。这种统一公式提供了三个有前景的应用方向:用于策略训练增强的合成数据生成、用于策略评估的可扩展虚拟环境,以及用于下游机器人控制的语言引导规划信号。这是通过三部分设计实现的:a) 双流MMDiT与MLLM动作编码,其中60层双流扩散变压器通过逐层联合注意力将冻结的Qwen2.5-VL语义与视频VAE潜变量耦合;b) 具身世界知识(EWK),一个860万视频-文本语料库(2亿+帧),包含20+种具身形态和500+动作类别的动作-语言映射;c) 通用+专家渐进式课程,一种两阶段训练策略,首先学习通用视觉先验,然后在共享语言接口下注入具身专门化。广泛的结果显示出强竞争力:在EWMBench和DreamGen Bench上总体排名第一,在WorldModelBench和PBench上优于所有开源模型。在RoboTwin-IF基准上的额外零样本分析进一步支持了鲁棒泛化和多视图一致性。

English abstract

We introduce Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence. With natural language as a unified action interface, it predicts physically grounded future visual trajectories from current observations across robotic manipulation, autonomous driving, indoor navigation, and human-to-robot transfer. This unified formulation provides three promising application directions: synthetic data generation for policy training augmentation, scalable virtual environments for policy evaluation, and language-guided planning signals for downstream robot control. This is achieved through a three-part design: a) Double-Stream MMDiT with MLLM Action Encoding, where a 60-layer double-stream diffusion transformer couples frozen Qwen2.5-VL semantics with video-VAE latents through layer-wise joint attention; b) Embodied World Knowledge (EWK), an 8.6M video-text corpus (200M+ frames) with action-language mapping over 20+ embodiments and 500+ action categories; and c) General+Expert Progressive Curriculum, a two-stage training strategy that first learns general visual priors and then injects embodied specialization under a shared language interface. Extensive results show strong competitiveness: the model ranks first overall on EWMBench and DreamGen Bench and outperforms all open-source models on WorldModelBench and PBench. Additional zero-shot analyses on the RoboTwin-IF benchmark further support robust generalization and multi-view consistency.

2606.18841 2026-06-18 cs.CV New submission 60%

Rethinking Air-Ground Collaboration: A Progressive Cross-Task Benchmark and Socialized Learning Framework

重新思考空地协作:渐进式跨任务基准与社会化学习框架

Zhoupeng Guo, Yunqi Zhu, Zhihe Fan, Xinjie Yao, Ruipu Zhao, Boan Tao, Yiming Sun, Zhen Wang, Pengfei Zhu

Affiliations: School of Automation, Southeast University; School of Computer Science and Engineering, University of New South Wales; School of Sports Training, Tianjin University of Sport; Faculty of Information Engineering and Automation, Kunming University of Science and Technology; School of Artificial Intelligence, Tianjin University; School of Artificial Intelligence, Hebei University of Technology

Topic match: Perception. Air-ground collaborative perception, applicable to autonomous driving.

AI summary: Proposes the Air-Ground Progressive Collaboration (AGPC) benchmark and the Socialized Co-Perception (SCP) framework, whose Dual-Layer Router enables selective cross-view and cross-task interaction, improving average downstream performance by 7.86% in heterogeneous air-ground perception.

Details
AI-generated Chinese abstract

空地协同感知对于真实世界动态环境中的鲁棒视觉理解至关重要。然而,现有研究通常将协作建模为单任务跨视角融合,忽视了定位、目标关联和细粒度解析之间的功能依赖关系。此外,空中和地面视角的异构性引入了显著的几何、尺度和遮挡差异,使得统一特征共享容易受到负迁移的影响。为解决这些问题,我们将空地感知建模为渐进式跨任务协作任务,并构建了空地渐进协作(AGPC)基准,这是一个包含超过745K原始视频帧的时空对齐基准。基于该基准,我们提出了社会化协同感知(SCP),一个从空中全局定位到地面目标关联和身份感知解析的渐进式协作框架。其核心模块——双层级路由器(DLR),将输入侧的多尺度专家选择与输出侧的任务条件调制解耦,实现了选择性的跨视角和跨任务交互,同时抑制有害干扰。大量实验证明了SCP的有效性。它实现了3.73%的协同进化增益和7.86%的平均下游性能提升。这些结果表明,对于异构空地感知,任务条件协作比统一融合更有效。代码可在该网址获取。

English abstract

Air-ground collaborative perception is crucial for robust visual understanding in real-world dynamic environments. However, existing studies typically formulate collaboration as single-task cross-view fusion, overlooking the functional dependencies among localization, target association, and fine-grained parsing. In addition, the heterogeneous nature of aerial and ground views introduces substantial geometric, scale, and occlusion discrepancies, making uniform feature sharing vulnerable to negative transfer. To tackle these issues, we model air-ground perception as a progressive cross-task collaboration task and construct the Air-Ground Progressive Collaboration (AGPC) benchmark, a spatio-temporally aligned benchmark comprising more than 745K raw video frames. Built upon this benchmark, we propose Socialized Co-Perception (SCP), a coarse-to-fine framework that organizes collaboration progressively from aerial global localization to ground target association and identity-aware parsing. Its core module, the Dual-Layer Router (DLR), decouples input-side multi-scale expert selection from output-side task-conditioned modulation, enabling selective cross-view and cross-task interaction while suppressing harmful interference. Extensive experiments demonstrate the effectiveness of SCP. It achieves a 3.73% coevolutionary gain and a 7.86% improvement in average downstream performance. These results show that task-conditioned collaboration is more effective than uniform fusion for heterogeneous air-ground perception. The code is available at https://github.com/g1136639260-spec/AGSCP.

3. Planning and Control: 4 papers

2606.19227 2026-06-18 cs.RO New submission 80%

Constant Time-Delay Leader Following with Neural Networks and Invariant Extended Kalman Filters for Arbitrary Trajectories

基于神经网络与不变扩展卡尔曼滤波的任意轨迹恒定时间延迟领航跟随

Luka Antonyshyn, Paulo Ricardo Marques de Araujo, Sidney Givigi

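Affiliations * University of Toronto Institute for Aerospace Studies; School of Computing

Topic match Planning and Control: vehicle convoy trajectory tracking, which falls under autonomous-driving planning and control

AI summary Proposes a constant time-delay trajectory tracking method that combines a probabilistic Seq2Seq neural network with an invariant extended Kalman filter, for convoys without communication or global coordinates. It accurately estimates the leader's trajectory on the SE(2) manifold and uses geometric model predictive control to improve performance.

Comments 9 pages, 6 figures

Details
AI Chinese abstract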

本文提出了一种用于车辆队列的恒定时间延迟轨迹跟踪方法,该方法无需车辆间通信、公共坐标系或全球定位。该方法将概率序列到序列(Seq2Seq)神经网络与不变扩展卡尔曼滤波(IEKF)相结合,以热启动预测过程,从而在SE(2)流形上准确估计领车相对轨迹。进一步引入几何模型预测控制器,以充分利用基于流形的轨迹预测来改善控制性能。该系统能够处理具有不同速度和运动轮廓的任意非线性轨迹,同时减少了对基于专家领域知识的轨迹跟踪系统设计的需求,即使在长轨迹延迟下也是如此。通过运动学仿真中与纯IEKF基线、基于学习的方法以及真实轨迹的对比,以及使用真实机器人车辆的实验,验证了该方法的有效性。

English abstract

This paper proposes a constant time-delay trajectory tracking method for vehicle convoys operating without inter-vehicle communication, a common coordinate system, or global positioning. The method integrates a probabilistic sequence-to-sequence (Seq2Seq) neural network with an invariant extended Kalman filter (IEKF) to warm-start the prediction process, allowing accurate estimation of a leader vehicle's relative trajectory on the SE(2) manifold. A geometric model predictive controller is further incorporated to fully exploit the manifold-based trajectory predictions for improved control performance. The system can handle arbitrary nonlinear trajectories with varying speeds and motion profiles while reducing the need for expert-based domain knowledge for the design of trajectory following systems, even under long trajectory delays. The effectiveness of the method is validated through comparisons with a pure IEKF baseline, learning-based methods, and the ground-truth trajectory in kinematic simulations, as well as in experiments using real robotic vehicles.
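
The idea of following a leader with a constant time delay on SE(2) can be illustrated with a small sketch. This is not the authors' IEKF or Seq2Seq pipeline: it only shows SE(2) pose composition and how a follower might pick the leader pose from `delay` seconds ago and express it in its own frame. The function names and the use of poses in a shared frame are assumptions made purely for illustration (the paper works without a common coordinate system).

```python
import math

def wrap(angle):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def compose(a, b):
    """SE(2) product a * b for poses given as (x, y, theta)."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(at + bt))

def inverse(a):
    """SE(2) inverse: rotation R^T, translation -R^T t."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), s * ax - c * ay, wrap(-at))

def relative_pose(follower, leader):
    """Leader pose expressed in the follower's body frame."""
    return compose(inverse(follower), leader)

def delayed_target(history, now, delay):
    """Latest leader pose with timestamp <= now - delay.

    history is a time-ordered list of (timestamp, pose); returns None
    if no sample is old enough yet.
    """
    target_time = now - delay
    chosen = None
    for stamp, pose in history:
        if stamp <= target_time:
            chosen = pose
        else:
            break
    return chosen
```

A follower would call `delayed_target` each control step and feed `relative_pose(follower_pose, target)` to its tracking controller; in the paper that role is played by a geometric model predictive controller.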

2606.18630 2026-06-18 cs.RO New submission 80%

DNN Koopman-Based Deviation Compensation for UGV Path Tracking Control on Coupled Slope and Potholed Road

基于DNN Koopman的偏差补偿用于耦合坡度和坑洼道路上的UGV路径跟踪控制

Jian Zhao, Wenbo Zhou, Zhicheng Chen, Bing Zhu, Jiayi Han, Dongjian Song, Yinju Lin, Peixing Zhang

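Affiliations * Xiamen King Long United Automotive Industry Co., Ltd.

Topic match Planning and Control: UGV path tracking control that compensates for slope and pothole disturbances

AI summary Proposes a DNN Koopman-based deviation compensation strategy that combines adaptive-forgetting recursive least squares for estimating tire cornering stiffness, Laguerre model predictive control, and event-triggered cooperative compensation, improving UGV path tracking performance by more than 11.5% on coupled slope and potholed roads.

Comments 22 pages, 13 figures

Details
AI Chinese abstract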

在越野场景中运行的无人地面车辆面临复杂地形扰动,这些扰动会显著降低路径跟踪性能。针对这一挑战,本文提出了一种基于深度神经网络Koopman的偏差补偿策略,用于无人地面车辆路径跟踪控制。首先,基于耦合坡度上的车辆动力学函数,设计了一种带有解耦误差项的自适应遗忘递推最小二乘法来估计轮胎侧偏刚度。在此基础上,通过引入Laguerre函数,设计了一种Laguerre模型预测控制路径跟踪控制策略,该策略可在不同耦合坡度场景下降低计算资源消耗的同时保持可靠的跟踪性能。然后,通过将Koopman算子理论与深度神经网络相结合,提出了一种深度神经网络Koopman路径偏差补偿方法,该方法显著提高了无人地面车辆在坑洼道路扰动下的路径跟踪精度。此外,基于补偿激活准则和可信度验证,建立了一种将Laguerre模型预测控制与深度神经网络Koopman耦合的事件触发并行协同补偿机制。该机制提高了坑洼道路上的路径跟踪精度,同时确保了整体转向指令的可行性和深度神经网络Koopman补偿后车辆的稳定性。最后,构建了硬件在环实验平台进行验证。实验结果表明,所提出的无人地面车辆路径跟踪策略在多种工况下跟踪性能提升超过11.5%。

English abstract

Unmanned ground vehicles (UGVs) operating in off-road scenarios are confronted with complex terrain disturbances that can substantially degrade path tracking performance. To address this challenge, this paper proposes a deep neural network (DNN) Koopman-based deviation compensation strategy for UGV path tracking control. First, based on the vehicle dynamics function on a coupled slope, an adaptive forgetting recursive least squares method with decoupled error terms is designed to estimate tire cornering stiffness. On this basis, a Laguerre model predictive control (LMPC) path tracking control strategy is designed by incorporating Laguerre functions, which can reduce computational resource usage while maintaining reliable tracking performance across different coupled slope scenarios. Then, by integrating Koopman operator theory with a DNN, a DNN Koopman (DK) path deviation compensation method is proposed, which significantly improves the path tracking accuracy of the UGV under potholed road disturbances. Furthermore, an event-triggered parallel cooperative (EPC) compensation mechanism that couples LMPC with DK is established based on compensation activation criteria and credibility verification. This mechanism improves path tracking accuracy on potholed roads while ensuring the feasibility of the overall steering command and the stability of the vehicle after DK compensation. Finally, a hardware-in-the-loop (HiL) experimental platform is constructed for validation. Experimental results demonstrate that the proposed UGV path tracking strategy improves tracking performance by more than 11.5% across multiple operating conditions.
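
The stiffness-estimation step can be illustrated with a textbook recursive least squares update using a forgetting factor. This is a minimal sketch under stated assumptions, not the paper's estimator: it uses a fixed forgetting factor (the paper adapts it and decouples error terms), a scalar parameter, and the common linear tire model `F_y = C_alpha * alpha`. All names and numbers are hypothetical.

```python
def rls_update(theta, p, phi, y, forgetting=0.98):
    """One scalar RLS step with exponential forgetting.

    theta: current parameter estimate
    p:     current estimate covariance
    phi:   regressor (here: slip angle in rad)
    y:     measurement (here: lateral force in N)
    """
    gain = p * phi / (forgetting + phi * p * phi)
    error = y - phi * theta              # prediction error before the update
    theta_new = theta + gain * error
    p_new = (p - gain * phi * p) / forgetting
    return theta_new, p_new

def estimate_stiffness(samples, theta0=0.0, p0=1e6, forgetting=0.98):
    """Run RLS over (slip_angle, lateral_force) samples."""
    theta, p = theta0, p0
    for phi, y in samples:
        theta, p = rls_update(theta, p, phi, y, forgetting)
    return theta

# Synthetic, noise-free data from an assumed true stiffness of 80000 N/rad.
TRUE_STIFFNESS = 80000.0
SAMPLES = [(0.01 * k, TRUE_STIFFNESS * 0.01 * k) for k in range(1, 21)]
ESTIMATE = estimate_stiffness(SAMPLES)
```

A forgetting factor below 1 discounts old samples, so the estimate can follow stiffness changes as terrain varies; smaller values react faster but are noisier.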

2606.18883 2026-06-18 cs.RO New submission 65%

ZiMPedance: Impedance-Aware ZMP Modeling and Control for Payload Carrying with Quadruped Robots

ZiMPedance:面向四足机器人负载搬运的阻抗感知ZMP建模与控制

Giovanni B. Dessy, Lorenzo Amatucci, Victor Barasuol, Claudio Semini

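Affiliations * Dynamic Legged Systems Lab, Istituto Italiano di Tecnologia (IIT)

Topic match Planning and Control: involves ZMP modeling and control, transferable to autonomous driving

AI summary Proposes an extended Zero Moment Point (ZMP) formulation that includes passive payload-interface dynamics and, combined with model predictive control, reduces stability violations by up to 10× and improves locomotion efficiency.

Details
AI Chinese abstract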

四足机器人的负载运输受到机器人与负载之间物理接口动力学的强烈影响。与主动机械臂相比,被动弹簧臂减轻了重量和复杂性,但其弹簧-阻尼动力学可能引入振荡力,降低运动稳定性。本文推导了一个扩展的零力矩点(ZMP)公式,该公式包含被动负载接口动力学,将刚度、阻尼和负载质量与稳定性裕度联系起来。分析表明,欠阻尼配置可能与运动谐波共振。基于这一见解,我们通过被动子系统动力学增强了单刚体动力学模型,并将其集成到模型预测控制框架中。在仿真中,所提出的控制器将稳定性违规减少高达10倍(从7.0%降至0.7%),并通过将水平地面反作用力努力降低高达15%来提高运动效率。硬件实验表明,在标称控制器失效的拉放扰动下,携带2公斤负载的机器人能够稳定运动。同一模型还使得通过被动臂动力学实现末端执行器跟踪成为可能,而无需直接驱动臂。

English abstract

Load transportation with quadruped robots is strongly affected by the dynamics of the physical interface between the robot and the load. Passive spring-based arms reduce weight and complexity compared to active manipulators, but their spring-damper dynamics can introduce oscillatory forces that degrade locomotion stability. This paper derives an extended Zero Moment Point (ZMP) formulation that includes passive payload-interface dynamics, relating stiffness, damping, and payload mass to the stability margin. The analysis shows that underdamped configurations can resonate with locomotion harmonics. Based on this insight, we augment a Single Rigid Body Dynamics model with passive subsystem dynamics and integrate it into a Model Predictive Control framework. In simulation, the proposed controller reduces stability violations by up to 10×, from 7.0% to 0.7%, and increases locomotion efficiency by lowering horizontal ground reaction force effort by up to 15% compared to a nominal baseline. Hardware experiments with a 2 kg payload show stable locomotion under pull-release disturbances where the nominal controller fails. The same model also enables end-effector tracking through passive arm dynamics without direct arm actuation.
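
For orientation, the quantities the abstract relates (stiffness, damping, payload mass, stability margin) can be sketched with standard textbook formulas. The code below uses the classic cart-table ZMP model and the damping ratio of a mass-spring-damper; it is not the paper's extended ZMP formulation, and all function names and numbers are illustrative assumptions.

```python
import math

GRAVITY = 9.81  # m/s^2

def zmp_cart_table(com_pos, com_acc, com_height):
    """Cart-table ZMP along one axis: p = x - (z_c / g) * x_ddot."""
    return com_pos - (com_height / GRAVITY) * com_acc

def stability_margin(zmp, support_min, support_max):
    """Distance from the ZMP to the nearest support edge (negative = outside)."""
    return min(zmp - support_min, support_max - zmp)

def damping_ratio(stiffness, damping, mass):
    """zeta = c / (2 * sqrt(k * m)); zeta < 1 means underdamped."""
    return damping / (2.0 * math.sqrt(stiffness * mass))

def natural_frequency_hz(stiffness, mass):
    """Undamped natural frequency of the payload interface in Hz."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)

# Hypothetical passive arm: 2 kg payload, k = 400 N/m, c = 8 N*s/m.
ZETA = damping_ratio(400.0, 8.0, 2.0)
FREQ = natural_frequency_hz(400.0, 2.0)
```

If the natural frequency of an underdamped interface lies near a harmonic of the gait frequency, the oscillating payload force can shift the ZMP toward the support edge, which is the resonance risk the abstract describes.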

2606.18516 2026-06-18 cs.RO New submission 60%

Task Allocation and Motion Planning in Dynamic, Cluttered Environments via CBBA and Graphs of Convex Sets

动态杂乱环境下的任务分配与运动规划:基于CBBA与凸集图

Matthew D. Osburn, Cameron K. Peterson, John L. Salmon

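Affiliations * Electrical and Computer Engineering; Mechanical Engineering

Topic match Planning and Control: trajectory planning in dynamic environments

AI summary For multi-agent task planning in dynamic, cluttered environments, proposes combining Graphs of Convex Sets (GCS) for trajectory optimization with the Consensus-Based Bundle Algorithm (CBBA) for distributed task allocation, achieving safe, efficient trajectory planning and task coordination.

Comments 15 pages single column, 10 figures, AIAA-Scitech 2027 Submission

Details
AI Chinese abstract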

在杂乱、动态环境中的多智能体任务规划需要在分配任务给智能体的同时,确定通过环境的安全、时间高效的轨迹。当任务是动态的(例如会合目标)时,分配决策不仅取决于哪个智能体最适合某项任务,还取决于该任务何时何地可以到达。本文提出了一个解决该问题的方法,该方法将凸集图(GCS)用于轨迹优化,与共识捆绑算法(CBBA)用于分布式任务分配相结合。在我们的方法中,GCS通过使用时间扩展(3D+时间)配置空间找到通过动态环境的最优轨迹。同时,CBBA协调跨智能体的任务分配,使得在移动环境中能够做出明智的决策。然后,我们连接分配和规划,使智能体能够在3D+时间配置空间中避免碰撞,并提供准确的任务完成时间估计。我们在具有静态和动态任务的模拟杂乱环境中展示了我们方法的有效性。

English abstract

Multi-agent task planning in cluttered, dynamic environments requires assigning tasks to agents while simultaneously determining safe, time-efficient trajectories through the environment. When tasks are dynamic, such as rendezvous objectives, allocation decisions depend not only on which agent is best suited for a task, but also on when and where that task can be reached. This paper presents a solution to this problem, which combines Graphs of Convex Sets (GCS) for trajectory optimization with the Consensus-Based Bundle Algorithm (CBBA) for distributed task allocation. In our approach, GCS finds optimal trajectories through dynamic environments using a time-extended (3D+time) configuration space. At the same time, CBBA coordinates task assignments across agents, enabling informed decision-making in a moving environment. We then connect allocation and planning to allow the agents to avoid collisions in the 3D+time configuration space and provide accurate time estimates for task completion. We demonstrate the effectiveness of our approach in simulated cluttered environments with static and dynamic tasks.
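
The coupling between allocation and arrival time can be illustrated with a much simpler stand-in. The sketch below is a centralized greedy assignment by earliest arrival time, not CBBA (there is no bidding or consensus between agents) and not GCS (travel time is plain straight-line distance over speed, ignoring obstacles). Agent and task names are hypothetical.

```python
import math

def greedy_allocate(agents, tasks):
    """Assign tasks one at a time to whichever agent can arrive earliest.

    agents: {name: (x, y, speed)}
    tasks:  {name: (x, y)}
    Returns {agent: [(task, completion_time), ...]} in visiting order.
    """
    # Each agent's state is its current position and clock.
    state = {name: (x, y, 0.0) for name, (x, y, _) in agents.items()}
    plan = {name: [] for name in agents}
    remaining = dict(tasks)
    while remaining:
        best = None
        for agent in sorted(agents):
            ax, ay, clock = state[agent]
            speed = agents[agent][2]
            for task in sorted(remaining):
                tx, ty = remaining[task]
                arrival = clock + math.hypot(tx - ax, ty - ay) / speed
                if best is None or arrival < best[0]:
                    best = (arrival, agent, task)
        arrival, agent, task = best
        tx, ty = remaining.pop(task)
        state[agent] = (tx, ty, arrival)   # agent now waits at the task site
        plan[agent].append((task, arrival))
    return plan

AGENTS = {"a": (0.0, 0.0, 1.0), "b": (10.0, 0.0, 1.0)}
TASKS = {"t1": (1.0, 0.0), "t2": (9.0, 0.0), "t3": (3.0, 0.0)}
PLAN = greedy_allocate(AGENTS, TASKS)
```

In the approach the abstract describes, the arrival-time estimate would come from a trajectory planned in the 3D+time configuration space rather than from straight-line distance, which is what makes the estimates accurate around moving obstacles.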

4. BEV and Occupancy (1 paper)

2606.19122 2026-06-18 cs.RO New submission 80%

Monocular 3D Occupancy Perception for Robots on Sidewalks via Hybrid 2D-3D Learning

基于混合2D-3D学习的人行道机器人单目3D占用感知

Yukai Ma, Joe Lin, Liu Liu, Honglin He, Lulu Ricketts, Brad Squicciarini, Yong Liu, Bolei Zhou

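Affiliations * University of California, Los Angeles; Zhejiang University; Coco Robotics; Massachusetts Institute of Technology

Topic match BEV and Occupancy: monocular 3D occupancy perception for robot navigation, similar to autonomous driving

AI summary Proposes WalkOCC, a hybrid ray-marching monocular 3D occupancy perception framework that combines LiDAR-RGB paired data with learning from large-scale unpaired monocular images, improving prediction accuracy and generalization for sidewalk robot navigation.

Details
AI Chinese abstract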

现实世界中的人行道拥挤、杂乱且结构化程度低于道路,使得3D占用预测成为配送机器人和电动轮椅等移动机器人安全导航的关键。现有的占用学习流程主要针对道路自动驾驶设计,通常在大规模配对的LiDAR-RGB数据集上训练,需要密集的3D监督和多个摄像头输入,这些数据收集成本高且未能充分捕捉人行道特定特征。我们提出WalkOCC,一种用于人行道机器人的混合射线行进单目3D占用感知框架。WalkOCC显式地将来自LiDAR-RGB配对数据的几何基础与来自大规模无配对单目图像的可扩展学习相结合。它从配对序列中引导出伪占用监督,并在额外的仅2D数据上联合学习图像级表示。它在不需要昂贵的3D占用标注的情况下实现了稳定的优化和改进的泛化能力。大量实验表明,与基于自监督图像的基线相比,在预测精度、对路缘和排水沟等细微城市结构的细粒度分割以及对环境和跨本体变化的鲁棒性方面,WalkOCC均取得了一致的提升。为了便于评估和基准测试,我们还引入了Sidewalk3D,这是一个大规模的人行道感知数据集,包含在多个地点和时间段收集的LiDAR-相机配对序列,以及用于评估的3D语义占用标注。代码和数据将公开提供。

English abstract

Sidewalks in the real world are crowded, cluttered, and less structured than roads, making 3D occupancy prediction a key ingredient for the safe navigation of mobile robots such as delivery bots and electric wheelchairs. Existing occupancy learning pipelines are largely designed for on-road autonomous driving and often train on large-scale paired LiDAR-RGB datasets with dense 3D supervision and multiple camera inputs, which are costly to collect and do not adequately capture sidewalk-specific characteristics. We propose WalkOCC, a hybrid ray-marching monocular 3D occupancy perception framework for robots operating on sidewalks. WalkOCC explicitly couples geometric grounding from LiDAR-RGB paired data with scalable learning from large-scale unpaired monocular images. It bootstraps pseudo occupancy supervision from paired sequences and jointly learns image-level representations on additional 2D-only data. It yields stable optimization and improved generalization without requiring costly 3D occupancy annotations. Extensive experiments demonstrate consistent gains in prediction accuracy, fine-grained segmentation of subtle urban structures such as curbs and gutters, and robustness to environmental and cross-embodiment shifts compared with self-supervised image-based baselines. To facilitate evaluation and benchmarking, we also introduce Sidewalk3D, a large-scale sidewalk perception dataset with LiDAR-camera paired sequences collected across multiple locations and time periods, along with 3D semantic occupancy annotations for evaluation. Code and data will be made available.
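
As background for the term "ray marching", the sketch below steps along a single camera ray through a voxel occupancy grid and reports the distance to the first occupied voxel. It is a generic fixed-step marcher, not WalkOCC's hybrid scheme; the sparse-set grid representation, step size, and function name are assumptions chosen for brevity.

```python
import math

def march_ray(origin, direction, occupied, voxel_size=0.1, step=0.05,
              max_range=10.0):
    """Return the distance to the first occupied voxel along a ray, or None.

    origin, direction: 3-tuples in metres (direction need not be unit length)
    occupied:          set of integer voxel indices (ix, iy, iz)
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    num_steps = int(max_range / step)
    for i in range(num_steps + 1):
        distance = i * step
        point = tuple(o + distance * u for o, u in zip(origin, unit))
        voxel = tuple(math.floor(p / voxel_size) for p in point)
        if voxel in occupied:
            return distance
    return None

# One occupied voxel roughly 1 m ahead of the camera along +x.
OCCUPIED = {(10, 0, 0)}
HIT = march_ray((0.0, 0.05, 0.05), (1.0, 0.0, 0.0), OCCUPIED)
MISS = march_ray((0.0, 0.05, 0.05), (0.0, 1.0, 0.0), OCCUPIED)
```

Marching every pixel's ray this way turns a 3D occupancy volume into a depth image, which is one common route for supervising 3D predictions with 2D-only data; the step size should stay below the voxel size so thin structures are not skipped.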

5. Other Autonomous Driving (1 paper)

2606.18314 2026-06-18 cs.CG New submission 60%

Repair Entropy in Dynamic Geometric Nearest-Neighbour Structures

动态几何最近邻结构中的修复熵

Faruk Alpay, Bugra Kilictas

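Topic match Other Autonomous Driving: studies dynamic geometric nearest-neighbour maintenance, which could be applied to autonomous driving.

AI summary For exact nearest-neighbour maintenance under small motions, proposes an adaptive strategy based on repair-frontier entropy that repairs failed certificates in O(|F_t| log N) time, validated across 2400 labelled transitions drawn from ten motion families.

Comments 10 pages, 2 figures, 2 tables; code and dataset provided as ancillary files

Details
AI Chinese abstract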

我们研究小运动下精确最近邻维护的动态几何数据结构。对每个点,我们存储一个由最近邻和两个最小邻近距离组成的证书,间隙为$c_i=d^i_2-d^i_1$。三角不等式给出一个尖锐的有效性半径:在最大位移为$\varepsilon$的一步后,每个满足$c_i>4\varepsilon$的证书仍然有效,因此所有可能的失效被限制在修复前沿$F_t$内。我们引入修复前沿熵$H(F_t)$,即失效证书在索引单元上的归一化香农熵,作为选择事件驱动修复、批量修复或完全重建的工作负载描述符。由此产生的维护规则在单元占用有界的情况下,仅以$O(|F_t|\log N)$时间修复前沿,而完全重建代价为$\Theta(N)$;此外,熵为事件驱动修复所触及的前沿单元数量提供下界,并改变了经验上的修复-重建交叉点。我们在$d\in\{2,3\}$中评估了十种运动族,$N$高达16,000,使用精确的平铺GPU预言机和GPU网格重建作为真实值和竞争者。在2400个标记的转换中,有效性规则没有遗漏任何无效证书,低压前沿通常通过增量修复更便宜,而相同大小的扩散前沿对于事件驱动修复更昂贵,但对于批量修复则不然。发布的数据集记录了前沿几何、证书审计、每种策略的时间以及最佳策略标签。

English abstract

We study dynamic geometric data structures for exact nearest-neighbour maintenance under small motions. For each point we store a certificate consisting of its nearest neighbour and the two smallest neighbour distances, with clearance $c_i=d^i_2-d^i_1$. A triangle-inequality argument gives a sharp validity radius: after a step of maximum displacement $\varepsilon$, every certificate with $c_i>4\varepsilon$ remains valid, so all possible failures are confined to a repair frontier $F_t$. We introduce repair-frontier entropy $H(F_t)$, the normalized Shannon entropy of failed certificates over index cells, as a workload descriptor for choosing between event-driven repair, batched repair, and full rebuild. The resulting maintenance rule repairs only the frontier in $O(|F_t|\log N)$ time under bounded cell occupancy, while a full rebuild costs $\Theta(N)$; moreover, entropy lower-bounds the number of frontier cells touched by event-driven repair and shifts the empirical repair-rebuild crossover. We evaluate ten motion families in $d\in\{2,3\}$, with $N$ up to $16{,}000$, using an exact tiled GPU oracle and a GPU grid rebuild as ground truth and competitor. Across $2400$ labelled transitions, the validity rule misses no invalid certificate, low-pressure frontiers are usually cheaper to repair incrementally, and diffuse frontiers of the same size are more expensive for event-driven repair but not for batched repair. The released dataset records frontier geometry, certificate audits, per-strategy times, and best-strategy labels.
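
The validity rule and the entropy descriptor can be sketched in a few lines. The reasoning behind the threshold, as read from the abstract: if every point moves by at most ε, any pairwise distance changes by at most 2ε, so the gap between the two smallest distances can shrink by at most 4ε. The code below is an illustrative sketch, not the released implementation; normalizing by the logarithm of the total number of index cells is an assumption, since the abstract does not state the normalizer.

```python
import math
from collections import Counter

def repair_frontier(clearances, epsilon):
    """Indices whose certificate may have failed: clearance c_i <= 4 * epsilon."""
    return [i for i, c in enumerate(clearances) if c <= 4.0 * epsilon]

def frontier_entropy(frontier, cell_of, num_cells):
    """Normalized Shannon entropy of frontier points over index cells.

    frontier:  list of point indices on the repair frontier
    cell_of:   cell_of[i] is the index cell containing point i
    num_cells: total number of index cells (assumed normalizer: log(num_cells))
    Returns a value in [0, 1]; 0 = concentrated in one cell, 1 = evenly spread.
    """
    if not frontier or num_cells < 2:
        return 0.0
    counts = Counter(cell_of[i] for i in frontier)
    total = len(frontier)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(num_cells)

# Hypothetical clearances for five points and a step of epsilon = 0.01.
CLEARANCES = [0.5, 0.03, 0.2, 0.01, 0.039]
FRONTIER = repair_frontier(CLEARANCES, 0.01)
SPREAD = frontier_entropy(FRONTIER, [0, 0, 1, 1, 2], 3)
CLUSTERED = frontier_entropy(FRONTIER, [0, 0, 0, 0, 0], 3)
```

A maintenance policy could then branch on the frontier size and its entropy, for example preferring batched repair for diffuse frontiers; the actual thresholds are empirical and are not given in the abstract.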