Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations
Zhiyuan Zhang, Zhenghao Jin, Yanlun Peng, Xianda Guo, Haoran Liu, Shaofeng Zhang, Xingjun Ma, Zuxuan Wu, Junchi Yan, Xiaosong Jia, Yu-Gang Jiang
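AI summary: This paper presents Bench2Drive-Robust, the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. It evaluates how three major sources of deployment-related perturbations affect autonomous driving systems, revealing robustness challenges that conventional image-level corruption evaluations do not fully capture.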
Details
Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop end-to-end autonomous driving (E2E-AD) evaluation. These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop E2E-AD under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources: camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, and speed or odometry errors), and compute-induced control delay (model inference delay). We evaluate representative end-to-end driving methods and analyze their robustness under different perturbation severities. Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations. By establishing a closed-loop evaluation protocol and demonstrating the substantial impact of these deployment-oriented perturbations, Bench2Drive-Robust defines practical robustness problems for end-to-end autonomous driving and encourages further research on deployment-aware robust driving systems.
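To make the three perturbation sources concrete, here is a minimal Python sketch of how such perturbations might be injected into a control loop. It is an illustration under stated assumptions, not the benchmark's implementation: the names FrameDrop, GaussianNoise, ControlDelay, and run_loop are hypothetical, the "vehicle" is a toy one-dimensional lateral-offset model rather than a driving simulator, and the gain and delay values are arbitrary. It only shows why a delay that looks harmless in open loop can destabilize control once its effect is fed back.

```python
import random
from collections import deque


class FrameDrop:
    """Camera-stream failure (hypothetical): with probability p, repeat the last delivered frame."""

    def __init__(self, p, rng):
        self.p, self.rng, self.last = p, rng, None

    def __call__(self, frame):
        if self.last is not None and self.rng.random() < self.p:
            return self.last  # a stale frame stands in for the dropped one
        self.last = frame
        return frame


class GaussianNoise:
    """Ego-state error (hypothetical): add zero-mean Gaussian noise to a scalar reading."""

    def __init__(self, sigma, rng):
        self.sigma, self.rng = sigma, rng

    def __call__(self, value):
        return value + self.rng.gauss(0.0, self.sigma)


class ControlDelay:
    """Compute-induced delay (hypothetical): a command takes effect `steps` ticks after it is issued."""

    def __init__(self, steps, initial=0.0):
        self.queue = deque([initial] * steps)

    def __call__(self, command):
        self.queue.append(command)
        return self.queue.popleft()


def run_loop(delay_steps=0, noise_sigma=0.0, gain=0.6, ticks=40, seed=0):
    """Toy lane-keeping loop; returns the largest |lateral offset| over the last 10 ticks."""
    rng = random.Random(seed)
    noisy = GaussianNoise(noise_sigma, rng)
    delayed = ControlDelay(delay_steps)
    offset = 1.0  # start one unit away from the lane centre
    history = []
    for _ in range(ticks):
        measured = noisy(offset)          # perturbed ego-state estimate
        command = -gain * measured        # proportional steering correction
        offset += delayed(command)        # the command acts only after the delay
        history.append(abs(offset))
    return max(history[-10:])


if __name__ == "__main__":
    print("no delay:      ", run_loop(delay_steps=0))
    print("3-tick delay:  ", run_loop(delay_steps=3))
    print("noisy estimate:", run_loop(noise_sigma=0.05))
```

In this toy model the undelayed controller drives the offset toward zero, while the same controller with a three-tick delay produces a growing oscillation, because each correction is computed from a state that is already out of date. A real closed-loop evaluation would replace the toy model with a simulator and a driving policy, and would apply a wrapper such as FrameDrop to the camera stream before it reaches the model.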