Autonomous Driving - arXivDaily Topic

2603.11417 2026-06-18 cs.CV cs.LG Version update 90%

Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations

Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska

Affiliation: Department of Electrical and Computer Engineering, NYU Tandon School of Engineering

Topic match: Perception (cross-city generalization in end-to-end autonomous driving)

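AI summary: Studies how well end-to-end autonomous driving models generalize under zero-shot cross-city transfer, comparing a supervised backbone with the self-supervised backbones I-JEPA, DINOv2, and MAE. In open-loop evaluation, some self-supervised methods substantially reduce displacement and collision degradation relative to supervised pretraining. In closed-loop evaluation, self-supervised pretraining improves average out-of-distribution PDMS in several single-city training settings.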

Abstract

End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely unexamined. When training and evaluation data are geographically mixed, models may implicitly rely on city-specific cues, masking failure modes that would occur under real-world domain shifts when generalizing to new locations. In this work, we formulate zero-shot cross-city transfer as a controlled representation-level stress test for end-to-end autonomous driving and ask how visual pretraining affects transfer behavior under geographic domain shift. We conduct a comprehensive study by integrating self-supervised backbones I-JEPA, DINOv2, and MAE into planning frameworks. We evaluate performance under strict geographic splits on nuScenes in the open-loop setting and on NAVSIM in the closed-loop evaluation protocol. Our experiments reveal a substantial generalization gap when transferring models across cities with different road topologies, traffic conventions, and visual environments. In open-loop evaluation, a supervised backbone exhibits severe degradation when transferring between cities, yet some domain-specific self-supervised methods can substantially reduce both displacement and collision degradation. In closed-loop evaluation, self-supervised pretraining improves average out-of-distribution PDMS in several single-city training settings. Our results provide empirical evidence that representation learning influences the robustness of cross-city planning and motivate zero-shot geographic transfer as an important stress test for evaluating end-to-end autonomous driving systems.
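
The strict geographic split described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the record schema (a `city` key), the helper names `split_by_city` and `relative_degradation`, and the toy numbers are all assumptions made for illustration. The idea is only that every scene from the training city goes to training, every other scene is held out, and the gap is reported as the fractional increase of an error metric on the held-out cities.

```python
def split_by_city(scenes, train_city):
    """Return (train, held_out) with no city shared between the two lists.

    `scenes` is a list of dicts with a "city" key (hypothetical schema).
    """
    train = [s for s in scenes if s["city"] == train_city]
    held_out = [s for s in scenes if s["city"] != train_city]
    return train, held_out


def relative_degradation(in_city_error, cross_city_error):
    """Fractional increase of an error metric when moving to unseen cities."""
    return (cross_city_error - in_city_error) / in_city_error


# Toy records; the city names and error values are made up for illustration.
scenes = [
    {"id": "s1", "city": "city_a"},
    {"id": "s2", "city": "city_a"},
    {"id": "s3", "city": "city_b"},
]
train, held_out = split_by_city(scenes, "city_a")
gap = relative_degradation(in_city_error=1.0, cross_city_error=1.5)
```

Splitting on the city label rather than at random is what makes the evaluation zero-shot: a random split would leave the same streets on both sides and hide the city-specific cues the abstract warns about.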


2602.04401 2026-06-18 cs.RO cs.CV Version update 80%

Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition

Dhyey Manish Rajani, Michael Milford, Tobias Fischer

Affiliations: QUT Centre for Robotics; School of Electrical Engineering and Robotics; Queensland University of Technology

Topic match: Perception (operating point selection for visual place recognition)

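AI summary: Proposes a method that transfers thresholds via quantile normalisation to automatically select the operating point of a visual place recognition system, maximising recall at 100% precision without manual tuning.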

Comments: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

Abstract

Visual Place Recognition (VPR) is a key component for localisation in Global Navigation Satellite System (GNSS)-denied environments, but its performance critically depends on selecting an image matching threshold (operating point) that balances precision and recall. Thresholds are typically hand-tuned offline for a specific environment and fixed during deployment, leading to degraded performance under environmental change. We propose a method that automatically selects the operating point of a VPR system to maximise recall at 100% precision. The method uses a small calibration traversal with known correspondences and transfers thresholds to deployment via quantile normalisation of similarity score distributions. This quantile transfer ensures that thresholds remain stable across calibration sizes and query subsets. Experiments with seven state-of-the-art VPR techniques across five benchmark datasets demonstrate that our proposed approach consistently outperforms existing baselines, enabling the underlying VPR technique to operate at 100% precision in approximately twice as many deployment scenarios (median improvement), while retrieving up to 29% more correct matches at that precision. The method eliminates manual tuning by adapting to new environments and generalising across operating conditions. Our code is available at https://github.com/DhyeyR-007/Quantile-Transfer-for-Reliable-VPR.
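
The threshold transfer described in the abstract can be sketched roughly as follows. This is one plausible reading, not the authors' implementation (their repository is linked above): it assumes higher similarity scores mean better matches, that a match is accepted when its score is strictly above the threshold, and that the function names `calibrate_quantile`, `empirical_quantile`, and `transfer_threshold` are hypothetical. On the labelled calibration traversal it finds the loosest threshold that still gives 100% precision, records where that threshold sits as a quantile of the calibration scores, and then reads off the same quantile from the unlabelled deployment scores.

```python
def calibrate_quantile(scores, is_correct):
    """Quantile of the loosest 100%-precision threshold on calibration data.

    Accepting only scores strictly above the highest wrong-match score
    leaves no wrong match accepted on the calibration traversal.
    """
    wrong = [s for s, ok in zip(scores, is_correct) if not ok]
    if not wrong:
        return 0.0  # no wrong matches observed: use the loosest quantile
    strictest_wrong = max(wrong)
    return sum(s <= strictest_wrong for s in scores) / len(scores)


def empirical_quantile(sorted_scores, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of an ascending list."""
    pos = q * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    frac = pos - lo
    return sorted_scores[lo] + frac * (sorted_scores[hi] - sorted_scores[lo])


def transfer_threshold(q, deploy_scores):
    """Map a calibration quantile onto the deployment score distribution."""
    return empirical_quantile(sorted(deploy_scores), q)


# Toy data (made up): deployment scores live on a different scale.
cal_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
cal_correct = [True, True, False, True, False]
deploy_scores = [10.0, 20.0, 30.0, 40.0, 50.0]

q = calibrate_quantile(cal_scores, cal_correct)
threshold = transfer_threshold(q, deploy_scores)
accepted = [s for s in deploy_scores if s > threshold]
```

Transferring a quantile rather than a raw score is what lets the threshold follow a shift in the score distribution between calibration and deployment; a fixed raw threshold of 0.4 would be meaningless on the toy deployment scores here.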
