Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method
Bohan Li, Xin Jin, Hu Zhu, Hongsi Liu, Ruikai Li, Jiazhe Guo, Kaiwen Cai, Chao Ma, Yueming Jin, Hao Zhao, Xiaokang Yang, Wenjun Zeng
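AI summary: To address the scarcity of occupancy data, the authors build Nuplan-Occ, the largest semantic occupancy dataset, and propose a unified framework that jointly generates high-quality semantic occupancy, multi-view videos, and LiDAR point clouds. The framework uses a spatio-temporal disentangled architecture, a Gaussian splatting-based sparse point map rendering strategy, and a sensor-aware embedding strategy to achieve high-fidelity generation.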
Comments: IEEE TPAMI
Driving scene generation is a critical domain for autonomous driving, enabling downstream applications such as perception and planning evaluation. Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities; however, their performance depends heavily on annotated occupancy data, which remains scarce. To overcome this limitation, we curate Nuplan-Occ, the largest semantic occupancy dataset to date, constructed from the widely used Nuplan benchmark. Its scale and diversity facilitate not only large-scale generative modeling but also downstream autonomous driving applications. Based on this dataset, we develop a unified framework that jointly synthesizes high-quality semantic occupancy, multi-view videos, and LiDAR point clouds. Our approach incorporates a spatio-temporal disentangled architecture to support high-fidelity spatial expansion and temporal forecasting of 4D dynamic occupancy. To bridge modality gaps, we further propose two novel techniques: a Gaussian splatting-based sparse point map rendering strategy that enhances multi-view video generation, and a sensor-aware embedding strategy that explicitly models LiDAR sensor properties for realistic multi-LiDAR simulation. Extensive experiments demonstrate that our method achieves superior generation fidelity and scalability compared to existing approaches, and validate its practical value in downstream tasks. Repo: https://github.com/Arlo0o/UniScene-Unified-Occupancy-centric-Driving-Scene-Generation/tree/v2