Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
Phys4D: 从视频扩散模型实现细粒度物理一致的4D建模
Haoran Lu, Shang Wu, Songling Liu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu
AI总结 提出Phys4D流水线,通过三阶段训练(伪监督预训练、物理监督微调、强化学习校正)从视频扩散模型学习物理一致的4D世界表示,显著提升细粒度时空与物理一致性。
详情
最近的视频扩散模型作为大规模生成式世界模型已经取得了令人印象深刻的能力。然而,这些模型通常难以保持细粒度的物理一致性,随时间表现出物理上不合理的动态。在这项工作中,我们提出了 \textbf{Phys4D},一个从视频扩散模型中学习物理一致的4D世界表示的流水线。Phys4D 采用 \textbf{三阶段训练范式},逐步将外观驱动的视频扩散模型提升为物理一致的4D世界表示。我们首先通过大规模伪监督预训练引导出稳健的几何和运动表示,为4D场景建模奠定基础。然后,我们使用模拟生成的数据进行基于物理的监督微调,强制执行时间一致的4D动态。最后,我们应用基于模拟的强化学习来纠正难以通过显式监督捕获的残留物理违规。为了评估超越外观指标的细粒度物理一致性,我们引入了一套 \textbf{4D世界一致性评估},探测几何一致性、运动稳定性和长期物理合理性。实验结果表明,与外观驱动的基线相比,Phys4D 显著改善了细粒度时空和物理一致性,同时保持了强大的生成性能。我们的项目页面可在此 https URL 获取。
Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time. In this work, we present \textbf{Phys4D}, a pipeline for learning physics-consistent 4D world representations from video diffusion models. Phys4D adopts \textbf{a three-stage training paradigm} that progressively lifts appearance-driven video diffusion models into physics-consistent 4D world representations. We first bootstrap robust geometry and motion representations through large-scale pseudo-supervised pretraining, establishing a foundation for 4D scene modeling. We then perform physics-grounded supervised fine-tuning using simulation-generated data, enforcing temporally consistent 4D dynamics. Finally, we apply simulation-grounded reinforcement learning to correct residual physical violations that are difficult to capture through explicit supervision. To evaluate fine-grained physical consistency beyond appearance-based metrics, we introduce a set of \textbf{4D world consistency evaluation} that probe geometric coherence, motion stability, and long-horizon physical plausibility. Experimental results demonstrate that Phys4D substantially improves fine-grained spatiotemporal and physical consistency compared to appearance-driven baselines, while maintaining strong generative performance. Our project page is available at https://sensational-brioche-7657e7.netlify.app/