TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
Weijie Wang, Zimu Li, Jinchuan Shi, Zeyu Zhang, Botao Ye, Marc Pollefeys, Donny Y. Chen, Bohan Zhuang
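AI summary: Proposes TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and predicts simulation-ready mesh scenes directly from sparse-view images.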
- Comments: Project Page: https://lhmd.top/trisplat, Code: https://github.com/ziplab/TriSplat
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.
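The abstract's path from predicted point maps to oriented triangles can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`geometry_normal`, `local_frame`, `triangle_vertices`, `to_obj`), the central-difference normal estimate, the helper-axis frame construction, the equilateral triangle shape, and the `size` parameter are all assumptions made for illustration. The learned normal refinement head, pose and intrinsics prediction, and the training schedules are omitted. The sign of the normal depends on the camera and grid convention, which the abstract does not specify.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length < 1e-12:
        raise ValueError("degenerate vector")
    return tuple(c / length for c in v)

def geometry_normal(points, row, col):
    """Estimate a normal at one pixel of an H x W grid of 3D points.

    Assumption: central differences along columns and rows, clamped at
    the image border, followed by a cross product.
    """
    h, w = len(points), len(points[0])
    left, right = points[row][max(col - 1, 0)], points[row][min(col + 1, w - 1)]
    up, down = points[max(row - 1, 0)][col], points[min(row + 1, h - 1)][col]
    return normalize(cross(sub(right, left), sub(down, up)))

def local_frame(normal):
    """Build a right-handed orthonormal frame (t, b, n) with t x b = n.

    Assumption: the helper axis is the coordinate axis least aligned with
    the normal, so the cross product never degenerates.
    """
    n = normalize(normal)
    k = min(range(3), key=lambda i: abs(n[i]))
    helper = tuple(1.0 if i == k else 0.0 for i in range(3))
    t = normalize(cross(helper, n))
    b = cross(n, t)
    return t, b, n

def triangle_vertices(center, normal, size):
    """Place an equilateral triangle (hypothetical shape) in the tangent plane.

    Vertices are ordered counter-clockwise when viewed from the normal side.
    """
    t, b, _ = local_frame(normal)
    verts = []
    for k in range(3):
        a = 2.0 * math.pi * k / 3.0
        verts.append(tuple(
            center[i] + size * (math.cos(a) * t[i] + math.sin(a) * b[i])
            for i in range(3)))
    return verts

def to_obj(triangles):
    """Serialize triangles as Wavefront OBJ text (1-based vertex indices)."""
    lines = []
    for tri in triangles:
        for v in tri:
            lines.append("v %.6f %.6f %.6f" % v)
    for k in range(len(triangles)):
        lines.append("f %d %d %d" % (3 * k + 1, 3 * k + 2, 3 * k + 3))
    return "\n".join(lines) + "\n"
```

Deriving the frame from the normal, rather than treating orientation as free parameters, keeps every triangle tangent to the estimated surface by construction, which is the property the abstract attributes to its geometry-derived parameterization. Calling `to_obj` on the triangles from every pixel would yield a triangle-soup mesh file that standard tools can load; whether TriSplat exports in this format is not stated in the abstract.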