arXivDaily: arXiv daily academic digest, updated Monday through Friday
cs.GR (Graphics): 11 papers
2606.12214 2026-06-11 cs.HC cs.GR New submission

Identifying cybersickness causes in virtual reality games using symbolic machine learning algorithms

Thiago Porcino, Erick Oliveira Rodrigues, Flavia Bernardini, Daniela Trevisan, Esteban Clua

AI summary: Proposes using symbolic machine learning algorithms to rank the causes of cybersickness in VR games. Experiments with two games and 37 valid samples find that rotation and acceleration trigger cybersickness more often in a flight game, and that users with less VR experience are more prone to discomfort.

Abstract

Virtual reality (VR) and head-mounted displays are steadily gaining popularity in fields such as education, military, entertainment, and health. Although such technologies provide a high sense of immersion, they can also trigger symptoms of discomfort. This condition is called cybersickness (CS) and is frequently discussed in recent virtual reality publications. This work proposes a novel experimental analysis using symbolic machine learning to rank potential causes of CS in VR games. We estimate CS causes and rank them according to their impact using classical machine learning. Experiments are performed using two virtual reality games and 6 experimental protocols, with 37 valid samples from a total of 88 volunteers. Our results show that rotation and acceleration triggered cybersickness more frequently in a flight game than in a race game. We also observe that subjects who are less experienced with VR are more prone to discomfort. Prior experience plays a more important role in the race game, as this game gives the user more freedom in terms of controllers, more displacement alternatives, and more user-controlled acceleration. Furthermore, different causes of discomfort arise depending on whether VR exposure is short or long term. We suggest strategies for mitigating CS in these two scenarios (short- and long-term exposure) and compare the two highlighted games (race and flight).
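
The abstract does not say which symbolic learners were used. As a rough illustration of how a symbolic method can rank candidate causes, the sketch below scores each feature by information gain, the splitting criterion of ID3-style decision trees. The feature names, the `discomfort` label, and the four toy records are made up for this example and are not data from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction in `target` obtained by splitting `rows` on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_causes(rows, features, target="discomfort"):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = [(f, information_gain(rows, f, target)) for f in features]
    return sorted(gains, key=lambda fg: fg[1], reverse=True)


# Hypothetical toy records, NOT measurements from the study.
TOY_ROWS = [
    {"rotation": "high", "vr_experience": "low", "discomfort": "yes"},
    {"rotation": "high", "vr_experience": "high", "discomfort": "yes"},
    {"rotation": "low", "vr_experience": "low", "discomfort": "no"},
    {"rotation": "low", "vr_experience": "high", "discomfort": "no"},
]
RANKING = rank_causes(TOY_ROWS, ["rotation", "vr_experience"])
```

In this toy set `rotation` separates the labels perfectly and therefore ranks first; real questionnaire or telemetry data would of course be noisier.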

2606.12153 2026-06-11 cs.CV cs.GR New submission

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

Cheng-Feng Pu, Jia-Peng Zhang, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu

Affiliations: Zhili College, Tsinghua University; BNRist, Department of Computer Science and Technology, Tsinghua University; VAST

AI summary: Proposes TopoCap, the first unified framework that extracts motion from monocular video and retargets it to characters with arbitrary unseen skeletal topologies without test-time optimization, by learning a universal motion manifold with a Graph CVAE and applying conditional flow matching.

Abstract

The explosion of generative 3D assets has created a massive demand for animation, yet current motion capture methods remain brittle, restricted to species-specific templates (e.g., SMPL) or requiring labor-intensive manual rigging. We introduce TopoCap, the first unified framework capable of extracting motion from monocular video and retargeting it onto characters with arbitrary, unseen skeletal topologies, i.e., from bipeds to hexapods and inanimate objects, without test-time optimization. Our key insight is that while skeletal structures are combinatorial and discrete, the underlying physics of motion occupies a continuous, low-dimensional manifold. We materialize this insight via a two-stage generative pipeline. First, we learn a Universal Motion Manifold using a Graph CVAE that compresses heterogeneous kinematic chains into a shared, fixed-length latent code. By explicitly conditioning the decoder on a structural embedding of the target rig, we disentangle motion dynamics from skeletal topology. Second, we treat video-to-animation as a conditional flow matching problem, predicting these topology-agnostic codes from visual features. To learn this generalized prior, we introduce Mobjaverse, a massive-scale dataset curated from Objaverse-XL. Comprising over 5,000 unique skeletal topologies and 2 million frames, it exceeds the structural diversity of existing datasets by two orders of magnitude. Extensive experiments demonstrate that TopoCap outperforms specialist models on human and quadruped benchmarks while enabling zero-shot retargeting for the long tail of 3D creatures. The dataset is publicly available at this https URL.
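
For readers unfamiliar with conditional flow matching, the following is a minimal sketch of the generic technique with a straight-line probability path, not TopoCap's implementation. The function names, the plain-list vectors, and the `predict_velocity(x_t, t, condition)` callback (standing in for a network conditioned on visual features) are all assumptions made for illustration.

```python
import random


def flow_matching_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, plus its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def flow_matching_loss(predict_velocity, x1, condition, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise endpoint
    t = rng.random()                         # time drawn uniformly from [0, 1)
    x_t, target = flow_matching_pair(x0, x1, t)
    predicted = predict_velocity(x_t, t, condition)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(x1)


def euler_sample(predict_velocity, x0, condition, steps=10):
    """Integrate dx/dt = v(x, t, condition) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

At training time the loss is averaged over many (latent code, video feature) pairs; at inference `euler_sample` turns a noise vector into a latent code, which a decoder would then map to motion for the target rig.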

2606.12040 2026-06-11 cs.AI cs.GR New submission

A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design

Wanting Wang, Xiye Ma, Yuyang He, Minghui Cheng, Ran Cao

AI summary: Proposes an AutoGen-based closed-loop "generation-evaluation-optimization" multi-agent framework for automated concrete barrier design that exceeds 98% design accuracy; an 8B-parameter lightweight model can outperform a 631B-parameter flagship model.

Abstract

The design of reinforced concrete highway barriers is a safety-critical process that requires strict compliance with regulatory provisions such as the AASHTO-LRFD bridge design guidelines. Current engineering practice relies heavily on manual, iterative, and heuristic calculations to satisfy complex nonlinear material and mechanics constraints. Although Large Language Models (LLMs) demonstrate strong generative capabilities, their direct application to structural engineering remains limited by hallucination risks and insufficient physical grounding. To address these challenges, this study proposes a novel "generation-evaluation-optimization" closed-loop framework for automated concrete barrier design using the multi-agent orchestration capabilities of AutoGen. Experimental results demonstrate that the proposed agentic framework achieves over 98% design accuracy, significantly outperforming standalone general-purpose LLMs. More importantly, the study reveals that design performance is not necessarily correlated with model scale, where an 8B-parameter lightweight model could outperform unconstrained 631B-parameter flagship models. This finding highlights the potential to substantially reduce computational costs while improving the accessibility of AI-assisted engineering tools for industry applications. The source code for the proposed multi-agent design framework is available at the project GitHub repository: this https URL. Keywords: Structural Engineering; Multi-Agent Systems; Large Language Models; Concrete Barrier Design; AutoGen; Design Automation.
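
The control flow of a "generation-evaluation-optimization" loop can be sketched without any LLM or agent library. The code below is a toy under stated assumptions: `toy_generator` stands in for a generating agent, `toy_evaluator` for a deterministic checker, and the capacity formula and demand value are invented placeholders, not AASHTO-LRFD provisions and not the paper's AutoGen setup.

```python
def closed_loop_design(generate, evaluate, max_rounds=5):
    """Run generate -> evaluate -> revise until the checker passes or rounds run out."""
    feedback = None
    history = []
    for _ in range(max_rounds):
        candidate = generate(feedback)
        passed, feedback = evaluate(candidate)
        history.append((candidate, passed))
        if passed:
            return candidate, history
    return None, history


def toy_generator(feedback):
    """Hypothetical generator: start at 200 mm and add 50 mm after each failed check."""
    if feedback is None:
        return {"thickness_mm": 200}
    return {"thickness_mm": feedback["last_thickness_mm"] + 50}


def toy_evaluator(candidate):
    """Hypothetical checker: made-up capacity (thickness * 1.5) must reach a made-up demand of 400."""
    capacity = candidate["thickness_mm"] * 1.5
    passed = capacity >= 400
    return passed, {"last_thickness_mm": candidate["thickness_mm"], "capacity": capacity}


DESIGN, HISTORY = closed_loop_design(toy_generator, toy_evaluator)
```

Keeping the evaluator deterministic and outside the generator is what lets such a loop reject hallucinated designs: the generator only ever sees structured feedback from the checker.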

2606.12008 2026-06-11 cs.CG cs.GR New submission

Automated Responsive Thematic Mapping with Layout Guides

Arjen Simons, Sarah Schöttler, Wouter Meulemans, Kevin Verbeek, Bettina Speckmann

AI summary: Proposes the first algorithmic framework that efficiently computes responsive thematic maps through a layout guide structure, so map elements adapt smoothly across display sizes while balancing the readability of statistical information against cartographic context.

Abstract

Thematic maps visually communicate statistical information about spatial units such as countries or states. They must balance the individual readability of those map elements that carry the statistical information and the overall cartographic context. Nowadays, most maps are not static images, but must flexibly respond to a range of device types and display sizes. Current approaches to responsive thematic mapping are limited: they are labor-intensive for practitioners and often rely on combining disjointed visual encodings to cover different device types. In this paper we introduce the first algorithmic framework to efficiently compute responsive thematic maps that smoothly adapt to different display sizes. A key component of our framework is the layout guide: a combinatorial structure which encodes the two essential aspects of a thematic map. The first aspect is the visual requirements of each statistical map element (at least their desired width and height); the second is the cartographic context in the form of relative positions of map elements. Our main algorithmic contribution is the map arranger, which takes a visual container as input and returns a suitable layout guide. The map arranger does so in a stable and consistent manner: if the container changes only a little, then so does the layout guide, and the same input container always results in the same layout guide. To use our framework, one needs three ingredients: (1) a reference layout, which corresponds to the "ideal" thematic map, (2) a total vertical and horizontal order for all map elements (the desired layouts for containers with extreme aspect ratios), and (3) a thematic mapping algorithm that can construct a thematic map from a layout guide. We demonstrate our framework on two types of thematic maps, namely rectangular and Demers cartograms.

2606.11743 2026-06-11 cs.RO cs.GR cs.LG New submission

TacCoRL: Integrating Tactile Feedback into VLA via Simulation

Siyu Ma, Yuqi Liang, Chang Yu, Yunuo Chen, Hao Su, Yixin Zhu, Yin Yang, Chenfanfu Jiang

Affiliations: University of California, Los Angeles; University of California, San Diego; University of Electronic Science and Technology of China; Peking University; University of Utah

AI summary: Proposes TacCoRL, a framework that injects tactile feedback into vision-language-action policies through sim-real co-training and reinforcement learning, raising the average success rate on contact-rich tasks by 22.5 percentage points (from 50.0% to 72.5%).

Abstract

Vision-language-action (VLA) models provide strong visual, language, and action priors for robot manipulation, but visual observations alone often miss the local contact state required for contact-rich tasks. We present TacCoRL, a scalable framework that injects Tactile feedback into VLA policies and improves them through sim-real Co-training and simulation-based reinforcement learning (RL), without requiring large-scale tactile pretraining or extensive real-world contact exploration. The key idea is not only adding touch as an input, but learning how contact readings should modulate action responses in near-failure states that are rare in demonstrations and risky to collect on hardware. We use a real-aligned simulator as a closed-loop training environment for contact interaction. Mixed simulated and real trajectories first warm-start tactile-conditioned actions in the pretrained policy. Reinforcement learning with verifiable task rewards then optimizes the policy using simulated contact rollouts. It reinforces tactile-conditioned actions that lead to task completion, while a supervised objective on real trajectories keeps the refined policy anchored to deployment visual, tactile, and action distributions. The resulting policy transfers directly to the real robot without privileged simulation state or online real-world RL. Across four bimanual contact-rich tasks, the final visuo-tactile policy achieves an average success rate of 72.5%, compared to a baseline of 50.0%. Result videos and more details are available at this https URL.
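
The abstract does not give the training objective. One plausible reading of "RL on simulated rollouts plus a supervised anchor on real trajectories" is a weighted sum of the two terms, written below as a sketch. The weight $\lambda$, the behavior-cloning form of the supervised term, and the symbols $R$, $\mathcal{D}_{\text{real}}$, and $\pi_\theta$ are assumptions for illustration, not notation from the paper.

```latex
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{\tau \sim \pi_\theta,\ \text{sim}}\big[ R(\tau) \big]
  \;+\; \lambda \, \mathbb{E}_{(o, a) \sim \mathcal{D}_{\text{real}}}\big[ -\log \pi_\theta(a \mid o) \big]
```

Here $R(\tau)$ is the verifiable task reward of a simulated rollout $\tau$, and $(o, a)$ are observation-action pairs (vision, touch, action) from real demonstrations. A larger $\lambda$ keeps the policy closer to the real-world distribution at the cost of slower improvement from simulation.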

2606.11656 2026-06-11 cs.GR New submission

MoGeFlow: Flowing Through Motion Codebook Geometry for Text-to-Motion Generation

Pengcheng Fang, Tengjiao Sun, Xiaoyu Zhan, Xiaohao Cai, Dongjie Fu

AI summary: Proposes MoGeFlow, which exploits the geometric structure of motion codebooks and generates motion-code frames through a continuous flow instead of discrete code prediction, achieving the best results on several benchmarks.

Abstract

Vector-quantized motion tokenizers provide a compact discrete interface for text-to-motion generation, but most motion-code priors treat code indices as unordered categorical labels. This view overlooks a key property of motion codes: they are decoder-bound prototypes of physical movement, and their learned codebooks can carry meaningful local kinematic geometry. We verify this property through codebook diagnostics. Distances between learned PartVQ group-specific codes align with local motion-prototype distances, shuffled controls remove this alignment, and replacing codes with progressively farther neighbors induces monotonically larger decoded motion changes. These results show that motion codebooks exhibit measurable, non-random, and decoder-causal geometry. Based on this observation, we propose MoGeFlow, a text-to-motion model that generates through motion codebook geometry. MoGeFlow represents each motion-code frame as a structured set of PartVQ group-specific code embeddings, learns a text-conditioned continuous flow over these frame states, and projects terminal states back to valid motion codes for frozen decoding. This preserves the compactness and validity of discrete tokenization while replacing categorical code prediction with geometry-aware codebook-space generation. Experiments set a new state of the art in R-Precision on HumanML3D and KIT-ML, achieve the best HumanML3D MultiModal Distance and KIT-ML FID among generative methods, and obtain the best MotionMillion R@1, R@2, R@3, and FID under the benchmark protocol.
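
The abstract says terminal flow states are projected back to valid motion codes but does not say how. A nearest-neighbour lookup per part group is the simplest projection and is what the sketch below assumes; the function names, the Euclidean metric, and the tiny two-group codebooks are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` in squared Euclidean distance."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def project_frame(frame_state, codebooks):
    """Map one continuous frame state (one vector per part group) to discrete code indices."""
    return [nearest_code(vec, cb) for vec, cb in zip(frame_state, codebooks)]


# Hypothetical codebooks for two part groups, with 2-D embeddings for readability.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]],
]
TOY_FRAME = [[0.9, 0.8], [1.9, 2.2]]
CODES = project_frame(TOY_FRAME, TOY_CODEBOOKS)
```

Because the projection depends on distances between code embeddings, it only makes sense when those distances are meaningful, which is the property the codebook diagnostics in the abstract are meant to establish.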

2606.11529 2026-06-11 cs.GR cs.CV cs.PF New submission

XPR: An Extensible Cross-Platform Point-Based Differentiable Renderer

Steve Rhyner, Sankeerth Durvasula, Aleksandr Kovalev, Hansel Jia, Adrian Zhao, Mrutunjayya Mrutunjayya, Nilesh Ahuja, Selvakumar Panneer, Christina Giannoula, Nandita Vijaykumar

AI summary: Proposes XPR, a framework whose high-level programming interface and modular rendering pipeline let new methods such as 3DGS be implemented in little code and run across platforms via the XLA compiler.

Abstract

Point-based differentiable rendering underpins modern 3D reconstruction, novel-view synthesis, and learning-based graphics pipelines, but developing new rendering methods often requires extensive low-level implementation, hardware-specific kernels, and manually written backward passes. This limits rapid prototyping, reproducibility, exploration, and deployment, especially across diverse hardware platforms. This paper presents XPR, an extensible cross-platform framework for point-based differentiable rendering. XPR introduces a high-level programming interface that separates method-specific logic from the shared rendering pipeline, allowing users to implement new methods in a few lines of code. Its pipeline decomposes rendering into modular, statically shaped parallel operations that can be lowered by a cross-platform compiler to GPUs, TPUs, CPUs, and other ML accelerators. We demonstrate implementations of 3DGS, 3DGUT, and LinPrim, each in only a few hundred lines of Python code, and each can be compiled to a range of hardware platforms with the XLA compiler. These results show that XPR enables fast experimentation and portable execution for emerging point-based differentiable rendering systems.
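
XPR's programming interface is not shown in the abstract, so the sketch below does not imitate it. It only illustrates the per-pixel step that point-based renderers such as 3DGS share: front-to-back alpha compositing of depth-sorted samples. The function name and the plain Python loop are choices made for this example; a statically shaped, compiler-friendly version would express the same recurrence as array operations.

```python
def composite_front_to_back(colors, alphas):
    """Blend depth-sorted samples: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer samples
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Two samples along one ray: a half-opaque red one in front of a half-opaque green one.
PIXEL, REMAINING = composite_front_to_back(
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    alphas=[0.5, 0.5],
)
```

The returned transmittance is the weight left over for the background colour.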

2606.11446 2026-06-11 cs.CV cs.GR New submission

3D-CBM: A Framework for Concept-Based Interpretability in Generative 3D Modeling

Ahmad Al-Kabbany

Affiliations: Yubree Labs; Multimedia Interaction and Communication Lab, Arab Academy for Science and Technology

AI summary: Proposes incorporating Concept Bottleneck Models (CBMs) into 3D generative architectures, mapping inputs to a multi-tiered taxonomy of interpretable primitives and functional attributes to enable semantically steerable 3D generation; experiments show high concept prediction accuracy and interactive error correction.

Abstract

This research introduces a framework for incorporating Concept Bottleneck Models (CBMs) into 3D generative architectures to address the inherent 'semantic gap' in deep geometric learning. As deep models become central to 3D content creation, explainability shifts from a peripheral feature to a fundamental requirement for trust and accountability in safety-critical domains such as healthcare and manufacturing. CBMs provide an intrinsic interpretability solution by constraining latent representations to align with human-defined concepts, yet their application to unstructured 3D data remains largely unexplored. We design, implement, and validate a formal 3D-CBM architecture that maps raw geometric inputs, including point clouds and meshes, into a multi-tiered taxonomy of interpretable primitives and functional attributes. The framework further identifies strategic datasets, such as PartNet and ShapeNet, specialized for concept-based supervision. Experimental results from a 3D part-manipulation proof-of-concept experiment demonstrate the framework's efficacy, achieving a concept prediction accuracy of 88.8% and a Chamfer Distance of 0.0115. Critically, the model enables precise test-time intervention, allowing for the interactive correction of structural errors. This work establishes a foundation for semantically steerable 3D generation and invites further exploration into collaborative human-in-the-loop design systems.
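
For context on the reported metric, here is a minimal sketch of a symmetric Chamfer distance between two point sets. Several conventions exist (squared or unsquared distances, sum or mean, with or without a factor of one half), and the abstract does not say which one produced the 0.0115 figure, so the mean-of-squared-distances variant below is an assumption.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance using mean squared nearest-neighbour distances."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # For every source point, squared distance to its closest destination point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Tiny hypothetical point clouds; the second differs by moving one point up by 1.
CLOUD_A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
CLOUD_B = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
DISTANCE = chamfer_distance(CLOUD_A, CLOUD_B)
```

This brute-force version is quadratic in the number of points; practical implementations use spatial indices or batched GPU kernels.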

2606.11390 2026-06-11 cs.CV cs.DC cs.GR cs.LG New submission

A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting

Matthew Cong, Francis Williams, Jonathan Swartz, Mark Harris, Sanja Fidler, Ken Museth

Affiliations: NVIDIA; University of Toronto; Vector Institute

AI summary: Proposes a multi-GPU Gaussian splatting method that distributes parameters at the operator level via CUDA unified memory and NVLink, enabling large-scale scene reconstruction with more than 1 billion Gaussian splats.

Comments: 14 pages, 6 tables, 2 figures, and 1 listing. Includes supplementary material

Abstract

Gaussian splatting methods have become increasingly popular for neural reconstruction of the real world. However, they are often limited in scale and resolution due to compute and memory constraints. We present a multi-GPU Gaussian splatting approach that scales reconstruction to higher resolutions and larger scenes while abstracting away the code complexity typically associated with distributing a model. To accomplish this, we propose a PyTorch backend that distributes the Gaussian parameters and splatting operators across GPUs via CUDA unified memory and NVLink. Because distribution occurs at the operator level, the model code requires no explicit cross-device communication. More broadly, the backend exposes multiple GPUs as an aggregate PyTorch device and supports other PyTorch operators. We demonstrate city-scale reconstructions with street-level detail consisting of over 1 billion Gaussian splats, more than 25 times as many as the current state of the art.
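
A back-of-the-envelope calculation suggests why a scene of this size needs more than one device. The sketch assumes the parameter layout of the original 3D Gaussian splatting formulation with degree-3 spherical harmonics (3 position + 3 scale + 4 rotation + 1 opacity + 48 colour coefficients = 59 float32 values per Gaussian) and a hypothetical 80 GB of memory per GPU. The paper's actual layout and hardware are not stated in the abstract.

```python
def parameter_bytes(num_gaussians, floats_per_gaussian=59, bytes_per_float=4):
    """Raw parameter storage in bytes, excluding gradients and optimizer state."""
    return num_gaussians * floats_per_gaussian * bytes_per_float


def min_devices(total_bytes, bytes_per_device):
    """Smallest device count whose combined memory holds `total_bytes` (ceiling division)."""
    return -(-total_bytes // bytes_per_device)


PARAMS = parameter_bytes(1_000_000_000)        # 1 billion Gaussians (from the abstract)
DEVICES = min_devices(PARAMS, 80 * 10**9)      # 80 GB per device is an assumption
```

Under these assumptions the parameters alone occupy 236 GB, before gradients, optimizer state, or activations are counted, so training state would multiply that figure further.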

2606.11314 2026-06-11 cs.CV cs.GR New submission

TRON: Tracing Rays to Orchestrate a Neural Renderer for 3D Gaussian Reconstructions

Or Perel, Hassan Abu Alhaija, Zian Wang, Jacob Munkberg, Matan Atzmon, Sanja Fidler, Masha Shugrina

Affiliations: NVIDIA; University of Toronto; Vector Institute

AI summary: Proposes TRON, a framework that combines 3D Gaussian ray tracing with neural rendering for realistic, controllable rendering of real-world 3D scenes under novel lighting, dynamic object motion, object insertion, and material editing, using intrinsic decomposition priors and ray-traced radiometric guidance to bridge physically based and neural rendering.

Comments: Project page: this https URL

Abstract

We introduce TRON, a rendering framework that combines 3D Gaussian ray tracing with neural rendering to enable realistic and controllable rendering of real-world 3D scenes under novel lighting, dynamic object motion, object insertion, and material editing. Prior approaches that rely solely on physically based rendering (PBR) of Gaussian representations struggle to achieve realistic relighting due to imperfections in reconstructed geometry, material estimates, and light transport estimation. At the same time, neural rendering methods often lack an explicit scene representation, limiting their ability to support interactive editing with fine-grained manipulation. TRON bridges these two paradigms. We use intrinsic decomposition priors from a learned inverse rendering model to regularize the material properties of a Gaussian field, and repurpose a ray tracer to provide radiometric guidance rather than final pixels. By treating this output as a structured 3D scaffold, we empower a lightweight neural renderer to bridge the domain gap between shading-model constrained estimates and photorealistic output. Our key insight is that the combination of explicit 3D knowledge with robust material priors provides speed and controllability, while neural rendering enables the synthesis of photorealistic images. To support real-world scenarios, we train our neural renderer with a multi-stage strategy consisting of large-scale pretraining and targeted fine-tuning on a newly constructed dataset of 2.1M rendered synthetic and real-world frames from 3D reconstructions. TRON outperforms Gaussian-based relighting methods in realism, and prior neural renderers in editability and speed. To the best of our knowledge, TRON is the first method to enable practical interactive applications in captured 3D environments, offering realistic appearance under dynamic geometric, lighting and material conditions.

2605.17557 2026-06-11 cs.GR cs.CV Updated version

Real-Time Neural Hair Denoising

Chenghao Wu, Yuefan Shen, Tao Huang, Kai Yan, Zahra Montazeri, Kui Wu

AI summary: Proposes a lightweight real-time method for reconstructing strand-based hair G-buffers from severely undersampled rasterized inputs. It first applies neural spatial reconstruction and temporal accumulation to recover hair coverage (fractional hair visibility within a pixel) and tangent vectors, then completes position with a tangent-guided reconstruction step for physically based deferred hair shading. Evaluated across varied hairstyles in static and dynamic scenes, it reconstructs hair better than existing hair-specific denoisers and general industrial neural reconstruction solutions such as DLSS and FSR.

Abstract

We propose a lightweight real-time method for reconstructing strand-based hair G-Buffers from severely undersampled rasterized inputs. Our pipeline first applies neural spatial reconstruction and temporal accumulation to recover hair coverage, i.e., fractional hair visibility within a pixel, and tangent. It then uses a tangent-guided reconstruction step to complete the position, which is subsequently used for physically based deferred hair shading. We evaluate our method across a diverse set of hairstyles, including straight, wavy, afro, and ponytail styles, under both static and dynamic scenarios. Our method achieves higher hair reconstruction quality than existing hair-specific denoising techniques and general industrial neural reconstruction solutions such as DLSS and FSR.