arXivDaily: daily arXiv digest, updated Monday to Friday

Vision & Robotics

3D Vision

3D reconstruction, NeRF, Gaussian Splatting, point clouds, and spatial intelligence.

Collected for today (current date): 5 papers. Sources: cs.CV, cs.GR, cs.RO

1. Point Clouds (2 papers)

2605.17131 2026-06-18 cs.CV cs.AI cs.LG Updated version Topic score 95

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

Affiliations: State University of New York at Albany

Topic match: Point Clouds: a systematic survey of deep learning architectures for point cloud classification and segmentation.

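AI summary: This paper systematically examines deep learning architectures for point cloud classification and segmentation. It analyzes the structural properties of point cloud data, categorizes existing work by architecture, evaluates performance on mainstream benchmarks, and identifies open challenges and future directions.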

Comments We review a decade of advancements in point cloud processing, tracing the evolution of the field from its foundational roots to the modern SOTA, analyzing how diverse architectures overcome the inherent geometric challenges of 3D data, and mapping out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

Journal ref ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026

Abstract

The point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherently unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine-learning-based methodologies. To combat these issues, diverse strategies have been developed, including conversion to an ordered format, extraction of local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion of its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
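One of the strategies listed above, permutation-invariant processing, can be illustrated with a minimal pure-Python sketch in the spirit of PointNet: the same small layer is applied to every point, and a channel-wise max (a symmetric function) pools the per-point features, so reordering the input points cannot change the output. The weights, layer sizes, and function names are hypothetical and for illustration only; this is not code from the survey or its GitHub repository.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def shared_mlp(point: Point, weights: List[List[float]], bias: List[float]) -> List[float]:
    """Apply one shared layer (linear map + ReLU) to a single point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, bias)]


def global_feature(cloud: List[Point], weights: List[List[float]],
                   bias: List[float]) -> List[float]:
    """Encode a cloud: per-point features, then channel-wise max pooling.

    max is symmetric in its arguments, so the result ignores point order.
    """
    per_point = [shared_mlp(p, weights, bias) for p in cloud]
    return [max(channel) for channel in zip(*per_point)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical random weights: 3 input coordinates -> 8 feature channels.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
    b = [rng.uniform(-0.1, 0.1) for _ in range(8)]
    cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(100)]
    shuffled = cloud[:]
    rng.shuffle(shuffled)
    print(global_feature(cloud, W, b) == global_feature(shuffled, W, b))  # True
```

Real networks stack several such shared layers and learn the weights; the max-pooling step is what makes the global feature independent of point order.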

2601.01200 2026-06-18 cs.CV eess.IV Updated version Topic score 85

Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity

Zhang Chen, Shuai Wan, Yuezhe Zhang, Siyu Ren, Fuzheng Yang, Junhui Hou

Affiliations: School of Electronics and Information, Northwestern Polytechnical University; Department of Computer Science, City University of Hong Kong; School of Telecommunication Engineering, Xidian University

Topic match: Point Clouds: point cloud quality assessment, multi-scale implicit structural similarity

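AI summary: To address the difficulty of matching irregular data in point cloud quality assessment, the paper proposes the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM), which represents local features continuously with radial basis functions and compares implicit-function coefficients. Combined with a ResGrouped-MLP network, it outperforms existing methods on multiple benchmarks.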

Comments IEEE TMM Accepted

Abstract

The unstructured and irregular nature of points poses a significant challenge for accurate point cloud quality assessment (PCQA), particularly in establishing accurate perceptual feature correspondence. To tackle this, we propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM). Unlike traditional point-to-point matching, MS-ISSM utilizes radial basis functions (RBFs) to represent local features continuously, transforming distortion measurement into a comparison of implicit function coefficients. This approach effectively circumvents matching errors inherent in irregular data. Additionally, we propose a ResGrouped-MLP quality assessment network, which robustly maps multi-scale feature differences to perceptual scores. The network architecture departs from the traditional flat multi-layer perceptron (MLP) by adopting a grouped encoding strategy integrated with residual blocks and channel-wise attention mechanisms. This hierarchical design allows the model to preserve the distinct physical semantics of luma, chroma, and geometry while adaptively focusing on the most salient distortion features across high, medium, and low scales. Experimental results on multiple benchmarks demonstrate that MS-ISSM outperforms state-of-the-art metrics in both reliability and generalization. The source code is available at: https://github.com/ZhangChen2022/MS-ISSM.
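The idea of comparing RBF coefficients instead of matched points can be sketched as follows, under stated assumptions: the reference and the distorted neighborhood are each fitted against the same hypothetical set of RBF centers (here, a few reference points) using Gaussian kernels and ridge-regularized least squares, and the two coefficient vectors are then compared with an SSIM-style ratio. The kernel, the center choice, the regularizer `lam`, and `coefficient_similarity` are illustrative assumptions, not MS-ISSM's actual formulation; the authors' repository is the authority on that.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def gaussian_rbf(p: Vec, c: Vec, eps: float) -> float:
    """Gaussian kernel phi(r) = exp(-(eps * r)^2) with r = ||p - c||."""
    r2 = sum((a - b) ** 2 for a, b in zip(p, c))
    return math.exp(-(eps ** 2) * r2)


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_rbf_weights(points: List[Vec], values: List[float], centers: List[Vec],
                    eps: float = 1.0, lam: float = 1e-6) -> List[float]:
    """Ridge least squares: minimize ||Phi w - f||^2 + lam * ||w||^2."""
    phi = [[gaussian_rbf(p, c, eps) for c in centers] for p in points]
    m = len(centers)
    # Normal equations: (Phi^T Phi + lam I) w = Phi^T f.
    A = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * f for row, f in zip(phi, values)) for i in range(m)]
    return solve(A, rhs)


def rbf_eval(w: List[float], centers: List[Vec], p: Vec, eps: float = 1.0) -> float:
    """Evaluate the fitted implicit function at an arbitrary position p."""
    return sum(wi * gaussian_rbf(p, c, eps) for wi, c in zip(w, centers))


def coefficient_similarity(w_ref: List[float], w_dis: List[float], c: float = 1e-8) -> float:
    """SSIM-style ratio in [-1, 1]; equals 1 when both coefficient vectors match."""
    dot = sum(a * b for a, b in zip(w_ref, w_dis))
    norm = sum(a * a for a in w_ref) + sum(b * b for b in w_dis)
    return (2.0 * dot + c) / (norm + c)
```

Because each neighborhood is fitted independently against shared centers, the distorted neighborhood may contain a different number of points than the reference, and no point-to-point correspondence is ever formed; `values` could be luma, chroma, or a geometric quantity.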

2. 3D Reconstruction (2 papers)

2503.09439 2026-06-18 cs.CV Updated version Topic score 85

SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation

Qijian Zhang, Xiaozheng Jian, Xuan Zhang, Wenping Wang, Junhui Hou

Affiliations: Tencent Games, China; Department of Computer Science & Engineering, Texas A&M University; Department of Computer Science, City University of Hong Kong

Topic match: 3D Reconstruction: a 3D geometry super-resolution pipeline that adds texture-consistent surface details.

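AI summary: Proposes SuperCarver, a 3D geometry super-resolution pipeline that uses a prior-guided normal diffusion model and noise-resistant inverse rendering to add texture-consistent surface details to coarse meshes, enabling high-fidelity detail generation.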

Comments Accepted in IEEE TVCG

Abstract

The conventional production workflow for high-precision mesh assets necessitates a cumbersome and laborious process of manual sculpting by specialized 3D artists/modelers. Recent years have witnessed remarkable advances in AI-empowered 3D content creation for generating plausible structures and intricate appearances from images or text prompts. However, synthesizing realistic surface details still poses great challenges, and enhancing the geometric fidelity of existing lower-quality 3D meshes (as opposed to image/text-to-3D generation) remains an open problem. In this paper, we introduce SuperCarver, a 3D geometry super-resolution pipeline for supplementing texture-consistent surface details onto a given coarse mesh. We start by rendering the original textured mesh into the image domain from multiple viewpoints. To achieve detail boosting, we construct a deterministic prior-guided normal diffusion model, which is fine-tuned on a carefully curated dataset of paired detail-lacking and detail-rich normal map renderings. To update mesh surfaces from potentially imperfect normal map predictions, we design a noise-resistant inverse rendering scheme through a deformable distance field. Experiments demonstrate that SuperCarver is capable of generating realistic and expressive surface details depicted by the actual texture appearance, making it a powerful tool both to upgrade historical low-quality 3D assets and to reduce the workload of sculpting high-poly meshes.
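The pipeline above operates on normal maps rendered from multiple viewpoints. A common convention for storing such maps as 8-bit RGB images maps each component of a unit normal from [-1, 1] to [0, 255]; the sketch below shows that encoding, its inverse, and an angular error between normals (a typical way to compare a predicted normal with a target). It assumes this generic convention and is not taken from SuperCarver, whose rendering space and encoding the abstract does not specify.

```python
import math
from typing import Sequence, Tuple

Normal = Tuple[float, float, float]
RGB = Tuple[int, int, int]


def normalize(n: Sequence[float]) -> Normal:
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (n[0] / length, n[1] / length, n[2] / length)


def encode_normal(n: Sequence[float]) -> RGB:
    """Map each unit-normal component from [-1, 1] to an 8-bit value in [0, 255]."""
    x, y, z = normalize(n)
    return (round((x + 1.0) * 127.5), round((y + 1.0) * 127.5), round((z + 1.0) * 127.5))


def decode_normal(rgb: Sequence[int]) -> Normal:
    """Invert the encoding, then re-normalize to absorb quantization error."""
    return normalize([v / 127.5 - 1.0 for v in rgb])


def angular_error_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two normals in degrees, e.g. predicted vs. target normal."""
    ua, ub = normalize(a), normalize(b)
    dot = sum(p * q for p, q in zip(ua, ub))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```

For example, `encode_normal((0.0, 0.0, 1.0))` returns `(128, 128, 255)`, the familiar light-blue color of a flat region; Python's round-half-to-even gives 128 for 127.5, whereas pipelines that truncate would store 127.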

2511.02036 2026-06-18 cs.RO Updated version Topic score 80

TurboMap: GPU-Accelerated Local Mapping for Visual SLAM

Parsa Hosseininejad, Kimia Khabiri, Shishir Gopinath, Soudabeh Mohammadhashemi, Karthik Dantu, Steven Y. Ko

Affiliations: Simon Fraser University; University at Buffalo

Topic match: 3D Reconstruction: GPU-accelerated local mapping for visual SLAM

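AI summary: To reduce local mapping latency in visual SLAM, the paper proposes TurboMap, a backend that combines GPU parallelization with CPU optimization. By restructuring map point creation, map point fusion, and keyframe management, it achieves 1.3x to 1.6x speedups while maintaining accuracy.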

Comments Accepted for presentation at IROS 2026, preprint

Abstract

In real-time Visual SLAM systems, local mapping must operate under strict latency constraints, as delays degrade map quality and increase the risk of tracking failure. GPU parallelization offers a promising way to reduce latency. However, parallelizing local mapping is challenging due to synchronized shared-state updates and the overhead of transferring large map data structures to the GPU. This paper presents TurboMap, a GPU-parallelized and CPU-optimized local mapping backend that holistically addresses these challenges. We restructure Map Point Creation to enable parallel Keypoint Correspondence Search on the GPU, redesign and parallelize Map Point Fusion, optimize Redundant Keyframe Culling on the CPU, and integrate a fast GPU-based Local Bundle Adjustment solver. To minimize data transfer and synchronization costs, we introduce persistent GPU-resident keyframe storage. Experiments on the EuRoC and TUM-VI datasets show average local mapping speedups of 1.3x and 1.6x, respectively, while preserving accuracy.
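Keypoint correspondence search suits the GPU because each query keypoint's search is independent of the others. The sketch below assumes ORB-style binary descriptors stored as Python integers (the abstract does not say which descriptors TurboMap uses) and performs brute-force Hamming matching with a hypothetical distance threshold and ratio test. `ThreadPoolExecutor` only stands in for the one-thread-per-query layout a GPU kernel would use; Python threads will not actually speed up this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def best_match(query: int, candidates: List[int], max_dist: int = 50,
               ratio: float = 0.8) -> Optional[int]:
    """Index of the nearest candidate, or None if it fails the distance or ratio test."""
    best, second = float("inf"), float("inf")
    best_idx: Optional[int] = None
    for idx, cand in enumerate(candidates):
        d = hamming(query, cand)
        if d < best:
            best, second, best_idx = d, best, idx
        elif d < second:
            second = d
    # Reject weak matches and ambiguous ones (best not clearly better than runner-up).
    if best_idx is None or best > max_dist or best >= ratio * second:
        return None
    return best_idx


def match_all(queries: List[int], candidates: List[int],
              workers: int = 4) -> List[Optional[int]]:
    """Every query is searched independently, so the work maps onto parallel workers.

    On a GPU this would be one thread (or warp) per query; Python threads here
    only illustrate the data-parallel structure.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: best_match(q, candidates), queries))
```

The harder parts the abstract highlights, such as synchronized shared-state updates during map point fusion and keeping keyframe data resident on the GPU, are deliberately left out of this sketch.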

3. Other 3D Vision (1 paper)

2510.16486 2026-06-18 cs.GR Updated version Topic score 70

Region-Aware Wasserstein Distances of Persistence Diagrams and Merge Trees

Mathieu Pont, Christoph Garth

Topic match: Other 3D Vision: topological data analysis, only weakly related to 3D vision

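AI summary: Proposes a generalization of the Wasserstein distance that exploits the regions of topological features in the input domain. By redefining feature comparison as a distance between extrema-aligned regions, it achieves greater discriminative power, and it is applied to tracking in time-varying ensembles and to dimensionality-reduction visualization.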

Abstract

This paper presents a generalization of the Wasserstein distance for both 0th persistence diagrams and merge trees [21], [68] that takes advantage of the regions of their topological features in the input domain. Specifically, we redefine the comparison of topological features as a distance between the values of their extrema-aligned regions. This results in a more discriminative metric than the classical Wasserstein distance, which it generalizes through an input parameter adjusting the impact of the region properties on the distance. We present two strategies to control both the computation time and the memory storage of our method: enabling the use of subsets of the regions in the computation, and compressing the regions' properties to obtain low-memory representations. Extensive experiments on openly available ensemble data demonstrate the efficiency of our method, with running times on the order of minutes on average. We show the utility of our contributions with two applications. First, we use the assignments between topological features provided by our method to track their evolution in time-varying ensembles, and we propose temporal persistence curves to facilitate understanding of how these features appear, disappear, and change over time. Second, our method allows computing a distance matrix of an ensemble that can be used for dimensionality reduction and to visually represent all its members in 2D; we show that such distance matrices also allow detecting key phases in the ensemble. Finally, we provide a C++ implementation that can be used to reproduce our results.
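For context, the classical p-Wasserstein distance between two persistence diagrams, which the paper generalizes, can be computed exactly for tiny diagrams by brute force. Each diagram is padded with diagonal slots; every point is either matched to a point of the other diagram (here with the L-infinity ground distance, a common choice) or sent to the diagonal at a cost of half its persistence; the cheapest bijection wins. This is only the baseline, not the region-aware variant: per the abstract, the paper changes how two features are compared, which would correspond to the point-to-point branch of `pair_cost` here. Factorial runtime limits this sketch to a handful of points; practical implementations use assignment solvers such as the Hungarian or auction algorithm.

```python
import itertools
import math
from typing import List, Optional, Tuple

Pair = Tuple[float, float]  # (birth, death) of one topological feature


def to_diagonal(x: Pair) -> float:
    """L-infinity distance from (birth, death) to the diagonal: half the persistence."""
    return abs(x[1] - x[0]) / 2.0


def pair_cost(x: Optional[Pair], y: Optional[Pair], p: float) -> float:
    """Ground cost of matching x to y; None stands for a slot on the diagonal."""
    if x is None and y is None:
        return 0.0  # diagonal-to-diagonal matches are free
    if x is None:
        return to_diagonal(y) ** p
    if y is None:
        return to_diagonal(x) ** p
    return max(abs(x[0] - y[0]), abs(x[1] - y[1])) ** p


def wasserstein(d1: List[Pair], d2: List[Pair], p: float = 2.0) -> float:
    """Exact W_p by brute force over all bijections of the padded diagrams."""
    a: List[Optional[Pair]] = list(d1) + [None] * len(d2)
    b: List[Optional[Pair]] = list(d2) + [None] * len(d1)
    best = math.inf
    for perm in itertools.permutations(range(len(b))):
        best = min(best, sum(pair_cost(a[i], b[j], p) for i, j in enumerate(perm)))
    return best ** (1.0 / p)
```

For example, `wasserstein([(0.0, 4.0)], [(0.0, 3.0), (1.0, 1.5)])` matches (0, 4) to (0, 3) at cost 1 and sends (1, 1.5) to the diagonal at cost 0.25, giving sqrt(1 + 0.0625), about 1.0308.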