3D Vision - arXivDaily Topic

2605.17131 2026-06-18 cs.CV cs.AI cs.LG Updated version 95%

A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

Minhas Kamal, Hiranya Garbha Kumar, Balakrishnan Prabhakaran

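Affiliations: State University of New York at Albany

Topic match (point cloud): a systematic survey of deep learning architectures for point cloud classification and segmentation.

AI summary: This paper systematically surveys deep learning architectures for point cloud classification and segmentation. It analyzes the structural characteristics of point cloud data, categorizes existing work by architecture, evaluates performance on mainstream benchmarks, and outlines open challenges and future directions.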

Comments: We review a decade of advances in point cloud processing: we trace the evolution of the field from its foundational roots to the modern SOTA, analyze how diverse architectures overcome the inherent geometric challenges of 3D data, and map out critical research gaps alongside promising future directions. GitHub: https://github.com/MinhasKamal/DeepLearningForPointCloud

Journal ref: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2026

DOI: 10.1145/3815180

Abstract

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
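
The permutation-invariant processing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fixed, hand-written per-point map standing in for a learned shared MLP; the names `point_features` and `global_feature` are hypothetical and do not come from the survey. Pooling uses a channel-wise max, which is a symmetric function, so reordering the points should not change the result.

```python
import random


def point_features(p):
    """Shared per-point map (hypothetical stand-in for a learned MLP)."""
    x, y, z = p
    return (x + y + z, x * y, max(x, y, z), x * x + y * y + z * z)


def global_feature(points):
    """Channel-wise max over per-point features.

    Max is a symmetric function, so the output does not depend on
    the order in which the points are listed.
    """
    feats = [point_features(p) for p in points]
    return tuple(max(channel) for channel in zip(*feats))


# A tiny unordered point cloud and a shuffled copy of it.
cloud = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.25), (-1.0, 0.5, 3.0), (2.0, 2.0, -1.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

# Both orderings yield the same global descriptor.
same = global_feature(cloud) == global_feature(shuffled)
print(same)
```

Real architectures learn the per-point map and often stack several such stages; the sketch only shows why a symmetric pooling step removes the dependence on point order.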

2606.18583 2026-06-18 cs.CV cs.RO New submission 85%

Aerial-ground LiDAR place recognition with patch-level self-supervised learning and expanded reciprocal re-ranking

Yandi Yang, Xianghong Zou, Jianping Li, Haofeng Xie, Saurav Uprety, Hongzhou Yang, Naser El-Sheimy

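Affiliations: University of Calgary; Nanchang University; Nanyang Technological University; Wuhan University

Topic match (point cloud): an aerial-ground LiDAR place recognition framework with re-ranking for point cloud retrieval.

AI summary: Proposes an aerial-ground LiDAR place recognition framework that narrows the domain gap through multi-scale patch-level self-supervised learning and reduces false positives with an expanded reciprocal re-ranking algorithm, improving retrieval accuracy on the CS-Urban-Scenes and CS-Campus3D datasets.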

Abstract

LiDAR place recognition determines one's position on a prior point cloud map. The most studied ground-level LiDAR place recognition suffers from pre-visit requirements, incomplete coverage, and limited perspectives. Using pre-acquired, full-coverage Airborne Laser Scanning (ALS) data as an aerial prior map overcomes these drawbacks, making cross-view place recognition necessary and advantageous. However, aerial-ground LiDAR place recognition faces significant challenges, including the domain gap between aerial and ground point clouds, and false positives during initial retrieval. To address these challenges, we present a novel retrieval and re-ranking framework for aerial-ground LiDAR place recognition. Based on the priors that neighboring point cloud patches share similar semantics with anchor patch, our retrieval network introduces patch-level self-supervised learning modules at multiple scales and integrates with scene-level learning to improve global feature discriminativeness between aerial and ground point clouds. Furthermore, leveraging the structured spatial distribution of ALS point clouds, we introduce an Expanded Reciprocal (ER) re-ranking algorithm to exploit neighborhood information maximally and refine each feature based on neighbor features, which are then used to update the similarity matrix for final ranking. Extensive experiments demonstrate that our retrieval network outperforms existing state-of-the-art (SOTA) methods, achieving a 9.8% improvement in average Recall@1 and a 3.2% improvement in average Recall@1% on the CS-Urban-Scenes, while also showing the best performance on the CS-Campus3D dataset. Additionally, our ER re-ranking algorithm further boosts the average Recall@1 by 4.9% on CS-Campus3D and 10.2% on CS-Urban-Scenes without additional training.
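
The Recall@1 and Recall@1% figures quoted in the abstract can be made concrete with a small sketch of how such retrieval metrics are commonly computed. This is an illustration under stated assumptions, not the paper's evaluation code: `recall_at_k` and `one_percent_k` are hypothetical helper names, the toy rankings are made up, and the Recall@1% cutoff (about 1% of the database size, at least 1) is an assumed convention.

```python
def recall_at_k(ranked_ids, positives, k):
    """Fraction of queries with at least one true match in the top k.

    ranked_ids: one list per query, database ids sorted by similarity.
    positives:  one set per query, ids that count as correct places.
    """
    hits = sum(
        1
        for ranking, pos in zip(ranked_ids, positives)
        if any(candidate in pos for candidate in ranking[:k])
    )
    return hits / len(ranked_ids)


def one_percent_k(db_size):
    """Cutoff for Recall@1% (assumed convention: ~1% of database, min 1)."""
    return max(1, round(db_size / 100))


# Toy example: three queries against a four-entry database.
ranked = [[3, 1, 2], [0, 2, 1], [2, 0, 3]]
positives = [{1}, {0}, {3}]

print(recall_at_k(ranked, positives, 1))  # only the second query hits at rank 1
print(recall_at_k(ranked, positives, 3))  # every query has a match in its top 3
```

A re-ranking step such as the one described above changes the order inside each ranking list, so its effect shows up directly in these numbers without retraining the retrieval network.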

2601.01200 2026-06-18 cs.CV eess.IV Updated version 85%

Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity

Zhang Chen, Shuai Wan, Yuezhe Zhang, Siyu Ren, Fuzheng Yang, Junhui Hou

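Affiliations: School of Electronics and Information, Northwestern Polytechnical University; Department of Computer Science, City University of Hong Kong; School of Telecommunication Engineering, Xidian University

Topic match (point cloud): point cloud quality assessment via multi-scale implicit structural similarity.

AI summary: To address the difficulty of matching irregular data in point cloud quality assessment, the paper proposes the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM), which represents local features continuously with radial basis functions and compares implicit function coefficients. Combined with a ResGrouped-MLP network, it outperforms existing metrics on multiple benchmarks.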

Comments: Accepted to IEEE TMM

Abstract

The unstructured and irregular nature of points poses a significant challenge for accurate point cloud quality assessment (PCQA), particularly in establishing accurate perceptual feature correspondence. To tackle this, we propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM). Unlike traditional point-to-point matching, MS-ISSM utilizes radial basis function (RBF) to represent local features continuously, transforming distortion measurement into a comparison of implicit function coefficients. This approach effectively circumvents matching errors inherent in irregular data. Additionally, we propose a ResGrouped-MLP quality assessment network, which robustly maps multi-scale feature differences to perceptual scores. The network architecture departs from traditional flat multi-layer perceptron (MLP) by adopting a grouped encoding strategy integrated with residual blocks and channel-wise attention mechanisms. This hierarchical design allows the model to preserve the distinct physical semantics of luma, chroma, and geometry while adaptively focusing on the most salient distortion features across High, Medium, and Low scales. Experimental results on multiple benchmarks demonstrate that MS-ISSM outperforms state-of-the-art metrics in both reliability and generalization. The source code is available at: https://github.com/ZhangChen2022/MS-ISSM.
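
The idea of comparing implicit function coefficients instead of matching points can be sketched in one dimension. This is a toy illustration and not the MS-ISSM implementation: the Gaussian kernel, the shared centers, the ridge term, the least-squares fit, and all function names are assumptions made for the example. Each signal is sampled at its own irregular positions, fitted on the same RBF centers, and the two coefficient vectors are then compared directly.

```python
import math


def gaussian_rbf(r, eps=1.0):
    """Gaussian radial basis function of the distance r."""
    return math.exp(-((eps * r) ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rbf(xs, values, centers, ridge=1e-9):
    """Least-squares RBF coefficients on shared centers (normal equations)."""
    phi = [[gaussian_rbf(abs(x - c)) for c in centers] for x in xs]
    k = len(centers)
    ata = [
        [sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    atb = [sum(row[i] * v for row, v in zip(phi, values)) for i in range(k)]
    return solve(ata, atb)


def coefficient_distance(w_ref, w_dist):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_ref, w_dist)))


centers = [0.0, 1.0, 2.0, 3.0]
xs_ref = [0.0, 0.4, 1.1, 1.9, 2.5, 3.0]   # reference sample positions
xs_dist = [0.2, 0.7, 1.4, 2.2, 2.8]       # distorted samples, different positions
w_ref = fit_rbf(xs_ref, [math.sin(x) for x in xs_ref], centers)
w_dist = fit_rbf(xs_dist, [math.sin(x) + 0.3 for x in xs_dist], centers)
score = coefficient_distance(w_ref, w_dist)
print(score)
```

Because both fits share the same centers, the coefficient vectors have the same length and meaning even though the two sample sets never line up point by point, which is the property the abstract relies on.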

2606.18948 2026-06-18 cs.RO New submission 75%

C-ARC: Continuous-Adaptive Range Clustering for Non-Repetitive LiDAR Sensors

Nick B. Schroeder, Jonathan Lichtenfeld, Oskar von Stryk

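Affiliations: Technical University of Darmstadt; Simulation, Systems Optimization and Robotics Group

Topic match (point cloud): clustering of point clouds from non-repetitive LiDAR sensors, which falls under 3D vision.

AI summary: Proposes the C-ARC framework, which decouples high-frequency point insertion from on-demand cluster retrieval through a persistent dual-graph over a sliding window and calibrates grid resolution adaptively with an exponential control loop, enabling real-time clustering of non-repetitive LiDAR point clouds.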

Comments: Submitted to IEEE Robotics and Automation Letters. This work has been submitted to the IEEE for possible publication. 8 pages, 7 figures

Abstract

Real-time LiDAR clustering identifies structures in point clouds, which is an essential prerequisite for many mobile robotics algorithms. Current methods are mostly developed for repetitive mechanical LiDAR sensors. Recently, the use of non-repetitive LiDAR sensors is strongly increasing due to their small cost and form factor. Such non-repetitive Risley prism-based sensors violate two key assumptions of repetitive mechanical sensors: structured scan lines and well-defined frame boundaries. Their Rhodonea-curve trajectories produce non-uniform point distributions, and the absence of a rotation cycle renders conventional scan line indexing inapplicable. To meet such new requirements, we developed C-ARC, a Continuous-Adaptive Range Clustering framework that maintains a persistent dual-graph over a sliding window, decoupling high-frequency point insertion from on-demand cluster retrieval. This is crucial for key functionalities like SLAM or tracking. An adaptive range grid resolution mechanism calibrates grid dimensions at initialization using an exponential control loop, balancing the sparsity-collision trade-off without prior knowledge of the scanning pattern. Implemented as an open-sourced single-threaded C++17 library, C-ARC produces real-time cluster output at 20 Hz on commodity hardware for the Livox Mid-360. Evaluation on the Livox Avia identifies unbounded cell occupancy as the primary limitation for sensors with strongly concentrated scan patterns. The adaptive resolution mechanism additionally improves clustering quality for existing grid-based methods on non-repetitive data.
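
The sparsity-collision trade-off behind adaptive grid resolution can be illustrated with a rough sketch. It is not the C-ARC algorithm, which is a C++17 library whose details are not given here: the 2D grid, the occupancy target band, the doubling and halving update, and the names `occupancy` and `calibrate_cell_size` are all assumptions. The loop rescales the cell size multiplicatively until the mean number of points per occupied cell falls inside the target band.

```python
import math


def occupancy(points, cell):
    """Mean number of points per occupied grid cell at a given cell size."""
    cells = {}
    for x, y in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        cells[key] = cells.get(key, 0) + 1
    return len(points) / len(cells)


def calibrate_cell_size(points, cell=1.0, target=(2.0, 4.0), factor=2.0,
                        max_iters=32):
    """Multiplicative (exponential) search for a usable cell size.

    Cells that are too small leave almost every point alone (sparsity);
    cells that are too large merge unrelated points (collisions).
    """
    low, high = target
    for _ in range(max_iters):
        occ = occupancy(points, cell)
        if occ < low:
            cell *= factor   # too sparse: grow the cells
        elif occ > high:
            cell /= factor   # too many collisions: shrink the cells
        else:
            break            # occupancy is inside the target band
    return cell


# Regular 16 x 16 test pattern with unit spacing.
points = [(i + 0.5, j + 0.5) for i in range(16) for j in range(16)]
print(calibrate_cell_size(points, cell=16.0))   # starts too coarse
print(calibrate_cell_size(points, cell=0.125))  # starts too fine
```

The iteration cap guards against oscillation when no power-of-`factor` cell size lands inside the band; a real system would likely need a finer update rule for strongly non-uniform scan patterns.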
