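arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.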

2603.21936 2026-03-27 cs.CV

Cross-Instance Gaussian Splatting Registration via Geometry-Aware Feature-Guided Alignment

Roy Amoyal, Oren Freifeld, Chaim Baskin

Comments Accepted to CVPR 2026

Details

English abstract

We present Gaussian Splatting Alignment (GSA), a novel method for aligning two independent 3D Gaussian Splatting (3DGS) models via a similarity transformation (rotation, translation, and scale), even when they are of different objects in the same category (e.g., different cars). In contrast, existing methods can only align 3DGS models of the same object (e.g., the same car) and often must be given true scale as input, while we estimate it successfully. GSA leverages viewpoint-guided spherical map features to obtain robust correspondences and introduces a two-step optimization framework that aligns 3DGS models while keeping them fixed. First, we apply an iterative feature-guided absolute orientation solver as our coarse registration, which is robust to poor initialization (e.g., 180 degrees misalignment or a 10x scale gap). Next, we use a fine registration step that enforces multi-view feature consistency, inspired by inverse radiance-field formulations. The first step already achieves state-of-the-art performance, and the second further improves results. In the same-object case, GSA outperforms prior works, often by a large margin, even when the other methods are given the true scale. In the harder case of different objects in the same category, GSA vastly surpasses them, providing the first effective solution for category-level 3DGS registration and unlocking new applications. Project webpage: https://bgu-cs-vil.github.io/GSA-project/
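The coarse step in the abstract above is described as an iterative, feature-guided absolute orientation solver. As a rough illustration of the underlying absolute orientation problem only, the sketch below uses the classical closed-form least-squares solution (Umeyama-style) for a similarity transform between two point sets with known correspondences. It is not the GSA implementation: the function name, the inputs, and the assumption that correspondences are already given are all hypothetical, and the paper's feature-guided matching and iteration are not reproduced here.

```python
import numpy as np

def similarity_from_correspondences(src, dst):
    """Closed-form least-squares similarity transform (Umeyama-style).

    Finds scale s, rotation R and translation t minimizing
    sum_i || dst_i - (s * R @ src_i + t) ||^2.
    src, dst: (N, 3) arrays of matched points (hypothetical inputs,
    e.g. Gaussian centers paired by some feature matcher).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Reflection guard: force det(R) = +1 so R is a proper rotation.
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt

    # Scale is the ratio of aligned covariance to source variance.
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

With noise-free correspondences this recovers the transform exactly, including cases like the 180-degree misalignment or 10x scale gap mentioned in the abstract; with noisy or partly wrong matches, an iterative scheme that re-estimates correspondences would be needed on top of it.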

2603.19961 2026-03-27 cs.CV

Cov2Pose: Leveraging Spatial Covariance for Direct Manifold-aware 6-DoF Object Pose Estimation

Nassim Ali Ousalah, Peyman Rostami, Vincent Gaudillière, Emmanuel Koumandakis, Anis Kacem, Enjie Ghorbel, Djamila Aouada

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2603.19844 2026-03-27 cs.CV

Hyper-Connections for Adaptive Multi-Modal MRI Brain Tumor Segmentation

Lokendra Kumar, Shubham Aggarwal

Comments 29 pages, 6 tables, 17 figures

2603.19682 2026-03-27 cs.CV

3D Gaussian Splatting with Self-Constrained Priors for High Fidelity Surface Reconstruction

Takeshi Noda, Yu-Shen Liu, Zhizhong Han

Comments Accepted by CVPR 2026. Project page: https://takeshie.github.io/GSPrior

2603.19516 2026-03-27 cs.CV cs.AI

Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis

Sheng Lu, Hao Chen, Rui Yin, Juyan Ba, Yu Zhang, Yuanzhe Li

Comments Computer Vision and Pattern Recognition 2026

2603.19091 2026-03-27 cs.LG

Position: Spectral GNNs Are Neither Spectral Nor Superior for Node Classification

Qin Jiang, Chengjia Wang, Michael Lones, Dongdong Chen, Wei Pang

2603.16861 2026-03-27 cs.RO

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation

Abhay Deshpande, Maya Guru, Rose Hendrix, Snehal Jauhri, Ainaz Eftekhar, Rohun Tripathi, Max Argus, Jordi Salvador, Haoquan Fang, Matthew Wallingford, Wilbert Pumacay, Yejin Kim, Quinn Pfeifer, Ying-Chun Lee, Piper Wolters, Omar Rayyan, Mingtong Zhang, Jiafei Duan, Karen Farley, Winson Han, Eli Vanderbilt, Dieter Fox, Ali Farhadi, Georgia Chalvatzaki, Dhruv Shah, Ranjay Krishna

2603.15132 2026-03-27 cs.CV

WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

Hainuo Wang, Mingjia Li, Xiaojie Guo

2603.14294 2026-03-27 cs.CV cs.AI cs.LG cs.RO

Seeking Physics in Diffusion Noise

Chujun Tang, Lei Zhong, Fangqiang Ding

Comments 32 pages, 8 figures, 10 tables

2603.13843 2026-03-27 cs.CV

MOGeo: Beyond One-to-One Cross-View Object Geo-localization

Bo Lv, Qingwang Zhang, Le Wu, Yuanyuan Li, Yingying Zhu

2603.13315 2026-03-27 cs.RO

Bi-HIL: Bilateral Control-Based Multimodal Hierarchical Imitation Learning via Subtask-Level Progress Rate and Keyframe Memory for Long-Horizon Contact-Rich Robotic Manipulation

Thanpimon Buamanee, Masato Kobayashi, Yuki Uranishi

2603.13133 2026-03-27 cs.RO

DecoVLN: Decoupling Observation, Reasoning, and Correction for Vision-and-Language Navigation

Zihao Xin, Wentong Li, Yixuan Jiang, Bin Wang, Runmin Cong, Jie Qin, Shengjun Huang

Comments 16 pages, 8 figures, CVPR 2026

2603.11827 2026-03-27 cs.CV

Multimodal classification of Radiation-Induced Contrast Enhancements and tumor recurrence using deep learning

Robin Peretzke, Marlin Hanstein, Maximilian Fischer, Lars Badhi Wessel, Obada Alhalabi, Sebastian Regnery, Andreas Kudak, Maximilian Deng, Tanja Eichkorn, Philipp Hoegen Saßmannshausen, Fabian Allmendinger, Jan-Hendrik Bolten, Philipp Schröter, Christine Jungk, Jürgen Peter Debus, Peter Neher, Laila König, Klaus Maier-Hein

2603.11687 2026-03-27 cs.CL cs.AI

SemBench: A Universal Semantic Framework for LLM Evaluation

Mikel Zubillaga, Naiara Perez, Oscar Sainz, German Rigau

Comments Accepted at LREC 2026

2603.06663 2026-03-27 cs.CV cs.AI

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

Giacomo Frisoni, Lorenzo Molfetta, Mattia Buzzoni, Gianluca Moro

Comments Please cite the definitive, copyrighted, and peer-reviewed version of this article published in AAAI 2026, edited by Sven Koenig et al., AAAI Press, Vol. 40, No. 36, Technical Track, pp. 30726-30734, 2026. DOI: https://doi.org/10.1609/aaai.v40i36.40329

2603.05181 2026-03-27 cs.CV

Mario: Multimodal Graph Reasoning with Large Language Models

Yuanfu Sun, Kang Li, Pengkang Guo, Jiajin Liu, Qiaoyu Tan

Comments CVPR 2026

2603.05042 2026-03-27 cs.CV cs.RO

CoIn3D: Revisiting Configuration-Invariant Multi-Camera 3D Object Detection

Zhaonian Kuang, Rui Ding, Haotian Wang, Xinhu Zheng, Meng Yang, Gang Hua

Comments Accepted to CVPR 2026 main track

2603.04595 2026-03-27 cs.LG

A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments

Mohammed Omer Shakeel Ahmed

Comments 6 pages, 1 figure, 1 table. Accepted for publication in the 2025 IEEE International Conference on Future Machine Learning and Data Science (FMLDS)

Details

DOI: 10.1109/FMLDS67896.2025.00021
Journal ref: 2025 IEEE International Conference on Future Machine Learning and Data Science (FMLDS)

English abstract

Duplicate records pose significant challenges in customer relationship management (CRM) and healthcare, often leading to inaccuracies in analytics, impaired user experiences, and compliance risks. Traditional deduplication methods rely heavily on direct identifiers such as names, emails, or Social Security Numbers (SSNs), making them ineffective under strict privacy regulations like GDPR and HIPAA, where such personally identifiable information (PII) is restricted or masked. In this research, I propose a novel, scalable, multimodal AI framework for detecting duplicates without depending on sensitive information. This system leverages three distinct modalities: semantic embeddings derived from textual fields (names, cities) using pre-trained DistilBERT models, behavioral patterns extracted from user login timestamps, and device metadata encoded through categorical embeddings. These heterogeneous modalities are combined using a late fusion approach and clustered via DBSCAN, an unsupervised density-based algorithm. This proposed model is evaluated against a traditional string-matching baseline on a synthetic CRM dataset specifically designed to reflect privacy-preserving constraints. The multimodal framework demonstrated good performance, achieving a good F1-score by effectively identifying duplicates despite variations and noise inherent in the data. This approach offers a privacy-compliant solution to entity resolution and supports secure digital infrastructure, enhances the reliability of public health analytics, and promotes ethical AI adoption across government and enterprise settings. It is well-suited for integration into national health data modernization efforts, aligning with broader goals of privacy-first innovation.
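The pipeline in the abstract above (per-modality representations, late fusion, DBSCAN clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the fusion rule, so a weighted sum of per-modality Euclidean distances is assumed here, and the modality names, weights, toy vectors, `eps`, and `min_pts` values are all hypothetical. The DBSCAN routine is a small standard-library version written for the example.

```python
from math import sqrt

def fused_distance(a, b, weights):
    """Late fusion (assumed form): one distance per modality, then a weighted sum."""
    total = 0.0
    for name, w in weights.items():
        d = sqrt(sum((x - y) ** 2 for x, y in zip(a[name], b[name])))
        total += w * d
    return total

def dbscan(n, dist, eps, min_pts):
    """Minimal DBSCAN over indices 0..n-1; returns labels, with -1 meaning noise."""
    labels = [None] * n

    def neighbors(i):
        # Includes i itself, since dist(i, i) is 0.
        return [j for j in range(n) if dist(i, j) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, so keep expanding
    return labels

# Toy records with hypothetical per-modality vectors (no direct identifiers).
records = [
    {"text": [0.90, 0.10], "behavior": [0.20], "device": [1.0, 0.0]},
    {"text": [0.88, 0.12], "behavior": [0.25], "device": [1.0, 0.0]},
    {"text": [0.10, 0.90], "behavior": [0.80], "device": [0.0, 1.0]},
]
weights = {"text": 0.5, "behavior": 0.3, "device": 0.2}

labels = dbscan(
    len(records),
    lambda i, j: fused_distance(records[i], records[j], weights),
    eps=0.1,
    min_pts=2,
)
print(labels)  # [0, 0, -1]
```

Records that share a cluster label are duplicate candidates; a label of -1 marks a record with no sufficiently close neighbor. In practice the weights and `eps` would need tuning against labeled or synthetic duplicates.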

2603.03904 2026-03-27 cs.CV

Architecture and evaluation protocol for transformer-based visual object tracking in UAV applications

Augustin Borne, Pierre Notin, Christophe Hennequin, Sebastien Changey, Stephane Bazeille, Christophe Cudel, Franz Quint

2603.01010 2026-03-27 cs.CV

GeodesicNVS: Probability Density Geodesic Flow Matching for Novel View Synthesis

Xuqin Wang, Tao Wu, Yanfeng Zhang, Lu Liu, Mingwei Sun, Yongliang Wang, Niclas Zeller, Daniel Cremers

Comments Accepted by CVPR 2026; Project Page see https://xuqinwang.github.io/geodesicNVS.github.io/

2603.00141 2026-03-27 cs.CV cs.AI cs.LG eess.IV

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Xiangyan Qu, Zhenlong Yuan, Jing Tang, Rui Chen, Datao Tang, Meng Yu, Lei Sun, Yancheng Bai, Xiangxiang Chu, Gaopeng Gou, Gang Xiong, Yujun Cai

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2602.23783 2026-03-27 cs.CV

Diffusion Probe: Generated Image Result Prediction Using CNN Probes

Benlei Cui, Bukun Huang, Zhizeng Ye, Xuemei Dong, Tuo Chen, Hui Xue, Dingkang Yang, Longtao Huang, Jingqun Tang, Haiwen Hong

Comments CVPR 2026

2602.22752 2026-03-27 cs.CL cs.AI

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger

Comments 14 pages, 1 figure, 7 tables. Accepted to the 15th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) at EACL 2026, Rabat, Morocco

2602.22013 2026-03-27 cs.CV

RobustVisRAG: Causality-Aware Vision-Based Retrieval-Augmented Generation under Visual Degradations

I-Hsiang Chen, Yu-Wei Liu, Tse-Yu Wu, Yu-Chien Chiang, Jen-Chien Yang, Wei-Ting Chen

Comments Accepted by CVPR 2026; Project Page: https://robustvisrag.github.io

2602.20060 2026-03-27 cs.CV cs.RO

MeanFuser: Fast One-Step Multi-Modal Trajectory Generation and Adaptive Reconstruction via MeanFlow for End-to-End Autonomous Driving

Junli Wang, Yinan Zheng, Xueyi Liu, Zebin Xing, Pengfei Li, Guang Li, Kun Ma, Guang Chen, Hangjun Ye, Zhongpu Xia, Long Chen, Qichao Zhang

Comments Accepted by CVPR 2026

2602.09929 2026-03-27 cs.CV cs.AI

Monocular Normal Estimation via Shading Sequence Estimation

Zongrui Li, Xinhua Ma, Minghui Hu, Yunqing Zhao, Yingchen Yu, Qian Zheng, Chang Liu, Xudong Jiang, Song Bai

Comments ICLR 2026 (Oral), Project page: https://xinhua694.github.io/RoSE.github.io/

2602.08371 2026-03-27 cs.CL

ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts

Hung Quang Tran, Nam Tien Pham, Son T. Luu, Kiet Van Nguyen

Comments Accepted as main paper at EACL 2026

2602.07444 2026-03-27 cs.CV eess.SP

Perspective-aware fusion of incomplete depth maps and surface normals for accurate 3D reconstruction

Ondrej Hlinka, Georg Kaniak, Christian Kapeller

Comments submitted to IET Electronics Letters

2602.02193 2026-03-27 cs.CV

SSI-DM: Singularity Skipping Inversion of Diffusion Models

Chen Min, Enze Jiang, Jishen Peng, Zheng Ma

Comments A complete revision is needed

2601.19102 2026-03-27 cs.LG

OWLEYE: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection

Lecheng Zheng, Dongqi Fu, Zihao Li, Jingrui He

Comments Accepted by ICLR 2026