arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.26260 2026-03-30 cs.CV cs.AI

GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation

Xujing Tao, Chuxin Wang, Yubo Ai, Zhixin Cheng, Zhuoyuan Li, Liangsheng Liu, Yujia Chen, Xinjun Li, Qiao Li, Wenfei Yang, Tianzhu Zhang

Comments Accepted to CVPR 2026

详情

英文摘要

Open-vocabulary 3D semantic segmentation aims to segment arbitrary categories beyond the training set. Existing methods predominantly rely on distilling knowledge from 2D open-vocabulary models. However, aligning 3D features to the 2D representation space restricts intrinsic 3D geometric learning and inherits errors from 2D predictions. To address these limitations, we propose GeoGuide, a novel framework that leverages pretrained 3D models to integrate hierarchical geometry-semantic consistency for open-vocabulary 3D segmentation. Specifically, we introduce an Uncertainty-based Superpoint Distillation module to fuse geometric and semantic features for estimating per-point uncertainty, adaptively weighting 2D features within superpoints to suppress noise while preserving discriminative information to enhance local semantic consistency. Furthermore, our Instance-level Mask Reconstruction module leverages geometric priors to enforce semantic consistency within instances by reconstructing complete instance masks. Additionally, our Inter-Instance Relation Consistency module aligns geometric and semantic similarity matrices to calibrate cross-instance consistency for same-category objects, mitigating viewpoint-induced semantic drift. Extensive experiments on ScanNet v2, Matterport3D, and nuScenes demonstrate the superior performance of GeoGuide.

URL PDF HTML ☆

赞 0 踩 0

2603.26258 2026-03-30 cs.CV cs.AI cs.LG

ARTA: Adaptive Mixed-Resolution Token Allocation for Efficient Dense Feature Extraction

David Hagerman, Roman Naeem, Erik Brorsson, Fredrik Kahl, Lennart Svensson

2603.26254 2026-03-30 cs.LG

Improving Risk Stratification in Hypertrophic Cardiomyopathy: A Novel Score Combining Echocardiography, Clinical, and Medication Data

Marion Taconné, Valentina D. A. Corino, Annamaria Del Franco, Sara Giovani, Iacopo Olivotto, Adrien Al Wazzan, Erwan Donal, Pietro Cerveri, Luca Mainardi

2603.26253 2026-03-30 cs.CL

SocialX: A Modular Platform for Multi-Source Big Data Research in Indonesia

Muhammad Apriandito Arya Saputra, Andry Alamsyah, Dian Puteri Ramadhani, Thomhert Suprapto Siadari, Hanif Fakhrurroja

Comments 10 pages, 1 Figure, 4 Tables

2603.26250 2026-03-30 cs.CV

Real-Time Branch-to-Tool Distance Estimation for Autonomous UAV Pruning: Benchmarking Five DEFOM-Stereo Variants from Simulation to Jetson Deployment

Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

详情

英文摘要

Autonomous tree pruning with unmanned aerial vehicles (UAVs) is a safety-critical real-world task: the onboard perception system must estimate the metric distance from a cutting tool to thin tree branches in real time so that the UAV can approach, align, and actuate the pruner without collision. We address this problem by training five variants of DEFOM-Stereo - a recent foundation-model-based stereo matcher - on a task-specific synthetic dataset and deploying the checkpoints on an NVIDIA Jetson Orin Super 16 GB. The training corpus is built in Unreal Engine 5 with a simulated ZED Mini stereo camera capturing 5,520 stereo pairs across 115 tree instances from three viewpoints at 2m distance; dense EXR depth maps provide exact, spatially complete supervision for thin branches. On the synthetic test set, DEFOM-Stereo ViT-S achieves the best depth-domain accuracy (EPE 1.74 px, D1-all 5.81%, delta-1 95.90%, depth MAE 23.40 cm) but its Jetson inference speed of ~2.2 FPS (~450 ms per frame) remains too slow for responsive closed-loop tool control. A newly introduced balanced variant, DEFOM-PrunePlus (~21M backbone, ~3.3 FPS on Jetson), offers the best deployable accuracy-speed trade-off (EPE 5.87 px, depth MAE 64.26 cm, delta-1 87.59%): its frame rate is sufficient for real-time guidance and its depth accuracy supports safe branch approach planning at the 2m operating range. The lightweight DEFOM-PruneStereo (~6.9 FPS) and DEFOM-PruneNano (~8.5 FPS) run fast but sacrifice substantial accuracy (depth MAE > 57 cm), making estimates too unreliable for safe actuation. Zero-shot inference on real photographs confirms that full-capacity models preserve branch geometry, validating the sim-to-real transfer. We conclude that DEFOM-PrunePlus provides the most practical accuracy-latency balance for onboard distance estimation, while ViT-S serves as the reference for future hardware.

URL PDF HTML ☆

赞 0 踩 0

2603.26249 2026-03-30 cs.LG

Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems

Pascal Henrich, Jonas Sievers, Maximilian Beichter, Thomas Blank, Ralf Mikut, Veit Hagenmeyer

2603.26246 2026-03-30 cs.CL cs.AI cs.LG eess.AS

Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR

Shashi Kumar, Esaú Villatoro-Tello, Sergio Burdisso, Kadri Hacioglu, Thibault Bañeras-Roux, Hasindri Watawana, Dairazalia Sanchez-Cortes, Srikanth Madikeri, Petr Motlicek, Andreas Stolcke

Comments 11 pages

2603.26236 2026-03-30 cs.CL

A Universal Vibe? Finding and Controlling Language-Agnostic Informal Register with SAEs

Uri Z. Kialy, Avi Shtarkberg, Ayal Klein

2603.26235 2026-03-30 cs.CL

GS-BrainText: A Multi-Site Brain Imaging Report Dataset from Generation Scotland for Clinical Natural Language Processing Development and Validation

Beatrice Alex, Claire Grover, Arlene Casey, Richard Tobin, Heather Whalley, William Whiteley

Comments 11 pages, 1 figure

2603.26231 2026-03-30 cs.LG cs.PF math.OC math.PR

Optimization Trade-offs in Asynchronous Federated Learning: A Stochastic Networks Approach

Abdelkrim Alahyane, Céline Comte, Matthieu Jonckheere

2603.26211 2026-03-30 cs.CV cs.AI

Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding

Shrinidhi Kumbhar, Haofu Liao, Srikar Appalaraju, Kunwar Yashraj Singh

Comments Accepted to CVPR 2026

2603.26207 2026-03-30 cs.CL cs.AI

Sparse Auto-Encoders and Holism about Large Language Models

Jumbly Grindrod

2603.26206 2026-03-30 cs.CV cs.RO

4DRaL: Bridging 4D Radar with LiDAR for Place Recognition using Knowledge Distillation

Ningyuan Huang, Zhiheng Li, Zheng Fang

Comments Accepted by ICRA 2026

2603.26193 2026-03-30 cs.CV cs.AI

MemCam: Memory-Augmented Camera Control for Consistent Video Generation

Xinhang Gao, Junlin Guan, Shuhan Luo, Wenzhuo Li, Guanghuan Tan, Jiacheng Wang

Comments 6 pages, 3 figures, 3 tables, accepted by IJCNN 2026

2603.26192 2026-03-30 cs.CV

HAD: Heterogeneity-Aware Distillation for Lifelong Heterogeneous Learning

Xuerui Zhang, Xuehao Wang, Zhan Zhuang, Linglan Zhao, Ziyue Li, Xinmin Zhang, Zhihuan Song, Yu Zhang

2603.26190 2026-03-30 cs.CV cs.LG

Dual-Stage Invariant Continual Learning under Extreme Visual Sparsity

Rangya Zhang, Jiaping Xiao, Lu Bai, Yuhang Zhang, Mir Feroskhan

2603.26188 2026-03-30 cs.CV

OSA: Echocardiography Video Segmentation via Orthogonalized State Update and Anatomical Prior-aware Feature Enhancement

Rui Wang, Huisi Wu, Jing Qin

2603.26186 2026-03-30 cs.CV cs.AI

Progressive Learning with Anatomical Priors for Reliable Left Atrial Scar Segmentation from Late Gadolinium Enhancement MRI

Jing Zhang, Bastien Bergere, Emilie Bollache, Jonas Leite, Mikaël Laredo, Alban Redheuil, Nadjia Kachenoura

Comments 16 pages, 3 figures, 3 tables

2603.26183 2026-03-30 cs.CV

DUGAE: Unified Geometry and Attribute Enhancement via Spatiotemporal Correlations for G-PCC Compressed Dynamic Point Clouds

Pan Zhao, Hui Yuan, Chang Sun, Chongzhen Tian, Raouf Hamzaoui, Sam Kwong

2603.26181 2026-03-30 cs.CV

GLINT: Modeling Scene-Scale Transparency via Gaussian Radiance Transport

Youngju Na, Jaeseong Yun, Soohyun Ryu, Hyunsu Kim, Sung-Eui Yoon, Suyong Yeon

Comments CVPR 2026, Project page: https://youngju-na.github.io/GLINT

2603.26179 2026-03-30 cs.CV

Consistency Beyond Contrast: Enhancing Open-Vocabulary Object Detection Robustness via Contextual Consistency Learning

Bozhao Li, Shaocong Wu, Tong Shao, Senqiao Yang, Qiben Shan, Zhuotao Tian, Jingyong Su

2603.26178 2026-03-30 cs.LG

Geometric Evolution Graph Convolutional Networks: Enhancing Graph Representation Learning via Ricci Flow

Jicheng Ma, Yunyan Yang, Juan Zhao, Liang Zhao

2603.26177 2026-03-30 cs.LG

Can AI Scientist Agents Learn from Lab-in-the-Loop Feedback? Evidence from Iterative Perturbation Discovery

Gilles Wainrib, Barbara Bodinier, Haitem Dakhli, Josep Monserrat, Almudena Espin Perez, Sabrina Carpentier, Roberta Codato, John Klein

2603.26174 2026-03-30 cs.CV

CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions

Chonghuinan Wang, Zihan Chen, Yuxiang Wei, Tianyi Jiang, Xiaohe Wu, Fan Li, Wangmeng Zuo, Hongxun Yao

Comments Accepted by CVPR2026

2603.26168 2026-03-30 cs.CV

Provably Contractive and High-Quality Denoisers for Convergent Restoration

Shubhi Shukla, Pravin Nair

2603.26164 2026-03-30 cs.LG cs.CL

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Hao Liang, Zhengyang Zhao, Meiyi Qiang, Mingrui Chen, Lu Ma, Rongyi Yu, Hengyi Feng, Shixuan Sun, Zimo Meng, Xiaochen Ma, Xuanlin Yang, Qifeng Cai, Ruichuan An, Bohan Zeng, Zhen Hao Wong, Chengyu Shen, Runming He, Zhaoyang Han, Yaowei Zheng, Fangcheng Fu, Conghui He, Bin Cui, Zhiyu Li, Weinan E, Wentao Zhang

2603.26156 2026-03-30 cs.CL cs.CY

Clash of the models: Comparing performance of BERT-based variants for generic news frame detection

Vihang Jumle

2603.26154 2026-03-30 cs.CV

IP-Bench: Benchmark for Image Protection Methods in Image-to-Video Generation Scenarios

Xiaofeng Li, Leyi Sheng, Zhen Sun, Zongmin Zhang, Jiaheng Wei, Xinlei He

2603.26145 2026-03-30 cs.CV

Efficient Few-Shot Learning for Edge AI via Knowledge Distillation on MobileViT

Shuhei Tsuyuki, Reda Bensaid, Jérémy Morlier, Mathieu Léonardon, Naoya Onizawa, Vincent Gripon, Takahiro Hanyu

2603.26140 2026-03-30 cs.LG cs.AI

On the Complexity of Optimal Graph Rewiring for Oversmoothing and Oversquashing in Graph Neural Networks

Mostafa Haghir Chehreghani