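arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.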

2603.16538 2026-03-18 cs.CV

Rethinking Pose Refinement in 3D Gaussian Splatting under Pose Prior and Geometric Uncertainty

Mangyu Kong, Jaewon Lee, Seongwon Lee, Euntai Kim

Comments 17 pages, 11 figures, CVPR 2026

Abstract

3D Gaussian Splatting (3DGS) has recently emerged as a powerful scene representation and is increasingly used for visual localization and pose refinement. However, despite its high-quality differentiable rendering, the robustness of 3DGS-based pose refinement remains highly sensitive to both the initial camera pose and the reconstructed geometry. In this work, we take a closer look at these limitations and identify two major sources of uncertainty: (i) pose prior uncertainty, which often arises from regression or retrieval models that output a single deterministic estimate, and (ii) geometric uncertainty, caused by imperfections in the 3DGS reconstruction that propagate errors into PnP solvers. Such uncertainties can distort reprojection geometry and destabilize optimization, even when the rendered appearance still looks plausible. To address these uncertainties, we introduce a relocalization framework that combines Monte Carlo pose sampling with Fisher Information-based PnP optimization. Our method explicitly accounts for both pose and geometric uncertainty and requires no retraining or additional supervision. Across diverse indoor and outdoor benchmarks, our approach consistently improves localization accuracy and significantly increases stability under pose and depth noise.
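The idea of sampling around an uncertain pose prior, rather than trusting a single deterministic estimate, can be illustrated with a toy sketch. This is not the authors' method: the pose parameterization `(x, y, yaw)`, the function names, and the cost function are all hypothetical, and the Fisher Information-based PnP step described in the abstract is not modeled here.

```python
import random

def sample_pose_candidates(prior, sigma, n_samples, seed=0):
    """Draw Gaussian perturbations around a pose prior (hypothetical helper).

    The prior itself is kept as the first candidate so that the selected
    pose is never worse than the starting estimate under the given cost.
    """
    rng = random.Random(seed)
    candidates = [tuple(prior)]
    for _ in range(n_samples):
        candidates.append(tuple(rng.gauss(p, s) for p, s in zip(prior, sigma)))
    return candidates

def refine_by_sampling(prior, sigma, cost, n_samples=200, seed=0):
    """Return the sampled candidate with the lowest cost."""
    return min(sample_pose_candidates(prior, sigma, n_samples, seed), key=cost)

# Toy setup: the cost is the squared distance to an assumed "true" pose.
# In a real pipeline the cost would come from reprojection or rendering error.
true_pose = (1.0, 2.0, 0.1)

def toy_cost(pose):
    return sum((a - b) ** 2 for a, b in zip(pose, true_pose))

prior = (1.3, 1.8, 0.0)
best = refine_by_sampling(prior, sigma=(0.2, 0.2, 0.05), cost=toy_cost)
print(best, toy_cost(best))
```

Keeping the prior in the candidate set is a design choice of this sketch: it guarantees the selection step cannot degrade the initial estimate.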

2603.16537 2026-03-18 cs.AI cs.HC cs.RO

Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots

Carmen Ng

Comments Accepted at the Proceedings of the CHI 2026 Workshop: Ethics at the Front-End

2603.16536 2026-03-18 cs.RO

Kamino: GPU-based Massively Parallel Simulation of Multi-Body Systems with Challenging Topologies

Vassilios Tsounis, Guirec Maloisel, Christian Schumacher, Ruben Grandia, Agon Serifi, David Müller, Chris Amevor, Tobias Widmer, Moritz Bächer

2603.16535 2026-03-18 cs.LG math.OC stat.ML

SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds

Viktor Stein, Wuchen Li, Gabriele Steidl

Comments 24 pages, 2 figures, 3 tables, comments welcome!

2603.16531 2026-03-18 cs.RO

LIMBERO: A Limbed Climbing Exploration Robot Toward Traveling on Rocky Cliffs

Kentaro Uno, Masazumi Imai, Kazuki Takada, Teruhiro Kataonami, Yudai Matsuura, Antonin Ringeval-Meusnier, Keita Nagaoka, Mikio Eguchi, Ryo Nishibe, Kazuya Yoshida

Comments Author's version of a manuscript accepted at the 2026 IEEE International Conference on Robotics and Automation (ICRA). (c) IEEE

2603.16526 2026-03-18 cs.AI

Exploring different approaches to customize language models for domain-specific text-to-code generation

Luís Freire, Fernanda A. Andaló, Nicki Skafte Detlefsen

2603.16524 2026-03-18 cs.CV cs.LG physics.comp-ph physics.data-an

An approximate graph elicits detonation lattice

Vansh Sharma, Venkat Raman

2603.16503 2026-03-18 cs.RO cs.SY eess.SY

When Rolling Gets Weird: A Curved-Link Tensegrity Robot for Non-Intuitive Behavior

Lauren Ervin, Harish Bezawada, Vishesh Vikas

Comments Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

2603.16500 2026-03-18 cs.LG cs.CL

From the Inside Out: Progressive Distribution Refinement for Confidence Calibration

Xizhong Yang, Yinan Xia, Huiming Wang, Mofei Song

Comments 15 pages

2603.16495 2026-03-18 cs.AI

ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation

Zihe Wang, Yihuan Wang, Haiyang Yu, Zhiyong Cui, Xiaojian Liao, Chengcheng Wang, Yonglin Tian, Yongxin Tong

Abstract

The current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different systems. Meanwhile, Large Language Models (LLMs) are increasingly applied in intelligent transportation, advancing traffic models from algorithmic to cognitive intelligence. However, general LLMs are unable to effectively understand the regulations and causal relationships of events in unconventional scenarios in the expressway field. Therefore, this paper constructs a pre-trained multimodal large language model (MLLM) for expressways, ExpressMind, which serves as the cognitive core for intelligent expressway operations. This paper constructs the industry's first full-stack expressway dataset, encompassing traffic knowledge texts, emergency reasoning chains, and annotated video events to overcome data scarcity. This paper proposes a dual-layer LLM pre-training paradigm based on self-supervised training and unsupervised learning. Additionally, this study introduces a Graph-Augmented RAG framework to dynamically index the expressway knowledge base. To enhance reasoning for expressway incident response strategies, we develop an RL-aligned Chain-of-Thought (RL-CoT) mechanism that enforces consistency between model reasoning and expert problem-solving heuristics for incident handling. Finally, ExpressMind integrates a cross-modal encoder to align the dynamic feature sequences under the visual and textual channels, enabling it to understand traffic scenes in both video and image modalities. Extensive experiments on our newly released multi-modal expressway benchmark demonstrate that ExpressMind comprehensively outperforms existing baselines in event detection, safety response generation, and complex traffic analysis. The code and data are available at: https://wanderhee.github.io/ExpressMind/.

2603.16489 2026-03-18 cs.CV cs.AI

Unlearning for One-Step Generative Models via Unbalanced Optimal Transport

Hyundo Choi, Junhyeong An, Jinseong Park, Jaewoong Choi

Comments 27 pages, 10 figures

2603.16483 2026-03-18 cs.CL

On the Emotion Understanding of Synthesized Speech

Yuan Ge, Haishu Zhao, Aokai Hao, Junxiang Zhang, Bei Li, Xiaoqian Liu, Chenglong Wang, Jianjin Wang, Bingsen Zhou, Bingyu Liu, Jingbo Zhu, Zhengtao Yu, Tong Xiao

2603.16482 2026-03-18 cs.CV cs.AI

DST-Net: A Dual-Stream Transformer with Illumination-Independent Feature Guidance and Multi-Scale Spatial Convolution for Low-Light Image Enhancement

Yicui Shi, Yuhan Chen, Xiangfei Huang, Zhenguo Wang, Wenxuan Yu, Ying Fang

Abstract

Low-light image enhancement aims to restore the visibility of images captured by visual sensors in dim environments by addressing their inherent signal degradations, such as luminance attenuation and structural corruption. Although numerous algorithms attempt to improve image quality, existing methods often cause a severe loss of intrinsic signal priors. To overcome these challenges, we propose a Dual-Stream Transformer Network (DST-Net) based on illumination-agnostic signal prior guidance and multi-scale spatial convolutions. First, to address the loss of critical signal features under low-light conditions, we design a feature extraction module. This module integrates Difference of Gaussians (DoG), LAB color space transformations, and VGG-16 for texture extraction, utilizing decoupled illumination-agnostic features as signal priors to continuously guide the enhancement process. Second, we construct a dual-stream interaction architecture. By employing a cross-modal attention mechanism, the network leverages the extracted priors to dynamically rectify the deteriorated signal representation of the enhanced image, ultimately achieving iterative enhancement through differentiable curve estimation. Furthermore, to overcome the inability of existing methods to preserve fine structures and textures, we propose a Multi-Scale Spatial Fusion Block (MSFB) featuring pseudo-3D and 3D gradient operator convolutions. This module integrates explicit gradient operators to recover high-frequency edges while capturing inter-channel spatial correlations via multi-scale spatial convolutions. Extensive evaluations and ablation studies demonstrate that DST-Net achieves superior performance in subjective visual quality and objective metrics. Specifically, our method achieves a PSNR of 25.64 dB on the LOL dataset. Subsequent validation on the LSRW dataset further confirms its robust cross-scene generalization.
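For readers unfamiliar with the PSNR metric quoted above, the standard definition is 10·log10(MAX² / MSE), where MAX is the peak pixel value. The sketch below computes it in plain Python; the function name `psnr` and the tiny pixel lists are illustrative assumptions and are unrelated to the LOL dataset or to the authors' evaluation code.

```python
import math

def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are passed as flat sequences of pixel intensities.
    """
    if len(reference) != len(enhanced) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2x2 "images" flattened to lists (made-up values for illustration).
ref = [52, 55, 61, 59]
out = [54, 55, 60, 57]
print(f"{psnr(ref, out):.2f} dB")
```

Higher values mean the enhanced image is closer to the reference; because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error.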

2603.16471 2026-03-18 cs.RO

Coverage First Next Best View for Inspection of Cluttered Pipe Networks Using Mobile Manipulators

Joshua Raymond Bettles, Jiaxu Wu, Bruno Vilhena Adorno, Joaquin Carrasco, Atsushi Yamashita

Comments 8 pages, 9 figures, 1 table. Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 2026

2603.16463 2026-03-18 cs.AI cs.HC

Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition

Yu Liu, Lei Zhang, Haoxun Li, Hanlei Shi, Yuxuan Ding, Leyuan Qu, Taihao Li

2603.16461 2026-03-18 cs.CV

GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models

Jiaxin Zhang, Junjun Jiang, Haijie Li, Youyu Chen, Kui Jiang, Dave Zhenyu Chen

2603.16459 2026-03-18 cs.CL

DynHD: Hallucination Detection for Diffusion Large Language Models via Denoising Dynamics Deviation Learning

Yanyu Qian, Yue Tan, Yixin Liu, Wang Yu, Shirui Pan

Comments 15 pages, 8 figures, 5 tables

2603.16455 2026-03-18 cs.CV

Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval

Weiqing Li, Jinyue Guo, Yaqi Wang, Haiyang Xiao, Yuewei Zhang, Guohua Liu, Hao Henry Wang

Comments Accepted by CVPR2026

2603.16453 2026-03-18 cs.AI

RetailBench: Evaluating Long-Horizon Autonomous Decision-Making and Strategy Stability of LLM Agents in Realistic Retail Environments

Linghua Zhang, Jun Wang, Jingtong Wu, Zhisong Zhang

2603.16447 2026-03-18 cs.CV cs.GR

ProgressiveAvatars: Progressive Animatable 3D Gaussian Avatars

Kaiwen Song, Jinkai Cui, Juyong Zhang

Comments Accepted to CVPR 2026, Project page: https://ustc3dv.github.io/ProgressiveAvatars/

2603.16445 2026-03-18 cs.AI

Visual Distraction Undermines Moral Reasoning in Vision-Language Models

Xinyi Yang, Chenheng Xu, Weijun Hong, Ce Mo, Qian Wang, Fang Fang, Yixin Zhu

2603.16444 2026-03-18 cs.CV

Fast-HaMeR: Boosting Hand Mesh Reconstruction using Knowledge Distillation

Hunain Ahmed Jillani, Ahmed Tawfik Aboukhadra, Ahmed Elhayek, Jameel Malik, Nadia Robertini, Didier Stricker

2603.16440 2026-03-18 cs.LG cs.CL

Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models

Rishaank Gupta

Abstract

Large language model compression has made substantial progress through pruning, quantization, and low-rank decomposition, yet a fundamental limitation persists across all existing methods: compression budgets are allocated without any representation of what individual model components functionally encode. We term this the capability-blind compression problem and argue it is a root cause of two well-documented failures -- the insensitivity of perplexity-based evaluation to reasoning capability loss, and the abrupt phase transitions in model performance recently characterized by Ma et al. (2026). We propose Capability-Guided Compression (CGC), a framework that addresses this by using Sparse Autoencoder (SAE)-derived capability density maps to allocate differential compression budgets across transformer components. Capability density is a formally defined scalar measure combining the feature breadth, activation entropy, and cross-input consistency of a component's SAE feature activation distribution. We prove theoretically that components with higher capability density exhibit lower structural redundancy and reach their individual phase transition points at lower compression ratios, providing the first pre-compression mechanism for component-level phase transition prediction. Experiments on GPT-2 Medium confirm that capability density is statistically independent of Wanda importance scores (Spearman rho = -0.054, n = 384 heads), establishing it as a genuinely novel compression signal orthogonal to all existing importance metrics. We report a negative result on PPL-based compression comparison and provide a principled diagnosis identifying GPT-2 Medium as an insufficient test bed for the full CGC hypothesis. The theoretical framework, density formalism, and orthogonality finding constitute a foundation for capability-aware compression research.
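The Spearman rho reported above is the Pearson correlation of the ranks of two score lists, so a value near zero indicates no monotonic relationship between them. A minimal pure-Python sketch follows; the function names and the sample scores are hypothetical and are not taken from the paper's data or code.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up per-component scores standing in for two importance signals.
density = [1, 2, 3, 4, 5]
importance = [2, 1, 4, 3, 5]
print(spearman_rho(density, importance))
```

With real data, each list would hold one score per attention head (n = 384 in the abstract), and the same function would apply unchanged.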

2603.16439 2026-03-18 cs.CV cs.AI

CD-FKD: Cross-Domain Feature Knowledge Distillation for Robust Single-Domain Generalization in Object Detection

Junseok Lee, Sungho Shin, Seongju Lee, Kyoobin Lee

Comments Accepted to ICRA 2026

2603.16436 2026-03-18 cs.LG

DISCOVER: A Solver for Distributional Counterfactual Explanations

Yikai Gu, Lele Cao, Bo Zhao, Lei Lei, Lei You

Comments 20 pages, 8 figures, 4 tables

2603.16435 2026-03-18 cs.CL

VQKV: High-Fidelity and High-Ratio Cache Compression via Vector-Quantization

Yixuan Wang, Qingyu Shi, Jiayu Zhou, Dianbo Liu, Ziwei He, Zhouhan Lin

2603.16434 2026-03-18 cs.AI q-fin.TR

From Natural Language to Executable Option Strategies via Large Language Models

Haochen Luo, Zhengzhao Lai, Junjie Xu, Yifan Li, Tang Pok Hin, Yuan Zhang, Chen Liu

2603.16426 2026-03-18 cs.CV

3D Fourier-based Global Feature Extraction for Hyperspectral Image Classification

Muhammad Ahmad

2603.16424 2026-03-18 cs.RO cs.NA cs.SY eess.SY math.NA

Early-Terminable Energy-Safe Iterative Coupling for Parallel Simulation of Port-Hamiltonian Systems

Qi Wei, Jianfeng Tao, Hongyu Nie, Wangtao Tan

2603.16423 2026-03-18 cs.CV cs.AI

SF-Mamba: Rethinking State Space Model for Vision

Masakazu Yoshimura, Teruaki Hayashi, Yuki Hoshino, Wei-Yao Wang, Takeshi Ohashi

Comments 21 pages