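arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.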

2604.03800 2026-04-07 cs.CV

HistoFusionNet: Histogram-Guided Fusion and Frequency-Adaptive Refinement for Nighttime Image Dehazing

Mohammad Heydari, Wei Dong, Shahram Shirani, Jun Chen, Han Zhou

English abstract

Nighttime image dehazing remains a challenging low-level vision problem due to the joint presence of haze, glow, non-uniform illumination, color distortion, and sensor noise, which often invalidate assumptions commonly used in daytime dehazing. To address these challenges, we propose HistoFusionNet, a transformer-enhanced architecture tailored for nighttime image dehazing that combines histogram-guided representation learning with frequency-adaptive feature refinement. Built upon a multi-scale encoder-decoder backbone, our method introduces histogram transformer blocks that model long-range dependencies by grouping features according to their dynamic-range characteristics, enabling more effective aggregation of similarly degraded regions under complex nighttime lighting. To further improve restoration fidelity, we incorporate a frequency-aware refinement branch that adaptively exploits complementary low- and high-frequency cues, helping recover scene structures, suppress artifacts, and enhance local details. This design yields a unified framework that is particularly well suited to the heterogeneous degradations encountered in real nighttime hazy scenes. Extensive experiments on the NTIRE 2026 Nighttime Image Dehazing Challenge benchmark, where our method achieves highly competitive performance, demonstrate its effectiveness. Our team ranked 1st among 22 participating teams, highlighting the robustness and competitive performance of HistoFusionNet. The code is available at: https://github.com/heydarimo/Night-Time-Dehazing

2604.03797 2026-04-07 cs.CV

Confidence-Driven Facade Refinement of 3D Building Models Using MLS Point Clouds

Xiaoyu Huang

2604.03781 2026-04-07 cs.RO

OpenRC: An Open-Source Robotic Colonoscopy Framework for Multimodal Data Acquisition and Autonomy Research

Siddhartha Kapuria, Mohammad Rafiee Javazm, Naruhiko Ikoma, Joga Ivatury, Mohammad Ali Nasseri, Nassir Navab, Farshid Alambeigi

2604.03774 2026-04-07 cs.CV cs.AI

When Does Multimodal AI Help? Diagnostic Complementarity of Vision-Language Models and CNNs for Spectrum Management in Satellite-Terrestrial Networks

Yuanhang Li

Comments 10 pages, 4 figures

2604.03773 2026-04-07 cs.CV

M2StyleGS: Multi-Modality 3D Style Transfer with Gaussian Splatting

Xingyu Miao, Xueqi Qiu, Haoran Duan, Yawen Huang, Xian Wu, Jingjing Deng, Yang Long

2604.03766 2026-04-07 cs.RO cs.SY eess.SY

A Novel Hybrid PID-LQR Controller for Sit-To-Stand Assistance Using a CAD-Integrated Simscape Multibody Lower Limb Exoskeleton

Ranjeet Kumbhar, Rajmeet Singh, Appaso M Gadade, Ashish Singla, Irfan Hussain

2604.03764 2026-04-07 cs.LG cs.AI

Automated Attention Pattern Discovery at Scale in Large Language Models

Jonathan Katzy, Razvan-Mihai Popescu, Erik Mekkes, Arie van Deursen, Maliheh Izadi

Comments Accepted to TMLR

Journal ref: Transactions on Machine Learning Research 2026

English abstract

Large language models have found success by scaling up capabilities to work in general settings. Unfortunately, the same cannot be said for interpretability methods. The current trend in mechanistic interpretability is to provide precise explanations of specific behaviors in controlled settings. These often do not generalize, or are too resource-intensive for larger studies. In this work we propose to study repeated behaviors in large language models by mining completion scenarios in Java code datasets, exploiting the structured nature of code. We collect the attention patterns generated in the attention heads to demonstrate that they are scalable signals for global interpretability of model components. We show that vision models offer a promising direction for analyzing attention patterns at scale. To demonstrate this, we introduce the Attention Pattern - Masked Autoencoder (AP-MAE), a vision transformer-based model that efficiently reconstructs masked attention patterns. Experiments on StarCoder2 show that AP-MAE (i) reconstructs masked attention patterns with high accuracy, (ii) generalizes across unseen models with minimal degradation, (iii) reveals recurring patterns across inferences, (iv) predicts whether a generation will be correct without access to ground truth, with accuracies ranging from 55% to 70% depending on the task, and (v) enables targeted interventions that increase accuracy by 13.6% when applied selectively, but cause collapse when applied excessively. These results establish attention patterns as a scalable signal for interpretability and demonstrate that AP-MAE provides a transferable foundation for both analysis and intervention in large language models. Beyond its standalone value, AP-MAE also serves as a selection procedure to guide fine-grained mechanistic approaches. We release code and models to support future work in large-scale interpretability.

2604.03759 2026-04-07 cs.RO cs.AI

Build on Priors: Vision-Language-Guided Neuro-Symbolic Imitation Learning for Data-Efficient Real-World Robot Manipulation

Pierrick Lorang, Johannes Huemer, Timothy Duggan, Kai Goebel, Patrik Zips, Matthias Scheutz

English abstract

Enabling robots to learn long-horizon manipulation tasks from a handful of demonstrations remains a central challenge in robotics. Existing neuro-symbolic approaches often rely on hand-crafted symbolic abstractions, semantically labeled trajectories, or large demonstration datasets, limiting their scalability and real-world applicability. We present a scalable neuro-symbolic framework that autonomously constructs symbolic planning domains and data-efficient control policies from as few as one to thirty unannotated skill demonstrations, without requiring manual domain engineering. Our method segments demonstrations into skills and employs a Vision-Language Model (VLM) to classify skills and identify equivalent high-level states, enabling automatic construction of a state-transition graph. This graph is processed by an Answer Set Programming solver to synthesize a PDDL planning domain, which an oracle function exploits to isolate the minimal, task-relevant, and target-relative observation and action spaces for each skill policy. Policies are learned at the control reference level rather than at the raw actuator signal level, yielding a smoother and less noisy learning target. Known controllers can be leveraged for real-world data augmentation by projecting a single demonstration onto other objects in the scene, simultaneously enriching the graph construction process and the dataset for imitation learning. We validate our framework primarily on a real industrial forklift across statistically rigorous manipulation trials, and demonstrate cross-platform generality on a Kinova Gen3 robotic arm across two standard benchmarks. Our results show that integrating control-level learning, VLM-driven abstraction, and automated planning synthesis into a unified pipeline constitutes a practical path toward scalable, data-efficient, expert-free, and interpretable neuro-symbolic robotics.

2604.03754 2026-04-07 cs.CL cs.AI

Testing the Limits of Truth Directions in LLMs

Angelos Poulis, Mark Crovella, Evimaria Terzi

2604.03747 2026-04-07 cs.RO

CT-VoxelMap: Efficient Continuous-Time LiDAR-Inertial Odometry with Probabilistic Adaptive Voxel Mapping

Lei Zhao, Xingyi Li, Tianchen Deng, Chuan Cao, Han Zhang, Weidong Chen

2604.03742 2026-04-07 cs.AI

Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Process and DualJudge

Yulong He, Ivan Smirnov, Dmitry Fedrushkov, Sergey Kovalchuk, Ilya Revin

2604.03741 2026-04-07 cs.CV physics.comp-ph

Shower-Aware Dual-Stream Voxel Networks for Structural Defect Detection in Cosmic-Ray Muon Tomography

Parthiv Dasgupta, Sambhav Agarwal, Palash Dutta, Raja Karmakar, Sudeshna Goswami

Comments 8 pages, 10 figures, 4 tables. Includes supplementary data via Zenodo DOI: 10.5281/zenodo.19355077. This work introduces SA-DSVN for 3D voxel segmentation in muon tomography, utilizing secondary electromagnetic shower multiplicities. (pp. 1, 3)

2604.03738 2026-04-07 cs.CV

Rethinking Position Embedding as a Context Controller for Multi-Reference and Multi-Shot Video Generation

Binyuan Huang, Yuning Lu, Weinan Jia, Hualiang Wang, Mu Liu, Daiqing Yang

Comments Accepted to CVPR 2026

2604.03730 2026-04-07 cs.RO

A Multi-View 3D Telepresence System for XR Robot Teleoperation

Enes Ulas Dincer, Manuel Zaremski, Alexandra Nick, Elias Wucher, Barbara Deml, Gerhard Neumann

2604.03716 2026-04-07 cs.CV cs.GR

CGHair: Compact Gaussian Hair Reconstruction with Card Clustering

Haimin Luo, Srinjay Sarkar, Albert Mosella-Montoro, Francisco Vicente Carrasco, Fernando De la Torre

Comments Accepted to CVPR 2026. This arXiv version is not the final published version

2604.03710 2026-04-07 cs.CV cs.AI cs.LG

Learning Superpixel Ensemble and Hierarchy Graphs for Melanoma Detection

Asmaa M. Elwer, Muhammad A. Rushdi, Mahmoud H. Annaby

2604.03706 2026-04-07 cs.CV

XSeg: A Large-scale X-ray Contraband Segmentation Benchmark For Real-World Security Screening

Hongxia Gao, Litao Li, Yixin Chen, Jiali Wen, Kaijie Zhang, Qianyun Liu

Comments 12 pages, 8 figures, Accepted to CVPR 2026

2604.03697 2026-04-07 cs.CV

SGTA: Scene-Graph Based Multi-Modal Traffic Agent for Video Understanding

Xingcheng Zhou, Mingyu Liu, Walter Zimmer, Jiajie Zhang, Alois Knoll

2604.03696 2026-04-07 cs.CV

FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning

Zhengyu Fu, René Zurbrügg, Kaixian Qu, Marc Pollefeys, Marco Hutter, Hermann Blum, Zuria Bauer

2604.03695 2026-04-07 cs.CL

POEMetric: The Last Stanza of Humanity

Bingru Li, Han Wang, Hazel Wilkinson

2604.03693 2026-04-07 cs.CV

ResGuard: Enhancing Robustness Against Known Original Attacks in Deep Watermarking

Hanyi Wang, Han Fang, Yupeng Qiu, Shilin Wang, Ee-Chien Chang

2604.03685 2026-04-07 cs.CV

DSERT-RoLL: Robust Multi-Modal Perception for Diverse Driving Conditions with Stereo Event-RGB-Thermal Cameras, 4D Radar, and Dual-LiDAR

Hoonhee Cho, Jae-Young Kang, Yuhwan Jeong, Yunseo Yang, Wonyoung Lee, Youngho Kim, Kuk-Jin Yoon

Comments Accepted by CVPR 2026

2604.03684 2026-04-07 cs.CL

Researchers waste 80% of LLM annotation costs by classifying one text at a time

Christian Pipal, Eva-Maria Vogel, Morgan Wack, Frank Esser

2604.03679 2026-04-07 cs.CL cs.AI cs.IR cs.LG cs.MM

LightThinker++: From Reasoning Compression to Memory Management

Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang

Comments Work in progress. This is an extended version of LightThinker

2604.03677 2026-04-07 cs.CL cs.AI

Unlocking Prompt Infilling Capability for Diffusion Language Models

Yoshinari Fujinuma, Keisuke Sakaguchi

2604.03674 2026-04-07 cs.CV

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Comments Accepted by ICLR 2026

2604.03673 2026-04-07 cs.CL

'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family

Greta Gorzoni, Ludovica Pannitto, Francesca Masini

2604.03672 2026-04-07 cs.CL cs.AI

AI Appeals Processor: A Deep Learning Approach to Automated Classification of Citizen Appeals in Government Services

Vladimir Beskorovainyi

Comments 10 pages, 0 figures, 5 tables

2604.03667 2026-04-07 cs.CV

Leveraging Gaze and Set-of-Mark in VLLMs for Human-Object Interaction Anticipation from Egocentric Videos

Daniele Materia, Francesco Ragusa, Giovanni Maria Farinella

Comments Accepted to International Conference on Pattern Recognition (ICPR) 2026

2604.03664 2026-04-07 cs.CL

Document-Level Numerical Reasoning across Single and Multiple Tables in Financial Reports

Yi-Cheng Wang, Wei-An Wang, Chu-Song Chen