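arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.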

2603.27981 2026-03-31 cs.CL cs.SD

On the Role of Encoder Depth: Pruning Whisper and LoRA Fine-Tuning in SLAM-ASR

Ganesh Pavan Kartikeya Bharadwaj Kolluri, Michael Kampouridis, Ravi Shekhar

Comments Accepted at SPEAKABLE Workshop, LREC 2026

English abstract

Automatic speech recognition (ASR) has advanced rapidly in recent years, driven by large-scale pretrained models and end-to-end architectures such as SLAM-ASR. A key component of SLAM-ASR systems is the Whisper speech encoder, which provides robust acoustic representations. While model pruning has been explored for the full Whisper encoder-decoder architecture, its impact within the SLAM-ASR setting remains under-investigated. In this work, we analyze the effects of layer pruning in the Whisper encoder when used as the acoustic backbone of SLAM-ASR. We further examine the extent to which LoRA-based fine-tuning can recover performance degradation caused by pruning. Experiments conducted across three Whisper variants (Small, Medium, Large-v2), three languages representing distinct resource levels (Danish, Dutch, English), and over 200 training runs demonstrate that pruning two encoder layers causes only 2-4% WER degradation, and that combining this pruning with LoRA adaptation consistently outperforms the unpruned baseline while reducing total parameters by 7-14%. Moreover, our error analysis reveals that LoRA primarily compensates through the language model's linguistic priors, reducing total word errors by 11-21% for Dutch and English, with substitutions and deletions showing the largest reductions. However, for low-resource Danish, the reduction is smaller (4-7%), and LoRA introduces increased insertion errors, indicating that compensation effectiveness depends on the LLM's pre-existing language proficiency and available training data.
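
The abstract reports results as word error rate (WER) and breaks errors down into substitutions, deletions, and insertions. As a minimal sketch of how such a breakdown is commonly computed, the following uses a standard word-level edit-distance alignment. It is not the authors' evaluation code: the function name `wer_breakdown` is hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Word-level edit distance with counts per error type (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] = (total_cost, substitutions, deletions, insertions)
    # for the best alignment of ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (c, s, d, n)          # exact match, no cost
            else:
                diagonal = (c + 1, s + 1, d, n)  # substitution
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)      # reference word missing
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)     # extra hypothesis word
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])

    cost, subs, dels, ins = dp[-1][-1]
    return {
        "wer": cost / max(len(ref), 1),  # guard against an empty reference
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }
```

For example, `wer_breakdown("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words. When several alignments share the same minimum cost, the split between error types depends on tie-breaking, so counts from different toolkits may differ slightly.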

2603.27970 2026-03-31 cs.CV

AffordMatcher: Affordance Learning in 3D Scenes from Visual Signifiers

Nghia Vu, Tuong Do, Khang Nguyen, Baoru Huang, Nhat Le, Binh Xuan Nguyen, Erman Tjiputra, Quang D. Tran, Ravi Prakash, Te-Chuan Chiu, Anh Nguyen

Comments 14 pages. Accepted to CVPR 2026

2603.27969 2026-03-31 cs.CV

Hg-I2P: Bridging Modalities for Generalizable Image-to-Point-Cloud Registration via Heterogeneous Graphs

Pei An, Junfeng Ding, Jiaqi Yang, Yulong Wang, Jie Ma, Liangliang Nan

Comments Accepted to CVPR 2026

2603.27967 2026-03-31 cs.CV

Learning Multi-View Spatial Reasoning from Cross-View Relations

Suchae Jeong, Jaehwi Song, Haeone Lee, Hanna Kim, Jian Kim, Dongjun Lee, Dong Kyu Shin, Changyeon Kim, Dongyoon Hahm, Woogyeol Jin, Juheon Choi, Kimin Lee

Comments Accepted to CVPR 2026

2603.27965 2026-03-31 cs.CV

ExFusion: Efficient Transformer Training via Multi-Experts Fusion

Jiacheng Ruan, Daize Dong, Xiaoye Qu, Tong Zhu, Ting Liu, Yuzhuo Fu, Yu Cheng, Suncheng Xiang

Comments Accepted by IEEE TMM 2026

2603.27962 2026-03-31 cs.LG cs.GT

Gradient Manipulation in Distributed Stochastic Gradient Descent with Strategic Agents: Truthful Incentives with Convergence Guarantees

Ziqin Chen, Yongqiang Wang

Comments 19 pages, 8 figures

2603.27958 2026-03-31 cs.AI

CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs

Yongkang Du, Xiaohan Zou, Minhao Cheng, Lu Lin

2603.27950 2026-03-31 cs.LG

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Kieran Didi, Zuobai Zhang, Guoqing Zhou, Danny Reidenbach, Zhonglin Cao, Sooyoung Cha, Tomas Geffner, Christian Dallago, Jian Tang, Michael M. Bronstein, Martin Steinegger, Emine Kucukbenli, Arash Vahdat, Karsten Kreis

Comments ICLR 2026 Oral Presentation. Project page: https://research.nvidia.com/labs/genair/proteina-complexa/

2603.27949 2026-03-31 cs.CL

EnsemJudge: Enhancing Reliability in Chinese LLM-Generated Text Detection through Diverse Model Ensembles

Zhuoshang Wang, Yubing Ren, Guoyu Zhao, Xiaowei Zhu, Hao Li, Yanan Cao

Comments Accepted by NLPCC 2025 Shared Tasks

2603.27944 2026-03-31 cs.RO

Flip Stunts on Bicycle Robots using Iterative Motion Imitation

Jeonghwan Kim, Shamel Fahmi, Seungeun Rho, Sehoon Ha, Gabriel Nelson

Comments 8 Pages, Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2026

2603.27938 2026-03-31 cs.CL

Top-down string-to-dependency Neural Machine Translation

Shuhei Kondo, Katsuhito Sudoh, Yuji Matsumoto

2603.27931 2026-03-31 cs.CV

A Cross-Scale Decoder with Token Refinement for Off-Road Semantic Segmentation

Seongkyu Choi, Jhonghyun An

2603.27929 2026-03-31 cs.LG cs.AI

Physics-Guided Transformer (PGT): Physics-Aware Attention Mechanism for PINNs

Ehsan Zeraatkar, Rodion Podorozhny, Jelena Tešić

2603.27923 2026-03-31 cs.CV

ForestSim: A Synthetic Benchmark for Intelligent Vehicle Perception in Unstructured Forest Environments

Pragat Wagle, Zheng Chen, Lantao Liu

2603.27922 2026-03-31 cs.AI cs.IR

GEAKG: Generative Executable Algorithm Knowledge Graphs

Camilo Chacón Sartori, José H. García, Andrei Voicu Tomut, Christian Blum

2603.27915 2026-03-31 cs.CV

FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation

Liuzhou Zhang, Zeyu Zhang, Biao Wu, Luyao Tang, Zirui Song, Hongyang He, Renda Han, Guangzhen Yao, Huacan Wang, Ronghao Chen, Xiuying Chen, Guan Huang, Zheng Zhu

2603.27913 2026-03-31 cs.CV

Spatial Orthogonal Refinement for Robust RGB-Event Visual Object Tracking

Dexing Huang, Shiao Wang, Fan Zhang, Xiao Wang

Comments Joint International Conference on Automation-Intelligence-Safety and International Symposium on Autonomous Systems 2026 (ICAIS and ISAS 2026)

2603.27912 2026-03-31 cs.RO cs.SY eess.SY

Safety Guardrails in the Sky: Realizing Control Barrier Functions on the VISTA F-16 Jet

Andrew W. Singletary, Max H. Cohen, Tamas G. Molnar, Aaron D. Ames

2603.27904 2026-03-31 cs.CV

BINO: Encoder Centric Self Supervised Stereo With Native Pair Input

Haokun Zhou

2603.27900 2026-03-31 cs.CV

Rényi Entropy: A New Token Pruning Metric for Vision Transformers

Wei-Yuan Su, Ruijie Zhang, Zheng Zhang

2603.27898 2026-03-31 cs.CV

SAGE: Sink-Aware Grounded Decoding for Multimodal Hallucination Mitigation

Tripti Shukla, Zsolt Kira

Comments 25 pages, 6 figures, 7 tables

2603.27891 2026-03-31 cs.CV

Poppy: Polarization-based Plug-and-Play Guidance for Enhancing Monocular Normal Estimation

Irene Kim, Sai Tanmay Reddy Chakkera, Alexandros Graikos, Dimitris Samaras, Akshat Dave

Comments project page: https://irnkim.github.io/poppy/

2603.27885 2026-03-31 cs.LG

Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks

Matthew Loftus

Comments 8 pages, 2 figures, 5 tables

2603.27884 2026-03-31 cs.LG math.OC

Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards

Kihyun Yu, Seoungbin Bae, Dabeen Lee

2603.27880 2026-03-31 cs.LG cs.AI cs.RO math.DS

Kernel Dynamics under Path Entropy Maximization

Jnaneshwar Das

Comments 7 pages, 2 figures

2603.27877 2026-03-31 cs.CL cs.SD

HumMusQA: A Human-written Music Understanding QA Benchmark Dataset

Benno Weck, Pablo Puentes, Andrea Poltronieri, Satyajeet Prabhu, Dmitry Bogdanov

Comments Dataset available at https://doi.org/10.5281/zenodo.18462523

2603.27866 2026-03-31 cs.CV

Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning

Ming Liu, Yunbei Zhang, Shilong Liu, Liwen Wang, Wensheng Zhang

2603.27859 2026-03-31 cs.CL cs.NA math.NA

KazByte: Adapting Qwen models to Kazakh via Byte-level Adapter

Rauan Akylzhanov

Comments Technical announcement

2603.27857 2026-03-31 cs.AI

CARGO: Carbon-Aware Gossip Orchestration in Smart Shipping

Alexandros S. Kalafatelis, Nikolaos Nomikos, Vasileios Nikolakakis, Nikolaos Tsoulakos, Panagiotis Trakadas

2603.27855 2026-03-31 cs.CL

What can LLMs tell us about the mechanisms behind polarity illusions in humans? Experiments across model scales and training steps

Dario Paape