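arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.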

2604.22203 2026-04-27 eess.AS cs.SD

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus

Szu-Jui Chen, John H. L. Hansen

Comments: Accepted to Speech Communication 2026

DOI: 10.1016/j.specom.2026.103380
Journal ref: Speech Communication 180 (2026) 103380

Abstract

Using self-supervised learning (SSL) models has significantly improved performance for downstream speech tasks, surpassing the capabilities of traditional hand-crafted features. This study investigates the amalgamation of SSL models, with the aim to leverage both their individual strengths and refine extracted features to achieve improved speech recognition models for naturalistic scenarios. Our research investigates the massive naturalistic Fearless Steps (FS) APOLLO resource, with particular focus on the FS Challenge (FSC) Phase-4 corpus, providing the inaugural analysis of this dataset. Additionally, we incorporate the CHiME-6 dataset to evaluate performance across diverse naturalistic speech scenarios. While exploring previously proposed Feature Refinement Loss and fusion methods, we found these methods to be less effective on the FSC Phase-4 corpus. To address this, we introduce a novel deep cross-attention (DCA) fusion method, designed to elevate performance, especially for the FSC Phase-4 corpus. Our objective is to foster creation of superior FS APOLLO community resources, catering to the diverse needs of researchers across various disciplines. The proposed solution achieves an absolute +1.1% improvement in WER, providing effective meta-data creation for the massive FS APOLLO community resource.
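
The abstract reports its gain as an absolute change in word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the toy sentences are illustrative assumptions, not taken from the paper, and the numbers they produce are unrelated to the reported result.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Toy transcripts (hypothetical, for illustration only).
reference = "go for launch on my mark"
baseline = "go for lunch on mark"
improved = "go for launch on mark"

wer_baseline = word_error_rate(reference, baseline)
wer_improved = word_error_rate(reference, improved)
# An absolute improvement is the plain difference between the two rates;
# a relative improvement would divide that difference by the baseline rate.
absolute_gain = wer_baseline - wer_improved
relative_gain = absolute_gain / wer_baseline
print(f"absolute: {absolute_gain:.3f}, relative: {relative_gain:.3f}")
```

The distinction matters when reading the claim: an absolute gain of 1.1 points means the WER percentage itself dropped by 1.1, whatever the baseline was.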

2604.22191 2026-04-27 cs.CR cs.CL

Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning

Chaoran Chen, Dayu Yuan, Peter Kairouz

2604.22180 2026-04-27 cs.IR cs.AI

ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression

Xiaojie Ke, Shuai Zhang, Liansheng Sun, Yongjin Wang, Hengjun Jiang, Xiangkun Liu, Cunxin Gu, Jian Xu, Guanjun Jiang

Abstract

Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the "lost in the middle" phenomenon degrades ranking quality as input length grows, and the inference latency scales super-linearly with sequence length, rendering it impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that fundamentally addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace the conventional autoregressive decoding with a one-step cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained through a carefully designed dual-stage, multi-task, end-to-end joint optimization strategy that simultaneously trains the encoder and reranker, achieving learning objective alignment between retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets demonstrate that ResRank achieves competitive or superior ranking effectiveness compared to existing approaches while requiring zero generated tokens and processing only one token per passage, yielding a fundamentally better balance between effectiveness and efficiency.
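
The scoring idea in this abstract (one embedding per passage, a residual combination with reranker hidden states, and a single cosine-similarity pass instead of autoregressive decoding) can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say which vectors are compared or how the residual is formed, so the element-wise sum, the query vector, and the names `cosine` and `rank_passages` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, encoder_embs, reranker_states):
    """Return passage indices ordered from most to least similar.

    Assumption: the residual connection is an element-wise sum of each
    passage's encoder embedding and its reranker hidden state.
    """
    scored = []
    for idx, (emb, state) in enumerate(zip(encoder_embs, reranker_states)):
        fused = [e + s for e, s in zip(emb, state)]  # residual combination
        scored.append((cosine(query_vec, fused), idx))
    # One scoring pass and one sort; no tokens are generated.
    scored.sort(key=lambda pair: -pair[0])
    return [idx for _, idx in scored]


# Toy 2-dimensional vectors (hypothetical values, for illustration only).
query = [1.0, 0.0]
encoder_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reranker_states = [[0.0, 0.5], [0.5, 0.0], [0.0, 0.0]]
print(rank_passages(query, encoder_embs, reranker_states))  # prints [0, 2, 1]
```

Because each passage contributes a single vector, the cost of this scoring step grows with the number of candidates rather than with their text length, which is the efficiency argument the abstract makes.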

2604.22176 2026-04-27 cs.CR cs.LG

FixV2W: Correcting Invalid CVE-CWE Mappings with Knowledge Graph Embeddings

Sevval Simsek, Varsha Athreya, David Starobinski

2604.22157 2026-04-27 cs.CR cs.AI

PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store

Bhanuka Silva, Anirban Mahanti, Aruna Seneviratne, Suranga Seneviratne

Comments: 20 pages, 9 figures, 2 tables

2604.22143 2026-04-27 cs.CY cs.CL

Recognition Without Authorization: LLMs and the Moral Order of Online Advice

Tom van Nuenen

2604.22136 2026-04-27 cs.CR cs.LG

Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems

Jun He, Deying Yu

Comments: 15 pages, 2 figures

2604.22133 2026-04-27 eess.AS cs.SD

Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis

Haopeng Geng, Longfei Yang, Xi Chen, Haitong Sun, Daisuke Saito, Nobuaki Minematsu

2604.22121 2026-04-27 eess.SY cs.RO cs.SY

Characterizing pitch and roll torque coupling in insect-sized flapping-wing robots using a microfabricated gimbal

Aaron Weber, Daksh Dhingra, Sawyer B. Fuller

Comments: Submitted for journal publication in Mechatronics and conference presentation at IFAC World Congress 2026. 9 pages, 11 figures

2604.22103 2026-04-27 cs.CY cs.CV

How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits

Jason Tang, Stephen Law

2604.22096 2026-04-27 cs.CR cs.LG cs.SE

Who Audits the Auditor? Tamper-Proof Fraud Detection with Blockchain-Anchored Explainable ML

Zhaohui Wang

Comments: Accepted to IEEE COMPSAC 2026 (Paper ID 9376, SEPT Symposium). This is the de-anonymized camera-ready version. Code is available at: https://github.com/GeoffreyWang1117/fraud-detection-chain

2604.22089 2026-04-27 cs.SE cs.AI

Ethics Testing: Proactive Identification of Generative AI System Harms

Shin Hwei Tan, Haibo Wang, Heng Li

2604.22072 2026-04-27 cs.DC cs.AI

Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning

Amine Barrak

2604.22068 2026-04-27 cs.SE cs.RO

TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation

Nahian Salsabil, Sebastian Elbaum

Comments: FSE'26 Tool Demonstration Track

2604.22046 2026-04-27 cs.SE cs.AI

Call-Chain-Aware LLM-Based Test Generation for Java Projects

Guancheng Wang, Qinghua Xu, Lionel C. Briand, Zhaoqiang Guo, Kui Liu

2604.22043 2026-04-27 physics.soc-ph cs.LG

Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues

Vivek Upadhyay, Amaresh Chakrabarti

Comments: 42 pages, 4 figures, 1 table

2604.22018 2026-04-27 q-bio.NC cs.AI cs.LG eess.SP

Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity

Deepank Girish, Yi Hao Chan, Sukrit Gupta, Jing Xia, Jagath C. Rajapakse

2604.22014 2026-04-27 cs.MA cs.RO

DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation

Amin Kashiri, Atharva Jamsandekar, Yasin Yazıcıoğlu

2604.22005 2026-04-27 cs.IT cs.LG eess.SP math.IT

Null-Space Flow Matching for MIMO Channel Estimation in Latency-Constrained Systems

Junjie Zhao, Guangming Liang, Dongzhu Liu, Xiaonan Liu

Comments: 6 pages, 3 figures, 20 references

2604.21989 2026-04-27 math.OC cs.AI cs.RO cs.SY eess.SY math.DS

Model Predictive Control of Hybrid Dynamical Systems

Ricardo G. Sanfelice, Berk Altin

Comments: Technical report associated with paper to appear in IEEE Transactions on Automatic Control, 2026

2604.21961 2026-04-27 cs.LO cs.AI cs.SC

A general optimization solver based on OP-to-MaxSAT reduction

Yuxin Zhao, Han Huang, Zhifeng Hao

2604.21958 2026-04-27 cs.SE cs.AI

A systematic review of generative AI usage for IT project management

Ionut Anghel, Tudor Cioara

2604.21957 2026-04-27 cs.IT cs.AI cs.LG eess.SP math.IT

MambaCSP: Hybrid-Attention State Space Models for Hardware-Efficient Channel State Prediction

Aladin Djuhera, Haris Gacanin, Holger Boche

2604.21950 2026-04-27 cs.SE cs.AI cs.LG

Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation

Charles Junichi McAndrews

Comments: 17 pages main text, 2 page references, 3 figures. Code: https://github.com/L3G/feedback-over-form

2604.21938 2026-04-27 cs.CY cs.AI

The Biggest Risk of Embodied AI is Governance Lag

Shaoshan Liu

2604.18143 2026-04-27 stat.ML cs.LG stat.ME

Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Qi Kuang, Chao Wang, Yuling Jiao, Fan Zhou

Comments: Journal of the American Statistical Association

2604.12799 2026-04-27 cs.GT cs.AI cs.DS

Efficiency of Proportional Mechanisms in Online Auto-Bidding Advertising

Nguyen Kim Thang

2604.11594 2026-04-27 eess.AS cs.SD

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

Shuiyuan Wang, Zhixian Zhao, Hongfei Xue, Chengyou Wang, Shuai Wang, Hui Bu, Xin Xu, Lei Xie

2604.07169 2026-04-27 stat.ML cs.LG cs.NA math.NA

FLUID: Flow-based Unified Inference for Dynamics

Tiangang Cui, Xiaodong Feng, Chenlong Pei, Xiaoliang Wan, Tao Zhou

Comments: 43 pages

2604.02861 2026-04-27 cs.DB cs.AI

LLM+Graph@VLDB'2025 Workshop Summary

Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan, Shu Wang