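arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.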

2604.15849 2026-04-20 cs.SD

TinyMU: A Compact Audio-Language Model for Music Understanding

Xiquan Li, Aurian Quelennec, Slim Essid

Comments ICASSP 2026

English abstract

Music understanding and reasoning are central challenges in the Music Information Research field, with applications ranging from retrieval and recommendation to music agents and virtual assistants. Recent Large Audio-Language Models (LALMs) have shown remarkable progress in answering music-related questions by following user instructions. However, their massive scale, often billions of parameters, results in expensive training, slow inference, and limited deployability on edge devices. In this work, we present TinyMU, a lightweight (229M) Music-Language Model (MLM) that achieves performance comparable to much larger LALMs while remaining efficient and compact. To train TinyMU, we introduce MusicSkills-3.5M, a carefully curated, music-grounded question-answering dataset with 3.5M samples. Spanning multiple-choice, binary, and open-ended formats, this dataset provides fine-grained supervision across diverse musical concepts. For its architecture, TinyMU leverages MATPAC++, the SOTA self-supervised audio encoder for fine-grained feature extraction. Paired with a lightweight linear projector, it efficiently aligns audio embeddings with the language model. Through extensive evaluation, we show that TinyMU performs strongly in both basic music understanding and complex reasoning. Notably, on the MuChoMusic benchmark, it achieves 82% of SOTA LALM's performance despite being 35x smaller, highlighting the potential of small MLMs under constrained computational budgets.

2604.15847 2026-04-20 cs.CL

CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization

Junyi Li, Yongqiang Chen, Ningning Ding

Comments Accepted by ACL 2026 Main Conference

2604.15842 2026-04-20 cs.CL

Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms

Tanja Baeumel, Josef van Genabith, Simon Ostermann

Comments MathNLP 2025

2604.15840 2026-04-20 cs.CL

CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

Shidong Yang, Ziyu Ma, Tongwen Huang, Yiming Hu, Yong Wang, Xiangxiang Chu

Comments Accepted to ACL 2026

2604.15839 2026-04-20 cs.AI cs.CL cs.LO

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

Chengwu Liu, Yichun Yin, Ye Yuan, Jiaxuan Xie, Botao Li, Siqi Li, Jianhao Shen, Yan Xu, Lifeng Shang, Ming Zhang

Comments ACL 2026 Main Conference

2604.15838 2026-04-20 cs.LG

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

Zhaobo Hu, Vincent Gauthier, Mehdi Naima

2604.15837 2026-04-20 cs.AI

Stein Variational Black-Box Combinatorial Optimization

Thomas Landais, Olivier Goudet, Adrien Goëffon, Frédéric Saubion, Sylvain Lamprier

2604.15833 2026-04-20 cs.LG

Modern Structure-Aware Simplicial Spatiotemporal Neural Network

Zhaobo Hu, Vincent Gauthier, Mehdi Naima

2604.15829 2026-04-20 cs.CV cs.CR

Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration

Jun Li, Lizhi Xiong, Ziqiang Li, Weiwei Jiang, Zhangjie Fu, Yong Li, Guo-Sen Xie

Comments 25 pages, accepted by CVPR 2026

2604.15828 2026-04-20 cs.CV

SSFT: A Lightweight Spectral-Spatial Fusion Transformer for Generic Hyperspectral Classification

Alexander Musiat, Nikolas Ebert, Oliver Wasenmüller

Comments This paper has been accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2026

2604.15823 2026-04-20 cs.CV

Watching Movies Like a Human: Egocentric Emotion Understanding for Embodied Companions

Ze Dong, Hao Shi, Zejia Gao, Zhonghua Yi, Kaiwei Wang, Lin Wang

Comments 15 pages

2604.15822 2026-04-20 cs.LG cs.AI cs.CE cs.NE eess.SP

ECG-Lens: Benchmarking ML & DL Models on PTB-XL Dataset

Saloni Garg, Ukant Jadia, Amit Sagtani, Kamal Kant Hiran

Comments 8 pages, 4 figures, 3 tables

DOI: 10.1109/ETNCC63262.2024.10767459
Journal ref: 2024 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC), 2024, pp. 1-8

English abstract

Automated classification of electrocardiogram (ECG) signals is a useful tool for diagnosing and monitoring cardiovascular diseases. This study compares three traditional machine learning algorithms (Decision Tree Classifier, Random Forest Classifier, and Logistic Regression) and three deep learning models (Simple Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Complex CNN (ECG-Lens)) for the classification of ECG signals from the PTB-XL dataset, which contains 12-lead recordings from normal patients and patients with various cardiac conditions. The DL models were trained on raw ECG signals, allowing them to automatically extract discriminative features. Data augmentation using the Stationary Wavelet Transform (SWT) was applied to enhance model performance, increase the diversity of training samples, and preserve the essential characteristics of the ECG signals. The models were evaluated using multiple metrics, including accuracy, precision, recall, F1-score, and ROC-AUC. The ECG-Lens model achieved the highest performance, with 80% classification accuracy and a 90% ROC-AUC. These findings demonstrate that deep learning architectures, particularly complex CNNs, substantially outperform traditional ML methods on raw 12-lead ECG data, and provide a practical benchmark for selecting automated ECG classification models and identifying directions for condition-specific model development.

2604.15814 2026-04-20 cs.CV cs.RO

Continual Hand-Eye Calibration for Open-world Robotic Manipulation

Fazeng Li, Gan Sun, Chenxi Liu, Yao He, Wei Cong, Yang Cong

2604.15809 2026-04-20 cs.CV

Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow

Chengxin Liu, Wonseok Choi, Chenshuang Zhang, Tae-Hyun Oh

Comments CVPR 2026. Project page: https://cxliu0.github.io/AIF/

2604.15808 2026-04-20 cs.CV cs.AI

Beyond a Single Frame: Multi-Frame Spatially Grounded Reasoning Across Volumetric MRI

Lama Moukheiber, Caleb M. Yeung, Haotian Xue, Alec Helbling, Zelin Zhao, Yongxin Chen

2604.15805 2026-04-20 cs.RO cs.AI

From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation

Jasper Lu, Zhenhao Shen, Yuanfei Wang, Shugao Liu, Shengqiang Xu, Shawn Xie, Jingkai Xu, Feng Jiang, Jade Yang, Chen Xie, Ruihai Wu

2604.15802 2026-04-20 cs.CL

CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents

Hyunseok Park, Jihyeon Kim, Jongeun Kim, Dongsik Yoon

2604.15795 2026-04-20 cs.CV

Fed3D: Federated 3D Object Detection

Suyan Dai, Chenxi Liu, Fazeng Li, Peican Lin

2604.15794 2026-04-20 cs.LG cs.AI cs.CL

Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting

Chi Liu, Xin Chen, Xu Zhou, Fangbo Tu, Srinivasan Manoharan

Comments 14 pages, 8 figures

2604.15791 2026-04-20 cs.LG

Convolutionally Low-Rank Models with Modified Quantile Regression for Interval Time Series Forecasting

Miaoxuan Zhu, Yi Yu, Yuyang Li, Wei Li, Guangcan Liu

2604.15789 2026-04-20 cs.CL

A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang

2604.15787 2026-04-20 cs.LG cs.AI

EVIL: Evolving Interpretable Algorithms for Zero-Shot Inference on Event Sequences and Time Series with LLMs

David Berghaus

2604.15783 2026-04-20 cs.LG

Similarity-Based Bike Station Expansion via Hybrid Denoising Autoencoders

Oluwaleke Yusuf, M. Tsaqif Wismadi, Adil Rasheed

Comments 10 pages, 9 figures. Code available at https://github.com/Outsiders17711/TCB-SimilarityAE-Expansion

2604.15782 2026-04-20 cs.LG physics.soc-ph

Fusing Cellular Network Data and Tollbooth Counts for Urban Traffic Flow Estimation

Oluwaleke Yusuf, Shaira Tabassum

Comments 8 pages, 7 figures

2604.15780 2026-04-20 cs.LG cs.CL

Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs

Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang

2604.15777 2026-04-20 cs.CV cs.AI

SegMix: Shuffle-based Feedback Learning for Semantic Segmentation of Pathology Images

Zhiling Yan, Sicheng Chen, Tianyi Zhang, Nan Ying, Yanli Lei, Guanglei Zhang

2604.15776 2026-04-20 cs.CL cs.AI

PIIBench: A Unified Multi-Source Benchmark Corpus for Personally Identifiable Information Detection

Pritesh Jha

DOI: 10.5281/zenodo.19533263

English abstract

We present PIIBench, a unified benchmark corpus for Personally Identifiable Information (PII) detection in natural language text. Existing resources for PII detection are fragmented across domain-specific corpora with mutually incompatible annotation schemes, preventing systematic comparison of detection systems. We consolidate ten publicly available datasets spanning synthetic PII corpora, multilingual Named Entity Recognition (NER) benchmarks, and financial-domain annotated text, yielding a corpus of 2,369,883 annotated sequences and 3.35 million entity mentions across 48 canonical PII entity types. We develop a principled normalization pipeline that maps 80+ source-specific label variants to a standardized BIO tagging scheme, applies frequency-based suppression of near-absent entity types, and produces stratified 80/10/10 train/validation/test splits preserving source distribution. To establish baseline difficulty, we evaluate eight published systems spanning rule-based engines (Microsoft Presidio), general-purpose NER models (spaCy, BERT-base NER, XLM-RoBERTa NER, SpanMarker mBERT, SpanMarker BERT), a PII-specific model (Piiranha DeBERTa), and a financial NER specialist (XtremeDistil FiNER). All systems achieve span-level F1 below 0.14, with the best system (Presidio, F1=0.1385) still producing zero recall on most entity types. These results directly quantify the domain-silo problem and demonstrate that PIIBench presents a substantially harder and more comprehensive evaluation challenge than any existing single-source PII dataset. The dataset construction pipeline and benchmark evaluation code are publicly available at https://github.com/pritesh-2711/pii-bench.
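
The three normalization steps named in the abstract (label mapping to a shared BIO scheme, frequency-based suppression, and source-stratified 80/10/10 splitting) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the `LABEL_MAP` entries, the record layout (`source`, `tags`), the `min_mentions` threshold, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical mapping from source-specific labels to canonical PII types.
LABEL_MAP = {
    "PER": "PERSON",
    "PERSON_NAME": "PERSON",
    "EMAIL_ADDRESS": "EMAIL",
    "EMAIL": "EMAIL",
}


def normalize_tags(tags, label_map):
    """Rewrite each BIO tag with its canonical type; unmapped types become 'O'."""
    out = []
    for tag in tags:
        if tag == "O":
            out.append("O")
            continue
        prefix, _, label = tag.partition("-")  # e.g. "B-PER" -> ("B", "-", "PER")
        canonical = label_map.get(label)
        out.append(f"{prefix}-{canonical}" if canonical else "O")
    return out


def suppress_rare_types(records, min_mentions):
    """Replace entity types with fewer than min_mentions mentions by 'O'.

    A mention is counted once per B- tag (assumed threshold rule).
    """
    counts = Counter(
        tag[2:] for rec in records for tag in rec["tags"] if tag.startswith("B-")
    )
    keep = {label for label, n in counts.items() if n >= min_mentions}
    return [
        {**rec, "tags": [t if t == "O" or t[2:] in keep else "O" for t in rec["tags"]]}
        for rec in records
    ]


def stratified_split(records, seed=0):
    """Split 80/10/10 within each source so every split keeps the source mix."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["source"]].append(rec)
    train, val, test = [], [], []
    for source in sorted(by_source):
        group = by_source[source][:]
        rng.shuffle(group)
        n_train = len(group) * 8 // 10
        n_val = len(group) // 10
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Splitting inside each source group, rather than over the pooled corpus, is what keeps the per-source proportions roughly equal across train, validation, and test; small groups lose some precision to integer rounding.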

2604.15775 2026-04-20 cs.LG hep-ex quant-ph

Federated Learning with Quantum Enhanced LSTM for Applications in High Energy Physics

Abhishek Sawaika, Durga Pritam Suggisetti, Udaya Parampalli, Rajkumar Buyya

Comments 8 pages, 7 figures, accepted at IEEE WCCI, 2026

2604.15772 2026-04-20 cs.RO

Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)

Hürkan Şahin, Van Huyen Dang, Erdi Sayar, Alper Yegenoglu, Erdal Kayacan

Comments 6 pages, 5 figures

2604.15769 2026-04-20 cs.LG cs.AI

Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments 6 pages, 3 figures, 7 tables