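arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.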

2505.20638 2026-04-13 cs.SD cs.CV cs.MM eess.AS

Music Audio-Visual Question Answering Requires Specialized Multimodal Designs

Wenhao You, Xingjian Diao, Wenjun Huang, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Tingxuan Wu, Ming Cheng, Soroush Vosoughi, Jiang Gui

Comments Accepted to Annual Meeting of the Association for Computational Linguistics (ACL 2026). The first two authors contributed equally

2505.18600 2026-04-13 cs.CV cs.AI cs.LG

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye

Comments NeurIPS 2025 (Spotlight)

2505.12509 2026-04-13 cs.LG cs.AI

Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models

Junhao Liu, Haonan Yu, Zhenyu Yan, Xin Zhang

Comments Accepted to ACL 2026 Main Conference

2505.12318 2026-04-13 cs.LG

Task-agnostic Low-rank Residual Adaptation for Efficient Federated Continual Fine-Tuning

Feng Yu, Jia Hu, Geyong Min

2505.01218 2026-04-13 cs.LG cs.NE

Quantitative Attractor Analysis of High-Capacity Kernel Hopfield Networks

Akira Tamamori

Comments 17 pages, 7 figures; accepted to NOLTA, IEICE

2504.07031 2026-04-13 cs.LG

Reducing Class Bias In Data-Balanced Datasets Through Hardness-Based Resampling

Pawel Pukowski, Venet Osmani

Comments Submitted to Springer ML

2503.14075 2026-04-13 cs.CV cs.CL

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models

Zhenwei Shao, Mingyang Wang, Weijun Zhang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Jun Yu

Comments An extended version of our ICCV paper at https://openaccess.thecvf.com/content/ICCV2025/html/Shao_Growing_a_Twig_to_Accelerate_Large_Vision-Language_Models_ICCV_2025_paper.html

2503.06983 2026-04-13 cs.CV cs.RO

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

Jiahao Wang, Xiangyu Cao, Jiaru Zhong, Yuner Zhang, Zeyu Han, Haibao Yu, Chuang Zhang, Lei He, Shaobing Xu, Jianqiang Wang

Comments Accepted by AAAI 2026

2502.11941 2026-04-13 cs.LG cs.AI

Deep Spatio-Temporal Neural Network for Air Quality Reanalysis

Ammar Kheder, Benjamin Foreback, Lili Wang, Zhi-Song Liu, Michael Boy

2502.02345 2026-04-13 cs.LG

Low Rank Based Subspace Inference for the Laplace Approximation of Bayesian Neural Networks

Josua Faller, Jörg Martin

Comments for associated code, see https://github.com/josh3142/LowRankLaplaceApproximation

2501.15461 2026-04-13 cs.LG

Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space

Xin He, Yili Wang, Wenqi Fan, Xu Shen, Xin Juan, Rui Miao, Xin Wang

Comments Accepted by The Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025)

2501.14377 2026-04-13 cs.RO

Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight

Angel Romero, Ashwin Shenai, Ismail Geles, Elie Aljalbout, Davide Scaramuzza

Comments 8 pages, 6 Figures, accepted to IEEE ICRA 2026

2501.11568 2026-04-13 cs.LG

Graph Defense Diffusion Model

Xin He, Wenqi Fan, Yili Wang, Chengyi Liu, Rui Miao, Xin Juan, Xin Wang

Comments Accepted by The 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD 2026)

Abstract

Graph Neural Networks (GNNs) are highly vulnerable to adversarial attacks, which can greatly degrade their performance. Existing graph purification methods attempt to address this issue by filtering attacked graphs. However, they struggle to defend effectively against multiple types of adversarial attacks (e.g., targeted attacks and non-targeted attacks) simultaneously due to limited flexibility. Additionally, these methods lack comprehensive modeling of graph data, relying heavily on heuristic prior knowledge. To overcome these challenges, we introduce the Graph Defense Diffusion Model (GDDM), a flexible purification method that leverages the denoising and modeling capabilities of diffusion models. The iterative nature of diffusion models aligns well with the stepwise process of adversarial attacks, making them particularly suitable for defense. By iteratively adding and removing noise (edges), GDDM effectively purifies attacked graphs, restoring their original structures and features. Our GDDM consists of two key components: (1) Graph Structure-Driven Refiner, which preserves the basic fidelity of the graph during the denoising process and ensures that the generated graph remains consistent with the original scope; and (2) Node Feature-Constrained Regularizer, which removes residual impurities from the denoised graph, further enhancing the purification effect. By designing tailored denoising strategies to handle different types of adversarial attacks, we improve GDDM's adaptability to various attack scenarios. Furthermore, GDDM demonstrates strong scalability, leveraging its structural properties to seamlessly transfer across similar datasets without retraining. Extensive experiments on three real-world datasets demonstrate that GDDM outperforms state-of-the-art methods in defending against various adversarial attacks, showcasing its robustness and effectiveness.
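The abstract describes purification as iteratively adding and removing edges. As a rough illustration only, the sketch below implements a generic discrete forward-noising process on an undirected graph, in which every node pair is independently toggled with a scheduled probability `beta`. It is not GDDM's Graph Structure-Driven Refiner or Node Feature-Constrained Regularizer, and the function names, schedule, and example graph are hypothetical. Because each step acts like a binary symmetric channel on every node pair, the probability that a pair ends in a different state from where it started has the closed form $(1 - \prod_s (1 - 2\beta_s))/2$.

```python
import itertools
import random

def flip_edges(nodes, edges, beta, rng):
    """One forward-noising step: toggle each unordered node pair independently
    with probability beta (add the edge if absent, remove it if present)."""
    noisy = set(edges)
    for u, v in itertools.combinations(sorted(nodes), 2):
        if rng.random() < beta:
            if (u, v) in noisy:
                noisy.remove((u, v))
            else:
                noisy.add((u, v))
    return noisy

def forward_process(nodes, edges, betas, seed=0):
    """Apply a schedule of flip probabilities; return the graph after every step."""
    rng = random.Random(seed)
    # Store each undirected edge as (smaller, larger) so pairs match flip_edges.
    trajectory = [{tuple(sorted(e)) for e in edges}]
    for beta in betas:
        trajectory.append(flip_edges(nodes, trajectory[-1], beta, rng))
    return trajectory

def cumulative_flip_prob(betas):
    """Probability that a node pair's state differs from step 0 after all steps.
    Composing binary symmetric channels gives (1 - prod(1 - 2 * beta)) / 2."""
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - 2.0 * beta
    return (1.0 - prod) / 2.0

# Hypothetical example: a 5-node path graph noised over a 3-step schedule.
nodes = range(5)
clean = {(0, 1), (1, 2), (2, 3), (3, 4)}
schedule = [0.05, 0.1, 0.2]
trajectory = forward_process(nodes, clean, schedule)
print(len(trajectory))                           # 4: the clean graph plus one per step
print(round(cumulative_flip_prob(schedule), 3))  # 0.284
```

A diffusion-based purifier would pair a forward process like this with a learned reverse (denoising) model; that learned part, which is where GDDM's contributions lie, is not sketched here.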

2412.16107 2026-04-13 cs.RO

Allocation for Omnidirectional Aerial Robots: Incorporating Power Dynamics

Eugenio Cuniato, Mike Allenspach, Thomas Stastny, Helen Oleynikova, Roland Siegwart, Michael Pantic

2412.12686 2026-04-13 cs.CL

Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges

Yangfan Ye, Xiaocheng Feng, Xiachong Feng, Libo Qin, Yichong Huang, Lei Huang, Weitao Ma, Qichen Hong, Zhirui Zhang, Yunfei Lu, Xiaohui Yan, Duyu Tang, Dandan Tu, Bing Qin

Comments IEEE Transactions on Audio, Speech and Language Processing

2412.12242 2026-04-13 cs.CV cs.AI cs.LG

OmniPrism: Learning Disentangled Visual Concept for Image Generation

Yangyang Li, Daqing Liu, Wu Liu, Allen He, Xinchen Liu, Yongdong Zhang, Guoqing Jin

Comments WebPage available at https://tale17.github.io/omni/

2410.15001 2026-04-13 cs.LG stat.ML

FIT-GNN: Faster Inference Time for GNNs that 'FIT' in Memory Using Coarsening

Shubhajit Roy, Hrriday Ruparel, Kishan Ved, Anirban Dasgupta

Comments Published in Transactions on Machine Learning Research (TMLR), 2026. Available at https://openreview.net/forum?id=g7r7y2I7Sz

2410.09355 2026-04-13 cs.LG cs.AI stat.ML

On Divergence Measures for Training GFlowNets

Tiago da Silva, Eliezer de Souza da Silva, Diego Mesquita

Comments Accepted at NeurIPS 2024, https://openreview.net/forum?id=N5H4z0Pzvn

2410.04047 2026-04-13 cs.LG cs.AI

TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis

Wen Ye, Wei Yang, Defu Cao, Yizhou Zhang, Lumingyuan Tang, Jie Cai, Yan Liu

2407.20524 2026-04-13 cs.CL

Contrastive Feedback Mechanism for Simultaneous Speech Translation

Haotian Tan, Sakriani Sakti

Comments Accepted to Interspeech 2024 main conference

2404.10976 2026-04-13 cs.LG cs.AI cs.MA

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

Wei Duan, Jie Lu, Junyu Xuan

Comments Accepted by IJCAI 2024. Update Discussion

2403.19253 2026-04-13 cs.LG cs.MA

Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning

Wei Duan, Jie Lu, Junyu Xuan

Comments Accepted by IEEE TNNLS on 17-Nov-2024. Update Discussion

2402.02540 2026-04-13 cs.CV cs.CR

Embedding Non-Distortive Cancelable Face Template Generation

Dmytro Zakharov, Oleksandr Kuznetsov, Emanuele Frontoni, Natalia Kryvinska

2312.09436 2026-04-13 cs.RO cs.AI cs.LG cs.SY eess.SY

Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu

Comments 18 pages, 12 figures

2311.14756 2026-04-13 cs.LG cs.AI

Task-Distributionally Robust Data-Free Meta-Learning

Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao

2604.09521 2026-04-13 cs.IT cs.AI math.IT

Semantic Rate-Distortion for Bounded Multi-Agent Communication: Capacity-Derived Semantic Spaces and the Communication Cost of Alignment

Anthony T. Nixon

Comments 34 pages, 13 figures. Code: https://github.com/alch3mistdev/semantic-rate-distortion

Abstract

When two agents of different computational capacities interact with the same environment, they need not compress a common semantic alphabet differently; they can induce different semantic alphabets altogether. We show that the quotient POMDP $Q_{m,T}(M)$ - the unique coarsest abstraction consistent with an agent's capacity - serves as a capacity-derived semantic space for any bounded agent, and that communication between heterogeneous agents exhibits a sharp structural phase transition. Below a critical rate $R_{\text{crit}}$ determined by the quotient mismatch, intent-preserving communication is structurally impossible. In the supported one-way memoryless regime, classical side-information coding then yields exponential decay above the induced benchmark. Classical coding theorems tell you the rate once the source alphabet is fixed; our contribution is to derive that alphabet from bounded interaction itself. Concretely, we prove: (1) a fixed-$\varepsilon$ structural phase-transition theorem whose lower bound is fully general on the common-history quotient comparison; (2) a one-way Wyner-Ziv benchmark identification on quotient alphabets, with exact converse, exact operational equality for memoryless quotient sources, and an ergodic long-run bridge via explicit mixing bounds; (3) an asymptotic one-way converse in the shrinking-distortion regime $\varepsilon = O(1/T)$, proved from the message stream and decoder side information; and (4) alignment traversal bounds enabling compositional communication through intermediate capacity levels. Experiments on eight POMDP environments (including RockSample(4,4)) illustrate the phase transition, a structured-policy benchmark shows the one-way rate can drop by up to $19\times$ relative to the counting bound, and a shrinking-distortion sweep matches the regime of the asymptotic converse.
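For intuition on why decoder side information lowers the required rate, the sketch below computes two textbook quantities for a small, hypothetical joint distribution of a source symbol X (think of it as a quotient state) and decoder side information Y. The first is the counting bound $\log_2 |\mathcal{X}|$ for sending an index with no side information. The second is the conditional entropy $H(X \mid Y)$, which is the Slepian-Wolf rate for lossless coding with side information at the decoder and the zero-distortion value of the Wyner-Ziv rate under Hamming distortion. The function names and distribution are illustrative assumptions; the sketch does not reproduce the paper's quotient construction, its threshold $R_{\text{crit}}$, or its reported results.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(X | Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            # p(x, y) * log2 p(x | y), accumulated with a minus sign
            h -= p * math.log2(p / p_y[y])
    return h

def counting_bound(joint):
    """Bits per symbol to index the source alphabet with no side information."""
    return math.log2(len({x for (x, _) in joint}))

# Hypothetical source: X is uniform over 4 states, and the decoder's side
# information Y reveals the top bit of X (y = x // 2).
joint = {(x, x // 2): 0.25 for x in range(4)}
print(counting_bound(joint))       # 2.0
print(conditional_entropy(joint))  # 1.0
```

Here side information halves the rate relative to the counting bound; if Y determined X completely the rate would fall to 0, and if Y were independent of X it would stay at 2 bits.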

2604.09489 2026-04-13 cs.CR cs.AI cs.DC cs.LG

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan

Comments 21 pages, 9 figures, 7 tables

2604.09468 2026-04-13 eess.IV cs.CV

DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Cancer Image Classification

Muazzem Hussain Khan, Tasdid Hasnain, Md. Jamil khan, Ruhul Amin, Md. Shamim Reza, Md. Al Mehedi Hasan, Md Ashad Alam

Comments 25 pages, 9 figures

Abstract

In this study, we propose a deep Swin Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. To validate the effectiveness of the proposed architecture, extensive experiments were conducted on a comprehensive multi-cancer dataset covering breast cancer, oral cancer, lung and colon cancer, kidney cancer, and acute lymphocytic leukemia (ALL); both original and segmented images were analyzed to assess model robustness across heterogeneous clinical imaging conditions. Our approach is benchmarked against several state-of-the-art CNN and transfer learning models, including DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models. All models were trained and validated using a unified pipeline incorporating balanced data preprocessing, transfer learning, and fine-tuning strategies. The experimental results demonstrate that the proposed architecture consistently achieved superior performance, reaching 100% test accuracy on the lung-colon cancer and segmented leukemia datasets and up to 99.23% accuracy for breast cancer classification. The model also achieved near-perfect precision, recall, and F1 score, indicating highly stable performance across diverse cancer types. Overall, the proposed model establishes a highly accurate, interpretable, and robust multi-cancer classification system, provides a strong benchmark for future research, and offers a unified comparative assessment useful for designing reliable AI-assisted histopathological diagnosis and clinical decision-making.
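The abstract reports precision, recall, and F1 score alongside accuracy. As a minimal sketch of how such per-class scores are typically computed from predicted labels, the code below derives them from true-positive, false-positive, and false-negative counts. The paper does not state its averaging scheme, so the macro averaging here is an assumption, and the label names are hypothetical.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    out = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    metrics = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

# Hypothetical two-class example with one misclassified benign slide.
y_true = ["benign", "benign", "malignant", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Macro averaging weights every class equally, which matters when class sizes differ. With 100% test accuracy, as reported for the lung-colon and segmented leukemia datasets, every per-class score is 1.0 regardless of the averaging scheme.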

2604.09451 2026-04-13 q-bio.QM cs.LG

An Open-Source, Open Data Approach to Activity Classification from Triaxial Accelerometry in an Ambulatory Setting

Sepideh Nikookar, Edward Tian, Harrison Hoffman, Matthew Parks, J. Lucas McKay, Yashar Kiarashi, Tommy T. Thomas, Alex Hall, David W. Wright, Gari D. Clifford

2604.09446 2026-04-13 eess.SP cs.LG

Continuous Orthogonal Mode Decomposition: Haptic Signal Prediction in Tactile Internet

Mohammad Ali Vahedifar, Mojtaba Nazari, Qi Zhang