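arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.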

2509.23727 2026-04-10 cs.SD cs.AI

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

Junyou Wang, Zehua Chen, Binjie Yuan, Kaiwen Zheng, Chang Li, Yuxuan Jiang, Jun Zhu

Comments Accepted at ICME 2026

Abstract

The design of diffusion-based audio generation systems has been investigated from diverse perspectives, such as data space, network architecture, and conditioning techniques, while most of these innovations require model re-training. In sampling, classifier-free guidance (CFG) has been uniformly adopted to enhance generation quality by strengthening condition alignment. However, CFG often compromises diversity, resulting in suboptimal performance. Although the recent autoguidance (AG) method proposes another direction of guidance that maintains diversity, its direct application in audio generation has so far underperformed CFG. In this work, we introduce AudioMoG, an improved sampling method that enhances text-to-audio (T2A) and video-to-audio (V2A) generation quality without requiring extensive training resources. We start with an analysis of both CFG and AG, examining their respective advantages and limitations for guiding diffusion models. Building upon our insights, we introduce a mixture-of-guidance framework that integrates diverse guidance signals with their interaction terms (e.g., the unconditional bad version of the model) to maximize cumulative advantages. Experiments show that, given the same inference speed, our approach consistently outperforms single guidance in T2A generation across sampling steps, concurrently showing advantages in V2A, text-to-music, and image generation. Demo samples are available at: https://audiomog.github.io.
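The abstract contrasts CFG and autoguidance (AG) but gives no formulas. The sketch below uses the standard forms from the CFG and autoguidance literature, applied to a denoiser's noise prediction:

- CFG extrapolates from an unconditional prediction toward the conditional one.
- AG extrapolates from a weaker ("bad") conditional model toward the main model.

`mixture_guidance` is a hypothetical illustration that adds both guidance directions with separate weights. It is not the AudioMoG formula, which according to the abstract also uses interaction terms such as an unconditional bad model. Plain Python lists stand in for tensors, and all argument names are illustrative.

```python
"""Sketch of guidance as linear extrapolation between denoiser predictions.

Each prediction is a plain list of floats standing in for a tensor. The
argument names (eps_cond, eps_uncond, eps_bad_cond) are illustrative.
"""

Vector = list[float]


def cfg(eps_cond: Vector, eps_uncond: Vector, w: float) -> Vector:
    """Classifier-free guidance: eps_u + w * (eps_c - eps_u)."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def autoguidance(eps_cond: Vector, eps_bad_cond: Vector, w: float) -> Vector:
    """Autoguidance: eps_bad + w * (eps_c - eps_bad), using a weaker conditional model."""
    return [b + w * (c - b) for c, b in zip(eps_cond, eps_bad_cond)]


def mixture_guidance(
    eps_cond: Vector,
    eps_uncond: Vector,
    eps_bad_cond: Vector,
    w_cfg: float,
    w_ag: float,
) -> Vector:
    """Hypothetical mixture: add both guidance directions to the conditional prediction.

    Setting w_ag = 1 recovers cfg(); setting w_cfg = 1 recovers autoguidance().
    This is an illustration, not the AudioMoG formula.
    """
    return [
        c + (w_cfg - 1.0) * (c - u) + (w_ag - 1.0) * (c - b)
        for c, u, b in zip(eps_cond, eps_uncond, eps_bad_cond)
    ]


if __name__ == "__main__":
    eps_c, eps_u, eps_b = [1.0, 2.0], [0.0, 1.0], [0.5, 1.5]
    print(cfg(eps_c, eps_u, 3.0))                           # [3.0, 4.0]
    print(mixture_guidance(eps_c, eps_u, eps_b, 3.0, 1.0))  # [3.0, 4.0], same as CFG
```

Both methods can be written as extrapolations from the conditional prediction (`c + (w - 1) * (c - other)`), so the mixture reduces to either one when the other weight is set to 1.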

2509.16750 2026-04-10 cs.LG

Interpretable Clinical Classification with Kolmogorov-Arnold Networks

Alejandro Almodóvar, Patricia A. Apellániz, Alba Garrido, Fernando Fernández-Salvador, Santiago Zazo, Juan Parras

Comments 29 pages

Abstract

The increasing use of machine learning in clinical decision support has been limited by the lack of transparency of many high-performing models. In clinical settings, predictions must be interpretable, auditable, and actionable. This study investigates Kolmogorov-Arnold Networks (KANs) as intrinsically interpretable alternatives to conventional black-box models for clinical classification of tabular health data, aiming to balance predictive performance with clinically meaningful transparency. We introduce two KAN-based models: the Logistic KAN, a flexible generalization of logistic regression, and the Kolmogorov-Arnold Additive Model (KAAM), an additive variant that yields transparent symbolic representations through feature-wise decomposability. Both models are evaluated on multiple public clinical datasets and compared with standard linear, tree-based, and neural baselines. Across all datasets, the proposed models achieve predictive performance comparable to or exceeding that of commonly used baselines while remaining fully interpretable. Logistic KAN obtains the highest overall ranking across evaluation metrics, with a mean reciprocal rank of 0.76, indicating consistently strong performance across tasks. KAAM provides competitive accuracy while offering enhanced transparency through feature-wise decomposability, patient-level visualizations, and nearest-patient retrieval, enabling direct inspection of individual predictions. KAN-based models provide a practical and trustworthy alternative to black-box models for clinical classification, offering a strong balance between predictive performance and interpretability for clinical decision support. By enabling transparent, patient-level reasoning and clinically actionable insights, the proposed models represent a promising step toward trustworthy AI in healthcare (code: https://github.com/Patricia-A-Apellaniz/classification_with_kans).
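The abstract credits KAAM's transparency to feature-wise decomposability. The logit is a sum of univariate functions, one per feature, so each feature's contribution can be read off directly. The sketch below illustrates that additive structure under two stated assumptions:

- Each univariate function is a piecewise-linear interpolation over fixed knots. KANs typically use learnable B-splines instead.
- The feature names, knots and values are made-up placeholders, not trained parameters.

It does not reproduce the authors' code.

```python
"""Hypothetical additive classifier: logit = bias + sum_i phi_i(x_i).

Each phi_i is a piecewise-linear shape function given by knots and values.
All feature names, knots, and values are placeholders, not learned parameters.
"""
import bisect
import math
from dataclasses import dataclass


@dataclass
class PiecewiseLinear:
    knots: list[float]   # strictly increasing input positions
    values: list[float]  # function value at each knot

    def __call__(self, x: float) -> float:
        # Clamp outside the knot range; interpolate linearly inside it.
        if x <= self.knots[0]:
            return self.values[0]
        if x >= self.knots[-1]:
            return self.values[-1]
        j = bisect.bisect_right(self.knots, x)
        x0, x1 = self.knots[j - 1], self.knots[j]
        y0, y1 = self.values[j - 1], self.values[j]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


@dataclass
class AdditiveClassifier:
    shape_functions: dict[str, PiecewiseLinear]
    bias: float = 0.0

    def contributions(self, patient: dict[str, float]) -> dict[str, float]:
        """Per-feature terms phi_i(x_i); with the bias they sum to the logit."""
        return {name: f(patient[name]) for name, f in self.shape_functions.items()}

    def predict_proba(self, patient: dict[str, float]) -> float:
        logit = self.bias + sum(self.contributions(patient).values())
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = AdditiveClassifier(
        shape_functions={
            "age": PiecewiseLinear([20.0, 50.0, 80.0], [-1.0, 0.0, 1.5]),
            "systolic_bp": PiecewiseLinear([90.0, 140.0, 180.0], [0.2, 0.0, 0.8]),
        },
        bias=-0.5,
    )
    patient = {"age": 65.0, "systolic_bp": 150.0}
    print(model.contributions(patient))  # each feature's share of the logit
    print(model.predict_proba(patient))
```

The logit is a plain sum, so each entry returned by `contributions()` can be inspected or plotted on its own. The abstract presents KAAM, not the Logistic KAN, as the additive variant with this property.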

2507.06949 2026-04-10 cs.CV

Ecological Legacies of Pre-Columbian Settlements Evident in Palm Clusters of Neotropical Mountain Forests

Sebastian Fajardo, Sina Mohammadi, Jonas Gregorio de Souza, César Ardila, Alan Tapscott Baltar, Shaddai Heidgen, Maria Isabel Mayorga Hernández, Sylvia Mota de Oliveira, Fernando Montejo, Marco Moderato, Vinicius Peripato, Katy Puche, Carlos Reina, Juan Carlos Vargas, Frank W. Takes, Marco Madella

2506.08514 2026-04-10 cs.LG

DiffGradCAM: A Class Activation Map Using the Full Model Decision to Solve Unaddressed Adversarial Attacks

Jacob Piland, Chris Sweet, Adam Czajka

2506.02978 2026-04-10 cs.LG

On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses

Mohamed Djilani, Thibault Simonetto, Karim Tit, Florian Tambon, Salah Ghamizi, Maxime Cordy, Mike Papadakis

Comments This work has been accepted for publication at the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2026). The final version will be available on IEEE Xplore.

2506.01062 2026-04-10 cs.CL cs.AI cs.LG

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

Thinh Pham, Nguyen Nguyen, Pratibha Zunjare, Weiyuan Chen, Yu-Min Tseng, Tu Vu

Comments Camera-ready version for ICLR 2026

2504.20656 2026-04-10 cs.LG cs.AI cs.CY cs.HC

Federated learning, ethics, and the double black box problem in medical AI

Joshua Hatherley, Anders Søgaard, Angela Ballantyne, Ruben Pauwels

2504.10284 2026-04-10 cs.CL

arXiv2Table: Toward Realistic Benchmarking and Evaluation for LLM-Based Literature-Review Table Generation

Weiqi Wang, Jiefu Ou, Yangqiu Song, Benjamin Van Durme, Daniel Khashabi

Comments ACL 2026 Main Conference

2504.04640 2026-04-10 cs.CL cs.AI

Splits! Flexible Sociocultural Linguistic Investigation at Scale

Eylon Caplan, Tania Chakraborty, Dan Goldwasser

Comments Accepted to ACL 2026 Main Conference

2503.18562 2026-04-10 cs.CL cs.AI cs.HC cs.LG

Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models

Nariman Naderi, Seyed Amir Ahmad Safavi-Naini, Thomas Savage, Zahra Atf, Peter Lewis, Girish Nadkarni, Ali Soroush

Comments 35 pages, 5 figures, 1 table, 7 supplementary figures

2503.13551 2026-04-10 cs.CL cs.AI

Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models

Teng Wang, Zhangyi Jiang, Zhenqi He, Shenyang Tong, Wenhan Yang, Yanan Zheng, Zeyu Li, Zifan He, Hailei Gong, Zewen Ye, Shengjie Ma, Jianping Zhang

2503.08907 2026-04-10 cs.LG physics.comp-ph physics.flu-dyn

From Models To Experiments: Shallow Recurrent Decoder Networks on the DYNASTY Experimental Facility

Stefano Riva, Andrea Missaglia, Carolina Introini, J. Nathan Kutz, Antonio Cammi

2503.01804 2026-04-10 cs.CL cs.AI cs.LG

SEM-CTRL: Semantically Controlled Decoding

Mohammad Albinhassan, Pranava Madhyastha, Alessandra Russo

Comments Published in Transactions on Machine Learning Research (TMLR), 03/2026

2502.19280 2026-04-10 cs.LG cs.DC cs.IR

Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing

Akash Dhasade, Rachid Guerraoui, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos

Comments To appear in the proceedings of DAIS 2026 (Distributed Applications and Interoperable Systems). An earlier version appeared at EuroMLSys 2025

2502.02514 2026-04-10 cs.CV cs.LG

Privacy Attacks on Image AutoRegressive Models

Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch, Adam Dziedzic

Comments Accepted at ICML 2025

2501.00773 2026-04-10 cs.LG cs.AI cs.DB

OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks

Haoyang Li, Yuming Xu, Alexander Zhou, Yongqi Zhang, Jason Chen Zhang, Lei Chen, Qing Li

2409.02136 2026-04-10 cs.LG cs.AI cs.CL

Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Mohammadreza Ghaffarzadeh-Esfahani, Mahdi Ghaffarzadeh-Esfahani, Arian Salahi-Niri, Hossein Toreyhi, Zahra Atf, Amirali Mohsenzadeh-Kermani, Mahshad Sarikhani, Zohreh Tajabadi, Fatemeh Shojaeian, Mohammad Hassan Bagheri, Aydin Feyzi, Mohammadamin Tarighatpayma, Narges Gazmeh, Fateme Heydari, Hossein Afshar, Amirreza Allahgholipour, Farid Alimardani, Ameneh Salehi, Naghmeh Asadimanesh, Mohammad Amin Khalafi, Hadis Shabanipour, Ali Moradi, Sajjad Hossein Zadeh, Omid Yazdani, Romina Esbati, Moozhan Maleki, Danial Samiei Nasr, Amirali Soheili, Hossein Majlesi, Saba Shahsavan, Alireza Soheilipour, Nooshin Goudarzi, Erfan Taherifard, Hamidreza Hatamabadi, Jamil S Samaan, Thomas Savage, Ankit Sakhuja, Ali Soroush, Girish Nadkarni, Ilad Alavi Darazam, Mohamad Amin Pourhoseingholi, Seyed Amir Ahmad Safavi-Naini

Comments Code is available at: https://github.com/mohammad-gh009/Large-Language-Models-vs-Classical-Machine-learning and https://github.com/Sdamirsa/Tehran_COVID_Cohort. The datasets are available from the corresponding author on reasonable request (sdamirsa@ymail.com)

2409.00084 2026-04-10 cs.CL cs.AI

Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models

Seyed Amir Ahmad Safavi-Naini, Shuhaib Ali, Omer Shahab, Zahra Shahhoseini, Thomas Savage, Sara Rafiee, Jamil S Samaan, Reem Al Shabeeb, Farah Ladak, Jamie O Yang, Juan Echavarria, Sumbal Babar, Aasma Shaukat, Samuel Margolis, Nicholas P Tatonetti, Girish Nadkarni, Bara El Kurdi, Ali Soroush

Comments Manuscript Pages: 34, Figures: 7, Tables: 2, Supplementary File Pages: 35, Data Transparency Statement: Code is available at: https://github.com/Sdamirsa/LLM-VLM-in-Gastroenterology . Study data from American College of Gastroenterology (ACG) are restricted and available upon request with ACG permission. Correction: updated abstract considering Llama3.1 results

DOI: 10.1038/s41746-025-02174-0
Journal ref: npj Digital Medicine 8, 797 (2025)

Abstract

Background and Aims: This study evaluates the medical reasoning performance of large language models (LLMs) and vision language models (VLMs) in gastroenterology. Methods: We used 300 gastroenterology board exam-style multiple-choice questions, 138 of which contain images, to systematically assess the impact of model configurations, parameters, and prompt engineering strategies using GPT-3.5. Next, we assessed the performance of proprietary and open-source LLMs (versions), including GPT (3.5, 4, 4o, 4omini), Claude (3, 3.5), Gemini (1.0), Mistral, Llama (2, 3, 3.1), Mixtral, and Phi (3), across different interfaces (web and API), computing environments (cloud and local), and model precisions (with and without quantization). Finally, we assessed accuracy using a semiautomated pipeline. Results: Among the proprietary models, GPT-4o (73.7%) and Claude3.5-Sonnet (74.0%) achieved the highest accuracy, outperforming the top open-source models: Llama3.1-405b (64%), Llama3.1-70b (58.3%), and Mixtral-8x7b (54.3%). Among the quantized open-source models, the 6-bit quantized Phi3-14b (48.7%) performed best. The scores of the quantized models were comparable to those of the full-precision models Llama2-7b, Llama2-13b, and Gemma2-9b. Notably, VLM performance on image-containing questions did not improve when the images were provided and worsened when LLM-generated captions were provided. In contrast, a 10% increase in accuracy was observed when images were accompanied by human-crafted image descriptions. Conclusion: While LLMs exhibit robust zero-shot performance in medical reasoning, the integration of visual data remains a challenge for VLMs. Effective deployment involves carefully determining optimal model configurations, with users weighing the high performance of proprietary models against the flexible adaptability of open-source models.
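The abstract mentions a semiautomated pipeline for scoring accuracy but does not describe it. One common pattern, shown here purely as an illustrative assumption, works in two stages:

1. Automatically extract the option letter a response commits to.
2. Route any response that cannot be parsed unambiguously to a human reviewer.

The regular expression, the A-E option range, and the names `extract_choice`, `grade` and `GradingReport` are all hypothetical.

```python
"""Illustrative semiautomated grading of multiple-choice answers.

Responses with exactly one detectable option letter are scored automatically;
everything else is queued for human review. The regex and the A-E option
range are assumptions, not the study's pipeline.
"""
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches phrases such as "Answer: C", "answer is (b)", "The answer is D".
ANSWER_RE = re.compile(r"answer\s*(?:is)?\s*[:\-]?\s*\(?([A-E])\b", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the single option letter a response commits to, else None."""
    letters = {letter.upper() for letter in ANSWER_RE.findall(response)}
    return letters.pop() if len(letters) == 1 else None


@dataclass
class GradingReport:
    correct: int = 0
    graded: int = 0
    needs_review: list[int] = field(default_factory=list)  # question indices

    @property
    def accuracy(self) -> float:
        """Accuracy over automatically graded items only."""
        return self.correct / self.graded if self.graded else 0.0


def grade(responses: list[str], answer_key: list[str]) -> GradingReport:
    report = GradingReport()
    for i, (response, key) in enumerate(zip(responses, answer_key)):
        choice = extract_choice(response)
        if choice is None:
            report.needs_review.append(i)  # a human grader resolves these
            continue
        report.graded += 1
        report.correct += int(choice == key.upper())
    return report


if __name__ == "__main__":
    report = grade(
        ["Answer: C", "The answer is (b).", "Either A or B could fit."],
        ["C", "A", "A"],
    )
    print(report.correct, report.graded, report.needs_review)  # 1 2 [2]
```

Responses that name two different letters are treated as ambiguous rather than guessed. In a full pipeline, the reviewed items would be added back before the final accuracy is reported.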

2407.09658 2026-04-10 cs.LG cs.CR

BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

Zhengyuan Jiang, Xingyu Lyu, Shanghao Shi, Yang Xiao, Yimin Chen, Y. Thomas Hou, Wenjing Lou, Ning Wanga

DOI: 10.3233/faia250914
Journal ref: ECAI 2025

Abstract

Federated learning (FL), while a promising approach for collaborative model training, is susceptible to backdoor attacks due to its decentralized nature. Backdoor attacks are notably stealthy, as they compromise model predictions only when inputs contain specific triggers. As a countermeasure, anomaly detection is widely used to filter out backdoor attacks in FL. However, the non-independent and identically distributed (non-IID) data of FL clients makes backdoor detection substantially harder, because data variety introduces variance among benign models, making them indistinguishable from malicious ones. In this work, we propose BoBa, a novel distribution-aware backdoor detection mechanism, to address this problem. To differentiate outliers arising from data variety from those caused by backdoor attacks, we break the problem into two steps: clustering clients by their data distributions, followed by voting-based detection. We propose a novel data distribution inference mechanism for accurate data distribution estimation. To improve detection robustness, we introduce an overlapping clustering method in which each client is associated with multiple clusters, ensuring that the trustworthiness of a model update is assessed collectively by multiple clusters rather than by a single one. Through extensive evaluations, we demonstrate that BoBa can reduce the attack success rate to below 0.001 while maintaining high main-task accuracy across various attack strategies and experimental settings.
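The abstract describes two stages. First, clients are grouped into overlapping clusters by their inferred data distributions. Second, each client's update is judged by a vote across every cluster it belongs to. The sketch below shows only the voting stage and rests on three labelled assumptions:

- Cluster memberships are given as input rather than inferred.
- Updates are flattened to plain vectors.
- A cluster flags a member whose distance to the cluster's coordinate-wise median exceeds a hypothetical multiple `tau` of the median distance.

BoBa's actual distribution-inference and scoring rules are not reproduced here.

```python
"""Illustrative voting over overlapping client clusters (not BoBa's exact rules).

Cluster memberships are given as input; a cluster flags a member whose update
lies far from the cluster's coordinate-wise median, and a client is reported
only if a strict majority of its clusters flag it.
"""
import math
import statistics
from collections import defaultdict

Update = list[float]


def _distance(a: Update, b: Update) -> float:
    """Euclidean distance between two flattened model updates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_flags(members: list[int], updates: dict[int, Update], tau: float) -> dict[int, bool]:
    """Flag members whose distance to the median update exceeds tau * median distance."""
    dims = len(updates[members[0]])
    center = [statistics.median(updates[m][d] for m in members) for d in range(dims)]
    dists = {m: _distance(updates[m], center) for m in members}
    scale = max(statistics.median(dists.values()), 1e-12)  # avoid a zero threshold
    return {m: dists[m] > tau * scale for m in members}


def vote(clusters: list[list[int]], updates: dict[int, Update], tau: float = 3.0) -> set[int]:
    """Return clients flagged by a strict majority of the clusters they belong to."""
    ballots: dict[int, list[bool]] = defaultdict(list)
    for members in clusters:
        for client, flagged in cluster_flags(members, updates, tau).items():
            ballots[client].append(flagged)
    return {c for c, b in ballots.items() if sum(b) > len(b) / 2}


if __name__ == "__main__":
    updates = {
        0: [1.0, 1.0], 1: [1.1, 0.9], 2: [0.9, 1.1],
        3: [1.0, 1.2], 4: [1.2, 1.0], 5: [10.0, 10.0],  # client 5 is the outlier
    }
    clusters = [[0, 1, 2, 5], [2, 3, 4, 5], [0, 3, 4, 5]]  # overlapping memberships
    print(vote(clusters, updates))  # {5}
```

With overlapping memberships, a benign client that looks unusual in one cluster can still be cleared by the others. This mirrors the abstract's motivation for assessing each update across multiple clusters.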

2404.02696 2026-04-10 cs.LG

Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition

Behrooz Razeghi, Parsa Rahimi, Sébastien Marcel

2403.14922 2026-04-10 cs.LG cs.NI

CODA: A Continuous Online Evolve Framework for Deploying HAR Sensing Systems

Minghui Qiu, Jun Chen, Lin Chen, Shuxin Zhong, Yandao Huang, Lu Wang, Kaishun Wu

2401.14992 2026-04-10 cs.LG cs.DB

Graph-based Active Learning for Entity Cluster Repair

Victor Christen, Daniel Obraczka, Marvin Hofer, Martin Franke, Erhard Rahm

2604.08194 2026-04-10 cs.LG cs.NA math.NA

Approximation of the Basset force in the Maxey-Riley-Gatignol equations via universal differential equations

Finn Sommer, Vamika Rathi, Sebastian Goetschel, Daniel Ruprecht

Comments 24 pages, 15 figures

2604.08192 2026-04-10 cs.LG cs.CV

Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings

Yunxiang Peng, Mengmeng Ma, Ziyu Yao, Xi Peng

Comments CVPR 2026 (Highlight)

2604.08189 2026-04-10 cs.LG

Equivariant Efficient Joint Discrete and Continuous MeanFlow for Molecular Graph Generation

Rongjian Xu, Teng Pang, Zhiqiang Dong, Guoqiang Wu

2604.08185 2026-04-10 cs.RO

State and Trajectory Estimation of Tensegrity Robots via Factor Graphs and Chebyshev Polynomials

Edgar Granados, Patrick Meng, Charles Tang, Shrimed Sangani, William R. Johnson, Rebecca Kramer-Bottiglio, Kostas Bekris

Comments Accepted at RoboSoft 2026

2604.08184 2026-04-10 cs.SD cs.AI

AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

Yuankun Xie, Haonan Cheng, Jiayi Zhou, Xiaoxuan Guo, Tao Wang, Jian Liu, Weiqiang Wang, Ruibo Fu, Xiaopeng Wang, Hengyan Huang, Xiaoying Huang, Long Ye, Guangtao Zhai

Comments Accepted to the ACM Multimedia 2026 Grand Challenge

2604.08181 2026-04-10 cs.LG

Long-Term Embeddings for Balanced Personalization

Andrii Dzhoha, Egor Malykh

2604.08174 2026-04-10 cs.LG

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning

Teng Pang, Zhiqiang Dong, Yan Zhang, Rongjian Xu, Guoqiang Wu, Yilong Yin

2604.08172 2026-04-10 cs.CV

On the Global Photometric Alignment for Low-Level Vision

Mingjia Li, Tianle Du, Hainuo Wang, Qiming Hu, Xiaojie Guo