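arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.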

2604.06469 2026-04-09 cs.CV

Predicting Alzheimer's disease progression using rs-fMRI and a history-aware graph neural network

Mahdi Moghaddami, Mohammad-Reza Siadat, Austin Toma, Connor Laming, Huirong Fu

Comments Proc. SPIE 13926, Medical Imaging 2026: Computer-Aided Diagnosis, 1392604

Details

DOI: 10.1117/12.3088066
Journal ref: Proceedings Volume 13926, Medical Imaging 2026: Computer-Aided Diagnosis; 1392604 (2026)

English abstract

Alzheimer's disease (AD) is a neurodegenerative disorder that affects more than seven million people in the United States alone. AD currently has no cure, but there are ways to potentially slow its progression if caught early enough. In this study, we propose a graph neural network (GNN)-based model for predicting whether a subject will transition to a more severe stage of cognitive impairment at their next clinical visit. We consider three stages of cognitive impairment in order of severity: cognitively normal (CN), mild cognitive impairment (MCI), and AD. We use functional connectivity graphs derived from resting-state functional magnetic resonance imaging (rs-fMRI) scans of 303 subjects, each with a different number of visits. Our GNN-based model incorporates a recurrent neural network (RNN) block, enabling it to process data from the subject's entire visit history. It can also work with irregular time gaps between visits by incorporating visit distance information into our input features. Our model demonstrates robust predictive performance, even with missing visits in the subjects' visit histories. It achieves an accuracy of 82.9%, with an especially impressive accuracy of 68.8% on CN to MCI conversions - a task that poses a substantial challenge in the field. Our results highlight the effectiveness of rs-fMRI in predicting the onset of MCI or AD and, in conjunction with other modalities, could offer a viable method for enabling timely interventions to slow the progression of cognitive impairment.
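
The abstract above mentions handling irregular time gaps by adding visit distance information to the input features. The following is a minimal sketch of that general idea only, not the authors' implementation: the function name `add_visit_gaps`, the use of months as the time unit, and the choice of a gap of 0 for the first visit are all assumptions made for illustration.

```python
from typing import List, Sequence


def add_visit_gaps(visit_months: Sequence[float],
                   features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append the gap (in months) since the previous visit to each visit's features.

    Hypothetical helper: the paper does not specify how visit distance is encoded.
    """
    if len(visit_months) != len(features):
        raise ValueError("one timestamp per visit is required")
    augmented = []
    previous = None
    for month, feats in zip(visit_months, features):
        # Assumption: the first visit has no predecessor, so its gap is 0.
        gap = 0.0 if previous is None else float(month - previous)
        augmented.append(list(feats) + [gap])
        previous = month
    return augmented


# Three visits at months 0, 6 and 24; the skipped visits show up as a larger gap.
history = add_visit_gaps([0, 6, 24], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
```

A sequence model reading `history` sees the gap as one more input feature per visit, which is one plausible way for a recurrent block to account for missing visits.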


2604.06467 2026-04-09 cs.CV

PhysHead: Simulation-Ready Gaussian Head Avatars

Berna Kabadayi, Vanessa Sklyarova, Wojciech Zielonka, Justus Thies, Gerard Pons-Moll

Comments Project Page: see https://phys-head.github.io/; Youtube Video: see https://www.youtube.com/watch?v=k68fsSSwzc0; Accepted to CVPR 2026

2604.06465 2026-04-09 cs.CL cs.AI

Multi-objective Evolutionary Merging Enables Efficient Reasoning Models

Mario Iacobelli, Adrian Robert Minut, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Iacopo Masi, Emanuele Rodolà

2604.06464 2026-04-09 cs.LG physics.app-ph stat.ML

Weighted Bayesian Conformal Prediction

Xiayin Lou, Peng Luo

2604.06456 2026-04-09 cs.CL

Context-Aware Dialectal Arabic Machine Translation with Interactive Region and Register Selection

Afroza Nowshin, Prithweeraj Acharjee Porag, Haziq Jeelani, Fayeq Jeelani Syed

Comments 14 pages, 5 figures, 5 tables. Preprint under review

2604.06452 2026-04-09 cs.CL

Learning to Interrupt in Language-based Multi-agent Communication

Danqing Wang, Da Yin, Ruta Desai, Lei Li, Asli Celikyilmaz, Ansong Ni

2604.06451 2026-04-09 cs.LG

Quality-preserving Model for Electronics Production Quality Tests Reduction

Noufa Haneefa, Teddy Lazebnik, Einav Peretz-Andersson

2604.06440 2026-04-09 cs.CV cs.LG

Visual prompting reimagined: The power of the Activation Prompts

Yihua Zhang, Hongkang Li, Yuguang Yao, Aochuan Chen, Shuai Zhang, Pin-Yu Chen, Meng Wang, Sijia Liu

Comments AISTATS 2026

2604.06435 2026-04-09 cs.CV cs.AI

Continual Visual Anomaly Detection on the Edge: Benchmark and Efficient Solutions

Manuel Barusco, Francesco Borsatti, David Petrovic, Davide Dalle Pezze, Gian Antonio Susto

2604.06427 2026-04-09 cs.LG cs.AI cs.CL

The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning

Yi Xu, Philipp Jettkant, Laura Ruis

Comments 10 pages, 3 figures, 1 table (30 pages, 9 figures, 10 tables including references and appendices)

2604.06424 2026-04-09 cs.CL cs.AI

Team Fusion@ SU@ BC8 SympTEMIST track: transformer-based approach for symptom recognition and linking

Georgi Grazhdanski, Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Comments 6 pages, 3 tables, Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models, American Medical Informatics Association 2023 Annual Symposium

2604.06422 2026-04-09 cs.CL cs.AI cs.CV

When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

Jonathan Nemitz, Carsten Eickhoff, Junyi Jessy Li, Kyle Mahowald, Michal Golovanevsky, William Rudman

2604.06421 2026-04-09 cs.CL

State-of-the-Art Arabic Language Modeling with Sparse MoE Fine-Tuning and Chain-of-Thought Distillation

Navan Preet Singh, Anurag Garikipati, Ahmed Abulkhair, Jyani Akshay Jagdishbhai, Atul Yaduvanshi, Amarendra Chaudhary, Madalina Ciobanu, Qingqing Mao, Ritankar Das

Details

English abstract

This paper introduces Arabic-DeepSeek-R1, an application-driven open-source Arabic LLM that leverages a sparse MoE backbone to address the digital equity gap for under-represented languages, and establishes a new SOTA across the entire Open Arabic LLM Leaderboard (OALL). Our four-phase CoT distillation scheme integrates Arabic-specific linguistic verification and regional ethical norms into a 372M-token, contamination-controlled 80/20 Arabic-English training mixture. Arabic-DeepSeek-R1 achieves the highest average score across the seven-benchmark OALL suite while establishing SOTA or near-SOTA, including dominant results on grammar-focused MadinahQA (surpassing both GPT-5.1 and the OALL leader by substantial margins), safety-oriented AraTrust, multi-ability AlGhafa, and retrieval-augmented ALRAGE. Our results indicate that the combination of sparse MoE architecture, culturally-informed CoT distillation with explicit Arabic linguistic checks, and strategic bilingual data curation enables an open-source adapted model to systematically outperform the proprietary frontier system GPT-5.1 on the majority of benchmarks evaluating comprehensive language-specific tasks: the first such demonstration for Arabic LLMs. These findings indicate that much of Arabic's performance deficit in current LLM ecosystems stems from under-specialization rather than architectural limitations, and that parameter-efficient adaptation of open reasoning models can yield breakthrough SOTA performance without industrial-scale pretraining costs. Arabic-DeepSeek-R1 establishes a validated and replicable framework for sovereign and domain-specific language technologies, demonstrating that strategic, culturally-grounded adaptation of sparse MoE backbones offers a viable and cost-effective pathway to achieving record-breaking performance across standardized benchmarks for low-resource languages.


2604.06416 2026-04-09 cs.CL cs.AI cs.LG

Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries

Rebecca M. M. Hicke, Sil Hamilton, David Mimno, Ross Deans Kristensen-McLachlan

2604.06413 2026-04-09 cs.LG

ODE-free Neural Flow Matching for One-Step Generative Modeling

Xiao Shou

2604.06405 2026-04-09 cs.AI cs.DB

BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

Roque Lopez, Yurong Liu, Christos Koutras, Juliana Freire

2604.06403 2026-04-09 cs.CL cs.AI

FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts

Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

Comments 8 pages, 1 figure, 6 tables, Challenge and Workshop BC9 Large Language Models for Clinical and Biomedical NLP, International Joint Conference on Artificial Intelligence IJCAI 2025

2604.06401 2026-04-09 cs.AI cs.CE cs.CV cs.LG

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Kranthi Kommuru, Kunal Khanvilkar, Gaurav Parekh

2604.06395 2026-04-09 cs.LG q-bio.NC stat.ML

Bridging Theory and Practice in Crafting Robust Spiking Reservoirs

Ruggero Freddi, Nicolas Seseri, Diana Nigrisoli, Alessio Basti

2604.06393 2026-04-09 cs.CL

ART: Attention Replacement Technique to Improve Factuality in LLMs

Ziqin Luo, Yihao Quan, Xiaofeng Zhang, Xiaosong Yuan, Chen Shen

2604.06392 2026-04-09 cs.AI cs.MA cs.SE

Qualixar OS: A Universal Operating System for AI Agent Orchestration

Varun Pratap Bhardwaj

Comments 20 pages, 7 figures, 8 tables. Zenodo DOI: 10.5281/zenodo.19454219

2604.06391 2026-04-09 cs.LG cs.AI

Toward a universal foundation model for graph-structured data

Sakib Mostafa, Lei Xing, Md. Tauhidul Islam

Comments 19 pages, 5 figures, 12 supplementary figures

Details

English abstract

Graphs are a central representation in biomedical research, capturing molecular interaction networks, gene regulatory circuits, cell-cell communication maps, and knowledge graphs. Despite their importance, currently there is not a broadly reusable foundation model available for graph analysis comparable to those that have transformed language and vision. Existing graph neural networks are typically trained on a single dataset and learn representations specific only to that graph's node features, topology, and label space, limiting their ability to transfer across domains. This lack of generalization is particularly problematic in biology and medicine, where networks vary substantially across cohorts, assays, and institutions. Here we introduce a graph foundation model designed to learn transferable structural representations that are not tied to specific node identities or feature schemes. Our approach leverages feature-agnostic graph properties, including degree statistics, centrality measures, community structure indicators, and diffusion-based signatures, and encodes them as structural prompts. These prompts are integrated with a message-passing backbone to embed diverse graphs into a shared representation space. The model is pretrained once on heterogeneous graphs and subsequently reused on unseen datasets with minimal adaptation. Across multiple benchmarks, our pretrained model matches or exceeds strong supervised baselines while demonstrating superior zero-shot and few-shot generalization on held-out graphs. On the SagePPI benchmark, supervised fine-tuning of the pretrained backbone achieves a mean ROC-AUC of 95.5%, a gain of 21.8% over the best supervised message-passing baseline. The proposed technique thus provides a unique approach toward reusable, foundation-scale models for graph-structured data in biomedical and network science applications.
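
Degree statistics are the simplest of the feature-agnostic properties the abstract lists. The sketch below, which is not taken from the paper, shows how such a descriptor can be computed from an edge list without touching any node features. The function name `degree_statistics`, the choice of summary values, and the assumption of a simple undirected graph are illustrative only.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, Iterable, Tuple


def degree_statistics(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[str, float]:
    """Summarize the degree distribution of an undirected graph given as an edge list.

    Hypothetical helper: assumes no duplicate edges; a self-loop counts twice.
    Nodes without any edge are not visible in an edge list and are ignored.
    """
    degree: Dict[Hashable, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    values = list(degree.values())
    if not values:
        return {"mean": 0.0, "std": 0.0, "min": 0.0, "max": 0.0}
    return {
        "mean": float(mean(values)),
        "std": float(pstdev(values)),  # population standard deviation
        "min": float(min(values)),
        "max": float(max(values)),
    }


# A star graph: node 0 is connected to nodes 1, 2 and 3.
stats = degree_statistics([(0, 1), (0, 2), (0, 3)])
```

Because the result depends only on topology, the same descriptor can be computed for graphs whose node features have nothing in common, which is the property the abstract relies on for transfer.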


2604.06389 2026-04-09 cs.AI

SELFDOUBT: Uncertainty Quantification for Reasoning LLMs via the Hedge-to-Verify Ratio

Satwik Pandey, Suresh Raghu, Shashwat Pandey

Comments 9 pages, 4 figures, 4 tables, plus appendix. Submitted to COLM 2026

2604.06387 2026-04-09 cs.RO cs.AI

Uncertainty Estimation for Deep Reconstruction in Actuatic Disaster Scenarios with Autonomous Vehicles

Samuel Yanes Luis, Alejandro Casado Pérez, Alejandro Mendoza Barrionuevo, Dame Seck Diop, Sergio Toral Marín, Daniel Gutiérrez Reina

2604.06385 2026-04-09 cs.CL

Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning

Navan Preet Singh, Xiaokun Wang, Anurag Garikipati, Madalina Ciobanu, Qingqing Mao, Ritankar Das

Comments * These authors contributed equally to this work and share first authorship

2604.06382 2026-04-09 cs.RO

Designing Privacy-Preserving Visual Perception for Robot Navigation Based on User Privacy Preferences

Xuying Huang, Sicong Pan, Delphine Reinhardt, Maren Bennewitz

2604.06377 2026-04-09 cs.LG cs.AI

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu

2604.06376 2026-04-09 cs.CV

MTA-Agent: An Open Recipe for Multimodal Deep Search Agents

Xiangyu Peng, Can Qin, An Yan, Xinyi Yang, Zeyuan Chen, Ran Xu, Chien-Sheng Wu

Details

English abstract

Multimodal large language models (MLLMs) have demonstrated strong capabilities in visual understanding, yet they remain limited in complex, multi-step reasoning that requires deep searching and integrating visual evidence with external knowledge. In this work, we address this challenge by constructing high-quality, verified multi-hop vision-language training data for multimodal deep-search agents. We propose a Multi-hop Tool-Augmented Agent for Evidence-based QA Synthesis (MTA-Agent), which automatically selects tools and their parameters to retrieve and validate evidence from both visual and textual sources and generates structured multi-hop question-answer trajectories. Starting from diverse VQA seed datasets, our pipeline produces a large-scale training dataset, MTA-Vision-DeepSearch, containing 21K high-quality multi-hop examples. The data is filtered through a multi-stage verification process to ensure factual consistency and answer uniqueness. Using MTA-Vision-DeepSearch, a 32B open-source multimodal search agent achieves state-of-the-art performance, reaching an average of 54.63% across six challenging benchmarks, outperforming GPT-5 (51.86%), Gemini-2.5-Pro (50.98%), and Gemini-3-Pro (54.46%) under the same tool settings. We further show that training on our data improves both reasoning depth and tool-use behavior, increasing the average number of steps from 2.27 to 4.28, and leading to more systematic and persistent search strategies. Additionally, we demonstrate that training can be performed without real-time tool calls by replaying cached interactions, significantly reducing training cost. Importantly, we present MTA-Agent as a fully open recipe for multimodal deep search: we release the entire dataset, training trajectories, and implementation details to enable reproducibility and future research on open multimodal search agents.
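
The abstract notes that training can avoid real-time tool calls by replaying cached interactions. As a rough sketch of that general pattern, and not the released MTA-Agent code, the class below records a tool result the first time a call is made and returns the stored result on every later identical call. `ReplayCache`, `fake_search`, and the JSON-based cache key are all hypothetical names and choices.

```python
import json
from typing import Any, Callable, Dict


class ReplayCache:
    """Record tool results once, then replay them for identical calls.

    Hypothetical sketch: assumes tool arguments are JSON-serializable.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.live_calls = 0  # number of times a real tool was actually invoked

    def call(self, tool_name: str, tool: Callable[..., Any], **kwargs: Any) -> Any:
        # Sorting keys makes the cache key independent of argument order.
        key = json.dumps([tool_name, kwargs], sort_keys=True)
        if key not in self._store:
            self.live_calls += 1
            self._store[key] = tool(**kwargs)
        return self._store[key]


def fake_search(query: str) -> str:
    """Stand-in for an expensive external tool such as a web search."""
    return f"results for {query}"


cache = ReplayCache()
first = cache.call("search", fake_search, query="red apple")
second = cache.call("search", fake_search, query="red apple")  # replayed, no live call
```

In a training loop, every repeated trajectory step would hit the cache, so the cost of the external tool is paid once per distinct call rather than once per training pass.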


2604.06375 2026-04-09 cs.AI

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

Isaac Henry, Avery Byrne, Christopher Giza, Ron Henry, Shahram Yazdani

Comments 18 pages, 1 figure

2604.06374 2026-04-09 cs.CL cs.LG

The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models

Michael Rizvi-Martel, Guillaume Rabusseau, Marius Mosbach

Comments 9 pages