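arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.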

2604.08203 2026-04-10 cs.CV cs.AI

MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning

Zheng Jiang, Heng Guo, Chengyu Fang, Changchen Xiao, Xinyang Hu, Lifeng Sun, Minfeng Xu

Comments Accepted by ICLR 2026

Details

English abstract

Medical Vision-Language Models (VLMs) hold immense promise for complex clinical tasks, but their reasoning capabilities are often constrained by text-only paradigms that fail to ground inferences in visual evidence. This limitation not only curtails performance on tasks requiring fine-grained visual analysis but also introduces risks of visual hallucination in safety-critical applications. Thus, we introduce MedVR, a novel reinforcement learning framework that enables annotation-free visual reasoning for medical VLMs. Its core innovation lies in two synergistic mechanisms: Entropy-guided Visual Regrounding (EVR) uses model uncertainty to direct exploration, while Consensus-based Credit Assignment (CCA) distills pseudo-supervision from rollout agreement. Without any human annotations for intermediate steps, MedVR achieves state-of-the-art performance on diverse public medical VQA benchmarks, significantly outperforming existing models. By learning to reason directly with visual evidence, MedVR promotes the robustness and transparency essential for accelerating the clinical deployment of medical AI.


2604.07053 2026-04-10 cs.CV

AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors

Xiaoxue Zhang, Xiaoxu Zheng, Yixuan Yin, Tiao Zhao, Kaihua Tang, Michael Bi Mi, Zhan Xu, Dave Zhenyu Chen

Comments CVPR 2026

2604.06613 2026-04-10 cs.CL cs.AI cs.IT cs.LG math.IT

The Detection-Extraction Gap: Models Know the Answer Before They Can Say It

Hanyang Wang, Mingxuan Zhu

2604.03317 2026-04-10 cs.CV

Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning

Junyuan Liang, Qi Zhou, Sahan Bulathwela, Mutlu Cukurova

Comments 15 pages, 6 figures, 2 tables, accepted by the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

2603.29184 2026-04-10 cs.LG cs.NA math.NA

Biomimetic causal learning for microstructure-forming phase transitions

Anci Lin, Xiaohong Liu, Zhiwen Zhang, Wenju Zhao

2603.28618 2026-04-10 cs.AI

Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning

Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao

Comments 21 pages, 15 figures, 6 tables

2603.28507 2026-04-10 cs.LG cs.AI

Continued AI Scaling Requires Repeated Efficiency Doublings

Chien-Ping Lu

Comments 9 pages, 1 figure. v2

2603.27765 2026-04-10 cs.AI

Let the Agent Steer: Closed-Loop Ranking Optimization via Influence Exchange

Yin Cheng, Liao Zhou, Xiyu Liang, Dihao Luo, Tewei Lee, Kailun Zheng, Weiwei Zhang, Mingchen Cai, Jian Dong, Andy Zhang

2603.23208 2026-04-10 cs.LG

A One-Inclusion Graph Approach to Multi-Group Learning

Noah Bergam, Samuel Deng, Daniel Hsu

Comments An error in the main proof of our main lemma was found by an anonymous reviewer, particularly in the parameter required to find a feasible matching in our reduction to a "multi-group" bipartite matching problem. We did not find a way to fix the error through current techniques

2603.18056 2026-04-10 cs.LG

Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse

Dip Roy, Rajiv Misra, Sanjay Kumar Singh

Details

DOI: 10.1016/j.neucom.2026.133498
Journal ref: Neurocomputing, Volume 682, 14 June 2026, 133498

English abstract

Extreme neural network sparsification (90% activation reduction) presents a critical challenge for mechanistic interpretability: understanding whether interpretable features survive aggressive compression. This work investigates feature survival under severe capacity constraints in hybrid Variational Autoencoder--Sparse Autoencoder (VAE-SAE) architectures. We introduce an adaptive sparsity scheduling framework that progressively reduces active neurons from 500 to 50 over 50 training epochs, and provide empirical evidence for fundamental limits of the sparsification-interpretability relationship. Testing across two benchmark datasets -- dSprites and Shapes3D -- with both Top-k and L1 sparsification methods, our key finding reveals a pervasive paradox: while global representation quality (measured by Mutual Information Gap) remains stable, local feature interpretability collapses systematically. Under Top-k sparsification, dead neuron rates reach $34.4\pm0.9\%$ on dSprites and $62.7\pm1.3\%$ on Shapes3D at k=50. L1 regularization -- a fundamentally different "soft constraint" paradigm -- produces equal or worse collapse: $41.7\pm4.4\%$ on dSprites and $90.6\pm0.5\%$ on Shapes3D. Extended training for 100 additional epochs fails to recover dead neurons, and the collapse pattern is robust across all tested threshold definitions. Critically, the collapse scales with dataset complexity: Shapes3D (RGB, 6 factors) shows $1.8\times$ more dead neurons than dSprites (grayscale, 5 factors) under Top-k and $2.2\times$ under L1. These findings establish that interpretability collapse under sparsification is intrinsic to the compression process rather than an artifact of any particular algorithm, training duration, or threshold choice.
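The ratios quoted in this abstract can be re-derived from its own dead-neuron rates, and the 500-to-50 reduction matches the stated 90% figure. The sketch below is only an illustration of that arithmetic. The names `DEAD_RATE`, `complexity_ratio`, and `linear_k_schedule` are hypothetical, and the linear shape of the schedule is an assumption: the abstract says only that active neurons drop from 500 to 50 over 50 epochs, not how.

```python
# Dead-neuron rates (percent, mean values) as quoted in the abstract.
DEAD_RATE = {
    ("topk", "dSprites"): 34.4,
    ("topk", "Shapes3D"): 62.7,
    ("l1", "dSprites"): 41.7,
    ("l1", "Shapes3D"): 90.6,
}


def complexity_ratio(method: str) -> float:
    """Shapes3D dead-neuron rate divided by the dSprites rate for one method."""
    return DEAD_RATE[(method, "Shapes3D")] / DEAD_RATE[(method, "dSprites")]


def linear_k_schedule(epoch: int, k_start: int = 500, k_end: int = 50,
                      n_epochs: int = 50) -> int:
    """Number of active neurons at a given epoch (0-indexed).

    ASSUMPTION: a linear decay from k_start to k_end. The abstract does not
    specify the shape of the schedule, only its endpoints and duration.
    """
    frac = epoch / (n_epochs - 1)
    return round(k_start + frac * (k_end - k_start))


if __name__ == "__main__":
    for method in ("topk", "l1"):
        print(method, round(complexity_ratio(method), 1))
    # Fractional reduction in active neurons between the schedule endpoints.
    print(1 - linear_k_schedule(49) / linear_k_schedule(0))
```

Rounded to one decimal place, the two ratios agree with the 1.8x (Top-k) and 2.2x (L1) figures in the abstract.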


2603.04759 2026-04-10 cs.CL cs.AI

Stacked from One: Multi-Scale Self-Injection for Context Window Extension

Wei Han, Pan Zhou, Soujanya Poria, Shuicheng Yan

Comments 20 pages, 6 figures

2602.03249 2026-04-10 cs.AI cs.LG

Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning

Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Wenlei Shi, Yiwei Wang, Xiaodan Liang, Jing Tang

2601.21872 2026-04-10 cs.AI

WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents

Yao Zhang, Shijie Tang, Zeyu Li, Zhen Han, Volker Tresp

Comments Published as a conference paper at ICLR 2026. Extended version with additional experiments

2601.03786 2026-04-10 cs.CL cs.LG

Compact Example-Based Explanations for Language Models

Loris Schoenegger, Benjamin Roth

Comments ACL 2026 Findings. 9 pages

2601.02535 2026-04-10 cs.CL cs.AI

ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation

Hyeong Kyu Choi, Sharon Li

Comments ACL 2026 Main

2512.19253 2026-04-10 cs.LG cs.AI cs.CV

Machine Unlearning in the Era of Quantum Machine Learning: An Empirical Study

Carla Crivoi, Radu Tudor Ionescu

Comments Accepted at ICPR 2026

2512.19173 2026-04-10 cs.CL cs.CV

CycleChart: A Unified Consistency-Based Learning Framework for Bidirectional Chart Understanding and Generation

Dazhen Deng, Sen Yang, Yuchen He, Yuan Tian, Yingcai Wu

2512.17489 2026-04-10 cs.CV

LumiCtrl: Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models

Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral, Joost Van De Weijer

Comments Accepted to IEEE/CVF CVPR 2026 Workshop on AI for Creative Visual Content Generation, Editing, and Understanding (CVEU)

2512.12623 2026-04-10 cs.CV cs.CL

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

Chengzhi Liu, Yuzhe Yang, Yue Fan, Qingyue Wei, Sheng Liu, Xin Eric Wang

2512.09928 2026-04-10 cs.RO

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Minghui Lin, Pengxiang Ding, Shu Wang, Zifeng Zhuang, Yang Liu, Xinyang Tong, Wenxuan Song, Shangke Lyu, Siteng Huang, Donglin Wang

Comments CVPR 2026, Project page: https://hifvla.github.io, Github: https://github.com/OpenHelix-Team/HiF-VLA

2512.09665 2026-04-10 cs.CV cs.CY cs.LG

OxEnsemble: Fair Ensembles for Low-Data Classification

Jonathan Rystrøm, Zihao Fu, Chris Russell

Comments Forthcoming @ MIDL 2026

2511.18387 2026-04-10 cs.AI

Scaling Implicit Fields via Hypernetwork-Driven Multiscale Coordinate Transformations

Plein Versace

Comments arXiv admin note: This paper has been withdrawn by arXiv due to unverifiable authorship and affiliation

2511.18384 2026-04-10 cs.SD cs.AI

NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields

Plein Versace

Comments arXiv admin note: This paper has been withdrawn by arXiv due to unverifiable authorship and affiliation

2511.11666 2026-04-10 cs.LG stat.ML

Adaptive Stepsizing for Stochastic Gradient Langevin Dynamics in Bayesian Neural Networks

Rajit Rajpal, Benedict Leimkuhler, Yuanhao Jiang

2511.08605 2026-04-10 cs.CL cs.CY cs.HC cs.MA cs.MM

Mina: A Multilingual LLM-Powered Legal Assistant Agent for Bangladesh for Empowering Access to Justice

Azmine Toushik Wasi, Wahid Faisal, Mst Rafia Islam, Md Rizwan Parvez

Comments Accepted to ACL 2026 Findings

2510.21366 2026-04-10 cs.CV cs.LG

BADiff: Bandwidth Adaptive Diffusion Model

Xi Zhang, Hanwei Zhu, Yan Zhong, Jiamang Wang, Weisi Lin

Comments NeurIPS 2025 Poster

2510.20549 2026-04-10 cs.CV cs.RO

Deep Learning-Powered Visual SLAM Aimed at Assisting Visually Impaired Navigation

Marziyeh Bamdad, Hans-Peter Hutter, Alireza Darvishy

Comments 8 pages, 7 figures, 4 tables. Published in the Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2025), VISAPP

2510.17458 2026-04-10 cs.LG physics.geo-ph

Explainable AI for microseismic event detection

Ayrat Abdullin, Denis Anikiev, Umair Bin Waheed

Comments v2: Revised manuscript after journal review; updated methods/results; now under review at Artificial Intelligence in Geosciences

2510.14096 2026-04-10 cs.LG

TENDE: Transfer Entropy Neural Diffusion Estimation

Simon Pedro Galeano Munoz, Mustapha Bounoua, Giulio Franzese, Pietro Michiardi, Maurizio Filippone

2509.26036 2026-04-10 cs.CV cs.AI cs.LG

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP

Christoph Timmermann, Hyunse Lee, Woojin Lee

Comments 22 pages, 12 figures