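arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.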

2603.13962 2026-03-31 cs.CL

sebis at ArchEHR-QA 2026: How Much Can You Do Locally? Evaluating Grounded EHR QA on a Single Notebook

Ibrahim Ebrar Yurt, Fabian Karl, Tejaswi Choppa, Florian Matthes

Abstract

Clinical question answering over electronic health records (EHRs) can help clinicians and patients access relevant medical information more efficiently. However, many recent approaches rely on large cloud-based models, which are difficult to deploy in clinical environments due to privacy constraints and computational requirements. In this work, we investigate how far grounded EHR question answering can be pushed when restricted to a single notebook. We participate in all four subtasks of the ArchEHR-QA 2026 shared task and evaluate several approaches designed to run on commodity hardware. All experiments are conducted locally without external APIs or cloud infrastructure. Our results show that such systems can achieve competitive performance on the shared task leaderboards. In particular, our submissions perform above average in two subtasks, and we observe that smaller models can approach the performance of much larger systems when properly configured. These findings suggest that privacy-preserving EHR QA systems running fully locally are feasible with current models and commodity hardware. The source code is available at https://github.com/ibrahimey/ArchEHR-QA-2026.

2603.12533 2026-03-31 cs.CV

Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering

Yura Choi, Roy Miles, Rolandos Alexandros Potamias, Ismail Elezi, Jiankang Deng, Stefanos Zafeiriou

Comments Accepted to CVPR 2026

2603.11410 2026-03-31 cs.CV

Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs Supplementary

Nazia Tasnim, Keanu Nichols, Yuting Yang, Nicholas Ikechukwu, Elva Zou, Deepti Ghadiyaram, Bryan A. Plummer

Comments This is a replacement and updated version of submission arXiv:2505.21649: Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks

2603.08206 2026-03-31 cs.LG cs.AI

Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules

Jonas Landsgesell, Pascal Knoll

2603.08021 2026-03-31 cs.RO cs.CV

AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis

Xiaofei Wu, Yi Zhang, Yumeng Liu, Yuexin Ma, Yujiao Shi, Xuming He

Comments CVPR 2026

2603.07619 2026-03-31 cs.CV

Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan

Comments CVPR2026 Findings

2603.07455 2026-03-31 cs.CV cs.AI cs.CL cs.GR

Image Generation Models: A Technical History

Rouzbeh Shirvani

2603.05097 2026-03-31 cs.RO

AIM-SLAM: Dense Monocular SLAM via Adaptive and Informative Multi-View Keyframe Prioritization with Foundation Model

Jinwoo Jeon, Dong-Uk Seo, Eungchang Mason Lee, Hyun Myung

Comments 8 pages

2603.04955 2026-03-31 cs.LG physics.med-ph

Uncertainty quantification in neural network-based glucose prediction for diabetes

Hai Siong Tan, Rafe McBeth

Comments 20 pages, 7 figures; v2: minor revisions with PR-AUC curves included in result analysis. Code available at https://github.com/HaiSiong-Tan/Uncertainty_aware_glucose_prediction

2603.04528 2026-03-31 cs.AI math.HO

Discovering mathematical concepts through a multi-agent system

Daattavya Aggarwal, Oisin Kim, Carl Henrik Ek, Challenger Mishra

Comments Added link to code base

2603.04427 2026-03-31 cs.LG cs.AI

Thin Keys, Full Values: Reducing KV Cache via Low-Dimensional Attention Selection

Hengshuai Yao, Xing Chen, Ahmed Murtadha, Guan Wang

2603.03192 2026-03-31 cs.CV cs.CL cs.LG

MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani

Comments CVPR 2026. Project Page: https://mod-dpo.github.io/

2603.02190 2026-03-31 cs.CV cs.AI cs.GR cs.HC cs.LG

Sketch2Colab: Sketch-Conditioned Multi-Human Animation via Controllable Flow Distillation

Divyanshu Daiya, Aniket Bera

Comments Accepted to CVPR 2026 Main Conference (11 pages, 8 figures)

2603.01608 2026-03-31 cs.AI

Evaluating and Understanding Scheming Propensity in LLM Agents

Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner

2602.23153 2026-03-31 cs.CV cs.AI

Efficient Encoder-Free Fourier-based 3D Large Multimodal Model

Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Yiming Wang, Fabio Poiesi

2602.22419 2026-03-31 cs.CV

CLIP Is Shortsighted: Paying Attention Beyond the First Sentence

Marc-Antoine Lavoie, Anas Mahmoud, Aldo Zaimi, Arsene Fansi Tchango, Steven L. Waslander

Comments 20 pages, 15 figures, to be published in the CVPR 2026 proceedings

2602.21655 2026-03-31 cs.CV cs.AI

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning

Zhijiang Tang, Linhua Wang, Jiaxin Qi, Weihao Jiang, Peng Hou, Anxiang Zeng, Jianqiang Huang

Comments Accepted by CVPR 2026

2602.19778 2026-03-31 cs.SD cs.IR cs.LG cs.MM

Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation

Nghia Phan, Rong Jin, Gang Liu, Xiao Dong

Comments 8 pages, 6 figures, 3 tables

2602.18845 2026-03-31 cs.CV

Echoes of ownership: Adversarial-guided dual injection for copyright protection in MLLMs

Chengwei Xia, Fan Ma, Ruijie Quan, Yunqiu Xu, Kun Zhan, Yi Yang

Comments Accepted to CVPR 2026!

2602.17542 2026-03-31 cs.CL cs.CY

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Zhangqi Duan, Arnav Kankaria, Dhruv Kartik, Andrew Lan

2602.15608 2026-03-31 cs.RO physics.app-ph

Grip as Needed, Glide on Demand: Ultrasonic Lubrication for Robotic Locomotion

Mostafa A. Atalla, Daan van Bemmel, Jack Cummings, Paul Breedveld, Michaël Wiertlewski, Aimée Sakes

Comments Accepted for publication in the 2026 IEEE International Conference on Robotics and Automation (ICRA) in Vienna

2602.12957 2026-03-31 cs.CV

HSD: Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding

Wenhui Liao, Hongliang Li, Pengyu Xie, Xinyu Cai, Yufan Shen, Yi Xin, Qi Qin, Shenglong Ye, Tianbin Li, Ming Hu, Junjun He, Yihao Liu, Wenhai Wang, Min Dou, Bin Fu, Botian Shi, Yu Qiao, Lianwen Jin

2602.11321 2026-03-31 cs.RO

ExtremControl: Low-Latency Humanoid Teleoperation with Direct Extremity Control

Ziyan Xiong, Lixing Fang, Junyun Huang, Kashu Yamazaki, Hao Zhang, Chuang Gan

Comments Project website: https://extremcontrol.github.io/

2602.08961 2026-03-31 cs.CV cs.AI cs.CG cs.LG

MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng

Comments Project page: https://ruijiezhu94.github.io/MotionCrafter_Page

2602.08602 2026-03-31 cs.RO

Mimic Intent, Not Just Trajectories

Renming Huang, Chendong Zeng, Wenjing Tang, Jintian Cai, Cewu Lu, Panpan Cai

2602.07775 2026-03-31 cs.CV

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

Haodong Li, Shaoteng Liu, Zhe Lin, Manmohan Chandraker

Comments v5: Fix some typos. Figures were compressed to 150 dpi to comply with arXiv's submission size limit. Project page: https://rolling-sink.github.io/

2602.07530 2026-03-31 cs.LG cs.DS

Compact Conformal Subgraphs

Sreenivas Gollapudi, Kostas Kollias, Kamesh Munagala, Aravindan Vijayaraghavan

2602.05150 2026-03-31 cs.CL

GreekMMLU: A Native-Sourced Multitask Benchmark for Evaluating Language Models in Greek

Yang Zhang, Mersin Konomi, Christos Xypolopoulos, Konstantinos Divriotis, Konstantinos Skianis, Giannis Nikolentzos, Giorgos Stamou, Guokan Shang, Michalis Vazirgiannis

2602.04361 2026-03-31 cs.CV cs.AI cs.LG

SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration

Zekun Li, Ning Wang, Tongxin Bai, Changwang Mei, Peisong Wang, Shuang Qiu, Jian Cheng

Comments CVPR 2026

Abstract

Visual AutoRegressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction paradigm. However, mainstream VAR paradigms attend to all tokens across historical scales at each autoregressive step. As the next scale resolution grows, the computational complexity of attention increases quartically with resolution, causing substantial latency. Prior accelerations often skip high-resolution scales, which speeds up inference but discards high-frequency details and harms image quality. To address these problems, we present SparVAR, a training-free acceleration framework that exploits three properties of VAR attention: (i) strong attention sinks, (ii) cross-scale activation similarity, and (iii) pronounced locality. Specifically, we dynamically predict the sparse attention pattern of later high-resolution scales from a sparse decision scale, and construct scale self-similar sparse attention via an efficient index-mapping mechanism, enabling high-efficiency sparse attention computation at large scales. Furthermore, we propose cross-scale local sparse attention and implement an efficient block-wise sparse kernel, which achieves a forward pass more than $5\times$ faster than FlashAttention. Extensive experiments demonstrate that the proposed SparVAR can reduce the generation time of an 8B model producing $1024\times1024$ high-resolution images to the 1s level, without skipping the last scales. Compared with the VAR baseline accelerated by FlashAttention, our method achieves a $1.57\times$ speed-up while preserving almost all high-frequency details. When combined with existing scale-skipping strategies, SparVAR attains up to a $2.28\times$ acceleration, while maintaining competitive visual generation quality. Code is available at https://github.com/CAS-CLab/SparVAR.
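The quartic-growth claim in the abstract can be made concrete with a small worked example. This is only a sketch under a simplifying assumption that is not taken from the paper: it counts query-key pairs for full self-attention over a single square token map of side length `side`, and ignores the tokens from earlier scales that a VAR model also attends to. The function name `attention_pairs` is hypothetical.

```python
def attention_pairs(side: int) -> int:
    """Count query-key pairs for full self-attention over a side x side token map.

    Assumption (not from the paper): one scale only, no tokens from earlier scales.
    """
    tokens = side * side      # token count grows quadratically with side length
    return tokens * tokens    # full attention scores every (query, key) pair


if __name__ == "__main__":
    # Doubling the side length multiplies the pair count by 2**4 = 16.
    for side in (16, 32, 64):
        print(side, attention_pairs(side))
```

Under this assumption, each doubling of the side length multiplies the attention cost by 16, which is why sparsifying attention at the largest scales is where a method of this kind has the most to gain.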

2602.03548 2026-03-31 cs.CL

SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue

Yuqin Dai, Ning Gao, Wei Zhang, Jie Wang, Zichen Luo, Jinpeng Wang, Yujie Wang, Ruiyuan Wu, Chaozheng Wang