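arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.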

2604.23801 2026-04-28 cs.CL cs.IR

Domain Fine-Tuning vs. Retrieval-Augmented Generation for Medical Multiple-Choice Question Answering: A Controlled Comparison at the 4B-Parameter Scale

Avi-ad Avraam Buskila

Abstract

Practitioners deploying small open-weight large language models (LLMs) for medical question answering face a recurring design choice: invest in a domain-fine-tuned model, or keep a general-purpose model and inject domain knowledge at inference time via retrieval-augmented generation (RAG). We isolate this trade-off by holding model size, prompt template, decoding temperature, retrieval pipeline, and evaluation protocol fixed, and varying only (i) whether the model has been domain-adapted (Gemma 3 4B vs. MedGemma 4B, both 4-bit quantized and served via Ollama) and (ii) whether retrieved passages from a medical knowledge corpus are inserted into the prompt. We evaluate all four cells of this 2x2 design on the full MedQA-USMLE 4-option test split (1,273 questions) with three repetitions per question (15,276 LLM calls). Domain fine-tuning yields a +6.8 percentage-point gain in majority-vote accuracy over the general 4B baseline (53.3% vs. 46.4%, McNemar p < 10^-4). RAG over MedMCQA explanations does not produce a statistically significant gain in either model, and in the domain-tuned model the point estimate is slightly negative (-1.9 pp, p = 0.16). At this scale and on this benchmark, domain knowledge encoded in weights dominates domain knowledge supplied in context. We release the full experiment code and JSONL traces to support replication.
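The evaluation arithmetic described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names are hypothetical, and the exact binomial form of McNemar's test is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most common answer across repeated runs.

    Ties resolve to the answer seen first (Counter keeps insertion order).
    """
    return Counter(answers).most_common(1)[0][0]


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (assumed variant).

    b: questions only the first system answered correctly.
    c: questions only the second system answered correctly.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least this lopsided.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 1,273 questions x 3 repetitions x 2 models x 2 retrieval settings.
total_calls = 1273 * 3 * 2 * 2
```

Only the discordant pairs enter the test, which is why a paired comparison on the same 1,273 questions can detect a gap that two independent accuracy estimates might not.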

2604.23800 2026-04-28 cs.LG stat.ML

Causal Representation Learning from General Environments under Nonparametric Mixing

Ignavier Ng, Shaoan Xie, Xinshuai Dong, Peter Spirtes, Kun Zhang

Comments Accepted to AISTATS 2025. This is a slightly revised version of the published paper

2604.23799 2026-04-28 cs.CV

VitaminP: cross-modal learning enables whole-cell segmentation from routine histology

Yasin Shokrollahi, Karina B. Pinao Gonzales, Elizve N. Barrientos Toro, Paul Acosta, Patient Mosaic Team, Pingjun Chen, Yinyin Yuan, Xiaoxi Pan

Comments 44 pages, 10 figures. Code and models available

2604.23798 2026-04-28 cs.LG cs.CV

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

Chih-Chung Hsu, Xin-Di Ma, Wo-Ting Liao, Chia-Ming Lee

Comments Accepted to CVPRF2026

2604.23790 2026-04-28 cs.LG stat.ML

A General Representation-Based Approach to Multi-Source Domain Adaptation

Ignavier Ng, Yan Li, Zijian Li, Yujia Zheng, Guangyi Chen, Kun Zhang

Comments ICML 2025

2604.23788 2026-04-28 cs.CV cs.HC

MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks

Jui-Cheng Chiu, Yu-Chao Wang, Shengyang Luo, Tongyan Wang, Qi Yang, Nabin Khanal, Yingjie Victor Chen

Comments 14 pages (11 pages main text), 6 figures, 1 table

Abstract

Appreciating multi-figure paintings requires understanding how characters relate through subtle cues like gaze alignment, gesture, and spatial arrangement. We present MIRAGE, an evidence-centric framework designed to scaffold the exploration of these "micro-interactions" in multi-figure artworks. While such cues are essential for deep narrative appreciation, they are often distributed across complex scenes and difficult for viewers to systematically identify. Existing vision-language models (VLMs) frequently fail to provide reliable assistance, offering ungrounded interpretations that lack traceable visual evidence. MIRAGE addresses this by constructing a structured intermediate representation capturing identities, pose cues, and gaze hypotheses. However, the challenge extends beyond extracting these cues to coordinating them during interpretation. Without an explicit mechanism to organize and reconcile relational evidence, models often collapse multiple interaction hypotheses into a single unstable or weakly grounded narrative, even when low-level signals are available. This representation allows users to verify how high-level interpretations are anchored in low-level visual facts. By separating spatial grounding from narrative generation, MIRAGE enables users to inspect and reason about figure-to-figure relationships through a verifiable evidence layer. We evaluate MIRAGE against painting-only VLM baselines using a blind assessment protocol. Results show that MIRAGE significantly improves identity consistency, reduces relational hallucinations, and increases the coverage of subtle interactions. These findings suggest that structured grounding can serve as a critical interaction control layer, providing the necessary scaffolding for a more reliable, transparent, and human-led understanding of complex visual narratives.

2604.23786 2026-04-28 cs.AI cs.LG

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

Sophie Chiang, Tom Brennan, Fethiye Irmak Dogan, Jiaee Cheong, Hatice Gunes

Comments 10 pages, 4 figures, 3 tables

2604.23781 2026-04-28 cs.CV cs.SE

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Fanqing Meng, Lingxiao Du, Zijian Wu, Guanzheng Chen, Xiangyan Liu, Jiaqi Liao, Chonghe Jiang, Zhenglin Wan, Jiawei Gu, Pengfei Zhou, Rui Huang, Ziqi Zhao, Shengyuan Ding, Ailing Yu, Bo Peng, Bowei Xia, Hao Sun, Haotian Liang, Ji Xie, Jiajun Chen, Jiajun Song, Liu Yang, Ming Xu, Qionglin Qiu, Runhao Fu, Shengfang Zhai, Shijian Wang, Tengfei Ma, Tianyi Wu, Weiyang Jin, Yan Wang, Yang Dai, Yao Lai, Youwei Shu, Yue Liu, Yunzhuo Hao, Yuwei Niu, Jinkai Huang, Jiayuan Zhuo, Zhennan Shen, Linyu Wu, Cihang Xie, Yuyin Zhou, Jiaheng Zhang, Zeyu Zheng, Mengkang Hu, Michael Qizhe Shieh

Comments github repo: https://github.com/evolvent-ai/ClawMark

2604.23776 2026-04-28 cs.CV cs.AI

From Noisy Historical Maps to Time-Series Oil Palm Mapping Without Annotation in Malaysia and Indonesia (2020-2024)

Nuttaset Kuapanich, Juepeng Zheng, Bohan Shi, Jiaying Liu, Jiayin Jiang, Jiatao Huang, Shenghan Tan, Qingmei Li, Haohuan Fu

2604.23775 2026-04-28 cs.RO

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao Yu, Xinchao Wang

2604.23767 2026-04-28 cs.LG

WISE-FM: Operation-Aware, Engineering-Informed Foundation Model for Multi-Task Well Design

Carine de Menezes Rebello, Anderson Rapello dos Santos, Idelfonso B. R. Nogueira

Abstract

Deploying machine learning models across diverse well portfolios requires generalisation to wells with design parameters outside the training distribution. Current data-driven approaches to virtual flow metering (VFM) and bottomhole estimation typically treat each well independently or ignore the influence of well design on operational behaviour. We present WISE (Well Intelligence and Systems Engineering Foundation Model), a design-aware, physics-informed multi-task model that integrates three complementary mechanisms: Feature-wise Linear Modulation (FiLM) and cross-modal attention to condition operational embeddings on well design parameters; multi-task learning for simultaneous prediction of flow rates, bottomhole conditions, and flow regime classification; and structural mass conservation with soft physics constraints derived from well engineering principles. Evaluation on the ManyWells benchmark (2000 simulated wells, $10^6$ data points) demonstrates that design-aware models reduce VFM prediction error by up to $13\times$ compared to design-unaware baselines, and that physics constraints reduce negative flow predictions by 65%. Flow regime classification achieves 97.7% bottomhole accuracy, providing continuous well integrity monitoring without additional sensors. The methodology transfers to real operational data from five Equinor Volve producers (oil rate $R^2 = 0.89$, bottomhole pressure $R^2 = 0.98$, water rate $R^2 = 0.97$). The trained model additionally serves as a fast surrogate for integrity-aware well design optimisation over a 24-dimensional design space, with more than $1000\times$ speedup over drift-flux simulations. These results demonstrate that design awareness, physics enforcement, and multi-task learning are essential and complementary ingredients for foundation models intended to operate across large well portfolios.
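Feature-wise Linear Modulation, as named in this abstract, scales and shifts each feature using values computed from a conditioning input. The sketch below shows that general mechanism only. The linear generators for the scale and shift, and every name in the code, are assumptions for illustration and do not reflect the WISE architecture.

```python
def film(features, design, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: out[i] = gamma[i] * features[i] + beta[i].

    gamma and beta are produced from the design vector by two
    hypothetical linear maps (one weight row per feature).
    """
    def affine(weights, bias):
        return [
            sum(w * d for w, d in zip(row, design)) + b
            for row, b in zip(weights, bias)
        ]

    gamma = affine(w_gamma, b_gamma)  # per-feature scale
    beta = affine(w_beta, b_beta)     # per-feature shift
    return [g * x + b for g, x, b in zip(gamma, features, beta)]


modulated = film(
    features=[1.0, 2.0],
    design=[1.0, 0.0],
    w_gamma=[[2.0, 0.0], [0.0, 3.0]],
    b_gamma=[0.0, 1.0],
    w_beta=[[1.0, 1.0], [0.0, 0.0]],
    b_beta=[0.0, 0.5],
)
```

Because the scale and shift depend only on the design vector, the same operational features are transformed differently for wells with different designs.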

2604.23761 2026-04-28 cs.RO

Unleashing the Agility of Wheeled-Legged Robots for High-Dynamic Reflexive Obstacle Evasion

Yongen Zhao, Zihao Xu, Wenzhi Lu, Zhen Chu, Ce Hao

Comments 8 pages, 8 figures, 4 tables

2604.23753 2026-04-28 cs.AI cs.HC cs.LG

Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion

Nastaran Dab, Raziyeh Zall, Mohammadreza Kangavari

2604.23747 2026-04-28 cs.LG cs.AI cs.CL

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning

Alexis Limozin, Eduard Durech, Torsten Hoefler, Imanol Schlag, Valentina Pyatkin

2604.23742 2026-04-28 cs.SD

RTCFake: Speech Deepfake Detection in Real-Time Communication

Jun Xue, Zhuolin Yi, Yihuan Huang, Yanzhen Ren, Yujie Chen, Cunhang Fan, Zicheng Su, Yonghong Zhang, Bo Cai

Comments Accepted by ACL 2026

2604.23740 2026-04-28 cs.LG

Transformer as an Euler Discretization of Score-based Variational Flow

Huadong Liao

2604.23733 2026-04-28 cs.CL

Multimodal QUD: Inquisitive Questions from Scientific Figures

Yating Wu, William Rudman, Venkata S Govindarajan, Alexandros G. Dimakis, Junyi Jessy Li

2604.23732 2026-04-28 cs.LG cs.AI cs.HC

Impact of Age Specialized Models for Hypoglycemia Classification

Beyza Cinar, Maria Maleshkova

Comments Accepted for IEEE CAI 2026. 13 pages, 6 figures, and 10 tables

Abstract

Disease progression varies with age and is influenced by underlying genetic, biochemical, and hormonal etiologies, suggesting the need for tailored monitoring, care, and medication beyond standard clinical guidelines. Specifically, in autoimmune diseases like type 1 diabetes (T1D), where patients depend on exogenous insulin to compensate for insulin deficiency, medication dosing and the physiological response reflected in vital signs can differ. Insulin therapy can lead to hypoglycemia, a dangerous condition characterized by decreased blood glucose levels ($\leq$70 mg/dL). This risk can be mitigated through improved diabetes management supported by data analytics. Notably, leveraging data from continuous glucose monitoring (CGM) devices, hypoglycemia onset can be predicted. However, while glucose variability, auto-antibody levels, and hypoglycemia occurrence differ across age groups, hypoglycemia classification most often relies only on population-based models specialized in specific age ranges. In this work, we classify hypoglycemia 0, 5-15, 20-45, and 50-120 minutes before onset using DiaData, a large CGM dataset of patients with T1D ranging from children to seniors. In particular, we investigate: 1) the generalizability of a population-based model including all age groups, 2) the impact of age-segmented models trained separately per age group, and 3) the effect of model individualization through transfer learning. The results show that a global population-based model yields similar or superior performance compared to age-segmented models. These findings suggest that data from children, teenagers, and adults can be combined for training models on hypoglycemia classification. While glucose variation differs across age groups, short-term hypoglycemic patterns are similar. However, children's data achieve their best recall with an age-specialized model.

2604.23730 2026-04-28 cs.AI

Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

Jungmin Choi, Keisuke Sakaguchi, Hiroaki Yamada

Comments 5 pages, Accepted to ICAIL 2026

2604.23729 2026-04-28 cs.CV

DynProto: Dynamic Prototype Evolution for Out-of-Distribution Detection

Yanqi Wu, Xinhua Lu, Runhe Lai, Qichao Chen, Jia-Xin Zhuang, Wei-Shi Zheng, Ruixuan Wang

Comments Accepted by CVPR 2026 Findings

2604.23720 2026-04-28 cs.LG

Quasi-Equivariant Metanetworks

Viet-Hoang Tran, An Nguyen, Benoît Guérand, Thieu N. Vo, Tan M. Nguyen

Comments Accepted to ICLR 2026

2604.23718 2026-04-28 cs.CV

Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection

Xuefen Liu, Xinquan Yang, Mianjie Zheng, Kun Tang, Xuguang Li, Xiaoqi Guo, Linlin Shen, He Meng

2604.23717 2026-04-28 cs.SD cs.CL

HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models

Peize He, Yaodi Luo, Xiaoqian Liu, Xuyang Liu, Jiahang Deng, Yaosong Du, Bangyu Li, Xiyan Gui, Yuxuan Chen, Linfeng Zhang

Comments Homepage: https://dabdans.github.io/HeadRouter/

2604.23709 2026-04-28 cs.CV eess.IV

ZID-Net: Zero-Inference Diffusion Prior Decoupling Network for Single Image Dehazing

Xinheng Li, Minghao Chen, Mengqing Wu, Yan Liu, Guanying Huo

Comments Submitted to Neurocomputing. Includes 12 figures and 8 tables

2604.23706 2026-04-28 cs.CV

Weakly Supervised Multicenter Nancy Index Scoring in Ulcerative Colitis Using Foundation Models

Adam Kukučka, Ondřej Fabián, Vít Musil, Tomáš Brázdil

2604.23705 2026-04-28 cs.LG

Can an MLP Absorb Its Own Skip Connection?

Antonij Mijoski, Marko Karbevski

2604.23704 2026-04-28 cs.CV

A Pose-only Geometric Constraint for Multi-Camera Pose Adjustment

Shunkun Liang, Banglei Guan, Bin Li, Qifeng Yu, Yang Shang

2604.23702 2026-04-28 cs.RO

QuietWalk: Physics-Informed Reinforcement Learning for Ground Reaction Force-Aware Humanoid Locomotion Under Diverse Footwear

Hanze Hu, Luying Feng, Silu Chen, Tianjiang Zheng, Dexin Jiang, Wei Chen, Chi Zhang, Guilin Yang, Yaochu Jin

Comments 8 pages, 8 figures

2604.23701 2026-04-28 cs.CL cs.AI cs.CV

Agri-CPJ: A Training-Free Explainable Framework for Agricultural Pest Diagnosis Using Caption-Prompt-Judge and LLM-as-a-Judge

Wentao Zhang, Qi Zhang, Mingkun Xu, Mu You, Henghua Shen, Zhongzhi He, Keyan Jin, Derek F. Wong, Tao Fang

Comments This work is an expanded version of our prior paper published at the IEEE ICASSP 2026 conference (arXiv:2512.24947), extended from 4 to 20+ pages, presenting a well-structured and principled framework, extensive experiments, and deeper insights. Tao Fang is the corresponding author

2604.23696 2026-04-28 cs.RO cs.SY eess.SY

Real-Time Non-Contact Force Compensation for Wrist-Mounted Force/Torque Sensors in Haptic-Enabled Robotic Surgery Training

Walid Shaker, Mustafa Suphi Erden

Comments Submitted to 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)