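arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.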

2604.27720 2026-05-01 cs.AI

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Xupeng Chen, Binbin Shi, Chenqian Le, Qifu Yin, Lang Lin, Haowei Ni, Ran Gong, Panfeng Li

Abstract

Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit five recent frontier and grounding-aware VLMs (Gemini 2.5 Pro, GPT-5, o3, GLM-4.5V, Qwen 2.5 VL) on Medical VQA along two trust-relevant axes. Perception: all models localize anatomical and pathological targets poorly (the best model reaches only 0.23 mean IoU and 19.1% Acc@0.5) and exhibit clinically dangerous laterality confusion. Pipeline integration: a self-grounding pipeline, where the same model localizes then answers, degrades VQA accuracy for every model, driven by both inaccurate localization and format-compliance failures under the two-step prompt (parse failure rises to 70%-99% for Gemini and GPT-5 on VQA-RAD). Replacing predicted boxes with ground-truth annotations recovers and improves VQA accuracy, consistent with the failure residing in the perception module rather than in the decomposition itself. These observational findings identify grounding quality as a primary trustworthiness bottleneck in our SLAKE bounding-box setting. As a complementary fine-tuning follow-up, supervised fine-tuning of Qwen 2.5 VL on combined Med-VQA training data attains the highest reported SLAKE open-ended recall (85.5%) among comparable methods, suggesting that the VQA-level gap is tractable with domain adaptation; whether this also closes the perception/trustworthiness bottleneck is left to future work.

2604.27715 2026-05-01 cs.CV

Improving Calibration in Test-Time Prompt Tuning for Vision-Language Models via Data-Free Flatness-Aware Prompt Pretraining

Hyeonseo Jang, Jaebyeong Jeon, Joong-Won Hwang, Kibok Lee

Comments CVPR 2026

2604.27713 2026-05-01 cs.AI

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

Wilder Baldwin, Sepideh Ghanavati

2604.27712 2026-05-01 cs.CV cs.CL

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

Nhi Ngoc-Yen Nguyen, Anh-Duc Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

2604.27711 2026-05-01 cs.RO

ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

Yanghao Zhou, Jingyu Ma, Yibo Peng, Zhenguo Sun, Yu Bai, Börje F. Karlsson

Comments Work in progress. Project page: https://baai-agents.github.io/ExoActor/

2604.27707 2026-05-01 cs.AI cs.CL

Contextual Agentic Memory is a Memo, Not True Memory

Binyan Xu, Xilin Dai, Kehuan Zhang

2604.27704 2026-05-01 cs.CV

A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images

Yuan Fang, Yuanzhi Cai, Jagannath Aryal, Qinfeng Zhu, Hong Huang, Cheng Zhang, Lei Fan

2604.27702 2026-05-01 cs.CV

RayFormer: Modeling Inter- and Intra-Ray Similarity for NeRF-Based Video Snapshot Compressive Imaging

Yubo Dong, Danhua Liu, Anqi Li, Zhenyuan Lin

2604.27699 2026-05-01 cs.AI

Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

Chunhui Zhang, Yuxuan Wang, Aoyang Qin, Yi-Long Lu, Kunlun Wu, Yizhou Wang, Wei Wang

2604.27697 2026-05-01 cs.CV cs.AI

Deep Learning-Based Segmentation of Peritoneal Cancer Index Regions from CT Imaging

Pieter C. Gort, Lotte J. S. Ewals, Marion W. Tops-Welten, Cris H. B. Claessens, Joost Nederend, Fons van der Sommen

Comments Accepted for presentation at Computer Assisted Radiology and Surgery (CARS) 2026

2604.27695 2026-05-01 cs.CV cs.CL

EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

Yuyang Li, Yime He, Zeyu Zhang, Dong Gong

2604.27691 2026-05-01 cs.AI

When Agents Evolve, Institutions Follow

Chao Fei, Hongcheng Guo, Yanghua Xiao

2604.27674 2026-05-01 cs.CL cs.AI cs.CR cs.IR

One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness

Hiroyuki Deguchi, Katsuki Chousa, Yusuke Sakai

Comments Accepted at ACL 2026 (main)

2604.27673 2026-05-01 cs.AI cs.CY cs.HC cs.LG cs.SI

The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text

Sebastiano Franchini, Alexis Carrillo, Edoardo Sebastiano De Duro, Riccardo Improta, Ali Aghazadeh Ardebili, Massimo Stella

2604.27669 2026-05-01 cs.AI cs.SY eess.SY

Fairness for distribution network operations and planning

Pedro F. C. de Carvalho, Zijie Liu, Md Umar Hashmi, Dirk Van Hertem

Comments 16 pages, 0 figures, 2 tables, CIRED Conference Workshop Brussels 2026

2604.27667 2026-05-01 cs.RO cs.LG

Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

Buqing Ou, Frederike Dümbgen

Comments 8 pages, 6 figures

2604.27661 2026-05-01 cs.CL

Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

Emilia Milano, Alistair Plum, Yves Scherrer, Christoph Purschke

2604.27656 2026-05-01 cs.LG cs.AI cs.NE

When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry

Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi

2604.27654 2026-05-01 cs.CV

MSR: Hybrid Field Modeling for CT-MRI Rigid-Deformable Registration of the Cervical Spine with an Annotated Dataset

Bohai Zhang, Wenjie Chen, Mu Li, Kaixing Long, Xing Shen, Xinqiang Yao, Jincheng Yang, Jianting Chen, Wei Yang, Qianjin Feng, Lei Cao

2604.27653 2026-05-01 cs.CV

FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging

Dahua Gao, Yubo Dong, Anqi Li, Zhenyuan Lin, Ang Gao, Danhua Liu, Guangming Shi

Comments First work on exploring high-level computer vision tasks in compressive spectral imaging

2604.27638 2026-05-01 cs.LG

Green Physics-Informed Machine Learning Models For Structural Health Monitoring

Daisy R Bradley, Elizabeth J Cross

Comments 11 pages, 6 figures

2604.27637 2026-05-01 cs.AI

Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

Nicholas Sadjoli, Tim Siefken, Atin Ghosh, Yifan Mai, Daniel Dahlmeier

Comments Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

2604.27633 2026-05-01 cs.AI

Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor

Petter Törnberg, Michelle Schimmel

2604.27624 2026-05-01 cs.CL cs.AI cs.CY cs.HC cs.LG

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

Ali Aghazadeh Ardebili, Massimo Stella

2604.27621 2026-05-01 cs.RO cs.CV

Robot Learning from Human Videos: A Survey

Junyi Ma, Erhang Zhang, Haoran Yang, Ditao Li, Chenyang Xu, Guangming Wang, Hesheng Wang

Comments Paper list: https://github.com/IRMVLab/awesome-robot-learning-from-human-videos

2604.27620 2026-05-01 cs.CV

SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

Pengna Li, Kangyi Wu, Shaoqing Xu, Fang Li, Hanbing Li, Lin Zhao, Kailin Lyu, Long Chen, Zhi-Xin Yang, Nanning Zheng

Comments Submitted to ACM MM 2026

2604.27618 2026-05-01 cs.AI cs.CY cs.HC cs.LG cs.SI

Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

Naomi Esposito, Anthony Tricarico, Luisa Porzio, Ali Aghazadeh Ardebili, Massimo Stella

2604.27616 2026-05-01 cs.CL cs.MA

RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems

Jiacheng Liu, Zichen Tang, Zhongjun Yang, Xinyi Hu, Xueyuan Lin, Linwei Jia, Ruofei Bai, Rongjin Li, Shiyao Peng, Haocheng Gao, Haihong E

Comments Accepted to Findings of ACL 2026

2604.27613 2026-05-01 cs.LG

AMGenC: Generating Charge Balanced Amorphous Materials

Yan Lin, Jilin Hu, N. M. Anoop Krishnan, Morten M. Smedskjaer

2604.27606 2026-05-01 cs.LG cs.AI cs.CV

ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data

Al Zadid Sultan Bin Habib, Tanpia Tasnim, Md. Ekramul Islam, Muntasir Tabasum

Comments Accepted for presentation at the 28th International Conference on Pattern Recognition (ICPR 2026) at Lyon, France. Code available at https://github.com/zadid6pretam/ZAYAN. PyPI package: pip install zayan