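arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.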

2603.23501 2026-03-25 cs.CV cs.AI cs.CL

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage

Ufaq Khan, Umair Nawaz, L D M S S Teja, Numaan Saeed, Muhammad Bilal, Yutong Xie, Mohammad Yaqub, Muhammad Haris Khan

Comments 11 Pages

Abstract

Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understanding. In clinical practice, interpretation begins with pre-diagnostic sanity checks: verifying that the input is valid to read (correct modality and anatomy, plausible viewpoint and orientation, and no obvious integrity violations). Existing benchmarks largely assume this step is solved, and therefore miss a critical failure mode: a model can produce plausible narratives even when the input is inconsistent or invalid. We introduce MedObvious, a 1,880-task benchmark that isolates input validation as a set-level consistency capability over small multi-panel image sets: the model must identify whether any panel violates expected coherence. MedObvious spans five progressive tiers, from basic orientation/modality mismatches to clinically motivated anatomy/viewpoint verification and triage-style cues, and includes five evaluation formats to test robustness across interfaces. Evaluating 17 different VLMs, we find that sanity checking remains unreliable: several models hallucinate anomalies on normal (negative-control) inputs, performance degrades when scaling to larger image sets, and measured accuracy varies substantially between multiple-choice and open-ended settings. These results show that pre-diagnostic verification remains unsolved for medical VLMs and should be treated as a distinct, safety-critical capability before deployment.

2603.23500 2026-03-25 cs.CV

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Huang, Wanli Ouyang

2603.23499 2026-03-25 cs.CV

DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

Jaewon Min, Jaeeun Lee, Yeji Choi, Paul Hyunbin Cho, Jin Hyeon Kim, Tae-Young Lee, Jongsik Ahn, Hwayeong Lee, Seonghyun Park, Seungryong Kim

Comments Project page: https://cvlab-kaist.github.io/DA-Flow

2603.23497 2026-03-25 cs.CV

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Zhen Li, Zian Meng, Shuwei Shi, Wenshuo Peng, Yuwei Wu, Bo Zheng, Chuanhao Li, Kaipeng Zhang

2603.23496 2026-03-25 cs.LG

Estimating Flow Velocity and Vehicle Angle-of-Attack from Non-invasive Piezoelectric Structural Measurements Using Deep Learning

Chandler B. Smith, S. Hales Swift, Andrew Steyer, Ihab El-Kady

2603.23495 2026-03-25 cs.CV cs.AI cs.LG

VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas, Yassine Ouali, Georgios Tzimiropoulos

Comments Accepted at CVPR 2026

2603.23491 2026-03-25 cs.CV

Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation

Brian Chao, Lior Yariv, Howard Xiao, Gordon Wetzstein

Comments Project website at https://bchao1.github.io/foveated-diffusion

2603.23489 2026-03-25 cs.CV

AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation

Woojeong Jin, Jaeho Lee, Heeseong Shin, Seungho Jang, Junhwan Heo, Seungryong Kim

2603.23487 2026-03-25 cs.CV

TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation

Jini Yang, Eunbeen Hong, Soowon Son, Hyunkoo Lee, Sunghwan Hong, Sunok Kim, Seungryong Kim

2603.23483 2026-03-25 cs.CV cs.CL

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Haoyu Huang, Jinfa Huang, Zhongwei Wan, Xiawu Zheng, Rongrong Ji, Jiebo Luo

Comments Code: https://github.com/MAC-AutoML/SpecEyes

2603.23481 2026-03-25 cs.RO cs.AI cs.CV cs.LG

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Haoran Yuan, Weigang Yi, Zhenyu Zhang, Wendi Chen, Yuchen Mo, Jiashi Yin, Xinzhuo Li, Xiangyu Zeng, Chuan Wen, Cewu Lu, Katherine Driggs-Campbell, Ismini Lourentzou

Comments https://plan-lab.github.io/projects/vtam/

2603.23478 2026-03-25 cs.CV

UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation

Jiaying Lin, Dan Xu

2603.23463 2026-03-25 cs.CV cs.AI

InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

Duc Vu, Kien Nguyen, Trong-Tung Nguyen, Ngan Nguyen, Phong Nguyen, Khoi Nguyen, Cuong Pham, Anh Tran

Comments Accepted to CVPR'26 (Main Conference)

2603.23462 2026-03-25 cs.CV

RealMaster: Lifting Rendered Scenes into Photorealistic Video

Dana Cohen-Bar, Ido Sobol, Raphael Bensadoun, Shelly Sheynin, Oran Gafni, Or Patashnik, Daniel Cohen-Or, Amit Zohar

Comments Project page: https://danacohen95.github.io/RealMaster/

2603.23461 2026-03-25 cs.LG

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

2603.23455 2026-03-25 cs.CV

DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection

Gautam Rajendrakumar Gare, Neehar Peri, Matvei Popov, Shruti Jain, John Galeotti, Deva Ramanan

Comments Project Page: https://ggare-cmu.github.io/DetPO/

2603.23447 2026-03-25 cs.CV cs.AI

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding

Yiping Chen, Jinpeng Li, Wenyu Ke, Yang Luo, Jie Ouyang, Zhongjie He, Li Liu, Hongchao Fan, Hao Wu

Comments 24 pages, 11 figures, 12 tables

2603.23439 2026-03-25 cs.CV

SIGMA: A Physics-Based Benchmark for Gas Chimney Understanding in Seismic Images

Bao Truong, Quang Nguyen, Baoru Huang, Jinpei Han, Van Nguyen, Ngan Le, Minh-Tan Pham, Doan Huy Hien, Anh Nguyen

Comments Accepted at The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026

2603.23436 2026-03-25 cs.LG

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Connor Mclaughlin, Nigel Lee, Lili Su

Comments 9 pages

2603.23414 2026-03-25 cs.LG cs.AI

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling

Yiqi Zhang, Huiqiang Jiang, Xufang Luo, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li, Yuqing Yang, Lili Qiu, Yang You

2603.23413 2026-03-25 cs.CV

I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation

Jia Li, Han Yan, Yihang Chen, Siqi Li, Xibin Song, Yifu Wang, Jianfei Cai, Tien-Tsin Wong, Pan Ji

Comments Project page: https://riga2.github.io/i3dm

2603.23408 2026-03-25 cs.CV

GeoSANE: Learning Geospatial Representations from Models, Not Data

Joelle Hanna, Damian Falk, Stella X. Yu, Damian Borth

2603.23393 2026-03-25 cs.RO

Rectify, Don't Regret: Avoiding Pitfalls of Differentiable Simulation in Trajectory Prediction

Harsh Yadav, Christian Bohn, Tobias Meisen

2603.23390 2026-03-25 cs.CV eess.IV

Harnessing Lightweight Transformer with Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation

Xinyu Liu, Zhen Chen, Wuyang Li, Chenxin Li, Yixuan Yuan

Comments Accepted to IEEE TPAMI

2603.23386 2026-03-25 cs.CV cs.GR cs.RO

SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

Chuanrui Zhang, Minghan Qin, Yuang Wang, Baifeng Xie, Hang Li, Ziwei Wang

2603.23381 2026-03-25 cs.CV

FG-Portrait: 3D Flow Guided Editable Portrait Animation

Yating Xu, Yunqi Miao, Evangelos Ververas, Jiankang Deng, Jifei Song

Comments CVPR 2026

2603.23370 2026-03-25 cs.CV

Object Pose Transformer: Unifying Unseen Object Pose Estimation

Weihang Li, Lorenzo Garattoni, Fabien Despinoy, Nassir Navab, Benjamin Busam

Comments Project Page: https://colin-de.github.io/OPT-Pose/

2603.23365 2026-03-25 cs.RO

PinPoint: Monocular Needle Pose Estimation for Robotic Suturing via Stein Variational Newton and Geometric Residuals

Jesse F. d'Almeida, Tanner Watts, Susheela Sharma Stern, James Ferguson, Alan Kuntz, Robert J. Webster

Comments 15 pages, 7 Figures

2603.23355 2026-03-25 cs.LG cs.CL

Off-Policy Value-Based Reinforcement Learning for Large Language Models

Peng-Yuan Wang, Ziniu Li, Tian Xu, Bohan Yang, Tian-Shuo Liu, ChenYang Wang, Xiong-Hui Chen, Yi-Chen Li, Tianyun Yang, Congliang Chen, Yang Yu

2603.23346 2026-03-25 cs.AI

RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue

Long Mai