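arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.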

2603.23501 2026-03-25 cs.CV cs.AI cs.CL

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage

Ufaq Khan, Umair Nawaz, L D M S S Teja, Numaan Saeed, Muhammad Bilal, Yutong Xie, Mohammad Yaqub, Muhammad Haris Khan

Comments 11 Pages

Abstract (English)

Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understanding. In clinical practice, interpretation begins with pre-diagnostic sanity checks: verifying that the input is valid to read (correct modality and anatomy, plausible viewpoint and orientation, and no obvious integrity violations). Existing benchmarks largely assume this step is solved, and therefore miss a critical failure mode: a model can produce plausible narratives even when the input is inconsistent or invalid. We introduce MedObvious, a 1,880-task benchmark that isolates input validation as a set-level consistency capability over small multi-panel image sets: the model must identify whether any panel violates expected coherence. MedObvious spans five progressive tiers, from basic orientation/modality mismatches to clinically motivated anatomy/viewpoint verification and triage-style cues, and includes five evaluation formats to test robustness across interfaces. Evaluating 17 different VLMs, we find that sanity checking remains unreliable: several models hallucinate anomalies on normal (negative-control) inputs, performance degrades when scaling to larger image sets, and measured accuracy varies substantially between multiple-choice and open-ended settings. These results show that pre-diagnostic verification remains unsolved for medical VLMs and should be treated as a distinct, safety-critical capability before deployment.

2603.23500 2026-03-25 cs.CV

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Huang, Wanli Ouyang

2603.23499 2026-03-25 cs.CV

DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

Jaewon Min, Jaeeun Lee, Yeji Choi, Paul Hyunbin Cho, Jin Hyeon Kim, Tae-Young Lee, Jongsik Ahn, Hwayeong Lee, Seonghyun Park, Seungryong Kim

Comments Project page: https://cvlab-kaist.github.io/DA-Flow

2603.23497 2026-03-25 cs.CV

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Zhen Li, Zian Meng, Shuwei Shi, Wenshuo Peng, Yuwei Wu, Bo Zheng, Chuanhao Li, Kaipeng Zhang

2603.23496 2026-03-25 cs.LG

Estimating Flow Velocity and Vehicle Angle-of-Attack from Non-invasive Piezoelectric Structural Measurements Using Deep Learning

Chandler B. Smith, S. Hales Swift, Andrew Steyer, Ihab El-Kady

2603.23495 2026-03-25 cs.CV cs.AI cs.LG

VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas, Yassine Ouali, Georgios Tzimiropoulos

Comments Accepted at CVPR 2026

2603.23491 2026-03-25 cs.CV

Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation

Brian Chao, Lior Yariv, Howard Xiao, Gordon Wetzstein

Comments Project website at https://bchao1.github.io/foveated-diffusion

2603.23490 2026-03-25 cs.CG cs.DS

Dynamic Light Spanners in Doubling Metrics

Sujoy Bhore, Jonathan Conroy, Arnold Filtser

2603.23489 2026-03-25 cs.CV

AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation

Woojeong Jin, Jaeho Lee, Heeseong Shin, Seungho Jang, Junhwan Heo, Seungryong Kim

2603.23487 2026-03-25 cs.CV

TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation

Jini Yang, Eunbeen Hong, Soowon Son, Hyunkoo Lee, Sunghwan Hong, Sunok Kim, Seungryong Kim

2603.23483 2026-03-25 cs.CV cs.CL

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Haoyu Huang, Jinfa Huang, Zhongwei Wan, Xiawu Zheng, Rongrong Ji, Jiebo Luo

Comments Code: https://github.com/MAC-AutoML/SpecEyes

2603.23482 2026-03-25 cs.SE cs.AI

ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains

Muhammad Khalid, Manuel Oriol, Yilmaz Uygun

Comments 17 pages, 6 figures, 7 tables. Accepted at VerifAI-2026 Workshop, co-located with ETAPS 2026

2603.23481 2026-03-25 cs.RO cs.AI cs.CV cs.LG

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Haoran Yuan, Weigang Yi, Zhenyu Zhang, Wendi Chen, Yuchen Mo, Jiashi Yin, Xinzhuo Li, Xiangyu Zeng, Chuan Wen, Cewu Lu, Katherine Driggs-Campbell, Ismini Lourentzou

Comments https://plan-lab.github.io/projects/vtam/

2603.23480 2026-03-25 cs.CE

Stablecoins as Dry Powder: A Copula-Based Risk Analysis of Cryptocurrency Markets

Elliot Jones, Toshiko Matsui, William Knottenbelt

2603.23478 2026-03-25 cs.CV

UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation

Jiaying Lin, Dan Xu

2603.23476 2026-03-25 cs.IT cs.NI cs.SY eess.SY math.IT

Index-Based Scheduling for a Resource-Constrained Quantum Switch

Subhankar Banerjee, Stavros Mitrolaris, Sennur Ulukus

2603.23475 2026-03-25 eess.SY cs.SY physics.app-ph

Bridging the numerical-physical gap in acoustic holography via end-to-end differentiable structural optimization

Moon Hwan Lee, Mohd. Afzal Khan, Akm Ashiquzzaman, Eunbin Lee, Jonghun Lee, Euiheon Chung, Hyuk-Sang Kwon, Jae Youn Hwang

2603.23474 2026-03-25 cs.CY

Evidence of political bias in search engines and language models before major elections

Íris Damião, Paulo Almeida, João Franco, Nuno Santos, Pedro C. Magalhães, Joana Gonçalves-Sá

Comments 20 pages, 4 figures; Supplementary Information : Page 22 - 74

2603.23471 2026-03-25 cs.CY

Regulating AI Agents

Kathrin Gardhouse, Amin Oueslati, Noam Kolt

2603.23470 2026-03-25 cs.SE

ConceptCoder: Improve Code Reasoning via Concept Learning

Md Mahbubur Rahman, Hengbo Tong, Wei Le

2603.23465 2026-03-25 eess.SY cs.SY

Statistical Efficiency of Single- and Multi-step Models for Forecasting and Control

Anne Somalwar, Bruce D. Lee, George J. Pappas, Nikolai Matni

Comments arXiv admin note: substantial text overlap with arXiv:2504.01766

2603.23463 2026-03-25 cs.CV cs.AI

InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

Duc Vu, Kien Nguyen, Trong-Tung Nguyen, Ngan Nguyen, Phong Nguyen, Khoi Nguyen, Cuong Pham, Anh Tran

Comments Accepted to CVPR'26 (Main Conference)

2603.23462 2026-03-25 cs.CV

RealMaster: Lifting Rendered Scenes into Photorealistic Video

Dana Cohen-Bar, Ido Sobol, Raphael Bensadoun, Shelly Sheynin, Oran Gafni, Or Patashnik, Daniel Cohen-Or, Amit Zohar

Comments Project page: https://danacohen95.github.io/RealMaster/

2603.23461 2026-03-25 cs.LG

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

2603.23458 2026-03-25 cs.GT cs.DC

SNARE: A TRAP for Rational Players to Solve Byzantine Consensus in the 5f+1 Model

Alejandro Ranchal-Pedrosa, Benjamin Marsh

Comments WIP

2603.23455 2026-03-25 cs.CV

DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection

Gautam Rajendrakumar Gare, Neehar Peri, Matvei Popov, Shruti Jain, John Galeotti, Deva Ramanan

Comments Project Page: https://ggare-cmu.github.io/DetPO/

2603.23450 2026-03-25 eess.SY cs.SY

Information-Driven Active Perception for k-step Predictive Safety Monitoring

Sumukha Udupa, Jie Fu

Comments 6 pages, 6 figures, 1 table, submitted to IEEE L-CSS

2603.23447 2026-03-25 cs.CV cs.AI

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding

Yiping Chen, Jinpeng Li, Wenyu Ke, Yang Luo, Jie Ouyang, Zhongjie He, Li Liu, Hongchao Fan, Hao Wu

Comments 24 pages, 11 figures, 12 tables

2603.23445 2026-03-25 cs.HC cs.MM

MRATTS: An MR-Based Acupoint Therapy Training System with Real-Time Acupoint Detection and Evaluation Standards

Jiacheng Liu, Bohan Chen, Qian Wang, Weichao Song, Fangfei Ye, Liang Zhou, Haibin Ling, Bingyao Huang

2603.23443 2026-03-25 cs.SE cs.AI

Evaluating LLM-Based Test Generation Under Software Evolution

Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar

Comments 10 pages, 9 figures, 2 tables

Abstract (English)

Large Language Models (LLMs) are increasingly used for automated unit test generation. However, it remains unclear whether these tests reflect genuine reasoning about program behavior or simply reproduce superficial patterns learned during training. If the latter dominates, LLM-generated tests may exhibit weaknesses such as reduced coverage, missed regressions, and undetected faults. Understanding how LLMs generate tests and how those tests respond to code evolution is therefore essential. We present a large-scale empirical study of LLM-based test generation under program changes. Using an automated mutation-driven framework, we analyze how generated tests react to semantic-altering changes (SAC) and semantic-preserving changes (SPC) across eight LLMs and 22,374 program variants. LLMs achieve strong baseline results, reaching 79% line coverage and 76% branch coverage with fully passing test suites on the original programs. However, performance degrades as programs evolve. Under SACs, the pass rate of newly generated tests drops to 66%, and branch coverage declines to 60%. More than 99% of failing SAC tests pass on the original program while executing the modified region, indicating residual alignment with the original behavior rather than adaptation to updated semantics. Performance also declines under SPCs despite unchanged functionality: pass rates fall to 79% and branch coverage to 69%. Although SPC edits preserve semantics, they often introduce larger syntactic changes, leading to instability in generated test suites. Models generate more new tests while discarding many baseline tests, suggesting sensitivity to lexical changes rather than true semantic impact. Overall, our results indicate that current LLM-based test generation relies heavily on surface-level cues and struggles to maintain regression awareness as programs evolve.
