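arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.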

2603.13091 2026-04-20 cs.CV

Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence

Seunghwan Bang, Hwanjun Song

Comments Project page: https://disl-lab.github.io/VAEX-Bench/

Abstract

The growing interest in embodied agents increases the demand for spatiotemporal video understanding, yet existing benchmarks largely emphasize extractive reasoning, where answers can be explicitly presented within spatiotemporal events. It remains unclear whether multimodal large language models can instead perform abstractive spatiotemporal reasoning, which requires integrating observations over time, combining dispersed cues, and inferring implicit spatial and contextual structure. To address this gap, we formalize abstractive spatiotemporal reasoning from videos by introducing a structured evaluation taxonomy that systematically targets its core dimensions, and construct a controllable, scenario-driven synthetic egocentric video dataset tailored to evaluate abstractive spatiotemporal reasoning capabilities, spanning object-, room-, and floor-plan-level scenarios. Based on this framework, we present VAEX-BENCH, a benchmark comprising five abstractive reasoning tasks together with their extractive counterparts. Our extensive experiments compare the performance of state-of-the-art MLLMs under extractive and abstractive settings, exposing their limitations on abstractive tasks and providing a fine-grained analysis of the underlying bottlenecks. The dataset will be released soon.

2603.11698 2026-04-20 cs.CV cs.AI cs.CL

OSCBench: Benchmarking Object State Change in Text-to-Video Generation

Xianjing Han, Bin Zhu, Shiqi Hu, Franklin Mingzhe Li, Patrick Carrington, Roger Zimmermann, Jingjing Chen

Comments ACL 2026 Main Conference, Project page: https://hanxjing.github.io/OSCBench

2603.11487 2026-04-20 cs.LG

Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks

Yuval Ran-Milo

Comments 21 pages, 8 figures

2603.06454 2026-04-20 cs.CV

Training Flow Matching: The Role of Weighting and Parameterization

Anne Gagneux, Ségolène Martin, Rémi Gribonval, Mathurin Massias

Comments Published as a paper at the 2nd DeLTa Workshop, ICLR 2026

2603.03332 2026-04-20 cs.CL cs.AI cs.LG

Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

Ashwath Vaithinathan Aravindan, Mayank Kejriwal

2603.02263 2026-04-20 cs.CV cs.AI

Social-JEPA: Emergent Geometric Isomorphism

Haoran Zhang, Youjin Wang, Yi Duan, Rong Fu, Dianyu Zhao, Sicheng Fan, Shuaishuai Cao, Wentao Guo, Xiao Zhou

Comments This preprint is withdrawn due to significant errors in the emergent geometric isomorphism results that necessitate full rewriting, coupled with unresolved author disagreement on authorship. A corrected and revised manuscript will be released separately

2603.02210 2026-04-20 cs.CV

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Yichen Liu, Donghao Zhou, Jie Wang, Xin Gao, Guisheng Liu, Jiatong Li, Quanwei Zhang, Qiang Lyu, Lanqing Guo, Shilei Wen, Weiqiang Wang, Pheng-Ann Heng

Comments Accepted by CVPR 2026 (Project page: https://correr-zhou.github.io/HiFi-Inpaint/)

2603.00663 2026-04-20 cs.RO

Optimal Solutions for the Moving Target Vehicle Routing Problem via Branch-and-Price with Relaxed Continuity

Anoop Bhat, Geordan Gutow, Zhongqiang Ren, Sivakumar Rathinam, Howie Choset

Comments Accepted to ICAPS 2026

2602.21950 2026-04-20 cs.CL

MEDSYN: Benchmarking Multi-EviDence SYNthesis in Complex Clinical Cases for Multimodal Large Language Models

Boqi Chen, Xudong Liu, Jiachuan Peng, Marianne Frey-Marti, Bang Zheng, Kyle Lam, Lin Li, Jianing Qiu

2602.17909 2026-04-20 cs.CV

A Single Image and Multimodality Is All You Need for Novel View Synthesis

Amirhosein Javadi, Chi-Shiang Gau, Konstantinos D. Polyzos, Tara Javidi

2602.15143 2026-04-20 cs.AI cs.CL

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Xinhang Ma, William Yeoh, Ning Zhang, Yevgeniy Vorobeychik

2602.14517 2026-04-20 cs.CL cs.LG

Large Language Models for Math Education in Low-Resource Languages: A Study in Sinhala and Tamil

Sukumar Kishanthan, Kumar Thushalika, Buddhi Jayasekara, Asela Hevapathige

Comments Accepted to ITHET 2026

2602.12370 2026-04-20 cs.CV

LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens

Zekun Li, Sizhe An, Chengcheng Tang, Chuan Guo, Ivan Shugurov, Linguang Zhang, Amy Zhao, Srinath Sridhar, Lingling Tao, Abhay Mittal

Comments Project page: https://kunkun0w0.github.io/project/LLaMo/

2602.07006 2026-04-20 cs.CV cs.LG stat.ML

Scalable spatial point process models for forensic footwear analysis

Alokesh Manna, Neil Spencer, Dipak K. Dey

2602.05638 2026-04-20 cs.CV

SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos

Jinlin Wu, Felix Holm, Chuxi Chen, An Wang, Yaxin Hu, Xiaofan Ye, Zelin Zang, Miao Xu, Lihua Zhou, Huai Liao, Danny T. M. Chan, Ming Feng, Wai S. Poon, Hongliang Ren, Dong Yi, Nassir Navab, Gaofeng Meng, Jiebo Luo, Hongbin Liu, Zhen Lei

2602.01766 2026-04-20 cs.LG cs.AI

CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Runsong Zhao, Shilei Liu, Jiwei Tang, Langming Liu, Haibin Chen, Weidong Zhang, Yujin Yuan, Tong Xiao, Jingbo Zhu, Wenbo Su, Bo Zheng

Comments ACL 2026 main

2602.01566 2026-04-20 cs.CL

FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

Chiwei Zhu, Benfeng Xu, Mingxuan Du, Shaohan Wang, Xiaorui Wang, Zhendong Mao, Yongdong Zhang

Comments 22 pages, 6 figures; Accepted to ACL 2026

2602.00114 2026-04-20 cs.CV cs.AI cs.LG

1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization

Yunwei Bai, Ying Kiat Tan, Yao Shu, Tsuhan Chen

2601.11374 2026-04-20 cs.CL

Reward Modeling for Scientific Writing Evaluation

Furkan Şahinuç, Subhabrata Dutta, Iryna Gurevych

Comments Accepted to ACL 2026 (Main). Project page: https://ukplab.github.io/acl2026-expert-rm/

2601.07160 2026-04-20 cs.AI cs.LG

AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units

Xinzi Cao, Jianyang Zhai, Pengfei Li, Zhiheng Hu, Cen Yan, Bingxu Mu, Guanghuan Fang, Bin She, Jiayu Li, Yihan Su, Dongyang Tao, Xiansong Huang, Fan Xu, Feidiao Yang, Yao Lu, Chang-Dong Wang, Yutong Lu, Weicheng Xue, Bin Zhou, Yonghong Tian

Comments 33 pages, 7 figures, 16 tables

2601.05547 2026-04-20 cs.CV cs.AI

VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck

Feiran Zhang, Yixin Wu, Zhenghua Wang, Xiaohua Wang, Changze Lv, Xuanjing Huang, Xiaoqing Zheng

2601.04377 2026-04-20 cs.CL cs.AI cs.LG

Disco-RAG: Discourse-Aware Retrieval-Augmented Generation

Dongqi Liu, Hang Ding, Qiming Feng, Xurong Xie, Zhucun Xue, Chengjie Wang, Jian Li, Jiangning Zhang, Yabiao Wang

Comments ACL 2026 Main & Long Conference Paper

2601.03746 2026-04-20 cs.CL

Whose Facts Win? LLM Source Preferences under Knowledge Conflicts

Jakob Schuster, Vagrant Gautam, Katja Markert

Comments Data and code: https://github.com/JaSchuste/llm-source-preference

2601.03699 2026-04-20 cs.CL

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Quy-Anh Dang, Chris Ngo, Truong-Son Hy

2601.03633 2026-04-20 cs.CV cs.AI

MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction

Wenjie Luo, Chuanhu Deng, Chaorong Li, Rongyao Deng, Qiang Yang

2601.02996 2026-04-20 cs.CL

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

Yihong Liu, Raoyuan Zhao, Hinrich Schütze, Michael A. Hedderich

Comments ACL 2026 Findings

2601.02531 2026-04-20 cs.CL cs.AI

Losses that Cook: Topological Optimal Transport for Structured Recipe Generation

Mattia Ottoborgo, Daniele Rege Cambrin, Paolo Garza

Comments Accepted to ACL 2026 Findings

2512.24625 2026-04-20 cs.LG cs.AI

AutoFed: Personalized Federated Traffic Prediction via Adaptive Prompt

Zijian Zhao, Yitong Shang, Sen Li

2512.23421 2026-04-20 cs.CV

DriveLaW: Unifying Planning and Video Generation in a Latent Driving World

Tianze Xia, Yongkang Li, Lijun Zhou, Jingfeng Yao, Kaixin Xiong, Haiyang Sun, Bing Wang, Kun Ma, Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang

Comments 18 pages, 6 figures, CVPR 2026

2512.23304 2026-04-20 cs.CV cs.AI

MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images

Md. Sazzadul Islam Prottasha, Nabil Walid Rafi

Comments Accepted for publication in the Journal of Machine Learning and Deep Learning (JMLDL). 9 pages, 9 figures, 10 tables