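arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.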

2605.01662 2026-05-05 cs.CV

Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models

Martin Q. Ma, Willis Guo, Aditya Agrawal, Ankit Gupta, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Comments ICCV 2025 workshop

Abstract

Large vision-language models (VLMs) have advanced multimodal tasks such as video question answering (QA). However, VLMs struggle to select frames both effectively and efficiently: standard uniform sampling is expensive, and performance may plateau as more frames are added. Inspired by active perception theory, which posits that models gain information by acquiring data that differs from their expectations, we introduce Video Active Perception (VAP), a training-free method that enhances long-form video QA with VLMs. Our approach treats keyframe selection as data acquisition in active perception and leverages a lightweight text-conditioned video generation model to represent prior world knowledge. Empirically, VAP achieves state-of-the-art zero-shot results on long-form and reasoning video QA datasets such as EgoSchema, NExT-QA, ActivityNet-QA, IntentQA, and CLEVRER, improving frame efficiency (measured in frames per question) by up to 5.6× over standard GPT-4o, Gemini 1.5 Pro, and LLaVA-OV. Moreover, VAP shows stronger reasoning abilities than previous methods and effectively selects keyframes relevant to the question. These findings highlight the potential of active perception for improving the frame effectiveness and efficiency of long-form video QA.
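The abstract frames keyframe selection as acquiring data that differs from expectations. The sketch below is not the authors' VAP implementation; it only illustrates the idea under stated assumptions. It assumes you already have per-frame embeddings for the real video and for a "prior" video produced at matching timesteps by some text-conditioned generator (both inputs are hypothetical here), scores each frame by cosine dissimilarity as a stand-in surprise measure, and keeps the k most surprising frames.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_surprising_frames(
    observed: List[Sequence[float]],
    expected: List[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k frames that deviate most from the prior's expectation.

    observed[i]: hypothetical embedding of real frame i.
    expected[i]: hypothetical embedding of the prior (generated) frame at the same step.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Assumed surprise measure: 1 - cosine similarity (larger = more unexpected).
    surprise = [1.0 - cosine_similarity(o, e) for o, e in zip(observed, expected)]
    # Rank by descending surprise; ties go to the earlier frame.
    ranked = sorted(range(len(surprise)), key=lambda i: (-surprise[i], i))
    # Return the chosen frames in temporal order for the VLM.
    return sorted(ranked[:k])
```

Returning indices chronologically rather than by score keeps the selected frames in their original order when they are passed to a VLM. Any distance between expected and observed content would fit the same pattern; cosine dissimilarity is only an illustrative choice.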

2605.01659 2026-05-05 cs.CV cs.AI

TRIMMER: A New Paradigm for Video Summarization through Self-Supervised Reinforcement Learning

Pritam Mishra, Coloma Ballester, Dimosthenis Karatzas

2605.01657 2026-05-05 cs.CV

Act2See: Emergent Active Visual Perception for Video Reasoning

Martin Q. Ma, Yuxiao Qu, Aditya Agrawal, Willis Guo, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Comments CVPR 2026

2605.01653 2026-05-05 cs.CV

SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models

Fangzheng Wu, Brian Summa

2605.01650 2026-05-05 cs.LG

Geospatial foundation-model embeddings improve population estimation unevenly across space and scale

Wenbin Zhang, Eimear Cleary, Francisco Rowe, Somnath Chaudhuri, Maksym Bondarenko, Shengjie Lai, Andrew J. Tatem

2605.01647 2026-05-05 cs.CL

Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection

Priyadarshan Narayanasamy, Swastik Agrawal, Klint Faber, Fardina Fathmiul Alam

Comments 11 figures, 10 tables, 24 pages, Under Review at COLM 2026

2605.01640 2026-05-05 cs.LG cs.CL

Prescriptive Scaling Laws for Data Constrained Training

Justin Lovelace, Christian Belardi, Srivatsa Kundurthy, Shriya Sudhakar, Kilian Q. Weinberger

2605.01638 2026-05-05 cs.CV

Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

Tianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Xinze Li, Bingyu Zhu, Wuhui Duan, Congang Chen, Zeyu Fu, Yi Dong, Baoyuan Wu, Jason Li, Guangliang Cheng

Comments Accepted to CVPR 2026

2605.01637 2026-05-05 cs.LG cs.CC cs.DM math.CO

The Banach-Butterfly Invariant: Influence-Adaptive Walsh Geometry for Ternary Polynomial Threshold Functions

Gorgi Pavlov

Comments 21 pages, 3 figures. Theory paper; LLM-application companion in preparation. Code, certificates, and 616,126 NPN-canonical n=5 representatives in supplementary repository

2605.01634 2026-05-05 cs.LG

Chebyshev-Augmented One-Shot Transfer Learning for PINNs on Nonlinear Differential Equations

Yiqi Rao, Pavlos Protopapas

Comments 18 pages, 4 figures, 9 tables, accepted to ICLR 2026 Workshop on Artificial Intelligence and Partial Differential Equations

2605.01632 2026-05-05 cs.LG

Perturb and Correct: Post-Hoc Ensembles using Affine Redundancy

Eleanor Quint

2605.01630 2026-05-05 cs.CL cs.AI

Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese

Roseval Malaquias Junior, Giovana Kerche Bonás, Thales Sales Almeida, Hugo Abonizio, Thiago Laitz, Ramon Pires, Marcos Piau, Celio Larcher, Rodrigo Nogueira

2605.01609 2026-05-05 cs.LG cs.AI

Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations

Pratyush Acharya, Nuraj Rimal, Habish Dhakal

Comments 25 pages, 16 figures, 13 tables

2605.01605 2026-05-05 cs.CL cs.AI

Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

Zhuoyun Li, Boxuan Wang, Jinwei Hu, Zhenglin Huang, Qisong He, Xinmiao Huang, Guangliang Cheng, Xiaowei Huang, Yi Dong

Comments Under review

2605.01604 2026-05-05 cs.AI

Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework

Mukund Pandey

Comments 11 pages, 6 tables, 1 figure. Reference implementation: https://github.com/mukund1985/llm-eval-toolkit

2605.01596 2026-05-05 cs.CL

Fine-Tuning Pre-Trained Code Models for AI-Generated Code Detection

Jany-Gabriel Ispas, Sergiu Nisioi

Comments Archaeology at SemEval-2026 Task 13

2605.01580 2026-05-05 cs.LG cs.AI

Model Merging: Foundations and Algorithms

Donato Crisostomi

Comments PhD thesis

2605.01574 2026-05-05 cs.LG

Hybrid Quantum Reinforcement Learning with QAOA for Improved Vehicle Routing Optimization

T. Satyanarayana Murthy, B. Swathi Sowmya, Santhosh Voruganti, Sai Varshini Giridi, Chaitanyya Pratap Agarwal, Vanteddu Akshitha

2605.01568 2026-05-05 cs.CV

Unifying Deep Stochastic Processes for Image Enhancement

Wojciech Kozłowski, Radosław Kuczbański, Kamil Adamczewski, Karol Szczypkowski, Maciej Zięba

Comments 27 pages, in Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea

2605.01566 2026-05-05 cs.AI

Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling

Florian Valentin Wunderlich, Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp

Comments Accepted at SRW at ACL 2026, long paper

2605.01563 2026-05-05 cs.CV

Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

Ceausescu Ciprian-Mihai, Anghelina Ion-Marian, Alexe Dumitru-Bogdan

Comments Journal extension of the KES paper

2605.01555 2026-05-05 cs.CL cs.AI cs.HC

Automated Interpretability and Feature Discovery in Language Models with Agents

Arnau Marin-Llobet, Javier Ferrando

2605.01552 2026-05-05 cs.CV

Robust Fundamental Matrix Estimation from Single Image Motion Blur

Bao-Long Tran, Per-Erik Forssén, Fredrik Viksten

Comments 13 pages, 8 figures, under submission

2605.01548 2026-05-05 cs.LG cs.CV eess.SP

ECG-biometrics-bench: A Unified Framework for Reproducible Benchmarking of ECG Biometrics

Milad Parvan

Comments Under review

DOI: 10.5281/zenodo.19451890

Abstract

Electrocardiogram (ECG) biometrics have emerged as a promising modality for continuous, liveness-aware authentication in wearable systems. However, many prior studies report overly optimistic results due to data leakage (e.g., random splits within the same session). To address this issue, we introduce ECG-biometrics-bench, a modular, reproducible benchmarking framework that standardizes preprocessing, segmentation, and evaluation across seven widely used public ECG datasets spanning clinical, ambulatory, and large-scale cohort settings. The framework supports both closed-set and open-set evaluation (here, open-set means subject-disjoint generalization), as well as progressively more realistic protocols, including cross-session and long-term temporal separation. To facilitate reproducible research, the ECG-biometrics-bench repository will be made publicly available on GitHub upon acceptance of this manuscript. Through a comprehensive multi-dataset analysis, we expose the Random Split Fallacy, demonstrating that intra-session evaluation protocols artificially inflate performance while masking severe degradation caused by temporal drift and unseen identities. Furthermore, by evaluating multiple architectures, including DeepECG, ResNet1D, and CNN-LSTM, we show that these failures are not model-specific but are likely inherent to current supervised feature-learning paradigms. Finally, we demonstrate that performance degradation due to temporal aging can be partially mitigated through a heavy-enrollment, lightweight-authentication strategy based on dynamic multi-session template fusion. These findings establish a more realistic baseline for ECG biometrics and highlight critical challenges that must be addressed for reliable real-world deployment.
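The abstract contrasts leaky random splits with session-disjoint evaluation and mentions multi-session template fusion. The following is a minimal sketch, not the benchmark's code: it assumes each ECG segment has already been mapped to an embedding by some feature extractor, uses a hypothetical `(subject_id, session_id, embedding)` record layout, and substitutes plain averaging of per-session means for the paper's dynamic fusion. The 0.9 cosine threshold is an arbitrary placeholder.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical record layout: (subject_id, session_id, segment_embedding).
Record = Tuple[str, str, Sequence[float]]


def random_split(records: List[Record], test_fraction: float, seed: int = 0):
    """Leaky baseline: shuffles segments without regard to session, so segments
    from one recording session can land in both train and test."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def cross_session_split(records: List[Record], test_sessions: Set[str]):
    """Session-disjoint split: enroll on some sessions, evaluate on held-out ones."""
    train = [r for r in records if r[1] not in test_sessions]
    test = [r for r in records if r[1] in test_sessions]
    return train, test


def shared_sessions(train: List[Record], test: List[Record]) -> Set[Tuple[str, str]]:
    """(subject, session) pairs that appear on both sides, i.e. potential leakage."""
    return {(s, sess) for s, sess, _ in train} & {(s, sess) for s, sess, _ in test}


def _mean(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def fuse_templates(records: List[Record]) -> Dict[str, List[float]]:
    """One template per subject: the mean of its per-session mean embeddings,
    so each enrollment session contributes equally (simple averaging stand-in)."""
    by_session: Dict[Tuple[str, str], List[Sequence[float]]] = defaultdict(list)
    for subject, session, emb in records:
        by_session[(subject, session)].append(emb)
    per_subject: Dict[str, List[List[float]]] = defaultdict(list)
    for (subject, _), embs in by_session.items():
        per_subject[subject].append(_mean(embs))
    return {subject: _mean(means) for subject, means in per_subject.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(probe: Sequence[float], claimed_id: str,
                 templates: Dict[str, List[float]], threshold: float = 0.9) -> bool:
    """Lightweight verification: one cosine comparison against the fused template."""
    return cosine(probe, templates[claimed_id]) >= threshold
```

Checking `shared_sessions` on a split makes the leakage concrete: a cross-session split yields an empty set by construction, whereas a random split will usually place segments from the same session on both sides.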

2605.01544 2026-05-05 cs.RO

An Efficient Metric for Data Quality Measurement in Imitation Learning

Noushad Sojib, Momotaz Begum

2605.01542 2026-05-05 cs.LG cs.AI physics.comp-ph

Mesh Based Simulations with Spatial and Temporal awareness

Paul Garnier, Vincent Lannelongue, Elie Hachem

2605.01537 2026-05-05 cs.CL

The grip of grammar on meaning uncertainty: cross-linguistic evidence, neural correlates, and clinical relevance

Rui He, Claudio Palominos, Samuele Vallisa, Ni Yang, Han Zhang, Miguel Ángel Santos Santos, Neguine Rezaii, Sergi Valero, Yonghua Huang, Huan Li, Hong Jiang, Yongjun Peng, Maria Francisca Alonso-Sánchez, Frederike Stein, Tilo Kircher, Philipp Homan, Iris Sommer, Lena Palaniyappan, Wolfram Hinzen

2605.01520 2026-05-05 cs.CV cs.CL

MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models

Yin Zhang, Jiaxuan Zhao, Zonghan Wu, Zengxiang Li, Junfeng Fang, Kun Wang, Qingsong Wen, Yilei Shao

2605.01519 2026-05-05 cs.CV

Certified vs. Empirical Adversarial Robustness via Hybrid Convolutions with Attention Stochasticity

Joy Dhar, Song Xia, Manish Kumar Pandey, Maryam Haghighat, Azadeh Alavi, Ferdous Sohel, Wenyu Zhang, Nayyar Zaidi

2605.01517 2026-05-05 cs.CV

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

Guotao Liang, Zhangcheng Wang, Chuang Wang, Juncheng Hu, Haitao Zhou, Junhua Liu, Jing Zhang, Dong Xu, Qian Yu

Comments Accepted to ICML 2026. Project page: https://yukinonooo.github.io/VAnimProject