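arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.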

2603.14659 2026-03-17 cs.CV cs.AI

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

Daeun Lee, Shoubin Yu, Yue Zhang, Mohit Bansal

Comments Project website: https://visioncoach.github.io/

Abstract

Video reasoning requires models to locate and track question-relevant evidence across frames. While reinforcement learning (RL) with verifiable rewards improves accuracy, it still struggles to achieve reliable spatio-temporal grounding during the reasoning process. Moreover, improving grounding typically relies on scaled training data or inference-time perception tools, which increase annotation or computational cost. To address this challenge, we propose VisionCoach, an input-adaptive RL framework that improves spatio-temporal grounding through visual prompting as training-time guidance. During RL training, visual prompts are selectively applied to challenging inputs to amplify question-relevant evidence and suppress distractors. The model then internalizes these improvements through self-distillation, enabling grounded reasoning directly on raw videos without visual prompting at inference. VisionCoach consists of two components: (1) Visual Prompt Selector, which predicts appropriate prompt types conditioned on the video and question, and (2) Spatio-Temporal Reasoner, optimized with RL under visual prompt guidance and object-aware grounding rewards that enforce object identity consistency and multi-region bounding-box overlap. Extensive experiments demonstrate that VisionCoach achieves state-of-the-art performance under comparable settings across diverse video reasoning, video understanding, and temporal grounding benchmarks (V-STAR, VideoMME, World-Sense, VideoMMMU, PerceptionTest, and Charades-STA), while maintaining a single efficient inference pathway without external tools. Our results show that visual prompting during training improves grounded video reasoning, while self-distillation enables the model to internalize this ability without requiring prompts at inference time.
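
The abstract does not give the formula for the object-aware grounding reward, so the following is only a hypothetical sketch of one way a reward could combine "object identity consistency" with "multi-region bounding-box overlap". It assumes boxes are (x1, y1, x2, y2) tuples keyed by a hypothetical (frame index, object id) pair. A predicted box earns IoU credit only when it carries the same identity key as a ground-truth box, and the reward is the mean IoU over all ground-truth regions. The names `iou` and `grounding_reward` are illustrative and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]
# Hypothetical key: (frame index, object id), so regions are matched per frame and per object.
Key = Tuple[int, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred: Dict[Key, Box], gt: Dict[Key, Box]) -> float:
    """Assumed reward: mean IoU over ground-truth regions.

    A predicted box only scores if it uses the same (frame, object id) key,
    which stands in for identity consistency; averaging over every
    ground-truth region stands in for multi-region overlap.
    """
    if not gt:
        return 0.0
    total = 0.0
    for key, gt_box in gt.items():
        pred_box = pred.get(key)  # a wrong or missing identity earns nothing
        if pred_box is not None:
            total += iou(pred_box, gt_box)
    return total / len(gt)
```

In this sketch, a perfectly placed box that is labeled with the wrong object id contributes nothing. That is the kind of identity check the abstract describes, although the actual reward in VisionCoach may be weighted or matched differently.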

2603.14658 2026-03-17 cs.CV cs.AI

Human-AI Ensembles Improve Deepfake Detection in Low-to-Medium Quality Videos

Marco Postiglione, Isabel Gortner, V. S. Subrahmanian

2603.14656 2026-03-17 cs.RO

Coordinate-Independent Robot Model Identification

Yanhao Yang, Ross L. Hatton

Comments 8 pages, 7 figures, supplementary video: https://youtu.be/w2bBBV9t1fk?si=iCoJ4l51wumwvCIo

2603.14651 2026-03-17 cs.LG cs.AI cs.MA

EARCP: Self-Regulating Coherence-Aware Ensemble Architecture for Sequential Decision Making -- Ensemble Auto-Regule par Coherence et Performance

Mike Amega

Comments 13 pages, 1 table, 1 algorithm. Open-source implementation available at https://github.com/Volgat/earcp and via pip install earcp. Dual-licensed: free for academic researchers, students, and organizations with gross revenue under $100,000/year; commercial license required for organizations exceeding this threshold (contact author)

2603.14648 2026-03-17 cs.LG cs.AI

A Methodology for Thermal Limit Bias Predictability Through Artificial Intelligence

Anirudh Tunga, Michael J. Mueterthies, Jonathan Nistor

2603.14647 2026-03-17 cs.CV cs.AI

TopoCL: Topological Contrastive Learning for Medical Imaging

Guangyu Meng, Pengfei Gu, Peixian Liang, John P. Lalor, Erin Wolf Chambers, Danny Z. Chen

2603.14646 2026-03-17 cs.AI

Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models

Thuy Ngoc Nguyen, Duy Nhat Phan, Cleotilde Gonzalez

Comments 8 pages, 4 figures, 3 tables, conference

2603.14645 2026-03-17 cs.CV

Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion

Mang Ning, Mingxiao Li, Le Zhang, Lanmiao Liu, Matthew B. Blaschko, Albert Ali Salah, Itir Onal Ertugrul

Comments We use the NIPS template for readability reasons

2603.14643 2026-03-17 cs.AI cs.CL

Argumentation for Explainable and Globally Contestable Decision Support with LLMs

Adam Dejl, Matthew Williams, Francesca Toni

2603.14639 2026-03-17 cs.RO cs.CV

Seeing Where to Deploy: Metric RGB-Based Traversability Analysis for Aerial-to-Ground Hidden Space Inspection

Seoyoung Lee, Shaekh Mohammad Shithil, Durgakant Pushp, Lantao Liu, Zhangyang Wang

2603.14636 2026-03-17 cs.SD cs.AI cs.CL eess.AS

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee

Comments 6 pages, 4 figures, 2 tables

2603.14632 2026-03-17 cs.CV cs.IT math.IT

Continual Few-shot Adaptation for Synthetic Fingerprint Detection

Joseph Geo Benjamin, Anil K. Jain, Karthik Nandakumar

Comments Accepted at the 14th International Workshop on Biometrics and Forensics (IWBF-2026)

2603.14631 2026-03-17 cs.LG cs.AI cs.CL

Anterior's Approach to Fairness Evaluation of Automated Prior Authorization System

Sai P. Selvaraj, Khadija Mahmoud, Anuj Iravane

2603.14623 2026-03-17 cs.LG

Proactive Routing to Interpretable Surrogates with Distribution-Free Safety Guarantees

Iqtedar Uddin, Mazin Khider, André Bauer

2603.14609 2026-03-17 cs.CV

GroundSet: A Cadastral-Grounded Dataset for Spatial Understanding with Vector Data

Roger Ferrod, Maël Lecene, Krishna Sapkota, George Leifman, Vered Silverman, Genady Beryozkin, Sylvain Lobry

2603.14608 2026-03-17 cs.LG cs.AI math.OC stat.ML

Delightful Policy Gradient

Ian Osband

2603.14605 2026-03-17 cs.RO

CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports

Peng Ren, Chuan Qi, Haoyang Ge, Qiyuan Su, Xuguo He, Cong Huang, Pei Chi, Jiang Zhao, Kai Chen

2603.14604 2026-03-17 cs.RO cs.CV cs.LG

Tactile Modality Fusion for Vision-Language-Action Models

Charlotte Morissette, Amin Abyaneh, Wei-Di Chang, Anas Houssaini, David Meger, Hsiu-Chin Lin, Jonathan Tremblay, Gregory Dudek

Comments 19 pages, 5 figures

2603.14603 2026-03-17 cs.RO

Latent Dynamics-Aware OOD Monitoring for Trajectory Prediction with Provable Guarantees

Tongfei Guo, Lili Su

2603.14600 2026-03-17 cs.LG cs.AI cs.RO

A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study

Jingyi Liu, Jian Guo, Eberhard Gill

Comments Submitted to Acta Astronautica

2603.14598 2026-03-17 cs.RO

SmallSatSim: A High-Fidelity Simulation and Training Toolkit for Microgravity Robotic Close Proximity Operations

David Schwartz, Alexander Hansson, Sabrina Bodmer, David Sternberg, Oliver Jia-Richards, Keenan Albee

Comments 7 pages, 7 figures

2603.14594 2026-03-17 cs.AI cs.LG cs.LO

Scaling the Explanation of Multi-Class Bayesian Network Classifiers

Yaofang Zhang, Adnan Darwiche

Comments To appear in the 4th World Conference on Explainable Artificial Intelligence (XAI), 2026

2603.14593 2026-03-17 cs.CL

Parameter-Efficient Quality Estimation via Frozen Recursive Models

Umar Abubacar, Roman Bauer, Diptesh Kanojia

Comments Accepted to LowResLM Workshop @ EACL 2026

2603.14592 2026-03-17 cs.LG cs.CR

A Multi-Scale Graph Learning Framework with Temporal Consistency Constraints for Financial Fraud Detection in Transaction Networks under Non-Stationary Conditions

Yiming Lei, Qiannan Shen, Junhao Song

Comments 39 pages, 13 figures

Abstract

Financial fraud detection in transaction networks involves modeling sparse anomalies, dynamic patterns, and severe class imbalance in the presence of temporal drift in the data. In real-world transaction systems, a suspicious transaction is rarely isolated; rather, legitimate and suspicious transactions are often connected through accounts, intermediaries, or temporal transaction sequences. Attribute-based or randomly partitioned learning pipelines are therefore insufficient to detect relationally structured fraud. We propose STC-MixHop, a graph-based framework that combines spatial multi-resolution propagation with lightweight temporal consistency modeling for anomaly and fraud detection in dynamic transaction networks. It integrates three components: a MixHop-inspired multi-scale neighborhood diffusion encoder for learning structural patterns; a spatial-temporal attention module that couples current and preceding graph snapshots to stabilize representations; and a temporally informed self-supervised pretraining strategy that exploits unlabeled transaction interactions to improve representation quality. We evaluate the framework primarily on the PaySim dataset under strict chronological splits, supplementing the analysis with Porto Seguro and FEMA data to probe cross-domain component behavior. Results show that STC-MixHop is competitive among graph methods and achieves strong screening-oriented recall under highly imbalanced conditions. The experiments also reveal an important boundary condition: when node attributes are highly informative, tabular baselines remain difficult to outperform. Graph structure contributes most clearly where hidden relational dependencies are operationally important. These findings support a stability-focused view of graph learning for financial fraud detection.
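
Below is a minimal NumPy sketch of the generic MixHop idea that the encoder is described as building on. Features are propagated over several adjacency powers (here 0, 1 and 2 hops, an assumed choice) with a symmetrically normalized adjacency that includes self-loops. Each hop gets its own weight matrix, and the per-hop outputs are concatenated. The abstract does not specify the exact form of the spatial-temporal attention module. In its place, `temporal_consistency_loss` is a hypothetical squared-difference penalty between embeddings from consecutive snapshots, included only to illustrate what a temporal consistency constraint can look like. It is not the paper's module.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (GCN-style, assumed)."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def mixhop_layer(adj, x, weights, powers=(0, 1, 2)):
    """Concatenate ReLU(A_norm^p @ x @ W_p) over the chosen adjacency powers p."""
    a_norm = normalized_adjacency(adj)
    outputs = []
    for p, w in zip(powers, weights):
        propagated = np.linalg.matrix_power(a_norm, p) @ x  # p-hop diffusion
        outputs.append(np.maximum(propagated @ w, 0.0))  # per-hop ReLU
    return np.concatenate(outputs, axis=1)


def temporal_consistency_loss(h_t: np.ndarray, h_prev: np.ndarray) -> float:
    """Hypothetical penalty: mean squared change of node embeddings between snapshots."""
    return float(np.mean((h_t - h_prev) ** 2))


# Usage on a toy 3-node path graph (0 - 1 - 2) with one-hot node features.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)) for _ in range(3)]
h = mixhop_layer(adj, np.eye(3), weights)  # shape (3, 6): 3 hops x 2 dims each
```

Concatenating rather than summing the hop outputs lets downstream layers weight near and far neighborhoods separately. That is the multi-resolution property the abstract attributes to the encoder.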

2603.14591 2026-03-17 cs.LG cs.AI cs.IR

FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

Wilhelm Tranheden, Shahnawaz Ahmed, Devdatt Dubhashi, Jonna Matthiesen, Hannes von Essen

Comments A collection of models with FlashHead optimization can be found at: https://huggingface.co/collections/embedl/flashhead

2603.14589 2026-03-17 cs.LG cs.AI cs.RO

Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

Jingyi Liu, Jian Guo, Eberhard Gill

Comments Revised manuscript, submitted to Astrodynamics

2603.14588 2026-03-17 cs.AI cs.IR cs.LG

SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory

Varun Pratap Bhardwaj

Comments 43 pages, 5 figures, 9 tables, 3 appendices. Code: https://github.com/qualixar/superlocalmemory. Zenodo DOI: 10.5281/zenodo.19038659

2603.14587 2026-03-17 cs.CV

Texel Splatting: Perspective-Stable 3D Pixel Art

Dylan Ebert

Comments 3 pages, 2 figures

2603.14567 2026-03-17 cs.CL

Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes

Deepon Halder, Raj Dabre

2603.14563 2026-03-17 cs.CL

Multilingual TinyStories: A Synthetic Combinatorial Corpus of Indic Children's Stories for Training Small Language Models

Deepon Halder, Angira Mukherjee