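arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.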

2603.27060 2026-03-31 cs.CV

VIRST: Video-Instructed Reasoning Assistant for SpatioTemporal Segmentation

Jihwan Hong, Jaeyoung Do

Comments CVPR 2026

Abstract

Referring Video Object Segmentation (RVOS) aims to segment target objects in videos based on natural language descriptions. However, fixed keyframe-based approaches that couple a vision language model with a separate propagation module often fail to capture rapidly changing spatiotemporal dynamics and to handle queries requiring multi-step reasoning, leading to sharp performance drops on motion-intensive and reasoning-oriented videos beyond static RVOS benchmarks. To address these limitations, we propose VIRST (Video-Instructed Reasoning Assistant for Spatio-Temporal Segmentation), an end-to-end framework that unifies global video reasoning and pixel-level mask prediction within a single model. VIRST bridges semantic and segmentation representations through the Spatio-Temporal Fusion (STF), which fuses segmentation-aware video features into the vision-language backbone, and employs the Temporal Dynamic Anchor Updater to maintain temporally adjacent anchor frames that provide stable temporal cues under large motion, occlusion, and reappearance. This unified design achieves state-of-the-art results across diverse RVOS benchmarks under realistic and challenging conditions, demonstrating strong generalization to both referring and reasoning-oriented settings. The code and checkpoints are available at https://github.com/AIDASLab/VIRST.

2603.27059 2026-03-31 cs.CV

Towards Intrinsic-Aware Monocular 3D Object Detection

Zhihao Zhang, Abhinav Kumar, Xiaoming Liu

Comments This paper is accepted by CVPR 2026

2603.27058 2026-03-31 cs.LG cs.RO

Liquid Networks with Mixture Density Heads for Efficient Imitation Learning

Nikolaus Correll

2603.27057 2026-03-31 cs.CL cs.AI

Debiasing Large Language Models toward Social Factors in Online Behavior Analytics through Prompt Knowledge Tuning

Hossein Salemi, Jitin Krishnan, Hemant Purohit

Comments This is a preprint of the accepted paper for publication in IEEE Transactions on Computational Social Systems

2603.27055 2026-03-31 cs.CL cs.IR

Text Data Integration

Md Ataur Rahman, Dimitris Sacharidis, Oscar Romero, Sergi Nadal

Comments Accepted for Publication as a Book Chapter in "Data Engineering for Data Science" (ISBN: 978-3-032-18765-9)

2603.27040 2026-03-31 cs.CV

Unified Number-Free Text-to-Motion Generation Via Flow Matching

Guanhe Huang, Oya Celiktutan

2603.27035 2026-03-31 cs.SD

Diachronic Modeling of Tonal Coherence on the Tonnetz Across Classical and Popular Repertoires

Weilun Xu, Edward Hall, Martin Rohrmeier

2603.27033 2026-03-31 cs.CV

RealBirdID: Benchmarking Bird Species Identification in the Era of MLLMs

Logan Lawrence, Mustafa Chasmai, Rangel Daroya, Wuao Liu, Seoyun Jeong, Aaron Sun, Max Hamilton, Fabien Delattre, Oindrila Saha, Subhransu Maji, Grant Van Horn

Comments Accepted to CVPR26. 23 pages, 23 figures, 5 tables

2603.27029 2026-03-31 cs.CV cs.LG

YOLO Object Detectors for Robotics -- a Comparative Study

Patryk Niżeniec, Marcin Iwanowski, Marcin Gahbler

2603.27027 2026-03-31 cs.CL cs.AI

TAPS: Task Aware Proposal Distributions for Speculative Sampling

Mohamad Zbib, Mohamad Bazzi, Ammar Mohanna, Hasan Abed Al Kader Hammoud, Bernard Ghanem

Comments 21 pages, 11 figures. Code: https://github.com/Moe-Zbeeb/TAPS Weights: https://huggingface.co/collections/zbeeb/taps Datasets: https://huggingface.co/datasets/zbeeb/TAPS-Datasets

2603.27021 2026-03-31 cs.CL

Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language

Hanif Rahman, Shafeeq ur Rehman

Comments Submitted to Interspeech 2026

2603.27016 2026-03-31 cs.CV cs.AI

Generative Shape Reconstruction with Geometry-Guided Langevin Dynamics

Linus Härenstam-Nielsen, Dmitrii Pozdeev, Thomas Dagès, Nikita Araslanov, Daniel Cremers

2603.27014 2026-03-31 cs.CV

GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection

Jiaming Li, Zhijia Liang, Weikai Chen, Lin Ma, Guanbin Li

Comments NIPS2025

2603.27012 2026-03-31 cs.RO cs.AI

UMI-Underwater: Learning Underwater Manipulation without Underwater Teleoperation

Hao Li, Long Yin Chung, Jack Goler, Ryan Zhang, Xiaochi Xie, Huy Ha, Shuran Song, Mark Cutkosky

2603.27008 2026-03-31 cs.CL

RASPRef: Retrieval-Augmented Self-Supervised Prompt Refinement for Large Reasoning Models

Rahul Soni

2603.26997 2026-03-31 cs.RO cs.HC

ROSClaw: An OpenClaw ROS 2 Framework for Agentic Robot Control and Interaction

Irvin Steve Cardenas, Marcus Anthony Arnett, Natalie Catherine Yeo, Lucky Sah, Jong-Hoon Kim

2603.26996 2026-03-31 cs.AI cs.CL cs.LG cs.PL

FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?

Nikil Ravi, Kexing Ying, Vasilii Nesterov, Rayan Krishnan, Elif Uskuplu, Bingyu Xia, Janitha Aswedige, Langston Nashold

Comments Accepted at ICLR 2026 Workshop: VerifAI-2: The Second Workshop on AI Verification in the Wild. Live leaderboard hosted here: https://www.vals.ai/benchmarks/proof_bench

2603.26995 2026-03-31 cs.RO

SCRAMPPI: Efficient Contingency Planning for Mobile Robot Navigation via Hamilton-Jacobi Reachability

Raj Harshit Srirangam, Leonard Jung, Rohith Poola, Michael Everett

Comments 8 pages, 5 figures

2603.26994 2026-03-31 cs.LG q-bio.QM

ImmSET: Sequence-Based Predictor of TCR-pMHC Specificity at Scale

Marco Garcia Noceda, Matthew T Noakes, Andrew FigPope, Daniel E Mattox, Bryan Howie, Harlan Robins

Comments Accepted to ML4H 2025 (Proceedings Track). To appear in PMLR 297

2603.26992 2026-03-31 cs.CL

A large corpus of lucid and non-lucid dream reports

Remington Mallett

2603.26989 2026-03-31 cs.SD

Algo Pärt: An Algorithmic Reconstruction of Arvo Pärt's Summa

Bas Cornelissen

Comments 21 pages, 15 figures

2603.26988 2026-03-31 cs.SD eess.AS

Rhythmic segment analysis: Conceptualizing, visualizing, and measuring rhythmic data

Bas Cornelissen

Comments 15 pages, 7 figures

2603.26983 2026-03-31 cs.AI cs.CY cs.LG

Transparency as Architecture: Structural Compliance Gaps in EU AI Act Article 50 II

Vera Schmitt, Niklas Kruse, Premtim Sahitaj, Julius Schöning

Comments 10 pages, 2 figures

2603.26976 2026-03-31 cs.CV

Beyond Mortality: Advancements in Post-Mortem Iris Recognition through Data Collection and Computer-Aided Forensic Examination

Rasel Ahmed Bhuiyan, Parisa Farmanifard, Renu Sharma, Andrey Kuehlkamp, Aidan Boyd, Patrick J Flynn, Kevin W Bowyer, Arun Ross, Dennis Chute, Adam Czajka

2603.26975 2026-03-31 cs.LG

Probabilistic Forecasting of Localized Wildfire Spread Based on Conditional Flow Matching

Bryan Shaddy, Haitong Qin, Brianna Binder, James Haley, Riya Duddalwar, Kyle Hilburn, Assad Oberai

2603.26954 2026-03-31 cs.LG math.ST stat.TH

High dimensional theory of two-phase optimizers

Atish Agarwala

2603.26952 2026-03-31 cs.CV cs.LG

Multimodal Deep Learning for Diabetic Foot Ulcer Staging Using Integrated RGB and Thermal Imaging

Gulengul Mermer, Mustafa Furkan Aksu, Gozde Ozsezer, Sevki Cetinkalp, Orhan Er, Mehmet Kemal Gullu

Comments 18 pages, 7 figures

2603.26945 2026-03-31 cs.CV

Real-time Appearance-based Gaze Estimation for Open Domains

Zhenhao Li, Zheng Liu, Seunghyun Lee, Amin Fadaeinejad, Yuanhao Yu

2603.26944 2026-03-31 cs.AI

Neuro-Symbolic Learning for Predictive Process Monitoring via Two-Stage Logic Tensor Networks with Rule Pruning

Fabrizio De Santis, Gyunam Park, Francesco Zanichelli

Comments Accepted PAKDD 2026

2603.26939 2026-03-31 cs.SD cs.CL eess.AS

Multilingual Stutter Event Detection for English, German, and Mandarin Speech

Felix Haas, Sebastian P. Bayerl