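arXivDaily: a daily academic digest that syncs the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.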

2602.18434 2026-02-23 cs.CV

Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory

Vatsal Agarwal, Saksham Suri, Matthew Gwilliam, Pulkit Kumar, Abhinav Shrivastava

Comments Project page: see https://vatsalag99.github.io/memstream/

English abstract

Streaming video understanding requires models to robustly encode, store, and retrieve information from a continuous video stream to support accurate video question answering (VQA). Existing state-of-the-art approaches rely on key-value caching to accumulate frame-level information over time, but use a limited number of tokens per frame, leading to the loss of fine-grained visual details. In this work, we propose scaling the token budget to enable more granular spatiotemporal understanding and reasoning. First, we find that current methods are ill-equipped to handle dense streams: their feature encoding causes query-frame similarity scores to increase over time, biasing retrieval toward later frames. To address this, we introduce an adaptive selection strategy that reduces token redundancy while preserving local spatiotemporal information. We further propose a training-free retrieval mixture-of-experts that leverages external models to better identify relevant frames. Our method, MemStream, achieves +8.0% on CG-Bench, +8.5% on LVBench, and +2.4% on VideoMME (Long) over ReKV with Qwen2.5-VL-7B.
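The abstract above refers to retrieving cached frames by query-frame similarity. As a rough illustration of that general idea only, the sketch below ranks cached per-frame feature vectors by cosine similarity to a query vector and returns the top-k frame indices. It is not MemStream's or ReKV's implementation; the function names `cosine` and `retrieve_top_k`, the one-vector-per-frame layout, and the toy features are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, frame_features, k):
    """Return the indices of the k cached frames most similar to the query.

    Hypothetical helper: `frame_features` holds one feature vector per frame,
    in temporal order. Ties are broken in favor of earlier frames, and the
    result is returned in temporal order.
    """
    scored = [(cosine(query, feat), idx) for idx, feat in enumerate(frame_features)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(idx for _, idx in scored[:k])


if __name__ == "__main__":
    # Toy 2-D features for three frames (illustrative values only).
    frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(retrieve_top_k([1.0, 0.0], frames, k=2))
```

If similarity scores drift upward over time, as the abstract reports for dense streams, a ranking like this one would tend to favor later frames, which is the bias the authors describe.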

2602.18432 2026-02-23 cs.CV

SARAH: Spatially Aware Real-time Agentic Humans

Evonne Ng, Siwei Zhang, Zhang Chen, Michael Zollhoefer, Alexander Richard

Comments Project page: https://evonneng.github.io/sarah/

2602.18429 2026-02-23 cs.CL cs.IR

VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Harshul Raj Surana, Arijit Maji, Aryan Vats, Akash Ghosh, Sriparna Saha, Amit Sheth

2602.18428 2026-02-23 cs.LG cs.CV eess.IV

The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Mojtaba Sahraee-Ardakan, Mauricio Delbracio, Peyman Milanfar

English abstract

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a ``Jensen Gap'' in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
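To make the definition $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$ concrete, here is a minimal numerical sketch under strong simplifying assumptions that are not taken from the paper: the data is a single point at the origin in one dimension, so $p(u|t)$ is a zero-mean Gaussian with standard deviation $t$, and the prior $p(t)$ is a discrete distribution over a few noise levels, so the integral becomes a weighted sum. The function names and the chosen noise levels are hypothetical.

```python
import math


def gaussian_pdf(u, sigma):
    """Density of N(0, sigma^2) evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_energy(u, noise_levels, weights):
    """E_marg(u) = -log sum_k w_k * p(u | t_k) for a discrete noise prior.

    Toy assumption: the clean data is the single point 0, so p(u | t) = N(u; 0, t^2).
    """
    p = sum(w * gaussian_pdf(u, t) for t, w in zip(noise_levels, weights))
    return -math.log(p)


if __name__ == "__main__":
    # Hypothetical discrete prior over noise levels (uniform weights).
    noise_levels = [0.01, 0.1, 1.0]
    weights = [1.0 / 3.0] * 3
    for u in (1.0, 0.1, 0.0):
        print(f"u={u:4.2f}  E_marg={marginal_energy(u, noise_levels, weights):.4f}")
```

In this toy setting the energy at the data point becomes lower as smaller noise levels are added to the prior, which gives a loose picture of the deepening potential well near the data manifold that the abstract discusses.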

2602.18425 2026-02-23 cs.CL cs.IR

RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering

Deniz Qian, Hung-Ting Chen, Eunsol Choi

Comments 18 pages, 12 figures, 12 tables

2602.18424 2026-02-23 cs.CV cs.RO

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation

Xia Su, Ruiqi Chen, Benlin Liu, Jingwei Ma, Zonglin Di, Ranjay Krishna, Jon Froehlich

2602.18422 2026-02-23 cs.CV

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Linxi Xie, Lisong C. Sun, Ashley Neall, Tong Wu, Shengqu Cai, Gordon Wetzstein

Comments Project page here: https://codeysun.github.io/generated-reality

2602.18421 2026-02-23 cs.RO cond-mat.soft

Snapping Actuators with Asymmetric and Sequenced Motion

Xin Li, Ye Jin, Mohsen Jafarpour, Hugo de Souza Oliveira, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

2602.18420 2026-02-23 cs.CL

SPQ: An Ensemble Technique for Large Language Model Compression

Jiamin Yao, Eren Gultepe

Comments Accepted to LREC 2026 Main Conference

2602.18417 2026-02-23 cs.LG cs.CL

Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

Joshua Nunley

Comments 12 pages, 3 figures, 8 tables

2602.18409 2026-02-23 cs.LG cs.AI cs.LO

Unifying approach to uniform expressivity of graph neural networks

Huan Luo, Jonni Virtema

2602.18403 2026-02-23 cs.LG

Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction: A Comparative Study

Orfeas Bourchas, George Papalambrou

Comments Accepted to the KGML Bridge at AAAI 2026 (non-archival)

2602.18401 2026-02-23 cs.LG cs.AI q-bio.NC stat.ML

Leakage and Second-Order Dynamics Improve Hippocampal RNN Replay

Josue Casco-Rodriguez, Nanda H. Krishna, Richard G. Baraniuk

2602.18397 2026-02-23 cs.RO

How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf

Wenqi Jiang, Jason Clemons, Karu Sankaralingam, Christos Kozyrakis

2602.18396 2026-02-23 cs.LG eess.SP math.PR stat.AP stat.ML

PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing

Ehsan Lari, Reza Arablouei, Stefan Werner

Comments 13 pages, 5 figures, 2 tables, Submitted to IEEE Transactions on Signal Processing (TSP)

2602.18394 2026-02-23 cs.CV

Self-Aware Object Detection via Degradation Manifolds

Stefan Becker, Simon Weiss, Wolfgang Hübner, Michael Arens

2602.18386 2026-02-23 cs.RO cs.AI cs.LG cs.SY eess.SY

Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Mohamed Elgouhary, Amr S. El-Wakeel

2602.18384 2026-02-23 cs.LG cs.AI

FedZMG: Efficient Client-Side Optimization in Federated Learning

Fotios Zantalis, Evangelos Zervas, Grigorios Koulouras

2602.18379 2026-02-23 cs.RO cond-mat.soft

Ori-Sense: origami capacitive sensing for soft robotic applications

Hugo de Souza Oliveira, Xin Li, Mohsen Jafarpour, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

2602.18374 2026-02-23 cs.RO cs.AI

Zero-shot Interactive Perception

Venkatesh Sripada, Frank Guerin, Amir Ghalamzan

Comments Original manuscript submitted on April 24, 2025. Timestamped and publicly available on OpenReview: https://openreview.net/forum?id=7MhpFcr5Nx

2602.18351 2026-02-23 cs.CL cs.AI

Validating Political Position Predictions of Arguments

Jordan Robinson, Angus R. Williams, Katie Atkinson, Anthony G. Cohn

Comments 13 pages, 6 figures, 6 tables. Under review

2602.18348 2026-02-23 cs.LG

Explaining AutoClustering: Uncovering Meta-Feature Contribution in AutoML for Clustering

Matheus Camilo da Silva, Leonardo Arrighi, Ana Carolina Lorena, Sylvio Barbon Junior

2602.18346 2026-02-23 cs.CL cs.AI

Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

Pavithra PM Nair, Preethu Rose Anish

2602.18344 2026-02-23 cs.RO

Downwash-aware Configuration Optimization for Modular Aerial Systems

Mengguang Li, Heinz Koeppl

Comments Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2026

2602.18330 2026-02-23 cs.RO

Tendon-Driven Reciprocating and Non-Reciprocating Motion via Snapping Metabeams

Mohsen Jafarpour, Ayberk Yüksek, Shahab Eshghi, Stanislav Gorb, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

2602.18329 2026-02-23 cs.CV math.AT

G-LoG Bi-filtration for Medical Image Classification

Qingsong Wang, Jiaxing He, Bingzhe Hou, Tieru Wu, Yang Cao, Cailing Yao

2602.18326 2026-02-23 cs.CL

Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning

Tao Wu, Adam Kapelner

Comments 8 pages, 3 figures, 4 tables

2602.18322 2026-02-23 cs.CV

Unifying Color and Lightness Correction with View-Adaptive Curve Adjustment for Robust 3D Novel View Synthesis

Ziteng Cui, Shuhong Liu, Xiaoyu Dong, Xuangeng Chu, Lin Gu, Ming-Hsuan Yang, Tatsuya Harada

Comments Journal extension version of CVPR 2025 paper: arXiv:2504.01503

2602.18314 2026-02-23 cs.CV cs.GR cs.RO

Diff2DGS: Reliable Reconstruction of Occluded Surgical Scenes via 2D Gaussian Splatting

Tianyi Song, Danail Stoyanov, Evangelos Mazomenos, Francisco Vasconcelos

Comments This work has been submitted to the IEEE for possible publication

2602.18312 2026-02-23 cs.RO cs.GR

Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty

Zhaoming Xie, Kevin Karol, Jessica Hodgins