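arXivDaily daily academic digest: synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.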

2602.18434 2026-02-23 cs.CV

Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory

Vatsal Agarwal, Saksham Suri, Matthew Gwilliam, Pulkit Kumar, Abhinav Shrivastava

Comments Project page: see https://vatsalag99.github.io/memstream/

Abstract

Streaming video understanding requires models to robustly encode, store, and retrieve information from a continuous video stream to support accurate video question answering (VQA). Existing state-of-the-art approaches rely on key-value caching to accumulate frame-level information over time, but use a limited number of tokens per frame, leading to the loss of fine-grained visual details. In this work, we propose scaling the token budget to enable more granular spatiotemporal understanding and reasoning. First, we find that current methods are ill-equipped to handle dense streams: their feature encoding causes query-frame similarity scores to increase over time, biasing retrieval toward later frames. To address this, we introduce an adaptive selection strategy that reduces token redundancy while preserving local spatiotemporal information. We further propose a training-free retrieval mixture-of-experts that leverages external models to better identify relevant frames. Our method, MemStream, achieves +8.0% on CG-Bench, +8.5% on LVBench, and +2.4% on VideoMME (Long) over ReKV with Qwen2.5-VL-7B.
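
The query-frame similarity retrieval mentioned in the abstract can be illustrated with a generic sketch. This is not MemStream's or ReKV's actual method: the function names (`cosine`, `retrieve_top_k`), the two-dimensional toy features, and the choice of cosine similarity are all assumptions made for illustration. It only shows the basic idea of scoring cached frame features against a query and keeping the top-k frames.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query, frame_features, k):
    """Return the indices of the k frames most similar to the query, in temporal order."""
    scores = [cosine(query, feat) for feat in frame_features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical cached per-frame features (one vector per frame) and a query vector.
frames = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.9, 0.1]]
query = [1.0, 0.0]
top = retrieve_top_k(query, frames, k=2)
print(top)
```

If similarity scores drift upward over time, as the abstract reports for dense streams, a ranking like this one would favor later frames regardless of content, which is the bias the paper sets out to address.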

2602.18432 2026-02-23 cs.CV

SARAH: Spatially Aware Real-time Agentic Humans

Evonne Ng, Siwei Zhang, Zhang Chen, Michael Zollhoefer, Alexander Richard

Comments Project page: https://evonneng.github.io/sarah/

2602.18429 2026-02-23 cs.CL cs.IR

VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Harshul Raj Surana, Arijit Maji, Aryan Vats, Akash Ghosh, Sriparna Saha, Amit Sheth

2602.18428 2026-02-23 cs.LG cs.CV eess.IV

The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Mojtaba Sahraee-Ardakan, Mauricio Delbracio, Peyman Milanfar

Abstract

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a ``Jensen Gap'' in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
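
The definition $E_{\text{marg}}(\mathbf{u}) = -\log \int p(\mathbf{u}|t)p(t)dt$ can be evaluated numerically in a toy setting. The sketch below is not the paper's construction. It assumes a one-dimensional problem whose clean data is a single point at 0, Gaussian corruption $p(u|t) = \mathcal{N}(u; 0, t^2)$, and a uniform discrete prior over a hypothetical grid of noise levels. The names `gaussian_pdf` and `marginal_energy` are illustrative.

```python
import math

def gaussian_pdf(u, sigma):
    """Density of N(0, sigma^2) evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_energy(u, noise_levels):
    """E_marg(u) = -log p(u), with p(u) averaged over a uniform prior on noise_levels."""
    p = sum(gaussian_pdf(u, t) for t in noise_levels) / len(noise_levels)
    return -math.log(p)

# Assumed priors: uniform over t = 0.01, ..., 1.00 (coarse) and t = 0.001, ..., 1.000 (fine).
coarse = [0.01 * i for i in range(1, 101)]
fine = [0.001 * i for i in range(1, 1001)]

# Energy at points approaching the data point u = 0.
energies = [marginal_energy(u, coarse) for u in (1.0, 0.5, 0.1, 0.0)]
print(energies)

# Including smaller noise levels in the prior deepens the well at the data point.
print(marginal_energy(0.0, coarse), marginal_energy(0.0, fine))
```

In this toy case the energy decreases monotonically toward the data point, and the value at u = 0 keeps dropping as the prior admits smaller noise levels. That is a simple analogue of the unbounded potential well the abstract describes, under the assumptions stated above.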

2602.18427 2026-02-23 math.CO cs.DM math-ph math.MP math.OC

Polytopes of alternating sign matrices with dihedral-subgroup symmetry

Péter Madarasi

2602.18426 2026-02-23 astro-ph.GA cs.CV

Spatio-Spectroscopic Representation Learning using Unsupervised Convolutional Long-Short Term Memory Networks

Kameswara Bharadwaj Mantha, Lucy Fortson, Ramanakumar Sankar, Claudia Scarlata, Chris Lintott, Sandor Kruk, Mike Walmsley, Hugh Dickinson, Karen Masters, Brooke Simmons, Rebecca Smethurst

Comments This manuscript was previously submitted to ICML for peer review. Reviewers noted that while the underlying VAE-based architecture builds on established methods, its application to spatially-resolved IFS data is promising for unsupervised representation learning in astronomy. This version is released for community visibility. Reviewer decisions: Weak accept and Weak reject (Final: Reject)

2602.18425 2026-02-23 cs.CL cs.IR

RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering

Deniz Qian, Hung-Ting Chen, Eunsol Choi

Comments 18 pages, 12 figures, 12 tables

2602.18424 2026-02-23 cs.CV cs.RO

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation

Xia Su, Ruiqi Chen, Benlin Liu, Jingwei Ma, Zonglin Di, Ranjay Krishna, Jon Froehlich

2602.18422 2026-02-23 cs.CV

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Linxi Xie, Lisong C. Sun, Ashley Neall, Tong Wu, Shengqu Cai, Gordon Wetzstein

Comments Project page here: https://codeysun.github.io/generated-reality

2602.18421 2026-02-23 cs.RO cond-mat.soft

Snapping Actuators with Asymmetric and Sequenced Motion

Xin Li, Ye Jin, Mohsen Jafarpour, Hugo de Souza Oliveira, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

2602.18420 2026-02-23 cs.CL

SPQ: An Ensemble Technique for Large Language Model Compression

Jiamin Yao, Eren Gultepe

Comments Accepted to LREC 2026 Main Conference

2602.18417 2026-02-23 cs.LG cs.CL

Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

Joshua Nunley

Comments 12 pages, 3 figures, 8 tables

2602.18416 2026-02-23 eess.SY cs.SY math.OC

Convex Block-Cholesky Approach to Risk-Constrained Low-thrust Trajectory Design under Operational Uncertainty

Kenshiro Oguri, Gregory Lantoine

2602.18409 2026-02-23 cs.LG cs.AI cs.LO

Unifying approach to uniform expressivity of graph neural networks

Huan Luo, Jonni Virtema

2602.18405 2026-02-23 cs.IT math.IT

A Generalized Information Bottleneck Method: A Decision-Theoretic Perspective

Akira Kamatsuka, Takahiro Yoshida

2602.18404 2026-02-23 math.NA cs.NA

Well-posedness and time stepping adaptivity for a class of collocation discretisations of time-fractional subdiffusion equations

Sebastian Franz, Natalia Kopteva

Comments 23 pages, 9 figures

2602.18403 2026-02-23 cs.LG

Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction: A Comparative Study

Orfeas Bourchas, George Papalambrou

Comments Accepted to the KGML Bridge at AAAI 2026 (non-archival)

2602.18401 2026-02-23 cs.LG cs.AI q-bio.NC stat.ML

Leakage and Second-Order Dynamics Improve Hippocampal RNN Replay

Josue Casco-Rodriguez, Nanda H. Krishna, Richard G. Baraniuk

2602.18397 2026-02-23 cs.RO

How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf

Wenqi Jiang, Jason Clemons, Karu Sankaralingam, Christos Kozyrakis

2602.18396 2026-02-23 cs.LG eess.SP math.PR stat.AP stat.ML

PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing

Ehsan Lari, Reza Arablouei, Stefan Werner

Comments 13 pages, 5 figures, 2 tables, Submitted to IEEE Transactions on Signal Processing (TSP)

2602.18394 2026-02-23 cs.CV

Self-Aware Object Detection via Degradation Manifolds

Stefan Becker, Simon Weiss, Wolfgang Hübner, Michael Arens

2602.18390 2026-02-23 cs.DB cs.LO

Dichotomy for Axiomatising Inclusion Dependencies on K-Databases

Miika Hannula, Teymur Ismikhanov, Jonni Virtema

2602.18389 2026-02-23 cs.DS

Improved Algorithms for Clustering with Noisy Distance Oracles

Pinki Pradhan, Anup Bhattacharya, Ragesh Jaiswal

Comments 37 pages, 10 figures

2602.18386 2026-02-23 cs.RO cs.AI cs.LG cs.SY eess.SY

Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Mohamed Elgouhary, Amr S. El-Wakeel

2602.18384 2026-02-23 cs.LG cs.AI

FedZMG: Efficient Client-Side Optimization in Federated Learning

Fotios Zantalis, Evangelos Zervas, Grigorios Koulouras

2602.18382 2026-02-23 eess.SY cs.SY math.OC

Incremental Input-to-State Stability and Equilibrium Tracking for Stochastic Contracting Dynamics

Yu Kawano, Simone Betteti, Alexander Davydov, Francesco Bullo

2602.18380 2026-02-23 cs.CC cs.GT econ.TH

The Complexity of Sparse Win-Lose Bimatrix Games

Eleni Batziou, John Fearnley, Abheek Ghosh, Rahul Savani

Comments 43 pages

2602.18379 2026-02-23 cs.RO cond-mat.soft

Ori-Sense: origami capacitive sensing for soft robotic applications

Hugo de Souza Oliveira, Xin Li, Mohsen Jafarpour, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

2602.18376 2026-02-23 eess.SY cs.SY

Parameter Update Laws for Adaptive Control with Affine Equality Parameter Constraints

Ashwin P. Dani

2602.18374 2026-02-23 cs.RO cs.AI

Zero-shot Interactive Perception

Venkatesh Sripada, Frank Guerin, Amir Ghalamzan

Comments Original manuscript submitted on April 24, 2025. Timestamped and publicly available on OpenReview: https://openreview.net/forum?id=7MhpFcr5Nx