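arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.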

2603.06480 2026-03-09 cs.RO

History-Conditioned Spatio-Temporal Visual Token Pruning for Efficient Vision-Language Navigation

Qitong Wang, Yijun Liang, Ming Li, Tianyi Zhou, Christopher Rasmussen

Abstract

Vision-Language Navigation (VLN) enables robots to follow natural-language instructions in visually grounded environments, serving as a key capability for embodied robotic systems. Recent Vision-Language-Action (VLA) models have demonstrated strong navigation performance, but their high computational cost introduces latency that limits real-time deployment. We propose a training-free spatio-temporal vision token pruning framework tailored to VLA-based VLN. We apply spatial token selection to the current view, alongside spatio-temporal compression for historical memories, enabling efficient long-horizon inference while reducing redundant computation. Leveraging attention-based token importance and query-guided spatio-temporal filtering, the proposed approach preserves navigation-relevant information without retraining or modifying pretrained models, allowing plug-and-play integration into existing VLA systems. Through experiments on standard VLN benchmarks, we confirm that our method significantly outperforms existing pruning strategies. It successfully preserves superior navigation accuracy under extreme pruning scenarios, all while maintaining the highly competitive inference efficiency. Real-world deployment on a Unitree Go2 quadruped robot further validates reliable and low-latency instruction-following navigation under practical robotic constraints. We hope this work helps bridge the gap between large-scale multimodal modeling and efficient, real-time embodied deployment in robotic navigation systems.

2603.06467 2026-03-09 cs.CV

GreenRFM: Toward a resource-efficient radiology foundation model

Yingtai Li, Shuai Ming, Mingyue Zhao, Haoran Lai, Rongsheng Wang, Rui Zhou, Rundong Wang, Yujia Li, Wei Wei, Shaohua Kevin Zhou

2603.06459 2026-03-09 cs.CV cs.AI

Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement

Yakov Pyotr Shkolnikov

2603.06449 2026-03-09 cs.CV

CaTok: Taming Mean Flows for One-Dimensional Causal Image Tokenization

Yitong Chen, Zuxuan Wu, Xipeng Qiu, Yu-Gang Jiang

Comments Project website is available in https://sharelab-sii.github.io/catok-web

2603.06445 2026-03-09 cs.CV

What if? Emulative Simulation with World Models for Situated Reasoning

Ruiping Liu, Yufan Chen, Yuheng Zhang, Junwei Zheng, Kunyu Peng, Chengzhi Wu, Chenguang Huang, Di Wen, Jiaming Zhang, Kailun Yang, Rainer Stiefelhagen

2603.06444 2026-03-09 cs.SD cs.AI

Prosodic Boundary-Aware Streaming Generation for LLM-Based TTS with Streaming Text Input

Changsong Liu, Tianrui Wang, Ye Ni, Yizhou Peng, Eng Siong Chng

2603.06440 2026-03-09 cs.LG quant-ph

Toward Generative Quantum Utility via Correlation-Complexity Map

Chen-Yu Liu, Leonardo Placidi, Eric Brunner, Enrico Rinaldi

Comments 33 pages, 8 figures

2603.06428 2026-03-09 cs.CL cs.AI

Abductive Reasoning with Syllogistic Forms in Large Language Models

Hirohiko Abe, Risako Ando, Takanobu Morishita, Kentaro Ozeki, Koji Mineshima, Mitsuhiro Okada

Comments Published in Proceedings of the 3rd International Conference on Human and Artificial Rationalities (HAR 2024), LNCS 15504, pp. 3-17

2603.06426 2026-03-09 cs.CV cs.AI cs.LG

CLoPA: Continual Low Parameter Adaptation of Interactive Segmentation for Medical Image Annotation

Parhom Esmaeili, Chayanin Tangwiriyasakul, Eli Gibson, Sebastien Ourselin, M. Jorge Cardoso

Comments 10 pages, 2 figures

2603.06424 2026-03-09 cs.CL

From Prompting to Preference Optimization: A Comparative Study of LLM-based Automated Essay Scoring

Minh Hoang Nguyen, Vu Hoang Pham, Xuan Thanh Huynh, Phuc Hong Mai, Vinh The Nguyen, Quang Nhut Huynh, Huy Tien Nguyen, Tung Le

Comments 19 pages, 10 figures, 7 tables

2603.06421 2026-03-09 cs.CV

Non-invasive Growth Monitoring of Small Freshwater Fish in Home Aquariums via Stereo Vision

Clemens Seibold, Anna Hilsmann, Peter Eisert

Comments Accepted at VISAPP 2026

2603.06416 2026-03-09 cs.CL

Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason's Selection Task

Hirohiko Abe, Kentaro Ozeki, Risako Ando, Takanobu Morishita, Koji Mineshima, Mitsuhiro Okada

Comments To appear in the Proceedings of EACL 2026

2603.06408 2026-03-09 cs.CV cs.AI cs.GR

Physical Simulator In-the-Loop Video Generation

Lin Geng Foo, Mark He Huang, Alexandros Lattas, Stylianos Moschoglou, Thabo Beeler, Christian Theobalt

Comments Accepted to CVPR 2026

2603.06407 2026-03-09 cs.CV

Locating and Editing Figure-Ground Organization in Vision Transformers

Stefan Arnold, René Gröbner

2603.05404 2026-03-09 cs.RO

ROScopter: A Multirotor Autopilot based on ROSflight 2.0

Jacob Moore, Ian Reid, Phil Tokumaru, Randy Beard, Tim McLain

Comments Submitted to the 2026 International Conference on Unmanned Aerial Systems

2603.01474 2026-03-09 cs.RO

ROSER: Few-Shot Robotic Sequence Retrieval for Scalable Robot Learning

Zillur Rahman, Eddison Pham, Alejandro Daniel Noel, Cristian Meo

Comments 2026 ICLR DATA-FM Workshop

2603.01203 2026-03-09 cs.AI

How Well Does Agent Development Reflect Real-World Work?

Zora Zhiruo Wang, Sanidhya Vijayvargiya, Aspen Chen, Hanmo Zhang, Venu Arvind Arangarajan, Jett Chen, Valerie Chen, Diyi Yang, Daniel Fried, Graham Neubig

2602.24142 2026-03-09 cs.CL cs.AI

CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

Yuxuan Liu, Weikai Xu, Kun Huang, Changyu Chen, Jiankun Zhao, Pengzhi Gao, Wei Liu, Jian Luan, Shuo Shang, Bo Du, Ji-Rong Wen, Rui Yan

2602.21915 2026-03-09 cs.CV

Protein Graph Neural Networks for Heterogeneous Cryo-EM Reconstruction

Jonathan Krook, Axel Janson, Joakim Andén, Melanie Weber, Ozan Öktem

2602.03837 2026-03-09 cs.CL cs.AI

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques

David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, MohammadHossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Yossi Matias, James Manyika, Vahab Mirrokni

Comments The changes over version 2 are that we cleaned up the last paragraph on color-coding at the end of section 2. Also, for section 6.1 we added a reference to followup work of the authors, and other minor edits in that section

2602.00276 2026-03-09 cs.AI

Localizing and Correcting Errors for LLM-based Planners

Aditya Kumar, William W. Cohen

2601.13327 2026-03-09 cs.AI

PepEDiff: Zero-Shot Peptide Binder Design via Protein Embedding Diffusion

Po-Yu Liang, Tibo Duran, Jun Bai

2601.05805 2026-03-09 cs.RO

InsSo3D: Inertial Navigation System and 3D Sonar SLAM for turbid environment inspection

Simon Archieri, Ahmet Cinar, Shu Pan, Jonatan Scharff Willners, Michele Grimaldi, Ignacio Carlucho, Yvan Petillot

2601.01144 2026-03-09 cs.RO

VISO: Robust Underwater Visual-Inertial-Sonar SLAM with Photometric Rendering for Dense 3D Reconstruction

Shu Pan, Simon Archieri, Ahmet Cinar, Jonatan Scharff Willners, Ignacio Carlucho, Yvan Petillot

2601.00092 2026-03-09 cs.CV

Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark

Pan Wang, Yang Liu, Guile Wu, Eduardo R. Corral-Soto, Chengjie Huang, Binbin Xu, Dongfeng Bai, Xu Yan, Yuan Ren, Xingxin Chen, Yizhe Wu, Tao Huang, Wenjun Wan, Xin Wu, Pei Zhou, Xuyang Dai, Kangbo Lv, Hongbo Zhang, Yosef Fried, Aixue Ye, Bailan Feng, Zhenyu Chen, Zhen Li, Yingcong Chen, Yiyi Liao, Bingbing Liu

Comments Technical Report

2512.19535 2026-03-09 cs.CV cs.AI

CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion

Moritz Böhle, Amélie Royer, Juliette Marrie, Edouard Grave, Patrick Pérez

Comments updated with improved CA results

2512.15994 2026-03-09 cs.RO cs.SY eess.SY

SORS: A Modular, High-Fidelity Simulator for Soft Robots

Manuel Mekkattu, Mike Y. Michelis, Robert K. Katzschmann

Comments This work has been submitted to the IEEE for possible publication. Code and data are available at github.com/srl-ethz/sors

2512.13687 2026-03-09 cs.CV

Towards Scalable Pre-training of Visual Tokenizers for Generation

Jingfeng Yao, Yuda Song, Yucong Zhou, Xinggang Wang

Comments Our pre-trained models are available at https://github.com/MiniMax-AI/VTP

2511.13459 2026-03-09 cs.RO

Contact-Safe Reinforcement Learning with ProMP Reparameterization and Energy Awareness

Bingkun Huang, Yuhe Gong, Zewen Yang, Tianyu Ren, Luis Figueredo

Comments 8 pages

2511.12474 2026-03-09 cs.CV cs.CL cs.GR

Co-Layout: LLM-driven Co-optimization for Interior Layout

Chucheng Xiang, Ruchao Bao, Biyin Feng, Wenzheng Wu, Zhongyuan Liu, Yirui Guan, Ligang Liu

Comments AAAI 2026