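arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.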

2603.10469 2026-03-12 cs.RO

DepthCache: Depth-Guided Training-Free Visual Token Merging for Vision-Language-Action Model Inference

Yuquan Li, Lianjie Ma, Han Ding, Lijun Zhu

Comments 8 pages, 6 figures

Abstract

Vision-Language-Action (VLA) models enable generalist robotic manipulation but suffer from high inference latency. This bottleneck stems from the massive number of visual tokens processed by large language backbones. Existing methods either prune or merge tokens uniformly, degrading the spatial reasoning essential for robotic control. We present DepthCache, a training-free framework that leverages depth as a structural prior for visual token compression. It partitions observations into depth-based regions and applies spatially differentiated merge ratios, preserving the near-field workspace while compressing the distant background. To exploit temporal redundancy, DepthCache distributes the merging process across consecutive frames, ensuring consistent representations while reducing per-step computation. A motion-adaptive pipeline further optimizes auxiliary view compression based on end-effector dynamics. The framework requires no model modification, generalizing across diverse VLA architectures. On the LIBERO benchmark, DepthCache achieves up to 1.28x inference speedup with less than 1% average success rate degradation across three VLA models (pi_0.5, OpenVLA, GR00T), whereas pruning and merging baselines incur 4 to 24% degradation at comparable compression. Real-world experiments on a physical manipulator demonstrate that DepthCache enables faster task throughput and more responsive closed-loop control in latency-sensitive scenarios.

2603.10466 2026-03-12 cs.CV cs.AI

UniPINN: A Unified PINN Framework for Multi-task Learning of Diverse Navier-Stokes Equations

Dengdi Sun, Jie Chen, Xiao Wang, Jin Tang

2603.10465 2026-03-12 cs.SD cs.CV cs.HC

MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR

Tianyu Xu, Sieun Kim, Qianhui Zheng, Ruoyu Xu, Tejasvi Ravi, Anuva Kulkarni, Katrina Passarella-Ward, Junyi Zhu, Adarsh Kowdle

2603.10463 2026-03-12 cs.CV

Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning

Yushuo Zheng, Huiyu Duan, Zicheng Zhang, Xiaohong Liu, Xiongkuo Min

2603.10459 2026-03-12 cs.RO

SUBTA: A Framework for Supported User-Guided Bimanual Teleoperation in Structured Assembly

Xiao Liu, Prakash Baskaran, Songpo Li, Simon Manschitz, Wei Ma, Dirk Ruiken, Soshi Iba

Comments 8 pages, 7 figures, accepted at ICRA 2026

2603.10456 2026-03-12 cs.CV

LCAMV: High-Accuracy 3D Reconstruction of Color-Varying Objects Using LCA Correction and Minimum-Variance Fusion in Structured Light

Wonbeen Oh, Jae-Sang Hyun

2603.10451 2026-03-12 cs.RO cs.AI

FAR-Dex: Few-shot Data Augmentation and Adaptive Residual Policy Refinement for Dexterous Manipulation

Yushan Bai, Fulin Chen, Hongzheng Sun, Yuchuang Tong, En Li, Zhengtao Zhang

Comments Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

2603.10442 2026-03-12 cs.LG stat.ML

GGMPs: Generalized Gaussian Mixture Processes

Vardaan Tekriwal, Mark D. Risser, Hengrui Luo, Marcus M. Noack

2603.10438 2026-03-12 cs.RO cs.CV

AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory

Lianjie Ma, Yuquan Li, Bingzheng Jiang, Ziming Zhong, Han Ding, Lijun Zhu

Comments 8 pages, 5 figures, 5 tables

2603.10436 2026-03-12 cs.RO cs.DC

COHORT: Hybrid RL for Collaborative Large DNN Inference on Multi-Robot Systems Under Real-Time Constraints

Mohammad Saeid Anwar, Anuradha Ravi, Indrajeet Ghosh, Gaurav Shinde, Carl Busart, Nirmalya Roy

Comments Recently accepted at the 27th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (IEEE WoWMoM 2026)

2603.10430 2026-03-12 cs.LG cs.AI

Domain-Adaptive Health Indicator Learning with Degradation-Stage Synchronized Sampling and Cross-Domain Autoencoder

Jungho Choo, Hanbyeol Park, Gawon Lee, Yunkyung Park, Hyerim Bae

2603.10418 2026-03-12 cs.CV

TractoRC: A Unified Probabilistic Learning Framework for Joint Tractography Registration and Clustering

Yijie Li, Xi Zhu, Junyi Wang, Ye Wu, Lauren J. O'Donnell, Fan Zhang

Comments 11 pages, 3 figures

2603.10417 2026-03-12 cs.CV

Frames2Residual: Spatiotemporal Decoupling for Self-Supervised Video Denoising

Mingjie Ji, Zhan Shi, Kailai Zhou, Zixuan Fu, Xun Cao

2603.10410 2026-03-12 cs.LG cs.AI cs.DB

Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression

Taehyung Kwon, Yeonje Choi, Yeongho Kim, Kijung Shin

Comments to be published in the 42nd IEEE International Conference on Data Engineering (ICDE '26)

Abstract

Spatio-temporal time series are widely used in real-world applications, including traffic prediction and weather forecasting. They are sequences of observations over extensive periods and multiple locations, naturally represented as multidimensional data. Forecasting is a central task in spatio-temporal analysis, and numerous deep learning methods have been developed to address it. However, as dataset sizes and model complexities continue to grow in practice, training deep learning models has become increasingly time- and resource-intensive. A promising solution to this challenge is dataset distillation, which synthesizes compact datasets that can effectively replace the original data for model training. Although successful in various domains, including time series analysis, existing dataset distillation methods compress only one dimension, making them less suitable for spatio-temporal datasets, where both spatial and temporal dimensions jointly contribute to the large data volume. To address this limitation, we propose STemDist, the first dataset distillation method specialized for spatio-temporal time series forecasting. A key idea of our solution is to compress both temporal and spatial dimensions in a balanced manner, reducing training time and memory. We further reduce the distillation cost by performing distillation at the cluster level rather than the individual location level, and we complement this coarse-grained approach with a subset-based granular distillation technique that enhances forecasting performance. On five real-world datasets, we show empirically that, compared to both general and time-series dataset distillation methods, datasets distilled by our STemDist method enable model training that is (1) faster (up to 6X), (2) more memory-efficient (up to 8X), and (3) more effective (with up to 12% lower prediction error).

2603.10408 2026-03-12 cs.CV

Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics

Tianshuo Xu, Zhifei Chen, Leyi Wu, Hao Lu, Ying-cong Chen

Comments https://tianshuo-xu.github.io/Motion-Forcing/

2603.10407 2026-03-12 cs.RO

Rethinking Gaussian Trajectory Predictors: Calibrated Uncertainty for Safe Planning

Fatemeh Cheraghi Pouria, Mahsa Golchoubian, Katherine Driggs-Campbell

2603.10402 2026-03-12 cs.RO

Shape Control of a Planar Hyper-Redundant Robot via Hybrid Kinematics-Informed and Learning-based Approach

Yuli Song, Wenbo Li, Wenci Xin, Zhiqiang Tang, Daniela Rus, Cecilia Laschi

2603.10400 2026-03-12 cs.LG cs.AI math.OC stat.ML

Designing Service Systems from Textual Evidence

Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, David Simchi-Levi

Comments 67 pages

2603.10398 2026-03-12 cs.CV

Multi-Person Pose Estimation Evaluation Using Optimal Transportation and Improved Pose Matching

Takato Moriki, Hiromu Taketsugu, Norimichi Ukita

Comments 8 pages, 10 figures. Accepted at MVA 2025

2603.10397 2026-03-12 cs.LG cs.AI

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

Tongcheng Zhang, Zhanpeng Zhou, Mingze Wang, Andi Han, Wei Huang, Taiji Suzuki, Junchi Yan

Comments Accepted to AAAI 2026 (oral)

2603.10396 2026-03-12 cs.AI

Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, Masaki Adachi

2603.10392 2026-03-12 cs.RO cs.AI

Safe Probabilistic Planning for Human-Robot Interaction using Conformal Risk Control

Jake Gonzales, Kazuki Mizuta, Karen Leung, Lillian J. Ratliff

2603.10391 2026-03-12 cs.LG cs.CV

Variance-Aware Adaptive Weighting for Diffusion Model Training

Nanlong Sun, Lei Shi

Comments 15 pages, 8 figures, 1 table

2603.10390 2026-03-12 cs.RO

ScanDP: Generalizable 3D Scanning with Diffusion Policy

Itsuki Hirako, Ryo Hakoda, Yubin Liu, Matthew Hwang, Yoshihiro Sato, Takeshi Oishi

Comments 8 pages, 7 figures, 5 tables. Project Page: https://treeitsuki.github.io/ScanDP/

2603.10379 2026-03-12 cs.LG cs.AI

Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design

Junzhuo Li, Peijie Jiang, Changxin Tian, Jia Liu, Zhiqiang Zhang, Xuming Hu

2603.10373 2026-03-12 cs.RO cs.AI

Few-Shot Adaptation to Non-Stationary Environments via Latent Trend Embedding for Robotics

Yasuyuki Fujii, Emika Kameda, Hiroki Fukada, Yoshiki Mori, Tadashi Matsuo, Nobutaka Shimada

2603.10370 2026-03-12 cs.CV

GeoSense: Internalizing Geometric Necessity Perception for Multimodal Reasoning

Ruiheng Liu, Haihong Hao, Mingfei Han, Xin Gu, Kecheng Zhang, Changlin Li, Xiaojun Chang

2603.10367 2026-03-12 cs.CL cs.AI

Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking

Haoxiang Su, Ruiyu Fang, Liting Jiang, Xiaomeng Huang, Shuangyong Song

2603.10360 2026-03-12 cs.CV

One Token, Two Fates: A Unified Framework via Vision Token Manipulation Against MLLMs Hallucination

Zhan Fa, Yue Duan, Jian Zhang, Lei Qi, Yinghuan Shi

Comments 10 pages

2603.10359 2026-03-12 cs.AI cs.LG

HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation

Wenjing Zhang, Jiangze Yan, Jieyun Huang, Yi Shen, Shuming Shi, Ping Chen, Ning Wang, Zhaoxiang Liu, Kai Wang, Shiguo Lian

Comments 11 pages, 5 figures