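arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.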

2603.26666 2026-03-30 cs.RO

VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Zhide Zhong, Haodong Yan, Junfeng Li, Junjie He, Tianran Zhang, Haoang Li

Abstract

Although pre-trained Vision-Language-Action (VLA) models exhibit impressive generalization in robotic manipulation, post-training remains crucial to ensure reliable performance during deployment. However, standard offline Supervised Fine-Tuning (SFT) suffers from distribution shifts and catastrophic forgetting of pre-trained capabilities, while online Reinforcement Learning (RL) struggles with sparse rewards and poor sample efficiency. In this paper, we propose On-Policy VLA Distillation (VLA-OPD), a framework bridging the efficiency of SFT with the robustness of RL. Instead of relying on sparse environmental rewards, VLA-OPD leverages an expert teacher to provide dense, token-level supervision on the student's self-generated trajectories. This enables active error correction on policy-induced states while preserving pre-trained general capabilities through gentle alignment. Crucially, we formulate VLA-OPD via a Reverse-KL objective. Unlike standard Forward-KL that induces mode-covering entropy explosion, or Hard-CE that causes premature entropy collapse, our bounded mode-seeking objective ensures stable policy learning by filtering out the teacher's epistemic uncertainty while maintaining action diversity. Experiments on LIBERO and RoboTwin2.0 benchmarks demonstrate that VLA-OPD significantly improves sample efficiency over RL and robustness over SFT, while effectively mitigating catastrophic forgetting during post-training.
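
The contrast the abstract draws between Forward-KL (mode-covering) and Reverse-KL (mode-seeking) can be illustrated on a toy discrete distribution. The following is a minimal sketch, not the authors' implementation: the four-token action vocabulary, the `teacher` distribution, and the two candidate students are all hypothetical values chosen for illustration.

```python
import math


def kl(a, b):
    """KL(a || b) for two discrete distributions over the same tokens."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


# Hypothetical teacher distribution over four action tokens, with two modes.
teacher = [0.49, 0.01, 0.01, 0.49]

# Two hypothetical students: one commits to a single teacher mode,
# the other spreads its probability mass evenly over all tokens.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]

# Forward KL puts the teacher first; reverse KL puts the student first.
forward = {"peaked": kl(teacher, peaked), "uniform": kl(teacher, uniform)}
reverse = {"peaked": kl(peaked, teacher), "uniform": kl(uniform, teacher)}

# Name of the student each objective scores lowest (i.e. prefers).
print("forward KL prefers:", min(forward, key=forward.get))
print("reverse KL prefers:", min(reverse, key=reverse.get))
```

In this toy setting the forward objective favors the student that covers both teacher modes, while the reverse objective favors the student that commits to one of them, which matches the mode-covering versus mode-seeking behavior the abstract describes.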

2603.26665 2026-03-30 cs.CV

Detailed Geometry and Appearance from Opportunistic Motion

Ryosuke Hirai, Kohei Yamashita, Antoine Guédon, Ryo Kawahara, Vincent Lepetit, Ko Nishino

2603.26663 2026-03-30 cs.CL

Weight Tying Biases Token Embeddings Towards the Output Space

Antonio Lopardo, Avyukth Harish, Catherine Arnett, Akshat Gupta

2603.26661 2026-03-30 cs.CV

GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

Nicolas von Lützow, Barbara Rössle, Katharina Schmid, Matthias Nießner

Comments: Project page: https://nicolasvonluetzow.github.io/GaussianGPT/ - Project video: https://youtu.be/zVnMHkFzHDg

2603.26659 2026-03-30 cs.RO

Partial Motion Imitation for Learning Cart Pushing with Legged Manipulators

Mili Das, Morgan Byrd, Donghoon Baek, Sehoon Ha

Comments: 8 pages, 5 figures

2603.26658 2026-03-30 cs.CV

Zero-Shot Depth from Defocus

Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng

2603.26657 2026-03-30 cs.CV cs.LG

Tunable Soft Equivariance with Guarantees

Md Ashiqur Rahman, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh

2603.26653 2026-03-30 cs.CV cs.AI cs.CL cs.LG

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Shaoxuan Li, Zhixuan Zhao, Hanze Deng, Zirun Ma, Shulin Tian, Zuyan Liu, Yushi Hu, Haoning Wu, Yuhao Dong, Benlin Liu, Ziwei Liu, Ranjay Krishna

Comments: Project Page: https://perceptioncomp.github.io

2603.26647 2026-03-30 cs.LG cs.SY eess.SY

An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

Ashutosh Soni, Peizhong Ju, Atilla Eryilmaz, Ness B. Shroff

2603.26646 2026-03-30 cs.CV

Beyond Language: Grounding Referring Expressions with Hand Pointing in Egocentric Vision

Ling Li, Bowen Liu, Zinuo Zhan, Peng Jie, Jianhui Zhong, Kenglun Chang, Zhidong Deng

2603.26644 2026-03-30 cs.LG astro-ph.IM stat.ME

Automatic Laplace Collapsed Sampling: Scalable Marginalisation of Latent Parameters via Automatic Differentiation

Toby Lovick, David Yallup, Will Handley

Comments: 28 Pages, 7 Figures. Comments welcome

2603.26639 2026-03-30 cs.CV cs.AI

Make Geometry Matter for Spatial Reasoning

Shihua Zhang, Qiuhong Shen, Shizun Wang, Tianbo Pan, Xinchao Wang

2603.26638 2026-03-30 cs.CV cs.RO

Drive-Through 3D Vehicle Exterior Reconstruction via Dynamic-Scene SfM and Distortion-Aware Gaussian Splatting

Nitin Kulkarni, Akhil Devarashetti, Charlie Cluss, Livio Forte, Philip Schneider, Chunming Qiao, Alina Vereshchaka

Comments: 8 pages, 7 figures, Submitted to IEEE IROS 2026 (under review)

2603.26629 2026-03-30 cs.LG

Context-specific Credibility-aware Multimodal Fusion with Conditional Probabilistic Circuits

Pranuthi Tenali, Sahil Sidheekh, Saurabh Mathur, Erik Blasch, Kristian Kersting, Sriraam Natarajan

2603.26611 2026-03-30 cs.LG stat.ME stat.ML

Benchmarking Tabular Foundation Models for Conditional Density Estimation in Regression

Rafael Izbicki, Pedro L. C. Rodrigues

2603.26610 2026-03-30 cs.CV cs.AI

Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling

Ruixing Zhang, Hanzhang Jiang, Leilei Sun, Liangzhe Han, Jibin Wang, Weifeng Lv

2603.26604 2026-03-30 cs.LG hep-ph physics.ins-det

Hardware-Aware Tensor Networks for Real-Time Quantum-Inspired Anomaly Detection at Particle Colliders

Sagar Addepalli, Prajita Bhattarai, Abhilasha Dave, Julia Gonski

Comments: 28 pages, 9 figures

2603.26599 2026-03-30 cs.CV

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Zhaochong An, Orest Kupyn, Théo Uscidda, Andrea Colaco, Karan Ahuja, Serge Belongie, Mar Gonzalez-Franco, Marta Tintore Gazulla

Comments: Project Page: https://zhaochongan.github.io/projects/VGGRPO

2603.26597 2026-03-30 cs.CV

From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning

Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Xilin Zhao, Qingming Huang

Comments: Accepted at CVPR 2026

2603.26596 2026-03-30 cs.LG

Characterization and forecasting of national-scale solar power ramp events

Luca Lanzilao, Angela Meyer

Abstract

The rapid growth of solar energy is reshaping power system operations and increasing the complexity of grid management. As photovoltaic (PV) capacity expands, short-term fluctuations in PV generation introduce substantial operational uncertainty. At the same time, solar power ramp events intensify risks of grid instability and unplanned outages due to sudden large power fluctuations. Accurate identification, forecasting and mitigation of solar ramp events are therefore critical to maintaining grid stability. In this study, we analyze two years of PV power production from 6434 PV stations at 15-minute resolution. We develop quantitative metrics to define solar ramp events and systematically characterize their occurrence, frequency, and magnitude at a national scale. Furthermore, we examine the meteorological drivers of ramp events, highlighting the role of mesoscale cloud systems. In particular, we observe that ramp-up events are typically associated with cloud dissipation during the morning, while ramp-down events commonly occur when cloud cover increases in the afternoon. Additionally, we adopt a recently developed spatiotemporal forecasting framework to evaluate both deterministic and probabilistic PV power forecasts derived from deep learning and physics-based models, including SolarSTEPS, SHADECast, IrradianceNet, and IFS-ENS. The results show that SHADECast is the most reliable model, achieving a CRPS 10.8% lower than that of SolarSTEPS at a two-hour lead time. Nonetheless, state-of-the-art nowcasting models struggle to capture ramp dynamics, with forecast RMSE increasing by up to 50% compared to normal operating conditions. Overall, these results emphasize the need for improved high-resolution spatiotemporal modelling to enhance ramp prediction skill and support the reliable integration of large-scale solar generation into power systems.
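
The abstract compares probabilistic forecasts by CRPS and reports a relative reduction. As a rough sketch of how such a score can be computed for an ensemble forecast, the code below uses the standard ensemble estimator of CRPS (mean absolute error to the observation minus half the mean pairwise member distance). The abstract does not say which estimator the authors use, so this choice, the function names, and the example numbers are assumptions for illustration only.

```python
def crps_ensemble(members, observation):
    """Standard ensemble estimate of CRPS for one scalar observation.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, over ensemble members x.
    Lower is better; a single-member ensemble reduces to absolute error.
    """
    n = len(members)
    error_term = sum(abs(x - observation) for x in members) / n
    spread_term = sum(abs(x - y) for x in members for y in members) / (n * n)
    return error_term - 0.5 * spread_term


def relative_reduction(baseline, candidate):
    """Fractional improvement of `candidate` over `baseline` (0.108 = 10.8% lower)."""
    return (baseline - candidate) / baseline


# Hypothetical two-member ensemble of normalized PV power and an observed value.
score = crps_ensemble([1.0, 3.0], 2.0)
print(score)
```

A statement such as "a CRPS 10.8% lower" corresponds to `relative_reduction(baseline, candidate)` being about 0.108 when the scores are averaged over the same set of forecasts.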

2603.26595 2026-03-30 cs.LG hep-ex

PQuantML: A Tool for End-to-End Hardware-aware Model Compression

Roope Niemi, Anastasiia Petrovych, Arghya Ranjan Das, Enrico Lupi, Chang Sun, Dimitrios Danopoulos, Marlon Joshua Helbing, Mia Liu, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini

2603.26589 2026-03-30 cs.CV

The Limits of Learning from Pictures and Text: Vision-Language Models and Embodied Scene Understanding

Gillian Rosenberg, Skylar Stadhard, Bruce C. Hansen, Michelle R. Greene

Comments: 7 figures, 5 tables

2603.26587 2026-03-30 cs.CL

EnTaCs: Analyzing the Relationship Between Sentiment and Language Choice in English-Tamil Code-Switching

Paul Bontempo

Comments: 5 pages, 2 figures

2603.26586 2026-03-30 cs.CV

MA-Bench: Towards Fine-grained Micro-Action Understanding

Kun Li, Jihao Gu, Fei Wang, Zhiliang Wu, Hehe Fan, Dan Guo

Comments: Accepted by CVPR 2026

2603.26575 2026-03-30 cs.LG

The Climber's Grip -- Personalized Deep Learning Models for Fear and Muscle Activity in Climbing

Matthias Boeker, Dana Swarbrick, Ulysse T. A. Côté-Allard, Marc T. P. Adam, Hugo L. Hammer, Pål Halvorsen

2603.26569 2026-03-30 cs.LG

Machine Unlearning under Retain-Forget Entanglement

Jingpu Cheng, Ping Liu, Qianxiao Li, Chi Zhang

Comments: ICLR 2026 camera-ready

2603.26556 2026-03-30 cs.CL cs.AI

When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models

Juan Gabriel Kostelec, Xiang Wang, Axel Laborieux, Christos Sourmpis, Qinghai Guo

2603.26553 2026-03-30 cs.CV

HolisticSemGes: Semantic Grounding of Holistic Co-Speech Gesture Generation with Contrastive Flow-Matching

Lanmiao Liu, Esam Ghaleb, Aslı Özyürek, Zerrin Yumak

2603.26547 2026-03-30 cs.LG

A Lyapunov Analysis of Softmax Policy Gradient for Stochastic Bandits

Tor Lattimore

Comments: 6 pages

2603.26544 2026-03-30 cs.CL q-bio.QM

Development of a European Union Time-Indexed Reference Dataset for Assessing the Performance of Signal Detection Methods in Pharmacovigilance using a Large Language Model

Maria Kefala, Jeffery L. Painter, Syed Tauhid Bukhari, Maurizio Sessa

Comments: 4 Figures and 2 Tables