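arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.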

2603.01223 2026-03-06 cs.LG cs.CL

Learn Hard Problems During RL with Reference Guided Fine-tuning

Yangzhen Wu, Shanda Li, Zixin Wen, Xin Zhou, Ameet Talwalkar, Yiming Yang, Wenhao Huang, Tianle Cai

Comments 15 pages, 5 figures

Abstract

Reinforcement learning (RL) for mathematical reasoning can suffer from reward sparsity: for challenging problems, the LLM fails to sample any correct trajectories, preventing RL from receiving meaningful positive feedback. At the same time, there often exist human-written reference solutions along with the problem (e.g., problems from AoPS), but directly fine-tuning on these solutions offers no benefit because models often cannot imitate human proofs that lie outside their own reasoning distribution. We introduce Reference-Guided Fine-Tuning (ReGFT), a simple and effective method that utilizes human-written reference solutions to synthesize positive trajectories on hard problems and train on them before RL. For each problem, we provide the model with a partial reference solution and let it generate its own reasoning trace, ensuring the resulting trajectories remain in the model's reasoning space while still benefiting from reference guidance. Fine-tuning on these reference-guided trajectories increases the number of solvable problems and produces a checkpoint that receives more positive rewards during RL. Across three benchmarks (AIME24, AIME25, BeyondAIME), ReGFT consistently improves supervised accuracy, accelerates DAPO training, and raises the final performance plateau of RL. Our results show that ReGFT effectively overcomes reward sparsity and unlocks stronger RL-based mathematical reasoning.

2603.01209 2026-03-06 cs.AI cs.LG

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

Victor May, Aaditya Salgarkar, Yishan Wang, Diganta Misra, Huu Nguyen

Comments Code: https://github.com/mrcabbage972/agents-learn-runtime

2603.01145 2026-03-06 cs.AI

AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

Yutao Yang, Junsong Li, Qianjun Pan, Bihao Zhan, Yuxuan Cai, Lin Du, Jie Zhou, Kai Chen, Qin Chen, Xin Li, Bo Zhang, Liang He

2603.01007 2026-03-06 cs.CV

Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving

Xubo Zhu, Haoyang Zhang, Fei He, Rui Wu, Yanhu Shan, Wen Yang, Huai Yu

Comments 10 pages, 6 figures. Accepted at CVPR 2026

2603.00589 2026-03-06 cs.CV cs.AI

AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution

Cencen Liu, Dongyang Zhang, Wen Yin, Jielei Wang, Tianyu Li, Ji Guo, Wenbo Jiang, Guoqing Wang, Guoming Lu

Comments Accepted to CVPR 2026 Findings

2603.00395 2026-03-06 cs.SD cs.LG eess.AS

Fine-grained Soundscape Control for Augmented Hearing

Seunghyun Oh, Malek Itani, Aseem Gauri, Shyamnath Gollakota

Comments 15 pages, 11 figures, 4 tables, submitted to ACM MobiSys 2026

2602.24290 2026-03-06 cs.CV

UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images

Junhwa Hur, Charles Herrmann, Songyou Peng, Philipp Henzler, Zeyu Ma, Todd Zickler, Deqing Sun

Comments ICLR 2026, Project page: https://ufo-4d.github.io/

2602.24096 2026-03-06 cs.CV cs.AI cs.LG

DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

Yuxuan Zhang, Katarína Tóthová, Zian Wang, Kangxue Yin, Haithem Turki, Riccardo de Lutio, Yen-Yu Chang, Or Litany, Sanja Fidler, Zan Gojcic

Comments For more details and updates, please visit our project website: https://research.nvidia.com/labs/sil/projects/diffusion-harmonizer

2602.23974 2026-03-06 cs.AI

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Fan Zhang, Baoru Huang, Xin Zhang

Comments Withdrawn due to a crucial mistake

2602.22091 2026-03-06 cs.CV

Learning to Drive is a Free Gift: Large-Scale Label-Free Autonomy Pretraining from Unposed In-The-Wild Videos

Matthew Strong, Wei-Jer Chang, Quentin Herau, Jiezhi Yang, Yihan Hu, Chensheng Peng, Wei Zhan

Comments Accepted at CVPR 2026

2602.21366 2026-03-06 cs.RO

Environment-Aware Learning of Smooth GNSS Covariance Dynamics for Autonomous Racing

Y. Deemo Chen, Arion Zimmermann, Thomas A. Berrueta, Soon-Jo Chung

Comments 8 pages, Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

2602.19948 2026-03-06 cs.CL cs.AI cs.CY cs.HC cs.MA

Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

Ian Steenstra, Paola Pedrelli, Weiyan Shi, Stacy Marsella, Timothy W. Bickmore

Comments This paper is a condensed version of the first author's Ph.D. dissertation submitted to Northeastern University

2602.18764 2026-03-06 cs.AI cs.CL

The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol

Andreas Schlapbach

Comments 18 sections, 4 figures, 7 tables, 40 references. Original research presenting: (1) formal framework mapping Schema-Guided Dialogue principles to Model Context Protocol concepts, (2) five foundational design principles for LLM-native schema authoring, (3) architectural patterns for secure, scalable agent orchestration. Research supported by SBB (Swiss Federal Railways)

2602.18688 2026-03-06 cs.RO

Scout-Rover cooperation: online terrain strength mapping and traversal risk estimation for planetary-analog explorations

Shipeng Liu, J. Diego Caporale, Yifeng Zhang, Xingjue Liao, William Hoganson, Wilson Hu, Shivangi Misra, Neha Peddinti, Rachel Holladay, Ethan Fulcher, Akshay Ram Panyam, Andrik Puentes, Jordan M. Bretzfelder, Michael Zanetti, Uland Wong, Daniel E. Koditschek, Mark Yim, Douglas Jerolmack, Cynthia Sung, Feifei Qian

Comments 8 figures

2602.18655 2026-03-06 cs.RO cs.SY eess.SY

Infinite-Dimensional Closed-Loop Inverse Kinematics for Soft Robots via Neural Operators

Carina Veil, Moritz Flaschel, Ellen Kuhl, Cosimo Della Santina

2602.17686 2026-03-06 cs.LG cs.AI

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO

Bowen Yu, Maolin Wang, Sheng Zhang, Binhao Wang, Yi Wen, Jingtong Gao, Bowen Liu, Zimo Zhao, Wanyu Wang, Xiangyu Zhao

Comments 22 pages, 12 figures

2602.17260 2026-03-06 cs.CV

EA-Swin: An Embedding-Agnostic Swin Transformer for AI-Generated Video Detection

Hung Mai, Loi Dinh, Duc Hai Nguyen, Dat Do, Luong Doan, Khanh Nguyen Quoc, Huan Vu, Naeem Ul Islam, Tuan Do

Comments 2nd preprint version

2602.15572 2026-03-06 cs.LG cs.MA

Neural Network-Based Parameter Estimation of a Labour Market Agent-Based Model

M Lopes Alves, Joel Dyer, Doyne Farmer, Michael Wooldridge, Anisoara Calinescu

Comments To be presented at the 6th World Conference on Complex Systems (WCCS 2026)

2602.13550 2026-03-06 cs.LG

Out-of-Support Generalisation via Weight-Space Sequence Modelling

Roussel Desmond Nzoyem

Comments Published at the Catch, Adapt, and Operate (CAO): Monitoring ML Models Under Drift workshop at ICLR 2026

2602.12704 2026-03-06 cs.LG quant-ph

QTabGAN: A Hybrid Quantum-Classical GAN for Tabular Data Synthesis

Subhangi Kumari, Rakesh Achutha, Vignesh Sivaraman

Comments 21 pages, Minor revisions to improve clarity

2602.09980 2026-03-06 cs.LG cs.AI physics.comp-ph

Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks

Enzo Nicolas Spotorno, Josafat Ribeiro Leal, Antonio Augusto Frohlich

Comments 5 pages, 1 figure, accepted as Poster in AI&PDE ICLR 2026 Workshop

2602.04243 2026-03-06 cs.RO

Viewpoint Matters: Dynamically Optimizing Viewpoints with Masked Autoencoder for Visual Manipulation

Pengfei Yi, Yifan Han, Junyan Li, Litao Liu, Wenzhao Lian

2602.01780 2026-03-06 cs.CV cs.RO

DDP-WM: Disentangled Dynamics Prediction for Efficient World Models

Shicheng Yin, Kaixuan Yin, Weixing Chen, Yang Liu, Guanbin Li, Liang Lin

Comments Efficient and high-fidelity world model. Code is available at https://hcplab-sysu.github.io/DDP-WM

2602.01601 2026-03-06 cs.LG cs.AI cs.CL

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards

Hieu Trung Nguyen, Bao Nguyen, Wenao Ma, Yuzhi Zhao, Ruifeng She, Viet Anh Nguyen

Comments Accepted at ICLR 2026

2602.00485 2026-03-06 cs.AI

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

Shule Lu, Yujing Wang, Hainan Zhang, Xiaoshan Yang, Hongwei Zheng, Yongxin Tong, Changsheng Xu, Zhiming Zheng

Comments Due to the need for substantial revisions, the authors believe that the paper should be retracted first. A revised version may be resubmitted

2601.23236 2026-03-06 cs.LG cs.AI math.OC stat.ML

YuriiFormer: A Suite of Nesterov-Accelerated Transformers

Aleksandr Zimin, Yury Polyanskiy, Philippe Rigollet

2601.23038 2026-03-06 cs.RO

MOSAIC: Modular Scalable Autonomy for Intelligent Coordination of Heterogeneous Robotic Teams

David Oberacker, Julia Richter, Philip Arm, Marvin Grosse Besselmann, Lennart Puck, William Talbot, Maximilian Schik, Sabine Bellmann, Tristan Schnell, Hendrik Kolvenbach, Rüdiger Dillmann, Marco Hutter, Arne Roennau

Comments This work has been submitted to the IEEE for possible publication

2601.22571 2026-03-06 cs.AI

PerfGuard: A Performance-Aware Agent for Visual Content Generation

Zhipeng Chen, Zhongrui Zhang, Chao Zhang, Yifan Xu, Lan Yang, Jun Liu, Ke Li, Yi-Zhe Song

Comments This paper has been accepted by ICLR 2026. The original paper link is: https://openreview.net/pdf?id=tdN42GTv4S The code repository link is: https://github.com/FelixChan9527/PerfGuard

2601.19175 2026-03-06 cs.LG cs.AI cs.IR cs.SI

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

Jinkyu Sung, Myunggeum Jee, Joonseok Lee

Comments Accepted for ICLR 2026

2601.16333 2026-03-06 cs.CV cs.AI cs.CL

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments

Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle