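arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.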

2510.15165 2026-03-04 cs.LG math.OC

Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach

Xin Guo, Zijiu Lyu

Abstract

This paper studies policy transfer, one of the well-known transfer learning techniques adopted in large language models, for continuous-time reinforcement learning problems. In the case of continuous-time linear-quadratic systems with Shannon's entropy regularization, we fully exploit the Gaussian structure of their optimal policy and the stability of their associated Riccati equations. In the general case where the system has possibly non-linear and bounded dynamics, the key technical component is the stability of diffusion SDEs, which is established by invoking rough path theory. Our work provides the first theoretical proof of policy transfer for continuous-time RL: an optimal policy learned for one RL problem can be used to initialize the search for a near-optimal policy for another closely related RL problem, while achieving (at least) the same rate of convergence for the original algorithm. As a byproduct of our analysis, we derive the stability of a concrete class of continuous-time score-based diffusion models via their connection with LQRs. To illustrate the benefit of policy transfer for RL, we propose a novel policy learning algorithm for continuous-time LQRs, which achieves global linear convergence and local super-linear convergence.

2510.14765 2026-03-04 cs.CV cs.AI cs.GR

Inpainting the Red Planet: Diffusion Models for the Reconstruction of Martian Environments in Virtual Reality

Giuseppe Lorenzo Catalano, Agata Marta Soccini

Comments 21 pages, 9 figures

2510.13315 2026-03-04 cs.CV cs.AI

Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models

Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta

Comments 10 pages, accepted to ICLR 2026

2510.11369 2026-03-04 cs.CV

Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment

Shijie Zhao, Xuanyu Zhang, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, Jian Zhang

Comments ICLR 2026 Oral

2510.10902 2026-03-04 cs.LG stat.ML

Auditing Information Disclosure During LLM-Scale Gradient Descent Using Gradient Uniqueness

Sleem Abdelghafar, Maryam Aliakbarpour, Chris Jermaine

2510.09160 2026-03-04 cs.LG

Efficient Resource-Constrained Training of Transformers via Subspace Optimization

Le-Trung Nguyen, Enzo Tartaglione, Van-Tam Nguyen

Comments ICLR 2026 Oral

2510.06410 2026-03-04 cs.AI

Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?

Aochong Oliver Li, Tanya Goyal

Abstract

Reasoning LLMs are trained to verbalize their reasoning process, yielding strong gains on complex tasks. This transparency also opens a promising direction: multiple reasoners can directly collaborate on each other's thinking within a shared trajectory, yielding better inference efficiency and exploration. A key prerequisite, however, is the ability to assess the usefulness of, and build on, another model's partial thinking -- we call this off-trajectory reasoning. Our paper investigates a critical question: can standard solo-reasoning training pipelines deliver the desired off-trajectory behaviors? We propose twin tests that capture the two extremes of the off-trajectory spectrum, namely Recoverability, which tests whether LLMs can backtrack from "distractions" induced by misleading reasoning traces, and Guidability, which tests their ability to build upon correct reasoning from stronger collaborators. Our study evaluates 15 open-weight LLMs (1.5B-32B) and reveals a counterintuitive finding -- "stronger" LLMs on benchmarks are often more fragile under distraction. Moreover, all models tested fail to effectively leverage guiding steps from collaborators on problems beyond their inherent capabilities, with solve rates remaining under 9.2%. Finally, we conduct control studies to isolate the effects of three factors in post-training on these behaviors: the choice of distillation teacher, the use of RL, and the data selection strategy. Our results provide actionable insights for training natively strong reasoning collaborators; e.g., we find that suboptimal recoverability behaviors of teacher models are transferred to distilled students even if the distillation trajectories are correct. Taken together, this work lays the groundwork for evaluating multi-model collaborations in shared reasoning trajectories and highlights the limitations of off-the-shelf reasoning LLMs.

2510.03215 2026-03-04 cs.CL cs.LG

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Tianyu Fu, Zihan Min, Hanling Zhang, Jichao Yan, Guohao Dai, Wanli Ouyang, Yu Wang

Comments Published in ICLR'26

2510.03101 2026-03-04 cs.LG

AdaBet: Gradient-free Layer Selection for Efficient Training of Deep Neural Networks

Irene Tenison, Soumyajit Chatterjee, Fahim Kawsar, Mohammad Malekzadeh

Comments Full Version accepted at CVPR 2026

2510.03027 2026-03-04 cs.LG

Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling

Junyi Yao, Parham Eftekhar, Gene Cheung, Xujin Chris Liu, Yao Wang, Wei Hu

Comments Accepted by ICLR2026, 10 pages, 2 figures

2510.02692 2026-03-04 cs.LG cs.AI

Fine-Tuning Diffusion Models via Intermediate Distribution Shaping

Gautham Govind Anil, Shaan Ul Haque, Nithish Kannen, Dheeraj Nagaraj, Sanjay Shakkottai, Karthikeyan Shanmugam

Comments Accepted at ICLR 2026

2510.00578 2026-03-04 cs.CV

Arbitrary Generative Video Interpolation

Guozhen Zhang, Haiguang Wang, Chunyu Wang, Yuan Zhou, Qinglin Lu, Limin Wang

Comments ICLR 2026

2510.00438 2026-03-04 cs.CV

BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration

Zhaoyang Li, Dongjun Qian, Kai Su, Qishuai Diao, Xiangyang Xia, Chang Liu, Wenfei Yang, Tianzhu Zhang, Zehuan Yuan

Comments Accepted by ICLR 2026

2509.24421 2026-03-04 cs.CV

Proxy-GS: Unified Occlusion Priors for Training and Inference in Structured 3D Gaussian Splatting

Yuanyuan Gao, Yuning Gong, Yifei Liu, Li Jingfeng, Dingwen Zhang, Yanci Zhang, Dan Xu, Xiao Sun, Zhihang Zhong

Comments Project page: https://visionary-laboratory.github.io/Proxy-GS

2509.23725 2026-03-04 cs.AI

MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models

Siqi Ma, Jiajie Huang, Fan Zhang, Yue Shen, Jinlin Wu, Guohui Fan, Zhu Zhang, Zelin Zang

Comments accepted by AAAI-26 (ORAL)

2509.23348 2026-03-04 cs.LG

Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport

Xavier Aramayo Carrasco, Grigoriy Ksenofontov, Aleksei Leonov, Iaroslav Sergeevich Koshelev, Alexander Korotin

2509.23265 2026-03-04 cs.LG

CREPE: Controlling Diffusion with Replica Exchange

Jiajun He, Paul Jeha, Peter Potaptchik, Leo Zhang, José Miguel Hernández-Lobato, Yuanqi Du, Saifuddin Syed, Francisco Vargas

Comments Accepted to ICLR 2026

2509.23141 2026-03-04 cs.CV

Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents

Peilin Feng, Zhutao Lv, Junyan Ye, Xiaolei Wang, Xinjie Huo, Jinhua Yu, Wanghan Xu, Wenlong Zhang, Lei Bai, Conghui He, Weijia Li

Comments Published as a conference paper at ICLR 2026

2509.22641 2026-03-04 cs.CL cs.AI cs.HC

Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity

Arkadiy Saakyan, Najoung Kim, Smaranda Muresan, Tuhin Chakrabarty

Comments ICLR 2026 Camera Ready. 30 pages, 11 figures, 15 tables

2509.22445 2026-03-04 cs.LG cs.AI cs.CL

Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers

Peter Shaw, James Cohan, Jacob Eisenstein, Kristina Toutanova

Comments ICLR 2026

2509.20986 2026-03-04 cs.CV cs.AI

SiNGER: A Clearer Voice Distills Vision Transformers Further

Geunhyeok Yu, Sunjae Jeong, Yoonyoung Choi, Jaeseung Kim, Hyoseok Hwang

Comments Main paper: 12 pages (including 3 pages of references), 6 figures, 6 tables. Appendix: 9 pages, 7 figures. ICLR 2026 accepted

2509.16858 2026-03-04 cs.RO

Towards an Adaptive Social Game-Playing Robot: An Offline Reinforcement Learning-Based Framework

Soon Jynn Chu, Raju Gottumukkala, Alan Barhorst

Comments Submitted to conference

2509.10167 2026-03-04 cs.LG

The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagram

Lénaïc Chizat

Abstract

We study the gradient-based training of large-depth residual networks (ResNets) from standard random initializations. We show that infinite-depth ResNets behave as if they were infinitely wide, regardless of their actual width. More precisely, we obtain that with a fixed embedding dimension $D$, the training dynamics converges to a unique Neural Mean ODE training dynamics as the depth $L$ diverges, regardless of the scaling of the hidden width $M$. For a residual scale $\Theta_D\big(\frac{\alpha}{LM}\big)$ with $\alpha=\Theta_D(1)$, we obtain the error bound $O_D\big(\frac{1}{L}+ \frac{1}{\sqrt{LM}}\big)$ between the model's output and its limit after a fixed number of gradient steps. In this regime, the limit exhibits maximal local feature updates, i.e. the Mean ODE is genuinely non-linearly parameterized. In contrast, we show that $\alpha\to \infty$ yields a lazy ODE regime where the Mean ODE is linearly parameterized, and we derive a convergence rate in this case as well. We then focus on the particular case of ResNets with two-layer perceptron blocks, for which we study how these scalings depend on the embedding dimension $D$. We identify the residual scale $O\big(\frac{\sqrt{D}}{LM}\big)$ as necessary and sufficient for maximal local feature updates. In this regime, we prove a high-probability error bound $O\big(\frac{1}{L}+ \frac{\sqrt{D}}{\sqrt{LM}}\big)$ between the ResNet and its limit after a fixed number of gradient steps. Our convergence results rely on a novel mathematical perspective on ResNets: (i) due to the randomness of the initialization, the forward and backward pass through the ResNet behave as the stochastic approximation of certain mean ODEs, and (ii) by propagation of chaos (that is, asymptotic independence of the units) this behavior is preserved through the training dynamics. We verify empirically that all our rates are tight.
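
As a rough numerical illustration of the second error bound quoted in the abstract above, the sketch below evaluates its two terms, 1/L and sqrt(D)/sqrt(LM), for given depth L, hidden width M, and embedding dimension D. The hidden constant is set to 1, which is an assumption (the abstract states the rate only up to constants), and the function names `bound_terms` and `dominant_term` are hypothetical and not taken from the paper.

```python
import math


def bound_terms(depth, width, embed_dim):
    """Return the two terms of O(1/L + sqrt(D)/sqrt(L*M)).

    The constant hidden in the O(.) is set to 1 here (an assumption),
    so only the relative scaling of the terms is meaningful.
    """
    if depth <= 0 or width <= 0 or embed_dim <= 0:
        raise ValueError("depth, width and embed_dim must be positive")
    depth_term = 1.0 / depth
    fluctuation_term = math.sqrt(embed_dim) / math.sqrt(depth * width)
    return depth_term, fluctuation_term


def dominant_term(depth, width, embed_dim):
    """Name the larger of the two terms ('depth' or 'fluctuation')."""
    depth_term, fluctuation_term = bound_terms(depth, width, embed_dim)
    return "depth" if depth_term >= fluctuation_term else "fluctuation"


if __name__ == "__main__":
    # Deep and narrow: the sqrt(D)/sqrt(L*M) term is the larger one.
    print(bound_terms(100, 4, 4), dominant_term(100, 4, 4))
    # Shallow and wide: the 1/L term is the larger one.
    print(bound_terms(4, 1024, 1), dominant_term(4, 1024, 1))
```

With constants ignored, the two terms balance when M = D·L, so under this simplification the 1/L term is the larger one only when the width is at least D times the depth.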

2509.07430 2026-03-04 cs.LG cs.AI

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

Long Li, Zhijian Zhou, Jiaran Hao, Jason Klein Liu, Yanting Miao, Wei Pang, Xiaoyu Tan, Wei Chu, Zhe Wang, Shirui Pan, Chao Qu, Yuan Qi

Comments 27 pages, 6 figures

2509.05425 2026-03-04 cs.CL cs.AI

No Text Needed: Forecasting MT Quality and Inequity from Fertility and Metadata

Jessica M. Lundin, Ada Zhang, David Adelani, Cody Carroll

2509.03191 2026-03-04 cs.LG

Tabular foundation model for GEOAI benchmark problems BM/AirportSoilProperties/2/2025

Taiga Saito, Yu Otake, Stephen Wu

2508.05612 2026-03-04 cs.LG cs.AI

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Linghao Zhu, Yiran Guan, Dingkang Liang, Jianzhong Ju, Zhenbo Luo, Bin Qin, Jian Luan, Yuliang Liu, Xiang Bai

Comments This paper has been accepted by ICLR 2026. Conference link: https://iclr.cc/virtual/2026/poster/10007559 OpenReview link: https://openreview.net/forum?id=mYP33u1QBK Project page at: https://xenozlh.github.io/Shuffle-R1/

2508.01077 2026-03-04 cs.LG cs.AI

The Lattice Geometry of Neural Network Quantization -- A Short Equivalence Proof of GPTQ and Babai's Algorithm

Johann Birnick

Comments 9 pages, 3 figures, accepted at ICLR 2026

2507.20128 2026-03-04 cs.SD

Diffusion-based Symbolic Music Generation with Structured State Space Models

Shenghua Yuan, Xing Tang, Jiatao Chen, Tianming Xie, Jing Wang, Bing Shi

Comments This is a duplicate submission. The updated and correct version of this paper is available at arXiv:2603.00576, Efficient Long-Sequence Diffusion Modeling for Symbolic Music Generation. Please disregard this version

2507.17520 2026-03-04 cs.RO cs.CV

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

Shuai Yang, Hao Li, Bin Wang, Yilun Chen, Yang Tian, Tai Wang, Hanqing Wang, Feng Zhao, Yiyi Liao, Jiangmiao Pang

Comments 48 pages