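arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.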

2604.26694 2026-05-08 cs.RO cs.AI cs.CV

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

Jun Guo, Qiwei Li, Peiyan Li, Zilong Chen, Nan Sun, Yifei Su, Heyun Wang, Yuan Zhang, Xinghang Li, Huaping Liu

Comments Project website: https://sharinka0715.github.io/X-WAM/

Abstract

We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM), which model only 2D pixel space and fail to balance action efficiency and world-modeling quality. To leverage the strong visual priors of pretrained video diffusion models, X-WAM imagines the future world by predicting multi-view RGB-D videos, and obtains spatial information efficiently through a lightweight structural adaptation: replicating the final few blocks of the pretrained Diffusion Transformer into a dedicated depth prediction branch for the reconstruction of future spatial information. Moreover, we propose Asynchronous Noise Sampling (ANS) to jointly optimize generation quality and action decoding efficiency. ANS applies a specialized asynchronous denoising schedule during inference, which rapidly decodes actions with fewer steps to enable efficient real-time execution, while dedicating the full sequence of steps to generate high-fidelity video. Rather than entirely decoupling the timesteps during training, ANS samples from their joint distribution to align with the inference distribution. Pretrained on over 5,800 hours of robotic data, X-WAM achieves average success rates of 79.2% and 90.7% on the RoboCasa and RoboTwin 2.0 benchmarks, respectively, while producing high-fidelity 4D reconstruction and generation that surpass existing methods in both visual and geometric metrics.
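To make the asynchronous idea concrete, here is a toy sketch. Everything in it is an assumption, not X-WAM's implementation: the linear noise levels, the shared step counter, and the names `async_schedule`, `action_ready_step`, and `sample_training_pair` are all hypothetical. Video noise falls from 1 to 0 over all `num_video_steps`, while action noise reaches 0 after only `num_action_steps`, so actions can be executed early. For training, a single uniform draw sets both noise levels, so only (video, action) pairs that this inference schedule can actually produce are sampled, rather than two independent timesteps.

```python
import random


def async_schedule(num_video_steps: int, num_action_steps: int):
    """Hypothetical linear schedule of (t_video, t_action) noise levels per inference step.

    1.0 means pure noise and 0.0 means clean. Actions are fully denoised after
    num_action_steps, while the video uses all num_video_steps.
    """
    if not 0 < num_action_steps <= num_video_steps:
        raise ValueError("need 0 < num_action_steps <= num_video_steps")
    schedule = []
    for k in range(num_video_steps + 1):
        t_video = 1.0 - k / num_video_steps
        t_action = max(0.0, 1.0 - k / num_action_steps)
        schedule.append((t_video, t_action))
    return schedule


def action_ready_step(schedule):
    """Index of the first step at which the action noise level reaches 0."""
    return next(k for k, (_, t_action) in enumerate(schedule) if t_action == 0.0)


def sample_training_pair(num_video_steps: int, num_action_steps: int, rng: random.Random):
    """Draw (t_video, t_action) jointly: one uniform progress value u sets both,
    so t_action <= t_video, exactly as along the inference schedule."""
    u = rng.random()
    t_video = 1.0 - u
    t_action = max(0.0, 1.0 - u * num_video_steps / num_action_steps)
    return t_video, t_action


schedule = async_schedule(num_video_steps=50, num_action_steps=10)
print("actions ready after step", action_ready_step(schedule))  # 10 of 50
print(sample_training_pair(50, 10, random.Random(0)))
```

Sampling the two levels independently would also produce pairs with `t_action > t_video`, which this schedule never visits at inference; the coupled draw avoids training on them.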

2604.25907 2026-05-08 cs.LG cs.AI

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

Chu-Cheng Lin, Eugene Ie

Abstract

SFT-then-RLVR is widely used for post-training reasoning models, but why this specific ordering, and why RLVR-only stalls at cold start, have lacked a unifying theoretical account. We provide that account under a unified loss family $J_Q$ using the Tsallis $q$-logarithm. $J_Q$ is a single-parameter family that interpolates between RLVR (at $q{=}0$, the \textit{exploitation pole}) and the log-marginal-likelihood over latent trajectories (at $q{=}1$, the \textit{density-estimation pole}), under which the standard pipeline corresponds to a stepwise $q{=}1 \to 0$ schedule. All members share the same per-example gradient direction, differing only by a per-instance amplification $P_\theta^{-q}$ that reweights each instance independently of the learning rate. Using gradient-flow analysis, we show that the exploitation pole requires $\Omega(\frac{1}{p_0})$ time to escape cold start but is robust to label noise, while the density-estimation pole escapes in $\Theta\big(\log(\frac{1}{p_0})\big)$ but memorizes label noise. This separation explains how SFT ($q{=}1$) first moves the model out of the cold-start regime, followed by the more robust RLVR ($q{=}0$), under the SFT-then-RLVR paradigm. We further derive two Monte Carlo estimators that directly optimize $J_Q$ at a fixed $q$, without annotated rationales: Gradient-Amplified RL (GARL) and Posterior-Attenuated Fine-Tuning (PAFT), with shared bias $O\big(\frac{q}{M P_\theta^q}\big)$ but different variance and stability properties. On FinQA, HotPotQA, and MuSiQue, GARL at sufficiently high $q$ substantially mitigates cold-start stalling, escaping cold start where GRPO fails entirely. In the warm-start setting, GARL at low $q$ dominates on FinQA, where training is stable; on HotPotQA and MuSiQue, GARL destabilizes while PAFT at $q{=}0.75$ remains stable, reaching $47.9$ \texttt{m@16} on HotPotQA ($+13.9$ over GRPO).
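A minimal sketch, assuming the per-example objective is the Tsallis $q$-logarithm of the success probability $P_\theta$, with $\ln_q(x) = (x^{1-q}-1)/(1-q)$ and $\ln_1 = \ln$. Its derivative is $P_\theta^{-q}$, which matches the amplification factor in the abstract, and at $q=0$ it reduces to $P_\theta - 1$ (expected reward up to a constant). The gradient flow here is our own toy assumption, not the paper's analysis: a single logit $\theta$ with $P = \sigma(\theta)$. In this toy, the time to move from $P = p_0$ to $P = 1/2$ grows like $1/p_0$ at $q=0$ and like $\log(1/p_0)$ at $q=1$. The names `q_log`, `amplification`, and `escape_time` are hypothetical.

```python
import math


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm (x**(1-q) - 1) / (1-q); equals ln(x) at q = 1."""
    if x <= 0.0:
        raise ValueError("q_log is only defined for x > 0")
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def amplification(p: float, q: float) -> float:
    """d q_log(p) / dp = p**(-q): how strongly an instance with success probability p is up-weighted."""
    return p ** (-q)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def escape_time(p0: float, q: float, n: int = 100_000) -> float:
    """Toy gradient flow on a single logit theta with P = sigmoid(theta).

    d theta / dt = d q_log(P) / d theta = P**(-q) * P * (1 - P),
    so the time to go from P = p0 to P = 1/2 is the integral of
    d theta / (P**(1 - q) * (1 - P)) from logit(p0) to 0 (midpoint rule).
    """
    theta0 = math.log(p0 / (1.0 - p0))
    h = -theta0 / n
    total = 0.0
    for i in range(n):
        p = sigmoid(theta0 + (i + 0.5) * h)
        total += h / (p ** (1.0 - q) * (1.0 - p))
    return total


for q in (0.0, 1.0):
    print(f"q={q}: amplification at p=1e-3 is {amplification(1e-3, q):.0f}, "
          f"escape time {escape_time(1e-3, q):.1f}")
# q=0.0: amplification at p=1e-3 is 1, escape time 1012.8
# q=1.0: amplification at p=1e-3 is 1000, escape time 7.9
```

Both numbers agree with the closed forms of this toy integral: $-2\theta_0 + e^{-\theta_0} - e^{\theta_0}$ at $q=0$ and $-\theta_0 + 1 - e^{\theta_0}$ at $q=1$, with $\theta_0 = \operatorname{logit}(p_0)$.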

2604.24909 2026-05-08 cs.LG cs.CE

Contrastive Image-Metadata Pre-Training for Materials Transmission Electron Microscopy

Georgia Channing, Debora Keller, Marta D. Rossell, Philip Torr, Stig Helveg, Henrik Eliasson

2604.24016 2026-05-08 cs.LG

Direction-Aware Offline-to-Online Learning in Linear Contextual Bandits

Zean Han, Ruihan Lin, Zezhen Ding, Jiheng Zhang

2604.22031 2026-05-08 cs.LG cs.AI

Mochi: Aligning Pre-training and Inference for Efficient Graph Foundation Models via Meta-Learning

João Mattos, Arlei Silva

Comments 23 pages, 7 figures

2604.20051 2026-05-08 cs.CL cs.LG

Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text

Chengyu Huang, Sheng-Yen Chou, Zhengxin Zhang, Claire Cardie

2604.19331 2026-05-08 cs.CL

Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation

Eoghan Cunningham, Derek Greene, James Cross, Antonio Rago

Comments Accepted at KR'26 In The Wild Track. Camera Ready with additional supplementary materials

2604.19043 2026-05-08 cs.AI

Learning Lifted Action Models from Unsupervised Visual Traces

Kai Xi, Stephen Gould, Sylvie Thiébaux

Comments Accepted to the 36th International Conference on Automated Planning and Scheduling (ICAPS-26)

2604.18978 2026-05-08 cs.LG cs.AI

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

Yuan Zhuang, Yuexin Bian, Sihong He, Jie Feng, Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao

2604.18738 2026-05-08 cs.CL

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models

Lin Yao

2604.18555 2026-05-08 cs.LG cs.AI cs.NI

A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work

Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, Amit Portnoy, Shay Vargaftik

Abstract

This note clarifies the relationship between the recent TurboQuant work and the earlier DRIVE (NeurIPS 2021) and EDEN (ICML 2022) schemes. DRIVE is a 1-bit quantizer that EDEN extended to any $b>0$ bits per coordinate; we refer to them collectively as EDEN. First, TurboQuant$_{\text{mse}}$ is a special case of EDEN obtained by fixing EDEN's scalar scale parameter to $S=1$. EDEN supports both biased and unbiased quantization, each optimized by a different $S$ (chosen via methods described in the EDEN works). The fixed choice $S=1$ used by TurboQuant is generally suboptimal, although the optimal $S$ for biased EDEN converges to $1$ as the dimension grows; accordingly, TurboQuant$_{\text{mse}}$ approaches EDEN's behavior for large $d$. Second, TurboQuant$_{\text{prod}}$ combines a biased $(b-1)$-bit EDEN step with an unbiased 1-bit QJL quantization of the residual. It is suboptimal in three ways: (1) its $(b-1)$-bit step uses the suboptimal $S=1$; (2) its 1-bit unbiased residual quantization has worse MSE than (unbiased) 1-bit EDEN; (3) chaining a biased $(b-1)$-bit step with a 1-bit unbiased residual step is inferior to unbiasedly quantizing the input directly with $b$-bit EDEN. Third, some of the analysis in the TurboQuant work mirrors that of the EDEN works: both exploit the connection between random rotations and the shifted Beta distribution, use the Lloyd-Max algorithm, and note that Randomized Hadamard Transforms can replace uniform random rotations. Experiments support these claims: biased EDEN (with optimized $S$) is more accurate than TurboQuant$_{\text{mse}}$, and unbiased EDEN is markedly more accurate than TurboQuant$_{\text{prod}}$, often by more than a bit (e.g., 2-bit EDEN beats 3-bit TurboQuant$_{\text{prod}}$). We also repeat all accuracy experiments from the TurboQuant paper, showing that EDEN outperforms it in every setup we have tried.
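For readers new to this family, here is a simplified 1-bit sketch of the rotate, quantize, and scale pipeline. It rests on three assumptions. First, a Randomized Hadamard Transform (random sign flips, then an orthonormal Walsh-Hadamard transform) stands in for a uniform random rotation, a substitution the note says is valid. Second, it uses the 1-bit Lloyd-Max level $\pm\sqrt{2/\pi}$ for a standard Gaussian. Third, the biased variant fits $S$ per vector by least squares. These choices and all function names (`fwht`, `rht`, `inv_rht`, `quantize_1bit`) are illustrative assumptions, not the reference EDEN or TurboQuant code. Passing `scale=1.0` plays the role of the fixed $S=1$ choice discussed above.

```python
import math
import random

# 1-bit Lloyd-Max reconstruction level for a standard Gaussian: E|Z| = sqrt(2/pi).
CENTROID = math.sqrt(2.0 / math.pi)


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of 2. Self-inverse."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [x * norm for x in v]


def rht(x, signs):
    """Randomized Hadamard Transform H D x (D = diagonal of random +/-1 signs)."""
    return fwht([s * xi for s, xi in zip(signs, x)])


def inv_rht(y, signs):
    """Inverse of rht: D H y, since H and D are both orthogonal and symmetric."""
    return [s * yi for s, yi in zip(signs, fwht(y))]


def quantize_1bit(x, signs, scale=None):
    """Rotate, normalize, 1-bit quantize, rescale by S, rotate back.

    scale=None fits S per vector by least squares (assumed biased variant);
    a number (e.g. 1.0) uses that fixed S. Returns (reconstruction, S).
    """
    d = len(x)
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    # Normalized rotated coordinates: roughly N(0, 1) for Gaussian-like inputs.
    z = [yi * math.sqrt(d) / x_norm for yi in rht(x, signs)]
    q = [CENTROID if zi >= 0.0 else -CENTROID for zi in z]
    if scale is None:
        scale = sum(zi * qi for zi, qi in zip(z, q)) / sum(qi * qi for qi in q)
    y_hat = [scale * qi * x_norm / math.sqrt(d) for qi in q]
    return inv_rht(y_hat, signs), scale


rng = random.Random(0)
d = 1024
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
for label, s in (("fixed S=1", 1.0), ("fitted S", None)):
    x_hat, used = quantize_1bit(x, signs, s)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / d
    print(f"{label}: S={used:.4f}, MSE={mse:.4f}")
```

The transform is orthogonal, and the fitted $S$ is a least-squares fit in the rotated, normalized domain. As a result, the fitted $S$ can never do worse than $S=1$ on the same vector. For Gaussian-like inputs, the fitted $S$ approaches 1 as $d$ grows, because the mean of $|z_i|$ concentrates at $\sqrt{2/\pi}$. This matches the note's remark about biased EDEN at large $d$.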

2604.17866 2026-05-08 cs.CL cs.AI

Latent Abstraction for Retrieval-Augmented Generation

Ha Lan N. T, Minh-Anh Nguyen, Dung D. Le

2604.17739 2026-05-08 cs.LG cs.CL

Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model

Chenming Tang, Hsiu-Yuan Huang, Weijie Liu, Junqiang Zheng, Saiyong Yang, Yunfang Wu

Comments Preprint

2604.13075 2026-05-08 cs.CL cs.AI

DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs

Md Hasebul Hasan, Krity Haque Charu, Eshwara Prasad Sridhar, Shuchisnigdha Deb, Mohammad A. Islam

Comments 20 pages

2604.11890 2026-05-08 cs.LG stat.ML

Subcritical Signal Propagation at Initialization in Normalization-Free Transformers

Sergey Alekseev

Comments Minor text edits; 10 pages of main text; 34 pages total; 5 figures in the main text, 25 figures total; preprint

2604.11535 2026-05-08 cs.AI

Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems

Xi-Wei Pan, Shi-Wen An, Jin-Guo Liu

Comments The source code is available at https://github.com/CodingThrust/problem-reductions

2604.07096 2026-05-08 cs.LG stat.ML

Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

Changkun Guan, Mengfan Xu

Comments 21 pages

2604.06132 2026-05-08 cs.AI

Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents

Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, Chenxin An, Lei Li, Lingpeng Kong, Qi Liu, Zhifang Sui, Tong Yang

2604.05377 2026-05-08 cs.CV

Can Vision-Language Models Think from the Sky? Unifying UAV Reasoning and Generation

Jintao Sun, Gangyi Ding, Donglin Di, Hu Zhang, Zhedong Zheng

Comments 21 pages, 12 figures, 7 tables

2604.04415 2026-05-08 cs.CL

STEER: Structured Event Evidence for Video Reasoning via Multi-Objective Reinforcement Learning

Zinuo Li, Yongxin Guo, Jun Liu, Jiawei Zhan, Xi Jiang, Chengjie Wang, Mohammed Bennamoun, Farid Boussaid, Feng Zheng, Qiuhong Ke

2604.01951 2026-05-08 cs.LG

Autolearn: Learn by Surprise, Commit by Proof

Kang-Sin Choi

Comments 21 pages, 2 figures

2604.01178 2026-05-08 cs.LG cs.AI cs.CL

Screening Is Enough

Ken M. Nakanishi

Comments 36 pages, 27 figures. Revised version with retuned Transformer baselines, additional experiments, ablations, and appendix analyses

2603.26240 2026-05-08 cs.RO cs.MA cs.NE

SwarmCoDe: A Scalable Co-Design Framework for Heterogeneous Robot Swarms via Dynamic Speciation

Andrew Wilhelm, Josie Hughes

Comments 8 pages, 9 figures

2603.24768 2026-05-08 cs.AI

Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design

Zeda Xu, Nikolas Martelaro, Christopher McComb

2603.22155 2026-05-08 cs.LG math.OC

RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

Zhankun Luo, M. Berk Sahin, Antesh Upadhyay, Behzad Sharif, Abolfazl Hashemi

Comments First three authors contributed equally

2603.21877 2026-05-08 cs.LG cs.AI

P^2O: Joint Policy and Prompt Optimization

Xinyu Lu, Kaiqi Zhang, Jinglin Yang, Boxi Cao, Yaojie Lu, Hongyu Lin, Min He, Xianpei Han, Le Sun

2603.20103 2026-05-08 cs.LG cs.AI cs.RO

Spectral Alignment in Forward-Backward Representations via Temporal Abstraction

Seyed Mahdi B. Azad, Jasper Hoffmann, Iman Nematollahi, Hao Zhu, Abhinav Valada, Joschka Boedecker

2603.17859 2026-05-08 cs.CV

VISER: Visually-Informed System for Enhanced Robustness in Open-Set Iris Presentation Attack Detection

Byron Dowling, Jacob Piland, Eleanor Frederick, Christopher Sweet, Adam Czajka

Comments Version 2

2603.15270 2026-05-08 cs.CL cs.AI

From Documents to Spans: Scalable Supervision for Evidence-Based ICD Coding with LLMs

Xu Zhang, Wenxin Ma, Chenxu Wu, Rongsheng Wang, Zhiyang He, Xiaodong Tao, Kun Zhang, S. Kevin Zhou

2603.13085 2026-05-08 cs.LG cs.CV cs.NA math.NA stat.ML

Linearized Attention Cannot Enter the Kernel Regime at Any Practical Width

Jose Marie Antonio Miñoza, Paulo Mario P. Medina, Sebastian C. Ibañez