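arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.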



2510.09285 2026-03-03 cs.CV

Spotlight on Token Perception for Multimodal Reinforcement Learning

Siyuan Huang, Xiaoye Qu, Yafu Li, Yun Luo, Zefeng He, Daizong Liu, Yu Cheng

Comments Accepted by ICLR 2026, project page: https://github.com/huaixuheqing/VPPO-RL

2510.08919 2026-03-03 cs.CV cs.LG

PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning

Daiki Yoshikawa, Takashi Matsubara

Comments 24 pages. Codes are available at https://github.com/tksmatsubara/PHyCLIP

Journal ref International Conference on Learning Representations (ICLR), 2026

2510.08630 2026-03-03 cs.CL

ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

Jingbiao Mei, Mingsheng Sun, Jinghong Chen, Pengda Qin, Yuhong Li, Da Chen, Bill Byrne

Comments ICLR 2026

2510.07746 2026-03-03 cs.LG

t-SNE Exaggerates Clusters, Provably

Noah Bergam, Szymon Snoeck, Nakul Verma

Comments ICLR 2026

2510.06377 2026-03-03 cs.LG cs.AI cs.DB

Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data

Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec

Comments Accepted to ICLR 2026

2510.06218 2026-03-03 cs.CV cs.AI

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark

Deheng Zhang, Yuqian Fu, Runyi Yang, Yang Miao, Tianwen Qian, Xu Zheng, Guolei Sun, Ajad Chhatkuli, Xuanjing Huang, Yu-Gang Jiang, Luc Van Gool, Danda Pani Paudel

Comments Accepted by ICLR 2026

2510.06203 2026-03-03 cs.LG cs.AI

Reference Grounded Skill Discovery

Seungeun Rho, Aaron Trinh, Danfei Xu, Sehoon Ha

2510.06005 2026-03-03 cs.CL

MASA: Rethinking the Representational Bottleneck in LoRA with Multi-A Shared Adaptation

Qin Dong, Yuntian Tang, Heming Jia, Yunhang Shen, Bohan Jia, Wenxuan Huang, Lianyue Zhang, Jiao Xie, Shaohui Lin, Rongrong Ji

Comments 16 pages, 5 figures

2510.05534 2026-03-03 cs.CL

Revisiting Self-Play Preference Optimization: On the Role of Prompt Difficulty

Yao Xiao, Jung-jae Kim, Roy Ka-wei Lee, Lidong Bing

2510.05132 2026-03-03 cs.CL cs.AI cs.LG

Training Large Language Models To Reason In Parallel With Global Forking Tokens

Sheng Jia, Xiao Wang, Shiva Prasad Kasiviswanathan

Comments Accepted at ICLR 2026

Journal ref The Fourteenth International Conference on Learning Representations (ICLR 2026), https://openreview.net/forum?id=xBQvvkg4Wc

2510.05069 2026-03-03 cs.CL cs.AI

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Dachuan Shi, Abedelkadir Asi, Keying Li, Xiangchi Yuan, Leyan Pan, Wenke Lee, Wen Xiao

Comments ICLR 2026. Code: https://github.com/sdc17/SwiReasoning, Website: https://swireasoning.github.io/

2510.05060 2026-03-03 cs.LG math.ST stat.ML stat.TH

ResCP: Reservoir Conformal Prediction for Time Series Forecasting

Roberto Neglia, Andrea Cini, Michael M. Bronstein, Filippo Maria Bianchi

Comments ICLR 2026

2510.04676 2026-03-03 cs.LG

Counterfactual Credit Guided Bayesian Optimization

Qiyu Wei, Haowei Wang, Richard Allmendinger, Mauricio A. Álvarez

2510.04474 2026-03-03 cs.AI cs.LG

DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization

Gang Li, Yan Chen, Ming Lin, Tianbao Yang

Comments Accepted to ICLR 2026

2510.03605 2026-03-03 cs.AI cs.LG stat.ML

Understanding the Role of Training Data in Test-Time Scaling

Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni

Comments 25 pages, 5 figures, accepted in ICLR 2026

2510.03253 2026-03-03 cs.LG cs.AI

Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents

Heyang Gao, Zexu Sun, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Xu Chen

Comments Accepted to ICLR 2026

2510.02253 2026-03-03 cs.CV cs.AI cs.LG

DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing

Zihan Zhou, Shilin Lu, Shuli Leng, Shaocong Zhang, Zhuming Lian, Xinlei Yu, Adams Wai-Kin Kong

Comments Accepted by ICLR 2026

2510.01265 2026-03-03 cs.LG cs.AI cs.CL

RLP: Reinforcement as a Pretraining Objective

Ali Hatamizadeh, Syeda Nahida Akter, Shrimai Prabhumoye, Jan Kautz, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi

Comments ICLR 2026 camera ready

Abstract

The dominant paradigm for training large reasoning models starts with pre-training using next-token prediction loss on vast amounts of data. Reinforcement learning, while powerful in scaling reasoning, is introduced only as the very last phase of post-training, preceded by supervised fine-tuning. While dominant, is this an optimal way of training? In this paper, we present RLP, an information-driven reinforcement pretraining objective, that brings the core spirit of reinforcement learning -- exploration -- to the last phase of pretraining. The key idea is to treat chain-of-thought as an exploratory action, with rewards computed based on the information gain it provides for predicting future tokens. This training objective essentially encourages the model to think for itself before predicting what comes next, thus teaching an independent thinking behavior earlier in the pretraining. More concretely, the reward signal measures the increase in log-likelihood of the next token when conditioning on both context and a sampled reasoning chain, compared to conditioning on context alone. This approach yields a verifier-free dense reward signal, allowing for efficient training for the full document stream during pretraining. Specifically, RLP reframes reinforcement learning for reasoning as a pretraining objective on ordinary text, bridging the gap between next-token prediction and the emergence of useful chain-of-thought reasoning. Pretraining with RLP on Qwen3-1.7B-Base lifts the overall average across an eight-benchmark math-and-science suite by 19%. With identical post-training, the gains compound, with the largest improvements on reasoning-heavy tasks such as AIME25 and MMLU-Pro. Applying RLP to the Nemotron-Nano-12B-v2 increases the overall average from 42.81% to 61.32% and raises the average on scientific reasoning by 23%, demonstrating scalability across architectures and model sizes.
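The reward described in the abstract above can be illustrated with a small numeric sketch: for each next token, the reward is the log-likelihood of that token given context plus a sampled reasoning chain, minus its log-likelihood given context alone. The function names and the probability values below are hypothetical and chosen only for illustration; the abstract does not specify how the two likelihoods are obtained, so this sketch simply takes them as inputs.

```python
import math


def information_gain_reward(p_with_cot: float, p_without_cot: float) -> float:
    """Log-likelihood gain for one observed next token.

    p_with_cot:    probability of the token given context + sampled reasoning chain
    p_without_cot: probability of the same token given context alone
    Both inputs are assumed to come from some language model; how they are
    produced is outside this sketch.
    """
    if not (0.0 < p_with_cot <= 1.0 and 0.0 < p_without_cot <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log(p_with_cot) - math.log(p_without_cot)


def rewards_for_stream(probs_with_cot, probs_without_cot):
    """Per-position rewards over a token stream (a dense reward, one per token)."""
    if len(probs_with_cot) != len(probs_without_cot):
        raise ValueError("both sequences must have the same length")
    return [
        information_gain_reward(a, b)
        for a, b in zip(probs_with_cot, probs_without_cot)
    ]


# Made-up probabilities for three next-token positions (illustration only).
with_cot = [0.50, 0.10, 0.25]
without_cot = [0.25, 0.20, 0.25]
rewards = rewards_for_stream(with_cot, without_cot)
```

In this toy example the first reward is positive (the chain doubled the token's probability), the second is negative (the chain halved it), and the third is zero (the chain made no difference). No external verifier is involved, which matches the abstract's description of the signal as verifier-free and dense.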


2510.01051 2026-03-03 cs.LG cs.AI cs.CL

GEM: A Gym for Agentic LLMs

Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Simon Yu, Xiangxin Zhou, Haotian Xu, Shaopan Xiong, Bo Liu, Chenmien Tan, Chuen Yang Beh, Weixun Wang, Hao Zhu, Weiyan Shi, Diyi Yang, Michael Shieh, Yee Whye Teh, Wee Sun Lee, Min Lin

2510.00236 2026-03-03 cs.LG

Per-example gradients: a new frontier for understanding and improving optimizers

Vincent Roulet, Atish Agarwala

2509.26432 2026-03-03 cs.LG cs.AI

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Guanxi Lu, Hao Mark Chen, Yuto Karashima, Zhican Wang, Daichi Fujiki, Hongxiang Fan

Comments Published as a conference paper at ICLR 2026

2509.25175 2026-03-03 cs.CL cs.AI

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

Haolei Xu, Xinyu Mei, Yuchen Yan, Rui Zhou, Wenqi Zhang, Weiming Lu, Yueting Zhuang, Yongliang Shen

Comments Functionality upgrade. Code: https://github.com/ZJU-REAL/EasySteer Demo: https://www.youtube.com/watch?v=3rRGzZmhrXg

2509.25087 2026-03-03 cs.LG cs.AI cs.CL

Scaling with Collapse: Efficient and Predictable Training of LLM Families

Shane Bergsma, Bin Claire Zhang, Nolan Dey, Shaheer Muhammad, Gurpreet Gosal, Joel Hestness

Comments ICLR 2026

2509.24502 2026-03-03 cs.CL

SUIT: Knowledge Editing with Subspace-Aware Key-Value Mappings

Haewon Park, Sangwoo Kim, Yohan Jo

Comments 31 pages, 13 figures, 17 tables

2509.24385 2026-03-03 cs.CV cs.AI

Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy

Haijier Chen, Bo Xu, Shoujian Zhang, Haoze Liu, Jiaxuan Lin, Jingrong Wang

2509.24365 2026-03-03 cs.CV cs.AI

Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models

Jitai Hao, Hao Liu, Xinyan Xiao, Qiang Huang, Jun Yu

Comments ICLR 2026

2509.24282 2026-03-03 cs.CL cs.AI

SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Gyuhyeon Seo, Jungwoo Yang, Junseong Pyo, Nalim Kim, Jonggeun Lee, Yohan Jo

Comments Accepted at ICLR 2026 (Oral)

2509.24245 2026-03-03 cs.CL cs.AI

Prompt and Parameter Co-Optimization for Large Language Models

Xiaohe Bo, Rui Li, Zexu Sun, Quanyu Dai, Zeyu Zhang, Zihang Tian, Xu Chen, Zhenhua Dong

Comments ICLR 2026

2509.24156 2026-03-03 cs.AI cs.CL

Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Yuhui Wang, Changjiang Li, Guangke Chen, Jiacheng Liang, Ting Wang

Comments Accepted to ICLR 2026

2509.23721 2026-03-03 cs.RO

DA-MMP: Learning Coordinated and Accurate Throwing with Dynamics-Aware Motion Manifold Primitives

Chi Chu, Huazhe Xu

Comments Accepted to ICRA 2026. Project page: https://cc299792458.github.io/da-mmp/