Large Language Models / LLM

2606.19348 2026-06-19 cs.CL cs.AI New submission 95%

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek-AI, Anyi Xu, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chenchen Ling, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chengyu Hou, Chenhao Xu, Chenze Shao, Chong Ruan, Conner Sun, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Donghao Li, Dongjie Ji, Erhang Li, Fang Wei, Fangyun Lin, Fangzhou Yuan, Feiyu Xia, Fucong Dai, Guangbo Hao, Guanting Chen, Guoai Cao, Guolai Meng, Guowei Li, Han Yu, Han Zhang, Hanwei Xu, Hao Li, Haofen Liang, Haoling Zhang, Haoming Luo, Haoran Wei, Haotian Yuan, Haowei Zhang, Haowen Luo, Haoyu Chen, Haozhe Ji, Hengqing Zhang, Honghui Ding, Hongxuan Tang, Huanqi Cao, Huazuo Gao, Hui Qu, Hui Zeng, J Yang, JQ Zhu, Jia Luo, Jia Song, Jia Yu, Jialiang Huang, Jialu Cai, Jian Liang, Jiangting Zhou, Jiasheng Ye, Jiashi Li, Jiaxin Xu, Jiewen Hu, Jieyu Yang, Jin Chen, Jin Yan, Jingchang Chen, Jingli Zhou, Jingting Xiang, Jingyang Yuan, Jingyuan Cheng, Jingzi Zhou, Jinhua Zhu, Jiping Yu, Joseph Sun, Jun Ran, Junguang Jiang, Junjie Qiu, Junlong Li, Junmin Zheng, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Kexing Zhou, Kezhao Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Wang, Leyi Xia, Li Zhang, Liang Zhao, Lihua Guo, Lingxiao Luo, Linwang Ma, Linyan Zhu, Litong Wang, Liyu Cai, Liyue Zhang, Longhao Chen, MS Di, MY Xu, Max Mei, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Mingxu Zhou, Minmin Han, Ning Wang, Panpan Huang, Panpan Wang, Peixin Cong, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qingyang Li, Qinyu Chen, Qiushi Du, Qiwei Jiang, Rui Tian, Ruifan Xu, Ruijie Lu, Ruiling Xu, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runqian Chen, Runqiu Yin, Runxin Xu, Ruomeng Shen, Ruoyu Zhang, Ruyi Chen, SH Liu, Shanghao Lu, Shangmian Sun, Shangyan Zhou, Shanhuang Chen, Shaofei Cai, Shaoheng Nie, Shaoqing Wu, Shaoyuan Chen, Shengding Hu, Shengyu Liu, Shiqiang Hu, Shirong Ma, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, Shuying Yu, Songyang Zhou, Tao Ni, 
Tao Yun, Tian Jin, Tian Pei, Tian Ye, Tianle Lin, Tianran Ji, Tianyi Cui, Tianyuan Yue, Tingting Yu, Tun Wang, W Zhang, WL Xiao, Wangding Zeng, Wei An, Weilin Zhao, Wen Liu, Wenfeng Liang, Wenjie Pang, Wenjing Luo, Wenjing Yao, Wenjun Gao, Wenkai Yang, Wenlve Huang, Wenqing Hou, Wentao Zhang, Wenting Ma, Xi Gao, Xiang He, Xiangwen Wang, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaokang Zhang, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xingchen Liu, Xingkai Yu, Xingyou Li, Xinyu Yang, Xinyu Zhang, Xu Chen, Xuanyu Wang, Xuecheng Su, Xueyin Chen, Xuheng Lin, Xuwei Fu, YC Yan, YQ Wang, YW Ma, Yanfeng Luo, Yang Zhang, Yanhong Xu, Yanru Ma, Yanwen Huang, Yao Li, Yao Li, Yao Xu, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Qian, Yi Shao, Yi Yu, Yichao Zhang, Yifan Ding, Yifan Shi, Yijia Wu, Yiliang Xiong, Yiling Ma, Ying He, Ying Tang, Ying Zhou, Yingjia Luo, Yinmin Zhong, Yishi Piao, Yisong Wang, Yixiang Zhang, Yixiao Chen, Yixuan Tan, Yixuan Wei, Yiyang Ma, Yiyuan Liu, Yonglun Yang, Yongqiang Guo, Yongtong Wu, Yu Wu, YuKun Li, Yuan Cheng, Yuan Ou, Yuanfan Xu, Yuanhao Li, Yuduan Wang, Yuehan Yang, Yuer Xu, Yuhan Wu, Yuhao Meng, Yuheng Zou, Yukun Zha, Yunfan Xiong, Yupeng Chen, Yuping Lin, Yuqian Cao, Yuqian Wang, Yushun Zhang, Yuting Yan, Yutong Lin, Yuxian Gu, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuxuan Zhou, Yuyang Zhou, Yuzhen Huang, ZF Wu, Zehao Wang, Zehua Zhao, Zehui Ren, Zekai Zhang, Zhangli Sha, Zhe Fu, Zhe Ju, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zheren Gao, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhixian Huang, Zhixuan Chen, Zhiyu Wu, Zhizhou Ren, Zhongyu Wu, Zhuoshu Li, Zhuping Zhang, Zian Xu, Zihao Wang, Zihua Qu, Zihui Gu, Zijia Zhu, Zilin Li, Zipeng Zhang, Ziwei Xie, Ziyi Gao, Ziyi Wan, Zizheng Pan, Zongqing Yao

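Affiliation: DeepSeek-AI

Topic match (Pretraining): million-token-context MoE models; architecture optimization.

AI summary: Introduces the DeepSeek-V4 series of MoE models, which combine a hybrid attention architecture, Manifold-Constrained Hyper-Connections, and the Muon optimizer to support efficient million-token-context inference and outperform their predecessors on core tasks.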

Abstract

We present a preview version of the DeepSeek-V4 series, comprising two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; and (3) the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, the DeepSeek-V4 series is highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.
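The headline ratios in this abstract can be checked with a few lines of arithmetic. The sketch below only restates the parameter counts and percentages quoted above; the dictionary layout and function name are illustrative assumptions, not anything from the paper.

```python
# Figures quoted in the abstract (parameter counts in billions).
models = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}

def active_fraction(total_b, active_b):
    """Fraction of an MoE model's parameters that are activated per token."""
    return active_b / total_b

for name, m in models.items():
    print(f"{name}: {active_fraction(**m):.1%} of parameters active per token")

# Cost relative to DeepSeek-V3.2 at one-million-token context, as quoted.
flops_ratio = 0.27      # single-token inference FLOPs
kv_cache_ratio = 0.10   # KV cache size
print(f"FLOPs reduction: {1 / flops_ratio:.1f}x")
print(f"KV cache reduction: {1 / kv_cache_ratio:.0f}x")
```

Read this way, both models activate only a few percent of their parameters per token, and the quoted 27% and 10% figures correspond to roughly 3.7x fewer FLOPs and 10x less KV cache.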

2606.20381 2026-06-19 cs.AI New submission 90%

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Qian Zhao, Kunlong Chen, Changxin Tian, Zhonghui Jiang, Haitao Zhang, Chaofan Yu, Peijie Jiang, Mingliang Gong, Jia Liu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou

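Affiliation: Ling Team, Ant Group

Topic match (Pretraining): shrinkage bias in LLM FP4 pretraining and a recipe to address it.

AI summary: Finds that the E2M1 format suffers from shrinkage bias caused by geometric asymmetry and that the Random Hadamard Transform amplifies this bias, producing training instability; proposes uniform grids (E1M2/INT4) and the UFP4 training recipe, which achieves lower BF16-relative loss degradation than E2M1 baselines across several models.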

Comments 18 pages, 12 figures

Abstract

FP4 training promises substantial reductions in memory and computation cost for LLM pretraining, yet current FP4 hardware paths and recipes, including NVIDIA Blackwell/Rubin-class systems and AMD MI350-series GPUs, remain centered on E2M1 data elements. In this study, we identify a fundamental limitation of that choice: non-uniform formats such as E2M1 inherently suffer from Shrinkage Bias, a systematic negative rounding error caused by the geometric asymmetry of their representable bins. We show that this bias accumulates multiplicatively across layers and is amplified by the Random Hadamard Transform (RHT), providing a unified explanation for the training instability observed in existing E2M1-based FP4 recipes. In contrast, uniform grids (E1M2/INT4) bypass this grid-geometry error and better convert the improved bucket utilization from RHT into higher quantization quality. Based on this finding, we propose UFP4, a uniform 4-bit training recipe that applies RHT to all three training GEMMs while restricting stochastic rounding to dY alone. On Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining, UFP4 consistently achieves lower BF16-relative loss degradation than strong E2M1-based baselines, supported by scaling-law analysis and ablation studies. Our results suggest that future accelerators should support E1M2/INT4-style uniform 4-bit grids as first-class training primitives alongside E2M1.
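One way to see the "geometric asymmetry of representable bins" is to compare each grid value with the centre of the interval of inputs that round to it. The sketch below is an illustration under assumptions: it uses the standard E2M1 magnitudes, a hypothetical 8-level uniform grid for comparison, and plain round-to-nearest. Whether this matches the paper's exact definition of Shrinkage Bias is not confirmed by the abstract.

```python
# Non-negative magnitudes representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# A uniform 8-level grid for comparison (INT4-style magnitudes; illustrative choice).
UNIFORM = [float(i) for i in range(8)]

def cell_offsets(grid):
    """For each interior grid value g, return (g, centre - g), where centre is the
    midpoint of the interval of inputs that round-to-nearest maps to g."""
    offsets = []
    for i in range(1, len(grid) - 1):
        lo = (grid[i - 1] + grid[i]) / 2   # lower rounding boundary
        hi = (grid[i] + grid[i + 1]) / 2   # upper rounding boundary
        offsets.append((grid[i], (lo + hi) / 2 - grid[i]))
    return offsets

def round_to_grid(x, grid):
    """Round a non-negative value to the nearest grid point (ties go to the lower point)."""
    return min(grid, key=lambda g: abs(g - x))

for name, grid in (("E2M1", E2M1), ("uniform", UNIFORM)):
    print(name, cell_offsets(grid))
```

On the E2M1 grid the cells around 2 and 4 extend further above the grid value than below it, so under a locally flat input density the values rounded to 2 or 4 are, on average, larger than the value that represents them. On the uniform grid every interior cell is centred on its grid value, so this particular source of signed error does not arise.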

2606.20089 2026-06-19 cs.CL cs.AI New submission 90%

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

Arash Ghafouri, Mahdi Firouzmandi, Hossein Saberi, Mohammad Reza Hasani Ahangar

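Affiliation: Department of Artificial Intelligence and Cognitive Science, Imam Hossein Comprehensive University

Topic match (Pretraining): a Persian pretrained language model.

AI summary: Presents IHUBERT, a Persian pretrained model based on RoBERTa-base and trained on a 45 GB corpus after multi-stage preprocessing that includes vector-database-based semantic deduplication and domain balancing; it achieves leading results on several NLU tasks, with extractive question answering the strongest.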

Abstract

Persian pretrained language models (PLMs) are still limited by the scarcity of large-scale, high-quality pretraining corpora and by insufficient evaluation beyond standard classification and NER tasks. We present IHUBERT, a monolingual Persian PLM trained from scratch with the RoBERTa-base encoder (125M parameters) on a 45 GB curated subset of the Sepahr-Danesh collection (about 7-8B tokens). To improve corpus quality and reduce redundancy, we employ a multi-stage preprocessing pipeline that includes normalization, exact and near-duplicate removal, anonymization, and vector-database-based semantic deduplication for distribution balancing control across domains and registers. We additionally train a 139k-vocabulary BPE tokenizer on the full pretraining corpus to better capture Persian morphology and orthographic variation. IHUBERT is evaluated on seven Persian NLU benchmarks covering NER, sentiment analysis, topic classification, NLI, extractive question answering, and relation extraction, using task-standard metrics (entity-level F1, Macro-F1, EM/F1). IHUBERT achieves its strongest gains on extractive QA, ranking first on both PQuAD (F1 88.3542) and ParsiNLU-RC (F1 49.0987), and attains the best result on FarsTail (Macro-F1 0.8350). On NER and topic classification, it remains competitive (e.g., 0.8308 F1 on ParsTwiNER; 0.7953 Macro-F1 on DigiMag), while relation extraction remains the main remaining gap (0.6684 Macro-F1 on PERLEX). A controlled tokenizer ablation on the IHUBERT pretraining corpus shows that BPE yields slightly lower subword fragmentation than WordPiece at matched vocabulary size, supporting our tokenization design. Overall, IHUBERT advances Persian language modeling through semantically curated large-scale pretraining and broad evaluation across both classification and comprehension-oriented tasks.

2606.19993 2026-06-19 cs.LG New submission 85%

Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs

Nico Harder, Daniel Becking, Karsten Mueller, Wojciech Samek

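Affiliation: Fraunhofer HHI

Topic match (Pretraining): an LLM compression framework that improves model efficiency.

AI summary: Proposes the AIR framework, which builds on SVD and a backward-signal influence measure and obtains low-rank approximations of weight matrices with a single alternating-least-squares sweep; at parameter retention ≤60%, perplexity improves by more than 18% over SVD-LLM(W) while using 90% less calibration data.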

Comments Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference (AdaptFM), Seoul, South Korea (non-archival)
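To make "parameter retention" concrete for a low-rank factorization, the sketch below uses the standard parameter count of a rank-r factorization: an m x n matrix replaced by factors of shape m x r and r x n stores r(m + n) values. The function name and the per-matrix budget are illustrative assumptions. This is not AIR's rank-allocation rule, which the summary describes as activation- and influence-aware.

```python
def max_rank_for_retention(m, n, retention):
    """Largest rank r such that factors of shape (m x r) and (r x n) hold at most
    `retention` times the m * n parameters of the dense matrix."""
    return int(retention * m * n // (m + n))

m = n = 4096
r = max_rank_for_retention(m, n, 0.6)
print(r, r * (m + n), 0.6 * m * n)
```

For a square 4096 x 4096 matrix, a rank-r factorization only saves parameters when r is below 2048, which is why retention targets translate into fairly aggressive rank cuts.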

2606.19491 2026-06-19 cs.LG stat.ML New submission 85%

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

Tejas Pradeep Shirodkar, P. J. Narayanan

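Affiliation: IIIT, Hyderabad

Topic match (Pretraining): dead directions in LayerNorm transformers, as a diagnostic for pretrained models.

AI summary: Shows that the inverse-scale direction of LayerNorm is an exact algebraic kernel of the post-final-norm centred activation covariance, so a dead direction can be read from the parameters alone with no forward or backward pass; validated on 14 pretrained models.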

Comments 34 pages, 7 figures, 6 tables. Empirical companion to arXiv:2606.05957

Abstract

Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on $14$ pretrained transformers ($9$ LayerNorm, $5$ RMSNorm; $160$M-$35$B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on $9/9$ LayerNorm models, and is correctly absent on $5/5$ RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on $13/14$ transformers measured on their own input distribution, the one exception (Gemma$4$-$31$B) a genuine dead direction the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.
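The kernel claim can be checked numerically on a toy example. LayerNorm output is y = γ ⊙ x̂ + β, where the normalised vector x̂ sums to zero, so the dot product of y with 1/γ always equals the constant Σ βᵢ/γᵢ and has zero variance across inputs. The sketch below is a pure-Python illustration with hypothetical dimensions and random parameters, not the authors' experimental setup; RMSNorm is included as the contrast case without mean subtraction.

```python
import math
import random

def layer_norm(x, gamma, beta, eps=0.0):
    """LayerNorm: subtract the mean, divide by the standard deviation, apply the affine."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]

def rms_norm(x, gamma):
    """RMSNorm: divide by the root mean square, with no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [g * v / rms for v, g in zip(x, gamma)]

def project_on_inverse_scale(y, gamma):
    """Dot product of y with the (unnormalised) inverse-scale vector 1/gamma."""
    return sum(v / g for v, g in zip(y, gamma))

rng = random.Random(0)
d = 8  # toy hidden size (assumption)
gamma = [rng.uniform(0.5, 2.0) for _ in range(d)]
beta = [rng.uniform(-1.0, 1.0) for _ in range(d)]
inputs = [[rng.gauss(0.0, 3.0) for _ in range(d)] for _ in range(1000)]

ln_proj = [project_on_inverse_scale(layer_norm(x, gamma, beta), gamma) for x in inputs]
rms_proj = [project_on_inverse_scale(rms_norm(x, gamma), gamma) for x in inputs]

ln_spread = max(ln_proj) - min(ln_proj)      # ~0: the projection is constant
rms_spread = max(rms_proj) - min(rms_proj)   # clearly nonzero: no kernel direction
print(ln_spread, rms_spread)
```

The identity does not depend on the input distribution or on the value of eps, because the mean-subtracted numerator sums to zero before any scaling is applied.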

2606.19468 2026-06-19 cs.CL New submission 85%

Characterizing Narrative Content in Web-scale LLM Pretraining Data

Teagan Johnson, Elliott Ash, Andrew Piper, Maria Antoniak

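Affiliations: University of Colorado Boulder; ETH Zürich; McGill University

Topic match (Pretraining): a fine-grained study of narrative features in an LLM pretraining corpus.

AI summary: Presents the first fine-grained study of narrative features in the LLM pretraining corpus Dolma, proposes a framework spanning three core narrative elements (agency, setting, events), builds the NarraBERT model, and releases the NarraDolma dataset; finds that narrative structure is measurable in heterogeneous data and unevenly distributed.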

Comments 8 pages of main content, 28 total pages. 30 figures

Abstract

The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a 3-trillion-token open pretraining corpus. Drawing on narrative theory, we design a framework spanning three core narrative elements (agency, setting, and events) operationalized as 11 interpretable dimensions. After sampling and annotating a diverse set of 400 passages, we finetune and validate NarraBERT, a RoBERTa-based model for fine-grained narrative prediction. We apply NarraBERT to 3M passages, resulting in a new dataset, NarraDolma. We find (i) narrative structure is measurable at scale across extremely heterogeneous data, (ii) we uncover a continuous, multidimensional narrative structure underlying web text, and (iii) narrative qualities are unequally distributed across pretraining sources and topics in ways that current curation practices neither measure nor account for. Our framework, dataset, and analyses provide a foundation for understanding how narrative qualities are distributed in LLM pretraining data and for studying how data composition affects narrative reasoning tasks. We publicly release NarraDolma and NarraBERT.

2510.06048 2026-06-19 cs.LG Updated version 85%

BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining

Jie Hao, Rui Yu, Wei Zhang, Huixia Wang, Jie Xu, Mingrui Liu

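Affiliations: Department of Computer Science, George Mason University, USA; IBM T.J. Watson Research Center, USA; Department of Statistics, Rice University; Department of System Engineering & Operations Research, George Mason University, USA

Topic match (Pretraining): a data selection method for language model pretraining.

AI summary: Proposes BLISS, a lightweight data selection method that needs no external pretrained model; it uses bilevel optimization and a proxy model to estimate the long-term influence of training samples, enabling efficient data filtering. Models of several sizes are pretrained on selected subsets of C4, with a 1.7x speedup in the 1B setting and better downstream-task performance.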

Abstract

Effective data selection is essential for pretraining large language models (LLMs), enhancing efficiency and improving generalization to downstream tasks. However, existing approaches often require leveraging external pretrained models, making it difficult to disentangle the effects of data selection from those of the external pretrained models. In addition, they often overlook the long-term impact of selected data if the model is trained to convergence, primarily due to the prohibitive cost of full-scale LLM pretraining. In this paper, we introduce BLISS (BileveL Influence Scoring method for data Selection): a lightweight data selection method that operates entirely from scratch, without relying on any external pretrained oracle models, while explicitly accounting for the long-term impact of selected data. BLISS leverages a small proxy model as a surrogate for the LLM and employs a score model to estimate the long-term influence of training samples if the proxy model is trained to convergence. We formulate data selection as a bilevel optimization problem, where the upper-level objective optimizes the score model to assign importance weights to training samples, ensuring that minimizing the lower-level objective (i.e., training the proxy model over the weighted training loss until convergence) leads to best validation performance. Once optimized, the trained score model predicts influence scores for the dataset, enabling efficient selection of high-quality samples for LLM pretraining. We validate BLISS by pretraining 410M/1B/2.8B Pythia and LLaMA-0.5B models on selected subsets of the C4 dataset. Notably, under the 1B model setting, BLISS achieves $1.7\times$ speedup in reaching the same performance as the state-of-the-art method, demonstrating superior performance across multiple downstream tasks.

2606.19989 2026-06-19 cs.DC cs.LG New submission 80%

Online Dynamic Batching with Formal Guarantees for LLM Training

Dian Li, Zekun Wang, Yaoru Wang, Jiahong Yan

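Affiliation: Tencent

Topic match (Pretraining): an online dynamic batching system that speeds up LLM training.

AI summary: Proposes Online Dynamic Batching (ODB), a DataLoader-side system that defers batch construction until each sample's true cost is observable, addressing the fact that offline batch samplers cannot see preprocessing costs; reports 1.58-4.43x throughput gains with a formal guarantee of deadlock-free bounded termination.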

Comments 29 pages, 3 figures, 21 tables

Abstract

Modern LLM training breaks a core assumption behind offline batch samplers: the true training cost of a sample is only observable after preprocessing, augmentation, templating, tokenization, and multimodal visual-token expansion. Unless one pays for a preprocessing- and augmentation-dependent length cache, batch construction is therefore blind to the quantity that determines padding, memory use, and GPU saturation. We introduce Online Dynamic Batching (ODB), a DataLoader-side drop-in system that moves batch formation to this point of accurate observability while preserving DDP step alignment. We formalize this synchronization requirement as the Distributed Group Alignment Problem and prove deadlock-free bounded termination with default join-mode identity coverage and opt-in non-join sample-quota closure. ODB requires no model, optimizer, or attention-kernel changes and is released as online-dynamic-batching with lightweight trainer adapters. Across public 2B/8B Qwen3-VL runs on UltraChat/LLaVA/ShareGPT4o, ODB improves literal emitted-sample throughput vs. fixed-batch Standard by 1.58-2.51x on single-node Full FT/LoRA and 1.71-3.78x on two-node Full FT, with Standard-comparable quality; production MM-Mix reaches 4.43x. Against GMT/BMT offline token-budget oracles, ODB is within 15% on UltraChat/LLaVA and faster on high-CV ShareGPT4o: 2.24-2.39x single-node Full FT/LoRA and 3.06-3.69x two-node Full FT. Together, ODB occupies the online/drop-in regime for high-heterogeneity LLM fine-tuning: large throughput gains at Standard-comparable quality, formal DGAP guarantees, and no length-cache precompute or kernel rewrites.
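The core idea, forming batches only once each sample's real length is known, can be sketched with a greedy token-budget grouper. This is a simplified illustration and not ODB itself: the function name, the padded-cost rule (longest sample times batch size), and the single-process setting are all assumptions, and none of the DDP step alignment or formal guarantees described above are modelled.

```python
from typing import Iterable, Iterator, List, Tuple

def batches_by_observed_cost(
    samples: Iterable[Tuple[str, int]], token_budget: int
) -> Iterator[List[str]]:
    """Greedily group (sample_id, observed_length) pairs so that the padded cost
    longest_length * batch_size stays within token_budget."""
    batch: List[str] = []
    longest = 0
    for sample_id, length in samples:
        new_longest = max(longest, length)
        # Close the current batch if adding this sample would exceed the budget.
        if batch and new_longest * (len(batch) + 1) > token_budget:
            yield batch
            batch, new_longest = [], length
        batch.append(sample_id)
        longest = new_longest
    if batch:
        yield batch

# Lengths here stand in for post-tokenization costs observed at load time.
stream = [("a", 100), ("b", 120), ("c", 900), ("d", 50), ("e", 60)]
print(list(batches_by_observed_cost(stream, token_budget=1000)))
```

A sample whose own length exceeds the budget is still emitted, alone in its batch, so the generator always terminates and never drops data.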

2606.20097 2026-06-19 cs.CL New submission 90%

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

Zhentao Tan, Wei Chen, Jingyi Shen, Yao Liu, Xu Shen, Yue Wu, Jieping Ye

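Affiliation: Alibaba Group

Topic match (Long context): a hybrid attention architecture for long context.

AI summary: Proposes HydraHead, an architecture that mixes full attention and linear attention along the head dimension, using interpretability-driven head selection and a scale-normalized fusion module; it outperforms layer-wise hybrid designs on long-context tasks and, trained on only 15B tokens, improves over the baseline by more than 69% at 512K context length.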

Abstract

The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear Attention (LA) with Full Attention (FA), suggesting that the design space of attention hybridization remains underexplored. To probe this space, we conduct interpretability analysis and observe that layers exhibit block-wise functional similarity, while individual heads within the same layer display distinct functional specialization despite sharing input features. This head-level heterogeneity suggests that the head dimension provides a natural and principled granularity for fusing heterogeneous attention signals. Building on this insight, we introduce HydraHead, a novel architecture that hybridizes FA and LA along the head axis. HydraHead features two key innovations: (1) an interpretability-driven selection strategy that identifies retrieval-critical heads and preserves FA only for them, and (2) a scale-normalized fusion module that reconciles the distributional gap between FA and LA head outputs. By leveraging a three-stage transfer pipeline with parameter reuse and distillation, we achieve high-performance hybrid models with minimal training overhead. Under a unified training setup, HydraHead outperforms other hybrid designs in long-context tasks while maintaining strong general reasoning. With interpretability-driven head selection, it matches a 3:1 layer-wise hybrid's long-context performance at a 7:1 LA-to-FA ratio. Crucially, trained on only 15B tokens, HydraHead achieves over 69% improvement over the baseline at 512K context length, approaching Qwen3.5, a leading model of comparable size with a native context length of 256K. This highlights the significant scaling potential of head-level hybridization.

2606.19744 2026-06-19 cs.CL cs.AI cs.HC New submission 90%

Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings

Pranav Bhandari, Nicolas Fay, Amitava Datta, Usman Naseem, Mehwish Nasim

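Affiliations: Network Analysis and Social Influence Modelling (NASIM) Lab; School of Physics Maths and Computing, The University of Western Australia; School of Psychological Science, The University of Western Australia; School of Computing, Macquarie University

Topic match (Post-training): effects of sequential DPO across preference settings, as a study of alignment methods.

AI summary: Studies sequential DPO across preference settings and finds that forgetting is not uniform but depends on objective relationship, signal strength, and training order; recommends that future alignment pipelines account for objective compatibility.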

Comments Submitted to EMNLP 2026

Abstract

Aligning language models with human preferences often requires optimising multiple behavioural objectives. A practical approach is to apply these objectives sequentially using preference optimisation methods such as Direct Preference Optimisation (DPO), but it remains unclear whether later training uniformly degrades preferences learned earlier or whether the effect depends on the relationship between objectives. We study sequential DPO across four preference settings covering distributional conflict, multi-attribute interaction, strong safety signal, and compatible response-quality objectives. Using Llama-3.1-8B-Instruct with LoRA adapters, we evaluate all objectives after every stage with a fixed base-model reference. We find that sequential DPO does not produce a single forgetting pattern; preference change ranges from partial degradation to stability, pair-level redistribution, or positive transfer depending on objective relationship, signal strength, and training order. Pair-level analysis using length-normalised policy margins shows that aggregate metrics can mask heterogeneous changes across preference pairs, whereas quartile decomposition reveals that high-confidence pairs can either degrade or improve depending on the setting. Mechanistic diagnostics show that Stage 2 gradients and adapter updates are near-orthogonal to the previous objective across all settings, providing little evidence that direct gradient opposition is the primary driver. These findings suggest that future sequential alignment pipelines should account for objective compatibility and signal strength, rather than assuming that later objectives affect earlier preferences uniformly.
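For readers unfamiliar with the quantities involved, the sketch below computes the standard DPO loss for one preference pair against a fixed reference model, plus a per-token margin. The DPO formula is the usual one; the `length_normalised_margin` definition is an assumption about what "length-normalised policy margins" means, since the abstract does not spell it out. Inputs are summed sequence log-probabilities.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probabilities."""
    logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(logits)), written in a numerically stable form.
    return math.log1p(math.exp(-abs(logits))) + max(-logits, 0.0)

def length_normalised_margin(policy_chosen, policy_rejected, len_chosen, len_rejected):
    """Per-token log-probability gap between chosen and rejected responses
    (hypothetical definition, for illustration)."""
    return policy_chosen / len_chosen - policy_rejected / len_rejected

# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen response and lowering the rejected one reduces the loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Tracking a margin of this kind per pair, rather than only the mean loss, is what lets a pair-level analysis detect pairs that move in opposite directions while the aggregate stays flat.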

2604.00626 2026-06-19 cs.LG cs.CL Updated version 90%

A Survey of On-Policy Distillation for Large Language Models

Mingyang Song, Mao Zheng

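Affiliation: Tencent, China

Topic match (Post-training): a survey of on-policy distillation methods for LLM post-training.

AI summary: Surveys on-policy distillation for large language models, covering how teacher feedback on the student's own outputs reduces compounding errors, formalizing distillation as f-divergence minimization, and analyzing the connection between distillation and reinforcement learning.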

Comments Ongoing Work

Abstract

As Large Language Models continue to grow in both capability and cost, transferring frontier capabilities into smaller, deployable students has become an important engineering problem, and knowledge distillation remains a common technique for this transfer. The prevailing recipe in industrial pipelines, static imitation of teacher-generated text, carries a structural weakness that grows more severe as tasks become longer and more reasoning-intensive. Because the student is trained on flawless teacher prefixes but generates its own at inference, small errors tend to accumulate into trajectories it has rarely been trained to recover from, and the resulting exposure bias has been shown to scale roughly with the square of sequence length. On-Policy Distillation reorganizes the training loop around this observation by having the teacher provide feedback on what the student actually produces, with the goal of reducing the compounding term toward linear and reframing distillation as an iterative correction process rather than single-pass imitation. The resulting literature has expanded along divergence design, reward-guided optimization, and self-play, yet contributions remain scattered across the knowledge distillation, RLHF, and imitation learning communities without a unified treatment. This survey provides such a treatment. We formalize OPD as f-divergence minimization over student-sampled trajectories, organize the field along three design axes (what to optimize, where the signal comes from, and how to stabilize training in practice), and consolidate success conditions, recurring failure modes, and the connection between OPD and KL-constrained reinforcement learning. We close with open problems that emerge from this synthesis, including distillation scaling laws, uncertainty-aware feedback, agent-level distillation, and the growing overlap between knowledge distillation and RL.
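A minimal sketch of the on-policy idea follows, reduced to a single next-token distribution: the divergence is estimated from tokens the student samples, not from teacher-written text. Reverse KL is used here as one example of an f-divergence; the toy distributions and function names are assumptions for illustration, and the survey's formulation is over full student-sampled trajectories.

```python
import math
import random

def reverse_kl(student, teacher):
    """Exact KL(student || teacher) for two categorical distributions over the same tokens."""
    return sum(p * math.log(p / teacher[tok]) for tok, p in student.items() if p > 0)

def on_policy_estimate(student, teacher, n_samples, rng):
    """Monte Carlo estimate of KL(student || teacher) from tokens sampled from the student."""
    tokens = list(student)
    weights = [student[t] for t in tokens]
    draws = rng.choices(tokens, weights=weights, k=n_samples)
    return sum(math.log(student[t] / teacher[t]) for t in draws) / n_samples

student = {"a": 0.7, "b": 0.2, "c": 0.1}
teacher = {"a": 0.5, "b": 0.3, "c": 0.2}
exact = reverse_kl(student, teacher)
estimate = on_policy_estimate(student, teacher, 20000, random.Random(0))
print(exact, estimate)
```

Because the samples come from the student, the training signal concentrates on what the student actually produces, which is the property the abstract credits with reducing compounding errors.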


2602.22495 2026-06-19 cs.LG cs.AI Updated version 90%

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Zhaoyang Zhang, Shuli Jiang, Yantao Shen, Yuting Zhang, Dhananjay Ram, Shuo Yang, Zhuowen Tu, Wei Xia, Stefano Soatto

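Affiliations: Meta

Topic match (post-training): reinforcement-aware knowledge distillation for LLM reasoning

AI summary: Proposes RL-aware distillation (RLAD), which uses Trust Region Ratio Distillation (TRRD) to perform selective imitation during RL post-training, addressing distribution mismatch and objective interference; it outperforms existing methods on logic reasoning and math benchmarks.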

Abstract

Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning large language models (LLMs), but the high inference cost of such models motivates distillation into smaller students. Most existing knowledge distillation (KD) methods are designed for supervised fine-tuning (SFT), relying on fixed teacher traces or teacher-student Kullback-Leibler (KL) divergence-based regularization. When combined with RL, these approaches often suffer from distribution mismatch and objective interference: teacher supervision may not align with the student's evolving rollout distribution, and the KL regularizer can compete with reward maximization and require careful loss balancing. To address these issues, we propose RL-aware distillation (RLAD), which performs selective imitation during RL -- guiding the student toward the teacher only when it improves the current policy update. Our core component, Trust Region Ratio Distillation (TRRD), replaces the teacher-student KL regularizer with a PPO/GRPO-style likelihood-ratio objective anchored to a teacher--old-policy mixture, yielding advantage-aware, trust-region-bounded distillation on student rollouts and naturally balancing exploration, exploitation, and imitation. Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.


2509.25148 2026-06-19 cs.AI Updated version 90%

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

Faqiang Qian, Kang An, Weikun Zhang, Ziliang Wang, Xuhui Zheng, Liangjian Wen, Yong Dai, Mengya Gao, Yichao Wu

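Affiliations: Southwest University of Finance and Economics

Topic match (post-training): proposes an adversarially anchored preference alignment framework that augments post-training objectives

AI summary: Proposes the AAPA framework, which uses a fixed lightweight discriminator to adversarially anchor policy outputs to expert responses at the sentence level, augmenting post-training objectives such as SFT and GRPO and consistently improving performance on instruction-following benchmarks.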

Abstract

Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77% on Qwen3-0.6B and 3.75% on Qwen3-4B. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at https://github.com/IsFaqq/AAPA.


2606.20008 2026-06-19 cs.LG New submission 85%

VIMPO: Value-Implicit Policy Optimization for LLMs

Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, Xuandong Zhao

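Affiliations: UC Berkeley; Yale University

Topic match (post-training): proposes VIMPO to improve LLM reasoning

AI summary: Proposes VIMPO, which derives a policy-implied value function from the optimality conditions of KL-regularized reinforcement learning, enabling fine-grained credit assignment without training a critic and outperforming GRPO on mathematical reasoning benchmarks.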

Abstract

Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative methods such as GRPO avoid training a critic, but typically assign a trajectory-level advantage to every token. Actor-critic methods provide denser learning signals, but require a learned value function with its own training instability. We introduce VIMPO, a critic-free policy optimization method that derives a policy-implied value function from the optimality conditions of KL-regularized reinforcement learning. For autoregressive generation, the resulting value recurrence can be written in terms of policy-reference log-ratios and anchored by the terminal condition that no future reward remains at the end of a trajectory. This gives a simple value loss that incorporates outcome-level verifiable rewards without training a critic. The same derivation also yields a critic-free actor advantage, allowing VIMPO to separate reward incorporation through the value loss from policy improvement through a PPO-style actor update. On mathematical RLVR benchmarks, VIMPO improves over GRPO across MATH-500, AIME 2024, AIME 2025, and OlympiadBench, with especially large gains on competition-style evaluations. Under noisy rewards, VIMPO retains a consistent advantage over GRPO, suggesting that policy-implied value optimization can provide finer credit assignment while preserving the practical simplicity of critic-free training.
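The value recurrence described in this abstract can be sketched under a standard assumption that the abstract itself does not spell out: for KL-regularized RL with deterministic token transitions, the optimal policy satisfies beta * log(pi / pi_ref) = r_t + V_{t+1} - V_t, with V = 0 after the final token. The function name, the toy numbers, and the exact form of the recursion are illustrative and may differ from the loss VIMPO actually uses.

```python
def implied_values(log_ratios, rewards, beta):
    """Backward recursion V_t = r_t - beta * log_ratio_t + V_{t+1}.

    log_ratios[t] is log pi(a_t | s_t) - log pi_ref(a_t | s_t).
    The terminal value is fixed at 0: no reward remains after the last token.
    """
    values = [0.0] * (len(log_ratios) + 1)
    for t in reversed(range(len(log_ratios))):
        values[t] = rewards[t] - beta * log_ratios[t] + values[t + 1]
    return values


# Hypothetical three-token trajectory with an outcome reward on the last token.
log_ratios = [0.5, -0.2, 0.1]
rewards = [0.0, 0.0, 1.0]
beta = 0.1

values = implied_values(log_ratios, rewards, beta)
print(values)
```

Every prefix receives its own value even though the reward arrives only at the end, which is the kind of token-level credit signal a trajectory-level advantage does not provide.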


2606.20002 2026-06-19 cs.LG cs.AI cs.CL New submission 80%

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

Yanxi Chen, Weijie Shi, Yuexiang Xie, Boyi Hu, Yaliang Li, Bolin Ding, Jingren Zhou

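Affiliations: Alibaba Group

Topic match (post-training): trains a meta-capability in LLMs through reinforcement learning

AI summary: Proposes the Connect the Dots framework, which uses end-to-end reinforcement learning to train LLMs to self-update their context over long task sequences and generalize to new domains; experiments validate the cross-domain generalization.

Comments: Work in progress; we will continuously update the codebase and arXiv version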

Abstract

This work presents a general framework for training large language models (LLMs) to "Connect the Dots" (CoD), a meta-capability required by long-lifecycle agents: as an LLM-based AI agent gets deployed in an environment, it solves a long sequence of tasks while continuously exploring the environment, learning from its own experiences, and iteratively self-updating its context about the environment, thereby achieving progressively better performance on future tasks conditioned on the updated context. Major components of the CoD framework include: (1) algorithm design and infrastructure for end-to-end reinforcement learning (RL) with long rollout sequences interleaving solve-task and update-context episodes; (2) tasks and environments for incentivizing and eliciting the targeted meta-capability in LLMs during training, as well as for faithfully measuring progress during evaluation. We present proof-of-concept implementations of the CoD framework, including a GRPO-style RL algorithm with fine-grained credit assignment, as well as tasks and environments tailored to the targeted meta-capability (rather than domain-specific LLM capabilities or standard task-by-task RL). Empirical results validate the efficacy of end-to-end RL training in the CoD setting, and demonstrate the potential for out-of-distribution generalization of the elicited meta-capability: within the training domains, across different domains, and from CoD to Ralph-loop settings. Our investigation of CoD connects several lines of prior work, and opens up new opportunities for advancing LLMs and AI agents. To facilitate further research and applications, we release our implementations at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.


2602.14696 2026-06-19 cs.LG Updated version 90%

A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't)

Nihal V. Nayak, Paula Rodriguez-Diaz, Neha Hulkund, Sara Beery, David Alvarez-Melis

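Affiliations: Harvard University; MIT; Kempner Institute

Topic match (instruction tuning): systematic analysis of the core ingredients of targeted instruction selection in instruction fine-tuning

AI summary: Systematically disentangles the two core ingredients of targeted instruction selection for instruction fine-tuning, data representation and selection algorithm; finds that gradient-based representations combined with greedy round-robin selection perform best at low budgets, with gains that diminish as the budget grows, and unifies several algorithms as approximate distance minimization.

Comments: ICML 2026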

Abstract

Instruction fine-tuning of large language models (LLMs) often involves selecting a subset of instruction training data from a large candidate pool, using a small query set from the target task. Despite growing interest, the literature on targeted instruction selection remains fragmented and opaque: methods vary widely in selection budgets, often omit zero-shot baselines, and frequently entangle the contributions of key components. As a result, practitioners lack actionable guidance on selecting instructions for their target tasks. In this work, we aim to bring clarity to this landscape by disentangling and systematically analyzing the two core ingredients: data representation and selection algorithms. Our framework enables controlled comparisons across models, tasks, and budgets. We find that only gradient-based data representations choose subsets whose similarity to the query consistently predicts performance across datasets, models, and candidate pools. While no single method dominates, gradient-based representations paired with greedy round-robin selection often perform best on average at low budgets, but these gains diminish at larger budgets. Finally, we unify several existing selection algorithms as forms of approximate distance minimization between the selected subset and the query set, and support this view with new generalization bounds. More broadly, our findings provide critical insights and a foundation for more principled data selection in LLM fine-tuning. The code is available at https://github.com/dcml-lab/targeted-instruction-selection.
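One plausible reading of the greedy round-robin selection named in this abstract is sketched below: query examples take turns, and each one claims its most similar candidate that has not been selected yet, until the budget is spent. The two-dimensional vectors stand in for gradient-based representations, and the function names and tie-breaking rule are assumptions rather than the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_round_robin(candidates, queries, budget):
    """Cycle through the queries; each turn picks the best unselected candidate."""
    selected = []
    remaining = set(range(len(candidates)))
    turn = 0
    while len(selected) < budget and remaining:
        query = queries[turn % len(queries)]
        # sorted() makes ties resolve to the lowest candidate index.
        best = max(sorted(remaining), key=lambda i: cosine(candidates[i], query))
        selected.append(best)
        remaining.remove(best)
        turn += 1
    return selected


# Hypothetical representations for five candidates and two target queries.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
queries = [[1.0, 0.0], [0.0, 1.0]]

picked = greedy_round_robin(candidates, queries, budget=3)
print(picked)
```

Taking turns keeps any single query from exhausting a small budget, which may be part of why the approach is reported to help most at low budgets.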


2602.04306 2026-06-19 cs.CL cs.AI Updated version 85%

DeFrame: Debiasing Large Language Models Against Framing Effects

Kahee Lim, Soyeon Kim, Steven Euijong Whang

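Affiliations: KAIST

Topic match (instruction tuning): proposes a framing-aware debiasing method that improves LLM consistency across framings

AI summary: Addresses the problem that LLMs show inconsistent bias under semantically equivalent but differently phrased prompts; proposes a framing-aware debiasing method that quantifies framing disparity and strengthens cross-framing consistency, reducing overall bias and improving robustness.

Comments: Accepted to Findings of ACL 2026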

Abstract

As large language models (LLMs) are increasingly deployed in real-world applications, ensuring their fair responses across demographics has become crucial. Despite many efforts, an ongoing challenge is hidden bias: LLMs appear fair under standard evaluations, but can produce biased responses outside those evaluation settings. In this paper, we identify framing -- differences in how semantically equivalent prompts are expressed (e.g., "A is better than B" vs. "B is worse than A") -- as an underexplored contributor to this gap. We first introduce the concept of "framing disparity" to quantify the impact of framing on fairness evaluation. By augmenting fairness evaluation benchmarks with alternative framings, we find that (1) fairness scores vary significantly with framing and (2) existing debiasing methods improve overall (i.e., frame-averaged) fairness, but often fail to reduce framing-induced disparities. To address this, we propose a framing-aware debiasing method that encourages LLMs to be more consistent across framings. Experiments demonstrate that our approach reduces overall bias and improves robustness against framing disparities, enabling LLMs to produce fairer and more consistent responses.


2606.20493 2026-06-19 cs.LG cs.AI cs.MA New submission 85%

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

Zewen Liu

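Affiliations: School of Software Engineering, Qilu Institute of Technology

Topic match (other LLM): studies evaluator bias propagation in multi-agent LLM systems

AI summary: Proposes the Contagion Networks framework to quantify how evaluator bias propagates in multi-agent LLM systems; finds bias propagation coefficients of 0.157-0.352 between same-model agents, and that enlarging the evaluator committee reduces effective contagion by 72.4%.

Comments: 20 pages, 4 figures, 4 tables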

Abstract

When large language models serve as evaluators in multi-agent systems, their systematic evaluation biases propagate through the agent network. We introduce Contagion Networks, a formal framework for measuring how evaluator biases spread across interacting LLM agents. In a controlled 3-agent experiment using DeepSeek-chat with three distinct evaluator bias profiles (structured, balanced, evidence-based), we measure the Cross-Agent Contagion Matrix Gamma_3 and find that evaluator biases consistently propagate between agents (gamma in [0.157, 0.352]), even within the same underlying model. We identify three propagation regimes governed by the spectral radius rho(Gamma_N), and demonstrate that homogeneous-model agents produce contagion coefficients 3-5x weaker than cross-model coefficients observed in prior work (MM-EPC: gamma approx 0.85-1.3), placing them in the suppression regime. We show that increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%, providing an actionable mitigation strategy. We release the open-source Contagion Network experimental framework.
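To make the role of the spectral radius concrete, the sketch below computes rho for a small contagion matrix by power iteration. It assumes a simple linear propagation model, bias_next = Gamma * bias, under which bias decays when rho < 1; the abstract does not state the regime thresholds, so that reading is an assumption. The matrix entries are made-up values chosen inside the reported range [0.157, 0.352], not the paper's measurements.

```python
def spectral_radius(matrix, iterations=200):
    """Estimate the dominant eigenvalue magnitude by power iteration.

    Adequate for a nonnegative matrix with positive off-diagonal entries,
    where the dominant eigenvalue is real and positive.
    """
    size = len(matrix)
    vec = [1.0] * size
    radius = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        radius = max(abs(x) for x in nxt)
        if radius == 0.0:
            return 0.0
        vec = [x / radius for x in nxt]  # renormalize so the largest entry is 1
    return radius


# Hypothetical 3-agent contagion matrix: entry [i][j] is the bias passed from j to i.
gamma_3 = [
    [0.0, 0.157, 0.352],
    [0.2, 0.0, 0.3],
    [0.25, 0.18, 0.0],
]

rho = spectral_radius(gamma_3)
print(f"rho = {rho:.3f}, decays under the linear model: {rho < 1.0}")
```

For a nonnegative matrix the spectral radius lies between the smallest and largest row sums, which gives a quick sanity check on the result.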


2606.19746 2026-06-19 cs.DC New submission 85%

The ACUTE Protocol: Manipulating Language Model Activations for Better Calibration, Utility, and Trust

Nishant Subramani, Palash Goyal, Yiwen Song, Mani Malek, Yuan Xue, Tomas Pfister, Hamid Palangi

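Affiliations: Carnegie Mellon University; Google; Scale AI

Topic match (other LLM): proposes an activation-based confidence estimation protocol that improves calibration and trust

AI summary: Proposes the ACUTE protocol, which estimates confidence by manipulating language model activations and balances calibration against informativeness; it outperforms strong baselines on multiple-choice question answering, tool calling, and scientific document summarization, improving calibration, utility, and trustworthiness.

Comments: ICML 2026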

Abstract

As language models improve and become increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk versus reward tradeoff when trusting a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often biasing towards overconfidence. Additionally, calibration can be gamed: a policy that always predicts the base rate is perfectly calibrated, but completely uninformative. To resolve this, we develop a new metric, expected utility renormalized by the oracle (EURO), that balances calibration and informativeness. We also propose a general-purpose activation-based confidence, utility, and trust estimation protocol (ACUTE) to appropriately adjudicate uncertainty. The ACUTE protocol provides flexible, sample-efficient, and compute-efficient confidence estimators for 3 tasks including multiple choice question answering, tool-calling, and scientific document summarization across 6 models from 4 model families. ACUTE outperforms strong baselines on EURO, while maintaining low calibration error. Taken together, our work shows that equipping LLMs with the ACUTE protocol can improve calibration, utility, and trustworthiness in numerous settings.
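The claim that calibration can be gamed is easy to check numerically. The sketch below computes a binned expected calibration error (ECE) for a predictor that always outputs the base rate: its ECE is zero, yet it assigns the same confidence to every example and so cannot separate correct outputs from incorrect ones. The ten toy labels and function names are assumptions; the EURO metric itself is not defined in the abstract and is not implemented here.

```python
def expected_calibration_error(confidences, labels, num_bins=10):
    """Binned ECE: weighted mean of |mean confidence - accuracy| per bin."""
    bins = [[] for _ in range(num_bins)]
    for conf, label in zip(confidences, labels):
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, label))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(l for _, l in members) / len(members)
        ece += len(members) / total * abs(mean_conf - accuracy)
    return ece


# Hypothetical correctness labels: 3 of 10 model outputs are correct.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
base_rate = sum(labels) / len(labels)

# A predictor that ignores its input and always reports the base rate.
constant_confidences = [base_rate] * len(labels)
ece_constant = expected_calibration_error(constant_confidences, labels)
distinct_confidence_levels = len(set(constant_confidences))

print(f"ECE = {ece_constant:.6f}, distinct confidences = {distinct_confidence_levels}")
```

A metric that also rewards informativeness would have to penalize this predictor despite its perfect calibration, which is the gap the abstract says EURO is meant to close.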


2512.06899 2026-06-19 cs.CR Updated version 85%

Patronus: Identifying and Mitigating Transferable Backdoors in Pre-trained Language Models

Tianhang Zhao, Haodong Zhao, Wei Du, Pengzhou Cheng, Junxian Li, Sufeng Duan, Haojin Zhu, Gongshen Liu

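Topic match (other LLM): a defense framework against backdoor attacks on pre-trained language models, relevant to LLM security

AI summary: Targets the security threat of transferable backdoors in the pre-trained language model supply chain; proposes the Patronus defense framework, which uses input-side invariance detection and a dual-stage mitigation strategy to reach ≥98.3% backdoor detection recall across 15 models and 9 tasks.

Comments: Work in progress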

Abstract

The "Pre-train, then fine-tune" paradigm has revolutionized Natural Language Processing (NLP). In this context, transferable backdoors pose a severe threat to the Pre-trained Language Models (PLMs) supply chain, yet defensive research remains nascent, primarily relying on detecting anomalies in the output feature space. We identify a critical flaw: fine-tuning on downstream tasks inevitably modifies model parameters, shifting the output distribution and rendering pre-computed defenses ineffective. To address this, we propose Patronus, a novel defense framework that shifts the defensive focus from output features to input-side invariance, exploiting the fact that adversarial triggers remain constant even as model weights change. To overcome the convergence challenges of discrete text optimization, Patronus introduces a multi-trigger contrastive search algorithm that effectively bridges gradient-based optimization with contrastive learning objectives. Furthermore, we employ a dual-stage mitigation strategy combining real-time input monitoring with model purification via adversarial training. Extensive experiments across 15 PLMs and nine tasks demonstrate that Patronus achieves ≥98.3% backdoor detection recall and reduces attack success rates to the level of clean settings, significantly outperforming all state-of-the-art baselines in all settings. Code is available at https://github.com/zth855/Patronus.


2606.20517 2026-06-19 cs.AI cs.PL New submission 80%

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Maria Ivanova, Pavel Zadorozhny, Rodion Levichev, Ivan Petrov, Adamenko Pavel, Ivan Lopatin, Alexey Kutalev, Dmitrii Babaev

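Affiliations: GigaCode; Yandex School of Data Analysis, Applied AI Institute

Topic match (other LLM): evaluates LLM cross-language code generation, involving pretrained models

AI summary: Proposes the Multi-LCB benchmark, which extends LiveCodeBench's Python tasks to 12 programming languages to evaluate LLM cross-language code generation, and finds issues such as Python overfitting and language-specific contamination.

Comments: ICLR 2026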

Abstract

LiveCodeBench (LCB) has recently become a widely adopted benchmark for evaluating large language models (LLMs) on code-generation tasks. By curating competitive programming problems, constantly adding fresh problems to the set, and filtering them by release dates, LCB provides contamination-aware evaluation and offers a holistic view of coding capability. However, LCB remains restricted to Python, leaving open the question of whether LLMs can generalize across the diverse programming languages required in real-world software engineering. We introduce Multi-LCB, a benchmark for evaluating LLMs across twelve programming languages, including Python. Multi-LCB transforms Python tasks from the LCB dataset into equivalent tasks in other languages while preserving LCB's contamination controls and evaluation protocol. Because it is fully compatible with the original LCB format, Multi-LCB will automatically track future LCB updates, enabling systematic assessment of cross-language code generation competence and requiring models to sustain performance well beyond Python. We evaluated 24 LLMs for instruction and reasoning on Multi-LCB, uncovering evidence of Python overfitting, language-specific contamination, and substantial disparities in multilingual performance. Our results establish Multi-LCB as a rigorous new benchmark for multi-programming-language code evaluation, directly addressing LCB's primary limitation and exposing critical gaps in current LLM capabilities.


2606.20245 2026-06-19 cs.AI New submission 80%

Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference

Huang Peng, Jiuyang Tang, Weixin Zeng, Hao Xu, Xiang Zhao

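Affiliations: National Key Laboratory of Big Data and Decision, National University of Defense Technology

Topic match (other LLM): resolves conflicts between LLM parametric knowledge and context

AI summary: Proposes the MACR framework, which explicitly resolves conflicts between an LLM's internal parametric knowledge and external context through adaptive knowledge assessment and multi-agent reasoning, moving beyond the conventional binary-choice paradigm.

Comments: 12 pages, 3 figures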

Abstract

Large language models (LLMs) have achieved strong performance across a wide range of language-based tasks by leveraging both extensive parametric knowledge and in-context learning ability, enabling them to incorporate external information provided in the input prompt. However, the integration of external knowledge can introduce conflicts, not only between the model's internal parametric knowledge and the external information, but also among multiple pieces of external contexts. Existing approaches typically assume that either the model or the provided context is reliable, overlooking the possibility that both sources may contain errors, and avoid conflicts by privileging one source over the other, rather than actively resolving inconsistencies. To address these limitations, we propose a novel framework MACR for LLM knowledge conflict resolution that moves beyond the conventional binary choice paradigm and incorporates an explicit conflict-resolution mechanism based on a multi-agent reasoning approach. Specifically, we first propose an adaptive knowledge assessment and retrieval approach that employs a modified semantic entropy measure to quantify an LLM's confidence in its answer to a given query. Based on this confidence estimation, MACR either externalizes the model's internal knowledge as textual representations or retrieves relevant external knowledge when internal knowledge is insufficient, generating basic contexts for subsequent reasoning. Then we introduce an inductive multi-agent reasoning framework with three specialized agents that, respectively, induce explicit rules, analyze potential conflicts, and resolve inconsistencies across all available contexts. Empirical results demonstrate that MACR significantly outperforms state-of-the-art baselines across benchmarks, while also providing interpretable resolutions of explicit conflicts.
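The confidence-gated step described in this abstract can be sketched with plain semantic entropy: sample several answers, group those that mean the same thing, and take the entropy of the group frequencies. The abstract refers to a modified measure whose details are not given, so the unmodified form is used here. The `canonicalize` helper is a hypothetical stand-in for a real semantic-equivalence check, and the threshold in `choose_source` is an arbitrary illustrative value.

```python
import math
from collections import Counter


def canonicalize(answer):
    """Hypothetical stand-in for semantic clustering: normalize surface form only."""
    return answer.strip().lower().rstrip(".")


def semantic_entropy(answers):
    """Entropy (in nats) over clusters of semantically equivalent answers."""
    clusters = Counter(canonicalize(a) for a in answers)
    total = sum(clusters.values())
    return -sum((n / total) * math.log(n / total) for n in clusters.values())


def choose_source(answers, threshold=0.5):
    """Low entropy: externalize internal knowledge. High entropy: retrieve."""
    return "internal" if semantic_entropy(answers) < threshold else "retrieve"


# Hypothetical sampled answers to the same query.
confident = ["Paris", "paris.", " Paris "]
uncertain = ["Paris", "Lyon", "Marseille", "paris"]

print(semantic_entropy(confident), choose_source(confident))
print(semantic_entropy(uncertain), choose_source(uncertain))
```

Grouping by meaning rather than by string keeps paraphrases of one answer from inflating the uncertainty estimate.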


2606.20152 2026-06-19 cs.CL cs.AI New submission 80%

From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models

Jiaxu Zuo, Mu You, Kaixin Lan, Tao Fang, Yujia Huo, Henghua Shen, Lidia S. Chao, Derek F. Wong

Affiliations: NLP2CT Lab, Department of Computer and Information Science, University of Macau; Institute of International Language Services Studies, Macau Millennium College; School of Data Science and Information Engineering, Guizhou Minzu University

Topic match (other LLM): analyzes LLM internal representations for automated essay scoring

AI summary: Uses linear probing and related methods to analyze the hidden representations of 8 LLMs on three datasets; finds that essay quality information is linearly decodable and identifies score-related neurons, shedding light on the internal mechanisms of LLM-based scoring.

Comments: This is a preprint of a manuscript currently under peer review

Abstract

Recent advances in Large Language Models (LLMs) have substantially transformed Automated Essay Scoring (AES), yet the internal mechanisms underlying LLM-based scoring remain poorly understood. In this work, we systematically analyze the hidden representations of eight LLMs across two English essay datasets (ASAP++, CSEE) and one Portuguese dataset (ENEM). Using linear probing, cross-prompt generalization, dimensionality reduction, and neuron-level analyses, we find consistent evidence that essay quality information is encoded in a linearly accessible form within LLM representations. These representations emerge progressively across layers, remain robust across prompting strategies, and partially transfer across essay prompts despite differences in scoring rubrics. In addition, nonlinear probes provide only marginal and inconsistent improvements over linear probes, suggesting that most essay quality information is already linearly decodable. We further identify individual "essay scoring neurons" whose activations strongly correlate with essay scores and whose behavior is sensitive to targeted intervention. Moreover, the layer-wise distribution of these neurons systematically shifts with essay length, with longer essays relying more heavily on deeper layers. Overall, our findings provide evidence that LLMs encode structured representations related to essay quality and offer new insights into the interpretability of LLM-based AES systems.


2606.19700 2026-06-19 cs.CL New submission 85%

TerraMARS: A Domain-Adapted Small-Language-Model Pipeline for Mars Terraforming Literature

Jyotsna Singh, Ash Black, Jeff Larsen, Scott R. Saleska

Affiliations: University of Arizona; College of Information Science, University of Arizona; Biosphere 2, University of Arizona; Department of Ecology and Evolutionary Biology, University of Arizona; Department of Environmental Sciences, University of Arizona

Topic match (domain-specific LLMs): a domain-adapted small language model pipeline for extracting information from Mars science literature

AI summary: Proposes the TerraMARS pipeline, which uses a domain-adapted small language model to extract structured information from Mars science literature in support of terraforming research.

Comments: 16 pages, 1 figure, 4 tables

Abstract

Researchers are interested in learning about Mars so that it may eventually become habitable for humans. To achieve this, there is a need for comprehensive knowledge of the planet's atmosphere, hydrology, surface chemistry, radiation environment, and spatial features through the scientific literature. These contain valuable information and meaningful quantitative constraints that can be used in other models and studies, such as habitability assessment and future terraforming studies. We present TerraMARS, an end-to-end information extraction pipeline that combines a domain-adapted Small Language Model to answer Mars terraforming-related questions and convert unstructured Mars science text into machine-readable structured outputs in JavaScript Object Notation (JSON) format. A corpus of open-access papers is collected and processed using a multistage retrieval and chunking framework. Google Gemma 3 1B was adapted to the domain using Quantized Low-Rank Adaptation (QLoRA) fine-tuning on Mars-specific question-answering and information extraction datasets. The resulting pipeline generates both types of output and provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling for Mars. The output from this pipeline looks promising, but further improvements are needed to increase extraction accuracy and factual consistency.


2606.20138 2026-06-19 cs.AI cs.CL cs.HC cs.LG New submission 80%

Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring

Po-Chin Chang, Nicholas Hogan, Aske Plaat, Michiel T. van der Meer

Affiliations: Leiden University; FutureWhiz

Topic match: Domain-specific LLMs (adaptive LLM-based high-school tutoring system).

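AI summary: Proposes a subject-aware prompt routing model based on 14 pedagogical features. Trained in simulation and evaluated in an online A/B test, it switches strategies adaptively in high-school tutoring, improving instructional efficiency and reducing the number of interaction turns.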


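AI abstract (translated from Chinese)

LLMs can personalize education, but current static-prompt tutoring systems struggle to adapt to different subjects. We develop and test a system with subject-aware prompting based on 14 pedagogical features (e.g., tutor scaffolding, student understanding) extracted from raw transcripts. We first train a prompt routing model in a simulated environment, then deploy it for online adaptation with real high-school students. In the simulation benchmark, the router outperforms two static baselines ($0.694$ vs. $0.647$ and $0.64$, $p<0.001$). An A/B test ($N=656$ conversations from 359 students) shows sim-to-real transfer, with the model switching from analytical to scaffolding learning strategies. The adaptive prompt selection mechanism improves instructional efficiency, maintains pedagogical quality, and reduces interactions by about 3 turns ($p=0.007$). While the greedy router's exercise conversion rate is comparable to the baseline ($19.1\%$ vs. $19.6\%$), a stochastic router that samples strategies achieves a higher conversion rate ($28.1\%$).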

English abstract

LLMs can personalize education, although current static-prompt tutoring systems struggle to adapt to diverse academic disciplines. We develop and test a system with subject-aware prompting, based on 14 pedagogical features (e.g., tutor scaffolding, student understanding) extracted from raw transcripts. We first train a prompt routing model in a simulation environment, and then deploy it for online adaptation with actual high-school students. The simulation benchmark shows the router outperforming two static baselines ($0.694$ vs. $0.647$ and $0.64$, $p<0.001$). A/B testing ($N=656$ conversations from 359 students) shows sim-to-real transfer where the model switches from analytical to scaffolding learning strategies. Our adaptive prompt selection mechanism improves instructional efficiency, maintains pedagogical quality and reduces interactions by around 3 turns ($p=0.007$). While a greedy router achieves a comparable exercise conversion rate with the baseline ($19.1\%$ vs. $19.6\%$), a stochastic router that samples strategies leads to a higher conversion rate ($28.1\%$).
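The contrast between the greedy and stochastic routers can be illustrated with a minimal sketch. It assumes the router produces one score per prompting strategy; the score values, the `worked_example` strategy name, the softmax sampling rule, and the `temperature` parameter are all hypothetical, since the abstract does not say how the paper's router samples strategies.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn per-strategy scores into a probability distribution."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp((s - top) / temperature) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def greedy_route(scores):
    """Always pick the highest-scoring strategy."""
    return max(scores, key=scores.get)


def stochastic_route(scores, rng, temperature=1.0):
    """Sample a strategy in proportion to its softmax probability."""
    probs = softmax(scores, temperature)
    names = list(probs)
    weights = [probs[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical router scores for a single conversation turn
scores = {"analytical": 0.2, "scaffolding": 0.9, "worked_example": 0.5}

rng = random.Random(0)
greedy_pick = greedy_route(scores)
samples = [stochastic_route(scores, rng) for _ in range(2000)]
```

In this sketch the greedy router returns the same strategy for every identical score vector, while the stochastic router still favors the top-scoring strategy but occasionally tries the others.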


1. Pretraining (8 papers)

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

Characterizing Narrative Content in Web-scale LLM Pretraining Data

BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining

Online Dynamic Batching with Formal Guarantees for LLM Training

2. Long context (1 paper)

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

3. Post-training (6 papers)

Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings

A Survey of On-Policy Distillation for Large Language Models

Reinforcement-aware Knowledge Distillation for LLM Reasoning

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

VIMPO: Value-Implicit Policy Optimization for LLMs

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

4. Instruction tuning (2 papers)

A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't)

DeFrame: Debiasing Large Language Models Against Framing Effects

5. Other LLM (11 papers)

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

SAC: Disaggregated KV Cache System for Sparse Attention LLMs with CXL

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

Diffusion Language Models: An Experimental Analysis

Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Patronus: Identifying and Mitigating Transferable Backdoors in Pre-trained Language Models

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference

From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models

6. Domain-specific LLMs (2 papers)

TerraMARS: A Domain-Adapted Small-Language-Model Pipeline for Mars Terraforming Literature

Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring