Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training
Affiliations: School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Alibaba Group, China; School of Mathematics, Southeast University, Nanjing, China
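AI summary: By analyzing late-stage pre-training trajectories, this paper identifies a rank-1 subspace phenomenon and proposes Extra-Merge, a method that requires no additional training and extrapolates along this subspace to minimize loss. It outperforms standard merging baselines on the GPT-2 and LLaMA model families.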
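The summary gives only the outline of the method, so the following is a rough illustration of the general idea rather than the paper's actual procedure. It assumes each checkpoint can be flattened into a vector, takes the top singular direction of the mean-centered checkpoints as a stand-in for the rank-1 subspace, and picks the extrapolation step by a plain grid search over a user-supplied loss. The names `rank1_direction`, `extrapolate_merge`, `loss_fn`, and `alphas` are hypothetical and do not come from the paper.

```python
import numpy as np

def rank1_direction(checkpoints):
    """Return the mean checkpoint, the dominant direction of variation,
    and the fraction of variance that direction explains.

    Assumption: checkpoints are flattened parameter vectors of equal length
    and are not all identical.
    """
    W = np.stack(checkpoints)                      # shape (k, d)
    mean = W.mean(axis=0)                          # uniform average of the checkpoints
    # SVD of the centered checkpoints; the first right singular vector
    # spans the direction along which the checkpoints vary the most.
    _, s, vt = np.linalg.svd(W - mean, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    return mean, vt[0], explained

def extrapolate_merge(checkpoints, loss_fn, alphas):
    """Grid-search a step `alpha` along the dominant direction, starting
    from the averaged checkpoint, and return the lowest-loss candidate.

    Assumption: `loss_fn` maps a parameter vector to a scalar loss
    (for example, a validation loss). No gradient steps are taken.
    """
    mean, direction, _ = rank1_direction(checkpoints)
    candidates = [mean + a * direction for a in alphas]
    losses = [loss_fn(c) for c in candidates]
    best = int(np.argmin(losses))
    return candidates[best], float(alphas[best]), float(losses[best])

if __name__ == "__main__":
    # Toy setup: checkpoints move along one line toward a known optimum.
    u = np.array([0.6, 0.8, 0.0])                  # unit vector
    start = np.array([1.0, -2.0, 0.5])
    target = start + 5.0 * u
    ckpts = [start + t * u for t in range(4)]
    loss = lambda w: float(np.sum((w - target) ** 2))
    merged, alpha, best_loss = extrapolate_merge(ckpts, loss, np.linspace(-5, 5, 101))
    print(alpha, best_loss, loss(np.mean(ckpts, axis=0)))
```

The search grid is symmetric around zero because the sign of a singular vector is arbitrary. Including `alpha = 0` in the grid keeps the plain average as a candidate, so in this sketch the extrapolated result is never worse than uniform averaging under the chosen loss.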