2502.17055
2026-05-28
cs.LG
cs.AI
GradientStabilizer: Fix the Norm, Not the Gradient
Tianjin Huang, Zhangyang Wang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Jiaxing Shang, Tianlong Chen, Ke Li, Lu Liu, Qingsong Wen, Shiwei Liu
Affiliations
Department of Computer Science, University of Exeter;
Department of Mathematics and Computer Science, Eindhoven University of Technology;
School of the Gifted Young, University of Science and Technology of China;
Department of Electrical and Computer Engineering, University of Texas at Austin;
Department of Computer Science, University of Reading;
School of Cyber Science and Technology, Sun Yat-sen University;
Department of Computer Science, The University of North Carolina at Chapel Hill;
ELLIS Institute Tübingen;
Max Planck Institute for Intelligent Systems;
Tübingen AI Center, Tübingen, Germany;
College of Computer Science, Chongqing University
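AI Summary
Proposes GradientStabilizer, a lightweight gradient transformation that replaces the update magnitude with a statistically stabilized estimate of the gradient norm. It suppresses extreme gradient spikes without changing the gradient direction, which improves training stability and reduces divergence.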
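The idea in the summary above (keep the gradient direction, swap its magnitude for a smoothed norm estimate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the summary does not specify the estimator, so this sketch uses a bias-corrected exponential moving average of past gradient norms, and the names `NormStabilizer`, `beta`, and `eps` are hypothetical rather than taken from the paper.

```python
import math


class NormStabilizer:
    """Hypothetical sketch: rescale each gradient to a running norm estimate.

    The estimator (bias-corrected EMA of gradient norms) is an assumption,
    not necessarily the one used by GradientStabilizer.
    """

    def __init__(self, beta=0.9, eps=1e-12):
        self.beta = beta      # EMA decay (assumed hyperparameter)
        self.eps = eps        # guards against division by zero
        self.ema = 0.0        # running average of gradient norms
        self.step = 0

    def __call__(self, grad):
        # Euclidean norm of the raw gradient.
        norm = math.sqrt(sum(g * g for g in grad))

        # Update the running estimate and remove the zero-initialization bias.
        self.step += 1
        self.ema = self.beta * self.ema + (1.0 - self.beta) * norm
        estimate = self.ema / (1.0 - self.beta ** self.step)

        # Keep the direction grad / norm, replace the magnitude by the estimate.
        scale = estimate / (norm + self.eps)
        return [g * scale for g in grad]


def vector_norm(vec):
    """Euclidean norm helper used in the usage example."""
    return math.sqrt(sum(v * v for v in vec))


stabilizer = NormStabilizer(beta=0.9)
for _ in range(3):
    steady = stabilizer([3.0, 4.0])       # norm 5, passes through almost unchanged
spike = stabilizer([300.0, 400.0])        # norm 500, a 100x spike
```

In this sketch the spike still enters the running average, so its magnitude is damped rather than removed entirely; the direction of `spike` matches that of the raw gradient.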