WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points
Dongyue Li, Zechun Liu, Kai Yi, Zhenshuo Zhang, Changsheng Zhao, Raghuraman Krishnamoorthi, Harshit Khaitan, Hongyang R. Zhang, Steven Li
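AI summary: This paper studies the convergence of quantization-aware training (QAT) at low bit-widths and proposes WinQ, an algorithm that resets weights and computes gradients at noise-injected weights to accelerate training and improve performance.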
Comments: 23 pages; to appear in ICML 2026
Quantization-aware training (QAT) is widely adopted to quantize language models: it trains full-precision weights using gradients computed through the quantized model. Its main bottleneck is slow convergence and an early performance plateau, particularly at bit-widths below 4 bits. While this problem has been observed in prior work, its precise cause remains unclear. In this paper, we analyze the convergence of QAT by estimating the spectrum of the loss-surface Hessian. We find that the weights converge to flat regions around saddle points, where the Hessian has substantial fractions of both positive and negative eigenvalues. As training proceeds, an increasing fraction of Hessian eigenvalues concentrates around zero, and their magnitudes decrease. At lower bit-widths, the eigenvalues in the Hessian spectrum are significantly smaller in magnitude. To mitigate these issues, we propose an algorithm called WinQ to accelerate QAT, which involves (1) periodically resetting the weights to a linear interpolation of the full-precision and quantized weights, which reduces their distance to the quantization grid and increases eigenvalue magnitudes, and (2) computing gradients at noise-injected weights to regularize the Hessian. Extensive experiments show that WinQ accelerates QAT by up to 4 times across various quantization methods and models. Under the same training cost, WinQ improves state-of-the-art sub-4-bit quantization by up to 8.8%. These results are consistent across 16 settings spanning different language models, quantization methods, and bit-widths.
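The two WinQ components described in the abstract can be illustrated with a toy QAT loop. The sketch below is not the authors' implementation; it rests on several assumptions not stated in the abstract: a symmetric uniform quantizer with a fixed scale, the straight-through estimator (STE) for passing gradients from quantized to full-precision weights, Gaussian noise added to the full-precision weights before quantization, and a linear-regression "model" in place of a language model. The function names (`quantize`, `noisy_ste_grad`, `interpolation_reset`, `winq_style_qat`) and hyperparameters (`alpha`, `sigma`, `reset_every`) are hypothetical. One common way to see why noise injection acts on the Hessian: in full precision, with ε ~ N(0, σ²I), a second-order expansion gives E[L(w + ε)] ≈ L(w) + (σ²/2)·tr(H), so gradients taken at noisy weights optimize a curvature-penalized objective.

```python
import random

def quantize(w, bits, scale):
    """Symmetric uniform quantizer (assumed): round w/scale to the nearest
    integer level and clamp it to [-2^(bits-1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    level = max(-qmax - 1, min(qmax, round(w / scale)))
    return level * scale

def loss_and_grad(wq, data):
    """MSE of a toy linear model y ~ x.w evaluated at weights wq,
    plus its gradient with respect to wq."""
    n = len(data)
    loss, grad = 0.0, [0.0] * len(wq)
    for x, y in data:
        err = sum(xi * wi for xi, wi in zip(x, wq)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / n
    return loss, grad

def noisy_ste_grad(w, data, bits, scale, sigma, rng):
    """Component (2), assumed form: perturb the full-precision weights with
    Gaussian noise, quantize, and pass the gradient straight through (STE)."""
    noisy = [wi + rng.gauss(0.0, sigma) for wi in w]
    wq = [quantize(wi, bits, scale) for wi in noisy]
    return loss_and_grad(wq, data)[1]

def interpolation_reset(w, bits, scale, alpha):
    """Component (1): replace w by alpha * w + (1 - alpha) * quantize(w),
    which shrinks each weight's distance to its grid point by a factor alpha."""
    return [alpha * wi + (1.0 - alpha) * quantize(wi, bits, scale) for wi in w]

def winq_style_qat(data, dim, bits=2, scale=0.5, lr=0.05, steps=200,
                   reset_every=50, alpha=0.5, sigma=0.05, seed=0):
    """Toy QAT loop combining both components; hyperparameters are illustrative."""
    rng = random.Random(seed)
    w = [0.0] * dim  # full-precision (latent) weights
    for step in range(1, steps + 1):
        g = noisy_ste_grad(w, data, bits, scale, sigma, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]  # SGD step on full-precision weights
        if step % reset_every == 0:
            w = interpolation_reset(w, bits, scale, alpha)
    wq = [quantize(wi, bits, scale) for wi in w]
    return w, wq, loss_and_grad(wq, data)[0]
```

For example, generating data as y = x·w_true with w_true = [0.5, -0.5, 0.0] (which lies on the 2-bit grid for scale = 0.5) and calling `winq_style_qat(data, dim=3)` returns the full-precision weights, their quantized values, and the final loss. Setting `alpha=1.0` and `sigma=0.0` disables both components and reduces the loop to plain STE-based QAT, which gives a baseline for comparison.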