SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation
Zhuguanyu Wu, Ruihao Gong, Yang Yong, Yushi Huang, Xiangyu Fan, Lei Yang, Dahua Lin, Xianglong Liu
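AI summary: To address the costly training and conservative motion dynamics of distribution matching distillation in few-step video diffusion, the paper proposes Score Gradient Matching Distillation (SGMD), which directly optimizes the fake score toward the teacher and uses a teacher stop-gradient Fisher objective as a stable target, achieving a roughly 3x training speedup and substantially improved motion dynamics.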
Comments: ICML 2026
Distribution Matching Distillation (DMD) is a widely used paradigm for accelerating inference in few-step video diffusion models. However, DMD-style video distillation faces two coupled challenges. First, the fake score must track a continuously evolving generator, which makes training costly when frequent updates are required. Second, reverse-KL-style matching can be mode-seeking and conservative, which makes it hard to preserve strong motion dynamics. To address these issues, we propose Score Gradient Matching Distillation (SGMD). SGMD adopts a fake-score perspective: it directly optimizes the fake score toward the teacher and uses a teacher stop-gradient Fisher objective as a stable distribution-matching target. We provide a gradient analysis that motivates this choice of objective under ideal tracking. Building on this, SGMD introduces a pair of dual potentials: negative-residual (NR) for outer-loop correction and residual-contraction (RC) for inner-loop tracking. Empirically, compared to DMD2, SGMD achieves an approximately $3\times$ training speedup and substantially improves motion dynamics for 4-step distilled models while preserving temporal consistency. A human study confirms that SGMD is preferred on motion quality and overall, while visual quality and text alignment remain comparable. Code is available at https://github.com/ModelTC/LightX2V.
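For orientation, the two quantities contrasted above can be written in their standard forms. This is background only, a sketch under assumed notation, and not SGMD's actual objective, which the abstract does not spell out. The symbols are assumptions introduced here: $G_\theta$ is the few-step generator, $z$ its input noise, $x_t$ a noised generator sample at diffusion time $t$, $w_t$ a time-dependent weight, $s_{\text{real}}$ the teacher score, and $s_{\text{fake}}$ the score model trained on generator samples.

```latex
% Background: the usual DMD generator gradient, obtained from the reverse KL
% between the generator distribution and the teacher distribution.
\nabla_\theta D_{\mathrm{KL}}\big(p_{\text{fake}} \,\|\, p_{\text{real}}\big)
  \approx \mathbb{E}_{z,\,t,\,x_t}\Big[
    w_t \,\big(s_{\text{fake}}(x_t, t) - s_{\text{real}}(x_t, t)\big)\,
    \frac{\partial G_\theta(z)}{\partial \theta}
  \Big]

% Background: the generic Fisher divergence between two densities p and q,
% which compares their scores in squared norm.
D_{\mathrm{F}}\big(p \,\|\, q\big)
  = \mathbb{E}_{x \sim p}\Big[
    \big\| \nabla_x \log p(x) - \nabla_x \log q(x) \big\|_2^2
  \Big]
```

In the reverse-KL form the score difference enters the generator update linearly and depends on $s_{\text{fake}}$ staying close to the current generator, which is the tracking cost described above. A Fisher-type objective instead penalizes the squared score mismatch directly. How SGMD places the stop-gradient and defines the NR and RC potentials is given in the paper itself.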