Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
Shangwen Zhu, Han Zhang, Zhantao Yang, Qianyu Peng, Zhao Pu, Huangji Wang, Fan Cheng
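AI summary: Proposes LTC-Accel, a training-free acceleration method that estimates the current transition operator from the statistical relationship between the transition operators of adjacent steps. It accelerates diffusion sampling for text-to-image and text-to-video generation and is compatible with a wide range of network structures and existing acceleration techniques.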
Text-based diffusion models have made significant breakthroughs in generating high-quality images and videos from textual descriptions. However, the long sampling time of the denoising process remains a major bottleneck in practical applications. Previous methods either ignore the statistical relationships between adjacent steps or rely on attention or feature similarity between them, which often works only with specific network structures. To address this issue, we identify a new statistical relationship between the transition operators of adjacent steps, focusing on the relationship between the network outputs. This relationship imposes no requirements on the network structure. Based on this observation, we propose a novel training-free acceleration method called LTC-Accel, which uses the identified relationship to estimate the current transition operator from those of adjacent steps. Because it makes no specific assumptions about the network structure, LTC-Accel is applicable to almost all diffusion-based methods and orthogonal to almost all existing acceleration techniques, making it easy to combine with them. Experimental results demonstrate that LTC-Accel significantly speeds up sampling in text-to-image and text-to-video synthesis while maintaining competitive sample quality. Specifically, LTC-Accel achieves a 1.67-fold speedup on Stable Diffusion v2 and a 1.55-fold speedup on video generation models. When combined with distillation models, LTC-Accel achieves a 10-fold speedup in video generation, enabling real-time generation at more than 16 FPS.
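To make the idea of estimating a transition from an adjacent step concrete, here is a minimal toy sketch. It is not the LTC-Accel estimator itself, which the abstract does not specify. It assumes a sampler whose update is x <- x + delta and, on selected steps, replaces the network call with a scaled copy of the previous delta. The names `sample_with_reuse`, `toy_transition`, `skip_every`, and `weight` are all hypothetical, and the linear toy "denoiser" stands in for a real network.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def sample_with_reuse(
    x: Vector,
    num_steps: int,
    transition: Callable[[Vector, int], Vector],
    skip_every: int = 0,
    weight: float = 1.0,
) -> Tuple[Vector, int]:
    """Run num_steps updates x <- x + delta, reusing the previous delta on some steps.

    Returns the final state and the number of calls made to `transition`.
    With skip_every == 0 every step calls `transition` (the baseline sampler).
    """
    calls = 0
    prev_delta: Optional[Vector] = None
    for step in range(num_steps):
        skip = (
            skip_every > 0
            and prev_delta is not None
            and step % skip_every == skip_every - 1
        )
        if skip:
            # Assumption: the current transition is roughly a scalar multiple
            # of the adjacent one, so no network call is needed on this step.
            delta = [weight * d for d in prev_delta]
        else:
            delta = transition(x, step)
            calls += 1
        x = [xi + di for xi, di in zip(x, delta)]
        prev_delta = delta
    return x, calls


def toy_transition(x: Vector, step: int) -> Vector:
    # Hypothetical stand-in for a denoiser call: move 10% toward the origin.
    return [-0.1 * xi for xi in x]


if __name__ == "__main__":
    start = [1.0, -2.0]
    full, full_calls = sample_with_reuse(start, 10, toy_transition)
    fast, fast_calls = sample_with_reuse(
        start, 10, toy_transition, skip_every=2, weight=0.9
    )
    print(full_calls, fast_calls)
    print(full, fast)
```

In this linear toy, consecutive transitions differ by exactly a factor of 0.9, so `weight=0.9` reproduces the baseline trajectory while halving the number of calls. A real diffusion network offers no such guarantee, and how the weight and the skipped steps are chosen is precisely what a method of this kind has to work out.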