Image Generation - arXivDaily Topic

2605.08189 2026-06-18 eess.AS Version update 55%

DiffVQE: Hybrid Diffusion Voice Quality Enhancement Under Acoustic Echo and Noise

Haljan Lugo, Ernst Seidel, Pejman Mowlaee, Ziyue Zhao, Tim Fingscheidt

Topic match (other image generation): proposes a diffusion model for speech quality enhancement; not image generation.

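AI summary: Proposes DiffVQE, the first diffusion-based acoustic echo control model, which outperforms the discriminative DeepVQE model in echo and noise control performance, computational complexity, and model size.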

Comments 6 pages, 4 figures, accepted at Interspeech 2026

Abstract

Acoustic echo and background noise pose challenges for speech enhancement in hands-free systems and speakerphones. Discriminatively trained end-to-end methods represent a powerful solution for joint acoustic echo control (AEC) and denoising. However, with the advent of generative methods, diffusion-based approaches have shown remarkable performance in speech enhancement tasks. In this work, to the best of our knowledge, we provide the first (still non-causal) diffusion-based AEC model (DiffVQE) that is reproducible in terms of topology, training data, and training framework. So far, without employing diffusion, Microsoft's discriminative DeepVQE model has been shown to outperform all of the ICASSP 2023 AEC Challenge entries, achieving remarkable performance. Using data from the Interspeech 2025 URGENT Challenge for a diverse, high-quality training dataset, our DiffVQE outperforms DeepVQE both in echo and noise control performance, as well as in computational complexity and model size.
