HALO: Half-Frame-Rate Adaptive Learnable Operator for Lightweight STFT-Based Speech Enhancement
Jiadong Zhao, Dahan Wang, Yu Sun, Leyan Yang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Jing Lu
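AI summary: Proposes the HALO module, which reduces the redundancy of overlapping STFT frames through half-frame-rate processing and lowers the computational cost of lightweight models; its effectiveness is validated on the DNS3 dataset.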
Comments: Accepted by Interspeech 2026
STFT-based speech enhancement typically adopts overlapping analysis frames. While overlap is essential for stable STFT processing, it makes adjacent frames highly correlated, causing redundant computation in lightweight models. We propose the Half-frame-rate Adaptive Learnable Operator (HALO), a causal plug-in module that halves the internal frame rate without altering the STFT procedure. Broadly applicable to many lightweight models, HALO applies adaptive rate reduction before the backbone and restoration afterward, reconstructing the full-rate spectrum on the original STFT grid. Both reduction and restoration are implemented with lightweight dynamic convolutions. By halving the processed frame rate, HALO reduces backbone compute cost with no added algorithmic latency, freeing budget for channel widening. Experiments on the DNS3 dataset show consistent gains across diverse lightweight models under matched complexity, demonstrating the effectiveness of reducing overlap-induced redundancy.
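The reduce / backbone / restore pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel size of 2, the softmax-normalized per-channel taps, the linear kernel predictors `w_red` and `w_up`, the residual connection to the full-rate input, and the `toy_backbone` stand-in are all hypothetical choices made for illustration, and the weights are random rather than trained. What it does show is the shape bookkeeping (T frames in, about T/2 frames through the backbone, T frames out on the original grid) and one way to keep every full-rate output frame dependent only on current and past input frames.

```python
import numpy as np


def softmax(a, axis=0):
    # Numerically stable softmax along `axis`.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def reduce_rate(x, w_red):
    """Causal 2:1 rate reduction with a dynamic depthwise kernel of size 2.

    x: (T, C) full-rate frame features; w_red: (2C, 2C) hypothetical predictor.
    Reduced frame m mixes input frames 2m-1 and 2m (frame -1 is zero padding),
    so it depends only on input frames up to 2m.
    """
    T, C = x.shape
    M = (T + 1) // 2
    xp = np.vstack([np.zeros((1, C)), x])  # xp[t + 1] == x[t]
    z = np.empty((M, C))
    for m in range(M):
        pair = np.stack([xp[2 * m], xp[2 * m + 1]])  # frames 2m-1 and 2m
        # Per-channel tap weights are predicted from the pair itself (dynamic kernel).
        taps = softmax((w_red @ pair.reshape(-1)).reshape(2, C), axis=0)
        z[m] = (taps * pair).sum(axis=0)
    return z


def restore_rate(h, x, w_up):
    """Causal 1:2 restoration onto the original STFT frame grid.

    h: (M, C) half-rate backbone output; x: (T, C) full-rate input, used as a
    residual; w_up: (2, 2C, 2C), one hypothetical predictor per output phase.
    Full-rate frame t = 2m + p mixes h[m-1] and h[m], which depend only on
    input frames up to 2m <= t.
    """
    T, C = x.shape
    hp = np.vstack([np.zeros((1, C)), h])  # hp[m + 1] == h[m]
    y = np.empty((T, C))
    for t in range(T):
        m, p = divmod(t, 2)
        pair = np.stack([hp[m], hp[m + 1]])  # h[m-1] and h[m]
        taps = softmax((w_up[p] @ pair.reshape(-1)).reshape(2, C), axis=0)
        y[t] = (taps * pair).sum(axis=0) + x[t]  # residual: frame t sees its own input
    return y


def toy_backbone(z, w_bb):
    # Stand-in for a causal lightweight backbone (a per-frame dense layer);
    # its cost scales with the number of frames it processes.
    return np.tanh(z @ w_bb)


rng = np.random.default_rng(0)
T, C = 9, 4
x = rng.standard_normal((T, C))
w_red = 0.1 * rng.standard_normal((2 * C, 2 * C))
w_bb = rng.standard_normal((C, C)) / np.sqrt(C)
w_up = 0.1 * rng.standard_normal((2, 2 * C, 2 * C))

z = reduce_rate(x, w_red)  # (5, 4): the backbone sees 5 frames instead of 9
y = restore_rate(toy_backbone(z, w_bb), x, w_up)  # (9, 4): back on the full-rate grid
```

Because the backbone processes roughly half as many frames, its per-utterance cost roughly halves. For layers whose cost grows with the square of the channel count (dense or pointwise-convolution layers), that saving corresponds to widening the channels by about √2 ≈ 1.41 at matched complexity, which is the kind of budget reallocation the abstract refers to as channel widening.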