Massive Open-Vocabulary Keyword Spotting
Leonor Barreiros, Raul Monteiro, Afonso Mendes, Gonçalo M. Correia
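AI summary: Proposes an open-vocabulary keyword spotting system with a smaller memory footprint that handles massive databases without fine-tuning and achieves entity recall comparable to uncompressed solutions in unseen languages.
Details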
- Comments
- Accepted to Interspeech 2026
Automatic speech recognition systems have been shown to underperform when transcribing words rarely seen in the training data, namely specialized terminology. Open-vocabulary keyword spotting, combined with contextual biasing, has been shown to mitigate this issue. However, existing systems can only handle glossaries of a few hundred terms before becoming an infeasible bottleneck. We propose a system that stores features with a memory footprint up to 128 times smaller than a comparable baseline, allowing users to process massive databases while remaining open-vocabulary. Without fine-tuning the speech recognition model, our system achieves entity recall comparable to that of uncompressed solutions, even in languages not seen during training.