Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching
Yuhuan You, Yufan Qian, Tianshu Qu, Bin Wang, Xueyang Lv
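AI Summary: Proposes Flow-HOA, a generative framework that uses conditional flow matching to jointly optimize temporal, spectral, and spatial fidelity and produces a deployable FIR encoding filter bank; it outperforms strong baselines on both synthetic data and real recordings.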
Comments: Accepted for presentation at AES Europe 2026 Convention (AES 160th Convention), Copenhagen, Denmark, May 28-30, 2026
Higher-Order Ambisonics (HOA) encoding from sparse, irregular microphone arrays remains a critical challenge for consumer spatial audio capture in immersive communication and XR. We propose Flow-HOA, a generative framework that jointly optimizes a multi-dimensional objective encompassing time-domain, spectral, and spatial fidelity while producing a deployable, time-invariant bank of Finite Impulse Response (FIR) encoding filters. Using conditional flow matching, the model learns to map a simple prior distribution to the target distribution of FIR filter coefficients. Training is guided by a composite loss that balances time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Objective evaluations on synthetically simulated data demonstrate improved performance over strong model-based baselines in both signal fidelity and spatial accuracy metrics. Subjective listening tests on real microphone array recordings further confirm that Flow-HOA yields higher overall sound quality with reduced artifacts, demonstrating generalization from synthetic training data to real-world capture conditions.
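Because the abstract describes the encoder's output as a time-invariant FIR filter bank, deployment reduces to fixed multichannel convolution: each HOA channel is the sum of every microphone signal convolved with its own filter. The sketch below assumes a filter array laid out as (N, M, L), for N = (P+1)^2 HOA channels at order P, M microphones, and L taps; the function names and this layout are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def num_hoa_channels(order: int) -> int:
    """Number of HOA channels (spherical-harmonic components) up to a given order."""
    return (order + 1) ** 2

def encode_hoa(mic_signals: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Apply a fixed FIR encoding filter bank (hypothetical layout).

    mic_signals: (M, T) array, one row per microphone.
    filters:     (N, M, L) array; filters[n, m] maps microphone m into HOA channel n.
    Returns an (N, T + L - 1) array holding the full linear convolution.
    """
    n_ch, n_mics, taps = filters.shape
    if mic_signals.shape[0] != n_mics:
        raise ValueError("filter bank and signals disagree on the microphone count")
    hoa = np.zeros((n_ch, mic_signals.shape[1] + taps - 1))
    for n in range(n_ch):
        for m in range(n_mics):
            # Each HOA channel accumulates every microphone's filtered contribution.
            hoa[n] += np.convolve(mic_signals[m], filters[n, m])
    return hoa

# Usage: a 4-microphone array encoded to first order (4 channels) with 64-tap filters.
rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 48000))
bank = 0.01 * rng.standard_normal((num_hoa_channels(1), 4, 64))
hoa = encode_hoa(mics, bank)
print(hoa.shape)  # (4, 48063)
```

Since the filters do not change at run time, the double loop could be replaced by a single FFT-domain multiply per channel; the direct form is kept here for readability.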
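Conditional flow matching trains a velocity network to transport samples from a simple prior to the target distribution, here the distribution of FIR filter coefficients. The sketch assumes the widely used straight-line (rectified-flow) probability path with a standard Gaussian prior; the abstract does not say which path Flow-HOA uses. `velocity_fn` and `cond` are hypothetical stand-ins for the trained network and its conditioning input, which the abstract also does not specify.

```python
import numpy as np

def cfm_training_example(x1, rng):
    """Build one regression example for conditional flow matching.

    x1: flattened target FIR coefficients, shape (D,).
    Returns (x_t, t, u_t): a point on the straight path from a prior sample
    to x1, its time t, and the velocity the network should predict there.
    """
    x0 = rng.standard_normal(x1.shape)  # sample from the simple Gaussian prior
    t = rng.uniform()                   # t ~ U[0, 1)
    x_t = (1.0 - t) * x0 + t * x1       # linear interpolation between prior and data
    u_t = x1 - x0                       # constant velocity of that straight path
    return x_t, t, u_t

def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - u_t) ** 2))

def sample_filters(velocity_fn, cond, dim, steps, rng):
    """Generate coefficients by Euler-integrating dx/dt = v(x, t, cond) over [0, 1]."""
    x = rng.standard_normal(dim)        # start from the prior at t = 0
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x  # flattened; would be reshaped into the filter-bank layout

# Toy check: if the data distribution is a single filter vector x_star, the exact
# marginal velocity is (x_star - x) / (1 - t), and Euler sampling recovers x_star.
rng = np.random.default_rng(0)
x_star = rng.standard_normal(128)
exact_v = lambda x, t, cond: (x_star - x) / (1.0 - t)
x_hat = sample_filters(exact_v, cond=None, dim=128, steps=10, rng=rng)
print(np.allclose(x_hat, x_star))  # True
```

During training, a network would be fitted by minimizing `cfm_loss(velocity_fn(x_t, t, cond), u_t)` over many such examples; at inference only `sample_filters` runs, once, to produce the fixed filter bank.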
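The composite training objective can be sketched as a weighted sum of the four terms the abstract names. Every concrete choice below is an assumption for illustration rather than the paper's formulation: an L1 waveform term, a multi-resolution STFT loss built from spectral convergence and log-magnitude distance, per-band energy ratios over linearly spaced bands, a directivity term comparing the beam patterns implied by the filters' frequency responses `H` (for example, their DFT) against target spherical-harmonic values at sampled directions, and the placeholder weights `w`.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Hann-windowed magnitude STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    x = np.pad(x, (0, max(0, n_fft - len(x))))  # guarantee at least one full frame
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def multires_spectral_loss(y, ref, resolutions=((256, 64), (512, 128), (1024, 256)), eps=1e-7):
    """Spectral convergence plus log-magnitude distance, averaged over STFT resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        s, r = stft_mag(y, n_fft, hop), stft_mag(ref, n_fft, hop)
        total += np.linalg.norm(r - s) / (np.linalg.norm(r) + eps)
        total += np.mean(np.abs(np.log(s + eps) - np.log(r + eps)))
    return total / len(resolutions)

def subband_energy_loss(y, ref, n_bands=8, eps=1e-12):
    """Mean absolute per-band energy ratio in dB over linearly spaced frequency bands."""
    py, pr = np.abs(np.fft.rfft(y)) ** 2, np.abs(np.fft.rfft(ref)) ** 2
    edges = np.linspace(0, len(py), n_bands + 1).astype(int)
    ey = np.array([py[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
    er = np.array([pr[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
    return np.mean(np.abs(10.0 * np.log10((ey + eps) / (er + eps))))

def directivity_loss(H, steering, sh_targets):
    """Squared error between encoded beam patterns and target spherical harmonics.

    H: (N, M, F) filter frequency responses; steering: (Q, M, F) array transfer
    functions for Q sampled directions; sh_targets: (N, Q) target SH values.
    """
    patterns = np.einsum("nmf,qmf->nqf", H, steering)  # channel n's response to direction q
    return np.mean(np.abs(patterns - sh_targets[:, :, None]) ** 2)

def composite_loss(y, ref, H, steering, sh_targets, w=(1.0, 1.0, 0.5, 0.5)):
    """Weighted sum of waveform, spectral, sub-band energy and directivity terms.

    y, ref: (N, T) encoded and reference HOA signals. The weights w are placeholders.
    """
    time_term = np.mean(np.abs(y - ref))
    spec_term = np.mean([multires_spectral_loss(a, b) for a, b in zip(y, ref)])
    band_term = np.mean([subband_energy_loss(a, b) for a, b in zip(y, ref)])
    spat_term = directivity_loss(H, steering, sh_targets)
    return float(w[0] * time_term + w[1] * spec_term + w[2] * band_term + w[3] * spat_term)

# Usage on toy data: a noisy estimate of a single-channel reference.
rng = np.random.default_rng(0)
ref = rng.standard_normal((1, 4096))
est = ref + 0.1 * rng.standard_normal((1, 4096))
H, steer, targets = np.ones((1, 1, 5)), np.ones((2, 1, 5)), np.ones((1, 2))
print(composite_loss(ref, ref, H, steer, targets))  # 0.0
print(composite_loss(est, ref, H, steer, targets) > 0)  # True
```

Practical systems often use octave or ERB-spaced bands for the energy term; linear bands keep the sketch short.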