FoleyGenEx: Unified Video-to-Audio Generation with Multi-Modal Control, Temporal Alignment, and Semantic Precision
Affiliations Academy for Advanced Interdisciplinary Studies, Nankai University; Kling Team, Kuaishou Technology
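AI summary Proposes FoleyGenEx, a unified framework that uses condition injection, multi-modal dynamic masking, and adverb-based data augmentation to jointly achieve multi-modal control, frame-level temporal alignment, and fine-grained semantics in video-to-audio generation.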
Comments Accepted by INTERSPEECH 2026
Journal ref INTERSPEECH 2026