2603.16606
2026-06-19
cs.CL
Version update
Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech
Omnilingual SONAR Team, João Maria Janeiro, Pere-Lluís Huguet Cabot, Ioannis Tsiamas, Yen Meng, Vivek Iyer, Guillem Ramírez, Loic Barrault, Belen Alastruey, Xiang "Tony" Cao, Yu-An Chung, Marta R. Costa-Jussa, David Dale, Kevin Heffernan, Jaehyeong Jo, Artyom Kozhevnikov, Alexandre Mourachko, Christophe Ropers, Holger Schwenk, Paul-Ambroise Duquenne
Affiliation
FAIR at Meta
AI summary
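Presents OmniSONAR, a model that uses progressive training and teacher-student distillation to produce unified semantic embeddings for text, speech, code, and mathematical expressions across thousands of languages. It substantially reduces error rates on cross-lingual retrieval and translation tasks and supports zero-shot speech translation.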
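The summary reports lower error rates on cross-lingual retrieval. As a rough illustration only, and assuming a simple nearest-neighbour (xsim-style) error rate with plain cosine scoring (the exact metric, any margin-based scoring, and the benchmarks used by the authors are not stated here), retrieval error can be computed as the fraction of source sentences whose closest target embedding is not their aligned translation. The helper names `cosine` and `retrieval_error_rate` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieval_error_rate(src: List[Sequence[float]], tgt: List[Sequence[float]]) -> float:
    """Hypothetical xsim-style metric: fraction of source sentences whose
    nearest target (by cosine) is not the aligned one.

    Assumes src[i] and tgt[i] embed translations of the same sentence.
    """
    if not src or len(src) != len(tgt):
        raise ValueError("src and tgt must be non-empty and aligned one-to-one")
    errors = 0
    for i, s in enumerate(src):
        scores = [cosine(s, t) for t in tgt]
        # Index of the highest-scoring target (first one wins on ties).
        best = max(range(len(tgt)), key=lambda j: scores[j])
        if best != i:
            errors += 1
    return errors / len(src)


# Toy usage with made-up 2-d "embeddings".
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
print(retrieval_error_rate(src, tgt))  # → 0.0

# Swapping the first two targets breaks two of the three alignments.
swapped = [tgt[1], tgt[0], tgt[2]]
print(retrieval_error_rate(src, swapped))
```

In this sketch a lower value means the embedding space places translations closer to each other than to unrelated sentences, which is the property a shared cross-lingual space is meant to have.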