2606.01802
2026-06-08
cs.SD
cs.AI
Version update
MOSS-Audio Technical Report
Chen Yang, Chufan Yu, Hanfu Chen, Jie Zhu, Jingqi Chen, Ke Chen, Wenxuan Wang, Yang Wang, Yaozhou Jiang, Yi Jiang, Zhengyuan Lin, Ziqi Chen, Zhaoye Fei, Chenghao Liu, Donghua Yu, Jun Zhan, Kang Yu, Kexin Huang, Liwei Fan, Mingshu Chen, Qinyuan Cheng, Ruixiao Li, Shimin Li, Songlin Wang, Xingjian Zhao, Yang Gao, Yitian Gong, Yiyang Zhang, Zhe Xu, Xipeng Qiu
Affiliation
OpenMOSS Team
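AI summary
Proposes MOSS-Audio, a unified audio-language model that uses DeepStack cross-layer feature injection and time markers to understand speech, environmental sounds, and music. It achieves strong performance on audio captioning, time-aware question answering, timestamped transcription, and audio reasoning tasks.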