2605.20266
2026-05-21
cs.SD
Version update
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
Kaiwen Luo, Zhenhong Zhou, Leo Wang, Liang Lin, Yang Xiao, Tianyu Shao, Yuanhe Zhang, Yuxuan Li, Miao Yu, Kailin Lyu, Jiaming Zhang, Dongrui Liu, Li Sun, Yueming Wu, Kai Li, Ting Dang, Xiaojun Jia, Rohan Kumar Das, Xinfeng Li, Siyuan Liang, Qiufeng Wang, Xingjun Ma, Jing Chen, Kun Wang, Junhao Dong, Deqing Zou, Yu Cheng, Xia Hu, Zhigang Zeng, Sen Su, Yang Liu, Yu-Gang Jiang, Philip S. Yu, Yew-Soon Ong
Affiliations
Nanyang Technological University
Independent Researcher
The University of Melbourne
North China Electric Power University
Beijing University of Posts and Telecommunications
University of Chinese Academy of Sciences
University of Science and Technology of China
Institute of Automation, Chinese Academy of Sciences
Shanghai AI Laboratory
Huazhong University of Science and Technology
Tsinghua University
Fortemedia Singapore
Tencent
Fudan University
Wuhan University
Chinese University of Hong Kong
Chongqing University of Posts and Telecommunications
University of Illinois Chicago
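AI Summary
This paper surveys the generalization, trustworthiness, and future directions of large audio language models. It examines their architectural innovations, alignment algorithms, and safety risks, and proposes strategies such as defense in depth and causal audio world modeling to make audio intelligence more trustworthy.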