2605.28480
2026-05-28
eess.AS
cs.SD
Version update
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Yucheng Wang, Jing Peng, Hanqi Li, Chenghao Wang, Wenming Tu, Yu Xi, Zhaokai Sun, Kai Yu, Shuai Wang
Affiliations
*
School of Intelligence Science and Technology, Nanjing University, China;
Department of Computer Science, ETH Zürich, Switzerland;
X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, China;
School of Automation Science and Engineering, Xi’an Jiaotong University, China;
School of Computer Science, Northwestern Polytechnical University, China
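AI summary
Proposes the Audio-Mind framework, which uses conditional evidence acquisition to dynamically combine a strong front end with planner-guided tool use, addressing the question of when an agent should acquire evidence in audio understanding. It reaches 80.4% accuracy on MMAR and 82.8% on MSU-Bench, and produces auditable reasoning traces.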