2606.13608
2026-06-12
cs.AI
cs.LG
New submission
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility
Xiaoyuan Liu, Jianhong Tu, Yuqi Chen, Siyuan Xie, Sihan Ren, Tianneng Shi, Gal Gantar, Evan Sandoval, Donghyun Lee, Daniel Miao, Peter J. Gilbert, Nick Hynes, Mauro Staver, Warren He, David Marn, Andrew Low, Xi Zhang, Elron Bandel, Michal Shmueli-Scheuer, Siva Reddy, Alexandre Drouin, Alexandre Lacoste, Ramayya Krishnan, Elham Tabassi, Yu Su, Victor Barres, Chenguang Wang, Wenbo Guo, Dawn Song
Affiliations
University of California, Berkeley
Purdue University
University of Ljubljana
University of Washington
Oasis Labs
University of Maryland
IBM Research
Mila
McGill University
ServiceNow Research
Carnegie Mellon University
National Institute of Standards and Technology
The Ohio State University
University of Cambridge
University of California, Santa Barbara
AI summary
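The paper proposes the Agentified Agent Assessment (AAA) framework, which unifies evaluation interfaces through standardized protocols (A2A and MCP) to enable open, reproducible multi-agent evaluation. Building on the AgentBeats system, it validates the framework's coverage, practicality, and fidelity through a large-scale competition and case studies.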