2605.08678
2026-05-28
cs.LG
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
Bohan Lyu, Yucheng Yang, Siqiao Huang, Jiaru Zhang, Qixin Xu, Xinghan Li, Xinyang Han, Yicheng Zhang, Huaqing Zhang, Runhan Huang, Kaicheng Yang, Zitao Chen, Wentao Guo, Junlin Yang, Xinyue Ai, Wenhao Chai, Yadi Cao, Ziran Yang, Kun Wang, Dapeng Jiang, Huan-ang Gao, Shange Tang, Chengshuai Shi, Simon S. Du, Max Simchowitz, Jiantao Jiao, Dawn Song, Chi Jin
Affiliations
UC Berkeley; Princeton University; Tsinghua University; University of Washington; Purdue University; Harvard University; University of Pennsylvania; Shanghai Jiao Tong University; UC San Diego; Carnegie Mellon University
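AI Summary
Proposes MLS-Bench, a benchmark of 140 tasks across 12 domains that evaluates whether AI systems can invent generalizable and scalable machine learning methods. It finds that current agents still fall far short of humans at method invention, and that the bottleneck is scientific insight rather than search or compute alone.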