2605.08678
2026-05-28
cs.LG
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
Bohan Lyu, Yucheng Yang, Siqiao Huang, Jiaru Zhang, Qixin Xu, Xinghan Li, Xinyang Han, Yicheng Zhang, Huaqing Zhang, Runhan Huang, Kaicheng Yang, Zitao Chen, Wentao Guo, Junlin Yang, Xinyue Ai, Wenhao Chai, Yadi Cao, Ziran Yang, Kun Wang, Dapeng Jiang, Huan-ang Gao, Shange Tang, Chengshuai Shi, Simon S. Du, Max Simchowitz, Jiantao Jiao, Dawn Song, Chi Jin
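AI Summary
Introduces MLS-Bench, a benchmark of 140 tasks across 12 domains that evaluates whether AI systems can invent general and scalable machine learning methods. It finds that current agents still fall far short of humans at method invention, and that the bottleneck is scientific insight rather than search or compute alone.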