2606.24855
2026-06-24
cs.AI
New submission
OpenThoughts-Agent: Data Recipes for Agentic Models
Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt
Affiliations
UC Berkeley
Stanford University
JSC (Jülich Supercomputing Centre)
LAION
University of Texas at Austin
Bespoke Labs
Laude Institute
UCLA
Harvard University & Harvard Medical School
TU Munich & Munich Center for Machine Learning
New York University
Medical University of South Carolina
The LLM Data Company
BenchFlow
Independent Researcher
Northeastern University
University of Wisconsin–Madison
University of Washington
TU Darmstadt
University of Southern California
UC San Diego
Amazon
Microsoft
Korea University
Cornell Tech
University of Michigan
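AI Summary
Proposes a fully open data curation pipeline and uses more than 100 ablation experiments to study how task sources and task diversity affect agentic training. The resulting 100K-sample training set yields an average accuracy of 44.8% across 7 agentic benchmarks, 3.9 percentage points higher than the strongest open-source model.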