Tongyi DeepResearch Technical Report
Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang, Zile Qiao, Chenxi Wang, Donglei Yu, Gang Fu, Haiyang Shen, Jiayin Yang, Jun Lin, Junkai Zhang, Kui Zeng, Li Yang, Hailong Yin, Maojia Song, Ming Yan, Minpeng Liao, Peng Xia, Qian Xiao, Rui Min, Ruixue Ding, Runnan Fang, Shaowei Chen, Shen Huang, Shihang Wang, Shihao Cai, Weizhou Shen, Xiaobin Wang, Xin Guan, Xinyu Geng, Yingcheng Shi, Yuning Wu, Zhuo Chen, Zijian Li, Yong Jiang
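AI Summary: This paper introduces an agentic large language model designed for long-horizon, deep information-seeking tasks. An end-to-end training framework that combines agentic mid-training and agentic post-training enables scalable reasoning and information seeking on complex tasks. A highly scalable data synthesis pipeline automates training without costly human annotation, and the model achieves state-of-the-art performance on multiple deep research benchmarks.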
Comments https://tongyi-agent.github.io/blog
Details
We present Tongyi DeepResearch, an agentic large language model specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, does not rely on costly human annotation, and supports all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. With 30.5 billion total parameters, of which only 3.3 billion are activated per token, Tongyi DeepResearch achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES, and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.