2606.10479
2026-06-10
cs.AI
New submission
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
Shunkai Zhang, Haoran Zhang, Yun Luo, Qianjia Cheng, Haodi Lei, Yizhuo Li, Runzhe Zhan, Zhilin Wang, Bangjie Xu, Yucheng Su, Xinmiao Han, Xiaoye Qu, Dongrui Liu, Zhouchen Lin, Yu Qiao, Ning Ding, Yafu Li, Yu Cheng
Affiliations
Shanghai AI Laboratory; Peking University; Shanghai Jiao Tong University; Tsinghua University; The Chinese University of Hong Kong
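AI summary
Proposes the ComBench benchmark, which contains 100 olympiad-level combinatorics problems divided into two categories, analytical and constructive. Large models' reasoning ability is evaluated through grading and verification. The strongest model reaches an accuracy of only 65.4%, and proof-reasoning ability differs from constructive-realization ability.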