Learning to Reason with Insight for Informal Theorem Proving
学习在非形式定理证明中进行洞察推理
Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song
AI总结 针对非形式定理证明中缺乏洞察(识别核心技巧)的瓶颈,提出统一训练框架DeepInsight,通过分层数据集、渐进式多阶段SFT和基于洞察的策略优化方法,显著提升大语言模型的数学推理能力。
详情
尽管大多数自动定理证明方法依赖于形式证明系统,但非形式定理证明能更好地发挥大语言模型(LLMs)在自然语言处理方面的优势。在这项工作中,我们识别出非形式定理证明的一个主要瓶颈是缺乏洞察,即难以识别解决复杂问题所需的核心技巧。为了解决这个问题,我们提出了$ exttt{DeepInsight}$,一个统一的训练框架,旨在培养这种基本的推理技能,并使LLMs能够进行洞察推理。我们的框架由三个部分组成:(1)$ exttt{DeepInsightTheorem}$,一个分层数据集,通过显式提取核心技巧和证明草图以及最终证明来结构化非形式证明;(2)渐进式多阶段SFT策略,模拟人类学习过程,教授模型证明写作、规划和洞察识别;(3)$ exttt{InsightPO}$,一种策略优化方法,在此洞察层次结构上分配结构化奖励。我们在具有挑战性的数学基准上的实验表明,这种洞察感知的生成策略显著优于基线。这些结果表明,教模型识别和应用核心技巧可以大幅提高其数学推理能力。
Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a primary bottleneck in informal theorem proving as a lack of insight, namely the difficulty of recognizing the core techniques required to solve complex problems. To address this, we propose $\texttt{DeepInsight}$, a unified training framework designed to cultivate this essential reasoning skill and enable LLMs to perform insightful reasoning. Our framework consists of three components: (1) $\texttt{DeepInsightTheorem}$, a hierarchical dataset that structures informal proofs by explicitly extracting core techniques and proof sketches alongside the final proof; (2) a Progressive Multi-Stage SFT strategy that mimics the human learning process, teaching the model proof writing, planning, and insight identification; and (3) $\texttt{InsightPO}$, a policy optimization method that assigns structured rewards over this insight hierarchy. Our experiments on challenging mathematical benchmarks demonstrate that this insight-aware generation strategy significantly outperforms baselines. These results demonstrate that teaching models to identify and apply core techniques can substantially improve their mathematical reasoning.