Operads for compositional reasoning in LLMs
Nathaniel Bottman, Kyle Richardson
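AI summary: Proposes operads as a mathematical framework for question decomposition, defines the questions operad $Q$, interprets QA models as algebras over $Q$, and introduces an operadic consistency measure that experiments show to be strongly correlated with accuracy.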
Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical foundation. In this paper, we propose operads, mathematical structures that model many-in, one-out operations and compositions thereof, as a natural framework for describing question decomposition. We define the questions operad $Q$, in which operations correspond to question templates and composition corresponds to substitution of sub-answers, and show how QA models can be interpreted as algebras over $Q$. Beyond reframing existing practice, this operadic perspective points toward new methods, in particular a notion of operadic consistency, which measures whether a QA model's answers agree across the partial collapses of a question decomposition tree. Empirical evaluation of operadic consistency is reported in our companion paper (Bottman, Liu, and Richardson, 2026), which finds it strongly correlated with accuracy across twelve LLMs and four multi-hop QA datasets and outperforming standard temperature-based self-consistency baselines. We argue that operads are the natural mathematical home for question decomposition, and that invariants such as operadic consistency open new directions for analyzing and improving the reliability of multi-step reasoning.
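The ideas in the abstract can be illustrated with a small, hypothetical Python sketch. It is one possible reading of the abstract, not the paper's construction. The class `Op`, the `phrase` field (a noun-phrase form used when a sub-question is inlined unanswered), the dictionary-backed `toy_qa` model, and the majority-agreement score are all assumptions made here for illustration. An operation is a question template with numbered slots. Evaluating an edge means answering the sub-question and substituting its answer. Collapsing an edge means composing the two templates into one longer question before asking the model.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an operation: a template with slots {0}, {1}, ..."""
    template: str                      # question form, e.g. "Where was {0} born?"
    phrase: str                        # noun-phrase form used when inlined unanswered
    children: Tuple["Op", ...] = ()    # sub-questions feeding the slots


def count_edges(op: Op) -> int:
    """Number of parent-child edges in the decomposition tree."""
    return sum(1 + count_edges(child) for child in op.children)


def render(op: Op, qa: Callable[[str], str], bits: Iterator[bool], as_phrase: bool) -> str:
    """Fill each slot with the child's phrase (collapsed edge) or its answer (evaluated edge)."""
    fills = []
    for child in op.children:
        collapsed = next(bits)  # one flag per edge, consumed in preorder
        if collapsed:
            fills.append(render(child, qa, bits, as_phrase=True))
        else:
            fills.append(qa(render(child, qa, bits, as_phrase=False)))
    text = op.phrase if as_phrase else op.template
    return text.format(*fills)


def answers_over_collapses(root: Op, qa: Callable[[str], str]) -> Dict[Tuple[bool, ...], str]:
    """Final answer for every choice of which edges to collapse."""
    results = {}
    for pattern in product([False, True], repeat=count_edges(root)):
        question = render(root, qa, iter(pattern), as_phrase=False)
        results[pattern] = qa(question)
    return results


def operadic_consistency(root: Op, qa: Callable[[str], str]) -> float:
    """Assumed score: fraction of collapse patterns that agree with the most common answer."""
    answers = list(answers_over_collapses(root, qa).values())
    top = max(set(answers), key=answers.count)
    return answers.count(top) / len(answers)


# Toy two-hop example with a lookup table standing in for a QA model.
director = Op("Who directed Inception?", "the director of Inception")
root = Op("Where was {0} born?", "the birthplace of {0}", (director,))

TOY_ANSWERS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
    "Where was the director of Inception born?": "London",
}


def toy_qa(question: str) -> str:
    return TOY_ANSWERS.get(question, "unknown")


if __name__ == "__main__":
    print(operadic_consistency(root, toy_qa))
```

In this toy setting the stand-in model answers the decomposed question and the collapsed question identically, so every collapse pattern agrees. A model that answered the collapsed question differently would receive a lower score under this assumed measure. The number of collapse patterns grows exponentially with the number of edges, so this exhaustive enumeration is only practical for small trees.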