Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
Benhao Huang, Zhengyang Geng, Zico Kolter
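AI summary: This paper proposes Equilibrium Reasoners (EqR), which achieve scalable reasoning by learning task-conditioned attractors. The method requires no external verifiers or task-specific priors at test time and improves reasoning by scaling along depth and breadth, raising accuracy on Sudoku-Extreme from 2.6% to over 99%.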
Details
- Comments: ICML 2026
Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
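The two scaling axes described in the abstract can be illustrated with a toy fixed-point iteration. This is only a minimal sketch under stated assumptions, not the EqR model: the update rule `step` is a hand-written numeric map (its stable fixed point is the square root of the input) standing in for a learned network, the residual-based stopping rule and majority-vote aggregation are assumed choices that the abstract does not specify, and all function names and parameters are hypothetical.

```python
import random
from collections import Counter


def step(z, x):
    # Hypothetical update rule standing in for a learned network:
    # for x > 0 and z > 0, its stable fixed point is sqrt(x).
    return 0.5 * (z + x / z)


def run_trajectory(x, z0, max_steps=1000, tol=1e-9):
    # Depth axis: keep iterating until the state stops changing
    # (residual below tol) or the step budget is exhausted.
    z = z0
    for k in range(1, max_steps + 1):
        z_next = step(z, x)
        if abs(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_steps


def solve(x, n_inits=8, seed=0):
    # Breadth axis: run several random initializations and aggregate.
    # Majority vote over rounded outputs is an assumed aggregation rule.
    rng = random.Random(seed)
    results = [run_trajectory(x, rng.uniform(0.1, 10.0)) for _ in range(n_inits)]
    votes = Counter(round(z, 6) for z, _ in results)
    answer, _ = votes.most_common(1)[0]
    steps_used = [k for _, k in results]
    return answer, steps_used


if __name__ == "__main__":
    answer, steps_used = solve(9.0)
    print("answer:", answer)
    print("steps per trajectory:", steps_used)
```

In this toy setting, an initialization close to the fixed point stops after fewer steps than a distant one, which mirrors the adaptive allocation of compute the abstract describes: the stopping rule spends iterations only where the state has not yet settled.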