ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
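AI Summary: Proposes ORLoopBench, a benchmark suite that formalizes infeasible-model repair as a solver-in-the-loop Markov decision process. Using irreducible infeasible subsystem (IIS) feedback together with reinforcement learning with verifiable rewards (RLVR) training, an 8B model outperforms frontier APIs on LP repair (95.3% vs. 92.4% RR@5). The work also reveals a semantic drift problem in full-model code regeneration.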
Comments: 58 pages, accepted by ICML 2026