R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search
João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas
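AI summary: Proposes R-APS, which combines reasoning-mode decomposition, staged compositional reasoning, sensitivity-guided adversarial testing, and meta-inductive rule extraction to jointly address three failures of LLMs in agentic settings: error propagation, unevaluated worst-case perturbations, and accumulated knowledge that is never invalidated. On planar mechanism synthesis tasks it achieves tighter robustness certificates and faster iteration.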
Large language models (LLMs) are fluent on open-ended tasks, yet in agentic settings, where a system must plan, use tools, and act over extended horizons, fluency does not ensure reliable delivery. We trace this gap to three coupled structural failures: errors propagate without localization, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated. We argue these share a root cause: abductive, counterfactual, meta-inductive, corrective, and inductive reasoning pull a shared context in incompatible directions. We introduce Reflective Adversarial Pareto Search (R-APS), to our knowledge the first method addressing all three failures jointly via reasoning-mode decomposition, allocating each reasoning mode its own context and orchestrating interaction across three timescales: staged compositional reasoning with a typed validation critic (failure localization), sensitivity-guided counterfactual stress-testing as a first-class Pareto objective (robustness), and meta-inductive rule extraction with explicit invalidation (persistent memory). R-APS requires no fine-tuning and operates on a frozen LLM purely via structured protocol design. We evaluate on planar mechanism synthesis (robotics, prosthetics, mechanical design), with every candidate checked by a kinematic solver. On 32 target trajectories, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and 2.1x Chamfer-distance reduction over Enum+GA while jointly controlling bar-count and worst-case robustness. Small 4B reasoning-specialized models prove competitive with general-purpose 70B backbones inside the protocol, suggesting structured protocols can partially offset model scale.
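To make the evaluation vocabulary concrete, the following is a minimal sketch of two ingredients the abstract names: a Chamfer distance between a target trajectory and a traced one, and a Pareto (non-dominated) filter over several objectives. It is not the authors' implementation. The function names, the choice of the symmetric mean nearest-neighbor form of the Chamfer distance, and the objective tuple `(chamfer, bar_count, worst_case_error)` with all entries minimized are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Objectives = Tuple[float, ...]  # hypothetical: (chamfer, bar_count, worst_case_error)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed convention): mean nearest-neighbor
    Euclidean distance from a to b, plus the same from b to a."""
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def dominates(u: Objectives, v: Objectives) -> bool:
    """u dominates v if it is no worse in every (minimized) objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def pareto_front(candidates: Sequence[Objectives]) -> List[Objectives]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Toy data, invented for illustration only.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traced = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(chamfer_distance(target, traced))  # 2.0

scores = [(0.10, 6, 0.30), (0.12, 4, 0.25), (0.15, 6, 0.40), (0.10, 4, 0.50)]
print(pareto_front(scores))  # drops (0.15, 6, 0.40), which (0.10, 6, 0.30) dominates
```

Treating worst-case error as its own objective, rather than folding it into a weighted sum, keeps a candidate that tracks the target slightly worse but degrades less under perturbation on the front, which is the trade-off the abstract describes as jointly controlling bar count and worst-case robustness.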