LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
Yuchun Fan, Bei Li, Peiguang Li, Yilin Wang, Yongyu Mu, Jian Yang, Xin Chen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Jingbo Zhu, Tong Xiao
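AI summary: This paper proposes the LANG framework, which uses language-conditioned hints to guide exploration in non-English reasoning tasks. It addresses the trade-off between input-language consistency and reasoning quality that reinforcement learning faces in multilingual settings, improving reasoning performance without compromising language consistency.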
- Comments
- Accepted to ACL 2026 (main conference)
Reinforcement learning has proven effective for enhancing multi-step reasoning in large language models (LLMs), yet its benefits have not fully translated to multilingual contexts. Existing methods struggle with a fundamental trade-off: prioritizing input-language consistency severely hampers reasoning quality, while prioritizing reasoning often leads to unintended language drift toward English. We address this challenge with LANG, a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks. Our method incorporates two key mechanisms to prevent dependency on these hints: a progressive decay schedule that gradually withdraws scaffolding, and a language-adaptive switch that tailors learning horizons to specific language difficulties. Empirical results on challenging multilingual mathematical benchmarks reveal that LANG substantially enhances reasoning performance without compromising language consistency. Moreover, we show that our framework generalizes beyond mathematics, fostering more consistent language alignment across model layers.