LsrIF: Enhancing Logic-Structured Instruction Following of Large Language Models
Qingyu Ren, Qianyu He, Jingwen Chang, Geng Zhang, Jiajie Zhu, Xingzhou Chen, Zhuofei Shi, Jiaqing Liang, Yanghua Xiao, Han Xia, Zeye Sun, Fei Yu
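AI summary: Proposes the LsrIF framework, which builds training data from atomic constraints arranged in parallel, sequential, conditional, and nested structures, and uses structure-aware reward aggregation to improve large language models on logic-structured instruction-following tasks.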
Instruction following is critical for large language models, yet real-world instructions often involve multiple constraints with logical structures, such as parallel composition, sequential dependencies, and conditional branching. Existing methods typically construct data by simply combining constraints and aggregate rewards by averaging individual constraint scores during training, overlooking logical dependencies and introducing noisy signals. We propose LsrIF, a training framework for logic-structured instruction following. LsrIF constructs data by organizing atomic constraints into parallel, sequential, conditional, and nested structures, and applies structure-aware reward aggregation aligned with their execution semantics: averaging rewards for parallel constraints, decaying later rewards after early failures in sequential structures, and rewarding only active branches in conditional structures. Experiments show that LsrIF improves instruction following in both in-domain and out-of-domain settings while also benefiting logical reasoning. Further analysis indicates that logic-structured training increases attention to constraint-related tokens and logical connectors, suggesting improved modeling of instruction logic. We will release our data and code for future research.
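The three aggregation rules named in the abstract can be illustrated with a small recursive sketch. This is not the authors' implementation: the abstract gives no formulas, so the node classes (`Atomic`, `Parallel`, `Sequential`, `Conditional`), the `aggregate` function, the multiplicative `decay` factor, and the `fail_below` threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Atomic:
    score: float  # per-constraint score in [0, 1], e.g. from a verifier


@dataclass
class Parallel:
    children: List["Node"]  # assumed non-empty


@dataclass
class Sequential:
    steps: List["Node"]  # assumed non-empty, in execution order


@dataclass
class Conditional:
    condition_met: bool
    then_branch: "Node"
    else_branch: "Node"


Node = Union[Atomic, Parallel, Sequential, Conditional]


def aggregate(node: Node, decay: float = 0.5, fail_below: float = 1.0) -> float:
    """Combine constraint scores according to the node's logical structure.

    `decay` and `fail_below` are assumed hyperparameters, not values
    taken from the paper.
    """
    if isinstance(node, Atomic):
        return node.score
    if isinstance(node, Parallel):
        # Parallel constraints: plain average of the child rewards.
        rewards = [aggregate(c, decay, fail_below) for c in node.children]
        return sum(rewards) / len(rewards)
    if isinstance(node, Sequential):
        # Sequential steps: each earlier failure shrinks the weight
        # applied to every later step.
        weight, total = 1.0, 0.0
        for step in node.steps:
            reward = aggregate(step, decay, fail_below)
            total += weight * reward
            if reward < fail_below:
                weight *= decay
        return total / len(node.steps)
    if isinstance(node, Conditional):
        # Conditional: only the branch that is actually active is scored.
        active = node.then_branch if node.condition_met else node.else_branch
        return aggregate(active, decay, fail_below)
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# An early failure costs more than a late one under the sequential rule.
early_fail = Sequential([Atomic(0.0), Atomic(1.0)])
late_fail = Sequential([Atomic(1.0), Atomic(0.0)])
print(aggregate(early_fail))  # 0.25
print(aggregate(late_fail))   # 0.5
```

Nested structures fall out of the recursion: any child of a `Parallel`, `Sequential`, or `Conditional` node may itself be a composite node. Under plain averaging, both sequences above would score 0.5; the order-sensitive weighting is what separates them in this sketch.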