CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation
Wenhan Xiao, Ziwei Zhang, Chuanyue Yu, Xingcheng Fu, Qingyun Sun, Runhua Xu, Jianxin Li
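AI summary: Proposes the CRITIC-R1 framework, which uses reinforcement learning to model RAG critique as a structured error diagnosis problem and designs Conservative Judgement Alignment and Diagnostic Quality Alignment reward functions to improve the answer quality of retrieval-augmented generation.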
- Comments
- 17 pages, 13 figures
Retrieval-augmented generation (RAG) improves knowledge-intensive question answering by incorporating external evidence. However, existing RAG methods still suffer from hallucinations and subtle reasoning errors. Recent studies introduce external critics to refine RAG outputs. These critics, however, often provide coarse-grained, weakly structured feedback and intervene over-aggressively. The result is noisy, unreliable refinement that limits their effectiveness for correction. To tackle these issues, we propose CRITIC-R1, a structured critic framework that formulates RAG critique as an explicit error diagnosis problem and learns it with reinforcement learning (RL). Our framework organizes the diagnosis of common RAG errors along multiple dimensions, including verdict, error location, reasoning analysis, and fix generation. To learn these capabilities, we design two reward functions. Conservative Judgement Alignment (CJA) first encourages calibrated high-level judgements while mitigating over-aggressive intervention. Diagnostic Quality Alignment (DQA) then improves fine-grained diagnostic feedback through gated rewards. We train the critic model with GRPO-based RL, using process-level supervision collected from external LLM teacher models. Experiments across five QA benchmarks show that CRITIC-R1 consistently improves answer quality over strong RAG baselines. Our source code is available at https://anonymous.4open.science/r/critic-r1-FCB0
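The minimal Python sketch below shows how a structured critique, a conservative judgement reward, a gated diagnostic reward, and GRPO-style advantages might fit together. It rests on stated assumptions and is not the authors' implementation. Only the names CJA, DQA, GRPO, and the four diagnostic dimensions come from the abstract. The following are all hypothetical: the `Critique` record, the binary "correct"/"incorrect" verdict labels, the asymmetric false-alarm penalty in `cja_reward`, the externally supplied `diag_score` in `dqa_reward` (for example, a teacher-assigned quality score in [0, 1]), and the simple sum of the two rewards. The group-relative advantage follows the general GRPO idea of normalizing each sampled output's reward by its group's mean and standard deviation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Critique:
    """Hypothetical container for the four diagnostic dimensions named in the abstract."""
    verdict: str             # assumed labels: "correct" or "incorrect"
    error_location: str      # e.g. which reasoning step or claim is wrong
    reasoning_analysis: str  # why it is wrong
    fix: str                 # proposed repair


def cja_reward(pred_verdict: str, gold_verdict: str,
               false_alarm_penalty: float = 1.0) -> float:
    """Hypothetical conservative-judgement reward (assumption, not the paper's formula).

    A matching verdict earns +1. Flagging a correct answer as incorrect
    (over-aggressive intervention) is penalized, while missing a real error
    earns 0, so false alarms cost more than misses.
    """
    if pred_verdict == gold_verdict:
        return 1.0
    if gold_verdict == "correct" and pred_verdict == "incorrect":
        return -false_alarm_penalty
    return 0.0


def dqa_reward(critique: Critique, gold_verdict: str, diag_score: float) -> float:
    """Hypothetical gated diagnostic reward.

    `diag_score` is an assumed external quality score in [0, 1] for the
    error location, analysis, and fix. The gate passes it through only
    when the critique's verdict is right.
    """
    gate = 1.0 if critique.verdict == gold_verdict else 0.0
    return gate * diag_score


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gold = "correct"  # the RAG answer being critiqued is actually right
    # A group of critiques sampled for the same question (illustrative values).
    samples = [
        (Critique("correct", "", "", ""), 0.0),
        (Critique("incorrect", "step 2", "misreads passage", "use date from doc 1"), 0.7),
        (Critique("correct", "", "", ""), 0.0),
        (Critique("incorrect", "final answer", "unsupported claim", "cite doc 3"), 0.5),
    ]
    # Summing CJA and DQA is an assumption; the paper may stage or weight them.
    rewards = [cja_reward(c.verdict, gold) + dqa_reward(c, gold, s) for c, s in samples]
    print(rewards)
    print(group_advantages(rewards))
```

Gating DQA on verdict correctness means a critic earns no diagnostic credit for a detailed critique that targets a non-existent error. This is one plausible reading of "gated rewards", but the paper's exact gating rule may differ.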