2606.05174
2026-06-05
cs.CL
cs.AI
Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO
Arash Ahmadi, Parisa Masnadi, Sarah Sharif, Charles Nicholson, David Ebert, Mike Banad
Affiliations
*
School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK, USA
;
Intelligent Neuromorphic and Quantum Understanding for Innovative Research and Engineering (INQUIRE) Laboratory, University of Oklahoma, Norman, OK, USA
;
Khiabani Data Science and Analytics Institute, University of Oklahoma, Norman, OK, USA
;
Data Institute for Societal Challenges (DISC), University of Oklahoma, Norman, OK, USA
;
School of Industrial and Systems Engineering, University of Oklahoma, Norman, OK, USA
;
Office of Responsible Artificial Intelligence (ORAI), University of Arizona, Tucson, AZ, USA
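AI Summary
Proposes a variance-aware reward framework that combines GRPO with the rubrics from RaR-Medicine, replacing discrete aggregation with a continuous analytic reward function to improve the accuracy and F1 score of LLMs on heart-focused medical question answering.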