2605.12313
2026-05-13
cs.CL
cs.IR
Overview of the MedHopQA track at BioCreative IX: track description, participation and evaluation of systems for multi-hop medical question answering
Rezarta Islamaj, Joey Chan, Robert Leaman, Jongmyung Jung, Hyeongsoon Hwang, Quoc-An Nguyen, Hoang-Quynh Le, Harikrishnan Gurushankar Saisudha, Ganesh Chandrasekar, Rustam R. Taktashov, Nadezhda Yu. Bizyukova, Sofia I. R. Conceição, Paulo R. C. Lopes, Reem Abdel Salam, Mary Adewunmi, Zhiyong Lu
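AI Summary
The MedHopQA shared task at BioCreative IX evaluates the reasoning ability of large language models on multi-hop medical question answering. It introduces a new dataset of 1,000 complex question-answer pairs; each question requires two-hop reasoning that combines information from two different Wikipedia pages, with a particular focus on questions about rare diseases. The task attracted 48 submissions from 13 teams. Results show that systems using strategies such as retrieval-augmented generation (RAG) significantly outperformed the baseline models, with the best system reaching 89.30% concept-level accuracy (MedCPT) and 87.30% exact match (EM). The dataset has been publicly released to advance research on multi-hop medical question answering.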