AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows
Affiliations School of Control Science and Engineering, Shandong University; Key Laboratory of Machine Intelligence and System Control, Shandong University; Department of Medicine and Therapeutics, The Chinese University of Hong Kong; Department of Geriatric Medicine, Qilu Hospital of Shandong University; Department of Psychiatry, The Chinese University of Hong Kong; Li Chiu Kong Family Sleep Assessment Unit, Department of Psychiatry, Faculty of Medicine, The Chinese University of Hong Kong; Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong; Gerald Choa Neuroscience Institute, Department of Medicine and Therapeutics, The Chinese University of Hong Kong
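AI summary Proposes the AIPatient Arena framework, which builds patient knowledge graphs from electronic health records and evaluates eight clinical capabilities of large language models in multi-turn doctor-patient interactions. It finds that the models fall short in areas such as information coverage and diagnostic reasoning, and stresses the importance of process-level evaluation.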
Comments 49 pages, 12 figures, 11 tables