2605.24636
2026-05-27
cs.AI
cs.CL
GlobalDentBench: A Multinational Benchmark for Evaluating LLM Clinical Reasoning in Dentistry with Expert Calibration
Junjie Zhao, Jingyi Liang, Zhenyang Cai, Jiaming Zhang, Zhenwei Wen, Shuzhi Deng, Wenjing Yi, Chunfeng Luo, Hexian Zhang, Junying Chen, Tianrui Liu, Zhuhui Bai, Zixu Zhang, Pradeep Singh, Xiang Liu, Jianquan Li, Nhan L Tran, Falk Schwendicke, Zuolin Jin, Lijian Jin, Liangyi Chen, Wei-fa Yang, Benyou Wang, Junwen Wang, Shan Jiang
Affiliations
*
Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong
;
School of Data Science, The Chinese University of Hong Kong, Shenzhen
;
School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen
;
Department of Periodontology, Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University, Shenzhen, China
;
Beijing Institute of Collaborative Innovation
;
Department of Orthodontics, Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University, Shenzhen, China
;
Shenzhen Stomatology Hospital (Pingshan) of Southern Medical University, Shenzhen, China
;
College of Future Technology, Peking University
;
Freedom AI
;
New Cornerstone Science Laboratory, National Biomedical Imaging Center, State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking-Tsinghua Center for Life Sciences, College of Future Technology, Peking University, Beijing 100871, China
;
IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
;
Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, The University of Hong Kong
;
Shenzhen Loop Area Institute
;
Department of Cancer Biology, Mayo Clinic Arizona, 5777 E. Mayo Blvd., IERB-3-504A, Phoenix, Arizona, 85054, USA
;
Department of Conservative Dentistry, Periodontology and Digital Dentistry, LMU University Hospital, LMU Munich, Munich, Germany
;
Division of Periodontology & Implant Dentistry, Faculty of Dentistry, The University of Hong Kong, Hong Kong, SAR, China
AI Summary
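Proposes GlobalDentBench, the first multinational dental benchmark, comprising 8,978 expert-validated questions that span 14 specialties and 88 countries. The benchmark evaluates three levels of reasoning and shows that the performance of current large language models in dental clinical reasoning declines as complexity increases and that their use carries high risk.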