A Clinically Validated Foundation Model for Comprehensive Lung Pathology Interpretation
Zhengrui Guo, Zhengyu Zhang, Jiabo Ma, Yihui Wang, Fengtao Zhou, Yingxue Xu, Ling Liang, Chenglong Zhao, Qi Xie, Jinbang Li, Shujing Guo, Fangyi Han, Zhijian Cen, Ziyi Liu, Cheng Jin, Junlin Hou, Zhixuan Chen, Yu Cai, Lijuan Qu, Shifu Chen, Yueping Liu, Zhe Wang, Xiuming Zhang, Muyan Cai, Li Liang, Hao Chen
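AI summary: PulmoFoundation is a lung pathology foundation model built on Virchow2 through subspecialty pretraining on about 40,000 H&E-stained whole-slide images. It was validated on 32 clinical tasks, in a prospective study, and in a randomized controlled trial, with reported gains in diagnostic accuracy, efficiency, and consistency.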
Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current computational pathology (CPath) approaches rely on task-specific models built for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a foundation model for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care, validated across multiple centers, in a prospective study, and in a randomized controlled trial (RCT). Built upon Virchow2 via subspecialty-specific pretraining on ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, the model achieves clinical-grade performance on core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, it achieved an average AUC of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could reduce additional second-review burden for 68.8% of biopsies and 83.0% of frozen sections, and defer 44.5% of immunohistochemistry (IHC) stain orders, with positive predictive values (PPVs) of 1.0, 0.991, and 0.966, respectively. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% with AI vs. 83.8% without AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.
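For readers less familiar with the metrics quoted above, the following is a minimal sketch of how a triage fraction, its PPV, and a kappa statistic can be computed. It is not the authors' pipeline. The function names `triage_summary` and `cohen_kappa`, the toy scores and labels, and the assumption that a case is triaged when its model score meets the threshold are all illustrative. The abstract does not say which kappa variant was used for eight readers, so the two-rater Cohen's kappa here is a stand-in. The labels "moderate" and "substantial" match the commonly used Landis and Koch bands (0.41 to 0.60 and 0.61 to 0.80).

```python
from typing import Sequence, Tuple


def triage_summary(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (fraction of cases triaged, PPV among triaged cases).

    Assumption: a case is triaged when its model score is >= threshold,
    and labels are 1 for a true positive finding, 0 otherwise.
    """
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        return 0.0, float("nan")  # PPV is undefined when nothing is flagged
    return len(flagged) / len(scores), sum(flagged) / len(flagged)


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    a, b = list(rater_a), list(rater_b)
    n = len(a)
    # Observed agreement: share of cases where both raters give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical, not from the study).
fraction, ppv = triage_summary(
    scores=[0.95, 0.9, 0.8, 0.4, 0.2], labels=[1, 1, 0, 0, 1], threshold=0.85
)
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(fraction, ppv, kappa)
```

In this toy run, two of five cases clear the threshold and both are true positives, and the two raters agree on three of four cases against a chance agreement of 0.5.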