arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday

Science and Healthcare

Medical AI

Medical intelligence, clinical AI, medical imaging, pathology, diagnosis, and large models for healthcare.

Entries collected for today/current date: 7. Signal sources: cs.CV, cs.LG, q-bio, eess.IV, eess.SP
2605.00665 2026-06-19 cs.CV Version update 95%

Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Seowung Leem, Yunchao Yang, Adam J. Woods, Ruogu Fang

Affiliations: J. Crayton Pruitt Family Dept. of Biomedical Engineering, University of Florida; University of Florida Research Computing; Meta AI (FAIR); School of Behavioral and Brain Sciences, University of Texas at Dallas; Dept. of Electrical and Computer Engineering, University of Florida; Dept. of Computer and Information Science and Engineering, University of Florida; Center for Cognitive Aging and Memory, University of Florida

Topic match: Medical imaging. Uses deep learning to predict Alzheimer's disease risk factors from retinal images.

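AI summary: Uses deep learning to predict 12 Alzheimer's disease-related risk factors from color fundus photographs and reveals the retinal structural features behind the predictions, finding that regions such as the optic nerve head and retinal vasculature are associated with both the risk factors and preclinical Alzheimer's disease changes.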

Comments Accepted to the "Journal of Alzheimer's Disease" for publication

Abstract

Systemic, metabolic, and lifestyle factors have established associations with Alzheimer's disease (AD) through epidemiologic and AD-specific biomarker studies. Whether color fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. This study aimed to determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using 62,876 CFPs from 44,501 unique participants in the UK Biobank, DL models were trained to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic blood pressure, diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (on average 8.55 years before onset) and matched controls. DL performance ranged from AUROC = 0.5654 to 0.9480 for categorical factors and from $R^2$ = -0.0291 to 0.7620 for continuous factors, outperforming most of the morphometry-based machine learning models. Saliency-based scores consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. They also aligned with present morphometric variations. Several saliency-based scores differed significantly between incident-AD cases and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful, risk-related structural changes that mirror potential AD vulnerability.
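
The two metrics quoted in this abstract behave differently at the low end: an AUROC near 0.5 means ranking no better than chance, while a negative R2 means the predictions fit worse than always predicting the mean. A minimal Python sketch, assuming the standard definitions of both metrics (the function names `auroc` and `r_squared` are illustrative and are not taken from the paper), makes this concrete:

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values chosen for illustration only; they are not data from the study.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(r_squared([1, 2, 3], [2, 2, 2]))              # 0.0  (always predicts the mean)
print(r_squared([1, 2, 3], [3, 2, 1]))              # -3.0 (worse than the mean)
```

Read this way, the reported lower bound of -0.0291 would indicate that at least one continuous factor was predicted marginally worse than a constant mean baseline.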

2508.01819 2026-06-19 eess.IV Version update 90%

Decoding the Alzheimer's Continuum: Interpretable Multi-Gate Routing for Diagnosis and Transition Prediction

Yufeng Jiang, Hexiao Ding, Hongzhao Chen, Jing Lan, Xinzhi Teng, Gerald W. Y. Cheng, Yunlin Mao, Zongxi Li, Haoran Xie, Jung Sun Yoo, Jing Cai

Topic match: Medical imaging. Alzheimer's disease diagnosis and transition prediction from sMRI.

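AI summary: Proposes M$^3$AD, a unified framework that uses an interpretable multi-gate mixture-of-experts architecture to perform three-class diagnosis and stage transition prediction jointly from T1-weighted sMRI, reaching 95.13% accuracy for diagnosis and 94.87% for transition prediction.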

Comments Accepted by MICCAI 2026

Abstract

Alzheimer's disease (AD) manifests as a continuous progression from normal cognition (NC) through mild cognitive impairment (MCI) to dementia. However, most deep learning approaches reduce this continuum to disjointed classification tasks, largely ignoring dynamic stage transitions. To decode this complex progression, we propose M$^3$AD, a unified framework that jointly addresses three-class diagnosis classification and diagnosis stage transition prediction using only T1-weighted sMRI. M$^3$AD leverages an interpretable multi-gate mixture of experts architecture, employing specialized routing mechanisms to dynamically capture both diagnosis-specific pathological patterns and shared structural features across the continuum. It further integrates clinical priors (age, sex, eTIV) via adaptive attention fusion to enhance generalization. M$^3$AD achieves 95.13% accuracy, compared to 90.44% reported by MCLNC under its original experimental setting, and 94.87% for transition prediction. Crucially, analyzing the multi-gate routing reveals distinct expert activation signatures distinguishing stable from progressive MCI, providing a mechanistic basis for individual-level progression risk stratification. Code is available at https://github.com/csyfjiang/M3AD.
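
To illustrate the general idea behind a multi-gate mixture of experts, the sketch below shares one pool of experts across tasks and gives each task its own softmax gate that decides how to weight them. It is a generic illustration under assumed names (`multi_gate_moe`, the task labels, and the toy linear experts are all hypothetical), not the M$^3$AD implementation, whose details are in the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def multi_gate_moe(x, expert_weights, gate_weights):
    """Route one feature vector through shared experts with per-task gates.

    expert_weights: list of matrices, one linear expert each (toy stand-ins).
    gate_weights: dict mapping task name -> matrix with one row per expert.
    Returns dict mapping task name -> (mixed features, routing weights).
    """
    expert_outputs = [matvec(w, x) for w in expert_weights]
    dim = len(expert_outputs[0])
    results = {}
    for task, gate in gate_weights.items():
        routing = softmax(matvec(gate, x))  # one weight per expert, sums to 1
        mixed = [sum(r * out[i] for r, out in zip(routing, expert_outputs))
                 for i in range(dim)]
        results[task] = (mixed, routing)
    return results


# Two toy experts (identity and coordinate swap) and two hypothetical tasks.
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
gates = {
    "diagnosis": [[0.0, 0.0], [0.0, 0.0]],      # uniform routing
    "transition": [[10.0, 0.0], [-10.0, 0.0]],  # strongly prefers expert 0
}
out = multi_gate_moe([1.0, 0.0], experts, gates)
print(out["diagnosis"][1])  # [0.5, 0.5]
```

Because each task keeps its own routing weights, those weights can be inspected per input, which is the kind of per-task expert activation pattern the abstract describes analyzing.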

2503.23179 2026-06-19 eess.IV cs.CV Version update 90%

OncoReg: Medical Image Registration for Oncological Challenges

Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Tim Hable, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

Affiliations: Institute of Medical Informatics, University of Lübeck; Institute of Radiology and Nuclear Medicine, University Hospital Schleswig-Holstein; Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute; School of Computer Science, University of Birmingham; Division of Informatics, Imaging and Data Sciences, University of Manchester; DAMO Academy, Alibaba Group; Hangzhou Shengshi Technology Co., Ltd; Department of Radiation Oncology, University Hospital Schleswig-Holstein; EchoScout GmbH; Radboud University Medical Center, Nijmegen; Institute of Interventional Radiology, University Hospital Schleswig-Holstein

Topic match: Medical imaging. Medical image registration for radiotherapy in oncology.

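AI summary: Presents the OncoReg Challenge, whose two-phase framework supports the development of generalisable image registration methods while protecting patient privacy, targeting the registration of cone-beam CT to fan-beam CT in radiotherapy. Finds that feature extraction is pivotal and that combining deep learning with classical methods is most effective.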

Comments 21 pages, 13 figures

Abstract

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

2504.02885 2026-06-19 cs.CL Version update 90%

Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation

Hao Wang, Shuchang Ye, Jinghao Lin, Usman Naseem, Jinman Kim

Affiliations: The School of Computer Science, The University of Sydney; The School of Computing, Macquarie University; Doubao Medical Group, ByteDance

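Topic match: Medical imaging. Proposes a medical report generation method involving pathological feature perception and diagnostic reasoning.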

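AI summary: Proposes Med-R2, a fine-tuning strategy that introduces a perception-driven long reasoning process guided by radiology-specific knowledge and adds a reflection mechanism to correct perceptual errors, improving the pathological feature perception and diagnostic accuracy of LVLMs in medical report generation.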

Comments 28 pages, 3 figures, 1 table

Abstract

Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and to support decision-making. Large vision-language models (LVLMs) hold great promise for automated MRG due to their fine-grained image-text alignment and advanced text-generation capabilities. Currently, state-of-the-art MRG methods primarily adapt pre-trained LVLMs through direct supervised fine-tuning (SFT) on medical image-report pairs. However, several factors limit the performance of these LVLMs. First, direct SFT trains LVLMs to generate medical reports directly, without an intermediate thinking process of pathological feature perception and diagnostic reasoning. This can cause a failure to perceive pathological features and thus lead to misdiagnosis. Second, direct SFT lacks radiology-specific knowledge guidance, causing LVLMs to misinterpret perceived pathological features and make incorrect diagnoses. To address these gaps, we propose a novel fine-tuning strategy named Med-R2. We introduce a perception-driven long reasoning process that precedes report generation and incorporates radiology-specific knowledge as guidance. Additionally, to alleviate potential perceptual errors in complex reasoning, a reflection mechanism is introduced to refine the perception of pathological features and the generated report. Our experiments demonstrate that Med-R2 effectively enhances pathological feature perception and diagnostic accuracy for MRG with fine-tuned LVLMs.

2405.10705 2026-06-19 eess.IV cs.CV Version update 90%

3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

Affiliations: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials Devices, ShanghaiTech University, Shanghai, China; National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; School of Electronic Information Communications, Huazhong University of Science; Department of Computer Science & Engineering, Texas A&M University, USA

Topic match: Medical imaging. Proposes a sparse-view DSA reconstruction method that reduces radiation dose.

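AI summary: Proposes a vessel probability guided attenuation learning framework that achieves sparse-view DSA reconstruction through a complementary weighting of static and dynamic attenuation fields, reducing radiation dose, and uses progressive training and a temporal perturbed rendering loss to improve quality.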

Comments Accepted by Medical Image Analysis (MedIA), 2026

Abstract

Digital Subtraction Angiography (DSA) is one of the gold standards for vascular disease diagnosis. With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive blood flow information and can be utilized to reconstruct 3D vessel structures for medical assessment. Current commercial DSA systems typically require hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. In this study, we propose a neural rendering-based optimization framework tailored for high-quality sparse-view DSA reconstruction to reduce radiation dosage. Our approach, termed vessel probability guided attenuation learning, represents DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the time-independent vessel probability field. Functioning as a foreground mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism enables self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves reconstruction quality. Our model is trained by minimizing the discrepancy between synthesized projections and real captured DSA images. We further employ two training strategies to improve reconstruction quality: (1) coarse-to-fine progressive training for better geometry and (2) temporal perturbed rendering loss for temporal consistency. Experimental results have demonstrated high-quality 3D vessel reconstruction and 2D DSA image synthesis.
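
One plausible reading of the complementary weighting described in this abstract is that a vessel probability p in [0, 1] blends a static and a dynamic attenuation value at each point as (1 - p) * static + p * dynamic, and that a projection accumulates the blended attenuation along a ray. The sketch below assumes that form; the function names, the sample layout, and the simple Riemann-sum projection are assumptions for illustration and are not taken from the paper.

```python
def blended_attenuation(p_vessel, static_att, dynamic_att):
    """Blend static and dynamic attenuation with complementary weights.

    p_vessel is treated as a time-independent vessel probability in [0, 1];
    dynamic_att stands for the dynamic field evaluated at one time point.
    """
    if not 0.0 <= p_vessel <= 1.0:
        raise ValueError("vessel probability must lie in [0, 1]")
    return (1.0 - p_vessel) * static_att + p_vessel * dynamic_att


def project_ray(samples, step):
    """Approximate the line integral of attenuation along one ray.

    samples: list of (p_vessel, static_att, dynamic_att) at evenly spaced points.
    step: spacing between consecutive samples (assumed constant).
    """
    return sum(blended_attenuation(p, s, d) for p, s, d in samples) * step


# A toy ray crossing background (p = 0) and then a vessel (p = 1).
ray = [(0.0, 1.0, 5.0), (1.0, 1.0, 5.0)]
print(blended_attenuation(0.25, 2.0, 4.0))  # 2.5
print(project_ray(ray, step=0.5))           # 3.0
```

Under this assumed form, points with p near 0 are explained by the static field alone and points with p near 1 by the dynamic field alone, which is consistent with the abstract's description of the probability acting as a foreground mask.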

2603.01250 2026-06-19 cs.CV cs.AI Version update 85%

The MAMA-MIA Challenge: Advancing Generalizability and Fairness in Breast MRI Tumor Segmentation and Treatment Response Prediction

Lidia Garrucho, Smriti Joshi, Kaisar Kushibar, Richard Osuala, Maciej Bobowicz, Xavier Bargalló, Paulius Jaruševičius, Kai Geissler, Raphael Schäfer, Muhammad Alberb, Tony Xu, Anne Martel, Daniel Sleiman, Navchetan Awasthi, Hadeel Awwad, Joan C. Vilanova, Robert Martí, Daan Schouten, Jeong Hoon Lee, Mirabela Rusu, Eleonora Poeta, Luisa Vargas, Eliana Pastor, Maria A. Zuluaga, Jessica Kächele, Dimitrios Bounias, Alexandra Ertl, Katarzyna Gwoździewicz, Maria-Laura Cosaka, Pasant M. Abo-Elhoda, Sara W. Tantawy, Shorouq S. Sakrana, Norhan O. Shawky-Abdelfatah, Amr Muhammad Abdo-Salem, Androniki Kozana, Eugen Divjak, Gordana Ivanac, Katerina Nikiforaki, Michail E. Klontzas, Rosa García-Dosdá, Meltem Gulsun-Akpinar, Oğuz Lafcı, Carlos Martín-Isla, Oliver Díaz, Laura Igual, Karim Lekadir

Affiliations: Barcelona Artificial Intelligence in Medicine Lab (BCN-AIM), Facultat de Matemàtiques i Informàtica, Universitat de Barcelona

Topic match: Medical imaging. Breast MRI tumor segmentation and treatment response prediction.

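AI summary: Presents the MAMA-MIA Challenge, a standardized benchmark for evaluating breast MRI tumor segmentation and pathologic complete response prediction. Analyzes model generalizability and fairness on cross-continental, multi-center data and finds a trade-off between performance and subgroup fairness.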

Abstract

Breast cancer is the most frequently diagnosed malignancy among women worldwide and a leading cause of cancer-related mortality. Dynamic contrast-enhanced magnetic resonance imaging plays a central role in tumor characterization and treatment monitoring, particularly in patients receiving neoadjuvant chemotherapy. However, existing artificial intelligence models for breast magnetic resonance imaging are typically developed and evaluated using heterogeneous datasets, study populations, and assessment protocols, making direct comparison difficult and limiting understanding of model robustness across institutions and clinically relevant patient subgroups. The MAMA-MIA Challenge was designed to address these challenges by providing a standardized benchmark for the joint evaluation of primary tumor segmentation and prediction of pathologic complete response using pre-treatment magnetic resonance imaging only. The training cohort comprised 1,506 patients from multiple institutions in the United States, while evaluation was conducted on an external test set of 574 patients from three independent European centers to assess cross-continental and cross-institutional generalization. A unified scoring framework combined predictive performance with subgroup consistency across age, menopausal status, and breast density. Twenty-six international teams participated in the final evaluation phase. Results demonstrate substantial performance variability under a common external evaluation framework and reveal trade-offs between overall accuracy and subgroup fairness. The challenge provides standardized datasets, evaluation protocols, and public resources to promote the development of robust and equitable artificial intelligence systems for breast cancer imaging.

2602.22959 2026-06-19 cs.CV Version update 80%

Can Agents Distinguish Visually Hard-to-Separate Diseases in a Zero-Shot Setting? A Pilot Study

Zihao Zhao, Frederik Hauke, Juliana De Castilhos, Sven Nebelung, Daniel Truhn

Affiliations: Department of Diagnostic and Interventional Radiology, University Hospital Aachen, 52074 Aachen, Germany

Topic match: Medical imaging. Zero-shot diagnosis that distinguishes visually confusable diseases.

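AI summary: Explores whether multimodal large language model agents can distinguish visually confusable diseases (such as melanoma vs. atypical nevus and pulmonary edema vs. pneumonia) in a zero-shot setting. Proposes a multi-agent framework based on contrastive adjudication that improves accuracy by 11 percentage points on dermoscopy data, although overall performance remains insufficient for clinical deployment.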

Comments Code available at https://github.com/TruhnLab/Contrastive-Agent-Reasoning. Accepted by MICCAI 2026

Abstract

The rapid progress of multimodal large language models (MLLMs) has led to increasing interest in agent-based systems. While most prior work in medical imaging concentrates on automating routine clinical workflows, we study an underexplored yet clinically significant setting: distinguishing visually hard-to-separate diseases in a zero-shot setting. We benchmark representative agents on two imaging-only proxy diagnostic tasks, (1) melanoma vs. atypical nevus and (2) pulmonary edema vs. pneumonia, where visual features are highly confounded despite substantial differences in clinical management. We introduce a multi-agent framework based on contrastive adjudication. Experimental results show improved diagnostic performance (an 11-percentage-point gain in accuracy on dermoscopy data) and reduced unsupported claims on qualitative samples, although overall performance remains insufficient for clinical deployment. We acknowledge the inherent uncertainty in human annotations and the absence of clinical context, which further limit the translation to real-world settings. Within this controlled setting, this pilot study provides preliminary insights into zero-shot agent performance in visually confounded scenarios.