Medical AI

2606.20112 2026-06-19 cs.CV eess.IV New submission 95%

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

Zhenkai Zhang, Markus Hiller, Krista A. Ehinger, Tom Drummond

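Affiliations * School of Computing and Information Systems, The University of Melbourne

Topic match (medical imaging): proposes a 3D CT volume generation method for medical imaging.

AI summary: Proposes the Pixel-Level Residual Diffusion Transformer (PRDiT), which achieves high-fidelity 3D CT volume generation through two-stage training (a local MLP-based blind estimator that separates low-frequency structures, plus a global residual diffusion transformer that models high-frequency residuals) and outperforms existing methods on the LIDC-IDRI and RAD-ChestCT datasets.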

Comments Accepted at ICLR 2026. Code available at https://github.com/Fredy-Zhang/PRDiT

Abstract

Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at voxel-level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck. Extensive experiments conducted on the LIDC-IDRI and RAD-ChestCT datasets demonstrate that PRDiT consistently outperforms state-of-the-art models, such as HA-GAN, 3D LDM and WDM-3D, achieving significantly lower 3D FID, MMD and Wasserstein distance scores.
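
The coarse-to-fine split described in this abstract can be illustrated with a toy example. The sketch below is not the PRDiT implementation: a moving average over overlapping windows stands in for the learned MLP-based blind estimator, and the 1D signal, the `window` size, and the function names are assumptions made for illustration. It shows only that the residual left for a second stage is the input minus the low-frequency estimate, so the two parts add back to the original.

```python
def local_mean(signal, window=3):
    """Low-frequency estimate: mean over an overlapping window centred on each sample."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def split_coarse_residual(signal, window=3):
    """Return (coarse, residual) with residual = signal - coarse, element by element."""
    coarse = local_mean(signal, window)
    residual = [x - c for x, c in zip(signal, coarse)]
    return coarse, residual


# Toy 1D "volume": an isolated spike (fine detail) followed by a flat plateau.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 4.0, 4.0]
coarse, residual = split_coarse_residual(signal)
reconstructed = [c + r for c, r in zip(coarse, residual)]
```

In this toy, the residual is zero inside the plateau and large at the spike, which is the kind of high-frequency content the abstract assigns to the global residual stage.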

2606.20108 2026-06-19 cs.CV cs.LG New submission 95%

EFIQA: Explainable Fundus Image Quality Assessment via Anatomical Priors

Pengwei Wang, José Morano, Qian Wan, Hrvoje Bogunović

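Affiliations * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria; Christian Doppler Lab for Artificial Intelligence in Retina, Medical University of Vienna, Austria

Topic match (medical imaging): fundus image quality assessment, a medical imaging application.

AI summary: Proposes EFIQA, a framework that requires no quality labels. It uses anatomical priors, learning normal structure through masked anatomical inpainting, to produce spatial quality maps, and it outperforms supervised methods on multiple benchmarks while remaining explainable.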

Comments Accepted in MIDL 2026. Code: https://github.com/penway/EFIQA

Journal ref Proceedings of Machine Learning Research 315:2248-2264, 2026

Abstract

Image quality control is vital for a wide range of downstream applications. Deep learning-based image quality assessment methods typically train classifiers on dataset-specific quality labels, inheriting two limitations: (1) generalization is tied to the labeling criteria of the training set and (2) these methods cannot provide spatial feedback on where the quality is degraded, lacking explainability. In this work, we propose EFIQA, a framework that requires no quality-related supervision and produces spatial quality maps by design. Rather than learning "what is degradation" from human-annotated labels, EFIQA learns "what should be there" by leveraging anatomical priors. For fundus photography, we instantiate this as a two-stage approach, by first training an unsupervised anomaly detector via masked anatomical inpainting to identify regions of missing vasculature, and then distilling this prior knowledge into a shallow adapter mapping features of a frozen foundation model to precise quality maps. External-dataset evaluation demonstrates that this label-free approach with minimal adaptation achieves better performance and explainability compared with supervised methods across benchmarks with different quality criteria, highlighting its potential for real-world applications.

2606.19838 2026-06-19 cs.CV New submission 95%

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

Jiwoong Yang, Haejun Chung, Ikbeom Jang

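Affiliations * Hanyang University; Hankuk University of Foreign Studies

Topic match (medical imaging): multi-view medical image classification, applied to mammography and chest radiographs.

AI summary: Proposes the OTCHA module, which uses optimal transport to align multi-view patch tokens with shared latent hub tokens and combines confidence gating with partial matching to suppress irrelevant features, improving the robustness of multi-view medical image classification.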

Comments Accepted at MICCAI 2026

Abstract

Multi-view imaging, such as mammography and chest radiography, is a standard component of clinical practice. However, medical images are often unregistered and contain view-specific artifacts or irrelevant background cues that can obscure diagnostically relevant findings. Many existing methods directly fuse per-view representations, allowing such irrelevant content to contaminate the fused embedding and reducing robustness under varying view configurations. We propose OTCHA, a confidence-aware latent hub token alignment module based on optimal transport (OT) that refines patch tokens before fusion for multi-view classification. OTCHA introduces a set of learnable latent hub tokens shared across views. For each view, we compute an OT plan between patch tokens and hub tokens that jointly considers feature similarity and geometry, and augment the OT formulation with token-conditional dustbins to enable partial matching and discard irrelevant tokens. The resulting transport plan provides token-wise matching confidence, which gates hub-mediated message passing and weights a novel optimal-transport-based representation alignment loss to stabilize refinement. Experiments on three multi-view medical image datasets demonstrate consistent improvements over competing baselines across diverse anatomies and view configurations. Our code is available at https://github.com/labhai/OTCHA.
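
The idea of partial matching with a dustbin can be sketched with a small entropic optimal transport solver. This is an illustrative sketch, not the OTCHA code: the Sinkhorn iteration, the fixed `dustbin_mass`, the per-token `dustbin_scores`, and the toy score matrix are all assumptions, and the sketch uses feature similarity only (the abstract also mentions geometry). It shows how a token that matches no hub sends most of its mass to the dustbin column and therefore receives a low matching confidence.

```python
import math


def sinkhorn_with_dustbin(scores, dustbin_scores, dustbin_mass, eps=0.5, iters=500):
    """Entropic OT between n patch tokens and m hubs plus one dustbin column."""
    n, m = len(scores), len(scores[0])
    # Gibbs kernel of the score matrix, augmented with a per-token dustbin score.
    kernel = [[math.exp(s / eps) for s in row + [dustbin_scores[i]]]
              for i, row in enumerate(scores)]
    row_mass = [1.0 / n] * n                                    # equal mass per token
    col_mass = [(1.0 - dustbin_mass) / m] * m + [dustbin_mass]  # hubs share the rest
    u, v = [1.0] * n, [1.0] * (m + 1)
    for _ in range(iters):
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m + 1))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m + 1)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m + 1)] for i in range(n)]
    # Confidence: share of a token's mass matched to hubs rather than the dustbin.
    confidence = [1.0 - plan[i][m] / row_mass[i] for i in range(n)]
    return plan, confidence


# Toy scores: tokens 0 and 1 each match one hub, token 2 matches neither.
scores = [[1.0, 0.0], [0.0, 1.0], [-0.7, -0.7]]
plan, confidence = sinkhorn_with_dustbin(
    scores, dustbin_scores=[0.0, 0.0, 0.0], dustbin_mass=1 / 3)
```

A confidence vector of this kind could then gate message passing or weight a loss term, which is the role the abstract gives to the transport plan.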

2606.19824 2026-06-19 cs.CV cs.AI New submission 95%

CSWinUNETR: Segmentation of Thin Anatomical Structures in Medical Images

Junho Moon, Haejun Chung, Ikbeom Jang

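Affiliations * Hanyang University; Hankuk University of Foreign Studies

Topic match (medical imaging): segmentation of thin anatomical structures such as retinal vessels and cerebral vasculature.

AI summary: Proposes CSWinUNETR, a general-purpose backbone that addresses low contrast, discontinuities, and class imbalance in thin-structure segmentation through cross-shaped stripe self-attention, cyclic shifts, detail-enhanced multi-scale self-attention, and sparse-control dynamic snake convolution, outperforming existing methods on ophthalmology, neurovascular, and dermatology benchmarks.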

Comments Accepted at MICCAI 2026

Abstract

Accurate segmentation of thin, tortuous anatomical structures, such as retinal vessels, cerebral vasculature, and facial wrinkles, remains challenging due to low contrast, frequent discontinuities, and severe class imbalance. Although recent convolutional and Transformer-based models have improved performance, they often yield fragmented predictions and fail to recover fine branches. We propose CSWinUNETR, a general-purpose backbone for 2D and 3D thin-structure segmentation. It employs cross-shaped stripe self-attention to model long-range principal-axis context and incorporates cyclic shifts to enhance information exchange across stripes. To better preserve fine-grained details, we further introduce a detail-enhanced multi-scale self-attention module that aggregates contextual features from multi-resolution representations. In addition, we propose sparse-control dynamic snake convolution, which reconstructs reliable dense curvilinear kernels from sparsely predicted control points to better follow tortuous geometry. Extensive experiments on four benchmarks across ophthalmology, neurovascular imaging, and dermatology demonstrate that CSWinUNETR consistently outperforms state-of-the-art methods without task-specific post-processing or topology-aware losses. The code is available at https://github.com/labhai/CSWinUNETR.
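
How a cyclic shift lets information cross stripe boundaries can be shown with row indices alone. The sketch below is a simplified assumption, not the CSWinUNETR code: it groups the rows of a feature map into horizontal stripes (vertical stripes would apply the same grouping to columns) and rolls the row order before grouping. The helper names and the 4-row example are hypothetical.

```python
def stripe_groups(rows, stripe_width):
    """Partition a list of row indices into consecutive stripes of `stripe_width` rows."""
    return [rows[i:i + stripe_width] for i in range(0, len(rows), stripe_width)]


def cyclic_shift(rows, shift):
    """Roll the row order by `shift` positions, wrapping around the border."""
    shift %= len(rows)
    return rows[-shift:] + rows[:-shift] if shift else list(rows)


rows = [0, 1, 2, 3]                                 # row indices of a 4-row feature map
plain = stripe_groups(rows, 2)                      # rows 1 and 2 sit in different stripes
shifted = stripe_groups(cyclic_shift(rows, 1), 2)   # rows 1 and 2 now share a stripe
```

If attention is restricted to tokens inside one stripe, alternating the plain and shifted layouts lets rows that were separated in one layer attend to each other in the next.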

2606.19460 2026-06-19 cs.CV cs.AI cs.LG New submission 95%

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

Fabio De Sousa Ribeiro, Emma A. M. Stanley, Charles Jones, Tian Xia, Dominic C. Marshall, Laurent Renard Triché, Christopher V. Cosgriff, Panagiotis Dimitrakopoulos, Sotirios A. Tsaftaris, Ben Glocker

Affiliations * Imperial College London; Causality in Healthcare AI Hub; University of Edinburgh; Cleveland Clinic London; Department of Perioperative Medicine, CHU Clermont-Ferrand; Department of Medicine, Massachusetts General Hospital; Broad Institute of MIT and Harvard

Topic match (medical imaging): billion-parameter generative foundation model for chest radiographs.

AI summary: Proposes the first billion-parameter generative foundation model for chest radiographs, using rectified flow transformers for high-fidelity, controllable synthesis, and substantially improves how indistinguishable synthetic images are from real ones.

Comments Project page: https://RadiT-project.github.io

Abstract

We introduce the first generative foundation model for chest radiograph synthesis trained from scratch at the billion-parameter scale. Existing radiographic AI models often suffer from poor generalisation across patient subpopulations, institutions, and acquisition settings, resulting in limited real-world clinical utility. Controlled, high-fidelity synthesis of chest radiographs is a promising path toward diversifying clinical datasets and evaluating the robustness of diagnostic models. Therefore, we present the largest specialist generative foundation model for chest radiographs to date, with over 1.3B parameters, trained for 1.6T tokens on a curated, heterogeneous dataset comprising 1.2M radiographs and clinical expert-guided metadata. Our model supports controllable radiograph generation and editing across multiple demographic subgroups, acquisition views, and a dozen pathologies. Moreover, we significantly advance the state of the art in radiograph synthesis fidelity, producing images that are indistinguishable from real radiographs to clinical experts.

2606.14957 2026-06-19 cs.CV New submission 95%

Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Haoxu Huang, Long Chen, Jingyun Chen, Jinu Hyun, James Ryan Loftus, Kara Melmed, Daniel Orringer, Jennifer Frontera, Seena Dehkharghani, Arjun Masurkar, Narges Razavian

Affiliations * New York University, Center for Data Science; NYU Grossman School of Medicine, Department of Radiology; State University of New York at Binghamton, School of Computing; NYU Grossman School of Medicine, Department of Neurology; NYU Grossman School of Medicine, Department of Neurosurgery; NYU Grossman School of Medicine, Department of Pathology; School of Medicine, Department of Radiology, Stanford; NYU Grossman School of Medicine, Department of Neuroscience; NYU Grossman School of Medicine, Neuroscience Institute

Topic match (medical imaging): multimodal neuroimaging foundation model.

AI summary: Proposes Neuro-JEPA, which combines a latent predictive objective with a Mixture-of-Experts architecture to learn unified representations of three MRI sequences (T1w, T2w, and FLAIR), outperforming existing foundation models and a CNN baseline on 25 clinical tasks and 22 public-dataset tasks.

Comments Under Review Preprint

Abstract

Brain MRIs are routinely acquired as multiple complementary sequences with unique contrast weighting, including T1-weighted imaging (T1w) anatomic and fluid-sensitive T2-weighted (T2w) contrasts. However, methods for learning unified representations across the multitude of MRI contrast mechanisms at health-system scale are lacking. In this study, we introduce Neuro-JEPA, a sparse multimodal neuroimaging foundation model that combines a latent predictive objective with a Mixture-of-Experts architecture to encode brain MRI across core T1w, T2w, and fluid-suppressed FLAIR imaging (FLAIR). We further provide a systematic methodological study of architectural, masking, objective, and sparsity design choices beneficial for robust neuroimaging multimodal representation learning. Neuro-JEPA was pretrained on 1,551,862 scans from 428,647 studies after modality-specific preprocessing with data curation across three core structural brain MRI sequences. We evaluated the learned representations across clinical and research settings, including 25 tasks from three health systems (NYU Langone, NYU Long Island, and Massachusetts General Hospital) and 22 tasks from 12 public datasets, covering unimodal, multimodal and cross-domain evaluation configurations. Across these benchmarks, existing neuroimaging foundation models showed inconsistent gains over a simple convolutional neural network (CNN) baseline, whereas Neuro-JEPA achieved stronger and more consistent performance across all evaluated settings. These results establish a scalable methodological framework for multimodal neuroimaging representation learning and highlight the need for foundation model evaluation protocols that include simple baselines, clinically heterogeneous cohorts and controlled multimodal comparisons.

2605.00665 2026-06-19 cs.CV Updated version 95%

Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Seowung Leem, Yunchao Yang, Adam J. Woods, Ruogu Fang

Affiliations * J. Crayton Pruitt Family Dept. of Biomedical Engineering, University of Florida; University of Florida Research Computing; Meta AI (FAIR); School of Behavioral and Brain Sciences, University of Texas at Dallas; Dept. of Electrical and Computer Engineering, University of Florida; Dept. of Computer and Information Science and Engineering, University of Florida; Center for Cognitive Aging and Memory, University of Florida

Topic match (medical imaging): predicting Alzheimer's disease risk factors from retinal images with deep learning.

AI summary: Uses deep learning to predict 12 Alzheimer's disease-related risk factors from color fundus photographs and identifies the retinal structural features behind the predictions, finding that regions such as the optic nerve head and retinal vasculature are associated with the risk factors and with preclinical Alzheimer's disease changes.

Comments Accepted to the "Journal of Alzheimer's Disease" for publication

Abstract

Systemic, metabolic, and lifestyle factors have established associations with Alzheimer's Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether color fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. This study aims to determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using 62,876 CFPs from 44,501 unique participants from the UK Biobank, DL models were trained to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic blood pressure, diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. Performance of DL ranged from AUROC = 0.5654 to 0.9480 for categorical factors and from R² = -0.0291 to 0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based scores consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. They also aligned with existing morphometric variations. Several saliency-based scores differed significantly between incident-AD cases and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring potential AD vulnerability.

2601.15119 2026-06-19 eess.IV cs.CV 95%

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

Md Mahmudul Hoque, Md Mehedi Hassain, Muntakimur Rahaman, Md. Towhidul Islam, Shaista Rani, Md Sharif Mollah

Affiliations * Department of CSE, CCN University of Science & Technology; Department of EEE, International Islamic University Chittagong; Faculty of Engineering, Multimedia University; Department of CSE, Stamford University of Bangladesh; Department of Biology, Lucknow University; Department of CSE, Bangladesh Army International University of Science & Technology

Topic match (medical imaging): hybrid vision models for PCOS detection in ultrasound images, a medical image analysis task.

AI summary: Proposes two hybrid models that combine convolutional and transformer-based approaches to detect polycystic ovary syndrome in ultrasound images; the final model reaches 98.23% accuracy.

DOI: 10.1088/1742-6596/3191/1/012120

Abstract

Polycystic Ovary Syndrome (PCOS) is the most common endocrine disorder in women of reproductive age. Many Bangladeshi women suffer from PCOS at an older age. The aim of our research is to identify effective vision-based medical image analysis techniques and evaluate hybrid models for the accurate detection of PCOS. We introduced two novel hybrid models combining convolutional and transformer-based approaches. The training and testing data were organized into two categories: "infected" (PCOS-positive) and "noninfected" (healthy ovaries). In the initial stage, our first hybrid model, "DenConST" (integrating DenseNet121, Swin Transformer, and ConvNeXt), achieved 85.69% accuracy. The final optimized model, "DenConREST" (incorporating Swin Transformer, ConvNeXt, DenseNet121, ResNet18, and EfficientNetV2), demonstrated superior performance with 98.23% accuracy. Among all evaluated models, DenConREST showed the best performance. This research highlights an efficient solution for PCOS detection from ultrasound images, significantly improving diagnostic accuracy while reducing detection errors.

2606.19804 2026-06-19 cs.CV New submission 92%

HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification

Victoria Wu, Nima Hashemi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang

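Affiliations * The University of British Columbia; Vancouver General Hospital

Topic match (medical imaging): left ventricular filling pressure classification from echocardiography, a medical image analysis task.

AI summary: Proposes HypOProto, a framework that classifies left ventricular filling pressure using ordinal prototypes in hyperbolic space on top of a frozen, explainable foundation model, combining high accuracy with clinical interpretability.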

Abstract

Echocardiography (echo) is a widely used imaging modality for assessing cardiac function, with Left Ventricular Filling Pressure (LVFP) serving as a critical physiological marker for conditions such as heart failure. Standard LVFP classification into normal vs. elevated categories relies on the Doppler-derived $E/e'$ ratio, which is operator-dependent and often unavailable in resource-limited settings, motivating methods that infer LVFP directly from B-mode echo. Existing deep learning approaches achieve high performance but remain largely black-box, limiting clinical interpretability. We propose HypOProto, a hyperbolic, ordinal prototype-based framework for interpretable LVFP classification using a frozen, explainable foundation model backbone. HypOProto arranges prototypes along the physiological $E/e'$ scale, placing borderline cases near the hyperboloid root where small angular differences separate similar cases, while normal and elevated cases occupy outward positions reflecting increasing diagnostic certainty. This hyperbolic geometry encodes clinically meaningful ordinal relationships and improves interpretability. We also introduce a novel Hyperbolic Prototype Angular Separation (HyperPAS) loss, enforcing inter-class prototype separation in hyperbolic space. HypOProto achieves SOTA performance while maintaining transparency, and highlights clinically relevant regions in visualizations. This work represents the first prototype-based framework for LVFP classification in echo. Our code can be found at https://github.com/DeepRCL/HypOProto.
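
The geometric intuition behind placing prototypes at different radii on the hyperboloid can be checked numerically. The sketch below uses the standard hyperboloid (Lorentz) model with curvature -1; it is not the HypOProto code, and the helper names, the 2D tangent vectors, and the angle `theta` are assumptions chosen for illustration. It shows that the same angular gap corresponds to a much larger hyperbolic distance far from the root than near it.

```python
import math


def exp_map_origin(v):
    """Lift a Euclidean tangent vector at the root onto the unit hyperboloid."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [1.0] + [0.0] * len(v)
    return [math.cosh(norm)] + [math.sinh(norm) * x / norm for x in v]


def lorentz_inner(x, y):
    """Lorentzian inner product: negative time component plus spatial dot product."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperbolic_distance(x, y):
    """Geodesic distance on the hyperboloid; the clamp guards against rounding below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def angular_gap(radius, theta):
    """Distance between two points at the same radius, separated by angle theta."""
    a = exp_map_origin([radius * math.cos(theta / 2), radius * math.sin(theta / 2)])
    b = exp_map_origin([radius * math.cos(theta / 2), -radius * math.sin(theta / 2)])
    return hyperbolic_distance(a, b)


theta = 0.2
near_gap = angular_gap(0.5, theta)   # two points close to the root
far_gap = angular_gap(2.0, theta)    # same angle, further out
```

Under this model, points near the root stay close to each other even when their directions differ, while outward points with the same angular difference are well separated.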

2606.20477 2026-06-19 cs.CV cs.CL cs.LG New submission 90%

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

Yusuf Salcan, Simon Ging, Robin Schirrmeister, Philipp Arnold, Elmar Kotter, Behzad Bozorgtabar, Thomas Brox

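Affiliations * Computer Vision Group, University of Freiburg, Germany; Department of Radiology, Medical Center - University of Freiburg, Germany; CRIION-AI Lab, Freiburg, Germany

Topic match (medical imaging): radiology vision-language models with spatial grounding.

AI summary: Introduces RefRad2D, a large-scale bilingual dataset whose spatial grounding data are generated with an LLM and automated segmentation, and trains the RadGrounder model to jointly perform report generation, VQA, and spatial grounding, with competitive results on external benchmarks.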

Comments Accepted for MICCAI 2026. First two authors: equal contribution. Last two authors: equal supervision

Abstract

We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations. We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQA and spatial grounding subsets generated automatically via LLM-based curation and automated segmentation. Trained on this data, our model RadGrounder jointly performs report generation, visual question answering, and spatial grounding via bounding-box detection or segmentation. On external VQA benchmarks (Slake, VQA-RAD), RadGrounder achieves competitive results with specialized medical VLMs. Adding our clinical data to the training mixture improves open-ended VQA over fine-tuning on the downstream datasets alone, showing the transferability of our dataset. Crucially, adding grounding supervision does not degrade language quality, enabling spatially verifiable outputs at no cost to VQA performance.

2606.20390 2026-06-19 cs.CV New submission 90%

Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification

Muhammad Azeem, Tanveer Hussain, Amr Ahmed, Ardhendu Behera

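Affiliations * Edge Hill University

Topic match (medical imaging): graph-based skin lesion classification using dermoscopic images.

AI summary: Proposes a region-based graph learning framework that models lesions as superpixel graphs, uses geometric edge attributes and a metadata context node, and performs multimodal fusion with an edge-aware graph transformer, achieving better classification performance than existing methods on four public datasets.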

Comments Accepted at MICCAI 2026

Abstract

Automated skin cancer classification from dermoscopic images remains challenging due to heterogeneous lesion structure, strong intra-class variability, and subtle visual differences between benign and malignant cases. Existing CNN/ViT pipelines typically rely on global or patch-level features and often combine patient metadata via late fusion, which limits spatially grounded multimodal reasoning. We present a novel region-based graph learning framework that explicitly models lesions as graphs of spatially coherent superpixel regions represented as frozen CNN features. To capture fine-grained lesion arrangements, we encode inter-regional geometry as edge attributes and introduce a dedicated metadata context node connected to all regions, providing structured integration of demographic/clinical variables within the same relational space. Node representations are updated using our edge-aware graph transformer followed by attention-driven propagation, and a final graph-level embedding for benign-malignant classification. Experiments on four public benchmarks demonstrate that explicit region-level relational modeling and graph-native multimodal fusion yield consistent gains over the state-of-the-art. Consequently, we establish a new graph-centric perspective in which CNN features are modeled as relational nodes and improved through contextual integration, yielding more expressive and robust classifications.
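
The graph construction described in this abstract (region nodes, geometric edge attributes, and one metadata node linked to every region) can be sketched as a plain data structure. Everything below is a hypothetical illustration rather than the authors' pipeline: the distance-based neighbourhood rule, the `radius` value, the attribute names, and the metadata dictionary are assumptions, and the CNN features attached to each node are omitted.

```python
import math


def build_lesion_graph(centroids, metadata, radius=1.5):
    """Build region-region edges with geometry attributes plus a metadata context node."""
    n = len(centroids)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            dx = centroids[j][0] - centroids[i][0]
            dy = centroids[j][1] - centroids[i][1]
            dist = math.hypot(dx, dy)
            if dist <= radius:  # assumed rule: connect only nearby regions
                edges[(i, j)] = {"distance": dist, "angle": math.atan2(dy, dx)}
    meta_node = n  # one extra node that carries the patient metadata
    for i in range(n):
        edges[(i, meta_node)] = {"is_metadata": True}
    return {"num_nodes": n + 1, "metadata_node": meta_node,
            "metadata": metadata, "edges": edges}


# Three superpixel centroids: two adjacent regions and one distant region.
graph = build_lesion_graph([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)], {"age": 60})
```

Because the metadata node touches every region, any message-passing layer run over this structure can mix clinical variables with region features inside the graph instead of fusing them afterwards.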

2606.20172 2026-06-19 cs.LG New submission 90%

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

Diego Fajardo-Rojas, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter

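Affiliations * Leibniz University Hannover

Topic match (medical imaging): predicting gestational age at birth, in the context of preterm birth, from multi-modal fetal MRI.

AI summary: Proposes a pipeline that combines multi-modal fetal MRI with machine learning (data imputation, feature selection, and regression models) to predict gestational age at birth. Evaluated on 333 control cases and 93 preterm cases, it reaches R² = 0.13, MAE = 2.74 weeks, and accuracy 0.77.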

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:013

Journal ref Machine.Learning.for.Biomedical.Imaging. 2026 (2026)

DOI: 10.59275/j.melba.2026-f34b

Abstract

Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated from comprehensive multi-modal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. Performance was evaluated using stratified 10-fold cross-validation. The pipeline achieves an R2 score of 0.13 and a mean absolute error of 2.74 weeks. It also achieves a 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity across folds. The predominant features selected by the pipeline include cervical length and statistics derived from placental T2* values. The confluence of fast, motion-robust and multi-modal fetal MRI techniques and machine learning prediction allowed the prediction of the gestation at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth had only been addressed as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.
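
The abstract reports both regression metrics (R2, mean absolute error) and classification metrics obtained by splitting predicted gestational ages into term and preterm. The sketch below shows how such metrics can be computed from the same predictions. It is not the authors' evaluation code: the 37-week cut-off is the standard clinical definition of preterm birth and is assumed here, preterm is treated as the positive class, and the six example values are invented for illustration.

```python
def regression_and_term_metrics(true_ga, pred_ga, threshold=37.0):
    """Compute R2 and MAE, then classify each case as preterm if GA < threshold."""
    n = len(true_ga)
    mean_true = sum(true_ga) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(true_ga, pred_ga))
    ss_tot = sum((t - mean_true) ** 2 for t in true_ga)
    r2 = 1.0 - ss_res / ss_tot          # can be negative for poor predictions
    mae = sum(abs(t - p) for t, p in zip(true_ga, pred_ga)) / n

    pairs = [(t < threshold, p < threshold) for t, p in zip(true_ga, pred_ga)]
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "r2": r2,
        "mae": mae,
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),   # share of true preterm cases detected
        "specificity": tn / (tn + fp),   # share of true term cases detected
    }


# Invented gestational ages in weeks: three term births and three preterm births.
true_ga = [40.0, 39.0, 38.0, 34.0, 30.0, 36.0]
pred_ga = [39.0, 39.0, 38.0, 36.0, 35.0, 38.0]
metrics = regression_and_term_metrics(true_ga, pred_ga)
```

In this toy data the predictions for preterm cases are pulled towards term, which keeps specificity high while lowering sensitivity and R2.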

2606.20161 2026-06-19 cs.CV New submission 90%

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

Tong Wang, Siwen Wang, Yaolei Qi, Jinxing Zhou, Yuting He, Guanyu Yang, Yutong Xie

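Affiliations * Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); School of Medicine, Case Western Reserve University

Topic match (medical imaging): video polyp segmentation, a clinical medical imaging application.

AI summary: Proposes ARTEMIS, which uses a vision-language agent to select reliable temporal anchors and combines SAM2 propagation with reliability-aware robust learning to learn high-quality video polyp segmentation masks from imperfect supervision (points, scribbles, and a few dense labels), reaching state-of-the-art performance on multiple benchmarks.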

Abstract

Imperfectly supervised video polyp segmentation (VPS) aims to learn dense, temporally consistent masks from inexpensive supervision, including weak annotations (points, scribbles) and semi-supervision with few densely labeled frames. This setting is clinically valuable but challenging due to weak contrast, ambiguous boundaries, motion blur, and specular highlights, compounded by sparse pixel-level guidance. While SAM2 can generate dense masks from sparse inputs, direct pseudo-labeling often yields geometry-degraded masks with boundary leakage, underutilizes temporal consistency, and ignores reliability. To address these issues, we propose ARTEMIS, a unified framework for imperfectly supervised VPS driven by agent-guided reliability-aware temporal mask evolution. ARTEMIS initializes coarse masks from available supervision: SAM2 converts points/scribbles, while dense labels serve as reliable anchors. A debate-and-judge vision-language agent selects reliable temporal anchors under weak supervision, which are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Finally, ARTEMIS trains the segmenter using temporal reliability-aware robust learning, incorporating reliability-guided reference selection, a Reference Prototype Transport Module, and reliability-aware robust loss. These components assess mask reliability, evolve anchors over time, transport target identity across frames, and down-weight noisy supervision instead of discarding difficult samples. Experiments on SUN-SEG and CVC-ClinicDB-612 under scribble, point, and limited-label settings demonstrate that ARTEMIS achieves state-of-the-art performance. Code will be released at https://github.com/wangtong627/ARTEMIS.

2606.20143 2026-06-19 cs.CV 新提交 90%

HEad and neCK TumOR (HECKTOR) 2025: Benchmark of Segmentation, Diagnosis, and Prognosis in Multimodal PET/CT

头颈肿瘤 (HECKTOR) 2025 挑战赛：多模态 PET/CT 中的分割、诊断与预后基准

Numan Saeed, Salma Hassan, Shahad Hardan, Lishan Cai, Xinglong Liang, Moona Mazher, Abdul Qayyum, Yansong Bu, Mengye Lyu, Yue Lin, Mingyuan Meng, Chuanyi Huang, Lisheng Wang, Dalal Chamseddine, Shamimeh Ahrari, Beining Wu, Yifei Chen, Fuyou Mao, Hao Zhang, Baixiang Zhao, Surajit Ray, Muzi Guo, Lei Xiang, Jakob Dexl, Michael Ingrisch, Adrien Depeursinge, Arman Rahmim, Mathieu Hatt, Vincent Andrearczyk, Mohammad Yaqub

发表机构 * Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)（穆罕默德·本·扎耶德人工智能大学）； Amsterdam UMC（阿姆斯特丹大学医学中心）； The Netherlands Cancer Institute（荷兰癌症研究所）； Radboud University Medical Centre（拉德堡德大学医学中心）； University College London（伦敦大学学院）； Imperial College London（帝国理工学院）； Shenzhen Technology University（深圳技术大学）； Shenzhen University（深圳大学）； Newland Digital Technology（新大陆数字技术）； The University of Sydney（悉尼大学）； Shanghai Jiao Tong University（上海交通大学）； University Hospital, Nantes（南特大学医院）； Nantes Université, Centrale Nantes, CNRS, LS2N（南特大学、南特中央理工学院、法国国家科学研究中心、LS2N实验室）； Hangzhou Dianzi University（杭州电子科技大学）； Tsinghua University（清华大学）； Central South University（中南大学）； University of Glasgow（格拉斯哥大学）； China Mobile System Integration Co., Ltd.（中移系统集成有限公司）； Subtle Medical Inc.（Subtle Medical公司）； University Hospital, LMU Munich（慕尼黑大学医院）； Munich Center for Machine Learning（慕尼黑机器学习中心）； BC Cancer Research Institute（不列颠哥伦比亚癌症研究所）； HES-SO Valais-Wallis University of Applied Sciences and Arts（HES-SO瓦莱州应用科学与艺术大学）； Lausanne University Hospital (CHUV)（洛桑大学医院）； LaTIM, INSERM, UMR 1101, Univ Brest（LaTIM实验室、法国国家健康与医学研究院、UMR 1101、布雷斯特大学）

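Topic match (medical imaging): benchmark for segmentation, diagnosis, and prognosis of head and neck tumors in PET/CT.

AI summary: The HECKTOR 2025 challenge uses multimodal PET/CT and electronic health records to establish a benchmark for automated head and neck cancer analysis covering three tasks (tumor segmentation, recurrence prediction, and HPV classification); the best algorithms reached a Dice of 0.75, a C-index of 0.66, and a balanced accuracy of 0.56, respectively.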

Comments 17 pages, 4 figures, 4 tables. Overview paper for the HECKTOR 2025 challenge, held as a satellite event at MICCAI 2025. Challenge website: https://hecktor.grand-challenge.org/

详情

AI中文摘要

头颈癌 (HNC) 构成显著的全球健康负担，准确的肿瘤勾画对于有效的放疗计划至关重要。口咽部解剖结构的复杂性，加上肿瘤在影像上的异质性表现，使得手动分割耗时且存在观察者间差异。除分割外，从非侵入性影像预测长期临床结局（如无复发生存期 RFS）和确定人乳头瘤病毒 (HPV) 状态，仍然是具有挑战性但临床价值高的目标。HECKTOR 2025 挑战赛通过使用多模态 PET/CT 影像和电子健康记录，建立了一个用于自动 HNC 分析的全面基准。基于前几届（2020-2022），本次挑战赛采用了扩展的多机构数据集，包含来自全球 10 个中心的 1100 多名患者。参与者需完成三个互补目标：(1) 分割原发肿瘤体积 (GTVp) 和转移淋巴结 (GTVn)，(2) 预测无复发生存期，(3) 分类 HPV 状态。挑战赛吸引了 35 个注册团队，其中 15 个最终提交在保留测试集上进行了评估。表现最佳的算法在分割上达到平均 Dice 相似系数 0.75，在生存预测上达到一致性指数 0.66，在 HPV 分类上达到平衡准确率 0.56。本文对所提交的方法进行了全面分析，评估了它们在不同病变特征上的性能，并讨论了它们在自动化肿瘤学工作流程和决策支持系统中临床转化的意义。

英文摘要

Head and neck cancers (HNC) represent a significant global health burden, with accurate tumor delineation being essential for effective radiotherapy planning. The complexity of the oropharyngeal anatomy, combined with the heterogeneous appearance of tumors on imaging, makes manual segmentation time-intensive and subject to inter-observer variability. Beyond segmentation, predicting long-term clinical outcomes, such as recurrence-free survival (RFS), and determining human papillomavirus (HPV) status from noninvasive imaging, remain challenging yet clinically valuable goals. The HECKTOR 2025 challenge addresses these needs by establishing a comprehensive benchmark for automated HNC analysis using multimodal PET/CT imaging and electronic health records. Building on previous editions (2020-2022), this challenge features an expanded multi-institutional dataset comprising over 1,100 patients from 10 centers worldwide. Participants were tasked with three complementary objectives: (1) segmenting primary gross tumor volumes (GTVp) and metastatic lymph nodes (GTVn), (2) predicting recurrence-free survival, and (3) classifying HPV status. The challenge attracted 35 registered teams, with 15 final submissions evaluated on a held-out test set. Top-performing algorithms achieved a mean Dice similarity coefficient of 0.75 for segmentation, a concordance index of 0.66 for survival prediction, and a balanced accuracy of 0.56 for HPV classification. This paper presents a comprehensive analysis of the submitted methodologies, evaluates their performance across different lesion characteristics, and discusses their implications for clinical translation in automated oncology workflows and decision support systems.
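Two of the metrics quoted above, the Dice similarity coefficient and balanced accuracy, can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the challenge's official evaluation code: the function names, the empty-mask convention, and the toy inputs are all hypothetical.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, true).sum() / denom)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = np.mean(y_pred[y_true])     # fraction of positives found
    specificity = np.mean(~y_pred[~y_true])   # fraction of negatives found
    return float(0.5 * (sensitivity + specificity))

# Toy inputs for illustration only.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
bacc = balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0])
print(dice, bacc)
```

For a binary task, a balanced accuracy of 0.5 corresponds to chance, which gives some context for the 0.56 reported for HPV classification.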

2606.20037 2026-06-19 cs.LG 新提交 90%

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

使用3D MRI和PET的多模态方法诊断阿尔茨海默病

Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis

发表机构 * DSS Lab, School of ECE, NTUA（NTUA ECE学院DSS实验室）

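Topic match (medical imaging): multimodal Alzheimer's disease diagnosis with MRI and PET.

AI summary: Proposes a multimodal model for Alzheimer's disease diagnosis that combines 3D convolutional feature extractors with three fusion strategies (concatenation, Gated Multimodal Unit, gated self-attention) and a sparsely gated Mixture-of-Experts classifier, and validates input-adaptive modeling on three binary classification tasks.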

Comments 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

详情

DOI: 10.1109/BIBM66473.2025.11357133

AI中文摘要

阿尔茨海默病（AD）是一种不可逆的神经退行性疾病，也是全球主要的死亡原因之一。早期诊断尤为重要，尤其是在轻度认知障碍（MCI）阶段，及时干预有助于延缓其向AD的进展。神经影像数据，如磁共振成像（MRI）和正电子发射断层扫描（PET），可以通过提供与疾病相关的结构和功能脑变化来帮助早期检测脑部变化。然而，许多多模态模型仍通过静态拼接融合MRI和PET，并对所有受试者应用相同的计算，这限制了其对患者/站点异质性的鲁棒性，并可能浪费计算资源。为解决这些局限性，我们首次研究了将3D卷积特征提取器与三种融合策略（拼接、门控多模态单元（GMU）和门控自注意力）以及一个稀疏门控混合专家（MoE）分类器相结合的方法，该分类器执行输入自适应路由，仅激活每个病例中最具信息量的专家。最后，我们利用Grad-CAM可视化疾病相关区域，确保模型的可解释性。实验在三个二分类任务（NC vs. MCI、MCI vs. AD和NC vs. AD）上进行。结果表明，GMU在NC vs. MCI和NC vs. AD上分别达到80.46%和95.47%的准确率，而门控自注意力在MCI vs. AD上达到82.08%。消融实验表明，移除MoE会持续降低所有任务的准确率。这些发现强调了利用MRI和PET互补性的输入自适应多模态建模在AD诊断中的价值。

英文摘要

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a leading cause of death worldwide. Early diagnosis is especially important at the Mild Cognitive Impairment (MCI) stage, where timely intervention can help slow progression to AD. Neuroimaging data, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) scans, can support early detection by revealing structural and functional brain changes related to the disease. Yet many multimodal models still fuse MRI and PET with static concatenation and apply identical computation to all subjects, which limits robustness to patient/site heterogeneity and can waste computation. To address these limitations, we present the first study combining 3D convolutional feature extractors with three fusion strategies (concatenation, Gated Multimodal Unit (GMU), and gated self-attention) and a sparsely gated Mixture-of-Experts (MoE) classifier that performs input-adaptive routing, activating only the most informative experts per case. Finally, we use Grad-CAM to visualize disease-related regions, supporting model interpretability. Experiments are performed across three binary classification tasks (NC vs. MCI, MCI vs. AD, and NC vs. AD). Results show that GMU achieves accuracies of 80.46% (NC vs. MCI) and 95.47% (NC vs. AD), while gated self-attention attains 82.08% on MCI vs. AD. Ablations show that removing the MoE consistently degrades accuracy across all tasks. These findings underscore the value of input-adaptive, multimodal modeling for AD diagnosis by leveraging the complementary nature of MRI and PET.
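To make the contrast with static concatenation concrete, the sketch below shows the general form of a Gated Multimodal Unit: each modality gets its own tanh projection, and a sigmoid gate computed from both inputs decides, per hidden unit, how much each modality contributes. This is a generic NumPy illustration with random weights, not the paper's model; the dimensions, variable names, and the use of plain vectors in place of 3D CNN features are assumptions.

```python
import numpy as np

def gmu_fuse(x_mri, x_pet, W_mri, W_pet, W_gate):
    """Gated Multimodal Unit: h = z * h_mri + (1 - z) * h_pet."""
    h_mri = np.tanh(W_mri @ x_mri)   # modality-specific hidden representation
    h_pet = np.tanh(W_pet @ x_pet)
    gate_input = np.concatenate([x_mri, x_pet])
    z = 1.0 / (1.0 + np.exp(-(W_gate @ gate_input)))  # per-unit gate in [0, 1]
    return z * h_mri + (1.0 - z) * h_pet, z

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_in, d_hid = 8, 4
x_mri, x_pet = rng.normal(size=d_in), rng.normal(size=d_in)
W_mri = rng.normal(size=(d_hid, d_in))
W_pet = rng.normal(size=(d_hid, d_in))
W_gate = rng.normal(size=(d_hid, 2 * d_in))

fused, gate = gmu_fuse(x_mri, x_pet, W_mri, W_pet, W_gate)
print(fused.shape, gate)
```

Because the gate depends on the inputs, the mixing weights differ from subject to subject, which is the input-adaptive behaviour that plain concatenation lacks.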

2606.19651 2026-06-19 cs.AI cs.CV cs.LG 新提交 90%

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

BrainG3N：用于可控3D脑MRI生成的双用途分词器

Max Van Puyvelde, Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

发表机构 * Department of Biomedical Data Science, Stanford University School of Medicine（斯坦福大学医学院生物医学数据科学系）； Department of Mathematical Modelling, Statistics & Bioinformatics, Ghent University（根特大学数学建模、统计与生物信息学系）； Department of Electrical Engineering, Stanford University（斯坦福大学电气工程系）

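Topic match (medical imaging): brain MRI generation based on a 3D masked autoencoder, with support for conditional generation.

AI summary: Proposes a tokenizer based on a 3D masked autoencoder that decouples encoder and decoder; the encoder outperforms or matches SOTA models on 21 of 23 linear-probing tasks, and the embeddings support conditional generation and longitudinal forecasting.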

详情

AI中文摘要

三维（3D）脑MRI是临床神经病学和神经肿瘤学的核心，生成模型可以增强代表性不足的队列、模拟疾病轨迹并支持隐私保护的数据共享。潜在扩散已成为建模成像数据的首选解决方案，但它对分词器提出了两个竞争性要求：编码器嵌入必须保留下游任务所需的临床信息，解码器必须重建解剖学上准确的体积。现有的重建驱动分词器以牺牲前者为代价实现了后者。为了解决这个问题，我们引入了一种基于全体积掩码自编码器（MAE）的分词器，用于3D脑MRI潜在扩散，解耦编码器和解码器：冻结的3D MAE编码器产生临床信息丰富的嵌入，而专用的CNN解码器从这些嵌入的线性投影重建体素。我们在来自18个公共队列的35,309个体积上预训练编码器，涵盖四种模态、十种疾病类别和200多个采集站点，并在两种设置中展示了其双重用途。首先，在23项线性探测基准测试中，编码器在21项任务上优于或匹配SOTA模型（即BrainIAC、BrainSegFounder和MedicalNet）。其次，在这些临床信息丰富的嵌入上训练的条件扩散Transformer（DiT）支持跨六个变量的条件生成和患者特定的纵向预测。这些结果共同建立了一个单一的3D脑MRI嵌入空间，能够同时支持下游临床任务和可控生成。

英文摘要

Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has been the go-to solution for modeling imaging data, but it places two competing demands on the tokenizer: encoder embeddings must retain the clinical information that downstream tasks act on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked-autoencoder (MAE) based tokenizer for 3D brain MRI latent diffusion, decoupling encoder and decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and 200+ acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches SOTA models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together these results establish a single 3D brain-MRI embedding space capable of both downstream clinical tasks and controllable generation.

2606.19371 2026-06-19 cs.LG cs.AI cs.CV 新提交 90%

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

ProMUSE: 渐进式多模态不确定性引导的分阶段证据阿尔茨海默病分类

Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao

发表机构 * Kennesaw State University（肯尼索州立大学）； Michigan Technological University（密歇根理工大学）； University of Iowa（爱荷华大学）

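Topic match (medical imaging): multimodal Alzheimer's disease classification using MRI and PET.

AI summary: Proposes ProMUSE, a progressive multi-modal uncertainty-guided staged evidential network that adaptively decides when additional modalities are needed, lowering data acquisition cost while maintaining accuracy.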

详情

AI中文摘要

阿尔茨海默病（AD）是一种致命性疾病，会破坏老年人的记忆和认知能力。大多数AD治疗在早期阶段有效，导致对早期AD诊断的需求日益增加。AD诊断越来越依赖多模态数据，如临床评估、结构磁共振成像（MRI）和正电子发射断层扫描（PET）成像。然而，MRI和PET采集仍然昂贵且不易普及，使得全模态推理在现实临床工作流程中不切实际。我们提出ProMUSE，一种渐进式多模态不确定性引导的分阶段证据网络，该网络自适应地确定何时需要额外模态，有助于在保持准确性的同时降低数据采集的总体成本。ProMUSE首先使用低成本临床数据进行证据分类，并通过基于Dirichlet的主观逻辑模型量化不确定性。当不确定性超过学习阈值时，ProMUSE逐步引入MRI或PET特征，通过Dempster-Shafer理论融合模态层面的信念和不确定性，获得校准的多模态预测。这种分阶段采集策略能够在最小化对昂贵成像依赖的同时实现准确诊断。在ADNI、AIBL和OASIS数据集上针对CN-AD、CN-MCI和MCI-AD任务的实验表明，ProMUSE在减少50-90%的MRI/PET使用量的同时，实现了与全模态基线相当或更优的准确性，从而大幅节省成本。这些结果突显了ProMUSE作为现实世界AD筛查中一种实用、不确定性感知且资源高效的解决方案。

英文摘要

Alzheimer's disease (AD) is a fatal disorder that destroys memory and cognitive skills in the elderly population. Most treatments for AD are effective in the early stage, leading to an increasing demand for early AD diagnosis. AD diagnosis increasingly relies on multimodal data such as clinical assessments, structural Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) imaging. However, MRI and PET acquisition remain costly and not universally accessible, making full-modality inference impractical in real-world clinical workflows. We propose ProMUSE, a Progressive Multi-modal Uncertainty Guided Staged Evidential Network that adaptively determines when additional modalities are necessary, helping reduce the overall cost of data acquisition while maintaining accuracy. ProMUSE first performs evidential classification using low-cost clinical data and quantifies uncertainty via a Dirichlet-based subjective logic model. When uncertainty exceeds a learned threshold, ProMUSE progressively incorporates MRI or PET features, fusing modality-wise belief and uncertainty through Dempster-Shafer theory to obtain a calibrated multimodal prediction. This staged acquisition strategy enables accurate diagnosis while minimizing reliance on expensive imaging. Experiments on ADNI, AIBL, and OASIS across CN-AD, CN-MCI, and MCI-AD tasks demonstrate that ProMUSE achieves competitive or superior accuracy compared to full-modality baselines while reducing MRI/PET usage by 50-90%, yielding substantial cost savings. These results highlight ProMUSE as a practical, uncertainty-aware, and resource-efficient solution for real-world AD screening.
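The staged logic described above (classify from cheap data, measure uncertainty, and only then request imaging) can be sketched as follows. This is an assumption-laden illustration rather than ProMUSE itself: it uses the standard subjective-logic mapping from Dirichlet evidence to belief and uncertainty, and the reduced Dempster combination rule commonly used in evidential multi-view learning. The fixed `threshold`, the function names, and the evidence values are hypothetical; the paper learns its threshold and obtains evidence from trained networks.

```python
import numpy as np

def opinion_from_evidence(evidence):
    """Map non-negative class evidence to (belief, uncertainty)."""
    e = np.asarray(evidence, dtype=float)
    k = e.size
    strength = e.sum() + k        # Dirichlet strength S, with alpha = e + 1
    return e / strength, k / strength

def dempster_combine(b1, u1, b2, u2):
    """Reduced Dempster rule for two opinions over the same classes."""
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)  # mass on disagreeing pairs
    scale = 1.0 - conflict
    belief = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    return belief, (u1 * u2) / scale

def staged_predict(clinical_evidence, imaging_evidence, threshold=0.5):
    """Use imaging only when the clinical-only opinion is too uncertain."""
    belief, u = opinion_from_evidence(clinical_evidence)
    used_imaging = False
    if u > threshold:  # hypothetical fixed threshold; the paper learns it
        b_img, u_img = opinion_from_evidence(imaging_evidence)
        belief, u = dempster_combine(belief, u, b_img, u_img)
        used_imaging = True
    return int(np.argmax(belief)), float(u), used_imaging

# Toy evidence vectors for illustration only.
uncertain_case = staged_predict([0.5, 0.5], [9.0, 1.0])
confident_case = staged_predict([20.0, 1.0], [0.0, 0.0])
print(uncertain_case, confident_case)
```

In the first toy case the clinical opinion has uncertainty 2/3, so imaging is requested and the fused uncertainty drops; in the second the clinical evidence alone is decisive and imaging is skipped, which is the mechanism behind the reported reduction in MRI/PET usage.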

2508.01819 2026-06-19 eess.IV 版本更新 90%

Decoding the Alzheimer's Continuum: Interpretable Multi-Gate Routing for Diagnosis and Transition Prediction

解码阿尔茨海默病连续谱：可解释的多门路由用于诊断与转换预测

Yufeng Jiang, Hexiao Ding, Hongzhao Chen, Jing Lan, Xinzhi Teng, Gerald W. Y. Cheng, Yunlin Mao, Zongxi Li, Haoran Xie, Jung Sun Yoo, Jing Cai

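Topic match (medical imaging): Alzheimer's disease diagnosis and transition prediction from sMRI.

AI summary: Proposes M$^3$AD, a unified framework with an interpretable multi-gate mixture-of-experts architecture that performs three-class diagnosis and stage transition prediction from T1-weighted sMRI, reaching 95.13% diagnostic accuracy.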

Comments Accepted by MICCAI2026

详情

AI中文摘要

阿尔茨海默病（AD）表现为从正常认知（NC）经轻度认知障碍（MCI）到痴呆的连续进展。然而，大多数深度学习方法将此连续谱简化为不连续的分类任务，很大程度上忽略了动态阶段转换。为了解码这一复杂进展，我们提出M$^3$AD，一个统一框架，仅使用T1加权sMRI联合处理三分类诊断和诊断阶段转换预测。M$^3$AD利用可解释的多门专家混合架构，采用专门的路由机制动态捕获诊断特定的病理模式和跨连续谱的共享结构特征。它进一步通过自适应注意力融合整合临床先验（年龄、性别、eTIV）以增强泛化能力。M$^3$AD在原始实验设置下达到95.13%的准确率（MCLNC报告为90.44%），转换预测准确率为94.87%。关键的是，分析多门路由揭示了区分稳定性和进展性MCI的独特专家激活特征，为个体水平的进展风险分层提供了机制基础。代码见：https://github.com/csyfjiang/M3AD。

英文摘要

Alzheimer's disease (AD) manifests as a continuous progression from normal cognition (NC) through mild cognitive impairment (MCI) to dementia. However, most deep learning approaches reduce this continuum to disjointed classification tasks, largely ignoring dynamic stage transitions. To decode this complex progression, we propose M$^3$AD, a unified framework that jointly addresses three-class diagnosis classification and diagnosis stage transition prediction using only T1-weighted sMRI. M$^3$AD leverages an interpretable multi-gate mixture of experts architecture, employing specialized routing mechanisms to dynamically capture both diagnosis-specific pathological patterns and shared structural features across the continuum. It further integrates clinical priors (age, sex, eTIV) via adaptive attention fusion to enhance generalization. M$^3$AD achieves 95.13% accuracy, compared to 90.44% reported by MCLNC under its original experimental setting, and 94.87% for transition prediction. Crucially, analyzing the multi-gate routing reveals distinct expert activation signatures distinguishing stable from progressive MCI, providing a mechanistic basis for individual-level progression risk stratification. Code is available at https://github.com/csyfjiang/M3AD.

2507.23027 2026-06-19 cs.CV cs.AI 90%

Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging

恢复诊断价值：资源受限成像中超分辨率辅助的超声心动图分类

Krishan Agyakari Raja Babu, Om Prabhu, Annu, Mohanasankar Sivaprakasam

发表机构 * Indian Institute of Technology Madras（印度理工学院马德拉斯分校）； All India Institute of Medical Sciences（全印度医学科学研究所）； Indian Institute of Technology Hyderabad（印度理工学院海得拉巴分校）

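Topic match (medical imaging): super-resolution-enhanced echocardiogram classification.

AI summary: Studies deep learning-based super-resolution for classifying low-quality 2D echocardiograms; on the CAMUS dataset, SRGAN and SRResNet improve classification performance, with SRResNet also being computationally efficient.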

Comments Accepted at the MICCAI Workshop on "Medical Image Computing in Resource Constrained Settings & Knowledge Interchange (MIRASOL)" 2025

详情

DOI: 10.1007/978-3-032-13654-1_8

AI中文摘要

在资源受限环境（RCS）下，自动心脏解读常受限于低质量超声心动图图像，限制了后续诊断模型的效果。尽管超分辨率（SR）技术在增强磁共振成像（MRI）和计算机断层扫描（CT）扫描方面表现出潜力，但其在超声心动图（一种广泛可及但易受噪声影响的模态）中的应用仍待探索。本文研究了基于深度学习的SR技术在提升低质量2D超声心动图分类准确率方面的潜力。使用公开的CAMUS数据集，我们按图像质量分层样本，并评估了两个临床相关的任务：相对简单的两腔 vs. 四腔（2CH vs. 4CH）视图分类和更复杂的舒张末期 vs. 收缩末期（ED vs. ES）时相分类。我们应用了两种广泛使用的SR模型，即Super-Resolution Generative Adversarial Network（SRGAN）和Super-Resolution Residual Network（SRResNet），以增强低质量图像，并观察到性能指标的显著提升，尤其是SRResNet，它还具有较高的计算效率。我们的发现表明，SR可以有效恢复降质超声扫描的诊断价值，使其成为资源受限环境中AI辅助护理的可行工具，以更少的资源实现更多。

英文摘要

Automated cardiac interpretation in resource-constrained settings (RCS) is often hindered by poor-quality echocardiographic imaging, limiting the effectiveness of downstream diagnostic models. While super-resolution (SR) techniques have shown promise in enhancing magnetic resonance imaging (MRI) and computed tomography (CT) scans, their application to echocardiography, a widely accessible but noise-prone modality, remains underexplored. In this work, we investigate the potential of deep learning-based SR to improve classification accuracy on low-quality 2D echocardiograms. Using the publicly available CAMUS dataset, we stratify samples by image quality and evaluate two clinically relevant tasks of varying complexity: a relatively simple Two-Chamber vs. Four-Chamber (2CH vs. 4CH) view classification and a more complex End-Diastole vs. End-Systole (ED vs. ES) phase classification. We apply two widely used SR models, Super-Resolution Generative Adversarial Network (SRGAN) and Super-Resolution Residual Network (SRResNet), to enhance poor-quality images and observe significant gains in performance metrics, particularly with SRResNet, which also offers computational efficiency. Our findings demonstrate that SR can effectively recover diagnostic value in degraded echo scans, making it a viable tool for AI-assisted care in RCS, achieving more with less.

2503.23179 2026-06-19 eess.IV cs.CV 版本更新 90%

OncoReg: Medical Image Registration for Oncological Challenges

OncoReg：面向肿瘤学挑战的医学图像配准

Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Tim Hable, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

发表机构 * Institute of Medical Informatics, University of Lübeck（吕贝克大学医学信息学研究所）； Institute of Radiology and Nuclear Medicine, University Hospital Schleswig-Holstein（石勒斯维希-霍尔斯坦大学医院放射科和核医学研究所）； Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute（伦塞拉塞尔理工学院生物医学工程系和生物技术与跨学科研究中心）； School of Computer Science, University of Birmingham（伯明翰大学计算机科学学院）； Division of Informatics, Imaging and Data Sciences, University of Manchester（曼彻斯特大学信息学、成像和数据科学系）； DAMO Academy, Alibaba Group（阿里集团DAMO学院）； Hangzhou Shengshi Technology Co., Ltd（杭州盛世科技有限公司）； Department of Radiation Oncology, University Hospital Schleswig-Holstein（石勒斯维希-霍尔斯坦大学医院放射肿瘤科）； EchoScout GmbH ； Radboud University Medical Center, Nijmegen（奈密根大学医学中心）； Institute of Interventional Radiology, University Hospital Schleswig-Holstein（石勒斯维希-霍尔斯坦大学医院介入放射科）

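Topic match (medical imaging): medical image registration for radiotherapy in oncology.

AI summary: Presents the OncoReg challenge, whose two-phase framework supports developing generalisable image registration methods while protecting patient privacy, targeting registration of cone-beam CT to fan-beam CT in radiotherapy; feature extraction proved pivotal, and combining deep learning with classical methods was most effective.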

Comments 21 pages, 13 figures

详情

AI中文摘要

在现代癌症研究中，由于患者隐私相关的挑战，产生的大量医学数据往往未被充分利用。OncoReg挑战通过一个两阶段框架解决了这一问题，该框架使研究人员能够在确保患者隐私的同时开发和验证图像配准方法，并促进更可泛化的AI模型的发展。第一阶段涉及使用公开可用的数据集，第二阶段则专注于在安全的医院网络内对私有数据集进行模型训练。OncoReg建立在Learn2Reg挑战的基础上，纳入了放射治疗中介入性锥束计算机断层扫描与标准计划扇束CT图像的配准。准确的图像配准在肿瘤学中至关重要，特别是在图像引导放射治疗的动态治疗调整中，需要精确对齐以最小化对健康组织的辐射暴露，同时有效靶向肿瘤。本文详细介绍了OncoReg挑战的方法和数据，并对竞赛参赛作品和结果进行了全面分析。研究发现，特征提取在此配准任务中起着关键作用。从该挑战中涌现的一种新方法展示了其多功能性，而现有方法的表现与新技术相当。深度学习和经典方法在图像配准中仍扮演重要角色，尤其是方法的组合，特别是在特征提取方面，被证明最为有效。

英文摘要

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

2504.02885 2026-06-19 cs.CL 版本更新 90%

Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation

Med-R2：面向医学报告生成的感知与反思驱动复杂推理

Hao Wang, Shuchang Ye, Jinghao Lin, Usman Naseem, Jinman Kim

发表机构 * The School of Computer Science, The University of Sydney（悉尼大学计算机科学学院）； The School of Computing, Macquarie University（麦考瑞大学计算机学院）； Doubao Medical Group, ByteDance（字节跳动 doubao 医疗集团）

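Topic match (medical imaging): a medical report generation method involving pathological feature perception and diagnostic reasoning.

AI summary: Proposes Med-R2, a fine-tuning strategy that adds a perception-driven long reasoning process guided by radiology-specific knowledge, plus a reflection mechanism that corrects perceptual errors, to improve the pathological feature perception and diagnostic accuracy of LVLMs in medical report generation.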

Comments 28 pages, 3 figures, 1 table

详情

AI中文摘要

自动化医学报告生成（MRG）越来越多地被用于减轻人工报告负担和辅助决策。大型视觉语言模型（LVLMs）因其细粒度的图像-文本对齐和先进的文本生成能力，在自动化MRG中展现出巨大潜力。目前，最先进的MRG主要专注于通过直接监督微调（SFT）来适应预训练的LVLMs，这是一种使用医学图像-报告对的微调策略。然而，有几个因素限制了这些LVLMs的性能。首先，直接SFT使LVLMs能够直接生成医学报告，而无需经过病理特征感知和诊断推理的中间思考过程。这导致可能无法感知病理特征，从而引起误诊。其次，直接SFT缺乏放射学特定知识的指导，导致LVLMs误解感知到的病理特征并做出错误诊断。为了解决这些问题，我们提出了一种名为Med-R2的新型微调策略。我们引入了一个感知驱动的长推理过程，该过程在报告生成之前进行，并融入放射学特定知识作为指导。此外，为了减轻复杂推理中潜在的感知错误，引入了一种反思机制来细化病理特征的感知和生成的报告。我们的实验表明，Med-R2通过微调LVLMs有效增强了MRG的病理特征感知能力和诊断准确性。

英文摘要

Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and to support decision-making. Large vision-language models (LVLMs) hold great promise for automated MRG due to their fine-grained image-text alignment and advanced text-generation capabilities. Currently, state-of-the-art MRG methods primarily focus on adapting pre-trained LVLMs with direct supervised fine-tuning (SFT), a fine-tuning strategy that uses medical image-report pairs. However, several factors limit the performance of these LVLMs. First, direct SFT trains LVLMs to generate medical reports directly, without an intermediate thinking process of pathological feature perception and diagnostic reasoning. This can cause the model to miss pathological features and thus lead to misdiagnosis. Second, direct SFT lacks radiology-specific knowledge guidance, causing LVLMs to misinterpret perceived pathological features and make incorrect diagnoses. To address these gaps, we propose a novel fine-tuning strategy named Med-R2. We introduce a perception-driven long reasoning process that precedes report generation and incorporates radiology-specific knowledge as guidance. Additionally, to alleviate potential perceptual errors in complex reasoning, a reflection mechanism is introduced to refine both the perception of pathological features and the generated report. Our experiments demonstrate that Med-R2 effectively enhances pathological feature perception and diagnostic accuracy for MRG via fine-tuned LVLMs.

2405.10705 2026-06-19 eess.IV cs.CV 版本更新 90%

3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

基于血管概率引导衰减学习的稀疏视角动态DSA图像三维血管重建

Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

发表机构 * School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China ； National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China ； School of Electronic Information and Communications, Huazhong University of Science and Technology ； Department of Computer Science & Engineering, Texas A&M University, USA

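Topic match (medical imaging): a sparse-view DSA reconstruction method that reduces radiation dose.

AI summary: Proposes a vessel probability guided attenuation learning framework that achieves sparse-view DSA reconstruction through a complementary weighting of static and dynamic attenuation fields, reducing radiation dose, and improves quality with coarse-to-fine progressive training and a temporal perturbed rendering loss.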

Comments Accepted by Medical Image Analysis (MedIA), 2026

详情

DOI: 10.1016/j.media.2026.104088

AI中文摘要

数字减影血管造影（DSA）是血管疾病诊断的金标准之一。借助造影剂，时间分辨的二维DSA图像提供全面的血流信息，可用于重建三维血管结构以进行医学评估。当前的商用DSA系统通常需要数百个扫描视角进行重建，导致大量辐射暴露。在本研究中，我们提出了一种基于神经渲染的优化框架，专门用于高质量稀疏视角DSA重建，以减少辐射剂量。我们的方法称为血管概率引导衰减学习，将DSA成像表示为静态和动态衰减场的互补加权组合，权重来自时间无关的血管概率场。作为前景掩膜，血管概率为静态和动态场提供适应不同场景类型的适当梯度。该机制实现了静态背景与动态造影剂流的自监督分解，并显著提高了重建质量。我们的模型通过最小化合成投影与真实DSA图像之间的差异进行训练。我们进一步采用两种训练策略来提高重建质量：（1）由粗到细的渐进训练以改善几何结构，以及（2）时间扰动渲染损失以保持时间一致性。实验结果表明了高质量的三维血管重建和二维DSA图像合成。

英文摘要

Digital Subtraction Angiography (DSA) is one of the gold standards for vascular disease diagnosis. With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive blood flow information and can be utilized to reconstruct 3D vessel structures for medical assessment. Current commercial DSA systems typically require hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. In this study, we propose a neural rendering-based optimization framework tailored for high-quality sparse-view DSA reconstruction to reduce radiation dosage. Our approach, termed vessel probability guided attenuation learning, represents DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the time-independent vessel probability field. Functioning as a foreground mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism enables self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves reconstruction quality. Our model is trained by minimizing the discrepancy between synthesized projections and real captured DSA images. We further employ two training strategies to improve reconstruction quality: (1) coarse-to-fine progressive training for better geometry and (2) temporal perturbed rendering loss for temporal consistency. Experimental results have demonstrated high-quality 3D vessel reconstruction and 2D DSA image synthesis.
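One way to read the "complementary weighted combination" described above is the following sketch. The symbols are assumptions introduced here for illustration, not the paper's notation: $p(\mathbf{x}) \in [0, 1]$ is the time-independent vessel probability at position $\mathbf{x}$, $\mu_s(\mathbf{x})$ is the static attenuation field, and $\mu_d(\mathbf{x}, t)$ is the dynamic attenuation field at time $t$. Assigning $p$ to the dynamic term is also an assumption, chosen because contrast agent flow occurs inside vessels.

```latex
\mu(\mathbf{x}, t) = \bigl(1 - p(\mathbf{x})\bigr)\,\mu_s(\mathbf{x}) + p(\mathbf{x})\,\mu_d(\mathbf{x}, t)
```

Under this reading, the gradient of a rendering loss with respect to $\mu_s$ is scaled by $1 - p$ and the gradient with respect to $\mu_d$ by $p$, which would be consistent with the abstract's statement that vessel probability acts as a foreground mask providing appropriate gradients to each field.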

2606.19372 2026-06-19 eess.IV cs.CV cs.LG 新提交 90%

Full-Self Diagnostics (FSD): Physics-Grounded Visual Biomarker Inference from Smartphone Video via Inverse Problems and Operator Learning

全自诊断(FSD): 通过逆问题和算子学习从智能手机视频进行基于物理的可视生物标志物推断

Jonathan Thomas, Harsh Thaker

发表机构 * Algomash® (Algorithmic Mashup Inc.)（算法混搭公司）

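Topic match (health monitoring): inferring physiological state from smartphone video, including glucose monitoring.

AI summary: Proposes the Full-Self Diagnostics (FSD) framework, which combines a physics-based forward model, information-theoretic observability, a regularized inverse problem, operator learning, and stochastic variational inference to recover physiological state from 9-second facial videos; validated on 38,812 scans from 59 subjects, with a glucose MARD of 29.86% on the lead author's self-collected data.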

Comments 38,812 paired scans, preliminary longitudinal validation of multichannel visual glucose inference (MARD 17 to 46 percent across cohorts); physics plus information theory plus operator learning framework

详情

AI中文摘要

我们提出全自诊断(FSD)，一个统一的数学框架，用于从消费级智能手机拍摄的无约束9秒面部视频中恢复潜在生理状态。该方法整合了五个相互增强的组件：(1)基于辐射传输方程和发色团吸收的物理前向模型，将相机观测映射到生物标志物浓度；(2)信息论可观测性理论，证明多通道视觉信号（光谱、脉搏、呼吸、微表情和眼动）与生理状态包含严格递增的互信息；(3)具有域均匀可辨识性保证的稳定Tikhonov正则化逆问题；(4)算子学习公式，实现跨设备、分辨率和人群的泛化；(5)可解释为随机变分推断的监督学习过程，从配对生物传感器真实值持续优化模型，性能随配对观测数量的平方根倒数比例提升。在59名受试者的38812次真实世界配对扫描上的实证验证展示了实际性能。第一作者自采数据（血糖范围35-550 mg/dL）的MARD为29.86%，97.57%的预测落在Clarke误差网格A+B区，仅0.27%在危险E区。一位管理良好的糖尿病参与者在较窄的70-180 mg/dL范围内达到MARD 17%。这些结果证实，消费级面部视频编码了足够的结构化信息，可在完全无约束条件下进行临床相关的非侵入性生物标志物推断，且性能随更多配对数据的可用性可预测地提升。

英文摘要

We present Full-Self Diagnostics (FSD), a unified mathematical framework for recovering latent physiological states from unconstrained 9-second facial videos captured by consumer smartphones. The approach integrates five mutually reinforcing components: (1) a physics-based forward model derived from the radiative transfer equation and chromophore absorption that maps camera observables to biomarker concentrations; (2) an information-theoretic observability theory proving that multi-channel visual signals (spectral, pulse, respiratory, micro-expression, and oculomotor) contain strictly increasing mutual information with physiological state; (3) a stable, Tikhonov-regularized inverse problem with domain-uniform identifiability guarantees; (4) an operator-learning formulation that enables generalization across devices, resolutions, and populations; and (5) a supervised learning procedure, interpretable as stochastic variational inference, that continuously refines the model from paired biosensor ground truth with performance improving proportionally to one over the square root of the number of paired observations. Empirical validation on 38812 real-world paired scans across 59 subjects demonstrates practical performance. Self-collected data from the lead author (glucose range 35-550 mg/dL) yields MARD of 29.86 percent with 97.57 percent of predictions in Clarke Error Grid Zones A+B and only 0.27 percent in the dangerous Zone E. A well-managed diabetic participant achieves MARD of 17 percent in the narrower 70-180 mg/dL band. These results confirm that consumer-grade facial video encodes sufficient structured information for clinically relevant, non-invasive biomarker inference under fully unconstrained conditions, with performance scaling predictably as more paired data becomes available.
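The headline glucose metric above, MARD (mean absolute relative difference), is simple to compute once paired reference and predicted values are available. The sketch below is a generic illustration with made-up numbers, not the authors' evaluation code; the function name is hypothetical.

```python
import numpy as np

def mard_percent(reference, predicted):
    """Mean absolute relative difference, in percent, against reference values."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs(predicted - reference) / reference))

# Toy glucose values in mg/dL, for illustration only.
mard = mard_percent([100.0, 200.0], [110.0, 150.0])
print(mard)
```

Because each error is divided by its reference value, the same absolute error counts for more at low glucose than at high glucose, so MARD values measured over different glucose ranges (such as 35-550 mg/dL versus 70-180 mg/dL) are not directly comparable.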

2606.19481 2026-06-19 cs.LG 新提交 90%

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Insulin4RL：面向离线强化学习的重症监护室实时胰岛素管理

Thomas Frost, Steve Harris

发表机构 * Institute of Health Informatics（健康信息学研究所）； University College London（伦敦大学学院）

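Topic match (health monitoring): an intensive care unit insulin management dataset for offline reinforcement learning.

AI summary: To address the poor generalisability caused by discretising electronic health records, introduces Insulin4RL, an offline reinforcement learning dataset built from real clinical trajectories, with over 375,000 decisions across 12,209 patients, for evaluating model performance under realistic sampling assumptions.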

Comments Under submission

Abstract

Offline reinforcement learning (ORL) offers the potential to improve the quality of clinical decision-making using historical electronic health record (EHR) data. Current training and evaluative practices in this field rely heavily on EHR datasets that have been temporally discretised into fixed, regular time intervals. Discretisation creates fictional representations of complex clinical scenarios and compromises the generalisability of retrospective model evaluations. In this paper, we introduce Insulin4RL, a healthcare ORL dataset featuring naturally irregular inputs and actions from real clinical trajectories. Derived from MIMIC-IV, Insulin4RL comprises over 375,000 labelled decisions across 12,209 patients requiring insulin infusion titration in the Intensive Care Unit. The dataset can thus be used for research into ORL model performance under realistic clinical sampling assumptions. We provide a description of the dataset's structure and characteristics, baseline performance metrics using model-free offline reinforcement learning, and a standardised evaluation protocol using fitted Q-evaluation. We conclude with suggested areas for future research that could be addressed using this resource.
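The abstract's point about discretisation can be made concrete with a toy example. The sketch below bins irregularly timed titration events into fixed one-hour intervals and keeps the last action per bin, which is one common convention and an assumption here, not the preprocessing used for Insulin4RL. The function name `discretise` and the event values are hypothetical.

```python
from collections import defaultdict


def discretise(events, bin_hours=1.0):
    """Group (time_in_hours, insulin_rate) events into fixed bins, keeping the last rate per bin."""
    bins = defaultdict(list)
    for t, rate in events:
        bins[int(t // bin_hours)].append(rate)
    # Keeping only the last action per bin is one common, lossy convention.
    return {b: rates[-1] for b, rates in sorted(bins.items())}


# Hypothetical titration events: (hours since admission, infusion rate in units/h).
events = [(0.2, 2.0), (0.7, 4.0), (0.9, 3.0), (3.5, 1.0)]
binned = discretise(events)
print(binned)                     # {0: 3.0, 3: 1.0}
print(len(events) - len(binned))  # 2 decisions are no longer visible
```

In this toy case three decisions made within the first hour collapse into one, and hours 1 and 2 contain no action at all, so a fixed-interval pipeline would have to invent values for them.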

2602.07628 2026-06-19 cs.AI cs.LG Version update 90%

SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures

Keondo Park, Younghoon Na, Yourim Choi, Hyunwoo Ryu, Hyun-Woo Shin, Hyung-Sin Kim

Affiliations: Graduate School of Data Science, Seoul National University, Seoul, South Korea; Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea; Obstructive Upper Airway Research (OUaR) Laboratory, Department of Pharmacology, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Republic of Korea

Topic match: Health monitoring. SleepMaMi, a sleep foundation model that integrates macro- and micro-structures.

AI summary: Proposes SleepMaMi, a sleep foundation model with a hierarchical dual-encoder design (a macro-encoder that models full-night temporal dependencies and a micro-encoder that captures short-term biosignal characteristics). It is trained with demographic-guided contrastive learning and a hybrid masked autoencoder objective, pre-trained on more than 20,000 PSG recordings, and outperforms or matches existing foundation models on downstream tasks.

Comments 8 pages, Appendix 9 pages

Abstract

While the shift toward unified foundation models has revolutionized many deep learning domains, sleep medicine remains largely restricted to task-specific models that focus on localized micro-structure features. These approaches often neglect the rich, multi-modal context of polysomnography (PSG) and fail to capture the global macro-structure of a full night's sleep. To address this, we introduce SleepMaMi, a Sleep Foundation Model engineered to master both hour-long sleep architectures and fine-grained signal morphologies. Our framework utilizes a hierarchical dual-encoder design: a Macro-Encoder to model full-night temporal dependencies and a Micro-Encoder to capture short-term characteristics from biosignals. The Macro-Encoder is trained via Demographic-Guided Contrastive Learning, which aligns overnight sleep patterns with objective subject metadata, such as age, sex, and BMI, to refine global representations. The Micro-Encoder is optimized via a hybrid Masked Autoencoder (MAE) and multi-modal contrastive objective. Pre-trained on a massive corpus of more than 20,000 PSG recordings (158K hours), SleepMaMi outperforms or matches existing state-of-the-art foundation models across a diverse suite of downstream tasks, demonstrating superior generalizability and label-efficient adaptation for clinical sleep analysis.
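The masked-autoencoder part of the Micro-Encoder objective relies on hiding a random subset of input patches and reconstructing them from the visible ones. The sketch below shows only that generic index-splitting step. The patch count, the 75% mask ratio, and the function name `split_patches` are assumptions for illustration and are not taken from the paper.

```python
import random


def split_patches(num_patches, mask_ratio, seed=0):
    """Return (visible, masked) patch indices for one masked-autoencoder training step."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Hypothetical setup: one signal segment cut into 20 patches, 75% of them hidden.
visible, masked = split_patches(num_patches=20, mask_ratio=0.75)
print(len(visible), len(masked))  # 5 15
```

Only the visible patches would be passed to the encoder; the reconstruction loss is then computed on the masked positions.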

2606.20250 2026-06-19 cs.CV New submission 90%

Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation

Duc T. Nguyen, Hoang-Long Nguyen, Thanh-Ha DO, Huy-Hieu Pham

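Affiliations: VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam; The Computer Vision and Medical AI Lab, VinUniversity, Hanoi, Vietnam; Posts and Telecommunications Institute of Technology, Hanoi, Vietnam

Topic match: Pathology imaging. Weakly supervised histopathology segmentation.

AI summary: Proposes a single-stage hierarchical rectification framework whose hierarchical feature rectification module generates high-fidelity activation maps directly within a single training loop, addressing the error propagation and computational overhead of multi-stage weakly supervised segmentation.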

Comments Accepted to MICCAI 2026. This is the pre-review submitted version, not the camera-ready version. The final authenticated version will be available in the MICCAI 2026 proceedings

Abstract

Existing weakly supervised semantic segmentation (WSSS) methods in computational pathology rely on a multi-stage paradigm: class activation map (CAM) generation, offline pseudo-mask refinement, and fully supervised retraining. While established, this decoupled approach presents fundamental limitations. The multi-stage process not only incurs high computational training costs but also suffers from error propagation: local texture biases in shallow CNN layers generate false-positive artifacts that subsequent refinement steps often fail to correct. To address these persistent challenges through a simple yet highly effective approach, we propose the Single-Stage Hierarchical Rectification (SSHR) framework. Rather than passively refining CAMs post-hoc, our method proactively purifies intermediate feature representations during the forward pass. We introduce a Hierarchical Feature Rectification Module (HFRM) that utilizes deep global semantic context to filter out local anomalies in shallow layers. This mechanism generates high-fidelity activation maps directly within a single training loop. Experiments on the LUAD-HistoSeg and BCSS datasets demonstrate that SSHR outperforms state-of-the-art multi-stage methods. Furthermore, SSHR reduces training duration by 2 to 5 times. This efficiency minimizes computational overhead and accelerates clinical translation for large-scale histopathology workflows. The code is available at: https://github.com/trongduc-nguyen/SSHR
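For readers unfamiliar with the class activation maps (CAMs) that the multi-stage pipelines start from, the sketch below shows the classic construction: a class-specific weighted sum of feature channels, clipped and rescaled. It illustrates the generic CAM technique only and is not the HFRM module or any code from the SSHR repository. The function name and the tiny feature map are hypothetical.

```python
def class_activation_map(features, class_weights):
    """Weighted sum of feature channels: CAM[y][x] = sum_k w_k * features[k][y][x]."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, channel in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    # Clip negatives and rescale to [0, 1] so the map can be thresholded into a pseudo-mask.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in cam]


# Hypothetical 2-channel, 2x2 feature map and classifier weights for one tissue class.
features = [[[1.0, 0.0], [0.0, 2.0]],
            [[0.0, 1.0], [1.0, 0.0]]]
print(class_activation_map(features, class_weights=[1.0, -0.5]))
# [[0.5, 0.0], [0.0, 1.0]]
```

Thresholding such a map yields the pseudo-mask that multi-stage methods then refine offline, which is the step where the abstract locates error propagation.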

2606.19966 2026-06-19 cs.CV cs.LG New submission 90%

Semantic-Anchored Evidential Fusion for Domain-Robust Whole-Slide Survival Analysis

Yucheng Xing, Ling Huang, Pei Liu, Jingying Ma, Jiaqing Xu, Kai He, Mengling Feng

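Affiliations: National University of Singapore; Imperial College London; Hunan University

Topic match: Pathology imaging. Proposes the SAEFS framework for whole-slide survival analysis.

AI summary: Proposes the SAEFS framework, which extracts semantic anchors via visual question answering and combines dual-stream evidence extraction with Dirichlet-based subjective logic to model uncertainty, enabling zero-shot cross-domain survival analysis with an average C-index improvement of 10.2%.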

Abstract

Whole-slide images (WSIs) are widely used for computational cancer prognosis. However, most existing methods primarily focus on in-domain performance and fail to generalize across clinical centers. This limitation stems from their reliance on pixel-derived representations that are highly susceptible to domain-specific artifacts caused by staining protocols and scanner hardware. We hypothesize that high-level pathology semantics, such as tumor grade and micro-environmental architecture, provide a domain-invariant semantic representation that mirrors the robust diagnostic logic of human pathologists. Therefore, we propose a Semantic-Anchored Evidential Fusion Survival (SAEFS) framework, where SAEFS derives semantic anchors from WSIs via Visual Question Answering (VQA), employs a dual-stream WSI evidence extraction architecture, uses Dirichlet-based Subjective Logic to model uncertainty, and fuses semantic and visual evidence through a cautious conjunction rule to avoid overconfident fusion from correlated sources. Trained exclusively on one source domain and evaluated zero-shot across four unseen domains, SAEFS consistently outperforms state-of-the-art models both in prediction accuracy and reliability, improving the average C-index by 10.2%. Quantitative analyses further show that VQA-derived semantic features exhibit significantly lower cross-center divergence than pixel-derived features, highlighting their robustness for cross-center clinical applications.
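Dirichlet-based subjective logic, as commonly used in evidential deep learning, turns non-negative per-class evidence into belief masses plus an explicit uncertainty mass. The sketch below shows that standard mapping under the usual convention alpha_k = e_k + 1. It does not reproduce the SAEFS fusion rule, and the use of four survival-time bins and the name `dirichlet_opinion` are assumptions for illustration.

```python
def dirichlet_opinion(evidence):
    """Map non-negative class evidence to subjective-logic beliefs and an uncertainty mass."""
    num_classes = len(evidence)
    # Dirichlet parameters are alpha_k = e_k + 1, so the strength is S = sum(e_k) + K.
    strength = sum(evidence) + num_classes
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


# Hypothetical evidence over 4 discrete survival-time bins.
beliefs, uncertainty = dirichlet_opinion([6.0, 2.0, 0.0, 0.0])
print([round(b, 3) for b in beliefs], round(uncertainty, 3))
# [0.5, 0.167, 0.0, 0.0] 0.333
```

Beliefs and uncertainty always sum to one, and with zero evidence the uncertainty mass is exactly one, which lets a model signal that it has no support for any class.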

2606.20164 2026-06-19 cs.CL cs.AI cs.LG q-bio.QM New submission 90%

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

Aueaphum Aueawatthanaphisut

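Affiliations: School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand

Topic match: Clinical large models. MedRLM, a recursive multimodal framework for clinical reasoning and decision support.

AI summary: Proposes MedRLM, a recursive multimodal health intelligence framework that recursively inspects, decomposes, retrieves, verifies, and synthesizes patient information, coordinates multiple specialized agents, and introduces a Clinical Evidence Graph Memory to support long-context clinical reasoning and sensor-guided screening.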

Comments 9 pages, 3 figures, 3 tables, 1 Algorithm, 29 equations

Abstract

Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.

2606.19373 2026-06-19 cs.LG cs.AI New submission 90%

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

Dylan O'Hara, Pradeep Bajracharya, Casey Meisenzahl, Karli Gillette, Anton J. Prassl, Gernot Plank, Saman Nazarian, Roderick Tung, John L Sapp, Linwei Wang

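Affiliations: Rochester Institute of Technology; University of Utah; Scientific Computing and Imaging Institute, University of Utah; Medical University of Graz; University of Pennsylvania Perelman School of Medicine; The University of Arizona College of Medicine; Dalhousie University

Topic match: Diagnostic assistance. AI-assisted pace-mapping for ventricular tachycardia treatment.

AI summary: Proposes the cAPM framework, which combines a task-agnostic surrogate neural network with active-learning and continual-learning strategies to transfer knowledge across ventricular tachycardias while reducing the amount of pace-mapping data needed, reaching an 81% probability of localizing within 5 mm.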

Abstract

Ventricular tachycardia (VT) is a life-threatening rhythm disorder and a major cause of sudden cardiac death. Pace-mapping is a clinical procedure for identifying the intervention target during catheter ablation of VT. It requires clinicians to pace different sites in the ventricles and rapidly interpret the resulting electrocardiograms to determine where to pace next or whether a target site has been identified. Active learning AI models have been proposed to guide clinicians to the next pacing site, showing promise in reducing the number of pacing sites and improving the efficiency of pace-mapping. Existing methods require retraining for each target and cannot transfer knowledge across multiple VTs within the same patient or across patients. We introduce cAPM for continual AI-assisted pace-mapping, which captures and transfers knowledge accumulated from past pace-mapping data to reduce the amount of pace-mapping data needed for future target VTs. This is made possible by a task-agnostic surrogate neural network that learns the mapping from pacing sites to 12-lead ECG morphology, an active-learning strategy that refines this surrogate model by selecting the most informative pacing site for each target, and a continual learning strategy to do so sequentially while retaining knowledge from prior targets. Evaluated on an in-silico testbed consisting of sequentially presented localization tasks across different physiological conditions and ventricular geometries, cAPM with and without replay of past data samples achieved an 81% probability of localizing within clinical tolerance (5 mm accuracy) using 4.5 pace-mapping sites, compared to the state-of-the-art active-learning method achieving 38% probability using 13.7 pacing sites. These results provide a strong basis for preparing cAPM for in-vivo preclinical and clinical studies where it can be used to guide pace-mapping.

2601.00014 2026-06-19 eess.SP cs.AI cs.LG Version update 90%

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, Joachim A. Behar

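Affiliations: Leumit Health Services

Topic match: Diagnostic assistance. Uses deep learning to predict heart failure risk.

AI summary: Proposes DeepHHF, a deep learning model that uses 24-hour single-lead ECG data to predict heart failure risk within five years. It reaches an AUC of 0.80, outperforming models based on short segments and a clinical score, and explainability analysis shows that the model focuses on arrhythmias and heart abnormalities.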

Abstract

Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To test this, we used the Technion-Leumit Holter ECG (TLHE) dataset, which includes 69,663 recordings from 47,729 patients collected over 20 years. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80, outperforming a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had twice the chance of hospitalization or death. Explainability analysis showed that DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of using deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
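The 0.80 figure above is an area under the ROC curve, which equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. It illustrates the metric only and is not the authors' evaluation code; the scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores; label 1 marks patients who developed HF within five years.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 3))  # 0.833
```

The pairwise loop is quadratic in the number of patients, so for a dataset of this size a rank-based implementation from a statistics library would be the practical choice.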

1. Medical imaging (22 papers)

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

EFIQA: Explainable Fundus Image Quality Assessment via Anatomical Priors

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

CSWinUNETR: Segmentation of Thin Anatomical Structures in Medical Images

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

HEad and neCK TumOR (HECKTOR) 2025: Benchmark of Segmentation, Diagnosis, and Prognosis in Multimodal PET/CT

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

Decoding the Alzheimer's Continuum: Interpretable Multi-Gate Routing for Diagnosis and Transition Prediction

Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging

OncoReg: Medical Image Registration for Oncological Challenges

Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation

3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

2. Health monitoring (3 papers)

Full-Self Diagnostics (FSD): Physics-Grounded Visual Biomarker Inference from Smartphone Video via Inverse Problems and Operator Learning

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures

3. Pathology imaging (2 papers)

Single-Stage Hierarchical Rectification for Weakly Supervised Histopathology Segmentation

Semantic-Anchored Evidential Fusion for Domain-Robust Whole-Slide Survival Analysis

4. Clinical large models (1 paper)

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

5. Diagnostic assistance (2 papers)

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI