arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday

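Science and Healthcare

Medical AI

Medical intelligence, clinical AI, medical imaging, pathology, diagnostics, and large models for healthcare.

Included today / current date: 15. Signal sources: cs.CV, cs.LG, q-bio, eess.IV, eess.SP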

1. Medical Imaging (10 papers)

2508.11211 2026-06-18 eess.IV cs.CV version update 95%

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Zhenhao Li, Song Ni, Long Yang, Xiaojie Yin, Haijun Yu, Jiazhou Wang, Hongbin Han, Weigang Hu, Yixing Huang

Affiliations Institute of Medical Technology, Peking University Health Science Center; Shanghai Cancer Center, Fudan University; Department of Electrical and Computer Engineering, University of Massachusetts Lowell; Beijing Key Laboratory of Intelligent Neuromodulation and Brain Disorder Treatment

Topic match Medical Imaging: CT field-of-view extension, classified as medical imaging

AI summary Proposes a CT field-of-view extension framework based on the image-to-image Schrödinger Bridge (I²SB) diffusion model. By directly learning the stochastic mapping between limited-FOV and extended-FOV images, it achieves fast one-step inference and outperforms existing diffusion models in both accuracy and speed.

Comments 12 pages

Journal ref IEEE Transactions on Radiation and Plasma Medical Sciences 2026

Abstract

Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I$^2$SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I$^2$SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I$^2$SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, representing over a 700-fold speedup compared to cDDPM (135 s) and surpassing DiffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency indicates that I$^2$SB has potential for real-time or clinical deployment.
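The speed comparison in the abstract can be checked with a short calculation. The sketch below uses only the per-slice times quoted above; the function name and the 300-slice volume are hypothetical, chosen for illustration.

```python
# Per-slice inference times quoted in the abstract (seconds per 2D slice).
times_s = {"I2SB": 0.19, "DiffusionGAN": 0.58, "cDDPM": 135.0}


def speedup(baseline: str, method: str = "I2SB") -> float:
    """Return how many times faster `method` is than `baseline`."""
    return times_s[baseline] / times_s[method]


# Hypothetical volume size, not taken from the paper.
n_slices = 300

for name in ("cDDPM", "DiffusionGAN"):
    print(f"I2SB vs {name}: {speedup(name):.1f}x faster")
for name, t in times_s.items():
    print(f"{name}: {t * n_slices / 60:.1f} min for {n_slices} slices")
```

The cDDPM ratio is 135 / 0.19, which is consistent with the "over 700-fold" claim.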

2512.09185 2026-06-18 cs.CV cs.AI version update 95%

Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation

Hao Chen, Rui Yin, Yifan Chen, Qi Chen, Chao Li

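Affiliations University of Cambridge; Nanjing First Hospital; Nanjing Medical University; Johns Hopkins University; University of Dundee

Topic match Medical Imaging: proposes a longitudinal MRI generation framework that models disease progression

AI summary Proposes the Δ-LFM framework, which uses flow matching to align patients' latent trajectories and models monotonic disease progression through patient-specific latent alignment; interpretability and performance are validated on three longitudinal MRI benchmarks.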

Comments ICLR 2026 accepted

Abstract

Understanding disease progression is a central clinical challenge with direct implications for early diagnosis and personalized treatment. While recent generative approaches have attempted to model progression, key mismatches remain: disease dynamics are inherently continuous and monotonic, yet latent representations are often scattered, lacking semantic structure, and diffusion-based models disrupt continuity with random denoising process. In this work, we propose to treat the disease dynamic as a velocity field and leverage Flow Matching (FM) to align the temporal evolution of patient data. Unlike prior methods, it captures the intrinsic dynamic of disease, making the progression more interpretable. However, a key challenge remains: in latent space, Auto-Encoders (AEs) do not guarantee alignment across patients or correlation with clinical-severity indicators (e.g., age and disease conditions). To address this, we propose to learn patient-specific latent alignment, which enforces patient trajectories to lie along a specific axis, with magnitude increasing monotonically with disease severity. This leads to a consistent and semantically meaningful latent space. Together, we present $Δ$-LFM, a framework for modeling patient-specific latent progression with flow matching. Across three longitudinal MRI benchmarks, $Δ$-LFM demonstrates strong empirical performance and, more importantly, offers a new framework for interpreting and visualizing disease dynamics.
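The idea of treating progression as a velocity field can be illustrated with a minimal sketch of generic linear-path flow matching. This is an assumption for illustration, not the Δ-LFM objective itself: the straight-line path, the helper names, and the two-dimensional latents are all hypothetical.

```python
import random


def interpolate(z0, z1, s):
    """Point on the straight path from z0 to z1 at fraction s in [0, 1]."""
    return [(1 - s) * a + s * b for a, b in zip(z0, z1)]


def target_velocity(z0, z1, dt):
    """Constant velocity that carries z0 to z1 in time dt."""
    return [(b - a) / dt for a, b in zip(z0, z1)]


def fm_loss(predict_velocity, z0, z1, dt, s):
    """Squared error between predicted and target velocity at one sample."""
    z_s = interpolate(z0, z1, s)
    v_hat = predict_velocity(z_s, s)
    v = target_velocity(z0, z1, dt)
    return sum((p - q) ** 2 for p, q in zip(v_hat, v))


def euler_integrate(velocity, z0, dt, steps=10):
    """Roll a latent forward in time by integrating the velocity field."""
    z = list(z0)
    h = dt / steps
    for i in range(steps):
        v = velocity(z, i / steps)
        z = [a + h * b for a, b in zip(z, v)]
    return z


# Hypothetical latents for one patient at two visits, 2.0 years apart.
z_baseline, z_followup, years = [0.0, 1.0], [1.0, 1.5], 2.0


def oracle(z, s):
    """Stand-in for a trained network: returns the exact target velocity."""
    return target_velocity(z_baseline, z_followup, years)


print(fm_loss(oracle, z_baseline, z_followup, years, random.random()))
print(euler_integrate(oracle, z_baseline, years))
```

In a real model the oracle would be replaced by a learned network, and integrating its velocity field from a baseline latent would give the predicted follow-up latent.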

2510.10779 2026-06-18 cs.CV version update 95%

Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel

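Affiliations INSA Lyon, University of Lyon, CNRS, INSERM, CREATIS UMR 5220, U1294

Topic match Medical Imaging: 3D CT abnormality analysis, multi-label classification

AI summary Proposes a 2.5D framework based on spectral graph convolution that represents 3D CT volumes as structured graphs, with axial slice triplets as nodes to model inter-slice dependencies, enabling multi-label abnormality classification with strong cross-dataset generalization.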

Comments Accepted at MELBA Journal 2026

Abstract

With the growing volume of CT examinations, there is an increasing demand for automated tools such as organ segmentation, abnormality detection, and report generation to support radiologists in managing their clinical workload. Multi-label classification of 3D Chest CT scans remains a critical yet challenging problem due to the complex spatial relationships inherent in volumetric data and the wide variability of abnormalities. Existing methods based on 3D convolutional neural networks struggle to capture long-range dependencies, while Vision Transformers often require extensive pre-training on large-scale, domain-specific datasets to perform competitively. In this work, we propose a 2.5D alternative by introducing a new graph-based framework that represents 3D CT volumes as structured graphs, where axial slice triplets serve as nodes processed through spectral graph convolution, enabling the model to reason over inter-slice dependencies while maintaining complexity compatible with clinical deployment. Our method, trained and evaluated on 3 datasets from independent institutions, achieves strong cross-dataset generalization, and shows competitive performance compared to state-of-the-art visual encoders. We further conduct comprehensive ablation studies to evaluate the impact of various aggregation strategies, edge-weighting schemes, and graph connectivity patterns. Additionally, we demonstrate the broader applicability of our approach through transfer experiments on automated radiology report generation and abdominal CT data.

2606.00491 2026-06-18 cs.CV cs.AI version update 90%

Pre-Deployment Robustness Stress Testing for CT Segmentation Systems Using Clinically Motivated Multi-Corruption Augmentation

CholMin Kanga, Jonghyun Chung, Amanpreet Kaur, Nagesh Gulkotwar, Aarthi Sivasankaran

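Affiliations Seoul National University; Google Inc.

Topic match Medical Imaging: robustness stress testing for CT segmentation systems, medical image augmentation

AI summary Proposes the RAMP framework, which uses multi-corruption augmentation to improve the robustness of CT segmentation models under heterogeneous clinical imaging conditions, substantially narrowing the performance gap between clean and corrupted images.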

Abstract

Deep learning-based CT segmentation systems often achieve high accuracy on clean benchmark images, but their performance may degrade under heterogeneous clinical imaging conditions such as noise, resolution loss, contrast variation, intensity shift, and artifacts. This instability can limit reliable deployment in real-world medical imaging workflows. We propose Robustness via Augmented Multi-corruption Pipeline (RAMP), a robustness-oriented augmentation framework for CT segmentation. RAMP combines anatomically constrained spatial perturbations, CT intensity transformations, and stochastic multi-corruption composition to expose models to clinically plausible image degradation during training. Across two CT segmentation evaluation settings, RAMP achieved the strongest corrupted-image performance and the smallest clean-to-corrupted robustness gap. In the five-organ noisy evaluation benchmark, RAMP improved mean corrupted Dice from 0.610 to 0.753 and reduced the robustness gap from 0.264 to 0.064 compared with the nnU-Net baseline. In Abdomen1K, RAMP improved mean corrupted Dice from 0.633 to 0.789 and reduced the robustness gap from 0.290 to 0.070. Although RAMP did not achieve the highest clean-image Dice, it substantially mitigated worst-case segmentation collapse under severe image degradation. These results suggest that multi-corruption augmentation can serve as a practical pre-deployment strategy for improving the reliability of CT segmentation systems in heterogeneous clinical environments.
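The reported numbers can be related to each other with a small calculation. The sketch below assumes the robustness gap is defined as clean Dice minus mean corrupted Dice; the abstract does not state the definition, so the implied clean-image scores are an inference under that assumption, not reported results.

```python
# (mean corrupted Dice, robustness gap) as quoted in the abstract.
reported = {
    "five-organ": {"nnU-Net": (0.610, 0.264), "RAMP": (0.753, 0.064)},
    "Abdomen1K": {"nnU-Net": (0.633, 0.290), "RAMP": (0.789, 0.070)},
}


def implied_clean_dice(corrupted: float, gap: float) -> float:
    """Clean Dice implied by gap = clean - corrupted (an assumed definition)."""
    return round(corrupted + gap, 3)


for benchmark, methods in reported.items():
    for method, (corrupted, gap) in methods.items():
        clean = implied_clean_dice(corrupted, gap)
        print(f"{benchmark:10s} {method:8s} corrupted={corrupted:.3f} "
              f"gap={gap:.3f} implied clean={clean:.3f}")
```

Under this assumed definition the implied clean Dice is lower for RAMP than for the nnU-Net baseline on both benchmarks, which matches the abstract's remark that RAMP did not achieve the highest clean-image Dice.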

2605.12567 2026-06-18 cs.CV cs.AI version update 90%

Pyramid Self-Contrastive Learning for Single-shot Test-time Ultrasound Image Denoising

Jiajing Zhang, Bingze Dai, Xi Zhang, Yue Xu, Wei-Ning Lee

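Affiliations Department of Electrical and Computer Engineering, The University of Hong Kong; Department of Biomedical Engineering, Duke University

Topic match Medical Imaging: proposes a test-time ultrasound image denoising framework that improves structural detail

AI summary Proposes a purely test-time training framework for single-shot ultrasound image denoising, applied to synthetic aperture ultrasound; self-contrastive learning separates anatomical similarity from noise randomness, improving denoising quality and structural detail.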

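AI-generated abstract

Intrinsic electronic noise and speckle noise complicate the clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions, whose validity diminishes under composite noise conditions. Learning-based methods require large amounts of labeled data and many model parameters. These predefined and pretrained methods inevitably suffer domain shift in complex in vivo environments, so they are limited to specific noise types and often blur structural details. This paper proposes a purely test-time training framework for single-shot ultrasound image denoising and applies it to synthetic aperture ultrasound (SAU); the method uses self-contrastive learning to separate anatomical similarity from noise randomness in pyramid latent spaces. The clean image is then decoded from the anatomy space, while the noise space is discarded. A2A is trained at test time using the SAU signals of only one noisy sample, which fundamentally eliminates domain shift and pretraining cost. Simulation experiments, covering electronic noise levels from 0 to 30 dB and different inclusion geometries, demonstrated improvements of 69.3% in SNR and 34.4% in CNR with A2A. In vivo results showed SNR and CNR gains of 84.8% and 25.7%, respectively, using only two-aperture data of the heart in six echocardiographic views, the liver, and the kidney. A2A produces clear images and signals across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization and functional assessment with ultrasound.

Abstract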

The inherent electronic and speckle noise complicates clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions whose validity diminishes under composite noise conditions. Learning-based methods are usually pretrained in a limited image domain using a labeled dataset, which implies inevitable domain shift in complex in vivo environments. This study proposes a Pyramid Self-Contrastive Learning (PSCL) framework for test-time ultrasound image denoising without pretraining. Given multiple noisy samples from only one-shot imaging, PSCL disentangles anatomical similarity and noise randomness into separate pyramid latent spaces. The clean image is then decoded from the anatomy space while discarding the noise space. We first apply PSCL to synthetic aperture ultrasound (SAU), where an Aperture-to-Aperture loop serves as a self-supervised proxy task to ensure denoising fidelity. Simulation experiments, including noise levels from 0 to 30 dB and inclusion geometries from simple to complex, demonstrated improvements of 69.3% in SNR and 34.4% in CNR. The in vivo results showed 84.8% SNR and 25.7% CNR gains using only two aperture data of the heart in six echocardiographic views, liver, and kidney. PSCL delivers clear images across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization without domain shift and pretraining costs.

2602.11467 2026-06-18 cs.LG version update 90%

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

Yining Jiao, Sreekalyani Bhamidi, Carlton Jude Zdanski, Julia S Kimbell, Andrew Prince, Cameron P Worden, Samuel Kirse, Christopher Rutter, Benjamin H Shields, Jisan Mahmud, Marc Niethammer

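Affiliations Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA; Department of Computer Science, University of California San Diego, La Jolla, USA; School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA

Topic match Medical Imaging: interpretable shape modeling, uncertainty in anatomical structures

AI summary Proposes the PRISM framework, which combines implicit neural representations with uncertainty-aware statistical shape analysis and uses a closed-form Fisher Information metric for efficient local temporal uncertainty quantification; it performs strongly on shape evolution, personalized prediction, and anomaly detection tasks.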

Comments ICML 2026, camera-ready version, 24 pages

Abstract

Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.

2512.10353 2026-06-18 cs.CV version update 90%

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Yiheng Lyu, Lian Xu, Coen Arrow, Mohammed Bennamoun, Farid Boussaid, Girish Dwivedi

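Affiliations University of Western Australia; Harry Perkins Institute of Medical Research; National Imaging Facility; Fiona Stanley Hospital; Victor Chang Cardiac Research Institute

Topic match Medical Imaging: hybrid Transformer-Mamba for weakly supervised volumetric medical segmentation

AI summary Proposes TranSamba, a hybrid architecture that captures 3D context through cross-plane modeling and achieves efficient volumetric segmentation under weak supervision, with state-of-the-art performance on three datasets.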

Abstract

Weakly supervised segmentation enables model training from plane-level labels. Existing methods often rely on 2D encoders, neglecting the volumetric nature of medical data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context via cross-plane modeling. TranSamba augments a Vision Transformer backbone with Cross-Plane Mamba blocks, leveraging linear-time modeling for efficient information exchange across neighboring planes. This exchange improves in-plane self-attention and subsequent attention maps for object localization. TranSamba maintains linear time complexity and constant space complexity with respect to the input volume depth. Extensive experiments on three datasets covering diverse modalities and pathologies show that TranSamba achieves state-of-the-art performance, demonstrating the generalizable efficacy of cross-plane modeling. Code is available at: https://github.com/YihengLyu/TranSamba.

2606.03827 2026-06-18 cs.CV cs.AI version update 85%

Conditional Latent Diffusion Model with Fourier-based Motion Modelling for Virtual Population Synthesis

Shaokun Lan, Haoran Dou, Jinghan Huang, Arezoo Zakeri, Fengming Lin, Zherui Zhou, Jinming Duan, Alejandro F. Frangi

Affiliations Centre for Computational Imaging and Modelling in Medicine (CIMIM); University of Manchester; Christabel Pankhurst Institute; Department of Computer Science; Division of Informatics, Imaging & Data Sciences; Department of Electrical & Electronic Engineering; NIHR Manchester Biomedical Research Centre, Manchester Academic Health Sciences Centre, University of Manchester

Topic match Medical Imaging: cardiac mesh sequence generation, a medical imaging application

AI summary Proposes the 4D F-MeshLDM framework, which combines a convolutional mesh VAE, a truncated Fourier series motion parameterization, and a conditional diffusion prior to generate controllable 3D+t cardiac mesh sequences, outperforming baselines on UK Biobank data.

Comments This work has been early accepted by International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2026

Abstract

In-silico trials of medical devices require the generation of virtual populations of anatomies. In cardiovascular applications, virtual anatomy is typically represented as a 3D+t mesh sampled from a generative model. However, most existing mesh generators focus on static anatomy, while sequence models often lack explicit periodicity. To this end, we propose 4D F-MeshLDM, a conditional generative framework comprising a convolutional mesh VAE to encode meshes, a structural latent space that parameterises motion using a truncated Fourier series, and a diffusion prior that learns the latent distribution over Fourier coefficient tokens. By conditioning the diffusion process on clinical covariates via affine modulation, we enable controllable synthesis. Sampling tokens and performing inverse Fourier synthesis yield cycle-consistent latent trajectories, which can be decoded into 3D+t cardiac mesh sequences. Experiments on 5,000 UK Biobank subjects demonstrate that 4D F-MeshLDM outperforms state-of-the-art baselines in anatomical fidelity and achieves near-zero cycle closure error. Furthermore, the generated cohorts accurately preserve clinical functional indices, highlighting the potential of our framework for reliable in-silico cardiac trials.
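The near-zero cycle closure error follows from parameterising motion with a truncated Fourier series, since every harmonic returns to its starting value after one period. A minimal sketch for one latent dimension is below; the coefficients, the number of harmonics, and the frame count are hypothetical and not taken from the paper.

```python
import math


def fourier_trajectory(a0, cos_coeffs, sin_coeffs, phase):
    """Evaluate a truncated Fourier series at a cardiac phase in [0, 1]."""
    value = a0
    for k, (a_k, b_k) in enumerate(zip(cos_coeffs, sin_coeffs), start=1):
        angle = 2 * math.pi * k * phase
        value += a_k * math.cos(angle) + b_k * math.sin(angle)
    return value


# Hypothetical coefficients for one latent dimension, K = 3 harmonics.
a0, a, b = 0.2, [0.5, -0.1, 0.03], [0.3, 0.05, -0.02]

# Sample 21 frames so that the first and last frames are one full cycle apart.
frames = [fourier_trajectory(a0, a, b, t / 20) for t in range(21)]
closure_error = abs(frames[-1] - frames[0])
print(f"cycle closure error: {closure_error:.2e}")
```

Any closure error that remains in this sketch is floating-point rounding; a model that predicts frames independently has no such structural guarantee.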

2504.01527 2026-06-18 cs.CV eess.IV version update 85%

Beyond Nearest Neighbor Interpolation in Data Augmentation

Olivier Rukundo

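Affiliations Department of Electronic and Computer Engineering, University of Limerick

Topic match Medical Imaging: proposes an offline data augmentation pipeline that improves medical image segmentation performance

AI summary Proposes a modified geometric transformation function and a mean-based class filtering mechanism to avoid the annotation errors and low-pass filtering effects associated with nearest neighbor interpolation, and improves medical image segmentation performance through an offline data augmentation pipeline.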

Comments 10 pages, 11 figures, 14 tables

Abstract

Avoiding the risk of undefined categorical labels using nearest neighbor interpolation overlooks the risk of exacerbating pixel level annotation errors in augmented training data. Additionally, the inherent low pass filtering effects of interpolation algorithms exacerbate the risk of degrading high frequency structural details within annotated regions of interest. To avoid these risks, the author modified convolutional neural networks data transformation functions by incorporating a modified geometric transformation function, removing reliance on nearest neighbor interpolation, and integrating a mean-based class filtering mechanism to handle undefined categorical labels with alternative interpolation algorithms. The author also implemented an offline data augmentation pipeline to generate interpolation specific augmented training data, enabling quantitative assessment of interpolation specific low pass filtering effects on augmented training data. Experimental evaluation on three medical image segmentation datasets and the XBAT+ datasets demonstrated performance gains across multiple quantitative metrics.
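The "undefined categorical labels" problem can be seen on a single row of a label mask. The sketch below is a simplified 1D illustration, not the author's transformation function: `resample` is a hypothetical helper and the mask values are made up. It shows that nearest neighbor interpolation only ever returns existing class IDs, while linear interpolation can produce values that belong to no class.

```python
def resample(labels, factor, mode):
    """Upsample a 1D label row by `factor` using nearest or linear interpolation."""
    n_out = (len(labels) - 1) * factor + 1
    out = []
    for i in range(n_out):
        pos = i / factor
        lo = int(pos)
        hi = min(lo + 1, len(labels) - 1)
        frac = pos - lo
        if mode == "nearest":
            out.append(labels[lo] if frac < 0.5 else labels[hi])
        else:  # linear
            out.append((1 - frac) * labels[lo] + frac * labels[hi])
    return out


row = [0, 0, 3, 3]  # hypothetical mask row containing classes 0 and 3
valid = set(row)
nearest = resample(row, 2, "nearest")
linear = resample(row, 2, "linear")
print("nearest:", nearest, "undefined:", [v for v in nearest if v not in valid])
print("linear: ", linear, "undefined:", [v for v in linear if v not in valid])
```

Any non-nearest interpolation therefore needs a rule for mapping such in-between values back to valid classes, which is the role the abstract assigns to its mean-based class filtering mechanism.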

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME version update 70%

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Mame Diarra Toure, David A. Stephens

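Affiliations Department of Mathematics and Statistics

Topic match Medical Imaging: validates the method on selective prediction for diabetic retinopathy

AI summary To address the inability of epistemic uncertainty measures to distinguish between classes in safety-critical classification, decomposes mutual information into a per-class vector $C_k$ using a second-order Taylor expansion, with $1/\mu_k$ weighting to correct boundary suppression; its effectiveness is validated on diabetic retinopathy selective prediction, out-of-distribution detection, and a label-noise study.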

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026. https://openreview.net/forum?id=cxuWscJmAr

Abstract

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k=\mathbb{E}[p_k]$ and $\sigma_k^2=\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
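The per-class decomposition can be tried on a handful of posterior samples. The sketch below computes the per-class term (variance over twice the mean) and compares the sum with mutual information computed directly. The four softmax vectors are hypothetical, and the use of the population variance (dividing by the number of samples) is an assumption, since the abstract does not specify the estimator.

```python
import math


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(q * math.log(q) for q in p if q > 0)


def per_class_contributions(samples):
    """Return (C, mu) with C_k = var_k / (2 * mu_k) across posterior samples."""
    n, n_classes = len(samples), len(samples[0])
    mu = [sum(s[k] for s in samples) / n for k in range(n_classes)]
    var = [sum((s[k] - mu[k]) ** 2 for s in samples) / n for k in range(n_classes)]
    return [v / (2 * m) for v, m in zip(var, mu)], mu


def mutual_information(samples):
    """MI = entropy of the mean prediction minus the mean per-sample entropy."""
    _, mu = per_class_contributions(samples)
    return entropy(mu) - sum(entropy(s) for s in samples) / len(samples)


# Hypothetical softmax outputs from four posterior draws for one input.
samples = [
    [0.70, 0.20, 0.10],
    [0.60, 0.25, 0.15],
    [0.65, 0.30, 0.05],
    [0.75, 0.15, 0.10],
]
c_k, _ = per_class_contributions(samples)
mi = mutual_information(samples)
print("C_k:", [round(c, 4) for c in c_k])
print("sum C_k:", round(sum(c_k), 4), "MI:", round(mi, 4))
```

For these samples the sum and MI agree only approximately, as expected from a second-order expansion; the per-class vector additionally shows which classes carry the disagreement between posterior draws.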

2. Diagnostic Assistance (4 papers)

2511.05221 2026-06-18 cs.LG q-bio.NC version update 95%

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

Affiliations Faculty of Mathematics and Natural Sciences, University of Cologne, Germany; Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany; Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany; German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany; Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany; Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany; Department of Informatics, Technical University of Munich, Germany; Institute for Digital Medicine, University Hospital Bonn, Germany; Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark; Department of Nuclear Medicine, Aarhus University Hospital, Denmark; Department of Electrical and Computer Engineering, Aarhus University, Denmark; Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK

Topic match Diagnostic Assistance: screening for REM sleep behavior disorder from actigraphy, classified as diagnostic assistance

AI summary Proposes ActiTect, a fully automated, open-source machine learning tool that identifies RBD from actigraphy recordings using standardized preprocessing and sleep-wake detection; generalization is validated across multiple independent cohorts (AUROC 0.84-0.94).

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

Journal ref npj Digital Medicine (2026)

Abstract

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $α$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.

2605.21528 2026-06-18 cs.LG cs.AI version update 85%

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

Rui Huang, Lican Huang

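Affiliations School of Basic Medicine, Hangzhou Normal University; Research Department, Hangzhou Domain Zones Technology Co. Ltd.

Topic match Diagnostic Assistance: AutoML framework for healthcare risk prediction, classified as diagnostic assistance

AI summary Proposes a reproducible, log-driven AutoML framework for interpretable pipeline optimization in healthcare risk prediction; analyzing component attribution, interactions, and redundancy improves model performance and stability.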

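AI-generated abstract

Accurate and reproducible disease risk prediction remains challenging because of heterogeneous features, limited samples, and severe class imbalance. This study introduces yvsoucom-iterkit, a deterministic, log-driven automated machine learning framework that models pipeline optimization as a fully reproducible configuration-level system. Each pipeline is encoded as a traceable log entity, enabling analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space in which performance is determined by a small subset of interacting components. Random forest importance analysis shows that augmentation (0.454), model selection (0.198), and imbalance handling (0.101) are the key drivers on the Pima dataset, whereas imbalance handling dominates on Stroke (0.406). Component similarity analysis shows strong redundancy: the feature selection variants (biMax-biMean) exhibit a low RMS distance (0.0252), mix augmentation closely matches no augmentation (0.0279), and TomekLinks aligns with no imbalance handling (0.0325), whereas Gaussian noise differs more from no augmentation (0.10). The framework achieves strong and stable performance with ensemble models (Weighted-F1 of 0.89 and Macro-F1 of 0.88 on Pima; Weighted-F1 of 0.94 on Stroke), while Macro-F1 on Stroke is lower (0.67) because of class imbalance. Cross-seed analysis reveals a performance-robustness trade-off, with ensemble models varying less than SVM. These results indicate that effective AutoML optimization can focus on a small set of high-impact components.

Abstract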

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.
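The gap between Weighted-F1 and Macro-F1 on the imbalanced Stroke data is what the two averaging schemes would predict. The sketch below illustrates the effect with hypothetical confusion counts (950 negatives, 50 positives); the counts are invented for illustration and are not results from the paper.

```python
def f1(tp, fp, fn):
    """F1 score for one class from true positive, false positive, false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(per_class):
    """per_class maps label -> (tp, fp, fn); the support of a class is tp + fn."""
    scores = {c: f1(*counts) for c, counts in per_class.items()}
    support = {c: tp + fn for c, (tp, fp, fn) in per_class.items()}
    total = sum(support.values())
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] / total for c in scores)
    return macro, weighted


# Hypothetical binary confusion counts: 950 negatives, 50 positives.
counts = {
    "no_stroke": (930, 35, 20),  # tp, fp, fn for the majority class
    "stroke": (15, 20, 35),      # tp, fp, fn for the minority class
}
macro, weighted = macro_and_weighted_f1(counts)
print(f"macro-F1={macro:.3f} weighted-F1={weighted:.3f}")
```

Weighted-F1 is dominated by the majority class, so it stays high even when the minority class is poorly detected, while Macro-F1 gives both classes equal weight and exposes the weak minority-class score.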

2603.15988 2026-06-18 eess.AS cs.AI cs.LG version update 85%

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

无中生有:面向构音障碍语音严重程度鲁棒估计的数据增强

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

Affiliations 1 University of Illinois Urbana-Champaign, IL, USA; 2 Korea Advanced Institute of Science & Technology, KR

Topic match Diagnostic assistance: dysarthric speech quality assessment, used for clinical diagnosis

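AI summary Proposes a three-stage framework that uses unlabeled dysarthric speech and typical speech datasets: a teacher model generates pseudo-labels, followed by pretraining with label-aware contrastive learning and then fine-tuning. It reaches an average SRCC of 0.761 on five unseen datasets, significantly outperforming existing methods.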

Comments Accepted to Interspeech 2026 Long Paper Track

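Details
AI abstract (translated from Chinese)

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and hard to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that uses unlabeled dysarthric speech and large-scale typical speech datasets to scale up training. A teacher model first generates pseudo-labels for the unlabeled samples; weakly supervised pretraining with a label-aware contrastive learning strategy then exposes the model to diverse speakers and acoustic conditions. The pretrained model is subsequently fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of the approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 on the unseen test datasets.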

English abstract

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
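
The headline metric here, SRCC (Spearman rank correlation coefficient), measures how well the ordering of predicted scores agrees with the ordering of reference severity ratings, regardless of scale. The sketch below is a minimal standard-library version that computes Pearson correlation on average ranks. The severity ratings and predicted scores are invented for illustration and do not come from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented example: clinician severity ratings vs. model scores.
severity = [1, 2, 3, 4, 5]
predicted = [0.2, 0.1, 0.5, 0.9, 0.7]
srcc = spearman(severity, predicted)
print(f"SRCC = {srcc:.3f}")
```

Because only ranks matter, any monotonic rescaling of the model scores leaves the SRCC unchanged, which makes it a natural choice when datasets use different severity scales.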

2509.14183 2026-06-18 stat.ME stat.AP version update 70%

Index Date Imputation for Survival Analysis in Externally Controlled Trials with Delayed Treatment Initiation

延迟治疗启动的外部对照试验中生存分析的索引日期插补

Q. Le Coent, G. L. Rosner, M-C. Wang, C. Hu

Topic match Diagnostic assistance: index date imputation method for externally controlled trials

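AI summary Addresses the index-date misalignment caused by delayed treatment initiation in externally controlled trials by proposing truncation-aware Index Date Imputation (IDI), combined with propensity score weighting to correct for confounding. Simulations and real data confirm its effectiveness in reducing bias.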

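Details
AI abstract (translated from Chinese)

Externally controlled trials compare outcomes from a single-arm trial with external controls drawn from historical trials, registries, or observational studies. For time-to-event endpoints, a key challenge is that follow-up in the single-arm trial is indexed at treatment initiation, whereas external-control data are indexed at an earlier clinical milestone such as diagnosis or relapse. This misalignment can introduce immortal time bias, distort risk sets, and complicate the interpretation of survival comparisons. We propose Index Date Imputation (IDI), a truncation-aware method for imputing comparable index dates for external-control patients in settings with delayed treatment initiation. IDI estimates the marginal distribution of treatment-initiation times in the target single-arm population while accounting for the fact that initiation times are observed only among patients who survive long enough to start treatment. The imputed index dates are then used to align follow-up and to enforce comparable truncation conditions in the external-control cohort. Because temporal alignment alone does not resolve population-level confounding, IDI is combined with propensity score weighting or matching to improve covariate comparability between cohorts. We evaluate the finite-sample performance of the proposed method through Monte Carlo simulation studies. Using data from a randomized oncology trial, we emulate an externally controlled analysis with induced index-date misalignment and show that IDI reduces the discrepancy from the randomized-trial benchmark. IDI offers a practical strategy for index-date alignment in survival analyses involving delayed treatment initiation and can be integrated with standard covariate-adjustment methods when suitable external controls are available.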

English abstract

Externally controlled trials compare outcomes from a single-arm trial with external controls drawn from historical trials, registries, or observational studies. For time-to-event endpoints, a key challenge arises when follow-up is indexed at treatment initiation in the single-arm trial, but the external-control data are indexed at an earlier clinical milestone, such as diagnosis or relapse. This misalignment can induce immortal time bias, distort risk sets, and complicate the interpretation of survival comparisons. We propose Index Date Imputation (IDI), a truncation-aware approach for imputing comparable index dates for external-control patients in settings with delayed treatment initiation. IDI estimates the marginal distribution of treatment-initiation times in the target single-arm population while accounting for the fact that initiation times are observed only among patients who survive long enough to initiate treatment. The imputed index dates are then used to align follow-up and enforce comparable truncation conditions in the external-control cohort. Because temporal alignment alone does not address population-level confounding, IDI is combined with propensity score weighting or matching to improve covariate comparability between cohorts. We evaluate the finite-sample performance of the proposed approach through Monte Carlo simulation studies. Using data from a randomized oncology trial, we emulate an externally controlled analysis with induced index-date misalignment and show that IDI reduces discrepancy from the randomized trial benchmark. IDI provides a practical strategy for index-date alignment in survival analyses involving delayed treatment initiation and can be integrated with standard covariate-adjustment methods when suitable external controls are available.
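
The alignment step can be pictured with a deliberately simplified sketch: draw a treatment-initiation delay for each external-control patient, drop patients whose follow-up ended before that imputed date (the truncation condition), and restart the clock at the imputed index date. This is not the authors' estimator. It resamples the observed delays directly, so it ignores the survival-selection of those delays that IDI is designed to handle, and it leaves out the propensity score step. The function name, data layout, and all numbers are hypothetical.

```python
import random

def align_external_controls(controls, observed_delays, seed=0):
    """Re-index external controls at an imputed treatment-initiation date.

    controls: list of (time_from_milestone, event) pairs, with time measured
        from the earlier milestone (e.g. diagnosis) and event = 1 for an
        observed event, 0 for censoring.
    observed_delays: milestone-to-initiation delays from the single-arm trial.
    Returns a list of (time_from_imputed_index, event, imputed_delay).
    """
    rng = random.Random(seed)
    aligned = []
    for time_from_milestone, event in controls:
        # Hypothetical imputation: resample one observed delay at random.
        delay = rng.choice(observed_delays)
        if time_from_milestone <= delay:
            # Follow-up ended before the imputed index date, so this patient
            # could not have started treatment; exclude (left truncation).
            continue
        aligned.append((time_from_milestone - delay, event, delay))
    return aligned

# Invented external-control records: (months since diagnosis, event flag).
controls = [(2.0, 1), (10.0, 1), (5.0, 0), (0.5, 1)]
aligned = align_external_controls(controls, observed_delays=[1.0, 2.0, 4.0])
for time_from_index, event, delay in aligned:
    print(time_from_index, event, delay)
```

Without the exclusion step, external controls who died before any plausible initiation date would remain in the comparison, while no such patients can exist in the single-arm trial. That asymmetry is the source of the immortal time bias the abstract describes.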

3. Clinical large models (1 paper)

2605.10840 2026-06-18 cs.LG cs.AI q-bio.QM version update 85%

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Clin-JEPA:一种多阶段协同训练框架,用于EHR患者轨迹的联合嵌入预测预训练

Yixuan Yang, Mehak Arora, Ryan Zhang, Baraa Abed, Junseob Kim, Tilendra Choudhary, Md Hassanuzzaman, Kevin Zhu, Ayman Ali, Chengkun Yang, Alasdair Edward Gent, Victor Moas, Rishikesan Kamaleswaran

Affiliations Duke University

Topic match Clinical large models: proposes the Clin-JEPA framework for pretraining on EHR patient trajectories.

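AI summary Proposes the Clin-JEPA framework, which uses multi-phase pretraining to stably co-train the encoder and predictor, addressing the challenges of joint-embedding prediction on EHR data and delivering strong performance across multiple downstream tasks.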

Comments 16 pages, 4 figures, 8 tables. Code: https://github.com/YeungYathin/Clin-JEPA

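Details
AI abstract (translated from Chinese)

We introduce Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending them to EHR data, to obtain a single backbone that both forecasts patient trajectories and serves multiple downstream risk-prediction tasks, remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train the predictor on a frozen pretrained encoder (V-JEPA 2-AC), so the encoder is unaware of the rollout signal that the predictor must use at inference. Co-training the encoder and predictor under a shared JEPA prediction objective would provide this grounding, but naive co-training is unstable: representation collapse and online/target drift cause autoregressive rollouts to diverge. Clin-JEPA's five-phase pretraining curriculum (predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization) addresses each failure mode phase by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons, whereas baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts are displaced 4.83$\times$ further than stable patients in latent space, versus $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines in multi-task downstream evaluation. Clin-JEPA achieves a mean AUROC of 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 over the baseline average).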

English abstract

We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data -- to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning -- remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum -- predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization -- addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).
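
Two ingredients named in the abstract, an EMA-updated target encoder and an $\ell_1$ distance in latent space, can be sketched generically. The code below shows the standard exponential-moving-average update used in JEPA-style training and a mean absolute difference between two vectors. The decay value, the toy weight vectors, and the function names are assumptions. The abstract does not specify Clin-JEPA's actual schedule or how its drift percentages are computed, so none of this should be read as the paper's implementation.

```python
def ema_update(target, online, decay=0.99):
    """Move each target weight a small step toward the matching online weight."""
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]

def mean_l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Toy stand-ins for encoder weights (invented values).
online = [1.0, -2.0, 0.5]   # "online" encoder, held fixed in this demo
target = [0.0, 0.0, 0.0]    # "target" encoder, updated only through EMA

gaps = []
for step in range(5):
    target = ema_update(target, online, decay=0.9)
    gaps.append(mean_l1(target, online))

# With a fixed online encoder the gap shrinks by the decay factor each step.
print([round(g, 4) for g in gaps])
```

A decay close to 1 keeps the target slow-moving, which is the usual way to give the predictor a stable regression target. A hard sync, as listed among the five phases, would correspond to copying the online weights into the target outright.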