SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis
SDPM:用于连续时间生存分析的生存扩散概率模型
Stanislav R. Kirpichenko, Andrei V. Konstantinov, Lev V. Utkin
AI总结 本文提出SDPM,一种用于连续时间生存分析的生成模型,通过去噪扩散模型建模生存结果的条件分布,避免了对事件时间分布的参数假设,并在变换的目标空间中使用标准化对数时间和连续高斯混合表示来表示删失指示符,从而在多个真实生存数据集上取得了竞争力的预测性能。
详情
生存分析旨在从具有删失观测的数据中估计时间到事件的分布。许多现有方法要么对危险函数施加结构假设,要么离散化时间轴,这可能会限制灵活性并引入近似误差。我们提出了生存扩散概率模型(SDPM),一种用于连续时间生存分析的生成方法。SDPM利用去噪扩散模型建模生存结果的条件分布,该分布由观测时间和删失指示符表示,即P(T,δ|X)。在假设条件独立删失的情况下,模型生成的条件样本可以通过Kaplan-Meier估计器转换为生存函数估计。该公式避免了对事件时间分布的参数假设,并不需要对输出时间空间进行离散化。模型在变换的目标空间中运行,使用标准化对数时间和连续高斯混合表示来表示删失指示符。我们评估了SDPM在十个真实生存数据集上的性能,并将其与五个强大的基线模型进行了比较,包括基于树、提升和神经生存模型。结果表明,SDPM在C指数、整合时间依赖AUC和整合Brier分数上均取得了竞争性的预测性能。对合成Cox-Weibull数据的分析表明,当生成足够多的样本时,SDPM能够比强大的非参数基线更准确地恢复潜在连续生存分布的形状。消融研究证实了所提出的目标空间变换的重要性,这些变换提高了事件率校准、减少了无效生成时间并提供了预测判别的一致增益。实现所提出模型的代码已公开可用。
Survival analysis aims to estimate a time-to-event distribution from data with censored observations. Many existing methods either impose structural assumptions on the hazard function or discretize the time axis, which may limit flexibility and introduce approximation errors. We propose the Survival Diffusion Probabilistic Model (SDPM), a generative approach to continuous-time survival analysis. SDPM models the conditional distribution of the survival outcome, represented by the pair of observed time and censoring indicator, $\mathbb{P}(T,δ\mid \mathbf{x})$, using a denoising diffusion model. Under the assumption of conditionally independent censoring, conditional samples generated by the model can be transformed into survival function estimates using the Kaplan-Meier estimator. This formulation avoids parametric assumptions on the event-time distribution and does not require a discretization of the output time space. The model operates in a transformed target space, using standardized log-times and a continuous Gaussian-mixture representation of the censoring indicator. We evaluate SDPM on ten real survival datasets and compare it with five strong baselines, including tree-based, boosting-based, and neural survival models. Results show that SDPM achieves competitive predictive performance across C-index, integrated time-dependent AUC, and integrated Brier score. A study on synthetic Cox-Weibull data demonstrates that SDPM can recover the shape of an underlying continuous survival distribution more accurately than a strong nonparametric baseline when sufficiently many samples are generated. An ablation study confirms the importance of the proposed target-space transformations, which improve event-rate calibration, reduce invalid generated times, and provide consistent gains in predictive discrimination. Codes implementing the proposed model are publicly available.