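arXivDaily: arXiv daily academic digest, updated Monday through Friday

Science and Medicine

Medical AI

Medical intelligence, clinical AI, medical imaging, pathology, diagnosis, and healthcare large models.

Indexed for today/current date: 8. Signal sources: cs.CV, cs.LG, q-bio, eess.IV, eess.SP
2511.05221 2026-06-18 cs.LG q-bio.NC Version update 95%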

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

Affiliations: Faculty of Mathematics and Natural Sciences, University of Cologne, Germany; Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany; Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany; German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany; Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany; Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany; Department of Informatics, Technical University of Munich, Germany; Institute for Digital Medicine, University Hospital Bonn, Germany; Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark; Department of Nuclear Medicine, Aarhus University Hospital, Denmark; Department of Electrical and Computer Engineering, Aarhus University, Denmark; Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK

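Topic match: Diagnostic assistance. Screening for REM sleep behavior disorder from actigraphy recordings counts as diagnostic assistance.

AI summary: Proposes ActiTect, a fully automated open-source machine learning tool that identifies RBD from actigraphy recordings using standardized preprocessing and sleep-wake detection; generalization was validated on multiple independent cohorts (AUROC 0.84-0.94).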

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

Journal ref npj Digital Medicine (2026)

Details
English abstract

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $α$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
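The evaluation protocol described in this abstract (AUROC under leave-one-dataset-out cross-validation) can be sketched as follows. This is a minimal illustration, not the ActiTect code: the function names `auroc` and `leave_one_dataset_out` and the dictionary-of-cohorts layout are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_dataset_out(
    cohorts: Dict[str, List],
) -> Iterator[Tuple[str, List, List]]:
    """Yield (held_out_name, train_rows, test_rows), holding out one cohort per fold."""
    for held_out in cohorts:
        train = [
            row
            for name, rows in cohorts.items()
            if name != held_out
            for row in rows
        ]
        yield held_out, train, list(cohorts[held_out])
```

In each fold a model would be fit on `train_rows` and scored with `auroc` on the held-out cohort, so every reported value reflects data from a site the model never saw during training.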

2410.23503 2026-06-18 cs.LG 90%

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

Santino Nanini, Mariem Abid, Yassir Mamouni, Arnaud Wiedemann, Philippe Jouvet, Stephane Bourassa

Affiliations: SADC-CDSS IA PEDIATRICS, CHU Sainte-Justine, Montreal, Canada; Solutions Applicare AI Inc., Montreal, Canada; Université de Montréal, Canada; MEDINT CBRNE Group, Montreal, Canada

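Topic match: Diagnostic assistance. Machine learning models predict hypoxemia severity for triage.

AI summary: Develops machine learning models to predict hypoxemia severity during emergency triage, using physiological data to improve prediction accuracy. GBMs outperform sequential models in training speed and interpretability; future work will integrate multi-hospital data to improve generalizability.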

Comments 12 figures, 12 tables and 39 pages

Journal ref Diagnostics 14 (2024) 2763

Details
English abstract

This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data, class imbalances, and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as Gradient Boosting Models were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.
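The abstract mentions minute-level interpolation and mask features. One plausible reading, shown here purely as an assumption, is that each signal is resampled onto a 1-minute grid and every filled-in value carries a flag so that a model can tell measured from imputed points. The function name `resample_to_minutes` and the row layout are hypothetical.

```python
from typing import List, Tuple


def resample_to_minutes(
    samples: List[Tuple[float, float]], start: int, end: int
) -> List[Tuple[int, float, int]]:
    """Resample (time_in_minutes, value) readings onto a 1-minute grid.

    Returns (minute, value, mask) rows. mask is 0 for a measured value and
    1 for a value filled by linear interpolation or by holding an edge value.
    """
    samples = sorted(samples)
    if not samples:
        raise ValueError("need at least one reading")
    # Readings that fall exactly on a whole minute are kept as measured.
    measured = {int(t): v for t, v in samples if float(t).is_integer()}
    rows = []
    for minute in range(start, end + 1):
        if minute in measured:
            rows.append((minute, measured[minute], 0))
            continue
        before = [(t, v) for t, v in samples if t < minute]
        after = [(t, v) for t, v in samples if t > minute]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            value = v0 + (v1 - v0) * (minute - t0) / (t1 - t0)
        else:
            # Outside the observed range: hold the nearest reading.
            value = (before[-1] if before else after[0])[1]
        rows.append((minute, value, 1))
    return rows
```

For example, two SpO2 readings at minutes 0 and 4 produce three interpolated rows in between, each with mask 1, which a tree-based model can then use as an explicit feature.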

2606.19140 2026-06-18 cs.LG New submission 85%

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

Hugo Miccinilli, Theo Di Piazza

Affiliations: Université Paris-Saclay, CentraleSupélec, MICS, France; University of Lyon, INSA Lyon, CREATIS, France

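Topic match: Diagnostic assistance. A multimodal survival analysis framework for head and neck cancer prediction.

AI summary: Proposes ChronoSurv, a directed-graph framework for multimodal survival analysis that models clinical trajectories through a hierarchical topology and heterogeneous message passing, achieving the best discriminative performance with reliable calibration on head and neck cancer datasets.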

Comments Accepted at MICCAI 2026. Submitted version due to embargo

Details
English abstract

Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.
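The abstract does not name the metric behind "discriminative performance". In survival analysis this is commonly Harrell's concordance index, so the sketch below shows that metric as an assumption about the evaluation, not as the authors' code.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when the subject with the shorter time had an
    observed event (events[i] == 1). The pair is concordant when that subject
    also has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk. Calibration, which the abstract reports separately, is not captured by this number.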

2606.18571 2026-06-18 cs.LG cs.CL cs.SD eess.AS New submission 85%

Fair Cognitive Impairment Detection Through Unlearning

William Nguyen, Jiali Cheng, Hadi Amiri

Affiliations: University of Massachusetts Lowell, USA

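Topic match: Diagnostic assistance. A multimodal framework for fair detection of mild cognitive impairment.

AI summary: Proposes a multimodal framework that combines cross-modal fusion with gradient-reversal unlearning to reduce the influence of demographic information on mild cognitive impairment detection, narrowing performance gaps on multilingual datasets.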

Comments Interspeech 2026

Details
English abstract

Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening. However, learned models often exploit demographic cues correlated with labels, resulting in a large performance gap across subgroups. We present a multimodal framework that combines (i) cross-model fusion between modalities (speech, text, and image), and (ii) unlearning using gradient reversal that discourages the shared embedding from encoding task-irrelevant demographic attributes. Evaluated on the multilingual benchmarks TAUKADIAL and PREPARE, our method outperforms the state-of-the-art multilingual and multimodal baseline in MCI classification while substantially reducing the performance gap across patient subgroups (sex and language). We further analyze transfer across datasets, showing that demographic unlearning helps learn more robust representations for MCI detection.
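Gradient reversal, the mechanism named in this abstract, can be illustrated without a deep learning library. The layer is the identity in the forward pass and flips the sign of the gradient in the backward pass, so the encoder is pushed to make the demographic classifier worse while that classifier itself still trains normally. The class and the `encoder_gradient` helper below are illustrative names, and `lam` is an assumed trade-off weight.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam

    def forward(self, x: List[float]) -> List[float]:
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        return [-self.lam * g for g in grad_output]


def encoder_gradient(
    task_grad: List[float], demo_grad: List[float], lam: float
) -> List[float]:
    """Gradient reaching the shared embedding.

    task_grad: gradient of the MCI classification loss w.r.t. the embedding.
    demo_grad: gradient of the demographic loss w.r.t. the embedding,
               which passes through the reversal layer before being added.
    """
    reversed_grad = GradientReversal(lam).backward(demo_grad)
    return [t + r for t, r in zip(task_grad, reversed_grad)]
```

With `lam = 0` the demographic head has no effect on the encoder; larger values trade task accuracy for embeddings that carry less demographic information.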

2606.15973 2026-06-18 eess.SP New submission 85%

An auscultation location specific study on the relationship between expiratory-to-inspiratory acoustic patterns and spirometric airflow limitation across age and gender in asthmatic patients

Dheeraj Harish Kumar, Sanjana M C, Perumal Keerthi Priya, K V Nikhath Khanam, Uma Maheshwari Krishnaswamy, Prasanta Kumar Ghosh

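Topic match: Diagnostic assistance. Respiratory sound analysis to support asthma diagnosis; medical AI.

AI summary: Analyzing respiratory sound spectra from 141 asthma patients, this study finds that the expiratory-to-inspiratory acoustic power ratio correlates significantly with FEV1/FVC in the 100-400 Hz range, and that the correlation varies with auscultation site, age, and gender.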

Details
English abstract

Asthma causes expiratory airflow limitation and is clinically assessed using spirometry, which provides the FEV1/FVC ratio representing the proportion of air exhaled in the first second relative to total forced vital capacity. Prior studies suggest that respiratory sounds recorded at posterior sites (Left Lower, Left Upper, Right Upper, Right Lower) reflect regional airflow patterns. In this study, we investigate the relationship between the expiratory-to-inspiratory (E/I) spectral power ratio and FEV1/FVC in 141 participants aged 20-60 years using Spearman correlation across frequency subbands. The 100-200 Hz and 200-400 Hz bands showed significant correlations. Overall, lower posterior sites showed stronger associations; younger adults showed stronger correlations at the Left Lower site, whereas older adults showed stronger correlations at the Left Upper site. Gender-stratified analysis showed stronger Left Lower correlations in males and stronger Left Upper correlations in females.
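The E/I spectral power ratio in a frequency subband can be sketched as below. The abstract does not give the exact estimator, so the length normalization, the decibel scale, and the function names are assumptions; a direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import Sequence


def band_power(signal: Sequence[float], fs: float, lo: float, hi: float) -> float:
    """Length-normalized power of `signal` in the band lo <= f < hi (Hz).

    Uses a direct DFT, which is slow but fine for short clips.
    """
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if lo <= freq < hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power += abs(coeff) ** 2
    # Dividing by n^2 makes segments of different lengths comparable.
    return power / (n * n)


def ei_ratio_db(
    expiratory: Sequence[float],
    inspiratory: Sequence[float],
    fs: float,
    lo: float,
    hi: float,
) -> float:
    """Expiratory-to-inspiratory band-power ratio, in decibels."""
    p_exp = band_power(expiratory, fs, lo, hi)
    p_insp = band_power(inspiratory, fs, lo, hi)
    if p_exp <= 0.0 or p_insp <= 0.0:
        raise ValueError("band power must be positive in both phases")
    return 10.0 * math.log10(p_exp / p_insp)
```

One ratio per participant, site, and subband (for instance 100-200 Hz) would then be rank-correlated with that participant's FEV1/FVC.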

2605.21528 2026-06-18 cs.LG cs.AI Version update 85%

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

Rui Huang, Lican Huang

Affiliations: School of Basic Medicine, Hangzhou Normal University; Research Department, Hangzhou Domain Zones Technology Co. Ltd.

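Topic match: Diagnostic assistance. An AutoML framework for healthcare risk prediction counts as diagnostic assistance.

AI summary: Proposes a reproducible, log-driven automated machine learning framework for interpretable pipeline optimization in healthcare risk prediction; analyzing component attribution, interactions, and redundancy improves model performance and stability.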

Details
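AI abstract (translated from Chinese)

Accurate and reproducible disease risk prediction remains challenging because of heterogeneous features, limited samples, and severe class imbalance. This study introduces yvsoucom-iterkit, a deterministic, log-driven automated machine learning framework that models pipeline optimization as a fully reproducible configuration-level system. Each pipeline is encoded as a traceable log entity, enabling analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space in which performance is determined by a small subset of interacting components. Random forest importance analysis identifies augmentation (0.454), model selection (0.198), and imbalance handling (0.101) as the key drivers on Pima, whereas imbalance handling dominates on Stroke (0.406). Component similarity analysis shows strong redundancy: the feature selection variants (biMax-biMean) have a low RMS distance (0.0252), mixup closely matches no augmentation (0.0279), and TomekLinks aligns with no imbalance handling (0.0325), whereas Gaussian noise differs more from no augmentation (0.10). The framework achieves strong and stable performance with ensemble models (Weighted-F1 0.89 and Macro-F1 0.88 on Pima; Weighted-F1 0.94 on Stroke), while Macro-F1 on Stroke is lower (0.67) because of class imbalance. Cross-seed analysis reveals a performance-robustness trade-off, with ensemble models showing lower variability than SVM. These results suggest that effective AutoML optimization can focus on a small set of high-impact components.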

English abstract

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.
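The gap between Weighted-F1 and Macro-F1 reported for the Stroke dataset is what severe class imbalance tends to produce. The sketch below uses synthetic labels (not data from the paper) to show how the two averages diverge when a rare class is mostly missed; the function names are illustrative.

```python
from typing import Dict, Sequence, Tuple


def per_class_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[int, Tuple[float, int]]:
    """Map each class in y_true to (F1 score, support)."""
    result = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        result[c] = (f1, tp + fn)
    return result


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Support-weighted mean of per-class F1: large classes dominate."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Synthetic example: 95 negatives, 5 positives, only 1 positive detected.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1] + [0] * 4
```

On this synthetic split the majority class keeps the weighted average high while the poorly detected minority class pulls the macro average down, which is why imbalance-aware steps such as oversampling mainly show up in Macro-F1.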

2603.15988 2026-06-18 eess.AS cs.AI cs.LG Version update 85%

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

Affiliations: 1 University of Illinois Urbana-Champaign, IL, USA; 2 Korea Advanced Institute of Science & Technology, KR

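Topic match: Diagnostic assistance. Dysarthric speech quality assessment for clinical diagnosis.

AI summary: Proposes a three-stage framework that uses unlabeled dysarthric speech and typical speech datasets: a teacher model generates pseudo-labels, followed by label-aware contrastive pretraining and fine-tuning. It reaches an average SRCC of 0.761 on five unseen datasets, significantly outperforming existing methods.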

Comments Accepted to Interspeech 2026 Long Paper Track

Details
English abstract

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
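The headline number is an average Spearman rank correlation coefficient (SRCC) over unseen test datasets. The sketch below computes SRCC with average ranks for ties and then takes a plain mean across datasets; the unweighted mean is an assumption, since the abstract does not say how the average is formed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("SRCC is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def mean_srcc(
    datasets: Dict[str, Tuple[Sequence[float], Sequence[float]]],
) -> float:
    """Unweighted mean of per-dataset SRCC over (predictions, ratings) pairs."""
    return sum(srcc(pred, true) for pred, true in datasets.values()) / len(datasets)
```

Because SRCC depends only on ordering, a predictor can score well even when its severity scale differs from the clinicians' rating scale, which suits comparison across datasets with different rating schemes.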

2509.14183 2026-06-18 stat.ME stat.AP Version update 70%

Index Date Imputation for Survival Analysis in Externally Controlled Trials with Delayed Treatment Initiation

Q. Le Coent, G. L. Rosner, M-C. Wang, C. Hu

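Topic match: Diagnostic assistance. An index date imputation method for externally controlled trials.

AI summary: To address index-date misalignment caused by delayed treatment initiation in externally controlled trials, proposes truncation-aware Index Date Imputation (IDI), combined with propensity score weighting to correct for confounding; simulations and real data show that it reduces bias.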

Details
English abstract

Externally controlled trials compare outcomes from a single-arm trial with external controls drawn from historical trials, registries, or observational studies. For time-to-event endpoints, a key challenge arises when follow-up is indexed at treatment initiation in the single-arm trial, but the external-control data are indexed at an earlier clinical milestone, such as diagnosis or relapse. This misalignment can induce immortal time bias, distort risk sets, and complicate the interpretation of survival comparisons. We propose Index Date Imputation (IDI), a truncation-aware approach for imputing comparable index dates for external-control patients in settings with delayed treatment initiation. IDI estimates the marginal distribution of treatment-initiation times in the target single-arm population while accounting for the fact that initiation times are observed only among patients who survive long enough to initiate treatment. The imputed index dates are then used to align follow-up and enforce comparable truncation conditions in the external-control cohort. Because temporal alignment alone does not address population-level confounding, IDI is combined with propensity score weighting or matching to improve covariate comparability between cohorts. We evaluate the finite-sample performance of the proposed approach through Monte Carlo simulation studies. Using data from a randomized oncology trial, we emulate an externally controlled analysis with induced index-date misalignment and show that IDI reduces discrepancy from the randomized trial benchmark. IDI provides a practical strategy for index-date alignment in survival analyses involving delayed treatment initiation and can be integrated with standard covariate-adjustment methods when suitable external controls are available.
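The alignment step can be illustrated with a deliberately simplified sketch. It draws each control's delay from the raw observed initiation delays, whereas IDI as described estimates the delay distribution while accounting for truncation, so this is not the authors' estimator. The function name and data layout are hypothetical, and the propensity score step is omitted.

```python
import random
from typing import List, Tuple


def realign_external_controls(
    controls: List[Tuple[float, int]],
    initiation_delays: List[float],
    rng: random.Random,
) -> List[Tuple[float, int]]:
    """Re-index external controls from the milestone date to an imputed index date.

    controls: (follow_up_time_from_milestone, event_indicator) per patient.
    initiation_delays: milestone-to-treatment delays seen in the single-arm trial.

    A control is kept only if still under follow-up at the imputed delay,
    mirroring the fact that treated patients had to survive until initiation.
    Follow-up for kept controls is then measured from the imputed index date.
    """
    aligned = []
    for follow_up, event in controls:
        delay = rng.choice(initiation_delays)
        if follow_up > delay:
            aligned.append((follow_up - delay, event))
    return aligned
```

Dropping controls whose follow-up ends before the imputed delay removes the period during which treated patients could not, by construction, have had an event, which is the source of immortal time bias in the unaligned comparison.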