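arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday

Science & Medicine

Medical AI

Medical intelligence, clinical AI, medical imaging, pathology, diagnosis, and large models for healthcare.

Today (current date): 62 papers collected. Sources: cs.CV, cs.LG, q-bio, eess.IV, eess.SP

1. Medical Imaging (17 papers)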

2512.10353 2026-06-18 cs.CV Revised 90%

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Yiheng Lyu, Lian Xu, Coen Arrow, Mohammed Bennamoun, Farid Boussaid, Girish Dwivedi

Affiliations: University of Western Australia; Harry Perkins Institute of Medical Research; National Imaging Facility; Fiona Stanley Hospital; Victor Chang Cardiac Research Institute

Topic match (Medical Imaging): hybrid Transformer-Mamba for weakly supervised volumetric medical segmentation

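AI summary: Proposes TranSamba, a hybrid architecture that captures 3D context through cross-plane modeling, enabling efficient volumetric segmentation under weak supervision and achieving state-of-the-art performance on three datasets.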

Abstract

Weakly supervised segmentation enables model training from plane-level labels. Existing methods often rely on 2D encoders, neglecting the volumetric nature of medical data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context via cross-plane modeling. TranSamba augments a Vision Transformer backbone with Cross-Plane Mamba blocks, leveraging linear-time modeling for efficient information exchange across neighboring planes. This exchange improves in-plane self-attention and subsequent attention maps for object localization. TranSamba maintains linear time complexity and constant space complexity with respect to the input volume depth. Extensive experiments on three datasets covering diverse modalities and pathologies show that TranSamba achieves state-of-the-art performance, demonstrating the generalizable efficacy of cross-plane modeling. Code is available at: https://github.com/YihengLyu/TranSamba.
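The linear-time, constant-memory claim is typical of a recurrent scan along the depth axis. The sketch below is a minimal illustration under stated assumptions, not the authors' Cross-Plane Mamba block: it uses a fixed diagonal linear recurrence h_d = a ⊙ h_{d−1} + b ⊙ x_d over one pooled feature vector per plane (actual Mamba layers use input-dependent, selective parameters), run in both directions so every plane receives context from its neighbours. `cross_plane_scan`, `a`, and `b` are illustrative names.

```python
import numpy as np

def cross_plane_scan(planes, a, b):
    """Bidirectional linear recurrence over the depth axis (illustrative sketch).

    planes: (D, C) array, one pooled feature vector per plane.
    a, b:   (C,) arrays, hypothetical per-channel decay and input gains.
    Returns (D, C) features mixing information from neighbouring planes.
    """
    depth, channels = planes.shape
    out = np.zeros_like(planes, dtype=float)
    # Forward pass: plane d accumulates planes 0..d with exponentially decaying weight.
    h = np.zeros(channels)
    for d in range(depth):
        h = a * h + b * planes[d]
        out[d] += h
    # Backward pass: plane d also accumulates planes d..D-1
    # (the current plane therefore enters both directions).
    h = np.zeros(channels)
    for d in range(depth - 1, -1, -1):
        h = a * h + b * planes[d]
        out[d] += h
    return out
```

Each plane is visited once per direction and only one state vector is carried, so run time grows linearly with depth and the working memory beyond the output does not grow with depth.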

2511.12126 2026-06-18 eess.IV 90%

Volumetric Ultrasound via 3D Null Subtraction Imaging with Circular and Spiral Apertures

Bingze Dai, Xi Zhang, Wei-Ning Lee

Topic match (Medical Imaging): proposes 3D null subtraction imaging for volumetric ultrasound, a medical imaging technique.

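AI summary: Proposes 3D null subtraction imaging, which combines an efficient null-subtraction process with sparse aperture designs to better balance image quality, frame rate, and hardware complexity in volumetric ultrasound; experiments show improved azimuthal and elevational resolution and contrast compared with conventional DAS beamforming.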

Comments: 10 pages, 12 figures

Journal ref: Ultrasonics, 2026: 108179

Abstract

Volumetric ultrasound imaging faces a fundamental trade-off among image quality, frame rate, and hardware complexity. This study introduces three-dimensional Null Subtraction Imaging (3D NSI), a nonlinear beamforming framework that addresses this trade-off by combining a computationally efficient null-subtraction process with multiplexing-aware sparse aperture designs on matrix arrays. We evaluate three apodization configurations: a fully addressed circular aperture and two Fermat's spiral sparse apertures. To overcome channel-sharing constraints common in matrix arrays multiplexed with low-channel-count ultrasound systems, we propose a spiral "no-reuse" apodization that enforces non-overlapping element sets across transmit-receive events. This design resolves multiplexing conflicts and enables up to a 16-fold increase in acquisition volume rate using only 240 active elements on a 1024-element probe. In computer simulations and tissue-mimicking phantom experiments, 3D NSI achieved an average improvement of 36% in azimuthal and elevational resolutions, along with an approximately 20% higher contrast ratio, compared to the conventional Delay-and-Sum (DAS) beamformer under matched transmit/receive configurations. When implemented with the spiral no-reuse aperture, the 3D NSI framework achieved over 1000 volumes per second with a computational load less than three times that of DAS, making it a practical solution for real-time 4D imaging.
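A Fermat's spiral layout is commonly generated by placing point n at a radius proportional to √n and an angle of n times the golden angle, then snapping each point to the nearest array element. The sketch below assumes a 32 × 32 grid (1,024 elements, as on the probe described above) and illustrates one hypothetical way to enforce "no reuse": each transmit-receive event uses a rotated spiral whose points snap only to elements that no earlier point has claimed. The rotation scheme, event count, and the `spiral_no_reuse` helper are assumptions for illustration, not the authors' aperture design.

```python
import math

GOLDEN_ANGLE = math.pi * (3 - math.sqrt(5))  # about 137.5 degrees

def spiral_no_reuse(n_events, n_per_event, grid=32):
    """Assign disjoint element sets on a grid x grid matrix array.

    Each event follows a Fermat spiral (r ~ sqrt(n), theta = n * golden angle)
    rotated by a per-event offset; every spiral point snaps to the nearest
    element that is still unused, so no element appears in two events.
    """
    centre = (grid - 1) / 2
    r_max = grid / 2
    free = {(i, j) for i in range(grid) for j in range(grid)}
    events = []
    for e in range(n_events):
        offset = 2 * math.pi * e / n_events
        chosen = []
        for n in range(n_per_event):
            r = r_max * math.sqrt((n + 0.5) / n_per_event)
            theta = n * GOLDEN_ANGLE + offset
            x = centre + r * math.cos(theta)
            y = centre + r * math.sin(theta)
            # Brute-force nearest unused element (cheap for 1,024 elements).
            best = min(free, key=lambda ij: (ij[0] - x) ** 2 + (ij[1] - y) ** 2)
            free.remove(best)
            chosen.append(best)
        events.append(chosen)
    return events
```

For example, `spiral_no_reuse(4, 60)` yields four disjoint 60-element apertures (240 elements in total); requesting more elements than the grid holds makes `min` raise `ValueError` on the exhausted set.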

2510.13562 2026-06-18 physics.med-ph cs.CV cs.NA math.NA 90%

An efficient approach with theoretical guarantees to simultaneously reconstruct activity and attenuation sinogram for TOF-PET

Liyang Hu, Chong Chen

Affiliations: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Topic match (Medical Imaging): PET reconstruction, a core medical imaging method

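AI summary: Proposes a maximum-likelihood method for simultaneously reconstructing the activity and attenuation sinogram in TOF-PET. By exploiting the exponential form of the attenuation correction factors and a total-activity constraint, it proves that the model is well-posed, and experiments confirm its superior accuracy and efficiency.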

Comments: 32 pages, 11 figures, 4 tables

Journal ref: IEEE Transactions on Computational Imaging 2026

Abstract

In positron emission tomography (PET), it is indispensable to perform attenuation correction in order to obtain the quantitatively accurate activity map (tracer distribution) in the body. Generally, this is carried out based on the estimated attenuation map obtained from computed tomography or magnetic resonance imaging. However, apart from errors in the resulting attenuation correction factors, the additional scan not only brings in new radiation doses and/or increases the scanning time but also leads to severe misalignment induced by various motions during and between the two sequential scans. To address these issues, based on maximum likelihood estimation, we propose a new mathematical model for simultaneously reconstructing the activity and attenuation sinogram from the time-of-flight (TOF)-PET emission data only. In particular, we make full use of the exclusively exponential form of the attenuation correction factors, and incorporate a constraint on the total amount of activity in a mask region into the proposed model. Furthermore, we prove its well-posedness, including the existence, uniqueness and stability of the solution. We propose an alternating update algorithm to solve the model, and also analyze its convergence. Finally, numerical experiments with various TOF-PET emission data demonstrate that the proposed method converges numerically, is robust to noise, outperforms some state-of-the-art methods in terms of accuracy and efficiency, and is capable of autonomous attenuation correction.
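For orientation, TOF-PET reconstruction of this kind typically starts from the Poisson model ȳ_i = e^{−s_i}(Aλ)_i + r_i, where λ is the activity, A the system matrix, s the attenuation sinogram (so e^{s_i} is the attenuation correction factor), and r the expected randoms and scatter. With s held fixed, λ can be refined by the classic MLEM update sketched below. This covers only one half of an alternating scheme under simplifying assumptions (no explicit TOF binning, no total-activity constraint); it is not the authors' algorithm, and `mlem_activity` is an illustrative name.

```python
import numpy as np

def mlem_activity(y, A, atten_sino, r, lam0, n_iter=50):
    """MLEM activity updates with the attenuation sinogram held fixed.

    y:          measured counts, shape (M,)
    A:          system matrix, shape (M, N)
    atten_sino: attenuation sinogram s, shape (M,); survival factor = exp(-s)
    r:          expected randoms + scatter, shape (M,)
    lam0:       strictly positive initial activity, shape (N,)
    """
    att = np.exp(-atten_sino)            # survival probability per sinogram bin
    sens = A.T @ att                     # sensitivity image
    lam = lam0.astype(float).copy()
    for _ in range(n_iter):
        ybar = att * (A @ lam) + r       # expected counts under the current estimate
        lam *= (A.T @ (att * y / ybar)) / sens   # multiplicative EM update
    return lam
```

The multiplicative form keeps the activity non-negative automatically; in an alternating scheme, a separate step would then update s with λ held fixed.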

2507.05647 2026-06-18 eess.IV cs.CV 90%

Diffusion-Based Limited-Angle CT Reconstruction under Noisy Conditions

Jiaqi Guo, Santiago López-Tapia

Affiliations: Dept. of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA

Topic match (Medical Imaging): CT reconstruction method, directly applicable to medical imaging

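AI summary: Proposes a diffusion-based limited-angle CT reconstruction method that completes missing angular views with a Mean-Reverting stochastic differential equation and adds a noise-aware rectification mechanism for robustness; experiments show strong performance across noise intensities and acquisition conditions.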

Comments: Accepted at the 2025 IEEE International Conference on Image Processing (ICIP), Workshop

Abstract

Limited-Angle Computed Tomography (LACT) is a challenging inverse problem where missing angular projections lead to incomplete sinograms and severe artifacts in the reconstructed images. While recent learning-based methods have demonstrated effectiveness, most of them assume ideal, noise-free measurements and fail to address the impact of measurement noise. To overcome this limitation, we treat LACT as a sinogram inpainting task and propose a diffusion-based framework that completes missing angular views using a Mean-Reverting Stochastic Differential Equation (MR-SDE) formulation. To improve robustness under realistic noise, we propose RNSD$^+$, a novel noise-aware rectification mechanism that explicitly models inference-time uncertainty, enabling reliable and robust reconstruction. Extensive experiments demonstrate that our method consistently surpasses baseline models in data consistency and perceptual quality, and generalizes well across varying noise intensity and acquisition scenarios.
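A mean-reverting SDE drives a clean signal x_0 toward a degraded mean μ (in this setting one could take μ to be the limited-angle sinogram) while injecting noise: dx = θ(μ − x)dt + σ dW. The sketch below simulates this forward process with Euler-Maruyama, assuming constant θ and σ (published MR-SDE schedules are typically time-dependent), and provides the closed-form moments E[x_t] = μ + (x_0 − μ)e^{−θt} and Var[x_t] = σ²(1 − e^{−2θt})/(2θ) for comparison. The learned reverse-time sampler and the RNSD+ rectification are not shown; function names are illustrative.

```python
import numpy as np

def mr_sde_forward(x0, mu, theta, sigma, t_end, n_steps, n_samples, seed=0):
    """Euler-Maruyama simulation of dx = theta * (mu - x) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.broadcast_to(x0, (n_samples,) + np.shape(x0)).astype(float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)   # Brownian increments
        x = x + theta * (mu - x) * dt + sigma * dw
    return x

def mr_sde_moments(x0, mu, theta, sigma, t):
    """Closed-form mean and variance of the same process at time t."""
    mean = mu + (x0 - mu) * np.exp(-theta * t)
    var = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * t))
    return mean, var
```

As t grows, the mean converges to μ and the variance to σ²/(2θ), which is what lets a reverse-time model start sampling from a noisy version of the degraded measurement.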

2606.19182 2026-06-18 eess.IV New submission 85%

Optimized Multi-Contrast Self-Supervised MRI Reconstruction using Learned k-space Partitioning

Brenden Kadota, Charles Millard, Mark Chiew

Topic match (Medical Imaging): proposes a multi-contrast self-supervised MRI reconstruction method

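AI summary: Proposes a multi-contrast self-supervised learning framework that learns an optimal k-space data partitioning end to end, improving MRI reconstruction quality without requiring fully sampled data.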

Abstract

Objective: Deep learning has shown promise in accelerating MRI by reconstructing high-quality images from under-sampled data. While recent work has leveraged multi-contrast information to improve reconstruction performance, these methods rely on supervised learning, which requires fully sampled k-space for training. One method, self-supervised learning via data undersampling (SSDU), enables direct training on under-sampled k-space by partitioning it into two sets, with a network mapping between the two. In this work, we improve self-supervised MRI reconstruction with two modifications. Methods: We propose a multi-contrast self-supervised learning framework that jointly trains on multiple under-sampled contrasts without requiring fully sampled k-space data as a reference. Moreover, we learn an optimal self-supervised data partitioning for each contrast in an end-to-end manner, further enhancing reconstruction quality. Specifically, we learn an optimal partitioning probability distribution, which is sampled to generate a mask for partitioning. Results: Experiments on two publicly available multi-contrast MRI datasets demonstrate the improved reconstruction quality of our proposed self-supervised multi-contrast learned partitioning method compared to the current single-contrast self-supervised learning methods. We also demonstrate that learning the partitioning of k-space data further enhances the fidelity of reconstructions. Conclusion: Multi-contrast reconstruction combined with learned partitioning improves reconstruction fidelity over single-contrast self-supervised MRI reconstructions. Significance: Our method can facilitate higher image fidelity and/or accelerated MRI protocol times compared to previous self-supervised methods, and without requiring fully sampled k-space for training.
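In SSDU, the acquired k-space locations Ω are split into two disjoint sets: Θ, which is fed to the network, and Λ, which is held out for the loss. The paper learns a partitioning probability distribution for each contrast and samples masks from it; the sketch below shows only that sampling step for one contrast, assuming a given per-location probability map (a hypothetical uniform 0.4 in the example). Learning the map end to end would additionally require a differentiable relaxation of the Bernoulli draw, which is not shown, and `ssdu_partition` is an illustrative name.

```python
import numpy as np

def ssdu_partition(omega, p_loss, rng):
    """Split a sampling mask into disjoint network-input and loss masks.

    omega:  boolean array, True where k-space was acquired
    p_loss: array in [0, 1], probability that an acquired location goes to Lambda
    """
    draw = rng.random(omega.shape) < p_loss
    lam = omega & draw        # Lambda: held out, used only in the loss
    theta = omega & ~draw     # Theta: given to the network
    return theta, lam

# Example with a hypothetical random 4x undersampling mask.
rng = np.random.default_rng(0)
omega = rng.random((64, 64)) < 0.25
theta, lam = ssdu_partition(omega, np.full(omega.shape, 0.4), rng)
```

Because both masks are intersections with Ω and the draw is split by a single Boolean, the two sets are disjoint and together recover the acquired locations exactly.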

2606.18489 2026-06-18 eess.IV New submission 85%

GHOST-CAT: An Efficient and Practical Network for Mesh Generation from 3D Echocardiography

Edward Ferdian, Debbie Zhao, Alistair A. Young, Martyn P. Nash

Topic match (Medical Imaging): generates left-ventricular meshes from 3D echocardiography, a medical image processing task

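AI summary: Proposes GHOST-CAT, a two-stage network combining CNNs, graph convolutions, and Transformers to generate topologically consistent, temporally coherent left-ventricular meshes from 3D echocardiography; on a 100-case test set it achieves Dice coefficients of 0.87 (cavity) and 0.75 (myocardium), outperforming existing methods.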

Abstract

Recent advances in deep learning have significantly accelerated cardiac imaging workflows, from segmentation to the generation of meshes for computational modelling. Nevertheless, analysis of 3D echocardiograms presents unique challenges due to their low contrast-to-noise ratio, conical field of view, and susceptibility to acoustic shadowing. Here, we present an efficient and practical network tailored for 3D echocardiograms. Our method consists of a two-stage network that combines convolutional neural networks, graph convolutional networks, and transformers, to create accurate time-varying 3D meshes of the left ventricle that are topologically consistent and temporally coherent throughout the cardiac cycle. Our model achieved superior mesh reconstruction accuracy compared to current state-of-the-art methods on a held-out test dataset of 100 3D echo images, with a Dice coefficient of 0.87 +/- 0.05 (cavity) and 0.75 +/- 0.07 (myocardium), and mean +/- SD surface distances of 3.3 +/- 0.6 mm (endocardium) and 3.5 +/- 0.5 mm (epicardium), against reference segmentations derived from cardiac magnetic resonance imaging. The reconstructed mesh enables automated calculation of routine clinical indices, such as volume, mass, and strain, and enables advanced applications with biophysical digital twins. Source code is openly shared at https://github.com/EdwardFerdian/ghost-cat.
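The Dice coefficient quoted above measures the overlap between a predicted and a reference mask, Dice = 2|P ∩ R| / (|P| + |R|). A minimal sketch on binary voxel masks follows; how the meshes are voxelised for comparison with the CMR-derived references is not described in the abstract and is not shown, and `dice` is an illustrative helper.

```python
import numpy as np

def dice(pred, ref):
    """Dice coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, ref).sum() / denom
```

For instance, masks `[1, 1, 1, 0]` and `[0, 1, 1, 1]` share two voxels out of three in each, giving 4/6 ≈ 0.67.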

2606.18749 2026-06-18 cs.CV New submission 85%

Toward Training-Free Zero-Shot Anomaly Detection in 3D Medical Images: A Batch-Based Approach Using 2D Foundation Models

Tai Le-Gia

Affiliations: Chungnam National University

Topic match (Medical Imaging): training-free zero-shot anomaly detection in 3D medical images.

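AI summary: Proposes CS3F, a framework that uses 2D foundation models for zero-shot anomaly detection in 3D medical images: volumes are decomposed along multiple axes and encoded slice by slice, anomaly scores are computed from cross-subject similarity, and a coarse-to-fine tokenization strategy reduces the attenuation of lesion signals.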

Abstract

Zero-shot anomaly detection (ZSAD) is attractive for medical imaging because clinical systems must handle heterogeneous acquisition protocols, changing patient populations, and pathologies for which annotated training data may be unavailable. Most existing zero-shot anomaly detection methods are designed for 2D images, and their direct extension to 3D medical volumes is limited by the scarcity of large-scale volumetric foundation models or by the difficulty of utilizing volumetric context. We propose CS3F, a training-free batch-based framework for ZSAD in 3D medical images using 2D foundation models. Each volume is decomposed along multiple anatomical axes and encoded slice-wise by a 2D vision transformer. The slice features are then converted into localized volumetric tokens by pooling neighboring slice features. Anomaly scores are obtained from cross-subject mutual similarity: tokens that lack close analogues in other subjects are assigned higher anomaly scores. To reduce the attenuation of focal lesion signals caused by depth pooling, we introduce a coarse-to-fine tokenization strategy that enables fine-resolution volumetric scoring without exhaustive matching. CS3F is evaluated on brain MRI across metastases, glioma, and stroke, and is further validated on lung CT to test generalizability beyond atlas-aligned brain MRI. The results show that frozen 2D foundation models can support anomaly localization in 3D medical images, and that the benefit of fine tokenization depends strongly on lesion contrast and imaging modality.
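The cross-subject scoring rule lends itself to a compact sketch: a token is anomalous when no token from any other subject in the batch resembles it. The version below assumes cosine similarity on token features and a score of one minus the best cross-subject match, and pools k adjacent slice features into one volumetric token. The authors' exact pooling, multi-axis aggregation, and coarse-to-fine matching are not reproduced, and the helper names are illustrative.

```python
import numpy as np

def depth_pool(slice_feats, k):
    """Average k consecutive slices: (S, P, C) -> (S // k, P, C)."""
    s = (slice_feats.shape[0] // k) * k          # drop a trailing partial group
    return slice_feats[:s].reshape(-1, k, *slice_feats.shape[1:]).mean(axis=1)

def cross_subject_scores(tokens):
    """tokens: list of (T_i, C) arrays, one per subject in the batch.

    Returns a list of (T_i,) anomaly scores: 1 - max cosine similarity
    to any token that belongs to a different subject.
    """
    normed = [t / np.linalg.norm(t, axis=1, keepdims=True) for t in tokens]
    scores = []
    for i, q in enumerate(normed):
        others = np.concatenate([t for j, t in enumerate(normed) if j != i])
        sim = q @ others.T                       # (T_i, total tokens of other subjects)
        scores.append(1.0 - sim.max(axis=1))
    return scores
```

Pooling more slices per token (larger k) reduces the number of comparisons but averages a small lesion with healthy neighbours, which is the signal attenuation the coarse-to-fine strategy is meant to counter.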

2606.18658 2026-06-18 cs.CV eess.IV New submission 85%

On-Manifold Variational Learning with Heat-Kernel Priors

Jiarui Xing, Tal Zeevi, Nian Wu, Jian Wang

Affiliations: Yale School of Medicine; University of Virginia; Harvard Medical School

Topic match (Medical Imaging): achieves the highest accuracy on cardiac scar and brain MRI benchmarks

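AI summary: Proposes a manifold-anchored variational framework in which a geometry-aware EM algorithm selects graph medoids on a heat-kernel-weighted latent graph as prototypes, keeping every prototype on the manifold, while Dirichlet-energy regularization keeps the latent geometry smooth; it achieves the highest accuracy and sharpest prototypes on cardiac scar and brain MRI benchmarks.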

Abstract

Learning unsupervised representations of medical imaging cohorts can reveal clinically meaningful prototypes without expert labels, which are often noisy and fail to capture true pathological heterogeneity. However, existing deep latent-variable models estimate Gaussian mixture priors via Euclidean averaging, producing prototypes that drift off the curved data manifold and degenerate as the number of sub-populations grows. We propose a manifold-anchored variational framework built on a geometry-aware Expectation-Maximization (EM) algorithm, whose M-step selects each sub-population prototype as the graph medoid with the highest diffusion centrality on a heat-kernel-weighted latent graph, ensuring that every prototype remains on-manifold. A Dirichlet energy regularizer enforces geometric smoothness of the latent space, and a per-sub-population uncertainty score enables label-free quality assessment. The manifold-anchored EM is a general-purpose geometric tool that extends standard EM and applies readily to other latent-variable models beyond this setting. On cardiac scar and brain MRI benchmarks, our framework attains the highest accuracy among all compared methods, produces the sharpest prototypes reported to date, and remains stable at large sub-population counts where all baselines degenerate.
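Two ingredients of this prior are easy to illustrate: a heat-kernel weighted graph over latent codes, W_ij = exp(−‖z_i − z_j‖²/t), and its Dirichlet energy tr(ZᵀLZ) = ½ Σ_ij W_ij ‖z_i − z_j‖² with L = D − W. The sketch below selects a medoid among one sub-population's codes as the node with the largest weighted degree, a simple stand-in assumed here for the paper's diffusion centrality. Because the medoid is always one of the latent codes, it lies on the data by construction, unlike a Euclidean mean. Function names are illustrative.

```python
import numpy as np

def heat_kernel_graph(z, t):
    """Heat-kernel weights W_ij = exp(-||z_i - z_j||^2 / t), zero diagonal."""
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq / t)
    np.fill_diagonal(w, 0.0)
    return w

def medoid_prototype(z, t):
    """Index of the member with the largest weighted degree (centrality proxy)."""
    return int(heat_kernel_graph(z, t).sum(axis=1).argmax())

def dirichlet_energy(z, w):
    """tr(Z^T L Z) with L = D - W; equals 0.5 * sum_ij W_ij ||z_i - z_j||^2."""
    lap = np.diag(w.sum(axis=1)) - w
    return float(np.trace(z.T @ lap @ z))

# Codes on a half circle: the Euclidean mean falls inside the arc, the medoid on it.
angles = np.linspace(0, np.pi, 9)
z = np.stack([np.cos(angles), np.sin(angles)], axis=1)
idx = medoid_prototype(z, t=0.5)
```

Here `idx` selects the middle point of the arc, while `z.mean(axis=0)` lies inside the half disc, off the curve the codes live on.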

2606.17412 2026-06-18 cs.CV cs.AI New submission 85%

Enhancing Pathological VLMs with Cross-scale Reasoning

Chi Phan, Tianyi Zhang, Qiaochu Xue, Yufeng Wu, Dan Hu, Zeyu Liu, Sudong Wang, Yueming Jin

Affiliations: Department of Electrical and Computer Engineering, National University of Singapore; PuzzleLogic Pte Ltd; Department of Pathology, Fujian Medical University Cancer Hospital & Fujian Cancer Hospital

Topic match (Medical Imaging): cross-scale reasoning for pathology VLMs, medical image analysis

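AI summary: Introduces the first cross-scale training and evaluation paradigm, using multi-magnification visual question answering to strengthen cross-scale reasoning in pathology vision-language models, and builds the high-quality Scale-VQA benchmark and the ScaleReasoner-R1 model, which achieves state-of-the-art performance.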

Abstract

Pathological images are inherently multi-scale, requiring pathologists to integrate evidence from global tissue architecture at low magnification to cellular morphology at higher magnification for accurate diagnosis. While existing pathological datasets for vision-language models (VLMs) include various scales, they often lack an explicit cross-scale reasoning objective. This limitation prevents VLMs from capturing essential cross-scale representations and learning evidence-based reasoning. To bridge this gap, we introduce the first cross-scale training and evaluation paradigm that formulates pathology interpretation as multi-magnification reasoning. However, creating such a task reveals a critical challenge: multi-image visual question answering (VQA) is prone to text-only shortcuts, which allow models to guess answers using magnification-dependent artifacts rather than visual evidence. To address this, we propose a leakage-aware curation pipeline that combines adversarial text-only screening with constraint-guided question design. Using this pipeline, we construct Scale-VQA, a high-quality benchmark with 4,685 multiple-choice questions grounded in 2,537 pathology images across multiple magnification levels. Finally, we present ScaleReasoner-R1, a model trained via reinforcement learning to optimize performance on the cross-scale VQA task. ScaleReasoner-R1 achieves state-of-the-art performance on our cross-scale reasoning benchmark and generalizes to SOTA performance on established single-scale benchmarks. Findings suggest that even limited cross-scale supervision can significantly improve pathological understanding. The code and demos will be open-sourced.
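The text-only screening idea can be sketched as a filter: pose each multiple-choice question several times to a model that sees only the text, and discard the question if that model answers it clearly above chance. The answering callable, number of trials, and margin below are hypothetical placeholders; the screening models and thresholds actually used for Scale-VQA are not given in the abstract.

```python
from typing import Callable, Dict, List

def screen_text_only(questions: List[Dict],
                     answer_text_only: Callable[[Dict], str],
                     n_trials: int = 5,
                     margin: float = 0.2) -> List[Dict]:
    """Keep questions that a text-only model cannot answer much better than chance.

    Each question is a dict with 'options' (list of labels) and 'answer' (one label).
    answer_text_only is a hypothetical callable that never sees the images.
    """
    kept = []
    for q in questions:
        chance = 1.0 / len(q["options"])
        hits = sum(answer_text_only(q) == q["answer"] for _ in range(n_trials))
        if hits / n_trials <= chance + margin:
            kept.append(q)  # no evident text-only shortcut
    return kept
```

With a stub answerer that always picks "A", any question whose key is "A" is flagged as leaky and removed, while the others survive.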

2606.03827 2026-06-18 cs.CV cs.AI Revised 85%

Conditional Latent Diffusion Model with Fourier-based Motion Modelling for Virtual Population Synthesis

Shaokun Lan, Haoran Dou, Jinghan Huang, Arezoo Zakeri, Fengming Lin, Zherui Zhou, Jinming Duan, Alejandro F. Frangi

Affiliations: Centre for Computational Imaging and Modelling in Medicine (CIMIM); University of Manchester; Christabel Pankhurst Institute; Department of Computer Science; Division of Informatics, Imaging & Data Sciences; Department of Electrical & Electronic Engineering; NIHR Manchester Biomedical Research Centre, Manchester Academic Health Sciences Centre, University of Manchester

Topic match (Medical Imaging): cardiac mesh sequence generation, a medical imaging application

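AI summary: Proposes the 4D F-MeshLDM framework, which combines a convolutional mesh VAE, a truncated Fourier-series motion parameterization, and a conditional diffusion prior for controllable 3D+t cardiac mesh sequence generation, outperforming baselines on UK Biobank data.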

Comments: This work has been early-accepted at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2026

Abstract

In-silico trials of medical devices require the generation of virtual populations of anatomies. In cardiovascular applications, virtual anatomy is typically represented as a 3D+t mesh sampled from a generative model. However, most existing mesh generators focus on static anatomy, while sequence models often lack explicit periodicity. To this end, we propose 4D F-MeshLDM, a conditional generative framework comprising a convolutional mesh VAE to encode meshes, a structural latent space that parameterises motion using a truncated Fourier series, and a diffusion prior that learns the latent distribution over Fourier coefficient tokens. By conditioning the diffusion process on clinical covariates via affine modulation, we enable controllable synthesis. Sampling tokens and performing inverse Fourier synthesis yield cycle-consistent latent trajectories, which can be decoded into 3D+t cardiac mesh sequences. Experiments on 5,000 UK Biobank subjects demonstrate that 4D F-MeshLDM outperforms state-of-the-art baselines in anatomical fidelity and achieves near-zero cycle closure error. Furthermore, the generated cohorts accurately preserve clinical functional indices, highlighting the potential of our framework for reliable in-silico cardiac trials.
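Parameterising motion with a truncated Fourier series makes periodicity structural: z(τ) = a_0 + Σ_{k=1}^{K} [a_k cos(2πkτ) + b_k sin(2πkτ)] over normalised cycle time τ ∈ [0, 1], so z(0) = z(1) for any coefficients. A minimal sketch of the inverse Fourier synthesis step, assuming the coefficients have already been sampled (the diffusion prior and mesh decoder are out of scope, and `fourier_trajectory` is an illustrative name):

```python
import numpy as np

def fourier_trajectory(a0, a, b, n_frames):
    """Synthesize a periodic latent trajectory from truncated Fourier coefficients.

    a0: (D,) mean latent code; a, b: (K, D) cosine and sine coefficients.
    Returns (n_frames + 1, D) samples at tau = 0, 1/n, ..., 1 (last equals first).
    """
    tau = np.linspace(0.0, 1.0, n_frames + 1)[:, None, None]   # (T, 1, 1)
    k = np.arange(1, a.shape[0] + 1)[None, :, None]              # (1, K, 1)
    phase = 2 * np.pi * k * tau                                   # (T, K, 1)
    harmonics = a[None] * np.cos(phase) + b[None] * np.sin(phase)  # (T, K, D)
    return a0 + harmonics.sum(axis=1)                              # (T, D)
```

The cycle closure error, the distance between the first and last frames, is then limited only by floating-point rounding, which is consistent with the near-zero values the abstract reports.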

2504.01527 2026-06-18 cs.CV eess.IV Revised 85%

Beyond Nearest Neighbor Interpolation in Data Augmentation

Olivier Rukundo

Affiliations: Department of Electronic and Computer Engineering, University of Limerick

Topic match (Medical Imaging): proposes an offline data augmentation pipeline that improves medical image segmentation.

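AI summary: Proposes a modified geometric transformation function and a mean-based class filtering mechanism to avoid the annotation errors introduced by nearest neighbor interpolation and the low-pass filtering effects of interpolation, and uses an offline data augmentation pipeline to improve medical image segmentation.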

Comments: 10 pages, 11 figures, 14 tables

Abstract

Avoiding the risk of undefined categorical labels by using nearest neighbor interpolation overlooks the risk of exacerbating pixel-level annotation errors in augmented training data. Additionally, the inherent low-pass filtering effects of interpolation algorithms exacerbate the risk of degrading high-frequency structural details within annotated regions of interest. To avoid these risks, the author modified convolutional neural network data transformation functions by incorporating a modified geometric transformation function, removing reliance on nearest neighbor interpolation, and integrating a mean-based class filtering mechanism to handle undefined categorical labels with alternative interpolation algorithms. The author also implemented an offline data augmentation pipeline to generate interpolation-specific augmented training data, enabling quantitative assessment of interpolation-specific low-pass filtering effects on augmented training data. Experimental evaluation on three medical image segmentation datasets and the XBAT+ datasets demonstrated performance gains across multiple quantitative metrics.
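When a label map is resampled with a smoothing interpolator instead of nearest neighbor, boundary pixels take non-integer values that belong to no class. One plausible reading of a "mean-based" filter, assumed here rather than taken from the paper, is to threshold at the mean of each pair of adjacent class values so that every interpolated value maps back to a valid class. `mean_based_class_filter` is an illustrative name.

```python
import numpy as np

def mean_based_class_filter(interp_labels, class_values):
    """Map interpolated label values back to valid classes.

    Thresholds sit at the means of consecutive (sorted) class values; each value
    is assigned to the class whose interval contains it (ties go upward).
    """
    classes = np.sort(np.asarray(class_values, dtype=float))
    thresholds = (classes[:-1] + classes[1:]) / 2.0
    idx = np.digitize(interp_labels, thresholds)
    return classes[idx]

# A binary label row upsampled with linear (not nearest neighbor) interpolation.
row = np.array([0, 0, 1, 1], dtype=float)
upsampled = np.interp(np.linspace(0, 3, 13), np.arange(4), row)
labels = mean_based_class_filter(upsampled, [0, 1])
```

Under this reading, a boundary between non-adjacent classes (for example 0 and 2) can still produce a spurious intermediate class, so the choice of filter matters for multi-class maps.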

2606.19174 2026-06-18 cs.HC cs.AI New submission 80%

A Clinician-Centered Pipeline for Annotation and Evaluation in Ultrasound AI Studies

Fangyijie Wang, Jianjun Yu, Wentao Shi, Haixia Huang, Ran Shi, Guénolé Silvestre, Kathleen M. Curran

Affiliations: Research Ireland Centre for Research Training in Machine Learning; School of Medicine, University College Dublin, Dublin, Ireland; The Third People's Hospital of Zhenjiang City, Zhenjiang, China; Zhenjiang Maternal and Child Health Hospital, Zhenjiang, China; The Fifth People's Hospital of Zhenjiang City, Zhenjiang, China; School of Computer Science, University College Dublin, Dublin, Ireland

Topic match (Medical Imaging): annotation and evaluation pipeline for ultrasound AI studies

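AI summary: Proposes a clinician-centered pipeline built on a central server and lightweight browser interfaces that supports remote annotation, blinded evaluation, and multi-rater participation; a fetal ultrasound segmentation study demonstrates its reproducibility and statistical consistency.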

Comments: Accepted to MIUA 2026

Abstract

Clinician-centered evaluation is critical for validating medical AI systems, especially in ultrasound imaging where quantitative metrics do not always capture clinical usability. Existing medical image platforms primarily focus on dataset labeling. They lack integrated support for blinded model comparison and reproducible evaluation workflows. We present a clinician-centered pipeline for remote annotation and evaluation in ultrasound AI studies. The proposed pipeline uses a centralized server and lightweight browser interfaces to enable clinicians to perform annotation, blinded ranking, and review without local dataset downloads. The pipeline also supports multi-rater participation, centralized result aggregation, and automated statistical analysis. We validate the pipeline in a fetal ultrasound segmentation study with six raters spanning expert, generalist, and non-expert experience levels. The system automatically generated Spearman correlation, Kendall's τ, and top-1 selection statistics. Results indicated moderate to strong agreement between experts and the other groups. The blinded evaluation results showed a tendency for later active learning models to be preferred. These outcomes suggest that the pipeline can support clinician-centered annotation and reproducible human-AI evaluation studies in ultrasound imaging. The proposed pipeline is available on GitHub (https://github.com/13204942/SonoRate).
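The agreement statistics named above are straightforward to compute for two raters' rankings of the same set of models. A minimal sketch assuming complete rankings without ties: Spearman's ρ = 1 − 6Σd²/(n(n² − 1)) and Kendall's τ-a = (concordant − discordant)/(n(n − 1)/2); tie-aware variants such as τ-b need extra terms. The "top-1 agreement" definition (both raters rank the same model first) is an assumed reading of "top-1 selection statistics", and the helper names are illustrative.

```python
from itertools import combinations

def spearman_rho(rank_a, rank_b):
    """Spearman correlation for two tie-free rankings of the same n items."""
    n = len(rank_a)
    d2 = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau_a(rank_a, rank_b):
    """Kendall's tau-a: (concordant - discordant) / number of item pairs."""
    n = len(rank_a)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

def top1_agreement(rankings_a, rankings_b):
    """Fraction of cases in which both raters place the same item at rank 1."""
    same = sum(ra.index(1) == rb.index(1) for ra, rb in zip(rankings_a, rankings_b))
    return same / len(rankings_a)
```

For rankings `[1, 2, 3, 4]` and `[2, 1, 3, 4]` (one swapped pair), ρ = 0.8 and τ-a = 4/6.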

2606.18287 2026-06-18 cs.LG New submission 80%

Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS

Siyuan Dai, Yang Du, Kun Zhao, Zhusuyi Chen, Heng Huang, Paul Thompson, Chao Shi, Haoteng Tang, Liang Zhan

Affiliations: University of Pittsburgh; University of Maryland; University of Southern California; Binghamton University; University of Texas Rio Grande Valley

Topic match (Medical Imaging): proposes the Artemis framework to remove neuroimaging confounders and improve diagnostic performance.

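AI summary: Proposes Artemis, which performs region-level causal intervention to learn region-specific confounder representations, removing the influence of demographic confounders on GNNs for multimodal fMRI and DTI neuroimaging and improving performance on three benchmarks.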

Comments: 11 pages, 8 figures

Abstract

Multimodal neuroimaging, integrating functional connectivity from fMRI and structural connectivity from DTI, enables non-invasive analysis of brain networks using graph neural networks. However, demographic factors such as age and sex systematically confound the relationship between brain connectivity and clinical outcomes, causing GNNs to exploit spurious shortcuts rather than learning causally invariant representations. While recent causal GNN methods introduce causality at the graph-modeling level, their causal mechanisms remain domain-agnostic and do not account for the real-world confounders inherent in clinical neuroimaging data. Moreover, brain networks are constructed from atlas-based parcellations where each region exhibits distinct sensitivity to demographic factors, necessitating region-aware adjustment. We propose Artemis, a region-level causal framework that bridges this gap by performing causal intervention at each brain region independently, learning region-specific confounder representations with lightweight parameters. Our adjustment comprehensively utilizes the multimodal functional and structural features for graph reasoning as a plug-in module compatible with arbitrary GNN backbones. Experiments on three benchmarks, ADNI for disease diagnosis, OASIS for dementia staging, and HCP for sex classification, demonstrate consistent improvements over representative GNN-based baselines. Multiple supporting experiments further demonstrate statistical significance and neuroscientific interpretability.
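Causal intervention against a confounder C (for example, an age stratum) is usually expressed with the backdoor adjustment P(Y | do(X)) = Σ_c P(Y | X, c) P(c), which replaces the confounded weighting P(c | X) of the ordinary conditional with the population prior P(c). The toy tables below are hypothetical, and the per-region dictionary only mirrors the "each region independently" idea; Artemis learns region-specific confounder representations inside a GNN rather than computing explicit probability tables.

```python
def backdoor_adjust(p_y_given_x_c, p_c):
    """Interventional estimate P(y | do(x)) = sum_c P(y | x, c) * P(c)."""
    return sum(p_y_given_x_c[c] * p_c[c] for c in p_c)

def observational(p_y_given_x_c, p_c_given_x):
    """Confounded estimate P(y | x) = sum_c P(y | x, c) * P(c | x)."""
    return sum(p_y_given_x_c[c] * p_c_given_x[c] for c in p_c_given_x)

# Hypothetical per-region tables: P(diagnosis | connectivity pattern x, age group).
regions = {
    "hippocampus": {"young": 0.10, "old": 0.60},
    "precuneus": {"young": 0.20, "old": 0.30},
}
p_age = {"young": 0.5, "old": 0.5}          # population prior P(c)
p_age_given_x = {"young": 0.2, "old": 0.8}  # age distribution skewed among subjects with x

confounded = {name: observational(t, p_age_given_x) for name, t in regions.items()}
adjusted = {name: backdoor_adjust(t, p_age) for name, t in regions.items()}
```

In this toy example the hippocampal estimate drops from 0.5 to 0.35 once the age skew is removed, illustrating how a demographic confounder can inflate an apparent connectivity-outcome association.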

2606.19270 2026-06-18 eess.IV cs.LG physics.med-ph New submission 80%

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

Mark A. Anastasio

Affiliations: Mallinckrodt Institute of Radiology and Department of Electrical & Systems Engineering, Washington University in St. Louis

Topic match (Medical Imaging): discussion of conceptual innovation in medical imaging AI

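AI summary: Distinguishes algorithmic from conceptual innovation, argues that current incentive structures over-reward algorithmic novelty while neglecting conceptual contributions, uses medical imaging AI cases to show how weak conceptual grounding leads to misaligned objectives and limited clinical impact, and offers recommendations for fostering conceptual innovation.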

Abstract

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

2606.18887 2026-06-18 eess.IV physics.med-ph 新提交 80%

Efficient Image Registration for Ultrasound Localization Microscopy by Obtaining Gradients via Integration Across Iterations

通过跨迭代积分获取梯度的超声定位显微镜高效图像配准

Jipeng Yan, Chang Liu, Hengchang Liu, Biao Huang, Meng-Xing Tang, Yingxiang Liu, Ying Tan

专题命中 医学影像 :超声定位显微镜图像配准

AI总结 提出用极值搜索控制(ESC)替代显式梯度计算,用于超声定位显微镜(ULM)图像配准,每次迭代的计算成本降低约3.5倍,并在离体猪心ULM成像中达到219 μm分辨率。

详情
AI中文摘要

通过图像配准进行组织运动校正对于超声定位显微镜(ULM)至关重要。参数化图像配准通常被表述为一个优化问题,其中运动参数被迭代更新以最大化图像相似度,所使用的优化算法通常依赖于梯度信息,而梯度的显式计算可能带来很高的计算开销。本研究探讨了用极值搜索控制(ESC)替代图像配准中的显式导数计算。ESC通过跨迭代积分经扰动和解调后的图像相似度度量来获取下降信息,从而避免在每次迭代中对图像相似度度量关于运动参数求导。经典ESC的优化行为近似于经典梯度下降(GD),我们首先在仿射图像配准中将其与GD进行比较,所用的模拟真值运动来自一个离体跳动猪心数据集。结果表明,ESC的配准精度和收敛行为与GD相当,同时每次迭代的计算成本降低了约3.5倍。随后,ESC被用于两阶段运动校正流程,其中仿射配准补偿全局组织运动,B样条配准校正残余局部变形。所提出的方法应用于离体跳动猪心的ULM成像,实现了219 μm的空间分辨率,显著低于与2.4 MHz发散波成像相关的半波长衍射极限321 μm。这些结果表明,ESC为ULM图像配准中的显式导数计算提供了一种有效的替代方案,能够实现精确的运动校正和高质量的超分辨率成像。

英文摘要

Tissue motion correction through image registration is essential for ultrasound localization microscopy (ULM). Parametric image registration is commonly formulated as an optimization problem in which motion parameters are iteratively updated to maximize image similarity, and the optimization algorithms used typically rely on gradient information, whose explicit evaluation can be computationally demanding. This work investigates Extremum Seeking Control (ESC) as an alternative to explicit derivative evaluation in image registration. By obtaining descent information through integrating the perturbed and demodulated image similarity metric across iterations, ESC avoids differentiating the image similarity metric with respect to the motion parameters in each iteration. The classical ESC, whose optimization behavior approximates that of classical gradient descent (GD), is first compared with GD for affine image registration using simulated ground-truth motions derived from a beating ex vivo porcine heart dataset. The results show that ESC achieves registration accuracy and convergence behavior comparable to GD while reducing per-iteration computational cost by approximately 3.5-fold. ESC is subsequently employed in a two-stage motion correction pipeline, where affine registration compensates for global tissue motion and B-spline registration corrects residual local deformation. The proposed method is applied to ULM imaging of a beating ex vivo porcine heart and achieves a spatial resolution of 219 μm, substantially below the half-wavelength diffraction limit of 321 μm associated with 2.4 MHz diverging-wave imaging. These results demonstrate that ESC provides an effective alternative to explicit derivative evaluation in ULM image registration, enabling accurate motion correction and high-quality super-resolution imaging.
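The perturb, demodulate and integrate idea can be sketched as a classical discrete-time ESC loop that minimizes a generic cost J. This is a minimal sketch and not the authors' implementation. For registration, J would be something like negative image similarity as a function of motion parameters, but that mapping is an assumption here. Each parameter is dithered with its own sinusoid, the measured cost is high-pass filtered, multiplied by the same sinusoid and accumulated, so J is never differentiated. The gains `a`, `gamma`, `h` and the dither frequencies are illustrative choices.

```python
import numpy as np


def extremum_seeking(J, theta0, a=0.2, gamma=0.05, h=0.2, omegas=None, n_iter=5000):
    """Classical discrete-time extremum seeking for minimizing J(theta).

    Only cost values are evaluated; no derivative of J is computed.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    if omegas is None:
        # a distinct dither frequency (rad/iteration) for each parameter
        omegas = 0.8 + 0.5 * np.arange(theta.size)
    eta = J(theta)  # low-pass state; y - eta acts as a high-pass (washout) filter
    for k in range(n_iter):
        s = np.sin(omegas * k)         # sinusoidal perturbation
        y = J(theta + a * s)           # perturbed cost evaluation
        y_hp = y - eta                 # remove the slowly varying cost level
        eta += h * (y - eta)           # update the low-pass estimate
        theta -= gamma * y_hp * s      # demodulate, then integrate across iterations
    return theta


def toy_cost(t):
    """Toy quadratic standing in for a (negative) image-similarity cost."""
    return (t[0] - 2.0) ** 2 + (t[1] + 1.0) ** 2


theta_hat = extremum_seeking(toy_cost, [0.0, 0.0])
```

On average, the update `y_hp * s` is proportional to the gradient. Each step therefore behaves like a small gradient-descent step, which matches the abstract's remark that classical ESC approximates GD. As an arithmetic check of the quoted diffraction limit, assume a soft-tissue speed of sound of 1540 m/s (this value is not stated in the abstract). Then λ = 1540 / (2.4 × 10⁶) ≈ 642 μm, so λ/2 ≈ 321 μm.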

2606.19169 2026-06-18 cs.GR cs.SY eess.SY 新提交 70%

RespGeomLib: A Reproducible Parametric Engine for Generating Analysis-Ready Human Airway Lumen Geometry

RespGeomLib:一个可复现的参数化引擎,用于生成分析就绪的人类气道管腔几何结构

Nichula Wasalathilaka, Parakrama Ekanayake, Roshan Godaliyadda

专题命中 医学影像 :气道管腔几何生成,用于医学仿真

AI总结 提出RespGeomLib,一个基于YAML规范的可复现参数化引擎,通过端口组装与隐式平滑混合生成无缝气道管腔表面,避免全局体素化,在定量上产生更清洁的分叉且更高效,支持形态测量引导生成和CFD仿真。

Comments Accepted for publication at 2026 IEEE Mercon

详情
AI中文摘要

CT衍生的气道模型支持肺形态测量和气流模拟,但通常受限于远端气道的扫描分辨率,且分叉附近需要大量清理。程序化替代方案是可复现的,但许多依赖于拼接的管状基元,这些基元会引入非光滑连接和定义不清的开放边界。我们提出了RespGeomLib,一个可复现的参数化引擎,用于从紧凑的YAML规范生成分析就绪的人类气道管腔表面。该框架结合了基于端口的组装与隐式平滑最小(smooth-min)连接混合,以产生无缝连接,同时通过解析段和分叉周围的局部隐式提取避免全树体素化。定量上,RespGeomLib产生的连接比布尔/拼接基线更干净,并且比全树全局隐式提取显著更快、更节省内存。我们进一步展示了形态测量引导的树生成、受控合成气道变体,以及可稳定进行气流模拟的CFD就绪导出。RespGeomLib面向需要可复现形态测量、受控合成变体和模拟就绪管腔几何的生物医学工作流。代码公开于 https://nichula01.github.io/Respgeomlib/ 。

英文摘要

CT-derived airway models support pulmonary morphometry and airflow simulation, but are often limited by distal scan resolution and the need for substantial cleanup near bifurcations. Procedural alternatives are reproducible, yet many rely on stitched tubular primitives that introduce non-smooth junctions and poorly defined open boundaries. We present RespGeomLib, a reproducible parametric engine for generating analysis-ready human airway lumen surfaces from compact YAML specifications. The framework combines port-based assembly with implicit smooth-min junction blending to produce seamless junctions, while avoiding full-tree voxelization through analytic segments and local implicit extraction around bifurcations. Quantitatively, RespGeomLib yields cleaner junctions than a Boolean/stitch baseline and is substantially faster and more memory-efficient than whole-tree global implicit extraction. We further demonstrate morphometry-guided tree generation, controlled synthetic airway variants, and CFD-ready export with stable airflow simulation. RespGeomLib targets biomedical workflows requiring reproducible morphometry, controlled synthetic variants, and simulation-ready lumen geometry. The code is publicly available at https://nichula01.github.io/Respgeomlib/
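Smooth-min junction blending can be sketched with signed-distance tubes. Each airway segment is a capsule, and two segments meet through a smooth minimum of their distance fields, which rounds off the crease that a plain min (Boolean union) would leave. This is only an illustration. It uses the common polynomial smooth-min, but the abstract does not state which smooth-min variant RespGeomLib uses. The bifurcation geometry, radii and blend width `k` are hypothetical.

```python
import numpy as np


def capsule_sdf(p, a, b, r):
    """Signed distance from point p to a tube (capsule) of radius r around segment a-b."""
    pa, ba = p - a, b - a
    t = np.clip(np.dot(pa, ba) / np.dot(ba, ba), 0.0, 1.0)
    return np.linalg.norm(pa - t * ba) - r


def smooth_min(d1, d2, k):
    """Polynomial smooth minimum: equals min(d1, d2) when |d1 - d2| >= k, blends below that."""
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)


def bifurcation_sdf(p, k=0.05):
    """Hypothetical bifurcation: one parent tube splitting into two child tubes."""
    junction = np.array([0.0, 0.0, 1.0])
    parent = capsule_sdf(p, np.array([0.0, 0.0, 0.0]), junction, 0.10)
    left = capsule_sdf(p, junction, np.array([-0.5, 0.0, 1.8]), 0.07)
    right = capsule_sdf(p, junction, np.array([0.5, 0.0, 1.8]), 0.07)
    return smooth_min(smooth_min(parent, left, k), right, k)
```

Far from the junction, the blend reduces exactly to the ordinary union. Near it, the field dips slightly below both tubes, which produces the fillet. A surface would then be extracted locally, for example with marching cubes, around each bifurcation.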

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME 版本更新 70%

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

不只关乎多少,更关乎何处:将认知不确定性分解为逐类贡献

Mame Diarra Toure, David A. Stephens

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

专题命中 医学影像 :在糖尿病视网膜病变选择性预测中验证方法

AI总结 针对安全关键分类中认知不确定性度量无法区分类别的问题,提出将互信息分解为每类向量$C_k$,通过二阶泰勒展开和$1/\mu_k$加权校正边界抑制,在糖尿病视网膜病变选择性预测、分布外检测和标签噪声研究中验证其有效性。

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref Forty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI 2026), https://openreview.net/forum?id=cxuWscJmAr

详情
AI中文摘要

在安全关键分类中,失败的代价往往是不对称的,然而贝叶斯深度学习用单个标量——互信息(MI)来总结认知不确定性,这无法区分模型的无知涉及良性类别还是安全关键类别。我们将MI分解为每类向量$C_k(x)=\sigma_k^{2}/(2\mu_k)$,其中$\mu_k{=}\mathbb{E}[p_k]$,$\sigma_k^2{=}\mathrm{Var}[p_k]$,计算基于后验样本。该分解来自熵的二阶泰勒展开;$1/\mu_k$加权校正了边界抑制,使$C_k$在稀有类别和常见类别之间具有可比性。根据构造,$\sum_k C_k \approx \mathrm{MI}$,并且伴随的偏度诊断标志可识别近似退化的输入。在刻画$C_k$的公理性质后,我们在三个任务上验证了它:(i)糖尿病视网膜病变的选择性预测,其中关键类别的$C_k$相比MI降低了34.7%的选择性风险,相比方差基线降低了56.2%;(ii)临床和图像基准上的分布外检测,其中$\sum_k C_k$取得了最高的AUROC,并且每类视角暴露了MI无法察觉的不对称偏移;(iii)受控的标签噪声研究,其中在端到端贝叶斯训练下,$\sum_k C_k$对注入的偶然噪声的敏感性低于MI,而在迁移学习下两种度量均退化。在所有任务中,后验近似的质量对不确定性的影响至少与度量选择本身一样强,这表明不确定性如何通过网络传播与其如何被度量同等重要。

英文摘要

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
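The decomposition can be checked numerically. With $f(p)=-p\log p$ and $f''(p)=-1/p$, a second-order expansion gives $\mathbb{E}[f(p_k)]\approx f(\mu_k)-\sigma_k^2/(2\mu_k)$. Hence $\mathrm{MI}=H[\mathbb{E}p]-\mathbb{E}[H(p)]\approx\sum_k \sigma_k^2/(2\mu_k)$. The sketch below assumes the S posterior samples of class probabilities for one input are already available as an (S, K) array. The Dirichlet draws are synthetic stand-ins for MC-dropout or ensemble outputs, not data from the paper.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats; clipping guards against log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)


def per_class_epistemic(probs):
    """probs: (S, K) class probabilities from S posterior samples for one input.

    Returns the per-class vector C_k = sigma_k^2 / (2 mu_k) and the exact MI.
    """
    mu = probs.mean(axis=0)
    var = probs.var(axis=0)                    # variance across posterior samples
    c = var / (2.0 * mu)
    mi = entropy(mu) - entropy(probs).mean()   # H[E p] - E[H(p)]
    return c, mi


# Synthetic, tightly concentrated posterior samples for a 3-class problem
rng = np.random.default_rng(0)
probs = rng.dirichlet(1000.0 * np.array([0.5, 0.3, 0.2]), size=4000)
c, mi = per_class_epistemic(probs)
```

Because the samples are concentrated, the second-order approximation is tight, and `c.sum()` lands close to `mi`. Unlike the single MI scalar, `c` also shows which class the uncertainty belongs to.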

2. 临床大模型 3 篇

2508.20275 2026-06-18 cs.LG cs.CL q-bio.QM 90%

A Systematic Review on the Generative AI Applications in Human Medical Genomics

关于生成式AI在人类医学基因组学中的应用系统综述

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

发表机构 * Dpt. of Genomic Medicine(基因组医学系) D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology(D.O. Ott妇产科与生殖医学研究所)

专题命中 临床大模型 :探讨LLM在遗传疾病诊断中的应用,属于临床AI。

AI总结 本文系统综述了生成式AI在罕见和常见疾病遗传研究与诊断中的应用,分析了LLM在基因组变异识别、注释及医学影像中的作用,指出其在多模态数据整合和临床应用中的挑战。

Comments 31 pages, 5 figures

Journal ref Frontiers in Genetics 16 (2026) 1694070

详情
AI中文摘要

尽管传统统计技术和机器学习方法对遗传学,尤其是遗传病诊断做出了重要贡献,但它们在处理复杂、高维数据时往往力不从心,而最先进的深度学习模型正在应对这一挑战。基于Transformer架构的大语言模型(LLMs)在需要对非结构化医疗数据进行上下文理解的任务中表现出色。本系统综述考察了LLMs在罕见病和常见病遗传研究与诊断中的作用。我们在PubMed、bioRxiv、medRxiv和arXiv中进行了基于关键词的自动化检索,聚焦于LLM在遗传学诊断和教育中的应用研究,并剔除了不相关或过时的模型。共分析了172项研究,重点涵盖基因组变异的识别、注释和解释,以及借助视觉Transformer在医学影像方面取得的进展。主要发现表明,基于Transformer的模型显著推进了疾病与风险分层、变异解释、医学影像分析和报告生成,但将多模态数据(基因组序列、影像和临床记录)整合为统一且临床稳健的流程仍面临重大挑战,其泛化能力和在临床环境中的实际落地也受到限制。本综述对LLMs在变革遗传病诊断和支持遗传学教育方面的现有能力与局限进行了全面分类和评估,可为把握这一快速发展的领域提供指南。

英文摘要

Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.

2605.10840 2026-06-18 cs.LG cs.AI q-bio.QM 版本更新 85%

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

Clin-JEPA:一种多阶段协同训练框架,用于EHR患者轨迹的联合嵌入预测预训练

Yixuan Yang, Mehak Arora, Ryan Zhang, Baraa Abed, Junseob Kim, Tilendra Choudhary, Md Hassanuzzaman, Kevin Zhu, Ayman Ali, Chengkun Yang, Alasdair Edward Gent, Victor Moas, Rishikesan Kamaleswaran

发表机构 * Duke University(杜克大学)

专题命中 临床大模型 :提出Clin-JEPA框架,用于EHR患者轨迹预训练。

AI总结 本文提出Clin-JEPA框架,通过多阶段预训练课程稳定地协同训练编码器和预测器,解决在EHR数据上进行联合嵌入预测的挑战,并在多个下游任务上取得优异表现。

Comments 16 pages, 4 figures, 8 tables. Code: https://github.com/YeungYathin/Clin-JEPA

详情
AI中文摘要

我们提出了Clin-JEPA,一种用于在EHR患者轨迹上进行联合嵌入预测(JEPA)预训练的多阶段协同训练框架。JEPA架构已在机器人领域实现了潜在空间规划,并在视觉领域实现了高质量的表示学习,但将这一范式扩展到EHR数据,以获得一个无需逐任务微调、既能预测患者轨迹又能服务于多种下游风险预测任务的单一主干,仍是一个开放性挑战。现有JEPA框架要么在预训练后丢弃预测器(I-JEPA、V-JEPA),要么在冻结的预训练编码器上训练预测器(V-JEPA 2-AC),使编码器无法感知保留下来的预测器在推理时必须依赖的滚动(rollout)信号;在共享的JEPA预测目标下协同训练编码器和预测器可以提供这种约束,但朴素的协同训练并不稳定,表示坍缩和在线/目标漂移会导致自回归滚动发散。Clin-JEPA的五阶段预训练课程(预测器预热、联合细化、EMA目标对齐、硬同步和预测器最终化)逐阶段解决每种失败模式,稳定地协同训练一个基于Qwen3-8B的编码器和一个9200万参数的潜在轨迹预测器。在MIMIC-IV ICU数据上,三项独立评估支持该框架:(1)在48小时范围内,只有Clin-JEPA的潜在ℓ1滚动漂移收敛(−15.7%),而各基线和消融变体均发散(+3%至+4951%);(2)编码器学到了具有临床判别力的潜在几何结构(病情恶化患者群体在潜在空间中的位移是稳定患者的4.83倍,而基线编码器仅为≤2.62倍);(3)单一主干在多任务下游评估中优于强大的表格和序列基线。Clin-JEPA在ICareFM EEP上取得平均AUROC 0.851,在8个二元风险任务上取得0.883(分别比基线平均值高0.038和0.041)。

英文摘要

We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data -- to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning -- remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum -- predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization -- addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).
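Two of the curriculum's ingredients, EMA target alignment and hard sync, are standard target-encoder updates and can be sketched generically. The sketch also includes an autoregressive latent rollout that records per-step ℓ1 error. Parameters are modeled as a dict of NumPy arrays, which is an assumption for illustration. The abstract does not define its drift metric, so `rollout_l1` is a hypothetical stand-in rather than the paper's measure.

```python
import numpy as np


def ema_update(target, online, tau):
    """EMA target alignment: target <- tau * target + (1 - tau) * online, per parameter."""
    return {name: tau * target[name] + (1.0 - tau) * online[name] for name in target}


def hard_sync(online):
    """Hard sync: copy the online encoder weights into the target encoder."""
    return {name: w.copy() for name, w in online.items()}


def rollout_l1(predictor, z0, target_latents):
    """Autoregressive latent rollout: feed each prediction back in, record mean L1 error per step."""
    z, errors = z0, []
    for z_true in target_latents:
        z = predictor(z)
        errors.append(float(np.abs(z - z_true).mean()))
    return errors
```

With `tau` close to 1, the target moves slowly, which damps online/target drift. A hard sync resets the two encoders to be identical. Because the rollout feeds each prediction back in, any per-step error compounds, which is why rollout drift over a long horizon is a sensitive check.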

2606.18518 2026-06-18 cs.LG cs.AI 新提交 80%

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

PSyGenTAB:通过约束优化生成合成临床表格数据的隐私保护框架

Arshia Ilaty, Hossein Shirazi, Manasi Chitale, Kedar Hegde, Dhanalakshmi Ramesh, Rashmi S. Manjunath, Amir Rahmani, Hajar Homayouni

发表机构 * San Diego State University(圣地亚哥州立大学) University of California, Irvine(加利福尼亚大学尔湾分校)

专题命中 临床大模型 :生成合成临床表格数据

AI总结 提出PSyGenTAB框架,将合成医疗数据生成建模为约束优化问题,通过增强拉格朗日方法嵌入可配置隐私约束,在保证隐私阈值的同时最大化临床数据效用,实验表明合成数据训练的模型性能与真实数据相当。

Comments 20 pages

详情
AI中文摘要

由于机构壁垒和严格的隐私法规(如HIPAA和GDPR),医疗AI的发展受限于高质量临床数据难以获取。合成数据生成提供了一种潜在解决方案,但现有方法缺乏明确管理隐私-效用权衡的原则性机制,往往会削弱具有临床意义的模式,或带来患者重识别风险。我们提出PSyGenTAB,一个隐私保护生成框架,将合成医疗数据生成建模为使用增强拉格朗日方法求解的约束优化问题。通过将可配置的隐私约束直接嵌入模型训练,PSyGenTAB在最大化临床数据效用的同时强制满足最低隐私阈值。在多个临床驱动的基准测试中,PSyGenTAB保留了可靠健康AI所需的特征间临床关系和少数类诊断模式。使用“合成训练、真实测试”和“真实训练、合成测试”协议的下游评估表明,在合成数据上训练的模型达到了与真实患者记录训练模型相当的性能。隐私审计进一步表明,精确记录复制有所减少,且对成员推理攻击具有很强的抵抗力。这些结果确立了PSyGenTAB作为平衡合成医疗数据中隐私保护和临床效用的原则性框架,支持安全的跨机构AI开发。

英文摘要

The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution, but existing methods lack principled mechanisms to explicitly manage the privacy-utility trade-off, often degrading clinically meaningful patterns or risking patient re-identification. We present PSyGenTAB, a privacy-preserving generative framework that formulates synthetic healthcare data generation as a constrained optimization problem solved using the Augmented Lagrangian Method. By embedding configurable privacy constraints directly into model training, PSyGenTAB enforces minimum privacy thresholds while maximizing clinical data utility. Across multiple clinically motivated benchmarks, PSyGenTAB preserves inter-feature clinical relationships and minority-class diagnostic patterns essential for reliable health AI. Downstream evaluation using Train-on-Synthetic, Test-on-Real and Train-on-Real, Test-on-Synthetic protocols shows that models trained on synthetic data achieve performance comparable to those trained on real patient records. Privacy auditing further demonstrates reduced exact record reproduction and strong resilience to membership inference attacks. These results establish PSyGenTAB as a principled framework for balancing privacy protection and clinical utility in synthetic healthcare data, supporting secure cross-institutional AI development.
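The constrained formulation can be illustrated with a one-dimensional augmented Lagrangian loop for an inequality constraint g(x) ≤ 0. Here `f` stands in for a utility loss and `g` for a privacy constraint. Both are hypothetical toy functions, not PSyGenTAB's objectives. The inner minimization uses plain gradient descent, and the multiplier update is the standard one for inequality constraints.

```python
def augmented_lagrangian(f_grad, g, g_grad, x0, rho=10.0, outer=30, inner=500, lr=0.05):
    """Minimize f(x) subject to g(x) <= 0 with the augmented Lagrangian method.

    Inner loop: gradient descent on f(x) + (rho / 2) * max(0, lam / rho + g(x))**2.
    Outer loop: lam <- max(0, lam + rho * g(x)).
    """
    x, lam = float(x0), 0.0
    for _ in range(outer):
        for _ in range(inner):
            active = max(0.0, lam / rho + g(x))  # penalty is zero while comfortably feasible
            x -= lr * (f_grad(x) + rho * active * g_grad(x))
        lam = max(0.0, lam + rho * g(x))         # raise the price of violating the constraint
    return x, lam


# Toy problem: the "utility" loss (x - 3)^2 prefers x = 3; the "privacy" constraint caps x at 1.
x_opt, lam_opt = augmented_lagrangian(
    f_grad=lambda x: 2.0 * (x - 3.0),
    g=lambda x: x - 1.0,
    g_grad=lambda x: 1.0,
    x0=0.0,
)
```

The loop settles on the constraint boundary x ≈ 1. The multiplier converges to λ ≈ 4, the value that satisfies the KKT condition 2(x − 3) + λ = 0 at x = 1. In this sense, λ is the marginal utility cost of the privacy constraint.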

3. 诊断辅助 7 篇

2410.23503 2026-06-18 cs.LG 90%

Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices

基于医疗级设备生理与人口统计数据的CBRNE紧急场景低氧血症严重程度分诊机器学习模型:开发与比较分析

Santino Nanini, Mariem Abid, Yassir Mamouni, Arnaud Wiedemann, Philippe Jouvet, Stephane Bourassa

发表机构 * SADC-CDSS IA PEDIATRICS, CHU Sainte-Justine, Montreal, Canada(SADC-CDSS IA PEDIATRICS,圣贾斯汀大学医院中心,蒙特利尔,加拿大) Solutions Applicare AI Inc., Montreal, Canada(Solutions Applicare AI公司,蒙特利尔,加拿大) Université de Montréal, Canada(蒙特利尔大学,加拿大) MEDINT CBRNE Group, Montreal, Canada(MEDINT CBRNE小组,蒙特利尔,加拿大)

专题命中 诊断辅助 :机器学习模型预测低氧血症严重程度用于分诊

AI总结 本文开发了机器学习模型,用于在紧急分诊中预测低氧血症严重程度,利用生理数据和NEWS2+评分特征提升预测准确性;GBM在训练速度和可解释性上优于序列模型,未来将整合多家医院数据以提升模型泛化能力。

Comments 12 figures, 12 tables and 39 pages

Journal ref Diagnostics 14 (2024) 2763

详情
AI中文摘要

本文开发了机器学习(ML)模型,利用医疗级传感器的生理数据,在紧急分诊中预测低氧血症严重程度,特别是在化学、生物、辐射、核和爆炸(CBRNE)事件中。梯度提升模型(XGBoost、LightGBM、CatBoost)和序列模型(LSTM、GRU)在MIMIC-III和MIMIC-IV数据集的生理和人口统计数据上进行了训练。一个稳健的预处理流程处理了缺失数据和类别不平衡,并纳入了用遮罩(mask)标记的合成数据。梯度提升模型(GBM)在训练速度、可解释性和可靠性方面优于序列模型,使其适合实时决策。虽然GBM的性能与序列模型相当,但GBM使用了源自增强版国家早期预警评分2(NEWS2)的六个生理变量的评分特征,我们称之为NEWS2+,这一做法显著提高了预测准确性。尽管序列模型能很好地处理时间数据,但其性能提升不足以抵消更高的计算成本。为实现及时干预,选择了5分钟的预测窗口,并通过分钟级插值对数据进行标准化。特征重要性分析突显了遮罩特征和评分特征在提高透明度和性能中的重要作用。时间依赖性被证明不那么关键,因为梯度提升模型无需依赖时间依赖性即可有效捕捉关键模式。本研究突显了机器学习在改善分诊和减少警报疲劳方面的潜力。未来的工作将整合多家医院的数据,以提高模型在不同临床环境中的泛化能力。

英文摘要

This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data, class imbalances, and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as Gradient Boosting Models were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.
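The minute-level interpolation with masks and the NEWS2-style score features can be sketched as follows. The sub-score bands are the standard NEWS2 ones for SpO2 (Scale 1), respiration rate and heart rate. The abstract does not list the six variables or the exact rules behind the paper's NEWS2+ variant, so this is only an approximation of the idea. The function names are illustrative.

```python
import numpy as np


def interpolate_with_mask(t_obs, v_obs, t_grid):
    """Linearly interpolate sparse readings onto a minute grid.

    Returns the filled values and a mask: 1 where a real reading exists, 0 where filled in.
    """
    values = np.interp(t_grid, t_obs, v_obs)
    mask = np.isin(t_grid, t_obs).astype(int)
    return values, mask


def news2_spo2_scale1(spo2):
    """NEWS2 SpO2 sub-score (Scale 1)."""
    if spo2 <= 91:
        return 3
    if spo2 <= 93:
        return 2
    if spo2 <= 95:
        return 1
    return 0


def news2_resp_rate(rr):
    """NEWS2 respiration-rate sub-score (breaths/min)."""
    if rr <= 8:
        return 3
    if rr <= 11:
        return 1
    if rr <= 20:
        return 0
    if rr <= 24:
        return 2
    return 3


def news2_heart_rate(hr):
    """NEWS2 pulse sub-score (beats/min)."""
    if hr <= 40:
        return 3
    if hr <= 50:
        return 1
    if hr <= 90:
        return 0
    if hr <= 110:
        return 1
    if hr <= 130:
        return 2
    return 3
```

Feeding the mask alongside the interpolated value lets a tree model discount filled-in minutes. Each raw vital sign can then be paired with its banded sub-score as an extra feature.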

2606.19140 2026-06-18 cs.LG 新提交 85%

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

ChronoSurv:一种临床路径引导的多模态生存分析图框架

Hugo Miccinilli, Theo Di Piazza

发表机构 * Université Paris-Saclay, CentraleSupélec, MICS, France(巴黎-萨克雷大学,CentraleSupélec,MICS,法国) University of Lyon, INSA Lyon, CREATIS, France(里昂大学,里昂国立应用科学学院,CREATIS,法国)

专题命中 诊断辅助 :多模态生存分析框架,用于头颈癌预测

AI总结 提出ChronoSurv,一种基于有向图的多模态生存分析框架,通过层次化拓扑和异质消息传递建模临床轨迹,在头颈癌数据集上取得最优判别性能与可靠校准。

Comments Accepted at MICCAI 2026. Submitted version due to embargo

详情
AI中文摘要

准确的生存预测对于头颈癌的个性化治疗计划至关重要,但由于多模态临床数据的异质性和高维性,这仍然具有挑战性。虽然深度生存模型在预测性能上优于经典统计方法,但现有方法通常依赖于静态融合策略或时间无关建模,限制了其捕捉结构化临床工作流程的能力。在这项工作中,我们提出了ChronoSurv,一种用于多模态生存分析的异质层次有向图框架。ChronoSurv使用与关键诊断步骤对齐的有向图,将患者护理表示为进展感知的临床轨迹。层次拓扑包含细粒度、粗粒度和全局表示,进一步支持对缺失模态的灵活适应,而异质消息传递则建模了跨模态和临床步骤的复杂非对称关系。在两个公共数据集上的实验结果表明,ChronoSurv在保持统计可靠校准的同时,实现了最先进的判别性能。全面的消融研究进一步证实了每个架构组件的贡献,突出了轨迹感知图建模在多模态生存预测中的潜力。

英文摘要

Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.
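One directed, edge-type-specific message-passing step over care-pathway nodes can be sketched in NumPy. This is a simplified illustration, not ChronoSurv's architecture. The node names, one weight matrix per edge type, mean aggregation, tanh update and linear readout are all assumptions. A missing modality is passed as `None` and simply drops out of the graph.

```python
import numpy as np


def directed_hetero_mp(feats, edges, weights):
    """One round of directed message passing with one weight matrix per edge type.

    feats:   dict node -> feature vector, or None when that modality is missing.
    edges:   list of (src, dst) pairs pointing forward along the care pathway.
    weights: dict (src, dst) -> (d, d) matrix.
    """
    present = {n: v for n, v in feats.items() if v is not None}
    out = {}
    for dst, h in present.items():
        # messages only arrive along the given directed edges, from nodes that are present
        msgs = [weights[(src, tgt)] @ present[src]
                for src, tgt in edges if tgt == dst and src in present]
        agg = np.mean(msgs, axis=0) if msgs else np.zeros_like(h)
        out[dst] = np.tanh(h + agg)
    return out


def risk_score(node_out, w_readout):
    """Global readout: mean-pool the available nodes, then a linear risk score."""
    pooled = np.mean(list(node_out.values()), axis=0)
    return float(pooled @ w_readout)
```

Because edges only point forward (for example consultation → imaging → biopsy), information cannot flow backward in time along the pathway. A missing modality only removes its node and edges instead of requiring imputation.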

2606.18571 2026-06-18 cs.LG cs.CL cs.SD eess.AS 新提交 85%

Fair Cognitive Impairment Detection Through Unlearning

通过去学习实现公平的认知障碍检测

William Nguyen, Jiali Cheng, Hadi Amiri

发表机构 * University of Massachusetts Lowell, USA(马萨诸塞大学洛厄尔分校)

专题命中 诊断辅助 :多模态框架公平检测轻度认知障碍

AI总结 提出一种多模态框架,结合跨模态融合和梯度反转去学习,减少人口统计信息对轻度认知障碍检测的偏见,在跨语言数据集上缩小性能差距。

Comments Interspeech 2026

详情
AI中文摘要

轻度认知障碍(MCI)是一种以记忆、语言或思维能力显著下降为特征的医学状况。从自发语音中检测MCI对于可扩展的筛查具有前景。然而,学习模型常常利用与标签相关的人口统计线索,导致不同亚组之间存在较大的性能差距。我们提出了一种多模态框架,结合了(i)模态间(语音、文本和图像)的跨模型融合,以及(ii)使用梯度反转的去学习,该技术阻止共享嵌入编码与任务无关的人口统计属性。在多语言基准TAUKADIAL和PREPARE上的评估表明,我们的方法在MCI分类上优于最先进的多语言和多模态基线,同时显著缩小了患者亚组(性别和语言)之间的性能差距。我们进一步分析了跨数据集的迁移,表明人口统计去学习有助于学习更鲁棒的MCI检测表示。

英文摘要

Mild Cognitive Impairment (MCI) is a medical condition characterized by a noticeable decline in memory, language, or thinking abilities. MCI detection from spontaneous speech is promising for scalable screening. However, learned models often exploit demographic cues correlated with labels, resulting in a large performance gap across subgroups. We present a multimodal framework that combines (i) cross-model fusion between modalities (speech, text, and image), and (ii) unlearning using gradient reversal that discourages the shared embedding from encoding task-irrelevant demographic attributes. Evaluated on the multilingual benchmarks TAUKADIAL and PREPARE, our method outperforms the state-of-the-art multilingual and multimodal baseline in MCI classification while substantially reducing the performance gap across patient subgroups (sex and language). We further analyze transfer across datasets, showing that demographic unlearning helps learn more robust representations for MCI detection.
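Gradient-reversal unlearning can be sketched without an autodiff framework. The reversal layer is the identity on the forward pass and multiplies the gradient by −λ on the backward pass. The shared embedding therefore receives the MCI-task gradient plus a reversed demographic-classifier gradient. The class and function names and the λ value are illustrative. The paper's actual speech, text and image encoders and heads are not shown.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass; scales the incoming gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, z):
        return z

    def backward(self, grad_out):
        return -self.lam * grad_out


def shared_embedding_grad(grad_task, grad_demographic, grl):
    """Gradient reaching the shared embedding.

    The MCI head's gradient passes through unchanged; the demographic head sits behind
    the reversal layer, so its gradient is flipped. Descending on this sum moves the
    embedding toward helping MCI prediction and away from helping demographic prediction.
    """
    return grad_task + grl.backward(grad_demographic)
```

In a full model, the demographic head itself is still trained normally to predict sex or language. Only the gradient that flows back into the shared encoder is reversed, which sets up the adversarial game.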

2606.15973 2026-06-18 eess.SP 新提交 85%

An auscultation location specific study on the relationship between expiratory-to-inspiratory acoustic patterns and spirometric airflow limitation across age and gender in asthmatic patients

基于听诊位置的哮喘患者呼气-吸气声学模式与肺功能气流受限关系的年龄和性别特异性研究

Dheeraj Harish Kumar, Sanjana M C, Perumal Keerthi Priya, K V Nikhath Khanam, Uma Maheshwari Krishnaswamy, Prasanta Kumar Ghosh

专题命中 诊断辅助 :呼吸音分析辅助哮喘诊断,医学AI

AI总结 本研究通过分析141名哮喘患者的呼吸音频谱,发现呼气-吸气声功率比与FEV1/FVC在100-400Hz频段显著相关,且相关性受听诊位置、年龄和性别影响。

详情
AI中文摘要

哮喘导致呼气气流受限,临床通过肺功能检查评估,使用FEV1/FVC比值表示第一秒呼出气量占用力肺活量的比例。先前研究表明,在后部听诊位置(左下、左上、右上、右下)记录的呼吸音可反映局部气流模式。本研究在141名20-60岁参与者中,使用Spearman相关分析,研究呼气-吸气(E/I)频谱功率比与FEV1/FVC在不同频率子带的关系。100-200 Hz和200-400 Hz频带显示出显著相关性。总体而言,较低的后部听诊位置关联性更强;年轻成年人在左下位置相关性更强,而老年人在左上位置相关性更强。性别分层分析显示,男性在左下位置相关性更强,女性在左上位置相关性更强。

英文摘要

Asthma causes expiratory airflow limitation and is clinically assessed using spirometry, which provides the FEV1/FVC ratio representing the proportion of air exhaled in the first second relative to total forced vital capacity. Prior studies suggest that respiratory sounds recorded at posterior sites (Left Lower, Left Upper, Right Upper, Right Lower) reflect regional airflow patterns. In this study, we investigate the relationship between the expiratory-to-inspiratory (E/I) spectral power ratio and FEV1/FVC in 141 participants aged 20-60 years using Spearman correlation across frequency subbands. The 100-200 Hz and 200-400 Hz bands showed significant correlations. Overall, lower posterior sites showed stronger associations; younger adults showed stronger correlations at the Left Lower site, whereas older adults showed stronger correlations at the Left Upper site. Gender-stratified analysis showed stronger Left Lower correlations in males and stronger Left Upper correlations in females.
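The E/I feature can be sketched as the ratio of expiratory to inspiratory spectral power in one subband. That ratio is then rank-correlated against FEV1/FVC across participants. This sketch makes three assumptions: a plain FFT periodogram is used (the paper may use Welch or another estimator), expiration and inspiration are already segmented into separate arrays, and the Spearman helper sees no tied values.

```python
import numpy as np


def band_power(x, fs, f_lo, f_hi):
    """Mean periodogram power of signal x in the band [f_lo, f_hi) Hz."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return spectrum[in_band].mean()


def ei_ratio(expiration, inspiration, fs, f_lo=100.0, f_hi=200.0):
    """Expiratory-to-inspiratory spectral power ratio in one frequency subband."""
    return band_power(expiration, fs, f_lo, f_hi) / band_power(inspiration, fs, f_lo, f_hi)


def spearman_rho(a, b):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```

Per participant and auscultation site, one would compute `ei_ratio` for the 100-200 Hz and 200-400 Hz bands. `spearman_rho` is then applied across participants between those ratios and FEV1/FVC, optionally within age or gender strata.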

2605.21528 2026-06-18 cs.LG cs.AI 版本更新 85%

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

可重复的基于日志的自动机器学习框架用于医疗风险预测中的可解释流水线优化

Rui Huang, Lican Huang

发表机构 * School of Basic Medicine, Hangzhou Normal University(杭州师范大学基础医学院) Research Department, Hangzhou Domain Zones Technology Co.Ltd.(杭州域区技术有限公司)

专题命中 诊断辅助 :AutoML框架用于医疗风险预测,属于诊断辅助。

AI总结 本文提出了一种可重复的基于日志的自动机器学习框架,用于医疗风险预测中的可解释流水线优化,通过分析组件属性、交互和冗余性,提高了模型性能和稳定性。

详情
AI中文摘要

准确且可重复的疾病风险预测仍然具有挑战性,原因在于特征异质、样本有限和严重的类别不平衡。本研究引入了yvsoucom-iterkit,一种确定性、基于日志的自动化机器学习框架,将流水线优化建模为完全可重复的配置级系统。每个流水线被编码为可追溯的日志实体,从而能够分析组件归因、交互、相似性和跨种子鲁棒性。在Pima Indians糖尿病和中风数据集上对超过18,000个流水线配置进行的实验揭示了一个结构化但部分冗余的搜索空间,其中性能由一小部分相互作用的组件决定。随机森林重要性分析显示,增强(0.454)、模型选择(0.198)和不平衡处理(0.101)是Pima数据集的关键驱动因素,而不平衡处理在中风数据集中占主导(0.406)。组件相似性分析显示出较强的冗余性:特征选择变体(biMax-biMean)表现出低RMS距离(0.0252),混合匹配无增强(0.0279),TomekLinks与无不平衡处理接近(0.0325),而高斯噪声与无增强的差异更大(0.10)。该框架使用集成模型实现了强且稳定的性能(Pima上加权F1为0.89、宏F1为0.88;中风上加权F1为0.94),但由于类别不平衡,中风数据集上的宏F1较低(0.67)。跨种子分析揭示了性能与鲁棒性之间的权衡,集成模型的变异性低于SVM。这些结果表明,有效的AutoML优化可以聚焦于一组高影响组件。

英文摘要

Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.
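The log-driven idea treats every pipeline configuration as a deterministic, hashable log record. It can be sketched with the standard library alone. The component names and options below are hypothetical stand-ins loosely based on components mentioned in the abstract. This is not the yvsoucom-iterkit API.

```python
import hashlib
import itertools
import json

# Hypothetical search space; names loosely follow components mentioned in the abstract.
SEARCH_SPACE = {
    "augmentation": ["none", "gaussian_noise"],
    "imbalance": ["none", "RandomOverSampler", "TomekLinks"],
    "model": ["svm", "random_forest", "ensemble"],
    "split_ratio": [0.7, 0.8],
}


def enumerate_configs(space):
    """Deterministically enumerate the full grid (keys sorted, so order never changes)."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def config_id(cfg):
    """Stable short ID: hash of the canonical JSON form of the configuration."""
    blob = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def log_record(cfg, seed, metrics):
    """One traceable JSON log line per (configuration, seed) run."""
    return json.dumps(
        {"id": config_id(cfg), "seed": seed, "config": cfg, "metrics": metrics},
        sort_keys=True,
    )
```

With stable IDs, runs over several seeds can be grouped by `id` to measure cross-seed variance. Component attribution can likewise be computed by aggregating metrics over all records that share a given option.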

2603.15988 2026-06-18 eess.AS cs.AI cs.LG 版本更新 85%

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

无中生有:面向构音障碍语音严重程度鲁棒估计的数据增强

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson

发表机构 * University of Illinois Urbana-Champaign, IL, USA(伊利诺伊大学厄巴纳-香槟分校) Korea Advanced Institute of Science & Technology, KR(韩国科学技术院)

专题命中 诊断辅助 :构音障碍语音质量评估,用于临床诊断

AI总结 提出三阶段框架,利用未标注构音障碍语音和典型语音数据集,通过教师模型生成伪标签、标签感知对比学习预训练和微调,在五个未见数据集上平均SRCC达0.761,显著优于现有方法。

Comments Accepted to Interspeech 2026 Long Paper Track

详情
AI中文摘要

构音障碍语音质量评估(DSQA)对于临床诊断和包容性语音技术至关重要。然而,主观评估成本高且难以规模化,而标注数据的稀缺限制了鲁棒的客观建模。为解决这一问题,我们提出了一个三阶段框架,利用未标注的构音障碍语音和大规模典型语音数据集来扩展训练。教师模型首先生成未标注样本的伪标签,然后使用标签感知对比学习策略进行弱监督预训练,使模型暴露于多样化的说话者和声学条件。预训练模型随后针对下游DSQA任务进行微调。在跨越多种病因和语言的五个未见数据集上的实验证明了我们方法的鲁棒性。我们的基于Whisper的基线显著优于SOTA DSQA预测器(如SpICE),完整框架在未见测试数据集上实现了平均SRCC为0.761。

英文摘要

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of labeled data limits robust objective modeling. To address this, we propose a three-stage framework that leverages unlabeled dysarthric speech and large-scale typical speech datasets to scale training. A teacher model first generates pseudo-labels for unlabeled samples, followed by weakly supervised pretraining using a label-aware contrastive learning strategy that exposes the model to diverse speakers and acoustic conditions. The pretrained model is then fine-tuned for the downstream DSQA task. Experiments on five unseen datasets spanning multiple etiologies and languages demonstrate the robustness of our approach. Our Whisper-based baseline significantly outperforms SOTA DSQA predictors such as SpICE, and the full framework achieves an average SRCC of 0.761 across unseen test datasets.
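Label-aware contrastive pretraining on pseudo-labels can be sketched with a supervised contrastive (SupCon-style) loss. Samples whose teacher scores fall in the same severity bin are treated as positives. The abstract does not give the exact loss. SupCon, the temperature, and binning teacher scores into discrete levels are all assumptions here.

```python
import numpy as np


def severity_bins(teacher_scores, edges):
    """Discretize teacher-predicted severity scores into pseudo-label bins."""
    return np.digitize(teacher_scores, edges)


def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss: same-label samples are positives, all others negatives."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    labels = np.asarray(labels)
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    per_anchor = []
    for i in range(n):
        positives = (labels == labels[i]) & not_self[i]
        if not positives.any():
            continue  # anchors without positives contribute nothing
        logits = sim[i][not_self[i]]
        m = logits.max()
        log_denom = m + np.log(np.exp(logits - m).sum())  # stable log-sum-exp
        per_anchor.append(np.mean(log_denom - sim[i][positives]))
    return float(np.mean(per_anchor))
```

The loss is low when embeddings cluster by pseudo-label, so pretraining pulls utterances of similar estimated severity together across speakers and recording conditions. Downstream fine-tuning can then be evaluated with SRCC against ground-truth ratings.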

2509.14183 2026-06-18 stat.ME stat.AP 版本更新 70%

Index Date Imputation for Survival Analysis in Externally Controlled Trials with Delayed Treatment Initiation

延迟治疗启动的外部对照试验中生存分析的索引日期插补

Q. Le Coent, G. L. Rosner, M-C. Wang, C. Hu

专题命中 诊断辅助 :外部对照试验中索引日期插补方法

AI总结 针对外部对照试验中因治疗启动延迟导致的索引日期错位问题,提出截断感知的索引日期插补(IDI)方法,结合倾向得分加权以校正混杂,模拟和真实数据验证其减少偏差的有效性。

详情
AI中文摘要

外部对照试验将单臂试验的结果与从历史试验、注册或观察性研究中抽取的外部对照进行比较。对于时间至事件终点,一个关键挑战是单臂试验中的随访以治疗启动为索引,而外部对照数据以更早的临床里程碑(如诊断或复发)为索引。这种错位可能引入永存时间偏倚,扭曲风险集,并复杂化生存比较的解释。我们提出索引日期插补(IDI),一种截断感知的方法,用于在延迟治疗启动的设置中为外部对照患者插补可比较的索引日期。IDI估计目标单臂人群中治疗启动时间的边际分布,同时考虑到启动时间仅在存活足够长以启动治疗的患者中观察到。然后使用插补的索引日期来对齐随访,并在外部对照队列中强制实施可比较的截断条件。由于仅时间对齐不能解决人群水平的混杂,IDI与倾向得分加权或匹配相结合,以改善队列之间的协变量可比性。我们通过蒙特卡洛模拟研究评估所提出方法的有限样本性能。使用来自一项随机肿瘤试验的数据,我们模拟了一个具有诱导索引日期错位的外部对照分析,并显示IDI减少了与随机试验基准的差异。IDI为涉及延迟治疗启动的生存分析中的索引日期对齐提供了一种实用策略,并且在有合适外部对照可用时,可以与标准的协变量调整方法集成。

英文摘要

Externally controlled trials compare outcomes from a single-arm trial with external controls drawn from historical trials, registries, or observational studies. For time-to-event endpoints, a key challenge arises when follow-up is indexed at treatment initiation in the single-arm trial, but the external-control data are indexed at an earlier clinical milestone, such as diagnosis or relapse. This misalignment can induce immortal time bias, distort risk sets, and complicate the interpretation of survival comparisons. We propose Index Date Imputation (IDI), a truncation-aware approach for imputing comparable index dates for external-control patients in settings with delayed treatment initiation. IDI estimates the marginal distribution of treatment-initiation times in the target single-arm population while accounting for the fact that initiation times are observed only among patients who survive long enough to initiate treatment. The imputed index dates are then used to align follow-up and enforce comparable truncation conditions in the external-control cohort. Because temporal alignment alone does not address population-level confounding, IDI is combined with propensity score weighting or matching to improve covariate comparability between cohorts. We evaluate the finite-sample performance of the proposed approach through Monte Carlo simulation studies. Using data from a randomized oncology trial, we emulate an externally controlled analysis with induced index-date misalignment and show that IDI reduces discrepancy from the randomized trial benchmark. IDI provides a practical strategy for index-date alignment in survival analyses involving delayed treatment initiation and can be integrated with standard covariate-adjustment methods when suitable external controls are available.
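The alignment step can be sketched once a delay distribution is in hand. Each external control gets an imputed milestone-to-treatment delay. Controls whose event or censoring time falls before that imputed start are excluded, mirroring the trial's requirement of surviving to initiation, and the remaining follow-up is re-indexed. This is a simplified sketch. The truncation-aware estimation of the delay distribution, which is the core of IDI, and the propensity-score step are not shown. `sample_delay` is a hypothetical callable.

```python
import numpy as np


def align_external_controls(times_from_milestone, events, sample_delay, rng):
    """Impute an index date for each external control and re-index its follow-up.

    times_from_milestone: event/censoring time measured from diagnosis or relapse.
    sample_delay: callable drawing a milestone-to-treatment delay from an already
        estimated (truncation-corrected) distribution.
    """
    kept_times, kept_events = [], []
    for t, e in zip(times_from_milestone, events):
        delay = sample_delay(rng)
        if t <= delay:
            # Observed time ends before the imputed start, so this patient would not
            # have been observed initiating treatment: drop them (a simplification).
            continue
        kept_times.append(t - delay)  # follow-up now starts at the imputed index date
        kept_events.append(e)
    return np.array(kept_times), np.array(kept_events)
```

Because the delays are random, the imputation would typically be repeated several times and the resulting survival estimates pooled. Covariate imbalance would then be handled separately with propensity-score weighting or matching.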

4. Health monitoring: 3 papers

2606.18640 2026-06-18 cs.LG q-bio.QM New submission 85%

MetaboNet-Bench: A Multi-modal Benchmark for Glucose Forecasting in Type 1 Diabetes

MetaboNet-Bench: a multimodal benchmark for blood glucose forecasting in type 1 diabetes

Nathaniel Jeffries, Miriam Wolff, Sam Royston, Elizabeth Healey, Caleb Mayer, David Klonoff, Michael Snyder, Tao Wang

Affiliations: Department of Genetics, Stanford University School of Medicine; Replica Health; Boston Children’s Hospital, Harvard Medical School; Diabetes Research Institute, Mills-Peninsula Medical Center

Topic match (Health monitoring): multimodal benchmark for glucose forecasting in type 1 diabetes

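AI summary: Glucose forecasting algorithms for type 1 diabetes lack a standardized evaluation benchmark. To address this, the authors propose MetaboNet-Bench, a multimodal benchmark that integrates glucose, insulin, and carbohydrate data. They compare multiple models to assess how multimodal data affect model performance.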

Comments main content in 10 pages with 5 figures; supplementary section with 11 more pages and 5 more figures

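AI abstract (translated from Chinese)

Glucose forecasting algorithms are an important aspect of glycemic control management in type 1 diabetes. To date, the research community has developed a large number of forecasting algorithms and models. However, it is widely recognized that the lack of standardized benchmarks for evaluating model performance makes fair comparison difficult and hinders further innovation. Benchmark standardization is therefore urgently needed. In addition, many published glucose forecasting algorithms are limited to continuous glucose monitoring (CGM) data. They ignore other multimodal signals such as insulin dosing and carbohydrate intake. Here we introduce MetaboNet-Bench, a multimodal glucose forecasting benchmark for patients with type 1 diabetes. It provides an extensible, open-source evaluation framework for comparing glucose forecasting algorithms that use glucose, insulin, and carbohydrate data. We then demonstrate its utility by benchmarking several recently published glucose forecasting models and a custom multimodal time-series model, representing different model architectures. The results show that the benefit of adding data modalities depends on model complexity. Incorporating more clinical metrics also helps identify meaningful gaps for future research.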

English abstract

Glucose forecasting algorithms are an important aspect of glycemic control management in type 1 diabetes. So far, the research community has developed numerous algorithms and models for forecasting. However, it is well recognized that the lack of standardized benchmarks for evaluating model performance makes fair comparison difficult and hinders further innovation. Benchmark standardization is therefore urgently needed. Furthermore, many published glucose forecasting algorithms are limited to continuous glucose monitoring (CGM) data alone. They ignore other multimodal signals such as insulin dosing and carbohydrate intake. Here, we introduce MetaboNet-Bench, a benchmark for multimodal glucose forecasting in patients with type 1 diabetes. It provides an extensible open-source evaluation framework for comparing glucose forecasting algorithms that leverage glucose, insulin, and carbohydrate data. We then demonstrate its utility by benchmarking several recently published glucose forecasting models and a custom multimodal time-series model, representing different model architectures. The results show that the benefit of adding data modalities depends on the complexity of the model. Incorporating more clinical metrics also helps identify meaningful gaps for future research.
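A benchmark of this kind fixes the windowing, forecast horizon, and error metrics so that every model is scored on identical samples. The sketch below illustrates that idea in plain Python. It is not the MetaboNet-Bench API. The function names, the persistence baseline, and the choice of RMSE and MAE are illustrative assumptions. It assumes the glucose, insulin, and carbohydrate series are already time-aligned on a common grid. With 5-minute sampling, for example, `horizon=6` would correspond to 30 minutes.

```python
import math


def make_windows(glucose, insulin, carbs, history, horizon):
    """Hypothetical helper: build (inputs, target) pairs from time-aligned series.

    Inputs are the last `history` samples of every modality; the target is the
    glucose value `horizon` samples after the last input sample.
    """
    if not (len(glucose) == len(insulin) == len(carbs)):
        raise ValueError("modalities must be aligned on the same time grid")
    samples = []
    for end in range(history, len(glucose) - horizon + 1):
        window = slice(end - history, end)
        inputs = {
            "glucose": glucose[window],
            "insulin": insulin[window],
            "carbs": carbs[window],
        }
        samples.append((inputs, glucose[end - 1 + horizon]))
    return samples


def persistence(inputs):
    """Baseline forecaster: repeat the last observed glucose value."""
    return inputs["glucose"][-1]


def evaluate(model, samples):
    """Score any forecaster on the same samples with the same metrics."""
    errors = [model(inputs) - target for inputs, target in samples]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}


# Toy series (mg/dL, insulin units, grams); values are made up.
glucose = [100, 105, 110, 115, 120, 125, 128, 130]
insulin = [0, 0, 2, 0, 0, 0, 0, 0]
carbs = [0, 30, 0, 0, 0, 0, 0, 0]
samples = make_windows(glucose, insulin, carbs, history=3, horizon=2)
print(evaluate(persistence, samples))
```

Any forecaster that accepts the same `inputs` dictionary can be passed to `evaluate`. That includes a glucose-only model that ignores the insulin and carbohydrate keys and a multimodal one that uses them. Comparisons across architectures therefore stay like-for-like.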

2511.06140 2026-06-18 q-bio.QM 80%

Non-invasive load measurement in the human tibia via spectral analysis of flexural waves

Non-invasive measurement of load in the human tibia via spectral analysis of flexural waves

Ali Yawar, Daniel H. Aslan, Daniel E. Lieberman

Topic match (Health monitoring): non-invasive tibial load measurement for sports medicine

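AI summary: The study proposes a non-invasive method for measuring tibial compressive force by analyzing the spectra of flexural waves propagating in the tibia. The method exploits the linear relationship between spectral peak location and compressive force. The authors demonstrate its potential for applications in human locomotion and sports medicine.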

Comments 23 pages, 23 figures, 1 table. Manuscript revised for clarity and consistency

Journal ref J. R. Soc. Interface (2026) 23 (239): 20251206

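AI abstract (translated from Chinese)

The forces transmitted by bones are frequently studied in human biomechanics, but measuring them non-invasively is challenging, especially outside the laboratory. We introduce a technique for non-invasive, in vivo measurement of tibial compressive force using flexural waves propagating in the tibia. Modelling the tibia as an axially compressed Euler-Bernoulli beam shows that tibial flexural waves have load-dependent frequency spectra. Under physiological conditions, peak locations in the wave acceleration spectra vary linearly with the compressive force on the tibia and can serve as a proxy for that force. We tested the validity of the technique with a proof-of-concept wearable system. It generates flexural waves with a skin-mounted mechanical transducer and measures their spectra with a skin-mounted accelerometer. Consistent with beam theory, data from 9 participants show a linear relationship between tibial compressive force and spectral peak location. Pearson correlation coefficients were r = 0.82-0.99 (mean r = 0.93) for medial-lateral swaying trials and r = 0.81-0.98 (mean r = 0.93) for walking trials. This flexural-wave-based technique could lead to a new class of wearable sensors for non-invasive monitoring and measurement of physiological bone loads. It could benefit research in human locomotion and sports medicine.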

English abstract

Forces transmitted by bones are routinely studied in human biomechanics, but it is challenging to measure them non-invasively, especially outside of laboratory settings. We introduce a technique for non-invasive, in vivo measurement of tibial compressive force using flexural waves propagating in the tibia. Modelling the tibia as an axially compressed Euler-Bernoulli beam, we show that tibial flexural waves have load-dependent frequency spectra. Specifically, under physiological conditions, peak locations in the wave acceleration spectra vary linearly with the compressive force on the tibia and may be used as proxies for the compressive force. We test the validity of this technique using a proof-of-concept wearable system that generates flexural waves via a skin-mounted mechanical transducer and measures the spectra of these waves using a skin-mounted accelerometer. In agreement with beam theory, data from 9 participants demonstrate linear relationships between tibial compressive force and spectral peak location, with Pearson correlation coefficients $r=0.82 - 0.99$ (mean $r=0.93$) for medial-lateral swaying and $r=0.81 - 0.98$ (mean $r=0.93$) for walking trials. This flexural wave-based technique could give rise to a new class of wearable sensors for non-invasive physiological bone load monitoring and measurement, impacting research in human locomotion and sports medicine.

2412.01836 2026-06-18 q-bio.NC 80%

Eye dominance and testing order effects in the circularly-oriented macular pigment optical density measurements that rely on the perception of structured light-based stimuli

Effects of eye dominance and testing order in circularly-oriented macular pigment optical density measurements

Mukhit Kulmaganbetov, Taranjit Singh, Dmitry Pushin, Pinki Chahal, David Cory, Davis Garrad, Connor Kapahi, Melanie Mungalsingh, Iman Salehi, Andrew Silva, Ben Thompson, Zhangting Wang, Dusan Sarenac

Topic match (Health monitoring): factors affecting macular pigment optical density measurements

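AI summary: The study examines how eye dominance and testing order affect perception in macular pigment optical density measurements based on structured light stimuli. It finds that both factors have minimal impact on the measurements, laying the groundwork for future clinical applications.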

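AI abstract (translated from Chinese)

Psychophysical discrimination of structured light stimuli may be useful in screening for various macular disorders, including degenerative macular diseases. The circularly-oriented macular pigment optical density (coMPOD) is computed from discrimination performance for structured-light-induced entoptic phenomena. It may reveal a novel functional biomarker of macular health. This study investigated the potential effects of eye dominance and testing order on the perception of structured light stimuli. These factors may affect the sensitivity of screening tests based on structured light technology. Twenty-eight participants aged 18-38 years took part after a comprehensive eye examination. In a psychophysical task, various structured light stimuli were projected onto the participants' retinas. The stimuli had multiple azimuthal fringes rotating at specific temporal frequencies. By occluding the central area of the stimuli, we measured the retinal eccentricity (R) of the perceivable area. The slope of each participant's coMPOD profile (a-value) was then calculated using a spatiotemporal sensitivity model. The model accounts for perceptual threshold measurements of structured light stimuli with different spatial densities and temporal frequencies. The Pearson correlation coefficient for eye dominance and testing order effects was r = 0.8 (p < 0.01). Bland-Altman plots for both factors showed zero bias. The results indicate repeatable measurements for both eyes, suggesting that eye dominance and testing order have minimal influence on the perception of structured light stimuli. The findings lay a foundation for future studies exploring the clinical utility of structured light tools in eye care.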

English abstract

Psychophysical discrimination of structured light (SL) stimuli may be useful in screening for various macular disorders, including degenerative macular diseases. The circularly-oriented macular pigment optical density (coMPOD), calculated from the discrimination performance of SL-induced entoptic phenomena, may reveal a novel functional biomarker of macular health. In this study, we investigated the potential influence of eye dominance and testing order effects on SL-based stimulus perception, factors that potentially influence the sensitivity of screening tests based on SL technology. A total of 28 participants (aged 18-38 years) were selected for the study after undergoing a comprehensive eye examination. A psychophysical task was performed where various SL-based entoptic images with multiple azimuthal fringes rotating with a specific temporal frequency were projected onto the participants' retinas. By occluding the central areas of entoptic images, we measured the retinal eccentricity ($R$) of the perceivable area of the stimuli. The slope of the coMPOD profile ($a$-value) was calculated for each participant using a spatiotemporal sensitivity model that takes into account the perceptual threshold measurements of structured light stimuli with varying spatial densities and temporal frequencies. The Pearson correlation coefficient between eye dominance and testing order effects was $r=0.8$ ($p<0.01$). The Bland-Altman plots for both factors indicated zero bias. The results indicate repeatable measurements for both eyes, implying minimal impact from eye dominance and testing order on SL-based stimulus perception. The results provide a foundation for future studies exploring the clinical utility of SL tools in eye health.