Medical AI - arXivDaily Topic

2508.11211 2026-06-18 eess.IV cs.CV Version update 95%

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Zhenhao Li, Song Ni, Long Yang, Xiaojie Yin, Haijun Yu, Jiazhou Wang, Hongbin Han, Weigang Hu, Yixing Huang

Affiliations * Institute of Medical Technology, Peking University Health Science Center; Shanghai Cancer Center, Fudan University; Department of Electrical and Computer Engineering, University of Massachusetts Lowell; Beijing Key Laboratory of Intelligent Neuromodulation and Brain Disorder Treatment

Topic match (Medical Imaging): CT field-of-view extension, a medical imaging task

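AI summary: Proposes a CT field-of-view extension framework based on the image-to-image Schrödinger bridge (I²SB) diffusion model. By directly learning a stochastic mapping between limited-FOV and extended-FOV images, it enables fast one-step inference and surpasses existing diffusion models in both accuracy and speed.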

Comments 12 pages

Journal ref IEEE Transactions on Radiation and Plasma Medical Sciences 2026

Abstract

Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I$^2$SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I$^2$SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I$^2$SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, representing over a 700-fold speedup compared to cDDPM (135 s) and surpassing DiffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency indicates that I$^2$SB has potential for real-time or clinical deployment.
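What distinguishes I$^2$SB from noise-to-image diffusion is that its intermediate states lie on a Gaussian bridge between a paired clean image $x_0$ and degraded image $x_1$. Following the original I$^2$SB formulation, with $\sigma_t^2=\int_0^t\beta$ and $\bar\sigma_t^2=\int_t^1\beta$, the marginal has mean $\frac{\bar\sigma_t^2}{\sigma_t^2+\bar\sigma_t^2}x_0+\frac{\sigma_t^2}{\sigma_t^2+\bar\sigma_t^2}x_1$ and variance $\frac{\sigma_t^2\bar\sigma_t^2}{\sigma_t^2+\bar\sigma_t^2}$. The minimal NumPy sketch below samples it under a constant schedule `beta`, a simplifying assumption that is not necessarily the schedule used in this paper; `bridge_marginal` is a hypothetical name.

```python
import numpy as np

def bridge_marginal(x0, x1, t, beta=0.3, rng=None):
    """Sample X_t ~ q(X_t | X_0=x0, X_1=x1) for a constant noise schedule (assumption).

    x0: clean / extended-FOV image, x1: degraded / limited-FOV image, t in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma2 = beta * t              # integral of beta over [0, t]
    sigma_bar2 = beta * (1.0 - t)  # integral of beta over [t, 1]
    total = sigma2 + sigma_bar2
    mean = (sigma_bar2 / total) * x0 + (sigma2 / total) * x1
    var = sigma2 * sigma_bar2 / total  # zero at both endpoints t=0 and t=1
    return mean + np.sqrt(var) * rng.standard_normal(np.shape(x0))

# Toy 2D "slices": the bridge starts exactly at x0 and ends exactly at x1.
rng = np.random.default_rng(0)
x0 = np.zeros((4, 4))
x1 = np.ones((4, 4))
x_mid = bridge_marginal(x0, x1, 0.5, rng=rng)
```

Because the process starts from the limited-FOV input rather than pure noise, a network trained to recover `x0` from `x_t` can, in principle, return an estimate in a single evaluation at t = 1, which is the intuition behind the one-step inference reported above. For scale, 135 s / 0.19 s ≈ 710, consistent with the quoted "over 700-fold" speedup.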

2512.09185 2026-06-18 cs.CV cs.AI Version update 95%

Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation

Hao Chen, Rui Yin, Yifan Chen, Qi Chen, Chao Li

Affiliations * University of Cambridge; Nanjing First Hospital; Nanjing Medical University; Johns Hopkins University; University of Dundee

Topic match (Medical Imaging): a longitudinal MRI generation framework that models disease progression

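AI summary: Proposes the Δ-LFM framework, which uses flow matching to align patients' latent trajectories and models monotonic disease progression through patient-specific latent alignment; interpretability and performance are validated on three longitudinal MRI benchmarks.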

Comments ICLR 2026 accepted

Abstract

Understanding disease progression is a central clinical challenge with direct implications for early diagnosis and personalized treatment. While recent generative approaches have attempted to model progression, key mismatches remain: disease dynamics are inherently continuous and monotonic, yet latent representations are often scattered, lacking semantic structure, and diffusion-based models disrupt continuity with a random denoising process. In this work, we propose to treat disease dynamics as a velocity field and leverage Flow Matching (FM) to align the temporal evolution of patient data. Unlike prior methods, it captures the intrinsic dynamics of disease, making the progression more interpretable. However, a key challenge remains: in latent space, Auto-Encoders (AEs) do not guarantee alignment across patients or correlation with clinical-severity indicators (e.g., age and disease conditions). To address this, we propose to learn patient-specific latent alignment, which forces patient trajectories to lie along a specific axis, with magnitude increasing monotonically with disease severity. This leads to a consistent and semantically meaningful latent space. Together, we present $Δ$-LFM, a framework for modeling patient-specific latent progression with flow matching. Across three longitudinal MRI benchmarks, $Δ$-LFM demonstrates strong empirical performance and, more importantly, offers a new framework for interpreting and visualizing disease dynamics.
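Treating progression as a velocity field can be illustrated with generic conditional flow matching between two latent scans of the same patient. This is a sketch under stated assumptions, not the Δ-LFM objective: it uses straight-line (rectified-flow style) interpolation between an earlier latent `z_prev` and a later latent `z_next`, an oracle velocity in place of a trained network, and a hypothetical `monotone_along_axis` check that mirrors the stated goal of trajectories whose projection on a chosen axis grows with severity.

```python
import numpy as np

def fm_training_pair(z_prev, z_next, t):
    """Point on the straight path between two visits and its target velocity."""
    x_t = (1.0 - t) * z_prev + t * z_next
    v_target = z_next - z_prev  # constant along a straight path
    return x_t, v_target

def fm_loss(velocity_fn, z_prev, z_next, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t, v_target = fm_training_pair(z_prev, z_next, t)
    return float(np.mean((velocity_fn(x_t, t) - v_target) ** 2))

def euler_integrate(velocity_fn, z0, n_steps=10):
    """Follow the velocity field from t=0 to t=1 with explicit Euler steps."""
    z = np.asarray(z0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        z = z + dt * velocity_fn(z, i * dt)
    return z

def monotone_along_axis(trajectory, axis):
    """Hypothetical check: projections onto a unit axis never decrease over visits."""
    axis = np.asarray(axis, dtype=float)
    proj = np.asarray(trajectory, dtype=float) @ (axis / np.linalg.norm(axis))
    return bool(np.all(np.diff(proj) >= 0))

z_prev = np.array([0.0, 0.0])  # toy latent at the earlier visit
z_next = np.array([1.0, 2.0])  # toy latent at the later visit

def oracle(z, t):
    """Stands in for a learned velocity network (hypothetical)."""
    return z_next - z_prev
```

Swapping `oracle` for a trained network and the toy vectors for encoded scans gives the usual flow-matching recipe; the patient-specific alignment that the paper imposes on the autoencoder latents is not reproduced here.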

2510.10779 2026-06-18 cs.CV Version update 95%

Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel

Affiliations * INSA Lyon, University of Lyon, CNRS, INSERM, CREATIS UMR 5220, U1294

Topic match (Medical Imaging): abnormality analysis on 3D CT; multi-label classification

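AI summary: Proposes a 2.5D framework based on spectral graph convolution that represents a 3D CT volume as a structured graph whose nodes are axial slice triplets, modeling inter-slice dependencies for multi-label abnormality classification with strong cross-dataset generalization.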

Comments Accepted at MELBA Journal 2026

DOI: 10.59275/j.melba.2026-87e3

Abstract

With the growing volume of CT examinations, there is an increasing demand for automated tools such as organ segmentation, abnormality detection, and report generation to support radiologists in managing their clinical workload. Multi-label classification of 3D Chest CT scans remains a critical yet challenging problem due to the complex spatial relationships inherent in volumetric data and the wide variability of abnormalities. Existing methods based on 3D convolutional neural networks struggle to capture long-range dependencies, while Vision Transformers often require extensive pre-training on large-scale, domain-specific datasets to perform competitively. In this work, we propose a 2.5D alternative by introducing a new graph-based framework that represents 3D CT volumes as structured graphs, where axial slice triplets serve as nodes processed through spectral graph convolution, enabling the model to reason over inter-slice dependencies while maintaining complexity compatible with clinical deployment. Our method, trained and evaluated on 3 datasets from independent institutions, achieves strong cross-dataset generalization, and shows competitive performance compared to state-of-the-art visual encoders. We further conduct comprehensive ablation studies to evaluate the impact of various aggregation strategies, edge-weighting schemes, and graph connectivity patterns. Additionally, we demonstrate the broader applicability of our approach through transfer experiments on automated radiology report generation and abdominal CT data.
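A 2.5D slice-triplet graph can be sketched in a few lines of NumPy. Everything below is an illustrative assumption rather than the paper's architecture: non-overlapping triplets (`stride=3`), a chain graph linking each node to its `k` axial neighbours with unit edge weights, the first-order approximation of spectral graph convolution popularised by Kipf and Welling, mean-pooling readout, and a sigmoid head for multi-label outputs. Function names are hypothetical, and flattening raw pixels stands in for a learned slice encoder.

```python
import numpy as np

def slice_triplets(volume, stride=3):
    """Group axial slices of a (D, H, W) volume into (N, 3, H, W) triplets."""
    starts = range(0, volume.shape[0] - 2, stride)
    return np.stack([volume[s:s + 3] for s in starts])

def chain_adjacency(n, k=1):
    """Symmetric adjacency linking each node to its k nearest axial neighbours."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist > 0) & (dist <= k)).astype(float)

def graph_conv(h, adj, w):
    """First-order spectral graph convolution with self-loops and symmetric normalisation."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)  # ReLU

def predict_multilabel(volume, w_gc, w_out):
    """Triplet nodes -> graph convolution -> mean readout -> per-label sigmoid."""
    nodes = slice_triplets(volume)
    h = nodes.reshape(nodes.shape[0], -1)  # placeholder for a learned slice encoder
    h = graph_conv(h, chain_adjacency(len(nodes), k=2), w_gc)
    logits = h.mean(axis=0) @ w_out
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
volume = rng.standard_normal((12, 8, 8))            # toy CT volume (D, H, W)
w_gc = rng.standard_normal((3 * 8 * 8, 16)) * 0.05  # graph-conv weights (random, untrained)
w_out = rng.standard_normal((16, 5))                # 5 hypothetical abnormality labels
probs = predict_multilabel(volume, w_gc, w_out)
```

Each label gets an independent sigmoid rather than a softmax, since a scan can show several abnormalities at once; the aggregation, edge-weighting, and connectivity choices hard-coded here are exactly the knobs the paper's ablations vary.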

2606.00491 2026-06-18 cs.CV cs.AI Version update 90%

Pre-Deployment Robustness Stress Testing for CT Segmentation Systems Using Clinically Motivated Multi-Corruption Augmentation

CholMin Kanga, Jonghyun Chung, Amanpreet Kaur, Nagesh Gulkotwar, Aarthi Sivasankaran

Affiliations * Seoul National University; Google Inc.

Topic match (Medical Imaging): robustness stress testing of CT segmentation systems; medical image augmentation

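AI summary: Proposes the RAMP framework, which uses multi-corruption augmentation to improve the robustness of CT segmentation models under heterogeneous clinical imaging conditions, substantially narrowing the performance gap between clean and corrupted images.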

Abstract

Deep learning-based CT segmentation systems often achieve high accuracy on clean benchmark images, but their performance may degrade under heterogeneous clinical imaging conditions such as noise, resolution loss, contrast variation, intensity shift, and artifacts. This instability can limit reliable deployment in real-world medical imaging workflows. We propose Robustness via Augmented Multi-corruption Pipeline (RAMP), a robustness-oriented augmentation framework for CT segmentation. RAMP combines anatomically constrained spatial perturbations, CT intensity transformations, and stochastic multi-corruption composition to expose models to clinically plausible image degradation during training. Across two CT segmentation evaluation settings, RAMP achieved the strongest corrupted-image performance and the smallest clean-to-corrupted robustness gap. In the five-organ noisy evaluation benchmark, RAMP improved mean corrupted Dice from 0.610 to 0.753 and reduced the robustness gap from 0.264 to 0.064 compared with the nnU-Net baseline. In Abdomen1K, RAMP improved mean corrupted Dice from 0.633 to 0.789 and reduced the robustness gap from 0.290 to 0.070. Although RAMP did not achieve the highest clean-image Dice, it substantially mitigated worst-case segmentation collapse under severe image degradation. These results suggest that multi-corruption augmentation can serve as a practical pre-deployment strategy for improving the reliability of CT segmentation systems in heterogeneous clinical environments.
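Stochastic multi-corruption composition can be sketched as chaining a random subset of CT-style degradations on each training slice. The corruption types mirror those listed in the abstract (noise, resolution loss, contrast variation, intensity shift), but every parameter value and function name is a hypothetical choice, and the anatomically constrained spatial perturbations are omitted; spatial transforms would also have to be applied to the label map, whereas these intensity-only corruptions leave it untouched. Assuming the robustness gap is clean Dice minus corrupted Dice, the reported figures imply clean Dice of about 0.874 for nnU-Net versus 0.817 for RAMP on the five-organ benchmark (0.610 + 0.264 and 0.753 + 0.064), consistent with RAMP not having the highest clean-image Dice.

```python
import numpy as np

def add_noise(x, rng, sigma_hu=20.0):
    return x + rng.normal(0.0, sigma_hu, x.shape)

def shift_intensity(x, rng, max_shift_hu=50.0):
    return x + rng.uniform(-max_shift_hu, max_shift_hu)

def scale_contrast(x, rng, low=0.8, high=1.2):
    m = x.mean()
    return (x - m) * rng.uniform(low, high) + m

def lower_resolution(x, rng, factor=2):
    # Strided down-sampling, then nearest-neighbour up-sampling back to the
    # original 2D shape; rng is unused but keeps one shared signature.
    small = x[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:x.shape[0], :x.shape[1]]

CORRUPTIONS = (add_noise, shift_intensity, scale_contrast, lower_resolution)

def multi_corrupt(x, rng, max_ops=3):
    """Apply a random subset (1..max_ops) of corruptions in random order."""
    n_ops = rng.integers(1, max_ops + 1)
    for i in rng.choice(len(CORRUPTIONS), size=n_ops, replace=False):
        x = CORRUPTIONS[i](x, rng)
    return x

def dice(pred, target, eps=1e-8):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def robustness_gap(clean_dice, corrupted_dice):
    return clean_dice - corrupted_dice

rng = np.random.default_rng(0)
ct_slice = rng.normal(40.0, 10.0, (64, 64))  # toy soft-tissue slice in HU
augmented = multi_corrupt(ct_slice, rng)
```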

2605.12567 2026-06-18 cs.CV cs.AI Version update 90%

Pyramid Self-Contrastive Learning for Single-shot Test-time Ultrasound Image Denoising

Jiajing Zhang, Bingze Dai, Xi Zhang, Yue Xu, Wei-Ning Lee

Affiliations * Department of Electrical and Computer Engineering, The University of Hong Kong; Department of Biomedical Engineering, Duke University

Topic match (Medical Imaging): a test-time ultrasound image denoising framework that improves structural detail

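AI summary: Proposes a purely test-time training framework for single-shot ultrasound image denoising, applied to synthetic aperture ultrasound; self-contrastive learning disentangles anatomical similarity from noise randomness, improving denoising and structural detail.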

Abstract

The inherent electronic and speckle noise complicates clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions whose validity diminishes under composite noise conditions. Learning-based methods are usually pretrained in a limited image domain using a labeled dataset, which implies inevitable domain shift in complex in vivo environments. This study proposes a Pyramid Self-Contrastive Learning (PSCL) framework for test-time ultrasound image denoising without pretraining. Given multiple noisy samples from a single-shot acquisition, PSCL disentangles anatomical similarity and noise randomness into separate pyramid latent spaces. The clean image is then decoded from the anatomy space while discarding the noise space. We first apply PSCL to synthetic aperture ultrasound (SAU), where an Aperture-to-Aperture loop serves as a self-supervised proxy task to ensure denoising fidelity. Simulation experiments, including noise levels from 0 to 30 dB and inclusion geometries from simple to complex, demonstrated improvements of 69.3% in SNR and 34.4% in CNR. In vivo results showed SNR and CNR gains of 84.8% and 25.7%, respectively, using data from only two apertures acquired from the heart (six echocardiographic views), liver, and kidney. PSCL delivers clear images across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization without domain shift and pretraining costs.
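The SNR and CNR gains above are relative improvements. The abstract does not give the exact metric definitions, so the sketch below assumes common ones: SNR as the mean-to-standard-deviation ratio within a homogeneous region, and CNR as the absolute difference of region means divided by the root of the summed region variances. It also shows, in a toy model where anatomy is shared across apertures and noise is independent, why several noisy views of one shot carry exploitable redundancy: even plain averaging raises SNR. PSCL's learned disentanglement is a different and more elaborate mechanism; all names here are hypothetical.

```python
import numpy as np

def snr(region):
    """Mean-to-standard-deviation ratio in a homogeneous region (assumed definition)."""
    return float(region.mean() / region.std())

def cnr(inside, outside):
    """|mean difference| / sqrt(sum of variances) between two regions (assumed definition)."""
    return float(abs(inside.mean() - outside.mean()) / np.sqrt(inside.var() + outside.var()))

def relative_gain(before, after):
    """Percentage improvement, e.g. 69.3 for a 69.3% SNR gain."""
    return 100.0 * (after - before) / before

rng = np.random.default_rng(0)
anatomy = np.full(10000, 5.0)                                  # shared, noise-free signal
apertures = anatomy + rng.normal(0.0, 2.0, (2, anatomy.size))  # two independent noisy views
snr_single = snr(apertures[0])
snr_averaged = snr(apertures.mean(axis=0))  # noise std shrinks by about sqrt(2)
```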

2602.11467 2026-06-18 cs.LG Version update 90%

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

Yining Jiao, Sreekalyani Bhamidi, Carlton Jude Zdanski, Julia S Kimbell, Andrew Prince, Cameron P Worden, Samuel Kirse, Christopher Rutter, Benjamin H Shields, Jisan Mahmud, Marc Niethammer

Affiliations * Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA; Department of Computer Science, University of California San Diego, La Jolla, USA; School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA

Topic match (Medical Imaging): interpretable shape modeling; uncertainty in anatomical structures

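AI summary: Proposes PRISM, which combines implicit neural representations with uncertainty-aware statistical shape analysis; a closed-form Fisher information metric enables efficient local temporal uncertainty quantification, and the framework performs well on shape evolution modeling, personalized shape prediction, and anomaly detection.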

Comments ICML 2026, camera-ready version, 24 pages

Abstract

Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.
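To make the Fisher-information idea concrete: if the value predicted at one spatial location is modelled as Gaussian with covariate-dependent mean $\mu(t)$ and standard deviation $\sigma(t)$, the Fisher information about the covariate has the standard closed form $I(t) = \frac{\mu'(t)^2 + 2\,\sigma'(t)^2}{\sigma(t)^2}$. Whether PRISM's metric reduces exactly to this expression is an assumption. PRISM obtains derivatives by automatic differentiation; to stay dependency-free, this NumPy sketch uses central finite differences instead, and `mu_fn` / `sigma_fn` are hypothetical stand-ins for a network's mean and uncertainty outputs.

```python
import numpy as np

def gaussian_fisher_info(mu_fn, sigma_fn, t, h=1e-5):
    """Fisher information about scalar covariate t for N(mu(t), sigma(t)^2), elementwise."""
    dmu = (mu_fn(t + h) - mu_fn(t - h)) / (2.0 * h)           # central difference, mu'(t)
    dsigma = (sigma_fn(t + h) - sigma_fn(t - h)) / (2.0 * h)  # central difference, sigma'(t)
    var = sigma_fn(t) ** 2
    return (dmu ** 2 + 2.0 * dsigma ** 2) / var

# Toy field over 5 locations: the mean drifts with the covariate at
# location-specific rates, and uncertainty grows faster where drift is faster.
rates = np.linspace(0.0, 1.0, 5)

def mu_fn(t):
    return rates * t

def sigma_fn(t):
    return 0.5 + 0.1 * rates * t

info = gaussian_fisher_info(mu_fn, sigma_fn, 2.0)
```

Locations with larger $I(t)$ respond more sharply to the covariate; by the Cramér-Rao bound, $1/\sqrt{I(t)}$ lower-bounds the standard deviation of any unbiased estimate of $t$ from a single observation at that location.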

2512.10353 2026-06-18 cs.CV Version update 90%

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Yiheng Lyu, Lian Xu, Coen Arrow, Mohammed Bennamoun, Farid Boussaid, Girish Dwivedi

Affiliations * University of Western Australia; Harry Perkins Institute of Medical Research; National Imaging Facility; Fiona Stanley Hospital; Victor Chang Cardiac Research Institute

Topic match (Medical Imaging): hybrid Transformer-Mamba for weakly supervised volumetric medical segmentation

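AI summary: Proposes TranSamba, a hybrid architecture that captures 3D context through cross-plane modeling, enabling efficient volumetric segmentation under weak supervision and achieving state-of-the-art performance on three datasets.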

2606.03827 2026-06-18 cs.CV cs.AI Version update 85%

Conditional Latent Diffusion Model with Fourier-based Motion Modelling for Virtual Population Synthesis

Shaokun Lan, Haoran Dou, Jinghan Huang, Arezoo Zakeri, Fengming Lin, Zherui Zhou, Jinming Duan, Alejandro F. Frangi

Affiliations * Centre for Computational Imaging and Modelling in Medicine (CIMIM); University of Manchester; Christabel Pankhurst Institute; Department of Computer Science; Division of Informatics, Imaging & Data Sciences; Department of Electrical & Electronic Engineering; NIHR Manchester Biomedical Research Centre, Manchester Academic Health Sciences Centre, University of Manchester

Topic match (Medical Imaging): cardiac mesh sequence generation; medical imaging application

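AI summary: Proposes 4D F-MeshLDM, which combines a convolutional mesh VAE, truncated Fourier series motion parameterization, and a conditional diffusion prior to generate controllable 3D+t cardiac mesh sequences, outperforming baseline methods on UK Biobank data.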

Comments This work has been early accepted by International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2026

Abstract

In-silico trials of medical devices require the generation of virtual populations of anatomies. In cardiovascular applications, virtual anatomy is typically represented as a 3D+t mesh sampled from a generative model. However, most existing mesh generators focus on static anatomy, while sequence models often lack explicit periodicity. To this end, we propose 4D F-MeshLDM, a conditional generative framework comprising a convolutional mesh VAE to encode meshes, a structural latent space that parameterises motion using a truncated Fourier series, and a diffusion prior that learns the latent distribution over Fourier coefficient tokens. By conditioning the diffusion process on clinical covariates via affine modulation, we enable controllable synthesis. Sampling tokens and performing inverse Fourier synthesis yield cycle-consistent latent trajectories, which can be decoded into 3D+t cardiac mesh sequences. Experiments on 5,000 UK Biobank subjects demonstrate that 4D F-MeshLDM outperforms state-of-the-art baselines in anatomical fidelity and achieves near-zero cycle closure error. Furthermore, the generated cohorts accurately preserve clinical functional indices, highlighting the potential of our framework for reliable in-silico cardiac trials.
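A truncated Fourier series makes motion periodic by construction: each latent trajectory is $z(t) = a_0 + \sum_{k=1}^{K}\left[a_k\cos(2\pi k t) + b_k\sin(2\pi k t)\right]$ over one normalised cycle $t \in [0, 1]$, so $z(0) = z(1)$ up to floating-point error. The NumPy sketch below (function names hypothetical) fits such coefficients to a sampled latent trajectory by least squares, performs the inverse synthesis, and measures cycle closure error as $\lVert z(1) - z(0)\rVert$; how the paper tokenises the coefficients and defines its closure metric may differ.

```python
import numpy as np

def fourier_basis(t, n_harmonics):
    """Design matrix [1, cos(2*pi*k*t), sin(2*pi*k*t)] for k = 1..K, t in [0, 1]."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(1, n_harmonics + 1)[None, :]
    return np.hstack([np.ones_like(t), np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)])

def synthesize(coeffs, t):
    """Inverse Fourier synthesis: (2K+1, D) coefficients -> (T, D) latent trajectory."""
    n_harmonics = (coeffs.shape[0] - 1) // 2
    return fourier_basis(t, n_harmonics) @ coeffs

def fit_coeffs(traj, t, n_harmonics):
    """Least-squares projection of a sampled trajectory onto the truncated series."""
    coeffs, *_ = np.linalg.lstsq(fourier_basis(t, n_harmonics), traj, rcond=None)
    return coeffs

def cycle_closure_error(coeffs):
    """Distance between the synthesized start and end of the cycle."""
    ends = synthesize(coeffs, [0.0, 1.0])
    return float(np.linalg.norm(ends[1] - ends[0]))

rng = np.random.default_rng(0)
K, D = 3, 8                                    # harmonics, latent dimension (toy values)
true_coeffs = rng.standard_normal((2 * K + 1, D))
t = np.linspace(0.0, 1.0, 50, endpoint=False)  # 50 frames per cardiac cycle (toy)
latent_traj = synthesize(true_coeffs, t)
fitted = fit_coeffs(latent_traj, t, K)
```

Because periodicity is built into the basis rather than learned, a generator that samples coefficients instead of frames cannot produce a cycle that fails to close, which matches the near-zero closure error reported above.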

2504.01527 2026-06-18 cs.CV eess.IV Version update 85%

Beyond Nearest Neighbor Interpolation in Data Augmentation

Olivier Rukundo

Affiliations * Department of Electronic and Computer Engineering, University of Limerick

Topic match (Medical Imaging): an offline data augmentation pipeline that improves medical image segmentation performance

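AI summary: Proposes a modified geometric transformation function and a mean-based class filtering mechanism that avoid the annotation errors of nearest neighbor interpolation and limit the low-pass filtering effects of interpolation, improving medical image segmentation performance through an offline data augmentation pipeline.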

Comments 10 pages, 11 figures, 14 tables

Abstract

Using nearest neighbor interpolation to avoid the risk of undefined categorical labels overlooks the risk of exacerbating pixel-level annotation errors in augmented training data. Additionally, the inherent low-pass filtering effects of interpolation algorithms exacerbate the risk of degrading high-frequency structural details within annotated regions of interest. To avoid these risks, the author modified convolutional neural network data transformation functions by incorporating a modified geometric transformation function, removing reliance on nearest neighbor interpolation, and integrating a mean-based class filtering mechanism to handle undefined categorical labels with alternative interpolation algorithms. The author also implemented an offline data augmentation pipeline to generate interpolation-specific augmented training data, enabling quantitative assessment of interpolation-specific low-pass filtering effects on augmented training data. Experimental evaluation on three medical image segmentation datasets and the XBAT+ datasets demonstrated performance gains across multiple quantitative metrics.
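The undefined-label problem is easy to reproduce: interpolating a label mask with anything other than nearest neighbor produces values between class codes. The abstract does not spell out the mean-based class filtering rule, so the sketch below assumes one plausible reading: thresholds placed at the mean of each pair of adjacent class values, with every interpolated value snapped to the class whose interval contains it. `bilinear_resize` and `filter_to_classes` are hypothetical helpers, not the author's code. For label sets without a meaningful ordering, interpolating one-hot channels and taking the argmax is a common alternative.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear resize of a 2D float array (corner-aligned sampling grid)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def filter_to_classes(interp, classes):
    """Snap interpolated values to defined classes using midpoint (mean) thresholds."""
    classes = np.sort(np.asarray(classes, dtype=float))
    thresholds = (classes[:-1] + classes[1:]) / 2.0  # mean of each adjacent class pair
    return classes[np.digitize(interp, thresholds)]

mask = np.array([[0, 0, 1, 1]] * 4, dtype=float)  # toy binary label map
resized = bilinear_resize(mask, 4, 7)              # contains undefined value 0.5
filtered = filter_to_classes(resized, [0, 1])      # back to {0, 1}
```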

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME Version update 70%

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Mame Diarra Toure, David A. Stephens

Affiliations * Department of Mathematics and Statistics

Topic match (Medical Imaging): method validated on selective prediction for diabetic retinopathy

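AI summary: Addresses the inability of scalar epistemic uncertainty measures in safety-critical classification to distinguish between classes by decomposing mutual information into a per-class vector $C_k$, derived from a second-order Taylor expansion with $1/\mu_k$ weighting to correct boundary suppression; its effectiveness is validated on diabetic retinopathy selective prediction, out-of-distribution detection, and a label-noise study.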

Comments 8 pages, 17 figures. Accepted at UAI 2026

Journal ref Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026, https://openreview.net/forum?id=cxuWscJmAr

Abstract

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k=\mathbb{E}[p_k]$ and $\sigma_k^2=\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
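The decomposition is short enough to check numerically. Expanding $-p\log p$ to second order around $\mu_k$ gives $\mathbb{E}[-p_k\log p_k] \approx -\mu_k\log\mu_k - \sigma_k^2/(2\mu_k)$, so the expected entropy falls short of the entropy of the mean by roughly $\sum_k \sigma_k^2/(2\mu_k)$, which is the MI. The NumPy sketch below uses hypothetical Dirichlet draws as stand-ins for posterior samples of one input's softmax output and compares the exact MI with $\sum_k C_k$; the function names are illustrative, not the authors' code.

```python
import numpy as np

def entropy(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(p), axis=axis)

def mutual_information(probs):
    """probs: (S, K) class probabilities from S posterior samples for one input."""
    return float(entropy(probs.mean(axis=0)) - entropy(probs).mean())

def per_class_epistemic(probs):
    """C_k = Var[p_k] / (2 E[p_k]) across posterior samples."""
    mu = probs.mean(axis=0)
    var = probs.var(axis=0)
    return var / (2.0 * mu)

rng = np.random.default_rng(0)
posterior = rng.dirichlet(1000 * np.array([0.7, 0.2, 0.1]), size=2000)  # (S=2000, K=3)
c_k = per_class_epistemic(posterior)
mi = mutual_information(posterior)
```

With a concentrated posterior like this one the two quantities agree closely; the approximation loosens as samples spread out or become skewed, which is the regime the abstract's skewness diagnostic is meant to flag.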
