arXivDaily: daily arXiv digest, updated Monday through Friday
eess.IV (Image and Video Processing): 10 papers
2606.12294 2026-06-11 cs.CV eess.IV New submission

Bridging the Modality Gap in Forensic Image Retrieval

Ricardo González-Gazapo, Annette Morales-González, Yoanna Martínez-Díaz, Heydi Méndez-Vázquez, Milton García-Borroto

Affiliations: Advanced Technologies Application Center (CENATAV); Centro de Sistemas Complejos, Facultad de Física, Universidad de La Habana

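AI summary: Proposes a unified retrieval framework that uses a multimodal large language model to generate text descriptions and fuses visual and textual features, improving retrieval precision and robustness on forensic tasks such as tattoo and face-sketch retrieval.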

Comments
23 pages, 5 figures, paper submitted to Elsevier journal
Abstract

Automated image retrieval plays an increasingly critical role in modern forensic analysis, supporting investigative workflows that rely on efficient comparison of visual evidence. While prior work has focused primarily on developing and optimizing multimodal retrieval systems, limited attention has been paid to evaluating the forensic applicability of these technologies across diverse real-world scenarios. In this study, we present a unified retrieval framework adapted to four key forensic tasks: (1) tattoo image retrieval given a tattoo query image; (2) tattoo retrieval guided by human-expert textual descriptions, modelling the common situation where a witness verbally describes a tattoo; (3) tattoo retrieval from hand-drawn sketches; and (4) face retrieval from forensic face sketches. Our system leverages a multimodal large language model (MLLM) to automatically generate structured textual descriptions for all queries and gallery images, followed by sentence-transformer embedding for text-based comparison. We evaluate retrieval using visual-only embeddings, text-only embeddings and a multimodal fusion strategy that combines text- and image-based similarity scores derived from state-of-the-art visual feature extractors relevant to each task. The fusion of modalities consistently improves retrieval precision and robustness, especially in scenarios where visual information is limited or noisy (e.g., sketches, partial tattoos, or fragmented witness statements). This work highlights the forensic value of a unified multimodal retrieval pipeline and demonstrates how modern MLLMs can operationalize challenging forensic tasks that traditionally rely on manual expert analysis. Our results position multimodal retrieval as a promising tool for supporting investigative workflows involving tattoos, facial composites, and witness descriptions.
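
The score-level fusion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cosine similarity, the min-max normalization, the weight `alpha`, and the function names are all assumptions chosen for illustration, and the embeddings are taken as already computed by whatever visual extractor and sentence-transformer are in use.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def min_max(scores):
    """Rescale scores to [0, 1] so both modalities are comparable (assumed step)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_and_rank(query_img, query_txt, gallery_img, gallery_txt, alpha=0.5):
    """Rank gallery items by a weighted sum of image and text similarities.

    alpha is a hypothetical fusion weight: 1.0 means visual-only,
    0.0 means text-only.
    """
    img_scores = min_max([cosine(query_img, g) for g in gallery_img])
    txt_scores = min_max([cosine(query_txt, g) for g in gallery_txt])
    fused = [alpha * i + (1.0 - alpha) * t
             for i, t in zip(img_scores, txt_scores)]
    ranking = sorted(range(len(fused)), key=lambda k: fused[k], reverse=True)
    return ranking, fused

# Toy example: three gallery items with 2-D embeddings per modality.
gallery_img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_txt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranking, fused = fuse_and_rank([1.0, 0.0], [0.0, 1.0], gallery_img, gallery_txt)
print(ranking, fused)
```

In this toy case the two modalities disagree on the best match, and the fused score favors the item that is moderately similar under both, which is the kind of behavior one would hope for when one modality is noisy.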

2606.12226 2026-06-11 cs.CV eess.IV New submission

An Electric Potential-Augmented Benchmark Dataset for Physics-Guided Image Reconstruction of Electrical Capacitance Tomography

Xinqi Zhang, Qiming Ma, Lihui Peng

Affiliations: Department of Automation, Tsinghua University

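AI summary: To address data-driven electrical capacitance tomography (ECT) methods ignoring the electric potential field, proposes a benchmark dataset that includes electric potential maps, generates 20,000 samples via a COMSOL-MATLAB pipeline, and shows that the maps improve modeling accuracy and robustness.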

Abstract

While deep learning has significantly advanced image reconstruction of Electrical Capacitance Tomography (ECT), most data-driven methods map directly between capacitance and permittivity distribution, treating the sensor as a black box. This overlooks the electric potential field, the fundamental physical link governing the nonlinear and ill-posed "soft-field" effect. To address this, we propose an electric potential-augmented ECT benchmark dataset designed to explicitly integrate latent physics behind ECT into the learning process. Generated via a COMSOL-MATLAB pipeline for an eight-electrode sensor as an example, the dataset comprises 20,000 randomized samples across four typical flow patterns. Crucially, alongside the conventional capacitance vectors and permittivity distributions depicted as images, each sample preserves eight excitation-wise full-field potential maps. Beyond data release, we provide illustrative evaluation protocols for both forward and inverse problems of ECT. Through comprehensive testing on both in-distribution (IID) and out-of-distribution (OOD) scenarios, we systematically demonstrate how the inclusion of electric potential maps enhances modeling accuracy and robustness. Fundamentally, the explicit inclusion of latent field information significantly lowers the barrier to integrating physical laws into ECT modeling, thereby establishing a standardized foundation for future physics-guided machine learning of ECT image reconstruction.

2606.12123 2026-06-11 eess.IV New submission

An Indoor Localization Technique Utilizing Passive Tags and 3-D Microwave Passive Radar Imaging

Quanfeng Wang, Alexander H. Paulus, Mei Song Tong, Thomas F. Eibert

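AI summary: Proposes a privacy-compliant indoor localization method based on 3-D near-field passive radar imaging, in which passive tags strengthen the scattered field to enable precise localization, including in non-ideal imaging scenarios.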

Comments
This paper is published in Progress In Electromagnetics Research (PIER), Vol. 181, pp. 89-98, 2024. This is the author's version which has not been fully edited and content may change prior to final publication. This repository copy is provided to comply with open-access requirements.
Abstract

A privacy-compliant indoor localization approach utilizing a 3-D near-field (NF) passive radar imaging technique is presented. This technique leverages ubiquitously radiated electromagnetic fields for imaging, with passive tags introduced to enhance the strength of scattering fields, thereby enabling precise localization at the imaging level. The method also supports localization in non-ideal imaging scenarios, such as limited bandwidth or highly reflective environments. Based on their geometrical properties, the simple and low-cost passive tags enable intuitive differentiation between individuals or objects. Associated privacy protection mechanisms are discussed, where the frequency-varying properties of the passive tags provide additional flexibility and potential applications under privacy and ethical considerations. Several forms of passive tags are presented, and both simulation and experimental results validate the effectiveness of the proposed passive tag designs.

2606.12074 2026-06-11 cs.CV cs.AI eess.IV New submission

Non-frontal face recognition using GANs and memristor-based classifiers

Semih Vazgecen, Cristian Sestito, Spyros Stathopoulos, Themis Prodromakis

Affiliations: Centre for Electronics Frontiers, Institute for Integrated Micro and Nano Systems, School of Engineering, The University of Edinburgh

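AI summary: Combines lightweight GAN-based frontalisation with memristor-based neuromorphic recognition to handle non-frontal face recognition, reaching up to 96% accuracy on the evaluated datasets.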

Comments
12 pages, 4 figures, 1 Supplementary (22 pages, 16 figures, 6 tables, 4 supplementary notes)
Abstract

Face recognition systems have advanced significantly through deep learning techniques, delivering high performance and robustness in complex scenarios. However, these approaches incur substantial computational overhead, limiting their in situ applicability in resource-constrained platforms such as drones, where they can address challenges including non-frontal facial imagery. Memristor-based neuromorphic systems have emerged as a compelling approach for edge AI applications, combining biologically inspired processing with efficient and scalable computation. In this work, we propose a facial recognition framework that addresses non-frontal pose variations by integrating lightweight generative adversarial network (GAN)-based pose frontalisation with memristor-based neuromorphic recognition. The experimental results on two datasets demonstrate the effectiveness of combining adversarial learning with memristive technology, achieving up to 96% identification accuracy. The proposed approach alleviates the computational bottlenecks of conventional AI and offers a scalable, efficient solution for face recognition in dynamic real-world environments.

2606.11500 2026-06-11 eess.IV cs.CE cs.IT cs.LG q-bio.NC New submission

FlexiBrain: Resolution-Agnostic Voxel-Level Encoding for Native fMRI

Mo Wang, Wenhao Ye, Junfeng Xia, Minghao Xu, Hongkai Wen, Quanying Liu

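AI summary: Proposes FlexiBrain, a resolution-agnostic voxel-level encoding framework based on Mamba-JEPA that processes native fMRI data directly through dynamic patch resizing, avoiding destructive spatial standardization; it improves performance by up to 12 percentage points across five downstream tasks and substantially reduces preprocessing cost.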

Abstract

The success of large-scale deep learning models in neuroscience is fundamentally constrained by severe data heterogeneity. Native fMRI data aggregated from diverse sources exhibit substantial variation in both spatial and temporal resolutions. Consequently, most existing frameworks rely on lengthy, rigid preprocessing pipelines that enforce uniformity across datasets. This practice introduces two critical limitations: (1) potential degradation of subject-specific anatomical information; (2) significant computational overhead, often requiring hours of processing per subject. Here, we propose FlexiBrain, a resolution-agnostic voxel-level encoding framework for native fMRI based on Mamba-JEPA. FlexiBrain defines patch sizes in real-world physical units and employs dynamic patch resizing, thereby bypassing destructive spatial standardization while enabling direct ingestion of data in native space. We instantiate the framework using an efficient Mamba-JEPA backbone to model high-dimensional 4D fMRI signals. Across five diverse downstream neuroscience tasks, FlexiBrain consistently outperforms recent state-of-the-art methods, achieving gains of up to 12 percentage points without external data augmentation. Importantly, FlexiBrain functions as a seamless plug-in module, substantially reducing preprocessing costs and accelerating the development of robust voxel-level fMRI foundation models. Code is available at this https URL.
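
One way to read "patch sizes in real-world physical units" is that a fixed physical extent (in millimetres) is converted to a per-scan voxel count from the scan's voxel spacing. The Python sketch below illustrates only that arithmetic. The abstract does not describe how FlexiBrain actually resizes patches, so the rounding rule, the function names, and the example sizes are assumptions.

```python
import math

def voxels_per_patch(patch_mm, spacing_mm):
    """Convert a physical patch size (mm per axis) to a voxel count per axis.

    Rounding to the nearest voxel (at least 1) is an assumed choice.
    """
    return tuple(max(1, round(p / s)) for p, s in zip(patch_mm, spacing_mm))

def patch_grid(volume_shape, patch_voxels):
    """Number of patches per axis needed to cover the volume (last patch may be padded)."""
    return tuple(math.ceil(d / p) for d, p in zip(volume_shape, patch_voxels))

# Hypothetical 12 mm isotropic patch applied to two scans with different resolutions.
patch_mm = (12.0, 12.0, 12.0)
scan_a = {"shape": (96, 96, 60), "spacing": (2.0, 2.0, 2.0)}
scan_b = {"shape": (64, 64, 40), "spacing": (3.0, 3.0, 3.0)}

for scan in (scan_a, scan_b):
    pv = voxels_per_patch(patch_mm, scan["spacing"])
    print(pv, patch_grid(scan["shape"], pv))
```

Both hypothetical scans cover the same physical field of view along the first two axes, so they end up with the same number of patches there even though their voxel grids differ, which is the property that makes resampling to a common grid unnecessary.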

2606.11287 2026-06-11 eess.IV cs.CV New submission

Intelligent Skin Cancer Detection Using a Multispectral Metasurface and a Hybrid

Intelligent Skin Cancer Detection Based on a Multispectral Metasurface and Hybrid Deep Learning

Afsane Saee Arezoomand

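AI summary: Combines multispectral metasurface imaging with a hybrid CNN-ViT deep learning architecture for high-accuracy skin cancer detection, reaching 98% accuracy, 95% sensitivity, and 99% specificity.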

Comments
8 pages
Abstract

Skin cancer is among the most prevalent malignancies worldwide, and its early detection is essential for improving patient survival and reducing treatment costs. Conventional dermoscopic and visual imaging techniques are primarily limited to the visible spectrum and often fail to capture subtle spectral signatures associated with early-stage malignancies. This study proposes an innovative framework that integrates a multispectral metasurface for imaging with a hybrid deep learning architecture based on Convolutional Neural Networks and Vision Transformers. The designed metasurface enables noninvasive acquisition of rich spectral information highly sensitive to tissue alterations, while the hybrid CNN-ViT model simultaneously extracts local and global features to robustly classify skin lesions. Simulation-based evaluations demonstrate that the proposed method achieves approximately 98% accuracy, 95% sensitivity, and 99% specificity, surpassing conventional RGB-based and single-architecture approaches. Qualitative analyses using attention maps reveal that the model focuses on clinically relevant lesion regions, improving interpretability. Overall, the results indicate that combining metasurface-based multispectral imaging with hybrid deep learning can introduce a new generation of diagnostic tools in dermatology and pave the way for portable, fast, and highly accurate clinical systems.

2606.11107 2026-06-11 eess.IV cs.CV cs.LG Updated version

Multimodal Brain Tumour Classification Using Feature Fusion

Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber

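AI summary: Proposes a two-branch multimodal network that fuses MRI images with 91 radiomic features for brain tumour classification; gated fusion achieves 96.13% accuracy.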

Abstract

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate clinicians' multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. Both streams are fused via concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.
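
A common form of gated fusion computes a per-dimension gate from the concatenated branch outputs and uses it to blend the two feature vectors. The Python sketch below shows that generic form only. The abstract does not specify the gate used in the paper, so the sigmoid gate, the convex blend, the requirement that both branches share one dimension, and the parameter names `weights` and `bias` are all assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_img, h_rad, weights, bias):
    """Blend image and radiomic features with a learned per-dimension gate.

    h_img, h_rad: feature vectors of the same length d (assumed already
                  projected to a shared dimension by each branch).
    weights:      d rows of length 2*d, applied to the concatenation [h_img; h_rad].
    bias:         length-d vector.
    """
    concat = list(h_img) + list(h_rad)
    fused = []
    for k in range(len(h_img)):
        z = sum(w * x for w, x in zip(weights[k], concat)) + bias[k]
        g = sigmoid(z)  # g near 1 trusts the image branch, near 0 the radiomic branch
        fused.append(g * h_img[k] + (1.0 - g) * h_rad[k])
    return fused

# With zero weights and bias the gate is 0.5, so the output is the plain average.
h_img, h_rad = [1.0, 2.0], [3.0, 0.0]
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_fusion(h_img, h_rad, zero_w, [0.0, 0.0]))
```

In a trained network `weights` and `bias` would be learned jointly with the two branches, and the fused vector would feed the four-class classifier head.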

2602.14913 2026-06-11 cs.LG eess.IV Updated version

Coverage Guarantees for Pseudo-Calibrated Conformal Prediction under Distribution Shift

Farbod Siahkali, Ashwin Verma, Vijay Gupta

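AI summary: To address coverage failure of conformal prediction under distribution shift, uses pseudo-calibration and domain adaptation tools to derive a lower bound on target coverage, and proposes inflating the conformal threshold by a slack parameter along with a source-tuned pseudo-calibration algorithm; experiments show that the approach mitigates coverage degradation.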

Comments
Under review. 6 pages, 2 figures, 1 table
Abstract

Conformal prediction (CP) offers distribution-free marginal coverage guarantees under an exchangeability assumption, but these guarantees can fail if the data distribution shifts. We analyze the use of pseudo-calibration as a tool to counter this performance loss under a bounded label-conditional covariate shift model. Using tools from domain adaptation, we derive a lower bound on target coverage in terms of the source-domain loss of the classifier and a Wasserstein measure of the shift. Using this result, we provide a method to design pseudo-calibrated sets that inflate the conformal threshold by a slack parameter to keep target coverage above a prescribed level. Finally, we propose a source-tuned pseudo-calibration algorithm that interpolates between hard pseudo-labels and randomized labels as a function of classifier uncertainty. Numerical experiments show that our bounds qualitatively track pseudo-calibration behavior and that the source-tuned scheme mitigates coverage degradation under distribution shift while maintaining nontrivial prediction set sizes.
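
The idea of inflating a conformal threshold by a slack parameter can be sketched on top of standard split conformal prediction. The Python example below is illustrative only: the finite-sample quantile rule is the standard one, but the additive form of the slack, the nonconformity scores, and the function names are assumptions, and the paper's coverage bound and source-tuned interpolation scheme are not reproduced here.

```python
import math

def conformal_threshold(cal_scores, alpha, slack=0.0):
    """Split-conformal quantile of calibration nonconformity scores, plus a slack.

    Uses the standard rank ceil((n + 1) * (1 - alpha)). In a pseudo-calibration
    setting the scores would come from pseudo-labeled target samples. Adding
    `slack` to the quantile is an assumed way of inflating the threshold.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: include every label
    return sorted(cal_scores)[rank - 1] + slack

def prediction_set(class_scores, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, s in enumerate(class_scores) if s <= threshold]

# Hypothetical calibration scores (e.g. 1 - predicted probability of the label).
cal = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
test_scores = [0.55, 0.2, 0.9]  # one score per candidate class

plain = conformal_threshold(cal, alpha=0.5)
inflated = conformal_threshold(cal, alpha=0.5, slack=0.1)
print(prediction_set(test_scores, plain), prediction_set(test_scores, inflated))
```

A larger slack can only grow the prediction sets, which is the trade-off noted in the abstract: coverage is protected under shift at the cost of set size.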

2512.13765 2026-06-11 eess.IV cs.AI cs.LG Updated version

Towards Deep Learning Surrogate for the Forward Problem in Electrocardiology: A Scalable Alternative to Physics-Based Models

Shaheim Ogbomo-Harmitt, Cesare Magnetti, Chiara Spota, Jakub Grzelak, Oleg Aslanidi

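AI summary: Proposes an attention-based sequence-to-sequence deep learning framework as a surrogate for the forward problem in electrocardiology, predicting ECG signals from cardiac voltage propagation maps; it reaches high accuracy on 2D tissue simulations (mean R² = 0.99 ± 0.01), offering a scalable, low-cost alternative to physics-based models.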

Comments
Accepted to CinC conference 2025
Abstract

The forward problem in electrocardiology, computing body surface potentials from cardiac electrical activity, is traditionally solved using physics-based models such as the bidomain or monodomain equations. While accurate, these approaches are computationally expensive, limiting their use in real-time and large-scale clinical applications. We propose a proof-of-concept deep learning (DL) framework as an efficient surrogate for forward solvers. The model adopts a time-dependent, attention-based sequence-to-sequence architecture to predict electrocardiogram (ECG) signals from cardiac voltage propagation maps. A hybrid loss combining Huber loss with a spectral entropy term was introduced to preserve both temporal and frequency-domain fidelity. Using 2D tissue simulations incorporating healthy, fibrotic, and gap junction-remodelled conditions, the model achieved high accuracy (mean $R^2 = 0.99 \pm 0.01$). Ablation studies confirmed the contributions of convolutional encoders, time-aware attention, and spectral entropy loss. These findings highlight DL as a scalable, cost-effective alternative to physics-based solvers, with potential for clinical and digital twin applications.
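
A hybrid loss of the kind mentioned here, a Huber term plus a spectral entropy term, might look like the Python sketch below. The abstract does not give the exact formulation, so the absolute difference between the spectral entropies of prediction and target, the weight `lam`, the Huber `delta`, and the naive DFT are all assumptions made to keep the example self-contained.

```python
import cmath
import math

def huber(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(pred)

def spectral_entropy(signal):
    """Shannon entropy of the normalized one-sided power spectrum (naive DFT)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        c = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal))
        power.append(abs(c) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)

def hybrid_loss(pred, target, lam=0.1, delta=1.0):
    """Huber term for time-domain fit plus an assumed spectral-entropy mismatch term."""
    entropy_gap = abs(spectral_entropy(pred) - spectral_entropy(target))
    return huber(pred, target, delta) + lam * entropy_gap

# Toy ECG-like target and a noisier prediction.
target = [math.sin(2 * math.pi * i / 16) for i in range(16)]
pred = [v + (0.2 if i % 2 else -0.2) for i, v in enumerate(target)]
print(hybrid_loss(pred, target))
```

The time-domain term penalizes pointwise error while limiting the influence of outliers, and the spectral term penalizes predictions whose energy is spread across frequencies differently from the target. In a training setup this would be written with a differentiable FFT rather than a Python loop.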

2507.21164 2026-06-11 cs.LG cs.AI eess.IV stat.ML Updated version

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

Nicolas Pinon (MYRIAD), Robin Trombetta (MYRIAD), Carole Lartizien (MYRIAD)

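AI summary: Couples representation learning with an analytically solvable one-class SVM through a custom loss that directly aligns latent features with the decision boundary, showing robustness and performance on MNIST-C and brain MRI lesion detection.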

Abstract

Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that couples representation learning with an analytically solvable One-Class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a benchmark based on MNIST-C, and a challenging brain MRI lesion detection task. Unlike most methods that focus on large, hyperintense lesions at the image level, our approach succeeds in targeting small, non-hyperintense lesions, and we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and texture or population age variations in MRI. Results demonstrate the performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at this https URL.