DCFO: Density-Based Counterfactuals for Outliers -- Additional Material
Tommaso Amico, Pernille Matthews, Lena Krieger, Arthur Zimek, Ira Assent
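AI summary: To address the lack of interpretability of the Local Outlier Factor (LOF), the paper proposes Density-based Counterfactuals for Outliers (DCFO). DCFO partitions the data space into regions where LOF is smooth, which enables efficient gradient-based optimisation, and it outperforms existing methods on 50 OpenML datasets.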
Outlier detection identifies data points that deviate significantly from the majority of the data distribution. Explaining outliers is crucial for understanding the underlying factors that contribute to their detection, validating their significance, and identifying potential biases or errors. Effective explanations provide actionable insights, facilitating preventive measures to avoid similar outliers in the future. Counterfactual explanations clarify why specific data points are classified as outliers by identifying the minimal changes required to alter their prediction. Although valuable, most existing counterfactual explanation methods overlook the unique challenges posed by outlier detection and fail to target classical, widely adopted outlier detection algorithms. The Local Outlier Factor (LOF) is one of the most popular unsupervised outlier detection methods; it quantifies outlierness through relative local density. Despite LOF's widespread use across diverse applications, it lacks interpretability. To address this limitation, we introduce Density-based Counterfactuals for Outliers (DCFO), a novel method specifically designed to generate counterfactual explanations for LOF. DCFO partitions the data space into regions where LOF behaves smoothly, enabling efficient gradient-based optimisation. Extensive experimental validation on 50 OpenML datasets demonstrates that DCFO consistently outperforms the benchmarked competitors, achieving better proximity and validity of the generated counterfactuals.
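To make "relative local density" and "minimal changes required to alter the prediction" concrete, here is a minimal pure-Python sketch of the standard LOF score (k-distance, reachability distance, local reachability density) together with a deliberately naive counterfactual search. The helper names (`knn`, `k_distance`, `lrd`, `lof`, `line_search_counterfactual`), the LOF threshold of 1.5, and the straight-line walk towards the nearest data point are hypothetical illustrations. This is not DCFO, which instead partitions the space into regions where LOF is smooth and optimises with gradients. The sketch also assumes Euclidean distance, exactly k neighbours with ties broken by list order, and no duplicate points.

```python
import math

# Minimal LOF sketch (pure Python, O(n^2) per query; for illustration only).
# Assumptions: Euclidean distance, exactly k neighbours (ties broken by list
# order), and no duplicate points (otherwise a density can divide by zero).


def knn(data, q, k, exclude=None):
    """Indices of the k points in `data` nearest to q, skipping index `exclude`."""
    cand = [i for i in range(len(data)) if i != exclude]
    cand.sort(key=lambda i: math.dist(data[i], q))  # stable sort: ties keep index order
    return cand[:k]


def k_distance(data, i, k):
    """Distance from data[i] to its k-th nearest neighbour (excluding itself)."""
    nn = knn(data, data[i], k, exclude=i)
    return math.dist(data[i], data[nn[-1]])


def lrd(data, q, k, exclude=None):
    """Local reachability density: inverse of the mean reachability distance to the k-NN."""
    nn = knn(data, q, k, exclude)
    # reach-dist_k(q, o) = max(k-distance(o), d(q, o))
    reach = [max(k_distance(data, o, k), math.dist(q, data[o])) for o in nn]
    return len(reach) / sum(reach)


def lof(data, q, k, exclude=None):
    """LOF of q: mean lrd of its k-NN divided by q's own lrd (near or below 1 for inliers)."""
    nn = knn(data, q, k, exclude)
    neighbour_lrd = sum(lrd(data, data[o], k, exclude=o) for o in nn) / len(nn)
    return neighbour_lrd / lrd(data, q, k, exclude)


def line_search_counterfactual(data, x, k, threshold=1.5, steps=100):
    """Hypothetical naive baseline, NOT DCFO: walk from x in a straight line
    towards its nearest data point and return the first grid step whose LOF
    (x treated as a new query) is <= threshold, or None if none is found."""
    target = data[knn(data, x, 1)[0]]
    for s in range(steps + 1):
        t = s / steps
        cand = tuple(a + t * (b - a) for a, b in zip(x, target))
        if lof(data, cand, k) <= threshold:
            return cand
    return None
```

For example, with a 3x3 grid of inliers spaced 0.5 apart and a query at (5, 5), `lof` gives the query a score well above 1, and the line search returns a point on the segment between the query and its nearest inlier. The fixed direction and grid of step sizes are what make this baseline crude: it finds a valid counterfactual along one path but makes no attempt to minimise the change over all directions, which is the gap a dedicated method such as DCFO targets.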