Provable Fairness Repair for Deep Neural Networks
Jianan Ma, Jingyi Wang, Qi Xuan, Zhen Wang
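AI summary: This paper proposes ProF, a framework that uses interval bound propagation to provide provable fairness repair for deep neural networks. It guarantees fairness over the entire set around each biased sample, and its effectiveness is validated on multiple benchmark datasets.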
Comments: 15 pages, 6 figures, 7 tables. Full version of the paper accepted by ASE 2025.
Details
- Journal ref
- Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2025
Deep neural networks (DNNs) suffer from ethical issues such as individual discrimination. In response, numerous NN repair techniques have been developed to adjust models and mitigate such undesired behaviors. However, existing fairness repair methods are typically data-centric and often lack both provable guarantees and generalization to unseen samples. To overcome these limitations, we propose ProF, a novel fairness repair framework with provable guarantees. The key intuition of ProF is to leverage interval bound propagation (a widely used NN verification technique) to soundly capture model outputs over the whole set $S(\mathbf{x})$ around a biased sample $\mathbf{x}$. The derived bounds are used to guide fairness repair, which encourages the model to produce consistent outputs on $S(\mathbf{x})$. Specifically, we integrate fairness constraints and model modifications into a unified constraint-solving formulation, which can be transformed into a Mixed-Integer Linear Programming (MILP) problem solvable by off-the-shelf solvers. The solution to the MILP problem induces a repaired model with guaranteed fairness over the whole set $S(\mathbf{x})$. We evaluate ProF on four widely used benchmark datasets and demonstrate that it achieves provable fairness repair, with generalization of up to 95.93\% on full datasets and 93.16\% on the entire input space. Notably, ProF can be easily configured to support multiple sensitive attributes and more practical fairness definitions, while providing provable repair guarantees and delivering around 90\% fairness improvement. Our code is available at https://github.com/nninjn/ProF.
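To make the role of interval bound propagation concrete, the following is a minimal sketch, not the ProF implementation. It assumes that $S(\mathbf{x})$ is the box obtained by fixing the non-sensitive attributes of $\mathbf{x}$ and letting one sensitive attribute range over $[0, 1]$, and that the model is a small ReLU network with a single output logit whose sign is the decision. The function names, the toy weights, and this definition of $S(\mathbf{x})$ are all illustrative assumptions. The sketch covers only the bounding step; the MILP-based repair is not shown.

```python
# Hypothetical sketch of interval bound propagation (IBP) over a set S(x).
# Assumption: S(x) is an axis-aligned box [lo, hi]; the network is a list of
# (W, b) affine layers with ReLU between them and one output logit.

def affine_bounds(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, a, c in zip(row, lo, hi):
            # A non-negative weight keeps the interval order; a negative one flips it.
            l += w * a if w >= 0 else w * c
            h += w * c if w >= 0 else w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def ibp(layers, lo, hi):
    """Sound (possibly loose) output bounds of the network on the box."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # no activation after the last layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

def certified_consistent(layers, lo, hi):
    """True if the sign of the output logit is provably constant on the box."""
    out_lo, out_hi = ibp(layers, lo, hi)
    return out_lo[0] > 0 or out_hi[0] < 0

# S(x): feature 0 is fixed at 1.0, sensitive feature 1 ranges over [0, 1].
box_lo, box_hi = [1.0, 0.0], [1.0, 1.0]

# Toy model whose decision barely depends on the sensitive feature.
fair_layers = [([[1.0, 0.5], [-1.0, 0.25]], [0.0, 0.5]),
               ([[1.0, -1.0]], [0.1])]

# Toy model whose decision flips with the sensitive feature.
biased_layers = [([[1.0, -2.0], [0.0, 1.0]], [0.0, 0.0]),
                 ([[1.0, -1.0]], [0.0])]

fair_bounds = ibp(fair_layers, box_lo, box_hi)
biased_bounds = ibp(biased_layers, box_lo, box_hi)
```

In this toy setting the first model's output interval lies strictly above zero, so its decision is certified to be the same everywhere on the box. The second model's interval straddles zero, so no certificate is obtained. Because IBP is sound but can over-approximate, a straddling interval by itself does not prove that a model is unfair; here the second model is in fact unfair, as evaluating it at the two endpoints of the sensitive attribute gives opposite signs.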