PURGE: Projected Unlearning via Retain-Guided Erasure
Vedant Jawandhia, Daksh Ahuja, Ghufran Alam Siddiqui, Prashant Trivedi, Yash Sinha, Pratik Narang
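AI summary: Proposes PURGE, an unlearning algorithm grounded in the duality between continual learning and machine unlearning. It uses gradient projection to constrain the retain loss, and it balances privacy and utility through multi-layer representation erasure and a retain-confusion objective.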
- Comments: 13 pages, 10 figures, 6 tables
We propose PURGE, a machine unlearning algorithm built on a simple but under-exploited observation: continual learning (CL) and machine unlearning (MU) are fundamentally dual problems. CL tries to learn new tasks without forgetting old ones. MU tries to erase specific data without hurting retained performance. The two express the same underlying tension, pulled in opposite directions. PURGE leverages this duality by adapting the gradient projection of A-GEM (Chaudhry et al., 2019) so that every unlearning step is constrained not to increase the retain-set loss. On top of this, it performs multi-layer representation erasure. This pushes forget-set activations in intermediate layers toward the retain distribution, so that information is removed from hidden representations rather than merely suppressed at the output. A key design choice is the retain-confusion target. The obvious choice is to push forget outputs toward the uniform distribution, but we found this surprisingly easy for membership inference attacks (MIAs) to detect. We instead target the model's natural confusion pattern on retain data, which makes the unlearned model hard to distinguish from one retrained from scratch. Two self-regulating stopping criteria, a retain-loss budget and a forget-accuracy target, let the algorithm decide on its own when to stop and remove the need for manual epoch tuning. We ran experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) across 22 class-level forgetting tasks. PURGE consistently keeps retain accuracy above 96% while achieving an MIA AUROC close to the ideal value of 0.5. On the privacy-utility frontier, it outperforms gradient ascent, KL-uniform and several published baselines.
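The sketch below illustrates three ingredients from the abstract: the gradient-projection constraint, a retain-confusion target and the two stopping criteria. It is a minimal NumPy sketch that works on flattened gradient vectors.

The projection follows the standard A-GEM rule (Chaudhry et al., 2019), with the retain-set gradient standing in for A-GEM's episodic-memory reference gradient. Everything else is an assumption made for illustration and is not PURGE's actual definition:

- the concrete form of the retain-confusion target (average retain-set softmax with the forgotten class zeroed out and renormalized);
- the KL loss toward that target;
- the additive form of the retain-loss budget;
- every function and parameter name (`agem_project`, `unlearning_step`, `retain_confusion_target`, `kl_to_target`, `should_stop`, `budget`, `forget_acc_target`).

```python
import numpy as np


def agem_project(g_unlearn: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """Project an unlearning gradient A-GEM style against the retain gradient.

    For an SGD step theta <- theta - lr * g, the retain loss changes by about
    -lr * dot(g, g_retain). A non-negative dot product means the step does not
    increase the retain loss to first order, so g is returned unchanged.
    """
    dot = float(np.dot(g_unlearn, g_retain))
    if dot >= 0.0:
        return g_unlearn
    # dot < 0 implies g_retain is non-zero, so the division below is safe.
    # Removing the conflicting component leaves a gradient orthogonal to g_retain.
    return g_unlearn - (dot / float(np.dot(g_retain, g_retain))) * g_retain


def unlearning_step(theta: np.ndarray, g_unlearn: np.ndarray,
                    g_retain: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One projected descent step on a (hypothetical) forgetting objective."""
    return theta - lr * agem_project(g_unlearn, g_retain)


def retain_confusion_target(retain_probs: np.ndarray, forget_class: int) -> np.ndarray:
    """HYPOTHETICAL retain-confusion target for class-level forgetting.

    Averages the model's softmax outputs over retain samples, zeroes the
    forgotten class, and renormalizes. In this sketch, forget-set outputs are
    pushed toward this distribution instead of toward the uniform one.
    """
    target = retain_probs.mean(axis=0)  # mean() returns a new array, safe to modify
    target[forget_class] = 0.0
    return target / target.sum()


def kl_to_target(target: np.ndarray, forget_probs: np.ndarray,
                 eps: float = 1e-12) -> float:
    """Mean KL(target || p) over forget samples (one possible loss, assumed)."""
    log_ratio = np.log(target + eps) - np.log(forget_probs + eps)
    return float(np.mean(np.sum(target * log_ratio, axis=1)))


def should_stop(retain_loss: float, initial_retain_loss: float, forget_acc: float,
                budget: float = 0.05, forget_acc_target: float = 0.0) -> bool:
    """HYPOTHETICAL stopping rule combining the two criteria.

    Stop once the retain loss has risen more than `budget` above its starting
    value, or once forget-set accuracy has fallen to `forget_acc_target`.
    """
    over_budget = retain_loss > initial_retain_loss + budget
    forgotten = forget_acc <= forget_acc_target
    return over_budget or forgotten
```

For example, `agem_project(np.array([1.0, -2.0]), np.array([0.0, 1.0]))` returns `[1.0, 0.0]`. The component that would have raised the retain loss is removed, and the rest of the update is kept.

The projection only guarantees that the first-order change in retain loss is non-positive. With a large learning rate, a projected step can still raise the retain loss somewhat. A separate retain-loss budget therefore remains a meaningful safeguard in a sketch like this.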