When Accuracy Is Not Enough: Uncertainty Collapse between Noisy Label Learning and Out-of-Distribution Detection
Ningkang Peng, Jingyang Mao, Runhan Zhou, Peirong Ma, Yanhui Gu
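AI summary: This paper studies uncertainty collapse between noisy-label learning and out-of-distribution detection. It proposes a general-purpose ACC-OOD benchmark, shows that high accuracy does not guarantee out-of-distribution reliability, and proposes Virtual Margin Regularization to mitigate the problem.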
Learning with noisy labels (LNL) is typically benchmarked by closed-set classification accuracy, yet deployment often requires classifiers to reject out-of-distribution (OOD) inputs. We present a learner-agnostic ACC-OOD benchmark that freezes LNL checkpoints and evaluates them with standardized near-/far-OOD routing and post-hoc scores across synthetic and real label noise. The benchmark reveals a recurring failure mode: high closed-set accuracy does not ensure OOD reliability, because low-confidence, misclassified in-distribution samples can overlap the score and feature regions occupied by OOD inputs under noisy training. We term this pathology uncertainty collapse. This structural overlap can make high-accuracy LNL methods lose separability at the ID-error/OOD interface under standard OOD scores. As an intervention, we study Virtual Margin Regularization (VMR), a lightweight repair probe, demonstrated mainly with PSSCL, that synthesizes boundary virtual outliers on trusted ID batches and widens the energy margin. VMR partially reduces the collapse-induced far-OOD failure without replacing the host objective or sacrificing closed-set accuracy in the tested settings. These results support LNL benchmarks that co-report closed-set generalization, open-world reliability, and structural overlap diagnostics.
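To make the co-reporting idea concrete, the following is a minimal sketch of how closed-set accuracy and a post-hoc OOD score might be reported together from the logits of a frozen checkpoint. It is not the paper's benchmark code. It assumes the widely used energy score (temperature-scaled log-sum-exp of the logits, higher meaning more ID-like) as the post-hoc score, and the function names `energy_score`, `auroc`, and `acc_ood_report` are hypothetical. The last metric restricts the ID side to misclassified samples, which is one plausible way to probe the ID-error/OOD interface described above.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Temperature-scaled log-sum-exp of the logits; higher means more ID-like."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    return temperature * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def acc_ood_report(id_logits, id_labels, ood_logits):
    """Co-report closed-set accuracy and ID-vs-OOD separability (assumed metrics)."""
    id_logits = np.asarray(id_logits, dtype=float)
    correct = id_logits.argmax(axis=1) == np.asarray(id_labels)
    s_id = energy_score(id_logits)
    s_ood = energy_score(ood_logits)
    report = {
        "accuracy": float(correct.mean()),
        "auroc_id_vs_ood": auroc(s_id, s_ood),
    }
    if (~correct).any():
        # Separability at the ID-error/OOD interface: misclassified ID samples only.
        report["auroc_id_error_vs_ood"] = auroc(s_id[~correct], s_ood)
    return report

# Toy logits (made up for illustration): two confident correct ID samples,
# one low-confidence misclassified ID sample, and two OOD samples.
id_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.2, 0.1, 0.0]]
id_labels = [0, 1, 2]
ood_logits = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(acc_ood_report(id_logits, id_labels, ood_logits))
```

On this toy input the overall ID-vs-OOD AUROC is well above chance, while the AUROC restricted to the misclassified ID sample sits at chance, because that sample's energy falls between the two OOD energies. This is only an illustration of the kind of overlap the abstract calls uncertainty collapse, not a reproduction of any reported result.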