Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier
Berk Hayta, Hannah Laus, Simon Mittermaier, Felix Krahmer
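AI summary: This paper proposes a simplified framework that approximates the uncertainty-estimation objective of evidential deep learning with plug-in losses, proves that under a particular evidence-to-Dirichlet mapping the framework includes the standard softmax classifier, and validates the approach on the Google Speech Commands dataset.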
Details
Real-world sensor-based learning systems require uncertainty estimation that is both reliable and computationally efficient. Evidential Deep Learning (EDL) provides single-pass uncertainty estimation by modeling the class probabilities via Dirichlet distributions, where the Dirichlet parameters are predicted by a learned neural network mapping. However, this approach can lead to computational challenges, as Dirichlet expected objectives are more complex than standard supervised learning losses, complicating their analysis and implementation. We address this issue by approximating the objective of the first-order empirical risk minimization problem induced by EDL with a plug-in loss evaluated at the Dirichlet mean and show that, under mild assumptions, the approximation error decays with growing evidence for a broad class of loss functions, including mean-squared error and cross-entropy loss. As a special case, our analysis provides justification for the use of softmax in the context of uncertainty estimation, since under a particular evidence-to-Dirichlet mapping, our framework includes the standard softmax classifier. We validate the proposed simplified objectives on the Google Speech Commands dataset and show that they achieve predictive accuracy and selective prediction performance comparable to classical EDL, while being simpler to implement using standard deep learning losses and training pipelines. To the best of our knowledge, this empirical analysis is the first to obtain coverage-accuracy trade-offs for speech recognition tasks through EDL.
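To make the plug-in idea concrete, here is a minimal sketch in plain Python; it is not the authors' implementation. It uses the mean-squared error, for which the Dirichlet expectation has a well-known closed form: the plug-in loss at the Dirichlet mean plus the summed variances of the class probabilities. The gap between the two therefore shrinks as the total evidence grows. The function names, the example parameters, and the choice of an exponential mapping alpha_k = exp(z_k) from logits to Dirichlet parameters are assumptions made for illustration; the abstract only states that some particular evidence-to-Dirichlet mapping recovers softmax.

```python
import math


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def plugin_mse(alpha, y):
    """Plug-in loss: squared error evaluated at the Dirichlet mean."""
    mean = dirichlet_mean(alpha)
    return sum((m - t) ** 2 for m, t in zip(mean, y))


def expected_mse(alpha, y):
    """Closed-form E||p - y||^2 for p ~ Dirichlet(alpha).

    Equals the plug-in loss plus the sum of the per-class variances,
    Var(p_k) = m_k * (1 - m_k) / (alpha_0 + 1), with alpha_0 = sum(alpha).
    """
    total = sum(alpha)
    mean = dirichlet_mean(alpha)
    variance = sum(m * (1.0 - m) / (total + 1.0) for m in mean)
    return plugin_mse(alpha, y) + variance


def softmax(logits):
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaps_for_scales(base_alpha, y, scales):
    """Gap (expected - plug-in) when all Dirichlet parameters are scaled up.

    Scaling keeps the mean fixed while increasing the total evidence.
    """
    gaps = []
    for s in scales:
        alpha = [s * a for a in base_alpha]
        gaps.append(expected_mse(alpha, y) - plugin_mse(alpha, y))
    return gaps


if __name__ == "__main__":
    # Hypothetical three-class example with a one-hot target.
    base_alpha = [2.0, 1.0, 1.0]
    target = [1.0, 0.0, 0.0]
    for scale, gap in zip([1, 10, 100], gaps_for_scales(base_alpha, target, [1, 10, 100])):
        print(f"scale={scale:>3}  gap={gap:.6f}")

    # Assumed exponential mapping: alpha_k = exp(z_k) gives mean = softmax(z).
    logits = [1.0, -0.5, 2.0]
    print(dirichlet_mean([math.exp(z) for z in logits]))
    print(softmax(logits))
```

In this toy setting the gap is exactly the variance term, so it decays on the order of one over the total evidence. The cross-entropy case discussed in the abstract involves a different expectation and is not covered by this sketch.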