Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method to learn a single corrective multiplicative factor for inputs to the last softmax layer. For non-neural models, the existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insights into the biases in the uncalibrated model.
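The calibration map described above (log-transform, one linear layer, softmax) can be illustrated with a minimal NumPy sketch. The function names `softmax` and `dirichlet_calibrate`, the parameter names `W` and `b`, the clipping constant `eps`, and the example probabilities are all assumptions made for illustration; they are not taken from the authors' implementation. Fitting `W` and `b` (for instance by minimising log-loss on held-out data) is not shown.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dirichlet_calibrate(probs, W, b, eps=1e-12):
    # Illustrative map: softmax(W @ log(p) + b), applied to each row of probs.
    # probs: (n, k) uncalibrated probabilities; W: (k, k); b: (k,).
    # eps is an assumed guard against log(0), not part of the method itself.
    log_p = np.log(np.clip(probs, eps, 1.0))
    return softmax(log_p @ W.T + b)

# Made-up uncalibrated predictions for a 3-class problem.
probs = np.array([[0.90, 0.07, 0.03],
                  [0.20, 0.50, 0.30]])
k = probs.shape[1]

# With W = I and b = 0 the map leaves the probabilities unchanged.
unchanged = dirichlet_calibrate(probs, np.eye(k), np.zeros(k))

# With W = I / T and b = 0 the map acts like temperature scaling:
# log-probabilities differ from logits only by a per-row constant,
# which softmax ignores. T > 1 lowers the top-class confidence.
T = 2.0
softened = dirichlet_calibrate(probs, np.eye(k) / T, np.zeros(k))
```

In this sketch a full matrix `W` and a bias `b` give the map more freedom than a single scalar: off-diagonal entries let the score of one class depend on the predicted probability of another, which is one way to read the abstract's remark that the learned parameters reveal biases in the uncalibrated model.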
