Distributional Generalization: A New Kind of Generalization

We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that the outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error. For example, if we mislabel 30% of dogs as cats in the train set of CIFAR-10, then a ResNet trained to interpolation will in fact mislabel roughly 30% of dogs as cats on the *test set* as well, while leaving other classes unaffected. This behavior is not captured by classical generalization, which considers only the average error and not the distribution of errors over the input domain. Our formal conjectures, which are much more general than this example, characterize the form of distributional generalization that can be expected in terms of problem parameters: model architecture, training procedure, number of samples, and data distribution. We give empirical evidence for these conjectures across a variety of domains in machine learning, including neural networks, kernel machines, and decision trees. Our results thus advance our empirical understanding of interpolating classifiers.
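
The CIFAR-10 experiment above is straightforward to reproduce in outline. Below is a minimal sketch, assuming standard PyTorch/torchvision: it relabels 30% of training-set dogs as cats, trains a ResNet to interpolation, and then measures what fraction of true test-set dogs the model predicts as cats. The off-the-shelf ResNet-18, optimizer, and epoch count are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torchvision.models import resnet18

CAT, DOG = 3, 5      # CIFAR-10 class indices for "cat" and "dog"
NOISE_RATE = 0.3     # fraction of train-set dogs to relabel as cats

transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465),
                                   (0.2470, 0.2435, 0.2616))])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=transform)

# Inject structured label noise: relabel 30% of the training dogs as cats.
rng = np.random.default_rng(0)
targets = np.array(train_set.targets)
dog_idx = np.flatnonzero(targets == DOG)
flip = rng.choice(dog_idx, size=int(NOISE_RATE * len(dog_idx)), replace=False)
targets[flip] = CAT
train_set.targets = targets.tolist()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(num_classes=10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)

# Train long enough to interpolate the (noisy) train set, i.e. reach ~0% train
# error; 100 epochs is an illustrative choice, not a prescription.
for epoch in range(100):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Measure the *distribution* of errors, not just the average: what fraction of
# true test-set dogs does the interpolating model now call cats?
model.eval()
dog_total = dog_as_cat = 0
with torch.no_grad():
    for x, y in test_loader:
        pred = model(x.to(device)).argmax(1).cpu()
        dogs = y == DOG
        dog_total += dogs.sum().item()
        dog_as_cat += (pred[dogs] == CAT).sum().item()

print(f"test dogs predicted as cat: {dog_as_cat / dog_total:.1%}")  # ~30% per the conjecture
```

The point of the final measurement is that it is class-conditional: classical test accuracy would lump the dog-to-cat errors in with all other mistakes, whereas the conjecture predicts the specific ~30% dog-to-cat rate while other classes stay unaffected.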
