Kernel-Based Smoothness Analysis of Residual Networks

A major factor in the success of deep neural networks is the use of sophisticated architectures rather than the classical multilayer perceptron (MLP). Residual networks (ResNets) stand out among these powerful modern architectures. Previous works focused on the optimization advantages of deep ResNets over deep MLPs. In this paper, we show another distinction between the two models, namely, a tendency of ResNets to promote smoother interpolations than MLPs. We analyze this phenomenon via the neural tangent kernel (NTK) approach. First, we compute the NTK for the considered ResNet model and prove its stability during gradient descent training. Then, we show, by various evaluation methodologies, that the NTK of the ResNet and its kernel regression results are smoother than those of the MLP. The greater smoothness observed in our analysis may explain the better generalization ability of ResNets and the common practice of moderately attenuating the residual blocks.
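To make the comparison concrete, below is a minimal sketch (not the paper's exact ResNet parameterization, which attenuates the residual branches) of NTK kernel regression for an MLP and a simple residual architecture. It assumes the neural_tangents library for the analytic NTK; the toy 1-D data, widths, and depths are illustrative placeholders.

```python
# Illustrative sketch: compare NTK kernel regression of an MLP vs. a ResNet-style
# architecture using neural_tangents (assumed available); data is a toy 1-D problem.
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

def mlp_kernel(width=512, depth=4):
    # Plain fully-connected (MLP) network: Dense -> ReLU blocks, linear readout.
    layers = []
    for _ in range(depth):
        layers += [stax.Dense(width), stax.Relu()]
    layers += [stax.Dense(1)]
    return stax.serial(*layers)

def resnet_kernel(width=512, depth=4):
    # Residual block: identity skip connection summed with a small MLP branch.
    block = stax.serial(
        stax.FanOut(2),
        stax.parallel(
            stax.serial(stax.Dense(width), stax.Relu(), stax.Dense(width)),
            stax.Identity(),
        ),
        stax.FanInSum(),
    )
    return stax.serial(stax.Dense(width), *([block] * depth), stax.Relu(), stax.Dense(1))

def ntk_regression(kernel_fn, x_train, y_train, x_test, reg=1e-6):
    # Closed-form kernel (ridge) regression with the analytic NTK:
    #   f(x*) = K(x*, X) (K(X, X) + reg * I)^{-1} y
    k_train = kernel_fn(x_train, x_train, 'ntk')
    k_test = kernel_fn(x_test, x_train, 'ntk')
    alpha = jnp.linalg.solve(k_train + reg * jnp.eye(len(x_train)), y_train)
    return k_test @ alpha

# Toy interpolation problem; plotting y_mlp and y_res over x_test lets one
# inspect how smoothly each kernel interpolates between the training points.
key = random.PRNGKey(0)
x_train = random.uniform(key, (8, 1), minval=-1.0, maxval=1.0)
y_train = jnp.sin(3.0 * x_train)
x_test = jnp.linspace(-1.0, 1.0, 200).reshape(-1, 1)

_, _, mlp_fn = mlp_kernel()
_, _, res_fn = resnet_kernel()
y_mlp = ntk_regression(mlp_fn, x_train, y_train, x_test)
y_res = ntk_regression(res_fn, x_train, y_train, x_test)
```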
