Evidence against implicitly recurrent computations in residual neural networks

Recent work on residual neural networks (ResNets) has suggested that a ResNet's deep feedforward computation may be characterized as implicitly recurrent in that it iteratively refines the same representation like a recurrent network. To test this hypothesis, we manipulate the degree of weight sharing across layers in ResNets using soft gradient coupling. This new method, which provides a form of recurrence regularization, can interpolate smoothly between an ordinary ResNet and a "recurrent" ResNet (i.e., one that uses identical weights across layers and thus could be physically implemented with a recurrent network computing the successive stages iteratively across time). We define three indices of recurrent iterative computation and show that a higher degree of gradient coupling promotes iterative convergent computation in ResNets. To measure the degree of weight sharing, we quantify the effective number of parameters of models along the continuum between nonrecurrent and recurrent. For a given effective number of parameters, recurrence regularization does not improve classification accuracy on three visual recognition tasks (MNIST, CIFAR-10, Digitclutter). ResNets, thus, may not benefit from more similar sets of weights across layers, suggesting that their power does not derive from implicitly recurrent computation.
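The abstract does not give the exact update rule for soft gradient coupling, so the following is only an illustrative sketch of one way such an interpolation could work. It assumes (this is an assumption, not the paper's stated formula) that each layer's gradient is blended with the mean gradient across layers using a hypothetical coefficient `lam` in [0, 1]. The function names `couple_gradients` and `sgd_step` are likewise hypothetical, and each layer's parameters are flattened into a plain list of floats for simplicity.

```python
from typing import List


def couple_gradients(layer_grads: List[List[float]], lam: float) -> List[List[float]]:
    """Blend each layer's gradient with the mean gradient across layers.

    ASSUMPTION: this blending rule is illustrative, not taken from the paper.
    lam = 0.0 leaves every gradient untouched (ordinary ResNet).
    lam = 1.0 gives every layer the same gradient, so layers that start
    from identical weights stay identical (a "recurrent" ResNet).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    n_layers = len(layer_grads)
    n_params = len(layer_grads[0])
    # Mean gradient over layers, computed separately for each parameter index.
    mean_grad = [sum(g[i] for g in layer_grads) / n_layers for i in range(n_params)]
    return [
        [(1.0 - lam) * g[i] + lam * mean_grad[i] for i in range(n_params)]
        for g in layer_grads
    ]


def sgd_step(
    layer_weights: List[List[float]],
    layer_grads: List[List[float]],
    lam: float,
    lr: float = 0.1,
) -> List[List[float]]:
    """Apply one plain SGD update using the coupled gradients."""
    coupled = couple_gradients(layer_grads, lam)
    return [
        [w - lr * g for w, g in zip(ws, gs)]
        for ws, gs in zip(layer_weights, coupled)
    ]


if __name__ == "__main__":
    grads = [[1.0, 0.0], [3.0, 4.0]]    # two layers, two parameters each
    weights = [[0.5, 0.5], [0.5, 0.5]]  # identical initial weights
    for lam in (0.0, 0.5, 1.0):
        print(lam, sgd_step(weights, grads, lam))
```

Under this assumed rule, intermediate values of `lam` pull the layers' updates toward each other without forcing them to be equal, which is one way to read the "smooth interpolation" the abstract describes.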
