Better Together: Resnet-50 accuracy with $13\times$ fewer parameters and at $3\times$ speed

Recent research on compressing deep neural networks has focused on reducing the number of parameters, since smaller networks are easier to export and deploy on edge devices. We introduce Adjoined Networks, a training approach that can regularize and compress any CNN-based neural architecture. Our one-shot learning paradigm trains both the original and the smaller network together, with the parameters of the smaller network shared across both architectures. We prove strong theoretical guarantees on the regularization behavior of the adjoint training paradigm and complement this theoretical analysis with an extensive empirical evaluation of both the compression and regularization behavior of adjoint networks. For Resnet-50 trained adjointly on ImageNet, we achieve a $13.7\times$ reduction in the number of parameters (for the size comparison, we ignore the parameters in the last linear layer, as its size varies by dataset and it is typically dropped during fine-tuning; otherwise, the reductions are $11.5\times$ and $95\times$ for ImageNet and CIFAR-100, respectively) and a $3\times$ improvement in inference time without any significant drop in accuracy. For the same architecture on CIFAR-100, we achieve a $99.7\times$ reduction in the number of parameters and a $5\times$ improvement in inference time. On both datasets, the original network trained in the adjoint fashion gains about $3\%$ in top-1 accuracy compared to the same network trained in the standard fashion.
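To make the parameter-sharing idea concrete, the following is a minimal sketch (not the authors' implementation) of how adjoint training can be set up in PyTorch: a small branch reuses a slice of the full branch's convolution weights, so its parameters are shared with the original network, and both branches are supervised by a joint loss. The layer sizes, the choice of slicing the first filters, and the equal loss weighting are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of adjoint training with shared parameters.
# All architectural details below are assumptions for demonstration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjointConvBlock(nn.Module):
    """Conv block whose 'small' forward pass reuses a slice of the full weights."""
    def __init__(self, in_ch, out_ch, small_out_ch):
        super().__init__()
        self.full = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.small_out_ch = small_out_ch  # number of shared filters used by the small branch

    def forward(self, x, small=False):
        if small:
            # Shared parameters: apply only the first `small_out_ch` filters of the full conv.
            w = self.full.weight[: self.small_out_ch]
            b = self.full.bias[: self.small_out_ch]
            return F.relu(F.conv2d(x, w, b, padding=1))
        return F.relu(self.full(x))

class AdjointNet(nn.Module):
    """Toy network returning predictions from both the full and the small branch."""
    def __init__(self, num_classes=100):
        super().__init__()
        self.block = AdjointConvBlock(3, 64, 16)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head_full = nn.Linear(64, num_classes)
        self.head_small = nn.Linear(16, num_classes)

    def forward(self, x):
        logits_full = self.head_full(self.pool(self.block(x, small=False)).flatten(1))
        logits_small = self.head_small(self.pool(self.block(x, small=True)).flatten(1))
        return logits_full, logits_small

# Joint training step: both branches are supervised, so gradients flow into the
# shared convolution weights from the full and the small forward passes.
model = AdjointNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
logits_full, logits_small = model(x)
loss = F.cross_entropy(logits_full, y) + F.cross_entropy(logits_small, y)
loss.backward()
opt.step()
```

Because the small branch owns no weights of its own, compressing the trained model amounts to keeping only the shared filter slice, which is what makes the reported parameter reductions possible in this setup.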
