Nonconvex Regularization for Network Slimming: Compressing CNNs Even More

In the last decade, convolutional neural networks (CNNs) have evolved to become the dominant models for various computer vision tasks, but they cannot be deployed on low-memory devices due to their high memory requirements and computational cost. One popular, straightforward approach to compressing CNNs is network slimming, which imposes an $\ell_1$ penalty on the channel-associated scaling factors in the batch normalization layers during training. In this way, channels with low scaling factors are identified as insignificant and are pruned from the model. In this paper, we propose replacing the $\ell_1$ penalty with the $\ell_p$ and transformed $\ell_1$ (T$\ell_1$) penalties, since these nonconvex penalties have outperformed $\ell_1$ in yielding sparser, satisfactory solutions in various compressed sensing problems. In our numerical experiments, we demonstrate network slimming with the $\ell_p$ and T$\ell_1$ penalties on VGGNet and DenseNet trained on CIFAR-10/100. The results show that the nonconvex penalties compress CNNs better than $\ell_1$. In addition, T$\ell_1$ preserves model accuracy after channel pruning, while $\ell_{1/2}$ and $\ell_{3/4}$ yield compressed models with accuracies similar to $\ell_1$ after retraining.
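
For concreteness, a minimal sketch of the regularized training objective and of the penalties discussed above is given below; the notation ($\Gamma$ for the set of batch-normalization scaling factors, $\lambda$ for the regularization weight, $a$ for the T$\ell_1$ parameter) is introduced here for illustration and follows the standard definitions of these penalties from the compressed sensing literature, not necessarily the paper's own notation:
\[
\min_{W,\,\Gamma} \; \mathcal{L}(W, \Gamma) + \lambda \sum_{\gamma \in \Gamma} R(\gamma),
\qquad
R_{\ell_p}(\gamma) = |\gamma|^p \;\; (0 < p < 1),
\qquad
R_{\mathrm{T}\ell_1}(\gamma) = \frac{(a+1)\,|\gamma|}{a + |\gamma|} \;\; (a > 0),
\]
where $\mathcal{L}$ is the usual task loss, $\gamma$ ranges over the channel-wise scaling factors of the batch normalization layers, and the original network slimming formulation is recovered by taking $R(\gamma) = |\gamma|$.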
