Transform Quantization for CNN Compression
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend either to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or to consider these joint statistics only during training, which does not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1 to 2 bits).
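The general idea of "decorrelate, allocate bit depths by rate-distortion, then quantize" can be illustrated with a minimal, stdlib-only Python sketch. It is not the paper's method. Assumptions, all hypothetical: a closed-form 2-D KLT (PCA rotation) stands in for the paper's End-to-end Learned Transform; distortion is plain weight-domain squared error under the classical high-rate model D_i(b) = σ_i² · 2^(-2b), whereas the paper's framework also accounts for the joint statistics of weights and activations; the function names, the 4σ clipping range, and the synthetic correlated "weights" are illustrative choices only.

```python
import heapq
import math
import random


def klt_2d(pairs):
    """Closed-form 2-D KLT (PCA rotation) that decorrelates (x1, x2) pairs.

    Stand-in decorrelating transform; NOT the paper's End-to-end Learned Transform.
    """
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    a = sum((p[0] - m1) ** 2 for p in pairs) / n
    d = sum((p[1] - m2) ** 2 for p in pairs) / n
    b = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    theta = 0.5 * math.atan2(2 * b, a - d)  # rotation angle that zeroes the cross-covariance
    cs, sn = math.cos(theta), math.sin(theta)
    coeffs = [(cs * x1 + sn * x2, -sn * x1 + cs * x2) for x1, x2 in pairs]
    return coeffs, theta


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def allocate_bits(variances, total_bits, max_bits=8):
    """Greedy integer bit-depth allocation minimising sum_i var_i * 2**(-2 * b_i).

    Each term is convex and decreasing in b_i, so granting one bit at a time to
    the component with the largest distortion reduction is optimal for this model.
    """
    bits = [0] * len(variances)
    # Heap entries: (-gain of the next bit, index); gain = 0.75 * var * 4**(-b).
    heap = [(-0.75 * v, i) for i, v in enumerate(variances) if v > 0]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            break
        neg_gain, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits:
            heapq.heappush(heap, (neg_gain / 4.0, i))  # the next bit helps 4x less
    return bits


def quantize(xs, bits, clip):
    """Uniform mid-rise quantizer with 2**bits levels on [-clip, clip]; 0 bits -> zeros."""
    if bits == 0:
        return [0.0] * len(xs)
    step = 2 * clip / 2 ** bits
    out = []
    for x in xs:
        k = math.floor((x + clip) / step)
        k = min(max(k, 0), 2 ** bits - 1)  # clamp to a valid level index
        out.append(-clip + (k + 0.5) * step)
    return out


def transform_quantize_mse(cols, total_bits):
    """Allocate bits across (roughly zero-mean) columns, quantize each, return (bits, MSE)."""
    variances = [variance(c) for c in cols]
    bits = allocate_bits(variances, total_bits)
    err = 0.0
    for col, v, b in zip(cols, variances, bits):
        q = quantize(col, b, clip=4 * math.sqrt(v))  # hypothetical 4-sigma clipping
        err += sum((x - y) ** 2 for x, y in zip(col, q))
    return bits, err / len(cols[0])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "weights": strongly correlated, roughly zero-mean pairs.
    pairs = []
    for _ in range(2000):
        z = rng.gauss(0, 1)
        pairs.append((z + 0.1 * rng.gauss(0, 1), z + 0.1 * rng.gauss(0, 1)))
    coeffs, theta = klt_2d(pairs)
    raw_cols = [[p[k] for p in pairs] for k in range(2)]
    klt_cols = [[c[k] for c in coeffs] for k in range(2)]
    print("KLT:", transform_quantize_mse(klt_cols, 4))
    print("raw:", transform_quantize_mse(raw_cols, 4))
```

Because the rotation is orthonormal, squared error measured on the transform coefficients equals squared error on the reconstructed weights, so the two runs are compared fairly at the same 4-bit budget. On correlated data like this, decorrelation concentrates almost all of the energy in one coefficient, so the allocator spends the whole budget there and the resulting error should be well below that of quantizing the raw weights directly.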