Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Model quantization discretizes the weights and activations of a deep neural network (DNN). Unlike previous methods that manually define quantization hyperparameters such as precision (i.e., bitwidth), dynamic range (i.e., minimum and maximum discrete values), and stepsize (i.e., interval between discrete values), this work proposes a novel approach that learns all of them differentiably, named Differentiable Dynamic Quantization (DDQ), which has several appealing benefits. (1) Unlike previous works that apply a rounding operation to discretize values, DDQ provides a unified perspective by formulating discretization as a matrix-vector product, where different values of the matrix and vector represent different quantization methods such as mixed precision and soft quantization. These values can be learned differentiably from training data, allowing different hidden layers in a DNN to use different quantization methods. (2) DDQ is hardware-friendly: all variables can be computed using low-precision matrix-vector multiplication, making it applicable to a wide spectrum of hardware. (3) The matrix variable is carefully reparameterized to reduce its number of parameters from O(2^{b^2}) to O(\log 2^b), where b is the bit width. Extensive experiments show that DDQ outperforms prior art on various advanced networks and benchmarks. For instance, compared to the full-precision models, MobileNetv2 trained with DDQ achieves comparable top-1 accuracy on ImageNet (71.7% vs. 71.9%), while ResNet18 trained with DDQ increases accuracy by 0.5%. In relative terms, these results improve on recent state-of-the-art quantization methods by 70% and 140%, measured against the full-precision models.
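To make the idea of "discretization as a matrix-vector product" concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a vector `q` of candidate quantization levels, a binary matrix `U` that merges levels, and a one-hot selector `s` that picks the nearest effective level; the function name `quantize_matvec` and the normalization by the column sums of `U` are illustrative choices, since the abstract does not spell out the exact form. With `U` set to the identity the sketch reduces to ordinary nearest-level rounding, while a block-diagonal `U` averages adjacent levels and so behaves like a lower precision.

```python
import numpy as np


def quantize_matvec(x, q, U):
    """Quantize scalar x using only matrix-vector products (illustrative).

    q : (n,) array of candidate quantization levels.
    U : (n, n) binary matrix; column j marks which levels in q are
        merged (averaged) to form the j-th effective level.
    """
    Z = U.sum(axis=0)                  # how many levels each column merges
    effective = (q @ U) / Z            # effective levels after merging
    s = np.zeros(len(q))               # one-hot selector (hypothetical name)
    s[np.argmin(np.abs(effective - x))] = 1.0
    # q^T U s picks the merged sum; Z^T s is the matching normalizer.
    return float(q @ (U @ s) / (Z @ s))


q = np.array([0.0, 1.0, 2.0, 3.0])     # 2-bit uniform levels (assumed)

U_full = np.eye(4)                              # keep all 4 levels
U_half = np.kron(np.eye(2), np.ones((2, 2)))    # merge adjacent pairs

print(quantize_matvec(1.2, q, U_full))  # 1.0
print(quantize_matvec(1.2, q, U_half))  # 0.5
```

In this sketch, changing only the entries of `U` and `q` switches between quantization schemes without changing the computation itself, which is the property the abstract attributes to the matrix-vector formulation. Making these entries learnable, as DDQ does, would require a differentiable relaxation that is not shown here.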