Joint Pruning & Quantization for Extremely Sparse Neural Networks
We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity in quantized networks to enable implementation on low-cost, low-power accelerator hardware. In practice, dense prediction tasks have particularly many applications, so we choose stereo depth estimation as our target. We propose a two-stage pruning and quantization pipeline and introduce a Taylor Score alongside a new fine-tuning mode to achieve extreme sparsity without sacrificing performance. Our evaluation not only shows that pruning and quantization should be investigated jointly, but also that almost 99% of the memory demand can be cut while hardware costs can be reduced by up to 99.9%. In addition, to enable comparison with other work, we demonstrate that our pruning stage alone beats the state of the art when applied to ResNet on CIFAR10 and ImageNet.
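The abstract does not define the Taylor Score precisely; as a rough illustration of the idea, a common first-order Taylor pruning criterion ranks each channel by the magnitude of the loss change its removal would cause, approximated by |w · ∂L/∂w| summed over the channel. The sketch below (the function names and the per-channel aggregation are assumptions, not the paper's method) shows how such a score can drive channel pruning to a target sparsity:

```python
import numpy as np

def taylor_scores(weights, grads):
    """Hypothetical first-order Taylor importance per output channel:
    sum of |w * dL/dw| over each channel's parameters.
    weights, grads: arrays of shape (out_channels, ...)."""
    contrib = np.abs(weights * grads)
    return contrib.reshape(contrib.shape[0], -1).sum(axis=1)

def prune_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of channels by score."""
    k = max(1, int(round(len(scores) * (1.0 - sparsity))))
    keep = np.argsort(scores)[-k:]          # indices of highest-scoring channels
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

# Example: 8 conv channels, prune 75% of them
w = np.random.randn(8, 3, 3, 3)
g = np.random.randn(8, 3, 3, 3)   # gradients from one backward pass
mask = prune_mask(taylor_scores(w, g), sparsity=0.75)
```

In the two-stage pipeline described above, such a score would be computed during the pruning stage, with fine-tuning recovering accuracy before quantization is applied.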