Online Knowledge Distillation via Collaborative Learning
This work presents an efficient yet effective online Knowledge Distillation method via Collaborative Learning, termed KDCL, which is able to consistently improve the generalization ability of deep neural networks (DNNs) that have different learning capacities. Unlike existing two-stage knowledge distillation approaches that pre-train a DNN with large capacity as the "teacher" and then transfer the teacher's knowledge to another "student" DNN unidirectionally (i.e., one-way), KDCL treats all DNNs as "students" and collaboratively trains them in a single stage (knowledge is transferred among arbitrary students during collaborative training), enabling parallel computing, fast computation, and appealing generalization ability. Specifically, we carefully design multiple methods to generate soft targets as supervision by effectively ensembling the predictions of students and distorting the input images. Extensive experiments show that KDCL consistently improves all the "students" on different datasets, including CIFAR-100 and ImageNet. For example, when trained together using KDCL, ResNet-50 and MobileNetV2 achieve 78.2% and 74.0% top-1 accuracy on ImageNet, outperforming the original results by 1.4% and 2.0% respectively. We also verify that models pre-trained with KDCL transfer well to object detection and semantic segmentation on the MS COCO dataset. For instance, the FPN detector is improved by 0.9% mAP.
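As a rough illustration of the single-stage collaborative training described in the abstract, the sketch below jointly updates several "students" in PyTorch: at each step their logits are ensembled into a soft target (here a simple average, which is only one possible ensembling strategy; the paper's specific KDCL variants and its input-distortion scheme are not reproduced), and every student is optimized with a cross-entropy loss on the hard labels plus a KL-divergence loss toward the ensemble. The function and argument names (kdcl_step, T, alpha) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of one KDCL-style collaborative training step (PyTorch).
# Assumption: the soft target is a plain average of student logits; the paper's
# dedicated ensembling methods and image-distortion scheme are omitted here.
import torch
import torch.nn.functional as F


def kdcl_step(students, optimizers, images, labels, T=3.0, alpha=1.0):
    """Run one joint update for a list of student classifiers.

    students   : list of nn.Module instances producing class logits
    optimizers : one optimizer per student
    images     : a mini-batch of inputs (each student could instead receive
                 its own distorted view of the batch, as the abstract suggests)
    labels     : ground-truth class indices
    T          : softmax temperature for distillation
    alpha      : weight of the distillation term
    """
    logits = [s(images) for s in students]

    # Build the soft target from the ensembled logits; computed under no_grad
    # so it acts as fixed supervision for every student in this step.
    with torch.no_grad():
        soft_target = F.softmax(torch.stack(logits).mean(dim=0) / T, dim=1)

    for opt in optimizers:
        opt.zero_grad()

    total_loss = 0.0
    for logit in logits:
        ce = F.cross_entropy(logit, labels)                        # hard-label loss
        kd = F.kl_div(F.log_softmax(logit / T, dim=1), soft_target,
                      reduction="batchmean") * (T * T)             # distill toward ensemble
        total_loss = total_loss + ce + alpha * kd

    total_loss.backward()
    for opt in optimizers:
        opt.step()
    return total_loss.item()
```

Under these assumptions, pairing a large and a small network (e.g., a ResNet-50 and a MobileNetV2, as in the ImageNet results quoted above) simply means passing both models and their optimizers to this function at every iteration; detaching the ensembled target keeps each student's gradient confined to its own branch while still letting knowledge flow between students through the shared soft target.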