Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

06/30/2020
by Hadi Pouransari, et al.

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method that bridges this gap by (1) modeling student and teacher uncertainties; (2) sampling training examples from the underlying data distribution; and (3) matching student and teacher output distributions. We conduct extensive evaluations on regression and classification tasks and show that, compared to the original knowledge distillation, extracurricular learning reduces the gap by 46%. This leads to major accuracy improvements compared to empirical risk minimization-based training for various recent neural network architectures: +7.9% top-1 image classification accuracy on the CIFAR100 dataset, and +2.9% top-1 image classification accuracy on the ImageNet dataset.

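The abstract contrasts extracurricular learning with the original knowledge distillation baseline, whose core step is matching the student's output distribution to the teacher's through a softened KL term plus a hard-label loss. The sketch below shows only that standard baseline loss, not the paper's extracurricular learning method; the function name, temperature, and alpha weighting are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.9):
    """Standard distillation loss: match the student's softened output
    distribution to the teacher's (KL term) and keep a hard
    cross-entropy term on the ground-truth labels.

    `temperature` and `alpha` are illustrative defaults, not values
    taken from the paper.
    """
    # Soften both output distributions with the temperature.
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between softened distributions, scaled by T^2 so the
    # gradient magnitude stays comparable across temperatures.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy against the hard labels.
    ce_term = F.cross_entropy(student_logits, targets)

    return alpha * kd_term + (1.0 - alpha) * ce_term


if __name__ == "__main__":
    # Illustrative shapes: batch of 32 examples, 100 classes (e.g. CIFAR100).
    student_logits = torch.randn(32, 100)
    teacher_logits = torch.randn(32, 100)
    targets = torch.randint(0, 100, (32,))
    print(distillation_loss(student_logits, teacher_logits, targets))
```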