Global Knowledge Distillation in Federated Learning
Knowledge distillation has caught a lot of attention in Federated Learning (FL) recently. It has the advantage for FL to train on heterogeneous clients which have different data size and data structure. However, data samples across all devices are usually not independent and identically distributed (non-i.i.d), posing additional challenges to the convergence and speed of federated learning. As FL randomly asks the clients to join the training process and each client only learns from local non-i.i.d data, which makes learning processing even slower. In order to solve this problem, an intuitive idea is using the global model to guide local training. In this paper, we propose a novel global knowledge distillation method, named FedGKD, which learns the knowledge from past global models to tackle down the local bias training problem. By learning from global knowledge and consistent with current local models, FedGKD learns a global knowledge model in FL. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on various CV datasets (CIFAR-10/100) and settings (non-i.i.d data). The evaluation results show that FedGKD outperforms previous state-of-the-art methods.
READ FULL TEXT