Rényi Divergence Deep Mutual Learning
This paper revisits Deep Mutual Learning (DML), a simple yet remarkably effective training paradigm. We observe that its effectiveness correlates highly with its excellent generalization quality. In this paper, we interpret the performance improvement of DML from a novel perspective: it is roughly an approximate Bayesian posterior sampling procedure. This interpretation also establishes the foundation for applying the Rényi divergence to improve the original DML, as it introduces variance control of the prior (in the context of DML). Accordingly, we propose Rényi Divergence Deep Mutual Learning (RDML). Our empirical results demonstrate the advantage of combining DML with the Rényi divergence: the flexible control it provides further improves DML, yielding models that generalize better.
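As a rough illustration of the idea, the sketch below computes the Rényi divergence of order α between two discrete predictive distributions; in DML-style training, such a divergence would replace the usual KL term in each peer's mimicry loss. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `renyi_divergence` and `rdml_mimicry_loss` are hypothetical, and the exact loss formulation in RDML may differ.

```python
import numpy as np

def renyi_divergence(p, q, alpha=0.5, eps=1e-12):
    """Rényi divergence D_alpha(p || q) between discrete distributions.

    As alpha -> 1 this reduces to the KL divergence, which is the
    mimicry term used in the original DML; other alpha values give
    the flexible control the abstract refers to.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    if abs(alpha - 1.0) < 1e-8:
        return float(np.sum(p * np.log(p / q)))  # KL limit at alpha = 1
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def rdml_mimicry_loss(probs_student, probs_peer, alpha=0.5):
    """Hypothetical RDML-style mimicry term: the student is penalized
    for diverging from its peer's softmax predictions under D_alpha."""
    return renyi_divergence(probs_peer, probs_student, alpha)
```

For identical distributions the divergence is zero for any α, and it grows as the two peers' predictions drift apart, so it plays the same role as the KL term in DML while exposing α as a tunable knob.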