To Balance or Not to Balance: An Embarrassingly Simple Approach for Learning with Long-Tailed Distributions

12/10/2019
by   Junjie Zhang, et al.
0

Real-world visual data often exhibits a long-tailed distribution, where some ”head” classes have a large number of samples, yet only a few samples are available for the ”tail” classes. Such imbalanced distribution causes a great challenge for learning a deep neural network, which can be boiled down into a dilemma: on the one hand, we prefer to increase the exposure of the tail class samples to avoid the excessive dominance of head classes in the classifier training. On the other hand, oversampling tail classes makes the network prone to over-fitting, since the head class samples are often consequently under-represented. To resolve this dilemma, in this paper, we propose an embarrassingly simple-yet-effective approach. The key idea is to split a network into a classifier part and a feature extractor part, and then employ different training strategies for each part. Specifically, to promote the awareness of tail-classes, a class-balanced sampling scheme is utilised for training both the classifier and the feature extractor. For the feature extractor, we also introduce an auxiliary training task, which is to train a classifier under the regular random sampling scheme. In this way, the feature extractor is jointly trained from both sampling strategies and thus can take advantage of all training data and avoid the over-fitting issue. Apart from this basic auxiliary task, we further explore the benefit of using self-supervised learning as the auxiliary task. Without using any bells and whistles, our model achieves superior performance over the state-of-the-art solutions.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset