Learning Generalizable Representations via Diverse Supervision
The problem of rare category recognition has received considerable attention recently, with state-of-the-art methods achieving significant improvements. However, we identify two major limitations in the existing literature. First, the benchmarks are constructed by randomly splitting the categories of artificially balanced datasets into frequent (head) and rare (tail) subsets, which results in unrealistic category distributions in both subsets. Second, the idea of using external sources of supervision to learn generalizable representations is largely overlooked. In this work, we attempt to address both of these shortcomings by introducing the ADE-FewShot benchmark. It builds on the ADE dataset for scene parsing, which features a realistic, long-tail distribution of categories as well as a diverse set of annotations. We turn it into a realistic few-shot classification benchmark by splitting the object categories into head and tail based on their distribution in the world. We then analyze the effect of various supervision sources on representation learning for rare category recognition, and observe significant improvements.