OVO: One-shot Vision Transformer Search with Online distillation

12/28/2022
by   Zimian Wei, et al.

Pure transformers have recently shown great potential for vision tasks. However, their accuracy on small and medium-sized datasets is not satisfactory. Although some existing methods introduce a CNN as a teacher to guide the training process via distillation, the gap between the teacher and student networks leads to sub-optimal performance. In this work, we propose a new One-shot Vision transformer search framework with Online distillation, namely OVO. OVO samples sub-nets for both the teacher and student networks to obtain better distillation results. Benefiting from online distillation, thousands of subnets in the supernet are well-trained without extra finetuning or retraining. In experiments, OVO-Ti achieves 73.32% accuracy on CIFAR-100.
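The core idea above is that, instead of a fixed external teacher, both teacher and student are sub-nets sampled from the same supernet, and the teacher's soft predictions supervise the student online. The following is a minimal, hedged sketch of that loop; the search space, function names, and logits are illustrative assumptions, not code from the paper:

```python
# Illustrative sketch of one-shot search with online distillation.
# The search space, configs, and logits are hypothetical, not OVO's actual ones.
import math
import random

SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "num_heads": [3, 4],
}

def sample_subnet(rng):
    """Sample one sub-net configuration from the supernet search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, used to produce soft targets."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q), the usual soft-target distillation objective."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """One online-distillation step: the sampled teacher sub-net's soft
    targets supervise the sampled student sub-net. Loss only; a real
    implementation would backpropagate into the shared supernet weights."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)

rng = random.Random(0)
teacher_cfg = sample_subnet(rng)  # larger sampled sub-net acts as teacher
student_cfg = sample_subnet(rng)  # another sampled sub-net is distilled online
loss = distillation_loss([2.0, 0.5, -1.0], [1.0, 0.8, -0.5])
print(teacher_cfg, student_cfg, round(loss, 6))
```

Because teacher and student share the supernet's weights and are re-sampled each step, every sub-net receives distillation signal during search, which is why no per-subnet retraining is needed afterward.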
