Large pretrained plain vision Transformers (ViTs) have been the workhors...
The public model zoo containing enormous powerful pretrained model famil...
Recent advances in Transformers have come with a huge requirement on com...
Transformer is a transformative framework that models sequential data an...
The task of action detection aims at deducing both the action category a...
Vision Transformers (ViTs) have triggered the most recent and significan...
There has been an explosion of interest in designing high-performance Tr...
Transformers have become one of the dominant architectures in deep learn...
The recently proposed Visual image Transformers (ViT) with pure attentio...
Vision-and-Language Navigation (VLN) is unique in that it requires turni...