Text-guided diffusion models (TDMs) are widely applied but can fail unex...
Vision Transformer (ViT) has become one of the most popular neural archi...
Active learning promises to improve annotation efficiency by iteratively...
This paper studies the potential of distilling knowledge from pre-traine...
The recent success of Vision Transformers is shaking the long dominance ...
Adversarial Propagation (AdvProp) is an effective way to improve recogni...
In this work we present point-level region contrast, a self-supervised p...
Transformer emerges as a powerful tool for visual recognition. In additi...
Recently, a series of vision Transformers has emerged, which show supe...
Fine-grained visual classification (FGVC) which aims at recognizing obje...
We propose Mask Guided (MG) Matting, a robust matting framework that tak...
Understanding objects in terms of their individual parts is important, b...
Leveraging temporal information has been regarded as essential for devel...
Today's most popular approaches to keypoint detection learn a holistic r...
3D convolutional neural networks (CNNs) have proven very successful in...
Referring object detection and referring image segmentation are importan...
Detecting semantic parts of an object is a challenging task in computer...