We present a sequence-to-sequence vision-language model whose parameters...
In this work, instead of directly predicting the pixel-level segmentatio...
We study semi-supervised learning (SSL) for vision transformers (ViT), a...
In this paper, we study how to use masked signal modeling in vision and ...
Most existing works on few-shot object detection (FSOD) focus on a setti...
In this paper, we study the challenging instance-wise vision-language ta...
We consider the problem of omni-supervised object detection, which can u...
We present Contrastive Neighborhood Alignment (CNA), a manifold learning...
We present a plug-in replacement for batch normalization (BN) called exp...
Low-precision networks, with weights and activations quantized to low bi...
In object detection, the intersection over union (IoU) threshold is freq...
Despite increasing efforts on universal representations for visual recog...
In object detection, an intersection over union (IoU) threshold is requi...
The problem of quantizing the activations of a deep neural network is co...
In recent years, numerous effective multi-object tracking (MOT) methods ...
The design of complexity-aware cascaded detectors, combining features of...