This paper presents OxfordTVG-HIC (Humorous Image Captions), a large-sca...
This paper presents a new mechanism to facilitate the training of mask
t...
Modern deep networks can be better generalized when trained with noisy
s...
Large-scale pre-training has been proven to be crucial for various compu...
Video Panoptic Segmentation (VPS) aims at assigning a class label to eac...
Mixup-based augmentation has been found to be effective for generalizing...
Transformers with powerful global relation modeling abilities have been
...
Human vision is able to capture the part-whole hierarchical information ...
Some image restoration tasks like demosaicing require difficult training...
By assigning each relationship a single label, current approaches formul...
Multi-sensor perception is crucial to ensure the reliability and accurac...
We present MMDetection, an object detection toolbox that contains a rich...
Cascade is a classic yet powerful architecture that has boosted performa...
The basic principles in designing convolutional neural network (CNN)
str...
Motion representation plays a vital role in human action recognition in
...