Visual Commonsense Reasoning (VCR) remains a significant yet challenging...
Temporal modeling is crucial for various video learning tasks. Most rece...
Expanding an existing tourist photo from a partially captured scene to a...
Learning-based multi-view stereo (MVS) methods have made impressive prog...
Visual Commonsense Reasoning (VCR), regarded as a challenging extension ...
Language model pre-training based on large corpora has achieved tremendo...
Underwater image enhancement is an important vision task due to its...
This paper strives for pixel-level segmentation of actors and their acti...
We present a new architecture for end-to-end sequence learning of action...
In online action detection, the goal is to detect the start of an action...