Existing deep video models are limited by specific tasks, fixed input-ou...
Object tracking (OT) aims to estimate the positions of target objects in...
Exploring dense matching between the current frame and past frames for l...
Online media data, in the forms of images and videos, are becoming mains...
This paper presents OmniVL, a new foundation model to support both image...
Recent advances in image editing techniques have posed serious challenge...
Video transformers have achieved impressive results on major video recog...
Blind face inpainting refers to the task of reconstructing visual conten...
The widespread dissemination of forged images generated by Deepfake tech...
Humans can easily recognize actions with only a few examples given, whil...