Until recently, the Video Instance Segmentation (VIS) community operated...
Keypoint detection descriptors are foundational technologies for co...
Video understanding tasks take many forms, from action detection to visu...
Visual object tracking is a key component to many egocentric vision prob...
Open-world instance segmentation is the task of grouping pixels into obj...
Cognitive science has shown that humans perceive videos in terms of even...
We introduce PyTorchVideo, an open-source deep-learning library that pro...
Conventional video models rely on a single stream to capture the complex...
Current state-of-the-art object detection and segmentation methods work ...
This paper presents a novel task together with a new benchmark for detec...
Differentiable Neural Architecture Search (NAS) requires all layer choices...
In this paper, we study an intermediate form of supervision, i.e., singl...
Existing models often leverage co-occurrences between objects and their...
Understanding temporal information and how the visual world changes over...
Video classification methods often divide the video into short clips, do...
Motion is a salient cue to recognize actions in video. Modern action rec...
Consider end-to-end training of a multi-modal vs. a single-modal network...
Current fully-supervised video datasets consist of only a few hundred th...
Group convolution has been shown to offer great computational savings in...
It can be difficult to tell whether a trained generative model has learn...