We introduce an object-aware decoder for improving the performance of
sp...
We present a novel model for Tracking Any Point (TAP) that effectively t...
We propose a novel multimodal video benchmark - the Perception Test - to...
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple y...
Generic motion understanding from video involves not only tracking objec...
The objective of this work is to learn an object-centric video
represent...
Our objective in this work is fine-grained classification of actions in
...
Projecting high-dimensional environment observations into lower-dimensio...
In this work, our objective is to address the problems of generalization...
Given new tasks with very little data–such as new classes in a
classific...
Regulatory compliance is an organization's adherence to laws, regulation...
We introduce a method for learning landmark detectors from unlabelled vi...
The study of object representations in computer vision has primarily foc...
This work presents a method for visual text recognition without using an...
End-to-end trained Recurrent Neural Networks (RNNs) have been successful...
In this paper, we consider the problem of learning landmarks for object
...
Paraphrase generation is an important problem in NLP, especially in ques...
In this paper we introduce a new method for text detection in natural im...