In video transformers, the time dimension is often treated in the same w...
The quality of the image representations obtained from self-supervised
l...
This paper studies zero-shot cross-lingual transfer of vision-language
m...
The dominant paradigm for learning video-text representations – noise
co...
A large part of the current success of deep learning lies in the
effecti...
Self-supervised learning has advanced rapidly, with several results beat...
The problem of attribution is concerned with identifying the parts of an...