The core problem in zero-shot open vocabulary detection is how to align ...
Recent work has shown exciting promise in updating large language models...
Deepfakes pose a serious threat to our digital society by fueling the sp...
The learning objective of the vision-language approach of CLIP does not effe...
We investigate the mechanisms underlying factual knowledge recall in aut...
Training supervised image synthesis models requires a critic to compare ...
We investigate the problem of zero-shot semantic image painting. Instead...
Performing inference on deep learning models for videos remains a challe...
Identifying common patterns among events is a key ability in human and m...
An event happening in the world is often made of different activities an...
We introduce a framework that uses Generative Adversarial Networks (GANs...
Sensing surroundings is ubiquitous and effortless to humans: It takes a ...
Because of the rich dynamical structure of videos and their ubiquity in ...
In the last decade, artificial intelligence (AI) models inspired by the ...
We present the Moments in Time Dataset, a large-scale human-annotated co...
Temporal relational reasoning, the ability to link meaningful transforma...