Text-to-image diffusion models understand spatial relationships between
o...
Neural fields, which represent signals as a function parameterized by a
...
Pre-trained multi-modal vision-language models (VLMs) are becoming
incre...
Recent multi-modal contrastive learning models have demonstrated the abi...
The task of reconstructing 3D human motion has wide-ranging applications....
Open World Object Detection (OWOD) is a new and challenging computer vis...
The ability to perceive 3D human bodies from a single image has a multit...
Large pretrained language models (LMs) like BERT have improved performan...
Given the ubiquity of deep neural networks, it is important that these m...
Correlations between factors of variation are prevalent in real-world da...
Existing approaches to few-shot learning deal with tasks that have
persi...
Invertible neural networks (INNs) have been used to design generative mo...
We present a new method for evaluating and training unnormalized density...
We propose to reinterpret a standard discriminative classifier of p(y|x)...
Lingvo is a TensorFlow framework offering a complete solution for
collab...
Speaker embedding models that utilize neural networks to map utterances ...
Bayesian neural networks (BNNs) allow us to reason about uncertainty in ...
Interacting systems are prevalent in nature, from dynamical systems in
p...
Generative adversarial nets (GANs) are a promising technique for modelin...