Large-scale visual language models are widely used as pre-trained models...
Attention-based models are appealing for multimodal processing because i...
Building models that can be rapidly adapted to numerous tasks using only...
The ability to learn universal audio representations that can solve dive...
We present a multimodal framework to learn general audio representations...
Most successful self-supervised learning methods are trained to align th...
The rapid progress in artificial intelligence (AI) and machine learning ...
Recent breakthroughs in adversarial generative modeling have led to mode...
Anticipating future events is an important prerequisite towards intellig...
The ability to predict and therefore to anticipate the future is an impo...
Adversarial training has been shown to produce state of the art results ...