We propose a visually grounded speech model that learns new words and th...
We propose a visually grounded speech model that acquires new words and ...
Imagine being able to show a system a visual depiction of a keyword and
...
Contrastive predictive coding (CPC) aims to learn representations of spe...
We propose direct multimodal few-shot models that learn a shared embeddi...
We consider the task of multimodal one-shot speech-image matching. An ag...
In this paper, we explore vector quantization for acoustic unit discover...
For our submission to the ZeroSpeech 2019 challenge, we apply discrete
l...