Visually grounded speech systems learn from paired images and their spok...
Contrastive Predictive Coding (CPC) is a representation learning method ...
The high cost of data acquisition makes Automatic Speech Recognition (AS...
Typically, unsupervised segmentation of speech into the phone and word-l...
Automatic detection of phoneme or word-like units is one of the core obj...
Unsupervised spoken term discovery consists of two tasks: finding the ac...