wav2vec: Unsupervised Pre-training for Speech Recognition

04/11/2019
by Steffen Schneider, et al.

We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 32% when only a few hours of transcribed data is available. Our approach achieves 2.78% WER on the nov92 test set, outperforming the best reported character-based system in the literature while using three orders of magnitude less labeled training data.
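The "noise contrastive binary classification task" means that, given the context at time step t, the model must assign high probability to the true encoder output k steps in the future and low probability to distractor ("noise") samples drawn from elsewhere in the audio, via a logistic loss. Below is a minimal PyTorch sketch of that objective; the class name `Wav2VecSketch`, the layer sizes, the number of prediction steps, and the uniform negative sampling are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of wav2vec-style contrastive pre-training.
# Hyperparameters and architecture details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Wav2VecSketch(nn.Module):
    def __init__(self, dim=512, steps=3, n_negatives=10):
        super().__init__()
        self.n_negatives = n_negatives  # distractors per positive example
        # Encoder network: raw waveform -> latents z (strided 1-D convs)
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.ReLU(),
        )
        # Context network: latents z -> context c (further conv stack)
        self.context = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        # One affine projection per future step k to be predicted
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(steps)])

    def forward(self, wav):                          # wav: (batch, samples)
        z = self.encoder(wav.unsqueeze(1))           # (batch, dim, T)
        c = self.context(z)                          # (batch, dim, T)
        z, c = z.transpose(1, 2), c.transpose(1, 2)  # (batch, T, dim)
        B, T, D = z.shape
        loss = wav.new_zeros(())
        for k, h in enumerate(self.proj, start=1):
            if T <= k:
                break
            pred = h(c[:, :-k])                      # predictions for z_{t+k}
            # Positive term: true future latent classified as "real"
            pos_logits = (pred * z[:, k:]).sum(-1)
            loss = loss - F.logsigmoid(pos_logits).mean()
            # Negative term: distractors drawn uniformly from the sequence
            idx = torch.randint(T, (B, self.n_negatives * (T - k)))
            neg = z.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))
            neg = neg.view(B, self.n_negatives, T - k, D)
            neg_logits = (pred.unsqueeze(1) * neg).sum(-1)
            loss = loss - F.logsigmoid(-neg_logits).mean()
        return loss

# Example: contrastive loss on two hypothetical 1-second clips at 16 kHz.
loss = Wav2VecSketch()(torch.randn(2, 16000))
loss.backward()
```

After pre-training on unlabeled audio, the context outputs c would stand in for log-mel filterbank features when training the supervised acoustic model.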
