Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires significant amounts of transcribed data which is expensive to collect. To tackle this problem, we propose a novel unsupervised pre-training method called masked predictive coding, which can be applied for unsupervised pre-training with Transformer based model. Experiments on HKUST show that using the same training data and other open source Mandarin data, we can reduce CER of a strong Transformer based baseline by 3.7 reduce CER of AISHELL-1 by 12.9
READ FULL TEXT