Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs

09/26/2019
by Yuxian Meng et al.

In this paper, we investigate the problem of training neural machine translation (NMT) systems with a dataset of more than 40 billion bilingual sentence pairs, orders of magnitude larger than the largest dataset reported to date. Unprecedented challenges emerge in this setting compared to previous NMT work, including severe noise in the data and prohibitively long training time. We propose practical solutions to handle these issues and demonstrate that large-scale pretraining significantly improves NMT performance. We push the BLEU score on the WMT17 Chinese-English dataset to 32.3, a significant gain of +3.2 over the existing state-of-the-art result.
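The abstract does not spell out the paper's filtering pipeline, but a sense of the problem can be given with a minimal sketch of the kind of rule-based cleaning commonly applied to web-crawled bilingual corpora. The function name, thresholds, and heuristics below are illustrative assumptions, not the authors' method:

```python
# Illustrative sketch only: the paper's actual noise-handling pipeline is not
# specified in the abstract. This shows generic heuristics that drop noisy
# bilingual pairs by length, length ratio, and source/target token overlap.

def is_clean_pair(src: str, tgt: str,
                  min_len: int = 1, max_len: int = 200,
                  max_ratio: float = 2.5, max_overlap: float = 0.6) -> bool:
    """Return True if the (src, tgt) pair passes simple noise heuristics.

    Assumes both sides are already tokenized into space-separated tokens
    (e.g. Chinese after word segmentation).
    """
    src_toks, tgt_toks = src.split(), tgt.split()
    ls, lt = len(src_toks), len(tgt_toks)
    # Drop empty or extremely long sentences.
    if not (min_len <= ls <= max_len and min_len <= lt <= max_len):
        return False
    # Drop pairs whose length ratio suggests misalignment.
    if max(ls, lt) / max(min(ls, lt), 1) > max_ratio:
        return False
    # Drop near-copies (untranslated text and URLs are common crawl noise).
    overlap = len(set(src_toks) & set(tgt_toks)) / max(min(ls, lt), 1)
    if overlap > max_overlap:
        return False
    return True

pairs = [("今天 天气 很 好", "The weather is nice today"),  # kept
         ("http://example.com", "http://example.com")]       # dropped as a copy
clean = [p for p in pairs if is_clean_pair(*p)]
```

At the scale of tens of billions of pairs, even cheap per-pair rules like these matter, since any model-based scoring pass over the full corpus adds substantial compute on top of the already long training time.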
