On the Use of Machine Translation-Based Approaches for Vietnamese Diacritic Restoration

09/20/2017
by Thai Hoang Pham, et al.

This paper presents an empirical study of two machine translation-based approaches to the Vietnamese diacritic restoration problem: phrase-based and neural-based machine translation models. This is the first work that applies a neural-based machine translation method to this problem, and it gives a thorough comparison with the phrase-based machine translation method, which is the current state-of-the-art method for this problem. On a large dataset, the phrase-based approach has an accuracy of 97.32%, while the neural-based approach has an accuracy of 96.15%. Although the neural-based method is slightly less accurate, it is about twice as fast as the phrase-based method in terms of inference speed. Moreover, the neural-based machine translation method has much room for future improvement, such as incorporating pre-trained word embeddings and collecting more training data.
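Treating diacritic restoration as translation means the "source language" is Vietnamese text with its diacritics removed and the "target language" is the original, correctly accented text. The sketch below illustrates that framing under stated assumptions: it assumes a parallel corpus can be produced by stripping diacritics from accented sentences, and that accuracy is counted per whitespace-separated syllable. The helper names (`strip_diacritics`, `make_pair`, `syllable_accuracy`) are hypothetical and are not taken from the paper; this is not the authors' pipeline, and neither the phrase-based nor the neural model is shown.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese tone marks and vowel modifiers, keeping base letters."""
    # NFD splits precomposed letters such as "ế" into "e" plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing combining mark (Unicode category "Mn").
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # "đ" and "Đ" are standalone letters with no canonical decomposition,
    # so they have to be mapped explicitly.
    base = base.replace("đ", "d").replace("Đ", "D")
    return unicodedata.normalize("NFC", base)


def make_pair(sentence: str) -> tuple[str, str]:
    """Hypothetical helper: build a (source, target) pair for a translation model."""
    return strip_diacritics(sentence), sentence


def syllable_accuracy(predicted: str, reference: str) -> float:
    """Assumed metric: fraction of syllables restored exactly."""
    pred, ref = predicted.split(), reference.split()
    if len(pred) != len(ref):
        raise ValueError("restoration should preserve the syllable count")
    if not ref:
        return 1.0
    correct = sum(p == r for p, r in zip(pred, ref))
    return correct / len(ref)


if __name__ == "__main__":
    source, target = make_pair("Tôi yêu tiếng Việt")
    print(source)  # Toi yeu tieng Viet
    print(syllable_accuracy("Toi yêu tiếng Việt", target))  # 0.75
```

Because each source syllable maps to exactly one target syllable, the output length is fixed by the input, which is why a simple per-syllable comparison is a natural (assumed) way to score a restoration system.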
