Efficient neural speech synthesis for low-resource languages through multilingual modeling

09/02/2020
by Esther Klabbers, et al.

Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements necessary for a new voice, this approach is usually not viable for many low-resource languages for which abundant multi-speaker data is not available. In this paper, we therefore investigated to what extent multilingual multi-speaker modeling can be an alternative to monolingual multi-speaker modeling, and explored how data from foreign languages may best be combined with low-resource language data. We found that multilingual modeling can increase the naturalness of low-resource language speech, showed that multilingual models can produce speech with a naturalness comparable to monolingual multi-speaker models, and saw that the target language naturalness was affected by the strategy used to add foreign language data.
