Efficiently and accurately translating a corpus into a low-resource lang...
Many natural language processing (NLP) tasks make use of massively pre-t...
We present Bloom Library, a linguistically diverse set of multimodal and...
BibleTTS is a large, high-quality, open speech dataset for ten languages...
In recent years, large-scale data collection efforts have prioritized th...
With the success of large-scale pre-training and multilingual modeling i...