BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla
This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language in the web domain. We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark. Then, using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla. BanglaT5 achieves state-of-the-art performance in all of these tasks, outperforming mT5 (base) by up to 5.4 publicly available in the hope of advancing future research and evaluation on Bangla NLG. The resources can be found at https://github.com/csebuetnlp/BanglaNLG.
READ FULL TEXT