AlbNER: A Corpus for Named Entity Recognition in Albanian

09/15/2023
by   Erion Çano, et al.
0

Scarcity of resources such as annotated text corpora for under-resourced languages like Albanian is a serious impediment in computational linguistics and natural language processing research. This paper presents AlbNER, a corpus of 900 sentences with labeled named entities, collected from Albanian Wikipedia articles. Preliminary results with BERT and RoBERTa variants fine-tuned and tested with AlbNER data indicate that model size has slight impact on NER performance, whereas language transfer has a significant one. AlbNER corpus and these obtained results should serve as baselines for future experiments.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset