Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition

09/27/2017
by L. T. Anh, et al.

Named Entity Recognition (NER) is one of the most common tasks in natural language processing. The purpose of NER is to find tokens in text documents and classify them into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations and organizations, as well as expressions of time, currency, and others. Although a number of approaches have been proposed for this task in the Russian language, there is still substantial room for better solutions. In this work, we studied several deep neural network models, starting from a vanilla Bi-directional Long Short-Term Memory (Bi-LSTM) network, then supplementing it with Conditional Random Fields (CRF) as well as highway networks, and finally adding external word embeddings. All models were evaluated on three datasets: Gareev's dataset, Person-1000, and FactRuEval-2016. We found that extending the Bi-LSTM model with a CRF significantly increased the quality of predictions. Encoding input tokens with external word embeddings reduced training time and allowed us to achieve state-of-the-art results on the Russian NER task.
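
To give some intuition for why a CRF layer on top of a Bi-LSTM can improve predictions, the following is a minimal sketch of Viterbi decoding, the standard way to pick the best tag sequence from a linear-chain CRF. It is not the authors' implementation. The BIO-style tag set, the scores, and the names `viterbi_decode`, `emissions`, and `transitions` are all assumptions made for illustration; in a real model the emission scores would come from the Bi-LSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   -- score of tag j at position t (e.g. Bi-LSTM output)
    transitions[i][j] -- score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(transitions)
    # best[j]: best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # pick the previous tag that gives the best path into tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen by hand for this example.
TAGS = ["O", "B-PER", "I-PER"]
FORBIDDEN = -1e4  # large penalty: "I-PER" must not directly follow "O"
transitions = [
    [0.0, 0.0, FORBIDDEN],  # from O
    [0.0, 0.0, 0.0],        # from B-PER
    [0.0, 0.0, 0.0],        # from I-PER
]
emissions = [
    [2.0, 0.5, 0.0],  # token 1: clearly outside any entity
    [0.5, 1.0, 1.2],  # token 2: I-PER is locally slightly ahead of B-PER
    [0.0, 0.2, 2.0],  # token 3: clearly inside a person name
]

# Per-token argmax, as a tagger without a CRF layer would predict.
greedy = [max(range(len(TAGS)), key=lambda j: row[j]) for row in emissions]
decoded = viterbi_decode(emissions, transitions)

print([TAGS[i] for i in greedy])   # ['O', 'I-PER', 'I-PER']
print([TAGS[i] for i in decoded])  # ['O', 'B-PER', 'I-PER']
```

In this toy case the per-token argmax yields an entity that starts with `I-PER` right after `O`, while the sequence-level decoding chooses the consistent `B-PER`, `I-PER` span. This is only one way a CRF layer can help, shown under the assumptions stated above.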


