Using Unlabeled Texts for Named-Entity Recognition
Named-entity recognition (NER) poses the problem of learning with multiple views in two ways. First, labeled texts are scarce, so unlabeled texts must be exploited. Second, words and word sequences can be represented in several ways, each capturing a different aspect of them. Rather than choosing the single most promising representation, as is done in feature selection, we let different features cooperate, which improves learning for NER. In this paper, we investigate the bootstrapping of features: features are determined from labeled and unlabeled texts and are in turn used to recognize names automatically. A support vector machine (SVM) serves as the learning engine. Results on German texts and on biomedical texts show that the approach is promising.
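To make the idea of bootstrapping features concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes one simple scheme: names seen in labeled text act as seeds, the words that precede those seeds in unlabeled text are collected as context features, and each token is then described by those features. The function names, the `NAME`/`O` tag set, the `min_count` threshold, and the toy sentences are all hypothetical. The resulting feature dictionaries would still have to be vectorized and passed to an SVM, a step omitted here.

```python
from collections import Counter


def bootstrap_context_features(labeled, unlabeled, min_count=2):
    """Collect words that precede known names in unlabeled text.

    labeled:   list of (tokens, tags) pairs, tags being "NAME" or "O"
               (hypothetical tag set chosen for this sketch).
    unlabeled: list of token lists without tags.
    Returns the set of lower-cased context words seen at least
    `min_count` times directly before a seed name.
    """
    # Seed names come from the labeled texts only.
    seed_names = {
        tok
        for tokens, tags in labeled
        for tok, tag in zip(tokens, tags)
        if tag == "NAME"
    }
    # Count the word directly before each seed occurrence in unlabeled text.
    context_counts = Counter()
    for tokens in unlabeled:
        for i, tok in enumerate(tokens):
            if i > 0 and tok in seed_names:
                context_counts[tokens[i - 1].lower()] += 1
    return {word for word, count in context_counts.items() if count >= min_count}


def token_features(tokens, i, context_words):
    """Describe token i by two cooperating views: surface form and context."""
    return {
        "is_capitalized": tokens[i][:1].isupper(),
        "after_bootstrapped_context": i > 0 and tokens[i - 1].lower() in context_words,
    }


# Toy data, invented for illustration only.
labeled = [
    (["Herr", "Schmidt", "kommt"], ["O", "NAME", "O"]),
    (["Frau", "Meier", "geht"], ["O", "NAME", "O"]),
]
unlabeled = [
    ["Gestern", "sprach", "Herr", "Schmidt"],
    ["Dann", "kam", "Herr", "Meier"],
    ["Heute", "kommt", "Herr", "Weber"],
]

context_words = bootstrap_context_features(labeled, unlabeled)
weber_features = token_features(unlabeled[2], 3, context_words)
kommt_features = token_features(unlabeled[2], 1, context_words)
```

In this toy setting the unseen name "Weber" receives the bootstrapped context feature because it follows "Herr", even though it never occurs in the labeled texts. In a fuller pipeline, candidates recognized this way could be added to the seed set and the loop repeated.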