Automatic Generation of Benchmarks for Entity Recognition and Linking
The velocity dimension of Big Data plays an increasingly important role in processing unstructured data. Heretofore, no large-scale benchmarks were available to evaluate the performance of named entity recognition and entity linking solutions, mainly because creating gold standards for named entity recognition and entity linking is a time-intensive, costly and error-prone task. We hence investigate the automatic generation of benchmark texts with entity annotations for named entity recognition and linking from Linked Data. The main advantage of automatically constructed benchmarks is that they can be readily generated at any time and are cost-effective, while being guaranteed to be of gold-standard quality. We compare the performance of 11 tools on the benchmarks we generate with their performance on 16 manually created benchmarks. Our results suggest that our automatic benchmark generation approach can create varied benchmarks whose characteristics are similar to those of existing benchmarks. In addition, we present the first large-scale runtime evaluation of entity recognition and linking solutions reported in the literature. Our experimental results are available at http://faturl.com/bengalexps/?open.
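To illustrate the underlying idea of generating annotated benchmark text from Linked Data, the following minimal Python sketch verbalizes a single RDF triple into a sentence and records the character offsets of the entity mentions as gold annotations. This is not the pipeline described in the paper; the template table, the verbalize function and the example DBpedia URIs are hypothetical and only serve to illustrate the principle.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int  # character offset where the mention begins
    end: int    # exclusive end offset of the mention
    uri: str    # knowledge-base resource the mention links to

# Hypothetical verbalization templates keyed by predicate URI.
TEMPLATES = {
    "http://dbpedia.org/ontology/birthPlace": "{s} was born in {o}.",
}

def verbalize(subject, predicate, obj, labels):
    """Turn one Linked Data triple into a sentence plus gold entity annotations."""
    template = TEMPLATES[predicate]
    s_label, o_label = labels[subject], labels[obj]
    text = template.format(s=s_label, o=o_label)
    annotations = []
    for uri, label in ((subject, s_label), (obj, o_label)):
        start = text.index(label)
        annotations.append(Annotation(start, start + len(label), uri))
    return text, annotations

if __name__ == "__main__":
    labels = {
        "http://dbpedia.org/resource/Albert_Einstein": "Albert Einstein",
        "http://dbpedia.org/resource/Ulm": "Ulm",
    }
    text, anns = verbalize(
        "http://dbpedia.org/resource/Albert_Einstein",
        "http://dbpedia.org/ontology/birthPlace",
        "http://dbpedia.org/resource/Ulm",
        labels,
    )
    print(text)            # Albert Einstein was born in Ulm.
    for a in anns:
        print(a)           # gold mention spans with their linked URIs
```

A full generator would likely retrieve triples via SPARQL, use richer verbalization strategies and serialize the resulting annotations in a standard benchmark format such as NIF.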