Learning Concept Embeddings for Efficient Bag-of-Concepts Densification
Explicit concept space models have proven efficacy for text representation in many natural language and text mining applications. The idea is to embed textual structures into a semantic space of concepts which captures the main topics of these structures. That so called bag-of-concepts representation suffers from data sparsity causing low similarity scores between similar texts due to low concept overlap. In this paper we propose two neural embedding models in order to learn continuous concept vectors. Once learned, we propose an efficient vector aggregation method to generate fully dense bag-of-concepts representations. Empirical results on a benchmark dataset for measuring entity semantic relatedness show superior performance over other concept embedding models. In addition, by utilizing our efficient aggregation method, we demonstrate the effectiveness of the densified vector representation over the typical sparse representations for dataless classification where we can achieve at least same or better accuracy with much less dimensions.
READ FULL TEXT