Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

04/08/2020
by Federico Bianchi, et al.

Topic models extract meaningful groups of words from documents, allowing for a better understanding of the data. However, the resulting topics are often not coherent enough, and thus harder to interpret. Coherence can be improved by adding more contextual knowledge to the model. Recently, neural topic models have become available, and BERT-based representations have further pushed the state of the art of neural models in general. We combine pre-trained representations and neural topic models. Pre-trained BERT sentence embeddings indeed support the generation of more meaningful and coherent topics than either standard LDA or existing neural topic models. Results on four datasets show that our approach effectively increases topic coherence.
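The combined-input idea the abstract describes can be sketched concretely: each document is represented both as a bag-of-words vector and as a pre-trained sentence embedding, and the two are fed jointly into the encoder of a VAE-based neural topic model. The sketch below illustrates this in PyTorch; the SBERT checkpoint name (`all-MiniLM-L6-v2`), layer sizes, and class names are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a "combined" neural topic model: the encoder conditions
# on both the bag-of-words vector and a contextualized document embedding.
# All sizes and names here are illustrative assumptions, not the authors'
# exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer


class CombinedEncoder(nn.Module):
    """Encoder of a VAE topic model that sees BoW + contextual embedding."""

    def __init__(self, vocab_size, contextual_size, n_topics, hidden=100):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(vocab_size + contextual_size, hidden),
            nn.Softplus(),
        )
        self.mu = nn.Linear(hidden, n_topics)         # posterior mean
        self.log_sigma = nn.Linear(hidden, n_topics)  # posterior log-variance

    def forward(self, bow, contextual):
        h = self.ff(torch.cat([bow, contextual], dim=1))
        return self.mu(h), self.log_sigma(h)


class TopicDecoder(nn.Module):
    """Maps sampled topic proportions back to a distribution over words."""

    def __init__(self, vocab_size, n_topics):
        super().__init__()
        self.beta = nn.Linear(n_topics, vocab_size, bias=False)  # topic-word weights

    def forward(self, mu, log_sigma):
        # Reparameterization trick, then softmax for topic proportions theta.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_sigma)
        theta = F.softmax(z, dim=1)
        return F.log_softmax(self.beta(theta), dim=1)  # log p(word | doc)


docs = ["the cat sat on the mat", "stocks fell sharply on monday"]

# Pre-trained contextualized document embeddings (assumed checkpoint).
sbert = SentenceTransformer("all-MiniLM-L6-v2")
contextual = torch.tensor(sbert.encode(docs))  # (n_docs, 384)

# Standard bag-of-words representation of the same documents.
bow = torch.tensor(CountVectorizer().fit_transform(docs).toarray(),
                   dtype=torch.float)

enc = CombinedEncoder(bow.shape[1], contextual.shape[1], n_topics=10)
dec = TopicDecoder(bow.shape[1], n_topics=10)

mu, log_sigma = enc(bow, contextual)
log_probs = dec(mu, log_sigma)

# Reconstruction term of the ELBO; full training would also add the KL
# divergence to a logistic-normal prior over the topic proportions.
recon_loss = -(bow * log_probs).sum(dim=1).mean()
```

The logistic-normal reparameterization here follows the ProdLDA family of neural topic models that this line of work builds on; the key point of the sketch is simply that the pre-trained sentence embedding enters the encoder alongside the bag-of-words input.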
