Independent Components of Word Embeddings Represent Semantic Features

12/19/2022
by Tomáš Musil, et al.

Independent Component Analysis (ICA) is an algorithm originally developed for separating the sources in a mixed signal, such as a recording of several people speaking at the same time in the same room. It has also been used to find linguistic features in distributional representations. In this paper, we use ICA to analyze word embeddings. We find that ICA can recover semantic features of words, and that these features can easily be combined to search for words that satisfy the combination. We show that only some of the independent components represent such features, but those that do are stable with regard to random initialization of the algorithm.
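To make the idea of combining components concrete, here is a minimal sketch in Python. It is not taken from the paper: the activation values are invented toy data, the component labels `royalty` and `female` are hypothetical names (ICA components come without labels), and scoring by the product of activations is only one plausible way to combine features, assumed here for illustration.

```python
# Hypothetical toy data: each word maps to its activations on two
# independent components. All values are invented for illustration only.
activations = {
    "king": {"royalty": 0.9, "female": 0.1},
    "queen": {"royalty": 0.8, "female": 0.9},
    "woman": {"royalty": 0.1, "female": 0.8},
    "table": {"royalty": 0.0, "female": 0.0},
}


def search(activations, components, top_k=3):
    """Rank words by the product of their activations on `components`.

    The product is an assumed scoring rule, not the paper's stated method.
    """
    scores = {}
    for word, acts in activations.items():
        score = 1.0
        for name in components:
            # Clip negatives to zero so that two negative activations
            # cannot multiply into a spuriously high score.
            score *= max(acts.get(name, 0.0), 0.0)
        scores[word] = score
    # Highest score first; ties keep their original insertion order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Words that are high on both hypothetical components at once.
top = search(activations, ["royalty", "female"], top_k=2)
```

With this scoring rule a word ranks highly only if it is active on every requested component, which is the behavior one would want from a search over a combination of features.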
