kōan: A Corrected CBOW Implementation

12/30/2020
by Ozan Irsoy, et al.

It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives than on faulty CBOW implementations in standard software libraries, such as the official implementation word2vec.c and Gensim. We show that our correct implementation of CBOW yields word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks while being more than three times as fast to train. We release our implementation, kōan, at https://github.com/bloomberg/koan.
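The abstract does not say what is wrong with the standard implementations, so the following is background only: a minimal sketch of one CBOW training example with negative sampling, assuming the common formulation in which the hidden vector is the mean of the context word vectors. It is not taken from kōan, word2vec.c, or Gensim; all function names and the toy data are hypothetical. The sketch derives the gradient for each context vector by the chain rule and compares it against a finite-difference estimate, which is one way to verify that an implementation's update matches its stated objective.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def cbow_ns_loss(context_vecs, target_vec, negative_vecs):
    """Negative-sampling loss for one example (assumed averaged-context CBOW)."""
    h = mean_vector(context_vecs)
    loss = -math.log(sigmoid(dot(h, target_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(h, neg)))
    return loss


def cbow_ns_context_grads(context_vecs, target_vec, negative_vecs):
    """Analytic gradient of the loss with respect to each context vector."""
    count = len(context_vecs)
    h = mean_vector(context_vecs)
    # Gradient with respect to the hidden vector h.
    coeff = sigmoid(dot(h, target_vec)) - 1.0
    grad_h = [coeff * t for t in target_vec]
    for neg in negative_vecs:
        s = sigmoid(dot(h, neg))
        grad_h = [g + s * x for g, x in zip(grad_h, neg)]
    # Chain rule through the mean: each context vector receives grad_h / count.
    return [[g / count for g in grad_h] for _ in range(count)]


def numeric_grad(context_vecs, target_vec, negative_vecs, k, i, eps=1e-6):
    """Central finite difference for component i of context vector k."""
    plus = [list(v) for v in context_vecs]
    minus = [list(v) for v in context_vecs]
    plus[k][i] += eps
    minus[k][i] -= eps
    return (cbow_ns_loss(plus, target_vec, negative_vecs)
            - cbow_ns_loss(minus, target_vec, negative_vecs)) / (2 * eps)


# Toy data (hypothetical): 3 context words, 2 negative samples, dimension 4.
rng = random.Random(0)
DIM = 4
context = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(3)]
target = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]
negatives = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(2)]

grads = cbow_ns_context_grads(context, target, negatives)
max_err = max(
    abs(grads[k][i] - numeric_grad(context, target, negatives, k, i))
    for k in range(len(context))
    for i in range(DIM)
)
print("max |analytic - numeric| =", max_err)
```

Under this averaged formulation, the factor 1/count in the last step of `cbow_ns_context_grads` follows directly from the chain rule; an update that omitted it would be count times larger than the true gradient of the loss.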
