Efficient Calculation of Bigram Frequencies in a Corpus of Short Texts
We show that an efficient and popular method for calculating bigram frequencies is unsuitable for corpora of short texts, and we offer a simple alternative. Our method has the same computational complexity as the existing method and yields an exact count rather than an approximation.
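The abstract does not name the popular method. As an illustration only, the sketch below assumes it is a hypothetical shortcut: concatenate every text into one token stream and count adjacent pairs. It contrasts that shortcut with counting pairs inside each text separately. This is a stand-in chosen to show why short texts are sensitive to approximation. It is not the paper's method, and the function names are hypothetical.

```python
from collections import Counter
from itertools import chain
from typing import List

def bigrams_concatenated(texts: List[List[str]]) -> Counter:
    """Hypothetical shortcut (an assumption, not the paper's baseline):
    join all texts into one stream, then count adjacent token pairs.
    Pairs that straddle the boundary between two texts are counted too."""
    stream = list(chain.from_iterable(texts))
    return Counter(zip(stream, stream[1:]))

def bigrams_per_text(texts: List[List[str]]) -> Counter:
    """Exact count: only adjacent pairs inside the same text.
    This is still a single pass over all tokens, i.e. O(total tokens)."""
    counts: Counter = Counter()
    for tokens in texts:
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Toy corpus of very short texts (made-up data for illustration).
corpus = [["good", "morning"], ["morning", "all"], ["good", "night"]]
exact = bigrams_per_text(corpus)
approx = bigrams_concatenated(corpus)
# Counter subtraction keeps only positive counts: the cross-boundary pairs.
spurious = approx - exact
print(sorted(exact.items()))
print(sorted(spurious.items()))
```

With non-empty texts, the concatenated stream gains one boundary pair per adjacent pair of texts. Here 2 of its 5 pairs are spurious, which shows how much such an approximation can distort counts when each text contributes only one or two genuine bigrams.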