MR-RePair: Grammar Compression based on Maximal Repeats

11/12/2018
by   Isamu Furuya, et al.
0

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpus. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset