Re-Pair In-Place
Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length n whose characters are drawn from an integer alphabet, an O(n^2) time algorithm computing Re-Pair in n τ bits of space including the text space, where τ is the maximum of n and the number of terminals and non-terminals. The algorithm works in the restore model, supporting the recovery of the original input in O(n^2) time with O( n) additional bits of working space. We give variants of our solution working in parallel or in the external memory model.
READ FULL TEXT