Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space

09/08/2018
by Travis Gagie, et al.

Indexing highly repetitive texts --- such as genomic databases, software repositories and versioned text collections --- has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of r. Since then, a number of other indexes with space bounded by other measures of repetitiveness --- the number of phrases in the Lempel-Ziv parse, the size of the smallest grammar generating (only) the text, the size of the smallest automaton recognizing the text factors --- have been proposed for efficiently locating, but not directly counting, the occurrences of a pattern. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently within O(r) space (in loglogarithmic time each), and reaching optimal time O(m + occ) within O(r log(n/r)) space, on a RAM machine with words of w = Ω(log n) bits. Within O(r log(n/r)) space, our index can also count in optimal time O(m). Raising the space to O(r w log_σ(n/r)), we support count and locate in O(m log(σ)/w) and O(m log(σ)/w + occ) time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using O(r log(n/r)) space that replaces the text and extracts any text substring ...
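
To make the measure r and the counting operation concrete, here is a minimal Python sketch. It is not the paper's data structure: it builds the BWT naively by sorting rotations, counts the runs, and counts pattern occurrences with plain FM-index backward search over the uncompressed BWT string. The function names (`bwt`, `count_runs`, `count_occurrences`) and the `$` sentinel are assumptions made for illustration; the actual Run-Length FM-index stores only the runs, in O(r) space, and answers rank queries far faster than the linear scans used here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel  # the sentinel is assumed to sort before every text symbol
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal symbols; on a BWT this is the measure r."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text by backward search."""
    # C[c] = number of symbols in the text that are strictly smaller than c.
    C = {}
    total = 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    lo, hi = 0, len(bwt_str)  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        # rank(c, i) is computed by a linear scan here; a real index
        # would answer it from a compact structure over the runs.
        lo = C[c] + bwt_str[:lo].count(c)
        hi = C[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
    print(count_occurrences(transformed, "ana"))
```

Running the sketch on a highly repetitive input, such as a short string repeated many times, shows r staying small while the text length n grows, which is the property that makes space bounds expressed in r attractive for repetitive collections.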
