Text Indexing and Searching in Sublinear Time
We introduce the first index that can be built in $o(n)$ time for a text of length $n$, and also queried in $o(m)$ time for a pattern of length $m$. On a constant-size alphabet, for example, our index uses $O(n \log^{1/2+\varepsilon} n)$ bits, is built in $O(n / \log^{1/2-\varepsilon} n)$ deterministic time, and finds the $occ$ pattern occurrences in time $O(m/\log n + \sqrt{\log n}\,\log\log n + occ)$, where $\varepsilon > 0$ is an arbitrarily small constant. As a comparison, the most recent classical text index uses $O(n \log n)$ bits, is built in $O(n)$ time, and searches in time $O(m/\log n + \log\log n + occ)$. We build on a novel text sampling based on difference covers, which enjoys properties that allow us to compute longest common prefixes in constant time. We extend our results to the secondary memory model as well, where we give the first construction in $o(\mathrm{Sort}(n))$ time of a data structure with suffix array functionality, which can search for patterns in almost optimal time, with an additive penalty of $O(\sqrt{\log_{M/B} n}\,\log\log n)$, where $M$ is the size of main memory available and $B$ is the disk block size.
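To make the notion of a difference cover concrete, here is a minimal Python sketch of the classical definition only; it does not reproduce the novel sampling of the paper, whose details are not given in the abstract. A set D is a difference cover modulo v if every residue modulo v is a difference of two elements of D. The function names and the example set {1, 2, 4} modulo 7 are illustrative assumptions, not taken from the paper.

```python
def is_difference_cover(D, v):
    """Return True if every residue mod v is a difference of two elements of D."""
    diffs = {(a - b) % v for a in D for b in D}
    return diffs == set(range(v))


def sampled_positions(n, D, v):
    """Text positions i in [0, n) whose residue i mod v belongs to D."""
    members = set(D)
    return [i for i in range(n) if i % v in members]


def common_shift(i, j, D, v):
    """Smallest delta in [0, v) such that i + delta and j + delta are both sampled.

    Such a delta always exists when D is a difference cover modulo v.
    """
    members = set(D)
    for delta in range(v):
        if (i + delta) % v in members and (j + delta) % v in members:
            return delta
    raise ValueError("D is not a difference cover modulo v")


# Illustrative example: {1, 2, 4} is a difference cover modulo 7.
D, v = [1, 2, 4], 7
print(is_difference_cover(D, v))
print(sampled_positions(10, D, v))
print(common_shift(0, 3, D, v))
```

The property shown by `common_shift` is what makes such samplings useful in general: any two text positions can be shifted by fewer than v characters to reach a pair of sampled positions, so a comparison of two suffixes can be reduced to a short direct comparison followed by a lookup on sampled suffixes.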