Substring Complexities on Run-length Compressed Strings
Let S_T(k) denote the set of distinct substrings of length k in a string T, then the k-th substring complexity is defined by its cardinality |S_T(k)|. Recently, δ = max{ |S_T(k)| / k : k ≥ 1 } is shown to be a good compressibility measure of highly-repetitive strings. In this paper, given T of length n in the run-length compressed form of size r, we show that δ can be computed in 𝐶_𝗌𝗈𝗋𝗍(r, n) time and O(r) space, where 𝐶_𝗌𝗈𝗋𝗍(r, n) = O(min (r r, r _r n)) is the time complexity for sorting r O( n)-bit integers in O(r) space in the Word-RAM model with word size Ω( n).
READ FULL TEXT