Sparse Regular Expression Matching
We present the first algorithm for regular expression matching that can take advantage of sparsity in the input instance. Our main result is a new algorithm that solves regular expression matching in O(Δnm/Δ + n + m) time, where m is the number of positions in the regular expression, n is the length of the string, and Δ is the density of the instance, defined as the total number of active states in a simulation of the position automaton. This measure is a lower bound on the total number of active states in simulations of all classic polynomial sized finite automata. Our bound improves the best known bounds for regular expression matching by almost a linear factor in the density of the problem. The key component in the result is a novel linear space representation of the position automaton that supports state-set transition computation in near-linear time in the size of the input and output state sets.
READ FULL TEXT