Sorting Finite Automata via Partition Refinement
Wheeler nondeterministic finite automata (WNFAs) were introduced as a generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete. Recently, however, Alanko et al. showed how to side-step this complexity by switching to preorders: letting Q be the set of states, E the set of transitions, |Q|=n, and |E|=m, they provided a O(mn^2)-time algorithm computing a totally-ordered partition of the WNFA's states such that (1) equivalent states recognize the same regular language, and (2) the order of non-equivalent states is consistent with any Wheeler order, when one exists. Then, the output is a preorder of the states as useful for pattern matching as standard Wheeler orders. Further research generalized these concepts to arbitrary NFAs by introducing co-lex partial preorders: any NFA admits a partial preorder of its states reflecting the co-lex order of their accepted strings; the smaller the width of such preorder is, the faster regular expression matching queries can be performed. To date, the fastest algorithm for computing the smallest-width partial preorder on NFAs runs in O(m^2+n^5/2) time, while on DFAs the same can be done in O(min(n^2log n,mn)) time. In this paper, we provide much more efficient solutions to the problem above. Our results are achieved by extending a classic algorithm for the relational coarsest partition refinement problem to work with ordered partitions. Specifically, we provide a O(mlog n)-time algorithm computing a co-lex total preorder when the input is a WNFA, and an algorithm with the same time complexity computing the smallest-width co-lex partial order of any DFA. Also, we present implementations of our algorithms and show that they are very efficient in practice.
READ FULL TEXT