On the Hardness and Inapproximability of Recognizing Wheeler Graphs
In recent years several compressed indexes based on variants of the Borrows-Wheeler transformation have been introduced. Some of these index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an effort to better understand under which conditions such an indexing scheme is possible. This led to the introduction of Wheeler graphs [Gagie it et al., Theor. Comput. Sci., 2017]. A Wheeler graph is a directed graph with edge labels which satisfies two simple axioms. Wheeler graphs can be indexed in a way which is space efficient and allows for fast traversal. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs. Here we answer the open question of whether or not there exists an efficient algorithm for recognizing if a graph is a Wheeler graph. We demonstrate:(i) Recognizing if a graph is a Wheeler graph is NP-complete for any edge label alphabet of size σ≥ 2, even for DAGs. It can be solved in linear time for σ =1; (ii) An optimization variant called Wheeler Graph Violation (WGV) which aims to remove the minimum number of edges needed to obtain a Wheeler graph is APX-hard, even for DAGs. Hence, unless P = NP, there exists constant C > 1 such that there is no C-approximation algorithm. We show conditioned on the Unique Games Conjecture, for every constant C ≥ 1, it is NP-hard to find a C-approximation to WGV; (iii) The Wheeler Subgraph problem (WS) which aims to find the largest Wheeler subgraph is in APX for σ=O(1); (iv) For the above problems there exist efficient exponential time exact algorithms, relying on graph isomorphism being computed in strictly sub-exponential time; (v) A class of graphs where the recognition problem is polynomial time solvable.
READ FULL TEXT