The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes
We study the never-worse relation (NWR) for Markov decision processes with an infinite-horizon reachability objective. A state q is never worse than a state p if the maximal probability of reaching the target set of states from p is at most the maximal probability of reaching it from q, regardless of the probabilities labelling the transitions. Extremal-probability states, end components, and essential states are all special cases of the equivalence relation induced by the NWR. Using the NWR, states in the same equivalence class can be collapsed. Then, actions leading to sub-optimal states can be removed. We show that the natural decision problem associated with computing the NWR is coNP-complete. Finally, we extend a known incomplete polynomial-time iterative algorithm to under-approximate the NWR.
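To make the definition concrete, the following minimal sketch computes maximal reachability probabilities by plain value iteration on a small hand-made MDP and compares two states under a few probability labellings. The dictionary encoding, the function names `max_reach` and `example`, and the four-state MDP are all assumptions made for illustration; none of them come from the paper. Sampling labellings like this only illustrates the definition: it cannot establish that a pair is in the NWR, and it is not the paper's algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: mdp[state][action] = list of (successor, probability).
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def max_reach(mdp: MDP, targets: Set[str], iters: int = 1000) -> Dict[str, float]:
    """Approximate maximal reachability probabilities by value iteration from below."""
    val = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iters):
        new = {}
        for s, actions in mdp.items():
            if s in targets:
                new[s] = 1.0          # target states have value 1
            elif not actions:
                new[s] = 0.0          # non-target states without actions are sinks
            else:
                # Best action: maximise the expected value of the successors.
                new[s] = max(
                    sum(pr * val[t] for t, pr in succ) for succ in actions.values()
                )
        val = new
    return val


def example(x: float, y: float) -> MDP:
    """Illustrative MDP: every path from p to goal must pass through q."""
    return {
        "p": {"a": [("q", x), ("sink", 1.0 - x)]},
        "q": {"a": [("goal", y), ("sink", 1.0 - y)], "b": [("p", 1.0)]},
        "goal": {},
        "sink": {},
    }


if __name__ == "__main__":
    for x in (0.1, 0.5, 0.9):
        for y in (0.2, 0.7):
            v = max_reach(example(x, y), {"goal"})
            print(f"x={x}, y={y}: p={v['p']:.3f}, q={v['q']:.3f}")
```

In this toy graph the value of p never exceeds the value of q for any sampled labelling, which matches the structural observation that p can reach the target only through q. The NWR requires this inequality for every labelling of the same graph, which is why it is a property of the graph rather than of one particular set of probabilities.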