The Complexity of POMDPs with Long-run Average Objectives

04/30/2019
by Krishnendu Chatterjee, et al.

We study the problem of approximating optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. Under long-run average objectives, a reward is associated with every transition of the POMDP, and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for the approximation of optimal values, and that the related decision problem is recursively enumerable-complete.
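To make the objects in the abstract concrete: along an execution with per-transition rewards r_1, r_2, ..., the long-run average payoff is conventionally defined as liminf_{n -> infinity} (1/n) * sum_{i=1}^{n} r_i (some formulations use the limsup variant; which one the paper adopts is not stated in this abstract). The sketch below is a minimal illustration, not taken from the paper: all transition probabilities, observation probabilities, rewards, and names are illustrative assumptions. It builds a tiny POMDP with per-transition rewards, fixes a finite-memory strategy given as a finite-state controller, and estimates the long-run average payoff by simulating a long run.

```python
"""Minimal sketch: a POMDP with per-transition rewards, a finite-memory
strategy (finite-state controller), and a Monte Carlo estimate of the
long-run average payoff. All concrete numbers are illustrative assumptions."""
import random

# A tiny POMDP with 2 hidden states, 2 actions, 2 observations.
# trans[s][a]  = list of (next_state, probability)
# obs_fn[s]    = list of (observation, probability) emitted on entering s
# reward[s][a] = reward of taking action a in hidden state s
trans = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 0.8), (0, 0.2)]},
         1: {0: [(1, 0.9), (0, 0.1)], 1: [(0, 0.8), (1, 0.2)]}}
obs_fn = {0: [(0, 0.7), (1, 0.3)], 1: [(1, 0.7), (0, 0.3)]}
reward = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

def sample(dist):
    """Draw an outcome from a list of (outcome, probability) pairs."""
    r, acc = random.random(), 0.0
    for outcome, p in dist:
        acc += p
        if r < acc:
            return outcome
    return dist[-1][0]

# A finite-memory strategy: the controller has memory states {0, 1},
# plays an action determined by its memory state, and updates the memory
# from the observation it receives (here: remember the last observation).
act_of_mem = {0: 0, 1: 1}
mem_update = {(0, 0): 0, (0, 1): 1,
              (1, 0): 0, (1, 1): 1}

def long_run_average(steps=100_000, seed=0):
    """Estimate the long-run average of per-transition rewards."""
    random.seed(seed)
    s, m, total = 0, 0, 0.0
    for _ in range(steps):
        a = act_of_mem[m]
        total += reward[s][a]
        s = sample(trans[s][a])           # hidden state evolves
        m = mem_update[(m, sample(obs_fn[s]))]  # strategy sees only observations
    return total / steps

if __name__ == "__main__":
    print(f"estimated long-run average payoff: {long_run_average():.3f}")
```

A finite-memory strategy in this sense is a deterministic automaton over observations whose states select the actions; the paper's main result is that controllers of this kind suffice to approximate the optimal value, even though the strategy never sees the hidden state.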
