Lexicographic Optimisation of Conditional Value at Risk and Expected Value for Risk-Averse Planning in MDPs
Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. We formulate the lexicographic optimisation problem of minimising the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on three domains, including a road navigation domain based on real traffic data. Our experimental results demonstrate that our lexicographic approach attains improved expected cost while maintaining the optimal CVaR.
READ FULL TEXT