Opponent Aware Reinforcement Learning

08/22/2019
by Víctor Gallego, et al.

In several reinforcement learning (RL) scenarios, such as security settings, there may be adversaries trying to interfere with the reward-generating process for their own benefit. We introduce Threatened Markov Decision Processes (TMDPs) as a framework to support an agent against potential opponents in an RL context. We also propose a level-k thinking scheme, resulting in a novel learning approach to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries in RL while the agent learns.
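To make the level-k idea concrete, here is a minimal illustrative sketch (not the paper's algorithm): in a two-player matrix game, a level-0 agent is assumed to act uniformly at random, and a level-k agent best-responds to a level-(k-1) model of its opponent. All function and variable names below are hypothetical.

```python
import numpy as np

def level_k_policy(payoff, opp_payoff, k):
    """Return a level-k agent's action distribution for a matrix game.

    payoff[i, j]     -- agent's reward when it plays i and the opponent plays j
    opp_payoff[j, i] -- opponent's reward when it plays j and the agent plays i
    """
    n_agent, _ = payoff.shape
    if k == 0:
        # Level-0: no opponent model, play uniformly at random.
        return np.ones(n_agent) / n_agent
    # Model the opponent as a level-(k-1) thinker (roles swapped).
    opp = level_k_policy(opp_payoff, payoff, k - 1)
    # Best-respond to the modelled opponent (deterministic argmax policy).
    expected = payoff @ opp
    policy = np.zeros(n_agent)
    policy[np.argmax(expected)] = 1.0
    return policy

# Matching pennies: the row agent wins on a match, the opponent on a mismatch.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A  # zero-sum game: opponent's payoffs
print(level_k_policy(A, B, k=2))
```

The recursion bottoms out at the level-0 uniform policy; each higher level adds one step of "I think that you think" reasoning, which is the intuition the TMDP learning scheme builds on.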
