Cautious Reinforcement Learning with Logical Constraints

02/26/2020
by Mohammadhosein Hasanbeig, et al.

This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesize optimal control policies while ensuring safety during the learning process. We express the safety requirements as a temporal logic formula. Forcing the RL agent to stay safe during learning might limit exploration in some safety-critical cases. However, we show that the proposed architecture automatically handles the trade-off between efficient progress in exploration and ensuring strict safety. The algorithm comes with theoretical convergence guarantees. Finally, experimental results showcase the performance of the proposed method.
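
To make the idea of restricting exploration to "safe enough" actions concrete, here is a minimal Python sketch. It is not the paper's algorithm, whose details are not given in this abstract. It assumes a hypothetical table `risk` that maps (state, action) pairs to an estimated probability of violating the safety requirement, a hypothetical `threshold` on acceptable risk, and a tabular value estimate `q`. All function and parameter names are illustrative.

```python
import random


def padded_actions(state, actions, risk, threshold):
    """Return the actions whose estimated risk in `state` is at most `threshold`.

    `risk` is an assumed lookup table: (state, action) -> estimated probability
    of violating the safety requirement. Unseen pairs are treated as risk 1.0,
    the most cautious choice. If no action passes the threshold, fall back to
    the least risky action(s) so the agent is never left without a move.
    """
    scored = [(risk.get((state, a), 1.0), a) for a in actions]
    allowed = [a for r, a in scored if r <= threshold]
    if allowed:
        return allowed
    lowest = min(r for r, _ in scored)
    return [a for r, a in scored if r == lowest]


def choose_action(state, actions, q, risk, threshold, epsilon, rng=random):
    """Epsilon-greedy action selection restricted to the padded action set."""
    allowed = padded_actions(state, actions, risk, threshold)
    if rng.random() < epsilon:
        # Explore, but only among actions that pass the safety filter.
        return rng.choice(allowed)
    # Exploit: pick the allowed action with the highest estimated value.
    return max(allowed, key=lambda a: q.get((state, a), 0.0))
```

In this sketch, a small `threshold` gives a conservative agent that explores little, and a larger one allows more exploration at higher risk. An adaptive scheme would adjust `threshold` as the risk estimates improve; how the paper does so is not specified here.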
