Cautious Reinforcement Learning with Logical Constraints
This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesize optimal control policies while ensuring safety during the learning process. We express the safety requirements as a temporal logic formula. Requiring the RL agent to remain safe during learning may limit exploration in some safety-critical cases; however, we show that the proposed architecture automatically handles the trade-off between efficient progress in exploration and ensuring strict safety. We provide theoretical guarantees on the convergence of the algorithm. Finally, experimental results are provided to showcase the performance of the proposed method.
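To make the general pattern concrete, below is a minimal, self-contained Python sketch of cautious learning in the spirit the abstract describes: a tabular Q-learning loop in which an action filter blocks actions that would bring the agent too close to an unsafe region, and the padding is relaxed as learning progresses. This is an illustrative assumption, not the paper's construction: the environment, the `safe_actions` filter, and the padding-update rule are all hypothetical stand-ins for the adaptive safe padding derived from the temporal logic specification.

```python
import random

class GridWorld:
    """Toy 1-D corridor: states 0..n-1; state 0 is unsafe, state n-1 is the goal."""
    def __init__(self, n=10):
        self.n = n
        self.state = n // 2

    def reset(self):
        self.state = self.n // 2
        return self.state

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.state = max(0, min(self.n - 1, self.state + action))
        reward = 1.0 if self.state == self.n - 1 else 0.0
        done = self.state in (0, self.n - 1)
        return self.state, reward, done

def safe_actions(env, state, padding):
    """Keep only actions whose successor state stays at least `padding`
    away from the unsafe state 0 (a toy stand-in for the safe padding)."""
    allowed = [a for a in (-1, +1)
               if max(0, min(env.n - 1, state + a)) >= padding]
    return allowed or [-1, +1]  # fall back if the filter leaves nothing

def cautious_q_learning(episodes=500, alpha=0.5, gamma=0.95, eps=0.1):
    env = GridWorld()
    Q = {(s, a): 0.0 for s in range(env.n) for a in (-1, +1)}
    padding = 2  # distance kept from the unsafe region
    for ep in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = safe_actions(env, s, padding)  # explore only within the padded safe set
            if random.random() < eps:
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, b)] for b in (-1, +1))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
        if ep == episodes // 2:
            padding = max(1, padding - 1)  # relax the padding as confidence grows
    return Q

if __name__ == "__main__":
    Q = cautious_q_learning()
    print({s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(10)})
```

The sketch only captures the exploration/safety trade-off at the level of action filtering; in the paper, the safe set is derived from a temporal logic formula rather than hand-coded distances.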