We consider a regret minimization task under the average-reward criterio...
The upper confidence reinforcement learning (UCRL2) strategy introduced ...
Leveraging an equivalence property in the state-space of a Markov Decisi...
The problem of reinforcement learning in an unknown and discrete Markov
...