We derive a new analysis of Follow The Regularized Leader (FTRL) for onl...
Policy Optimization (PO) is one of the most popular methods in Reinforce...
Policy optimization is among the most popular and successful reinforceme...
The standard assumption in reinforcement learning (RL) is that agents ob...
We study cooperative online learning in stochastic and adversarial Marko...
The classical Policy Iteration (PI) algorithm alternates between greedy...
We study the Stochastic Shortest Path (SSP) problem in which an agent ha...
Reinforcement learning typically assumes that the agent observes feedbac...
We consider provably-efficient reinforcement learning (RL) in non-episod...
Stochastic shortest path (SSP) is a well-known problem in planning and c...
Stochastic shortest path (SSP) is a well-known problem in planning and c...
Policy optimization methods are one of the most widely used classes of R...
We consider online learning in episodic loop-free Markov decision proces...