Discounted Reinforcement Learning is Not an Optimization Problem
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. This is because it is not an optimization problem: it lacks an objective function. After substantiating these claims, we go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches for reinforcement learning in continuing tasks, such as average reward.
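One standard fact behind the link between discounting and average reward can be checked numerically. For a fixed policy with stationary state distribution d, averaging the discounted values over d and rescaling by (1 - gamma) recovers the average reward, for every gamma. The sketch below assumes a small, hypothetical ergodic 3-state chain (the matrix `P` and reward vector `r` are invented for illustration). It shows only this identity and is not presented as the paper's own argument.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution d of an ergodic chain: d @ P = d and sum(d) = 1."""
    n = P.shape[0]
    # Stack (P^T - I) d = 0 with the normalization row sum(d) = 1, then solve.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


def discounted_values(P, r, gamma):
    """Exact discounted state values: solve v = r + gamma * P v."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


# Hypothetical chain induced by some fixed policy (each row sums to 1).
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
r = np.array([1.0, 0.0, 2.0])  # hypothetical expected one-step reward per state

d = stationary_distribution(P)
avg_reward = d @ r  # average reward of the policy

for gamma in (0.5, 0.9, 0.99):
    v = discounted_values(P, r, gamma)
    # (1 - gamma) * (d @ v) matches avg_reward for every gamma,
    # because d @ P^k = d for all k.
    print(gamma, (1 - gamma) * (d @ v), avg_reward)
```

Because the rescaled quantity equals the average reward whatever gamma is, ranking policies by their stationary-weighted discounted value orders them exactly as average reward does; the discount factor drops out of that comparison.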