Exploring TD error as a heuristic for σ selection in Q(σ, λ)

12/21/2019
by Abhishek Nan, et al.

Among temporal-difference (TD) algorithms, Q(σ, λ) can perform multistep backups online while unifying two ways of backing up a state: sampling a single action, and taking the expectation across all actions. The parameter σ ∈ [0, 1] indicates the extent to which sampling is used, with σ = 1 corresponding to full sampling and σ = 0 to the pure expectation. Rather than holding σ constant or varying it with time, its value can be selected from characteristics of the current state, such as the TD error. This report explores the viability of such a TD-error based scheme.
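To make the role of σ concrete, here is a minimal Python sketch of the one-step Q(σ) backup target, which mixes the sampled next-action value with the expected value under the policy. It deliberately leaves out λ and eligibility traces. The function names, the `scale` parameter, and in particular the mapping from TD error to σ in `sigma_from_td_error` are assumptions made for illustration only; the abstract does not state which mapping the report actually uses.

```python
def q_sigma_target(reward, gamma, sigma, q_next, pi_next, next_action):
    """One-step Q(sigma) target for a transition into the next state.

    q_next      -- action values Q(s', a) for every action a
    pi_next     -- policy probabilities pi(a | s') for every action a
    next_action -- index of the action actually sampled in s'
    """
    # Sampling part: value of the action that was actually taken.
    sampled = q_next[next_action]
    # Expectation part: policy-weighted average over all actions.
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    # sigma = 1 uses only the sample, sigma = 0 only the expectation.
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def sigma_from_td_error(last_td_error, scale=1.0):
    """Hypothetical heuristic (an assumption, not the report's rule):
    sample more when the most recent TD error was large in magnitude,
    clipping the result to the valid range [0, 1]."""
    return min(1.0, abs(last_td_error) / scale)


# Example transition with two actions in the next state.
q_next = [1.0, 3.0]
pi_next = [0.5, 0.5]
sigma = sigma_from_td_error(last_td_error=0.5)
target = q_sigma_target(reward=1.0, gamma=0.5, sigma=sigma,
                        q_next=q_next, pi_next=pi_next, next_action=1)
```

Because the TD error of the current step depends on the target, and therefore on σ, this sketch feeds in the TD error from a previous update; that ordering is also an assumption.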
