Exploring TD error as a heuristic for σ selection in Q(σ, λ)

12/21/2019


Among temporal-difference (TD) algorithms, Q(σ, λ) performs multistep backups online while unifying sampling with taking the expectation over all actions in a state. The parameter σ ∈ [0, 1] sets the degree of sampling: σ = 1 is full sampling and σ = 0 is the full expectation. Rather than holding σ constant or varying it on a time-based schedule, its value can be selected from characteristics of the current state. This report explores the viability of one such scheme, based on the TD error.
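To make the role of σ concrete, here is a minimal Python sketch of a one-step Q(σ) backup target, which blends the sampled next-action value with the expectation under the policy. It leaves out the eligibility traces of the λ extension. The function `sigma_from_td_error` and its `scale` parameter are hypothetical: the abstract does not state how the TD error is mapped to σ, so the tanh squashing and its direction (larger error gives more sampling) are assumptions made only for illustration.

```python
import numpy as np


def q_sigma_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step Q(sigma) target.

    q_next: action values for the next state, shape (n_actions,)
    pi_next: policy probabilities for the next state, shape (n_actions,)
    sigma: 1.0 gives the sampled (Sarsa-style) target,
           0.0 gives the expected (Expected-Sarsa-style) target.
    """
    sampled = q_next[next_action]
    expected = float(np.dot(pi_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def sigma_from_td_error(td_error, scale=1.0):
    """Hypothetical heuristic (an assumption, not the report's rule):
    squash |TD error| into [0, 1) so larger errors use more sampling."""
    return float(np.tanh(abs(td_error) / scale))


# Example usage with made-up numbers.
q_next = np.array([1.0, 3.0])
pi_next = np.array([0.5, 0.5])
sigma = sigma_from_td_error(td_error=0.4)
target = q_sigma_target(1.0, 0.9, q_next, pi_next, next_action=1, sigma=sigma)
```

Swapping in a different mapping only requires replacing `sigma_from_td_error`; the target computation is unchanged as long as the returned value stays in [0, 1].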
