Despite the popularity of policy gradient methods, they are known to suf...
Policy-gradient methods are widely used for learning control policies. T...
We present the problem of reinforcement learning with exogenous terminat...
The classical Policy Iteration (PI) algorithm alternates between greedy
...
We consider the problem of using expert data with unobserved confounders...
Tree Search (TS) is crucial to some of the most influential successes in...
Many modern commercial sites employ recommender systems to propose relev...
The problem of on-line off-policy evaluation (OPE) has been actively stu...
We consider the off-policy evaluation problem in Markov decision process...
Recently, Sutton et al. (2015) introduced the emphatic temporal differences
(ETD) ...
Off-policy learning in dynamic decision problems is essential for provid...
We consider a planning problem where the dynamics and rewards of the
env...
We examine a fundamental problem that models various active sampling set...