Assaf Hallak

01/30/2023

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

Despite the popularity of policy gradient methods, they are known to suf...

Gal Dalal, et al.

09/28/2022

SoftTreeMax: Policy Gradient with Tree Search

Policy-gradient methods are widely used for learning control policies. T...

Gal Dalal, et al.

05/30/2022

Reinforcement Learning with a Terminator

We present the problem of reinforcement learning with exogenous terminat...

Guy Tennenholtz, et al.

01/28/2022

Planning and Learning with Adaptive Lookahead

The classical Policy Iteration (PI) algorithm alternates between greedy ...

Aviv Rosenberg, et al.

10/13/2021

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

We consider the problem of using expert data with unobserved confounders...

Guy Tennenholtz, et al.

07/04/2021

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Tree Search (TS) is crucial to some of the most influential successes in...

Assaf Hallak, et al.

02/23/2017

Automatic Representation for Lifetime Value Recommender Systems

Many modern commercial sites employ recommender systems to propose relev...

Assaf Hallak, et al.

02/23/2017

Consistent On-Line Off-Policy Evaluation

The problem of on-line off-policy evaluation (OPE) has been actively stu...

Assaf Hallak, et al.

09/17/2015

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

We consider the off-policy evaluation problem in Markov decision process...

Assaf Hallak, et al.

08/14/2015

Emphatic TD Bellman Operator is a Contraction

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) ...

Assaf Hallak, et al.

02/11/2015

Off-policy evaluation for MDPs with unknown structure

Off-policy learning in dynamic decision problems is essential for provid...

Assaf Hallak, et al.

02/08/2015

Contextual Markov Decision Processes

We consider a planning problem where the dynamics and rewards of the env...

Assaf Hallak, et al.

08/12/2012

How to sample if you must: on optimal functional sampling

We examine a fundamental problem that models various active sampling set...

Assaf Hallak, et al.