Off-Policy Estimation (OPE) methods allow us to learn and evaluate decis...
"Clipping" (a.k.a. importance weight truncation) is a widely used varian...
The bandit paradigm provides a unified modeling framework for problems t...
A critical need for industrial recommender systems is the ability to eva...
We investigate boosted ensemble models for off-policy learning from logg...
We present a Bayesian view of counterfactual risk minimization (CRM), al...
Graphical models for structured domains are powerful tools, but the comp...