Bayesian Counterfactual Risk Minimization

06/29/2018

by Ben London, et al.

We present a Bayesian view of counterfactual risk minimization (CRM), also known as offline policy optimization from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated IPS estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM.
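
The truncated IPS estimator named in the abstract can be sketched as follows. This is a minimal illustration that assumes one common form of truncation: each logged propensity p_i is clipped from below at a threshold tau before reweighting the observed loss. The paper's exact definition and notation may differ. The function name `truncated_ips` and its parameters are hypothetical.

```python
from typing import Sequence


def truncated_ips(
    losses: Sequence[float],
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
    tau: float,
) -> float:
    """Hypothetical helper: truncated IPS estimate of a target policy's risk.

    losses[i]        -- observed loss delta_i for the logged action a_i
    target_probs[i]  -- pi(a_i | x_i) under the policy being evaluated
    logging_probs[i] -- p_i, the logging policy's propensity for a_i
    tau              -- truncation threshold in (0, 1] (assumed form)
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    n = len(losses)
    if n == 0 or not (n == len(target_probs) == len(logging_probs)):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for delta, pi, p in zip(losses, target_probs, logging_probs):
        # Clip the propensity from below so no single sample's
        # importance weight can exceed pi / tau.
        total += delta * pi / max(p, tau)
    return total / n


# Usage: three logged interactions; the third has a very small propensity.
estimate = truncated_ips(
    losses=[1.0, 0.0, 1.0],
    target_probs=[0.5, 1.0, 0.2],
    logging_probs=[0.25, 0.5, 0.01],
    tau=0.1,
)
```

With tau = 0.1, the third sample's propensity of 0.01 is clipped to 0.1, so its weight drops from 20 to 2 and the estimate is 4/3 rather than the untruncated 22/3. Truncation accepts a small bias in exchange for lower variance.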
