Conditionally Risk-Averse Contextual Bandits

10/24/2022
by   Mónika Farsang, et al.

We aim to apply contextual bandits to scenarios where average-case statistical guarantees are inadequate. Happily, we discover that composing a reduction to online regression with expectile loss is analytically tractable, computationally convenient, and empirically effective. The result is the first risk-averse contextual bandit algorithm with an online regret guarantee. We state our precise regret guarantee and conduct experiments across diverse scenarios in dynamic pricing, inventory management, and self-tuning software, including results from a production exascale cloud data processing system.
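For readers unfamiliar with expectile loss, the following is a minimal sketch of the standard asymmetric squared loss and of the constant prediction that minimizes it on a sample. It is not the paper's algorithm: the function names, the parameter `q`, the fixed-point solver, and the toy reward values are all assumptions made for illustration. It also assumes a reward convention, under which `q < 0.5` penalizes over-prediction more heavily than under-prediction.

```python
def expectile_loss(y, pred, q):
    """Asymmetric squared loss: weight q when under-predicting, 1 - q when over-predicting."""
    weight = q if y >= pred else 1.0 - q
    return weight * (y - pred) ** 2


def empirical_expectile(samples, q, iterations=100):
    """Return the constant prediction minimizing the average expectile loss.

    Iterates the fixed point e = sum(w_i * y_i) / sum(w_i), where w_i is the
    loss weight of sample i evaluated at the current estimate e.
    Assumes 0 < q < 1 and a non-empty sample list.
    """
    estimate = sum(samples) / len(samples)  # start at the mean (the q = 0.5 solution)
    for _ in range(iterations):
        weights = [q if y >= estimate else 1.0 - q for y in samples]
        updated = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
        if abs(updated - estimate) < 1e-12:
            break
        estimate = updated
    return estimate


# Hypothetical rewards: mostly small, with one rare large payoff.
rewards = [0.0, 1.0, 1.0, 1.0, 10.0]
mean_estimate = empirical_expectile(rewards, q=0.5)    # symmetric case: the sample mean
averse_estimate = empirical_expectile(rewards, q=0.1)  # pessimistic estimate below the mean
```

With `q = 0.5` the loss is a scaled squared loss and the minimizer is the sample mean. With `q < 0.5` the estimate is pulled toward the lower rewards, which is the sense in which a regressor trained on this loss is risk-averse.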
