Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

05/12/2018
by   Yining Wang, et al.
0

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(√(T T)), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset