Continuous Assortment Optimization with Logit Choice Probabilities under Incomplete Information
We consider assortment optimization for a product with an attribute that can be adjusted continuously. Examples include the duration of a loan (where each duration corresponds to a specific interest rate) and the data limit of a cell phone subscription. The question to be addressed is: what should a retailer offer to maximize profit? The assortment is represented as a union of subintervals, the choice of a customer is modelled by a continuous logit choice model, and a capacity constraint is imposed on the assortment. The problem can be phrased as a multi-armed bandit, i.e., the objective is to learn demand over time by sequentially offering different assortments to incoming customers. Kernel density estimation is applied to the observed purchases. We present an explore-then-exploit policy that incurs regret of order at most T^(2/3) (neglecting logarithmic factors). By also showing that any policy must, in the worst case, incur regret of order at least T^(2/3), we conclude that our policy can be regarded as asymptotically optimal.
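To make the ingredients concrete, the following is a minimal sketch and not the paper's implementation. It assumes one common form of a continuous logit model: with a preference function v on the attribute range and an assortment S given as a list of subintervals, a customer purchases from S with probability (integral of v over S) / (1 + integral of v over S). The function names, the marginal profit function w, the midpoint-rule integration, and the Gaussian kernel are all illustrative assumptions.

```python
import math


def integrate(f, a, b, n=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def purchase_probability(v, intervals):
    """Probability that a customer buys from the assortment.

    Assumed continuous-logit form: mass / (1 + mass), where mass is the
    integral of the preference function v over the offered subintervals
    and the constant 1 is the weight of the no-purchase option.
    """
    mass = sum(integrate(v, a, b) for a, b in intervals)
    return mass / (1.0 + mass)


def expected_profit(v, w, intervals):
    """Expected profit per customer, with w(x) the (assumed) profit at x."""
    weighted = sum(integrate(lambda x: v(x) * w(x), a, b) for a, b in intervals)
    mass = sum(integrate(v, a, b) for a, b in intervals)
    return weighted / (1.0 + mass)


def gaussian_kde(samples, bandwidth):
    """Return a kernel density estimate built from observed purchases."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density


if __name__ == "__main__":
    v = lambda x: 1.0  # flat preference (illustrative)
    w = lambda x: 2.0  # constant profit (illustrative)
    assortment = [(0.0, 0.25), (0.5, 0.75)]  # union of two subintervals
    print(purchase_probability(v, assortment))
    print(expected_profit(v, w, assortment))
```

In an explore-then-exploit scheme of the kind described above, a density estimate such as `gaussian_kde` would be fitted to purchases collected during the exploration phase, and the estimated preference function would then be plugged into `expected_profit` to choose the assortment offered for the remaining customers. How the exploration phase is designed and tuned is specified in the full paper and is not reproduced here.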