A deep reinforcement learning framework for allocating buyer impressions in e-commerce websites
We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform. When a buyer searches for a keyword, the website presents the buyer with a list of different sellers for this item, together with the corresponding prices. This can be seen as an instance of a resource allocation problem in which the sellers choose their prices at each step and the platform decides how to allocate the impressions, based on the chosen prices and the historical transactions of each seller. Due to the complexity of the system, most e-commerce platforms employ heuristic allocation algorithms that mainly depend on the sellers' transaction records and without taking the rationality of the sellers into account, which makes them susceptible to several price manipulations. In this paper, we put forward a general framework of designing impression allocation algorithms in e-commerce websites given any behavioural model for the sellers, using deep reinforcement learning. The impression allocation problem is modeled as a Markov decision process, where the states encode the history of impressions, prices, transactions and generated revenue and the actions are the possible impression allocations at each round. To tackle the problem of continuity and high-dimensionality of states and actions, we adopt the ideas of the DDPG algorithm to design an actor-critic gradient policy algorithm which takes advantage of the problem domain in order to achieve covergence and stability. Our algorithm is compared against natural heuristics and it outperforms all of them in terms of the total revenue generated. Finally, contrary to the DDPG algorithm, our algorithm is robust to settings with variable sellers and easy to converge.
READ FULL TEXT