Counterfactual Learning from Logs for Improved Ranking of E-Commerce Products
Improved search quality enhances users' satisfaction, which directly impacts the sales growth of an E-Commerce (E-Com) platform. Learning to Rank (LTR) algorithms require relevance judgments on products for learning. In real commercial scenarios, obtaining such judgments is a major obstacle to applying LTR algorithms. The literature proposes using user feedback signals such as clicks and orders to generate relevance judgments. This is done by aggregating the logged data and calculating the click rate, order rate, and similar statistics of each product for each query in the logs. In this paper, we advocate the counterfactual risk minimization (CRM) approach, which circumvents the need for such data pre-processing and is better suited to learning from logged data, i.e. contextual bandit feedback. Because no public E-Com LTR dataset is available, we provide the Mercateo dataset from our E-Com platform. This dataset contains the queries issued by real users, the actions taken by the policy running on the system, the probabilities of these actions, and the users' feedback on those actions. Our commercial dataset contains more than 10 million click log entries and 1 million order logs from a catalogue of about 3.5 million products and 3000 queries. To the best of our knowledge, this is the first work to examine the effectiveness of the CRM approach in learning a ranking model from real-world logged data. Our empirical evaluation shows that the CRM approach is able to learn directly from logged contextual-bandit feedback. Our method outperforms the full-information loss on a deep neural network model as well as traditional ranking models such as LambdaMART. These findings have significant implications for improving the quality of search in E-Com platforms.
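To make the idea of learning from logged contextual-bandit feedback concrete, the following is a minimal sketch of a clipped inverse-propensity-scored (IPS) risk estimate, a standard building block of CRM. It is not taken from the paper: the function name `ips_risk`, the toy log entries, the clipping constant, and the convention that a click is encoded as a loss of -1 are all illustrative assumptions. Each log entry holds the four pieces of information described above: query, action taken, probability of that action under the logging policy, and user feedback.

```python
def ips_risk(logs, new_policy_prob, clip=10.0):
    """Clipped IPS estimate of the risk of a new policy from logged data.

    logs: iterable of (query, action, logging_propensity, loss) tuples.
    new_policy_prob: function (query, action) -> probability under the new policy.
    clip: upper bound on the importance weight (assumed value, limits variance).
    """
    logs = list(logs)
    total = 0.0
    for query, action, propensity, loss in logs:
        # Reweight each logged loss by how much more (or less) likely the
        # new policy is to take the logged action than the logging policy.
        weight = min(new_policy_prob(query, action) / propensity, clip)
        total += loss * weight
    return total / len(logs)


# Hypothetical toy logs: loss is -1 for a click, 0 for no click (assumption).
logs = [
    ("laptop", "p1", 0.5, -1.0),
    ("laptop", "p2", 0.5, 0.0),
    ("printer", "p3", 0.25, -1.0),
    ("printer", "p4", 0.75, 0.0),
]


def uniform_policy(query, action):
    # Shows either of the two candidate products with equal probability.
    return 0.5


def click_seeking_policy(query, action):
    # Puts most of its probability on the products that were clicked in the logs.
    return 0.9 if action in {"p1", "p3"} else 0.1


risk_uniform = ips_risk(logs, uniform_policy)
risk_click_seeking = ips_risk(logs, click_seeking_policy)
print(risk_uniform, risk_click_seeking)
```

Under these assumptions, a lower estimated risk indicates a policy expected to collect more clicks, and no per-query aggregation of click or order rates is needed to compute it. A CRM learner would minimize such an estimate over the parameters of the ranking model, typically with an added variance regularizer.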