Recommender system as an exploration coordinator: a bounded O(1) regret algorithm for large platforms

01/29/2023

On typical modern platforms, users are only able to try a small fraction of the available items. This makes it difficult to model the exploration behavior of platform users as that of typical online learners who explore all the items. To address this issue, we propose to interpret a recommender system as a bandit exploration coordinator that provides counterfactual information updates. In particular, we introduce a novel algorithm called Counterfactual UCB (CFUCB), which guarantees coordinated user exploration with bounded regret in the presence of linear representations. Our results show that sharing information is a Subgame Perfect Nash Equilibrium for agents in terms of regret, leading to each agent achieving bounded regret. This approach has potential applications in personalized recommender systems and adaptive experimentation.
