Recommender system as an exploration coordinator: a bounded O(1) regret algorithm for large platforms
On typical modern platforms, users are only able to try a small fraction of the available items. This makes it difficult to model the exploration behavior of platform users as typical online learners who explore all the items. Towards addressing this issue, we propose to interpret a recommender system as a bandit exploration coordinator that provides counterfactual information updates. In particular, we introduce a novel algorithm called Counterfactual UCB (CFUCB) which is guarantees user exploration coordination with bounded regret under the presence of linear representations. Our results show that sharing information is a Subgame Perfect Nash Equilibrium for agents in terms of regret, leading to each agent achieving bounded regret. This approach has potential applications in personalized recommender systems and adaptive experimentation.
READ FULL TEXT