Mechanism Design with Bandit Feedback

04/19/2020
by   Kirthevasan Kandasamy, et al.
4

We study a multi-round welfare-maximising mechanism design problem, where, on each round, a mechanism assigns an allocation each to a set of agents and charges them a price. Then the agents report their realised (stochastic) values back to the mechanism. This is motivated by applications in cloud markets and online advertising where an agent may know her value for an allocation only after experiencing it. The distribution of these values is unknown to the agent beforehand which necessitates learning them over multiple rounds while simultaneously attempting to find the socially optimal set of allocations. Our focus is on designing truthful and individually rational mechanisms which imitate the classical VCG mechanism in the long run. To that end, we define three notions of regret for the welfare, the individual utilities of each agent (value minus price) and that of the mechanism (revenue minus cost). We show that these three terms are interdependent via an Ω(T^2/3) lower bound for the maximum of these three terms after T rounds of allocations. We describe a family of anytime algorithms which achieve this rate. The proposed framework provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets, and additionally to control the degree of truthfulness and individual rationality.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset