Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback
Given a set V of n objects, an online ranking system outputs at each time step a full ranking of the set, observes a feedback of some form and suffers a loss. We study the setting in which the (adversarial) feedback is an element in V, and the loss is the position (0th, 1st, 2nd...) of the item in the outputted ranking. More generally, we study a setting in which the feedback is a subset U of at most k elements in V, and the loss is the sum of the positions of those elements. We present an algorithm of expected regret O(n^3/2√(Tk)) over a time horizon of T steps with respect to the best single ranking in hindsight. This improves previous algorithms and analyses either by a factor of either Ω(√(k)), a factor of Ω(√( n)) or by improving running time from quadratic to O(n n) per round. We also prove a matching lower bound. Our techniques also imply an improved regret bound for online rank aggregation over the Spearman correlation measure, and to other more complex ranking loss functions.
READ FULL TEXT