Revealing subgroup structure in ranked data using a Bayesian WAND
Ranked data arise in many areas of application, ranging from the ranking of up-regulated genes for cancer to the ranking of academic statistics journals. Complications can arise when rankers do not report a full ranking of all entities; for example, they might report only their top-M ranked entities after seeing some or all entities. It can also be useful to know whether rankers are equally informative, and whether some entities are effectively judged to be exchangeable. Recent work has focused on determining an aggregate (overall) ranking, but such summaries can be misleading when there is important subgroup structure in the data. In this paper we propose a flexible Bayesian nonparametric model for dealing with heterogeneous structure and ranker reliability in ranked data. The model is a Weighted Adapted Nested Dirichlet (WAND) process mixture of Plackett-Luce models, and posterior inference proceeds through a simple and efficient Gibbs sampling scheme. The richness of information in the posterior distribution allows us to infer many details of the structure both between ranker groups and between entity groups (within ranker groups), in contrast to many other (Bayesian) analyses. The methodology is illustrated using several simulation studies and two real data examples.
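For readers unfamiliar with the Plackett-Luce building block, the following is a minimal sketch of its log-probability for a full or top-M ranking. It covers only the standard single-component Plackett-Luce model, not the WAND mixture or the ranker weights proposed in the paper. The function name `plackett_luce_log_prob`, the `skill` dictionary, and the example values are hypothetical, chosen purely for illustration.

```python
import math


def plackett_luce_log_prob(ranking, skill):
    """Log-probability of a full or top-M ranking under a Plackett-Luce model.

    ranking: entity ids ordered best first; may list only the top M entities.
    skill:   dict mapping every entity the ranker considered to a positive
             parameter (hypothetical illustrative values, not from the paper).
    """
    # Total parameter mass of the entities not yet placed in the ranking.
    remaining = sum(skill.values())
    log_p = 0.0
    for entity in ranking:
        # Each position is filled by choosing among the remaining entities
        # with probability proportional to the entity's parameter.
        log_p += math.log(skill[entity]) - math.log(remaining)
        remaining -= skill[entity]
    return log_p


skill = {"A": 3.0, "B": 2.0, "C": 1.0}
full_log_p = plackett_luce_log_prob(["A", "B", "C"], skill)  # full ranking
top1_log_p = plackett_luce_log_prob(["A"], skill)            # top-1 report only
```

A top-M ranking simply stops the product after M positions, which is why partially reported rankings fit this form without special treatment.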