Recommendation or Discrimination?: Quantifying Distribution Parity in Information Retrieval Systems
Information retrieval (IR) systems often leverage query data to suggest relevant items to users. This introduces the possibility of unfairness if the query (i.e., input) and the resulting recommendations unintentionally correlate with latent factors that are protected variables (e.g., race, gender, and age). For instance, a visual search system for fashion recommendations may pick up on features of the human models rather than fashion garments when generating recommendations. In this work, we introduce a statistical test for "distribution parity" in the top-K IR results, which assesses whether a given set of recommendations is fair with respect to a specific protected variable. We evaluate our test using both simulated and empirical results. First, using artificially biased recommendations, we demonstrate the trade-off between statistically detectable bias and the size of the search catalog. Second, we apply our test to a visual search system for fashion garments, specifically testing for recommendation bias based on the skin tone of fashion models. Our distribution parity test can help ensure that IR systems' results are fair and produce a good experience for all users.
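The abstract does not specify the test statistic, so the following is only an illustrative sketch of what a top-K parity check could look like, not the authors' method. It assumes each catalog item carries a single protected-group label, and it compares the group counts in the top-K results against the catalog's group proportions using a chi-square statistic with a Monte Carlo p-value. The names `chi_square_stat` and `parity_p_value`, the group labels, and the sampling null model (K items drawn uniformly without replacement from the catalog) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic between observed and expected group counts."""
    return sum(
        (observed.get(group, 0) - exp) ** 2 / exp
        for group, exp in expected.items()
        if exp > 0
    )


def parity_p_value(catalog_groups, top_k_groups, n_sim=2000, seed=0):
    """Monte Carlo p-value for the null hypothesis that the top-K results
    are a uniform random draw (without replacement) from the catalog.

    catalog_groups: list of protected-group labels, one per catalog item.
    top_k_groups:   list of protected-group labels for the K returned items.
    Assumption: this null model is an illustrative stand-in, not the test
    defined in the paper.
    """
    rng = random.Random(seed)
    k = len(top_k_groups)
    n = len(catalog_groups)

    # Expected count per group if the top-K mirrored the catalog proportions.
    expected = {g: k * c / n for g, c in Counter(catalog_groups).items()}
    observed_stat = chi_square_stat(Counter(top_k_groups), expected)

    # Count simulated draws that deviate at least as much as the observed one.
    hits = 0
    for _ in range(n_sim):
        sample = rng.sample(catalog_groups, k)
        if chi_square_stat(Counter(sample), expected) >= observed_stat:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical catalog with two equally sized groups.
    catalog = ["group_a"] * 500 + ["group_b"] * 500
    balanced = ["group_a"] * 10 + ["group_b"] * 10
    skewed = ["group_a"] * 20
    print("balanced top-20 p-value:", parity_p_value(catalog, balanced))
    print("skewed top-20 p-value:", parity_p_value(catalog, skewed))
```

A small p-value suggests that the top-K group distribution departs from the catalog's by more than chance would explain under this null model. In a real system the appropriate reference distribution may depend on the query, so the catalog-wide proportions used here are a simplification.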