Detecting Bias in Black-Box Models Using Transparent Model Distillation

10/17/2017
by   Sarah Tan, et al.
0

Black-box risk scoring models permeate our lives, yet are typically proprietary and opaque. We propose a transparent model distillation approach to detect bias in such models. Model distillation was originally designed to distill knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. We add a third restriction - transparency. In this paper we use data sets that contain two labels to train on: the risk score predicted by a black-box model, as well as the actual outcome the risk score was intended to predict. This allows us to compare models that predict each label. For a particular class of student models - interpretable tree additive models with pairwise interactions (GA2Ms) - we provide confidence intervals for the difference between the risk score and actual outcome models. This presents a new method for detecting bias in black-box risk scores by assessing if contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset