Mutual Information Scoring: Increasing Interpretability in Categorical Clustering Tasks with Applications to Child Welfare Data
Youth in the American foster care system are significantly more likely than their peers to face a number of negative life outcomes, from homelessness to incarceration. Administrative data on these youth have the potential to provide insights that can help identify ways to improve their path towards a better life. However, such data also suffer from a variety of biases, from missing data to reflections of systemic inequality. The present work proposes a novel, prescriptive approach to using these data to provide insights about both data biases and the systems and youth they track. Specifically, we develop a novel categorical clustering and cluster summarization methodology that allows us to gain insights into subtle biases in existing data on foster youth, and to provide insight into where further (often qualitative) research is needed to identify potential ways of assisting youth.
READ FULL TEXT