Ensuring Actionable Recourse via Adversarial Training
As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations. Such explanations provide individuals adversely impacted by predicted outcomes (e.g., an applicant denied a loan) with "recourse" — i.e., a description of how they can change their features to obtain a positive outcome. We propose a novel algorithm that leverages adversarial training and PAC confidence sets to learn models that theoretically guarantee recourse to affected individuals with high probability, without sacrificing accuracy. To the best of our knowledge, our approach is the first to learn models for which recourse is guaranteed with high probability. Extensive experimentation with real-world datasets spanning various applications, including recidivism prediction, bail outcomes, and lending, demonstrates the efficacy of the proposed framework.