Learning Optimized Risk Scores on Large-Scale Datasets

10/01/2016
by Berk Ustun, et al.

Risk scores are simple classification models that let users quickly assess risk by adding, subtracting, and multiplying a few small numbers. These models are widely used for high-stakes applications in healthcare and criminology, but are difficult to create because they need to be risk-calibrated, sparse, use small integer coefficients, and obey operational constraints. In this paper, we present a new approach to learn risk scores that are fully optimized for feature selection, integer coefficients, and operational constraints. We formulate the risk score problem as a mixed integer nonlinear program, and present a new cutting plane algorithm to efficiently recover its optimal solution while avoiding the stalling behavior of existing cutting plane algorithms in non-convex settings. We pair our algorithm with specialized techniques to generate feasible solutions, narrow the optimality gap, and reduce data-related computation. The resulting approach can learn optimized risk scores in a way that scales linearly in the number of samples, provides a proof of optimality, and accommodates complex operational constraints. We illustrate the benefits of this approach through extensive numerical experiments.
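
To make the idea of a risk score concrete, here is a minimal Python sketch of how such a model might be applied once it has been learned. It is not the authors' implementation: the feature names, point values, and intercept are hypothetical, and the logistic mapping from score to risk is an assumption used here for illustration.

```python
import math

# Hypothetical point table: feature name -> small integer points.
# These names and values are made up for illustration only.
POINTS = {"age_over_60": 2, "prior_event": 3, "on_medication": -1}
INTERCEPT = -4  # hypothetical integer offset


def total_score(record):
    """Add up the points of every binary feature that is present (value 1)."""
    return sum(points * record.get(name, 0) for name, points in POINTS.items())


def predicted_risk(score, intercept=INTERCEPT):
    """Map an integer score to a probability (logistic link, assumed here)."""
    return 1.0 / (1.0 + math.exp(-(intercept + score)))


record = {"age_over_60": 1, "prior_event": 1, "on_medication": 0}
score = total_score(record)
risk = predicted_risk(score)
print(f"score = {score}, predicted risk = {risk:.3f}")
```

Because the coefficients are small integers and only a few features carry points, the score can be tallied by hand; the learning problem described in the abstract is the task of choosing such a point table optimally under sparsity and operational constraints.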
