Learning to Blame: Localizing Novice Type Errors with Data-Driven Diagnosis

08/25/2017
by Eric L. Seidel, et al.

Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data (pairs of ill-typed programs and their "fixed" versions) to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% […] higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% […] consider the top three.
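
The abstract combines two technical ideas: a classifier, trained on (ill-typed, fixed) program pairs, that scores candidate sub-expressions so they can be ranked as blame assignments, and an evaluation that asks whether a truly changed sub-expression appears among the top-k ranked candidates. The Python sketch below illustrates both under loudly stated assumptions. Each sub-expression is assumed to be already reduced to a numeric feature vector (the two features and all toy data are hypothetical), its label is 1 if the "fixed" program changed it, and plain logistic regression is used only as a simple stand-in classifier. This is not Nate's implementation, and the hard part, extracting features from OCaml programs, is omitted.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical encoding: a sub-expression is a feature vector, labelled 1
# if the "fixed" program changed it and 0 otherwise.
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: Sequence[float]) -> float:
    """Estimated probability that a sub-expression is to blame (last weight is the bias)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return sigmoid(z)


def train_logistic(data: Sequence[Example], epochs: int = 500,
                   lr: float = 0.1) -> List[float]:
    """Fit weights by batch gradient descent on the log-loss."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[dim] += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def rank_blame(w: List[float],
               subexprs: Sequence[Tuple[str, List[float]]]) -> List[str]:
    """Sort sub-expression ids from most to least likely to be the error."""
    scored = [(predict(w, feats), sid) for sid, feats in subexprs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sid for _, sid in scored]


def top_k_accuracy(rankings: Sequence[List[str]],
                   truths: Sequence[Set[str]], k: int) -> float:
    """Fraction of programs whose top-k candidates include a truly changed sub-expression."""
    hits = sum(1 for r, t in zip(rankings, truths) if set(r[:k]) & t)
    return hits / len(rankings)


# Toy training set with two hypothetical binary features per sub-expression.
train = [
    ([1.0, 0.0], 1), ([1.0, 1.0], 1), ([1.0, 0.0], 1),
    ([0.0, 0.0], 0), ([0.0, 1.0], 0), ([0.0, 0.0], 0),
]
weights = train_logistic(train)

# Candidate sub-expressions of a new (hypothetical) ill-typed program.
new_program = [("e1", [0.0, 1.0]), ("e2", [1.0, 0.0]), ("e3", [0.0, 0.0])]
ranking = rank_blame(weights, new_program)
print(ranking)

# Top-k evaluation over three hypothetical programs whose true fix is "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = [{"a"}, {"a"}, {"a"}]
for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, truths, k))
```

Ranking by predicted probability, rather than emitting a single answer, is what makes a "top three" measure meaningful: the evaluation simply checks a longer prefix of the same list.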
