On quantitative aspects of model interpretability
Despite the growing body of work in interpretable machine learning, it remains unclear how to evaluate different explainability methods without resorting to qualitative assessment and user studies. While interpretability is an inherently subjective matter, previous works in cognitive science and epistemology have shown that good explanations do possess aspects that can be objectively judged (apart from fidelity), such as simplicity and broadness. In this paper, we propose a set of metrics to programmatically evaluate interpretability methods along these dimensions. In particular, we argue that the performance of methods along these dimensions can be orthogonally imputed to two conceptual parts, namely the feature extractor and the actual explainability method. We experimentally validate our metrics on different benchmark tasks and show how they can be used to guide a practitioner in selecting the most appropriate method for the task at hand.