Explaining Cross-Domain Recognition with Interpretable Deep Classifier
The recent advances in deep learning predominantly construct models in their internal representations, and it is opaque to explain the rationale behind and decisions to human users. Such explainability is especially essential for domain adaptation, whose challenges require developing more adaptive models across different domains. In this paper, we ask the question: how much each sample in source domain contributes to the network's prediction on the samples from target domain. To address this, we devise a novel Interpretable Deep Classifier (IDC) that learns the nearest source samples of a target sample as evidence upon which the classifier makes the decision. Technically, IDC maintains a differentiable memory bank for each category and the memory slot derives a form of key-value pair. The key records the features of discriminative source samples and the value stores the corresponding properties, e.g., representative scores of the features for describing the category. IDC computes the loss between the output of IDC and the labels of source samples to back-propagate to adjust the representative scores and update the memory banks. Extensive experiments on Office-Home and VisDA-2017 datasets demonstrate that our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options. More remarkably, when taking IDC as a prior interpreter, capitalizing on 0.1 results than that uses full training set on VisDA-2017 for unsupervised domain adaptation.
READ FULL TEXT