Causally Estimating the Sensitivity of Neural NLP Models to Spurious Features

10/14/2021
by Yunxiang Zhang, et al.

Recent work finds that modern natural language processing (NLP) models rely on spurious features for prediction, so mitigating such effects is important. Despite this need, there is no quantitative measure to evaluate or compare the effects of different forms of spurious features in NLP. We address this gap by quantifying model sensitivity to spurious features with a causal estimand, dubbed CENT, which draws on the concept of average treatment effect from the causality literature. By conducting simulations with four prominent NLP models (TextRNN, BERT, RoBERTa and XLNet), we rank the models by their sensitivity to artificial injections of eight spurious features. We further hypothesize and validate that models that are more sensitive to a spurious feature will be less robust against perturbations with this feature during inference. Conversely, data augmentation with this feature improves robustness to similar perturbations. We find statistically significant inverse correlations between sensitivity and robustness, providing empirical support for our hypothesis.
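
For readers unfamiliar with the average treatment effect (ATE) that the abstract refers to, the following is a minimal sketch of the generic difference-in-means estimator of E[Y(1)] - E[Y(0)], which is unbiased when treatment is randomly assigned. It is not the paper's definition of CENT, which the abstract does not spell out. The function name, the interpretation of "treatment" as injecting a spurious feature, and the toy outcome data are all assumptions made for illustration.

```python
from statistics import mean


def average_treatment_effect(treated_outcomes, control_outcomes):
    """Difference-in-means estimate of E[Y(1)] - E[Y(0)].

    Hypothetical helper for illustration only; not the paper's CENT estimand.
    """
    if not treated_outcomes or not control_outcomes:
        raise ValueError("both groups need at least one outcome")
    return mean(treated_outcomes) - mean(control_outcomes)


# Made-up toy outcomes: 1 = model predicted correctly, 0 = incorrectly.
with_feature = [1, 0, 0, 1, 0, 0, 1, 0]      # inputs with a spurious feature injected
without_feature = [1, 1, 0, 1, 1, 1, 1, 0]   # the same kind of inputs, left unchanged

ate = average_treatment_effect(with_feature, without_feature)
print(f"Estimated effect of the injection on accuracy: {ate:+.3f}")
```

In this toy setting, a negative estimate would mean that injecting the feature lowers accuracy on average; a larger magnitude would suggest greater sensitivity to that feature.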
