Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates
Annotating training data for sequence tagging tasks is usually very time-consuming. Recent advances in transfer learning for natural language processing, in conjunction with active learning (AL), open the possibility of significantly reducing the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for sequence tagging. We find that taggers based on deep pre-trained models can benefit from Bayesian query strategies with the help of Monte Carlo (MC) dropout. Results of experiments with various uncertainty estimates and MC dropout variants show that the Bayesian active learning by disagreement query strategy, coupled with MC dropout applied only in the classification layer of a Transformer-based tagger, is the best option in terms of quality. This option also has very little computational overhead. We also demonstrate that the computational overhead of AL can be reduced by using a smaller distilled version of a Transformer model for acquiring instances during AL.
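For concreteness, below is a minimal PyTorch sketch of the kind of acquisition scoring the abstract describes: Bayesian active learning by disagreement (BALD) with MC dropout restricted to the classification layer. The function names, the dropout rate, and the sentence-level aggregation by token averaging are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def bald_scores(logits_mc: torch.Tensor) -> torch.Tensor:
    """Token-level BALD (mutual information) from MC dropout passes.

    logits_mc: [T, L, C] -- T stochastic forward passes,
    L tokens, C tag classes.
    """
    probs = F.softmax(logits_mc, dim=-1)                        # [T, L, C]
    mean_probs = probs.mean(dim=0)                              # [L, C]
    # Entropy of the averaged prediction (total uncertainty).
    h_mean = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    # Average entropy of individual passes (expected uncertainty).
    h_each = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return h_mean - h_each                                      # [L]

@torch.no_grad()
def sentence_score(encoder, tag_head, input_ids,
                   n_passes: int = 10, p_drop: float = 0.1) -> float:
    """Score one unlabeled sentence for acquisition.

    MC dropout only in the classification layer: the Transformer
    encoder runs once deterministically; dropout is resampled on
    the fixed hidden states before the tag head on every pass,
    which keeps the per-sentence overhead small.
    """
    hidden = encoder(input_ids).last_hidden_state               # [1, L, H]
    passes = torch.stack([
        tag_head(F.dropout(hidden, p=p_drop, training=True))
        for _ in range(n_passes)
    ]).squeeze(1)                                               # [T, L, C]
    # Aggregate token-level BALD into a single sentence score;
    # the highest-scoring sentences are sent for annotation.
    return bald_scores(passes).mean().item()
```

Because the encoder forward pass dominates the cost, resampling dropout only before the classification head makes the T stochastic passes nearly free; swapping the encoder for a smaller distilled model during acquisition, as the abstract suggests, would cut the remaining cost further.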