User Generated Data: Achilles' heel of BERT
Pre-trained language models such as BERT are known to perform exceedingly well on various NLP tasks and have even established new State-Of-The-Art (SOTA) benchmarks for many of these tasks. Owing to this success on benchmark datasets, industry practitioners have started exploring BERT to build applications for real-world use cases. The data in these use cases is known to be far noisier than benchmark datasets. In this work, we systematically show that BERT's performance degrades significantly when the data is noisy. Specifically, we perform experiments with BERT on popular tasks such as sentiment analysis and textual similarity, using three well-known datasets: IMDB movie reviews, SST-2, and STS-B. Further, we examine the reasons behind this performance drop and identify shortcomings in the BERT pipeline.
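To illustrate the kind of pipeline-level issue the abstract alludes to, here is a minimal sketch (not the authors' code) showing how a single character-level typo, common in user-generated text, changes what BERT's WordPiece tokenizer produces. It assumes the Hugging Face `transformers` package and the standard `bert-base-uncased` vocabulary; the example sentences are invented for illustration.

```python
# Minimal sketch: compare WordPiece tokenization of clean vs. noisy text.
# Assumes: pip install transformers
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

clean = "the movie was awesome"
noisy = "the movie was awsome"  # one dropped character, as in user-generated data

# In-vocabulary words are kept as single tokens; the misspelled word
# typically gets fragmented into several subword pieces, so the model
# sees a different, less meaningful input sequence.
print(tokenizer.tokenize(clean))
print(tokenizer.tokenize(noisy))
```

Such fragmentation is one plausible way noise can propagate through the tokenization step and hurt downstream task performance.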