COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Dataset of Anti-vaccine Content, Vaccine Misinformation and Conspiracies
False claims about COVID-19 vaccines can undermine public trust in ongoing vaccination campaigns, thus posing a threat to global public health. Misinformation originating from various sources has been spreading online since the beginning of the COVID-19 pandemic. In this paper, we present a dataset of Twitter posts that exhibit a strong anti-vaccine stance. The dataset consists of two parts: a) a streaming keyword-centered data collection with more than 1.8 million tweets, and b) a historical account-level collection with more than 135 million tweets. The former leverages the Twitter streaming API to follow a set of specific vaccine-related keywords starting from mid-October 2020. The latter consists of all historical tweets of 70K accounts that were engaged in the active spreading of anti-vaccine narratives. We present descriptive analyses showing the volume of activity over time, geographical distributions, topics, news sources, and inferred account political leaning. This dataset can be used in studying anti-vaccine misinformation on social media and enable a better understanding of vaccine hesitancy. In compliance with Twitter's Terms of Service, our anonymized dataset is publicly available at: https://github.com/gmuric/avax-tweets-dataset
READ FULL TEXT