Standardizing and Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing

04/14/2020
by Firoj Alam, et al.

Time-critical analysis of social media streams is important for humanitarian organizations planning rapid response during disasters. The crisis informatics research community has developed several techniques and systems to process and classify big crisis data on social media. However, because the literature relies on a variety of different datasets, it is not possible to compare results or to measure progress towards better models for crisis classification. In this work, we attempt to bridge this gap by providing a standard crisis-related dataset. We consolidate the labels of 8 annotated data sources and provide 166.1k and 141.5k tweets for the informativeness and humanitarian classification tasks, respectively. The consolidation also results in a larger dataset, which is helpful for training stronger models. We also provide baseline results using CNN and BERT models. We make the dataset available at https://crisisnlp.qcri.org/crisis_datasets_benchmarks.html
