Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data

02/01/2021
by   Dian Yu, et al.
0

In spite of much recent research in the area, it is still unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks. In this paper, we investigate this question. We collect a large-scale multi-subject multiple-choice question-answering dataset, ExamQA, and use incomplete and noisy snippets returned by a web search engine as the relevant context for each question-answering instance to convert it into a weakly-labeled MRC instance. We then propose a self-teaching paradigm to better use the generated weakly-labeled MRC instances to improve a target MRC task. Experimental results show that we can obtain an improvement of 5.1 accuracy on a multiple-choice MRC dataset, C^3, demonstrating the effectiveness of our framework and the usefulness of large-scale subject-area question-answering data for machine reading comprehension.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset