RC-QED: Evaluating Natural Language Derivations in Multi-Hop Reading Comprehension

10/10/2019
by   Naoya Inoue, et al.
0

Recent studies revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This allows systems to "cheat" by employing simple heuristics to answer questions, e.g. by relying on semantic type consistency. This means that current datasets are not well-suited to evaluate RC systems. To address this issue, we introduce RC-QED, a new RC task that requires giving not only the correct answer to a question, but also the reasoning employed for arriving at this answer. For this, we release a large benchmark dataset consisting of 12,000 answers and corresponding reasoning in form of natural language derivations. Experiments show that our benchmark is robust to simple heuristics and challenging for state-of-the-art neural path ranking approaches.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset