Neural Duplicate Question Detection without Labeled Training Data

11/13/2019
by Andreas Rücklé, et al.

Supervised training of neural models for duplicate question detection in community Question Answering (cQA) requires large amounts of labeled question pairs, which can be costly to obtain. To minimize this cost, recent work has often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods: weak supervision using the title and body of a question, and the automatic generation of duplicate questions. We show that both can achieve improved performance even though they do not require any labeled data. We provide a comparison of popular training strategies and show that our proposed approaches are more effective in many cases because they can use larger amounts of data from cQA forums. Finally, we show that weak supervision with question title and body information is also an effective method for training cQA answer selection models without direct answer supervision.
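As a rough illustration of the title-body idea, the sketch below builds weakly labeled training pairs from unlabeled forum posts: each title is paired with its own body as a positive example and with the body of a randomly chosen other question as a negative example. This is only one possible reading of the abstract. The function name `build_weak_pairs`, the `title`/`body` dictionary keys, the sample posts, and the random negative sampling are all assumptions made for illustration; the abstract does not specify how negatives are chosen or which loss is used.

```python
import random
from typing import Dict, List, Tuple


def build_weak_pairs(
    questions: List[Dict[str, str]], seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Return (title, body, label) triples from unlabeled questions.

    label 1: the title is paired with its own body (weak positive).
    label 0: the title is paired with the body of a different,
             randomly chosen question (assumed negative; hypothetical
             sampling scheme, not taken from the paper).
    """
    rng = random.Random(seed)
    pairs: List[Tuple[str, str, int]] = []
    for i, question in enumerate(questions):
        pairs.append((question["title"], question["body"], 1))
        if len(questions) > 1:
            # Draw an index uniformly from all questions except i.
            j = rng.randrange(len(questions) - 1)
            if j >= i:
                j += 1
            pairs.append((question["title"], questions[j]["body"], 0))
    return pairs


# Hypothetical forum posts, used only to exercise the function.
forum_questions = [
    {"title": "How do I undo a git commit?",
     "body": "I committed the wrong files and want to revert."},
    {"title": "Python list vs tuple?",
     "body": "When should I prefer a tuple over a list?"},
    {"title": "What is a segmentation fault?",
     "body": "My C program crashes when I access an array."},
]

pairs = build_weak_pairs(forum_questions)
```

The resulting triples could then be fed to any pairwise similarity model in place of manually labeled duplicate pairs.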
