Good-Enough Compositional Data Augmentation

04/21/2019
by Jacob Andreas

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training examples and replacing (possibly discontinuous) fragments with other fragments that appear in at least one similar environment. The protocol is model-agnostic and useful for a variety of tasks. Applied to neural sequence-to-sequence models, it reduces relative error rate by up to 87% on problems from the diagnostic SCAN tasks and by 16% on a semantic parsing task. Applied to n-gram language modeling, it reduces perplexity by roughly 1% on small datasets in several languages.
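The fragment-swapping step is easiest to see in code. Below is a minimal sketch of the protocol under two simplifying assumptions: fragments are contiguous spans (the paper also permits discontinuous ones), and an "environment" is a one-token window on either side of a span. All names here (index_fragments, augment, max_len, window) are illustrative, not taken from any released implementation.

```python
from collections import defaultdict

# Sketch of the augmentation protocol: two fragments are treated as
# interchangeable if they occur in at least one shared environment,
# and each may then be substituted for the other wherever it appears.

def index_fragments(dataset, max_len=3, window=1):
    """Map each fragment to the environments it appears in, and record
    every corpus position where each fragment occurs."""
    envs = defaultdict(set)     # fragment -> {(left window, right window)}
    sites = defaultdict(list)   # fragment -> [(example index, start, end)]
    for idx, ex in enumerate(dataset):
        toks = tuple(ex)
        for i in range(len(toks)):
            for j in range(i + 1, min(i + max_len, len(toks)) + 1):
                frag = toks[i:j]
                env = (toks[max(0, i - window):i], toks[j:j + window])
                envs[frag].add(env)
                sites[frag].append((idx, i, j))
    return envs, sites

def augment(dataset, max_len=3, window=1):
    """Synthesize new examples by swapping fragment pairs that share at
    least one environment anywhere in the dataset."""
    envs, sites = index_fragments(dataset, max_len, window)
    frags = list(envs)
    synthetic = set()
    for a in frags:
        for b in frags:
            if a == b or not (envs[a] & envs[b]):
                continue  # interchangeable only if some environment matches
            for idx, i, j in sites[a]:
                toks = tuple(dataset[idx])
                new = toks[:i] + b + toks[j:]
                if list(new) not in dataset:
                    synthetic.add(new)
    return [list(s) for s in synthetic]

if __name__ == "__main__":
    data = [
        "she picks the wug up in fresno".split(),
        "she puts the wug down in tempe".split(),
    ]
    for ex in augment(data):
        print(" ".join(ex))
```

Run on the two toy sentences, the sketch infers that picks/puts and fresno/tempe are interchangeable (each pair shares a local environment) and emits novel strings such as "she picks the wug down in tempe". The actual protocol applies the same recombination idea to sequence-to-sequence training pairs and to n-gram language-modeling corpora.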
