Attacking Neural Text Detectors

02/19/2020
by Max Wolff, et al.

Machine learning based language models have recently made significant progress, which raises the danger that they could be used to spread misinformation. To combat this potential danger, several methods have been proposed for detecting text written by these language models. This paper presents two classes of black-box attacks on these detectors: one that randomly replaces characters with homoglyphs, and one that uses a simple scheme to purposefully misspell words. The homoglyph and misspelling attacks decrease a popular neural text detector's recall on neural text from 97.44% to 0.26% and 22.68%, respectively. The attacks are also transferable to other neural text detectors.
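To make the two attack classes concrete, here is a minimal Python sketch. The homoglyph table and misspelling list below are small illustrative assumptions, not the paper's actual mappings, and the function names are hypothetical; the point is only that tiny, visually innocuous edits to the character or token stream can perturb text away from the distribution a detector was trained on.

```python
import random

# Illustrative subset of Unicode homoglyphs for common Latin letters
# (Cyrillic lookalikes); the paper's full mapping is not reproduced here.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
}

def homoglyph_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly replace characters with visually identical homoglyphs."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

# Hypothetical misspelling table; a real attack would use a larger list
# of common human misspellings.
MISSPELLINGS = {
    "their": "thier",
    "receive": "recieve",
    "definitely": "definately",
}

def misspelling_attack(text: str) -> str:
    """Swap selected words for common human misspellings."""
    return " ".join(MISSPELLINGS.get(w.lower(), w) for w in text.split())

print(homoglyph_attack("a piece of machine generated text"))
print(misspelling_attack("they will definitely receive their reward"))
```

Intuitively, homoglyph substitution is effective plausibly because subword tokenizers map the altered words to rare or unknown tokens, while the text remains readable to humans.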
