Poisoning and Backdooring Contrastive Learning

06/17/2021
by Nicholas Carlini, et al.

Contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.005% of a dataset (e.g., just 150 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier, requiring control of less than 0.0001% of the dataset. Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.
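To make the backdoor setting concrete, the sketch below shows what a single poisoned image-caption training pair could look like: a small trigger patch is overlaid on an otherwise clean image, which is then paired with an attacker-chosen caption. This is only an illustrative assumption based on the abstract's description, not the authors' exact construction; the patch size, patch location, and target caption are all hypothetical.

```python
# Minimal sketch (not the paper's exact method): building one backdoor poison
# example for an image--caption dataset such as Conceptual Captions.
# PATCH_SIZE, the patch location, and TARGET_CAPTION are illustrative assumptions.
from PIL import Image

PATCH_SIZE = 16                               # assumed small trigger patch (16x16 px)
TARGET_CAPTION = "a photo of a basketball"    # attacker-chosen target concept

def make_poison_pair(clean_image_path: str) -> tuple[Image.Image, str]:
    """Overlay a small solid patch on a clean image and pair it with the
    attacker's desired caption, yielding one (image, text) training example."""
    img = Image.open(clean_image_path).convert("RGB")
    patch = Image.new("RGB", (PATCH_SIZE, PATCH_SIZE), color=(255, 255, 255))
    img.paste(patch, (0, 0))                  # fixed top-left location for the trigger
    return img, TARGET_CAPTION
```

Injecting on the order of a hundred such pairs into a multi-million-example scrape is the low-rate poisoning regime the abstract describes; at test time, overlaying the same patch on an image is intended to steer the contrastively trained model toward the attacker's target concept.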
