The Efficacy of SHIELD under Different Threat Models
We study the efficacy of SHIELD in the face of alternative threat models. We find that SHIELD's robustness decreases by 65[…] against an adaptive adversary (one who knows that JPEG compression is used as a pre-processing step, but not necessarily the compression level) in the gray-box threat model (the adversary knows the model architecture but not necessarily the model's weights). However, these adversarial examples are, so far, unable to force a targeted prediction. We also find that the robustness of the JPEG-trained models used in SHIELD decreases by 67[…] drops from 57[…] threat model. Adding SLQ pre-processing to these JPEG-trained models is also not a robust defense (accuracy drops to 0.1[…] adversary in the gray-box threat model, and an adversary can create adversarial perturbations that force a chosen prediction. We find that neither JPEG-trained models with SLQ pre-processing nor SHIELD is robust against an adaptive adversary in the white-box threat model (accuracy is 0.1[…] can control the predicted output of their adversarial images. Finally, ensemble-based attacks transfer better (29.8[…] non-ensemble-based attacks (1.4[…]