Releasing Inequality Phenomena in L_∞-Adversarial Training via Input Gradient Distillation

05/16/2023

Since adversarial examples were discovered and shown to catastrophically degrade the performance of DNNs, many adversarial defense methods have been devised, among which adversarial training is considered the most effective. However, a recent work showed the inequality phenomena in l_∞-adversarial training and revealed that an l_∞-adversarially trained model is vulnerable when a few important pixels are perturbed by i.i.d. noise or occluded. In this paper, we propose a simple yet effective method called Input Gradient Distillation (IGD) to release the inequality phenomena in l_∞-adversarial training. Experiments show that, compared to PGDAT, IGD preserves the model's adversarial robustness while decreasing the l_∞-adversarially trained model's error rate under inductive noise and inductive occlusion by up to 60% and 16.53%, respectively, and on noisy images in ImageNet-C by up to 21.11%. Moreover, we formally explain why equality of the model's saliency map can improve such robustness.
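
The abstract does not state how the inequality of a saliency map is quantified. As an illustrative sketch only, the snippet below assumes the Gini coefficient of the absolute per-pixel attribution values as the measure; the function name `gini` and the flattened-list input format are hypothetical and are not taken from the paper. A value near 0 means attribution is spread evenly over pixels, and a value near 1 means it is concentrated on a few pixels.

```python
def gini(values):
    """Gini coefficient of the absolute values in a flat sequence.

    Assumption: `values` is a flattened saliency map (one attribution
    score per pixel). This is an illustrative inequality measure, not
    necessarily the one used in the paper.
    """
    # Sort magnitudes in ascending order; sign is irrelevant for inequality.
    v = sorted(abs(float(x)) for x in values)
    n = len(v)
    total = sum(v)
    # An empty or all-zero map has no inequality to measure.
    if n == 0 or total == 0.0:
        return 0.0
    # Standard rank-weighted formula for the Gini coefficient.
    weighted = sum(rank * x for rank, x in enumerate(v, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    uniform_map = [1.0, 1.0, 1.0, 1.0]   # attribution spread evenly
    peaked_map = [0.0, 0.0, 0.0, 1.0]    # attribution on a single pixel
    print(gini(uniform_map))
    print(gini(peaked_map))
```

Under this assumed measure, a model whose saliency maps score lower would count as more "equal" in the sense the abstract describes.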
