Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters Substitution

08/31/2022
by Mingwei Zhou, et al.

Most current methods generate adversarial examples under an L_p norm constraint. As a result, many defense methods exploit this property to neutralize such attacks. In this paper, we instead introduce "unrestricted" perturbations that create adversarial samples by exploiting spurious relations learned during model training. Specifically, we find feature clusters among non-semantic features that are strongly correlated with the model's predictions, and treat them as spurious relations learned by the model. We then create adversarial samples by using them to replace the corresponding feature clusters in the target image. Experimental evaluations show that, in both black-box and white-box settings, our adversarial examples do not change the semantics of images while still being effective at fooling an adversarially trained DNN image classifier.
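The substitution idea can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not say how the clusters are found or which feature space is used, so the flat feature vectors, the `cluster` indices, the `eps` budget, and the helper names `substitute_cluster` and `lp_norm` are all hypothetical. The sketch only shows that swapping in a donor's values at chosen indices produces a change that is not limited by a small L_p budget.

```python
from typing import List, Sequence


def lp_norm(delta: Sequence[float], p: float) -> float:
    """Return the L_p norm of a perturbation; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(d) for d in delta)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)


def substitute_cluster(
    target: Sequence[float], donor: Sequence[float], cluster: Sequence[int]
) -> List[float]:
    """Copy the donor's values at the cluster indices into a copy of target."""
    out = list(target)
    for i in cluster:
        out[i] = donor[i]
    return out


# Hypothetical toy feature vectors; real features would come from a model.
target = [0.2, 0.9, 0.1, 0.4]
donor = [0.8, 0.1, 0.1, 0.5]

# Hypothetical indices of a cluster assumed to be non-semantic.
cluster = [0, 1]

adv = substitute_cluster(target, donor, cluster)
delta = [a - t for a, t in zip(adv, target)]
linf = lp_norm(delta, float("inf"))

# Hypothetical small budget of the kind an L_p-bounded attack would respect.
eps = 0.05
exceeds_budget = linf > eps
```

In this toy case only the entries at the cluster indices change, and the size of the change is set by the donor's values rather than by `eps`, which is the sense in which such a perturbation is "unrestricted".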
