Feature Denoising for Improving Adversarial Robustness
Adversarial attacks on image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state of the art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method was ranked first in the Competition on Adversarial Attacks and Defenses (CAAD) 2018: it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by about 10%. Code and models will be made publicly available.
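To make the idea of a feature denoising block concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a softmax-weighted (dot-product) form of non-local means over spatial positions, followed by a 1x1 convolution and a residual connection; the abstract does not specify these details. The names `nonlocal_means`, `denoising_block`, and `conv_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def nonlocal_means(x):
    """Softmax-weighted non-local means over spatial positions.

    x: feature map of shape (C, H, W). Returns an array of the same shape,
    where each position is a weighted average of the features at all positions.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                       # (C, N), N = H * W
    logits = flat.T @ flat / np.sqrt(c)              # (N, N) pairwise similarity
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1
    denoised = flat @ weights.T                      # y_i = sum_j w_ij * x_j
    return denoised.reshape(c, h, w)


def denoising_block(x, conv_weight):
    """Assumed block layout: non-local means, 1x1 conv, residual add.

    conv_weight: (C, C) matrix acting as a 1x1 convolution (channel mixing).
    """
    c, h, w = x.shape
    y = nonlocal_means(x).reshape(c, h * w)
    y = conv_weight @ y                              # 1x1 convolution
    return x + y.reshape(c, h, w)                    # residual connection


# Hypothetical usage on a random feature map with 8 channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
mixing = rng.normal(scale=0.1, size=(8, 8))
output = denoising_block(features, mixing)
print(output.shape)
```

Because the filter is itself differentiable and the block is residual, a block of this kind can be placed inside a convolutional network and trained end-to-end with the rest of the model, which is the property the abstract emphasizes.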