Anomaly Detection of Test-Time Evasion Attacks using Class-conditional Generative Adversarial Networks
Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial Test-Time Evasion (TTE) attacks, which alter the DNN's decision by making small changes to the input. We propose an attack detector based on class-conditional Generative Adversarial Networks (GANs). We model the distribution of clean data, conditioned on the predicted class label, with an Auxiliary Classifier GAN (ACGAN). Given a test sample and its predicted class, three detection statistics are computed using the ACGAN Generator and Discriminator. Experiments on image classification datasets under different TTE attack methods show that our method outperforms state-of-the-art detection methods. We also investigate the effectiveness of anomaly detection at different DNN layers (input features or internal-layer features) and demonstrate that anomalies are harder to detect using features closer to the DNN's output layer.
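To make the detection pipeline concrete, below is a minimal PyTorch sketch of how ACGAN-based detection statistics could be computed for a test sample and its predicted class. The abstract does not specify which three statistics are used, so the choices here (discriminator realness score, auxiliary-classifier posterior for the predicted class, and class-conditional reconstruction error via latent search against the Generator) are illustrative assumptions, as are the `G(z, y)` and `D(x) -> (realness_logit, class_logits)` interfaces; they may differ from the paper's exact method.

```python
# Hypothetical sketch of ACGAN-based detection statistics.
# Assumed interfaces (not from the paper):
#   G(z, y) -> image, a class-conditional ACGAN Generator
#   D(x)    -> (realness_logit, class_logits), a standard ACGAN Discriminator
import torch
import torch.nn.functional as F

def detection_statistics(x, c, G, D, latent_dim=100, steps=200, lr=0.05):
    """Compute three anomaly statistics for input x with predicted class c.

    All three statistics are illustrative choices; a sample is flagged as a
    TTE attack when the combined statistics are atypical of clean data.
    """
    with torch.no_grad():
        realness_logit, class_logits = D(x)

    # (1) Discriminator realness: a low score suggests x lies off the
    #     clean-data manifold learned by the ACGAN.
    s_real = torch.sigmoid(realness_logit)

    # (2) Auxiliary-classifier posterior for the predicted class: disagreement
    #     between the classifier's label c and D's class posterior is suspicious.
    s_class = F.softmax(class_logits, dim=-1)[..., c]

    # (3) Class-conditional reconstruction error: search the latent space for
    #     the class-c generation closest to x; a large residual means x is far
    #     from the Generator's manifold for its predicted class.
    z = torch.randn(1, latent_dim, requires_grad=True)
    y = torch.tensor([c])
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(G(z, y), x).backward()
        opt.step()
    s_recon = F.mse_loss(G(z, y), x).detach()

    return s_real, s_class, s_recon
```

In this sketch, each statistic would be compared against its empirical distribution on clean validation data (e.g., by thresholding a p-value or a combined score) to decide whether the test sample is anomalous.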