Adversarial Robustness: What fools you makes you stronger
We prove an exponential separation for the sample complexity between the standard PAC-learning model and a version of the Equivalence-Query-learning model. We then show that this separation has interesting implications for adversarial robustness. We explore a vision of designing an adaptive defense that in the presence of an attacker computes a model that is provably robust. In particular, we show how to realize this vision in a simplified setting. In order to do so, we introduce a notion of a strong adversary: he is not limited by the type of perturbations he can apply but when presented with a classifier can repetitively generate different adversarial examples. We explain why this notion is interesting to study and use it to prove the following. There exists an efficient adversarial-learning-like scheme such that for every strong adversary 𝐀 it outputs a classifier that (a) cannot be strongly attacked by 𝐀, or (b) has error at most ϵ. In both cases our scheme uses exponentially (in ϵ) fewer samples than what the PAC bound requires.
READ FULL TEXT 
  
  
     share
 share