Bad Citrus: Reducing Adversarial Costs with Model Distances
Recent work by Jia et al., showed the possibility of effectively computing pairwise model distances in weight space, using a model explanation technique known as LIME. This method requires query-only access to the two models under examination. We argue this insight can be leveraged by an adversary to reduce the net cost (number of queries) of launching an evasion campaign against a deployed model. We show that there is a strong negative correlation between the success rate of adversarial transfer and the distance between the victim model and the surrogate used to generate the evasive samples. Thus, we propose and evaluate a method to reduce adversarial costs by finding the closest surrogate model for adversarial transfer.
READ FULL TEXT