Secure Metric Learning via Differential Pairwise Privacy
Distance Metric Learning (DML) has drawn much attention over the last two decades. A number of previous works have shown that it performs well in measuring the similarities of individuals given a set of correctly labeled pairwise data by domain experts. These important and precisely-labeled pairwise data are often highly sensitive in real world (e.g., patients similarity). This paper studies, for the first time, how pairwise information can be leaked to attackers during distance metric learning, and develops differential pairwise privacy (DPP), generalizing the definition of standard differential privacy, for secure metric learning. Unlike traditional differential privacy which only applies to independent samples, thus cannot be used for pairwise data, DPP successfully deals with this problem by reformulating the worst case. Specifically, given the pairwise data, we reveal all the involved correlations among pairs in the constructed undirected graph. DPP is then formalized that defines what kind of DML algorithm is private to preserve pairwise data. After that, a case study employing the contrastive loss is exhibited to clarify the details of implementing a DPP-DML algorithm. Particularly, the sensitivity reduction technique is proposed to enhance the utility of the output distance metric. Experiments both on a toy dataset and benchmarks demonstrate that the proposed scheme achieves pairwise data privacy without compromising the output performance much (Accuracy declines less than 0.01 throughout all benchmark datasets when the privacy budget is set at 4).
READ FULL TEXT