Scalable Secure Computation of Statistical Functions with Applications to k-Nearest Neighbors
Given a set S of n d-dimensional points, the k-nearest neighbors (KNN) problem is to quickly find the k points in S that are nearest to a query point q. The KNN problem has applications in machine learning, for classification and regression, as well as in search. The secure version of KNN, where either q or S is encrypted, has applications such as providing services over sensitive data (e.g., medical or location data). In this work we present the first scalable and efficient algorithm for solving KNN with Fully Homomorphic Encryption (FHE) that is realized by a polynomial whose degree is independent of n, the number of points. We implemented our algorithm in an open-source library based on HElib's implementation of the Brakerski-Gentry-Vaikuntanathan (BGV) FHE scheme, and ran experiments on MIT's OpenStack cloud. Our experiments show that, given a query point q, we can find the set of 20 nearest points out of more than 1000 points in less than an hour. Our result introduces a statistical coreset, a data-summarization technique that allows statistical functions, such as moments, to be computed efficiently and scalably. As a central tool, we design a new coin-toss technique which we use to build the coreset. This coin-toss technique and the computation of statistical functions may be of independent interest.
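For reference, the plaintext KNN problem stated at the start of the abstract can be written as the following brute-force computation. This is a minimal sketch of the unencrypted baseline only, not the paper's FHE-based construction; the Euclidean metric, the `Point` alias, and the helper names are assumptions made for illustration.

```cpp
// Plaintext brute-force KNN: find the indices of the k points in S nearest to q.
// Illustration of the problem definition only; the paper solves the *secure*
// version of this task under FHE.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

using Point = std::vector<double>;  // a d-dimensional point

// Squared Euclidean distance between two d-dimensional points (assumed metric).
double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        s += diff * diff;
    }
    return s;
}

// Indices of the k points in S closest to the query q.
std::vector<std::size_t> knn(const std::vector<Point>& S, const Point& q, std::size_t k) {
    std::vector<std::size_t> idx(S.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    // Order only the first k positions by distance to q.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return sq_dist(S[a], q) < sq_dist(S[b], q);
                      });
    idx.resize(k);
    return idx;
}

int main() {
    std::vector<Point> S = {{0, 0}, {1, 1}, {5, 5}, {2, 2}};
    Point q = {0.5, 0.5};
    for (std::size_t i : knn(S, q, 2)) std::cout << i << '\n';  // prints 0 and 1
}
```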