Reconstruction of the distribution of sensitive data under free-will privacy
The local privacy mechanisms, such as k-RR, RAPPOR, and the geo-indistinguishability ones, have become quite popular thanks to the fact that the obfuscation can be effectuated at the users end, thus avoiding the need of a trusted third party. Another important advantage is that each data point is sanitized independently from the others, and therefore different users may use different levels of obfuscation depending on their privacy requirements, or they may even use entirely different mechanisms depending on the services they are trading their data for. A challenging requirement in this setting is to construct the original distribution on the users sensitive data from their noisy versions. Existing techniques can only estimate that distribution separately on each obfuscation schema and corresponding noisy data subset. But the smaller are the subsets, the more imprecise the estimations are. In this paper we study how to avoid the subsets-fractioning problem when combining local privacy mechanisms, thus recovering an optimal utility. We focus on the estimation of the original distribution, and on the two main methods to estimate it: the matrix-inversion method and the iterative Bayes update. We consider various cases of combination of local privacy mechanisms, and compare the flexibility and the performance of the two methods.
READ FULL TEXT