Bayesian Pseudo Posterior Mechanism under Differential Privacy
We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic datasets with a differential privacy (DP) guarantee from any proposed synthesizer model. The pseudo posterior mechanism employs a record-indexed, risk-based weight vector with weights in [0, 1] to surgically downweight high-risk records in the generation and release of record-level synthetic data. The pseudo posterior synthesizer constructs these weights from Lipschitz bounds on the log-likelihood of each data record. This provides a practical, general formulation for using weights based on record-level sensitivities, and we show that it achieves dramatic improvements in the DP guarantee compared with the unweighted, non-private synthesizer. We compute a local sensitivity specific to our Consumer Expenditure Surveys (CE) dataset for family income, published by the U.S. Bureau of Labor Statistics, and reveal mild conditions that guarantee its contraction to a global sensitivity result over all x ∈ X. We show that our pseudo posterior mechanism preserves utility better than the exponential mechanism (EM) estimated on the same non-private synthesizer. Unlike recent approaches that rely on naturally bounded utility functions under the EM, our results may be applied to any synthesizing mechanism envisioned by the data analyst in a computationally tractable way that only involves estimating a pseudo posterior distribution for θ.
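To make the weighting idea concrete, here is a minimal sketch on a toy normal model with known σ, not the CE data or the authors' synthesizer. It assumes: (1) the per-record Lipschitz bound is approximated as L_i = max over posterior draws of |log p(x_i | θ)|; (2) a hypothetical weighting rule α_i = clip(c(1 − L̃_i) + g, 0, 1), where L̃_i is L_i rescaled to [0, 1] and `c`, `g` are illustrative tuning constants; (3) the pseudo posterior multiplies each record's log-likelihood by α_i. All function names are hypothetical. Under the usual exponential-mechanism argument, with the weights held fixed and neighboring databases differing by one added or removed record, a bound Δ_α on |α_i log p(x_i | θ)| corresponds to ε = 2Δ_α. Here Δ_α is only estimated over draws, so it is an empirical stand-in for the supremum over θ.

```python
import numpy as np


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2); broadcasts x against mu."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def posterior_draws_mu(x, weights, sigma, m0=0.0, s0=10.0, n_draws=2000, rng=None):
    """Draws of mu from a (pseudo) posterior for a normal model with known sigma.

    Each record's log-likelihood is multiplied by its weight; for this conjugate
    normal prior that simply rescales the record's contribution to the precision.
    """
    rng = np.random.default_rng() if rng is None else rng
    prec = 1.0 / s0**2 + np.sum(weights) / sigma**2
    mean = (m0 / s0**2 + np.sum(weights * x) / sigma**2) / prec
    return rng.normal(mean, 1.0 / np.sqrt(prec), size=n_draws)


def lipschitz_bounds(x, mu_draws, sigma):
    """Per-record bound L_i = max over draws of |log p(x_i | mu)|."""
    ll = log_normal_pdf(x[:, None], mu_draws[None, :], sigma)  # shape (n, S)
    return np.max(np.abs(ll), axis=1)


def risk_weights(L, c=1.0, g=0.0):
    """Hypothetical rule: rescale L to [0, 1]; larger L (higher risk) -> smaller weight."""
    L_scaled = (L - L.min()) / (L.max() - L.min())
    return np.clip(c * (1.0 - L_scaled) + g, 0.0, 1.0)


rng = np.random.default_rng(0)
sigma = 1.0
# Toy data (illustrative only): typical values plus three outlying records.
x = np.concatenate([rng.normal(0.0, sigma, 200), np.array([6.0, -7.0, 8.5])])

# 1) Unweighted posterior and per-record Lipschitz bounds.
mu_unw = posterior_draws_mu(x, np.ones_like(x), sigma, rng=rng)
L = lipschitz_bounds(x, mu_unw, sigma)
delta_unweighted = L.max()

# 2) Risk-based weights and draws from the pseudo posterior.
alpha = risk_weights(L)
mu_pp = posterior_draws_mu(x, alpha, sigma, rng=rng)

# 3) Weighted bound Delta_alpha over the pseudo posterior draws.
weighted_ll = alpha[:, None] * np.abs(log_normal_pdf(x[:, None], mu_pp[None, :], sigma))
delta_alpha = weighted_ll.max()
eps_alpha = 2.0 * delta_alpha  # EM-style relation, assuming fixed weights

# 4) Synthetic records: one pseudo posterior draw of mu, then N(mu, sigma^2).
mu_star = rng.choice(mu_pp)
x_synth = rng.normal(mu_star, sigma, size=x.size)

print(f"unweighted bound: {delta_unweighted:.2f}, weighted bound: {delta_alpha:.2f}")
```

In this toy run the most outlying record receives weight 0, so the weighted bound, and hence the implied ε, is much smaller than the unweighted one. The exact numbers depend on the assumed weighting rule and model, not on anything reported in the paper.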