Differentially Private Stream Processing at Scale

03/31/2023

We design what is, to the best of our knowledge, the first differentially private (DP) stream processing system at scale. Our system, Differential Privacy SQL Pipelines (DP-SQLP), uses a streaming framework similar to Spark streaming and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systems advances. Specifically, we (i) design a novel DP key selection algorithm that can operate on an unbounded set of possible keys and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the length of the stream. We empirically demonstrate the system's efficacy, obtaining at least a 16× reduction in error relative to the meaningful baselines we consider.
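To make point (iii) concrete, the following is a minimal sketch of the classic binary tree mechanism for continual counting, one standard technique from the DP continual observation literature. The abstract does not say which variant DP-SQLP uses, so this should be read as an illustration of the general idea for a single key, not as the system's implementation. The function names, the Laplace noise choice, and the assumption that each time step's total changes by at most 1 when one user's data changes are all hypothetical.

```python
import math
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0.0:
        return 0.0  # degenerate case, used only to check correctness without noise
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_prefix_sums(stream, epsilon, rng=None):
    """Release a noisy running count after every element of `stream`.

    Assumption (not from the source): changing one user's data changes a
    single element of `stream` by at most 1. Each element is then included
    in at most `levels` noisy partial sums, so Laplace noise with scale
    levels / epsilon on every partial sum gives epsilon-DP for the whole
    sequence of releases.
    """
    rng = rng or random.Random()
    total_steps = len(stream)
    levels = max(1, total_steps.bit_length())
    scale = levels / epsilon
    exact = [0.0] * levels  # exact partial sum held at each tree level
    noisy = [0.0] * levels  # noisy copy that is actually released
    released = []
    for t in range(1, total_steps + 1):
        i = (t & -t).bit_length() - 1  # index of the lowest set bit of t
        # Merge all lower levels plus the new element into level i.
        exact[i] = sum(exact[:i]) + stream[t - 1]
        for j in range(i):
            exact[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = exact[i] + laplace(scale, rng)
        # The prefix [1, t] is covered by one node per set bit of t.
        released.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return released
```

Each released count sums at most log2(T) + 1 noisy nodes, so the error grows polylogarithmically in the stream length T rather than linearly, which is what makes repeated releases over a long stream feasible. A continual histogram could, under the same assumptions, run one such counter per selected key.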
