Two-Phase Data Synthesis for Income: An Application to the NHIS

06/02/2020
by Kevin Ros, et al.

We propose a two-phase process for synthesizing income, a sensitive variable that is usually highly skewed and contains a number of reported zeros. We consider two forms of a continuous income variable: a binary form, which is modeled and synthesized in phase 1; and a non-negative continuous form, which is modeled and synthesized in phase 2. We propose Bayesian synthesis models for the two-phase process; other synthesis models, such as classification and regression trees (CART), can also be readily implemented. We demonstrate our methods on a sample from the National Health Interview Survey (NHIS). The utility and risk profiles of the generated synthetic datasets are evaluated and compared with results from a single-phase synthesis process.
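
To make the two-phase idea concrete, the following is a minimal sketch, not the authors' models. It assumes the binary form is an indicator of positive income, uses a Beta posterior under a uniform prior for phase 1, and a log-normal fit for phase 2. The function name `synthesize_income`, the grouping variable `group`, and both model choices are illustrative assumptions.

```python
import numpy as np


def synthesize_income(income, group, rng):
    """Two-phase sketch: synthesize a zero/positive indicator, then positive amounts.

    Assumptions (not from the paper): a Beta-Binomial model per group for phase 1
    and a log-normal model per group for phase 2.
    """
    income = np.asarray(income, dtype=float)
    group = np.asarray(group)
    synthetic = np.zeros_like(income)

    for g in np.unique(group):
        idx = np.flatnonzero(group == g)
        positive = income[idx][income[idx] > 0]
        n, n_pos = idx.size, positive.size

        # Phase 1: draw P(income > 0) from its Beta posterior (uniform prior),
        # then synthesize the binary indicator for every record in the group.
        p = rng.beta(1 + n_pos, 1 + n - n_pos)
        is_pos = rng.random(n) < p

        # Phase 2: for records synthesized as positive, draw amounts from a
        # log-normal fitted to the observed positive incomes in the group.
        # Groups with fewer than two positive values are left at zero here.
        if n_pos >= 2 and is_pos.any():
            logs = np.log(positive)
            mu, sigma = logs.mean(), logs.std(ddof=1)
            synthetic[idx[is_pos]] = rng.lognormal(mu, sigma, size=int(is_pos.sum()))

    return synthetic


# Hypothetical toy data: incomes with reported zeros and a two-level predictor.
rng = np.random.default_rng(0)
income = np.array([0, 0, 25000, 40000, 0, 90000, 120000, 60000], dtype=float)
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
synthetic_income = synthesize_income(income, group, rng)
```

Splitting the variable this way lets the zeros be handled by a model for binary outcomes, while the skewed positive part is modeled on the log scale. A single-phase approach would have to capture both features with one model.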
