Probabilistic Re-aggregation Algorithm [First Draft]
Spatial data about individuals or businesses is often aggregated over polygonal regions to preserve privacy, provide useful insight, and support decision making. Given a particular aggregation of data (say, into local government areas), the re-aggregation problem is to estimate how that same data would aggregate over a different set of polygonal regions (say, electorates) without access to the original unit records. Data61 is developing new re-aggregation algorithms that both estimate confidence intervals for their predictions and use additional related datasets, when available, to improve accuracy. The algorithms improve on the re-aggregation procedure currently used by the ABS, which is applied manually by the data user, is less accurate in validation experiments, and provides only a single best-guess answer. The algorithms are deployed in an accessible web service that automatically learns a model and applies it to user data. This report formulates the re-aggregation problem, describes Data61's new algorithms, and presents preliminary validation experiments.
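To make the problem statement concrete, the following is a minimal sketch of a naive proportional baseline. It is neither Data61's algorithm nor the ABS procedure; it only illustrates the inputs and outputs of re-aggregation. The function name `reaggregate`, the region names, and the overlap fractions are all hypothetical, and the fractions are assumed to be supplied rather than computed from polygon geometry.

```python
def reaggregate(source_totals, overlap_fraction):
    """Redistribute source-region totals onto target regions.

    overlap_fraction[s][t] is the assumed share of source region s
    that falls inside target region t; each row should sum to 1.
    """
    targets = {}
    for source, total in source_totals.items():
        for target, fraction in overlap_fraction[source].items():
            targets[target] = targets.get(target, 0.0) + total * fraction
    return targets


# Hypothetical counts aggregated by local government area (LGA).
lga_counts = {"LGA_A": 1000.0, "LGA_B": 400.0}

# Hypothetical shares of each LGA lying in each electorate.
overlap = {
    "LGA_A": {"Electorate_1": 0.75, "Electorate_2": 0.25},
    "LGA_B": {"Electorate_2": 1.0},
}

estimate = reaggregate(lga_counts, overlap)
print(estimate)
```

A baseline of this kind returns one point estimate per target region and assumes the data is spread evenly within each source region. It provides no confidence intervals and uses no related datasets, which are the two capabilities the abstract attributes to the new algorithms.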