Demystifying Statistical Matching Algorithms for Big Data
Statistical matching is an effective method for estimating causal effects in which treated units are paired with control units with “similar” values of confounding covariates prior to performing estimation. In this way, matching helps isolate the effect of treatment on response from effects due to the confounding covariates. While there are a large number of software packages to perform statistical matching, the algorithms and techniques used to solve statistical matching problems – especially matching without replacement – are not widely understood. In this paper, we describe in detail commonly-used algorithms and techniques for solving statistical matching problems. We focus in particular on the efficiency of these algorithms as the number of observations grow large. We advocate for the further development of statistical matching methods that impose and exploit “sparsity” – by greatly restricting the available matches for a given treated unit – as this may be critical to ensure scalability of matching methods as data sizes grow large.
READ FULL TEXT