Building Representative Matched Samples with Multi-valued Treatments in Large Observational Studies: Analysis of the Impact of an Earthquake on Educational Attainment
What is the impact of an earthquake on the educational attainment of high school students? In this paper, we address this question using a unique data set and new matching methods. In particular, we use an administrative census of the same students measured before and after the 2010 Chilean earthquake. We propose and analyze new matching methods that overcome three challenges of existing approaches. These methods allow us to: (i) handle multi-valued treatments without estimating the generalized propensity score; (ii) build self-weighted matched samples that are representative of a target population by design; and (iii) work with much larger data sets than other similar approaches. To this end, we use a linear-sized mixed integer programming formulation for matching with distributional covariate balance. We formally show that this formulation is more effective than alternative quadratic-sized formulations, as its reduction in size does not affect its strength from the standpoint of its linear programming relaxation. With this formulation, we can handle data sets with hundreds of thousands of observations in a couple of minutes. Using these methods, we show that while increasing levels of exposure to the earthquake have a negative impact on school attendance, there is no effect on university admission test scores.
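As a rough illustration of what "linear-sized" means here, the sketch below uses one keep/drop decision per student rather than one per pair of students. It selects a fixed number of students from each exposure level so that a covariate's distribution in each selected group is as close as possible to a target-population distribution. This is not the paper's formulation or code: the toy data, the `target` proportions, the group size `m`, and the helper names are all hypothetical, and exhaustive search stands in for a mixed integer programming solver.

```python
from itertools import combinations

# Hypothetical toy data: (exposure_level, covariate_category) per student.
units = [
    (0, "low"), (0, "low"), (0, "high"), (0, "high"), (0, "high"),
    (1, "low"), (1, "high"), (1, "high"), (1, "low"),
    (2, "low"), (2, "low"), (2, "low"), (2, "high"),
]

# Assumed covariate proportions in the target population.
target = {"low": 0.5, "high": 0.5}

# Assumed number of students to keep at each exposure level.
m = 2


def imbalance(chosen):
    """Total absolute gap between selected category counts and target counts."""
    return sum(
        abs(sum(1 for i in chosen if units[i][1] == cat) - m * share)
        for cat, share in target.items()
    )


def select(level):
    """Pick m units at this exposure level with the smallest imbalance.

    Each unit is either kept or dropped (one 0/1 decision per unit);
    brute force over subsets replaces the integer program in this toy.
    """
    idx = [i for i, (t, _) in enumerate(units) if t == level]
    return min(combinations(idx, m), key=imbalance)


levels = sorted({t for t, _ in units})
matched = {level: select(level) for level in levels}

for level, chosen in matched.items():
    print(level, [units[i] for i in chosen], imbalance(chosen))
```

Because every exposure level is balanced against the same target distribution, the selected groups are balanced against one another as well, with no need to enumerate pairs of students. The exhaustive search only works on toy inputs like this one; data at the scale described in the abstract would need an actual integer programming solver.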