Covariate Selection for Generalizing Experimental Results
Researchers often want to generalize causal effects estimated in an experiment to a target population. However, analysts are frequently constrained by the covariate information available, which limits the applicability of existing approaches that assume rich covariate data from both the experimental and the population samples. As a concrete context, we focus on a large-scale development program in Uganda, the Youth Opportunities Program (YOP). Although more than 40 pre-treatment covariates are available in the experiment, only 8 of them were also measured in the target population. To tackle this common data constraint, we propose a method to estimate a separating set, a set of variables affecting both the sampling mechanism and treatment effect heterogeneity, and show that the population average treatment effect (PATE) can be identified by adjusting for estimated separating sets. Our approach has two advantages. First, our algorithm requires a rich set of covariates only in the experimental data, not in the target population. Second, the algorithm can estimate separating sets under researcher-specified constraints on which variables are measured in the population. Using the YOP experiment, we find that the proposed algorithm allows estimation of the PATE in situations where conventional methods fail because their data requirements are not met.
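To make "adjusting for a separating set" concrete, the following is a minimal sketch of one standard adjustment, post-stratification, under the assumption that a separating set is already known and consists of a single discrete variable measured in both samples. It is not the paper's algorithm for estimating separating sets; the function name, the `age_group` strata, and the toy numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def poststratified_pate(experiment, population_strata):
    """Post-stratification estimate of the PATE.

    experiment: iterable of (stratum, treated, outcome) with treated in {0, 1}.
    population_strata: iterable of stratum labels observed in the target population.
    Assumes the stratum variable is a valid (known) separating set.
    """
    # Collect experimental outcomes by stratum and treatment arm.
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in experiment:
        outcomes[stratum][treated].append(outcome)

    # Count how often each stratum appears in the target population.
    counts = defaultdict(int)
    n_pop = 0
    for stratum in population_strata:
        counts[stratum] += 1
        n_pop += 1

    pate = 0.0
    for stratum, n in counts.items():
        arms = outcomes.get(stratum)
        if arms is None or not arms[0] or not arms[1]:
            # Without both arms in a stratum, its effect cannot be estimated.
            raise ValueError(f"no treated/control overlap in stratum {stratum!r}")
        # Within-stratum difference in means (valid under randomization).
        effect = sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
        # Reweight by the stratum's share of the target population.
        pate += (n / n_pop) * effect
    return pate


# Toy data (hypothetical): effects differ by age_group, and the population
# has a different age_group mix than the experiment.
experiment = [
    ("young", 1, 10.0), ("young", 0, 4.0), ("young", 1, 12.0), ("young", 0, 6.0),
    ("old", 1, 5.0), ("old", 0, 4.0), ("old", 1, 7.0), ("old", 0, 6.0),
]
population = ["young", "old", "old", "old"]

treated = [y for _, t, y in experiment if t == 1]
control = [y for _, t, y in experiment if t == 0]
sate_estimate = sum(treated) / len(treated) - sum(control) / len(control)
pate_estimate = poststratified_pate(experiment, population)
print(sate_estimate, pate_estimate)
```

In this toy example the unadjusted experimental estimate differs from the post-stratified one because the stratum with the larger effect is over-represented in the experiment relative to the population. Note that only the stratum variable needs to be observed in the population sample.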