Averaging causal estimators in high dimensions
There has been increasing interest in recent years in the development of approaches to estimate causal effects when the number of potential confounders is prohibitively large. This growth in interest has led to a number of potential estimators one could use in this setting. Each of these estimators has different operating characteristics, and it is unlikely that one estimator will outperform all others across all possible scenarios. Coupling this with the fact that an analyst can never know which approach is best for their particular data, we propose a synthetic estimator that averages over a set of candidate estimators. Averaging is widely used in statistics for problems such as prediction, where there are many possible models, and averaging can improve performance and increase robustness to using incorrect models. We show that these ideas carry over into the estimation of causal effects in high-dimensional scenarios. We show theoretically that averaging provides robustness against choosing a bad model, and show empirically via simulation that the averaging estimator performs quite well, and in most cases nearly as well as the best among all possible candidate estimators. Finally, we illustrate these ideas in an environmental wide association study and see that averaging provides the largest benefit in the more difficult scenarios that have large numbers of confounders.
READ FULL TEXT