Random Spatial Forests
A fundamental problem in environmental epidemiology studies of the association between air pollution exposure and health outcomes is determining exposure levels for individuals in a cohort study. Because measurements are not made at each study participant's place of residence, individual-specific exposure levels are instead estimated from observations at regulatory monitors. In recent years, it has become common to use random forests and other statistical learning techniques to model air pollution exposure at unobserved sites. However, these methods do not exploit the spatial structure of the data. We propose a computationally efficient algorithm for building regression trees that allow for spatial correlation, and we use these trees to construct random spatial forests. Simulations show that our method outperforms existing approaches on spatially indexed data, and we demonstrate its improved accuracy on elemental carbon, organic carbon, silicon, and sulfur measurements across the continental United States.
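The abstract does not describe how spatial correlation enters tree construction. One way to picture "regression trees allowing for spatial correlation" is a split criterion that scores each candidate split by a generalized-least-squares (GLS) loss rather than the ordinary sum of squared errors. The sketch below is a hypothetical illustration of that idea, not the authors' algorithm. It assumes an exponential covariance model with known parameters (`sill`, `range_`, `nugget`), and it inverts the full covariance matrix by brute force, so it is not the computationally efficient procedure the abstract refers to. All function names are hypothetical.

```python
import numpy as np


def exponential_cov(coords, sill=1.0, range_=1.0, nugget=0.1):
    """Assumed exponential covariance: sill * exp(-d / range_) plus a nugget on the diagonal."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return sill * np.exp(-dist / range_) + nugget * np.eye(len(coords))


def gls_split_loss(y, left_mask, cov_inv):
    """GLS loss r' Sigma^{-1} r of a two-leaf (piecewise-constant) fit."""
    left_mask = np.asarray(left_mask, dtype=bool)
    X = np.column_stack([left_mask, ~left_mask]).astype(float)  # leaf indicator columns
    xtw = X.T @ cov_inv
    beta = np.linalg.solve(xtw @ X, xtw @ y)  # GLS estimates of the two leaf means
    resid = y - X @ beta
    return float(resid @ cov_inv @ resid)


def best_spatial_split(x, y, coords, **cov_params):
    """Scan thresholds on feature x; return (threshold, loss) minimizing the GLS loss."""
    cov_inv = np.linalg.inv(exponential_cov(coords, **cov_params))
    best_t, best_loss = None, np.inf
    for t in np.unique(x)[:-1]:  # skip the maximum so both leaves are non-empty
        loss = gls_split_loss(y, x <= t, cov_inv)
        if loss < best_loss:
            best_t, best_loss = float(t), loss
    return best_t, best_loss


# Usage on a hypothetical 6 x 6 grid of monitor locations with a step in the mean.
coords = np.array([[i, j] for i in range(6) for j in range(6)], dtype=float)
x = coords[:, 0]
y = np.where(x > 2.5, 2.0, 0.0)
threshold, loss = best_spatial_split(x, y, coords, sill=1.0, range_=2.0, nugget=0.1)
print(threshold, loss)
```

When `cov_inv` is the identity matrix, `gls_split_loss` reduces to the usual within-leaf sum of squared errors, so the ordinary regression-tree criterion is the special case of uncorrelated errors. A forest could then be built by growing many such trees on bootstrap samples, but that step, like the covariance model itself, is an assumption of this sketch.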