Generalised Boosted Forests
This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential-family response with 𝔼[Y|X] = g⁻¹(f(X)), our goal is to estimate f. We start with an MLE-type estimate in the link space and define generalised residuals from it. We use these residuals and corresponding weights to fit a base random forest, and then repeat the same step to obtain a boost random forest. We call the sum of these three estimators a generalised boosted forest. We show with simulated and real data that both random forest steps reduce test-set negative log-likelihood, which we treat as our primary metric. We also provide a variance estimator that can be obtained at the same computational cost as the original estimate. Empirical experiments on real-world data and simulations demonstrate that the method can effectively reduce bias and that confidence-interval coverage is conservative in the bulk of the covariate distribution.
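The three-stage estimator described above can be sketched in Python. This is a minimal illustration under stated assumptions and not the authors' implementation. It assumes a Poisson response with the canonical log link. It also assumes a Poisson GLM fitted by IRLS as the MLE-type stage and scikit-learn's `RandomForestRegressor` (with `sample_weight`) as the forest learner. Generalised residuals and weights come from one Fisher-scoring step in link space, r = (y − μ)/μ with weight w = μ. Out-of-bag fits from the base forest are used when forming the boost-stage residuals. The names `GeneralisedBoostedForestSketch`, `poisson_glm_irls`, `link_residuals` and `mean_loglik`, along with all hyperparameters, are hypothetical. The variance estimator is not sketched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def poisson_glm_irls(X, y, n_iter=25):
    """Assumed stage 0: Poisson GLM (log link) fitted by IRLS,
    i.e. an MLE-type estimate of f in the link space."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only MLE
    for _ in range(n_iter):
        eta = X1 @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        XtW = X1.T * mu  # X^T W with W = diag(mu) for the log link
        beta = np.linalg.solve(XtW @ X1, XtW @ z)
    return beta


def link_residuals(y, eta):
    """Generalised residuals and weights for Poisson/log link (assumed form):
    one Fisher-scoring step in link space, r = (y - mu)/mu, w = mu."""
    mu = np.exp(eta)
    return (y - mu) / mu, mu


def fit_forest(X, r, w, seed):
    # Hypothetical settings; oob_score=True exposes oob_prediction_.
    rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=10,
                               oob_score=True, random_state=seed)
    rf.fit(X, r, sample_weight=w)
    return rf


class GeneralisedBoostedForestSketch:
    """Hypothetical sketch: link-space fit eta(x) = f0(x) + F1(x) + F2(x)."""

    def fit(self, X, y):
        self.beta = poisson_glm_irls(X, y)
        eta = self._glm(X)
        # Stage 1: base forest on generalised residuals with weights.
        r, w = link_residuals(y, eta)
        self.base = fit_forest(X, r, w, seed=0)
        # Use out-of-bag fits so stage 2 does not see in-bag overfitting.
        eta = eta + self.base.oob_prediction_
        # Stage 2: repeat the same step to obtain the boost forest.
        r, w = link_residuals(y, eta)
        self.boost = fit_forest(X, r, w, seed=1)
        return self

    def _glm(self, X):
        return self.beta[0] + X @ self.beta[1:]

    def predict_link(self, X, stages=2):
        """Sum of the first `stages` forest corrections on top of the GLM."""
        eta = self._glm(X)
        if stages >= 1:
            eta = eta + self.base.predict(X)
        if stages >= 2:
            eta = eta + self.boost.predict(X)
        return eta


def mean_loglik(y, eta):
    """Mean Poisson log-likelihood, dropping the constant -log(y!)."""
    return float(np.mean(y * eta - np.exp(eta)))
```

With weights w = μ, the weighted mean of r inside a leaf equals Σ(y − μ)/Σμ. That is a Newton step on the Poisson log-likelihood restricted to that leaf, which is why each forest stage acts as a local likelihood correction in link space. A usage pattern is to fit on training data and then compare `mean_loglik(y_test, model.predict_link(X_test, stages=k))` for k = 0, 1, 2.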