Hidden variables unseen by Random Forests
Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions, which the conventional CART criterion struggles to capture during tree construction. We argue that alternative partitioning schemes can improve the identification of these interactions. Furthermore, we extend recent theory of Random Forests based on the notion of impurity decrease by considering probabilistic impurity decrease conditions. Within this framework, we establish consistency of a new algorithm, called 'Random Split Random Forest', tailored to function classes involving pure interactions. In a simulation study, we validate that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role.
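To make the difficulty with pure interactions concrete, here is a minimal sketch, assuming an XOR-type target on two independent uniform features. The target, the sample size, and the helper name `best_cart_decrease` are illustrative assumptions and are not taken from the paper's examples or simulation study. The sketch computes the largest variance (impurity) decrease that a single axis-aligned split can achieve, first for the pure interaction and then for a target that depends on one feature alone.

```python
import numpy as np


def best_cart_decrease(x, y):
    """Largest empirical variance decrease from one axis-aligned split on x.

    Hypothetical helper for illustration; assumes the values in x are distinct.
    """
    order = np.argsort(x)
    y_sorted = y[order]
    n = len(y_sorted)
    csum = np.cumsum(y_sorted)
    total = csum[-1]
    n_left = np.arange(1, n)          # sizes of the left child for each cut
    n_right = n - n_left
    left_sum = csum[:-1]
    right_sum = total - left_sum
    # Variance decrease = between-child sum of squares / n - (overall mean)^2
    between = left_sum ** 2 / n_left + right_sum ** 2 / n_right
    decrease = between / n - (total / n) ** 2
    return float(decrease.max())


rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=(n, 2))

# Pure interaction (assumed example): the sign of the product of centered
# features, so neither feature carries any marginal signal on its own.
y_pure = np.sign((X[:, 0] - 0.5) * (X[:, 1] - 0.5))

# Comparison target with a main effect in the first feature only.
y_additive = np.sign(X[:, 0] - 0.5)

pure_decrease = max(best_cart_decrease(X[:, j], y_pure) for j in range(2))
additive_decrease = best_cart_decrease(X[:, 0], y_additive)

print(f"best single-split decrease, pure interaction: {pure_decrease:.4f}")
print(f"best single-split decrease, main effect:      {additive_decrease:.4f}")
```

Both targets have variance close to 1, yet the best first split removes almost none of it in the pure-interaction case, so a greedy impurity-based criterion has little to guide its choice of split at the root.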