Machine learning algorithms to infer trait matching and predict species interactions in ecological networks
Ecologists have long suspected that species are more likely to interact if their traits match in a particular way. For example, a pollination interaction may be particularly likely if the proportions of a bee's tongue match flower shape in a beneficial way. Empirical evidence for trait matching, however, varies significantly in strength among different types of ecological networks. Here, we show that ambiguity among empirical trait matching studies may have arisen at least in parts from using overly simple statistical models. Using simulated and real data, we contrast conventional regression models with Machine Learning (ML) models (Random Forest, Boosted Regression Trees, Deep Neural Networks, Convolutional Neural Networks, Support Vector Machines, naive Bayes, and k-Nearest-Neighbor), testing their ability to predict species interactions based on traits, and infer trait combinations causally responsible for species interactions. We find that the best ML models can successfully predict species interactions in plant-pollinator networks (up to 0.93 AUC) and outperform conventional regression models. Our results also demonstrate that ML models can better identify the causally responsible trait matching combinations than GLMs. In two case studies, the best ML models could successfully predict species interactions in a global plant-pollinator database and infer ecologically plausible trait matching rules for a plant-hummingbird network from Costa Rica, without any prior assumptions about the system. We conclude that flexible ML models offer many advantages over traditional regression models for understanding interaction networks. We anticipate that these results extrapolate to other network types, such as trophic or competitive networks. More generally, our results highlight the potential of ML and artificial intelligence for inference beyond standard tasks such as pattern recognition.
READ FULL TEXT