Generalised linear models for prognosis and intervention: Theory, practice, and implications for machine learning
In health research, machine learning (ML) is often hailed as the new frontier of data analytics which, combined with big data, will purportedly revolutionise the delivery of healthcare and ultimately lead to more informed public health policy and clinical decision-making. However, much of the promise of ML is predicated on prediction, which is fundamentally distinct from causal inference, yet the two concepts are often conflated in practice. We briefly consider the sources of this conflation, and its implications for modelling practices and subsequent interpretation, in the context of generalised linear models (GLMs). We then consider the implications for ML methods (which are typically applied to prediction tasks), and offer lessons for researchers seeking to use ML for both prediction and causal inference. Our primary aim is to highlight the key differences between models for prediction and models for causal inference, in order to encourage the critical and transparent application of ML to problems in health research.