Computing with R-INLA: Accuracy and reproducibility with implications for the analysis of COVID-19 data
The statistical methods used to analyze medical data are becoming more complex. Novel statistical methods increasingly rely on simulation studies to establish their validity. Such assessments typically appear in statistical or computational journals, and the methodology is later introduced to the medical community through tutorials. This can be problematic if applied researchers use the methodologies in settings that have not been evaluated. In this paper, we explore a case study of one such method that has become popular in the analysis of coronavirus disease 2019 (COVID-19) data. The integrated nested Laplace approximation (INLA) method, as implemented in the R-INLA package, approximates the marginal posterior distributions of target parameters that would have been obtained from a fully Bayesian analysis. We seek to answer an important question: Does existing research on the accuracy of INLA's approximations support how researchers are currently using it to analyze COVID-19 data? We identify three limitations of existing work assessing INLA's accuracy: 1) inconsistent definitions of accuracy, 2) a lack of studies validating how researchers are actually using INLA, and 3) a lack of research into the reproducibility of INLA's output. We explore the practical impact of each limitation with simulation studies based on models and data used in COVID-19 research. Our results suggest that existing methods of assessing the accuracy of the INLA technique may not support how COVID-19 researchers are using it. Guided in part by our results, we propose a set of minimum guidelines for researchers using statistical methodologies primarily validated through simulation studies.