Combining covariance tapering and lasso driven low rank decomposition for the kriging of large spatial datasets
Large spatial datasets are becoming ubiquitous in the environmental sciences, driven by the explosion in the amount of data produced by sensors that monitor and measure the Earth system. The geostatistical analysis of these data therefore requires suitable methods. Richer datasets allow more complex modeling, but they may also rule out classical techniques. Indeed, the kriging predictor is not straightforward to compute, as it requires inverting the covariance matrix of the data. The challenge in handling such datasets is therefore to extract as much of the information they contain as possible while keeping the associated inference and prediction algorithms numerically tractable. The approaches developed in the literature to address this problem fall into two families, both of which aim to make the inversion of the covariance matrix computationally feasible. The covariance tapering approach circumvents the problem by enforcing sparsity of the covariance matrix, so that it can be inverted in a reasonable computation time. The second approach assumes a low rank representation of the covariance function. Both approaches have drawbacks, and we propose a way to combine them and benefit from the advantages of each. The covariance model is assumed to have a low rank plus sparse form. The choice of the basis functions supporting the low rank component is data driven and is achieved through a selection procedure, which alleviates the computational burden of the low rank part. This model can be expressed as a spatial random effects model, and the parameters are estimated through a step by step approach that treats each scale separately. The resulting model can account for second order non stationarity and handle large volumes of data.
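To make the low rank plus sparse form concrete, the following is a minimal sketch, not the authors' implementation. It assumes a zero-mean field observed in one dimension, a covariance of the form Sigma = B K B^T + S, Gaussian basis functions for B, and a Wendland-tapered exponential covariance plus a nugget for S. All function names, parameter values, and the simulated data are hypothetical, and the data-driven selection of basis functions described in the abstract is not shown. Dense NumPy arrays stand in for sparse storage here; the point is only that the Woodbury identity reduces the large inversion to solves with S and one small r x r system.

```python
import numpy as np

def wendland_taper(h, theta):
    """Wendland-1 taper: compactly supported correlation, zero beyond range theta."""
    r = np.clip(h / theta, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def gaussian_basis(x, centers, width):
    """Hypothetical basis functions for the low rank part (one column per center)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def tapered_cov(h, sill=0.5, scale=0.5, theta=1.0):
    """Exponential covariance multiplied by the taper (small-scale component)."""
    return sill * np.exp(-h / scale) * wendland_taper(h, theta)

def krige_low_rank_plus_sparse(B, K, S, B0, s0, z):
    """Zero-mean kriging predictor with Sigma = B K B^T + S, using Woodbury."""
    Sinv_z = np.linalg.solve(S, z)          # S^{-1} z
    Sinv_B = np.linalg.solve(S, B)          # S^{-1} B
    M = np.linalg.inv(K) + B.T @ Sinv_B     # small r x r matrix
    # Sigma^{-1} z = S^{-1} z - S^{-1} B M^{-1} B^T S^{-1} z
    w = Sinv_z - Sinv_B @ np.linalg.solve(M, B.T @ Sinv_z)
    c0 = B @ K @ B0.T + s0                  # covariances between data and targets
    return c0.T @ w

# Illustrative setup (all values are assumptions, not taken from the paper).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))    # observation locations
x0 = np.linspace(0.0, 10.0, 5)              # prediction locations
centers = np.linspace(0.0, 10.0, 8)         # r = 8 basis functions

B = gaussian_basis(x, centers, 1.5)
B0 = gaussian_basis(x0, centers, 1.5)
K = np.eye(len(centers))                    # covariance of the random effects

S = tapered_cov(np.abs(x[:, None] - x[None, :])) + 0.1 * np.eye(len(x))  # plus nugget
s0 = tapered_cov(np.abs(x[:, None] - x0[None, :]))

z = rng.standard_normal(len(x))             # placeholder data
pred = krige_low_rank_plus_sparse(B, K, S, B0, s0, z)

# Reference: direct inversion of the full covariance matrix.
Sigma = B @ K @ B.T + S
pred_direct = (B @ K @ B0.T + s0).T @ np.linalg.solve(Sigma, z)
```

In this sketch `pred` and `pred_direct` agree up to floating-point error. With genuinely sparse storage for S, the solves against S would be handled by a sparse factorization, which is where the tapering pays off.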