Perturbations and Causality in Gaussian Models
Causal inference from observational data alone is known to be a very challenging problem. Without additional strong assumptions, it is typically possible only given access to data arising from perturbations of the underlying system. To identify causal relations among a collection of covariates and a target or response variable, existing procedures rely on at least one of the following assumptions: i) the target variable remains unperturbed, ii) the hidden variables remain unperturbed, and iii) the hidden effects are dense. In this paper, we consider a perturbation model for interventional data (involving soft and hard interventions) over a collection of Gaussian variables that does not satisfy any of these conditions and can be viewed as a mixed-effects linear structural causal model. We propose a maximum-likelihood estimator, dubbed DirectLikelihood, that exploits system-wide invariances to uniquely identify the population causal structure from perturbation data. Our theoretical guarantees also carry over to settings where the variables are non-Gaussian but are generated according to a linear structural causal model. Further, we demonstrate that the population causal parameters are solutions to a worst-case risk with respect to distributional shifts from a certain perturbation class. We illustrate the utility of our perturbation model and the DirectLikelihood estimator on synthetic data as well as real data involving protein expressions.