Explicit natural gradient updates for Cholesky factor in Gaussian variational approximation
Stochastic gradient methods have enabled variational inference for high-dimensional models and large data. However, the steepest ascent direction in the parameter space of a statistical model is given not by the commonly used Euclidean gradient, but the natural gradient which premultiplies the Euclidean gradient by the inverted Fisher information matrix. Use of natural gradients can improve convergence significantly, but inverting the Fisher information matrix is daunting in high-dimensions. In Gaussian variational approximation, natural gradient updates of the natural parameters (expressed in terms of the mean and precision matrix) of the Gaussian distribution can be derived analytically, but do not ensure the precision matrix remains positive definite. To tackle this issue, we consider Cholesky decomposition of the covariance or precision matrix and derive explicit natural gradient updates of the Cholesky factor by finding the inverse of the Fisher information matrix analytically. Natural gradient updates of the Cholesky factor as compared to natural parameters, depend only on the first instead of the second derivative of the log posterior density and reduces computational cost. Sparsity constraints incorporating posterior independence structure can be imposed by fixing relevant entries in the Cholesky factor to zero.
READ FULL TEXT