High-Dimensional Gaussian Process Inference with Derivatives
Although it is widely known that Gaussian processes can be conditioned on observations of the gradient, this functionality is of limited use due to the prohibitive computational cost of 𝒪(N^3 D^3) in the number of data points N and the dimension D. The dilemma of gradient observations is that a single one comes at the same cost as D independent function evaluations, so the latter are often preferred. Careful scrutiny reveals, however, that derivative observations give rise to highly structured kernel Gram matrices for very general classes of kernels (inter alia, stationary kernels). We show that in the low-data regime N < D, the Gram matrix can be decomposed in a manner that reduces the cost of inference to 𝒪(N^2 D + (N^2)^3) (i.e., linear in the number of dimensions) and, in special cases, to 𝒪(N^2 D + N^3). This reduction in complexity opens up new use cases for inference with gradients, especially in the high-dimensional regime, where the information-to-cost ratio of gradient observations significantly increases. We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
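To make the structure claim concrete, the following is a minimal NumPy sketch for one specific stationary kernel, the isotropic RBF kernel k(x, y) = exp(-||x - y||^2 / (2 ell^2)). This kernel choice, the function names `rbf_gradient_gram` and `structured_solve`, and the use of the Woodbury identity are assumptions made for illustration, not necessarily the exact algorithm of the paper. For this kernel the gradient-gradient block between points a and b is k_ab (I / ell^2 - d_ab d_ab^T / ell^4) with d_ab = x_a - x_b, so the ND x ND Gram matrix is a Kronecker product (K / ell^2) ⊗ I plus a correction of rank at most N^2. Solving with that split only needs N x N and N^2 x N^2 factorizations plus 𝒪(N^2 D) inner products.

```python
import numpy as np


def rbf_gradient_gram(X, ell):
    """Dense (N*D, N*D) Gram matrix of gradient observations (reference only)."""
    N, D = X.shape
    diff = X[:, None, :] - X[None, :, :]                      # d_ab, shape (N, N, D)
    K = np.exp(-0.5 * np.sum(diff**2, axis=-1) / ell**2)      # kernel matrix (N, N)
    outer = diff[:, :, :, None] * diff[:, :, None, :]         # d_ab d_ab^T, (N, N, D, D)
    blocks = K[:, :, None, None] * (np.eye(D) / ell**2 - outer / ell**4)
    # Reorder [a, b, i, j] -> [a, i, b, j] so that row = a*D + i, col = b*D + j.
    return blocks.transpose(0, 2, 1, 3).reshape(N * D, N * D), K


def structured_solve(X, ell, v):
    """Solve G x = v without forming G, using G = B + U C U^T and Woodbury.

    B = (K / ell^2) kron I, column (a, b) of U is e_a kron d_ab, and
    C couples column (a, b) with column (b, a) with weight K_ab / ell^4.
    """
    N, D = X.shape
    diff = X[:, None, :] - X[None, :, :]
    K = np.exp(-0.5 * np.sum(diff**2, axis=-1) / ell**2)
    Kp_inv = np.linalg.inv(K / ell**2)                        # N x N inverse

    W = Kp_inv @ v.reshape(N, D)                              # B^{-1} v as (N, D)
    t = np.einsum("abd,ad->ab", diff, W).reshape(N * N)       # U^T B^{-1} v

    # d_ab . d_cd from the N x N matrix of inner products X X^T.
    P = X @ X.T
    Q = (P[:, None, :, None] - P[:, None, None, :]
         - P[None, :, :, None] + P[None, :, None, :])
    S = (Kp_inv[:, None, :, None] * Q).reshape(N * N, N * N)  # U^T B^{-1} U

    C = np.zeros((N, N, N, N))
    a, b = np.indices((N, N))
    C[a, b, b, a] = K / ell**4
    C = C.reshape(N * N, N * N)

    # Woodbury: G^{-1} v = B^{-1} v - B^{-1} U C (I + S C)^{-1} U^T B^{-1} v
    z = C @ np.linalg.solve(np.eye(N * N) + S @ C, t)
    UZ = np.einsum("ab,abd->ad", z.reshape(N, N), diff)       # U z as (N, D)
    return (W - Kp_inv @ UZ).reshape(N * D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D = 4, 50                                              # low-data regime N < D
    ell = np.sqrt(D)
    X = rng.standard_normal((N, D))
    v = rng.standard_normal(N * D)
    G, K = rbf_gradient_gram(X, ell)
    correction = G - np.kron(K / ell**2, np.eye(D))
    print("rank of correction:", np.linalg.matrix_rank(correction), "bound:", N * N)
    print("max abs difference to dense solve:",
          np.max(np.abs(structured_solve(X, ell, v) - np.linalg.solve(G, v))))
```

The dense matrix is built here only to check the structured solve; in a setting with large D one would call `structured_solve` alone, whose largest factorization is of size N^2 x N^2 regardless of D.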