Optimal Rates for Regularized Conditional Mean Embedding Learning
We study the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of Y given X into a target reproducing kernel Hilbert space (RKHS) ℋ_Y. The CME allows us to take conditional expectations of functions in the target RKHS, and has been employed in nonparametric causal and Bayesian inference. We focus on the misspecified setting, where the target CME lies in the space of Hilbert-Schmidt operators mapping an input interpolation space between ℋ_X and L_2 into ℋ_Y. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator in the misspecified setting. Our analysis reveals that our rates match the optimal O(log n / n) rates without assuming ℋ_Y to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.
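As an illustration of the kind of estimator discussed above, the following is a minimal sketch of a kernel ridge regression estimate of the CME, not the paper's own implementation. It assumes the standard closed form in which the estimate at a point x is a weighted sum of the embedded training targets, with weights β(x) = (K_X + nλI)^{-1} k_X(x). The Gaussian input kernel, the bandwidth, the regularization value λ, the toy data, and all function names are assumptions made for illustration. The target function f(y) = y is chosen on the assumption of a linear kernel on Y, so that f lies in ℋ_Y.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def cme_weights(x_train, x_query, lam, bandwidth=1.0):
    """Ridge weights beta(x) = (K_X + n * lam * I)^{-1} k_X(x), one column per query."""
    n = x_train.shape[0]
    k_xx = gaussian_kernel(x_train, x_train, bandwidth)
    k_xq = gaussian_kernel(x_train, x_query, bandwidth)
    return np.linalg.solve(k_xx + n * lam * np.eye(n), k_xq)


def conditional_expectation(f_values, weights):
    """Estimate E[f(Y) | X = x] as sum_i beta_i(x) * f(y_i) for each query point."""
    return weights.T @ f_values


# Toy data (an assumption for illustration): Y = sin(3X) + small Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal((n, 1))

x_query = np.array([[0.0], [0.5]])
beta = cme_weights(x, x_query, lam=1e-3, bandwidth=0.3)

# With f(y) = y, the estimate approximates the conditional mean E[Y | X = x].
estimate = conditional_expectation(y[:, 0], beta)
print(estimate)
```

The weights depend only on the inputs, so the same `beta` can be reused to estimate the conditional expectation of any other function in the target RKHS by passing its values at the training targets.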