Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
Kernel mean embeddings have recently attracted the attention of the machine learning community. They map measures μ from some set M to functions in a reproducing kernel Hilbert space (RKHS) with kernel k. The RKHS distance of two mapped measures is a semi-metric d_k over M. We study three questions. (I) For a given kernel, what sets M can be embedded? (II) When is the embedding injective over M (in which case d_k is a metric)? (III) How does the d_k-induced topology compare to other topologies on M? The existing machine learning literature has addressed these questions in cases where M is (a subset of) the finite regular Borel measures. We unify, improve and generalise those results. Our approach naturally leads to continuous and possibly even injective embeddings of (Schwartz-) distributions, i.e., generalised measures, but the reader is free to focus on measures only. In particular, we systemise and extend various (partly known) equivalences between different notions of universal, characteristic and strictly positive definite kernels, and show that on an underlying locally compact Hausdorff space, d_k metrises the weak convergence of probability measures if and only if k is continuous and characteristic.
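To make the embedding and the distance d_k concrete, here is a minimal sketch. It assumes real-valued samples and a Gaussian kernel with a bandwidth `sigma`; both are illustrative choices that the abstract does not specify. For an empirical measure, the embedding is the average of the feature maps k(·, x_i). The squared RKHS distance therefore expands into three averages of kernel evaluations:

d_k(P, Q)^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y)

This plug-in quantity is often called the (biased) maximum mean discrepancy. The Gaussian kernel is continuous and is known to be characteristic on R^d, so it should fall within the setting of the final result above. The function names are hypothetical.

```python
import math
import random
from typing import Sequence


def gaussian_kernel(x: float, y: float, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel on the real line (illustrative choice of k)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mean_kernel(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Average of k(a_i, b_j) over all pairs.

    For empirical measures this equals the RKHS inner product of their
    mean embeddings.
    """
    total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
    return total / (len(a) * len(b))


def kernel_distance(xs: Sequence[float], ys: Sequence[float], sigma: float = 1.0) -> float:
    """RKHS distance d_k between the empirical measures of xs and ys (hypothetical helper)."""
    sq = (mean_kernel(xs, xs, sigma)
          + mean_kernel(ys, ys, sigma)
          - 2.0 * mean_kernel(xs, ys, sigma))
    # Clip tiny negative values caused by floating-point rounding.
    return math.sqrt(max(sq, 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [rng.gauss(0.0, 1.0) for _ in range(200)]  # sample from N(0, 1)
    q = [rng.gauss(0.0, 1.0) for _ in range(200)]  # another sample from N(0, 1)
    r = [rng.gauss(1.0, 1.0) for _ in range(200)]  # sample from N(1, 1)
    print(kernel_distance(p, q))  # same distribution: close to zero
    print(kernel_distance(p, r))  # shifted distribution: noticeably larger
```

The biased (V-statistic) form is used here because it is exactly the RKHS distance between the two empirical embeddings, so it is always non-negative. An unbiased U-statistic variant would drop the diagonal terms and can be slightly negative.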