A General Scoring Rule for Randomized Kernel Approximation with Application to Canonical Correlation Analysis

10/11/2019
by Yinsong Wang, et al.

Random features have been widely used for kernel approximation in large-scale machine learning. A number of recent studies have explored data-dependent sampling of features, which modifies the stochastic oracle from which random features are drawn. While the techniques proposed in this line of work improve the approximation, each is tailored to a specific learning task. In this paper, we propose a general scoring rule for sampling random features that can be adapted to various applications. We first observe that our method recovers a number of existing data-dependent sampling methods (e.g., leverage scores and energy-based sampling). We then focus on a ubiquitous problem in statistics and machine learning, namely Canonical Correlation Analysis (CCA). We provide a principled guide for finding the sampling distribution that maximizes the canonical correlations, resulting in a novel data-dependent method for sampling features. Numerical experiments show that our algorithm consistently outperforms other sampling techniques on the CCA task.
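As a rough illustration of the pipeline described above, the sketch below (assuming NumPy) draws a pool of random Fourier features for a Gaussian kernel, scores and resamples them, and runs regularized linear CCA on the resulting feature maps. The score used here is an empirical ridge leverage score, one of the data-dependent rules the abstract says the general scoring rule recovers; it is a stand-in and not the paper's CCA-specific score. The function names (`rff_pool`, `features`, `ridge_leverage_scores`, `resample_features`, `cca`), the bandwidth, the regularization values, and the toy data are hypothetical choices made for illustration only.

```python
import numpy as np


def rff_pool(d, M, sigma, rng):
    """Draw M candidate random Fourier features for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): omega ~ N(0, sigma^-2 I), b ~ U[0, 2pi]."""
    W = rng.normal(scale=1.0 / sigma, size=(d, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    return W, b


def features(X, W, b):
    """Unscaled features phi_m(x) = sqrt(2) cos(omega_m^T x + b_m), shape (n, M)."""
    return np.sqrt(2.0) * np.cos(X @ W + b)


def ridge_leverage_scores(Phi, lam):
    """Hypothetical stand-in score (NOT the paper's CCA rule): empirical ridge leverage
    tau_m = phi_m^T (K_hat + n*lam*I)^{-1} phi_m, with K_hat = Phi Phi^T / M."""
    n, M = Phi.shape
    K_hat = Phi @ Phi.T / M
    A = np.linalg.solve(K_hat + n * lam * np.eye(n), Phi)  # (K_hat + n lam I)^{-1} Phi
    return np.einsum("nm,nm->m", Phi, A)  # diagonal of Phi^T A


def resample_features(scores, D, rng):
    """Importance-sample D of the M pool features with probability q_m proportional to
    scores; the weight 1/sqrt(D*M*q_m) keeps Z Z^T unbiased for Phi Phi^T / M."""
    M = scores.size
    q = scores / scores.sum()
    idx = rng.choice(M, size=D, replace=True, p=q)
    w = 1.0 / np.sqrt(D * M * q[idx])
    return idx, w


def cca(Zx, Zy, reg=1e-3, k=1):
    """Top-k regularized canonical correlations between two feature maps."""
    Zx = Zx - Zx.mean(axis=0)
    Zy = Zy - Zy.mean(axis=0)
    n = Zx.shape[0]
    Cxx = Zx.T @ Zx / n + reg * np.eye(Zx.shape[1])
    Cyy = Zy.T @ Zy / n + reg * np.eye(Zy.shape[1])
    Cxy = Zx.T @ Zy / n
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance Lx^{-1} Cxy Ly^{-T}; its singular values are the
    # (regularized) canonical correlations, returned in descending order.
    T = np.linalg.solve(Lx, np.linalg.solve(Ly, Cxy.T).T)
    return np.linalg.svd(T, compute_uv=False)[:k]


rng = np.random.default_rng(0)
n, d, sigma = 300, 2, 1.0
X = rng.normal(size=(n, d))  # toy data: Y is a noisy nonlinear function of X
Y = np.column_stack([np.sin(X[:, 0]), X[:, 1] ** 2]) + 0.1 * rng.normal(size=(n, d))

# Plain (data-independent) random features: compare against the exact kernel.
W, b = rff_pool(d, 2000, sigma, rng)
Phi = features(X, W, b)
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma**2))
K_rff = Phi @ Phi.T / Phi.shape[1]
print("mean |K_rff - K_exact|:", np.abs(K_rff - K_exact).mean())

# Data-dependent sampling: score a pool of M features per view, keep D of them.
D, M = 100, 1000
Wx, bx = rff_pool(d, M, sigma, rng)
Wy, by = rff_pool(d, M, sigma, rng)
Px, Py = features(X, Wx, bx), features(Y, Wy, by)
ix, wx = resample_features(ridge_leverage_scores(Px, 1e-3), D, rng)
iy, wy = resample_features(ridge_leverage_scores(Py, 1e-3), D, rng)
rho = cca(Px[:, ix] * wx, Py[:, iy] * wy, reg=1e-3, k=3)
print("top canonical correlations:", rho)
```

The importance weights are the design point worth noting: whatever score is plugged into `resample_features`, rescaling each kept feature by 1/sqrt(D·M·q_m) keeps the approximate kernel unbiased with respect to the pool, so only the variance (and, for CCA, the quality of the recovered correlations) depends on the choice of score. Swapping in a different scoring rule only requires replacing `ridge_leverage_scores`.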
