Sign-Full Random Projections

04/26/2018
by Ping Li, et al.

The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dimensional data vectors u, v ∈ R^D, one generates x = ∑_{i=1}^D u_i r_i and y = ∑_{i=1}^D v_i r_i, where r_i ∼ N(0,1) i.i.d. The "collision probability" is Pr(sgn(x) = sgn(y)) = 1 - cos^{-1}(ρ)/π, where ρ = ρ(u,v) is the cosine similarity. We develop "sign-full" random projections by estimating ρ from (for example) the expectation E(sgn(x) y) = √(2/π) ρ, which can be further improved substantially by normalizing y. For nonnegative data, we recommend an interesting estimator based on E(y_- 1_{x≥0} + y_+ 1_{x<0}) and its normalized version. The recommended estimator nearly matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity (ρ → 1), the asymptotic variance of the recommended estimator is only 4/(3π) ≈ 0.4 times that of the estimator for sign-sign projections. At a small number of projections k and high similarity, the improvement is even more substantial.
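As a rough illustration (not code from the paper), the Python sketch below simulates the Gaussian projections described above and compares the plain sign-sign estimator, obtained by inverting the collision probability, with a basic sign-full estimator derived from E(sgn(x) y) = √(2/π) ρ. The function names, the choice of k, and the assumption that u and v are normalized to unit norm are illustrative, not taken from the paper.

```python
import numpy as np

# Rough simulation of the setup in the abstract (illustrative, not the paper's code):
# x = <u, r>, y = <v, r> with r_i ~ N(0, 1) i.i.d., repeated k times.

rng = np.random.default_rng(0)

def project(u, v, k, rng):
    """Draw k independent Gaussian projections and return (x, y), each of length k."""
    R = rng.standard_normal((k, len(u)))   # k x D matrix with entries ~ N(0, 1)
    return R @ u, R @ v

def estimate_rho_sign_sign(x, y):
    """1-bit estimator: invert Pr(sgn(x) = sgn(y)) = 1 - cos^{-1}(rho)/pi."""
    p_hat = np.mean(np.sign(x) == np.sign(y))
    return np.cos(np.pi * (1.0 - p_hat))

def estimate_rho_sign_full(x, y):
    """Sign-full estimator from E(sgn(x) y) = sqrt(2/pi) * rho (assumes unit-norm u, v)."""
    return np.mean(np.sign(x) * y) * np.sqrt(np.pi / 2.0)

# Toy example with two highly similar unit vectors.
D, k = 1000, 200
u = rng.standard_normal(D)
v = 0.95 * u + 0.05 * rng.standard_normal(D)
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)

x, y = project(u, v, k, rng)
print("true rho          :", float(u @ v))
print("sign-sign estimate:", estimate_rho_sign_sign(x, y))
print("sign-full estimate:", estimate_rho_sign_full(x, y))
```

Running the sketch with different values of k gives a feel for how the sign-full estimate uses the retained magnitude of y, whereas the sign-sign estimate discards it; the normalized and nonnegative-data variants discussed in the abstract are not reproduced here.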
