Sign-Full Random Projections

04/26/2018
by Ping Li, et al.

The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dimensional data vectors u, v ∈ R^D, one generates x = ∑_{i=1}^D u_i r_i and y = ∑_{i=1}^D v_i r_i, where r_i ∼ N(0,1) i.i.d. The "collision probability" is Pr(sgn(x) = sgn(y)) = 1 - cos^{-1}(ρ)/π, where ρ = ρ(u,v) is the cosine similarity. We develop "sign-full" random projections by estimating ρ from (for example) the expectation E(sgn(x) y) = √(2/π) ρ, which can be further improved substantially by normalizing y. For nonnegative data, we recommend an interesting estimator based on E(y_- 1_{x≥0} + y_+ 1_{x<0}) and its normalized version. The recommended estimator nearly matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity (ρ → 1), the asymptotic variance of the recommended estimator is only 4/(3π) ≈ 0.4 times that of the estimator for sign-sign projections. At a small number of projections k and high similarity, the improvement is even more substantial.
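As a rough illustration (not code from the paper), the Python sketch below simulates the Gaussian projections described above and compares the plain sign-sign estimator, obtained by inverting the collision probability, with a basic sign-full estimator derived from E(sgn(x) y) = √(2/π) ρ. The function names, the choice of k, and the assumption that u and v are normalized to unit norm are illustrative, not taken from the paper.

```python
import numpy as np

# Rough simulation of the setup in the abstract (illustrative, not the paper's code):
# x = <u, r>, y = <v, r> with r_i ~ N(0, 1) i.i.d., repeated k times.

rng = np.random.default_rng(0)

def project(u, v, k, rng):
    """Draw k independent Gaussian projections and return (x, y), each of length k."""
    R = rng.standard_normal((k, len(u)))   # k x D matrix with entries ~ N(0, 1)
    return R @ u, R @ v

def estimate_rho_sign_sign(x, y):
    """1-bit estimator: invert Pr(sgn(x) = sgn(y)) = 1 - cos^{-1}(rho)/pi."""
    p_hat = np.mean(np.sign(x) == np.sign(y))
    return np.cos(np.pi * (1.0 - p_hat))

def estimate_rho_sign_full(x, y):
    """Sign-full estimator from E(sgn(x) y) = sqrt(2/pi) * rho (assumes unit-norm u, v)."""
    return np.mean(np.sign(x) * y) * np.sqrt(np.pi / 2.0)

# Toy example with two highly similar unit vectors.
D, k = 1000, 200
u = rng.standard_normal(D)
v = 0.95 * u + 0.05 * rng.standard_normal(D)
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)

x, y = project(u, v, k, rng)
print("true rho          :", float(u @ v))
print("sign-sign estimate:", estimate_rho_sign_sign(x, y))
print("sign-full estimate:", estimate_rho_sign_full(x, y))
```

Running the sketch with different values of k gives a feel for how the sign-full estimate uses the retained magnitude of y, whereas the sign-sign estimate discards it; the normalized and nonnegative-data variants discussed in the abstract are not reproduced here.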
