Bulk Johnson-Lindenstrauss Lemmas
For a set X of N points in ℝ^D, the Johnson-Lindenstrauss lemma provides random linear maps that approximately preserve all pairwise distances in X – up to multiplicative error (1±ϵ) with high probability – using a target dimension of O(ϵ^-2log(N)). Certain known point sets actually require a target dimension this large – any smaller dimension forces at least one distance to be stretched or compressed too much. What happens to the remaining distances? If we only allow a fraction η of the distances to be distorted beyond tolerance (1±ϵ), we show a target dimension of O(ϵ^-2log(4e/η)log(N)/R) is sufficient for the remaining distances. With the stable rank of a matrix A as ‖A‖_F^2/‖A‖^2, the parameter R is the minimal stable rank over certain log(N) sized subsets of X-X or their unit normalized versions, involving each point of X exactly once. The linear maps may be taken as random matrices with i.i.d. zero-mean unit-variance sub-gaussian entries. When the data is sampled i.i.d. as a given random vector ξ, refined statements are provided; the most improvement happens when ξ or the unit normalized ξ-ξ' is isotropic, with ξ' an independent copy of ξ, and includes the case of i.i.d. coordinates.
READ FULL TEXT