Near neighbor preserving dimension reduction for doubling subsets of ℓ_1

02/23/2019
by   Ioannis Z. Emiris, et al.
0

Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (ℓ_2) metric and, much less, for the Manhattan (ℓ_1) metric. Our primary motivation is the approximate nearest neighbor problem in ℓ_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both ℓ_2 and ℓ_1 metrics, as well as for doubling subsets of ℓ_2. In this paper, we propose a dimension reduction, near neighbor-preserving embedding for doubling subsets of ℓ_1. Our approach is to represent the point set with a carefully chosen covering set, and then apply a random projection to that covering set. We study two cases of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset