Composition of nested embeddings with an application to outlier removal
We study the design of embeddings into Euclidean space with outliers. Given a metric space (X,d) and an integer k, the goal is to embed all but k points in X (called the "outliers") into ℓ_2 with the smallest possible distortion c. Finding the optimal distortion c for a given outlier set size k, or alternately the smallest k for a given target distortion c are both NP-hard problems. In fact, it is UGC-hard to approximate k to within a factor smaller than 2 even when the metric sans outliers is isometrically embeddable into ℓ_2. We consider bi-criteria approximations. Our main result is a polynomial time algorithm that approximates the outlier set size to within an O(log^4 k) factor and the distortion to within a constant factor. The main technical component in our result is an approach for constructing a composition of two given embeddings from subsets of X into ℓ_2 which inherits the distortions of each to within small multiplicative factors. Specifically, given a low c_S distortion embedding from S⊂ X into ℓ_2 and a high(er) c_X distortion embedding from the entire set X into ℓ_2, we construct a single embedding that achieves the same distortion c_S over pairs of points in S and an expansion of at most O(log k)· c_X over the remaining pairs of points, where k=|X∖ S|. Our composition theorem extends to embeddings into arbitrary ℓ_p metrics for p≥ 1, and may be of independent interest. While unions of embeddings over disjoint sets have been studied previously, to our knowledge, this is the first work to consider compositions of nested embeddings.
READ FULL TEXT