Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in L_p metrics

11/21/2021
by Vincent Cohen-Addad, et al.

k-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in ℓ_p-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives in ℓ_p-metrics.

We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied max k-coverage problem on set systems is hard to approximate to a factor greater than (1-1/e), even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that, together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in ℓ_p-metrics for factors that are close to the ones obtained for general metrics. In particular, assuming JCH, we show that it is hard to approximate the k-means objective:

∙ Discrete case: to a factor of 3.94 in the ℓ_1-metric and to a factor of 1.73 in the ℓ_2-metric; this improves upon the previous factors of 1.56 and 1.17, respectively, obtained under UGC.
∙ Continuous case: to a factor of 2.10 in the ℓ_1-metric and to a factor of 1.36 in the ℓ_2-metric; this improves upon the previous factor of 1.07 in the ℓ_2-metric obtained under UGC.

We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP '05) on Hypergraph Vertex Cover, and recover all the results stated above of Cohen-Addad and Karthik (FOCS '19) to (nearly) the same inapproximability factors, but now under the standard NP≠P assumption (instead of UGC).
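To make the max k-coverage problem behind JCH concrete, the following minimal Python sketch runs the standard greedy heuristic, which is known to achieve a (1-1/e) approximation for max k-coverage, on a small Johnson-style instance. The instance structure is an assumption made for illustration and is not taken from the abstract: elements are z-subsets of {0, ..., n-1}, candidate sets are indexed by (z-1)-subsets, and a candidate covers an element when it is contained in it. The function name `greedy_johnson_coverage` and the toy data are hypothetical.

```python
from itertools import combinations


def greedy_johnson_coverage(n, z, elements, k):
    """Greedy max k-coverage on an assumed Johnson-style instance.

    elements:   iterable of z-subsets of range(n) (the items to cover)
    candidates: all (z-1)-subsets of range(n); a candidate covers an
                element when it is a subset of it (illustrative assumption)
    Returns the chosen candidates and the number of covered elements.
    """
    elements = {frozenset(e) for e in elements}
    candidates = [frozenset(c) for c in combinations(range(n), z - 1)]
    uncovered = set(elements)
    chosen = []
    for _ in range(k):
        # Pick the candidate that covers the most still-uncovered elements.
        best = max(candidates, key=lambda c: sum(1 for e in uncovered if c <= e))
        gain = {e for e in uncovered if best <= e}
        if not gain:
            break  # nothing left that any candidate can cover
        chosen.append(best)
        uncovered -= gain
    return chosen, len(elements) - len(uncovered)


# Toy instance: six 3-subsets of {0,...,4}, to be covered by k = 2 pairs.
toy_elements = [
    {0, 1, 2}, {0, 1, 3}, {0, 1, 4},
    {0, 2, 3}, {1, 2, 3}, {2, 3, 4},
]
chosen, covered = greedy_johnson_coverage(n=5, z=3, elements=toy_elements, k=2)
print(sorted(sorted(c) for c in chosen), covered)
```

On this toy instance the greedy choice happens to cover everything. JCH, as stated above, concerns the worst case: it asserts that no efficient algorithm can beat the (1-1/e) factor even on such structured instances.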
