Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under ℓ_p-Distances

10/04/2017
by   Grigory Yaroslavtsev, et al.
0

We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of n input d-dimensional vectors under Hamming, ℓ_1, ℓ_2 and ℓ_∞ distances. All our algorithms run in O( n) rounds of MPC for any fixed d and achieve (1+ϵ)-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for o( n)-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on a variety of datasets exhibiting speedups of several orders of magnitude.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro