k-Center Clustering with Outliers in the MPC and Streaming Model

02/24/2023
by   Mark de Berg, et al.
0

Given a point set P ⊆ X of size n in a metric space (X,dist) of doubling dimension d and two parameters k ∈ N and z ∈ N, the k-center problem with z outliers asks to return a set C^∗⊆ X of k centers such that the maximum distance of all but z points of P to their nearest center in C^∗ is minimized. An (ϵ,k,z)-coreset for this problem is a weighted point set P^* such that an optimal solution for the k-center problem with z outliers on P^* gives a (1±ϵ)-approximation for the k-center problem with z outliers on P. We study the construction of such coresets in the Massively Parallel Computing (MPC) model, and in the insertion-only as well as the fully dynamic streaming model. We obtain the following results, for any given 0 < ϵ≤ 1: In all cases, the size of the computed coreset is O(k/ϵ^d+z). - In the MPC model, we present a deterministic 2-round and a randomized 1-round algorithm. Additionally, we provide a deterministic algorithm that obtains a trade-off between the number of rounds, R, and the storage per machine. - For the insertion-only streaming model, we present an algorithm and a tight lower bound to support it. - We also discuss the dynamic streaming model, which allows both insertions and deletions in the data stream. In this model, we present the first algorithm and a lower bound. - Finally, we consider the sliding window model, where we are interested in maintaining an (ϵ,k,z)-coreset for the last W points in the stream, we present a tight lower bound that confirms the optimality of the previous work by De Berg, Monemizadeh, and Zhong (ESA2020).

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset