k-Center Clustering with Outliers in the MPC and Streaming Model
Given a point set P ⊆ X of size n in a metric space (X,dist) of doubling dimension d and two parameters k ∈ N and z ∈ N, the k-center problem with z outliers asks to return a set C^∗⊆ X of k centers such that the maximum distance of all but z points of P to their nearest center in C^∗ is minimized. An (ϵ,k,z)-coreset for this problem is a weighted point set P^* such that an optimal solution for the k-center problem with z outliers on P^* gives a (1±ϵ)-approximation for the k-center problem with z outliers on P. We study the construction of such coresets in the Massively Parallel Computing (MPC) model, and in the insertion-only as well as the fully dynamic streaming model. We obtain the following results, for any given 0 < ϵ≤ 1: In all cases, the size of the computed coreset is O(k/ϵ^d+z). - In the MPC model, we present a deterministic 2-round and a randomized 1-round algorithm. Additionally, we provide a deterministic algorithm that obtains a trade-off between the number of rounds, R, and the storage per machine. - For the insertion-only streaming model, we present an algorithm and a tight lower bound to support it. - We also discuss the dynamic streaming model, which allows both insertions and deletions in the data stream. In this model, we present the first algorithm and a lower bound. - Finally, we consider the sliding window model, where we are interested in maintaining an (ϵ,k,z)-coreset for the last W points in the stream, we present a tight lower bound that confirms the optimality of the previous work by De Berg, Monemizadeh, and Zhong (ESA2020).
READ FULL TEXT