Coresets for Clustering in Excluded-minor Graphs and Beyond
Coresets are modern data-reduction tools that are widely used in data analysis to improve efficiency in terms of running time, space and communication complexity. Our main result is a fast algorithm to construct a small coreset for k-Median in (the shortest-path metric of) an excluded-minor graph. Specifically, we give the first coreset of size that depends only on k, ϵ and the excluded-minor size, and our running time is quasi-linear (in the size of the input graph). The main innovation in our new algorithm is that is iterative; it first reduces the n input points to roughly O(log n) reweighted points, then to O(loglog n), and so forth until the size is independent of n. Each step in this iterative size reduction is based on the importance sampling framework of Feldman and Langberg (STOC 2011), with a crucial adaptation that reduces the number of distinct points, by employing a terminal embedding (where low distortion is guaranteed only for the distance from every terminal to all other points). Our terminal embedding is technically involved and relies on shortest-path separators, a standard tool in planar and excluded-minor graphs. Furthermore, our new algorithm is applicable also in Euclidean metrics, by simply using a recent terminal embedding result of Narayanan and Nelson, (STOC 2019), which extends the Johnson-Lindenstrauss Lemma. We thus obtain an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). In addition, we also employ terminal embedding with additive distortion to obtain small coresets in graphs with bounded highway dimension, and use applications of our coresets to obtain improved approximation schemes, e.g., an improved PTAS for planar k-Median via a new centroid set.
READ FULL TEXT 
  
  
     share
 share