Near-Optimal Explainable k-Means for All Dimensions

06/29/2021
by Moses Charikar, et al.

Many clustering algorithms are guided by certain cost functions such as the widely-used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we give an efficient algorithm that finds an explainable clustering whose k-means cost is at most k^(1 - 2/d) · poly(d log k) times the minimum cost achievable by a clustering without the explainability constraint, assuming k, d ≥ 2. Taking the minimum of this bound and the k · polylog(k) bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of k^(1 - 2/d) · polylog(k), which we show is optimal for every choice of k, d ≥ 2 up to a poly-logarithmic factor in k. For d = 2 in particular, we show an O(log k · log log k) bound, improving nearly exponentially over the previous best bound of O(k log k) by Laber and Murtinho (ICML 2021).
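To make the abstract's objects concrete, below is a minimal Python sketch, not the paper's algorithm nor any of the cited ones: it computes the k-means cost of a clustering and builds an explainable clustering by greedily adding axis-parallel threshold cuts, so each cluster is a box and every point's assignment is decided by at most k - 1 threshold tests. The greedy median-split rule and all function names here are illustrative assumptions.

```python
import numpy as np

def kmeans_cost(X, labels):
    """Sum of squared distances from each point to its cluster's centroid."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

def threshold_tree_labels(X, k):
    """Cluster X into at most k parts using axis-parallel threshold cuts.

    Toy greedy rule (an assumption, not the paper's method): repeatedly
    pick the cut 'coordinate dim > median' that most reduces the k-means
    cost. Every cluster is then an axis-parallel box, so membership is
    decided by a decision tree with at most k - 1 internal nodes.
    """
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        best = None
        for c in range(new_label):                 # existing clusters
            idx = np.where(labels == c)[0]
            if len(idx) < 2:
                continue
            for dim in range(X.shape[1]):
                theta = np.median(X[idx, dim])
                right = idx[X[idx, dim] > theta]   # points sent to the new leaf
                if 0 < len(right) < len(idx):
                    trial = labels.copy()
                    trial[right] = new_label
                    cost = kmeans_cost(X, trial)
                    if best is None or cost < best[0]:
                        best = (cost, trial)
        if best is None:                           # no useful cut remains
            break
        labels = best[1]
    return labels

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # d = 2, the case with the O(log k · log log k) bound
labels = threshold_tree_labels(X, k=4)
print("explainable k-means cost:", kmeans_cost(X, labels))
```

Comparing this cost against an unconstrained k-means solver on the same data gives a hands-on estimate of the "price of explainability" ratio that the paper bounds by k^(1 - 2/d) · polylog(k).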
