Improved Approximation and Scalability for Fair Max-Min Diversification
Given an n-point metric space (𝒳, d) where each point belongs to one of m = O(1) different categories or groups, and a set of integers k_1, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to each category i ∈ [m] such that the minimum pairwise distance between the selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance between diversity, i.e., the minimum distance between any pair of selected points, and fairness, i.e., ensuring that enough points of each category are included. We prove the following results:

1. We first consider general metric spaces. We present a randomized polynomial-time algorithm that returns a 2-approximation to the diversity but satisfies the fairness constraints only in expectation. Building on this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor of 1−ε for any constant ε. We also present a linear-time algorithm returning an (m+1)-approximation with exact fairness. The best previous result was a (3m−1)-approximation.

2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, a constant number of categories, and any constant ε > 0, we present a (1+ε)-approximation algorithm that runs in O(nk) + 2^O(k) time, where k = k_1 + … + k_m. We can improve the running time to O(nk) + poly(k) at the expense of picking only (1−ε)k_i points from each category i ∈ [m]. Finally, we present algorithms suitable for processing massive data sets, including single-pass data-stream algorithms and composable coresets for distributed processing.
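To make the objective and the fairness constraints concrete, the following Python sketch shows the quantity being maximized (the minimum pairwise distance of the selected set) together with a simple greedy farthest-point baseline that respects per-category quotas. The names `diversity`, `greedy_fair_maxmin`, and their arguments are hypothetical illustrations of the problem setup; this is a minimal sketch and not one of the approximation algorithms analysed in the paper.

```python
import itertools
import math

def diversity(points, dist):
    """Minimum pairwise distance of a selected set (the quantity to maximize)."""
    return min(dist(p, q) for p, q in itertools.combinations(points, 2))

def greedy_fair_maxmin(points, labels, quotas, dist):
    """Greedy farthest-point heuristic that respects per-category quotas.

    points : list of items, labels : parallel list of category ids,
    quotas : dict mapping category id -> k_i.
    Assumes each category i contains at least k_i points.
    Illustrative baseline only, not the paper's algorithms.
    """
    remaining = dict(quotas)  # points still owed per category
    # Start from the first point whose category still needs points.
    first = next(i for i, c in enumerate(labels) if remaining[c] > 0)
    selected = [first]
    remaining[labels[first]] -= 1
    while any(v > 0 for v in remaining.values()):
        best, best_d = None, -math.inf
        for i, (p, c) in enumerate(zip(points, labels)):
            if i in selected or remaining[c] == 0:
                continue
            d = min(dist(p, points[j]) for j in selected)  # distance to current set
            if d > best_d:
                best, best_d = i, d
        selected.append(best)
        remaining[labels[best]] -= 1
    return [points[i] for i in selected]

# Example in a 1-dimensional Euclidean metric with two categories.
pts = [0.0, 1.0, 2.5, 4.0, 5.5]
cats = ['a', 'a', 'b', 'a', 'b']
d = lambda x, y: abs(x - y)
sel = greedy_fair_maxmin(pts, cats, {'a': 2, 'b': 1}, d)
print(sel, diversity(sel, d))
```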