Fitting Metrics and Ultrametrics with Minimum Disagreements

08/29/2022
by   Vincent Cohen-Addad, et al.
0

Given x ∈ (ℝ_≥ 0)^[n]2 recording pairwise distances, the METRIC VIOLATION DISTANCE (MVD) problem asks to compute the ℓ_0 distance between x and the metric cone; i.e., modify the minimum number of entries of x to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an O(log n)-approximation algorithm for MVD, exponentially improving the previous best approximation ratio of O(OPT^1/3) of Fan et al. [ SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of ULTRAMETRIC VIOLATION DISTANCE (UMVD), where the goal is to compute the ℓ_0 distance to the cone of ultrametrics, and achieve a constant factor approximation algorithm. The UMVD can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad et al. [FOCS, 2021] from ℓ_1 norm to ℓ_0 norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new O(1)-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an O(log n loglog n) approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of x forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset