Fitting Metrics and Ultrametrics with Minimum Disagreements
Given x ∈ (ℝ_≥ 0)^[n]2 recording pairwise distances, the METRIC VIOLATION DISTANCE (MVD) problem asks to compute the ℓ_0 distance between x and the metric cone; i.e., modify the minimum number of entries of x to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an O(log n)-approximation algorithm for MVD, exponentially improving the previous best approximation ratio of O(OPT^1/3) of Fan et al. [ SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of ULTRAMETRIC VIOLATION DISTANCE (UMVD), where the goal is to compute the ℓ_0 distance to the cone of ultrametrics, and achieve a constant factor approximation algorithm. The UMVD can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad et al. [FOCS, 2021] from ℓ_1 norm to ℓ_0 norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new O(1)-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an O(log n loglog n) approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of x forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.
READ FULL TEXT