Approximation Schemes for Low-Rank Binary Matrix Approximation Problems
We provide a randomized linear-time approximation scheme for a generic problem about clustering binary vectors subject to additional constraints. This constrained clustering problem encompasses a number of problems, and by solving it we obtain the first linear-time approximation schemes for several well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are Low GF(2)-Rank Approximation, Low Boolean-Rank Approximation, and various versions of Binary Clustering. For example, in the Low GF(2)-Rank Approximation problem, given an m × n binary matrix A and an integer r > 0, we seek a binary matrix B of GF(2)-rank at most r that minimizes the ℓ_0 norm of A − B. For any ϵ > 0, our algorithm runs in time f(r,ϵ) · n · m, where f is some computable function, and outputs a (1+ϵ)-approximate solution with probability at least 1 − 1/e. Our approximation algorithms substantially improve the running times and approximation factors of previous works. We also give (deterministic) PTASes for these problems running in time n^{f(r) · (1/ϵ^2) · log(1/ϵ)}, where f is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is of independent interest.
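To make the Low GF(2)-Rank Approximation objective concrete, the following is a minimal sketch of the problem statement only, not of the approximation scheme described above. It computes the GF(2)-rank of a binary matrix, the ℓ_0 cost of a candidate B, and an exhaustive optimum for tiny inputs. The function names (gf2_rank, l0_cost, brute_force_optimum) and the example matrix are illustrative assumptions, and the brute-force search takes time exponential in m · n, so it is useful only as a reference on very small matrices.

```python
from itertools import product


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over GF(2), by elimination on bitmask rows."""
    # Encode each nonzero row as an integer whose bits are the row entries.
    rows = [int("".join(map(str, row)), 2) for row in matrix if any(row)]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue  # row was cancelled by an earlier pivot
        rank += 1
        low_bit = pivot & -pivot  # lowest set bit serves as the pivot column
        # XOR is addition over GF(2): clear the pivot column in other rows.
        rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def l0_cost(a, b):
    """l_0 norm of A - B: the number of entries where A and B differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def brute_force_optimum(a, r):
    """Exhaustive baseline (assumption: tiny inputs only, 2^(m*n) candidates)."""
    m, n = len(a), len(a[0])
    best_cost, best_b = None, None
    for bits in product([0, 1], repeat=m * n):
        b = [list(bits[i * n:(i + 1) * n]) for i in range(m)]
        if gf2_rank(b) <= r:
            cost = l0_cost(a, b)
            if best_cost is None or cost < best_cost:
                best_cost, best_b = cost, b
    return best_cost, best_b


# Hypothetical 3 x 3 instance: A has GF(2)-rank 2, and we ask for rank at most 1.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
opt_cost, opt_B = brute_force_optimum(A, r=1)
```

On this instance, a (1+ϵ)-approximate solution in the sense of the abstract would be any binary matrix B of GF(2)-rank at most 1 whose ℓ_0 cost is at most (1+ϵ) times opt_cost.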