Distributed Statistical Estimation of Matrix Products with Applications
We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix A and Bob holds a matrix B, and they want to estimate statistics of A · B. We focus on the well-studied ℓ_p-norm, distinct elements (p = 0), ℓ_0-sampling, and heavy hitter problems. The goal is to minimize both the communication cost and the number of rounds of communication. This problem is closely related to the fundamental set-intersection join problem in databases: when p = 0 the problem corresponds to the size of the set-intersection join. When p = ∞ the output is simply the pair of sets with the maximum intersection size. When p = 1 the problem corresponds to the size of the corresponding natural join. We also consider the heavy hitters problem which corresponds to finding the pairs of sets with intersection size above a certain threshold, and the problem of sampling an intersecting pair of sets uniformly at random.
READ FULL TEXT