Distributed Statistical Estimation of Matrix Products with Applications

07/02/2018
by   David P. Woodruff, et al.
0

We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix A and Bob holds a matrix B, and they want to estimate statistics of A · B. We focus on the well-studied ℓ_p-norm, distinct elements (p = 0), ℓ_0-sampling, and heavy hitter problems. The goal is to minimize both the communication cost and the number of rounds of communication. This problem is closely related to the fundamental set-intersection join problem in databases: when p = 0 the problem corresponds to the size of the set-intersection join. When p = ∞ the output is simply the pair of sets with the maximum intersection size. When p = 1 the problem corresponds to the size of the corresponding natural join. We also consider the heavy hitters problem which corresponds to finding the pairs of sets with intersection size above a certain threshold, and the problem of sampling an intersecting pair of sets uniformly at random.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset