On Approximating Total Variation Distance
Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the computational problem of determining the TV distance between two product distributions over the domain {0,1}^n. We establish the following results. 1. Exact computation of TV distance between two product distributions is #𝖯-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals. 2. Given two product distributions P and Q with marginals of P being at least 1/2 and marginals of Q being at most the respective marginals of P, there exists a fully polynomial-time randomized approximation scheme (FPRAS) for computing the TV distance between P and Q. In particular, this leads to an efficient approximation scheme for the interesting case when P is an arbitrary product distribution and Q is the uniform distribution. We pose the question of characterizing the complexity of approximating the TV distance between two arbitrary product distributions as a basic open problem in computational statistics.
READ FULL TEXT