Pearson Distance is not a Distance

08/15/2019
by Victor Solo, et al.

The Pearson distance between a pair of random variables X, Y with correlation ρ_xy, namely 1-ρ_xy, has gained widespread use, particularly for clustering, in areas such as gene expression analysis, brain imaging and cyber security. All of these applications implicitly assume or require that the distance measure be a metric, and thus satisfy the triangle inequality. We show, however, that Pearson distance is not a metric. We go on to show that this can be repaired by recalling the result (well known in other literature) that √(1-ρ_xy) is a metric. We similarly show that a related measure of interest, 1-|ρ_xy|, which is invariant to the sign of ρ_xy, is not a metric but that √(1-ρ_xy^2) is. We also give generalizations of these results.
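
The failure of the triangle inequality can be checked numerically on a single triple of variables. The sketch below is an illustration, not a construction taken from the paper: it assumes X and Z are uncorrelated with unit variance and sets Y = (X + Z)/√2, so that ρ_xy = ρ_yz = 1/√2 and ρ_xz = 0. The names `triangle_holds` and `measures` are hypothetical helpers introduced here.

```python
import math

# Assumed example (not from the paper): X and Z uncorrelated with unit
# variance, Y = (X + Z) / sqrt(2). The pairwise correlations are then:
r = 1 / math.sqrt(2)
rho_xy, rho_yz, rho_xz = r, r, 0.0


def triangle_holds(d, tol=1e-12):
    """Check d(x, z) <= d(x, y) + d(y, z) for this one triple only."""
    return d(rho_xz) <= d(rho_xy) + d(rho_yz) + tol


# The four measures named in the abstract, as functions of a correlation p.
measures = {
    "1 - rho": lambda p: 1 - p,
    "sqrt(1 - rho)": lambda p: math.sqrt(1 - p),
    "1 - |rho|": lambda p: 1 - abs(p),
    "sqrt(1 - rho^2)": lambda p: math.sqrt(1 - p * p),
}

results = {name: triangle_holds(d) for name, d in measures.items()}
for name, ok in results.items():
    status = "holds" if ok else "fails"
    print(f"{name}: triangle inequality {status} on this triple")
```

On this triple, 1-ρ gives d(x,z) = 1 against d(x,y) + d(y,z) ≈ 0.586, so the inequality fails, and the same numbers apply to 1-|ρ| because every correlation here is non-negative. The two square-root measures satisfy the inequality on this triple, which agrees with the stated results but does not by itself prove that they are metrics.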
