Impossibility of phylogeny reconstruction from k-mer counts

10/27/2020
by Wai-Tong Louis Fan, et al.

We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that for any fixed k no statistically consistent phylogeny estimation is possible from k-mer counts of the leaf sequences alone. Formally, we establish that the joint leaf distributions of k-mer counts on two distinct trees have total variation distance bounded away from 1 as the sequence length tends to infinity. That is, the two distributions cannot be distinguished with probability going to one in that asymptotic regime. Our results are information-theoretic: they imply an impossibility result for any reconstruction method using only k-mer counts at the leaves.
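
To make the object of study concrete, here is a minimal sketch of what "k-mer counts" of a two-state sequence might look like. It is an illustration only, not the construction used in the paper: the helper name `kmer_counts`, the encoding of the two states as the characters `0` and `1`, and the use of overlapping windows are all assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping length-k windows in seq (hypothetical helper).

    Assumes the two states are written as the characters '0' and '1'.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k across the sequence, one position at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Two different sequences of the same length over the two-state alphabet.
seq_a = "0110100"
seq_b = "0100110"

counts_a = kmer_counts(seq_a, 2)
counts_b = kmer_counts(seq_b, 2)

# The sequences differ, yet their 2-mer count vectors coincide.
same_counts = counts_a == counts_b
```

In this toy case the two sequences are distinct but share the same 2-mer counts, which shows how summarizing a sequence by its k-mer counts discards positional information. The paper's result concerns the joint distribution of such counts across the leaves of a tree, which this single-sequence example does not attempt to model.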
