Faster Approximate(d) Text-to-Pattern L1 Distance

01/28/2018
by   Przemysław Uznański, et al.
0

The problem of finding distance between pattern of length m and text of length n is a typical way of generalizing pattern matching to incorporate dissimilarity score. For both Hamming and L_1 distances only a super linear upper bound O(n√(m)) are known, which prompts the question of relaxing the problem: either by asking for 1 ±ε approximate distance (every distance is reported up to a multiplicative factor), or k-approximated distance (distances exceeding k are reported as ∞). We focus on L_1 distance, for which we show new algorithms achieving complexities respectively O(ε^-1 n) and O((m+k√(m)) · n/m). This is a significant improvement upon previous algorithms with runtime O(ε^-2 n) of Lipsky and Porat (Algorithmica 2011) and O(n√(k)) of Amir, Lipsky, Porat and Umanski (CPM 2005). We also provide a series of reductions, showing that if our upper bound for approximate L_1 distance is tight, then so is our upper bound for k-approximated L_1 distance, and if the latter is tight then so is k-approximated Hamming distance upper bound due to the result of Gawrychowski and Uznański (arXiv 2017).

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset