Text Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

01/03/2021
by   Domenico Cantone, et al.
0

In this paper we investigate the approximate string matching problem when the allowed edit operations are non-overlapping unbalanced translocations of adjacent factors. Such kind of edit operations take place when two adjacent sub-strings of the text swap, resulting in a modified string. The two involved substrings are allowed to be of different lengths. Such large-scale modifications on strings have various applications. They are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity, a major genetic event causing inactivation of cancer suppressor genes. In addition, among other applications, they are frequent modifications accounted in musical or in natural language information retrieval. However, despite of their central role in so many fields of text processing, little attention has been devoted to the problem of matching strings allowing for this kind of edit operation. In this paper we present three algorithms for solving the problem, all of them with a (nm^3) worst-case and a (m^2)-space complexity, where m and n are the length of the pattern and of the text, respectively. particular, our first algorithm is based on the dynamic-programming approach. Our second solution improves the previous one by making use of the Directed Acyclic Word Graph of the pattern. Finally our third algorithm is based on an alignment procedure. We also show that under the assumptions of equiprobability and independence of characters, our second algorithm has a (nlog^2_σ m) average time complexity, for an alphabet of size σ≥ 4.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset