Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms

08/02/2018
by   Kazuo Iwama, et al.
0

The problem called "String reconstruction from substrings" is a mathematical model of sequencing by hybridization that plays an important role in DNA sequencing. In this problem, we are given a blackbox oracle holding an unknown string X and are required to obtain (reconstruct) X through "substring queries" Q(S). Q(S) is given to the oracle with a string S and the answer of the oracle is Yes if X includes S as a substring and No otherwise. Our goal is to minimize the number of queries for the reconstruction. In this paper, we deal with only binary strings for X whose length n is given in advance by using a sequence of good S's. In 1995, Skiena and Sundaram first studied this problem and obtained an algorithm whose query complexity is n+O( n). Its information theoretic lower bound is n, and they posed an obvious open question; if we can remove the O( n) additive term. No progress has been made until now. This paper gives two partially positive answers to this open question. One is a randomized algorithm whose query complexity is n+O(1) with high probability and the other is an average-case algorithm also having a query complexity of n+O(1) on average. The n lower bound is still true for both cases, and hence they are optimal up to an additive constant.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset