Low-redundancy codes for correcting multiple short-duplication and edit errors
Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most p edits, where a short duplication generates a copy of a substring with length ≤ 3 and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to p edits (in addition to duplications) at the additional cost of roughly 8p(log_q n)(1+o(1)) symbols of redundancy, thus achieving the same asymptotic rate, where q≥ 4 is the alphabet size and p is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when p is a constant with respect to the code length.
READ FULL TEXT