Coding over Sets for DNA Storage
In this paper we study error-correcting codes for the storage of data in synthetic DNA. We investigate a storage model where a data set is represented by an unordered set of M sequences, each of length L. Errors in this model are losses of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We propose code constructions that correct such errors and that can be encoded and decoded efficiently. By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, we show that many of our codes are close to optimal.
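The storage model described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's construction: the alphabet {A, C, G, T}, the function name `store_and_read`, and its parameters are hypothetical, and only sequence losses and substitutions are modeled (insertions and deletions are left out for brevity).

```python
import random

ALPHABET = "ACGT"  # assumption: quaternary DNA alphabet, not stated in the abstract


def store_and_read(stored, num_lost, num_substitutions, rng):
    """Hypothetical channel sketch: lose whole sequences, substitute symbols,
    and return the result as an unordered set (all ordering is discarded)."""
    sequences = sorted(stored)
    rng.shuffle(sequences)
    # Loss of whole sequences: drop the first num_lost after shuffling.
    survivors = [list(s) for s in sequences[num_lost:]]
    # Point errors: each substitution hits a random symbol of a random survivor.
    if survivors:
        for _ in range(num_substitutions):
            seq = rng.choice(survivors)
            pos = rng.randrange(len(seq))
            seq[pos] = rng.choice([c for c in ALPHABET if c != seq[pos]])
    # The reader sees a set: no order, and colliding sequences merge into one.
    return frozenset("".join(s) for s in survivors)


rng = random.Random(0)
stored = frozenset({"ACGT", "AACC", "GGTT", "TTAA"})  # M = 4 sequences, L = 4
received = store_and_read(stored, num_lost=1, num_substitutions=2, rng=rng)
print(sorted(received))
```

Returning a `frozenset` reflects the unordered nature of the model: the decoder cannot rely on the position of a sequence within the data set, and two sequences that become identical after errors are indistinguishable from one.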