Investigating some attributes of periodicity in DNA sequences via semi-Markov modelling
DNA segments and sequences have been studied thoroughly during the past decades. One of the main problems in computational biology is the identification of exon-intron structures inside genes using mathematical techniques. Previous studies have used different methods, such as Fourier analysis and hidden-Markov models, in order to be able to predict which parts of a gene correspond to a protein encoding area. In this paper, a semi-Markov model is applied to 3-base periodic sequences, which characterize the protein-coding regions of the gene. Analytic forms of the related probabilities and the corresponding indexes are provided, which yield a description of the underlying periodic pattern. Last, the previous theoretical results are illustrated with DNA sequences of synthetic and real data.
READ FULL TEXT