Poisson Approximation in Biological Context

Number: 
2
Year: 
2006
Authors: 
Nicolas Vergne
Miguel Abadi
Abstract: 

Using recent results on the occurrence times of a string of symbols in a stochastic process with mixing properties, we present a new method for finding rare words in biological sequences, which are generally modelled by a Markov chain. We obtain a bound on the error between the law of the number of occurrences of a word in a sequence (under a Markov model) and its Poisson approximation. A global bound is already provided by the Chen-Stein method. Our method, the $\psi$-mixing method, gives local bounds. We look for a threshold number of occurrences from which the studied word can be regarded as a rare word. If the word appears more often than this number in the biological sequence, we conclude that it is an overrepresented word and hypothesize that it has a biological role. Our method always gives a limit number, which was not possible with the Chen-Stein method. Comparing the two methods, we observe better accuracy for the $\psi$-mixing method in bounding the tails of the distribution. We also present the software PANOW, dedicated to computing the error term and the limit number of occurrences for a studied word.
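
To make the idea of a limit number of occurrences concrete, here is a minimal Python sketch. It is not the PANOW implementation and it ignores the error term that the $\psi$-mixing and Chen-Stein bounds control: it assumes the word count N is exactly Poisson with a known mean, and returns the smallest k such that P(N >= k) is at most a chosen level. The function names, the mean 2.5, and the level 0.01 are all hypothetical choices made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_upper_tail(k, lam):
    """P(N >= k) for N ~ Poisson(lam), summing the upper terms directly."""
    if k <= 0:
        return 1.0
    total = 0.0
    j = k
    term = poisson_pmf(j, lam)
    while True:
        total += term
        j += 1
        term *= lam / j  # recurrence: pmf(j) = pmf(j - 1) * lam / j
        # Stop once past the mode and the next term is negligible.
        if j > lam and term <= 1e-17 * total:
            break
    return total


def occurrence_threshold(lam, alpha):
    """Smallest k with P(N >= k) <= alpha (plain Poisson tail, no error term)."""
    if lam <= 0 or not (0 < alpha < 1):
        raise ValueError("need lam > 0 and 0 < alpha < 1")
    k = 0
    while poisson_upper_tail(k, lam) > alpha:
        k += 1
    return k


if __name__ == "__main__":
    lam = 2.5     # hypothetical expected number of occurrences of the word
    alpha = 0.01  # hypothetical significance level
    k = occurrence_threshold(lam, alpha)
    print(f"threshold k = {k}, tail = {poisson_upper_tail(k, lam):.6f}")
```

Under these assumptions, a word observed at least k times would be flagged as a candidate overrepresented word. A method that accounts for the approximation error would have to add the corresponding bound to the Poisson tail before comparing it with the level.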

Keywords: 
Poisson approximation
Chen-Stein method
mixing processes
Markov chains
rare words
DNA sequences
Note: 
submitted 01/06