Parametric Modelling of Genomic Sequences Distance

Number: 
73
Year: 
2002
Authors: 
Aluísio Pinheiro
Pranab Kumar Sen
Hildete P. Pinheiro
Abstract: 

The paper considers the problem of testing homogeneity among groups by comparing genomic sequences. Two of the difficulties in this kind of analysis are specifically addressed here. First, genetic data encode information as categorical variables, and taken as a whole they exhibit strong dependence between genetic sites. Second, models are usually built on cleaned data (functional genetic material such as genes), while the rest of the data, which also carries information, is dismissed as useless. We proceed with emphasis on the available heuristic evidence of great diversity in the statistical distributions of these categorical data. A fully operational parametric statistical model is proposed. The model is flexible enough to be used across several different organisms and to accommodate both the usually dismissed material and the dependence between sites. Consistency of the estimators and of the derived test procedures is shown.

Keywords: 
DNA sequences
Categorical data
Dependent data
Genome
Exons
Introns
Codon
Nucleotide
Log-ratio test
Asymptotic normality
Fisher Information
Mathematics Subject Classification 2000 (MSC 2000): 
Primary: 62F12, 62F03; Secondary: 62M05, 92D20