Analysis of Variance for Hamming Distances Applied to Unbalanced Designs

Number:
30
Year:
2001
Authors:
Hildete P. Pinheiro
Françoise Seillier-Moiseiwitsch
Pranab Kumar Sen
Abstract: 

Our interest is in the between- and within-group comparison of genomic sequences. All possible pairwise comparisons within and across groups are performed. Thus, unlike analyses that rely on measures of diversity (such as the Gini-Simpson index), this approach considers sequences on an individual basis. We develop a categorical analysis-of-variance framework for Hamming distances. This metric measures the proportion of positions at which two aligned sequences differ. We assume that the sequences are distantly related, but we do not require positions along the genome to be independent. The total sum of squares is decomposed into within-, between- and across-group expressions; the across-group term does not appear in the classical set-up. The theory of generalized U-statistics is used to derive the asymptotic distribution of each sum of squares, and test statistics are constructed to assess homogeneity among groups.
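The pairwise comparisons described above can be illustrated with a short Python sketch. It computes the Hamming distance as a proportion of differing aligned positions, then averages it over all within-group pairs (a one-sample U-statistic of degree 2) and over all across-group pairs (a two-sample U-statistic). The function names and the toy sequences are hypothetical, chosen for illustration only; this sketch does not reproduce the report's sums of squares, their asymptotic distributions, or its test statistics.

```python
from itertools import combinations


def hamming_proportion(a: str, b: str) -> float:
    """Proportion of aligned positions at which sequences a and b differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def within_group_mean(group: list[str]) -> float:
    """Mean distance over all unordered pairs in one group (one-sample U-statistic)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("a group needs at least two sequences")
    return sum(hamming_proportion(a, b) for a, b in pairs) / len(pairs)


def across_group_mean(g1: list[str], g2: list[str]) -> float:
    """Mean distance over all pairs taking one sequence from each group
    (two-sample U-statistic)."""
    total = sum(hamming_proportion(a, b) for a in g1 for b in g2)
    return total / (len(g1) * len(g2))


# Hypothetical toy data: two small groups of aligned sequences.
groups = {
    "A": ["ACGTAC", "ACGTTC", "ACCTAC"],
    "B": ["TCGTAG", "TCGAAG"],
}

if __name__ == "__main__":
    for name, seqs in groups.items():
        print(name, f"{within_group_mean(seqs):.4f}")  # A 0.2222, then B 0.1667
    print("A-B", f"{across_group_mean(groups['A'], groups['B']):.4f}")  # A-B 0.5278
```

In this toy example the across-group mean distance exceeds both within-group means, which is the kind of pattern a homogeneity test is meant to detect. Note that pairs are unordered within a group but every cross pair is counted once across groups, which is why the two averages use different denominators.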
