This unrooted tree of population relationships is the result of an interim analysis of our accumulating data set as of September 2000. The data consist of allele (haplotype) frequencies at 23 independent loci in each of the 30 populations. The data on many of these loci are or will soon be available in ALFRED. Fourteen of the loci had complete data on more than one site across the locus, and haplotype frequencies at those loci were estimated using HAPLO (Hawley & Kidd, 1995). Allele frequencies at the other nine loci, most of them biallelic polymorphisms, were calculated by simple gene counting. For the loci with multiple sites, many haplotypes were present, or estimated to be present, at quite low frequencies and only sporadically among populations. Such haplotypes contribute little information on population relationships; consequently, all haplotypes that were never seen at a frequency greater than 5% in any population were pooled into a single residual allele (haplotype) class. Even after that pooling, the data set used to calculate the genetic distances contains 145 statistically independent alleles, counted as $\sum_i (n_i - 1)$, where $n_i$ is the number of alleles (haplotypes) at the $i$th locus.
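The pooling rule and the count of independent alleles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the population and haplotype labels with their frequencies are hypothetical and are not taken from the actual data set or from HAPLO.

```python
def pool_rare(freqs_by_pop, threshold=0.05):
    """Pool haplotypes at one locus that never exceed `threshold` in any population.

    freqs_by_pop: {population: {haplotype: frequency}} (hypothetical layout).
    Returns the same layout with rare haplotypes merged into a "residual" class.
    """
    haplotypes = {h for freqs in freqs_by_pop.values() for h in freqs}
    # Keep a haplotype if at least one population carries it above the threshold.
    keep = {h for h in haplotypes
            if any(freqs.get(h, 0.0) > threshold for freqs in freqs_by_pop.values())}
    pooled = {}
    for pop, freqs in freqs_by_pop.items():
        classes = {h: freqs.get(h, 0.0) for h in sorted(keep)}
        if keep != haplotypes:
            # Everything that was not kept goes into one residual class.
            classes["residual"] = sum(v for h, v in freqs.items() if h not in keep)
        pooled[pop] = classes
    return pooled


def count_independent_alleles(loci):
    """Sum of (n_i - 1) over loci, where n_i is the number of allele classes."""
    total = 0
    for locus in loci:
        n_i = len(next(iter(locus.values())))  # classes are identical across populations
        total += n_i - 1
    return total


# Made-up example: one multi-site locus (haplotypes) and one biallelic locus.
locus_1 = {"popA": {"H1": 0.60, "H2": 0.36, "H3": 0.03, "H4": 0.01},
           "popB": {"H1": 0.50, "H2": 0.46, "H4": 0.04}}
locus_2 = {"popA": {"A": 0.7, "G": 0.3},
           "popB": {"A": 0.4, "G": 0.6}}

pooled_loci = [pool_rare(locus_1), pool_rare(locus_2)]
independent = count_independent_alleles(pooled_loci)
print(pooled_loci[0]["popA"])
print(independent)
```

In this toy example H3 and H4 never exceed 5% in either population, so the first locus is left with three classes (H1, H2, residual) and contributes two independent alleles; the biallelic locus contributes one.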
The genetic distance calculated from these allele frequencies was the tau measure defined by Kidd & Cavalli-Sforza (1974) as $\tau = -\ln(1 - F_{ST})$. For diverging populations, this measure is theoretically linear in $t/2N_e$. Each tree structure can be represented as a set of linear equations, one per pair of populations, in which the lengths of the segments connecting the pair sum to their pairwise genetic distance. The tree illustrated is the best one found for representing the pairwise distances. By "best" we mean that it had the smallest sum of squared error terms across those linear equations; that sum is the quantity minimized by a least squares solution to the set of equations, and it was also used to distinguish between different tree structures. This tree is also the tree with all positive segment lengths that has the smallest sum of those segment lengths, i.e., it is the most parsimonious. Though there is no guarantee that this is the best possible fit to the genetic distances, the search routines used indicate it should be at least close to the best. Bootstrap analyses have not been done on this tree, but analyses done on a subset of these data and a subset of these populations showed extremely high bootstrap support for the basic structure of the tree: the African and European populations form two distinct clusters at one end of a long central branch, the East Asian and Amerindian populations form two distinct clusters at the other end, and the Melanesians come off the long central branch. The arrangements of populations within each of those four clusters, however, cannot be considered to have strong statistical support. Note also that the clusters correspond to geographic regions. The dataset contains few populations that are geographically intermediate, and where they occur (Ethiopian Jews and Siberian Yakut) they tend to be placed in more "intermediate" positions in the tree.
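As a rough illustration of the least squares criterion, the sketch below converts pairwise $F_{ST}$ values to tau distances and fits segment lengths for two alternative four-population tree structures, then compares their sums of squared errors. It is not the search routine used for the tree shown here: the four populations, their $F_{ST}$ values, and all function names are hypothetical, and the fit is a plain unconstrained least squares solution that does not enforce positive segment lengths.

```python
import math
from itertools import combinations


def tau(fst):
    """Tau distance of Kidd & Cavalli-Sforza (1974): -ln(1 - Fst)."""
    return -math.log(1.0 - fst)


def design_matrix(cherry):
    """One row per population pair; columns are 4 terminal segments + 1 internal segment.

    `cherry` is the set of two populations (indices 0..3) on one side of the internal segment.
    """
    rows = []
    for i, j in combinations(range(4), 2):
        row = [0.0] * 5
        row[i] = row[j] = 1.0
        # The internal segment is on the path only if i and j are on opposite sides.
        if (i in cherry) != (j in cherry):
            row[4] = 1.0
        rows.append(row)
    return rows


def solve_least_squares(A, d):
    """Solve the normal equations (A^T A) x = A^T d by Gauss-Jordan elimination."""
    n, m = len(A[0]), len(A)
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         + [sum(A[k][i] * d[k] for k in range(m))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_tree(cherry, dists):
    """Return (segment lengths, sum of squared errors) for one tree structure."""
    A = design_matrix(cherry)
    lengths = solve_least_squares(A, dists)
    sse = sum((sum(a * x for a, x in zip(row, lengths)) - d) ** 2
              for row, d in zip(A, dists))
    return lengths, sse


# Hypothetical pairwise Fst values, in the order (0,1),(0,2),(0,3),(1,2),(1,3),(2,3).
fst_values = [0.05, 0.15, 0.16, 0.16, 0.17, 0.09]
distances = [tau(f) for f in fst_values]

lengths_ab, sse_ab = fit_tree({0, 1}, distances)  # structure ((0,1),(2,3))
lengths_ac, sse_ac = fit_tree({0, 2}, distances)  # structure ((0,2),(1,3))
print(sse_ab, sse_ac)
```

With these made-up values the structure that groups populations 0 and 1 fits the distances far better than the alternative, which is the sense in which the sum of squared errors distinguishes between tree structures.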
For more discussion see "Interpretation of Phylogenetic Trees"
© 1999 Kenneth K. Kidd, Yale University.
All rights reserved.
These figures may be reproduced for classroom use only.