ISSN 1022-7954, Russian Journal of Genetics, 2011, Vol. 47, No. 7, pp. 874–878. © Pleiades Publishing, Inc., 2011.
Original Russian Text © A.N. Evsyukov, I.Yu. Morozova, 2011, published in Genetika, 2011, Vol. 47, No. 7, pp. 986–990.
Researchers in population genetics are well aware of the
problem of comparing several populations whose sample
sizes differ substantially. If more than two populations
are compared, and at least one of them is represented
by a substantially smaller sample than the others,
mutual comparison of the genetic distances
often leads to unexpected results that are far from reality.
This problem is important for many population
genetic studies because, among other things, some
widely used methods of population genetic analysis,
such as multidimensional scaling and cluster analysis,
use data on genetic distances.
The essence of the problem of samples of different
sizes is that the smaller the sample, the less accurately
the sample frequencies approximate the frequencies of
genetic markers in the general population. Therefore,
the smaller the sample, the greater the error of the
genetic marker frequencies and the greater the standard
deviation of the estimated genetic distance to this sample.
As a result, if one of the compared samples is substantially
smaller than the other samples in the group,
the genetic distances from the other samples to this
sample will typically be overestimated compared with
the pairwise distances between the other samples.
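The inflation can be made explicit for a single allele under simplifying assumptions that are introduced here only for illustration: two independent samples of $n_1$ and $n_2$ allele copies are drawn binomially from populations with true allele frequencies $p_1$ and $p_2$, and $\hat{p}_1$, $\hat{p}_2$ denote the sample frequencies. The squared frequency difference is used as a simple stand-in for a genetic distance; it is not the distance measure used in this study.

```latex
\[
\mathrm{E}\!\left[(\hat{p}_1 - \hat{p}_2)^2\right]
  = (p_1 - p_2)^2 + \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}
\]
```

Both sampling terms are nonnegative, so the expected value never falls below the true squared difference, and the term belonging to the smaller sample dominates. For example, with $p_1 = p_2 = 0.5$, $n_1 = 20$, and $n_2 = 500$, the expected squared difference is $0.25/20 + 0.25/500 = 0.0125 + 0.0005 = 0.013$ although the true difference is zero; almost all of it comes from the small sample.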
Increasing the size of the problem sample or
removing it from the analysis is not always
acceptable as a way to solve the problem of samples
of different sizes. Another approach is to decrease
the sizes of all samples used in comparative analysis.
However, this approach also has at least one drawback:
much information on genetic polymorphism is lost for
analysis in this case. We assumed that the problem could
be solved by using multiple random permutations. The
main idea of this approach is to randomly reduce all
samples to the size of the smallest sample sufficiently
many times; this approach is similar to two well-known,
closely related statistical methods, jackknifing and
bootstrapping (see, e.g., [1, 2]). In our case, we calculated
the marker frequencies in the resultant samples
and estimated the pairwise genetic distances between
them after every random reduction of the sample sizes.
After repeating this procedure sufficiently many (one
thousand or more) times, we averaged the estimates of
the genetic distances and used the resultant mean values
in subsequent comparative analysis. Obviously,
reducing large samples to the size of the smallest sample
may change the frequency distribution of markers,
and the greater the reduction in sample size, the greater
the change. Therefore, the expected effect of this
method was a stochastic increase in genetic distances.
We assumed that this increase should be heterogeneous,
i.e., greatest for those populations whose samples
were originally the largest. We expected that the
estimates of genetic distances obtained by this method
would be adjusted appropriately for the sample sizes
and, hence, would better represent the differences
between populations and solve the problem of comparing
several populations at different sample sizes.
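The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Delphi software. It assumes a single locus, samples stored as plain lists of observed alleles, and subsampling without replacement. The function names (`allele_frequencies`, `euclidean_distance`, `mean_subsampled_distances`) are hypothetical. The distance measure is passed in as a parameter, and a plain Euclidean distance between frequency vectors is used only as a stand-in for whatever genetic distance is of interest.

```python
import random


def allele_frequencies(sample, alleles):
    """Return the frequency of each allele in `alleles` within `sample`."""
    n = len(sample)
    return [sample.count(a) / n for a in alleles]


def euclidean_distance(f, g):
    """Stand-in distance between two frequency vectors (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(f, g)) ** 0.5


def mean_subsampled_distances(samples, alleles, distance, n_reps=1000, seed=None):
    """Average pairwise distances over repeated reductions to the smallest size.

    samples  -- dict mapping population name to a list of observed alleles
    alleles  -- list of all alleles at the locus
    distance -- function taking two frequency vectors and returning a number
    """
    rng = random.Random(seed)
    n_min = min(len(s) for s in samples.values())
    names = sorted(samples)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    totals = dict.fromkeys(pairs, 0.0)

    for _ in range(n_reps):
        # Reduce every sample to n_min individuals, drawn without replacement.
        freqs = {
            name: allele_frequencies(rng.sample(samples[name], n_min), alleles)
            for name in names
        }
        # Accumulate the pairwise distances between the reduced samples.
        for a, b in pairs:
            totals[(a, b)] += distance(freqs[a], freqs[b])

    # The mean over all repetitions is the value used in later analysis.
    return {pair: total / n_reps for pair, total in totals.items()}
```

A call such as `mean_subsampled_distances(samples, ["a", "b"], euclidean_distance, n_reps=1000)` returns a dictionary keyed by population pairs. Because the smallest sample is always taken whole, only the larger samples vary between repetitions.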
To estimate the genetic distances by the method of
multiple permutations, we developed software in the
Borland Delphi 7.0 programming environment.
The software calculates Cavalli-Sforza's
pairwise angular genetic distances for any number of
populations, an arbitrary number of loci, and a
user-specified number of permutations.
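For orientation, an angular distance of this kind can be sketched in Python. The text does not specify which variant of the measure was implemented or how loci were combined, so the following is an assumption: the angle is taken as the arccosine of the sum of square roots of the products of allele frequencies at one locus, and the per-locus angles are simply averaged. All function names are hypothetical.

```python
import math


def angular_distance(freqs_a, freqs_b):
    """Angle in radians between two populations at a single locus.

    freqs_a, freqs_b -- allele frequencies at the same locus, each summing to 1.
    """
    cos_theta = sum(math.sqrt(p * q) for p, q in zip(freqs_a, freqs_b))
    # Rounding can push the cosine slightly above 1; clamp it before acos.
    return math.acos(min(1.0, cos_theta))


def mean_angular_distance(pop_a, pop_b):
    """Average the per-locus angles (an assumed way of combining loci).

    pop_a, pop_b -- lists of per-locus frequency vectors, in the same locus order.
    """
    per_locus = [angular_distance(a, b) for a, b in zip(pop_a, pop_b)]
    return sum(per_locus) / len(per_locus)


def distance_matrix(pops):
    """Pairwise distances for any number of populations.

    pops -- dict mapping population name to a list of per-locus frequency vectors.
    """
    return {
        (x, y): mean_angular_distance(pops[x], pops[y])
        for x in pops
        for y in pops
        if x != y
    }
```

Under this definition the angle is zero for identical frequency vectors and reaches its maximum of π/2 when the two populations share no alleles at the locus.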
RESULTS AND DISCUSSION
Let us illustrate the efficiency of the proposed method
by its application to actual data. We used seven ethnic
samples of humans from published studies containing
Calculation of Interpopulation Genetic Distances
at Different Sample Sizes
A. N. Evsyukov and I. Yu. Morozova
Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991 Russia
Received February 9, 2011
Abstract—A new method for comparison of interpopulation genetic distances is proposed. The method
allows a more precise location of populations in the space of genetic characters relative to one another at considerably
different sample sizes. The method consists in multiple reduction of the sample sizes to the size of
the smallest sample followed by averaging the calculated genetic distances between the reduced samples. Software
for calculation of genetic distances by this method is presented.
MODELS AND METHODS