1022-7954/05/4107- © 2005 Pleiades Publishing, Inc.
Russian Journal of Genetics, Vol. 41, No. 7, 2005, pp. 808–813. Translated from Genetika, Vol. 41, No. 7, 2005, pp. 990–996.
Original Russian Text Copyright © 2005 by Axenovich, Kirichenko.
Allele frequencies are the main characteristic of
population genetic structure. If a population is large and
in equilibrium, these frequencies are easy to determine.
However, most natural populations have finite sizes, and
their structure is being changed by various microevolu-
tionary processes. Migration from other populations
differing in genetic structure is one of these processes.
Several problems requiring the estimation of allele fre-
quencies can be solved by analysis of mixed popula-
tions formed as a result of migration. The assumption
that the genetic structure of the original
population before the start of migration processes is
known is common to most such analyses, including
the analysis of changes in population genetic structure
with time and its dependence on migration rate [1, 2],
as well as the determination of the origin or the hybrid
status of individuals from the mixed population. The
latter problem arises when dealing with the mapping of
complex traits in humans [3, 4] and in animal popula-
tion genetics, e.g., the study of hybrid zones [5, 6].
There is also the reverse problem, where the data on
the genetic structure of a mixed population and popula-
tion origin of individuals are used to reconstruct the
allele frequencies in the original population before the
start of migration. These estimations are especially
important for human populations. They make it possi-
ble to study the evolutionary history of human popula-
tions and solve problems related to the genetic control
of complex traits characterized by considerable genetic
heterogeneity. Genetic heterogeneity is substantially
smaller within ethnic groups; therefore, they are
regarded as unique material for identifying the genes
that determine complex traits, such as widespread
genetic diseases [7–9]. The genes are mapped by ana-
lyzing linkage disequilibrium with various genetic
markers; precise estimation of the allele frequencies
of marker genes in representatives of the ethnic group
studied is crucial for adequate mapping [10, 11].
Precision of an estimate implies that it is unbiased
(with random rather than systematic deviations from the
general population mean value) and has a variance small
enough for testing the significance of differences
between populations with respect to allele frequencies.
These properties of the estimate are ensured by the
appropriate choice of the statistical method for
obtaining it and by the characteristics of the sample used for
analysis. Maximum likelihood (ML) estimates of fre-
quencies are known to be the best in this respect.
However, to ensure these properties, it is necessary to
write the likelihood in such a manner that the method of
sampling is taken into account (i.e., the form of the
function should reflect whether the sample is random or
selected according to a rule, whether observations are
independent or correlated, and whether the data belong
to the same or different distributions). It is also neces-
sary to maximize the informativeness of the sample, which limits
the estimation precision.
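
As a point of reference for the properties discussed above, the simplest textbook case can be sketched in a few lines. The sketch assumes a random sample of unrelated individuals from a single homogeneous population in Hardy-Weinberg equilibrium, typed at a codominant diallelic locus; under these assumptions the ML estimate of the allele frequency is the allele count divided by 2N, and its variance is p(1 - p)/(2N). This is not the method developed in this paper for ethnically heterogeneous samples, and the function names are hypothetical.

```python
from math import ceil, sqrt


def ml_allele_frequency(n_aa, n_ab, n_bb):
    """Return (p_hat, standard_error) for allele A from genotype counts.

    Assumes a random sample from one homogeneous population in
    Hardy-Weinberg equilibrium and a codominant diallelic locus.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("sample is empty")
    # Each AA individual carries two copies of A, each AB individual one.
    p_hat = (2 * n_aa + n_ab) / (2 * n)
    # Binomial variance of a proportion based on 2N sampled alleles.
    se = sqrt(p_hat * (1 - p_hat) / (2 * n))
    return p_hat, se


def sample_size_for_se(p, target_se):
    """Smallest number of individuals whose standard error is <= target_se."""
    if not 0 < p < 1 or target_se <= 0:
        raise ValueError("need 0 < p < 1 and target_se > 0")
    return ceil(p * (1 - p) / (2 * target_se ** 2))


if __name__ == "__main__":
    p_hat, se = ml_allele_frequency(30, 40, 30)
    print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
    print("individuals needed for SE <= 0.03:", sample_size_for_se(0.5, 0.03))
```

In this homogeneous case the required sample size depends only on the allele frequency and the target error; once the sample is ethnically heterogeneous, the likelihood has to reflect the origin of each individual, so these formulas no longer apply directly.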
Estimation of Allele Frequencies
in Ethnically Heterogeneous Populations
T. I. Axenovich and A. V. Kirichenko
Institute of Cytology and Genetics, Siberian Division, Russian Academy of Sciences, Novosibirsk, 630090 Russia;
fax: (3833) 33-12-78; e-mail: firstname.lastname@example.org
Received July 21, 2004
Abstract: A method for reconstructing allele frequencies characteristic of an original ethnically homogeneous
population before the start of migration processes is described. Information on both the ethnic group studied
and offspring of interethnic marriages is used to estimate the allele frequencies. This makes it possible to
increase the informativeness of the sample, which, in the case of ethnic heterogeneity, depends not only on
allele frequencies and the total sample size, but also on the ethnic structure of the sample. The problem of esti-
mating allele frequency in an ethnically heterogeneous sample has been solved analytically for diallelic loci. It
has been demonstrated that, if offspring of interethnic marriages with the same degree of outbreeding are added
to a sample of the ethnic group studied, the sample informativeness does not change. To utilize the information
contained in the phenotypes of the offspring of interethnic marriages, representatives of the population from which
migration occurs should be included in the sample. The size of the sample ensuring the preassigned accuracy of
estimation is minimized at a certain ratio between the numbers of the offspring of interethnic marriages and the
“immigrants.” To analyze polyallelic loci, a software package has been developed that allows estimating allele fre-
quencies, determining the errors of these estimates, and planning the sample ensuring the preassigned accuracy of
estimation. The package is available free at http://mga.bionet.nsc.ru/PopMixed/PopMixed.html.
MODELS AND METHODS