The Use of the Expectation–Maximization (EM) Algorithm for Maximum Likelihood Estimation of Gametic Frequencies of Multilocus Polymorphic Codominant Systems Based on Sampled Population Data

Sergeev, A.; Agapova, R.
2004-10-13
Estimating gametic frequencies in multilocus polymorphic systems from the numerical distribution of multilocus genotypes in a population sample (“analysis without pedigrees”) is difficult because some gametes cannot be identified in the data. Even in codominant systems, where every allele can be recognized from the genotype so that allele frequencies can be estimated directly (“complete data”), the frequencies of multilocus gametes sometimes cannot be estimated from multilocus genotype data, whether population data or even family data collected for studying genotypic segregation or linkage are used (“incomplete data”). Such “incomplete data” are analyzed with the expectation–maximization (EM) algorithm on the basis of the corresponding genetic models. In this study, an EM algorithm based on the random-mating model for a nonsubdivided population was used to estimate gametic frequencies. The algorithm places no limit on the number of loci or on the number of alleles per locus. Loci and alleles are identified by number, which makes it possible to process them in loops. For a given combination of m out of L loci (L is the total number of loci studied) and a given combination of alleles at these loci, the alleles in the combination are assigned the value 1 and all remaining alleles are assigned the value 0. The sum of zeros and ones for a gamete is its gametic value (h), and the sum of the gametic values of the two gametes that form a genotype is the genotypic value (g) of that genotype. Gametes with the same h are then pooled into a single class, which reduces the number of parameters to be estimated. In the general case of m loci, this procedure yields m + 1 classes of gametes and 2m + 1 classes of genotypes with genotypic values g = 0, 1, 2, ..., 2m.
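The coding into gametic and genotypic values described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the function names, the dictionary representation of gametes and genotypes, the locus labels "MN" and "Ss", and the three-individual toy sample are all hypothetical. The point it shows is that g can be read off an unphased codominant genotype, because counting the target alleles at each locus gives the same total as adding the gametic values of the two (unknown) gametes.

```python
from collections import Counter

def gametic_value(gamete, target):
    """h: number of chosen loci at which the gamete carries the target allele.

    gamete: dict locus -> allele; target: dict locus -> allele (the m chosen loci).
    """
    return sum(1 for locus, allele in target.items() if gamete[locus] == allele)

def genotypic_value(genotype, target):
    """g: number of target alleles in an unphased genotype (0..2m).

    genotype: dict locus -> (allele, allele).
    """
    return sum(genotype[locus].count(allele) for locus, allele in target.items())

def genotype_class_counts(genotypes, target):
    """n_g for g = 0..2m: individuals regrouped by genotypic value."""
    m = len(target)
    tally = Counter(genotypic_value(g, target) for g in genotypes)
    return [tally.get(g, 0) for g in range(2 * m + 1)]

# Hypothetical toy data: two loci, target allele combination (N, s).
target = {"MN": "N", "Ss": "s"}
sample = [
    {"MN": ("M", "N"), "Ss": ("s", "s")},  # g = 1 + 2 = 3
    {"MN": ("M", "M"), "Ss": ("S", "S")},  # g = 0
    {"MN": ("N", "N"), "Ss": ("S", "s")},  # g = 2 + 1 = 3
]
print(gametic_value({"MN": "N", "Ss": "s"}, target))  # 2
print(genotype_class_counts(sample, target))          # [1, 0, 0, 2, 0]
```

With m = 2 chosen loci the sketch returns 2m + 1 = 5 genotype classes, matching the class count given in the text.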
The unknown frequencies of the m + 1 classes of gametes can be expressed as functions of the gametic frequencies whose maximum likelihood estimates (MLEs) were obtained in all previous EM procedures and of the single unknown frequency, $P_m(m)$, that is to be estimated in the given EM procedure. At the expectation step, the expected frequencies $F_m(g)$ of the genotypes with genotypic value g are expressed in terms of products of the frequencies of the m + 1 classes of gametes. The genotype data are the numbers $n_g$ of individuals with genotypic values g = 0, 1, 2, 3, ..., 2m. The maximization step maximizes the logarithm of the likelihood function (LLF) for the $n_g$ values. Thus, in each case the EM algorithm reduces to solving a single equation in one unknown parameter using the $n_g$ values, i.e., the numbers of individuals obtained after the corresponding regrouping of the individuals' genotype data. Treatment of the data obtained by Kurbatova on the MNSs system and on the Rhesus system (alleles $C$, $C^w$, $c$, $D$, $d$, $E$, $e$) with Weir's EM algorithm and with the EM algorithm suggested in this study yielded similar results. However, the MLEs obtained with either algorithm often converged to a wrong solution: the sum of the frequencies of all gametes (4 and 12 gametes for MNSs and Rhesus, respectively) was not equal to 1.0, even when the global maximum of the LLF was reached for each of them (as it was for MNSs with Weir's EM algorithm) and each parameter fell within its admissible limits (e.g., from 0 to $\min(P_N, P_s)$ for $P_{Ns}$). The χ² function is proposed as a goodness-of-fit function for the distribution of genotypes in a sample, in order to select acceptable solutions. However, the minimum of this function guarantees an acceptable solution only if all constraints on the parameters are met: the estimated gametic frequencies sum to 1.0, each frequency falls within its admissible limits, and the “gametic algebra” is satisfied (none of the frequencies is negative).
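The one-parameter estimation described above can be illustrated for the simplest case of m = 2 loci. The sketch below is an assumption-laden illustration rather than the authors' procedure: it takes the two target-allele frequencies `p_a` and `p_b` as already estimated, writes the three gamete class frequencies in terms of the single unknown `x` (playing the role of $P_2(2)$), forms the expected genotype class frequencies as sums of products of gamete class frequencies under random mating, and maximizes the LLF by a plain grid search over the admissible interval instead of solving the likelihood equation. All function names and the numerical counts are hypothetical; the counts were constructed to equal the expected values at x = 0.3.

```python
import math

def class_frequencies(p_a, p_b, x):
    """Gamete class frequencies P(h), h = 0, 1, 2, given the unknown x = P(2)."""
    return [1.0 - p_a - p_b + x, (p_a - x) + (p_b - x), x]

def expected_genotype_frequencies(p):
    """F(g) = sum of P(h) * P(h') over all h + h' = g (random mating)."""
    f = [0.0] * (2 * len(p) - 1)
    for h1, a in enumerate(p):
        for h2, b in enumerate(p):
            f[h1 + h2] += a * b
    return f

def log_likelihood(counts, f):
    """LLF = sum of n_g * log F(g); -inf if an observed class has F(g) <= 0."""
    total = 0.0
    for n, fg in zip(counts, f):
        if n == 0:
            continue
        if fg <= 0.0:
            return -math.inf
        total += n * math.log(fg)
    return total

def fit_two_locus(counts, p_a, p_b, steps=2000):
    """Grid search for x on [max(0, p_a + p_b - 1), min(p_a, p_b)]."""
    lo, hi = max(0.0, p_a + p_b - 1.0), min(p_a, p_b)
    best_x, best_ll = lo, -math.inf
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        f = expected_genotype_frequencies(class_frequencies(p_a, p_b, x))
        ll = log_likelihood(counts, f)
        if ll > best_ll:
            best_x, best_ll = x, ll
    return best_x, best_ll

def chi_square(counts, f):
    """Goodness of fit between observed n_g and expected N * F(g)."""
    n = sum(counts)
    return sum((c - n * fg) ** 2 / (n * fg) for c, fg in zip(counts, f) if fg > 0)

# Hypothetical counts n_g for g = 0..4, built from x = 0.3, p_a = 0.5, p_b = 0.4.
counts = [1600, 2400, 3300, 1800, 900]
x_hat, ll = fit_two_locus(counts, p_a=0.5, p_b=0.4)
fitted = expected_genotype_frequencies(class_frequencies(0.5, 0.4, x_hat))
print(x_hat, chi_square(counts, fitted))  # x_hat is close to 0.3, chi-square near 0
```

Within one such procedure the three class frequencies sum to 1 by construction. The concern raised in the text is about the sum over all gametes estimated in separate procedures, which this sketch does not address; a check of that sum and of the admissible limits would have to be added on top of the χ² comparison.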

Journal

Russian Journal of Genetics (Springer Journals)

Published: Oct 13, 2004

References

Abundant Class of Human DNA Polymorphisms, Which Can be Typed Using the Polymerase Chain Reaction

Weber, J.L.; May, P.E.

A Second-Generation Linkage Map of the Human Genome

Weissenbach, J.; Gyapay, G.; Dib, C.

An E-M Algorithm and Testing Strategy for Multiple-Locus Haplotypes

Long, J.C.; Williams, R.C.; Urbanek, M.

The Estimation of Gene Frequencies in a Random Mating Population

Ceppellini, R.; Siniscalco, M.; Smith, C.A.B.

Counting Methods in Genetical Statistics

Smith, C.A.B.

A Gene Counting Method of Maximum Likelihood for Estimating Gene Frequencies in ABO and ABO-like Systems

Yasuda, N.; Kimura, M.

Genetic Data Analysis

Weir, B.S.

Maximum Likelihood from Incomplete Data via the EM Algorithm

Dempster, A.P.; Laird, N.M.; Rubin, D.B.

Demonstration of the Adaptive Role of Human Blood Group Polymorphism by Analysis of Several Loci: Heterozygosity Levels and Phenotypic Diversity in Two Generations

Kurbatova, O.L.
