MODELTEST: testing the model of DNA substitution.

BIOINFORMATICS APPLICATIONS NOTE

D. Posada and K.A. Crandall

Abstract

Summary: The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data.

Availability: The MODELTEST package, including the source code and some documentation, is available at http://bioag.byu.edu/zoology/crandall_lab/modeltest.html.

Contact: dp47@email.byu.edu

All phylogenetic methods make assumptions, whether explicit or implicit, about the process of DNA substitution (Felsenstein, 1988). For example, an assumption common to many phylogenetic methods is a bifurcating tree to describe the phylogeny of species (Huelsenbeck and Crandall, 1997). Consequently, all the methods of phylogenetic inference depend on their underlying models. To have confidence in inferences it is necessary to have confidence in the models (Goldman, 1993). Because of this, all the methods based on explicit models of evolution should explore which model fits the data best, thereby justifying its use. In traditional statistical theory, a widely accepted statistic for testing the goodness of fit of models is the likelihood ratio test statistic $\delta = -2 \log \Lambda$, where

$$\Lambda = \frac{\max[L_0(\text{Null Model} \mid \text{Data})]}{\max[L_1(\text{Alternative Model} \mid \text{Data})]}$$

and $L_0$ is the likelihood under the null hypothesis (simple model) and $L_1$ is the likelihood under the alternative hypothesis (more complex, parameter-rich model). When the models compared are nested (the null hypothesis is a special case of the alternative hypothesis), and the null hypothesis is correct, the $\delta$ statistic is asymptotically distributed as $\chi^2$ with q degrees of freedom, where q is the difference in number of free parameters between the two models; equivalently, q is the number of restrictions on the parameters of the alternative hypothesis required to derive the particular case of the null hypothesis (Kendall and Stuart, 1979). To preserve the nesting of the models, the likelihood scores are estimated using the same tree, and then, once the models have been compared, a final tree is estimated using the chosen model of evolution. When the models are not nested, an alternative means of generating the null distribution of the $\delta$ statistic is through Monte Carlo simulation (parametric bootstrapping) (Goldman, 1993).

Another way of comparing different models without the nested requirement is the Akaike information criterion (minimum theoretical information criterion, AIC) (Akaike, 1974). The AIC is a useful measure that rewards models for good fit, but imposes a penalty for unnecessary parameters (e.g. Hasegawa, 1990). If L is the maximum value of the likelihood function for a specific model using n independently adjusted parameters within the model, then $\mathrm{AIC} = -2 \ln L + 2n$. Smaller values of AIC indicate better models.

MODELTEST is a simple program written in ANSI C and compiled for the Power Macintosh using Metrowerks CodeWarrior. It is designed to compare different nested models of DNA substitution in a hierarchical hypothesis-testing framework (Figure 1). MODELTEST calculates the likelihood ratio test statistic $\delta = -2 \log \Lambda$ and its associated P-value using a $\chi^2$ distribution with q degrees of freedom in order to reject or fail to reject different null hypotheses about the process of DNA substitution. It also calculates the AIC estimate associated with each likelihood score.

The user communicates with the program using a standard console interface, where the input and output files as well as some options and help can be specified. By default, the program will accept two classes of input files: a file containing ordered raw log likelihood scores corresponding to the tested models (see Figure 1) or a PAUP* (Swofford, 1998) file containing a matrix of the same log likelihood scores resulting from the execution of a block of PAUP* commands. This block of PAUP* commands is available in the documentation. When specified, the program can also read a file with likelihood scores for identifying the minimum AIC estimate. The output of MODELTEST consists of the P-values corresponding to the tests performed. In these tests the null hypotheses are equal base frequencies, transition rate equals transversion rate, equal transition rates and equal transversion rates, rates equal among sites and no invariable sites. Finally, the program interprets these P-values and chooses the model that fits the data best among those tested following the likelihood ratio test and/or AIC criteria, using a default individual alpha value of 0.01 (for maintaining an overall alpha value of 0.05, the standard Bonferroni correction, alpha/number of tests, results in an individual alpha value of 0.01), or another value specified by the user.

Fig. 1. Hierarchical hypothesis testing in MODELTEST. At each level the null hypothesis (upper model) is either accepted (A) or rejected (R). The models of DNA substitution are: JC (Jukes and Cantor, 1969), K80 (Kimura, 1980), SYM (Zharkikh, 1994), F81 (Felsenstein, 1981), HKY (Hasegawa et al., 1985), and GTR (Rodríguez et al., 1990). Γ: shape parameter of the gamma distribution; I: proportion of invariable sites; df: degrees of freedom. Base frequencies: equal base frequencies (0.25); π_A: frequency of adenine, π_C: frequency of cytosine, π_G: frequency of guanine, π_T: frequency of thymine. Substitution rates: ρ: equal substitution rate, α: transition rate, β: transversion rate; μ_1: A⇒C rate, μ_2: A⇒G rate, μ_3: A⇒T rate, μ_4: C⇒G rate, μ_5: C⇒T rate, μ_6: G⇒T rate.

Acknowledgements

This project was supported by a fellowship from Caixagalicia Foundation (D.P.), the Alfred P. Sloan Foundation (K.A.C.), and the National Institutes of Health (K.A.C.). We wish to thank the anonymous reviewers for their excellent suggestions.

References

Akaike,H. (1974) A new look at the statistical model identification. IEEE Trans. Autom. Contr., 19, 716–723.
Felsenstein,J. (1988) Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet., 22, 521–565.
Goldman,N. (1993) Statistical tests of models of DNA substitution. J. Mol. Evol., 36, 182–198.
Hasegawa,M. (1990) Phylogeny and molecular evolution in primates. Jpn J. Genet., 65, 243–265.
Hasegawa,M., Kishino,H. and Yano,T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 21, 160–174.
Huelsenbeck,J.P. and Crandall,K.A. (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst., 28, 437–466.
Jukes,T.H. and Cantor,C.R. (1969) Evolution of protein molecules. In Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York, pp. 21–132.
Kendall,M. and Stuart,A. (1979) The Advanced Theory of Statistics, Vol. 2, 4th edn. Charles Griffin, London, pp. 240–252.
Kimura,M. (1980) A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.
Rodríguez,F.J., Oliver,J.L., Marín,A. and Medina,J.R. (1990) The general stochastic model of nucleotide substitution. J. Theor. Biol., 142, 485–501.
Swofford,D.L. (1998) PAUP*: phylogenetic analysis using parsimony (and other methods). Version 4.0 (prerelease test version). Sinauer, Sunderland, Massachusetts (in press).
Zharkikh,A. (1994) Estimation of evolutionary distances between nucleotide sequences. J. Mol. Evol., 9, 315–329.
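
As a minimal sketch of the calculations described in the text (the likelihood ratio test between two nested models, the AIC, and the Bonferroni-corrected alpha), the following Python fragment may be useful. It is not the MODELTEST source, which is written in ANSI C. The function names and the log likelihood scores are hypothetical and chosen only for illustration. The parameter counts assume that only substitution-model parameters are counted, since both models are scored on the same tree. The chi-square tail probability uses a closed-form recurrence that is valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        k += 2
        q += half ** (k / 2.0 - 1.0) * math.exp(-half) / math.gamma(k / 2.0)
    return q


def lrt(lnl_null, lnl_alt, df):
    """Return (delta, P-value), with delta = -2 log(L0 / L1) = 2 (lnL1 - lnL0)."""
    delta = 2.0 * (lnl_alt - lnl_null)
    return delta, chi2_sf(delta, df)


def aic(lnl, n_params):
    """AIC = -2 ln L + 2n; smaller values indicate better models."""
    return -2.0 * lnl + 2.0 * n_params


# Hypothetical log likelihood scores (not taken from any real data set).
lnl_jc = -2000.0   # JC: equal base frequencies, no free model parameters
lnl_f81 = -1990.0  # F81: three free base frequencies

# Bonferroni correction: overall alpha 0.05 spread over the five tests
# listed in the text gives the default individual alpha of 0.01.
alpha = 0.05 / 5

delta, p_value = lrt(lnl_jc, lnl_f81, df=3)
print("delta =", delta, "P =", p_value)
print("reject equal base frequencies:", p_value < alpha)
print("AIC JC =", aic(lnl_jc, 0), "AIC F81 =", aic(lnl_f81, 3))
```

With scores like these the null hypothesis of equal base frequencies would be rejected at the 0.01 level, and the AIC would also favour the more parameter-rich model. Real use would take the scores from PAUP* output rather than hard-coded values.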

Bioinformatics, Volume 14 (9): 2 – Jan 1, 1998

Publisher: Oxford University Press
Copyright: © Published by Oxford University Press.
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/14.9.817
Journal: Bioinformatics, Oxford University Press

Published: Jan 1, 1998