DnaSP, DNA polymorphism analyses by the coalescent and other methods

Bioinformatics, Vol. 19 no. 18 2003, pages 2496–2497
APPLICATIONS NOTE
DOI: 10.1093/bioinformatics/btg359
© Oxford University Press 2003; all rights reserved.

Julio Rozas (1,*), Juan C. Sánchez-DelBarrio (2,†), Xavier Messeguer (2) and Ricardo Rozas

(1) Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08071 Barcelona, Spain
(2) Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain

* To whom correspondence should be addressed.
† Present address: Departament de Tecnología, Universitat Pompeu Fabra, Barcelona, Spain.

Received on February 28, 2003; revised on June 6, 2003; accepted on July 8, 2003

ABSTRACT

Summary: DnaSP is a software package for the analysis of DNA polymorphism data. The present version introduces several new modules and features which, among other options, allow: (1) handling big data sets (∼5 Mb per sequence); (2) conducting a large number of coalescent-based tests by Monte Carlo computer simulations; (3) extensive analyses of the genetic differentiation and gene flow among populations; (4) analysing the evolutionary pattern of preferred and unpreferred codons; (5) generating graphical outputs for an easy visualization of results.

Availability: The software package, including complete documentation and examples, is freely available to academic users from: http://www.ub.es/dnasp

Contact: jrozas@ub.edu

INTRODUCTION

Recent advances in DNA sequencing and polymorphism detection methodologies are generating huge data sets of DNA sequence variation and of single nucleotide polymorphisms (SNPs). Analysis of such DNA polymorphism data will definitively enhance our understanding of both the evolutionary significance of DNA polymorphisms and of the evolutionary history of populations and species (Nordborg and Innan, 2002). Additionally, DNA polymorphism information has a wide range of applications, including pharmacogenomics, animal and plant breeding, conservation genetics, epidemiology genetics, medicine and forensics.

Current massive data sets are stimulating the development of numerous methods to interpret DNA polymorphism data. These methods capture different features of the data (SNP frequency, association among variants, haplotype structure, synonymous and non-synonymous changes, recombinational events, codon usage, etc.) (Rosenberg and Nordborg, 2002; Bamshad and Wooding, 2003). In this context, the coalescent theory (see Hudson, 1990; Rosenberg and Nordborg, 2002) has become the primary framework to analyse the data. Indeed, coalescent-based methods are critical for detecting the signature of positive natural selection, for identifying haplotype blocks across the genome, and for inferring the effect of intragenic recombination. Here, we describe version 4 of the DnaSP software package (Rozas and Rozas, 1999). The present version largely extends the capabilities of the software, allowing extensive DNA polymorphism analyses through a user-friendly interface.

SYSTEMS AND METHODS

DnaSP version 4 is written in Microsoft Visual Basic v. 6.0 and runs on ix86 compatible processors under Microsoft Windows®. DnaSP can also run on Apple Macintosh, Linux and Unix-based platforms using Windows emulator software with one of the required Microsoft Windows versions.

MAIN NEW FEATURES

DnaSP provides a user-friendly Microsoft Windows graphic interface and can read (and export) five multiple-aligned nucleotide sequence file formats: FASTA, MEGA, NBRF/PIR, NEXUS and PHYLIP. DnaSP allows the analysis of polymorphism, divergence, genetic differentiation, gene flow, gene conversion, linkage disequilibrium, recombination and codon usage, and also conducts a number of neutrality tests. The analyses can be performed in a subset of sites (including synonymous, non-synonymous, non-coding and i-fold degenerate sites) or in a subset of DNA sequences. Coding region analysis can be performed using a number of predefined genetic codes and codon usage tables.

Coalescent-based methods

DnaSP has extensively increased the capabilities of the coalescent-based analyses. The present DnaSP version allows conducting most of the developed neutrality tests (with and without outgroup) and linkage disequilibrium statistics, including, among others: (1) Tajima's, Fu's and Fu and Li's tests (Tajima, 1989; Fu and Li, 1993; Fu, 1997); (2) Depaulis and Veuille's haplotype-based tests (Depaulis and Veuille, 1998); (3) B- and Q-tests (Wall, 1999); (4) H-test (Fay and Wu, 2000); (5) Z_nS, ZZ and Z_A linkage disequilibrium-based statistics (Kelly, 1997; Rozas et al., 2001). DnaSP also computes a number of statistical tests for detecting population growth, including the recently developed R_2 test (Ramos-Onsins and Rozas, 2002). The Monte Carlo computer simulation module allows generating the empirical distribution for a very large number of test statistics. Simulations can be conducted for different recombination rates.

Gene flow and genetic differentiation

The Gene Flow module has been completely rewritten. The present version allows performing a number of analyses of gene flow and genetic differentiation among populations, with different options for treating alignment gaps. To detect genetic differentiation among subpopulations, DnaSP implements several statistics based both on the number of haplotypes and on the number of nucleotide changes (i.e. sequence-based statistics) (Hudson et al., 1992a; Hudson, 2000). DnaSP also estimates several parameters of the standardized measure of the genetic diversity among populations (F_ST, and the related statistics G_ST and N_ST) (see Hudson et al., 1992b). From these F_ST-based estimators, the migration rates (in terms of Nm, where m is the migration rate) are obtained. The outcome values can be exported as a distance data file (PHYLIP and MEGA formats) for further phylogenetic analyses. DnaSP incorporates two methods to test for genetic differentiation: (1) the standard χ² homogeneity test and (2) a Monte Carlo permutation (randomization) test (Hudson et al., 1992a).

Analysis of preferred and unpreferred codons

The present version implements a number of algorithms and methods to analyse the impact of natural selection and mutational processes on codon usage bias. In addition to the standard codon usage bias estimators (CBI, ENC, scaled χ², etc.), DnaSP also implements an algorithm to identify preferred (P) and unpreferred (U) synonymous changes. This information is critical for determining the effect of natural selection (weak selection) on synonymous codons (see Akashi, 1999). DnaSP allows estimating the numbers of preferred and unpreferred changes within species (which requires the availability of one outgroup to polarize the mutations), and also those changes polymorphic within species and fixed between species (which requires the availability of two outgroups). DnaSP also provides several predefined codon usage tables. Additionally, users can define their own codon usage table; this user-defined information can be stored in a private block of the NEXUS file format.

ACKNOWLEDGEMENTS

We thank M. Aguadé, A. Blanco-García, H. Quesada, C. Segarra and A. Vilella for critical comments on the manuscript. We also thank the numerous people who tested the program with their data, especially members of the Molecular Evolutionary Genetics group in the Departament de Genètica, Universitat de Barcelona. This work was supported by grant BMC2001-2906 from the Dirección General de Investigación Científica y Técnica, Spain, conferred on M. Aguadé, and by grant TXT98-1802 from the Dirección General de Enseñanza Superior e Investigación Científica, Spain, conferred on J.R.

REFERENCES

Akashi,H. (1999) Detecting the 'footprint' of natural selection in within and between species DNA sequence data. Gene, 238, 39–51.
Bamshad,M. and Wooding,S.P. (2003) Signatures of natural selection in the human genome. Nat. Rev. Genet., 4, 99–111.
Depaulis,F. and Veuille,M. (1998) Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol., 15, 1788–1790.
Fay,J.C. and Wu,C.-I. (2000) Hitchhiking under positive Darwinian selection. Genetics, 155, 1405–1413.
Fu,Y.-X. and Li,W.-H. (1993) Statistical tests of neutrality of mutations. Genetics, 133, 693–709.
Fu,Y.-X. (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics, 147, 915–925.
Hudson,R.R. (1990) Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol., 7, 1–44.
Hudson,R.R. (2000) A new statistic for detecting genetic differentiation. Genetics, 155, 2011–2014.
Hudson,R.R., Boos,D.D. and Kaplan,N.L. (1992a) A statistical test for detecting population subdivision. Mol. Biol. Evol., 9, 138–151.
Hudson,R.R., Slatkin,M. and Maddison,W.P. (1992b) Estimation of levels of gene flow from DNA sequence data. Genetics, 132, 583–589.
Kelly,J.K. (1997) A test of neutrality based on interlocus associations. Genetics, 146, 1197–1206.
Nordborg,M. and Innan,H. (2002) Molecular population genetics. Curr. Opin. Plant Biol., 5, 69–73.
Ramos-Onsins,S.E. and Rozas,J. (2002) Statistical properties of new neutrality tests against population growth. Mol. Biol. Evol., 19, 2092–2100.
Rosenberg,N.A. and Nordborg,M. (2002) Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms. Nat. Rev. Genet., 3, 380–390.
Rozas,J. and Rozas,R. (1999) DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics, 15, 174–175.
Rozas,J., Gullaud,M., Blandin,G. and Aguadé,M. (2001) DNA variation at the rp49 gene region of Drosophila simulans: evolutionary inferences from an unusual haplotype structure. Genetics, 158, 1147–1155.
Tajima,F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585–595.
Wall,J.D. (1999) Recombination and the power of statistical tests of neutrality. Genet. Res., 74, 65–69.
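
Illustrative note: the neutrality tests listed under the coalescent-based methods start from simple summaries of an alignment. As a minimal sketch, and not DnaSP's own code, the Python fragment below computes Tajima's D (Tajima, 1989) from a small set of aligned sequences. The function names are hypothetical, the sequences are assumed to be of equal length with no gaps or ambiguity codes, and the number of segregating sites is used rather than the total number of mutations. Significance is not assessed here; the text above describes doing that with Monte Carlo coalescent simulations.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns showing more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free aligned sequences (illustrative)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        # The variance term vanishes for n < 4, and D is undefined without variation.
        raise ValueError("need at least 4 sequences and 1 segregating site")

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between two estimators of theta, scaled by its standard deviation.
    pi = mean_pairwise_differences(seqs)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    example = ["ATGCA", "ATGCA", "ATGTA", "TTGTA"]  # toy data, not from the paper
    print(segregating_sites(example), tajimas_d(example))
```

A positive value indicates an excess of intermediate-frequency variants relative to the neutral expectation, and a negative value an excess of rare variants.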



Publisher
Oxford University Press
Copyright
© Oxford University Press 2003; all rights reserved.
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btg359


Journal

Bioinformatics, Oxford University Press

Published: Dec 12, 2003
