RDP2: recombination detection and analysis from sequence alignments

BIOINFORMATICS APPLICATIONS NOTE
Vol. 21 no. 2 2005, pages 260–262
doi:10.1093/bioinformatics/bth490

D. P. Martin (1,*), C. Williamson (1) and D. Posada (2)

(1) Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town 7000, South Africa and (2) Department of Biochemistry, Genetics and Immunology, University of Vigo, 36200 Vigo, Spain

Received on April 20, 2004; revised on June 28, 2004; accepted on August 13, 2004
Advance Access publication September 17, 2004

(*) To whom correspondence should be addressed.

Bioinformatics vol. 21 issue 2 © Oxford University Press 2004; all rights reserved.

ABSTRACT

Summary: RDP2 is a Windows 95/XP program that examines nucleotide sequence alignments and attempts to identify recombinant sequences and recombination breakpoints using 10 published recombination detection methods, including geneconv, bootscan, maximum χ², chimaera and sister scanning. The program enables fast automated analysis of large alignments (up to 300 sequences containing 13 000 sites), and interactive exploration, management and verification of results with different recombination detection and tree drawing methods.

Availability: RDP2 is available free from the RDP2 website (http://darwin.uvigo.es/rdp/rdp.html)

Contact: darren@science.uct.ac.za

Supplementary information: Detailed descriptions of RDP2 and the methods it implements are included in the program manual, which can be downloaded from the RDP2 website.

A major problem encountered while using standard phylogenetic methods in studies involving recombining organisms is that the evolutionary history of a recombinant sequence cannot be described with a single phylogenetic tree. A single recombinant sequence in an alignment can seriously influence the branching order and branch lengths of the phylogenetic trees constructed using the alignment (Posada and Crandall, 2002). In addition, recombination compromises the validity of several phylogenetic inferences one can make by examining trees (Schierup and Hein, 2000a,b). A number of computational tools for detecting and quantifying various aspects of recombination have therefore been developed (for a list of available recombination detection programs see http://www.umber.embnet.org/~robertson/recombination/index.shtml). A comparison of the recombination detection power of 14 of these methods using simulated and real datasets indicated that while some always performed better than others, no single method can be adjudged to be best in detecting recombination under all conditions (Posada and Crandall, 2001; Posada, 2002).

Sharing major components of its user interface and the RDP recombination detection method with its predecessor, RDP, RDP2 implements a variety of additional non-parametric recombination detection methods (i.e. methods that do not make use of population genetic models and make no attempt to estimate the population recombination rate; Table 1). Among the new inclusions are many methods that have performed well in comparative tests (Drouin et al., 1999; Posada and Crandall, 2001; Posada, 2002). We have focused on published methods that can be used to (1) identify recombinant sequences, (2) identify recombination breakpoints and (3) identify parental sequences. The program can use any combination of six methods (rdp, geneconv, maximum χ², bootscan, chimaera and sister scanning) to automatically identify recombinant and parental sequences, estimate breakpoint positions and calculate probability scores for potential recombination events. Once all potential recombination events are identified, RDP2 sorts analysis results and attempts to determine the number of unique recombination events identifiable in an alignment. RDP2 can be set to automatically (1) filter out unique events detected by fewer than a specified number of methods, (2) identify consensus daughter and parental sequences using all evidence for a single actual recombination event (often involving many potential parental and daughter sequence combinations detected using multiple methods) and (3) use all evidence for a single actual event to determine most probable breakpoint positions using a modified maximum χ² approach (Maynard-Smith, 1992).

Table 1. A brief description of recombination detection methods implemented in RDP2

| Method (a.k.a.) | Sequence comparisons (a) | Variable (V)/All (A) sites scanned (b) | Sliding window | Automated scans (c) | References |
|---|---|---|---|---|---|
| rdp (RDP method) | T | V | + | + | Martin and Rybicki (2000) |
| geneconv (Sawyer's runs test) | T/D | V | − | + | Padidam et al. (1999) |
| bootscan | T | A | + | + | Salminen et al. (1995) |
| maximum χ² (MaxChi) | T/D | V | + | + | Maynard-Smith (1992) |
| chimaera | T | V | + | + | Posada and Crandall (2001) |
| sister scanning (SiScan) | T/F | A | + | + | Gibbs et al. (2000) |
| lard | T | A | − | − | Holmes et al. (1999) |
| distance plot (SimPlot) | T/D | A | + | − | Lole et al. (1999) |
| topal | T | A | + | − | McGuire and Wright (2000) |
| reticulate (compatibility matrix) | F | V | − | − | Jakobsen and Easteal (1996) |

(a) T, every possible combination of three sequences in an alignment scanned; D, every possible combination of two sequences in an alignment scanned with variable sites inferred from full alignment; and F, full alignment or substantial part thereof (4+ sequences) scanned with variable sites inferred only from the sequences being scanned.
(b) The exact subset of sites scanned will differ between methods and can also differ for the same method with different program settings.
(c) Only six methods can be used to automatically identify recombinant sequences and breakpoints from an alignment. Methods can also be run in either a manual or a checking mode allowing users to test specific recombination hypotheses.

RDP2 permits exploration and checking of analysis results in a highly interactive and user-friendly way. For any detected recombination event, information such as the method used to detect the event, breakpoint positions, parental sequences, probability values, degrees of agreement with results obtained using other detection methods, raw plot data, informative sites in the alignment and phylogenetic trees can be displayed by simply clicking on a graphical representation of the event. Once an event is selected for more detailed study, checking the evidence for recombination using 10 different recombination detection methods (besides the six automated methods these also include lard, topal, reticulate and distance plots) is achieved by simply selecting the methods from a menu. To further aid in evaluating evidence for recombination, RDP2 can also use phylip components (Felsenstein, 1989; Olsen et al., 1994) to simultaneously display phylogenetic trees (UPGMA, bootstrapped neighbor-joining, least squares or maximum-likelihood) constructed from different portions of an alignment.

As the amount of detectable recombination in an alignment increases, the complexity of correctly inferring which sequences are parental and which are recombinant increases as well. RDP2 encourages user verification of its analysis results and permits user acceptance and rejection of potential recombination events (useful for tracking the progress of an analysis), and interactive 'correction' of apparent parental and daughter sequence misidentification.

We have not placed any restrictions on the size of alignments that can be examined using RDP2. For example, automated analyses using all the detection methods together on a PC with 256 MB RAM and a 1 GHz Celeron Processor can take 5 min for a 50 sequence alignment of 3 kb long sequences and less than 48 h for a 316 sequence alignment of 13 kb long sequences.

ACKNOWLEDGEMENTS

We would like to thank Stanley Sawyer, Andrew Rambaut, Ingrid Jakobsen, Joseph Felsenstein, Gary Olsen, Adrian Gibbs and John Armstrong for either agreeing to have their programs distributed using RDP2 or providing pieces of code in RDP2. We also thank The National Research Foundation of South Africa (D.P.M.), US National Institutes of Health (D.P.) and the 'Ramón y Cajal' programme of the Spanish government (D.P.) for partially funding the development and distribution of RDP2.

REFERENCES

Drouin,G., Prat,F., Ell,M. and Clarke,G.D.P. (1999) Detecting and characterizing gene conversions between multigene family members. Mol. Biol. Evol., 16, 1369–1390.

Felsenstein,J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 5, 164–166.

Gibbs,M.J., Armstrong,J.S. and Gibbs,A.J. (2000) Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics, 16, 573–582.

Holmes,E.C., Worobey,M. and Rambaut,A. (1999) Phylogenetic evidence for recombination in Dengue virus. Mol. Biol. Evol., 16, 405–409.

Jakobsen,I.B. and Easteal,S. (1996) A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput. Appl. Biosci., 12, 291–295.

Lole,K.S., Bollinger,R.C., Paranjape,R.S., Gadarki,D., Kulkami,S.S., Novak,N.G., Ingersoll,R., Sheppard,H.W. and Ray,S.C. (1999) Full-length human immunodeficiency type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol., 73, 152–160.

Martin,D. and Rybicki,E. (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics, 16, 562–563.

Maynard-Smith,J. (1992) Analyzing the mosaic structure of genes. J. Mol. Evol., 34, 126–129.

McGuire,G. and Wright,F. (2000) TOPAL 2.0: improved detection of mosaic sequences within multiple alignments. Bioinformatics, 16, 130–134.

Olsen,G.J., Matsuda,H., Hagstrom,R. and Overbeek,R. (1994) fastDNAML: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci., 10, 41–48.

Padidam,M., Sawyer,S. and Fauquet,C.M. (1999) Possible emergence of new geminiviruses by frequent recombination. Virology, 265, 218–225.

Posada,D. (2002) Evaluation of methods for detecting recombination from DNA sequences: empirical data. Mol. Biol. Evol., 19, 708–717.

Posada,D. and Crandall,K.A. (2001) Evaluation of methods for detecting recombination from DNA sequences: Computer simulations. Proc. Natl Acad. Sci. USA, 98, 13757–13762.

Posada,D. and Crandall,K.A. (2002) The effect of recombination on the accuracy of phylogeny estimation. J. Mol. Evol., 54, 396–402.

Salminen,M.O., Carr,J.K., Burke,D.S. and McCutchan,F.E. (1995) Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res. Hum. Retroviruses., 11, 1423–1425.

Schierup,M.H. and Hein,J. (2000a) Consequences of recombination on traditional phylogenetic analysis. Genetics, 156, 879–891.

Schierup,M.H. and Hein,J. (2000b) Recombination and the molecular clock. Mol. Biol. Evol., 17, 1578–1579.

Bioinformatics, Volume 21 (2): 3 – Sep 17, 2004

Publisher
Oxford University Press
Copyright
Bioinformatics vol. 21 issue 2 © Oxford University Press 2005; all rights reserved.
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bth490
pmid
15377507
Journal

Bioinformatics (Oxford University Press)

Published: Sep 17, 2004
