RDP: detection of recombination amongst aligned sequences

Bioinformatics, Vol. 16 no. 6, 2000, APPLICATIONS NOTE, pages 562–563

Darren Martin and Ed Rybicki
Microbiology Department, University of Cape Town, Private Bag, Cape Town, South Africa
Received on September 24, 1999; revised on December 8, 1999; accepted on December 23, 1999

Abstract
Summary: Recombination Detection Program (RDP) is a program that applies a pairwise scanning approach to the detection of recombination amongst a group of aligned DNA sequences. The software runs under Windows95 and combines highly automated screening of large numbers of sequences with a highly interactive interface for examining the results of the analyses.
Availability: For academic purposes RDP is available free of charge from: http://www.uct.ac.za/depts/microbiology/microdescription.htm
Contact: darren@molbiol.uct.ac.za

Recombination between divergent genomes is believed to be a major mechanism by which diversity amongst viruses is generated (Robertson et al., 1995). Although a number of methods have been devised for the analysis of recombination (Grassly and Holmes, 1997; Hein, 1990; Maynard Smith and Smith, 1998; McGuire et al., 1997; Salminen et al., 1995; Sawyer, 1989; Siepel et al., 1995; Weiller, 1998), the vast majority of computer programs written to automate these methods lack an interactive user interface, are incompatible with the most common personal computer operating systems, and are relatively inaccessible to casual users. We have written Recombination Detection Program (RDP) to address these problems. It runs under Windows95/98/NT and couples a high degree of analysis automation with an interactive and detailed graphical user interface.

For the detection of recombination, RDP utilises a pairwise scanning approach. Beginning with a multiple sequence alignment in Phylip, DNAMAN, FASTA, GCG or CLUSTAL format, the software examines every possible combination of three sequences for evidence of recombination in a three-step procedure.

In the first step, all phylogenetically non-informative sites are discarded from the group of three sequences to obtain three information-rich sub-sequences. In every group of three sequences there are two sequences, A and B, that are more closely related to one another than to a third sequence, C. Non-informative sites are those that are:
1. identical in all three sequences;
2. different in all three sequences; or
3. unique to A or B and not present in any member of a group of reference sequences.
The method of reference sequence selection is user-defined and is based on the relative positions of the three selected sequences in a UPGMA dendrogram.

In the second analysis step, a window of user-defined width is moved along the aligned sub-sequences one nucleotide at a time, and an average percentage identity for each of the three possible sequence pairs is calculated at each position. Regions of possibly recombinant origin are identified as those where the percentage identity of sequences A and C, or of sequences B and C, is higher than that of sequences A and B.

In the third analysis step, the probability that the nucleotide arrangement in the identified region, which makes sequence A or B appear more closely related to sequence C, arose by chance is approximated using a binomial distribution adapted from Rice (1995):

$$P = G \times \frac{L}{N} \times \sum_{m=M}^{N} \frac{N!}{m!\,(N-m)!}\, p^{m}\,(1-p)^{N-m}$$

where G is the number of possible combinations of three sequences, L is the length of the information-rich sub-sequences, N is the length of the putatively recombinant region, M is the number of nucleotides in common between either A or B and C in the putatively recombinant region, and p is the proportion of nucleotides in common between either A or B and C in the entire sub-sequence. If P is lower than a user-defined cut-off, information on the potential recombination event is stored for later access before the next combination of three sequences is selected and analysed. Once every combination of three sequences has been analysed, an interactive graphical interface enables the user to examine the stored results.

Fig. 1. The RDP user interface (panels: control button panel; percentage identity display; sequence display; dendrogram display; pairwise sub-sequence identity plot display; schematic sequence display; potentially recombinant region). An example is displayed of the output obtained following analysis of the sequences indicated in the sequence display and selection of a potentially recombinant region in the schematic sequence display for further analysis.

Because reference sequences are selected based on their positions relative to the selected triplets within a UPGMA dendrogram, there are situations in which RDP is unable to correctly discriminate between daughter and parental sequences. These situations may occur if the parental and daughter sequences are all nearest neighbors in the dendrogram, if only one parental sequence is present in the alignment, or if daughter sequences have obtained too few of their phylogenetically informative nucleotides from a single parent to be placed in the dendrogram at a position that properly reflects their evolutionary history. In all these cases, however, the program will still approximate the correct recombination breakpoints.

The program's interface is divided into a number of sections that display both general information relating all of the aligned sequences to one another and specific information on the relationships of user-specified putatively recombinant sequences to sequences closely related to their potential parents (Figure 1).

We have used RDP to simultaneously analyse 86 full-length HIV and SIV genomes, and have been able to determine the composition of all previously identified inter-subtype HIV-1 recombinants (Robertson et al., 1995; Salminen et al., 1995; Siepel et al., 1995).

References
Grassly,N.C. and Holmes,E.C. (1997) A likelihood method for the detection of selection and recombination using nucleotide sequences. Mol. Biol. Evol., 14, 239–247.
Hein,J. (1990) Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci., 98, 185–200.
Maynard Smith,J. and Smith,N.H. (1998) Detecting recombination from gene trees. Mol. Biol. Evol., 15, 590–599.
McGuire,G., Wright,F. and Prentice,M.J. (1997) A graphical method for detecting recombination in phylogenetic data sets. Mol. Biol. Evol., 14, 1125–1131.
Rice,J.A. (1995) Mathematical statistics and data analysis. Duxbury Press, Belmont, pp. 36–38.
Robertson,D.L., Hahn,B.H. and Sharp,P.H. (1995) Recombination in AIDS viruses. J. Mol. Evol., 40, 249–259.
Salminen,M., Carr,J.K., Burke,D.S. and McCutchan,F.E. (1995) Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning. AIDS Res. Hum. Retrovirus., 11, 1423–1425.
Sawyer,S. (1989) Statistical tests for detecting gene conversion. Mol. Biol. Evol., 6, 526–538.
Siepel,A.C., Halpern,A.L., Macken,C. and Korber,B.T.M. (1995) A computer program designed to screen rapidly for HIV type 1 intersubtype recombinant sequences. AIDS Res. Hum. Retrovirus., 11, 1413–1416.
Weiller,G.F. (1998) Phylogenetic profiles: a graphical method for detecting recombinations in homologous sequences. Mol. Evol. System., 15, 326–335.
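
RDP's first analysis step, discarding phylogenetically non-informative sites from a triplet, can be illustrated with a minimal Python sketch. It assumes the triplet has already been oriented (A and B being the closer pair), that all sequences are aligned strings of equal length, and that gap characters are compared like nucleotides. The reference set is passed in directly, because RDP's user-defined, UPGMA-based selection of reference sequences is not reproduced here. The names `informative_columns` and `information_rich_subsequences` are hypothetical and not part of RDP.

```python
from typing import Sequence


def informative_columns(a: str, b: str, c: str, refs: Sequence[str]) -> list[int]:
    """Return indices of alignment columns that survive the three filtering rules.

    a and b are the two more closely related sequences of the triplet and c is
    the third; refs is a caller-supplied (hypothetical) list of reference
    sequences. Gap characters are treated like any other state (an assumption).
    """
    n = len(a)
    if len(b) != n or len(c) != n or any(len(r) != n for r in refs):
        raise ValueError("all sequences must be aligned to the same length")
    kept = []
    for i, (x, y, z) in enumerate(zip(a, b, c)):
        ref_states = {r[i] for r in refs}
        if x == y == z:
            continue  # rule 1: identical in all three sequences
        if x != y and x != z and y != z:
            continue  # rule 2: different in all three sequences
        if y == z and x not in ref_states:
            continue  # rule 3: state unique to A and absent from every reference
        if x == z and y not in ref_states:
            continue  # rule 3: state unique to B and absent from every reference
        kept.append(i)
    return kept


def information_rich_subsequences(
    a: str, b: str, c: str, refs: Sequence[str]
) -> tuple[str, str, str]:
    """Project the triplet onto the retained columns."""
    cols = informative_columns(a, b, c, refs)
    sa, sb, sc = ("".join(s[i] for i in cols) for s in (a, b, c))
    return sa, sb, sc
```

For example, `informative_columns("AAAAT", "AAGCT", "ACGGT", ["ATAGT"])` returns `[1, 2]`: column 2 is kept because the state unique to A also occurs in the reference, whereas with the reference `"ATGGT"` that column is discarded and only `[1]` remains.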
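
The second step, a sliding-window scan of the three pairwise identities, might look like the following sketch. Window positions are indices into the information-rich sub-sequences, and merging consecutive flagged windows into a single span is an assumption made here for readability: the text does not say exactly how RDP delimits a region. All function names are hypothetical.

```python
from typing import Iterator, Optional


def percent_identity(s: str, t: str) -> float:
    """Percentage of positions at which two equal-length strings agree."""
    return 100.0 * sum(x == y for x, y in zip(s, t)) / len(s)


def scan(a: str, b: str, c: str, window: int) -> Iterator[tuple[int, float, float, float]]:
    """Move the window one position at a time; yield (start, id_AB, id_AC, id_BC)."""
    if not 0 < window <= len(a):
        raise ValueError("window must be between 1 and the sub-sequence length")
    for start in range(len(a) - window + 1):
        wa, wb, wc = (s[start:start + window] for s in (a, b, c))
        yield (start, percent_identity(wa, wb),
               percent_identity(wa, wc), percent_identity(wb, wc))


def candidate_regions(a: str, b: str, c: str, window: int) -> list[tuple[int, int]]:
    """Merge consecutive windows in which A-C or B-C identity exceeds A-B identity
    into half-open (start, end) spans in sub-sequence coordinates."""
    regions: list[tuple[int, int]] = []
    current: Optional[list[int]] = None
    for start, ab, ac, bc in scan(a, b, c, window):
        if ac > ab or bc > ab:
            if current is None:
                current = [start, start + window]  # open a new span
            else:
                current[1] = start + window        # extend the open span
        elif current is not None:
            regions.append((current[0], current[1]))
            current = None
    if current is not None:
        regions.append((current[0], current[1]))
    return regions
```

With `candidate_regions("AAAATTTT", "AAAAGGGG", "CCCCGGGG", 2)` the result is `[(4, 8)]`, the stretch in which B matches C better than it matches A.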
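
The probability used in the third step can be evaluated directly. This sketch takes G to be C(n, 3) for n aligned sequences, which is one reading of "the number of possible combinations of three sequences", and derives M and p by simple identity counts between A (or B) and C. The names `rdp_p_value` and `region_statistics` are hypothetical.

```python
from math import comb


def rdp_p_value(n_sequences: int, L: int, N: int, M: int, p: float) -> float:
    """Evaluate P = G * (L/N) * sum_{m=M}^{N} C(N, m) p^m (1-p)^(N-m).

    G is assumed to be C(n_sequences, 3); L is the information-rich sub-sequence
    length, N the region length, M the identities between A (or B) and C inside
    the region, and p the identity proportion over the whole sub-sequence.
    """
    if not (0 < N <= L and 0 <= M <= N and 0.0 <= p <= 1.0):
        raise ValueError("require 0 < N <= L, 0 <= M <= N and 0 <= p <= 1")
    G = comb(n_sequences, 3)
    # Upper binomial tail: chance of M or more identities among N sites.
    tail = sum(comb(N, m) * p**m * (1 - p) ** (N - m) for m in range(M, N + 1))
    return G * (L / N) * tail


def region_statistics(x: str, c: str, start: int, end: int) -> tuple[int, int, int, float]:
    """Return (L, N, M, p) for x (A or B) against c over the half-open region [start, end)."""
    L = len(x)
    p = sum(u == v for u, v in zip(x, c)) / L
    M = sum(u == v for u, v in zip(x[start:end], c[start:end]))
    return L, end - start, M, p
```

For instance, `region_statistics("AAAATTTT", "CCCCTTTT", 4, 8)` gives `(8, 4, 4, 0.5)`, and with three sequences `rdp_p_value(3, 8, 4, 4, 0.5)` gives 0.125. The G and L/N factors appear to scale the binomial tail in the manner of a multiple-comparison correction, so the value can exceed 1 when many triplets are screened.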
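
The outer loop, which examines every combination of three sequences and stores events whose P falls below the user-defined cut-off, can be sketched as follows. The text only says that A and B are "more closely related"; here the pair with the highest whole-alignment identity is assumed to be A and B. The per-triplet analysis is a caller-supplied, hypothetical `analyse` function yielding (region, P) pairs, and the 0.05 default cut-off is an illustrative value, not RDP's.

```python
from itertools import combinations
from typing import Callable, Iterable

# Hypothetical per-triplet analysis: given sequences (A, B, C), yield (region, P) pairs.
Analyse = Callable[[str, str, str], Iterable[tuple[tuple[int, int], float]]]


def fraction_identical(s: str, t: str) -> float:
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(s, t)) / len(s)


def screen(
    seqs: dict[str, str], analyse: Analyse, cutoff: float = 0.05
) -> list[tuple[str, str, str, tuple[int, int], float]]:
    """Analyse every triplet of named sequences and keep events with P < cutoff."""
    hits = []
    for trio in combinations(sorted(seqs), 3):
        # Assumption: A and B are the most similar pair of the triplet.
        a, b = max(
            combinations(trio, 2),
            key=lambda pair: fraction_identical(seqs[pair[0]], seqs[pair[1]]),
        )
        (c,) = set(trio) - {a, b}
        for region, p_value in analyse(seqs[a], seqs[b], seqs[c]):
            if p_value < cutoff:
                hits.append((a, b, c, region, p_value))
    return hits
```

Keeping the analysis as a parameter separates the enumeration of triplets from the per-triplet statistics, so the filtering, scanning and probability steps can be swapped or tested independently.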

Bioinformatics, Volume 16 (6): 562–563, Jun 1, 2000

Publisher
Oxford University Press
Copyright
© Oxford University Press 2000
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/16.6.562
