The Gypsy Database (GyDB) of mobile genetic elements: release 2.0

Nucleic Acids Research, 2011, Vol. 39, Database issue, D70–D74. Published online 29 October 2010. doi:10.1093/nar/gkq1061

Carlos Llorens¹*, Ricardo Futami¹, Laura Covelli¹, Laura Domínguez-Escribá¹, Jose M. Viu¹, Daniel Tamarit², Jose Aguilar-Rodríguez², Miguel Vicente-Ripolles³, Gonzalo Fuster¹, Guillermo P. Bernet¹,⁴, Florian Maumus⁵, Alfonso Munoz-Pomer¹,³, Jose M. Sempere³, Amparo Latorre²,⁶ and Andres Moya²,⁶

¹ Biotechvana, Parc Científic, Universitat de València, Calle Catedrático José Beltrán 2, 46980 Paterna (València), ² Unidad Mixta de Investigación en Genómica y Salud del Centro Superior de Investigación en Salud Pública (CSISP)-Universitat de València (Instituto Cavanilles de Biodiversidad y Biología Evolutiva), Avenida de Cataluña 21, 46020 Valencia, ³ Departamento de Sistemas Informáticos y Computación (DSIC), Universitat Politècnica de València, Camino de Vera S/N, 46022 Valencia, ⁴ Instituto Valenciano de Investigaciones Agrarias (IVIA), Carretera Moncada-Náquera, Km 4.5, 46113, Moncada (Valencia), Spain, ⁵ Institut Jean-Pierre Bourgin, INRA Centre de Versailles-Grignon, Route de Saint-Cyr, 78026 Versailles, France and ⁶ CIBER en Epidemiología y Salud Pública (CIBEResp), Parc de Recerca Biomèdica de Barcelona, Calle Doctor Aiguader 88 1 Planta, 8003 Barcelona, Spain

Received September 2, 2010; Revised October 11, 2010; Accepted October 13, 2010

*To whom correspondence should be addressed. Tel: +34 963 544 993; Email: [email protected]

© The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

This article introduces the second release of the Gypsy Database of Mobile Genetic Elements (GyDB 2.0): a research project devoted to the evolutionary dynamics of viruses and transposable elements based on their phylogenetic classification (per lineage and protein domain). The Gypsy Database (GyDB) is a long-term project that is continuously progressing and that, owing to the high molecular diversity of mobile elements, must be completed in several stages. GyDB 2.0 has been powered with a wiki to allow other researchers to participate in the project. The current database stage and scope are long terminal repeats (LTR) retroelements and relatives. GyDB 2.0 is an update based on the analysis of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants. Among other features, in terms of the aforementioned topics, this update adds: (i) a variety of descriptions and reviews distributed in multiple web pages; (ii) protein-based phylogenies, where phylogenetic levels are assigned to distinct classified elements; (iii) a collection of multiple alignments, lineage-specific hidden Markov models and consensus sequences, called GyDB collection; (iv) updated RefSeq databases and BLAST and HMM servers to facilitate sequence characterization of new LTR retroelement and caulimovirus queries; and (v) a bibliographic server. GyDB 2.0 is available at http://gydb.org.

INTRODUCTION

Mobile genetic elements (MGEs) are ubiquitous, autonomous genetic units that often constitute a significant part of their host genomes. It is commonly accepted that mobile DNA elements are powerful vectors for disease and evolution, from which distinct host genes have evolved during the history of life (1,2). The emergence and subsequent role played by viruses and MGEs in the history of life is an exciting topic that requires further investigation. In this respect, researchers aim to discern relevant aspects of the molecular changes responsible for various characteristics in organisms related to horizontal transfer, infection and disease. Among the distinct initiatives launched with the aim of investigating the diversity of MGEs (see for example 3–5) was the Gypsy Database (GyDB) of MGEs (6), a research project devoted to the evolutionary dynamics of viruses and MGEs (and their related host proteins), which was launched in 2008. The GyDB project is a highly informative database established within an evolutionary context of classification, where one piece of research delivers one conclusion that drives individuals towards another goal. The most captivating aspect of this project is that a share of our efforts is dedicated to the interpretation of analyses, paying particular attention to non-redundant elements displaying a certain degree of distance and investigating how they can be collectively aligned or related, in terms of protein domain architecture, with other lineages and elements. Because of the impressive molecular diversity of viruses and MGEs, the GyDB is a long-term project that has been arranged as a database in continuous progression, and must be achieved in stages. The current database stage and scope is retroviruses and retrotransposons with long terminal repeats (LTR retroelements) and their relatives. Following the outline of the earlier release (the study of Ty3/Gypsy and Retroviridae LTR retroelements), this article presents the GyDB update based on the phylogenetic evaluation of the most representative LTR retroelement families and the plant caulimoviruses. This update, called GyDB 2.0, is available at http://gydb.org and includes sequence phylogenetic classification in addition to significant bioinformatic improvements. In particular, the new infrastructure implements a wiki management system constructed with the aim of promoting a world-wide community of researchers collaborating in the analysis and classification of MGEs and viruses inhabiting (or circulating in) living organisms.

THE UPDATE: NEW FEATURES

GyDB 2.0 consists of 1234 web pages addressing the phylogenetic study of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements. Caulimoviruses (Caulimoviridae) are formally plant DNA pararetroviruses, but they were considered in GyDB 2.0 owing to their relationship with LTR retroelements based on the common gag/coat and pol regions [for more details, see (7) and references therein]. Table 1 summarizes the topics addressed in this update, as well as the servers and database sections it offers. The sequences on which GyDB 2.0 is based were retrieved from GenBank (8) and the methodologies employed were the same as those described earlier in references (6,7,9). At GyDB we evaluate the phylogenetic signal of classified distinct elements and create hidden Markov model (HMM) profiles (10) per lineage and protein domain. In addition, the project is concerned with the evolutionary relationships between MGEs and their host genomes, based on the analysis of common protein families. In this regard, GyDB 2.0 focuses on two protein superfamilies including protein products commonly encoded by LTR retroelements and their host genomes: the chromodomain superfamily (11) and clan AA of aspartic peptidases (12,13). This second release is accompanied by bibliographic data-mining from PubMed databases hosted at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) to document up-to-date information regarding the distinct classified elements.

Table 1. GyDB 2.0 new features: topics and contents

Systems            Families        Lineages  Elements  Protein domains  Accessory proteins  LTRs
LTR retroelements  Ty3/Gypsy       34        96        8                1                   Yes
LTR retroelements  Ty1/Copia       19        69        8                –                   Yes
LTR retroelements  Retroviridae    8         50        8                41                  Yes
LTR retroelements  Bel/Pao         5         23        7                –                   Yes
LTR retroelements  Caulimoviridae  6         30        10               27                  No
Related families   Clan AA         35        323       1                –                   No
Related families   Chromodomains   2         123       1                –                   No

Topics                    Sections  Availability
Systematics               9         Side menu
Domains                   14        Side menu
Database                  8         Side menu
Servers                   3         Top menu: BLAST, HMM, Literature
Wiki tools and utilities  3         Top menu

Databases                         Items              Sections
Genomes (full-length genomes)     271 sequences      BLAST search and RefSeq DBs
LTRs (nucleotide sequences)       413 sequences      BLAST search and RefSeq DBs
Cores (protein cores sequences)   1895 sequences     BLAST search and RefSeq DBs
HMMs                              314 HMM profiles   HMM search and GyDB collection
Multiple alignments               131 alignments     GyDB collection
Consensus sequences               314 MRC sequences  GyDB collection
Phylogenetic trees                70 trees           Phylogenies
Clan AA ancestral reconstruction  70 alignments      CAARD database
Literature                        100797 references  Literature server

We included caulimoviruses in the second release in view of their relationship with LTR retroelements based on the common gag/coat and pol region.

DATABASE ORGANIZATION

GyDB 2.0 is deployed over a Linux-MySQL-Apache-PHP (LAMP) stack, with additional Ajax programming to minimize server responses to client browsers. The design is similar to that of the previous release but implements various changes on the web interface. As shown in Figure 1, the database organization is founded upon two major menus: a top menu and a side menu. The top menu allows access to the three servers:

(i) BLAST server; implements a BLAST search powered by the NCBI BLAST package (14), allowing protein and DNA comparisons with the GENOMES, LTRs and CORES databases. These databases collect the full-length genomes, the LTR sequences and all the protein sequences on which the second release is based, respectively.

(ii) HMM server; implements the HMMER3 package (http://hmmer.janelia.org) and allows protein comparisons against a database of protein domain lineage-specific HMM profiles created based on the update. This server provides additional comparisons between HMM profiles and the aforementioned CORES database.

(iii) LITERATURE server; allows users to search bibliography of interest in the topic.

Figure 1. GyDB 2.0 organization and implementation.

An additional new tool in GyDB 2.0 is its wiki, powered by the MediaWiki content management system (http://www.mediawiki.org/). This tool has been implemented to allow other users to participate in the project by editing or creating topics. Access to this wiki is free but requires a subscription (registration). The rationale behind this choice is twofold: first, edits are registered by date and author in order to credit contributions; second, we have programmed a revision mechanism to review all changes constructively before making them public. The top menu includes three sections to log in and manage the distinct wiki resources. Finally, to the right of the top menu, GyDB 2.0 includes a text field to search the whole project under two modes (detailed in Figure 1).

The side menu divides the distinct GyDB sections into three major demarcations (emphasized with boxes in Figure 1). The first collects sections associated with the systematics applied at GyDB. The second implements information concerning the domains typically observed in the genomic structure of the elements we classify. The third demarcation offers free access to distinct databases, which are organized into three sections:

(i) Trees and Networks; consists of the collection of inferred phylogenetic trees based on distinct protein domains encoded by the classified elements, or based on their concatenation (when they are parts of polyproteins). Remarkably, inferred pol polyprotein phylogenies based on the concatenation of the protease, reverse transcriptase, RNaseH and integrase domains are the major criterion for assigning phylogenetic levels at GyDB 2.0 [results introduced in (7)]. Phylogenetic trees provide links to the corresponding element page at GyDB 2.0: clicking any element name in any tree opens the entry assigned to that element. These tree image maps were created using Phylograph 1.0 (15). This section includes the clan AA reference database (CAARD) of ancestral maximum likelihood (ML) reconstructions (13) that has been implemented and maintained at GyDB.

(ii) GyDB collection (16) or the repository of multiple alignments, HMMs, and majority rule consensus (MRC) sequences offered at GyDB 2.0. When a deposited alignment, profile or MRC sequence is associated with a journal publication, its entry in the collection includes citation information.

(iii) REF SEQ DATABASES or the repository for downloading the databases (GENOMES, CORES and LTRs) implemented in the BLAST server.

Finally, a variety of links to other database initiatives relevant to the topic are included in the side menu.

FUTURE PERSPECTIVES

Sequencing projects constantly deliver new types of MGEs [for example (17–22)]; hence the classification of non-redundant elements based on their phylogenetic signal is an open issue at GyDB, and results in the preparation of new sections. For example, we are committed to improving the understanding of the diversity and evolutionary dynamics of MGEs in eukaryotic and prokaryotic organisms. With regard to eukaryotic LTR retroelements (the current database scope), the sequence repertoire at GyDB will be extended with representative elements retrieved from recently sequenced marine secondary endosymbionts, including the brown alga Ectocarpus siliculosus (heterokont) and the coccolithophore Emiliania huxleyi (haptophyte). In terms of other research topics in preparation, one concerns the construction of a server devoted to the study of the complete set of MGEs and repeats (the mobilome) of biological genomes. This server will be introduced with two forthcoming publications focusing on the LTR retroelements and their related transposases of the pea aphid Acyrthosiphon pisum genome [see (23)]. At the technical level, we are exploring the application of formal grammars and machine learning algorithms to automate, as far as possible, the management and classification of the sequence data. We are also committed to developing solutions for other non-trivial difficulties that arise with the growing size of the databases. Viruses and MGEs usually show different rates of evolution and high variability depending on the evaluated protein or region. Therefore, we aim to implement more than one method of phylogenetic reconstruction to offer the user different perspectives based on different methods (or the opportunity to upload updated phylogenies via the wiki). On the other hand, the traditional view of the origin and evolution of biological systems is that they are usually monophyletic, but such an assumption has been challenged by increasing evidence suggesting that natural evolution can frequently proceed by gradual and vertical means, in addition to distinct modular, saltatory and reticulate events (24–36). In this respect, we are investigating appropriate protocols to combine phylogenetic inference with new tendencies in network biology [see also (7)].

ACKNOWLEDGEMENTS

We thank all the colleagues detailed in the list available at (http://gydb.org/index.php/Acknowledgments) for their support in contributing images of biological host organisms. We are also grateful to Senior NAR Editor Dr Michael Galperin and to the two anonymous reviewers for their constructive comments in improving this article. Finally we also thank Denys Wheatley and Angela Panther from Biomedes for copyediting of this article.

FUNDING

Centro de Desarrollo Tecnológico Industrial (CDTI) (grant IDI-20100007, partial); Empresa Nacional de Innovación, S.A (ENISA) (17092008, partial); IMPIVA (IMIDTA/2009/118 and IMDTA/2010/740, partial); European Regional Development Fund (ERDF); Ministerio de Ciencia e Innovación (MICINN) (Torres-Quevedo grants PTQ-09-01-00020, PTQ-09-01-00670 and PTQ-10-03552, partial). Funding for open access charge: University of Valencia.

Conflict of interest statement. None declared.

REFERENCES

1. Hurst,G.D.D. and Schilthuizen,M. (1998) Selfish genetic elements and speciation. Heredity, 80, 2–8.
2. Volff,J.N. and Brosius,J. (2007) Modern genomes with retro-look: retrotransposed elements, retroposition and the origin of new genes. Genome Dyn., 3, 175–190.
3. Fauquet,C.M., Mayo,M.A., Desselberger,U. and Ball,L.A. (2005) Virus Taxonomy, VIIIth Report of the ICTV. Elsevier/Academic Press, London.
4. Jurka,J., Kapitonov,V.V., Pavlicek,A., Klonowski,P., Kohany,O. and Walichiewicz,J. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res., 110, 462–467.
5. Leplae,R., Hebrant,A., Wodak,S.J. and Toussaint,A. (2004) ACLAME: a CLAssification of Mobile genetic Elements. Nucleic Acids Res., 32, D45–D49.
6. Llorens,C., Futami,R., Bezemer,D. and Moya,A. (2008) The Gypsy Database (GyDB) of mobile genetic elements. Nucleic Acids Res., 36, 38–46.
7. Llorens,C., Munoz-Pomer,A., Bernad,L., Botella,H. and Moya,A. (2009) Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees. Biol. Direct., 4, 41.
8. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Sayers,E.W. (2009) GenBank. Nucleic Acids Res., 37, D26–D31.
9. Llorens,C., Fares,M.A. and Moya,A. (2008) Relationships of Gag–pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis. BMC Evol. Biol., 8.
10. Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
11. Koonin,E.V., Zhou,S. and Lucchesi,J.C. (1995) The chromo superfamily: new members, duplication of the chromo domain and possible role in delivering transcription regulators to chromatin. Nucleic Acids Res., 23, 4229–4233.
12. Rawlings,N.D., Barrett,A.J. and Bateman,A. (2010) MEROPS: the peptidase database. Nucleic Acids Res., 38, D227–D233.
13. Llorens,C., Futami,R., Renaud,G. and Moya,A. (2009) Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases. Biol. Direct., 4, 3.
14. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
15. Llorens,C., Futami,R., Vicente-Ripolles,M. and Moya,A. (2008) Phylograph: a multifunction Java editor for handling phylogenetic trees. Biotechvana Bioinformatics, Biotechvana, Valencia, SOFT: Phylograph.
16. Llorens,C., Muñoz-Pomer,A., Futami,R. and Moya,A. (2009) The GyDB Collection of Viral and Mobile Genetic Element Models. Biotechvana Bioinformatics, Biotechvana, Valencia, CR: GyDB Collection.
17. Piskurek,O., Nishihara,H. and Okada,N. (2008) The evolution of two partner LINE/SINE families and a full-length chromodomain-containing Ty3/Gypsy LTR element in the first reptilian genome of Anolis carolinensis. Gene, 441, 111–118.
18. Novikova,O., Mayorov,V., Smyshlyaev,G., Fursov,M., Adkison,L., Pisarenko,O. and Blinov,A. (2008) Novel clades of chromodomain-containing Gypsy LTR retrotransposons from mosses (Bryophyta). Plant J., 56, 562–574.
19. Bae,Y.A., Ahn,J.S., Kim,S.H., Rhyu,M.G., Kong,Y. and Cho,S.Y. (2008) PwRn1, a novel Ty3/gypsy-like retrotransposon of Paragonimus westermani: molecular characters and its differentially preserved mobile potential according to host chromosomal polyploidy. BMC Genomics, 9, 482.
20. Gao,D., Gill,N., Kim,H.R., Walling,J.G., Zhang,W., Fan,C., Yu,Y., Ma,J., SanMiguel,P., Jiang,N. et al. (2009) A lineage-specific centromere retrotransposon in Oryza brachyantha. Plant J., 60, 820–831.
21. Gottlieb,A.M. and Poggio,L. (2010) Genomic screening in dioecious "yerba mate" tree (Ilex paraguariensis A. St. Hill., Aquifoliaceae) through representational difference analysis. Genetica, 138, 567–578.
22. Maumus,F., Allen,A.E., Mhiri,C., Hu,H., Jabbari,K., Vardi,A., Grandbastien,M.A. and Bowler,C. (2009) Potential impact of stress activated retrotransposons on genome evolution in a marine diatom. BMC Genomics, 10, 624.
23. The International Aphid Genomics Consortium. (2010) Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol., 8, e1000313.
24. Malik,H.S. and Eickbush,T.H. (1999) Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol., 73, 5186–5190.
25. Lerat,E., Brunet,F., Bazin,C. and Capy,P. (1999) Is the evolution of transposable elements modular? Genetica, 107, 15–25.
26. Goodwin,T.J. and Poulter,R.T. (2002) A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol. Genet. Genomics, 267, 481–491.
27. Eickbush,T.H. and Malik,H.S. (2002) Origin and evolution of retrotransposons. In Craig,N.L., Craigie,R., Gellert,M. and Lambowitz,A.M. (eds), Mobile DNA II. ASM Press, Washington DC, pp. 1111–1144.
28. Malik,H.S. and Eickbush,T.H. (2001) Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res., 11, 1187–1197.
29. Marco,A. and Marin,I. (2008) How Athila retrotransposons survive in the Arabidopsis genome. BMC Genomics, 9, 219.
30. Rambaut,A., Posada,D., Crandall,K.A. and Holmes,E.C. (2004) The causes and consequences of HIV evolution. Nat. Rev. Genet., 5, 52–61.
31. Flavell,A.J. (1999) Long terminal repeat retrotransposons jump between species. Proc. Natl Acad. Sci. USA, 96, 12211–12212.
32. Jordan,I.K., Matyunina,L.V. and McDonald,J.F. (1999) Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc. Natl Acad. Sci. USA, 96, 12621–12625.
33. Bousalem,M., Douzery,E.J. and Seal,S.E. (2008) Taxonomy, molecular phylogeny and evolution of plant reverse transcribing viruses (family Caulimoviridae) inferred from full-length genome and reverse transcriptase sequences. Arch. Virol., 153, 1085–1102.
34. Koonin,E.V., Mushegian,A.R., Ryabov,E.V. and Dolja,V.V. (1991) Diverse groups of plant RNA and DNA viruses share related movement proteins that may possess chaperone-like activity. J. Gen. Virol., 72(Pt 12), 2895–2903.
35. Llorens,J.V., Clark,J.B., Martinez-Garay,I., Soriano,S., deFrutos,R. and Martinez-Sebastian,M.J. (2008) Gypsy endogenous retrovirus maintains potential infectivity in several species of Drosophilids. BMC Evol. Biol., 8, 302.
36. de Setta,N., Van Sluys,M.A., Capy,P. and Carareto,C.M. (2009) Multiple invasions of Gypsy and Micropia retroelements in genus Zaprionus and melanogaster subgroup of the genus Drosophila. BMC Evol. Biol., 9, 279.


Publisher
Oxford University Press
Copyright
The Author(s) 2010. Published by Oxford University Press.
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gkq1061
pmid
21036865

Therefore, we aim to implement more than one method of phylogenetic recon- (i) Trees and Networks; consists of the collection of struction to offer the user different perspectives based on inferred phylogenetic trees based on distinct different methods (or the opportunity to upload updated protein domains encoded by the classified phylogenies via the wiki). On the other hand, the trad- elements, or based on their concatenation (when itional view of the origin and evolution of biological they are parts of polyproteins). Remarkably, systems is that they are usually monophyletic, but such inferred pol polyprotein phylogenies based on the an assumption has been challenged by increasing concatenation of the protease, reverse transcriptase, evidence suggesting that natural evolution can frequently RNaseH and integrase domains, are the major cri- proceed by gradual and vertical means, in addition to terion for assigning phylogenetic levels at GyDB 2.0 distinct modular, saltatory and reticulate events (24–36). [results introduced in (7)]. Phylogenetic trees In this respect, we are investigating appropriate protocols provide links to the corresponding element page at to combine phylogenetic inference with new tendencies in GyDB 2.0. By clicking any element name in any tree network biology [see also (7)]. an entry assigned to this element is opened. These tree image maps were created using Phylograph 1.0 (15). This section includes the clan AA reference ACKNOWLEDGEMENTS database (CAARD) of ancestral maximum likeli- We thank all the colleagues detailed in the list available at hood (ML) reconstructions (13) that has been (http://gydb.org/index.php/Acknowledgments) for their implemented and maintained at GyDB. support in contributing images of biological host organ- (ii) GyDB collection (16) or the repository of multiple isms. 
We are also grateful to Senior NAR Editor alignments, HMMs, and majority rule consensus Dr Michael Galperin and to the two anonymous reviewers (MRC) sequences offered at GyDB 2.0. When a for their constructive comments in improving this article. deposited alignment, profile or MRC sequence is Finally we also thank Denys Wheatley and Angela associated with a journal publication, its entry in Panther from Biomedes for copyediting of this article. the collection includes citation information. (iii) REF SEQ DATABASES or the repository for downloading the databases (GENOMES, CORES FUNDING and LTRs) implemented in the BLAST server. Centro de Desarrollo Tecnolo´ gico Industrial (CDTI) Finally, a variety of links to other database initiatives (grant IDI-20100007, partial); Empresa Nacional de relevant to the topic are included in the side menu. Innovacio´ n, S.A (ENISA) (17092008, partial); IMPIVA D74 Nucleic Acids Research, 2011, Vol. 39, Database issue 18. Novikova,O., Mayorov,V., Smyshlyaev,G., Fursov,M., (IMIDTA/2009/118 and IMDTA/2010/740, partial); Adkison,L., Pisarenko,O. and Blinov,A. (2008) Novel clades of European Regional Development Fund (ERDF); chromodomain-containing Gypsy LTR retrotransposons from Ministerio de Ciencia e Innovacio´ n (MICINN) (Torres- mosses (Bryophyta). Plant J., 56, 562–574. Quevedo grants PTQ-09-01-00020, PTQ-09-01-00670 and 19. Bae,Y.A., Ahn,J.S., Kim,S.H., Rhyu,M.G., Kong,Y. and Cho,S.Y. (2008) PwRn1, a novel Ty3/gypsy-like retrotransposon PTQ-10-03552, partial). Funding for open access charge: of Paragonimus westermani: molecular characters and its University of Valencia. differentially preserved mobile potential according to host chromosomal polyploidy. BMC. Genomics, 9, 482. Conflict of interest statement. None declared. 20. Gao,D., Gill,N., Kim,H.R., Walling,J.G., Zhang,W., Fan,C., Yu,Y., Ma,J., SanMiguel,P., Jiang,N. et al. (2009) A lineage-specific centromere retrotransposon in Oryza brachyantha. REFERENCES Plant J., 60, 820–831. 
21. Gottlieb,A.M. and Poggio,L. (2010) Genomic screening in 1. Hurst,G.D.D. and Schilthuizen,M. (1998) Selfish genetic elements dioecious ‘‘yerba mate’’ tree (Ilex paraguariensis A. St. Hill., and speciation. Heredity, 80, 2–8. Aquifoliaceae) through representational difference analysis. 2. Volff,J.N. and Brosius,J. (2007) Modern genomes with retro-look: Genetica, 138, 567–578. retrotransposed elements, retroposition and the origin of new 22. Maumus,F., Allen,A.E., Mhiri,C., Hu,H., Jabbari,K., Vardi,A., genes. Genome Dyn., 3, 175–190. Grandbastien,M.A. and Bowler,C. (2009) Potential impact of 3. Fauquet,C.M., Mayo,M.A., Desselberger,U. and Ball,L.A. (2005) stress activated retrotransposons on genome evolution in a marine Virus Taxonomy, VIIIth Report of the ICTV. Elsevier/Academic diatom. BMC Genomics, 10, 624. Press, London. 23. The International Aphid Genomics Consortium. (2010) Genome 4. Jurka,J., Kapitonov,V.V., Pavlicek,A., Klonowski,P., Kohany,O. sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol., 8, and Walichiewicz,J. (2005) Repbase Update, a database of e1000313. eukaryotic repetitive elements. Cytogenet. Genome Res., 110, 24. Malik,H.S. and Eickbush,T.H. (1999) Modular evolution of the 462–467. integrase domain in the Ty3/Gypsy class of LTR 5. Leplae,R., Hebrant,A., Wodak,S.J. and Toussaint,A. (2004) retrotransposons. J. Virol., 73, 5186–5190. ACLAME: a CLAssification of Mobile genetic Elements. 25. Lerat,E., Brunet,F., Bazin,C. and Capy,P. (1999) Is the evolution Nucleic Acids Res., 32, D45–D49. of transposable elements modular? Genetica, 107, 15–25. 6. Llorens,C., Futami,R., Bezemer,D. and Moya,A. (2008) The 26. Goodwin,T.J. and Poulter,R.T. (2002) A group of deuterostome Gypsy Database (GyDB) of mobile genetic elements. Ty3/ gypsy-like retrotransposons with Ty1/ copia-like pol-domain Nucleic Acids Res., 36, 38–46. orders. Mol. Genet. Genomics, 267, 481–491. 7. Llorens,C., Munoz-Pomer,A., Bernad,L., Botella,H. and Moya,A. 27. Eickbush,T.H. 
and Malik,H.S. (2002) Origin and evolution of (2009) Network dynamics of eukaryotic LTR retroelements retrotransposons. In Craig,N.L., Craigie,R., Gellert,M. and beyond phylogenetic trees. Biol. Direct., 4, 41. Lambowitz,A.M. (eds), Mobile DNA II. ASM Press, Washington 8. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and DC, pp. 1111–1144. Sayers,E.W. (2009) GenBank. Nucleic Acids Res., 37, D26–D31. 28. Malik,H.S. and Eickbush,T.H. (2001) Phylogenetic analysis of 9. Llorens,C., Fares,M.A. and Moya,A. (2008) Relationships of ribonuclease H domains suggests a late, chimeric origin of LTR Gag–pol diversity between Ty3/Gypsy and Retroviridae LTR retrotransposable elements and retroviruses. Genome Res., 11, retroelements and the three kings hypothesis. BMC Evol. Biol., 8, 1187–1197. 29. Marco,A. and Marin,I. (2008) How Athila retrotransposons 10. Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, survive in the Arabidopsis genome. BMC. Genomics, 9, 219. 14, 755–763. 30. Rambaut,A., Posada,D., Crandall,K.A. and Holmes,E.C. (2004) 11. Koonin,E.V., Zhou,S. and Lucchesi,J.C. (1995) The chromo The causes and consequences of HIV evolution. Nat. Rev. Genet., superfamily: new members, duplication of the chromo domain 5, 52–61. and possible role in delivering transcription regulators to 31. Flavell,A.J. (1999) Long terminal repeat retrotransposons jump chromatin. Nucleic Acids Res., 23, 4229–4233. between species. Proc. Natl Acad. Sci. USA, 96, 12211–12212. 12. Rawlings,N.D., Barrett,A.J. and Bateman,A. (2010) MEROPS: 32. Jordan,I.K., Matyunina,L.V. and McDonald,J.F. (1999) Evidence the peptidase database. Nucleic Acids Res., 38, D227–D233. for the recent horizontal transfer of long terminal repeat 13. Llorens,C., Futami,R., Renaud,G. and Moya,A. (2009) retrotransposon. Proc. Natl Acad. Sci. USA, 96, 12621–12625. Bioinformatic flowchart and database to investigate the origins 33. Bousalem,M., Douzery,E.J. and Seal,S.E. 
(2008) Taxonomy, and diversity of Clan AA peptidases. Biol. Direct., 4,3. molecular phylogeny and evolution of plant reverse transcribing 14. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., viruses (family Caulimoviridae) inferred from full-length genome Miller,W. and Lipman,D.J. (1997) Gapped BLAST and and reverse transcriptase sequences. Arch. Virol., 153, 1085–1102. PSI-BLAST: a new generation of protein database search 34. Koonin,E.V., Mushegian,A.R., Ryabov,E.V. and Dolja,V.V. programs. Nucleic Acids Res., 25, 3389–3402. (1991) Diverse groups of plant RNA and DNA viruses share 15. Llorens,C., Futami,R., Vicente-Ripolles,M. and Moya,A. (2008) related movement proteins that may possess chaperone-like Phylograph: a multifunction Java editor for handling phylogenetic activity. J. Gen. Virol., 72(Pt 12), 2895–2903. trees. Biotechvana Bioinformatics, Biotechvana, Valencia, SOFT: 35. Llorens,J.V., Clark,J.B., Martinez-Garay,I., Soriano,S., Phylograph. deFrutos,R. and Martinez-Sebastian,M.J. (2008) Gypsy 16. Llorens,C., Mun˜ oz-Pomer,A., Futami,R. and Moya,A. (2009) The endogenous retrovirus maintains potential infectivity in several GyDB Collection of Viral and Mobile Genetic Element Models. species of Drosophilids. BMC Evol. Biol., 8, 302. Biotechvana Bioinformatics, Biotechvana, Valencia, CR: GyDB 36. de Setta,N., Van Sluys,M.A., Capy,P. and Carareto,C.M. (2009) Collection. Multiple invasions of Gypsy and Micropia retroelements in genus 17. Piskurek,O., Nishihara,H. and Okada,N. (2008) The evolution of Zaprionus and melanogaster subgroup of the genus Drosophila. two partner LINE/SINE families and a full-length BMC Evol. Biol., 9, 279. chromodomain-containing Ty3/Gypsy LTR element in the first reptilian genome of Anolis carolinensis. Gene, 441, 111–118.

Journal

Nucleic Acids Research, Oxford University Press

Published: Jan 29, 2011
