TY - JOUR AU - Robinson-Rechavi, Marc AB - Abstract Nuclear hormone receptors are an abundant class of ligand-activated transcriptional regulators, found in varying numbers in all animals. Based on our experience of managing the official nomenclature of nuclear receptors, we have developed NUREBASE, a database containing protein and DNA sequences, reviewed protein alignments and phylogenies, taxonomy and annotations for all nuclear receptors. The reviewed NUREBASE is complemented by NUREBASE_DAILY, automatically updated every 24 h. Both databases are organized under a client/server architecture, with a client written in Java which runs on any platform. This client, named FamFetch, integrates a graphical interface allowing selection of families, and manipulation of phylogenies and alignments. NUREBASE sequence data is also accessible through a World Wide Web server, allowing complex queries. All information on accessing and installing NUREBASE may be found at http://www.ens-lyon.fr/LBMC/laudet/nurebase.html. Received August 13, 2001; Revised and Accepted October 10, 2001. INTRODUCTION Nuclear hormone receptors are one of the most abundant classes of transcriptional regulators in metazoans, in which they regulate functions as diverse as reproduction, differentiation, development, metabolism, metamorphosis or homeostasis (1). They function as ligand-activated transcription factors, thus providing a direct link between signaling molecules that control these processes and transcriptional responses. Nuclear receptors form a superfamily of phylogenetically related proteins, which share a common structural organization: a variable N-terminal region (A/B domain), a central, well conserved DNA binding domain (DBD, C domain), a non-conserved hinge (D domain) and a C-terminal, moderately conserved ligand binding domain (LBD, E domain) (1). 
The superfamily includes receptors for hydrophobic molecules such as steroid hormones (estrogens, glucocorticoids, progesterone, mineralocorticoids, androgens, vitamin D, ecdysone, oxysterols, bile acids, etc.), retinoic acids (all-trans and 9-cis isoforms), thyroid hormones, fatty acids, leukotrienes and prostaglandins (2). A large number of nuclear receptors have also been identified by homology with the conserved DBD and LBD, but have no identified natural ligand, and are referred to as ‘nuclear orphan receptors’. As nuclear receptors bind small molecules which can easily be modified by drug design, and control functions associated with major pathologies (cancer, osteoporosis, diabetes, etc.), they are promising pharmacological targets. The search for ligands for orphan receptors and the identification of novel signaling pathways have become a very active research field (3,4). Their role in the control of animal development makes them major players for understanding animal evolution (5) or genomics (6). The importance of nuclear receptors has prompted the accumulation of rapidly increasing data from a great diversity of fields of research: sequences, expression patterns, three-dimensional structures, protein–protein interactions, target genes, physiological roles, mutations, etc. These data are highly dispersed, in a variety of formats. The aim of NUREBASE is to present an integrated database, with a single, interactive interface, centralizing up-to-date information about nuclear receptors for the specialist and the non-specialist. There are 21 nuclear receptors in the complete genome of the fly Drosophila melanogaster (7), fewer than 50 in humans (8), but more than 250 in the nematode Caenorhabditis elegans (9,10). This diversity has been officially organized in a phylogeny-based nomenclature (11), of which one of us (V.L.) is in charge. An important aim of our database is thus to facilitate use of the official nomenclature, notably for new nuclear receptors. 
To answer these needs, we built the NUREBASE database of nuclear receptors. It contains all protein and DNA sequences, reviewed protein alignments and phylogenetic trees, and additional information such as nomenclature, domains and natural ligands. DATABASE CONTENTS Release 1.0 (August 2001) of NUREBASE contains 361 nuclear receptor protein sequences without redundancy, from 88 metazoan species. Divergent nematode sequences (9) will be incorporated as they are characterized experimentally and classified in the nomenclature (11). The sequences are grouped into ‘families’, corresponding to levels of nomenclature (11), with the following for each family. 1. A NUREBASE number, of the form NRBaabbcc, in which aa is 01 for all nuclear receptors with a DBD and an LBD and 00 for those which lack one or the other [families NR0 of the Nuclear Receptors Nomenclature Committee (11)], bb is the family number or 00 for receptors of all families, and cc is the sub-family number or 00 for receptors of all sub-families. Each nuclear receptor can belong to several encased NUREBASE families; for example, thyroid hormone receptor α, NR1A1, belongs to NRB010000, NRB010100 and NRB010101. 2. A textual definition, describing the contents of the NUREBASE family. For example, NRB010101 is ‘Sub-family NR1A: TR: THA, THB, NR1A3’. 3. An alignment of protein sequences, checked by eye with SEAVIEW (12). Sequences are first aligned to their sub-family, then sub-families aligned in a family, and finally all families aligned. Thus, the variable A/B domain can be correctly aligned between closely related sequences (human and mouse thyroid hormone receptor α), even though it cannot be aligned over the whole superfamily. 4. A phylogeny based on the protein sequences. Phylogenies are obtained by Neighbor-Joining (13) with Poisson corrected distances, and checked taking into account bootstrap support, known taxonomic relations and other tree-building methods (6,8). 
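The NRBaabbcc numbering described in item 1 can be illustrated with a minimal sketch. It assumes that each of aa, bb and cc is a two-digit, zero-padded integer, as the example NRB010101 suggests; the function name `encased_families` is hypothetical and not part of NUREBASE, and the two genome-wide families NRB020001 and NRB020002 are not covered.

```python
from typing import List


def encased_families(aa: int, bb: int, cc: int) -> List[str]:
    """Return the nested NUREBASE family numbers for one receptor.

    aa, bb and cc follow the NRBaabbcc scheme from the text; the
    two-digit zero padding is an assumption based on NRB010101.
    """
    for part in (aa, bb, cc):
        if not 0 <= part <= 99:
            raise ValueError("each component must fit in two digits")
    families = [
        f"NRB{aa:02d}0000",              # receptors of all families
        f"NRB{aa:02d}{bb:02d}00",        # all sub-families of one family
        f"NRB{aa:02d}{bb:02d}{cc:02d}",  # one sub-family
    ]
    # Remove repeats (e.g. when bb or cc is 00) while keeping the order.
    return list(dict.fromkeys(families))


if __name__ == "__main__":
    # Thyroid hormone receptor alpha, NR1A1: family 1, first sub-family.
    print(encased_families(1, 1, 1))
```

For thyroid hormone receptor α this yields the three families quoted in the text, from the most inclusive to the most specific.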
There are two additional NUREBASE families, NRB020001 and NRB020002, containing all nuclear receptors from the human and fly genomes, respectively. This brings the total number of NUREBASE families in release 1.0 to 35, each with an alignment and a phylogeny. Protein sequence names were modified in a manner similar to SWISS-PROT (14), starting with four characters describing the gene in a standardized way, an underscore, three characters for the genus, one character for the species, and a digit (1 by default) to manage cases of identical names. The gene name corresponds to the most common acronym for well known nuclear receptors, and to the nomenclature otherwise. For example, human thyroid hormone receptor α (NR1A1) is THA_HOMS1, but the human steroidogenic factor (NR5A1) is 5A1_HOMS1. For each protein sequence, NUREBASE has at least one reference to a corresponding DNA sequence in EMBL format (15) containing the original information from this database. Sequence annotations are also enriched with the official nomenclature, the name of the natural ligand (estrogen, thyroid hormone, etc., or ‘orphan’), the definition of the DBDs and LBDs, and the NUREBASE family number. The sum of this information makes a NUREBASE entry. These entries are integrated into two ACNUC (16) databases, one for protein sequences and one for DNA. To accommodate the need for frequent updates, as well as reviewed data, we have created a second database, NUREBASE_DAILY. It contains all the sequences and annotations of the reviewed NUREBASE, plus new sequences detected by an automatic update procedure. This procedure is launched every 24 h. It first downloads the daily release of GenPept, the translation of new coding sequences from NCBI (17). Nuclear receptor sequences are detected using BLASTP2 (18) by a search performed against one representative sequence for each nuclear receptor (i.e. one NR1A1, one NR1A2, etc.). 
Hits with an E-value < 10⁻⁵ are allotted to a sub-family by a second BLAST search performed against all known nuclear receptors (8). Entries in NUREBASE_DAILY are automatically generated from GenBank entries plus information from the sub-family to which they are allotted, and converted to the EMBL and TrEMBL formats. EMBL is not used directly in that case because of the lack of daily update files. Protein alignments are computed by adding new sequences to the pre-existing alignment of their sub-family with the Profile option of CLUSTALW (19). Phylogenetic trees are computed from these alignments by Neighbor-Joining with Poisson distances. The phylogenies in NUREBASE_DAILY are notably unreliable, since a mistake in allotting a candidate nuclear receptor sequence may disturb the whole tree reconstruction process. New sequences in NUREBASE_DAILY have an automatically generated name with a ‘∼’ separator instead of ‘_’, for example, REVA∼MUSM1. NUREBASE_DAILY entries are then regularly added to NUREBASE proper, including recovery of corresponding EMBL entries, review of the annotations and inclusion in the reviewed alignments and trees of NUREBASE. In addition, an asterisk marks the families in which new sequences have been most recently added. DATABASE ACCESS The main access to NUREBASE is through the interface initially developed for the HOBACGEN database (20). This system is built under a client/server architecture, avoiding the need to install the complete database on the user’s computer (Fig. 1). The server side is made of three components: a World Wide Web service, a dedicated C program to access the data, and the data itself. A complete description of the World Wide Web service and the C program has been published previously (20). The client is the FamFetch Java application, which allows wide portability and interactivity. The main window of the interface allows selection of one of the NUREBASE families described above (Fig. 2). 
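The sequence naming convention described under DATABASE CONTENTS, together with the ‘∼’ separator used for automatically generated NUREBASE_DAILY names, can be sketched as follows. This is an illustration only: the function name `nurebase_name` and its parameters are hypothetical, and accepting gene acronyms of one to four characters is an assumption drawn from the examples THA_HOMS1, 5A1_HOMS1 and REVA∼MUSM1.

```python
REVIEWED_SEPARATOR = "_"
DAILY_SEPARATOR = "\u223c"  # the tilde-like separator shown in REVA∼MUSM1


def nurebase_name(gene: str, genus: str, species: str,
                  index: int = 1, reviewed: bool = True) -> str:
    """Build a name such as THA_HOMS1 from a gene acronym and a binomial.

    Layout taken from the text: gene acronym, separator, three characters
    for the genus, one for the species, then a digit (1 by default).
    """
    if not 1 <= len(gene) <= 4:
        raise ValueError("gene acronym is assumed to be 1 to 4 characters")
    if len(genus) < 3 or not species:
        raise ValueError("genus needs 3 characters and species at least 1")
    if not 0 <= index <= 9:
        raise ValueError("index must be a single digit")
    separator = REVIEWED_SEPARATOR if reviewed else DAILY_SEPARATOR
    taxon = genus[:3].upper() + species[0].upper()
    return f"{gene.upper()}{separator}{taxon}{index}"


if __name__ == "__main__":
    print(nurebase_name("THA", "Homo", "sapiens"))
    print(nurebase_name("REVA", "Mus", "musculus", reviewed=False))
```

Keeping the separator as the only difference between reviewed and automatic names means that a name can be promoted from NUREBASE_DAILY to NUREBASE by replacing a single character.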
It is possible to make queries to define a subset of families matching specific criteria, such as species or keywords. Selection of a family prompts a tree window with the corresponding phylogeny. In this tree, sequences are colored according to taxonomy. The tree display is interactive, with options of re-rooting, node swapping or subtree selection. Clicking on leaves allows selection of one or several nuclear receptors in a new window. From there, the user may view the DNA or protein NUREBASE entry, or the protein alignment. The alignment contains only those sequences selected by the user, and is not computed but reconstructed from the pre-existing whole family multiple alignment. Functions allow the user to save lists of families, sequence entries, alignments or trees in text files. The two ACNUC databases of NUREBASE_DAILY are accessible online through the PBIL (Pôle Bio-Informatique Lyonnais) World Wide Web server (http://pbil.univ-lyon1.fr/search/query.html) (21). Advanced users can also install the complete database locally, and use the Query (16) and Query_win (22) retrieval programs allowing complex queries, although the benefit of regular updates is thus lost. The results of queries can then be directly used in the FamFetch interface, as explained by Perrière et al. (20). All information on accessing and installing NUREBASE may be found at http://www.ens-lyon.fr/LBMC/laudet/nurebase.html. DISCUSSION Strong points of NUREBASE include reviewed data, an interactive interface, elaborate queries through ACNUC and daily updates. These points differentiate NUREBASE from available World Wide Web resources for nuclear receptors. The Nuclear Receptor Resource (NRR) (23) is a network of static web pages, specialized in different sub-families of nuclear receptors, such as glucocorticoid receptors or androgen receptors. Unlike NUREBASE, the NRR is not exhaustive, nor does it offer evolutionary information, such as taxonomy or phylogenies. 
The NRR does not allow complex queries either, but it features important information such as a ‘Who’s who?’ in the field, meeting advertisements, educational resources, etc., which are outside the scope of NUREBASE. NucleaRDB (24) is more similar to NUREBASE, in that it includes all nuclear receptors, with alignments and trees per family and sub-family; like NUREBASE, but unlike the NRR, NucleaRDB relies on the official nomenclature. Unlike NUREBASE, data in NucleaRDB is not reviewed, but rather automatically generated by a system originally developed for G protein-coupled receptors. Moreover, the web page of NucleaRDB does not allow complex queries, or manipulations of alignments or phylogenies. On the other hand, NucleaRDB now includes data mining from MEDLINE, which appears to be a promising tool for all studies of nuclear receptors. We thus hope that links between NUREBASE, the NRR and NucleaRDB will develop. Apart from these links, there are two main perspectives for NUREBASE development. The first is automatic integration of relevant data from various specialized databases. For each type of data, a first manual step is necessary to evaluate the task, then an automatic query is to be set up, as for sequences, to keep the database up-to-date. Examples of databanks to query include: expression from ESTs or DNA chips, mutant mice, human mutations involved in diseases, secondary protein structures, non-annotated genome sequences, etc. The second perspective is development of the database structure and interface to integrate more elaborate data, such as in situ hybridization results, three-dimensional molecular structure or interactions with other proteins and target genes. We are also developing tools to handle alternative transcripts, which play numerous functional roles in nuclear receptors, and which still pose an important problem to many bioinformatic procedures. 
Finally, we are testing the use of different programs to improve automatic updating, such as SSAHA for similarity searches (Z.Ning, A.J.Cox, J.C.Mullikin, unpublished), T-COFFEE for alignments (25) or TREE-PUZZLE for phylogenies (26). We expect NUREBASE to be an important tool for all functional and evolutionary studies of nuclear receptors, as well as to organize the increasing amount of genome, transcriptome and proteome data relevant to this important superfamily. ACKNOWLEDGEMENTS We thank François Bonneton and Hector Escriva for critical reading. NUREBASE is supported by the CNRS. J.D. is supported by formation continue of Université de Rouen. * To whom correspondence should be addressed. Tel: +33 472 72 86 85; Fax: +33 472 72 80 80; Email: marc.robinson@ens-lyon.fr Figure 1. General organization of the NUREBASE client/server architecture. Figure 2. Organization of the FamFetch interface. The ‘Families’ window allows one to perform queries and to select a given family (A). In the ‘Tree’ window, the phylogenetic tree associated with the selected family is displayed (B). The ‘Colors’ dialog box shows the correspondence between colors and taxa (C). Clicking on a leaf on the ‘Tree’ window starts the display of the ‘Choice’ dialog box (D), from which it is possible to view one of the protein or DNA sequences (E), or the alignment of selected sequences (F). 
References 1 Gronemeyer,H. and Laudet,V. (1995) Transcription factors 3: nuclear receptors. In Sheterline,P. (ed.), Protein Profile. Academic Press, London, UK, Vol. 2. 2 Escriva,H., Delaunay,F. and Laudet,V. (2000) Ligand binding and nuclear receptor evolution. Bioessays, 22, 717–727. 3 Gustafsson,J.A. (1999) Seeking ligands for lonely orphan receptors. Science, 284, 1285–1286. 4 Kliewer,S.A., Lehmann,J.M. and Willson,T.M. (1999) Orphan nuclear receptors: shifting endocrinology into reverse. Science, 284, 757–760. 5 Escriva,H., Safi,R., Hänni,C., Langlois,M.-C., Saumitou-Laprade,P., Stehelin,D., Capron,A., Pierce,R. and Laudet,V. (1997) Ligand binding was acquired during evolution of nuclear receptors. Proc. Natl Acad. Sci. USA, 94, 6803–6808. 6 Robinson-Rechavi,M., Marchand,O., Escriva,H., Bardet,P.-L., Zelus,D., Hughes,S. and Laudet,V. (2001) Euteleost fish genomes are characterized by expansion of gene families. Genome Res., 11, 781–788. 7 Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195. 8 Robinson-Rechavi,M., Carpentier,A.-S., Duffraisse,M. and Laudet,V. (2001) How many nuclear hormone receptors in the human genome? Trends Genet., 17, 554–556. 9 Sluder,A.E., Mathews,S.W., Hough,D., Yin,V.P. and Maina,C.V. (1999) The nuclear receptor superfamily has undergone extensive proliferation and diversification in nematodes. Genome Res., 9, 103–120. 10 Sluder,A.E. and Maina,C.V. (2001) Nuclear receptors in nematodes: themes and variations. 
Trends Genet., 17, 206–213. 11 Nuclear Receptors Nomenclature Committee (1999) A unified nomenclature system for the nuclear receptor superfamily. Cell, 97, 1–3. 12 Galtier,N., Gouy,M. and Gautier,C. (1996) SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci., 12, 543–548. 13 Saitou,N. and Nei,M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4, 406–425. 14 Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48. 15 Stoesser,G., Baker,W., van den Broek,A., Camon,E., Garcia-Pastor,M., Kanz,C., Kulikova,T., Lombard,V., Lopez,R., Parkinson,H., Redaschi,N., Sterk,P., Stoehr,P. and Tuli,M.A. (2001) The EMBL nucleotide sequence database. Nucleic Acids Res., 29, 17–21. Updated article in this issue: Nucleic Acids Res. (2002), 30, 21–26. 16 Gouy,M., Gautier,C., Attimonelli,M., Lanave,C. and Di Paola,G. (1985) ACNUC – a portable system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci., 1, 167–172. 17 Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11–16. Updated article in this issue: Nucleic Acids Res. (2002), 30, 13–16. 18 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402. 19 Thompson,J.D., Higgins,D.G. and Gibson,T.J. 
(1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680. 20 Perrière,G., Duret,L. and Gouy,M. (2000) HOBACGEN: database system for comparative genomics in bacteria. Genome Res., 10, 379–385. 21 Perrière,G. and Gouy,M. (1996) WWW-Query: an on-line retrieval system for biological sequence banks. Biochimie, 78, 364–369. 22 Perrière,G., Gouy,M. and Gojobori,T. (1994) NRSub: a non-redundant data base for the Bacillus subtilis genome. Nucleic Acids Res., 22, 5525–5529. 23 Martinez,E., Moore,D.D., Keller,E., Pearce,D., Vanden Heuvel,J.P., Robinson,V., Gottlieb,B., MacDonald,P., Simons,S.,Jr, Sanchez,E. and Danielsen,M. (1998) The Nuclear Receptor Resource: a growing family. Nucleic Acids Res., 26, 239–241. 24 Horn,F., Vriend,G. and Cohen,F.E. (2001) Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res., 29, 346–349. 25 Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-COFFEE: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217. 26 Strimmer,K. and Von Haeseler,A. (1996) Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol., 13, 964–969. TI - NUREBASE: database of nuclear hormone receptors JO - Nucleic Acids Research DO - 10.1093/nar/30.1.364 DA - 2002-01-01 UR - https://www.deepdyve.com/lp/oxford-university-press/nurebase-database-of-nuclear-hormone-receptors-GFyFDPP1V8 SP - 364 EP - 368 VL - 30 IS - 1 DP - DeepDyve ER -