TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes

V. Matys; O. V. Kel-Margoulis; E. Fricke; I. Liebich; S. Land; A. Barre-Dirrie; I. Reuter; D. Chekmenev; M. Krull; K. Hornischer; N. Voss; P. Stegmaier; B. Lewicki-Potapov; H. Saxel; A. E. Kel; E. Wingender

doi:10.1093/nar/gkj143


Nucleic Acids Research, 2006, Vol. 34, Database issue, D108–D110

BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany

Received September 15, 2005; Revised and Accepted October 27, 2005

*To whom correspondence should be addressed (V. Matys). Tel: +49 5331 8584 28; Fax: +49 5331 8584 70; Email: [email protected]

© The Author 2006. Published by Oxford University Press. All rights reserved. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

ABSTRACT

The TRANSFAC database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes, as well as the complementing database TRANSCompel on composite elements, have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC. The list of databases linked to the common GENE table of TRANSFAC and TRANSCompel has been extended by Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel now contains, in addition to the COMPEL table, a separate EVIDENCE table with detailed information on the experimental evidence on which the composite elements are based. Finally, with respect to data growth in TRANSFAC, the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) deserves particular mention. The public releases described here, TRANSFAC 7.0 and TRANSCompel 7.0, are accessible at http://www.gene-regulation.com/pub/databases.html.

INTRODUCTION

For a better understanding of almost all life processes, a deeper knowledge of gene regulation seems indispensable. TRANSFAC (1–3) and TRANSCompel (3,4) are among the databases that have contributed for years to sifting and organizing the published data on eukaryotic gene transcription regulation and, by doing so, to making the available data applicable for analysis and predictions. The primary data in the two databases (e.g. DNA-binding sites in TRANSFAC, composite elements in TRANSCompel) are based on experimental evidence. These data are extracted by curators from peer-reviewed papers. The curators search the scientific literature for suitable data, which are then entered via an input client, making use of controlled vocabulary and various automated functions, into a relational database, from which flatfile releases are generated from time to time. Collecting these data in a structured form allows us to deduce, via comparison and classification, secondary or so-called meta-data (e.g. nucleotide distribution matrices, factor classification). Both types of data, primary as well as secondary, can then serve for (sequence-based) predictions by certain programs, e.g. Match (5) (for matrix-based transcription factor binding site searches), Patch (for pattern-based transcription factor binding site searches) and P-Match (6) (for a mixture of matrix- and pattern-based binding site searches). (These programs are all available on the same server as the databases described here.)

Content of TRANSFAC and TRANSCompel

The primary data of TRANSFAC are stored in the three tables FACTOR, SITE and GENE, which hold information on transcription factors, their binding sites and regulated genes, respectively. Besides genomic binding sites, the SITE table also contains so-called artificial binding sites, which are mostly sites from random oligonucleotide selection assays, and IUPAC consensus sequences. Nucleotide distribution matrices, which are derived from a collection of binding sites for a particular factor, are stored in the MATRIX table, while the CLASS table groups the transcription factors according to their DNA-binding domains. In addition to this CLASS table, the factor entries are linked to the respective nodes in a classification hierarchy (7). In a sixth table (CELL), cell lines and other kinds of factor sources that were used for detection of a binding site/binding activity are stored (Table 1).

While TRANSFAC deals essentially with single factor–site interactions, the focus of TRANSCompel is on so-called composite elements, consisting of two (or more) neighboring binding sites, characterized by synergistic or antagonistic effects between the two transcription factors binding to them. They are, thus, the smallest units of combinatorial transcriptional regulation (Table 1).

Table 1. Number of entries in the tables of TRANSFAC 7.0 and TRANSCompel 7.0

Table                                 Entries
TRANSFAC Rel. 7.0
  FACTOR                              6133
    Homo sapiens                      1040
    Mus musculus                       765
    D.melanogaster                     233
    A.thaliana                        1751
    S.cerevisiae                       368
  SITE                                7915
  MATRIX                               398
  GENE (all entries)                  2397
    H.sapiens                          608
    M.musculus                         417
    D.melanogaster                     145
    A.thaliana                         115
    S.cerevisiae                       195
  GENE (entries with SITE links)      1504
  CLASS                                 50
  CELL                                1307
TRANSCompel Rel. 7.0
  COMPEL (composite elements)          322

Recent changes in the database structure

In the TRANSFAC GENE table a new field has been introduced for information on the regulation of gene expression, especially when this information cannot (yet) be assigned to a particular binding site. The CELL table, which contains entries of cell lines or other factor sources, now lists all SITE entries for which binding activity was shown under the given conditions. The factor CLASS entries have been linked to the respective nodes in the hierarchical factor classification. The links from GENE entries (TRANSFAC) to composite elements (in TRANSCompel) are no longer listed among other database links, but are now given after the listed binding sites and, as for those, the positions of the composite elements within the gene (usually relative to the transcription start site, TSS) are given.

The structure of TRANSCompel has been fundamentally changed. The database now consists of two tables. The COMPEL table contains general information about the composite elements including sequence, positions, gene, names of cooperating transcription factors as well as a brief list of the experimental evidence, while detailed information about the experimental evidence, confirming physical and functional interactions between the corresponding transcription factors, can be found in the EVIDENCE table.

Linking to other databases

Linking to other databases has been extended. The FACTOR and GENE entries of TRANSFAC have been linked to the respective protein reports in HumanPSD (8). GENE entries from human, mouse and rat have been linked to the promoter database TRANSPRO [see the paper about TiProD (9)], and the corresponding SITE entries were also mapped to the promoter sequences in TRANSPRO, where absolute genomic positions are given for the promoter sequences. These sequences in TRANSPRO reach from 10 000 nt upstream to 1000 nt downstream relative to the accepted 'virtual TSSs', which are derived by weighting documented TSSs from different resources (9). FACTOR–SITE–GENE links are mirrored as 'trans-regulations' (FACTOR→GENE) in TRANSPATH (10,11), where these are incorporated into the overall regulation network of the cell. For yeast (Saccharomyces cerevisiae) genes, standard open reading frame names are now given under synonyms. For human, mouse and rat genes (and factors), HGNC (12), MGI (13) and RGD (14) gene symbols are given, respectively. (The HGNC, MGI and RGD gene symbols appear in the GENE table under external database links and in the FACTOR table alongside the GENE link.) Further, new links to Ensembl (15) and UniGene (16), as well as Affymetrix probe set IDs, were added to the GENE table, and the links to LocusLink were changed into EntrezGene (17) links and expanded from human to mouse and rat, as well as, to a lesser extent, other organisms.

Automatic factor domain assignment

For each release, protein sequences are analyzed with InterProScan (18). From the databases integrated by InterPro, Pfam (19), SMART (20) and PROSITE (21) models corresponding to low-complexity regions are selected. The automatically assigned domains, which are linked to the corresponding Pfam, SMART and PROSITE entries, are meant to complement the manually annotated domains, many of which are based on functional studies reported in the original literature.

From Arabidopsis to Drosophila

Besides a general data increase, with major focus on human, mouse, rat and other vertebrate organisms, the amount of Arabidopsis thaliana and Drosophila melanogaster data in particular has been increased for TRANSFAC 7.0. This was accomplished mainly by import of 1440 factor entries from DATF, Database of Arabidopsis Transcription Factors [http://datf.cbi.pku.edu.cn/ (22)], and 899 genomic site entries from the Drosophila DNase I footprint database [http://www.flyreg.org/ (23)]. The imported data are referenced accordingly and are linked to the respective databases from which they were derived. In addition, the Drosophila gene entries linked to the newly imported sites contain pointers to FlyBase (24) and EntrezGene (17). Identifiers used by Ensembl which are synonyms in FlyBase and EntrezGene (e.g. CG3481) were introduced as synonyms of the gene name and were used for mapping during the import procedure.

New web interface

TRANSFAC and the programs Match and Patch, for transcription factor binding site searches, are now combined under a common web interface. In addition to the 'one table search', the new search engine has a search mode for simultaneous search in all tables. The 'one table search' now offers the possibility to combine searches in up to three fields at the same time, to run a batch search, and further options. The user can also choose the output fields to be displayed in the result list. Each search can be stored and refined later on, or the stored search result (list of accession numbers) can serve as input for a batch search in another table. Finally, on the basis of the result from a search in the SITE or MATRIX table, 'profiles' (sets of sites or matrices) can be created, which can be used by the integrated version of Patch or Match, respectively, for sequence analyses.

AVAILABILITY

The described TRANSFAC 7.0 and TRANSCompel 7.0 releases as well as the programs Match, Patch and P-Match are all freely available for online use by users from non-profit organizations at http://www.gene-regulation.com/pub/databases.html#transfac, http://www.gene-regulation.com/pub/databases.html#transcompel and http://www.gene-regulation.com/pub/programs.html, respectively.

ACKNOWLEDGEMENTS

We would like to thank Prof Jingchu Luo, Dr Anyuan Guo and colleagues from Peking University, Center for Bioinformatics, for providing the DATF data and Dr Casey Bergman and colleagues from FlyReg, University of Manchester, U.K. for the Drosophila footprint data, as well as Prof. Dr. Reinhard Hehl and Claudia Galuschka from the Technical University Braunschweig, Germany for their collaboration on plant data curation. Further, we would like to thank all people who have been contributing over the years to the development and curation of the described databases and connected tools. Parts of the work were funded by grants of the German Ministry of Education and Research (BMBF) 'Intergenomics' (031U210B), collectively by BioRegioN GmbH and BMBF 'BioProfil' (0313092), by the European Commission under FP6-'Life sciences, genomics and biotechnology for health', contract LSHG-CT-2004-503568 'COMBIO', and by the European Commission under 'Marie Curie research training networks', contract MRTN-CT-2004-512285 'TRANSISTOR'. Funding to pay the Open Access publication charges for this article was provided by BIOBASE GmbH.

Conflict of interest statement. None declared.

REFERENCES

1. Wingender,E. (1988) Compilation of transcription regulating proteins. Nucleic Acids Res., 16, 1879–1902.
2. Matys,V., Fricke,E., Geffers,R., Gößling,E., Haubrock,M., Hehl,R., Hornischer,K., Karas,D., Kel,A.E., Kel-Margoulis,O.V. et al. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374–378.
3. Kel-Margoulis,O., Matys,V., Choi,C., Reuter,I., Krull,M., Potapov,A.P., Voss,N., Liebich,I., Kel,A. and Wingender,E. (2005) Databases on gene regulation. In Bajic,V.B. and Tan,T.W. (eds), Information Processing and Living Systems. World Scientific Publishing Co., Singapore, Vol. 2, pp. 709–727.
4. Kel-Margoulis,O., Kel,A.E., Reuter,I., Deineko,I.V. and Wingender,E. (2002) TRANSCompel: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res., 30, 332–334.
5. Kel,A.E., Gößling,E., Reuter,I., Cheremushkin,E., Kel-Margoulis,O.V. and Wingender,E. (2003) MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res., 31, 3576–3579.
6. Chekmenev,D.S., Haid,C. and Kel,A.E. (2005) P-Match: transcription factor binding site search by combining patterns and weight matrices. Nucleic Acids Res., 33, W432–W437.
7. Wingender,E. (1997) Classification scheme of eukaryotic transcription factors. Mol. Biol., 31, 483–497.
8. Hodges,P.E., Carrico,P.M., Hogan,J.D., O'Neill,K.E., Owen,J.J., Mangan,M., Davis,B.P., Brooks,J.E. and Garrels,J.I. (2002) Annotating the human proteome: the Human Proteome Survey Database (HumanPSD™) and an in-depth target database for G protein-coupled receptors (GPCR-PD™) from Incyte Genomics. Nucleic Acids Res., 30, 137–141.
9. Chen,X., Wu,J.-m., Hornischer,K., Kel,A. and Wingender,E. (2006) TiProD: the Tissue-specific Promoter Database. Nucleic Acids Res., 34, D104–D107.
10. Schacherer,F., Choi,C., Götze,U., Krull,M., Pistor,S. and Wingender,E. (2001) The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics, 17, 1053–1057.
11. Krull,M., Voss,N., Choi,C., Pistor,S., Potapov,A. and Wingender,E. (2003) TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res., 31, 97–100.
12. Wain,H.M., Lush,M.J., Ducluzeau,F., Khodiyar,V.K. and Povey,S. (2004) Genew: the Human Gene Nomenclature Database, 2004 updates. Nucleic Acids Res., 32, D255–D257.
13. Bult,C.J., Blake,J.A., Richardson,J.E., Kadin,J.A., Eppig,J.T. and the Mouse Genome Database Group (2004) The Mouse Genome Database (MGD): integrating biology with the genome. Nucleic Acids Res., 32, D476–D481.
14. de la Cruz,N., Bromberg,S., Pasko,D., Shimoyama,M., Twigger,S., Chen,J., Chen,C.F., Fan,C., Foote,C., Gopinath,G.R. et al. (2005) The Rat Genome Database (RGD): developments towards a phenome database. Nucleic Acids Res., 33, D485–D491.
15. Hubbard,T., Andrews,D., Caccamo,M., Cameron,G., Chen,Y., Clamp,M., Clarke,L., Coates,G., Cox,T., Cunningham,F. et al. (2005) Ensembl 2005. Nucleic Acids Res., 33, D447–D453.
16. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S., Helmberg,W. et al. (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 33, D39–D45.
17. Maglott,D., Ostell,J., Pruitt,K.D. and Tatusova,T. (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 33, D54–D58.
18. Mulder,N.J., Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Binns,D., Bradley,P., Bork,P., Bucher,P., Cerutti,L. et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res., 33, D201–D205.
19. Bateman,A., Coin,L., Durbin,R., Finn,R.D., Hollich,V., Griffiths-Jones,S., Khanna,A., Marshall,M., Moxon,S., Sonnhammer,E.L. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
20. Letunic,I., Copley,R.R., Schmidt,S., Ciccarelli,F.D., Doerks,T., Schultz,J., Ponting,C.P. and Bork,P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142–D144.
21. Hulo,N., Sigrist,C.J., Le Saux,V., Langendijk-Genevaux,P.S., Bordoli,L., Gattiker,A., De Castro,E., Bucher,P. and Bairoch,A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Res., 32, D134–D137.
22. Guo,A., He,K., Liu,D., Bai,S., Gu,X., Wei,L. and Luo,J. (2005) DATF: a Database of Arabidopsis Transcription Factors. Bioinformatics, 21, 2568–2569.
23. Bergman,C.M., Carlson,J.W. and Celniker,S.E. (2005) Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics, 21, 1747–1749.
24. Drysdale,R.A., Crosby,M.A. and FlyBase Consortium (2005) FlyBase: genes and gene models. Nucleic Acids Res., 33, D390–D395.


References (28)

P. Hodges, P. Carrico, J. Hogan, Kathy O’Neill, J. Owen, Mary Mangan, B. Davis, J. Brooks, J. Garrels (2002)
Annotating the human proteome: the Human Proteome Survey Database (HumanPSD™) and an in-depth target database for G protein-coupled receptors (GPCR-PD™) from Incyte Genomics
Nucleic acids research, 30 1
A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L. Sonnhammer (2004)
The Pfam protein families database
Nucleic Acids Res., 32
E. Gößling, O. Kel-Margoulis, A. Kel, E. Wingender (2003)
MATCH™: a tool for searching transcription factor binding sites in DNA sequences. Application for the analysis of human chromosomes
H. Wain, M. Lush, Fabrice Ducluzeau, V. Khodiyar, S. Povey (2004)
Genew: the Human Gene Nomenclature Database, 2004 updates
Nucleic acids research, 32 Database issue
C. Bult, J. Blake, J. Richardson, J. Kadin, J. Eppig (2004)
The Mouse Genome Database (MGD): integrating biology with the genome
Nucleic acids research, 32 Database issue
D. Chekmenev, C. Haid, A. Kel (2005)
P-Match: transcription factor binding site search by combining patterns and weight matrices
Nucleic Acids Research, 33
A. Kel, E. Gößling, I. Reuter, E. Cheremushkin, O. Kel-Margoulis, E. Wingender (2003)
MATCH: A tool for searching transcription factor binding sites in DNA sequences.
Nucleic acids research, 31 13
H. Wain, M. Lush, Fabrice Ducluzeau, S. Povey (2002)
Genew: the Human Gene Nomenclature Database
Nucleic acids research, 30 1
N. Hulo, Christian Sigrist, Virginie Saux, P. Langendijk-Genevaux, L. Bordoli, Alexandre Gattiker, E. Castro, P. Bucher, A. Bairoch (2004)
Recent improvements to the PROSITE database
Nucleic acids research, 32 Database issue
R. Drysdale, M. Crosby (2004)
FlyBase: genes and gene models
Nucleic Acids Research, 33
C. Bergman, J. Carlson, S. Celniker (2005)
Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster
Bioinformatics, 21 8
Anyuan Guo, Kun He, Di Liu, Shunong Bai, X. Gu, Liping Wei, Jingchu Luo (2005)
DATF: a database of Arabidopsis transcription factors
Bioinformatics, 21 10
O. Kel-Margoulis, V. Matys, Claudia Choi, M. Krull, I. Reuter, N. Voss, A. Kel, E. Wingender, A. Potapov, I. Liebich (2005)
DATABASES ON GENE REGULATION
E. Wingender (1988)
Compilation of transcription regulating proteins.
Nucleic acids research, 16 5
N. Cruz, Susan Bromberg, D. Pasko, M. Shimoyama, S. Twigger, Jiali Chen, Chin-Fu Chen, Chunyu Fan, Cindy Foote, Gopal Gopinath, G. Harris, Aubrey Hughes, Yuan Ji, W. Jin, Dawei Li, Jedidiah Mathis, N. Nenasheva, Jeff Nie, Rajni Nigam, V. Petri, Dorothy Reilly, Weiye Wang, Wenhua Wu, Angela Zuniga-Meyer, Lan Zhao, A. Kwitek, P. Tonellato, H. Jacob (2004)
The Rat Genome Database (RGD): developments towards a phenome database
Nucleic Acids Research, 33
O. Kel-Margoulis, A. Kel, I. Reuter, I. Deineko, E. Wingender (2002)
TRANSCompel®: a database on composite regulatory elements in eukaryotic genes
Nucleic acids research, 30 1
Donna Maglott, J. Ostell, Kim Pruitt, T. Tatusova (2004)
Entrez Gene: gene-centered information at NCBI
Nucleic Acids Research, 33
V. Matys, E. Fricke, R. Geffers, E. Gößling, Martin Haubrock, R. Hehl, K. Hornischer, D. Karas, A. Kel, O. Kel-Margoulis, D. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Münch, I. Reuter, S. Rotert, H. Saxel, Maurice Scheer, S. Thiele, E. Wingender (2003)
TRANSFAC®: transcriptional regulation, from patterns to profiles
Nucleic Acids Res., 31
F. Schacherer, Claudia Choi, U. Götze, M. Krull, S. Pistor, Edgar Wingender (2001)
The TRANSPATH signal transduction database: a knowledge base on signal transduction networks
Bioinformatics, 17 11
Xin Chen, Jian-min Wu, K. Hornischer, A. Kel, E. Wingender (2005)
TiProD: the Tissue-specific Promoter Database
Nucleic Acids Research, 34
D.L. Wheeler, T. Barrett, D.A. Benson, S.H. Bryant, K. Canese, D.M. Church, M. DiCuccio, R. Edgar, S. Federhen, W. Helmberg (2005)
Database resources of the National Center for Biotechnology Information
Nucleic Acids Res., 33
T. Hubbard, D. Andrews, M. Cáccamo, G. Cameron, Yuan Chen, M. Clamp, Laura Clarke, Guy Coates, Tony Cox, Fiona Cunningham, V. Curwen, T. Cutts, T. Down, R. Durbin, X. Fernández-Suárez, J. Gilbert, M. Hammond, Javier Herrero, H. Hotz, K. Howe, V. Iyer, K. Jekosch, Andreas Kähäri, A. Kasprzyk, Damian Keefe, S. Keenan, F. Kokocinski, D. London, Ian Longden, G. McVicker, Craig Melsopp, P. Meidl, Simon Potter, G. Proctor, Mark Rae, Daniel Rios, Michael Schuster, S. Searle, J. Severin, G. Slater, D. Smedley, James Smith, W. Spooner, Arne Stabenau, J. Stalker, R. Storey, S. Trevanion, A. Ureta-Vidal, J. Vogel, S. White, Cara Woodwark, E. Birney (2004)
Ensembl 2005
Nucleic Acids Research, 33
N. Mulder, R. Apweiler, T. Attwood, A. Bairoch, A. Bateman, David Binns, Paul Bradley, P. Bork, Phillip Bucher, L. Cerutti, R. Copley, E. Courcelle, Ujjwal Das, R. Durbin, W. Fleischmann, J. Gough, D. Haft, Nicola Harte, N. Hulo, D. Kahn, Alexander Kanapin, Maria Krestyaninova, D. Lonsdale, R. Lopez, Ivica Letunic, M. Madera, J. Maslen, J. McDowall, A. Mitchell, A. Nikolskaya, S. Orchard, M. Pagni, C. Ponting, Emmanuel Quevillon, J. Selengut, Christian Sigrist, Ville Silventoinen, D. Studholme, Robert Vaughan, Cathy Wu (2004)
InterPro, progress and status in 2005
Nucleic Acids Research, 33
Robert Finn, Jaina Mistry, John Tate, Penny Coggill, A. Heger, Joanne Pollington, O. Gavin, Prasad Gunasekaran, Goran Ceric, Kristoffer Forslund, Liisa Holm, Erik Sonnhammer, Sean Eddy, Alex Bateman (2007)
The Pfam protein families database
Nucleic Acids Research, 38
E. Wingender (1997)
CLASSIFICATION SCHEME OF EUKARYOTIC TRANSCRIPTION FACTORS
Molecular Biology, 31
M. Krull, N. Voss, Claudia Choi, S. Pistor, A. Potapov, E. Wingender (2003)
TRANSPATH: An integrated database on signal transduction and a tool for array analysis
Nucleic acids research, 31 1
Ivica Letunic, R. Copley, Steffen Schmidt, F. Ciccarelli, T. Doerks, J. Schultz, C. Ponting, P. Bork (2004)
SMART 4.0: towards genomic data integration
Nucleic acids research, 32 Database issue

Publisher: Oxford University Press
Copyright: © The Author 2006. Published by Oxford University Press. All rights reserved  The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]
ISSN: 0305-1048
eISSN: 1362-4962
DOI: 10.1093/nar/gkj143
pmid: 16381825

The user can also choose the output ﬁelds, which are and Wingender,E. (2003) MATCH: a tool for searching transcription to be displayed in the result list. Each search can be stored and factor binding sites in DNA sequences. Nucleic Acids Res., 31, 3576–3579. reﬁned later on or the stored search result (list of accession 6. Chekmenev,D.S., Haid,C. and Kel,A.E. (2005) P-Match: transcription numbers) can serve as input for a batch search in another table. factor binding site search by combining patterns and weight matrices. Finally, on the basis of the result from a search in the SITE Nucleic Acids Res., 33, W432–W437. or MATRIX table, ‘proﬁles’ (sets of sites or matrices) can 7. Wingender,E. (1997) Classification scheme of eukaryotic transcription be created, which can be used by the integrated version of factors. Mol. Biol., 31, 483–497. 8. Hodges,P.E., Carrico,P.M., Hogan,J.D., O’Neill,K.E., Owen,J.J., Patch or Match, respectively, for sequence analyses. Mangan,M., Davis B.P., Brooks,J.E. and Garrels,J.I. (2002) Annotating the human proteome: the Human Proteome Survey Database TM (HumanPSD ) and an in-depth target database for G protein-coupled TM receptors (GPCR-PD ) from Incyte. Genomics. Nucleic Acids Res., 30, AVAILABILITY 137–141. 9. Chen,X., Wu,J.-m., Hornischer,K., Kel,A. and Wingender,E. (2006) The described TRANSFAC 7.0 and TRANSCompel 7.0 TiProD: the Tissue-specific Promoter Database. Nucleic Acids Res., 34, releases as well as the programs Match, Patch and P- D104–D107. Match are all freely available for online use by users 10. Schacherer,F., Choi,C., Go ¨ tze,U., Krull,M., Pistor,S. and Wingender,E. from non-proﬁt organizations at http://www.gene-regulation. (2001) The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics, 17, com/pub/databases.html#transfac, http://www.gene-regulation. 1053–1057. com/pub/databases.html#transcompel and http://www.gene- 11. 
Krull,M., Voss,N., Choi,C., Pistor,S., Potapov,A. and Wingender,E. regulation.com/pub/programs.html, respectively. (2003) TRANSPATH : an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res., 31, 97–100. 12. Wain,H.M., Lush,M.J., Ducluzeau,F., Khodiyar,V.K. and Povey,S. (2004) Genew: the Human Gene Nomenclature Database, 2004 updates. ACKNOWLEDGEMENTS Nucleic Acids Res., 32, D255–D257. 13. Bult,C.J., Blake,J.A., Richardson,J.E., Kadin, J.A., Eppig,J.T. and the We like to thank Prof Jingchu Luo, Dr Anyuan Guo and col- Mouse Genome Database Group (2004) The Mouse Genome Database leagues from Peking University, Center for Bioinformatics, (MGD): integrating biology with the genome. Nuceic Acids Res., 32, for providing the DATF data and Dr Casey Bergman and D476–D481. colleagues from FlyReg, Universiy of Manchester, U.K. for 14. de la Cruz,N., Bromberg,S., Pasko,D., Shimoyama,M., Twigger,S., Chen,J., Chen,C.F., Fan,C., Foote,C., Gopinath,G.R. et al. (2005) The Rat the Drosophila footprint data, as well as Prof. Dr. Reinhard Genome Database (RGD): developments towards a phenome database. Hehl and Claudia Galuschka from the Technical University Nucleic Acids Res., 33, D485–D491. Braunschweig, Germany for their collaboration on plant data 15. Hubbard,T., Andrews,D., Caccamo,M., Cameron,G., Chen,Y., curation. Further, we like to thank all people who have been Clamp,M., Clarke,L., Coates,G., Cox,T., Cunningham,F. et al. (2005) contributing over the years to the development and curation of Ensembl 2005. Nucleic Acids Res., 33, D447–D453. 16. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K., the described databases and connected tools. Parts of the work Church,D.M., DiCuccio,M., Edgar,R., Federhen,S., Helmberg,W. et al. were funded by grants of the German Ministry of Education and (2005) Database resources of the National Center for Biotechnology Research (BMBF) ‘Intergenomics’ (031U210B), collectively Information. 
Nucleic Acids Res., 33, D39–D45. by BioRegioN GmbH and BMBF ‘BioProfil’ (0313092), by the 17. Maglott,D., Ostell,J., Pruitt,K.D., Tatusova,T. (2005) Entrez Gene: gene- centered information at NCBI. Nucleic Acids Res., 33, D54–D58. European Commission under FP6-‘Life sciences, genomics 18. Mulder,N.J., Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., and biotechnology for health’, contract LSHG-CT-2004- Binns,D., Bradley,P., Bork,P., Bucher,P., Cerutti,L. et al. (2005) 503568 ‘COMBIO’, and by the European Commission InterPro, progress and status in 2005. Nucleic Acids Res., 33, under ‘Marie Curie research training networks’, contract D201–D205. MRTN-CT-2004-512285 ‘TRANSISTOR’. Funding to pay 19. Bateman,A., Coin,L., Durbin,R., Finn,R.D., Hollich,V., Griffiths- Jones,S., Khanna,A., Marshall,M., Moxon,S., Sonnhammer,E.L. et al. the Open Access publication charges for this article was pro- (2004) The Pfam protein families database. Nucleic Acids Res., 32, vided by BIOBASE GmbH. D138–D141. 20. Letunic,I., Copley,R.R., Schmidt,S., Ciccarelli,F.D., Doerks,T., Conflict of interest statement. None declared. Schultz,J., Ponting,C.P., Bork,P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, D142–D144. 21. Hulo,N., Sigrist,C.J., Le Saux,V., Langendijk-Genevaux,P.S., Bordoli,L., Gattiker,A., De Castro,E., Bucher,P., Bairoch,A. (2004) Recent REFERENCES improvements to the PROSITE database. Nucleic Acids Res., 32, D134–D137. 1. Wingender,E. (1988) Compilation of transcription regulating proteins. 22. Guo,A., He,K., Liu,D., Bai,S., Gu,X., Wei,L., Luo,J. (2005) DATF: Nucleic Acids Res., 16, 1879–1902. a Database of Arabidopsis Transcription Factors. Bioinformatics, 2. Matys,V., Fricke,E., Geffers,R., Go ¨ ßling,E., Haubrock,M., Hehl,R., 21, 2568–2569. Hornischer,K., Karas,D., Kel,A.E., Kel-Margoulis,O.V. et al. (2003) 23. Bergman,C.M., Carlson,J.W., Celniker,S.E. (2005) Drosophila DNase I TRANSFAC: transcriptional regulation, from patterns to profiles. 
Nucleic footprint database: a systematic genome annotation of transcription factor Acids Res., 31, 374–378. binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics, 21, 3. Kel-Margoulis,O., Matys,V., Choi,C., Reuter,I., Krull,M., Potapov,A.P., 1747–1749. Voss,N., Liebich,I., Kel,A. and Wingender,E. (2005) Databases on gene 24. Drysdale,R.A., Crosby,M.A. and FlyBase Consortium (2005) FlyBase: regulation. In Bajic,V.B. and Tan,T.W. (eds), Information Processing and genes and gene models. Nucleic Acids Res., 33, Living Systems. World Scientific Publishing Co., Singapore, Vol. 2, D390–D395. pp. 709–727. Downloaded from https://academic.oup.com/nar/article-abstract/34/suppl_1/D108/1133867 by Ed 'DeepDyve' Gillespie user on 06 February 2018
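
Supplementary sketch. The promoter window described under 'Linking to other databases' (10 000 nt upstream to 1000 nt downstream of a virtual TSS) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not the TRANSPRO implementation: the names PromoterWindow, promoter_window and site_to_genomic are hypothetical, coordinates are assumed to be 1-based and inclusive, and offsets are treated as plain signed distances from the TSS (the TSS itself at offset 0), a convention the text does not specify.

```python
from dataclasses import dataclass

# Window limits taken from the text; everything else here is an assumption.
UPSTREAM_NT = 10_000
DOWNSTREAM_NT = 1_000


@dataclass(frozen=True)
class PromoterWindow:
    """Hypothetical container for a promoter interval on a chromosome."""
    chrom: str
    start: int   # lowest genomic coordinate (assumed 1-based, inclusive)
    end: int     # highest genomic coordinate (assumed 1-based, inclusive)
    strand: str  # "+" or "-"


def promoter_window(chrom: str, tss: int, strand: str) -> PromoterWindow:
    """Return the genomic interval from -10 000 to +1000 around a TSS.

    On the minus strand, 'upstream' means higher genomic coordinates,
    so the two limits swap sides.
    """
    if strand == "+":
        return PromoterWindow(chrom, tss - UPSTREAM_NT, tss + DOWNSTREAM_NT, strand)
    if strand == "-":
        return PromoterWindow(chrom, tss - DOWNSTREAM_NT, tss + UPSTREAM_NT, strand)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def site_to_genomic(tss: int, strand: str, offset: int) -> int:
    """Map a TSS-relative site position (negative = upstream) to a genomic one."""
    if not -UPSTREAM_NT <= offset <= DOWNSTREAM_NT:
        raise ValueError("offset lies outside the promoter window")
    if strand == "+":
        return tss + offset
    if strand == "-":
        return tss - offset
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    window = promoter_window("chr1", 50_000, "-")
    print(window)
    print(site_to_genomic(50_000, "-", -200))
```

The sketch does not clamp windows that would run past the start of a chromosome; a real mapping would need to handle that case and whatever numbering convention the promoter database actually uses.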

Journal

Nucleic Acids Research – Oxford University Press

Published: Jan 1, 2006
