DRTF: a database of rice transcription factors

Ge Gao; Yingfu Zhong; Anyuan Guo; Qihui Zhu; Wen Tang; Weimou Zheng; Xiaocheng Gu; Liping Wei; Jingchu Luo

doi:10.1093/bioinformatics/btl107


Bioinformatics, Vol. 22 no. 10 2006, pages 1286–1287
Applications Note: Databases and ontologies
Received on March 9, 2006; accepted on March 18, 2006
Advance Access publication March 21, 2006
Associate Editor: Martin Bishop

Ge Gao (1), Yingfu Zhong (1), Anyuan Guo (1), Qihui Zhu (1), Wen Tang (1), Weimou Zheng (2), Xiaocheng Gu (1), Liping Wei (1,*) and Jingchu Luo (1,*)

(1) Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, People's Republic of China
(2) The Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
(*) To whom correspondence should be addressed.

ABSTRACT

Summary: DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF including sequence features, functional domains, Gene Ontology assignment, chromosomal localization, EST and microarray expression information, as well as multiple sequence alignment of the DNA-binding domains for each TF family. The database can be browsed and searched with a user-friendly web interface.

Availability: DRTF is available at http://drtf.cbi.pku.edu.cn

Contact: drtf@mail.cbi.pku.edu.cn

1 INTRODUCTION

Transcription factors (TFs) play key roles in regulating gene expression at the transcriptional level, controlling or influencing many biological processes such as development, growth, cell division and responses to environmental stimuli. Identification, characterization and classification of TFs at the genome scale may provide an important resource for researchers on transcriptional regulation. The only available online database of rice TFs is RiceTFDB (http://ricetfdb.bio.uni-potsdam.de/), which contains 2856 protein models encoded by 2305 loci in 53 TF families for japonica. It has limited annotations, including DNA-binding domain and InterPro domain hits for each TF and a whole-length multiple sequence alignment for each family. Xiong et al. (2005) identified 1745 putative TF protein models encoded by 1611 loci in japonica, and provided the list as Supplementary data. A comprehensive, well-annotated resource of TFs in both indica and japonica can facilitate comparative analysis of TFs between these two rice subspecies and help to explore the distinct morphological differentiations between indica and japonica.

Combining automated InterPro scans and BLAST searches with careful manual curation, we have identified TFs in both indica and japonica, and constructed a database of rice TFs named DRTF containing extensive annotations for the TFs and TF families as well as homologous relationships between corresponding indica, japonica and Arabidopsis TFs. The DRTF web server was set up under the Apache/PHP/MySQL environment on a RedHat Linux platform. It can be browsed by TF families or chromosomes, and searched by keywords or sequences. All sequences are available for downloading.

2 IDENTIFICATION OF PUTATIVE TRANSCRIPTION FACTORS

We first compiled and refined a list of sequence signatures for known plant TF families based on the literature (Shiu et al., 2005; Xiong et al., 2005; Davuluri et al., 2003; Riechmann et al., 2000) and existing databases (Guo et al., 2005, http://datf.cbi.pku.edu.cn/). Most families can be identified by representative HMM profiles for their DNA-binding domains from Pfam (Bateman et al., 2004). For the remaining families without DNA-binding domain profiles, either characterized recently or containing few members, we chose representative sequences from the literature and used them as seeds for BLAST. Finally, we collected 63 distinct TF families.

We downloaded 49 710 predicted indica proteins from the Beijing Genome Institute (BGI, http://rise.genomics.org.cn/) and 49 472 predicted japonica proteins from TIGR (http://rice.tigr.org/). Based on the list of plant TF signatures, we performed HMMER (Eddy, 1998) and BLAST searches against the whole proteomes of indica and japonica. We chose 0.01 as the default E-value cutoff for most TF families in HMMER searches. We manually inspected all alignments of the domains and refined the results carefully. For BLAST searches, we manually inspected the alignments and set the E-value cutoff case by case (for details see the DRTF Help page). Finally, we identified 2025 putative TFs from indica and 2384 from japonica.

3 ANNOTATION OF PUTATIVE TRANSCRIPTION FACTORS

To provide comprehensive information for the putative TFs, we made extensive annotations using a number of bioinformatics tools and databases. In particular, we employed InterProScan (Quevillon et al., 2005) to identify protein domains and assign GO terms to the putative TFs; we performed similarity searches against major databases including UniProt (Wu et al., 2006), RefSeq (Pruitt et al., 2005), EMBL (Cochrane et al., 2006) and TRANSFAC (Matys et al., 2006) and hyperlinked to them; we made BLASTP searches against the latest PDBselect database (E-value <0.01, identity >30%, and overlap 50 residues) to find 3D structural relevance; we obtained EST expression information from UniGene clusters and microarray expression information from the NCBI GEO database using GEO-BLAST; we aligned the TFs to the RIKEN full-length sequences and provided their accession numbers and CloneIDs; lastly, we identified homologs of each TF in the other rice subspecies and Arabidopsis. For each TF family, DRTF includes information extracted from the literature, key references, and multiple sequence alignment of the DNA-binding domains.

4 DISCUSSION

The goal of DRTF is to construct a comprehensive resource of rice TFs. Instead of relying on computational prediction completely, we combined automated search and manual curation. Despite the difference in TF numbers of the two rice subspecies, TFs of one subspecies find homologs in the other reciprocally at a rate >97%.

The different TF numbers between some of the corresponding families in DRTF and RiceTFDB could be caused partly by the different HMM profiles used to define certain families. For example, for the C3H family we took the CCCH type zinc finger domain (IPR000571) as the defining signature; it is described as 'DNA-binding' in InterPro and carries the GO term 'nucleic acid binding'. RiceTFDB instead used the C3HC4 ring-finger domain (IPR001841), which has no description of DNA-binding function in InterPro and whose assigned GO terms are 'protein-binding' (GO: 0005515) and 'zinc ion binding' (GO: 0008270). The different choice of HMM profiles has resulted in a 6-fold difference in the number of predicted japonica TFs of this family: only 90 in DRTF but 541 in RiceTFDB.

The differences between the dataset of putative japonica TFs in DRTF and the dataset compiled by Xiong et al. (2005) arise mostly from the larger number of TF families we classified (63 versus 37) and from the newer version (Release 4) of the TIGR database, which contains 62 827 predicted proteins versus 59 712 in Release 2, from which 409 of the TFs we identified for DRTF are missing.

DRTF is the first database of TFs for indica and the most annotated one for japonica. Currently, there is little annotation available for the indica genome in the public sequence repository, and DRTF may bridge the gap at least for the TF families. We will maintain and update DRTF regularly as more data and information become available.

ACKNOWLEDGEMENTS

This study was supported by domestic grants: 2003CB715900 (973), 90408015 (NSFC) and the 863 Programme.

Conflict of Interest: none declared.

REFERENCES

Bateman,A. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Cochrane,G. et al. (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res., 34, D10–D15.
Davuluri,R.V. et al. (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 4, 25.
Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Guo,A. et al. (2005) DATF: a database of Arabidopsis transcription factors. Bioinformatics, 21, 2568–2569.
Hwang,I. et al. (2002) Two-component signal transduction pathways in Arabidopsis. Plant Physiol., 129, 500–515.
Matys,V. et al. (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res., 34, D108–D110.
Pruitt,K.D. et al. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33, D501–D504.
Quevillon,E. et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., 33, W116–W120.
Riechmann,J.L. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
Shiu,S.H. et al. (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol., 139, 18–26.
Wu,C.H. et al. (2006) The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res., 34, D187–D191.
Xiong,Y. et al. (2005) Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol. Biol., 59, 191–203.
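
Illustrative note: the hit filter quoted for the PDBselect searches in Section 3 (E-value <0.01, identity >30%, overlap 50 residues) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes BLAST's standard 12-column tabular output, uses the alignment length as a stand-in for "overlap", and reads the overlap threshold as "at least 50 residues", which the text does not state explicitly. All sequence identifiers in the sample are invented.

```python
from typing import List, NamedTuple

# Thresholds quoted in Section 3. Reading the overlap criterion as ">= 50"
# and measuring it by alignment length are assumptions of this sketch.
MAX_EVALUE = 0.01
MIN_IDENTITY = 30.0  # percent identity
MIN_OVERLAP = 50     # residues


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    evalue: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse BLAST 12-column tabular output (qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10])))
    return hits


def passes(hit: Hit) -> bool:
    """Apply the three thresholds to a single hit."""
    return (
        hit.evalue < MAX_EVALUE
        and hit.identity > MIN_IDENTITY
        and hit.length >= MIN_OVERLAP
    )


# Hypothetical hits: only the first satisfies all three criteria.
SAMPLE = "\n".join([
    "TF_A\tpdb1\t45.2\t120\t60\t2\t1\t120\t5\t124\t1e-30\t110",
    "TF_B\tpdb2\t28.0\t200\t140\t4\t1\t200\t1\t200\t1e-10\t60",  # identity too low
    "TF_C\tpdb3\t60.0\t35\t14\t0\t1\t35\t1\t35\t1e-5\t40",       # overlap too short
    "TF_D\tpdb4\t35.0\t80\t50\t2\t1\t80\t1\t80\t0.5\t25",        # E-value too high
])

kept = [h.query for h in parse_tabular(SAMPLE) if passes(h)]
print(kept)
```

Keeping the thresholds as named constants makes it easy to tighten or relax one criterion at a time, which matters here because the paper reports setting some cutoffs case by case after manual inspection.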



Publisher: Oxford University Press
Copyright: © The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/btl107
pmid: 16551659


Journal

Bioinformatics – Oxford University Press

Published: Mar 21, 2006
