DRTF: a database of rice transcription factors

BIOINFORMATICS APPLICATIONS NOTE
Vol. 22 no. 10 2006, pages 1286–1287
doi:10.1093/bioinformatics/btl107

Databases and ontologies

Ge Gao¹, Yingfu Zhong¹, Anyuan Guo¹, Qihui Zhu¹, Wen Tang¹, Weimou Zheng², Xiaocheng Gu¹, Liping Wei¹,* and Jingchu Luo¹,*

¹Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, People’s Republic of China and ²The Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, People’s Republic of China

Received on March 9, 2006; accepted on March 18, 2006
Advance Access publication March 21, 2006
Associate Editor: Martin Bishop

*To whom correspondence should be addressed.

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ABSTRACT

Summary: DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF including sequence features, functional domains, Gene Ontology assignment, chromosomal localization, EST and microarray expression information, as well as multiple sequence alignment of the DNA-binding domains for each TF family. The database can be browsed and searched with a user-friendly web interface.

Availability: DRTF is available at http://drtf.cbi.pku.edu.cn

Contact: drtf@mail.cbi.pku.edu.cn

1 INTRODUCTION

Transcription factors (TFs) play key roles in regulating gene expression at the transcriptional level, controlling or influencing many biological processes such as development, growth, cell division and responses to environmental stimuli. Identification, characterization and classification of TFs at the genome scale may provide an important resource for researchers on transcriptional regulation. The only available online database of rice TFs is RiceTFDB (http://ricetfdb.bio.uni-potsdam.de/), which contains 2856 protein models coded from 2305 loci in 53 TF families for japonica. It has limited annotations including DNA-binding domain and InterPro domain hits for each TF and whole length multiple sequence alignment for each family. Xiong et al. (2005) identified 1745 putative TF protein models coded from 1611 loci in japonica, and provided the list as Supplementary data. A comprehensive, well-annotated resource of TFs in both indica and japonica can facilitate comparative analysis of TFs between these two rice subspecies and help to explore the distinct morphological differentiations between indica and japonica.

Combining automated InterPro scans and BLAST searches with careful manual curation, we have identified TFs in both indica and japonica, and constructed a database of rice TFs named DRTF containing extensive annotations for the TFs and TF families as well as homologous relationships between corresponding indica, japonica and Arabidopsis TFs. The DRTF web server was set up under the Apache/PHP/MySQL environment on a RedHat Linux platform. It can be browsed by TF families or chromosomes, and searched by keywords or sequences. All sequences are available for downloading.

2 IDENTIFICATION OF PUTATIVE TRANSCRIPTION FACTORS

We first compiled and refined a list of sequence signatures for known plant TF families based on the literature (Shiu et al., 2005; Xiong et al., 2005; Davuluri et al., 2003; Riechmann et al., 2000) and existing databases (Guo et al., 2005, http://datf.cbi.pku.edu.cn/). Most families can be identified by representative HMM profiles for their DNA-binding domains from Pfam (Bateman et al., 2004). For the remaining families without DNA-binding domain profiles, either characterized recently or containing few members, we chose representative sequences from the literature and used them as seeds for BLAST. Finally, we collected 63 distinct TF families.

We downloaded 49 710 predicted indica proteins from the Beijing Genome Institute (BGI, http://rise.genomics.org.cn/) and 49 472 predicted japonica proteins from TIGR (http://rice.tigr.org/). Based on the list of plant TF signatures, we performed HMMER (Eddy, 1998) and BLAST searches against the whole proteomes of indica and japonica. We chose 0.01 as the default E-value cutoff for most TF families in HMMER searches. We manually inspected all alignments of the domains and refined the results carefully. For BLAST searches, we manually inspected the alignments and set the E-value cutoff case by case (for details see the DRTF Help page). Finally, we identified 2025 putative TFs from indica and 2384 from japonica.

3 ANNOTATION OF PUTATIVE TRANSCRIPTION FACTORS

To provide comprehensive information for the putative TFs, we made extensive annotations using a number of bioinformatics tools and databases. In particular, we employed InterProScan (Quevillon et al., 2005) to identify protein domains and assign GO terms to the putative TFs; we performed similarity searches against major databases including UniProt (Wu et al., 2006), RefSeq (Pruitt et al., 2005), EMBL (Cochrane et al., 2006) and TRANSFAC (Matys et al., 2006) and hyperlinked to them; we made BLASTP searches against the latest PDBselect database (E-value <0.01, identity >30%, and overlap 50 residues) to find 3D structural relevance; we obtained EST expression information from UniGene clusters and microarray expression information from the NCBI GEO database using GEO-BLAST; we aligned the TFs to the RIKEN full-length sequences and provided their accession numbers and CloneIDs; lastly, we identified homologs of each TF in the other rice subspecies and Arabidopsis. For each TF family, DRTF includes information extracted from the literature, key references, and multiple sequence alignment of the DNA-binding domains.

4 DISCUSSION

The goal of DRTF is to construct a comprehensive resource of rice TFs. Instead of relying on computational prediction completely, we combined automated search and manual curation. Despite the difference in TF numbers of the two rice subspecies, TFs of one subspecies find homologs in the other reciprocally at a rate >97%.

The different TF numbers between some of the corresponding families in DRTF and RiceTFDB could be caused partly by the different HMM profiles used to define certain families. For example, we took CCCH type zinc finger domain (IPR000571) as the defining signature described as ‘DNA-binding’ in InterPro and ‘nucleic acid binding’ as the GO term for the C3H family, whereas RiceTFDB used the C3HC4 ring-finger domain (IPR001841) which has no description of DNA-binding function in InterPro, and the GO terms assigned are ‘protein-binding’ (GO: 0005515) and ‘zinc ion binding’ (GO: 0008270). The different choice of HMM profiles has resulted in a 6-fold difference in the number of predicted japonica TFs of this family, only 90 in DRTF but 541 in RiceTFDB.

The differences between the dataset of putative japonica TFs in DRTF and the dataset composed by Xiong et al. (2005) are mostly because of the larger number of TF families we classified (63 versus 37), and the newer version (Release 4) of TIGR database which contains 62 827 predicted proteins versus 59 712 in Release 2, in which 409 TFs we identified for DRTF are missing.

DRTF is the first database of TFs for indica and the most annotated one for japonica. Currently, there is little annotation available for the indica genome in the public sequence repository, and DRTF may bridge the gap at least for the TF families. We will maintain and update DRTF regularly as more data and information become available.

ACKNOWLEDGEMENTS

This study was supported by domestic grants: 2003CB715900 (973), 90408015 (NSFC) and the 863 Programme.

Conflict of Interest: none declared.

REFERENCES

Bateman,A. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Cochrane,G. et al. (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res., 34, D10–D15.
Davuluri,R.V. et al. (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 23, 4–25.
Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Guo,A. et al. (2005) DATF: a database of Arabidopsis transcription factors. Bioinformatics, 21, 2568–2569.
Hwang,I. et al. (2002) Two-component signal transduction pathways in Arabidopsis. Plant Physiol., 129, 500–515.
Matys,V. et al. (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res., 34, D108–D110.
Pruitt,K.D. et al. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33, D501–D504.
Quevillon,E. et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., 33, W116–W120.
Riechmann,J.L. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
Shiu,S.H. et al. (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol., 139, 18–26.
Wu,C.H. et al. (2006) The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res., 34, D187–D191.
Xiong,Y. et al. (2005) Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol. Biol., 59, 191–203.
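
The thresholds quoted for the BLASTP search against PDBselect (E-value <0.01, identity >30%, overlap 50 residues) can be illustrated with a small filter over tabular BLAST output. This is only a sketch and not the authors' pipeline: it assumes the standard 12-column tab-separated BLAST format, it reads "overlap 50 residues" as an alignment length of at least 50 (the comparison symbol is not recoverable from the text), and the names `Hit`, `parse_tabular` and `keep_structural_hits` as well as the example rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One BLASTP hit; field names are illustrative, not from DRTF."""
    query: str
    subject: str
    pct_identity: float
    align_length: int
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output (assumed format)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        # Columns: query, subject, % identity, alignment length, ..., E-value, bit score
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10])))
    return hits


def keep_structural_hits(hits: Iterable[Hit],
                         max_evalue: float = 0.01,
                         min_identity: float = 30.0,
                         min_overlap: int = 50) -> List[Hit]:
    """Keep hits with E-value < 0.01, identity > 30% and overlap >= 50.

    The '>=' on overlap is an assumption; the source only says 'overlap 50 residues'.
    """
    return [h for h in hits
            if h.evalue < max_evalue
            and h.pct_identity > min_identity
            and h.align_length >= min_overlap]


# Made-up example rows, one per filter outcome.
example = [
    "tf1\tpdbA\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-20\t150",   # passes
    "tf1\tpdbB\t28.0\t200\t140\t3\t1\t200\t1\t200\t1e-30\t180",  # identity too low
    "tf2\tpdbC\t60.0\t40\t16\t0\t1\t40\t1\t40\t1e-5\t60",        # overlap too short
    "tf3\tpdbD\t35.0\t50\t32\t0\t1\t50\t1\t50\t0.5\t30",         # E-value too high
    "tf4\tpdbE\t31.0\t50\t34\t0\t1\t50\t1\t50\t0.001\t45",       # passes
]
kept = keep_structural_hits(parse_tabular(example))
```

Keeping the three cutoffs as keyword arguments makes it easy to try a stricter or looser reading of the overlap criterion without touching the parsing step.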


Publisher
Oxford University Press
Copyright
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btl107
pmid
16551659


Journal

Bioinformatics, Oxford University Press

Published: Mar 21, 2006
