Vol. 22 no. 10 2006, pages 1286–1287
BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btl107
Databases and ontologies

DRTF: a database of rice transcription factors

Ge Gao¹, Yingfu Zhong¹, Anyuan Guo¹, Qihui Zhu¹, Wen Tang¹, Weimou Zheng², Xiaocheng Gu¹, Liping Wei¹,* and Jingchu Luo¹,*
¹Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, People's Republic of China and ²The Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
*To whom correspondence should be addressed.
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Received on March 9, 2006; accepted on March 18, 2006
Advance Access publication March 21, 2006
Associate Editor: Martin Bishop

ABSTRACT
Summary: DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families and identified by computational prediction and manual curation. It includes detailed annotations of each TF, including sequence features, functional domains, Gene Ontology assignment, chromosomal localization, EST and microarray expression information, as well as multiple sequence alignments of the DNA-binding domains for each TF family. The database can be browsed and searched through a user-friendly web interface.
Availability: DRTF is available at http://drtf.cbi.pku.edu.cn
Contact: drtf@mail.cbi.pku.edu.cn

1 INTRODUCTION
Transcription factors (TFs) play key roles in regulating gene expression at the transcriptional level, controlling or influencing many biological processes such as development, growth, cell division and responses to environmental stimuli. Identification, characterization and classification of TFs at the genome scale may provide an important resource for researchers studying transcriptional regulation. The only available online database of rice TFs is RiceTFDB (http://ricetfdb.bio.uni-potsdam.de/), which contains 2856 protein models coded from 2305 loci in 53 TF families for japonica. Its annotations are limited to the DNA-binding domain and InterPro domains of each TF and a whole-length multiple sequence alignment for each family. Xiong et al. (2005) identified 1745 putative TF protein models coded from 1611 loci in japonica and provided the list as Supplementary data. A comprehensive, well-annotated resource of TFs in both indica and japonica can facilitate comparative analysis of TFs between these two rice subspecies and help to explore the distinct morphological differentiation between indica and japonica.

Combining automated InterPro scans and BLAST searches with careful manual curation, we have identified TFs in both indica and japonica and constructed a database of rice TFs named DRTF, which contains extensive annotations for the TFs and TF families as well as the homologous relationships between corresponding indica, japonica and Arabidopsis TFs. The DRTF web server was set up under the Apache/PHP/MySQL environment on a RedHat Linux platform. It can be browsed by TF family or chromosome, and searched by keywords or sequences. All sequences are available for download.

2 IDENTIFICATION OF PUTATIVE TRANSCRIPTION FACTORS
We first compiled and refined a list of sequence signatures for known plant TF families based on the literature (Shiu et al., 2005; Xiong et al., 2005; Davuluri et al., 2003; Riechmann et al., 2000) and existing databases (Guo et al., 2005, http://datf.cbi.pku.edu.cn/). Most families can be identified by representative HMM profiles for their DNA-binding domains from Pfam (Bateman et al., 2004). For the remaining families without DNA-binding domain profiles, which were either characterized recently or contain few members, we chose representative sequences from the literature and used them as seeds for BLAST. In total, we collected 63 distinct TF families.

We downloaded 49 710 predicted indica proteins from the Beijing Genomics Institute (BGI, http://rise.genomics.org.cn/) and 49 472 predicted japonica proteins from TIGR (http://rice.tigr.org/). Based on the list of plant TF signatures, we performed HMMER (Eddy, 1998) and BLAST searches against the whole proteomes of indica and japonica. We chose 0.01 as the default E-value cutoff for most TF families in the HMMER searches, then manually inspected all domain alignments and carefully refined the results. For the BLAST searches, we manually inspected the hit alignments and set the E-value cutoff case by case (for details see the DRTF Help page). Finally, we identified 2025 putative TFs from indica and 2384 from japonica.

3 ANNOTATION OF PUTATIVE TRANSCRIPTION FACTORS
To provide comprehensive information for the putative TFs, we made extensive annotations using a number of bioinformatics tools and databases. In particular, we employed InterProScan (Quevillon et al., 2005) to identify protein domains and assign GO terms to the putative TFs; we performed similarity searches against major databases, including UniProt (Wu et al., 2006), RefSeq (Pruitt et al., 2005), EMBL (Cochrane et al., 2006) and TRANSFAC (Matys et al., 2006), and hyperlinked the hits to them; we ran BLASTP searches against the latest PDBselect database (E-value <0.01, identity >30% and an overlap of 50 residues) to find 3D structural relevance; we obtained EST expression information from UniGene clusters and microarray expression information from the NCBI GEO database using GEO-BLAST; we aligned the TFs to the RIKEN full-length sequences and provided their accession numbers and CloneIDs; lastly, we identified homologs of each TF in the other rice subspecies and in Arabidopsis. For each TF family, DRTF includes information extracted from the literature, key references and a multiple sequence alignment of the DNA-binding domains.

4 DISCUSSION
The goal of DRTF is to construct a comprehensive resource of rice TFs. Instead of relying entirely on computational prediction, we combined automated searches with manual curation. Despite the difference in TF numbers between the two rice subspecies, the TFs of each subspecies find homologs in the other at a rate >97%.

The different TF numbers in some of the corresponding families in DRTF and RiceTFDB may be caused partly by the different HMM profiles used to define certain families. For example, we took the CCCH-type zinc finger domain (IPR000571), which InterPro describes as 'DNA-binding' and which carries the GO term 'nucleic acid binding', as the defining signature of the C3H family, whereas RiceTFDB used the C3HC4 RING-finger domain (IPR001841), which has no description of DNA-binding function in InterPro and is assigned the GO terms 'protein-binding' (GO:0005515) and 'zinc ion binding' (GO:0008270). This different choice of HMM profiles has resulted in a 6-fold difference in the number of predicted japonica TFs in this family: only 90 in DRTF but 541 in RiceTFDB.

The differences between the set of putative japonica TFs in DRTF and the set compiled by Xiong et al. (2005) arise mostly from the larger number of TF families we classified (63 versus 37) and from the newer version (Release 4) of the TIGR database, which contains 62 827 predicted proteins versus 59 712 in Release 2; 409 of the TFs we identified for DRTF are missing from Release 2.

DRTF is the first database of TFs for indica and the most extensively annotated one for japonica. Currently, little annotation is available for the indica genome in the public sequence repositories, and DRTF may bridge this gap, at least for the TF families. We will maintain and update DRTF regularly as more data and information become available.

ACKNOWLEDGEMENTS
This study was supported by domestic grants: 2003CB715900 (973), 90408015 (NSFC) and the 863 Programme.

Conflict of Interest: none declared.

REFERENCES
Bateman,A. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Cochrane,G. et al. (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res., 34, D10–D15.
Davuluri,R.V. et al. (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 4, 25.
Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Guo,A. et al. (2005) DATF: a database of Arabidopsis transcription factors. Bioinformatics, 21, 2568–2569.
Hwang,I. et al. (2002) Two-component signal transduction pathways in Arabidopsis. Plant Physiol., 129, 500–515.
Matys,V. et al. (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res., 34, D108–D110.
Pruitt,K.D. et al. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33, D501–D504.
Quevillon,E. et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., 33, W116–W120.
Riechmann,J.L. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
Shiu,S.H. et al. (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol., 139, 18–26.
Wu,C.H. et al. (2006) The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res., 34, D187–D191.
Xiong,Y. et al. (2005) Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol. Biol., 59, 191–203.
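The PDBselect step in Section 3 keeps BLASTP hits with E-value <0.01, identity >30% and an overlap of 50 residues. The sketch below shows one way such a filter could be written; it is not the DRTF code. It assumes the hits are in the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), reads "overlap of 50 residues" as an alignment length of at least 50 (an interpretation, not stated in the paper), and uses made-up identifiers such as TF_0001 and PDB_0001. The names parse_tabular and structural_matches are hypothetical.

```python
"""Hypothetical sketch of the PDBselect filter from Section 3 (not the DRTF code)."""
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str       # putative TF identifier
    subject: str     # PDBselect entry identifier
    identity: float  # percent identity (pident)
    length: int      # alignment length in residues, used here as "overlap" (assumption)
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column BLAST tabular lines, skipping blank and '#' comment lines."""
    hits: List[Hit] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Column order: qseqid sseqid pident length mismatch gapopen
        #               qstart qend sstart send evalue bitscore
        hits.append(Hit(fields[0], fields[1], float(fields[2]),
                        int(fields[3]), float(fields[10])))
    return hits


def structural_matches(hits: Iterable[Hit],
                       max_evalue: float = 0.01,
                       min_identity: float = 30.0,
                       min_overlap: int = 50) -> Dict[str, List[str]]:
    """Map each TF to the PDBselect hits with E-value < max_evalue,
    identity > min_identity and alignment length >= min_overlap."""
    matches: Dict[str, List[str]] = {}
    for hit in hits:
        if (hit.evalue < max_evalue and hit.identity > min_identity
                and hit.length >= min_overlap):
            matches.setdefault(hit.query, []).append(hit.subject)
    return matches


# Made-up example rows (hypothetical identifiers, not DRTF data).
EXAMPLE_LINES = [
    "TF_0001\tPDB_0001\t45.2\t120\t60\t2\t10\t129\t5\t124\t1e-20\t95.0",
    "TF_0001\tPDB_0002\t28.0\t140\t95\t3\t1\t140\t1\t138\t1e-05\t40.0",
    "TF_0002\tPDB_0003\t60.0\t40\t16\t0\t1\t40\t1\t40\t1e-08\t55.0",
]

if __name__ == "__main__":
    print(structural_matches(parse_tabular(EXAMPLE_LINES)))
    # {'TF_0001': ['PDB_0001']}
```

In the example, the second row is rejected for its 28% identity and the third for its 40-residue alignment. Keeping the thresholds as keyword arguments makes it easy to test how sensitive the result is to each cutoff; for instance, lowering min_overlap to 40 would also keep TF_0002.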