Osiris: an integrated promoter database for Oryza sativa L.

BIOINFORMATICS APPLICATIONS NOTE, Vol. 24 no. 24 2008, pages 2915–2917
doi:10.1093/bioinformatics/btn537

Genome analysis

Robert T. Morris, Timothy R. O’Connor and John J. Wyrick
School of Molecular Biosciences and Center for Reproductive Biology, Washington State University, Pullman, WA, USA

Received on June 13, 2008; revised on August 14, 2008; accepted on October 12, 2008
Advance Access publication October 15, 2008
Associate Editor: Dmitrij Frishman

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ABSTRACT

Summary: Rice (Oryza sativa L.) is an important model monocot and cereal crop. While the rice genome sequence has been published and annotated, relatively little is known about the transcriptional networks that regulate rice gene expression. For this reason, we have developed Osiris, a database containing promoter sequences, predicted transcription factor (TF) binding sites, gene ontology annotation and microarray expression data for 24 209 genes in the rice genome. These tools are seamlessly integrated in the Osiris web site, allowing the user to visualize TF binding sites in multiple promoters; analyze the statistical significance of enriched TF binding sites; query for genes containing similar promoter regulatory logic or gene function; and visualize the microarray expression patterns of queried or selected gene sets.

Availability: http://www.bioinformatics2.wsu.edu/Osiris

Contact: jwyrick@wsu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

The recent sequencing (IRGSP, 2005) and annotation (Tanaka et al., 2008) of the rice genome have afforded the opportunity to examine transcriptional regulation at a genomic level in this model cereal monocot. Computational prediction of transcription factor (TF) binding sites and other regulatory elements present in promoter sequences provides a powerful method to decipher the regulatory networks controlling plant gene expression. Several databases have been published that contain information about rice-specific TF proteins and their consensus-binding sequences. The DRTF database is a catalog of more than 2000 rice genes encoding predicted TF proteins (Gao et al., 2006). TF-binding consensus sequences for rice are available from plant cis-element databases, including PLACE (Higo et al., 1999) and PlantCARE (Lescot et al., 2002). In addition, a number of microarray datasets analyzing rice gene expression patterns have been published. Until recently, however, bioinformatics tools dedicated to the analysis of rice promoter elements and transcriptional regulation have been lacking.

A software tool designed to search for novel or enriched cis-elements in user-selected rice promoter sequences has been recently published (Doi et al., 2008). While this is an important first step in the analysis of rice promoter sequences, there is still a great need for an integrated and user-friendly promoter database and associated analysis tools dedicated to the rice genome. To meet this challenge, the Osiris database and website were developed. Osiris integrates diverse data sources, including rice promoter sequences, predicted TF binding sites and CpG islands, gene ontology functional annotation and microarray expression data, to provide a comprehensive platform for promoter analysis. The database mining tools in Osiris enable the user to query the database for relationships between TF binding sites, gene functions and gene expression, queries that are not possible using currently available tools. Furthermore, this flexible framework can perform fast visualization of promoter sequences and gene expression patterns, and provides statistical support to test for enrichment of regulatory elements and functional categories among sets of selected genes.

2 METHODS

The Osiris database was built using the software framework previously used for the Athena promoter database (O’Connor et al., 2005). This framework was updated for rice genomic data, and additional tools and database modifications were added to accommodate microarray expression data. Osiris contains promoter sequences for 24 209 annotated rice genes, 92 experimentally validated TF-binding consensus sequences, gene ontology information for a majority of rice genes and 67 microarray datasets. The set of all promoter and TF binding site consensus sequences in the database is available for download through the user interface. Additional details of data sources, data filtering and validation are described in the Supplementary Materials.

Osiris provides two separate sets of rice promoter sequences. The first set of promoter sequences is based on the predicted gene models described in the Rice Annotation Project database (Tanaka et al., 2008). The second set of promoter sequences is based on experimentally defined gene transcript data (Tanaka et al., 2008). This design gives the user the flexibility to choose either the predicted or transcript-based definitions of promoter sequences, though it is important to note that, in general, the promoter definitions vary only slightly between these two datasets.

3 RESULTS AND FUNCTIONALITY

The Osiris interface is divided into four sections: data mining, custom motif searching, visualization and analysis. We will analyze the regulation of genes encoding rice storage proteins to illustrate the features of Osiris.

The data mining page allows the user to query the Osiris database for gene sets that have user-specified combinations of TF binding sites and GO annotation terms. The query result page provides the set of genes that match the query criteria, and summarizes information about predicted TF binding sites and other regulatory features in the promoter sequences of the selected genes. We queried Osiris for genes that are functionally annotated as storage proteins. The query returned 21 genes. The promoter sequences of these genes were significantly enriched (P < 10^-3) for the TF binding sites GLUTAACAOS, GLUTEBOX2OSGT3, GLUTEBOX1OSGT3 and GLUTEBP2OS (Fig. 1A), which have been previously shown to regulate the transcription of genes encoding glutelin storage proteins (Croissant-Sych and Okita, 1996). These data mining results may be downloaded by the user by selecting the download button (Fig. 1A). Alternatively, querying the database for genes whose promoters contain both the GLUTAACAOS and AACACOREOSGLUB1 TF binding sites returned 28 genes that showed enrichment in various functional annotations (Supplementary Fig. 1). In addition to querying the promoter database for known TF binding sites, Osiris has a custom motif search tool that can be used to identify rice promoter sequences containing instances of a user-defined sequence motif.

The visualization tool provides a graphical representation of the promoter sequences of user-selected genes, or genes identified by data mining queries. The compact visualization option provides a simplified view of the promoter regions, including the positions of predicted TF binding sites and CpG islands. The cartoon option has more detailed sequence information. The visualization tool interface allows the user to dynamically alter the promoter image by selecting which predicted TF binding sites to visualize. The compact visualization tool was used to display the locations of predicted TF binding sites present in the promoter of the storage protein GluB-1 (Fig. 1B).

Fig. 1. (A) Table of TF binding sites detected in the promoter sequences of 21 genes encoding rice storage proteins. P-values indicate the significance of enrichment of each TF binding site. The P and S parameters indicate the number of promoters in the selected set with a particular binding site and the total number of predicted sites detected in the selected promoter set, respectively. TF binding site names are hyperlinks to pages providing background information. (B) Visualization of the GluB-1 upstream promoter region. Vertical colored lines indicate predicted TF binding site locations in the promoter sequence, and are hyperlinks to binding site information pages. The line color can also be used to identify the TF binding site in the TF table (A). Aqua colored rectangles highlight putative CpG islands.

The analysis module allows the user to construct more complicated database queries, including positional constraints for TF binding sites in the promoter sequences. The corresponding results page also contains tools to visualize the microarray expression patterns for the selected or queried genes. The profile tool provides a visual display of the mRNA levels of the selected genes across different growth conditions and tissue types (Supplementary Fig. 2). The correlation tool calculates the similarity in the expression patterns between each pair of selected genes, and returns a graph of these correlation coefficients (Supplementary Fig. 3). These tools allow the user to test how combinations of TF binding sites correlate with gene expression patterns.

The analysis results page also contains a tool to graph the positional distribution of TF binding sites in user-selected promoters. Analysis of the positional distribution of the G-box-like motif revealed that this predicted TF binding site shows a clear positional bias adjacent to transcription start sites (Supplementary Fig. 4).

ACKNOWLEDGEMENTS

We thank Nick Lewis for assistance in developing some of the Osiris tools, and Justin Fischer for computer support. We are grateful for the microarray data obtained from the RICEATLAS project (http://plantgenomics.biology.yale.edu/riceatlas), which is funded by an NSF Plant Genome Award.

Funding: National Science Foundation grant (DBI-0605016).

Conflict of Interest: none declared.

REFERENCES

Croissant-Sych,Y. and Okita,T. (1996) Identification of positive and negative regulatory cis-elements of the rice glutelin Gt3 promoter. Plant Sci., 116, 27–35.
Doi,K. et al. (2008) Development of a novel data mining tool to find cis-elements in rice gene promoter regions. BMC Plant Biol., 8, 20.
Gao,G. et al. (2006) DRTF: a database of rice transcription factors. Bioinformatics, 22, 1286–1287.
Higo,K. et al. (1999) Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res., 27, 297–300.
Lescot,M. et al. (2002) PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res., 30, 325–327.
O’Connor,T.R. et al. (2005) Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences. Bioinformatics, 21, 4411–4413.
Tanaka,T. et al. (2008) The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res., 36, D1028–D1033.
The International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
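ILLUSTRATIVE SKETCH (not part of the published article)

The article reports, for each TF binding site, the number of promoters in a selected set that contain the site (P), the total number of predicted sites in that set (S) and an enrichment P-value, but it does not state here how sites are matched or which statistical test is used. The Python sketch below shows one plausible way to compute such values, under two loudly stated assumptions: consensus sequences are written in IUPAC nucleotide codes and matched on both strands, and enrichment is scored with a one-sided hypergeometric test. The function names (find_sites, count_p_and_s, enrichment_p) and the toy sequences are hypothetical and are not taken from Osiris.

```python
import re
from math import comb

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus string."""
    return consensus.upper().translate(_COMPLEMENT)[::-1]


def find_sites(consensus, promoter):
    """Sorted 0-based start positions of consensus matches on either strand.

    Assumption: a site is any exact match to the IUPAC consensus.
    """
    seq = promoter.upper()
    positions = set()
    for motif in {consensus.upper(), reverse_complement(consensus)}:
        # A lookahead reports overlapping occurrences as well.
        pattern = re.compile("(?=" + "".join(IUPAC[b] for b in motif) + ")")
        positions.update(m.start() for m in pattern.finditer(seq))
    return sorted(positions)


def count_p_and_s(consensus, promoters):
    """Return (P, S): promoters with at least one site, and total sites."""
    per_gene = [len(find_sites(consensus, seq)) for seq in promoters.values()]
    return sum(1 for n in per_gene if n > 0), sum(per_gene)


def enrichment_p(hits_in_set, set_size, hits_in_genome, genome_size):
    """One-sided hypergeometric tail P(X >= hits_in_set).

    Assumption: the selected promoters are treated as a draw without
    replacement from all promoters in the database.
    """
    upper = min(set_size, hits_in_genome)
    tail = sum(
        comb(hits_in_genome, k) * comb(genome_size - hits_in_genome, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / comb(genome_size, set_size)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    selected = {"geneA": "GGAACAAACATT", "geneB": "CCCCGGGG"}
    p, s = count_p_and_s("AACA", selected)
    print("P =", p, "S =", s)
    # Hypothetical background: 5 of 100 promoters carry the site.
    print("p-value =", enrichment_p(p, len(selected), 5, 100))
```

In a real analysis the background counts would come from scanning every promoter in the database with the same matching rule, so that the selected set and the background are directly comparable.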


Bioinformatics, Volume 24 (24): 3 – Oct 15, 2008



Publisher
Oxford University Press
Copyright
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btn537
pmid
18922805


Journal

Bioinformatics, Oxford University Press

Published: Oct 15, 2008
