funRiceGenes dataset for comprehensive understanding and application of rice functional genes

Background: As a main staple food, rice is also a model plant for functional genomic studies of monocots. Decoding of every DNA element of the rice genome is essential for genetic improvement to address increasing food demands. The past 15 years have witnessed extraordinary advances in rice functional genomics. Systematic characterization and proper deposition of every rice gene are vital for both functional studies and crop genetic improvement.

Findings: We built a comprehensive and accurate dataset of ∼2800 functionally characterized rice genes and ∼5000 members of different gene families by integrating data from available databases and reviewing every publication on rice functional genomic studies. The dataset accounts for 19.2% of the 39 045 annotated protein-coding rice genes, which provides the most exhaustive archive for investigating the functions of rice genes. We also constructed 214 gene interaction networks based on 1841 connections between 1310 genes. The largest network with 762 genes indicated that pleiotropic genes linked different biological pathways. Increasing degree of conservation of the flowering pathway was observed among more closely related plants, implying substantial value of rice genes for future dissection of flowering regulation in other crops. All data are deposited in the funRiceGenes database (https://funricegenes.github.io/). Functionality for advanced search and continuous updating of the database is provided by a Shiny application (http://funricegenes.ncpgr.cn/).

Conclusions: The funRiceGenes dataset would enable further exploring of the crosslink between gene functions and natural variations in rice, which can also facilitate breeding design to improve target agronomic traits of rice.

Keywords: Oryza sativa (rice); functional genomics; interaction network; genetic improvement

Yao et al.
Received: 25 June 2017; Revised: 23 September 2017; Accepted: 22 November 2017
The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4689117 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Background

Rice is a main staple food that feeds half of the world's population. Improvement of yield and resistance to multiple biotic and abiotic stresses of rice are essential strategies to cope with the increasing world population and the diminishing arable land. Decoding the genetic reservoirs of rice is the basis for rice phenotype improvement.

Functional genomic studies in model organisms have made great contributions to the studies of a wide range of other species [1]. In the last decade, the functions of a number of rice genes were explored with the availability of the genome sequence of Oryza sativa L. ssp. japonica cv. Nipponbare [2]. Genes controlling important agronomic traits, including grain yield [3, 4], blast [5] and blight [6, 7] disease resistance, insect resistance [8], and abiotic stress resistance [9, 10], were functionally characterized. Some of these genes were utilized in rice breeding directly based on marker-assisted strategy and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) [11–13]. Moreover, the putative homologs of some rice genes were investigated in other crops such as wheat [14–17], barley [18], and maize [19]. As rice is an ideal model of the grass family, characterization of rice genes would greatly facilitate genomic studies and molecular breeding in other crops.

Abundant information on functionally characterized genes of Arabidopsis is archived in The Arabidopsis Information Resource (TAIR) [20], while a list of functionally characterized maize genes is integrated in the maizeGDB database [21], which greatly promoted the functional genomics studies in plants. Detailed information on Drosophila genes stored in the FlyBase database is of great value to the studies in Drosophila and humans [22]. The Rice Genome Annotation Project maintained by Michigan State University [23] and the Rice Annotation Project Database (RAP-DB) [24] greatly promoted the progress of rice functional genomics. Although a number of curated rice genes are collected in RAP-DB and Oryzabase [25], not all the functionally characterized rice genes are properly deposited in existing databases. In the long term, the functions of all rice genes will be decoded [26]. As a result, a comprehensive archive of all functionally characterized rice genes involved in diverse pathways with live updating is urgently in demand.

In this study, we constructed a comprehensive, up-to-date database of rice functional genes, which includes ∼2800 cloned rice genes and ∼5000 members of different gene families. Interaction networks comprising 1310 functionally characterized rice genes were constructed, which revealed complex regulation and crosstalk of different biological pathways. We also developed a Shiny application that allows easy addition of newly reported rice genes. To our knowledge, this is the most comprehensive and accurate database of functionally characterized rice genes with continuous updating.

Results

Collection of functionally characterized rice genes

A database [27] maintained by the China Rice Data Center collects information on thousands of cloned rice genes in Chinese. Information on these genes was downloaded using in-house R scripts, including gene symbol, publications, the corresponding gene model in the Nipponbare reference genome, and a brief summary of the corresponding gene. The abstract, the author affiliation, and the full text of each publication were subsequently extracted from the PubMed database. Next, we manually curated the dataset based on the full text of each publication and obtained 1297 functionally characterized rice genes.

We further downloaded 29 982 publication records by querying the PubMed database with the keyword rice ((rice[Title] OR rice[Title/Abstract]), data until 13 February 2014). All records were grouped by published journal. After removing the records involved in the China Rice Data Center and ones irrelevant to rice functional genomics, the full texts of the remaining publications were downloaded and reviewed, which identified 441 additional functionally characterized rice genes. Information on each gene, including the GenBank accession number and the corresponding gene model in the Nipponbare genome, was extracted.

As an integrated rice science database, Oryzabase [25] also provides information on a portion of functionally characterized rice genes with manual curation. We downloaded 10 140 records comprising a list of genes from this database [28], and 5531 records with assigned Nipponbare genomic locus were retained. After removing records redundant with the datasets obtained from the other 2 approaches, 469 functionally characterized genes excluding members of gene families were retrieved. All information on the 469 genes was manually curated based on the review of research publications. In total, 2207 functionally characterized rice genes had been collected by 13 February 2014.

We further collected ∼3600 members of various gene families by integrating data from the Rice Genome Annotation Project database [29], the Oryzabase database, and research publications. All the data were deposited in the funRiceGenes database [30].

A Shiny application [31] was then developed to facilitate utilization of this dataset, which also enabled easy addition of newly reported genes to the database. New genes were added to this database using the Shiny application, based on daily email alerts of search results from the PubMed database with the keyword rice (rice[Title] OR rice[Title/Abstract]) [32]. For all PubMed records in the email alert, we identified ones on functionally characterized rice genes. We then went over the full publication of each record and identified the gene symbol and gene model in the reference genome. After the gene symbol, the gene model in the reference genome, and the PubMed identifier are entered, the Shiny application fetches the corresponding publication record from PubMed and extracts key information automatically. We also kept track of new records in the database of Oryzabase and China Rice Data Center, which were then added to our database using the Shiny application. Since 13 February 2014, funRiceGenes has been updated every 2 weeks using the Shiny application. All updated records are available at the NEWS menu of the funRiceGenes database [33]. As of 23 February 2017, ∼2800 functionally characterized genes and ∼5000 gene family members were archived in the funRiceGenes database, which accounted for 19.2% of the 39 045 annotated protein-coding rice genes (Additional file 1: Table S1; Additional file 2: Table S2) [33, 23].

Overview of the dataset regarding functionally characterized rice genes

Rice functional genomic studies developed rapidly after the public availability of the Nipponbare reference genome (Additional file 3: Fig. S1). In total, ∼3553 publications with respect to ∼2800 functionally characterized genes were collected (Additional file 4: Table S3). These publications came from more than 215 journals, and 31.0% of the publications appeared in The Plant Journal, Plant Physiology, Plant Molecular Biology, The Plant Cell, Molecular Plant, and New Phytologist (Additional file 4: Table S3). Among all published papers, 4 words (rice, gene, protein, and expression) were observed with the highest frequency in titles, while the words rice, gene, expression, protein, plant, mutant, and stress were found with the highest frequency in the abstract (Additional file 5: Fig. S2; Additional file 6: Fig. S3). More than 1800 affiliations from all over the world contributed to rice functional genomic studies (Additional file 7: Table S4), and scientists from China, Japan, Korea, United States, and India accounted for the majority of the progress (Additional file 8: Fig. S4).

Genomic positions were determined for more than 98.1% of all functionally characterized rice genes based on the corresponding gene models of the Nipponbare reference genome (Additional file 1: Table S1; Fig. 1). Twenty-five genes were absent from or showed substantial sequence divergence relative to the Nipponbare reference genome, and their genomic positions were determined based on the reference genome sequences of indica varieties Zhenshan 97 and Minghui 63 [34]. The remaining 24 genes could not be located in the genome, which was likely due to the sequence divergence between different rice germplasms.

Figure 1: Chromosome distribution of representative functionally characterized rice genes. The chromosomes are represented as vertical rectangles, and each horizontal line denotes the position of a functionally characterized rice gene. Symbols of all genes are labeled. A total of 930 representative genes are shown.

A number of genes were investigated simultaneously by distinct research groups based on various rice accessions, mutants, or phenotypic traits. As a result, 637 genes were assigned more than 1 symbol (Additional file 1: Table S1). In contrast, the same symbols were sometimes assigned to different genes due to the lack of communication (Additional file 9: Table S5).

Based on the concurrence of gene symbols and keywords regarding phenotype description or biological process in the same sentence of an abstract or a title in the literature, the functions of corresponding genes were summarized with manual curation. A total of 441 keywords were investigated, which generated 21 872 records for 1952 genes (Additional file 10: Table S6). Among all 441 keywords, yield and grain yield were found in 311 records for 115 genes, while grain width, grain length, grain weight, and grain size were detected in 139 records for 53 genes. Among all 77 genes retrieved with the keywords heading date or flowering time, 13 were also associated with yield or grain yield. Likewise, 7 genes involved in iron utilization, phosphate uptake, and sugar transporting were related to grain yield. We also found that 335 genes were involved in different stress signaling pathways, while 139 genes were related to rice diseases, including blast, bacterial blight, and sheath blight.

Progress in rice functional genomics benefited from the development of various technologies and the availability of diverse genomic and genetic resources. We found that homolog information was the most frequently used resource in rice functional genomics studies, and reverse transcriptase polymerase chain reaction was the most commonly used technique to analyze gene expression level (Fig. 2). Overexpression and RNAi were frequently used to perturb gene expression, which contributed to the dissection of the association between gene expression and phenotype variation. Creation of mutants using T-DNA and Tos17 insertions contributed significantly to rice gene cloning, while GWAS (genome-wide association study) and CRISPR became new strategies to dissect the functions of rice genes in recent years [35, 36].

Figure 2: Usage of various biotechniques in rice functional genomics studies. The y-axis indicates the number of publications using a specific biotechnique. Data after 18 June 2015 are not shown.

Interaction networks of functionally characterized rice genes

Physical and genetic interactions between different rice genes were frequently reported. However, a global view of the interaction networks for all functionally characterized rice genes remains to be elaborated. We constructed interaction networks of functionally characterized genes based on the concurrence of the symbols of 2 or more genes in the same sentence of an abstract or a title of research publications using in-house R scripts with manual curation. A sentence in which 2 or more genes were observed was regarded as evidence supporting the connection between these genes. In total, 1841 connections supported by 4046 pieces of evidence were detected, which comprised 1310 genes constituting 214 interaction networks (Additional file 11: Table S7).

Figure 3: The gene interaction network comprising 762 genes. Each white node represents a functionally characterized rice gene, and gene symbols are marked beside the node. Each green edge indicates a connection between 2 genes.
Genes involved in the same biological pathways are indicated.

The largest network was composed of 762 genes including ones associated with flowering, phosphate uptake and homeostasis, iron uptake, stress signaling, blight disease resistance, meiosis, BR (brassinosteroid) and GA (gibberellin) signaling, grain weight, and endosperm development (Fig. 3). Genes related to the same trait were clustered together, indicating the trustworthiness of this approach. The enormous size of this network was mainly caused by pleiotropic genes involved in different biological pathways. For example, Ghd8 was responsible for grain number, plant height, and heading date [37]. Ghd8 connected to genes controlling heading date including Ehd1 [38], Hd16 [39], and RFT1 [38], and genes controlling tillering including MOC1 [40], which was further connected with MIP1, a gene regulating tillering and plant height [41]. The other 213 interaction networks were made up of 548 rice genes, and 88% of these networks contained only 2 or 3 genes (Additional file 12: Fig. S5). The second largest network contained 14 genes involved in glutamine metabolism, including OsAMT1;3, GAD3, and GAT1 [42, 43]. Genes involved in small RNA biogenesis including OsDCL3a, OsDCL1, and OsHEN1 were observed in a 10-gene network (Additional file 12: Fig. S5) [44–46].

We further constructed an interaction network using 77 genes involved in flowering regulation (Fig. 4). Based on the orthologous groups among 7 plants provided by the Rice Genome Annotation Project [47], we found that 40 of the 77 genes had orthologous genes in sorghum, maize, Brachypodium, Arabidopsis, poplar, and grapevine, and orthologous genes were also identified for another 20 rice genes in sorghum, maize, and Brachypodium (Fig. 4; Additional file 13: Table S8). Only 7 genes (RFT1, Ehd4, Hd6, OsCO3, ROC4, Se14, and OsPIL15) were unique to rice. These results demonstrated the increasing degree of conservation of the flowering pathway among plants with closer phylogenetic relationships, implying substantial value of knowledge on functionally characterized rice genes to future dissection of flowering time regulation in other crops.

Figure 4: Interaction network of genes regulating flowering in rice and the orthologs of these genes in other plants. Each node represents a functionally characterized rice gene. Each edge indicates a connection between 2 genes. Genes with different number of orthologs are indicated with different colors and shapes. "Rice + (Maize | Poplar)" indicates "Rice and Maize" or "Rice and Poplar." Detailed information is shown in Additional file 13: Table S8.

Discussion

In this study, we built a comprehensive and accurate database of functionally characterized rice genes, funRiceGenes, which provides a valuable resource for rice functional genomic studies. funRiceGenes was constructed by integrating data from PubMed, Oryzabase, and China Rice Data Center, and it has been updated every 2 weeks using a Shiny application. For each gene in the funRiceGenes database, the gene symbol, the genomic locus in the reference genome, and the published papers on this gene were identified. Compared with Textpresso for Oryza sativa [48], which is a comprehensive collection of literature on rice, we further built the associations between the genomic locus or symbol of each gene and the literature [49]. Based on the literature identified for each gene, we summarized the brief functions of each gene and constructed interaction networks for all genes.
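The sentence-level co-occurrence rule described in the Results (two gene symbols appearing in the same sentence of a title or abstract count as one piece of evidence for a connection) can be illustrated with a minimal sketch. The original analysis used in-house R scripts and the igraph package followed by manual curation; the version below is written in Python with only the standard library and is not the authors' implementation. The function names, the short gene list, and the toy sentences are hypothetical and serve only to show the counting of connections, evidence, and networks.

```python
import re
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(sentences, symbols):
    """Map each unordered gene pair to the sentences in which both symbols occur."""
    # Naive whole-token matching (an assumption of this sketch); symbols that
    # contain punctuation, such as OsAMT1;3, would need more careful handling.
    patterns = {
        s: re.compile(r"(?<![A-Za-z0-9])" + re.escape(s) + r"(?![A-Za-z0-9])")
        for s in symbols
    }
    evidence = defaultdict(list)
    for sentence in sentences:
        found = sorted(s for s, p in patterns.items() if p.search(sentence))
        for a, b in combinations(found, 2):
            evidence[(a, b)].append(sentence)
    return evidence


def connected_components(edges):
    """Group genes into networks: genes joined by any path share a network."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical input: real gene symbols, but made-up sentences (not quotations).
symbols = ["Ghd8", "Ehd1", "MOC1", "MIP1", "OsDCL1", "OsHEN1"]
sentences = [
    "Ghd8 acts upstream of Ehd1 in this toy sentence.",
    "Ghd8 and MOC1 are mentioned together here.",
    "MOC1 interacts with MIP1 in this toy sentence.",
    "OsDCL1 and OsHEN1 are mentioned together here.",
    "Ehd1 and Ghd8 appear together again.",
    "Only Ehd1 is mentioned here.",
]

evidence = cooccurrence_edges(sentences, symbols)
components = connected_components(evidence)

print("connections:", len(evidence))
print("pieces of evidence:", sum(len(v) for v in evidence.values()))
print("network sizes:", [len(c) for c in components])
```

On this toy input the sketch reports 4 connections supported by 5 pieces of evidence, forming 2 networks of 4 and 2 genes. Automatic matching of this kind cannot tell whether a sentence actually asserts an interaction, which is why the manual curation step described in the Results remains necessary.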
The evidence supporting the functions of all collected genes and the interaction networks are unique to the funRiceGenes database. In addition, a user-friendly query interface and tidy data for downloading are provided in the funRiceGenes database.

Along with the sequence and phenotype data of thousands of rice accessions reported in recent years, the abundant information on rice genes in our database would enable further exploring of the crosslink between gene functions and natural variations. We found that a cloned rice gene OsSGL (LOC_Os02g04130, chr02:1799733-1800811), which regulated grain weight in rice, was ∼70 kb away from a GWAS peak (chr02:1871732) in terms of grain weight [50, 51]. Likewise, another gene OsPPKL3 (LOC_Os12g42310, chr12:26273157-26282197), which regulated grain length, is ∼90 kb away from a GWAS peak (chr12:26182880) associated with grain length [52, 53]. The functions of OsSGL and OsPPKL3 were characterized by transgenic studies, and the natural variations of the 2 genes are yet to be dissected.

Our database is also beneficial to the interpretation of large-scale DNA, mRNA, and other sequencing datasets in rice. Analyses of these data usually identify differentially expressed genes, gene co-expression networks, differentially methylated regions, ChIP-seq peaks, etc. The detailed information concerning the several thousands of rice genes archived in this database would be helpful for illustration of these results [54]. Batch query functions are provided, allowing search of this database with multiple genes belonging to a pathway/biological process or defined gene set. Our work in rice would facilitate functional genomic studies of other crops including wheat, sorghum, and maize.

Pyramiding and editing of functionally characterized rice genes regulating important agronomic traits by molecular marker-assisted selection and CRISPR are 2 promising approaches used to breed new rice varieties in recent years [55–57]. Thus, this database would play an important role in future rice breeding. For a specific agronomic trait, all related genes could be retrieved from this database conveniently for further manipulation [58]. For any of these genes, all relevant publications and a brief summary are available in this database [59]. The sequences of different alleles reported are also archived in this database. These resources would greatly facilitate breeding design to improve target agronomic traits by pyramiding of elite alleles or knocking out deleterious alleles. In addition, the effect of 1 gene might be enhanced or masked by other genes [60]. Thus, the gene interaction networks provided in this database could also be taken into account when making breeding designs.

Materials and Methods

Geocoding of author affiliations

The latitudes and longitudes of all the author affiliations were obtained using the application interface provided by the DATASCIENCETOOLKIT website [61] with in-house R scripts. For author affiliations that failed to be geocoded at high resolutions, we further used the Mapeasy website [62] to find the accurate latitudes and longitudes. The R package ggmap was used to demonstrate the positions of all affiliations on the world map [63].

Extraction of information from PDF files

The occurrence of keywords, including map-based cloning, positional cloning, accession number, accession No., northern blot, northern analysis, northern hybridization, and the regular expression "os[0-1][0-9]g[0-9]+.*" in PDF files was inspected utilizing the R tm [64] package.

Construction of interaction networks

The R package igraph [65] was used to build the interaction networks based on all the connection information between genes. The networks were then exported in a data format suitable for Cytoscape, which was used to visualize the network [66].

Availability of supporting source code and requirements

Project name: funRiceGenes (funRiceGenes, RRID:SCR_015778)
Project home page: http://funricegenes.ncpgr.cn/
GitHub repository: https://github.com/venyao/RICENCODE
Operating system(s): platform independent
Programming language: R (≥3.1.0)
Other requirements: tested with R packages shiny (1.0.5), shinythemes (1.1.1), shinyBS (0.61), RCurl (1.95.4.8), XML (3.98.1.9), stringr (1.2.0), plyr (1.8.4)
License: GPLv3
Any restrictions to use by nonacademics: none
Research resource ID: funRiceGenes, RRID:SCR_015778

Availability of supporting data

A snapshot of the version of the funRiceGenes source code used in this paper is archived in the GigaScience repository, GigaDB [67].

Additional files

Additional file 1: Table S1: A comprehensive list of functionally characterized rice genes.
Additional file 2: Table S2: List of rice gene families.
Additional file 3: Figure S1: Number of papers on rice functional genomic studies published in each year.
Additional file 4: Table S3: Publications on functionally characterized rice genes.
Additional file 5: Figure S2: Word cloud analysis of the titles of all the publications on rice functional genomic studies.
Additional file 6: Figure S3: Word cloud analysis of the abstracts of all the publications on rice functional genomic studies.
Additional file 7: Table S4: The geocoding results of author affiliations.
Additional file 8: Figure S4: Global distribution of affiliations that contributed to rice functional genomics studies. All the affiliations are marked on the world map as blue circles based on their longitudes and latitudes. The size of the circle represents the number of publications from each affiliation. Data after 18 June 2015 are not shown.
Additional file 9: Table S5: Genes with different functions that were assigned the same symbols.
Additional file 10: Table S6: Concurrence of the gene symbols and the keywords regarding phenotype description or biological process in the same sentence of the abstracts or titles of articles.
Additional file 11: Table S7: Concurrence of the symbols of 2 or more genes in the same sentence of the abstracts or titles of research publications.
Additional file 12: Figure S5: Gene interaction networks constructed based on the concurrence of 2 or more genes in the same sentence of abstracts or titles of publications. Each white node represents a gene, while each green edge indicates a connection between 2 genes.
Additional file 13: Table S8: Orthologs of genes regulating heading date in rice.

Abbreviations

BR: brassinosteroid; CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats; GA: gibberellin; GWAS: genome-wide association study; kb: kilobase; RAP-DB: the Rice Annotation Project Database; TAIR: the Arabidopsis Information Resource.

Conflicts of interest

The authors declare that they have no competing interests.

Author contributions

W.Y. conceived and designed the experiments. W.Y., G.L., Y.Y., and Y.O. analyzed the data. W.Y. and Y.O. wrote the paper.

Funding

This research was supported by grants from the National Key Research and Development Program of China (2016YFD0100903), the National Natural Science Foundation of China (31771873 and 31371599), and the National Program for Support of Top-notch Young Professionals.

References

1. Fontana L, Partridge L. Promoting health and longevity through diet: from model organisms to humans. Cell 2015;1(1):106–18.
2. Goff SA, Ricke D, Lan TH et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002;5565(5565):92–100.
3. Wang J, Yu H, Xiong G et al. Tissue-specific ubiquitination by IPA1 INTERACTING PROTEIN1 modulates IPA1 protein levels to regulate plant architecture in rice. Plant Cell 2017;4(4):697–707.
4. Fan C, Xing Y, Mao H et al. GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theor Appl Genet 2006;6(6):1164–71.
5. Deng Y, Zhai K, Xie Z et al. Epigenetic regulation of antagonistic receptors confers rice blast resistance with yield balance. Science 2017;6328(6328):962–5.
6. Gu K, Yang B, Tian D et al. R gene expression induced by a type-III effector triggers disease resistance in rice. Nature 2005;7045(7045):1122–5.
7. Hu K, Cao J, Zhang J et al. Improvement of multiple agronomic traits by a disease resistance gene via cell wall reinforcement. Nat Plants 2017;(3):17009.
8. Zhao Y, Huang J, Wang Z et al. Allelic diversity in an NLR gene BPH9 enables rice to combat planthopper variation. Proc Natl Acad Sci U S A 2016;45(45):12850–5.
9. Xu K, Xu X, Fukao T et al. Sub1A is an ethylene-response-factor-like gene that confers submergence tolerance to rice. Nature 2006;7103(7103):705–8.
10. Tan J, Tan Z, Wu F et al. A novel chloroplast-localized pentatricopeptide repeat protein involved in splicing affects chloroplast development and abiotic stress response in rice. Mol Plant 2014;8(8):1329–49.
11. Jiang H, Feng Y, Bao L et al. Improving blast resistance of Jin 23B and its hybrid rice by marker-assisted gene pyramiding. Mol Breeding 2012;4(4):1679–88.
12. Wang S, Wu K, Yuan Q et al. Control of grain size, shape and quality by OsSPL16 in rice. Nat Genet 2012;8(8):950–4.
13. Shan Q, Zhang Y, Chen K et al. Creation of fragrant rice by targeted knockout of the OsBADH2 gene using TALEN technology. Plant Biotechnol J 2015;6(6):791–800.
14. Bednarek J, Boulaflous A, Girousse C et al. Down-regulation of the TaGW2 gene by RNA interference results in decreased grain size and weight in wheat. J Exp Bot 2012;16(16):5945–55.
15. Liu Y-N, Xia X-C, He Z-H. Characterization of dense and erect panicle 1 gene (TaDep1) located on common wheat group 5 chromosomes and development of allele-specific markers. Acta Agronomica Sinica 2013;4(4):589–98.
16. Nemoto Y, Kisaka M, Fuse T et al. Characterization and functional analysis of three wheat genes with homology to the CONSTANS flowering time gene in transgenic rice. Plant J 2003;1(1):82–93.
17. Nakamura S, Abe F, Kawahigashi H et al. A wheat homolog of mother of FT and TFL1 acts in the regulation of germination. Plant Cell 2011;9(9):3215–29.
18. Comadran J, Kilian B, Russell J et al. Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nat Genet 2012;12(12):1388–92.
19. Yang Q, Li Z, Li W et al. CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proc Natl Acad Sci U S A 2013;42(42):16969–74.
20. Lamesch P, Berardini TZ, Li D et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 2012;(D1):D1202–10.
21. The maizeGDB database. http://maizegdb.org/web_newgene.php?window=alltime. Accessed 29 October 2017.
22. Gramates LS, Marygold SJ, Santos Gd et al. FlyBase at 25: looking to the future. Nucleic Acids Res 2017;D1:D663–71.
23. Kawahara Y, de la Bastide M, Hamilton J et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 2013;1:1–10.
24. Sakai H, Lee SS, Tanaka T et al. Rice annotation project database (RAP-DB): an integrative and interactive database for rice genomics. Plant Cell Physiol 2013;2:e6.
25. The Oryzabase database. http://www.shigen.nig.ac.jp/rice/oryzabase/download/gene. Accessed 29 October 2017.
26. Zhang Q, Li J, Xue Y et al. Rice 2020: a call for an international coordinated effort in rice functional genomics. Mol Plant 2008;5:715–9.
27. China Rice Data Center. http://www.ricedata.cn/gene. Accessed 29 October 2017.
28. Gene list in the Oryzabase database. http://www.shigen.nig.ac.jp/rice/oryzabase/gene/download;jsessionid=52FB01A7F53441CF54F823AA1ED71DE0?classtag=GENE_EN_LIST. Accessed 29 October 2017.
29. Community Annotation of Rice Gene Families. http://rice.plantbiology.msu.edu/annotation_community_families.shtml. Accessed 29 October 2017.
30. The funRiceGenes database. https://funricegenes.github.io/. Accessed 29 October 2017.
31. The funRiceGenes application. http://funricegenes.ncpgr.cn/. Accessed 29 October 2017.
32. Help page of the funRiceGenes database. https://funricegenes.github.io/help.pdf. Accessed 29 October 2017.
33. News menu of the funRiceGenes database. https://funricegenes.github.io/news/. Accessed 29 October 2017.
34. Zhang J, Chen L-L, Xing F et al. Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc Natl Acad Sci U S A 2016;35:E5163–71.
35. Si L, Chen J, Huang X et al. OsSPL13 controls grain size in cultivated rice. Nat Genet 2016;4:447–56.
36. Yamauchi T, Yoshioka M, Fukazawa A et al. An NADPH oxidase RBOH functions in rice roots during Lysigenous Aerenchyma formation under oxygen-deficient conditions. Plant Cell 2017;4:775–90.
37. Yan WH, Wang P, Chen HX et al. A major QTL, Ghd8, plays pleiotropic roles in regulating grain productivity, plant height, and heading date in rice. Mol Plant 2011;2:319–30.
38. Dai X, Ding Y, Tan L et al. LHD1, an allele of DTH8/Ghd8, controls late heading date in common wild rice (Oryza rufipogon) F. J Integr Plant Biol 2012;10:790–9.
39. Hori K, Ogiso-Tanaka E, Matsubara K et al. Hd16, a gene for casein kinase I, is involved in the control of rice flowering time by modulating the day-length response. Plant J 2013;1:36–46.
40. Li X, Qian Q, Fu Z et al. Control of tillering in rice. Nature 2003;6932:618–21.
41. Sun F, Zhang W, Xiong G et al. Identification and functional analysis of the MOC1 interacting protein 1. J Genet Genomics 2010;1:69–77.
42. Yang S, Hao D, Cong Y et al. The rice OsAMT1;1 is a proton-independent feedback regulated ammonium transporter. Plant Cell Rep 2015;2:321–30.
43. El-kereamy A, Bi Y-M, Ranathunge K et al. The rice R2R3-MYB transcription factor OsMYB55 is involved in the tolerance to high temperature and modulates amino acid metabolism. PLoS One 2012;12:e52030.
44. Wei L, Gu L, Song X et al. Dicer-like 3 produces transposable element-associated 24-nt siRNAs that control agricultural traits in rice. Proc Natl Acad Sci U S A 2014;10:3877–82.
45. Liu B, Li P, Li X et al. Loss of function of OsDCL1 affects microRNA accumulation and causes developmental defects in rice. Plant Physiol 2005;1:296–305.
46. Abe M, Yoshikawa T, Nosaka M et al. WAVY LEAF1, an ortholog of Arabidopsis HEN1, regulates shoot development by maintaining microRNA and trans-acting small interfering RNA accumulation in rice. Plant Physiol 2010;3:1335–46.
47. Orthologous groups among rice, arabidopsis, brachypodium, maize, poplar, grapevine and sorghum. http://rice.plantbiology.msu.edu/annotation_pseudo_apk.shtml. Accessed 29 October 2017.
48. Textpresso for Oryza sativa. http://map.lab.nig.ac.jp:8095/textpresso/index.html. Accessed 29 October 2017.
49. Müller H-M, Kenny EE, Sternberg PW. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol 2004;11:e309.
50. Wang M, Lu X, Xu G et al. OsSGL, a novel pleiotropic stress-related gene enhances grain length and yield in rice. Sci Rep 2016:38157.
51. Yang W, Guo Z, Huang C et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat Commun 2014:5087.
52. Zhang X, Wang J, Huang J et al. Rare allele of OsPPKL1 associated with grain length causes extra-large grain and a significant yield increase in rice. Proc Natl Acad Sci U S A 2012;52:21534–9.
53. McCouch SR, Wright MH, Tung C-W et al. Open access resources for genome-wide association mapping in rice. Nat Commun 2016:10532.
54. Zong W, Tang N, Yang J et al. Feedback regulation of ABA signaling and biosynthesis by a bZIP transcription factor targets drought resistance related genes. Plant Physiol 2016;4:2810–25.
55. Collard BC, Mackill DJ.
58. Blight disease genes in the funRiceGenes database. https://funricegenes.github.io/tags/#blight%20disease. Accessed 29 October 2017.
59. Xa21 gene in the funRiceGenes database. https://funricegenes.github.io/xa21/. Accessed 29 October 2017.
60. Gao X, Zhang X, Lan H et al. The additive effects of GS3 and qGL3 on rice grain length regulation revealed by genetic and transcriptome comparisons. BMC Plant Biol 2015;1:156.
61. The DATASCIENCETOOLKIT website. http://www.datasciencetoolkit.org/. Accessed 29 October 2017.
62. The Mapeasy website. http://www.mapseasy.com/adress-to-gps-coordinates.php. Accessed 29 October 2017.
63. Kahle D, Wickham H. ggmap: spatial visualization with ggplot2. R J 2013;1:144–61.
64. Meyer D, Hornik K, Feinerer I. Text mining infrastructure in
Marker-assisted selection: an ap- R. J Stat Softw 2008;5:1–54. proach for precision plant breeding in the twenty-first 65. Csardi G, Nepusz T. The igraph software package for complex century. Phil Transact Royal Soc B Biol Sci 2008;1491: network research. InterJournal 2006:1695. 557–72. 66. Shannon P, Markiel A, Ozier O et al. Cytoscape: a software 56. Zeng D, Tian Z, Rao Y et al. Rational design of high-yield and environment for integrated models of biomolecular interac- superior-quality rice. Nat Plants 2017:17031. tion networks. Genome Res 2003;11:2498–504. 57. Zhou H, He M, Li J et al. Development of commercial thermo- 67. Yao W, Li G, Yu Y et al. Supporting data for “funRice- sensitive genic male sterile rice accelerates hybrid rice Genes dataset for comprehensive understanding and appli- breeding using the CRISPR/Cas9-mediated TMS5 editing sys- cation of rice functional genes.” GigaScience Database 2017. tem. Sci Rep 2016:37395. http://dx.doi.org/10.5524/100375. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4689117 by Ed 'DeepDyve' Gillespie user on 16 March 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png GigaScience Oxford University Press

funRiceGenes dataset for comprehensive understanding and application of rice functional genes

Publisher: BGI
Copyright: © The Author 2017. Published by Oxford University Press.
eISSN: 2047-217X
DOI: 10.1093/gigascience/gix119

Abstract

Background: As a main staple food, rice is also a model plant for functional genomic studies of monocots. Decoding of every DNA element of the rice genome is essential for genetic improvement to address increasing food demands. The past 15 years have witnessed extraordinary advances in rice functional genomics. Systematic characterization and proper deposition of every rice gene are vital for both functional studies and crop genetic improvement. Findings: We built a comprehensive and accurate dataset of ∼2800 functionally characterized rice genes and ∼5000 members of different gene families by integrating data from available databases and reviewing every publication on rice functional genomic studies. The dataset accounts for 19.2% of the 39 045 annotated protein-coding rice genes, which provides the most exhaustive archive for investigating the functions of rice genes. We also constructed 214 gene interaction networks based on 1841 connections between 1310 genes. The largest network with 762 genes indicated that pleiotropic genes linked different biological pathways. Increasing degree of conservation of the flowering pathway was observed among more closely related plants, implying substantial value of rice genes for future dissection of flowering regulation in other crops. All data are deposited in the funRiceGenes database (https://funricegenes.github.io/). Functionality for advanced search and continuous updating of the database are provided by a Shiny application (http://funricegenes.ncpgr.cn/). Conclusions: The funRiceGenes dataset would enable further exploring of the crosslink between gene functions and natural variations in rice, which can also facilitate breeding design to improve target agronomic traits of rice.

Keywords: Oryza sativa (rice); functional genomics; interaction network; genetic improvement

Received: 25 June 2017; Revised: 23 September 2017; Accepted: 22 November 2017

The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4689117 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Background

Rice is a main staple food that feeds half of the world's population. Improvement of yield and resistance to multiple biotic and abiotic stresses of rice are essential strategies to cope with the increasing world population and the diminishing arable land. Decoding the genetic reservoirs of rice is the basis for rice phenotype improvement.

Functional genomic studies in model organisms have made great contributions to the studies of a wide range of other species [1]. In the last decade, the functions of a number of rice genes were explored with the availability of the genome sequence of Oryza sativa L. ssp. japonica cv. Nipponbare [2]. Genes controlling important agronomic traits, including grain yield [3, 4], blast [5] and blight [6, 7] disease resistance, insect resistance [8], and abiotic stress resistance [9, 10], were functionally characterized. Some of these genes were utilized in rice breeding directly based on marker-assisted strategy and CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) [11–13]. Moreover, the putative homologs of some rice genes were investigated in other crops such as wheat [14–17], barley [18], and maize [19]. As rice is an ideal model of the grass family, characterization of rice genes would greatly facilitate genomic studies and molecular breeding in other crops.

Abundant information on functionally characterized genes of Arabidopsis is archived in The Arabidopsis Information Resource (TAIR) [20], while a list of functionally characterized maize genes is integrated in the maizeGDB database [21]; both resources greatly promoted functional genomics studies in plants. Detailed information on Drosophila genes stored in the FlyBase database is of great value to the studies in Drosophila and humans [22]. The rice genome annotation project maintained by the Michigan State University [23] and the Rice Annotation Project Database (RAP-DB) [24] greatly promoted the progress of rice functional genomics. Although a number of curated rice genes are collected in RAP-DB and Oryzabase [25], not all the functionally characterized rice genes are properly deposited in existing databases. In the long term, the functions of all rice genes will be decoded [26]. As a result, a comprehensive, continuously updated archive of all functionally characterized rice genes involved in diverse pathways is urgently needed.

In this study, we constructed a comprehensive, up-to-date database of rice functional genes, which includes ∼2800 cloned rice genes and ∼5000 members of different gene families. Interaction networks comprising 1310 functionally characterized rice genes were constructed, which revealed complex regulation and crosstalk of different biological pathways. We also developed a Shiny application that allows easy addition of newly reported rice genes. To our knowledge, this is the most comprehensive and accurate database of functionally characterized rice genes with continuous updating.

Results

Collection of functionally characterized rice genes

A database [27] maintained by the China Rice Data Center collects information on thousands of cloned rice genes in Chinese. Information on these genes was downloaded using in-house R scripts, including gene symbol, publications, the corresponding gene model in the Nipponbare reference genome, and a brief summary of the corresponding gene. The abstract, the author affiliation, and the full text of each publication were subsequently extracted from the PubMed database. Next, we manually curated the dataset based on the full text of each publication and obtained 1297 functionally characterized rice genes.

We further downloaded 29 982 publication records by querying the PubMed database with the keyword rice ((rice[Title] OR rice[Title/Abstract]), data until 13 February 2014). All records were grouped by published journal. After removing the records already covered by the China Rice Data Center and ones irrelevant to rice functional genomics, the full texts of the remaining publications were downloaded and reviewed, which identified 441 additional functionally characterized rice genes. Information on each gene, including the GenBank accession number and the corresponding gene model in the Nipponbare genome, was extracted.

As an integrated rice science database, Oryzabase [25] also provides information on a portion of functionally characterized rice genes with manual curation. We downloaded 10 140 records comprising a list of genes from this database [28], and 5531 records with assigned Nipponbare genomic locus were retained. After removing records redundant with the datasets obtained from the other 2 approaches, 469 functionally characterized genes excluding members of gene families were retrieved. All information on the 469 genes was manually curated based on the review of research publications. In total, 2207 functionally characterized rice genes (1297 + 441 + 469) had been collected by 13 February 2014.

We further collected ∼3600 members of various gene families by integrating data from the Rice Genome Annotation Project database [29], the Oryzabase database, and research publications. All the data were deposited in the funRiceGenes database [30].

A Shiny application [31] was then developed to facilitate utilization of this dataset, which also enabled easy addition of newly reported genes to the database. New genes were added to this database using the Shiny application, based on daily email alerts of search results from the PubMed database with the keyword rice (rice[Title] OR rice[Title/Abstract]) [32]. For all PubMed records in the email alert, we identified ones on functionally characterized rice genes. We then went over the full publication of each record and identified the gene symbol and gene model in the reference genome. After the gene symbol, the gene model in the reference genome, and the PubMed identifier are entered, the Shiny application fetches the corresponding publication record from PubMed and extracts key information automatically. We also kept track of new records in the database of Oryzabase and China Rice Data Center, which were then added to our database using the Shiny application. Since 13 February 2014, funRiceGenes has been updated every 2 weeks using the Shiny application. All updated records are available at the NEWS menu of the funRiceGenes database [33]. As of 23 February 2017, ∼2800 functionally characterized genes and ∼5000 gene family members were archived in the funRiceGenes database, which accounted for 19.2% of the 39 045 annotated protein-coding rice genes (Additional file 1: Table S1; Additional file 2: Table S2) [33, 23].

Overview of the dataset regarding functionally characterized rice genes

Rice functional genomic studies developed rapidly after the public availability of the Nipponbare reference genome (Additional file 3: Fig. S1). In total, ∼3553 publications with respect to ∼2800 functionally characterized genes were collected (Additional file 4: Table S3). These publications came from more than 215 journals, 31.0% of which were published in The Plant Journal, Plant Physiology, Plant Molecular Biology, The Plant Cell, Molecular Plant, and New Phytologist (Additional file 4: Table S3). Among all published papers, 4 words (rice, gene, protein, and expression) were observed with the highest frequency in titles, while the words rice, gene, expression, protein, plant, mutant, and stress were found with the highest frequency in the abstract (Additional file 5: Fig. S2; Additional file 6: Fig. S3). More than 1800 affiliations from all over the world contributed to rice functional genomic studies (Additional file 7: Table S4), and scientists from China, Japan, Korea, United States, and India accounted for the majority of the progress (Additional file 8: Fig. S4).

Genomic positions were determined for more than 98.1% of all functionally characterized rice genes based on the corresponding gene models of the Nipponbare reference genome (Additional file 1: Table S1; Fig. 1). Twenty-five genes were absent from or showed substantial sequence divergence relative to the Nipponbare reference genome, and their genomic positions were determined based on the reference genome sequences of indica varieties Zhenshan 97 and Minghui 63 [34]. The remaining 24 genes could not be located in the genome, which was likely due to the sequence divergence between different rice germplasms.

Figure 1: Chromosome distribution of representative functionally characterized rice genes. The chromosomes are represented as vertical rectangles, and each horizontal line denotes the position of a functionally characterized rice gene. Symbols of all genes are labeled. A total of 930 representative genes are shown.
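The PubMed keyword query quoted in the collection step, (rice[Title] OR rice[Title/Abstract]) with records up to 13 February 2014, can also be written as a programmatic search request. The sketch below is in Python rather than the authors' in-house R scripts, and it assumes the public NCBI E-utilities ESearch endpoint; the paper does not say which interface was used. The function name, the lower date bound, and the `retmax` value are illustrative assumptions. The code only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint; the paper does not name the interface used.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term="rice[Title] OR rice[Title/Abstract]",
                      mindate="1900/01/01",   # hypothetical lower bound, not from the paper
                      maxdate="2014/02/13",   # cut-off date stated in the text
                      retmax=100):
    """Build a PubMed search URL restricted by publication date."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)

url = build_esearch_url()
print(url)
```

Fetching this URL would return matching PubMed identifiers, which would still need the manual review of full texts described above to decide which records concern functionally characterized genes.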
A number of genes were investigated simultaneously by distinct research groups based on various rice accessions, mutants, or phenotypic traits. As a result, 637 genes were assigned more than 1 symbol (Additional file 1: Table S1). In contrast, the same symbols were sometimes assigned to different genes due to the lack of communication (Additional file 9: Table S5).

Based on the co-occurrence of gene symbols and keywords regarding phenotype description or biological process in the same sentence of an abstract or a title in the literature, the functions of corresponding genes were summarized with manual curation. A total of 441 keywords were investigated, which generated 21 872 records for 1952 genes (Additional file 10: Table S6). Among all 441 keywords, yield and grain yield were found in 311 records for 115 genes, while grain width, grain length, grain weight, and grain size were detected in 139 records for 53 genes. Among the 77 genes retrieved with the keywords heading date or flowering time, 13 were also associated with yield or grain yield. Likewise, 7 genes involved in iron utilization, phosphate uptake, and sugar transporting were related to grain yield. We also found that 335 genes were involved in different stress signaling pathways, while 139 genes were related to rice diseases, including blast, bacterial blight, and sheath blight.

Progress in rice functional genomics benefited from the development of various technologies and the availability of diverse genomic and genetic resources. We found that homolog information was the most frequently used resource in rice functional genomics studies, and reverse transcriptase polymerase chain reaction was the most commonly used technique to analyze gene expression level (Fig. 2). Overexpression and RNAi were frequently used to perturb gene expression, which contributed to the dissection of the association between gene expression and phenotype variation. Creation of mutants using T-DNA and Tos17 insertions contributed significantly to rice gene cloning, while GWAS (genome-wide association study) and CRISPR became new strategies to dissect the functions of rice genes in recent years [35, 36].

Figure 2: Usage of various biotechniques in rice functional genomics studies. The y-axis indicates the number of publications using a specific biotechnique. Data after 18 June 2015 are not shown.

Interaction networks of functionally characterized rice genes

Physical and genetic interactions between different rice genes were frequently reported. However, a global view of the interaction networks for all functionally characterized rice genes remains to be elaborated. We constructed interaction networks of functionally characterized genes based on the co-occurrence of the symbols of 2 or more genes in the same sentence of an abstract or a title of research publications, using an in-house R script with manual curation. A sentence in which 2 or more genes were observed was regarded as evidence supporting the connection between these genes. In total, 1841 connections supported by 4046 pieces of evidence were detected, which comprised 1310 genes constituting 214 interaction networks (Additional file 11: Table S7).

Figure 3: The gene interaction network comprising 762 genes. Each white node represents a functionally characterized rice gene, and gene symbols are marked beside the node. Each green edge indicates a connection between 2 genes. Genes involved in the same biological pathways are indicated.
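The sentence-level co-occurrence rule described above (2 or more gene symbols in one sentence count as one piece of evidence for a connection, and connected genes form a network) can be sketched in a few lines. This is a Python illustration, not the authors' in-house R script, and it omits the manual curation step. The whole-token matching rule, the function names, and the toy sentences are assumptions made for the example; only the gene symbols are taken from the text.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_edges(sentences, symbols):
    """Count, for each gene pair, the sentences in which both symbols occur."""
    # Assumption: symbols are matched as whole tokens, case-sensitively.
    patterns = {
        s: re.compile(r"(?<![A-Za-z0-9])" + re.escape(s) + r"(?![A-Za-z0-9])")
        for s in symbols
    }
    evidence = Counter()
    for sentence in sentences:
        present = sorted(s for s, p in patterns.items() if p.search(sentence))
        for a, b in combinations(present, 2):
            evidence[(a, b)] += 1   # one sentence = one piece of evidence
    return evidence

def connected_components(edges):
    """Group genes into networks with a small union-find structure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(groups.values(), key=len, reverse=True)

# Toy sentences invented for illustration; they are not quotations from the literature.
symbols = ["Ghd8", "Ehd1", "MOC1", "MIP1", "OsDCL1", "OsHEN1"]
sentences = [
    "Toy sentence mentioning Ghd8 and Ehd1.",
    "Another toy sentence with Ghd8 and MOC1.",
    "MOC1 appears with MIP1.",
    "Ghd8 and Ehd1 appear together again.",
    "OsDCL1 co-occurs with OsHEN1.",
]
evidence = cooccurrence_edges(sentences, symbols)
networks = connected_components(evidence)
print(len(evidence), sum(evidence.values()), [len(n) for n in networks])
```

In this toy input a gene linked to two otherwise separate groups merges them into one network, which is the same mechanism by which pleiotropic genes can join different pathways into a single large component.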
The largest network was composed of 762 genes including ones associated with flowering, phosphate uptake and homeostasis, iron uptake, stress signaling, blight disease resistance, meiosis, BR (brassinosteroid) and GA (gibberellin) signaling, grain weight, and endosperm development (Fig. 3). Genes related to the same trait were clustered together, indicating the trustworthiness of this approach. The enormous size of this network was mainly caused by pleiotropic genes involved in different biological pathways. For example, Ghd8 was responsible for grain number, plant height, and heading date [37]. Ghd8 was connected to genes controlling heading date including Ehd1 [38], Hd16 [39], and RFT1 [38], and genes controlling tillering including MOC1 [40], which was further connected with MIP1, a gene regulating tillering and plant height [41]. The other 213 interaction networks were made up of 548 rice genes, and 88% of these networks contained only 2 or 3 genes (Additional file 12: Fig. S5). The second largest network contained 14 genes involved in glutamine metabolism, including OsAMT1;3, GAD3, and GAT1 [42, 43]. Genes involved in small RNA biogenesis including OsDCL3a, OsDCL1, and OsHEN1 were observed in a 10-gene network (Additional file 12: Fig. S5) [44–46].

We further constructed an interaction network using 77 genes involved in flowering regulation (Fig. 4). Based on the orthologous groups among 7 plants provided by the Rice Genome Annotation Project [47], we found that 40 of the 77 genes had orthologous genes in sorghum, maize, Brachypodium, Arabidopsis, poplar, and grapevine, and orthologous genes were also identified for another 20 rice genes in sorghum, maize, and Brachypodium (Fig. 4; Additional file 13: Table S8). Only 7 genes (RFT1, Ehd4, Hd6, OsCO3, ROC4, Se14, and OsPIL15) were unique to rice. These results demonstrated the increasing degree of conservation of the flowering pathway among plants with closer phylogenetic relationships, implying substantial value of knowledge on functionally characterized rice genes to future dissection of flowering time regulation in other crops.

Figure 4: Interaction network of genes regulating flowering in rice and the orthologs of these genes in other plants. Each node represents a functionally characterized rice gene. Each edge indicates a connection between 2 genes. Genes with different number of orthologs are indicated with different colors and shapes. "Rice + (Maize | Poplar)" indicates "Rice and Maize" or "Rice and Poplar." Detailed information is shown in Additional file 13: Table S8.

Discussion

In this study, we built a comprehensive and accurate database of functionally characterized rice genes, funRiceGenes, which provides a valuable resource for rice functional genomic studies. funRiceGenes was constructed by integrating data from PubMed, Oryzabase, and China Rice Data Center, and it has been updated every 2 weeks using a Shiny application. For each gene in the funRiceGenes database, the gene symbol, the genomic locus in the reference genome, and the published papers on this gene were identified. Compared with Textpresso for Oryza sativa [48], which is a comprehensive collection of literature on rice, we further built the associations between the genomic locus or symbol of each gene and the literature [49]. Based on the literature identified for each gene, we summarized the brief functions of each gene and constructed interaction networks for all genes. The evidence supporting the functions of all collected genes and the interaction networks are unique to the funRiceGenes database. In addition, a user-friendly query interface and tidy data for downloading are provided in the funRiceGenes database.
Along with the sequence and phenotype data of thousands of rice accessions reported in recent years, the abundant information on rice genes in our database would enable further exploring of the crosslink between gene functions and natural variations. We found that a cloned rice gene OsSGL (LOC_Os02g04130, chr02:1799733-1800811), which regulated grain weight in rice, was ∼70 kb away from a GWAS peak (chr02:1871732) in terms of grain weight [50, 51]. Likewise, another gene OsPPKL3 (LOC_Os12g42310, chr12:26273157-26282197), which regulated grain length, was ∼90 kb away from a GWAS peak (chr12:26182880) associated with grain length [52, 53]. The functions of OsSGL and OsPPKL3 were characterized by transgenic studies, and the natural variations of the 2 genes are yet to be dissected.

Our database is also beneficial to the interpretation of large-scale DNA, mRNA, and other sequencing datasets in rice. Analyses of these data usually identify differentially expressed genes, gene co-expression networks, differentially methylated regions, and ChIP-seq peaks. The detailed information concerning the several thousands of rice genes archived in this database would be helpful for illustration of these results [54]. Batch query functions are provided, allowing search of this database with multiple genes belonging to a pathway/biological process or defined gene set. Our work in rice would facilitate functional genomic studies of other crops including wheat, sorghum, and maize.

Pyramiding and editing of functionally characterized rice genes regulating important agronomic traits by molecular marker-assisted selection and CRISPR are 2 promising approaches used to breed new rice varieties in recent years [55–57]. Thus, this database would play an important role in future rice breeding. For a specific agronomic trait, all related genes could be retrieved from this database conveniently for further manipulation [58]. For any of these genes, all relevant publications and a brief summary are available in this database [59]. The sequences of different alleles reported are also archived in this database. These resources would greatly facilitate breeding design to improve target agronomic traits by pyramiding of elite alleles or knocking out deleterious alleles. In addition, the effect of 1 gene might be enhanced or masked by other genes [60]. Thus, the gene interaction networks provided in this database could also be taken into account when making breeding designs.

Materials and Methods

Geocoding of author affiliations

The latitudes and longitudes of all the author affiliations were obtained using the application interface provided by the DATASCIENCETOOLKIT website [61] with in-house R scripts. For author affiliations that failed to be geocoded at high resolutions, we further used the Mapeasy website [62] to find the accurate latitudes and longitudes. The R package ggmap was used to plot the positions of all affiliations on the world map [63].

Extraction of information from PDF files

The occurrence in PDF files of keywords, including map-based cloning, positional cloning, accession number, accession No., northern blot, northern analysis, and northern hybridization, and of matches to the regular expression "os[0-1][0-9]g[0-9]+.*" was inspected using the R package tm [64].
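The keyword and regular-expression scan described for PDF text can be illustrated as follows. This is a Python sketch, not the authors' workflow with the R package tm, and it assumes the PDF has already been converted to plain text. Two choices are assumptions made here: matching is case-insensitive, and the trailing ".*" of the quoted pattern is dropped so that only the locus identifier is captured rather than the rest of the line. The sample sentence is invented.

```python
import re

# Pattern quoted in the text, without the trailing ".*" (assumption: we want
# the identifier itself, e.g. "Os02g04130", not the remainder of the line).
LOCUS_RE = re.compile(r"os[0-1][0-9]g[0-9]+", re.IGNORECASE)

# Keywords listed in the text, matched here as case-insensitive substrings.
KEYWORDS = [
    "map-based cloning", "positional cloning", "accession number",
    "accession No.", "northern blot", "northern analysis",
    "northern hybridization",
]

def scan_text(text):
    """Return keyword counts and locus identifiers found in extracted text."""
    lowered = text.lower()
    counts = {kw: lowered.count(kw.lower()) for kw in KEYWORDS}
    loci = LOCUS_RE.findall(text)
    return counts, loci

# Invented sample sentence, used only to exercise the function.
sample = ("A hypothetical gene (LOC_Os02g04130) was isolated by map-based "
          "cloning; expression was checked by northern blot.")
counts, loci = scan_text(sample)
print(loci, counts["map-based cloning"], counts["northern blot"])
```

Because the character classes require two digits between "os" and "g", with the first being 0 or 1, identifiers such as "Os2g1" are not matched, which keeps the scan restricted to the chromosome-numbered locus format.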
Each white tional cloning, accession number, accession No., northern blot, node represents a gene, while each green edge indicates a con- northern analysis, northern hybridization, and the regular ex- nection between 2 genes. pression “os[0-1][0-9]g[0-9]+.∗” in PDF files were inspected uti- Additional file 13: Table S8: Orthologs of genes regulating head- lizing the R tm [64]package. ingdateinrice. Construction of interaction networks Abbreviations The R package igraph [65] was used to build the interaction net- works based on all the connection information between genes. BR: brassinosteroid; CRISPR: Clustered Regularly Interspaced The networks were then exported in a data format suitable for Short Palindromic Repeats; GA: gibberellin; GWAS: genome-wide Cytoscape, which was used to visualize the network [66]. association study; kb: kilo base; RAP-DB: the Rice Annotation Project Database; TAIR: the Arabidopsis Information Resource. Availability of supporting source code and requirements Conflicts of interest Project name: funRiceGenes (funRiceGenes, RRID:SCR 015778) The authors declare that they have no competing interests. Project home page: http://funricegenes.ncpgr.cn/ GitHub repository: https://github.com/venyao/RICENCODE Operating system(s): platform independent Programming language: R (≥3.1.0) Author contributions Other requirements: tested with R packages shiny (1.0.5), W.Y. conceived and designed the experiments. W.Y., G.L., Y.Y., shinythemes (1.1.1), shinyBS (0.61), RCurl (1.95.4.8), XML and Y.O. analyzed the data. W.Y. and Y.O. wrote the paper. 
(3.98.1.9), stringr (1.2.0), plyr (1.8.4) License: GPLv3 Any restrictions to use by nonacademics: none Funding Research resource ID: funRiceGenes, RRID:SCR 015778 This research was supported by grants from the National Key Research and Development Program of China (2016YFD0100903), Availability of supporting data the National Natural Science Foundation of China (31771873 and A snapshot of the version of the funRiceGenes source code used 31371599), and the National Program for Support of Top-notch in this paper is archived in the GigaScience repository, GigaDB [67]. Young Professionals. Additional files References Additional file 1: Table S1: A comprehensive list of functionally characterized rice genes. 1. Fontana L, Partridge L. Promoting health and longevity Additional file 2: Table S2: List of rice gene families. through diet: from model organisms to humans. Cell Additional file 3: Figure S1: Number of papers on rice functional 2015;1(1):106–18. genomic studies published in each year. 2. Goff SA, Ricke D, Lan TH et al. A draft sequence of Additional file 4: Table S3: Publications on functionally charac- the rice genome (Oryza sativa L. ssp. japonica). Science terized rice genes. 2002;5565(5565):92–100. Additional file 5: Figure S2: Word cloud analysis of the titles of 3. Wang J, Yu H, Xiong G et al. Tissue-specific ubiquitination by all the publications on rice functional genomic studies. IPA1 INTERACTING PROTEIN1 modulates IPA1 protein levels Additional file 6: Figure S3: Word cloud analysis of the abstracts to regulate plant architecture in rice. Plant Cell 2017;4(4):697– of all the publications on rice functional genomic studies. 707. Additional file 7: Table S4: The geocoding results of author affil- 4. Fan C, Xing Y, Mao H et al. GS3, a major QTL for grain length iations. and weight and minor QTL for grain width and thickness in Additional file 8: Figure S4: Global distribution of affiliations con- rice, encodes a putative transmembrane protein. 
Theor Appl tributed to rice functional genomics studies. All the affiliations Genet 2006;6(6):1164–71. are marked on the world map as blue circles based on their longi- 5. Deng Y, Zhai K, Xie Z et al. Epigenetic regulation of antago- tudes and latitudes. The size of the circle represents the number nistic receptors confers rice blast resistance with yield bal- of publications conducted by each affiliation. Data after 18 June ance. Science 2017;6328(6328):962–5. 2015 are not shown. 6. Gu K, Yang B, Tian D et al. R gene expression induced by Additional file 9: Table S5: Genes with different functions that a type-III effector triggers disease resistance in rice. Nature were assigned the same symbols. 2005;7045(7045):1122–5. Additional file 10: Table S6: Concurrence of the gene symbols and 7. Hu K, Cao J, Zhang J et al. Improvement of multiple agro- the keywords regarding phenotype description or biological pro- nomic traits by a disease resistance gene via cell wall rein- cess in the same sentence of the abstracts or titles of articles. forcement. Nat Plants 2017;(3):17009. Additional file 11: Table S7: Concurrence of the symbols of 2 or 8. Zhao Y, Huang J, Wang Z et al. Allelic diversity in an NLR gene more genes in the same sentence of the abstracts or titles of BPH9 enables rice to combat planthopper variation. Proc Natl research publications. Acad Sci U S A 2016;45(45):12850–5. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4689117 by Ed 'DeepDyve' Gillespie user on 16 March 2018 8 Yao et al. 9. Xu K, Xu X, Fukao T et al. Sub1A is an ethylene-response- 30. The funRiceGenes database. https://funricegenes.github.io/. factor-like gene that confers submergence tolerance to rice. Accessed 29 October 2017. Nature 2006;7103(7103):705–8. 31. The funRiceGenes application. http://funricegenes.ncpgr.cn/. 10. Tan J, Tan Z, Wu F et al. A novel chloroplast-localized pen- Accessed 29 October 2017. 
tatricopeptide repeat protein involved in splicing affects 32. Help page of the funRiceGenes database. https:// chloroplast development and abiotic stress response in rice. funricegenes.github.io/help.pdf. Accessed 29 October 2017. Mol Plant 2014;8(8):1329–49. 33. News menu of the funRiceGenes database. https:// 11. Jiang H, Feng Y, Bao L et al. Improving blast resistance of Jin funricegenes.github.io/news/. Accessed 29 October 2017. 23B and its hybrid rice by marker-assisted gene pyramiding. 34. Zhang J, Chen L-L, Xing F et al. Extensive sequence diver- Mol Breeding 2012;4(4):1679–88. gence between the reference genomes of two elite indica rice 12. Wang S, Wu K, Yuan Q et al. Control of grain size, shape and varieties Zhenshan 97 and Minghui 63. Proc Natl Acad Sci U quality by OsSPL16 in rice. Nat Genet 2012;8(8):950–4. S A 2016;35:E5163–71. 13. Shan Q, Zhang Y, Chen K et al. Creation of fragrant rice by 35. Si L, Chen J, Huang X et al. OsSPL13 controls grain size in cul- targeted knockout of the OsBADH2 gene using TALEN tech- tivated rice. Nat Genet 2016;4:447–56. nology. Plant Biotechnol J 2015;6(6):791–800. 36. Yamauchi T, Yoshioka M, Fukazawa A et al. An NADPH 14. Bednarek J, Boulaflous A, Girousse C et al. Down-regulation oxidase RBOH functions in rice roots during Lysigenous of the TaGW2 gene by RNA interference results in decreased Aerenchyma formation under oxygen-deficient conditions. grain size and weight in wheat. J Exp Bot 2012;16(16):5945–55. Plant Cell 2017;4:775–90. 15. Liu Y-N, Xia X-C, Z-H He. Characterization of dense and erect 37. Yan WH, Wang P, Chen HX et al. A major QTL, Ghd8, panicle1gene(TaDep1) located on common wheat group 5 plays pleiotropic roles in regulating grain productivity, plant chromosomes and development of allele-specific markers. height, and heading date in rice. Mol Plant 2011;2:319–30. Acta Agronomica Sinica 2013;4(4):589–98. 38. Dai X, Ding Y, Tan L et al. LHD1, an allele of DTH8/Ghd8, con- 16. 
Nemoto Y, Kisaka M, Fuse T et al. Characterization and func- trols late heading date in common wild rice (Oryza rufipogon ) tional analysis of three wheat genes with homology to the F. J Integr Plant Biol 2012;10:790–9. CONSTANS flowering time gene in transgenic rice. Plant J 39. Hori K, Ogiso-Tanaka E, Matsubara K et al. Hd16,agene 2003;1(1):82–93. for casein kinase I, is involved in the control of rice flow- 17. Nakamura S, Abe F, Kawahigashi H et al. A wheat homolog of ering time by modulating the day-length response. Plant J mother of FT and TFL1 acts in the regulation of germination. 2013;1:36–46. Plant Cell 2011;9(9):3215–29. 40. Li X, Qian Q, Fu Z et al. Control of tillering in rice. Nature 18. Comadran J, Kilian B, Russell J et al. Natural variation in 2003;6932:618–21. a homolog of Antirrhinum CENTRORADIALIS contributed to 41. Sun F, Zhang W, Xiong G et al. Identification and functional spring growth habit and environmental adaptation in cul- analysis of the MOC1 interacting protein 1. J Genet Genomics tivated barley. Nat Genet 2012;12(12):1388–92. 2010;1:69–77. 19. Yang Q, Li Z, Li W et al. CACTA-like transposable element in 42. Yang S, Hao D, Cong Y et al. The rice OsAMT1;1 is a proton- ZmCCT attenuated photoperiod sensitivity and accelerated independent feedback regulated ammonium transporter. the postdomestication spread of maize. Proc Natl Acad Sci U Plant Cell Rep 2015;2:321–30. S A 2013;42(42):16969–74. 43. El-kereamy A, Bi Y-M, Ranathunge K et al. The rice R2R3-MYB 20. Lamesch P, Berardini TZ, Li D et al. The Arabidopsis Infor- transcription factor OsMYB55 is involved in the tolerance to mation Resource (TAIR): improved gene annotation and new high temperature and modulates amino acid metabolism. tools. Nucleic Acids Res 2012;(D1):D1202–10. PLoS One 2012;12:e52030. 21. The maizeGDB database. http://maizegdb.org/web newgene 44. Wei L, Gu L, Song X et al. Dicer-like 3 produces trans- .php?window=alltime. Accessed 29 October 2017. 
posable element-associated 24-nt siRNAs that control agri- 22. Gramates LS, Marygold SJ, Santos Gd et al. FlyBase at 25: cultural traits in rice. Proc Natl Acad Sci U S A 2014;10: looking to the future. Nucleic Acids Res 2017;D1:D663–71. 3877–82. 23. Kawahara Y, de la Bastide M, Hamilton J et al. Improvement 45. LiuB,LiP,LiXet al.Lossoffunctionof OsDCL1 affects mi- of the Oryza sativa Nipponbare reference genome using next croRNA accumulation and causes developmental defects in generation sequence and optical map data. Rice 2013;1:1–10. rice. Plant Physiol 2005;1:296–305. 24. Sakai H, Lee SS, Tanaka T et al. Rice annotation project 46. Abe M, Yoshikawa T, Nosaka M et al. WAVY LEAF1,an database (RAP-DB): an integrative and interactive database ortholog of Arabidopsis HEN1, regulates shoot develop- for rice genomics. Plant Cell Physiol 2013;2:e6. ment by maintaining microRNA and trans-acting small in- 25. The Oryzabase database. http://www.shigen.nig.ac.jp/rice/ terfering RNA accumulation in rice. Plant Physiol 2010;3: oryzabase/download/gene. Accessed 29 October 2017. 1335–46. 26. Zhang Q, Li J, Xue Y et al. Rice 2020: a call for an interna- 47. Orthologous groups among rice, arabidopsis, brachy- tional coordinated effort in rice functional genomics. Mol podium, maize, poplar, grapevine and sorghum. http://rice. Plant 2008;5:715–9. plantbiology.msu.edu/annotation pseudo apk.shtml. Ac- 27. China Rice Data Center. http://www.ricedata.cn/gene. Ac- cessed 29 October 2017. cessed 29 October 2017. 48. Textpresso for Oryza sativa. http://map.lab.nig.ac.jp:8095/ 28. Gene list in the Oryzabase database. http://www.shigen. textpresso/index.html. Accessed 29 October 2017. nig.ac.jp/rice/oryzabase/gene/download;jsessionid=52FB01 49. Muller ¨ H-M, Kenny EE, Sternberg PW. Textpresso: an A7F53441CF54F823AA1ED71DE0?classtag=GENE EN LIST. ontology-based information retrieval and extraction system Accessed 29 October 2017. for biological literature. PLoS Biol 2004;11:e309. 29. 
Community Annotation of Rice Gene Families. http://rice. 50. Wang M, Lu X, Xu G et al. OsSGL, a novel pleiotropic stress- plantbiology.msu.edu/annotation community families.shtml. related gene enhances grain length and yield in rice. Sci Rep Accessed 29 October 2017. 2016:38157. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4689117 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Interpretation for rice functional genes 9 51. Yang W, Guo Z, Huang C et al. Combining high-throughput 58. Blight disease genes in the funRiceGenes database. https:// phenotyping and genome-wide association studies to re- funricegenes.github.io/tags/#blight%20disease. Accessed 29 veal natural genetic variation in rice. Nat Commun 2014: October 2017. 5087. 59. Xa21 gene in the funRiceGenes database. https:// 52. Zhang X, Wang J, Huang J et al. Rare allele of OsPPKL1 as- funricegenes.github.io/xa21/. Accessed 29 October 2017. sociated with grain length causes extra-large grain and a 60. Gao X, Zhang X, Lan H et al. The additive effects of GS3 and significant yield increase in rice. Proc Natl Acad Sci U S A qGL3 on rice grain length regulation revealed by genetic and 2012;52:21534–9. transcriptome comparisons. BMC Plant Biol 2015;1:156. 53. McCouch SR, Wright MH, Tung C-W et al. Open access re- 61. The DATASCIENCETOOLKIT website. http://www. sources for genome-wide association mapping in rice. Nat datasciencetoolkit.org/. Accessed 29 October 2017. Commun 2016:10532. 62. The Mapeasy website. http://www.mapseasy.com/adress-to 54. Zong W, Tang N, Yang J et al. Feedback regulation of ABA sig- -gps-coordinates.php. Accessed 29 October 2017. naling and biosynthesis by a bZIP transcription factor targets 63. Kahle D, Wickham H. ggmap: spatial visualization with gg- drought resistance related genes. Plant Physiol 2016;4:2810– plot2. R J 2013;1:144–61. 25. 64. Meyer D, Hornik K, Feinerer I. Text mining infrastructure in 55. Collard BC, Mackill DJ. 
Marker-assisted selection: an ap- R. J Stat Softw 2008;5:1–54. proach for precision plant breeding in the twenty-first 65. Csardi G, Nepusz T. The igraph software package for complex century. Phil Transact Royal Soc B Biol Sci 2008;1491: network research. InterJournal 2006:1695. 557–72. 66. Shannon P, Markiel A, Ozier O et al. Cytoscape: a software 56. Zeng D, Tian Z, Rao Y et al. Rational design of high-yield and environment for integrated models of biomolecular interac- superior-quality rice. Nat Plants 2017:17031. tion networks. Genome Res 2003;11:2498–504. 57. Zhou H, He M, Li J et al. Development of commercial thermo- 67. Yao W, Li G, Yu Y et al. Supporting data for “funRice- sensitive genic male sterile rice accelerates hybrid rice Genes dataset for comprehensive understanding and appli- breeding using the CRISPR/Cas9-mediated TMS5 editing sys- cation of rice functional genes.” GigaScience Database 2017. tem. Sci Rep 2016:37395. http://dx.doi.org/10.5524/100375. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4689117 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Journal

GigaScience, Oxford University Press

Published: Jan 1, 2018
