PalmXplore: oil palm gene database

Sanusi, Nik Shazana Nik Mohd; Rosli, Rozana; Halim, Mohd Amin Ab; Chan, Kuang-Lim; Nagappan, Jayanthi; Azizi, Norazah; Amiruddin, Nadzirah; Tatarinova, Tatiana V.; Low, Eng-Ti Leslie

doi:10.1093/database/bay095

A set of Elaeis guineensis genes was generated by combining two gene prediction pipelines: Fgenesh++ developed by Softberry and Seqping by the Malaysian Palm Oil Board. PalmXplore was developed to provide a scalable data repository and a user-friendly search engine system to efficiently store, manage and retrieve the oil palm gene sequences and annotations. Information deposited in PalmXplore includes predicted genes, their genomic coordinates, as well as the annotations derived from external databases, such as Pfam, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. Information about genes related to important traits, such as those involved in fatty acid biosynthesis (FAB) and disease resistance, is also provided. The system offers Basic Local Alignment Search Tool homology search, where the results can be downloaded or visualized in the oil palm genome browser (MYPalmViewer). PalmXplore is regularly updated, offering new features, improvements to genome annotation and new genomic sequences. The system is freely accessible at http://palmxplore.mpob.gov.my.

Database URL: http://palmxplore.mpob.gov.my

Database, Vol. 2018, Article ID bay095. © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Oil palm is a major source of oil that has become a common ingredient in many consumer products and industrial applications, such as biofuel, a renewable alternative to petroleum. The oil palm is globally important and the highest yielding oil-bearing crop in the world, with an average national yield of 4 tonnes of oil per hectare per year. Its production has increased over the decades with a world production of 69.89 million metric tons in 2017–18, which is an increase of about 7.4% from 2016–17 (1). Palm oil dominates the global market, contributing up to 55% of the total world exports of oils and fats (2). The upstream subsectors of the oil palm industry, particularly in genomics-based technologies, have gone through tremendous transformation over the past few decades (3). The importance of the oil palm inspired the Malaysian Palm Oil Board (MPOB) to sequence and assemble the genome of two oil palm species, Elaeis guineensis and Elaeis oleifera, to further improve the industry (4). A breakthrough achievement by MPOB was the discovery of the SHELL gene that is responsible for the dura (thick-shell), pisifera (shell-less) and tenera (thin-shell) fruit forms (5), which have a significant impact on yield. Following this, Singh et al. (2014) successfully identified the VIRESCENS gene that controls the exocarp colour of the oil palm fruit (6). Another major success was the identification of the epimutation in the MANTLED gene that causes the mantled somaclonal abnormality in oil palm that often results in bunch failure and drastic yield reduction (7).

To address the growing demand for exploring and retrieving oil palm data, a web portal for the oil palm genome information was developed (Genomsawit portal: http://genomsawit.mpob.gov.my) (8). This portal was designed as an initial access point for oil palm web-based information systems and specialized datasets. When the oil palm genome sequence was published in 2013, the assembled sequences represented about 83% of the 1.8 Gb genome sequence (4). At that time, only a draft gene model prediction was available. Therefore, in order to improve the accuracy of gene model prediction and subsequent annotation of the genome, gene models from two pipelines, namely Fgenesh++ (9) and an in-house tool termed Seqping (10), were integrated (11). By using these pipelines, we were able to predict a total of 26 059 high-quality and validated gene models in the E. guineensis genome. Genes associated with important agronomic traits, such as fatty acid biosynthesis (FAB) and disease resistance, were also identified (11). In order to assist in the use of these datasets, we have developed a searchable database and information system called PalmXplore (http://palmxplore.mpob.gov.my). PalmXplore features a series of user-friendly search engines, information browsers and interactive visualization tools for accessing the oil palm gene information and associated functional annotations. The portal uses the information from and is linked to the Enzyme Code from Kyoto Encyclopedia of Genes and Genomes (KEGG) (12), Gene Ontology (GO) (13) and Pfam version 29.0 (14) databases. User-friendly query interfaces and bioinformatics tools such as Basic Local Alignment Search Tool (BLAST) (15) and an oil palm genome browser called MYPalmViewer (16, 17) have been developed within the system to help users in deciphering important biological information from the datasets.

PalmXplore is able to efficiently handle the oil palm genome data and is scalable to keep up with data growth. It can also provide necessary input/output operations to submit selected entries to the integrated bioinformatics analytics toolkit. The primary genomic data types available include predicted oil palm genes and assembled scaffold information, DNA and protein sequences and functional annotation results. Integrating these diverse data types in an online user-friendly database that is easy to query, view and download was essential to maximize utility of these valuable research data.

In summary, PalmXplore provides the following features:
• Representative oil palm gene models
• Oil palm genome Pisifera5-build (P5-build)
• Basic and advanced search functionalities
• Oil palm gene identifiers, coding sequence ID (CDS ID) and P5-build assembled scaffolds (Scaffold ID) browsers
• Integration to the Genomsawit portal
• Integrated bioinformatics tools and external databases:
  – BLAST
  – Oil palm genome browser (MYPalmViewer)
  – GO
  – Enzyme Code from KEGG
  – Pfam

Materials and methods

Source of the oil palm gene data

Genic regions of the genome were predicted by integrating the gene models of two pipelines, the established Fgenesh++ (9) and an in-house tool, Seqping (10, 11). Two approaches were taken to predict good quality homologous proteins in oil palm. The first uses P5-build genomic scaffolds of an AVROS (Algemene Vereniging van Rubberplanters ter Oostkust van Sumatra) pisifera palm (4) and known proteins from closely related organisms such as the date palm, as reference sequences for the Fgenesh++ pipeline (with generic parameters for monocots) to identify a set of predicted oil palm gene models encoding highly homologous proteins. Gene models with a significant BLAST hit (E-value cut-off: 1e-10) to known plant proteins from the NCBI non-redundant (NR) database were used as a training set for the Fgenesh++ pipeline to develop an oil palm Hidden Markov Model (HMM). The HMM was used to identify the genic regions in the oil palm genome sequence. Subsequently, BLAST 2 Sequences was executed to compare the predicted gene models to the protein sequences from the plant NR database. The cut-offs were percent identity ≥ 50, score ≥ 100, coverage of predicted protein ≥ 80% and coverage of homologous protein ≥ 80%. Sequence similarity search between the predicted genes and the E. guineensis mRNA dataset (5, 18, 19, 20, 21) with an identity cut-off of ≥ 90% was also carried out. A total of 27 915 Fgenesh++ gene models had notable similarities to the E. guineensis mRNA dataset and RefSeq proteins (22).

The in-house-developed gene prediction pipeline, Seqping, was used in parallel as a second approach to validate and subsequently improve the accuracy of the genes predicted by Fgenesh++. The self-trained HMM was used to make gene predictions by incorporating the transcriptomic datasets of oil palm. Here, the pipeline processed the genome and transcriptome sequences using GlimmerHMM version 3.0 (23, 24), SNAP (25) and AUGUSTUS version 2.6.1 (26), followed by the MAKER2 (27) program to combine the predictions from the three tools in association with the transcriptomic evidence. The predicted sequences were compared to RefSeq protein sequences and the oil palm transcriptome dataset via BLASTX (E-value cut-off: 1e-10), resulting in 17 680 predicted genes with significant similarities. Gene models predicted using the two approaches were unified, resulting in 26 059 high-quality genes (11).

Database and web interface implementation

The PalmXplore system consists of two major components: a database to store and administer the data and a high-level web interface. The back end of this system was organized with a relational model and stored in the MySQL (https://www.mysql.com/) database management system. phpMyAdmin (https://www.phpmyadmin.net/) and mysql-workbench 6.3 (https://www.mysql.com) were used for data modelling and database development and administration. The web interfaces were constructed using the PHP scripting language, HTML5, CSS3 styling codes and JavaScript, and operate on the Apache web server. The system was designed and tested for web browsers and derived rendering engines (Windows operating system (OS): Firefox 10.0 and higher, Google Chrome 21.0 and higher, Safari 5; Linux OS: Firefox 3.6 and higher, Google Chrome 37.0 and higher, Opera 12.0 and higher; Mac OS X version: Firefox 30.0 and higher, Google Chrome 41.0 and higher, Safari 9.1). The development of the front end was facilitated and empowered by Bootstrap3 and AngularJS frameworks (http://getbootstrap.com/) to address the balance between design and implementation. The system sites were also developed to be mobile friendly. They were optimized for responsiveness on client systems regardless of the size of device used, whereby the fluid grid system implemented was scalable to 12 columns as the device or viewport size increased. The overall performance of the web and system was analysed by Yslow (http://yslow.org), based on a predefined rule set identified by Yahoo!. Google Analytics (https://www.google.com/analytics) is currently used to track and report website traffic and user activities.

PalmXplore system architecture and database design

A three-tiered client/server structural design was implemented with this system (28), through which the presentation (front end), processing (conceptual) and data storage (physical) are logically divided. Here, the conceptual level plays the crucial role in streamlining the integration, sharing and exchange of available data (Figure 1). The system's architecture makes the existing modules easy to maintain and is scalable and upgradable, without the need to redesign the database scheme and storage properties for new data.

The entire module was modelled into a relational database management system (RDBMS). PalmXplore-DB consists of sequence data and annotations derived from different annotation methods. These data need to be stored, indexed and searched efficiently. An RDBMS uses indexes to sort data and links information from different tables through the use of foreign keys: a foreign key in one table refers to the key of a record in another table, creating a link between their data pieces. This comes in handy for applications that are heavy on data analysis and is thus a good choice for genome or gene annotation databases (29). General performance of SQL queries was checked using the EXPLAIN command in MySQL (Supplementary file 1). The PalmXplore-DB is illustrated in Figure 1, and its conceptual structure is visualized in the form of an Entity-Relationship Diagram (ERD; Figure 2). Three modules of closely linked relations or tables of the database were created. In the first module, nine entities (protein sequence, cds sequence, cdna sequence, cds, cdna, intronless gene, exon, gene model methods and specific gene) were constructed; they primarily store the core information on oil palm protein-coding genes (11). The second module was designed with five entities that manage data on functional annotation of oil palm genes from the KEGG (12), GO (13) and Pfam (14) databases. Moreover, additional annotation that resulted from protein sequence comparison of oil palm and rice (Oryza sativa) by NCBI Protein BLAST (30) was included in this database. The third module is a single table of Scaffold ID.

Results

The improved predictions were important for a well-defined annotation of the oil palm genes. Hence, two methods were tested in acquiring high-quality gene sets for oil palm. In the first prediction with Fgenesh++, ∼27 915 genes that had similarity to the oil palm transcriptome dataset were predicted. The in-house-developed gene prediction pipeline Seqping predicted ∼17 680 genes that had significant similarities to the oil palm transcriptome dataset.
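The homology cut-offs quoted in Materials and methods (percent identity ≥ 50, score ≥ 100, and ≥ 80% coverage of both the predicted and the homologous protein) amount to a per-hit filter. The following is a minimal sketch of such a filter, not the authors' implementation: the `Hit` record, its field names and the gene identifiers are hypothetical, and it assumes the alignment statistics have already been parsed from the BLAST output.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One alignment of a predicted protein against a reference protein.

    Hypothetical record type; field names are illustrative assumptions.
    """
    gene_id: str
    percent_identity: float   # percent identity of the alignment
    score: float              # alignment score
    query_coverage: float     # % of the predicted protein covered
    subject_coverage: float   # % of the homologous protein covered


def passes_cutoffs(hit: Hit) -> bool:
    """Return True if the hit meets all four cut-offs quoted in the text."""
    return (
        hit.percent_identity >= 50
        and hit.score >= 100
        and hit.query_coverage >= 80
        and hit.subject_coverage >= 80
    )


def supported_genes(hits: Iterable[Hit]) -> List[str]:
    """Return sorted IDs of genes with at least one hit passing every cut-off."""
    return sorted({h.gene_id for h in hits if passes_cutoffs(h)})


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        Hit("gene_A", 75.0, 250.0, 92.0, 88.0),   # passes all cut-offs
        Hit("gene_B", 45.0, 300.0, 95.0, 95.0),   # identity too low
        Hit("gene_C", 60.0, 120.0, 79.9, 90.0),   # predicted-protein coverage too low
    ]
    print(supported_genes(example))
```

A gene is kept here as soon as one of its hits passes; whether the original pipelines used the best hit only or any hit is not stated in the text, so that choice is an assumption of the sketch.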
Combining the results of the two pipelines produced a high-quality set of 26 059 genes that were subsequently imported into the PalmXplore database. Figure 3 shows the GO functional classifications of the oil palm genes (31).

Figure 1. Overview of the PalmXplore system architecture. The system architecture is based on the client/server architecture. The PalmXplore-DB contains a list of predicted oil palm genes, functional annotations of the genes with integrative access to external databases and oil palm genome scaffolds and sequences.

Figure 2. ERD of PalmXplore-DB. The ERD shows the conceptual data structure used in PalmXplore-DB. Entities and relationships are represented as boxes and dotted lines between the boxes, respectively. The database structure consists of 15 tables presented in three modules (represented by different colours).

Figure 3. Oil palm genes classification based on GO annotation.

Database components

In this database (PalmXplore-DB), relevant information pertaining to the oil palm genome had been deposited. The most recent was the E. guineensis P5-build and a collection of predicted oil palm genes. The unified data set of 26 059 genes from Fgenesh++ and Seqping was deposited into the database. Within these high-quality representative oil palm gene models, 3672 (14.1%) genes were identified to be intronless, 42 were classified as FAB genes and 210 were identified as resistance genes (11). The published SHELL (5), VIRESCENS (6) and MANTLED (7) genes were also included in this database. Table 1 shows a summary of the data deposited in the PalmXplore system.

Table 1. Summary of the data deposited in PalmXplore system

P5-build oil palm genome statistics
  Total number of scaffolds                      40 360
  Average / N50 / largest scaffold sizes (bp)    38 036 / 1 045 414 / 22 100 610
  Number of bases (bp)                           1 535 150 282

Oil palm gene models                             Representative   Fgenesh++   Seqping
  Number of genes                                26 059           27 915      17 680
  Average length (bp)                            1 239            1 120       1 193
  Gene density (gene/Mb)                         16.98            18.19       11.52
  Average exon per gene                          5.4              5.1         6.0
  Average exon length (bp)                       252              237         197
  Number of genes annotated to GO term(s)        21 572           -           -
  Number of genes with Enzyme Code (KEGG)        6 195            -           -

Others
  Intronless genes                               3 658
  Resistance (R) genes                           210
  FAB genes                                      42

Data access and retrieval

PalmXplore has made the access and retrieval of data feasible, and the user-friendly web interface facilitates efficient and comprehensive search and browsing of the predicted oil palm (EG) genes. A simple free-text search of the genes is available. PalmXplore's record identifiers and keywords can also be used in the search. These include CDS ID, Scaffold ID, oil palm chromosome, gene name and gene annotation. Additionally, searches on specific genes, such as intronless, fatty acid biosynthetic, disease-resistance, SHELL, VIRESCENS and MANTLED genes can be easily performed (Figure 4A).

In the advanced search filter, a combination of search criteria will assist users in applying greater control over how the customized search operates on the oil palm gene models. The PalmXplore database was designed with the integration of public databases, such as KEGG, GO and Pfam to provide an in-depth description of the oil palm predicted genes and their functional annotation. The database also permits users to search for a gene by its location on the genome, ID or putative function (e.g. hydrolase or protein kinase) (Figure 4B).

The search output includes the sequence data used, prediction pipelines, functional annotation, genome position and a link to MYPalmViewer (Figure 4C). Additional information to cross-reference the prediction with other databases (KEGG, GO and Pfam) is also available. The search results can be exported for further analysis. The data could also be retrieved by browsing the list of CDS ID and Scaffold ID in the form of populated tables (Figure 4D).

System interoperability

The PalmXplore system is interoperable with the Genomsawit portal and bioinformatics analysis tools such as the BLAST program and genome browser (Figure 5). Genomsawit (Figure 5A) is a web portal, which provides comprehensive, updated and free oil palm genome information. Currently, Genomsawit provides the genome data for E. guineensis and E. oleifera (4), gene models (4, 11), transcripts (4), markers (32, 33) and GeneThresher data (34).

Sequence similarity search can be done using BLAST (Figure 5C), where reference databases were prepared using the NCBI's makeblastdb tool. Twelve databases containing genomic, transcriptomic, gene annotation and GeneThresher data of E. guineensis and E. oleifera are readily available for nucleotide or protein sequence search.

MYPalmViewer (Figure 5D) is embedded into the portal and can be accessed through the search result page of predicted genes or by directly clicking on the link available on the menu tab. The BLAST results are also linked to sequence objects in MYPalmViewer. The hits are visualized as features in the overview, region and detail panels of MYPalmViewer.
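The BLAST reference databases mentioned in this section are prepared with NCBI's makeblastdb and then queried with a BLAST+ program. The sketch below only assembles the corresponding command lines; it is an illustration under stated assumptions, not the portal's actual configuration. The file names and database name are hypothetical, and the default E-value of 1e-10 is borrowed from the gene prediction cut-off in Materials and methods rather than documented for the web search.

```python
import shlex
from typing import List

BLAST_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx")


def makeblastdb_cmd(fasta_path: str, db_name: str, dbtype: str = "nucl") -> List[str]:
    """Build a makeblastdb command that formats a FASTA file as a BLAST database."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blast_cmd(program: str, query_path: str, db_name: str,
              evalue: float = 1e-10, out_path: str = "hits.tsv") -> List[str]:
    """Build a BLAST+ search command that writes tabular output (-outfmt 6)."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported BLAST program: {program}")
    return [program, "-query", query_path, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    print(shlex.join(makeblastdb_cmd("eg_genes.fasta", "eg_genes")))
    print(shlex.join(blast_cmd("blastn", "query.fa", "eg_genes")))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` without shell quoting; running them requires a local BLAST+ installation.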
The features in MYPalmViewer are hyperlinked to a page with additional information and sequences, as well as to external databases. MYPalmViewer allows all tracks to be displayed on the same view concurrently. The tracks are customizable; color, shape, size and position on the display are all user-configurable. Apart from that, users are allowed to upload custom track data in a variety of file formats [BED, GBrowse Feature File Format, GFF, GFF3, Wiggle (WIG), BAM and SAM]. In addition to genome browsing, MYPalmViewer offers several other capabilities, such as search engines, detailed view pages for each gene, interactive genome navigation and download functions.

Figure 4. Search and browse options in the PalmXplore system. (A) Basic search: search oil palm genes by Gene ID, Scaffold ID, chromosome number, keywords, specific gene or location on the genome. (B) Advanced search: refine search results by entering multiple options. (C) MYPalmViewer: visualize and navigate searched gene in the oil palm genome sequence. Annotation data is also available. (D) CDS browser: browse the list of predicted genes.

Figure 5. Interoperability with web portal and bioinformatics analysis tools. (A) Genomsawit portal: a web portal for the oil palm genome information. Genome assemblies and gene models are available for download here. (B) PalmXplore system manages oil palm gene data deposition and queries. (C) BLAST as an alignment tool. (D) MYPalmViewer to visualise oil palm genomes, genes, genetic markers and others; (E) sequence retrieval of oil palm genomic data; and (F) data download facility.

Conclusion

When the oil palm genome sequence was published in 2013, a resource was needed to host the oil palm sequences; hence, the Genomsawit portal was set up. Later, with the availability of the high-quality predicted genes, PalmXplore was created as an extension of the Genomsawit portal. PalmXplore is the first publicly available gene resource depository and search engine for MPOB's oil palm genome data. The system facilitates efficient and comprehensive search and browsing of the sequence information and annotations of oil palm genes. Moreover, this system has been integrated with the BLAST search options and the results can be visualized via the oil palm genome browser (MYPalmViewer). Apart from that, PalmXplore provides fundamental and important information needed to expedite biological research pertaining to oil palm. The information in this database further aids in the identification of new genes and gene families that may be responsible for traits of interest such as height, FAB and disease resistance (35). The database is continuously updated with new features, improvements to the oil palm genome and gene models, as they become available, along with the associated data mining and updated versions of bioinformatics analysis tools. PalmXplore is freely accessible at http://palmxplore.mpob.gov.my.

Supplementary data

Supplementary data are available at Database Online.

Acknowledgements

The authors would like to thank the Director General of MPOB for permission to publish this paper. Special thanks to Dr Ravigadevi Sambanthamurthi for giving critical feedback and assisting in manuscript preparation. We also extend our appreciation to Ahmad Sadiq Abdul Razak, Faizun Kadri, Nor Zihan Yusoff and Mohd Farid Masarin for their technical and network assistance.

Funding

Malaysian Palm Oil Board under the Oil Palm Genome Programme [R007308000(4)].

Conflict of interest. None declared.

References

1. USDA (2018) Oilseeds: World Markets and Trades. https://apps.fas.usda.gov/psdonline/circulars/oilseeds.pdf.
2. Kushairi,A. (2017) Malaysian Palm Oil Performance 2016 and Future Prospects for 2017. http://www.mpob.gov.my/images/stories/pdf/2017/2017 Dr.Kushairi PALMEROS2017.pdf (May 2018, date last accessed).
3. Kushairi,A., Singh,R. and Ong-Abdullah,M. (2017) The oil palm industry in Malaysia: thriving with transformative technologies. J. Oil Palm Res., 29, 431–439.
4. Singh,R., Ong-Abdullah,M., Low,E.-T.L. et al. (2013) Oil palm genome sequence reveals divergence of interfertile species in old and new worlds. Nature, 500, 335–339.
5. Singh,R., Low,E.-T.L., Ooi,L.C.-L. et al. (2013) The oil palm SHELL gene controls oil yield and encodes a homologue of SEEDSTICK. Nature, 500, 340–344.
6. Singh,R., Low,E.-T.L., Ooi,L.C.-L. et al. (2014) The oil palm VIRESCENS gene controls fruit colour and encodes a R2R3-MYB. Nat. Commun., 5, 4106.
7. Ong-Abdullah,M., Ordway,J.M., Jiang,N. et al. (2015) Loss of Karma transposon methylation underlies the mantled somaclonal variant of oil palm. Nature, 525, 533–537.
8. Rosli,R., Ab Halim,M.-A., Chan,K.L. et al. (2014) Genomsawit website. MPOB Information Series, MPOB TT No. 134.
9. Solovyev,V., Kosarev,P., Seledsov,I. et al. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol., 7, S10.1–S10.12.
10. Chan,K.-L., Rosli,R., Tatarinova,T.V. et al. (2016) Seqping: gene prediction pipeline for plant genomes using self-trained gene models and transcriptomic data. BMC Bioinformatics, 18, 29.
11. Chan,K.-L., Tatarinova,T.V., Rosli,R. et al. (2017) Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol. Direct, 12, 21.
12. Kanehisa,M., Sato,Y., Kawashima,M. et al. (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res., 44, D457–D462.
13. Gene Ontology Consortium (2015) Gene ontology consortium: going forward. Nucleic Acids Res., 43, D1049–D1056.
14. Finn,R.D., Coggill,P., Eberhardt,R.Y. et al. (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res., 44, D279–D285.
15. Camacho,C., Coulouris,G., Avagyan,V. et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10, 421.
16. Low,E.-T.L., Halim,M.A., Rosli,R. et al. (2015) MYPalmViewer: oil palm genome browser. MPOB Information Series No. 148.
17. Stein,L.D. (2013) Using GBrowse 2.0 to visualize and share next-generation sequence data. Brief. Bioinform., 14(2), 162–171.
18. Bourgis,F., Kilaru,A., Cao,X. et al. (2011) Comparative transcriptome and metabolite analysis of oil palm and date palm mesocarp that differ dramatically in carbon partitioning. Proc. Natl. Acad. Sci. USA., 108, 12527–12532.
19. Tranbarger,T.J., Dussert,S., Joët,T. et al. (2011) Regulatory mechanisms underlying oil palm fruit mesocarp maturation, ripening, and functional specialization in lipid and carotenoid metabolism. Plant Physiol., 156, 564–584.
20. Shearman,J.R., Jantasuriyarat,C., Sangsrakru,D. et al. (2013) Transcriptome analysis of normal and mantled developing oil palm flower and fruit. Genomics, 101, 306–312.
21. Shearman,J.R., Jantasuriyarat,C., Sangsrakru,D. et al. (2013) Transcriptome assembly and expression data from normal and mantled oil palm fruit. Dataset Pap Biol., 2013, 1–7.
22. Pruitt,K.D., Tatusova,T., Maglott,D.R. et al. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35, D61–D65.
23. Majoros,W.H., Pertea,M. and Salzberg,S.L. (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics, 20, 2878–2879.
24. Allen,J.E., Majoros,W.H., Pertea,M. et al. (2006) JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biol., 7, S9.1–S9.13.
25. Korf,I. (2004) Gene finding in novel genomes. BMC Bioinformatics, 5, 59.
26. Stanke,M., Diekhans,M., Baertsch,R. et al. (2008) Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 24, 637–644.
27. Holt,C. and Yandell,M. (2011) MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics, 12, 491.
28. Eckerson,W. (1995) Three tier client/server architecture: achieving scalability, performance, and efficiency in client server applications. Open Information Systems, 3, 46–50.
29. Lal,S.B., Pandey,P.K., Rai,P.K. et al. (2012) Design and development of portal for biological database in agriculture. Bioinformation, 9, 588–598.
30. Kawahara,Y., de la Bastide,M., Hamilton,J.P. et al. (2013) Improvement of the Oryza Sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N.Y.), 6, 4.
31. Ye,J., Fang,L., Zheng,H. et al. (2006) WEGO: a web tool for plotting GO annotations. Nucleic Acids Res., 34, 293–297.
32. Ting,N.-C., Jansen,J., Mayes,S. et al. (2014) High density SNP and SSR-based genetic maps of two independent oil palm hybrids. BMC Genomics, 15, 309.
33. Ting,N.-C., Yaakub,Z., Kamaruddin,K. et al. (2016) Fine-mapping and cross-validation of QTLs linked to fatty acid composition in multiple independent interspecific crosses of oil palm. BMC Genomics, 17, 289.
34. Low,E.T.L., Rosli,R., Nagappan,J. et al. (2014) Analyses of hypomethylated oil palm gene space. PLoS One, 9, e86728.
35. Rosli,R., Amiruddin,N., Ab Halim,M.A. et al. (2018) Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm. PLoS One, 13, e0194792.

Genome Biology, 7
J. Shearman, C. Jantasuriyarat, D. Sangsrakru, Thippawan Yoocha, A. Vannavichit, S. Tangphatsornruang, S. Tragoonrung (2013)
Transcriptome Assembly and Expression Data from Normal and Mantled Oil Palm Fruit
Dataset Pap Biol., 2013
W. Eckerson (1995)
Three tier client/server architecture: achieving scalability, performance, and efficiency in client server applications
I. Korf (2004)
Gene finding in novel genomes
BMC Bioinformatics, 5
K. Chan, Rozana Rosli, T. Tatarinova, Michael Hogan, M. Raih, E. Low (2016)
Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data
BMC Bioinformatics, 18
Lincoln Stein (2013)
Using GBrowse 2.0 to visualize and share next-generation sequence data
Briefings in Bioinformatics, 14
K. Pruitt, T. Tatusova, D. Maglott (2004)
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
Nucleic Acids Research, 33
W. Majoros, M. Perțea, S. Salzberg (2004)
TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders
Bioinformatics, 20(16)
Y. Kawahara, M. Bastide, J. Hamilton, H. Kanamori, W. McCombie, Ouyang Shu, D. Schwartz, Tsuyoshi Tanaka, Jianzhon Wu, Shiguo Zhou, K. Childs, R. Davidson, Haining Lin, L. Quesada-Ocampo, Brieanne Vaillancourt, H. Sakai, S. Lee, Jungsok Kim, H. Numa, T. Itoh, C. Buell, T. Matsumoto (2013)
Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data
Rice, 6
Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008)
Using native and syntenically mapped cDNA alignments to improve de novo gene finding
Bioinformatics, 24
Fabienne Bourgis, Aruna Kilaru, Xia Cao, G. Ngando-Ebongue, N. Drira, J. Ohlrogge, V. Arondel (2011)
Comparative transcriptome and metabolite analysis of oil palm and date palm mesocarp that differ dramatically in carbon partitioning
Proceedings of the National Academy of Sciences, 108
R. Finn, P. Coggill, R. Eberhardt, S. Eddy, Jaina Mistry, A. Mitchell, Simon Potter, M. Punta, Matloob Qureshi, A. Sangrador-Vegas, Gustavo Salazar, J. Tate, A. Bateman (2015)
The Pfam protein families database: towards a more sustainable future
Nucleic Acids Research, 44
Rajinder Singh, E. Low, L. Ooi, M. Ong-Abdullah, R. Nookiah, N. Ting, M. Marjuni, P. Chan, Maizura Ithnin, Mohd Manaf, J. Nagappan, K. Chan, Rozana Rosli, M. Halim, Norazah Azizi, M. Budiman, N. Lakey, Blaire Bacher, Andrew Brunt, Chunyan Wang, Michael Hogan, D. He, Jill MacDonald, Steven Smith, Jared Ordway, R. Martienssen, R. Sambanthamurthi (2014)
The oil palm VIRESCENS gene controls fruit colour and encodes a R2R3-MYB
Nature Communications, 5
(2015)
MYPalmViewer: oil palm genome browser
Rajinder Singh, M. Ong-Abdullah, E. Low, M. Manaf, Rozana Rosli, R. Nookiah, L. Ooi, S. Ooi, K. Chan, M. Halim, Norazah Azizi, J. Nagappan, Blaire Bacher, N. Lakey, Steven Smith, D. He, Michael Hogan, M. Budiman, Ernest Lee, R. DeSalle, D. Kudrna, J. Goicoechea, R. Wing, R. Wilson, R. Fulton, Jared Ordway, R. Martienssen, R. Sambanthamurthi (2013)
Oil palm genome sequence reveals divergence of interfertile species in old and new worlds
Nature, 500
J. Shearman, C. Jantasuriyarat, D. Sangsrakru, Thippawan Yoocha, A. Vannavichit, S. Tragoonrung, S. Tangphatsornruang (2013)
Transcriptome analysis of normal and mantled developing oil palm flower and fruit.
Genomics, 101(5)

Publisher: Oxford University Press
Copyright: © The Author(s) 2018. Published by Oxford University Press.
eISSN: 1758-0463
DOI: 10.1093/database/bay095
pmid: 30239681

Abstract

A set of Elaeis guineensis genes had been generated by combining two gene prediction pipelines: Fgenesh++ developed by Softberry and Seqping by the Malaysian Palm Oil Board. PalmXplore was developed to provide a scalable data repository and a user-friendly search engine system to efficiently store, manage and retrieve the oil palm gene sequences and annotations. Information deposited in PalmXplore includes predicted genes, their genomic coordinates, as well as the annotations derived from external databases, such as Pfam, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes. Information about genes related to important traits, such as those involved in fatty acid biosynthesis (FAB) and disease resistance, is also provided. The system offers Basic Local Alignment Search Tool homology search, where the results can be downloaded or visualized in the oil palm genome browser (MYPalmViewer). PalmXplore is regularly updated offering new features, improvements to genome annotation and new genomic sequences. The system is freely accessible at http://palmxplore.mpob.gov.my.

Database URL: http://palmxplore.mpob.gov.my

Database, Vol. 2018, Article ID bay095. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Oil palm is a major source of oil that has become a common ingredient in many consumer products and industrial applications, such as biofuel, a renewable alternative to petroleum. The oil palm is globally important and the highest yielding oil-bearing crop in the world, with an average national yield of 4 tonnes of oil per hectare per year. Its production has increased over the decades with a world production of 69.89 million metric tons in 2017–18, which is an increase of about 7.4% from 2016–17 (1). Palm oil dominates the global market, contributing up to 55% of the total world exports of oils and fats (2). The upstream subsectors of oil palm industry particularly in genomics-based technologies have gone through tremendous transformation over the past few decades (3). The importance of the oil palm inspired the Malaysian Palm Oil Board (MPOB) to sequence and assemble the genome of two oil palm species, Elaeis guineensis and Elaeis oleifera to further improve the industry (4). A breakthrough achievement by MPOB was the discovery of the SHELL gene that is responsible for the dura (thick-shell), pisifera (shell-less) and tenera (thin-shell) fruit forms (5), which have a significant impact on yield. Following this, Singh et al. (2014) successfully identified the VIRESCENS gene that controls the exocarp colour of the oil palm fruit (6). Another major success was the identification of the epimutation in the MANTLED gene that causes the mantled somaclonal abnormality in oil palm that often results in bunch failure and drastic yield reduction (7).

To address the growing demand for exploring and retrieving oil palm data, a web portal for the oil palm genome information was developed (Genomsawit portal: http://genomsawit.mpob.gov.my) (8). This portal was designed as an initial access point for oil palm web-based information systems and specialized datasets. When the oil palm genome sequence was published in 2013, the assembled sequences represented about 83% of the 1.8 Gb genome sequence (4). At that time, only a draft gene model prediction was available. Therefore, in order to improve the accuracy of gene model prediction and subsequent annotation of the genome, gene models from two pipelines namely Fgenesh++ (9) and an in-house tool termed Seqping (10) were integrated (11). By using these pipelines, we were able to predict a total of 26 059 high-quality and validated gene models in the E. guineensis genome. Genes associated with imperative agronomic traits, such as fatty acid biosynthesis (FAB) and disease resistance, were also identified (11). In order to assist in the use of these datasets, we have developed a searchable database and information system called PalmXplore (http://palmxplore.mpob.gov.my). PalmXplore features a series of user-friendly search engines, information browsers and interactive visualization tools for accessing the oil palm gene information and associated functional annotations. The portal uses the information from and is linked to the Enzyme Code from Kyoto Encyclopedia of Genes and Genomes (KEGG) (12), Gene Ontology (GO) (13) and Pfam version 29.0 (14) databases. User-friendly query interfaces and bioinformatics tools such as Basic Local Alignment Search Tool (BLAST) (15) and an oil palm genome browser called MYPalmViewer (16, 17) have been developed within the system to help users in deciphering important biological information from the datasets.

PalmXplore is able to efficiently handle the oil palm genome data and is scalable to keep up with data growth. It can also provide necessary input/output operations to submit selected entries to the integrated bioinformatics analytics toolkit. The primary genomic data types available include predicted oil palm genes and assembled scaffold information, DNA and protein sequences and functional annotation results. Integrating these diverse data types in an online user-friendly database that is easy to query, view and download was essential to maximize utility of these valuable research data.

In summary, PalmXplore provides the following features:

- Representative oil palm gene models
- Oil palm genome Pisifera5-build (P5-build)
- Basic and advanced search functionalities
- Oil palm gene identifiers, coding sequence ID (CDS ID) and P5-build assembled scaffolds (Scaffold ID) browsers
- Integration to the Genomsawit portal
- Integrated bioinformatics tools and external databases:
  – BLAST
  – Oil palm genome browser (MYPalmViewer)
  – GO
  – Enzyme Code from KEGG
  – Pfam

Materials and methods

Source of the oil palm gene data

Genic regions of the genome were predicted by integrating the gene models of two pipelines, the established Fgenesh++ (9) and an in-house tool, Seqping (10, 11). Two approaches were taken to predict good quality homologous proteins in oil palm. The first uses P5-build genomic scaffolds of an AVROS (Algemene Vereniging van Rubberplanters ter Oostkust van Sumatra) pisifera palm (4) and known proteins from closely related organisms such as the date palm, as reference sequences for the Fgenesh++ pipeline (with generic parameters for monocots) to identify a set of predicted oil palm gene models encoding highly homologous proteins. Gene models with significant BLAST hit (E-value cut-off: e^-10) to known plant proteins from the NCBI non-redundant (NR) database were used as a training set for the Fgenesh++ pipeline to develop oil palm Hidden Markov Model (HMM). The HMM was used to identify the genic regions in the oil palm genome sequence. Subsequently, BLAST 2 Sequences was executed to compare the predicted gene models to the protein sequences from the plant NR database. The cut-offs were percent identity ≥ 50, score ≥ 100, coverage of predicted protein ≥ 80% and coverage of homologous protein ≥ 80%. Sequence similarity search between the predicted genes and the E. guineensis mRNA dataset (5, 18, 19, 20, 21) with an identity cut-off of ≥ 90% was also carried out. A total of 27 915 Fgenesh++ gene models had notable similarities to the E. guineensis mRNA dataset and RefSeq proteins (22).

The in-house-developed gene prediction pipeline, Seqping, was used in parallel as a second approach to validate and subsequently improve the accuracy of the genes predicted by Fgenesh++. The self-trained HMM was used to make gene predictions by incorporating the transcriptomic datasets of oil palm. Here, the pipeline processed the genome and transcriptome sequences using GlimmerHMM version 3.0 (23, 24), SNAP (25) and AUGUSTUS version 2.6.1 (26) pipelines, followed by MAKER2 (27) program to combine the predictions from the three tools in association with the transcriptomic evidence. The predicted sequences were compared to RefSeq protein sequences and oil palm transcriptome dataset via BLASTX (E-value cut-off: e^-10), resulting in 17 680 predicted genes with significant similarities. Gene models predicted using the two approaches were unified, resulting in 26 059 high-quality genes (11).

Database and web interface implementation

The PalmXplore system consists of two major components: a database to store and administer the data and a high-level web interface. The back end of this system was organized with a relational model and stored in the MySQL (https://www.mysql.com/) database management system. phpMyAdmin (https://www.phpmyadmin.net/) and mysql-workbench 6.3 (https://www.mysql.com) were used for data modelling and database development and administration. The web interfaces were constructed using PHP scripting language, HTML5, CSS3 styling codes and JavaScript and operate on the Apache web server. It was designed and tested for web browsers and derived rendering engines (Windows operating system (OS): Firefox 10.0 and higher, Google Chrome 21.0 and higher, Safari 5; Linux OS: Firefox 3.6 and higher, Google Chrome 37.0 and higher, Opera 12.0 and higher; Mac OS X version: Firefox 30.0 and higher, Google Chrome 41.0 and higher, Safari 9.1). The development of the front end was facilitated and empowered by Bootstrap3 and AngularJS frameworks (http://getbootstrap.com/) to address the balance between design and implementation. The system sites were also developed to be mobile friendly. They were optimized for responsiveness on client systems regardless of the sizes of device used, whereby the fluid grid system implemented was scalable to 12 columns as the device or viewport size increased. The overall performance of the web and system was analysed by Yslow (http://yslow.org), based on predefined rule set identified by Yahoo!. Google Analytics (https://www.google.com/analytics) is currently used to track and report website traffic and user activities.

PalmXplore system architecture and database design

A three-tiered client/server structural design was implemented with this system (28), through which the presentation (front end), processing (conceptual) and data storage (physical) are logically divided. Here, the conceptual level plays the crucial role in streamlining the integration, sharing and exchange of available data (Figure 1). The system's architecture makes the existing modules easy to maintain and is scalable and upgradable, without the need to redesign the database scheme and storage properties for new data.

The entire module was modelled into a relational database management system (RDBMS). PalmXplore-DB consists of sequence data and annotations derived from different annotation methods. These data need to be stored, indexed and searched efficiently. RDBMS uses indexes to sort data and link information from different tables through the use of foreign keys. Other tables may refer to that foreign key, so as to create a link between their data pieces. This comes in handy for applications that are heavy on data analysis and thus, is a good choice for genome or gene annotation databases (29). General performance of SQL queries was checked using the EXPLAIN command in MySQL (Supplementary file 1). In Figure 1, a detailed schema of the PalmXplore-DB is illustrated. Here, the conceptual structure of this database was visualized in the form of an Entity-Relationship Diagram (ERD; Figure 2). Three modules of closely linked relations or tables of the database were created. In the first module, nine entities (protein sequence, cds sequence, cdna sequence, cds, cdna, intronless gene, exon, gene model methods and specific gene) were constructed whereby, they primarily stored the core information on oil palm protein-coding genes (11). The second module was designed with five entities that manages data on functional annotation of oil palm genes from KEGG (12), GO (13) and Pfam (14) databases. Moreover, additional information on annotation that resulted from protein sequence comparison of oil palm and rice (Oryza sativa) by NCBI Protein BLAST (30) were included in this database. The third module is a single table of Scaffold ID.

Results

The improved predictions were important for a well-defined annotation of the oil palm genes. Hence, two methods were tested in acquiring high-quality gene sets for oil palm. In the first prediction with Fgenesh++, ∼27 915 genes that had similarity to the oil palm transcriptome dataset were predicted. The in-house-developed gene prediction pipeline Seqping predicted ∼17 680 genes that had significant similarities to the oil palm transcriptome dataset.
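
As an illustration of how the homology cut-offs stated in Materials and methods (percent identity ≥ 50, score ≥ 100, coverage of predicted protein ≥ 80% and coverage of homologous protein ≥ 80%) act on a set of alignment hits, a minimal Python sketch is given here. It is not the code used by the authors: the Hit record, its field names and the function names are hypothetical, and the sketch assumes the hit statistics have already been parsed from the BLAST output.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Hit:
    """One alignment between a predicted gene model and a known protein.

    All field names are hypothetical; they are not taken from PalmXplore.
    """
    gene_id: str
    percent_identity: float   # percent identity of the alignment
    score: float              # alignment score as reported by BLAST
    query_coverage: float     # % of the predicted protein covered
    subject_coverage: float   # % of the homologous protein covered


# Cut-offs as quoted in the text.
MIN_IDENTITY = 50.0
MIN_SCORE = 100.0
MIN_COVERAGE = 80.0


def passes_cutoffs(hit: Hit) -> bool:
    """Return True when a hit meets all four cut-offs."""
    return (
        hit.percent_identity >= MIN_IDENTITY
        and hit.score >= MIN_SCORE
        and hit.query_coverage >= MIN_COVERAGE
        and hit.subject_coverage >= MIN_COVERAGE
    )


def supported_gene_ids(hits: Iterable[Hit]) -> Set[str]:
    """Collect the IDs of gene models with at least one passing hit."""
    return {hit.gene_id for hit in hits if passes_cutoffs(hit)}


if __name__ == "__main__":
    example_hits = [
        Hit("gene_a", 75.0, 250.0, 95.0, 90.0),  # passes every cut-off
        Hit("gene_b", 49.9, 250.0, 95.0, 90.0),  # identity too low
        Hit("gene_c", 80.0, 150.0, 79.9, 90.0),  # predicted-protein coverage too low
    ]
    print(sorted(supported_gene_ids(example_hits)))
```

A gene model is kept if any one of its hits satisfies all four thresholds at once; a hit that is strong on identity but covers less than 80% of either protein does not count.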
Combining the results of the two pipelines produced a high-quality set of 26 059 genes that were subsequently imported into the PalmXplore database. Figure 3 shows the GO functional classifications of the oil palm genes (31).

Figure 1. Overview of the PalmXplore system architecture. The system architecture is based on the client/server architecture. The PalmXplore-DB contains a list of predicted oil palm genes, functional annotations of the genes with integrative access to external databases and oil palm genome scaffolds and sequences.

Figure 2. ERD of PalmXplore-DB. The ERD shows the conceptual data structure used in PalmXplore-DB. Entities and relationships are represented as boxes and dotted lines between the boxes, respectively. The database structure consists of 15 tables presented in three modules (represented by different colours).

Figure 3. Oil palm genes classification based on GO annotation.

Database components

In this database (PalmXplore-DB), relevant information pertaining to the oil palm genome had been deposited. The most recent was the E. guineensis P5-build and a collection of predicted oil palm genes. The unified data set of 26 059 genes from Fgenesh++ and Seqping was deposited into the database. Within these high-quality representative oil palm gene models, 3672 (14.1%) genes were identified to be intronless, 42 were classified as FAB genes and 210 were identified as resistance genes (11). The published SHELL (5), VIRESCENS (6) and MANTLED (7) genes were also included in this database. Table 1 shows a summary of the data deposited in the PalmXplore system.

Table 1. Summary of the data deposited in PalmXplore system

P5-build oil palm genome statistics
  Total number of scaffolds                      40 360
  Average / N50 / largest scaffold sizes (bp)    38 036 / 1 045 414 / 22 100 610
  Number of bases (bp)                           1 535 150 282

Oil palm gene models                             Representative   Fgenesh++   Seqping
  Number of genes                                26 059           27 915      17 680
  Average length (bp)                            1 239            1 120       1 193
  Gene density (gene/Mb)                         16.98            18.19       11.52
  Average exon per gene                          5.4              5.1         6.0
  Average exon length (bp)                       252              237         197
  Number of genes annotated to GO term(s)        21 572           -           -
  Number of genes with Enzyme Code (KEGG)        6 195            -           -

Others
  Intronless genes                               3 658
  Resistance (R) genes                           210
  FAB genes                                      42

Data access and retrieval

PalmXplore has made the access and retrieval of data feasible, and the user-friendly web interface facilitates efficient and comprehensive search and browsing of the predicted oil palm (EG) genes. A simple free-text search of the genes is available. PalmXplore's record identifiers and keywords can also be used in the search. These include CDS ID, Scaffold ID, oil palm chromosome, gene name and gene annotation. Additionally, searches on specific genes, such as intronless, fatty acid biosynthetic, disease-resistance, SHELL, VIRESCENS and MANTLED genes can be easily performed (Figure 4A).

In the advanced search filter, a combination of search criteria will assist users in applying greater control over how the customized search operates on the oil palm gene models. The PalmXplore database was designed with the integration of public databases, such as KEGG, GO and Pfam to provide an in-depth description of the oil palm predicted genes and its functional annotation. The database also permits users to search for a gene by its location on the genome, ID or putative function (e.g. hydrolase or protein kinase) (Figure 4B).

The search output includes the sequence data used, prediction pipelines, functional annotation, genome position and a link to MYPalmViewer (Figure 4C). Additional information to cross-reference the prediction with other databases (KEGG, GO and Pfam) is also available. The search results can be exported for further analysis. The data could also be retrieved by browsing the list of CDS ID and Scaffold ID in the form of populated tables (Figure 4D).

Figure 4. Search and browse options in the PalmXplore system. (A) Basic search: search oil palm genes by Gene ID, Scaffold ID, chromosome number, keywords, specific gene or location on the genome. (B) Advanced search: refine search results by entering multiple options. (C) MYPalmViewer: visualize and navigate searched gene in the oil palm genome sequence. Annotation data is also available. (D) CDS browser: browse the list of predicted genes.

System interoperability

PalmXplore system is interoperable with the Genomsawit portal and bioinformatics analysis tools such as the BLAST program and genome browser (Figure 5). Genomsawit (Figure 5A) is a web portal, which provides comprehensive, updated and free oil palm genome information. Currently, Genomsawit provides the genome data for E. guineensis and E. oleifera (4), gene models (4, 11), transcripts (4), markers (32, 33) and GeneThresher data (34).

Sequence similarity search function can be done using BLAST (Figure 5C), where reference databases were prepared using the NCBI's makeblastdb tool. Twelve databases containing genomic, transcriptomic, gene annotation and GeneThresher data of E. guineensis and E. oleifera are readily available for nucleotide or protein sequence search. MYPalmViewer (Figure 5D) is embedded into the portal and can be accessed through the search result page of predicted genes or by directly clicking on the link available on the menu tab. The BLAST results are also linked to sequence objects in MYPalmViewer. The hits are visualized as features in the overview, region and detail panels of MYPalmViewer. The features in MYPalmViewer are hyperlinked to a page with additional information and sequences, as well as to external databases. MYPalmViewer allows all tracks to be displayed on the same view concurrently. The tracks are customizable; color, shape, size and position on the display are all user-configurable. Apart from that, users are allowed to upload custom track data in a variety of file formats [BED, GBrowse Feature File Format, GFF, GFF3, Wiggle (WIG), BAM and SAM]. In addition to genome browsing, MYPalmViewer offers several other capabilities, such as search engines, detailed view pages for each gene, interactive genome navigation and download functions.

Figure 5. Interoperability with web portal and Bioinformatics analysis tools. (A) Genomsawit portal: a web portal for the oil palm genome information. Genome assemblies and gene models are available for download here. (B) PalmXplore system manages oil palm gene data deposition and queries. (C) BLAST as an alignment tool. (D) MYPalmViewer to visualise oil palm genomes, genes, genetic markers and others; (E) sequence retrieval of oil palm genomic data; and (F) data download facility.

Conclusion

When the oil palm genome sequence was published in 2013, a resource was needed to host the oil palm sequences; hence, the Genomsawit portal was framed out. Later, with the availability of the high-quality predicted genes, PalmXplore was created as an appendage to the Genomsawit portal. PalmXplore is the first publicly available gene resource depository and search engine for MPOB's oil palm genome data. With the accessibility of this system, it facilitates proficient and comprehensive search and browsing of the sequence information and annotations of oil palm genes. Moreover, this system has been integrated with the BLAST search options and the results can be visualized via the oil palm genome browser (MYPalmViewer). Apart from that, the information in PalmXplore provides fundamental and important information needed to expedite biological research pertaining to oil palm. The information in this database further aids in identification of new genes and gene families that will be responsible for traits of interest such as the height, FAB and disease resistance genes (35). The database is continuously updated with new features, improvements to the oil palm genome and gene models, as they become available, along with the associated data mining and updated versions of bioinformatics analysis tools. PalmXplore is freely accessible at http://palmxplore.mpob.gov.my.

Supplementary data

Supplementary data are available at Database Online.

Acknowledgements

The authors would like to thank the Director General of MPOB for permission to publish this paper. Special thanks to Dr Ravigadevi Sambanthamurthi for giving critical feedback and assisting in manuscript preparation. We also extend our appreciation to Ahmad Sadiq Abdul Razak, Faizun Kadri, Nor Zihan Yusoff and Mohd Farid
The information in Masarin for their technical and network assistance. Downloaded from https://academic.oup.com/database/article-abstract/doi/10.1093/database/bay095/5098614 by Ed 'DeepDyve' Gillespie user on 10 January 2019 Database, Vol. 2018, Article ID bay095 Page9of9 scriptome and metabolite analysis of oil palm and date palm Funding mesocarp that differ dramatically in carbon partitioning. Proc. Malaysian Palm Oil Board under the Oil Palm Genome Programme Natl. Acad. Sci. USA., 108, 12527–12532. [R007308000(4)]. 19. Tranbarger,T.J., Dussert,S., Joët,T. et al. (2011) Regulatory mechanisms underlying oil palm fruit mesocarp maturation, Conflict of interest. None declared. ripening, and functional specialization in lipid and carotenoid metabolism. Plant Physiol., 156, 564–584. 20. Shearman,J.R., Jantasuriyarat,C., Sangsrakru,D. et al. (2013) References Transcriptome analysis of normal and mantled developing oil 1. USDA (2018) Oilseeds: World Markets and Trades. https://apps. palm flower and fruit. Genomics, 101, 306–312. fas.usda.gov/psdonline/circulars/oilseeds.pdf. 21. Shearman,J.R., Jantasuriyarat,C., Sangsrakru,D. et al. (2013) 2. Kushairi,A. (2017) Malaysian Palm Oil Performance 2016 and Transcriptome assembly and expression data from normal and Future Prospects for 2017. http://www.mpob.gov.my/images/ mantled oil palm fruit. Dataset Pap Biol., 2013,1–7. stories/pdf/2017/2017 Dr.Kushairi PALMEROS2017.pdf (May 22. Pruitt,K.D., Tatusova,T., Maglott,D.R. et al. (2007) NCBI ref- 2018, date last accessed). erence sequences (RefSeq): a curated non-redundant sequence 3. Kushairi,A., Singh,R. and Ong-Abdullah,M. (2017) The oil palm database of genomes, transcripts and proteins. Nucleic Acids industry in Malaysia: thriving with informative technologies. J. Res., 35, D61–D65. Oil Palm Res., 29, 431–439. 23. Majoros,W.H., Pertea,M. and Salzberg,S.L. (2004) TigrScan and 4. Singh,R., Ong-Abdullah,M., Low,E.-T.L. et al. 
(2013) Oil palm GlimmerHMM: two open source ab initio eukaryotic gene- genome sequence reveals divergence of interfertile species in old finders. Bioinformatics, 20, 2878–2879. and new worlds. Nature, 500, 335–339. 24. Allen,J.E., Majoros,W.H., Pertea,M. et al. (2006) JIGSAW, 5. Singh,R., Low,E.-T.L., Ooi,L.C.-L. et al. (2013) The oil palm GeneZilla, and GlimmerHMM: puzzling out the features of SHELL gene controls oil yield and encodes a homologue of human genes in the ENCODE regions. Genome Biol., 7, SEEDSTICK. Nature, 500, 340–344. S9.1–S9.13. 6. Singh,R., Low,E.-T.L., Ooi,L.C.-L. et al. (2014) The oil palm 25. Korf,I. (2004) Gene finding in novel genomes. BMC Bioinfor- VIRESCENS gene controls fruit colour and encodes a R2R3- matics, 5, 59. MYB. Nat. Commun., 5, 4106. 26. Stanke,M., Diekhans,M., Baertsch,R. et al. (2008) Using native 7. Ong-Abdullah,M., Ordway,J.M., Jiang,N. et al. (2015) Loss and syntenically mapped cDNA alignments to improve de novo of Karma transposon methylation underlies the mantled gene finding. Bioinformatics, 24, 637–644. somaclonal variant of oil palm. Nature, 525, 533–537. 27. Holt,C. and Yandell,M. (2011) MAKER2: an annotation 8. Rosli,R., Ab Halim,M.-A., Chan,K.L. et al. (2014) Genomsawit pipeline and genome-database management tool for second- website. MPOB Information Series, MPOB TT No. 134. generation genome projects. BMC Bioinformatics, 12, 491. 9. Solovyev,V., Kosarev,P., Seledsov,I. et al. (2006) Automatic 28. Eckerson,W. (1995) Three tier client/server architecture: achiev- annotation of eukaryotic genes, pseudogenes and promoters. ing scalability, performance, and efficiency in client server appli- Genome Biol., 7, S10.1–S10.12. cations. Open Information Systems, 3, 46–50. 10. Chan,K.-L., Rosli,R., Tatarinova,T.V. et al. (2016) Seqping: 29. Lal,S.B., Pandey,P.K., Rai,P.K. et al. (2012) Design and develop- gene prediction pipeline for plant genomes using self-trained ment of portal for biological database in agriculture. 
Bioinfor- gene models and transcriptomic data. BMC Bioinformatics, mation, 9, 588–598. 18, 29. 30. Kawahara,Y., de la Bastide,M., Hamilton,J.P. et al. (2013) 11. Chan,K.-L., Tatarinova,T.V., Rosli,R. et al. (2017) Evidence- Improvement of the Oryza Sativa Nipponbare reference genome based gene models for structural and functional annotations of using next generation sequence and optical map data. Rice the oil palm genome. Biol. Direct, 12, 21. (N.Y.), 6,4. 12. Kanehisa,M., Sato,Y., Kawashima,M. et al. (2016) KEGG as 31. Ye,J., Fang,L., Zheng,H. et al. (2006) WEGO: a web tool for a reference resource for gene and protein annotation. Nucleic plotting GO annotations. Nucleic Acids Res., 34, 293–297. Acids Res., 44, D457–D462. 32. Ting,N.-C., Jansen,J., Mayes,S. et al. (2014) High density SNP 13. Gene Ontology Consortium (2015) Gene ontology consortium: and SSR-based genetic maps of two independent oil palm going forward. Nucleic Acids Res., 43, D1049–D1056. hybrids. BMC Genomics, 15, 309. 14. Finn,R.D., Coggill,P., Eberhardt,R.Y. et al. (2016) The Pfam 33. Ting,N.-C., Yaakub,Z., Kamaruddin,K. et al. (2016) Fine- protein families database: towards a more sustainable future. mapping and cross-validation of QTLs linked to fatty acid Nucleic Acids Res., 44, D279–D285. composition in multiple independent interspecific crosses of oil 15. Camacho,C., Coulouris,G., Avagyan,V. et al. (2009) BLAST+: palm. BMC Genomics, 17, 289. architecture and applications. BMC Bioinformatics, 10, 421. 34. Low,E.T.L., Rosli,R., Nagappan,J. et al. (2014) Analyses of 16. Low,E.-T.L., Halim,M.A., Rosli,R. et al. (2015) MYPalmViewer: hypomethylated oil palm gene space. PLoS One, 9, e86728. oil palm genome browser. MPOB Information Series No., 148. 35. Rosli,R., Amiruddin,N., Ab Halim,M.A. et al. (2018) Compar- 17. Stein,L.D. (2013) Using GBrowse 2.0 to visualize and share next- ative genomic and transcriptomic analysis of selected fatty acid generation sequence data. Brief. 
Bioinform., 14(2), 162–171. biosynthesis genes and CNL disease resistance genes in oil palm. 18. Bourgis,F., Kilaru,A., Cao,X. et al. (2011) Comparative tran- PLoS One, 13, e0194792.

Journal

Database – Oxford University Press

Published: Jan 1, 2018
