MIDORI server: a webserver for taxonomic assignment of unknown metazoan mitochondrial-encoded sequences using a curated database

Abstract

Summary

We present MIDORI server, a user-friendly web platform that uses a curated reference dataset, MIDORI, for high-throughput taxonomic classification of unknown metazoan mitochondrial-encoded gene sequences. Three taxonomic assignment methods are currently implemented: RDP Classifier, SPINGO and SINTAX.

Availability and implementation

The web server is freely available at http://reference-midori.info/server.php.

1 Introduction

The era of massive sequencing has transformed our ability to study Earth’s biodiversity (Taberlet et al., 2012). DNA extracted from various environments (e.g. soil, water, air, food products) can be analyzed and compared to public databases of annotated reference sequences to determine the presence of microbial, plant and animal taxa. The possible applications are extremely diverse. PCR-based (i.e. metagenetics or metabarcoding) and PCR-free (i.e. metatranscriptomics or metagenomics) DNA sequencing approaches can be used to study diversity patterns of overlooked microscopic taxa (Al-Rshaidat et al., 2016), study the response of biological communities to environmental changes (Ji et al., 2013), investigate illegal trade of endangered wildlife (Arulandhu et al., 2017) and detect species mislabelling in food products (Raclariu et al., 2017).

The robustness of these techniques, however, largely depends on our ability to rapidly and reliably assign taxonomy to sequences recovered from the environment. The realization by the scientific community that public repositories of genetic data (e.g. GenBank) contained a significant number of taxonomically mislabelled sequences has promoted the creation of curated databases with higher quality standards. For example, reference datasets were built for nuclear-encoded ribosomal RNA genes [e.g. PR2 (Guillou et al., 2012), SILVA (Quast et al., 2013)].
Recently, we assembled the first curated database of mitochondrial-encoded genes, MIDORI, for taxonomic assignments of metazoan sequences (Machida et al., 2017). Mitochondrial genes provide higher taxonomic resolution for most metazoan groups than nuclear-encoded genes. As a result, they have been increasingly targeted in metagenetics and metagenomics studies (Leray and Knowlton, 2016). MIDORI was built by retrieving all nucleotide sequences from GenBank BLAST NT and, after quality filtration, includes metazoan mitochondrial sequences for 13 protein-coding genes (ATP synthase sub-units 6 and 8; Cytochrome oxidase sub-units I, II and III; Cytochrome b apoenzyme; NADH dehydrogenase sub-units 1–4, 4L, 5 and 6) and two ribosomal RNA genes (Large and Small ribosomal sub-unit RNA) with species-level taxonomic information (see details in Machida et al., 2017).

2 Server description

Here, we present MIDORI server, a user-friendly platform to facilitate taxonomic classification of mitochondrial-encoded gene sequences with MIDORI. The server currently performs taxonomic assignments with three algorithms that predict taxonomy using k-mer similarity: SPINGO (Allard et al., 2015), RDP classifier (Wang et al., 2007) and SINTAX (Edgar, 2016). A maximum of 10 000 sequences in FASTA format can be uploaded at once, and each of them must be shorter than 4000 base pairs.

Each algorithm can be run using two versions of each of the 15 mitochondrial-encoded gene reference datasets: MIDORI-Unique and MIDORI-Longest. MIDORI-Unique contains all haplotypes of every species, while MIDORI-Longest contains a single haplotype per species, the longest one. For example, MIDORI-Longest for the COI gene contains the longest sequence for every species represented in the COI dataset.

Using 1336 zooplankton sequences (Machida et al., 2009; 500 bp), we estimated the time required for assignments using the three algorithms with default settings (reference: COI-Longest).
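The upload limits stated above (at most 10 000 FASTA sequences, each shorter than 4000 bp) can be checked locally before submitting a file. The following is a minimal sketch of such a pre-check and is not part of MIDORI server itself: the function names `read_fasta` and `check_upload` are hypothetical, and the sketch assumes that "shorter than 4000 base pairs" means a strict length below 4000.

```python
from typing import Iterator, List, Optional, Tuple

# Limits taken from the server description; how the server itself
# enforces them is not documented here, so this is only a local pre-check.
MAX_SEQUENCES = 10_000
MAX_LENGTH_BP = 4_000  # assumed strict: sequences must be shorter than this


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_upload(text: str) -> List[str]:
    """Return a list of problems; an empty list means the limits are met."""
    problems: List[str] = []
    records = list(read_fasta(text))
    if not records:
        problems.append("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}"
        )
    for header, seq in records:
        if len(seq) >= MAX_LENGTH_BP:
            problems.append(
                f"{header}: {len(seq)} bp is not shorter than {MAX_LENGTH_BP} bp"
            )
    return problems
```

Calling `check_upload(open("reads.fasta").read())` on a hypothetical input file returns an empty list when both limits are respected, and otherwise one message per violation, so oversized files can be split or trimmed before upload.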
In this benchmark, RDP classifier required a considerably longer calculation time (630 s) than SPINGO (90 s) and SINTAX (100 s). We also compared the phyla assigned by RDP classifier and SINTAX: about 10% of the assignments were inconsistent between the two methods, most likely for groups with fewer reference sequences. Furthermore, we have deposited the results of a leave-one-out test (Wang et al., 2007) on the MIDORI web site (http://www.reference-midori.info/download.php). These results indicate that the probability of mis-assignment increases as the supporting bootstrap value decreases, demonstrating the importance of carefully interpreting the results of these analyses.

The server is designed to give full flexibility to the user and functions with recent major browsers. A range of options is available for each algorithm, such as assignment confidence cut-off (RDP), k-mer size (SPINGO) and bootstrap cut-off (SINTAX). A question mark button next to each option displays a hint when the cursor hovers over it. The user can provide an e-mail address to receive the text-formatted result of the analysis. The server was extensively tested using mock samples and real environmental data. It is easy to use and requires neither registration nor local installation of specific software. The RDP classifier is pre-trained with each of the reference datasets.

3 Conclusion

As bio-monitoring and bio-surveillance increasingly rely on mitochondrial-encoded sequence data, the ability to rapidly and reliably assign metazoan sequences to taxonomic groups has become indispensable. MIDORI server enables the classification of large numbers of unknown metazoan reads to taxa represented in the curated reference database. MIDORI will be regularly updated. We also intend to implement several additional taxonomic assignment algorithms on MIDORI server in the near future [e.g. SAP (Munch et al., 2008), RTAX (Soergel et al., 2012), METAXA2 (Bengtsson-Palme et al., 2015)].

Acknowledgements

The authors would like to thank Chao-Yu Pan and Peter Hsiao for technical assistance. They would also like to thank three anonymous referees for thoughtful and insightful comments.

Funding

This work was supported by the Ministry of Science and Technology, Taiwan [grant number 105-2621-B-001-003]; and Academia Sinica.

Conflict of Interest: none declared.

References

Al-Rshaidat M.M.D. et al. (2016) Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea. Genome, 59, 724–737.
Allard G. et al. (2015) SPINGO: a rapid species-classifier for microbial amplicon sequences. BMC Bioinformatics, 16, 324.
Arulandhu A.J. et al. (2017) Development and validation of a multi-locus DNA metabarcoding method to identify endangered species in complex samples. GigaScience, 6, 1–18.
Bengtsson-Palme J. et al. (2015) METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol. Ecol. Res., 15, 1403–1414.
Edgar R. (2016) SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences. bioRxiv, doi: 10.1101/074161.
Guillou L. et al. (2012) The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic Acids Res., 41, D597–D604.
Ji Y. et al. (2013) Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding. Ecol. Lett., 16, 1245–1257.
Leray M., Knowlton N. (2016) Censusing marine eukaryotic diversity in the twenty-first century. Philos. Trans. R. Soc. Lond. B. Biol. Sci., 371, 20150331.
Machida R.J. et al. (2017) Metazoan mitochondrial gene sequence reference datasets for taxonomic assignment of environmental samples. Sci. Data, 4, 170027.
Machida R.J. et al. (2009) Zooplankton diversity analysis through single-gene sequencing of a community sample. BMC Genomics, 10, 438.
Munch K. et al. (2008) Statistical assignment of DNA sequences using Bayesian phylogenetics. Syst. Biol., 57, 750–757.
Quast C. et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res., 41, D590–D596.
Raclariu A.C. et al. (2017) Comparative authentication of Hypericum perforatum herbal products using DNA metabarcoding, TLC and HPLC-MS. Sci. Rep., 7, 1291.
Soergel D.A.W. et al. (2012) Selection of primers for optimal taxonomic classification of environmental 16S rRNA sequences. ISME J., 6, 1440–1444.
Taberlet P. et al. (2012) Towards next-generation biodiversity assessment using DNA metabarcoding. Mol. Ecol., 21, 2045–2050.
Wang Q. et al. (2007) Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microb., 73, 5261–5267.

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com

Bioinformatics, Oxford University Press

Publisher: Oxford University Press
Copyright: © The Author(s) 2018. Published by Oxford University Press.
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/bty454

Journal: Bioinformatics, Oxford University Press

Published: Nov 1, 2018
