Biodiversity information from Antarctic terrestrial habitats helps conservation efforts, but the distribution and diversity of microinvertebrates in particular remain poorly understood. Springtails, mites, tardigrades, nematodes and rotifers are difficult to identify using morphological features, hence DNA-based metabarcoding methods are well suited for their study. We compared taxonomy assignments of a high-throughput sequencing metabarcoding approach using one ribosomal DNA (18S rDNA) and one mitochondrial DNA (cytochrome c oxidase subunit I, COI) marker with morphological reference data. Specifically, we compared metabarcoding and morphological taxonomic assignments on multiple taxonomic levels in an artificial DNA blend containing Australian invertebrates, and in seven extracts of Antarctic soils containing known micro-faunal taxa. To avoid arbitrary application of metabarcoding analysis parameters, we calibrated those parameters with metabarcoding data from non-Antarctic soils. Metabarcoding approaches employing 18S rDNA and COI markers enabled detection of small and cryptic Antarctic invertebrates, and on low taxonomic ranks 18S data outperformed COI data in this respect. Morphological taxonomy determination did not outperform metabarcoding approaches. Our study demonstrates how barcoding markers can be tested prior to their application to specific taxonomic groups, and that taxonomy fidelity of markers needs to be validated in relation to environment, taxa, and available reference information.

Keywords: environmental DNA, metataxonomic, mitochondrial cytochrome c oxidase I, COI, 18S rDNA, Illumina, 454, biodiversity survey

*Corresponding author: Paul Czechowski, Antarctic Biological Research Initiative, 31 Jobson Road, Bolivar, South Australia 5110, Australia; E-mail: email@example.com
Paul Czechowski, Laurence Clarke, Alan Cooper, Australian Centre for Ancient DNA, University of Adelaide, North Terrace, Adelaide, South Australia 5000, Australia
Laurence Clarke, Australian Antarctic Division, 203 Channel Highway, Kingston, Tasmania 7050, Australia
Laurence Clarke, Antarctic Climate & Ecosystems Cooperative Research Centre, University of Tasmania, Private Bag 80, Hobart, Tasmania 7001, Australia
Mark Stevens, South Australian Museum, Science Centre, North Terrace, Adelaide, South Australia 5001, Australia
Mark Stevens, School of Pharmacy and Medical Sciences, University of South Australia, North Terrace, Adelaide, South Australia 5000, Australia

© 2017 Paul Czechowski, et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

1 Introduction

Biodiversity information from Antarctic terrestrial habitats is important for estimating the effects of environmental change on Antarctic ecosystems [1,2], conservation management in light of increasing threats from non-indigenous invasive species, and investigations on the historic effect of glacial constraints on the evolution of Antarctic biotas over millions of years. Undertaking such biodiversity research in terrestrial Antarctica is challenging due to the logistics of accessing remote locations in a harsh environment.

In recent years, biodiversity information for terrestrial Antarctic plant life has improved due to compilation of occurrence records from smaller-scale studies into easily accessible databases, and may in the future be easier to obtain through remote sensing technology [6,7]. However, the distribution and diversity of Antarctic invertebrates remain understudied [8,9] despite their important role in nutrient cycling and soil formation. Deficient biodiversity information for terrestrial Antarctic invertebrates is caused by the persistence of slow and inefficient survey methods. Antarctic springtails, mites, tardigrades, nematodes and rotifers are morphologically conserved, but are still frequently analyzed with morphological approaches that require highly skilled taxonomists; the time required to identify such invertebrates can be considered inversely proportional to their size. Not reliant on morphological identification, DNA-based methods are better suited for the study of such taxa [15,16], but may lack resolution when sequence information is not used (e.g. in analysis of Terminal Restriction Fragment Length Polymorphisms, TRFLPs; [17,18]) or may be prohibitively work intensive when large sample numbers are analyzed (e.g. through Sanger sequencing; [19,20]).

High Throughput Sequencing (HTS) of amplicons generated from bulk extracts of environmental samples can be used to rapidly generate biodiversity information from terrestrial Antarctic habitats. With such metabarcoding methods, morphologically conserved species are rapidly distinguished in parallel from substrates such as soil, snow or water, using simple sampling procedures and laboratory workflows [23,26].
In Antarctica, HTS-based metabarcoding studies have investigated viruses, bacteria [17,28,29] and predominantly unicellular, fungal or algal, eukaryotes [18,26,30]. Such metabarcoding studies could also be increasingly applied specifically to invertebrate taxa, where they may be particularly useful for providing broad taxonomic classification across large spatial distances (i.e. large sample numbers). The development of practical metabarcoding techniques as successors to morphological biodiversity research requires comparative methodological studies. In the Antarctic context, it is currently unknown how well a metabarcoding approach generating invertebrate phylotypes would compare to morphological taxonomic assignments.

Metabarcoding studies require suitable genetic markers to detect target organisms, and markers targeting 18S rDNA and mitochondrial cytochrome c oxidase subunit I (COI) have been widely applied to phylogenetic studies of invertebrates which are also prevalent in Antarctica. Both markers consequently offer a comparatively large amount of reference data to identify such taxa in mixed DNA extracts [19,34]. A comparison of taxonomic assignments between 18S rDNA and COI phylotypes generated through metabarcoding and morphological approaches should consider available metabarcoding reference data, and rank resolution of morphological identifications. Furthermore, comparisons between morphological and HTS-based metabarcoding approaches are complicated by assumptions regarding sequence clustering and taxonomy assignment. Often, analysis parameters for a given processing environment are more or less arbitrary, although they are crucial to establish reliable richness and diversity estimates. Here, we compared the taxonomy assignment performance of a metabarcoding approach using one 18S rDNA and one COI marker to morphology-derived reference data.
To avoid arbitrary application of metabarcoding analysis parameters, we calibrated parameters with replicated metabarcoding data from two soil samples ("Australian soils"). We were interested in how successfully each marker retrieves taxonomic assignment on superphylum, phylum, class, order, family, genus, and species level in an artificial DNA blend (containing Australian invertebrates; "Australian blend"), and in seven extracts of Antarctic soils (containing microfaunal taxa; "Antarctic soils") when compared to morphologically derived sample compositions.

2 Methods

2.1 Samples

All field activities in Antarctica and sample handling in Australia were undertaken as permitted by the Australian Antarctic Division and the Department of Agriculture, Fisheries and Forestry (Australian Federal Government). Sampling locations of Antarctic soils are shown in Fig. 1; invertebrate isolation and taxonomic descriptions are detailed elsewhere. Invertebrate morphotype composition of these soils is provided in Fig. S2, supporting information. Antarctic soils were thawed and freeze-dried prior to DNA extraction. Australian soils, collected in Adelaide (July 2012, see Table S1, supporting information), were introduced into the laboratory workflow at the freeze-drying stage; the Australian blend was introduced prior to amplification. The latter blend contained DNA from 15 taxa belonging to one order of Arachnida and 14 orders of insects, at a total concentration of 3.1 ng/µl.

2.2 DNA extractions

DNA extractions of Australian and Antarctic soils were performed at the South Australian Research and Development Institute (SARDI) using a method optimized for the retrieval of DNA from different soil types and the retrieval of invertebrates in agricultural ecosystems for plant pathogen detection [26,36-39], which processes 400 g of starting material. Cross contamination during extraction was detected by measuring the concentration of blank extractions.
DNA was stored at -20 °C (SARDI) and at -60 °C (University of Adelaide). Extraction of Australian blend is described elsewhere.

Fig 1. Soil sampling locations used for morphological and metabarcoding analysis of invertebrates. Amplifications of 18S rDNA and COI metabarcoding markers were conducted for whole-soil samples of all shown locations. From locations labeled with a number sign ("#") data could only be retrieved using the 18S marker. Base layers compiled by the Norwegian Polar Institute and distributed in the QUANTARCTICA package (http://www.quantarctica.org/) and courtesy of the SCAR Antarctic Digital Database, Scientific Committee on Antarctic Research; The National Snow and Ice Data Centre, University of Colorado, Boulder; National Aeronautics and Space Administration, Visible Earth Team, http://visibleearth.nasa.gov/; Australian Antarctic Division, Commonwealth of Australia.

2.3 Primers

Primer sequences (including sequencing adapters and amplicon labels, i.e. fusion primers) for PCR and paired-end sequencing of 18S rDNA on the Illumina MiSeq platform were sourced from the 18S rDNA amplification protocol 4.13 of the Earth Microbiome Project and are routinely used for metabarcoding 18S rDNA analyses. Primers HCO2198 and mlCOIintF were chosen for COI amplification and sequencing using the 454 GS FLX platform. Further details on fusion primer design for both gene regions are provided in the Supplemental Material. 18S rDNA and COI fusion primers were initially tested on Antarctic soil samples. Phyla Chelicerata, Nematoda and Rotifera could be recovered by 18S rDNA and COI fusion primers, phylum Tardigrada only by 18S rDNA fusion primers.

2.4 Amplification and sequencing

Amplification and sequencing steps are detailed in the Supplemental Material. Triplicate PCRs were prepared from all 8 extracts to alleviate mixed-template amplification biases [23,43]. Long extension times were used to counteract chimera formation [44,45]. Amplicons were visualized on agarose gels. Triplicate amplicons for each marker were then combined, purified and quantified. Amplicons above 0.25 ng/µl (Table S1, Supplemental Material) were then pooled in equimolar concentration for each marker. Libraries were diluted to 9 pM for Illumina sequencing (18S rDNA) or concentrated to 3.18 ng/µl for emulsion PCR preceding 454 sequencing (COI). 18S rDNA libraries were paired-end sequenced in two separate runs on the Illumina MiSeq platform (Illumina, San Diego, US-CA; reagents kit v2; 150 bp paired-end reads) in 300 cycles; COI libraries were sequenced on two separate quarters of a 454 GS FLX PicoTiterPlate. DNA extraction and PCR controls were amplified and sequenced for both markers if the cleaned control reaction allowed pipetting (with a concentration above 0.25 ng/µl). Further details are provided in the Supplemental Material.

2.5 Reference data for taxonomic assignments

For 18S rDNA taxonomy assignments, SILVA reference data release 111 was used. Reference data for COI was compiled from earlier Antarctic studies (Velasco-Castrillón et al. 2014b, c; Velasco-Castrillón & Stevens 2014) as well as GenBank. Further details regarding creation and composition of reference data are provided in the Supplemental Material.

2.6 Generation of phylotype observations using multiple parameter combinations

Phylotype data was generated in QIIME 1.8; analyses were performed in R 3.1.1 using packages described elsewhere.
With QIIME, we applied several clustering, taxonomy assignment and abundance filtering thresholds to raw metabarcoding data of both markers and evaluated the effect of these different settings on phylotype data from Australian soils. We then picked the most suitable setting (see below) to evaluate data from Australian blend and Antarctic soils (Fig. 2). Initially, deconvolution of 18S rDNA and COI data was performed. Chimera removal was achieved through removal of low abundant sequences for 18S rDNA and COI (removal of phylotypes with less than 5 sequences), and an additional de novo search across the COI data using USEARCH 6.1, as further detailed in the Supplemental Material. Subsequently, de novo clustering at 97% or 99% sequence similarity was performed with UCLUST. Taxonomy assignment to phylotypes was performed with UCLUST via QIIME at thresholds of 90%, 95% and 99% (18S rDNA), and 70%, 75%, 80%, 85%, 90%, 95% and 99% (COI, accommodating higher intraspecific pairwise distances between query and reference sequences). Resulting phylotype observations were filtered in a step-wise process (Fig. 2; Table S3, Supplemental Material) to retrieve data free of phylotypes present in PCR and extraction blanks and containing only arthropods, nematodes, tardigrades and rotifers, followed by removal of low-abundance observations at thresholds of 0.1%, 0.2%, 0.3% or 0.5% of total abundance. From 24 (18S rDNA) and 70 (COI) resulting QIIME phylotype tables, 24 and 16 contained data after processing and were imported into R using the PHYLOSEQ package. Morphological information for Australian blend and Australian soils was converted into a format accessible by PHYLOSEQ and likewise imported into R. To ensure Antarctic phylotype origin in Antarctic soils, observations linked to Australian soils were removed from the Antarctic soil data. Taxonomy strings for morphological and metabarcoding data were restricted or expanded (where possible), to yield superphylum, phylum, class, order, family, genus, and species rank-level information. Taxon information was corrected using NCBI taxonomy terminology (16th of January 2015). All steps are further detailed in the supporting information; analysis scripts are available as indicated at the end of the text.

[Figure 2: workflow diagram. COI 454 data and 18S Illumina data are each clustered at 97% or 99%; taxonomy is assigned at 99%, 95% or 90% (18S, SILVA reference data for eukaryotes) or at 99%, 95%, 90%, 85%, 80%, 75% or 70% (COI, NCBI and custom reference data for arthropods, tardigrades, rotifers and nematodes); pre-filtering removes unassigned phylotypes, subtracts PCR and extraction blank controls, retains arthropods, tardigrades, rotifers and nematodes, and removes phylotypes with <5 sequences; abundance filtering discards the lowest 0.5%, 0.3%, 0.2% or 0.1%. Australian soils are evaluated by number of phylotypes and mean of distances between corresponding replicates; Australian blend and Antarctic soils by composition concordance with reference information at taxonomic levels.]

Fig 2. Preparation of 18S (light green) and COI (light blue) phylotype data using the QIIME environment, and subsequent analysis (purple). During preparation, data of both metabarcoding markers were independently clustered and assigned with taxonomy using multiple thresholds. Taxonomy assignment was aided by SILVA, NCBI and unpublished reference data (18S rDNA and COI, respectively). During pre-filtering, phylotypes without taxonomic information or defined by fewer than five sequences were discarded. Among the remaining phylotypes, only invertebrate phyla expected in Australian blend and Antarctic soils were retained for analysis. Subsequently, different percentages of low abundant phylotypes were discarded. During analysis (purple), comparisons of invertebrate phylotype compositions between two independent PCR replicates for each of two Australian whole soil extracts (Australian soils) were used to determine clustering, taxonomy assignment and abundance filtering parameters that yield similar compositions between corresponding replicates (without discarding all phylotypes). Those settings were then chosen to compare phylotype compositions of Australian blend and seven Antarctic soils to their morphologic taxonomy reference information.

2.7 Selection of processing parameters for 18S rDNA and COI phylotypes

We selected QIIME processing parameters (clustering threshold, taxonomy assignment, low-abundance filtering percentage) based on the highest mean value of Jaccard indices between individually replicated PCRs of each of two Australian soils (Fig. 3; soils 1 and 2, dark grey shading). The chosen analysis parameters retrieve the most compositional similarity between two PCR replicates of two Australian soils and minimize inclusion of low abundant phylotypes without discarding phylotypes reflective of the `true' compositional diversity (see Discussion).

2.8 Concordance between morphotypes and 18S rDNA, COI phylotype taxonomy

We firstly plotted the taxonomic composition of phylotypes from Australian blend and Antarctic soils, as well as the corresponding morphotype assignments. Secondly, concordance between morphotype and 18S rDNA/COI phylotype taxonomy was quantified by comparing rank-level information across all (seven) taxonomic ranks. To do so, we used an algorithm (see analysis scripts) that recorded all morphotype names for each taxonomic level and evaluated their presence among 18S rDNA and COI phylotype taxonomic assignments. Complete concordance between morphotypes and phylotypes was scored with "1", complete dis-concordance was scored with "0". To avoid deflating this taxonomic concordance in cases where both morpho- and phylotypes were lacking taxonomic information, unavailable taxonomic information was coded "NA". Thirdly, intraclass correlation coefficients (ICC) were used to interrelate 18S rDNA and COI concordance values derived from our algorithm, reaching a value of "1" if both markers showed the same ability to detect morphotypes.

[Figure 3: "Cumulative effect of processing parameters on Australian soil invertebrates". Panels Count, Soil 1, Soil 2 and Soils 1 & 2, shown for parameter combinations applied to 18S data and to COI data. Row labels give marker, clustering threshold (c), taxonomy assignment threshold (t) and abundance filtering fraction (f), e.g. "18S c97 t90 f0.001" or "COI c97 t75 f0.001".]

Fig 3. Determination of suitable processing parameters for invertebrate phylotypes recovered in Australian soils using 18S rDNA and COI metabarcoding markers. 18S rDNA data on the left, COI on the right. Count: Abundances of invertebrate phylotypes. Decreasing phylotype numbers increase compositional similarity between two independent PCR replicates of the same soil and thus were chosen to order processing parameters (in rows). Soil 1/Soil 2: Jaccard indices describe similarity between phylotype compositions of two corresponding PCR replicates for a given processing parameter and soil sample (complete conformity = 1, complete nonconformity = 0). Soil 1 & 2: Mean Jaccard index calculated from Soil 1 and Soil 2, which was used to choose a suitable processing parameter for each data set. Dark grey bars mark highest values in each row. Figure generated using the GGPLOT2 package.

3 Results

3.1 Selection of analysis parameters for 18S rDNA and COI metabarcoding data

Maximum mean compositional similarity between two PCR replicates of each of two Australian soil samples (0.8 for 18S rDNA, and 0.45 for COI) was achieved using a clustering threshold of 97% and low abundance filtering of 0.1% for both markers, with a taxonomy assignment threshold of 99% for 18S rDNA and 80% for COI (Fig. 3, soils 1 and 2, dark grey shading).

3.2 Concordance between taxonomic assignments of morphotypes and 18S rDNA, COI phylotypes

Initial plotting indicated that morphologic taxonomy assignments were more or less straightforward for larger invertebrates in the Australian blend, where they were possible to species rank (Fig. 5a and Fig. S1, Supplemental Material). In comparison, morphological taxonomic analysis was more difficult across all Antarctic soils.
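As an illustration of the rank-wise concordance scoring described in Section 2.8, the following Python sketch computes, for each taxonomic rank, the fraction of morphotype names that also occur among phylotype assignments, coding unavailable information as "NA". It is not the authors' analysis script (those are linked at the end of the text); the function names, the dictionary-based data layout and the example combination of taxa are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Set

RANKS = ["superphylum", "phylum", "class", "order", "family", "genus", "species"]

Taxon = Dict[str, Optional[str]]  # hypothetical layout: rank name -> taxon name


def names_at_rank(taxa: List[Taxon], rank: str) -> Set[str]:
    """Collect the names present at one rank; missing information becomes 'NA'."""
    return {taxon.get(rank) or "NA" for taxon in taxa}


def concordance(morphotypes: List[Taxon], phylotypes: List[Taxon]) -> Dict[str, Optional[float]]:
    """Fraction of morphotype names found among phylotype names, per rank.

    1.0 means every morphotype name was recovered, 0.0 means none was.
    Returns None for a rank when no morphotypes were supplied at all.
    """
    scores: Dict[str, Optional[float]] = {}
    for rank in RANKS:
        expected = names_at_rank(morphotypes, rank)
        observed = names_at_rank(phylotypes, rank)
        scores[rank] = len(expected & observed) / len(expected) if expected else None
    return scores


# Hypothetical toy input (not results from the study).
morpho = [
    {"phylum": "Nematoda", "order": "Araeolaimida", "family": "Plectidae"},
    {"phylum": "Rotifera", "order": "Adinetida", "family": "Adinetidae"},
]
phylo = [
    {"phylum": "Nematoda", "order": "Araeolaimida", "family": "Plectidae"},
    {"phylum": "Arthropoda", "order": "Oribatida", "family": "Phenopelopidae"},
]

for rank, score in concordance(morpho, phylo).items():
    print(rank, score)
```

In this toy input, ranks at which neither data set carries information match on "NA" and therefore score 1, which mirrors why concordance values can rise at genus and species level when both data sets lack assignments.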
In Antarctic soils, assignments were frequently missing below order level, but taxa were identified from six orders (Adinetida, Araeolaimida, Rhabditida, Dorylaimida, Parachela and Philodinida) and seven families (Macrobiotidae, Adinetidae, Hypsibiidae, Philodinidae, Plectidae, Qudsianematidae and Rhabditidae) (Fig. 6a; Fig. S2, Supplemental Material).

Interrelation of 18S rDNA and COI concordances in relation to the morphologic data by means of ICCs (third analysis in the Methods section) resulted in a value of 0.843 for Australian blend, serving as a comparison value for the analogous calculations regarding Antarctic soils. Influenced by the less detailed morphologic data, 18S rDNA and COI phylotype taxonomy deviated further from each other in Antarctic soils than observed in Australian blend: across Antarctic soils, ICCs were 0.429, 1.0 and -0.173 for samples CS-1, CS-2 and HI-1, respectively (Table S4, Supplemental Material). The Antarctic soil ICCs thus were lower than for the Australian blend (CS-1, HI-1) and/or influenced only by unavailable taxon information (HI-1) (Fig. 6 and Fig. S2, Table S1, Supplemental Material). Sample LH-2 yielded a comparatively high ICC value (0.759), due to the detection of Plectidae (Araeolaimida, Nematoda) in all three data sets (Fig. 6, Fig. S2).

Evaluation of our scoring algorithm (Fig. 4a; second analysis in the Methods section) revealed phylotype taxonomy assignments for Australian blend using 18S rDNA and COI to be accurate on the superphylum and phylum level (for Ecdysozoa and Arthropoda, respectively, Fig. 4a). 18S rDNA yielded only four expected orders (Blattodea, Hymenoptera, Lepidoptera, Odonata, Fig. 5b). On class and order levels COI performed better than 18S rDNA (Fig. 4a): insects were more accurately retrieved by COI, and six of 12 expected orders were retrieved (Araneae, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera; Fig. 5c). On the family level 18S rDNA yielded more matches than COI (Fig.
4a): three of 12 expected families were accurately assigned (Blattidae, Coenagrionidae, Ichneumonidae) and one family (Zygaenidae) constituted a misassignment (Fig. 5b). In comparison, COI yielded only two correct family assignments (Formicidae, Ichneumonidae), while four families were misassigned (Erebidae, Gnaphosidae, Hemerobiidae, Tachinidae; Fig. 5c). The concordance indices thereafter rose for both markers on the genus and species level (Fig. 4a), indicative of missing taxonomic information in both the morphologic and phylotype data (Fig. 5a, Fig. S1, Supplemental Material).

Graphic representation of our scoring algorithm applied to Antarctic soil data (Fig. 4b; second analysis in the Methods section) demonstrated comparisons between phylotypes and morphotypes to be more impeded by missing information than observed in Australian blend. 18S rDNA phylotype data yielded only one of the orders and families (Araeolaimida: Plectidae) detected morphologically (Fig. 6a), in 3 of 5 expected samples (LH-1, LH-2, VH-1; Fig. 6b). Evidently (Fig. 6b), 18S rDNA phylotypes included orders not detected in morphologic approaches (Monhysterida in sample HI-1 and Oribatida in sample LH-2), each comprising one family (Monhysteridae and Phenopelopidae, respectively). COI metabarcoding data yielded two orders contained in morphologic reference data (Adinetida and Araeolaimida, Fig. 6c). COI family level assignment to Plectidae was concordant with morphological data (Fig. 6a, c). This family was detected in sample LH-2 with both approaches, in CS-1 only with COI (Fig. 6a, c). In the order Adinetida, the family Adinetidae was detected using morphology but not COI in samples LH-1 and LH-2 (Fig. 6a and Fig. 6c); instead, family Adinetidae was detected with COI in sample CS-1 (Fig. 6c). Since we excluded non-Antarctic phylotypes in our initial processing and also conducted low-abundance filtering, orders Coleoptera, Diptera, Lepidoptera and families therein (Fig.
6c) are highly likely to constitute misassignments due to missing reference data. Thus, our approach also allowed detection of Antarctic phylotypes that are not contained in the sequence reference data.

[Figure 4: panels "18S to Morphology" and "COI to Morphology" under the heading "Comparison"; y-axis: matching taxonomic assignments (0.00 to 1.00); x-axis: Superphylum, Phylum, Class, Order, Family, Genus, Species; Antarctic samples shown: CS-1, CS-2, HI-1, LH-2.]

Fig 4. Fraction of detected taxonomic entities in metabarcoding data, when compared to morphologic reference information at various taxonomic levels. A: Australian blend. B: Antarctic soils. Blue: 18S rDNA. Grey: COI. Inclusion of unassigned taxonomy in both reference and metabarcoding data increased similarity on lower taxonomic levels even when higher taxonomic levels are not concordant (e.g. sample CS-1). Figure generated using the GGPLOT2 package.

[Figure 5: stacked counts by order and family. Panel A, morphotypes: 12 orders (Araneae, Blattodea, Coleoptera, Dermaptera, Diptera, Hemiptera, Hymenoptera, Isoptera, Lepidoptera, Neuroptera, Odonata, Orthoptera) and 14 families (Acrididae, Blattidae, Chrysopidae, Coenagrionidae, Eurybrachyidae, Forficulidae, Formicidae, Ichneumonidae, Lauxaniidae, Lymantriidae, Rhinotermitidae, Scarabaeidae, Sparassidae, Tettigoniidae). Panel B, 18S: 4 orders, 4 families. Panel C, COI: 6 orders, 7 families.]

Fig 5. Taxonomic assignments to invertebrates contained in Australian blend. Composition is shown on order and family level. Figure generated using the GGPLOT2 package.

[Figure 6: stacked counts by order and family for samples CS-1, CS-2, HI-1, LH-1, LH-2, VH-1 and VH-2. Panel A, morphotypes: 6 orders, 7 families. Panel B, 18S: 3 orders, 3 families. Panel C, COI: 5 orders, 7 families.]

Fig 6. Taxonomic assignments to invertebrates contained in Antarctic soils. Composition of morphotypes and phylotypes is shown on order and family level. Figure generated using the GGPLOT2 package.

4 Discussion

Metabarcoding will likely remain one of the prime methods for biodiversity monitoring and ecological studies in the years to come. Here, we present a straightforward approach to compare taxonomic assignments retrieved with different classification methods. We exemplify the usefulness of this approach by comparing taxonomic assignments of two different metabarcoding markers to morphological assignments. In the case of this study, the 18S rDNA and COI markers were chosen, since their application to Antarctic invertebrates provides comparatively comprehensive reference information. Similarly, other taxon groups and barcoding markers could be evaluated. With this study, we have complemented research that has compared metabarcoding markers (e.g. [59,60]), focused on invertebrates and provided `ground-truthing' for replicated metabarcoding data by means of morphological data. We also expand the range of studies investigating the effect of analysis parameters on metabarcoding data sets [35,60,65,66], including the quality of reference data [67,68].

4.1 Selection of analysis parameters for 18S rDNA and COI metabarcoding data

Amplicons of replicate bulk soil DNA extracts yield similar taxonomic compositions when the same markers are used, even when sequenced on different HTS platforms. At the same time, low abundant sequence phylotypes are likely to be of chimeric origin or may constitute other PCR or sequencing errors, necessitating their removal for ecological inferences.
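The calibration logic behind this filtering can be sketched in Python as follows: low-abundance phylotypes are discarded at several candidate fractions, replicate compositions are compared with the Jaccard index, and the fraction giving the highest mean index across soils is kept. This is a simplified sketch, not the QIIME and R workflow used in the study: it varies only the abundance filter (clustering and taxonomy assignment thresholds would be varied upstream), it assumes the filter is applied as a fraction of each replicate's read total, and all function names and read counts are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple

Counts = Dict[str, int]  # hypothetical layout: phylotype id -> read count


def filter_counts(counts: Counts, min_reads: int = 5, min_fraction: float = 0.001) -> Counts:
    """Keep phylotypes with at least min_reads reads and min_fraction of all reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        phylotype: n
        for phylotype, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two phylotype sets (1 = identical, 0 = nothing shared)."""
    union = a | b
    # Two empty sets are scored 0 so that discarding everything is never rewarded.
    return len(a & b) / len(union) if union else 0.0


def mean_replicate_jaccard(soils: List[Tuple[Counts, Counts]], min_fraction: float) -> float:
    """Mean Jaccard index between the two PCR replicates of each soil."""
    scores = []
    for rep_a, rep_b in soils:
        kept_a = set(filter_counts(rep_a, min_fraction=min_fraction))
        kept_b = set(filter_counts(rep_b, min_fraction=min_fraction))
        scores.append(jaccard(kept_a, kept_b))
    return mean(scores)


def pick_fraction(
    soils: List[Tuple[Counts, Counts]],
    fractions: Sequence[float] = (0.001, 0.002, 0.003, 0.005),
) -> float:
    """Return the candidate fraction with the highest mean replicate similarity.

    On ties, the smallest (least aggressive) candidate listed first is returned.
    """
    return max(fractions, key=lambda f: mean_replicate_jaccard(soils, f))


# Hypothetical read counts for two PCR replicates of one soil.
toy_soils = [
    ({"p1": 5000, "p2": 3000, "p3": 12}, {"p1": 4500, "p2": 3500, "p4": 10}),
]
print(pick_fraction(toy_soils))
```

In the toy data, the rare phylotypes p3 and p4 each occur in only one replicate; they survive the 0.1% filter but not the 0.2% filter, so replicate similarity rises once they are removed.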
Adjusting sequence data processing thresholds could lead to biased results, if sparse data (i.e. Antarctic soils with low diversity [10,70], and low overall sequence count) or an artificial community (i.e. Australian blend, with high overall sequence count, but little sequence diversity) were used for calibration. In the first case (calibration with Antarctic soils), the amount of sequence artifacts could be underestimated when applied to the Australian blend. In the second case (calibration with Australian blend), the amount of sequence artifacts could be overestimated in data from Antarctic soils. Consequently, we chose data from a more diverse natural community (i.e. Australian soils) for threshold adjustment. We considered that these data would represent a compromise between over- and underestimating the amount of sequence artifacts.

4.2 Detecting highly abundant and cryptic Antarctic invertebrates

Generally, retrieval of a completely overlapping species inventory between morphological and sequence-based approaches is difficult to achieve due to inherent biases of each approach. This limitation somewhat constrains comparisons between sequence-based and morphology-based taxonomy assignments as performed here.
Regardless, metabarcoding approaches are preferable in the first instance over morphological techniques for taxonomic identification of highly abundant and cryptic Antarctic nematodes and rotifers in Antarctic bulk soil samples, and provide some attractive benefits: (1) the high abundance of those taxa constitutes a constraint to morphological approaches and increases their DNA contributions to low-diversity Antarctic soil extracts, leading to higher success of metabarcoding approaches [14,61,62]; (2) both nematodes and rotifers are often missed in morphological approaches due to constraints of extraction methods, their small size and conserved morphology [61,62,71]; and (3) both markers employed here were able to provide family level assignments to nematodes and rotifers with reasonable workload (Fig. 6), despite the fact that all metabarcoding markers perform differently in detecting expected phylotypes from DNA mixtures.

Apart from Araeolaimida (Nematoda) and Adinetida (Rotifera), Antarctic phylotype data did not contain taxa also detected by visual inspection (morphology). These absences may be caused by (a) absences of target organisms in the sample, (b) incomplete or imperfect DNA extraction, (c) poor PCR performance of markers, (d) inappropriate removal of reads during sequence processing or (e) incorrect taxonomy assignment due to lacking reference data [62,63,72]. Sub-samples of Antarctic soils used for sequence generation may have lacked taxa identified visually (a, above), but the extraction of large soil quantities (400 g) performed here makes biased DNA extract composition (b, above) unlikely. Overall lower amplicon concentrations for COI (Table S1, Supplemental Material) indicated lower PCR performance in comparison to 18S rDNA (c, above), but retrieval of invertebrate phylotypes was nonetheless possible (Fig. 6), and for invertebrates the large quantity of soil used is perhaps important (a, above).
Due to our threshold selection approach, incorrect taxonomic assignments to phylotypes also detected among morphotypes are unlikely (e, above). Our results hence show that both 18S rDNA and COI markers are well suited to detect highly abundant Antarctic rotifers and nematodes in bulk soil extracts, on the family level or above. Additionally, the employed 18S rDNA marker was able to detect Oribatida (Chelicerata), which the morphological approach failed to detect. In many cases this taxonomic resolution will be sufficient for large scale biogeographic inferences and allows targeting of samples for further examination.
4.3 Metabarcoding marker choice for Antarctic invertebrates
Here we examined the 18S rDNA and COI markers in conjunction with Antarctic invertebrates. Although metabarcoding data derived from the slow-evolving 18S rDNA may fail to accurately reflect biodiversity in mixed samples, the 18S rDNA gene region is considered an efficient and powerful marker for profiling unknown communities [59,60,74]. The faster mutation rate of the mitochondrial COI region is considered well suited for discriminating among lower taxonomic ranks [32,75–77]. At the same time, the COI gene region is prone to saturate at higher taxonomic levels [58–60,74]. However, qualities of marker regions such as 18S rDNA and COI can only be observed and applied if reference data are available across all (and particularly the low) taxonomic ranks. Collectively, our study provides evidence that markers used for metabarcoding of bulk samples need to be chosen depending on the desired rank-resolution of taxonomic assignments, with regard to available reference data, and to the investigated environment and/or potential biological diversity. We recommend the application of COI markers for Antarctic invertebrate biodiversity assessments only for high taxonomic ranks, and to complement phylotype information obtained through other markers, such as 18S rDNA.
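As an illustration of how concordance between morphological and sequence-based assignments can be tallied per taxonomic rank, the following Python sketch may be helpful. It is not the analysis code used in this study. The function name `concordance_by_rank`, the representation of lineages as tuples, and the placeholder taxon labels (P1, C1, O1, ...) are all assumptions made for illustration and do not represent study data.

```python
# Hypothetical sketch: rank-wise concordance between morphological reference
# taxa and phylotype assignments. All names and inputs are illustrative.

RANKS = ("phylum", "class", "order", "family")


def concordance_by_rank(morphotypes, phylotypes, ranks=RANKS):
    """Return, per rank, the fraction of morphologically identified taxa
    that also occur among the phylotype assignments.

    Each lineage is a tuple ordered from the highest rank downwards and may
    be shorter than `ranks` if the assignment stops at a higher rank.
    Lineage prefixes are compared so that identical names under different
    parents are not confused with each other.
    """
    result = {}
    for depth, rank in enumerate(ranks, start=1):
        expected = {lin[:depth] for lin in morphotypes if len(lin) >= depth}
        observed = {lin[:depth] for lin in phylotypes if len(lin) >= depth}
        if expected:
            result[rank] = len(expected & observed) / len(expected)
        else:
            result[rank] = None  # no morphological reference at this rank
    return result


# Placeholder input (not study data): three morphotypes, three phylotypes.
morpho = [
    ("P1", "C1", "O1", "F1"),
    ("P1", "C1", "O2", "F2"),
    ("P2", "C2", "O3", "F3"),
]
phylo = [
    ("P1", "C1", "O1"),        # assignment stops at order level
    ("P1", "C1", "O2", "F9"),  # family disagrees with morphology
    ("P2", "C2"),              # assignment stops at class level
]

for rank, value in concordance_by_rank(morpho, phylo).items():
    print(rank, value)
```

In this sketch, a phylotype whose assignment stops at a higher rank simply does not contribute to the lower ranks. A marker with sparse reference data therefore shows decreasing concordance towards the family level, even if its high-rank assignments agree with morphology.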
In the Australian blend, COI performed better in retrieving morphologically concordant class and order level information, while on the family level 18S rDNA yielded higher concordance. In Antarctic soils, COI performed better on the phylum and class levels, while 18S rDNA achieved better concordance at the order to species ranks. On lower taxonomic ranks, the taxonomic resolution of 18S rDNA thus outperforms that of COI for metabarcoding of Antarctic invertebrates. The decreased performance of COI at low taxonomic ranks in Antarctic samples is likely due to missing taxonomic information for this marker at these ranks. The task of collating COI genotypes across all metazoans (and particularly small and cryptic invertebrates in remote regions) is arguably more difficult than creating reference data for the overall fewer conserved 18S rDNA genotypes. Critically, the remedy for this situation is to increase taxonomic [78,79] approaches in the Antarctic region linked to sequencing efforts, the latter likely to be realized in the future using shotgun sequencing approaches [16,58,80].
morphotype information, Stephen Pederson (University of Adelaide) and Greg Guerin (University of Adelaide) for helpful discussions on analysis coding, Jimmy Breen (University of Adelaide) for maintaining the computational infrastructure required for sequence processing. We appreciate the support of Alan McKay, Russell Burns, and the laboratory staff of the South Australian Research and Development Institute (SARDI).
Supplemental data
Supporting Online Information provides further details regarding materials, methods and results. Sequence data and analysis are available as a stable release via https://doi.org/10.5281/zenodo.570066 and maintained at https://github.com/macrobiotus/marker_comparison.git.
DNA Barcodes – de Gruyter
Published: Jun 27, 2017