Genome skimming herbarium specimens for DNA barcoding and phylogenomics

Background: The world’s herbaria contain millions of specimens, collected and named by thousands of researchers over hundreds of years. However, this treasure has remained largely inaccessible to genetic studies, because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today’s next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are designed to take short, fragmented DNA molecules as templates.

Results: As a practical test of routine recovery of rDNA and plastid genome sequences from herbarium specimens, we sequenced 25 herbarium specimens up to 80 years old from 16 different Angiosperm families. Paired-end reads were generated, yielding successful plastid genome assemblies for 23 species and nuclear rDNAs for 24 species, respectively. These data showed that genome skimming can be used to generate genomic information from herbarium specimens as old as 80 years and using as little as 500 pg of degraded starting DNA.

Conclusions: Routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome-enrichment approaches), and can be performed with limited sample destruction.

Keywords: Degraded DNA, Herbarium specimens, Genome skimming, Plastid genome, rDNA, DNA barcoding

*Correspondence: dzl@mail.kib.ac.cn; jbyang@mail.kib.ac.cn
Chun-Xia Zeng and Peter M. Hollingsworth contributed equally to this work.
Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
Full list of author information is available at the end of the article

Zeng et al. Plant Methods (2018) 14:43
© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Herbaria are collections of preserved plant specimens stored for scientific study. There are approximately 3400 herbaria in the world, containing around 350 million specimens, collected over the past 400 years (http://sciweb.nybg.org/science2/indexHerbariorum.asp). These collections cover most of the world’s plant species, including many rare and endangered local endemics, and species collected from places that are currently expensive or difficult to access [1]. The recovery of DNA from this vast resource of already collected, expertly verified herbarium specimens represents a highly efficient way of building a DNA-based identification resource of the world’s plant species (DNA barcoding) and increasing knowledge of phylogenetic relationships.

The ‘unlocking’ of preserved natural history specimens for DNA barcoding/species discrimination is of particular relevance. In the first decade of DNA barcoding, it became clear that obtaining material from expertly verified specimens is a key rate-limiting step in the construction of a global DNA reference library [2]. The millions of samples that are required for this endeavor, each needing corresponding voucher specimens and meta-data, create a strong impetus for making best use of previously collected material.

DNA degradation in herbarium samples, followed by subsequent diffusion from the sample, creates challenges for DNA recovery [3]. In addition, different preservation methods can negatively affect the ability to extract, amplify and sequence DNA [4–6]. PCR amplification of historical DNA is, therefore, generally restricted to short amplicons (< 200 bp) and is further vulnerable to contamination by recent DNA and PCR products from the study species. The cumulative damage to the DNA can also cause incorrect bases to be inserted during enzymatic amplification. The main sources of these alterations are single nucleotide misincorporations [7, 8]. As a result, PCR-based Sanger sequencing of herbarium samples to generate standard DNA barcodes can be challenging. A recent large-scale study by Kuzmina et al. 2017 [9] examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada. Kuzmina et al. found that specimen age and method of preservation had significant effects on sequence recovery for all barcode markers. However, massively parallel short-read next-generation sequencing (NGS) protocols have the potential to greatly increase the success of herbarium sequencing projects, as many new sequencing approaches do not rely on large, intact DNA templates and instead are well suited for sequencing low concentrations of short (100–400 bp) fragmented molecules [3, 10].

Straub et al. [11] described how “genome skimming”, involving a shallow-pass genome sequence using NGS, could recover highly repetitive genome regions such as rDNA or organelle genomes, and yield highly useful sequence data at relatively low sequence depth. These regions include the usual suite of DNA barcoding markers [12, 13]. The genome skimming approach using NGS has been used to recover plastid DNA and rDNA sequences from 146 herbarium specimens [14]; to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana herbarium specimen [15]; to recover the complete plastome, the mitogenome, nuclear ribosomal DNA clusters, and partial sequences of low-copy genes from a herbarium specimen of an extinct species of Hesperelaea [16, 17]; and to recover the complete plastome, nuclear ribosomal DNA clusters, and partial sequences of low-copy genes from three grass herbarium specimens [18].

However, sequencing small, historical specimens may be especially challenging if a specimen is unique, or nearly so, with no alternative specimens available for study should the first specimen fail. Methods used to extract and prepare DNA for sequencing must both be more or less guaranteed to work and, in many cases, allow for preservation of DNA for future study [19]. Recent studies report successful sequencing of historical specimens from 1 ng to 1 μg of input DNA (for example, up to 1 μg in Bakker et al. [14]; ~600 ng in Staats et al. [15]; 33 ng in Zadane et al. [17]; 8.25–537 ng in Kanda et al. [20]; 5.8–200 ng in Blaimer et al. [21]; less than 10 ng in Besnard et al. [18]; 1–10 ng in Sproul and Maddison [19]). But a number of studies also report abandoning a subset of specimens for which too little input DNA was available (i.e. below 10 ng in Kanda et al. [20]; below 5 ng in Blaimer et al. [21]). To better understand ideal approaches of sample preparation for specimens with minimal DNA, we intentionally limited DNA input to 500 pg per specimen.

In this paper we provide a further practical test of the genome skimming methodology applied to herbarium specimens. As part of the China Barcode of Life project, and our wider phylogenomic studies, our aim was to assess whether the success reported in these early genome skimming studies could be repeated in other laboratories.

We evaluated the success and failure rates of rDNA and plastid genome sequencing from genome skims of 25 different species from herbarium specimens, and explored the impacts of parameters such as amount of input DNA and PCR cycle numbers.

Methods

Specimen sampling

Twenty-five herbarium specimens were selected from 16 Angiosperm families covering 22 genera, with specimen ages up to 80 years old. All 25 samples were taken from specimens housed in the Herbarium of the Kunming Institute of Botany, Chinese Academy of Sciences (KUN). The samples were selected to represent the major clades of the APG III system (Table 1).

Table 1 Specimen materials and DNA yields used in our study

| Sample ID | Species | Family | Collection date | Age (years) | Concentration (ng/µl) | Volume (µl) | DNA yield (ng) |
|---|---|---|---|---|---|---|---|
| 01 | Manglietia fordiana | Magnoliaceae | 19780402 | 39 | 0.894 | 36 | 32.184 |
| 02 | Manglietia fordiana | Magnoliaceae | 19541027 | 63 | 2.35 | 37 | 86.95 |
| 03 | Schisandra henryi | Schisandraceae | 19821108 | 35 | 1.87 | 33 | 61.71 |
| 04 | Schisandra henryi | Schisandraceae | 19840528 | 33 | 0.909 | 33 | 29.997 |
| 05 | Phoebe neurantha | Lauraceae | 1938 | 79 | 0.507 | 36 | 18.252 |
| 06 | Cinnamomum bodinieri | Lauraceae | 1960 | 57 | 2.26 | 36 | 81.36 |
| 08 | Holboellia latifolia | Lardizabalaceae | 1982 | 35 | 1.29 | 34 | 43.86 |
| 09 | Chloranthus erectus | Chloranthaceae | 1973 | 44 | 4.18 | 36 | 150.48 |
| 10 | Sarcandra glabra | Chloranthaceae | 1988 | 29 | 4.35 | 31.5 | 137.025 |
| 11 | Meconopsis racemosa | Papaveraceae | 1976 | 41 | 4.35 | 22 | 95.7 |
| 12 | Macleaya microcarpa | Papaveraceae | 1986 | 31 | 1.97 | 35.5 | 69.935 |
| 13 | Hodgsonia macrocarpa | Cucurbitaceae | 1982 | 35 | 2.18 | 34 | 74.12 |
| 14 | Malus yunnanensis | Rosaceae | 1939 | 78 | 0.834 | 35 | 29.19 |
| 15 | Elaeagnus loureirii | Elaeagnaceae | 1993 | 24 | 9.75 | 34 | 331.5 |
| 16 | Rhododendron rex subsp. fictolacteum | Ericaceae | 1979 | 38 | 8.15 | 20.5 | 167.075 |
| 17 | Swertia bimaculata | Gentianaceae | 19840823 | 33 | 1.67 | 35 | 58.45 |
| 18 | Primula sinopurpurea | Primulaceae | 19400907 | 77 | 0.974 | 32 | 31.168 |
| 19 | Paederia scandens | Araceae | 19550331 | 62 | 0.344 | 34 | 11.696 |
| 20 | Colocasia esculenta | Araceae | 19741001 | 43 | 1.46 | 36 | 52.56 |
| 21 | Pholidota chinensis | Orchidaceae | 1959 | 58 | 0.107 | 34 | 3.638 |
| 22 | Otochilus porrectus | Orchidaceae | 1990 | 27 | 0.344 | 35 | 12.04 |
| 23 | Indosasa sinica | Poaceae | 2007 | 10 | 1.65 | 35 | 57.75 |
| 24 | Camellia gymnogyna | Theaceae | 19340617 | 83 | 0.417 | 36 | 15.012 |
| 25 | Camellia sinensis var. assamica | Theaceae | 2002 | 15 | 4.03 | 23 | 92.69 |
| 26 | Panicum incomtum | Poaceae | 20001017 | 17 | 1.63 | 36 | 58.68 |

All vouchers are deposited in the herbarium of the Kunming Institute of Botany (KUN)

DNA extraction

Approximately 1 cm² sections of leaf or 20 mg of leaf tissue were used for each DNA extraction. Genomic DNA was extracted using the Tiangen DNAsecure Plant Kit (DP320). Yield and integrity (size distribution) of genomic DNA extracts were assessed by fluorometric quantification on the Qubit (Invitrogen, Carlsbad, California, USA) using the dsDNA HS kit, as well as by visual assessment on a 1% agarose gel.

Library preparation

All samples were subsequently built into blunt-end DNA libraries using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs), which has been optimized for as little as 5 ng of starting DNA, and Illumina-specific adapters [22]. The library protocol was performed as per the manufacturer’s instructions with four modifications: (i) 500 pg of input DNA was used to accommodate low starting DNA quantities; (ii) DNA was not fragmented by sonication because the DNA was already highly degraded; (iii) the NEBNext library was generated without any size selection; (iv) DNA libraries were then amplified in an indexing PCR, which barcoded each library and discriminated each sample. Five PCR cycles were suggested by the manufacturer’s instructions for 5 ng of input DNA. As only 500 pg of starting DNA was used, we tested increasing numbers of PCR cycles (namely × 6, × 8, × 10, × 12, × 14 PCR cycles). Concentration and size profiles of the final indexed libraries (125 libraries, representing 25 specimens at 5 different numbers of PCR cycles) were assessed on a Bioanalyzer 2100 using a high sensitivity DNA chip.

Library pooling

The final indexed libraries were then pooled (33 or 34 samples per lane) in equimolar ratios and sequenced on three lanes on an Illumina XTen sequencing system (Illumina Inc.) using paired-end chemistry at the Cloud health Medical Group Ltd.

Analyses

Successfully sequenced samples were assembled into chloroplast genomes and nuclear rDNAs. Here the rDNAs comprise the complete sequence of 26S, 18S, and 5.8S and the internal transcribed spacers (ITS1 and ITS2). We did not assemble the internal gene spacer (IGS) because of the complexity of this region, which is rich in duplications and inversions.

The raw sequence reads were filtered for primer/adaptor sequences and low-quality reads with the NGS QC Toolkit [23]. The cut-off value for percentage of read length was 80, and that for PHRED quality score was 30. The filtered high-quality paired-end reads were then assembled into contigs with SPAdes 3.0 [24]. Next, we identified highly similar genome sequences using the Basic Local Alignment Search Tool (BLAST: http://blast.ncbi.nlm.gov/). The procedures and parameters for sequence quality control, de novo assembly, and BLAST search followed Yang et al. [25]. We then determined the proper order of the aligned contigs using the highly similar genome sequences identified in the BLAST search as references. At this point, the target contigs were assembled into complete plastid genomes and nuclear rDNAs.

Annotation of the plastomes was performed using
the plastid genome annotation package DOGMA [26] (http://dogma.ccbb.utexas.edu/). Start and stop codons of protein-coding genes, as well as intron/exon positions, were manually adjusted. The online tRNAscan-SE service [27] was used to further determine tRNA genes. The final complete plastomes and rDNAs were deposited into GenBank (Accession numbers: MH394344-MH394431; MH270450-MH270494).

Fungi or other plants may be co-isolated during the DNA extraction process, resulting in DNA contamination [1]. This is particularly important where starting DNA concentrations are extremely low. We thus sub-sampled our data to check for contamination. To check for contamination in the plastid DNA sequences, for each species we extracted its rbcL sequence and blasted it against GenBank to check that it grouped with related species. BLAST1 (implemented in the BLAST program, version 2.2.17) was used to search the reference database for each query sequence with an E value < 1 × 10⁻⁵. Likewise, to check for plant and fungal contamination in the rDNA sequences, we took the final assembled ITS sequences (or partial ITS sequences where complete ITS was not recovered) and blasted them against the NCBI database to check that they grouped with related species.

Results

All 25 species yielded amounts of DNA suitable for library preparation and further processing. Total yields varied between 3 ng and 400 ng from on average 20 mg of dried leaf tissue, usually the equivalent of 1 cm² of leaf tissue (Table 1). We found a negative correlation between specimen age and DNA yield (Fig. 1).

Fig. 1 DNA yield against specimen age

We successfully enriched and sequenced DNA libraries constructed from herbarium material. Despite only 500 pg of input DNA, good quality libraries were produced for 100 of 125 libraries (25 species, with × 8, × 10, × 12, × 14 PCR cycles). The concentration of the final indexed libraries based on six PCR cycles per species was too low to be sequenced further. Between 15,877,478 and 44,724,436 high-quality paired-end reads were produced, with the total number of bases ranging from 2,381,621,700 bp (2.38 giga base pairs, Gbp) to 6,708,665,400 bp (6.71 Gbp) (Table 2). These were then assembled into contigs and, using a BLAST search, into plastid genomes and rDNA arrays.

After de novo assembly, two species (Otochilus porrectus and Pholidota chinensis) generated poor plastid assemblies, with the longest contigs being 6705 bp with 2 × coverage and 1325 bp with 3 × coverage, respectively. The other 23 species yielded useful plastid assemblies drawn from 3 to 61 contigs assembled into plastid genomes, with depths ranging from 459 × to 2176 ×. Of these 23 species, 14 were assembled into complete plastid genomes. Eight species were assembled into nearly complete plastid genomes, with gaps ranging from 5 to 349 bp (Table 2). However, although Rhododendron rex subsp. fictolacteum yielded useful plastid assemblies, many gaps were detected among contigs when the species Vaccinium macrocarpon was used as the reference.

Table 2 Assembly statistics of plastid genome for all specimens used in this study

| Sample ID | PCR cycles | Species | Family | Total sequences | Raw data (Gb) | #contigs | Total assembly length (bp) | Completed | GenBank accession number |
|---|---|---|---|---|---|---|---|---|---|
| 01D | ×8 | Manglietia fordiana | Magnoliaceae | 22404632 | 3.36 | 9 | 158993 | 1059 bp gap | MH394393 |
| 01E | ×10 | Manglietia fordiana | Magnoliaceae | 25869654 | 3.88 | 32 | 159759 | 349 bp gap | MH394394 |
| 01A | ×12 | Manglietia fordiana | Magnoliaceae | 35201972 | 5.28 | 14 | 158241 | 1840 bp gap | MH394391 |
| 01B | ×14 | Manglietia fordiana | Magnoliaceae | 30007234 | 4.5 | 14 | 158221 | 1840 bp gap | MH394392 |
| 02D | ×8 | Manglietia fordiana | Magnoliaceae | 22829038 | 3.42 | 8 | 161497 | 1040 bp gap | MH394397 |
| 02E | ×10 | Manglietia fordiana | Magnoliaceae | 32497068 | 4.87 | 21 | 160113 | Y | MH394398 |
| 02A | ×12 | Manglietia fordiana | Magnoliaceae | 29637182 | 4.45 | 12 | 158315 | 1802 bp gap | MH394395 |
| 02B | ×14 | Manglietia fordiana | Magnoliaceae | 31089730 | 4.66 | 22 | 160113 | Y | MH394396 |
| 03D | ×8 | Schisandra henryi | Schisandraceae | 29691984 | 4.45 | 5 | 145963 | 94 bp gap | MH394365 |
| 03E | ×10 | Schisandra henryi | Schisandraceae | 25141160 | 3.77 | 4 | 145616 | 54 bp gap | MH394366 |
| 03A | ×12 | Schisandra henryi | Schisandraceae | 32511344 | 4.88 | 11 | 146031 | 18 bp gap | MH394363 |
| 03B | ×14 | Schisandra henryi | Schisandraceae | 29856636 | 4.48 | 9 | 145993 | 63 bp gap | MH394364 |
| 04D | ×8 | Schisandra henryi | Schisandraceae | 24039822 | 3.61 | 4 | 146212 | 53 bp gap | MH394369 |
| 04E | ×10 | Schisandra henryi | Schisandraceae | 23870902 | 3.58 | 4 | 146243 | 53 bp gap | MH394370 |
| 04A | ×12 | Schisandra henryi | Schisandraceae | 33190158 | 4.98 | 15 | 146218 | 63 bp gap | MH394367 |
| 04B | ×14 | Schisandra henryi | Schisandraceae | 30498044 | 4.57 | 6 | 145893 | 45 bp gap | MH394368 |
| 05D | ×8 | Phoebe neurantha | Lauraceae | 29040850 | 4.36 | 11 | 152782 | Y | MH394354 |
| 05E | ×10 | Phoebe neurantha | Lauraceae | 27831254 | 4.17 | 15 | 152782 | Y | MH394355 |
| 05A | ×12 | Phoebe neurantha | Lauraceae | 44724436 | 6.71 | 17 | 152781 | 1 bp gap | MH394352 |
| 05B | ×14 | Phoebe neurantha | Lauraceae | 35264634 | 5.29 | 13 | 152781 | 1 bp gap | MH394353 |
| 06D | ×8 | Cinnamomum bodinieri | Lauraceae | 30188820 | 4.53 | 9 | 152778 | Y | MH394417 |
| 06E | ×10 | Cinnamomum bodinieri | Lauraceae | 32065328 | 4.81 | 13 | 152719 | Y | MH394418 |
| 06A | ×12 | Cinnamomum bodinieri | Lauraceae | 24488292 | 3.67 | 7 | 152719 | Y | MH394415 |
| 06B | ×14 | Cinnamomum bodinieri | Lauraceae | 35035602 | 5.26 | 11 | 152719 | Y | MH394416 |
| 08D | ×8 | Holboellia latifolia | Lardizabalaceae | 26229946 | 3.93 | 5 | 157817 | Y | MH394377 |
| 08E | ×10 | Holboellia latifolia | Lardizabalaceae | 28273022 | 4.24 | 9 | 157818 | Y | MH394378 |
| 08A | ×12 | Holboellia latifolia | Lardizabalaceae | 33873136 | 5.08 | 13 | 157614 | 204 bp gap | MH394375 |
| 08B | ×14 | Holboellia latifolia | Lardizabalaceae | 34021360 | 5.1 | 10 | 157818 | Y | MH394376 |
| 09D | ×8 | Chloranthus erectus | Chloranthaceae | 21843512 | 3.28 | 4 | 157812 | 43 bp gap | MH394413 |
| 09E | ×10 | Chloranthus erectus | Chloranthaceae | 18044364 | 2.71 | 5 | 157812 | 47 bp gap | MH394414 |
| 09A | ×12 | Chloranthus erectus | Chloranthaceae | 30022162 | 4.5 | 13 | 157852 | Y | MH394411 |
| 09B | ×14 | Chloranthus erectus | Chloranthaceae | 28656686 | 4.3 | 11 | 157852 | Y | MH394412 |
| 10D | ×8 | Sarcandra glabra | Chloranthaceae | 18893508 | 2.83 | 5 | 158733 | 119 bp gap | MH394361 |
| 10E | ×10 | Sarcandra glabra | Chloranthaceae | 20662770 | 3.1 | 7 | 159007 | 22 bp gap | MH394362 |
| 10A | ×12 | Sarcandra glabra | Chloranthaceae | 27510166 | 4.13 | 9 | 158900 | Y | MH394360 |
| 10B | ×14 | Sarcandra glabra | Chloranthaceae | 29545206 | 4.43 | 9 | 158900 | Y | MH394431 |
| 11D | ×8 | Meconopsis racemosa | Papaveraceae | 24351884 | 3.65 | 5 | 153762 | Y | MH394401 |
| 11E | ×10 | Meconopsis racemosa | Papaveraceae | 29160582 | 4.37 | 5 | 153762 | Y | MH394402 |
| 11A | ×12 | Meconopsis racemosa | Papaveraceae | 33763340 | 5.06 | 6 | 153763 | Y | MH394399 |
| 11B | ×14 | Meconopsis racemosa | Papaveraceae | 35990358 | 5.4 | 4 | 153728 | 1 bp gap | MH394400 |
| 12D | ×8 | Macleaya microcarpa | Papaveraceae | 26265548 | 3.94 | 11 | 161064 | 48 bp gap | MH394385 |
| 12E | ×10 | Macleaya microcarpa | Papaveraceae | 25100372 | 3.77 | 11 | 161064 | 48 bp gap | MH394386 |
| 12A | ×12 | Macleaya microcarpa | Papaveraceae | 29491952 | 4.42 | 13 | 161118 | Y | MH394383 |
| 12B | ×14 | Macleaya microcarpa | Papaveraceae | 28462338 | 4.27 | 12 | 161110 | 2 bp gap | MH394384 |
| 13D | ×8 | Hodgsonia macrocarpa | Cucurbitaceae | 26886870 | 4.03 | 26 | 155027 | 1300 bp gap | MH394428 |
| 13E | ×10 | Hodgsonia macrocarpa | Cucurbitaceae | 34179418 | 5.13 | 16 | 154855 | 1298 bp gap | MH394429 |
| 13A | ×12 | Hodgsonia macrocarpa | Cucurbitaceae | 37182144 | 5.58 | 18 | 156015 | 20 bp gap | MH394426 |
| 13B | ×14 | Hodgsonia macrocarpa | Cucurbitaceae | 36782268 | 5.52 | 17 | 156146 | Y | MH394427 |
| 14D | ×8 | Malus yunnanensis | Rosaceae | 22107718 | 3.32 | 16 | 158955 | 820 bp gap | MH394389 |
| 14E | ×10 | Malus yunnanensis | Rosaceae | 25720160 | 3.86 | 5 | 160071 | Y | MH394390 |
| 14A | ×12 | Malus yunnanensis | Rosaceae | 37501036 | 5.63 | 5 | 160067 | Y | MH394387 |
| 14B | ×14 | Malus yunnanensis | Rosaceae | 33776058 | 5.07 | 5 | 160068 | Y | MH394388 |
| 15D | ×8 | Elaeagnus loureirii | Elaeagnaceae | 15195822 | 2.28 | 5 | 152196 | 8 bp gap | MH394424 |
| 15E | ×10 | Elaeagnus loureirii | Elaeagnaceae | 16862680 | 2.53 | 5 | 152196 | 8 bp gap | MH394425 |
| 15A | ×12 | Elaeagnus loureirii | Elaeagnaceae | 21511050 | 3.23 | 4 | 152199 | 5 bp gap | MH394422 |
| 15B | ×14 | Elaeagnus loureirii | Elaeagnaceae | 20556860 | 3.08 | 6 | 152199 | 5 bp gap | MH394423 |
| 16D | ×8 | Rhododendron rex subsp. fictolacteum | Ericaceae | 23623070 | 3.54 | | | | |
| 16E | ×10 | Rhododendron rex subsp. fictolacteum | Ericaceae | 28092596 | 4.21 | | | | |
| 16A | ×12 | Rhododendron rex subsp. fictolacteum | Ericaceae | 31352560 | 4.7 | | | | |
| 16B | ×14 | Rhododendron rex subsp. fictolacteum | Ericaceae | 30525730 | 4.58 | | | | |
| 17D | ×8 | Swertia bimaculata | Gentianaceae | 18303136 | 2.77 | 53 | 152808 | 266 bp gap | MH394373 |
| 17E | ×10 | Swertia bimaculata | Gentianaceae | 16559554 | 2.48 | 41 | 153443 | 406 bp gap | MH394374 |
| 17A | ×12 | Swertia bimaculata | Gentianaceae | 15877478 | 2.38 | 30 | 143977 | 9947 bp gap | MH394371 |
| 17B | ×14 | Swertia bimaculata | Gentianaceae | 18448302 | 2.77 | 48 | 153602 | 341 bp gap | MH394372 |
| 18D | ×8 | Primula sinopurpurea | Primulaceae | 22890598 | 3.43 | 5 | 151945 | 50 bp gap | MH394358 |
| 18E | ×10 | Primula sinopurpurea | Primulaceae | 26618684 | 3.99 | 5 | 151945 | 50 bp gap | MH394359 |
| 18A | ×12 | Primula sinopurpurea | Primulaceae | 24107472 | 3.62 | 3 | 151945 | 50 bp gap | MH394356 |
| 18B | ×14 | Primula sinopurpurea | Primulaceae | 25834066 | 3.88 | 3 | 151945 | 50 bp gap | MH394357 |
| 19D | ×8 | Paederia scandens | Araceae | 25307356 | 3.8 | 15 | 162267 | 247 bp gap | MH394346 |
| 19E | ×10 | Paederia scandens | Araceae | 24658068 | 3.7 | 7 | 162268 | 247 bp gap | MH394347 |
| 19A | ×12 | Paederia scandens | Araceae | 23850180 | 3.58 | 8 | 162282 | 253 bp gap | MH394344 |
| 19B | ×14 | Paederia scandens | Araceae | 24064764 | 3.61 | 10 | 162139 | 253 bp gap | MH394345 |
| 20D | ×8 | Colocasia esculenta | Araceae | 29284270 | 4.39 | 4 | 162350 | 155 bp gap | MH394430 |
| 20E | ×10 | Colocasia esculenta | Araceae | 25045978 | 3.77 | 5 | 162350 | 155 bp gap | MH394421 |
| 20A | ×12 | Colocasia esculenta | Araceae | 23560322 | 3.53 | 6 | 162414 | 155 bp gap | MH394419 |
| 20B | ×14 | Colocasia esculenta | Araceae | 24533656 | 3.68 | 4 | 162414 | 155 bp gap | MH394420 |
| 21D | ×8 | Pholidota chinensis | Orchidaceae | 21688990 | 3.25 | | | | |
| 21E | ×10 | Pholidota chinensis | Orchidaceae | 20880950 | 3.13 | | | | |
| 21A | ×12 | Pholidota chinensis | Orchidaceae | 23548018 | 3.53 | | | | |
| 21B | ×14 | Pholidota chinensis | Orchidaceae | 27148284 | 4.07 | | | | |
| 22D | ×8 | Otochilus porrectus | Orchidaceae | 15550512 | 2.33 | | | | |
| 22E | ×10 | Otochilus porrectus | Orchidaceae | 22638772 | 3.4 | | | | |
| 22A | ×12 | Otochilus porrectus | Orchidaceae | 21572196 | 3.23 | | | | |
| 22B | ×14 | Otochilus porrectus | Orchidaceae | 28960858 | 4.34 | | | | |
| 23D | ×8 | Indosasa sinica | Gramineae | 18793020 | 2.82 | 6 | 139848 | 18 bp gap | MH394381 |
| 23E | ×10 | Indosasa sinica | Gramineae | 17903432 | 2.69 | 10 | 139740 | Y | MH394382 |
| 23A | ×12 | Indosasa sinica | Gramineae | 19106404 | 2.87 | 9 | 139740 | Y | MH394379 |
| 23B | ×14 | Indosasa sinica | Gramineae | 19668682 | 2.95 | 8 | 139740 | Y | MH394380 |
| 24D | ×8 | Camellia gymnogyna | Theaceae | 17176632 | 2.58 | 4 | 156402 | Y | MH394405 |
| 24E | ×10 | Camellia gymnogyna | Theaceae | 24532196 | 3.68 | 7 | 156590 | Y | MH394406 |
| 24A | ×12 | Camellia gymnogyna | Theaceae | 26478224 | 3.97 | 4 | 156590 | Y | MH394403 |
| 24B | ×14 | Camellia gymnogyna | Theaceae | 29768770 | 4.47 | 4 | 156590 | Y | MH394404 |
| 25D | ×8 | Camellia sinensis var. assamica | Theaceae | 23291572 | 3.49 | 4 | 157028 | Y | MH394409 |
| 25E | ×10 | Camellia sinensis var. assamica | Theaceae | 18698814 | 2.8 | 5 | 157028 | Y | MH394410 |
| 25A | ×12 | Camellia sinensis var. assamica | Theaceae | 21788776 | 3.27 | 4 | 157029 | Y | MH394407 |
| 25B | ×14 | Camellia sinensis var. assamica | Theaceae | 26155342 | 3.92 | 8 | 157028 | Y | MH394408 |
| 26D | ×8 | Panicum incomtum | Gramineae | 16865102 | 2.53 | 61 | 139986 | Y | MH394350 |
| 26E | ×10 | Panicum incomtum | Gramineae | 20465942 | 3.07 | 21 | 139999 | Y | MH394351 |
| 26A | ×12 | Panicum incomtum | Gramineae | 20004364 | 3 | 18 | 139999 | Y | MH394348 |
| 26B | ×14 | Panicum incomtum | Gramineae | 20672642 | 3.1 | 17 | 139999 | Y | MH394349 |

For the nuclear rDNAs, 21 species gave ribosomal DNA sequence assemblies > 4.3 kb drawn from 1 to 2 contigs, with sequencing depths ranging from 3 × to 567 × (no nrDNA sequences could be assembled for Pholidota chinensis, Paederia scandens, Otochilus porrectus, and Camellia gymnogyna) (Table 3). Of these 21 species, 18 resulted in assembled nrDNAs consisting of partial sequences of 18S and 26S, along with the complete sequence of 5.8S and the internal transcribed spacers ITS1 and ITS2. However, the remaining 3 (two samples of Manglietia fordiana, Sample IDs 01 and 02, and Phoebe neurantha, Sample ID 05) were difficult to assemble, resulting in only partial recovery of 5.8S and the internal transcribed spacers ITS1 and ITS2.

Table 3 Assembly statistics of rDNAs for all specimens used in this study

| Sample ID | PCR cycles | Species | Family | #contigs | Total assembly length (bp) | Mean coverage (×) | Reference genome | GenBank accession number |
|---|---|---|---|---|---|---|---|---|
| 01A | ×12 | Manglietia fordiana | Magnoliaceae | 2 | 10343 | 406 | KJ414477_Chrysobalanus icaco | MH270473 |
| 02A | ×12 | Manglietia fordiana | Magnoliaceae | 2 | 8637 | 67 | | MH270474 |
| 03A | ×12 | Schisandra henryi | Schisandraceae | 1 | 15487 | 47 | | MH270475 |
| 04A | ×12 | Schisandra henryi | Schisandraceae | 1 | 10747 | 78 | | MH270476 |
| 05A | ×12 | Phoebe neurantha | Lauraceae | 2 | 7516 | 19 | | MH270477 |
| 06A | ×12 | Cinnamomum bodinieri | Lauraceae | 1 | 10926 | 32 | | MH270478 |
| 08A | ×12 | Holboellia latifolia | Lardizabalaceae | 1 | 9298 | 160 | | MH270479 |
| 09A | ×12 | Chloranthus erectus | Chloranthaceae | 1 | 9094 | 54 | | MH270480 |
| 10A | ×12 | Sarcandra glabra | Chloranthaceae | 1 | 9062 | 51 | | MH270481 |
| 11A | ×12 | Meconopsis racemosa | Papaveraceae | 1 | 7577 | 60 | | MH270482 |
| 12A | ×12 | Macleaya microcarpa | Papaveraceae | 1 | 12587 | 458 | | MH270483 |
| 13A | ×12 | Hodgsonia macrocarpa | Cucurbitaceae | 1 | 10172 | 567 | | MH270484 |
| 14A | ×12 | Malus yunnanensis | Rosaceae | 1 | 5953 | 249 | | MH270485 |
| 15A | ×12 | Elaeagnus loureirii | Elaeagnaceae | 1 | 7901 | 428 | | MH270486 |
| 16A | ×12 | Rhododendron rex subsp. fictolacteum | Ericaceae | 1 | 6825 | 380 | | MH270487 |
| 17A | ×12 | Swertia bimaculata | Gentianaceae | 1 | 9644 | 48 | | MH270488 |
| 18A | ×12 | Primula sinopurpurea | Primulaceae | 1 | 5539 | 15 | | MH270489 |
| 19A | ×12 | Paederia scandens | Araceae | | | | | |
| 20A | ×12 | Colocasia esculenta | Araceae | 1 | 4399 | 5 | | MH270490 |
| 21A | ×12 | Pholidota chinensis | Orchidaceae | – | – | – | | – |
| 22A | ×12 | Otochilus porrectus | Orchidaceae | | | | | |
| 23A | ×12 | Indosasa sinica | Gramineae | 1 | 17306 | 93 | | MH270491 |
| 24A | ×12 | Camellia gymnogyna | Theaceae | | | | | |
| 25A | ×12 | Camellia sinensis var. assamica | Theaceae | 1 | 11212 | 46 | | MH270493 |
| 26A | ×12 | Panicum incomtum | Gramineae | 1 | 8446 | 74 | | MH270494 |

To check the quality of the plastid sequences, all gene regions were translated. No stop codons that would be indicative of sequencing errors were detected within the assembled contigs. We then extracted about 1400 bp of rbcL sequence from 23 of the samples to check for contamination (for Rhododendron rex subsp. fictolacteum (Sample ID 16), the plastid genome was not assembled successfully, but we could nevertheless extract the rbcL sequence from the plastid contigs). These rbcL sequences were subjected to a BLAST search against the NCBI database. The rbcL sequences contained no insertions or deletions and matched the correct genus or family in each case (Table 4). Likewise, we blasted the final assembled rDNA ITS sequences (or partial ITS sequences) from 24 samples against the NCBI database. In all cases, the closest match to the sequence was from the family of the sequenced sample. No matches with fungi were detected (Table 5).

One-way analyses of variance (ANOVA) were performed to test the total reads against PCR cycles, PCR cycles against plastid contig numbers, PCR cycles against plastid genome assembly length, PCR cycles against plastid mean-depth, and PCR cycles against plastid coverage. We found that there was no significant correlation between PCR cycles and plastid contig numbers, PCR cycles and plastid genome assembly length, or PCR cycles and plastid coverage. There was, however, a significant positive correlation between the number of PCR cycles and the total number of reads, and between PCR cycles and the plastid mean-depth (Fig. 2).

Finally, when comparing plastome assembly coverage with C values of the species concerned, we found a slight negative but not significant correlation (Fig. 3), which would suggest, at least for our sampling, that plastome assembly coverage is not affected by the nuclear genome size of the specimen concerned.

Discussion

Sequencing herbarium specimens from low amounts of starting DNA

Our current study successfully demonstrated the recovery of plastid genome sequences and rDNA sequences from herbarium specimens, some up to 80 years old. Our study used small amounts of starting tissue (c. 1 cm²) and extremely low initial amounts (500 pg) of degraded starting DNA. This success with a small amount of starting tissue is important, and demonstrates the practical feasibility of organelle genome and rDNA recovery with minimal impacts on specimens. These findings, in the context of studies by others (e.g. Bakker et al. [14]), confirm that genome skimming can be performed with limited sample destruction, enabling relatively straightforward access to high-copy number DNA in preserved herbarium specimens spanning a wide phylogenetic coverage.

To accommodate the use of only 500 pg of input DNA, we modified the library protocol in three ways: we removed the step of DNA fragmentation by sonication because the DNA was already highly degraded, we did not undertake any size selection, and we increased the number of PCR cycles to enrich the indexed library. After library preparation and Illumina paired-end sequencing, a sufficient number of read pairs (> 15,000,000) was generated for our 25 specimens and 100 libraries. This strategy allowed the generation of complete or near complete plastid genomes with depths ranging from 459 × to 2176 ×, and nuclear ribosomal units with a high sequencing depth (3 × to 567 ×), for 23 and 24 specimens respectively. Despite the low starting concentration, no plant or fungal contaminants were obviously detectable in the assembled plastomes and rDNA sequences.

For herbarium plastome assembly, the procedures and parameters for sequence quality control, de novo assembly, BLAST search and genome annotation followed Yang et al. [25]. The processing time for our 25 specimens with 100 libraries was c. 5 h per specimen on a 3-TB RAM Linux workstation with 32 cores. This did not differ significantly between fresh and herbarium specimens.

Recovery of widely used loci in plant molecular systematics

A benefit of the genome skimming approach is that it can recover loci widely used in previous molecular systematics studies (e.g. Coissac et al. 2016 [12]). Here we recovered the standard rbcL DNA barcode region from 23/25 samples, the standard matK DNA barcode region from 23/25 specimens, the standard trnH-psbA DNA barcode region from 23/25 samples, the trnL intron from 23/25 samples, and ITS1 and ITS2 from 20/25 and 19/25 samples, respectively. In addition to the recovery of these standard DNA barcoding loci, we also recovered many other regions used as supplementary barcode markers (e.g. atpF-H, psbK-I). The data produced with this approach can thus contribute towards standard and extended DNA barcode reference libraries [12], help identify additional regions which are informative for any given clade [28], and provide data for phylogenomic investigations to elucidate the relationships amongst plant groups.
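The negative relationship between specimen age and DNA yield reported in the Results (Fig. 1) can be re-derived from the values listed in Table 1. The sketch below is an illustration only, not the authors’ analysis: it assumes a Pearson correlation (the paper does not state which statistic underlies Fig. 1), and the names TABLE1 and pearson_r are hypothetical. It also checks that each reported yield equals concentration multiplied by elution volume.

```python
import math

# Values copied from Table 1:
# (sample ID, age in years, concentration in ng/ul, volume in ul, reported yield in ng)
TABLE1 = [
    ("01", 39, 0.894, 36, 32.184), ("02", 63, 2.35, 37, 86.95),
    ("03", 35, 1.87, 33, 61.71), ("04", 33, 0.909, 33, 29.997),
    ("05", 79, 0.507, 36, 18.252), ("06", 57, 2.26, 36, 81.36),
    ("08", 35, 1.29, 34, 43.86), ("09", 44, 4.18, 36, 150.48),
    ("10", 29, 4.35, 31.5, 137.025), ("11", 41, 4.35, 22, 95.7),
    ("12", 31, 1.97, 35.5, 69.935), ("13", 35, 2.18, 34, 74.12),
    ("14", 78, 0.834, 35, 29.19), ("15", 24, 9.75, 34, 331.5),
    ("16", 38, 8.15, 20.5, 167.075), ("17", 33, 1.67, 35, 58.45),
    ("18", 77, 0.974, 32, 31.168), ("19", 62, 0.344, 34, 11.696),
    ("20", 43, 1.46, 36, 52.56), ("21", 58, 0.107, 34, 3.638),
    ("22", 27, 0.344, 35, 12.04), ("23", 10, 1.65, 35, 57.75),
    ("24", 83, 0.417, 36, 15.012), ("25", 15, 4.03, 23, 92.69),
    ("26", 17, 1.63, 36, 58.68),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


ages = [row[1] for row in TABLE1]
yields = [row[4] for row in TABLE1]
r = pearson_r(ages, yields)

# Samples whose reported yield does not equal concentration x volume.
mismatches = [
    row[0] for row in TABLE1
    if not math.isclose(row[2] * row[3], row[4], rel_tol=1e-6)
]

print(f"Pearson r (age vs. DNA yield): {r:.2f}")
print(f"Rows where yield != concentration x volume: {mismatches}")
```

A negative r is what Fig. 1 describes. Because the yields are strongly right-skewed (one specimen exceeds 300 ng), a rank-based statistic or a log transform of yield may be a more robust choice; that is a suggestion, not something the source reports.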
Table 4 BLAST results with extracted rbcL sequence against GenBank
Query sample ID | Query species (family) | PCR cycles | Gene | Length (bp) | Reference species_accession number (family) | Query coverage (%) | Identities (%) | Identification level
01A | Manglietia fordiana (Magnoliaceae) | 12 | rbcL | 1428 | Magnolia cathcartii_JX280392.1 (Magnoliaceae) | 100 | 99 | Family
 |  |  |  |  | Magnolia biondii_KY085894.1 (Magnoliaceae) | 100 | 99 | 
 |  |  |  |  | Michelia odora_JX280398.1 (Magnoliaceae) | 100 | 99 | 
 |  |  |  |  | Manglietia fordiana_L12658.1 (Magnoliaceae) | 98 | 100 | 
02A | Manglietia fordiana (Magnoliaceae) | 12 | rbcL | 1428 | Magnolia cathcartii_JX280392.1 (Magnoliaceae) | 100 | 99 | Family
 |  |  |  |  | Magnolia biondii_KY085894.1 (Magnoliaceae) | 100 | 99 | 
 |  |  |  |  | Michelia odora_JX280398.1 (Magnoliaceae) | 100 | 99 | 
 |  |  |  |  | Manglietia fordiana_L12658.1 (Magnoliaceae) | 98 | 100 | 
03A | Schisandra henryi (Schisandraceae) | 12 | rbcL | 1428 | Schisandra chinensis_KY111264.1 (Schisandraceae) | 100 | 99 | Genus
 |  |  |  |  | Schisandra chinensis_KU362793.1 (Schisandraceae) | 100 | 99 | 
 |  |  |  |  | Schisandra sphenanthera_L12665.2 (Schisandraceae) | 98 | 99 | 
04A | Schisandra henryi (Schisandraceae) | 12 | rbcL | 1428 | Schisandra chinensis_KY111264.1 (Schisandraceae) | 100 | 99 | Genus
 |  |  |  |  | Schisandra chinensis_KU362793.1 (Schisandraceae) | 100 | 99 | 
 |  |  |  |  | Schisandra sphenanthera_L12665.2 (Schisandraceae) | 98 | 99 | 
05A | Phoebe neurantha (Lauraceae) | 12 | rbcL | 1428 | Phoebe omeiensis_KX437772.1 (Lauraceae) | 100 | 99 | Family
 |  |  |  |  | Persea americana_KX437771.1 (Lauraceae) | 100 | 99 | 
 |  |  |  |  | Persea sp._JF966606.1 (Lauraceae) | 100 | 99 | 
06A | Cinnamomum bodinieri (Lauraceae) | 12 | rbcL | 1428 | Phoebe bournei_KY346512.1 (Lauraceae) | 100 | 99 | Family
 |  |  |  |  | Phoebe chekiangensis_KY346511.1 (Lauraceae) | 100 | 99 | 
 |  |  |  |  | Phoebe sheareri_KX437773.1 (Lauraceae) | 100 | 99 | 
 |  |  |  |  | Cinnamomum verum_KY635878.1 (Lauraceae) | 100 | 99 | 
08A | Holboellia latifolia (Lardizabalaceae) | 12 | rbcL | 1428 | Akebia quinata_KX611091.1 (Lardizabalaceae) | 100 | 99 | Family
 |  |  |  |  | Stauntonia hexaphylla_L37922.2 (Lardizabalaceae) | 99 | 99 | 
 |  |  |  |  | Akebia trifoliata_KU204898.1 (Lardizabalaceae) | 100 | 99 | 
 |  |  |  |  | Holboellia latifolia_L37918.2 (Lardizabalaceae) | 99 | 99 | 
09A | Chloranthus erectus (Chloranthaceae) | 12 | rbcL | 1428 | Chloranthus spicatus_EF380352.1 (Chloranthaceae) | 100 | 100 | Genus
 |  |  |  |  | Chloranthus japonicus_KP256024.1 (Chloranthaceae) | 100 | 99 | 
 |  |  |  |  | Chloranthus spicatus_AY236835.1 (Chloranthaceae) | 98 | 99 | 
 |  |  |  |  | Chloranthus erectus_AY236834.1 (Chloranthaceae) | 98 | 99 | 
10A | Sarcandra glabra (Chloranthaceae) | 12 | rbcL | 1428 | Chloranthus spicatus_EF380352.1 (Chloranthaceae) | 100 | 99 | Family
 |  |  |  |  | Chloranthus japonicus_KP256024.1 (Chloranthaceae) | 100 | 98 | 
 |  |  |  |  | Chloranthus nervosus_AY236841.1 (Chloranthaceae) | 97 | 98 | 
 |  |  |  |  | Sarcandra glabra_HQ336522.1 (Chloranthaceae) | 89 | 100 | 
11A | Meconopsis racemosa (Papaveraceae) | 12 | rbcL | 1428 | Meconopsis horridula_JX087717.1 (Papaveraceae) | 97 | 100 | Genus
 |  |  |  |  | Meconopsis horridula_JX087712.1 (Papaveraceae) | 97 | 99 | 
 |  |  |  |  | Meconopsis delavayi_JX087688.1 (Papaveraceae) | 97 | 99 | 
12A | Macleaya microcarpa (Papaveraceae) | 12 | rbcL | 1428 | Macleaya microcarpa_FJ626612.1 (Papaveraceae) | 97 | 99 | Family
 |  |  |  |  | Macleaya cordata_U86629.1 (Papaveraceae) | 97 | 99 | 
 |  |  |  |  | Coreanomecon hylomeconoides_KT274030.1 (Papaveraceae) | 100 | 98 | 
13A | Hodgsonia macrocarpa (Cucurbitaceae) | 12 | rbcL | 1449 | Cucumis sativus var. hardwickii_KT852702.1 (Cucurbitaceae) | 100 | 98 | Family
 |  |  |  |  | Cucumis sativus_KX231330.1 (Cucurbitaceae) | 100 | 98 | 
 |  |  |  |  | Cucumis sativus_KX231329.1 (Cucurbitaceae) | 100 | 98 | 
14A | Malus yunnanensis (Rosaceae) | 12 | rbcL | 1428 | Cotoneaster franchetii_KY419994.1 (Rosaceae) | 100 | 99 | Family
 |  |  |  |  | Vauquelinia californica_KY419925.1 (Rosaceae) | 100 | 99 | 
 |  |  |  |  | Cotoneaster horizontalis_KY419917.1 (Rosaceae) | 100 | 99 | 
 |  |  |  |  | Malus doumeri_KX499861.1 (Rosaceae) | 100 | 99 | 
15A | Elaeagnus loureirii (Elaeagnaceae) | 12 | rbcL | 1428 | Elaeagnus macrophylla_KP211788.1 (Elaeagnaceae) | 100 | 99 | Order
 |  |  |  |  | Elaeagnus sp._KY420020.1 (Elaeagnaceae) | 100 | 99 | 
 |  |  |  |  | Toricellia angulata_KX648359.1 (Cornaceae) | 99 | 99 | 
16A | Rhododendron rex subsp. fictolacteum (Ericaceae) | 12 | rbcL | 1428 | Rhododendron simsii_GQ997829.1 (Ericaceae) | 100 | 99 | Family
 |  |  |  |  | Rhododendron ponticum_KM360957.1 (Ericaceae) | 98 | 99 | 
 |  |  |  |  | Epacris sp._L01915.2 (Ericaceae) | 97 | 99 | 
17A | Swertia bimaculata (Gentianaceae) | 12 | rbcL | 1443 | Swertia mussotii_KU641021.1 (Gentianaceae) | 98 | 99 | Family
 |  |  |  |  | Gentianopsis ciliata_KM360802.1 (Gentianaceae) | 97 | 98 | 
 |  |  |  |  | Gentianella rapunculoides_Y11862.1 (Gentianaceae) | 97 | 99 | 
18A | Primula sinopurpurea (Primulaceae) | 12 | rbcL | 1428 | Primula poissonii_KX668176.1 (Primulaceae) | 100 | 99 | Genus
 |  |  |  |  | Primula chrysochlora_KX668178.1 (Primulaceae) | 100 | 99 | 
 |  |  |  |  | Primula poissonii_KF753634.1 (Primulaceae) | 100 | 99 | 
19A | Paederia scandens (Araceae) | 12 | rbcL | 1443 | Pothos scandens_AM905732.1 (Araceae) | 96 | 99 | Family
 |  |  |  |  | Pedicellarum paiei_AM905733.1 (Araceae) | 96 | 99 | 
 |  |  |  |  | Pothoidium lobbianum_AM905734.1 (Araceae) | 96 | 99 | 
20A | Colocasia esculenta (Araceae) | 12 | rbcL | 1443 | Colocasia esculenta_JN105690.1 (Araceae) | 100 | 100 | Species
 |  |  |  |  | Colocasia esculenta_JN105689.1 (Araceae) | 100 | 99 | 
 |  |  |  |  | Pinellia pedatisecta_KT025709.1 (Araceae) | 100 | 99 | 
21A | Pholidota chinensis (Orchidaceae) | 12 | rbcL | – | – | – |  | 
22A | Otochilus porrectus (Orchidaceae) | 12 | rbcL | – | – | – |  | 
23A | Indosasa sinica (Poaceae) | 12 | rbcL | 1434 | Pleioblastus maculatus_JX513424.1 (Poaceae) | 100 | 100 | Family
 |  |  |  |  | Oligostachyum shiuyingianum_JX513423.1 (Poaceae) | 100 | 100 | 
 |  |  |  |  | Indosasa sinica_JX513422.1 (Poaceae) | 100 | 100 | 
24A | Camellia gymnogyna (Theaceae) | 12 | rbcL | 1428 | Camellia szechuanensis_KY406778.1 (Theaceae) | 100 | 100 | Family
 |  |  |  |  | Pyrenaria menglaensis_KY406747.1 (Theaceae) |  |  | 
 |  |  |  |  | Camellia luteoflora_KY626042.1 (Theaceae) |  |  | 
25A | Camellia sinensis var. assamica (Theaceae) | 12 | rbcL | 1428 | Camellia szechuanensis_KY406778.1 (Theaceae) | 100 | 100 | Family
 |  |  |  |  | Pyrenaria menglaensis_KY406747.1 (Theaceae) | 100 | 100 | 
 |  |  |  |  | Camellia luteoflora_KY626042.1 (Theaceae) | 100 | 100 | 
 |  |  |  |  | Camellia sinensis var. assamica_JQ975030.1 (Theaceae) | 100 | 100 | 
26A | Panicum incomtum (Poaceae) | 12 | rbcL | 1434 | Lecomtella madagascariensis_HF543599.2 (Poaceae) | 99 | 99 | Family
 |  |  |  |  | Chasechloa madagascariensis_KX663838.1 (Poaceae) | 99 | 99 | 
 |  |  |  |  | Amphicarpum muhlenbergianum_KU291489.1 (Poaceae) | 99 | 99 | 
 |  |  |  |  | Panicum virgatum_HQ731441.1 (Poaceae) | 100 | 99 | 

Table 5 BLAST results with extracted ITS sequence against GenBank
Query sample ID | Query species (family) | PCR cycles | Gene | Length (bp) | Reference species_accession number (family) | Query coverage (%) | Identities (%)
01A | Manglietia fordiana (Magnoliaceae) | 12 | ITS | 369 | Magnolia virginiana_DQ499097.1 (Magnoliaceae) | 100 | 95
02A | Manglietia fordiana (Magnoliaceae) | 12 | ITS | 349 | Magnolia virginiana_DQ499097.1 (Magnoliaceae) | 100 | 95
03A | Schisandra henryi (Schisandraceae) | 12 | ITS | 676 | Schisandra pubescens_AF263436.1 (Schisandraceae) | 99 | 100
04A | Schisandra henryi (Schisandraceae) | 12 | ITS | 676 | Schisandra pubescens_JF978533.1 (Schisandraceae) | 99 | 99
05A | Phoebe neurantha (Lauraceae) | 12 | ITS | 518 | Phoebe neurantha_FM957847.1 (Lauraceae) | 100 | 99
06A | Cinnamomum bodinieri (Lauraceae) | 12 | ITS | 603 | Cinnamomum micranthum f. kanehirae_KP218515.1 (Lauraceae) | 100 | 99
08A | Holboellia latifolia (Lardizabalaceae) | 12 | ITS | 677 | Holboellia angustifolia subsp. angustifolia_AY029790.1 (Lardizabalaceae) | 100 | 99
09A | Chloranthus erectus (Chloranthaceae) | 12 | ITS | 663 | Chloranthus erectus_AF280410.1 (Chloranthaceae) | 99 | 99
10A | Sarcandra glabra (Chloranthaceae) | 12 | ITS | 667 | Sarcandra glabra_KWNU91871 (Chloranthaceae) | 100 | 100
11A | Meconopsis racemosa (Papaveraceae) | 12 | ITS | 671 | Meconopsis racemosa_JF411034.1 (Papaveraceae) | 100 | 99
12A | Macleaya microcarpa (Papaveraceae) | 12 | ITS | 612 | Macleaya cordata_AY328307.1 (Papaveraceae) | 99 | 89
13A | Hodgsonia macrocarpa (Cucurbitaceae) | 12 | ITS | 614 | Hodgsonia heteroclita_HE661302.1 (Cucurbitaceae) | 100 | 98
14A | Malus yunnanensis (Rosaceae) | 12 | ITS | 596 | Malus prattii_JQ392445.1 (Rosaceae) | 99 | 99
15A | Elaeagnus loureirii (Elaeagnaceae) | 12 | ITS | 649 | Elaeagnus macrophylla_JQ062495.1 (Elaeagnaceae) | 99 | 99
16A | Rhododendron rex subsp. fictolacteum (Ericaceae) | 12 | ITS | 646 | Rhododendron rex subsp. fictolacteum_KM605995.1 (Ericaceae) | 100 | 100
17A | Swertia bimaculata (Gentianaceae) | 12 | ITS | 626 | Swertia bimaculata_JF978819.2 (Gentianaceae) | 100 | 99
18A | Primula sinopurpurea (Primulaceae) | 12 | ITS | 631 | Primula melanops_JF978004.1 (Primulaceae) | 100 | 99
19A | Paederia scandens (Araceae) | 12 | ITS | – | – | – | –
20A | Colocasia esculenta (Araceae) | 12 | ITS | 552 | Colocasia esculenta_AY081000.1 (Araceae) | 99 | 99
21A | Pholidota chinensis (Orchidaceae) | 12 | ITS | – | – | – | –
22A | Otochilus porrectus (Orchidaceae) | 12 | ITS | – | – | – | –
23A | Indosasa sinica (Poaceae) | 12 | ITS | 604 | Oligostachyum sulcatum_EU847131.1 (Poaceae) | 98 | 99
24A | Camellia gymnogyna (Theaceae) | 12 | ITS | – | – | – | –
25A | Camellia sinensis var. assamica (Theaceae) | 12 | ITS | 645 | Camellia sinensis var. sinensis_FJ004871.1 (Theaceae) | 99 | 99
26A | Panicum incomtum (Poaceae) | 12 | ITS | 795 | Chasechloa egregia_LT593967.1 (Poaceae) | 100 | 98

Practical benefits
A primary motivation for this study was our own experience of suboptimal DNA recovery from herbarium specimens using Sanger sequencing, coupled with difficulty in accessing fresh material of some species. The success of this method using only small amounts of starting tissue from herbarium specimens is an important step towards addressing these challenges. It makes sequencing type specimens a realistic proposition, which can further serve to integrate genetic data into the existing taxonomic framework. A second practical benefit is that field work is often not possible in some geographical regions where past collections have been made. Political instability and/or general inaccessibility can preclude current collecting activities, and where habitats have been highly degraded or destroyed, the species concerned may simply no longer be available for collection. Mining herbaria to obtain sequences from previously collected material can circumvent this problem. Thirdly, sequencing plastid genomes and rDNA arrays from specimens that are many decades old enables a baseline to be established for haplotype and ribotype diversity. This baseline can then be used to assess evidence for genetic diversity loss or change due to recent population declines or environmental change.

Conclusions
This study confirms the practical and routine application of genome skimming for recovering sequences from plastid genomes and rDNA from small amounts of starting tissue from preserved herbarium specimens.
The ongoing development of new sequencing technologies is creating a fundamental shift in the ease of recovery of nucleotide sequences, enabling 'new uses' for the hundreds of millions of existing herbarium specimens [1, 10, 14, 16, 29]. This shift from Sanger sequencing to NGS approaches has now firmly moved herbarium specimens into the genomic era.

Authors' contributions
BY and DZL organized the project. CXZ performed the experiments, analyzed the data, and wrote the paper; PMH wrote and edited the paper; JY, ZSH, and ZRZ extracted DNA and prepared the libraries. All authors read and approved the final manuscript.

Author details
Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, UK.

Acknowledgements
We are very grateful to Mr. Wei Fang (Kunming Institute of Botany, Chinese Academy of Sciences) for kindly providing the materials. We would like to thank Ms. Chun-Yan Lin and Mr. Shi-Yu Lv (Kunming Institute of Botany, Chinese Academy of Sciences) for their help with the experiments.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
The datasets supporting the conclusions of this article are available in the NCBI SRA repository under accession SRP142448 (http://www.ncbi.nlm.nih.gov/home/submit.shtml).

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Funding
This work was funded by a program for basic scientific and technological data acquisition of the Ministry of Science and Technology of China (Grant No. 2013FY112600), the Large-scale Scientific Facilities of the Chinese Academy of Sciences (Grant No: 2017-LSF-GBOWS-02), and the Biodiversity Conservation Strategy Program of the Chinese Academy of Sciences (ZSSD-011).

Fig. 2 PCR cycles with raw data, contigs, and assembly length
Fig. 3 Plastome coverage versus C value (pg DNA per 1C) of all samples assembled in this study

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 27 November 2017 Accepted: 20 April 2018

References
1. Särkinen T, Staats M, Richardson JE, Cowan RS, Bakker FT. How to open the treasure chest? Optimizing DNA extraction from herbarium specimens. PLoS ONE. 2012;7(8):e43808.
2. Hebert PDN, Hollingsworth PM, Hajibabaei M. From writing to reading the encyclopedia of life. Philos Trans R Soc B. 2016;371(1702):20150321.
3. Kistler L, Ware R, Smith O, Collins M, Allaby RG. A new model for ancient DNA decay based on paleogenomic meta-analysis. Nucleic Acids Res. 2017;45(11):6310–20.
4. Hall LM, Wollcox MS, Jones DS. Association of enzyme inhibition with methods of museum skin preparation. Biotechniques. 1997;22(5):928–34.
5. Hedmark E, Ellegren H. Microsatellite genotyping of DNA isolated from claws left on tanned carnivore hides. Int J Legal Med. 2005;119(6):370–3.
6. Tang EPY. Path to effective recovering of DNA from formalin-fixed biological samples in natural history collections: workshop summary. Washington: The National Academies Press; 2006.
7. Groombridge JJ, Jones CG, Bruford MW, Nichols RA. 'Ghost' alleles of the Mauritius kestrel. Nature. 2000;403(6770):616.
8. Stiller M, Green RE, Ronan M, Simons JF, Du L, He W, Egholm M, Rothberg JM, Keates SG, Ovodov ND, Antipina EE, Baryshnikov GF, Kuzmin YV, Vasilevski AA, Wuenschell GE, Termini J, Hofreiter M, Jaenicke-Després V, Pääbo S. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc Natl Acad Sci USA. 2006;103(37):13578–84.
9. Kuzmina ML, Braukmann TWA, Fazekas AJ, Graham SW, Dewaard SL, Rodrigues A, Bennett BA, Dickinson TA, Saarela JM, Catling PM, Newmaster SG, Percy DM, Fenneman E, Lauron-Moreau A, Ford B, Gillespie L, Subramanyam R, Whitton J, Jennings L, Metsger D, Warne CP, Brown A, Sears E, Dewaard JR, Zakharov EV, Hebert PDN. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada. Appl Plant Sci. 2017;5(12):1700079.
10. Smith O, Palmer SA, Gutaker R, Allaby RG. An NGS approach to archaeobotanical museum specimens as genetic resources in systematics research. In: Olson PD, Hughes J, Cotton JA, editors. Next generation systematics. Cambridge: Cambridge University Press; 2016. p. 282–304.
11. Straub SCK, Parks M, Weithmier K, Fishbein M, Cronn RC, Liston A. Navigating the tip of the genomic iceberg: next-generation sequencing for plant systematics. Am J Bot. 2012;99(2):349–64.
12. Coissac E, Hollingsworth PM, Lavergne S, Taberlet P. From barcodes to genomes: extending the concept of DNA barcoding. Mol Ecol. 2016;25(7):1423–8.
13. Hollingsworth PM, Li DZ, van der Bank M, Twyford AD. Telling plant species apart with DNA: from barcodes to genomes. Philos Trans R Soc B. 2016;371(1702):20150338.
14. Bakker FT, Lei D, Yu JY, Mohammadin S, Wei Z, van de Kerke S, Gravendeel B, Nieuwenhuis M, Staats M, Alquezar-Planas DE, Holmer R. Herbarium genomics: plastome sequence assembly from a range of herbarium specimens using an Iterative Organelle Genome Assembly pipeline. Biol J Linn Soc. 2016;117(1):33–43.
15. Staats M, Erkens RHJ, van de Vossenberg B, Wieringa JJ, Kraaijeveld K, Stielow B, Geml J, Richardson JE, Bakker FT. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLoS ONE. 2013;8(7):e69189.
16. Van de Paer C, Hong-Wa C, Jeziorski C, Besnard G. Mitogenomics of Hesperelaea, an extinct genus of Oleaceae. Gene. 2016;594(2):197–202.
17. Zedane L, Hong-Wa C, Murienne J, Jeziorsky C, Baldwin BG, Besnard G. Museomics illuminate the history of an extinct, paleoendemic plant lineage (Hesperelaea, Oleaceae) known from an 1875 collection from Guadalupe Island, Mexico. Biol J Linn Soc. 2015;117(1):44–57.
18. Besnard G, Christin PA, Malé PJG, Lhuillier E, Lauzeral C, Coissac E, Vorontsova MS. From museums to genomics: old herbarium specimens shed light on a C3 to C4 transition. J Exp Bot. 2014;65(22):6711–21.
19. Sproul JS, Maddison DR. Sequencing historical specimens: successful preparation of small specimens with low amounts of degraded DNA. Mol Ecol Resour. 2017;17:1183–201.
20. Kanda K, Pflug JM, Sproul JS, Dasenko MA, Maddison DE. Successful recovery of nuclear protein-coding genes from small insects in museums using Illumina sequencing. PLoS ONE. 2015;10:30143929.
21. Blaimer BB, Lloyd MW, Guillory WX, SnG B. Sequence capture and phylogenetic utility of genomic ultraconserved elements obtained from pinned insect specimens. PLoS ONE. 2016;11:e0161531.
22. Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc. 2010. https://doi.org/10.1101/pdb.prot5448.
23. Patel RK, Jain M. NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE. 2012;7(2):e30619.
24. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77.
25. Yang JB, Li DZ, Li HT. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs. Mol Ecol Resour. 2014;14(5):1024–31.
26. Wyman SK, Jansen RK, Boore JL. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004;20(17):3252–5.
27. Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33(Suppl_2):W686–9.
28. Li XW, Yang Y, Henry RJ, Rossetto M, Wang YT, Chen SL. Plant DNA barcoding: from gene to genome. Biol Rev. 2015;90(1):157–66.
29. Hart ML, Forrest LL, Nicholls JA, Kidner CA. Retrieval of hundreds of nuclear loci from herbarium specimens. Taxon. 2016;65(5):1081–92.

Plant Methods, Springer Journals

Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Life Sciences; Plant Sciences; Biological Techniques
eISSN
1746-4811
D.O.I.
10.1186/s13007-018-0300-0
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creat iveco mmons .org/ publi cdoma in/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Zeng et al. Plant Methods (2018) 14:43 Page 2 of 14 to contamination by recent DNA and PCR products input DNA was available (i.e. below 10 ng in Kanda et al. from the study species. The cumulative damage to the [20]; below 5  ng in Blaimer et  al. [21]). To better under- DNA can also cause incorrect bases to be inserted dur- stand ideal approaches of sample preparation for speci- ing enzymatic amplification. The main sources for these mens with minimal DNA, we intentionally limited DNA alterations are single nucleotide misincorporations [7, input to 500 pg per specimen. 8]. Above all, PCR-based Sanger sequencing by using In this paper we provide a further practical test of the herbarium samples to generate standard DNA bar- genome skimming methodology applied to herbarium codes can be challenging. A recent large-scale study by specimens. As part of the China Barcode of Life pro- Kuzmina et al. 2017 [9] examined 20,816 specimens rep- ject, and our wider phylogenomic studies, our aim was resenting 5076 of 5190 vascular plant species in Canada. to assess whether the success reported in these early Kuzmina et  al. found that specimen age and method of genome skimming studies could be repeated in other preservation had significant effects on sequence recov - laboratories. ery for all barcode markers. 
However, massively-parallel We evaluated the success and failure rates of rDNA and short-read Next-generation sequencing (NGS) protocols plastid genome sequencing from genome skims of 25 dif- have the potential to greatly increase the success of her- ferent species from herbarium specimens, and explored barium sequencing projects, as many new sequencing the impacts of parameters such as amount of input DNA approaches do not rely on large, intact DNA templates and PCR cycle numbers. and instead are well-suited for sequencing low concen- trations of short (100-400  bp) fragmented molecules [3, Methods 10]. Specimen sampling Straub et  al. [11], described how “genome skimming”, 25 herbarium specimens were selected from 16 Angio- involving a shallow-pass genome sequence using NGS, sperm families covering 22 genera, with specimen ages could recover highly repetitive genome regions such up to 80  years old. All 25 species were taken from the as rDNA or organelle genomes, and yield highly use- specimens housed in the Herbarium of the Institute of ful sequence data at relatively low sequence depth, and Botany, Chinese Academy of Sciences (KUN). The sam - these regions include the usual suite of DNA barcoding ples were selected to represent the major clades of APG markers [12, 13]. The genome skimming approach using III system (Table 1). NGS has been used to recover plastid DNA and rDNA sequences from 146 herbarium specimens [14], to pro- DNA extraction duce the entire nuclear genome of a 43-year-old Arabi- Approximately 1  cm sections of leaf or 20  mg of leaf dopsis thaliana herbarium specimen [15], the complete tissue were used for each DNA extraction. Genomic plastome, the mitogenome, nuclear ribosomal DNA clus- DNA was extracted using Tiangen DNAsecure Plant ters, and partial sequences of low-copy genes from an Kit (DP320). 
Yield and integrity (size distribution) of herbarium specimen of an extinct species of Hesperelaea genomic DNA extracts were quantified by fluorometric [16, 17], and the complete plastome, nuclear ribosomal quantification on the Qubit (Invitrogen, Carlsbad, Cali - DNA clusters, and partial sequences of low-copy genes fornia, USA) using the dsDNA HS kit, as well as by visual from three grass herbarium specimens [18]. assessment on a 1% agarose gel. However, sequencing small, historical specimens may be especially challenging if a specimens is unique, or Library preparation nearly so, with no alternative specimens available for All samples were subsequently built into blunt-end DNA study should the first specimen fail. Methods used to libraries in the laboratories using the NEBNext Ultra II extract and prepare DNA for sequencing must both be DNA library Prep kit for Illumina (New England BIo- more or less guaranteed to work, and, in many cases, labs) which has been optimized for as little as 5 ng start- allow for preservation of DNA for future study [19]. In ing DNA and Illumina-specific adapters [22]. The library recent studies that report successfully sequencing of his- protocol was performed as per the manufacturer’s torical specimens from 1  ng to 1  μg of input DNA (for instructions with four modifications: (i) 500  pg of input example, up to 1  μg in Bakker et  al. [14]; ∽  600  ng in DNA was selected to accommodate low starting DNA Staats et al. [15]; 33 ng in Zadane et al. [17]; 8.25–537 ng quantities, (ii) DNA was not fragmented by sonication in Kanda et  al. [20]; 5.8–200  ng in Blaimer et  al. [21]; because the DNA was highly degraded; (iii) The NEB - less than 10  ng in Besnard et  al. [18]; 1–10  ng in Sproul Next library was generated without any size selection; (iv) and Maddison [19]). 
But a number of studies also report DNA libraries were then amplified in an indexing PCR, abandoning a subset of specimens for which too little which barcoded each library and discriminated each Zeng et al. Plant Methods (2018) 14:43 Page 3 of 14 Table 1 List of the specimen materials, DNA yields used in our study Sample ID Species Family Collection Age ng/ul Volume (ul) DNA yield (ng) 01 Manglietia fordiana Magnoliaceae 19780402 39 0.894 36 32.184 02 Manglietia fordiana Magnoliaceae 19541027 63 2.35 37 86.95 03 Schisandra henryi Schisandraceae 19821108 35 1.87 33 61.71 04 Schisandra henryi Schisandraceae 19840528 33 0.909 33 29.997 05 Phoebe neurantha Lauraceae 1938 79 0.507 36 18.252 06 Cinnamomum bodinieri Lauraceae 1960 57 2.26 36 81.36 08 Holboellia latifolia Lardizabalaceae 1982 35 1.29 34 43.86 09 Chloranthus erectus Chloranthaceae 1973 44 4.18 36 150.48 10 Sarcandra glabra Chloranthaceae 1988 29 4.35 31.5 137.025 11 Meconopsis racemosa Papaveraceae 1976 41 4.35 22 95.7 12 Macleaya microcarpa Papaveraceae 1986 31 1.97 35.5 69.935 13 Hodgsonia macrocarpa Cucurbitaceae 1982 35 2.18 34 74.12 14 Malus yunnanensis Rosaceae 1939 78 0.834 35 29.19 15 Elaeagnus loureirii Elaeagnaceae 1993 24 9.75 34 331.5 16 Rhododendron rex subsp. fictolacteum Ericaceae 1979 38 8.15 20.5 167.075 17 Swertia bimaculata Gentianaceae 19840823 33 1.67 35 58.45 18 Primula sinopurpurea Primulaceae 19400907 77 0.974 32 31.168 19 Paederia scandens Araceae 19550331 62 0.344 34 11.696 20 Colocasia esculenta Araceae 19741001 43 1.46 36 52.56 21 Pholidota chinensis Orchidaceae 1959 58 0.107 34 3.638 22 Otochilus porrectus Orchidaceae 1990 27 0.344 35 12.04 23 Indosasa sinica Poaceae 2007 10 1.65 35 57.75 24 Camellia gymnogyna Theaceae 19340617 83 0.417 36 15.012 25 Camellia sinensis var. assamica Theaceae 2002 15 4.03 23 92.69 26 Panicum incomtum Poaceae 20001017 17 1.63 36 58.68 All vouchers are deposited in the herbarium of the Kunming Institute of Botany (KUN) sample. 
Five PCR cycles was suggested by the manufac- We did not assemble the internal gene spacer (IGS) turer’s instruction for 5 ng of input DNA. As only 500 pg because of the complexity of this region which is rich in of starting DNA was used, we tested use of increasing duplications and inversions. numbers of PCR cycles (namely × 6, × 8, × 10, × 12, × 14 The raw sequence reads were filtered for primer/adap - PCR cycles). Concentration and size profiles of the final tor sequences and low-quality reads with the NGS QC indexed libraries (125 libraries, representing 25 speci- Toolkit [23]. The cut-off value for percentage of read mens at 5 different numbers of PCR cycles) were assessed length was 80, and that for PHRED quality score was on a Bioanalyzer 2100 using a high sensitivity DNA chip. 30. Then the filtered high-quality pair-end reads were assembled into contigs with Spades 3.0 [24]. Next, we Library pooling identified highly similar genome sequences using the The final indexed libraries were then pooled (33 or 34 Basic Local Alignment Search Tool (BLAST: http://blast samples per lane) in equimolar ratios and sequenced .ncbi.nlm.gov/). The procedures and parameters for set - on three lanes on an Illumina XTen sequencing system ting the sequence quality control, de novo assembly, and (Illumina Inc.) using paired and chemistry at the Cloud blast search were followed as in Yang et  al. [25]. Next, health Medical Group Ltd. we determined the proper orders of the aligned contigs using the highly similar genome sequences identified in Analyses the BLAST search as references. At this point, the target Successfully sequenced samples were assembled into contigs were assembled into complete plastid genomes chloroplast genomes and nuclear rDNAs. Here the and nuclear rDNAs. rDNAs comprise the complete sequence of 26S, 18S, and Annotation of the plastomes was performed using 5.8S and internal transcribed spacers (ITS1 and ITS2). 
the plastid genome annotation package DOGMA [26] (http://dogma.ccbb.utexas.edu/). Start and stop codons of protein-coding genes, as well as intron/exon positions, were manually adjusted. The online tRNAscan-SE service [27] was used to further determine tRNA genes. The final complete plastomes and rDNAs were deposited in GenBank (Accession numbers: MH394344-MH394431; MH270450-MH270494).

Fungi or other plants may be co-isolated during the DNA extraction process, resulting in DNA contamination [1]. This is particularly important where starting DNA concentrations are extremely low. We thus sub-sampled our data to check for contamination. To check for contamination in the plastid DNA sequences, for each species we extracted its rbcL sequence and blasted it against GenBank to check that it grouped with related species. BLAST1 (implemented in the BLAST program, version 2.2.17) was used to search the reference database for each query sequence with an E value < 1 × 10^-5. Likewise, to check for plant and fungal contamination in the rDNA sequences, we took the final assembled ITS sequences (or partial ITS sequences where complete ITS was not recovered) and blasted them against the NCBI database to check that they grouped with related species.

Results
All 25 species yielded amounts of DNA suitable for library preparation and further processing. Total yields varied between 3 ng and 400 ng from on average 20 mg of dried leaf tissue, usually the equivalent of 1 cm² of leaf tissue (Table 1). We found a negative correlation between specimen age and DNA yield (Fig. 1).

Fig. 1 DNA yield against specimen age

We successfully enriched and sequenced DNA libraries constructed from herbarium material. Despite only 500 pg of input DNA, good quality libraries were produced from 100 of 125 samples (25 species, with ×8, ×10, ×12, ×14 PCR cycles). The concentration of the final indexed libraries based on six PCR cycles per species was too low to be further sequenced. Between 15,877,478 and 44,724,436 high-quality paired-end reads were produced, with the total number of bases ranging from 2,381,621,700 bp (2.38 giga base pairs, Gbp) to 6,708,665,400 bp (6.71 Gbp) (Table 2). These were then assembled into contigs and, using a BLAST search, into plastid genomes and rDNA arrays.

After de novo assembly, two species (Otochilus porrectus and Pholidota chinensis) generated poor plastid assemblies, with the longest contigs being 6705 bp with 2× coverage and 1325 bp with 3× coverage respectively. The other 23 species yielded useful plastid assemblies drawn from 3 to 61 contigs assembled into plastid genomes with depths ranging from 459× to 2176×. Of these 23 species, 14 were assembled into complete plastid genomes. Eight species were assembled into nearly complete plastid genomes, but with gaps ranging from 5 to 349 bp (Table 2). However, although Rhododendron rex subsp. fictolacteum yielded useful plastid assemblies, many gaps were detected among contigs when the species Vaccinium macrocarpon was used as reference data.

For the nuclear rDNAs, 21 species gave ribosomal DNA sequence assemblies > 4.3 kb drawn from 1 to 2 contigs with sequencing depths ranging from 3× to 567× (no nrDNA sequences could be assembled for Pholidota chinensis, Paederia scandens, Otochilus porrectus, and Camellia gymnogyna) (Table 3). Of these 21 species, 18 resulted in assembled nrDNAs consisting of partial sequences of 18S and 26S, along with the complete sequence of 5.8S and the internal transcribed spacers ITS1 and ITS2. However, 3 samples (the 2 samples of Manglietia fordiana (Sample ID 01 and 02) and Phoebe neurantha (Sample ID 05)) were difficult to assemble, resulting in only partial recovery of 5.8S and the internal transcribed spacers ITS1 and ITS2.

To check the quality of the plastid sequences, all gene regions were translated. No stop codons that would be indicative of sequencing errors were detected within the assembled contigs. We then extracted about 1400 bp of rbcL sequence from 23 of the samples to check for contamination (for Rhododendron rex subsp. fictolacteum (Sample ID 16), the plastid genome was not assembled successfully but we could nevertheless extract the rbcL sequence from the plastid contigs). These rbcL sequences were subjected to a BLAST search against the NCBI database. The rbcL sequences contained no insertions or deletions and matched the correct genus or family in each case (Table 4). Likewise, we blasted the final assembled rDNA ITS sequences (or partial ITS sequences) from 24 samples against the NCBI database. In all cases, the closest match to the sequence was from the family of the sequenced sample. No matches with fungi were detected (Table 5).

One-way analyses of variance (ANOVA) were performed to test the total reads against PCR cycles, PCR cycles against plastid contig numbers, PCR cycles against plastid genome assembly length, PCR cycles against plastid mean-depth, and PCR cycles against plastid coverage. We found that there was no significant correlation between PCR cycles and plastid contig numbers, PCR cycles and plastid genome assembly length, and PCR cycles and plastid coverage. There was, however, a significant positive correlation between the number of PCR cycles and the total number of reads, and between PCR cycles and the plastid mean-depth (Fig. 2).

Finally, when comparing plastome assembly coverage with C values of the species concerned, we find a slight negative but not significant correlation (Fig. 3), which would suggest, at least for our sampling, that plastome assembly coverage is not affected by nuclear genome size of the specimen concerned.

Discussion
Sequencing herbarium specimens from low amounts of starting DNA
Our current study successfully demonstrated the recovery of plastid genome sequences and rDNA sequences from herbarium specimens, some up to 80 years old. Our study used small amounts of starting tissue (c. 1 cm²) and extremely low initial concentrations (500 pg) of degraded starting DNA. This success with a small amount of starting tissue is important, and demonstrates the practical feasibility of organelle genome and rDNA recovery with minimal impacts on specimens. These findings, in the context of studies by others (e.g. Bakker et al. [14]), confirm that genome skimming can be performed with limited sample destruction, enabling relatively straightforward access to high-copy number DNA in preserved herbarium specimens spanning a wide phylogenetic coverage.

To accommodate the use of only 500 pg of input DNA, we modified the library protocol: we removed the step of DNA fragmentation by sonication because the DNA was already highly degraded, we did not undertake any size selection, and we increased the number of PCR cycles to enrich the indexed library. After library preparation and Illumina paired-end sequencing, a sufficient number of read pairs (> 15,000,000) were generated for our 25 specimens and 100 libraries. This strategy allowed the generation of complete or near complete plastid genomes with depths ranging from 459× to 2176×, and nuclear ribosomal units with a high sequencing depth (3× to 567×) for 23 and 24 specimens respectively. Despite the low starting concentration, no plant or fungal contaminants were obviously detectable in the assembled plastomes and rDNA sequences.

For herbarium plastome assembly, the procedures and parameters for setting the sequence quality control, de novo assembly, BLAST search and genome annotation were followed as in Yang et al. [25]. Assembly of our 25 specimens (100 libraries) took c. 5 h per specimen on a Linux workstation with 3 TB of RAM and 32 cores. This did not differ significantly between fresh and herbarium specimens.

Recovery of widely used loci in plant molecular systematics
A benefit of the genome skimming approach is that it can recover loci widely used in previous molecular systematics studies (e.g. Coissac et al. 2016 [12]). Here we recovered the standard rbcL DNA barcode region from 23/25 samples, the standard matK DNA barcode region from 23/25 specimens, the standard trnH-psbA DNA barcode region from 23/25 samples, the trnL intron from 23/25 samples, and the ITS1 and ITS2 from 20/25 and 19/25 samples respectively. In addition to the recovery of these standard DNA barcoding loci, we also recovered many other regions used as supplementary barcode markers (e.g. atpF-H, psbK-I). The data produced with this approach can thus contribute towards standard and extended DNA barcode reference libraries [12], help identify additional regions which are informative for any given clade [28], and produce data for phylogenomic investigations to elucidate the relationships amongst plant groups.

Table 2 Assembly statistics of plastid genome for all specimens used in this study
Sample ID | PCR cycles | Species | Family | Total sequences | Raw data (Gb) | #contigs | Total assembly length (bp) | Completed | GenBank accession number
01D | ×8 | Manglietia fordiana | Magnoliaceae | 22404632 | 3.36 | 9 | 158993 | 1059 bp gap | MH394393
01E | ×10 | Manglietia fordiana | Magnoliaceae | 25869654 | 3.88 | 32 | 159759 | 349 bp gap | MH394394
01A | ×12 | Manglietia fordiana | Magnoliaceae | 35201972 | 5.28 | 14 | 158241 | 1840 bp gap | MH394391
01B | ×14 | Manglietia fordiana | Magnoliaceae | 30007234 | 4.5 | 14 | 158221 | 1840 bp gap | MH394392
02D | ×8 | Manglietia fordiana | Magnoliaceae | 22829038 | 3.42 | 8 | 161497 | 1040 bp gap | MH394397
02E | ×10 | Manglietia fordiana | Magnoliaceae | 32497068 | 4.87 | 21 | 160113 | Y | MH394398
02A | ×12 | Manglietia fordiana | Magnoliaceae | 29637182 | 4.45 | 12 | 158315 | 1802 bp gap | MH394395
02B | ×14 | Manglietia fordiana | Magnoliaceae | 31089730 | 4.66 | 22 | 160113 | Y | MH394396
03D | ×8 | Schisandra henryi | Schisandraceae | 29691984 | 4.45 | 5 | 145963 | 94 bp gap | MH394365
03E | ×10 | Schisandra henryi | Schisandraceae | 25141160 | 3.77 | 4 | 145616 | 54 bp gap | MH394366
03A | ×12 | Schisandra henryi | Schisandraceae | 32511344 | 4.88 | 11 | 146031 | 18 bp gap | MH394363
03B | ×14 | Schisandra henryi | Schisandraceae | 29856636 | 4.48 | 9 | 145993 | 63 bp gap | MH394364
04D | ×8 | Schisandra henryi | Schisandraceae | 24039822 | 3.61 | 4 | 146212 | 53 bp gap | MH394369
04E | ×10 | Schisandra henryi | Schisandraceae | 23870902 | 3.58 | 4 | 146243 | 53 bp gap | MH394370
04A | ×12 | Schisandra henryi | Schisandraceae | 33190158 | 4.98 | 15 | 146218 | 63 bp gap | MH394367
04B | ×14 | Schisandra henryi | Schisandraceae | 30498044 | 4.57 | 6 | 145893 | 45 bp gap | MH394368
05D | ×8 | Phoebe neurantha | Lauraceae | 29040850 | 4.36 | 11 | 152782 | Y | MH394354
05E | ×10 | Phoebe neurantha | Lauraceae | 27831254 | 4.17 | 15 | 152782 | Y | MH394355
05A | ×12 | Phoebe neurantha | Lauraceae | 44724436 | 6.71 | 17 | 152781 | 1 bp gap | MH394352
05B | ×14 | Phoebe neurantha | Lauraceae | 35264634 | 5.29 | 13 | 152781 | 1 bp gap | MH394353
06D | ×8 | Cinnamomum bodinieri | Lauraceae | 30188820 | 4.53 | 9 | 152778 | Y | MH394417
06E | ×10 | Cinnamomum bodinieri | Lauraceae | 32065328 | 4.81 | 13 | 152719 | Y | MH394418
06A | ×12 | Cinnamomum bodinieri | Lauraceae | 24488292 | 3.67 | 7 | 152719 | Y | MH394415
06B | ×14 | Cinnamomum bodinieri | Lauraceae | 35035602 | 5.26 | 11 | 152719 | Y | MH394416
08D | ×8 | Holboellia latifolia | Lardizabalaceae | 26229946 | 3.93 | 5 | 157817 | Y | MH394377
08E | ×10 | Holboellia latifolia | Lardizabalaceae | 28273022 | 4.24 | 9 | 157818 | Y | MH394378
08A | ×12 | Holboellia latifolia | Lardizabalaceae | 33873136 | 5.08 | 13 | 157614 | 204 bp gap | MH394375
08B | ×14 | Holboellia latifolia | Lardizabalaceae | 34021360 | 5.1 | 10 | 157818 | Y | MH394376
09D | ×8 | Chloranthus erectus | Chloranthaceae | 21843512 | 3.28 | 4 | 157812 | 43 bp gap | MH394413
09E | ×10 | Chloranthus erectus | Chloranthaceae | 18044364 | 2.71 | 5 | 157812 | 47 bp gap | MH394414
09A | ×12 | Chloranthus erectus | Chloranthaceae | 30022162 | 4.5 | 13 | 157852 | Y | MH394411
09B | ×14 | Chloranthus erectus | Chloranthaceae | 28656686 | 4.3 | 11 | 157852 | Y | MH394412
10D | ×8 | Sarcandra glabra | Chloranthaceae | 18893508 | 2.83 | 5 | 158733 | 119 bp gap | MH394361
10E | ×10 | Sarcandra glabra | Chloranthaceae | 20662770 | 3.1 | 7 | 159007 | 22 bp gap | MH394362
10A | ×12 | Sarcandra glabra | Chloranthaceae | 27510166 | 4.13 | 9 | 158900 | Y | MH394360
10B | ×14 | Sarcandra glabra | Chloranthaceae | 29545206 | 4.43 | 9 | 158900 | Y | MH394431
11D | ×8 | Meconopsis racemosa | Papaveraceae | 24351884 | 3.65 | 5 | 153762 | Y | MH394401
11E | ×10 | Meconopsis racemosa | Papaveraceae | 29160582 | 4.37 | 5 | 153762 | Y | MH394402
11A | ×12 | Meconopsis racemosa | Papaveraceae | 33763340 | 5.06 | 6 | 153763 | Y | MH394399
11B | ×14 | Meconopsis racemosa | Papaveraceae | 35990358 | 5.4 | 4 | 153728 | 1 bp gap | MH394400
12D | ×8 | Macleaya microcarpa | Papaveraceae | 26265548 | 3.94 | 11 | 161064 | 48 bp gap | MH394385
12E | ×10 | Macleaya microcarpa | Papaveraceae | 25100372 | 3.77 | 11 | 161064 | 48 bp gap | MH394386
12A | ×12 | Macleaya microcarpa | Papaveraceae | 29491952 | 4.42 | 13 | 161118 | Y | MH394383
12B | ×14 | Macleaya microcarpa | Papaveraceae | 28462338 | 4.27 | 12 | 161110 | 2 bp gap | MH394384
13D | ×8 | Hodgsonia macrocarpa | Cucurbitaceae | 26886870 | 4.03 | 26 | 155027 | 1300 bp gap | MH394428
13E | ×10 | Hodgsonia macrocarpa | Cucurbitaceae | 34179418 | 5.13 | 16 | 154855 | 1298 bp gap | MH394429
13A | ×12 | Hodgsonia macrocarpa | Cucurbitaceae | 37182144 | 5.58 | 18 | 156015 | 20 bp gap | MH394426
13B | ×14 | Hodgsonia macrocarpa | Cucurbitaceae | 36782268 | 5.52 | 17 | 156146 | Y | MH394427
14D | ×8 | Malus yunnanensis | Rosaceae | 22107718 | 3.32 | 16 | 158955 | 820 bp gap | MH394389
14E | ×10 | Malus yunnanensis | Rosaceae | 25720160 | 3.86 | 5 | 160071 | Y | MH394390
14A | ×12 | Malus yunnanensis | Rosaceae | 37501036 | 5.63 | 5 | 160067 | Y | MH394387
14B | ×14 | Malus yunnanensis | Rosaceae | 33776058 | 5.07 | 5 | 160068 | Y | MH394388
15D | ×8 | Elaeagnus loureirii | Elaeagnaceae | 15195822 | 2.28 | 5 | 152196 | 8 bp gap | MH394424
15E | ×10 | Elaeagnus loureirii | Elaeagnaceae | 16862680 | 2.53 | 5 | 152196 | 8 bp gap | MH394425
15A | ×12 | Elaeagnus loureirii | Elaeagnaceae | 21511050 | 3.23 | 4 | 152199 | 5 bp gap | MH394422
15B | ×14 | Elaeagnus loureirii | Elaeagnaceae | 20556860 | 3.08 | 6 | 152199 | 5 bp gap | MH394423
16D | ×8 | Rhododendron rex subsp. fictolacteum | Ericaceae | 23623070 | 3.54 | | | |
16E | ×10 | Rhododendron rex subsp. fictolacteum | Ericaceae | 28092596 | 4.21 | | | |
16A | ×12 | Rhododendron rex subsp. fictolacteum | Ericaceae | 31352560 | 4.7 | | | |
16B | ×14 | Rhododendron rex subsp. fictolacteum | Ericaceae | 30525730 | 4.58 | | | |
17D | ×8 | Swertia bimaculata | Gentianaceae | 18303136 | 2.77 | 53 | 152808 | 266 bp gap | MH394373
17E | ×10 | Swertia bimaculata | Gentianaceae | 16559554 | 2.48 | 41 | 153443 | 406 bp gap | MH394374
17A | ×12 | Swertia bimaculata | Gentianaceae | 15877478 | 2.38 | 30 | 143977 | 9947 bp gap | MH394371
17B | ×14 | Swertia bimaculata | Gentianaceae | 18448302 | 2.77 | 48 | 153602 | 341 bp gap | MH394372
18D | ×8 | Primula sinopurpurea | Primulaceae | 22890598 | 3.43 | 5 | 151945 | 50 bp gap | MH394358
18E | ×10 | Primula sinopurpurea | Primulaceae | 26618684 | 3.99 | 5 | 151945 | 50 bp gap | MH394359
18A | ×12 | Primula sinopurpurea | Primulaceae | 24107472 | 3.62 | 3 | 151945 | 50 bp gap | MH394356
18B | ×14 | Primula sinopurpurea | Primulaceae | 25834066 | 3.88 | 3 | 151945 | 50 bp gap | MH394357
19D | ×8 | Paederia scandens | Araceae | 25307356 | 3.8 | 15 | 162267 | 247 bp gap | MH394346
19E | ×10 | Paederia scandens | Araceae | 24658068 | 3.7 | 7 | 162268 | 247 bp gap | MH394347
19A | ×12 | Paederia scandens | Araceae | 23850180 | 3.58 | 8 | 162282 | 253 bp gap | MH394344
19B | ×14 | Paederia scandens | Araceae | 24064764 | 3.61 | 10 | 162139 | 253 bp gap | MH394345
20D | ×8 | Colocasia esculenta | Araceae | 29284270 | 4.39 | 4 | 162350 | 155 bp gap | MH394430
20E | ×10 | Colocasia esculenta | Araceae | 25045978 | 3.77 | 5 | 162350 | 155 bp gap | MH394421
20A | ×12 | Colocasia esculenta | Araceae | 23560322 | 3.53 | 6 | 162414 | 155 bp gap | MH394419
20B | ×14 | Colocasia esculenta | Araceae | 24533656 | 3.68 | 4 | 162414 | 155 bp gap | MH394420
21D | ×8 | Pholidota chinensis | Orchidaceae | 21688990 | 3.25 | | | |
21E | ×10 | Pholidota chinensis | Orchidaceae | 20880950 | 3.13 | | | |
21A | ×12 | Pholidota chinensis | Orchidaceae | 23548018 | 3.53 | | | |
21B | ×14 | Pholidota chinensis | Orchidaceae | 27148284 | 4.07 | | | |
22D | ×8 | Otochilus porrectus | Orchidaceae | 15550512 | 2.33 | | | |
22E | ×10 | Otochilus porrectus | Orchidaceae | 22638772 | 3.4 | | | |
22A | ×12 | Otochilus porrectus | Orchidaceae | 21572196 | 3.23 | | | |
22B | ×14 | Otochilus porrectus | Orchidaceae | 28960858 | 4.34 | | | |
23D | ×8 | Indosasa sinica | Gramineae | 18793020 | 2.82 | 6 | 139848 | 18 bp gap | MH394381
23E | ×10 | Indosasa sinica | Gramineae | 17903432 | 2.69 | 10 | 139740 | Y | MH394382
23A | ×12 | Indosasa sinica | Gramineae | 19106404 | 2.87 | 9 | 139740 | Y | MH394379
23B | ×14 | Indosasa sinica | Gramineae | 19668682 | 2.95 | 8 | 139740 | Y | MH394380
24D | ×8 | Camellia gymnogyna | Theaceae | 17176632 | 2.58 | 4 | 156402 | Y | MH394405
24E | ×10 | Camellia gymnogyna | Theaceae | 24532196 | 3.68 | 7 | 156590 | Y | MH394406
24A | ×12 | Camellia gymnogyna | Theaceae | 26478224 | 3.97 | 4 | 156590 | Y | MH394403
24B | ×14 | Camellia gymnogyna | Theaceae | 29768770 | 4.47 | 4 | 156590 | Y | MH394404
25D | ×8 | Camellia sinensis var. assamica | Theaceae | 23291572 | 3.49 | 4 | 157028 | Y | MH394409
25E | ×10 | Camellia sinensis var. assamica | Theaceae | 18698814 | 2.8 | 5 | 157028 | Y | MH394410
25A | ×12 | Camellia sinensis var. assamica | Theaceae | 21788776 | 3.27 | 4 | 157029 | Y | MH394407
25B | ×14 | Camellia sinensis var. assamica | Theaceae | 26155342 | 3.92 | 8 | 157028 | Y | MH394408
26D | ×8 | Panicum incomtum | Gramineae | 16865102 | 2.53 | 61 | 139986 | Y | MH394350
26E | ×10 | Panicum incomtum | Gramineae | 20465942 | 3.07 | 21 | 139999 | Y | MH394351
26A | ×12 | Panicum incomtum | Gramineae | 20004364 | 3 | 18 | 139999 | Y | MH394348
26B | ×14 | Panicum incomtum | Gramineae | 20672642 | 3.1 | 17 | 139999 | Y | MH394349

Table 3 Assembly statistics of rDNAs for all specimens used in this study
Sample ID | PCR cycles | Species | Family | #contigs | Total assembly length (bp) | Mean coverage (×) | Reference genome | GenBank accession number
01A | ×12 | Manglietia fordiana | Magnoliaceae | 2 | 10343 | 406 | KJ414477_Chrysobalanus icaco | MH270473
02A | ×12 | Manglietia fordiana | Magnoliaceae | 2 | 8637 | 67 | | MH270474
03A | ×12 | Schisandra henryi | Schisandraceae | 1 | 15487 | 47 | | MH270475
04A | ×12 | Schisandra henryi | Schisandraceae | 1 | 10747 | 78 | | MH270476
05A | ×12 | Phoebe neurantha | Lauraceae | 2 | 7516 | 19 | | MH270477
06A | ×12 | Cinnamomum bodinieri | Lauraceae | 1 | 10926 | 32 | | MH270478
08A | ×12 | Holboellia latifolia | Lardizabalaceae | 1 | 9298 | 160 | | MH270479
09A | ×12 | Chloranthus erectus | Chloranthaceae | 1 | 9094 | 54 | | MH270480
10A | ×12 | Sarcandra glabra | Chloranthaceae | 1 | 9062 | 51 | | MH270481
11A | ×12 | Meconopsis racemosa | Papaveraceae | 1 | 7577 | 60 | | MH270482
12A | ×12 | Macleaya microcarpa | Papaveraceae | 1 | 12587 | 458 | | MH270483
13A | ×12 | Hodgsonia macrocarpa | Cucurbitaceae | 1 | 10172 | 567 | | MH270484
14A | ×12 | Malus yunnanensis | Rosaceae | 1 | 5953 | 249 | | MH270485
15A | ×12 | Elaeagnus loureirii | Elaeagnaceae | 1 | 7901 | 428 | | MH270486
16A | ×12 | Rhododendron rex subsp. fictolacteum | Ericaceae | 1 | 6825 | 380 | | MH270487
17A | ×12 | Swertia bimaculata | Gentianaceae | 1 | 9644 | 48 | | MH270488
18A | ×12 | Primula sinopurpurea | Primulaceae | 1 | 5539 | 15 | | MH270489
19A | ×12 | Paederia scandens | Araceae | | | | |
20A | ×12 | Colocasia esculenta | Araceae | 1 | 4399 | 5 | | MH270490
21A | ×12 | Pholidota chinensis | Orchidaceae | – | – | – | | –
22A | ×12 | Otochilus porrectus | Orchidaceae | | | | |
23A | ×12 | Indosasa sinica | Gramineae | 1 | 17306 | 93 | | MH270491
24A | ×12 | Camellia gymnogyna | Theaceae | | | | |
25A | ×12 | Camellia sinensis var. assamica | Theaceae | 1 | 11212 | 46 | | MH270493
26A | ×12 | Panicum incomtum | Gramineae | 1 | 8446 | 74 | | MH270494
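The base totals quoted in the Results equal the quoted read counts multiplied by 150, which suggests 150 bp reads. The read length is an inference from those figures, since the text does not state it. A small sketch of that arithmetic, under that assumption (names are illustrative):

```python
# Assumption: inferred from the reported totals, not stated in the text.
READ_LENGTH_BP = 150


def total_bases(read_count, read_length_bp=READ_LENGTH_BP):
    """Total sequenced bases for a library: number of reads times read length."""
    return read_count * read_length_bp


def gigabases(bases):
    """Express a base count in Gb, rounded to two decimals as in Table 2."""
    return round(bases / 1e9, 2)


# Lowest and highest read counts quoted in the Results
for reads in (15_877_478, 44_724_436):
    bases = total_bases(reads)
    print(f"{reads:,} reads -> {bases:,} bp ({gigabases(bases)} Gb)")
```

The same conversion reproduces the "Raw data (Gb)" value for library 01D in Table 2 (22,404,632 reads, 3.36 Gb).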
Table 4 BLAST results with extracted rbcL sequence against GenBank
Query rows: Sample ID | Query species (family) | PCR cycles | Gene | Length (bp) | Identification level
Hit rows (indented): Reference species_accession number (family) | Query coverage (%) | Identities (%)
01A | Manglietia fordiana (Magnoliaceae) | 12 | rbcL | 1428 | Family
    Magnolia cathcartii_JX280392.1 (Magnoliaceae) | 100 | 99
    Magnolia biondii_KY085894.1 (Magnoliaceae) | 100 | 99
    Michelia odora_JX280398.1 (Magnoliaceae) | 100 | 99
    Manglietia fordiana_L12658.1 (Magnoliaceae) | 98 | 100
02A | Manglietia fordiana (Magnoliaceae) | 12 | rbcL | 1428 | Family
    Magnolia cathcartii_JX280392.1 (Magnoliaceae) | 100 | 99
    Magnolia biondii_KY085894.1 (Magnoliaceae) | 100 | 99
    Michelia odora_JX280398.1 (Magnoliaceae) | 100 | 99
    Manglietia fordiana_L12658.1 (Magnoliaceae) | 98 | 100
03A | Schisandra henryi (Schisandraceae) | 12 | rbcL | 1428 | Genus
    Schisandra chinensis_KY111264.1 (Schisandraceae) | 100 | 99
    Schisandra chinensis_KU362793.1 (Schisandraceae) | 100 | 99
    Schisandra sphenanthera_L12665.2 (Schisandraceae) | 98 | 99
04A | Schisandra henryi (Schisandraceae) | 12 | rbcL | 1428 | Genus
    Schisandra chinensis_KY111264.1 (Schisandraceae) | 100 | 99
    Schisandra chinensis_KU362793.1 (Schisandraceae) | 100 | 99
    Schisandra sphenanthera_L12665.2 (Schisandraceae) | 98 | 99
05A | Phoebe neurantha (Lauraceae) | 12 | rbcL | 1428 | Family
    Phoebe omeiensis_KX437772.1 (Lauraceae) | 100 | 99
    Persea americana_KX437771.1 (Lauraceae) | 100 | 99
    Persea sp._JF966606.1 (Lauraceae) | 100 | 99
06A | Cinnamomum bodinieri (Lauraceae) | 12 | rbcL | 1428 | Family
    Phoebe bournei_KY346512.1 (Lauraceae) | 100 | 99
    Phoebe chekiangensis_KY346511.1 (Lauraceae) | 100 | 99
    Phoebe sheareri_KX437773.1 (Lauraceae) | 100 | 99
    Cinnamomum verum_KY635878.1 (Lauraceae) | 100 | 99
08A | Holboellia latifolia (Lardizabalaceae) | 12 | rbcL | 1428 | Family
    Akebia quinata_KX611091.1 (Lardizabalaceae) | 100 | 99
    Stauntonia hexaphylla_L37922.2 (Lardizabalaceae) | 99 | 99
    Akebia trifoliata_KU204898.1 (Lardizabalaceae) | 100 | 99
    Holboellia latifolia_L37918.2 (Lardizabalaceae) | 99 | 99
09A | Chloranthus erectus (Chloranthaceae) | 12 | rbcL | 1428 | Genus
    Chloranthus spicatus_EF380352.1 (Chloranthaceae) | 100 | 100
    Chloranthus japonicus_KP256024.1 (Chloranthaceae) | 100 | 99
    Chloranthus spicatus_AY236835.1 (Chloranthaceae) | 98 | 99
    Chloranthus erectus_AY236834.1 (Chloranthaceae) | 98 | 99
10A | Sarcandra glabra (Chloranthaceae) | 12 | rbcL | 1428 | Family
    Chloranthus spicatus_EF380352.1 (Chloranthaceae) | 100 | 99
    Chloranthus japonicus_KP256024.1 (Chloranthaceae) | 100 | 98
    Chloranthus nervosus_AY236841.1 (Chloranthaceae) | 97 | 98
    Sarcandra glabra_HQ336522.1 (Chloranthaceae) | 89 | 100
11A | Meconopsis racemosa (Papaveraceae) | 12 | rbcL | 1428 | Genus
    Meconopsis horridula_JX087717.1 (Papaveraceae) | 97 | 100
    Meconopsis horridula_JX087712.1 (Papaveraceae) | 97 | 99
    Meconopsis delavayi_JX087688.1 (Papaveraceae) | 97 | 99
12A | Macleaya microcarpa (Papaveraceae) | 12 | rbcL | 1428 | Family
    Macleaya microcarpa_FJ626612.1 (Papaveraceae) | 97 | 99
    Macleaya cordata_U86629.1 (Papaveraceae) | 97 | 99
    Coreanomecon hylomeconoides_KT274030.1 (Papaveraceae) | 100 | 98
13A | Hodgsonia macrocarpa (Cucurbitaceae) | 12 | rbcL | 1449 | Family
    Cucumis sativus var. hardwickii_KT852702.1 (Cucurbitaceae) | 100 | 98
    Cucumis sativus_KX231330.1 (Cucurbitaceae) | 100 | 98
    Cucumis sativus_KX231329.1 (Cucurbitaceae) | 100 | 98
14A | Malus yunnanensis (Rosaceae) | 12 | rbcL | 1428 | Family
    Cotoneaster franchetii_KY419994.1 (Rosaceae) | 100 | 99
    Vauquelinia californica_KY419925.1 (Rosaceae) | 100 | 99
    Cotoneaster horizontalis_KY419917.1 (Rosaceae) | 100 | 99
    Malus doumeri_KX499861.1 (Rosaceae) | 100 | 99
15A | Elaeagnus loureirii (Elaeagnaceae) | 12 | rbcL | 1428 | Order
    Elaeagnus macrophylla_KP211788.1 (Elaeagnaceae) | 100 | 99
    Elaeagnus sp._KY420020.1 (Elaeagnaceae) | 100 | 99
    Toricellia angulata_KX648359.1 (Cornaceae) | 99 | 99
16A | Rhododendron rex subsp. fictolacteum (Ericaceae) | 12 | rbcL | 1428 | Family
    Rhododendron simsii_GQ997829.1 (Ericaceae) | 100 | 99
    Rhododendron ponticum_KM360957.1 (Ericaceae) | 98 | 99
    Epacris sp._L01915.2 (Ericaceae) | 97 | 99
17A | Swertia bimaculata (Gentianaceae) | 12 | rbcL | 1443 | Family
    Swertia mussotii_KU641021.1 (Gentianaceae) | 98 | 99
    Gentianopsis ciliata_KM360802.1 (Gentianaceae) | 97 | 98
    Gentianella rapunculoides_Y11862.1 (Gentianaceae) | 97 | 99
18A | Primula sinopurpurea (Primulaceae) | 12 | rbcL | 1428 | Genus
    Primula poissonii_KX668176.1 (Primulaceae) | 100 | 99
    Primula chrysochlora_KX668178.1 (Primulaceae) | 100 | 99
    Primula poissonii_KF753634.1 (Primulaceae) | 100 | 99
19A | Paederia scandens (Araceae) | 12 | rbcL | 1443 | Family
    Pothos scandens_AM905732.1 (Araceae) | 96 | 99
    Pedicellarum paiei_AM905733.1 (Araceae) | 96 | 99
    Pothoidium lobbianum_AM905734.1 (Araceae) | 96 | 99
20A | Colocasia esculenta (Araceae) | 12 | rbcL | 1443 | Species
    Colocasia esculenta_JN105690.1 (Araceae) | 100 | 100
    Colocasia esculenta_JN105689.1 (Araceae) | 100 | 99
    Pinellia pedatisecta_KT025709.1 (Araceae) | 100 | 99
21A | Pholidota chinensis (Orchidaceae) | 12 | rbcL | – | –
22A | Otochilus porrectus (Orchidaceae) | 12 | rbcL | – | –
23A | Indosasa sinica (Poaceae) | 12 | rbcL | 1434 | Family
    Pleioblastus maculatus_JX513424.1 (Poaceae) | 100 | 100
    Oligostachyum shiuyingianum_JX513423.1 (Poaceae) | 100 | 100
    Indosasa sinica_JX513422.1 (Poaceae) | 100 | 100
24A | Camellia gymnogyna (Theaceae) | 12 | rbcL | 1428 | Family
    Camellia szechuanensis_KY406778.1 (Theaceae) | 100 | 100
    Pyrenaria menglaensis_KY406747.1 (Theaceae) | |
    Camellia luteoflora_KY626042.1 (Theaceae) | |
25A | Camellia sinensis var. assamica (Theaceae) | 12 | rbcL | 1428 | Family
    Camellia szechuanensis_KY406778.1 (Theaceae) | 100 | 100
    Pyrenaria menglaensis_KY406747.1 (Theaceae) | 100 | 100
    Camellia luteoflora_KY626042.1 (Theaceae) | 100 | 100
    Camellia sinensis var. assamica_JQ975030.1 (Theaceae) | 100 | 100
26A | Panicum incomtum (Poaceae) | 12 | rbcL | 1434 | Family
    Lecomtella madagascariensis_HF543599.2 (Poaceae) | 99 | 99
    Chasechloa madagascariensis_KX663838.1 (Poaceae) | 99 | 99
    Amphicarpum muhlenbergianum_KU291489.1 (Poaceae) | 99 | 99
    Panicum virgatum_HQ731441.1 (Poaceae) | 100 | 99

Practical benefits
A primary motivation for this study was our own experiences with suboptimal DNA recovery from herbarium specimens using Sanger sequencing, coupled with difficulty in accessing fresh material of some species. The success of this method using only small amounts of starting tissue from herbarium specimens is an important step to addressing these challenges. It makes sequencing type specimens a realistic proposition, which can further serve to integrate genetic data into the existing taxonomic framework. A second practical benefit is that field work is often not possible in some geographical regions where past collections have been made. Political instability and/or general inaccessibility can preclude current collecting activities, and where habitats have been highly degraded or destroyed, the species concerned may simply be no longer available for collection. Mining herbaria to obtain sequences from previously collected material can circumvent this problem. Thirdly, sequencing plastid genomes and rDNA arrays from specimens that are many decades old enables a baseline to be established for haplotype and ribotype diversity. This baseline can then be used to assess evidence for genetic diversity loss or change due to recent population declines or environmental change.

Table 5 BLAST results with extracted ITS sequence against GenBank
Sample ID | Query species (family) | PCR cycles | Gene | Length (bp) | Reference species_accession number (family) | Query coverage (%) | Identities (%)
01A | Manglietia fordiana (Magnoliaceae) | 12 | ITS | 369 | Magnolia virginiana_DQ499097.1 (Magnoliaceae) | 100 | 95
02A | Manglietia fordiana (Magnoliaceae) | 12 | ITS | 349 | Magnolia virginiana_DQ499097.1 (Magnoliaceae) | 100 | 95
03A | Schisandra henryi (Schisandraceae) | 12 | ITS | 676 | Schisandra pubescens_AF263436.1 (Schisandraceae) | 99 | 100
04A | Schisandra henryi (Schisandraceae) | 12 | ITS | 676 | Schisandra pubescens_JF978533.1 (Schisandraceae) | 99 | 99
05A | Phoebe neurantha (Lauraceae) | 12 | ITS | 518 | Phoebe neurantha_FM957847.1 (Lauraceae) | 100 | 99
06A | Cinnamomum bodinieri (Lauraceae) | 12 | ITS | 603 | Cinnamomum micranthum f. kanehirae_KP218515.1 (Lauraceae) | 100 | 99
08A | Holboellia latifolia (Lardizabalaceae) | 12 | ITS | 677 | Holboellia angustifolia subsp. angustifolia_AY029790.1 (Lardizabalaceae) | 100 | 99
09A | Chloranthus erectus (Chloranthaceae) | 12 | ITS | 663 | Chloranthus erectus_AF280410.1 (Chloranthaceae) | 99 | 99
10A | Sarcandra glabra (Chloranthaceae) | 12 | ITS | 667 | Sarcandra glabra_KWNU91871 (Chloranthaceae) | 100 | 100
11A | Meconopsis racemosa (Papaveraceae) | 12 | ITS | 671 | Meconopsis racemosa_JF411034.1 (Papaveraceae) | 100 | 99
12A | Macleaya microcarpa (Papaveraceae) | 12 | ITS | 612 | Macleaya cordata_AY328307.1 (Papaveraceae) | 99 | 89
13A | Hodgsonia macrocarpa (Cucurbitaceae) | 12 | ITS | 614 | Hodgsonia heteroclita_HE661302.1 (Cucurbitaceae) | 100 | 98
14A | Malus yunnanensis (Rosaceae) | 12 | ITS | 596 | Malus prattii_JQ392445.1 (Rosaceae) | 99 | 99
15A | Elaeagnus loureirii (Elaeagnaceae) | 12 | ITS | 649 | Elaeagnus macrophylla_JQ062495.1 (Elaeagnaceae) | 99 | 99
16A | Rhododendron rex subsp. fictolacteum (Ericaceae) | 12 | ITS | 646 | Rhododendron rex subsp. fictolacteum_KM605995.1 (Ericaceae) | 100 | 100
17A | Swertia bimaculata (Gentianaceae) | 12 | ITS | 626 | Swertia bimaculata_JF978819.2 (Gentianaceae) | 100 | 99
18A | Primula sinopurpurea (Primulaceae) | 12 | ITS | 631 | Primula melanops_JF978004.1 (Primulaceae) | 100 | 99
19A | Paederia scandens (Araceae) | 12 | ITS | – | – | – | –
20A | Colocasia esculenta (Araceae) | 12 | ITS | 552 | Colocasia esculenta_AY081000.1 (Araceae) | 99 | 99
21A | Pholidota chinensis (Orchidaceae) | 12 | ITS | – | – | – | –
22A | Otochilus porrectus (Orchidaceae) | 12 | ITS | – | – | – | –
23A | Indosasa sinica (Poaceae) | 12 | ITS | 604 | Oligostachyum sulcatum_EU847131.1 (Poaceae) | 98 | 99
24A | Camellia gymnogyna (Theaceae) | 12 | ITS | – | – | – | –
25A | Camellia sinensis var. assamica (Theaceae) | 12 | ITS | 645 | Camellia sinensis var. sinensis_FJ004871.1 (Theaceae) | 99 | 99
26A | Panicum incomtum (Poaceae) | 12 | ITS | 795 | Chasechloa egregia_LT593967.1 (Poaceae) | 100 | 98

Conclusions
This study confirms the practical and routine application of genome skimming for recovering sequences from plastid genomes and rDNA from small amounts of starting tissue from preserved herbarium specimens.
The ongoing development of new sequencing technologies is creating a fundamental shift in the ease of recovery of nucleotide sequences, enabling 'new uses' for the hundreds of millions of existing herbarium specimens [1, 10, 14, 16, 29]. This shift from Sanger sequencing to NGS approaches has now firmly moved herbarium specimens into the genomic era.

Authors' contributions
BY and DZL organized the project. CXZ performed the experiments, analyzed the data, and wrote the paper; PMH wrote and edited the paper; JY, ZSH, and ZRZ extracted DNA and prepared the libraries. All authors read and approved the final manuscript.

Author details
Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China. Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, UK.

Acknowledgements
We are very grateful to Mr. Wei Fang (Kunming Institute of Botany, Chinese Academy of Sciences) for kindly providing the materials. We would like to thank Ms. Chun-Yan Lin and Mr. Shi-Yu Lv (Kunming Institute of Botany, Chinese Academy of Sciences) for their help with the experiments.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
The datasets supporting the conclusions of this article are available in the NCBI SRA repository, SRP142448 and hyperlink to datasets in http://www.ncbi.nlm.nih.gov/home/submit.shtml.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Funding
This work was funded by a program for basic scientific and technological data acquisition of the Ministry of Science and Technology of China (Grant No. 2013FY112600), the Large-scale Scientific Facilities of the Chinese Academy of Sciences (Grant No: 2017-LSF-GBOWS-02), and the Biodiversity Conservation Strategy Program of the Chinese Academy of Sciences (ZSSD-011).

Fig. 2 PCR cycles with raw data, contigs, and assembly length
Fig. 3 Plastome coverage versus C value (pg DNA per 1C) of all samples assembled in this study

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 27 November 2017 Accepted: 20 April 2018

References
1. Särkinen T, Staats M, Richardson JE, Cowan RS, Bakker FT. How to open the treasure chest? Optimizing DNA extraction from herbarium specimens. PLoS ONE. 2012;7(8):e43808.
2. Hebert PDN, Hollingsworth PM, Hajibabaei M. From writing to reading the encyclopedia of life. Philos Trans R Soc B. 2016;371(1702):20150321.
3. Kistler L, Ware R, Smith O, Collins M, Allaby RG. A new model for ancient DNA decay based on paleogenomic meta-analysis. Nucleic Acids Res. 2017;45(11):6310–20.
4. Hall LM, Willcox MS, Jones DS. Association of enzyme inhibition with methods of museum skin preparation. Biotechniques. 1997;22(5):928–34.
5. Hedmark E, Ellegren H. Microsatellite genotyping of DNA isolated from claws left on tanned carnivore hides. Int J Legal Med. 2005;119(6):370–3.
6. Tang EPY. Path to effective recovering of DNA from formalin-fixed biological samples in natural history collections: workshop summary. Washington: The National Academies Press; 2006.
7. Groombridge JJ, Jones CG, Bruford MW, Nichols RA. ‘Ghost’ alleles of the Mauritius kestrel. Nature. 2000;403(6770):616.
8. Stiller M, Green RE, Ronan M, Simons JF, Du L, He W, Egholm M, Rothberg JM, Keates SG, Ovodov ND, Antipina EE, Baryshnikov GF, Kuzmin YV, Vasilevski AA, Wuenschell GE, Termini J, Hofreiter M, Jaenicke-Després V, Pääbo S. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc Natl Acad Sci USA. 2006;103(37):13578–84.
9. Kuzmina ML, Braukmann TWA, Fazekas AJ, Graham SW, Dewaard SL, Rodrigues A, Bennett BA, Dickinson TA, Saarela JM, Catling PM, Newmaster SG, Percy DM, Fenneman E, Lauron-Moreau A, Ford B, Gillespie L, Subramanyam R, Whitton J, Jennings L, Metsger D, Warne CP, Brown A, Sears E, Dewaard JR, Zakharov EV, Hebert PDN. Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada. Appl Plant Sci. 2017;5(12):1700079.
10. Smith O, Palmer SA, Gutaker R, Allaby RG. An NGS approach to archaeobotanical museum specimens as genetic resources in systematics research. In: Olson PD, Hughes J, Cotton JA, editors. Next generation systematics. Cambridge: Cambridge University Press; 2016. p. 282–304.
11. Straub SCK, Parks M, Weitemier K, Fishbein M, Cronn RC, Liston A. Navigating the tip of the genomic iceberg: next-generation sequencing for plant systematics. Am J Bot. 2012;99(2):349–64.
12. Coissac E, Hollingsworth PM, Lavergne S, Taberlet P. From barcodes to genomes: extending the concept of DNA barcoding. Mol Ecol. 2016;25(7):1423–8.
13. Hollingsworth PM, Li DZ, van der Bank M, Twyford AD. Telling plant species apart with DNA: from barcodes to genomes. Philos Trans R Soc B. 2016;371(1702):20150338.
14. Bakker FT, Lei D, Yu JY, Mohammadin S, Wei Z, van de Kerke S, Gravendeel B, Nieuwenhuis M, Staats M, Alquezar-Planas DE, Holmer R. Herbarium genomics: plastome sequence assembly from a range of herbarium specimens using an Iterative Organelle Genome Assembly pipeline. Biol J Linn Soc. 2016;117(1):33–43.
15. Staats M, Erkens RHJ, van de Vossenberg B, Wieringa JJ, Kraaijeveld K, Stielow B, Geml J, Richardson JE, Bakker FT. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLoS ONE. 2013;8(7):e69189.
16. Van de Paer C, Hong-Wa C, Jeziorski C, Besnard G. Mitogenomics of Hesperelaea, an extinct genus of Oleaceae. Gene. 2016;594(2):197–202.
17. Zedane L, Hong-Wa C, Murienne J, Jeziorski C, Baldwin BG, Besnard G. Museomics illuminate the history of an extinct, paleoendemic plant lineage (Hesperelaea, Oleaceae) known from an 1875 collection from Guadalupe Island, Mexico. Biol J Linn Soc. 2015;117(1):44–57.
18. Besnard G, Christin PA, Malé PJG, Lhuillier E, Lauzeral C, Coissac E, Vorontsova MS. From museums to genomics: old herbarium specimens shed light on a C3 to C4 transition. J Exp Bot. 2014;65(22):6711–21.
19. Sproul JS, Maddison DR. Sequencing historical specimens: successful preparation of small specimens with low amounts of degraded DNA. Mol Ecol Resour. 2017;17:1183–201.
20. Kanda K, Pflug JM, Sproul JS, Dasenko MA, Maddison DR. Successful recovery of nuclear protein-coding genes from small insects in museums using Illumina sequencing. PLoS ONE. 2015;10:e0143929.
21. Blaimer BB, Lloyd MW, Guillory WX, Brady SG. Sequence capture and phylogenetic utility of genomic ultraconserved elements obtained from pinned insect specimens. PLoS ONE. 2016;11:e0161531.
22. Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc. 2010. https://doi.org/10.1101/pdb.prot5448.
23. Patel RK, Jain M. NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE. 2012;7(2):e30619.
24. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77.
25. Yang JB, Li DZ, Li HT. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs. Mol Ecol Resour. 2014;14(5):1024–31.
26. Wyman SK, Jansen RK, Boore JL. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004;20(17):3252–5.
27. Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33(Suppl_2):W686–9.
28. Li XW, Yang Y, Henry RJ, Rossetto M, Wang YT, Chen SL. Plant DNA barcoding: from gene to genome. Biol Rev. 2015;90(1):157–66.
29. Hart ML, Forrest LL, Nicholls JA, Kidner CA. Retrieval of hundreds of nuclear loci from herbarium specimens. Taxon. 2016;65(5):1081–92.

Journal

Plant Methods, Springer Journals

Published: Jun 5, 2018
