Draft genome of a high value tropical timber tree, Teak (Tectona grandis L. f): insights into SSR diversity, phylogeny and conservation

Teak (Tectona grandis L. f.) is one of the most precious benchmark tropical hardwoods, valued for its durability, strength and visual appeal. Natural teak populations harbour a variety of characteristics that determine their economic, ecological and environmental importance. Sequencing of the whole nuclear genome of teak provides a platform for functional analyses and development of genomic tools in applied tree improvement. A draft genome of 317 Mb was assembled at 151× coverage and 36,172 protein-coding genes were annotated. Approximately 11.18% of the genome was repetitive. Microsatellites or simple sequence repeats (SSRs) are undoubtedly the most informative markers in genotyping, genetics and applied breeding applications. We identified 182,712 SSRs at the whole genome level, of which 170,574 were perfect SSRs; 16,252 perfect SSRs showed in silico polymorphisms across six genotypes, suggesting their promising use in genetic conservation and tree improvement programmes. Genomic SSR markers developed in this study have high potential in advancing conservation and management of teak genetic resources. Phylogenetic studies confirmed the taxonomic position of the genus Tectona within the family Lamiaceae. Interestingly, estimation of divergence time inferred a Miocene origin of the genus Tectona, around 21.4508 million years ago.

Key words: teak, genome sequencing, SSRs, phylogeny, divergence

© The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com

Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy013/5003450 by Ed 'DeepDyve' Gillespie user on 12 July 2018

Genome sequencing of Teak

1. Introduction

Teak (Tectona grandis L. f.; 2n = 2x = 36), belonging to the mint family Lamiaceae, is one of the world's most highly valued tropical timber species and occurs naturally in India, Laos, Myanmar and Thailand.[1–4] The timber is highly valued because of its extreme durability, strength and stability, as well as its resistance to pests, chemicals and water. Quinones and other extractives found in abundance in teak wood are responsible for its anti-termite and anti-fungal properties, conferring the longevity of its timber. Thus, the wood is used in building ships, railway carriages, sleepers, construction, furniture, veneer and carving. Owing to its admirable timber qualities and aesthetic properties (Fig. 1), teak has been successfully established as pure plantations in India and elsewhere since 1850. The recent log export ban imposed by Myanmar has resulted in a steep rise in international prices of plantation grown teak from Latin America and Africa, leading to expansion of the plantation area. The estimated planted area of teak is about 4.25–6.89 million ha, with over 1.7 million ha in India. Though the share of teak is <2% of tropical round wood production, its high value continuously attracts new planters. At the same time, natural populations are continuously diminishing due to illegal logging, anthropogenic pressures and climate change. A recent study on the effect of climate change in teak expresses the risks of biological invasion into teak habitats and recommends conservation of crucial teak growing areas and suitable management planning. The population structure of teak across natural and introduced locations reveals that the landraces in introduced locations have comparatively narrow genetic diversity, thus demanding exploration of the genetic diversity of natural provenances and their conservation.

Teak has several intrinsic genetic qualities that allow its genetic improvement for timber production. Its wide and discontinuous natural distribution across varying edaphic and climatic conditions in India offers enormous potential for capturing adaptive genetic variation for genetic improvement. As a first step in the genetic improvement programme at a global level, a series of seventy five international provenance trials co-ordinated by the Danish International Development Agency were established during 1973–76 across sixteen countries. Evaluation of the provenances indicated wide variation for survival, growth rate, stem form, flowering, fruit yield and wood characteristics.[9,10] The results of genetic improvement in teak showed an overall positive trend; however, the possible existence of non-additive genetic control for economically important traits in seed progeny generated high genetic variability even within a family.[11,12] Further, it was suggested that selection of teak stem size can be carried out at the age of 3 years, wherein indirect selection on flowering age will improve forking height. Clonal propagation through budding, rooting of cuttings and in vitro propagation has facilitated tree improvement and deployment of superior performers for commercial cultivation towards increasing timber yield. Globally, although clonal seed orchards (CSOs) were established for production of quality seed stock, the reproductive fitness and success of CSOs was a chronic problem largely attributed to asynchronous flowering.

Economic importance, concerns for conservation of natural populations and the increase in plantation area demand an understanding of the genetic basis of the economic traits in teak. Hence, as for many other forest plantation species (e.g. Pinus, Populus and Eucalyptus), genetic and genomic resources in teak need to be developed. Genetic diversity in natural and introduced populations of teak has been assessed with markers such as random amplified polymorphic DNA, amplified fragment length polymorphism[17,18] and simple sequence repeat markers (SSRs) or microsatellites.[19,20] These studies generated information on the population genetic structure of natural teak populations. Indian teak is genetically very distinct from Thai and Indonesian provenances and African landraces.[17,19] Knowledge generated in these studies is highly useful for implementing conservation programmes in teak to improve sustainable management of teak forests. Recently, transcriptomes of secondary wood and of the vegetative to flowering transition stage were developed, leading to identification of genes involved in lignification, secondary metabolite production and flower formation. Although numerous studies focused on various life history traits exist in teak, a comprehensive understanding of the complete genome information is still lacking. Next generation sequencing-based whole genome sequencing (NGS-WGS) yields more information on genomic scans of polymorphism to precisely estimate various population genetic parameters, including demographic history. In this context, to enrich genomic resources in teak, the present study reports the whole genome of teak sequenced on the Illumina HiSeq 2000 NGS platform, followed by de novo contig assembly, gene annotation and subsequent discovery of SSR polymorphism.

2. Materials and methods

2.1. Plant material and genomic DNA sample preparation

Open-pollinated seeds were collected from six dominant trees, one each from a provenance covering the entire latitudinal range of natural distribution of teak in India (Table 1). Seedlings were raised with family identity and one vigorously growing seedling randomly chosen from each family was used in this study.
Genomic DNA was extracted from fresh, young leaves using the standard CTAB method and was purified using the DNeasy Plant Mini kit (Qiagen, USA). The quantity and quality of the genomic DNA were assessed using Nanodrop 2000 (Thermo Fisher Scientific, USA), Qubit (Thermo Fisher Scientific, USA) and agarose gel electrophoresis.

Figure 1. Cross section of teak wood showing major features: (a) pith; (b) heart wood; (c) sap wood; (d) growth ring; and (e) medullary rays.

Y. Ramasamy et al.

Table 1. Details of plant materials used in this study

Accession ID  Sample code  Name of the provenance  State      Latitude  Longitude  Altitude (m)  Rainfall (mm)
1             NR           Nilambur                Kerala     11°17′ N  76°19′ E   49            2,600
2             AU           Arienkavu               Kerala     8°96′ N   77°14′ E   240           2,600
3             WR           Walayar                 Kerala     10°52′ N  76°46′ E   216           1,500
4             DI           Dandeli                 Karnataka  15°07′ N  74°35′ E   510           2,200
5             HI           Hojai                   Assam      26°39′ N  92°36′ E   69            1,750
6             TP           Topslip                 Tamilnadu  10°26′ N  76°50′ E   640           1,350

2.2. Library preparation and genome sequencing

WGS was performed using the Illumina HiSeq 2000 platform and the Oxford Nanopore Technologies MinION device by Genotypic Technology, Bengaluru, India, in accordance with standard protocols. Accession 2 was selected for the generation of a high quality reference genome assembly. Accessions 1, 3, 4, 5 and 6 were subjected to low coverage genome sequencing to identify polymorphic SSRs. In the case of accession 2, one paired end (PE) (150 bp × 2) library of size 300–700 bp, two mate pair (MP) libraries (2–4 and 4–6 kb fragments) and one nanopore library with genomic DNA (2 µg) were prepared for sequencing. On the Illumina HiSeq 2000 platform, one lane of the flow cell was used for each sequencing library. Nanopore sequencing was performed using R9.4 flow cells on a MinION Mk 1B device (Oxford Nanopore) with the MinKNOW software (versions 1.0.5–1.5.12) and base calling was performed using Albacore 1.1.0 (Oxford Nanopore). Template reads were exported as FASTA using poretools version 0.6. In the case of the other five accessions (1, 3, 4, 5 and 6), one PE library each with a size of 300–700 bp was sequenced at 15× coverage on the Illumina HiSeq 2000 platform. The sequence data is uploaded in the genome database of GenBank (Project id: PRJNA374940). The assembled genome, protein sequences and their annotation, GO and pathway information are available at the web link https://biit.cs.ut.ee/supplementary/WGSteak/.

2.3. De novo genome assembly

The Illumina PE raw reads were filtered using FastQC and the raw reads were processed by the in-house (Genotypic Technology, Bengaluru, India) ABLT script for removal of low-quality bases and adapters. The MP reads were processed using the Platanus[24] internal trimmer for adapters and low-quality regions towards the 3′-end. The processed PE reads along with MP and nanopore reads were used for contig generation using the MaSuRCA v 3.2.2 de novo assembler.[25] The following settings were used in the MaSuRCA assembler: GRAPH_KMER_SIZE = auto, LIMIT_JUMP_COVERAGE = 300, JF_SIZE = 38000000000, DO_HOMOPOLYMER_TRIM = 1. Scaffolding of the assembled contigs was performed using SSPACE v 2.0.5[26] with processed PE and MP reads, followed by gap filling using Gap Closer v 1.12.[27] The genome size was estimated automatically during the read computing stage, which utilized both the Illumina and Nanopore reads. Similarly, the low depth Illumina reads generated for five accessions of teak were assembled using accession 2 as reference. The sequenced data was uploaded to the Genome database of GenBank (Project id: PRJNA421422).

2.4. Genome annotation

For a functional overview of the draft genome, assembled scaffolds were converted to FASTA formatted sequences and hard masked by the RepeatMasker tool (RepeatMasker Open-3.0; www.repeatmasker.org (10 November 2017, date last accessed)). Repeats of Arabidopsis thaliana were used as reference for genome masking. Gene prediction was carried out using the Augustus 3.0.2 programme and predicted proteins were searched against the Uniprot non-redundant plant protein database (Taxonomy = Viridiplantae) with the BlastX algorithm with an e-value of e-10 for gene ontology and annotation. Pathway annotation was performed by mapping the sequences obtained from Blast2GO to the contents of the KEGG Automatic Annotation Server (KAAS; http://www.genome.jp/kegg/kaas/ (10 November 2017, date last accessed)). The list of eudicot plants used as reference organisms for pathway identification in the KAAS server is given in Supplementary Table S1.

2.5. Identification of SSRs and detection of polymorphism

FASTA formatted scaffolds of teak were analysed for frequency and density of SSRs using the Perl script MIcroSAtellite (MISA; http://pgrc.ipk-gatersleben.de/misa/ (20 November 2017, date last accessed)). Initially, SSRs with motifs of 1–6 nucleotides were identified, with the minimum number of repeat units defined as 10 for mononucleotides, 6 for dinucleotides, 5 for trinucleotides, 4 for tetranucleotides, and 3 each for penta- and hexanucleotides. Compound SSRs were defined as two SSRs interrupted by 100 bases.

To design primers flanking the microsatellite loci, two interface Perl script modules were used to interchange data between MISA and the primer designing software Primer 3. The SSR-containing scaffolds were used to design the primers with the following parameters: primer length 18–25 bp, with 20 bp as optimum; primer GC content 30–70%, with an optimum value of 50%; primer Tm 57–63 °C; and product size 100–300 bp.

Polymorphic SSRs across five samples were analysed using accession 2 as reference. The polymorphic SSR retrieval tool (PSR), comprising two modules (PSR_read_retrieval and PSR_poly_finder), was deployed to detect SSR length polymorphisms of perfect repeats from NGS data. Note that the PSR tool identifies length polymorphisms in perfect microsatellites only. It also filters out all reads that match twice or more on the reference sequence, as well as non-overlapping paired-end reads that are aligned on the same microsatellite locus. The minimum number of supporting reads and the read depth were fixed at 10 and 30, respectively. This process detects polymorphic SSRs based on the availability of unique left and right border sequences and on complete coverage of the SSR region in the sequence data.

The polymorphic SSR data generated by the PSR software were validated through gel electrophoresis. In total, 10 SSRs representing di-, tri- and tetranucleotide motifs were randomly chosen and amplified in 10 randomly selected teak trees from the Topslip provenance (Latitude: 10°29′09.5″N; Longitude: 75°50′03.8″E; Altitude: 736 m) (Supplementary Table S2). PCR amplifications were set up in a 10 µl volume containing 5 ng of template DNA, 2.5 mM MgCl2, 2.5 µl of 10× PCR buffer, 0.5 mM of primer, 0.5 U Taq DNA polymerase, and 2.5 mM of dNTPs. The PCR cycling profile was programmed to 94 °C

Table 2.
Details on 10 SSR markers used for amplification of teak germplasm

SSR code  SSR motif  Forward primer (5′-3′)    Reverse primer (5′-3′)     Annealing temperature (°C)  Number of alleles  Product size (bp)
IFT83     (AG)       AATTGGCATAAAGCGTGCTACT    CGCACGTCCTATTTTGGTTTAT     54                          5                  312–393
IFT821    (AC)       CCCCAATTATGTCAACCGACT     GGCATTATCTAAGATCGCAAGG     53.9                        3                  331–350
IFT63     (ATG)      CCCAAAGCGAATAATATCCTAC    CATGACTTGTTCGATGGGCTAAT    54                          3                  250–275
IFT168    (TCT)      ATCTTCAGCAGAGGAGGCTATG    GTGCCCTTTTCTCTCTTCTTCA     55                          4                  287–306
IFT479b   (GGA)      GTGAAGATTCGGGTATGGAGAG    TACTCCCAGATTTCCCAATCAC     55                          3                  331–343
IFT382    (TATG)     TACTCATCACTGTCCCCAGTTG    GAACGGGAATCTAGAGTTGTGG     56                          3                  337–350
IFT28     (AAAG)     CAGCCTCTGCATGTCAAATAAA    TTAGAGCTGGATATGCCATTGA     53.6                        3                  381–393
IFT14     (TTCT)     TGTGGTATTGGACCATCTGAAA    GGTAACCCACCAACAAATATGC     54                          5                  265–278
IFT3      (GAAAG)    TTCCACCTACTGGTTAAGGAAC    ATGGCTTACCAATTTACCAAACC    54                          1                  330
IFT777    (TCAGG)    TACTAACCGGAAGAGGGAAACC    TGTCGCTATGGACAGTTCATCT     56                          4                  312–343

for 5 min, 35 cycles at 94 °C for 45 s, 58 °C (annealing temperature varied for each locus) for 45 s (Table 2), 72 °C for 45 s, and a final extension at 72 °C for 10 min. Banding pattern was visualised by silver staining after denaturing polyacrylamide gel (5%) electrophoresis.

2.6. Phylogenetic tree construction

Plastid gene sequences of psbB, rbcL, psaA and ycf2 from 16 species of the family Lamiaceae were accessed from the Genbank public domain (https://www.ncbi.nlm.nih.gov/genbank (13 February 2018, date last accessed)) for constructing the phylogenetic tree, with Olea europaea (Oleaceae) as the out group (Supplementary Table S3). Sequences were aligned using the multiple sequence alignment tool implemented in ClustalX ver. 2.0. The sequences were manually refined in BioEdit ver. 7.0.9. Phylogenetic analyses were performed for concatenated sequences of plastid gene regions. JModeltest ver. 2.1.7 was used to choose the appropriate model of sequence evolution according to the Akaike information criterion (AIC). Bayesian inference analysis was performed in MrBayes ver. 3.2.6. The Markov chain Monte Carlo algorithm was run for ten million generations, over four chains each, sampled every 1,000 generations. The estimated sample size was checked using Tracer ver. 1.4 (http://beast.bio.ed.ac.uk/Tracer (22 February 2018, date last accessed)). The first 25% of the sampled trees was discarded as burn-in. The phylogenetic tree from MrBayes ver. 3.2.6 was visualized in FigTree ver. 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/ (22 February 2018, date last accessed)).

2.7. Divergence time estimation

The estimation of divergence time was performed using the Bayesian approach implemented in the BEAST ver. 2.4.4 programme. The Bayesian approach was deployed to estimate the divergence time of the genus Tectona with respect to the other subfamilies and genera within the family Lamiaceae. The HKY model was used based on the result of AIC from JModeltest under an uncorrelated lognormal relaxed clock model. The Yule speciation model was used as tree prior. Two calibration points (57.6 million years ago, Mya, for Nepetoideae and 23.9 Mya for Lamioideae) were used based on previous reports to determine specific node priors and lognormal distributions. Markov Chain was run for 10 generations, while every 1,000 generations were sampled. The chronograms shown were calculated using the median clade credibility tree plus 95% confidence intervals. Results were analysed using Tracer ver. 1.4 to assess the convergence statistics of the sequences. The effective sample sizes for all parameters and the tree files from the four runs of BEAST were combined using LogCombiner ver. 2.4.4. Twenty percent of the trees were removed as burn-in and the resulting trees were summarized with Treeannotator ver. 2.4.4. Finally, the summarized single trees were visualized in FigTree ver. 1.4.2.

3. Results and discussion

Several members of the order Lamiales are well known for their secondary metabolites of medicinal value, and 17 species of the family were sequenced for the whole genome (https://www.ncbi.nlm.nih.gov/genome/?term=Lamiales (13 February 2018, date last accessed)) due to their economic importance. However, the woody species teak, one of the world's premier timber species cultivated across 65 countries, has only 3,269 nucleotide accessions including 6 ESTs available so far in the public domain (7 November 2017, date last accessed). Owing to its commercial importance, this study was undertaken to unravel the genome structure to facilitate conservation and improvement of teak genetic resources. The only available genomic resource in teak is the de novo assembly with transcriptome in 12- and 60-year-old trees to generate unigenes related to lignin biosynthesis. All the earlier gene assemblies were based on short-read technology. This work on draft genome assembly using long read technology like MP and MinION nanopore sequencing would provide an excellent resource to comprehend genome structure, genetic variation and conservation.

3.1. De novo assembly and characterization of genomic sequences

The study has generated a high quality reference genome for accession 2, which was assembled from PE, MP and nanopore library sequences. The numbers of raw and processed reads are summarized in Table 3. High depth (109×) PE sequencing provided a global overview of the teak genome with over 137.2 million PE sequences. After suitable filtration, a total of 128.2 million sequences, representing 93.43% of the raw reads, were obtained. Two different MP libraries with 2–4 and 4–6 kb fragments generated raw reads of about 7,681 Mbp (20×) and 5,819 Mbp (15×), of which the processed read length was 2,408 and 1,898 Mbp, respectively.

The nanopore library generated 782,591 reads with a total read length of 2,685,280,348 bp, corresponding to 7.06× coverage of the genome. The longest read was 1,345,484 bp and the average read length was 3,431 bp. All these sequences, with 151× coverage, were included in the whole genome assembly. Application of nanopore sequencing was challenging mainly due to the large size and repetitive nature of the plant genome. However, the high yield of nanopore using R9.4 chemistry has resolved this issue. Genomic applications to genetic resource conservation and breeding in forest trees are expected to benefit greatly from long read sequencing technologies. The processed PE reads along with MP and nanopore reads were assembled using the MaSuRCA de novo assembler. It used all the Illumina and Nanopore reads, choosing a k-mer size of 105, and estimated the teak genome size as 371,016,305 bp. The genome size of teak was estimated as 465 Mbp through flow cytometry (1C = 0.48 pg).[40] Typically, data obtained from the WGS approach in plants with genome size exceeding a few hundred megabases are difficult to assemble satisfactorily due to highly repetitive DNA. The final draft genome used for downstream analysis had 2,993 filtered contigs (>1 Kbp) with maximum, minimum and average contig lengths of 1,718,606, 1,100 and 106,098 bp, respectively (Table 4). The N50 value of the assembly was 357,576 bp. Comparative WGS analysis including estimated genome size, assembly statistics and annotation details of Lamiaceae members is provided in Supplementary Table S4.

Table 3. Raw data statistics of Illumina PE, MP and Nanopore reads of teak genome

Platform                 Chemistry                Number of raw reads  Total bases of raw reads (bp)  Number of processed reads  Total bases after processing (bp)  Coverage (×)
Illumina HiSeq           PE (150 × 2)             137,231,716          41,443,978,232                 128,115,515                37,507,019,850                     109
Illumina HiSeq (2–4 kb)  MP (150 × 2)             25,436,869           7,681,934,438                  10,776,772                 2,408,197,618                      20
Illumina HiSeq (4–6 kb)  MP (150 × 2)             19,268,378           5,819,050,156                  8,470,072                  1,898,380,817                      15
Nanopore                 Long read (5–1,345,484)  782,591              2,685,280,348                  782,591                    2,685,280,348                      7.06
Total coverage                                                                                                                                                      151

Table 4. Draft genome assembly statistics of teak

Parameters                           Contig       Scaffold     Gap closer   Draft genome
Contigs generated                    3,500        3,004        3,004        2,993
Maximum contig length (bp)           1,718,119    1,718,322    1,718,606    1,718,606
Minimum contig length (bp)           332          445          445          1,100
Average contig length (bp)           90,394       105,594      105,712      106,098
Total contigs length (bp)            316,377,938  317,203,315  317,558,121  317,551,182
Total number of non-ATGC characters  2,084,563    2,374,171    808,446      808,446
Percentage of non-ATGC characters    0.659        0.748        0.255        0.255
Contigs ≥ 500 bp                     3,484        3,003        3,003        2,993
Contigs ≥ 1 Kbp                      3,478        2,993        2,993        2,993
Contigs ≥ 10 Kbp                     2,431        2,069        2,069        2,069
Contigs ≥ 1 Mbp                      8            18           18           18
N50 value (bp)                       277,872      357,576      357,576      357,576

3.2. Repetitive genome elements

Repetitive DNAs and transposable elements are ubiquitously present in eukaryotic genomes. They provide a wide variety of variations across plant species. Repetitive elements are fast evolving components of the nuclear genome that play an important role in evolution of the species[42] and interspecific divergence.[43] The internal sequence variability of various repeat elements depends on the ratio between mutation and homogenization/fixation rates within a species. Repeat masking of the reference genome of teak with A. thaliana showed that a total of 19,046,577 bp (6% of the genome) had repeat elements, which was very low when compared to Salvia repeat elements. The classification of repetitive elements of the teak genome is provided in Table 5. The simple repeats were dominant, occupying about 3.36% of the total genome, which can aid in molecular marker development. A total of 6,976 (1.62%) retroelements, 6,518 (1.58%) long terminal repeats (LTRs), 4,431 (0.24%) DNA transposons, 2,687 (0.70%) Copia-type, and 2,499 (0.81%) Gypsy-type sequences were predicted in teak. The number of repeat elements in teak differed from other Lamiaceae members, Mentha longifolia and Ocimum sanctum.[47,48] In M. longifolia,[46] 3,866 LTR elements were predicted, whereas in teak there were 6,518. In Ocimum tenuiflorum,[48] the percent distribution of long interspersed nuclear elements (LINEs), LTR, unclassified repeat elements and total interspersed repeats was 0.3, 11.07, 27.99 and 40.71, respectively, but in teak the distribution was 0.04, 1.58, 0.02 and 1.88, respectively. In contrast, small RNA repeats were not recorded in O. tenuiflorum but 0.03% was observed in teak. The presence of a higher number of simple repeats, retroelements and interspersed repeats is common in plants, which was reflected in the teak genome as well. Overall, the percent distribution of repeat elements in the teak genome seemed to be very low compared to the pine genome, where 82% of the genome is repetitive in nature. The sequencing methods, per cent coverage and regions covered in the genome influence the representation of repeat elements. The teak genome is estimated to be 317 Mb in length, and at least 11% of its sequence is observed to be made of repeat elements. Variations in repeat elements among the members of Lamiaceae could be due to the variations in chromosome number and ploidy level among the species, which may reflect on evolutionary distances between genomes. Further, in this study, only 6% of the genome was used for masking, and the WGS assembly may have missed some regions that are rich in repeated sequences.

Table 5. Overview of repeat elements in teak genome

Type                        Number of elements  Length occupied (bp)  Percentage in genome
Retroelements               6,976               5,153,927             1.62
SINEs                       1                   46                    0.00
LINEs                       457                 122,681               0.04
L1/CIN4                     457                 122,681               0.04
LTR elements                6,518               5,031,200             1.58
Ty1/Copia                   2,687               2,212,758             0.70
Gypsy/DIRS1                 2,499               2,564,424             0.81
DNA transposons             4,431               749,766               0.24
hobo-Activator              647                 139,917               0.04
Tc1-IS630-Pogo              1,298               216,636               0.07
Tourist/Harbinger           285                 70,800                0.02
Unclassified                242                 63,153                0.02
Total interspersed repeats                      5,966,846             1.88
Small RNA                   158                 104,298               0.03
Satellites                  1                   54                    0.00
Simple repeats              253,260             10,655,131            3.36
Low complexity              46,994              2,328,770             0.73
Total bases masked                              19,046,577            6.00

3.3. Annotation and gene prediction

Functional annotation, the process of identifying sequence similarity to other known genes or proteins, was carried out in teak using the hard masked draft genome for gene prediction. A total of 36,172 proteins were predicted, of which 31,126 (86%) were annotated against Viridiplantae (Supplementary Table S5) and 5,046 were unannotated. The lengths of the longest and shortest annotated protein sequences were 5,648 and 32 amino acids, respectively. The number of predicted genes is in accordance with the Lamiaceae members Mentha and Ocimum, except for one cultivar of O. tenuiflorum, where 53,480 genes were predicted.[46–48] The list of species considered for pathway analysis by homology based alignment of the T. grandis draft genome is given in Supplementary Table S6. The proteins with >30% identity cut off were taken for pathway analysis. Overall, 344 taxa had similarity hits when searched against the Uniprot Viridiplantae protein database using the BLASTP programme with an e-value of e-10. The fifteen plant species that showed high homology are listed in Supplementary Table S7. Among the 31,126 predicted protein-coding genes, 17,353 genes (55%) showed high similarity to Erythranthe guttata, 2,478 to Coffea canephora, 1,922 to Solanum tuberosum and 1,049 to Vitis vinifera. The highest per cent of similarity in the predicted coding genes with E. guttata (17,353 genes) could reflect the genetic relationship of this genus with teak, as both belong to the order Lamiales.

Gene ontology analysis revealed that 48.08% of genes were related to molecular functions (Fig. 2a), 34.35% to cellular components (Fig. 2b) and 12.56% to biological processes (Fig. 2c). In terms of biological processes, the major categories were transcription, regulation of transcription, and metabolic and defense processes. The cellular component category consisted largely of integral component of membrane, followed by nucleus and cytoplasm components. In terms of molecular function, the top three GO terms were ATP binding, DNA binding and metal ion binding activities. Classification of all the protein sequences grouped them under five categories: metabolism, cellular processes, environmental information processing, genetic information processing and organismal systems. Metabolism related sequences were represented in the highest number, among which genes representing carbohydrate metabolism (1,100) ranked first, followed by amino acid metabolism (643) (Supplementary Table S8). The top five pathways were plant-pathogen interaction, plant hormone signal transduction, carbon metabolism, ribosomes, and protein processing (Supplementary Table S9).

Figure 2. Characterization of teak genome sequence by gene ontology categories: (a) Biological process; (b) Molecular function; (c) Cellular component.

Teak is known for its natural resistance against various decaying agents and is highly durable. Many biochemical studies on teak wood indicated the role of several secondary metabolites and phenolic compounds, including flavonoids, alkaloids, terpenoids, quinones and tannins, that play a major role in its durability.[53,54] This study has identified a total of 615 gene sequences that directly code for enzymes involved in the synthesis of specialized secondary metabolites (363 genes) and biosynthesis of terpenoids and polyketides (252 genes). Lipid metabolism related genes were also represented in high number (542) (Supplementary Table S8). The colour of wood is associated with extractive content, and is a useful parameter to estimate the durability of heartwood. Identification of several genes responsible for the production of the above compounds in the teak genome would pave the way for understanding the basis of natural resistance of teak timber. Recently, functional genomics studies in Santalum album showed that heartwood specific transcriptome signatures were responsible for the presence of particular secondary metabolites. Similar approaches would provide further insights into secondary wood formation in teak.

3.4. The frequency and distribution of SSR types in teak genome

In this study, a total of 2,993 scaffolds amounting to 317.5 Mbp were examined for SSRs, of which 2,938 sequences harboured SSRs. Further, 2,846 sequences had more than one SSR and 11,255 SSRs were in compound form. The different types of SSR recorded in the teak genome are shown in Table 6. A total of 182,712 SSR motifs were identified, among which perfect SSRs were the most numerous (170,574), with an overall frequency of 537.15 loci/Mbp, accounting for 93% of SSRs. Compound, complex and interrupted types constituted 7% of the total SSRs, where interrupted complex type was the least in number. Among the pure repeat motifs, mononucleotide repeats were represented in maximum counts (88,766), followed by di- (81,215), tri- (14,654), tetra- (8,086), penta- (1,967) and hexanucleotides (1,161) (Table 7). Predominant (>1,000) repeat times were 12–22 for mononucleotides, 7–16 for dinucleotides, 5–8 for trinucleotides, 4–7 for tetranucleotides and 4 for pentanucleotides. The major repeat motifs with over 5,000 loci were (A)n, (T)n, (C)n, (G)n, (AC)n, (AT)n, (AG)n, (GT)n and (CT)n. Nine trinucleotide, four tetranucleotide and two pentanucleotide motifs were predominant (Supplementary Table S10). The (AT)n repeat motif, with a frequency of 131.2 loci/Mb, was the most predominant dinucleotide SSR, accounting for over 51.3% of the total dinucleotide SSRs. Primers were designed for 86,854 SSRs which had sufficient left and right flanking sequences (Supplementary Table S11). The presence of a large number of short repeat type SSR loci in the teak genome may be due to the higher genomic mutation rate and long evolutionary history of the genus.

Table 6. Characteristics of six types of SSRs in teak genome

SSR type  Total counts  Total length (bp)  Average length (bp)  Frequency (loci/Mb)  Density (bp/Mb)
cd        6,309         248,207            39.34                19.87                781.63
cx        227           13,835             60.95                0.71                 43.57
icd       4,049         170,803            42.18                12.75                537.88
icx       670           43,668             65.18                2.11                 137.51
ip        883           34,519             39.09                2.78                 108.7
p         170,574       3,006,200          17.62                537.15               9,466.82

3.5. Selection of polymorphic SSRs and validation

SSRs have become powerful markers for population genetic analysis, QTL mapping and other related genetic and genomic studies. The conventional methods for SSR genotyping are labour intensive, time consuming and costly, especially for tree species that lack DNA sequence information in the public databases. The recent advances in NGS methods offer rapid identification of repeat size variations by sequencing. Five teak accessions were re-sequenced with coverage levels of 7.3×–11.6× (Table 8). The PSR tool was used to compare sequence variants among the five assemblies against the SSR sequences of the reference genome. Among the 170,574 perfect SSRs found in the teak genome, 16,252 showed polymorphisms across these genotypes and primer pairs were developed for 13,007 SSRs (Supplementary Table S12). Heterozygous and homozygous conditions at each microsatellite locus across all genotypes were detected by the comparative module (PSR_poly_finder).

Gel electrophoresis of all the 10 primer pairs developed for polymorphic SSRs generated by the PSR software produced perfect banding patterns, and no optimization of primer annealing temperature was required. All the amplified loci were polymorphic except one, as in the PSR results. Validation of more primer pairs in efficient allele separation systems like capillary electrophoresis would strengthen SSR marker development. These results showed that identification of polymorphic SSRs by sequencing is highly cost efficient and rapid compared to conventional methods of SSR identification.

Table 7.
The number, length, frequency and density of six different types of SSRs

| Nucleotide | Total counts | Total length (bp) | Average length (bp) | Frequency (loci/Mb) | Density (bp/Mb) | SSRs in the whole genome (%) |
|---|---|---|---|---|---|---|
| Mononucleotide | 88,766 | 1,321,753 | 14.89 | 279.53 | 4,162.3 | 45.32 |
| Dinucleotide | 81,215 | 1,664,278 | 20.49 | 255.75 | 5,241 | 41.4 |
| Trinucleotide | 14,654 | 286,074 | 19.52 | 46.15 | 900.88 | 7.48 |
| Tetranucleotide | 8,086 | 146,728 | 18.15 | 25.46 | 462.06 | 4.13 |
| Pentanucleotide | 1,967 | 42,960 | 21.84 | 6.19 | 135.29 | 1 |
| Hexanucleotide | 1,161 | 29,724 | 25.6 | 3.66 | 93.604 | 0.59 |

SSR type abbreviations (Table 6): cd, compound; cx, complex; icd, interrupted compound; icx, interrupted complex; ip, imperfect; p, perfect.

Table 8. Read statistics of the teak samples sequenced at low depth coverage for identification of polymorphic SSRs

| Accession ID | Sample code | Total raw reads (bp) | Total processed reads (bp) | Total reference covered (%) | Coverage (×) |
|---|---|---|---|---|---|
| 1 | NR | 22,503,220 | 21,315,714 | 90.1 | 9.6 |
| 3 | WR | 28,219,627 | 26,462,008 | 90.6 | 11.6 |
| 4 | DI | 19,670,649 | 18,335,390 | 87.6 | 7.8 |
| 5 | HI | 17,029,396 | 16,043,025 | 85.8 | 7.3 |
| 6 | TP | 18,838,068 | 18,451,073 | 94.3 | 8.2 |
| Average | | | | | 8.9 |

3.6. Phylogeny and divergence time estimation

The Lamiaceae (Labiatae) family contains 236 genera and over 7,000 species, and is one of the largest families of seed plants. The family Verbenaceae is closely related to Lamiaceae and is differentiated mainly on the basis of a terminal (Verbenaceae) versus gynobasic (Lamiaceae) style, with difficulties in separating members of one family from the other. Previously, several genera from Verbenaceae were transferred to Lamiaceae including Tectona,[2,59] but the systematic position of the genus Tectona still lacks clarity. Out of 236 genera of Lamiaceae, 226 were placed under seven subfamilies (Ajugoideae, Lamioideae, Nepetoideae, Prostantheroideae, Scutellarioideae, Symphorematoideae and Viticoideae) and 10 genera were listed as Incertae sedis (of 'uncertain placement') by considering morphology, secondary metabolites and molecular phylogeny.[60–64] However, a recent study on chloroplast phylogeny proposed three additional subfamilies in Lamiaceae to encompass eight genera of Incertae sedis, leaving Tectona and Callicarpa unassigned. In the present study, the potential phylogenetic plastid marker sequences ycf2 and psbB were used to disentangle the taxonomic position of the genus Tectona. Although several different genes have been used for phylogeny analysis of teak, the ycf2 and psbB genes have not been reported so far. The plastid gene psbB, which codes for the core protein of Photosystem II, has the highest level of translation among the chloroplast genes and provides an excellent opportunity to investigate an unusual evolutionary situation. The phylogenetic utility of psbB gene sequences has been tested in many plant families under the Order Lamiales, such as Lamium,[66] Chelonopsis,[67] Salvia[68] and Prostanthera among others. Similar to psbB, studies on molecular systematics have undoubtedly proved that the ycf gene is more variable than matK in many taxa to resolve phylogeny issues.[70,71] This gene harbours the highest level of nucleotide genetic diversity among the angiosperm plastid genomes. As the number of entries of Lamiaceae members in the public domain is very small, the combination of the two genes, ycf2 and psbB, generated a phylogeny tree with 16 Lamiaceae species. Inclusion of other genes (rbcL and psaA) resulted in a phylogenetic tree with a lesser number (10–14) of Lamiaceae species (data not shown). The resulting phylogenetic tree retained the previously reported uncertain taxonomic status of the genus Tectona, which formed a separate sister clade to the clades comprising Lamioideae, Ajugoideae, Nepetoideae, Scutellarioideae and Premnoideae (Fig. 3). Monophyly of all the subfamilies under Lamiaceae was also reported. Earlier classifications based on morphology considered Tectona under tribe Tectoneae in subfamily Viticoideae.[73–76] Molecular phylogeny analysis and a novel combination of morphological features delineated Tectona to be an earliest diverging lineage in Lamiaceae. The phylogenetic studies of the Lamiaceae family deduced T. grandis to be an early diverging lineage.[4] The time estimated in the present study confirmed the early origin and divergence of the genus around 21.4508 Mya [95% highest posterior density (HPD): 10.11–34.52 Mya] (Fig. 4). Most of the genera of the family Lamiaceae were found to have a Miocene origin. Recently, two new subfamilies have been proposed in Lamiaceae, viz. Callicarpoideae and Tectonoideae, with Tectona as a monotypic taxon. Accordingly, assigning Tectona to a monogeneric subfamily Tectonoideae needs to be considered; however, it demands extensive sampling within Lamiaceae and multigene phylogeny analysis to provide an appropriate taxonomical position.

Figure 3. Bayesian tree generated by analysis of two plastid sequences psb and ycf2. Posterior probabilities are shown above the branches. Scale bar specifies mean branch length. Six major clades representing subfamilies of the Lamiaceae family are indicated.

Figure 4. Chronogram of Lamiaceae (genus Tectona) based on two plastid sequences psb and ycf2, estimated from secondary calibration strategies as implemented in BEAST. Calibration points are indicated with black dots. Node bar indicates 95% HPD interval for node ages. Geological time scale is given in Mya.

4. Conclusion

Shrinkage of natural populations of teak in its native locations and the worldwide increase of managed plantations demand conservation of native forests, which are critical in providing the best possible alleles to maintain genetic diversity. Captive plantations inherently have a narrow genetic base and limited gene flow, and exist in non-native environments; these characteristics often significantly alter the evolutionary trajectory, leading to a decrease in population fitness. Conservation and maintenance of wild progenitors are increasingly important for genetic improvement programmes. Thus, the full genome sequence of teak was developed and functional analyses of genome components were attempted, for use as a tool in conservation and tree breeding programmes. The polymorphic DNA markers developed in this study, hitherto unavailable for this highly valuable timber yielding tropical hardwood species, will propel genetic and genomic research in teak. The draft genome of teak, along with a large number of markers, will benefit various explorative studies including the genetic basis of wood properties, pest tolerance, adaptive traits, germplasm movement and genetic resource conservation.

Acknowledgements

We are thankful to the State Forest Departments of Kerala, Tamil Nadu and Karnataka for their permission to collect teak samples. The financial support received from the Department of Biotechnology, Government of India (No. BT/PR7143/PBD/16/1011/2012) is gratefully acknowledged.

Conflict of interest

None declared.

Supplementary data

Supplementary data are available at DNARES online.

References

1. Tewari, D. N. 1992, A Monograph on Teak (Tectona grandis Linn. f.), Dehra Dun, India: International Book Distributors.
2. Thorne, R. F. 1992, Classification and geography of the flowering plants, Bot. Rev., 58, 225–327.
3. Harley, R. M., Atkins, S. and Budantsey, A. L. 2004, Labiatae. In: Kubitzki, K. and Kadereit, J. W. (eds) Families and Genera of Vascular Plants. Flowering Plants. Dicotyledons: Lamiales (except Acanthaceae Including Avicenniaceae), vol. 7. Berlin: Springer, pp. 167–275.
4. Li, B., Cantino, P. D. and Olmstead, R. G. 2016, A large-scale chloroplast phylogeny of the Lamiaceae sheds new light on its subfamilial classification, Sci. Rep., 6, 3434.
5. Kollert, W. and Kleine, M., eds. 2017, The Global Teak Study. Analysis, Evaluation and Future Potential of Teak Resources, IUFRO World Series, 36. Vienna. pp 108.
6. Kollert, W. and Cherubini, L. 2012, Teak resources and market assessment 2010, FAO Planted Forests and Trees Working Paper, FP/47/E, Rome.
7. Deb, J. C., Phinn, S., Butt, N. and Mcalpine, C. A. 2017, Climatic-induced shifts in the distribution of teak (Tectona grandis) in tropical Asia: implications for forest management and planning, Environ. Manag., 60, 422–35.
8. Hansen, O. K., Changtragoon, S., Ponoy, B., Lopez, J., Richard, J. and Kjær, E. D. 2017, Worldwide translocation of teak—origin of landraces and present genetic base, Tree Genet. Genomes, 13, 87.
9. Keiding, H., Wellendorf, H. and Lauridsen, E. B. 1986, Evaluation of an International Series of Teak Provenance Trials. Humlebaek: Danida Forest Seed Centre.
10. Kjaer, E. D., Lauridsen, E. B. and Wellendorf, H. 1995, Second Evaluation of an International Series of Teak Provenance Trials. Humlebaek: Danida Forest Seed Centre.
11. Chaix, G., Monteuuis, O., Garcia, C., et al. 2011, Genetic variation in major phenotypic traits among diverse genetic origins of teak (Tectona grandis L.f.) planted in Taliwas, Sabah, East Malaysia, Ann. For. Sci., 68, 1015–26.
12. Monteuuis, O., Goh, D. K. S., Garcia, C., Alloysius, D., Gidiman, J., Bacilieri, R. and Chaix, G. 2011, Genetic variation of growth and tree quality traits among 42 diverse genetic origins of Tectona grandis planted under humid tropical conditions in Sabah, East Malaysia, Tree Genet. Genomes, 7, 1263–75.
13. Callister, A. N. 2013, Genetic parameters and correlations between stem size, forking, and flowering in teak (Tectona grandis), Can. J. For. Res., 43, 1145–50.
14. Monteuuis, O. and Goh, D. K. S. 2015, Field growth performances of teak genotypes of different ages clonally produced by rooted cuttings, in vitro microcuttings, and meristem culture, Can. J. For. Res., 45, 9–14.
15. Gunaga, R. P. and Vasudeva, R. 2002, Variation in flowering phenology in a clonal seed orchard of teak, J. Tree Sci., 21, 1–10.
16. Nicodemus, A., Nagarajan, B. and Narayanan, C. 2005, RAPD variation in Indian teak populations and its implications for breeding and conservation. In: Bhat, K. M., Nair, K. K. N., Bhat, K. V., Muralidharan, E. M. and Sharma, J. K. (eds), Quality Timber Products of Teak from Sustainable Forest Management. Kerala Forest Research Institute, Yokohama: India and International Tropical Timber Organization, pp. 321–30.
17. Shrestha, M. K., Volkaert, H. and Straeten, D. V. D. 2005, Assessment of genetic diversity in Tectona grandis using amplified fragment length polymorphism markers, Can. J. For. Res., 35, 1017–22.
18. Sreekanth, P. M., Balasundaran, M., Nazeem, P. A. and Suma, T. B. 2012, Genetic diversity of nine natural Tectona grandis L.f. populations of the Western Ghats in Southern India, Conserv. Genet., 13, 1409–19.
19. Fofana, I. J., Ofori, D., Poitel, M. and Verhaegen, D. 2009, Diversity and genetic structure of teak (Tectona grandis L.f.) in its natural range using DNA microsatellite markers, New Forests, 37, 175–95.
20. Hansen, O. K., Changtragoon, S., Ponoy, B., et al. 2015, Genetic resources of teak (Tectona grandis Linn. f.)—strong genetic structure among natural populations, Tree Genet. Genomes, 11, 802.
21. Galeano, E., Vasconcelos, T. S., Vidal, M., Mejia-Guerra, M. K. and Carrer, H. 2015, Large-scale transcriptional profiling of lignified tissues in Tectona grandis, BMC Plant Biol., 15, 221.
22. Diningrat, D. S., Widiyanto, S. M., Pancoro, A., et al. 2015, Transcriptome of teak (Tectona grandis, L.f) in vegetative to generative stages development, J. Plant Sci., 10, 1–14.
23. Doyle, J. J. and Doyle, J. L. 1990, Isolation of plant DNA from fresh tissue, Focus, 12, 13–5.
24. Kajitani, R., Toshimoto, K., Noguchi, H., et al. 2014, Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads, Genome Res., 24, 1384–95.
25. Zimin, A., Marçais, G., Puiu, D., Roberts, M., Salzberg, S. and Yorke, J. 2013, The MaSuRCA genome assembler, Bioinformatics, 29, 2669–77.
26. Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. and Pirovano, W. 2011, Scaffolding pre-assembled contigs using SSPACE, Bioinformatics, 27, 578–9.
27. Simpson, J. T. and Durbin, R. 2012, Efficient de novo assembly of large genomes using compressed data structures, Genome Res., 22, 549–56.
28. Stanke, M., Diekhans, M., Baertsch, R. and Haussler, D. 2008, Using native and syntenically mapped cDNA alignments to improve de novo gene finding, Bioinformatics, 24, 637–44.
29. Cantarella, C. and D'Agostino, N. 2015, PSR: polymorphic SSR retrieval, BMC Res. Notes, 8.
30. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. and Higgins, D. G. 1997, The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Nuc. Acids Res., 25, 4876–82.
31. Hall, T. A. 1999, BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT, Nucleic Acids Symp. Ser., 41, 95–8.
32. Darriba, D., Taboada, G. L., Doallo, R. and Posada, D. 2012, JModelTest 2: more models, new heuristics and parallel computing, Nat. Methods, 9, 772.
33. Akaike, H. 1974, A new look at statistical model identification, IEEE Trans. Automat. Contr., 19, 716–23.
34. Ronquist, F. and Huelsenbeck, J. P. 2003, MRBAYES 3: bayesian phylogenetic inference under mixed models, Bioinformatics, 19, 1572–4.
35. Drummond, A. J., Suchard, M. A., Xie, D. and Rambaut, A. 2012, Bayesian phylogenetics with BEAUti and the BEAST 1.7, Mol. Biol. Evol., 29, 1969–73.
36. Roy, T. and Lindqvist, C. 2015, New insights into evolutionary relationships within the subfamily Lamioideae (Lamiaceae) based on pentatricopeptide repeat (PPR) nuclear DNA sequences, Amer. J. Bot., 102, 1721–35.
37. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. and Rambaut, A. 2006, Relaxed phylogenetics and dating with confidence, PLoS Biol., 4, e88–710.
38. Yamamura, Y., Kurosaki, F. and Lee, J. B. 2017, Elucidation of terpenoid metabolism in Scoparia dulcis by RNA-seq analysis, Sci. Rep., 7, 43311.
39. Holliday, J. A., Aitken, S. N., Cooke, J. E. K., et al. 2017, Advances in ecological genomics in forest trees and applications to genetic resources conservation and breeding, Mol. Ecol., 26, 706–17.
40. Ohri, D. and Kumar, A. 1986, Nuclear DNA amounts in some tropical hardwoods, Caryologia, 39, 303–7.
41. Ling, H. Q., Zhao, S., Liu, D., Wang, J., et al. 2013, Draft genome of the wheat A-genome progenitor Triticum urartu, Nature, 496, 87–90.
42. Britten, R. J. 2010, Transposable element insertions have strongly affected human evolution, Proc. Natl. Acad. Sci. U. S. A., 107, 19945–8.
43. Mehrotra, S. and Goyal, V. 2014, Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function, Genomics Proteomics Bioinformatics, 12, 164–71.
44. Dover, G. A. 1986, Molecular drive in multigene families: how biological novelties arise, spread and are assimilated, Trends Genet., 2, 159–65.
45. Xu, H., Song, J., Luo, H., et al. 2016, Analysis of the genome sequence of the medicinal plant, Salvia miltiorrhiza, Mol. Plant, 9, 949–52.
46. Vining, K. J., Johnson, S. R., Ahkami, A., et al. 2017, Draft genome sequence of Mentha longifolia and development of resources for mint cultivar improvement, Mol. Plant, 10, 323–39.
47. Rastogi, S., Kalra, A., Gupta, V., et al. 2015, Unravelling the genome of Holy basil: an "incomparable" "elixir of life" of traditional Indian medicine, BMC Genomics, 16, 413.
48. Upadhyay, A. K., Chacko, A. R., Gandhimathi, A., et al. 2015, Genome sequencing of herb Tulsi (Ocimum tenuiflorum) unravels key genes behind its strong medicinal properties, BMC Plant Biol., 15, 212.
49. Weiss-Schneeweiss, H., Leitch, A. R., Jamie, J., Jang, T. S. and Macas, J. 2015, Employing next generation sequencing to explore the repeat landscape of the plant genome. In: Hörandl, E. and Appelhans, M. (eds). Next Generation Sequencing in Plant Systematic, Regnum Vegetabile 157. Königstein, Germany: Koeltz Scientific Books, pp. 1–25.
50. Neale, D. B., Wegrzyn, J. L., Stevens, K. A., et al. 2014, Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies, Genome Biol., 15, R59.
51. Parween, S., Nawaz, K., Roy, R., et al. 2015, An advanced draft genome assembly of a desi type chickpea (Cicer arietinum L.), Sci. Rep., 5, 12806.
52. Refulio-Rodriguez, N. F. and Olmstead, R. G. 2014, Phylogeny of Lamiidae, Am. J. Bot., 101, 287–99.
53. Francisco, A. M., Rodne, Y. L., Rosa, M. V., Clara, N. and Jose, M. G. M. 2008, Bioactive apocarotenoids from Tectona grandis, Phytochemistry, 69, 2708–15.
54. Lacret, R., Varela, R. M., Molinillo, J. M. G., Nogueiras, C. and Macías, F. A. 2012, Tectonoelins, new norlignans from a bioactive extract of Tectona grandis, Phytochem. Lett., 5, 382–6.
55. Celedon, J. M. and Bohlmann, J. 2018, An extended model of heartwood secondary metabolism informed by functional genomics, Tree Physiol., 14, 1–9.
56. Toth, G., Gaspari, Z. and Jurka, J. 2000, Microsatellites in different eukaryotic genome: survey and analysis, Genome Res., 10, 967–81.
57. Vieira, M. L. C., Santini, L., Diniz, A. L. and Munhoz, C. D. F. 2016, Microsatellite markers: what they mean and why they are so useful, Genet. Mol. Biol., 39, 312–28.
58. Pullaiah, T., Sri Rama Murthy, K. and Karuppusamy, S. 2007, Flora of Eastern Ghats, Vol. 3. New Delhi: Regency Publications.
59. Cantino, P. D., Harley, R. M. and Wagstaff, S. J. 1992, Genera of Labiatae: status and classification. In: Harley, R. M. and Reynolds, T. (eds). Advances in Labiatae Science, Kew, UK: Key Botanic Garden, pp. 511–22.
60. Cantino, P. D., Olmstead, R. G. and Wagstaff, S. J. 1997, A comparison of phylogenetic nomenclature with the current system: a botanical case study, Syst. Biol., 46, 313–31.
61. Wagstaff, S. J. and Olmstead, R. G. 1997, Phylogeny of Labiatae and Verbenaceae inferred from rbcL sequences, Syst. Bot., 22, 165–79.
62. Wagstaff, S. J., Hickerson, L., Spangler, R., Reeves, P. A. and Olmstead, R. G. 1998, Phylogeny in Labiatae s. l., inferred from cpDNA sequences, Plant Syst. Evol., 209, 265–74.
63. Alvarenga, S. A. V., Gastmans, J. P., Rodrigues, G. D. V., Moreno, P. R. H. and Emerenciano, V. D. P. 2001, A computer-assisted approach for chemotaxonomic studies - diterpenes in Lamiaceae, Phytochemistry, 56, 583–95.
64. Bramley, G. L. C., Forest, F. and De Kok, R. P. J. 2009, Troublesome tropical mints: re-examining generic limits of Vitex and relations (Lamiaceae) in South East Asia, Taxon, 58, 500–10.
65. Morton, B. R. and Levin, J. A. 1997, The atypical codon usage of the plant psbA gene may be the remnant of an ancestral bias, Proc. Nat. Aca. Sci. U.S.A., 94, 11434–8.
66. Bendiksby, M., Brysting, A. K., Thorbek, L., Gussarov, G. and Ryding, G. 2011, Molecular phylogeny and taxonomy of genus Lamium L (Lamiaceae): Disentangling origins of presumed allotetraploids, Taxon, 60, 986–1000.
67. Xiang, C.-L., Zhang, Q., Scheen, A.-C., Cantino, P. D., Funamoto, T. and Peng, H. 2013, Molecular phylogenetics of Chelonopsis (Lamiaceae: gomphostemmateae) as inferred from nuclear and plastid DNA and morphology, Taxon, 62, 375–86.
68. Jenks, A. A., Walker, J. B. and Kim, S. C. 2011, Evolution and origins of the Mazatec hallucinogenic sage, Salvia divinorum (Lamiaceae): a molecular phylogenetic approach, J. Plant Res., 124, 593–600.
69. Conn, B. J., Wilson, T. C., Henwood, M. J. and Proft, K. 2013, Circumscription and phylogenetic relationships of Prostanthera densa and P. marifolia (Lamiaceae), Telopea, 15, 149–64.
70. Neubig, K. M., Whitten, W. M., Carlsward, B. S., Blanco, M. A., Endara, L., Williams, N. H. and Moore, M. 2009, Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK, Plant Syst. Evol., 277, 75–84.
71. Dong, W., Xu, C., Li, C., et al. 2015, ycf1, the most promising plastid DNA barcode of land plants, Sci. Rep., 5, 8348.
72. Dong, W., Liu, J., Yu, J., Wang, L. and Zhou, S. 2012, Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding, PLoS One, 7, e35071.
73. Briquet, J. 1897, Verbenaceae. In: Engler, A. and Prantl, K. (eds). Die Naturlichen Pflanzenfamilien Teil 4. Leipzig: Engelmann, pp. 132–82.
74. Melchior, H. 1964, A. Engler's Syllabus Der Pflanzenfamilien, vol. 2. Berlin: Borntraeger, pp. 666.
75. Moldenke, H. N. 1975, Notes on new and noteworthy plants LXXVII, Phytologia, 31, 28.
76. Takhtajan, A. 1983, Outline of the classification of flowering plants (Magnoliophyta), Brittonia, 35, 254–359.
77. Li, B. and Olmstead, R. 2017, Two new subfamilies in Lamiaceae, Phytotaxa, 313, 222–6.

DNA Research, Oxford University Press

Draft genome of a high value tropical timber tree, Teak (Tectona grandis L. f): insights into SSR diversity, phylogeny and conservation

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
ISSN
1340-2838
eISSN
1756-1663
D.O.I.
10.1093/dnares/dsy013

Abstract

Teak (Tectona grandis L. f.) is a precious benchmark tropical hardwood with the qualities of durability, strength and visual appeal. Natural teak populations harbour a variety of characteristics that determine their economic, ecological and environmental importance. Sequencing of the whole nuclear genome of teak provides a platform for functional analyses and development of genomic tools in applied tree improvement. A draft genome of 317 Mb was assembled at 151× coverage and 36,172 protein-coding genes were annotated. Approximately 11.18% of the genome was repetitive. Microsatellites or simple sequence repeats (SSRs) are undoubtedly the most informative markers in genotyping, genetics and applied breeding applications. We generated 182,712 SSRs at the whole genome level, of which 170,574 were perfect SSRs; 16,252 perfect SSRs showed in silico polymorphisms across six genotypes, suggesting their promising use in genetic conservation and tree improvement programmes. Genomic SSR markers developed in this study have high potential in advancing conservation and management of teak genetic resources. Phylogenetic studies confirmed the taxonomic position of the genus Tectona within the family Lamiaceae. Interestingly, estimation of divergence time inferred a Miocene origin of the genus Tectona, around 21.4508 million years ago. Key words: teak, genome sequencing, SSRs, phylogeny, divergence. © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com

1. Introduction

Teak (Tectona grandis L. f.; 2n = 2x = 36), belonging to the mint family Lamiaceae, is one of the world's most highly valued tropical timber species and occurs naturally in India, Laos, Myanmar and Thailand.[1–4] The timber is highly valued because of its extreme durability, strength and stability, as well as its resistance to pests, chemicals and water. Quinones and other extractives found in abundance in teakwood are responsible for its anti-termite and anti-fungal properties, conferring the longevity of its timber. Thus, the wood is used in building ships, railway carriages, sleepers, construction, furniture, veneer and carving. Owing to its admirable timber qualities and aesthetic properties (Fig. 1), teak has been successfully established as pure plantations in India and elsewhere since 1850. The recent log export ban imposed by Myanmar has resulted in a steep rise in international prices of plantation grown teak from Latin America and Africa, leading to expansion of plantation area. The estimated planted area of teak is about 4.25–6.89 million ha, with over 1.7 million ha in India. Though the share of teak is <2% of tropical round wood production, its high value continuously attracts new planters. At the same time, natural populations are continuously diminishing due to illegal logging, anthropogenic pressures and climate change. A recent study on the effect of climate change in teak expresses the risks of biological invasion into teak habitats and recommends conservation of crucial teak growing areas and suitable management planning. The population structure of teak across natural and introduced locations reveals that the landraces in introduced locations have comparatively narrow genetic diversity, thus demanding exploration of the genetic diversity of natural provenances and their conservation.

Teak has several intrinsic genetic qualities that allow its genetic improvement for timber production. Its wide and discontinuous natural distribution across varying edaphic and climatic conditions in India offers enormous potential for capturing adaptive genetic variation for genetic improvement. As a first step in the genetic improvement programme at a global level, a series of seventy five international provenance trials co-ordinated by the Danish International Development Agency were established during 1973–76 across sixteen countries. Evaluation of the provenances indicated wide variation for survival, growth rate, stem form, flowering, fruit yield and wood characteristics.[9,10] The results of genetic improvement in teak showed an overall positive trend; however, the possible existence of non-additive genetic control for economically important traits in seed progeny generated high genetic variability even within a family.[11,12] Further, it was suggested that selection of teak stem size can be carried out at the age of 3 years, wherein indirect selection on flowering age will improve forking height. Clonal propagation through budding, rooting of cuttings and in vitro propagation has facilitated tree improvement and deployment of superior performers for commercial cultivation towards increasing timber yield. Globally, although clonal seed orchards (CSOs) were established for production of quality seed stock, the reproductive fitness and success of CSOs was a chronic problem largely attributed to asynchronous flowering.

Economic importance, concerns for conservation of natural populations and the increase in plantation area demand an understanding of the genetic basis of the economic traits in teak. Hence, as in many other forest plantation species (e.g. Pinus, Populus and Eucalyptus), genetic and genomic resources in teak need to be developed. Genetic diversity in natural and introduced populations of teak has been assessed with markers such as random amplified polymorphic DNA, amplified fragment length polymorphism[17,18] and simple sequence repeat markers (SSRs) or microsatellites.[19,20] These studies generated information on the population genetic structure of natural teak populations. Indian teak is genetically very distinct from Thai and Indonesian provenances and African landraces.[17,19] The knowledge generated in these studies is highly useful for implementing conservation programmes in teak to improve sustainable management of teak forests. Recently, transcriptomes of secondary wood and of the vegetative to flowering transition stage were developed, leading to identification of genes involved in lignification, secondary metabolite production and flower formation. Although numerous studies on various life history traits exist in teak, a comprehensive understanding of the complete genome information is lacking. Next generation sequencing-based whole genome sequencing (NGS-WGS) yields more information on genomic scans of polymorphism to precisely estimate various population genetic parameters, including demographic history. In this context, to enrich genomic resources in teak, the present study reports the whole genome of teak sequenced on the Illumina HiSeq 2000 NGS platform, followed by de novo contig assembly, gene annotation and subsequent discovery of SSR polymorphism.

2. Materials and methods

2.1. Plant material and genomic DNA sample preparation

Open-pollinated seeds were collected from six dominant trees, one each from a provenance, covering the entire latitudinal range of natural distribution of teak in India (Table 1). Seedlings were raised with family identity, and one vigorously growing seedling randomly chosen from each family was used in this study.
Genomic DNA was extracted from fresh and young leaves using the standard CTAB method and was purified using the DNeasy Plant Mini kit (Qiagen, USA). The quantity and quality of the genomic DNA were assessed using Nanodrop2000 (Thermo Fisher Scientific, USA), Qubit (Thermo Fisher Scientific, USA) and agarose gel electrophoresis.

Figure 1. Cross section of teak wood showing major features (a) pith; (b) heart wood; (c) sap wood; (d) growth ring; and (e) medullary rays.

Table 1. Details of plant materials used in this study
Accession ID | Sample code | Name of the provenance | State | Latitude | Longitude | Altitude (m) | Rainfall (mm)
1 | NR | Nilambur | Kerala | 11°17' N | 76°19' E | 49 | 2,600
2 | AU | Arienkavu | Kerala | 8°96' N | 77°14' E | 240 | 2,600
3 | WR | Walayar | Kerala | 10°52' N | 76°46' E | 216 | 1,500
4 | DI | Dandeli | Karnataka | 15°07' N | 74°35' E | 510 | 2,200
5 | HI | Hojai | Assam | 26°39' N | 92°36' E | 69 | 1,750
6 | TP | Topslip | Tamilnadu | 10°26' N | 76°50' E | 640 | 1,350

2.2. Library preparation and genome sequencing

WGS was performed using the Illumina HiSeq 2000 platform and the Oxford Nanopore Technologies MinION device by Genotypic Technology, Bengaluru, India in accordance with standard protocols. Accession 2 was selected for the generation of a high quality reference genome assembly. Accessions 1, 3, 4, 5 and 6 were subjected to low coverage genome sequencing to identify polymorphic SSRs. In the case of accession 2, one paired end (PE) (150 bp × 2) library of size 300–700 bp, two mate pair (MP) libraries (2–4 and 4–6 kb fragments) and one nanopore library with genomic DNA (2 µg) were prepared for sequencing. On the Illumina HiSeq 2000 platform, one lane of the flow cell was used for each sequencing library. Nanopore sequencing was performed using R9.4 flow cells on a MinION Mk 1B device (Oxford Nanopore) with the MinKNOW software (versions 1.0.5–1.5.12) and base calling was performed using Albacore 1.1.0 (Oxford Nanopore). Template reads were exported as FASTA using poretools version 0.6. In the case of the other five accessions (1, 3, 4, 5 and 6), one PE library for each with the size of 300–700 bp was sequenced at 15× coverage through the Illumina HiSeq 2000 platform. The sequence data is uploaded in the genome database of GenBank (Project id: PRJNA374940). The assembled genome, protein sequences and its annotation, GO and pathway information are available at the web link https://biit.cs.ut.ee/supplementary/WGSteak/.

2.3. De novo genome assembly

The Illumina PE raw reads were filtered using FastQC and the raw reads were processed by an in-house (Genotypic Technology, Bengaluru, India) ABLT script for removal of low-quality bases and adapters. The MP reads were processed using the Platanus internal trimmer [24] for adapters and low-quality regions towards the 3'-end. The processed PE reads along with MP and nanopore reads were used for contig generation using the MaSuRCA v 3.2.2 de novo assembler [25]. To assemble the genome the following settings were used in the MaSuRCA assembler: GRAPH_KMER_SIZE = auto, LIMIT_JUMP_COVERAGE = 300, JF_SIZE = 38000000000, DO_HOMOPOLYMER_TRIM = 1. Scaffolding of the assembled contigs was performed using SSPACE v 2.0.5 [26] with processed PE and MP reads, followed by gap filling using Gap Closer v 1.12 [27]. The genome size was estimated automatically during the read computing stage, which utilized both the Illumina and Nanopore reads. Similarly, the low depth Illumina reads generated for five accessions of teak were assembled using accession 2 as reference. The sequenced data was uploaded to the Genome database of GenBank (Project id: PRJNA421422).

2.4. Genome annotation

For a functional overview of the draft genome, assembled scaffolds were converted to FASTA formatted sequences and hard masked by the RepeatMasker tool (RepeatMasker Open-3.0; www.repeatmasker.org (10 November 2017, date last accessed)). Repeats of Arabidopsis thaliana were used as reference for genome masking. Gene prediction was carried out using the Augustus 3.0.2 programme and predicted proteins were searched against the Uniprot non-redundant plant protein database (Taxonomy = Viridiplantae) with the BlastX algorithm with an e-value (e-10) for gene ontology and annotation. Pathway annotation was performed by mapping the sequences obtained from Blast2GO to the contents of the KEGG Automatic Annotation Server (http://www.genome.jp/kegg/kaas/ (10 November 2017, date last accessed)). The list of eudicot plants used as reference organisms for pathway identification in the KAAS server is given in Supplementary Table S1.

2.5. Identification of SSRs and detection of polymorphism

FASTA formatted scaffolds of teak were analysed for frequency and density of SSRs using the Perl script MIcroSAtellite (MISA; http://pgrc.ipk-gatersleben.de/misa/ (20 November 2017, date last accessed)). Initially SSRs of 1–6 nucleotide motifs were identified with the minimum repeat unit defined as 10 for mononucleotides, 6 for dinucleotides, 5 for trinucleotides, 4 for tetra-nucleotides, and 3 each for penta and hexa-nucleotides. Compound SSRs were defined as ≥2 SSRs interrupted by ≤100 bases.

To design primers flanking the microsatellite loci, two interface Perl script modules were used to interchange data between MISA and the primer designing software Primer 3. The SSR containing scaffolds were used to design the primers with the following parameters: primer length 18–25 bp, with 20 bp as optimum; primer GC content = 30–0%, with the optimum value of 50%; primer Tm 57–63 °C; and product size range 100–300 bp.

Polymorphic SSRs across five samples were analysed using accession 2 as reference. The polymorphic SSR retrieval tool (PSR), comprising two modules (PSR_read_retrieval and PSR_poly_finder), was deployed to detect SSR length polymorphisms of perfect repeats from NGS data. It is to be noted that the PSR tool identifies length polymorphisms in perfect microsatellites only. Also, it filters out all the reads that match twice or more on the reference sequence as well as non-overlapping paired-end reads that are aligned on the same microsatellite locus. Minimum number of supporting reads and read depth were fixed to 10 and 30, respectively. This process detects polymorphic SSRs based on the availability of left and right border unique sequences based on the complete coverage of the SSR region in the sequence data.

The polymorphic SSR data generated from the PSR software was validated through gel electrophoresis. In total, 10 SSRs representing di, tri and tetra-nucleotide motifs were randomly chosen and amplified with 10 randomly selected teak trees from the Topslip (Latitude: 10°29'09.5"N; Longitude: 75°50'03.8"E; Altitude: 736 m) provenance (Supplementary Table S2). PCR amplification was set up in a 10 µl volume containing 5 ng of template DNA, 2.5 mM MgCl2, 2.5 µl 10× PCR buffer, 0.5 mM of primer, 0.5 U Taq DNA polymerase, and 2.5 mM of dNTPs. The PCR cycling profile was programmed to 94 °C

Table 2.
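The minimum repeat thresholds given in Section 2.5 can be illustrated with a small sketch. This is not MISA itself and not the authors' pipeline; it is a hypothetical regular-expression scan, assuming plain upper-case DNA strings, that applies only the stated minimum repeat numbers (10, 6, 5, 4, 3 and 3 for mono- to hexanucleotide motifs) to find perfect SSRs. The function name `find_perfect_ssrs` is an illustrative choice, and compound SSR merging is deliberately left out.

```python
import re

# Minimum repeat numbers per motif length, as stated in Section 2.5.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Hypothetical helper for illustration only; it does not reproduce
    MISA's compound-SSR handling or its output format.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # Capture one motif, then require at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA", "ATAT").
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCGT" + "A" * 12 + "TTG"
    for start, motif, count in find_perfect_ssrs(example):
        print(start, motif, count)
```

Each motif length is scanned separately, so a run such as twelve A residues is reported once as a mononucleotide SSR rather than again as (AA)6.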
Details on 10 SSR markers used for amplification of teak germplasm
SSR code | SSR motif | Forward primer (5'-3') | Reverse primer (5'-3') | Annealing temperature (°C) | Number of alleles | Product size (bp)
IFT83 | (AG) | AATTGGCATAAAGCGTGCTACT | CGCACGTCCTATTTTGGTTTAT | 54 | 5 | 312-393
IFT821 | (AC) | CCCCAATTATGTCAACCGACT | GGCATTATCTAAGATCGCAAGG | 53.9 | 3 | 331-350
IFT63 | (ATG) | CCCAAAGCGAATAATATCCTAC | CATGACTTGTTCGATGGGCTAAT | 54 | 3 | 250-275
IFT168 | (TCT) | ATCTTCAGCAGAGGAGGCTATG | GTGCCCTTTTCTCTCTTCTTCA | 55 | 4 | 287-306
IFT479b | (GGA) | GTGAAGATTCGGGTATGGAGAG | TACTCCCAGATTTCCCAATCAC | 55 | 3 | 331-343
IFT382 | (TATG) | TACTCATCACTGTCCCCAGTTG | GAACGGGAATCTAGAGTTGTGG | 56 | 3 | 337-350
IFT 28 | (AAAG) | CAGCCTCTGCATGTCAAATAAA | TTAGAGCTGGATATGCCATTGA | 53.6 | 3 | 381-393
IFT14 | (TTCT) | TGTGGTATTGGACCATCTGAAA | GGTAACCCACCAACAAATATGC | 54 | 5 | 265-278
IFT3 | (GAAAG) | TTCCACCTACTGGTTAAGGAAC | ATGGCTTACCAATTTACCAAACC | 54 | 1 | 330
IFT777 | (TCAGG) | TACTAACCGGAAGAGGGAAACC | TGTCGCTATGGACAGTTCATCT | 56 | 4 | 312-343

for 5 min, 35 cycles at 94 °C for 45 s, 58 °C (annealing temperature varied for each locus) for 45 s (Table 2), 72 °C for 45 s, and a final extension at 72 °C for 10 min. Banding pattern was visualised by silver staining after denaturing polyacrylamide gel (5%) electrophoresis.

2.6. Phylogenetic tree construction

Plastid gene sequences of psbB, rbcL, psaA and ycf2 from 16 species of the family Lamiaceae were accessed from the Genbank public domain (https://www.ncbi.nlm.nih.gov/genbank (13 February 2018, date last accessed)) for constructing the phylogenetic tree with Olea europaea (Oleaceae) as the out group (Supplementary Table S3). Sequences were aligned using the multiple sequence alignment tool implemented in ClustalX ver. 2.0. The sequences were manually refined in BioEdit ver. 7.0.9. Phylogenetic analyses were performed for concatenated sequences of plastid gene regions. JModeltest ver. 2.1.7 was used to choose the appropriate model of sequence evolution according to the Akaike information criterion (AIC). Bayesian inference analysis was performed in MrBayes ver. 3.2.6. The Markov chain Monte Carlo algorithm was run for ten million generations, over four chains each, sampled every 1000 generations. The estimated sample size was checked using Tracer ver. 1.4 (http://beast.bio.ed.ac.uk/Tracer (22 February 2018, date last accessed)). The first 25% of the sampled trees was discarded as burn-in. The phylogenetic tree out of MrBayes ver. 3.2.6 was visualized in FigTree ver. 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/ (22 February 2018, date last accessed)).

2.7. Divergence time estimation

The estimation of divergence time was performed using the Bayesian approach implemented in the BEAST ver. 2.4.4 programme. The Bayesian approach was deployed to estimate divergence time of the genus Tectona with respect to the other subfamilies and genera within the family Lamiaceae. The HKY model was used based on the result of AIC from JModeltest under an uncorrelated lognormal relaxed clock model. The Yule speciation model was used as tree prior. Two calibration points (57.6 million years ago (Mya) for Nepetoideae and 23.9 Mya for Lamioideae) were used based on the previous reports to determine specific nodes prior and lognormal distributions. Markov Chain was run for 10 generations, while every 1,000 generations were sampled. The chronograms shown were calculated using the median clade credibility tree plus 95% confidence intervals. Results were analysed using Tracer ver. 1.4 to assess the convergence statistics of the sequences. The effective sample sizes for all parameters and the tree files from the four runs of BEAST were combined using LogCombiner ver. 2.4.4. Twenty percent of the trees were removed as burn-in and the resulting trees were summarized with Treeannotator ver. 2.4.4. Finally, the summarized single trees were visualized in FigTree ver. 1.4.2.

3. Results and discussion

Several members of the order Lamiales are well known for their secondary metabolites of medicinal value and 17 species of the family were sequenced for the whole genome (https://www.ncbi.nlm.nih.gov/genome/?term=Lamiales (13 February 2018, date last accessed)) due to their economic importance. However, the woody species teak, one of the world's premier timber species cultivated across 65 countries, has only 3,269 nucleotide accessions including 6 ESTs available so far in the public domain (7 November 2017, date last accessed). Owing to the commercial importance, this study was undertaken to unravel the genome structure to facilitate conservation and improvement of teak genetic resources. The only available genomic resource in teak is the de novo assembly with transcriptome in 12- and 60-year-old trees to generate unigenes related to lignin biosynthesis. All the earlier gene assemblies were based on short-read technology. This work on draft genome assembly using long read technology like MP and MinION nanopore sequencing would provide an excellent resource to comprehend genome structure, genetic variation and conservation.

3.1. De novo assembly and characterization of genomic sequences

The study has generated a high quality reference genome for accession 2, which was assembled from PE, MP and nanopore library sequences. The numbers of raw and processed reads are summarized in Table 3. High depth (109×) PE sequencing provided a global overview of the teak genome with over 137.2 million PE sequences. After suitable filtration, a total of 128.2 million sequences, representing 93.43% of the raw reads, were obtained. Two different MP libraries with 2–4 and 4–6 Kb generated raw reads of about 7,681 (20×) and 5,819 Mbp (15×), of which processed read length was 2,408 and 1,898 Mbp, respectively.

Table 3. Raw data statistics of Illumina PE, MP and Nanopore reads of teak genome
Platform | Chemistry | Number of raw reads | Total bases of raw reads (bp) | No. of processed reads | Total bases after processing (bp) | Coverage (×)
Illumina HiSeq | PE (150 × 2) | 137,231,716 | 41,443,978,232 | 128,115,515 | 37,507,019,850 | 109
Illumina HiSeq (2-4 Kb) | MP (150 × 2) | 25,436,869 | 7,681,934,438 | 10,776,772 | 2,408,197,618 | 20
Illumina HiSeq (4-6 Kb) | MP (150 × 2) | 19,268,378 | 5,819,050,156 | 8,470,072 | 1,898,380,817 | 15
Nanopore | Long read (5-1,345,484) | 782,591 | 2,685,280,348 | 782,591 | 2,685,280,348 | 7.06
Total coverage | | | | | | 151

The nanopore library generated 782,591 reads with a total read length of 2,685,280,348 bp, corresponding to 7.06× coverage of the genome. The longest read was 1,345,484 bp and the average read length was 3,431 bp. All these sequences with 151× coverage were included in the whole genome assembly. Application of nanopore sequencing was challenging mainly due to the large size and repetitive nature of the plant genome. However, the high yield of nanopore using 9.4 chemistry has resolved this issue. Genomic applications to genetic resource conservation and breeding in forest trees are expected to benefit greatly from long read sequencing technologies. The processed PE reads along with MP and nanopore reads were assembled using the MaSuRCA de novo assembler. It used all the Illumina and Nanopore reads, choosing a kmer size of 105, and estimated the teak genome size as 371,016,305 bp. Genome size of teak was estimated as 465 Mbp through flow cytometry (1C = 0.48 pg) [40]. Typically, data obtained from the WGS approach in plants with genome size exceeding a few hundred mega bases are difficult to assemble satisfactorily due to highly repetitive DNA. The final draft genome used for downstream analysis had 2,993 filtered contigs (>1 Kbp) with maximum, minimum and average contig length of 1,718,606, 1,100 and 106,098 bp, respectively (Table 4). The N50 value of the assembly was 357,576 bp. Comparative WGS analysis including estimated genome size, assembly statistics and annotation details of Lamiaceae members is provided in Supplementary Table S4.

Table 4. Draft genome assembly statistics of teak
Parameters | Contig | Scaffold | Gap closer | Draft genome
Contigs generated | 3,500 | 3,004 | 3,004 | 2,993
Maximum contig length (bp) | 1,718,119 | 1,718,322 | 1,718,606 | 1,718,606
Minimum contig length (bp) | 332 | 445 | 445 | 1,100
Average contig length (bp) | 90,394 | 105,594 | 105,712 | 106,098
Total contigs length (bp) | 316,377,938 | 317,203,315 | 317,558,121 | 317,551,182
Total number of non-ATGC characters | 2,084,563 | 2,374,171 | 808,446 | 808,446
Percentage of non-ATGC characters | 0.659 | 0.748 | 0.255 | 0.255
Contigs ≥ 500 bp | 3,484 | 3,003 | 3,003 | 2,993
Contigs ≥ 1 Kbp | 3,478 | 2,993 | 2,993 | 2,993
Contigs ≥ 10 Kbp | 2,431 | 2,069 | 2,069 | 2,069
Contigs ≥ 1 Mbp | 8 | 18 | 18 | 18
N50 value (bp) | 277,872 | 357,576 | 357,576 | 357,576

3.2. Repetitive genome elements

Repetitive DNAs and transposable elements are ubiquitously present in eukaryotic genomes. They provide a wide variety of variations across plant species. Repetitive elements are fast evolving components of the nuclear genome that play an important role in evolution of the species [42] and interspecific divergence [43]. The internal sequence variability of various repeat elements depends on the ratio between mutation and homogenization/fixation rates within a species. Repeat masking of the reference genome of teak with A. thaliana showed that a total of 19,046,577 bp (6% of the genome) had repeat elements, which was very low when compared to Salvia repeat elements. The classification of repetitive elements of the teak genome is provided in Table 5. The simple repeats were dominant, occupying about 3.36% of the total genome, and can aid in molecular marker development. A total of 6,976 (1.62%) retroelements, 6,518 (1.58%) long terminal repeats (LTRs), 4,431 (0.24%) DNA transposons, 2,687 (0.70%) Copia-type, and 2,499 (0.81%) Gypsy-type sequences were predicted in teak. The number of repeat elements in teak differed from other Lamiaceae members, Mentha longifolia and Ocimum sanctum [47,48]. In M. longifolia [46] LTR elements were predicted as 3,866, whereas in teak it was 6,518. In Ocimum tenuiflorum [48], the percent distribution of long interspersed nuclear elements (LINEs), LTR, unclassified repeat elements and total interspersed repeats was 0.3, 11.07, 27.99, and 40.71, respectively, but in teak the distribution was 0.04, 1.58, 0.02, and 1.88, respectively. In contrast, small RNA repeats were not recorded in O. tenuiflorum but 0.03% was observed in teak. Presence of a higher number of simple repeats, retroelements and interspersed repeats is common in plants, which was reflected in the teak genome as well. Overall, the percent distribution of repeat elements in the teak genome seemed to be very low compared to the pine genome, where 82% of the genome is repetitive in nature. The sequencing methods, per cent coverage and regions covered in the genome influence the representation of repeat elements. The teak genome is estimated to be 317 Mb in length, and at least 11% of its sequence is observed to be made of repeat elements. Variations in repeat elements among the members of Lamiaceae could be due to the variations in chromosome number and ploidy level among the species, which may reflect on evolutionary distances between genomes. Further, in this study, only 6% of the genome was used for masking and the WGS assembly may have missed out some regions that are rich in repeated sequences.

Table 5. Overview of repeat elements in teak genome
Type | Number of elements | Length occupied (bp) | Percentage in genome
Retroelements | 6,976 | 5,153,927 | 1.62
SINEs | 1 | 46 | 0.00
LINEs | 457 | 122,681 | 0.04
L1/CIN4 | 457 | 122,681 | 0.04
LTR elements | 6,518 | 5,031,200 | 1.58
Ty1/Copia | 2,687 | 2,212,758 | 0.70
Gypsy/DIRS1 | 2,499 | 2,564,424 | 0.81
DNA transposons | 4,431 | 749,766 | 0.24
hobo-Activator | 647 | 139,917 | 0.04
Tc1-IS630-Pogo | 1,298 | 216,636 | 0.07
Tourist/Harbinger | 285 | 70,800 | 0.02
Unclassified | 242 | 63,153 | 0.02
Total interspersed repeats | | 5,966,846 | 1.88
Small RNA | 158 | 104,298 | 0.03
Satellites | 1 | 54 | 0.00
Simple repeats | 253,260 | 10,655,131 | 3.36
Low complexity | 46,994 | 2,328,770 | 0.73
Total bases masked | | 19,046,577 | 6.00

3.3. Annotation and gene prediction

Functional annotation, the process of identifying sequence similarity to other known genes or proteins, in teak was carried out using the hard masked draft genome for gene prediction. A total of 36,172 proteins were predicted, of which 31,126 (86%) proteins were annotated against Viridiplantae (Supplementary Table S5) and 5,046 proteins were unannotated. In this study, length of the longest and shortest annotated protein sequence was 5,648 and 32 amino acids, respectively. The number of predicted genes is in accordance with Lamiaceae members, Mentha and Ocimum, except for one cultivar of O. tenuiflorum, where 53,480 genes were predicted [46–48]. The list of species considered for pathway analysis by homology based alignment of the T. grandis draft genome is given in Supplementary Table S6. The proteins with >30% identity cut off were taken for pathway analysis. Overall, 344 taxa had similarity hits when searched against the Uniprot Viridiplantae protein database for similarity using the BLASTP programme with an e-value of e-10. Fifteen plant species that showed high homology are listed in Supplementary Table S7. Among the 31,126 predicted protein-coding genes, 17,353 genes (55%) showed high similarity to Erythranthe guttata, 2,478 to Coffea canephora, 1,922 to Solanum tuberosum and 1,049 to Vitis vinifera. The highest per cent of similarity in the predicted coding genes with E. guttata (17,353 genes) could reflect the genetic relationship of this genus with teak, as both belong to the order Lamiales.

Gene ontology analysis revealed that 48.08% of genes related to molecular functions (Fig. 2a), 34.35% of genes related to cellular components (Fig. 2b) and 12.56% were involved in biological processes (Fig. 2c). In terms of biological processes, the major categories were transcription, regulation of transcription and metabolic and defense processes. Cellular component consisted of a major portion of integral component of membrane, followed by nucleus and cytoplasm components. In terms of molecular function, the top three GO terms were ATP binding, DNA binding and metal ion binding activities. All the protein sequences were classified under five categories: metabolism, cellular processes, environmental information processing, genetic information processing and organismal systems. Metabolism related sequences were represented in the highest number, in which genes representing carbohydrate metabolism (1,100) ranked first, followed by amino acid metabolism (643) (Supplementary Table S8). The top five pathways were involved in plant-pathogen interaction, plant hormone signal transduction, carbon metabolism, ribosomes, and protein processing (Supplementary Table S9).

Figure 2. Characterization of teak genome sequence by gene ontology categories: (a) Biological process; (b) Molecular function; (c) Cellular component.

Teak is known for its natural resistance against various decaying agents and is highly durable. Many biochemical studies on teak wood indicated the role of several secondary metabolites and phenolic compounds including flavanoids, alkaloids, terpenoids, quinines and tannins that play a major role in its durability [53,54]. This study has identified a total of 615 gene sequences that directly code for enzymes involved in the synthesis of specialized secondary metabolites (363 genes) and biosynthesis of terpenoids and polyketides (252 genes). Lipid metabolism related genes were also represented in higher number (542) (Supplementary Table S8). The colour of wood is associated with extractive content, and is a useful parameter to estimate the durability of heartwood. Identification of several genes responsible for the production of the above compounds in the teak genome would pave the way for understanding the basis of natural resistance of teak timber. Recently, it was shown through functional genomics studies in Santalum album that heartwood specific transcriptome signatures were responsible for the presence of particular secondary metabolites. Similar approaches would provide further insights into secondary wood formation in teak.

3.4. The frequency and distribution of SSR types in teak genome

In this study, a total of 2,993 scaffolds amounting to 317.5 Mbp were examined for SSRs, of which 2,938 sequences were harbouring SSRs. Further, 2,846 sequences had more than one SSR and 11,255 SSRs were in compound form. Different types of SSR recorded in the teak genome are shown in Table 6. A total of 182,712 SSR motifs were identified, where perfect SSRs were represented in maximum numbers (170,574) with an overall frequency of 537.15 loci/Mbp, accounting for 93% of SSRs. Compound, complex and interrupted types constituted 7% of the total SSRs, where the interrupted complex type was the least in number. Among the pure repeat motifs, mononucleotide repeats were represented in maximum counts (88,766) followed by di (81,215), tri (14,654), tetra (8,086), penta (1,967) and hexanucleotides (1,161) (Table 7). Predominant (>1,000) repeat times were 12–22 for mononucleotides, 7–16 for dinucleotides, 5–8 for trinucleotides, 4–7 for tetranucleotides and 4 for pentanucleotides. The major repeat motifs with over 5000 loci were (A)n, (T)n, (C)n, (G)n, (AC)n, (AT)n, (AG)n, (GT)n and (CT)n. Nine trinucleotide, four tetranucleotide and two pentanucleotide motifs were predominant (Supplementary Table S10). The (AT)n repeat motif, with a frequency of 131.2 loci/Mb, was the most predominant dinucleotide SSR, accounting for over 51.3% of the total dinucleotide SSRs. Primers were designed for 86,854 SSRs which had sufficient left and right sequences (Supplementary Table S11). Presence of a large number of short repeat type SSR loci in the teak genome may be due to the higher genomic mutation rate and long evolutionary history of the genus.

Table 6. Characteristics of six types of SSRs in teak genome
SSR type | Total counts | Total length (bp) | Average length (bp) | Frequency (loci/Mb) | Density (bp/Mb)
cd | 6,309 | 248,207 | 39.34 | 19.87 | 781.63
cx | 227 | 13,835 | 60.95 | 0.71 | 43.57
icd | 4,049 | 170,803 | 42.18 | 12.75 | 537.88
icx | 670 | 43,668 | 65.18 | 2.11 | 137.51
ip | 883 | 34,519 | 39.09 | 2.78 | 108.7
p | 170,574 | 3,006,200 | 17.62 | 537.15 | 9,466.82

3.5. Selection of polymorphic SSRs and validation

SSRs have become powerful markers for population genetic analysis, QTL mapping and other related genetic and genomic studies. The conventional methods for SSR genotyping are labour intensive, time consuming and costly, especially for tree species that lack DNA sequence information in the public databases. The recent advances in NGS methods offer rapid identification of repeat size variations by sequencing. Five teak accessions were re-sequenced with coverage level of 7.3x–11.6x (Table 8). The PSR tool was used to compare sequence variants among the five assemblies against the SSR sequences of the reference genome. Among the 170,574 perfect SSRs found in the teak genome, 16,252 showed polymorphisms across these genotypes and primer pairs were developed for 13,007 SSRs (Supplementary Table S12). Heterozygous and homozygous conditions at each microsatellite locus across all genotypes were detected by the comparative module (PSR poly finder).

Gel electrophoresis of all the 10 primer pairs developed for polymorphic SSRs generated by the PSR software produced perfect banding patterns and no optimization of primer annealing temperature was required. All the amplified loci generated polymorphism except one locus, as in the PSR results. Validation of a larger number of primer pairs in efficient allele separation systems like capillary electrophoresis would strengthen the SSR marker development. These results showed that identification of polymorphic SSRs by sequencing is highly cost efficient and rapid compared to conventional methods of SSR identification.

Table 7.
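The frequency (loci/Mb) and density (bp/Mb) figures reported for perfect SSRs can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, assuming that both figures are normalised by the draft genome length given in Table 4 (317,551,182 bp); the function name `per_megabase` is hypothetical and is not part of MISA or PSR.

```python
def per_megabase(value, genome_bp):
    """Scale a count (loci) or a length (bp) to a per-Mb figure."""
    return value / (genome_bp / 1_000_000)


GENOME_BP = 317_551_182      # draft genome length, Table 4
PERFECT_SSR_COUNT = 170_574  # perfect SSRs, Table 6
PERFECT_SSR_BP = 3_006_200   # total length of perfect SSRs, Table 6
ALL_SSR_COUNT = 182_712      # all SSR motifs, Section 3.4

frequency = per_megabase(PERFECT_SSR_COUNT, GENOME_BP)  # loci/Mb
density = per_megabase(PERFECT_SSR_BP, GENOME_BP)       # bp/Mb
share = 100 * PERFECT_SSR_COUNT / ALL_SSR_COUNT         # % of all SSRs

print(f"{frequency:.2f} loci/Mb, {density:.2f} bp/Mb, {share:.1f}% of SSRs")
```

Under that assumption the computed values agree with the reported 537.15 loci/Mb and 9,466.82 bp/Mb, and with perfect SSRs making up roughly 93% of all SSRs.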
The number, length, frequency and density of six different types of SSRs
Nucleotide | Total counts | Total length (bp) | Average length (bp) | Frequency (loci/Mb) | Density (bp/Mb) | SSRs in the whole genome (%)
Mononucleotide | 88,766 | 1,321,753 | 14.89 | 279.53 | 4,162.3 | 45.32
Dinucleotide | 81,215 | 1,664,278 | 20.49 | 255.75 | 5,241 | 41.4
Trinucleotide | 14,654 | 286,074 | 19.52 | 46.15 | 900.88 | 7.48
Tetranucleotide | 8,086 | 146,728 | 18.15 | 25.46 | 462.06 | 4.13
Pentanucleotide | 1,967 | 42,960 | 21.84 | 6.19 | 135.29 | 1
Hexanucleotide | 1,161 | 29,724 | 25.6 | 3.66 | 93.604 | 0.59
Cd, compound; cx, complex; icd, interrupted compound; icx, interrupted complex; ip, imperfect; p, perfect.

Table 8. Read statistics of the teak samples sequenced at low depth coverage for identification of polymorphic SSRs
Accession ID | Sample code | Total raw reads (bp) | Total processed reads (bp) | Total reference covered (%) | Coverage (×)
1 | NR | 22,503,220 | 21,315,714 | 90.1 | 9.6
3 | WR | 28,219,627 | 26,462,008 | 90.6 | 11.6
4 | DI | 19,670,649 | 18,335,390 | 87.6 | 7.8
5 | HI | 17,029,396 | 16,043,025 | 85.8 | 7.3
6 | TP | 18,838,068 | 18,451,073 | 94.3 | 8.2
Average | | | | | 8.9

3.6. Phylogeny and divergence time estimation

The Lamiaceae (Labiatae) family contains 236 genera and over 7,000 species, and is one of the largest families of seed plants. The family Verbenaceae is closely related to Lamiaceae and differentiated mainly on the basis of terminal (Verbenaceae)/gynobasic (Lamiaceae) style, with difficulties in separating members of one family from the other. Previously, several genera from Verbenaceae were transferred to Lamiaceae including Tectona [2,59], but the systematic position of the genus Tectona still lacks clarity. Out of 236 genera of Lamiaceae, 226 were placed under seven subfamilies (Ajugoideae, Lamioideae, Nepetoideae, Prostantheroideae, Scutellarioideae, Symphorematoideae and Viticoideae) and 10 genera were listed as Incertae sedis (of 'uncertain placement') by considering morphology, secondary metabolites and molecular phylogeny [60–64]. However, a recent study on chloroplast phylogeny proposed three additional subfamilies in Lamiaceae to encompass eight genera of Incertae sedis, leaving Tectona and Callicarpa unassigned. In the present study, potential phylogenetic plastid marker sequences, ycf2 and psbB, were used to disentangle the taxonomic position of the genus Tectona. Although several different genes were used for phylogeny analysis of teak, the ycf2 and psbB genes were not reported so far. The plastid gene psbB, which codes for the core protein of Photosystem II, has the highest level of translation among the chloroplast genes and provides an excellent opportunity to investigate an unusual evolutionary situation. Phylogenetic utility of psbB gene sequences has been tested in many plant families under the Order Lamiales such as Lamium [66], Chelonopsis [67], Salvia [68] and Prostanthera among others. Similar to psbB, studies on molecular systematics have undoubtedly proved that the ycf gene is more variable than matK in many taxa to resolve the phylogeny issues [70,71]. This gene harbours the highest level of nucleotide genetic diversity among the angiosperm plastid genomes. As the number of entries of Lamiaceae members in the public domain is very low, the combination of two genes, ycf2 and psbB, generated a phylogeny tree with 16 Lamiaceae species. Inclusion of other genes (rbcL and psaA) resulted in a phylogenetic tree with a smaller number (10–14) of Lamiaceae species (data not shown). The resulting phylogenetic tree retained the previously reported uncertain taxonomic status of the genus Tectona, which formed a separate sister clade to the clades comprising Lamioideae, Ajugoideae, Nepetoideae, Scutellarioideae and Premnoideae (Fig. 3). Monophyly of all the subfamilies under Lamiaceae was also reported.

Earlier classifications on morphology considered Tectona under tribe Tectoneae in subfamily Viticoideae [73–76]. Molecular phylogeny analysis and a novel combination of morphological features delineated Tectona to be an earliest diverging lineage in Lamiaceae. The phylogenetic studies of the Lamiaceae family deduced T. grandis to be an early diverging lineage [4]. The time estimated in the present study confirmed the early origin and divergence of the genera around 21.4508 Mya [95% highest posterior density (HPD): 10.11–34.52 Mya] (Fig. 4). Most of the genera of the family Lamiaceae were found to have a Miocene origin. Recently, two new subfamilies have been proposed for Lamiaceae, viz. Callicarpoideae and Tectonoideae, with Tectona as a monotypic taxon. Accordingly, assigning Tectona to a monogeneric subfamily Tectonoideae needs to be considered; however, it demands extensive sampling within Lamiaceae and multigene phylogeny analysis to provide an appropriate taxonomical position.

Figure 3. Bayesian tree generated by analysis of two plastid sequences psbB and ycf2. Posterior probabilities are shown above the branches. Scale bar specifies mean branch length. Six major clades representing subfamilies of the Lamiaceae family are indicated.

Figure 4. Chronogram of Lamiaceae (genus Tectona) based on two plastid sequences psbB and ycf2, estimated from secondary calibration strategies as implemented in BEAST. Calibration points are indicated with black dots. Node bar indicates 95% HPD interval for node ages. Geological time scale is given in Mya.

4. Conclusion

Shrinkage of natural populations of teak in its native locations and worldwide increase of managed plantations demand conservation of native forests, which are critical in providing the best possible alleles to maintain genetic diversity. Captive plantations inherently have a narrow genetic base, limited gene flow, and exist in non-native environments, and these characteristics often significantly alter the evolutionary trajectory, leading to a decrease in population fitness. Conservation and maintenance of wild progenitors are increasingly important for genetic improvement programmes. Thus, the full genome sequence of teak was developed and an attempt at the functional analyses of genome components was carried out to use it as a tool for conservation and tree breeding programmes. The polymorphic DNA markers developed in this study will propel the genetic and genomic research in teak, hitherto unavailable for the highly valuable timber yielding tropical hardwood species. The draft genome of teak along with a large number of markers will benefit various explorative studies including genetic basis of wood properties, pest tolerance, adaptive traits, germplasm movement and genetic resource conservation.

Acknowledgements

We are thankful to the State Forest Departments of Kerala, Tamil Nadu and Karnataka for their permission to collect teak samples. The financial support received from the Department of Biotechnology, Government of India (No. BT/PR7143/PBD/16/1011/2012) is gratefully acknowledged.

Conflict of interest

None declared.

Supplementary data

Supplementary data are available at DNARES online.

References

2. Thorne, R. F. 1992, Classification and geography of the flowering plants, Bot. Rev., 58, 225–327.
3. Harley, R. M., Atkins, S. and Budantsey, A. L. 2004, Labiatae. In: Kubitzki, K. and Kadereit, J. W. (eds) Families and Genera of Vascular Plants. Flowering Plants. Dicotyledons: Lamiales (except Acanthaceae Including Avicenniaceae), vol. 7. Berlin: Springer, pp. 167–275.
4. Li, B., Cantino, P. D. and Olmstead, R. G. 2016, A large-scale chloroplast phylogeny of the Lamiaceae sheds new light on its subfamilial classification, Sci. Rep., 6, 3434.
5. Kollert, W. and Kleine, M., eds. 2017, The Global Teak Study. Analysis, Evaluation and Future Potential of Teak Resources, IUFRO World Series, 36. Vienna. pp 108.
6. Kollert, W. and Cherubini, L. 2012, Teak resources and market assessment 2010, FAO Planted Forests and Trees Working Paper, FP/47/E, Rome.
7. Deb, J. C., Phinn, S., Butt, N. and Mcalpine, C. A. 2017, Climatic-induced shifts in the distribution of teak (Tectona grandis) in tropical Asia: implications for forest management and planning, Environ. Manag., 60, 422–35.
8. Hansen, O. K., Changtragoon, S., Ponoy, B., Lopez, J., Richard, J. and Kjær, E. D. 2017, Worldwide translocation of teak - origin of landraces and present genetic base, Tree Genet. Genomes, 13, 87.
9. Keiding, H., Wellendorf, H. and Lauridsen, E. B. 1986, Evaluation of an International Series of Teak Provenance Trials. Humlebaek: Danida Forest Seed Centre.
10. Kjaer, E. D., Lauridsen, E. B. and Wellendorf, H. 1995, Second Evaluation of an International Series of Teak Provenance Trials. Humlebaek: Danida Forest Seed Centre.
11. Chaix, G., Monteuuis, O., Garcia, C., et al. 2011, Genetic variation in major phenotypic traits among diverse genetic origins of teak (Tectona grandis L.f.) planted in Taliwas, Sabah, East Malaysia, Ann. For. Sci., 68, 1015–26.
12. Monteuuis, O., Goh, D. K. S., Garcia, C., Alloysius, D., Gidiman, J., Bacilieri, R. and Chaix, G. 2011, Genetic variation of growth and tree quality traits among 42 diverse genetic origins of Tectona grandis planted under humid tropical conditions in Sabah, East Malaysia, Tree Genet. Genomes, 7, 1263–75.
13. Callister, A. N. 2013, Genetic parameters and correlations between stem size, forking, and flowering in teak (Tectona grandis), Can. J. For. Res., 43, 1145–50.
14. Monteuuis, O. and Goh, D. K. S.
2015, Field growth performances of 1. Tewari, D.N. 1992, A Monograph on Teak (Tectona grandis Linn. f.), teak genotypes of different ages clonally produced by rooted cuttings, Dehra Dun, India: International Book Distributors in vitro microcuttings, and meristem culture, Can. J. For. Res., 45, 9–14. Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy013/5003450 by Ed 'DeepDyve' Gillespie user on 12 July 2018 10 Genome sequencing of Teak 15. Gunaga, R. P. and Vasudeva, R. 2002, Variation in flowering phenology 38. Yamamura, Y., Kurosaki, F. and Lee, J. B. 2017, Elucidation of terpenoid in a clonal seed orchard of teak, J. Tree Sci., 21, 1–10. metabolism in Scoparia dulcis by RNA-seq analysis, Sci. Rep., 7, 43311. 16. Nicodemus, A., Nagarajan, B. and Narayanan C. 2005, RAPD variation 39. Holliday, J. A., Aitken, S. N., Cooke, J. E. K., et al. 2017, Advances in in Indian teak populations and its implications for breeding and conserva- ecological genomics in forest trees and applications to genetic resources tion. In: Bhat, K.M., Nair, K.K.N., Bhat, K.V., Muralidharan, E.M. and conservation and breeding, Mol. Ecol., 26, 706–17. Sharma, J.K. (eds.), Quality Timber Products of Teak from Sustainable 40. Ohri, D. and Kumar, A. 1986, Nuclear DNA amounts in some tropical Forest Management. Kerala Forest Research Institute, Yokohama: India hardwoods, Caryologia, 39, 303–7. and International Tropical Timber Organization, pp. 321–30. 41. Ling, H. Q., Zhao, S., Liu, D., Wang, J., et al. 2013, Draft genome of the 17. Shrestha, M. K., Volkaert, H. and Straeten, D. V. D. 2005, Assessment of wheat A-genome progenitor Triticum urartu, Nature, 496, 87–90. genetic diversity in Tectona grandis using amplified fragment length poly- 42. Britten, R. J. 2010, Transposable element insertions have strongly affected morphism markers, Can. J. For. Res., 35, 1017–22. human evolution, Proc. Natl. Acad. Sci. U. S. A., 107, 19945–8. 18. Sreekanth, P. 
M., Balasundaran, M., Nazeem, P. A. and Suma, T. B. 43. Mehrotra, S. and Goyal, V. 2014, Repetitive sequences in plant nuclear 2012, Genetic diversity of nine natural Tectona grandis L.f. populations DNA: types, distribution, evolution and function, Genomics Proteomics of the Western Ghats in Southern India, Conserv. Genet., 13, 1409–19. Bioinformatics, 12, 164–71. 19. Fofana, I. J., Ofori, D., Poitel, M. and Verhaegen, D. 2009, Diversity and 44. Dover, G. A. 1986, Molecular drive in multigene families: how biological genetic structure of teak (Tectona grandisL.f.) in its natural range using novelties arise, spread and are assimilated, Trends Genet., 2, 159–65. DNA microsatellite markers, New Forests, 37, 175–95. 45. Xu, H., Song, J., Luo, H., et al. 2016, Analysis of the genome sequence of 20. Hansen, O. K., Changtragoon, S., Ponoy, B., et al. 2015, Genetic resour- the medicinal plant, Salvia Miltiorrhiza, Mol. Plant, 9, 949–52. ces of teak (Tectona grandis Linn. f.)—strong genetic structure among 46. Vining, K. J., Johnson, S. R., Ahkami, A., et al. 2017, Draft genome se- natural populations, Tree Genet. Genomes, 11, 802. quence of Mentha longifolia and development of resources for mint culti- 21. Galeano, E., Vasconcelos, T. S., Vidal, M., Mejia-Guerra, M. K. and var improvement, Mol. Plant., 10, 323–39. Carrer, H. 2015, Large-scale transcriptional profiling of lignified tissues in 47. Rastogi, S., Kalra, A., Gupta, V., et al. 2015, Unravelling the genome of Tectona grandis, BMC Plant Biol., 15, 221. Holy basil: an “incomparable” “elixir of life” of traditional Indian medi- 22. Diningrat, D. S., Widiyanto, S. M., Pancoro, A., et al. 2015, cine, BMC Genomics, 16, 413. Transcriptome of teak (Tectona grandis, L.f) in vegetative to generative 48. Upadhyay, A. K., Chacko, A. R., Gandhimathi, A., et al. 2015, Genome stages development, J. Plant Sci., 10, 1–14. sequencing of herb Tulsi (Ocimum tenuiflorum) unravels key genes be- 23. Doyle, J. J. and Doyle, J. L. 
1990, Isolation of plant DNA from fresh tis- hind its strong medicinal properties, BMC Plant Biol., 15, 212., sue, Focus, 12, 13–5. 49. Weiss-Schneeweiss, H., Leitch, A. R., Jamie, J., Jang, T. S. and Macas, J. 24. Kajitani, R., Toshimoto, K., Noguchi, H., et al. 2014, Efficient de novo as- 2015, Employing next generation sequencing to explore the repeat land- sembly of highly heterozygous genomes from whole-genome shotgun scape of the plant genome. In: Ho ¨ randl, E. and Appelhans, M. (eds). Next short reads, Genome Res., 24, 1384–95. Generation Sequencing in Plant Systematic, Regnum Vegetabile 157. 25. Zimin, A., Marc ¸ais, G., Puiu, D., Roberts, M., Salzberg, S. and Yorke, J. Ko ¨ nigstein, Germany: Koeltz Scientific Books, pp. 1–25. 2013, The MaSuRCA genome assembler, Bioinformatics, 29, 2669–77. 50. Neale, D. B., Wegrzyn, J. L., Stevens, K. A., et al. 2014, Decoding the 26. Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. and Pirovano, W. massive genome of loblolly pine using haploid DNA and novel assembly 2011, Scaffolding pre-assembled contigs using SSPACE, Bioinformatics, strategies, Genome Biol. , 15, R59. 27, 578–9. 51. Parween, S., Nawaz, K., Roy, R., et al. 2015, An advanced draft genome 27. Simpson, J. T. and Durbin, R. 2012, Efficient de novo assembly of large assembly of a desi type chickpea (Cicer arietinum L.), Sci. Rep., 5, 12806. genomes using compressed data structures, Genome Res., 22, 549–56. 52. Refulio-Rodriguez, N. F. and Olmstead, R. G. 2014, Phylogeny of 28. Stanke, M., Diekhans, M., Baertsch, R. and Haussler, D. 2008, Using na- Lamiidae, Am. J. Bot., 101, 287–99. tive and syntenically mapped cDNA alignments to improve de novo gene 53. Francisco, A. M., Rodne, Y. L., Rosa, M. V., Clara, N. and Jose, M. G. finding, Bioinformatics, 24, 637–44. M. 2008, Bioactive apocarotenoids from Tectona grandis, 29. Cantarella, C. and D’Agostino, N. 2015, PSR: polymorphic SSR retrieval, Phytochemistry, 69, 2708–15. BMC Res. Notes, 8, 54. 
Lacret, R., Varela, R. M., Molinillo, J. M. G., Nogueiras, C. and Macı´as, 30. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. and F. A. 2012, Tectonoelins, new norlignans from a bioactive extract of Higgins, D. G. 1997, The CLUSTAL_X windows interface: flexible strate- Tectona grandis, Phytochem. Lett., 5, 382–6. gies for multiple sequence alignment aided by quality analysis tools, Nuc. 55. Celedon, J. M. and Bohlmann, J. 2018, An extended model of heartwood Acids Res., 25, 4876–82. secondary metabolism informed by functional genomics, Tree Physiol., 31. Hall, T. A. 1999, BioEdit: a user-friendly biological sequence alignment 14, 1–9. editor and analysis program for Windows 95/98/NT, Nucleic Acids 56. Toth, G., Gaspari, Z. and Jurka, J. 2000, Microsatellites in different eu- Symp. Ser., 41, 95–8. karyotic genome: survey and analysis, Genome, Res., 10, 967–981. 32. Darriba, D., Taboada, G. L., Doallo, R. and Posada, D. 2012, 57. Vieira, M. L. C., Santini, L., Diniz, A. L. and Munhoz, C. D. F. 2016, JModelTest 2: more models, new heuristics and parallel computing, Nat. Microsatellite markers: what they mean and why they are so useful, Methods, 9, 772. Genet. Mol. Biol., 39, 312–28. 33. Akaike, H. 1974, A new look at statistical model identification, IEEE 58. Pullaiah, T., Sri Rama Murthy, K. and Karuppusamy, S. 2007, Flora of Trans. Automat. Contr., 19, 716–23. Eastern Ghats, Vol. 3. New Delhi: Regency Publications. 34. Ronquist, F. and Huelsenbeck, J. P. 2003, MRBAYES 3: bayesian phylo- 59. Cantino, P. D., Harley, R. M. and Wagstaff, S. J. 1992, Genera of genetic inference under mixed models, Bioinformatics, 19, 1572–4. Labiatae: status and classification. In: Harley, R.M. and Reynolds, T. 35. Drummond, A. J., Suchard, M. A., Xie, D. and Rambaut, A. 2012, (eds). Advances in Labiatae Science, Kew, UK: Key Botanic Garden, pp. Bayesian phylogenetics with BEAUti and the BEAST 1.7, Mol. Biol. Evol., 511–22. 29, 1969–73. 60. Cantino, P. D., Olmstead, R. G. 
and Wagstaff, S. J. 1997, A comparison 36. Roy, T. and Lindqvist, C. 2015, New insights into evolutionary relation- of phylogenetic nomenclature with the current system: a botanical case ships within the subfamily Lamioideae (Lamiaceae) based on pentatrico- study, Syst. Biol., 46, 313–31. peptide repeat (PPR) nuclear DNA sequences, Amer. J. Bot., 102, 61. Wagstaff, S. J. and Olmstead, R. G. 1997, Phylogeny of Labiatae and 1721–35. Verbenaceae inferred from rbcL sequences, Syst. Bot., 22, 165–79. 37. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. and Rambaut, A. 2006, 62. Wagstaff, S. J., Hickerson, L., Spangler, R., Reeves, P. A. and Olmstead, Relaxed phylogenetics and dating with confidence, PLoS Biol., 4, R. G. 1998, Phylogeny in Labiatae s. l., inferred from cpDNA sequences, e88–710. Plant Syst. Evol., 209, 265–74. Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy013/5003450 by Ed 'DeepDyve' Gillespie user on 12 July 2018 Y. Ramasamy et al. 11 63. Alvarenga, S. A. V., Gastmans, J. P., Rodrigues, G. D. V., Moreno, P. R. 69. Conn, B. J., Wilson, T. C., Henwood, M. J. and Proft, K. 2013, H. and Emerenciano, V. D. P. 2001, A computer-assisted approach for Circumscription and phylogenetic relationships of Prostanthera densa chemotaxonomic studies - diterpenes in Lamiaceae, Phytochemistry, 56, and P. marifolia (Lamiaceae), Telopea, 15, 149–64. 583–95. 70. Neubig, K. M., Whitten, W. M., Carlsward, B. S., Blanco, M. A., Endara, L., 64. Bramley, G. L. C., Forest, F. and De Kok, R. P. J. 2009, Troublesome Williams, N. H. and Moore, M. 2009, Phylogenetic utility of ycf1 in orchids: tropical mints: re-examining generic limits of Vitex and relations a plastid gene more variable than matK, Plant Syst. Evol., 277, 75–84. (Lamiaceae) in South East Asia, Taxon, 58, 500–10. 71. Dong, W., Xu, C., Li, C., et al. 2015, ycf1, the most promising plastid 65. Morton, B. R. and Levin, J. A. 
1997, The atypical codon usage of the DNA barcode of land plants, Sci. Rep., 5, 8348. plant psbA gene may be the remnant of an ancestral bias, Proc. Nat. Aca. 72. Dong, W., Liu, J., Yu, J., Wang, L. and Zhou, S. 2012, Highly variable Sci. U.S.A., 94, 11434–8. chloroplast markers for evaluating plant phylogeny at low taxonomic lev- 66. Bendiksby, M., Brysting, A. K., Thorbek, L., Gussarov, G. and Ryding, G. els and for DNA barcoding, PLoS One, 7, e35071. 2011, Molecular phylogeny and taxonomy of genus Lamium L 73. Briquet, J. 1897, Verbenaceae. In: Engler, A. and Prantl, K. (eds). Die (Lamiaceae): Disentangling origins of presumed allotetraploids, Taxon, Naturlichen Pflanzenfamilien Teil 4. Leipzig: Engelmann, pp. 132–182. 60, 986–1000., 74. Melchior, H. 1964, A. Engler’s Syllabus Der Pflanzenfamilien, vol. 2. 67. Xiang, C.-L., Zhang, Q., Scheen, A.-C., Cantino, P. D., Funamoto, T. and Berlin: Borntraeger, pp. 666. Peng, H. 2013, Molecular phylogenetics of Chelonopsis (Lamiaceae: gom- 75. Moldenke, H. N. 1975, Notes on new and noteworthy plants LXXVII, phostemmateae) as inferred from nuclear and plastid DNA and morphol- Phytologia, 31, 28. ogy, Taxon, 62, 375–86. 76. Takhtajan, A. 1983, Outline of the classification of flowering plants 68. Jenks, A. A., Walker, J. B. and Kim, S. C. 2011, Evolution and origins of (Magnoliophyta), Brittonia, 35, 254–359. the Mazatec hallucinogenic sage, Salvia divinorum (Lamiaceae): a molecu- 77. Li, B. and Olmstead, R. 2017, Two new subfamilies in Lamiaceae, lar phylogenetic approach, J. Plant Res., 124, 593–600. Phytotaxa, 313, 222–6. Downloaded from https://academic.oup.com/dnaresearch/advance-article-abstract/doi/10.1093/dnares/dsy013/5003450 by Ed 'DeepDyve' Gillespie user on 12 July 2018
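
Illustrative note on the two-gene analysis of Section 3.6. The text states that ycf2 and psbB sequences were combined to build a tree for 16 Lamiaceae species. One common way to combine genes is to join the per-gene alignments side by side into a single matrix before tree inference. The following is a minimal sketch of that step only, assuming each gene has already been aligned separately and is held as a mapping from taxon name to aligned sequence. The function name `concatenate_alignments`, the toy sequences and the choice to pad a missing gene with gap characters are assumptions made for illustration; they are not taken from the paper and do not describe the authors' pipeline.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments (dict: taxon -> aligned sequence) side by side.

    Hypothetical helper for illustration. A taxon missing from one gene is
    padded with gap characters so that every row has the same length.
    """
    lengths = []
    for aln in alignments:
        seq_lengths = {len(seq) for seq in aln.values()}
        if len(seq_lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        lengths.append(seq_lengths.pop())

    # Union of all taxon names, sorted for a reproducible row order.
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))

    combined = {}
    for taxon in taxa:
        parts = [aln.get(taxon, gap * n) for aln, n in zip(alignments, lengths)]
        combined[taxon] = "".join(parts)
    return combined


# Toy data (invented, not real ycf2 or psbB sequences).
ycf2 = {"Tectona_grandis": "ATGCA", "Salvia_sp": "ATGGA", "Lamium_sp": "AT-CA"}
psbB = {"Tectona_grandis": "GGTT", "Salvia_sp": "GGTA"}

matrix = concatenate_alignments([ycf2, psbB])
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

Padding with gaps keeps taxa that lack one of the genes in the matrix instead of dropping them, which matters when, as the text notes, few Lamiaceae entries are available in public databases. Whether missing data were handled this way in the study is not stated.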

Journal

DNA Research, Oxford University Press

Published: May 24, 2018
