Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly

Abstract

Background: Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behaviour. Their breeding biology (sequential hermaphroditism) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging advances in Nanopore long-read technology, we report the first hybrid assembly of the clown anemonefish (Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics.

Results: We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in scaffold N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches.

Conclusions: We present the first genome of any anemonefish and demonstrate the value of low coverage (∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae.

Keywords: clownfish; long reads; genome; transcriptome; hybrid assembly

Received: 14 November 2017; Revised: 11 December 2017; Accepted: 27 December 2017

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/gigascience/article-abstract/7/3/1/4803946 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Data Description

The clown anemonefish, Amphiprion ocellaris (Fig. 1, NCBI Taxon ID: 80972, FishBase ID: 6509), is a tropical marine fish species that is well known among the nonscientific community, especially following the Pixar film Finding Nemo and its sequel Finding Dory [1]. The visual appeal of A. ocellaris, due to its bright coloration and behaviour, together with its ease of husbandry, has maintained a strong global demand for this species in the marine aquarium trade, driving a fine balance between positive environmental awareness and sustainable ornamental use [1, 2]. Further, given high survival rates and the ability to complete their life cycle in captivity, captive-breeding programs to partially sustain their global trade have been successful [3]. For the scientific community, A. ocellaris and anemonefishes in general are actively studied due to their intriguing reproductive strategy, i.e., sequential hermaphroditism [4–7], and mutualistic relationships with sea anemones [8–12].

Figure 1: The clown anemonefish (Amphiprion ocellaris). Photo by Michael P. Hammer.

Phenotypic body colour variation based on host-anemone use and geography also poses additional questions regarding adaptive genetic variation [13].

In recent years, concurrent with the advent of long read sequencing technologies [14], several studies have explored combining short but accurate Illumina reads with long but less accurate Nanopore/PacBio reads to obtain genome assemblies that are usually more contiguous with higher completeness than assemblies based on Illumina-only reads [15–19]. To further contribute to the evaluation of long read technology in fish genomics [15], we sequenced the whole genome of A. ocellaris using Oxford Nanopore and Illumina technologies and demonstrate that hybrid assembly of long and short reads greatly improved the quality of genome assembly.

Whole-genome sequencing

Tissues for genome assembly and as reference material were sourced from the collection of the Museum and Art Gallery of the Northern Territory (NTM). The samples used for DNA extraction and subsequent whole-genome sequencing were from freshly vouchered captive bred A. ocellaris specimens, representing a unique black and white colour phenotype found only in the Darwin Harbour region, Australia (NTM A3764, A4496, A4497).

Genomic DNA was extracted from multiple fin clip and muscle samples using the E.Z.N.A. Tissue DNA Kit (Omega Bio-tek, Norcross, GA, USA). For Illumina library prep, approximately 1 μg of gDNA from isolate A3764 was sheared to 300 bp using a Covaris Focused-Ultrasonicator (Covaris, Woburn, MA, USA) and subsequently processed using the TruSeq DNA Sample Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions. Paired-end sequencing was performed on a single lane of HiSeq 2000 (Illumina, San Diego, CA, USA) located at the Malaysian Genomics Resource Centre Berhad. Two additional libraries were constructed from specimen NTM A3764, and both libraries were sequenced on the MiSeq (2 × 300 bp setting), located at the Monash University Malaysia Genomics Facility.

To generate Oxford Nanopore long reads, approximately 5 μg of gDNA was extracted from isolates NTM A4496 and A4497, size-selected (8–30 kb) with a BluePippin (Sage Science, Beverly, MA, USA), and processed using the Ligation Sequencing 1D Kit (Oxford Nanopore, Oxford, UK) according to the manufacturer’s instructions. Three libraries were prepared and sequenced on 3 different R9.4 flowcells using the MinION portable DNA sequencer (Oxford Nanopore, Oxford, UK) for 48 hours.

Sequence read processing

Raw Illumina short reads were adapter-trimmed with Trimmomatic v.0.36 (ILLUMINACLIP:2:30:10, MINLEN:100; Trimmomatic, RRID:SCR_011848) [20], followed by a screening for vectors and contaminants using Kraken v.0.10.5 (Kraken, RRID:SCR_005484) [21] based on the MiniKraken DB. Kraken-unclassified reads, i.e., reads of nonmicrobial/viral origin, were aligned to the complete mitogenome of NTM A3764 (see the Mitogenome recovery via genome skimming section) to exclude sequences of organellar origin. This resulted in a total of 42.35 Gb of “clean” short reads. Nanopore reads were base-called from their raw FAST5 files using the Oxford Nanopore proprietary base-caller, Albacore, version 2.0.1. Applying a minimum length cutoff of 500 bp, this study produced a total of 8.95 Gbp in 895 672 Nanopore reads (N50: 12.7 kb). Sequencing statistics are available in Supplementary Table 1.

Genome size estimation

K-mer counting with the “clean” Illumina reads was performed with Jellyfish v.2.2.6 (Jellyfish, RRID:SCR_005491) [22], generating k-mer frequency distributions of 17-, 21-, and 25-mers. These histograms were processed by GenomeScope [23], which estimated a genome size of 791 to 794 Mbp with approximately 80% of unique content and a heterozygosity level of 0.6% (Supplementary Fig. 1). Given that we had previously excluded adapters as well as sequences from contaminant or organellar sources, the max kmer coverage filter was not applied (max kmer coverage: -1). A separate estimation performed by BBMap [24] estimated a haploid genome size of 967 Mbp. The genome sizes estimated from both approaches are within the range of sizes listed for other Amphiprion species (792 Mb–1.2 Gb) as reported on the Animal Genome Size Database [25].

Hybrid genome assembly

Short reads used for assemblies described in this study were only trimmed for adapters, but not for quality. Both short-read-only and hybrid de novo assemblies were performed with the Maryland Super-Read Celera Assembler v.3.2.2 (MaSuRCA, RRID:SCR_010691) [26]. During hybrid assembly, errors were encountered in the fragment correction step of the Celera Assembler (CA; Celera assembler, RRID:SCR_010750). To overcome this, given that the CA assembler is no longer maintained, we disabled the frgcorr step based on one of the developer’s recommendations, and the hybrid assembly was subsequently improved with 10 iterations of Pilon v.1.22 (Pilon, RRID:SCR_014731) [27], using short reads to correct bases, fix misassemblies, and fill assembly gaps. To assess the completeness of the genome, Benchmarking Universal Single-Copy Orthologs v.3.0.2 (BUSCO, RRID:SCR_015008) [28] was used to locate the presence or absence of the Actinopterygii-specific set of 4584 single-copy orthologs (OrthoDB v9).

The short-read-only and hybrid assemblies yielded total assembly sizes of 851 Mb and 880 Mb, respectively. Statistics for assemblies for each Pilon iteration are available in Supplementary Table 2. Inclusion of Nanopore long reads for a hybrid assembly, representing approximately 11× genome coverage, led to a 94% decrease in the number of scaffolds (>500 bp) from 106 526 to 6404 scaffolds and an 18-fold increase in the scaffold N50 length from 21 802 bp to 401 715 bp (Table 1). In addition, the genome completeness was also substantially improved in the hybrid assembly, with BUSCO detecting complete sequences of 96.3% (4417/4584) of single-copy orthologs in the Actinopterygii-specific dataset.

Table 1: Genome and transcriptome statistics of the clownfish (Amphiprion ocellaris) genome

Genome assembly | Illumina (≥500 bp) | Illumina + Nanopore (≥500 bp)
Contig statistics | |
Number of contigs | 133 997 | 7810
Total contig size, bp | 851 389 851 | 880 159 068
Contig N50 size, bp | 15 458 | 323 678
Longest contig, bp | 204 209 | 2 051 878
Scaffold statistics | |
Number of scaffolds | 106 526 | 6404
Total scaffold size, bp | 852 602 726 | 880 704 246
Scaffold N50 size, bp | 21 802 | 401 715
Longest scaffold, bp | 227 111 | 3 111 502
GC/AT/N, % | 39.6/60.2/0.14 | 39.4/60.5/0.06
BUSCO genome completeness | |
Complete | 3691 (80.5%) | 4417 (96.3%)
Complete and single copy | 3600 (78.5%) | 4269 (93.1%)
Complete and duplicated | 91 (2.0%) | 148 (3.2%)
Fragmented | 534 (11.6%) | 63 (1.4%)
Missing | 359 (7.9%) | 104 (2.3%)

Transcriptome assembly | Value
Number of contigs | 25 364
Total length, bp | 68 405 796
Contig N50 size, bp | 3670
BUSCO completeness |
Complete | 4253 (92.8%)
Complete and single-copy | 4128 (90.1%)
Complete and duplicated | 125 (2.7%)
Fragmented | 127 (2.8%)
Missing | 204 (4.4%)

Genome annotation | Value
Number of protein-coding genes | 27 420
Number of functionally annotated proteins | 26 211
Mean protein length | 514 aa
Longest protein | 29 084 aa (titin protein)
Average number (length) of exon per gene | 9 (355 bp)
Average number (length) of intron per gene | 8 (1532 bp)

Transcriptome sequencing and assembly

Total RNA was extracted from RNAshield-preserved whole-body and muscle tissues of isolate A4496 using the Quick-RNA MicroPrep (Zymo Research Corp, Irvine, CA, USA) according to the manufacturer’s protocols. After assessing total RNA intactness on the Tapestation 2100 (Agilent), mRNA was enriched using the NEBNext Poly(A) mRNA Magnetic Isolation Kit (NEB, Ipswich, MA, USA) and processed with the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA). Libraries from both whole-body and muscle tissues were sequenced on a fraction of a MiSeq V3 flowcell (1 × 150 bp). Single-end reads from both libraries, in addition to 2 publicly available A. ocellaris transcriptome sequencing data sets (SRR5253145 and SRR5253146, Bioproject ID: PRJNA374650), were individually assembled using Scallop v0.10.2 [29] based on HiSat2 [30] alignment of RNA-sequencing reads to the newly generated A. ocellaris genome. The transcriptome assemblies were subsequently merged using the tr2aacds pipeline from the EvidentialGene [31] package and similarly assessed for completeness using BUSCO, version 3 [28]. The final nonredundant transcriptome assembly, which was subsequently used to annotate the A. ocellaris genome, contains 25 264 contigs/isotigs (putative transcripts) with an accumulated length of 68.4 Mb and BUSCO-calculated completeness of 92.8% (Table 1).

Genome annotation

Protein-coding genes were predicted with the MAKER v.2.31.9 genome annotation pipeline (MAKER, RRID:SCR_005309) [32]. A total of 3 passes were run with MAKER2; the first pass was based on hints from the assembled transcripts as RNA-seq evidence (est2genome) and protein sequences from 11 fish species downloaded from Ensembl (Ensembl, RRID:SCR_002344) [33] (protein2genome), whereas the second and third passes included gene models trained from the first (and then second) passes with the ab initio gene predictors SNAP (SNAP, RRID:SCR_002127) [34] and Augustus (Augustus: Gene Prediction, RRID:SCR_008417) [35]. In the final set of genes predicted, sequences with annotation edit distance (AED) values of less than 0.5 were retained. A small AED value suggests a lesser degree of difference between the predicted protein and the evidence used in the prediction (i.e., fish proteins, transcripts). This resulted in a final set of 27 240 protein-coding genes with an average AED of 0.14 (Table 1). A BUSCO analysis on the completeness of the predicted protein dataset detected the presence of 4259 (92.9%) single-copy orthologs from the Actinopterygii-specific dataset.

Further, to infer the putative function of these predicted proteins, NCBI’s blastp v.2.6.0 (-evalue 1e-10, -seg yes, -soft_masking true, -lcase_masking; BLASTP, RRID:SCR_001010) [36] was used to find homology to existing vertebrate sequences in the nonredundant (NR) database. After applying a hit fraction filter that retained only hits with ≥70% target length fraction, the sequences that remained unannotated were aligned to all sequences in the NR database. With this method, 20 107 proteins (74%) were annotated with a putative function based on homology. Additionally, InterProScan v.5.26.65 (InterProScan, RRID:SCR_005829) [37] was used to examine protein domains, signatures, and motifs present in the predicted protein sequences. This analysis detected domains, signatures, or motifs for 26 211 proteins (96%). Overall, 96% of the predicted clownfish protein-coding genes were functionally annotated with information from at least 1 of the 2 approaches.

Mitogenome recovery via genome skimming

Genome skimming [38, 39] was performed on 3 additional A. ocellaris individuals from known localities (Supplementary Table 3). Mitogenome assembly was performed with MITObim, version 1.9 (MITObim, RRID:SCR_015056) [40], using the complete mitogenome of A. ocellaris (GenBank: NC_009065.1) as the bait for read mapping. The assembled mitogenomes were subsequently annotated with MitoAnnotator [41]. Consistent with the original broodstock collection from northern Australia, the captive-bred black and white A. ocellaris NTM A3764 exhibits strikingly high whole-mitogenome nucleotide identity (99.98%) to sample NTM A3708, a wild collection from Darwin Harbour, Australia. In addition, the overall high pair-wise nucleotide identity (>98%) of NTM A3764 to newly generated and publicly available A. ocellaris whole mitogenomes further supports its morphological identification as A. ocellaris (Supplementary Table 3).

Identification of the cyp19a1a gene associated with sexual differentiation

The validated cyp19a1a enzyme of Danio rerio (Uniprot: O42145) was used as the query (E-value = 1e-10) for a blastp search against the predicted A. ocellaris proteins. The top blast hit, AMPOCE_00012675-RA (71.5% protein identity to O42145), was searched (tblastn) against the NCBI TSA database (Taxon: Amphiprion) and showed strikingly high protein identity (99%) to a translated RNA transcript from Amphiprion bicinctus (c183337_g1_i2: GDCV01327693) [5]. The cyp19a1a gene codes for a steroidogenic enzyme that converts androgens into estrogens [42] and was recently shown to be instrumental during sex change in Amphiprion bicinctus, as evidenced by significant correlation and differential expression of this gene between males and mature females [5]. We also observed a similar profile based on mapping of RNA reads from the publicly available male and female transcriptomes of A. ocellaris to the cyp19a1a gene region, as visualized using the Integrative Genomics Viewer (Fig. 2) [43]. The A. ocellaris cyp19a1a gene is located on a 419-kb scaffold and is spanned by multiple Minimap2-aligned Nanopore reads [44]. It is noteworthy that in the Illumina-only assembly, this gene is fragmented and located on 3 relatively short scaffolds (Fig. 2).

Figure 2: Mapping of MinION long reads, Illumina-assembled scaffolds, and RNA-sequencing reads of male and female A. ocellaris to the genomic region containing the cyp19a1a gene. Transcripts per million (TPM) values were calculated using Kallisto, version 0.43.1 [46].

Conclusion

We present the first clownfish genome co-assembled with high-coverage Illumina short reads and low-coverage (∼11×) Nanopore long reads. Hybrid assembly of Illumina and Nanopore reads is one of the new features of the MaSuRCA assembler, version 3.2.2, which works by constructing long and accurate mega-reads from the combination of long and short read data. Although this is a relatively computationally intensive strategy with long run times, we observed substantial improvement in the genome statistics when compared with the Illumina-only assembly. As Nanopore technology becomes more mature, it is likely that future de novo genome assembly will shift toward high-coverage long read–only assembly, followed by multiple iterations of genome polishing using Illumina reads.

Availability of supporting data

Data supporting the results of this article are available in the GigaDB repository [45]. Raw Illumina and Nanopore reads generated in this study are available in the Sequence Read Archive (SRP123679), whereas the Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession NXFZ00000000, both under BioProject PRJNA407816.

Abbreviations

bp: base pair; CDS: coding sequence; Gb: giga base; kb: kilo base; Mb: mega base; SRA: Sequence Read Archive; TE: transposable elements; TSA: transcriptome shotgun assembly.

Additional files

Additional file 1: Figure S1: Genome profiling of A. ocellaris based on Illumina short reads.
Additional file 1: Table S1: Summary of raw reads generated from genome and transcriptome sequencing.
Additional file 1: Table S2: Assembly details after each Pilon iteration.
Additional file 1: Table S3: Mitogenome similarity of Amphiprion ocellaris between the target sample (NTM A3764) and other isolates with known locality; body-colour phenotype is marked where known.

Competing interests

The authors declare that they have no competing interests.

Funding

This study was funded by the Monash University Malaysia Tropical and Biology Multidisciplinary Platform.

References

1. Militz TA, Foale S. The “Nemo Effect”: perception and reality of Finding Nemo’s impact on marine aquarium fisheries. Fish Fish 2017;18(3):596–606.
2. Madduppa HH, von Juterzenka K, Syakir M et al. Socio-economy of marine ornamental fishery and its impact on the population structure of the clown anemonefish Amphiprion ocellaris and its host anemones in Spermonde Archipelago, Indonesia. Ocean Coast Manag 2014;100(Supplement C):41–50.
3. Hall H, Warmolts D. The role of public aquariums in the conservation and sustainability of the marine ornamentals trade. In: Cato JC, Brown CL, eds. Marine Ornamental Species. Ames, Iowa, USA: Blackwell Publishing Company; 2008:305–24.
4. Madhu R, Madhu K, Retheesh T. Life history pathways in false clown Amphiprion ocellaris Cuvier, 1830: a journey from egg to adult under captive condition. J Marine Biol Assoc India 2012;54(1):77–90.
5. Casas L, Saborido-Rey F, Ryu T et al. Sex change in clownfish: molecular insights from transcriptome analysis. Sci Rep 2016;6:35461.
6. Buston P. Social hierarchies: size and growth modification in clownfish. Nature 2003;424(6945):145–6.
7. Kobayashi Y, Horiguchi R, Miura S et al. Sex- and tissue-specific expression of P450 aromatase (cyp19a1a) in the yellowtail clownfish, Amphiprion clarkii. Comp Biochem Physiol A Mol Integr Physiol 2010;155(2):237–44.
8. Davenport D, Norris KS. Observations on the symbiosis of the sea anemone Stoichactis and the pomacentrid fish, Amphiprion percula. Biol Bull 1958;115(3):397–410.
9. Arvedlund M, Nielsen LE. Do the anemonefish Amphiprion ocellaris (Pisces: Pomacentridae) imprint themselves to their host sea anemone Heteractis magnifica (Anthozoa: Actinidae)? Ethology 1996;102(2):197–211.
10. Mariscal RN. An experimental analysis of the protection of Amphiprion xanthurus Cuvier & Valenciennes and some other anemone fishes from sea anemones. J Exp Marine Biol Ecol 1970;4(2):134–49.
11. Hattori A. Coexistence of two anemonefishes, Amphiprion clarkii and A. perideraion, which utilize the same host sea anemone. Environ Biol Fish 1995;42(4):345–53.
12. Schmiege PF, D’Aloia CC, Buston PM. Anemonefish personalities influence the strength of mutualistic interactions with host sea anemones. Marine Biol 2017;164(1):24.
13. Allen GR. Damselfishes of the World. Melle, Germany: Mergus Publishers; 1991.
14. Heather JM, Chain B. The sequence of sequencers: the history of sequencing DNA. Genomics 2016;107(1):1–8.
15. Austin CM, Tan MH, Harrisson KA et al. De novo genome assembly and annotation of Australia’s largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read. Gigascience 2017;6(8):1–6.
16. Gan HM, Lee YP, Austin CM. Nanopore long-read guided complete genome assembly of Hydrogenophaga intermedia, and genomic insights into 4-aminobenzenesulfonate, p-aminobenzoic acid and hydrogen metabolism in the genus Hydrogenophaga. Front Microbiol 2017;8:1880.
17. Zimin AV, Puiu D, Hall R et al. The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. Gigascience 2017;6(11):1–7.
18. Zimin AV, Stevens KA, Crepeau MW et al. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 2017;6(1):1–4.
19. Zimin AV, Puiu D, Luo M-C et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm. Genome Res 2017;27(5):787–92.
20. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30(15):2114–20.
21. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014;15(3):R46.
22. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 2011;27(6):764–70.
23. Vurture GW, Sedlazeck FJ, Nattestad M et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 2017;33(14):2202–4.
24. Bushnell B. BBMap Short Read Aligner. Berkeley, CA: University of California; 2016. http://sourceforge.net/projects/bbmap (15 June 2017, date last accessed).
25. http://www.genomesize.com (15 June 2017, date last accessed).
26. Zimin AV, Marçais G, Puiu D et al. The MaSuRCA genome assembler. Bioinformatics 2013;29(21):2669–77.
27. Walker BJ, Abeel T, Shea T et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 2014;9(11):e112963.
28. Simão FA, Waterhouse RM, Ioannidis P et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015;31(19):3210–2.
29. Shao M, Kingsford C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol 2017; doi:10.1038/nbt.4020. https://www.nature.com/articles/nbt.4020#supplementary-information (15 June 2017, date last accessed).
30. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods 2015;12:357. https://www.nature.com/articles/nmeth.3317#supplementary-information (15 June 2017, date last accessed).
31. Gilbert D. Gene-omes built from mRNA seq not genome DNA. F1000Res 2016;5(1695):1.
32. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 2011;12(1):491.
33. Hubbard T, Barker D, Birney E et al. The Ensembl genome database project. Nucleic Acids Res 2002;30(1):38–41.
34. Korf I. Gene finding in novel genomes. BMC Bioinformatics 2004;5(1):59.
35. Stanke M, Steinkamp R, Waack S et al. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res 2004;32(suppl 2):W309–12.
36. Boratyn GM, Camacho C, Cooper PS et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res 2013;41(W1):W29–33.
37. Zdobnov EM, Apweiler R. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics 2001;17(9):847–8.
38. Gan HM, Schultz MB, Austin CM. Integrated shotgun sequencing and bioinformatics pipeline allows ultra-fast mitogenome recovery and confirms substantial gene rearrangements in Australian freshwater crayfishes. BMC Evol Biol 2014;14(1):19.
39. Grandjean F, Tan MH, Gan HM et al. Rapid recovery of nuclear and mitochondrial genes by genome skimming from Northern Hemisphere freshwater crayfish. Zool Scripta 2017; doi:10.1111/zsc.12247.
40. Hahn C, Bachmann L, Chevreux B. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads – a baiting and iterative mapping approach. Nucleic Acids Res 2013;41(13):e129.
41. Iwasaki W, Fukunaga T, Isagozawa R et al. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol 2013;30(11):2531–40.
42. Kallivretaki E, Eggen R, Neuhauss S et al. Aromatase in zebrafish: a potential target for endocrine disrupting chemicals. Mar Environ Res 2006;62(90):7.
43. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinformatics 2013;14(2):178–92.
44. Li H. Minimap2: fast pairwise alignment for long nucleotide sequences. arXiv 2017. https://arxiv.org/abs/1708.01492 (15 June 2017, date last accessed).
45. Tan MH, Austin CM, Hammer MP et al. Supporting data for “Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly.” GigaScience Database 2017. http://dx.doi.org/10.5524/100397 (15 June 2017, date last accessed).
46. Bray NL, Pimentel H, Melsted P et al. Near-optimal probabilistic RNA-seq quantification. Nat Biotech 2016;34(5):525–7. http://www.nature.com/nbt/journal/v34/n5/abs/nbt.3519.html#supplementary-information (15 June 2017, date last accessed).

GigaScience, Oxford University Press
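
As a worked check on the headline statistics reported above (approximate read coverage, the fold change in scaffold N50, and the reduction in scaffold count), the arithmetic can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names n50 and coverage are hypothetical helpers introduced here, the N50 definition used is the common one (the length at which the running total of sequences sorted longest-first reaches half of the assembly size), and the input numbers are the rounded values quoted in the abstract and Table 1. The coverage figures of 54× and 11× correspond to the lower genome size estimate of 791 Mbp.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # only reached if lengths are not positive integers


def coverage(total_bases: float, genome_size: float) -> float:
    """Approximate fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # Rounded values quoted in the abstract; 791 Mbp is the lower size estimate.
    genome_size = 791e6
    print("Illumina coverage:", round(coverage(43e9, genome_size), 1))
    print("Nanopore coverage:", round(coverage(9e9, genome_size), 1))

    # Scaffold statistics from Table 1 (Illumina-only vs hybrid assembly).
    print("Scaffold N50 fold change:", round(401_715 / 21_802, 1))
    print("Scaffold count reduction, %:", round(100 * (1 - 6404 / 106_526), 1))

    # Toy example of the N50 definition on five made-up scaffold lengths.
    print("Toy N50:", n50([2, 3, 4, 5, 6]))
```

The toy call at the end uses invented lengths purely to show the definition; computing N50 for the real assemblies would require the scaffold lengths from the deposited FASTA files.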

Publisher
BGI
Copyright
© The Author(s) 2018. Published by Oxford University Press.
eISSN
2047-217X
D.O.I.
10.1093/gigascience/gix137

Abstract

Background: Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behavior. Their breeding biology (sequential hermaphrodites) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging on the advancement in Nanopore long read technology, we report the first hybrid assembly of the clown anemonefish ( Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics. Results: We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches. Conclusions: We present the first genome of any anemonefish and demonstrate the value of low coverage ( ∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. 
ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic Received: 14 November 2017; Revised: 11 December 2017; Accepted: 27 December 2017 The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/3/1/4803946 by Ed 'DeepDyve' Gillespie user on 16 March 2018 2 Tan et al. studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae. Keywords: clownfish; long reads; genome; transcriptome; hybrid assembly Data Description The clown anemonefish, Amphiprion ocellaris (Fig. 1,NCBITaxon ID: 80 972, Fish Base ID:6509), is a well-known tropical marine fish species among the nonscientific community especially fol- lowing the Pixar film Finding Nemo and its sequel Finding Dory [1]. The visual appeal of A. ocellaris due to its bright coloration and behaviour and ease of husbandry have maintained a strong global demand for this species in the marine aquarium trade, driving a fine balance between positive environmental aware- ness and sustainable ornamental use [1, 2]. Further, given high survival rates and ability to complete their life cycle in captivity, captive-breeding programs to partially sustain their global trade have been successful [3]. For the scientific community, A. ocellaris or anemonefishes in general are actively studied due to their in- triguing reproductive strategy, i.e., sequential hermaphroditism Figure 1: The clown anemonefish ( Amphiprion ocellaris). Photo by Michael P. [4–7] and mutualistic relationships with sea anemones [8–12]. Hammer. 
Phenotypic body colour variation based on host-anemone use and geography also poses additional questions regarding adaptive genetic variation [13].

In recent years, concurrent with the advent of long read sequencing technologies [14], several studies have explored combining short but accurate Illumina reads with long but less accurate Nanopore/PacBio reads to obtain genome assemblies that are usually more contiguous with higher completeness than assemblies based on Illumina-only reads [15–19]. To further contribute to the evaluation of long read technology in fish genomics [15], we sequenced the whole genome of A. ocellaris using Oxford Nanopore and Illumina technologies and demonstrate that hybrid assembly of long and short reads greatly improved the quality of genome assembly.

Whole-genome sequencing

Tissues for genome assembly and as reference material were sourced from the collection of the Museum and Art Gallery of the Northern Territory (NTM). The samples used for DNA extraction and subsequent whole-genome sequencing were from freshly vouchered captive bred A. ocellaris specimens, representing a unique black and white colour phenotype found only in the Darwin Harbour region, Australia (NTM A3764, A4496, A4497).

Genomic DNA was extracted from multiple fin clip and muscle samples using the E.Z.N.A. Tissue DNA Kit (Omega Bio-tek, Norcross, GA, USA). For Illumina library prep, approximately 1 μg of gDNA from isolate A3764 was sheared to 300 bp using a Covaris Focused-Ultrasonicator (Covaris, Woburn, MA, USA) and subsequently processed using the TruSeq DNA Sample Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Paired-end sequencing was performed on a single lane of HiSeq 2000 (Illumina, San Diego, CA, USA) located at the Malaysian Genomics Resource Centre Berhad. Two additional libraries were constructed from specimen NTM A3764, and both libraries were sequenced on the MiSeq (2 × 300 bp setting), located at the Monash University Malaysia Genomics Facility.

To generate Oxford Nanopore long reads, approximately 5 μg of gDNA was extracted from isolates NTM A4496 and A4497, size-selected (8–30 kb) with a BluePippin (Sage Science, Beverly, MA, USA), and processed using the Ligation Sequencing 1D Kit (Oxford Nanopore, Oxford, UK) according to the manufacturer's instructions. Three libraries were prepared and sequenced on 3 different R9.4 flowcells using the MinION portable DNA sequencer (Oxford Nanopore, Oxford, UK) for 48 hours.

Sequence read processing

Raw Illumina short reads were adapter-trimmed with Trimmomatic v.0.36 (ILLUMINACLIP:2:30:10, MINLEN:100; Trimmomatic, RRID:SCR_011848) [20], followed by a screening for vectors and contaminants, using Kraken v.0.10.5 (Kraken, RRID:SCR_005484) [21] based on the MiniKraken DB. Kraken-unclassified reads, i.e., reads of nonmicrobial/nonviral origin, were aligned to the complete mitogenome of NTM A3764 (see the Mitogenome Assembly section) to exclude sequences of organellar origin. This resulted in a total of 42.35 Gb of "clean" short reads. Nanopore reads were base-called from their raw FAST5 files using the Oxford Nanopore proprietary base-caller, Albacore, version 2.0.1. Applying a minimum length cutoff of 500 bp, this study produced a total of 8.95 Gbp in 895 672 Nanopore reads (N50: 12.7 kb). Sequencing statistics are available in Supplementary Table 1.

Genome size estimation

K-mer counting with the "clean" Illumina reads was performed with Jellyfish v.2.2.6 (Jellyfish, RRID:SCR_005491) [22], generating k-mer frequency distributions of 17-, 21-, and 25-mers. These histograms were processed by GenomeScope [23], which estimated a genome size of 791 to 794 Mbp with approximately 80% of unique content and a heterozygosity level of 0.6% (Supplementary Fig. 1). Given that we had previously excluded adapters as well as sequences from contaminant or organellar sources, the max kmer coverage filter was not applied (max kmer coverage: -1). A separate estimation performed by BBMap [24] estimated a haploid genome size of 967 Mbp. The genome sizes estimated from both approaches are within the range of sizes listed for other Amphiprion species (792 Mb–1.2 Gb) as reported on the Animal Genome Size Database [25].

Hybrid genome assembly

Short reads used for assemblies described in this study were only trimmed for adapters, but not for quality. Both short-read-only and hybrid de novo assemblies were performed with the Maryland Super-Read Celera Assembler v.3.2.2 (MaSuRCA, RRID:SCR_010691) [26]. During hybrid assembly, errors were encountered in the fragment correction step of the Celera Assembler (CA; Celera assembler, RRID:SCR_010750). To overcome this, given that the CA assembler is no longer maintained, we disabled the frgcorr step based on a recommendation from one of the developers, and the hybrid assembly was subsequently improved with 10 iterations of Pilon v.1.22 (Pilon, RRID:SCR_014731) [27], using short reads to correct bases, fix misassemblies, and fill assembly gaps. To assess the completeness of the genome, Benchmarking Universal Single-Copy Orthologs v.3.0.2 (BUSCO, RRID:SCR_015008) [28] was used to locate the presence or absence of the Actinopterygii-specific set of 4584 single-copy orthologs (OrthoDB v9).

The short-read-only and hybrid assemblies yielded total assembly sizes of 851 Mb and 880 Mb, respectively. Statistics for assemblies for each Pilon iteration are available in Supplementary Table 2. Inclusion of Nanopore long reads for a hybrid assembly representing approximately 11× genome coverage led to a 94% decrease in the number of scaffolds (>500 bp) from 106 526 to 6404 scaffolds and an 18-fold increase in the scaffold N50 length from 21 802 bp to 401 715 bp (Table 1). In addition, the genome completeness was also substantially improved in the hybrid assembly, with BUSCO detecting complete sequences of 96.3% (4417/4584) of single-copy orthologs in the Actinopterygii-specific dataset.

Table 1: Genome and transcriptome statistics of the clownfish (Amphiprion ocellaris) genome

Genome assembly                 Illumina (≥500 bp)    Illumina + Nanopore (≥500 bp)
Contig statistics
  Number of contigs             133 997               7810
  Total contig size, bp         851 389 851           880 159 068
  Contig N50 size, bp           15 458                323 678
  Longest contig, bp            204 209               2 051 878
Scaffold statistics
  Number of scaffolds           106 526               6404
  Total scaffold size, bp       852 602 726           880 704 246
  Scaffold N50 size, bp         21 802                401 715
  Longest scaffold, bp          227 111               3 111 502
  GC/AT/N, %                    39.6/60.2/0.14        39.4/60.5/0.06
BUSCO genome completeness
  Complete                      3691 (80.5%)          4417 (96.3%)
  Complete and single copy      3600 (78.5%)          4269 (93.1%)
  Complete and duplicated       91 (2.0%)             148 (3.2%)
  Fragmented                    534 (11.6%)           63 (1.4%)
  Missing                       359 (7.9%)            104 (2.3%)

Transcriptome assembly
  Number of contigs             25 364
  Total length, bp              68 405 796
  Contig N50 size, bp           3670
BUSCO completeness
  Complete                      4253 (92.8%)
  Complete and single-copy      4128 (90.1%)
  Complete and duplicated       125 (2.7%)
  Fragmented                    127 (2.8%)
  Missing                       204 (4.4%)

Genome annotation
  Number of protein-coding genes                 27 420
  Number of functionally annotated proteins      26 211
  Mean protein length                            514 aa
  Longest protein                                29 084 aa (titin protein)
  Average number (length) of exon per gene       9 (355 bp)
  Average number (length) of intron per gene     8 (1532 bp)

Transcriptome sequencing and assembly

Total RNA was extracted from RNAshield-preserved whole-body and muscle tissues of isolate A4496 using the Quick-RNA MicroPrep (Zymo Research Corp., Irvine, CA, USA) according to the manufacturer's protocols. After assessing total RNA intactness on the Tapestation2100 (Agilent), mRNA was enriched using the NEBNext Poly(A) mRNA Magnetic Isolation Kit (NEB, Ipswich, MA, USA) and processed with the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA). Libraries from both whole-body and muscle tissues were sequenced on a fraction of a MiSeq V3 flowcell (1 × 150 bp). Single-end reads from both libraries, in addition to 2 publicly available A. ocellaris transcriptome sequencing datasets (SRR5253145 and SRR5253146, Bioproject ID: PRJNA374650), were individually assembled using Scallop v0.10.2 [29] based on HiSat2 [30] alignment of RNA-sequencing reads to the newly generated A. ocellaris genome. The transcriptome assemblies were subsequently merged using the tr2aacds pipeline from the EvidentialGene [31] package and similarly assessed for completeness using BUSCO, version 3 [28]. The final nonredundant transcriptome assembly, which was subsequently used to annotate the A. ocellaris genome, contains 25 264 contigs/isotigs (putative transcripts) with an accumulated length of 68.4 Mb and BUSCO-calculated completeness of 92.8% (Table 1).

Genome annotation

Protein-coding genes were predicted with the MAKER v.2.31.9 genome annotation pipeline (MAKER, RRID:SCR_005309) [32]. A total of 3 passes were run with MAKER2; the first pass was based on hints from the assembled transcripts as RNA-seq evidence (est2genome) and protein sequences from 11 fish species downloaded from Ensembl (Ensembl, RRID:SCR_002344) [33] (protein2genome), whereas the second and third passes included gene models trained from the first (and then second) passes with ab initio gene predictors SNAP (SNAP, RRID:SCR_002127) [34] and Augustus (Augustus: Gene Prediction, RRID:SCR_008417) [35]. In the final set of genes predicted, sequences with annotation edit distance (AED) values of less than 0.5 were retained. A small AED value suggests a lesser degree of difference between the predicted protein and the evidence used in the prediction (i.e., fish proteins, transcripts). This resulted in a final set of 27 240 protein-coding genes with an average AED of 0.14 (Table 1). A BUSCO analysis on the completeness of the predicted protein dataset detected the presence of 4259 (92.9%) single-copy orthologs from the Actinopterygii-specific dataset.

Further, to infer the putative function of these predicted proteins, NCBI's blastp v.2.6.0 (-evalue 1e-10, -seg yes, -soft_masking true, -lcase_masking; BLASTP, RRID:SCR_001010) [36] was used to find homology to existing vertebrate sequences in the nonredundant (NR) database. Applying a hit fraction filter to include only hits with ≥70% target length fraction, the remaining unannotated sequences were subsequently aligned to all sequences in the NR database. With this method, 20 107 proteins (74%) were annotated with a putative function based on homology. Additionally, InterProScan v.5.26.65 (InterProScan, RRID:SCR_005829) [37] was used to examine protein domains, signatures, and motifs present in the predicted protein sequences. This analysis detected domains, signatures, or motifs for 26 211 proteins (96%). Overall, 96% of the predicted clownfish protein-coding genes were functionally annotated with information from at least 1 of the 2 approaches.

Mitogenome recovery via genome skimming

Genome skimming [38, 39] was performed on 3 additional A. ocellaris individuals from known localities (Supplementary Table 3). Mitogenome assembly was performed with MITObim, version 1.9 (MITObim, RRID:SCR_015056) [40], using the complete mitogenome of A. ocellaris (GenBank: NC_009065.1) as the bait for read mapping. The assembled mitogenomes were subsequently annotated with MitoAnnotator [41]. Consistent with the original broodstock collection from northern Australia, the captive-bred black and white A. ocellaris NTM A3764 exhibits strikingly high whole-mitogenome nucleotide identity (99.98%) to sample NTM A3708, a wild collection from Darwin Harbour, Australia. In addition, the overall high pair-wise nucleotide identity (>98%) of NTM A3764 to newly generated and publicly available A. ocellaris whole mitogenomes further supports its morphological identification as A. ocellaris (Supplementary Table 3).

Identification of the cyp19a1a gene associated with sexual differentiation

The validated cyp19a1a enzyme of Danio rerio (Uniprot: O42145) was used as the query (E-value = 1e-10) for a blastp search against the predicted A. ocellaris proteins. The top blast hit, AMPOCE_00012675-RA (71.5% protein identity to O42145), was searched (tblastn) against the NCBI TSA database (Taxon: Amphiprion) and showed strikingly high protein identity (99%) to a translated RNA transcript from Amphiprion bicinctus (c183337_g1_i2: GDCV01327693) [5]. The cyp19a1a gene codes for a steroidogenic enzyme that converts androgens into estrogens [42] and was recently shown to be instrumental during sex change in Amphiprion bicinctus, as evidenced by significant correlation and differential expression of this gene between males and mature females [5]. We also observed a similar profile based on mapping of RNA reads from the publicly available male and female transcriptomes of A. ocellaris to the cyp19a1a gene region as visualized using the Integrative Genomics Viewer (Fig. 2) [43]. The A. ocellaris cyp19a1a gene is located on a 419-kb scaffold and is spanned by multiple Minimap2-aligned Nanopore reads [44]. It is noteworthy that in the Illumina-only assembly, this gene is fragmented and located on 3 relatively short scaffolds (Fig. 2).

Figure 2: Mapping of MinION long reads, Illumina-assembled scaffolds, and RNA-sequencing reads of male and female A. ocellaris to the genomic region containing the cyp19a1a gene. Transcripts per million (TPM) values were calculated using Kallisto, version 0.43.1 [46].

Conclusion

We present the first clownfish genome co-assembled with high-coverage Illumina short reads and low-coverage (∼11×) Nanopore long reads. Hybrid assembly of Illumina and Nanopore reads is one of the new features of the MaSuRCA assembler, version 3.2.2, which works by constructing long and accurate mega-reads from the combination of long and short read data. Although this is a relatively computationally intensive strategy with long run times, we observed substantial improvement in the genome statistics when compared with Illumina-only assembly. As Nanopore technology becomes more mature, it is likely that future de novo genome assembly will shift toward high-coverage long read–only assembly, followed by multiple iterations of genome polishing using Illumina reads.

Availability of supporting data

Data supporting the results of this article are available in the GigaDB repository [45]. Raw Illumina and Nanopore reads generated in this study are available in the Sequence Read Archive (SRP123679), whereas the Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession NXFZ00000000, both under BioProject PRJNA407816.

Abbreviations

bp: base pair; CDS: coding sequence; Gb: giga base; kb: kilo base; Mb: mega base; SRA: Sequence read archive; TE: transposable elements; TSA: transcriptome shotgun assembly.

Additional files

Additional file 1: Figure S1: Genome profiling of A. ocellaris based on Illumina short reads.
Additional file 1: Table S1: Summary of raw reads generated from genome and transcriptome sequencing.
Additional file 1: Table S2: Assembly details after each pilon iteration.
Additional file 1: Table S3: Mitogenome similarity of Amphiprion ocellaris between the target sample (NTM A3764) and other isolates with known locality; body-colour phenotype is marked where known.

Competing interests

The authors declare that they have no competing interests.

Funding

This study was funded by the Monash University Malaysia Tropical and Biology Multidisciplinary Platform.

References

1. Militz TA, Foale S. The "Nemo Effect": perception and reality of Finding Nemo's impact on marine aquarium fisheries. Nat Biotechnol 2017;18(3):525–7.
2. Madduppa HH, von Juterzenka K, Syakir M et al. Socio-economy of marine ornamental fishery and its impact on the population structure of the clown anemonefish Amphiprion ocellaris and its host anemones in Spermonde Archipelago, Indonesia. Ocean Coast Manag 2014;100(Supplement C):41–50.
3. Hall H, Warmolts D. The role of public aquariums in the conservation and sustainability of the marine ornamentals trade. In: Cato JC, Brown CL, eds. Marine Ornamental Species. Ames, Iowa, USA: Blackwell Publishing Company; 2008:305–24.
4. Madhu R, Madhu K, Retheesh T. Life history pathways in false clown Amphiprion ocellaris Cuvier, 1830: a journey from egg to adult under captive condition. J Marine Biol Assoc India 2012;54(1):77–90.
5. Casas L, Saborido-Rey F, Ryu T et al. Sex change in clownfish: molecular insights from transcriptome analysis. Sci Rep 2016;6:35461.
6. Buston P. Social hierarchies: size and growth modification in clownfish. Nature 2003;424(6945):145–6.
7. Kobayashi Y, Horiguchi R, Miura S et al. Sex- and tissue-specific expression of P450 aromatase (cyp19a1a) in the yellowtail clownfish, Amphiprion clarkii. Comp Biochem Physiol A Mol Integr Physiol 2010;155(2):237–44.
8. Davenport D, Norris KS. Observations on the symbiosis of the sea anemone Stoichactis and the pomacentrid fish, Amphiprion percula. Biol Bull 1958;115(3):397–410.
9. Arvedlund M, Nielsen LE. Do the anemonefish Amphiprion ocellaris (Pisces: Pomacentridae) imprint themselves to their host sea anemone Heteractis magnifica (Anthozoa: Actinidae)? Ethology 1996;102(2):197–211.
10. Mariscal RN. An experimental analysis of the protection of Amphiprion xanthurus Cuvier & Valenciennes and some other anemone fishes from sea anemones. J Exp Marine Biol Ecol 1970;4(2):134–49.
11. Hattori A. Coexistence of two anemonefishes, Amphiprion clarkii and A. perideraion, which utilize the same host sea anemone. Environ Biol Fish 1995;42(4):345–53.
12. Schmiege PF, D'Aloia CC, Buston PM. Anemonefish personalities influence the strength of mutualistic interactions with host sea anemones. Marine Biol 2017;164(1):24.
13. Allen GR. Damselfishes of the World. Melle, Germany: Mergus Publishers; 1991.
14. Heather JM, Chain B. The sequence of sequencers: the history of sequencing DNA. Genomics 2016;107(1):1–8.
15. Austin CM, Tan MH, Harrisson KA et al. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read. Gigascience 2017;6(8):1–6.
16. Gan HM, Lee YP, Austin CM. Nanopore long-read guided complete genome assembly of Hydrogenophaga intermedia, and genomic insights into 4-aminobenzenesulfonate, p-aminobenzoic acid and hydrogen metabolism in the genus Hydrogenophaga. Front Microbiol 2017;8:1880.
17. Zimin AV, Puiu D, Hall R et al. The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. Gigascience 2017;6(11):1–7.
18. Zimin AV, Stevens KA, Crepeau MW et al. An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 2017;6(1):1–4.
19. Zimin AV, Puiu D, Luo M-C et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm. Genome Res 2017;27(5):787–92.
20. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30(15):2114–20.
21. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 2014;15(3):R46.
22. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 2011;27(6):764–70.
23. Vurture GW, Sedlazeck FJ, Nattestad M et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 2017;33(14):2202–4.
24. Bushnell B. BBMap Short Read Aligner. Berkeley, CA: University of California; 2016. http://sourceforge.net/projects/bbmap (15 June 2017, date last accessed).
25. http://www.genomesize.com (15 June 2017, date last accessed).
26. Zimin AV, Marçais G, Puiu D et al. The MaSuRCA genome assembler. Bioinformatics 2013;29(21):2669–77.
27. Walker BJ, Abeel T, Shea T et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 2014;9(11):e112963.
28. Simão FA, Waterhouse RM, Ioannidis P et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015;31(19):3210–2.
29. Shao M, Kingsford C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol 2017; doi:10.1038/nbt.4020. https://www.nature.com/articles/nbt.4020#supplementary-information (15 June 2017, date last accessed).
30. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods 2015;12:357. https://www.nature.com/articles/nmeth.3317#supplementary-information (15 June 2017, date last accessed).
31. Gilbert D. Gene-omes built from mRNA seq not genome DNA. F1000Res 2016;5(1695):1.
32. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 2011;12(1):491.
33. Hubbard T, Barker D, Birney E et al. The Ensembl genome database project. Nucleic Acids Res 2002;30(1):38–41.
34. Korf I. Gene finding in novel genomes. BMC Bioinformatics 2004;5(1):59.
35. Stanke M, Steinkamp R, Waack S et al. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res 2004;32(suppl 2):W309–12.
36. Boratyn GM, Camacho C, Cooper PS et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res 2013;41(W1):W29–33.
37. Zdobnov EM, Apweiler R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 2001;17(9):847–8.
38. Gan HM, Schultz MB, Austin CM. Integrated shotgun sequencing and bioinformatics pipeline allows ultra-fast mitogenome recovery and confirms substantial gene rearrangements in Australian freshwater crayfishes. BMC Evol Biol 2014;14(1):19.
39. Grandjean F, Tan MH, Gan HM et al. Rapid recovery of nuclear and mitochondrial genes by genome skimming from Northern Hemisphere freshwater crayfish. Zool Scripta 2017; doi:10.1111/zsc.12247.
40. Hahn C, Bachmann L, Chevreux B. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads – a baiting and iterative mapping approach. Nucleic Acids Res 2013;41(13):e129.
41. Iwasaki W, Fukunaga T, Isagozawa R et al. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol 2013;30(11):2531–40.
42. Kallivretaki E, Eggen R, Neuhauss S et al. Aromatase in zebrafish: a potential target for endocrine disrupting chemicals. Mar Environ Res 2006;62(90):7.
43. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinformatics 2013;14(2):178–92.
44. Li H. Minimap2: fast pairwise alignment for long nucleotide sequences. arXiv 2017. https://arxiv.org/abs/1708.01492 (15 June 2017, date last accessed).
45. Tan MH, Austin CM, Hammer MP et al. Supporting data for "Finding Nemo: hybrid assembly with oxford nanopore and illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly." GigaScience Database 2017. http://dx.doi.org/10.5524/100397 (15 June 2017, date last accessed).
46. Bray NL, Pimentel H, Melsted P et al. Near-optimal probabilistic RNA-seq quantification. Nat Biotech 2016;34(5):525–7. http://www.nature.com/nbt/journal/v34/n5/abs/nbt.3519.html#supplementary-information (15 June 2017, date last accessed).
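Supplementary sketch: the headline figures quoted in this article (sequencing coverage, the drop in scaffold count, and the fold change in scaffold N50) follow from simple arithmetic on the reported numbers. The Python below is a minimal illustration of that arithmetic, not the authors' pipeline; the function names (coverage, n50, percent_decrease, fold_change) are hypothetical helpers introduced here, and the inputs are the values stated in the text. Dividing the reported read totals by the lower genome size estimate (791 Mbp) gives values that round to the quoted 54× and 11×, which suggests, as an assumption, that the quoted coverage refers to that lower estimate.

```python
def coverage(total_bases, genome_size):
    """Approximate sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


def n50(lengths):
    """N50: the length L such that sequences of length >= L together
    contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


def percent_decrease(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def fold_change(before, after):
    """Ratio of the new value to the old value."""
    return after / before


if __name__ == "__main__":
    # Values taken from the article text (assumed here: lower genome size estimate).
    genome_size = 791e6
    print("Illumina coverage:", round(coverage(43e9, genome_size), 1))
    print("Nanopore coverage:", round(coverage(9e9, genome_size), 1))
    print("Scaffold decrease (%):", round(percent_decrease(106526, 6404), 1))
    print("Scaffold N50 fold change:", round(fold_change(21802, 401715), 1))
    # Toy contig lengths, purely illustrative, to show how N50 is derived.
    print("Toy N50:", n50([8, 8, 4, 3, 3, 2, 2, 2]))
```

The N50 helper works on any list of contig or scaffold lengths, so the same function could be pointed at lengths parsed from an assembly FASTA to cross-check values such as those in Table 1.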

Journal

GigaScience, Oxford University Press

Published: Mar 1, 2018
