Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader

Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader Background: Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Findings: Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. Conclusions: This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publically available in NCBI and GigaDB to serve as a resource for other researchers. Keywords: de novo assembly; Bufo marinus; cane toad; Rhinella marina; invasive species; RNA-Seq; transcriptome; anuran; amphibian significant challenges to genome assembly [ 2], which likely ac- Data Description counts for the scarcity of large genome sequences currently Background available. Anuran genome size is highly variable (C-values of It is well established that genome size across taxa is related to 0.95–13.02) [3], and to date, genome sequences of only 3 anu- rans have been published: Xenopus tropicalis [4], X. laevis [5], repetitive DNA content [1]. Highly repetitive genomes present Received: 2 May 2017; Revised: 12 September 2017; Accepted: 16 November 2017 The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 provided the original work is properly cited. by Ed 'DeepDyve' Gillespie user on 16 March 2018 1 2 Richardson et al. and Nanorana parkeri [6]. Large genomes typify many Bu- egg clutches, pairs of adult male and female toads per popu- fonids, including the cane toad (Rhinella marina;average re- lation (i.e., 2 separate male × female crosses) were subcuta- ported C-value = 4.79) [3], and none have been sequenced to neously injected with 0.25 mg/mL Leuproelin acetate (Lucrin Ab- date. Transcriptome sequencing provides a tenable alternative bott Australasia, Kurnell, Australia) in amphibian Ringer’s so- to genome sequencing in anurans because the large, repeti- lution to stimulate spawning; males received 0.25 mL and fe- tive, noncoding regions typical of their large genomes are not males 0.75 mL. The pairs of male and female toads were left sequenced [7]. overnight in 750-L plastic enclosures that contained bore water Cane toads (NCBI Taxonomy ID: 8386) (Fig. 1)are an excel- to lay and fertilize egg clutches. Egg clutches were removed and lent model for the study of invasion. Because they were in- placed in 17-L tanks containing continuously aerated bore wa- tentionally and repeatedly introduced to novel environments ter and monitored to ensure fertilization had occurred. Embryos as a biocontrol agent, their introduction history is well docu- were selected once they reached Gosner stage 16–17 [10]. Three mented [8]. A wealth of evolutionary and ecological knowledge replicates of 5 fertilized embryos were removed per clutch and about cane toads currently exists, documenting phenotypic ev- placed in 1-L containers, each with 750 mL bore water, where idence of rapid evolution in introduced environments, but ge- they were raised until 10 days old; water was changed daily, nomic data are scarce [9]. Providing access to well-developed and developing tadpoles were fed 12 mg of a commercial al- genomic resources for the cane toad will enable the investiga- gae supplement (Hikari algae pellets, Kyorin, Himeji, Japan) af- tion of the genetic basis of traits underlying invasion ability in ter each water change. One tadpole from each of the 3 replicate this species, which will in turn significantly advance our under- tadpole tanks per clutch was euthanized (immersion in 2g/L Tri- standing of invasion genetics for all species. Here we present a caine methanesulfonate) and immediately stored in RNAlater, ◦ ◦ de novo transcriptome assembly covering multiple R. marina tis- kept at 4 C, then transferred to –80 C for storage until RNA sues and life stages, representing one of most complete anuran extraction. genomic resources reported to date. Total RNA was extracted from each of the brain, spleen, and tadpole samples using Qiagen RNeasy kits (Qiagen, USA), follow- ing the manufacturers protocol with an additional DNase diges- Samples tion step. Extracted RNA was quantified using a Quibit RNA HS Cane toad samples and tissues (7 in total) used in this study assay on a Qubit 3.0 Fluorometer (Life Technologies, USA). For were obtained from several sources within the invasive (Aus- the tadpole sequencing, total RNA from the 3 “replicate” tad- tralian) and native (Brazilian) range. Several different methods poles per clutch was pooled in equal quantities, resulting in 4 were used to prepare and sequence samples; for simplicity, we pooled samples. Two μg of total RNA per sample was sent to describe the samples and data sources used based on the tissue Macrogen (Macrogen Inc., Seoul, ROK), where mRNA libraries types sequenced (Table 1). were constructed using the TruSeq mRNA v2 sample kit (Illu- mina Inc, San Diego, CA, USA), which included a 300-bp size Brain and spleen selection step. Libraries were sequenced on 1 lane of Illumina Four adult female toads were collected across 2 sites in Australia, HiSeq 2500 (Illumina Inc, San Diego, CA, USA), generating 295.6 ◦ ◦ 2 from Durack (15.9419 S, 127.2202 E), Western Australia, and million paired-end 2×125-bp reads. Raw reads are available in ◦ ◦ 2 from Gordonvale (17.0972 S, 145.7792 E), Queensland, in May the NCBI Short Read Archive (SRA) under the Bioproject Acces- 2015. Toads were euthanized using lethal injection of 150 mg/kg sion PRJNA382870. sodium pentobarbital, and the whole brain and spleen were har- vested and immediately stored in RNAlater (Qiagen, USA), kept Muscle ◦ ◦ at 4 C, then transferred to –80 C for storage until RNA extraction. We downloaded raw fastq files (NCBI Bioproject Accession: PRJNA277985; paired-end 2×100 bp; Illumina HiSeq-2000) Tadpoles for 4 adult female toads (RM0021M, RM0094M, RM0108M, We conducted a tadpole rearing experiment in March 2015. Four and RM0169M) across 4 populations in Australia (El Questro, adult toads (2 males and 2 females) were collected from both 16.007872S, 128.020494E; Purnululu National Park, 17.4334 S, ◦ ◦ Oombulgurri (15.1818 S, 127.8413 E), Western Australia, and In- ◦ ◦ 128.3018 E, both Western Australia; Innisfail, 17.4963 S, ◦ ◦ nisfail (17.4963 S, 146.0465 E), Queensland, Australia. To obtain ◦ ◦ ◦ 146.0465 E; Rossville, 15.7054 S, 145.2229 E, both Queensland) previously used to build a de novo muscle (triceps femoris) transcriptome [9]. Ovary and testes Two adult toads (1 male and 1 female) were collected from ◦ ◦ Macapa´ (0.0432 S, 51.1241 W), Amapa, ´ Brazil, in December 2015. Toads were euthanized as described above, and ovary and testes were excised and immediately stored in RNAlater, then kept at 4 C for storage until RNA extraction. Total RNA was extracted using Qiagen RNeasy kits, following the manufacturer’s proto- col with an additional DNase digestion step. Extracted RNA was quantified using a Qubit RNA BR assay, and RNA integrity was assessed using a Tapestation 2200 (Aligent Tech., Santa Clara, CA, USA) with an RNA screen. One μg of total RNA per sample was used to construct mRNA libraries using the TruSeq mRNA v2 sample kit, which included a 130–350-bp size selection step. Both libraries were run on a HiSeq 1500 using Illumina V4 PE chem- Figure 1: The cane toad, Rhinella marina. NCBI Taxonomy ID: 8386. Photographer istry across 2 lanes (1 lane for each sample), generating 844.6 credit: Matt Greenlees. Source: Matt Greenlees. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 A multi-tissue cane toad transcriptome 3 Table 1: Cane toad samples used to generate the de novo reference transcriptome Tissue Origin Platform Sample ID (library size) Sampling location Sex SRA Brain Australia HiSeq 2500 B19 (23.9 M) Durack F SRR5446736 (2 × 125 bp) B20 (27.7 M) Durack F SRR5446735 B31 (24.8 M) Gordonvale F SRR5446734 B32 (22.3 M) Gordonvale F SRR5446733 Spleen Australia HiSeq 2500 S1 (23.8 M) Gordonvale F SRR5446732 (2 × 125 bp) S2 (25.0 M) Gordonvale F SRR5446732 S18 (24.7 M) Durack F SRR5446732 S19 (23.6 M) Durack F SRR5446732 Muscle Australia HiSeq 2000 RM0021M (93.8 M) El Questro F SRR1910534 (2 × 100 bp) SRR1910535 RM0094M (88.2 M) Purnululu F SRR1910543 RM0108M (97.6 M) Innisfail F SRR1910545 RM0169M (80.0 M) Rossville F SRR1910549 Tadpole Australia HiSeq 2500 T1 (26.4 M) Oombulgurri Both SRR5446728 (2 × 125 bp) T4 (24.5 M) Oombulgurri Both SRR5446727 T7 (23.2 M) Innisfail Both SRR5446726 T10 (25.7 M) Innisfail Both SRR5446725 Liver Brazil HiSeq 2000 RMTP (536.8 M) Macapa´ NA SRR1514601 (2 × 75bp) Ovary Brazil HiSeq 1500 AR19 (434.1 M) Macapa´ F SRR5446724 (2 × 125 bp) Testes Brazil HiSeq 1500 AR05 (410.5 M) Macapa´ M SRR5446723 (2×125 bp) Library size is given as raw sequenced reads in millions (M), sex denoted as female (F) and male (M). Both: sample contains mixed individuals of both sexes; NA: information unknown. million paired-end 2×125-bp reads. Raw reads are available in 12 different k-mers for each combination of -cov cutoff = 3, - the NCBI SRA under the Bioproject Accession PRJNA382870. min pair count = 4, and -cov cutoff = 5, -min pair count = 6,where -ins length = 300 and -min trans lgth = 200, were consistent across Liver assemblies. The individual assemblies were then compiled into We downloaded raw fastq files (NCBI Bioproject Accession: PR- an “over-assembly” of ∼42 million transcripts. To reduce redun- JNA255079; paired-end 2×75 bp; Illumina HiSeq-2000) from a dancy in the “over-assembly,” we used the tr2aacds pipeline pool of 5 adult toads from Macapa, ´ Amapa, ´ Brazil, previously from the Evidential Gene package [17], which selects the “op- used to build a de novo liver transcriptome [11]. timal” set of transcripts based on their coding potential. This re- duced the redundant “over-assembly” to the final assembly of 62 202 transcripts. Of these, 50% (31 040) are commonly expressed Data preprocessing and multiple de novo transcriptome among the 7 different tissues used in the assembly, while a total assemblies of 6.64% exhibit tissue-specific expression (Fig. S1: Additional file 1) [32]. We then used TransDecoder v3.0.0 to predict protein cod- Raw reads from each sample were first processed with Trimmo- ing sequences (CDS) with a minimum CDS of 100 bp. Transvesti- matic v0.33 [12], using the following parameters: ILLUMINACLIP: gator [18] was used to prepare the final assembly for submission TruSeq3-PE.fa:2:30:10:4 HEADCROP:13 AVGQUAL:30 MINLEN:36, to NCBI’s Transcriptome Shotgun Assembly (TSA) database— to (i) remove adaptor sequences, (ii) trim the first 13 bp of a read, accessible through the PRJNA383966 accession. Results from the (iii) discard reads with an average quality <Phred 30, and (iv) assembly pipeline are described in Table 3. As the “dropset”— remove reads <36 bp after processing. We then concatenated those transcripts not kept in the “optimal” tr2aacds output— reads into 2 input data sets, 1 containing all samples from Aus- may contain other biologically relevant transcripts, such as non- tralia and 1 containing those from Brazil. To reduce the compu- coding RNAs and active transposable elements, we also provide tational load of assembly, we used the in silico normalization ap- these transcripts in the associated GigaDB repository [32]. proach implemented in Trinity v2.1.1 (Trinity, RRID:SCR 013048) [13], with –normalize max read cov = 50, on both of the input data sets. The normalized Australia and Brazil data sets contained Annotation ∼42.2 million and ∼82.2 million reads, respectively. Multiple in- dependent de novo transcriptome assemblies were conducted for We conducted functional annotation based on our predicted each of the input data sets, resulting in 100 separate assemblies protein sequences utilizing the automated Trinotate pipeline. (Table 2). In brief, we used 3 assemblers: Trinity, with default pa- Transcripts were first annotated based on sequence homology, rameters and –min contig length = 300; SOAPdenovo-Trans v1.03 where assembled nucleotides and translated CDS sequences (SOAPdenovo-Trans, RRID:SCR 013268)[14], with 13 different k- were used in BLASTx (BLASTX, RRID:SCR 001653)and BLASTp mers (apart from the Brazil input set, which had 12) for each (BLASTP, RRID:SCR 001010) searches, against the UniProt/Swiss- combination of EdgeCovCutoff = 2, mergeLevel = 1, EdgeCovCut- Prot database (downloaded Feb. 2017) using a standalone ver- −5 off = 3, mergeLevel = 2; the parameters -f, -F, and minContigLen = sion of BLAST v2.2.26+ [19], with an e-value cutoff of 1×10 . 200 were the same for all assemblies; velvet v1.2.09/oases v0.2.08 Pfam (Pfam, RRID:SCR 004726)[20] functional domains (down- (Velvet, RRID:SCR 010755/Oases, RRID:SCR 011896)[15, 16], with loaded Feb. 2017) were identified in protein coding sequences us- Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 4 Richardson et al. Table 2: De novo assembler parameters used to produce the “over-assembly” Assembler k-mers Parameter combinations No. of assemblies Trinity 25 Default Aus 1, Brazil 1 SOAPdenovo-Trans 21, 25, 29, 33, 37, 41, 45, 49, 59, 69, 79, 89, EdgeCovCutoff = 2 and mergeLevel = 1; Aus 13, Brazil 12; Aus 99 (No. 99 for the Brazil input set) EdgeCovCutoff = 3 and mergeLevel = 2 13, Brazil 12 Velvet/Oases 21, 25, 29, 33, 37, 41, 45, 49, 59, 69, 79, 89 cov cutoff = 3 and min pair count = 4; Aus 12, Brazil 12; Aus cov cutoff = 5 and min pair count = 6 12, Brazil 12 Total: 100 Table 3: Summary of transcriptome assembly and annotation statis- v9.0, downloaded Xenbase.org, Aug. 2017) (Fig. S2: Additional tics compared with previous cane toad transcriptomes file 1), albeit with a greater frequency of shorter features. This is not unexpected given that we are comparing a de a b This study Muscle Liver novo transcriptome assembly to gene models from a genome assembly. Also, our assembly shows good coverage of the Assembly lengths of CDS features compared with X. tropicalis (Fig. S3: Filtered read pairs 945 348 780 99 462 214 265 684 605 Additional file 1), given both species are substantially diver- In silico normalized reads 129 051 008 18 713 526 - gent. Importantly, the new R. marina multitissue assembly in- Assembly size, bp 83 724 193 60 388 685 80 251 892 creases the coverage of transcripts containing protein coding Number of transcripts 62 202 57 580 131 020 sequences with associated BLAST matches and Gene Ontology N50 2377 1871 916 Average length, bp 1346 1049 613 annotations compared with the previously available R. marina Minimum length, bp 297 201 201 assemblies. Maximum length, bp 99 438 40 546 17 369 Second, we evaluated the new assembly using the Bench- Median length, bp 698 577 331 marking Universal Single-Copy Orthologs (BUSCO) vertebrate GC, % 46.05 45.06 44.32 gene set (BUSCO, RRID:SCR 015008)[27], which uses 3023 Transcripts with CDS 62 202 19 751 – near-universal orthologs (hereafter BUSCOs) to evaluate the Annotation relative completeness of assemblies and compares the re- Transcripts with BLASTx hit 31 103 21 533 – sults with the previous R. marina assemblies and those Transcripts with BLASTp hit 28 560 16 754 – of several available amphibian transcriptomes and genomes Transcripts with GO terms 28 399 19 500 – (Table 4). We used BUSCO v1.2 [27] with the default e-value cut- offof0.01and –mode = Trans for all the transcriptomes com- Rollins, Richardson, and Shine [9]. pared and –mode = OGS for the genome comparisons (using the Arthofer et al. [11]. N. parkeri v2.0, X. tropicalis v9.0 and X. laevis v9.1 1.8.3.2 gene builds downloaded from Xenbase.org, Aug. 2017). Our multitis- ing hmmscan [21]; signal peptides and transmembrane domains sue assembly had a much higher percentage of complete BUS- were assigned using hidden Markov model prediction imple- COs (90%), apart from the 2 Xenopus genomes, which exhibited mented in SignalP v4.1 [22] and TMHMM v2.0c [23], respectively. comparable results (X. tropicalis, 91% and X. laevis, 97%). Addi- Finally, transcripts were compared with curated annotations in tionally, our multitissue transcriptome has low BUSCO miss- the eggNOG (eggNOG, RRID:SCR 002456)[24] and Gene Ontology ingness, intermediate duplication of complete BUSCOs, and the (GO, RRID:SCR 002811)[25] databases. A summary of annotation second lowest level of fragmented BUSCOs. In contrast to the metrics is provided in Table 3. The combined Trinotate func- previous R. marina transcriptomes specifically, the new assem- tional annotations to the TSA assembly are available in the as- bly has less fragmented and 20%–30% more complete BUSCO sociated GigaDB [26]. genes—suggesting the presence of more full-length transcripts. Overall, the comparison of BUSCO results revealed our assem- Quality and completeness of the cane toad bly to be one of the most complete references available for transcriptome anurans. Third, we compared the multitissue transcriptome with To evaluate our new multitissue transcriptome assembly, we the previous cane toad transcriptomes and the 3 currently used 3 comparative approaches to assess relative quality and available Anuran genomes through both standard and recip- completeness. First, we compared core assembly statistics of rocal best-hit BLAST approaches. The standard approach re- the new assembly to our 2 previous cane toad single-tissue vealed that 40 741 (65.5%) and 31 189 (50.1%) of our new as- transcriptomes derived from muscle and liver tissue (Table 3). −3 sembly had significant matches (e-value < 10 )tothe liver The inclusion of data from multiple tissues (encompassing 9.5- and muscle transcriptomes, respectively. The reciprocal best- and 3.5-fold increases in read input compared with the mus- hit approach reduced the number of significant matches to cle and liver transcriptomes, respectively) resulted in increases both the liver (23 943; 38.5%) and muscle (15 892; 25.5%) of all assembly metrics, apart from the number of assembled transcriptomes, which may in part be due to transcripts transcripts, which fell compared with the liver transcriptome mapping to multiple isoforms in the different assemblies. This, (Table 3). Notably, mean transcript length increased from 613 together with the high number of protein-coding transcripts in (liver) to 1346 bp, and transcript n50 increased from 916 (liver) the multitissue assembly, indicates that the new assembly still to 2377 bp. The total assembled bases were similar between contains some redundancy and that we have assembled mul- the multitissue transcriptome and that assembled from liver, tiple transcript variants for some genes. Standard BLAST com- yet higher (∼20 million bp) than that produced from muscle parisons of our assembly with the X. tropicalis, X. laevis, and tissue. Additionally, our assembly exhibits comparable lengths N. parkeri proteins exhibited 40 275 (64.7%), 40 218 (64.7%), and of mRNAs and CDS to those from X. tropicalis (gene build Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 A multi-tissue cane toad transcriptome 5 Table 4: BUSCO analysis of transcriptome completeness Complete and Complete duplicated Fragmented Missing BUSCOs, % BUSCOs, % BUSCOs, % BUSCOs, % R. marina transcriptomes This study 90 4.7 1.7 7.8 Muscle 60 4.6 5.7 33 Liver 69 0.6 4.1 26 Select anuran transcriptomes Bufotes viridis 26 0.3 15 57 Rana catesbeiana 79 42 2.8 17 Pelohylax nigromaculatus 50 0.4 7.8 41 Microhyla ssipes fi 73 1.2 4.7 21 Select anuran genomes Xenopus laevis 97 51 1.4 1.4 Xenopus tropicalis 91 4.1 3.7 4.9 Nanorana parkeri 76 2.8 9.0 14 “Complete BUSCOs” refers to those with a full-length match in the assembly. “Complete and duplicated” refers to those BUSCOs that are complete within an assembly but have multiple matches present. “Fragmented” are those BUSCOs that only have a partial match in the assembly, and “Missing” refers to those BUSCOs that do not a b c d e f have a corresponding match in the assembly. Rollins, Richardson, and Shine [9]. Arthofer et al. [11]. Gerhchen et al. [7]. Birol et al. [28]. Huang et al. [29]. Zhao g h i et al. [30]. Session et al. [5]. Hellsten et al. [4]. Sun et al. [6]. 40 244 (64.7%) significant matches, respectively; 37 064 of our Conclusions assembled transcripts with significant matches are common to This comprehensive anuran transcriptome will not only serve all 3 species (Fig. S4: Additional file 1). Of our 31 103 assem- as a valuable reference for investigation of genes under selec- bled transcripts with annotations, 97.8% (30 423) have signifi- tion during invasion in cane toads, but will also expand our cant matches to X. tropicalis, while 97.9% (30 465) have matches general knowledge of anuran genomes. Additionally, we have to X. laevis and 98.3% (30 574) to N. parkeri; 96.3% (29 967) are identified numerous orthologous transcripts to X. tropicalis, X. common to all 3 species (Fig. S5: Additional file 1). The high laevis, and N. parkeri proteins, which may aid the identifica- percentage of annotated transcripts with matches to X. trop- tion of amphibian-specific genes—an important objective of icalis, X. laevis, and N. parkeri provides further evidence that AmphiBase [31]. our assembly pipeline produced transcripts with strong anuran homology. Availability of supporting data The data sets supporting the results presented here are available Identification of anuran orthologues in the associated GigaDB repository [25]. All raw sequencing data used in this study are available in the SRA and associated with Current efforts to identify amphibian-specific genes have been the following BioProject accessions: PRJNA277985, PRJNA255079, hampered by a lack of high-quality full-length genes for nu- PRJNA382870, and PRJNA383966. The final transcriptome assem- merous amphibian species [31]. So far, the identification of bly has been deposited at DDBJ/EMBL/GenBank under the acces- amphibian-specific genes has not been possible as orthologous sion GFMT00000000. The version described in this paper is the counterparts have only been identified between the 2 Xenopus first version, GFMT01000000. genomes. We used OrthoFinder v1.1.10 [32] with default pa- rameters to identify orthologues between our newly assem- bled R. marina CDS-containing transcripts and the proteomes Additional file from N. parkeri, X. tropicalis, and X. laevis (using the same gene Additional file 1: Figure S1: Schematic diagram showing the per- build versions as in the BUSCO analysis). OrthoFinder iden- tifies “orthogroups” (a group of genes descended from a sin- centage (%) of expressed transcripts among the 7 different tis- sues used in the assembly. For brevity, we only show those gle gene in the last common ancestor of a group of species) [32] and then orthologues between each pair of species in the common to all tissues (centre of elements) and those uniquely comparison. Because OrthoFinder classifies genes with mul- expressed in each separate tissue. Transcript expression was quantified using Salmon v0.8.0 [ 1] using –l IU and default pa- tiple orthologues (i.e., many to many relationships) in “or- thogroups,” it may reduce the impact that multiple isoforms rameters. Additionally, detailed comparative analysis of expres- sion among all tissue combinations is provided in the associated of the same gene have in such analyses. We assigned 60.2% (94 516) of all the genes examined to 18 776 “orthogroups” GigaDB repository [2]. Additional file 1: Figure S2: Histogram of the lengths of R. ma- (see Additional file 2), of which 50% of all genes were found in “orthogroups” with 4 or more genes. Additionally, we iden- rina assembled mRNAs and CDS compared with those from X. tropicalis (gene build v9.0, Xenbase.org, Aug. 2017). tified 12 674 “orthogroups” that contained genes from all 4 species, and 4586 of these consisted entirely of single-copy Additional file 1: Figure S3: Scatterplot showing the coverage of each R. marina CDS length in base pairs compared with the genes. The data set presented here may be useful for further research into identifying amphibian-specific genes, so we have corresponding CDS match in X. tropicalis (gene build v9.0, Xen- base.org, Aug. 2017). CDS length matches were extracted from included this analysis in its entirety in the associated GigaDB repository [25]. BLASTx queries with evalue = 10E-3 and –max target seqs=1. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 6 Richardson et al. Additional file 1: Figure S4: Venn diagram showing an 3. Gregory T. Animal Genome Size Database. 2015. –3 overview of the significant BLASTx matches (e-value < 10 )for http://www.genomesize.com. Accessed 3 March 2017. our R. marina assembly against the proteins from X. tropicalis, X. 4. Hellsten U, Harland RM, Gilchrist MJ et al. The genome laevis, and N. parkeri. Venn diagram build using the Venn diagram of the Western Clawed frog Xenopus tropicalis. Science webserver: http://bioinformatics.psb.ugent.be/webtools/Venn/. 2010;328:633–6. Additional file 1: Figure S5. Venn diagram showing an 5. Session AM, Uno Y, Kwon T et al. Genome evolution –3 overview of the significant BLASTx matches (e-value < 10 ) in the allotetraploid frog Xenopus laevis. Nature 2016;538: for our R. marina transcripts with annotations (31 103) 336–43. against proteins from X. tropicalis, X. laevis, and N. park- 6. Sun Y-B, Xiong Z-J, Xiang X-Y et al. Whole-genome sequence eri. Venn diagram built using the Venn diagram webserver: of the Tibetan frog Nanorana parkeri and the comparative http://bioinformatics.psb.ugent.be/webtools/Venn/ evolution of tetrapod genomes. Proc Natl Acad Sci U S A Additional file 2: Table S1: Summary of OrthoFinder analy- 2015;112:E1257–62. sis. Gene build versions are given in the species heading where 7. Gerchen JF, Reichert SJ, Rohr ¨ JT et al. A single transcriptome appropriate. of a green toad (Bufo viridis) yields candidate genes for sex de- termination and -differentiation and non-anonymous popu- Competing interests lation genetic markers. PLoS One 2016;11:e0156419. 8. Kraus F. Alien Reptiles and Amphibians: A Scientific Com- We declare no competing interests. pendium and Analysis. Dordrecht, Netherlands: Springer; Abbreviations 9. Rollins LA, Richardson MF, Shine R. A genetic perspective on rapid evolution in cane toads (Rhinella marina). Mol Ecol BUSCO: Benchmarking universal single copy orthologs; bp = 2015;24:2264–76. base pair; CDS: coding sequence; GO: Gene Ontology; SRA: Short 10. Gosner KL. A simplified table for staging anuran em- Read Archive; TSA: Transcriptome Shotgun Assembly. bryos and larvae with notes on identification. Herpetologica 1960;16:183–90. Ethics statement 11. Arthofer W, Banbury BL, Carneiro M et al. Genomic resources notes accepted 1 August 2014-30 September 2014. Mol Ecol Ethics approval for the capture of wild Australian samples was Resour 2015;15:228–9. provided under the University of Sydney permit 2014/562, the 12. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexi- rearing of tadpoles by the University of Sydney permit 2013/6033, ble trimmer for Illumina sequence data. Bioinformatics and Brazilian samples under the Brazilian Federal Chico Mendes 2014;30:2114–20. Institute for Biodiversity Conservation (ICMBio), through license 13. Grabherr MG, Haas BJ, Yassour M et al. Full-length tran- number 38 047–3. scriptome assembly from RNA-Seq data without a reference Author contributions genome. Nat Biotechnol 2011;29:644–52. 14. Xie Y, Wu G, Tang J et al. SOAPdenovo-Trans: de novo tran- M.F.R., M.V., and L.A.R. collected animals and conducted the scriptome assembly with short RNA-Seq reads. Bioinformat- sample preparation. J.G.R. and M.R.C. conducted tadpole rearing. ics 2014;30:1660–6. M.F.R., F.S., D.M.S., M.C., and L.A.R. conducted RNA isolation for 15. Zerbino DR, Birney E. Velvet: algorithms for de novo sequencing and library construction. AJW contributed samples. short read assembly using de Bruijn graphs. Genome Res M.F.R. conducted transcriptome assemblies and analysis. M.F.R., 2008;18:821–9. F.S., R.S., and L.A.R. wrote the manuscript and participated in 16. Schulz MH, Zerbino DR, Vingron M et al. Oases: robust de study design. All authors commented on the manuscript and ap- novo RNA-seq assembly across the dynamic range of expres- proved the final submission. sion levels. Bioinformatics 2012;28:1086–92. 17. Gilbert D. Accurate and complete gene construction with Ev- Acknowledgements identialGene. F1000Res 2016;5. 18. DeRego T, Hall B, ben-guin, Geib S. Transvestigator early We thank Serena Lam and Chris Jolly for assistance with release, Version v0.1-alpha. Zenodo, 2014. http://doi.org/ sample collection. This project was funded through an Aus- 10.5281/zenodo.10471. tralian Research Council (ARC) Discovery Early Career Re- 19. Camacho C, Coulouris G, Avagyan V et al. BLAST+:architec- search Award (DE150101393) to L.A.R., ARC Laureate Fellow- ture and applications. BMC Bioinformatics 2009;10:421. ship to R.S. (FL120100074), Centre for Integrative Ecology re- 20. Finn RD, Bateman A, Clements J et al. Pfam: the protein fam- search funds, Deakin University to M.F.R., FCT Investigator ilies database. Nucl Acids Res 2014;42:D222–30. grant to M.C. (IF/00283/2014/CP1256/CT0012), research project 21. Eddy SR, Crooks G, Green R et al. Accelerated pro- and research fellowship (PQ 10/2012 and Universal 14–2013) file HMM searches. Pearson WR, ed. PLoS Comput Biol and postdoctoral fellowship to M.V. (232916/2013–6, CNPq), 2011;7:e1002195. and FEDER (COMPETE, POCI-01–0145-FEDER-006821) and FCT 22. Petersen TN, Brunak S, von Heijne G et al. SignalP 4.0: dis- (UID/BIA/50027/2013) funds to F.S. criminating signal peptides from transmembrane regions. Nat Methods 2011;8:785–6. References 23. Krogh A, Larsson B, von Heijne G et al. Predicting transmem- 1. Kidwell MG. Transposable elements and the evolution of brane protein topology with a hidden Markov model: appli- genome size in eukaryotes. Genetica 2002;115:49–63. cation to complete genomes. J Mol Biol 2001;305:567–80. 2. Treangen TJ, Salzberg SL. Repetitive DNA and next- 24. Powell S, Szklarczyk D, Trachana K et al. eggNOG v3.0: orthol- generation sequencing: computational challenges and ogous groups covering 1133 organisms at 41 different taxo- solutions. Nat Rev Genet 2011;13:36–46. nomic ranges. Nucleic Acids Res 2012;40:D284–9. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 A multi-tissue cane toad transcriptome 7 25. Ashburner M, Ball CA, Blake JA et al. Gene ontology: tool for 29. Huang L, Li J, Anboukaria H et al. Comparative transcriptome the unification of biology. Nat Genet 2000; 25:25–29. analyses of seven anurans reveal functions and adaptations 26. Richardson MF, Sequeira F, Selechnik D et al. Supporting of amphibian skin. Sci Rep 2016;6:24069. data for “Improving amphibian genomic resources: a mul- 30. Zhao L, Liu L, Wang S et al. Transcriptome profiles of meta- titissue reference transcriptome of an iconic invader.” Giga- morphosis in the ornamented pygmy frog Microhyla ssipes fi Science Database 2017. http://dx.doi.org/10.5524/100374. clarify the functions of thyroid hormone receptors in meta- 27. Simao ˜ FA, Waterhouse RM, Ioannidis P et al. BUSCO: assess- morphosis. Sci Rep 2016;6:27310. ing genome assembly and annotation completeness with 31. Kwon T. AmphiBase: a new genomic resource for non-model single-copy orthologs. Bioinformatics 2015;31:3210–2. amphibian species. Genesis 2017;55:e23010. 28. Birol I, Behsaz B, Hammond SA et al. De novo transcriptome 32. Emms DM, Kelly S. OrthoFinder: solving fundamental assemblies of rana (Lithobates) catesbeiana and Xenopus laevis biases in whole genome comparisons dramatically im- tadpole livers for comparative genomics without reference proves orthogroup inference accuracy. Genome Biol 2015; genomes. Plateroti M, ed. PLoS One 2015;10:e0130720. 16:157. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png GigaScience Oxford University Press

Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader

Free
7 pages

Loading next page...
 
/lp/ou_press/improving-amphibian-genomic-resources-a-multitissue-reference-0Rm3FrbFwX
Publisher
BGI
Copyright
© The Author 2017. Published by Oxford University Press.
eISSN
2047-217X
D.O.I.
10.1093/gigascience/gix114
Publisher site
See Article on Publisher Site

Abstract

Background: Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Findings: Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. Conclusions: This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publically available in NCBI and GigaDB to serve as a resource for other researchers. Keywords: de novo assembly; Bufo marinus; cane toad; Rhinella marina; invasive species; RNA-Seq; transcriptome; anuran; amphibian significant challenges to genome assembly [ 2], which likely ac- Data Description counts for the scarcity of large genome sequences currently Background available. Anuran genome size is highly variable (C-values of It is well established that genome size across taxa is related to 0.95–13.02) [3], and to date, genome sequences of only 3 anu- rans have been published: Xenopus tropicalis [4], X. laevis [5], repetitive DNA content [1]. Highly repetitive genomes present Received: 2 May 2017; Revised: 12 September 2017; Accepted: 16 November 2017 The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 provided the original work is properly cited. by Ed 'DeepDyve' Gillespie user on 16 March 2018 1 2 Richardson et al. and Nanorana parkeri [6]. Large genomes typify many Bu- egg clutches, pairs of adult male and female toads per popu- fonids, including the cane toad (Rhinella marina;average re- lation (i.e., 2 separate male × female crosses) were subcuta- ported C-value = 4.79) [3], and none have been sequenced to neously injected with 0.25 mg/mL Leuproelin acetate (Lucrin Ab- date. Transcriptome sequencing provides a tenable alternative bott Australasia, Kurnell, Australia) in amphibian Ringer’s so- to genome sequencing in anurans because the large, repeti- lution to stimulate spawning; males received 0.25 mL and fe- tive, noncoding regions typical of their large genomes are not males 0.75 mL. The pairs of male and female toads were left sequenced [7]. overnight in 750-L plastic enclosures that contained bore water Cane toads (NCBI Taxonomy ID: 8386) (Fig. 1)are an excel- to lay and fertilize egg clutches. Egg clutches were removed and lent model for the study of invasion. Because they were in- placed in 17-L tanks containing continuously aerated bore wa- tentionally and repeatedly introduced to novel environments ter and monitored to ensure fertilization had occurred. Embryos as a biocontrol agent, their introduction history is well docu- were selected once they reached Gosner stage 16–17 [10]. Three mented [8]. A wealth of evolutionary and ecological knowledge replicates of 5 fertilized embryos were removed per clutch and about cane toads currently exists, documenting phenotypic ev- placed in 1-L containers, each with 750 mL bore water, where idence of rapid evolution in introduced environments, but ge- they were raised until 10 days old; water was changed daily, nomic data are scarce [9]. Providing access to well-developed and developing tadpoles were fed 12 mg of a commercial al- genomic resources for the cane toad will enable the investiga- gae supplement (Hikari algae pellets, Kyorin, Himeji, Japan) af- tion of the genetic basis of traits underlying invasion ability in ter each water change. One tadpole from each of the 3 replicate this species, which will in turn significantly advance our under- tadpole tanks per clutch was euthanized (immersion in 2g/L Tri- standing of invasion genetics for all species. Here we present a caine methanesulfonate) and immediately stored in RNAlater, ◦ ◦ de novo transcriptome assembly covering multiple R. marina tis- kept at 4 C, then transferred to –80 C for storage until RNA sues and life stages, representing one of most complete anuran extraction. genomic resources reported to date. Total RNA was extracted from each of the brain, spleen, and tadpole samples using Qiagen RNeasy kits (Qiagen, USA), follow- ing the manufacturers protocol with an additional DNase diges- Samples tion step. Extracted RNA was quantified using a Quibit RNA HS Cane toad samples and tissues (7 in total) used in this study assay on a Qubit 3.0 Fluorometer (Life Technologies, USA). For were obtained from several sources within the invasive (Aus- the tadpole sequencing, total RNA from the 3 “replicate” tad- tralian) and native (Brazilian) range. Several different methods poles per clutch was pooled in equal quantities, resulting in 4 were used to prepare and sequence samples; for simplicity, we pooled samples. Two μg of total RNA per sample was sent to describe the samples and data sources used based on the tissue Macrogen (Macrogen Inc., Seoul, ROK), where mRNA libraries types sequenced (Table 1). were constructed using the TruSeq mRNA v2 sample kit (Illu- mina Inc, San Diego, CA, USA), which included a 300-bp size Brain and spleen selection step. Libraries were sequenced on 1 lane of Illumina Four adult female toads were collected across 2 sites in Australia, HiSeq 2500 (Illumina Inc, San Diego, CA, USA), generating 295.6 ◦ ◦ 2 from Durack (15.9419 S, 127.2202 E), Western Australia, and million paired-end 2×125-bp reads. Raw reads are available in ◦ ◦ 2 from Gordonvale (17.0972 S, 145.7792 E), Queensland, in May the NCBI Short Read Archive (SRA) under the Bioproject Acces- 2015. Toads were euthanized using lethal injection of 150 mg/kg sion PRJNA382870. sodium pentobarbital, and the whole brain and spleen were har- vested and immediately stored in RNAlater (Qiagen, USA), kept Muscle ◦ ◦ at 4 C, then transferred to –80 C for storage until RNA extraction. We downloaded raw fastq files (NCBI Bioproject Accession: PRJNA277985; paired-end 2×100 bp; Illumina HiSeq-2000) Tadpoles for 4 adult female toads (RM0021M, RM0094M, RM0108M, We conducted a tadpole rearing experiment in March 2015. Four and RM0169M) across 4 populations in Australia (El Questro, adult toads (2 males and 2 females) were collected from both 16.007872S, 128.020494E; Purnululu National Park, 17.4334 S, ◦ ◦ Oombulgurri (15.1818 S, 127.8413 E), Western Australia, and In- ◦ ◦ 128.3018 E, both Western Australia; Innisfail, 17.4963 S, ◦ ◦ nisfail (17.4963 S, 146.0465 E), Queensland, Australia. To obtain ◦ ◦ ◦ 146.0465 E; Rossville, 15.7054 S, 145.2229 E, both Queensland) previously used to build a de novo muscle (triceps femoris) transcriptome [9]. Ovary and testes Two adult toads (1 male and 1 female) were collected from ◦ ◦ Macapa´ (0.0432 S, 51.1241 W), Amapa, ´ Brazil, in December 2015. Toads were euthanized as described above, and ovary and testes were excised and immediately stored in RNAlater, then kept at 4 C for storage until RNA extraction. Total RNA was extracted using Qiagen RNeasy kits, following the manufacturer’s proto- col with an additional DNase digestion step. Extracted RNA was quantified using a Qubit RNA BR assay, and RNA integrity was assessed using a Tapestation 2200 (Aligent Tech., Santa Clara, CA, USA) with an RNA screen. One μg of total RNA per sample was used to construct mRNA libraries using the TruSeq mRNA v2 sample kit, which included a 130–350-bp size selection step. Both libraries were run on a HiSeq 1500 using Illumina V4 PE chem- Figure 1: The cane toad, Rhinella marina. NCBI Taxonomy ID: 8386. Photographer istry across 2 lanes (1 lane for each sample), generating 844.6 credit: Matt Greenlees. Source: Matt Greenlees. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 A multi-tissue cane toad transcriptome 3 Table 1: Cane toad samples used to generate the de novo reference transcriptome Tissue Origin Platform Sample ID (library size) Sampling location Sex SRA Brain Australia HiSeq 2500 B19 (23.9 M) Durack F SRR5446736 (2 × 125 bp) B20 (27.7 M) Durack F SRR5446735 B31 (24.8 M) Gordonvale F SRR5446734 B32 (22.3 M) Gordonvale F SRR5446733 Spleen Australia HiSeq 2500 S1 (23.8 M) Gordonvale F SRR5446732 (2 × 125 bp) S2 (25.0 M) Gordonvale F SRR5446732 S18 (24.7 M) Durack F SRR5446732 S19 (23.6 M) Durack F SRR5446732 Muscle Australia HiSeq 2000 RM0021M (93.8 M) El Questro F SRR1910534 (2 × 100 bp) SRR1910535 RM0094M (88.2 M) Purnululu F SRR1910543 RM0108M (97.6 M) Innisfail F SRR1910545 RM0169M (80.0 M) Rossville F SRR1910549 Tadpole Australia HiSeq 2500 T1 (26.4 M) Oombulgurri Both SRR5446728 (2 × 125 bp) T4 (24.5 M) Oombulgurri Both SRR5446727 T7 (23.2 M) Innisfail Both SRR5446726 T10 (25.7 M) Innisfail Both SRR5446725 Liver Brazil HiSeq 2000 RMTP (536.8 M) Macapa´ NA SRR1514601 (2 × 75bp) Ovary Brazil HiSeq 1500 AR19 (434.1 M) Macapa´ F SRR5446724 (2 × 125 bp) Testes Brazil HiSeq 1500 AR05 (410.5 M) Macapa´ M SRR5446723 (2×125 bp) Library size is given as raw sequenced reads in millions (M), sex denoted as female (F) and male (M). Both: sample contains mixed individuals of both sexes; NA: information unknown. million paired-end 2×125-bp reads. Raw reads are available in 12 different k-mers for each combination of -cov cutoff = 3, - the NCBI SRA under the Bioproject Accession PRJNA382870. min pair count = 4, and -cov cutoff = 5, -min pair count = 6,where -ins length = 300 and -min trans lgth = 200, were consistent across Liver assemblies. The individual assemblies were then compiled into We downloaded raw fastq files (NCBI Bioproject Accession: PR- an “over-assembly” of ∼42 million transcripts. To reduce redun- JNA255079; paired-end 2×75 bp; Illumina HiSeq-2000) from a dancy in the “over-assembly,” we used the tr2aacds pipeline pool of 5 adult toads from Macapa, ´ Amapa, ´ Brazil, previously from the Evidential Gene package [17], which selects the “op- used to build a de novo liver transcriptome [11]. timal” set of transcripts based on their coding potential. This re- duced the redundant “over-assembly” to the final assembly of 62 202 transcripts. Of these, 50% (31 040) are commonly expressed Data preprocessing and multiple de novo transcriptome among the 7 different tissues used in the assembly, while a total assemblies of 6.64% exhibit tissue-specific expression (Fig. S1: Additional file 1) [32]. We then used TransDecoder v3.0.0 to predict protein cod- Raw reads from each sample were first processed with Trimmo- ing sequences (CDS) with a minimum CDS of 100 bp. Transvesti- matic v0.33 [12], using the following parameters: ILLUMINACLIP: gator [18] was used to prepare the final assembly for submission TruSeq3-PE.fa:2:30:10:4 HEADCROP:13 AVGQUAL:30 MINLEN:36, to NCBI’s Transcriptome Shotgun Assembly (TSA) database— to (i) remove adaptor sequences, (ii) trim the first 13 bp of a read, accessible through the PRJNA383966 accession. Results from the (iii) discard reads with an average quality <Phred 30, and (iv) assembly pipeline are described in Table 3. As the “dropset”— remove reads <36 bp after processing. We then concatenated those transcripts not kept in the “optimal” tr2aacds output— reads into 2 input data sets, 1 containing all samples from Aus- may contain other biologically relevant transcripts, such as non- tralia and 1 containing those from Brazil. To reduce the compu- coding RNAs and active transposable elements, we also provide tational load of assembly, we used the in silico normalization ap- these transcripts in the associated GigaDB repository [32]. proach implemented in Trinity v2.1.1 (Trinity, RRID:SCR 013048) [13], with –normalize max read cov = 50, on both of the input data sets. The normalized Australia and Brazil data sets contained Annotation ∼42.2 million and ∼82.2 million reads, respectively. Multiple in- dependent de novo transcriptome assemblies were conducted for We conducted functional annotation based on our predicted each of the input data sets, resulting in 100 separate assemblies protein sequences utilizing the automated Trinotate pipeline. (Table 2). In brief, we used 3 assemblers: Trinity, with default pa- Transcripts were first annotated based on sequence homology, rameters and –min contig length = 300; SOAPdenovo-Trans v1.03 where assembled nucleotides and translated CDS sequences (SOAPdenovo-Trans, RRID:SCR 013268)[14], with 13 different k- were used in BLASTx (BLASTX, RRID:SCR 001653)and BLASTp mers (apart from the Brazil input set, which had 12) for each (BLASTP, RRID:SCR 001010) searches, against the UniProt/Swiss- combination of EdgeCovCutoff = 2, mergeLevel = 1, EdgeCovCut- Prot database (downloaded Feb. 2017) using a standalone ver- −5 off = 3, mergeLevel = 2; the parameters -f, -F, and minContigLen = sion of BLAST v2.2.26+ [19], with an e-value cutoff of 1×10 . 200 were the same for all assemblies; velvet v1.2.09/oases v0.2.08 Pfam (Pfam, RRID:SCR 004726)[20] functional domains (down- (Velvet, RRID:SCR 010755/Oases, RRID:SCR 011896)[15, 16], with loaded Feb. 2017) were identified in protein coding sequences us- Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 4 Richardson et al. Table 2: De novo assembler parameters used to produce the “over-assembly” Assembler k-mers Parameter combinations No. of assemblies Trinity 25 Default Aus 1, Brazil 1 SOAPdenovo-Trans 21, 25, 29, 33, 37, 41, 45, 49, 59, 69, 79, 89, EdgeCovCutoff = 2 and mergeLevel = 1; Aus 13, Brazil 12; Aus 99 (No. 99 for the Brazil input set) EdgeCovCutoff = 3 and mergeLevel = 2 13, Brazil 12 Velvet/Oases 21, 25, 29, 33, 37, 41, 45, 49, 59, 69, 79, 89 cov cutoff = 3 and min pair count = 4; Aus 12, Brazil 12; Aus cov cutoff = 5 and min pair count = 6 12, Brazil 12 Total: 100 Table 3: Summary of transcriptome assembly and annotation statis- v9.0, downloaded Xenbase.org, Aug. 2017) (Fig. S2: Additional tics compared with previous cane toad transcriptomes file 1), albeit with a greater frequency of shorter features. This is not unexpected given that we are comparing a de a b This study Muscle Liver novo transcriptome assembly to gene models from a genome assembly. Also, our assembly shows good coverage of the Assembly lengths of CDS features compared with X. tropicalis (Fig. S3: Filtered read pairs 945 348 780 99 462 214 265 684 605 Additional file 1), given both species are substantially diver- In silico normalized reads 129 051 008 18 713 526 - gent. Importantly, the new R. marina multitissue assembly in- Assembly size, bp 83 724 193 60 388 685 80 251 892 creases the coverage of transcripts containing protein coding Number of transcripts 62 202 57 580 131 020 sequences with associated BLAST matches and Gene Ontology N50 2377 1871 916 Average length, bp 1346 1049 613 annotations compared with the previously available R. marina Minimum length, bp 297 201 201 assemblies. Maximum length, bp 99 438 40 546 17 369 Second, we evaluated the new assembly using the Bench- Median length, bp 698 577 331 marking Universal Single-Copy Orthologs (BUSCO) vertebrate GC, % 46.05 45.06 44.32 gene set (BUSCO, RRID:SCR 015008)[27], which uses 3023 Transcripts with CDS 62 202 19 751 – near-universal orthologs (hereafter BUSCOs) to evaluate the Annotation relative completeness of assemblies and compares the re- Transcripts with BLASTx hit 31 103 21 533 – sults with the previous R. marina assemblies and those Transcripts with BLASTp hit 28 560 16 754 – of several available amphibian transcriptomes and genomes Transcripts with GO terms 28 399 19 500 – (Table 4). We used BUSCO v1.2 [27] with the default e-value cut- offof0.01and –mode = Trans for all the transcriptomes com- Rollins, Richardson, and Shine [9]. pared and –mode = OGS for the genome comparisons (using the Arthofer et al. [11]. N. parkeri v2.0, X. tropicalis v9.0 and X. laevis v9.1 1.8.3.2 gene builds downloaded from Xenbase.org, Aug. 2017). Our multitis- ing hmmscan [21]; signal peptides and transmembrane domains sue assembly had a much higher percentage of complete BUS- were assigned using hidden Markov model prediction imple- COs (90%), apart from the 2 Xenopus genomes, which exhibited mented in SignalP v4.1 [22] and TMHMM v2.0c [23], respectively. comparable results (X. tropicalis, 91% and X. laevis, 97%). Addi- Finally, transcripts were compared with curated annotations in tionally, our multitissue transcriptome has low BUSCO miss- the eggNOG (eggNOG, RRID:SCR 002456)[24] and Gene Ontology ingness, intermediate duplication of complete BUSCOs, and the (GO, RRID:SCR 002811)[25] databases. A summary of annotation second lowest level of fragmented BUSCOs. In contrast to the metrics is provided in Table 3. The combined Trinotate func- previous R. marina transcriptomes specifically, the new assem- tional annotations to the TSA assembly are available in the as- bly has less fragmented and 20%–30% more complete BUSCO sociated GigaDB [26]. genes—suggesting the presence of more full-length transcripts. Overall, the comparison of BUSCO results revealed our assem- Quality and completeness of the cane toad bly to be one of the most complete references available for transcriptome anurans. Third, we compared the multitissue transcriptome with To evaluate our new multitissue transcriptome assembly, we the previous cane toad transcriptomes and the 3 currently used 3 comparative approaches to assess relative quality and available Anuran genomes through both standard and recip- completeness. First, we compared core assembly statistics of rocal best-hit BLAST approaches. The standard approach re- the new assembly to our 2 previous cane toad single-tissue vealed that 40 741 (65.5%) and 31 189 (50.1%) of our new as- transcriptomes derived from muscle and liver tissue (Table 3). −3 sembly had significant matches (e-value < 10 )tothe liver The inclusion of data from multiple tissues (encompassing 9.5- and muscle transcriptomes, respectively. The reciprocal best- and 3.5-fold increases in read input compared with the mus- hit approach reduced the number of significant matches to cle and liver transcriptomes, respectively) resulted in increases both the liver (23 943; 38.5%) and muscle (15 892; 25.5%) of all assembly metrics, apart from the number of assembled transcriptomes, which may in part be due to transcripts transcripts, which fell compared with the liver transcriptome mapping to multiple isoforms in the different assemblies. This, (Table 3). Notably, mean transcript length increased from 613 together with the high number of protein-coding transcripts in (liver) to 1346 bp, and transcript n50 increased from 916 (liver) the multitissue assembly, indicates that the new assembly still to 2377 bp. The total assembled bases were similar between contains some redundancy and that we have assembled mul- the multitissue transcriptome and that assembled from liver, tiple transcript variants for some genes. Standard BLAST com- yet higher (∼20 million bp) than that produced from muscle parisons of our assembly with the X. tropicalis, X. laevis, and tissue. Additionally, our assembly exhibits comparable lengths N. parkeri proteins exhibited 40 275 (64.7%), 40 218 (64.7%), and of mRNAs and CDS to those from X. tropicalis (gene build Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 A multi-tissue cane toad transcriptome 5 Table 4: BUSCO analysis of transcriptome completeness Complete and Complete duplicated Fragmented Missing BUSCOs, % BUSCOs, % BUSCOs, % BUSCOs, % R. marina transcriptomes This study 90 4.7 1.7 7.8 Muscle 60 4.6 5.7 33 Liver 69 0.6 4.1 26 Select anuran transcriptomes Bufotes viridis 26 0.3 15 57 Rana catesbeiana 79 42 2.8 17 Pelohylax nigromaculatus 50 0.4 7.8 41 Microhyla ssipes fi 73 1.2 4.7 21 Select anuran genomes Xenopus laevis 97 51 1.4 1.4 Xenopus tropicalis 91 4.1 3.7 4.9 Nanorana parkeri 76 2.8 9.0 14 “Complete BUSCOs” refers to those with a full-length match in the assembly. “Complete and duplicated” refers to those BUSCOs that are complete within an assembly but have multiple matches present. “Fragmented” are those BUSCOs that only have a partial match in the assembly, and “Missing” refers to those BUSCOs that do not a b c d e f have a corresponding match in the assembly. Rollins, Richardson, and Shine [9]. Arthofer et al. [11]. Gerhchen et al. [7]. Birol et al. [28]. Huang et al. [29]. Zhao g h i et al. [30]. Session et al. [5]. Hellsten et al. [4]. Sun et al. [6]. 40 244 (64.7%) significant matches, respectively; 37 064 of our Conclusions assembled transcripts with significant matches are common to This comprehensive anuran transcriptome will not only serve all 3 species (Fig. S4: Additional file 1). Of our 31 103 assem- as a valuable reference for investigation of genes under selec- bled transcripts with annotations, 97.8% (30 423) have signifi- tion during invasion in cane toads, but will also expand our cant matches to X. tropicalis, while 97.9% (30 465) have matches general knowledge of anuran genomes. Additionally, we have to X. laevis and 98.3% (30 574) to N. parkeri; 96.3% (29 967) are identified numerous orthologous transcripts to X. tropicalis, X. common to all 3 species (Fig. S5: Additional file 1). The high laevis, and N. parkeri proteins, which may aid the identifica- percentage of annotated transcripts with matches to X. trop- tion of amphibian-specific genes—an important objective of icalis, X. laevis, and N. parkeri provides further evidence that AmphiBase [31]. our assembly pipeline produced transcripts with strong anuran homology. Availability of supporting data The data sets supporting the results presented here are available Identification of anuran orthologues in the associated GigaDB repository [25]. All raw sequencing data used in this study are available in the SRA and associated with Current efforts to identify amphibian-specific genes have been the following BioProject accessions: PRJNA277985, PRJNA255079, hampered by a lack of high-quality full-length genes for nu- PRJNA382870, and PRJNA383966. The final transcriptome assem- merous amphibian species [31]. So far, the identification of bly has been deposited at DDBJ/EMBL/GenBank under the acces- amphibian-specific genes has not been possible as orthologous sion GFMT00000000. The version described in this paper is the counterparts have only been identified between the 2 Xenopus first version, GFMT01000000. genomes. We used OrthoFinder v1.1.10 [32] with default pa- rameters to identify orthologues between our newly assem- bled R. marina CDS-containing transcripts and the proteomes Additional file from N. parkeri, X. tropicalis, and X. laevis (using the same gene Additional file 1: Figure S1: Schematic diagram showing the per- build versions as in the BUSCO analysis). OrthoFinder iden- tifies “orthogroups” (a group of genes descended from a sin- centage (%) of expressed transcripts among the 7 different tis- sues used in the assembly. For brevity, we only show those gle gene in the last common ancestor of a group of species) [32] and then orthologues between each pair of species in the common to all tissues (centre of elements) and those uniquely comparison. Because OrthoFinder classifies genes with mul- expressed in each separate tissue. Transcript expression was quantified using Salmon v0.8.0 [ 1] using –l IU and default pa- tiple orthologues (i.e., many to many relationships) in “or- thogroups,” it may reduce the impact that multiple isoforms rameters. Additionally, detailed comparative analysis of expres- sion among all tissue combinations is provided in the associated of the same gene have in such analyses. We assigned 60.2% (94 516) of all the genes examined to 18 776 “orthogroups” GigaDB repository [2]. Additional file 1: Figure S2: Histogram of the lengths of R. ma- (see Additional file 2), of which 50% of all genes were found in “orthogroups” with 4 or more genes. Additionally, we iden- rina assembled mRNAs and CDS compared with those from X. tropicalis (gene build v9.0, Xenbase.org, Aug. 2017). tified 12 674 “orthogroups” that contained genes from all 4 species, and 4586 of these consisted entirely of single-copy Additional file 1: Figure S3: Scatterplot showing the coverage of each R. marina CDS length in base pairs compared with the genes. The data set presented here may be useful for further research into identifying amphibian-specific genes, so we have corresponding CDS match in X. tropicalis (gene build v9.0, Xen- base.org, Aug. 2017). CDS length matches were extracted from included this analysis in its entirety in the associated GigaDB repository [25]. BLASTx queries with evalue = 10E-3 and –max target seqs=1. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 6 Richardson et al. Additional file 1: Figure S4: Venn diagram showing an 3. Gregory T. Animal Genome Size Database. 2015. –3 overview of the significant BLASTx matches (e-value < 10 )for http://www.genomesize.com. Accessed 3 March 2017. our R. marina assembly against the proteins from X. tropicalis, X. 4. Hellsten U, Harland RM, Gilchrist MJ et al. The genome laevis, and N. parkeri. Venn diagram build using the Venn diagram of the Western Clawed frog Xenopus tropicalis. Science webserver: http://bioinformatics.psb.ugent.be/webtools/Venn/. 2010;328:633–6. Additional file 1: Figure S5. Venn diagram showing an 5. Session AM, Uno Y, Kwon T et al. Genome evolution –3 overview of the significant BLASTx matches (e-value < 10 ) in the allotetraploid frog Xenopus laevis. Nature 2016;538: for our R. marina transcripts with annotations (31 103) 336–43. against proteins from X. tropicalis, X. laevis, and N. park- 6. Sun Y-B, Xiong Z-J, Xiang X-Y et al. Whole-genome sequence eri. Venn diagram built using the Venn diagram webserver: of the Tibetan frog Nanorana parkeri and the comparative http://bioinformatics.psb.ugent.be/webtools/Venn/ evolution of tetrapod genomes. Proc Natl Acad Sci U S A Additional file 2: Table S1: Summary of OrthoFinder analy- 2015;112:E1257–62. sis. Gene build versions are given in the species heading where 7. Gerchen JF, Reichert SJ, Rohr ¨ JT et al. A single transcriptome appropriate. of a green toad (Bufo viridis) yields candidate genes for sex de- termination and -differentiation and non-anonymous popu- Competing interests lation genetic markers. PLoS One 2016;11:e0156419. 8. Kraus F. Alien Reptiles and Amphibians: A Scientific Com- We declare no competing interests. pendium and Analysis. Dordrecht, Netherlands: Springer; Abbreviations 9. Rollins LA, Richardson MF, Shine R. A genetic perspective on rapid evolution in cane toads (Rhinella marina). Mol Ecol BUSCO: Benchmarking universal single copy orthologs; bp = 2015;24:2264–76. base pair; CDS: coding sequence; GO: Gene Ontology; SRA: Short 10. Gosner KL. A simplified table for staging anuran em- Read Archive; TSA: Transcriptome Shotgun Assembly. bryos and larvae with notes on identification. Herpetologica 1960;16:183–90. Ethics statement 11. Arthofer W, Banbury BL, Carneiro M et al. Genomic resources notes accepted 1 August 2014-30 September 2014. Mol Ecol Ethics approval for the capture of wild Australian samples was Resour 2015;15:228–9. provided under the University of Sydney permit 2014/562, the 12. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexi- rearing of tadpoles by the University of Sydney permit 2013/6033, ble trimmer for Illumina sequence data. Bioinformatics and Brazilian samples under the Brazilian Federal Chico Mendes 2014;30:2114–20. Institute for Biodiversity Conservation (ICMBio), through license 13. Grabherr MG, Haas BJ, Yassour M et al. Full-length tran- number 38 047–3. scriptome assembly from RNA-Seq data without a reference Author contributions genome. Nat Biotechnol 2011;29:644–52. 14. Xie Y, Wu G, Tang J et al. SOAPdenovo-Trans: de novo tran- M.F.R., M.V., and L.A.R. collected animals and conducted the scriptome assembly with short RNA-Seq reads. Bioinformat- sample preparation. J.G.R. and M.R.C. conducted tadpole rearing. ics 2014;30:1660–6. M.F.R., F.S., D.M.S., M.C., and L.A.R. conducted RNA isolation for 15. Zerbino DR, Birney E. Velvet: algorithms for de novo sequencing and library construction. AJW contributed samples. short read assembly using de Bruijn graphs. Genome Res M.F.R. conducted transcriptome assemblies and analysis. M.F.R., 2008;18:821–9. F.S., R.S., and L.A.R. wrote the manuscript and participated in 16. Schulz MH, Zerbino DR, Vingron M et al. Oases: robust de study design. All authors commented on the manuscript and ap- novo RNA-seq assembly across the dynamic range of expres- proved the final submission. sion levels. Bioinformatics 2012;28:1086–92. 17. Gilbert D. Accurate and complete gene construction with Ev- Acknowledgements identialGene. F1000Res 2016;5. 18. DeRego T, Hall B, ben-guin, Geib S. Transvestigator early We thank Serena Lam and Chris Jolly for assistance with release, Version v0.1-alpha. Zenodo, 2014. http://doi.org/ sample collection. This project was funded through an Aus- 10.5281/zenodo.10471. tralian Research Council (ARC) Discovery Early Career Re- 19. Camacho C, Coulouris G, Avagyan V et al. BLAST+:architec- search Award (DE150101393) to L.A.R., ARC Laureate Fellow- ture and applications. BMC Bioinformatics 2009;10:421. ship to R.S. (FL120100074), Centre for Integrative Ecology re- 20. Finn RD, Bateman A, Clements J et al. Pfam: the protein fam- search funds, Deakin University to M.F.R., FCT Investigator ilies database. Nucl Acids Res 2014;42:D222–30. grant to M.C. (IF/00283/2014/CP1256/CT0012), research project 21. Eddy SR, Crooks G, Green R et al. Accelerated pro- and research fellowship (PQ 10/2012 and Universal 14–2013) file HMM searches. Pearson WR, ed. PLoS Comput Biol and postdoctoral fellowship to M.V. (232916/2013–6, CNPq), 2011;7:e1002195. and FEDER (COMPETE, POCI-01–0145-FEDER-006821) and FCT 22. Petersen TN, Brunak S, von Heijne G et al. SignalP 4.0: dis- (UID/BIA/50027/2013) funds to F.S. criminating signal peptides from transmembrane regions. Nat Methods 2011;8:785–6. References 23. Krogh A, Larsson B, von Heijne G et al. Predicting transmem- 1. Kidwell MG. Transposable elements and the evolution of brane protein topology with a hidden Markov model: appli- genome size in eukaryotes. Genetica 2002;115:49–63. cation to complete genomes. J Mol Biol 2001;305:567–80. 2. Treangen TJ, Salzberg SL. Repetitive DNA and next- 24. Powell S, Szklarczyk D, Trachana K et al. eggNOG v3.0: orthol- generation sequencing: computational challenges and ogous groups covering 1133 organisms at 41 different taxo- solutions. Nat Rev Genet 2011;13:36–46. nomic ranges. Nucleic Acids Res 2012;40:D284–9. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018 A multi-tissue cane toad transcriptome 7 25. Ashburner M, Ball CA, Blake JA et al. Gene ontology: tool for 29. Huang L, Li J, Anboukaria H et al. Comparative transcriptome the unification of biology. Nat Genet 2000; 25:25–29. analyses of seven anurans reveal functions and adaptations 26. Richardson MF, Sequeira F, Selechnik D et al. Supporting of amphibian skin. Sci Rep 2016;6:24069. data for “Improving amphibian genomic resources: a mul- 30. Zhao L, Liu L, Wang S et al. Transcriptome profiles of meta- titissue reference transcriptome of an iconic invader.” Giga- morphosis in the ornamented pygmy frog Microhyla ssipes fi Science Database 2017. http://dx.doi.org/10.5524/100374. clarify the functions of thyroid hormone receptors in meta- 27. Simao ˜ FA, Waterhouse RM, Ioannidis P et al. BUSCO: assess- morphosis. Sci Rep 2016;6:27310. ing genome assembly and annotation completeness with 31. Kwon T. AmphiBase: a new genomic resource for non-model single-copy orthologs. Bioinformatics 2015;31:3210–2. amphibian species. Genesis 2017;55:e23010. 28. Birol I, Behsaz B, Hammond SA et al. De novo transcriptome 32. Emms DM, Kelly S. OrthoFinder: solving fundamental assemblies of rana (Lithobates) catesbeiana and Xenopus laevis biases in whole genome comparisons dramatically im- tadpole livers for comparative genomics without reference proves orthogroup inference accuracy. Genome Biol 2015; genomes. Plateroti M, ed. PLoS One 2015;10:e0130720. 16:157. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4662864 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Journal

GigaScienceOxford University Press

Published: Jan 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off