Comparative transcriptomics of cyprinid minnows and carp in a common wild setting: a resource for ecological genomics in freshwater communities

Comparative transcriptomics of cyprinid minnows and carp in a common wild setting: a resource for... Comparative transcriptomics can now be conducted on organisms in natural settings, which has greatly enhanced understanding of genome–environment interactions. Here, we demon- strate the utility and potential pitfalls of comparative transcriptomics of wild organisms, with an example from three cyprinid fish species (Teleostei:Cypriniformes). We present extensively fil- tered and annotated transcriptome assemblies that provide a valuable resource for studies of genome evolution (e.g. polyploidy), ecological and morphological diversification, speciation, and shared and unique responses to environmental variation in cyprinid fishes. Our results and analyses address the following points: (i) ‘essential developmental genes’ are shown to be ubiquitously expressed in a diverse suite of tissues across later ontogenetic stages (i.e. juve- niles and adults), making these genes are useful for assessing the quality of transcriptome as- semblies, (ii) the influence of microbiomes and other exogenous DNA, (iii) potentially novel, species-specific genes, and (iv) genomic rearrangements (e.g. whole genome duplication). The data we present provide a resource for future comparative work in cypriniform fishes and other taxa across a variety of sub-disciplines, including stress response, morphological diversifica- tion, community ecology, ecotoxicology, and climate change. Key words: RNA-seq, essential genes, Cyprinus carpio, carp, gene silencing 1. Introduction 9–11 and giving rise to the field of ecological genomics. Reduced sequenc- High-throughput sequencing has dramatically accelerated the pace of ing costs have made it feasible to study transcriptomes of co-occurring 1,2 genomic research. While once restricted to model species in labora- species in a community ecology context (i.e. ‘community transcriptom- tory settings, genomic methods are being widely applied to non-model ics’), as well as comparative studies of transcriptome evolution across 3–8 13–16 species in nature, rapidly illuminating the black box of the genome diverse clades (i.e. ‘comparative transcriptomics’). While genomic V C The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com 11 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 12 Comparative transcriptomics of cyprinids data from model species can be informative for the biology of related species with a comprehensively-annotated genome. Zebrafish is an organisms, not all species are the same in terms of their ecology, genet- important model organism in developmental biology and disease re- 43–45 ics, and morphology. For example, research on the zebrafish (Danio search, due to its semitransparent embryos and ease of laboratory rerio, family Cyprinidae) can be relevant for closely related species, but culture, as well as its comprehensively annotated genome. Fathead cannot explain the tremendous ecological and morphological diversity minnow is widely used as an indicator species in ecotoxicology studies 47,48 in this clade, as studies of single species are insufficient for understand- for which microarrays have been developed and a draft genome ing dynamic interactions among species and their respective genomes in sequence is now available. Common carp is an important food fish, a macroevolutionary context. In order to understand the causes and especially in Asia, and is produced extensively in aquaculture. The 39,51,52 consequences of those interactions, as well as the origin of ecological carp transcriptome has been studied elsewhere and recently a 17 40 novelty (e.g. new genes ), we must examine genomes across species draft genome sequence was published. that reflect that diversity. Recent studies across a diverse suite of teleost We used the extensive zebrafish genomic resources available to an- fish lineages have focused on functionally-important genetic variation notate transcriptomes of three evolutionarily related, but non-model 18–22 in non-model species in nature. species that co-occur in parts of the west-central United States: Transcriptomics of species in the wild has enormous potential to Cyprinella lutrensis (red shiner), Platygobio gracilis (flathead chub) and advance our understanding of mechanisms underlying molecular ad- Cyprinus carpio (common carp). These species were selected to reflect aptation, evolutionary diversification, ecotoxicology, and commu- phylogenetic breadth, but also because their distributions overlap and 4,23–26 nity ecology. In this context, several important questions arise. occupy identical dryland river habitats (i.e. the Rio Grande, New For example, What are the proximate and ultimate mechanisms un- Mexico), where they are exposed to similar biotic and abiotic condi- derlying phylogenetic, ecological, and morphological divergence? tions. Cyprinus carpio is native to Asia and Europe, but was introduced How have ancestral genomes been molded by divergent natural se- into North America, perhaps as early as 1831, and enthusiastically lection and other evolutionary forces into myriad forms that exist stocked throughout the US thereafter as a food fish, including New today? How does genomic architecture constrain or promote diversi- Mexico as early as 1889. Cyprinella lutrensis and Platygobio gracilis fication? How important are genome duplication events in adaptive are both native to central and western North America, from the radiations? What role do genomes play in underlying the ecological Mississippi River basin to the Rio Grande in New Mexico. Both C. dynamics of community assembly (e.g. competition, abundance, spa- lutrensis and C. carpio are highly tolerant of a wide range of environ- tial and temporal dynamics, physiological constraint, etc.)? A neces- mental conditions and are highly invasive in areas outside of their natu- 55,56 sary first step in addressing these questions is the generation of ral range, whereas Platygobio gracilis is sensitive to environmental databases reflecting the genomic or transcriptomic variation among disturbance and imperiled or declining in several parts of its range. species, which we provide in the current study. Transcriptomes of Cyprinella lutrensis and Platygobio gracilis Here we present transcriptomic resources for three members of have not been published to our knowledge, whereas genomic and 39,40,51 the freshwater fish family Cyprinidae (Teleostei: Cypriniformes), one transcriptomic data are available for Cyprinus carpio. of the most speciose vertebrate clades, with over 2,000 species. In Cyprinella lutrensis and Platygobio gracilis are diploid (2n¼ 50), addition to remarkable species diversity, the clade includes extensive while Cyprinus carpio is allotetraploid 2n¼ 100, with some dupli- ecological, genetic, and morphological diversity. Cyprinid fishes cated genes silenced after a lineage-specific whole genome duplica- (minnows and carp) comprise an important component of freshwater tion (i.e. Cc4R). Our aims in this study were to: (i) succinctly fish communities throughout North America, Asia, Europe and summarize and compare genes and functional annotation informa- Africa. They are often the dominant fish taxa in numerical abun- tion obtained from various databases; (ii) test whether Cyprinus car- dance and biomass and play an important functional role in aquatic pio expresses additional copies of particular genes compared to the 29–31 ecosystems. two diploid species (Cyprinella lutrensis and Platygobio gracilis); (iii) The ecological and taxonomic diversity of cyprinids is particularly identify potentially novel genes present in the three cyprinids that interesting in light of the history of genome evolution in this clade. may underlie their unique ecological and morphological novelty and Cyprinids were part of the radiation that occurred after the teleost- (iv) to assess evolutionary conservation of essential genes for 32,33 specific genome duplication event, known as the ‘3R hypothesis’, development. that preceded and perhaps facilitated the diversification of teleost In zebrafish, 307 genes are known to be essential for development. fishes. In addition, several cyprinid lineages have independently un- Knockout mutations in these genes are embryonic lethal according to 35–38 58 dergone additional rounds of genome duplications. For exam- experiments by Amsterdam et al. with subsequent revisions by 59 60 ple, the common carp (Cyprinus carpio) lineage had a fourth round Chen et al. and updates to the ENSEMBL database. These genes 39 40 of genome duplication approximately 5.6–11.3 or 8.2 million are highly conserved across extremely deep phylogenetic splits (e.g. years ago (Ma), which we refer to as ‘Cc4R’. Pairs of genes arising yeast, fly, zebrafish, and human) due to their essential roles in devel- from whole genome duplication, referred to as ‘Ohnologs’, were the- opment. Despite their importance, essential genes have not been oretically present for all genes immediately following the Cc4R ge- studied in the context of comparative molecular ecology or ecologi- nome duplication. Varying levels of subsequent gene-silencing and cal genomics of co-occurring species. Using transcriptome data pre- ‘re-diploidization’ have since occurred in polyploid lineages making sented in this study, we assessed the evolutionary conservation of the 41,42 cyprinids ideal for comparative studies of genome evolution. 307 zebrafish essential genes across four cyprinid lineages. We pre- Despite the ecological importance of cyprinids in freshwater sys- dicted that these genes would be highly conserved across all species, tems worldwide and dynamic lineage-specific patterns of genomic ex- consistent with their critical functional roles, as compared to non- pansions and contractions, most species have little or no genomic essential genes. If this is the case, then differences among species resources available for investigating their molecular ecology or genome should be found in non-essential genes, such as lineage- or species- evolution. Three notable exceptions are zebrafish (Danio rerio), fat- specific genes. We also tested whether both copies of duplicated head minnow (Pimephales promelas), and common carp (Cyprinus genes in C. carpio (i.e. Cc4R Ohnologs) were retained and expressed 43–45 38,42 carpio). The family includes zebrafish (Danio rerio), a model in duplicate or whether one copy was evolutionarily lost. One Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 13 mechanism for the loss of Ohnologs is ‘pseudogenization’, wherein a gene accumulates one or more internal stop codons that prevent formation of a functional protein product and thus becomes a pseu- dogene. If having redundant copies of essential genes were impor- tant for survival (e.g. due to loss-of-function mutations in one copy), then evolutionary retention of duplicates would likely be favored in C. carpio. Conversely, if regulation of proper gene expression levels were important in the context of functional pathways, then dupli- cated essential genes would likely be silenced at roughly the same rate as non-essential genes (although regulatory changes could also fine-tune expression patterns). We tested these hypotheses using ex- pression data for the three cyprinid transcriptomes as compared to zebrafish. These sequences will provide resources for more detailed studies of the evolution and functional constraint of these critical genes, particularly in the context of genome expansions and reductions. 2. Materials and methods Fish (n¼ 3 per species) were collected with a seine on 6 July 2012 from a field site on the Rio Grande, approximately 40 km south of Socorro, New Mexico (33.690556 N, 106.993042 W). Whole fish samples (juveniles or non-spawning adults) were immediately frozen in liquid nitrogen and transported to the laboratory. Skin, gill, gut, and kidney tissues were dissected and removed from frozen fish (outer layers were only slightly thawed by the time dissection was completed;<5 minutes total time), placed in TRIzol (Invitrogen), and Figure 1. Flow diagram illustrating our bioinformatic pipeline. Analyses con- sist of three main steps: assembly, annotation, and analysis of gene silenc- mechanically homogenized. Total RNA was isolated using Purelink ing patterns. Databases queried and software packages used are listed. RNA Mini kits (Ambion) following manufacturer’s protocol, along with DNase treatment to reduce genomic DNA contamination. Purified total RNA was sent to the National Center for Genome BLAST hits were identified based on the following parameter set- Resources (Santa Fe, New Mexico, USA) for quantification, quality tings: E-value< 0.0001; gap open penalty¼ 11; gap extend¼ 1; assessment, cDNA library preparation and sequencing. RNA integ- wordsize¼ 3. After extensive testing, this parameter combination rity and purity was assessed with a Bioanalyzer 2100 instrument was found to give the optimal balance between finding matches for (Agilent Technologies). Thirty-six Illumina libraries were constructed large numbers of contigs, while minimizing spurious hits. For most (3 species  4 tissues  3 biological replicates) from the total RNA genes a 1–1 match was expected between zebrafish versus Platygobio samples using Illumina TruSeq DNA prep kits according to the man- gracilis or Cyprinella lutrensis, whereas zebrafish and Cyprinus car- ufacturer’s protocol. Libraries were barcoded using standard six pio should have either 1-2 or 1-1 due to partial diploidy in carp. We base pair Illumina oligonucleotides, and six libraries were pooled for used this expectation in determining the threshold E-value (i.e. each lane of Illumina HiSeq 2000 (V3 chemistry) for a total of six E< 0.0001 in this study) to use. In practice, more stringent E-value lanes of 2  100 bp paired-end sequencing. thresholds (e.g. E< 1e6) had very little effect on the number of sig- nificant BLAST hits. 2.1. Bioinformatics Contigs with no significant BLAST hits against the zebrafish tran- We used the bioinformatics pipeline outlined in Figure 1 for analyz- scriptome were subjected to a series of stepwise BLASTn searches until ing transcriptomic data in three main steps: de novo assembly, gene significant hits were found (or not) in order to identify the possible annotation, and analysis of expression of duplicated genes. Adapters sources of those sequences (e.g. microbiome ) or to identify novel and barcode sequences were removed from raw reads, and reads genes not present in the zebrafish genome. First, remaining contigs were trimmed using TRIMMOMATIC with parameter settings as fol- lacking significant hits against the zebrafish transcriptome were que- lows: leading quality¼ 5; trailing quality¼ 5; minimum trimmed ried against the rRNA silva database (SSU Ref 119 NR99 and LSU read length¼ 36). Reads were normalized in silico to maximum read Parc 119), which contains bacterial and eukaryotic rRNA sequences. coverage of 50. Clipped and trimmed reads were assembled, de Contigs with still no significant BLAST hits were then queried against novo, for each species separately using TRINITY version 2014-04- a database containing all nine additional teleost fish transcriptomes 65,66 13, with minimum contig length set to 200 bp. Libraries were (Amazon molly, Poecilia formosa; cavefish, Astyanax mexicanus;cod, pooled within a species for de novo assembly, to maximize the num- Gadus morhua;fugu, Takifugu rubripes;medaka, Oryzias latipes;pla- ber of genes included. TRINITY assembles reads into contigs (‘TRINITY tyfish, Xiphophorus maculates; stickleback, Gasterosteus aculeatus; transcripts’), places similar transcripts in groups loosely referred to tetraodon, Tetraodon nigroviridis;tilapia, Oreochromis niloticus) as ‘genes’, and groups similar ‘genes’ into gene clusters. from Ensembl 78. Contigs with no BLAST hits at this point were then Putative protein coding genes were also identified by BLASTx BLASTed against the zebrafish genome (Zv9) using the ‘Top Level’ se- searches of contigs against zebrafish (Danio rerio) peptide sequences quences from Ensembl to identify possible genomic DNA contamina- (database build Zv9) obtained from Ensembl 78. Significant tion. Remaining contigs with no significant blast hits in any of these Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 14 Comparative transcriptomics of cyprinids databases were piped to TRANSDECODER to identify open reading GO terminology was based on the zebrafish database. Results of the frames (ORFs) that represent potentially novel genes. Default parame- overrepresentation analysis were visualized with REVIGO. ter settings were used with TRANSDECODER. The software generates pre- dicted peptide sequences for contigs with ORFs. Predicted peptide 2.3. Essential genes sequences for the contigs with ORFs but no BLAST hits to the afore- To test the hypothesis of evolutionary conservation of essential genes mentioned databases were queried (BLASTp; E-value< 0.001) against among cyprinid fishes, we used zebrafish genes present in the Online the NCBI nr database. BLAST2GO version 3.0, was used to identify Gene Essentiality Database OGEE; and identified orthologs in the top species hits for those predicted proteins with significant hits against three transcriptomes from BLASTx searches described above. Of the nr. The remaining sequences with no hits to databases and no ORFs 58,59 307 essential genes in zebrafish, one (ENSDARG00000038423) were discarded as likely non-protein coding, genomic DNA contami- has been retired from ENSEMBL and one (ENSDARG000 nation with sufficient divergence from zebrafish to render genomic 00045605) is an unprocessed pseudogene with no protein product. BLASTn searches ineffective. We searched for the remaining 305 genes in the three transcriptome assemblies to assess their conservation across cyprinids. 2.2. Genome duplication, diploidization and gene silencing 3. Results Trimmed sequence reads were mapped to TRINITY contigs using BOWTIE2 version 2.2.2.3 and corresponding gene expression was 3.1. Sequencing and transcriptome assemblies quantified with RSEM version 1.2.13. Because RSEM is incompat- Six lanes of Illumina sequencing produced more than 1.2 billion ible with indel, local, and discordant alignments, parameter settings paired-end reads, including 420.5-, 413.9-, and 385.3-million se- were chosen to avoid these alignments. The following RSEM param- quences in Cyprinus carpio, Cyprinella lutrensis, and Platygobio gra- eters were used: –sensitive; –dpad 0; –gbar 99999999; –mp 1,1 –np 1 cilis, respectively. De novo assembly resulted in high quality –score-min L,0,-0.1; –no-mixed; –no-discordant. Normalized expres- transcriptomes for all three species (Table 1). The C. carpio assembly sion for TRINITY genes was calculated by standardizing by total had the largest number of contigs (‘TRINITY transcripts’) and genes mapped reads across libraries and summed across alternate TRINITY (‘TRINITY genes’), while P. gracilis had the fewest. In contrast, metrics transcripts (isoforms) for each locus. Networks of co-expressed genes for contig length (N25, N50, N75, median contig length, average were identified for the three species using the WGCNA package in contig length) were all longer in P. gracilis than the other two species R. In order to assess the expression of duplicated genes in (Table 1; Fig. 2). Overall, the P. gracilis transcriptome assembly was Cyprinus carpio arising from the Cc4R duplication event, we quanti- more complete despite fewer raw sequence reads. TRANSDECODER pre- fied the number of TRINITY genes present in each species relative to dicted ORFs in about half of all TRINITY contigs (not shown), with zebrafish genes, as well as their expression levels. We used an arbi- the remainder comprised mainly of genomic DNA contamination trary threshold of ten sequence reads per gene per tissue, summed that was filtered out of the final dataset. The N50 of predicted ORFs across all three individuals, for a given gene to be considered ‘ex- was only slightly shorter in the three species (i.e. 1,299–1,572 bp) pressed’ in a particular tissue. Note that we are comparing whether than in zebrafish (CDS N50¼ 2,037 bp), and similar to the recently or not a gene is expressed beyond a certain threshold, as opposed to published draft C. carpio genome 1,487 bp. Removal of micro- quantifying levels of expression (i.e. RNA-seq). This approach was biome and genomic DNA contamination from the final assembly re- aimed at reducing the influence of unique reads (e.g. sequencing arti- sulted in fewer, but longer contigs (see filtering of the final dataset, facts). Most of the contigs excluded as a result were contigs repre- below), and an overall higher-quality assembly. sented only by singleton reads in one library. For C. carpio, we tested whether certain functional classes of 3.2. BLAST searches: zebrafish transcriptome genes were preferentially expressed in duplicate (i.e. the case where neither ohnolog is silenced). For this analysis, we used PANTHER to TopBLASTxhitsofTRINITY contigs against zebrafish peptides included test for statistical overrepresentation of GO-slim Biological approximately 20,000 unique genes (ENSDARG) and 11,000 protein Processes, with Bonferroni correction. The test genes consisted of the families (ENSFAM) present in each of the three species (Fig. 3), suggest- list of C. carpio ohnologs expressed in duplicate, while the list of all ing similar annotation efficiency and transcriptome representation for C. carpio genes present in the assembly was used as the reference set. each species. However, after pooling isoforms, the number of TRINITY Table 1. De novo transcriptome assembly results. Zebrafish (Danio rerio) data is included as an example of a well-assembled and complete transcriptome based primarily on Sanger sequencing Cyprinus carpio Cyprinella lutrensis Platygobio gracilis Danio rerio Trinity ‘genes’ (¼Clusters of contigs) 309,921 255,863 180,130 30,651 Trinity ‘transcripts’ (¼Assembled contigs) 440,696 382,504 262,969 43,153 GC content 42.45 43.25 42.67 49.60 N25 (bp) 3,327 3,069 3,644 3,465 N50 1,841 1,666 1,972 2,037 N75 704 679 788 1,179 Median contig length 418 439 450 1,080 Average contig length 907 886 978 1,501 Total assembled bases 399,790,412 339,160,955 257,217,466 64,757,328 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 15 genes that significantly matched these 20,000 zebrafish genes varied nine teleost transcriptomes, and the zebrafish genome databases (Table among species: 66,447 in Cyprinus carpio, 60,990 in Cyprinella lutren- 2). For contigs lacking hits against zebrafish peptides, BLASTn searches sis, and 39,915 in Platygobio gracilis (Table 2,top row).Zebrafish versus the rRNA silva database revealed a small number of significant genes were well covered, with more than 15,000 unique zebrafish genes hits (i.e.<400 contigs; Table 2). BLASTn searches of the remaining covered over at least 70% of their length in corresponding contigs from unmatched contigs versus the nine teleost fish transcriptomes identified each of the three cyprinids, consistent with the N50 data presented approximately 1,500–4,500 additional hits (Table 2), far fewer than above. In general, zebrafish proteins were more completely covered by the evolutionarily more closely related zebrafish transcriptome. P. gracilis contigs than C. carpio or C. lutrensis. For example, zebrafish BLASTn searches of the remaining unidentified contigs against the genes were more than 90% covered (i.e. the alignment covers>90% of zebrafish genome revealed a large number of significant hits (>30,000 bases of a gene) by sequences in 50.3% (12,489 of 24,817 genes) of P. per species), suggesting these reads were the result of low levels of back- gracilis genes with significant zebrafish peptide hits, versus 49.9% ground genomic DNA contamination in the cDNA libraries, a common (13,453 of 26,963) for C. carpio, and 46.8% (12,538 of 26,817) in C. occurrence resulting from the hypersensitivity of Illumina sequencing. lutrensis. A large number of TRINITY contigs did not significantly match Conservation of sequences across deep evolutionary lineages suggests (BLASTx) zebrafish peptide sequences and were subsequently queried functional importance, such as regulatory regions. against several additional databases. Despite extensive BLAST searches, a large number of TRINITY con- tigs (>100,000 in each species or more than 50% of all contigs) did not have significant hits in any of the databases. These contigs are 3.3. BLAST searches: other databases short in length (i.e. 200 bp) and have few reads mapping to them Contigs lacking significant BLASTx hits against zebrafish peptides were (e.g. single-read contigs). These could represent endogenous genomic queried (BLASTn) iteratively against rRNA silva microbiome database, DNA contamination of cDNA libraries and have sufficient evolution- ary divergence from zebrafish to render BLASTn searches ineffective. A large number are also expected to be non-rRNA sequences from the microbiome, which were not present in target databases. Of con- tigs with no BLAST hits in the aforementioned databases, TRANSDECODER predicted ORFs in 8,652 (Cyprinus carpio), 9,215 (Cyprinella lutrensis), and 3,011 (Platygobio gracilis) contigs Figure 2. Contig length histogram of three cyprinids in this study and zebra- fish, Danio rerio. By leveraging high throughput sequencing and bioinfor- matic filtering, we were able to generate high quality transcriptomes at a fraction of the cost and research effort used for zebrafish. As expected, de novo TRINITY assemblies resulted in proportionally fewer contigs longer than 1000 bp, as compared to those of a well-assembled transcriptome, zebrafish (Danio rerio). However, note that we only used canonical transcripts for zebrafish and not the shorter isoforms, which skews the distribution toward Figure 3. Unique genes and protein families from BLASTx searches (E-value longer transcripts for that species. threshold¼ 0.0001) against zebrafish (Danio rerio) peptide sequences. Table 2. Significant BLAST hits for TRINITY ‘genes’ versus various databases and number of ORFs present. BLAST searches were done in stepwise fashion: all TRINITY genes were queried against zebrafish peptides but only genes without zebrafish peptide hits were queried against rRNA silva, and so on until all of the databases were queried. Summary of open reading frames (ORFs) identified in TRINITY contigs with no significant BLAST hits against databases listed (‘No significant BLAST hits’). Some of the ORFs lacking similar proteins in the nr database may represent novel genes or genes with divergent sequences and function, while many are likely spurious results from the sequencing and assembly process or are from unidentified microbes Cyprinus carpio Cyprinella lutrensis Platygobio gracilis Zebrafish peptides 66,447 60,990 39,915 rRNA Silva (microbiome) 140 306 87 Teleost fish transcriptomes 4,572 2,923 1,561 Zebrafish genome 48,527 38,199 31,955 No significant BLAST hits 190,235 153,445 106,612 Total contigs 309,921 255,863 180,130 Predicted ORFs present 8,652 9,215 3,011 ORFs with nr BLASTp hits 3,789 4,154 1,548 ORFs without nr BLASTp hits (i.e. potentially novel genes) 4,863 5,061 1,463 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 16 Comparative transcriptomics of cyprinids Figure 4. Top-species BLASTp hits for predicted open reading frame (ORF) peptide sequences queried against the nr database. Query sequences only included ORFs from contigs that lacked significant BLAST hits (see Table 2). Grey bars represent fish or other chordates, while black bars represent non-chordate taxa. (Table 2). Roughly half of the predicted ORFs had significant Tetrahymena, Parameceum). Conversely, in P. gracilis the ORFs ap- BLASTp hits against the nr protein database (3,789, 4,154, and pear to be endogenous genes with high similarity to zebrafish 1,548 contigs, respectively). Conversely, there were 4,863 (C. car- (Fig. 4), i.e. a less diverse microbiome is present. Contigs with pre- pio), 5,061 (C. lutrensis), and 1,463 (P. gracilis) predicted ORFs had dicted ORFs but no BLAST hits to any of the databases possibly rep- no significant hits against nr (Table 2). These ORFs could include resent novel or functionally divergent genes in these species that novel genes not present in zebrafish or other teleost models, genes warrant further study. present in zebrafish but with significantly divergent sequences to cause BLAST searches to miss them, or could include genes from the 3.4. Filtering and the final assembly datasets microbiome that are not present in sequence databases. For ORFs with nr hits, zebrafish was the top-hit species for a large After filtering and removal of genomic DNA and microbiome reads, portion (Fig. 4), somewhat paradoxically given the lack of significant the final de novo assembly datasets contained only TRINITY contigs BLAST hits against zebrafish peptide and genome sequences dis- falling into one of the following categories: (i) contigs with significant cussed above. This appears to be due to the fact that TRANSDECODER- BLAST hits against zebrafish or the nine other teleost transcriptomes; predicted ORFs exclude 5’ and 3’ untranslated regions (UTRs) which or (ii) contigs with no matches against any of the databases but with diverge more rapidly than ORFs over evolutionary time. In C. carpio predicted ORFs present, i.e. potentially novel genes. All other contigs and C. lutrensis, many of these ORFs are from a diverse microbiome were removed via bioinformatic filtering. While it is possible that with many sequences sharing significant similarity to cyclophyllid some of the ‘microbiome’ hits are actually external contamination, tapeworms (e.g. Echinococcus, Hymenolepis) and protozoans (e.g. we expect this to be a minor component given the diverse nature of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 17 Figure 5. Number of TRINITY genes (contigs) expressed in each of four tissue types, as well as all tissues pooled. Contigs only include those with significant BLASTx hits versus zebrafish peptides. these sequences in terms of top-hit organism (Fig. 4). It is also possi- 3.6. Expression of essential genes ble that some of the genes that significantly align against zebrafish Genes that are essential for embryonic development in D. rerio were are actually microbiome or contaminant reads, though these genes nearly all present in the three cyprinids: 285 (Platygobio gracilis), 301 being target species DNA is a more parsimonious conclusion. The fi- (Cyprinella lutrensis), and 301 (Cyprinus carpio) genes were expressed nal datasets are significantly smaller than the raw de novo assembly out of 305 zebrafish essential genes (i.e. 93.4–97.8%). Of the 20 essen- but present much more reliable sequence information, i.e. transcrip- tial genes that we did not detect in P. gracilis, only one was also miss- tome sequences rather than microbiome or genomic DNA ing in C. lutrensis, and two were shared with C. carpio. No missing contamination. essential genes were shared between C. lutrensis and C. carpio,of the four missing in each species. Essential genes missing in one or more species were generally expressed at low levels in the other species. 3.5. Genome duplication, diploidization and gene Essential genes were nearly ubiquitously expressed across all four tis- silencing sue types (skin, gill, gut, kidney), with low levels of tissue specificity Transcriptome annotation and comparison with zebrafish revealed (Fig. 8), in contrast to non-essential genes which generally exhibited that Cyprinus carpio expresses more genes than Cyprinella lutrensis higher levels of tissue specificity. A few essential genes do exhibit pat- and Platygobio gracilis, due to the Cc4R duplication (Fig. 5). terns of tissue specificity or species-specificity. For example, C. carpio Cyprinus carpio expressed about 41% more genes overall than P. expresses more essential genes in the gut than the other two species, in- gracilis and 11% more than C. lutrensis. The number of duplicate cluding genes such as wdr46 and exosc8, which are missing in both of genes expressed varied dramatically among tissue types (Fig. 5). In the other species. Normalized levels of expression were higher in C. all tissues except skin, C. carpio expressed more genes than the other carpio than P. gracilis and C. lutrensis for 165 and 204 out of 305 two species (i.e. 3–48% more). In skin, both C. lutrensis and P. graci- genes, respectively. This pattern was not due to C. carpio expressing lis expressed more genes than C. carpio (26 and 2%, respectively). more loci per zebrafish gene (e.g. Ohnologs) than the other two spe- Using higher thresholds for ‘expression’ had moderate impact on the cies. Only slightly more loci (e.g. n¼ 2 contigs) were expressed per es- inferred percentage of duplicates expressed: a threshold of 100 reads sential gene in the recently duplicated C. carpio genome (Fig. 9) instead of 10 resulted in different estimates of duplicated genes ex- whereas most duplicated essential genes in C. carpio are not tran- pressed in C. carpio versus P. gracilis (18% more in C. carpio) and scribed and have either been lost evolutionarily, e.g. pseudogenes, or C. lutrensis (8% more in C. carpio), i.e. retained expression of Cc4R are expressed in other developmental stages or tissues. duplicates. The disparity in these results could be driven in part by different assembly qualities (e.g. a better assembled P. gracilis tran- scriptome). WGCNA analysis revealed broadly similar patterns of 4. Discussion blocks of co-expressed genes across species, consistent with the phy- logenetic relatedness of the three species (Fig. 6). Next-generation transcriptome sequencing has revolutionized the 4,74 Genes with retained duplicate expression (i.e. Ohnologs) in C. car- field of molecular ecology over the past decade. One outcome is pio represented a suite of functional groups: gene ontology terms that increased appreciation for the molecular complexity underlying the 75,76 were significantly enriched in the ‘retained duplicates’ list were diverse evolution of basic ecological traits. Here we present transcrip- (Fig. 7, top panel). One functional grouping that was a predominant tomic resources for comparative study of non-model cyprinid fishes contributor in the REVIGO analysis was ‘anatomical structure morpho- in a natural ‘common-garden’ setting. Previous work, along with our genesis,’ of interest because common carp attain much larger body size bioinformatic analyses demonstrate that careful processing and filter- than the other two species (Fig. 7,bottom panel). ing is needed to assess the sources of DNA fragments, which can be Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 18 Comparative transcriptomics of cyprinids Figure 6. Results of WGCNA analysis for three cyprinid species. Dendrograms represent results of hierarchical cluster analysis of co-expression patterns of genes. Colored bars to the left and top of the heatmap show modules of co-expressed genes and pairs of genes with higher co-expression show (darker) colora- tion in the heatmap. endogenous target transcriptome sequences, genomic DNA ‘contami- 4.1. Assembly results nation’ from the study organism, or DNA from the microbiome or There are important considerations associated with conducting tran- diet items. Assessment of transcriptome quality also requires careful scriptome analysis in a non-laboratory setting and in species lacking 77,78 4,80,81 consideration. Traditional measures of assembled read lengths high-quality, well-annotated genomes. For example, it is neces- such as N50 are largely meaningless for transcriptomes without ad- sary to identify ways to maximize the quality and completeness of de 77,80,82 ditional context. We advocate combining N50 and/or histograms of novo assemblies. Our assemblies are somewhat less complete contig lengths with explicit comparisons to well-studied transcrip- than the zebrafish reference, but this was expected because zebrafish tomes of model organisms, when available. For example, we com- has been sequenced extensively at the genomic DNA level, empirically pared our de novo transcriptomes to zebrafish, which yielded validated with RNA-seq, and refined by years of manual curation. valuable insight into progress made in our target species. Finally, TRINITY assemblies resulted in proportionally fewer long contigs positive identification of nearly all zebrafish essential genes in our (e.g.> 1,000 bp) compared to zebrafish. Four factors account for this transcriptomes provides additional evidence of the utility of our an- result. First, the microbiome is present in these sequences and many of notation procedures. Using the bioinformatics pipeline presented in the contigs are not endogenous, as reflected by top species hits in Fig. 1, we obtained high quality transcriptome data from three spe- BLAST searches (Fig. 4). Second, a small amount of genomic DNA cies of cyprinid fishes with distinctly different evolutionary histories. contamination persists despite DNase treatment during library prepa- Our specific aims in this study were to sequence, annotate, and as- ration. Genomic contamination tends to be observed as short (e.g. semble the transcriptomes of co-occurring fishes with the goal of de- 200 bp), shallow contigs often comprised of single-reads. Third, the de veloping resources for ongoing studies of the evolution and novo assemblies are more fragmented due to the short read technology molecular ecology of North American cyprinids. This comparative employed, with multiple contigs often representing non-overlapping transcriptome dataset offers tools to construct assays to pose and test fragments of the same gene. This effect is particularly acute in genes hypotheses related to differences in DNA sequences, functional path- with short sequence repeats (e.g. microsatellites). Finally, we only used ways, and expression patterns among organisms that are more or the canonical zebrafish transcripts in this study, which excludes the less closely related (i.e. comparative approach), but that also co- shorter isoforms present in many genes and biases the zebrafish distri- occur in nature and experience similar biotic and abiotic conditions, bution toward longer sequences. Transcriptomes presented here repre- including exposure to similar suites of pathogens and water quality sent an improvement (i.e. more sequences, higher coverage; longer conditions, for example. These data are also a resource for identify- relative N50) over earlier work on sequencing and assembling the 39,51 ing single nucleotide polymorphisms (SNPs) in transcribed genes, common carp transcriptome using Roche 454 sequencing, due to which could be used to explore functional or phenotypic variation the higher throughput, Illumina paired-end sequencing approach we within and among species. employed. The bioinformatic approach we presented to identify and There are several key findings in this study, including: (i) high- filter non-target sequences from the final dataset resulted in high qual- quality transcriptome assemblies for cyprinid fishes that reveal broad ity and well annotated assemblies. similarities and evolutionary conservation of genes with zebrafish, but with some key differences; (ii) several potentially novel genes not identified in zebrafish that are candidates for studies of ecological 4.2. Potentially novel genes and morphological novelty; (iii) diverse microbiomes that vary sub- Results of BLAST searches and ORF predictions helped us identify stantially among species, despite origin from a single collection local- candidate genes that may represent novel species- or taxon-specific ity; (iv) ubiquitous expression of essential genes for development in genes. Our interest in these genes lies in the idea that they may con- later ontogenetic stages (i.e. juveniles and adults) across a broad ar- tain some of the functional elements responsible for extensive ecolog- ray of tissue types; (v) a large number of duplicate genes expressed in ical and phylogenetic diversity present in the Cyprinidae, as in 83–85 the tetraploid, Cyprinus carpio, representing a diverse suite of bio- previous studies of lineage-specific gene family expansions. For logical processes or gene ontologies. We discuss each of these find- example, expansion of the patristacin gene family in pipefish may be ings in greater detail below. an important driver in the evolution of male pregnancy in that Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 19 Figure 7. Top panel: Gene-ontology terms that are over- or under-represented (y-axis) in the list of genes retained as duplicates in the common carp transcrip- tome as compared to all expressed genes in common carp. Bottom panel: Summary of groups of biological processes overrepresented in the retained-dupli- cates in common carp. Box size is proportional to the number of genes with particular gene ontology terms, which may suggest a dosage effect in common carp. 85 87 lineage. Many of the potentially novel genes we identified may out of final assemblies. Genome-scale sequence data is often lack- prove to be false positives as more fish genomes are sequenced and ing for the bacterial and metazoan microbiota on vertebrate samples, annotated; however, these candidates would be an excellent starting which complicates attempts at removal. We used an iterative and point for researchers interested in targeted searches for genes or pro- successive filtering approach to address this issue (Fig. 1) that pro- teins underlying ecological novelty in cyprinids that may have arisen vides valuable information on the likely source (e.g. exogenous or en- through local gene duplications, exon shuffling, horizontal transfer, dogenous) of particular sequences or contigs. Transcriptome or other mechanisms. characterization studies often do not attempt to remove exogenous microbiome and genomic DNA contamination. Researchers should be cautious when using unfiltered sequence reads, particularly when 4.3. Microbiome diversity they are compiled into massive databases that lack appropriate Another valuable aspect of transcriptome sequencing of samples metadata. taken from nature is the simultaneous generation of quantifiable data on the microbiome. These data are applicable to study of 4.4. Conservation of essential genes host-parasite dynamics, immune response, paired comparative popu- lation genetics or phylogeographic analysis of host and microbiota. Genes that are essential for embryonic development present interest- When generating de novo transcriptome assemblies for focal species, ing targets for studying genome evolution due to their critical func- 58,88,89 it is imperative that microbiome sequences are identified and filtered tional importance. Essential genes also bear biomedical Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 20 Comparative transcriptomics of cyprinids Figure 8. Expression of essential developmental genes by tissue type in three cyprinid fishes compared to 305 essential genes expressed across all tissues in zebrafish (Danio rerio). or juvenile developmental stages. We propose that the number of es- sential genes expressed could be used as a metric to complement other measures of assembly quality and completeness, in addition to comparing transcript length histograms to closely related model species (see Fig. 2). While beyond the scope of the present study, fu- ture work should compare the utility of these metrics for assessing transcriptome assemblies. 4.5. Tetraploidy and expression of duplicated genes Our results indicated that a large number of duplicate genes are ex- pressed in Cyprinus carpio, representing a diverse suite of biological 39,40,51 processes or gene ontologies, similar to previous studies. For genes where both Ohnologs were expressed in C. carpio, there was en- richment in several different functional pathways, but many genes Figure 9. Number of loci (TRINITY genes) expressed per zebrafish essential were associated with ‘anatomical structure morphogenesis’ in particu- gene. Only slightly more (e.g. n¼ 2) were expressed per essential gene in the recently duplicated genome of Cyprinus carpio. Most duplicated essen- lar (Fig. 7). Functional duplicates at these genes correlate with large tial genes in C. carpio are not transcribed and have either been lost evolu- body size and rapid growth in C. carpio as compared to C. lutrensis tionarily, e.g. pseudogenes, or are expressed in other developmental stages and P. gracilis and a potential dosage effect. Using a different set of tis- or tissues. sues, Wang et al. identified enrichment of retained expression of du- plicates in gene ontology pathways involved in metabolic and immune significance as many have been implicated in human diseases and de- functions using 454 transcriptome sequencing and EST data mining. 88 40 velopmental abnormalities. Our data demonstrate that essential The availability of a (draft) genome for common carp will eventually 58,59 ‘developmental’ genes previously identified in larval zebrafish help identify Ohnologs that are silenced because pseudogenes of si- are almost all ubiquitously expressed in juvenile or adults across a lenced genes may still be present in genomic DNA sequences; cur- broad range of tissues, suggesting their importance is not simply lim- rently, the incomplete annotation of that genome precludes analysis of ited to early ontogenetic stages or particular tissues. Previous work gene silencing at the genomic DNA level. Ultimately, knowledge of has shown that many of these genes are critical for basic cell func- which genes are retained and expressed in duplicate in tetraploids as tion, which may underlie their ubiquitous expression. Based on the compared to related diploid species can provide insight into the role critical functions they perform, these genes are candidates for future that whole genome duplication plays in the molecular ecology and 93,94 studies looking at the cause of high rates of genetic inviability and phylogenetic diversification of organisms. Note that our analyses mortality in cyprinids and other organisms with type-III life histo- and those of Wang et al. are based only on expressed genes in partic- ries. From a practical standpoint, however, we suggest that the ular tissues at a single time point, rather than genomic DNA sequences ubiquitous expression of these genes makes their sequencing cover- and consequently would not include Ohnologs expressed only in dif- age and completeness useful metrics that should be used to assess the ferent tissues or at different time points. The recent Cc4R allotetra- 35,95,96 quality and completeness of de novo transcriptome assemblies, anal- ploidy event complicates transcriptome assembly because there ogous to the use of ‘housekeeping’ genes as positive controls in has been little time for divergence of Ohnologs e.g, 8.2 million years. qPCR studies. The presence of nearly all essential genes across In autopolyploid salmonids, the fourth round of whole genome dupli- these four cyprinid species (representing more than 100 million years cation is much older i.e. 90–102 ma; yet many Ohnologous loci are of evolutionary divergence ) is consistent with the hypothesis of difficult to separate via bioinformatic approaches. Some loci even 58,61 broad evolutionary and functional conservation. The few essen- maintain tetrasomic inheritance because of the autopolyploid nature of tial genes not detected may still be present in the genome, but were the duplication. These factors need to be explicitly considered when missed due to assembly errors or are expressed transiently at larval conducting analyses that require orthologous alignments, such as Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 21 RNA-seq and syntenic mapping, when working with polyploid or par- Supplementary data tially diploidized species. Supplementary data are available at DNARES online. 4.6. Summary References Results from short read sequences yield high-quality transcriptome 1. Mardis, E. R. 2008, The impact of next-generation sequencing technology resources for comparative study of cyprinids, a hyper-diverse clade on genetics. Trends Genet., 24, 133–41. of fishes. We used a variety of bioinformatic tools for assembly qual- 2. Shendure, J. and Ji, H. 2008, Next-generation DNA sequencing. Nat. ity assessment, gene annotation, orthology assignment, and identifi- Biotechnol., 26, 1135–45. cation and partitioning of exogenous DNA in wild cyprinid fishes. 3. Ekblom, R. and Galindo, J. 2011, Applications of next generation sequenc- This approach facilitates technology transfer from a model organism ing in molecular ecology of non-model organisms. Heredity, 107,1–15. (zebrafish) to a group of related species that fill diverse and critical 4. Alvarez, M., Schrey, A. W. and Richards, C. L. 2015, Ten years of tran- roles in these ecosystems and comprise an important component of scriptomics in wild populations: what have we learned about their ecology and evolution? Mol. Ecol., 24, 710–25. biodiversity. Conserved expression of essential developmental genes 5. Andrews, K. R., Good, J. M., Miller, M. R., Luikart, G. and Hohenlohe, across a broad phylogenetic scope, later ontogenetic stages, and ar- P. A. 2016, Harnessing the power of RADseq for ecological and evolu- ray of tissue types, illustrates their utility as benchmarks for assessing tionary genomics. Nat. Rev. Genet., 17, 81–92. coverage in de novo assemblies. Moreover, their ubiquitous expres- 6. Barshis, D. J., Ladner, J. T., Oliver, T. A., Seneca, F. O., Traylor-Knowles, sion further supports the hypothesis that these genes are required for N. and Palumbi, S. R. 2013, Genomic basis for coral resilience to climate the basic biology of cyprinid fish and are candidate loci for develop- change. Proc. Natl. Acad. Sci., 110, 1387–92. mental abnormalities and disease. Finally, comparative transcriptom- 7. Schunter, C., Vollmer, S. V., Macpherson, E. and Pascual, M. 2014, ics must contend with genome duplications and other genomic Transcriptome analyses and differential gene expression in a non-model ‘events’ that affect gene identity and expression. Nonetheless, com- fish species with alternative mating tactics. BMC Genomics, 15, 167. parative approaches could provide enormous power to identify 8. Verbruggen, B., Bickley, L. K., Santos, E. M., et al. 2015, De novo assem- bly of the Carcinus maenas transcriptome and characterization of innate shared and unique physiological pathways that respond to common immune system pathways. BMC Genomics, 16,1. environmental stressors in a natural setting. 9. Pavey, S. A., Bernatchez, L., Aubin-Horth, N. and Landry, C. R. 2012, What is needed for next-generation ecological and evolutionary geno- mics? Trends Ecol. Evol., 27, 673–78. 4.7. Data Availability 10. Van Straalen, N. M. and Roelofs, D. 2012, An Introduction to Ecological Raw sequence reads were uploaded to the NCBI Sequence Read Genomics. Oxford University Press. Archive (SRA: SRP107991: SRR5601334-SRR560133469. 11. Landry, C. R. and Aubin-Horth, N. 2014, Recent advances in ecological BIOPROJECT: PRJNA383604. BIOSAMPLES: SAMN07166458- genomics: from phenotypic plasticity to convergent and adaptive evolu- SAMN0716493). TRINITY-assembled transcriptomes are available tion and speciation. Adv. Exp. Med. Biol., 781, 1–5. via FigShare. BLAST results and the list of contigs 12. Goltsman, D. S. A., Comolli, L. R., Thomas, B. C. and Banfield, J. F. corresponding to potentially novel genes are available in the 2015, Community transcriptomics reveals unexpected high microbial di- Supplementary data. versity in acidophilic biofilm communities. ISME J., 9, 1014–23. 13. DeBiasse, M. B. and Kelly, M. W. 2016, Plastic and evolved responses to global change: what can we learn from comparative transcriptomics? J. Hered., 107, 71–81. Acknowledgements 14. Eo, S. H., Doyle, J. M., Hale, M. C., Marra, N. J., Ruhl, J. D. and This project was supported by the National Institute of General Medical DeWoody, J. A. 2012, Comparative transcriptomics and gene expression Sciences (8P20GM103451-12), New Mexico IDeA Networks of Biomedical in larval tiger salamander (Ambystoma tigrinum) gill and lung tissues as Research Excellence (NMINBRE_A2_Jan_2013), and the Center for revealed by pyrosequencing. Gene, 492, 329–38. Evolutionary and Theoretical Immunology. Samples were collected under 15. Whitehead, A., Triant, D., Champlin, D. and Nacci, D. 2010, New Mexico Department of Game and Fish permit #3015. This research was Comparative transcriptomics implicates mechanisms of evolved pollution approved by Institutional Animal Care and Use Committee Protocol #10- tolerance in a killifish population. Mol. Ecol., 19, 5186–203. 100468-MCC and #10-100492-MCC. We thank E. Loker, R. Miller, J. 16. Cohen, D., Bogeat-Triboulot, M.-B., Tisserant, E., et al. 2010, Kavka, and G. Rosenberg for research and technical support. Thanks to Z. Comparative transcriptomics of drought responses in Populus: a Ren and L. Hao for assistance with the Database of Essential Genes. Fish im- meta-analysis of genome-wide expression profiling in mature leaves and ages were provided T. Kennedy (red shiner, flathead chub) and C. Thomas root apices across two genotypes. BMC Genomics, 11,1. (common carp). This research benefitted from insight and technical assistance 17. Des Marais, D. L. and Rausher, M. D. 2008, Escape from adaptive con- provided by F. Schilkey, N. Devitt, P. Mena, T. Ramaraj, I. Lindquist, A. flict after duplication in an anthocyanin pathway gene. Nature, 454, Snyder, M. Osborne, and C. Krabbenhoft. We thank the anonymous re- 762–65. viewers for helpful comments on the manuscript. The authors have no compet- 18. Reid, N. M., Proestou, D. A., Clark, B. W., et al. 2016, The genomic land- ing interests. scape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science, 354, 1305–8. 19. Thompson, A. W. and Ortı´, G. 2016, Annual killifish transcriptomics and candidate genes for metazoan diapause. Mol. Biol. Evol., 33, 2391–95. Accession numbers 20. Dion-Coˆte ´ , A.-M., Renaut, S., Normandeau, E. and Bernatchez, L. 2014, NCBI Genbank SRA: SRP107991: SRR5601334-SRR560133469 RNA-seq reveals transcriptomic shock involving transposable elements reactivation in hybrids of young lake whitefish species. Mol. Biol. Evol., 31, 1188–99. 21. Kavembe, G. D., Franchini, P., Irisarri, I., Machado-Schiaffino, G. and Conflict of interest Meyer, A. 2015, Genomics of adaptation to multiple concurrent stresses: None declared. insights from comparative transcriptomics of a Cichlid fish from one of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 22 Comparative transcriptomics of cyprinids earth’s most extreme environments, the Hypersaline Soda Lake Magadi in 47. Ankley, G. T. and Villeneuve, D. L. 2006, The fathead minnow in aquatic Kenya, East Africa. J. Mol. Evol., 81, 90–109. toxicology: past, present and future. Aquatic Toxicol., 78, 91–102. 22. Kelley, J. L., Passow, C. N., Plath, M., Arias Rodriguez, L., Yee, M. C. 48. Klaper, R., Carter, B. J., Richter, C., Drevnick, P., Sandheinrich, M. and and Tobler, M. 2012, Genomic resources for a model in adaptation and Tillitt, D. 2008, Use of a 15 k gene microarray to determine gene expres- speciation research: characterization of the Poecilia mexicana transcrip- sion changes in response to acute and chronic methylmercury exposure in tome. BMC Genomics, 13, 652. the fathead minnow Pimephales promelas Rafinesque. J. Fish Biol., 72, 23. Deyholos, M. K. 2010, Making the most of drought and salinity tran- 2207–80. scriptomics. Plant Cell Environ., 33, 648–54. 49. Burns, F. R., Cogburn, A. L., Ankley, G. T., et al. 2016, Sequencing and 24. Ramachandran, V. K., East, A. K., Karunakaran, R., Downie, J. A. and de novo draft assemblies of a fathead minnow (Pimephales promelas) Poole, P. S. 2011, Adaptation of Rhizobium leguminosarum to pea, alfalfa reference genome. Environ. Toxicol. Chem., 35, 212–217. and sugar beet rhizospheres investigated by comparative transcriptomics. 50. FAO 2014, FISHSTAT: Global Aquaculture Production. FAO, Rome, Genome Biol., 12,1. Italy. 25. Cheviron, Z. A., Connaty, A. D., McClelland, G. B. and Storz, J. F. 2014, 51. Ji, P., Liu, G., Xu, J., et al. 2012, Characterization of common carp tran- Functional genomics of adaptation to hypoxic cold-stress in high-altitude scriptome: de novo sequencing, assembly, annotation and comparative ge- deer mice: transcriptomic plasticity and thermogenic performance nomics. PloS One, 7, 1–9. Evolution, 68, 48–62. 52. Li, G., Zhao, Y., Liu, Z., et al. 2015, De novo assembly and characteriza- 26. Webster, T. M. U. and Santos, E. M. 2015, Global transcriptomic profil- tion of the spleen transcriptome of common carp (Cyprinus carpio) using ing demonstrates induction of oxidative stress and of compensatory cellu- illumina paired-end sequencing. Fish Shellfish Immunol. lar stress responses in brown trout exposed to glyphosate and Roundup. 53. Dekay, J. E. 1842, Zoology of New-York, or, the New-York fauna; com- BMC Genomics, 16,1. prising detailed descriptions of all the animals hitherto observed within 27. Nelson, J. S. 2006, Fishes of the World. John Wiley & Sons: Hoboken, NJ. the state of New-York, with brief notices of those occasionally found near 28. Berra, T. M. 2001, Freshwater fish distribution. Academic Press: San its borders, and accompanied by appropriate illustrations. Part IV. Fishes. Diego, CA. W. & A. White & J. Visscher: Albany, NY. 29. Winfield, I. and Townsend, C. 1991, The role of cyprinids in ecosystems. 54. McDonald, M. 1893, Report of the commissioner of fish and fisheries for In: Winfield, I. and Nelson, J. (eds), Cyprinid Fishes, Springer, 552–71. the fiscal years 1889-90 and 1890-91. Part XVII. U.S. Commission of 30. Sterner, R. W. and George, N. B. 2000, Carbon, nitrogen, and phospho- Fish and Fisheries. Washington, DC, 1–96. rous stoichiometry of cyprinid fishes. Ecology, 81, 127–40. 55. Walters, D. M., Blum, M. J., Rashleigh, B., Freeman, B. J., Porter, B. A. 31. Power, M. E., Matthews, W. J. and Stewart, A. J. 1985, Grazing min- and Burkhead, N. M. 2008, Red shiner invasion and hybridization with nows, piscivorous bass, and stream algae: dynamics of a strong interac- blacktail shiner in the upper Coosa River, USA. Biol. Invas., 10, tion. Ecology, 66, 1448–56. 1229–1242. 32. Amores, A., Force, A., Yan, Y.-L., et al. 1998, Zebrafish hox clusters and 56. Marsh-Matthews, E. and Matthews, W. J. 2000, Spatial variation in rela- vertebrate genome evolution. Science, 282, 1711–14. tive abundance of a widespread, numerically dominant fish species and its 33. Postlethwait, J. H., Yan, Y.-L., Gates, M. A., et al. 1998, Vertebrate ge- effect on fish assemblage structure. Oecologia, 125, 283–292. nome evolution and the zebrafish gene map. Nature Genet., 18, 345–49. 57. Hoagstrom, C. W., DeWitte, A. C., Gosch, N. J. and Berry, C. R., Jr 34. Meyer, A. and Van de Peer, Y. 2005, From 2R to 3R: evidence for a 2007, Historical fish assemblage flux in the Cheyenne River below fish-specific genome duplication (FSGD). BioEssays, 27, 937–45. Angostura Dam. J. Freshwater Ecol., 22, 219–29. 35. David, L., Blum, S., Feldman, M. W., Lavi, U. and Hillel, J. 2003, Recent 58. Amsterdam, A., Nissen, R. M., Sun, Z., Swindell, E. C., Farrington, S. duplication of the common carp (Cyprinus carpio L.) genome as revealed and Hopkins, N. 2004, Identification of 315 genes essential for early by analyses of microsatellite loci. Mol. Biol. Evol., 20, 1425–34. zebrafish development. Proc. Natl. Acad. Sci USA, 101, 12792–97. 36. Machordom, A. and Doadrio, I. 2001, Evolutionary history and specia- 59. Chen, W.-H., Minguez, P., Lercher, M. J. and Bork, P. 2012, OGEE: an tion modes in the cyprinid genus Barbus. Proc. Roy. Soc. Lond B Biol. online gene essentiality database. Nucleic Acids Res., 40, D901–D906. Sci., 268, 1297–306. 60. Flicek, P., Amode, M. R., Barrell, D., et al. 2014, Ensembl 2014. Nucleic 37. Ohno, S., Muramoto, J., Christian, L. and Atkin, N. B. 1967, Acids Res., 42, D749–755. Diploid-tetraploid relationship among old-world members of the fish fam- 61. Hahn, M. W. and Kern, A. D. 2005, Comparative genomics of centrality ily Cyprinidae. Chromosoma, 23, 1–9. and essentiality in three eukaryotic protein-interaction networks. Mol. 38. Yang, L., Sado, T., Hirt, M. V., et al. 2015, Phylogeny and polyploidy: Biol. Evol., 22, 803–6. Resolving the classification of cyprinine fishes (Teleostei: Cypriniformes). 62. Innan, H. and Kondrashov, F. 2010, The evolution of gene duplications: clas- Mol. Phylogenet. Evol., 85, 97–116. sifying and distinguishing between models. Nat. Rev. Genet., 11, 97–108. 39. Wang, J.-T., Li, J.-T., Zhang, X.-F. and Sun, X.-W. 2012, Transcriptome 63. Chain, F. J., Dushoff, J. and Evans, B. J. 2011, The odds of duplicate gene analysis reveals the time of the fourth round of genome duplication in persistence after polyploidization. BMC Genomics, 12,1. common carp (Cyprinus carpio). BMC Genomics, 13, 96. 64. Bolger, A. M., Lohse, M. and Usadel, B. 2014, Trimmomatic: a flexible 40. Xu, P., Zhang, X., Wang, X., et al. 2014, Genome sequence and genetic trimmer for Illumina sequence data. Bioinformatics, btu170. diversity of the common carp, Cyprinus carpio. Nature Genet., 46, 65. Grabherr, M. G., Haas, B. J., Yassour, M., et al. 2011, Full-length tran- 1212–1219. scriptome assembly from RNA-Seq data without a reference genome. Nat. 41. Woods, T. D. and Buth, D. G. 1984, High level of gene silencing in the tet- Biotechnol., 29, 644–52. raploid goldfish. Biochem. Syst. Ecol., 12, 415–21. 66. Haas, B. J., Papanicolaou, A., Yassour, M., et al. 2013, De novo tran- 42. Ferris, S. D. and Whitt, G. S. 1977, The evolution of duplicate gene ex- script sequence reconstruction from RNA-seq using the Trinity platform pression in the carp (Cyprinus carpio). Experientia, 33, 1299–1301. for reference generation and analysis. Nat. Protocols, 8, 1494–512. 43. Briggs, J. P. 2002, The zebrafish: a new model organism for integrative 67. Pruesse, E., Quast, C., Knittel, K., et al. 2007, SILVA: a comprehensive physiology. Am. J. Physiol. Regul. Integr. Comp. Physiol., 282, R3–R9. online resource for quality checked and aligned ribosomal RNA sequence 44. Dooley, K. and Zon, L. I. 2000, Zebrafish: a model system for the study data compatible with ARB. Nucleic Acids Res., 35, 7188–96. of human disease. Curr. Opinion Genet. Dev., 10, 252–56. 68. Conesa, A., Gotz, S., Garcia-Gomez, J. M., Terol, J., Talon, M. and 45. Lieschke, G. J. and Currie, P. D. 2007, Animal models of human disease: Robles, M. 2005, Blast2GO: a universal tool for annotation, visualization zebrafish swim into view. Nat. Rev. Genet., 8, 353–367. and analysis in functional genomics research. Bioinformatics, 21, 46. Howe, K., Clark, M. D., Torroja, C. F., et al. 2013, The zebrafish refer- 3674–76. ence genome sequence and its relationship to the human genome. Nature, 69. Langmead, B. and Salzberg, S. L. 2012, Fast gapped-read alignment with 496, 498–503. Bowtie 2. Nat. Methods, 9, 357–59. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 23 70. Li, B. and Dewey, C. N. 2011, RSEM: accurate transcript quantification 85. Small, C., Bassham, S., Catchen, J., et al. 2016, The genome of the Gulf from RNA-Seq data with or without a reference genome. BMC pipefish enables understanding of evolutionary innovations. Genome Bioinform., 12, 323. Biol., 17, 258. 71. Langfelder, P. and Horvath, S. 2008, WGCNA: an R package for 86. Di Bella, J. M., Bao, Y., Gloor, G. B., Burton, J. P. and Reid, G. 2013, weighted correlation network analysis. BMC Bioinform., 9, 559. High throughput sequencing methods and analysis for microbiome re- 72. Mi, H., Muruganujan, A. and Thomas, P. D. 2013, PANTHER in 2013: search. J. Microbiol. Methods, 95, 401–14. modeling the evolution of gene function, and other gene attributes, in the 87. Coco, J. R., Flemington, E. K. and Taylor, C. M. 2011, PARSES: a pipe- context of phylogenetic trees. Nucleic Acids Res., 41, D377–386. line for analysis of RNA-Seq exogenous sequences. BICoB, 196–200. 73. Supek, F., Bosnjak, M., Skunca, N. and Smuc, T. 2011, REVIGO summa- 88. Dickerson, J. E., Zhu, A., Robertson, D. L. and Hentges, K. E. 2011, rizes and visualizes long lists of gene ontology terms. PLoS One, 6, Defining the role of essential genes in human disease. PLoS One, 6, e21800. e27368. 74. van Dijk, E. L., Auger, H., Jaszczyszyn, Y. and Thermes, C. 2014, Ten 89. Luo, H., Lin, Y., Gao, F., Zhang, C.-T. and Zhang, R. 2014, DEG 10, an years of next-generation sequencing technology. Trends Genet., 30, update of the database of essential genes that includes both 418–26. protein-coding genes and noncoding genomic elements. Nucleic Acids 75. Bernatchez, L. 2016, On the maintenance of genetic variation and adapta- Res., 42, D574–D580. tion to environmental change: considerations from population genomics 90. Plough, L., Shin, G. and Hedgecock, D. 2016, Genetic inviability is a ma- in fishes. J. Fish Biol., 89, 2519–56. jor driver of type-III survivorship in experimental families of a highly fe- 76. Martin, J. A. and Wang, Z. 2011, Next-generation transcriptome assem- cund marine bivalve. Mol. Ecol. bly. Nat. Rev. Genet., 12, 671–82. 91. McCurley, A. T. and Callard, G. V. 2008, Characterization of housekeep- 77. Li, B., Fillmore, N., Bai, Y., et al. 2014, Evaluation of de novo transcrip- ing genes in zebrafish: male-female differences and effects of tissue type, tome assemblies from RNA-Seq data. Genome Biol., 15,1. developmental stage and chemical treatment. BMC Mol. Biol., 9,1. 78. O’Neil, S. T. and Emrich, S. J. 2013, Assessing De Novo transcriptome as- 92. Saitoh, K., Sado, T., Doosey, M. H., et al. 2011, Evidence from mitochon- sembly metrics for consistency and utility. BMC Genomics, 14, 465. drial genomics supports the lower Mesozoic of South Asia as the time and 79. Lopez-Maestre, H., Brinza, L., Marchet, C., et al. 2016, SNP calling from place of basal divergence of cypriniform fishes (Actinopterygii: RNA-seq data without a reference genome: identification, quantification, Ostariophysi). Zool. J. Linnean Soc., 161, 633–62. differential analysis and impact on the protein sequence. Nucleic Acids 93. Postlethwait, J. H. 2007, The zebrafish genome in context: ohnologs gone Res., 44, e148–e148. missing. J. Exp. Zool. B Mol. Dev. Evol., 308, 563–77. 80. DeWoody, J. A., Abts, K. C., Fahey, A. L., et al. 2013, Of contigs and 94. Ferris, S. D. and Whitt, G. S. 1977, Loss of duplicate gene expression after quagmires: next-generation sequencing pitfalls associated with transcrip- polyploidisation. Nature, 265, 258–60. tomic studies. Mol. Ecol. Resour., 13, 551–58. 95. Larhammar, D. and Risinger, C. 1994, Molecular genetic aspects of tetra- 81. Wolf, J. B. 2013, Principles of transcriptome analysis and gene expression ploidy in the common carp Cyprinus carpio. Mol. Phylogenet. Evol., 3, quantification: an RNA-seq tutorial. Mol. Ecol. Resour., 13, 559–72. 59–68. 82. Vijay, N., Poelstra, J. W., Ku ¨ nstner, A. and Wolf, J. B. 2013, Challenges 96. Ohno, S. 1970, Evolution by Gene Duplication. Springer: New York, and strategies in transcriptome assembly and differential gene expression NY. quantification. A comprehensive in silico assessment of RNA-seq experi- 97. Berthelot, C., Brunet, F., Chalopin, D., et al. 2014, The rainbow trout ge- ments. Mol. Ecol., 22, 620–34. nome provides novel insights into evolution after whole-genome duplica- 83. Aravind, L., Watanabe, H., Lipman, D. J. and Koonin, E. V. 2000, tion in vertebrates. Nat. Commun., 5, 3657. Lineage-specific loss and divergence of functionally linked genes in eu- 98. Timusk, E. R., Ferguson, M. M., Moghadam, H. K., Norman, J. D., karyotes. Proc. Natl. Acad. Sci., 97, 11319–24. Wilson, C. C. and Danzmann, R. G. 2011, Genome evolution in the fish 84. Stein, C., Caccamo, M., Laird, G. and Leptin, M. 2007, Conservation and family salmonidae: generation of a brook charr genetic map and compari- divergence of gene families encoding components of innate immune re- sons among charrs (Arctic charr and brook charr) with rainbow trout. sponse systems in zebrafish. Genome Biol., 8,1. BMC Genet., 12, 1–15. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png DNA Research Oxford University Press

Comparative transcriptomics of cyprinid minnows and carp in a common wild setting: a resource for ecological genomics in freshwater communities

Free
13 pages

Loading next page...
 
/lp/ou_press/comparative-transcriptomics-of-cyprinid-minnows-and-carp-in-a-common-GCS0IpPQiW
Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
ISSN
1340-2838
eISSN
1756-1663
D.O.I.
10.1093/dnares/dsx034
Publisher site
See Article on Publisher Site

Abstract

Comparative transcriptomics can now be conducted on organisms in natural settings, which has greatly enhanced understanding of genome–environment interactions. Here, we demon- strate the utility and potential pitfalls of comparative transcriptomics of wild organisms, with an example from three cyprinid fish species (Teleostei:Cypriniformes). We present extensively fil- tered and annotated transcriptome assemblies that provide a valuable resource for studies of genome evolution (e.g. polyploidy), ecological and morphological diversification, speciation, and shared and unique responses to environmental variation in cyprinid fishes. Our results and analyses address the following points: (i) ‘essential developmental genes’ are shown to be ubiquitously expressed in a diverse suite of tissues across later ontogenetic stages (i.e. juve- niles and adults), making these genes are useful for assessing the quality of transcriptome as- semblies, (ii) the influence of microbiomes and other exogenous DNA, (iii) potentially novel, species-specific genes, and (iv) genomic rearrangements (e.g. whole genome duplication). The data we present provide a resource for future comparative work in cypriniform fishes and other taxa across a variety of sub-disciplines, including stress response, morphological diversifica- tion, community ecology, ecotoxicology, and climate change. Key words: RNA-seq, essential genes, Cyprinus carpio, carp, gene silencing 1. Introduction 9–11 and giving rise to the field of ecological genomics. Reduced sequenc- High-throughput sequencing has dramatically accelerated the pace of ing costs have made it feasible to study transcriptomes of co-occurring 1,2 genomic research. While once restricted to model species in labora- species in a community ecology context (i.e. ‘community transcriptom- tory settings, genomic methods are being widely applied to non-model ics’), as well as comparative studies of transcriptome evolution across 3–8 13–16 species in nature, rapidly illuminating the black box of the genome diverse clades (i.e. ‘comparative transcriptomics’). While genomic V C The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com 11 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 12 Comparative transcriptomics of cyprinids data from model species can be informative for the biology of related species with a comprehensively-annotated genome. Zebrafish is an organisms, not all species are the same in terms of their ecology, genet- important model organism in developmental biology and disease re- 43–45 ics, and morphology. For example, research on the zebrafish (Danio search, due to its semitransparent embryos and ease of laboratory rerio, family Cyprinidae) can be relevant for closely related species, but culture, as well as its comprehensively annotated genome. Fathead cannot explain the tremendous ecological and morphological diversity minnow is widely used as an indicator species in ecotoxicology studies 47,48 in this clade, as studies of single species are insufficient for understand- for which microarrays have been developed and a draft genome ing dynamic interactions among species and their respective genomes in sequence is now available. Common carp is an important food fish, a macroevolutionary context. In order to understand the causes and especially in Asia, and is produced extensively in aquaculture. The 39,51,52 consequences of those interactions, as well as the origin of ecological carp transcriptome has been studied elsewhere and recently a 17 40 novelty (e.g. new genes ), we must examine genomes across species draft genome sequence was published. that reflect that diversity. Recent studies across a diverse suite of teleost We used the extensive zebrafish genomic resources available to an- fish lineages have focused on functionally-important genetic variation notate transcriptomes of three evolutionarily related, but non-model 18–22 in non-model species in nature. species that co-occur in parts of the west-central United States: Transcriptomics of species in the wild has enormous potential to Cyprinella lutrensis (red shiner), Platygobio gracilis (flathead chub) and advance our understanding of mechanisms underlying molecular ad- Cyprinus carpio (common carp). These species were selected to reflect aptation, evolutionary diversification, ecotoxicology, and commu- phylogenetic breadth, but also because their distributions overlap and 4,23–26 nity ecology. In this context, several important questions arise. occupy identical dryland river habitats (i.e. the Rio Grande, New For example, What are the proximate and ultimate mechanisms un- Mexico), where they are exposed to similar biotic and abiotic condi- derlying phylogenetic, ecological, and morphological divergence? tions. Cyprinus carpio is native to Asia and Europe, but was introduced How have ancestral genomes been molded by divergent natural se- into North America, perhaps as early as 1831, and enthusiastically lection and other evolutionary forces into myriad forms that exist stocked throughout the US thereafter as a food fish, including New today? How does genomic architecture constrain or promote diversi- Mexico as early as 1889. Cyprinella lutrensis and Platygobio gracilis fication? How important are genome duplication events in adaptive are both native to central and western North America, from the radiations? What role do genomes play in underlying the ecological Mississippi River basin to the Rio Grande in New Mexico. Both C. dynamics of community assembly (e.g. competition, abundance, spa- lutrensis and C. carpio are highly tolerant of a wide range of environ- tial and temporal dynamics, physiological constraint, etc.)? A neces- mental conditions and are highly invasive in areas outside of their natu- 55,56 sary first step in addressing these questions is the generation of ral range, whereas Platygobio gracilis is sensitive to environmental databases reflecting the genomic or transcriptomic variation among disturbance and imperiled or declining in several parts of its range. species, which we provide in the current study. Transcriptomes of Cyprinella lutrensis and Platygobio gracilis Here we present transcriptomic resources for three members of have not been published to our knowledge, whereas genomic and 39,40,51 the freshwater fish family Cyprinidae (Teleostei: Cypriniformes), one transcriptomic data are available for Cyprinus carpio. of the most speciose vertebrate clades, with over 2,000 species. In Cyprinella lutrensis and Platygobio gracilis are diploid (2n¼ 50), addition to remarkable species diversity, the clade includes extensive while Cyprinus carpio is allotetraploid 2n¼ 100, with some dupli- ecological, genetic, and morphological diversity. Cyprinid fishes cated genes silenced after a lineage-specific whole genome duplica- (minnows and carp) comprise an important component of freshwater tion (i.e. Cc4R). Our aims in this study were to: (i) succinctly fish communities throughout North America, Asia, Europe and summarize and compare genes and functional annotation informa- Africa. They are often the dominant fish taxa in numerical abun- tion obtained from various databases; (ii) test whether Cyprinus car- dance and biomass and play an important functional role in aquatic pio expresses additional copies of particular genes compared to the 29–31 ecosystems. two diploid species (Cyprinella lutrensis and Platygobio gracilis); (iii) The ecological and taxonomic diversity of cyprinids is particularly identify potentially novel genes present in the three cyprinids that interesting in light of the history of genome evolution in this clade. may underlie their unique ecological and morphological novelty and Cyprinids were part of the radiation that occurred after the teleost- (iv) to assess evolutionary conservation of essential genes for 32,33 specific genome duplication event, known as the ‘3R hypothesis’, development. that preceded and perhaps facilitated the diversification of teleost In zebrafish, 307 genes are known to be essential for development. fishes. In addition, several cyprinid lineages have independently un- Knockout mutations in these genes are embryonic lethal according to 35–38 58 dergone additional rounds of genome duplications. For exam- experiments by Amsterdam et al. with subsequent revisions by 59 60 ple, the common carp (Cyprinus carpio) lineage had a fourth round Chen et al. and updates to the ENSEMBL database. These genes 39 40 of genome duplication approximately 5.6–11.3 or 8.2 million are highly conserved across extremely deep phylogenetic splits (e.g. years ago (Ma), which we refer to as ‘Cc4R’. Pairs of genes arising yeast, fly, zebrafish, and human) due to their essential roles in devel- from whole genome duplication, referred to as ‘Ohnologs’, were the- opment. Despite their importance, essential genes have not been oretically present for all genes immediately following the Cc4R ge- studied in the context of comparative molecular ecology or ecologi- nome duplication. Varying levels of subsequent gene-silencing and cal genomics of co-occurring species. Using transcriptome data pre- ‘re-diploidization’ have since occurred in polyploid lineages making sented in this study, we assessed the evolutionary conservation of the 41,42 cyprinids ideal for comparative studies of genome evolution. 307 zebrafish essential genes across four cyprinid lineages. We pre- Despite the ecological importance of cyprinids in freshwater sys- dicted that these genes would be highly conserved across all species, tems worldwide and dynamic lineage-specific patterns of genomic ex- consistent with their critical functional roles, as compared to non- pansions and contractions, most species have little or no genomic essential genes. If this is the case, then differences among species resources available for investigating their molecular ecology or genome should be found in non-essential genes, such as lineage- or species- evolution. Three notable exceptions are zebrafish (Danio rerio), fat- specific genes. We also tested whether both copies of duplicated head minnow (Pimephales promelas), and common carp (Cyprinus genes in C. carpio (i.e. Cc4R Ohnologs) were retained and expressed 43–45 38,42 carpio). The family includes zebrafish (Danio rerio), a model in duplicate or whether one copy was evolutionarily lost. One Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 13 mechanism for the loss of Ohnologs is ‘pseudogenization’, wherein a gene accumulates one or more internal stop codons that prevent formation of a functional protein product and thus becomes a pseu- dogene. If having redundant copies of essential genes were impor- tant for survival (e.g. due to loss-of-function mutations in one copy), then evolutionary retention of duplicates would likely be favored in C. carpio. Conversely, if regulation of proper gene expression levels were important in the context of functional pathways, then dupli- cated essential genes would likely be silenced at roughly the same rate as non-essential genes (although regulatory changes could also fine-tune expression patterns). We tested these hypotheses using ex- pression data for the three cyprinid transcriptomes as compared to zebrafish. These sequences will provide resources for more detailed studies of the evolution and functional constraint of these critical genes, particularly in the context of genome expansions and reductions. 2. Materials and methods Fish (n¼ 3 per species) were collected with a seine on 6 July 2012 from a field site on the Rio Grande, approximately 40 km south of Socorro, New Mexico (33.690556 N, 106.993042 W). Whole fish samples (juveniles or non-spawning adults) were immediately frozen in liquid nitrogen and transported to the laboratory. Skin, gill, gut, and kidney tissues were dissected and removed from frozen fish (outer layers were only slightly thawed by the time dissection was completed;<5 minutes total time), placed in TRIzol (Invitrogen), and Figure 1. Flow diagram illustrating our bioinformatic pipeline. Analyses con- sist of three main steps: assembly, annotation, and analysis of gene silenc- mechanically homogenized. Total RNA was isolated using Purelink ing patterns. Databases queried and software packages used are listed. RNA Mini kits (Ambion) following manufacturer’s protocol, along with DNase treatment to reduce genomic DNA contamination. Purified total RNA was sent to the National Center for Genome BLAST hits were identified based on the following parameter set- Resources (Santa Fe, New Mexico, USA) for quantification, quality tings: E-value< 0.0001; gap open penalty¼ 11; gap extend¼ 1; assessment, cDNA library preparation and sequencing. RNA integ- wordsize¼ 3. After extensive testing, this parameter combination rity and purity was assessed with a Bioanalyzer 2100 instrument was found to give the optimal balance between finding matches for (Agilent Technologies). Thirty-six Illumina libraries were constructed large numbers of contigs, while minimizing spurious hits. For most (3 species  4 tissues  3 biological replicates) from the total RNA genes a 1–1 match was expected between zebrafish versus Platygobio samples using Illumina TruSeq DNA prep kits according to the man- gracilis or Cyprinella lutrensis, whereas zebrafish and Cyprinus car- ufacturer’s protocol. Libraries were barcoded using standard six pio should have either 1-2 or 1-1 due to partial diploidy in carp. We base pair Illumina oligonucleotides, and six libraries were pooled for used this expectation in determining the threshold E-value (i.e. each lane of Illumina HiSeq 2000 (V3 chemistry) for a total of six E< 0.0001 in this study) to use. In practice, more stringent E-value lanes of 2  100 bp paired-end sequencing. thresholds (e.g. E< 1e6) had very little effect on the number of sig- nificant BLAST hits. 2.1. Bioinformatics Contigs with no significant BLAST hits against the zebrafish tran- We used the bioinformatics pipeline outlined in Figure 1 for analyz- scriptome were subjected to a series of stepwise BLASTn searches until ing transcriptomic data in three main steps: de novo assembly, gene significant hits were found (or not) in order to identify the possible annotation, and analysis of expression of duplicated genes. Adapters sources of those sequences (e.g. microbiome ) or to identify novel and barcode sequences were removed from raw reads, and reads genes not present in the zebrafish genome. First, remaining contigs were trimmed using TRIMMOMATIC with parameter settings as fol- lacking significant hits against the zebrafish transcriptome were que- lows: leading quality¼ 5; trailing quality¼ 5; minimum trimmed ried against the rRNA silva database (SSU Ref 119 NR99 and LSU read length¼ 36). Reads were normalized in silico to maximum read Parc 119), which contains bacterial and eukaryotic rRNA sequences. coverage of 50. Clipped and trimmed reads were assembled, de Contigs with still no significant BLAST hits were then queried against novo, for each species separately using TRINITY version 2014-04- a database containing all nine additional teleost fish transcriptomes 65,66 13, with minimum contig length set to 200 bp. Libraries were (Amazon molly, Poecilia formosa; cavefish, Astyanax mexicanus;cod, pooled within a species for de novo assembly, to maximize the num- Gadus morhua;fugu, Takifugu rubripes;medaka, Oryzias latipes;pla- ber of genes included. TRINITY assembles reads into contigs (‘TRINITY tyfish, Xiphophorus maculates; stickleback, Gasterosteus aculeatus; transcripts’), places similar transcripts in groups loosely referred to tetraodon, Tetraodon nigroviridis;tilapia, Oreochromis niloticus) as ‘genes’, and groups similar ‘genes’ into gene clusters. from Ensembl 78. Contigs with no BLAST hits at this point were then Putative protein coding genes were also identified by BLASTx BLASTed against the zebrafish genome (Zv9) using the ‘Top Level’ se- searches of contigs against zebrafish (Danio rerio) peptide sequences quences from Ensembl to identify possible genomic DNA contamina- (database build Zv9) obtained from Ensembl 78. Significant tion. Remaining contigs with no significant blast hits in any of these Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 14 Comparative transcriptomics of cyprinids databases were piped to TRANSDECODER to identify open reading GO terminology was based on the zebrafish database. Results of the frames (ORFs) that represent potentially novel genes. Default parame- overrepresentation analysis were visualized with REVIGO. ter settings were used with TRANSDECODER. The software generates pre- dicted peptide sequences for contigs with ORFs. Predicted peptide 2.3. Essential genes sequences for the contigs with ORFs but no BLAST hits to the afore- To test the hypothesis of evolutionary conservation of essential genes mentioned databases were queried (BLASTp; E-value< 0.001) against among cyprinid fishes, we used zebrafish genes present in the Online the NCBI nr database. BLAST2GO version 3.0, was used to identify Gene Essentiality Database OGEE; and identified orthologs in the top species hits for those predicted proteins with significant hits against three transcriptomes from BLASTx searches described above. Of the nr. The remaining sequences with no hits to databases and no ORFs 58,59 307 essential genes in zebrafish, one (ENSDARG00000038423) were discarded as likely non-protein coding, genomic DNA contami- has been retired from ENSEMBL and one (ENSDARG000 nation with sufficient divergence from zebrafish to render genomic 00045605) is an unprocessed pseudogene with no protein product. BLASTn searches ineffective. We searched for the remaining 305 genes in the three transcriptome assemblies to assess their conservation across cyprinids. 2.2. Genome duplication, diploidization and gene silencing 3. Results Trimmed sequence reads were mapped to TRINITY contigs using BOWTIE2 version 2.2.2.3 and corresponding gene expression was 3.1. Sequencing and transcriptome assemblies quantified with RSEM version 1.2.13. Because RSEM is incompat- Six lanes of Illumina sequencing produced more than 1.2 billion ible with indel, local, and discordant alignments, parameter settings paired-end reads, including 420.5-, 413.9-, and 385.3-million se- were chosen to avoid these alignments. The following RSEM param- quences in Cyprinus carpio, Cyprinella lutrensis, and Platygobio gra- eters were used: –sensitive; –dpad 0; –gbar 99999999; –mp 1,1 –np 1 cilis, respectively. De novo assembly resulted in high quality –score-min L,0,-0.1; –no-mixed; –no-discordant. Normalized expres- transcriptomes for all three species (Table 1). The C. carpio assembly sion for TRINITY genes was calculated by standardizing by total had the largest number of contigs (‘TRINITY transcripts’) and genes mapped reads across libraries and summed across alternate TRINITY (‘TRINITY genes’), while P. gracilis had the fewest. In contrast, metrics transcripts (isoforms) for each locus. Networks of co-expressed genes for contig length (N25, N50, N75, median contig length, average were identified for the three species using the WGCNA package in contig length) were all longer in P. gracilis than the other two species R. In order to assess the expression of duplicated genes in (Table 1; Fig. 2). Overall, the P. gracilis transcriptome assembly was Cyprinus carpio arising from the Cc4R duplication event, we quanti- more complete despite fewer raw sequence reads. TRANSDECODER pre- fied the number of TRINITY genes present in each species relative to dicted ORFs in about half of all TRINITY contigs (not shown), with zebrafish genes, as well as their expression levels. We used an arbi- the remainder comprised mainly of genomic DNA contamination trary threshold of ten sequence reads per gene per tissue, summed that was filtered out of the final dataset. The N50 of predicted ORFs across all three individuals, for a given gene to be considered ‘ex- was only slightly shorter in the three species (i.e. 1,299–1,572 bp) pressed’ in a particular tissue. Note that we are comparing whether than in zebrafish (CDS N50¼ 2,037 bp), and similar to the recently or not a gene is expressed beyond a certain threshold, as opposed to published draft C. carpio genome 1,487 bp. Removal of micro- quantifying levels of expression (i.e. RNA-seq). This approach was biome and genomic DNA contamination from the final assembly re- aimed at reducing the influence of unique reads (e.g. sequencing arti- sulted in fewer, but longer contigs (see filtering of the final dataset, facts). Most of the contigs excluded as a result were contigs repre- below), and an overall higher-quality assembly. sented only by singleton reads in one library. For C. carpio, we tested whether certain functional classes of 3.2. BLAST searches: zebrafish transcriptome genes were preferentially expressed in duplicate (i.e. the case where neither ohnolog is silenced). For this analysis, we used PANTHER to TopBLASTxhitsofTRINITY contigs against zebrafish peptides included test for statistical overrepresentation of GO-slim Biological approximately 20,000 unique genes (ENSDARG) and 11,000 protein Processes, with Bonferroni correction. The test genes consisted of the families (ENSFAM) present in each of the three species (Fig. 3), suggest- list of C. carpio ohnologs expressed in duplicate, while the list of all ing similar annotation efficiency and transcriptome representation for C. carpio genes present in the assembly was used as the reference set. each species. However, after pooling isoforms, the number of TRINITY Table 1. De novo transcriptome assembly results. Zebrafish (Danio rerio) data is included as an example of a well-assembled and complete transcriptome based primarily on Sanger sequencing Cyprinus carpio Cyprinella lutrensis Platygobio gracilis Danio rerio Trinity ‘genes’ (¼Clusters of contigs) 309,921 255,863 180,130 30,651 Trinity ‘transcripts’ (¼Assembled contigs) 440,696 382,504 262,969 43,153 GC content 42.45 43.25 42.67 49.60 N25 (bp) 3,327 3,069 3,644 3,465 N50 1,841 1,666 1,972 2,037 N75 704 679 788 1,179 Median contig length 418 439 450 1,080 Average contig length 907 886 978 1,501 Total assembled bases 399,790,412 339,160,955 257,217,466 64,757,328 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 15 genes that significantly matched these 20,000 zebrafish genes varied nine teleost transcriptomes, and the zebrafish genome databases (Table among species: 66,447 in Cyprinus carpio, 60,990 in Cyprinella lutren- 2). For contigs lacking hits against zebrafish peptides, BLASTn searches sis, and 39,915 in Platygobio gracilis (Table 2,top row).Zebrafish versus the rRNA silva database revealed a small number of significant genes were well covered, with more than 15,000 unique zebrafish genes hits (i.e.<400 contigs; Table 2). BLASTn searches of the remaining covered over at least 70% of their length in corresponding contigs from unmatched contigs versus the nine teleost fish transcriptomes identified each of the three cyprinids, consistent with the N50 data presented approximately 1,500–4,500 additional hits (Table 2), far fewer than above. In general, zebrafish proteins were more completely covered by the evolutionarily more closely related zebrafish transcriptome. P. gracilis contigs than C. carpio or C. lutrensis. For example, zebrafish BLASTn searches of the remaining unidentified contigs against the genes were more than 90% covered (i.e. the alignment covers>90% of zebrafish genome revealed a large number of significant hits (>30,000 bases of a gene) by sequences in 50.3% (12,489 of 24,817 genes) of P. per species), suggesting these reads were the result of low levels of back- gracilis genes with significant zebrafish peptide hits, versus 49.9% ground genomic DNA contamination in the cDNA libraries, a common (13,453 of 26,963) for C. carpio, and 46.8% (12,538 of 26,817) in C. occurrence resulting from the hypersensitivity of Illumina sequencing. lutrensis. A large number of TRINITY contigs did not significantly match Conservation of sequences across deep evolutionary lineages suggests (BLASTx) zebrafish peptide sequences and were subsequently queried functional importance, such as regulatory regions. against several additional databases. Despite extensive BLAST searches, a large number of TRINITY con- tigs (>100,000 in each species or more than 50% of all contigs) did not have significant hits in any of the databases. These contigs are 3.3. BLAST searches: other databases short in length (i.e. 200 bp) and have few reads mapping to them Contigs lacking significant BLASTx hits against zebrafish peptides were (e.g. single-read contigs). These could represent endogenous genomic queried (BLASTn) iteratively against rRNA silva microbiome database, DNA contamination of cDNA libraries and have sufficient evolution- ary divergence from zebrafish to render BLASTn searches ineffective. A large number are also expected to be non-rRNA sequences from the microbiome, which were not present in target databases. Of con- tigs with no BLAST hits in the aforementioned databases, TRANSDECODER predicted ORFs in 8,652 (Cyprinus carpio), 9,215 (Cyprinella lutrensis), and 3,011 (Platygobio gracilis) contigs Figure 2. Contig length histogram of three cyprinids in this study and zebra- fish, Danio rerio. By leveraging high throughput sequencing and bioinfor- matic filtering, we were able to generate high quality transcriptomes at a fraction of the cost and research effort used for zebrafish. As expected, de novo TRINITY assemblies resulted in proportionally fewer contigs longer than 1000 bp, as compared to those of a well-assembled transcriptome, zebrafish (Danio rerio). However, note that we only used canonical transcripts for zebrafish and not the shorter isoforms, which skews the distribution toward Figure 3. Unique genes and protein families from BLASTx searches (E-value longer transcripts for that species. threshold¼ 0.0001) against zebrafish (Danio rerio) peptide sequences. Table 2. Significant BLAST hits for TRINITY ‘genes’ versus various databases and number of ORFs present. BLAST searches were done in stepwise fashion: all TRINITY genes were queried against zebrafish peptides but only genes without zebrafish peptide hits were queried against rRNA silva, and so on until all of the databases were queried. Summary of open reading frames (ORFs) identified in TRINITY contigs with no significant BLAST hits against databases listed (‘No significant BLAST hits’). Some of the ORFs lacking similar proteins in the nr database may represent novel genes or genes with divergent sequences and function, while many are likely spurious results from the sequencing and assembly process or are from unidentified microbes Cyprinus carpio Cyprinella lutrensis Platygobio gracilis Zebrafish peptides 66,447 60,990 39,915 rRNA Silva (microbiome) 140 306 87 Teleost fish transcriptomes 4,572 2,923 1,561 Zebrafish genome 48,527 38,199 31,955 No significant BLAST hits 190,235 153,445 106,612 Total contigs 309,921 255,863 180,130 Predicted ORFs present 8,652 9,215 3,011 ORFs with nr BLASTp hits 3,789 4,154 1,548 ORFs without nr BLASTp hits (i.e. potentially novel genes) 4,863 5,061 1,463 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 16 Comparative transcriptomics of cyprinids Figure 4. Top-species BLASTp hits for predicted open reading frame (ORF) peptide sequences queried against the nr database. Query sequences only included ORFs from contigs that lacked significant BLAST hits (see Table 2). Grey bars represent fish or other chordates, while black bars represent non-chordate taxa. (Table 2). Roughly half of the predicted ORFs had significant Tetrahymena, Parameceum). Conversely, in P. gracilis the ORFs ap- BLASTp hits against the nr protein database (3,789, 4,154, and pear to be endogenous genes with high similarity to zebrafish 1,548 contigs, respectively). Conversely, there were 4,863 (C. car- (Fig. 4), i.e. a less diverse microbiome is present. Contigs with pre- pio), 5,061 (C. lutrensis), and 1,463 (P. gracilis) predicted ORFs had dicted ORFs but no BLAST hits to any of the databases possibly rep- no significant hits against nr (Table 2). These ORFs could include resent novel or functionally divergent genes in these species that novel genes not present in zebrafish or other teleost models, genes warrant further study. present in zebrafish but with significantly divergent sequences to cause BLAST searches to miss them, or could include genes from the 3.4. Filtering and the final assembly datasets microbiome that are not present in sequence databases. For ORFs with nr hits, zebrafish was the top-hit species for a large After filtering and removal of genomic DNA and microbiome reads, portion (Fig. 4), somewhat paradoxically given the lack of significant the final de novo assembly datasets contained only TRINITY contigs BLAST hits against zebrafish peptide and genome sequences dis- falling into one of the following categories: (i) contigs with significant cussed above. This appears to be due to the fact that TRANSDECODER- BLAST hits against zebrafish or the nine other teleost transcriptomes; predicted ORFs exclude 5’ and 3’ untranslated regions (UTRs) which or (ii) contigs with no matches against any of the databases but with diverge more rapidly than ORFs over evolutionary time. In C. carpio predicted ORFs present, i.e. potentially novel genes. All other contigs and C. lutrensis, many of these ORFs are from a diverse microbiome were removed via bioinformatic filtering. While it is possible that with many sequences sharing significant similarity to cyclophyllid some of the ‘microbiome’ hits are actually external contamination, tapeworms (e.g. Echinococcus, Hymenolepis) and protozoans (e.g. we expect this to be a minor component given the diverse nature of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 17 Figure 5. Number of TRINITY genes (contigs) expressed in each of four tissue types, as well as all tissues pooled. Contigs only include those with significant BLASTx hits versus zebrafish peptides. these sequences in terms of top-hit organism (Fig. 4). It is also possi- 3.6. Expression of essential genes ble that some of the genes that significantly align against zebrafish Genes that are essential for embryonic development in D. rerio were are actually microbiome or contaminant reads, though these genes nearly all present in the three cyprinids: 285 (Platygobio gracilis), 301 being target species DNA is a more parsimonious conclusion. The fi- (Cyprinella lutrensis), and 301 (Cyprinus carpio) genes were expressed nal datasets are significantly smaller than the raw de novo assembly out of 305 zebrafish essential genes (i.e. 93.4–97.8%). Of the 20 essen- but present much more reliable sequence information, i.e. transcrip- tial genes that we did not detect in P. gracilis, only one was also miss- tome sequences rather than microbiome or genomic DNA ing in C. lutrensis, and two were shared with C. carpio. No missing contamination. essential genes were shared between C. lutrensis and C. carpio,of the four missing in each species. Essential genes missing in one or more species were generally expressed at low levels in the other species. 3.5. Genome duplication, diploidization and gene Essential genes were nearly ubiquitously expressed across all four tis- silencing sue types (skin, gill, gut, kidney), with low levels of tissue specificity Transcriptome annotation and comparison with zebrafish revealed (Fig. 8), in contrast to non-essential genes which generally exhibited that Cyprinus carpio expresses more genes than Cyprinella lutrensis higher levels of tissue specificity. A few essential genes do exhibit pat- and Platygobio gracilis, due to the Cc4R duplication (Fig. 5). terns of tissue specificity or species-specificity. For example, C. carpio Cyprinus carpio expressed about 41% more genes overall than P. expresses more essential genes in the gut than the other two species, in- gracilis and 11% more than C. lutrensis. The number of duplicate cluding genes such as wdr46 and exosc8, which are missing in both of genes expressed varied dramatically among tissue types (Fig. 5). In the other species. Normalized levels of expression were higher in C. all tissues except skin, C. carpio expressed more genes than the other carpio than P. gracilis and C. lutrensis for 165 and 204 out of 305 two species (i.e. 3–48% more). In skin, both C. lutrensis and P. graci- genes, respectively. This pattern was not due to C. carpio expressing lis expressed more genes than C. carpio (26 and 2%, respectively). more loci per zebrafish gene (e.g. Ohnologs) than the other two spe- Using higher thresholds for ‘expression’ had moderate impact on the cies. Only slightly more loci (e.g. n¼ 2 contigs) were expressed per es- inferred percentage of duplicates expressed: a threshold of 100 reads sential gene in the recently duplicated C. carpio genome (Fig. 9) instead of 10 resulted in different estimates of duplicated genes ex- whereas most duplicated essential genes in C. carpio are not tran- pressed in C. carpio versus P. gracilis (18% more in C. carpio) and scribed and have either been lost evolutionarily, e.g. pseudogenes, or C. lutrensis (8% more in C. carpio), i.e. retained expression of Cc4R are expressed in other developmental stages or tissues. duplicates. The disparity in these results could be driven in part by different assembly qualities (e.g. a better assembled P. gracilis tran- scriptome). WGCNA analysis revealed broadly similar patterns of 4. Discussion blocks of co-expressed genes across species, consistent with the phy- logenetic relatedness of the three species (Fig. 6). Next-generation transcriptome sequencing has revolutionized the 4,74 Genes with retained duplicate expression (i.e. Ohnologs) in C. car- field of molecular ecology over the past decade. One outcome is pio represented a suite of functional groups: gene ontology terms that increased appreciation for the molecular complexity underlying the 75,76 were significantly enriched in the ‘retained duplicates’ list were diverse evolution of basic ecological traits. Here we present transcrip- (Fig. 7, top panel). One functional grouping that was a predominant tomic resources for comparative study of non-model cyprinid fishes contributor in the REVIGO analysis was ‘anatomical structure morpho- in a natural ‘common-garden’ setting. Previous work, along with our genesis,’ of interest because common carp attain much larger body size bioinformatic analyses demonstrate that careful processing and filter- than the other two species (Fig. 7,bottom panel). ing is needed to assess the sources of DNA fragments, which can be Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 18 Comparative transcriptomics of cyprinids Figure 6. Results of WGCNA analysis for three cyprinid species. Dendrograms represent results of hierarchical cluster analysis of co-expression patterns of genes. Colored bars to the left and top of the heatmap show modules of co-expressed genes and pairs of genes with higher co-expression show (darker) colora- tion in the heatmap. endogenous target transcriptome sequences, genomic DNA ‘contami- 4.1. Assembly results nation’ from the study organism, or DNA from the microbiome or There are important considerations associated with conducting tran- diet items. Assessment of transcriptome quality also requires careful scriptome analysis in a non-laboratory setting and in species lacking 77,78 4,80,81 consideration. Traditional measures of assembled read lengths high-quality, well-annotated genomes. For example, it is neces- such as N50 are largely meaningless for transcriptomes without ad- sary to identify ways to maximize the quality and completeness of de 77,80,82 ditional context. We advocate combining N50 and/or histograms of novo assemblies. Our assemblies are somewhat less complete contig lengths with explicit comparisons to well-studied transcrip- than the zebrafish reference, but this was expected because zebrafish tomes of model organisms, when available. For example, we com- has been sequenced extensively at the genomic DNA level, empirically pared our de novo transcriptomes to zebrafish, which yielded validated with RNA-seq, and refined by years of manual curation. valuable insight into progress made in our target species. Finally, TRINITY assemblies resulted in proportionally fewer long contigs positive identification of nearly all zebrafish essential genes in our (e.g.> 1,000 bp) compared to zebrafish. Four factors account for this transcriptomes provides additional evidence of the utility of our an- result. First, the microbiome is present in these sequences and many of notation procedures. Using the bioinformatics pipeline presented in the contigs are not endogenous, as reflected by top species hits in Fig. 1, we obtained high quality transcriptome data from three spe- BLAST searches (Fig. 4). Second, a small amount of genomic DNA cies of cyprinid fishes with distinctly different evolutionary histories. contamination persists despite DNase treatment during library prepa- Our specific aims in this study were to sequence, annotate, and as- ration. Genomic contamination tends to be observed as short (e.g. semble the transcriptomes of co-occurring fishes with the goal of de- 200 bp), shallow contigs often comprised of single-reads. Third, the de veloping resources for ongoing studies of the evolution and novo assemblies are more fragmented due to the short read technology molecular ecology of North American cyprinids. This comparative employed, with multiple contigs often representing non-overlapping transcriptome dataset offers tools to construct assays to pose and test fragments of the same gene. This effect is particularly acute in genes hypotheses related to differences in DNA sequences, functional path- with short sequence repeats (e.g. microsatellites). Finally, we only used ways, and expression patterns among organisms that are more or the canonical zebrafish transcripts in this study, which excludes the less closely related (i.e. comparative approach), but that also co- shorter isoforms present in many genes and biases the zebrafish distri- occur in nature and experience similar biotic and abiotic conditions, bution toward longer sequences. Transcriptomes presented here repre- including exposure to similar suites of pathogens and water quality sent an improvement (i.e. more sequences, higher coverage; longer conditions, for example. These data are also a resource for identify- relative N50) over earlier work on sequencing and assembling the 39,51 ing single nucleotide polymorphisms (SNPs) in transcribed genes, common carp transcriptome using Roche 454 sequencing, due to which could be used to explore functional or phenotypic variation the higher throughput, Illumina paired-end sequencing approach we within and among species. employed. The bioinformatic approach we presented to identify and There are several key findings in this study, including: (i) high- filter non-target sequences from the final dataset resulted in high qual- quality transcriptome assemblies for cyprinid fishes that reveal broad ity and well annotated assemblies. similarities and evolutionary conservation of genes with zebrafish, but with some key differences; (ii) several potentially novel genes not identified in zebrafish that are candidates for studies of ecological 4.2. Potentially novel genes and morphological novelty; (iii) diverse microbiomes that vary sub- Results of BLAST searches and ORF predictions helped us identify stantially among species, despite origin from a single collection local- candidate genes that may represent novel species- or taxon-specific ity; (iv) ubiquitous expression of essential genes for development in genes. Our interest in these genes lies in the idea that they may con- later ontogenetic stages (i.e. juveniles and adults) across a broad ar- tain some of the functional elements responsible for extensive ecolog- ray of tissue types; (v) a large number of duplicate genes expressed in ical and phylogenetic diversity present in the Cyprinidae, as in 83–85 the tetraploid, Cyprinus carpio, representing a diverse suite of bio- previous studies of lineage-specific gene family expansions. For logical processes or gene ontologies. We discuss each of these find- example, expansion of the patristacin gene family in pipefish may be ings in greater detail below. an important driver in the evolution of male pregnancy in that Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 19 Figure 7. Top panel: Gene-ontology terms that are over- or under-represented (y-axis) in the list of genes retained as duplicates in the common carp transcrip- tome as compared to all expressed genes in common carp. Bottom panel: Summary of groups of biological processes overrepresented in the retained-dupli- cates in common carp. Box size is proportional to the number of genes with particular gene ontology terms, which may suggest a dosage effect in common carp. 85 87 lineage. Many of the potentially novel genes we identified may out of final assemblies. Genome-scale sequence data is often lack- prove to be false positives as more fish genomes are sequenced and ing for the bacterial and metazoan microbiota on vertebrate samples, annotated; however, these candidates would be an excellent starting which complicates attempts at removal. We used an iterative and point for researchers interested in targeted searches for genes or pro- successive filtering approach to address this issue (Fig. 1) that pro- teins underlying ecological novelty in cyprinids that may have arisen vides valuable information on the likely source (e.g. exogenous or en- through local gene duplications, exon shuffling, horizontal transfer, dogenous) of particular sequences or contigs. Transcriptome or other mechanisms. characterization studies often do not attempt to remove exogenous microbiome and genomic DNA contamination. Researchers should be cautious when using unfiltered sequence reads, particularly when 4.3. Microbiome diversity they are compiled into massive databases that lack appropriate Another valuable aspect of transcriptome sequencing of samples metadata. taken from nature is the simultaneous generation of quantifiable data on the microbiome. These data are applicable to study of 4.4. Conservation of essential genes host-parasite dynamics, immune response, paired comparative popu- lation genetics or phylogeographic analysis of host and microbiota. Genes that are essential for embryonic development present interest- When generating de novo transcriptome assemblies for focal species, ing targets for studying genome evolution due to their critical func- 58,88,89 it is imperative that microbiome sequences are identified and filtered tional importance. Essential genes also bear biomedical Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 20 Comparative transcriptomics of cyprinids Figure 8. Expression of essential developmental genes by tissue type in three cyprinid fishes compared to 305 essential genes expressed across all tissues in zebrafish (Danio rerio). or juvenile developmental stages. We propose that the number of es- sential genes expressed could be used as a metric to complement other measures of assembly quality and completeness, in addition to comparing transcript length histograms to closely related model species (see Fig. 2). While beyond the scope of the present study, fu- ture work should compare the utility of these metrics for assessing transcriptome assemblies. 4.5. Tetraploidy and expression of duplicated genes Our results indicated that a large number of duplicate genes are ex- pressed in Cyprinus carpio, representing a diverse suite of biological 39,40,51 processes or gene ontologies, similar to previous studies. For genes where both Ohnologs were expressed in C. carpio, there was en- richment in several different functional pathways, but many genes Figure 9. Number of loci (TRINITY genes) expressed per zebrafish essential were associated with ‘anatomical structure morphogenesis’ in particu- gene. Only slightly more (e.g. n¼ 2) were expressed per essential gene in the recently duplicated genome of Cyprinus carpio. Most duplicated essen- lar (Fig. 7). Functional duplicates at these genes correlate with large tial genes in C. carpio are not transcribed and have either been lost evolu- body size and rapid growth in C. carpio as compared to C. lutrensis tionarily, e.g. pseudogenes, or are expressed in other developmental stages and P. gracilis and a potential dosage effect. Using a different set of tis- or tissues. sues, Wang et al. identified enrichment of retained expression of du- plicates in gene ontology pathways involved in metabolic and immune significance as many have been implicated in human diseases and de- functions using 454 transcriptome sequencing and EST data mining. 88 40 velopmental abnormalities. Our data demonstrate that essential The availability of a (draft) genome for common carp will eventually 58,59 ‘developmental’ genes previously identified in larval zebrafish help identify Ohnologs that are silenced because pseudogenes of si- are almost all ubiquitously expressed in juvenile or adults across a lenced genes may still be present in genomic DNA sequences; cur- broad range of tissues, suggesting their importance is not simply lim- rently, the incomplete annotation of that genome precludes analysis of ited to early ontogenetic stages or particular tissues. Previous work gene silencing at the genomic DNA level. Ultimately, knowledge of has shown that many of these genes are critical for basic cell func- which genes are retained and expressed in duplicate in tetraploids as tion, which may underlie their ubiquitous expression. Based on the compared to related diploid species can provide insight into the role critical functions they perform, these genes are candidates for future that whole genome duplication plays in the molecular ecology and 93,94 studies looking at the cause of high rates of genetic inviability and phylogenetic diversification of organisms. Note that our analyses mortality in cyprinids and other organisms with type-III life histo- and those of Wang et al. are based only on expressed genes in partic- ries. From a practical standpoint, however, we suggest that the ular tissues at a single time point, rather than genomic DNA sequences ubiquitous expression of these genes makes their sequencing cover- and consequently would not include Ohnologs expressed only in dif- age and completeness useful metrics that should be used to assess the ferent tissues or at different time points. The recent Cc4R allotetra- 35,95,96 quality and completeness of de novo transcriptome assemblies, anal- ploidy event complicates transcriptome assembly because there ogous to the use of ‘housekeeping’ genes as positive controls in has been little time for divergence of Ohnologs e.g, 8.2 million years. qPCR studies. The presence of nearly all essential genes across In autopolyploid salmonids, the fourth round of whole genome dupli- these four cyprinid species (representing more than 100 million years cation is much older i.e. 90–102 ma; yet many Ohnologous loci are of evolutionary divergence ) is consistent with the hypothesis of difficult to separate via bioinformatic approaches. Some loci even 58,61 broad evolutionary and functional conservation. The few essen- maintain tetrasomic inheritance because of the autopolyploid nature of tial genes not detected may still be present in the genome, but were the duplication. These factors need to be explicitly considered when missed due to assembly errors or are expressed transiently at larval conducting analyses that require orthologous alignments, such as Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 21 RNA-seq and syntenic mapping, when working with polyploid or par- Supplementary data tially diploidized species. Supplementary data are available at DNARES online. 4.6. Summary References Results from short read sequences yield high-quality transcriptome 1. Mardis, E. R. 2008, The impact of next-generation sequencing technology resources for comparative study of cyprinids, a hyper-diverse clade on genetics. Trends Genet., 24, 133–41. of fishes. We used a variety of bioinformatic tools for assembly qual- 2. Shendure, J. and Ji, H. 2008, Next-generation DNA sequencing. Nat. ity assessment, gene annotation, orthology assignment, and identifi- Biotechnol., 26, 1135–45. cation and partitioning of exogenous DNA in wild cyprinid fishes. 3. Ekblom, R. and Galindo, J. 2011, Applications of next generation sequenc- This approach facilitates technology transfer from a model organism ing in molecular ecology of non-model organisms. Heredity, 107,1–15. (zebrafish) to a group of related species that fill diverse and critical 4. Alvarez, M., Schrey, A. W. and Richards, C. L. 2015, Ten years of tran- roles in these ecosystems and comprise an important component of scriptomics in wild populations: what have we learned about their ecology and evolution? Mol. Ecol., 24, 710–25. biodiversity. Conserved expression of essential developmental genes 5. Andrews, K. R., Good, J. M., Miller, M. R., Luikart, G. and Hohenlohe, across a broad phylogenetic scope, later ontogenetic stages, and ar- P. A. 2016, Harnessing the power of RADseq for ecological and evolu- ray of tissue types, illustrates their utility as benchmarks for assessing tionary genomics. Nat. Rev. Genet., 17, 81–92. coverage in de novo assemblies. Moreover, their ubiquitous expres- 6. Barshis, D. J., Ladner, J. T., Oliver, T. A., Seneca, F. O., Traylor-Knowles, sion further supports the hypothesis that these genes are required for N. and Palumbi, S. R. 2013, Genomic basis for coral resilience to climate the basic biology of cyprinid fish and are candidate loci for develop- change. Proc. Natl. Acad. Sci., 110, 1387–92. mental abnormalities and disease. Finally, comparative transcriptom- 7. Schunter, C., Vollmer, S. V., Macpherson, E. and Pascual, M. 2014, ics must contend with genome duplications and other genomic Transcriptome analyses and differential gene expression in a non-model ‘events’ that affect gene identity and expression. Nonetheless, com- fish species with alternative mating tactics. BMC Genomics, 15, 167. parative approaches could provide enormous power to identify 8. Verbruggen, B., Bickley, L. K., Santos, E. M., et al. 2015, De novo assem- bly of the Carcinus maenas transcriptome and characterization of innate shared and unique physiological pathways that respond to common immune system pathways. BMC Genomics, 16,1. environmental stressors in a natural setting. 9. Pavey, S. A., Bernatchez, L., Aubin-Horth, N. and Landry, C. R. 2012, What is needed for next-generation ecological and evolutionary geno- mics? Trends Ecol. Evol., 27, 673–78. 4.7. Data Availability 10. Van Straalen, N. M. and Roelofs, D. 2012, An Introduction to Ecological Raw sequence reads were uploaded to the NCBI Sequence Read Genomics. Oxford University Press. Archive (SRA: SRP107991: SRR5601334-SRR560133469. 11. Landry, C. R. and Aubin-Horth, N. 2014, Recent advances in ecological BIOPROJECT: PRJNA383604. BIOSAMPLES: SAMN07166458- genomics: from phenotypic plasticity to convergent and adaptive evolu- SAMN0716493). TRINITY-assembled transcriptomes are available tion and speciation. Adv. Exp. Med. Biol., 781, 1–5. via FigShare. BLAST results and the list of contigs 12. Goltsman, D. S. A., Comolli, L. R., Thomas, B. C. and Banfield, J. F. corresponding to potentially novel genes are available in the 2015, Community transcriptomics reveals unexpected high microbial di- Supplementary data. versity in acidophilic biofilm communities. ISME J., 9, 1014–23. 13. DeBiasse, M. B. and Kelly, M. W. 2016, Plastic and evolved responses to global change: what can we learn from comparative transcriptomics? J. Hered., 107, 71–81. Acknowledgements 14. Eo, S. H., Doyle, J. M., Hale, M. C., Marra, N. J., Ruhl, J. D. and This project was supported by the National Institute of General Medical DeWoody, J. A. 2012, Comparative transcriptomics and gene expression Sciences (8P20GM103451-12), New Mexico IDeA Networks of Biomedical in larval tiger salamander (Ambystoma tigrinum) gill and lung tissues as Research Excellence (NMINBRE_A2_Jan_2013), and the Center for revealed by pyrosequencing. Gene, 492, 329–38. Evolutionary and Theoretical Immunology. Samples were collected under 15. Whitehead, A., Triant, D., Champlin, D. and Nacci, D. 2010, New Mexico Department of Game and Fish permit #3015. This research was Comparative transcriptomics implicates mechanisms of evolved pollution approved by Institutional Animal Care and Use Committee Protocol #10- tolerance in a killifish population. Mol. Ecol., 19, 5186–203. 100468-MCC and #10-100492-MCC. We thank E. Loker, R. Miller, J. 16. Cohen, D., Bogeat-Triboulot, M.-B., Tisserant, E., et al. 2010, Kavka, and G. Rosenberg for research and technical support. Thanks to Z. Comparative transcriptomics of drought responses in Populus: a Ren and L. Hao for assistance with the Database of Essential Genes. Fish im- meta-analysis of genome-wide expression profiling in mature leaves and ages were provided T. Kennedy (red shiner, flathead chub) and C. Thomas root apices across two genotypes. BMC Genomics, 11,1. (common carp). This research benefitted from insight and technical assistance 17. Des Marais, D. L. and Rausher, M. D. 2008, Escape from adaptive con- provided by F. Schilkey, N. Devitt, P. Mena, T. Ramaraj, I. Lindquist, A. flict after duplication in an anthocyanin pathway gene. Nature, 454, Snyder, M. Osborne, and C. Krabbenhoft. We thank the anonymous re- 762–65. viewers for helpful comments on the manuscript. The authors have no compet- 18. Reid, N. M., Proestou, D. A., Clark, B. W., et al. 2016, The genomic land- ing interests. scape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science, 354, 1305–8. 19. Thompson, A. W. and Ortı´, G. 2016, Annual killifish transcriptomics and candidate genes for metazoan diapause. Mol. Biol. Evol., 33, 2391–95. Accession numbers 20. Dion-Coˆte ´ , A.-M., Renaut, S., Normandeau, E. and Bernatchez, L. 2014, NCBI Genbank SRA: SRP107991: SRR5601334-SRR560133469 RNA-seq reveals transcriptomic shock involving transposable elements reactivation in hybrids of young lake whitefish species. Mol. Biol. Evol., 31, 1188–99. 21. Kavembe, G. D., Franchini, P., Irisarri, I., Machado-Schiaffino, G. and Conflict of interest Meyer, A. 2015, Genomics of adaptation to multiple concurrent stresses: None declared. insights from comparative transcriptomics of a Cichlid fish from one of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 22 Comparative transcriptomics of cyprinids earth’s most extreme environments, the Hypersaline Soda Lake Magadi in 47. Ankley, G. T. and Villeneuve, D. L. 2006, The fathead minnow in aquatic Kenya, East Africa. J. Mol. Evol., 81, 90–109. toxicology: past, present and future. Aquatic Toxicol., 78, 91–102. 22. Kelley, J. L., Passow, C. N., Plath, M., Arias Rodriguez, L., Yee, M. C. 48. Klaper, R., Carter, B. J., Richter, C., Drevnick, P., Sandheinrich, M. and and Tobler, M. 2012, Genomic resources for a model in adaptation and Tillitt, D. 2008, Use of a 15 k gene microarray to determine gene expres- speciation research: characterization of the Poecilia mexicana transcrip- sion changes in response to acute and chronic methylmercury exposure in tome. BMC Genomics, 13, 652. the fathead minnow Pimephales promelas Rafinesque. J. Fish Biol., 72, 23. Deyholos, M. K. 2010, Making the most of drought and salinity tran- 2207–80. scriptomics. Plant Cell Environ., 33, 648–54. 49. Burns, F. R., Cogburn, A. L., Ankley, G. T., et al. 2016, Sequencing and 24. Ramachandran, V. K., East, A. K., Karunakaran, R., Downie, J. A. and de novo draft assemblies of a fathead minnow (Pimephales promelas) Poole, P. S. 2011, Adaptation of Rhizobium leguminosarum to pea, alfalfa reference genome. Environ. Toxicol. Chem., 35, 212–217. and sugar beet rhizospheres investigated by comparative transcriptomics. 50. FAO 2014, FISHSTAT: Global Aquaculture Production. FAO, Rome, Genome Biol., 12,1. Italy. 25. Cheviron, Z. A., Connaty, A. D., McClelland, G. B. and Storz, J. F. 2014, 51. Ji, P., Liu, G., Xu, J., et al. 2012, Characterization of common carp tran- Functional genomics of adaptation to hypoxic cold-stress in high-altitude scriptome: de novo sequencing, assembly, annotation and comparative ge- deer mice: transcriptomic plasticity and thermogenic performance nomics. PloS One, 7, 1–9. Evolution, 68, 48–62. 52. Li, G., Zhao, Y., Liu, Z., et al. 2015, De novo assembly and characteriza- 26. Webster, T. M. U. and Santos, E. M. 2015, Global transcriptomic profil- tion of the spleen transcriptome of common carp (Cyprinus carpio) using ing demonstrates induction of oxidative stress and of compensatory cellu- illumina paired-end sequencing. Fish Shellfish Immunol. lar stress responses in brown trout exposed to glyphosate and Roundup. 53. Dekay, J. E. 1842, Zoology of New-York, or, the New-York fauna; com- BMC Genomics, 16,1. prising detailed descriptions of all the animals hitherto observed within 27. Nelson, J. S. 2006, Fishes of the World. John Wiley & Sons: Hoboken, NJ. the state of New-York, with brief notices of those occasionally found near 28. Berra, T. M. 2001, Freshwater fish distribution. Academic Press: San its borders, and accompanied by appropriate illustrations. Part IV. Fishes. Diego, CA. W. & A. White & J. Visscher: Albany, NY. 29. Winfield, I. and Townsend, C. 1991, The role of cyprinids in ecosystems. 54. McDonald, M. 1893, Report of the commissioner of fish and fisheries for In: Winfield, I. and Nelson, J. (eds), Cyprinid Fishes, Springer, 552–71. the fiscal years 1889-90 and 1890-91. Part XVII. U.S. Commission of 30. Sterner, R. W. and George, N. B. 2000, Carbon, nitrogen, and phospho- Fish and Fisheries. Washington, DC, 1–96. rous stoichiometry of cyprinid fishes. Ecology, 81, 127–40. 55. Walters, D. M., Blum, M. J., Rashleigh, B., Freeman, B. J., Porter, B. A. 31. Power, M. E., Matthews, W. J. and Stewart, A. J. 1985, Grazing min- and Burkhead, N. M. 2008, Red shiner invasion and hybridization with nows, piscivorous bass, and stream algae: dynamics of a strong interac- blacktail shiner in the upper Coosa River, USA. Biol. Invas., 10, tion. Ecology, 66, 1448–56. 1229–1242. 32. Amores, A., Force, A., Yan, Y.-L., et al. 1998, Zebrafish hox clusters and 56. Marsh-Matthews, E. and Matthews, W. J. 2000, Spatial variation in rela- vertebrate genome evolution. Science, 282, 1711–14. tive abundance of a widespread, numerically dominant fish species and its 33. Postlethwait, J. H., Yan, Y.-L., Gates, M. A., et al. 1998, Vertebrate ge- effect on fish assemblage structure. Oecologia, 125, 283–292. nome evolution and the zebrafish gene map. Nature Genet., 18, 345–49. 57. Hoagstrom, C. W., DeWitte, A. C., Gosch, N. J. and Berry, C. R., Jr 34. Meyer, A. and Van de Peer, Y. 2005, From 2R to 3R: evidence for a 2007, Historical fish assemblage flux in the Cheyenne River below fish-specific genome duplication (FSGD). BioEssays, 27, 937–45. Angostura Dam. J. Freshwater Ecol., 22, 219–29. 35. David, L., Blum, S., Feldman, M. W., Lavi, U. and Hillel, J. 2003, Recent 58. Amsterdam, A., Nissen, R. M., Sun, Z., Swindell, E. C., Farrington, S. duplication of the common carp (Cyprinus carpio L.) genome as revealed and Hopkins, N. 2004, Identification of 315 genes essential for early by analyses of microsatellite loci. Mol. Biol. Evol., 20, 1425–34. zebrafish development. Proc. Natl. Acad. Sci USA, 101, 12792–97. 36. Machordom, A. and Doadrio, I. 2001, Evolutionary history and specia- 59. Chen, W.-H., Minguez, P., Lercher, M. J. and Bork, P. 2012, OGEE: an tion modes in the cyprinid genus Barbus. Proc. Roy. Soc. Lond B Biol. online gene essentiality database. Nucleic Acids Res., 40, D901–D906. Sci., 268, 1297–306. 60. Flicek, P., Amode, M. R., Barrell, D., et al. 2014, Ensembl 2014. Nucleic 37. Ohno, S., Muramoto, J., Christian, L. and Atkin, N. B. 1967, Acids Res., 42, D749–755. Diploid-tetraploid relationship among old-world members of the fish fam- 61. Hahn, M. W. and Kern, A. D. 2005, Comparative genomics of centrality ily Cyprinidae. Chromosoma, 23, 1–9. and essentiality in three eukaryotic protein-interaction networks. Mol. 38. Yang, L., Sado, T., Hirt, M. V., et al. 2015, Phylogeny and polyploidy: Biol. Evol., 22, 803–6. Resolving the classification of cyprinine fishes (Teleostei: Cypriniformes). 62. Innan, H. and Kondrashov, F. 2010, The evolution of gene duplications: clas- Mol. Phylogenet. Evol., 85, 97–116. sifying and distinguishing between models. Nat. Rev. Genet., 11, 97–108. 39. Wang, J.-T., Li, J.-T., Zhang, X.-F. and Sun, X.-W. 2012, Transcriptome 63. Chain, F. J., Dushoff, J. and Evans, B. J. 2011, The odds of duplicate gene analysis reveals the time of the fourth round of genome duplication in persistence after polyploidization. BMC Genomics, 12,1. common carp (Cyprinus carpio). BMC Genomics, 13, 96. 64. Bolger, A. M., Lohse, M. and Usadel, B. 2014, Trimmomatic: a flexible 40. Xu, P., Zhang, X., Wang, X., et al. 2014, Genome sequence and genetic trimmer for Illumina sequence data. Bioinformatics, btu170. diversity of the common carp, Cyprinus carpio. Nature Genet., 46, 65. Grabherr, M. G., Haas, B. J., Yassour, M., et al. 2011, Full-length tran- 1212–1219. scriptome assembly from RNA-Seq data without a reference genome. Nat. 41. Woods, T. D. and Buth, D. G. 1984, High level of gene silencing in the tet- Biotechnol., 29, 644–52. raploid goldfish. Biochem. Syst. Ecol., 12, 415–21. 66. Haas, B. J., Papanicolaou, A., Yassour, M., et al. 2013, De novo tran- 42. Ferris, S. D. and Whitt, G. S. 1977, The evolution of duplicate gene ex- script sequence reconstruction from RNA-seq using the Trinity platform pression in the carp (Cyprinus carpio). Experientia, 33, 1299–1301. for reference generation and analysis. Nat. Protocols, 8, 1494–512. 43. Briggs, J. P. 2002, The zebrafish: a new model organism for integrative 67. Pruesse, E., Quast, C., Knittel, K., et al. 2007, SILVA: a comprehensive physiology. Am. J. Physiol. Regul. Integr. Comp. Physiol., 282, R3–R9. online resource for quality checked and aligned ribosomal RNA sequence 44. Dooley, K. and Zon, L. I. 2000, Zebrafish: a model system for the study data compatible with ARB. Nucleic Acids Res., 35, 7188–96. of human disease. Curr. Opinion Genet. Dev., 10, 252–56. 68. Conesa, A., Gotz, S., Garcia-Gomez, J. M., Terol, J., Talon, M. and 45. Lieschke, G. J. and Currie, P. D. 2007, Animal models of human disease: Robles, M. 2005, Blast2GO: a universal tool for annotation, visualization zebrafish swim into view. Nat. Rev. Genet., 8, 353–367. and analysis in functional genomics research. Bioinformatics, 21, 46. Howe, K., Clark, M. D., Torroja, C. F., et al. 2013, The zebrafish refer- 3674–76. ence genome sequence and its relationship to the human genome. Nature, 69. Langmead, B. and Salzberg, S. L. 2012, Fast gapped-read alignment with 496, 498–503. Bowtie 2. Nat. Methods, 9, 357–59. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 23 70. Li, B. and Dewey, C. N. 2011, RSEM: accurate transcript quantification 85. Small, C., Bassham, S., Catchen, J., et al. 2016, The genome of the Gulf from RNA-Seq data with or without a reference genome. BMC pipefish enables understanding of evolutionary innovations. Genome Bioinform., 12, 323. Biol., 17, 258. 71. Langfelder, P. and Horvath, S. 2008, WGCNA: an R package for 86. Di Bella, J. M., Bao, Y., Gloor, G. B., Burton, J. P. and Reid, G. 2013, weighted correlation network analysis. BMC Bioinform., 9, 559. High throughput sequencing methods and analysis for microbiome re- 72. Mi, H., Muruganujan, A. and Thomas, P. D. 2013, PANTHER in 2013: search. J. Microbiol. Methods, 95, 401–14. modeling the evolution of gene function, and other gene attributes, in the 87. Coco, J. R., Flemington, E. K. and Taylor, C. M. 2011, PARSES: a pipe- context of phylogenetic trees. Nucleic Acids Res., 41, D377–386. line for analysis of RNA-Seq exogenous sequences. BICoB, 196–200. 73. Supek, F., Bosnjak, M., Skunca, N. and Smuc, T. 2011, REVIGO summa- 88. Dickerson, J. E., Zhu, A., Robertson, D. L. and Hentges, K. E. 2011, rizes and visualizes long lists of gene ontology terms. PLoS One, 6, Defining the role of essential genes in human disease. PLoS One, 6, e21800. e27368. 74. van Dijk, E. L., Auger, H., Jaszczyszyn, Y. and Thermes, C. 2014, Ten 89. Luo, H., Lin, Y., Gao, F., Zhang, C.-T. and Zhang, R. 2014, DEG 10, an years of next-generation sequencing technology. Trends Genet., 30, update of the database of essential genes that includes both 418–26. protein-coding genes and noncoding genomic elements. Nucleic Acids 75. Bernatchez, L. 2016, On the maintenance of genetic variation and adapta- Res., 42, D574–D580. tion to environmental change: considerations from population genomics 90. Plough, L., Shin, G. and Hedgecock, D. 2016, Genetic inviability is a ma- in fishes. J. Fish Biol., 89, 2519–56. jor driver of type-III survivorship in experimental families of a highly fe- 76. Martin, J. A. and Wang, Z. 2011, Next-generation transcriptome assem- cund marine bivalve. Mol. Ecol. bly. Nat. Rev. Genet., 12, 671–82. 91. McCurley, A. T. and Callard, G. V. 2008, Characterization of housekeep- 77. Li, B., Fillmore, N., Bai, Y., et al. 2014, Evaluation of de novo transcrip- ing genes in zebrafish: male-female differences and effects of tissue type, tome assemblies from RNA-Seq data. Genome Biol., 15,1. developmental stage and chemical treatment. BMC Mol. Biol., 9,1. 78. O’Neil, S. T. and Emrich, S. J. 2013, Assessing De Novo transcriptome as- 92. Saitoh, K., Sado, T., Doosey, M. H., et al. 2011, Evidence from mitochon- sembly metrics for consistency and utility. BMC Genomics, 14, 465. drial genomics supports the lower Mesozoic of South Asia as the time and 79. Lopez-Maestre, H., Brinza, L., Marchet, C., et al. 2016, SNP calling from place of basal divergence of cypriniform fishes (Actinopterygii: RNA-seq data without a reference genome: identification, quantification, Ostariophysi). Zool. J. Linnean Soc., 161, 633–62. differential analysis and impact on the protein sequence. Nucleic Acids 93. Postlethwait, J. H. 2007, The zebrafish genome in context: ohnologs gone Res., 44, e148–e148. missing. J. Exp. Zool. B Mol. Dev. Evol., 308, 563–77. 80. DeWoody, J. A., Abts, K. C., Fahey, A. L., et al. 2013, Of contigs and 94. Ferris, S. D. and Whitt, G. S. 1977, Loss of duplicate gene expression after quagmires: next-generation sequencing pitfalls associated with transcrip- polyploidisation. Nature, 265, 258–60. tomic studies. Mol. Ecol. Resour., 13, 551–58. 95. Larhammar, D. and Risinger, C. 1994, Molecular genetic aspects of tetra- 81. Wolf, J. B. 2013, Principles of transcriptome analysis and gene expression ploidy in the common carp Cyprinus carpio. Mol. Phylogenet. Evol., 3, quantification: an RNA-seq tutorial. Mol. Ecol. Resour., 13, 559–72. 59–68. 82. Vijay, N., Poelstra, J. W., Ku ¨ nstner, A. and Wolf, J. B. 2013, Challenges 96. Ohno, S. 1970, Evolution by Gene Duplication. Springer: New York, and strategies in transcriptome assembly and differential gene expression NY. quantification. A comprehensive in silico assessment of RNA-seq experi- 97. Berthelot, C., Brunet, F., Chalopin, D., et al. 2014, The rainbow trout ge- ments. Mol. Ecol., 22, 620–34. nome provides novel insights into evolution after whole-genome duplica- 83. Aravind, L., Watanabe, H., Lipman, D. J. and Koonin, E. V. 2000, tion in vertebrates. Nat. Commun., 5, 3657. Lineage-specific loss and divergence of functionally linked genes in eu- 98. Timusk, E. R., Ferguson, M. M., Moghadam, H. K., Norman, J. D., karyotes. Proc. Natl. Acad. Sci., 97, 11319–24. Wilson, C. C. and Danzmann, R. G. 2011, Genome evolution in the fish 84. Stein, C., Caccamo, M., Laird, G. and Leptin, M. 2007, Conservation and family salmonidae: generation of a brook charr genetic map and compari- divergence of gene families encoding components of innate immune re- sons among charrs (Arctic charr and brook charr) with rainbow trout. sponse systems in zebrafish. Genome Biol., 8,1. BMC Genet., 12, 1–15. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Journal

DNA ResearchOxford University Press

Published: Feb 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 12 million articles from more than
10,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Unlimited reading

Read as many articles as you need. Full articles with original layout, charts and figures. Read online, from anywhere.

Stay up to date

Keep up with your field with Personalized Recommendations and Follow Journals to get automatic updates.

Organize your research

It’s easy to organize your research with our built-in tools.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

Monthly Plan

  • Read unlimited articles
  • Personalized recommendations
  • No expiration
  • Print 20 pages per month
  • 20% off on PDF purchases
  • Organize your research
  • Get updates on your journals and topic searches

$49/month

Start Free Trial

14-day Free Trial

Best Deal — 39% off

Annual Plan

  • All the features of the Professional Plan, but for 39% off!
  • Billed annually
  • No expiration
  • For the normal price of 10 articles elsewhere, you get one full year of unlimited access to articles.

$588

$360/year

billed annually
Start Free Trial

14-day Free Trial