Comparative transcriptomics can now be conducted on organisms in natural settings, which has greatly enhanced understanding of genome–environment interactions. Here, we demon- strate the utility and potential pitfalls of comparative transcriptomics of wild organisms, with an example from three cyprinid fish species (Teleostei:Cypriniformes). We present extensively fil- tered and annotated transcriptome assemblies that provide a valuable resource for studies of genome evolution (e.g. polyploidy), ecological and morphological diversification, speciation, and shared and unique responses to environmental variation in cyprinid fishes. Our results and analyses address the following points: (i) ‘essential developmental genes’ are shown to be ubiquitously expressed in a diverse suite of tissues across later ontogenetic stages (i.e. juve- niles and adults), making these genes are useful for assessing the quality of transcriptome as- semblies, (ii) the influence of microbiomes and other exogenous DNA, (iii) potentially novel, species-specific genes, and (iv) genomic rearrangements (e.g. whole genome duplication). The data we present provide a resource for future comparative work in cypriniform fishes and other taxa across a variety of sub-disciplines, including stress response, morphological diversifica- tion, community ecology, ecotoxicology, and climate change. Key words: RNA-seq, essential genes, Cyprinus carpio, carp, gene silencing 1. Introduction 9–11 and giving rise to the ﬁeld of ecological genomics. Reduced sequenc- High-throughput sequencing has dramatically accelerated the pace of ing costs have made it feasible to study transcriptomes of co-occurring 1,2 genomic research. While once restricted to model species in labora- species in a community ecology context (i.e. ‘community transcriptom- tory settings, genomic methods are being widely applied to non-model ics’), as well as comparative studies of transcriptome evolution across 3–8 13–16 species in nature, rapidly illuminating the black box of the genome diverse clades (i.e. ‘comparative transcriptomics’). While genomic V C The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact email@example.com 11 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 12 Comparative transcriptomics of cyprinids data from model species can be informative for the biology of related species with a comprehensively-annotated genome. Zebraﬁsh is an organisms, not all species are the same in terms of their ecology, genet- important model organism in developmental biology and disease re- 43–45 ics, and morphology. For example, research on the zebraﬁsh (Danio search, due to its semitransparent embryos and ease of laboratory rerio, family Cyprinidae) can be relevant for closely related species, but culture, as well as its comprehensively annotated genome. Fathead cannot explain the tremendous ecological and morphological diversity minnow is widely used as an indicator species in ecotoxicology studies 47,48 in this clade, as studies of single species are insufﬁcient for understand- for which microarrays have been developed and a draft genome ing dynamic interactions among species and their respective genomes in sequence is now available. Common carp is an important food ﬁsh, a macroevolutionary context. In order to understand the causes and especially in Asia, and is produced extensively in aquaculture. The 39,51,52 consequences of those interactions, as well as the origin of ecological carp transcriptome has been studied elsewhere and recently a 17 40 novelty (e.g. new genes ), we must examine genomes across species draft genome sequence was published. that reﬂect that diversity. Recent studies across a diverse suite of teleost We used the extensive zebraﬁsh genomic resources available to an- ﬁsh lineages have focused on functionally-important genetic variation notate transcriptomes of three evolutionarily related, but non-model 18–22 in non-model species in nature. species that co-occur in parts of the west-central United States: Transcriptomics of species in the wild has enormous potential to Cyprinella lutrensis (red shiner), Platygobio gracilis (ﬂathead chub) and advance our understanding of mechanisms underlying molecular ad- Cyprinus carpio (common carp). These species were selected to reﬂect aptation, evolutionary diversiﬁcation, ecotoxicology, and commu- phylogenetic breadth, but also because their distributions overlap and 4,23–26 nity ecology. In this context, several important questions arise. occupy identical dryland river habitats (i.e. the Rio Grande, New For example, What are the proximate and ultimate mechanisms un- Mexico), where they are exposed to similar biotic and abiotic condi- derlying phylogenetic, ecological, and morphological divergence? tions. Cyprinus carpio is native to Asia and Europe, but was introduced How have ancestral genomes been molded by divergent natural se- into North America, perhaps as early as 1831, and enthusiastically lection and other evolutionary forces into myriad forms that exist stocked throughout the US thereafter as a food ﬁsh, including New today? How does genomic architecture constrain or promote diversi- Mexico as early as 1889. Cyprinella lutrensis and Platygobio gracilis ﬁcation? How important are genome duplication events in adaptive are both native to central and western North America, from the radiations? What role do genomes play in underlying the ecological Mississippi River basin to the Rio Grande in New Mexico. Both C. dynamics of community assembly (e.g. competition, abundance, spa- lutrensis and C. carpio are highly tolerant of a wide range of environ- tial and temporal dynamics, physiological constraint, etc.)? A neces- mental conditions and are highly invasive in areas outside of their natu- 55,56 sary ﬁrst step in addressing these questions is the generation of ral range, whereas Platygobio gracilis is sensitive to environmental databases reﬂecting the genomic or transcriptomic variation among disturbance and imperiled or declining in several parts of its range. species, which we provide in the current study. Transcriptomes of Cyprinella lutrensis and Platygobio gracilis Here we present transcriptomic resources for three members of have not been published to our knowledge, whereas genomic and 39,40,51 the freshwater ﬁsh family Cyprinidae (Teleostei: Cypriniformes), one transcriptomic data are available for Cyprinus carpio. of the most speciose vertebrate clades, with over 2,000 species. In Cyprinella lutrensis and Platygobio gracilis are diploid (2n¼ 50), addition to remarkable species diversity, the clade includes extensive while Cyprinus carpio is allotetraploid 2n¼ 100, with some dupli- ecological, genetic, and morphological diversity. Cyprinid ﬁshes cated genes silenced after a lineage-speciﬁc whole genome duplica- (minnows and carp) comprise an important component of freshwater tion (i.e. Cc4R). Our aims in this study were to: (i) succinctly ﬁsh communities throughout North America, Asia, Europe and summarize and compare genes and functional annotation informa- Africa. They are often the dominant ﬁsh taxa in numerical abun- tion obtained from various databases; (ii) test whether Cyprinus car- dance and biomass and play an important functional role in aquatic pio expresses additional copies of particular genes compared to the 29–31 ecosystems. two diploid species (Cyprinella lutrensis and Platygobio gracilis); (iii) The ecological and taxonomic diversity of cyprinids is particularly identify potentially novel genes present in the three cyprinids that interesting in light of the history of genome evolution in this clade. may underlie their unique ecological and morphological novelty and Cyprinids were part of the radiation that occurred after the teleost- (iv) to assess evolutionary conservation of essential genes for 32,33 speciﬁc genome duplication event, known as the ‘3R hypothesis’, development. that preceded and perhaps facilitated the diversiﬁcation of teleost In zebraﬁsh, 307 genes are known to be essential for development. ﬁshes. In addition, several cyprinid lineages have independently un- Knockout mutations in these genes are embryonic lethal according to 35–38 58 dergone additional rounds of genome duplications. For exam- experiments by Amsterdam et al. with subsequent revisions by 59 60 ple, the common carp (Cyprinus carpio) lineage had a fourth round Chen et al. and updates to the ENSEMBL database. These genes 39 40 of genome duplication approximately 5.6–11.3 or 8.2 million are highly conserved across extremely deep phylogenetic splits (e.g. years ago (Ma), which we refer to as ‘Cc4R’. Pairs of genes arising yeast, ﬂy, zebraﬁsh, and human) due to their essential roles in devel- from whole genome duplication, referred to as ‘Ohnologs’, were the- opment. Despite their importance, essential genes have not been oretically present for all genes immediately following the Cc4R ge- studied in the context of comparative molecular ecology or ecologi- nome duplication. Varying levels of subsequent gene-silencing and cal genomics of co-occurring species. Using transcriptome data pre- ‘re-diploidization’ have since occurred in polyploid lineages making sented in this study, we assessed the evolutionary conservation of the 41,42 cyprinids ideal for comparative studies of genome evolution. 307 zebraﬁsh essential genes across four cyprinid lineages. We pre- Despite the ecological importance of cyprinids in freshwater sys- dicted that these genes would be highly conserved across all species, tems worldwide and dynamic lineage-speciﬁc patterns of genomic ex- consistent with their critical functional roles, as compared to non- pansions and contractions, most species have little or no genomic essential genes. If this is the case, then differences among species resources available for investigating their molecular ecology or genome should be found in non-essential genes, such as lineage- or species- evolution. Three notable exceptions are zebraﬁsh (Danio rerio), fat- speciﬁc genes. We also tested whether both copies of duplicated head minnow (Pimephales promelas), and common carp (Cyprinus genes in C. carpio (i.e. Cc4R Ohnologs) were retained and expressed 43–45 38,42 carpio). The family includes zebraﬁsh (Danio rerio), a model in duplicate or whether one copy was evolutionarily lost. One Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 13 mechanism for the loss of Ohnologs is ‘pseudogenization’, wherein a gene accumulates one or more internal stop codons that prevent formation of a functional protein product and thus becomes a pseu- dogene. If having redundant copies of essential genes were impor- tant for survival (e.g. due to loss-of-function mutations in one copy), then evolutionary retention of duplicates would likely be favored in C. carpio. Conversely, if regulation of proper gene expression levels were important in the context of functional pathways, then dupli- cated essential genes would likely be silenced at roughly the same rate as non-essential genes (although regulatory changes could also ﬁne-tune expression patterns). We tested these hypotheses using ex- pression data for the three cyprinid transcriptomes as compared to zebraﬁsh. These sequences will provide resources for more detailed studies of the evolution and functional constraint of these critical genes, particularly in the context of genome expansions and reductions. 2. Materials and methods Fish (n¼ 3 per species) were collected with a seine on 6 July 2012 from a ﬁeld site on the Rio Grande, approximately 40 km south of Socorro, New Mexico (33.690556 N, 106.993042 W). Whole ﬁsh samples (juveniles or non-spawning adults) were immediately frozen in liquid nitrogen and transported to the laboratory. Skin, gill, gut, and kidney tissues were dissected and removed from frozen ﬁsh (outer layers were only slightly thawed by the time dissection was completed;<5 minutes total time), placed in TRIzol (Invitrogen), and Figure 1. Flow diagram illustrating our bioinformatic pipeline. Analyses con- sist of three main steps: assembly, annotation, and analysis of gene silenc- mechanically homogenized. Total RNA was isolated using Purelink ing patterns. Databases queried and software packages used are listed. RNA Mini kits (Ambion) following manufacturer’s protocol, along with DNase treatment to reduce genomic DNA contamination. Puriﬁed total RNA was sent to the National Center for Genome BLAST hits were identiﬁed based on the following parameter set- Resources (Santa Fe, New Mexico, USA) for quantiﬁcation, quality tings: E-value< 0.0001; gap open penalty¼ 11; gap extend¼ 1; assessment, cDNA library preparation and sequencing. RNA integ- wordsize¼ 3. After extensive testing, this parameter combination rity and purity was assessed with a Bioanalyzer 2100 instrument was found to give the optimal balance between ﬁnding matches for (Agilent Technologies). Thirty-six Illumina libraries were constructed large numbers of contigs, while minimizing spurious hits. For most (3 species 4 tissues 3 biological replicates) from the total RNA genes a 1–1 match was expected between zebraﬁsh versus Platygobio samples using Illumina TruSeq DNA prep kits according to the man- gracilis or Cyprinella lutrensis, whereas zebraﬁsh and Cyprinus car- ufacturer’s protocol. Libraries were barcoded using standard six pio should have either 1-2 or 1-1 due to partial diploidy in carp. We base pair Illumina oligonucleotides, and six libraries were pooled for used this expectation in determining the threshold E-value (i.e. each lane of Illumina HiSeq 2000 (V3 chemistry) for a total of six E< 0.0001 in this study) to use. In practice, more stringent E-value lanes of 2 100 bp paired-end sequencing. thresholds (e.g. E< 1e6) had very little effect on the number of sig- niﬁcant BLAST hits. 2.1. Bioinformatics Contigs with no signiﬁcant BLAST hits against the zebraﬁsh tran- We used the bioinformatics pipeline outlined in Figure 1 for analyz- scriptome were subjected to a series of stepwise BLASTn searches until ing transcriptomic data in three main steps: de novo assembly, gene signiﬁcant hits were found (or not) in order to identify the possible annotation, and analysis of expression of duplicated genes. Adapters sources of those sequences (e.g. microbiome ) or to identify novel and barcode sequences were removed from raw reads, and reads genes not present in the zebraﬁsh genome. First, remaining contigs were trimmed using TRIMMOMATIC with parameter settings as fol- lacking signiﬁcant hits against the zebraﬁsh transcriptome were que- lows: leading quality¼ 5; trailing quality¼ 5; minimum trimmed ried against the rRNA silva database (SSU Ref 119 NR99 and LSU read length¼ 36). Reads were normalized in silico to maximum read Parc 119), which contains bacterial and eukaryotic rRNA sequences. coverage of 50. Clipped and trimmed reads were assembled, de Contigs with still no signiﬁcant BLAST hits were then queried against novo, for each species separately using TRINITY version 2014-04- a database containing all nine additional teleost ﬁsh transcriptomes 65,66 13, with minimum contig length set to 200 bp. Libraries were (Amazon molly, Poecilia formosa; caveﬁsh, Astyanax mexicanus;cod, pooled within a species for de novo assembly, to maximize the num- Gadus morhua;fugu, Takifugu rubripes;medaka, Oryzias latipes;pla- ber of genes included. TRINITY assembles reads into contigs (‘TRINITY tyﬁsh, Xiphophorus maculates; stickleback, Gasterosteus aculeatus; transcripts’), places similar transcripts in groups loosely referred to tetraodon, Tetraodon nigroviridis;tilapia, Oreochromis niloticus) as ‘genes’, and groups similar ‘genes’ into gene clusters. from Ensembl 78. Contigs with no BLAST hits at this point were then Putative protein coding genes were also identiﬁed by BLASTx BLASTed against the zebraﬁsh genome (Zv9) using the ‘Top Level’ se- searches of contigs against zebraﬁsh (Danio rerio) peptide sequences quences from Ensembl to identify possible genomic DNA contamina- (database build Zv9) obtained from Ensembl 78. Signiﬁcant tion. Remaining contigs with no signiﬁcant blast hits in any of these Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 14 Comparative transcriptomics of cyprinids databases were piped to TRANSDECODER to identify open reading GO terminology was based on the zebraﬁsh database. Results of the frames (ORFs) that represent potentially novel genes. Default parame- overrepresentation analysis were visualized with REVIGO. ter settings were used with TRANSDECODER. The software generates pre- dicted peptide sequences for contigs with ORFs. Predicted peptide 2.3. Essential genes sequences for the contigs with ORFs but no BLAST hits to the afore- To test the hypothesis of evolutionary conservation of essential genes mentioned databases were queried (BLASTp; E-value< 0.001) against among cyprinid ﬁshes, we used zebraﬁsh genes present in the Online the NCBI nr database. BLAST2GO version 3.0, was used to identify Gene Essentiality Database OGEE; and identiﬁed orthologs in the top species hits for those predicted proteins with signiﬁcant hits against three transcriptomes from BLASTx searches described above. Of the nr. The remaining sequences with no hits to databases and no ORFs 58,59 307 essential genes in zebraﬁsh, one (ENSDARG00000038423) were discarded as likely non-protein coding, genomic DNA contami- has been retired from ENSEMBL and one (ENSDARG000 nation with sufﬁcient divergence from zebraﬁsh to render genomic 00045605) is an unprocessed pseudogene with no protein product. BLASTn searches ineffective. We searched for the remaining 305 genes in the three transcriptome assemblies to assess their conservation across cyprinids. 2.2. Genome duplication, diploidization and gene silencing 3. Results Trimmed sequence reads were mapped to TRINITY contigs using BOWTIE2 version 18.104.22.168 and corresponding gene expression was 3.1. Sequencing and transcriptome assemblies quantiﬁed with RSEM version 1.2.13. Because RSEM is incompat- Six lanes of Illumina sequencing produced more than 1.2 billion ible with indel, local, and discordant alignments, parameter settings paired-end reads, including 420.5-, 413.9-, and 385.3-million se- were chosen to avoid these alignments. The following RSEM param- quences in Cyprinus carpio, Cyprinella lutrensis, and Platygobio gra- eters were used: –sensitive; –dpad 0; –gbar 99999999; –mp 1,1 –np 1 cilis, respectively. De novo assembly resulted in high quality –score-min L,0,-0.1; –no-mixed; –no-discordant. Normalized expres- transcriptomes for all three species (Table 1). The C. carpio assembly sion for TRINITY genes was calculated by standardizing by total had the largest number of contigs (‘TRINITY transcripts’) and genes mapped reads across libraries and summed across alternate TRINITY (‘TRINITY genes’), while P. gracilis had the fewest. In contrast, metrics transcripts (isoforms) for each locus. Networks of co-expressed genes for contig length (N25, N50, N75, median contig length, average were identiﬁed for the three species using the WGCNA package in contig length) were all longer in P. gracilis than the other two species R. In order to assess the expression of duplicated genes in (Table 1; Fig. 2). Overall, the P. gracilis transcriptome assembly was Cyprinus carpio arising from the Cc4R duplication event, we quanti- more complete despite fewer raw sequence reads. TRANSDECODER pre- ﬁed the number of TRINITY genes present in each species relative to dicted ORFs in about half of all TRINITY contigs (not shown), with zebraﬁsh genes, as well as their expression levels. We used an arbi- the remainder comprised mainly of genomic DNA contamination trary threshold of ten sequence reads per gene per tissue, summed that was ﬁltered out of the ﬁnal dataset. The N50 of predicted ORFs across all three individuals, for a given gene to be considered ‘ex- was only slightly shorter in the three species (i.e. 1,299–1,572 bp) pressed’ in a particular tissue. Note that we are comparing whether than in zebraﬁsh (CDS N50¼ 2,037 bp), and similar to the recently or not a gene is expressed beyond a certain threshold, as opposed to published draft C. carpio genome 1,487 bp. Removal of micro- quantifying levels of expression (i.e. RNA-seq). This approach was biome and genomic DNA contamination from the ﬁnal assembly re- aimed at reducing the inﬂuence of unique reads (e.g. sequencing arti- sulted in fewer, but longer contigs (see ﬁltering of the ﬁnal dataset, facts). Most of the contigs excluded as a result were contigs repre- below), and an overall higher-quality assembly. sented only by singleton reads in one library. For C. carpio, we tested whether certain functional classes of 3.2. BLAST searches: zebrafish transcriptome genes were preferentially expressed in duplicate (i.e. the case where neither ohnolog is silenced). For this analysis, we used PANTHER to TopBLASTxhitsofTRINITY contigs against zebraﬁsh peptides included test for statistical overrepresentation of GO-slim Biological approximately 20,000 unique genes (ENSDARG) and 11,000 protein Processes, with Bonferroni correction. The test genes consisted of the families (ENSFAM) present in each of the three species (Fig. 3), suggest- list of C. carpio ohnologs expressed in duplicate, while the list of all ing similar annotation efﬁciency and transcriptome representation for C. carpio genes present in the assembly was used as the reference set. each species. However, after pooling isoforms, the number of TRINITY Table 1. De novo transcriptome assembly results. Zebrafish (Danio rerio) data is included as an example of a well-assembled and complete transcriptome based primarily on Sanger sequencing Cyprinus carpio Cyprinella lutrensis Platygobio gracilis Danio rerio Trinity ‘genes’ (¼Clusters of contigs) 309,921 255,863 180,130 30,651 Trinity ‘transcripts’ (¼Assembled contigs) 440,696 382,504 262,969 43,153 GC content 42.45 43.25 42.67 49.60 N25 (bp) 3,327 3,069 3,644 3,465 N50 1,841 1,666 1,972 2,037 N75 704 679 788 1,179 Median contig length 418 439 450 1,080 Average contig length 907 886 978 1,501 Total assembled bases 399,790,412 339,160,955 257,217,466 64,757,328 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 15 genes that signiﬁcantly matched these 20,000 zebraﬁsh genes varied nine teleost transcriptomes, and the zebraﬁsh genome databases (Table among species: 66,447 in Cyprinus carpio, 60,990 in Cyprinella lutren- 2). For contigs lacking hits against zebraﬁsh peptides, BLASTn searches sis, and 39,915 in Platygobio gracilis (Table 2,top row).Zebraﬁsh versus the rRNA silva database revealed a small number of signiﬁcant genes were well covered, with more than 15,000 unique zebraﬁsh genes hits (i.e.<400 contigs; Table 2). BLASTn searches of the remaining covered over at least 70% of their length in corresponding contigs from unmatched contigs versus the nine teleost ﬁsh transcriptomes identiﬁed each of the three cyprinids, consistent with the N50 data presented approximately 1,500–4,500 additional hits (Table 2), far fewer than above. In general, zebraﬁsh proteins were more completely covered by the evolutionarily more closely related zebraﬁsh transcriptome. P. gracilis contigs than C. carpio or C. lutrensis. For example, zebraﬁsh BLASTn searches of the remaining unidentiﬁed contigs against the genes were more than 90% covered (i.e. the alignment covers>90% of zebraﬁsh genome revealed a large number of signiﬁcant hits (>30,000 bases of a gene) by sequences in 50.3% (12,489 of 24,817 genes) of P. per species), suggesting these reads were the result of low levels of back- gracilis genes with signiﬁcant zebraﬁsh peptide hits, versus 49.9% ground genomic DNA contamination in the cDNA libraries, a common (13,453 of 26,963) for C. carpio, and 46.8% (12,538 of 26,817) in C. occurrence resulting from the hypersensitivity of Illumina sequencing. lutrensis. A large number of TRINITY contigs did not signiﬁcantly match Conservation of sequences across deep evolutionary lineages suggests (BLASTx) zebraﬁsh peptide sequences and were subsequently queried functional importance, such as regulatory regions. against several additional databases. Despite extensive BLAST searches, a large number of TRINITY con- tigs (>100,000 in each species or more than 50% of all contigs) did not have signiﬁcant hits in any of the databases. These contigs are 3.3. BLAST searches: other databases short in length (i.e. 200 bp) and have few reads mapping to them Contigs lacking signiﬁcant BLASTx hits against zebraﬁsh peptides were (e.g. single-read contigs). These could represent endogenous genomic queried (BLASTn) iteratively against rRNA silva microbiome database, DNA contamination of cDNA libraries and have sufﬁcient evolution- ary divergence from zebraﬁsh to render BLASTn searches ineffective. A large number are also expected to be non-rRNA sequences from the microbiome, which were not present in target databases. Of con- tigs with no BLAST hits in the aforementioned databases, TRANSDECODER predicted ORFs in 8,652 (Cyprinus carpio), 9,215 (Cyprinella lutrensis), and 3,011 (Platygobio gracilis) contigs Figure 2. Contig length histogram of three cyprinids in this study and zebra- fish, Danio rerio. By leveraging high throughput sequencing and bioinfor- matic filtering, we were able to generate high quality transcriptomes at a fraction of the cost and research effort used for zebrafish. As expected, de novo TRINITY assemblies resulted in proportionally fewer contigs longer than 1000 bp, as compared to those of a well-assembled transcriptome, zebrafish (Danio rerio). However, note that we only used canonical transcripts for zebrafish and not the shorter isoforms, which skews the distribution toward Figure 3. Unique genes and protein families from BLASTx searches (E-value longer transcripts for that species. threshold¼ 0.0001) against zebrafish (Danio rerio) peptide sequences. Table 2. Significant BLAST hits for TRINITY ‘genes’ versus various databases and number of ORFs present. BLAST searches were done in stepwise fashion: all TRINITY genes were queried against zebrafish peptides but only genes without zebrafish peptide hits were queried against rRNA silva, and so on until all of the databases were queried. Summary of open reading frames (ORFs) identified in TRINITY contigs with no significant BLAST hits against databases listed (‘No significant BLAST hits’). Some of the ORFs lacking similar proteins in the nr database may represent novel genes or genes with divergent sequences and function, while many are likely spurious results from the sequencing and assembly process or are from unidentified microbes Cyprinus carpio Cyprinella lutrensis Platygobio gracilis Zebraﬁsh peptides 66,447 60,990 39,915 rRNA Silva (microbiome) 140 306 87 Teleost ﬁsh transcriptomes 4,572 2,923 1,561 Zebraﬁsh genome 48,527 38,199 31,955 No signiﬁcant BLAST hits 190,235 153,445 106,612 Total contigs 309,921 255,863 180,130 Predicted ORFs present 8,652 9,215 3,011 ORFs with nr BLASTp hits 3,789 4,154 1,548 ORFs without nr BLASTp hits (i.e. potentially novel genes) 4,863 5,061 1,463 Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 16 Comparative transcriptomics of cyprinids Figure 4. Top-species BLASTp hits for predicted open reading frame (ORF) peptide sequences queried against the nr database. Query sequences only included ORFs from contigs that lacked significant BLAST hits (see Table 2). Grey bars represent fish or other chordates, while black bars represent non-chordate taxa. (Table 2). Roughly half of the predicted ORFs had signiﬁcant Tetrahymena, Parameceum). Conversely, in P. gracilis the ORFs ap- BLASTp hits against the nr protein database (3,789, 4,154, and pear to be endogenous genes with high similarity to zebraﬁsh 1,548 contigs, respectively). Conversely, there were 4,863 (C. car- (Fig. 4), i.e. a less diverse microbiome is present. Contigs with pre- pio), 5,061 (C. lutrensis), and 1,463 (P. gracilis) predicted ORFs had dicted ORFs but no BLAST hits to any of the databases possibly rep- no signiﬁcant hits against nr (Table 2). These ORFs could include resent novel or functionally divergent genes in these species that novel genes not present in zebraﬁsh or other teleost models, genes warrant further study. present in zebraﬁsh but with signiﬁcantly divergent sequences to cause BLAST searches to miss them, or could include genes from the 3.4. Filtering and the final assembly datasets microbiome that are not present in sequence databases. For ORFs with nr hits, zebraﬁsh was the top-hit species for a large After ﬁltering and removal of genomic DNA and microbiome reads, portion (Fig. 4), somewhat paradoxically given the lack of signiﬁcant the ﬁnal de novo assembly datasets contained only TRINITY contigs BLAST hits against zebraﬁsh peptide and genome sequences dis- falling into one of the following categories: (i) contigs with signiﬁcant cussed above. This appears to be due to the fact that TRANSDECODER- BLAST hits against zebraﬁsh or the nine other teleost transcriptomes; predicted ORFs exclude 5’ and 3’ untranslated regions (UTRs) which or (ii) contigs with no matches against any of the databases but with diverge more rapidly than ORFs over evolutionary time. In C. carpio predicted ORFs present, i.e. potentially novel genes. All other contigs and C. lutrensis, many of these ORFs are from a diverse microbiome were removed via bioinformatic ﬁltering. While it is possible that with many sequences sharing signiﬁcant similarity to cyclophyllid some of the ‘microbiome’ hits are actually external contamination, tapeworms (e.g. Echinococcus, Hymenolepis) and protozoans (e.g. we expect this to be a minor component given the diverse nature of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 17 Figure 5. Number of TRINITY genes (contigs) expressed in each of four tissue types, as well as all tissues pooled. Contigs only include those with significant BLASTx hits versus zebrafish peptides. these sequences in terms of top-hit organism (Fig. 4). It is also possi- 3.6. Expression of essential genes ble that some of the genes that signiﬁcantly align against zebraﬁsh Genes that are essential for embryonic development in D. rerio were are actually microbiome or contaminant reads, though these genes nearly all present in the three cyprinids: 285 (Platygobio gracilis), 301 being target species DNA is a more parsimonious conclusion. The ﬁ- (Cyprinella lutrensis), and 301 (Cyprinus carpio) genes were expressed nal datasets are signiﬁcantly smaller than the raw de novo assembly out of 305 zebraﬁsh essential genes (i.e. 93.4–97.8%). Of the 20 essen- but present much more reliable sequence information, i.e. transcrip- tial genes that we did not detect in P. gracilis, only one was also miss- tome sequences rather than microbiome or genomic DNA ing in C. lutrensis, and two were shared with C. carpio. No missing contamination. essential genes were shared between C. lutrensis and C. carpio,of the four missing in each species. Essential genes missing in one or more species were generally expressed at low levels in the other species. 3.5. Genome duplication, diploidization and gene Essential genes were nearly ubiquitously expressed across all four tis- silencing sue types (skin, gill, gut, kidney), with low levels of tissue speciﬁcity Transcriptome annotation and comparison with zebraﬁsh revealed (Fig. 8), in contrast to non-essential genes which generally exhibited that Cyprinus carpio expresses more genes than Cyprinella lutrensis higher levels of tissue speciﬁcity. A few essential genes do exhibit pat- and Platygobio gracilis, due to the Cc4R duplication (Fig. 5). terns of tissue speciﬁcity or species-speciﬁcity. For example, C. carpio Cyprinus carpio expressed about 41% more genes overall than P. expresses more essential genes in the gut than the other two species, in- gracilis and 11% more than C. lutrensis. The number of duplicate cluding genes such as wdr46 and exosc8, which are missing in both of genes expressed varied dramatically among tissue types (Fig. 5). In the other species. Normalized levels of expression were higher in C. all tissues except skin, C. carpio expressed more genes than the other carpio than P. gracilis and C. lutrensis for 165 and 204 out of 305 two species (i.e. 3–48% more). In skin, both C. lutrensis and P. graci- genes, respectively. This pattern was not due to C. carpio expressing lis expressed more genes than C. carpio (26 and 2%, respectively). more loci per zebraﬁsh gene (e.g. Ohnologs) than the other two spe- Using higher thresholds for ‘expression’ had moderate impact on the cies. Only slightly more loci (e.g. n¼ 2 contigs) were expressed per es- inferred percentage of duplicates expressed: a threshold of 100 reads sential gene in the recently duplicated C. carpio genome (Fig. 9) instead of 10 resulted in different estimates of duplicated genes ex- whereas most duplicated essential genes in C. carpio are not tran- pressed in C. carpio versus P. gracilis (18% more in C. carpio) and scribed and have either been lost evolutionarily, e.g. pseudogenes, or C. lutrensis (8% more in C. carpio), i.e. retained expression of Cc4R are expressed in other developmental stages or tissues. duplicates. The disparity in these results could be driven in part by different assembly qualities (e.g. a better assembled P. gracilis tran- scriptome). WGCNA analysis revealed broadly similar patterns of 4. Discussion blocks of co-expressed genes across species, consistent with the phy- logenetic relatedness of the three species (Fig. 6). Next-generation transcriptome sequencing has revolutionized the 4,74 Genes with retained duplicate expression (i.e. Ohnologs) in C. car- ﬁeld of molecular ecology over the past decade. One outcome is pio represented a suite of functional groups: gene ontology terms that increased appreciation for the molecular complexity underlying the 75,76 were signiﬁcantly enriched in the ‘retained duplicates’ list were diverse evolution of basic ecological traits. Here we present transcrip- (Fig. 7, top panel). One functional grouping that was a predominant tomic resources for comparative study of non-model cyprinid ﬁshes contributor in the REVIGO analysis was ‘anatomical structure morpho- in a natural ‘common-garden’ setting. Previous work, along with our genesis,’ of interest because common carp attain much larger body size bioinformatic analyses demonstrate that careful processing and ﬁlter- than the other two species (Fig. 7,bottom panel). ing is needed to assess the sources of DNA fragments, which can be Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 18 Comparative transcriptomics of cyprinids Figure 6. Results of WGCNA analysis for three cyprinid species. Dendrograms represent results of hierarchical cluster analysis of co-expression patterns of genes. Colored bars to the left and top of the heatmap show modules of co-expressed genes and pairs of genes with higher co-expression show (darker) colora- tion in the heatmap. endogenous target transcriptome sequences, genomic DNA ‘contami- 4.1. Assembly results nation’ from the study organism, or DNA from the microbiome or There are important considerations associated with conducting tran- diet items. Assessment of transcriptome quality also requires careful scriptome analysis in a non-laboratory setting and in species lacking 77,78 4,80,81 consideration. Traditional measures of assembled read lengths high-quality, well-annotated genomes. For example, it is neces- such as N50 are largely meaningless for transcriptomes without ad- sary to identify ways to maximize the quality and completeness of de 77,80,82 ditional context. We advocate combining N50 and/or histograms of novo assemblies. Our assemblies are somewhat less complete contig lengths with explicit comparisons to well-studied transcrip- than the zebraﬁsh reference, but this was expected because zebraﬁsh tomes of model organisms, when available. For example, we com- has been sequenced extensively at the genomic DNA level, empirically pared our de novo transcriptomes to zebraﬁsh, which yielded validated with RNA-seq, and reﬁned by years of manual curation. valuable insight into progress made in our target species. Finally, TRINITY assemblies resulted in proportionally fewer long contigs positive identiﬁcation of nearly all zebraﬁsh essential genes in our (e.g.> 1,000 bp) compared to zebraﬁsh. Four factors account for this transcriptomes provides additional evidence of the utility of our an- result. First, the microbiome is present in these sequences and many of notation procedures. Using the bioinformatics pipeline presented in the contigs are not endogenous, as reﬂected by top species hits in Fig. 1, we obtained high quality transcriptome data from three spe- BLAST searches (Fig. 4). Second, a small amount of genomic DNA cies of cyprinid ﬁshes with distinctly different evolutionary histories. contamination persists despite DNase treatment during library prepa- Our speciﬁc aims in this study were to sequence, annotate, and as- ration. Genomic contamination tends to be observed as short (e.g. semble the transcriptomes of co-occurring ﬁshes with the goal of de- 200 bp), shallow contigs often comprised of single-reads. Third, the de veloping resources for ongoing studies of the evolution and novo assemblies are more fragmented due to the short read technology molecular ecology of North American cyprinids. This comparative employed, with multiple contigs often representing non-overlapping transcriptome dataset offers tools to construct assays to pose and test fragments of the same gene. This effect is particularly acute in genes hypotheses related to differences in DNA sequences, functional path- with short sequence repeats (e.g. microsatellites). Finally, we only used ways, and expression patterns among organisms that are more or the canonical zebraﬁsh transcripts in this study, which excludes the less closely related (i.e. comparative approach), but that also co- shorter isoforms present in many genes and biases the zebraﬁsh distri- occur in nature and experience similar biotic and abiotic conditions, bution toward longer sequences. Transcriptomes presented here repre- including exposure to similar suites of pathogens and water quality sent an improvement (i.e. more sequences, higher coverage; longer conditions, for example. These data are also a resource for identify- relative N50) over earlier work on sequencing and assembling the 39,51 ing single nucleotide polymorphisms (SNPs) in transcribed genes, common carp transcriptome using Roche 454 sequencing, due to which could be used to explore functional or phenotypic variation the higher throughput, Illumina paired-end sequencing approach we within and among species. employed. The bioinformatic approach we presented to identify and There are several key ﬁndings in this study, including: (i) high- ﬁlter non-target sequences from the ﬁnal dataset resulted in high qual- quality transcriptome assemblies for cyprinid ﬁshes that reveal broad ity and well annotated assemblies. similarities and evolutionary conservation of genes with zebraﬁsh, but with some key differences; (ii) several potentially novel genes not identiﬁed in zebraﬁsh that are candidates for studies of ecological 4.2. Potentially novel genes and morphological novelty; (iii) diverse microbiomes that vary sub- Results of BLAST searches and ORF predictions helped us identify stantially among species, despite origin from a single collection local- candidate genes that may represent novel species- or taxon-speciﬁc ity; (iv) ubiquitous expression of essential genes for development in genes. Our interest in these genes lies in the idea that they may con- later ontogenetic stages (i.e. juveniles and adults) across a broad ar- tain some of the functional elements responsible for extensive ecolog- ray of tissue types; (v) a large number of duplicate genes expressed in ical and phylogenetic diversity present in the Cyprinidae, as in 83–85 the tetraploid, Cyprinus carpio, representing a diverse suite of bio- previous studies of lineage-speciﬁc gene family expansions. For logical processes or gene ontologies. We discuss each of these ﬁnd- example, expansion of the patristacin gene family in pipeﬁsh may be ings in greater detail below. an important driver in the evolution of male pregnancy in that Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 19 Figure 7. Top panel: Gene-ontology terms that are over- or under-represented (y-axis) in the list of genes retained as duplicates in the common carp transcrip- tome as compared to all expressed genes in common carp. Bottom panel: Summary of groups of biological processes overrepresented in the retained-dupli- cates in common carp. Box size is proportional to the number of genes with particular gene ontology terms, which may suggest a dosage effect in common carp. 85 87 lineage. Many of the potentially novel genes we identiﬁed may out of ﬁnal assemblies. Genome-scale sequence data is often lack- prove to be false positives as more ﬁsh genomes are sequenced and ing for the bacterial and metazoan microbiota on vertebrate samples, annotated; however, these candidates would be an excellent starting which complicates attempts at removal. We used an iterative and point for researchers interested in targeted searches for genes or pro- successive ﬁltering approach to address this issue (Fig. 1) that pro- teins underlying ecological novelty in cyprinids that may have arisen vides valuable information on the likely source (e.g. exogenous or en- through local gene duplications, exon shufﬂing, horizontal transfer, dogenous) of particular sequences or contigs. Transcriptome or other mechanisms. characterization studies often do not attempt to remove exogenous microbiome and genomic DNA contamination. Researchers should be cautious when using unﬁltered sequence reads, particularly when 4.3. Microbiome diversity they are compiled into massive databases that lack appropriate Another valuable aspect of transcriptome sequencing of samples metadata. taken from nature is the simultaneous generation of quantiﬁable data on the microbiome. These data are applicable to study of 4.4. Conservation of essential genes host-parasite dynamics, immune response, paired comparative popu- lation genetics or phylogeographic analysis of host and microbiota. Genes that are essential for embryonic development present interest- When generating de novo transcriptome assemblies for focal species, ing targets for studying genome evolution due to their critical func- 58,88,89 it is imperative that microbiome sequences are identiﬁed and ﬁltered tional importance. Essential genes also bear biomedical Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 20 Comparative transcriptomics of cyprinids Figure 8. Expression of essential developmental genes by tissue type in three cyprinid fishes compared to 305 essential genes expressed across all tissues in zebrafish (Danio rerio). or juvenile developmental stages. We propose that the number of es- sential genes expressed could be used as a metric to complement other measures of assembly quality and completeness, in addition to comparing transcript length histograms to closely related model species (see Fig. 2). While beyond the scope of the present study, fu- ture work should compare the utility of these metrics for assessing transcriptome assemblies. 4.5. Tetraploidy and expression of duplicated genes Our results indicated that a large number of duplicate genes are ex- pressed in Cyprinus carpio, representing a diverse suite of biological 39,40,51 processes or gene ontologies, similar to previous studies. For genes where both Ohnologs were expressed in C. carpio, there was en- richment in several different functional pathways, but many genes Figure 9. Number of loci (TRINITY genes) expressed per zebrafish essential were associated with ‘anatomical structure morphogenesis’ in particu- gene. Only slightly more (e.g. n¼ 2) were expressed per essential gene in the recently duplicated genome of Cyprinus carpio. Most duplicated essen- lar (Fig. 7). Functional duplicates at these genes correlate with large tial genes in C. carpio are not transcribed and have either been lost evolu- body size and rapid growth in C. carpio as compared to C. lutrensis tionarily, e.g. pseudogenes, or are expressed in other developmental stages and P. gracilis and a potential dosage effect. Using a different set of tis- or tissues. sues, Wang et al. identiﬁed enrichment of retained expression of du- plicates in gene ontology pathways involved in metabolic and immune signiﬁcance as many have been implicated in human diseases and de- functions using 454 transcriptome sequencing and EST data mining. 88 40 velopmental abnormalities. Our data demonstrate that essential The availability of a (draft) genome for common carp will eventually 58,59 ‘developmental’ genes previously identiﬁed in larval zebraﬁsh help identify Ohnologs that are silenced because pseudogenes of si- are almost all ubiquitously expressed in juvenile or adults across a lenced genes may still be present in genomic DNA sequences; cur- broad range of tissues, suggesting their importance is not simply lim- rently, the incomplete annotation of that genome precludes analysis of ited to early ontogenetic stages or particular tissues. Previous work gene silencing at the genomic DNA level. Ultimately, knowledge of has shown that many of these genes are critical for basic cell func- which genes are retained and expressed in duplicate in tetraploids as tion, which may underlie their ubiquitous expression. Based on the compared to related diploid species can provide insight into the role critical functions they perform, these genes are candidates for future that whole genome duplication plays in the molecular ecology and 93,94 studies looking at the cause of high rates of genetic inviability and phylogenetic diversiﬁcation of organisms. Note that our analyses mortality in cyprinids and other organisms with type-III life histo- and those of Wang et al. are based only on expressed genes in partic- ries. From a practical standpoint, however, we suggest that the ular tissues at a single time point, rather than genomic DNA sequences ubiquitous expression of these genes makes their sequencing cover- and consequently would not include Ohnologs expressed only in dif- age and completeness useful metrics that should be used to assess the ferent tissues or at different time points. The recent Cc4R allotetra- 35,95,96 quality and completeness of de novo transcriptome assemblies, anal- ploidy event complicates transcriptome assembly because there ogous to the use of ‘housekeeping’ genes as positive controls in has been little time for divergence of Ohnologs e.g, 8.2 million years. qPCR studies. The presence of nearly all essential genes across In autopolyploid salmonids, the fourth round of whole genome dupli- these four cyprinid species (representing more than 100 million years cation is much older i.e. 90–102 ma; yet many Ohnologous loci are of evolutionary divergence ) is consistent with the hypothesis of difﬁcult to separate via bioinformatic approaches. Some loci even 58,61 broad evolutionary and functional conservation. The few essen- maintain tetrasomic inheritance because of the autopolyploid nature of tial genes not detected may still be present in the genome, but were the duplication. These factors need to be explicitly considered when missed due to assembly errors or are expressed transiently at larval conducting analyses that require orthologous alignments, such as Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 21 RNA-seq and syntenic mapping, when working with polyploid or par- Supplementary data tially diploidized species. Supplementary data are available at DNARES online. 4.6. Summary References Results from short read sequences yield high-quality transcriptome 1. Mardis, E. R. 2008, The impact of next-generation sequencing technology resources for comparative study of cyprinids, a hyper-diverse clade on genetics. Trends Genet., 24, 133–41. of ﬁshes. We used a variety of bioinformatic tools for assembly qual- 2. Shendure, J. and Ji, H. 2008, Next-generation DNA sequencing. Nat. ity assessment, gene annotation, orthology assignment, and identiﬁ- Biotechnol., 26, 1135–45. cation and partitioning of exogenous DNA in wild cyprinid ﬁshes. 3. Ekblom, R. and Galindo, J. 2011, Applications of next generation sequenc- This approach facilitates technology transfer from a model organism ing in molecular ecology of non-model organisms. Heredity, 107,1–15. (zebraﬁsh) to a group of related species that ﬁll diverse and critical 4. Alvarez, M., Schrey, A. W. and Richards, C. L. 2015, Ten years of tran- roles in these ecosystems and comprise an important component of scriptomics in wild populations: what have we learned about their ecology and evolution? Mol. Ecol., 24, 710–25. biodiversity. Conserved expression of essential developmental genes 5. Andrews, K. R., Good, J. M., Miller, M. R., Luikart, G. and Hohenlohe, across a broad phylogenetic scope, later ontogenetic stages, and ar- P. A. 2016, Harnessing the power of RADseq for ecological and evolu- ray of tissue types, illustrates their utility as benchmarks for assessing tionary genomics. Nat. Rev. Genet., 17, 81–92. coverage in de novo assemblies. Moreover, their ubiquitous expres- 6. Barshis, D. J., Ladner, J. T., Oliver, T. A., Seneca, F. O., Traylor-Knowles, sion further supports the hypothesis that these genes are required for N. and Palumbi, S. R. 2013, Genomic basis for coral resilience to climate the basic biology of cyprinid ﬁsh and are candidate loci for develop- change. Proc. Natl. Acad. Sci., 110, 1387–92. mental abnormalities and disease. Finally, comparative transcriptom- 7. Schunter, C., Vollmer, S. V., Macpherson, E. and Pascual, M. 2014, ics must contend with genome duplications and other genomic Transcriptome analyses and differential gene expression in a non-model ‘events’ that affect gene identity and expression. Nonetheless, com- ﬁsh species with alternative mating tactics. BMC Genomics, 15, 167. parative approaches could provide enormous power to identify 8. Verbruggen, B., Bickley, L. K., Santos, E. M., et al. 2015, De novo assem- bly of the Carcinus maenas transcriptome and characterization of innate shared and unique physiological pathways that respond to common immune system pathways. BMC Genomics, 16,1. environmental stressors in a natural setting. 9. Pavey, S. A., Bernatchez, L., Aubin-Horth, N. and Landry, C. R. 2012, What is needed for next-generation ecological and evolutionary geno- mics? Trends Ecol. Evol., 27, 673–78. 4.7. Data Availability 10. Van Straalen, N. M. and Roelofs, D. 2012, An Introduction to Ecological Raw sequence reads were uploaded to the NCBI Sequence Read Genomics. Oxford University Press. Archive (SRA: SRP107991: SRR5601334-SRR560133469. 11. Landry, C. R. and Aubin-Horth, N. 2014, Recent advances in ecological BIOPROJECT: PRJNA383604. BIOSAMPLES: SAMN07166458- genomics: from phenotypic plasticity to convergent and adaptive evolu- SAMN0716493). TRINITY-assembled transcriptomes are available tion and speciation. Adv. Exp. Med. Biol., 781, 1–5. via FigShare. BLAST results and the list of contigs 12. Goltsman, D. S. A., Comolli, L. R., Thomas, B. C. and Banﬁeld, J. F. corresponding to potentially novel genes are available in the 2015, Community transcriptomics reveals unexpected high microbial di- Supplementary data. versity in acidophilic bioﬁlm communities. ISME J., 9, 1014–23. 13. DeBiasse, M. B. and Kelly, M. W. 2016, Plastic and evolved responses to global change: what can we learn from comparative transcriptomics? J. Hered., 107, 71–81. Acknowledgements 14. Eo, S. H., Doyle, J. M., Hale, M. C., Marra, N. J., Ruhl, J. D. and This project was supported by the National Institute of General Medical DeWoody, J. A. 2012, Comparative transcriptomics and gene expression Sciences (8P20GM103451-12), New Mexico IDeA Networks of Biomedical in larval tiger salamander (Ambystoma tigrinum) gill and lung tissues as Research Excellence (NMINBRE_A2_Jan_2013), and the Center for revealed by pyrosequencing. Gene, 492, 329–38. Evolutionary and Theoretical Immunology. Samples were collected under 15. Whitehead, A., Triant, D., Champlin, D. and Nacci, D. 2010, New Mexico Department of Game and Fish permit #3015. This research was Comparative transcriptomics implicates mechanisms of evolved pollution approved by Institutional Animal Care and Use Committee Protocol #10- tolerance in a killiﬁsh population. Mol. Ecol., 19, 5186–203. 100468-MCC and #10-100492-MCC. We thank E. Loker, R. Miller, J. 16. Cohen, D., Bogeat-Triboulot, M.-B., Tisserant, E., et al. 2010, Kavka, and G. Rosenberg for research and technical support. Thanks to Z. Comparative transcriptomics of drought responses in Populus: a Ren and L. Hao for assistance with the Database of Essential Genes. Fish im- meta-analysis of genome-wide expression proﬁling in mature leaves and ages were provided T. Kennedy (red shiner, ﬂathead chub) and C. Thomas root apices across two genotypes. BMC Genomics, 11,1. (common carp). This research beneﬁtted from insight and technical assistance 17. Des Marais, D. L. and Rausher, M. D. 2008, Escape from adaptive con- provided by F. Schilkey, N. Devitt, P. Mena, T. Ramaraj, I. Lindquist, A. ﬂict after duplication in an anthocyanin pathway gene. Nature, 454, Snyder, M. Osborne, and C. Krabbenhoft. We thank the anonymous re- 762–65. viewers for helpful comments on the manuscript. The authors have no compet- 18. Reid, N. M., Proestou, D. A., Clark, B. W., et al. 2016, The genomic land- ing interests. scape of rapid repeated evolutionary adaptation to toxic pollution in wild ﬁsh. Science, 354, 1305–8. 19. Thompson, A. W. and Ortı´, G. 2016, Annual killiﬁsh transcriptomics and candidate genes for metazoan diapause. Mol. Biol. Evol., 33, 2391–95. Accession numbers 20. Dion-Coˆte ´ , A.-M., Renaut, S., Normandeau, E. and Bernatchez, L. 2014, NCBI Genbank SRA: SRP107991: SRR5601334-SRR560133469 RNA-seq reveals transcriptomic shock involving transposable elements reactivation in hybrids of young lake whiteﬁsh species. Mol. Biol. Evol., 31, 1188–99. 21. Kavembe, G. D., Franchini, P., Irisarri, I., Machado-Schiafﬁno, G. and Conflict of interest Meyer, A. 2015, Genomics of adaptation to multiple concurrent stresses: None declared. insights from comparative transcriptomics of a Cichlid ﬁsh from one of Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 22 Comparative transcriptomics of cyprinids earth’s most extreme environments, the Hypersaline Soda Lake Magadi in 47. Ankley, G. T. and Villeneuve, D. L. 2006, The fathead minnow in aquatic Kenya, East Africa. J. Mol. Evol., 81, 90–109. toxicology: past, present and future. Aquatic Toxicol., 78, 91–102. 22. Kelley, J. L., Passow, C. N., Plath, M., Arias Rodriguez, L., Yee, M. C. 48. Klaper, R., Carter, B. J., Richter, C., Drevnick, P., Sandheinrich, M. and and Tobler, M. 2012, Genomic resources for a model in adaptation and Tillitt, D. 2008, Use of a 15 k gene microarray to determine gene expres- speciation research: characterization of the Poecilia mexicana transcrip- sion changes in response to acute and chronic methylmercury exposure in tome. BMC Genomics, 13, 652. the fathead minnow Pimephales promelas Raﬁnesque. J. Fish Biol., 72, 23. Deyholos, M. K. 2010, Making the most of drought and salinity tran- 2207–80. scriptomics. Plant Cell Environ., 33, 648–54. 49. Burns, F. R., Cogburn, A. L., Ankley, G. T., et al. 2016, Sequencing and 24. Ramachandran, V. K., East, A. K., Karunakaran, R., Downie, J. A. and de novo draft assemblies of a fathead minnow (Pimephales promelas) Poole, P. S. 2011, Adaptation of Rhizobium leguminosarum to pea, alfalfa reference genome. Environ. Toxicol. Chem., 35, 212–217. and sugar beet rhizospheres investigated by comparative transcriptomics. 50. FAO 2014, FISHSTAT: Global Aquaculture Production. FAO, Rome, Genome Biol., 12,1. Italy. 25. Cheviron, Z. A., Connaty, A. D., McClelland, G. B. and Storz, J. F. 2014, 51. Ji, P., Liu, G., Xu, J., et al. 2012, Characterization of common carp tran- Functional genomics of adaptation to hypoxic cold-stress in high-altitude scriptome: de novo sequencing, assembly, annotation and comparative ge- deer mice: transcriptomic plasticity and thermogenic performance nomics. PloS One, 7, 1–9. Evolution, 68, 48–62. 52. Li, G., Zhao, Y., Liu, Z., et al. 2015, De novo assembly and characteriza- 26. Webster, T. M. U. and Santos, E. M. 2015, Global transcriptomic proﬁl- tion of the spleen transcriptome of common carp (Cyprinus carpio) using ing demonstrates induction of oxidative stress and of compensatory cellu- illumina paired-end sequencing. Fish Shellﬁsh Immunol. lar stress responses in brown trout exposed to glyphosate and Roundup. 53. Dekay, J. E. 1842, Zoology of New-York, or, the New-York fauna; com- BMC Genomics, 16,1. prising detailed descriptions of all the animals hitherto observed within 27. Nelson, J. S. 2006, Fishes of the World. John Wiley & Sons: Hoboken, NJ. the state of New-York, with brief notices of those occasionally found near 28. Berra, T. M. 2001, Freshwater ﬁsh distribution. Academic Press: San its borders, and accompanied by appropriate illustrations. Part IV. Fishes. Diego, CA. W. & A. White & J. Visscher: Albany, NY. 29. Winﬁeld, I. and Townsend, C. 1991, The role of cyprinids in ecosystems. 54. McDonald, M. 1893, Report of the commissioner of ﬁsh and ﬁsheries for In: Winﬁeld, I. and Nelson, J. (eds), Cyprinid Fishes, Springer, 552–71. the ﬁscal years 1889-90 and 1890-91. Part XVII. U.S. Commission of 30. Sterner, R. W. and George, N. B. 2000, Carbon, nitrogen, and phospho- Fish and Fisheries. Washington, DC, 1–96. rous stoichiometry of cyprinid ﬁshes. Ecology, 81, 127–40. 55. Walters, D. M., Blum, M. J., Rashleigh, B., Freeman, B. J., Porter, B. A. 31. Power, M. E., Matthews, W. J. and Stewart, A. J. 1985, Grazing min- and Burkhead, N. M. 2008, Red shiner invasion and hybridization with nows, piscivorous bass, and stream algae: dynamics of a strong interac- blacktail shiner in the upper Coosa River, USA. Biol. Invas., 10, tion. Ecology, 66, 1448–56. 1229–1242. 32. Amores, A., Force, A., Yan, Y.-L., et al. 1998, Zebraﬁsh hox clusters and 56. Marsh-Matthews, E. and Matthews, W. J. 2000, Spatial variation in rela- vertebrate genome evolution. Science, 282, 1711–14. tive abundance of a widespread, numerically dominant ﬁsh species and its 33. Postlethwait, J. H., Yan, Y.-L., Gates, M. A., et al. 1998, Vertebrate ge- effect on ﬁsh assemblage structure. Oecologia, 125, 283–292. nome evolution and the zebraﬁsh gene map. Nature Genet., 18, 345–49. 57. Hoagstrom, C. W., DeWitte, A. C., Gosch, N. J. and Berry, C. R., Jr 34. Meyer, A. and Van de Peer, Y. 2005, From 2R to 3R: evidence for a 2007, Historical ﬁsh assemblage ﬂux in the Cheyenne River below ﬁsh-speciﬁc genome duplication (FSGD). BioEssays, 27, 937–45. Angostura Dam. J. Freshwater Ecol., 22, 219–29. 35. David, L., Blum, S., Feldman, M. W., Lavi, U. and Hillel, J. 2003, Recent 58. Amsterdam, A., Nissen, R. M., Sun, Z., Swindell, E. C., Farrington, S. duplication of the common carp (Cyprinus carpio L.) genome as revealed and Hopkins, N. 2004, Identiﬁcation of 315 genes essential for early by analyses of microsatellite loci. Mol. Biol. Evol., 20, 1425–34. zebraﬁsh development. Proc. Natl. Acad. Sci USA, 101, 12792–97. 36. Machordom, A. and Doadrio, I. 2001, Evolutionary history and specia- 59. Chen, W.-H., Minguez, P., Lercher, M. J. and Bork, P. 2012, OGEE: an tion modes in the cyprinid genus Barbus. Proc. Roy. Soc. Lond B Biol. online gene essentiality database. Nucleic Acids Res., 40, D901–D906. Sci., 268, 1297–306. 60. Flicek, P., Amode, M. R., Barrell, D., et al. 2014, Ensembl 2014. Nucleic 37. Ohno, S., Muramoto, J., Christian, L. and Atkin, N. B. 1967, Acids Res., 42, D749–755. Diploid-tetraploid relationship among old-world members of the ﬁsh fam- 61. Hahn, M. W. and Kern, A. D. 2005, Comparative genomics of centrality ily Cyprinidae. Chromosoma, 23, 1–9. and essentiality in three eukaryotic protein-interaction networks. Mol. 38. Yang, L., Sado, T., Hirt, M. V., et al. 2015, Phylogeny and polyploidy: Biol. Evol., 22, 803–6. Resolving the classiﬁcation of cyprinine ﬁshes (Teleostei: Cypriniformes). 62. Innan, H. and Kondrashov, F. 2010, The evolution of gene duplications: clas- Mol. Phylogenet. Evol., 85, 97–116. sifying and distinguishing between models. Nat. Rev. Genet., 11, 97–108. 39. Wang, J.-T., Li, J.-T., Zhang, X.-F. and Sun, X.-W. 2012, Transcriptome 63. Chain, F. J., Dushoff, J. and Evans, B. J. 2011, The odds of duplicate gene analysis reveals the time of the fourth round of genome duplication in persistence after polyploidization. BMC Genomics, 12,1. common carp (Cyprinus carpio). BMC Genomics, 13, 96. 64. Bolger, A. M., Lohse, M. and Usadel, B. 2014, Trimmomatic: a ﬂexible 40. Xu, P., Zhang, X., Wang, X., et al. 2014, Genome sequence and genetic trimmer for Illumina sequence data. Bioinformatics, btu170. diversity of the common carp, Cyprinus carpio. Nature Genet., 46, 65. Grabherr, M. G., Haas, B. J., Yassour, M., et al. 2011, Full-length tran- 1212–1219. scriptome assembly from RNA-Seq data without a reference genome. Nat. 41. Woods, T. D. and Buth, D. G. 1984, High level of gene silencing in the tet- Biotechnol., 29, 644–52. raploid goldﬁsh. Biochem. Syst. Ecol., 12, 415–21. 66. Haas, B. J., Papanicolaou, A., Yassour, M., et al. 2013, De novo tran- 42. Ferris, S. D. and Whitt, G. S. 1977, The evolution of duplicate gene ex- script sequence reconstruction from RNA-seq using the Trinity platform pression in the carp (Cyprinus carpio). Experientia, 33, 1299–1301. for reference generation and analysis. Nat. Protocols, 8, 1494–512. 43. Briggs, J. P. 2002, The zebraﬁsh: a new model organism for integrative 67. Pruesse, E., Quast, C., Knittel, K., et al. 2007, SILVA: a comprehensive physiology. Am. J. Physiol. Regul. Integr. Comp. Physiol., 282, R3–R9. online resource for quality checked and aligned ribosomal RNA sequence 44. Dooley, K. and Zon, L. I. 2000, Zebraﬁsh: a model system for the study data compatible with ARB. Nucleic Acids Res., 35, 7188–96. of human disease. Curr. Opinion Genet. Dev., 10, 252–56. 68. Conesa, A., Gotz, S., Garcia-Gomez, J. M., Terol, J., Talon, M. and 45. Lieschke, G. J. and Currie, P. D. 2007, Animal models of human disease: Robles, M. 2005, Blast2GO: a universal tool for annotation, visualization zebraﬁsh swim into view. Nat. Rev. Genet., 8, 353–367. and analysis in functional genomics research. Bioinformatics, 21, 46. Howe, K., Clark, M. D., Torroja, C. F., et al. 2013, The zebraﬁsh refer- 3674–76. ence genome sequence and its relationship to the human genome. Nature, 69. Langmead, B. and Salzberg, S. L. 2012, Fast gapped-read alignment with 496, 498–503. Bowtie 2. Nat. Methods, 9, 357–59. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018 T.J. Krabbenhoft and T.F. Turner 23 70. Li, B. and Dewey, C. N. 2011, RSEM: accurate transcript quantiﬁcation 85. Small, C., Bassham, S., Catchen, J., et al. 2016, The genome of the Gulf from RNA-Seq data with or without a reference genome. BMC pipeﬁsh enables understanding of evolutionary innovations. Genome Bioinform., 12, 323. Biol., 17, 258. 71. Langfelder, P. and Horvath, S. 2008, WGCNA: an R package for 86. Di Bella, J. M., Bao, Y., Gloor, G. B., Burton, J. P. and Reid, G. 2013, weighted correlation network analysis. BMC Bioinform., 9, 559. High throughput sequencing methods and analysis for microbiome re- 72. Mi, H., Muruganujan, A. and Thomas, P. D. 2013, PANTHER in 2013: search. J. Microbiol. Methods, 95, 401–14. modeling the evolution of gene function, and other gene attributes, in the 87. Coco, J. R., Flemington, E. K. and Taylor, C. M. 2011, PARSES: a pipe- context of phylogenetic trees. Nucleic Acids Res., 41, D377–386. line for analysis of RNA-Seq exogenous sequences. BICoB, 196–200. 73. Supek, F., Bosnjak, M., Skunca, N. and Smuc, T. 2011, REVIGO summa- 88. Dickerson, J. E., Zhu, A., Robertson, D. L. and Hentges, K. E. 2011, rizes and visualizes long lists of gene ontology terms. PLoS One, 6, Deﬁning the role of essential genes in human disease. PLoS One, 6, e21800. e27368. 74. van Dijk, E. L., Auger, H., Jaszczyszyn, Y. and Thermes, C. 2014, Ten 89. Luo, H., Lin, Y., Gao, F., Zhang, C.-T. and Zhang, R. 2014, DEG 10, an years of next-generation sequencing technology. Trends Genet., 30, update of the database of essential genes that includes both 418–26. protein-coding genes and noncoding genomic elements. Nucleic Acids 75. Bernatchez, L. 2016, On the maintenance of genetic variation and adapta- Res., 42, D574–D580. tion to environmental change: considerations from population genomics 90. Plough, L., Shin, G. and Hedgecock, D. 2016, Genetic inviability is a ma- in ﬁshes. J. Fish Biol., 89, 2519–56. jor driver of type-III survivorship in experimental families of a highly fe- 76. Martin, J. A. and Wang, Z. 2011, Next-generation transcriptome assem- cund marine bivalve. Mol. Ecol. bly. Nat. Rev. Genet., 12, 671–82. 91. McCurley, A. T. and Callard, G. V. 2008, Characterization of housekeep- 77. Li, B., Fillmore, N., Bai, Y., et al. 2014, Evaluation of de novo transcrip- ing genes in zebraﬁsh: male-female differences and effects of tissue type, tome assemblies from RNA-Seq data. Genome Biol., 15,1. developmental stage and chemical treatment. BMC Mol. Biol., 9,1. 78. O’Neil, S. T. and Emrich, S. J. 2013, Assessing De Novo transcriptome as- 92. Saitoh, K., Sado, T., Doosey, M. H., et al. 2011, Evidence from mitochon- sembly metrics for consistency and utility. BMC Genomics, 14, 465. drial genomics supports the lower Mesozoic of South Asia as the time and 79. Lopez-Maestre, H., Brinza, L., Marchet, C., et al. 2016, SNP calling from place of basal divergence of cypriniform ﬁshes (Actinopterygii: RNA-seq data without a reference genome: identiﬁcation, quantiﬁcation, Ostariophysi). Zool. J. Linnean Soc., 161, 633–62. differential analysis and impact on the protein sequence. Nucleic Acids 93. Postlethwait, J. H. 2007, The zebraﬁsh genome in context: ohnologs gone Res., 44, e148–e148. missing. J. Exp. Zool. B Mol. Dev. Evol., 308, 563–77. 80. DeWoody, J. A., Abts, K. C., Fahey, A. L., et al. 2013, Of contigs and 94. Ferris, S. D. and Whitt, G. S. 1977, Loss of duplicate gene expression after quagmires: next-generation sequencing pitfalls associated with transcrip- polyploidisation. Nature, 265, 258–60. tomic studies. Mol. Ecol. Resour., 13, 551–58. 95. Larhammar, D. and Risinger, C. 1994, Molecular genetic aspects of tetra- 81. Wolf, J. B. 2013, Principles of transcriptome analysis and gene expression ploidy in the common carp Cyprinus carpio. Mol. Phylogenet. Evol., 3, quantiﬁcation: an RNA-seq tutorial. Mol. Ecol. Resour., 13, 559–72. 59–68. 82. Vijay, N., Poelstra, J. W., Ku ¨ nstner, A. and Wolf, J. B. 2013, Challenges 96. Ohno, S. 1970, Evolution by Gene Duplication. Springer: New York, and strategies in transcriptome assembly and differential gene expression NY. quantiﬁcation. A comprehensive in silico assessment of RNA-seq experi- 97. Berthelot, C., Brunet, F., Chalopin, D., et al. 2014, The rainbow trout ge- ments. Mol. Ecol., 22, 620–34. nome provides novel insights into evolution after whole-genome duplica- 83. Aravind, L., Watanabe, H., Lipman, D. J. and Koonin, E. V. 2000, tion in vertebrates. Nat. Commun., 5, 3657. Lineage-speciﬁc loss and divergence of functionally linked genes in eu- 98. Timusk, E. R., Ferguson, M. M., Moghadam, H. K., Norman, J. D., karyotes. Proc. Natl. Acad. Sci., 97, 11319–24. Wilson, C. C. and Danzmann, R. G. 2011, Genome evolution in the ﬁsh 84. Stein, C., Caccamo, M., Laird, G. and Leptin, M. 2007, Conservation and family salmonidae: generation of a brook charr genetic map and compari- divergence of gene families encoding components of innate immune re- sons among charrs (Arctic charr and brook charr) with rainbow trout. sponse systems in zebraﬁsh. Genome Biol., 8,1. BMC Genet., 12, 1–15. Downloaded from https://academic.oup.com/dnaresearch/article-abstract/25/1/11/4106313 by Ed 'DeepDyve' Gillespie user on 16 March 2018
DNA Research – Oxford University Press
Published: Feb 1, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.
Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.
All the latest content is available, no embargo periods.
“Hi guys, I cannot tell you how much I love this resource. Incredible. I really believe you've hit the nail on the head with this site in regards to solving the research-purchase issue.”Daniel C.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud
“I must say, @deepdyve is a fabulous solution to the independent researcher's problem of #access to #information.”@deepthiw
“My last article couldn't be possible without the platform @deepdyve that makes journal papers cheaper.”@JoseServera