Evolutionary divergence of 3’ UTRs in cichlid fishes

Evolutionary divergence of 3’ UTRs in cichlid fishes Background: Post-transcriptional regulation is crucial for the control of eukaryotic gene expression and might contribute to adaptive divergence. The three prime untranslated regions (3’ UTRs), that are located downstream of protein-coding sequences, play important roles in post-transcriptional regulation. These regions contain functional elements that influence the fate of mRNAs and could be exceptionally important in groups such as rapidly evolving cichlid fishes. Results: To examine cichlid 3’ UTR evolution, we 1) identified gene features in nine teleost genomes and 2) performed comparative analyses to assess evolutionary variation in length, functional motifs, and evolutionary rates of 3’ UTRs. In all nine teleost genomes, we found a smaller proportion of repetitive elements in 3’ UTRs than in the whole genome. We found that the 3’ UTRs in cichlids tend to be longer than those in non-cichlids, and this was associated, on average, with one moremiRNA target pergeneincichlids. Moreover,weprovidedevidencethat3’ UTRs on average have evolved faster in cichlids than in non-cichlids. Finally, analyses of gene function suggested that both the top 5% longest and 5% most rapidly evolving 3’ UTRs in cichlids tended to be involved in ribosome-associated pathways and translation. Conclusions: Our results reveal novel patterns of evolution in the 3’ UTRs of teleosts in general and cichlids in particular. The data suggest that 3’ UTRs might serve as important meta-regulators, regulators of other mechanisms governing post- transcriptional regulation, especially in groups like cichlids that have undergone extremely fast rates of phenotypic diversification and speciation. Keywords: Adaptive radiation - Comparative genomics - Regulatory evolution - miRNA binding sites Background large number of studies have found evidence for associa- Understanding the genomic mechanisms underlying tions between gene regulatory mechanisms and pheno- adaptive radiations and phenotypic variation is one of typic variation [4–7]. However, only a few studies [8]have the major objectives of evolutionary biology. Gene examined the importance of three prime untranslated regulation has long been thought to be a particularly im- regions (3’ UTRs), that are located directly downstream of portant component of phenotypic diversification [1]. For protein-coding DNA sequences, for adaptive divergence. example, because of the inferred small amount of diver- These regions contain functional sequence elements that gence in protein-coding genes, despite large phenotypic control mRNA stability [9], expression levels [10], and differences between humans and chimpanzees, it was mRNA localization [11]. Therefore, they are undoubtedly suggested that mechanisms regulating differential gene important for gene regulation. In groups such as cichlid expression must be playing a predominant role during fishes, that are well known for their astonishing species our own adaptive evolution [2]. Due to major advances richness, the evolutionary divergence of 3’ UTRs might in the feasibility of generating fully-sequenced genomes provide an exceptionally important mechanism for their [3], the importance of non-coding mechanisms during rapid evolution and diversity of adaptive phenotypes [12]. evolution can now be rigorously tested. As predicted, a The 3’ UTR, the last component of messenger RNA (mRNA) to be transcribed in eukaryotes, starts directly * Correspondence: paolo.franchini@uni-konstanz.de after the stop codon and ends with a poly(A) tail. The 3’ Chair in Zoology and Evolutionary Biology, Department of Biology, UTR usually contains binding sites for RNA-binding University of Konstanz, 78457 Konstanz, Germany proteins (RBPs) and small non-coding RNAs, which have Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Xiong et al. BMC Genomics (2018) 19:433 Page 2 of 11 important functional effects on the fate of mRNA. For mechanistic role of 3’ UTRs during cichlid divergence instance, poly(A)-binding proteins (PABPs) stabilize the [30]. Gene ontology (GO) terms [31] and KEGG path- mRNA and facilitate translation by binding to the ways [32] are widely used to describe the functions and poly(A) tails [13]. 3’ UTRs are also the main target re- biological processes of interesting genes. If particular gions of microRNAs (miRNAs), ubiquitous non-coding groups of genes that appear to be outliers are known to small RNAs with a crucial role in fine-tuning gene regu- function together in biological processes such as lation [14]. Mature miRNAs are incorporated into the morphogenesis or gene regulation, this could provide an RNA-induced silencing complex (RISC) and following interesting insight into the role of 3’ UTRs during interaction with target sites in the 3’ UTRs [15], which adaptive divergence. can cause translational repression and/or mRNA degra- Cichlid fishes are one of the most species-rich clades dation [16]. We might expect that in groups like cichlid of teleost fishes and exhibit virtually unparalleled levels fishes that show exceptional phenotypic variation that of phenotypic diversity [33, 34]. For instance, a huge there could be an increased abundance of miRNA component of that diversity, more than 2000 species, targets in 3’ UTRs. have originated in three East African Great Lakes within Since 3’ UTRs are a particularly important part of the an extremely short period of time [35, 36]. Remarkable genomic machinery, we might also expect that purifying phenotypic diversity is also found in many cichlid traits selection in these regions could prevent the incorpor- including body color [37], body shape [38], jaws [39], ation of non-essential elements. As a major source of lips [40], and visual systems [41], that all are adaptations genomic (especially non-coding region) variation [17, to a multitude of ecological niches. The extremely fast 18], repetitive elements (repeats) can cause gene disrup- speciation rates and diverse phenotypic novelties in tion [19], double-strand breaks [20], and gene expression cichlids could be explained by changes in both changes [21], which might commonly be purified from protein-coding genes and the gene regulatory system the 3’ UTR regions [22, 23]. A lower abundance of re- [42]. To date, rapid sequence evolution has been found peats in the 3’UTRs as compared to the genome as a in protein-coding regions linked to morphogenesis, whole would provide evidence that there has been puri- vision, and pigmentation [42]. However, molecular fying selection in these regions [24]. changes in gene regulatory systems have not been as The lengthening of 3’ UTRs might be especially impor- well studied although cichlids are rapidly becoming a tant because it could permit an increase in the number of model system of genome evolution and several cichlid miRNA targets, contribute to the emergence of new cell genomes have been fully sequenced [42]. Therefore, types, and even increase morphological complexity during examining how genomic divergence of regulatory re- evolution [25, 26]. It has been shown that humans have gions in this group compares to that in other teleosts much longer 3’ UTRs compared to other mammals [27]. should provide novel insights into not only vertebrate Cichlids might also have longer 3’ UTRs as compared to genome evolution in general but also genome evolution other teleosts. Additionally, one defining characteristic of in a classic model of adaptive radiation. adaptively radiating groups like cichlids is an increase in We used comparative genomics to test several hypo- the pace of trait divergence relative to other lineages [28]. theses about the evolutionary patterns of 3’ UTR diver- Therefore, the rate of evolutionary change in 3’ UTR gence in cichlid fishes as compared to other teleosts. We lengths might provide the genomic substrate for many first quantified the length of all 3’ UTRs in five cichlid spe- adaptive radiations. If there were a causal relationship cies as well as four other model teleosts and determined if between 3’ UTR length, function, and phenotypic change, 3’ UTR length of more than two thousand 1:1 orthologous then rapid modification of 3’ UTR length during evolution genes was on average longer in cichlids. To examine the could facilitate rapid phenotypic evolution. Therefore, we evidence for purifying selection in 3’ UTRs, we compared might expect in a clade that is rapidly diversifying pheno- the abundance of repetitive elements in the genome as a typically such as cichlid fishes, that the 3’ UTRs are excep- whole as compared to that in all 3’ UTRs of all nine teleost tionally long and/or diversifying rapidly. genomes we analyzed. Then, we examined how cichlid Although most gene regions could function more or 3’ UTRs compared to those of other teleosts in the less the same in all teleosts, one might expect that genes average composition of repetitive elements and num- that deviate the most from average to play an outsized ber of miRNA binding sites. For 1:1 orthologous role during adaptive radiations [29]. For instance, we genes, we also compared the evolutionary rate of might expect 3’ UTRs that are exceptionally long or change in 3’ UTR length for cichlids versus exhibit enhanced rates of evolutionary divergence in a non-cichlids. We then examined in detail the top 5% group such as cichlid fishes to have played a role in their of genes that had the longest 3’ UTRs and the top adaptive radiation. Therefore, we aimed to identify such 5% that showed the most rapid evolutionary diver- 3’ UTRs and thereby make inferences about the gence of 3’ UTR length in cichlids to determine what Xiong et al. BMC Genomics (2018) 19:433 Page 3 of 11 genes and pathways in these fishes showed excep- distribution of 3’ UTR length for all species is presented tional divergence in their 3’ UTRs. in the Additional file 3: Figure S1. The median 3’ UTR length of the nine teleost fishes was 795 nt. Cichlids Results tended to have longer 3’ UTRs in general. We also Statistics and distribution of 3’ UTRs in teleost fishes comparatively examined only the 3’ UTRs of 2041 1:1 We examined gene structures of nine teleost fishes (Fig. 1a) orthologous genes and presented their distribution of based on species-specific transcripts from a large number lengths in Fig. 1b. We calculated the ratio of 3’ UTR of RNA-Seq experiments (Additional file 1:Table S1) and length of 1:1 orthologous genes between cichlids and incorporated the structures into existing annotations using non-cichlids, and this ratio was on average significantly the PASA pipeline. In our updated PASA annotations, higher than the null expectation of 1.0 (the ratio’s value 212,135 genes and 563,263 transcripts with stop codons is 0 after log -transformation) again indicating that 3’ were annotated (Additional file 2:Table S2). Thelongest UTRs were generally longer in cichlids as compared to transcript of each gene was selected as the canonical the non-cichlids examined (Fig. 1c, p < 0.001). transcript and the coding sequences and UTR sequences of canonical transcripts were extracted from the genomes ac- Repetitive elements identification in 3’ UTRs cording to the annotations (see Methods for details). Based We identified repetitive elements (repeats) in 3’ UTRs on similarity searches, 184,810 genes were clustered into and the whole genome respectively, to examine whether gene families and 2041 were identified as 1:1 orthologous 3’ UTRs showed a lower percentage of repeats than the genes that were present in all nine teleost genomes. genome as a whole. Most of the repeats identified were The 3’ UTRs occupied from 1.61 to 4.31 percentage of transposable elements (TEs), including SINEs, LINEs, the teleost genomes (Table 1). Haplochromis burtoni had LTR retrotransposons and DNA transposons (Additional the largest combined total 3’ UTR length with file 4: Table S3). All species had a lower proportion of 35,849,263 nucleotides (nt) as well as the longest mean repeats in the 3’ UTRs compared to their genomes as a length of 1552.66 nt, while Astyanax mexicanus had the whole. On average, the percentage of 3’ UTR that is smallest total 3’ UTR length of 19,162,274 nt and the made up by repeats for these teleosts ranged between 9 shortest mean length of 836.05 nt. The frequency and 20% (against a percentage of repeats in the whole a b Fig. 1 Length of 3’ UTRs in nine teleost fishes. a Phylogenetic tree of the focal nine teleost fishes with time-scale according to estimated divergence time from TimeTree [66]. The five species in red are cichlid species and the four species in purple are other model teleosts. b Frequency distribution of 3’ UTR length of nine teleost fishes for 2041 1:1 orthologous genes. In cichlids, there are fewer 3’UTRs that are shorter than the median length and more 3’UTRs that are longer than the median length. c Comparison of 3’ UTR length between cichlids and non-cichlids. The ratios of the mean length of 3’ UTR in cichlids to that in non-cichlids are calculated for 1:1 orthologous genes. These ratios were log -transformed.In general,cichlids have longer 3’ UTRs than non-cichlids (one-sample t-test that the mean log ratio = 0.0; p < 0.001). The top 5% relatively longest 3’ UTRs in cichlids taken from these 2041 orthologous genes were further analyzed for enrichment of gene functions (GO terms and KEGG pathways). The images of A. mexicanus, O. latipes, O. niloticus, M. zebra, N. brichardi and P. nyererei were taken from Wikimedia Commons (http://commons.wikimedia.org/wiki/), while the images of H. burtoni, D. rerio and P. formosa were taken from FishBase (http://www.fishbase.org) Xiong et al. BMC Genomics (2018) 19:433 Page 4 of 11 Table 1 Summary statistics of 3’ UTR length of canonical transcripts in the nine teleost genomes Species Cichlids Non-cichlids M. zebra P. nyererei H. burtoni N.brichardi O. niloticus O. latipes P. formosa A. mexicanus D. rerio Genome size (Mb) 860 830 831 848 928 870 749 1191 1372 Number of transcripts 23,910 22,358 23,089 21,921 25,191 21,979 24,901 22,920 25,866 Total 3’UTR length (Mb) 34.18 27.73 35.85 23.99 35.80 21.40 30.79 19.16 29.96 Percentage of 3’ UTR in genome 3.97 3.34 4.31 2.83 3.86 2.46 4.11 1.61 2.18 Mean length 1429 1240 1553 1094 1421 974 1237 836 1158 Median length 993 842 1087 690 970 669 812 516 691 25th percentile 410 317.25 463 194 381 277 333 175 270 75th percentile 1980 1727 2143 1552 1990.5 1334 1680 1133 1540 Maximum length 15,626 12,530 21,087 16,134 17,796 15,264 16,660 12,260 22,013 genome ranging between 18 and 38%). Danio rerio was miRNA targets prediction in 3’ UTRs exceptional as repeats made up 34% of its 3’ UTRs and Since miRNA targets are thought to be one of the most D. rerio also had the highest proportion (52%) of the important functional motifs in 3’ UTRs [43], we genome composed of repeats (Fig. 2a). However, for searched for predicted miRNA targets in these nine tele- every single species, the proportion of the 3’ UTRs that osts 3’ UTR sequences. As species-specific miRNA data was composed of repeats was significantly lower than is not available and conserved miRNAs are expected to the proportion of repetitive elements in their genomes be unbiased in their presence, we used miRBase to as a whole (Wilcoxon signed-rank test, p = 0.004). To screen all 3’ UTRs for the targets of 271 mature miRNAs extend the comparison on the proportion of repeats that are conserved across vertebrates (Additional file 5: among different genomic features, we additionally Table S4). From 42,611 to 83,210 targets per species identified the repeats in the 5’ UTRs and coding regions were predicted in the 10,502 to 15,295 3’ UTRs per in all of the species. The coding regions contained the species (Additional file 6: Table S5). On average, there lowest proportion of repeats (2–4%), while the 5’ UTRs were more miRNA targets detected as well as more contained a slightly lower proportion of repeats (from 6 3’ UTRs that were predicted to contain miRNA tar- to 11% depending on the target species) than the 3’ gets in cichlids. In the cichlid species examined, there UTRs (Additional file 3: Figure S3; Additional file 4: were 4.9 to 5.6 miRNA targets per 3’ UTR versus a Table S3). Despite the on average longer 3’ UTRs in range of only 4.0 to 4.6 miRNA targets in the cichlids, we did not find a clear pattern of increased re- non-cichlid fishes (Fig. 3; Additional file 6:Table S5). peats in cichlid 3’ UTRs (Fig. 2b). Importantly, the Therefore, when compared to the other teleosts ex- length of 3’ UTRs on average remained longer in cichlids amined there was approximately one more miRNA after removing all repetitive elements (Additional file 3: target per 3’ UTR in cichlids. Figure S2). a b Fig. 2 Repetitive elements in 3’ UTR. a The percentage of repetitive elements in 3’ UTRs and in the whole genome. In all nine teleost species, 3’ UTR regions contain a lower proportion of repetitive elements as compared to the genome as a whole (Wilcoxon signed-rank test, p = 0.004). b The total length of repetitive elements in 3’ UTRs Xiong et al. BMC Genomics (2018) 19:433 Page 5 of 11 Evolutionary rate divergence in 3’ UTR length genes did not have a match (Additional file 3: Figure S4; Because rapid evolutionary change in 3’ UTR length Additional file 7: Table S6). The enrichment analysis re- might be associated with substantial phenotypic diver- vealed that 13 GO terms were significantly enriched gence, we estimated the evolutionary divergence rate of (Table 2). A number of GO terms were related to both 3’ UTR length under a time-calibrated phylogenetic ribosome and translation, a pattern that was confirmed framework (Fig. 4a). For each of the 2041 1:1 ortholo- by the enriched KEGG “Ribosome” pathway (p < 0.001). gous genes identified, rates were estimated both in the group of cichlids and the group of non-cichlids separ- Discussion ately. Then, the ratio between these rates was calculated The divergence of 3’ UTRs could play important roles in by dividing the cichlid rate by the non-cichlid rate. In both genomic as well as phenotypic evolution. There- general, cichlids had a significantly higher evolutionary fore, we characterized 3’ UTRs in the genomes of nine rate of 3’ UTR divergence than non-cichlids (Fig. 4b, teleost fishes and documented patterns of genome-wide one sample t-test, p < 0.001). 3’ UTR divergence in cichlid fishes compared to non-cichlid fishes. We found that 3’ UTRs in cichlid fish Gene set enrichment analysis of long and rapid-evolving genomes are on average longer than those in other fish 3’ UTRs lineages. The relative paucity of repetitive elements in 3’ Next, we selected two sets of genes to further examine UTRs as compared to the whole genome in all of the the hypothesis that 3’ UTRs might be important for par- teleosts examined speaks to the functional importance ticular aspects of cichlid diversification by conducting of this region and suggests purifying selection could be GO-enrichment analyses for (1) the genes with relatively operating to keep transposable elements and other inser- longest 3’ UTRs in cichlids and (2) genes that exhibited tions out of 3’ UTRs. Moreover, the on average longer 3’ relatively rapid-evolving 3’ UTRs in cichlids. A top 5% UTRs in cichlids is associated with a greater number of cutoff was applied to 1:1 orthologous genes and thereby miRNA targets. There also appears to be a higher aver- 101 genes were selected for each set. Both of these rela- age evolutionary rate of divergence in 3’ UTR sequence tively longest and fastest evolving 3’ UTRs were then length in cichlids as compared to other teleosts. subjected to enrichment analysis (GO terms and KEGG Additionally, analysis of gene function on both the pathways). There were 47 genes shared between the longest and fastest-evolving cichlid 3’ UTRs showed a “length” set and the “rate” set, which meant 46.5% of strong functional bias towards ribosome-related path- them overlapped. By comparing proteins of D. rerio in ways and translation. These associations suggest that the Ensembl database (release 91), only one of these macro-evolutionary divergence in 3’ UTRs in cichlids might be influencing the core post-transcriptional regu- lation machinery in these rapidly diversifying fishes. In general, our results support a role of 3’ UTRs as meta-regulators, regulators of other gene regulatory mechanisms, in groups undergoing exceptional speci- ation and adaptive phenotypic diversification. The lengths of 3’ UTRs likely represent compromises among a number of factors. While 3’ UTRs are not translated into amino acids, the process of transcription is energetically consuming. We, therefore, expected that most 3’ UTRs should be relatively short and this was supported in our analyses (Additional file 3: Figure S1). The length of a 3’ UTR is also known to be important because it is associated with gene expression levels [44]. Longer 3’ UTRs usually contain more functional motifs and are associated with lower gene expression levels [45]. For the nine teleost fishes examined, their overall 3’ Fig. 3 Predicted miRNA target sites per mRNA in 3’ UTRs. The box UTRs had a median length of 795 nt, while the 3’ UTRs plots show the log -transformed distribution of miRNA targets for of only 1:1 orthologous genes had a median length of the nine teleost fish species. The dots inside the boxes show the 723 nt. It appears that cichlids have more 3’ UTRs longer mean values. The outliers are shown by the dots outside the boxes. The maximum number of miRNA targets in a single 3’ UTR is 62, than the median compared to non-cichlids (Fig. 1b; which can be found in one 3’ UTR in N. brichardi and one 3’ UTR in Additional file 3: Figure S1) and it is confirmed by com- P. formosa. On average, cichlid fishes consistently have more miRNA paring 1:1 orthologous genes that 3’ UTRs on average target sites than non-cichlid fishes were longer in cichlids than in non-cichlids (Fig. 1c), Xiong et al. BMC Genomics (2018) 19:433 Page 6 of 11 a b Fig. 4 Evolutionary rate of 3’ UTR length. a The phylogeny used in rate estimation. The rates were estimated in cichlid and non-cichlid groups separately. b Comparison of evolutionary rate of 3’ UTRs between cichlids and non-cichlids. The ratios of the rate in cichlids to that in non-cichlids are calculated for 2.041 1:1 orthologous genes. These ratios were log -transformed. In general, 3’ UTRs in cichlids have significant higher evolutionary rates than in non-cichlids (one-sample t-test that the mean log ratio = 0.0; p < 0.001). The top 5% relatively fastest evolving 3’ UTRs were further analyzed for enrichment of gene functions (GO terms and KEGG pathways) which could indicate a greater potential for a diversity of variation could be associated with different levels of se- functions in cichlid 3’ UTRs. lective constraints acting on different components of the Repetitive elements (repeats) are mobile genetic units genome [51]. In each of the nine focal species, we identi- that can shape a number of aspects of eukaryotic fied the abundance of repeats in the coding regions, 5’ genome architecture including 3’UTR length [46]. It has UTRs, 3’ UTRs and in the genome as a whole. Our results been proposed that repeats are one of the main contri- show that coding regions are made up of only 2–4% of butors of non-coding variation among different species repeats, and this is not surprising considering the strong [47, 48] and their expansion history and impact on gen- selective pressures on protein coding genes. Additionally, ome diversity both between and within lineages has also in all species, both 5′ and 3’ UTRs contain smaller pro- been documented in many fish [42, 49, 50]. The abun- portions of repeats than the genome as a whole, but dancy of repeats is uneven across the genome, and this higher than the coding regions, which is consistent with Table 2 Gene set enrichment analysis of relatively longest and fastest evolving 3’ UTRs in cichlids. Significant enriched GO terms are presented. CC: cellular component; BP: biological process; MF: molecular function; KEGG: KEGG pathway GO GO term Significant p-value category “length”“rate”“union” CC macromolecular complex – < 0.001 – CC mitochondrial matrix 0.046 –– CC intracellular ribonucleoprotein complex – < 0.001 – CC Ribosome < 0.001 < 0.001 0.001 CC ribosomal subunit < 0.001 –– BP cellular biosynthetic process –– 0.046 BP nucleic acid phosphodiester bond hydrolysis 0.037 –– BP peptide metabolic process < 0.001 < 0.001 < 0.001 BP primary metabolic process – 0.019 – BP Translation < 0.001 < 0.001 – MF nuclease activity 0.013 –– MF RNA binding – < 0.001 – MF structural constituent of ribosome < 0.001 < 0.001 < 0.001 KEGG Ribosome < 0.001 < 0.001 < 0.001 Xiong et al. BMC Genomics (2018) 19:433 Page 7 of 11 the hypothesis that purifying selection acts against the protein synthesis, rather than particular genes such as presence of repeats in UTRs [52, 53]. Focusing on 3’ morphogens or opsins that more directly influence indi- UTRs, Dario rerio stood out from the other teleosts, in vidual phenotypes. Although ribosomal proteins are that it has the largest proportion of repeats in its genome highly conserved in coding regions because of their cru- as well as in its 3’ UTRs. This is likely due to the docu- cial functions in protein synthesis, it has been shown mented genome-wide expansion of repeats in this species that the diversity and differential expression of ribosomal [54, 55]. Astyanax mexicanus has the second largest components, known as specialized ribosomes, can select- proportion of repeats in the 3’ UTRs and this was also ively translate certain sets of genes [61]. This new level of mirrored in the proportion of repetitive elements in its gene regulation can be further linked to many phenotypes, genome. Less variation was present in the other fish spe- such as development [62] and disease [63, 64]. The fast cies investigated in this study, and variation of 3’ UTR evolution of 3’ UTRs of ribosomal genes in cichlids could length among different taxa did not appear to be the result provide the potential for specialized expression of specific of an expansion of repeats (Fig. 2b). This implies other ribosomal genes both spatially and temporally, which mechanisms are likely driving the evolution of 3’ UTR could be an important post-transcriptional regulatory length divergence among teleosts. mechanism that has contributed to the adaptation and The on average lengthening of 3’ UTRs in cichlids could diversification of cichlid fishes. be associated with a larger number of functional motifs, such as RNA binding sites and miRNA targets. Indeed, we Conclusions identified about one more miRNA target per 3’ UTRs in Our results reveal novel patterns of evolution and poten- cichlids compared to the other fish lineages (Fig. 3; tial functional differences in the 3’ UTRs of teleosts in Additional file 6: Table S5). Also, we found that more general and of cichlid fishes in particular. It has long mRNAs are targeted by miRNAs and more mRNAs have been argued that because of the relatively small amount multiple targets in cichlids (Additional file 6: Table S5). of protein-coding divergence, but large phenotypic This could have consequences for cichlid diversification as differences found between groups like humans and more potential miRNA-mRNA functional pairs in cichlids chimpanzees, that differential gene expression and asso- could enable these fishes to regulate gene expression ciated regulatory mechanisms must play an important spatially and temporally to an exceptional degree [8, 56]. role in adaptive evolution [2]. Our analyses suggest that Both coding and non-coding divergence likely contribute what might be most critical during adaptive divergence to diversification. Accelerated sequence evolution of par- is not only the direct regulators of gene expression but ticular coding regions in cichlids has been considered to be the regulators of these more direct regulators. By prefer- associated with their adaptive radiation [42]. Interestingly, entially influencing divergence in other regulatory we also detected that 3’ UTRs have on average evolved sig- mechanisms such as the ribosome and other aspects of nificantly faster in cichlids than in other teleosts (Fig. 4b). the post-translational machinery during the adaptive radi- The rapid evolution of 3’ UTRs within cichlids implicates ation of groups like cichlid fishes, divergence in 3’ UTRs this part of the post-transcriptional regulatory system in could serve as important genomic meta-regulators. structuring the exceptional phenotypic diversification of cichlids [57, 58]. Methods Particular subsets of 3’ UTRs might be especially im- Data collection portant during adaptive diversification. We, therefore, The NCBI database [65] was searched for genomic and examined two sets of evolutionary outliers for further transcriptomic data of teleost fishes. In order to obtain gene function enrichment analysis: (1) genes with rela- high-quality annotations of 3’ UTRs, we applied the follow- tively longest 3’ UTRs and (2) genes with the relatively ing minimal criteria for including species into our data set: fastest evolving 3’ UTRs. The influence of these subsets a) de novo genome is assembled; b) species-specific gene of genes on gene expression could play an important models are predicted; c) transcriptomic data from nume- role in the diversification of cichlids [59, 60]. Surpris- rous tissues is available. These stringent criteria enabled us ingly, the genes in these two datasets most often have to carry out comprehensive gene and isoform discovery in functions related to ribosomal structures and related each species, thus minimizing the potential bias due to un- pathways. The ribosome is the main organelle in the even gene and isoform representation across species. Nine process of translation, the final step of gene expression teleost fishes matched these criteria and were included in after post-transcriptional regulation. In addition, GO our analyses. Specifically, genome assemblies, annotations analyses suggested “RNA binding” was also one of the and RNA sequencing (RNA-Seq) data for five cichlid fishes main functions of these 3’ UTRs. The evolution of 3’ (Neolamprologus brichardi, Pundamilia nyererei, Maylan- UTRs in cichlids appears to have most affected the core dia zebra, Oreochromis niloticus, Haplochromis burtoni) of cellular life, post-transcriptional regulation and and four non-cichlid teleost fishes (Astyanax mexicanus, Xiong et al. BMC Genomics (2018) 19:433 Page 8 of 11 Danio rerio, Oryzias latipes, Poecilia formosa)wereexa- canonical transcript. The sequences of coding regions mined (Additional file 1: Table S1). It is important to note and UTRs of the canonical transcripts were then ex- that the nine selected species underwent the same number tracted from the corresponding genome according to the of whole genome duplication (WGD) events and are not PASA annotation. The frequency distributions of 3’ UTR descended from a lineage containing an extra round of lengths, calculated using 50 nt windows, were built for WGD after the teleost-specific WGD. This gave us more each of the nine teleost fishes. confidence in detecting the correct orthology among genes (see below). To place these species in a comparative frame- Gene clustering work, a time-calibrated phylogenetic tree of these nine tele- Protein sequences of canonical transcripts were ex- ost fishes was acquired from TimeTree [66]. tracted from the PASA annotations and genomes. All-against-all BLASTP v2.2.31+ [71] was employed to Annotation of 3’ UTRs with the PASA pipeline find the potential homologous genes with an E-value − 10 The number of genomes of teleost fishes that are avail- cutoff of 1e . The high-scoring segment pairs (HSPs) able is increasing quickly, but the comparison of 3’ from BLASTP output were conjoined by using Solar UTRs across genomes remains challenging. Although v0.9.6 [72]. The similarity of genes was measured by the the annotations of gene models are usually published bit-score. Most similar genes were clustered to gene with the genomes, the quality of these annotations is families using hcluster_sg v0.5.1, a hierarchical clustering often uneven because different pipelines have been ap- algorithm from the Treefam pipeline [73], with the plied in each species. Moreover, the UTRs are often parameters “-w 5 -s 0.33 -m 100000”. poorly described since most annotations typically focus on the protein-coding regions. Thus, the PASA v2.0.2 Identification of repetitive sequences (repeats) annotation pipeline [67] was employed to incorporate The species-specific libraries of repeat families were gene structures, including UTRs, into existing gene identified by applying RepeatModeler v1.0.8 [74] on the annotation, based on RNA-Seq data. genome of each species. RepeatMasker v4.0.6 [75]was A comprehensive workflow was used to integrate the used to mask interspersed repeats and low complexity available transcriptomes into existing annotations. For DNA sequences in the genomes. To compare the each species, RNA-Seq data from eight to eleven tissues proportion of repetitive elements in different genomic and different developmental stages (Additional file 1: regions across the focal species, sequence repeats were Table S1) was used. First, transcripts for each species annotated in the 3’ UTRs, 5’ UTRs, and in the coding were generated by combining de novo assembly and regions. genome-guided assembly. The RNA-Seq reads were de novo assembled using Trinity v2.4.0 [68] using default Prediction of miRNA target sites. parameters. Moreover, the RNA-Seq raw reads were Sequences of mature miRNAs from teleost fishes were mapped to the corresponding genome using TopHat retrieved from miRBase v21 [76]. Since miRNAs are v2.1.0 [69] also with default parameters and then assem- usually conserved across taxa [77] and species-specific bled into transcripts using the genome-guide model im- miRNAs are not present in the database for all species plemented in Trinity. Then, the gene structures were in this study, we extracted 271 unique mature miRNAs identified according to TopHat mapping results using conserved among vertebrates (Additional file 5: Table S4) Cufflinks v2.2.1 [70]. Lastly, the original annotation, tran- for the target prediction. The miRNA target sites of 3’ scripts, and Cufflinks gene structures were imported into UTRs were predicted using miRanda v3.3a [78]with the PASA annotation pipeline. The annotation output parameters “-en -20 -strict”, which set the minimal free from the first PASA run, transcripts, and Cufflinks gene energy as − 20 kcal/mol and required 5′-seed [79]inthe structures were then used in an additional run of PASA to mRNA-miRNA matches. further refine the gene structure and get the final annota- tions for each of the species (the updated gene models have been uploaded to the Dryad Digital Repository Evolutionary rate of 3’ UTRs (doi:https://doi.org/10.5061/dryad.4581cm37). The evolutionary rates of 3’ UTR length for all 1:1 orthologous genes were examined using the R package Retrieval of coding and UTR sequences “OUwie v1.50” [80] under the Brownian motion model. From the PASA annotations, we selected only transcripts Rates were estimated on the time-calibrated phylogen- that contained stop codons to ensure that the transcripts etic tree for two groups. Specifically, the rates within the were not degraded during RNA sequencing. For genes cichlid clade and in the non-cichlid outgroup were esti- with alternative splicing variants, we selected only the mated separately. Then, the ratio between the estimated transcript with the longest coding region as the rates in cichlids and in non-cichlids was calculated and a Xiong et al. BMC Genomics (2018) 19:433 Page 9 of 11 log transformation was applied to the ratio to normalize Additional file 6: Table S5. Predicted miRNA targets in the focal nine the data for further analyses. teleost fish species. (XLSX 11 kb) Additional file 7: Table S6. GO annotations of the selected genes with relatively longest and fastest-evolving 3’ UTRs in cichlid fishes. (XLSX 35 kb) Gene set enrichment analysis Enrichment analysis was performed on two data sets Abbreviations taken from the analyses on the 1:1 orthologous genes. 3’ UTRs: Three prime untranslated regions; GO: Gene ontology; HSPs: We examined the upper 5% percentile distribution of High-scoreing segment pairs; miRNAs: MicroRNAs; mRNA: Messenger RNA; Nt: Nucleotide; PABPs: Poly(A)-binding proteins; RBPs: RNA-binding proteins; the genes that had relatively longest 3’ UTRs and the Repeats: Repetitive elements; RISC: RNA-induced silencing complex; RNA- upper 5% percentile distribution form the 3’ UTRs Seq: RNA sequencing; TEs: Transposable elements; WGD: Whole genome showing the relative fastest evolutionary rate in cichlids. duplication To identify significantly over-represented gene ontology Acknowledgements (GO) terms and KEGG pathways in the identified genes We thank the two anonymous reviewers for their constructive comments (test sets) when compared to the whole gene set (base- and useful suggestions. And we gratefully acknowledge use of the services and facilities at the University of Konstanz. line set), we used a Fisher’s exact test implemented in g:Profiler [81]. To avoid any bias due to the different qual- Funding ity of the genome annotations, we carried out different ana- This work was supported by a China Scholarship Council fellowship lyses using as baseline the O. niloticus and the D. rerio gene (201306380094) to PX and by a Deutsche Forschungsgemeinschaft Research Grant (FR 3399/1–1) to PF. sets, the best-annotated cichlids and non-cichlids genomes, respectively. The sequence distribution of the GO terms for Availability of data and materials the genes in the test sets was observed graphically using a PASA annotation files in GFF3 format have been uploaded to the Dryad Digital Repository (doi:https://doi.org/10.5061/dryad.4581cm37). multilevel pie representation method as implemented in Blast2GO v4.1.5 [82]. Authors’ contributions PX, DH, and PF performed the analyses. PX, DH, AM, and PF conceived the study. All authors read and approved the final manuscript. Statistics We performed three types of comparisons and performed Ethics approval and consent to participate different statistical analyses based on the type of compari- Not applicable. sons made with the teleost 3’ UTRs. When we compared Competing interests different values within a species, such as the proportion of The authors declare that they have no competing interests. repetitive elements in the whole genome versus the 3’ UTRs, we treated the species as the unit of replication. Publisher’sNote For the comparisons of length and evolutionary rates in- Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. volving large numbers of individual genes, we treated these genes as independent data points for comparisons Author details between cichlids and non-cichlids. When we compared Chair in Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78457 Konstanz, Germany. Radcliffe Institute for the average values, such as miRNA targets, in cichlids ver- Advanced Study, Harvard University, Cambridge, MA 02138, USA. sus non-cichlids, we did not perform any statistical ana- lyses as these groups and their aggregate trait values were Received: 28 February 2018 Accepted: 23 May 2018 not phylogenetically independent of one another, and therefore, in these cases, we only reported the patterns. References 1. Britten RJ, Davidson EH. Gene regulation for higher cells - a theory. Science. 1969;165(3891):349. Additional files 2. King M, Wilson A. Evolution at two levels in humans and chimpanzees. Science. 1975;188(4184):107–16. 3. Shendure J, Balasubramanian S, Church GM, Gilbert W, Rogers J, Schloss JA, Additional file 1: Table S1. Genomic and transcriptomic data of the Waterston RH. DNA sequencing at 40: past, present and future. Nature. nine teleost fish species included in this study. (XLSX 11 kb) 2017;550(7676) Additional file 2: Table S2. Comparison of annotations for the nine 4. Romero IG, Ruvinsky I, Gilad Y. Comparative studies of gene expression and teleost fish species. (XLSX 10 kb) the evolution of gene regulation. Nat Rev Genet. 2012;13(7):505–16. Additional file 3: Figure S1. Frequency distribution of 3’ UTR length in 5. Salinas F, de Boer CG, Abarca V, Garcia V, Cuevas M, Araos S, Larrondo LF, nine teleost species. Figure S2. Mean length of 3’ UTRs without repeats. Martinez C, Cubillos FA. Natural variation in non-coding regions underlying Figure S3. Different classes of repetitive elements in different genomic phenotypic diversity in budding yeast. Sci Rep. 2016;6:21849. components in nine teleost fish species. Figure S4. Multi-level pie charts 6. Krubitzer L, Kaas J. The evolution of the neocortex in mammals: how is of GO terms of selected genes. (PDF 1211 kb) phenotypic diversity generated? Curr Opin Neurobiol. 2005;15(4):444–53. 7. Streelman JT, Peichel CL, Parichy DM. Developmental genetics of adaptation Additional file 4: Table S3. Percentage of repetitive elements in 3’ in fishes: the case for novelty. Annu Rev Ecol Evol Syst. 2007;38(1):655–81. UTRs, 5’ UTRs, coding regions and in the whole genome. (XLSX 13 kb) 8. Franchini P, Xiong P, Fruciano C, Meyer A. The role of microRNAs in the Additional file 5: Table S4. Name and sequence of the 271 teleost repeated parallel diversification of lineages of Midas cichlid fish from miRNAs conserved in vertebrates. (XLSX 18 kb) Nicaragua. Genome biology and evolution. 2016;8(5):1543–55. Xiong et al. BMC Genomics (2018) 19:433 Page 10 of 11 9. Goldstrohm AC, Wickens M. Multifunctional deadenylase complexes cichlid fishes long after Gondwanan rifting. Proceedings Biological sciences. diversify mRNA control. Nat Rev Mol Cell Biol. 2008;9(4):337–44. 2013;280(1770):20131733. 10. Matoulkova E, Michalova E, Vojtesek B, Hrstka R. The role of the 3′ 37. Seehausen O, Mayhew PJ, Van Alphen JJM. Evolution of colour patterns in untranslated region in post-transcriptional regulation of protein expression east African cichlid fish. J Evolution Biol. 1999;12(3):514–34. in mammalian cells. RNA Biol. 2012;9(5):563–76. 38. Fruciano C, Franchini P, Kovacova V, Elmer KR, Henning F, Meyer A. Genetic 11. Andreassi C, Riccio A. To localize or not to localize: mRNA fate is in 3'UTR linkage of distinct adaptive traits in sympatrically speciating crater lake ends. Trends Cell Biol. 2009;19(9):465–74. cichlid fish. Nat Commun. 2016;7:12736. 12. Holloway AK, Lawniczak MK, Mezey JG, Begun DJ, Jones CD. Adaptive gene 39. Hulsey CD, Mims MC, Parnell NF, Streelman JT. Comparative rates of lower jaw expression divergence inferred from population genomics. PLoS Genet. diversification in cichlid adaptive radiations. J Evol Biol. 2010;23(7):1456–67. 2007;3(10):2007–13. 40. Henning F, Machado-Schiaffino G, Baumgarten L, Meyer A. Genetic 13. Gorgoni B, Gray NK. The roles of cytoplasmic poly(a)-binding proteins in dissection of adaptive form and function in rapidly speciating cichlid fishes. regulating gene expression: a developmental perspective. Briefings in Evolution. 2017;71(5):1297–312. functional genomics & proteomics. 2004;3(2):125–41. 41. O'Quin KE, Hofmann CM, Hofmann HA, Carleton KL. Parallel evolution of 14. Bartel DP. MicroRNAs. Cell. 2004;116(2):281–97. opsin gene expression in African cichlid fishes. Mol Biol Evol. 2010;27(12): 15. Gregory RI, Chendrimada TP, Cooch N, Shiekhattar R. Human RISC couples 2839–54. microRNA biogenesis and posttranscriptional gene silencing. Cell. 2005; 42. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, Simakov O, Ng AY, 123(4):631–40. Lim ZW, Bezault E, et al. The genomic substrate for adaptive radiation in African cichlid fish. Nature. 2014;513(7518):375–81. 16. Hu W, Coller J. What comes first: translational repression or mRNA degradation? The deepening mystery of microRNA function. Cell Res. 2012; 43. Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP. The impact of 22(9):1322–4. microRNAs on protein output. Nature. 2008;455(7209):64–71. 17. Smit AF. Interspersed repeats and other mementos of transposable 44. Ji Z, Lee JY, Pan Z, Jiang B, Tian B. Progressive lengthening of 3′ elements in mammalian genomes. Curr Opin Genet Dev. 1999;9(6):657–63. untranslated regions of mRNAs by alternative polyadenylation during 18. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, mouse embryonic development. Proc Natl Acad Sci U S A. 2009;106(17): Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the 7028–33. human genome. Nature. 2001;409(6822):860–921. 45. Barrett LW, Fletcher S, Wilton SD. Regulation of eukaryotic gene expression 19. Kidwell MG, Lisch D. Transposable elements as sources of variation in by the untranslated gene regions and other non-coding elements. Cellular animals and plants. P Natl Acad Sci USA. 1997;94(15):7704–11. and molecular life sciences : CMLS. 2012;69(21):3613–34. 20. Hedges DJ, Deininger PL. Inviting instability: transposable elements, double- 46. Feschotte C, Pritham EJ. DNA transposons and the evolution of eukaryotic strand breaks, and the maintenance of genome integrity. Mutat Res. 2007; genomes. Annu Rev Genet. 2007;41:331–68. 616(1–2):46–59. 47. Lee SI, Kim NS. Transposable elements and genome size variations in plants. 21. Lerman DN, Michalak P, Helin AB, Bettencourt BR, Feder ME. Modification of Genomics & informatics. 2014;12(3):87–97. heat-shock gene expression in Drosophila melanogaster populations via 48. Kidwell MG. Transposable elements and the evolution of genome size in transposable elements. Mol Biol Evol. 2003;20(1):135–44. eukaryotes. Genetica. 2002;115(1):49–63. 22. Wright S, Finnegan D. Genome evolution: sex and the transposable 49. Chalopin D, Naville M, Plard F, Galiana D, Volff JN. Comparative analysis of element. Current biology : CB. 2001;11(8):R296–9. transposable elements highlights Mobilome diversity and evolution in 23. Szitenberg A, Cha S, Opperman CH, Bird DM, Blaxter ML, Lunt DH. Genetic vertebrates. Genome Biology and Evolution. 2015;7(2):567–80. drift, not life history or RNAi, determine long-term evolution of transposable 50. Gao B, Shen D, Xue SL, Chen C, Cui HM, Song CY. The contribution of elements. Genome biology and evolution. 2016;8(9):2964–78. transposable elements to size variations between four teleost genomes. 24. Charlesworth B, Charlesworth D. The population-dynamics of transposable Mobile DNA-Uk. 2016;7 elements. Genet Res. 1983;42(1):1–27. 51. Gonzalez J, Petrov DA. The adaptive role of transposable elements in the 25. Chen CY, Chen ST, Juan HF, Huang HC. Lengthening of 3'UTR increases Drosophila genome. Gene. 2009;448(2):124–33. with morphological complexity in animal evolution. Bioinformatics. 2012; 52. Pereira V. Insertion bias and purifying selection of retrotransposons in the 28(24):3178–81. Arabidopsis thaliana genome. Genome Biol. 2004;5(10):R79. 26. Miura P, Shenker S, Andreu-Agullo C, Westholm JO, Lai EC. Widespread and 53. Barron MG, Fiston-Lavier AS, Petrov DA, Gonzalez J. Population genomics of extensive lengthening of 3 ' UTRs in the mammalian brain. Genome Res. transposable elements in Drosophila. Annu Rev Genet. 2014;48:561–81. 2013;23(5):812–25. 54. Moss SP, Joyce DA, Humphries S, Tindall KJ, Lunt DH. Comparative analysis 27. Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S. Structural and of teleost genome sequences reveals an ancient intron size expansion in functional features of eukaryotic mRNA untranslated regions. Gene. 2001; the zebrafish lineage. Genome biology and evolution. 2011;3:1187–96. 276(1–2):73–81. 55. Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, 28. Gavrilets S, Losos JB. Adaptive radiation: contrasting theory with data. Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome Science. 2009;323(5915):732–7. sequence and its relationship to the human genome. Nature. 2013; 29. Barrier M, Robichaux RH, Purugganan MD. Accelerated regulatory gene evolution 496(7446):498–503. in an adaptive radiation. Proc Natl Acad Sci U S A. 2001;98(18):10208–13. 56. Li J, Zhang Z. miRNA regulatory variation in human evolution. Trends in 30. Luikart G, England PR, Tallmon D, Jordan S, Taberlet P. The power and genetics : TIG. 2013;29(2):116–24. promise of population genomics: from genotyping to genome typing. Nat 57. Kuraku S, Meyer A, Kuratani S. Timing of genome duplications relative to Rev Genet. 2003;4(12):981–94. the origin of the vertebrates: did cyclostomes diverge before or after? Mol 31. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Biol Evol. 2009;26(1):47–59. Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification 58. Henning F, Meyer A. The evolutionary genomics of cichlid fishes: explosive of biology. The Gene Ontology Consortium. Nat. gen. 2000;25(1):25–9. speciation and adaptation in the postgenomic era. Annu Rev Genomics 32. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a Hum Genet. 2014;15:417–41. reference resource for gene and protein annotation. Nucleic Acids Res. 59. Brand AH, Perrimon N. Targeted gene-expression as a means of altering 2016;44(D1):D457–62. cell fates and generating dominant phenotypes. Development. 1993; 33. Hulsey CD. Cichlid genomics and phenotypic diversity in a comparative 118(2):401–15. context. Integr Comp Biol. 2009;49(6):618–29. 60. Vu V, Verster AJ, Schertzberg M, Chuluunbaatar T, Spensley M, Pajkic D, Hart 34. Meyer A. Phylogenetic relationships and evolutionary processes in east GT, Moffat J, Fraser AG. Natural variation in gene expression modulates the African cichlid fishes. Trends Ecol Evol. 1993;8(8):279–84. severity of mutant phenotypes. Cell. 2015;162(2):391–402. 35. Meyer A, Kocher TD, Basasibwaki P, Wilson AC. Monophyletic origin of Lake 61. Xue S, Barna M. Specialized ribosomes: a new frontier in gene regulation Victoria cichlid fishes suggested by mitochondrial DNA sequences. Nature. and organismal biology. Nat Rev Mol Cell Biol. 2012;13(6):355–69. 1990;347(6293):550–3. 62. Kondrashov N, Pusic A, Stumpf CR, Shimizu K, Hsieh AC, Ishijima J, Shiroishi 36. Friedman M, Keck BP, Dornburg A, Eytan RI, Martin CH, Hulsey CD, T, Barna M. Ribosome-mediated specificity in Hox mRNA translation and Wainwright PC, Near TJ. Molecular and fossil evidence place the origin of vertebrate tissue patterning. Cell. 2011;145(3):383–97. Xiong et al. BMC Genomics (2018) 19:433 Page 11 of 11 63. Wang W, Nag S, Zhang X, Wang MH, Wang H, Zhou J, Zhang R. Ribosomal proteins and human diseases: pathogenesis, molecular mechanisms, and therapeutic implications. Med Res Rev. 2015;35(2):225–85. 64. Guimaraes JC, Zavolan M. Patterns of ribosomal protein expression specify normal and malignant human cells. Genome Biol. 2016;17(1):236. 65. Coordinators NR. Database resources of the National Center for biotechnology information. Nucleic Acids Res. 2017;45(D1):D12–7. 66. Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, Timetrees, and divergence times. Mol Biol Evol. 2017;34(7):1812–9. 67. Haas BJ. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66. 68. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52. 69. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. 70. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5. 71. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10. 72. Li H: solar. In.; 2006. 73. Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, Heriche JK, Hu Y, Kristiansen K, Li R, et al. TreeFam: 2008 update. Nucleic Acids Res. 2008; 36(Database issue):D735–40. 74. Smit A, Hubley R: RepeatModeler Open-1.0. In.; 2008-2015. 75. Smit A, Hubley R & Green P: RepeatMasker Open-4.0. In.; 2013-2015. 76. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014; 42(Database issue):D68–73. 77. Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19(1):92–105. 78. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. MicroRNA targets in Drosophila. Genome Biol. 2003;5(1):R1. 79. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136(2):215–33. 80. Beaulieu JM, Jhwueng DC, Boettiger C, O'Meara BC. Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 2012;66(8):2369–83. 81. Reimand J, Arak T, Adler P, Kolberg L, Reisberg S, Peterson H, Vilo J. G: profiler-a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 2016;44(W1):W83–9. 82. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Genomics Springer Journals

Evolutionary divergence of 3’ UTRs in cichlid fishes

Free
11 pages

Loading next page...
 
/lp/springer_journal/evolutionary-divergence-of-3-utrs-in-cichlid-fishes-dwkFRiLwU2
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s).
Subject
Life Sciences; Life Sciences, general; Microarrays; Proteomics; Animal Genetics and Genomics; Microbial Genetics and Genomics; Plant Genetics and Genomics
eISSN
1471-2164
D.O.I.
10.1186/s12864-018-4821-8
Publisher site
See Article on Publisher Site

Abstract

Background: Post-transcriptional regulation is crucial for the control of eukaryotic gene expression and might contribute to adaptive divergence. The three prime untranslated regions (3’ UTRs), that are located downstream of protein-coding sequences, play important roles in post-transcriptional regulation. These regions contain functional elements that influence the fate of mRNAs and could be exceptionally important in groups such as rapidly evolving cichlid fishes. Results: To examine cichlid 3’ UTR evolution, we 1) identified gene features in nine teleost genomes and 2) performed comparative analyses to assess evolutionary variation in length, functional motifs, and evolutionary rates of 3’ UTRs. In all nine teleost genomes, we found a smaller proportion of repetitive elements in 3’ UTRs than in the whole genome. We found that the 3’ UTRs in cichlids tend to be longer than those in non-cichlids, and this was associated, on average, with one moremiRNA target pergeneincichlids. Moreover,weprovidedevidencethat3’ UTRs on average have evolved faster in cichlids than in non-cichlids. Finally, analyses of gene function suggested that both the top 5% longest and 5% most rapidly evolving 3’ UTRs in cichlids tended to be involved in ribosome-associated pathways and translation. Conclusions: Our results reveal novel patterns of evolution in the 3’ UTRs of teleosts in general and cichlids in particular. The data suggest that 3’ UTRs might serve as important meta-regulators, regulators of other mechanisms governing post- transcriptional regulation, especially in groups like cichlids that have undergone extremely fast rates of phenotypic diversification and speciation. Keywords: Adaptive radiation - Comparative genomics - Regulatory evolution - miRNA binding sites Background large number of studies have found evidence for associa- Understanding the genomic mechanisms underlying tions between gene regulatory mechanisms and pheno- adaptive radiations and phenotypic variation is one of typic variation [4–7]. However, only a few studies [8]have the major objectives of evolutionary biology. Gene examined the importance of three prime untranslated regulation has long been thought to be a particularly im- regions (3’ UTRs), that are located directly downstream of portant component of phenotypic diversification [1]. For protein-coding DNA sequences, for adaptive divergence. example, because of the inferred small amount of diver- These regions contain functional sequence elements that gence in protein-coding genes, despite large phenotypic control mRNA stability [9], expression levels [10], and differences between humans and chimpanzees, it was mRNA localization [11]. Therefore, they are undoubtedly suggested that mechanisms regulating differential gene important for gene regulation. In groups such as cichlid expression must be playing a predominant role during fishes, that are well known for their astonishing species our own adaptive evolution [2]. Due to major advances richness, the evolutionary divergence of 3’ UTRs might in the feasibility of generating fully-sequenced genomes provide an exceptionally important mechanism for their [3], the importance of non-coding mechanisms during rapid evolution and diversity of adaptive phenotypes [12]. evolution can now be rigorously tested. As predicted, a The 3’ UTR, the last component of messenger RNA (mRNA) to be transcribed in eukaryotes, starts directly * Correspondence: paolo.franchini@uni-konstanz.de after the stop codon and ends with a poly(A) tail. The 3’ Chair in Zoology and Evolutionary Biology, Department of Biology, UTR usually contains binding sites for RNA-binding University of Konstanz, 78457 Konstanz, Germany proteins (RBPs) and small non-coding RNAs, which have Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Xiong et al. BMC Genomics (2018) 19:433 Page 2 of 11 important functional effects on the fate of mRNA. For mechanistic role of 3’ UTRs during cichlid divergence instance, poly(A)-binding proteins (PABPs) stabilize the [30]. Gene ontology (GO) terms [31] and KEGG path- mRNA and facilitate translation by binding to the ways [32] are widely used to describe the functions and poly(A) tails [13]. 3’ UTRs are also the main target re- biological processes of interesting genes. If particular gions of microRNAs (miRNAs), ubiquitous non-coding groups of genes that appear to be outliers are known to small RNAs with a crucial role in fine-tuning gene regu- function together in biological processes such as lation [14]. Mature miRNAs are incorporated into the morphogenesis or gene regulation, this could provide an RNA-induced silencing complex (RISC) and following interesting insight into the role of 3’ UTRs during interaction with target sites in the 3’ UTRs [15], which adaptive divergence. can cause translational repression and/or mRNA degra- Cichlid fishes are one of the most species-rich clades dation [16]. We might expect that in groups like cichlid of teleost fishes and exhibit virtually unparalleled levels fishes that show exceptional phenotypic variation that of phenotypic diversity [33, 34]. For instance, a huge there could be an increased abundance of miRNA component of that diversity, more than 2000 species, targets in 3’ UTRs. have originated in three East African Great Lakes within Since 3’ UTRs are a particularly important part of the an extremely short period of time [35, 36]. Remarkable genomic machinery, we might also expect that purifying phenotypic diversity is also found in many cichlid traits selection in these regions could prevent the incorpor- including body color [37], body shape [38], jaws [39], ation of non-essential elements. As a major source of lips [40], and visual systems [41], that all are adaptations genomic (especially non-coding region) variation [17, to a multitude of ecological niches. The extremely fast 18], repetitive elements (repeats) can cause gene disrup- speciation rates and diverse phenotypic novelties in tion [19], double-strand breaks [20], and gene expression cichlids could be explained by changes in both changes [21], which might commonly be purified from protein-coding genes and the gene regulatory system the 3’ UTR regions [22, 23]. A lower abundance of re- [42]. To date, rapid sequence evolution has been found peats in the 3’UTRs as compared to the genome as a in protein-coding regions linked to morphogenesis, whole would provide evidence that there has been puri- vision, and pigmentation [42]. However, molecular fying selection in these regions [24]. changes in gene regulatory systems have not been as The lengthening of 3’ UTRs might be especially impor- well studied although cichlids are rapidly becoming a tant because it could permit an increase in the number of model system of genome evolution and several cichlid miRNA targets, contribute to the emergence of new cell genomes have been fully sequenced [42]. Therefore, types, and even increase morphological complexity during examining how genomic divergence of regulatory re- evolution [25, 26]. It has been shown that humans have gions in this group compares to that in other teleosts much longer 3’ UTRs compared to other mammals [27]. should provide novel insights into not only vertebrate Cichlids might also have longer 3’ UTRs as compared to genome evolution in general but also genome evolution other teleosts. Additionally, one defining characteristic of in a classic model of adaptive radiation. adaptively radiating groups like cichlids is an increase in We used comparative genomics to test several hypo- the pace of trait divergence relative to other lineages [28]. theses about the evolutionary patterns of 3’ UTR diver- Therefore, the rate of evolutionary change in 3’ UTR gence in cichlid fishes as compared to other teleosts. We lengths might provide the genomic substrate for many first quantified the length of all 3’ UTRs in five cichlid spe- adaptive radiations. If there were a causal relationship cies as well as four other model teleosts and determined if between 3’ UTR length, function, and phenotypic change, 3’ UTR length of more than two thousand 1:1 orthologous then rapid modification of 3’ UTR length during evolution genes was on average longer in cichlids. To examine the could facilitate rapid phenotypic evolution. Therefore, we evidence for purifying selection in 3’ UTRs, we compared might expect in a clade that is rapidly diversifying pheno- the abundance of repetitive elements in the genome as a typically such as cichlid fishes, that the 3’ UTRs are excep- whole as compared to that in all 3’ UTRs of all nine teleost tionally long and/or diversifying rapidly. genomes we analyzed. Then, we examined how cichlid Although most gene regions could function more or 3’ UTRs compared to those of other teleosts in the less the same in all teleosts, one might expect that genes average composition of repetitive elements and num- that deviate the most from average to play an outsized ber of miRNA binding sites. For 1:1 orthologous role during adaptive radiations [29]. For instance, we genes, we also compared the evolutionary rate of might expect 3’ UTRs that are exceptionally long or change in 3’ UTR length for cichlids versus exhibit enhanced rates of evolutionary divergence in a non-cichlids. We then examined in detail the top 5% group such as cichlid fishes to have played a role in their of genes that had the longest 3’ UTRs and the top adaptive radiation. Therefore, we aimed to identify such 5% that showed the most rapid evolutionary diver- 3’ UTRs and thereby make inferences about the gence of 3’ UTR length in cichlids to determine what Xiong et al. BMC Genomics (2018) 19:433 Page 3 of 11 genes and pathways in these fishes showed excep- distribution of 3’ UTR length for all species is presented tional divergence in their 3’ UTRs. in the Additional file 3: Figure S1. The median 3’ UTR length of the nine teleost fishes was 795 nt. Cichlids Results tended to have longer 3’ UTRs in general. We also Statistics and distribution of 3’ UTRs in teleost fishes comparatively examined only the 3’ UTRs of 2041 1:1 We examined gene structures of nine teleost fishes (Fig. 1a) orthologous genes and presented their distribution of based on species-specific transcripts from a large number lengths in Fig. 1b. We calculated the ratio of 3’ UTR of RNA-Seq experiments (Additional file 1:Table S1) and length of 1:1 orthologous genes between cichlids and incorporated the structures into existing annotations using non-cichlids, and this ratio was on average significantly the PASA pipeline. In our updated PASA annotations, higher than the null expectation of 1.0 (the ratio’s value 212,135 genes and 563,263 transcripts with stop codons is 0 after log -transformation) again indicating that 3’ were annotated (Additional file 2:Table S2). Thelongest UTRs were generally longer in cichlids as compared to transcript of each gene was selected as the canonical the non-cichlids examined (Fig. 1c, p < 0.001). transcript and the coding sequences and UTR sequences of canonical transcripts were extracted from the genomes ac- Repetitive elements identification in 3’ UTRs cording to the annotations (see Methods for details). Based We identified repetitive elements (repeats) in 3’ UTRs on similarity searches, 184,810 genes were clustered into and the whole genome respectively, to examine whether gene families and 2041 were identified as 1:1 orthologous 3’ UTRs showed a lower percentage of repeats than the genes that were present in all nine teleost genomes. genome as a whole. Most of the repeats identified were The 3’ UTRs occupied from 1.61 to 4.31 percentage of transposable elements (TEs), including SINEs, LINEs, the teleost genomes (Table 1). Haplochromis burtoni had LTR retrotransposons and DNA transposons (Additional the largest combined total 3’ UTR length with file 4: Table S3). All species had a lower proportion of 35,849,263 nucleotides (nt) as well as the longest mean repeats in the 3’ UTRs compared to their genomes as a length of 1552.66 nt, while Astyanax mexicanus had the whole. On average, the percentage of 3’ UTR that is smallest total 3’ UTR length of 19,162,274 nt and the made up by repeats for these teleosts ranged between 9 shortest mean length of 836.05 nt. The frequency and 20% (against a percentage of repeats in the whole a b Fig. 1 Length of 3’ UTRs in nine teleost fishes. a Phylogenetic tree of the focal nine teleost fishes with time-scale according to estimated divergence time from TimeTree [66]. The five species in red are cichlid species and the four species in purple are other model teleosts. b Frequency distribution of 3’ UTR length of nine teleost fishes for 2041 1:1 orthologous genes. In cichlids, there are fewer 3’UTRs that are shorter than the median length and more 3’UTRs that are longer than the median length. c Comparison of 3’ UTR length between cichlids and non-cichlids. The ratios of the mean length of 3’ UTR in cichlids to that in non-cichlids are calculated for 1:1 orthologous genes. These ratios were log -transformed.In general,cichlids have longer 3’ UTRs than non-cichlids (one-sample t-test that the mean log ratio = 0.0; p < 0.001). The top 5% relatively longest 3’ UTRs in cichlids taken from these 2041 orthologous genes were further analyzed for enrichment of gene functions (GO terms and KEGG pathways). The images of A. mexicanus, O. latipes, O. niloticus, M. zebra, N. brichardi and P. nyererei were taken from Wikimedia Commons (http://commons.wikimedia.org/wiki/), while the images of H. burtoni, D. rerio and P. formosa were taken from FishBase (http://www.fishbase.org) Xiong et al. BMC Genomics (2018) 19:433 Page 4 of 11 Table 1 Summary statistics of 3’ UTR length of canonical transcripts in the nine teleost genomes Species Cichlids Non-cichlids M. zebra P. nyererei H. burtoni N.brichardi O. niloticus O. latipes P. formosa A. mexicanus D. rerio Genome size (Mb) 860 830 831 848 928 870 749 1191 1372 Number of transcripts 23,910 22,358 23,089 21,921 25,191 21,979 24,901 22,920 25,866 Total 3’UTR length (Mb) 34.18 27.73 35.85 23.99 35.80 21.40 30.79 19.16 29.96 Percentage of 3’ UTR in genome 3.97 3.34 4.31 2.83 3.86 2.46 4.11 1.61 2.18 Mean length 1429 1240 1553 1094 1421 974 1237 836 1158 Median length 993 842 1087 690 970 669 812 516 691 25th percentile 410 317.25 463 194 381 277 333 175 270 75th percentile 1980 1727 2143 1552 1990.5 1334 1680 1133 1540 Maximum length 15,626 12,530 21,087 16,134 17,796 15,264 16,660 12,260 22,013 genome ranging between 18 and 38%). Danio rerio was miRNA targets prediction in 3’ UTRs exceptional as repeats made up 34% of its 3’ UTRs and Since miRNA targets are thought to be one of the most D. rerio also had the highest proportion (52%) of the important functional motifs in 3’ UTRs [43], we genome composed of repeats (Fig. 2a). However, for searched for predicted miRNA targets in these nine tele- every single species, the proportion of the 3’ UTRs that osts 3’ UTR sequences. As species-specific miRNA data was composed of repeats was significantly lower than is not available and conserved miRNAs are expected to the proportion of repetitive elements in their genomes be unbiased in their presence, we used miRBase to as a whole (Wilcoxon signed-rank test, p = 0.004). To screen all 3’ UTRs for the targets of 271 mature miRNAs extend the comparison on the proportion of repeats that are conserved across vertebrates (Additional file 5: among different genomic features, we additionally Table S4). From 42,611 to 83,210 targets per species identified the repeats in the 5’ UTRs and coding regions were predicted in the 10,502 to 15,295 3’ UTRs per in all of the species. The coding regions contained the species (Additional file 6: Table S5). On average, there lowest proportion of repeats (2–4%), while the 5’ UTRs were more miRNA targets detected as well as more contained a slightly lower proportion of repeats (from 6 3’ UTRs that were predicted to contain miRNA tar- to 11% depending on the target species) than the 3’ gets in cichlids. In the cichlid species examined, there UTRs (Additional file 3: Figure S3; Additional file 4: were 4.9 to 5.6 miRNA targets per 3’ UTR versus a Table S3). Despite the on average longer 3’ UTRs in range of only 4.0 to 4.6 miRNA targets in the cichlids, we did not find a clear pattern of increased re- non-cichlid fishes (Fig. 3; Additional file 6:Table S5). peats in cichlid 3’ UTRs (Fig. 2b). Importantly, the Therefore, when compared to the other teleosts ex- length of 3’ UTRs on average remained longer in cichlids amined there was approximately one more miRNA after removing all repetitive elements (Additional file 3: target per 3’ UTR in cichlids. Figure S2). a b Fig. 2 Repetitive elements in 3’ UTR. a The percentage of repetitive elements in 3’ UTRs and in the whole genome. In all nine teleost species, 3’ UTR regions contain a lower proportion of repetitive elements as compared to the genome as a whole (Wilcoxon signed-rank test, p = 0.004). b The total length of repetitive elements in 3’ UTRs Xiong et al. BMC Genomics (2018) 19:433 Page 5 of 11 Evolutionary rate divergence in 3’ UTR length genes did not have a match (Additional file 3: Figure S4; Because rapid evolutionary change in 3’ UTR length Additional file 7: Table S6). The enrichment analysis re- might be associated with substantial phenotypic diver- vealed that 13 GO terms were significantly enriched gence, we estimated the evolutionary divergence rate of (Table 2). A number of GO terms were related to both 3’ UTR length under a time-calibrated phylogenetic ribosome and translation, a pattern that was confirmed framework (Fig. 4a). For each of the 2041 1:1 ortholo- by the enriched KEGG “Ribosome” pathway (p < 0.001). gous genes identified, rates were estimated both in the group of cichlids and the group of non-cichlids separ- Discussion ately. Then, the ratio between these rates was calculated The divergence of 3’ UTRs could play important roles in by dividing the cichlid rate by the non-cichlid rate. In both genomic as well as phenotypic evolution. There- general, cichlids had a significantly higher evolutionary fore, we characterized 3’ UTRs in the genomes of nine rate of 3’ UTR divergence than non-cichlids (Fig. 4b, teleost fishes and documented patterns of genome-wide one sample t-test, p < 0.001). 3’ UTR divergence in cichlid fishes compared to non-cichlid fishes. We found that 3’ UTRs in cichlid fish Gene set enrichment analysis of long and rapid-evolving genomes are on average longer than those in other fish 3’ UTRs lineages. The relative paucity of repetitive elements in 3’ Next, we selected two sets of genes to further examine UTRs as compared to the whole genome in all of the the hypothesis that 3’ UTRs might be important for par- teleosts examined speaks to the functional importance ticular aspects of cichlid diversification by conducting of this region and suggests purifying selection could be GO-enrichment analyses for (1) the genes with relatively operating to keep transposable elements and other inser- longest 3’ UTRs in cichlids and (2) genes that exhibited tions out of 3’ UTRs. Moreover, the on average longer 3’ relatively rapid-evolving 3’ UTRs in cichlids. A top 5% UTRs in cichlids is associated with a greater number of cutoff was applied to 1:1 orthologous genes and thereby miRNA targets. There also appears to be a higher aver- 101 genes were selected for each set. Both of these rela- age evolutionary rate of divergence in 3’ UTR sequence tively longest and fastest evolving 3’ UTRs were then length in cichlids as compared to other teleosts. subjected to enrichment analysis (GO terms and KEGG Additionally, analysis of gene function on both the pathways). There were 47 genes shared between the longest and fastest-evolving cichlid 3’ UTRs showed a “length” set and the “rate” set, which meant 46.5% of strong functional bias towards ribosome-related path- them overlapped. By comparing proteins of D. rerio in ways and translation. These associations suggest that the Ensembl database (release 91), only one of these macro-evolutionary divergence in 3’ UTRs in cichlids might be influencing the core post-transcriptional regu- lation machinery in these rapidly diversifying fishes. In general, our results support a role of 3’ UTRs as meta-regulators, regulators of other gene regulatory mechanisms, in groups undergoing exceptional speci- ation and adaptive phenotypic diversification. The lengths of 3’ UTRs likely represent compromises among a number of factors. While 3’ UTRs are not translated into amino acids, the process of transcription is energetically consuming. We, therefore, expected that most 3’ UTRs should be relatively short and this was supported in our analyses (Additional file 3: Figure S1). The length of a 3’ UTR is also known to be important because it is associated with gene expression levels [44]. Longer 3’ UTRs usually contain more functional motifs and are associated with lower gene expression levels [45]. For the nine teleost fishes examined, their overall 3’ Fig. 3 Predicted miRNA target sites per mRNA in 3’ UTRs. The box UTRs had a median length of 795 nt, while the 3’ UTRs plots show the log -transformed distribution of miRNA targets for of only 1:1 orthologous genes had a median length of the nine teleost fish species. The dots inside the boxes show the 723 nt. It appears that cichlids have more 3’ UTRs longer mean values. The outliers are shown by the dots outside the boxes. The maximum number of miRNA targets in a single 3’ UTR is 62, than the median compared to non-cichlids (Fig. 1b; which can be found in one 3’ UTR in N. brichardi and one 3’ UTR in Additional file 3: Figure S1) and it is confirmed by com- P. formosa. On average, cichlid fishes consistently have more miRNA paring 1:1 orthologous genes that 3’ UTRs on average target sites than non-cichlid fishes were longer in cichlids than in non-cichlids (Fig. 1c), Xiong et al. BMC Genomics (2018) 19:433 Page 6 of 11 a b Fig. 4 Evolutionary rate of 3’ UTR length. a The phylogeny used in rate estimation. The rates were estimated in cichlid and non-cichlid groups separately. b Comparison of evolutionary rate of 3’ UTRs between cichlids and non-cichlids. The ratios of the rate in cichlids to that in non-cichlids are calculated for 2.041 1:1 orthologous genes. These ratios were log -transformed. In general, 3’ UTRs in cichlids have significant higher evolutionary rates than in non-cichlids (one-sample t-test that the mean log ratio = 0.0; p < 0.001). The top 5% relatively fastest evolving 3’ UTRs were further analyzed for enrichment of gene functions (GO terms and KEGG pathways) which could indicate a greater potential for a diversity of variation could be associated with different levels of se- functions in cichlid 3’ UTRs. lective constraints acting on different components of the Repetitive elements (repeats) are mobile genetic units genome [51]. In each of the nine focal species, we identi- that can shape a number of aspects of eukaryotic fied the abundance of repeats in the coding regions, 5’ genome architecture including 3’UTR length [46]. It has UTRs, 3’ UTRs and in the genome as a whole. Our results been proposed that repeats are one of the main contri- show that coding regions are made up of only 2–4% of butors of non-coding variation among different species repeats, and this is not surprising considering the strong [47, 48] and their expansion history and impact on gen- selective pressures on protein coding genes. Additionally, ome diversity both between and within lineages has also in all species, both 5′ and 3’ UTRs contain smaller pro- been documented in many fish [42, 49, 50]. The abun- portions of repeats than the genome as a whole, but dancy of repeats is uneven across the genome, and this higher than the coding regions, which is consistent with Table 2 Gene set enrichment analysis of relatively longest and fastest evolving 3’ UTRs in cichlids. Significant enriched GO terms are presented. CC: cellular component; BP: biological process; MF: molecular function; KEGG: KEGG pathway GO GO term Significant p-value category “length”“rate”“union” CC macromolecular complex – < 0.001 – CC mitochondrial matrix 0.046 –– CC intracellular ribonucleoprotein complex – < 0.001 – CC Ribosome < 0.001 < 0.001 0.001 CC ribosomal subunit < 0.001 –– BP cellular biosynthetic process –– 0.046 BP nucleic acid phosphodiester bond hydrolysis 0.037 –– BP peptide metabolic process < 0.001 < 0.001 < 0.001 BP primary metabolic process – 0.019 – BP Translation < 0.001 < 0.001 – MF nuclease activity 0.013 –– MF RNA binding – < 0.001 – MF structural constituent of ribosome < 0.001 < 0.001 < 0.001 KEGG Ribosome < 0.001 < 0.001 < 0.001 Xiong et al. BMC Genomics (2018) 19:433 Page 7 of 11 the hypothesis that purifying selection acts against the protein synthesis, rather than particular genes such as presence of repeats in UTRs [52, 53]. Focusing on 3’ morphogens or opsins that more directly influence indi- UTRs, Dario rerio stood out from the other teleosts, in vidual phenotypes. Although ribosomal proteins are that it has the largest proportion of repeats in its genome highly conserved in coding regions because of their cru- as well as in its 3’ UTRs. This is likely due to the docu- cial functions in protein synthesis, it has been shown mented genome-wide expansion of repeats in this species that the diversity and differential expression of ribosomal [54, 55]. Astyanax mexicanus has the second largest components, known as specialized ribosomes, can select- proportion of repeats in the 3’ UTRs and this was also ively translate certain sets of genes [61]. This new level of mirrored in the proportion of repetitive elements in its gene regulation can be further linked to many phenotypes, genome. Less variation was present in the other fish spe- such as development [62] and disease [63, 64]. The fast cies investigated in this study, and variation of 3’ UTR evolution of 3’ UTRs of ribosomal genes in cichlids could length among different taxa did not appear to be the result provide the potential for specialized expression of specific of an expansion of repeats (Fig. 2b). This implies other ribosomal genes both spatially and temporally, which mechanisms are likely driving the evolution of 3’ UTR could be an important post-transcriptional regulatory length divergence among teleosts. mechanism that has contributed to the adaptation and The on average lengthening of 3’ UTRs in cichlids could diversification of cichlid fishes. be associated with a larger number of functional motifs, such as RNA binding sites and miRNA targets. Indeed, we Conclusions identified about one more miRNA target per 3’ UTRs in Our results reveal novel patterns of evolution and poten- cichlids compared to the other fish lineages (Fig. 3; tial functional differences in the 3’ UTRs of teleosts in Additional file 6: Table S5). Also, we found that more general and of cichlid fishes in particular. It has long mRNAs are targeted by miRNAs and more mRNAs have been argued that because of the relatively small amount multiple targets in cichlids (Additional file 6: Table S5). of protein-coding divergence, but large phenotypic This could have consequences for cichlid diversification as differences found between groups like humans and more potential miRNA-mRNA functional pairs in cichlids chimpanzees, that differential gene expression and asso- could enable these fishes to regulate gene expression ciated regulatory mechanisms must play an important spatially and temporally to an exceptional degree [8, 56]. role in adaptive evolution [2]. Our analyses suggest that Both coding and non-coding divergence likely contribute what might be most critical during adaptive divergence to diversification. Accelerated sequence evolution of par- is not only the direct regulators of gene expression but ticular coding regions in cichlids has been considered to be the regulators of these more direct regulators. By prefer- associated with their adaptive radiation [42]. Interestingly, entially influencing divergence in other regulatory we also detected that 3’ UTRs have on average evolved sig- mechanisms such as the ribosome and other aspects of nificantly faster in cichlids than in other teleosts (Fig. 4b). the post-translational machinery during the adaptive radi- The rapid evolution of 3’ UTRs within cichlids implicates ation of groups like cichlid fishes, divergence in 3’ UTRs this part of the post-transcriptional regulatory system in could serve as important genomic meta-regulators. structuring the exceptional phenotypic diversification of cichlids [57, 58]. Methods Particular subsets of 3’ UTRs might be especially im- Data collection portant during adaptive diversification. We, therefore, The NCBI database [65] was searched for genomic and examined two sets of evolutionary outliers for further transcriptomic data of teleost fishes. In order to obtain gene function enrichment analysis: (1) genes with rela- high-quality annotations of 3’ UTRs, we applied the follow- tively longest 3’ UTRs and (2) genes with the relatively ing minimal criteria for including species into our data set: fastest evolving 3’ UTRs. The influence of these subsets a) de novo genome is assembled; b) species-specific gene of genes on gene expression could play an important models are predicted; c) transcriptomic data from nume- role in the diversification of cichlids [59, 60]. Surpris- rous tissues is available. These stringent criteria enabled us ingly, the genes in these two datasets most often have to carry out comprehensive gene and isoform discovery in functions related to ribosomal structures and related each species, thus minimizing the potential bias due to un- pathways. The ribosome is the main organelle in the even gene and isoform representation across species. Nine process of translation, the final step of gene expression teleost fishes matched these criteria and were included in after post-transcriptional regulation. In addition, GO our analyses. Specifically, genome assemblies, annotations analyses suggested “RNA binding” was also one of the and RNA sequencing (RNA-Seq) data for five cichlid fishes main functions of these 3’ UTRs. The evolution of 3’ (Neolamprologus brichardi, Pundamilia nyererei, Maylan- UTRs in cichlids appears to have most affected the core dia zebra, Oreochromis niloticus, Haplochromis burtoni) of cellular life, post-transcriptional regulation and and four non-cichlid teleost fishes (Astyanax mexicanus, Xiong et al. BMC Genomics (2018) 19:433 Page 8 of 11 Danio rerio, Oryzias latipes, Poecilia formosa)wereexa- canonical transcript. The sequences of coding regions mined (Additional file 1: Table S1). It is important to note and UTRs of the canonical transcripts were then ex- that the nine selected species underwent the same number tracted from the corresponding genome according to the of whole genome duplication (WGD) events and are not PASA annotation. The frequency distributions of 3’ UTR descended from a lineage containing an extra round of lengths, calculated using 50 nt windows, were built for WGD after the teleost-specific WGD. This gave us more each of the nine teleost fishes. confidence in detecting the correct orthology among genes (see below). To place these species in a comparative frame- Gene clustering work, a time-calibrated phylogenetic tree of these nine tele- Protein sequences of canonical transcripts were ex- ost fishes was acquired from TimeTree [66]. tracted from the PASA annotations and genomes. All-against-all BLASTP v2.2.31+ [71] was employed to Annotation of 3’ UTRs with the PASA pipeline find the potential homologous genes with an E-value − 10 The number of genomes of teleost fishes that are avail- cutoff of 1e . The high-scoring segment pairs (HSPs) able is increasing quickly, but the comparison of 3’ from BLASTP output were conjoined by using Solar UTRs across genomes remains challenging. Although v0.9.6 [72]. The similarity of genes was measured by the the annotations of gene models are usually published bit-score. Most similar genes were clustered to gene with the genomes, the quality of these annotations is families using hcluster_sg v0.5.1, a hierarchical clustering often uneven because different pipelines have been ap- algorithm from the Treefam pipeline [73], with the plied in each species. Moreover, the UTRs are often parameters “-w 5 -s 0.33 -m 100000”. poorly described since most annotations typically focus on the protein-coding regions. Thus, the PASA v2.0.2 Identification of repetitive sequences (repeats) annotation pipeline [67] was employed to incorporate The species-specific libraries of repeat families were gene structures, including UTRs, into existing gene identified by applying RepeatModeler v1.0.8 [74] on the annotation, based on RNA-Seq data. genome of each species. RepeatMasker v4.0.6 [75]was A comprehensive workflow was used to integrate the used to mask interspersed repeats and low complexity available transcriptomes into existing annotations. For DNA sequences in the genomes. To compare the each species, RNA-Seq data from eight to eleven tissues proportion of repetitive elements in different genomic and different developmental stages (Additional file 1: regions across the focal species, sequence repeats were Table S1) was used. First, transcripts for each species annotated in the 3’ UTRs, 5’ UTRs, and in the coding were generated by combining de novo assembly and regions. genome-guided assembly. The RNA-Seq reads were de novo assembled using Trinity v2.4.0 [68] using default Prediction of miRNA target sites. parameters. Moreover, the RNA-Seq raw reads were Sequences of mature miRNAs from teleost fishes were mapped to the corresponding genome using TopHat retrieved from miRBase v21 [76]. Since miRNAs are v2.1.0 [69] also with default parameters and then assem- usually conserved across taxa [77] and species-specific bled into transcripts using the genome-guide model im- miRNAs are not present in the database for all species plemented in Trinity. Then, the gene structures were in this study, we extracted 271 unique mature miRNAs identified according to TopHat mapping results using conserved among vertebrates (Additional file 5: Table S4) Cufflinks v2.2.1 [70]. Lastly, the original annotation, tran- for the target prediction. The miRNA target sites of 3’ scripts, and Cufflinks gene structures were imported into UTRs were predicted using miRanda v3.3a [78]with the PASA annotation pipeline. The annotation output parameters “-en -20 -strict”, which set the minimal free from the first PASA run, transcripts, and Cufflinks gene energy as − 20 kcal/mol and required 5′-seed [79]inthe structures were then used in an additional run of PASA to mRNA-miRNA matches. further refine the gene structure and get the final annota- tions for each of the species (the updated gene models have been uploaded to the Dryad Digital Repository Evolutionary rate of 3’ UTRs (doi:https://doi.org/10.5061/dryad.4581cm37). The evolutionary rates of 3’ UTR length for all 1:1 orthologous genes were examined using the R package Retrieval of coding and UTR sequences “OUwie v1.50” [80] under the Brownian motion model. From the PASA annotations, we selected only transcripts Rates were estimated on the time-calibrated phylogen- that contained stop codons to ensure that the transcripts etic tree for two groups. Specifically, the rates within the were not degraded during RNA sequencing. For genes cichlid clade and in the non-cichlid outgroup were esti- with alternative splicing variants, we selected only the mated separately. Then, the ratio between the estimated transcript with the longest coding region as the rates in cichlids and in non-cichlids was calculated and a Xiong et al. BMC Genomics (2018) 19:433 Page 9 of 11 log transformation was applied to the ratio to normalize Additional file 6: Table S5. Predicted miRNA targets in the focal nine the data for further analyses. teleost fish species. (XLSX 11 kb) Additional file 7: Table S6. GO annotations of the selected genes with relatively longest and fastest-evolving 3’ UTRs in cichlid fishes. (XLSX 35 kb) Gene set enrichment analysis Enrichment analysis was performed on two data sets Abbreviations taken from the analyses on the 1:1 orthologous genes. 3’ UTRs: Three prime untranslated regions; GO: Gene ontology; HSPs: We examined the upper 5% percentile distribution of High-scoreing segment pairs; miRNAs: MicroRNAs; mRNA: Messenger RNA; Nt: Nucleotide; PABPs: Poly(A)-binding proteins; RBPs: RNA-binding proteins; the genes that had relatively longest 3’ UTRs and the Repeats: Repetitive elements; RISC: RNA-induced silencing complex; RNA- upper 5% percentile distribution form the 3’ UTRs Seq: RNA sequencing; TEs: Transposable elements; WGD: Whole genome showing the relative fastest evolutionary rate in cichlids. duplication To identify significantly over-represented gene ontology Acknowledgements (GO) terms and KEGG pathways in the identified genes We thank the two anonymous reviewers for their constructive comments (test sets) when compared to the whole gene set (base- and useful suggestions. And we gratefully acknowledge use of the services and facilities at the University of Konstanz. line set), we used a Fisher’s exact test implemented in g:Profiler [81]. To avoid any bias due to the different qual- Funding ity of the genome annotations, we carried out different ana- This work was supported by a China Scholarship Council fellowship lyses using as baseline the O. niloticus and the D. rerio gene (201306380094) to PX and by a Deutsche Forschungsgemeinschaft Research Grant (FR 3399/1–1) to PF. sets, the best-annotated cichlids and non-cichlids genomes, respectively. The sequence distribution of the GO terms for Availability of data and materials the genes in the test sets was observed graphically using a PASA annotation files in GFF3 format have been uploaded to the Dryad Digital Repository (doi:https://doi.org/10.5061/dryad.4581cm37). multilevel pie representation method as implemented in Blast2GO v4.1.5 [82]. Authors’ contributions PX, DH, and PF performed the analyses. PX, DH, AM, and PF conceived the study. All authors read and approved the final manuscript. Statistics We performed three types of comparisons and performed Ethics approval and consent to participate different statistical analyses based on the type of compari- Not applicable. sons made with the teleost 3’ UTRs. When we compared Competing interests different values within a species, such as the proportion of The authors declare that they have no competing interests. repetitive elements in the whole genome versus the 3’ UTRs, we treated the species as the unit of replication. Publisher’sNote For the comparisons of length and evolutionary rates in- Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. volving large numbers of individual genes, we treated these genes as independent data points for comparisons Author details between cichlids and non-cichlids. When we compared Chair in Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78457 Konstanz, Germany. Radcliffe Institute for the average values, such as miRNA targets, in cichlids ver- Advanced Study, Harvard University, Cambridge, MA 02138, USA. sus non-cichlids, we did not perform any statistical ana- lyses as these groups and their aggregate trait values were Received: 28 February 2018 Accepted: 23 May 2018 not phylogenetically independent of one another, and therefore, in these cases, we only reported the patterns. References 1. Britten RJ, Davidson EH. Gene regulation for higher cells - a theory. Science. 1969;165(3891):349. Additional files 2. King M, Wilson A. Evolution at two levels in humans and chimpanzees. Science. 1975;188(4184):107–16. 3. Shendure J, Balasubramanian S, Church GM, Gilbert W, Rogers J, Schloss JA, Additional file 1: Table S1. Genomic and transcriptomic data of the Waterston RH. DNA sequencing at 40: past, present and future. Nature. nine teleost fish species included in this study. (XLSX 11 kb) 2017;550(7676) Additional file 2: Table S2. Comparison of annotations for the nine 4. Romero IG, Ruvinsky I, Gilad Y. Comparative studies of gene expression and teleost fish species. (XLSX 10 kb) the evolution of gene regulation. Nat Rev Genet. 2012;13(7):505–16. Additional file 3: Figure S1. Frequency distribution of 3’ UTR length in 5. Salinas F, de Boer CG, Abarca V, Garcia V, Cuevas M, Araos S, Larrondo LF, nine teleost species. Figure S2. Mean length of 3’ UTRs without repeats. Martinez C, Cubillos FA. Natural variation in non-coding regions underlying Figure S3. Different classes of repetitive elements in different genomic phenotypic diversity in budding yeast. Sci Rep. 2016;6:21849. components in nine teleost fish species. Figure S4. Multi-level pie charts 6. Krubitzer L, Kaas J. The evolution of the neocortex in mammals: how is of GO terms of selected genes. (PDF 1211 kb) phenotypic diversity generated? Curr Opin Neurobiol. 2005;15(4):444–53. 7. Streelman JT, Peichel CL, Parichy DM. Developmental genetics of adaptation Additional file 4: Table S3. Percentage of repetitive elements in 3’ in fishes: the case for novelty. Annu Rev Ecol Evol Syst. 2007;38(1):655–81. UTRs, 5’ UTRs, coding regions and in the whole genome. (XLSX 13 kb) 8. Franchini P, Xiong P, Fruciano C, Meyer A. The role of microRNAs in the Additional file 5: Table S4. Name and sequence of the 271 teleost repeated parallel diversification of lineages of Midas cichlid fish from miRNAs conserved in vertebrates. (XLSX 18 kb) Nicaragua. Genome biology and evolution. 2016;8(5):1543–55. Xiong et al. BMC Genomics (2018) 19:433 Page 10 of 11 9. Goldstrohm AC, Wickens M. Multifunctional deadenylase complexes cichlid fishes long after Gondwanan rifting. Proceedings Biological sciences. diversify mRNA control. Nat Rev Mol Cell Biol. 2008;9(4):337–44. 2013;280(1770):20131733. 10. Matoulkova E, Michalova E, Vojtesek B, Hrstka R. The role of the 3′ 37. Seehausen O, Mayhew PJ, Van Alphen JJM. Evolution of colour patterns in untranslated region in post-transcriptional regulation of protein expression east African cichlid fish. J Evolution Biol. 1999;12(3):514–34. in mammalian cells. RNA Biol. 2012;9(5):563–76. 38. Fruciano C, Franchini P, Kovacova V, Elmer KR, Henning F, Meyer A. Genetic 11. Andreassi C, Riccio A. To localize or not to localize: mRNA fate is in 3'UTR linkage of distinct adaptive traits in sympatrically speciating crater lake ends. Trends Cell Biol. 2009;19(9):465–74. cichlid fish. Nat Commun. 2016;7:12736. 12. Holloway AK, Lawniczak MK, Mezey JG, Begun DJ, Jones CD. Adaptive gene 39. Hulsey CD, Mims MC, Parnell NF, Streelman JT. Comparative rates of lower jaw expression divergence inferred from population genomics. PLoS Genet. diversification in cichlid adaptive radiations. J Evol Biol. 2010;23(7):1456–67. 2007;3(10):2007–13. 40. Henning F, Machado-Schiaffino G, Baumgarten L, Meyer A. Genetic 13. Gorgoni B, Gray NK. The roles of cytoplasmic poly(a)-binding proteins in dissection of adaptive form and function in rapidly speciating cichlid fishes. regulating gene expression: a developmental perspective. Briefings in Evolution. 2017;71(5):1297–312. functional genomics & proteomics. 2004;3(2):125–41. 41. O'Quin KE, Hofmann CM, Hofmann HA, Carleton KL. Parallel evolution of 14. Bartel DP. MicroRNAs. Cell. 2004;116(2):281–97. opsin gene expression in African cichlid fishes. Mol Biol Evol. 2010;27(12): 15. Gregory RI, Chendrimada TP, Cooch N, Shiekhattar R. Human RISC couples 2839–54. microRNA biogenesis and posttranscriptional gene silencing. Cell. 2005; 42. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, Simakov O, Ng AY, 123(4):631–40. Lim ZW, Bezault E, et al. The genomic substrate for adaptive radiation in African cichlid fish. Nature. 2014;513(7518):375–81. 16. Hu W, Coller J. What comes first: translational repression or mRNA degradation? The deepening mystery of microRNA function. Cell Res. 2012; 43. Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP. The impact of 22(9):1322–4. microRNAs on protein output. Nature. 2008;455(7209):64–71. 17. Smit AF. Interspersed repeats and other mementos of transposable 44. Ji Z, Lee JY, Pan Z, Jiang B, Tian B. Progressive lengthening of 3′ elements in mammalian genomes. Curr Opin Genet Dev. 1999;9(6):657–63. untranslated regions of mRNAs by alternative polyadenylation during 18. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, mouse embryonic development. Proc Natl Acad Sci U S A. 2009;106(17): Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the 7028–33. human genome. Nature. 2001;409(6822):860–921. 45. Barrett LW, Fletcher S, Wilton SD. Regulation of eukaryotic gene expression 19. Kidwell MG, Lisch D. Transposable elements as sources of variation in by the untranslated gene regions and other non-coding elements. Cellular animals and plants. P Natl Acad Sci USA. 1997;94(15):7704–11. and molecular life sciences : CMLS. 2012;69(21):3613–34. 20. Hedges DJ, Deininger PL. Inviting instability: transposable elements, double- 46. Feschotte C, Pritham EJ. DNA transposons and the evolution of eukaryotic strand breaks, and the maintenance of genome integrity. Mutat Res. 2007; genomes. Annu Rev Genet. 2007;41:331–68. 616(1–2):46–59. 47. Lee SI, Kim NS. Transposable elements and genome size variations in plants. 21. Lerman DN, Michalak P, Helin AB, Bettencourt BR, Feder ME. Modification of Genomics & informatics. 2014;12(3):87–97. heat-shock gene expression in Drosophila melanogaster populations via 48. Kidwell MG. Transposable elements and the evolution of genome size in transposable elements. Mol Biol Evol. 2003;20(1):135–44. eukaryotes. Genetica. 2002;115(1):49–63. 22. Wright S, Finnegan D. Genome evolution: sex and the transposable 49. Chalopin D, Naville M, Plard F, Galiana D, Volff JN. Comparative analysis of element. Current biology : CB. 2001;11(8):R296–9. transposable elements highlights Mobilome diversity and evolution in 23. Szitenberg A, Cha S, Opperman CH, Bird DM, Blaxter ML, Lunt DH. Genetic vertebrates. Genome Biology and Evolution. 2015;7(2):567–80. drift, not life history or RNAi, determine long-term evolution of transposable 50. Gao B, Shen D, Xue SL, Chen C, Cui HM, Song CY. The contribution of elements. Genome biology and evolution. 2016;8(9):2964–78. transposable elements to size variations between four teleost genomes. 24. Charlesworth B, Charlesworth D. The population-dynamics of transposable Mobile DNA-Uk. 2016;7 elements. Genet Res. 1983;42(1):1–27. 51. Gonzalez J, Petrov DA. The adaptive role of transposable elements in the 25. Chen CY, Chen ST, Juan HF, Huang HC. Lengthening of 3'UTR increases Drosophila genome. Gene. 2009;448(2):124–33. with morphological complexity in animal evolution. Bioinformatics. 2012; 52. Pereira V. Insertion bias and purifying selection of retrotransposons in the 28(24):3178–81. Arabidopsis thaliana genome. Genome Biol. 2004;5(10):R79. 26. Miura P, Shenker S, Andreu-Agullo C, Westholm JO, Lai EC. Widespread and 53. Barron MG, Fiston-Lavier AS, Petrov DA, Gonzalez J. Population genomics of extensive lengthening of 3 ' UTRs in the mammalian brain. Genome Res. transposable elements in Drosophila. Annu Rev Genet. 2014;48:561–81. 2013;23(5):812–25. 54. Moss SP, Joyce DA, Humphries S, Tindall KJ, Lunt DH. Comparative analysis 27. Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S. Structural and of teleost genome sequences reveals an ancient intron size expansion in functional features of eukaryotic mRNA untranslated regions. Gene. 2001; the zebrafish lineage. Genome biology and evolution. 2011;3:1187–96. 276(1–2):73–81. 55. Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, 28. Gavrilets S, Losos JB. Adaptive radiation: contrasting theory with data. Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome Science. 2009;323(5915):732–7. sequence and its relationship to the human genome. Nature. 2013; 29. Barrier M, Robichaux RH, Purugganan MD. Accelerated regulatory gene evolution 496(7446):498–503. in an adaptive radiation. Proc Natl Acad Sci U S A. 2001;98(18):10208–13. 56. Li J, Zhang Z. miRNA regulatory variation in human evolution. Trends in 30. Luikart G, England PR, Tallmon D, Jordan S, Taberlet P. The power and genetics : TIG. 2013;29(2):116–24. promise of population genomics: from genotyping to genome typing. Nat 57. Kuraku S, Meyer A, Kuratani S. Timing of genome duplications relative to Rev Genet. 2003;4(12):981–94. the origin of the vertebrates: did cyclostomes diverge before or after? Mol 31. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Biol Evol. 2009;26(1):47–59. Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification 58. Henning F, Meyer A. The evolutionary genomics of cichlid fishes: explosive of biology. The Gene Ontology Consortium. Nat. gen. 2000;25(1):25–9. speciation and adaptation in the postgenomic era. Annu Rev Genomics 32. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a Hum Genet. 2014;15:417–41. reference resource for gene and protein annotation. Nucleic Acids Res. 59. Brand AH, Perrimon N. Targeted gene-expression as a means of altering 2016;44(D1):D457–62. cell fates and generating dominant phenotypes. Development. 1993; 33. Hulsey CD. Cichlid genomics and phenotypic diversity in a comparative 118(2):401–15. context. Integr Comp Biol. 2009;49(6):618–29. 60. Vu V, Verster AJ, Schertzberg M, Chuluunbaatar T, Spensley M, Pajkic D, Hart 34. Meyer A. Phylogenetic relationships and evolutionary processes in east GT, Moffat J, Fraser AG. Natural variation in gene expression modulates the African cichlid fishes. Trends Ecol Evol. 1993;8(8):279–84. severity of mutant phenotypes. Cell. 2015;162(2):391–402. 35. Meyer A, Kocher TD, Basasibwaki P, Wilson AC. Monophyletic origin of Lake 61. Xue S, Barna M. Specialized ribosomes: a new frontier in gene regulation Victoria cichlid fishes suggested by mitochondrial DNA sequences. Nature. and organismal biology. Nat Rev Mol Cell Biol. 2012;13(6):355–69. 1990;347(6293):550–3. 62. Kondrashov N, Pusic A, Stumpf CR, Shimizu K, Hsieh AC, Ishijima J, Shiroishi 36. Friedman M, Keck BP, Dornburg A, Eytan RI, Martin CH, Hulsey CD, T, Barna M. Ribosome-mediated specificity in Hox mRNA translation and Wainwright PC, Near TJ. Molecular and fossil evidence place the origin of vertebrate tissue patterning. Cell. 2011;145(3):383–97. Xiong et al. BMC Genomics (2018) 19:433 Page 11 of 11 63. Wang W, Nag S, Zhang X, Wang MH, Wang H, Zhou J, Zhang R. Ribosomal proteins and human diseases: pathogenesis, molecular mechanisms, and therapeutic implications. Med Res Rev. 2015;35(2):225–85. 64. Guimaraes JC, Zavolan M. Patterns of ribosomal protein expression specify normal and malignant human cells. Genome Biol. 2016;17(1):236. 65. Coordinators NR. Database resources of the National Center for biotechnology information. Nucleic Acids Res. 2017;45(D1):D12–7. 66. Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, Timetrees, and divergence times. Mol Biol Evol. 2017;34(7):1812–9. 67. Haas BJ. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66. 68. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52. 69. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. 70. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5. 71. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10. 72. Li H: solar. In.; 2006. 73. Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, Heriche JK, Hu Y, Kristiansen K, Li R, et al. TreeFam: 2008 update. Nucleic Acids Res. 2008; 36(Database issue):D735–40. 74. Smit A, Hubley R: RepeatModeler Open-1.0. In.; 2008-2015. 75. Smit A, Hubley R & Green P: RepeatMasker Open-4.0. In.; 2013-2015. 76. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014; 42(Database issue):D68–73. 77. Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19(1):92–105. 78. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. MicroRNA targets in Drosophila. Genome Biol. 2003;5(1):R1. 79. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136(2):215–33. 80. Beaulieu JM, Jhwueng DC, Boettiger C, O'Meara BC. Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 2012;66(8):2369–83. 81. Reimand J, Arak T, Adler P, Kolberg L, Reisberg S, Peterson H, Vilo J. G: profiler-a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 2016;44(W1):W83–9. 82. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.

Journal

BMC GenomicsSpringer Journals

Published: Jun 5, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off