Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss

Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss microRNAs are conserved noncoding regulatory factors implicated in diverse physiological and developmental processes in multi- cellular organisms, as causal macroevolutionary agents and for phylogeny inference. However, the conservation and phylogenetic utility of microRNAs has been questioned on evidence of pervasive loss. Here, we show that apparent widespread losses are, largely, an artefact of poorly sampled and annotated microRNAomes. Using a curated data set of animal microRNAomes, we reject the view that miRNA families are never lost, but they are rarely lost (92% are never lost). A small number of families account for a majority of losses (1.7% of families account for>45% losses), and losses are associated with lineages exhibiting phenotypic simplification. Phylogenetic analyses based on the presence/absence of microRNA families among animal lineages, and based on microRNA sequences among Osteichthyes, demonstrate the power of these small data sets in phylogenetic inference. Perceptions of widespread evolutionary loss of microRNA families are due to the uncritical use of public archives corrupted by spurious microRNA annotations, and failure to discriminate false absences that occur because of incomplete microRNAome annotation. Key words: microRNAs, annotation, evolution, birth, death, phylogeny. Introduction (Thomson et al. 2014; Hertel and Stadler 2015). However, microRNAs (miRNAs) are short noncoding RNA molecules pre- there are two key challenges to inferring the evolutionary his- sent in the genomes of animals, plants, fungi, and both green tory of miRNAs. The first is incomplete annotation of the and brown algae (Tarver et al. 2012); they are important bio- miRNAome of an organism as a consequence of incomplete medically (Sayed and Abdellatif 2011), agriculturally (Zhou and genome and/or small RNA sequencing, an ascertainment bias Luo 2013), developmentally (Plasterk 2006), as causal agents of that results in false-negatives (Thomson et al. 2014; Fromm macroevolutionary change (Heimberg et al. 2008; Peterson et al. 2015). The second challenge is the misannotation of ran- et al. 2009), and in phylogenetic inference (Tarver et al. dom or degraded sequence, or of other classes of small RNAs, 2013). Though they are thought largely to evolve from random as miRNAs (Chiang et al. 2010; Hansen et al. 2011; Wang and sequence, they have been perceived to be rarely lost after as- Liu 2011; Meng et al. 2012; Tarver et al. 2012; Castellano and suming a regulatory function (Tarver et al. 2013). Recent stud- Stebbing 2013; Taylor et al. 2014, 2017), which results in false- ies have called this prevailing view into question (Guerra- positives. The occurrence of false-positives is so widespread Assunc¸ao and Enright 2012; Meunier et al. 2013; Thomson that some have argued that only 16% of metazoan miRNA et al. 2014; Hertel and Stadler 2015), arguing that miRNA fam- annotations in the canonical miRNA database (Kozomara and ilies exhibit rates of loss approaching 80% in some species Griffiths-Jones 2014) are genuine (Fromm et al. 2015). The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Genome Biol. Evol. 10(6):1457–1470. doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1457 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE In an attempt to obtain a realistic understanding of miRNA comparison of the results of the analyses of the two data sets, family evolution, we exploited a curated data set of eumeta- but it biases the analysis in favor of the uncurated data set by zoan miRNA families in which these biases are minimized raising the overall quality of annotation, minimizing the im- (Fromm et al. 2015). This data set includes all metazoan pact of noise reduction in the curated data set. Not all taxa that have been the subject of exhaustive miRNAome miRBase miRNA genes are assigned to families,—particularly annotation based on genome and small RNA transcriptome in cases where there is only a single gene in the family. We sequencing. We present an analysis of the dynamics of meta- included these as individual families containing the single zoan miRNA evolution, demonstrating that miRNA families gene. Some genes are named correctly but not assigned to are among the most conserved of phylogenetic characters. the correct family. For example, in Schmidtea mediterranea, Most losses can be attributed to a small number of families there are three genes which have correctly been given the that have been repeatedly lost, and a small number of line- name miR-7 but which are not included in the miR-7 family ages exhibiting reduced phenotypic complexity (e.g., as in- grouping on miRBase v.21 (Kozomara and Griffiths-Jones ferred from cell diversity; Valentine et al. 1994). Finally, our 2014). In such instances, we included each of these as a sep- phylogenetic analyses of 1) the distribution of miRNA families, arate family. The alternative would have been to hand curate and 2) their nucleotide sequences, demonstrate the power of the data set, which would be unrepresentative of miRBase, or miRNAs in recovering phylogenetic resolution at the scale of else we could assign genes to families based on their name. the animal kingdom. However, this would introduce additional artifact since many genes correctly assigned to families in miRBase have different names (e.g., mmu-mir-300 is currently correctly listed in the miR-154 family, not the miR-300 family, despite the differ- Materials and Methods ence in names). Thus, for consistency, we have followed the Taxon Sampling family groupings that are listed on miRBase and ignored the To minimize false negatives we constrained our analyses to names of the genes. only those taxa with high-coverage genomic and small RNA sequencing. To eliminate the effect of transcriptional noise Phylogeny and Rates of Evolution (i.e., non-miRNA sequences), we used the curated data set of Fromm et al. (2015) which is based on a reanalysis of the The phylogenetic relationships between taxa included in the entire compliment of eumetazoan miRNA families in miRBase analysis are not controversial and a constraint timetree was V.21 (Kozomara and Griffiths-Jones 2014) following estab- used based on previous molecular clock studies (Dabert et al. lished criteria (Ambros 2003; Kozomara and Griffiths-Jones ^ 2010; Pyron 2010; Erwin et al. 2011; Reis et al. 2011; Crete- 2011). Fromm and colleagues found that of the 7,095 anno- Lafrenie ` re et al. 2012; Tarver et al. 2016) onto which the evo- tated metazoan miRNA families in miRBase V.21, only 1,178 lution of microRNAs was mapped. This allowed us to estimate (16.60%) meet all of the criteria required for correct annota- rates of miRNA gain and loss in two ways. First, for the analyses tion, with a further 2,104 (29.7%) meeting some but not all of the Homo and Drosophila lineages, numbers of shared of the criteria required for validation (e.g., lacking the prod- miRNA families were calculated for the internal nodes of the 0 0 ucts of both the 5 and 3 arms). Taxa with low coverage phylogeny, so that the branching pattern could be observed. (<6) genomes (e.g., Taeniopygia guttata), poorly annotated Second, for the total tree, the rate for each branch was aver- small RNA repertoires (e.g., Ovis aries), and uncertain phylo- aged across the length of the branch with rates calculated as genetic position (e.g., Strongyloides ratti), were removed, re- birth/death per million years. About 1,000 simulations were run ducing the data set to 1,139 miRNA families from 35 species, to generate confidence intervals around the average values. representing a broad range of metazoan taxa (supplementary We investigated the phylogenetic and temporal distribu- file 1, Supplementary Material online). This data set minimizes tion of gains and losses using a novel stochastic character both false negatives and any bias from misannotated miRNAs, mapping approach under a Dollo assumption; this model is providing the basis for an approximately unbiased assessment employed even by those who argue that miRNA loss is prev- of the rates of miRNA family birth and death. alent (Thomson et al. 2014) since there is no evidence for the To establish the impact of data curation, we compiled an convergent evolution of miRNA families (Tarver et al. 2013). uncurated data set (supplementary file 2, Supplementary First, we assumed that characters were gained only once and Material online) from miRBase v.21 (Kozomara and that the gain occurred on either the terminal branch (for Griffiths-Jones 2014) which we subjected to parallel analyses. singletons), or the branch subtending the least inclusive clade Not all of the taxa in our curated data set appear in miRBase of taxa exhibiting presence of that miRNA. Without further v.21 (Kozomara and Griffiths-Jones 2014), viz. Alligator mis- information to constrain its position, we assumed equal prob- sissippiensis, Melibe maugeana, Columba livia,and Chrysemys ability of miRNA family gain having occurred at any point picta, and so we included the curated annotations for these along the branch and the precise timing of gain time was taxa in the otherwise uncurated data set. This facilitates direct generated by randomly sampling from a uniform distribution 1458 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE between the beginning and the end of this branch. Thus, in Phylogenetics practice, gains will always occur on the same branch, but the To evaluate the suitability of miRNAs as phylogenetic characters, precise position (and hence timing) is determined stochastically we first compiled a data matrix using the complete data set of all and will vary each time the algorithm is run. Second, we as- miRNA families, coding their presence and absence (supplemen- sumed no losses occurred in the case of singletons but allowed tary file 1, Supplementary Material online). We then estimated stochastic losses to occur in instances of nonsingletons. Where the level of homoplasy in this data set using the Consistency losses were possible, these were simulated by using the make.- Index, following Sanderson and Donoghue (1989, 1996), with simmap function in the phytools package (Revell 2012)by plac- comparisons made to their data set of 101 morphological and ing a hard prior probability of presence on the root, and then molecular matrices. The Consistency Index was calculated in only allowing losses (1–0 transitions) in the model. When losses MacClade with all characters considered to be unordered. were observed, the branch on which they were inferred to have We conducted phylogenetic analyses of the curated and occurred was recorded, as well as the position along that uncurated data sets (supplementary files 1 and 2, branch (effectively a temporal estimate). This whole process Supplementary Material online) using a stochastic Dollo binary was achieved using the custom-written function DolloSCM in substitution model (Nicholls and Gray 2008) implemented in the package Claddis (Lloyd 2016). Because the algorithm is RevBayes (Hohna 2016), with four discrete gamma loss rate stochastic, it was run 1,000 times in order to generate mean categories across sites, and the homogeneous origination rate and 95% confidence intervals. Similarly, the output allows k integrated out of the likelihood equation (see Alekseyenko both phylogenetic and temporal positions to be examined for et al. 2008). We applied an ascertainment bias correction to each inferred transition, which can themselves be subdivided account for unobserved miRNA families lost in all species into gains (0–1 transitions) and losses (1–0 transitions). We (Felsenstein 1992). We conducted a second analysis with all used this output to estimate per-branch rates for gains-only singleton families removed, applying a correction to account (as losses are necessarily biased both phylogenetically, to- for the absence of both singletons as well as families lost in all ward nested branches, and temporally, toward the pre- species (Alekseyenko et al. 2008; Nicholls and Gray 2008). sent, because they must occur after a gain). We also The RevBayes script used for this analysis is provided in sup- generated separate (gains and losses) time series of rates plementary file 3, Supplementary Material online. (changes per 10-Myr time bin) from 720 Ma to the present. Convergence was assessed using the tracecomp and bpcomp Finally, we compared inferred losses across all miRNAs to programs in the PhyloBayes package (Lartillot et al. 2013), simple minimum and maximum possible values based on ensuring a maxdiff statistic< 0.05, a minimum effsize> 100 the observed data. Here, the minimum was set at zero (no and maximum rel_diff< 0.1 across all parameters. Finally, we losses). The maximum is more complex to estimate, but conducted phylogenetic analyses of concatenated pri-miRNA here we used a simple rule, viz.: 1) if a singleton then no sequences. This alignment (supplementary file 4, losses could be observed; 2) if presence of the miRNA is Supplementary Material online) was run under a GTRþ G presentin all members ofacladeoftwo tips then, again, model in PhyloBayes. no losses could be observed; 3) if the least inclusive clade exhibiting the miRNA contains more than two taxa, a loss could be observed and the maximum number of losses is Results 2N,where N is the number of taxa in that clade (i.e., with Birth and Death of microRNA Families all losses occurring on terminal branches). These rules were The curated data set reveals a significant difference in the then used to sum the maximum number of losses across all number of miRNA families present in the most completely miRNAs. The inferred total number of losses can vary (due annotated deuterostome (Homo sapiens) and protostome to the stochastic nature of the algorithm) and, thus, we (Drosophila melanogaster), with 300 in H. sapiens, and 132 considered the full distribution of our 1,000 runs when in D. melanogaster (fig. 1). This difference in cumulative comparing between minimum and maximum losses. The miRNA family acquisition is widely recognized (Hertel 2006; Dollo model necessitates that characters are rarely lost and Heimberg et al. 2008; Wheeler et al. 2009; Fromm et al. this may bias toward higher rates of gain. To avoid this po- 2015); the large complement of human miRNAs inferred to tential artefact, we used a model that would produce much be a consequence of episodic bursts in miRNA acquisition at lower rates of gain compared with loss. Using DiscML (Kim the origin of bilaterians, vertebrates, placentals, and primates and Hao 2014) in R, we estimated single rates of gain (0.16) (Heimberg et al. 2008; Wheeler et al. 2009; Tanzer et al. and loss (1.64) for all characters, with the root set to miRNA 2010). Likewise, the absolute number of miRNA family losses absence. Using these estimates, we calculated maximum like- varies between species (23 in H. sapiens;10 in D. mela- lihood ancestral states in APE (Paradis et al. 2004)which we nogaster). We find a low proportion of loss (0.07) in these used to estimate rates of evolution in Claddis (Lloyd 2016). All lineages, with an overall ratio of 13.0 gains to every 1 loss in analyses were performed in R (RCore Team 2016) and the H. sapiens and 13.2 gains to each loss in D. melanogaster. code and data used are available from Dryad. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1459 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE H. sapiens Homo sapiens Drosophila melanogaster Gains (300) 92.8% 93.0% D. melanogaster Gains (132) H. sapiens losses (23) D.melanogaster Losses (10) 800 700 600 500 400 300 200 100 0 Time (Millions of years before present) FIG.1.—Cumulative numbers of gains and losses of miRNA families in the lineages leading to the two most completely sampled taxa (Homo and Drosophila) over the past 720 Myr based on our curated data set. To determine the extent to which these lineages are rep- with an average length of 196 Myr, suggesting that the broad resentative of the broader pattern of miRNA gain and loss CIs reflect variable, rather than constant rates of character across metazoans, we mapped all character state changes change. Thus, if we consider the 44 gains on the lineage lead- onto the known phylogeny, with gains and losses calculated ing to vertebrates (617–510 Ma), the CI spans a 6-fold range for each taxon, running from the root to tree tips. This reveals from 0.025 to 0.15 gains per million years. There is a significant an average gain to loss ratio per taxon of 11.4 gains for every increase in the evolution of miRNA families corresponding to loss, equating to 0.08 loss proportion (table 1), with a me- the evolution of Bilateria during the Cryogenian. The rate dian ratio of 10.5:1 and a loss proportion of 0.087. decreases until an Ediacaran rise followed by a steady Of the 1,143 miRNA families in the curated data set, 1,052 Cambrian–Permian decrease in the rate of gains. The subse- (92%) exhibit no losses, 52 are lost once, 20 are lost twice, 10 quent rate of miRNA family gains has increased steadily, sub- are lost three times, 7 are lost four times, miR-2001 is lost five stantially from the Cretaceous. In contrast, losses exhibit an times, and miR-315 is lost six times (fig. 2). Thus, although approximately constant (and low) rate from the Cryogenian. only 91 individual families exhibit losses, we observe a total of Numerous internal branches exhibit significantly high rates of 161 losses, and these are distributed unevenly across the tree. miRNA family acquisition (e.g., stem-lineages of bilaterians, Of the 68 branches with character changes, 24 show only protostomes, vertebrates, placentals, and primates; supple- gains of miRNA families, a further 20 show 75% gains, mentary file 5, Supplementary Material online). Some external while only eight branches are characterized by more losses branches show significantly high acquisition rates (most pla- than gains (fig. 3A). centals, birds, drosophilids, and nematodes). Significantly low When character state changes are mapped through time rates of miRNA family acquisition are observed on internal (fig. 3B), average gains outweigh losses, although the confi- branches leading to mammals and birds. These results obtain dence intervals overlap until the mid-Permian. Confidence regardless of whether the Dollo Parsimony or likelihood-based intervals are constrained by relatively long branch lengths models of ancestral state reconstruction are used as a basis for 1460 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Cumulative number of microRNA families Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE Table 1 Sum Total of miRNA Family Gains and Losses Along the Path from the Root of the Phylogenetic Tree to Each Tip Based on our Curated Data Set Node Lineage Gains Losses Gain:Loss Percentage Ratio Loss 37 Amphimedon 80 N/A N/A 38 Nematostella 28 0 N/A N/A 40 Capitella 68 0 N/A N/A 39 Melibe 61 2 30.50 3.28 47 Apis 77 4 19.25 5.19 46 Ixodes 57 3 19.00 5.26 54 Branchiostoma 56 3 18.67 5.36 400 52 Saccoglossus 54 3 18.00 5.56 56 Petromyzon 99 6 16.50 6.06 60 Chrysemys 130 8 16.25 6.15 63 Columba 154 10 15.40 6.49 48 Tribolium 88 6 14.67 6.82 61 Alligator 128 9 14.22 7.03 7 1 1 70 Homo 301 22 13.68 7.31 0 1 2 3 4 5 6 7 8 9 10 51 D. melanogaster 133 10 13.30 7.52 Observed number of losses per miRNA family 53 Strongylocentrotus 46 4 11.50 8.70 FIG.2.—The observed number of losses of miRNA families inferred 62 Gallus 146 13 11.23 8.90 from our phylogenetic analysis of the curated data set. 71 Macaca 235 21 11.19 8.94 59 Anolis 124 12 10.33 9.68 67 Bos 219 22 9.95 10.05 Utility of microRNA Families in Phylogenetics 50 D. pseudoobscura 99 10 9.90 10.10 The utility of a marker for phylogenetic inference is related to 44 Celegans 105 11 9.55 10.48 the level of homoplasy that it exhibits. In order to estimate the 58 Salmon 113 12 9.42 10.62 level of homoplasy in our miRNA family data set, we used the 57 Danio 115 13 8.85 11.30 Consistency Index (CI) to characterize the number of times a 49 Bombyx 88 10 8.80 11.36 66 Sus 198 24 8.25 12.12 miRNA family is gained and lost on the tree. CI is based on 69 Mus 259 33 7.85 12.74 the minimum number of changes required by a character 65 Monodelphis 143 19 7.53 13.29 or data set, divided by the actual number of changes 41 Schmidtea 58 9 6.44 15.52 (Kluge and Farris 1969); values range from 0 to 1, where a 64 Ornithorhynchus 129 21 6.14 16.28 CI of 1 reflects the minimum number of changes and, there- 68 Rattus 221 36 6.14 16.29 fore, zero homoplasy. The expected CI can be inferred based 43 Pristionchus 140 25 5.60 17.86 on an empirically derived formula (Sanderson and Donoghue 42 Ascaris 64 14 4.57 21.88 1996); for a categorical data set of 35 taxa it is  0.50, and for 45 Tetranychus 56 13 4.31 23.21 molecular sequence data  0.64. In contrast, the CI observed 55 Ciona 40 14 2.86 35.00 for our miRNA data sets is 0.88, indicating that miRNA families NOTE.—The gain to loss ratio, and percentage loss, for each individual lineage scored simply by presence/absence, exhibit surprisingly low are also shown. Thus, the three gains and two losses observed on the lineage leading to chordates are recorded in all descendent taxa. levels of homoplasy and, as such, should be reliable phyloge- netic markers. Having established the empirical basis for the phylogenetic utility of miRNAs, we conducted a phylogenetic analysis of the analysis of the curated data set (supplementary file 6, same data set using a stochastic Dollo model of binary char- Supplementary Material online). The likelihood-based analysis acter evolution, implemented in RevBayes. This recovered differs principally in inferring a greater flux of miRNA family trees that were largely compatible, even if not always highly gain and loss deep within metazoan evolution, though this is supported, with current knowledge of metazoan evolution likely an analytic artefact of the uninformative outgroup that (fig. 4), suggesting that previous unorthodox results obtained lacks miRNAs altogether (supplementary file 6, Supplementary using BEAST were at least partly driven by model mis- Material online). specification (previous analyses lacked ascertainment bias cor- Analysis of the uncurated miRBase data set yielded results rection and did not use a Gamma distribution). Parallel anal- that imply an order of magnitude higher rate of gain and loss of yses of the uncurated miRBase data set (supplementary file 2, miRNAs within the Cenozoic, toward the tips of the phyloge- Supplementary Material online) yielded trees that are much netic tree, in comparison to the rates inferred from the curated more incongruent with current perceptions of metazoan data set (supplementary file 7, Supplementary Material online). Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1461 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Number of miRNA families Nematodes Arthropods Vertebrates Insects Amniotes Mammals loss Tarver et al. GBE 1000 800 700 600 500 400 300 200 900 100 0 Chanoflagellate Monosiga 8 Amphimedon Sponge Cnidarian 28 Nematostella Mollusc Melibe Annelid 19 Capitella Platyhelminth 21 Schmidtea 23 Ascaris 106 Pristionchus C. elegans 1 18 Tetranychus Ixodes 14 Apis 18 Tribolium 27 Bombyx D. pseudoobscura 58 D. melanogaster 18 Saccoglossus Hemichordate Echinoderm Strongylocentrotus Branchiostoma Amphioxus 21 Tunicate Ciona 16 Petromyzon Danio Salmo 44 5 Anolis 3 19 17 Chrysemys 17 Alligator Gallus 36 Columbia Ornithorhynchus 31 Monodelphis gain 10 Sus 8 3 30 Bos 78 10 Rattus 46 Mus Homo 13 Macaca 1000 900 800 700 600 400 300 200 100 0 Cryogenian Ediacaran Camb Ord Sil Devon Carbon Perm Trias Jurass Cretace Pal Ng FIG.3.—The gains and losses of miRNA families for all taxa with well-annotated miRNAomes, (A) per branch within the phylogenetic tree, (B) per unit time across the phylogenetic tree. miRNA gains showed an increased rate toward the Recent, compatible with some theories which posit an early transient phase in which miRNA gain and loss is rapid (Peterson et al. 2009), making early losses difficult to detect. phylogeny (fig. 5), regardless of whether singletons were in- paraphyletic with respect to mammals) or excluded (little res- cluded (protostomes resolved as paraphyletic with respect to olution among invertebrates; lizard resolved as sister to mam- deuterostomes; lizards, and archosaurs resolved as mals rather than archosaurs). 1462 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 0.0 0.1 0.2 0.3 0.4 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE 0.98 0.79 1 1 1 1 0.85 1 0.81 1 0.83 0.89 0.95 1 0.57 0.52 0.76 0.94 0.61 1 1 0.97 0.52 0.59 1 0.62 0.81 1 1 1 1 0.84 0.58 0.92 0.51 0.76 1 1 0.55 0.92 1 0.56 0.92 0.98 0.94 0.63 0.92 FIG.4.—Phylogenetic tree derived from the Dollo analysis of the curated presence/absence of microRNA families considered in our analysis. (A)Shows results whereby all miRNAs were included and (B) where singletons (synapomorphic) miRNAs were excluded. Node values are clade posterior probabilities. The results of the tetrapod superalignment are congruent acquisition rates (e.g., at the origins of Bilateria, Protostomia, with established phylogenies (fig. 6). Some parts of the tree, Vertebrata, and Placentalia), while others show significantly such as the higher level relationships within Laurasiatheria, are low acquisition rates (e.g., at the origins of Aves, and unresolved or have low levels of support, however, these nodes Mammalia). Terminal branches also show highly variable have traditionally proven hard to resolve (Tarver et al. 2016). rates; this likely reflects the depth of miRNA annotation rather This expanded data set provides additional support for turtles than genuine biological signal, as many of these taxa are as archosaurs (Field et al. 2014), and resolves an model organisms. The results of analyses of the uncurated Atlantogenata–Boreoeutheria root for placental mammals miRBase data set imply several factors higher numbers and (Tarver et al. 2016). frequency of losses compared with the curated data set; the general pattern of miRNA family gain and loss is the same through the Phanerozoic, except for the Cenozoic where the uncurated data set implies an order of magnitude higher Discussion number and rate of gains and losses (supplementary file 7, Diversification of miRNA Families Supplementary Material online). These differences reflect the The results of our analyses based on the curated data set impact of curation and the importance of using curated data show highly variable rates in the diversification of miRNA fam- from high coverage genomes in attempting to obtain an ac- ilies, with some internal branches showing significantly high curate perspective on miRNA family evolution; uncurated Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1463 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Monosiga Monosiga Amphimedon Amphimedon Nematostella Nematostella Ascaris Ciona Pristionchus Branchiostoma C. elegans Strongylocentrotus Tetranychus Saccoglossus Ixodes Schmidtea Apis Melibe Tribolium Capitella Bombyx Ascaris D. pseudoobscura Pristionchus D. melanogaster C. elegans Schmidtea Tetranychus Melibe Ixodes Apis Capitella Strongylocentrotus Tribolium Saccoglossus Bombyx Branchiostoma D. pseudoobscura Ciona D. melanogaster Petromyzon Petromyzon Salmon Salmon Danio Danio Anolis Anolis Chrysemys Chrysemys Alligator Alligator Gallus Gallus Columba Columba Ornithorhynchus Ornithorhynchus Monodelphis Monodelphis Sus Sus Bos Bos Rattus Rattus Mus Mus Macaca Macaca Homo Homo Tarver et al. GBE 0.75 0.99 0.83 0.84 0.88 0.99 0.99 0.82 0.74 0.98 0.89 0.87 0.99 0.5 0.97 0.99 0.99 0.83 0.99 0.98 0.98 0.98 0.61 0.52 0.53 0.79 0.99 0.99 0.99 0.7 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.75 0.99 0.95 0.96 0.99 0.99 0.83 0.99 0.99 0.99 0.81 0.5 0.99 0.99 0.99 0.99 0.99 0.99 0.97 FIG.5.—Phylogenetic tree derived from the Dollo analysis of presence/absence of microRNA families in our uncurated miRBase data set. (A) Shows results whereby all miRNAs were included and (B) where singletons (synapomorphic) miRNAs were excluded. Node values are clade posterior probabilities. miRNA data sets yields spurious perceptions of miRNA and death rates. High birth and death rates have been inferred evolution. primarily based on analyses of Drosophila species. Lu et al. The diversification of microRNAs is the result of birth and (2008) has been criticized for misannotation of degraded death of individual miRNA families. Thus, apparent shifts in transcriptional sequences as miRNAs (Berezikov 2010). the miRNA diversification rate can be caused by 1) variable Nozawa et al. (2010) also inferred high death rates of birth rates, 2) variable death rates, or 3) a combination of miRNA families in Drosophila but their study did not control both. Previous work has suggested that shifts in miRNA diver- for the validity of miRNAs in miRBase or variable genome sification rates are caused by increased birth rates (Heimberg quality, and compounded these problems by assaying et al. 2008, 2010; Iwama et al. 2013; Platt et al. 2014), or miRNA evolution at the gene level, increasing false- constant birth rate and variable death rate (Lu et al. 2008; negatives. Lyu et al. (2014) inferred 87–94% of miRNA family Nozawa et al. 2010; Quah et al. 2015). Our data support loss within 4–30 and 60 Myr of Drosophila evolution, respec- either variable birth rates, or a constant but low birth rate tively; unfortunately, their analysis assumed a constant birth with slight variation in a low death rate, rather than high birth rate a priori, for which there is no evidence. 1464 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Amphimedon Schmidtea Monosiga Nematostella Capitella Nematostella Schmidtea Amphimedon Pristionchus Monosiga Caenorhabditis Melibe Ascaris Pristionchus Tetranychus Caenorhabditis Ixodes Ascaris Tetranychus Tribolium Apis Ixodes Bombyx Tribolium Drosophila Apis Drosophila Bombyx Capitella Drosophila Drosophila Melibe Strongylocentrotus Ciona Saccoglossus Strongylocentrotus Ciona Saccoglossus Branchiostoma Branchiostoma Petromyzon Petromyzon Salmo Salmo Danio Danio Anolis Columba Columba Alligator Alligator Chrysemys Chrysemys Anolis Ornithorhynchus Gallus Monodelphis Ornithorhynchus Monodelphis Gallus Sus Sus Rattus Bos Mus Rattus Bos Mus Macaca Macaca Homo Homo Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE FIG.6.—Phylogenetic tree derived from phylogenetic analysis of the concatenated analysis of pri-miRNA sequences. Node values are clade posterior probabilities. Investigating butterflies and moths, Quah et al. (2015) ob- viewpoint, that lineage specific miRNAs are therefore young, served that the terminal lineages in their phylogeny exhibited which has been incorrectly interpreted as providing support a much higher abundance of lineage-specific miRNAs than for the high birth rate of miRNAs (Taylor et al. 2014). This subtending lineages, supporting the view that miRNA family artefact of perception underpins the theory of a transient diversification rates are high. However, they failed to account phase of early miRNA evolution which now lacks an evi- for the antiquity of their terminal lineages (e.g., the species dential basis. miRNAome annotation of populations and with the largest number of lineage-specific miRNAs, Pararge closely related species are required to assess the veracity aegeria, is also estimated to have diverged from its sister taxa of a perceived “pull of the Recent” in miRNA family ac- 100 Ma; Pohl et al. 2009; Wahlberg et al. 2009). It is this quisition rates. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1465 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE Transcriptional Noise Loss of miRNAs Overall, we observed low levels of miRNA family loss and the The critical first step in understanding microRNA evolution is losses that we did observe are nonuniformly distributed across to establish a clear framework for what is and what is not a the phylogeny. The losses we observed provide support for microRNA, that is to separate miRNAs (signal) from tRNAs, the role of miRNA families in the evolution of morphological siRNAs, rRNAs, and degraded mRNAs (noise) within the tran- complexity (Sempere et al. 2006; Heimberg et al. 2008; scriptome. The inclusion of degraded mRNAs, as well as other Peterson et al. 2009; Wheeler et al. 2009; Berezikov 2011), RNA molecules in studies of miRNA evolution, is a significant since loss-prevalent lineages, such as Ciona (89%), Ascaris source of bias in the inference of miRNA losses, and has im- (22%), Pristionchus (14%), and Schmidtea (43%), correlate pacted previous results in several ways. First, individual studies with phenotypic simplification (Erwin et al. 2011; Philippe such as Meunier et al. (2013), who identified many putative et al. 2011; Fromm et al. 2013; Bai 2014), contra Dunn novel miRNAs and who showed variable conservation rates, et al. (2014). Large proportions of losses are also observed are suspect given that of the 333 purported miRNAs candi- in the spider mite Tetranychus (61%), which has undergone dates only 49 (14%) achieve all of the criteria necessary for large scale genomic reduction, losing>27% of its protein miRNA annotation (Fromm et al. 2015). Second, such errors coding genes (Grbic et al. 2011). Apparent exceptions to are compounded by the inclusion of such sequences into this are the relatively large proportions of miRNA families miRBase (Kozomara and Griffiths-Jones 2014) (the official on- lost in Ornithorhynchus (30%) and Rattus (40%), and clus- line depository of microRNAs). Indeed, the majority of sequen- tered losses within the murid lineage where 36% of the ces deposited in miRBase may not be genuine microRNAs changes are losses. This branch shows the highest rate of (Hansen et al. 2011; Wang and Liu 2011; Meng et al. loss (0.2 miRNA families per million years) in our data set. 2012; Brown et al. 2013; Taylor et al. 2014; Fromm et al. The miRNA losses identified in murids appear to be genuine, 2015), but are instead fragments of a great many different as evidenced by the comprehensively annotated genomes, types of RNA. Thus, in bioinformatic analyses (Guerra- deep coverage of its small RNA expression, which surpasses Assunc¸ao and Enright 2012; Hertel and Stadler 2015)in that of all other taxa, and also by synteny maps which showed which the miRNA catalogue of miRBase (Kozomara and that these losses of miRNA-families occurred on eight distinct Griffiths-Jones 2014) isused asa reference BLAST database, chromosomes (Fromm et al. 2015). Thenatureofthese losses many of the sequences under investigation are not miRNAs, is unexpected as nine of the eleven miRNAs that were lost, compromising the results from such studies. originated on the lineage leading to placentals. miRNA losses are otherwise mosaic in nature (Tarver et al. 2013), but the False-Negatives pattern here implies the loss of one or more closely associated When inferring loss of miRNAs, a critical issue is the depth of genetic pathways, mirroring losses observed in the homeobox genome sequencing, a problem noted but not addressed by gene repertoire of murids (Zhong and Holland 2011). Guerra-Assunc¸æo and Enright (2012) and Hertel and Stadler (2015). Issues associated with the use of low-coverage genomes in comparative genomics have been discussed pre- Comparisons to Studies Showing High Rates of Loss viously by Milinkovitch et al. (2010) who showed that they can While our results demonstrate that, overall, miRNA families generate artificially high numbers of gene losses. Tarver et al. exhibit proportionally few losses, and the majority of these are (2013) showed large-scale differences in the number of miss- attributable to a small number of miRNAs families and evo- ing miRNAs between three levels of sequencing coverage, lutionary lineages, this pattern is inconsistent with the with complete, high, and low coverage genomes missing view that loss of miRNA families is close to zero on an average 2.6%, 3%, and 19.5% miRNA families, re- (Sempere et al. 2006, 2007; Peterson et al. 2009; spectively. Low-coverage genomes were missing, on average, Sperling and Peterson 2009; Sperling et al. 2009; 6 times more miRNA families than high-coverage genomes, Wheeler et al. 2009; Tarver et al. 2013). Nevertheless, yet low-coverage genomes were included in both Guerra- our results are clearly incongruent with those published Assunc¸æo and Enright (2012) and Hertel and Stadler (2015). in other studies suggesting high rates of microRNA family Sequencing depth not only affects bioinformatic studies loss (Guerra-Assunc¸ao and Enright 2012; Meunier et al. but also those that rely only on small RNA sequencing data, 2013; Thomson et al. 2014; Hertel and Stadler 2015). All such as Meunier et al. (2013). This study generated small of these studies rely on different data sets, whether ge- RNA-seq data from five organs of single male representatives nomic (Guerra-Assunc¸ao and Enright 2012; Hertel and of six species, used to investigate the pattern of mammalian Stadler 2015), or small RNA (Meunier et al. 2013). Thus, miRNA evolution, concluding that there were high levels of the causes of the discrepancies between these studies and miRNA family turnover (Meunier et al. 2013). However, our own may include: 1) transcriptional noise, 2) false- miRNAs are known to have restricted tissue-specific expres- negatives, and 3) false-positives. sion profiles (Lagos-Quintana et al. 2002; Aboobakeretal. 1466 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE 2005), and so miRNA families present in the genome may not depth (100 Ms reads) from a breadth of tissues, organs, be sequenced in a sample of only five organs. Meunier et al. and developmental stages, as well as manual annotation of (2013) inferred the loss of miR-1, miR-22, and miR-122 the miRNAome supported by a ( 6) reference genome and (among others) in humans, despite numerous other studies a high-quality bioinformatics pipeline (Taylor et al. 2017). highlighting the key roles these miRNA families play in muscle Our principal analyses were based on a data set that development (Zhao et al. 2005), cancer (Song et al. 2013), minimizes false-positives and negatives (supplementary and in cholesterol and lipid metabolism within the liver (Girard file 1, Supplementary Material online). Our results dem- et al. 2008). It is clear that false-negatives caused either by the onstrate that miRNA families have a consistency index far use of low coverage genomes, unrepresentative tissue sam- higher than would be expected for either molecular or pling, or reliance on small RNA sequencing without reference morphological data sets of similar taxonomic scale and, to a genome, have all greatly increased the number of appar- thus, should make robust phylogenetic markers. This is ent losses of miRNA families. reflected in the results of our phylogenetic analyses that, even when based on miRNA family presence/absence, produces a tree that is in good agreement with phyloge- False-Positives nies based on more traditional markers (fig. 4). In compar- Genuine miRNAs incorrectly homologized in distantly re- ison to theresultofasimilaranalysis by Tarver et al. latedclades result inanoverestimationof miRNA loss from (2013), many of the nodes exhibit lower support values, all intermediate lineages. For example, the identification of principally because of differences in taxon sampling which mir-7880 in the 13-lined ground squirrel (Hertel and Stadler excluded lineages exhibiting few losses and introduced 2015), a miRNA family previously identified only in nema- lineages that exhibit many losses. Nevertheless, the results tode intestinal parasites, implies 51 instances of loss across are considerably more precisely and accurately resolved all intermediate lineages in the tree topology of Hertel and than those based on an uncurated miRBase data set Stadler (2015). However, this miRNA is not only absent (fig. 5; supplementary file 2, Supplementary Material from the ground squirrel genome, its annotation as a online). miRNA is spurious (the hairpin sequence lacks a two nucle- We found that support for some clades changed after in- 0 0 otide offset between the annotated 5 and 3 products; cluding singleton miRNA families (fig. 4A), with stronger sup- Fromm et al. 2015). Thus, all of these hypothesized losses port for the conventional grouping of birds with reptiles, as are artefacts of a single false-positive. well as a monophyletic Ecdysozoa. However, this analysis also gave strong support for a highly unconventional paraphyletic arrangement within Deuterostomia. These incongruities sug- microRNAs and Phylogenetics gest that small miRNA families contain some phylogenetic Data from our analysis of consistency indices suggest that signal but may still be subject to potentially strong partial miRNA families have the potential to be informative phyloge- sampling bias in our data set. Regardless, this topology was netic markers, supporting their use in previous studies still largely congruent with many established metazoan rela- (Heimberg et al. 2008, 2010; Sperling et al. 2009; Campbell tionships, despite the fact that nonsingletons accounted for et al. 2011; Philippe et al. 2011; Rota-Stabelli et al. 2011; only 322 miRNA families. Continued development of binary Helm et al. 2012; Fromm et al. 2013; Tarver et al. 2013; substitution models accommodating additional types of rate Field et al. 2014). However, the phylogenetic utility of heterogeneity, as well as partial sampling schemes, may mit- miRNA families has been called into question by Thomson igate the effect of loss, leading to greater accuracy in such et al. (2014) who found that the results from a selection of data sets (Taylor et al. 2014; Thomson et al. 2014). previous analyses lack statistical robustness. Following Tarver Finally, the concatenation of pri-miRNA sequences into et al. (2013), Thomson et al. (2014) used a Bayesian stochastic superalignments (Tarver et al. 2013) represents a promising Dollo model (Alekseyenko et al. 2008) which better accom- approach for future analyses. This approach has already re- modates homoplasy than does parsimony. This led to the solved relationships among taxa as diverse as mammals, pri- reanalysis of some of the older studies using new genomic mates, reptiles, drosophilids, and nematodes (Field et al. 2014; and additional small RNA-seq data (Field et al. 2014), correct- Kenny et al. 2015; Tarver et al. 2016), as well as in our analysis ing previous errors. Inevitably, early studies (e.g., Sperling of amniotes. In our pre-miR analysis, the results are robust al- et al. 2009; Heimberg et al. 2010; Philippe et al. 2011; though there are some nodes which conflict with other recent Helm et al. 2012; Lyson et al. 2012) fall short of current stand- studies, the critical taxa (e.g., horse; Tarver et al. 2016)have ards, based as they were on low coverage miRNAome se- invariably been identified as phylogenetically problematic quencing, annotated in the absence of a reference genome, based on conflict between previously published studies. At and performed before the Bayesian stochastic Dollo model the same time, these data resolve some controversial relation- was available. As such, they warrant further investigation fol- ships, such as the earliest diverging lineage of placental lowing contemporary standards of miRNAome sequencing mammals. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1467 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE David Bapst, Nicholas Crouch, Brian O’Meara, Matt Pennell, Conclusions and Liam Revell for discussion. Our data suggest that the loss of miRNA families is far from pervasive, contrary to several recent studies, all of which are undermined by a variety of biases, most substantially ascer- Literature Cited tainment bias. Where losses occur, they are nonrandomly dis- Aboobaker AA, Tomancak P, Patel N, Rubin GM, Lai EC. 2005. Drosophila tributed, with only 1.7% of miRNA families accounting microRNAs exhibit diverse spatial expression patterns during embry- onic development. Proc Natl Acad Sci U S A. 102(50):18017. for>45% of losses, while the nematodes Ascaris and Alekseyenko AV, Lee CJ, Suchard MA. 2008. Wagner and Dollo: a sto- Pristionchus account for 20% of losses yet represent only chastic duet by composing two parsimonious solos. Syst Biol. 6% of branches, bearing out the suggestion that miRNA fam- 57(5):772–784. ily gains and losses are associated with increasing and de- Ambros V. 2003. A uniform system for microRNA annotation. RNA creasing phenotypic complexity (e.g., as inferred from cell 9(3):277–279. Bai Y. 2014. Genome-wide sequencing of small RNAs reveals a tissue- diversity; Valentine et al. 1994), respectively. We demonstrate specific loss of conserved microRNA families in Echinococcus granulo- that thepresence/absenceofmiRNA families has aconsis- sus. BMC Genomics 15(1):736–713. tency index higher than expected for either morphological Berezikov E. 2010. Evolutionary flux of canonical microRNAs and mirtrons or molecular data of comparable taxon sampling. Our phylo- in Drosophila. Nat Genet. 42(1):6–9. genetic analyses were, in the main, congruent with estab- Berezikov E. 2011. Evolution of microRNA diversity and regulation in ani- mals. Nat Rev Genet. 12(12):846–860. lished metazoan relationships. However, our results suggest Brown M, Suryawanshi H, Hafner M, Farazi TA, Tuschl T. 2013. that small miRNA families (singletons in particular) may still be Mammalian miRNA curation through next-generation sequencing. subject to partial sampling bias in our data set. Our data sup- Front Genet. 4:145. port a pattern of miRNA diversification caused either by var- Campbell LI, et al. 2011. MicroRNAs and phylogenomics resolve the rela- iable rates of miRNA birth, or constant but low rates of tionships of Tardigrada and suggest that velvet worms are the sister group of Arthropoda. Proc Natl Acad Sci U S A. miRNA birth and a variable (but low) death rate, rather 108(38):15920–15924. than high birth and death rates. However, our under- Castellano L, Stebbing J. 2013. Deep sequencing of small RNAs identifies standing of miRNA evolution is greatly constrained by canonical and non-canonical miRNA and endogenous siRNAs in mam- the paucity of taxa with well-annotated miRNAomes, as malian somatic tissues. Nucleic Acids Res. 41(5):3339–3351. most of the taxa included here are distantly related to Chiang HR, et al. 2010. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. each other with an average lineage-specific branch length 24(10):992–1009. of 312.6 Myr. The paucity of taxa with well-annotated Crete-Lafrenie ` re A, Weir LK, Bernatchez L. 2012. Framing the Salmonidae miRNAomes runs the risk of telescoping past flux in spe- family phylogenetic portrait: a more complete picture from increased ciation and extinction onto individual internal branches of taxon sampling. PLoS One 7(10):e46662. a phylogenetic tree. In this way, the summed effects of Dabert M, Witalinski W, Kazmierski A, Olszanowski Z, Dabert J. 2010. Molecular phylogeny of acariform mites (Acari, Arachnida): strong random processes can produce nonrandom bursts in conflict between phylogenetic signal and long-branch attraction arti- (miRNA) innovation. More studies are needed to investi- facts. Mol Phylogenet Evol. 56(1):222–241. gate miRNA family evolution among congeneric species Dunn CW, Giribet G, Edgecombe GD, Hejnol A. 2014. Animal phylogeny with comparisons between the evolution of individual and its evolutionary implications*. Ann Rev Ecol Evol Syst. miRNA genes and the evolution of miRNA families. 45(1):371–395. Erwin DH, et al. 2011. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science Supplementary Material 334(6059):1091–1097. Felsenstein J. 1992. Phylogenies from restriction sites: a maximum- Supplementary data areavailableat Genome Biology and likelihood approach. Evolution 46(1):159–173. Evolution online. Field DJ, et al. 2014. Toward consilience in reptile phylogeny: miRNAs support an archosaur, not lepidosaur, affinity for turtles. Evol Dev. 16(4):189–196. Fromm B, et al. 2015. A uniform system for the annotation of vertebrate Acknowledgments microRNA genes and the evolution of the human microRNAome. Annu Rev Genet. 49(1):213–242. This work was supported by Irish Research Council for Science Fromm B, Worren MM, Hahn C, Hovig E, Bachmann L. 2013. Substantial Engineering and Technology grant (J.T.); Engineering and loss of conserved and gain of novel MicroRNA families in flatworms. Physical Sciences Research Council studentship (R.T.); Mol Biol Evol. 30(12):2619–2628. Girard M, Jacquemin E, Munnich A, Lyonnet S, Henrion-Caude A. 2008. Australian Research Council grant DE140101879 (G.L.); SE miR-122, a paradigm for the role of microRNAs in the liver. J Hepatol. Norway Health Authority grant 2014041 (B.F.); National 48(4):648–656. Aeronautics and Space Agency-Ames grant (K.J.P.); Wolfson Grbic M, et al. 2011. The genome of Tetranychus urticae reveals herbiv- Merit Award, Biological Sciences Research Council grant orous pest adaptations. Nature 479(7374):487–492. BB/N000919/1, and Natural Environment Research Council ~ Guerra-Assunc¸ao JA, Enright AJ. 2012. Large-scale analysis of microRNA evolution. BMC Genomics 13(1):218–212. grants NE/P013678/1; NE/N002067/1 (P.D.). We thank 1468 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE Hansen TB, Kjems J, Bramsen JB. 2011. Enhancing miRNA annotation Peterson KJ, Dietrich MR, McPeek MA. 2009. MicroRNAs and metazoan confidence in miRBase by continuous cross dataset analysis. RNA macroevolution: insights into canalization, complexity, and the Biol. 8(3):378–383. Cambrian explosion. BioEssays 31(7):736–747. Heimberg AM, Cowper-Sallari R, Semon M, Donoghue PCJ, Peterson KJ. Philippe H, et al. 2011. Acoelomorph flatworms are deuterostomes related 2010. microRNAs reveal the interrelationships of hagfish, lampreys, to Xenoturbella. Nature 470(7333):255–258. and gnathostomes and the nature of the ancestral vertebrate. Proc Plasterk RHA. 2006. Micro RNAs in animal development. Cell Natl Acad Sci U S A. 107(45):19379–19383. 124(5):877–881. Heimberg AM, Sempere LF, Moy VN, Donoghue PCJ, Peterson KJ. 2008. Platt RN, et al. 2014. Large numbers of novel miRNAs originate from DNA MicroRNAs and the advent of vertebrate morphological complexity. transposons and are coincident with a large species radiation in bats. Proc Natl Acad Sci U S A. 105(8):2946–2950. Mol Biol Evol. 31(6):1536–1545. Pohl N, Sison-Mangus MP, Yee EN, Liswi SW, Briscoe AD. 2009. Impact of Helm C, Bernhart SH, Siederdissen C. H z, Nickel B, Bleidorn C. 2012. Deep sequencing of small RNAs confirms an annelid affinity of duplicate gene copies on phylogenetic analysis and divergence time Myzostomida. Mol Phylogenet Evol. 64(1):198–203. estimates in butterflies. BMC Evol Biol. 9(1):99–16. Hertel J. 2006. The expansion of the metazoan microRNA repertoire. BMC Pyron RA. 2010. A likelihood method for assessing molecular divergence Genomics 7(25). time estimates and the placement of fossil calibrations. Syst Biol. Hertel J, Stadler PF. 2015. The expansion of animal microRNA families 59(2):185–194. revisited. Life (Basel) 5(1):905–920. Quah S, Hui JH, Holland PW. 2015. A burst of miRNA innovation in the Hohna S. et al. 2016. RevBayes: Bayesian phylogenetic inference using early evolution of butterflies and moths. Mol Biol Evol. 32:1161–1174. graphical models and an interactive model-specification language. Reis M, Sousa-Guimar~ aes S, Vieira CP, Sunkel CE, Vieira J. 2011. Syst Biol. (4):726–736. Drosophila genes that affect meiosis duration are among the meiosis Iwama H, Kato K, Imachi H, Murao K, Masaki T. 2013. Human microRNAs related genes that are more often found duplicated. PLoS One originated from two periods at accelerated rates in mammalian evo- 6(3):e17512. lution. Mol Biol Evol. 30(3):613–626. Revell LJ. 2012. phytools: an R package for phylogenetic comparative bi- Kenny NJ, et al. 2015. The phylogenetic utility and functional constraint of ology (and other things). Methods Ecol Evol. 3(2):217–223. microRNA flanking sequences. Proc Biol Sci. 282(1803):20142983. Rota-Stabelli O, et al. 2011. A congruent solution to arthropod phylogeny: Kim T, Hao W. 2014. DiscML: an R package for estimating evolutionary phylogenomics, microRNAs and morphology support monophyletic rates of discrete characters using maximum likelihood. BMC Mandibulata. Proc R Soc B Biol Sci. 278(1703):298–306. Bioinformatics 15(1):320. Sanderson MJ, Donoghue MJ. 1989. Patterns of variation in levels of ho- moplasy. Evolution 43(8):1781–1795. Kluge AG, Farris JS. 1969. Quantitative phyletics and the evolution of anurans. Syst Biol. 18(1):1–32. Sanderson MJ, Donoghue MJ. 1996. The relationship between homoplasy Kozomara A, Griffiths-Jones S. 2011. miRBase: integrating microRNA an- and confidence in a phylogenetic tree. In: Sanderson MJ, Hufford L, notation and deep-sequencing data. Nucleic Acids Res. 39(Database editors. Homoplasy: the recurrence of similarity in evolution. San issue):D152–D157. Diego: Academic Press. p. 67–89. Kozomara A, Griffiths-Jones S. 2014. miRBase: annotating high confi- Sayed D, Abdellatif M. 2011. MicroRNAs in development and disease. dence microRNAs using deep sequencing data. Nucleic Acids Res. Physiol Rev. 91(3):827–887. 42(D1):D68–D73. Sempere LF, Cole CN, McPeek MA, Peterson KJ. 2006. The phylogenetic Lagos-Quintana M, et al. 2002. Identification of tissue-specific microRNAs distribution of metazoan microRNAs: insights into evolutionary com- from mouse. Curr Biol. 12(9):735–739. plexity and constraint. J Exp Zool B Mol Dev Evol. 306B(6):575–588. Lartillot N, Rodrigue N, Stubbs D, Richer J. 2013. PhyloBayes MPI: phylo- Sempere LF,Martinez P,Cole C,Baguna J, Peterson KJ. 2007. genetic reconstruction with infinite mixtures of profiles in a parallel Phylogenetic distribution of microRNAs supports the basal position environment. Syst Biol. 62(4):611–615. of acoel flatworms and the polyphyly of Platyhelminthes. Evol Dev. Lloyd GT. 2016. Estimating morphological diversity and tempo with dis- 9(5):409–415. crete character-taxon matrices: implementation, challenges, progress, Song SJung, et al. 2013. The oncogenic microRNA miR-22 targets the and future directions. Biol J Linn Soc. 118(1):131–151. TET2 tumor suppressor to promote hematopoietic stem cell self- Lu J, et al. 2008. Adaptive evolution of newly emerged micro-RNA genes renewal and transformation. Cell Stem Cell 13(1):87–101. in Drosophila. Mol Biol Evol. 25(5):929–938. Sperling EA, et al. 2009. MicroRNAs resolve an apparent conflict between Lyson TR, et al. 2012. MicroRNAs support a turtle – lizard clade. Biol Lett. annelid systematics and their fossil record. Proc R Soc B Biol Sci. 8(1):104–107. 22;276(1677):4315–4322. Lyu Y, et al. 2014. New microRNAs in Drosophila–birth, death and cycles Sperling EA, Peterson KJ. 2009. MicroRNAs and metazoan phylogeny: big of adaptive evolution. PLoS Genet. 10(1):e1004096. trees from little genes. In: Telford MJ, Littlewood DTJ, editors. Animal Meng Y, Shao C, Wang H, Chen M. 2012. Are all the miRBase-registered evolution: genomes, fossils and trees. Oxford: Oxford University Press. microRNAs true? A structure- and expression-based re-examination in p. 157–170. plants. RNA Biol. 9(3):249–253. Tanzer A, et al. 2010. Evolutionary genomics of microRNAs and their Meunier J, et al. 2013. Birth and expression evolution of mammalian relatives. In: Caetano-Anolles G, editor. Evolutionary genomics and microRNA genes. Genome Res. 23(1):34–45. systems biology. Hoboken (NJ): Wiley-Blackwell. p. 295–327. Milinkovitch MC, Helaers R, Depiereux E, Tzika AC, Gabaldo n T. 2010. 2X Tarver JE, Donoghue PCJ, Peterson KJ. 2012. Do miRNAs have a deep genomes – depth does matter. Genome Biol. 11(2):r16. evolutionary history? BioEssays 34(10):857–866. Nicholls GK, Gray RD. 2008. Dated ancestral trees from binary trait data Tarver JE, et al. 2013. miRNAs: small genes with big potential in metazoan and their application to the diversification of languages. J R Stat Soc. phylogenetics. Mol Biol Evol. 30(11):2369–2382. 70(3):545–566. Tarver JE, et al. 2016. The interrelationships of placental mammals and the Nozawa M, Miura S, Nei M. 2010. Origins and evolution of microRNA limits of phylogenetic inference. Genome Biol Evol. 8(2):330–344. genes in Drosophila species. Genome Biol Evol. 2(0):180–189. Taylor RS, Tarver JE, Foroozani A, Donoghue PC. 2017. MicroRNA anno- Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and tation of plant genomes – do it right or not at all. BioEssays evolution in R language. Bioinformatics 20(2):289–290. 39(2):1600113. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1469 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE Taylor RS, Tarver JE, Hiscock SJ, Donoghue PC. 2014. Evolutionary history Wheeler BM, et al. 2009. The deep evolution of metazoan microRNAs. of plant microRNAs. Trends Plant Sci. 19(3):175–182. Evol Dev. 11(1):50–68. Team RC. 2016. R: a language and environment for statistical computing. Zhao Y, Samal E, Srivastava D. 2005. Serum response factor regulates a Vienna: R Foundation for Statistical Computing. muscle-specific microRNA that targets Hand2 during cardiogenesis. Thomson RC, Plachetzki DC, Mahler DL, Moore BR. 2014. A critical ap- Nature 436(7048):214–220. praisal of the use of microRNA data in phylogenetics. Proc Natl Acad Zhong Y-f, Holland PW. 2011. The dynamics of vertebrate homeobox Sci U S A. 111(35):E3659–E3668. gene evolution: gain and loss of genes in mouse and human lineages. Valentine JW, Collins AG, Meyer CP. 1994. Morphological complexity in- BMC Evol Biol. 11(1):1–13. crease in metazoans. Paleobiology 20(02):131–142. Zhou M, Luo H. 2013. MicroRNA-mediated gene regulation: potential Wahlberg N, et al. 2009. Nymphalid butterflies diversify following near applications for plant genetic engineering. Plant Mol Biol. 83(1– demise at the cretaceous/tertiary boundary. Proc R Soc B Biol Sci. 2):59–75. 2009.1303. Wang X, Liu S. 2011. Systematic curation of miRBase annotation using integrated small RNA high-throughput sequencing data for C. elegans and Drosophila. Front Genet. 276(1677):4295–302. Associate editor: Mary O’Connell 1470 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Genome Biology and Evolution Oxford University Press

Loading next page...
 
/lp/ou_press/well-annotated-micrornaomes-do-not-evidence-pervasive-mirna-loss-QlaEZ184q0
Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
ISSN
1759-6653
eISSN
1759-6653
D.O.I.
10.1093/gbe/evy096
Publisher site
See Article on Publisher Site

Abstract

microRNAs are conserved noncoding regulatory factors implicated in diverse physiological and developmental processes in multi- cellular organisms, as causal macroevolutionary agents and for phylogeny inference. However, the conservation and phylogenetic utility of microRNAs has been questioned on evidence of pervasive loss. Here, we show that apparent widespread losses are, largely, an artefact of poorly sampled and annotated microRNAomes. Using a curated data set of animal microRNAomes, we reject the view that miRNA families are never lost, but they are rarely lost (92% are never lost). A small number of families account for a majority of losses (1.7% of families account for>45% losses), and losses are associated with lineages exhibiting phenotypic simplification. Phylogenetic analyses based on the presence/absence of microRNA families among animal lineages, and based on microRNA sequences among Osteichthyes, demonstrate the power of these small data sets in phylogenetic inference. Perceptions of widespread evolutionary loss of microRNA families are due to the uncritical use of public archives corrupted by spurious microRNA annotations, and failure to discriminate false absences that occur because of incomplete microRNAome annotation. Key words: microRNAs, annotation, evolution, birth, death, phylogeny. Introduction (Thomson et al. 2014; Hertel and Stadler 2015). However, microRNAs (miRNAs) are short noncoding RNA molecules pre- there are two key challenges to inferring the evolutionary his- sent in the genomes of animals, plants, fungi, and both green tory of miRNAs. The first is incomplete annotation of the and brown algae (Tarver et al. 2012); they are important bio- miRNAome of an organism as a consequence of incomplete medically (Sayed and Abdellatif 2011), agriculturally (Zhou and genome and/or small RNA sequencing, an ascertainment bias Luo 2013), developmentally (Plasterk 2006), as causal agents of that results in false-negatives (Thomson et al. 2014; Fromm macroevolutionary change (Heimberg et al. 2008; Peterson et al. 2015). The second challenge is the misannotation of ran- et al. 2009), and in phylogenetic inference (Tarver et al. dom or degraded sequence, or of other classes of small RNAs, 2013). Though they are thought largely to evolve from random as miRNAs (Chiang et al. 2010; Hansen et al. 2011; Wang and sequence, they have been perceived to be rarely lost after as- Liu 2011; Meng et al. 2012; Tarver et al. 2012; Castellano and suming a regulatory function (Tarver et al. 2013). Recent stud- Stebbing 2013; Taylor et al. 2014, 2017), which results in false- ies have called this prevailing view into question (Guerra- positives. The occurrence of false-positives is so widespread Assunc¸ao and Enright 2012; Meunier et al. 2013; Thomson that some have argued that only 16% of metazoan miRNA et al. 2014; Hertel and Stadler 2015), arguing that miRNA fam- annotations in the canonical miRNA database (Kozomara and ilies exhibit rates of loss approaching 80% in some species Griffiths-Jones 2014) are genuine (Fromm et al. 2015). The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Genome Biol. Evol. 10(6):1457–1470. doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1457 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE In an attempt to obtain a realistic understanding of miRNA comparison of the results of the analyses of the two data sets, family evolution, we exploited a curated data set of eumeta- but it biases the analysis in favor of the uncurated data set by zoan miRNA families in which these biases are minimized raising the overall quality of annotation, minimizing the im- (Fromm et al. 2015). This data set includes all metazoan pact of noise reduction in the curated data set. Not all taxa that have been the subject of exhaustive miRNAome miRBase miRNA genes are assigned to families,—particularly annotation based on genome and small RNA transcriptome in cases where there is only a single gene in the family. We sequencing. We present an analysis of the dynamics of meta- included these as individual families containing the single zoan miRNA evolution, demonstrating that miRNA families gene. Some genes are named correctly but not assigned to are among the most conserved of phylogenetic characters. the correct family. For example, in Schmidtea mediterranea, Most losses can be attributed to a small number of families there are three genes which have correctly been given the that have been repeatedly lost, and a small number of line- name miR-7 but which are not included in the miR-7 family ages exhibiting reduced phenotypic complexity (e.g., as in- grouping on miRBase v.21 (Kozomara and Griffiths-Jones ferred from cell diversity; Valentine et al. 1994). Finally, our 2014). In such instances, we included each of these as a sep- phylogenetic analyses of 1) the distribution of miRNA families, arate family. The alternative would have been to hand curate and 2) their nucleotide sequences, demonstrate the power of the data set, which would be unrepresentative of miRBase, or miRNAs in recovering phylogenetic resolution at the scale of else we could assign genes to families based on their name. the animal kingdom. However, this would introduce additional artifact since many genes correctly assigned to families in miRBase have different names (e.g., mmu-mir-300 is currently correctly listed in the miR-154 family, not the miR-300 family, despite the differ- Materials and Methods ence in names). Thus, for consistency, we have followed the Taxon Sampling family groupings that are listed on miRBase and ignored the To minimize false negatives we constrained our analyses to names of the genes. only those taxa with high-coverage genomic and small RNA sequencing. To eliminate the effect of transcriptional noise Phylogeny and Rates of Evolution (i.e., non-miRNA sequences), we used the curated data set of Fromm et al. (2015) which is based on a reanalysis of the The phylogenetic relationships between taxa included in the entire compliment of eumetazoan miRNA families in miRBase analysis are not controversial and a constraint timetree was V.21 (Kozomara and Griffiths-Jones 2014) following estab- used based on previous molecular clock studies (Dabert et al. lished criteria (Ambros 2003; Kozomara and Griffiths-Jones ^ 2010; Pyron 2010; Erwin et al. 2011; Reis et al. 2011; Crete- 2011). Fromm and colleagues found that of the 7,095 anno- Lafrenie ` re et al. 2012; Tarver et al. 2016) onto which the evo- tated metazoan miRNA families in miRBase V.21, only 1,178 lution of microRNAs was mapped. This allowed us to estimate (16.60%) meet all of the criteria required for correct annota- rates of miRNA gain and loss in two ways. First, for the analyses tion, with a further 2,104 (29.7%) meeting some but not all of the Homo and Drosophila lineages, numbers of shared of the criteria required for validation (e.g., lacking the prod- miRNA families were calculated for the internal nodes of the 0 0 ucts of both the 5 and 3 arms). Taxa with low coverage phylogeny, so that the branching pattern could be observed. (<6) genomes (e.g., Taeniopygia guttata), poorly annotated Second, for the total tree, the rate for each branch was aver- small RNA repertoires (e.g., Ovis aries), and uncertain phylo- aged across the length of the branch with rates calculated as genetic position (e.g., Strongyloides ratti), were removed, re- birth/death per million years. About 1,000 simulations were run ducing the data set to 1,139 miRNA families from 35 species, to generate confidence intervals around the average values. representing a broad range of metazoan taxa (supplementary We investigated the phylogenetic and temporal distribu- file 1, Supplementary Material online). This data set minimizes tion of gains and losses using a novel stochastic character both false negatives and any bias from misannotated miRNAs, mapping approach under a Dollo assumption; this model is providing the basis for an approximately unbiased assessment employed even by those who argue that miRNA loss is prev- of the rates of miRNA family birth and death. alent (Thomson et al. 2014) since there is no evidence for the To establish the impact of data curation, we compiled an convergent evolution of miRNA families (Tarver et al. 2013). uncurated data set (supplementary file 2, Supplementary First, we assumed that characters were gained only once and Material online) from miRBase v.21 (Kozomara and that the gain occurred on either the terminal branch (for Griffiths-Jones 2014) which we subjected to parallel analyses. singletons), or the branch subtending the least inclusive clade Not all of the taxa in our curated data set appear in miRBase of taxa exhibiting presence of that miRNA. Without further v.21 (Kozomara and Griffiths-Jones 2014), viz. Alligator mis- information to constrain its position, we assumed equal prob- sissippiensis, Melibe maugeana, Columba livia,and Chrysemys ability of miRNA family gain having occurred at any point picta, and so we included the curated annotations for these along the branch and the precise timing of gain time was taxa in the otherwise uncurated data set. This facilitates direct generated by randomly sampling from a uniform distribution 1458 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE between the beginning and the end of this branch. Thus, in Phylogenetics practice, gains will always occur on the same branch, but the To evaluate the suitability of miRNAs as phylogenetic characters, precise position (and hence timing) is determined stochastically we first compiled a data matrix using the complete data set of all and will vary each time the algorithm is run. Second, we as- miRNA families, coding their presence and absence (supplemen- sumed no losses occurred in the case of singletons but allowed tary file 1, Supplementary Material online). We then estimated stochastic losses to occur in instances of nonsingletons. Where the level of homoplasy in this data set using the Consistency losses were possible, these were simulated by using the make.- Index, following Sanderson and Donoghue (1989, 1996), with simmap function in the phytools package (Revell 2012)by plac- comparisons made to their data set of 101 morphological and ing a hard prior probability of presence on the root, and then molecular matrices. The Consistency Index was calculated in only allowing losses (1–0 transitions) in the model. When losses MacClade with all characters considered to be unordered. were observed, the branch on which they were inferred to have We conducted phylogenetic analyses of the curated and occurred was recorded, as well as the position along that uncurated data sets (supplementary files 1 and 2, branch (effectively a temporal estimate). This whole process Supplementary Material online) using a stochastic Dollo binary was achieved using the custom-written function DolloSCM in substitution model (Nicholls and Gray 2008) implemented in the package Claddis (Lloyd 2016). Because the algorithm is RevBayes (Hohna 2016), with four discrete gamma loss rate stochastic, it was run 1,000 times in order to generate mean categories across sites, and the homogeneous origination rate and 95% confidence intervals. Similarly, the output allows k integrated out of the likelihood equation (see Alekseyenko both phylogenetic and temporal positions to be examined for et al. 2008). We applied an ascertainment bias correction to each inferred transition, which can themselves be subdivided account for unobserved miRNA families lost in all species into gains (0–1 transitions) and losses (1–0 transitions). We (Felsenstein 1992). We conducted a second analysis with all used this output to estimate per-branch rates for gains-only singleton families removed, applying a correction to account (as losses are necessarily biased both phylogenetically, to- for the absence of both singletons as well as families lost in all ward nested branches, and temporally, toward the pre- species (Alekseyenko et al. 2008; Nicholls and Gray 2008). sent, because they must occur after a gain). We also The RevBayes script used for this analysis is provided in sup- generated separate (gains and losses) time series of rates plementary file 3, Supplementary Material online. (changes per 10-Myr time bin) from 720 Ma to the present. Convergence was assessed using the tracecomp and bpcomp Finally, we compared inferred losses across all miRNAs to programs in the PhyloBayes package (Lartillot et al. 2013), simple minimum and maximum possible values based on ensuring a maxdiff statistic< 0.05, a minimum effsize> 100 the observed data. Here, the minimum was set at zero (no and maximum rel_diff< 0.1 across all parameters. Finally, we losses). The maximum is more complex to estimate, but conducted phylogenetic analyses of concatenated pri-miRNA here we used a simple rule, viz.: 1) if a singleton then no sequences. This alignment (supplementary file 4, losses could be observed; 2) if presence of the miRNA is Supplementary Material online) was run under a GTRþ G presentin all members ofacladeoftwo tips then, again, model in PhyloBayes. no losses could be observed; 3) if the least inclusive clade exhibiting the miRNA contains more than two taxa, a loss could be observed and the maximum number of losses is Results 2N,where N is the number of taxa in that clade (i.e., with Birth and Death of microRNA Families all losses occurring on terminal branches). These rules were The curated data set reveals a significant difference in the then used to sum the maximum number of losses across all number of miRNA families present in the most completely miRNAs. The inferred total number of losses can vary (due annotated deuterostome (Homo sapiens) and protostome to the stochastic nature of the algorithm) and, thus, we (Drosophila melanogaster), with 300 in H. sapiens, and 132 considered the full distribution of our 1,000 runs when in D. melanogaster (fig. 1). This difference in cumulative comparing between minimum and maximum losses. The miRNA family acquisition is widely recognized (Hertel 2006; Dollo model necessitates that characters are rarely lost and Heimberg et al. 2008; Wheeler et al. 2009; Fromm et al. this may bias toward higher rates of gain. To avoid this po- 2015); the large complement of human miRNAs inferred to tential artefact, we used a model that would produce much be a consequence of episodic bursts in miRNA acquisition at lower rates of gain compared with loss. Using DiscML (Kim the origin of bilaterians, vertebrates, placentals, and primates and Hao 2014) in R, we estimated single rates of gain (0.16) (Heimberg et al. 2008; Wheeler et al. 2009; Tanzer et al. and loss (1.64) for all characters, with the root set to miRNA 2010). Likewise, the absolute number of miRNA family losses absence. Using these estimates, we calculated maximum like- varies between species (23 in H. sapiens;10 in D. mela- lihood ancestral states in APE (Paradis et al. 2004)which we nogaster). We find a low proportion of loss (0.07) in these used to estimate rates of evolution in Claddis (Lloyd 2016). All lineages, with an overall ratio of 13.0 gains to every 1 loss in analyses were performed in R (RCore Team 2016) and the H. sapiens and 13.2 gains to each loss in D. melanogaster. code and data used are available from Dryad. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1459 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE H. sapiens Homo sapiens Drosophila melanogaster Gains (300) 92.8% 93.0% D. melanogaster Gains (132) H. sapiens losses (23) D.melanogaster Losses (10) 800 700 600 500 400 300 200 100 0 Time (Millions of years before present) FIG.1.—Cumulative numbers of gains and losses of miRNA families in the lineages leading to the two most completely sampled taxa (Homo and Drosophila) over the past 720 Myr based on our curated data set. To determine the extent to which these lineages are rep- with an average length of 196 Myr, suggesting that the broad resentative of the broader pattern of miRNA gain and loss CIs reflect variable, rather than constant rates of character across metazoans, we mapped all character state changes change. Thus, if we consider the 44 gains on the lineage lead- onto the known phylogeny, with gains and losses calculated ing to vertebrates (617–510 Ma), the CI spans a 6-fold range for each taxon, running from the root to tree tips. This reveals from 0.025 to 0.15 gains per million years. There is a significant an average gain to loss ratio per taxon of 11.4 gains for every increase in the evolution of miRNA families corresponding to loss, equating to 0.08 loss proportion (table 1), with a me- the evolution of Bilateria during the Cryogenian. The rate dian ratio of 10.5:1 and a loss proportion of 0.087. decreases until an Ediacaran rise followed by a steady Of the 1,143 miRNA families in the curated data set, 1,052 Cambrian–Permian decrease in the rate of gains. The subse- (92%) exhibit no losses, 52 are lost once, 20 are lost twice, 10 quent rate of miRNA family gains has increased steadily, sub- are lost three times, 7 are lost four times, miR-2001 is lost five stantially from the Cretaceous. In contrast, losses exhibit an times, and miR-315 is lost six times (fig. 2). Thus, although approximately constant (and low) rate from the Cryogenian. only 91 individual families exhibit losses, we observe a total of Numerous internal branches exhibit significantly high rates of 161 losses, and these are distributed unevenly across the tree. miRNA family acquisition (e.g., stem-lineages of bilaterians, Of the 68 branches with character changes, 24 show only protostomes, vertebrates, placentals, and primates; supple- gains of miRNA families, a further 20 show 75% gains, mentary file 5, Supplementary Material online). Some external while only eight branches are characterized by more losses branches show significantly high acquisition rates (most pla- than gains (fig. 3A). centals, birds, drosophilids, and nematodes). Significantly low When character state changes are mapped through time rates of miRNA family acquisition are observed on internal (fig. 3B), average gains outweigh losses, although the confi- branches leading to mammals and birds. These results obtain dence intervals overlap until the mid-Permian. Confidence regardless of whether the Dollo Parsimony or likelihood-based intervals are constrained by relatively long branch lengths models of ancestral state reconstruction are used as a basis for 1460 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Cumulative number of microRNA families Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE Table 1 Sum Total of miRNA Family Gains and Losses Along the Path from the Root of the Phylogenetic Tree to Each Tip Based on our Curated Data Set Node Lineage Gains Losses Gain:Loss Percentage Ratio Loss 37 Amphimedon 80 N/A N/A 38 Nematostella 28 0 N/A N/A 40 Capitella 68 0 N/A N/A 39 Melibe 61 2 30.50 3.28 47 Apis 77 4 19.25 5.19 46 Ixodes 57 3 19.00 5.26 54 Branchiostoma 56 3 18.67 5.36 400 52 Saccoglossus 54 3 18.00 5.56 56 Petromyzon 99 6 16.50 6.06 60 Chrysemys 130 8 16.25 6.15 63 Columba 154 10 15.40 6.49 48 Tribolium 88 6 14.67 6.82 61 Alligator 128 9 14.22 7.03 7 1 1 70 Homo 301 22 13.68 7.31 0 1 2 3 4 5 6 7 8 9 10 51 D. melanogaster 133 10 13.30 7.52 Observed number of losses per miRNA family 53 Strongylocentrotus 46 4 11.50 8.70 FIG.2.—The observed number of losses of miRNA families inferred 62 Gallus 146 13 11.23 8.90 from our phylogenetic analysis of the curated data set. 71 Macaca 235 21 11.19 8.94 59 Anolis 124 12 10.33 9.68 67 Bos 219 22 9.95 10.05 Utility of microRNA Families in Phylogenetics 50 D. pseudoobscura 99 10 9.90 10.10 The utility of a marker for phylogenetic inference is related to 44 Celegans 105 11 9.55 10.48 the level of homoplasy that it exhibits. In order to estimate the 58 Salmon 113 12 9.42 10.62 level of homoplasy in our miRNA family data set, we used the 57 Danio 115 13 8.85 11.30 Consistency Index (CI) to characterize the number of times a 49 Bombyx 88 10 8.80 11.36 66 Sus 198 24 8.25 12.12 miRNA family is gained and lost on the tree. CI is based on 69 Mus 259 33 7.85 12.74 the minimum number of changes required by a character 65 Monodelphis 143 19 7.53 13.29 or data set, divided by the actual number of changes 41 Schmidtea 58 9 6.44 15.52 (Kluge and Farris 1969); values range from 0 to 1, where a 64 Ornithorhynchus 129 21 6.14 16.28 CI of 1 reflects the minimum number of changes and, there- 68 Rattus 221 36 6.14 16.29 fore, zero homoplasy. The expected CI can be inferred based 43 Pristionchus 140 25 5.60 17.86 on an empirically derived formula (Sanderson and Donoghue 42 Ascaris 64 14 4.57 21.88 1996); for a categorical data set of 35 taxa it is  0.50, and for 45 Tetranychus 56 13 4.31 23.21 molecular sequence data  0.64. In contrast, the CI observed 55 Ciona 40 14 2.86 35.00 for our miRNA data sets is 0.88, indicating that miRNA families NOTE.—The gain to loss ratio, and percentage loss, for each individual lineage scored simply by presence/absence, exhibit surprisingly low are also shown. Thus, the three gains and two losses observed on the lineage leading to chordates are recorded in all descendent taxa. levels of homoplasy and, as such, should be reliable phyloge- netic markers. Having established the empirical basis for the phylogenetic utility of miRNAs, we conducted a phylogenetic analysis of the analysis of the curated data set (supplementary file 6, same data set using a stochastic Dollo model of binary char- Supplementary Material online). The likelihood-based analysis acter evolution, implemented in RevBayes. This recovered differs principally in inferring a greater flux of miRNA family trees that were largely compatible, even if not always highly gain and loss deep within metazoan evolution, though this is supported, with current knowledge of metazoan evolution likely an analytic artefact of the uninformative outgroup that (fig. 4), suggesting that previous unorthodox results obtained lacks miRNAs altogether (supplementary file 6, Supplementary using BEAST were at least partly driven by model mis- Material online). specification (previous analyses lacked ascertainment bias cor- Analysis of the uncurated miRBase data set yielded results rection and did not use a Gamma distribution). Parallel anal- that imply an order of magnitude higher rate of gain and loss of yses of the uncurated miRBase data set (supplementary file 2, miRNAs within the Cenozoic, toward the tips of the phyloge- Supplementary Material online) yielded trees that are much netic tree, in comparison to the rates inferred from the curated more incongruent with current perceptions of metazoan data set (supplementary file 7, Supplementary Material online). Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1461 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Number of miRNA families Nematodes Arthropods Vertebrates Insects Amniotes Mammals loss Tarver et al. GBE 1000 800 700 600 500 400 300 200 900 100 0 Chanoflagellate Monosiga 8 Amphimedon Sponge Cnidarian 28 Nematostella Mollusc Melibe Annelid 19 Capitella Platyhelminth 21 Schmidtea 23 Ascaris 106 Pristionchus C. elegans 1 18 Tetranychus Ixodes 14 Apis 18 Tribolium 27 Bombyx D. pseudoobscura 58 D. melanogaster 18 Saccoglossus Hemichordate Echinoderm Strongylocentrotus Branchiostoma Amphioxus 21 Tunicate Ciona 16 Petromyzon Danio Salmo 44 5 Anolis 3 19 17 Chrysemys 17 Alligator Gallus 36 Columbia Ornithorhynchus 31 Monodelphis gain 10 Sus 8 3 30 Bos 78 10 Rattus 46 Mus Homo 13 Macaca 1000 900 800 700 600 400 300 200 100 0 Cryogenian Ediacaran Camb Ord Sil Devon Carbon Perm Trias Jurass Cretace Pal Ng FIG.3.—The gains and losses of miRNA families for all taxa with well-annotated miRNAomes, (A) per branch within the phylogenetic tree, (B) per unit time across the phylogenetic tree. miRNA gains showed an increased rate toward the Recent, compatible with some theories which posit an early transient phase in which miRNA gain and loss is rapid (Peterson et al. 2009), making early losses difficult to detect. phylogeny (fig. 5), regardless of whether singletons were in- paraphyletic with respect to mammals) or excluded (little res- cluded (protostomes resolved as paraphyletic with respect to olution among invertebrates; lizard resolved as sister to mam- deuterostomes; lizards, and archosaurs resolved as mals rather than archosaurs). 1462 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 0.0 0.1 0.2 0.3 0.4 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE 0.98 0.79 1 1 1 1 0.85 1 0.81 1 0.83 0.89 0.95 1 0.57 0.52 0.76 0.94 0.61 1 1 0.97 0.52 0.59 1 0.62 0.81 1 1 1 1 0.84 0.58 0.92 0.51 0.76 1 1 0.55 0.92 1 0.56 0.92 0.98 0.94 0.63 0.92 FIG.4.—Phylogenetic tree derived from the Dollo analysis of the curated presence/absence of microRNA families considered in our analysis. (A)Shows results whereby all miRNAs were included and (B) where singletons (synapomorphic) miRNAs were excluded. Node values are clade posterior probabilities. The results of the tetrapod superalignment are congruent acquisition rates (e.g., at the origins of Bilateria, Protostomia, with established phylogenies (fig. 6). Some parts of the tree, Vertebrata, and Placentalia), while others show significantly such as the higher level relationships within Laurasiatheria, are low acquisition rates (e.g., at the origins of Aves, and unresolved or have low levels of support, however, these nodes Mammalia). Terminal branches also show highly variable have traditionally proven hard to resolve (Tarver et al. 2016). rates; this likely reflects the depth of miRNA annotation rather This expanded data set provides additional support for turtles than genuine biological signal, as many of these taxa are as archosaurs (Field et al. 2014), and resolves an model organisms. The results of analyses of the uncurated Atlantogenata–Boreoeutheria root for placental mammals miRBase data set imply several factors higher numbers and (Tarver et al. 2016). frequency of losses compared with the curated data set; the general pattern of miRNA family gain and loss is the same through the Phanerozoic, except for the Cenozoic where the uncurated data set implies an order of magnitude higher Discussion number and rate of gains and losses (supplementary file 7, Diversification of miRNA Families Supplementary Material online). These differences reflect the The results of our analyses based on the curated data set impact of curation and the importance of using curated data show highly variable rates in the diversification of miRNA fam- from high coverage genomes in attempting to obtain an ac- ilies, with some internal branches showing significantly high curate perspective on miRNA family evolution; uncurated Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1463 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Monosiga Monosiga Amphimedon Amphimedon Nematostella Nematostella Ascaris Ciona Pristionchus Branchiostoma C. elegans Strongylocentrotus Tetranychus Saccoglossus Ixodes Schmidtea Apis Melibe Tribolium Capitella Bombyx Ascaris D. pseudoobscura Pristionchus D. melanogaster C. elegans Schmidtea Tetranychus Melibe Ixodes Apis Capitella Strongylocentrotus Tribolium Saccoglossus Bombyx Branchiostoma D. pseudoobscura Ciona D. melanogaster Petromyzon Petromyzon Salmon Salmon Danio Danio Anolis Anolis Chrysemys Chrysemys Alligator Alligator Gallus Gallus Columba Columba Ornithorhynchus Ornithorhynchus Monodelphis Monodelphis Sus Sus Bos Bos Rattus Rattus Mus Mus Macaca Macaca Homo Homo Tarver et al. GBE 0.75 0.99 0.83 0.84 0.88 0.99 0.99 0.82 0.74 0.98 0.89 0.87 0.99 0.5 0.97 0.99 0.99 0.83 0.99 0.98 0.98 0.98 0.61 0.52 0.53 0.79 0.99 0.99 0.99 0.7 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.75 0.99 0.95 0.96 0.99 0.99 0.83 0.99 0.99 0.99 0.81 0.5 0.99 0.99 0.99 0.99 0.99 0.99 0.97 FIG.5.—Phylogenetic tree derived from the Dollo analysis of presence/absence of microRNA families in our uncurated miRBase data set. (A) Shows results whereby all miRNAs were included and (B) where singletons (synapomorphic) miRNAs were excluded. Node values are clade posterior probabilities. miRNA data sets yields spurious perceptions of miRNA and death rates. High birth and death rates have been inferred evolution. primarily based on analyses of Drosophila species. Lu et al. The diversification of microRNAs is the result of birth and (2008) has been criticized for misannotation of degraded death of individual miRNA families. Thus, apparent shifts in transcriptional sequences as miRNAs (Berezikov 2010). the miRNA diversification rate can be caused by 1) variable Nozawa et al. (2010) also inferred high death rates of birth rates, 2) variable death rates, or 3) a combination of miRNA families in Drosophila but their study did not control both. Previous work has suggested that shifts in miRNA diver- for the validity of miRNAs in miRBase or variable genome sification rates are caused by increased birth rates (Heimberg quality, and compounded these problems by assaying et al. 2008, 2010; Iwama et al. 2013; Platt et al. 2014), or miRNA evolution at the gene level, increasing false- constant birth rate and variable death rate (Lu et al. 2008; negatives. Lyu et al. (2014) inferred 87–94% of miRNA family Nozawa et al. 2010; Quah et al. 2015). Our data support loss within 4–30 and 60 Myr of Drosophila evolution, respec- either variable birth rates, or a constant but low birth rate tively; unfortunately, their analysis assumed a constant birth with slight variation in a low death rate, rather than high birth rate a priori, for which there is no evidence. 1464 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Amphimedon Schmidtea Monosiga Nematostella Capitella Nematostella Schmidtea Amphimedon Pristionchus Monosiga Caenorhabditis Melibe Ascaris Pristionchus Tetranychus Caenorhabditis Ixodes Ascaris Tetranychus Tribolium Apis Ixodes Bombyx Tribolium Drosophila Apis Drosophila Bombyx Capitella Drosophila Drosophila Melibe Strongylocentrotus Ciona Saccoglossus Strongylocentrotus Ciona Saccoglossus Branchiostoma Branchiostoma Petromyzon Petromyzon Salmo Salmo Danio Danio Anolis Columba Columba Alligator Alligator Chrysemys Chrysemys Anolis Ornithorhynchus Gallus Monodelphis Ornithorhynchus Monodelphis Gallus Sus Sus Rattus Bos Mus Rattus Bos Mus Macaca Macaca Homo Homo Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE FIG.6.—Phylogenetic tree derived from phylogenetic analysis of the concatenated analysis of pri-miRNA sequences. Node values are clade posterior probabilities. Investigating butterflies and moths, Quah et al. (2015) ob- viewpoint, that lineage specific miRNAs are therefore young, served that the terminal lineages in their phylogeny exhibited which has been incorrectly interpreted as providing support a much higher abundance of lineage-specific miRNAs than for the high birth rate of miRNAs (Taylor et al. 2014). This subtending lineages, supporting the view that miRNA family artefact of perception underpins the theory of a transient diversification rates are high. However, they failed to account phase of early miRNA evolution which now lacks an evi- for the antiquity of their terminal lineages (e.g., the species dential basis. miRNAome annotation of populations and with the largest number of lineage-specific miRNAs, Pararge closely related species are required to assess the veracity aegeria, is also estimated to have diverged from its sister taxa of a perceived “pull of the Recent” in miRNA family ac- 100 Ma; Pohl et al. 2009; Wahlberg et al. 2009). It is this quisition rates. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1465 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE Transcriptional Noise Loss of miRNAs Overall, we observed low levels of miRNA family loss and the The critical first step in understanding microRNA evolution is losses that we did observe are nonuniformly distributed across to establish a clear framework for what is and what is not a the phylogeny. The losses we observed provide support for microRNA, that is to separate miRNAs (signal) from tRNAs, the role of miRNA families in the evolution of morphological siRNAs, rRNAs, and degraded mRNAs (noise) within the tran- complexity (Sempere et al. 2006; Heimberg et al. 2008; scriptome. The inclusion of degraded mRNAs, as well as other Peterson et al. 2009; Wheeler et al. 2009; Berezikov 2011), RNA molecules in studies of miRNA evolution, is a significant since loss-prevalent lineages, such as Ciona (89%), Ascaris source of bias in the inference of miRNA losses, and has im- (22%), Pristionchus (14%), and Schmidtea (43%), correlate pacted previous results in several ways. First, individual studies with phenotypic simplification (Erwin et al. 2011; Philippe such as Meunier et al. (2013), who identified many putative et al. 2011; Fromm et al. 2013; Bai 2014), contra Dunn novel miRNAs and who showed variable conservation rates, et al. (2014). Large proportions of losses are also observed are suspect given that of the 333 purported miRNAs candi- in the spider mite Tetranychus (61%), which has undergone dates only 49 (14%) achieve all of the criteria necessary for large scale genomic reduction, losing>27% of its protein miRNA annotation (Fromm et al. 2015). Second, such errors coding genes (Grbic et al. 2011). Apparent exceptions to are compounded by the inclusion of such sequences into this are the relatively large proportions of miRNA families miRBase (Kozomara and Griffiths-Jones 2014) (the official on- lost in Ornithorhynchus (30%) and Rattus (40%), and clus- line depository of microRNAs). Indeed, the majority of sequen- tered losses within the murid lineage where 36% of the ces deposited in miRBase may not be genuine microRNAs changes are losses. This branch shows the highest rate of (Hansen et al. 2011; Wang and Liu 2011; Meng et al. loss (0.2 miRNA families per million years) in our data set. 2012; Brown et al. 2013; Taylor et al. 2014; Fromm et al. The miRNA losses identified in murids appear to be genuine, 2015), but are instead fragments of a great many different as evidenced by the comprehensively annotated genomes, types of RNA. Thus, in bioinformatic analyses (Guerra- deep coverage of its small RNA expression, which surpasses Assunc¸ao and Enright 2012; Hertel and Stadler 2015)in that of all other taxa, and also by synteny maps which showed which the miRNA catalogue of miRBase (Kozomara and that these losses of miRNA-families occurred on eight distinct Griffiths-Jones 2014) isused asa reference BLAST database, chromosomes (Fromm et al. 2015). Thenatureofthese losses many of the sequences under investigation are not miRNAs, is unexpected as nine of the eleven miRNAs that were lost, compromising the results from such studies. originated on the lineage leading to placentals. miRNA losses are otherwise mosaic in nature (Tarver et al. 2013), but the False-Negatives pattern here implies the loss of one or more closely associated When inferring loss of miRNAs, a critical issue is the depth of genetic pathways, mirroring losses observed in the homeobox genome sequencing, a problem noted but not addressed by gene repertoire of murids (Zhong and Holland 2011). Guerra-Assunc¸æo and Enright (2012) and Hertel and Stadler (2015). Issues associated with the use of low-coverage genomes in comparative genomics have been discussed pre- Comparisons to Studies Showing High Rates of Loss viously by Milinkovitch et al. (2010) who showed that they can While our results demonstrate that, overall, miRNA families generate artificially high numbers of gene losses. Tarver et al. exhibit proportionally few losses, and the majority of these are (2013) showed large-scale differences in the number of miss- attributable to a small number of miRNAs families and evo- ing miRNAs between three levels of sequencing coverage, lutionary lineages, this pattern is inconsistent with the with complete, high, and low coverage genomes missing view that loss of miRNA families is close to zero on an average 2.6%, 3%, and 19.5% miRNA families, re- (Sempere et al. 2006, 2007; Peterson et al. 2009; spectively. Low-coverage genomes were missing, on average, Sperling and Peterson 2009; Sperling et al. 2009; 6 times more miRNA families than high-coverage genomes, Wheeler et al. 2009; Tarver et al. 2013). Nevertheless, yet low-coverage genomes were included in both Guerra- our results are clearly incongruent with those published Assunc¸æo and Enright (2012) and Hertel and Stadler (2015). in other studies suggesting high rates of microRNA family Sequencing depth not only affects bioinformatic studies loss (Guerra-Assunc¸ao and Enright 2012; Meunier et al. but also those that rely only on small RNA sequencing data, 2013; Thomson et al. 2014; Hertel and Stadler 2015). All such as Meunier et al. (2013). This study generated small of these studies rely on different data sets, whether ge- RNA-seq data from five organs of single male representatives nomic (Guerra-Assunc¸ao and Enright 2012; Hertel and of six species, used to investigate the pattern of mammalian Stadler 2015), or small RNA (Meunier et al. 2013). Thus, miRNA evolution, concluding that there were high levels of the causes of the discrepancies between these studies and miRNA family turnover (Meunier et al. 2013). However, our own may include: 1) transcriptional noise, 2) false- miRNAs are known to have restricted tissue-specific expres- negatives, and 3) false-positives. sion profiles (Lagos-Quintana et al. 2002; Aboobakeretal. 1466 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE 2005), and so miRNA families present in the genome may not depth (100 Ms reads) from a breadth of tissues, organs, be sequenced in a sample of only five organs. Meunier et al. and developmental stages, as well as manual annotation of (2013) inferred the loss of miR-1, miR-22, and miR-122 the miRNAome supported by a ( 6) reference genome and (among others) in humans, despite numerous other studies a high-quality bioinformatics pipeline (Taylor et al. 2017). highlighting the key roles these miRNA families play in muscle Our principal analyses were based on a data set that development (Zhao et al. 2005), cancer (Song et al. 2013), minimizes false-positives and negatives (supplementary and in cholesterol and lipid metabolism within the liver (Girard file 1, Supplementary Material online). Our results dem- et al. 2008). It is clear that false-negatives caused either by the onstrate that miRNA families have a consistency index far use of low coverage genomes, unrepresentative tissue sam- higher than would be expected for either molecular or pling, or reliance on small RNA sequencing without reference morphological data sets of similar taxonomic scale and, to a genome, have all greatly increased the number of appar- thus, should make robust phylogenetic markers. This is ent losses of miRNA families. reflected in the results of our phylogenetic analyses that, even when based on miRNA family presence/absence, produces a tree that is in good agreement with phyloge- False-Positives nies based on more traditional markers (fig. 4). In compar- Genuine miRNAs incorrectly homologized in distantly re- ison to theresultofasimilaranalysis by Tarver et al. latedclades result inanoverestimationof miRNA loss from (2013), many of the nodes exhibit lower support values, all intermediate lineages. For example, the identification of principally because of differences in taxon sampling which mir-7880 in the 13-lined ground squirrel (Hertel and Stadler excluded lineages exhibiting few losses and introduced 2015), a miRNA family previously identified only in nema- lineages that exhibit many losses. Nevertheless, the results tode intestinal parasites, implies 51 instances of loss across are considerably more precisely and accurately resolved all intermediate lineages in the tree topology of Hertel and than those based on an uncurated miRBase data set Stadler (2015). However, this miRNA is not only absent (fig. 5; supplementary file 2, Supplementary Material from the ground squirrel genome, its annotation as a online). miRNA is spurious (the hairpin sequence lacks a two nucle- We found that support for some clades changed after in- 0 0 otide offset between the annotated 5 and 3 products; cluding singleton miRNA families (fig. 4A), with stronger sup- Fromm et al. 2015). Thus, all of these hypothesized losses port for the conventional grouping of birds with reptiles, as are artefacts of a single false-positive. well as a monophyletic Ecdysozoa. However, this analysis also gave strong support for a highly unconventional paraphyletic arrangement within Deuterostomia. These incongruities sug- microRNAs and Phylogenetics gest that small miRNA families contain some phylogenetic Data from our analysis of consistency indices suggest that signal but may still be subject to potentially strong partial miRNA families have the potential to be informative phyloge- sampling bias in our data set. Regardless, this topology was netic markers, supporting their use in previous studies still largely congruent with many established metazoan rela- (Heimberg et al. 2008, 2010; Sperling et al. 2009; Campbell tionships, despite the fact that nonsingletons accounted for et al. 2011; Philippe et al. 2011; Rota-Stabelli et al. 2011; only 322 miRNA families. Continued development of binary Helm et al. 2012; Fromm et al. 2013; Tarver et al. 2013; substitution models accommodating additional types of rate Field et al. 2014). However, the phylogenetic utility of heterogeneity, as well as partial sampling schemes, may mit- miRNA families has been called into question by Thomson igate the effect of loss, leading to greater accuracy in such et al. (2014) who found that the results from a selection of data sets (Taylor et al. 2014; Thomson et al. 2014). previous analyses lack statistical robustness. Following Tarver Finally, the concatenation of pri-miRNA sequences into et al. (2013), Thomson et al. (2014) used a Bayesian stochastic superalignments (Tarver et al. 2013) represents a promising Dollo model (Alekseyenko et al. 2008) which better accom- approach for future analyses. This approach has already re- modates homoplasy than does parsimony. This led to the solved relationships among taxa as diverse as mammals, pri- reanalysis of some of the older studies using new genomic mates, reptiles, drosophilids, and nematodes (Field et al. 2014; and additional small RNA-seq data (Field et al. 2014), correct- Kenny et al. 2015; Tarver et al. 2016), as well as in our analysis ing previous errors. Inevitably, early studies (e.g., Sperling of amniotes. In our pre-miR analysis, the results are robust al- et al. 2009; Heimberg et al. 2010; Philippe et al. 2011; though there are some nodes which conflict with other recent Helm et al. 2012; Lyson et al. 2012) fall short of current stand- studies, the critical taxa (e.g., horse; Tarver et al. 2016)have ards, based as they were on low coverage miRNAome se- invariably been identified as phylogenetically problematic quencing, annotated in the absence of a reference genome, based on conflict between previously published studies. At and performed before the Bayesian stochastic Dollo model the same time, these data resolve some controversial relation- was available. As such, they warrant further investigation fol- ships, such as the earliest diverging lineage of placental lowing contemporary standards of miRNAome sequencing mammals. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1467 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE David Bapst, Nicholas Crouch, Brian O’Meara, Matt Pennell, Conclusions and Liam Revell for discussion. Our data suggest that the loss of miRNA families is far from pervasive, contrary to several recent studies, all of which are undermined by a variety of biases, most substantially ascer- Literature Cited tainment bias. Where losses occur, they are nonrandomly dis- Aboobaker AA, Tomancak P, Patel N, Rubin GM, Lai EC. 2005. Drosophila tributed, with only 1.7% of miRNA families accounting microRNAs exhibit diverse spatial expression patterns during embry- onic development. Proc Natl Acad Sci U S A. 102(50):18017. for>45% of losses, while the nematodes Ascaris and Alekseyenko AV, Lee CJ, Suchard MA. 2008. Wagner and Dollo: a sto- Pristionchus account for 20% of losses yet represent only chastic duet by composing two parsimonious solos. Syst Biol. 6% of branches, bearing out the suggestion that miRNA fam- 57(5):772–784. ily gains and losses are associated with increasing and de- Ambros V. 2003. A uniform system for microRNA annotation. RNA creasing phenotypic complexity (e.g., as inferred from cell 9(3):277–279. Bai Y. 2014. Genome-wide sequencing of small RNAs reveals a tissue- diversity; Valentine et al. 1994), respectively. We demonstrate specific loss of conserved microRNA families in Echinococcus granulo- that thepresence/absenceofmiRNA families has aconsis- sus. BMC Genomics 15(1):736–713. tency index higher than expected for either morphological Berezikov E. 2010. Evolutionary flux of canonical microRNAs and mirtrons or molecular data of comparable taxon sampling. Our phylo- in Drosophila. Nat Genet. 42(1):6–9. genetic analyses were, in the main, congruent with estab- Berezikov E. 2011. Evolution of microRNA diversity and regulation in ani- mals. Nat Rev Genet. 12(12):846–860. lished metazoan relationships. However, our results suggest Brown M, Suryawanshi H, Hafner M, Farazi TA, Tuschl T. 2013. that small miRNA families (singletons in particular) may still be Mammalian miRNA curation through next-generation sequencing. subject to partial sampling bias in our data set. Our data sup- Front Genet. 4:145. port a pattern of miRNA diversification caused either by var- Campbell LI, et al. 2011. MicroRNAs and phylogenomics resolve the rela- iable rates of miRNA birth, or constant but low rates of tionships of Tardigrada and suggest that velvet worms are the sister group of Arthropoda. Proc Natl Acad Sci U S A. miRNA birth and a variable (but low) death rate, rather 108(38):15920–15924. than high birth and death rates. However, our under- Castellano L, Stebbing J. 2013. Deep sequencing of small RNAs identifies standing of miRNA evolution is greatly constrained by canonical and non-canonical miRNA and endogenous siRNAs in mam- the paucity of taxa with well-annotated miRNAomes, as malian somatic tissues. Nucleic Acids Res. 41(5):3339–3351. most of the taxa included here are distantly related to Chiang HR, et al. 2010. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. each other with an average lineage-specific branch length 24(10):992–1009. of 312.6 Myr. The paucity of taxa with well-annotated Crete-Lafrenie ` re A, Weir LK, Bernatchez L. 2012. Framing the Salmonidae miRNAomes runs the risk of telescoping past flux in spe- family phylogenetic portrait: a more complete picture from increased ciation and extinction onto individual internal branches of taxon sampling. PLoS One 7(10):e46662. a phylogenetic tree. In this way, the summed effects of Dabert M, Witalinski W, Kazmierski A, Olszanowski Z, Dabert J. 2010. Molecular phylogeny of acariform mites (Acari, Arachnida): strong random processes can produce nonrandom bursts in conflict between phylogenetic signal and long-branch attraction arti- (miRNA) innovation. More studies are needed to investi- facts. Mol Phylogenet Evol. 56(1):222–241. gate miRNA family evolution among congeneric species Dunn CW, Giribet G, Edgecombe GD, Hejnol A. 2014. Animal phylogeny with comparisons between the evolution of individual and its evolutionary implications*. Ann Rev Ecol Evol Syst. miRNA genes and the evolution of miRNA families. 45(1):371–395. Erwin DH, et al. 2011. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science Supplementary Material 334(6059):1091–1097. Felsenstein J. 1992. Phylogenies from restriction sites: a maximum- Supplementary data areavailableat Genome Biology and likelihood approach. Evolution 46(1):159–173. Evolution online. Field DJ, et al. 2014. Toward consilience in reptile phylogeny: miRNAs support an archosaur, not lepidosaur, affinity for turtles. Evol Dev. 16(4):189–196. Fromm B, et al. 2015. A uniform system for the annotation of vertebrate Acknowledgments microRNA genes and the evolution of the human microRNAome. Annu Rev Genet. 49(1):213–242. This work was supported by Irish Research Council for Science Fromm B, Worren MM, Hahn C, Hovig E, Bachmann L. 2013. Substantial Engineering and Technology grant (J.T.); Engineering and loss of conserved and gain of novel MicroRNA families in flatworms. Physical Sciences Research Council studentship (R.T.); Mol Biol Evol. 30(12):2619–2628. Girard M, Jacquemin E, Munnich A, Lyonnet S, Henrion-Caude A. 2008. Australian Research Council grant DE140101879 (G.L.); SE miR-122, a paradigm for the role of microRNAs in the liver. J Hepatol. Norway Health Authority grant 2014041 (B.F.); National 48(4):648–656. Aeronautics and Space Agency-Ames grant (K.J.P.); Wolfson Grbic M, et al. 2011. The genome of Tetranychus urticae reveals herbiv- Merit Award, Biological Sciences Research Council grant orous pest adaptations. Nature 479(7374):487–492. BB/N000919/1, and Natural Environment Research Council ~ Guerra-Assunc¸ao JA, Enright AJ. 2012. Large-scale analysis of microRNA evolution. BMC Genomics 13(1):218–212. grants NE/P013678/1; NE/N002067/1 (P.D.). We thank 1468 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Well-Annotated microRNAomes Do Not Evidence Pervasive miRNA Loss GBE Hansen TB, Kjems J, Bramsen JB. 2011. Enhancing miRNA annotation Peterson KJ, Dietrich MR, McPeek MA. 2009. MicroRNAs and metazoan confidence in miRBase by continuous cross dataset analysis. RNA macroevolution: insights into canalization, complexity, and the Biol. 8(3):378–383. Cambrian explosion. BioEssays 31(7):736–747. Heimberg AM, Cowper-Sallari R, Semon M, Donoghue PCJ, Peterson KJ. Philippe H, et al. 2011. Acoelomorph flatworms are deuterostomes related 2010. microRNAs reveal the interrelationships of hagfish, lampreys, to Xenoturbella. Nature 470(7333):255–258. and gnathostomes and the nature of the ancestral vertebrate. Proc Plasterk RHA. 2006. Micro RNAs in animal development. Cell Natl Acad Sci U S A. 107(45):19379–19383. 124(5):877–881. Heimberg AM, Sempere LF, Moy VN, Donoghue PCJ, Peterson KJ. 2008. Platt RN, et al. 2014. Large numbers of novel miRNAs originate from DNA MicroRNAs and the advent of vertebrate morphological complexity. transposons and are coincident with a large species radiation in bats. Proc Natl Acad Sci U S A. 105(8):2946–2950. Mol Biol Evol. 31(6):1536–1545. Pohl N, Sison-Mangus MP, Yee EN, Liswi SW, Briscoe AD. 2009. Impact of Helm C, Bernhart SH, Siederdissen C. H z, Nickel B, Bleidorn C. 2012. Deep sequencing of small RNAs confirms an annelid affinity of duplicate gene copies on phylogenetic analysis and divergence time Myzostomida. Mol Phylogenet Evol. 64(1):198–203. estimates in butterflies. BMC Evol Biol. 9(1):99–16. Hertel J. 2006. The expansion of the metazoan microRNA repertoire. BMC Pyron RA. 2010. A likelihood method for assessing molecular divergence Genomics 7(25). time estimates and the placement of fossil calibrations. Syst Biol. Hertel J, Stadler PF. 2015. The expansion of animal microRNA families 59(2):185–194. revisited. Life (Basel) 5(1):905–920. Quah S, Hui JH, Holland PW. 2015. A burst of miRNA innovation in the Hohna S. et al. 2016. RevBayes: Bayesian phylogenetic inference using early evolution of butterflies and moths. Mol Biol Evol. 32:1161–1174. graphical models and an interactive model-specification language. Reis M, Sousa-Guimar~ aes S, Vieira CP, Sunkel CE, Vieira J. 2011. Syst Biol. (4):726–736. Drosophila genes that affect meiosis duration are among the meiosis Iwama H, Kato K, Imachi H, Murao K, Masaki T. 2013. Human microRNAs related genes that are more often found duplicated. PLoS One originated from two periods at accelerated rates in mammalian evo- 6(3):e17512. lution. Mol Biol Evol. 30(3):613–626. Revell LJ. 2012. phytools: an R package for phylogenetic comparative bi- Kenny NJ, et al. 2015. The phylogenetic utility and functional constraint of ology (and other things). Methods Ecol Evol. 3(2):217–223. microRNA flanking sequences. Proc Biol Sci. 282(1803):20142983. Rota-Stabelli O, et al. 2011. A congruent solution to arthropod phylogeny: Kim T, Hao W. 2014. DiscML: an R package for estimating evolutionary phylogenomics, microRNAs and morphology support monophyletic rates of discrete characters using maximum likelihood. BMC Mandibulata. Proc R Soc B Biol Sci. 278(1703):298–306. Bioinformatics 15(1):320. Sanderson MJ, Donoghue MJ. 1989. Patterns of variation in levels of ho- moplasy. Evolution 43(8):1781–1795. Kluge AG, Farris JS. 1969. Quantitative phyletics and the evolution of anurans. Syst Biol. 18(1):1–32. Sanderson MJ, Donoghue MJ. 1996. The relationship between homoplasy Kozomara A, Griffiths-Jones S. 2011. miRBase: integrating microRNA an- and confidence in a phylogenetic tree. In: Sanderson MJ, Hufford L, notation and deep-sequencing data. Nucleic Acids Res. 39(Database editors. Homoplasy: the recurrence of similarity in evolution. San issue):D152–D157. Diego: Academic Press. p. 67–89. Kozomara A, Griffiths-Jones S. 2014. miRBase: annotating high confi- Sayed D, Abdellatif M. 2011. MicroRNAs in development and disease. dence microRNAs using deep sequencing data. Nucleic Acids Res. Physiol Rev. 91(3):827–887. 42(D1):D68–D73. Sempere LF, Cole CN, McPeek MA, Peterson KJ. 2006. The phylogenetic Lagos-Quintana M, et al. 2002. Identification of tissue-specific microRNAs distribution of metazoan microRNAs: insights into evolutionary com- from mouse. Curr Biol. 12(9):735–739. plexity and constraint. J Exp Zool B Mol Dev Evol. 306B(6):575–588. Lartillot N, Rodrigue N, Stubbs D, Richer J. 2013. PhyloBayes MPI: phylo- Sempere LF,Martinez P,Cole C,Baguna J, Peterson KJ. 2007. genetic reconstruction with infinite mixtures of profiles in a parallel Phylogenetic distribution of microRNAs supports the basal position environment. Syst Biol. 62(4):611–615. of acoel flatworms and the polyphyly of Platyhelminthes. Evol Dev. Lloyd GT. 2016. Estimating morphological diversity and tempo with dis- 9(5):409–415. crete character-taxon matrices: implementation, challenges, progress, Song SJung, et al. 2013. The oncogenic microRNA miR-22 targets the and future directions. Biol J Linn Soc. 118(1):131–151. TET2 tumor suppressor to promote hematopoietic stem cell self- Lu J, et al. 2008. Adaptive evolution of newly emerged micro-RNA genes renewal and transformation. Cell Stem Cell 13(1):87–101. in Drosophila. Mol Biol Evol. 25(5):929–938. Sperling EA, et al. 2009. MicroRNAs resolve an apparent conflict between Lyson TR, et al. 2012. MicroRNAs support a turtle – lizard clade. Biol Lett. annelid systematics and their fossil record. Proc R Soc B Biol Sci. 8(1):104–107. 22;276(1677):4315–4322. Lyu Y, et al. 2014. New microRNAs in Drosophila–birth, death and cycles Sperling EA, Peterson KJ. 2009. MicroRNAs and metazoan phylogeny: big of adaptive evolution. PLoS Genet. 10(1):e1004096. trees from little genes. In: Telford MJ, Littlewood DTJ, editors. Animal Meng Y, Shao C, Wang H, Chen M. 2012. Are all the miRBase-registered evolution: genomes, fossils and trees. Oxford: Oxford University Press. microRNAs true? A structure- and expression-based re-examination in p. 157–170. plants. RNA Biol. 9(3):249–253. Tanzer A, et al. 2010. Evolutionary genomics of microRNAs and their Meunier J, et al. 2013. Birth and expression evolution of mammalian relatives. In: Caetano-Anolles G, editor. Evolutionary genomics and microRNA genes. Genome Res. 23(1):34–45. systems biology. Hoboken (NJ): Wiley-Blackwell. p. 295–327. Milinkovitch MC, Helaers R, Depiereux E, Tzika AC, Gabaldo n T. 2010. 2X Tarver JE, Donoghue PCJ, Peterson KJ. 2012. Do miRNAs have a deep genomes – depth does matter. Genome Biol. 11(2):r16. evolutionary history? BioEssays 34(10):857–866. Nicholls GK, Gray RD. 2008. Dated ancestral trees from binary trait data Tarver JE, et al. 2013. miRNAs: small genes with big potential in metazoan and their application to the diversification of languages. J R Stat Soc. phylogenetics. Mol Biol Evol. 30(11):2369–2382. 70(3):545–566. Tarver JE, et al. 2016. The interrelationships of placental mammals and the Nozawa M, Miura S, Nei M. 2010. Origins and evolution of microRNA limits of phylogenetic inference. Genome Biol Evol. 8(2):330–344. genes in Drosophila species. Genome Biol Evol. 2(0):180–189. Taylor RS, Tarver JE, Foroozani A, Donoghue PC. 2017. MicroRNA anno- Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and tation of plant genomes – do it right or not at all. BioEssays evolution in R language. Bioinformatics 20(2):289–290. 39(2):1600113. Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 1469 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018 Tarver et al. GBE Taylor RS, Tarver JE, Hiscock SJ, Donoghue PC. 2014. Evolutionary history Wheeler BM, et al. 2009. The deep evolution of metazoan microRNAs. of plant microRNAs. Trends Plant Sci. 19(3):175–182. Evol Dev. 11(1):50–68. Team RC. 2016. R: a language and environment for statistical computing. Zhao Y, Samal E, Srivastava D. 2005. Serum response factor regulates a Vienna: R Foundation for Statistical Computing. muscle-specific microRNA that targets Hand2 during cardiogenesis. Thomson RC, Plachetzki DC, Mahler DL, Moore BR. 2014. A critical ap- Nature 436(7048):214–220. praisal of the use of microRNA data in phylogenetics. Proc Natl Acad Zhong Y-f, Holland PW. 2011. The dynamics of vertebrate homeobox Sci U S A. 111(35):E3659–E3668. gene evolution: gain and loss of genes in mouse and human lineages. Valentine JW, Collins AG, Meyer CP. 1994. Morphological complexity in- BMC Evol Biol. 11(1):1–13. crease in metazoans. Paleobiology 20(02):131–142. Zhou M, Luo H. 2013. MicroRNA-mediated gene regulation: potential Wahlberg N, et al. 2009. Nymphalid butterflies diversify following near applications for plant genetic engineering. Plant Mol Biol. 83(1– demise at the cretaceous/tertiary boundary. Proc R Soc B Biol Sci. 2):59–75. 2009.1303. Wang X, Liu S. 2011. Systematic curation of miRBase annotation using integrated small RNA high-throughput sequencing data for C. elegans and Drosophila. Front Genet. 276(1677):4295–302. Associate editor: Mary O’Connell 1470 Genome Biol. Evol. 10(6):1457–1470 doi:10.1093/gbe/evy096 Advance Access publication May 18, 2018 Downloaded from https://academic.oup.com/gbe/article-abstract/10/6/1457/4999385 by Ed 'DeepDyve' Gillespie user on 20 June 2018

Journal

Genome Biology and EvolutionOxford University Press

Published: May 18, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off