Adenosine-to-inosine (A-to-I) editing is widespread across the kingdom Metazoa. However, for the lack of comprehensive analysis in nonmodel animals, the evolutionary history of A-to-I editing remains largely unexplored. Here, we detect high- conﬁdence editing sites using clustering and conservation strategies based on RNA sequencing data alone, without using single-nucleotide polymorphism information or genome sequencing data from the same sample. We thereby unveil the ﬁrst evolutionary landscape of A-to-I editing maps across 20 metazoan species (from worm to human), providing unprecedented evidence on how the editing mechanism gradually expands its territory and increases its inﬂuence along the history of evo- lution. Our result revealed that highly clustered and conserved editing sites tended to have a higher editing level and a higher magnitude of the ADAR motif. The ratio of the frequencies of nonsynonymous editing to that of synonymous editing remark- ably increased with increasing the conservation level of A-to-I editing. These results thus suggest potentially functional beneﬁt of highly clustered and conserved editing sites. In addition, spatiotemporal dynamics analyses reveal a conserved enrichment of editing and ADAR expression in the central nervous system throughout more than 300 Myr of divergent evolution in complex animals and the comparability of editing patterns between invertebrates and between vertebrates during development. This study provides evolutionary and dynamic aspects of A-to-I editome across metazoan species, expanding this important but understudied class of nongenomically encoded events for comprehensive characterization. Key words: A-to-I RNA editing, ADAR, ADAR motif, dynamic editome, evolution. Introduction 1989), and coding potential (Burns et al. 1997). RNA editing Adenosine-to-inosine (A-to-I) RNA editing is a common co- or events have been reported to be highly associated with mod- post-transcriptional mechanism in metazoans, which is medi- iﬁcation of regulatory RNAs (Kawahara et al. 2007; Tomaselli ated by the ADAR (adenosine deaminases that act on RNA) et al. 2015), cancer mechanisms (Maas et al. 2006; Han et al. family of proteins and converts adenosine (A) into inosine (I) 2015), embryogenesis (Wang et al. 2000; Osenberg et al. (Bass 2002; Nishikura 2006). Since inosine is then recognized 2010), and brain development or neural differentiation as guanine (G) in living cells, A-to-I editing is also known as A- (Wahlstedt et al. 2009; Hwang et al. 2016). These observa- to-G editing (Bass 2002; Nishikura 2006). Although an A-to-I tions indicate the functional importance of A-to-I RNA editing editing event only alters one base at the RNA level, which not only for the immediate biological relevance but also for seems to slightly affect the corresponding RNA molecule, the potential molecular complexity. emerging evidence shows that it can affect both transcription The past decade has seen a rapid growth in efforts to and translation in different aspects such as microRNA regula- identify A-to-I sites on a transcriptome-wide scale; a tremen- tion (Rueter et al. 1999; Kawahara et al. 2007), alternative dous number of editing events have been detected (Levanon splicing (Rueter et al. 1999; Lev-Maor et al. 2007), structural et al. 2004; Danecek et al. 2012; Peng et al. 2012; alteration (Kimelman and Kirschner 1989; Wagneretal. Ramaswami et al. 2013; Bazak et al. 2014; Ramaswami and The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non- commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact firstname.lastname@example.org Genome Biol. Evol. 10(2):521–537. doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 521 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE Li 2014; Picardi et al. 2015; Zhang and Xiao 2015). However, et al. 2015), our spatiotemporal atlas provided an unprece- identiﬁcation of RNA editing often severely hampers by false dented opportunity for studying dynamic editome in diverse positives arising from RNA/DNA sequencing errors, single nu- species and for assessing the correlation between the global cleotide polymorphisms (SNPs) or somatic mutations between editing and expression of the two functional editors (ADAR1/ the individual cells, incompleteness of genomic sequences ADAR2) in the context of evolution. and gene annotation, and mapping errors of the cDNA sequences to the reference genome (Bass et al. 2012; Materials and Methods Ulbricht and Emeson 2014). Several RNA editing-detecting Data Retrieval and Access methods minimize false positives from SNPs by comparing the DNA and RNA sequences from a single individual (Bahn The information of the RNA-seq data used in this study was et al. 2012; Peng et al. 2012; Ramaswami et al. 2012). listed in supplementary data set S1, Supplementary Material However, this approach simultaneously requires genome online. TEs and other repetitive elements were identiﬁed by and transcriptome sequencing data from the same sample, RepeatMasker and were downloaded from the University of limiting its practicability. To address the lack of genome se- California Santa Cruz (UCSC) genome browser at https://ge- quencing data, a complete (or a partial) SNP database is often nome.ucsc.edu/. The PhyloP scores were also downloaded used to ﬁlter out SNPs (Bahn et al. 2012; Ramaswami et al. from the UCSC genome browser. All supplementary data 2012, 2013; Bazak et al. 2014; Zhang and Xiao 2015; John sets (S1–S7) are publicly downloadable at http://idv.sinica. et al. 2017; Sun et al. 2016). These approaches require a priori edu.tw/trees/RNA_editing/RNA_editing.html. The in-house knowledge from known SNPs such as dbSNP, also decreasing programs for identifying RNA editing sites are publicly acces- their capability to identify editing sites in diverse organs/ani- sible from GitHub at https://github.com/TreesLab/ICARES/ mals, especially in nonmodel species due to the lack of SNP tree/dev. A visualizable website was also provided (https:// information. Moreover, the phenomenon that most detected sites.google.com/site/recodedatabase/), which allows users RNA editing sites on coding regions tend to be nonadaptive to search for the identiﬁed RNA editing sites by providing (Chen et al. 2014; Xu and Zhang 2014) reveal another chal- the coordinates of genomic regions or gene symbols. lenge in extracting functionally beneﬁcial editing events from the sea of sites with a very low level of editing (Pinto et al. The Pipeline for Identifying A-to-I Sites 2014; Savva and Reenan 2014; Xu and Zhang 2015). These challenges thus limit our understanding of the evolutionary The RNA editing sites were identiﬁed by two strategies: the landscape and spatiotemporal dynamics of A-to-I editing in clustering strategy within the same species and the conserva- context. tion strategy with cross-species comparison (ﬁg. 1A). The clus- Previously, it was observed an enrichment for clusters of tering process was made up of three main phases to detected editing sites, suggesting that the effect of sequenc- distinguish editing sites from technical and biological noises. ing errors, SNPs, and mutations on identiﬁcation of A-to-I First, we collected 309 RNA-seq data from 20 metazoan spe- editing could remarkably decrease with increasing the num- cies (supplementary data set S1, Supplementary Material on- ber of consecutive variants (Levanon et al. 2004; Zaranek et al. line), and used BWA (Li and Durbin 2009)(version 1.2.3)for 2010). The clustering strategy has been demonstrated to be short-read mapping. SNV calling was then performed using effective to detect RNA editing sites without the need for SNP SAMtools pileup (Li et al. 2009) (version 1.0.2) with a pileup information or genome sequencing data from the same sam- parser program downloaded from Galaxy (pileup_parser.pl ple (Porath et al. 2014; Feng Zhang et al. 2017). To generate with parameter settings: 3 9 10 8 20 1 “No” “Yes” 2 an evolutionary landscape of A-to-I RNA editome across “Yes” “Yes”). The type of nucleotide substitution was deter- metazoan species, most of which have no (or limited) SNP mined on the basis of the reference genome and the strand data, we employ the clustering strategy to identify A-to-I edit- information of the Ensembl-annotated genes (see supplemen- ing sites using RNA-seq data alone. For accuracy, we selected tary data set S1, Supplementary Material online). Second, we A-to-I editing sites by controlling for the fraction of A-to-G selected dimorphic variants (i.e., sites with two distinct nucle- mismatches to all types of mismatches (designated as otide types), and discarded singleton and multi-allelic ones. To “%AG”; >95%) and false discovery rate (FDR; <1%) or eliminate potential strand-speciﬁc miscalls, we discarded var- cross-species conservation of A-to-I editing, and thereby iden- iants with strand bias. For each species, we used MAQ (Li et al. tiﬁed 429,509 high-conﬁdence A-to-I sites in diverse species 2008) to simulate all possible cDNA short-reads based on from worm to human, including model and nonmodel ani- Ensembl-annotated transcripts, mapped these simulated mals. We thus constructed the ﬁrst evolutionary landscape reads to the reference genome, and compiled a mapping er- and spatiotemporal atlas of A-to-I editing across metazoan ror set according to the mapping result. The called variants species. In contrast with previous studies that investigated within this set were discarded since they were subject to map- editing dynamics in limited genes/species (Wahlstedt et al. ping errors. Besides, sequencing errors were reported to occur 2009; Shtrichman et al. 2012; Veno et al. 2012; Picardi in proximity (Nakamura et al. 2011). We then identiﬁed a 522 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE sequencing error set (SES) wherein variant calls of different experimentally conﬁrmed (supplementary ﬁg. S1A and B, mismatch types occurred in proximity (i.e., distance between Supplementary Material online; supplementary dat aset S3, each other <100 bp). Variants overlapped with the SES were Supplementary Material online). In addition to mouse and ﬂy subsequently discarded. In addition, for accuracy, only the editing sites, which have been comprehensively detected and variants with multi-sample evidence were retained. Third, collected in public databases, we also selected nine editing the retained variants were compiled to identify high- sites from the identiﬁed chicken editing sites and successfully conﬁdence A-to-I sites using the clustering strategy. Of conﬁrmed eight of them in chicken brain (supplementary ﬁg. note, the editing type of each mismatch was determined on S1C, Supplementary Material online; supplementary data set the basis of the Ensembl annotation. The clustering strategy S3, Supplementary Material online). compiled variants of the same mismatch type in close prox- imity (i.e., distance between each other <100 bp), because it was observed that the effect of sequencing errors, SNPs, and Samples and Validation mutations on identiﬁcation of A-to-I editing could remarkably Genomic DNA and total RNA were extracted from frozen decrease with increasing the number of consecutive variants brain of mouse (C57BL/6J) at postnatal day 49, fresh brain (Levanon et al. 2004; Zaranek et al. 2010). The proximal dis- of adult chicken, and wild-type ﬂy (Drosophila melanogaster). tance was set as 100 bp because of the observation that the Both genomic DNA and total RNA were obtained from the vast majority (>95%) of the previously identiﬁed A-to-I edit- same individual/sample. Genomic DNA and total RNA were ing sites have at least one neighbor site within the proximal extracted by PureLink Genomic DNA Mini Kit (Thermo Fisher distance of 100 bp (supplementary table S1, Supplementary Scientiﬁc) and PureLink RNA Mini Kit (Thermo Fisher Scientiﬁc) Material online). For each species, the qualiﬁed number of with DNase I according to the manufacturer’s instructions, consecutive variants (qualiﬁed N ) was determined by cluster respectively. Primers were designed in ﬂanking sequences of the FDR and %AG. The FDR of A-to-I sites was deﬁned as the tested editing sites. 5 lg total RNA transforms to cDNA FDR¼the number of G-to-A mismatches/the number of A-to- with SuperScript III First-Strand Synthesis System (Thermo G mismatches. The qualiﬁed N was determined while the cluster Fisher Scientiﬁc) using Random Hexamer and Oligo-DT pri- detected sites satisﬁed both of the two rules: %AG>95% mers. 50 ng of genomic DNA and cDNA were used for the and FDR<1%. For the conservation strategy, the A-to-I edit- PCR analysis of the editing sites. The PCR was performed us- ing sites were identiﬁed from three clades (12 vertebrates, 4 ing DreamTaq Green PCR Master Mix (Thermo Fisher Drosophila species, and 4 Caenorhabditis species), respec- Scientiﬁc) on Veriti Thermal Cycler (Thermo Fisher tively. We considered the A-to-G mismatches that have Scientiﬁc). PCR products were validated by gel, and then passed the strand-bias, mapping error, and sequencing error treated with QIAquick Gel Extraction Kit (Qiagen). The edit- ﬁlters described in Phase II (i.e., ad hoc ﬁlters; ﬁg. 1A)for each ing sites were selected to be tested, if they were detected species. Such A-to-G variants that were observed at the by the RNA-seq data (number of mapped reads at the site orthologous sites in more than one species of the examined should be >10) to be editing events with editing level clade were regarded as evolutionarily conserved A-to-I editing >10% in at least one sample. Sanger sequencing was per- sites (ﬁg. 1A). A full list of the identiﬁed A-to-I editing sites formed to validate the corresponding editing positions of (including clustered and conserved editing sites) referred to genomic DNA and cDNA sequences (see supplementary the Ensembl annotation (releases 66 and 85) is given in sup- ﬁg. S1, Supplementary Material online). The primer plementary data set S2, Supplementary Material online. sequences used were listed in supplementary data set S3, Compared with previously identiﬁed A-to-I editing sites in hu- Supplementary Material online. man, mouse, and ﬂy [collected in the well-known public data- bases: DARNED (Kiran and Baranov 2010), RADAR (Ramaswami and Li 2014), and REDIportal (Picardi et al. Observed-to-Expected Ratio of “G” 2017)], we found that most the identiﬁed human editing sites (90%) were also detected in the public databases (supple- The observed-to-expected (O/E) ratio of thepresenceof G mentary dat aset S2, Supplementary Material online). In con- was calculated to examine ADAR preference for A-to-I edit- trast, more than one third of the identiﬁed mouse and ﬂy ing, which was deﬁned as P =P ,where P rep- ObsðGÞ ExpðGÞ Obs(G) A-to-I editing sites (92% for mouse and 33% for ﬂy) were resented the frequency of observed presence of “G” not found in these databases (supplementary data set S2, immediately upstream or downstream to the A-to-I sites, Supplementary Material online). To validate the newly identi- and P represented that of expected presence of “G,” Exp(G) ﬁed A-to-I editing sites in mouse and ﬂy, we selected 11 which was calculated by the frequency of “G” in the exam- mouse and 12 ﬂy editing sites and performed PCR ampliﬁca- ined genome. The statistical signiﬁcance of difference be- tion and Sanger sequencing of both DNA and RNA of mouse tween the observed and expected ratios of the presence of brain and wild-type ﬂy from the same individual. Our result “G” was evaluated by the two-tailed Fisher’s exact test using revealed that 10 mouse and 12 ﬂy editing sites were theR package(https://www.r-project.org/). Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 523 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE FIG.1.—Identiﬁcation of high-conﬁdence A-to-I RNA editing sites, without a priori knowledge of SNP information. (A) Overview of the identiﬁcation of RNA editing sites. The editing sites were identiﬁed by the clustering strategy within the same species or by cross-species comparison. The clusteringprocess involved three main phases: Phase I: preprocessing, Phase II: ad hoc ﬁlters, and Phase III: ad hoc identiﬁcation. (B) Correlation between the number of 524 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Analysis of the Conservation Level of A-to-I Editing and Measurement of ADAR mRNA Expression Conservation Scores ADAR mRNA expression was represented by BPKM (Bases Per To assess the correlation between the conservation level of Kilo-base of gene model per Million mapped bases) nonsynonymous A-to-I editing and conservation scores, we (Mortazavi et al. 2008). To improve the precision of mRNA ﬁrst performed the UCSC LiftOver tool to convert genome expression measurement in poly(A)-selected RNA-seq sam- coordinates of nonhuman species into those of human at ples, we only considered 3 UTR exons in the gene model of the nonsynonymous sites of adenosine for the four categories calculation. of conservation (i.e., human–chimpanzee, human– chimpanzee–mouse, and human–chimpanzee–mouse– Results chicken conservation). For each category of conservation, the control sites with the same number of “A” of the corre- Identiﬁcation of Editing Sites across 20 Metazoan Species sponding category of conserved editing sites were selected To investigate the evolutionary landscape and dynamic randomly. The PhyloP score of each selected “A” was changes of A-to-I editing across metazoan species, we ﬁrst extracted from the UCSC genome browser. The simulation collected RNA-seq data from various cell types of 20 species, was performed 1,000 replicates for each category of including 6 primates, 3 nonprimate mammals, 3 nonmamma- conservation. lian vertebrates, 4 Drosophila species, and 4 Caenorhabditis species (309 samples in total; supplementary data set S1, Comparative Analysis of A-to-I Editing Levels Supplementary Material online). We then used the clustering strategy to detect RNA editing sites based on RNA-seq data We selected 16 animal individuals (2 for chimpanzee, 2 for alone, without the need for a priori knowledge from known bonobo, 2 for gorilla, 1 for orangutan, 2 for macaque, 3 for SNPs or genome sequencing data from the same sample mouse, 1 for opossum, 1 for platypus, and 2 for chicken) for (Materials and Methods). Two useful measures are often ap- assessing editing dynamics in 5 organs (i.e., cerebellum, brain, plied to evaluating the speciﬁcity of the identiﬁed A-to-I edit- liver, kidney, and heart). Of note, samples were selected for ing sites: 1) the fraction of A-to-G mismatches to all types of spatial proﬁling if they were eligible for measurement in ﬁve mismatches (designated as “%AG”) because of the extreme organs from a single individual. Each individual must contain infrequency of non-A-to-G editing events (Kleinman and all ﬁve types of organ samples with reliable RNA quality (i.e., Majewski 2012; Lin et al. 2012; Pickrell et al. 2012); and 2) RNA integrity number [Schroeder et al. 2006] >7.0). On the the ratio of the number of G-to-A mismatches to the number other hand, we used four species (i.e., Caenorhabditis ele- of A-to-G mismatches (i.e., FDR, see Materials and Methods) gans, D. melanogaster, zebraﬁsh, and frog) in the proﬁling because of the assumption that G-to-A mismatches often of developmental dynamics. Since A-to-I editing could inﬂu- reﬂected sequencing errors (Bahn et al. 2012; Liscovitch- ence its target sites in a local dsRNA region (Zaranek et al. Brauer et al. 2017). We can ﬁnd that the percentage of A- 2010; Bazak Levanon, et al. 2014), the level of editing was to-G/T-to-C variants grew (or the FDR values decreased) rap- determined in a clustering manner as idly with the increase of the number of consecutive single- A to I editing level ¼ nucleotide variants (SNVs) of the same type (designated as Number of G in the clustered sites “N ”) in all examined species (ﬁg. 1B). Of note, the T- i cluster Number of A and G in the clustered sites to-C variants were considered, because the used RNA-seq data were not strand-speciﬁc and these variants could possi- bly be A-to-G editing sites in an antisense transcript. This with i ¼ 1, .. ., N,where N is the total number of editing sites revealed that a priori knowledge of the clustering tendency in the designated cluster. For each individual in the spatial for A-to-I editing held well across species, including model context or animal in the temporal context, the editing levels and nonmodel animals. For accuracy, the qualiﬁed N were accumulated and the highest value among the exam- cluster was determined while the detected sites satisﬁed both ined organs or development stages was used to normalize the %AG>95% and FDR<1% (ﬁg. 1B). Upon completion of A-to-I editing index. To ensure the statistical accuracy, we only the screening process, 343,979 A-to-G editing sites were considered the editing clusters with reasonable base coverage identiﬁed in diverse species from worm to human. Since the (i.e., the total number of A and G in the clustered sites 10) more species the variants were supported by, the higher level for the analysis of editing dynamics. FIG.1.—Continued consecutive SNVs of the same type (N ) and the two measures of speciﬁcity: %AG (the percentage of A-to-G & T-to-C among 12 SNV types) and FDR cluster (the ratio of the number of G-to-A mismatches to the number of A-to-G mismatches). Histograms for each species represent A-to-G & T-to-C percentage in the subset of N consecutive SNVs of the same type. The dark gray histogram for each species represents the qualiﬁed N , which satisﬁes both cluster cluster %AG> 95% and FDR< 1%. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 525 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE of functional potentiality (and thus accuracy) they contained are highly associated with the evolution of RNA editing be- (Bahn et al. 2012), we recovered the A-to-G variants at the tween Drosophila species (Sapiro et al. 2015). Third, we di- orthologous sites in multiple species (editing sites supported vided all examined human editing sites into 20 equal-size bins by more than one species; the conservation strategy). By do- according to their editing levels and found a remarkably pos- ing so, 179,182 evolutionarily conserved A-to-G editing sites itive correlation between the magnitude of the ADAR motif were identiﬁed in the 20 species. Integrating the editing sites and editing level (ﬁg. 2E). Of note, if a detected site appeared identiﬁed by these two strategies, we totally identiﬁed in multiple samples, the highest level observed was consid- 429,509 editing sites (table 1 and supplementary data set ered. Finally, we also observed that both N (ﬁg. 2F)and cluster S2, Supplementary Material online). Of note, we only consid- the conservation level of editing (ﬁg. 2G) were positively cor- ered the sites that were located within genic regions. We related with editing level. The correlations among N ,the cluster emphasize that the identiﬁed editing sites are highly conﬁdent conservation level of editing, the magnitude of the ADAR with controlling for FDR< 1% and %AG> 95% (the cluster- motif, and editing level were summarized in ﬁgure 2H. ing strategy) or evolutionary conservation of editing in multi- Since a relatively higher editing level at a site is expected to ple species (the conservation strategy). be functionally important (Xu and Zhang 2015), our results suggested the biological signiﬁcance of highly clustered and conserved editing sites. The Correlation between N , Conservation cluster Level of Editing, Editing Level, and the Magnitude of the ADAR Motif Evolutionary Analysis of A-to-I Editomes The A-to-I editors (i.e., ADAR proteins) are known to be highly Since transposable elements (TEs) are pervasive in the conserved across species (Bass 2002). Previous studies have genomes of higher eukaryotes, the double-stranded RNA observed that in some animals ADARs have a sequence pref- structure formed by TE-complementary pairs often provides erence (or targets) for “G” depletion and “G” enrichment at an ideal target for ADAR binding (Levanon et al. 2004). To 0 0 the 5 and 3 neighbor nucleotides next to A-to-I editing sites, understand the extent of ADAR substrates among different respectively (Kiran and Baranov 2010; Lehmann and Bass lineages and species, we constructed an evolutionary land- 2000; Eggington et al. 2011; Ramaswami et al. 2013; Chen scape of A-to-I editing maps (i.e., A-to-I editomes). We et al. 2014; Pinto et al. 2014; Porath et al. 2014; Alon et al. showed that the scaffolding of A-to-I editomes was highly 2015; Liscovitch-Brauer et al. 2017). We evaluated the associated with the expansion of TEs (ﬁg. 3A). We found observed-to-expected (O/E) ratios of the presence of “G”im- that the distribution of clustered A-to-I sites pertaining to mediately upstream or downstream to the A-to-I editing sites TEs (ﬁg. 3B) generally reﬂected the TE density in the genome (see Materials and Methods) and showed that the trend of the (ﬁg. 3A), echoing the observation that A-to-I editing tended known ADAR motif generally held true across metazoan spe- to be clustered within TEs (Kim et al. 2004). In addition, the cies, regardless of the detection strategies of A-to-G editing landscape was likely to encompass all the A-to-I editing events (i.e., the clustered and conserved sites; ﬁg. 2A and B). that were highly conserved across species, whilst contained We processed to examine the correlations among the lineage- or species-speciﬁc events that might have been magnitude of clustering of editing sites (N ), the conser- recruited as a result of divergent evolution. Importantly, short cluster vation level of editing, the magnitude of the ADAR motif, and interspersed elements (SINEs), which were apparently editing level. First, we found a positive correlation between enriched in the mammalian genomes (ﬁg. 3A, top), consti- the magnitude of the ADAR motif and N .Such a trend tuted a major landmark in mammalian editomes (ﬁg. 3A,bot- cluster that the magnitude of the ADAR motif increased with increas- tom). This co-evolution of SINEs and editomes might ing N became ﬂat after the qualiﬁed N (ﬁg. 2C). accelerate the divergence of species, not only in the genomic cluster cluster Second, we classiﬁed the identiﬁed human editing sites into content, but also in the transcriptomic complexity. For exam- three groups according to the conservation level (supplemen- ple, in primates, genes associated with neurological processes tary data set S4, Supplementary Material online) and found or disorders were more likely to develop a host of SINE- that the magnitude of the ADAR motif signiﬁcantly increased mediated substrates of A-to-I editing (Paz-Yaacov et al. with increasing the conservation level of A-to-I editing 2010), possibly resulting in the development of more complex (ﬁg. 2D). This also reﬂected the sequence and structural con- transcriptome in the primate brain. servation in the double-stranded RNA binding domains To further examine the biological signiﬁcance of conserved (dsRBDs) of ADARs across 14 vertebrates (supplementary editing sites, we retrieved 66,689 primate-only, 39 mammal- ﬁg. S2A, Supplementary Material online), in which the protein only, and 16 vertebrate-conserved editing events according to residues involved in RNA binding (Steﬂ et al. 2010)were even the conservation level of the human A-to-I editing events highly conserved from vertebrates to Drosophila (supplemen- identiﬁed by our conservation strategy (supplementary data tary ﬁg. S2B and C, Supplementary Material online). These set S4, Supplementary Material online). Of note, primate-only results respond to a recent notion that cis sequence changes events were human editing events observed at the 526 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Table 1 Summary of A-to-I Editing Sites Identiﬁed in This Study Organis Number of Number of A-to-I Editing Sites Host Genes Samples Clustering Conservation Total TE Sites Strategy (A) Strategy (B) Sites (A[B) Homo sapiens (human) 45 177,734 66,800 222,300 213,772 11,035 Pan troglodytes (chimpanzee) 15 42,082 45,117 54,891 52,939 5,292 Pan paniscus (bonobo) 12 30,807 37,021 42,420 40,911 4,929 Gorilla gorilla (gorilla) 11 12,169 15,936 19,956 18,561 3,442 Pongo abelii (orangutan) 9 6,230 8,828 11,125 10,820 2,357 Macaca mulatta (macaque) 13 3,034 4,593 6,206 5,997 1,691 Mus musculus (mouse) 32 57,125 115 57,143 47,859 3,738 Monodelphis domestica (opossum) 12 661 27 682 588 89 Ornithorhynchus anatinus (platypus) 12 267 7 274 253 36 Gallus gallus (chicken) 12 77 17 94 30 20 Xenopus tropicalis (frog) 40 4,495 4 4,499 1,812 146 Danio rerio (zebraﬁsh) 8 5,331 5 5,336 4,571 340 Fly species D. melanogaster 34 317 229 534 208 114 D. pseudoobscura 8 464 206 636 N/A 112 D. willistoni 893 63 149 N/A 124 D. mojavensis 4 1,894 214 2,065 208 410 Nematode species Caenorhabditis elegans 16 888 0 888 584 57 C. briggsae 6 241 0 241 164 21 C. brenneri 628 0 28 N/A 3 C. japonica 642 0 42 N/A 2 The detailed information of the identiﬁed sites (e.g., genomic location and host genes) is listed in supplementary data set S2, Supplementary Material online. TE sites: A-to-I editing sites that are located in TEs. TE information of the species is not available. orthologous sites in nonhuman primate(s) but not observed in with increasing the conservation level of A-to-I editing. nonprimate vertebrates. Mammal-only ones were human Furthermore, considering the total “A” sites within the hu- editing events observed at the orthologous sites in both non- man coding sequences that would cause nonsynonymous human primate(s) and nonprimate mammal(s) but not ob- and synonymous changes if edited to “G,” respectively, we served in nonmammal vertebrates. Vertebrate-conserved calculated the frequencies of nonsynonymous (f ) and synon- ones were human editing events simultaneously observed at ymous (f ) editing (Xu and Zhang 2014). If nonsynonymous the orthologous sites in nonhuman primate(s), nonprimate editing events are generally deleterious and are destined to mammal(s), and nonmammal vertebrate(s). We found that selective elimination, f should exhibit remarkably smaller the great majority of the conserved editing events (99%) than f (Chen 2013; Xu and Zhang 2014). To examine the were primate-only, of which 98% (65,109 out of 66,689 relationship between the conservation level of A-to-I editing events) were located in TEs, especially in the primate- and the f -to-f ratio, we retrieved human nonconserved, n s speciﬁc Alu sequences (63,745 events; 96%), whereas no human–chimpanzee shared, human–chimpanzee–mouse mammal-only or vertebrate-conserved events occurred in shared, and human–chimpanzee–mouse–chicken shared syn- Alu repeats (ﬁg. 4A). This also reﬂected our abovementioned onymous (or nonsynonymous) editing sites and the corre- notion that the scaffolding of A-to-I editomes could be raised sponding shared “A” sites that would have synonymous (or by the introduction of TEs in a lineage-speciﬁc manner (ﬁg. 3). nonsynonymous) changes if edited to “G,” respectively (sup- We examined the effect of the conservation level on the dis- plementary table S2, Supplementary Material online). tribution of editing events. We found that the only 0.23% of Intriguingly, we found that the f -to-f ratio remarkably n s primate-only events caused amino acid changes (nonsynon- increased with increasing the conservation level of A-to-I edit- ymous changes), whereas as high as 26% and 88% of ing, andthe ratioevenexhibited 100% >1 for the human– mammal-only and vertebrate-conserved events leaded to chimpanzee–mouse–chicken shared events (ﬁg. 4C). A nonsynonymous changes, respectively (ﬁg. 4B). This reveals previous study reported that nonsynonymous editing events that the percentage of nonsynonymous editing sites increase were observed to occur less frequently than expected by Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 527 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE FIG.2.—The cis-preference of ADARs and editing level for the detected A-to-G editing sites. (A, B)The cis-preference of ADARs (or the ADAR motif, which was measured by the observed-to-expected (O/E) ratio of the presence of “G” immediately upstream and downstream to the A-to-G editing sites) for 528 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE 20 20 10 100 0 FIG.3.—The relationship between the expansion of TEs and the increase of A-to-I editing sites. (A) The average numbers of TEs (i.e., SINEs, LINEs, LTRs, and DNA transposons) per million bases (top) and the compositions of A-to-I editing sites in the four types of TEs (i.e., SINE, LINE, LTR, and DNA transposon), other repetitive region, and nonrepetitive region (bottom). (B) The distribution of clustered A-to-I sites pertaining to TEs across species. TE: transposable element. SINE: short interspersed nuclear element. LINE: long interspersed nuclear element. LTR: long terminal repeat element. chance, suggesting that coding RNA editing is generally not scores; Materials and Methods) and the conservation level beneﬁcial (Xu and Zhang 2014). However, if the editing of nonsynonymous A-to-I editing was indeed positively corre- events were highly conserved across species, we observed a lated with the PhyloP scores (ﬁg. 4D). This also reﬂected the different trend (ﬁg. 4C). We further examined the correlation above observations that both the magnitude of the ADAR between the conservation level of nonsynonymous A-to-I motif and editing level were positively correlated with the editing and the conservation level of the corresponding indi- conservation level of editing (ﬁg. 2D, G,and H). These results vidual nucleotides (measured by the PhyloP score) (Pollard et thus support a previous assumption that cross-species shared al. 2010). We found that the PhyloP scores of the conserved nonsynonymous editing is potentially beneﬁcial and unlikely editing sites (including human–chimpanzee, human– due to nonediting-related processes that may cause the chimpanzee–mouse, and human–chimpanzee–mouse– inevitable consequence of sequence conservation (Xu and chicken shared editing sites) were generally higher than con- Zhang 2015). For example, considering the 87 human-editing trol (by simulating the sequence data to infer the PhyloP events highly conserved in 5 nonhuman vertebrates FIG.2.—Continued the editing sites identiﬁed by the clustering (A) and conservation (B) strategies across metazoan species. Only the species with more than 50 detected editing sites were considered. (C) The correlation between the magnitude of the ADAR motif and the magnitude of clustering of editing sites (N )in human.(D) cluster The correlation between the magnitude of the ADAR motif and the conservation level of editing. (E) The correlation between the magnitude of the ADAR motif and editing level (measured by the Spearman’s rank correlation). (F, G) The correlation between editing level and N (F) and the conservation level of cluster editing (G). (H) The correlations among N , the conservation level of editing, the magnitude of the ADAR motif, and editing level. “þ”represents a cluster positive correlation. The statistical signiﬁcance was evaluated using the two-tailed Fisher’s exact test (A, B, D), the Spearman’s rank correlation (E), and the two-tailed Wilcoxon rank-sum test (F, G) using the R package: *P value < 0.05, **P value < 0.01, and ***P value < 0.001. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 529 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 % of clustered sites TE density pertaining to TEs RNA editome (per million bases) Hung et al. GBE FIG.4.—Analysis of the identiﬁed human editing events according to different conservation levels of A-to-I editing. (A, B) The distribution of human A- to-I editing sites (A)located in Alu, non-Alu TE, and non-TE regions, and (B) located in UTR/intron and editing sites leading to synonymous and non- synonymous changes for primate-only, mammal-only, and vertebrate-conserved editing events. UTR: untranslated region. (C)The f -to-f ratios for human n s (all identiﬁed human A-to-I editing sites), human–chimpanzee shared, human–chimpanzee-mouse shared, and human–chimpanzee–mouse–chicken shared A-to-I editing sites. (D) Comparison of the conservation level of nonsynonymous A-to-I editing and conservation scores (measured by the PhyloP score). The empty diamond, circle, rectangle, and triangle represents the control (the average values of PhyloP scores of the simulation; see Materials and Methods) for each bin of the four categories of conservation, respectively. The error bar represents the standard error of the mean. The P values were estimated by the Kolmogorov–Smirnov test. *P value< 0.05 and ***P value< 0.001. (supplementary data set S4, Supplementary Material online), Samples were selected for spatial proﬁling if they were eligible all the nonsynonymous editing events (18 events located in 15 for measurement in ﬁve organs (cerebellum, brain, liver, kid- genes) were conserved across primates and nonprimate ney, and heart) from a single individual, and for temporal vertebrates. In fact, alteration of A-to-I editing has been proﬁling if they were based on a time-course experiment us- shown to affect the function of all these 15 genes in diverse ing a single animal strain. The constructed spatiotemporal species (supplementary table S3, Supplementary Material atlasthus enabledustoassessthe correlationbetween the online), further supporting the functional importance of global editing (supplementary data set S5, Supplementary these evolutionarily conserved nonsynonymous editing Material online) and expression of the two functional editors events. (i.e., ADAR1 and ADAR2 expression; supplementary data set S6, Supplementary Material online) in the context of evolution. Spatiotemporal Dynamics of A-to-I Editing across Species In the spatial context, consistent with a previous observation (Picardi et al. 2015), we found that A-to-I editing To survey the transcriptome-wide dynamics of A-to-I editing, tendedtobe tissue-speciﬁc (ﬁg. 5A), and that most of we constructed the spatiotemporal atlas in diverse species, the editing events observed in only one tissue were brain-/ including nine species with spatial proﬁles in different organs cerebellum-speciﬁc (ﬁg. 5B). The clustered heatmap analysis and four species with temporal stages during development. 530 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE FIG.5.—Spatial proﬁling of A-to-I editing in ﬁve organs (cerebellum, brain, liver, kidney, and heart) among metazoans. (A) Analysis of tissue-speciﬁcity of A-to-I editing. (B) Distribution of tissue-speciﬁc A-to-I editing sites in varied individuals across species. (C) Clustered heatmap for A-to-I editing across ﬁve types Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 531 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE on 16 individuals across species further revealed that A-to-I embryogenesis from cleavage to organogenesis (ﬁg. 6B), re- editing events grouped tissue samples into 2 distinct groups spectively. We observed that the dynamics of global editing (ﬁg. 5C). This reﬂected that A-to-I editing activity was more exhibited one-stage lagging when compared with the ﬂuctu- abundant in the central nervous system (CNS, e.g., cerebel- ation of ADAR expression (supplementary ﬁg. S6, lum and brain) than in the rest (liver, kidney, and heart), re- Supplementary Material online), as we used mRNA expression gardless of species, sex (ﬁg. 5C and D,and supplementary ﬁg. to represent the actual abundance of ADARs proteins. When S3, Supplementary Material online), or where the editing sites taking this into account, ADAR expression was precisely syn- were located (TE vs. non-TE and coding vs. noncoding; sup- chronized with global editing in all examined species (ﬁg. 6A plementary ﬁg. S4, Supplementary Material online). These and B). In C. elegans, consistent with a previous study (Tonkin results indicate a highly conserved, enriched pattern of editing et al. 2002), the ﬁrst larval stage (L1) exhibited a strong signal activity in the CNS. We further showed that the expression of editing during embryonic and larval (L1–L4) development levels of ADAR1/ADAR2 were generally higher in the CNS as (ﬁg. 6A, left). Intriguingly, another strong signal was present compared with other organs (ﬁg. 5D). Such an expression during dauer stages (i.e., dauer entry, dauer, and dauer exit) pattern holds well among amniotes from primates to birds, (ﬁg. 6A, left), which represented an alternative development suggesting both editors are subject to strong evolutionary pathway in response to environmental stress. In D. mela- constraint in the perspective of gene expression. In addition, nogaster, similarly, an elevated trend of editing was also ob- a positive correlation between A-to-I editing levels in tissues served in L1; and the patterns of global editing were generally and ADAR (ADAR1 and ADAR2) expression was generally comparable with the patterns observed in C. elegans during observed across species (ﬁg. 5D and supplementary ﬁg. S3, development (ﬁg. 6A, right). Such an elevated trend of editing Supplementary Material online). We asked whether our result in L1 generally held during worm and ﬂy development. In may bias toward the editing sites identiﬁed by our stringent vertebrates, frog and zebraﬁsh shared a conserved pattern criteria (i.e., the clustering and conservation strategies). To of editing during embryogenesis: A-to-I editing was highly address this, we extracted the A-to-G variants that failed to active in early embryonic stages, and rapidly fell off during pass our strategies in the three mouse individuals examined in gastrulation (from shield to bud for zebraﬁsh, and from 6 to this study (for which the ﬁve examined organs were obtained 12 h for frog), followed by a stable trajectory that remained at from the same individual) but were previously identiﬁed to be low levels in later stages (ﬁg. 6B). These results thus suggest editing sites (collected in DARNED or RADAR; supplementary that A-to-I editing may play a stage-dependent role during data set S7, Supplementary Material online). We performed development. The similar patterns of editing between inver- the similar analysis and observed a consistent result (supple- tebrates (nematode and ﬂy) and between vertebrates (zebra- mentary ﬁg. S5, Supplementary Material online). These results ﬁsh and frog) during development were generally observed thus suggest that global A-to-I editing activity could be largely for TE, non-TE, coding, and noncoding editing events (ﬁg. 6C attributed to the two editors in tissues, although other con- and supplementary ﬁg. S7, Supplementary Material online). founding factors may also affect the regulation of A-to-I edit- To avoid a possible bias toward our identiﬁed editing sites, we ing activity. This result also reﬂected the enrichment of RNA also analyzed the A-to-G variants that were previously identi- editing activity in the CNS. Furthermore, considering editing ﬁed to be editing sites (collected in DARNED or RADAR) but levels of 589 one-to-one orthologous sites in 5 organs from 5 excluded by our stringent strategies in the ﬂy samples exam- primates, the principal component analysis (PCA) showed that ined in this study (supplementary data set S7, Supplementary the data tended to be clustered by organ (ﬁg. 5E). This result Material online), and showed the similar results as above (sup- also reﬂects the tendency of tissue-speciﬁcity of A-to-I editing plementary ﬁg. S8, Supplementary Material online). The com- (ﬁg. 5A and B). parability of global editing patterns between species implied In the temporal context, we examined the dynamics of A- that editing activity principally reﬂects ADAR expression inher- to-I editing in two invertebrates (C. elegans and D. mela- ited from their common ancestors. Intriguingly, we found 22 nogaster) during the development from embryo to adult nonsynonymous editing sites (in 13 genes; supplementary ta- (ﬁg. 6A) and two vertebrates (zebraﬁsh and frog) during the ble S4, Supplementary Material online), all of which were not FIG.5.—Continued of tissues, with rows representing accumulated editing levels of detected editing events and columns representing tissues. (D) Correlation between A-to-I editing activity and ADAR expression in the spatial context. Transcriptome-wide activities of editing were examined in the ﬁve organs from the same individual. For each individual, the highest editing level among the ﬁve organs was used to normalize the A-to-I editing index (left Y-axis; Material and Methods). ADAR expression levels were estimated in terms of BPKM (right Y-axis; Material and Methods). Pearson’s coefﬁcient of correlation (r) (performed by the R package) was used to evaluate the correlation between A-to-I editing index and ADAR (ADAR1 (r ) and ADAR2 (r )) expression levels. (E)PCA based 1 2 on the editing levels of 589 orthologous sites in 5 organs from 5 primates. PCA was performed by the “princomp” function in the “stats” package of the R package. The distance metric between samples was calculated by 1 jj q . q represents pairwise Spearman’s correlation coefﬁcient of RNA editing level between samples. M, male; F, female; WT, wild type (mouse). 532 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE FIG.6.—Temporal proﬁling of A-to-I editing among metazoans. (A, B) Temporal dynamics of A-to-I editing and ADAR expression during (A) invertebrate (C. elegans and D. melanogaster)and (B) vertebrate (zebraﬁsh and frog) developments. For each animal, the highest editing level during development was used to normalize the A-to-I editing index (left Y-axis; see Material and Methods). ADAR expression levels were estimated in terms of BPKM (right Y-axis). Pearson’s r between A-to-I editing index and ADAR expression level was calculated by considering the one-stage lagging of A-to-I editing as the ﬂuctuation of ADAR expression during development (see the text). ADARs represent ADR-1 and ADR-2 for C. elegant, dADAR for D. melanogaster, and ADAR1 and ADAR2 for vertebrates (zebraﬁsh and frog). (C) Temporal proﬁling of TE- and non-TE-associated A-to-I editing during C. elegans, D. melanogaster, zebraﬁsh, and frog developments. For each animal, the editing levels of TE- and non-TE-associated sites were accumulated, respectively. The highest editing level during development was used to normalize the A-to-I editing index. (D) Changes in A-to-I editing levels for nonsynonymous editing sites during ﬂy holometabolous development. (E) Heatmap representation of Pearson’s correlations (r) between editing levels of individual nonsynonymous editing sites and different levels of lagging (no lagging and one-, two-, and three-stage lagging) in dADAR expression in the matching order of (D). Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 533 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE located in TEs and commonly present during ﬂy holometab- higher magnitude of the ADAR motif (ﬁg. 2H). We also found olous development from embryo to adult. Of the 22 nonsy- that both the ratio of the frequencies of nonsynonymous nonymous events, 16 were conserved editing sites, implying editing to that of synonymous editing (the f -to-f ratio; n s their importance of A-to-I editing for ﬂy development. As ﬁg. 4C) and the conservation level of single nucleotides (the shown in ﬁgure 6D, we further found different patterns in PhyloP score; ﬁg. 4D) remarkably increased with increasing changes of RNA editing level on separate A-to-I editing sites the conservation level of A-to-I editing. These results thus during ﬂy holometabolous development. We observed that suggest potentially functional beneﬁt of highly clustered the dynamics and active nature A-to-I editing on separate sites and conserved editing sites. may reﬂect different levels of lagging as compared with the While the scaffolding of A-to-I editomes could be raised by ﬂuctuation of dADAR expression, even though editing sites the introduction of TEs in a lineage-speciﬁc manner, a subset located within the same gene loci (ﬁg. 6E and supplementary of highly conserved A-to-I editing events (87 human editing ﬁg. S9, Supplementary Material online). For example, events observed in at least 5 nonhuman species, of which 34 NaCP60E (Na channel protein 60E) mediates the voltage- events were conserved in nonprimate species; supplementary dependent sodium ion permeability of excitable mem- data set S4, Supplementary Material online) might exist long branes (Kulkarni et al. 2002), for which A-to-I editing in the common ancestor of primates (or even the common may play a role in rapid electrical and chemical neurotrans- ancestor of mammalian species for the 34 events). For exam- mission (Hoopengardner et al. 2003). Changes in editing ple, Gabra3 contains a highly conserved A-to-I editing site, at levels for NaCP60E transcripts at the six nonsynonymous A- which amniotes encode an isoleucine codon (AUA) while to-I editing sites exhibited distinct patterns throughout fruit zebraﬁsh and frog retain that of methionine (AUG) as the ﬂy life cycles, and were correlated with different levels of ancestral state (Tian et al. 2011). This codon in amniotes lagging in dADAR expression (ﬁg. 6E and supplementary can still revert to “AUG” by A-to-I editing during transcription, ﬁg. S9, Supplementary Material online). Such diversity of making the switch from isoleucine to methionine (I/M) possi- NaCP60E proteins through A-to-I editing may contribute to ble without losing its original message in genome. In addition, ﬂy nervous system. This is the ﬁrst reported ﬁnding on the we found that 18 of the 87events(locatedin15genes) were editing pattern of NaCP60E transcripts. These results reveal nonsynonymous (supplementary data set S4,Supplementary that A-to-I editing, which occurs at the transcriptome level, Material online). Alteration of A-to-I editing has been shown also provides a wide range of diversity for the proteome, to affect the function of these 15 genes in diverse species and that individual editing sites may be developmentally (supplementary table S3, Supplementary Material online). regulated. The gene ontology (GO) analysis further revealed that the host genes of these highly conserved nonsynonymous editing sites were enriched in “synaptic transmission,” “synapse,” Discussion “channel activity,” “gated channel activity,” and “ion chan- In this study, we successfully identiﬁed high-conﬁdence edit- nel activity” (supplementary table S6, Supplementary Material ing sites across diverse species by controlling for FDR and online), consistent with the functional effect of A-to-I editing %AG (which is determined by N ; the clustering strategy) cluster in previous reports (supplementary table S3, Supplementary or evolutionary conservation of editing in multiple species (the Material online). Of note, the host genes of Drosophila con- conservation strategy), without a priori knowledge of known served editing sites (i.e., D. melanogaster editing sites ob- SNPs or genome sequencing from the same sample. This served in at least one other Drosophila species; enables us to identify high-conﬁdence A-to-I sites in both supplementary data set S2, Supplementary Material online) model and nonmodel animals, and thus to construct the ﬁrst were also enriched in similar GO terms (supplementary table A-to-I editomes across 20 metazoan species. Of note, previ- S6, Supplementary Material online), indicating the evolution- ous studies have reported that the A/G pattern of human- ary convergence of editing targets despite substantial diver- chimpanzee coincident SNPs (the same A/G variants were gence between vertebrate and invertebrate. These results also present at human–chimpanzee orthologous positions) was reﬂect that a conserved enrichment of editing in the CNS over-represented (Hodgkinson et al. 2009; Chen et al. throughout more than 300 Myr of divergent evolution in 2016). We suggest that the effect of coincident SNPs on complex animals from primates to chickens (ﬁg. 5D and sup- the editing sites identiﬁed by the conservation strategy is slight plementary ﬁgs. S3 and S4, Supplementary Material online). because only 0.01% (5 out of 38,480) of the human–chim- Intriguingly, of the 66,800 human A-to-I editing events iden- panzee conserved A-to-G editing sites are also observed to be tiﬁed by our conservation strategy (table 1 and supplementary A/G polymorphic at human–chimpanzee orthologous posi- data set S4, Supplementary Material online), 6, 4, and 192 tions on the basis of dbSNP (supplementary table S5, could cause stop codon losses, splice site alterations, and Supplementary Material online). Comparative analysis of the amino acid changes (i.e., nonsynonymous changes), respec- identiﬁed editing sites revealed that highly clustered and con- tively. It is worthwhile to further investigate the functional served editing sites tended to have a higher editing level and a importance of these evolutionarily conserved editing events. 534 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE For the spatial context of A-to-I editing, we found that mechanism in living cells, especially in the CNS and during editing events tended to exhibit tissue-speciﬁcity (particularly early development. This study thus sheds light on evolutionary for the CNS) (ﬁg. 5A and B), and that global editing activity and dynamic aspects of A-to-I editome across vertebrates and and ADAR (ADAR1 and ADAR2) expression were higher in invertebrates, opening up this important but understudied the CNS than in the other tissues examined and positively class of nongenomically encoded events for comprehensive correlatedwitheachother (ﬁg. 5C and D and supplementary characterization. ﬁg. S3, Supplementary Material online). Importantly, these tendencies were consistently observed among amniotes Supplementary Material from primates to birds, representing the ﬁrst documented Supplementary data are available at Genome Biology and spatial proﬁling for A-to-I editing across amniotes. Using Evolution online. PCA, we further showed that global editing activity of primate-conserved sites exhibited an organ-preferential clus- tering (ﬁg. 5E), reﬂecting that these editing events tended to Acknowledgments be speciﬁc to a few organs (ﬁg. 5A and B). On the other hand, We thank Chan-Shuo Wu, Yen-Zho Chen, and Yu-Ting Hsiao our results exhibited the dynamic and active nature of A-to-I for computational assistance, Prof. Hsien-Sung Huang for pro- editome during the development of invertebrates (worm and viding mouse brain tissue, Prof. Fu Huang for providing D. ﬂy) and during the embryogenesis of vertebrates (zebraﬁsh melanogaster sample, and Prof. Ying-Chih Chang for provid- and frog), respectively. We observed that shifting patterns of ing bench and related experimental facilities. This study was global A-to-I editing were precisely dependent on ADAR ex- supported by Genomics Research Center, Academia Sinica, pression (ﬁg. 6A and B), echoing the developmental effects of Taiwan; and the Ministry of Science and Technology, Taiwan ADAR enzymes (Palladino et al. 2000; Jepson and Reenan (under the contracts NSC 102-2621-B-001-003 and MOST 2008; Horsch et al. 2011). Particularly, this (ﬁg. 6D) and other 103-2628-B-001-001-MY4). studies also observed that individual editing sites can ex- hibit distinct editing patterns during development in di- verse species (Gurevich et al. 2002; Osenberg et al. 2010; Literature cited Yu et al. 2016). In addition to a possible explanation that Alon S, et al. 2015. The majority of transcripts in the squid nervous system shift in expression levels of edited genes may accompany are extensively recoded by A-to-I RNA editing. Elife 4:e05198. editing changes during development (Yu et al. 2016), we Bahn JH, et al. 2012. Accurate identiﬁcation of A-to-I RNA editing in hu- man by transcriptome sequencing. Genome Res. 22(1):142–150. provided another possibility that individual editing sites Bass B, et al. 2012. The difﬁcult calls in RNA editing. Interviewed by H Craig may reﬂect different levels of lagging as compared with Mak. Nat Biotechnol. 30(12):1207–1209. the ﬂuctuation of ADAR expression (e.g., ﬁg. 6D and E). Bass BL. 2002. RNA editing by adenosine deaminases that act on RNA. The joint effects of both possibilities may be the cause for Annu Rev Biochem. 71:817–846. diverse editing patterns of individual sites during develop- Bazak L, et al. 2014. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res. ment. Further investigation of individual targeted editing 24:365–376. sites will provide a fuller understanding of the developmen- Bazak L, Levanon EY, Eisenberg E. 2014. Genome-wide analysis of Alu tal role of A-to-I RNA editing. Moreover, we observed that editability. Nucleic Acids Res. 42(11):6876–6884. the editing patterns of C. elegans and ﬂy were generally Burns CM, et al. 1997. Regulation of serotonin-2C receptor G-protein comparable with each other during development (ﬁg. 6A). coupling by RNA editing. Nature 387(6630):303–308. Chen CY, Hung LY, Wu CS, Chuang TJ. 2016. Purifying selection shapes The comparability of editing pattern was also observed be- the coincident SNP distribution of primate coding sequences. Sci Rep. tween vertebrates (zebraﬁsh and frog) during embryogen- 6:27272. esis (ﬁg. 6B), regardless of where the editing sites were Chen JY, et al. 2014. RNA editome in rhesus macaque shaped by purifying located (ﬁg. 6C and supplementary ﬁg. S7, selection. PLoS Genet. 10(4):e1004274. Supplementary Material online). These results thus indicate Chen L. 2013. Characterization and comparison of human nuclear and cytosolic editomes. Proc Natl Acad Sci U S A. 110(29):E2741–E2747. that global editing activity principally reﬂects ADAR expres- Danecek P, et al. 2012. High levels of RNA-editing site conservation sion inherited from their common ancestors. amongst 15 laboratory mouse strains. Genome Biol. 13(4):26. In summary, our comparative analysis provides the ﬁrst A- Eggington JM, Greene T, Bass BL. 2011. Predicting sites of ADAR editing in to-I editomes across 20 diverse species, from worm to human, double-stranded RNA. Nat Commun. 2:319. and reconstructs an evolutionary landscape of editomes, Feng Zhang YL, Yan S, Xing Q, Tian W. 2017. SPRINT: a SNP-free toolkit for identifying RNA editing sites. Bioinformatics 33:3538–3548. which might serve as a valuable resource for understanding Gurevich I, et al. 2002. Altered editing of serotonin 2C receptor pre-mRNA the co-evolution of the editing machinery (i.e., TEs, editomes, in the prefrontal cortex of depressed suicide victims. Neuron and editors). The spatiotemporal atlas provides unprece- 34(3):349–356. dented resolution of editing dynamics, and reveals conserved Han L, et al. 2015. The genomic landscape and clinical relevance of A-to-I patterns of editing, which might be a key regulatory RNA editing in human cancers. Cancer Cell 28(4):515–528. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 535 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE Hodgkinson A, Ladoukakis E, Eyre-Walker A. 2009. Cryptic variation in the Osenberg S, et al. 2010. Alu sequences in undifferentiated human embry- human mutation rate. PLoS Biol. 7(2):e1000027. onic stem cells display high levels of A-to-I RNA editing. PLoS ONE Hoopengardner B, Bhalla T, Staber C, Reenan R. 2003. Nervous system 5(6):e11173. targets of RNA editing identiﬁed by comparative genomics. Science Palladino MJ, Keegan LP, O’Connell MA, Reenan RA. 2000. A-to-I pre- 301(5634):832–836. mRNA editing in Drosophila is primarily involved in adult nervous sys- Horsch M, et al. 2011. Requirement of the RNA-editing enzyme tem function and integrity. Cell 102(4):437–449. ADAR2 for normal physiology in mice. J Biol Chem. Paz-Yaacov N, et al. 2010. Adenosine-to-inosine RNA editing shapes tran- 286(21):18614–18622. scriptome diversity in primates. Proc Natl Acad Sci U S A. Hwang T, et al. 2016. Dynamic regulation of RNA editing in human brain 107(27):12174–12179. development and disease. Nat Neurosci. 19(8):1093–1099. Peng Z, et al. 2012. Comprehensive analysis of RNA-Seq data reveals ex- Jepson JE, Reenan RA. 2008. RNA editing in regulating gene expression in tensive RNA editing in a human transcriptome. Nat Biotechnol. the brain. Biochim Biophys Acta 1779(8):459–470. 30(3):253–260. John D, Weirick T, Dimmeler S, Uchida S. 2017. RNAEditor: easy detection Picardi E, D’Erchia AM, Lo Giudice C, Pesole G. 2017. REDIportal: a com- of RNA editing events and the introduction of editing islands. Brief prehensive database of A-to-I RNA editing events in humans. Nucleic Bioinform. 18:993–1001. Acids Res. 45(D1):D750–D757. Kawahara Y, et al. 2007. Redirection of silencing targets by Picardi E, et al. 2015. Proﬁling RNA editing in human tissues: towards the adenosine-to-inosine editing of miRNAs. Science inosinome Atlas. Sci Rep. 5:14941. 315(5815):1137–1140. Pickrell JK, Gilad Y, Pritchard JK. 2012. Comment on “Widespread RNA Kim DD, et al. 2004. Widespread RNA editing of embedded alu elements and DNA sequence differences in the human transcriptome”. Science in the human transcriptome. Genome Res. 14(9):1719–1725. 335(6074):1302; author reply 1302. Kimelman D, Kirschner MW. 1989. An antisense mRNA directs the cova- Pinto Y, Cohen HY, Levanon EY. 2014. Mammalian conserved ADAR lent modiﬁcation of the transcript encoding ﬁbroblast growth-factor in targets comprise only a small fragment of the human editosome. Xenopus oocytes. Cell 59(4):687–696. Genome Biol. 15(1):R5. Kiran A, Baranov PV. 2010. DARNED: a DAtabase of RNa EDiting in Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. 2010. Detection of non- humans. Bioinformatics 26(14):1772–1776. neutral substitution rates on mammalian phylogenies. Genome Res. Kleinman CL, Majewski J. 2012. Comment on “Widespread RNA and 20:110–121. DNA sequence differences in the human transcriptome”. Science Porath HT, Carmi S, Levanon EY. 2014. A genome-wide map of hyper- 335(6074):1302; author reply 1302. edited RNA reveals numerous new sites. Nat Commun. 5:4726. Kulkarni NH, Yamamoto AH, Robinson KO, Mackay TF, Anholt RR. 2002. Ramaswami G, et al. 2012. Accurate identiﬁcation of human Alu and non- The DSC1 channel, encoded by the smi60E locus, contributes to odor- Alu RNA editing sites. Nat Methods 9(6):579–581. guided behavior in Drosophila melanogaster. Genetics Ramaswami G, et al. 2013. Identifying RNA editing sites using RNA se- 161(4):1507–1516. quencing data alone. Nat Methods 10(2):128–132. Lehmann KA, Bass BL. 2000. Double-stranded RNA adenosine deaminases Ramaswami G, Li JB. 2014. RADAR: a rigorously annotated database ADAR1 and ADAR2 have overlapping speciﬁcities. Biochemistry of A-to-I RNA editing. Nucleic Acids Res. 42(Database 39(42):12875–12884. issue):D109–D113. Lev-Maor G, et al. 2007. RNA-editing-mediated exon evolution. Genome Rueter SM, Dawson TR, Emeson RB. 1999. Regulation of alternative splic- Biol. 8(2):R29. ing by RNA editing. Nature 399(6731):75–80. Levanon EY, et al. 2004. Systematic identiﬁcation of abundant A-to-I Sapiro AL, Deng P, Zhang R, Li JB. 2015. Cis regulatory effects on A-to-I editing sites in the human transcriptome. Nat Biotechnol. RNA editing in related Drosophila species. Cell Rep. 11(5):697–703. 22(8):1001–1005. Savva YA, Reenan RA. 2014. Identiﬁcation of evolutionarily meaningful Li H, Durbin R. 2009. Fast and accurate short read alignment with information within the mammalian RNA editing landscape. Genome Burrows-Wheeler transform. Bioinformatics Biol. 15(1):103. 25(14):1754–1760. Schroeder A, et al. 2006. The RIN: an RNA integrity number for assigning Li H, et al. 2009. The sequence alignment/map format and SAMtools. integrity values to RNA measurements. BMC Mol Biol. 7:3. Bioinformatics 25(16):2078–2079. Shtrichman R, et al. 2012. Altered A-to-I RNA editing in human embryo- Li H, Ruan J, Durbin R. 2008. Mapping short DNA sequencing reads and genesis. PLoS ONE 7(7):e41576. calling variants using mapping quality scores. Genome Res. Steﬂ R, et al. 2010. The solution structure of the ADAR2 dsRBM-RNA 18(11):1851–1858. complex reveals a sequence-speciﬁc readout of the minor groove. Lin W, Piskol R, Tan MH, Li JB. 2012. Comment on “Widespread RNA and Cell 143(2):225–237. DNA sequence differences in the human transcriptome”. Science Sun Y, et al. 2016. RED: a Java-MySQL software for identifying and visu- 335(6074):1302; author reply 1302. alizing RNA editing sites using rule-based and statistical ﬁlters. PLoS Liscovitch-Brauer N, et al. 2017. Trade-off between transcriptome ONE 11(3):e0150465. plasticity and genome evolution in cephalopods. Cell Tian N, et al. 2011. A structural determinant required for RNA editing. 169(2):191–202 e111. Nucleic Acids Res. 39(13):5669–5681. Maas S, Kawahara Y, Tamburro KM, Nishikura K. 2006. A-to-I RNA editing Tomaselli S, et al. 2015. Modulation of microRNA editing, expression and human disease. RNA Biol. 3(1):1–9. and processing by ADAR2 deaminase in glioblastoma. Genome Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping Biol. 16:5. and quantifying mammalian transcriptomes by RNA-Seq. Nat Tonkin LA, et al. 2002. RNA editing by ADARs is important for normal Methods 5(7):621–628. behavior in Caenorhabditis elegans. Embo J. 21(22):6025–6035. Nakamura K, et al. 2011. Sequence-speciﬁc error proﬁle of Illumina Ulbricht RJ, Emeson RB. 2014. One hundred million adenosine-to-inosine sequencers. Nucleic Acids Res. 39(13):e90. RNA editing sites: hearing through the noise. Bioessays Nishikura K. 2006. Editor meets silencer: crosstalk between RNA 36(8):730–735. editing and RNA interference. Nat Rev Mol Cell Biol. Veno MT, et al. 2012. Spatio-temporal regulation of ADAR editing during 7(12):919–931. development in porcine neural tissues. RNA Biol. 9:1054–1065. 536 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Wagner RW, Smith JE, Cooperman BS, Nishikura K. 1989. A double- Xu G, Zhang J. 2015. In search of beneﬁcial coding RNA editing. Mol Biol stranded RNA unwinding activity introduces structural altera- Evol. 32(2):536–541. tions by means of adenosine to inosine conversions in mamma- Yu Y, et al. 2016. The landscape of A-to-I RNA editome is shaped by both lian cells and Xenopus eggs. Proc Natl Acad Sci U S A. positive and purifying selection. PLoS Genet. 12(7):e1006191. 86(8):2647–2651. Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM. 2010. A Wahlstedt H, Daniel C, Enstero M, Ohman M. 2009. Large-scale mRNA survey of genomic traces reveals a common sequencing error, RNA sequencing determines global regulation of RNA editing during brain editing, and DNA editing. PLoS Genet. 6(5):e1000954. development. Genome Res. 19(6):978–986. Zhang Q, Xiao X. 2015. Genome sequence-independent identiﬁcation of Wang Q, Khillan J, Gadue P, Nishikura K. 2000. Requirement of the RNA RNA editing sites. Nat Methods 12(4):347–350. editing deaminase ADAR1 gene for embryonic erythropoiesis. Science 290(5497):1765–1768. Xu G, Zhang J. 2014. Human coding RNA editing is generally nonadaptive. Proc Natl Acad Sci U S A. 111(10):3769–3774. Associate editor: Wen-Hsiung Li Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 537 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018
Genome Biology and Evolution – Oxford University Press
Published: Feb 1, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.
Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.
All the latest content is available, no embargo periods.
“Hi guys, I cannot tell you how much I love this resource. Incredible. I really believe you've hit the nail on the head with this site in regards to solving the research-purchase issue.”Daniel C.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud
“I must say, @deepdyve is a fabulous solution to the independent researcher's problem of #access to #information.”@deepthiw
“My last article couldn't be possible without the platform @deepdyve that makes journal papers cheaper.”@JoseServera