An Evolutionary Landscape of A-to-I RNA Editome across Metazoan Species

An Evolutionary Landscape of A-to-I RNA Editome across Metazoan Species Adenosine-to-inosine (A-to-I) editing is widespread across the kingdom Metazoa. However, for the lack of comprehensive analysis in nonmodel animals, the evolutionary history of A-to-I editing remains largely unexplored. Here, we detect high- confidence editing sites using clustering and conservation strategies based on RNA sequencing data alone, without using single-nucleotide polymorphism information or genome sequencing data from the same sample. We thereby unveil the first evolutionary landscape of A-to-I editing maps across 20 metazoan species (from worm to human), providing unprecedented evidence on how the editing mechanism gradually expands its territory and increases its influence along the history of evo- lution. Our result revealed that highly clustered and conserved editing sites tended to have a higher editing level and a higher magnitude of the ADAR motif. The ratio of the frequencies of nonsynonymous editing to that of synonymous editing remark- ably increased with increasing the conservation level of A-to-I editing. These results thus suggest potentially functional benefit of highly clustered and conserved editing sites. In addition, spatiotemporal dynamics analyses reveal a conserved enrichment of editing and ADAR expression in the central nervous system throughout more than 300 Myr of divergent evolution in complex animals and the comparability of editing patterns between invertebrates and between vertebrates during development. This study provides evolutionary and dynamic aspects of A-to-I editome across metazoan species, expanding this important but understudied class of nongenomically encoded events for comprehensive characterization. Key words: A-to-I RNA editing, ADAR, ADAR motif, dynamic editome, evolution. Introduction 1989), and coding potential (Burns et al. 1997). RNA editing Adenosine-to-inosine (A-to-I) RNA editing is a common co- or events have been reported to be highly associated with mod- post-transcriptional mechanism in metazoans, which is medi- ification of regulatory RNAs (Kawahara et al. 2007; Tomaselli ated by the ADAR (adenosine deaminases that act on RNA) et al. 2015), cancer mechanisms (Maas et al. 2006; Han et al. family of proteins and converts adenosine (A) into inosine (I) 2015), embryogenesis (Wang et al. 2000; Osenberg et al. (Bass 2002; Nishikura 2006). Since inosine is then recognized 2010), and brain development or neural differentiation as guanine (G) in living cells, A-to-I editing is also known as A- (Wahlstedt et al. 2009; Hwang et al. 2016). These observa- to-G editing (Bass 2002; Nishikura 2006). Although an A-to-I tions indicate the functional importance of A-to-I RNA editing editing event only alters one base at the RNA level, which not only for the immediate biological relevance but also for seems to slightly affect the corresponding RNA molecule, the potential molecular complexity. emerging evidence shows that it can affect both transcription The past decade has seen a rapid growth in efforts to and translation in different aspects such as microRNA regula- identify A-to-I sites on a transcriptome-wide scale; a tremen- tion (Rueter et al. 1999; Kawahara et al. 2007), alternative dous number of editing events have been detected (Levanon splicing (Rueter et al. 1999; Lev-Maor et al. 2007), structural et al. 2004; Danecek et al. 2012; Peng et al. 2012; alteration (Kimelman and Kirschner 1989; Wagneretal. Ramaswami et al. 2013; Bazak et al. 2014; Ramaswami and The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non- commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com Genome Biol. Evol. 10(2):521–537. doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 521 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE Li 2014; Picardi et al. 2015; Zhang and Xiao 2015). However, et al. 2015), our spatiotemporal atlas provided an unprece- identification of RNA editing often severely hampers by false dented opportunity for studying dynamic editome in diverse positives arising from RNA/DNA sequencing errors, single nu- species and for assessing the correlation between the global cleotide polymorphisms (SNPs) or somatic mutations between editing and expression of the two functional editors (ADAR1/ the individual cells, incompleteness of genomic sequences ADAR2) in the context of evolution. and gene annotation, and mapping errors of the cDNA sequences to the reference genome (Bass et al. 2012; Materials and Methods Ulbricht and Emeson 2014). Several RNA editing-detecting Data Retrieval and Access methods minimize false positives from SNPs by comparing the DNA and RNA sequences from a single individual (Bahn The information of the RNA-seq data used in this study was et al. 2012; Peng et al. 2012; Ramaswami et al. 2012). listed in supplementary data set S1, Supplementary Material However, this approach simultaneously requires genome online. TEs and other repetitive elements were identified by and transcriptome sequencing data from the same sample, RepeatMasker and were downloaded from the University of limiting its practicability. To address the lack of genome se- California Santa Cruz (UCSC) genome browser at https://ge- quencing data, a complete (or a partial) SNP database is often nome.ucsc.edu/. The PhyloP scores were also downloaded used to filter out SNPs (Bahn et al. 2012; Ramaswami et al. from the UCSC genome browser. All supplementary data 2012, 2013; Bazak et al. 2014; Zhang and Xiao 2015; John sets (S1–S7) are publicly downloadable at http://idv.sinica. et al. 2017; Sun et al. 2016). These approaches require a priori edu.tw/trees/RNA_editing/RNA_editing.html. The in-house knowledge from known SNPs such as dbSNP, also decreasing programs for identifying RNA editing sites are publicly acces- their capability to identify editing sites in diverse organs/ani- sible from GitHub at https://github.com/TreesLab/ICARES/ mals, especially in nonmodel species due to the lack of SNP tree/dev. A visualizable website was also provided (https:// information. Moreover, the phenomenon that most detected sites.google.com/site/recodedatabase/), which allows users RNA editing sites on coding regions tend to be nonadaptive to search for the identified RNA editing sites by providing (Chen et al. 2014; Xu and Zhang 2014) reveal another chal- the coordinates of genomic regions or gene symbols. lenge in extracting functionally beneficial editing events from the sea of sites with a very low level of editing (Pinto et al. The Pipeline for Identifying A-to-I Sites 2014; Savva and Reenan 2014; Xu and Zhang 2015). These challenges thus limit our understanding of the evolutionary The RNA editing sites were identified by two strategies: the landscape and spatiotemporal dynamics of A-to-I editing in clustering strategy within the same species and the conserva- context. tion strategy with cross-species comparison (fig. 1A). The clus- Previously, it was observed an enrichment for clusters of tering process was made up of three main phases to detected editing sites, suggesting that the effect of sequenc- distinguish editing sites from technical and biological noises. ing errors, SNPs, and mutations on identification of A-to-I First, we collected 309 RNA-seq data from 20 metazoan spe- editing could remarkably decrease with increasing the num- cies (supplementary data set S1, Supplementary Material on- ber of consecutive variants (Levanon et al. 2004; Zaranek et al. line), and used BWA (Li and Durbin 2009)(version 1.2.3)for 2010). The clustering strategy has been demonstrated to be short-read mapping. SNV calling was then performed using effective to detect RNA editing sites without the need for SNP SAMtools pileup (Li et al. 2009) (version 1.0.2) with a pileup information or genome sequencing data from the same sam- parser program downloaded from Galaxy (pileup_parser.pl ple (Porath et al. 2014; Feng Zhang et al. 2017). To generate with parameter settings: 3 9 10 8 20 1 “No” “Yes” 2 an evolutionary landscape of A-to-I RNA editome across “Yes” “Yes”). The type of nucleotide substitution was deter- metazoan species, most of which have no (or limited) SNP mined on the basis of the reference genome and the strand data, we employ the clustering strategy to identify A-to-I edit- information of the Ensembl-annotated genes (see supplemen- ing sites using RNA-seq data alone. For accuracy, we selected tary data set S1, Supplementary Material online). Second, we A-to-I editing sites by controlling for the fraction of A-to-G selected dimorphic variants (i.e., sites with two distinct nucle- mismatches to all types of mismatches (designated as otide types), and discarded singleton and multi-allelic ones. To “%AG”; >95%) and false discovery rate (FDR; <1%) or eliminate potential strand-specific miscalls, we discarded var- cross-species conservation of A-to-I editing, and thereby iden- iants with strand bias. For each species, we used MAQ (Li et al. tified 429,509 high-confidence A-to-I sites in diverse species 2008) to simulate all possible cDNA short-reads based on from worm to human, including model and nonmodel ani- Ensembl-annotated transcripts, mapped these simulated mals. We thus constructed the first evolutionary landscape reads to the reference genome, and compiled a mapping er- and spatiotemporal atlas of A-to-I editing across metazoan ror set according to the mapping result. The called variants species. In contrast with previous studies that investigated within this set were discarded since they were subject to map- editing dynamics in limited genes/species (Wahlstedt et al. ping errors. Besides, sequencing errors were reported to occur 2009; Shtrichman et al. 2012; Veno et al. 2012; Picardi in proximity (Nakamura et al. 2011). We then identified a 522 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE sequencing error set (SES) wherein variant calls of different experimentally confirmed (supplementary fig. S1A and B, mismatch types occurred in proximity (i.e., distance between Supplementary Material online; supplementary dat aset S3, each other <100 bp). Variants overlapped with the SES were Supplementary Material online). In addition to mouse and fly subsequently discarded. In addition, for accuracy, only the editing sites, which have been comprehensively detected and variants with multi-sample evidence were retained. Third, collected in public databases, we also selected nine editing the retained variants were compiled to identify high- sites from the identified chicken editing sites and successfully confidence A-to-I sites using the clustering strategy. Of confirmed eight of them in chicken brain (supplementary fig. note, the editing type of each mismatch was determined on S1C, Supplementary Material online; supplementary data set the basis of the Ensembl annotation. The clustering strategy S3, Supplementary Material online). compiled variants of the same mismatch type in close prox- imity (i.e., distance between each other <100 bp), because it was observed that the effect of sequencing errors, SNPs, and Samples and Validation mutations on identification of A-to-I editing could remarkably Genomic DNA and total RNA were extracted from frozen decrease with increasing the number of consecutive variants brain of mouse (C57BL/6J) at postnatal day 49, fresh brain (Levanon et al. 2004; Zaranek et al. 2010). The proximal dis- of adult chicken, and wild-type fly (Drosophila melanogaster). tance was set as 100 bp because of the observation that the Both genomic DNA and total RNA were obtained from the vast majority (>95%) of the previously identified A-to-I edit- same individual/sample. Genomic DNA and total RNA were ing sites have at least one neighbor site within the proximal extracted by PureLink Genomic DNA Mini Kit (Thermo Fisher distance of 100 bp (supplementary table S1, Supplementary Scientific) and PureLink RNA Mini Kit (Thermo Fisher Scientific) Material online). For each species, the qualified number of with DNase I according to the manufacturer’s instructions, consecutive variants (qualified N ) was determined by cluster respectively. Primers were designed in flanking sequences of the FDR and %AG. The FDR of A-to-I sites was defined as the tested editing sites. 5 lg total RNA transforms to cDNA FDR¼the number of G-to-A mismatches/the number of A-to- with SuperScript III First-Strand Synthesis System (Thermo G mismatches. The qualified N was determined while the cluster Fisher Scientific) using Random Hexamer and Oligo-DT pri- detected sites satisfied both of the two rules: %AG>95% mers. 50 ng of genomic DNA and cDNA were used for the and FDR<1%. For the conservation strategy, the A-to-I edit- PCR analysis of the editing sites. The PCR was performed us- ing sites were identified from three clades (12 vertebrates, 4 ing DreamTaq Green PCR Master Mix (Thermo Fisher Drosophila species, and 4 Caenorhabditis species), respec- Scientific) on Veriti Thermal Cycler (Thermo Fisher tively. We considered the A-to-G mismatches that have Scientific). PCR products were validated by gel, and then passed the strand-bias, mapping error, and sequencing error treated with QIAquick Gel Extraction Kit (Qiagen). The edit- filters described in Phase II (i.e., ad hoc filters; fig. 1A)for each ing sites were selected to be tested, if they were detected species. Such A-to-G variants that were observed at the by the RNA-seq data (number of mapped reads at the site orthologous sites in more than one species of the examined should be >10) to be editing events with editing level clade were regarded as evolutionarily conserved A-to-I editing >10% in at least one sample. Sanger sequencing was per- sites (fig. 1A). A full list of the identified A-to-I editing sites formed to validate the corresponding editing positions of (including clustered and conserved editing sites) referred to genomic DNA and cDNA sequences (see supplementary the Ensembl annotation (releases 66 and 85) is given in sup- fig. S1, Supplementary Material online). The primer plementary data set S2, Supplementary Material online. sequences used were listed in supplementary data set S3, Compared with previously identified A-to-I editing sites in hu- Supplementary Material online. man, mouse, and fly [collected in the well-known public data- bases: DARNED (Kiran and Baranov 2010), RADAR (Ramaswami and Li 2014), and REDIportal (Picardi et al. Observed-to-Expected Ratio of “G” 2017)], we found that most the identified human editing sites (90%) were also detected in the public databases (supple- The observed-to-expected (O/E) ratio of thepresenceof G mentary dat aset S2, Supplementary Material online). In con- was calculated to examine ADAR preference for A-to-I edit- trast, more than one third of the identified mouse and fly ing, which was defined as P =P ,where P rep- ObsðGÞ ExpðGÞ Obs(G) A-to-I editing sites (92% for mouse and 33% for fly) were resented the frequency of observed presence of “G” not found in these databases (supplementary data set S2, immediately upstream or downstream to the A-to-I sites, Supplementary Material online). To validate the newly identi- and P represented that of expected presence of “G,” Exp(G) fied A-to-I editing sites in mouse and fly, we selected 11 which was calculated by the frequency of “G” in the exam- mouse and 12 fly editing sites and performed PCR amplifica- ined genome. The statistical significance of difference be- tion and Sanger sequencing of both DNA and RNA of mouse tween the observed and expected ratios of the presence of brain and wild-type fly from the same individual. Our result “G” was evaluated by the two-tailed Fisher’s exact test using revealed that 10 mouse and 12 fly editing sites were theR package(https://www.r-project.org/). Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 523 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE FIG.1.—Identification of high-confidence A-to-I RNA editing sites, without a priori knowledge of SNP information. (A) Overview of the identification of RNA editing sites. The editing sites were identified by the clustering strategy within the same species or by cross-species comparison. The clusteringprocess involved three main phases: Phase I: preprocessing, Phase II: ad hoc filters, and Phase III: ad hoc identification. (B) Correlation between the number of 524 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Analysis of the Conservation Level of A-to-I Editing and Measurement of ADAR mRNA Expression Conservation Scores ADAR mRNA expression was represented by BPKM (Bases Per To assess the correlation between the conservation level of Kilo-base of gene model per Million mapped bases) nonsynonymous A-to-I editing and conservation scores, we (Mortazavi et al. 2008). To improve the precision of mRNA first performed the UCSC LiftOver tool to convert genome expression measurement in poly(A)-selected RNA-seq sam- coordinates of nonhuman species into those of human at ples, we only considered 3 UTR exons in the gene model of the nonsynonymous sites of adenosine for the four categories calculation. of conservation (i.e., human–chimpanzee, human– chimpanzee–mouse, and human–chimpanzee–mouse– Results chicken conservation). For each category of conservation, the control sites with the same number of “A” of the corre- Identification of Editing Sites across 20 Metazoan Species sponding category of conserved editing sites were selected To investigate the evolutionary landscape and dynamic randomly. The PhyloP score of each selected “A” was changes of A-to-I editing across metazoan species, we first extracted from the UCSC genome browser. The simulation collected RNA-seq data from various cell types of 20 species, was performed 1,000 replicates for each category of including 6 primates, 3 nonprimate mammals, 3 nonmamma- conservation. lian vertebrates, 4 Drosophila species, and 4 Caenorhabditis species (309 samples in total; supplementary data set S1, Comparative Analysis of A-to-I Editing Levels Supplementary Material online). We then used the clustering strategy to detect RNA editing sites based on RNA-seq data We selected 16 animal individuals (2 for chimpanzee, 2 for alone, without the need for a priori knowledge from known bonobo, 2 for gorilla, 1 for orangutan, 2 for macaque, 3 for SNPs or genome sequencing data from the same sample mouse, 1 for opossum, 1 for platypus, and 2 for chicken) for (Materials and Methods). Two useful measures are often ap- assessing editing dynamics in 5 organs (i.e., cerebellum, brain, plied to evaluating the specificity of the identified A-to-I edit- liver, kidney, and heart). Of note, samples were selected for ing sites: 1) the fraction of A-to-G mismatches to all types of spatial profiling if they were eligible for measurement in five mismatches (designated as “%AG”) because of the extreme organs from a single individual. Each individual must contain infrequency of non-A-to-G editing events (Kleinman and all five types of organ samples with reliable RNA quality (i.e., Majewski 2012; Lin et al. 2012; Pickrell et al. 2012); and 2) RNA integrity number [Schroeder et al. 2006] >7.0). On the the ratio of the number of G-to-A mismatches to the number other hand, we used four species (i.e., Caenorhabditis ele- of A-to-G mismatches (i.e., FDR, see Materials and Methods) gans, D. melanogaster, zebrafish, and frog) in the profiling because of the assumption that G-to-A mismatches often of developmental dynamics. Since A-to-I editing could influ- reflected sequencing errors (Bahn et al. 2012; Liscovitch- ence its target sites in a local dsRNA region (Zaranek et al. Brauer et al. 2017). We can find that the percentage of A- 2010; Bazak Levanon, et al. 2014), the level of editing was to-G/T-to-C variants grew (or the FDR values decreased) rap- determined in a clustering manner as idly with the increase of the number of consecutive single- A to  I editing level ¼ nucleotide variants (SNVs) of the same type (designated as Number of G in the clustered sites “N ”) in all examined species (fig. 1B). Of note, the T- i cluster Number of A and G in the clustered sites to-C variants were considered, because the used RNA-seq data were not strand-specific and these variants could possi- bly be A-to-G editing sites in an antisense transcript. This with i ¼ 1, .. ., N,where N is the total number of editing sites revealed that a priori knowledge of the clustering tendency in the designated cluster. For each individual in the spatial for A-to-I editing held well across species, including model context or animal in the temporal context, the editing levels and nonmodel animals. For accuracy, the qualified N were accumulated and the highest value among the exam- cluster was determined while the detected sites satisfied both ined organs or development stages was used to normalize the %AG>95% and FDR<1% (fig. 1B). Upon completion of A-to-I editing index. To ensure the statistical accuracy, we only the screening process, 343,979 A-to-G editing sites were considered the editing clusters with reasonable base coverage identified in diverse species from worm to human. Since the (i.e., the total number of A and G in the clustered sites 10) more species the variants were supported by, the higher level for the analysis of editing dynamics. FIG.1.—Continued consecutive SNVs of the same type (N ) and the two measures of specificity: %AG (the percentage of A-to-G & T-to-C among 12 SNV types) and FDR cluster (the ratio of the number of G-to-A mismatches to the number of A-to-G mismatches). Histograms for each species represent A-to-G & T-to-C percentage in the subset of N consecutive SNVs of the same type. The dark gray histogram for each species represents the qualified N , which satisfies both cluster cluster %AG> 95% and FDR< 1%. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 525 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE of functional potentiality (and thus accuracy) they contained are highly associated with the evolution of RNA editing be- (Bahn et al. 2012), we recovered the A-to-G variants at the tween Drosophila species (Sapiro et al. 2015). Third, we di- orthologous sites in multiple species (editing sites supported vided all examined human editing sites into 20 equal-size bins by more than one species; the conservation strategy). By do- according to their editing levels and found a remarkably pos- ing so, 179,182 evolutionarily conserved A-to-G editing sites itive correlation between the magnitude of the ADAR motif were identified in the 20 species. Integrating the editing sites and editing level (fig. 2E). Of note, if a detected site appeared identified by these two strategies, we totally identified in multiple samples, the highest level observed was consid- 429,509 editing sites (table 1 and supplementary data set ered. Finally, we also observed that both N (fig. 2F)and cluster S2, Supplementary Material online). Of note, we only consid- the conservation level of editing (fig. 2G) were positively cor- ered the sites that were located within genic regions. We related with editing level. The correlations among N ,the cluster emphasize that the identified editing sites are highly confident conservation level of editing, the magnitude of the ADAR with controlling for FDR< 1% and %AG> 95% (the cluster- motif, and editing level were summarized in figure 2H. ing strategy) or evolutionary conservation of editing in multi- Since a relatively higher editing level at a site is expected to ple species (the conservation strategy). be functionally important (Xu and Zhang 2015), our results suggested the biological significance of highly clustered and conserved editing sites. The Correlation between N , Conservation cluster Level of Editing, Editing Level, and the Magnitude of the ADAR Motif Evolutionary Analysis of A-to-I Editomes The A-to-I editors (i.e., ADAR proteins) are known to be highly Since transposable elements (TEs) are pervasive in the conserved across species (Bass 2002). Previous studies have genomes of higher eukaryotes, the double-stranded RNA observed that in some animals ADARs have a sequence pref- structure formed by TE-complementary pairs often provides erence (or targets) for “G” depletion and “G” enrichment at an ideal target for ADAR binding (Levanon et al. 2004). To 0 0 the 5 and 3 neighbor nucleotides next to A-to-I editing sites, understand the extent of ADAR substrates among different respectively (Kiran and Baranov 2010; Lehmann and Bass lineages and species, we constructed an evolutionary land- 2000; Eggington et al. 2011; Ramaswami et al. 2013; Chen scape of A-to-I editing maps (i.e., A-to-I editomes). We et al. 2014; Pinto et al. 2014; Porath et al. 2014; Alon et al. showed that the scaffolding of A-to-I editomes was highly 2015; Liscovitch-Brauer et al. 2017). We evaluated the associated with the expansion of TEs (fig. 3A). We found observed-to-expected (O/E) ratios of the presence of “G”im- that the distribution of clustered A-to-I sites pertaining to mediately upstream or downstream to the A-to-I editing sites TEs (fig. 3B) generally reflected the TE density in the genome (see Materials and Methods) and showed that the trend of the (fig. 3A), echoing the observation that A-to-I editing tended known ADAR motif generally held true across metazoan spe- to be clustered within TEs (Kim et al. 2004). In addition, the cies, regardless of the detection strategies of A-to-G editing landscape was likely to encompass all the A-to-I editing events (i.e., the clustered and conserved sites; fig. 2A and B). that were highly conserved across species, whilst contained We processed to examine the correlations among the lineage- or species-specific events that might have been magnitude of clustering of editing sites (N ), the conser- recruited as a result of divergent evolution. Importantly, short cluster vation level of editing, the magnitude of the ADAR motif, and interspersed elements (SINEs), which were apparently editing level. First, we found a positive correlation between enriched in the mammalian genomes (fig. 3A, top), consti- the magnitude of the ADAR motif and N .Such a trend tuted a major landmark in mammalian editomes (fig. 3A,bot- cluster that the magnitude of the ADAR motif increased with increas- tom). This co-evolution of SINEs and editomes might ing N became flat after the qualified N (fig. 2C). accelerate the divergence of species, not only in the genomic cluster cluster Second, we classified the identified human editing sites into content, but also in the transcriptomic complexity. For exam- three groups according to the conservation level (supplemen- ple, in primates, genes associated with neurological processes tary data set S4, Supplementary Material online) and found or disorders were more likely to develop a host of SINE- that the magnitude of the ADAR motif significantly increased mediated substrates of A-to-I editing (Paz-Yaacov et al. with increasing the conservation level of A-to-I editing 2010), possibly resulting in the development of more complex (fig. 2D). This also reflected the sequence and structural con- transcriptome in the primate brain. servation in the double-stranded RNA binding domains To further examine the biological significance of conserved (dsRBDs) of ADARs across 14 vertebrates (supplementary editing sites, we retrieved 66,689 primate-only, 39 mammal- fig. S2A, Supplementary Material online), in which the protein only, and 16 vertebrate-conserved editing events according to residues involved in RNA binding (Stefl et al. 2010)were even the conservation level of the human A-to-I editing events highly conserved from vertebrates to Drosophila (supplemen- identified by our conservation strategy (supplementary data tary fig. S2B and C, Supplementary Material online). These set S4, Supplementary Material online). Of note, primate-only results respond to a recent notion that cis sequence changes events were human editing events observed at the 526 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Table 1 Summary of A-to-I Editing Sites Identified in This Study Organis Number of Number of A-to-I Editing Sites Host Genes Samples Clustering Conservation Total TE Sites Strategy (A) Strategy (B) Sites (A[B) Homo sapiens (human) 45 177,734 66,800 222,300 213,772 11,035 Pan troglodytes (chimpanzee) 15 42,082 45,117 54,891 52,939 5,292 Pan paniscus (bonobo) 12 30,807 37,021 42,420 40,911 4,929 Gorilla gorilla (gorilla) 11 12,169 15,936 19,956 18,561 3,442 Pongo abelii (orangutan) 9 6,230 8,828 11,125 10,820 2,357 Macaca mulatta (macaque) 13 3,034 4,593 6,206 5,997 1,691 Mus musculus (mouse) 32 57,125 115 57,143 47,859 3,738 Monodelphis domestica (opossum) 12 661 27 682 588 89 Ornithorhynchus anatinus (platypus) 12 267 7 274 253 36 Gallus gallus (chicken) 12 77 17 94 30 20 Xenopus tropicalis (frog) 40 4,495 4 4,499 1,812 146 Danio rerio (zebrafish) 8 5,331 5 5,336 4,571 340 Fly species D. melanogaster 34 317 229 534 208 114 D. pseudoobscura 8 464 206 636 N/A 112 D. willistoni 893 63 149 N/A 124 D. mojavensis 4 1,894 214 2,065 208 410 Nematode species Caenorhabditis elegans 16 888 0 888 584 57 C. briggsae 6 241 0 241 164 21 C. brenneri 628 0 28 N/A 3 C. japonica 642 0 42 N/A 2 The detailed information of the identified sites (e.g., genomic location and host genes) is listed in supplementary data set S2, Supplementary Material online. TE sites: A-to-I editing sites that are located in TEs. TE information of the species is not available. orthologous sites in nonhuman primate(s) but not observed in with increasing the conservation level of A-to-I editing. nonprimate vertebrates. Mammal-only ones were human Furthermore, considering the total “A” sites within the hu- editing events observed at the orthologous sites in both non- man coding sequences that would cause nonsynonymous human primate(s) and nonprimate mammal(s) but not ob- and synonymous changes if edited to “G,” respectively, we served in nonmammal vertebrates. Vertebrate-conserved calculated the frequencies of nonsynonymous (f ) and synon- ones were human editing events simultaneously observed at ymous (f ) editing (Xu and Zhang 2014). If nonsynonymous the orthologous sites in nonhuman primate(s), nonprimate editing events are generally deleterious and are destined to mammal(s), and nonmammal vertebrate(s). We found that selective elimination, f should exhibit remarkably smaller the great majority of the conserved editing events (99%) than f (Chen 2013; Xu and Zhang 2014). To examine the were primate-only, of which 98% (65,109 out of 66,689 relationship between the conservation level of A-to-I editing events) were located in TEs, especially in the primate- and the f -to-f ratio, we retrieved human nonconserved, n s specific Alu sequences (63,745 events; 96%), whereas no human–chimpanzee shared, human–chimpanzee–mouse mammal-only or vertebrate-conserved events occurred in shared, and human–chimpanzee–mouse–chicken shared syn- Alu repeats (fig. 4A). This also reflected our abovementioned onymous (or nonsynonymous) editing sites and the corre- notion that the scaffolding of A-to-I editomes could be raised sponding shared “A” sites that would have synonymous (or by the introduction of TEs in a lineage-specific manner (fig. 3). nonsynonymous) changes if edited to “G,” respectively (sup- We examined the effect of the conservation level on the dis- plementary table S2, Supplementary Material online). tribution of editing events. We found that the only 0.23% of Intriguingly, we found that the f -to-f ratio remarkably n s primate-only events caused amino acid changes (nonsynon- increased with increasing the conservation level of A-to-I edit- ymous changes), whereas as high as 26% and 88% of ing, andthe ratioevenexhibited 100% >1 for the human– mammal-only and vertebrate-conserved events leaded to chimpanzee–mouse–chicken shared events (fig. 4C). A nonsynonymous changes, respectively (fig. 4B). This reveals previous study reported that nonsynonymous editing events that the percentage of nonsynonymous editing sites increase were observed to occur less frequently than expected by Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 527 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE FIG.2.—The cis-preference of ADARs and editing level for the detected A-to-G editing sites. (A, B)The cis-preference of ADARs (or the ADAR motif, which was measured by the observed-to-expected (O/E) ratio of the presence of “G” immediately upstream and downstream to the A-to-G editing sites) for 528 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE 20 20 10 100 0 FIG.3.—The relationship between the expansion of TEs and the increase of A-to-I editing sites. (A) The average numbers of TEs (i.e., SINEs, LINEs, LTRs, and DNA transposons) per million bases (top) and the compositions of A-to-I editing sites in the four types of TEs (i.e., SINE, LINE, LTR, and DNA transposon), other repetitive region, and nonrepetitive region (bottom). (B) The distribution of clustered A-to-I sites pertaining to TEs across species. TE: transposable element. SINE: short interspersed nuclear element. LINE: long interspersed nuclear element. LTR: long terminal repeat element. chance, suggesting that coding RNA editing is generally not scores; Materials and Methods) and the conservation level beneficial (Xu and Zhang 2014). However, if the editing of nonsynonymous A-to-I editing was indeed positively corre- events were highly conserved across species, we observed a lated with the PhyloP scores (fig. 4D). This also reflected the different trend (fig. 4C). We further examined the correlation above observations that both the magnitude of the ADAR between the conservation level of nonsynonymous A-to-I motif and editing level were positively correlated with the editing and the conservation level of the corresponding indi- conservation level of editing (fig. 2D, G,and H). These results vidual nucleotides (measured by the PhyloP score) (Pollard et thus support a previous assumption that cross-species shared al. 2010). We found that the PhyloP scores of the conserved nonsynonymous editing is potentially beneficial and unlikely editing sites (including human–chimpanzee, human– due to nonediting-related processes that may cause the chimpanzee–mouse, and human–chimpanzee–mouse– inevitable consequence of sequence conservation (Xu and chicken shared editing sites) were generally higher than con- Zhang 2015). For example, considering the 87 human-editing trol (by simulating the sequence data to infer the PhyloP events highly conserved in 5 nonhuman vertebrates FIG.2.—Continued the editing sites identified by the clustering (A) and conservation (B) strategies across metazoan species. Only the species with more than 50 detected editing sites were considered. (C) The correlation between the magnitude of the ADAR motif and the magnitude of clustering of editing sites (N )in human.(D) cluster The correlation between the magnitude of the ADAR motif and the conservation level of editing. (E) The correlation between the magnitude of the ADAR motif and editing level (measured by the Spearman’s rank correlation). (F, G) The correlation between editing level and N (F) and the conservation level of cluster editing (G). (H) The correlations among N , the conservation level of editing, the magnitude of the ADAR motif, and editing level. “þ”represents a cluster positive correlation. The statistical significance was evaluated using the two-tailed Fisher’s exact test (A, B, D), the Spearman’s rank correlation (E), and the two-tailed Wilcoxon rank-sum test (F, G) using the R package: *P value < 0.05, **P value < 0.01, and ***P value < 0.001. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 529 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 % of clustered sites TE density pertaining to TEs RNA editome (per million bases) Hung et al. GBE FIG.4.—Analysis of the identified human editing events according to different conservation levels of A-to-I editing. (A, B) The distribution of human A- to-I editing sites (A)located in Alu, non-Alu TE, and non-TE regions, and (B) located in UTR/intron and editing sites leading to synonymous and non- synonymous changes for primate-only, mammal-only, and vertebrate-conserved editing events. UTR: untranslated region. (C)The f -to-f ratios for human n s (all identified human A-to-I editing sites), human–chimpanzee shared, human–chimpanzee-mouse shared, and human–chimpanzee–mouse–chicken shared A-to-I editing sites. (D) Comparison of the conservation level of nonsynonymous A-to-I editing and conservation scores (measured by the PhyloP score). The empty diamond, circle, rectangle, and triangle represents the control (the average values of PhyloP scores of the simulation; see Materials and Methods) for each bin of the four categories of conservation, respectively. The error bar represents the standard error of the mean. The P values were estimated by the Kolmogorov–Smirnov test. *P value< 0.05 and ***P value< 0.001. (supplementary data set S4, Supplementary Material online), Samples were selected for spatial profiling if they were eligible all the nonsynonymous editing events (18 events located in 15 for measurement in five organs (cerebellum, brain, liver, kid- genes) were conserved across primates and nonprimate ney, and heart) from a single individual, and for temporal vertebrates. In fact, alteration of A-to-I editing has been profiling if they were based on a time-course experiment us- shown to affect the function of all these 15 genes in diverse ing a single animal strain. The constructed spatiotemporal species (supplementary table S3, Supplementary Material atlasthus enabledustoassessthe correlationbetween the online), further supporting the functional importance of global editing (supplementary data set S5, Supplementary these evolutionarily conserved nonsynonymous editing Material online) and expression of the two functional editors events. (i.e., ADAR1 and ADAR2 expression; supplementary data set S6, Supplementary Material online) in the context of evolution. Spatiotemporal Dynamics of A-to-I Editing across Species In the spatial context, consistent with a previous observation (Picardi et al. 2015), we found that A-to-I editing To survey the transcriptome-wide dynamics of A-to-I editing, tendedtobe tissue-specific (fig. 5A), and that most of we constructed the spatiotemporal atlas in diverse species, the editing events observed in only one tissue were brain-/ including nine species with spatial profiles in different organs cerebellum-specific (fig. 5B). The clustered heatmap analysis and four species with temporal stages during development. 530 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE FIG.5.—Spatial profiling of A-to-I editing in five organs (cerebellum, brain, liver, kidney, and heart) among metazoans. (A) Analysis of tissue-specificity of A-to-I editing. (B) Distribution of tissue-specific A-to-I editing sites in varied individuals across species. (C) Clustered heatmap for A-to-I editing across five types Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 531 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE on 16 individuals across species further revealed that A-to-I embryogenesis from cleavage to organogenesis (fig. 6B), re- editing events grouped tissue samples into 2 distinct groups spectively. We observed that the dynamics of global editing (fig. 5C). This reflected that A-to-I editing activity was more exhibited one-stage lagging when compared with the fluctu- abundant in the central nervous system (CNS, e.g., cerebel- ation of ADAR expression (supplementary fig. S6, lum and brain) than in the rest (liver, kidney, and heart), re- Supplementary Material online), as we used mRNA expression gardless of species, sex (fig. 5C and D,and supplementary fig. to represent the actual abundance of ADARs proteins. When S3, Supplementary Material online), or where the editing sites taking this into account, ADAR expression was precisely syn- were located (TE vs. non-TE and coding vs. noncoding; sup- chronized with global editing in all examined species (fig. 6A plementary fig. S4, Supplementary Material online). These and B). In C. elegans, consistent with a previous study (Tonkin results indicate a highly conserved, enriched pattern of editing et al. 2002), the first larval stage (L1) exhibited a strong signal activity in the CNS. We further showed that the expression of editing during embryonic and larval (L1–L4) development levels of ADAR1/ADAR2 were generally higher in the CNS as (fig. 6A, left). Intriguingly, another strong signal was present compared with other organs (fig. 5D). Such an expression during dauer stages (i.e., dauer entry, dauer, and dauer exit) pattern holds well among amniotes from primates to birds, (fig. 6A, left), which represented an alternative development suggesting both editors are subject to strong evolutionary pathway in response to environmental stress. In D. mela- constraint in the perspective of gene expression. In addition, nogaster, similarly, an elevated trend of editing was also ob- a positive correlation between A-to-I editing levels in tissues served in L1; and the patterns of global editing were generally and ADAR (ADAR1 and ADAR2) expression was generally comparable with the patterns observed in C. elegans during observed across species (fig. 5D and supplementary fig. S3, development (fig. 6A, right). Such an elevated trend of editing Supplementary Material online). We asked whether our result in L1 generally held during worm and fly development. In may bias toward the editing sites identified by our stringent vertebrates, frog and zebrafish shared a conserved pattern criteria (i.e., the clustering and conservation strategies). To of editing during embryogenesis: A-to-I editing was highly address this, we extracted the A-to-G variants that failed to active in early embryonic stages, and rapidly fell off during pass our strategies in the three mouse individuals examined in gastrulation (from shield to bud for zebrafish, and from 6 to this study (for which the five examined organs were obtained 12 h for frog), followed by a stable trajectory that remained at from the same individual) but were previously identified to be low levels in later stages (fig. 6B). These results thus suggest editing sites (collected in DARNED or RADAR; supplementary that A-to-I editing may play a stage-dependent role during data set S7, Supplementary Material online). We performed development. The similar patterns of editing between inver- the similar analysis and observed a consistent result (supple- tebrates (nematode and fly) and between vertebrates (zebra- mentary fig. S5, Supplementary Material online). These results fish and frog) during development were generally observed thus suggest that global A-to-I editing activity could be largely for TE, non-TE, coding, and noncoding editing events (fig. 6C attributed to the two editors in tissues, although other con- and supplementary fig. S7, Supplementary Material online). founding factors may also affect the regulation of A-to-I edit- To avoid a possible bias toward our identified editing sites, we ing activity. This result also reflected the enrichment of RNA also analyzed the A-to-G variants that were previously identi- editing activity in the CNS. Furthermore, considering editing fied to be editing sites (collected in DARNED or RADAR) but levels of 589 one-to-one orthologous sites in 5 organs from 5 excluded by our stringent strategies in the fly samples exam- primates, the principal component analysis (PCA) showed that ined in this study (supplementary data set S7, Supplementary the data tended to be clustered by organ (fig. 5E). This result Material online), and showed the similar results as above (sup- also reflects the tendency of tissue-specificity of A-to-I editing plementary fig. S8, Supplementary Material online). The com- (fig. 5A and B). parability of global editing patterns between species implied In the temporal context, we examined the dynamics of A- that editing activity principally reflects ADAR expression inher- to-I editing in two invertebrates (C. elegans and D. mela- ited from their common ancestors. Intriguingly, we found 22 nogaster) during the development from embryo to adult nonsynonymous editing sites (in 13 genes; supplementary ta- (fig. 6A) and two vertebrates (zebrafish and frog) during the ble S4, Supplementary Material online), all of which were not FIG.5.—Continued of tissues, with rows representing accumulated editing levels of detected editing events and columns representing tissues. (D) Correlation between A-to-I editing activity and ADAR expression in the spatial context. Transcriptome-wide activities of editing were examined in the five organs from the same individual. For each individual, the highest editing level among the five organs was used to normalize the A-to-I editing index (left Y-axis; Material and Methods). ADAR expression levels were estimated in terms of BPKM (right Y-axis; Material and Methods). Pearson’s coefficient of correlation (r) (performed by the R package) was used to evaluate the correlation between A-to-I editing index and ADAR (ADAR1 (r ) and ADAR2 (r )) expression levels. (E)PCA based 1 2 on the editing levels of 589 orthologous sites in 5 organs from 5 primates. PCA was performed by the “princomp” function in the “stats” package of the R package. The distance metric between samples was calculated by 1 jj q . q represents pairwise Spearman’s correlation coefficient of RNA editing level between samples. M, male; F, female; WT, wild type (mouse). 532 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE FIG.6.—Temporal profiling of A-to-I editing among metazoans. (A, B) Temporal dynamics of A-to-I editing and ADAR expression during (A) invertebrate (C. elegans and D. melanogaster)and (B) vertebrate (zebrafish and frog) developments. For each animal, the highest editing level during development was used to normalize the A-to-I editing index (left Y-axis; see Material and Methods). ADAR expression levels were estimated in terms of BPKM (right Y-axis). Pearson’s r between A-to-I editing index and ADAR expression level was calculated by considering the one-stage lagging of A-to-I editing as the fluctuation of ADAR expression during development (see the text). ADARs represent ADR-1 and ADR-2 for C. elegant, dADAR for D. melanogaster, and ADAR1 and ADAR2 for vertebrates (zebrafish and frog). (C) Temporal profiling of TE- and non-TE-associated A-to-I editing during C. elegans, D. melanogaster, zebrafish, and frog developments. For each animal, the editing levels of TE- and non-TE-associated sites were accumulated, respectively. The highest editing level during development was used to normalize the A-to-I editing index. (D) Changes in A-to-I editing levels for nonsynonymous editing sites during fly holometabolous development. (E) Heatmap representation of Pearson’s correlations (r) between editing levels of individual nonsynonymous editing sites and different levels of lagging (no lagging and one-, two-, and three-stage lagging) in dADAR expression in the matching order of (D). Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 533 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE located in TEs and commonly present during fly holometab- higher magnitude of the ADAR motif (fig. 2H). We also found olous development from embryo to adult. Of the 22 nonsy- that both the ratio of the frequencies of nonsynonymous nonymous events, 16 were conserved editing sites, implying editing to that of synonymous editing (the f -to-f ratio; n s their importance of A-to-I editing for fly development. As fig. 4C) and the conservation level of single nucleotides (the shown in figure 6D, we further found different patterns in PhyloP score; fig. 4D) remarkably increased with increasing changes of RNA editing level on separate A-to-I editing sites the conservation level of A-to-I editing. These results thus during fly holometabolous development. We observed that suggest potentially functional benefit of highly clustered the dynamics and active nature A-to-I editing on separate sites and conserved editing sites. may reflect different levels of lagging as compared with the While the scaffolding of A-to-I editomes could be raised by fluctuation of dADAR expression, even though editing sites the introduction of TEs in a lineage-specific manner, a subset located within the same gene loci (fig. 6E and supplementary of highly conserved A-to-I editing events (87 human editing fig. S9, Supplementary Material online). For example, events observed in at least 5 nonhuman species, of which 34 NaCP60E (Na channel protein 60E) mediates the voltage- events were conserved in nonprimate species; supplementary dependent sodium ion permeability of excitable mem- data set S4, Supplementary Material online) might exist long branes (Kulkarni et al. 2002), for which A-to-I editing in the common ancestor of primates (or even the common may play a role in rapid electrical and chemical neurotrans- ancestor of mammalian species for the 34 events). For exam- mission (Hoopengardner et al. 2003). Changes in editing ple, Gabra3 contains a highly conserved A-to-I editing site, at levels for NaCP60E transcripts at the six nonsynonymous A- which amniotes encode an isoleucine codon (AUA) while to-I editing sites exhibited distinct patterns throughout fruit zebrafish and frog retain that of methionine (AUG) as the fly life cycles, and were correlated with different levels of ancestral state (Tian et al. 2011). This codon in amniotes lagging in dADAR expression (fig. 6E and supplementary can still revert to “AUG” by A-to-I editing during transcription, fig. S9, Supplementary Material online). Such diversity of making the switch from isoleucine to methionine (I/M) possi- NaCP60E proteins through A-to-I editing may contribute to ble without losing its original message in genome. In addition, fly nervous system. This is the first reported finding on the we found that 18 of the 87events(locatedin15genes) were editing pattern of NaCP60E transcripts. These results reveal nonsynonymous (supplementary data set S4,Supplementary that A-to-I editing, which occurs at the transcriptome level, Material online). Alteration of A-to-I editing has been shown also provides a wide range of diversity for the proteome, to affect the function of these 15 genes in diverse species and that individual editing sites may be developmentally (supplementary table S3, Supplementary Material online). regulated. The gene ontology (GO) analysis further revealed that the host genes of these highly conserved nonsynonymous editing sites were enriched in “synaptic transmission,” “synapse,” Discussion “channel activity,” “gated channel activity,” and “ion chan- In this study, we successfully identified high-confidence edit- nel activity” (supplementary table S6, Supplementary Material ing sites across diverse species by controlling for FDR and online), consistent with the functional effect of A-to-I editing %AG (which is determined by N ; the clustering strategy) cluster in previous reports (supplementary table S3, Supplementary or evolutionary conservation of editing in multiple species (the Material online). Of note, the host genes of Drosophila con- conservation strategy), without a priori knowledge of known served editing sites (i.e., D. melanogaster editing sites ob- SNPs or genome sequencing from the same sample. This served in at least one other Drosophila species; enables us to identify high-confidence A-to-I sites in both supplementary data set S2, Supplementary Material online) model and nonmodel animals, and thus to construct the first were also enriched in similar GO terms (supplementary table A-to-I editomes across 20 metazoan species. Of note, previ- S6, Supplementary Material online), indicating the evolution- ous studies have reported that the A/G pattern of human- ary convergence of editing targets despite substantial diver- chimpanzee coincident SNPs (the same A/G variants were gence between vertebrate and invertebrate. These results also present at human–chimpanzee orthologous positions) was reflect that a conserved enrichment of editing in the CNS over-represented (Hodgkinson et al. 2009; Chen et al. throughout more than 300 Myr of divergent evolution in 2016). We suggest that the effect of coincident SNPs on complex animals from primates to chickens (fig. 5D and sup- the editing sites identified by the conservation strategy is slight plementary figs. S3 and S4, Supplementary Material online). because only 0.01% (5 out of 38,480) of the human–chim- Intriguingly, of the 66,800 human A-to-I editing events iden- panzee conserved A-to-G editing sites are also observed to be tified by our conservation strategy (table 1 and supplementary A/G polymorphic at human–chimpanzee orthologous posi- data set S4, Supplementary Material online), 6, 4, and 192 tions on the basis of dbSNP (supplementary table S5, could cause stop codon losses, splice site alterations, and Supplementary Material online). Comparative analysis of the amino acid changes (i.e., nonsynonymous changes), respec- identified editing sites revealed that highly clustered and con- tively. It is worthwhile to further investigate the functional served editing sites tended to have a higher editing level and a importance of these evolutionarily conserved editing events. 534 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE For the spatial context of A-to-I editing, we found that mechanism in living cells, especially in the CNS and during editing events tended to exhibit tissue-specificity (particularly early development. This study thus sheds light on evolutionary for the CNS) (fig. 5A and B), and that global editing activity and dynamic aspects of A-to-I editome across vertebrates and and ADAR (ADAR1 and ADAR2) expression were higher in invertebrates, opening up this important but understudied the CNS than in the other tissues examined and positively class of nongenomically encoded events for comprehensive correlatedwitheachother (fig. 5C and D and supplementary characterization. fig. S3, Supplementary Material online). Importantly, these tendencies were consistently observed among amniotes Supplementary Material from primates to birds, representing the first documented Supplementary data are available at Genome Biology and spatial profiling for A-to-I editing across amniotes. Using Evolution online. PCA, we further showed that global editing activity of primate-conserved sites exhibited an organ-preferential clus- tering (fig. 5E), reflecting that these editing events tended to Acknowledgments be specific to a few organs (fig. 5A and B). On the other hand, We thank Chan-Shuo Wu, Yen-Zho Chen, and Yu-Ting Hsiao our results exhibited the dynamic and active nature of A-to-I for computational assistance, Prof. Hsien-Sung Huang for pro- editome during the development of invertebrates (worm and viding mouse brain tissue, Prof. Fu Huang for providing D. fly) and during the embryogenesis of vertebrates (zebrafish melanogaster sample, and Prof. Ying-Chih Chang for provid- and frog), respectively. We observed that shifting patterns of ing bench and related experimental facilities. This study was global A-to-I editing were precisely dependent on ADAR ex- supported by Genomics Research Center, Academia Sinica, pression (fig. 6A and B), echoing the developmental effects of Taiwan; and the Ministry of Science and Technology, Taiwan ADAR enzymes (Palladino et al. 2000; Jepson and Reenan (under the contracts NSC 102-2621-B-001-003 and MOST 2008; Horsch et al. 2011). Particularly, this (fig. 6D) and other 103-2628-B-001-001-MY4). studies also observed that individual editing sites can ex- hibit distinct editing patterns during development in di- verse species (Gurevich et al. 2002; Osenberg et al. 2010; Literature cited Yu et al. 2016). In addition to a possible explanation that Alon S, et al. 2015. The majority of transcripts in the squid nervous system shift in expression levels of edited genes may accompany are extensively recoded by A-to-I RNA editing. Elife 4:e05198. editing changes during development (Yu et al. 2016), we Bahn JH, et al. 2012. Accurate identification of A-to-I RNA editing in hu- man by transcriptome sequencing. Genome Res. 22(1):142–150. provided another possibility that individual editing sites Bass B, et al. 2012. The difficult calls in RNA editing. Interviewed by H Craig may reflect different levels of lagging as compared with Mak. Nat Biotechnol. 30(12):1207–1209. the fluctuation of ADAR expression (e.g., fig. 6D and E). Bass BL. 2002. RNA editing by adenosine deaminases that act on RNA. The joint effects of both possibilities may be the cause for Annu Rev Biochem. 71:817–846. diverse editing patterns of individual sites during develop- Bazak L, et al. 2014. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res. ment. Further investigation of individual targeted editing 24:365–376. sites will provide a fuller understanding of the developmen- Bazak L, Levanon EY, Eisenberg E. 2014. Genome-wide analysis of Alu tal role of A-to-I RNA editing. Moreover, we observed that editability. Nucleic Acids Res. 42(11):6876–6884. the editing patterns of C. elegans and fly were generally Burns CM, et al. 1997. Regulation of serotonin-2C receptor G-protein comparable with each other during development (fig. 6A). coupling by RNA editing. Nature 387(6630):303–308. Chen CY, Hung LY, Wu CS, Chuang TJ. 2016. Purifying selection shapes The comparability of editing pattern was also observed be- the coincident SNP distribution of primate coding sequences. Sci Rep. tween vertebrates (zebrafish and frog) during embryogen- 6:27272. esis (fig. 6B), regardless of where the editing sites were Chen JY, et al. 2014. RNA editome in rhesus macaque shaped by purifying located (fig. 6C and supplementary fig. S7, selection. PLoS Genet. 10(4):e1004274. Supplementary Material online). These results thus indicate Chen L. 2013. Characterization and comparison of human nuclear and cytosolic editomes. Proc Natl Acad Sci U S A. 110(29):E2741–E2747. that global editing activity principally reflects ADAR expres- Danecek P, et al. 2012. High levels of RNA-editing site conservation sion inherited from their common ancestors. amongst 15 laboratory mouse strains. Genome Biol. 13(4):26. In summary, our comparative analysis provides the first A- Eggington JM, Greene T, Bass BL. 2011. Predicting sites of ADAR editing in to-I editomes across 20 diverse species, from worm to human, double-stranded RNA. Nat Commun. 2:319. and reconstructs an evolutionary landscape of editomes, Feng Zhang YL, Yan S, Xing Q, Tian W. 2017. SPRINT: a SNP-free toolkit for identifying RNA editing sites. Bioinformatics 33:3538–3548. which might serve as a valuable resource for understanding Gurevich I, et al. 2002. Altered editing of serotonin 2C receptor pre-mRNA the co-evolution of the editing machinery (i.e., TEs, editomes, in the prefrontal cortex of depressed suicide victims. Neuron and editors). The spatiotemporal atlas provides unprece- 34(3):349–356. dented resolution of editing dynamics, and reveals conserved Han L, et al. 2015. The genomic landscape and clinical relevance of A-to-I patterns of editing, which might be a key regulatory RNA editing in human cancers. Cancer Cell 28(4):515–528. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 535 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE Hodgkinson A, Ladoukakis E, Eyre-Walker A. 2009. Cryptic variation in the Osenberg S, et al. 2010. Alu sequences in undifferentiated human embry- human mutation rate. PLoS Biol. 7(2):e1000027. onic stem cells display high levels of A-to-I RNA editing. PLoS ONE Hoopengardner B, Bhalla T, Staber C, Reenan R. 2003. Nervous system 5(6):e11173. targets of RNA editing identified by comparative genomics. Science Palladino MJ, Keegan LP, O’Connell MA, Reenan RA. 2000. A-to-I pre- 301(5634):832–836. mRNA editing in Drosophila is primarily involved in adult nervous sys- Horsch M, et al. 2011. Requirement of the RNA-editing enzyme tem function and integrity. Cell 102(4):437–449. ADAR2 for normal physiology in mice. J Biol Chem. Paz-Yaacov N, et al. 2010. Adenosine-to-inosine RNA editing shapes tran- 286(21):18614–18622. scriptome diversity in primates. Proc Natl Acad Sci U S A. Hwang T, et al. 2016. Dynamic regulation of RNA editing in human brain 107(27):12174–12179. development and disease. Nat Neurosci. 19(8):1093–1099. Peng Z, et al. 2012. Comprehensive analysis of RNA-Seq data reveals ex- Jepson JE, Reenan RA. 2008. RNA editing in regulating gene expression in tensive RNA editing in a human transcriptome. Nat Biotechnol. the brain. Biochim Biophys Acta 1779(8):459–470. 30(3):253–260. John D, Weirick T, Dimmeler S, Uchida S. 2017. RNAEditor: easy detection Picardi E, D’Erchia AM, Lo Giudice C, Pesole G. 2017. REDIportal: a com- of RNA editing events and the introduction of editing islands. Brief prehensive database of A-to-I RNA editing events in humans. Nucleic Bioinform. 18:993–1001. Acids Res. 45(D1):D750–D757. Kawahara Y, et al. 2007. Redirection of silencing targets by Picardi E, et al. 2015. Profiling RNA editing in human tissues: towards the adenosine-to-inosine editing of miRNAs. Science inosinome Atlas. Sci Rep. 5:14941. 315(5815):1137–1140. Pickrell JK, Gilad Y, Pritchard JK. 2012. Comment on “Widespread RNA Kim DD, et al. 2004. Widespread RNA editing of embedded alu elements and DNA sequence differences in the human transcriptome”. Science in the human transcriptome. Genome Res. 14(9):1719–1725. 335(6074):1302; author reply 1302. Kimelman D, Kirschner MW. 1989. An antisense mRNA directs the cova- Pinto Y, Cohen HY, Levanon EY. 2014. Mammalian conserved ADAR lent modification of the transcript encoding fibroblast growth-factor in targets comprise only a small fragment of the human editosome. Xenopus oocytes. Cell 59(4):687–696. Genome Biol. 15(1):R5. Kiran A, Baranov PV. 2010. DARNED: a DAtabase of RNa EDiting in Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. 2010. Detection of non- humans. Bioinformatics 26(14):1772–1776. neutral substitution rates on mammalian phylogenies. Genome Res. Kleinman CL, Majewski J. 2012. Comment on “Widespread RNA and 20:110–121. DNA sequence differences in the human transcriptome”. Science Porath HT, Carmi S, Levanon EY. 2014. A genome-wide map of hyper- 335(6074):1302; author reply 1302. edited RNA reveals numerous new sites. Nat Commun. 5:4726. Kulkarni NH, Yamamoto AH, Robinson KO, Mackay TF, Anholt RR. 2002. Ramaswami G, et al. 2012. Accurate identification of human Alu and non- The DSC1 channel, encoded by the smi60E locus, contributes to odor- Alu RNA editing sites. Nat Methods 9(6):579–581. guided behavior in Drosophila melanogaster. Genetics Ramaswami G, et al. 2013. Identifying RNA editing sites using RNA se- 161(4):1507–1516. quencing data alone. Nat Methods 10(2):128–132. Lehmann KA, Bass BL. 2000. Double-stranded RNA adenosine deaminases Ramaswami G, Li JB. 2014. RADAR: a rigorously annotated database ADAR1 and ADAR2 have overlapping specificities. Biochemistry of A-to-I RNA editing. Nucleic Acids Res. 42(Database 39(42):12875–12884. issue):D109–D113. Lev-Maor G, et al. 2007. RNA-editing-mediated exon evolution. Genome Rueter SM, Dawson TR, Emeson RB. 1999. Regulation of alternative splic- Biol. 8(2):R29. ing by RNA editing. Nature 399(6731):75–80. Levanon EY, et al. 2004. Systematic identification of abundant A-to-I Sapiro AL, Deng P, Zhang R, Li JB. 2015. Cis regulatory effects on A-to-I editing sites in the human transcriptome. Nat Biotechnol. RNA editing in related Drosophila species. Cell Rep. 11(5):697–703. 22(8):1001–1005. Savva YA, Reenan RA. 2014. Identification of evolutionarily meaningful Li H, Durbin R. 2009. Fast and accurate short read alignment with information within the mammalian RNA editing landscape. Genome Burrows-Wheeler transform. Bioinformatics Biol. 15(1):103. 25(14):1754–1760. Schroeder A, et al. 2006. The RIN: an RNA integrity number for assigning Li H, et al. 2009. The sequence alignment/map format and SAMtools. integrity values to RNA measurements. BMC Mol Biol. 7:3. Bioinformatics 25(16):2078–2079. Shtrichman R, et al. 2012. Altered A-to-I RNA editing in human embryo- Li H, Ruan J, Durbin R. 2008. Mapping short DNA sequencing reads and genesis. PLoS ONE 7(7):e41576. calling variants using mapping quality scores. Genome Res. Stefl R, et al. 2010. The solution structure of the ADAR2 dsRBM-RNA 18(11):1851–1858. complex reveals a sequence-specific readout of the minor groove. Lin W, Piskol R, Tan MH, Li JB. 2012. Comment on “Widespread RNA and Cell 143(2):225–237. DNA sequence differences in the human transcriptome”. Science Sun Y, et al. 2016. RED: a Java-MySQL software for identifying and visu- 335(6074):1302; author reply 1302. alizing RNA editing sites using rule-based and statistical filters. PLoS Liscovitch-Brauer N, et al. 2017. Trade-off between transcriptome ONE 11(3):e0150465. plasticity and genome evolution in cephalopods. Cell Tian N, et al. 2011. A structural determinant required for RNA editing. 169(2):191–202 e111. Nucleic Acids Res. 39(13):5669–5681. Maas S, Kawahara Y, Tamburro KM, Nishikura K. 2006. A-to-I RNA editing Tomaselli S, et al. 2015. Modulation of microRNA editing, expression and human disease. RNA Biol. 3(1):1–9. and processing by ADAR2 deaminase in glioblastoma. Genome Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping Biol. 16:5. and quantifying mammalian transcriptomes by RNA-Seq. Nat Tonkin LA, et al. 2002. RNA editing by ADARs is important for normal Methods 5(7):621–628. behavior in Caenorhabditis elegans. Embo J. 21(22):6025–6035. Nakamura K, et al. 2011. Sequence-specific error profile of Illumina Ulbricht RJ, Emeson RB. 2014. One hundred million adenosine-to-inosine sequencers. Nucleic Acids Res. 39(13):e90. RNA editing sites: hearing through the noise. Bioessays Nishikura K. 2006. Editor meets silencer: crosstalk between RNA 36(8):730–735. editing and RNA interference. Nat Rev Mol Cell Biol. Veno MT, et al. 2012. Spatio-temporal regulation of ADAR editing during 7(12):919–931. development in porcine neural tissues. RNA Biol. 9:1054–1065. 536 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Wagner RW, Smith JE, Cooperman BS, Nishikura K. 1989. A double- Xu G, Zhang J. 2015. In search of beneficial coding RNA editing. Mol Biol stranded RNA unwinding activity introduces structural altera- Evol. 32(2):536–541. tions by means of adenosine to inosine conversions in mamma- Yu Y, et al. 2016. The landscape of A-to-I RNA editome is shaped by both lian cells and Xenopus eggs. Proc Natl Acad Sci U S A. positive and purifying selection. PLoS Genet. 12(7):e1006191. 86(8):2647–2651. Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM. 2010. A Wahlstedt H, Daniel C, Enstero M, Ohman M. 2009. Large-scale mRNA survey of genomic traces reveals a common sequencing error, RNA sequencing determines global regulation of RNA editing during brain editing, and DNA editing. PLoS Genet. 6(5):e1000954. development. Genome Res. 19(6):978–986. Zhang Q, Xiao X. 2015. Genome sequence-independent identification of Wang Q, Khillan J, Gadue P, Nishikura K. 2000. Requirement of the RNA RNA editing sites. Nat Methods 12(4):347–350. editing deaminase ADAR1 gene for embryonic erythropoiesis. Science 290(5497):1765–1768. Xu G, Zhang J. 2014. Human coding RNA editing is generally nonadaptive. Proc Natl Acad Sci U S A. 111(10):3769–3774. Associate editor: Wen-Hsiung Li Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 537 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Genome Biology and Evolution Oxford University Press

An Evolutionary Landscape of A-to-I RNA Editome across Metazoan Species

Free
17 pages

Loading next page...
 
/lp/ou_press/an-evolutionary-landscape-of-a-to-i-rna-editome-across-metazoan-CBfbCfzmXk
Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
ISSN
1759-6653
eISSN
1759-6653
D.O.I.
10.1093/gbe/evx277
Publisher site
See Article on Publisher Site

Abstract

Adenosine-to-inosine (A-to-I) editing is widespread across the kingdom Metazoa. However, for the lack of comprehensive analysis in nonmodel animals, the evolutionary history of A-to-I editing remains largely unexplored. Here, we detect high- confidence editing sites using clustering and conservation strategies based on RNA sequencing data alone, without using single-nucleotide polymorphism information or genome sequencing data from the same sample. We thereby unveil the first evolutionary landscape of A-to-I editing maps across 20 metazoan species (from worm to human), providing unprecedented evidence on how the editing mechanism gradually expands its territory and increases its influence along the history of evo- lution. Our result revealed that highly clustered and conserved editing sites tended to have a higher editing level and a higher magnitude of the ADAR motif. The ratio of the frequencies of nonsynonymous editing to that of synonymous editing remark- ably increased with increasing the conservation level of A-to-I editing. These results thus suggest potentially functional benefit of highly clustered and conserved editing sites. In addition, spatiotemporal dynamics analyses reveal a conserved enrichment of editing and ADAR expression in the central nervous system throughout more than 300 Myr of divergent evolution in complex animals and the comparability of editing patterns between invertebrates and between vertebrates during development. This study provides evolutionary and dynamic aspects of A-to-I editome across metazoan species, expanding this important but understudied class of nongenomically encoded events for comprehensive characterization. Key words: A-to-I RNA editing, ADAR, ADAR motif, dynamic editome, evolution. Introduction 1989), and coding potential (Burns et al. 1997). RNA editing Adenosine-to-inosine (A-to-I) RNA editing is a common co- or events have been reported to be highly associated with mod- post-transcriptional mechanism in metazoans, which is medi- ification of regulatory RNAs (Kawahara et al. 2007; Tomaselli ated by the ADAR (adenosine deaminases that act on RNA) et al. 2015), cancer mechanisms (Maas et al. 2006; Han et al. family of proteins and converts adenosine (A) into inosine (I) 2015), embryogenesis (Wang et al. 2000; Osenberg et al. (Bass 2002; Nishikura 2006). Since inosine is then recognized 2010), and brain development or neural differentiation as guanine (G) in living cells, A-to-I editing is also known as A- (Wahlstedt et al. 2009; Hwang et al. 2016). These observa- to-G editing (Bass 2002; Nishikura 2006). Although an A-to-I tions indicate the functional importance of A-to-I RNA editing editing event only alters one base at the RNA level, which not only for the immediate biological relevance but also for seems to slightly affect the corresponding RNA molecule, the potential molecular complexity. emerging evidence shows that it can affect both transcription The past decade has seen a rapid growth in efforts to and translation in different aspects such as microRNA regula- identify A-to-I sites on a transcriptome-wide scale; a tremen- tion (Rueter et al. 1999; Kawahara et al. 2007), alternative dous number of editing events have been detected (Levanon splicing (Rueter et al. 1999; Lev-Maor et al. 2007), structural et al. 2004; Danecek et al. 2012; Peng et al. 2012; alteration (Kimelman and Kirschner 1989; Wagneretal. Ramaswami et al. 2013; Bazak et al. 2014; Ramaswami and The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non- commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com Genome Biol. Evol. 10(2):521–537. doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 521 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE Li 2014; Picardi et al. 2015; Zhang and Xiao 2015). However, et al. 2015), our spatiotemporal atlas provided an unprece- identification of RNA editing often severely hampers by false dented opportunity for studying dynamic editome in diverse positives arising from RNA/DNA sequencing errors, single nu- species and for assessing the correlation between the global cleotide polymorphisms (SNPs) or somatic mutations between editing and expression of the two functional editors (ADAR1/ the individual cells, incompleteness of genomic sequences ADAR2) in the context of evolution. and gene annotation, and mapping errors of the cDNA sequences to the reference genome (Bass et al. 2012; Materials and Methods Ulbricht and Emeson 2014). Several RNA editing-detecting Data Retrieval and Access methods minimize false positives from SNPs by comparing the DNA and RNA sequences from a single individual (Bahn The information of the RNA-seq data used in this study was et al. 2012; Peng et al. 2012; Ramaswami et al. 2012). listed in supplementary data set S1, Supplementary Material However, this approach simultaneously requires genome online. TEs and other repetitive elements were identified by and transcriptome sequencing data from the same sample, RepeatMasker and were downloaded from the University of limiting its practicability. To address the lack of genome se- California Santa Cruz (UCSC) genome browser at https://ge- quencing data, a complete (or a partial) SNP database is often nome.ucsc.edu/. The PhyloP scores were also downloaded used to filter out SNPs (Bahn et al. 2012; Ramaswami et al. from the UCSC genome browser. All supplementary data 2012, 2013; Bazak et al. 2014; Zhang and Xiao 2015; John sets (S1–S7) are publicly downloadable at http://idv.sinica. et al. 2017; Sun et al. 2016). These approaches require a priori edu.tw/trees/RNA_editing/RNA_editing.html. The in-house knowledge from known SNPs such as dbSNP, also decreasing programs for identifying RNA editing sites are publicly acces- their capability to identify editing sites in diverse organs/ani- sible from GitHub at https://github.com/TreesLab/ICARES/ mals, especially in nonmodel species due to the lack of SNP tree/dev. A visualizable website was also provided (https:// information. Moreover, the phenomenon that most detected sites.google.com/site/recodedatabase/), which allows users RNA editing sites on coding regions tend to be nonadaptive to search for the identified RNA editing sites by providing (Chen et al. 2014; Xu and Zhang 2014) reveal another chal- the coordinates of genomic regions or gene symbols. lenge in extracting functionally beneficial editing events from the sea of sites with a very low level of editing (Pinto et al. The Pipeline for Identifying A-to-I Sites 2014; Savva and Reenan 2014; Xu and Zhang 2015). These challenges thus limit our understanding of the evolutionary The RNA editing sites were identified by two strategies: the landscape and spatiotemporal dynamics of A-to-I editing in clustering strategy within the same species and the conserva- context. tion strategy with cross-species comparison (fig. 1A). The clus- Previously, it was observed an enrichment for clusters of tering process was made up of three main phases to detected editing sites, suggesting that the effect of sequenc- distinguish editing sites from technical and biological noises. ing errors, SNPs, and mutations on identification of A-to-I First, we collected 309 RNA-seq data from 20 metazoan spe- editing could remarkably decrease with increasing the num- cies (supplementary data set S1, Supplementary Material on- ber of consecutive variants (Levanon et al. 2004; Zaranek et al. line), and used BWA (Li and Durbin 2009)(version 1.2.3)for 2010). The clustering strategy has been demonstrated to be short-read mapping. SNV calling was then performed using effective to detect RNA editing sites without the need for SNP SAMtools pileup (Li et al. 2009) (version 1.0.2) with a pileup information or genome sequencing data from the same sam- parser program downloaded from Galaxy (pileup_parser.pl ple (Porath et al. 2014; Feng Zhang et al. 2017). To generate with parameter settings: 3 9 10 8 20 1 “No” “Yes” 2 an evolutionary landscape of A-to-I RNA editome across “Yes” “Yes”). The type of nucleotide substitution was deter- metazoan species, most of which have no (or limited) SNP mined on the basis of the reference genome and the strand data, we employ the clustering strategy to identify A-to-I edit- information of the Ensembl-annotated genes (see supplemen- ing sites using RNA-seq data alone. For accuracy, we selected tary data set S1, Supplementary Material online). Second, we A-to-I editing sites by controlling for the fraction of A-to-G selected dimorphic variants (i.e., sites with two distinct nucle- mismatches to all types of mismatches (designated as otide types), and discarded singleton and multi-allelic ones. To “%AG”; >95%) and false discovery rate (FDR; <1%) or eliminate potential strand-specific miscalls, we discarded var- cross-species conservation of A-to-I editing, and thereby iden- iants with strand bias. For each species, we used MAQ (Li et al. tified 429,509 high-confidence A-to-I sites in diverse species 2008) to simulate all possible cDNA short-reads based on from worm to human, including model and nonmodel ani- Ensembl-annotated transcripts, mapped these simulated mals. We thus constructed the first evolutionary landscape reads to the reference genome, and compiled a mapping er- and spatiotemporal atlas of A-to-I editing across metazoan ror set according to the mapping result. The called variants species. In contrast with previous studies that investigated within this set were discarded since they were subject to map- editing dynamics in limited genes/species (Wahlstedt et al. ping errors. Besides, sequencing errors were reported to occur 2009; Shtrichman et al. 2012; Veno et al. 2012; Picardi in proximity (Nakamura et al. 2011). We then identified a 522 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE sequencing error set (SES) wherein variant calls of different experimentally confirmed (supplementary fig. S1A and B, mismatch types occurred in proximity (i.e., distance between Supplementary Material online; supplementary dat aset S3, each other <100 bp). Variants overlapped with the SES were Supplementary Material online). In addition to mouse and fly subsequently discarded. In addition, for accuracy, only the editing sites, which have been comprehensively detected and variants with multi-sample evidence were retained. Third, collected in public databases, we also selected nine editing the retained variants were compiled to identify high- sites from the identified chicken editing sites and successfully confidence A-to-I sites using the clustering strategy. Of confirmed eight of them in chicken brain (supplementary fig. note, the editing type of each mismatch was determined on S1C, Supplementary Material online; supplementary data set the basis of the Ensembl annotation. The clustering strategy S3, Supplementary Material online). compiled variants of the same mismatch type in close prox- imity (i.e., distance between each other <100 bp), because it was observed that the effect of sequencing errors, SNPs, and Samples and Validation mutations on identification of A-to-I editing could remarkably Genomic DNA and total RNA were extracted from frozen decrease with increasing the number of consecutive variants brain of mouse (C57BL/6J) at postnatal day 49, fresh brain (Levanon et al. 2004; Zaranek et al. 2010). The proximal dis- of adult chicken, and wild-type fly (Drosophila melanogaster). tance was set as 100 bp because of the observation that the Both genomic DNA and total RNA were obtained from the vast majority (>95%) of the previously identified A-to-I edit- same individual/sample. Genomic DNA and total RNA were ing sites have at least one neighbor site within the proximal extracted by PureLink Genomic DNA Mini Kit (Thermo Fisher distance of 100 bp (supplementary table S1, Supplementary Scientific) and PureLink RNA Mini Kit (Thermo Fisher Scientific) Material online). For each species, the qualified number of with DNase I according to the manufacturer’s instructions, consecutive variants (qualified N ) was determined by cluster respectively. Primers were designed in flanking sequences of the FDR and %AG. The FDR of A-to-I sites was defined as the tested editing sites. 5 lg total RNA transforms to cDNA FDR¼the number of G-to-A mismatches/the number of A-to- with SuperScript III First-Strand Synthesis System (Thermo G mismatches. The qualified N was determined while the cluster Fisher Scientific) using Random Hexamer and Oligo-DT pri- detected sites satisfied both of the two rules: %AG>95% mers. 50 ng of genomic DNA and cDNA were used for the and FDR<1%. For the conservation strategy, the A-to-I edit- PCR analysis of the editing sites. The PCR was performed us- ing sites were identified from three clades (12 vertebrates, 4 ing DreamTaq Green PCR Master Mix (Thermo Fisher Drosophila species, and 4 Caenorhabditis species), respec- Scientific) on Veriti Thermal Cycler (Thermo Fisher tively. We considered the A-to-G mismatches that have Scientific). PCR products were validated by gel, and then passed the strand-bias, mapping error, and sequencing error treated with QIAquick Gel Extraction Kit (Qiagen). The edit- filters described in Phase II (i.e., ad hoc filters; fig. 1A)for each ing sites were selected to be tested, if they were detected species. Such A-to-G variants that were observed at the by the RNA-seq data (number of mapped reads at the site orthologous sites in more than one species of the examined should be >10) to be editing events with editing level clade were regarded as evolutionarily conserved A-to-I editing >10% in at least one sample. Sanger sequencing was per- sites (fig. 1A). A full list of the identified A-to-I editing sites formed to validate the corresponding editing positions of (including clustered and conserved editing sites) referred to genomic DNA and cDNA sequences (see supplementary the Ensembl annotation (releases 66 and 85) is given in sup- fig. S1, Supplementary Material online). The primer plementary data set S2, Supplementary Material online. sequences used were listed in supplementary data set S3, Compared with previously identified A-to-I editing sites in hu- Supplementary Material online. man, mouse, and fly [collected in the well-known public data- bases: DARNED (Kiran and Baranov 2010), RADAR (Ramaswami and Li 2014), and REDIportal (Picardi et al. Observed-to-Expected Ratio of “G” 2017)], we found that most the identified human editing sites (90%) were also detected in the public databases (supple- The observed-to-expected (O/E) ratio of thepresenceof G mentary dat aset S2, Supplementary Material online). In con- was calculated to examine ADAR preference for A-to-I edit- trast, more than one third of the identified mouse and fly ing, which was defined as P =P ,where P rep- ObsðGÞ ExpðGÞ Obs(G) A-to-I editing sites (92% for mouse and 33% for fly) were resented the frequency of observed presence of “G” not found in these databases (supplementary data set S2, immediately upstream or downstream to the A-to-I sites, Supplementary Material online). To validate the newly identi- and P represented that of expected presence of “G,” Exp(G) fied A-to-I editing sites in mouse and fly, we selected 11 which was calculated by the frequency of “G” in the exam- mouse and 12 fly editing sites and performed PCR amplifica- ined genome. The statistical significance of difference be- tion and Sanger sequencing of both DNA and RNA of mouse tween the observed and expected ratios of the presence of brain and wild-type fly from the same individual. Our result “G” was evaluated by the two-tailed Fisher’s exact test using revealed that 10 mouse and 12 fly editing sites were theR package(https://www.r-project.org/). Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 523 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE FIG.1.—Identification of high-confidence A-to-I RNA editing sites, without a priori knowledge of SNP information. (A) Overview of the identification of RNA editing sites. The editing sites were identified by the clustering strategy within the same species or by cross-species comparison. The clusteringprocess involved three main phases: Phase I: preprocessing, Phase II: ad hoc filters, and Phase III: ad hoc identification. (B) Correlation between the number of 524 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Analysis of the Conservation Level of A-to-I Editing and Measurement of ADAR mRNA Expression Conservation Scores ADAR mRNA expression was represented by BPKM (Bases Per To assess the correlation between the conservation level of Kilo-base of gene model per Million mapped bases) nonsynonymous A-to-I editing and conservation scores, we (Mortazavi et al. 2008). To improve the precision of mRNA first performed the UCSC LiftOver tool to convert genome expression measurement in poly(A)-selected RNA-seq sam- coordinates of nonhuman species into those of human at ples, we only considered 3 UTR exons in the gene model of the nonsynonymous sites of adenosine for the four categories calculation. of conservation (i.e., human–chimpanzee, human– chimpanzee–mouse, and human–chimpanzee–mouse– Results chicken conservation). For each category of conservation, the control sites with the same number of “A” of the corre- Identification of Editing Sites across 20 Metazoan Species sponding category of conserved editing sites were selected To investigate the evolutionary landscape and dynamic randomly. The PhyloP score of each selected “A” was changes of A-to-I editing across metazoan species, we first extracted from the UCSC genome browser. The simulation collected RNA-seq data from various cell types of 20 species, was performed 1,000 replicates for each category of including 6 primates, 3 nonprimate mammals, 3 nonmamma- conservation. lian vertebrates, 4 Drosophila species, and 4 Caenorhabditis species (309 samples in total; supplementary data set S1, Comparative Analysis of A-to-I Editing Levels Supplementary Material online). We then used the clustering strategy to detect RNA editing sites based on RNA-seq data We selected 16 animal individuals (2 for chimpanzee, 2 for alone, without the need for a priori knowledge from known bonobo, 2 for gorilla, 1 for orangutan, 2 for macaque, 3 for SNPs or genome sequencing data from the same sample mouse, 1 for opossum, 1 for platypus, and 2 for chicken) for (Materials and Methods). Two useful measures are often ap- assessing editing dynamics in 5 organs (i.e., cerebellum, brain, plied to evaluating the specificity of the identified A-to-I edit- liver, kidney, and heart). Of note, samples were selected for ing sites: 1) the fraction of A-to-G mismatches to all types of spatial profiling if they were eligible for measurement in five mismatches (designated as “%AG”) because of the extreme organs from a single individual. Each individual must contain infrequency of non-A-to-G editing events (Kleinman and all five types of organ samples with reliable RNA quality (i.e., Majewski 2012; Lin et al. 2012; Pickrell et al. 2012); and 2) RNA integrity number [Schroeder et al. 2006] >7.0). On the the ratio of the number of G-to-A mismatches to the number other hand, we used four species (i.e., Caenorhabditis ele- of A-to-G mismatches (i.e., FDR, see Materials and Methods) gans, D. melanogaster, zebrafish, and frog) in the profiling because of the assumption that G-to-A mismatches often of developmental dynamics. Since A-to-I editing could influ- reflected sequencing errors (Bahn et al. 2012; Liscovitch- ence its target sites in a local dsRNA region (Zaranek et al. Brauer et al. 2017). We can find that the percentage of A- 2010; Bazak Levanon, et al. 2014), the level of editing was to-G/T-to-C variants grew (or the FDR values decreased) rap- determined in a clustering manner as idly with the increase of the number of consecutive single- A to  I editing level ¼ nucleotide variants (SNVs) of the same type (designated as Number of G in the clustered sites “N ”) in all examined species (fig. 1B). Of note, the T- i cluster Number of A and G in the clustered sites to-C variants were considered, because the used RNA-seq data were not strand-specific and these variants could possi- bly be A-to-G editing sites in an antisense transcript. This with i ¼ 1, .. ., N,where N is the total number of editing sites revealed that a priori knowledge of the clustering tendency in the designated cluster. For each individual in the spatial for A-to-I editing held well across species, including model context or animal in the temporal context, the editing levels and nonmodel animals. For accuracy, the qualified N were accumulated and the highest value among the exam- cluster was determined while the detected sites satisfied both ined organs or development stages was used to normalize the %AG>95% and FDR<1% (fig. 1B). Upon completion of A-to-I editing index. To ensure the statistical accuracy, we only the screening process, 343,979 A-to-G editing sites were considered the editing clusters with reasonable base coverage identified in diverse species from worm to human. Since the (i.e., the total number of A and G in the clustered sites 10) more species the variants were supported by, the higher level for the analysis of editing dynamics. FIG.1.—Continued consecutive SNVs of the same type (N ) and the two measures of specificity: %AG (the percentage of A-to-G & T-to-C among 12 SNV types) and FDR cluster (the ratio of the number of G-to-A mismatches to the number of A-to-G mismatches). Histograms for each species represent A-to-G & T-to-C percentage in the subset of N consecutive SNVs of the same type. The dark gray histogram for each species represents the qualified N , which satisfies both cluster cluster %AG> 95% and FDR< 1%. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 525 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE of functional potentiality (and thus accuracy) they contained are highly associated with the evolution of RNA editing be- (Bahn et al. 2012), we recovered the A-to-G variants at the tween Drosophila species (Sapiro et al. 2015). Third, we di- orthologous sites in multiple species (editing sites supported vided all examined human editing sites into 20 equal-size bins by more than one species; the conservation strategy). By do- according to their editing levels and found a remarkably pos- ing so, 179,182 evolutionarily conserved A-to-G editing sites itive correlation between the magnitude of the ADAR motif were identified in the 20 species. Integrating the editing sites and editing level (fig. 2E). Of note, if a detected site appeared identified by these two strategies, we totally identified in multiple samples, the highest level observed was consid- 429,509 editing sites (table 1 and supplementary data set ered. Finally, we also observed that both N (fig. 2F)and cluster S2, Supplementary Material online). Of note, we only consid- the conservation level of editing (fig. 2G) were positively cor- ered the sites that were located within genic regions. We related with editing level. The correlations among N ,the cluster emphasize that the identified editing sites are highly confident conservation level of editing, the magnitude of the ADAR with controlling for FDR< 1% and %AG> 95% (the cluster- motif, and editing level were summarized in figure 2H. ing strategy) or evolutionary conservation of editing in multi- Since a relatively higher editing level at a site is expected to ple species (the conservation strategy). be functionally important (Xu and Zhang 2015), our results suggested the biological significance of highly clustered and conserved editing sites. The Correlation between N , Conservation cluster Level of Editing, Editing Level, and the Magnitude of the ADAR Motif Evolutionary Analysis of A-to-I Editomes The A-to-I editors (i.e., ADAR proteins) are known to be highly Since transposable elements (TEs) are pervasive in the conserved across species (Bass 2002). Previous studies have genomes of higher eukaryotes, the double-stranded RNA observed that in some animals ADARs have a sequence pref- structure formed by TE-complementary pairs often provides erence (or targets) for “G” depletion and “G” enrichment at an ideal target for ADAR binding (Levanon et al. 2004). To 0 0 the 5 and 3 neighbor nucleotides next to A-to-I editing sites, understand the extent of ADAR substrates among different respectively (Kiran and Baranov 2010; Lehmann and Bass lineages and species, we constructed an evolutionary land- 2000; Eggington et al. 2011; Ramaswami et al. 2013; Chen scape of A-to-I editing maps (i.e., A-to-I editomes). We et al. 2014; Pinto et al. 2014; Porath et al. 2014; Alon et al. showed that the scaffolding of A-to-I editomes was highly 2015; Liscovitch-Brauer et al. 2017). We evaluated the associated with the expansion of TEs (fig. 3A). We found observed-to-expected (O/E) ratios of the presence of “G”im- that the distribution of clustered A-to-I sites pertaining to mediately upstream or downstream to the A-to-I editing sites TEs (fig. 3B) generally reflected the TE density in the genome (see Materials and Methods) and showed that the trend of the (fig. 3A), echoing the observation that A-to-I editing tended known ADAR motif generally held true across metazoan spe- to be clustered within TEs (Kim et al. 2004). In addition, the cies, regardless of the detection strategies of A-to-G editing landscape was likely to encompass all the A-to-I editing events (i.e., the clustered and conserved sites; fig. 2A and B). that were highly conserved across species, whilst contained We processed to examine the correlations among the lineage- or species-specific events that might have been magnitude of clustering of editing sites (N ), the conser- recruited as a result of divergent evolution. Importantly, short cluster vation level of editing, the magnitude of the ADAR motif, and interspersed elements (SINEs), which were apparently editing level. First, we found a positive correlation between enriched in the mammalian genomes (fig. 3A, top), consti- the magnitude of the ADAR motif and N .Such a trend tuted a major landmark in mammalian editomes (fig. 3A,bot- cluster that the magnitude of the ADAR motif increased with increas- tom). This co-evolution of SINEs and editomes might ing N became flat after the qualified N (fig. 2C). accelerate the divergence of species, not only in the genomic cluster cluster Second, we classified the identified human editing sites into content, but also in the transcriptomic complexity. For exam- three groups according to the conservation level (supplemen- ple, in primates, genes associated with neurological processes tary data set S4, Supplementary Material online) and found or disorders were more likely to develop a host of SINE- that the magnitude of the ADAR motif significantly increased mediated substrates of A-to-I editing (Paz-Yaacov et al. with increasing the conservation level of A-to-I editing 2010), possibly resulting in the development of more complex (fig. 2D). This also reflected the sequence and structural con- transcriptome in the primate brain. servation in the double-stranded RNA binding domains To further examine the biological significance of conserved (dsRBDs) of ADARs across 14 vertebrates (supplementary editing sites, we retrieved 66,689 primate-only, 39 mammal- fig. S2A, Supplementary Material online), in which the protein only, and 16 vertebrate-conserved editing events according to residues involved in RNA binding (Stefl et al. 2010)were even the conservation level of the human A-to-I editing events highly conserved from vertebrates to Drosophila (supplemen- identified by our conservation strategy (supplementary data tary fig. S2B and C, Supplementary Material online). These set S4, Supplementary Material online). Of note, primate-only results respond to a recent notion that cis sequence changes events were human editing events observed at the 526 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Table 1 Summary of A-to-I Editing Sites Identified in This Study Organis Number of Number of A-to-I Editing Sites Host Genes Samples Clustering Conservation Total TE Sites Strategy (A) Strategy (B) Sites (A[B) Homo sapiens (human) 45 177,734 66,800 222,300 213,772 11,035 Pan troglodytes (chimpanzee) 15 42,082 45,117 54,891 52,939 5,292 Pan paniscus (bonobo) 12 30,807 37,021 42,420 40,911 4,929 Gorilla gorilla (gorilla) 11 12,169 15,936 19,956 18,561 3,442 Pongo abelii (orangutan) 9 6,230 8,828 11,125 10,820 2,357 Macaca mulatta (macaque) 13 3,034 4,593 6,206 5,997 1,691 Mus musculus (mouse) 32 57,125 115 57,143 47,859 3,738 Monodelphis domestica (opossum) 12 661 27 682 588 89 Ornithorhynchus anatinus (platypus) 12 267 7 274 253 36 Gallus gallus (chicken) 12 77 17 94 30 20 Xenopus tropicalis (frog) 40 4,495 4 4,499 1,812 146 Danio rerio (zebrafish) 8 5,331 5 5,336 4,571 340 Fly species D. melanogaster 34 317 229 534 208 114 D. pseudoobscura 8 464 206 636 N/A 112 D. willistoni 893 63 149 N/A 124 D. mojavensis 4 1,894 214 2,065 208 410 Nematode species Caenorhabditis elegans 16 888 0 888 584 57 C. briggsae 6 241 0 241 164 21 C. brenneri 628 0 28 N/A 3 C. japonica 642 0 42 N/A 2 The detailed information of the identified sites (e.g., genomic location and host genes) is listed in supplementary data set S2, Supplementary Material online. TE sites: A-to-I editing sites that are located in TEs. TE information of the species is not available. orthologous sites in nonhuman primate(s) but not observed in with increasing the conservation level of A-to-I editing. nonprimate vertebrates. Mammal-only ones were human Furthermore, considering the total “A” sites within the hu- editing events observed at the orthologous sites in both non- man coding sequences that would cause nonsynonymous human primate(s) and nonprimate mammal(s) but not ob- and synonymous changes if edited to “G,” respectively, we served in nonmammal vertebrates. Vertebrate-conserved calculated the frequencies of nonsynonymous (f ) and synon- ones were human editing events simultaneously observed at ymous (f ) editing (Xu and Zhang 2014). If nonsynonymous the orthologous sites in nonhuman primate(s), nonprimate editing events are generally deleterious and are destined to mammal(s), and nonmammal vertebrate(s). We found that selective elimination, f should exhibit remarkably smaller the great majority of the conserved editing events (99%) than f (Chen 2013; Xu and Zhang 2014). To examine the were primate-only, of which 98% (65,109 out of 66,689 relationship between the conservation level of A-to-I editing events) were located in TEs, especially in the primate- and the f -to-f ratio, we retrieved human nonconserved, n s specific Alu sequences (63,745 events; 96%), whereas no human–chimpanzee shared, human–chimpanzee–mouse mammal-only or vertebrate-conserved events occurred in shared, and human–chimpanzee–mouse–chicken shared syn- Alu repeats (fig. 4A). This also reflected our abovementioned onymous (or nonsynonymous) editing sites and the corre- notion that the scaffolding of A-to-I editomes could be raised sponding shared “A” sites that would have synonymous (or by the introduction of TEs in a lineage-specific manner (fig. 3). nonsynonymous) changes if edited to “G,” respectively (sup- We examined the effect of the conservation level on the dis- plementary table S2, Supplementary Material online). tribution of editing events. We found that the only 0.23% of Intriguingly, we found that the f -to-f ratio remarkably n s primate-only events caused amino acid changes (nonsynon- increased with increasing the conservation level of A-to-I edit- ymous changes), whereas as high as 26% and 88% of ing, andthe ratioevenexhibited 100% >1 for the human– mammal-only and vertebrate-conserved events leaded to chimpanzee–mouse–chicken shared events (fig. 4C). A nonsynonymous changes, respectively (fig. 4B). This reveals previous study reported that nonsynonymous editing events that the percentage of nonsynonymous editing sites increase were observed to occur less frequently than expected by Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 527 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE FIG.2.—The cis-preference of ADARs and editing level for the detected A-to-G editing sites. (A, B)The cis-preference of ADARs (or the ADAR motif, which was measured by the observed-to-expected (O/E) ratio of the presence of “G” immediately upstream and downstream to the A-to-G editing sites) for 528 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE 20 20 10 100 0 FIG.3.—The relationship between the expansion of TEs and the increase of A-to-I editing sites. (A) The average numbers of TEs (i.e., SINEs, LINEs, LTRs, and DNA transposons) per million bases (top) and the compositions of A-to-I editing sites in the four types of TEs (i.e., SINE, LINE, LTR, and DNA transposon), other repetitive region, and nonrepetitive region (bottom). (B) The distribution of clustered A-to-I sites pertaining to TEs across species. TE: transposable element. SINE: short interspersed nuclear element. LINE: long interspersed nuclear element. LTR: long terminal repeat element. chance, suggesting that coding RNA editing is generally not scores; Materials and Methods) and the conservation level beneficial (Xu and Zhang 2014). However, if the editing of nonsynonymous A-to-I editing was indeed positively corre- events were highly conserved across species, we observed a lated with the PhyloP scores (fig. 4D). This also reflected the different trend (fig. 4C). We further examined the correlation above observations that both the magnitude of the ADAR between the conservation level of nonsynonymous A-to-I motif and editing level were positively correlated with the editing and the conservation level of the corresponding indi- conservation level of editing (fig. 2D, G,and H). These results vidual nucleotides (measured by the PhyloP score) (Pollard et thus support a previous assumption that cross-species shared al. 2010). We found that the PhyloP scores of the conserved nonsynonymous editing is potentially beneficial and unlikely editing sites (including human–chimpanzee, human– due to nonediting-related processes that may cause the chimpanzee–mouse, and human–chimpanzee–mouse– inevitable consequence of sequence conservation (Xu and chicken shared editing sites) were generally higher than con- Zhang 2015). For example, considering the 87 human-editing trol (by simulating the sequence data to infer the PhyloP events highly conserved in 5 nonhuman vertebrates FIG.2.—Continued the editing sites identified by the clustering (A) and conservation (B) strategies across metazoan species. Only the species with more than 50 detected editing sites were considered. (C) The correlation between the magnitude of the ADAR motif and the magnitude of clustering of editing sites (N )in human.(D) cluster The correlation between the magnitude of the ADAR motif and the conservation level of editing. (E) The correlation between the magnitude of the ADAR motif and editing level (measured by the Spearman’s rank correlation). (F, G) The correlation between editing level and N (F) and the conservation level of cluster editing (G). (H) The correlations among N , the conservation level of editing, the magnitude of the ADAR motif, and editing level. “þ”represents a cluster positive correlation. The statistical significance was evaluated using the two-tailed Fisher’s exact test (A, B, D), the Spearman’s rank correlation (E), and the two-tailed Wilcoxon rank-sum test (F, G) using the R package: *P value < 0.05, **P value < 0.01, and ***P value < 0.001. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 529 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 % of clustered sites TE density pertaining to TEs RNA editome (per million bases) Hung et al. GBE FIG.4.—Analysis of the identified human editing events according to different conservation levels of A-to-I editing. (A, B) The distribution of human A- to-I editing sites (A)located in Alu, non-Alu TE, and non-TE regions, and (B) located in UTR/intron and editing sites leading to synonymous and non- synonymous changes for primate-only, mammal-only, and vertebrate-conserved editing events. UTR: untranslated region. (C)The f -to-f ratios for human n s (all identified human A-to-I editing sites), human–chimpanzee shared, human–chimpanzee-mouse shared, and human–chimpanzee–mouse–chicken shared A-to-I editing sites. (D) Comparison of the conservation level of nonsynonymous A-to-I editing and conservation scores (measured by the PhyloP score). The empty diamond, circle, rectangle, and triangle represents the control (the average values of PhyloP scores of the simulation; see Materials and Methods) for each bin of the four categories of conservation, respectively. The error bar represents the standard error of the mean. The P values were estimated by the Kolmogorov–Smirnov test. *P value< 0.05 and ***P value< 0.001. (supplementary data set S4, Supplementary Material online), Samples were selected for spatial profiling if they were eligible all the nonsynonymous editing events (18 events located in 15 for measurement in five organs (cerebellum, brain, liver, kid- genes) were conserved across primates and nonprimate ney, and heart) from a single individual, and for temporal vertebrates. In fact, alteration of A-to-I editing has been profiling if they were based on a time-course experiment us- shown to affect the function of all these 15 genes in diverse ing a single animal strain. The constructed spatiotemporal species (supplementary table S3, Supplementary Material atlasthus enabledustoassessthe correlationbetween the online), further supporting the functional importance of global editing (supplementary data set S5, Supplementary these evolutionarily conserved nonsynonymous editing Material online) and expression of the two functional editors events. (i.e., ADAR1 and ADAR2 expression; supplementary data set S6, Supplementary Material online) in the context of evolution. Spatiotemporal Dynamics of A-to-I Editing across Species In the spatial context, consistent with a previous observation (Picardi et al. 2015), we found that A-to-I editing To survey the transcriptome-wide dynamics of A-to-I editing, tendedtobe tissue-specific (fig. 5A), and that most of we constructed the spatiotemporal atlas in diverse species, the editing events observed in only one tissue were brain-/ including nine species with spatial profiles in different organs cerebellum-specific (fig. 5B). The clustered heatmap analysis and four species with temporal stages during development. 530 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE FIG.5.—Spatial profiling of A-to-I editing in five organs (cerebellum, brain, liver, kidney, and heart) among metazoans. (A) Analysis of tissue-specificity of A-to-I editing. (B) Distribution of tissue-specific A-to-I editing sites in varied individuals across species. (C) Clustered heatmap for A-to-I editing across five types Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 531 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE on 16 individuals across species further revealed that A-to-I embryogenesis from cleavage to organogenesis (fig. 6B), re- editing events grouped tissue samples into 2 distinct groups spectively. We observed that the dynamics of global editing (fig. 5C). This reflected that A-to-I editing activity was more exhibited one-stage lagging when compared with the fluctu- abundant in the central nervous system (CNS, e.g., cerebel- ation of ADAR expression (supplementary fig. S6, lum and brain) than in the rest (liver, kidney, and heart), re- Supplementary Material online), as we used mRNA expression gardless of species, sex (fig. 5C and D,and supplementary fig. to represent the actual abundance of ADARs proteins. When S3, Supplementary Material online), or where the editing sites taking this into account, ADAR expression was precisely syn- were located (TE vs. non-TE and coding vs. noncoding; sup- chronized with global editing in all examined species (fig. 6A plementary fig. S4, Supplementary Material online). These and B). In C. elegans, consistent with a previous study (Tonkin results indicate a highly conserved, enriched pattern of editing et al. 2002), the first larval stage (L1) exhibited a strong signal activity in the CNS. We further showed that the expression of editing during embryonic and larval (L1–L4) development levels of ADAR1/ADAR2 were generally higher in the CNS as (fig. 6A, left). Intriguingly, another strong signal was present compared with other organs (fig. 5D). Such an expression during dauer stages (i.e., dauer entry, dauer, and dauer exit) pattern holds well among amniotes from primates to birds, (fig. 6A, left), which represented an alternative development suggesting both editors are subject to strong evolutionary pathway in response to environmental stress. In D. mela- constraint in the perspective of gene expression. In addition, nogaster, similarly, an elevated trend of editing was also ob- a positive correlation between A-to-I editing levels in tissues served in L1; and the patterns of global editing were generally and ADAR (ADAR1 and ADAR2) expression was generally comparable with the patterns observed in C. elegans during observed across species (fig. 5D and supplementary fig. S3, development (fig. 6A, right). Such an elevated trend of editing Supplementary Material online). We asked whether our result in L1 generally held during worm and fly development. In may bias toward the editing sites identified by our stringent vertebrates, frog and zebrafish shared a conserved pattern criteria (i.e., the clustering and conservation strategies). To of editing during embryogenesis: A-to-I editing was highly address this, we extracted the A-to-G variants that failed to active in early embryonic stages, and rapidly fell off during pass our strategies in the three mouse individuals examined in gastrulation (from shield to bud for zebrafish, and from 6 to this study (for which the five examined organs were obtained 12 h for frog), followed by a stable trajectory that remained at from the same individual) but were previously identified to be low levels in later stages (fig. 6B). These results thus suggest editing sites (collected in DARNED or RADAR; supplementary that A-to-I editing may play a stage-dependent role during data set S7, Supplementary Material online). We performed development. The similar patterns of editing between inver- the similar analysis and observed a consistent result (supple- tebrates (nematode and fly) and between vertebrates (zebra- mentary fig. S5, Supplementary Material online). These results fish and frog) during development were generally observed thus suggest that global A-to-I editing activity could be largely for TE, non-TE, coding, and noncoding editing events (fig. 6C attributed to the two editors in tissues, although other con- and supplementary fig. S7, Supplementary Material online). founding factors may also affect the regulation of A-to-I edit- To avoid a possible bias toward our identified editing sites, we ing activity. This result also reflected the enrichment of RNA also analyzed the A-to-G variants that were previously identi- editing activity in the CNS. Furthermore, considering editing fied to be editing sites (collected in DARNED or RADAR) but levels of 589 one-to-one orthologous sites in 5 organs from 5 excluded by our stringent strategies in the fly samples exam- primates, the principal component analysis (PCA) showed that ined in this study (supplementary data set S7, Supplementary the data tended to be clustered by organ (fig. 5E). This result Material online), and showed the similar results as above (sup- also reflects the tendency of tissue-specificity of A-to-I editing plementary fig. S8, Supplementary Material online). The com- (fig. 5A and B). parability of global editing patterns between species implied In the temporal context, we examined the dynamics of A- that editing activity principally reflects ADAR expression inher- to-I editing in two invertebrates (C. elegans and D. mela- ited from their common ancestors. Intriguingly, we found 22 nogaster) during the development from embryo to adult nonsynonymous editing sites (in 13 genes; supplementary ta- (fig. 6A) and two vertebrates (zebrafish and frog) during the ble S4, Supplementary Material online), all of which were not FIG.5.—Continued of tissues, with rows representing accumulated editing levels of detected editing events and columns representing tissues. (D) Correlation between A-to-I editing activity and ADAR expression in the spatial context. Transcriptome-wide activities of editing were examined in the five organs from the same individual. For each individual, the highest editing level among the five organs was used to normalize the A-to-I editing index (left Y-axis; Material and Methods). ADAR expression levels were estimated in terms of BPKM (right Y-axis; Material and Methods). Pearson’s coefficient of correlation (r) (performed by the R package) was used to evaluate the correlation between A-to-I editing index and ADAR (ADAR1 (r ) and ADAR2 (r )) expression levels. (E)PCA based 1 2 on the editing levels of 589 orthologous sites in 5 organs from 5 primates. PCA was performed by the “princomp” function in the “stats” package of the R package. The distance metric between samples was calculated by 1 jj q . q represents pairwise Spearman’s correlation coefficient of RNA editing level between samples. M, male; F, female; WT, wild type (mouse). 532 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE FIG.6.—Temporal profiling of A-to-I editing among metazoans. (A, B) Temporal dynamics of A-to-I editing and ADAR expression during (A) invertebrate (C. elegans and D. melanogaster)and (B) vertebrate (zebrafish and frog) developments. For each animal, the highest editing level during development was used to normalize the A-to-I editing index (left Y-axis; see Material and Methods). ADAR expression levels were estimated in terms of BPKM (right Y-axis). Pearson’s r between A-to-I editing index and ADAR expression level was calculated by considering the one-stage lagging of A-to-I editing as the fluctuation of ADAR expression during development (see the text). ADARs represent ADR-1 and ADR-2 for C. elegant, dADAR for D. melanogaster, and ADAR1 and ADAR2 for vertebrates (zebrafish and frog). (C) Temporal profiling of TE- and non-TE-associated A-to-I editing during C. elegans, D. melanogaster, zebrafish, and frog developments. For each animal, the editing levels of TE- and non-TE-associated sites were accumulated, respectively. The highest editing level during development was used to normalize the A-to-I editing index. (D) Changes in A-to-I editing levels for nonsynonymous editing sites during fly holometabolous development. (E) Heatmap representation of Pearson’s correlations (r) between editing levels of individual nonsynonymous editing sites and different levels of lagging (no lagging and one-, two-, and three-stage lagging) in dADAR expression in the matching order of (D). Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 533 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE located in TEs and commonly present during fly holometab- higher magnitude of the ADAR motif (fig. 2H). We also found olous development from embryo to adult. Of the 22 nonsy- that both the ratio of the frequencies of nonsynonymous nonymous events, 16 were conserved editing sites, implying editing to that of synonymous editing (the f -to-f ratio; n s their importance of A-to-I editing for fly development. As fig. 4C) and the conservation level of single nucleotides (the shown in figure 6D, we further found different patterns in PhyloP score; fig. 4D) remarkably increased with increasing changes of RNA editing level on separate A-to-I editing sites the conservation level of A-to-I editing. These results thus during fly holometabolous development. We observed that suggest potentially functional benefit of highly clustered the dynamics and active nature A-to-I editing on separate sites and conserved editing sites. may reflect different levels of lagging as compared with the While the scaffolding of A-to-I editomes could be raised by fluctuation of dADAR expression, even though editing sites the introduction of TEs in a lineage-specific manner, a subset located within the same gene loci (fig. 6E and supplementary of highly conserved A-to-I editing events (87 human editing fig. S9, Supplementary Material online). For example, events observed in at least 5 nonhuman species, of which 34 NaCP60E (Na channel protein 60E) mediates the voltage- events were conserved in nonprimate species; supplementary dependent sodium ion permeability of excitable mem- data set S4, Supplementary Material online) might exist long branes (Kulkarni et al. 2002), for which A-to-I editing in the common ancestor of primates (or even the common may play a role in rapid electrical and chemical neurotrans- ancestor of mammalian species for the 34 events). For exam- mission (Hoopengardner et al. 2003). Changes in editing ple, Gabra3 contains a highly conserved A-to-I editing site, at levels for NaCP60E transcripts at the six nonsynonymous A- which amniotes encode an isoleucine codon (AUA) while to-I editing sites exhibited distinct patterns throughout fruit zebrafish and frog retain that of methionine (AUG) as the fly life cycles, and were correlated with different levels of ancestral state (Tian et al. 2011). This codon in amniotes lagging in dADAR expression (fig. 6E and supplementary can still revert to “AUG” by A-to-I editing during transcription, fig. S9, Supplementary Material online). Such diversity of making the switch from isoleucine to methionine (I/M) possi- NaCP60E proteins through A-to-I editing may contribute to ble without losing its original message in genome. In addition, fly nervous system. This is the first reported finding on the we found that 18 of the 87events(locatedin15genes) were editing pattern of NaCP60E transcripts. These results reveal nonsynonymous (supplementary data set S4,Supplementary that A-to-I editing, which occurs at the transcriptome level, Material online). Alteration of A-to-I editing has been shown also provides a wide range of diversity for the proteome, to affect the function of these 15 genes in diverse species and that individual editing sites may be developmentally (supplementary table S3, Supplementary Material online). regulated. The gene ontology (GO) analysis further revealed that the host genes of these highly conserved nonsynonymous editing sites were enriched in “synaptic transmission,” “synapse,” Discussion “channel activity,” “gated channel activity,” and “ion chan- In this study, we successfully identified high-confidence edit- nel activity” (supplementary table S6, Supplementary Material ing sites across diverse species by controlling for FDR and online), consistent with the functional effect of A-to-I editing %AG (which is determined by N ; the clustering strategy) cluster in previous reports (supplementary table S3, Supplementary or evolutionary conservation of editing in multiple species (the Material online). Of note, the host genes of Drosophila con- conservation strategy), without a priori knowledge of known served editing sites (i.e., D. melanogaster editing sites ob- SNPs or genome sequencing from the same sample. This served in at least one other Drosophila species; enables us to identify high-confidence A-to-I sites in both supplementary data set S2, Supplementary Material online) model and nonmodel animals, and thus to construct the first were also enriched in similar GO terms (supplementary table A-to-I editomes across 20 metazoan species. Of note, previ- S6, Supplementary Material online), indicating the evolution- ous studies have reported that the A/G pattern of human- ary convergence of editing targets despite substantial diver- chimpanzee coincident SNPs (the same A/G variants were gence between vertebrate and invertebrate. These results also present at human–chimpanzee orthologous positions) was reflect that a conserved enrichment of editing in the CNS over-represented (Hodgkinson et al. 2009; Chen et al. throughout more than 300 Myr of divergent evolution in 2016). We suggest that the effect of coincident SNPs on complex animals from primates to chickens (fig. 5D and sup- the editing sites identified by the conservation strategy is slight plementary figs. S3 and S4, Supplementary Material online). because only 0.01% (5 out of 38,480) of the human–chim- Intriguingly, of the 66,800 human A-to-I editing events iden- panzee conserved A-to-G editing sites are also observed to be tified by our conservation strategy (table 1 and supplementary A/G polymorphic at human–chimpanzee orthologous posi- data set S4, Supplementary Material online), 6, 4, and 192 tions on the basis of dbSNP (supplementary table S5, could cause stop codon losses, splice site alterations, and Supplementary Material online). Comparative analysis of the amino acid changes (i.e., nonsynonymous changes), respec- identified editing sites revealed that highly clustered and con- tively. It is worthwhile to further investigate the functional served editing sites tended to have a higher editing level and a importance of these evolutionarily conserved editing events. 534 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE For the spatial context of A-to-I editing, we found that mechanism in living cells, especially in the CNS and during editing events tended to exhibit tissue-specificity (particularly early development. This study thus sheds light on evolutionary for the CNS) (fig. 5A and B), and that global editing activity and dynamic aspects of A-to-I editome across vertebrates and and ADAR (ADAR1 and ADAR2) expression were higher in invertebrates, opening up this important but understudied the CNS than in the other tissues examined and positively class of nongenomically encoded events for comprehensive correlatedwitheachother (fig. 5C and D and supplementary characterization. fig. S3, Supplementary Material online). Importantly, these tendencies were consistently observed among amniotes Supplementary Material from primates to birds, representing the first documented Supplementary data are available at Genome Biology and spatial profiling for A-to-I editing across amniotes. Using Evolution online. PCA, we further showed that global editing activity of primate-conserved sites exhibited an organ-preferential clus- tering (fig. 5E), reflecting that these editing events tended to Acknowledgments be specific to a few organs (fig. 5A and B). On the other hand, We thank Chan-Shuo Wu, Yen-Zho Chen, and Yu-Ting Hsiao our results exhibited the dynamic and active nature of A-to-I for computational assistance, Prof. Hsien-Sung Huang for pro- editome during the development of invertebrates (worm and viding mouse brain tissue, Prof. Fu Huang for providing D. fly) and during the embryogenesis of vertebrates (zebrafish melanogaster sample, and Prof. Ying-Chih Chang for provid- and frog), respectively. We observed that shifting patterns of ing bench and related experimental facilities. This study was global A-to-I editing were precisely dependent on ADAR ex- supported by Genomics Research Center, Academia Sinica, pression (fig. 6A and B), echoing the developmental effects of Taiwan; and the Ministry of Science and Technology, Taiwan ADAR enzymes (Palladino et al. 2000; Jepson and Reenan (under the contracts NSC 102-2621-B-001-003 and MOST 2008; Horsch et al. 2011). Particularly, this (fig. 6D) and other 103-2628-B-001-001-MY4). studies also observed that individual editing sites can ex- hibit distinct editing patterns during development in di- verse species (Gurevich et al. 2002; Osenberg et al. 2010; Literature cited Yu et al. 2016). In addition to a possible explanation that Alon S, et al. 2015. The majority of transcripts in the squid nervous system shift in expression levels of edited genes may accompany are extensively recoded by A-to-I RNA editing. Elife 4:e05198. editing changes during development (Yu et al. 2016), we Bahn JH, et al. 2012. Accurate identification of A-to-I RNA editing in hu- man by transcriptome sequencing. Genome Res. 22(1):142–150. provided another possibility that individual editing sites Bass B, et al. 2012. The difficult calls in RNA editing. Interviewed by H Craig may reflect different levels of lagging as compared with Mak. Nat Biotechnol. 30(12):1207–1209. the fluctuation of ADAR expression (e.g., fig. 6D and E). Bass BL. 2002. RNA editing by adenosine deaminases that act on RNA. The joint effects of both possibilities may be the cause for Annu Rev Biochem. 71:817–846. diverse editing patterns of individual sites during develop- Bazak L, et al. 2014. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res. ment. Further investigation of individual targeted editing 24:365–376. sites will provide a fuller understanding of the developmen- Bazak L, Levanon EY, Eisenberg E. 2014. Genome-wide analysis of Alu tal role of A-to-I RNA editing. Moreover, we observed that editability. Nucleic Acids Res. 42(11):6876–6884. the editing patterns of C. elegans and fly were generally Burns CM, et al. 1997. Regulation of serotonin-2C receptor G-protein comparable with each other during development (fig. 6A). coupling by RNA editing. Nature 387(6630):303–308. Chen CY, Hung LY, Wu CS, Chuang TJ. 2016. Purifying selection shapes The comparability of editing pattern was also observed be- the coincident SNP distribution of primate coding sequences. Sci Rep. tween vertebrates (zebrafish and frog) during embryogen- 6:27272. esis (fig. 6B), regardless of where the editing sites were Chen JY, et al. 2014. RNA editome in rhesus macaque shaped by purifying located (fig. 6C and supplementary fig. S7, selection. PLoS Genet. 10(4):e1004274. Supplementary Material online). These results thus indicate Chen L. 2013. Characterization and comparison of human nuclear and cytosolic editomes. Proc Natl Acad Sci U S A. 110(29):E2741–E2747. that global editing activity principally reflects ADAR expres- Danecek P, et al. 2012. High levels of RNA-editing site conservation sion inherited from their common ancestors. amongst 15 laboratory mouse strains. Genome Biol. 13(4):26. In summary, our comparative analysis provides the first A- Eggington JM, Greene T, Bass BL. 2011. Predicting sites of ADAR editing in to-I editomes across 20 diverse species, from worm to human, double-stranded RNA. Nat Commun. 2:319. and reconstructs an evolutionary landscape of editomes, Feng Zhang YL, Yan S, Xing Q, Tian W. 2017. SPRINT: a SNP-free toolkit for identifying RNA editing sites. Bioinformatics 33:3538–3548. which might serve as a valuable resource for understanding Gurevich I, et al. 2002. Altered editing of serotonin 2C receptor pre-mRNA the co-evolution of the editing machinery (i.e., TEs, editomes, in the prefrontal cortex of depressed suicide victims. Neuron and editors). The spatiotemporal atlas provides unprece- 34(3):349–356. dented resolution of editing dynamics, and reveals conserved Han L, et al. 2015. The genomic landscape and clinical relevance of A-to-I patterns of editing, which might be a key regulatory RNA editing in human cancers. Cancer Cell 28(4):515–528. Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 535 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Hung et al. GBE Hodgkinson A, Ladoukakis E, Eyre-Walker A. 2009. Cryptic variation in the Osenberg S, et al. 2010. Alu sequences in undifferentiated human embry- human mutation rate. PLoS Biol. 7(2):e1000027. onic stem cells display high levels of A-to-I RNA editing. PLoS ONE Hoopengardner B, Bhalla T, Staber C, Reenan R. 2003. Nervous system 5(6):e11173. targets of RNA editing identified by comparative genomics. Science Palladino MJ, Keegan LP, O’Connell MA, Reenan RA. 2000. A-to-I pre- 301(5634):832–836. mRNA editing in Drosophila is primarily involved in adult nervous sys- Horsch M, et al. 2011. Requirement of the RNA-editing enzyme tem function and integrity. Cell 102(4):437–449. ADAR2 for normal physiology in mice. J Biol Chem. Paz-Yaacov N, et al. 2010. Adenosine-to-inosine RNA editing shapes tran- 286(21):18614–18622. scriptome diversity in primates. Proc Natl Acad Sci U S A. Hwang T, et al. 2016. Dynamic regulation of RNA editing in human brain 107(27):12174–12179. development and disease. Nat Neurosci. 19(8):1093–1099. Peng Z, et al. 2012. Comprehensive analysis of RNA-Seq data reveals ex- Jepson JE, Reenan RA. 2008. RNA editing in regulating gene expression in tensive RNA editing in a human transcriptome. Nat Biotechnol. the brain. Biochim Biophys Acta 1779(8):459–470. 30(3):253–260. John D, Weirick T, Dimmeler S, Uchida S. 2017. RNAEditor: easy detection Picardi E, D’Erchia AM, Lo Giudice C, Pesole G. 2017. REDIportal: a com- of RNA editing events and the introduction of editing islands. Brief prehensive database of A-to-I RNA editing events in humans. Nucleic Bioinform. 18:993–1001. Acids Res. 45(D1):D750–D757. Kawahara Y, et al. 2007. Redirection of silencing targets by Picardi E, et al. 2015. Profiling RNA editing in human tissues: towards the adenosine-to-inosine editing of miRNAs. Science inosinome Atlas. Sci Rep. 5:14941. 315(5815):1137–1140. Pickrell JK, Gilad Y, Pritchard JK. 2012. Comment on “Widespread RNA Kim DD, et al. 2004. Widespread RNA editing of embedded alu elements and DNA sequence differences in the human transcriptome”. Science in the human transcriptome. Genome Res. 14(9):1719–1725. 335(6074):1302; author reply 1302. Kimelman D, Kirschner MW. 1989. An antisense mRNA directs the cova- Pinto Y, Cohen HY, Levanon EY. 2014. Mammalian conserved ADAR lent modification of the transcript encoding fibroblast growth-factor in targets comprise only a small fragment of the human editosome. Xenopus oocytes. Cell 59(4):687–696. Genome Biol. 15(1):R5. Kiran A, Baranov PV. 2010. DARNED: a DAtabase of RNa EDiting in Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. 2010. Detection of non- humans. Bioinformatics 26(14):1772–1776. neutral substitution rates on mammalian phylogenies. Genome Res. Kleinman CL, Majewski J. 2012. Comment on “Widespread RNA and 20:110–121. DNA sequence differences in the human transcriptome”. Science Porath HT, Carmi S, Levanon EY. 2014. A genome-wide map of hyper- 335(6074):1302; author reply 1302. edited RNA reveals numerous new sites. Nat Commun. 5:4726. Kulkarni NH, Yamamoto AH, Robinson KO, Mackay TF, Anholt RR. 2002. Ramaswami G, et al. 2012. Accurate identification of human Alu and non- The DSC1 channel, encoded by the smi60E locus, contributes to odor- Alu RNA editing sites. Nat Methods 9(6):579–581. guided behavior in Drosophila melanogaster. Genetics Ramaswami G, et al. 2013. Identifying RNA editing sites using RNA se- 161(4):1507–1516. quencing data alone. Nat Methods 10(2):128–132. Lehmann KA, Bass BL. 2000. Double-stranded RNA adenosine deaminases Ramaswami G, Li JB. 2014. RADAR: a rigorously annotated database ADAR1 and ADAR2 have overlapping specificities. Biochemistry of A-to-I RNA editing. Nucleic Acids Res. 42(Database 39(42):12875–12884. issue):D109–D113. Lev-Maor G, et al. 2007. RNA-editing-mediated exon evolution. Genome Rueter SM, Dawson TR, Emeson RB. 1999. Regulation of alternative splic- Biol. 8(2):R29. ing by RNA editing. Nature 399(6731):75–80. Levanon EY, et al. 2004. Systematic identification of abundant A-to-I Sapiro AL, Deng P, Zhang R, Li JB. 2015. Cis regulatory effects on A-to-I editing sites in the human transcriptome. Nat Biotechnol. RNA editing in related Drosophila species. Cell Rep. 11(5):697–703. 22(8):1001–1005. Savva YA, Reenan RA. 2014. Identification of evolutionarily meaningful Li H, Durbin R. 2009. Fast and accurate short read alignment with information within the mammalian RNA editing landscape. Genome Burrows-Wheeler transform. Bioinformatics Biol. 15(1):103. 25(14):1754–1760. Schroeder A, et al. 2006. The RIN: an RNA integrity number for assigning Li H, et al. 2009. The sequence alignment/map format and SAMtools. integrity values to RNA measurements. BMC Mol Biol. 7:3. Bioinformatics 25(16):2078–2079. Shtrichman R, et al. 2012. Altered A-to-I RNA editing in human embryo- Li H, Ruan J, Durbin R. 2008. Mapping short DNA sequencing reads and genesis. PLoS ONE 7(7):e41576. calling variants using mapping quality scores. Genome Res. Stefl R, et al. 2010. The solution structure of the ADAR2 dsRBM-RNA 18(11):1851–1858. complex reveals a sequence-specific readout of the minor groove. Lin W, Piskol R, Tan MH, Li JB. 2012. Comment on “Widespread RNA and Cell 143(2):225–237. DNA sequence differences in the human transcriptome”. Science Sun Y, et al. 2016. RED: a Java-MySQL software for identifying and visu- 335(6074):1302; author reply 1302. alizing RNA editing sites using rule-based and statistical filters. PLoS Liscovitch-Brauer N, et al. 2017. Trade-off between transcriptome ONE 11(3):e0150465. plasticity and genome evolution in cephalopods. Cell Tian N, et al. 2011. A structural determinant required for RNA editing. 169(2):191–202 e111. Nucleic Acids Res. 39(13):5669–5681. Maas S, Kawahara Y, Tamburro KM, Nishikura K. 2006. A-to-I RNA editing Tomaselli S, et al. 2015. Modulation of microRNA editing, expression and human disease. RNA Biol. 3(1):1–9. and processing by ADAR2 deaminase in glioblastoma. Genome Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping Biol. 16:5. and quantifying mammalian transcriptomes by RNA-Seq. Nat Tonkin LA, et al. 2002. RNA editing by ADARs is important for normal Methods 5(7):621–628. behavior in Caenorhabditis elegans. Embo J. 21(22):6025–6035. Nakamura K, et al. 2011. Sequence-specific error profile of Illumina Ulbricht RJ, Emeson RB. 2014. One hundred million adenosine-to-inosine sequencers. Nucleic Acids Res. 39(13):e90. RNA editing sites: hearing through the noise. Bioessays Nishikura K. 2006. Editor meets silencer: crosstalk between RNA 36(8):730–735. editing and RNA interference. Nat Rev Mol Cell Biol. Veno MT, et al. 2012. Spatio-temporal regulation of ADAR editing during 7(12):919–931. development in porcine neural tissues. RNA Biol. 9:1054–1065. 536 Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Evolutionary Landscape of A-to-I RNA Editome GBE Wagner RW, Smith JE, Cooperman BS, Nishikura K. 1989. A double- Xu G, Zhang J. 2015. In search of beneficial coding RNA editing. Mol Biol stranded RNA unwinding activity introduces structural altera- Evol. 32(2):536–541. tions by means of adenosine to inosine conversions in mamma- Yu Y, et al. 2016. The landscape of A-to-I RNA editome is shaped by both lian cells and Xenopus eggs. Proc Natl Acad Sci U S A. positive and purifying selection. PLoS Genet. 12(7):e1006191. 86(8):2647–2651. Zaranek AW, Levanon EY, Zecharia T, Clegg T, Church GM. 2010. A Wahlstedt H, Daniel C, Enstero M, Ohman M. 2009. Large-scale mRNA survey of genomic traces reveals a common sequencing error, RNA sequencing determines global regulation of RNA editing during brain editing, and DNA editing. PLoS Genet. 6(5):e1000954. development. Genome Res. 19(6):978–986. Zhang Q, Xiao X. 2015. Genome sequence-independent identification of Wang Q, Khillan J, Gadue P, Nishikura K. 2000. Requirement of the RNA RNA editing sites. Nat Methods 12(4):347–350. editing deaminase ADAR1 gene for embryonic erythropoiesis. Science 290(5497):1765–1768. Xu G, Zhang J. 2014. Human coding RNA editing is generally nonadaptive. Proc Natl Acad Sci U S A. 111(10):3769–3774. Associate editor: Wen-Hsiung Li Genome Biol. Evol. 10(2):521–537 doi:10.1093/gbe/evx277 Advance Access publication December 26, 2017 537 Downloaded from https://academic.oup.com/gbe/article-abstract/10/2/521/4774976 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Journal

Genome Biology and EvolutionOxford University Press

Published: Feb 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off