Understanding the functional impact of genomic variants is a major goal of modern genetics and personalized medicine. Although many synonymous and non-coding variants act through altering the efficiency of pre-mRNA splicing, it is difficult to predict how these variants impact pre-mRNA splicing. Here, we describe a massively parallel approach we use to test the impact on pre-mRNA splicing of 2059 human genetic variants spanning 110 alternative exons. This method, called variant exon sequencing (Vex-seq), yields data that reinforce known mechanisms of pre-mRNA splicing, identifies variants that impact pre-mRNA splicing, and will be useful for increasing our understanding of genome function. Background into the mechanisms of splicing, but also is important to One of the main goals of personalized medicine is to understand the basis of certain genetic diseases. understand how genetic variations between individuals Identifying variants that impact splicing regulatory ele- impact health. Genetic variants can impact health in a ments and their splicing consequences are difficult to number of different ways, one of which is through alter- detect using conventional poly(A)+ RNA-seq alone be- ing pre-mRNA splicing efficiency. Alternative splicing is cause the variants are often spliced out of the mature a process that is important for regulatory function and a mRNA. A number of different studies have aimed to ad- primary source of proteome diversity in humans . dress this issue. One approach has been the pursuit of Perturbations in splicing have also been shown to con- deciphering the “splicing code” using computational tribute to a number of different diseases [2, 3]. These techniques such as deep learning [5–7]. While these splicing changes can manifest themselves through inter- studies have yielded useful knowledge about splicing and rupting well-known interactions between the spliceo- do have predictive power, experimental confirmation of some and splicing elements, including the 3′ and 5′ the behavior of these variants has been limited and the splice sites, pyrimidine tract, or branchpoint sequences. predictions are not perfect. Other groups have pursued However, splicing can also be perturbed by disrupting the use of random sequences to understand the splicing other sequences known to modulate splicing. Exonic code; however, it is hard to integrate datasets with splicing enhancers and silencers (ESEs and ESSs), as well contextual transcriptome information (i.e., CLIP) when as intronic splicing enhancers and silencers (ISEs and studying the splicing behavior of random sequences . ISSs), are examples of splicing regulatory elements that A more recent study tested a number of exonic can be perturbed and result in different splicing out- disease-associated variants in parallel using a mini-gene comes. Modulation of these splicing regulatory elements system . The approach was to observe the allelic ratio has been shown to be disease associated (for a review of reference to variant in a plasmid pool, and compare see ). Thus, understanding how both intronic and ex- with the ratios observed from splicing outcomes. This onic variants impact splicing not only provides insights approach is useful for studying exonic variants but is un- able to test intronic variants. Here we present a method that address some of these shortcomings using a barcod- * Correspondence: email@example.com ing approach called Variant exon sequencing (Vex-seq). Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Adamson et al. Genome Biology (2018) 19:71 Page 2 of 12 Vex-seq is capable of testing many exonic and flanking serves as an identifier of which exon was present in the intronic variants for the same exon simultaneously. reporter so that it is possible to associate the pre-mRNA of origin in cases where the test exon was skipped. Results We first designed a pool of 2059 variants spanning We set out to develop a high-throughput reporter sys- 110 exons with reference, consensus splice site, and mu- tem to determine the impact of genomic variants on tated splice site control sequences for each exon. To en- pre-mRNA splicing. Our general approach is to generate sure reproducibility, each variant exon was associated a library of test exons flanked by two common constitu- with at least three unique eight-nucleotide barcodes. tive exons (Fig. 1). The library was introduced into tissue Common primer sequences and restriction enzyme sites culture cells followed by RT-PCR and sequencing to de- were also added for proper library construction. We in- termine the splicing frequency of the test exon. Import- cluded a minimum of 50 bases of the upstream intron, antly, the reporters also contain a barcode sequence that which should be adequate to include the majority of Fig. 1 Assembly of test exon and experimental design. a The test exon and flanking introns are subcloned into a reporter plasmid in a two-step process, such that the barcode designating the sequence is near the end of the transcript. Once these plasmids are transfected into cultured cells, a transcript will be produced that may not contain the variant itself, but does contain the barcode (b) uniquely associated with the variant tested. A ten-nucleotide UMI (N10) is attached during the reverse transcription step to collapse PCR duplicates downstream. Illumina flow cell binding sequences (FC) and indexes (I1 and I2) are attached via primers during PCR and the resulting DNA is sequenced on a MiSeq platform. b Data analysis pipeline for splicing results Adamson et al. Genome Biology (2018) 19:71 Page 3 of 12 branchpoints , as well as the exon itself and at least The data analysis pipeline uses custom python scripts 20 bases of downstream intron. This allowed for con- to ensure that read 2 contains the third exon, the correct struction of test exons up to 97 nucleotides in length. restriction site next to the barcode, and sorts the reads Alternative exons between the size of 68 and 97 nucleo- by barcode into bins. PCR replicate reads are collapsed tides were randomly selected from Ensembl into a single read using the UMI from the reverse tran- GRCh37.p13 annotations and variants from the ExAC scription primer. The reads in each bin are then aligned database were intersected with the selected exons and using STAR to a reference specific to each variant . their flanking intron sequences . Percent spliced in (PSI or Ψ) and change in PSI (ΔΨ) We amplified the oligonucleotide pool by PCR from the reference sequence are then calculated (Fig. 1b). (Additional file 1: Table S1). This product was then sub- The amplicon based paired-end sequencing reads con- cloned into a modified version of the splicing reporter tain an unambiguous splicing outcome for each ampli- plasmid pcAT7-Glo1 in between the first intron and the con, making Ψ and ΔΨ calculations straightforward 3′ UTR to generate a 1˚ library. Then restriction sites in from the alignment outputs alone. between the barcode and the end of the test sequence To assess how similarly the barcodes associated with were used to subclone in the second part of the second the same variant impacted the splicing behavior, we intron and third exon from the original plasmid (Fig. 1a). compared the Ψ value of the barcode replicates for each This results in a plasmid that encodes a transcript con- variant and reference exon and observed high correla- taining the first exon and part of the first intron of the glo- tions (Fig. 3a). To ensure that these splicing values were bin gene, the test sequence, followed by the second intron robust to biological variation, we also examined the cor- and final exon of the reporter transcript, ending with the relations of variants between three biological replicates barcode and the 3′ UTR. We refer to this final library pool for HepG2 and K562 cell lines (Fig. 3b). HepG2 and as the 2˚ library. K562 cell lines were chosen for these studies because of In order to ensure that the variants are associated with the wealth of potentially applicable ENCODE data asso- the correct barcode, the 1˚ and 2˚ libraries were se- ciated with these cell lines and because they represent quenced using paired end amplicon sequencing (Fig. 2a). difference cell types and different trans-environments The results from sequencing the 1˚ library show that the for splicing. These data show similarly high correlation majority of barcodes are correctly associated with the values between biological replicates of the same cell correct variant (Fig. 2b). Barcodes excluded from the lines, showing robustness to biological variation. These analysis due to having too few (less than 85%) of the cor- data also show that the splicing data from Vex-seq is ro- rect variant reads associated with it only make up about bust to both technical and biological variation. 1.8% of the barcodes tested. Barcodes that were filtered In order to ensure that splicing behavior reflects what out of the analysis also tended to have a lower read is known about splicing, we examined the Ψ of the mu- depth, suggesting that this may be related to the reason tated and consensus splice site controls relative to refer- for their higher error rate (Fig. 2c). We are also able to ence and variant exons. For the mutated splice site measure a misassignment rate of 4.59% using this plas- controls, both splice sites were mutated such that the 3′ mid sequencing technique. In order to ensure that the li- splice sites were changed from AG to TC and the 5′ brary contained a good diversity of sequences, we splice sites were changed from GT to CA. For the con- calculated a skew ratio between the 10th and 90th per- sensus splice site controls, the variants contained a centile of read depth for each barcode as done previ- 20-nucleotide pyrimidine tract, an AG at the 3′ splice ously to check library diversity . A skew ratio for the site, and a consensus 5′ splice site of GTAAGT. The library established was calculated to be 5.5, which is con- consensus and mutated splice site controls behave as ex- sidered adequate. We conclude that despite some misas- pected with the mutated splice site controls displaying a signment, the plasmid pool is robust enough to be used low Ψ value and consensus splice sites having a higher to study variant changes in splicing. Ψ value, while the variant and reference sequences are The 2˚ library was then transfected into K562 and intermediate (Fig. 4). These are consistent with the ex- HepG2 cell lines in biological triplicate. cDNA was pected splicing behaviors for these control sequences. then synthesized from the RNA isolated from the cells Given the high correlation rates of Ψ values between usingamini-genespecificprimer, aten-nucleotide the biological replicates of different cell lines, we sought random sequence which serves as a unique molecular to characterize this similarity further. Indeed, upon identifier (UMI) and an Illumina Read 2 sequencing examining the correlation between the average Ψ value primer. PCR amplified the cDNA to attach the other of each cell line, we observe a similar pattern (Fig. 5a). necessary sequences for Illumina paired-end sequen- Additionally, when looking at changes in splicing (ΔΨ) cing. The products were then sequenced on an Illu- we see a similar trend (Fig. 5b). In fact, 76.45% of vari- mina MiSeq. ants agree in directionality of ΔΨ between cell lines. Adamson et al. Genome Biology (2018) 19:71 Page 4 of 12 Fig. 2 Quality control for plasmid integrity. a Quality control pipeline for plasmid integrity. Amplicon sequencing of the 1o and 2o plasmid configurations are done through PCR to attach Illumina flow cell binding sequences (FC) and indexes (I1 and I2). Poor quality barcodes are then filtered out by identification of reads not containing variants and excluding barcodes with less than 85% of reads containing the correct variant. b A histogram of the barcodes with the percentage of reads with correct variant identified. c Box plots of 1˚ library read depth for barcodes included and excluded from further analysis Adamson et al. Genome Biology (2018) 19:71 Page 5 of 12 Fig. 3 Behavior and reproducibility of splicing outcomes. a Scatter plots showing the behavior of Ψ for each barcode replicate of the same variant. These were averaged Ψ values of the barcode in each biological replicate. Spearman (s) and Pearson (p) correlations of Ψ are also shown. b Scatter plots showing the splicing behavior or Ψ for each variant in each biological replicate. Spearman (s) and Pearson (p) correlations of Ψ are also shown Furthermore, restricting this analysis to only include var- in splicing studied in Vex-seq are cell type independent. iants that have a ∣ΔΨ∣ >5 or ∣ΔΨ∣ > 10 in HepG2, the However, there are exceptions to this trend, which could agreement in directionality increases to 92.61% and be due to the trans environment of each cell line or 97.49%, respectively (Fig. 5b). This shows that although noise in the data. the Ψ of the reference exons can differ between cell We next examined the changes in splicing efficiency lines, the directionality of most variant-induced changes (or ΔΨ) for each variant. Changes in splicing can be ob- served in many different positions relative to each of the splice sites; however, perturbations in the 3′ and 5′ splice sites typically result in a dramatic reduction of ΔΨ (Fig. 6). Many outliers can be observed changing ΔΨ negatively upstream of the 3′ splice site, which may cor- respond to changes in the pyrimidine tract or the branchpoint sequences. However, variants in core spli- cing sequences alone do not account for the full diver- sity of splicing variation observed from these data. Evidence of potential ESS and ESE regulatory elements can be observed within the exon, as variants in the exon are capable of inducing strong ΔΨ changes in either dir- ection. We examined the role of ESEs and ESSs using ESEseq, a dataset which contains information about hexamers and their exonic splicing regulatory effect Fig. 4 Splice site control sequences generally reflect expected (ESEseq) . We find that variants which gain ESEs or splicing behavior. Boxplots of mean Ψ for each type of control and lose ESSs, as measured by change in ESEseq score, gen- test sequences are shown. Mutated splice site controls contained erally show an increase in ΔΨ, while ESS gain and ESE mutated splice sites such that the 3′ splice sites were changed from AG to TC and the 5′ splice sites were changed from GT to CA. For loss generally show a decrease (Fig. 7a). When examin- the consensus splice site controls, the variants contained a 20-nucleotide ing changes in hexamer composition and the relation- pyrimidine tract, an AG at the 3′ splice site, and a consensus 5′ splice site ship of each hexamer with average changes in ΔΨ,we of GTAAGT observe a weak but significant Spearman correlation Adamson et al. Genome Biology (2018) 19:71 Page 6 of 12 Fig. 5 Similar behavior of splicing between K562 and HepG2 cell lines. a Correlation between Ψ values for each variant between K562 and HepG2 cell lines. b Correlation between ΔΨ values between K562 and HepG2 cell lines. Color coding highlights variants in which the HepG2 ΔΨ changes at different thresholds. Spearman (s) and Pearson (p) correlations are also displayed on each plot with the ESEseq dataset (Fig. 7b). The weakness of this cor- for 3′ splice site strength, we examined whether these relation is probably because the variation in our dataset might disrupt branchpoint sequences. To do this, we was not designed to explore vast hexamer space, and many used branchpointer, a machine learning program, to hexamer rankings are computed from few data points. predict branchpoint probabilities of the reference and The impact of splice site strength has been well char- variant branchpoints . Surprisingly, the majority (53 acterized and is known to impact splicing efficiency . out of 84) of variants impacting splicing in this region The impacts of changes on maximum entropy of 3′ and were not predicted to impact maximum branchpoint 5′ splice sites calculated by MaxEntScan can be visual- usage probability. The variants that do affect branch- ized in Fig. 7c, d . Intronic splicing regulatory point probability did not show any significant correlation elements have been typically studied downstream of the with ΔΨ. We also did not identify any association be- exon in question, which are outside or on the periphery tween changes in RNA secondary structure around the of the context that Vex-seq currently has the capacity to splice sites and changes in splicing, which have been study [16, 17]. However, we do still observe intronic var- previously reported . iants changing splicing (particularly upstream of the 3′ To further characterize variants being studied and splice site) that are outside of the conventionally how they impact splicing we looked at other features, in- measured effects of changes in 3′ and 5′ splice site cluding effect predictions. Using Variant Effect Predictor strength. As many of the variants that do impact splicing (VEP)  annotations we characterized the variants are upsteam of the exon, yet outside the window studied and their impact on ΔΨ (Fig. 8). Annotations were Fig. 6 Distribution of variants tested and their impacts relative to splice sites. ΔΨ from both K562 and HepG2 cell lines is plotted for all variants relative to 3′ and 5′ splice sites. Fifty bases of upstream intron, 33 bases of exon proximal to the splice sites and 20 bases of downstream intron are shown. Above is a histogram showing the number of observations at each position Adamson et al. Genome Biology (2018) 19:71 Page 7 of 12 ab cd Fig. 7 Analysis of potential mechanisms underlying splicing changes. a Violin plots showing how the directionality of a change in ESEseq score associates with splicing changes. P value is calculated using Mann-Whitney U-test. b Scatter plot demonstrating the relationship between the ESEseq score of each hexamer and the average ΔΨ of variance gaining (adding ΔΨ) or losing (subtracting ΔΨ) that hexamer. c Scatter plot showing the positive correlation between changes in 3′ splice site maximum entropy and ΔΨ. d Scatter plot showing the positive correlation between changes in 5′ splice site maximum entropy and ΔΨ. Spearman correlation coefficients and spearman correlation p values are shown in c and d selected based on the first reported annotation from impacts on splicing were conserved . We observed VEP. The 5′ and 3′ splice site variants have the biggest that there is significantly more conservation in variants negative impact on ΔΨ as expected. Intron, missense, which tend to have a high impact on splicing (|ΔΨ| ≥ 5) synonymous, and splice region variants can also have a compared to variants with a low impact on splicing wide range of impacts on ΔΨ. This is consistent with (|ΔΨ| < 5) (Fig. 9a). Much of the conservation observed previous findings about how missense and synonymous is likely due to protein coding constraints on sequences, variants can change splicing inclusion levels . It which may add noise to this signal. To investigate should also be noted that splice region variants alone do whether this splicing-sensitive conservation is stronger not account for many of the variants which changed in variants without protein changing potential, we exam- splicing, consistent with the difficulty of predicting the ined the same trend in variants without protein coding impact of these variants based on impact annotations. constraints (intron, synonymous, UTR, and splice region We were also interested in examining whether the var- variants), and we observed a more significant difference iants that displayed the largest ΔΨ were more or less (Fig. 9b). Additionally, when we focus on synonymous conserved than variants that had little impact on ΔΨ. variants only, the effect is much clearer, even with a We used 100-way vertebrate conservation scores from smaller sample size (Fig. 9c). Intron variants seem to PhyloP to examine how variants with strong or weak show the same trend of higher conservation with higher Adamson et al. Genome Biology (2018) 19:71 Page 8 of 12 (NMD), which may be different from NMD in the en- dogenous context. To investigate the role of NMD in this assay, we used a UPF1-targeted shRNA knock- down to attenuate NMD in the K562 cell line and performed the Vex-seq in this new context . UPF1 protein was depleted 63% (Fig. 10a). To identify transcripts that may be sensitive to NMD, we performed a differential splicing analysis using rMATS-STAT . As expected, most transcripts that are significantly changing display increased exon in- clusion (Fig. 10b). NMD is known to act through pre- mature termination codons (PTCs), which can be predicted based on the presence of a stop codon 50 nucleotides before the last exon junction in the tran- Fig. 8 Variants classified by effect prediction and their impact on script . While not all test exons with PTCs have a ΔΨ. Splice region classified by VEP is defined as being within one to significant increase in splicing upon UPF1 depletion, three bases of the proximal exon, or three to eight bases of the most (95/151) significantly changing (p ≤ 0.01) test proximal introns. The splice donor and acceptor annotations strictly exons have a predicted PTC in the context of refer to the dinucleotides downstream and upstream of the exon, respectively. The first reported annotation by VEP is displayed Vex-seq. This allows us to identify transcripts which are NMD-sensitive in this experimental system, but not necessarily in the endogenous context (Fig. 10c). To characterize the effect of NMD in our assay, and |ΔΨ|; however, it is a milder effect (Fig. 9d). This sug- how it would relate to the wild-type situation, we gests that ESEs and ESSs modulated by these variants used linear regression to predict the effect of NMD are more conserved, while intronic regulatory regions in on transcripts that would be endogenously subject to the window we are testing are relatively more flexible. NMD, but may not necessarily be in Vex-seq. This Perhaps this weaker conservation signal is because ISSs model uses the UPF1 knockdown Ψ and the presence and ISEs are not constrained by the context of the of a PTC as input to predict the shScrambled Ψ protein frame, and may be able to move around in linear (mean squared error (MSE) = 69.81) and performs bet- space within the intron and still be effective in influen- ter than a model without PTC input (MSE = 80.22) cing splicing. on a 1/3 held out test set. The predicted effect of en- One confounding variable for particular test exons dogenous NMD on ΔΨ of stop gained and frameshift in the context of Vex-seq is nonsense-mediated decay variants is showninFig. 10d. cd ab Fig. 9 Conservation of variants with strong splicing impacts. a Boxplots showing the relationship of PhyloP and magnitude of ΔΨ for all variants. b Boxplots showing the relationship of PhyloP and magnitude of ΔΨ for variants without predicted protein coding annotations. c Boxplots showing the relationship of PhyloP and magnitude ΔΨ for synonymous variants. d Boxplots showing the relationship of PhyloP and magnitude of ΔΨ for intron variants. P values are calculated with the Mann-Whitney U-test. All variant effect predictions were performed by VEP and were classified by the first reported annotation Adamson et al. Genome Biology (2018) 19:71 Page 9 of 12 UPF1 GAPDH Fig. 10 The impact of NMD on Vex-seq splicing results. a Western blot from Wes showing UPF1 knockdown in K562 cells. b A volcano plot showing the significance (assessed by rMATS-STAT) and change in splicing of shUPF1 cells compared to a scrambled control. c A scatter plot showing the behavior of ΔΨ of test exons in which there is a significant difference in the shUPF1 cells compared to the shScrambled cells. The color coding highlights test exons in which a significant difference in Ψ was identified. d Violin plots showing the predicted impact of NMD on variants which would be subject to NMD endogenously, but not in Vex-seq Discussion and conclusions associations are already known. This assay is also able to We have developed a new assay to assess how variants test designed intronic variation which other recent can impact pre-mRNA splicing efficiency called Vex-seq. methods have been unable to do, until very recently [9, This method builds upon previous high-throughput spli- 26]. Vex-seq is even able to account for the impacts that cing reporter assays. It utilizes a barcoding approach and variants may have on transcription of reporter tran- designed sequences based on the transcriptome and gen- scripts because of the barcoding approach. Vex-seq etic variants. Vex-seq’s approach of using designed se- could be applied to a number of different applications, quences allows for the possibility of not deeply including fine mapping of GWAS variants that may be sequencing the plasmid pool, because barcode variant involved in splicing regulation, which has been shown to Ladder shScrambled shUPF1 Adamson et al. Genome Biology (2018) 19:71 Page 10 of 12 be linked to complex diseases . Additionally, this the variants are otherwise not predicted to change could be used to dissect the behavior of RNA binding protein products. proteins and their effect on splicing regulation, or even saturating mutagenesis of exons known to be important Methods for diseases. Thus, Vex-seq has the potential to have an Plasmid alterations extremely high impact on our understanding of genome pcAT7-Glo1 was provided by Kristen Lynch. To elimin- function and how non-coding sequence variants impact ate a splice acceptor site in the middle of intron 1, a de- pre-mRNA splicing. letion of the pyrimidine tract and splice acceptor While Vex-seq offers certain advantages over current sequence was deleted. This was done through digestion methods, there remain some obstacles with all of these of the vector with AflII and PstI and PCR amplifying an splice reporter approaches [8, 9]. First, these massively insert using two primers (FWD 5′-AAACTCTTAAGCT parallel splicing assays lack the context of the entire AATACGACTCACTATAGG-3′, REV 5′- GACTGAAT gene and chromatin state of the native genes. Second, GAGTCTGCAGAGGCAGAGAGAGTCAGTGG-3′). The these assays have limitations in terms of barcode design insert digested with AflII and PstI was ligated in the vector and synthesis length constraints and also may have cryp- digested with the same enzymes, resulting in the plasmid tic splice sites formed in the context of the mini-gene. used for these studies. As oligonucleotide synthesis technologies improve, more context can be added to exons tested in this way. With Assembly of Vex-seq plasmid more context, we expect Vex-seq to be more accurate at The oligo pool (Additional file 1: Table S1) was amplified identifying variants that impact splicing. with a common primer set (FWD 5′- GTAGCGTCT Despite only examining 110 alternative exons in this GTCCGTCTGCA-3′; REV 5′-CTGTAGTAGTAGTT study, we are able to obtain some biological insights GTCTAG-3′) for 20 cycles, then digested with PstI and from these data. The first is the similarity between the XbaI. These were subcloned into the modified splicing behavior of K562 and HepG2 cell lines. Al- pcAT7-Glo1 also using PstI and XbaI sites. The resulting though the precise Ψ of each exon variant is not neces- plasmid pool, referred to as 1˚, was then digested with sarily identical between the two cell lines, the SpeI and MfeI. Exon 3 and intron 2 were PCR amplified directionality of the ΔΨ induced by most variants is from the original plasmid with primers (FWD 5′-GTGT quite similar in each cell line (Fig. 5b). This may suggest GGAAGTCTCAGGATCG-3′, REV 5′-AACGGGCCC that most variants tested in this context are acting upon TCTAGAGC-3′) and digested with MfeI and XbaI. The splicing elements common across these cell lines. Of resulting product was subcloned into the digested 1˚ course there are exceptions to this behavior, which may vector resulting in the final plasmid pool (2˚). mechanistically be related to the unique trans factors of each cell line or noise in the data. This observation may change when analyzing splicing changes in response to Transfection and cell culture stimuli or in the context of a cell with more complex HepG2 cells were grown to a density of 0.5 × 10 cells transcriptome regulation. Alternatively, this may suggest per well and transfected with 1 μg of plasmid DNA that regulatory factors important for cell type-specific using Lipofectamine 2000. Transfected HepG2 cells were splicing are generally outside of the window that we are then selected with 1 mg/mL zeocin for 8 days. K562 cells testing in Vex-seq. The predictive power of conserved were grown to a density of 1.0 × 10 cells per well and intronic splicing regulatory elements on Ψ generally be- electroporated with 5 μg of plasmid DNA. Transfected ing within 100 nucleotides upstream and downstream K562 cells were then selected with 200 μg/mL of zeocin may suggest that this is the case . We have also been for 8 days. RNA from each cell line was isolated using able to use this assay after UPF1 depletion to account Maxwell® 16 LEV simplyRNA Purification kits. for NMD as an experimental artifact, but also use it to UPF1 knockdown experiments were performed by predict the impact of NMD on variants which would transducing K562 cells with shRNA TRCN0000022254 cause NMD endogenously. (TRC collection), hairpin sequence (5′-CCGG-GCAT Data obtained from Vex-seq demonstrate the im- CTTATTCTGGGTAATAA-CTCGAG-TTATTACCCAG portance of variants on impacting pre-mRNA splicing AATAAGATGC-TTTTT-3′). A scrambled shRNA (SHC002 efficiency. It shows that variant effect prediction, Sigma-Aldrich; 5′-CCGG-CAACAAGATGAAGAGCA while useful for predicting protein changing variants, CCAA-CTCGAG-TTGGTGCTCTTCATCTTGTTG-TT is insufficient to predict all splicing changes induced TTT-3′) was used as a non-specific control. Trans- by variants. We also show that variants that tend to fected cells were selected with puromycin for 5 days change splicing more are also generally more con- followed by transfection with the Vex-seq plasmid li- served than nucleotides that do not, particularly when brary. Cells were then harvested after 24 h and RNA Adamson et al. Genome Biology (2018) 19:71 Page 11 of 12 was collected as above. Protein was isolated and west- collapsed into a single read by the UMI. Reads were then ern blotting performed using Wes. aligned to a variant-specific reference using STAR version 2.5.2b . The uniquely aligned annotated read junctions Sequencing preparation were identified and Ψ and ΔΨ were calculated. Reads Sequencing for the 1˚ library was constructed using a which spanned unannotated splice junctions were dis- nested PCR reaction. The 1˚ library was amplified for carded for calculating Ψ and ΔΨ. Ψ values for analysis, un- 15 cycles using the following primers: FWD 5′-ACAC less otherwise indicated, werethemean oftheK562and TCTTTCCCTACACGACGCTCTTCCGATCTCCACTG HepG2 Ψ values. Mutated and consensus splice site con- ACTCTCTCTGCCTC-3′;REV 5′-GTGACTGGAGTTC trols were removed for most analyses with the exceptions AGACGTGTGCTCTTCCGATCTAGCGGGTTTAAACG of Figs. 2 and 4. Annotations for each variant were done GGCCCT-3′.The 2˚ library was amplified for 15 cycles using the Ensembl Variant Effect Predictor tool using as- using the following primers: FWD 5′- ACACTCTTT sembly GRCh37.p13 and the Ensembl transcript database CCCTACACGACGCTCTTCCGATCTAGCAGCTAAA . The variants used in the analysis were selected based TCCAGCTACCA-3′;REV 5′-GTGACTGGAGTTCA on the first annotation output by VEP. We used 100-way GACGTGTGCTCTTCCGATCTAGCGGGTTTAAACGG vertebrate PhyloP conservation scores to examine conser- GCCCT-3′. Each of these products was then amplified for vation . Scripts to reproduce the post-processed data ten cycles using the following primers: FWD 5′-AATGA can be found at https://github.com/scottiadamson/Vex-seq. TACGGCGACCACCGAGATCTACAC-i5-INDEX-ACAC TCTTTCCCTACACGACGCTCTTCCGATCT-3′;REV Additional files 5′-CAAGCAGAAGACGGCATACGAGAT-i7-INDEX-GT GACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′. Additional file 1: This document contains Table S1., which kists the oligo sequences used to assemble the splicing reporters along with their The cDNA was synthesized from the K562 and HepG2 corresponding variants. (XLSX 588 kb) RNA using SuperScript™ III reverse transcriptase and a Additional file 2: Reviewer reports and Author’s response to reviewers. gene specific primer (5′-GTGACTGGAGTTCAGACGT (DOCX 87 kb) GTGCTCTTCCGATCTNNNNNNNNNNGCAACTAG AAGGCACAGTCGAGG-3′). The cDNA was then PCR Acknowledgements amplified for ten cycles using the following primers: FWD We thank Kristen Lynch for the pcAT7-Glo1 plasmid and members of the Graveley lab for discussions. We wish to thank those who reviewed the 5′-AATGATACGGCGACCACCGAGATCTACAC-i5-IN- manuscript for their constructive comments (Additional file 2). DEX-ACACTCTTTCCCTACACGACGCTCTTCCGATC TGGCAAGGTGAACGTGGATGAAG-3′;REV5′-CAA Funding This work was supported by a grant from the National Human Genome GCAGAAGACGGCATACGAGAT-i7-INDEX-GTGACTG Research Institute (R21HG008799) and the John and Donna Krenicki GAGTTCAGACGTGTGCTCTTCCGATCT-3′.Resulting Endowment Fund to BRG. samples were multiplexed and sequenced on a MiSeq using a v2 300-cycle kit. Read 1 and read 2 were 150 bases each. Availability of data and materials The raw and processed datasets generated and/or analyzed during the current study are available using the GEO accession GSE113163 . Custom python Data analysis and interpretation scripts for data analysis are located on Github (https://github.com/scottiadamson/ Plasmid quality control Vex-seq) and zenodo (https://doi.org/10.5281/zenodo.1217642). The modified pcAT7-Glo1 plasmid is available from the lab upon request. Forward and reverse reads from plasmids were combined into a single read using FLASH . The 1˚ Review history library reads were sorted into bins using the barcode The review history for this manuscript is available as Additional file 2. and grouped by control exon backbone with separate Authors’ contributions bins for indels and control sequences. Reads were then BG and SA conceived of the experiments, SA designed the oligo pool, built the aligned using Novoalign V3.02.13 (http://www.novo Vex-seq library, and analyzed data. LZ performed cell culture and knockdown craft.com). Sam2tsv was then used to identify variants in experiments. SA and BG wrote the manuscript. All authors read and approved the final manuscript. each read and identify the barcode sequence . Barcodes with 15% or more of reads not containing the Competing interests correct variant were filtered out during splicing analysis The authors declare that they have no competing interests. using custom python scripts. Barcodes identified 2˚ Received: 19 September 2017 Accepted: 27 April 2018 library reads using custom python scripts and barcodes without reads were filtered out of the analysis. References 1. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by Splicing alignments and analysis alternative splicing. Nature. 2010;463:457–63. Reads were identified by barcode and sorted into bins for 2. Garcia-Blanco MA, Baraniak AP, Lasda EL. Alternative splicing in disease and each variant. The duplicate reads in each bin were then therapy. Nat Biotechnol. 2004;22:535–46. Adamson et al. Genome Biology (2018) 19:71 Page 12 of 12 3. Li YI, van de Geijn B, Raj A, Knowles DA, Petti AA, Golan D, et al. RNA splicing is a primary link between genetic variation and disease. Science. 2016;352:600–4. 4. Scotti MM, Swanson MS. RNA mis-splicing in disease. Nat Rev Genet. 2016;17:19–32. 5. Xiong HY, Alipanahi B, Lee LJ, Bretschneider H, Merico D, Yuen RKC, et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science. 2015;347:1254806. 6. Xiong HY, Barash Y, Frey BJ. Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Bioinformatics. 2011;27:2554–62. 7. Leung MKK, Xiong HY, Lee LJ, Frey BJ. Deep learning of the tissue-regulated splicing code. Bioinformatics. 2014;30:i121–9. 8. Rosenberg AB, Patwardhan RP, Shendure J, Seelig G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell. 2015;163:698–711. 9. Soemedi R, Cygan KJ, Rhine CL, Wang J, Bulacan C, Yang J, et al. Pathogenic variants that alter protein code often disrupt splicing. Nat Genet. 2017;49:848–55. 10. Mercer TR, Clark MB, Andersen SB, Brunck ME, Haerty W, Crawford J, et al. Genome-wide discovery of human splicing branchpoints. Genome Res. 2015;25:290–303. 11. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–91. 12. Joung J, Konermann S, Gootenberg JS, Abudayyeh OO, Platt RJ, Brigham MD, et al. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat Protoc. 2017;12:828–63. 13. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. 14. Ke S, Shang S, Kalachikov SM, Morozova I, Yu L, Russo JJ, et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 2011;21:1360–74. 15. Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol J Comput Mol Cell Biol. 2004;11:377–94. 16. Wang Y, Ma M, Xiao X, Wang Z. Intronic splicing enhancers, cognate splicing factors and context-dependent regulation rules. Nat Struct Mol Biol. 2012;19:1044–52. 17. Culler SJ, Hoff KG, Voelker RB, Berglund JA, Smolke CD. Functional selection and systematic analysis of intronic splicing elements identify active sequence motifs and associated splicing factors. Nucleic Acids Res. 2010;38:5152–65. 18. Signal B, Gloss BS, Dinger ME, Mercer TR. Machine learning annotation of human branchpoints. Bioinformatics. 2018;34:920–7. 19. Soemedi R, Cygan KJ, Rhine CL, Glidden DT, Taggart AJ, Lin C-L, et al. The effects of structure on pre-mRNA processing and stability. Methods. 2017;125:36–44. 20. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. 21. Cartegni L, Chew SL, Krainer AR. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 2002;3:285–98. 22. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20:110–21. 23. He F, Peltz SW, Donahue JL, Rosbash M, Jacobson A. Stabilization and ribosome association of unspliced pre-mRNAs in a yeast upf1- mutant. Proc Natl Acad Sci U S A. 1993;90:7034–8. 24. Shen S, Park JW, Lu Z, Lin L, Henry MD, Wu YN, et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci U S A. 2014;111:E5593–601. 25. Nagy E, Maquat LE. A rule for termination-codon position within intron- containing genes: when nonsense affects RNA abundance. Trends Biochem Sci. 1998;23:198–9. 26. Cheung R, Insigne KD, Yao D, Burghard CP, Jones EM, Goodman DB, et al. Many rare genetic variants have unrecognized large-effect disruptions to exon recognition. bioRxiv. 2018. https://www.biorxiv.org/content/early/2018/ 03/10/199927. 27. Wainberg M, Alipanahi B, Frey B. Does conservation account for splicing patterns? BMC Genomics. 2016;17:787. 28. Magoc T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27:2957–63. 29. Lindenbaum P. JVarkit: java-based utilities for Bioinformatics. figshare; 2015. https://doi.org/10.6084/m9.figshare.1425030. 30. Adamson SI, Zhan L, Graveley BR. Vex-seq: high-throughput identification of genetic variation impact on pre-mRNA splicing efficiency. NCBI GEO. https://www. ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE113163. Accessed 16 May 2018. 31. Adamson SI. Vex-seq v1.0. https://doi.org/10.5281/zenodo.1217642. Accessed 16 May 2018.
– Springer Journals
Published: Jun 1, 2018