Generally applicable transcriptome-wide analysis of translation using anota2seq
Oertlin, Christian; Lorent, Julie; Murie, Carl; Furic, Luc; Topisirovic, Ivan; Larsson, Ola
doi: 10.1093/nar/gkz223; pmid: 30926999
Abstract

mRNA translation plays an evolutionarily conserved role in homeostasis and when dysregulated contributes to various disorders including metabolic and neurological diseases and cancer. Notwithstanding that optimal and universally applicable methods are critical for understanding the complex role of translational control under physiological and pathological conditions, approaches to analyze translatomes are largely underdeveloped. To address this, we developed the anota2seq algorithm which outperforms current methods for statistical identification of changes in translation. Notably, in contrast to available analytical methods, anota2seq also allows specific identification of an underappreciated mode of gene expression regulation whereby translation acts as a buffering mechanism which maintains protein levels despite fluctuations in corresponding mRNA abundance (‘translational buffering’). Thus, the universal anota2seq algorithm allows efficient and hitherto unprecedented interrogation of translatomes which is anticipated to advance knowledge regarding the role of translation in homeostasis and disease.

INTRODUCTION

Regulation of gene expression is a multi-step process including transcription, mRNA-processing, -transport, -stability, -translation and protein stability (1). Although the precise relative contribution of each of these processes to corresponding protein levels remains controversial (2) and context dependent (3,4), several studies have implicated mRNA translation as a key mechanism which determines the composition of the proteome (5,6). Notably, rapid adaptation to changes in the cellular environment requires precipitous adjustment of the proteome which is, in addition to protein degradation, largely accommodated by altering the efficiency of mRNA translation (7).
Direct transcriptome-wide quantification of translational efficiency is therefore required to enhance the understanding of how protein levels are regulated in response to a variety of stimuli and stressors, and in normal vs. diseased cells. Under most conditions, the number of ribosomes associated with an mRNA is proportional to its translational efficiency (8). Transcriptome-wide measurement of translational efficiency is therefore commonly determined using polysome- or ribosome-profiling techniques. For polysome-profiling (9), the pool of efficiently translated mRNA (commonly mRNA with >3 bound ribosomes) is isolated, whereas in ribosome profiling (10) ribosome protected fragments (RPFs) are generated. Both polysome- and ribosome-profiling also involve isolation of total mRNA. Expression levels of polysome-associated mRNA or RPFs and total mRNA are then quantified using RNA sequencing (RNAseq) (10,11). Advantages and limitations of these methods have been extensively reviewed elsewhere (12–16). A paramount challenge during identification of changes in translational efficiencies is that amounts of polysome-associated mRNA and RPFs can also be influenced by steps in the gene expression pathway which modulate total mRNA levels (e.g. transcription and/or mRNA stability; Figure 1A). Analysis of bona fide changes in translational efficiency therefore requires identification of changes in amounts of polysome-associated mRNA or RPFs that are independent of changes in corresponding total mRNA levels. To this end, the analysis of translational activity (anota) algorithm (17) applies per-gene analysis of partial variance (APV) (18) coupled with variance shrinkage (19). This approach is superior to methods comparing differences in log-ratios (i.e. 
between polysome-associated mRNA or RPFs and total mRNA, commonly referred to as translational efficiency [TE] scores) across experimental conditions, inasmuch as log-ratio based approaches do not efficiently adjust changes in polysome-associated mRNA or RPFs for changes in total mRNA levels due to spurious correlations (20). Spurious correlations were initially described by Pearson in 1896 (21) and imply that the log-ratio of polysome-associated mRNA or RPFs to total mRNA levels can systematically correlate with total mRNA levels. This can lead to false positive identification of alterations in translational efficiency and consequent misinterpretations of biological phenomena. Such spurious correlations are abundant in both polysome- and ribosome-profiling studies, suggesting that log-ratio based approaches should be avoided (20).

Figure 1. Analysis of differential translation using Xtail or babel is associated with increased false positive findings. (A) Gene expression may be modulated via changes in translational efficiency leading to altered protein levels (top panel); buffering wherein the ribosome occupancy of the mRNA remains unaltered despite changes in corresponding mRNA level (middle panel); and congruent changes in mRNA abundance and its ribosome occupancy (bottom panel). (B) Hierarchical clustering of a simulated dataset with a control and treatment condition including polysome-associated mRNA and total mRNA samples where no mRNAs are regulated (i.e. a NULL model; data provided in Supplementary file 2). (C) P-value and FDR density plots for analysis of differential translation comparing treatment to control for each method using the data from (B).

A recent study evaluated anota for analysis of RNAseq data and concluded that it performs poorly in identifying differential translation (22). Anota, however, was developed for normalized data on a continuous logarithmic scale. In the study in question, anota was inappropriately applied to non-normalized and non-transformed count data originating from RNAseq studies. Several methods specifically developed for analysis of RNAseq data are available within the commonly used statistical programming language ‘R’, including babel (23) and Xtail (22). Moreover, DESeq2 (24), which was developed for analysis of differential gene expression using RNAseq data, has also been employed for analysis of differential translation (25). However, all these methods use log-ratios (or equivalent analysis) and therefore may suffer from spurious correlations. This highlights a need to develop an algorithm for analysis of changes in translation efficiencies applicable to RNAseq data which, similarly to anota, is not affected by spurious correlations.

Traditionally, changes in translational efficiencies are thought to modulate levels of encoded proteins under conditions wherein corresponding mRNA levels are not altered in a similar fashion (Figure 1A).
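The spurious correlation attributed above to log-ratio (TE-score) approaches can be illustrated with a small simulation. The following is a minimal sketch under assumed toy parameters that are not taken from the source: log2 polysome-associated mRNA is modeled as a linear function of log2 total mRNA with a slope of 0.7 plus independent noise, and no gene is translationally regulated. The function name `pearson` and all numeric settings are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_genes = 5000

# Assumed toy model (illustrative, not from the source): polysome-associated
# mRNA tracks total mRNA with slope 0.7 on the log2 scale; the noise term is
# independent of total mRNA, so there is no true translational regulation.
log_total = [rng.gauss(8.0, 1.0) for _ in range(n_genes)]
log_poly = [3.0 + 0.7 * t + rng.gauss(0.0, 0.3) for t in log_total]

# TE-style log-ratio: log2(polysome-associated) - log2(total)
log_ratio = [p - t for p, t in zip(log_poly, log_total)]

r_spurious = pearson(log_ratio, log_total)
print(f"correlation(log-ratio, total mRNA) = {r_spurious:.2f}")
```

Under these assumed parameters the theoretical correlation is -0.3/sqrt(0.18), roughly -0.71, even though no gene changes its translational efficiency. Any shift in total mRNA would therefore move the log-ratio, which is the kind of artifact the text warns about.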
Emerging data, however, point to an additional mode for regulation of translation, whereby translational mechanisms are employed to maintain protein levels via feedback loops compensating for changes in corresponding mRNA levels. For example, levels of polyamines are tightly regulated via a negative feedback loop involving antizyme (OAZ) that induces degradation of ornithine decarboxylase (ODC), a rate-limiting enzyme for polyamine synthesis (26). An increase in polyamine levels leads to a +1 ribosome frameshift during translation of the OAZ mRNA which is required to synthesize the active OAZ protein (26,27). Thus, it is expected that altered levels of the OAZ mRNA will be compensated by translational mechanisms, which sense polyamine levels. Similarly, the level of the AMD1 protein, whose synthesis is modulated by polyamines via a mechanism involving an upstream open reading frame (28), is not expected to be affected by the amount of mRNA but rather the level of polyamines. These examples highlight that regulation of translation can act as a buffering mechanism which gatekeeps the proteome to maintain homeostasis. Larger scale studies also support that mRNA translation can act as a buffering mechanism to preserve protein levels despite fluctuations in mRNA levels. In bacteria, translation equilibrates protein complex stoichiometry (29) and protein levels of conserved pathways across species (30) despite differences in mRNA levels. Similarly, in humans, co-expression of mRNAs transcribed from spatially proximal genomic locations is lost via post-transcriptional mechanisms such that protein co-expression is instead observed for functionally related proteins whose encoding mRNAs are transcribed from different genomic regions (31). Furthermore, translational buffering was also reported to compensate for inter-individual, inter-species and inter-tissue differences in mRNA levels (32–35). 
Although the scopes and the biological contexts of translational buffering mechanisms are still being established, these examples indicate an urgent need for algorithms enabling efficient separation of changes in translation affecting protein levels from those maintaining proteome homeostasis.

Herein, we describe the universal anota2seq algorithm for analysis of changes in translational efficiencies using either continuous (e.g. DNA-microarray) or count (i.e. RNAseq) data as input. Furthermore, anota2seq is not affected by spurious correlations, outperforms other methods and is the only algorithm to date which allows statistics-based separate identification of changes in translational efficiencies affecting protein levels and buffering.

MATERIALS AND METHODS

Retrieval and processing of polysome-profiling data

Already processed RNAseq count data were obtained from published studies (11,36,37) available at the Gene Expression Omnibus (GEO) (38) with accession numbers GSE99909, GSE90070 and GSE35469. From these data sets, specific conditions were selected and used in our analysis. From Liang et al. (11), we used all total mRNA and optimized sucrose gradient polysome-associated mRNA samples; from Guan et al. (36), we selected the control, thapsigargin 1 h and thapsigargin 16 h samples; and from Hsieh et al. (37), we selected data from the DMSO and rapamycin conditions. Genes that could not be resolved (i.e. based on sequence similarity), were duplicated in the count table or had 0 counts in at least one sample were removed. Data from all studies were normalized using the TMM-log2 (39) approach (used in Figure 5C and D and Supplementary Figure S4B, top right and bottom right). Additionally, data from Guan et al. (36) were also processed and used to simulate RNAseq data as described below. DNA-microarray polysome-profiling data from Parent et al. (40) were retrieved from ArrayExpress (41) with accession number E-MEXP-958.
The data were normalized using the rma() function (default settings) of the oligo package (42) (data were used in Supplementary Figure S4B).

Retrieval and processing of ribosome-profiling data

Raw sequencing files from a recent study (43) were obtained from GEO (GSE89183). Only samples of the shLuc (control), shRPL5 and shRPS19 conditions were used. Reads from RPFs were trimmed for adapter sequence ‘AGATCGGAAGAGCACACGTCTG’ and reads shorter than 26 or longer than 32 bases were discarded. RNAseq reads originating from total mRNA and RPF sequencing libraries were then aligned to hg38 (gencode release 29 GRCh38.p12) using bowtie (settings --best --strata -m 1 -l 25 -a). Uniquely aligned and unmapped reads were then aligned to the full transcript or the protein coding region (the latter only for RPF reads) of protein coding mRNAs defined by RefSeq (44) using bowtie (settings --best --strata -m 1 --norc -l 25 -a). When there were multiple RefSeq mRNAs for the same gene, the one with the largest number of mapped reads across all samples was used for downstream analysis.

RNAseq data simulation

Method performance comparisons were done using simulated data. In order to obtain simulated data with realistic characteristics of polysome profiling data quantified by RNAseq, we first estimated means and dispersions from an empirical data set and sampled from negative binomial distributions (NB) using these parameters. We used the empirical RNAseq data produced in Guan et al. (using the thapsigargin 16 h treatment condition) (36) and the simulation methods described in (45) with several modifications as described below. We simulated 4 replicates of 2 conditions referred to as ‘control’ and ‘treatment’. In addition to a set of unchanged mRNAs between the two conditions, three sets of regulated mRNAs were simulated to reflect: changes in translational efficiency affecting protein levels (i.e.
a change in polysome-associated mRNA independent of a change in total mRNA; Figure 1A; referred to as ‘translation set’ below); translational buffering (a change in total mRNA that is not reflected by a similar change in polysome-associated mRNA; Figure 1A; referred to as ‘buffering set’ below); and mRNA abundance (i.e. a congruent change in polysome-associated and total mRNA; Figure 1A; referred to as ‘mRNA abundance set’ below). Each set is represented by 5% of all mRNAs (total number of simulated mRNAs was 9856). In detail, four replicates of control and treatment conditions were simulated using the following parameterization for the NB distribution:

\begin{equation*}
Y_{gi} \sim NB\left( \mathrm{mean} = \mu_{gi},\ \mathrm{var} = \mu_{gi}\,\hat{\phi}_{gi} \right)
\end{equation*}

where $Y_{gi}$ is the simulated RNAseq count for gene $g$ and RNA source $i$ (i.e. polysome-associated or total mRNA). The mean $\mu_{gi}$ and dispersion $\phi_{gi}$ were estimated from the empirical data using maximum likelihood estimates $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ with

\begin{equation*}
\hat{\mu}_{gi} = \frac{1}{R} \sum_{r} K_{gir}
\end{equation*}

where $R$ is the number of replicates and $K_{gir}$ the RNAseq read count for the empirical gene $g$, RNA source $i$ and replicate $r$. The $\hat{\phi}_{gi}$ parameter was obtained by maximizing the log-likelihood function described in (45,46) using a custom R script (kindly provided by Charlotte Soneson, University of Zürich). In the script, the nlm() function was used with parameter settings (P = 10; gradtol = 1e–6; iterlim = 25) to achieve convergence for all genes. The $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ parameters were then applied to generate an NB distribution for simulation of 9856 transcripts in 4 control-condition replicates. We used the rnbinom() function of the stats R package with parameter size of $1/\hat{\phi}_{gi}$. For unchanged mRNAs (i.e.
not belonging to any of the regulated sets), the treatment condition was sampled from a distribution with $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ identical to the control condition for that gene. For sets of regulated mRNAs, these estimates were used as base parameters and then modified as follows.

For transcripts belonging to the translation set (Figure 1A; 493 mRNAs, similar numbers of up- and down-regulated mRNAs), the base parameters were used to simulate total mRNA for both conditions and polysome-associated mRNA for the control condition. The mean and dispersion parameters used to simulate polysome-associated mRNA under the treatment condition were modified as follows: an effect parameter $\alpha_g$ for up-regulation was sampled from a vector containing values from 1.5 to 3 with steps of 0.2. For down-regulation the effect parameter was modified to $1/\alpha_g$. The modified mean parameter was then $\alpha_g \hat{\mu}_{gi}$ (or $\hat{\mu}_{gi}/\alpha_g$ for down-regulation). In order to keep the mean-variance relationship as similar as possible to the empirical data, the modified dispersion was taken as the dispersion of the transcript from the empirical data having the closest mean estimate to $\alpha_g \hat{\mu}_{gi}$ (or $\hat{\mu}_{gi}/\alpha_g$ for down-regulation).

Similarly, for transcripts belonging to the buffering set (Figure 1A; 493 mRNAs), base parameters were used for polysome-associated mRNA (both conditions) and total mRNA (control condition). The effect parameter was introduced during simulation of total mRNA under the treatment condition and applied as described above for the translation set. For transcripts in the mRNA abundance set, base parameters were used for total and polysome-associated mRNA under the control condition.
The same effect parameter was then introduced to modify the base parameters of the distribution from which total and polysome-associated mRNA were sampled under the treatment condition; and applied as described above for translation. Transcripts with a simulated count of zero in any sample were removed before analysis.

To assess the reproducibility of the simulation, five such data sets were simulated. Means as well as standard deviations over the five data sets are provided for different metrics (e.g. number of identified mRNAs and area under the curve (AUC) for receiver operating characteristics (ROC) curves, see figure legends for more details). An example sampled data set is supplied as Supplementary File S1. To assess the ability of methods to control for type I error/false discovery rate in the absence of any true regulation between control and treatment, we also simulated a NULL data set using base parameters for all mRNAs in both conditions (i.e. all mRNAs are unchanged; the NULL data set is supplied as Supplementary File S2).

Moreover, we simulated data sets without mRNAs in the buffered set (i.e. only containing simulated mRNAs of the unchanged, mRNA abundance and translation sets); data sets with increased variance (where a percentage of each count [5%, 10%, 15%] is added or subtracted [same probability to add or subtract]); and data sets with increasing total number of RNAseq reads (2.5, 5, 10 and 15 million RNAseq reads per sample). The latter was achieved by obtaining $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ parameters using empirical data on which the total number of counts has been reduced (e.g. for 2.5 million reads for each RNA source, condition and replicate, we sampled 2.5 million reads from the total available amount of reads mapped to mRNAs).
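The sampling scheme described above (negative binomial counts with an effect parameter applied to the polysome-associated mean, the total mRNA mean, or both) can be sketched as follows. This is a minimal illustration, not the authors' R script: the function names are hypothetical, the negative binomial is drawn as a gamma-Poisson mixture with size 1/phi (which implies a variance of mu + phi·mu², as for rnbinom() with `mu` and `size`), and the dispersion is kept fixed instead of being re-matched to the empirical transcript with the closest mean.

```python
import random


def sample_poisson(lam, rng):
    """Exact Poisson draw: count unit-rate exponential arrivals within [0, lam]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sample_nb(mu, phi, rng):
    """Negative binomial draw with mean mu and size 1/phi (gamma-Poisson mixture)."""
    size = 1.0 / phi
    lam = rng.gammavariate(size, mu / size)  # shape = size, scale = mu / size
    return sample_poisson(lam, rng)


def simulate_gene(mu_poly, mu_total, phi, alpha, regulated_set, rng, n_rep=4):
    """Simulate control and treatment counts for one gene.

    regulated_set is 'unchanged', 'translation', 'buffering' or 'abundance'.
    Pass 1/alpha as alpha to simulate down-regulation.
    Simplification (assumption): one dispersion phi is used for all draws.
    """
    poly_factor = alpha if regulated_set in ("translation", "abundance") else 1.0
    total_factor = alpha if regulated_set in ("buffering", "abundance") else 1.0

    def draw(mu):
        return [sample_nb(mu, phi, rng) for _ in range(n_rep)]

    return {
        "control_poly": draw(mu_poly),
        "control_total": draw(mu_total),
        "treatment_poly": draw(mu_poly * poly_factor),
        "treatment_total": draw(mu_total * total_factor),
    }


rng = random.Random(1)
example = simulate_gene(200.0, 150.0, 0.05, 2.1, "buffering", rng)
print(example)
```

In this sketch a gene from the buffering set changes only its total mRNA mean, a gene from the translation set changes only its polysome-associated mean, and a gene from the mRNA abundance set changes both by the same factor.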
The $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ parameters of the empirical data with reduced total counts were then used as input for the NB distribution to simulate data sets with reduced sequencing depths using the approach described above. To assess the impact of varying sequencing depths across samples, $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ parameters obtained from empirical data with 15 or 5 million reads were used and 25%, 50% or 75% of the samples were substituted with samples generated with $\hat{\mu}_{gi}$ and $\hat{\phi}_{gi}$ parameters estimated from empirical data with a sequencing depth of 2.5 million reads.

Comparison of methods for analysis of differential translation using simulated data

We compared anota2seq (version 1.2.0 deposited at Bioconductor) to tools available in the statistical programming language ‘R’ for analysis of the translatome: babel (23) (version 0.3.0), DESeq2 (24) (version 1.20.0), Xtail (22) (version 1.1.5) and translational efficiency score (TE-score, calculated using a custom function). All analyses were performed using R (version 3.5.0). Identical simulated count data were used as input for babel, DESeq2 and Xtail. For TE-score analysis, counts were normalized using DESeq2 (normalization for library size using the median ratio method) and log2 transformed. In anota2seq, counts are either rlog (24) or TMM-log2 (39) normalized/transformed. Similar to anota (17), anota2seq combines APV (18) and the Random Variance Model (RVM) (19) and uses a two-step process that firstly assesses the model assumptions for (i) absence of highly influential data points, (ii) common slopes of sample classes, (iii) homoscedasticity of residuals and (iv) normal distribution of per gene residuals. Anota2seq then performs analysis of changes in translational efficiency affecting protein levels or buffering using APV and RVM. Babel and Xtail were applied using default parameters.
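To make the difference between a log-ratio comparison and a regression-based adjustment concrete, the sketch below fits a common within-condition slope of polysome-associated mRNA on total mRNA for a single gene and reports the slope-adjusted difference between conditions. It is a simplified illustration in the spirit of APV, not the anota2seq implementation: it omits the RVM variance shrinkage, the significance test and the model-assumption checks, and the function names and toy values are assumptions.

```python
def apv_effect(total, poly, groups):
    """Common-slope estimate and slope-adjusted treatment effect for one gene.

    total, poly: log2 expression per sample; groups: 'control' or 'treatment'.
    """
    sxx = 0.0
    sxy = 0.0
    means = {}
    for g in ("control", "treatment"):
        x = [t for t, lab in zip(total, groups) if lab == g]
        y = [p for p, lab in zip(poly, groups) if lab == g]
        mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
        means[g] = (mean_x, mean_y)
        # Pool within-condition sums of squares to estimate one common slope.
        sxx += sum((a - mean_x) ** 2 for a in x)
        sxy += sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    (mx_c, my_c), (mx_t, my_t) = means["control"], means["treatment"]
    # Difference in polysome-associated mRNA after adjusting for total mRNA.
    effect = (my_t - my_c) - slope * (mx_t - mx_c)
    return slope, effect


def te_score_difference(total, poly, groups):
    """Difference between conditions in mean log-ratio (polysome minus total)."""
    ratios = {"control": [], "treatment": []}
    for t, p, lab in zip(total, poly, groups):
        ratios[lab].append(p - t)
    mean = lambda v: sum(v) / len(v)
    return mean(ratios["treatment"]) - mean(ratios["control"])


groups = ["control"] * 4 + ["treatment"] * 4
# Noise-free toy gene: every sample lies on the line poly = 2 + 0.5 * total,
# and total mRNA is one log2 unit higher under treatment.
total = [4.0, 5.0, 6.0, 7.0, 5.0, 6.0, 7.0, 8.0]
poly = [2.0 + 0.5 * t for t in total]

slope, effect = apv_effect(total, poly, groups)
te_diff = te_score_difference(total, poly, groups)
print(slope, effect, te_diff)
```

For this toy gene the treatment samples fall on the same regression line as the controls, so the slope-adjusted effect is zero, while the log-ratio difference is non-zero simply because total mRNA shifted.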
DESeq2 (24) was applied as previously described (25). TE-scores were calculated as the difference between conditions in log2-ratios (between polysome-associated and total mRNA). Statistics for changes in TE were calculated using Student's t-test (P-values were adjusted using the Benjamini–Hochberg approach) (47). When applying default settings, RiboDiff (48) essentially provides a python implementation of the DESeq2 approach applied herein. For all methods, ROC and AUC analyses were performed on reported P-values prior to any filtering (for anota2seq, this is the output called ‘full’ from the anota2seqGetOutput function). When reporting numbers of identified mRNAs at false discovery rate (FDR) thresholds, default filtering was applied (after fitting all gene-level APV models, anota2seq allows the user to filter the results to exclude unrealistic APV models to reduce the number of false positives; this is the default output called ‘selected’ from the anota2seqGetOutput function). The outputs from all methods for the NULL data set and the data set containing translation, buffering and mRNA abundance sets of mRNAs are provided as Supplementary Files S1–2.

Re-analysis of the simulated data set from Xiao et al.

The simulated data set from Xiao et al. (22) was retrieved from the published supplementary files. True positive events simulated by the authors (Supplementary Figure S4B; top left) were reclassified into translation, mRNA abundance and buffering sets based on fold change thresholds (Supplementary Figure S4C). We then applied anota2seq, babel, DESeq2, TE-score and Xtail on the simulated data set and evaluated algorithm performance using the reclassified unchanged, mRNA abundance, buffering and translation sets (Supplementary Figure S4C).

Analysis of the empirical data sets using anota2seq

RNAseq data from Liang et al. (11) and Guan et al. (36) were analyzed for changes in translational efficiencies affecting protein levels using the anota2seq algorithm.
To assess the impact of batch effect adjustment, the analysis of data from Liang et al. was performed with and without including the replicate as a covariate in the models (using the ‘batchVec’ parameter in anota2seq). No batch effect adjustment was performed on data from Guan et al. (36). To assess the need for replication, subsets of two or three replicates were re-analyzed (all possible combinations of two or three samples were used and averages of mRNAs passing thresholds were calculated). Of note, anota2seq analysis can be performed on two replicates per condition if at least three conditions are included in the models. Thus, models from Guan et al. data were fitted on data from three conditions (control, Thapsigargin 1 h, Thapsigargin 16 h) and only the ‘Thapsigargin 16 h versus control’ contrast was considered. A similar approach was used for ribosome profiling data from Khajuria et al. (43) where three conditions (shRPS19, shRPL5 and shLuc) with two replicates each were used as input for anota2seq analysis with adjustment for replicate number (i.e. via the ‘batchVec’ parameter). Raw read counts were normalized using the ‘normalize’ parameter in the anota2seqDatasetFromMatrix() function set to ‘TMM-log2’. We focused on the contrast comparing shRPS19 to shLuc. Transcripts were considered significantly changing their translational efficiency leading to altered protein levels or translational buffering when passing default filtering criteria in anota2seq (except for the FDR which was set to <0.10 using the ‘maxPAdj’ parameter of the anota2seqRun function).

R-packages and settings

Hierarchical clustering and principal components analysis (PCA) plots were generated using the hclust() and prcomp() R-functions, respectively (all using default settings). Receiver operating characteristics (ROC) curves, area under the curve (AUC), partial AUC (pAUC) and precision recall curves were generated using the ROCR R-package (version 1.0-7) (49).
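The ROC-based metrics used in this section can be sketched in a few lines: genes are ranked by P-value, the ROC curve is traced, and the area is integrated up to a false positive rate cut-off and divided by that cut-off, as stated for the pAUC in this section. This is a minimal illustration with hypothetical function names and toy data; it is not a re-implementation of the ROCR package and may differ from it in how ties and interpolation are handled.

```python
def roc_points(p_values, labels):
    """ROC curve from P-values; labels are 1 (true positive set) or 0."""
    pairs = sorted(zip(p_values, labels))  # smallest P-value first
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Add all genes sharing this P-value before emitting a point (ties).
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def normalized_auc(fpr, tpr, fpr_stop=1.0):
    """Trapezoidal area under the ROC curve up to fpr_stop, divided by fpr_stop."""
    area = 0.0
    for k in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[k - 1], tpr[k - 1], fpr[k], tpr[k]
        if x0 >= fpr_stop:
            break
        if x1 > fpr_stop:
            # Linear interpolation of the curve at the cut-off.
            y1 = y0 + (y1 - y0) * (fpr_stop - x0) / (x1 - x0)
            x1 = fpr_stop
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / fpr_stop


# Toy input (illustrative): four genes, two of them in the translation set.
p_values = [0.1, 0.2, 0.3, 0.4]
labels = [1, 0, 1, 0]
fpr, tpr = roc_points(p_values, labels)
auc_full = normalized_auc(fpr, tpr)
pauc_15 = normalized_auc(fpr, tpr, fpr_stop=0.15)
print(auc_full, pauc_15)
```

Dividing by the cut-off rescales the partial area so that a perfect classifier scores 1 at every cut-off, which makes pAUC values at 5% and 15% comparable with the full AUC.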
P-values from each method were used for ROC analysis. pAUC were obtained at 5% and 15% false positive rate (fpr) by using the ‘fpr.stop’ parameter in the ROCR package; the acquired AUC was divided by the corresponding fpr cut-off rate. The precision is the positive predictive value and the recall is the true positive rate or sensitivity.

Statistics

All statistical tests within the anota2seq package are two-tailed.

RESULTS

Babel and Xtail algorithms underperform under a NULL model

By monitoring quality control steps (see Materials and Methods), we first identified suitable data normalization/transformation for application of anota2seq to RNAseq data (rlog (24) and TMM-log2 (39)). We then compared the performance of anota2seq, using rlog or TMM-log2 transformed data, to babel (23), DESeq2 (24), TE-score and Xtail (22). An essential aspect during identification of differences in gene expression is control of type I error/FDR under a NULL model (i.e. when there are no true differences in gene expression). Importantly, the performance of the method on a data set without true gene expression changes is unrelated to its sensitivity; and the distribution of obtained P-values is expected to be uniform resulting in FDRs equal to 1 (50). We assessed this using simulated data sets with control and treatment conditions sampled from the same distribution (i.e. where there were no differences in expression between conditions). The simulated data set closely mirrored characteristics of the empirical data set from which simulation parameters were obtained (Supplementary Figure S1A). Consistent with observations from empirical data sets (36), simulated data for total mRNA or polysome-associated mRNA were different when assessed using hierarchical clustering (Figure 1B). As expected, for a data set under a NULL model, there was no further separation between the treatment and control groups (Figure 1B).
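The expectation stated above, that uniformly distributed P-values under a NULL model yield FDRs close to 1, can be checked with a short Benjamini–Hochberg sketch. The function name `bh_adjust` and the idealized input (P-values placed exactly at evenly spaced quantiles of the uniform distribution) are assumptions made for illustration; real NULL P-values scatter around these quantiles.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest P-value, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Idealized NULL model: P-values at the quantiles i / (n + 1) of a uniform
# distribution (an assumption for illustration only).
n_genes = 1000
null_p = [i / (n_genes + 1) for i in range(1, n_genes + 1)]
null_fdr = bh_adjust(null_p)

n_called = sum(q < 0.05 for q in null_fdr)
print(min(null_fdr), max(null_fdr), n_called)
```

With this idealized input every adjusted value equals n/(n + 1), so no gene passes an FDR threshold of 0.05. An excess of low P-values relative to the uniform expectation, as described for Xtail, is what allows genes to pass the threshold even when nothing is regulated.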
The resulting density plots of P-values for changes in translational efficiency (Figure 1C) revealed uniform distributions for all methods except Xtail, which exhibited an overrepresentation of low P-values, and babel, which showed an overrepresentation of high P-values and a local enrichment of low P-values (Figure 1C). Accordingly, Xtail and babel identified mRNAs as differentially translated even when applied to a NULL data set, which was a priori simulated not to exhibit any changes in translation (Figure 1C; at an FDR < 0.05 Xtail and babel reported 276 and 66 transcripts, respectively, as changing their translation). Moreover, although the distribution of P-values appears approximately uniform, DESeq2 also identified mRNAs as differentially translated when applied to the NULL data set (Figure 1C; at an FDR < 0.05 DESeq2 reported 15 such transcripts). None of the other methods reported any transcripts from the NULL data set as differentially translated (FDR < 0.05). Furthermore, Xtail can analyze data sets with only one replicate per condition and we therefore assessed whether the number of replicates affected the number of false positives reported by Xtail. Importantly, there was a strong increase in false positives reported by Xtail when applied to NULL data sets with only one replicate (Supplementary Figure S1B). Xtail and babel therefore have limited usability for statistical FDR-based analysis as they indicate changes in translational efficiency even when such changes are absent.

Anota2seq outperforms current methods by allowing distinction between changes in translational efficiency affecting protein levels and translational buffering

Algorithms for analysis of changes in translational efficiency necessitate adjustment for concomitant changes in mRNA levels and separation between changes in translational efficiency leading to altered protein levels and buffering (Figure 1A).
We therefore simulated data sets with two conditions including sets of transcripts regulated by changes in mRNA abundance, translation or buffering (8377 unchanged mRNAs and 493 mRNAs per regulated set; Figures 1A and 2A). The resulting data sets closely mirrored characteristics of the empirical data set used to obtain simulation parameters (Supplementary Figure S1A). Furthermore, as expected, hierarchical clustering showed separation of conditions both for total mRNA and polysome-associated mRNA samples; the simulation therefore captures the complex structure of polysome-profiling data, which is similar to ribosome-profiling data ((43); Figure 2B). We next determined the performance of each method for detection of changes in translational efficiency leading to altered protein levels. Accordingly, identification of an mRNA from the translation set was considered a true positive event, whereas identification of an mRNA from the unchanged, buffered or mRNA abundance sets was considered a false positive (Figure 2A). The algorithms were applied using default settings on 5 data sets simulated as in Figure 2A. The resulting outputs prior to any filtering (see Materials and Methods) were then evaluated. ROCs showed that anota2seq analysis using rlog or TMM-log2 data performs similarly and outperforms all other methods as judged by AUC and pAUCs (Figure 2C, Table 1). In addition, precision recall curves reveal higher initial precision values for anota2seq compared to other methods (Figure 2C). This can be explained by the analysis principle of the other methods: TE-score, babel, DESeq2 and Xtail cannot separate changes in translational efficiency which affect protein levels from buffering. The latter is consistent with the reported superior performance of anota as compared to TE-score in reflecting changes in the proteome (51).

Figure 2. Anota2seq outperforms other methods for analysis of changes in translational efficiency affecting protein levels. (A) Scatterplot of polysome-associated and total mRNA log2 fold changes between treatment and control groups for a simulated dataset (provided in Supplementary file 1). Transcripts simulated as unchanged or belonging to mRNA abundance, translation or buffering sets are indicated. (B) Hierarchical clustering of gene expression data from (A). (C) Receiver operating characteristics curves for analysis of differential translation in a simulated dataset (i.e. from A-B; top). Precision recall curves for analysis of differential translation in the simulated dataset (bottom). For both analyses, identification of a transcript from the translation set was considered a true positive event. Vertical lines indicate 5% and 15% false positive rates. (D) Numbers of mRNAs identified as differentially translated belonging to translation (true positives [TP]), buffering (false positives [FP]), mRNA abundance (FP) or unchanged (FP) sets are indicated for each method at several FDR thresholds (mean and standard deviations from 5 simulated data sets are indicated). Red lines indicate the total number of mRNAs simulated from each set of regulated transcripts. (E) Shown are differences in the number of mRNAs showing differential translation belonging to the four sets in (D) when changing the FDR threshold from 5% to 15% (mean from 5 simulated data sets).

Table 1. Mean and standard deviation (sd) of pAUCs and AUCs for an ROC analysis assessing performance for identification of changes in translational efficiency affecting protein levels in simulated data sets (n = 5) including translation, buffering, mRNA abundance and unchanged sets of transcripts (Figure 2A)

Method                          AUC             pAUC 5%         pAUC 15%
                                Mean    sd      Mean    sd      Mean    sd
anota2seq (rlog transf.)        0.969   0.005   0.719   0.023   0.858   0.017
anota2seq (TMM-log2 transf.)    0.966   0.009   0.702   0.043   0.847   0.031
babel                           0.936   0.002   0.376   0.017   0.671   0.009
DESeq2                          0.939   0.002   0.431   0.013   0.715   0.008
TE score                        0.923   0.004   0.385   0.016   0.655   0.009
Xtail                           0.937   0.002   0.414   0.015   0.704   0.009

To further characterize the performance of the methods at commonly employed FDR thresholds, we determined the number of identified mRNAs from translation, buffering, mRNA abundance and unchanged sets at a 5%, 10% or 15% FDR threshold (Figure 2D; using the default settings of each method). Babel and TE-score identified fewer true positive events (i.e. transcripts from the translation set) than the other methods. Anota2seq identified approximately the same number of true positives at the 15% FDR threshold as DESeq2 while Xtail identified slightly more true positives. The number of mRNAs identified from the buffered set (here considered false positives as they do not lead to changes in protein levels) by each method reveals that only anota2seq can efficiently distinguish these from transcripts whose change in translational efficiency leads to altered protein levels (Figure 2D).
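The AUC and pAUC summaries reported in Table 1 can be illustrated with a minimal sketch. It ranks transcripts by p-value (smaller is more significant), traces the ROC curve, and integrates it with the trapezoidal rule up to a chosen false positive rate. The function names and the toy values are hypothetical, and dividing the partial area by the FPR cutoff is an assumption made here for readability; the text above does not state which pAUC normalization was used.

```python
def roc_points(p_values, is_true_positive):
    """Return ROC points (FPR, TPR), ranking transcripts by ascending p-value."""
    pairs = sorted(zip(p_values, is_true_positive))
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        j = i
        # Transcripts with tied p-values enter the ranking together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve up to max_fpr, divided by max_fpr.

    Dividing by max_fpr is an assumed normalization (1.0 = perfect ranking).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the curve linearly at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


# Toy example: 1 marks a transcript simulated in the translation set.
toy_p = [0.001, 0.002, 0.2, 0.01, 0.5, 0.8]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_roc = roc_points(toy_p, toy_labels)
toy_auc = partial_auc(toy_roc)            # full AUC
toy_pauc15 = partial_auc(toy_roc, 0.15)   # pAUC at a 15% false positive rate
```

Restricting the area to low false positive rates emphasizes the top of the ranking, which is where FDR-based thresholds operate in practice.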
Notwithstanding that all methods perform similarly in terms of rejecting mRNAs changing their abundance, there were dramatic differences in terms of identification of unchanged mRNAs. Anota2seq, babel and TE-score identified few unchanged mRNAs as being perturbed, while Xtail and DESeq2 identified a sizeable number of unchanged mRNAs as being altered (Figure 2D). Strikingly, Xtail identified >800 such mRNAs at FDR <15%, which is consistent with the poor performance of Xtail under the NULL model (Figure 1C). To contrast 5% and 15% FDR thresholds, we calculated the difference in number of identified mRNAs from each set of transcripts (Figure 2E). For anota2seq, there was a gain in true positives at the cost of an approximately equal increase in false positives, whereas for other methods, especially Xtail, increasing the FDR threshold introduced dramatically more false positives. Notably, RiboDiff under default settings (48) is essentially a python implementation of the DESeq2 approach and, as expected, its performance was almost identical to that of DESeq2 (Supplementary Figure S2). Below, we therefore only report results of DESeq2 as representative for both methods. In conclusion, anota2seq outperforms current methods by discriminating between changes in translational efficiency altering protein levels versus buffering; and by identifying fewer false positives at commonly employed FDR-based thresholds.

Anota2seq outperforms current methods for statistical analysis of translational efficiency even in the absence of translational buffering

We next compared algorithms using a simulated data set including the translation and mRNA abundance sets (493 mRNAs each; Figure 1A) together with a set of unchanged transcripts (8870 mRNAs), but without the buffered set of mRNAs (Supplementary Figure S3A and B).
As expected from their inability to separate transcripts from translation and buffering sets (Figure 2D), under these conditions babel, DESeq2, TE-score and Xtail showed improved performance as judged by pAUC and AUC, which was comparable to anota2seq (Supplementary Table S1). Moreover, excluding the set of buffered transcripts resulted in an increase in the precision (compare Supplementary Figure S3C to Figure 2C). Accordingly, the performance of statistical analysis under different FDR thresholds paralleled the analysis which included the buffering set of transcripts (compare Supplementary Figure S3D and E to Figure 2D and E). This comprised comparable performance of anota2seq and DESeq2 (24) for identification of transcripts from the translation set (Figure 1A); and increased identification of mRNAs from the unchanged set for DESeq2 and Xtail (i.e. false positives) as compared to anota2seq. Analogous to the simulation including the buffered set, TE-score identified fewer changes in translational efficiency but did not assign unchanged mRNAs low FDRs. These results demonstrate that anota2seq outperforms other methods for FDR-based analysis of changes in translational efficiency even in the absence of buffering (Supplementary Figure S3D). Therefore, anota2seq can be applied to efficiently identify changes in translational efficiency, independent of underlying modes of regulation.

These results appear to contradict a recent report which suggested good ROC and precision/recall performance for babel, TE-score and Xtail and poor performance for anota (22). While the reported poor performance of anota (17) was caused by inappropriate application of anota on non-normalized and non-transformed counts, the reason for the difference in precision recall performance for babel, TE-score and Xtail was unclear. We therefore examined the simulated data set used during development of Xtail (Supplementary Figure S4A).
This revealed that mRNAs selected as true positives for changing their translational efficiency appeared not to distinguish between levels of regulation (translation, buffering or mRNA abundance [Figure 1A]) and also included mRNAs with seemingly unchanged expression (Supplementary Figure S4B top left). Moreover, there were mRNAs which showed increased polysome-association but strongly decreased mRNA levels, which represent unlikely biological events that were not observed in any of the empirical data sets examined (11,36,37,40,43) (Supplementary Figure S4B top right and lower panels). However, if such regulation did exist, we speculate that it would result in altered protein levels. We reclassified mRNAs in this simulated data set (Supplementary Figure S4C) and examined the population of mRNAs that were identified at different FDR thresholds. These findings closely mirrored results from the data sets simulated herein, except for babel, which identified very few regulated events (Supplementary Figure S4D and E).

Analysis of translational buffering using anota2seq

As discussed above, translational buffering holds potentially important biological information but algorithms using statistics to selectively capture buffering have not yet been developed. We therefore implemented analysis of translational buffering in the anota2seq software. This implementation is based on the same principle as anota2seq analysis that captures changes in translational efficiency affecting protein levels (i.e. APV coupled with variance shrinkage) except that it captures changes in total mRNA levels that are buffered by translation (i.e. alterations in total mRNA that are not paralleled by changes in levels of polysome-associated mRNA or RPFs). To assess the performance of anota2seq for analysis of buffering, we used the same data set as in Figure 2A and B (i.e. with transcripts from translation, buffering, mRNA abundance and unchanged sets; Figure 1A).
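The distinction between the modes of regulation described above can be made concrete with a small sketch that labels a transcript from its two log2 fold changes. This is only an illustration of the definitions: anota2seq relies on APV with variance shrinkage and FDR-based selection, not on the fixed cutoff used here, and both the function name and the `min_effect` threshold are hypothetical.

```python
def classify_mode(total_lfc, polysome_lfc, min_effect=1.0):
    """Label a transcript by comparing total and polysome-associated (or RPF)
    log2 fold changes against a hypothetical effect-size cutoff."""
    total_changed = abs(total_lfc) >= min_effect
    polysome_changed = abs(polysome_lfc) >= min_effect

    if polysome_changed and not total_changed:
        # Polysome association changes without a change in total mRNA.
        return "translation"
    if total_changed and not polysome_changed:
        # Total mRNA changes but polysome association is held constant.
        return "buffering"
    if total_changed and polysome_changed:
        if total_lfc * polysome_lfc > 0:
            # Congruent changes in both RNA sources.
            return "mRNA abundance"
        # Changes in opposite directions, described above as unlikely events.
        return "opposing"
    return "unchanged"


examples = {
    "translation": classify_mode(0.0, 1.5),
    "buffering": classify_mode(1.5, 0.1),
    "mRNA abundance": classify_mode(-1.5, -1.4),
    "unchanged": classify_mode(0.1, -0.2),
}
```

A fixed cutoff ignores variance and replication, which is why a statistical framework is needed for the actual analysis.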
In contrast to the analysis in Figure 2A and B, identification of a translationally buffered mRNA was considered a true positive event, whereas identification of an unchanged mRNA or an mRNA belonging to the translation or mRNA abundance groups was considered a false positive event. Importantly, pAUC and AUC for translational buffering analysis were comparable to the performance of anota2seq for analysis of changes in translational efficiency (Figure 3A and Table 2), while very few mRNAs from the translation set were identified during analysis of translational buffering (Figure 3B). Moreover, a relaxed FDR threshold primarily led to additional identification of mRNAs from the buffered set (Figure 3C). Thus, anota2seq can be efficiently applied for FDR-based identification of translationally buffered mRNAs.

Figure 3. Efficient identification of translational buffering using anota2seq. (A) Receiver operating characteristics curves (left) and precision recall curves (right) for analysis of translational buffering using anota2seq. The data set is the same as in Figure 2A and B but identification of an mRNA from the buffering set was considered a true positive event. Vertical lines indicate 5% and 15% false positive rates. (B) Numbers of mRNAs identified as buffered belonging to the translation (FP), buffering (TP), mRNA abundance (FP) or unchanged (FP) sets are indicated at several FDR thresholds (mean and standard deviations from 5 simulated data sets are indicated). Red lines indicate the total number of mRNAs simulated for each set of regulated transcripts. (C) Shown are differences in the number of mRNAs identified as buffered belonging to the four sets in (B) when changing the FDR threshold from 5% to 15% (mean from five simulated data sets).

Table 2. Mean and standard deviation (sd) of AUCs and pAUCs for an ROC analysis assessing performance for identification of changes in translational efficiency leading to buffering in simulated data sets (n = 5) including translation, buffering, mRNA abundance and unchanged sets of transcripts (Figure 2A)

Method                          AUC             pAUC 5%         pAUC 15%
                                Mean    sd      Mean    sd      Mean    sd
anota2seq (rlog transf.)        0.965   0.004   0.703   0.016   0.848   0.016
anota2seq (TMM-log2 transf.)    0.966   0.004   0.706   0.016   0.849   0.012

Assessing robustness of algorithms for translatome analyses

Experimental and technical challenges and/or study designs may give rise to data sets exhibiting dramatically different characteristics (52,53). This includes different levels of variance and sequencing depth and we therefore assessed the influence of such factors on the performance of the algorithms. Increased variance (Figure 4A) led to a moderate decrease in the number of true positives identified at an FDR threshold of 15% for all methods except TE-score-based analysis, which was particularly affected (Figure 4B and C). This was associated with a decrease in performance as assessed by ROC and precision recall curves (Supplementary Figure S5A–F). As evidenced by pAUC and AUC, anota2seq outperforms other methods at all variance levels in analysis of changes in translational efficiency affecting protein levels using simulated data sets including all sets of regulated transcripts (i.e. translation, buffering and mRNA abundance [Figure 1A]; Supplementary Figure S5; Supplementary Table S2). Reducing the sequencing depth had no clear effect on ROC analysis within algorithms down to 5 million reads (Supplementary Figure S6; Supplementary Table S3) but at 2.5 million reads all algorithms underperformed (Figure 5A and B). Therefore, while all algorithms except TE-score-based analysis perform well under increased variance, they all require ∼5 million reads mapped to protein coding mRNAs for efficient analysis.
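To explore how a given data set might behave at a lower sequencing depth, such as the 5 million or 2.5 million reads discussed above, read counts can be thinned at random. The sketch below uses binomial thinning, in which every read is kept independently with the same probability. This is a common generic approach and an assumption of this example; it is not stated above to be the procedure used to simulate reduced depth, and the function name and toy counts are hypothetical.

```python
import random


def downsample_counts(counts, target_total, seed=0):
    """Thin per-gene read counts so the library holds about target_total reads.

    Each read is kept independently with probability target_total / sum(counts)
    (binomial thinning). The seed makes the toy example reproducible.
    """
    rng = random.Random(seed)
    total = sum(counts)
    if target_total >= total:
        # Nothing to remove: the library is already at or below the target.
        return list(counts)
    keep_probability = target_total / total
    return [
        sum(1 for _ in range(count) if rng.random() < keep_probability)
        for count in counts
    ]


# Toy library of 1250 reads thinned to roughly 250 reads (a 5-fold reduction).
toy_counts = [1000, 200, 50, 0]
thinned_counts = downsample_counts(toy_counts, 250)
```

Looping over individual reads is slow for real libraries; a vectorized binomial sampler would be preferable there, but the per-read loop keeps the sketch dependency-free.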
Next, we assessed the impact of varying sequencing depth across samples by analyzing data sets with increasing proportions of samples with a low sequencing depth (2.5M reads) (Supplementary Figure S7A) and monitoring the performance of included algorithms for statistical identification of changes in translational efficiencies affecting protein levels (Supplementary Figure S7B). Although TE-score based analysis was most affected, the performance of all methods decreased as the proportion of samples with low sequencing depth increased (Supplementary Figure S7B; Supplementary Figure S8). Notably, for all methods except TE-score based analysis, the performance with a moderate proportion of samples with low sequencing depth (25%) is comparable to the data set with uniformly high sequencing depth (15M reads in Supplementary Figure S7). When reducing the sequencing depth in the analysis from 15M to 5M reads and adding increasing proportions of samples with a low sequencing depth (2.5M reads; Supplementary Figure S9A), similar patterns, albeit less pronounced, were observed (Supplementary Figure S9B). Finally, when samples with low sequencing depth were unevenly distributed such that all samples under a condition suffer from this limitation, an increase in identification of false positives (unchanged mRNAs) was observed for all methods except babel and TE-score (Supplementary Figure S10). Hence, all algorithms except TE-score perform relatively well when the proportion of samples with low sequencing depth (i.e. 2.5M reads mapped to protein coding mRNA) is 25% or less.

Figure 4. Evaluation of sensitivity to increased variance during analysis of changes in translational efficiency affecting protein levels. (A) Hierarchical clustering of simulated data sets harboring transcripts from unchanged, translation, buffering and mRNA abundance sets (Figure 1A). Data sets were simulated with increasing variance. Red dotted lines provide references across simulated data sets. (B) Numbers of mRNAs identified as differentially translated from translation (TP), buffering (FP), mRNA abundance (FP) or unchanged (FP) sets of transcripts are indicated for each method at a 15% FDR threshold (mean and standard deviations from 5 simulated data sets are indicated). Red lines indicate the total number of mRNAs simulated for each set of regulated transcripts. (C) Difference in the number of mRNAs identified as differentially translated (FDR < 0.15) belonging to the four sets in (B) when changing from 15% to no additional variance (mean from five simulated data sets).

Figure 5. Evaluation of sensitivity to sequencing depth during analysis of changes in translational efficiency affecting protein levels. (A) Numbers of mRNAs identified as differentially translated from the translation (TP), buffering (FP), mRNA abundance (FP) or unchanged (FP) sets (Figure 1A) are indicated for each method at a 15% FDR threshold (mean and standard deviations from 5 simulated data sets are indicated). Red lines indicate the total number of mRNAs simulated for each set of regulated transcripts. (B) Difference in the number of mRNAs identified as differentially translated (FDR < 0.15) belonging to the four sets in (A) when changing the sequencing depth from 15 million to 2.5 million reads (mean from 5 simulated data sets). (C) PCA of the RNAseq data set from Liang et al. Lines connect replicate experiments (individual replicates are indicated by numbers). PC: principal component. (D) Density plots of adjusted p-values (FDRs) from analysis of changes in translational efficiencies leading to altered protein levels using anota2seq and RNAseq data from Liang et al. (top) or Guan et al. (bottom). Vertical lines indicate 5% and 15% FDR thresholds. Numbers of identified mRNAs are reported at a 15% FDR threshold. All possible replicate combinations were analyzed when using three or two replicates and the mean number of identified mRNAs is indicated.

Anota2seq allows batch adjustment during statistical analysis

Polysome- or ribosome-profiling data sets can include batch effects commonly manifested as systematic differences between replicated experiments. Batch effects can lead to reduced power for detection of changes in translational efficiency and thus adjusting for batch effects during statistical analysis is warranted (54). We therefore implemented the possibility to use batch adjustment in anota2seq, which is applied during APV (18) and also affects parameters for variance shrinkage. To assess the impact of such batch adjustment on anota2seq analysis we selected a data set (11) harboring a systematic batch effect related to replicated experiments (Figure 5C) and applied anota2seq with and without batch adjustment. Batch adjustment led to a dramatic increase in identification of mRNAs with an FDR <15% for changes in translational efficiency which affect protein levels (Figure 5D, top). Therefore, batch adjustment should be considered during anota2seq analysis.

Assessing the need for replication in anota2seq analysis

Polysome- and ribosome-profiling data sets may include substantial variance. This indicates that sufficient replication is essential for efficient analysis.
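Why a systematic difference between replicated experiments reduces power, as noted for batch adjustment above, can be seen in a toy calculation for a single gene. The sketch compares a condition effect against residual variance with and without an additive replicate (batch) term. It is a plain two-way linear model on hypothetical log2 values, not the APV model of anota2seq, and the function name is an assumption of this example.

```python
from math import sqrt


def condition_t_stat(control, treated, adjust_batch=False):
    """t-like statistic for treated vs control with paired replicates.

    Replicate i of both conditions is assumed to share batch i. With
    adjust_batch=True an additive batch effect is removed before the
    residual variance is estimated, at the cost of n - 1 degrees of freedom.
    """
    n = len(control)
    assert n == len(treated) and n >= 2
    mean_c = sum(control) / n
    mean_t = sum(treated) / n
    grand_mean = (mean_c + mean_t) / 2.0

    residual_ss = 0.0
    for c, t in zip(control, treated):
        batch_shift = ((c + t) / 2.0 - grand_mean) if adjust_batch else 0.0
        residual_ss += (c - mean_c - batch_shift) ** 2
        residual_ss += (t - mean_t - batch_shift) ** 2

    residual_df = (n - 1) if adjust_batch else (2 * n - 2)
    standard_error = sqrt(residual_ss / residual_df * 2.0 / n)
    return (mean_t - mean_c) / standard_error, residual_df


# Hypothetical log2 values: each replicate experiment is shifted as a whole.
control = [5.0, 6.1, 7.0, 8.1]
treated = [5.5, 6.5, 7.6, 8.5]
t_raw, df_raw = condition_t_stat(control, treated)
t_adjusted, df_adjusted = condition_t_stat(control, treated, adjust_batch=True)
```

In this toy case the statistic increases markedly after adjustment because the replicate-to-replicate shifts no longer inflate the residual variance, while the residual degrees of freedom drop from 6 to 3.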
Because anota2seq requires three replicates in the case of two treatment groups (or two replicates when there are three or more treatment groups) due to the limitation of degrees of freedom in the APV model, we determined the effect of reducing the number of replicates from four to three on anota2seq performance using the data set with two conditions (Figure 5C) (11). Reducing the number of replicates from four to three decreased the number of transcripts identified with an FDR <0.15 for a change in translational efficiency affecting protein levels (Figure 5D, top). Thus, similar to what was previously reported for babel (23), anota2seq is sensitive to the number of replicate experiments although the precise number needed will depend on e.g. the magnitude of changes in translational efficiencies and the sequencing depth. It should also be noted that the batch adjustment reduces the degrees of freedom for the residual error in the APV model (18). Therefore, fewer replicates may be sufficient when analyzing data sets not requiring batch adjustment or with more than two conditions. Indeed, analysis of data from Guan et al. (36) with three conditions which did not require batch adjustment (see original publication) indicated sufficient power for detection of changes in translational efficiency altering protein levels at a 15% FDR threshold when using two replicates per condition (Figure 5D, bottom).

Anota2seq applied to ribosome-profiling data identifies translational buffering following RPS19 depletion

To ensure that anota2seq can also be efficiently applied to ribosome-profiling data, we analyzed a previously published data set assessing the impact of knockdown of RPS19 (43). To this end, we performed the analysis considering only RPFs mapping to coding sequences (CDS) and compared these findings to a data set where RPFs were mapped to full-length transcripts (full transcript).
Of note, a relatively low number of reads mapping to mRNA were obtained for RPFs as compared to reads originating from total mRNA sequencing libraries (Figure 6A). The number of RPF reads only mapping to the CDS was on average 20.3% lower as compared to full transcript mapping of RPFs (Figure 6A). Moreover, in a PCA, although there were larger distances between replicated conditions for RPF data as compared to total mRNA data (consistent with poorer reproducibility for RPF data), both mapping strategies achieved similar separation between RNA source (i.e. RPFs or total mRNA) and condition (Figure 6B). Anota2seq analysis of both these datasets revealed similar numbers of changes in total mRNA, RPFs and buffering as assessed by distributions of FDRs (Figure 6C) and comparable numbers of mRNAs classified into the different modes of regulation (Figure 6D). Notably, consistent with the lower reproducibility of RPF samples and low number of reads (Figure 6A and B), few changes in translational efficiencies affecting protein levels could be detected (Figure 6C). As translational buffering emerged as the major post-transcriptional mode for regulation of gene expression upon silencing of RPS19, we assessed whether the mapping strategy (i.e. CDS or full transcripts) for RPFs affected the set of identified buffered mRNAs. A Venn diagram-based comparison indicated a substantial number of genes identified regardless of whether RPFs from CDSs or full transcripts were employed. Subsets of transcripts, however, were captured by one, but not both mapping approaches (Figure 6E). When applying statistics-based thresholds to compare results using Venn diagrams, identified differences may be due to small shifts in statistical significance and therefore not indicative of differences in patterns of regulation.
To assess this, we considered mRNAs identified as buffered in the CDS-based analysis only, and visualized their fold changes as measured using full transcript mapping, and vice versa (Figure 6F). This revealed that the majority of such transcripts showed similar regulation in both datasets despite appearing, in the Venn diagram, to be identified by only one mapping strategy (Figure 6E). Thus, anota2seq can be efficiently applied to ribosome profiling data. Moreover, although the analysis suffers from a low number of RPF reads, this assessment suggests wide-spread translational buffering following silencing of RPS19. Furthermore, although read mapping strategies produced very similar results following anota2seq analysis, some transcripts may be classified as regulated differently depending on whether RPFs are allowed to map to CDS or full transcripts. As this may have significant biological implications, mapping strategy should be considered when performing anota2seq analysis of ribosome-profiling data.

Figure 6. Anota2seq analysis of ribosome-profiling data identifies translational buffering following RPS19 silencing. (A) Number of RNAseq reads from sequencing libraries for total mRNA mapping to full transcript and for RPF mapping to CDS or full transcript. (B) PCA plots of components 1 and 2 from analysis using data sets where RPFs were mapped to CDSs (left) or the full transcript (right) as input. (C) Density plot of FDRs following anota2seq analysis comparing RPS19 knock-down to control (shLuc) using the two data sets from (B) as input. (D) Scatter plots of log2 fold-changes (shRPS19 vs shLuc) for total mRNA and RPF data (CDS [left] or full transcript [right] RPF mapping). Numbers of identified transcripts under each mode of regulation are indicated. (E) A Venn diagram comparing transcripts identified as buffered from anota2seq analysis of the two data sets from (B). (F) Scatter plots of log2 fold-changes (shRPS19 versus shLuc) of total mRNA and RPF data (CDSs data [right] and full transcript [left]). Transcripts identified as buffered only when anota2seq was applied on the data sets where RPFs were mapped to the full transcript (left) or CDS (right) are indicated (i.e. corresponding to [E]). Note: not all mRNAs identified during analysis of the full transcript data set were represented in the CDS data set (due to filtering [see Materials and Methods]).
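The comparison behind Figure 6E and F, a threshold-based overlap followed by a check of effect sizes for transcripts found by only one mapping strategy, can be sketched as follows. The gene identifiers, fold changes and the `tolerance` value are hypothetical, and measuring "similar regulation" as an absolute difference in log2 fold change is an assumption of this example rather than the criterion used in the figure.

```python
def compare_buffered_sets(buffered_cds, buffered_full):
    """Split buffered transcripts into the three regions of a Venn diagram."""
    cds, full = set(buffered_cds), set(buffered_full)
    return {
        "both": cds & full,
        "cds_only": cds - full,
        "full_only": full - cds,
    }


def similar_fraction(genes, lfc_a, lfc_b, tolerance=0.5):
    """Fraction of genes whose log2 fold changes agree within `tolerance`.

    Genes missing from either data set (e.g. removed by filtering) are
    skipped. Returns None when no gene is present in both data sets.
    """
    shared = [g for g in genes if g in lfc_a and g in lfc_b]
    if not shared:
        return None
    agreeing = sum(1 for g in shared if abs(lfc_a[g] - lfc_b[g]) <= tolerance)
    return agreeing / len(shared)


# Hypothetical identifiers and RPF log2 fold changes for the two mappings.
venn = compare_buffered_sets(["g1", "g2", "g3"], ["g2", "g3", "g4", "g5"])
rpf_lfc_cds = {"g1": 0.10, "g2": 0.05, "g3": -0.10, "g4": 0.30}
rpf_lfc_full = {"g1": 0.25, "g2": 0.00, "g3": -0.20, "g4": 0.20, "g5": 0.10}
one_sided = venn["cds_only"] | venn["full_only"]
agreement = similar_fraction(one_sided, rpf_lfc_cds, rpf_lfc_full)
```

A high agreement for transcripts outside the Venn overlap indicates that they narrowly missed a significance threshold in one analysis rather than being regulated differently.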
DISCUSSION

Modulation of translation underlies numerous biological and pathological processes ranging from stress response and cancer (55) to learning and memory (56). Nonetheless, translatomes are vastly under-studied in comparison to transcriptomes (which reflect mRNA abundance determined at the level of transcription and/or mRNA stability) (7). Stringent and efficient application of transcriptome-wide methods to measure changes in translation is therefore required to advance knowledge regarding the role of translation in homeostasis and disease. Moreover, sufficient replication and efficient data analysis are crucial for deriving valid conclusions. Notwithstanding the noisy nature of polysome- and ribosome-profiling data, it is often paradoxically suggested that algorithms for identification of differential translation do not require replication (22). This does not seem to be consistent with concerns about reproducibility in quantitative biology, which instead suggest that sufficient experimental replication is essential to derive meaningful conclusions. Consistently, we reveal that analysis using Xtail on datasets with one replicate is associated with exceptionally high rates of false positive findings (Supplementary Figure S1B). Currently, polysome- and ribosome-profiling are the most commonly used methods to interrogate translatomes, whereby polysome-profiling is more efficient for identification of changes in translational efficiency (12,14), while ribosome-profiling generates information about ribosome positioning at a single nucleotide resolution (57,58). A powerful unique property of polysome-profiling is that it allows examination of 5′ and/or 3′UTRs of translated mRNAs, thereby facilitating identification of regulatory elements as well as potential differences in translation of mRNA isoforms co-expressed but differing in their 5′ or 3′UTRs (12).
Ribosome- and polysome-profiling therefore represent complementary methodologies providing ample opportunity to study translatomes. Hence, there is interest in developing efficient algorithms to analyze polysome- and ribosome-profiling data. Analyses of changes in bona fide translational efficiencies need to be adapted to advances in technology that bear distinct characteristics, such as the count nature of RNAseq data, but also have to parallel the understanding of mechanisms regulating mRNA translation. Translational buffering represents one such mechanism of translational control wherein alterations in mRNA levels are compensated at the level of translation such that levels of polysome-associated mRNAs or RPFs are not affected by changes in mRNA abundance (4,16,29,30,32). Translational buffering is thus expected to maintain protein levels despite changes in mRNA levels. Herein, we developed the anota2seq algorithm, which can be employed to analyze DNA-microarray and RNAseq data and efficiently identify and separate changes in translational efficiency affecting protein levels from translational buffering. Evaluation of anota2seq compared to other methods for translatome analyses indicated superior performance of anota2seq in detecting differential translation with low type-1-error rates and robustness against noise and varying sequencing depths. Importantly, anota2seq is applicable to both polysome- and ribosome-profiling studies. This highlights the effectiveness of anota2seq analysis for various types of data and the need to consider translational buffering during analysis of translatomes given that this mode of regulation of gene expression appears to be prevalent in multiple systems (29–35). One unexpected finding was the poor performance of Xtail under the NULL condition inasmuch as a large number of mRNAs were identified as differentially translated despite no true changes in their translational efficiency (Figure 1C, Supplementary Figures S1B and S2D).
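As a rough illustration of the distinction drawn above between changes in translational efficiency that affect protein levels and translational buffering, the following Python sketch sorts a transcript into a regulatory mode from two log2 fold changes: total mRNA and polysome-associated mRNA (or RPFs). The function name, the fixed cutoff and the example values are hypothetical. The sketch is not the anota2seq algorithm, which relies on statistical modelling of replicated data rather than on fold-change cutoffs.

```python
def classify_mode(total_lfc, polysome_lfc, threshold=1.0):
    """Assign a toy regulatory mode from two log2 fold changes.

    total_lfc:    log2 fold change of total mRNA.
    polysome_lfc: log2 fold change of polysome-associated mRNA or RPFs.
    threshold:    hypothetical cutoff for calling a change (not from the paper).
    """
    total_changed = abs(total_lfc) >= threshold
    polysome_changed = abs(polysome_lfc) >= threshold

    if polysome_changed and not total_changed:
        # Translated mRNA changes without a matching change in mRNA level,
        # so protein levels are expected to change.
        return "translation"
    if total_changed and not polysome_changed:
        # mRNA level changes but translated mRNA does not follow,
        # so protein levels are expected to be maintained.
        return "buffering"
    if total_changed and polysome_changed:
        if (total_lfc > 0) == (polysome_lfc > 0):
            # Both change in the same direction.
            return "mRNA abundance"
        # Opposite directions: outside the three modes sketched here.
        return "discordant"
    return "unchanged"


examples = {
    "gene_a": (0.1, 1.5),   # illustrative values only
    "gene_b": (1.5, 0.1),
    "gene_c": (1.4, 1.6),
    "gene_d": (0.2, -0.3),
}
for gene, (total_lfc, polysome_lfc) in examples.items():
    print(gene, classify_mode(total_lfc, polysome_lfc))
```

A cutoff rule of this kind offers no control over false positives, which is one reason why behavior under the NULL condition matters: as noted above, Xtail identified many mRNAs as differentially translated there despite no true changes.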
This most likely stems from incorrect assumptions regarding data independence in the models applied by Xtail (22). Indeed, assessing performance under the NULL condition during algorithm development is critical for deriving tools that can be used for efficient and valid analysis. Moreover, anota2seq has several distinct features as compared to other methods: (i) it is not based on interpretation of differences between log-ratios and hence will not be affected by spurious correlations; (ii) it distinguishes changes in translation efficiency affecting protein levels from translational buffering; (iii) it allows for gene-level batch correction and (iv) it permits analysis of polysome-associated and total mRNA changes using the same analytical methods, thereby allowing simple and comparable identification of changes in polysome-associated mRNA, total mRNA, translational efficiency affecting protein levels and buffering. Although the rlog and TMM-log2 approaches performed similarly on simulated data, prudence is advised when selecting normalization/transformation methods for anota2seq analysis, as technological biases not tested herein may influence outcomes. Anota2seq therefore incorporates both TMM-log2 and rlog but also allows the user to supply custom transformed and normalized data. In summary, we designed anota2seq for analysis of mRNA translation, and it can be applied independent of the platform used for quantification. Application of such statistically stringent analyses holds promise to critically improve understanding of the role of translation in health and disease. DATA AVAILABILITY The anota2seq software is available as a Bioconductor package (http://bioconductor.org/packages/release/bioc/html/anota2seq.html). Four datasets were retrieved from GEO (38) with accession numbers GSE99909, GSE90070, GSE35469 and GSE89183.
One dataset was retrieved from ArrayExpress (41) with accession number E-MEXP-958. SUPPLEMENTARY DATA Supplementary Data are available at NAR Online. ACKNOWLEDGEMENTS I.T. acknowledges Prof. R. McInnes for invaluable advice. Author contributions: C.O., L.F., I.T. and O.L. designed the study. C.O., J.L., C.M. and O.L. developed the anota2seq method. C.O., J.L. and O.L. developed the anota2seq software. C.O., J.L., L.F., I.T. and O.L. analyzed and interpreted the data. C.O., J.L., L.F., I.T. and O.L. wrote the manuscript. FUNDING Swedish Research Council; Swedish Cancer Society; Cancer Society in Stockholm; Wallenberg Academy Fellows Program; STRATCAN grants (to O.L.); Canadian Institutes of Health Research [MOP-363027]; National Institutes of Health [R01 CA 202021-01-A1 to I.T., who is a Junior 2 Research Scholar of the Fonds de Recherche du Québec – Santé (FRQ-S)]; Joint Canada-Israel Health Research Program (JCIHRP) [108589-001 to I.T. and O.L.]; Department of Health and Human Services acting through the Victorian Cancer Agency [MCRF16007 to L.F.]; National Health and Medical Research Council (NHMRC) [APP1141339 to O.L., I.T. and L.F.]. Funding for open access charge: Vetenskapsrådet. Conflict of interest statement. None declared.
REFERENCES
1. Komili S., Silver P.A. Coupling and coordination in gene expression processes: a systems biology view. Nat. Rev. Genet. 2008; 9:38–48.
2. Li J.J., Bickel P.J., Biggin M.D. System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ. 2014; 2:e270.
3. Liu Y., Beyer A., Aebersold R. On the dependency of cellular protein levels on mRNA abundance. Cell. 2016; 165:535–550.
4. Jovanovic M., Rooney M.S., Mertins P., Przybylski D., Chevrier N., Satija R., Rodriguez E.H., Fields A.P., Schwartz S., Raychowdhury R. et al. Immunogenetics. Dynamic profiling of the protein life cycle in response to pathogens. Science. 2015; 347:1259038.
5. Schwanhäusser B., Busse D., Li N., Dittmar G., Schuchhardt J., Wolf J., Chen W., Selbach M. Global quantification of mammalian gene expression control. Nature. 2011; 473:337–342.
6. Kristensen A.R., Gsponer J., Foster L.J. Protein synthesis rate is the predominant regulator of protein expression during differentiation. Mol. Syst. Biol. 2013; 9:689.
7. Piccirillo C.A., Bjur E., Topisirovic I., Sonenberg N., Larsson O. Translational control of immune responses: from transcripts to translatomes. Nat. Immunol. 2014; 15:503–511.
8. Warner J.R., Knopf P.M., Rich A. A multiple ribosomal structure in protein synthesis. Proc. Natl. Acad. Sci. U.S.A. 1963; 49:122–129.
9. Gandin V., Sikström K., Alain T., Morita M., McLaughlan S., Larsson O., Topisirovic I. Polysome fractionation and analysis of mammalian translatomes on a genome-wide scale. J. Vis. Exp. 2014; doi:10.3791/51455.
10. Ingolia N.T., Ghaemmaghami S., Newman J.R.S., Weissman J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009; 324:218–223.
11. Liang S., Bellato H.M., Lorent J., Lupinacci F.C.S., Oertlin C., van Hoef V., Andrade V.P., Roffé M., Masvidal L., Hajj G.N.M. et al. Polysome-profiling in small tissue samples. Nucleic Acids Res. 2017; 46:e3.
12. Gandin V., Masvidal L., Hulea L., Gravel S.-P., Cargnello M., McLaughlan S., Cai Y., Balanathan P., Morita M., Rajakumar A. et al. nanoCAGE reveals 5′ UTR features that define specific modes of translation of functionally related MTOR-sensitive mRNAs. Genome Res. 2016; 26:636–648.
13. Ingolia N.T. Ribosome profiling: new views of translation, from single codons to genome scale. Nat. Rev. Genet. 2014; 15:205–213.
14. Masvidal L., Hulea L., Furic L., Topisirovic I., Larsson O. mTOR-sensitive translation: Cleared fog reveals more trees. RNA Biol. 2017; 14:1299–1305.
15. Gerashchenko M.V., Gladyshev V.N. Translation inhibitors cause abnormalities in ribosome profiling experiments. Nucleic Acids Res. 2014; 42:e134.
16. O’Connor P.B.F., Andreev D.E., Baranov P.V. Comparative survey of the relative impact of mRNA features on local ribosome profiling read density. Nat. Commun. 2016; 7:12915.
17. Larsson O., Sonenberg N., Nadon R. anota: Analysis of differential translation in genome-wide studies. Bioinformatics. 2011; 27:1440–1441.
18. Schleifer S.J., Eckholdt H.M., Cohen J., Keller S.E. Analysis of partial variance (APV) as a statistical approach to control day to day variation in immune assays. Brain. Behav. Immun. 1993; 7:243–252.
19. Wright G.W., Simon R.M. A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics. 2003; 19:2448–2455.
20. Larsson O., Sonenberg N., Nadon R. Identification of differential translation in genome wide studies. Proc. Natl. Acad. Sci. U.S.A. 2010; 107:21487–21492.
21. Pearson K. Mathematical contributions to the theory of Evolution.–On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc. R. Soc. London. 1896; 60:489–498.
22. Xiao Z., Zou Q., Liu Y., Yang X. Genome-wide assessment of differential translations with ribosome profiling data. Nat. Commun. 2016; 7:11194.
23. Olshen A.B., Hsieh A.C., Stumpf C.R., Olshen R.A., Ruggero D., Taylor B.S. Assessing gene-level translational control from ribosome profiling. Bioinformatics. 2013; 29:2995–3002.
24. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550.
25. Rapino F., Delaunay S., Rambow F., Zhou Z., Tharun L., De Tullio P., Sin O., Shostak K., Schmitz S., Piepers J. et al. Codon-specific translation reprogramming promotes resistance to targeted therapy. Nature. 2018; 558:605–609.
26. Dever T.E., Ivanov I.P. Roles of polyamines in translation. J. Biol. Chem. 2018; 293:18719–18729.
27. Matsufuji S., Matsufuji T., Miyazaki Y., Murakami Y., Atkins J.F., Gesteland R.F., Hayashi S. Autoregulatory frameshifting in decoding mammalian ornithine decarboxylase antizyme. Cell. 1995; 80:51–60.
28. Law G.L., Raney A., Heusner C., Morris D.R. Polyamine regulation of ribosome pausing at the upstream open reading frame of S-adenosylmethionine decarboxylase. J. Biol. Chem. 2001; 276:38036–38043.
29. Li G.-W., Burkhardt D., Gross C., Weissman J.S. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014; 157:624–635.
30. Lalanne J.-B., Taggart J.C., Guo M.S., Herzel L., Schieler A., Li G.-W. Evolutionary convergence of pathway-specific enzyme expression stoichiometry. Cell. 2018; 173:749–761.
31. Kustatscher G., Grabowski P., Rappsilber J. Pervasive coexpression of spatially proximal genes is buffered at the protein level. Mol. Syst. Biol. 2017; 13:937.
32. McManus C.J., May G.E., Spealman P., Shteyman A. Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res. 2014; 24:422–430.
33. Artieri C.G., Fraser H.B. Evolution at two levels of gene expression in yeast. Genome Res. 2014; 24:411–421.
34. Cenik C., Cenik E.S., Byeon G.W., Grubert F., Candille S.I., Spacek D., Alsallakh B., Tilgner H., Araya C.L., Tang H. et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 2015; 25:1610–1621.
35. Perl K., Ushakov K., Pozniak Y., Yizhar-Barnea O., Bhonker Y., Shivatzki S., Geiger T., Avraham K.B., Shamir R. Reduced changes in protein compared to mRNA levels across non-proliferating tissues. BMC Genomics. 2017; 18:305.
36. Guan B.J., van Hoef V., Jobava R., Elroy-Stein O., Valasek L.S., Cargnello M., Gao X.H., Krokowski D., Merrick W.C., Kimball S.R. et al. A unique ISR program determines cellular responses to chronic stress. Mol. Cell. 2017; 68:885–900.
37. Hsieh A.C., Liu Y., Edlind M.P., Ingolia N.T., Janes M.R., Sher A., Shi E.Y., Stumpf C.R., Christensen C., Bonham M.J. et al. The translational landscape of mTOR signalling steers cancer initiation and metastasis. Nature. 2012; 485:55–61.
38. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013; 41:D991–D995.
39. Law C.W., Chen Y., Shi W., Smyth G.K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014; 15:R29.
40. Parent R., Kolippakkam D., Booth G., Beretta L. Mammalian target of rapamycin activation impairs hepatocytic differentiation and targets genes moderating lipid homeostasis and hepatocellular growth. Cancer Res. 2007; 67:4337–4345.
41. Kolesnikov N., Hastings E., Keays M., Melnichuk O., Tang Y.A., Williams E., Dylag M., Kurbatova N., Brandizi M., Burdett T. et al. ArrayExpress update-simplifying data submissions. Nucleic Acids Res. 2015; 43:D1113–D1116.
42. Carvalho B.S., Irizarry R.A. A framework for oligonucleotide microarray preprocessing. Bioinformatics. 2010; 26:2363–2367.
43. Khajuria R.K., Munschauer M., Ulirsch J.C., Fiorini C., Ludwig L.S., McFarland S.K., Abdulhay N.J., Specht H., Keshishian H., Mani D.R. et al. Ribosome levels selectively regulate translation and lineage commitment in human hematopoiesis. Cell. 2018; 173:90–103.
44. O’Leary N.A., Wright M.W., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B., Robbertse B., Smith-White B., Ako-Adjei D. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016; 44:D733–D745.
45. Robles J.A., Qureshi S.E., Stephen S.J., Wilson S.R., Burden C.J., Taylor J.M. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics. 2012; 13:484.
46. Soneson C., Delorenzi M. A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics. 2013; 14:91.
47. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B. 1995; 57:289–300.
48. Zhong Y., Karaletsos T., Drewe P., Sreedharan V.T., Kuo D., Singh K., Wendel H.-G., Rätsch G. RiboDiff: detecting changes of mRNA translation efficiency from ribosome footprints. Bioinformatics. 2017; 33:139–141.
49. Sing T., Sander O., Beerenwinkel N., Lengauer T. ROCR: Visualizing classifier performance in R. Bioinformatics. 2005; 21:3940–3941.
50. Noble W.S. How does multiple testing correction work. Nat. Biotechnol. 2009; 27:1135–1137.
51. Colman H., Le Berre-Scoul C., Hernandez C., Pierredon S., Bihouée A., Houlgatte R., Vagner S., Rosenberg A.R., Féray C. Genome-wide analysis of host mRNA translation during hepatitis C virus infection. J. Virol. 2013; 87:6668–6677.
52. Sims D., Sudbery I., Ilott N.E., Heger A., Ponting C.P. Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet. 2014; 15:121–132.
53. Ozsolak F., Milos P.M. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 2011; 12:87–98.
54. Leek J.T., Scharpf R.B., Bravo H.C., Simcha D., Langmead B., Johnson W.E., Geman D., Baggerly K., Irizarry R.A. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 2010; 11:733–739.
55. Topisirovic I., Sonenberg N. mRNA translation and energy metabolism in cancer: the role of the MAPK and mTORC1 pathways. Cold Spring Harb. Symp. Quant. Biol. 2011; 76:355–367.
56. Bramham C.R., Wells D.G. Dendritic mRNA: transport, translation and function. Nat. Rev. Neurosci. 2007; 8:776–789.
57. Andreev D.E., O’Connor P.B.F., Loughran G., Dmitriev S.E., Baranov P.V., Shatsky I.N. Insights into the mechanisms of eukaryotic translation gained with ribosome profiling. Nucleic Acids Res. 2016; 45:513–526.
58. Ingolia N.T. Ribosome footprint profiling of translation throughout the genome. Cell. 2016; 165:22–33.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
PIXUL-ChIP: integrated high-throughput sample preparation and analytical platform for epigenetic studies
Bomsztyk, Karol; Mar, Daniel; Wang, Yuliang; Denisenko, Oleg; Ware, Carol; Frazar, Christian D; Blattler, Adam; Maxwell, Adam D; MacConaghy, Brian E; Matula, Thomas J
doi: 10.1093/nar/gkz222 pmid: 30927002
Abstract Chromatin immunoprecipitation (ChIP) is the most widely used approach for identification of genome-associated proteins and their modifications. We have previously introduced a microplate-based ChIP platform, Matrix ChIP, where the entire ChIP procedure is done on the same plate without sample transfers. Compared to conventional ChIP protocols, the Matrix ChIP assay is faster and has increased throughput. However, even with microplate ChIP assays, sample preparation and chromatin fragmentation (which is required to map genomic locations) remain a major bottleneck. We have developed a novel technology (termed ‘PIXUL’) utilizing an array of ultrasound transducers for simultaneous shearing of samples in standard 96-well microplates. We integrated PIXUL with Matrix ChIP (‘PIXUL-ChIP’), which allows for fast, reproducible, low-cost and high-throughput sample preparation and ChIP analysis of 96 samples (cell culture or tissues) in one day. Further, we demonstrated that chromatin prepared using PIXUL can be used in an existing ChIP-seq workflow. Thus, the high-throughput capacity of PIXUL-ChIP provides the means to carry out ChIP-qPCR or ChIP-seq experiments involving dozens of samples. Given the complexity of epigenetic processes, the use of PIXUL-ChIP will advance our understanding of these processes in health and disease, as well as facilitate screening of epigenetic drugs. INTRODUCTION The chromatin immunoprecipitation (ChIP) assay, a widely used approach for identifying histone modifications and genome-associated proteins, is one of the most powerful tools to study transcription and epigenetic processes (1–6). We have previously developed a high-throughput microplate ChIP assay, Matrix ChIP, which speeds up the analytical process, dramatically increases the assay's throughput, and provides superior sensitivity and reproducibility as compared to other protocols (7–9).
Although the introduction of Matrix ChIP and other high-throughput ChIP platforms (10,11) was a major improvement, their utility was limited by the low throughput and efficiency of the existing methods for chromatin sample preparation. The most common approach used for chromatin fragmentation is ultrasound treatment. Ultrasound waves transmitted into liquids generate cycles of alternating high pressure (compression) and low pressure (rarefaction), with rates governed by the applied frequency. The rarefaction phase creates cavitation, in which vapor and/or gas bubbles expand and then collapse violently. Cavitation in liquids has many applications including chromatin sample preparation for ChIP (12). Enzymatic digestion is alternatively used for chromatin fragmentation (6) but conditions vary depending on the application (13). Enzymatic digestion may also require ultrasonic pre-treatment (14,15), especially for tissues. Recently, targeted in situ enzyme-based genome-wide profiling methods have been introduced but these use un-fixed chromatin (16,17). There are a number of different commercially available ultrasound instruments that use cavitation to shear chromatin, including microtip probes, horns, and water bath-based (including microplate-based) instruments. As none of these sonication instruments can be directly applied to culture plates, cells must be harvested and transferred to tubes or plates, which is inefficient and results in sample losses. Further, the commonly used Covaris sonicators use expensive tubes or 96-well plates that cost more than $400/plate. To match the high-throughput capacity of microplate ChIP analytical platforms (8,9,11,18), we developed an instrument, PIXUL, that consists of an array of ultrasound transducers that shear chromatin in all wells of an off-the-shelf, low-cost 96-well plate (∼$2/plate). We integrated PIXUL with ChIP (PIXUL-ChIP) for high-throughput transcription and epigenetic analysis of cultured cells and tissues.
We also provide examples showing that PIXUL has the potential to be used as a multipurpose sample preparation platform. METHODS PIXUL instrument components (Supplementary Figure S1) Ultrasound treatment system PIXUL is custom-built and comprises the following main parts: (i) a transducer-lens assembly capable of focusing ultrasound in each well of a 96-well microplate, (ii) a high-power amplifier to drive the transducer array, (iii) a Peltier cooling system to reduce heating of the samples and (iv) a computer to control the ultrasound pulse parameters (number of cycles, treatment configurations and treatment time) (Supplementary Figure S1). The transducer array is composed of flat lead-zirconate-titanate (PZT) ceramic bar segments bonded to the base of a 96-element lens array such that each lens focuses acoustic energy into an individual well of the microplate (patent pending, WO 20170205318). The operating frequency is approximately 2 MHz. The transducers are driven with a high-voltage pulse from the amplifier. The bonded lens focuses the ultrasound, creating intense cavitation in the sample fluid as well as vigorous mixing during sonication. In free field, each focused transducer element produces up to 30 MPa peak positive (–12 MPa peak negative) pressure. The amplifier is a purpose-built multi-channel high-power amplifier (19) capable of applying sufficient voltage to the transducer array to generate the required intense cavitation in the sample wells. The amplifier system consists of an FPGA (field-programmable gate array) timing board, high-voltage switching boards, and an external power supply. The timing board controls the ultrasound pulsing parameters, which are programmed through USB using MATLAB software (MathWorks, Natick, MA, USA) on a standard computer. Matching networks are included between the output of the amplifier and transducer to maximize power transfer.
The timing board operates the high-voltage switching boards to create a high-voltage signal that is applied to the transducers. The Peltier-cooled liquid (water and glycerol mixture) flows (at ∼40 ml/min) between the transducer/lens array and the bottom of the microplate and acts to couple the ultrasound to the microplate samples, and also to reduce sample heating. Cell lines and treatment Cells were grown in round-bottom 96-well polystyrene plates. Human HCT116 colon carcinoma and human HEK293 kidney cell lines were grown (∼200 000/well) in DMEM supplemented with glutamine, penicillin, streptomycin and 10% fetal bovine serum. For time-point experiments, cells were serum-deprived (0.1% FBS) overnight and at specific time points were treated with either 10% FBS or 12-O-tetradecanoylphorbol 13-acetate (TPA) at 10⁻⁷ M. Cell cross-linking, harvesting and sonication All PIXUL steps were done directly in 96-well culture plates without sample transfers. Cells were cross-linked by adding 100 μl 1% formaldehyde in PBS to overlaying media, and the plate was shaken for 15 s. Cross-linking was done for 20 min at room temperature (RT). Supernatant was removed, 200 μl PBS/glycine (125 mM) was added for 5 min at RT, and the wells were washed with 200 μl PBS (8,9,20). After removing PBS, wells were filled with 100 μl chromatin shearing buffer, and plates were sealed with PCR film (MiniAMP Optical Adhesive Film (Applied Biosystems, #4311971)) and placed in PIXUL for shearing. PIXUL was programmed to sonicate (50 cycles, 550 Pulse Repetition Frequency (PRF), 40 W) one pair of columns (8 + 8 wells) for 10 s and then electronically skip to the next pair (the plate is not mechanically moved). After all six pairs of columns were sheared, this sequence was repeated 24–36 times (total ON shearing time 4–6 min/well, 18–36 min/plate). Comparison of Covaris LE220 instrument with PIXUL was done using HCT116 cells cultured in 96-well plates as follows.
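The timing implied by the PIXUL shearing program above can be checked with a little arithmetic. The Python sketch below assumes that the PRF value of 550 is in pulses per second and uses the approximate 2 MHz operating frequency given for the transducers; the function and parameter names are illustrative and are not part of the PIXUL control software.

```python
def shearing_schedule(repeats, burst_s=10.0, column_pairs=6):
    """Return (ON minutes per well, minutes per plate) for one program.

    repeats:      number of passes over the whole plate (24-36 in the text).
    burst_s:      seconds of sonication per pair of columns per pass.
    column_pairs: pairs of columns on a 96-well plate (12 columns / 2).
    """
    on_per_well_min = repeats * burst_s / 60.0
    # Only one pair of columns is sonicated at a time, so the plate time
    # is the per-well ON time multiplied by the number of column pairs.
    per_plate_min = on_per_well_min * column_pairs
    return on_per_well_min, per_plate_min


def duty_cycle(cycles_per_pulse=50, frequency_hz=2.0e6, prf_hz=550.0):
    """Fraction of a burst during which the transducer actually emits.

    Assumes PRF is given in Hz (an assumption; the text gives no unit).
    """
    pulse_length_s = cycles_per_pulse / frequency_hz  # 50 cycles at 2 MHz = 25 microseconds
    return pulse_length_s * prf_hz


for repeats in (24, 36):
    well_min, plate_min = shearing_schedule(repeats)
    print(f"{repeats} repeats: {well_min:.0f} min ON per well, {plate_min:.0f} min per plate")

print(f"duty cycle within a burst: about {duty_cycle() * 100:.1f}%")
```

Under these assumptions, 24 and 36 repetitions give 4 and 6 min of ON time per well and 24 and 36 min per plate, and the transducers emit for roughly 1.4% of each burst. By the same arithmetic, the 18 min lower bound quoted for the plate would correspond to 18 repetitions (3 min per well). The steps of the Covaris LE220 comparison introduced above are described next.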
After cross-linking (as above), cells from three wells were combined into one sample (one well was not sufficient for ChIP-qPCR) for shearing in either a Covaris microplate or tubes, with total time 5 min/column (8 samples) (200 cycles, Duty Factor 15%, 450 W). In-well temperature was monitored with a small-size thermocouple (52II Thermometer, FLUKE). In-well start temperature with PIXUL was 4°C and slowly increased to 24°C at the end of the treatment (tank temperature remained at ∼14°C). In-well start temperature with Covaris was 7°C, increasing within 1 min to 20°C and reaching 22°C by the end of the 5 min treatment (tank temperature remained at ∼4°C). Comparative studies using Bioruptor were done in 0.5 ml microfuge tubes as previously described (8,9). Bioruptor in-tube temperature could not be monitored, but tank starting temperature was 4°C and was maintained at <25°C during the run by stopping the treatment and letting the circulating water chill (circulating through an ice bucket). Sonicated chromatin generated with each instrument was assessed by agarose gel electrophoresis and Matrix ChIP analyses. Mouse tissue cross-linking and sonication Female and male 12-week-old WT (C57BL/6) mice were used. Mice were euthanized by isoflurane overdose followed by cervical dislocation. Hearts, kidneys, livers and lungs were recovered, flash-frozen and stored at −80°C. All procedures were done in accordance with current NIH guidelines and approved by the Animal Care and Use Committee of the University of Washington. Small tissue fragments (30–50 mg) cut from frozen organs were added to 1.5 ml centrifuge tubes containing 0.5 ml 1% formaldehyde in PBS and briefly homogenized with a loose pestle motor mixer. After 20 min of cross-linking, formaldehyde was replaced with 0.5 ml 125 mM glycine/PBS for 5 min to quench the reaction, followed by a PBS wash.
Tissue samples from all these organs were then resuspended in 100 μl chromatin shearing buffer and added to wells of the 96-well plate for sonication in PIXUL using the same protocol as for cell culture. Matrix ChIP: multiplex microplate-based chromatin immunoprecipitation The multiplex microplate Matrix ChIP method was previously described (8,9). Briefly, ChIP assays were done using protein A-coated 96-well polypropylene microplates (8). 1 μl of isolated DNA was used in 2 μl real-time qPCR reactions (done in 384-well plates using ABI7900HT). All PCR reactions were run in quadruplicate using Sybr green. PCR calibration curves were generated for each primer pair from a dilution series of total mouse or human genomic DNA. The PCR primer efficiency curve was fit to cycle threshold (Ct) versus log(genomic DNA concentration) using an r-squared best fit. DNA concentration values for each ChIP and input chromatin DNA sample were calculated from their respective average Ct values. Final results are expressed as fraction of input DNA (9). The list of ChIP antibodies is shown in Supplementary Table S1 and PCR primers in Supplementary Tables S2 and S3. RNA extraction and cDNA synthesis RNA was isolated using TRIzol as per manufacturer's protocol. To synthesize cDNA, 400 ng of TRIzol-extracted total RNA was reverse transcribed with SuperScript IV (Invitrogen, 18090050), 0.2 mM dNTP (GeneScript, 95040-880) and oligo dT primers (IDT) in 10 μl reactions in 96-well microplates. RT reactions were diluted 100-fold prior to running qPCR. RT-qPCR primers are listed in Supplementary Tables S2 and S3. For RNA extraction using PIXUL, small tissue fragments were added to wells of 96-well plates containing 100 μl TRIzol, and samples were treated using the same parameters as for chromatin shearing (50 cycles, 550 PRF, 40 W for 1 min/column of one organ). After PIXUL treatment, the rest of the procedure was the same as for the standard TRIzol and RT-qPCR protocol.
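The calibration and normalization described for Matrix ChIP above can be sketched in three steps: fit Ct against log10 of genomic DNA concentration for a dilution series, invert the fit to obtain DNA amounts from average Ct values, and express the ChIP signal as a fraction of input. The Python sketch below uses made-up dilution data and hypothetical function names. It is not the PCRCrunch implementation, and it assumes that ChIP and input samples are measured at comparable dilutions.

```python
import math


def fit_standard_curve(concentrations, cts):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def ct_to_concentration(ct, slope, intercept):
    """Invert the standard curve to get a DNA concentration from a Ct."""
    return 10 ** ((ct - intercept) / slope)


def fraction_of_input(chip_cts, input_cts, slope, intercept):
    """ChIP DNA divided by input DNA, each from the average replicate Ct."""
    chip_ct = sum(chip_cts) / len(chip_cts)
    input_ct = sum(input_cts) / len(input_cts)
    chip_dna = ct_to_concentration(chip_ct, slope, intercept)
    input_dna = ct_to_concentration(input_ct, slope, intercept)
    return chip_dna / input_dna


# Made-up 10-fold genomic DNA dilution series (arbitrary units) and Ct values.
standards = [10.0, 1.0, 0.1, 0.01]
standard_cts = [20.0, 23.32, 26.64, 29.96]
slope, intercept, r_squared = fit_standard_curve(standards, standard_cts)

# Made-up quadruplicate Ct values for one ChIP sample and its input.
chip_cts = [26.0, 26.1, 25.9, 26.0]
input_cts = [22.7, 22.6, 22.8, 22.7]
result = fraction_of_input(chip_cts, input_cts, slope, intercept)
print(f"slope={slope:.2f}, r^2={r_squared:.3f}, fraction of input={result:.3f}")
```

With these illustrative numbers the ChIP sample amounts to roughly one tenth of the input. Working from the fitted curve rather than from an assumed doubling per cycle lets each primer pair carry its own amplification efficiency.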
PIXUL-ChIP-seq HCT116 cells were grown to a density of ∼200 000 cells per well, cross-linked, quenched and sonicated using 96-well PIXUL for 6 min per well in 100 μl. ChIP and library preparation were done using Low Cell ChIP-Seq Kit (Catalog number 53084, Active Motif, Carlsbad, CA, USA). Libraries were sequenced on a NextSeq 500 as PE75 with dual 8 bp indexing, allowing for PCR de-duplication with molecular identifiers at the i5 position (as per product manual, https://www.activemotif.com/documents/2073.pdf). ChIP-seq data was aligned to hg19 using BWA (version 0.7.12) (21). For each of the seven PIXUL-ChIP-seq data sets, corresponding ENCODE BAM files were downloaded from the ENCODE website (https://www.encodeproject.org) (22). If there were multiple experiments for the same epitope from different labs, we chose the one from the Bernstein Lab. Peaks were called using MACS2 (23) with the parameters --broad --broad-cutoff 0.1. Low-quality peaks were filtered out as follows: for H3K4m1, H3K27m3 and H3K27Ac, peaks with q-value <1e-3 and fold enrichment >2 compared to input were kept. For all other histone marks, peaks with q-value <1e-10 and fold enrichment >5 were kept. Bedtools (24) (https://bedtools.readthedocs.io/) was used to identify peaks overlapping between the PIXUL-ChIP-seq dataset and ENCODE. Specialized source code generated for ChIP-seq analyses is available at https://github.com/yuliangwang/PIXUL_ChIP. PIXUL-ChIP-seq data can be viewed in a genome browser at https://tinyurl.com/y9sap4qd. qPCR data To acquire, store and analyze large qPCR data sets generated by the high-throughput Matrix ChIP platform, we used our previously developed graphical tool, PCRCrunch (7). Pair-wise statistically significant differences are represented on graphs by the size of a circle for each comparison, with a small circle representing P < 0.05, a large circle indicating P < 0.01 and no circle implying P > 0.05.
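The circle convention just described amounts to a simple threshold rule, sketched below in Python. The function name and return values are hypothetical and this is not PCRCrunch code. The text does not say how a P-value of exactly 0.05 is drawn, so treating it as no circle is an assumption of the sketch.

```python
def circle_for_p_value(p_value):
    """Map a pairwise P-value to the circle size shown on a graph.

    Returns "large", "small" or None (no circle).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value < 0.01:
        return "large"
    if p_value < 0.05:
        return "small"
    # P >= 0.05: no circle. The boundary case P == 0.05 is an assumption.
    return None


for p in (0.005, 0.03, 0.2):
    print(p, circle_for_p_value(p))
```

Checking the stricter threshold first matters, because every P-value below 0.01 is also below 0.05.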
PCRCrunch uses a two-tailed Student's t-test to compute P-values (7).

Agarose gel electrophoresis image processing software tool

MATLAB 2017 with the Signal Processing and Curve Fitting Toolboxes was used. The program uses a gel electrophoresis image to quantify both the relative concentration and the base-pair length of the DNA band in each well. First, the program converts the original image to grayscale and resizes it to a linear scale (using the base-pair ladder from the gel for calibration). Second, it goes through each well of the modified image and plots the normalized signal intensity as a best-fit curve, providing both the mean base-pair length and the percentage of signal that falls between 200 and 600 base pairs in length (the target shearing size range). Finally, the program produces a waterfall plot that contains the best-fit curves in sequential order (lanes 1–12) to compare the relative shapes and intensities of the DNA bands. The code has been deposited and is available at https://github.com/kbomsztyk/Agarose-Gel-Electrophoresis-Image-Processing.

The DNA fragment size distribution measured by the agarose gel electrophoresis method was compared to two commercial systems, Agilent Bioanalyzer (Agilent 2100) and Fragment Analyzer (formerly Advanced Analytical, now marketed by Agilent) (Supplementary Figure S2). The Bioanalyzer analyzes biomolecules as they are electrophoresed through a microchannel in a glass chip that is primed with a gel/dye mix specific for the particular biomolecules being analyzed. The Fragment Analyzer separates biomolecules by capillary electrophoresis. The mean size measured with the agarose gel system was ∼25 bp smaller than that measured using Agilent Bioanalyzer but 40 bp larger than that measured with Fragment Analyzer. (The difference between the two Agilent instruments was ∼70 bp) (Supplementary Figure S2F).
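The two per-lane quantities reported by the gel image tool (mean fragment size and the percentage of signal within 200–600 bp) can be illustrated with a simplified Python sketch. It is not the deposited MATLAB code: it skips the curve-fitting step, works on a one-dimensional lane intensity profile that is assumed to be already extracted and background-subtracted, and assumes that log10(fragment size) varies piecewise-linearly with migration distance between ladder bands. All names and numbers are hypothetical.

```python
import math


def make_size_calibration(ladder_pixels, ladder_bp):
    """Return a function mapping pixel position along the lane to size in bp.

    Assumes log10(size) is piecewise-linear in migration distance between
    neighbouring ladder bands, and that ladder band positions are distinct.
    """
    points = sorted(zip(ladder_pixels, (math.log10(bp) for bp in ladder_bp)))

    def pixel_to_bp(pixel):
        # Outside the ladder range, clamp to the nearest ladder band.
        if pixel <= points[0][0]:
            return 10 ** points[0][1]
        if pixel >= points[-1][0]:
            return 10 ** points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= pixel <= x1:
                frac = (pixel - x0) / (x1 - x0)
                return 10 ** (y0 + frac * (y1 - y0))

    return pixel_to_bp


def summarize_lane(profile, pixel_to_bp, low_bp=200, high_bp=600):
    """Intensity-weighted mean size (bp) and percent of signal in [low_bp, high_bp]."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("lane profile contains no signal")
    sizes = [pixel_to_bp(i) for i in range(len(profile))]
    mean_bp = sum(s * v for s, v in zip(sizes, profile)) / total
    in_range = sum(v for s, v in zip(sizes, profile) if low_bp <= s <= high_bp)
    return mean_bp, 100.0 * in_range / total


# Hypothetical ladder: bands of 1000, 500 and 100 bp at pixels 10, 20 and 50.
to_bp = make_size_calibration([10, 20, 50], [1000, 500, 100])

# Hypothetical lane profile with a smear centred between the 500 and 100 bp bands.
lane = [0.0] * 60
for pixel, intensity in [(22, 1.0), (26, 3.0), (30, 4.0), (34, 2.0), (40, 0.5)]:
    lane[pixel] = intensity
print(summarize_lane(lane, to_bp))
```

Weighting each pixel by its raw intensity is a deliberate simplification; the tool described above first rescales the image to a linear base-pair axis and fits a curve, which can shift the numbers somewhat.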
The agarose gel fragment distribution between gDNA replicates was close to those measured with Fragment Analyzer and Agilent Bioanalyzer (Supplementary Figure S2). These comparisons show that the agarose gel electrophoresis system is well suited to analyzing the size distribution of sheared DNA fragments, provided that they can be visualized on a gel with a camera.

MATERIALS

Proteinase K (25530-015) was from Invitrogen. Bovine serum albumin (BSA, A9647), salmon sperm DNA (D1626), transfer RNA (tRNA, MRE600), protein A (P7837), leupeptin (L2884), β-glycerophosphate (G6251), sodium fluoride (NaF, S1504), sodium orthovanadate (Na3VO4, S6508), phenylmethylsulfonyl fluoride (PMSF, P7626), dithiothreitol (DTT, D0632), p-nitrophenyl phosphate di(tris) salt (N3254), sodium molybdate dihydrate (Na2MoO4 • 2H2O, S-6646), EDTA (E3134) and Tris–HCl (T3253) were from Sigma. Sodium chloride (NaCl, S-271-3) and Triton X-100 (BP151) were from Fisher. Formaldehyde (28908) was from ThermoFisher. NP40 (198596) was from MP Biomedicals. McCoy's medium (SH3020001) and Dulbecco's Modified Eagle Medium (DMEM, SH30021.0) were from HyClone, penicillin/streptomycin (P/S, 15749) from Invitrogen, fetal bovine serum (FBS, 43635-500) from Jr. Scientific, and phosphate buffered saline (PBS, 70013-032) and TRIzol (15596018) from Life Technologies. Labware and kit catalog numbers, commercial suppliers and costs are listed in Supplementary Table S4.

RESULTS AND DISCUSSION

Chromatin and DNA ultrasound treatment in PIXUL

PIXUL development aimed to produce an array of ultrasound transducers with identical performance across all 96 wells that uses low-cost, off-the-shelf consumables. To estimate ultrasound treatment efficiency without the confounding effect of DNA crosslinking, we first used purified salmon DNA, which is readily available in large quantities. 100 μl of salmon DNA at 100 ng/μl was aliquoted into each one of the 96 wells of two replicate plates.
After sealing wells with tape, the plates were treated with ultrasound in PIXUL (total time 36 min per plate). Sheared DNA fragments were assessed by agarose gel electrophoresis and ethidium bromide staining. Gel images were analyzed using the in-house-developed MATLAB-based agarose gel electrophoresis image analysis software tool as described above (Methods). Figure 1 demonstrates similar size distribution of DNA in all 96 wells. Across all wells, the average size of the sheared fragments was 307 ± 35 bp and, on average, 74.6 ± 3.7% of the band fragments were within the 200–600 bp size range (mean ± SDEV, n = 3, 96-well plates) (Figure 1E).

Figure 1. PIXUL shearing of DNA in 96-well plates. (A) Shearing was performed in 96-well plates (with each well containing salmon DNA at 100 ng/μl in 100 μl volume/well) for a total treatment time of 36 min per plate. (B) Agarose gel electrophoresis of DNA fragments; gels were stained with ethidium bromide. DNA ladder was run in the first lane of each gel. Numbers to the left of the gels show sizes of selected ladder bands in base pairs (bp). (C) An example to illustrate a waterfall plot (MATLAB) with annotated axes. Image software was used to analyze stained DNA bands (Methods). Results represent best-fit curves in sequential order of samples from PIXUL plate column wells 1 to 12. X-axis: band size in base pairs (Size (bp)). Y-axis: sample from a well of a given column (columns 1–12). Z-axis: relative signal intensity of DNA bands for given plate well (Signal). (D) Waterfall plots for each plate row (rows A through H). (E) Graphs represent band fraction in the 200–600 bp range from each one of the 96 wells (mean ± SDEV, n = 3 experiments). These results demonstrate consistent DNA shearing across all wells of a 96-well plate.

Next, we tested the efficiency of chromatin shearing in HCT116 cells that were cultured and crosslinked in 96-well plates. The cells were washed with PBS in the wells, followed by the addition of shearing buffer. Plates were then sealed and processed using PIXUL. Sheared samples were treated with proteinase K and, after reversal of crosslinking, sizes of DNA fragments were assessed by agarose gel electrophoresis and ethidium bromide staining (Figure 2). Across all wells, the average size of the sheared fragments was 313 ± 56 bp, and on average, 74.7 ± 3.3% of the band fragments were within the 200–600 bp size range (mean ± SDEV, n = 4, 96-well culture plates) (Figure 2E).

Figure 2. PIXUL shearing of chromatin directly in 96-well plate cell cultures. (A) HCT116 cell cultures grown in 96-well plates were crosslinked directly in plates followed by glycine quenching.
After PBS wash, shearing buffer was added. Plates were then sealed and treated in PIXUL (total time 36 min per plate). After digestion with proteinase K and reversal of crosslinking, sheared DNA fragments were resolved by agarose gel electrophoresis. (B) Agarose gels were stained with ethidium bromide. DNA ladder was run in the first lane of each gel. Numbers to the left of the gels show sizes of selected ladder bands in base pairs (bp). (C) An example to illustrate a waterfall plot (MATLAB) with annotated axes. Image software was used to analyze stained DNA bands (Methods). Results are shown as waterfall plots (MATLAB) of best-fit curves in sequential order of samples from culture plate column wells 1 to 12. X-axis: band size in base pairs (size (bp)). Y-axis: sample from a well of a given column (columns 1 through 12). Z-axis: relative signal intensity of bands for given plate well (Signal). (D) Waterfall plots for each plate row (rows A through H). (E) Graphs represent band fraction in the 200–600 bp range from each one of the 96 wells (mean ± SDEV, n = 4 experiments). These results show that a 96-well plate culture can be directly sonicated with PIXUL, avoiding the sample transfer step and yielding consistent chromatin fragmentation across all 96 wells.

Although microplates are sealed with adhesive films, cross-contamination between wells during sonication remains a concern. To test for leaks, a 96-well plate was loaded in a checkerboard fashion with either human (‘human wells’, blue) or mouse (‘mouse wells’, green) genomic DNA (Figure 3A). The sealed plate was treated with PIXUL (18 min), and DNA in each well was analyzed by qPCR using either human or mouse primers (Figure 3B). No human DNA was detected in ‘mouse wells’ and no mouse DNA was detected in ‘human wells.’ Thus, these results show that the seal is tight enough to prevent cross-contamination between wells (Figure 3B). We used only one human and one mouse primer pair; thus, we might have overlooked contamination that could be detected by a more sensitive and general evaluation (e.g. DNA sequencing).

Figure 3. Across 96-well plate contamination test. (A) Human and mouse genomic DNA (10 ng/μl in 100 μl volume) were loaded into a 96-well plate in a checkerboard layout.
After sealing with a film adhesive, the plate was treated with PIXUL (18 min per plate), and DNA in each well was assessed using human (EGR1) and mouse (Tnfa) primers in qPCR. (B) Results of qPCR analysis with human (left panel) and mouse (right panel) primers for each one of the 96 wells (rows A–H and columns 1–12). Bars (blue or green) in the graphs show relative human and mouse DNA concentrations (each well/average non-zero concentration across the entire plate; scale shown 0.0–1.0). These results demonstrate that there is no detectable (not different from 0.0) cross-contamination across wells.

PIXUL sample preparation combined with Matrix ChIP into an integrated platform for high-throughput chromatin analysis, PIXUL-ChIP

Cell cultures are frequently used to study transcription and epigenetic processes. Many studies are done in 96-well culture plates, which makes the harvesting of cells for epigenetic studies unreliable and tedious. To test the usefulness of PIXUL-generated chromatin in the Matrix ChIP assay, we used a well-characterized model system.
We have previously shown that serum added to serum-starved HCT116 cultures activates gene expression and induces recruitment of RNA polymerase II (Pol II) to the EGR1 locus (25). In ChIP assays, we used Pol II 4H8 monoclonal antibody that recognizes phosphorylated and un-phosphorylated C-terminal domain (CTD) (25,26). This system was utilized to develop a protocol for integrating PIXUL with the Matrix ChIP assay, PIXUL-ChIP. Cells were seeded in 96-well plates in 10% FBS. After reaching near confluence, culture media was replaced with 0.1% FBS to render the culture quiescent. Under these conditions, cells can be maintained quiescent for up to a week, ready for testing when needed. Cell cultures were activated with 10% FBS for 0, 5, 15 and 30 min (Figure 4A). After completion of the time-course serum induction, cells in all 96 wells were cross-linked with formaldehyde, lysis buffer was added, and the plate was treated with ultrasound in PIXUL. 4 μl of sheared chromatin (equivalent to ∼2000 cells) from each well was used in one Matrix ChIP reaction to assess Pol II levels at the inducible EGR1 and constitutive UBE2b loci. The intragenic region 15kb upstream of the EGR1 gene was used as negative control. The 96 chromatin samples were used in two Matrix ChIP plates (PIXUL rows A–D in Matrix ChIP plate 1 and PIXUL rows E–H in Matrix ChIP plate 2) to run 48 inputs (which is DNA isolated from whole cell extracts) and 48 Pol II ChIPs on each plate. ChIP DNA was assessed by qPCR, and Pol II levels were calculated as a fraction of input as previously described (7). Figure 4A shows the layout of a 96-well culture plate treated with PIXUL. The results demonstrate inducible recruitment of Pol II to the EGR1 locus, with the kinetics and amplitude similar in each one of the six quadrants (Figure 4B). There were no changes in Pol II recruitment in response to serum at the constitutively expressed UBE2b gene, with levels that were similar across all six quadrants, Figure 4C. 
As expected, Pol II levels were low at the intragenic site 15 kb upstream of the EGR1 locus (Figure 4B and C).

Figure 4. PIXUL-ChIP-qPCR analysis of Pol II recruitment kinetics to the EGR1 locus in serum-treated 96-well HCT116 culture. Serum-deprived HCT116 96-well cultures were treated with serum for 0, 5, 15 and 30 min. Cells were crosslinked directly in the 96-well plate, quenched with glycine, and washed with PBS. PBS was then replaced with shearing buffer, and the plate was treated with PIXUL. Sheared chromatin was used in Matrix ChIP-qPCR analysis of Pol II at the EGR1 gene. (A) Layout of the serum time-course treatment experiment. (B, C) Pol II ChIP-qPCR analysis at the EGR1 (B) and UBE2b (C) loci presented as fraction of input. Graphs show mean ± SEM (n = 4) of combined ChIP-qPCR as shown (n = 4 wells/each time point). Gray/blue boxes above the graphs correspond to colors of the plate quadrants in (A). Cartoons of the EGR1 and UBE2b genes and location of the PCR primers (colored boxes) are shown below. These results show that cells grown in a 96-well plate can be treated with an inducing agent (here, serum) and sonicated directly on the culture plate in PIXUL (no sample transfer), and then sheared chromatin aliquots analyzed in microplate ChIP-qPCR, yielding reproducible results of all 96 samples in one day.

Next, we compared side-by-side different cell types grown on the same plate and treated with different agents over a time-course. Two human lines, HCT116 and HEK293, were grown on the same 96-well plate. After serum deprivation, the quiescent cells (columns 1–11) were treated with either serum or TPA over a time course from 0 min to 48 h prior to crosslinking. Also included was a set of wells in which cells were maintained in 10% serum without any treatment (column 12). To assess the reproducibility of the entire experiment, treatments were done in duplicate. The layout of the plate for this experiment is shown in Figure 5A. The results of ChIP analysis show that both serum and TPA increased levels of Pol II at the EGR1 gene in HCT116 and HEK293 cells but that the kinetics of induction were different. Further, only HCT116 cells demonstrated serum-inducible Pol II recruitment to the NR4A3 locus (27).

Figure 5. PIXUL-ChIP-qPCR analysis of Pol II kinetics of recruitment to inducible loci in response to serum- and TPA-treatment of 96-well HCT116 and HEK293 cell cultures. Serum-deprived HCT116 and HEK293 cultures in the same 96-well plate were treated with either 10% serum or 100 nM TPA for 5, 15, 30 min and 1, 2, 4, 6, 18, 24 and 48 h.
Cells were crosslinked, plates sealed and treated with PIXUL as in Figure 2. Sheared chromatin was used in microplate ChIP analysis of Pol II density at EGR1 and NR4A3 genes. (A) 96-well plate culture layout of the serum and TPA time-course treatment experiment. (B) Graphs of ChIP-qPCR results showing Pol II density (as a fraction of input), mean ± SEM (n = 2) of respective cell lines, treatments (serum; green, TPA;blue) and harvested at indicated time points. (C) Gene cartoons and position of PCR primers. These data show that different cells can be cultured on the same 96-well plate, treated with different agents at various times, and then sheared directly in PIXUL and analyzed by ChIP-qPCR yielding results for all 96 samples in one day.
These results show that PIXUL sample preparation can be easily integrated with downstream microplate ChIP assays, providing a useful tool that facilitates ChIP studies where comparative analyses of a number of cell lines and/or treatments are done in parallel (28). Further, starting with a 96-well culture plate, sample preparation and all steps of the ChIP assay and qPCR analysis are completed in the same day. As such, along with other applications, integrated PIXUL-ChIP should be a useful tool for drug screening and validation.

PIXUL-ChIP application to embryonic stem cells (ESC)

A number of small molecules, including epigenetic drugs, have been discovered to induce pluripotency (29) and manipulate ESC fate (30,31). Still, these studies are limited by the lack of sensitive technologies that would allow high-throughput testing and validation of drugs in ESCs. To test PIXUL-ChIP applicability in ESCs, we used Elf1 hESC derived from blastocysts of frozen 6–8-cell embryos (NIHhESC-12-0156) (32,33). 96-well plates with either naïve (Elf1 2iLIF) or primed (Elf1, 2 passages in TeSR + FGF2 media for 4 days) cells were set up (32,34,35) for PIXUL-ChIP and RT-qPCR analysis. After cells in some of the wells were harvested for RT-qPCR, the rest of the plate was cross-linked and sonicated with PIXUL (24 min). Sheared chromatin was used in Matrix-ChIP-qPCR analysis as before (Figures 4 and 5). High levels of expression and high chromatin accessibility of the OCT4 (POU5F1) locus are hallmarks of ESC, including Elf1 cells (32). RT-qPCR demonstrated high levels of OCT4 expression, which was higher in TeSR + FGF2-primed cells compared to 2iLIF naïve cells (Figure 6A). In contrast, expression of another transcriptional regulator, TBX3 (36), was very low in both cell states.
PIXUL-ChIP analysis (Figure 6B) demonstrated high levels of permissive (H3K27Ac and H3K4m1) (37) and repressive (H3K27m3) epigenetic marks at the OCT4 enhancers compared to promoter regions, and these modifications were higher in the primed cells compared to naïve cells. Consistent with the mRNA data (Figure 6A), Pol II levels and marks were low at the TBX3 gene. These results agree with previous observations that in primed Elf1 cells, OCT4 enhancers have higher chromatin accessibility (DNase I hypersensitivity) and H3K27me3 levels compared to naïve cells (32).

Figure 6. PIXUL-ChIP-qPCR analysis of Pol II and epigenetic modifications at the OCT4 (POU5F1) locus in hESC Elf1 cells. Human embryonic stem cells (hESC Elf1) were cultured in 96-well plates as naive (2i + hLif + FGF2 + Igf1) or as primed (TeSR+FGF2) on Matrigel for either one or two passages (2i, two inhibitors: PD0325901 MEKi and CHIR-99021 GSK3i). One and two passages in TeSR represent cells transitioning to primed. These cells were plated at 10 000 cells/well on Matrigel in 96-well plates with Rho kinase (ROCK) inhibitor present for the first 24 h of culture to improve survival (32). Half of the plate was used to extract RNA for RT-qPCR (normalized to L32 mRNA) (A) and the other half (chromatin) was crosslinked, sheared in PIXUL, and subjected to Matrix ChIP analysis (expressed as a fraction of input) (B) as in Figure 3. Statistical differences between two means (P value) are shown by the size of the solid circles: P < 0.05 for small circle, P < 0.01 for large circle, and no circle indicating the differences are not statistically significant (7). These results are consistent with previous observations and as such demonstrate that the PIXUL-ChIP-qPCR platform could be used for high-throughput experiments and drug screening.

These studies illustrate that PIXUL-ChIP is a tool that has the potential to empower researchers for high-throughput screens (such as small molecules and growth factors) to study ESC self-renewal and pluripotency more readily than the traditional approach.

Comparison of integrated PIXUL-ChIP protocol with commercial Bioruptor and Covaris chromatin shearing instruments followed by Matrix ChIP

There are several commercially available ultrasound instruments to sonicate chromatin. The two best known are the Bioruptor (manufactured by Diagenode) and the LE220 Focused Ultrasonicator (manufactured by Covaris).

Bioruptor

Bioruptor uses standard test tubes and can process 12 tubes at a time. We compared the efficiency of PIXUL-ChIP with Bioruptor followed by Matrix ChIP. Quiescent HEK293 cells in a 96-well plate were treated with serum (0, 5, 15 and 30 min).
Next, one row of cells (12 wells, n = 3 for each time point) was transferred to test tubes, crosslinked and then sheared in the Bioruptor (45 min sonication). The rest of the plate was crosslinked and treated with PIXUL (36 min sonication). Agarose gel electrophoresis (Figure 7A) shows less chromatin yield using the Bioruptor protocol compared to PIXUL, suggesting that losses were associated with manual harvesting of the cells from 96-well plates and transfer to tubes for sonication in the Bioruptor. Lower yields of Bioruptor-sheared chromatin are also illustrated for HCT116 cells in Supplementary Figure S3. The average size of HEK293 cell Bioruptor-sheared fragments was 243 ± 28 bp (77.2 ± 7.3% in 200–600 bp range), comparable to 277 ± 15 bp (84.3 ± 2.5% in 200–600 bp range) with PIXUL (mean ± SDEV, n = 12 wells/samples). This comparison shows that chromatin fragmentation with PIXUL done directly in culture plates is more consistent compared to fragmentation with Bioruptor done in tubes requiring sample transfers (Figure 7B). Activation of EGR1 gene is associated with recruitment of active (phosphorylated) components of the ERK pathway to the EGR1 locus (25,26,38). Equal aliquots of chromatin from PIXUL- and Bioruptor-generated samples were analyzed on the same Matrix ChIP plate using antibodies to Pol II CTD, H3K27m3, B-Raf phosphorylated on T598 and S601 (pB-Raf) and ERK phosphorylated on T202 and Y204 (pERK) (25,26). As before, serum treatment increased Pol II recruitment to the EGR1 gene, but the measured levels were higher in chromatin samples prepared with PIXUL compared to Bioruptor (Figure 7C). Measured levels of serum-induced pB-Raf and pERK at EGR1 gene were also higher in chromatin samples prepared with PIXUL. In contrast, levels of H3K27m3 were similar using the PIXUL and Bioruptor. Pol II CTD, pB-Raf and pERK antibodies recognize phosphorylated forms of these proteins (25,26). 
Phosphorylation can be significantly degraded during sample preparation prior to analysis (7,39), which could explain the differences between the two methods (the Bioruptor protocol requires more manual handling and longer preparation times).

Figure 7. Matrix ChIP analysis of chromatin prepared from 96-well HEK293 cultures using PIXUL and Bioruptor. Serum-deprived HEK293 96-well cultures were treated with serum for 0, 5, 15 and 30 min. Cells were crosslinked directly in the 96-well plate, harvested manually from Row A (12 samples) and transferred to twelve 0.5 ml tubes for ultrasound treatment in the Bioruptor (45 min). The rest of the plate was sealed and sonicated in PIXUL (26 min treatment). (A) Comparison of sheared chromatin fragments obtained with Bioruptor versus PIXUL analyzed by agarose gel electrophoresis. Ethidium bromide-stained gels are shown; sizes (bp) of DNA ladder fragments (first lane) are shown to the left. Sonicated fragments were analyzed by image analysis software (Methods); results are displayed as waterfall plots in sequential order of samples run from lane 1 to 12. X-axis: band size in base pairs (bp). Y-axis: sample from a given lane. Z-axis: relative signal intensity of bands. (B) Mean fragment size (each blue dot) of sheared chromatin obtained with Bioruptor and PIXUL. (C) Sheared chromatin samples from both PIXUL and Bioruptor were analyzed simultaneously by Matrix ChIP using antibodies to Pol II, pB-Raf, pERK and H3K27m3. ChIPed DNA was analyzed by real-time PCR using indicated primers. Results show mean ± SEM (n = 3 replicates for each time point for each instrument). This comparison demonstrates that sample transfer causes significant sample losses, which may in part account for greater variability. There are lower Matrix ChIP signals from chromatin prepared with Bioruptor compared to PIXUL.

We also tested the efficiency of PIXUL versus Bioruptor in chromatin sample preparation from similarly sized pieces of frozen livers from a mouse model of sepsis.
Sheared liver chromatin yields were similar with both methods (Supplementary Figure S4A), providing further evidence that the differences seen with cell cultures (Figure 7A) occur during sample harvest and transfer from the 96-well plate to Bioruptor tubes. The average size of Bioruptor-sheared fragments was 258 ± 14 bp compared to 362 ± 10 bp with PIXUL (mean ± SD, n = 6 livers). Previously, we found that in experimental sepsis models there was an increased recruitment of Pol II to Ngal (Lcn2) in liver (40). Both sonication methods showed an increase in Pol II signal at the Ngal (Lcn2) gene in septic livers, but the level was greater in chromatin prepared with PIXUL than with Bioruptor (Supplementary Figure S4D). As a no-change control, we assessed H3 levels, which were not altered by sepsis but were again higher in chromatin prepared with PIXUL compared to Bioruptor. Our Matrix ChIP results show that PIXUL, which uses microplates, is faster and more efficient than the standard tube-based Bioruptor approach, where loss of samples during manual transfers and partial dephosphorylation may underlie lower Pol II CTD, pB-Raf and pERK levels at genes. Covaris LE220 This instrument uses either glass tubes or glass microplates. The ultrasound transducers and the plates/tubes are physically moved during the operation, and the water used to couple ultrasound to wells requires degassing. With PIXUL, neither the transducers nor the plate move, and no degassing of the coupling fluid is performed. To compare Covaris with PIXUL, we used the serum-treated HCT116 cell culture system as above (Figure 4). We found that harvesting cells from one well of a 96-well culture plate yielded insufficient amounts of chromatin in Covaris to generate reproducible ChIP data. Thus, for Covaris LE220 we combined cells from three wells of a 96-well culture plate into one sample. The sizes of chromatin fragments sonicated with Covaris were not uniform (Figure 8A and B).
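Fragment sizes in this section are reported either as mean ± SD (liver samples) or as mean ± SEM (figure captions), so the two spreads are not directly comparable. The following is a minimal sketch of how such summaries can be computed from per-sample mean fragment sizes. The function name `summarize_fragment_sizes` and the input values are hypothetical illustrations, not the authors' image-analysis code or data.

```python
import math
import statistics


def summarize_fragment_sizes(sizes_bp):
    """Summarize per-sample mean fragment sizes (bp).

    Returns the sample count, mean, sample standard deviation (SD),
    standard error of the mean (SEM) and coefficient of variation (CV).
    """
    n = len(sizes_bp)
    if n < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(sizes_bp)
    sd = statistics.stdev(sizes_bp)  # sample SD, n - 1 denominator
    sem = sd / math.sqrt(n)          # SEM shrinks as n grows, SD does not
    cv = sd / mean                   # unitless measure of well-to-well uniformity
    return {"n": n, "mean": mean, "sd": sd, "sem": sem, "cv": cv}


# Hypothetical values for illustration only (not taken from the study).
example_sizes = [400, 420, 440, 460, 480]
summary = summarize_fragment_sizes(example_sizes)
print(f"{summary['mean']:.0f} ± {summary['sd']:.0f} bp (mean ± SD, n = {summary['n']})")
print(f"{summary['mean']:.0f} ± {summary['sem']:.0f} bp (mean ± SEM, n = {summary['n']})")
print(f"CV = {summary['cv']:.1%}")
```

Because SEM equals SD divided by the square root of n, a value quoted as ± SEM looks tighter than the same data quoted as ± SD. The coefficient of variation is one simple way to compare shearing uniformity between instruments when the mean sizes differ.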
Notably, the first position/well (A1) of the Covaris sonicator yielded smaller fragments with either tubes or plate (Figure 8A and B, lane 1). The mean fragment size was 532 ± 77 bp for Covaris tubes and 490 ± 83 bp for Covaris microplate, compared to 440 ± 53 bp for PIXUL (Figure 8A–D). Chromatin prepared with either Covaris or PIXUL instruments and tested in Matrix ChIP yielded similar signals (Figure 8E), but the background was lower using PIXUL (Figure 8F). This comparison demonstrates that PIXUL, which uses inexpensive off-the-shelf plates, not only avoids manual transfers from 96-well culture plates (allowing the use of lower cell numbers) but also shears chromatin more consistently compared to the Covaris LE220 instrument (Figure 8A–D, and also see Figure 2 for all 96 wells). Figure 8. Matrix ChIP analysis of chromatin prepared from 96-well HCT116 cultures using either PIXUL or Covaris LE220. Serum-deprived HCT116 cell 96-well plate cultures were treated with serum for 0, 5, 15 and 30 min. Cells were cross-linked directly in the 96-well plate. With Covaris LE220 shearing, harvesting cells from one well of a 96-well plate yielded insufficient amounts of chromatin to generate reproducible ChIP results. Thus, with this instrument, for each time point cells harvested from three wells of a 96-well plate were combined into one sample and transferred to either Covaris microplate tubes or a Covaris microplate. Each time point was done in duplicate, for a total of eight samples. The rest of the 96-well plate was sealed and sonicated in PIXUL (18 min treatment). (A–C) Agarose gel electrophoresis analysis of chromatin fragments sonicated using Covaris LE220 microplate tubes (A), Covaris LE220 plate (B) or PIXUL (C). Sheared fragments were analyzed by image analysis software (Methods), results are displayed as waterfall plots in sequential order of samples run from lane 1 to 8 as in Figure 1. X-axis; band size in base pair (bp).
Y-axis; sample from a given lane. Z-axis; relative signal intensity of bands. Numbers above the plots show average fragment size ± SEM for all eight samples. (D) Sizes of chromatin samples (A–C) sonicated by either Covaris tubes, Covaris plate or PIXUL. (E) Sheared chromatin samples prepared using either Covaris tubes (cells from 3 wells combined into one sample) or PIXUL (single well per sample) were analyzed simultaneously by Matrix ChIP using antibodies to Pol II, H3K9Ac, H3K27Ac, H3K27m3, H3K36m3 and CTCF. ChIP DNA was analyzed by qPCR using indicated primers. Results show fraction of input, mean ± SEM (n = 3 replicates for each time point for each instrument). (F) Comparison of Pol II and CTCF ChIP signals at known binding and respective distal sites using Covaris versus PIXUL sonicated chromatin (yellow circles below graphs show P < 0.05). (G) EGR1 gene cartoon and position of PCR primers. This comparison demonstrates that sonication of chromatin with PIXUL is more consistent and yields smaller fragments compared to Covaris; in particular the first position (A1) in their plate yields considerably smaller fragments than the other wells. Combining cells from three wells of a 96-well plate for Covaris sonication generates chromatin yielding similar ChIP results to those obtained using cells from one well of a 96-well plate treated with PIXUL. The ChIP background signal is lower with PIXUL compared to Covaris.
Covaris instruments are widely used for genomic applications. We thus compared human genomic DNA shearing using PIXUL to Covaris LE220 for exome sequencing library preparation (Supplementary Figure S5). As shown, the quality of exome sequencing libraries was similar with both instruments. Thus at much lower operating costs, PIXUL can also be used as a sample preparation platform for genomic applications. PIXUL-ChIP analysis of Pol II occupancy at organ-restricted genes in mouse heart, kidney, liver, and lung As many diseases are associated with systemic epigenetic changes (e.g. diabetes, obesity, inflammation, sepsis and even cancer), having methods for parallel multiple organ studies in model systems would offer new potential to better understand epigenetics of disease progression and evaluate drug efficacy/toxicity in different organs, ultimately providing information to improve clinical outcomes (40–42). We harvested hearts, kidneys, livers and lungs from male and female mice and simultaneously prepared chromatin samples from fragments of all these tissues in a single 96-well plate using PIXUL. Figure 9A illustrates Pol II binding to genes known to be preferentially expressed in the heart, Tnnt2 (troponin); kidney, Fxyd2 (ATPase subunit); liver, Alb (albumin); and lung, Sftpa1 (surfactant). The organ-specific Pol II binding was corroborated by RT-qPCR measurements of cognate transcripts (Figure 9B). The above experiment demonstrates that PIXUL integrated with Matrix ChIP facilitates parallel high-throughput epigenetic analysis of multiple organs. Figure 9. PIXUL-ChIP analysis of Pol II occupancy in mouse heart, kidney, liver, and lung.
Flash frozen heart, kidney, liver and lung samples from male and female mice were cross-linked and then sonicated in microplates using PIXUL. (A) PIXUL-sheared chromatin samples were simultaneously analyzed for Pol II levels at indicated organ-specific genes using Matrix ChIP. ChIP DNA was analyzed by qPCR expressed as fraction of input. Data represent mean ± SEM (n = 3 mice) expressed as a fraction of input. (B) RNA isolated from the same frozen organs as in A was used in RT-qPCR with primers to indicated genes. Data represent mean ± SEM (n = 3 mice) expressed as a ratio to the transcript levels of housekeeping ribosomal protein gene, L32. These results demonstrate that PIXUL-ChIP can be used to analyze multiple samples from several organs on the same plate. The novel ultrasound transducer design, the use of off-the-shelf inexpensive plates, and user-friendly operation give PIXUL the potential to be used as a multipurpose sample preparation platform (e.g. in integrative studies).
To test this concept, we show that PIXUL can be used for multiorgan RNA isolation done in parallel with chromatin shearing, for RT-PCR and ChIP assays (Supplementary Figure S6). PIXUL-ChIP-seq ChIP-seq is a widely used method that provides powerful means to assess histone modifications and chromatin-bound proteins genome-wide (18,43–45). Sonication is commonly used to shear chromatin for ChIP-seq. We assessed the compatibility of PIXUL with an established ChIP-seq pipeline (Active Motif). HCT116 cells cultured in 96-well plates (∼200 000 cells/well) were sonicated in PIXUL as above, ChIP was carried out with different antibodies, and libraries were constructed and sequenced (see Methods). We compared seven ChIP-seq HCT116 cell signals (H3K4m1, H3K4m3, H3K9Ac, H3K36m3, H3K27Ac, H3K27m3 and CTCF) that are profiled by both PIXUL-ChIP and the ENCODE project (Figure 10). We found that the majority (62–94%) of peaks identified in our PIXUL-ChIP samples are also identified as peaks in the corresponding ENCODE samples (Figure 10A). Figure 10B illustrates a snapshot at the EGR1 locus comparing PIXUL-ChIP-seq and ENCODE (for a link to UCSC Genome Browser track, see Methods). Scatter plot analysis (46) for all of the above antibodies demonstrated good correlation between PIXUL-ChIP-seq and ENCODE datasets (Supplementary Figure S7). The differences between PIXUL-ChIP-seq and ENCODE datasets may reflect the use of ChIP antibodies from different sources, different growth conditions, and the lower number of HCT116 cells (∼200 000 for PIXUL-ChIP-seq) compared to ENCODE (>10^6). Figure 10. PIXUL-ChIP-seq results and comparison to ENCODE datasets. HCT116 cells were grown to the density of ∼200 000 cells per well, cross-linked, and sonicated using 96-well PIXUL. ChIP was performed and libraries were generated from a single PIXUL well using Active Motif's Low Cell ChIP-Seq Kit. Libraries were sequenced on a NextSeq 500 (Methods).
(A) Number of peaks in PIXUL-ChIP-seq and ENCODE datasets, and percentage of PIXUL-ChIP-seq peaks that are also detected in ENCODE. (B) PIXUL-ChIP-seq (white background) and ENCODE (gray background) genome browser snapshot of a region around the EGR1 locus occupied by CTCF, H3K9Ac, H3K27Ac, H3K4m1, H3K4m3, H3K36m3 and H3K27m3. The data demonstrate good agreement between PIXUL-ChIP-seq (which was done in ∼200 000 cells) and ENCODE (which used >10^6 cells). (C–H) To verify that genes marked by PIXUL-ChIP-seq peaks show expected expression patterns (genes with repressive marks have lower expression, genes with active marks show higher expression), HCT116 RNA-seq data were downloaded from Sanger Institute Genomics of Drug Sensitivity in Cancer (GDSC) website (https://www.cancerrxgene.org/). Expression distribution was plotted of genes with histone marks at the transcription start sites (TSS) and those without. Genes marked with active histone marks around TSS have a mean expression of 32 (25) Fragments Per Kilobase Million (FPKM), while genes without active histone marks (or with repressive mark H3K27m3) are expressed <1 FPKM – resulting in the bimodal distribution. (C) Expression distribution for genes with H3K4m1 PIXUL-ChIP-seq peaks (orange) around TSS and genes without H3K4m1 peaks (blue). (D) Expression distribution for genes with H3K4m3 peaks around TSS and genes without H3K4m3 peaks. (E) Expression distribution for genes with H3K9Ac peaks around TSS and genes without H3K9Ac peaks. (F) Expression distribution for genes with H3K36m3 peaks around TSS and genes without H3K36m3 peaks. (G) Expression distribution for genes with H3K27Ac peaks within gene body or around TSS and genes without H3K27Ac peaks. (H) Expression distribution for genes with H3K27m3 peaks within gene body or around TSS and genes without H3K27m3 peaks.
To verify that genes marked by PIXUL-ChIP-seq peaks show the expected expression pattern (genes with repressive marks have lower expression, genes with active marks have higher expression), HCT116 RNA-seq data were downloaded from the Sanger Institute Genomics of Drug Sensitivity in Cancer (GDSC) website (https://www.cancerrxgene.org/). Plots of expression distributions of genes showed expected correlations with histone marks (H3K4m1, H3K4m3, H3K9Ac, H3K27Ac, H3K27m3, H3K36m3) at the transcription start sites (TSS) (Figure 10C–H). In summary, we have developed an ultrasound instrument, PIXUL, to rapidly sonicate chromatin and DNA in standard 96-well tissue culture plates and integrated it with ChIP for high-throughput ChIP-qPCR and ChIP-seq analysis. The integrated PIXUL-ChIP method has several important advantages over existing protocols. (i) 96-well plates with cell cultures are directly placed in PIXUL so that cell harvesting and sonication are done in one step, limiting losses during sample transfers (Figures 4–6). Other sonicators use tubes or 96-well glass plates, requiring manual transfers and inherently resulting in sample losses. With PIXUL, fewer transfer steps potentially minimize epitope losses (Figure 7). (ii) PIXUL high-throughput chromatin shearing in microplates matches the format and throughput of the microplate ChIP platform (Figures 4 and 5) (8,9,11). This feature allows for efficient integration of sample preparation with downstream analytical steps, with the potential to fully automate the entire process from sample preparation to results.
(iii) ChIP studies involving multiple cell lines and treatments can be carried out on the same 96-well culture plate, making it well suited for high-throughput kinetic studies or drug screening experiments (Figure 5). (iv) Dozens of tissue samples can be processed in parallel. This feature might be useful, for example, in multiple organ (Figure 9, Supplementary Figure S6) or intratumor epigenetic heterogeneity studies. (v) PIXUL can be used in genome-wide sequencing studies (Figure 10 and Supplementary Figure S5). (vi) PIXUL has potential as a multipurpose and multiomics sample preparation platform (Figure 10, Supplementary Figures S5 and S6). (vii) PIXUL consumables cost a small fraction of the expenses associated with use of other comparable systems (such as Covaris). The substantial cost reductions allow more labs to carry out high-throughput studies. DATA AVAILABILITY Sequencing data were deposited in the Gene Expression Omnibus database under entry GSE115822. SUPPLEMENTARY DATA Supplementary Data are available at NAR Online. ACKNOWLEDGEMENTS We thank John Kucewicz, University of Washington Applied Physics Laboratory, and Jackson Chin, University of Washington Department of Medicine/Bioengineering, for writing the agarose gel electrophoresis image analysis code, and Greg Darlington, Matchstick Technologies, Inc., for ultrasound technical expertise and advice. FUNDING National Institutes of Health (NIH) [R33CA191135, R21GM111439, R01DK103849 to K.B.]; Life Sciences Discovery Fund (LSDF) of State of Washington [12330479 to K.B.]. Funding for open access charge: NIH [R33CA191135, R01DK103849]. Conflict of interest statement. K.B. and T.M. are co-founders of Matchstick Technologies, Inc., which has licensed the PIXUL technology from the University of Washington. PIXUL is described in a patent application (20170205318) filed by the University of Washington. A.B. is an employee of Active Motif, which markets some of the antibodies and kits used in this study.
Active Motif licensed PIXUL technology for commercialization.
REFERENCES
1. Orlando V., Strutt H., Paro R. Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods. 1997; 11:205–214.
2. Solomon M.J., Varshavsky A. Formaldehyde-mediated DNA-protein crosslinking: a probe for in vivo chromatin structures. Proc. Natl. Acad. Sci. U.S.A. 1985; 82:6470–6474.
3. O’Neill L.P., Turner B.M. Immunoprecipitation of chromatin. Methods Enzymol. 1996; 274:189–197.
4. Huebert D.J., Kamal M., O’Donovan A., Bernstein B.E. Genome-wide analysis of histone modifications by ChIP-on-chip. Methods. 2006; 40:365–369.
5. Heintzman N.D., Stuart R.K., Hon G., Fu Y., Ching C.W., Hawkins R.D., Barrera L.O., Van Calcar S., Qu C., Ching K.A. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 2007; 39:311–318.
6. Brind’Amour J., Liu S., Hudson M., Chen C., Karimi M.M., Lorincz M.C. An ultra-low-input native ChIP-seq protocol for genome-wide profiling of rare cell populations. Nat. Commun. 2015; 6:6033.
7. Bomsztyk K., Flanagin S., Mar D., Mikula M., Johnson A., Zager R., Denisenko O. Synchronous recruitment of epigenetic modifiers to endotoxin synergistically activated Tnf-alpha gene in acute kidney injury. PLoS One. 2013; 8:e70322.
8. Yu J., Feng Q., Ruan Y., Komers R., Kiviat N., Bomsztyk K. Microplate-based platform for combined chromatin and DNA methylation immunoprecipitation assays. BMC Mol. Biol. 2011; 12:49.
9. Flanagin S., Nelson J.D., Castner D.G., Denisenko O., Bomsztyk K. Microplate-based chromatin immunoprecipitation method, Matrix ChIP: a platform to study signaling of complex genomic events. Nucleic Acids Res. 2008; 36:e17.
10. Aldridge S., Watt S., Quail M.A., Rayner T., Lukk M., Bimson M.F., Gaffney D., Odom D.T. AHT-ChIP-seq: a completely automated robotic protocol for high-throughput chromatin immunoprecipitation. Genome Biol. 2013; 14:R124.
11. Garber M., Yosef N., Goren A., Raychowdhury R., Thielke A., Guttman M., Robinson J., Minie B., Chevrier N., Itzhaki Z. et al. A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Mol. Cell. 2012; 47:810–822.
12. Schoppee Bortz P.D., Wamhoff B.R. Chromatin immunoprecipitation (ChIP): revisiting the efficacy of sample preparation, sonication, quantification of sheared DNA, and analysis via PCR. PLoS One. 2011; 6:e26015.
13. Meyer C.A., Liu X.S. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat. Rev. Genet. 2014; 15:709–721.
14. Skene P.J., Henikoff S. A simple method for generating high-resolution maps of genome-wide protein binding. Elife. 2015; 4:e09225.
15. Schmidl C., Rendeiro A.F., Sheffield N.C., Bock C. ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors. Nat Methods. 2015; 12:963–965.
16. Zarnegar M.A., Reinitz F., Newman A.M., Clarke M.F. Targeted chromatin ligation, a robust epigenetic profiling technique for small cell numbers. Nucleic Acids Res. 2017; 45:e153.
17. Skene P.J., Henikoff J.G., Henikoff S. Targeted in situ genome-wide profiling with high efficiency for low cell numbers. Nat. Protoc. 2018; 13:1006–1019.
18. Blecher-Gonen R., Barnett-Itzhaki Z., Jaitin D., Amann-Zalcenstein D., Lara-Astiaso D., Amit I. High-throughput chromatin immunoprecipitation for genome-wide mapping of in vivo protein-DNA interactions and epigenomic states. Nat. Protoc. 2013; 8:539–554.
19. Hall T., Cain C. Clement GT, McDannold NJ, Hynynen K A low cost compact 512 channel therapeutic ultrasound system for transcutaneous ultrasound surgery. AIP Conference Proceedings. 2006; 829: NY AIP 445–449.
20. Kuo M.H., Allis C.D. In vivo cross-linking and immunoprecipitation for studying dynamic Protein:DNA associations in a chromatin environment. Methods. 1999; 19:425–433.
21. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754–1760.
22. Bernstein B.E., Birney E., Dunham I., Green E.D., Gunter C., Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57–74.
23. Zhang Y., Liu T., Meyer C.A., Eeckhoute J., Johnson D.S., Bernstein B.E., Nusbaum C., Myers R.M., Brown M., Li W. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008; 9:R137.
24. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–842.
25. Mikula M., Bomsztyk K. Direct recruitment of ERK cascade components to inducible genes is regulated by the heterogeneous nuclear ribonucleoprotein (HnRNP) K. J. Biol. Chem. 2011; 286:9763–9775.
26. Nelson J.D., Leboeuf R.C., Bomsztyk K. Direct recruitment of insulin receptor and ERK signaling cascade to insulin-inducible gene loci. Diabetes. 2011; 60:127–137.
27. Morris D.P., Lei B., Longo L.D., Bomsztyk K., Schwinn D.A., Michelotti G.A. Temporal dissection of rate limiting transcriptional events using Pol II ChIP and RNA analysis of adrenergic stress gene activation. PLoS One. 2015; 10:e0134442.
28. Ernst J., Kheradpour P., Mikkelsen T.S., Shoresh N., Ward L.D., Epstein C.B., Zhang X., Wang L., Issner R., Coyne M. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011; 473:43–49.
29. Hou P., Li Y., Zhang X., Liu C., Guan J., Li H., Zhao T., Ye J., Yang W., Liu K. et al. Pluripotent stem cells induced from mouse somatic cells by small-molecule compounds. Science. 2013; 341:651–654.
30. Marks H., Kalkan T., Menafra R., Denissov S., Jones K., Hofemeister H., Nichols J., Kranz A., Stewart A.F., Smith A. et al. The transcriptional and epigenomic foundations of ground state pluripotency. Cell. 2012; 149:590–604.
31. Chen Y., Blair K., Smith A. Robust Self-Renewal of rat embryonic stem cells requires fine-tuning of glycogen synthase Kinase-3 inhibition. Stem Cell Rep. 2013; 1:209–217.
32. Ware C.B., Nelson A.M., Mecham B., Hesson J., Zhou W., Jonlin E.C., Jimenez-Caliani A.J., Deng X., Cavanaugh C., Cook S. et al. Derivation of naive human embryonic stem cells. Proc. Natl. Acad. Sci. U.S.A. 2014; 111:4484–4489.
33. Freberg C.T., Dahl J.A., Timoskainen S., Collas P. Epigenetic reprogramming of OCT4 and NANOG regulatory regions by embryonal carcinoma cell extract. Mol. Biol. Cell. 2007; 18:1543–1553.
34. Mathieu J., Detraux D., Kuppers D., Wang Y., Cavanaugh C., Sidhu S., Levy S., Robitaille A.M., Ferreccio A., Bottorff T. et al. Folliculin regulates mTORC1/2 and WNT pathways in early human pluripotency. Nat. Commun. 2019; 10:632.
35. Sperber H., Mathieu J., Wang Y., Ferreccio A., Hesson J., Xu Z., Fischer K.A., Devi A., Detraux D., Gu H. et al. The metabolome regulates the epigenetic landscape during naive-to-primed human embryonic stem cell transition. Nat. Cell Biol. 2015; 17:1523–1535.
36. Huang W., Cao X., Zhong S. Network-based comparison of temporal gene expression patterns. Bioinformatics. 2010; 26:2944–2951.
37. Birnbaum R.Y., Clowney E.J., Agamy O., Kim M.J., Zhao J., Yamanaka T., Pappalardo Z., Clarke S.L., Wenger A.M., Nguyen L. et al. Coding exons function as tissue-specific enhancers of nearby genes. Genome Res. 2012; 22:1059–1068.
38. Mikula M., Skrzypczak M., Goryca K., Paczkowska K., Ledwon J.K., Statkiewicz M., Kulecka M., Grzelak M., Dabrowska M., Kuklinska U. et al. Genome-wide co-localization of active EGFR and downstream ERK pathway kinases mirrors mitogen-inducible RNA polymerase 2 genomic occupancy. Nucleic Acids Res. 2016; 44:10150–10164.
39. Ostrowski J., Kawata Y., Schullery D., Denisenko O.N., Higaki Y., Abrass C.K., Bomsztyk K. Insulin alters heterogeneous ribonucleoprotein K protein binding to DNA and RNA. Proc. Natl. Acad. Sci. U.S.A. 2001; 98:9044–9049.
40. Bomsztyk K., Mar D., An D., Sharifian R., Mikula M., Gharib S.A., Altemeier W.A., Liles W.C., Denisenko O. Experimental acute lung injury induces multi-organ epigenetic modifications in key angiogenic genes implicated in sepsis-associated endothelial dysfunction. Crit. Care. 2015; 19:225.
41. Sharifian R., Okamura D.M., Denisenko O., Zager R.A., Johnson A., Gharib S.A., Bomsztyk K. Distinct patterns of transcriptional and epigenetic alterations characterize acute and chronic kidney injury. Sci. Rep. 2018; 8:17870.
42. Mar D., Gharib S.A., Zager R.A., Johnson A., Denisenko O., Bomsztyk K. Heterogeneity of epigenetic changes at ischemia/reperfusion- and endotoxin-induced acute kidney injury genes. Kidney Int. 2015; 88:734–744.
43. Barski A., Cuddapah S., Cui K., Roh T.Y., Schones D.E., Wang Z., Wei G., Chepelev I., Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007; 129:823–837.
44. Gasper W.C., Marinov G.K., Pauli-Behn F., Scott M.T., Newberry K., DeSalvo G., Ou S., Myers R.M., Vielmetter J., Wold B.J. Fully automated high-throughput chromatin immunoprecipitation for ChIP-seq: identifying ChIP-quality p300 monoclonal antibodies. Sci. Rep. 2014; 4:5152.
45. Landt S.G., Marinov G.K., Kundaje A., Kheradpour P., Pauli F., Batzoglou S., Bernstein B.E., Bickel P., Brown J.B., Cayting P. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012; 22:1813–1831.
46. Ramirez F., Dundar F., Diehl S., Gruning B.A., Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014; 42:W187–W191.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
DNA barcodes for rapid, whole genome, single-molecule analyses
Wand, Nathaniel O; Smith, Darren A; Wilkinson, Andrew A; Rushton, Ashleigh E; Busby, Stephen J W; Styles, Iain B; Neely, Robert K
doi: 10.1093/nar/gkz212 pmid: 30918971
Abstract
We report an approach for visualizing DNA sequence and using these ‘DNA barcodes’ to search complex mixtures of genomic material for DNA molecules of interest. We demonstrate three applications of this methodology: identifying specific molecules of interest in a dataset containing gigabasepairs of genome; identifying a bacterium from such a dataset; and, finally, locating infecting virus molecules in a background of human genomic material. As a result of the dense fluorescent labelling of the DNA, individual barcodes of the order of 40 kb in length can be reliably identified. This means DNA can be prepared for imaging using standard handling and purification techniques. The recorded dataset provides stable physical and electronic records of the total genomic content of a sample that can be readily searched for a molecule or region of interest.

INTRODUCTION
Direct visualization of the DNA sequence by optical mapping offers a unique perspective on genome structure (1). The single-molecule, long-range information that is derived from mapping has recently been invaluable in revealing hundreds of large genomic rearrangements of the human (2,3) and great ape (4) genomes. However, the development of nanopore sequencing has enabled sequence read lengths that are of the same scale as optical maps (5), thereby providing an alternative approach for scaffolding or assembling sequencing data from a dataset which is far more information rich than the comparative mapping data. Whilst nanopore sequencing is increasing the competition in the space traditionally occupied by optical mapping, the commercial Bionano Genomics platform continues to offer a valuable and reliable means of deriving a reference scaffold for genomic assemblies (6).
Furthermore, imaging is particularly suited to multiplexed experiments, meaning that mapping offers a route by which whole genome studies can be performed, where sequence data can be directly correlated to, for example, a DNA repair event (7), DNA replication (8), or protein binding (9), at the single-molecule level. Whilst the promise of single-molecule mapping approaches for imaging DNA-based events has been demonstrated in such experiments, none to date has approached achieving this on the scale of a whole bacterial or human genome. Several approaches exist for producing optical maps of DNA sequence (10). Until recently, mapping has been reliant on either restriction enzymes (11) or nicking enzymes (2) to define the sequence motifs used in mapping. However, the discovery that methyltransferase enzymes can be used for fluorescent labelling of DNA (9,12–14) has been transformative because of their ability to yield sequence-specifically modified DNA without introducing damage (cuts or nicks) to the DNA. Previous mapping approaches have focussed on mapping of long DNA molecules, typically greater than 200 kb in length, driven by the relative infrequency of mapping sites. Whether defined by restriction, nicking or methyltransferase enzyme, a map must carry enough information for it to be reliably matched to a sequence of interest, and hence, applied. For a 5- or 6-base targeting enzyme, map density is typically one site every 10 kb, resulting in the need for molecules hundreds of kb in length for reliable map assembly. The production and handling of DNA molecules on this size range is technically challenging, requiring careful extraction and clean-up of DNA using gels to minimize shearing forces in the sample (15). We describe a methodology that significantly increases the accessibility and range of potential applications for optical mapping. 
We show that a high-density, yet sequence-specific, labelling pattern, directed by a DNA methyltransferase, allows single DNA molecules to be probed for sequences of interest at the whole-genome scale. Map analysis is demonstrated using relatively small DNA fragments (∼30 kb) that can be readily prepared using standard DNA extraction and purification kits and protocols. We show how this can be used to build datasets of hundreds of thousands of DNA molecules (several gigabasepairs of DNA) in less than an hour using standard fluorescence microscopy. To extend the imaging to applications in genomics, we have developed image-matching and classification techniques, which enable unique, whole genome analyses to be performed. For example, hundreds of thousands of single-molecule barcodes can be queried for a sequence of interest, and barcode images can be clustered, based on similarity, allowing the prevalent DNA molecules in a mixed population to be identified. We demonstrate application to the detection of viral infection in human cells, to identification of bacteria, and to the visualization of a region of interest in a genome that had been modified using CRISPR–Cas9 genome editing. In all, we expect this approach to find widespread application in the quantitative study of mixed genome samples and, particularly, in studying the sequence context of genome-wide events and processes, such as replication or protein binding, that are not directly accessible with sequencing technologies.

MATERIALS AND METHODS
Labelling of genomic DNA
A 200 μl solution containing 1× CutSmart Buffer (NEB), 10 μg genomic DNA, 0.9 μg TaqI DNA methyltransferase (M.TaqI) and 750 μM AdoHcy-azide (Supplementary Figure S1) was prepared and incubated at 50°C for 1 h. Subsequently, 5 μl of 18 mg/ml proteinase K (NEB)/0.1% Triton X-100 (Sigma-Aldrich) was added and the mixture was incubated at 50°C for 1 h, before purification using the GenElute Bacterial Genomic DNA kit (Sigma-Aldrich).
DNA was eluted into 200 μl TE buffer (10 mM Tris, 1 mM EDTA). Meanwhile, a 20 μl solution containing 0.5× phosphate buffered saline (Sigma-Aldrich), 10 μl DMSO, 1 mM dibenzylcyclooctyne-amine (Sigma-Aldrich) and 12.5 mM Atto 647N-NHS ester (Sigma-Aldrich) was incubated at 4°C for 1 h. The DNA sample was split into 30 μl aliquots and 10 μl of the mixture containing the Atto 647N was added to an aliquot. This mixture was incubated at room temperature overnight, before purification using the GenElute Bacterial Genomic DNA kit and elution into 50 μl TE buffer (10 mM Tris, 1 mM EDTA).

Molecular combing
Molecular combing of DNA was performed based on the procedure described by Deen et al. (16). Glass coverslips (Borosilicate Glass No. 1, Thermo Fisher) were cleaned to remove any fluorescent contaminants by incubation in a furnace oven at 450°C for 24 h. After removal from the furnace and cooling, 30 μl of Zeonex solution (Zeon Chemicals, 1.5% w/v solution of Zeonex 330R in chlorobenzene) was deposited onto a coverslip on a spin coater (Ossila) and subsequently spun at 3000 rpm for 90 s. Zeonex-coated coverslips were allowed to dry at room temperature overnight and stored in a desiccator. To perform the molecular combing, 2 μl Atto 647N-labelled DNA (2 ng/μl in 1× TE) was suspended in 17 μl 100 mM sodium phosphate buffer (pH 5.7) containing 1 μl DMSO. A 1.5 μl droplet of this solution was deposited on the surface of the Zeonex-coated coverslip. A clean pipette tip was placed in contact with the droplet and used to drag it, with a velocity of approximately 5 mm/min, across the coverslip.

Fluorescence microscopy
Deposited DNA was imaged on an ASI RAMM microscope, equipped with a Nikon 100× TIRF objective. Illumination was from a 100 mW OBIS 640 nm CW laser via a quad-band dichroic mirror (405/488/561/635) and images were collected using an Evolve Delta EM-CCD camera, via a quad-band emission filter (Semrock, 432/515/595/730 nm).
Micromanager was used to control the system and scan the sample (17).

DNA barcode extraction, pairwise alignment and community detection
Software was written in MATLAB (R2016b, The MathWorks, Inc., Natick, MA, USA) for the automated extraction of DNA barcodes from microscopy images, the in silico generation of DNA barcodes and the alignment procedures. The computational process is outlined in Figure 1. Full details of these procedures are given in the Supplementary Information, and the code we used to extract, process and analyse DNA barcodes is available at edata.bham.ac.uk, DOI: 10.25500/eData.bham.00000255.

Figure 1. Overview of the computational steps taken to extract DNA barcodes from images and match them to a genome. Intensity profiles for each extracted barcode can either be directly compared to a database of molecules of interest or to every other barcode in the experimental dataset. In the latter approach, communities of similar DNA molecules can be identified and an average barcode for each community determined. Hence a single barcode for each community can be matched to a large database of possible genomes.

RESULTS AND DISCUSSION
We label DNA using the M.TaqI DNA methyltransferase enzyme to direct the conjugation of fluorophores to sites reading 5′-TCGA-3′ (one site every 256 bp, on average).
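The figure of one site every 256 bp follows from the four-base recognition sequence: in a random sequence with equal base frequencies, a given 4-mer is expected once every 4^4 = 256 positions. The short Python sketch below illustrates this and shows how labelling sites could be located in a sequence. It is an illustration only, not the MATLAB code used in this work; the function names `find_sites` and `expected_spacing` are hypothetical, and real genomes deviate from the equal-frequency assumption.

```python
import random

SITE = "TCGA"  # M.TaqI recognition sequence, as stated in the text


def find_sites(sequence, site=SITE):
    """Return the 0-based start position of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def expected_spacing(site=SITE):
    """Expected spacing (bp) between sites, assuming equal base frequencies."""
    return 4 ** len(site)


# Illustrative check on a random sequence (an assumption, not genomic data).
rng = random.Random(0)
toy_sequence = "".join(rng.choice("ACGT") for _ in range(200_000))
toy_sites = find_sites(toy_sequence)
print("expected spacing (bp):", expected_spacing())
print("observed mean spacing (bp):", len(toy_sequence) / len(toy_sites))
```

The list of site positions is the natural starting point for generating a reference barcode in silico, for example by placing an emitter at each site and blurring to the optical resolution.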
The M.TaqI enzyme is well-suited to DNA labelling using synthetically-prepared analogues of S-adenosyl-l-methionine and we use it here to add reactive azide groups to target adenine bases. We tested a range of labelling conditions and several DBCO-conjugated organic dyes, and developed conditions that give approximately nine fluorophores for every 10 labelling sites on long, genomic DNA molecules (14,18). Labelled DNA was separated from the methyltransferase enzyme and reactive dye using a standard silica-based column (for genome purification). In a typical experiment, we deposit hundreds of gigabases of DNA barcodes from a 1 μl droplet containing 100 pg of DNA (16). A fraction of this, typically a few gigabases, is imaged for analysis. We visualize of the order of 10 000 molecules in 1000 fields of view in ∼20 min of imaging (Supplementary Figure S2 shows an example dataset). Software for the automated extraction of the images of DNA molecules from these datasets was developed and is described in the Supporting Information and in Supplementary Figure S3.

Each DNA barcode extracted from the imaging data is stored as a string of integers, where each integer represents the intensity of an individual pixel along the DNA image. Initially, this dataset is filtered to remove molecules that are entwined/aggregated (using an average intensity threshold) and those that are estimated to be shorter than 30 kb in length. An overview of the matching process for a dataset is given in Figure 1 and shown for experimental data in Figure 2. The process of matching the intensity profiles of two barcodes (experiment/reference or experiment/experiment) is a two-step procedure (see SI for a more detailed description):
1. Find the optimal cross-correlation between the two barcodes. Both the relative displacement of the two barcodes and the stretch of the experimental barcode are optimized.
2. Determine the alignment weight.

Figure 2. Alignment of a pure experimental sample of the bacteriophage lambda genome. DNA was labelled with Atto647N using M.TaqI to direct labelling, combed and imaged. 38,000 candidate DNA barcodes were extracted from the images. (A) After filtering the experimental data, 1077 barcodes are identified for further analysis. (B and C) The fluorescence intensity profile of each experimental DNA barcode (red) is cross-correlated with a reference molecule (blue) to find the optimal stretch and alignment of the data. (D) Selected barcodes (368) with an alignment weight greater than the threshold. At the bottom of the image are shown the mean experimental barcode, the mean with the background removed and the reference barcode (top to bottom). (E) Alignment weight for all experimental barcodes is calculated and a threshold (of 0.7 in this case) is applied (black line). (F) Plot of mean experimental barcode (red) against the reference barcode (blue), with the number of barcodes (black).

All of the deposited DNA molecules are (over)stretched on the slide during combing. This is due to the force generated as the DNA leaves the droplet, which gives a consistent stretch of the DNA to around 1.6 times the length of B-form DNA, i.e. the step between base pairs is ∼0.54 nm on the surface, on average. We correct for this in the alignment to a reference molecule and allow the experimental barcode to vary in length by ±10%, to account for any inconsistency in the stretch. We apply two general approaches for interrogation of a dataset of DNA barcodes, each revealing complementary information on the composition of the sample:
1. Pairwise matching: direct matching of experimental barcodes to a reference library.
2. A similarity search: matching the dataset to itself in search of communities of similar molecules, with subsequent matching of average barcodes from communities to a reference library.

In order to develop an understanding of the scope of the mapping data, and the questions we might address using it, we initially examined each of these analytical approaches using data generated in silico.

An in silico assessment of the impact of experimental variables on mapping
DNA barcodes were generated as strings of integers in silico and used in Monte Carlo simulations to understand the impact of a range of experimental variables on our ability to match a DNA barcode to a given DNA sequence. Barcodes were generated from the Escherichia coli K-12 genome sequence using the parameters described in Supplementary Table S1. A summary of the results from these simulations is given in Supplementary Figures S4 and S5. We find, for example, that for DNA molecules >30 kb in length, a labelling efficiency of one fluorophore per palindromic target site will allow accurate matching of 90% of the experimental data to the 4.6 Mb E. coli genome (Supplementary Figure S5).
As a result, this length threshold has been applied to the analyses presented, henceforth. The ability to match such short DNA barcodes to a reference genome highlights a significant advantage of using a high density of DNA labelling for the mapping experiment. The preparation of ultra-long DNA molecules is time-consuming and requires significant expertise, yet molecules 30–50 kb in length can be readily prepared using standard sample preparation kits for genomic DNA extraction and purification. As well as DNA labelling efficiency, and DNA length, we find that non-specific labelling and the signal-to-noise ratio in the imaging are important parameters for generating reliable fits but factors such as a variation in fluorophore intensity, the degree of stretching of the DNA barcode and image resolution have little impact on matching, within the bounds we tested. We also used the barcodes we generated in silico to improve our measure of the goodness of fit between a barcode and its reference sequence. These calculations show that an approach relying solely on the normalized cross-correlation between barcodes does not give a sufficiently discriminating measure of the goodness of fit to resolve correctly- and incorrectly-fitted populations of molecules (for the simulated sample of E. coli K-12 genome). In order to address this, we introduced an alignment weighting, which is calculated as the mean of three measures of fit quality; the normalized cross correlation, the difference in intensity of the two signals and the difference in the gradients of the two signals. By doing so, we were able to improve significantly the accuracy with which we could resolve correctly-aligned molecules from an incorrectly-aligned population, even with relatively low labelling efficiencies (Supplementary Figures S6 and S7). 
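To make the idea of an alignment weighting concrete, the following Python sketch combines the three measures named above (normalized cross-correlation, intensity difference and gradient difference) into a single score by taking their mean. The text does not specify how the two difference measures are converted into scores, so the rescaling to [0, 1] and the `1 - mean absolute difference` form used here are assumptions made for illustration; the exact definitions are in the Supplementary Information. All function names are hypothetical.

```python
import math


def rescale(profile):
    """Scale a profile to [0, 1] so that intensities are comparable (assumption)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0 for _ in profile]
    return [(v - lo) / (hi - lo) for v in profile]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length profiles at zero lag."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def mean_abs_difference(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def gradient(profile):
    """First difference between neighbouring pixels."""
    return [b - a for a, b in zip(profile, profile[1:])]


def alignment_weight(experiment, reference):
    """Mean of three fit-quality scores, each mapped to the range [0, 1]."""
    if len(experiment) != len(reference) or len(experiment) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    e, r = rescale(experiment), rescale(reference)
    correlation = max(0.0, ncc(e, r))            # negative correlation scores 0
    intensity = 1.0 - mean_abs_difference(e, r)  # differences lie in [0, 1]
    # Gradients of [0, 1] profiles lie in [-1, 1], so differences lie in [0, 2].
    slope = 1.0 - 0.5 * mean_abs_difference(gradient(e), gradient(r))
    return (correlation + intensity + slope) / 3.0
```

With these definitions, a profile compared with itself scores 1 (to floating-point precision), and the score falls as the shapes diverge, which is the behaviour a threshold on the weight relies on.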
Locating specific DNA molecules in a sample (pairwise matching)
In a simple implementation of our mapping approach we make a pairwise comparison between an imaged DNA barcode and a barcode generated in silico for a genome or genomic feature of interest. Figure 2 gives an overview of the analytical process for matching many DNA barcodes to a single, known reference sequence (or sequences):
1. Each intensity profile is compared, by cross-correlation, with the profile of a known sequence of interest. The relative length of the experimental barcode is compressed (1.5- to 1.7-fold) to allow for optimal matching.
2. The best match between molecules is scored (given an alignment weight) and a histogram for all alignment weights in the dataset is generated.
3. Barcodes with an alignment weighting above a selected threshold value are averaged, and compared to the reference, to confirm the match.

Cross-correlation of the signals of thousands of molecules with this (short) reference sequence takes around ten seconds in this instance (standard laptop computer with 16 GB RAM, 3.20 GHz Intel Core i7 processor). Subsequently, an alignment weight is generated and used as a measure of the match quality. Molecules with a weight above a specified threshold are used to generate an average experimental barcode that can be inspected to visually confirm the match to the reference data. This works well in the case where the reference data is a short (say 50–100 kb) sequence of interest, as demonstrated in Figure 2 for the bacteriophage lambda genome. However, the time taken for matching the experimental data scales linearly with the number (total length) of the genomes in the reference library. Hence, for example, running the sample shown in Figure 2 against a library of 2000 virus genomes would take around 5 h. The likelihood of a spurious match occurring also increases with the size of the reference library.
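The search over displacement and stretch described above can be sketched as a brute-force loop, shown here in Python. This is a simplified stand-in for the MATLAB implementation, not a reproduction of it: the linear-interpolation `resample` helper, the function name `best_alignment`, and the assumption that the reference profile is sampled on the unstretched pixel grid are all introduced for illustration.

```python
import math
import random


def ncc(a, b):
    """Normalized cross-correlation of two equal-length profiles at zero lag."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def resample(profile, new_length):
    """Linearly interpolate `profile` onto `new_length` evenly spaced points."""
    scale = (len(profile) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        x = i * scale
        left = min(int(x), len(profile) - 2)
        frac = x - left
        out.append(profile[left] * (1 - frac) + profile[left + 1] * frac)
    return out


def best_alignment(experiment, reference, stretch_factors):
    """Return (score, offset, factor) maximizing the correlation.

    The experimental profile is compressed by each candidate stretch factor
    and slid along the reference one pixel at a time.
    """
    best = (-1.0, None, None)
    for factor in stretch_factors:
        length = round(len(experiment) / factor)
        if length < 2 or length > len(reference):
            continue
        compressed = resample(experiment, length)
        for offset in range(len(reference) - length + 1):
            score = ncc(compressed, reference[offset:offset + length])
            if score > best[0]:
                best = (score, offset, factor)
    return best


# Toy usage: a reference of random intensities and a 1.6-fold stretched copy
# of one of its segments (synthetic data, for illustration only).
rng = random.Random(1)
toy_reference = [rng.random() for _ in range(60)]
toy_experiment = resample(toy_reference[10:30], 32)
print(best_alignment(toy_experiment, toy_reference, [1.5, 1.6, 1.7]))
```

The cost of this loop grows with the reference length multiplied by the number of stretch factors tried, which is consistent with the linear scaling in library size noted above.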
Supplementary Figure S8(A) shows the result of matching individual barcodes from an experiment containing the bacteriophage lambda and T7 bacteriophage genomes against a library of twenty virus genomes. Whilst both the lambda and T7 genomes are well represented in the dataset of correctly matched molecules, neither can be reliably identified by simply assigning barcodes to their best match in the reference database. However, an identification can be made with greater than 80% accuracy by using an appropriate weighting threshold (Supplementary Figures S6 and S7 and supporting text) and counting the number of molecules with matches above this threshold for all genomes in the database (Supplementary Figure S8(B)). We now extend this approach to a ‘real-world’ example to identify a specific genomic region of interest. We consider a sample of the E. coli strain DH10B (TOP10) genome, which has been edited using CRISPR/Cas9. Although the edit in this case is only 64 base pairs in length (∼30 nm on the stretched DNA) and so cannot be directly visualized using standard optical microscopy, molecules matching the region of interest can be identified from their barcodes and we can subsequently search the original dataset for the images of those molecules (Figure 3). The dataset of filtered barcodes used in this analysis consists of 1629 molecules. The pairwise matching process assigns each to the region of the genome where its alignment weight is the highest. As a result, six molecules were found to overlap with at least 25% of the 20 kb region of interest (containing the short genome edit) in the E. coli genome. Examples of four of these are shown in Figure 3. Figure 3A shows that the consensus taken from the identified barcodes is in excellent agreement with the reference barcode.
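The selection criterion used here (a molecule must cover at least 25% of the region of interest) is a simple interval-overlap test once each barcode has been assigned genome coordinates. A minimal Python sketch follows; the coordinates are invented for illustration and the function names are hypothetical.

```python
def overlap_fraction(mol_start, mol_end, roi_start, roi_end):
    """Fraction of the region of interest covered by an aligned molecule."""
    overlap = max(0, min(mol_end, roi_end) - max(mol_start, roi_start))
    return overlap / (roi_end - roi_start)


def molecules_covering(alignments, roi, min_fraction=0.25):
    """Keep alignments (start, end) covering at least `min_fraction` of `roi`."""
    roi_start, roi_end = roi
    return [
        (start, end)
        for start, end in alignments
        if overlap_fraction(start, end, roi_start, roi_end) >= min_fraction
    ]


# Toy usage with made-up coordinates (bp) and a 20 kb region of interest.
toy_roi = (100_000, 120_000)
toy_alignments = [(90_000, 105_000), (60_000, 101_000), (110_000, 150_000)]
print(molecules_covering(toy_alignments, toy_roi))
```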
Supplementary Figure S9 gives a second example, for the same dataset but with a search for molecules having barcodes consistent with the region of the bacterium's genome carrying the lacZ gene. We find fewer barcodes (just two) matching this region with a score above the threshold weight, indicating a good match.

Figure 3. Localization of barcodes containing a CRISPR/Cas9 edit. Red lines show experimental barcode profiles. Blue lines show reference genome profiles. Black dashed lines show the expected position of the edit on the genome. Values in parentheses below show alignment weight to reference. (A) Consensus barcode generated from all barcodes overlapping with at least 25% of the region of interest. A maximum (minimum) of 35 (18) barcodes (solid black line) contribute to the consensus (0.963). (B) Consensus of all barcodes aligned to the genome reference. A maximum (minimum) of 168 (0) barcodes (solid black line) contribute to the consensus across the genome (0.815). (C, E, G, I) Single molecule barcodes aligned to region of interest (0.866, 0.864, 0.860, 0.831). (D, F, H, J) Raw images of barcodes (shown in C, E, G and I, respectively) identified as overlapping region of interest.

Figure 3B shows the alignment of the entire dataset across the E. coli DH10B genome. Notably, there are several ‘gaps’ in this barcode alignment, where none of the experimental barcodes are aligned to some regions of the genome. On further investigation, we found that this is due to the reference DNA barcode at these locations: matching of barcodes to these regions is less robust to imperfections in the barcode than matching to other regions of the reference genome. For example, in regions with particularly high densities of M.TaqI sites (bright regions in the barcode) the alignment weighting is relatively unaffected by changes in labelling efficiency, compared to a barcode where the distribution of labels is more uniform across the barcode. Hence, even the best matches in some regions of the genome have alignment weightings that lie below the cut-off threshold that we have applied (uniformly) across the whole genome. Future work will focus on developing a dynamic threshold for any reference genome (a ‘P-value’) that allows an unbiased estimation of the appropriate threshold for a ‘good’ match at a given locus. Pairwise matching of the experimental barcodes enables interrogation of a genomic imaging dataset for specific features of interest, based on the underlying DNA barcode (sequence). Currently, the experimental barcodes identified using this approach represent ‘candidate’ barcodes that likely correspond to the identified genomic region, but this identification (at the single-molecule level) cannot be regarded as definitive.
Identification of monocultures of bacteria (pairwise matching)
For a cultured sample, the pairwise matching procedure can be extended to identify viruses and bacteria from a library of known reference genomes. Rather than matching the recorded dataset of DNA barcodes to a single, known reference sequence with a length comparable to that of the barcode, here the dataset is matched to a library containing around one hundred megabases of DNA sequence. Rather than using a specified threshold to locate and visualize the molecules, we rely on the likelihood that, on average, more of the dataset will fit well to the correct genome than will fit spuriously to an alternative, perhaps related, genome. We tested this approach on the identification of three cultured isolates of bacteria. The genomes of E. coli strain DH10B, E. coli strain EC958 and K. pneumoniae strain Ecl8 were imaged and the data filtered as described above. We selected a reference ‘database’ of twenty-five bacterial genome sequences for initial species identification. Each experimental DNA barcode was aligned against this database and assigned to the genome with which it has the highest alignment weighting (Figure 4A). Since the population of experimental data is imperfect, many incorrect assignments are made (assuming the sample contains only labelled DNA from the cultured organism). However, for all three samples the species with the most barcodes assigned to it is the species that was cultured. Furthermore, upon comparison to a database of E. coli strains, selected based on their similar phylogeny to the cultured strain (19), both strains of E. coli were correctly identified using this approach.
Indeed, the SE15 and JJ1886 strains, which are thought to be closely related to EC958 (they belong to the same phylogroup), have a similar number of matched barcodes to the EC958 strain, and all three have significantly more barcodes matched to them than the other, less closely related strains in the plot of ‘matched barcodes’ (Figure 4B).

Figure 4. Identification of bacterial DNA by species. Samples of DNA extracted from cultured K. pneumoniae strain Ecl8 (blue), E. coli strain DH10B (green) or E. coli strain EC958 (yellow) were labelled with Atto 647N using methyltransferase-directed labelling (M.TaqI). (A) Each barcode from the sample was assigned to the species to which its alignment yielded the highest alignment weight. In each case, a simple count of the number of barcodes in the sample matching a given reference genome in this way allows the species of bacterium to be identified. (B) Similarly, having identified a species, each barcode is now matched against several bacterial strains and then assigned to the strain to which its alignment yielded the highest alignment weight. Both E. coli strains are correctly identified. In both plots the bars are scaled such that the most populated field has a value of 1. Accession numbers for the genomes used and the absolute numbers of barcodes assigned to each are given in Supplementary Tables S2 and S3.

This promising result begins to show how we can use the DNA barcode data, and what we can reasonably achieve with it, using the current analytical approach. The experimental barcodes are a collection of DNA molecules that are imperfectly labelled, imaged and analysed. As a result, some barcodes, from some regions of the genome, can be well-matched to many possible reference genomes. Currently, this is the ‘noise’ inherent in our dataset (Supplementary Tables S2 and S3 describe the number of matched barcodes for the library of genomes used in these experiments). However, cumulatively, using no more than a couple of thousand experimental barcodes, we are able to generate sufficient numbers of reliable matches to a reference genome that we can give an indicative identity for both species and strain. The alignment weight is not sufficiently discriminating for us to achieve this with a single barcode. Rather, on the balance of probability, from a population of many experimental barcodes, a majority will match to the reference genome in the library that is most closely related to their own.
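The counting scheme behind Figure 4 can be summarized in a few lines of Python: each barcode votes for the genome that gives it the highest alignment weight, votes are tallied, and the tallies are scaled so that the most populated genome has a value of 1. The sketch below assumes the alignment weights have already been computed; the genome labels and weights are invented placeholders, and the function names are hypothetical.

```python
from collections import Counter


def assign_to_best_genome(weights_per_barcode):
    """Tally, for each barcode, the genome with the highest alignment weight.

    `weights_per_barcode` is a list of dicts mapping genome name to weight.
    No threshold is applied, mirroring the approach described in the text.
    """
    counts = Counter()
    for weights in weights_per_barcode:
        best_genome = max(weights, key=weights.get)
        counts[best_genome] += 1
    return counts


def scale_to_max(counts):
    """Scale counts so that the most populated genome has a value of 1."""
    top = max(counts.values())
    return {genome: count / top for genome, count in counts.items()}


# Toy usage with placeholder genome names and made-up weights.
toy_weights = [
    {"genome_A": 0.80, "genome_B": 0.60},
    {"genome_A": 0.70, "genome_B": 0.75},
    {"genome_A": 0.90, "genome_B": 0.20},
]
toy_counts = assign_to_best_genome(toy_weights)
print(scale_to_max(toy_counts))
```

Because every barcode casts a vote regardless of how good its best match is, spurious assignments are expected; the identification rests on the correct genome accumulating more votes than any alternative.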
This comes with some important caveats: the library of ‘other’ genomes we match to is small, because running the analysis against a large library would be prohibitively slow (using a personal computer); we apply no threshold to the alignment weight, hence there may be many spurious matches in the dataset (off-target matches may be random); and this process assumes the input DNA is from a purified sample of DNA from a laboratory-grown monoculture. Identification of organisms from their genomes in more complex mixtures requires a more sophisticated analytical routine. We describe such an approach below.

De novo clustering of similar genomic barcodes (a similarity search)
The mapping dataset offers a unique way to investigate the genomic composition of a sample without prior knowledge of its content. We sought to take advantage of this by developing an approach to cluster, quantify and identify similar molecules in the dataset. In the mapping data, connections between molecules that are highly similar can be made using the alignment weighting. The alignment of every molecule to every other molecule in the dataset allows us to generate an affinity (similarity) matrix for the dataset. The affinity matrix is converted to an adjacency matrix by identifying the most similar matches in the dataset (using the alignment weight) and generating links between them. The resulting communities of closely related barcodes can be used to generate single, consensus barcodes, representative of their community. Such analysis encompasses many thousands of imaged molecules but provides a mechanism for dramatically reducing the size of the dataset and then comparing this reduced representation to a large reference library. For 1000 (∼40 kb) experimental barcodes, the process of generating these communities of similar barcodes takes ∼1 min. In summary, the procedure is as follows:
1. Each DNA barcode is compared, by cross-correlation, with the barcodes of every other DNA molecule in the dataset.
2. The resulting matrix of alignment weightings (affinity matrix) is used to link closely related molecules in the dataset (generate communities).
3. An average barcode is generated for each community and this is compared to a large library of reference genomes to identify the community.

We validated this approach using a total of 6000 DNA barcodes generated in silico; 4800 of these are from a selection of 20 viral genomes and the remaining 1200 are randomly generated to represent, for example, contamination of the sample or poorly labelled DNA molecules. The analysis groups these molecules into a series of distinct clusters. Similarity within the dataset is visualized using a tool for dimensional reduction, t-SNE (20), in Figure 5. Note that, although we use t-SNE to visualize the data, the communities of similar barcodes are generated based on the affinity matrix (which is difficult to visualize), not the t-SNE plot. The t-SNE plot displays each DNA barcode as a single dot and places similar barcodes close together. Around the periphery of the plot is a ring of barcodes that are equally dissimilar to all other barcodes in the dataset (the predominantly grey colouring of the dots indicates that these are almost exclusively the randomly-generated DNA barcodes in the dataset). The consensus barcodes for each detected community were generated and subsequently aligned against a library of 1994 phage genomes, a process that took around 1 h to complete. Figure 5C and Supplementary Table S2 summarize the output of these alignments for five shuffles of the dataset. Randomly re-ordering (shuffling) the dataset changes the relative positions of any two molecules in the affinity matrix for that dataset. We use this to ensure that we recover similar communities for each shuffle and, hence, can be confident that the process we use for constructing affinity and adjacency matrices is robust with respect to the order of the dataset.
Community detection for barcodes generated in silico. For each of 20 bacteriophage genomes, between 50 and 430 barcodes were generated, with 50% labelling efficiency (coloured dots). 1200 randomly generated barcodes were also added to the simulation (grey dots). (A) t-SNE visualization of the network of communities generated from the synthetic data. Each colour represents a different genome with the grey dots representing randomly-labelled DNA barcodes. (B) t-SNE visualization of the communities detected in the dataset. Each colour represents a community of similar DNA barcodes. Note that communities tend to absorb similar random barcodes. As expected, a ring of predominantly random barcodes, which are poorly matched to all other barcodes in the sample, appears on the periphery of the t-SNE visualization. (C) Plot showing the mean number of barcodes matched against those generated. Data was shuffled and aligned five times with error bars showing the maximum and minimum number of barcodes assigned to a given genome for the five repeats.
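The clustering procedure summarized earlier (all-against-all comparison, affinity matrix, adjacency matrix, communities, consensus barcodes) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' Matlab implementation: barcodes are equal-length intensity profiles, the zero-lag Pearson correlation stands in for the alignment weight, links are made with a fixed (hypothetical) threshold, and communities are taken to be connected components of the resulting graph.

```python
from statistics import mean, pstdev


def similarity(a, b):
    """Zero-lag Pearson correlation, used here as a stand-in alignment weight."""
    mean_a, mean_b = mean(a), mean(b)
    sd_a, sd_b = pstdev(a), pstdev(b)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    cov = mean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / (sd_a * sd_b)


def affinity_matrix(barcodes):
    """Compare every barcode with every other barcode."""
    n = len(barcodes)
    return [[similarity(barcodes[i], barcodes[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def adjacency_matrix(affinity, threshold):
    """Link pairs of barcodes whose affinity reaches the threshold."""
    n = len(affinity)
    return [[i != j and affinity[i][j] >= threshold for j in range(n)]
            for i in range(n)]


def communities(adjacency):
    """Group linked barcodes (connected components, an assumed definition)."""
    n = len(adjacency)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if adjacency[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


def consensus(barcodes, group):
    """Average barcode of one community, position by position."""
    return [mean(column) for column in zip(*(barcodes[i] for i in group))]


# Toy data: two pairs of similar barcodes and one unrelated barcode.
barcodes = [
    [1, 5, 1, 5, 1, 5],
    [1.2, 4.8, 0.9, 5.1, 1.1, 5.0],
    [5, 1, 5, 1, 5, 1],
    [4.9, 1.1, 5.2, 0.8, 5.0, 1.2],
    [3, 3, 1, 4, 2, 5],
]
groups = communities(adjacency_matrix(affinity_matrix(barcodes), threshold=0.9))
first_consensus = consensus(barcodes, groups[0])
```

In this toy run the unrelated barcode forms a community of its own. With a looser threshold it would be absorbed into a neighbouring community, the same effect noted for random barcodes in Figure 5B.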
Figure 5 shows that, using data generated in silico, 18 of the 20 bacteriophage genomes that were introduced to the mixture can be consistently identified across all five repeats of the analysis. Of the other 1974 genomes in the library, no incorrect (false positive) matches were made. The KBNP1711 genome is not identified in any of the analyses and we attribute this to the inherently low score our alignment weighting gives to good matches with this genome. Other factors, such as absolute and relative barcode numbers, as well as coverage, also play a role in determining the efficacy of the analysis in detecting a given genome. Also, note that the impact of the randomly generated barcodes on the analysis limits an absolute, quantitative analysis of the data at this point. Such barcodes represent, for example, a contaminating population of DNA in the dataset or poorly labelled DNA molecules. The detected communities (Figure 5B) invariably expand to absorb a handful of these molecules, such that we generally overestimate the number of barcodes belonging to a given community in the analysis.

Identifying viruses against large quantities of host cell genome background

To test this approach experimentally, we sought to apply it to identify viral infections of cells. In a first example, we doped a known virus genome (bacteriophage lambda) into a sample of bacterial genomic DNA (E. coli ER2566), which was labelled and imaged, as described earlier. We mixed the sample at a ratio of 1:4 (weight for weight, virus:bacterial genomic DNA) to mimic the expected copy number (around 20 copies per cell) of such an infection. Figure 6 shows a clear and unambiguous identification of the phage lambda genome from the mixed sample.
In this case, we extracted 5114 barcodes from the imaging dataset, of which 475 were clustered and identified as the lambda phage genome. Figure 6A shows the t-SNE visualization derived from the affinity matrix and the two clusters of similar barcodes that are identified as the bacteriophage lambda DNA, highlighted in red and blue. The process of identification, using a library of 2000 phage genomes, took 47 min to run on a standard laptop computer (16 GB RAM, 3.20 GHz Intel Core i7 processor). The t-SNE visualization of the data in Figure 6 is typical of an experimental dataset, where (amongst other factors) labelling efficiency, impurities in the sample and aggregation of DNA barcodes give rise to larger, more diffuse communities than we observed for the simulated dataset. Figure 6B shows that this noise in the dataset has no significant impact on our ability to accurately identify a simple virus genome from a library of possibilities, using the consensus barcode of a given community. In total, 2 out of 41 clusters were identified as the bacteriophage lambda genome. Quantitatively, the absolute number of lambda phage genomes in the sample is underestimated, with approximately one in ten of the barcodes (475 of 5114) attributed to the bacteriophage lambda, whereas we introduced approximately one in five to the mixture.

Figure 6. Identification of Lambda bacteriophage DNA in genomic mixture by de novo separation and alignment of experimental barcodes. (A) t-SNE visualization of the communities generated from the adjacency matrix. Community detection gives rise to two clusters (blue and red points) which are matched to the lambda phage genome. A total of 475 of 5114 barcodes are in these two clusters. (B) Consensus (average) barcodes (solid lines) generated from the blue and red clusters and their alignment to the bacteriophage lambda reference genome (dotted lines).
We extended this model study to investigate an infection of a population of cultured human (HeLa) cells, infected with human adenovirus A (type 12), 72 h post-infection. The mixed sample of human and viral DNA was fluorescently labelled, deposited and imaged. The barcodes extracted from these images (1986) were compared to one another, creating an affinity matrix from which 21 communities were identified. Consensus barcodes from these were compared to a database of 128 reference barcodes of vertebrate viruses. A single virus (human adenovirus A) was identified from the sample, in ∼10 min. Figure 7 shows a t-SNE visualization of the four communities of barcodes from the sample that are identified as human adenovirus A, together with their associated consensus barcodes. Note that each community describes a slightly different consensus barcode (Figure 7B). Supplementary Figure S10 shows exemplars of individual virus DNA barcodes identified in this sample.

Figure 7. Identification of human adenovirus A DNA sample by separation, de novo alignment and assignment of consensus barcode to reference library. (A) t-SNE visualization of the communities generated from the adjacency matrix. Community detection gives rise to four clusters (blue, red, yellow and purple points) which are matched to the human adenovirus A genome. 669 of 1986 barcodes (4 from 21 clusters) are assigned to adenovirus A DNA.
(B) Consensus (average) barcodes (solid lines) generated from the blue, red, yellow and purple clusters and their alignment to the human adenovirus A reference genome (dotted lines).

In the two above examples, each cell in the sample contains several tens of copies of an invading genome. We sought to explore the limits of our approach by applying this to the identification of (relatively low copy number) plasmids in bacterial cultures. Initial simulations of mixed samples of genomic DNA and plasmid DNA showed promising results, even where we introduce appropriate levels of experimental imperfections to the synthetic dataset. However, initial experiments using this approach have proven unsuccessful in reliably identifying large plasmids in bacterial cultures. This we attribute to their low copy number in bacterial cells. Indeed, if we reduce the number of copies of a plasmid to less than five per cell, simulations show that it is (unsurprisingly) difficult to identify a community of barcodes that can be mapped to a specific plasmid. However, since the sequence of interest is known in this case, a simple pairwise alignment of the dataset to a reference (plasmid) barcode can be used and has yielded a handful of candidate barcodes with high alignment weighting to the resistance plasmid (Supplementary Figures S11–S13).
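When the reference is known, the search reduces to scoring every experimental barcode against that single reference and keeping the best-scoring molecules. The sketch below illustrates this in Python under simplifying assumptions: barcodes and the reference are lists of intensities on a common scale, the score is the maximum Pearson correlation over all offsets (a stand-in for the alignment weighting), and all names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length profiles."""
    mean_a, mean_b = mean(a), mean(b)
    sd_a, sd_b = pstdev(a), pstdev(b)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    cov = mean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov / (sd_a * sd_b)


def best_alignment(barcode, reference):
    """Slide the barcode along the reference; return (best score, offset)."""
    best_score, best_offset = -1.0, 0
    for offset in range(len(reference) - len(barcode) + 1):
        window = reference[offset:offset + len(barcode)]
        score = pearson(barcode, window)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def top_candidates(barcodes, reference, k):
    """Indices of the k barcodes with the highest score against the reference."""
    scored = [(best_alignment(b, reference)[0], i) for i, b in enumerate(barcodes)]
    scored.sort(reverse=True)
    return [index for _score, index in scored[:k]]


# Toy data: two barcodes are fragments of the reference, one is unrelated.
reference = [0, 1, 0, 3, 5, 2, 0, 4, 1, 0]
barcodes = [
    [3, 5, 2, 0],  # matches the reference at offset 3
    [1, 1, 2, 1],  # unrelated molecule
    [0, 4, 1, 0],  # matches the reference at offset 6
]
candidates = top_candidates(barcodes, reference, k=2)
```

Because only a handful of molecules is expected to match, the output is a short ranked list of candidates rather than a community; the candidates still need independent confirmation.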
Further investigation is required to confirm the identity of these molecules, yet this preliminary result suggests that the identification of a small number of molecules of interest from a large sample is possible. Such an approach could be twinned with super-resolution imaging in the future to better resolve the barcodes of such a subset of candidate molecules.

CONCLUSION

We have provided a basis for the use of DNA barcodes as a tool for visualization of the sequence of whole genomes. We have collected multiple datasets containing gigabases of DNA, each of which is of the order of 120 MB in size. We have shown that these data can be searched for a genomic region of interest and used to search a library of genomes for matching organisms, and that similar molecules within the dataset can be identified and clustered to identify viral infections against the background of host genomic material. In the future, work will focus on exploiting the strength of these optical measurements to allow multiplexed studies of both sequence and ‘event’. Indeed, we believe that this unique view of the genome will enable a new perspective, i.e. one with sequence context, on events and processes, such as replication (in combination with labelling of replicated DNA using halogenated (21) or alkyne-functionalized (22) nucleotide analogues), (fluorescently-tagged) drug or protein binding or genomic editing across whole genomes, at the single-molecule level.

DATA AVAILABILITY

The datasets used for analysis in this study and an annotated version of the Matlab code we used to extract, process and analyse DNA barcodes are available at edata.bham.ac.uk, DOI: 10.25500/eData.bham.00000255.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS

We would like to gratefully acknowledge Dr Francisco Fernandez-Trillo for invaluable support regarding the synthesis of AdoHcy-azide, Dr Roger Grand for supplying DNA extracted from HeLa cells, infected with the Human adenovirus; Dr Michelle Buckner and Prof. Laura Piddock for supplying DNA extracted from bacterial cultures. We would especially like to thank Anna Dumitriu and her collaborator Dr Sarah Goldberg (MRG-Grammar/Technion) for sharing with us her ‘Make Do and Mend’ strain of E. coli and for bringing her inspirational art/science approach to our lab.

FUNDING

European Union's Horizon 2020 research and innovation programme under grant agreement [634890]; Engineering and Physical Sciences Research Council through a Healthcare Technologies Challenge Award (RKN) [EP/N020901/1]; Physical Sciences for Health Centre for Doctoral Training (NW) [EP/L016346/1]. Funding for open access charge: University of Birmingham.

Conflict of interest statement. R.K.N. is a founder of Chrometra, a company selling kits for methyltransferase-directed labelling of DNA.

REFERENCES

1. Müller V., Westerlund F. Optical DNA mapping in nanofluidic devices: principles and applications. Lab Chip. 2017; 17:579–590.
2. Mak A.C.Y., Lai Y.Y.Y., Lam E.T., Kwok T.-P., Leung A.K.Y., Poon A., Mostovoy Y., Hastie A.R., Stedman W., Anantharaman T. et al. Genome-Wide structural variation detection by genome mapping on nanochannel arrays. Genetics. 2016; 202:351–362.
3. Pendleton M., Sebra R., Pang A.W.C., Ummat A., Franzen O., Rausch T., Stütz A.M., Stedman W., Anantharaman T., Hastie A. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods. 2015; 12:780–786.
4. Kronenberg Z.N., Fiddes I.T., Gordon D., Murali S., Cantsilieris S., Meyerson O.S., Underwood J.G., Nelson B.J., Chaisson M.J.P., Dougherty M.L. et al. High-resolution comparative analysis of great ape genomes. Science. 2018; 360:eaar6343.
5. Jain M., Koren S., Miga K.H., Quick J., Rand A.C., Sasani T.A., Tyson J.R., Beggs A.D., Dilthey A.T., Fiddes I.T. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 2018; 36:338–345.
6. Deschamps S., Zhang Y., Llaca V., Ye L., Sanyal A., King M., May G., Lin H. A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping. Nat. Commun. 2018; 9:4844.
7. Zirkin S., Fishman S., Sharim H., Michaeli Y., Don J., Ebenstein Y. Lighting up individual DNA damage sites by in vitro repair synthesis. J. Am. Chem. Soc. 2014; 136:7771–7776.
8. Lacroix J., Pélofy S., Blatché C., Pillaire M.-J., Huet S., Chapuis C., Hoffmann J.-S., Bancaud A. Analysis of DNA replication by optical mapping in nanochannels. Small. 2016; 12:5963–5970.
9. Kim S., Gottfried A., Lin R.R., Dertinger T., Kim A.S., Chung S., Colyer R.A., Weinhold E., Weiss S., Ebenstein Y. Enzymatically incorporated genomic tags for optical mapping of DNA-Binding proteins. Angew. Chem. Int. Ed. Engl. 2012; 51:3578–3581.
10. Neely R.K., Deen J., Hofkens J. Optical mapping of DNA: Single‐molecule‐based methods for mapping genomes. Biopolymers. 2011; 95:298–311.
11. Teague B., Waterman M.S., Goldstein S., Potamousis K., Zhou S., Reslewic S., Sarkar D., Valouev A., Churas C., Kidd J.M. et al. High-resolution human genome structure by single-molecule analysis. Proc. Natl. Acad. Sci. U.S.A. 2010; 107:10848–10853.
12. Levy-Sakin M., Grunwald A., Kim S., Gassman N.R., Gottfried A., Antelman J., Kim Y., Ho S.O., Samuel R., Michalet X. et al. Toward Single-Molecule optical mapping of the epigenome. ACS Nano. 2014; 8:14–26.
13. Deen J., Wang S., Van Snick S., Leen V., Janssen K., Hofkens J., Neely R.K. A general strategy for direct, enzyme-catalyzed conjugation of functional compounds to DNA. Nucleic Acids Res. 2018; 46:e64.
14. Lauer M.H., Vranken C., Deen J., Frederickx W., Vanderlinden W., Wand N., Leen V., Gehlen M.H., Hofkens J., Neely R.K. Methyltransferase-directed covalent coupling of fluorophores to DNA. Chem. Sci. 2017; 8:3804–3811.
15. Kaykov A., Taillefumier T., Bensimon A., Nurse P. Molecular combing of single DNA molecules on the 10 megabase scale. Scientific Rep. 2016; 6:19636.
16. Deen J., Sempels W., De Dier R., Vermant J., Dedecker P., Hofkens J., Neely R.K. Combing of genomic DNA from droplets containing picograms of material. ACS Nano. 2015; 9:809–816.
17. Edelstein A.D., Tsuchida M.A., Amodaj N., Pinkard H., Vale R.D., Stuurman N. Advanced methods of microscope control using μManager software. J. Biol. Methods. 2014; 1:e10.
18. Deen J., Wang S., Van Snick S., Leen V., Janssen K., Hofkens J., Neely R.K. A general strategy for direct, enzyme-catalyzed conjugation of functional compounds to DNA. Nucleic Acids Res. 2018; 46:e64.
19. Forde B.M., Zakour N.L.B., Stanton-Cook M., Phan M.-D., Totsika M., Peters K.M., Chan K.G., Schembri M.A., Upton M., Beatson S.A. The complete genome sequence of Escherichia coli EC958: A high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone. PLoS ONE. 2014; 9:e104400.
20. van der Maaten L., Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008; 9:2579–2605.
21. Iyer D.R., Das S., Rhind N. Analysis of DNA replication in fission yeast by combing. Cold Spring Harb. Protoc. 2018; 2018:pdb.prot092015.
22. Bianco J.N., Poli J., Saksouk J., Bacal J., Silva M.J., Yoshida K., Lin Y.-L., Tourrière H., Lengronne A., Pasero P. Analysis of DNA replication profiles in budding yeast and mammalian cells using DNA combing. Methods. 2012; 57:149–157.

© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
An integrative method to predict signalling perturbations for cellular transitions
Zaffaroni, Gaia; Okawa, Satoshi; Morales-Ruiz, Manuel; del Sol, Antonio
doi: 10.1093/nar/gkz232; pmid: 30949696
Abstract

Induction of specific cellular transitions is of clinical importance, as it allows disease cellular phenotypes to be reverted, or cellular reprogramming and differentiation to be induced for regenerative medicine. Signalling is a convenient way to accomplish such transitions without transfer of genetic material. Here we present the first general computational method that systematically predicts signalling molecules whose perturbations induce desired cellular transitions. This probabilistic method integrates gene regulatory networks (GRNs) with manually-curated signalling pathways obtained from MetaCore from Clarivate Analytics, to model how signalling cues are received and processed in the GRN. The method was applied to 219 cellular transition examples, including cell type transitions, and overall correctly predicted experimentally validated signalling molecules, consistently outperforming other well-established approaches, such as differential gene expression and pathway enrichment analyses. Further, we validated the predictions of our method in the case of rat cirrhotic liver, and identified the activation of the angiopoietin receptor Tie2 as a potential target for reverting the disease phenotype. Experimental results indicated that this perturbation induced desired changes in the gene expression of key transcription factors (TFs) involved in fibrosis and angiogenesis. Importantly, this method only requires gene expression data of the initial and desired cell states, and therefore is suited for the discovery of signalling interventions for disease treatments and cellular therapies.

INTRODUCTION

Cellular phenotypes can be characterized by stable gene expression profiles maintained by the underlying gene regulatory networks (GRNs). Conversions between different cellular phenotypes (i.e. cellular transitions) can be induced either by directly perturbing GRNs, or by perturbing cellular signalling pathways that in turn act on GRNs.
These transitions range from cell type conversion events (reprogramming, differentiation), to conversions between cellular phenotypes within a cell type, due to drug treatment or disease conditions. The induction of desired cellular transitions is of clinical interest, as it allows cellular disease phenotypes to be reverted to their healthy counterparts, or required cells and tissues to be derived for cell replacement therapies. By doing so without transfer of genetic material, but rather acting on signalling pathways, the safety concerns currently posed by gene therapy protocols can be overcome (1). In order to systematically identify signalling perturbations induced by small molecules that are able to trigger changes in cellular phenotype, the effect of signalling on gene expression must be modeled. In this regard, two broad classes of computational methods have been recently developed, namely GRN-free and GRN-based approaches. Some of the GRN-free methods compare a signature gene list from the query gene expression profile with a compendium of signatures associated with known perturbations (2–4). Another set of GRN-free methods maps gene expression onto signalling pathways, and identifies the pathways or sub-pathways whose activities are (dys)regulated by using enrichment measures (5,6). Although these methods have been applied to identify signalling pathway perturbations inducing gene expression changes, they lack the mechanistic understanding of how a change in TF expression or activity gives rise to a new cellular phenotype. Thus, to capture how signalling can induce cellular transitions, a model that integrates signalling and gene regulatory networks is required. Existing GRN-based methods use ordinary differential equations (ODEs) to model the expression level of each gene as a function of the expression of its regulators (7–9).
Nevertheless, ODE-based modelling frameworks cannot be applied to systems where only a small number of transcriptomics samples is available, as they require a large amount of data for parameter estimation. Various studies have presented manually-curated models where signalling and gene regulatory networks have been integrated to study individual cellular transitions (10–12). However, we are not aware of any computational approach that systematically integrates and models the signalling and transcriptional regulatory layers without requiring a large amount of gene expression data. Here, we introduce a computational method that predicts signalling molecules whose perturbation can induce transitions between cellular phenotypes, given their initial and target gene expression profiles. This approach integrates the signalling network with a Boolean transition-specific GRN model. A central role in this model is played by interface TFs, which connect the two regulatory layers. The signalling information transmitted along pathways results in the activation or inhibition of interface TFs, which then act on the GRN, triggering cellular transitions. In line with previous observations (13,14), the transmission of signalling information is modeled as a probabilistic event depending on protein availability. Indeed, preliminary results show that the pathways predicted with this approach were enriched in proteins differentially phosphorylated upon perturbation, supporting their involvement in the signal transduction. Further, signalling molecules were ranked according to how effectively they act on the interface TFs that can induce the desired cellular phenotype. Importantly, by considering changes in the GRN initial state upon interface TFs perturbations, this method is able to model how signalling cues are received and processed in the GRN to give rise to the desired phenotype. To our knowledge, this mechanistic insight is not provided by other general methods. 
We applied our method to 219 cellular transition examples, including the induction of cellular differentiation and reprogramming, and obtained correct predictions in the majority of them. Importantly, this method showed better performance than well-established GRN-free methods, and similar performance to another GRN-based method (DeMAND), while requiring substantially less data. Finally, the method was applied to the prediction of signalling molecules for the reversion of cirrhotic liver to its healthy counterpart. Experimental results confirmed that the activation of one of the predicted candidates restored the healthy expression state of key TFs involved in cirrhosis. In summary, here we propose the first general method, to our knowledge, which uses gene expression data to identify signalling molecules able to induce cellular transitions. The low data requirements make this tool readily applicable to the design of new experimental protocols and the discovery of signalling perturbations for disease treatment or cellular therapies.

MATERIALS AND METHODS

Perturbation targets

Mapping of drugs and small molecules to their direct protein targets was carried out using STITCH (http://stitch.embl.de/ (15), v5.0, accessed in October 2017, with experimental evidence confidence >0.4); DrugBank (www.drugbank.ca (16), accessed in October 2017) and MetaCore from Clarivate Analytics were used to specify the effect (activation, inhibition, unknown effect). For growth factors and proteins, the interacting proteins were obtained from STITCH (same selection criteria), and the signalling network retrieved from MetaCore was used to define the effect on the targets.

Datasets

All datasets contained in the Connectivity Map (build 02, (17)) and generated on Affymetrix Human Genome U133A 2.0 Array were processed.
We also manually selected from ArrayExpress microarray experiments where expression data was collected before and after the application of a single perturbation, prioritizing non-cancer cell lines and, in particular, experiments related to cell differentiation or reprogramming. Datasets were discarded if all the targets of the used perturbation were either absent from the signalling network, or not connected to interface TFs by a directed path. In addition, experiments with chemically undefined perturbations (e.g. serum, oxygen, co-culturing conditions etc.) were also discarded. The evaluation was restricted to datasets with selective perturbation, meaning the number of target signalling molecules present in the signalling network was ≤30. These criteria allowed us to test our method on the prediction of well-characterized and specific signalling perturbations. The considered datasets contain expression values before and a few hours after the perturbation (6 h after drug application for CMap, and up to 48 h after the last perturbation in the manually selected datasets). In reprogramming examples, gene expression data from primary cells for both the initial and the desired cellular states was used.

Phosphoproteomics datasets

Studies where a single perturbation was applied and quantified through phosphoproteomics data were paired to gene expression datasets matching the initial and final conditions as closely as possible. Ideally, the same cell type was perturbed with the same chemical, and gene expression was measured with comparable delay after perturbation. When this was not possible, time after perturbation was allowed to change up to 48 h. In addition, we considered closely related cell lines and different chemical compounds targeting the same protein targets.
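The network-based selection rules given under Datasets (at least one perturbation target present in the signalling network and connected to an interface TF by a directed path, and no more than 30 targets in the network) can be sketched as a simple filter. The Python below is an illustration only: the adjacency-list representation, the function names and the toy network are hypothetical, and a target that is itself an interface TF is counted as connected, which is an assumption.

```python
from collections import deque


def reaches_interface_tf(network, start, interface_tfs):
    """Breadth-first search along directed edges from one target.

    A start node that is itself an interface TF counts as connected
    (an assumption made for this sketch).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in interface_tfs:
            return True
        for successor in network.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


def keep_dataset(targets, network, network_nodes, interface_tfs, max_targets=30):
    """Apply the network-based selection rules to one perturbation."""
    in_network = [t for t in targets if t in network_nodes]
    if not in_network:                  # all targets absent from the network
        return False
    if len(in_network) > max_targets:   # perturbation is not selective
        return False
    # Keep the dataset if at least one target can reach an interface TF.
    return any(reaches_interface_tf(network, t, interface_tfs) for t in in_network)


# Toy directed signalling network (hypothetical node names).
network = {"receptor": ["kinase"], "kinase": ["TF1"], "orphan": ["dead_end"]}
network_nodes = {"receptor", "kinase", "TF1", "orphan", "dead_end"}
interface_tfs = {"TF1"}

kept = keep_dataset(["receptor"], network, network_nodes, interface_tfs)
unconnected = keep_dataset(["orphan"], network, network_nodes, interface_tfs)
absent = keep_dataset(["unknown_protein"], network, network_nodes, interface_tfs)
```

The exclusion of chemically undefined perturbations is a manual annotation step and is not represented in this filter.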
Regarding quantitative phosphoproteomics data, the lists of differentially phosphorylated proteins were obtained from the original papers; when not available, we repeated the analysis as described therein (see Table 1). We used the highest fold change observed for any phosphosite on a protein as the fold change of that whole protein.

Table 1. Datasets for which both phosphorylation and gene expression data is available. Gene expression datasets are indicated with their GEO accession number

Dataset | Cell type | Perturbation | Phosphorylation study (PMID) | LFC cutoff for DP | Gene expression datasets (GEO access ID; control, treated)
Gnad et al., 2016 | HCT116 cells | MAPK inhibition [GDC0973 (1 µM)] | 27273156 | log2(3) | GSM455560, GSM455565
Wilkes et al., 2015 | MCF7 cells | EGFR inhibition [EGFR2] | 26060313 | 1 | GSM149914, GSM149941
Rudolph et al., 2016 | MCF7 cells | EGF | 28009266 | 2.38 | GSM325937, GSM325958, GSM325959
Sharma et al., 2014 | HeLa cells | EGF | 25159151 | 1 | GSM156764, GSM156770
D’Souza et al., 2014 | HaCaT cells | TGFβ | 25056879 | 1 | GSM297456, GSM297458
Wierer et al., 2013 | MCF7 cells | estradiol | 23770244 | log2(1.5) | GSM289651, GSM289654, GSM289652, GSM289655, GSM289653, GSM289656

The same log fold change values used in the original studies were used to define differential phosphorylation.
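The protein-level summary described above (one fold change per protein, taken from its most strongly changed phosphosite, followed by a study-specific cutoff) can be sketched as follows. This Python example rests on two assumptions that the text does not spell out: ‘highest’ is read as largest in absolute value, and the cutoff is applied to the absolute log fold change. The input format and protein names are hypothetical.

```python
def protein_fold_changes(phosphosites):
    """Collapse phosphosite-level log fold changes to one value per protein.

    phosphosites: iterable of (protein, site, log2 fold change) tuples.
    The phosphosite with the largest absolute change represents the protein
    (an assumed reading of 'highest fold change').
    """
    per_protein = {}
    for protein, _site, lfc in phosphosites:
        if protein not in per_protein or abs(lfc) > abs(per_protein[protein]):
            per_protein[protein] = lfc
    return per_protein


def differentially_phosphorylated(per_protein, lfc_cutoff):
    """Proteins whose absolute log fold change reaches the cutoff."""
    return sorted(p for p, lfc in per_protein.items() if abs(lfc) >= lfc_cutoff)


# Toy data with hypothetical proteins and sites.
phosphosites = [
    ("P1", "S10", 0.4),
    ("P1", "T22", -1.8),
    ("P2", "Y5", 0.6),
    ("P3", "S7", 1.2),
]
per_protein = protein_fold_changes(phosphosites)
dp_proteins = differentially_phosphorylated(per_protein, lfc_cutoff=1)
```

The cutoff of 1 used here corresponds to the value listed for several studies in Table 1; each study would use its own value.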
Datasets for which both phosphorylation and gene expression data is available. Gene expression datasets are indicated with their GEO accession number Gene expression datasets (GEO access ID) Dataset Cell type Perturbation Phosphorylation study (PMID) LFC cutoff for DP control treated Gnad et al., 2016 HCT116 cells MAPK inhibition [GDC0973 (1uM)] 27273156 log2(3) GSM455560 GSM455565 Wilkes et al., 2015 MCF7 cells EGFR inhibition [EGFR2] 26060313 1 GSM149914 GSM149941 Rudolph et al., 2016 MCF7 cells EGF 28009266 2.38 GSM325937 GSM325958 GSM325959 Sharma et al., 2014 HeLa cells EGF 25159151 1 GSM156764 GSM156770 D’Souza et al., 2014 HaCaT cells TGFβ 25056879 1 GSM297456 GSM297458 Wierer et al., 2013 MCF7 cells estradiol 23770244 log2(1.5) GSM289651 GSM289654 GSM289652 GSM289655 GSM289653 GSM289656 Gene expression datasets (GEO access ID) Dataset Cell type Perturbation Phosphorylation study (PMID) LFC cutoff for DP control treated Gnad et al., 2016 HCT116 cells MAPK inhibition [GDC0973 (1uM)] 27273156 log2(3) GSM455560 GSM455565 Wilkes et al., 2015 MCF7 cells EGFR inhibition [EGFR2] 26060313 1 GSM149914 GSM149941 Rudolph et al., 2016 MCF7 cells EGF 28009266 2.38 GSM325937 GSM325958 GSM325959 Sharma et al., 2014 HeLa cells EGF 25159151 1 GSM156764 GSM156770 D’Souza et al., 2014 HaCaT cells TGFβ 25056879 1 GSM297456 GSM297458 Wierer et al., 2013 MCF7 cells estradiol 23770244 log2(1.5) GSM289651 GSM289654 GSM289652 GSM289655 GSM289653 GSM289656 The same log fold change values used in the original studies were used to define differential phosphorylation. View Large Prior knowledge networks Signalling network We retrieved 75 canonical signalling pathways present in MetaCore from Clarivate Analytics in July 2017, and merged them together in a single signalling network, composed of 2496 nodes and 6876 edges. 
In MetaCore, all edges are obtained by expert manual curation of full text papers from literature, directed and signed when possible; the nodes represent signalling entities, either single proteins or complexes that act as functional entities. We removed edges corresponding to ‘Technical’ or ‘Unspecified’ effect, and ‘Technical’, ‘Transcriptional Regulation’, ‘Influence on Expression’, ‘Catalysis’ and ‘Transport’ mechanisms. Some TFs are known to act as complexes but are represented in MetaCore as separate functional nodes all interacting with the same targets. We checked the literature supporting all the interactions involving these TFs, and manually removed the ones where the interaction did not specifically involve the TF considered, but other components of the complex (see Supplementary Table S1 for list of pathways used and the list of manually removed edges).

Transcriptional regulatory interactions

For transcriptional regulatory interactions, we considered all interactions among human TFs and transcriptional regulators as listed in Animal TFDB 2.0 (18), that are labelled in MetaCore as ‘Transcriptional regulation’, ‘Influence on Expression’ and ‘Regulation’, with ‘Activation’, ‘Inhibition’ or ‘Unspecified’ effect (accessed in March 2017).

Microarray data processing

All data was processed starting from raw CEL files with the same pipeline, consisting of normalization with frozen-RMA (R package fRMA (19)) and assignment of expression state by Gene Expression Barcode (20,21). Briefly, the barcode approach assumes that the distribution of normalized, log-transformed expression values for a specific probeset observed across multiple tissues, cell types and conditions, can be fitted with a mixture model of a Gaussian distribution corresponding to non-expressed values, and a uniform distribution corresponding to expressed values (20).
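As an illustration of how such a two-component mixture translates into a probability of expression, the following Python sketch computes the posterior weight of the uniform (expressed) component for one normalized log-expression value. It assumes equal prior weights of 1/2 for the two components and a uniform component spanning μ to 15, as in the definition of expression probability used in this work. The function names and the example values of μ and σ are hypothetical and are not taken from the Gene Expression Barcode packages.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x (non-expressed component)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, low, high):
    """Density of U(low, high) at x (expressed component)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def expression_probability(x, mu, sigma, upper=15.0):
    """Posterior probability that x comes from the expressed component,
    assuming equal prior weights of 1/2 for both components."""
    f_e = uniform_pdf(x, mu, upper)
    f_n = normal_pdf(x, mu, sigma)
    denominator = 0.5 * f_e + 0.5 * f_n
    if denominator == 0.0:
        return 0.0  # both densities are numerically zero at x
    return 0.5 * f_e / denominator


# Hypothetical probeset parameters, for illustration only.
mu, sigma = 3.0, 0.5
for value in (2.5, 3.5, 6.0):
    print(value, round(expression_probability(value, mu, sigma), 4))
```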
We selected for each gene the probeset with the highest variance; then, given this model, we assigned Boolean state 1 (expressed) to TFs that had probability lower than 0.05 of belonging to the non-expressed distribution (corresponding to parameter cutoff = 0.95 in the pipeline), and state 0 otherwise. Furthermore, in this work we defined the expression probability of a probeset $x$ as the ratio: \begin{equation*}p(x) = \frac{\frac{1}{2} f_e(x)}{\frac{1}{2} f_e(x) + \frac{1}{2} f_n(x)}\end{equation*} where $f_e$ is the probability density function (pdf) of $x$ in the uniform distribution $U(\mu, 15)$, and $f_n$ is the pdf of $x$ in $N(\mu, \sigma)$ (see Supplementary Information and Supplementary Figure S1). To each protein, we assigned the maximum probability of expression found across all replicates. The values $\mu$ and $\sigma$ were calculated and distributed in (21) and are available as R Bioconductor packages for microarray platforms Affymetrix Human Genome U133A, Affymetrix Human Genome U133 Plus 2.0, Affymetrix Human Genome U133A 2.0, Affymetrix Human Gene 1.0 ST Array, Affymetrix Mouse Gene 1.0 ST Array and Affymetrix Mouse Genome 430 2.0 Array. While relying on pre-processed expression value distributions limits the application of our method to these specific microarray platforms, it also reduces the requirement for data samples to one sample per cellular state, one for the initial and one for the required state.

Gene regulatory network (GRN) inference

TFs that were assigned different Boolean states (1 = expressed, 0 = not expressed) before and after the perturbation are assumed to also have differential activity in the two cellular states. These TFs were connected in a gene regulatory prior-knowledge network. This Boolean network was pruned so that the resulting GRN matches the initial and final Booleanized gene expression states, as described in (22).
Briefly, this algorithm assumes that both states are represented by a separate point attractor in the Boolean network state space, and removes edges from the initial network in order to make the attractor states compatible with the Booleanized gene expression profiles. Datasets with GRNs containing <10 connected TFs after the pruning were excluded from further analysis, as we observed that they tend to correspond to perturbations affecting biological processes different from signalling (e.g. metabolic reactions, structural proteins).

Addition of TFs connecting signalling pathways and GRN

The activity of signalling molecules on the GRN is mediated by the TFs that belong to signalling pathways and therefore can act as signalling effectors (interface TFs). TFs that are not expressed initially will require some time to be expressed, and are not likely to be involved in the signalling response. Additionally, we assumed that whatever signal is applied to the cell, it initially travels to the nucleus using proteins that are already expressed in the cell. Therefore, in this study we defined interface TFs as TFs that regulate TFs present in the GRN (GRN-TFs), are expressed at the initial time point, and are connected through expressed signalling paths to any of the source nodes (0-indegree nodes) of canonical signalling pathways. No further filtering was applied (Supplementary Figure S2).

In silico perturbations of interface TFs

The interface TFs that are more likely to drive the cells from the initial to the final gene expression profile were found by in silico perturbations of the initial GRN state. We exhaustively tested combinations of up to four interface TFs at the same time, by fixing their Boolean state and updating synchronously the Boolean state of the network following a majority (threshold) logic rule until it converged to a fixed-point attractor.
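The simulation step described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's implementation: the toy network, the node names and the tie-breaking choice (a node keeps its current state when active activators and active inhibitors balance) are assumptions made here, since the text only specifies synchronous updates under a majority rule with the perturbed interface TFs held fixed.

```python
def majority_step(state, regulators, fixed):
    """One synchronous update. regulators maps node -> [(source, sign)],
    with sign +1 for activation and -1 for inhibition."""
    new_state = {}
    for node, current in state.items():
        if node in fixed:
            new_state[node] = fixed[node]  # perturbed TFs stay clamped
            continue
        # Only active regulators vote; activators count +1, inhibitors -1.
        vote = sum(sign for source, sign in regulators.get(node, ()) if state[source])
        if vote > 0:
            new_state[node] = 1
        elif vote < 0:
            new_state[node] = 0
        else:
            new_state[node] = current  # assumed tie-break: keep state
    return new_state


def simulate(initial, regulators, fixed, max_steps=100):
    """Iterate until a fixed point is reached; return None otherwise."""
    state = dict(initial)
    state.update(fixed)
    for _ in range(max_steps):
        updated = majority_step(state, regulators, fixed)
        if updated == state:
            return state
        state = updated
    return None  # no fixed point within max_steps (e.g. a cycle)


def flipping_score(initial, final, grn_tfs):
    """Number of GRN-TFs whose Boolean state differs after simulation."""
    return sum(initial[tf] != final[tf] for tf in grn_tfs)


# Toy example: interface TF "I" activates A; A activates B and inhibits C.
regulators = {"A": [("I", 1)], "B": [("A", 1)], "C": [("A", -1)]}
initial = {"I": 0, "A": 0, "B": 0, "C": 1}
final = simulate(initial, regulators, fixed={"I": 1})
print(final, flipping_score(initial, final, ["A", "B", "C"]))
```

In the full procedure this simulation would be repeated for every combination of up to four interface TFs and for each assignable state, which is the exhaustive part of the search.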
Interface TFs have the property of being directly regulated by signalling events, meaning that their expression at the initial cellular state is not sufficient to assume they are also active or inactive. Unless an interface TF had different Boolean states between the initial and final gene expression, its activity state at the initial time point was unknown. Therefore, we simulated the network state by assigning it both 0 and 1 states. The perturbations are ranked according to their flipping score, which is the number of GRN-TFs that change their state after simulation. We then selected the combinations of interface TFs that obtained the three best (including ties) flipping scores (best performing combinations, BPCs). This corresponds to parameter best = 3 in the pipeline. We removed the combinations of interface TFs that did not show any synergistic effects (i.e., combinations whose flipping TFs were the same as the union of flipping TFs of individual constituent interface TFs). This is because our method aims to prioritize candidate signalling molecules that specifically target interface TFs over those that target a large number of interface TFs, in order to avoid unspecific effects on cells upon their perturbations. Prioritizing interface TFs that synergistically maximize the flipping score allows us to select BPCs with fewer, but more specific, interface TFs that need to be targeted by optimal signalling perturbations (Supplementary Figure S3). Datasets for which the best flipping score did not represent at least 40% of the GRN-TFs were discarded from further analysis.

Probability of intermediate signalling molecules regulating interface TFs

We approximated the regulation of signalling molecule x on interface TF y by considering the most probably expressed path (MPP) connecting x to y.
The probability $M_{x,y}$ of the MPP is defined as: \begin{equation*}M_{x,y} = p\left(MPP_{x \to y}\right) = \max\ p\left(x \to \ldots \to y\right)\end{equation*} where $p(x \to \ldots \to y) = \prod_{j \,\in\, \mathrm{proteins} \,\in\, \mathrm{simple\ path}} p(j)$ and $p(j)$ is the expression probability of the intermediate signalling molecule $j$. We used two variations of this approach (see Supplementary Information and Supplementary Figure S4):

(a) Proteins belonging to the same functional modules tend to show transcriptional correlation (23). Accordingly, we calculated the Pearson correlation coefficient of expression values across all the processed datasets in CMap, and we increased the probability of interactions among proteins that were correlated at the expression level (absolute correlation > 0.7 and sign of correlation matching the sign of the interaction). We used these corrected probabilities to identify the MPPs and calculate their correlation-corrected probability.

(b) We expect that the longer the path used to reach an interface TF (measured in number $n$ of interactions present in the path), the higher the chances that crosstalk and non-functional interactions occur. Therefore, the probability of the MPP connecting each signalling protein and interface TF was multiplied by $e^{-n}$ (24), resulting in what we defined as length-corrected probability.

The distribution of probabilities $P_x$ from the same signalling protein $x$ to all interface TFs was obtained by: \begin{equation*}P_{x,y} = \frac{M_{x,y}}{\sum_{i \in \mathrm{interfaceTFs}} M_{x,i}}\end{equation*} Two $P_x$ vectors exist for each molecule, one corresponding to correlation-corrected probability, and one to length-corrected probability.
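The MPP search and the normalization into $P_x$ can be sketched in Python as below. Several choices are assumptions made for illustration and are not stated in the text: the product is taken over every node on the path including both endpoints, the maximum-product path is found with Dijkstra's algorithm on $-\log p$ (a standard reformulation), the length correction is applied to the MPP after it has been selected, and the correlation-based correction is omitted. The toy network and all names are hypothetical.

```python
import heapq
import math


def most_probable_path(graph, prob, source, target):
    """Return (probability, path) of the path from source to target that
    maximizes the product of node expression probabilities.
    Maximizing a product of values in (0, 1] equals minimizing the sum of
    -log(p), so Dijkstra's algorithm applies."""
    if prob.get(source, 0.0) <= 0.0:
        return 0.0, []
    heap = [(-math.log(prob[source]), source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            return math.exp(-cost), path
        for successor in graph.get(node, ()):
            p = prob.get(successor, 0.0)
            if successor not in settled and p > 0.0:
                heapq.heappush(heap, (cost - math.log(p), successor, path + [successor]))
    return 0.0, []  # target not reachable through expressed proteins


def path_distribution(graph, prob, source, interface_tfs, length_corrected=False):
    """Normalize MPP probabilities from one molecule over all interface TFs."""
    scores = {}
    for tf in interface_tfs:
        m, path = most_probable_path(graph, prob, source, tf)
        if length_corrected and path:
            m *= math.exp(-(len(path) - 1))  # n = number of interactions
        scores[tf] = m
    total = sum(scores.values())
    return {tf: (m / total if total > 0 else 0.0) for tf, m in scores.items()}


# Hypothetical toy network (directed edges) and expression probabilities.
graph = {"X": ["A", "B"], "A": ["TF1"], "B": ["C"], "C": ["TF1", "TF2"]}
prob = {"X": 1.0, "A": 0.2, "B": 0.9, "C": 0.9, "TF1": 1.0, "TF2": 0.5}
print(most_probable_path(graph, prob, "X", "TF1"))
print(path_distribution(graph, prob, "X", ["TF1", "TF2"]))
print(path_distribution(graph, prob, "X", ["TF1", "TF2"], length_corrected=True))
```

In this toy network the longer route through B and C is preferred over the shorter route through A because A is unlikely to be expressed.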
Finally, we determined the sign of each MPP by \begin{equation*}sign_{x,y} = \prod_{e \,\in\, \mathrm{edges}(MPP_{x \to y})} sign(e)\end{equation*} If the sign is positive, the activation (inhibition) of $x$ causes the activation (inhibition) of TF $y$ with probability $P_{x,y}$. If the sign is negative, activation (inhibition) of $x$ causes inhibition (activation) of $y$ with probability $P_{x,y}$.

Prediction of candidate signalling proteins for gene expression state transitions

We assumed that a high frequency of a particular activated/inhibited interface TF among the BPCs is a good indication that it consistently has large effects on the GRN state. For each state assigned to each interface TF (TF-state pair $s$) the frequency $F_s$ in all BPCs was calculated as: \begin{equation*}F_s = \frac{k_s}{k}\end{equation*} where $k$ is the total number of BPCs and $k_s$ is the number of combinations where $s$ is present. The frequencies across all TF-state pairs $S$ were then normalized to sum up to one, giving the probability distribution $Q$: \begin{equation*}Q(s) = \frac{F_s}{\sum_{i \in S} F_i}\end{equation*} The ranking of the signalling molecules was obtained by comparing their probability of reaching the interface TFs ($P_x$) with the frequency of such TFs in the BPCs ($Q$) by Jensen-Shannon divergence: \begin{equation*}JSD\left(P_x \parallel Q\right) = \frac{1}{2} D\left(P_x \parallel M\right) + \frac{1}{2} D\left(Q \parallel M\right)\end{equation*} where $M = \frac{1}{2}(P_x + Q)$ and $D(X \parallel Y) = \sum_i X(i) \log \frac{X(i)}{Y(i)}$ (Kullback–Leibler divergence). Both activation and inhibition of the signalling protein $x$ are possible, but their scores differ because the sign of the paths connecting $x$ to the interface TFs will be opposite, resulting in different effects on the GRN state.
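The scoring step can be illustrated with a short Python sketch that builds $Q$ from a set of BPCs and compares it with two candidate $P_x$ vectors. The BPCs and the $P_x$ values are hypothetical, the natural logarithm is an assumption (the base is not stated in the text), and both distributions are assumed to be indexed over the same ordered list of TF-state pairs.

```python
import math
from collections import Counter


def bpc_distribution(bpcs):
    """Q over TF-state pairs: F_s = k_s / k, renormalized to sum to one."""
    k = len(bpcs)
    counts = Counter(pair for combination in bpcs for pair in set(combination))
    freq = {pair: count / k for pair, count in counts.items()}
    total = sum(freq.values())
    return {pair: f / total for pair, f in freq.items()}


def kl_divergence(x, y):
    """Kullback-Leibler divergence; terms with x_i = 0 contribute zero."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0.0)


def jensen_shannon(p, q):
    """JSD(p || q) with M = (p + q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical BPCs: each is a set of (interface TF, state) pairs.
bpcs = [{("TF1", 1), ("TF2", 0)}, {("TF1", 1)}, {("TF3", 1)}]
q_map = bpc_distribution(bpcs)
pairs = sorted(q_map)
q = [q_map[pair] for pair in pairs]

# Hypothetical P_x vectors for two candidate perturbations, same pair order.
p_specific = [0.8, 0.0, 0.2]   # mostly reaches the frequent pair (TF1, 1)
p_offtarget = [0.0, 1.0, 0.0]  # only reaches the rarer pair (TF2, 0)
for name, p in (("specific", p_specific), ("off-target", p_offtarget)):
    print(name, round(jensen_shannon(p, q), 4))
```

The perturbation whose reach concentrates on the TF-state pair most frequent in the BPCs obtains the smaller divergence and would therefore rank higher.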
For example, assume that the activation of $x$ results in the activation of TF $y$, which is frequently present in the BPCs. The inhibition of $x$, on the other hand, assigns to $y$ the inactive state, which is not present in the BPCs. Here the activation of $x$ would have a better score than its inhibition, because while the probability of reaching TF $y$ is the same, the resulting perturbations on the GRN have different effectiveness in changing its state. The signalling molecules were ranked by JSD values (the smaller the better), and then assigned a rank: \begin{equation*}R(x) = \min_{v \in P_x} rank(x, v)\end{equation*} where $R(x)$ is the rank assigned to signalling molecule $x$, and is defined as the minimum rank obtained by $x$ using either correlation- or length-based $P_x$ variant ($v$). A cut-off was defined as a fraction (from 1 to 10%) of the maximum $R$ value present in the final ranking. The signalling molecules whose rank $R$ was lower than the cut-off were considered candidate drivers of the transition between initial and final states. For single perturbation datasets, the prediction was considered successful if at least one of the direct targets of the experimental perturbation appeared among the candidates, as was also done in (25). At each cut-off, the chance of obtaining at least one perturbation target (i.e. a success) in a randomly selected set of the same size was calculated by one-sided hypergeometric test. The optimal cut-off was selected as the one where our method showed the maximum improvement over random chance, across the datasets coming from CMap.

Functional analysis

GO biological process terms enrichment was calculated separately for candidate and non-candidate signalling molecules using the R package gProfileR.
The overlap between GO terms associated with experimental perturbation targets (target terms) and enriched terms was counted for each dataset and the distributions of these values were compared by one-sided Wilcoxon test. The distances on the signalling network topology were calculated from all signalling molecules to experimental perturbation targets, and in the opposite direction. The direction with shorter distance was then selected, and the average distance from one signalling molecule to all the reachable experimental perturbation targets was calculated. The distributions of average distances from candidate and non-candidate signalling molecules were compared by Wilcoxon test with 100 000 Monte Carlo replicates (P-value < 0.05).

Experimental model of cirrhosis and CVX-060 treatment

To induce cirrhosis, 10 male Wistar rats were exposed to inhalation of CCl4, as previously described (26) and according to the criteria of the investigation and ethics committee of the Hospital Clínic Universitari and the University of Barcelona. Five cirrhotic rats were treated once a week with 10 mg/kg of CVX-060 (Pfizer, Inc., New York, NY, USA) for 4 weeks. CVX-060 was diluted in 500 μl of saline solution and injected intravenously via the tail vein.

Prediction of signalling molecules for reversion of cirrhotic state

Expression data for healthy liver in male Wistar rats was obtained from GEO dataset GSE71201. Gene expression in CCl4 and CCl4+CVX-060 treated livers was quantified using Affymetrix GeneChip Rat Genome 230 2.0 Array. After quality control and PCA visualization, two replicates for each treatment were retained for further analysis. Each gene was assigned gene expression probability equal to $1 - p$, where $p$ is the P-value obtained from Affymetrix MAS5.0 detection call. A gene was considered expressed if its expression probability was equal to or larger than 0.94 (corresponding to call ‘marginal’ or ‘present’ from MAS5.0).
The prediction of GRN state after CVX-060 treatment was obtained by first selecting all the interface TFs in the BPCs that are activated/inhibited with probability higher than zero by the activation of Tie2. Then, the BPCs composed only of such TFs are selected, and the GRN-TFs that change their state in any of these BPCs are expected to flip when CVX-060 is applied. The same GRN state is also obtained when predicting the effect of the inhibition of Ang2 on the GRN state.

Comparison with previously published methods

Differential gene expression

Differential gene expression between the initial and query expression profiles was calculated with the R package limma. Genes with absolute log fold change (lfc) > log2(1.5) and BH-adjusted P-value < 0.05 were considered differentially expressed. When replicates were not available, the lfc cut-off alone was applied. MetaCore pathway enrichment was calculated by one-sided hypergeometric test (BH-adjusted P-value < 0.05). To rank all signalling molecules according to differential expression, they were ordered by decreasing absolute lfc values.

SPIA

Differentially expressed genes (DEGs) were selected by BH-adjusted P-value < 0.1, or by lfc > log2(1.5) if replicates were not available. The R package SPIA (27) was used to identify significantly perturbed KEGG signalling pathways. A prediction was considered successful if any of the significant pathways contained any of the direct targets of the perturbation applied in a dataset. Datasets without any DEG were discarded.

Connectivity Map

DEGs were selected by BH-adjusted P-value < 0.05, or by lfc > log2(1.5) if replicates were not available. Up to 150 up- and down-regulated genes, by decreasing absolute lfc, were submitted to the ‘Batch query’ functionality of CMap L1000 query, accessible at https://clue.io/l1000-query#batch (28), with sig_fastgutc_tool option enabled. The summary results across cell lines were used for further analysis.
The predictions were considered correct if the experimental perturbation, its direct targets, or a drug targeting the perturbation targets were assigned a connectivity score (tau) > 90. Datasets with <10 DEGs, or raising errors during the submission, were discarded from the analysis.

DeMAND

Our method was applied to the GEO datasets used as part of the benchmarking for DeMAND (termed GEO13 in the original paper) (29). Only datasets generated on Affymetrix whole-transcriptome array platforms were considered. When available, values $\mu$ and $\sigma$ (21) were used to calculate the probability of expression; otherwise, the Affymetrix MAS5.0 detection call was used as previously described. Perturbation targets from STITCH, DrugBank, as well as the original paper, were used to define successful predictions. In DeMAND, genes with FDR ≤ 0.1 were considered predicted candidates.

RESULTS

Method overview

We present here a computational method that predicts signalling molecules, including plasma membrane receptors or intermediate signalling proteins, whose perturbations can induce desired cellular transitions. It requires only gene expression profiles of the initial and desired cellular states, and does not require a large number of replicates, time-series expression profiling, or phosphoproteomics data. Therefore, this method can be applied to the transition between any pair of initial and query cell states, including novel cellular transitions that have not been achieved previously. In the first step of the pipeline, the activity of TFs is approximated by Booleanizing their expression state. TFs with differential activity are selected and connected to form a Boolean transition-specific GRN. In this framework, the cellular states are modelled as network state attractors resulting from the same network topology. The objective is to induce the transition from the initial to the desired attractor by acting on TFs that are regulated by signalling interactions (i.e.
interface TFs). Perturbations of these interface TFs are performed exhaustively to identify the ones that are most effective for the GRN state transition (Figure 1A).

Figure 1. Ranking signalling molecules according to their likelihood of inducing desired changes in GRN state. (A) The gene regulatory network (GRN) containing differentially Booleanized TFs is connected to interface TFs. The initial state is perturbed by in silico simulation of fixed states of up to four interface TFs at the same time. The best performing combinations (BPCs) are selected as the ones having the top three flipping scores (including ties). The frequency of each interface TF state (activated +, inhibited –) is calculated and then normalized to give the probability distribution Q of each TF state of causing changes in the GRN state. (B) The expression probability of each protein is mapped onto the signalling network and used to define the probability of signalling interactions. For each signalling molecule X the most probably expressed paths (MPPs) connecting it with all interface TFs are selected. The probability and the sign of the MPPs are calculated, and combined to give the probability distribution P of activating or inhibiting the interface TFs by activating or inhibiting X. Both correlation-based and length-based probabilities are calculated (see Methods). (C) The probability distributions Q and P are compared through Jensen-Shannon divergence (JSD). The score is used to rank the perturbations of each signalling molecule. The best ranking that each molecule obtains across the correlation-based and length-based rankings defines its final rank. A fraction of the final ranking is selected.

In the next step, we predict the effect of the activation/inhibition of each molecule in the signalling network on the downstream interface TFs. Signal transduction is an inherently stochastic process, strongly dependent on post-translational protein modifications not captured by gene expression data (30). However, it has been shown that the response to signalling perturbations changes across different cell types depending on the abundance of specific signalling proteins prior to perturbation (13,14).
Thus, we model signal transduction as a probabilistic process driven by protein availability, which can be estimated from gene expression data, an approach followed also by other methods (31,32). We define the probability of signal transduction from a signalling molecule to interface TFs as the product of the expression probability of all proteins present in the MPP between them. Two variants of the signalling probability are used: (a) the probability of an interaction is higher if the genes involved are correlated in gene expression, since their proteins are more likely to work as a functional unit (23); (b) the probability of a path is multiplied by the exponential of its negative length ($e^{-n}$ for a path of $n$ interactions), to account for the number of interactions required for the signal to reach the TFs (i.e. the longer, the less probable). The sign of each MPP is then incorporated into its outcome, i.e. if the overall sign of a path is positive, the activation/inhibition of that path will activate/inhibit the target interface TF, whereas if the overall sign is negative, the activation/inhibition of that path will inhibit/activate the target TF (Figure 1B). Finally, our method ranks each signalling molecule in the network based on the Jensen-Shannon divergence of the probability P, with which it acts on interface TFs, from the likelihood Q of the interface TFs themselves of changing the GRN state (Figure 1C).

Phosphoproteomics datasets suggest that MPPs are used in signal transduction

While not all proteins are subject to phosphorylation, many signalling pathways rely on phosphorylation cascades for transmitting the signal in the cytoplasm. Therefore, phosphoproteomics experiments allow the inference of protein activity during a signalling event (33).
Proteins showing differential phosphorylation upon perturbation are expected to be transmitting the newly applied signal, and therefore the signalling paths used for signal transduction should show an enrichment in differentially phosphorylated proteins compared to the paths that are not used for signal transduction. Given this assumption, we asked if the MPPs selected by our method were enriched with phosphorylation events. To investigate this, we gathered experiments for which both gene expression and quantitative phosphoproteomics data were acquired before and after a specific perturbation was applied (Table 1). In each of the datasets, the direct target molecules of the experimental perturbation and the interface TFs that they can affect via signalling were identified, and the MPP between each pair of perturbation target molecule and interface TF was computed (Figure 2A). While a protein might contain multiple phosphosites, not all of them are necessarily functional. However, at the moment, only a limited number of proteins have their phosphosites functionally characterized, and for the majority of the proteins this information is not available (33). Therefore, we first considered a protein differentially phosphorylated (DP) if any of its phosphosites showed differential phosphorylation, defined in the original studies (see Table 1). We then calculated the frequency of DP proteins in the MPP, and in randomly selected simple paths among the same initial and final nodes (up to 100 randomly selected paths, limited to maximum path length = 10 edges, Figure 2A). The difference between the frequency of DP proteins in the MPP and the random paths was tested for statistical significance (t-test, P-value < 0.05). In each dataset, the majority of the MPPs had significantly more DP proteins than other possible paths, for both probability computation methods we used to define MPPs (Figure 2B and C). 
Therefore, the MPPs between signalling molecules and interface TFs reasonably captured phosphorylation patterns observed shortly after signalling perturbations. This gave us confidence in using MPPs as biologically relevant paths used by signal transduction.

Figure 2. Enrichment of MPPs in differentially phosphorylated (DP) proteins. (A) MPPs are defined by correlation-based and length-based probabilities. For both of these methods, the fraction of DP proteins in each MPP for each target-interface TF pair is compared by t-test to other simple paths connecting the same pair. (B) Average fraction of MPPs per datasets significantly enriched in DP proteins compared to alternative simple paths containing DP proteins (P-value < 0.05). (C) Breakdown of the single MPPs. Orange: number of interface TFs for which the fraction of DP proteins in the MPP is significantly higher than in other simple paths. Light-blue: the difference is not significant; grey: there are no DP proteins in any of the paths connecting the perturbation target to interface TFs. The same results were obtained with both correlation-based and length-based MPPs.

Prediction of signalling molecules that induce desired gene expression change

We applied our method to datasets belonging to CMap generated on Affymetrix Human Genome U133A 2.0 Array, plus datasets manually selected from ArrayExpress. After quality controls, 219 datasets (193 from CMap, 26 from ArrayExpress) were used for the analysis. For each dataset, a GRN was built and perturbed in silico to obtain BPCs. General characteristics of GRNs and BPCs are summarized in Supplementary Information and Supplementary Figure S5. In order to generate the final signalling molecule ranking, a) the probability with which each signalling molecule can act on interface TFs, and b) the likelihood of the interface TFs to induce the desired GRN state transitions, are compared using Jensen-Shannon divergence (see also Methods). Signalling molecules that specifically reach a few well-performing interface TFs will score better than molecules which indistinctly act on many interface TFs. On average, ∼1400–1500 signalling molecules are present twice in the final ranking, once for their activation and once for their inhibition, resulting in approximately 2900–3000 potential perturbations for each dataset. First, we compared the ranking obtained with our method with one based on differential gene expression. Ordering genes by their log fold change did not prioritize perturbation targets, which were found only after selecting a big portion of the ranking, while our method performed better (see Figure 3A). For example, to retain a correct experimental perturbation in at least 50% of the datasets, 858 molecules need to be selected by log fold change, compared to the 239 required by our method. This confirms that proteins involved in signal transduction do not necessarily show differential expression, and therefore more complex approaches are needed in order to obtain better predictions using gene expression data.
Figure 3. Method performance. (A) Fraction of datasets where at least one target is correctly predicted, across an increasing selection of signalling molecules. Proteins are ranked according to their expression log fold change between the initial and final gene expression profile (DEG), or according to our method. (B) Variation of success rate and number of selected signalling molecules at different ranking cut-offs. Circles = observed success rate, X = random success rate, horizontal error bars = 5th and 95th percentiles of selected set size. The same fraction of ranks selected corresponds to variable set sizes because of ties in the ranking. The set size range was removed from the random success rate points for clarity. (C) Performance across datasets with increasing number of known perturbation targets. Each point represents the random success rate for a dataset, obtained by calculating how likely it is to find at least one perturbation target in a random set of signalling molecules of the same size as the one selected by our method. The solid red lines represent the fraction of datasets in each class for which at least one of the targets was in the list of candidate signalling molecules selected by our method. Red dashed line: average obtained performance across all classes (62%); blue dashed line: random average performance (39%).

We considered a prediction correct if at least one of the known perturbation targets appeared among the top-ranked molecules. The success rate was calculated as the fraction of datasets for which the prediction was correct. Different ranking cut-offs were tested, and for all of them the success rate on CMap datasets was better than that of a random selection of the same number of candidates (Figure 3B). A cut-off of 0.06 was used for subsequent analysis because our method showed the highest performance gain over the random success rate at this cut-off, with 136 of the 219 datasets (62%) being successful (Supplementary Table S2). In particular, it correctly predicted at least one perturbation target in 115/193 CMap examples (60%, versus a random success rate of 41%, Figure 3B and C), 6/10 datasets for non-cancer cell lines selected from ArrayExpress, 5/6 datasets with matched phosphoproteomics data (Table 1), and 10/10 cell type transition datasets (discussed below).
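The two quantities compared here, the observed success rate and the random success rate, can be illustrated with a short Python sketch. It assumes that the random set is drawn without replacement from all ranked molecules, so that the chance of capturing at least one target follows from the hypergeometric distribution; the function and parameter names are hypothetical, and the calculation used for Figure 3 may differ in detail.

```python
from math import comb

def random_success_probability(n_molecules, n_targets, set_size):
    """Probability that a random set contains at least one known target.

    Assumes `set_size` molecules are drawn without replacement from
    `n_molecules` ranked molecules, `n_targets` of which are true targets.
    """
    if not 0 <= n_targets <= n_molecules:
        raise ValueError("n_targets must lie between 0 and n_molecules")
    if not 0 <= set_size <= n_molecules:
        raise ValueError("set_size must lie between 0 and n_molecules")
    # Complement of drawing the whole set from the non-target molecules.
    miss = comb(n_molecules - n_targets, set_size) / comb(n_molecules, set_size)
    return 1.0 - miss

def success_rate(outcomes):
    """Fraction of datasets in which at least one target was recovered."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one dataset is required")
    return sum(bool(o) for o in outcomes) / len(outcomes)

# Observed rate reported in the text: 136 successful datasets out of 219.
observed = success_rate([True] * 136 + [False] * 83)
print(f"observed success rate: {observed:.2f}")

# Hypothetical dataset: 3000 ranked perturbations, 3 known targets,
# and a selected set of 180 molecules (6% of the ranking).
print(f"random success rate: {random_success_probability(3000, 3, 180):.2f}")
```

The random baseline grows with both the number of known targets and the size of the selected set, which is why the comparison in Figure 3C is stratified by the number of targets.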
We observed that our method is particularly successful in predicting signalling molecules for cellular transitions in which a larger number of differentially expressed genes exists between the initial and desired cellular states. This suggests that acting on the GRN with signalling perturbations can be an effective strategy, especially for cellular transitions requiring broad changes in gene expression (Supplementary Figure S6).

The number of direct targets of a compound or protein influences the probability of finding at least one of them among the selected candidates. Therefore, we divided the datasets into classes based on the number of targets reported for the corresponding perturbation. For each of these classes, the performance of our method was compared with the frequency with which at least one real target is expected to appear in random sets of signalling proteins of the same size (Figure 3C). A significantly better performance was obtained in datasets with 1–10 known perturbation targets (P-value = 7.82e–05 for datasets with 1–5 targets, and 3.40e–06 for 6–10), which represent 74% of all datasets tested. The use of target-specific drugs or growth factors is required to induce cellular transitions in a controlled way, and these results demonstrate that our method is particularly suited for such cases.

Because most of the datasets analysed with our method concerned drug application, and only a few of the drug–target pairs had a known sign (16% of all pairs), we could not comprehensively assess the accuracy of the predicted signs. However, we did not observe a bias in our method towards the prediction of inhibition or activation of signalling molecules (Supplementary Figure S7).

Multiple other drug-perturbation gene expression datasets exist, for example the data generated for the DREAM/NCI compound synergy challenge (34), which are often used to benchmark methods that use gene expression data to predict cellular response to drugs (9,29).
However, we could not estimate the gene expression probability for those datasets because their microarray platforms are incompatible with our method.

Properties of candidate signalling molecules

Next, we asked whether the sets of predicted candidate signalling molecules were related to direct perturbation targets by analysing their functional and topological features. First, the canonical signalling pathways extracted from MetaCore were tested for overrepresentation among candidate signalling molecules. We observed the enrichment of at least one pathway in all datasets. Moreover, in 89% of cases at least one of the pathways containing known targets was enriched (Figure 4A). To further evaluate this result, we also tested enrichment of MetaCore canonical pathways in DEGs. Here, 73.5% of the datasets showed some pathway enrichment, but in only 35.2% of all datasets did at least one of the enriched pathways contain perturbation targets (Figure 4A). This result indicates that, also at the signalling pathway level, our method is far more effective in predicting appropriate signalling perturbations than simply using DEGs.

In addition, we collected all the GO biological process terms associated with the perturbation targets (target terms). Then, we calculated which GO terms were enriched in the candidate signalling molecules and in the non-candidate signalling molecules. The target terms were more frequently overrepresented in the candidate signalling molecules than in the non-candidate ones (P-value = 6e–16, Figure 4B).

Figure 4. Features of candidate molecules vs. other molecules. (A) Fraction of datasets for which the enrichment of pathways in signalling molecules, selected either by our method or through differential expression, finds a pathway containing perturbation targets (red), any other pathway (orange), or no pathway significantly enriched.
(B) Percentage of functional terms mapping to the perturbation targets that are also enriched in the selected molecule set or in the discarded molecule set. Whiskers indicate 1.5 × interquartile range. The candidates have a significantly higher proportion of enriched functional terms shared with the perturbation targets (one-sided Wilcoxon test, P-value = 6e–16, confidence interval = (0.1048031, Inf)). (C) The average distance from perturbation targets of candidates is smaller than the average distance of non-candidates. Red bars = significant difference of the distance distributions, grey bars = non-significant difference (Wilcoxon test with 100 000 Monte Carlo replicates).
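The distance compared in Figure 4C is the minimum number of same-direction interactions needed to connect two nodes of the signalling network, which corresponds to a shortest path in a directed graph. A minimal Python sketch using breadth-first search is given below. The adjacency-list representation, the toy network, and the decision to skip unreachable molecule-target pairs are assumptions for illustration only, and the Wilcoxon test with Monte Carlo replicates applied to the resulting distance distributions is not reproduced here.

```python
from collections import deque

def directed_distances(graph, source):
    """Shortest directed path length from `source` to every reachable node.

    graph: dict mapping a node to the nodes it acts on (edge direction
    follows the direction of the interaction).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def mean_distance_to_targets(graph, molecules, targets):
    """Average distance from a set of molecules to the perturbation targets.

    Unreachable molecule-target pairs are skipped (an assumption);
    returns None when no pair is connected.
    """
    values = []
    for molecule in molecules:
        dist = directed_distances(graph, molecule)
        values.extend(dist[t] for t in targets if t in dist)
    return sum(values) / len(values) if values else None

# Hypothetical toy network: C -> A -> B -> T
network = {"C": ["A"], "A": ["B"], "B": ["T"], "T": []}
print(mean_distance_to_targets(network, ["B"], ["T"]))  # candidate close to the target
print(mean_distance_to_targets(network, ["C"], ["T"]))  # non-candidate further away
```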
To investigate where the candidate signalling molecules were located in the signalling network, we compared the distribution of distances (minimum number of interactions with the same direction required to connect two nodes) from selected candidate molecules to perturbation targets, and from non-candidate molecules to targets. We found that the distances were overall significantly shorter for candidate molecules in 72% of the datasets, significantly longer in 2% of the datasets, and comparable in 26% (Figure 4C). This suggests that our method selects molecules that are not randomly scattered in the signalling network, but are found in the region where the applied perturbation acts.

In summary, candidate signalling molecules are more involved in the same biological processes as the perturbation targets than non-candidate molecules or DEGs. Additionally, they are distributed in the signalling network in proximity to perturbation targets. These results suggest that candidate signalling molecules are likely to induce the desired cell state transition, and are novel candidates for further experimental validation.

Application to cell type transitions

In the context of regenerative medicine, the ability to induce cellular conversions between different cell types would allow the replacement of damaged tissues and organs. We tested our method on datasets where growth factors or drugs were used to alter the cellular identity. These cases showed larger GRNs than the CMap datasets (on average 37.5 versus 23 TFs) and overall better performance: in all datasets the candidate signalling molecules contained direct targets of the experimental perturbation (Table 2), compared with the 60% success rate obtained across CMap datasets.

Table 2. Results obtained on cell type transition examples. Ranks passing the 6% cut-off, which correspond to correct prediction, are reported in bold. Type: D = differentiation, A = activation, M = maintenance, R = reprogramming.
Cell types: MSC = mesenchymal stromal cells from the bone marrow; HSPC = hematopoietic stem/progenitor cells; NHEK = normal human epidermal keratinocytes; HME = hematopoietic microenvironment in bone marrow.

| Type | Initial cell type | Perturbation | Final cell type | Ref. | Best rank | Predicted direct targets | Notes |
|---|---|---|---|---|---|---|---|
| D | hMSC | BMP2 | chondrocytes | (35) | 10 | ALK-2; Chordin__inh; BMP receptor 2; Noggin__inh; Ectodin__inh | TGF-β3 targets predicted: TGF-beta receptor type III (betaglycan) |
| D | | TGF-β3 | | | 38 | Endoglin__inh | BMP2 targets predicted: Chordin; Noggin; Ectodin; PTCH1__inh |
| D | HSPC | valproic acid | Erythroid and megakaryocytic precursors | (36) | 5 | HDAC9__inh; HDAC2 | |
| D | NHEK | density-induced differentiation, treated with EGF | terminally differentiated keratinocytes | (37) | 1 | ErbB4; MSK1 | |
| D | hepatoblasts | cAMP | hepatocyte-like cells | (38) | 143 | Protein kinase G1 | |
| A | pre-adipocytes | dexamethasone | primed pre-adipocytes | (39) | 93 | DAX1 | |
| A | mesenchymal stem cells | bFGF | non-HME cells | (40) | 24 | Casein kinase II, alpha chains; Casein kinase II, alpha' chain (CSNK2A2) | |
| A | | TGF-β1 | subendothelial mural cell fate | | 103 | Ubiquitin | bFGF targets predicted: Syndecan-3__inh; Casein kinase II, alpha' chain (CSNK2A2)__inh; S100B__inh |
| M | hES-T3 | activin A + bFGF | - | (41) | 86 | ALK-4 | Protocols comparison: MEF feeder |
| M | | | - | | 15 | ALK-4; ALK-7; ALK-2__inh | Protocols comparison: feeder-free |
| R | Mouse embryonic fibroblasts | CHIR99021 + RepSox + Forskolin + valproic acid | cardiomyocytes | (42) | 166 | RepSox: JNK1(MAPK8)__inh | |
| R | Adult fibroblasts | SP600125 + SB202190 + Go6983 | hMSC | (43) | 89 | SP600125: p38beta (MAPK11)__inh; JAK3__inh; MSK1__inh. Go6983: cPKC (conventional) (opposite sign). SB202190: p38beta (MAPK11)__inh; p38alpha (MAPK14)__inh | |

Differentiation

In (35), the differentiation of human mesenchymal stromal cells to chondrocytes was obtained by treatment with either BMP2 or TGF-β3. Comparing the gene expression profiles of the BMP2-treated cells to the untreated ones, we predicted that the activation of both BMP receptor 2 and TGFBR3 would induce the differentiation, in accordance with the experimental evidence.
The activation or inhibition of other members of the TGF-β protein superfamily, which is known to play an important role in chondrocyte differentiation, was also predicted when using both target gene expression profiles. We also correctly predicted the activation of HDAC2 and the inhibition of HDAC9 for the differentiation of hematopoietic stem/progenitor cells to erythroid and megakaryocytic precursors (36); the application of EGF during differentiation of neonatal keratinocytes to terminally differentiated keratinocytes (37); and the activation of protein kinase G1 (PRKG1), a direct interactor of cAMP, as inducing hepatoblast differentiation towards a hepatocyte-like population (38).

Cell activation and maintenance

The activation of pre-adipocytes to primed pre-adipocytes can be obtained with dexamethasone treatment (39), and in agreement with this observation we predicted the activation of DAX1, a nuclear receptor for steroid hormones. Mesenchymal stem cells can give rise to many cell types with different potential to establish a hematopoietic differentiation microenvironment. This particular competency is inhibited by treatment with bFGF, whereas treatment with TGF-β1 pushes the cells towards a subendothelial mural cell fate (40). Applying our method to bFGF-treated data, we predicted the activation of subunits of the protein kinase CK2, known to bind and phosphorylate bFGF. When applied to TGF-β1-treated data, our method did not predict any direct target apart from ubiquitin, but it suggested the inhibition of bFGF targets.

Regarding maintenance of stem cells, in (41) the authors compared MEF feeder and feeder-free protocols for maintenance of hESC-T3 in vitro with treatment with activin A in conditioned medium. They observed that self-renewal and pluripotency are preserved, but the mRNA and miRNA expression profiles were significantly different for the cells maintained with activin A.
When comparing the activin A treatment with the other protocols, our method correctly predicted the activation of ALK-4 (activin A type IB receptor), activation of ALK-7, and inhibition of ALK-2.

Reprogramming

Cellular reprogramming is increasingly obtained with chemical cocktails. We tested our method on the direct conversion of mouse fibroblasts into cardiomyocytes obtained in (42) with a combination of four compounds (CHIR99021, RepSox, Forskolin and valproic acid). Using data from primary cells, our method predicted only one direct target of RepSox. However, it predicted the activation of Axin, which is a common target of GSK3, a kinase that is inhibited by CHIR99021, and of G-protein alpha-s, one of the Forskolin targets. Valproic acid, however, did not have any target or first neighbour of a target among the candidate signalling molecules.

We also applied our method to the conversion of primary human dermal fibroblasts into mesenchymal stem cells. It was observed that the minimal combination of SP600125, SB202190 and Go6983 is sufficient to obtain MSC-like induced cells (43). Our method correctly captured three direct targets of SP600125 (the inhibition of p38, JAK3 and MSK1) and two SB202190 targets (the inhibition of p38 in its α and β forms). Go6983 is an inhibitor of protein kinases C, for which our method instead predicted activation. This can arise because multiple equally probable paths with opposing signs can exist, but only one MPP is selected as representative of the effect of a signalling molecule on an interface TF.

In summary, our method consistently predicts signalling perturbations that can induce cell type transitions. In addition, it can predict alternative ways of obtaining the same cellular conversion, as observed in the differentiation of human mesenchymal stromal cells to chondrocytes, and mutually exclusive perturbations, as in the specification of subendothelial mural cell fate in mesenchymal cells.
This confirms that not only are experimentally perturbed targets predicted, but the other selected signalling molecules are also biologically relevant. No other computational method is known to be capable of systematically predicting meaningful signalling molecules for the induction of cell type transitions.

Application to disease treatment

Finally, we applied this method to the prediction of signalling molecules for disease treatment. In particular, we analysed cirrhotic versus healthy rat liver in order to induce the shift in the gene expression state of the diseased tissue towards the healthy state (Figure 5A). Currently, the therapeutic prospects for cirrhosis patients are limited to liver transplantation and, therefore, there is an urgent need to develop new therapeutic strategies. Cirrhosis was induced in male Wistar rats with CCl4 treatment, and RNA from the complete livers was extracted and quantified through microarray experiments. The gene expression profile of the desired healthy liver state was obtained from publicly available data (see Materials and Methods). The GRN built by our method consisted of 26 TFs (Figure 5B). After in silico perturbation of the 106 interface TFs connecting this GRN to the signalling network, we identified 10 TFs that were present in the BPCs (Figure 5C). Altogether, the BPCs were predicted to change the state of 19 GRN-TFs (Figure 5B).

Figure 5. Application to cirrhotic model in rat. (A) Our method was applied to gene expression data of whole liver from cirrhotic and healthy rats, respectively generated in this study and obtained from public repositories. The activation of the angiopoietin receptor Tie2 was predicted as a potential signalling perturbation able to convert the disease towards the healthy phenotype. Cirrhotic rats were treated with the specific Tie2 activator CVX-060, and gene expression was profiled again and compared with the data corresponding to healthy animals. (B) Boolean state of GRN TFs as measured in cirrhotic, healthy, and CVX-060 treated samples. The ideal perturbation state refers to the state that the GRN TF can reach if any of the BPCs is applied. The predicted CVX-060 treated state is the state the GRN-TFs can have if the BPCs are composed only of interface TF states induced by the activation of Tie2, according to our predictions (using the correlation-based MPPs). A green background is used when a state matches the desired healthy state. (C) Interface TFs present in the best in silico perturbations and their relative probability of inducing the desired changes on the GRN. +: the activation of the interface TF acts on the GRN; –: its inhibition affects the GRN. The two states are not mutually exclusive, see c-Rel (NF-kB subunit).

The overall ranking of signalling molecules prioritized many proteins known to be involved in different aspects of liver fibrosis, fatty liver disease, cirrhosis, and hepatocellular carcinoma. In particular, the inhibition of fibrosis-related proteins was predicted (e.g. CHIP, AP-1, CBP, MDM2), along with the activation of ESR2, known for its antifibrogenic effects (44), and the inhibition of MMPs responsible for matrix remodelling. Multiple interleukins and proteins related to the innate immune response in liver cirrhosis (45) were also selected as candidates.

Another biological process emerging in our predictions is angiopoietin signalling, a key pathway in blood vessel normalization. Angiopoietin 1 (Ang1)-Tie2 signalling stabilizes blood vessels, whereas Angiopoietin 2 (Ang2) is a context-dependent antagonist of Ang1 and decreases its stabilizing effect (46), giving rise to immature blood vessels. Cirrhotic conditions are characterized by higher expression and activity of Ang2 than healthy conditions, resulting in the loss of blood vessel stability. In this regard, the activation of the angiopoietin receptor Tie2 ranked 24th among all signalling molecules, and our method also predicted the activation of Angiopoietin 1 and 4 and the inhibition of Angiopoietin 2 and 3 (Supplementary Table S3). Our model predicted that the activation of Tie2 would activate the interface TFs SP1 and ETS1, and inhibit GCR, STAT5A and B, ESR1, and PU.1 (Supplementary Figure S8), inducing a GRN state that partially matches the healthy liver state (Figure 5B).
To test this prediction, we generated gene expression data from the whole liver of cirrhotic rats treated with CVX-060, an inhibitor of Ang2 that induces Tie2 activation, and observed that the GRN-TFs EBF1, EGR3, PRDM1, RUNX3, SP7, RARG and TP63 were indeed reverted to the healthy state (Figure 5B). EGR3 and RUNX3 have been implicated in the pathophysiology of fibrosis and liver development. EGR3 is a pro-inflammatory and immunogenic factor; its overexpression is sufficient to stimulate fibrosis, whereas suppression of Egr-3 activity in deficient mice correlated with the attenuation of TGF-β signalling and consequently of fibrogenesis (47). EGR3 is also an essential effector of VEGF-mediated functions leading to angiogenesis (48). Runx3 is mostly expressed in the liver during embryonic development and is a regulator of fetal hematopoiesis (49). Runx3 knockout mice died within 24 h after birth, showing organogenesis defects in lung and liver. In addition, the absence of Runx3 activity was associated with excessive intrahepatic angiogenesis, suggesting that the physiological function of this TF in the liver is mainly embryonic (50). RA signalling through RARG has been shown to reverse hepatic stellate cell activation and fibrosis (51). SP7 and TP63 have previously been implicated in the regulation of VEGF-mediated angiogenesis, whereas PRDM1 and EBF1 have no clear connection with angiogenesis or cirrhosis reported to date and could be novel therapeutic targets.

Next, we constructed a GRN for the treated gene expression profile and compared it with the disease GRN. The treated GRN (30 TFs) was similar in size to the disease GRN (26 TFs) and shared 14 TFs with it. However, the 12 TFs that were exclusively present in the disease network were localized in network-specific modules that contained TFs playing a major role in vascular growth, including the disease-specific TFs TP63, EGR3, RUNX3 and SP7, discussed above (Supplementary Figure S8).
On the other hand, modules shared between the disease and the treated GRNs did not contain TFs associated with angiogenesis. This confirms that the activation of Tie2 specifically targeted the transcriptional regulators of angiogenesis and reverted their gene expression state to the healthy counterpart.

Taken together, this experimental study provides insights into the molecular changes during the inhibition of Ang2 with CVX-060 in cirrhotic rat liver. Importantly, it complements the previous functional study in which inhibition of Ang2 and activation of Tie2 signalling were demonstrated to improve the normalization of intrahepatic blood vessels and to decrease the liver inflammatory infiltrate, and thus to be an effective treatment for liver fibrosis in cirrhotic rats (52). As activation of Tie2 only partially reverts the disease phenotype, a combination of candidates involved in different aspects of the disease is probably necessary to obtain the complete switch towards a healthy expression profile. In this context, endothelin inhibition is another predicted candidate (ranking 17th overall, see Supplementary Table S3) that plays an important pathological role in cirrhotic livers through a different mechanism, fibrogenesis induction. Independent studies have convincingly described that overexpression of endothelin-1 is associated with the pathological activation of hepatic stellate cells, which are the major source of collagen expression in the liver, and with intrahepatic vascular dysfunction through exacerbated vasoconstriction (26).

Comparison with existing methods

No method similar to ours in terms of application or modelling strategy exists to date. Therefore, we compared our method to computational tools that differ from ours in approach and application, but which are widely used to analyse signalling perturbations using gene expression data.
Connectivity Map (28) uses gene expression similarity between a compendium of known perturbations and a query signature (list of up- and down-regulated genes) to rank small molecules, drugs and gene perturbations. SPIA (27) scores signalling pathways by their enrichment in DEGs, while also taking into account their topology. We applied both Connectivity Map and SPIA to the 219 datasets considered in our analysis, and obtained results for respectively 136 and 211 datasets where the tools could be applied and the prediction of the experimental perturbation was possible. We then focused on the 135 datasets that obtained predictions from all three methods (ours, Connectivity Map and SPIA). (Figure 6A, Supplementary Table S4). SPIA correctly predicted KEGG pathways containing direct perturbation targets in only 33% of such datasets. Connectivity Map in turn correctly predicted either the experimental perturbation, its gene targets, or drugs regulating the same targets, in 64% of the datasets, and our method obtained correct signalling molecules in 74% of the datasets. While Connectivity Map has an overall success rate similar to our method, it was not very successful in cell type transition datasets, only retrieving correct perturbations in four of them (Figure 6A). SPIA showed a performance similar to Connectivity Map in these cases. This result suggests a superior performance of our method in predicting novel perturbations, which are not present in the Connectivity Map compendium. Figure 6. View largeDownload slide Comparison with previously published methods. (A) Performance of our method, Connectivity Map (CMap) and SPIA on datasets that could be analysed with the three methods (upper panel), and on cell type transition cases (lower panel). The number of cases in which the predictions of one or more methods were correct are reported. (B) Performance of DeMAND and our method on the eight compound perturbation datasets that could be analysed with both. 
Both methods correctly predicted direct perturbation targets for all datasets except (S)-equol and thapsigargin. We also compared our method to DeMAND (29), a GRN-based tool aiming at identifying compounds' mode of action using gene expression profiles and context-specific regulatory networks. Considering each gene G and the list of genes that it can regulate, called regulon, DeMAND scores each gene G based on how significantly the expression of its regulon is dysregulated following drug application. As it requires at least six samples for each condition to give reliable results, DeMAND could not be applied to the datasets analysed with our method, and we could not assess its suitability for the induction of cell fate transitions. Instead, we tested our method on the compound perturbation datasets that were used for DeMAND’s evaluation. We obtained candidates in nine datasets; however, in one case the perturbation targets were absent from the signalling network used in our method, and therefore their prediction was not possible. Both our method and DeMAND obtained correct predictions in six of the remaining eight datasets (75%) (Figure 6B, Supplementary Table S4). 
Aside from obtaining comparable performance to DeMAND using substantially less data, our method explicitly predicts the activation or the inhibition of signalling molecules, and correctly reported signs in 5/6 datasets, thus overcoming an important limitation of DeMAND. DISCUSSION Here we have introduced the first general method, to our knowledge, which uses gene expression data to predict signalling perturbations that can induce the transition from an initial to a desired cellular phenotype. For this purpose, single signalling molecules are prioritized according to their probability of specifically acting on the interface TFs that are most likely to trigger the shift from the initial to the required GRN state. Our approach differs conceptually from previously published studies since it constitutes a general methodology that integrates signalling and gene regulatory networks by considering transitions between GRN states corresponding to the initial and target cellular phenotypes. In contrast, other GRN-based approaches rely solely on GRN topology, and therefore ignore collective changes in TF expression induced by the signalling cues. Furthermore, our method was more successful than GRN-free methods in predicting signalling targets for cellular conversions, and showed similar performance compared to another GRN-based method (DeMAND), which requires a larger amount of data. Importantly, the pathways predicted by this method to be involved in signal transduction were supported by changes in phosphorylation state (when data were available), indicating that gene expression alone can be reasonably used to analyse signalling processes in the absence of phosphoproteomics and perturbation data. Results show that our method is able to consistently identify signalling targets of experimentally validated perturbations, and novel candidates with potential to induce desired cellular transitions. 
In particular, our method correctly predicted experimentally validated signalling targets in the analysed cell type transition examples, including cellular differentiation and reprogramming. Further, we applied our method to a liver cirrhosis model in rat to predict signalling molecules whose perturbations could revert the disease phenotype. Experimental perturbation of the predicted angiopoietin receptor (Tie2) induced desired changes in the gene expression of key TFs involved in fibrosis and angiogenesis. An important limitation of our method is that it only predicts single signalling molecules, whereas combinations of these molecules could improve the efficiency of cellular conversion. In this regard, this method could be extended to the prediction of combinations of signalling molecules in order to take into account the combinatorial effect of multiple signalling molecules, which could be synergistic or redundant. In conclusion, we believe that this method represents a general tool that can guide the identification of signalling molecules for the induction of desired cellular transitions, such as the reversal of disease phenotypes and the induction of cell differentiation or reprogramming, with prospective applications to disease treatment and regenerative medicine. DATA AVAILABILITY The method was implemented in Matlab and R. It is available as a Snakemake pipeline (53) with all necessary datasets at: https://git-r3lab.uni.lu/gaia.zaffaroni/INCanTeSIMO. All microarray data generated are available in GEO under accession number GSE122822. SUPPLEMENTARY DATA Supplementary Data are available at NAR Online. ACKNOWLEDGEMENTS CVX-060 was generously supplied by Pfizer Inc. FUNDING Fonds National de la Recherche Luxembourg [C15/BM/10397420 to S.O., 10035087 to G.Z.]; Ministerio de Ciencia, Innovación y Universidades [SAF2016-75358-R to M.M.-R.], co-financed by FEDER, European Union, 'A way of making Europe'; CIBERehd is financed by the Instituto de Salud Carlos III. 
Funding for open access charge: Funding for publication charges is provided in the grants awarded by Fonds National de la Recherche Luxembourg. Conflict of interest statement. None declared.
REFERENCES
1. Ginn S.L., Amaya A.K., Alexander I.E., Edelstein M., Abedi M.R. Gene therapy clinical trials worldwide to 2017: An update. J. Gene Med. 2018; 20: e3015.
2. Lamb J. The Connectivity Map: a new tool for biomedical research. Nat. Rev. Cancer. 2007; 7: 54–60.
3. Parikh J.R., Klinger B., Xia Y., Marto J.A., Blüthgen N. Discovering causal signaling pathways through gene-expression patterns. Nucleic Acids Res. 2010; 38: 109–117.
4. Schubert M., Klinger B., Klünemann M., Sieber A., Uhlitz F., Sauer S., Garnett M.J., Blüthgen N., Saez-Rodriguez J. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat. Commun. 2018; 9: 20.
5. Bayerlová M., Jung K., Kramer F., Klemm F., Bleckmann A., Beißbarth T. Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics. 2015; 16: 334.
6. Khatri P., Sirota M., Butte A.J. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput. Biol. 2012; 8: e1002375.
7. Osmanbeyoglu H.U., Pelossof R., Bromberg J.F., Leslie C.S. Linking signaling pathways to transcriptional programs in breast cancer. Genome Res. 2014; 24: 1869–1880.
8. Cotton T.B., Nguyen H.H., Said J.I., Ouyang Z., Zhang J., Song M. Discerning mechanistically rewired biological pathways by cumulative interaction heterogeneity statistics. Sci. Rep. 2015; 5: 9634.
9. Noh H., Shoemaker J.E., Gunawan R. Network perturbation analysis of gene transcriptional profiles reveals protein targets and mechanism of action of drugs and influenza A viral infection. Nucleic Acids Res. 2018; 46: e34.
10. Peng S.C., Wong D.S., Tung K.C., Chen Y.Y., Chao C.C., Peng C.H., Chuang Y.J., Tang C.Y. Computational modeling with forward and reverse engineering links signalling network and genomic regulatory responses: NF-kappaB signalling-induced gene expression responses in inflammation. BMC Bioinformatics. 2010; 11: 308.
11. Zañudo J.G.T., Albert R. Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput. Biol. 2015; 11: 1–24.
12. Yachie-Kinoshita A., Onishi K., Ostblom J., Langley M.A., Posfai E., Rossant J., Zandstra P.W. Modeling signaling-dependent pluripotency with Boolean logic to predict cell fate transitions. Mol. Syst. Biol. 2018; 14: e7952.
13. Strasen J., Sarma U., Jentsch M., Bohn S., Sheng C., Horbelt D., Knaus P., Legewie S., Loewer A. Cell-specific responses to the cytokine TGFβ are determined by variability in protein levels. Mol. Syst. Biol. 2018; 14: e7733.
14. Niepel M., Hafner M., Duan Q., Wang Z., Paull E.O., Chung M., Lu X., Stuart J.M., Golub T.R., Subramanian A. et al. Common and cell-type specific responses to anti-cancer drugs revealed by high throughput transcript profiling. Nat. Commun. 2017; 8: 1186.
15. Szklarczyk D., Santos A., von Mering C., Jensen L.J., Bork P., Kuhn M. STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016; 44: D380–D384.
16. Wishart D.S., Feunang Y.D., Guo A.C., Lo E.J., Marcu A., Grant J.R., Sajed T., Johnson D., Li C., Sayeeda Z. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018; 46: D1074–D1082.
17. Lamb J., Crawford E.D., Peck D., Modell J.W., Blat I.C., Wrobel M.J., Lerner J., Brunet J.-P., Subramanian A., Ross K.N. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006; 313: 1929–1935.
18. Zhang H.-M., Liu T., Liu C.-J., Song S., Zhang X., Liu W., Jia H., Xue Y., Guo A.-Y. AnimalTFDB 2.0: a resource for expression, prediction and functional study of animal transcription factors. Nucleic Acids Res. 2015; 43: D76–D81.
19. McCall M.N., Bolstad B.M., Irizarry R.A. Frozen robust multiarray analysis (fRMA). Biostatistics. 2010; 11: 242–253.
20. McCall M.N., Uppal K., Jaffee H.A., Zilliox M.J., Irizarry R.A. The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res. 2011; 39: D1011–D1015.
21. McCall M.N., Jaffee H.A., Zelisko S.J., Sinha N., Hooiveld G., Irizarry R.A., Zilliox M.J. The Gene Expression Barcode 3.0: improved data processing and mining tools. Nucleic Acids Res. 2014; 42: D938–D943.
22. Crespo I., Del Sol A. A general strategy for cellular reprogramming: the importance of transcription factor cross-repression. Stem Cells. 2013; 31: 2127–2135.
23. Huang R., Wallqvist A., Covell D.G. Comprehensive analysis of pathway or functionally related gene expression in the National Cancer Institute's anticancer screen. Genomics. 2006; 87: 315–328.
24. Jaeger S., Min J., Nigsch F., Camargo M., Hutz J., Cornett A., Cleaver S., Buckler A., Jenkins J.L. Causal network models for predicting compound targets and driving pathways in cancer. J. Biomol. Screen. 2014; 19: 791–802.
25. Chen E.Y., Xu H., Gordonov S., Lim M.P., Perkins M.H., Ma’ayan A. Expression2Kinases: mRNA profiling linked to multiple upstream regulatory layers. Bioinformatics. 2012; 28: 105–111.
26. Tsuchida T., Friedman S.L. Mechanisms of hepatic stellate cell activation. Nat. Rev. Gastroenterol. Hepatol. 2017; 14: 397–411.
27. Tarca A.L., Draghici S., Khatri P., Hassan S.S., Mittal P., Kim J., Kim C.J., Kusanovic J.P., Romero R. A novel signaling pathway impact analysis. Bioinformatics. 2009; 25: 75–82.
28. Subramanian A., Narayan R., Corsello S.M., Peck D.D., Natoli T.E., Lu X., Gould J., Davis J.F., Tubelli A.A., Asiedu J.K. et al. A next generation connectivity Map: L1000 platform and the first 1,000,000 profiles. Cell. 2017; 171: 1437–1452.
29. Woo J.H., Shimoni Y., Yang W.S., Subramaniam P., Iyer A., Nicoletti P., Rodríguez Martínez M., López G., Mattioli M., Realubit R. et al. Elucidating compound mechanism of action by network perturbation analysis. Cell. 2015; 162: 441–451.
30. Ladbury J.E., Arold S.T. Noise in cellular signaling pathways: Causes and effects. Trends Biochem. Sci. 2012; 37: 173–178.
31. Efroni S., Schaefer C.F., Buetow K.H. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One. 2007; 2: e425.
32. Sebastian-Leon P., Vidal E., Minguez P., Conesa A., Tarazona S., Amadoz A., Armero C., Salavert F., Vidal-Puig A., Montaner D. et al. Understanding disease mechanisms with models of signaling pathway activities. BMC Syst. Biol. 2014; 8: 121.
33. Invergo B.M., Beltrao P. Reconstructing phosphorylation signalling networks from quantitative phosphoproteomic data. Essays Biochem. 2018; 62: 525–534.
34. Bansal M., Yang J., Karan C., Menden M.P., Costello J.C., Tang H., Xiao G., Li Y., Allen J., Zhong R. et al. A community computational challenge to predict the activity of pairs of compounds. Nat. Biotechnol. 2014; 32: 1213–1222.
35. Mrugala D., Dossat N., Ringe J., Delorme B., Coffy A., Bony C., Charbord P., Häupl T., Daures J.-P., Noël D. et al. Gene expression profile of multipotent mesenchymal stromal cells: Identification of pathways common to TGFbeta3/BMP2-induced chondrogenesis. Cloning Stem Cells. 2009; 11: 61–76.
36. Zini R., Norfo R., Ferrari F., Bianchi E., Salati S., Pennucci V., Sacchi G., Carboni C., Ceccherelli G.B., Tagliafico E. et al. Valproic acid triggers erythro/megakaryocyte lineage decision through induction of GFI1B and MLLT3 expression. Exp. Hematol. 2012; 40: 1043–1054.
37. Tran Q.T., Kennedy L.H., Leon Carrion S., Bodreddigari S., Goodwin S.B., Sutter C.H., Sutter T.R. EGFR regulation of epidermal barrier function. Physiol. Genomics. 2012; 44: 455–469.
38. Ogawa S., Surapisitchat J., Virtanen C., Ogawa M., Niapour M., Sugamori K.S., Wang S., Tamblyn L., Guillemette C., Hoffmann E. et al. Three-dimensional culture and cAMP signaling promote the maturation of human pluripotent stem cell-derived hepatocytes. Development. 2013; 140: 3285–3296.
39. Tomlinson J.J., Boudreau A., Wu D., Salem H.A., Carrigan A., Gagnon A., Mears A.J., Sorisky A., Atlas E., Haché R.J.G. Insulin sensitization of human preadipocytes through glucocorticoid hormone induction of forkhead transcription factors. Mol. Endocrinol. 2010; 24: 104–113.
40. Sacchetti B., Funari A., Michienzi S., Di Cesare S., Piersanti S., Saggio I., Tagliafico E., Ferrari S., Robey P.G., Riminucci M. et al. Self-Renewing osteoprogenitors in bone marrow sinusoids can organize a hematopoietic microenvironment. Cell. 2007; 131: 324–336.
41. Tsai Z.Y., Singh S., Yu S.L., Kao L.P., Chen B.Z., Ho B.C., Yang P.C., Li S.S.L. Identification of microRNAs regulated by activin A in human embryonic stem cells. J. Cell. Biochem. 2010; 109: 93–102.
42. Fu Y., Huang C., Xu X., Gu H., Ye Y., Jiang C., Qiu Z., Xie X. Direct reprogramming of mouse fibroblasts into cardiomyocytes with chemical cocktails. Cell Res. 2015; 25: 1013–1024.
43. Lai P.-L., Lin H., Chen S.-F., Yang S.-C., Hung K.-H., Chang C.-F., Chang H.-Y., Lu F.L., Lee Y.-H., Liu Y.-C. et al. Efficient generation of chemically induced mesenchymal stem cells from human dermal fibroblasts. Sci. Rep. 2017; 7: 44534.
44. Zhang B., Zhang C.-G., Ji L.-H., Zhao G., Wu Z.-Y. Estrogen receptor β selective agonist ameliorates liver cirrhosis in rats by inhibiting the activation and proliferation of hepatic stellate cells. J. Gastroenterol. Hepatol. 2018; 33: 747–755.
45. Zhou W.-C. Pathogenesis of liver cirrhosis. World J. Gastroenterol. 2014; 20: 7312.
46. Fagiani E., Christofori G. Angiopoietins in angiogenesis. Cancer Lett. 2013; 328: 18–26.
47. Fang F., Shangguan A.J., Kelly K., Wei J., Gruner K., Ye B., Wang W., Bhattacharyya S., Hinchcliff M.E., Tourtellotte W.G. et al. Early growth response 3 (Egr-3) is induced by transforming growth Factor-β and regulates fibrogenic responses. Am. J. Pathol. 2013; 183: 1197–1208.
48. Liu D., Evans I., Britton G., Zachary I. The zinc-finger transcription factor, early growth response 3, mediates VEGF-induced angiogenesis. Oncogene. 2008; 27: 2989–2998.
49. de Bruijn M., Dzierzak E. Runx transcription factors in the development and function of the definitive hematopoietic system. Blood. 2017; 129: 2061–2069.
50. Lee J.-M., Lee D.-J., Bae S.-C., Jung H.-S. Abnormal liver differentiation and excessive angiogenesis in mice lacking Runx3. Histochem. Cell Biol. 2013; 139: 751–758.
51. Panebianco C., Oben J.A., Vinciguerra M., Pazienza V. Senescence in hepatic stellate cells as a mechanism of liver fibrosis reversal: a putative synergy between retinoic acid and PPAR-gamma signalings. Clin. Exp. Med. 2017; 17: 269–280.
52. Pauta M., Ribera J., Melgar-Lesmes P., Casals G., Rodríguez-Vita J., Reichenbach V., Fernandez-Varo G., Morales-Romero B., Bataller R., Michelena J. et al. Overexpression of angiopoietin-2 in rats and patients with liver fibrosis. Therapeutic consequences of its inhibition. Liver Int. 2015; 35: 1383–1392.
53. Koster J., Rahmann S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics. 2012; 28: 2520–2522.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Multiplexed and tunable transcriptional activation by promoter insertion using nuclease-assisted vector integration
Brown, Alexander; Winter, Jackson; Gapinske, Michael; Tague, Nathan; Woods, Wendy S.; Perez-Pinera, Pablo
doi: 10.1093/nar/gkz210 pmid: 30931472
Abstract The ability to selectively regulate expression of any target gene within a genome provides a means to address a variety of diseases and disorders. While artificial transcription factors are emerging as powerful tools for gene activation within a natural chromosomal context, current generations often exhibit relatively weak, variable, or unpredictable activity across targets. To address these limitations, we developed a novel system for gene activation, which bypasses native promoters to achieve unprecedented levels of transcriptional upregulation by integrating synthetic promoters at target sites. This gene activation system is multiplexable and easily tuned for precise control of expression levels. Importantly, since promoter vector integration requires just one variable sgRNA to target each gene of interest, this procedure can be implemented with minimal cloning. Collectively, these results demonstrate a novel system for gene activation with wide adaptability for studies of transcriptional regulation and cell line engineering. INTRODUCTION The activation of endogenous genes with artificial transcription factors (ATFs) is an enticing technology, not only for developing gene therapies or disease models (1,2), but also for interrogating gene function through genome-wide screenings (3,4). ATFs consist of a programmable DNA binding domain that can be customized to target a transcriptional activation domain to the appropriate locus for upregulation of gene expression. While zinc finger proteins (5) and Transcriptional Activator-Like Effectors (TALEs) (6–9) have been used for gene activation, the RNA guided nuclease (RGN) platform (1,10–16) is arguably the most popular since the DNA binding specificity can be engineered rapidly and at low cost (17–22). CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) activation or CRISPRa, requires a single-guide RNA (sgRNA) and catalytically dead Cas9 (dCas9) coupled with a transcriptional activator. 
First generation transcriptional activators, which typically used VP64 or VP16 activation domains, required multiple ATFs acting in synergy near the transcriptional start site (TSS) of the gene of interest for optimal gene activation (6,7,13,15). This important limitation is lessened when using second-generation transcriptional activators, including VP160 (10), SAM (23), VPR (24), SunTag (25), VP64-dCas9-BFP-VP64 (26), Scaffold (27) and P300 (28), which are capable of activating expression of some target genes when used individually. Unfortunately, it is becoming evident that even second generation CRISPRa technologies are often limited by their need for multiple sgRNAs to achieve adequate activation of many genes (29) and the lack of established parameters to best position ATFs within endogenous promoters for effective upregulation of gene expression. Importantly, CRISPRa systems often fail at activating genes whose expression is tightly regulated. These constraints limit widespread adoption of CRISPRa for applications in synthetic biology, tissue engineering or gene therapy. In these studies, we sought to develop an alternative architecture that bypasses the limitations of current platforms for activating native genes. We reasoned that, since genomic context at the promoter greatly impacts output expression when using ATFs, it might be possible to circumvent this problem through insertion of a synthetic promoter near the TSS of target genes. This system would not only override negative regulatory elements, but would also be highly customizable, given the existing assortment of well-characterized synthetic promoters capable of both constitutive and chemically inducible gene expression. In this manuscript, we utilized a universal vector integration platform (30,31) to engineer a synthetic system for activating endogenous genes. 
To this end, we describe how this platform enables rapid, robust and inducible activation of both individual and multiplexed gene transcripts. MATERIALS AND METHODS Cell culture and transfection 293T, HCT116 and Neuro-2A cells were maintained in DMEM supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin at 37°C with 5% CO2. SF7996 primary glioblastoma cells were a kind gift from Joseph Costello (32) and were cultured in DMEM/Ham's F-12 1:1 media, 10% FBS, 1% Penicillin/Streptomycin. Transfections were performed with Lipofectamine 2000 (Invitrogen) according to manufacturer's instructions. Transfection efficiencies were routinely higher than 80% for 293T and Neuro-2A cells, ∼50% for HCT116 cells and ∼25% for SF7996 cells as determined by fluorescent microscopy following delivery of a control GFP expression plasmid. Selection of transfected cells was performed by culturing in complete medium containing puromycin for 72 h. Concentrations of puromycin were 2 μg/ml for 293T, 0.5 μg/ml for HCT116, 1 μg/ml for SF7996 and 3 μg/ml for Neuro-2A. Induction of gene expression, unless otherwise noted, was carried out with 200 ng/ml doxycycline in DMEM prepared with 10% tetracycline-free FBS for 4 days. Growth rate comparison was performed by seeding 20 000 cells per well, counting the number of cells using a hemocytometer after 6 days and seeding 20 000 cells back to analyze at the next time point. Population Doublings were calculated using the equation PDs = 3.32(log(Y) – log(X)), where Y is the final cell count and X is the initial cell count (33). Plasmids and oligonucleotides The plasmids encoding SpCas9 (Plasmid #41815), sgRNA (#47108) and SpdCas9-VPR (#63798) were obtained from Addgene. The backbone for the targeting vectors was synthesized by IDT as gene blocks and cloned into a pCDNA3.1 plasmid. The oligonucleotides used to create the guide sequences were obtained from IDT, hybridized, phosphorylated and cloned in the sgRNA vector using BbsI (15,34). 
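The population-doubling equation given under Cell culture and transfection, PDs = 3.32(log(Y) – log(X)), can be illustrated with a minimal sketch. It assumes that log is the base-10 logarithm (the factor 3.32 approximates 1/log10(2), so the result is roughly log2(Y/X)). The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def population_doublings(initial_count: float, final_count: float) -> float:
    """Return PDs = 3.32 * (log10(Y) - log10(X)) for initial count X and final count Y."""
    if initial_count <= 0 or final_count <= 0:
        raise ValueError("cell counts must be positive")
    return 3.32 * (math.log10(final_count) - math.log10(initial_count))


def cumulative_doublings(seeded: float, counts: list[float]) -> list[float]:
    """Running total of PDs when the same number of cells is reseeded at each time point."""
    total = 0.0
    running = []
    for final_count in counts:
        total += population_doublings(seeded, final_count)
        running.append(total)
    return running


# Hypothetical counts (not from the paper): 20 000 cells seeded at each time point.
example_counts = [640_000, 320_000]
example_pds = cumulative_doublings(20_000, example_counts)
```

Summing the per-interval values is one plausible way to build a growth curve when a fixed number of cells is reseeded each time, as described in the text; the authors' exact bookkeeping is not stated.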
The target sequences are provided in Supplementary Table S1. PCR Seventy-two hours after transfection, genomic DNA was isolated using DNeasy Blood & Tissue Kit (Qiagen). PCRs were performed using KAPA2G Robust PCR kits (KAPA Biosystems). A typical 25 μl reaction used 20–100 ng of genomic DNA, Buffer A (5 μl), Enhancer (5 μl), dNTPs (0.5 μl), 10 μM forward primer (1.25 μl), 10 μM reverse primer (1.25 μl), KAPA2G Robust DNA Polymerase (0.5 U) and water (up to 25 μl). The DNA sequences of the primers for each target are provided in Supplementary Table S2. The PCR products were visualized in 2% agarose gels and images were captured using a ChemiDoc-It2 (UVP). qPCR Cells were harvested and flash-frozen in liquid nitrogen prior to RNA-extraction using the RNeasy Plus RNA isolation kit (Qiagen) according to manufacturer's instructions. cDNA synthesis was carried out using the qScript cDNA Synthesis Kit (Quanta Biosciences) from 1 μg of RNA and reactions were performed as directed by the supplier. For RT-qPCR, SsoFast EvaGreen Supermix (Bio-Rad) was added to cDNA and primers targeting the gene of interest and GAPDH (Supplementary Table S3). Following 30 s at 95°C, PCR amplification (5 s at 95°C, 20 s at 55°C, 40 total cycles) preceded melt-curve analysis of the product by the CFX Connect Real-Time System (Bio-Rad). Ct values were used to calculate changes in expression level, relative to GAPDH and control samples, by the 2^(−ΔΔCt) method. qPCR standard curves were prepared for each target (Supplementary Figure S1). RNA integrity numbers (RINs) for representative RNA samples prepared using the methods described were calculated by the Functional Genomics Unit of the Roy J. Carver Biotechnology Center using an Agilent Bioanalyzer RNA Nano chip according to manufacturer's instructions (Supplementary Table S4). Western blot Cell pellets were resuspended in 1X NuPAGE LDS Sample Buffer (Life Technologies) with 2.5% β-mercaptoethanol, heated for 5 min at 95°C, and sonicated. 
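As an aside on the qPCR analysis described above, the relative quantification of a target gene against GAPDH and a control sample can be sketched as follows. This is a minimal illustration that assumes ideal amplification efficiency (a doubling per cycle); the function name and the Ct values are hypothetical and are not data from the study.

```python
def fold_change_ddct(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed separately per sample
    ddCt = dCt(treated sample) - dCt(control sample)
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control
    # Assumes 100% PCR efficiency, i.e. product doubles every cycle.
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 8 cycles earlier in the
# treated sample, while the reference gene (e.g. GAPDH) is unchanged.
example_fold = fold_change_ddct(
    ct_target_sample=22.0,
    ct_reference_sample=18.0,
    ct_target_control=30.0,
    ct_reference_control=18.0,
)
```

With these illustrative values ΔΔCt is −8, which corresponds to a 256-fold increase relative to the control.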
Lysates were loaded into a 4–12% NuPAGE Bis-Tris Protein Gel alongside Precision Plus Protein Dual Color Standard ladder (BioRad), electrophoresed, and transferred onto a 0.45 μm nitrocellulose membrane (BioRad). Membranes were blocked for 2 hours with 5% nonfat dried milk in TBS containing 0.1% Tween-20 (TBST). Membranes were then rinsed three times in TBST and incubated with anti-human GAPDH rabbit antibody (1:1000, Cell Signaling Technology, 14C10) or anti-human NEUROD1 rabbit antibody (1:1000, Cell Signaling Technology, D35G2) in 5% BSA in TBST overnight with gentle rocking at 4°C. Following primary antibody incubation, membranes were rinsed three times in TBST and incubated for 40 min with HRP-conjugated anti-rabbit IgG (1:3000, Cell Signaling Technology, #7074). Membranes were then rinsed with TBST three times and incubated for five minutes in Clarity Western ECL Substrate (BioRad). Membranes were then imaged using 60 min exposure time for NEUROD1 and 10 min exposure time for GAPDH with a ChemiDoc-It2 (UVP). Western blot densitometry analysis NEUROD1 protein expression levels were compared using densitometry analysis of the western blot membrane images using ImageJ software. After subtracting background noise, NEUROD1 band intensities were normalized to GAPDH band intensities, where band intensity is the sum of each pixel grayscale value within the selected area of the band. Statistics Statistical analysis was performed by two-way ANOVA with alpha equal to 0.05 or with t tests in Prism 7. RESULTS We chose nuclease-assisted vector integration (NAVI) (31) for insertion of promoters at target sites. NAVI can be rapidly adapted to integrate heterologous DNA at virtually any locus via two simultaneous DSBs: first in the genome, guided by a primary sgRNA, and second within the targeting vector (TV), guided by a universal secondary sgRNA (31). The TV is then integrated into the genomic locus through Non-Homologous End Joining (NHEJ). 
This platform is universal since vector integration at any target site can be simply accomplished by customizing the primary sgRNA. To develop a universal system of NAVI-based gene activation (NAVIa), we designed two vectors for constitutive expression and one vector for inducible expression. The two constitutive vectors contain either one CMV promoter followed by a target site for a universal secondary sgRNA (constitutive single promoter targeting vector, cspTV) or two opposing constitutive promoters separated by the secondary sgRNA target site (constitutive dual promoter targeting vector, cdpTV), each containing a cassette for expression of the puromycin N-acetyl-transferase gene (Figure 1A). The targeting vector for inducible expression (inducible dual promoter targeting vector, idpTV) includes two identical promoters in opposite orientations, each consisting of seven TetO repeats and a minimal CMV promoter (mCMV). The idpTV also carries a puromycin N-acetyl-transferase gene linked with a reverse tetracycline transactivator (rtTA) via a T2A peptide. As in the cdpTV, the opposing promoters of the idpTV flank a universal secondary sgRNA target sequence. A DSB introduced in either idpTV or cdpTV by Cas9 generates a linear fragment of DNA with diametric promoters oriented towards the free ends of the vector (Figure 1A). The architecture of the dual promoter TV ensures that there is always a promoter correctly positioned regardless of integration orientation, thereby addressing NAVI’s lack of directionality. Figure 1. NAVIa Activation of Native Gene Expression is Tunable and Surpasses CRISPRa. (A) The architecture of the NAVIa system includes a plasmid containing a human codon-optimized expression cassette for active Cas9, which is co-transfected with two separate sgRNA plasmids and a targeting vector (idpTV, cdpTV or cspTV). 
The primary sgRNA, shown in dark blue, is designed to bind and target Cas9 to the 5′ region of the gene of interest, while the secondary sgRNA target site (green) is at the 3′ end of the cspTV promoter, or between the diametric promoters of the cdpTV and idpTV. After Cas9 cuts the TV, the resulting linearized vector is integrated at the target site in genomic DNA, presumably via NHEJ repair of the double-stranded breaks. (B) The ability of NAVIa to upregulate the expression of target transcript within pooled, selected 293T cells was evaluated using qPCR across a panel of three genes: ASCL1, NEUROD1, and POU5F1. Each sgRNA employed within NAVIa was also used for gene activation with CRISPRa (dCas9-VPR) either alone or in conjunction with three additional sgRNAs previously reported to activate expression of the target mRNA. Data shown as the mean ± S.E.M. (n = 3 independent experiments). P values were determined by t test: idpTV versus four sgRNAs: P ≤ 0.05 for all targets; cdpTV versus four sgRNAs: P ≤ 0.05 for ASCL1; idpTV, cspTV or cdpTV versus one sgRNA: P ≤ 0.05 for all targets. (C) Representation of levels of activation relative to distance between sgRNA targeting and the canonical TSS. (D) Expression of NEUROD1 was induced using NAVIa for a period of 4 days at concentrations of doxycycline ranging from 2 ng/ml to 2 μg/ml and measured using qPCR. (E) Expression of NEUROD1 was measured by qPCR upon induction with 200 ng/ml doxycycline for 12, 24, 48 and 96 h in 293T cells in which NEUROD1 was edited using NAVIa. Data in B, D and E are shown as the mean ± S.E.M. (n = 3 independent experiments). (F) Western blot analysis of NEUROD1 protein expression was performed using cell lysates prepared from wild type 293T cells and a 293T clonal population with idpTV integration at the NEUROD1 locus without induction or after 4 days of culture in Tet-Free DMEM containing 200 ng/ml doxycycline.
Densitometry analysis demonstrated an increase in NEUROD1 protein expression in the induced samples compared to the wild type controls and the uninduced samples. Error bars represent the S.D. (n = 2).

In order to evaluate this gene activation architecture in the context of the human genome, we first selected three target genes whose reported levels of activation utilizing CRISPRa are either high (ASCL1, ∼10³-fold), medium (NEUROD1, ∼10²-fold), or low (POU5F1, ∼10-fold) (23,24). The primary sgRNAs targeting the genome were co-transfected into 293T cells with three plasmids containing (a) an expression cassette for active Cas9, (b) our customized cspTV, cdpTV or idpTV, and (c) a universal secondary sgRNA. Following transfection, cells with integration of the TV were selected using puromycin and, in cells transfected with the idpTV, gene expression was induced with doxycycline. Isolated clones were screened by PCR to verify integration of the idpTV at the locus of interest (Supplementary Figure S2, Supplementary Table S5). In parallel, one sgRNA or a mixture of four sgRNAs (previously validated for use with CRISPRa) was co-transfected into 293T cells with dCas9-VPR for comparison of our system with CRISPRa (23,24,29). Gene expression using an individual sgRNA directing dCas9-VPR to target promoters was increased ∼10-fold for all targets tested, but the results were not statistically significant.
Utilization of four sgRNAs simultaneously activated gene expression more effectively than 1 sgRNA (ASCL1: ∼1800-fold, NEUROD1: ∼2900-fold, POU5F1: ∼90-fold). With NAVIa, the degree of gene activation using the cspTV (ASCL1: ∼730-fold, NEUROD1: ∼600-fold, POU5F1: ∼200-fold) or cdpTV (ASCL1: ∼8500-fold, NEUROD1: ∼3000-fold, POU5F1: ∼1000-fold) was superior to CRISPRa using 1 sgRNA but lower or not statistically different from activation obtained using CRISPRa with 4 sgRNAs for two of the three targets. However, the idpTV (ASCL1: ∼7200-fold, NEUROD1: ∼76000-fold, POU5F1: ∼5370-fold) surpassed activation obtained using dCas9-VPR using four sgRNAs (Figure 1B). Interestingly, the improvement of NAVIa over dCas9-VPR was higher for targets branded as difficult to regulate with CRISPRa (POU5F1: ∼60-fold improvement, NEUROD1: ∼26-fold improvement) than for a target considered easy to activate (ASCL1: ∼4-fold improvement). When using CRISPRa it is difficult to predict optimal sgRNA target sites for efficient gene activation. While it is generally accepted that proximity to the TSS of the target gene is important, other parameters such as presence of enhancers or local chromatin structure are also critical and, perhaps, more difficult to predict (23–25,29). We investigated a potential correlation between gene activation using NAVIa and distance between integration site and TSS by measuring gene expression induced with sgRNAs that target DNA sequences between positions –1010 and +1995, relative to the TSS of three different genes (Figure 1C). Plotting these data for all three genes showed that NAVIa can activate gene expression efficiently from any integration site on this range, with the most activity being derived from sgRNAs between –500 and +200 bp relative to the TSS. Since maximal gene activation may not be desirable in all experimental settings, CRISPRa has been adapted for tunable gene expression through combinatorial delivery of multiple sgRNAs (10–16). 
However, such efforts to modulate gene expression have proven unpredictable. Alternatively, NAVIa enables facile integration of any TV, which can facilitate gene activation by a wide variety of regulatory mechanisms provided by existing artificial promoters. As the idpTV used in these experiments introduces a doxycycline-inducible promoter, we anticipated a precise temporal control of gene expression that could be tuned by the concentration of doxycycline in the growth medium. Induction of gene expression for 96 h with concentrations of doxycycline ranging from 2 ng/ml to 2 μg/ml led to a dose-dependent increase in gene expression ranging between ∼337-fold and ∼26 015-fold (Figure 1D). Considering this result, we chose to use 200 ng/ml doxycycline for a time course that demonstrated that induction of NEUROD1 is detectable 12 h after treatment (∼4000-fold) and continues to increase at 24 h (∼5000-fold), 48 h (∼10 000-fold) and 96 h (∼15 000-fold) (Figure 1E). Western blots of a clonal population of 293Ts with heterozygous idpTV integration at the NEUROD1 locus confirmed an increase in NEUROD1 protein expression levels in the induced samples compared to the WT controls and uninduced samples (Figure 1F). Furthermore, upon removal of doxycycline, the RNA expression of NEUROD1 returns to near basal levels within 96 hours (Supplementary Figure S3). Tetracycline-inducible systems have been designed for high responsiveness to doxycycline, yet background expression in the absence of inducer, while low, continues to be a problem that hinders applications requiring precise control over gene activation (35). While inducibility is a significant advantage of NAVIa over CRISPRa, tetracycline-inducible promoters are typically used to modulate expression cassettes within a vector and not in a genomic context where the surrounding transcriptional regulatory elements may contribute to undesired expression at steady state. 
Analysis of NEUROD1 activation within samples not induced with doxycycline revealed significant background expression (∼432-fold over basal expression, Figure 2A). While we failed to identify a correlation between background and distance from the integration to ATG codons (Supplementary Figure S4) or between background expression and basal expression (Supplementary Figure S5), we reasoned that expression of rtTA from unintegrated plasmids still transiently present from the transfection might be partly responsible for high background levels of expression. Indeed, background expression in clones with heterozygous or homozygous integrations was significantly lower than in pooled populations, while gene induction in heterozygous clones was similar to that observed in pooled populations but significantly lower than activation in homozygous clones. The ratio of gene expression between samples with and without doxycycline treatment was improved from ∼22-fold induction in pooled cells to ∼126-fold and ∼1486-fold in heterozygous and homozygous clones respectively (Figure 2A). The levels of activation observed in heterozygous clones, homozygous clones, and pooled populations were similar in the presence of doxycycline, which indicates that the high levels of expression seen in the pooled population are unlikely to be due to a minority of cells expressing large amounts of transcript.

Figure 2. Multiplexed Gene Activation Using NAVIa. (A) Comparison of background and induced expression of NEUROD1 targeted using NAVIa between pooled HCT116 cells (diploid) and clones that were positive for idpTV integration at either one or both alleles (n = 3 independent experiments). Untreated pooled cells versus heterozygous, P ≤ 0.003. Untreated heterozygous versus homozygous, P ≤ 0.07. Untreated pooled cells versus homozygous, P ≤ 0.0005. Doxycycline treated heterozygous versus homozygous, P ≤ 0.001. Doxycycline treated pooled cells versus homozygous, P ≤ 0.001.
(B) 293T cells were transfected with CRISPRa or NAVIa targeting simultaneously the genes ASCL1, NEUROD1, POU5F1, IL1B, IL1R2, LIN28A and ZFP42. Expression of the target genes without selection was measured at day 3 using qPCR (n = 2 independent experiments). Data are shown as mean ± S.E.M. P values were determined by t test (NAVIa versus VPR, P ≤ 0.001 ASCL1, P ≤ 0.02 IL1B (Ct value of control sample was not detected and assumed to be 40), P ≤ 0.004 IL1R2, P ≤ 0.001 LIN28A, P ≤ 0.001 NEUROD1, P ≤ 0.007 POU5F1, P ≤ 0.001 ZFP42). (C) The idpTV was integrated at the TERT locus in SF7996 primary glioblastoma cells and expression of TERT was increased in a dose-dependent manner by addition of doxycycline compared with untreated control cells (n = 4, P < 0.005). N.D.: not detected. (D) The proliferation rates of cells cultured in doxycycline-free medium and cells cultured in 400 ng/ml doxycycline were compared by tracking cumulative population doublings over 84 days (n = 3, * represents P ≤ 0.05, ** represents P ≤ 0.01, *** represents P ≤ 0.001). Data in A, B, C and D are shown as the mean ± S.E.M.

One important feature of CRISPRa architectures is multiplexability. Different genes can be activated simultaneously by delivering sgRNAs targeting each promoter (10,23,24,36). Two benefits of NAVI over other integration platforms, such as those utilizing HR, are the universal adaptability of the system to target different genomic loci by simply providing additional primary sgRNAs and the facile clone screening and isolation upon selection. Since activation of different genes using NAVIa can be accomplished using a set of vectors in which the only variable element is the primary sgRNA, this flexible architecture is also compatible with multiplexing. To demonstrate these capabilities, we first identified sgRNAs for targeting additional genes with NAVIa including IL1B, IL1R2, LIN28A and ZFP42 (Supplementary Figure S6). To facilitate multiplexing, we utilized a custom Golden Gate cloning plasmid to prepare two multi-sgRNA (mgRNA) vectors capable of delivering a total of seven individual sgRNAs targeting genes and one sgRNA for linearizing the idpTV, each under independent promoter control (36).
Co-transfection of these plasmids alongside the idpTV and Cas9 vectors into 293T cells was followed by induction of gene expression with doxycycline for two days. Analysis of mRNA expression across all targeted genes demonstrates that multiplexed gene activation with NAVIa surpasses CRISPRa for all targets tested (ranging from ∼15-fold to ∼400-fold) (Figure 2B). Together, these results emphasize the multiplexing capabilities of NAVIa, as well as a clear advantage over CRISPRa when only one sgRNA is employed. To further validate the trends we observed in 293T cells, we targeted NEUROD1 using our cdpTV in other cell lines. NAVIa effectively activated expression of NEUROD1 in the human colorectal carcinoma cell line HCT116, the primary human fibroblast cell line MRC-5, and the mouse neuroblastoma cell line Neuro-2A (Supplementary Figure S7). Finally, we chose TERT as a target to demonstrate the applicability of NAVIa for activating genes that are difficult to regulate with CRISPRa (15,23) in a primary cell line that is difficult to transfect as well as to demonstrate a physiological effect. Following transfection and selection of SF7996 cells (primary glioblastoma cells, which depend on TERT for survival and proliferation (32,37)), we derived a clonal population in which expression of TERT is controlled by the idpTV and can be induced in a dose-dependent manner with doxycycline. It is noteworthy that following idpTV integration, TERT expression could no longer be detected without induction (Figure 2C), but the addition of doxycycline upregulated gene expression and enabled ∼40-fold activation over control untreated cells (Figure 2C). Since SF7996 cells depend on TERT expression for survival over multiple cell divisions, they were maintained in regular growth media supplemented with doxycycline. 
When the growth medium was switched to tetracycline-free medium, the proliferation rate decreased relative to that of the doxycycline-induced cells (Figure 2D), demonstrating a method to immortalize cell lines by regulating expression of native TERT.

DISCUSSION
In this manuscript we describe a novel platform to activate native gene expression based on integration of heterologous promoters, which provides some advantages over CRISPRa. For example, NAVIa enables robust activation across target genes following a single transfection and with minimal cloning and facile isolation of isogenic cell lines expressing the selected gene. Furthermore, NAVIa can be adapted for gene regulation using any constitutive or inducible promoters of interest and achieves consistent activation from a wide range of positions near the TSS, which minimizes the screening needed to identify optimal sgRNAs. Promoter integration is accomplished by NAVI (31), which relies on NHEJ and therefore offers significant advantages over DNA integration platforms that rely on homologous recombination (HR). For example, NHEJ is more effective than HR in non-dividing cells and has been exploited to integrate therapeutic transgenes in post-mitotic cells (38). Although NAVI is subject to some shortcomings associated with its specific gene editing mechanism, such as the error-prone nature of NHEJ and chromosomal translocations (39), we have observed only minor indels at target sites both here (Supplementary Figure S8) and in previous studies (31). In a recent study, Niu et al. (40) identified chromosomal abnormalities in cell lines that had been modified via multiplexed DSBs; however, they concluded that chromosomally normal clones could be obtained through genomic screening.
While this is a risk inherent to any editing strategy that relies on DSBs, ATFs are not necessarily safer in this regard as permanent regulation of gene expression through ATFs can only be obtained through stable integration using viral vectors, gene editing, or transposases, which entail similar risks to NAVIa. Another potential problem resulting from integration via NHEJ is that it is also possible to have multiple copies of the transfer vector integrated at the site of the genomic DSB, however these events can be easily screened for by PCR. It should be noted that we were unable to detect any instances of multiple integrations in a pooled population of 293T cells containing idpTV integrations at the NEUROD1 locus, as well as three clonal populations. Furthermore, the dual promoter architecture ensures that there is always a synthetic promoter activating gene expression should multiple linearized transfer vectors be integrated. In the majority of the experiments shown in this manuscript we demonstrate very high, supraphysiological levels of gene activation, which may not be necessary and/or relevant in all experimental settings. However, it is important to note that these high levels of expression can be modulated by controlling the dose of inducer to achieve any desired outcome. We have observed that in the absence of an inducer the levels of expression of the target genes increased over background due to the inherent leakiness of the inducible promoter used in this particular study. However, since this gene activation system is universal, we anticipate that any other tightly-controlled promoter can be used to minimize this potential problem. Furthermore, we also demonstrated that the levels of background expression can be decreased by controlling the number of modified alleles (Figure 2A). 
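As a rough illustration of how allele number affects leakiness, the induction ratios reported for Figure 2A (doxycycline-treated over untreated expression: ∼22-fold in pooled cells, ∼126-fold in heterozygous clones and ∼1486-fold in homozygous clones) can be compared directly. The Python sketch below is not part of the original analysis; the function and variable names are hypothetical, and the inputs are the rounded values quoted in the Results.

```python
# Induction ratios (doxycycline-treated / untreated expression) as quoted
# in the Results for Figure 2A. Values are rounded approximations.
reported_induction_ratio = {
    "pooled": 22,
    "heterozygous clone": 126,
    "homozygous clone": 1486,
}


def induction_ratio(induced_fold, background_fold):
    """Induced over uninduced expression, both relative to the same basal level."""
    if background_fold <= 0:
        raise ValueError("background_fold must be positive")
    return induced_fold / background_fold


def gain_over_reference(ratios, reference="pooled"):
    """Express each induction ratio as a multiple of the reference population."""
    ref = ratios[reference]
    return {name: value / ref for name, value in ratios.items()}


gains = gain_over_reference(reported_induction_ratio)
for name, gain in gains.items():
    print(f"{name}: {gain:.1f}x the pooled induction ratio")
```

With these rounded inputs, heterozygous and homozygous clones show roughly 5.7 and 67.5 times the induction ratio of the pooled population, which is the sense in which clonal isolation tightens control over the integrated promoter.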
Despite the drastic increase in rates of NEUROD1 transcription, only ∼2-fold increase was observed at the protein level, indicating that rates of translation remain a bottleneck in overexpression studies. Additionally, while we achieved very high levels of expression in model cell lines, such as 293Ts, activation in primary cells, in which native gene expression is more difficult to regulate, more closely resembled physiological conditions. Importantly, in experiments involving primary cells and/or genes that are tightly regulated, such as TERT (Figure 2C, D), CRISPRa is often ineffective (23,24). Conceptually, gene activation by NAVIa resembles transgene expression from a heterologous promoter. However, NAVIa provides some advantages over these systems. For instance, expression of large transgenes is challenging using viral vectors such as AAV or lentiviruses, however, NAVIa can be easily applied for activation of any gene. Additionally, lentiviral systems rely on random integration of transgenes, which often results in different copy number across cells leading to variable and unpredictable levels of expression, which are difficult to control precisely. Another disadvantage of transgenes is that they cannot recapitulate the various protein isoforms that are expressed from a given mammalian gene. NAVIa is expected to maintain natural splicing patterns, as we have previously observed that the overall patterns of gene expression and splicing patterns are maintained despite target gene activation of several orders of magnitude (15,41). Additionally, NAVIa allows multiple genes to be upregulated simultaneously by using additional sgRNAs while delivery of multiple heterologous genes using viral vectors can be challenging. One important concern about the NAVIa system is that it is prone to Cas9 off-target nuclease activity (42). Such activity may lead to off-target vector integration and the inadvertent upregulation of additional genes. 
It should be noted that we did not detect off-target integrations in any of the clones that we screened (Supplementary Figure S9, Supplementary Tables S6 and S7). Furthermore, the risk of off-target integration can be mitigated by using truncated sgRNAs (43) or enhanced versions of Cas9 that have increased specificity (44,45). While CRISPRa is also susceptible to off-target activation (23), one fundamental difference between both systems is that, for sustained gene activation, CRISPRa necessitates the stable expression, or repeated introduction, of heterologous system components, which may have obvious negative implications on their own. In contrast, NAVIa only necessitates transient nuclease activity to integrate a single synthetic element and is easily amenable to customization to reduce or completely eliminate off-target effects. Additionally, NAVIa can easily be adapted for reversible activation by adding LoxP sites to the integration vector, which would allow for removal of the promoter system using Cre recombinase. While this strategy could be used to remove the synthetic promoter, it should be noted that Cre-Lox recombination leaves a genomic scar that may affect expression levels. Furthermore, this technique should be avoided if a multiplexed approach is being used, as the presence of multiple LoxP sites within the genome may result in chromosomal rearrangements or other aberrations. One significant advantage of NAVIa over existing CRISPRa methods is the rapid and facile generation and screening of stable cell lines with tunable or programmable properties and a highly predictable pattern of integration. Inducible CRISPRa methods have been developed by integrating a tetracycline-inducible Cas9-based transcriptional activator at random genomic loci (24,25). 
Induction of target gene expression with these systems requires persistent expression of the sgRNA while expression of the ATF, and ultimately target gene activation, is controlled by treatment with doxycycline. Although these systems are tunable, they also exhibit significant background expression in the absence of doxycycline (25). In contrast, NAVIa replaces native promoters via targeted integration of a tetracycline-inducible promoter to achieve a rapid response to the inducer while avoiding unpredictable lentiviral integration patterns. Another potential limitation of NAVIa in these experiments was the integration of two promoters in different orientations. While this approach ensures that one promoter is always positioned in the correct orientation for overexpression of the target gene, it is possible that the other promoter can modify expression in the opposite orientation. While this shortcoming also occurs with bidirectional gene activation induced by CRISPRa, it can be overcome by simply using a TV with a single promoter and isolating clones with only integrations in the desired orientation. Though this alternative strategy requires screening, it effectively prevents potential aberrant activation at the opposite end of the vector. In general, for experiments simply focused on activating the gene of interest and where a stable cell line is not required, the use of the bidirectional promoter will ensure that upregulation of the gene of interest is achieved in most cells within a pooled population. However, for creation of stable cell lines it might be desirable to use the single promoter, which ensures that neighboring genes will not be affected. In summary, the robust levels of activation, multiplexing capabilities, and adaptability make NAVIa an attractive new platform for a variety of synthetic biology applications including metabolic engineering, drug screening, and signal transduction pathway analysis. 
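The fold improvements of NAVIa over dCas9-VPR quoted in the Results (POU5F1 ∼60-fold, NEUROD1 ∼26-fold, ASCL1 ∼4-fold) follow from dividing the idpTV activation levels by those obtained with four sgRNAs and dCas9-VPR. The short Python check below uses only the approximate fold-activation values reported for Figure 1B; the dictionary and function names are hypothetical and are introduced here for illustration.

```python
# Approximate fold activation over basal expression, as quoted in the Results
# for Figure 1B (idpTV with NAVIa versus dCas9-VPR with four sgRNAs).
idptv_fold = {"ASCL1": 7200, "NEUROD1": 76000, "POU5F1": 5370}
crispra_four_sgrna_fold = {"ASCL1": 1800, "NEUROD1": 2900, "POU5F1": 90}


def fold_improvement(navia, crispra):
    """Ratio of NAVIa activation to CRISPRa activation for each shared gene."""
    return {gene: navia[gene] / crispra[gene] for gene in navia if gene in crispra}


improvement = fold_improvement(idptv_fold, crispra_four_sgrna_fold)
for gene, ratio in sorted(improvement.items(), key=lambda item: item[1]):
    print(f"{gene}: ~{ratio:.0f}-fold improvement over dCas9-VPR")
```

Because the inputs are rounded, the ratios agree with the reported improvements only to the nearest whole number.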
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS
We thank Charles Gersbach for providing the Golden Gate plasmids for cloning sgRNAs. We thank the Office of Undergraduate Research at the University of Illinois for a Summer Undergraduate Research Fellowship to Nathan Tague.

FUNDING
ZJU-Illinois Institute Research Program, American Heart Association Scientist Development Grant [17SDG33650087], and National Institutes of Health [R01GM127497] to P.P.; National Science Foundation Graduate Research Fellowship Program [DGE-1746047] to M.G.

Conflict of interest statement. A.B., W.W., and P.P. have filed patent applications related to genome editing and gene activation.

Notes
Present address: Alexander Brown, Horae Gene Therapy Center, University of Massachusetts School of Medicine, Worcester, MA 01605, USA.

REFERENCES
1. Gersbach C.A., Perez-Pinera P. Activating human genes with zinc finger proteins, transcription activator-like effectors and CRISPR/Cas9 for gene therapy and regenerative medicine. Expert Opin. Ther. Targets. 2014; 18:835–839.
2. Didovyk A., Borek B., Tsimring L., Hasty J. Transcriptional regulation with CRISPR-Cas9: principles, advances, and applications. Curr. Opin. Biotechnol. 2016; 40:177–184.
3. Joung J., Konermann S., Gootenberg J.S., Abudayyeh O.O., Platt R.J., Brigham M.D., Sanjana N.E., Zhang F. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat. Protoc. 2017; 12:828–863.
4. Fellmann C., Gowen B.G., Lin P.C., Doudna J.A., Corn J.E. Cornerstones of CRISPR-Cas in drug discovery and therapy. Nat. Rev. Drug Discov. 2017; 16:89–100.
5. Beerli R.R., Dreier B., Barbas C.F. 3rd Positive and negative regulation of endogenous genes by designed transcription factors. Proc. Natl. Acad. Sci. U.S.A. 2000; 97:1495–1500.
6. Maeder M.L., Linder S.J., Reyon D., Angstman J.F., Fu Y., Sander J.D., Joung J.K. Robust, synergistic regulation of human gene expression using TALE activators. Nat. Methods. 2013; 10:243–245.
7. Perez-Pinera P., Ousterout D.G., Brunger J.M., Farin A.M., Glass K.A., Guilak F., Crawford G.E., Hartemink A.J., Gersbach C.A. Synergistic and tunable human gene activation by combinations of synthetic transcription factors. Nat. Methods. 2013; 10:239–242.
8. Miller J.C., Tan S., Qiao G., Barlow K.A., Wang J., Xia D.F., Meng X., Paschon D.E., Leung E., Hinkley S.J. et al. A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol. 2011; 29:143–148.
9. Zhang F., Cong L., Lodato S., Kosuri S., Church G.M., Arlotta P. Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat. Biotechnol. 2011; 29:149–153.
10. Cheng A.W., Wang H., Yang H., Shi L., Katz Y., Theunissen T.W., Rangarajan S., Shivalila C.S., Dadon D.B., Jaenisch R. Multiplexed activation of endogenous genes by CRISPR-on, an RNA-guided transcriptional activator system. Cell Res. 2013; 23:1163–1171.
11. Farzadfard F., Perli S.D., Lu T.K. Tunable and multifunctional eukaryotic transcription factors based on CRISPR/Cas. ACS Synth. Biol. 2013; 2:604–613.
12. Gilbert L.A., Larson M.H., Morsut L., Liu Z., Brar G.A., Torres S.E., Stern-Ginossar N., Brandman O., Whitehead E.H., Doudna J.A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 2013; 154:442–451.
13. Maeder M.L., Linder S.J., Cascio V.M., Fu Y., Ho Q.H., Joung J.K. CRISPR RNA-guided activation of endogenous human genes. Nat. Methods. 2013; 10:977–979.
14. Mali P., Aach J., Stranges P.B., Esvelt K.M., Moosburner M., Kosuri S., Yang L., Church G.M. CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 2013; 31:833–838.
15. Perez-Pinera P., Kocak D.D., Vockley C.M., Adler A.F., Kabadi A.M., Polstein L.R., Thakore P.I., Glass K.A., Ousterout D.G., Leong K.W. et al. RNA-guided gene activation by CRISPR-Cas9-based transcription factors. Nat. Methods. 2013; 10:973–976.
16. Qi L.S., Larson M.H., Gilbert L.A., Doudna J.A., Weissman J.S., Arkin A.P., Lim W.A. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 2013; 152:1173–1183.
17. Cong L., Ran F.A., Cox D., Lin S., Barretto R., Habib N., Hsu P.D., Wu X., Jiang W., Marraffini L.A. et al. Multiplex genome engineering using CRISPR/Cas systems. Science. 2013; 339:819–823.
18. Jinek M., East A., Cheng A., Lin S., Ma E., Doudna J. RNA-programmed genome editing in human cells. Elife. 2013; 2:e00471.
19. Mali P., Yang L., Esvelt K.M., Aach J., Guell M., DiCarlo J.E., Norville J.E., Church G.M. RNA-guided human genome engineering via Cas9. Science. 2013; 339:823–826.
20. Doudna J.A., Charpentier E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science. 2014; 346:1258096.
21. Hsu P.D., Lander E.S., Zhang F. Development and applications of CRISPR-Cas9 for genome engineering. Cell. 2014; 157:1262–1278.
22. Wright A.V., Nunez J.K., Doudna J.A. Biology and applications of CRISPR Systems: harnessing Nature's toolbox for genome engineering. Cell. 2016; 164:29–44.
23. Konermann S., Brigham M.D., Trevino A.E., Joung J., Abudayyeh O.O., Barcena C., Hsu P.D., Habib N., Gootenberg J.S., Nishimasu H. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 2015; 517:583–588.
24. Chavez A., Scheiman J., Vora S., Pruitt B.W., Tuttle M., E P.R.I., Lin S., Kiani S., Guzman C.D., Wiegand D.J. et al. Highly efficient Cas9-mediated transcriptional programming. Nat. Methods. 2015; 12:326–328.
25. Gilbert L.A., Horlbeck M.A., Adamson B., Villalta J.E., Chen Y., Whitehead E.H., Guimaraes C., Panning B., Ploegh H.L., Bassik M.C. et al. Genome-Scale CRISPR-Mediated control of gene repression and activation. Cell. 2014; 159:647–661.
26. Chakraborty S., Ji H., Kabadi A.M., Gersbach C.A., Christoforou N., Leong K.W. A CRISPR/Cas9-based system for reprogramming cell lineage specification. Stem Cell Rep. 2014; 3:940–947.
27. Zalatan J.G., Lee M.E., Almeida R., Gilbert L.A., Whitehead E.H., La Russa M., Tsai J.C., Weissman J.S., Dueber J.E., Qi L.S. et al. Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds. Cell. 2015; 160:339–350.
28. Hilton I.B., D’Ippolito A.M., Vockley C.M., Thakore P.I., Crawford G.E., Reddy T.E., Gersbach C.A. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 2015; 33:510–517.
29. Chavez A., Tuttle M., Pruitt B.W., Ewen-Campen B., Chari R., Ter-Ovanesyan D., Haque S.J., Cecchi R.J., Kowal E.J., Buchthal J. et al. Comparison of Cas9 activators in multiple species. Nat. Methods. 2016; 13:563–567.
30. Gapinske M., Tague N., Winter J., Underhill G.H., Perez-Pinera P. Targeted gene knock out using nuclease-assisted vector Integration: hemi- and homozygous deletion of JAG1. Methods Mol. Biol. 2018; 1772:233–248.
31. Brown A., Woods W.S., Perez-Pinera P. Multiplexed targeted genome engineering using a universal nuclease-assisted vector integration system. ACS Synth. Biol. 2016; 5:582–588.
32. Mancini A., Xavier-Magalhaes A., Woods W.S., Nguyen K.T., Amen A.M., Hayes J.L., Fellmann C., Gapinske M., McKinney A.M., Hong C. et al. Disruption of the beta1L isoform of GABP reverses glioblastoma replicative immortality in a TERT promoter mutation-dependent manner. Cancer Cell. 2018; 34:513–528.
33. Hayflick L. Tissue Culture. 1973; 220–223.
34. Brown A., Woods W.S., Perez-Pinera P. Targeted gene activation using RNA-Guided nucleases. Methods Mol. Biol. 2017; 1468:235–250.
35. Das A.T., Tenenbaum L., Berkhout B.
Tet-on systems for doxycycline-inducible gene expression . Curr. Gene Ther. 2016 ; 16 : 156 – 167 . Google Scholar Crossref Search ADS PubMed WorldCat 36. Kabadi A.M. , Ousterout D.G. , Hilton I.B. , Gersbach C.A. Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector . Nucleic Acids Res. 2014 ; 42 : e147 . Google Scholar Crossref Search ADS PubMed WorldCat 37. Bell R.J. , Rube H.T. , Kreig A. , Mancini A. , Fouse S.D. , Nagarajan R.P. , Choi S. , Hong C. , He D. , Pekmezci M. et al. . Cancer. The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer . Science . 2015 ; 348 : 1036 – 1039 . Google Scholar Crossref Search ADS PubMed WorldCat 38. Suzuki K. , Tsunekawa Y. , Hernandez-Benitez R. , Wu J. , Zhu J. , Kim E.J. , Hatanaka F. , Yamamoto M. , Araoka T. , Li Z. et al. . In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration . Nature . 2016 ; 540 : 144 – 149 . Google Scholar Crossref Search ADS PubMed WorldCat 39. Kosicki M. , Tomberg K. , Bradley A. Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements . Nat. Biotechnol. 2018 ; 36 : 65 – 771 . WorldCat 40. Niu D. , Wei H.J. , Lin L. , George H. , Wang T. , Lee I.H. , Zhao H.Y. , Wang Y. , Kan Y. , Shrock E. et al. . Inactivation of porcine endogenous retrovirus in pigs using CRISPR-Cas9 . Science . 2017 ; 357 : 1303 – 1307 . Google Scholar Crossref Search ADS PubMed WorldCat 41. Polstein L.R. , Perez-Pinera P. , Kocak D.D. , Vockley C.M. , Bledsoe P. , Song L. , Safi A. , Crawford G.E. , Reddy T.E. , Gersbach C.A. Genome-wide specificity of DNA binding, gene regulation, and chromatin remodeling by TALE- and CRISPR/Cas9-based transcriptional activators . Genome Res. 2015 ; 25 : 1158 – 1169 . Google Scholar Crossref Search ADS PubMed WorldCat 42. Fu Y. , Foden J.A. , Khayter C. , Maeder M.L. , Reyon D. , Joung J.K. , Sander J.D. 
High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells . Nat. Biotechnol. 2013 ; 31 : 822 – 826 . Google Scholar Crossref Search ADS PubMed WorldCat 43. Fu Y. , Sander J.D. , Reyon D. , Cascio V.M. , Joung J.K. Improving CRISPR-Cas nuclease specificity using truncated guide RNAs . Nat. Biotechnol. 2014 ; 32 : 279 – 284 . Google Scholar Crossref Search ADS PubMed WorldCat 44. Kleinstiver B.P. , Pattanayak V. , Prew M.S. , Tsai S.Q. , Nguyen N.T. , Zheng Z. , Joung J.K. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects . Nature . 2016 ; 529 : 490 – 495 . Google Scholar Crossref Search ADS PubMed WorldCat 45. Slaymaker I.M. , Gao L. , Zetsche B. , Scott D.A. , Yan W.X. , Zhang F. Rationally engineered Cas9 nucleases with improved specificity . Science . 2016 ; 351 : 84 – 88 . Google Scholar Crossref Search ADS PubMed WorldCat Author notes The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
In vitro isolation of class-specific oligonucleotide-based small-molecule receptors
Yang, Weijuan; Yu, Haixiang; Alkhamis, Obtin; Liu, Yingzhu; Canoura, Juan; Fu, Fengfu; Xiao, Yi
doi: 10.1093/nar/gkz224 pmid: 30926988
Abstract Class-specific bioreceptors are highly desirable for recognizing structurally similar small molecules, but the generation of such affinity elements has proven challenging. We here develop a novel ‘parallel-and-serial’ selection strategy for isolating class-specific oligonucleotide-based receptors (aptamers) in vitro. This strategy first entails parallel selection to selectively enrich cross-reactive binding sequences, followed by serial selection that enriches aptamers binding to a designated target family. As a demonstration, we isolate a class-specific DNA aptamer against a family of designer drugs known as synthetic cathinones. The aptamer binds to 12 diverse synthetic cathinones with nanomolar affinity and does not respond to 11 structurally similar non-target compounds, some of which differ from the cathinone targets by a single atom. This is the first account of an aptamer exhibiting a combination of broad target cross-reactivity, high affinity and remarkable specificity. Leveraging the qualities of this aptamer, instantaneous colorimetric detection of synthetic cathinones at nanomolar concentrations in biological samples is achieved. Our findings significantly expand the binding capabilities of aptamers as class-specific bioreceptors and further demonstrate the power of rationally designed selection strategies for isolating customized aptamers with desired binding profiles. We believe that our aptamer isolation approach can be broadly applied to isolate class-specific aptamers for various small molecule families. INTRODUCTION It is highly advantageous to be able to sensitively detect multiple different members of a particular molecular family or class in many analytical contexts—for example, detecting illicit drugs and their metabolites for forensic investigations, antibiotics for food safety or pesticides for environmental monitoring (1–3). 
Cross-reactive assays that can broadly detect small molecules based on a shared molecular framework offer a more efficient and cost-effective solution to this problem than the tandem use of multiple highly specific assays that each detects an individual analyte. Antibody-based immunoassays have dominated the field of small-molecule detection (4), and while assays have been developed for a wide variety of individual targets, the development of class-specific immunoassays has proven difficult (5,6). This is in part because the process of antibody generation, which is performed in vivo, provides no control over the target-binding affinity and spectrum of the resulting antibody. Nucleic acid-based affinity reagents known as aptamers hold much promise in circumventing many of the shortcomings associated with antibodies (7). Aptamers are isolated through a process termed systematic evolution of ligands by exponential enrichment (SELEX) (8) to bind targets of interest with high affinity and specificity. Unlike antibodies, aptamers can be isolated relatively quickly and chemically synthesized in an inexpensive manner with no batch-to-batch variation. Moreover, aptamers are thermostable and have shelf lives of a few years at room temperature (9). Theoretically, since SELEX is an in vitro process, the selection strategy and conditions can be precisely controlled to isolate class-specific aptamers that can broadly bind to small molecules sharing the same core structure. However, little work has been done to demonstrate the capability of SELEX to achieve such a goal. Aptamers isolated for a given small-molecule target often have innate cross-reactivity to analogs of that molecule, but the target-binding spectra of these aptamers are often either insufficient and/or unpredictable. 
For example, one heavily studied cocaine-binding aptamer (10) can also bind to metabolites such as norcocaine and cocaethylene, but does not respond to the major metabolite benzoylecgonine, which only differs from cocaine by a single methyl group (11,12). Toggle-SELEX was developed as a solution to isolate cross-reactive aptamers (13). In this strategy, a library pool is challenged with two different targets sharing the same core structure, which are alternated every round to select for an aptamer that can cross-react to both targets and, ideally, to target analogs sharing the same core structure. This method has led to the isolation of cross-reactive aptamers for a few structurally related small molecules (14–16). However, these aptamers typically exhibit only limited cross-reactivity (14,15), and the overall success rate of such approaches has been low (16). For example, Derbyshire et al. successfully isolated an aptamer that can bind to eight aminoglycoside antibiotics using Toggle-SELEX. Out of four independent selections with four different target pairs, only one yielded the final aptamer (16). The limitations of Toggle-SELEX could be attributed to two reasons. First, in previously reported approaches, only a fraction of possible substituent positions was varied between the two targets, yielding aptamers with narrow target-binding spectra. Second, since only a small portion of the aptamers in the initial pool are cross-reactive compared to those that bind to at least one target, those cross-reactive aptamers could be lost during Toggle-SELEX. To overcome these problems, we have developed a new ‘parallel-and-serial’ selection strategy for SELEX to isolate class-specific aptamers for small molecule families. Our strategy has three steps that are crucial for its success. First, a set of structurally related targets are selected to define the core structure that is to be recognized by the isolated aptamer. 
The use of these targets creates selection pressure for aptamers that recognize generic molecular features common to all targets while remaining insensitive to peripheral substituents. Accordingly, it is important to choose as many targets as needed to represent variations at all desired substituent sites while preserving the core molecular framework of the target family. Next, these various targets are employed in a parallel selection process, in which multiple aptamer pools are enriched using each individual target. As a result, cross-reactive aptamers recognizing the shared core structure are enriched in all of the resulting pools after a few rounds of selection. When all these parallel pools are combined, the population of such aptamers should be relatively high. Finally, this combined pool is subjected to serial selection with each target sequentially, a process that ultimately retains only those aptamers that bind to the core structure shared by these targets. Importantly, this selection strategy is supplemented with a well-designed counter-SELEX procedure (17) to further define the targeted core structure and to prevent the aptamer from binding to structurally similar non-target molecules. As a demonstration of this strategy, we isolated a class-specific aptamer for synthetic cathinones, a large family of dangerous designer drugs (18) that are associated with many severe psychological and physiological health consequences (19,20). The isolated aptamer demonstrated low nanomolar binding affinity to 12 synthetic cathinones while having no response to 11 structurally similar non-cathinone molecules. Analysis of the aptamer isolation process via high-throughput sequencing revealed that cross-reactive sequences were enriched during parallel selection, and that exponential enrichment of such aptamers occurred during serial selection. The target-binding spectrum of the aptamer was further evaluated via a dye-displacement sensing platform. 
Impressively, the aptamer enabled instantaneous colorimetric detection of several synthetic cathinones at nanomolar concentrations in biological samples, rivaling the performance of any currently available immunoassays that can only detect a few members of this family. To our knowledge, this is the first demonstration of using a rationally designed strategy to isolate high-affinity aptamers that are truly class-specific, having both broad target-binding spectrum as well as excellent specificity. Such success shows that the customizable nature of SELEX makes it well-suited for sourcing receptors with specific, yet broad molecular recognition capabilities, and that the approach described herein can be generally employed to directly isolate class-specific aptamers for any small-molecule family. MATERIALS AND METHODS SELEX strategy The isolation of aptamers was carried out via a parallel-and-serial selection strategy. The whole aptamer isolation process consisted of five (for ethylone and butylone) or nine (for alpha-pyrrolidinopentiophenone, α-PVP) rounds of parallel selection and two cycles of serial selection. Detailed information regarding the conditions for each round of selection is listed in Supplementary Table S1. Since each sequence has a low copy number at the beginning of SELEX, high target concentrations were used during the first few rounds to ensure retention of all possible target binders. Moreover, for counter-SELEX, counter-target concentrations were initially kept low to remove high-affinity interferent binders while avoiding loss of target binders. In later rounds of SELEX, selection stringency was increased by decreasing the target concentration and increasing the counter-target concentration; this was done to obtain aptamers with high target affinity and specificity. The counter-SELEX protocol was rationally designed to encompass known and commonly observed interferents found in seized drug samples, including cutting agents (e.g. 
caffeine and acetaminophen), adulterants (e.g. procaine, lidocaine and promazine), other illicit drugs (e.g. cocaine and methamphetamine) or structurally related non-cathinone molecules (e.g. amphetamine, pseudoephedrine and ephedrine) to ensure that the isolated aptamer did not bind to them. Parallel selection Parallel selection began with three initial pools consisting of 1 nmole DNA library (Supplementary Table S2) for each of the three different selection targets (α-PVP, ethylone and butylone). From Round P2 to P9, ∼300 pmole of enriched library pool for each target from the previous round was employed in the subsequent round. Positive selection was performed with progressively decreasing target concentrations (Round P1: 1 mM, Rounds P2–P3: 500 μM, Rounds P4–P5: 250 μM for α-PVP, ethylone and butylone; Rounds P6–P9: 100 μM for α-PVP only) to increase selection stringency for the enrichment of strong binders. For every round after the first, counter-SELEX was performed prior to positive selection in order to eliminate non-specific binders. The number and concentrations of counter-targets were progressively increased during the selection process to increase selection stringency. Specifically, from Rounds P2–P5 for the ethylone and butylone pools and Rounds P2–P8 for the α-PVP pool, we used the following counter-SELEX strategy: 100 μM cocaine for Round P2, a mixture of 100 μM cocaine and 100 μM procaine for Round P3, and a mixture of 100 μM cocaine, 100 μM procaine and 100 μM lidocaine for Rounds P4–P8. For Round P9 for the α-PVP pool, 300 μM each of cocaine, procaine and lidocaine was employed consecutively for counter-SELEX (Supplementary Table S1). Serial selection We combined 100 pmole each from the P5 parallel pools for ethylone and butylone and the P9 parallel pool for α-PVP to generate the initial pool for serial selection. 
Two cycles of serial selection (Cycle 1 comprising Rounds S1–S3, Cycle 2 comprising Rounds S4–S6) were performed to specifically isolate cross-reactive aptamers. For each cycle, positive selection was performed by alternating the selection target (Rounds S1 and S4: butylone, Rounds S2 and S5: ethylone and Rounds S3 and S6: α-PVP). Target concentration was maintained at 100 μM for each round of serial selection for maximum stringency. In the first cycle of serial selection, counter-SELEX was performed prior to each round of positive selection by consecutively challenging the pool with 500 μM each of cocaine, procaine and lidocaine, while in the second cycle of serial selection, counter-SELEX was performed by sequentially challenging with a mixture of 500 μM each of ephedrine, pseudoephedrine, acetaminophen, methamphetamine and amphetamine; then with a mixture of 1 mM each of cocaine, procaine and lidocaine; and finally with 500 μM promazine (Supplementary Table S1). SELEX procedure The isolation of aptamers was carried out following a previously reported library-immobilized SELEX protocol (21). The initial single-stranded DNA library used for SELEX consisted of approximately 6 × 10^14 oligonucleotides. Each library strand is stem–loop structured and 73 nucleotides in length, with a randomized 30-nt loop flanked in turn by a pair of 8-nt stem-forming sequences and two primer-binding regions (Supplementary Table S2, DNA library). For each round of SELEX, the library/pool was mixed with biotinylated capture strands (Supplementary Table S2, cDNA-bio) at a molar ratio of 1:5 in selection buffer (10 mM Tris–HCl, 0.5 mM MgCl2, 20 mM NaCl, pH 7.4), heated at 95°C for 10 min and cooled at room temperature for over 30 min to ensure hybridization between library and capture strands. A micro-gravity column (0.5 ml) was prepared by adding 250 μl of streptavidin-coated agarose beads followed by three washes with 250 μl of selection buffer. 
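As a brief aside on the library size quoted above, the figure of approximately 6 × 10^14 strands is what one obtains from 1 nmole of DNA, and it can be compared with the number of possible 30-nt random sequences. The following Python sketch is illustrative only; the variable names are hypothetical, and the assumption that each of the four nucleotides can occur at each random position is ours rather than a statement from the library design.

```python
# Back-of-the-envelope check of library size versus sequence space.
# Assumption (ours): each of the 30 random positions can be A, C, G or T.

AVOGADRO = 6.02214076e23      # molecules per mole

library_amount_mol = 1e-9     # 1 nmole of DNA library, as stated in the text
random_region_nt = 30         # length of the randomized loop

molecules_in_library = library_amount_mol * AVOGADRO
possible_sequences = 4 ** random_region_nt
fraction_sampled = molecules_in_library / possible_sequences

print(f"strands in library : {molecules_in_library:.2e}")
print(f"possible sequences : {possible_sequences:.2e}")
print(f"fraction sampled   : {fraction_sampled:.2e}")
```

Under these assumptions the starting pool samples only a small fraction of the possible 30-nt sequences, which is consistent with each sequence being present at a low copy number at the start of SELEX.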
About 250 μl of cDNA-library solution was then flowed through the micro-gravity column three times in order to conjugate the library to the agarose beads (Supplementary Figure S1A). The column was subsequently washed 10 times with selection buffer. Then, 250 μl of target (α-PVP, ethylone or butylone) dissolved in selection buffer was added to the column. Library molecules that bound to the target underwent a conformational change, which caused them to detach themselves from the biotinylated cDNA into solution (Supplementary Figure S1B). The eluent containing these strands was collected. This process was repeated twice, and all eluents were combined (750 μl total). The resulting pool was concentrated via centrifugation using a 3 kDa cut-off spin filter. The concentrated pool (100 μl) was then mixed with 1 ml of GoTaq Hot Start Colorless Master Mix with 1 μM forward primer (Supplementary Table S2, FP) and 1 μM biotinylated reverse primer (Supplementary Table S2, RP-bio) to amplify the pool via polymerase chain reaction (PCR). Amplification was performed using a BioRad C1000 thermal cycler with conditions as follows: 2 min at 95°C; 13 cycles of 95°C for 15 s, 58°C for 30 s and 72°C for 45 s, and 5 min at 72°C. The optimal number of amplification cycles was determined by performing pilot PCR to ensure sufficient amplification of enriched sequences without generating PCR-related artifacts. Amplification of the enriched pool and the absence of byproducts were confirmed using 3% agarose gel electrophoresis. If byproducts with differing lengths from the original library strands were observed, the pool was purified with a 4% agarose gel and the 73-nt products were recovered by silica column as reported previously (21). To generate single-stranded DNA from the resulting double-stranded PCR products, a fresh micro-gravity column was prepared containing 250 μl streptavidin-coated agarose beads, as described above. 
The amplified pool was then flowed through the column three times to conjugate the pool to the beads. Afterward, the column was washed six times with 250 μl of separation buffer (10 mM Tris–HCl, 20 mM NaCl, pH 7.4). The column was then capped, and 300 μl of a 0.2 M NaOH solution was added to the column and incubated for 10 min to generate single-stranded DNA, after which the eluent was collected. An additional 100 μl of 0.2 M NaOH was added to elute residual library strands from the column. Both eluents were combined and neutralized with 0.2 M HCl, and the pool was then concentrated via centrifugation with a 3 kDa cut-off spin filter. For every round after the first, counter-SELEX was performed before the positive selection step. Specifically, the library-immobilized column was washed with 250 μl of counter-target(s) in selection buffer to remove non-specific DNA strands. This process was performed three times for Rounds P1–P3 and S1–S6, and ten times for Rounds P4–P5 and, in the case of α-PVP, Rounds P4–P9. Afterward, the column was washed 30 times with selection buffer to wash away non-specific binders in preparation for positive selection. Gel elution assay The enrichment, target affinity, specificity and cross-reactivity of the pools collected after Rounds P5 (for α-PVP, ethylone and butylone), P9 (for α-PVP only), S3 and S6 were evaluated using a modified version of a previously reported gel elution assay (Supplementary Figure S2) (22). Specifically, 50 pmole of enriched library (Supplementary Figure S2A) was incubated with 250 pmole of biotinylated cDNA in 125 μl of selection buffer, heated at 95°C for 10 min and cooled at room temperature over 30 min to anneal both strands and form cDNA-library complex (Supplementary Figure S2B). Afterward, a microcentrifugation column was prepared by adding 125 μl of streptavidin-coated agarose beads. 
The cDNA-library complex was then added to the column and immobilized on the beads (Supplementary Figure S2C), and the eluent was collected and recycled through the column twice more. The library-immobilized agarose beads were transferred into a microcentrifugation tube and washed five times by adding 625 μl of selection buffer, incubating on an end-over-end rotator for 5 min, followed by centrifugation and removal of the supernatant. The volume of the library-immobilized bead solution was adjusted to 150 μl with selection buffer and aliquoted into seven tubes (20 μl/tube). Afterward, 50 μl of target at a variety of final concentrations (0, 10, 50, 100, 250, 500 or 1000 μM) was added into each tube (Supplementary Figure S2D). After rotating for 60 min on an end-over-end rotator at room temperature, the beads were settled by centrifugation and 40 μl of the supernatant, which contained the target-eluted strands, was collected and set aside (Supplementary Figure S2E). Meanwhile, the leftover solution (30 μl) (Supplementary Figure S2F) was mixed with 50 μl of a 98% formamide solution containing 10 mM ethylenediaminetetraacetic acid and incubated at 90°C for 10 min to completely release all DNA strands from the beads (Supplementary Figure S2G). The resulting solution contained both leftover target-eluted strands and non-target-eluted strands. We analyzed the target-eluted aptamer solution and formamide-treated library solution via 15% denaturing polyacrylamide gel electrophoresis (PAGE) and determined the concentrations of the strands based on standardized concentrations of ladder loaded in the gel. 
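The elution percentage used in this assay is the ratio of the total amount of target-eluted strands to the total amount of strands recovered from the beads. A minimal Python sketch of that calculation is given here; the function name and the example concentrations are hypothetical, while the default volumes (62, 40 and 80 μl) are those stated in this section.

```python
def elution_percentage(c_s, c_b, v1=62.0, v2=40.0, v3=80.0):
    """Return the percentage of library strands eluted by the target.

    c_s : concentration of target-eluted strands in the collected supernatant
    c_b : concentration of strands in the formamide-treated solution
    v1  : volume before supernatant collection (ul)
    v2  : volume of the collected supernatant (ul)
    v3  : volume after addition of formamide (ul)
    c_s and c_b must be expressed in the same concentration unit.
    """
    eluted = v1 * c_s                 # total target-eluted strands
    total = v2 * c_s + v3 * c_b       # all strands recovered from both fractions
    if total <= 0:
        raise ValueError("no DNA detected in either fraction")
    return 100.0 * eluted / total


# Hypothetical gel readings, in arbitrary but identical units.
example = elution_percentage(c_s=1.0, c_b=1.0)
print(f"{example:.1f}% eluted")
```

Because both concentrations enter the numerator and denominator linearly, any consistent unit read from the gel ladder can be used.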
The elution percentage was calculated using the equation: \begin{equation*}\theta = \frac{V_1 \times c_{\rm s}}{V_2 \times c_{\rm s} + V_3 \times c_{\rm b}} \times 100\%\end{equation*} where θ is the fraction of target-eluted strands, c_s is the concentration of target-eluted strands in the supernatant, c_b is the concentration of strands in the formamide solution, V_1 is the volume of solution before supernatant collection (estimated as 62 μl, with ∼8 μl occupied by agarose beads), V_2 is the volume of the collected supernatant containing target-eluted strands (40 μl) and V_3 is the volume of solution after addition of formamide (80 μl). A calibration curve was created by plotting the fraction of eluted strands against the employed target concentration. The resulting curve was fitted with the Langmuir equation to determine the dissociation constant (KD) of the enriched pool. The same protocol was used to determine the target cross-reactivity and specificity of the enriched pool for other synthetic cathinones (4-MMC, 4-FMC, MDPBP, MDPV, MePBP, methedrone, methylone, MPHP, naphyrone, pentylone and pyrovalerone) or interferents (acetaminophen, amphetamine, caffeine, cocaine, ephedrine, lidocaine, methamphetamine, procaine, promazine, pseudoephedrine and sucrose). High-throughput sequencing (HTS) analysis of the aptamer isolation process High-throughput sequencing (HTS) of the four final parallel pools and two serial pools was performed using Ion Torrent Sequencing. To prepare samples for sequencing, 10 nM pool was mixed with GoTaq Hot Start Colorless Master Mix, 1 μM forward primer and 1 μM reverse primer with a final volume of 50 μl. Nine cycles of PCR were then performed with the same PCR conditions as described in ‘SELEX Procedure’. Then 40 μl of PCR product was added into 16 μl of ExoSAP-IT reagent in an ice bath. 
The mixture was then incubated at 37°C for 15 min to degrade remaining primers and nucleotides, followed by incubation at 80°C for 15 min to inactivate ExoSAP-IT reagent. HTS was performed at FIU DNA Core Facility using an Ion Personal Genome Machine System with an Ion 318 v2 chip (ThermoFisher Scientific). Upon obtaining the sequencing data, the primer sequences were trimmed by cutadapt (23), and the population of sequences from each pool was calculated using FASTAptamer (24). Members of the SCA2.1 family are defined as sequences containing ‘AGTGGGGTTCGGGTGGAGTT’ in any position in the 30-nt random region. The total reads for each pool are: ethylone P5: 234428, ethylone P8: 251376, butylone P5: 532967, α-PVP P9: 328474, S3: 297379, S5: 1575678. RESULTS Choosing targets for selection Synthetic cathinones share the same β-keto phenethylamine chemical core structure and typically differ at four substituent sites (Figure 1A, core structure). Performing selection directly against this ‘bare core’ (i.e. 2-amino-propiophenone, also known as cathinone) may yield aptamers with high affinity toward cathinone, but there is no guarantee that such aptamers will bind other synthetic cathinones that have different substituents on the core structure. This is supported by previous studies that showed that small molecule-binding aptamers isolated for a single target often have lower or no affinity for compounds with additional/differing substituents (17,25). We rationalized that performing SELEX with a set of structurally similar, yet sufficiently diverse, targets would create selection pressure for isolating aptamers that tolerate variations at all such sites, greatly increasing the likelihood that the isolated aptamer will have high cross-reactivity to this family as a whole. 
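For reference, the SCA2.1 family definition given in the HTS methods (any 30-nt random region containing the stated 20-nt motif) amounts to a substring test on each trimmed read. The Python sketch below illustrates how the family's share of a pool could be computed from per-sequence read counts; the function name and the toy sequences and counts are hypothetical and are not data from this study, and the sketch does not reproduce the cutadapt or FASTAptamer steps.

```python
SCA21_MOTIF = "AGTGGGGTTCGGGTGGAGTT"   # motif quoted in the text


def family_fraction(read_counts, motif=SCA21_MOTIF):
    """Fraction of reads whose random region contains the motif.

    read_counts maps each trimmed random-region sequence to its read count.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    in_family = sum(n for seq, n in read_counts.items() if motif in seq.upper())
    return in_family / total


# Toy pool (made-up sequences and counts, for illustration only).
toy_pool = {
    "ACGTA" + SCA21_MOTIF + "TTGCA": 30,   # 30-nt region containing the motif
    "A" * 30: 70,                          # 30-nt region without the motif
}
toy_fraction = family_fraction(toy_pool)
print(f"family share of toy pool: {toy_fraction:.2f}")
```

Comparing this fraction across pools from successive rounds is one simple way to follow enrichment of a sequence family.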
Thus, to isolate class-specific synthetic cathinone-binding aptamers, we selected three targets that share the same core structure but have variations at all of the substitution sites that are typically modified in this family: alpha-pyrrolidinovalerophenone (α-PVP), ethylone and butylone (Figure 1A). Figure 1. Isolation of a class-specific aptamer using a parallel-and-serial selection strategy. (A) The core structure of synthetic cathinones and the three targets chosen for parallel-and-serial selection, with substituent moieties shaded in red. (B) Schematic diagram of parallel-and-serial selection. Each of the three targets was subjected to parallel selection to enrich for broadly synthetic cathinone-specific aptamers (top), after which the resulting pools were combined and subjected to serial selection screening (bottom) to eliminate target-specific binders while retaining those aptamers with broad cross-reactivity for this drug class. Parallel selection Parallel selection was performed using three different initial library pools, with one pool being challenged with α-PVP, one with ethylone and one with butylone (Figure 1B, top). 
Presumably, challenging the parallel pools with individual targets enriches all sequences binding to the target, including those that are also cross-reactive to other synthetic cathinones. In contrast, challenging a single pool with a mixture of targets may lead to loss of cross-reactive aptamers. This is evidenced by a previous study that demonstrated that performing SELEX with a mixture of four targets yielded aptamers that specifically bind to just one of the four (26). This may be attributable to competitive binding among these targets. To perform parallel selection, we employed a library pool consisting of ∼6 × 10^14 unique oligonucleotide sequences, which form an 8-bp stem and a 30-nt random loop as the putative target-binding domain (Supplementary Figure S1A). During the first round, each initial library pool was challenged with 1 mM target, and eluted strands were collected and amplified for the next round of selection. To further establish class-specificity, from the second round onward we performed counter-SELEX prior to the positive selection step to remove aptamers binding to structurally similar interferents (e.g. cocaine, procaine, lidocaine) that have the same functional groups or partial structural features as our targets. In round two, counter-SELEX was first performed for each pool against 100 μM cocaine, followed by positive selection with 500 μM target, as reducing target concentration increases selection stringency (27–29). In the third round, a mixture of 100 μM cocaine and 100 μM procaine was used for counter-SELEX, with the same target concentration for positive selection. In rounds four and five, counter-SELEX was performed against cocaine, procaine and lidocaine (each at a concentration of 100 μM) in a mixture, with 250 μM target used for the positive selection step. After the fifth round, a gel-elution assay (see ‘Materials and methods’ section) was performed to determine the target-binding affinity of each pool for its respective target. 
We observed that the fraction of eluted library increased with increasing target concentrations for the ethylone and butylone pools (Supplementary Figures S3A and S4A), showing that aptamers binding to these targets had been enriched through parallel selection. In contrast, target elution remained low for the α-PVP pool (Supplementary Figure S5A) regardless of the employed concentration of target, which indicated that the pool was not yet enriched. We further determined the cross-reactivity and specificity of the three pools via the gel-elution assay. The enriched ethylone and butylone pools were able to bind to both ethylone and butylone, but not to α-PVP, which indicated that the population of cross-reactive aptamers was relatively low. These pools also showed some affinity to procaine, with the ethylone pool also binding to cocaine. Neither pool displayed any affinity for lidocaine (Supplementary Figures S3B and S4B). In contrast, the α-PVP pool showed no affinity for any of the targets or counter-targets (Supplementary Figure S5B). Given that the α-PVP pool was not yet enriched, we performed additional rounds of selection. From rounds six to eight, we used the same counter-target mixture of 100 μM cocaine, procaine and lidocaine from round 5 with a further-reduced α-PVP concentration of 100 μM for positive selection. For the ninth round, we used the same α-PVP concentration but with counter-selection performed with 300 μM of each of the three counter-targets in a consecutive manner. After the ninth round, we performed the gel-elution assay with the enriched pool and observed a clear target concentration-dependent elution profile for α-PVP, with an estimated dissociation constant (KD) of 28 μM (Supplementary Figure S6A). Notably, only 30% of the library was eluted, even in the presence of 1 mM α-PVP, which implied that there was just a small population of binders in the pool. 
We also determined that this pool displayed affinity to ethylone and butylone but was less responsive toward the various interferents (Supplementary Figure S6B), which can be attributed to the fact that more rounds of counter-SELEX were performed. Given that the pools enriched with individual targets also cross-reacted to other targets, it seemed likely that those pools contained cross-reactive aptamers. We reasoned that if parallel selection were continued, target-specific aptamers would begin to dominate the pools; therefore, we terminated parallel selection before this could occur.

Serial selection

We then performed serial selection to enrich cross-reactive aptamers and exclude those specific to individual targets (Figure 1B, bottom). We combined all three enriched parallel pools as a starting library. For each cycle of serial selection, we challenged the combined pool with each target sequentially for a total of three rounds of selection using butylone (first round), ethylone (second round) and α-PVP (third round). In each round, we first performed counter-SELEX with 500 μM cocaine, procaine and lidocaine sequentially, followed by positive selection with 100 μM target. After the first cycle of serial selection against all three targets, we performed the gel-elution assay to determine the cross-reactivity and specificity of the resulting pool. We observed that the cross-reactivity toward ethylone and butylone had substantially increased (KD = 82 and 77 μM, respectively) relative to the individual pools obtained for these targets at the end of parallel selection, while affinity toward α-PVP was essentially unchanged (KD = 34 μM) (Supplementary Figure S7A). Importantly, this pool exhibited greatly improved specificity, with minimal affinity for cocaine and lidocaine and only a moderate response to procaine (Supplementary Figure S7B).
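The KD values quoted for the enriched pools are derived from plots of percent elution against target concentration. One plausible way to extract such a value is to fit a single-site binding isotherm. The sketch below assumes that model (the fitting procedure actually used is described in the Materials and methods and is not reproduced here), ignores background elution in buffer, and uses synthetic, noise-free data generated from hypothetical parameters rather than measured values. The function name `fit_single_site` and its arguments are likewise hypothetical.

```python
from typing import Sequence, Tuple


def fit_single_site(
    conc_uM: Sequence[float],
    eluted_pct: Sequence[float],
    kd_min: float = 0.1,
    kd_max: float = 1e4,
    steps: int = 2000,
) -> Tuple[float, float]:
    """Fit eluted = amp * c / (KD + c) by scanning KD on a log grid.

    For each trial KD the best amplitude has a closed form (linear least
    squares), so only KD needs to be searched. Returns (KD, amplitude).
    """
    best = None
    for i in range(steps):
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        frac = [c / (kd + c) for c in conc_uM]
        denom = sum(f * f for f in frac)
        if denom == 0.0:
            continue
        amp = sum(y * f for y, f in zip(eluted_pct, frac)) / denom
        sse = sum((y - amp * f) ** 2 for y, f in zip(eluted_pct, frac))
        if best is None or sse < best[0]:
            best = (sse, kd, amp)
    if best is None:
        raise ValueError("no non-zero concentrations supplied")
    return best[1], best[2]


if __name__ == "__main__":
    # Concentration series matching the lanes listed in the Figure 2 caption.
    conc = [0.0, 10.0, 50.0, 100.0, 250.0, 500.0, 1000.0]
    # Synthetic data from hypothetical parameters (KD = 28 uM, plateau = 30%).
    data = [30.0 * c / (28.0 + c) for c in conc]
    kd, amp = fit_single_site(conc, data)
    print(f"fitted KD = {kd:.1f} uM, plateau = {amp:.1f}%")
```

A grid search is used here only to keep the example free of external dependencies; a nonlinear least-squares routine would normally be preferred for real data.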
We then performed a second cycle of serial selection with an identical selection procedure but with a three-stage counter-SELEX process entailing sequential screening against a mixture of 500 μM each of ephedrine, pseudoephedrine, acetaminophen, methamphetamine and amphetamine, then a mixture of 1 mM each of cocaine, procaine and lidocaine, and finally 500 μM promazine. We believe that the inclusion of these additional counter-targets, which are similar in structure to synthetic cathinones and commonly encountered in seized substances, further enhances the specificity of the enriched pool. After this cycle, we again evaluated the pool binding affinity via the gel-elution assay (Figure 2A). Each individual target (at a concentration of 500 μM) eluted more than 70% of the pool. The pool affinity toward ethylone and butylone had increased by ∼10-fold (KD = 6.9 and 9.5 μM, respectively), whereas the affinity toward α-PVP only marginally increased (KD = 21 μM).

Figure 2. Characterization of the affinity and specificity of the final enriched pool via a gel-elution assay. (A) PAGE results depict the target elution profile, with lanes representing samples of the pool eluted with 0, 10, 50, 100, 250, 500 or 1000 μM (from left to right) α-PVP, ethylone or butylone. The percent of target-eluted pool was plotted against the concentration of target to determine the binding affinity of the enriched pool. (B) Percent elution values for 14 synthetic cathinones and 11 interferents at a concentration of 50 μM and buffer alone.

Characterization and sequencing of the final serial pool

We concluded that at this stage, the enriched pool consisted largely of cross-reactive synthetic cathinone-binding aptamers. To confirm this, we used the gel-elution assay to test the cross-reactivity of this pool by challenging it with the three targets as well as 11 other synthetic cathinones (for names and chemical structures, see Supplementary Figure S8A). All of them demonstrated >60% target elution at a concentration of 50 μM (Figure 2B). This shows that the aptamer could recognize the core structure of synthetic cathinones, while being tolerant even to side-chain substituents that were not encountered during SELEX. To evaluate the specificity of the enriched pool, we challenged the pool with 50 μM of the counter-targets and other potential interferents (for chemical structures, see Supplementary Figure S8B) and found that none of them showed increased elution compared with buffer alone (Figure 2B). We therefore cloned and sequenced this final enriched pool. The pool was revealed to have low diversity, with 30 of the 50 clones having an identical sequence, which we named SCA2.1 (Supplementary Table S5). Mfold (30) predicts that SCA2.1 has a stem–loop structure with a 9-bp stem and a 28-nt loop in our selection buffer at room temperature (Supplementary Figure S9).

High-throughput sequencing (HTS) analysis of the selection process

We performed a thorough investigation of the parallel-and-serial selection process using HTS. Specifically, we sequenced our three final parallel pools (butylone P5, ethylone P5 and α-PVP P9) and two serial pools (S3 and S6) (Figure 3).
Analysis of the parallel pools showed that the most prevalent sequence represented 0.0079% (butylone P5), 0.0012% (ethylone P5) and 4.6% (α-PVP P9) of their respective pool. The greater enrichment observed in the α-PVP P9 pool is probably due to the additional four rounds of parallel selection. Nevertheless, no particular sequence dominated any of these parallel pools, which indicated that they were not yet highly enriched. Meanwhile, the SCA2.1 family (see ‘Materials and methods’ section) comprised 0.011% of the butylone P5 pool and 0.00095% of the α-PVP P9 pool, which is notably higher than the median sequence abundance in each pool (0.00019% for butylone P5 and 0.00032% for α-PVP P9), showing that parallel selection enriches cross-reactive aptamers. We noted that the ethylone P5 pool did not show any signs of enrichment. To determine if cross-reactive aptamers were present or lost in the ethylone pool, we performed an additional three rounds of parallel selection for this pool and then performed high-throughput sequencing with the resulting ethylone P8 pool. We found that the SCA2.1 family represented 0.1% of this pool, which implies that these sequences were originally present in the ethylone P5 pool, but at an amount below the detection capability of the HTS method we employed.

Figure 3. Box and whisker plots of the population distribution of sequences after parallel selection (butylone P5, ethylone P5, ethylone P8 and α-PVP P9) and each round of serial selection (S3 and S6) are shown. The longest horizontal line indicates the 50th percentile, with the boundaries of the box indicating the 5th and 95th percentile, and the whiskers indicate the highest and lowest values of the results. Sequences with population above the 95th percentile are plotted as gray dots. The total population of the SCA2.1 family is plotted as a red dot in each pool, except for the ethylone P5 pool where no such sequences were detected. Inset shows the pool population distribution after eight rounds of parallel selection using ethylone as target. Given the high diversity of the parallel pools, the lowest values, 5th, 50th and 95th percentile all overlap, thus the box and lowest whisker cannot be seen. For the serial pools, the lowest values, 5th and 50th percentile overlap, thus the bottom portion of the box and the lowest whisker are likewise not apparent.

Upon combining the three parallel pools, the SCA2.1 family constituted 0.0041% of the combined pool (median population: 0.000086%) based on our sequencing data. Such enrichment allowed for exponential enrichment of cross-reactive aptamers during serial selection, wherein the population of the SCA2.1 family increased to 0.39% of the S3 pool and 29% of the S6 pool. Indeed, parallel selection made serial selection highly efficient.
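The pool statistics reported above (abundance of the most prevalent sequence, median sequence abundance and total population of a sequence family) can be computed from a list of reads along the following lines. This is a minimal sketch, not the analysis pipeline used in the study: the function names are hypothetical, the reads are toy 8-nt strings, and the family is defined here by an assumed Hamming-distance cutoff, whereas the actual family definition is given in the Materials and methods.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable


def abundance_percent(reads: Iterable[str]) -> Dict[str, float]:
    """Percent of the pool represented by each unique sequence."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: 100.0 * n / total for seq, n in counts.items()}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; sequences of unequal length count as maximally distant."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def family_percent(abund: Dict[str, float], reference: str, max_mismatches: int) -> float:
    """Summed abundance of all sequences within max_mismatches of the reference (assumed rule)."""
    return sum(p for seq, p in abund.items() if hamming(seq, reference) <= max_mismatches)


if __name__ == "__main__":
    # Toy pool of ten reads; real pools contain far more reads of much longer sequences.
    reads = ["ACGTACGT"] * 6 + ["ACGTACGA"] * 2 + ["TTTTCCCC", "GGGGAAAA"]
    abund = abundance_percent(reads)
    print("most prevalent sequence (%):", max(abund.values()))
    print("median sequence abundance (%):", median(abund.values()))
    print("family population (%):", family_percent(abund, "ACGTACGT", max_mismatches=1))
```

Comparing the family population with the median abundance, as done in the text, indicates whether a family is enriched relative to a typical sequence in the same pool.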
It is likely that performing serial selection without parallel selection would result in loss of cross-reactive aptamers, given the generally low copy number of such sequences in initial rounds.

Characterization of the affinity and specificity of the isolated aptamer

We then characterized the affinity of SCA2.1 for the selection targets and specificity against interferents using isothermal titration calorimetry (ITC). We titrated a 300–400 μM solution of target into a 20 μM solution of the aptamer, recorded the heat released by each titration and integrated these data to generate a binding curve. Curve fitting with a one-site binding model resulted in atypical binding stoichiometries (N) and less-than-optimal fitting, especially for the α-PVP titration, which produced a non-sigmoidal curve (Supplementary Figure S10 and Supplementary Table S3). Given that synthetic cathinones are chiral molecules and a racemic mixture of the targets was employed for SELEX, we hypothesized that the aptamer may have differential binding affinity for each enantiomer. The pure enantiomers of the three selection targets were not commercially available, but the high cross-reactivity of the aptamer allowed us to use enantiomers of another synthetic cathinone, (−)- and (+)-MDPV, to test our hypothesis. ITC data indicated that the aptamer binds to one target molecule, with N = 0.92 and 0.95 for (−)- and (+)-MDPV, respectively, and exhibits roughly 80-fold greater affinity for the (−) enantiomer (KD = 46.5 ± 7.5 nM) relative to the (+) enantiomer (KD = 3.61 ± 0.12 μM) (Figures 4A and B and Supplementary Table S4). We also performed ITC by titrating racemic MDPV into a solution of SCA2.1 and observed a binding curve similar in appearance to that of α-PVP.
Using a modified two-set-of-sites model (see ‘Supplementary materials and methods’ for detail) to fit the result (Figure 4C), we obtained similar binding parameters to those produced by the titration of each enantiomer alone (Supplementary Table S4, (±)-MDPV). This confirmed that the modified model is appropriate to describe this binding phenomenon.

Figure 4. Characterization of the synthetic cathinone-binding affinity of SCA2.1 using ITC. Top panels present raw data showing the heat generated from each titration of (A) (−)-MDPV, (B) (+)-MDPV and (C) (±)-MDPV to SCA2.1, while bottom panels show the integrated heat of each titration after correcting for dilution heat of the titrant. ITC data obtained with (−)-MDPV and (+)-MDPV were fitted using a single-site model, and ITC data obtained with (±)-MDPV were fitted with a modified two-site binding model. All binding parameters are shown in Supplementary Table S4.

We then used this modified two-set-of-sites model to fit the ethylone, butylone and α-PVP binding curves, and obtained improved fitting and normal N values (0.9–1.1), with nanomolar affinity toward one enantiomer and micromolar affinity for the other (Figure 5 and Supplementary Table S3). Counterintuitively, these results suggest that SCA2.1 can achieve not only high target cross-reactivity, but also superior binding affinity.
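To illustrate why a racemic titrant gives a binding curve that a one-site model fits poorly, the sketch below computes how two enantiomers with different KD values share a single binding site. It is not the modified two-set-of-sites model used for the ITC fits, which is described in the Supplementary materials and methods; it is a textbook competitive single-site scheme that assumes the free concentration of each enantiomer is known. The function name `site_occupancy` is hypothetical, and the KD values are those reported above for (−)- and (+)-MDPV.

```python
from typing import Dict


def site_occupancy(free_uM: Dict[str, float], kd_uM: Dict[str, float]) -> Dict[str, float]:
    """Fraction of aptamer occupied by each ligand competing for one site.

    theta_i = ([L_i] / KD_i) / (1 + sum_j [L_j] / KD_j), with free concentrations in uM.
    """
    terms = {name: free_uM[name] / kd_uM[name] for name in free_uM}
    denom = 1.0 + sum(terms.values())
    return {name: term / denom for name, term in terms.items()}


# KD values reported in the text for the two MDPV enantiomers, converted to uM.
KD_UM = {"(-)-MDPV": 0.0465, "(+)-MDPV": 3.61}

if __name__ == "__main__":
    # Equal free concentrations, as a stand-in for a racemic mixture (assumed value).
    free = {"(-)-MDPV": 1.0, "(+)-MDPV": 1.0}
    occ = site_occupancy(free, KD_UM)
    for name, theta in occ.items():
        print(f"{name}: {100 * theta:.1f}% of sites")
    print("occupancy ratio (-)/(+):", occ["(-)-MDPV"] / occ["(+)-MDPV"])
```

At equal free concentrations the occupancy ratio equals the ratio of the two KD values, so the tighter-binding enantiomer dominates the early injections and the weaker one contributes mainly once the first has been depleted.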
To determine the specificity of SCA2.1, we performed ITC with the interferents that are most structurally similar to synthetic cathinones, including amphetamine, methamphetamine, ephedrine, cocaine and procaine. All such compounds had very low binding affinity for SCA2.1 (Supplementary Figure S11).

Figure 5. Characterization of the target-binding affinity of SCA2.1 using ITC. Top panels present raw data showing the heat generated from each titration of (A) α-PVP, (B) ethylone and (C) butylone to SCA2.1, while bottom panels show the integrated heat of each titration after correcting for dilution heat of the titrant. ITC data were fitted with a modified two-site binding model and the binding parameters are shown in Supplementary Table S3.

Evaluating the target-binding spectrum and specificity of the isolated aptamer

We then demonstrated the analytical utility of SCA2.1 in a colorimetric dye-displacement assay. Diethylthiotricarbocyanine (Cy7) is a small-molecule dye that exists in equilibrium between monomer and dimer forms, which have absorbance peaks at 760 and 670 nm, respectively. Previous studies have shown that Cy7 monomers can bind to hydrophobic target-binding domains of aptamers, which results in strong enhancement of absorbance at 760 nm (22,31).
However, the binding of target to the aptamer can displace Cy7 monomer from the binding domain within seconds, which causes the dye to dimerize in aqueous solution, resulting in the reduction of absorbance at 760 nm and enhancement of absorbance at 670 nm (Supplementary Figure S12). This approach can thus be used as a colorimetric indicator for small molecule detection. We first examined if such an assay can be employed to detect synthetic cathinones using SCA2.1. We determined the binding affinity of Cy7 to SCA2.1 by titrating different concentrations of the aptamer into a solution of 2 μM Cy7 (Supplementary Figure S13A). Increasing the amount of aptamer progressively enhanced the absorbance of Cy7 monomer at ∼760 nm, indicating binding to the aptamer. A gradual peak shift from 760 to 775 nm was also observed, which is consistent with previous studies (22,31) showing that absorbance of the monomer can change in different microenvironments, such as when the dye binds to the aptamer. Based on Cy7 absorbance at 775 nm, we obtained a KD of 1.6 μM (Supplementary Figure S13B). We then investigated whether the synthetic cathinone targets can efficiently displace Cy7 from SCA2.1. We first titrated different concentrations of butylone into a mixture of 2 μM Cy7 and 3 μM SCA2.1, and found that increasing concentrations of butylone progressively reduced the absorbance of Cy7 at 775 nm while enhancing absorbance at 670 nm (Supplementary Figure S14A). This change can be attributed to dimerization of the Cy7 monomer when displaced from the aptamer into solution (22,31). We used the absorbance ratio between 670 and 775 nm (A670/A775) to calculate signal gain and generate a calibration curve, which displayed a linear range of 0 to 10 μM and a measurable detection limit of 250 nM (Supplementary Figure S15). We obtained equivalent results with both ethylone and α-PVP (Supplementary Figures S14B,C and S15), again confirming the high cross-reactivity of SCA2.1. 
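The calibration described above can be sketched as follows. The text states that signal gain is calculated from the absorbance ratio A670/A775 but does not give the formula in this section, so the definition used here, (R − R0)/R0 with R0 the ratio in the absence of target, is an assumption. The function names and all absorbance values are hypothetical, and the straight-line fit is an ordinary least-squares fit over the linear range.

```python
from typing import Sequence, Tuple


def absorbance_ratio(a670: float, a775: float) -> float:
    """Ratio of dimer (670 nm) to aptamer-bound monomer (775 nm) absorbance."""
    return a670 / a775


def signal_gain(ratio: float, blank_ratio: float) -> float:
    """Relative change in A670/A775 versus the target-free blank (assumed definition)."""
    return (ratio - blank_ratio) / blank_ratio


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical readings: (target concentration in uM, A670, A775).
    readings = [(0.0, 0.20, 0.40), (2.5, 0.24, 0.38), (5.0, 0.28, 0.36), (10.0, 0.35, 0.32)]
    blank = absorbance_ratio(readings[0][1], readings[0][2])
    conc = [c for c, _, _ in readings]
    gains = [signal_gain(absorbance_ratio(a670, a775), blank) for _, a670, a775 in readings]
    slope, intercept = linear_fit(conc, gains)
    print(f"calibration: gain = {slope:.3f} * [target, uM] + {intercept:.3f}")
```

Using a ratio of two wavelengths rather than a single absorbance value makes the readout less sensitive to variation in total dye concentration or path length between wells.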
The Cy7-displacement assay is also compatible with biosamples such as urine and saliva, since the absorbance range of Cy7 is well outside the background absorbance exhibited by these matrices (22). We obtained calibration curves with ethylone spiked into 50% urine (Supplementary Figure S16) and 50% saliva (Supplementary Figure S17) with a linear range of 0 to 10 μM and measurable detection limits of 80 and 120 nM, respectively. The enhanced sensitivity of the assay in these biomatrices can possibly be attributed to the higher ionic strength of the media, which may enhance target binding to the aptamer or Cy7 dimerization. We tested the cross-reactivity of this assay for nine other synthetic cathinones (naphyrone, MDPV, pentylone, methylone, 4-MMC, 4-FMC, 3-FMC, methcathinone and cathinone) at a concentration of 50 μM. As expected, despite the diversity of the side-chain substituents, all synthetic cathinones induced a significant change in A670/A775, producing a signal gain ranging from 45% to 130% relative to ethylone (Figure 6A). This implies that SCA2.1 mainly recognizes the β-keto phenethylamine core structure, and variations in the side chains do not significantly affect target-binding affinity. Notably, the aptamer is cross-reactive to a broader range of synthetic cathinones than the antibodies used in existing immunoassays, which achieve >20% cross-reactivity for only five synthetic cathinones (32). SCA2.1 shows a moderate bias toward bulkier synthetic cathinones such as MDPV, naphyrone and pentylone. Such ligands may fit better in the binding pocket than smaller synthetic cathinones like methcathinone or 4-FMC, achieving high binding affinity through greater interaction with the aptamer. This is consistent with the higher signal gain observed for these ligands in the assay. Importantly, our assay has excellent specificity, as the aptamer does not cross-react to non-synthetic cathinone interferents.
We tested our assay with 11 different interferent compounds, including common illicit drugs (amphetamine, methamphetamine and cocaine) and cutting agents found in street samples (pseudoephedrine, ephedrine, procaine, lidocaine, benzocaine, caffeine, acetaminophen and sucrose) at a concentration of 50 μM. The assay yielded no response to any of these interferents (Figure 6A), even though many contained a partial β-keto phenethylamine structure, demonstrating that aptamer specificity can be precisely controlled through a well-designed selection approach.

Figure 6. Colorimetric detection of synthetic cathinones using a SCA2.1-based Cy7-displacement assay. (A) Signal gain measured via a plate reader from the Cy7-displacement assay with 12 synthetic cathinones (gray, with the three selection targets shaded) and 11 interferents (white) at a concentration of 50 μM with 3 μM SCA2.1 and 2 μM Cy7. Cross-reactivity is defined as the ratio of signal gain between the reference target ethylone and another synthetic cathinone or interferent multiplied by 100%. Error bars show standard deviations from three measurements. (B) Naked-eye detection of synthetic cathinones using a mixture of 5 μM SCA2.1 and 3.5 μM Cy7. The color of the solution changes to bright blue within seconds upon addition of 50 μM synthetic cathinones (a-l). However, the solution remains faint blue in the absence of target (-) or in the presence of 50 μM of a wide range of interferents (m-w).

We finally determined the limit of detection for our Cy7-displacement assay for 12 synthetic cathinones and observed detection limits that varied from 40 to 80 nM in 50% urine (Supplementary Figure S18). Clearly, the high affinity of this aptamer as well as its unresponsiveness toward endogenous compounds enables sensitive screening of synthetic cathinones in biological samples. Given that the concentration of these drugs in urine typically ranges from high nanomolar to <100 μM within a few hours after consumption (33), we believe that our assay will be useful for label-free detection of synthetic cathinones in these matrices. We further fine-tuned our Cy7-displacement assay by using a higher concentration of the dye and aptamer in order to intensify the target-induced color change and thereby enable naked-eye detection. We challenged this assay with the aforementioned 12 synthetic cathinones and 11 interferent compounds at a concentration of 50 μM with 3.5 μM Cy7 and 5 μM SCA2.1. In the absence of target, the aptamer-bound Cy7 monomer has an absorption peak at 775 nm, which is not in the visible range, and thus the sample is practically colorless. However, when Cy7 is displaced by the target, the resulting dimerization-associated absorption peak at 670 nm causes the solution to produce a bright, clearly visible blue color.
We observed that all 12 synthetic cathinones immediately induced a clear-to-blue color change in the solution, while no color change was identified upon addition of any of the interferent compounds (Figure 6B). Using ethylone as a target, we determined that 6.3 μM is the lowest concentration that can develop a color distinguishable from the blank with the naked eye (Supplementary Figure S19). These results demonstrate the feasibility of the Cy7-displacement assay for instrument-free on-site drug screening applications.

DISCUSSION

Class-specificity implies a degree of both receptor promiscuity toward targets within a designated molecular family and specificity against molecules outside of the family. The development of class-specific antibodies and aptamers has proven challenging due to the lack of viable methods to precisely control receptor binding profiles and specificity. In this work, we sought to develop a new aptamer isolation strategy, parallel-and-serial selection, as an effective way to select for class-specific aptamers recognizing small molecules based on a familial molecular core structure. The parallel-and-serial selection strategy increases the likelihood of isolating broadly cross-reactive aptamers by enriching oligonucleotide pools in parallel against diversely structured members of the designated target family and then combining and challenging the pools serially with these targets to fine-tune specificity toward a particular class of targets. Here, we chose three synthetic cathinones that vary at all major substituent sites of the targeted family to ensure that broadly cross-reactive aptamers are isolated. Other target triplets could yield equally cross-reactive aptamers if they are sufficiently diverse. More targets can be employed to create broader cross-reactivity, although this will increase labor and cost requirements.
By supplementing our strategy with counter-SELEX, the binding spectrum can be narrowed down to the target family, thereby avoiding unwanted cross-reactivity to structurally similar non-target molecules. As a demonstration, we isolated a single class-specific DNA aptamer that can bind to 12 diverse synthetic cathinones. This aptamer is insensitive to variations at all substituent sites on the core structure, and even tolerates many substituents that do not appear in our selection targets. Importantly, our aptamer does not respond to 11 structurally similar compounds, some of which differ from our targets by only a single atom. We subsequently demonstrated the superior class-specificity and affinity of our aptamer in a single-step, colorimetric Cy7-displacement assay, which can detect clinically relevant concentrations of synthetic cathinones in biomatrices (33–35) and presents greater target cross-reactivity than existing antibodies. Advantageously, this assay can also achieve naked-eye detection of synthetic cathinones at low micromolar concentrations, which is valuable for on-site screening of seized substances. The aptamer isolated herein displays binding characteristics that confound intuition. This aptamer has great molecular promiscuity, binding to several molecules sharing a common, defined core structure. At the same time, the aptamer retains the ability to discriminate against non-target molecules that are closely related in structure to members of the designated target family, even by a single-atom difference in certain instances. Impressively, the aptamer can also bind to its targets with nanomolar affinity. This was unexpected, as the combination of broad cross-reactivity and high affinity is, on the surface, counter-intuitive.
Our findings significantly expand the capability of aptamers as class-specific biorecognition elements and demonstrate an unprecedented level of control over aptamer binding profiles through this new parallel-and-serial selection strategy. We believe that our approach can be used to isolate class-specific aptamers for other families of small molecules for applications relevant to medical diagnostics, environmental monitoring, food safety and forensic science.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Dr. Paul Sharp of the FIU Department of Biological Sciences for assistance with the DNA sequencing process.

FUNDING

National Institute of Justice, Office of Justice Programs, U.S. Department of Justice Award [2016-DN-BX-0167]; National Institute of Justice, Office of Justice Programs, U.S. Department of Justice for a Graduate Research Fellowship [2015-R2-CX-0034 to H.Y.]; Overseas Fellowship awarded by Education Department of Fujian Province of China [2016071119 to W.Y.]; National Natural Science Foundation of China [21804020]; Presidential Fellowship awarded by the University Graduate School of Florida International University (to O.A.). Funding for open access charge: National Natural Science Foundation of China.

Conflict of interest statement. None declared.

REFERENCES

1. Verma N., Bhardwaj A. Biosensor technology for pesticides—a review. Appl. Biochem. Biotechnol. 2015; 175:3093–3119.
2. Huet A.-C., Fodey T., Haughey S.A., Weigel S., Elliott C., Delahaut P. Advances in biosensor-based analysis for antimicrobial residues in foods. TrAC, Trends Anal. Chem. 2010; 29:1281–1294.
3. Gandhi S., Suman P., Kumar A., Sharma P., Capalash N., Suri C.R. Recent advances in immunosensor for narcotic drug detection. BioImpacts. 2015; 5:207–213.
4. St John A., Price C.P. Existing and emerging technologies for point-of-care testing. Clin. Biochem. Rev. 2014; 35:155–167.
5. Zhang H., Wang S. Review on enzyme-linked immunosorbent assays for sulfonamide residues in edible animal products. J. Immunol. Methods. 2009; 350:1–13.
6. Spinks C.A. Broad-specificity immunoassay of low molecular weight food contaminants: new paths to utopia. Trends Food Sci. Technol. 2000; 11:210–217.
7. Song S., Wang L., Li J., Fan C., Zhao J. Aptamer-based biosensors. TrAC, Trends Anal. Chem. 2008; 27:108–117.
8. Tuerk C., Gold L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science. 1990; 249:505–510.
9. Pendergrast P.S., Marsh H.N., Grate D., Healy J.M., Stanton M. Nucleic acid aptamers for target validation and therapeutic applications. J. Biomol. Tech. 2005; 16:224–234.
10. Stojanovic M.N., de Prada P., Landry D.W. Fluorescent sensors based on aptamer self-assembly. J. Am. Chem. Soc. 2000; 122:11547–11548.
11. Slavkovic S., Altunisik M., Reinstein O., Johnson P.E. Structure-affinity relationship of the cocaine-binding aptamer with quinine derivatives. Bioorg. Med. Chem. 2015; 23:2593–2597.
12. Wang Z., Yu H., Canoura J., Liu Y., Alkhamis O., Fu F., Xiao Y. Introducing structure-switching functionality into small-molecule-binding aptamers via nuclease-directed truncation. Nucleic Acids Res. 2018; 46:e81.
13. White R., Rusconi C., Scardino E., Wolberg A., Lawson J., Hoffman M., Sullenger B. Generation of species cross-reactive aptamers using “toggle” SELEX. Mol. Ther. 2001; 4:567–573.
14. Niazi J.H., Lee S.J., Gu M.B. Single-stranded DNA aptamers specific for antibiotics tetracyclines. Bioorg. Med. Chem. 2008; 16:7245–7253.
15. Reinemann C., Freiin von Fritsch U., Rudolph S., Strehlitz B. Generation and characterization of quinolone-specific DNA aptamers suitable for water monitoring. Biosens. Bioelectron. 2016; 77:1039–1047.
16. Derbyshire N., White S.J., Bunka D.H.J., Song L., Stead S., Tarbin J., Sharman M., Zhou D., Stockley P.G. Toggled RNA aptamers against aminoglycosides allowing facile detection of antibiotics using gold nanoparticle assays. Anal. Chem. 2012; 84:6595–6602.
17. Jenison R.D., Gill S.C., Pardi A., Polisky B. High-resolution molecular discrimination by RNA. Science. 1994; 263:1425–1429.
18. Banks M.L., Worst T.J., Rusyniak D.E., Sprague J.E. Synthetic cathinones (“bath salts”). J. Emerg. Med. 2014; 46:632–642.
19. Spiller H.A., Ryan M.L., Weston R.G., Jansen J. Clinical experience with and analytical confirmation of “bath salts” and “legal highs” (synthetic cathinones) in the United States. Clin. Toxicol. 2011; 49:499–505.
20. Prosser J.M., Nelson L.S. The toxicology of bath salts: a review of synthetic cathinones. J. Med. Toxicol. 2012; 8:33–42.
21. Yang K.-A., Pei R., Stojanovic M.N. In vitro selection and amplification protocols for isolation of aptameric sensors for small molecules. Methods. 2016; 106:58–65.
22. Yu H., Yang W., Alkhamis O., Canoura J., Yang K.-A., Xiao Y. In vitro isolation of small-molecule-binding aptamers with intrinsic dye-displacement functionality. Nucleic Acids Res. 2018; 46:e43.
23. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011; 17:10–12.
24. Alam K.K., Chang J.L., Burke D.H. FASTAptamer: A bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections. Mol. Ther. Nucleic Acids. 2015; 4:e230.
25. Nakatsuka N., Yang K.-A., Abendroth J.M., Cheung K.M., Xu X., Yang H., Zhao C., Zhu B., Rim Y.S., Yang Y. et al. Aptamer-field-effect transistors overcome Debye length limitations for small-molecule sensing. Science. 2018; 362:319–324.
26. Stoltenburg R., Nikolaus N., Strehlitz B. Capture-SELEX: Selection of DNA aptamers for aminoglycoside antibiotics. J. Anal. Methods Chem. 2012; 2012:415697.
27. Wang J., Rudzinski J.F., Gong Q., Soh H.T., Atzberger P.J. Influence of target concentration and background binding on in vitro selection of affinity reagents. PLoS One. 2012; 7:e43940.
28. Levine H.A., Nilsen-Hamilton M. A mathematical analysis of SELEX. Comput. Biol. Chem. 2007; 31:11–35.
29. Irvine D., Tuerk C., Gold L. Selexion: Systematic evolution of ligands by exponential enrichment with integrated optimization by non-linear analysis. J. Mol. Biol. 1991; 222:739–761.
30. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction.
Nucleic Acids Res. 2003 ; 31 : 3406 – 3415 . Google Scholar Crossref Search ADS PubMed WorldCat 31. Stojanovic M.N. , Landry D.W. Aptamer-based colorimetric probe for cocaine . J. Am. Chem. Soc. 2002 ; 124 : 9678 – 9679 . Google Scholar Crossref Search ADS PubMed WorldCat 32. Ellefsen K.N. , Anizan S. , Castaneto M.S. , Desrosiers N.A. , Martin T.M. , Klette K.L. , Huestis M.A. Validation of the only commercially available immunoassay for synthetic cathinones in urine: Randox Drugs of Abuse V Biochip Array Technology . Drug Test. Anal. 2014 ; 6 : 728 – 738 . Google Scholar Crossref Search ADS PubMed WorldCat 33. Ellefsen K.N. , Concheiro M. , Huestis M.A. Synthetic cathinone pharmacokinetics, analytical methods, and toxicological findings from human performance and postmortem cases . Drug Metab. Rev. 2016 ; 48 : 237 – 265 . Google Scholar Crossref Search ADS PubMed WorldCat 34. Amaratunga P. , Lemberg B.L. , Lemberg D. Quantitative measurement of synthetic cathinones in oral fluid . J. Anal. Toxicol. 2013 ; 37 : 622 – 628 . Google Scholar Crossref Search ADS PubMed WorldCat 35. Mohamed K.M. , Al-Hazmi A.H. , Alasiri A.M. , Ali M.E-S. A GC-MS method for detection and quantification of cathine, cathinone, methcathinone and ephedrine in oral fluid . J. Chromatogr. Sci. 2016 ; 54 : 1271 – 1276 . Google Scholar Crossref Search ADS PubMed WorldCat Author notes The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Structure and mechanism of pyrimidine–pyrimidone (6-4) photoproduct recognition by the Rad4/XPC nucleotide excision repair complex
Paul, Debamita; Mu, Hong; Zhao, Hong; Ouerfelli, Ouathek; Jeffrey, Philip D; Broyde, Suse; Min, Jung-Hyun
doi: 10.1093/nar/gkz359 pmid: 31106376
Abstract Failure in repairing ultraviolet radiation-induced DNA damage can lead to mutations and cancer. Among UV-lesions, the pyrimidine–pyrimidone (6-4) photoproduct (6-4PP) is removed from the genome much faster than the cyclobutane pyrimidine dimer (CPD), owing to the more efficient recognition of 6-4PP by XPC-RAD23B, a key initiator of global-genome nucleotide excision repair (NER). Here, we report a crystal structure of a Rad4–Rad23 (yeast XPC-Rad23B ortholog) bound to 6-4PP-containing DNA and 4-μs molecular dynamics (MD) simulations examining the initial binding of Rad4 to 6-4PP or CPD. This first structure of Rad4/XPC bound to a physiological substrate with matched DNA sequence shows that Rad4 flips out both 6-4PP-containing nucleotide pairs, forming an ‘open’ conformation. The MD trajectories detail how Rad4/XPC initiates ‘opening’ 6-4PP: Rad4 initially engages BHD2 to bend/untwist DNA from the minor groove, leading to unstacking and extrusion of the 6-4PP:AA nucleotide pairs towards the major groove. The 5′ partner adenine first flips out and is captured by a BHD2/3 groove, while the 3′ adenine extrudes episodically, facilitating ensuing insertion of the BHD3 β-hairpin to open DNA as in the crystal structure. However, CPD resists such Rad4-induced structural distortions. Untwisting/bending from the minor groove may be a common way to interrogate DNA in NER. INTRODUCTION Ultraviolet (UV) radiation from sunlight is a ubiquitous and potent DNA-damaging mutagen (1–4). When DNA is exposed to UV, the most frequent lesions are covalent linkages between two adjacent pyrimidines, notably cyclobutane pyrimidine dimer (CPD) and 6-4 photoproduct (6-4PP). Failure in detecting and repairing such DNA damage is a major step leading to mutagenesis and cancer (5–9). The yields of CPD and 6-4PP upon UV irradiation vary depending on DNA sequence and also on UV wavelengths (1,10). 
In general, CPD is generated 3–4 times more frequently than 6-4PP with UV-C and UV-B radiation at ≤296 nm and is a major source of mutations in mammalian cells (11). Nevertheless, 6-4PP is also cytotoxic and mutagenic, although the high mutability of 6-4PP is likely suppressed by rapid repair of the lesion in cells (1,4,12,13). The biological impact of UV lesions, especially that of 6-4PP, may increase as stratospheric ozone that serves as a protective barrier against UV-C continues to decline (14). In eukaryotes, the evolutionarily conserved nucleotide excision repair (NER) pathway is key to removing the UV lesions. Impaired NER genes decrease cell survival after UV exposure in yeast and other cell lines and can cause the xeroderma pigmentosum (XP) cancer predisposition syndrome in humans, marked by extreme sun sensitivity and >1000-fold higher risk of sunlight-induced skin cancers (1,15,16). NER can operate via two subpathways depending on how the lesions are initially recognized (6,17). For the lesions on the DNA strands that are being actively transcribed by RNA polymerases, an RNA polymerase stalled at a lesion signals the presence of the lesion and initiates the transcription-coupled NER (TCR) (18–21). In TCR, the repair rates of CPD and 6-4PP are similar to each other and are generally efficient (22,23). On the other hand, the global genome NER (GGR) can repair lesions on any location in the genome and relies on dedicated lesion recognition factors such as the UV-DDB complex (containing the DDB2/XPE protein) and the XPC-RAD23B-CETN2 complex (8,24–27). In this pathway, the recognition and repair rates of CPD and 6-4PP are significantly different. CPD is poorly recognized by XPC alone and requires UV-DDB. The UV-DDB ubiquitin ligase complex can bind to CPD within chromatin and help XPC to localize to the lesion (28–32). However, even in the presence of UV-DDB, CPD repair is sluggish, taking ∼24 h to remove 50% of CPD in cells (28,32–34).
In contrast, 6-4PP is efficiently recognized by XPC and rapidly repaired within hours, even without UV-DDB (35–39). Once a lesion is recognized, both TC- and GG-NER proceed through a common pathway where the transcription factor IIH complex (TFIIH) is recruited to the lesion. TFIIH, containing XPD and XPB helicases, in turn verifies the presence of a bulky lesion and recruits other subsequent factors including XPA and RPA (26,40–43). Successful lesion recognition and proofreading lead to the excision of the lesion-containing single-stranded DNA (24–32 nucleotides) by XPG and XPF-ERCC1 endonucleases, followed by repair synthesis and nick sealing by DNA ligases (9,44). The lesion recognition involving XPC is an indispensable, rate-limiting step in GG-NER and XPC is strictly required for the recruitment of TFIIH (8,24–26,37,44–49). The importance of XPC in human health and cancers has been well documented (50–55). In vitro, the XPC-RAD23B heterodimeric complex is necessary and sufficient to bind specifically to 6-4PP and other bulky lesions repaired by NER such as adducts derived from polycyclic aromatic hydrocarbons (e.g. benzo[a]pyrene), aromatic amines (e.g. acetylaminofluorene) and cisplatin (9,35,37,56–59). Notably, the binding specificities and NER excision efficiencies for lesions vary widely: various factors including conformation in DNA, lesion topology, stereochemistry, nature of the adducted base and sequence context play a role (35,37,56–65). For many lesions, the more destabilizing and distorting a lesion is, the better it is recognized and repaired. For instance, 6-4PP-harboring DNA exhibits greater distortions and is more dynamic than DNA containing CPD and it is a better substrate for XPC and NER (66–72). Although CPD is a poor substrate in its natural, matched sequence context, its recognition and repair efficiency in vitro dramatically improves if CPD is placed within a string of multiple mismatches such as TTT/TTT (36,38,39,45). 
While a high-resolution structure of a mammalian XPC-Rad23B is still lacking, crystal structures of its yeast ortholog, Rad4–Rad23, bound to DNA model lesions (a TTT/TTT mismatch bubble and a TTT/TTT enclosing a CPD lesion) have been solved (45). Being evolutionarily conserved from yeast to humans, the lesion recognition properties of Rad4 and human XPC are highly similar (73,74). Both XPC and Rad4 bind to mismatch bubble DNA with biochemical specificities comparable to those of bona fide NER lesions (Figure 1) (36,45). A recent cryo-EM architecture of the human XPC complex, albeit at low resolution of ∼25 Å, also indicates that the overall architecture of human XPC would be consistent with the crystal structure of yeast Rad4 (74). The crystal structures of Rad4 bound to these model lesions showed that the binding caused two nucleotide pairs harboring the mismatches (and CPD) to be flipped out of the DNA duplex and a β-hairpin from the BHD3 domain was inserted into the DNA duplex to fill the gap. Notably, in this ‘open’ structure, Rad4 did not directly contact the flipped-out CPD nucleotides but interacted exclusively with the nucleotides flipped out from the undamaged, complementary strand. The third mismatched base pair remained intrahelical. The structures therefore supported an indirect mode of recognition that helps explain the wide substrate specificity of the protein (75–78).

Figure 1. Rad4–Rad23-binding affinities of 6-4PP- and CPD-containing DNA duplexes measured by competition gel-shift assays. (A) The construct names, DNA sequences and the melting temperatures (Tm) of the 24-bp DNA duplexes used as Rad4 substrates in the competition gel-shift experiments. The CCC/CCC mismatches used as a model lesion or the corresponding 3-bp positions in other DNA sequences are indicated in bold. XX denotes the position of 6-4PP or CPD thymidine-thymidine photodimers. (B) Typical gel images from the competition gel-shift assays for different DNA constructs. (C) Rad4-bound DNA fractions quantified from gels versus total protein concentrations. The symbols and error bars indicate the means and standard deviations, respectively, from triplicate EMSA experiments. Solid lines indicate the fit curves of the data points of the same color. (D) Apparent dissociation constant (Kd,app) and R² of the fits were calculated from the data points in (C). The errors in Kd,app indicate the errors of the nonlinear regression fit.

More recently, Min and Ansari have also observed that Rad4 can form the same ‘open’ structure when chemically crosslinked to normal DNA. Based on this and the kinetics of Rad4-induced DNA ‘opening’ as determined by temperature-jump perturbation spectroscopy (T-jump) (79), a ‘kinetic gating’ model has been proposed.
In this ‘kinetic gating’ mechanism, the Rad4/XPC-binding specificity (recognition) is determined by the kinetic competition between the protein either opening the DNA site or diffusing away (79). At a lesion, the time required to form an open complex must be small enough to be accomplished while the protein resides on a given DNA register. It was further proposed that distorting and destabilizing lesions decrease the opening time by lowering the free energy barrier for DNA opening while it would also progressively increase the residence time of Rad4/XPC. These studies further point to the importance of understanding the structural mechanism and the process by which the lesion search and ‘opening’ is carried out. Subsequent work using T-jump indicated that Rad4 can dynamically untwist DNA for nonspecific interrogation and that this untwisting does not require BHD3 (80). Studies by Van Houten and Min using atomic force microscopy and single-molecule fluorescence microscopy also showed that the mutant Rad4 lacking BHD3 can bend DNA on nonspecific as well as specific sites and that the mutant also localizes to the lesion sites such as fluorescein-dT, similarly as the wild type protein does (81). These results together suggested that Rad4 may use energetically coupled DNA untwisting and bending as it searches along and ‘opens’ a damaged site, for which BHD3 can be dispensable. This paper also showed that Rad4 shows anomalous diffusion around CPD, not allowing the protein to stay stationary enough to open the lesion site. Interestingly, recent computational MD studies by Mu et al. from the Broyde group showed that Rad4 engages its BHD2 domain early in the binding process for various high-specificity, polycyclic aromatic hydrocarbon adduct lesions indicating a role for BHD2 in facilitating BHD3 hairpin insertion into the DNA in a later step (82). 
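The kinetic competition at the heart of the ‘kinetic gating’ model described above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that DNA opening and departure of the protein from a given register are two independent first-order processes characterized by a mean opening time and a mean residence time; this two-rate simplification, the function name and the example times are assumptions of this sketch and are not taken from the cited studies.

```python
def opening_probability(t_open: float, t_residence: float) -> float:
    """Probability that the site is opened before the protein diffuses away.

    Assumes two competing, independent first-order processes with mean
    times t_open (opening) and t_residence (residence on one register).
    This is an illustrative simplification, not the published model.
    """
    if t_open <= 0 or t_residence <= 0:
        raise ValueError("times must be positive")
    k_open = 1.0 / t_open          # rate of forming the 'open' complex
    k_leave = 1.0 / t_residence    # rate of diffusing away from the site
    return k_open / (k_open + k_leave)


if __name__ == "__main__":
    # Hypothetical times in arbitrary units, chosen only to show the trend:
    # a shorter opening time and a longer residence time both favor opening.
    for t_open, t_res in [(10.0, 1.0), (1.0, 1.0), (1.0, 10.0)]:
        p = opening_probability(t_open, t_res)
        print(f"t_open={t_open}, t_residence={t_res}: P(open)={p:.3f}")
```

Under this simplification, a lesion that both lowers the opening time and lengthens the residence time raises the opening probability through both terms, which is the qualitative behavior the model attributes to distorting and destabilizing lesions.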
Despite these studies, a structure of Rad4/XPC bound to a bona fide substrate such as 6-4PP that is efficiently recognizable in its natural, matched DNA sequence context has been conspicuously missing. This has been a sore spot in the field that has limited the validation of the current mechanistic models. To fill the gap, we have solved the crystal structure of the Rad4–Rad23 complex bound to a 6-4PP lesion. The structure is congruent with other previously solved Rad4–DNA structures and additionally provided insight into how the protein may reach the ‘open’ conformation. Subsequently, we examined the process of initial Rad4–DNA interactions using molecular dynamics (MD) simulations. The results reveal a sequence of intricate conformational rearrangements that the Rad4–6-4PP DNA complex undergoes during the recognition. The process involved extensive engagement of BHD2 with the minor groove of the DNA, which facilitates DNA untwisting, bending, and nucleotide flipping towards the major groove, prior to the full ‘opening’. We also show how these dynamical processes differ for the poorly recognized CPD. Altogether, our study presents a 3D structural trajectory for Rad4/XPC’s photolesion binding that helps us understand the lesion recognition function of Rad4/XPC at an unprecedented level. MATERIALS AND METHODS Synthesis of 6-4 photoproduct-containing phosphoramidites A batch of thymidine dimer methyl ester was irradiated with UV (λ = 254 nm) to generate UV-induced intra-strand crosslink adducts including 6-4PP. The resulting mixture of UV lesions and unmodified thymidine dimers was separated over a reverse-phase HPLC column (C-18 XBridge, 5 μm, 19 × 150 mm, Waters). The isolated 6-4PP diol was subsequently functionalized and purified over HPLC to generate 5′-dimethoxytrityl-6-4PP-3′-phosphoramidite suitable for automated oligonucleotide synthesis. NMR spectra were consistent with those reported previously (83). See SI Methods for more details.
Oligonucleotide synthesis Oligonucleotides were synthesized by automated systems using phosphoramidite chemistry by IDT or MWG and were purified by HPLC. The sequences of the DNA are in Figures 1A and 2C. All oligonucleotides appeared as a single band on denaturing polyacrylamide gels and were also verified by mass spectrometry. Annealing to make duplex DNA was done by slow cooling a 1:1 mixture of two complementary DNA strands from 95°C to room temperature over 5–8 h in 10 mM Tris–HCl, 1 mM EDTA, pH 8.0.

Figure 2. Structures of the Rad4–Rad23 complex bound to 6-4PP-containing DNA duplex. (A) Pyrimidine-pyrimidone (6-4) photoproduct (6-4PP) is induced by ultraviolet radiation of dipyrimidine nucleotide through an oxetane intermediate. Chemical structures are shown for the reaction between two thymines in a thymidine-thymidine dinucleotide (dTpT). (B) Domain arrangements and boundaries of Rad4 used in this study. The transglutaminase domain (TGD) is colored orange, β-hairpin domain 1 (BHD1) magenta, BHD2 cyan and BHD3 red. The crystallized Rad4 construct spans residues 101–632 as before (45). The disordered regions (residues 101–128, 518–525) in the crystal are checkered. Rad23 construct is the same as in (45). (C) Sequence of the 6-4PP DNA used in the co-crystallization. 6-4PP is indicated as T–T in red and the end sequences altered for MD simulation are in green (see Supplementary Figure S4). The red square indicates the four nucleotides that were flipped out by Rad4-binding in this structure. (D) Crystal structure of Rad4–Rad23 bound to 6-4PP-containing DNA duplex (PDB code 6CFI). The TGD (orange) and BHD1 (magenta) of Rad4 bind to an 11-bp duplex segment of the DNA while BHD2 (cyan) and BHD3 (red) of Rad4 bind to a 4-bp segment in which two 6-4PP-containing nucleotide pairs are flipped out. The undamaged DNA strand is colored in silver while the damaged one is in light pink. The tip of the long β-hairpin in BHD3 (residues 599–605) is inserted into the DNA duplex and fills the gap left by the flipped-out nucleotides. The 6-4PP-linked deoxythymidine photodimer (red) is flipped out away from the protein while its deoxyadenosine partners (black) are bound by BHD2 and BHD3 of Rad4. Rad23 binds to TGD through its Rad4-binding domain (R4BD, light green). ‘N’ indicates the N-terminus of Rad4. 5′ and 3′ in red indicate the direction of the 6-4PP-containing DNA strand, and in black that of the normal complementary strand. The final model contains residues 129–517 and 526–632 of Rad4 and 256–308 of Rad23. The figure was made using PyMOL Molecular Graphics System, version 2.1.1 (Schrödinger, LLC). (E) Electron density maps near the flipped-out 6-4PP. The Polder omit map (blue mesh) was calculated using Phenix, omitting the three DNA nucleotides containing 6-4PP including the solvent mask. A Polder omit map suppresses the noise from the bulk solvent better than a regular omit map (109). The Polder map colored as blue mesh is contoured at 2.5σ and shown 10 Å around 6-4PP. The 2Fo-Fc map was calculated for the whole molecule, using Phenix, with density contoured at 2.0σ shown as a grey mesh, 2 Å around the DNA molecule. The positioning of the 6-4PP and its partner adenines indicates that the 6-4PP nucleotide pair is flipped out towards the major groove (block arrows). As the β-hairpin3 is inserted from the major groove, opposite to the proposed direction of 6-4PP flipping, β-hairpin3 insertion must happen after the 6-4PP flipping is at least initiated.

Preparation of Rad4–Rad23 complex The Rad4–Rad23 and Rad4–Rad23–DNA complexes were prepared as previously described (45). Briefly, the Hi5 insect cells co-expressing the Rad4–Rad23 complex were harvested 2 days after infection. After lysis, the proteins were purified using His-Select Nickel agarose resin (Sigma) and anion exchange chromatography (Source Q, GE Healthcare), followed by thrombin digestion and cation exchange (Source S, GE Healthcare) and gel-filtration (Superdex200, GE Healthcare) chromatography. The final sample was concentrated by ultrafiltration to ∼13 mg ml−1 in 5 mM bis–tris propane–HCl (BTP-HCl), 800 mM NaCl, 5 mM dithiothreitol (DTT), pH 6.8. Characterization of Rad4–DNA binding and DNA duplex thermal stabilities The apparent binding affinities (Kd,app) were determined by competition electrophoretic mobility shift assays carried out in EMSA buffer (5 mM BTP-HCl, 75 mM NaCl, 5 mM DTT, 5% glycerol, 0.74 mM CHAPS, 500 μg ml−1 BSA, pH 6.8) as previously described (45,80). The thermal stabilities of the DNA duplexes were measured as previously described (80). Details are also included in SI Methods.
Crystallization, structure determination and refinement All crystals were grown by the hanging-drop vapor diffusion method at 4°C, mixing 1 μl of protein solution and 1 μl of crystallization buffer. Crystals of the complex appeared after a few days at 4 °C in wells containing 50 mM BTP-HCl, 100 mM NaCl, 14% (v/v) 1-propanol, 5 mM spermidine-HCl and 5 mM dithiothreitol, pH 6.8. The crystals were then harvested with a harvest buffer using 20–30% (w/v) polyethylene glycol (PEG) 200 or PEG400 as cryoprotectants and were subsequently flash-frozen in liquid nitrogen. Diffraction data were collected at −170°C and were processed with the HKL2000 suite (84) and XDS (85). The structure of the Rad4–Rad23–DNA complex was determined by molecular replacement method using the previous structure (PDB code 2QSH, Chains A and X containing only Rad4 and Rad23) as the search model and refined through multiple rounds of refinement in Phenix (Supplementary Table S1) (45,86,87). The final model contains residues 126–514 and 525–632 of Rad4, and 256–308 of Rad23. The coordinates have been deposited with PDB code 6CFI. Figures were generated by PyMOL (88). Molecular modeling and molecular dynamics (MD) simulations We used the AMBER16 package (89) with ff14SB force field (90), explicit water and counterions for MD simulations, and Discovery Studio (Dassault Systèmes BIOVIA) for molecular modeling. The structures along each trajectory were clustered using the principal component analysis (PCA) method in the Bio3D package (91). The best representative structure for each cluster is defined as the one frame that has the shortest RMSD for the heavy atoms of the lesion-containing 6-mer and the protein backbone atoms of BHD2 to all other frames. Full details concerning the force field, molecular modeling, MD simulation protocols and analyses are given in SI Methods and Supplementary Table S2. 
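The selection of a representative frame per cluster, as described for the MD analysis above, amounts to picking a medoid. The sketch below is a minimal illustration and not the authors' Bio3D workflow: it interprets "shortest RMSD to all other frames" as the smallest summed RMSD, assumes the frames of one cluster are already superimposed and restricted to the atoms of interest, and uses hypothetical function names.

```python
import math


def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the two frames are already superimposed (no fitting is done).
    """
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and of equal size")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def representative_frame(frames):
    """Index of the frame with the smallest summed RMSD to all other frames."""
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    totals = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = rmsd(frames[i], frames[j])  # symmetric, so compute once
            totals[i] += d
            totals[j] += d
    return min(range(n), key=totals.__getitem__)


if __name__ == "__main__":
    # Toy cluster of three one-atom "frames" (made-up coordinates).
    toy = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
    print(representative_frame(toy))
```

In practice the atom selection would correspond to the heavy atoms of the lesion-containing 6-mer and the BHD2 backbone atoms named in the text, and the pairwise loop scales quadratically with the number of frames in a cluster.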
RESULTS Differential binding of 6-4PP and CPD to Rad4 characterized by competitive gel-shift assays Previous studies have established that human XPC binds to 6-4PP more specifically than to CPD in vitro (36,37) and that this correlates well with the greater relative repair efficiencies of 6-4PP compared with CPD in human cells and in CHO cells (92). The repair kinetics of CPD and 6-4PP in yeast cells show similar trends as with mammalian cells (93,94), and it has been previously shown that the purified Rad4–Rad23 complex can bind to UV-irradiated DNA (95,96) and that enzymatic removal of CPD in the DNA did not diminish the binding of the UV-damaged DNA to Rad4, indicating a preference towards 6-4PP (95). However, the differences between 6-4PP and CPD for their binding to Rad4 have never been directly quantified with synthetic substrates. To fill this gap, we have synthetically prepared CPD- and 6-4PP-containing DNA duplexes. The melting temperature measurements of duplex DNA containing each lesion indicate that 6-4PP is more destabilizing than CPD (Figure 1A and Supplementary Figure S1). We then carried out competitive electrophoretic mobility shift assays (EMSA or gel-shift assays) using these DNA duplexes as 32P-labeled substrates in the presence of a defined, matched DNA duplex (CH7_NX) as a nonspecific-binding competitor (79,80,97) (Figure 1). The apparent dissociation constant (Kd,app) for 6-4PP was ∼9-fold lower than that for CPD (34.5 ± 1.1 versus 302 ± 26) which binds with almost similar affinity to an undamaged control DNA (Figure 1). Intriguingly, 6-4PP bound ∼2-fold tighter than the 3-bp CCC/CCC mismatch sequence which has been thus far the best substrate that we have tested among different mismatched or CPD/mismatched DNA constructs (45,97). 
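As a quick arithmetic check of the fold difference quoted above, the ratio of the two Kd,app values and an approximate uncertainty can be computed as follows. This is a sketch only: it treats the reported fit errors as independent standard errors and uses first-order error propagation, which is an assumption of this example rather than an analysis reported in the paper.

```python
import math


def ratio_with_error(a, sa, b, sb):
    """Return a/b and its first-order propagated uncertainty.

    Assumes the uncertainties sa and sb are independent.
    """
    r = a / b
    sr = r * math.sqrt((sa / a) ** 2 + (sb / b) ** 2)
    return r, sr


# Kd,app values as quoted in the text (CPD: 302 ± 26; 6-4PP: 34.5 ± 1.1).
fold, fold_err = ratio_with_error(302.0, 26.0, 34.5, 1.1)

if __name__ == "__main__":
    print(f"Kd,app(CPD) / Kd,app(6-4PP) = {fold:.2f} +/- {fold_err:.2f}")
```

The ratio comes out close to 8.8, consistent with the ∼9-fold difference stated in the text; most of the propagated uncertainty stems from the larger relative error on the CPD value.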
The results thus verify that the UV-lesion binding characteristics are indeed similar between human XPC and yeast Rad4, recapitulating the previous findings for their similarities in binding to other bulky DNA adduct substrates (73). Rad4–Rad23 bound to 6-4 photoproduct forms an ‘open’ structure Using the same 6-4PP DNA used in EMSA, we were also able to solve a 3.3 Å crystal structure of Rad4 bound to the DNA (Supplementary Table S1). The overall structure of the Rad4–Rad23 complex bound to the 6-4PP-containing DNA duplex was generally analogous to those previously solved with (CPD-)mismatched DNA (Figure 2 and Supplementary Figure S2): Rad4 flipped out two 6-4PP damage-containing nucleotide pairs from the DNA, and the DNA in this ‘open’ structure was also locally bent and unwound (45,79). Such DNA binding involved all four domains (TGD, BHD1, BHD2 and BHD3) of Rad4 across its length. While the TGD and BHD1 domains encircled the fully duplexed portion of the DNA 3′ to the 6-4PP lesion, thus contacting the DNA in a damage-independent manner, the BHD2 and BHD3 domains made direct contacts with the severely distorted 6-4PP-containing DNA site. Similar to other previously solved ‘open’ structures, the long β-hairpin of BHD3 (hereafter ‘β-hairpin3’) inserted into the duplex DNA and filled the gap created by the flipped-out 6-4PP dinucleotide and its complementary adenines. Interestingly, the TGD, BHD1 and BHD2 domains made contacts mainly with the minor groove of the DNA, while BHD3 alone faced the major groove of the ‘open’ DNA conformation and β-hairpin3 was inserted into the DNA from the major groove. Notably, 6-4PP was flipped out away from Rad4, barely making contact with the protein, while the complementary adenines were bound by a narrow groove formed by the BHD2 and BHD3 interface.
While this is similar to the previously solved structures, it is the first structure that shows Rad4′s flipping out and binding to purine bases on the undamaged strand, confirming that the BHD2/3 groove can indeed accept purines that are expected as partners for pyrimidine dimer lesions. The ‘open’ structure with 6-4PP also underscores the general mechanism of indirect recognition whereby Rad4 does not rely on a direct structural complementarity between the protein and the damaged DNA but rather indirectly senses the presence of a lesion. Such an indirect mechanism explains how a single protein complex, XPC/Rad4, could function as a common sensor for a wide variety of DNA damage repaired by NER. The crystal structure indicates that Rad4 flips out 6-4PP nucleotide pairs towards the major rather than the minor groove In the previously reported Rad4–DNA structures, the two flipped-out nucleotides on the ‘damaged’ strand were disordered leaving little trace of electron density, as they were flipped out away from Rad4 and freely exposed to the solvent. The flipped-out nucleotides in these cases were either two normal nucleotides or a CPD embedded in a 3-bp mismatch (TTT/T(CPD)). However, in this Rad4-6-4PP structure, we have observed weak yet distinct electron densities for the phosphate groups connecting the 6-4PP dinucleotides to each other and to the flanking nucleotides; We also observed electron densities for the deoxyribose of the 3′-pyrimidone (C5′, C4′ and C3′ groups), but not for the ribose of the 5′-thymidine (Figure 2E). We attribute this to the following observations. First, compared with normal dinucleotides, 6-4PP or CPD as dinucleotides are expected to be much more restricted in their movements due to the covalent linkages between the bases. 
In fact, the intrinsic rigidity of the 6-4PP originating from the 6-4 linkage has been previously noted (98): the conformations of the 6-4PP lesion moiety were all very similar, whether found within a duplex DNA, as an isolated dT(6-4)dT dinucleotide, or bound to a 6-4PP-specific antibody Fab fragment (67,98–101). We speculate that such rigidity may have contributed to the electron density observed in the current structure. Also, we note that only the 3′-pyrimidone but not the 5′-thymidine backbone groups engaged in van der Waals stacking with Rad4 (Arg601), which may have aided the positional stability of the 3′-base's phosphate and deoxyribose ring in the structure. Lastly, the prior structure solved with TTT/T(CPD) had not only two mismatched bases against the CPD thymine dimer but also an additional third mismatch, T/T, positioned 3′ to the CPD lesion. The extra T/T mismatch showed suboptimal stacking and base pairing with relatively poor electron density and high B-factors. Therefore, it is possible that the added dynamics and instability due to the mismatches in the CPD-containing region prevented the CPD moiety (including its phosphate group) from showing observable electron density, whereas the 6-4PP, placed in a natural, matched DNA context in the present structure, did show such density. Based on the position of the positive electron density and the conformationally restricted dinucleotide structure of the 6-4PP (67), we have modeled the backbone and bases of the 6-4PP into the structure (Figure 2D, E). As mentioned earlier, the DNA duplex bound to Rad4 is locally bent and unwound at the 6-4PP lesion site, and the 6-4PP is bulged out of the DNA duplex. The continuity of the DNA strand containing 6-4PP and the positions of the phosphate-backbone groups necessitate that the 6-4PP moiety tightly bridges two DNA segments with severe kinks (Supplementary Figure S3).
Interestingly, we also noted that the conformations of the flipped-out nucleotides in the structure (particularly the two partner adenines as well as the 3′-pyrimidone in 6-4PP) suggested that the nucleotides are rotated out towards the major groove side rather than the minor groove (Figure 2E). Because the β-hairpin3 was inserted from the major groove in a direction opposite to this putative direction of the nucleotide flipping, this suggested that the nucleotide flipping/DNA opening must be initiated before the β-hairpin3 is inserted (Figure 2E). If true, this model would complement the previous studies using the T-jump approach on mismatched DNA as a model lesion (79,80). These studies have shown that a mutant of Rad4 lacking the BHD3 domain was still capable of reaching the rate-limiting step of DNA ‘opening’ that leads to the fully flipped-out nucleotide pairs shown in crystal structures. Another study by Van Houten and Min using atomic force microscopy and single-molecule fluorescence microscopy also showed that the Rad4 mutant lacking BHD3 can bend DNA and localize to the lesion sites (81). These results are thus congruent with the hypothesis that β-hairpin3 insertion may be a late event in the Rad4-lesion binding trajectory, which we then examined closely through MD simulations as discussed below.

MD simulations of the initial binding between Rad4 and DNA lesions

The crystal structure, while showing how the final structure may look when Rad4 has ‘recognized’ the 6-4PP, raises the question of how the protein-DNA complex structure is reached as the protein and the lesion encounter each other in solution. In fact, the process by which this structure is reached must hold the key as to whether or not the final structure can be reached in the first place. Importantly, recent MD simulation studies by Mu et al.
from the Broyde group have shown remarkable connections between the early stage interactions between Rad4 and lesion-containing DNA duplexes and the final recognition and repair efficiencies (82,102). Motivated by these studies and to obtain mechanistic insights into the UV-lesion recognition process, we performed 4-μs MD simulations for the initial binding of Rad4 to a 6-4PP-containing DNA duplex as well as to a CPD-containing duplex. The ‘docking complexes’ used as the starting models were generated based on a combination of the lesion-bound Rad4–Rad23, the free Rad4–Rad23 and the free lesion-containing DNA structures as previously described (SI Methods, Supplementary Figure S4A) (82,102,103). The thymine-derived photodimer lesions were paired with the normal partner adenines and were placed within the same DNA sequence context as in the crystal structure (Supplementary Figure S4B). The ‘docking complex’ represents the state of Rad4 and DNA prior to initial engagement of the BHD2 and BHD3 domains. The resulting MD simulations for the binding of 6-4PP and CPD are shown in Movies S1 and S2, respectively. To map out distinct conformational ensembles presented along the simulation trajectories, we carried out principal component analysis (PCA), which then defined 6 distinct structural clusters for the 6-4PP and four clusters for the CPD trajectory (Supplementary Figure S5). Each cluster is an ensemble of structures that exhibits similar structural dynamics and could represent a sub-stage upon Rad4 initial binding to the lesion-containing duplex. To gain insights into the lesion recognition process in detail, we have also computed various structural parameters around the lesion sites over the entire trajectories (Supplementary Figure S6–S11), which are discussed further below. 
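The projection step of the PCA described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the two per-frame features, the toy values and the sign-based grouping are hypothetical stand-ins chosen only to show how frames are centered, how the leading principal component is obtained (here by power iteration on the covariance matrix) and how frames are then separated along it.

```python
import math

def center(frames):
    """Subtract the per-feature mean from every frame."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[f[j] - mean[j] for j in range(d)] for f in frames]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered frames."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Leading eigenvector of cov, estimated by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Score of each frame along the component v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]

# Hypothetical toy trajectory: two structural features per frame.
frames = [[0, 0], [1, 1], [2, 2], [10, 10], [11, 11], [12, 12]]
centered = center(frames)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)

# Crude two-group split along PC1 (an assumption; not the clustering used here).
labels = [0 if s < 0 else 1 for s in scores]
print(labels)
```

For real trajectories the feature vectors would typically be aligned atomic coordinates, and a proper clustering method would replace the sign test used in this toy case.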
6-4PP, but not CPD, undergoes remarkable DNA untwisting, bending and nucleotide-flipping towards the major groove

First, we compare the conformational clusters which stably dominated the trajectories between 3 and 4 μs for 6-4PP (Figure 3A, Supplementary Figure S5A blue (g5) & Movie S3) and for CPD-binding (Figure 3A, Supplementary Figure S5B green (g3) & Movie S4). These clusters, which we refer to as the ‘initial binding state’, best illustrate the key differences between 6-4PP and CPD in their initial binding to Rad4 and give us insights into the differences in their recognition by Rad4 and repair in NER.

Figure 3. Rad4-UV-lesion DNA initial binding structures and their characteristics from MD simulations. (A) Best representative structures of the initial binding state from the MD trajectories. The structures are rendered as in Figure 2D; the end base pairs of the lesion-containing 6-mer for the calculation of untwist angles (Supplementary Figure S4B) are in blue and the side chains of F597 and F599 are in spheres. The inset depicts a zoomed-in view near the flipped out 5′dA partner opposite 4-pyrimidone (3′ T*) of 6-4PP; its binding pocket is shown as surface. R494 is shown as sticks. The black dashed lines indicate hydrogen bonds with lesion partner bases that have occupancies over 50%. (B) Structural characteristics for the best representative, initial binding state structures. The untwist and bend angles are shown in blue and pink, respectively, and the nucleotide flipping angle for the 5′ partner A (5′dA) is in gray. The corresponding values for the 6-4PP bound to Rad4 in the crystal structure are indicated as dashed lines. The BHD2-occupied alpha space (AS) volumes and relative NER excision efficiencies for the 6-4PP and CPD-containing duplexes are in cyan and orange. The NER excision efficiency with 6-4PP was assigned a relative value of 100 (110). The error bars indicate the standard deviations for the block average values of the measured angles (see SI Methods).

DNA untwisting

The 6-4PP-DNA in the crystal structure showed DNA untwisting around the lesion. Prior experimental (80) and computational studies (102,104) have also shown that Rad4 untwists/unwinds DNA as it probes DNA nonspecifically for the presence of a lesion in the initial binding state.
To examine whether different ‘untwisting’ may contribute to the varying Rad4-binding specificities for the UV lesions, we calculated the untwist angles along the trajectory of Rad4 initial binding to the lesion-containing duplexes: $\mathrm{Untwist} = \mathrm{Twist}_{\text{pre-BHD2 engagement}} - \mathrm{Twist}$, as illustrated in Supplementary Figure S6. $\mathrm{Twist}_{\text{pre-BHD2 engagement}}$ is the ensemble average twist angle of the lesion-containing 6-mer during the first 1 ns of production MD, during which there are no significant protein binding-induced changes. Positive values indicate untwisting and negative values indicate over-twisting. Our analyses show that the DNA is untwisted around the 6-4PP by 27 ± 4° during the initial binding with Rad4, which was smaller than the untwist angle of 89° shown in the crystal structure but was significantly larger than that observed with CPD: the CPD-containing DNA shows no untwisting but rather a slight over-twisting of 6 ± 2° (Figure 3B).

DNA bending

We also observed sharp differences between 6-4PP and CPD in their DNA bend angles around the lesion sites throughout the MD trajectories (see Supplementary Methods, Figure 3B and Supplementary Figure S6). The 6-4PP-containing DNA exhibited significant bending of 32 ± 5° in the initial binding state, with the sequence 5′ to the lesion bent from the minor groove side, accompanied by extensive engagement of the Rad4 BHD2 domain with the minor groove of DNA (Figure 3 & Movie S3). The bend angle is larger than those of the DNA in the ‘docking’ model (23°) or in the present crystal structure (25°). On the other hand, the CPD-containing duplex was bent by only 13 ± 2° with the sequence 5′ to the lesion bent in a direction opposite to that in the 6-4PP structure, preventing the lesion site from engaging with Rad4 in the ‘open’ conformation (Figure 3, Supplementary Figure S6 & Movie S4). The bending of 6-4PP DNA was also highly dynamic compared to that of CPD during BHD2 binding, which was limited for CPD (described below).
Furthermore, the 6-4PP bend angle correlated positively with the degree of untwisting (correlation coefficient r = 0.56) for the ensemble from 1 to 4 μs, indicating potential coupling of the two motions (Supplementary Figure S6). Nucleotide-flipping Perhaps the most dramatic difference between 6-4PP and CPD during the initial binding to Rad4 was that only 6-4PP-containing nucleotide pairs were extruded from the DNA duplex and lost most of their Watson-Crick hydrogen-bond pairing, while CPD did not (Movie S1 & S2, Figure 3 and Supplementary Figure S7). Furthermore, the 6-4PP:AA nucleotide pairs all extrude towards the major groove, as indicated by the directions of the nucleotide-flipping pseudo-dihedral angle changes (Figure 3 and Supplementary Figure S7B) (105). This directly supports our inference from the current crystal structure of the Rad4–6-4PP complex. During the extrusion processes, 6-4PP extruded first followed by the 5′ partner dA (5′dA) which flipped out towards the Rad4 protein. Finally, the 3′ partner dA (3′dA) partially extruded episodically while occasionally maintaining one hydrogen bond with the 6-4PP (Figure 3 and Supplementary Figures S7 and S8). In the major groove, the 6-4PP and its partner nucleotides manifested dynamics, and the 5′dA occasionally rotates around the glycosidic bond from the anti to the syn conformation (Supplementary Figure S9), as adopted in the crystal structures. By contrast, the CPD lesion and partner bases did not extrude and steadily maintained their Watson-Crick base pairing as well as the normal, anti-glycosidic bond (Figure 3, Supplementary Figures S7–S9). Altogether, these analyses demonstrated that the DNA containing 6-4PP, but not CPD, becomes untwisted and bent and has its damage-containing nucleotide pairs flipped out upon its initial binding to Rad4. Next, we describe the key Rad4–DNA interactions that underlie the observed conformational changes of the 6-4PP DNA. 
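The angle analyses in this section (untwist relative to a pre-engagement reference, block-averaged means with standard deviations, and the bend–untwist correlation) can be sketched as follows. This is an illustrative sketch only: the function names, the toy twist and bend values, the number of reference frames and the use of the sample standard deviation over block means are all assumptions, and the exact procedure is the one given in the SI Methods.

```python
import math

def untwist_series(twist, n_ref):
    """Untwist(t) = Twist_ref - Twist(t), where Twist_ref is the mean twist
    over the first n_ref frames (standing in for the pre-engagement window)."""
    ref = sum(twist[:n_ref]) / n_ref
    return [ref - t for t in twist]

def block_average(values, n_blocks):
    """Mean of block means and sample standard deviation of block means.
    Frames left over after equal division are ignored."""
    size = len(values) // n_blocks
    means = [sum(values[i * size:(i + 1) * size]) / size for i in range(n_blocks)]
    grand = sum(means) / n_blocks
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return grand, math.sqrt(var)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (degrees); not values from the simulations.
twist = [200, 200, 190, 180, 170, 175]
bend = [10, 12, 15, 25, 33, 30]

untwist = untwist_series(twist, n_ref=2)   # positive values mean untwisting
mean_untwist, sd_untwist = block_average(untwist[2:], n_blocks=2)
r = pearson_r(untwist, bend)
print(untwist, mean_untwist, sd_untwist, r)
```

With this sign convention, a twist that drops below the reference gives a positive untwist, matching the definition used for Figure 3B, and a positive `r` corresponds to bending that increases together with untwisting.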
6-4PP:AA nucleotide pair extrusion involves significant engagement of the Rad4 BHD2 domain and manifests compensatory exchanges of van der Waals interactions with interacting partners in DNA and Rad4

Previous MD studies indicated that lesion recognition correlated positively with how well BHD2 initially engaged with the lesion site from the minor groove side of the DNA (104). To examine this for 6-4PP and CPD, we calculated the alpha space (AS) volume occupied by BHD2 in the minor groove of DNA (Supplementary Figure S10): the AS volume reflects the curvature and surface area of the DNA minor groove bound by BHD2 (106). The AS volume for the 6-4PP-binding was 349 Å³ in the initial binding state and was consistently high from early in the binding trajectory, but it was only 110 Å³ for the CPD-binding (Figure 3 and Supplementary Figure S10). Thus, the Rad4 BHD2 interacted more extensively with the minor groove around 6-4PP than around CPD. This binding was achieved by multiple hydrogen bonding and hydrophobic interactions between the 6-4PP-containing DNA and BHD2 (particularly involving Val517) while the CPD manifested only limited interactions (Supplementary Figure S11, Supplementary Table S3, Movies S3 and S4). Notably, the analyses of the van der Waals interactions involving the lesion, its partner bases, their neighboring nucleotides as well as BHD2-BHD3 show that there is successive handover between the interacting partners where the loss of one interaction is compensated by a new interaction with another party. Furthermore, the compensatory exchanges were carried out in a manner that directly facilitates nucleotide extrusions and partner base flipping (Supplementary Figure S12, see SI Discussion).

Visualizing the full lesion recognition trajectory

Combining the crystallographic and MD structures together, we can start constructing a plausible high-resolution 3D binding trajectory for Rad4 and 6-4PP.
While the Rad4–6-4PP crystal structure serves as the final complex formed (‘recognition complex’, RC), the docking model for MD simulation represents the initial encounter between the protein and the lesion (‘encounter complex’, EC) where the DNA structures resemble the free DNA structure the most. The MD structural clusters that appear after the initial equilibration can be considered as early intermediate complexes (ICs) between EC and RC along the recognition trajectory. Figure 4 and Movie S5 depict the structural progression from EC to IC (taken from the g5 in Supplementary Figure S5A) and then to the RC. Figure 4B depicts the DNA duplexes extracted from the superposed structures. As expected, all three DNAs in these complexes show bent conformations but the directions were all distinct from one another. Notably, IC showed the most bending among the three structures owing to the engagement of BHD2 from the minor groove side, which bends the DNA towards the major groove side. The RC, on the other hand, was relatively less bent but most unwound, accompanied by the flipping out (‘opening’) of both of the 6-4PP-containing nucleotide pairs that is stabilized by the BHD3 β-hairpin insertion. The IC structure(s) clearly indicate that Rad4 uses DNA bending and untwisting as a passage to achieve the RC’s ‘open’ structure.

Figure 4. Structural models for the 6-4PP lesion recognition trajectory by Rad4. The EC, IC (g5) and RC structures were superposed over the undamaged DNA duplex regions bound by the TGD domain of Rad4 (residues 3–11 in Figure 2C) and each structure is shown as a protein–DNA complex (A) or DNA only (B) from the same orientation. (C) The superposed DNA structures are shown in two orientations. EC is in yellow, IC in cyan and RC in purple. The green dotted line indicates the helical axis of the DNA extending from the undamaged B-DNA duplex region.

Altogether, this study enables us to visualize the structural trajectory for Rad4's photolesion binding in unprecedented detail and provides key structural insights into the remarkable differences between the 6-4PP and CPD recognition and repair by NER.

DISCUSSION

Indirect recognition of 6-4PP by Rad4 is distinct from the recognition of 6-4PP by other 6-4PP-binding proteins

A limited number of 3D structures have previously been determined for other 6-4PP-binding proteins bound to 6-4PP-containing double-stranded DNA, including UV-DDB (28), 6-4 photolyase (107) and anti-6-4PP antibodies (108) using X-ray crystallography (reviewed in (98)). In these complex structures, the protein-bound DNA duplexes were all kinked around the 6-4PP by ∼40–90° with the 6-4PP flipped out towards a cavity or an active site of the protein which makes direct contacts with 6-4PP. The direct contacts between these proteins and 6-4PP would be pertinent for the functions of the proteins, which entail specific recognition of the 6-4PP (e.g. antibodies raised against 6-4PP or the 6-4PP photolyase that carries out the photo-reversal of 6-4PP). In the Rad4–6-4PP structure, we observed that 6-4PP was flipped out from the DNA duplex similarly to the other structures, but in contrast to them, Rad4 did not make direct contacts with the lesion.
Such a unique binding mode compels the protein to rely on an indirect recognition strategy and allows the protein to bind to an extraordinarily broad range of DNA damage (45,77,79). MD trajectories show sequential extrusion of both nucleotides complementary to 6-4PP towards the major groove upon the initial binding with Rad4 Previous MD simulation studies by Mu et al. from the Broyde group have examined the initial Rad4–DNA interactions for various DNA lesions derived from polycyclic aromatic chemicals (102–104). The studies revealed the value of the computational approach in uncovering the molecular mechanisms of Rad4 lesion recognition. In particular, it revealed that the NER rates of aromatic adduct lesions had strong positive correlations with the degrees of DNA untwisting and of BHD2-minor groove engagement in the initial binding states reached as early as 300 ns and stable up to 1.5 μs of MD simulations (81). In the present study, we extended this approach to the Rad4-binding with 6-4PP and CPD (all in a natural, matched DNA context) and carried out simulations up to 4 μs. This report is the first that examined the initial binding state MD trajectories of small intra-strand crosslink dimeric lesions, which do not entail a large aromatic modification to nucleobases. While we observed a common pattern of binding characteristics that contrast sharply between well-repaired and repair-resistant lesions, the current study also adds unique insights for 6-4PP recognition, derived from the particular 6-4PP structure. Specifically, here we observe the flipping of the 5′ partner dA binding into a BHD2/BHD3 groove at ∼2 μs, followed by episodic partial extrusion of the 3′ partner dA at ∼2.5 μs to approach the hairpin tip of BHD3. On the other hand, just one nucleotide had been observed to flip in the previous 1.5 μs simulations for polycyclic aromatic moieties (82,102). 
The distorted original hydrogen bonding of 6-4PP with dA partners promoted the extrusion process particularly for the 5′ partner dA. The unstacking and full flipping of the 5′ dA leads the 3′ dA to partially and episodically extrude, while occasionally maintaining one hydrogen bond. 6-4PP itself also showed dynamical partial extrusions. Remarkably, in all these trajectories, the nucleotide extrusion or ‘flipping-out’ progressed towards the major groove, as hinted from the Rad4–6-4PP crystal structure which represents the final state for a Rad4-recognized lesion. Once both partner adenines are fully flipped out, they would both be captured by a binding pocket between BHD2 and BHD3 in Rad4 prior to the insertion of the β-hairpin3 into the DNA duplex shown in the crystal structure and in our previous full pathway simulations (102,103). The MD analyses thus further solidify the model that the major groove extrusion of both nucleotide pairs may begin before the β-hairpin3 insertion (80,103). Interestingly, the Rad4-binding-induced nucleotide flipping towards the major groove was also observed with the well-repaired 14R-(+)-trans-anti-dibenzo[a,l]-pyrene-N2-dG (14R-DB[a,l]P-dG) lesion paired with normal dC (82). The general observation of partner nucleotide capture has been described for a number of well-repaired lesions: 10R-(+)-cis-anti-benzo[a]pyrene-N2-dG paired with normal dC (cis-B[a]P-dG:dC), N-(deoxyguanosin-8-yl)-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine paired with normal dC (PhIP-C8-dG:dC) and 14R-DB[a,l]P-dG:dC, and may well be a characteristic of lesions with high NER efficiencies (96). New in this study, we have also found that the untwisting angles are positively correlated with the DNA bend angles particularly after 1 μs, indicating that these motions in the initial binding states exhibit mechanistic coupling. 
This coupling may be possible as BHD2 engages from the minor groove side of the DNA while BHD3 faces the major groove, encouraging bending directed toward the open complex upon untwisting and vice versa. In contrast to the well-recognized/repaired 6-4PP, the unrecognizable CPD resisted unwinding by the Rad4′s BHD2 and did not engage much with BHD2. These features are congruent with the features shown by other repair-resistant lesions such as cis-B[a]P-dG missing its partner dC or mispaired with a dA or 14R-DB[a,l]P-dA that was paired normally with a dT (82). Furthermore, CPD showed that all of its Watson-Crick pairing for both base pairs was maintained throughout the simulation as was also the case for the repair-resistant 14R-DB[a,l]P-dA:dT (82). Some bending of ∼ 27° in the CPD DNA was observed before 1.5 μs but the DNA maintained mostly straight forms afterwards, consistent with the lack of untwisting. It is intriguing to speculate that CPD may avoid recognition by resisting the coupled untwisting/bending, for instance, by being able to bend without the untwisting that is needed for nucleotide extrusion. Implications for the ‘kinetic gating’ mechanism The current study also corroborates the ‘kinetic gating’ mechanism that we have previously proposed (79). In the ‘kinetic gating’ model, we proposed that Rad4-induced DNA duplex ‘opening’ must happen before the protein translocates away from the DNA site in order for a DNA site to be recognized. The kinetic studies using T-jump spectroscopy had revealed that the nonspecific interrogation involving untwisting motion in the DNA is in the order of 100–500 μs whereas the rate-determining full ‘opening’ step was ∼5–10 ms for mismatched DNA model lesions (79,80). Studies using single molecule microscopy revealed that the residence time of the protein during nonspecific, 1D diffusional search is 1–600 μs per DNA base pair (81). 
It is currently not feasible to simulate time scales in the 100 μs range with all-atom MD simulations and the computationally demanding but most accurate explicit solvation employed here; furthermore, it remains to be determined how the time scales in the MD trajectory exactly translate to the experimentally observed ones. We also suggest that these time scales will differ depending on the DNA lesions and sequence contexts. However, the 3D movies of molecular motions obtained through the MD studies here already show how 6-4PP would be kinetically more likely than CPD to be ‘opened’ before Rad4 diffuses away, and provide a firm foundation for future studies aimed at directly bridging the computational and experimental studies.

Implication for the mechanism of lesion recognition by UV-DDB

The present study shows that the BHD2 engagement with the DNA minor groove around the lesion site is a critical component leading to the lesion-specific binding. Interestingly, the UV-DDB complex has also been shown to engage with the DNA minor groove and to push out the nucleotides towards the major groove. The space previously occupied by the lesion is filled by a three-residue plug (Phe371, Gln372, His373) inserted from the minor groove side (28). Nucleotide pairs such as CPD:AA are partially flipped out, which leaves the partner purines opposite the photolesions available for capture by XPC, as the DDB2 protein faces the damaged strand, partially enclosing the damaged photodimers. Interestingly, when the DDB2–DNA structure is superposed on the initial binding state of CPD with Rad4, the DNA bending directions are consistent with each other, with the DDB2 and Rad4 facing opposite sides of the DNA (Supplementary Figure S13). BHD2 also overlaps with DDB2's plug residues, which indicates that the lesion hand-over from UV-DDB to XPC may involve replacing DDB2 with the BHD2 of XPC at some point during the process (33).
In sum, our study provides new insights into the lesion recognition process by XPC/Rad4 and NER. We envision that the structural and mechanistic principles gathered from this study will be applicable to a broad range of protein–nucleic acid binding and recognition processes that occur in cells in the intricate chromatin context and will illuminate studies geared towards developing strategies to modulate these interactions for clinical interventions.

DATA AVAILABILITY

The structural coordinates have been deposited with PDB code 6CFI.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Dr. Nikola P. Pavletich for initiating this project and the members of the Min and Broyde groups for their support. We are grateful to Ms. Chunyan (Alice) Cao for help with synthesis of the thymidine dimer methyl ester and HPLC purifications and to Dr. George Sukenick for help with NMR.

FUNDING

National Science Foundation (NSF) [MCB-1412692 to J.-H.M.]; National Institutes of Health [R21-ES028384 to J.-H.M. and R01-ES025987 to S.B.]; This work used the Extreme Science and Engineering Discovery Environment (XSEDE), supported by NSF [MCB-060037 to S.B.] and high performance computing resources of New York University (NYU-ITS); The Organic Synthesis Core at MSKCC is partially supported by an NCI grant [P30 CA008748]. Funding for open access charge: NSF/NIH grants.

Conflict of interest statement. None declared.

Notes

Present address: Hong Zhao, Department of Chemistry, New York University, New York, NY 10003, USA.

REFERENCES

1. Friedberg E.C., Walker G.C., Siede W., Wood R.D., Schultz R.A., Ellenberger T. DNA Repair and Mutagenesis. 2006; 2nd edn. Washington, D.C.: ASM Press.
2. Ganesan A., Hanawalt P. Photobiological origins of the field of genomic maintenance. Photochem. Photobiol. 2016; 92:52–60.
3. Mao P., Wyrick J.J., Roberts S.A., Smerdon M.J. UV-Induced DNA damage and mutagenesis in chromatin. Photochem. Photobiol. 2017; 93:216–228.
4. Cadet J., Douki T. Formation of UV-induced DNA damage contributing to skin cancer development. Photochem. Photobiol. Sci. 2018; 17:1816–1841.
5. Jackson S.P., Bartek J. The DNA-damage response in human biology and disease. Nature. 2009; 461:1071–1078.
6. Scharer O.D. Nucleotide excision repair in eukaryotes. Cold Spring Harb. Perspect. Biol. 2013; 5:a012609.
7. Sugasawa K. Molecular mechanisms of DNA damage recognition for mammalian nucleotide excision repair. DNA Repair (Amst.). 2016; 44:110–117.
8. Puumalainen M.R., Ruthemann P., Min J.H., Naegeli H. Xeroderma pigmentosum group C sensor: unprecedented recognition strategy and tight spatiotemporal regulation. Cell. Mol. Life Sci. 2016; 73:547–566.
9. Gillet L.C., Scharer O.D. Molecular mechanisms of mammalian global genome nucleotide excision repair. Chem. Rev. 2006; 106:253–276.
10. Besaratinia A., Yoon J.I., Schroeder C., Bradforth S.E., Cockburn M., Pfeifer G.P. Wavelength dependence of ultraviolet radiation-induced DNA damage as determined by laser irradiation suggests that cyclobutane pyrimidine dimers are the principal DNA lesions produced by terrestrial sunlight. FASEB J. 2011; 25:3079–3091.
11. You Y.H., Lee D.H., Yoon J.H., Nakajima S., Yasui A., Pfeifer G.P. Cyclobutane pyrimidine dimers are responsible for the vast majority of mutations induced by UVB irradiation in mammalian cells. J. Biol. Chem. 2001; 276:44688–44694.
12. Protic-Sabljic M., Tuteja N., Munson P.J., Hauser J., Kraemer K.H., Dixon K. UV light-induced cyclobutane pyrimidine dimers are mutagenic in mammalian cells. Mol. Cell. Biol. 1986; 6:3349–3356.
13. Hu J., Adar S. The cartography of UV-induced DNA damage formation and DNA repair. Photochem. Photobiol. 2017; 93:199–206.
14. Karentz D. Beyond xeroderma pigmentosum: DNA damage and repair in an ecological context. A tribute to James E. Cleaver. Photochem. Photobiol. 2015; 91:460–474.
15. Cleaver J.E., Lam E.T., Revet I. Disorders of nucleotide excision repair: the genetic and molecular basis of heterogeneity. Nat. Rev. Genet. 2009; 10:756–768.
16. Kraemer K.H., Patronas N.J., Schiffmann R., Brooks B.P., Tamura D., DiGiovanna J.J. Xeroderma pigmentosum, trichothiodystrophy and Cockayne syndrome: a complex genotype-phenotype relationship. Neuroscience. 2007; 145:1388–1396.
17. Spivak G. Nucleotide excision repair in humans. DNA Repair (Amst.). 2015; 36:13–18.
18. Brueckner F., Hennecke U., Carell T., Cramer P. CPD damage recognition by transcribing RNA polymerase II. Science. 2007; 315:859–862.
19. Xu J., Lahiri I., Wang W., Wier A., Cianfrocco M.A., Chong J., Hare A.A., Dervan P.B., DiMaio F., Leschziner A.E. et al. Structural basis for the initiation of eukaryotic transcription-coupled DNA repair. Nature. 2017; 551:653–657.
20. Sanz-Murillo M., Xu J., Belogurov G.A., Calvo O., Gil-Carton D., Moreno-Morcillo M., Wang D., Fernandez-Tornero C. Structural basis of RNA polymerase I stalling at UV light-induced DNA damage. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:8972–8977.
21. Li W., Selvam K., Ko T., Li S. Transcription bypass of DNA lesions enhances cell survival but attenuates transcription coupled DNA repair. Nucleic Acids Res. 2014; 42:13242–13253.
22. van Hoffen A., Venema J., Meschini R., van Zeeland A.A., Mullenders L.H. Transcription-coupled repair removes both cyclobutane pyrimidine dimers and 6-4 photoproducts with equal efficiency and in a sequential way from transcribed DNA in xeroderma pigmentosum group C fibroblasts. EMBO J. 1995; 14:360–367.
23. Adar S., Hu J., Lieb J.D., Sancar A. Genome-wide kinetics of DNA excision repair in relation to chromatin state and mutagenesis. Proc. Natl. Acad. Sci. U.S.A. 2016; 113:E2124–E2133.
24. Friedberg E.C. How nucleotide excision repair protects against cancer. Nat. Rev. Cancer. 2001; 1:22–33.
25. Wood R.D. DNA damage recognition during nucleotide excision repair in mammalian cells. Biochimie. 1999; 81:39–44.
26. Sugasawa K., Ng J.M., Masutani C., Iwai S., van der Spek P.J., Eker A.P., Hanaoka F., Bootsma D., Hoeijmakers J.H. Xeroderma pigmentosum group C protein complex is the initiator of global genome nucleotide excision repair. Mol. Cell. 1998; 2:223–232.
27. Nishi R., Okuda Y., Watanabe E., Mori T., Iwai S., Masutani C., Sugasawa K., Hanaoka F. Centrin 2 stimulates nucleotide excision repair by interacting with xeroderma pigmentosum group C protein. Mol. Cell. Biol. 2005; 25:5664–5674.
Google Scholar Crossref Search ADS PubMed WorldCat 28. Scrima A. , Konickova R. , Czyzewski B.K. , Kawasaki Y. , Jeffrey P.D. , Groisman R. , Nakatani Y. , Iwai S. , Pavletich N.P. , Thoma N.H. Structural basis of UV DNA-damage recognition by the DDB1-DDB2 complex . Cell . 2008 ; 135 : 1213 – 1223 . Google Scholar Crossref Search ADS PubMed WorldCat 29. Ghodke H. , Wang H. , Hsieh C.L. , Woldemeskel S. , Watkins S.C. , Rapic-Otrin V. , Van Houten B. Single-molecule analysis reveals human UV-damaged DNA-binding protein (UV-DDB) dimerizes on DNA via multiple kinetic intermediates . Proc. Natl. Acad. Sci. U.S.A. 2014 ; 111 : E1862 – E1871 . Google Scholar Crossref Search ADS PubMed WorldCat 30. Yeh J.I. , Levine A.S. , Du S. , Chinte U. , Ghodke H. , Wang H. , Shi H. , Hsieh C.L. , Conway J.F. , Van Houten B. et al. . Damaged DNA induced UV-damaged DNA-binding protein (UV-DDB) dimerization and its roles in chromatinized DNA repair . Proc. Natl. Acad. Sci. U.S.A. 2012 ; 109 : E2737 – E2746 . Google Scholar Crossref Search ADS PubMed WorldCat 31. Fujiwara Y. , Masutani C. , Mizukoshi T. , Kondo J. , Hanaoka F. , Iwai S. Characterization of DNA recognition by the human UV-damaged DNA-binding protein . J. Biol. Chem. 1999 ; 274 : 20027 – 20033 . Google Scholar Crossref Search ADS PubMed WorldCat 32. Sugasawa K. Multiple DNA damage recognition factors involved in mammalian nucleotide excision repair . Biochemistry (Mosc.) . 2011 ; 76 : 16 – 23 . Google Scholar Crossref Search ADS PubMed WorldCat 33. Sugasawa K. , Okuda Y. , Saijo M. , Nishi R. , Matsuda N. , Chu G. , Mori T. , Iwai S. , Tanaka K. , Hanaoka F. UV-induced ubiquitylation of XPC protein mediated by UV-DDB-ubiquitin ligase complex . Cell . 2005 ; 121 : 387 – 400 . Google Scholar Crossref Search ADS PubMed WorldCat 34. Fitch M.E. , Nakajima S. , Yasui A. , Ford J.M. In vivo recruitment of XPC to UV-induced cyclobutane pyrimidine dimers by the DDB2 gene product . J. Biol. Chem. 2003 ; 278 : 46906 – 46910 . 
Google Scholar Crossref Search ADS PubMed WorldCat 35. Batty D. , Rapic'-Otrin V. , Levine A.S. , Wood R.D. Stable binding of human XPC complex to irradiated DNA confers strong discrimination for damaged sites . J. Mol. Biol. 2000 ; 300 : 275 – 290 . Google Scholar Crossref Search ADS PubMed WorldCat 36. Sugasawa K. , Okamoto T. , Shimizu Y. , Masutani C. , Iwai S. , Hanaoka F. A multistep damage recognition mechanism for global genomic nucleotide excision repair . Genes Dev. 2001 ; 15 : 507 – 521 . Google Scholar Crossref Search ADS PubMed WorldCat 37. Hey T. , Lipps G. , Sugasawa K. , Iwai S. , Hanaoka F. , Krauss G. The XPC-HR23B complex displays high affinity and specificity for damaged DNA in a true-equilibrium fluorescence assay . Biochemistry . 2002 ; 41 : 6583 – 6587 . Google Scholar Crossref Search ADS PubMed WorldCat 38. Young A.R. , Chadwick C.A. , Harrison G.I. , Hawk J.L. , Nikaido O. , Potten C.S. The in situ repair kinetics of epidermal thymine dimers and 6-4 photoproducts in human skin types I and II . J. Invest. Dermatol. 1996 ; 106 : 1307 – 1313 . Google Scholar Crossref Search ADS PubMed WorldCat 39. Canturk F. , Karaman M. , Selby C.P. , Kemp M.G. , Kulaksiz-Erkmen G. , Hu J. , Li W. , Lindsey-Boltz L.A. , Sancar A. Nucleotide excision repair by dual incisions in plants . Proc. Natl. Acad. Sci. U.S.A. 2016 ; 113 : 4706 – 4710 . Google Scholar Crossref Search ADS PubMed WorldCat 40. Uchida A. , Sugasawa K. , Masutani C. , Dohmae N. , Araki M. , Yokoi M. , Ohkuma Y. , Hanaoka F. The carboxy-terminal domain of the XPC protein plays a crucial role in nucleotide excision repair through interactions with transcription factor IIH . DNA Repair (Amst.) . 2002 ; 1 : 449 – 461 . Google Scholar Crossref Search ADS PubMed WorldCat 41. Yokoi M. , Masutani C. , Maekawa T. , Sugasawa K. , Ohkuma Y. , Hanaoka F. The xeroderma pigmentosum group C protein complex XPC-HR23B plays an important role in the recruitment of transcription factor IIH to damaged DNA . J. 
Biol. Chem. 2000 ; 275 : 9870 – 9875 . Google Scholar Crossref Search ADS PubMed WorldCat 42. Kusakabe M. , Onishi Y. , Tada H. , Kurihara F. , Kusao K. , Furukawa M. , Iwai S. , Yokoi M. , Sakai W. , Sugasawa K. Mechanism and regulation of DNA damage recognition in nucleotide excision repair . Genes Environ. 2019 ; 41 : 2 . Google Scholar Crossref Search ADS PubMed WorldCat 43. Li C.L. , Golebiowski F.M. , Onishi Y. , Samara N.L. , Sugasawa K. , Yang W. Tripartite DNA Lesion Recognition and Verification by XPC, TFIIH, and XPA in Nucleotide Excision Repair . Mol. Cell . 2015 ; 59 : 1025 – 1034 . Google Scholar Crossref Search ADS PubMed WorldCat 44. Riedl T. , Hanaoka F. , Egly J.M. The comings and goings of nucleotide excision repair factors on damaged DNA . EMBO J. 2003 ; 22 : 5293 – 5303 . Google Scholar Crossref Search ADS PubMed WorldCat 45. Min J.H. , Pavletich N.P. Recognition of DNA damage by the Rad4 nucleotide excision repair protein . Nature . 2007 ; 449 : 570 – 575 . Google Scholar Crossref Search ADS PubMed WorldCat 46. Volker M. , Mone M.J. , Karmakar P. , van Hoffen A. , Schul W. , Vermeulen W. , Hoeijmakers J.H. , van Driel R. , van Zeeland A.A. , Mullenders L.H. Sequential assembly of the nucleotide excision repair factors in vivo . Mol. Cell . 2001 ; 8 : 213 – 224 . Google Scholar Crossref Search ADS PubMed WorldCat 47. Luijsterburg M.S. , von Bornstaedt G. , Gourdin A.M. , Politi A.Z. , Mone M.J. , Warmerdam D.O. , Goedhart J. , Vermeulen W. , van Driel R. , Hofer T. Stochastic and reversible assembly of a multiprotein DNA repair complex ensures accurate target site recognition and efficient repair . J. Cell Biol. 2010 ; 189 : 445 – 463 . Google Scholar Crossref Search ADS PubMed WorldCat 48. Sugasawa K. , Akagi J. , Nishi R. , Iwai S. , Hanaoka F. Two-step recognition of DNA damage for mammalian nucleotide excision repair: Directional binding of the XPC complex and DNA strand scanning . Mol. Cell . 2009 ; 36 : 642 – 653 . 
Google Scholar Crossref Search ADS PubMed WorldCat 49. Sugasawa K. , Shimizu Y. , Iwai S. , Hanaoka F. A molecular mechanism for DNA damage recognition by the xeroderma pigmentosum group C protein complex . DNA Repair (Amst.) . 2002 ; 1 : 95 – 107 . Google Scholar Crossref Search ADS PubMed WorldCat 50. Bootsma D. , Kraemer K.H. , Cleaver J. , Hoeijmakers J.H. Vogelstein B , Kinzler KW The Genetic Basis of Human Cancer . 1998 ; NY McGraw-Hill 245 – 274 . Google Preview WorldCat 51. Chavanne F. , Broughton B.C. , Pietra D. , Nardo T. , Browitt A. , Lehmann A.R. , Stefanini M. Mutations in the XPC gene in families with xeroderma pigmentosum and consequences at the cell, protein, and transcript levels . Cancer Res. 2000 ; 60 : 1974 – 1982 . Google Scholar PubMed WorldCat 52. Perera D. , Poulos R.C. , Shah A. , Beck D. , Pimanda J.E. , Wong J.W. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes . Nature . 2016 ; 532 : 259 – 263 . Google Scholar Crossref Search ADS PubMed WorldCat 53. Budden T. , Davey R.J. , Vilain R.E. , Ashton K.A. , Braye S.G. , Beveridge N.J. , Bowden N.A. Repair of UVB-induced DNA damage is reduced in melanoma due to low XPC and global genome repair . Oncotarget . 2016 ; 7 : 60940 – 60953 . Google Scholar Crossref Search ADS PubMed WorldCat 54. Murray H.C. , Maltby V.E. , Smith D.W. , Bowden N.A. Nucleotide excision repair deficiency in melanoma in response to UVA . Exp. Hematol. Oncol. 2015 ; 5 : 6 . Google Scholar Crossref Search ADS PubMed WorldCat 55. Dong T.K. , Ona K. , Scandurra A.E. , Demetriou S.K. , Oh D.H. Deficient nucleotide excision repair in squamous Cell carcinoma cells . Photochem. Photobiol. 2016 ; 92 : 760 – 766 . Google Scholar Crossref Search ADS PubMed WorldCat 56. Kusumoto R. , Masutani C. , Sugasawa K. , Iwai S. , Araki M. , Uchida A. , Mizukoshi T. , Hanaoka F. Diversity of the damage recognition step in the global genomic nucleotide excision repair in vitro . Mutat. Res. 
2001 ; 485 : 219 – 227 . Google Scholar Crossref Search ADS PubMed WorldCat 57. Reardon J.T. , Mu D. , Sancar A. Overproduction, purification, and characterization of the XPC subunit of the human DNA repair excision nuclease . J. Biol. Chem. 1996 ; 271 : 19451 – 19456 . Google Scholar Crossref Search ADS PubMed WorldCat 58. Bunick C.G. , Miller M.R. , Fuller B.E. , Fanning E. , Chazin W.J. Biochemical and structural domain analysis of xeroderma pigmentosum complementation group C protein . Biochemistry . 2006 ; 45 : 14965 – 14979 . Google Scholar Crossref Search ADS PubMed WorldCat 59. Trego K.S. , Turchi J.J. Pre-steady-state binding of damaged DNA by XPC-hHR23B reveals a kinetic mechanism for damage discrimination . Biochemistry . 2006 ; 45 : 1961 – 1969 . Google Scholar Crossref Search ADS PubMed WorldCat 60. Cai Y. , Patel D.J. , Geacintov N.E. , Broyde S. Differential nucleotide excision repair susceptibility of bulky DNA adducts in different sequence contexts: hierarchies of recognition signals . J. Mol. Biol. 2009 ; 385 : 30 – 44 . Google Scholar Crossref Search ADS PubMed WorldCat 61. Kropachev K. , Kolbanovskii M. , Cai Y. , Rodriguez F. , Kolbanovskii A. , Liu Y. , Zhang L. , Amin S. , Patel D. , Broyde S. et al. . The sequence dependence of human nucleotide excision repair efficiencies of benzo[a]pyrene-derived DNA lesions: insights into the structural factors that favor dual incisions . J. Mol. Biol. 2009 ; 386 : 1193 – 1203 . Google Scholar Crossref Search ADS PubMed WorldCat 62. Cai Y. , Kropachev K. , Xu R. , Tang Y. , Kolbanovskii M. , Kolbanovskii A. , Amin S. , Patel D.J. , Broyde S. , Geacintov N.E. Distant neighbor base sequence context effects in human nucleotide excision repair of a benzo[a]pyrene-derived DNA lesion . J. Mol. Biol. 2010 ; 399 : 397 – 409 . Google Scholar Crossref Search ADS PubMed WorldCat 63. Mu H. , Kropachev K. , Wang L. , Zhang L. , Kolbanovskiy A. , Kolbanovskiy M. , Geacintov N.E. , Broyde S. 
Nucleotide excision repair of 2-acetylaminofluorene- and 2-aminofluorene-(C8)-guanine adducts: molecular dynamics simulations elucidate how lesion structure and base sequence context impact repair efficiencies . Nucleic Acids Res. 2012 ; 40 : 9675 – 9690 . Google Scholar Crossref Search ADS PubMed WorldCat 64. Ding S. , Kropachev K. , Cai Y. , Kolbanovskiy M. , Durandina S.A. , Liu Z. , Shafirovich V. , Broyde S. , Geacintov N.E. Structural, energetic and dynamic properties of guanine(C8)-thymine(N3) cross-links in DNA provide insights on susceptibility to nucleotide excision repair . Nucleic Acids Res. 2012 ; 40 : 2506 – 2517 . Google Scholar Crossref Search ADS PubMed WorldCat 65. Cai Y. , Patel D.J. , Geacintov N.E. , Broyde S. Dynamics of a benzo[a]pyrene-derived guanine DNA lesion in TGT and CGC sequence contexts: enhanced mobility in TGT explains conformational heterogeneity, flexible bending, and greater susceptibility to nucleotide excision repair . J. Mol. Biol. 2007 ; 374 : 292 – 305 . Google Scholar Crossref Search ADS PubMed WorldCat 66. Kim J.K. , Choi B.S. The solution structure of DNA duplex-decamer containing the (6-4) photoproduct of thymidylyl(3′→5′)thymidine by NMR and relaxation matrix refinement . Eur. J. Biochem. 1995 ; 228 : 849 – 854 . Google Scholar Crossref Search ADS PubMed WorldCat 67. Kim J.K. , Patel D. , Choi B.S. Contrasting structural impacts induced by cis-syn cyclobutane dimer and (6-4) adduct in DNA duplex decamers: implication in mutagenesis and repair activity . Photochem. Photobiol. 1995 ; 62 : 44 – 50 . Google Scholar Crossref Search ADS PubMed WorldCat 68. Dehez F. , Gattuso H. , Bignon E. , Morell C. , Dumont E. , Monari A. Conformational polymorphism or structural invariance in DNA photoinduced lesions: implications for repair rates . Nucleic Acids Res. 2017 ; 45 : 3654 – 3662 . Google Scholar Crossref Search ADS PubMed WorldCat 69. Kemmink J. , Boelens R. , Koning T.M. , Kaptein R. , van der Marel G.A. , van Boom J.H. 
Conformational changes in the oligonucleotide duplex d(GCGTTGCG) x d(CGCAACGC) induced by formation of a cis-syn thymine dimer. A two-dimensional NMR study . Eur. J. Biochem. 1987 ; 162 : 37 – 43 . Google Scholar Crossref Search ADS PubMed WorldCat 70. McAteer K. , Jing Y. , Kao J. , Taylor J.S. , Kennedy M.A. Solution-state structure of a DNA dodecamer duplex containing a Cis-syn thymine cyclobutane dimer, the major UV photoproduct of DNA . J. Mol. Biol. 1998 ; 282 : 1013 – 1032 . Google Scholar Crossref Search ADS PubMed WorldCat 71. Wang C.I. , Taylor J.S. Site-specific effect of thymine dimer formation on dAn.dTn tract bending and its biological implications . Proc. Natl. Acad. Sci. U.S.A. 1991 ; 88 : 9072 – 9076 . Google Scholar Crossref Search ADS PubMed WorldCat 72. Park H. , Zhang K. , Ren Y. , Nadji S. , Sinha N. , Taylor J.S. , Kang C. Crystal structure of a DNA decamer containing a cis-syn thymine dimer . Proc. Natl. Acad. Sci. U.S.A. 2002 ; 99 : 15965 – 15970 . Google Scholar Crossref Search ADS PubMed WorldCat 73. Krasikova Y.S. , Rechkunova N.I. , Maltseva E.A. , Anarbaev R.O. , Pestryakov P.E. , Sugasawa K. , Min J.H. , Lavrik O.I. Human and yeast DNA damage recognition complexes bind with high affinity DNA structures mimicking in size transcription bubble . J. Mol. Recognit. 2013 ; 26 : 653 – 661 . Google Scholar Crossref Search ADS PubMed WorldCat 74. Zhang E.T. , He Y. , Grob P. , Fong Y.W. , Nogales E. , Tjian R. Architecture of the human XPC DNA repair and stem cell coactivator complex . Proc. Natl. Acad. Sci. U.S.A. 2015 ; 112 : 14817 – 14822 . Google Scholar Crossref Search ADS PubMed WorldCat 75. Gunz D. , Hess M.T. , Naegeli H. Recognition of DNA adducts by human nucleotide excision repair. Evidence for a thermodynamic probing mechanism . J. Biol. Chem. 1996 ; 271 : 25089 – 25098 . Google Scholar Crossref Search ADS PubMed WorldCat 76. Geacintov N.E. , Broyde S. , Buterin T. , Naegeli H. , Wu M. , Yan S. , Patel D.J. 
Thermodynamic and structural factors in the removal of bulky DNA adducts by the nucleotide excision repair machinery . Biopolymers . 2002 ; 65 : 202 – 210 . Google Scholar Crossref Search ADS PubMed WorldCat 77. Buterin T. , Meyer C. , Giese B. , Naegeli H. DNA quality control by conformational readout on the undamaged strand of the double helix . Chem. Biol. 2005 ; 12 : 913 – 922 . Google Scholar Crossref Search ADS PubMed WorldCat 78. Maillard O. , Camenisch U. , Clement F.C. , Blagoev K.B. , Naegeli H. DNA repair triggered by sensors of helical dynamics . Trends Biochem. Sci. 2007 ; 32 : 494 – 499 . Google Scholar Crossref Search ADS PubMed WorldCat 79. Chen X. , Velmurugu Y. , Zheng G. , Park B. , Shim Y. , Kim Y. , Liu L. , Van Houten B. , He C. , Ansari A. et al. . Kinetic gating mechanism of DNA damage recognition by Rad4/XPC . Nat. Commun. 2015 ; 6 : 5849 . Google Scholar Crossref Search ADS PubMed WorldCat 80. Velmurugu Y. , Chen X. , Slogoff Sevilla P. , Min J.H. , Ansari A. Twist-open mechanism of DNA damage recognition by the Rad4/XPC nucleotide excision repair complex . Proc. Natl. Acad. Sci. U.S.A. 2016 ; 113 : E2296 – E2305 . Google Scholar Crossref Search ADS PubMed WorldCat 81. Kong M. , Liu L. , Chen X. , Driscoll K.I. , Mao P. , Bohm S. , Kad N.M. , Watkins S.C. , Bernstein K.A. , Wyrick J.J. et al. . Single-nolecule imaging reveals that Rad4 employs a dynamic DNA damage recognition process . Mol. Cell . 2016 ; 64 : 376 – 387 . Google Scholar Crossref Search ADS PubMed WorldCat 82. Mu H. , Zhang Y. , Geacintov N.E. , Broyde S. Lesion sensing during initial binding by yeast XPC/Rad4: toward predicting resistance to nucleotide excision repair . Chem. Res. Toxicol. 2018 ; 31 : 1260 – 1268 . Google Scholar Crossref Search ADS PubMed WorldCat 83. Lin G. , Li L. Elucidation of spore-photoproduct formation by isotope labeling . Angew. Chem. Int. Ed. Engl. 2010 ; 49 : 9926 – 9929 . Google Scholar Crossref Search ADS PubMed WorldCat 84. Otwinowski Z. 
, Minor W. Carter C.W. Jr Methods Enzymol . 1997 ; 276 : Academic Press 307 – 326 . Google Preview WorldCat 85. Kabsch W. Integration, scaling, space-group assignment and post-refinement . Acta Crystallogr. D Biol. Crystallogr. 2010 ; 66 : 133 – 144 . Google Scholar Crossref Search ADS PubMed WorldCat 86. McCoy A.J. , Grosse-Kunstleve R.W. , Adams P.D. , Winn M.D. , Storoni L.C. , Read R.J. Phaser crystallographic software . J. Appl. Crystallogr. 2007 ; 40 : 658 – 674 . Google Scholar Crossref Search ADS PubMed WorldCat 87. Adams P.D. , Afonine P.V. , Bunkoczi G. , Chen V.B. , Davis I.W. , Echols N. , Headd J.J. , Hung L.W. , Kapral G.J. , Grosse-Kunstleve R.W. et al. . PHENIX: a comprehensive Python-based system for macromolecular structure solution . Acta Crystallogr. D Biol. Crystallogr. 2010 ; 66 : 213 – 221 . Google Scholar Crossref Search ADS PubMed WorldCat 88. The PyMOL Molecular Graphics System 2018 ; Schrödinger, LLC Version 2.1.1https://pymol.org/. 89. Case D.A. , Betz R.M. , Cerutti D.S. , Cheatham I.T.E. , Darden T.A. , Duke R.E. , Giese T.J. , Gohlke H. , Goetz A.W. , Homeyer N. et al. . 2016 ; 16 edn. San Francisco University of Californiahttp://ambermd.org/. 90. Maier J.A. , Martinez C. , Kasavajhala K. , Wickstrom L. , Hauser K.E. , Simmerling C. ff14SB: Improving the accuracy of protein side chain and backbone parameters from ff99SB . J. Chem. Theory Comput. 2015 ; 11 : 3696 – 3713 . Google Scholar Crossref Search ADS PubMed WorldCat 91. Skjaerven L. , Yao X.Q. , Scarabelli G. , Grant B.J. Integrating protein structural dynamics and evolutionary analysis with Bio3D . BMC Bioinformatics . 2014 ; 15 : 399 . Google Scholar Crossref Search ADS PubMed WorldCat 92. Nairn R.S. , Mitchell D.L. , Adair G.M. , Thompson L.H. , Siciliano M.J. , Humphrey R.M. UV mutagenesis, cytotoxicity and split-dose recovery in a human-CHO cell hybrid having intermediate (6-4) photoproduct repair . Mutat. Res. 1989 ; 217 : 193 – 201 . 
Google Scholar Crossref Search ADS PubMed WorldCat 93. McCready S. , Cox B. Repair of 6-4 photoproducts in Saccharomyces cerevisiae . Mutat. Res. 1993 ; 293 : 233 – 240 . Google Scholar Crossref Search ADS PubMed WorldCat 94. McCready S. Repair of 6-4 photoproducts and cyclobutane pyrimidine dimers in rad mutants of Saccharomyces cerevisiae . Mutat. Res. 1994 ; 315 : 261 – 273 . Google Scholar Crossref Search ADS PubMed WorldCat 95. Guzder S.N. , Sung P. , Prakash L. , Prakash S. Affinity of yeast nucleotide excision repair factor 2, consisting of the Rad4 and Rad23 proteins, for ultraviolet damaged DNA . J. Biol. Chem. 1998 ; 273 : 31541 – 31546 . Google Scholar Crossref Search ADS PubMed WorldCat 96. Jansen L.E. , Verhage R.A. , Brouwer J. Preferential binding of yeast Rad4.Rad23 complex to damaged DNA . J. Biol. Chem. 1998 ; 273 : 33111 – 33114 . Google Scholar Crossref Search ADS PubMed WorldCat 97. Chakraborty S. , Steinbach P.J. , Paul D. , Mu H. , Broyde S. , Min J.H. , Ansari A. Enhanced spontaneous DNA twisting/bending fluctuations unveiled by fluorescence lifetime distributions promote mismatch recognition by the Rad4 nucleotide excision repair complex . Nucleic Acids Res. 2018 ; 46 : 1240 – 1255 . Google Scholar Crossref Search ADS PubMed WorldCat 98. Yokoyama H. , Mizutani R. Structural biology of DNA (6-4) photoproducts formed by ultraviolet radiation and interactions with their binding proteins . Int. J. Mol. Sci. 2014 ; 15 : 20321 – 20338 . Google Scholar Crossref Search ADS PubMed WorldCat 99. Rycyna R.E. , Alderfer J.L. UV irradiation of nucleic acids: formation, purification and solution conformational analysis of the ‘6-4 lesion’ of dTpdT . Nucleic Acids Res. 1985 ; 13 : 5949 – 5963 . Google Scholar Crossref Search ADS PubMed WorldCat 100. Taylor J.S. , Garrett D.S. , Wang M.J. 
Models for the solution state structure of the (6-4) photoproduct of thymidylyl-(3′—-5′)-thymidine derived via a distance- and angle-constrained conformation search procedure . Biopolymers . 1988 ; 27 : 1571 – 1593 . Google Scholar Crossref Search ADS PubMed WorldCat 101. Yokoyama H. , Mizutani R. , Satow Y. , Komatsu Y. , Ohtsuka E. , Nikaido O. Crystal structure of the 64M-2 antibody Fab fragment in complex with a DNA dT(6-4)T photoproduct formed by ultraviolet radiation . J. Mol. Biol. 2000 ; 299 : 711 – 723 . Google Scholar Crossref Search ADS PubMed WorldCat 102. Mu H. , Geacintov N.E. , Min J.H. , Zhang Y. , Broyde S. Nucleotide excision repair lesion-recognition protein Rad4 captures a pre-flipped partner base in a benzo[a]pyrene-derived DNA lesion: how structure impacts the binding pathway . Chem. Res. Toxicol. 2017 ; 30 : 1344 – 1354 . Google Scholar Crossref Search ADS PubMed WorldCat 103. Mu H. , Geacintov N.E. , Zhang Y. , Broyde S. Recognition of damaged DNA for nucleotide excision repair: a correlated motion mechanism with a mismatched cis-syn thymine dimer lesion . Biochemistry . 2015 ; 54 : 5263 – 5267 . Google Scholar Crossref Search ADS PubMed WorldCat 104. Mu H. , Zhang Y. , Geacintov N.E. , Broyde S. Lesion sensing during initial binding by Yeast XPC/Rad4: toward predicting resistance to nucleotide excision repair . Chem. Res. Toxicol. 2018 ; 31 : 1260 – 1268 . Google Scholar Crossref Search ADS PubMed WorldCat 105. Song K. , Campbell A.J. , Bergonzo C. , de Los Santos C. , Grollman A.P. , Simmerling C. An improved reaction coordinate for nucleic acid base flipping studies . J. Chem. Theory Comput. 2009 ; 5 : 3105 – 3113 . Google Scholar Crossref Search ADS PubMed WorldCat 106. Rooklin D. , Wang C. , Katigbak J. , Arora P.S. , Zhang Y. AlphaSpace: fragment-centric topographical mapping to target protein-protein interaction interfaces . J. Chem. Inf. Model. 2015 ; 55 : 1585 – 1599 . Google Scholar Crossref Search ADS PubMed WorldCat 107. 
Maul M.J. , Barends T.R. , Glas A.F. , Cryle M.J. , Domratcheva T. , Schneider S. , Schlichting I. , Carell T. Crystal structure and mechanism of a DNA (6-4) photolyase . Angew. Chem. Int. Ed. Engl. 2008 ; 47 : 10076 – 10080 . Google Scholar Crossref Search ADS PubMed WorldCat 108. Yokoyama H. , Mizutani R. , Satow Y. Structure of a double-stranded DNA (6-4) photoproduct in complex with the 64M-5 antibody Fab . Acta Crystallogr. D Biol. Crystallogr. 2013 ; 69 : 504 – 512 . Google Scholar Crossref Search ADS PubMed WorldCat 109. Liebschner D. , Afonine P.V. , Moriarty N.W. , Poon B.K. , Sobolev O.V. , Terwilliger T.C. , Adams P.D. Polder maps: improving OMIT maps by excluding bulk solvent . Acta Crystallogr. D Struct. Biol. 2017 ; 73 : 148 – 157 . Google Scholar Crossref Search ADS PubMed WorldCat 110. Cai Y. , Kropachev K. , Kolbanovskii M. , Kolbanovskii A. , Broyde S. , Patel D. , Geacintov N.E. Geacintov NE , Broyde S The Chemical Biology of DNA Damage . 2010 ; Weinheim WILEY-VCH Verlag GmbH & Co 261 – 298 . Google Preview WorldCat Author notes The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Fatty acid conjugation enhances potency of antisense oligonucleotides in muscle
Prakash, Thazha P; Mullick, Adam E; Lee, Richard G; Yu, Jinghua; Yeh, Steve T; Low, Audrey; Chappell, Alfred E; Østergaard, Michael E; Murray, Sue; Gaus, Hans J; Swayze, Eric E; Seth, Punit P
doi: 10.1093/nar/gkz354 pmid: 31127296
Abstract Enhancing the functional uptake of antisense oligonucleotides (ASOs) in muscle will be beneficial for developing ASO therapeutics targeting genes expressed in muscle. We hypothesized that improving albumin binding would facilitate traversal of ASOs from the blood compartment to the interstitium of muscle tissue and thereby enhance ASO functional uptake. We synthesized structurally diverse saturated and unsaturated fatty acid-conjugated ASOs with a range of hydrophobicity. The binding affinity of ASO fatty acid conjugates to plasma proteins improved with fatty acid chain length, and the highest binding affinity was observed with ASO conjugates containing fatty acid chains of 16 to 22 carbons. The degree of unsaturation or conformation of the double bond appeared to have no influence on protein binding or activity of ASO fatty acid conjugates. Activity of fatty acid ASO conjugates correlated with affinity for albumin, and the tightest albumin binder exhibited the highest activity improvement in muscle. Palmitic acid conjugation increased ASO plasma Cmax and improved delivery of ASO to the interstitial space of mouse muscle. Conjugation of palmitic acid improved potency of DMPK, Cav3, CD36 and Malat-1 ASOs (3- to 7-fold) in mouse muscle. Our approach provides a foundation for developing more effective therapeutic ASOs for muscle disorders.
INTRODUCTION Nucleic acid-based therapeutics represent a distinct drug-discovery platform with the ability to target genes linked to disease that are considered undruggable by classic small molecule approaches (1–3). Several nucleic acid-based drugs are currently approved for clinical use or are in late-stage clinical development (3). Conjugation of tri-antennary N-acetyl galactosamine enhanced ASO uptake into hepatocytes through ASGR-mediated internalization, improving hepatocyte potency 10- to 60-fold (4). 
Recent studies showed that conjugation of ASOs to a ligand of the glucagon-like peptide-1 receptor (GLP1R) improved ASO uptake into pancreatic β-cells and enhanced potency >50-fold (5). This was remarkable since pancreatic β-cells are known to be refractory to ASO uptake (6), and it opens therapeutic opportunities to treat diseases affecting the pancreas, such as diabetes. To further expand the utility of antisense technology, it is important to improve ASO potency in additional tissues beyond the liver and β-cells. Therefore, delivery approaches that enhance ASO potency in extrahepatic tissues will be beneficial. ASOs show robust gene silencing in extrahepatic tissues such as muscle after systemic administration, but higher doses are required (6). Development of Ionis DMPK-2.5Rx was recently discontinued because of inadequate therapeutic benefit in short-term clinical trials in patients with myotonic dystrophy type 1 (DM1) (7). We envisage that more potent ASOs with improved muscle delivery technologies would ameliorate the poor efficacy of this drug in patients. Previous work showed that cholesterol-conjugated ASOs show increased activity and cellular association through LDL receptor-mediated endocytosis (8,9). Wolfrum et al. conjugated cholesterol and a variety of lipids to siRNA and demonstrated that long-chain fatty acids and cholesterol can facilitate siRNA uptake into cells for effective gene silencing in mice (10). This group also showed that efficient and selective uptake of siRNA conjugates depends on interaction with lipoprotein particles, lipoprotein receptors and other cell-surface receptors (10). There are additional examples where lipid conjugation improved potency of siRNA and ASOs in cells (11) and in mice (12–15). Cholesterol-conjugated siRNA also inhibited myostatin mRNA expression in skeletal muscle after systemic administration (16). Skeletal and cardiac muscle cells rely heavily on the oxidation of long-chain fatty acids for contractile work (17). 
Fatty acids are transported to muscle tissue via the blood either complexed to albumin or covalently bound in triacylglycerols forming the neutral lipid core of circulating triglyceride-rich lipoproteins such as chylomicrons or very low-density lipoproteins (17). The capillary endothelium represents one of the first barriers fatty acids have to traverse on their way from the vascular compartment to skeletal and cardiac muscle cells (17). The mechanism responsible for transmembrane movement of fatty acids is incompletely understood; however, recent studies have revealed that interaction of the albumin-fatty acid complex with the endothelial membrane may facilitate fatty acid uptake (17). Serum albumin is a transport protein for endogenous fatty acids. Albumin is the most abundant plasma protein in human blood (35–50 g/l, molecular weight 66.5 kDa) (18); it is synthesized in the liver and released into the vascular space (19). Albumin interacts with multiple cellular receptors such as glycoprotein Gp60, Gp30 and Gp18a, the Megalin/Cubilin complex, and the neonatal Fc receptor (FcRn) (20). The interactions with these receptors are responsible for albumin's recycling, transcytosis and extended half-life (∼19 days) in circulation (20). Albumin contains multiple hydrophobic pockets that bind fatty acids and steroids as well as different drugs (20). The palmitic acid-conjugated glucagon-like peptide-1 (GLP-1) agonist liraglutide (21) and myristic acid-conjugated detemir (22) also utilize the fatty acid-binding properties of albumin to improve their pharmacology. ASOs with PS linkages are known to bind serum albumin with dissociation constants in the low micromolar range (23). We hypothesized that fatty acid conjugation would improve the binding of PS ASOs to serum albumin and facilitate ASO transport across the continuous capillary endothelium in skeletal and cardiac muscle (24). 
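To put the albumin figures quoted above on a molar scale, the following minimal Python sketch converts 35–50 g/l at 66.5 kDa into a micromolar concentration (about 0.53–0.75 mM) and estimates the fraction of a ligand bound at equilibrium. The binding model (one site, albumin in large excess so that free albumin is approximately total albumin) and the example Kd values of 1 and 10 µM are illustrative assumptions chosen to represent "low micromolar"; they are not values reported in this article, and the function names are hypothetical.

```python
# Sketch only: single-site equilibrium binding with albumin in large excess.
# The Kd values used below are hypothetical stand-ins for "low micromolar".

ALBUMIN_MW_G_PER_MOL = 66_500.0  # 66.5 kDa, as stated in the text


def albumin_molar_conc_uM(grams_per_litre: float) -> float:
    """Convert an albumin mass concentration (g/l) to micromolar."""
    return grams_per_litre / ALBUMIN_MW_G_PER_MOL * 1e6


def fraction_bound(albumin_uM: float, kd_uM: float) -> float:
    """Fraction of ligand bound when free albumin ~ total albumin."""
    return albumin_uM / (albumin_uM + kd_uM)


if __name__ == "__main__":
    for g_per_l in (35.0, 50.0):          # plasma range quoted in the text
        conc = albumin_molar_conc_uM(g_per_l)
        for kd in (1.0, 10.0):            # assumed Kd values, in micromolar
            bound = fraction_bound(conc, kd)
            print(f"{g_per_l:.0f} g/l = {conc:.0f} uM albumin, "
                  f"Kd = {kd:.0f} uM: fraction bound = {bound:.3f}")
```

Because plasma albumin is present at several hundred micromolar, even a ligand with a Kd of 10 µM would be predominantly bound under this simplified model, which is why modest changes in affinity can still shift the small free fraction appreciably.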
In this report, we describe results from our detailed investigation of fatty acid conjugation as a means to enhance ASO activity in muscle. We show that conjugation of palmitic acid enhances affinity of PS ASOs for serum albumin and that these conjugates show enhanced potency in muscle tissues. We also report the detailed SAR of fatty acid-ASO conjugates to identify the optimal fatty acid for enhancing ASO potency.

MATERIALS AND METHODS

General method for the synthesis of fatty acid pentafluorophenyl esters 4, 22–28, 31, 56–67

Fatty acids 3, 15–21, 30, 44–55 (1 mmol, Schemes 1 and 2) and TEA (2.25 ml, 16 mmol) were dissolved in DCM (1 ml/mmol) and pentafluorophenyl trifluoroacetate (4 mmol) was added. The reaction mixture was stirred at room temperature for 1 h. The reaction mixture was diluted with dichloromethane (6 ml/mmol) and washed with aqueous saturated NaHCO3 (5 ml/mmol) solution and 1 N NaHSO4 (5 ml/mmol) solution. The organic layer was separated, dried (Na2SO4), filtered and concentrated under reduced pressure. The crude product was purified by silica gel column chromatography and eluted with solvents (see Supporting Information) to yield 4, 22–28, 31, 56–67 in 70–82% isolated yield. All the compounds were characterized by 1H and 13C NMR spectroscopic analysis (Supporting Information).

Scheme 1.
Synthesis of 5′-fatty acid conjugated ASO 2, 6, 8–14, 69, 72 and 75; ASO 5: 5′-TmCAGkmCkAk TdTdmCd TdAdAdTdAd GdmCd AkGkmCk 3′, ASO 7: GkmCk AkTdTdmCd TdAdAdTdAd GdmCd AkGkmCk, ASO 70: 5′- TdmCdAdAkGkGkAdTd AdTd GdGdAdAdmCdmCd AkAkAk-3′, ASO 73: 5′-AkmCkAk AdTd AdAd AdTdAdmCdmCdGd AkGkGk-3′, ASO 76: 5′-mCkmCkmCk Td TdTdAd TdTd GdmCdAdGdmCkAkmCk, k: cEt BNA, d: DNA, mC: 5-methyl cytidine, Backbone all PS, underline PO.

Scheme 2. Synthesis of 5′-unsaturated fatty acid conjugated ASO 29, 32–43.

General method for the synthesis of fatty acid ASO conjugates 2, 6, 8–14, 29, 32–43, 69, 72, 75

To a solution of 5′-hexylamino ASOs 5, 7, 70, 73, 76 (Schemes 1 and 2) in 0.1 M sodium tetraborate buffer, pH 8.5 (2 mM), a solution of fatty acid PFP ester 4, 22–28, 31, 56–67 (3–9 mole equivalents, Schemes 1 and 2) dissolved in DMSO (40 mM), acetonitrile (20 mM) and triethylamine was added and the reaction mixture was stirred at room temperature for 3–18 h. The reaction mixture was diluted with water and purified by HPLC on a strong anion exchange column (GE Healthcare Bioscience, Source 30Q, 30 μm, 2.54 × 8 cm, A = 100 mM ammonium acetate in 30% aqueous CH3CN, B = 1.5 M NaBr in A, 0–60% of B in 60 min, flow 14 ml min−1). HPLC fractions containing full length ASO conjugates (analyzed by LC–MS) were pooled, diluted threefold with water and desalted by HPLC on a reverse phase column to yield the 5′-fatty acid conjugated ASOs 2, 6, 8–14, 29, 32–43, 69, 72, 75 in an isolated yield of 50–78%. The ASOs were characterized by ion-pair-HPLC–MS analysis with an Agilent 1100 MSD system (Supporting Information).
N-(6-hydroxyhexyl)palmitamide 78 To a solution of palmitic acid 3 (10.0 g, 39.0 mmol) and triethylamine (16.3 ml, 117.0 mmol) in dichloromethane (800 ml) pentafluorophenyl trifluoroacetate (10.1 ml, 58.5 mmol) was added dropwise at room temperature. To this 6-aminohexanol 79 (5.48 g, 46.8 mmol) and dichloromethane (200 ml) were added. The reaction mixture was stirred at room temperature for 12 h. Compound 78 (11.6 g, 84%) precipitated from the solution and collected by filtration. 1H NMR (300MHz, CDCl3) δ: 5.42 (br s, 1H), 3.65 (t, J = 6.3 Hz, 2H), 3.29-3.23 (m, 2H), 2.16 (t, J = 6.3 Hz, 2H), 1.70–1.46 (m, 6H), 1.45–1.19 (m, 28H), 0.89 (t, J = 6.3 Hz, 3H); 13C NMR (75MHz, CDCl3) δ: 173.11, 62.71, 39.24, 36.95, 32.56, 31.91, 29.70, 29.68, 29.64, 29.61, 29.49, 29.35, 29.32, 26.49, 25.82, 25.27, 22.68, 14.09. LRMS (ESI) m/z calcd for C22H46NO2 [M + H]+ 356.6, found 356.3. 2-Cyanoethyl (6-palmitamidohexyl) diisopropylphosphoramidite 77 To compound 78 (7.3 g, 20.5 mmol) DMF (50 ml) and THF (40 ml) were added. The mixture was heated at 50°C to get a clear solution. To this solution, tetrazole (1.15 g, 16.40 mmol) and 1-methylimidazole (0.41 ml, 5.13 mmol) were added and the reaction mixture was cooled in an ice bath (Note: solution became cloudy). 2-Cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite (9.78 ml, 30.8 mmol) was added and the reaction mixture was removed from the ice bath and stirred at room temperature for 12 h. The reaction mixture was then diluted with ethyl acetate (50 ml), washed with saturated NaHCO3 solution (50 ml) followed by brine (50 ml), and then dried (Na2SO4), filtered and concentrated under reduce pressure. The residue obtained was purified by silica gel column chromatography and eluted with 30% ethyl acetate in hexanes containing 1% triethylamine to yield 77 (10.7 g, 93%). 31P NMR (121MHz, CDCl3) δ: 147.49; HRMS (ESI) m/z calcd for C33H66N3O3P [M + H]+ 584.4915, found 584.4943. 
Fluorescence polarization assay

Fluorescence polarization experiments were performed using ALEXA 647-labeled ASOs. Measurements were performed in 1× phosphate-buffered saline (PBS). The assay was set up in 96-well Costar plates (black flat-bottomed non-binding) purchased from Corning, NY, USA. Binding was evaluated by adding ALEXA 647-labeled ASOs to yield 2 nM concentration to each well containing 100 μl of protein from sub nM to low mM concentration. Readings were taken using the Tecan (Baldwin Park, CA, USA) InfiniteM1000 Pro instrument (λex = 635 nm, λem = 675 nm). Using polarized excitation and emission filters, the instrument measures fluorescence perpendicular to the excitation plane (the ‘P-channel’) and fluorescence that is parallel to the excitation plane (the ‘S-channel’), and then it calculates FP in millipolarization units (mP) as follows: mP = [(S − P × G)/(S + P × G)] × 1000. The ‘G-factor’ is measured by the instrument as a correction for any bias toward the P channel. Polarization values of each ALEXA 647-labeled ASO in 1× PBS at 2 nM concentration were subtracted from each measurement. Kd values were calculated with GraphPad Prism 5 software (GraphPad Software, La Jolla, CA, USA) using non-linear regression for curve fit assuming one binding site.

Animal treatment

Animal experiments were conducted in accordance with the American Association for the Accreditation of Laboratory Animal Care guidelines and were approved by the Animal Welfare Committee (Cold Spring Harbor Laboratory's Institutional Animal Care and Use Committee guidelines). The animals were housed in micro-isolator cages on a constant 12 h light–dark cycle with controlled temperature and humidity and were given access to food and water ad libitum. Blood was collected by cardiac puncture exsanguination with K2-EDTA (Becton Dickinson, Franklin Lakes, NJ, USA) and plasma was separated by centrifugation at 10 000 rcf for 4 min at 4°C.
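The millipolarization formula given in the fluorescence polarization assay description can be illustrated with a short sketch. The function names and the example intensities below are hypothetical and are not taken from the instrument software; the second helper mirrors the subtraction of the free-ASO polarization value described in the text.

```python
def millipolarization(s: float, p: float, g: float = 1.0) -> float:
    """Return FP in mP from the S-channel and P-channel intensities.

    s: fluorescence parallel to the excitation plane (S-channel)
    p: fluorescence perpendicular to the excitation plane (P-channel)
    g: G-factor correcting for bias toward the P channel
    """
    denominator = s + p * g
    if denominator == 0:
        raise ValueError("S + P*G must be non-zero")
    return (s - p * g) * 1000.0 / denominator


def baseline_corrected(mp_with_protein: float, mp_free_aso: float) -> float:
    """Subtract the polarization of the labeled ASO alone (2 nM in PBS)."""
    return mp_with_protein - mp_free_aso


# Hypothetical raw intensities, for illustration only.
bound = millipolarization(s=1200.0, p=800.0, g=1.0)
free = millipolarization(s=1050.0, p=950.0, g=1.0)
print(bound, free, baseline_corrected(bound, free))
```

The baseline-corrected values at each protein concentration are what would then be passed to a one-site binding fit to estimate Kd.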
Plasma transaminases were measured using a Beckman Coulter AU480 analyzer. Tissues were collected, weighed, flash frozen on liquid nitrogen and stored at −60°C. Reduction of target mRNA expression was determined by real time RT-PCR using StepOne RT–PCR machines (Applied Biosystems). Briefly, RNA was extracted from ∼50 to 100 mg tissue from each mouse using PureLink Pro 96 Total RNA Purification Kit (Life Technologies, Carlsbad, CA, USA) and mRNA was measured by qRT-PCR using Express One-Step SuperMix qRT-PCR Kit (Life Technologies, Carlsbad, CA, USA). Primers and probes for the PCR reactions were obtained from Integrated DNA Technologies (IDT). The assay is based on a target-specific probe labeled with fluorescent reporter and quencher dyes at opposite ends. The probe is hydrolyzed through the 5′-exonuclease activity of Taq DNA polymerase, leading to an increasing fluorescence emission of the reporter dye that can be detected during the reaction. Target RNA levels were normalized to cyclophilin mRNA expression or total mRNA as measured by Ribogreen.

Malat-1 mouse protocol

Malat-1 ASOs 1–2, 6, 8–14, 29 and 32–43 were administered subcutaneously to 8-week-old male C57BL/6 mice (Jackson Laboratories) at various doses. Five days after the last dose, mice were sacrificed and RNA was extracted from heart, quadriceps, liver and kidney using the Invitrogen PureLink Pro 96 Total RNA purification kit for Malat-1 RNA quantification. PCR was done using the Invitrogen Express One-Step qRT-PCR kit with the MALAT1 primer probe set: 5′-TGGGTTAGAGAAGGCGTGTACTG-3′ for the forward primer, 5′-TCAGCGGCAACTGGGAAA-3′ for the reverse primer, and 5′-CGTTGGCACGACACCTTCAGGGACT-3′ for the probe. Target RNA levels were normalized to cyclophilin mRNA expression. The sequences for the primers and probe used for mouse Cyclophilin A are 5′-TCGCCGCTTGCTGCA-3′ for the forward primer, 5′-ATCGGCCGTGATGTCGA-3′ for the reverse primer, and 5′-CCATGGTCAACCCCACCGTGTTCX-3′ for the probe with 5′ fluorescein and 3′ TAMRA.
CD36 mouse protocol

CD36 ASOs 68–69 were administered intravenously to 6- to 8-week-old female C57BL/6 mice (Jackson Laboratories) at 1, 3 and 9 μmol/kg once a week for 3 weeks. Mouse liver, kidney, heart and quadriceps were harvested 5 days after the last dose for CD36 mRNA quantitation using the Invitrogen Express One-Step qRT-PCR kit. The sequences for the primers and probe used for mouse CD36 are 5′-TCCAGCCAATGCCTTTGC-3′ for the forward primer, 5′-GAGATTACTTTTTCAGTGCAGAA-3′ for the reverse primer, and 5′-TCACCCCTCCAGAATCCAGACAACCAT-3′ for the probe with 5′ fluorescein and 3′ TAMRA. Target RNA levels were normalized to total RNA (Ribogreen© assessment).

DMPK mouse protocol

ASOs were administered subcutaneously to 6-week-old male Balb/c mice (ASO 71 at 1.84, 3.68 and 7.36 μmol/kg; ASO 72 at 0.85, 1.70 and 3.40 μmol/kg) once a week for 3.5 weeks. Heart and quadriceps were harvested 2 days after the last dose. DMPK mRNA was quantitated using the Invitrogen Express One-Step qRT-PCR kit with DMPK forward primer 5′-GACATATGCCAAGATTGTGCACTAC-3′, reverse primer 5′-CACGAATGAGGTCCTGAGCTT-3′ and 5′ fluorescein and 3′ TAMRA probe 5′-AACACTTGTCGCTGCCGCTGGC-3′. Target RNA levels were normalized to total RNA (Ribogreen© assessment).

Caveolin-3 (Cav3) mouse protocol

ASOs 74–75 were administered subcutaneously to 6- to 8-week-old male C57BL/6 mice at 1.68, 5 and 15 μmol/kg once a week for two weeks. Heart and quadriceps were harvested 4 days after the last dose. Cav3 mRNA was quantitated using the Invitrogen Express One-Step qRT-PCR kit with Cav3 forward primer 5′-CATCAAGGACATTCACTGCAAG-3′, reverse primer 5′-CTCCGCAATCACGTCTTCA-3′ and probe 5′-AACCGCGACCCCAAGAACATCA-3′ with 5′ fluorescein and 3′ TAMRA. Target RNA levels were normalized to total RNA (Ribogreen© assessment).

ED50 determination

ED50 values were determined with GraphPad Prism 5 software. The log dose of ASO was plotted against mRNA level relative to untreated controls.
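The dose-response analysis in this subsection can be illustrated with a small sketch. Assuming the constrained form of a four-parameter model (bottom = 0, top = 1), the relative mRNA level is y = 1/(1 + (dose/ED50)^h), which becomes a straight line after a logit transform: ln(1/y − 1) = h·ln(dose) − h·ln(ED50). The code below fits that linearized form by ordinary least squares. This is a simplification and not the non-linear regression performed by GraphPad Prism; the function name and the example responses are hypothetical.

```python
import math


def fit_ed50(doses, responses):
    """Estimate (ED50, Hill slope) for y = 1 / (1 + (dose / ED50) ** h).

    doses: positive dose values (e.g. in umol/kg)
    responses: mRNA level relative to untreated control, strictly in (0, 1)
    """
    if len(doses) != len(responses) or len(doses) < 2:
        raise ValueError("need at least two matched dose/response pairs")
    if any(d <= 0 for d in doses) or any(not 0.0 < y < 1.0 for y in responses):
        raise ValueError("doses must be > 0 and responses within (0, 1)")

    xs = [math.log(d) for d in doses]                  # ln(dose)
    zs = [math.log(1.0 / y - 1.0) for y in responses]  # logit-type transform

    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("doses must not all be identical")
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))

    hill = sxz / sxx                    # slope of the line is h
    intercept = mean_z - hill * mean_x  # intercept is -h * ln(ED50)
    if hill == 0:
        raise ValueError("flat response: ED50 is undefined")
    return math.exp(-intercept / hill), hill


# Hypothetical relative mRNA levels at three weekly doses (umol/kg).
example_doses = [0.4, 1.2, 3.6]
example_responses = [0.80, 0.45, 0.15]
ed50, hill = fit_ed50(example_doses, example_responses)
print(f"ED50 ~ {ed50:.2f} umol/kg, Hill slope ~ {hill:.2f}")
```

Because the transform is undefined at y = 0 or y = 1, responses at or beyond the constrained top and bottom would have to be excluded or handled by a true non-linear fit.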
The curves obtained were fitted using a four-parameter fit with variable slope and constraining bottom = 0 and top = 1. Early distribution mouse protocol Twelve-week-old male C57BL/6 mice were administered 7.5 μmol/ kg ASO subcutaneously and sacrificed (n = 2/ ASO) at 0.5, 1, 2, 4, 8 and 24 h. Plasma was collected by cardiac puncture on K2-EDTA. Systemic tissues including heart were collected at necropsy and immediately frozen on dry ice. Concentration of ASOs in heart were determined using the protocol describe below. Immunohistochemistry analysis was done following the protocol reported in the literature (6). Extraction and quantitation of ASO in tissues and plasma by LCMS ASO was extracted from plasma and tissues using previously described methods using phenol/chloroform followed by solid phase extraction using phenyl-functionalized support (25). Tissues were minced and 50–200-mg or 50–100 μl plasma samples were homogenized in 500 μl homogenization buffer (0.5% NP40 substitute (Sigma-Aldrich St. Louis, MO, USA) in Tris-buffered saline, pH 8) with Lysing Matrix D beads (MPBio Santa Ana, CA, USA) on a ball mill homogenizer (Retsch Haan, Germany) at 30 Hz for 45 s. Standard curves of each ASO were established using 500 μl aliquots of control tissue homogenate (50–200-mg tissue or 50–100 μl plasma/ml homogenization buffer). A 27-mer, fully PS, MOE/DNA oligonucleotide was added as an internal standard (IS) to all standard curves and study samples. Samples and curves were extracted with phenol/chloroform followed by solid-phase extraction (SPE) of the resulting aqueous extract using phenyl-functionalized silica sorbent (Biotage, Upsalla, Sweden). Eluate from SPE was dried down using a warm forced-air (nitrogen) evaporator and reconstituted in 100–200 μl 100 μM EDTA. Extracts were then analyzed and ASO concentration determined by LC–MS using a method similar to that described by Gaus et al. (26). 
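The standard curves with internal standard (IS) described above can be sketched as a simple linear calibration. The example assumes that the ASO/IS peak-area ratio is linear in ASO concentration over the calibrated range; the function names, standard concentrations and peak areas are hypothetical and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two matched calibration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration concentrations must differ")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def quantify(area_aso, area_is, slope, intercept):
    """Convert a sample's ASO/IS peak-area ratio to a concentration."""
    ratio = area_aso / area_is
    return (ratio - intercept) / slope


# Hypothetical calibration standards: concentration (ug/g) vs. ASO/IS ratio.
standard_conc = [1.0, 5.0, 10.0, 50.0, 100.0]
standard_ratio = [0.021, 0.101, 0.201, 1.001, 2.001]
slope, intercept = fit_line(standard_conc, standard_ratio)

# Hypothetical study sample: extracted-ion peak areas for ASO and IS.
print(quantify(area_aso=401.0, area_is=1000.0, slope=slope, intercept=intercept))
```

Normalizing to the IS peak area compensates for sample-to-sample differences in extraction recovery and injection, which is why the ratio rather than the raw ASO peak area is calibrated.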
Briefly, separation was accomplished using an 1100 HPLC–MS system (Agilent Technologies, Wilmington, DE, USA) consisting of a quaternary pump, UV detector, a column oven, an autosampler and a single quadrupole mass spectrometer. Samples were injected on an X-bridge OST C18 column (2.1 × 50 mm, 2.5-μm particles; Waters, Milford, MA, USA) equipped with a SecurityGuard C18 guard column (Phenomenex, Torrance, CA, USA). The columns were maintained at 55°C. Tributylammonium acetate buffer (5 mM) and acetonitrile were used as the mobile phase at a flow rate of 0.3 ml/min. Acetonitrile was increased as a gradient from 20 to 70% over 11 min. Mass measurements were made online using a single quadrupole mass spectrometer scanning 1000–2100 m/z in the negative ionization mode. Molecular masses were determined using the ChemStation analysis package (Agilent, Santa Clara, CA, USA). Manual evaluation was performed by comparing a table of calculated m/z values corresponding to potential metabolites with the peaks present in a given spectrum. Peak areas from extracted ion chromatograms were determined for ASOs and IS and a trendline established using the calibration standards, plotting concentration of ASO against the ratio of the peak areas ASO/IS. Concentration of ASOs in study samples were determined using established trendlines and reported as μg ASO/g tissue. RESULTS Palmitoyl conjugation improves plasma protein binding of cEt BNA ASO Phosphorothioate (PS) ASOs are known to bind plasma, cell surface and intra-cellular proteins which facilitate distribution, cellular uptake and intracellular trafficking to peripheral tissues (27). Plasma protein bound ASOs circulate transiently in the blood compartment and partition onto cell surface proteins and enter cells by endocytosis (28). Palmitic acid is known to bind to albumin and conjugation of palmitic acid to PS ASO is expected to improve albumin binding. 
To characterize the effect of palmitic acid conjugation on ASO binding to plasma proteins we synthesized 5′-palmitic acid conjugated 3–10–3 cEt BNA Gapmer ASO 1 (Figure 1) targeting the metastasis associated lung adenocarcinoma transcript 1 (Malat-1) (29). Malat-1 is an evolutionarily conserved noncoding RNA gene highly expressed in many tissues (6). Synthesis of 5′-palmitic acid conjugated ASO 2 (Figure 1) is described in Scheme 1. The palmitoyl ASO 2 was fully characterized by ion-pair LC–MS analysis (Supporting Information).

Figure 1. Characterizing the effect of palmitoyl conjugation on binding of Gapmer ASOs to selected plasma proteins. (A) Binding curves showing differences between binding of unconjugated Malat-1 ASO 1 and palmitic acid conjugated ASO 2. (B) Binding constants for ASO 1 and 2 to selected plasma proteins; NB: no binding; HSA: human serum albumin; HDL: high-density lipoprotein; LDL: low-density lipoprotein; HRG: histidine rich glycoprotein; ASO sequence blue: cEt BNA, black: DNA, C: 5-methylcytidine, backbone all PS; underline PO.

To characterize the interaction of palmitoyl ASO 2 with albumin, we determined the binding constants using a fluorescence polarization assay (30). In addition, we also analyzed binding affinity of palmitic acid conjugates to HDL (high-density lipoproteins), LDL (low-density lipoproteins) and HRG (histidine rich glycoprotein).
Palmitoyl conjugation improved the binding affinity of PS ASOs to albumin >150-fold (Figure 1). Interestingly, palmitic acid conjugation also enhanced affinity of the ASO to HDL and LDL while no change in affinity was observed for HRG (Figure 1).

Palmitoyl conjugation improves potency of cEt BNA Malat-1 ASO 3–6-fold in muscle

We next examined the effect of palmitoyl conjugation on potency of a 3–10–3 cEt BNA ASO 2 (Figure 2) targeting Malat-1 RNA to understand the broader effect of palmitoyl conjugation. The potency of ASO 1 and 2 to inhibit Malat-1 RNA was evaluated in mice. Mice (C57BL/6, n = 4/group) were injected subcutaneously with 0.4, 1.2, and 3.6 μmol/kg of ASOs 1 or 2 once a week for 3 weeks. Five days following the last injection, mice were sacrificed and heart, quadriceps, liver and kidney were homogenized and analyzed for Malat-1 RNA expression. The 5′-palmitoyl Malat-1 ASO 2 showed improved potency relative to unconjugated ASO 1 in quadriceps (3-fold, Figure 2) and heart (>6-fold, Figure 2). However, no potency improvement was observed in liver and kidney. Malat-1 ASOs 1 and 2 were well tolerated with no elevations in plasma transaminases or organ weights.

Figure 2. Palmitic acid conjugation enhances potency of Malat-1 ASO in mouse heart and quadriceps.
Mice (C57BL/6, n = 4/group) were injected subcutaneously with Malat-1 ASO 1 and 5′-palmitoyl conjugated Malat-1 ASO 2 at 0.4, 1.2 and 3.6 μmol/kg once a week for 3 weeks for a total of three doses and sacrificed after 5 days. (A) Malat-1 RNA expression analyzed in mice heart, quadriceps, liver and kidney using qRT-PCR. All data are expressed as mean ± standard deviation. (B) ED50 (μmol/kg/week) for reducing Malat-1 RNA in mouse heart, quadriceps and liver.

The tissue concentrations of Malat-1 ASOs (1,2) in heart, quadriceps and liver of treated mice were quantitated (Table 1) (31,32). Palmitoyl conjugation improved accumulation of Malat-1 ASO in heart (2–4-fold, Table 1) and quadriceps (2-fold, Table 1) relative to unconjugated ASO 1. These data suggest that the improved potency observed for palmitoyl ASO 2 is due to higher uptake of ASO in these tissues. To determine the metabolic fate of the 5′-palmitoyl ASO, heart, quadriceps and liver tissues from the mice treated with ASO 2 were homogenized and the ASO and metabolites were extracted and identified by LCMS. Interestingly, very little intact ASO 2 was extracted from tissues. The major metabolite isolated from the tissues corresponds to unconjugated ASO (data not shown), suggesting that the 5′-palmitoyl was metabolized, liberating free ASO in the heart, quadriceps and liver.

Table 1.
Concentration of Malat-1 ASO 1 and 5′-palmitoyl Malat-1 ASO 2 in the heart, quadriceps and liver. Values are tissue concentrations (μg/g) at the indicated dose.

| ASO No | Heart, 0.4 μmol kg−1 | Heart, 1.2 μmol kg−1 | Heart, 3.6 μmol kg−1 | Quadriceps, 0.4 μmol kg−1 | Quadriceps, 1.2 μmol kg−1 | Quadriceps, 3.6 μmol kg−1 | Liver, 0.4 μmol kg−1 | Liver, 1.2 μmol kg−1 | Liver, 3.6 μmol kg−1 |
| 1 | 0.12 ± 0.1 | 0.53 ± 0.1 | 2.7 ± 1.2 | - | 0.53 ± 0.53 | 1.67 ± 0.33 | 2.76 ± 0.07 | 8.28 ± 1.03 | 23.10 ± 2.07 |
| 2 | 0.40 ± 0.1 | 2.13 ± 0.9 | 6.53 ± 2.9 | - | 0.80 ± 0.4 | 2.80 ± 1.13 | 4.82 ± 1.72 | 19.31 ± 3.45 | 43.45 ± 8.62 |

We next studied the early
distribution pharmacokinetics of ASO 1 and palmitic acid conjugate ASO 2 in mice. Mice (C57BL/6) were administered 7.5 µmol/kg ASOs 1 or 2 subcutaneously and sacrificed at 0.5, 1, 2, 4, 8, and 24 h. Plasma and heart tissues were collected and ASO concentration was analyzed by LCMS (25,26). Conjugation with palmitic acid increased plasma Cmax of ASO nearly 3-fold, from 74 µg/ml for unconjugated ASO to 215 µg/ml for palmitate-conjugated ASO, and increased plasma AUC 4-fold, from 222 ± 30 µg•h/ml to 915 ± 153 µg•h/ml (Figure 3A). Similarly, Cmax in heart increased nearly two-fold, from 67 to 119 µg/g, and AUC more than doubled, from 682 ± 41 µg•h/g to 1676 ± 118 µg•h/g (Figure 3B), for unconjugated and palmitate-conjugated ASO respectively. Histological examination indicates that much of the ASO accumulated in heart (Figure 3C) at early time points appears diffuse or concentrated along intercellular boundaries, suggesting it primarily resides in the interstitial space; however, by 24 h, the ASO appears in punctate foci, likely occupying intracellular vesicles. This increase in ASO exposure in the heart correlates with a 6-fold increase in potency of palmitoyl ASO 2 compared to unconjugated ASO 1 in the heart.

Figure 3.
Pharmacokinetics of Malat-1 ASO 1 and 5′-palmitoyl conjugated Malat-1 ASO 2; mice plasma total ASO concentration (A) and mice heart tissue total ASO concentration (B) after 0.5, 1, 2, 4, 8 and 24 h of subcutaneous administration of ASOs 1–2 (7.5 μmol kg−1); (C) ASO distribution in mice heart tissue after 1, 2, 4, 8 and 24 h of subcutaneous administration of ASOs 1–2 (7.5 μmol kg−1) analyzed for ASOs by immunohistochemistry (IHC) using a PS-oligonucleotide antibody. Shown is a representative example of one of the four animals per group.

Linker SAR to identify optimal linker strategy for palmitoyl conjugate

In our initial linker strategy, we attached the palmitoyl moiety using a phosphodiester d(TCA) linker which is metabolized in tissues to release the ASO. To further probe the importance of the PO d(TCA) linker moiety on potency, we evaluated ASO conjugate 6 where the fatty acid was directly attached to the ASO using a phosphodiester-linked hexylamino linker (Figure 4). 5′-Hexylamino Malat-1 ASO 7 (Scheme 1) with a PO linker was synthesized according to the reported procedure (33). The 5′-hexylamino ASO 7 was reacted with pentafluorophenyl palmitate 4 to yield ASO 6 and was fully characterized by ion-pair LC–MS analysis (Supporting Information).

Figure 4.
Potency of palmitic acid conjugated Malat-1 ASO with and without PO d(TCA) linker is similar in mouse heart and liver. Mice (C57BL/6, n = 4/group) were injected subcutaneously with ASO 1 at 0.4, 1.2, 3.6 and 10.8 μmol/kg, and with ASO 2 (with PO d(TCA)) or ASO 6 (without PO d(TCA)) at 0.4, 1.2 and 3.6 μmol/kg, for 3 weeks for a total of 3 doses, then sacrificed after 5 days. Malat-1 RNA expression analyzed in mice heart and liver by qRT-PCR. All data are expressed as mean ± standard deviation. ED50 (μmol/kg/wk) for reducing Malat-1 RNA in mouse heart and ASO sequence: blue = cEt BNA, black = DNA, C = 5-methylcytidine, backbone all PS; o = PO, X = palmitoyl.

Mice (C57BL/6, n = 4/group) were injected subcutaneously with 0.4, 1.2 and 3.6 μmol/kg of ASOs 1, 2 and 6 for three weeks. Mice were sacrificed after 5 days and heart and liver Malat-1 RNA expression was analyzed. ASO 2 with a PO d(TCA) linker (Figure 4) and ASO 6 with a PO linker (Figure 4) showed similar ASO activity in heart and liver. These data suggest that the PO d(TCA) linker between the fatty acid and the ASO is not required and that a PO linkage alone is sufficient. Based on these data, we selected the PO linked design chemistry for further characterization of fatty acid conjugated ASOs.

Fatty acid structure activity relation (SAR) to identify optimal fatty acid strategy for enhancing potency of ASO in muscle and heart

Binding of fatty acids with varying chain length to albumin showed that the primary association constant increased with chain length (34). Moreover, the number of high-affinity sites also increased with chain length: octanoate (C8) and decanoate (C10) bind to one site while longer chain fatty acids such as laurate (C12) and myristate (C14) bind to two sites (34). To investigate the effect of fatty acid hydrophobicity on the functional uptake of ASO we studied in vivo activity and plasma protein binding of eight fatty acid conjugates (Figure 5) with varying chain length.

Figure 5.
(A) Structures of ASO fatty acid conjugates 6, 8–14; (B) protein binding affinity of fatty acid ASO conjugates 6, 8–14; (C) Malat-1 RNA expression in mice heart and quadriceps after subcutaneous administration of ASO 6, 8–14 at 3.6 μmol/kg once a week for three weeks for a total of three doses; ASO sequence: X = lipid, blue: cEt BNA, black: DNA, C: 5-methylcytidine, Backbone all PS, o: PO.

We synthesized fatty acid conjugated ASOs 8–14 targeting Malat-1 RNA employing a solution phase conjugation method similar to that used for palmitoyl ASO 6 (Scheme 1). All the ASOs contain a PO linker design similar to that in ASO 6, without any DNA linker. Fatty acids octanoic acid 15, decanoic acid 16, dodecanoic acid 17, myristic acid 18, stearic acid 19, eicosanoic acid 20 and docosanoic acid 21 (Scheme 1) were treated with pentafluorophenyl trifluoroacetate in appropriate solvents in the presence of triethylamine at room temperature to afford the corresponding pentafluorophenyl esters 22–28 (Scheme 1, Supporting Information). A solution of pentafluorophenyl esters 22–28 in appropriate solvent was added to a solution of 5′-hexylamino Malat-1 ASO 7 (Scheme 1) in sodium tetraborate buffer (pH 8.5) and the resulting solution was stirred at room temperature for 5–18 h followed by HPLC purification to provide the fatty acid conjugates ASO 8–14 (Figure 5) with varying fatty acid chain length C8–C22. Fatty acid ASO conjugates 8–14 were fully characterized by ion-pair LC–MS analysis (Supporting Information).
We examined the plasma protein binding of ASOs 8–14 using a reported fluorescence polarization (FP) competition binding assay (35) which measures the change in FP upon displacement of a 5′-palmitoyl ASO fluorophore tracer from human albumin, LDL or HDL. Interestingly, all fatty acids, regardless of chain length, improved plasma protein binding of ASO to albumin, LDL and HDL (Figure 5) relative to unconjugated ASO 1. Albumin binding affinities of ASO fatty acid conjugates (8-11) with chain length C8–C14 was 2–5-fold less than ASO conjugated with chain length C16-C22 (6, 12-14). A similar trend was observed for the binding affinities of fatty acid conjugates 6 and 8–14 to LDL and HDL. ASO conjugates with fatty acids containing longer chain lengths of C12 to C22 (6, 10-14) exhibited improved binding to HDL and LDL proteins relative to ASO 8–9 containing fatty acid chain length C8 to C10. We next tested the activity of ASOs 8–14 (Figure 5) for inhibiting Malat-1 RNA expression in mouse quadriceps and heart. For comparison we also evaluated the activity of unconjugated Malat-1 ASO 1 and 5′-palmitoyl conjugated ASO 6. Mice (C57BL/6, n = 4/group) were injected subcutaneously with 3.6 μmol/kg of ASOs 1, 6, 8–14 for three weeks. Mice were sacrificed 5 days post-injection and Malat-1 RNA expression was analyzed from quadriceps and heart (Figure 5). ASOs conjugated to fatty acids with chain length C8 (8, Figure 5) and C10 (9, Figure 5) and unconjugated ASO 1 showed similar potency in quadriceps and heart. Interestingly, in the quadriceps and heart ASOs 10–14 conjugated to fatty acids with chain length from C12 to C22 were more active than ASO 1 (Figure 5). ASO conjugated to fatty acids with chain length C16 to C20 (6, 12–13, Figure 5) showed the greatest quadriceps and heart activity improvements. The ASO conjugates 6, 12–14 with longer fatty acid chain conjugates C16–C22 showed the highest plasma protein binding affinities. 
In general, albumin binding affinity appeared to correlate with the activity of fatty acid conjugates in skeletal and cardiac muscle. The ASO conjugates 6 and 12–14, with longer fatty acid chains (C16–C22), showed the highest plasma protein binding affinity and in vivo muscle activity.

SAR of unsaturated fatty acid ASO conjugates

We also investigated the effect of conjugation of unsaturated fatty acids on ASO potency in muscle. We hypothesized that unsaturated fatty acids would assume a different structure from saturated fatty acids, leading to significant differences in the protein binding characteristics of unsaturated and saturated fatty acid ASO conjugates. To test this hypothesis, we synthesized ASO 29 (Figure 6, Scheme 2), which contained an oleic acid conjugate.

Figure 6. 5′-Oleoyl ASO 29 and 5′-palmitoyl Malat-1 ASO 2 exhibited similar potency in mouse heart and liver. Mice (C57BL/6, n = 4/group) were injected subcutaneously with Malat-1 ASO 1, 5′-palmitoyl Malat-1 ASO 2 or 5′-oleoyl Malat-1 ASO 29 at 0.2, 0.6 and 1.8 μmol/kg once a week for 3 weeks and sacrificed after 5 days. (A) Malat-1 RNA expression analyzed in mouse heart and liver by qRT-PCR. All data are expressed as mean ± standard deviation. (B) ED50 (μmol/kg/week) for reducing Malat-1 RNA in mouse heart and liver. ASO sequence: X = lipid, blue: cEt BNA, black: DNA, C: 5-methylcytidine, backbone all PS, underline: PO. (C) Structure of 5′-oleic acid conjugated Malat-1 ASO 29.

The activity of ASO 29 (Figure 6) in mouse liver and heart was examined and compared with that of 5′-palmitoyl ASO 2 and unconjugated ASO 1. Mice (C57BL/6, n = 4/group) were injected subcutaneously with 0.2, 0.6 and 1.8 μmol/kg of ASOs 1, 2 and 29 once a week for three weeks. Mice were sacrificed after 5 days and Malat-1 RNA expression was analyzed in liver and heart (Figure 6). 5′-Palmitoyl ASO 2 and 5′-oleoyl ASO 29 showed similar potency in liver and heart (Figure 6).

To further investigate the effect of the number, position and conformation of double bonds, we synthesized a series of naturally occurring unsaturated fatty acid ASO conjugates and studied their activity in skeletal and cardiac muscle (Figure 7). In addition, we characterized the plasma protein binding profile (Figure 7) of these conjugates.

Figure 7. (A) Structures of ASO conjugates 32–43; (B) protein binding affinity of lipid conjugated ASOs 32–43; (C) Malat-1 RNA expression in hearts and quadriceps of mice administered ASOs 32–43. ASO sequence: X = lipid, blue: cEt BNA, black: DNA, C: 5-methylcytidine, backbone all PS, o: PO.

We first investigated cis, trans and positional isomers of fatty acids with one double bond. Myristoleoyl ASO 32 (Figure 7A, C14, ω 5Z), palmitoleoyl ASO 33 (Figure 7A, C16, ω 7Z) and sapienoyl ASO 34 (Figure 7A, C16, ω 10Z), with a cis double bond at ω 5, 7 and 10, respectively, were synthesized (Scheme 2).
To understand the effects of fatty acid chain length and degree of unsaturation on activity, we synthesized nervonoyl ASO 35 (Figure 7A, C24, ω 9Z), with chain length C24 and a cis double bond at ω 9. To examine the effect of double bond conformation on ASO activity, we synthesized three fatty acid ASO conjugates: ASOs 36 and 37 (Figure 7A), with a trans double bond at ω 7 and ω 9, respectively, and linoelaidoyl ASO 38 (Figure 7A, C18, ω 6E,9E), containing two trans double bonds at ω 6 and 9. Furthermore, we characterized the effect of conjugating fatty acids containing 2, 3, 4 and 6 cis double bonds on the activity and plasma protein binding of the ASO. Linoleoyl ASO 39 (Figure 7A, C18, ω 6Z,9Z), octadecatrienoyl ASO 40 (Figure 7A, C18, ω 3Z,6Z,9Z), γ-linolenoyl ASO 41 (Figure 7A, C18, ω 6Z,9Z,12Z), arachidonoyl ASO 42 (Figure 7A, C20, ω 6Z,9Z,12Z,15Z) and DHA ASO 43 (Figure 7A, C22, ω 3Z,6Z,9Z,12Z,15Z,18Z) were synthesized. ASOs 32–43 were synthesized using methods (Scheme 2) similar to those used for the other fatty acid conjugates described in this report.

Binding of unsaturated fatty acid ASO conjugates 32–43 to human albumin, LDL and HDL was measured using a competition binding assay (Figure 7B) (35). As with the saturated fatty acid ASO conjugates, all unsaturated fatty acid ASO conjugates bound to albumin, LDL and HDL (Figure 7B) with higher affinity than unconjugated ASO 1. Interestingly, palmitoyl ASO conjugate 6 showed tighter binding to albumin, LDL and HDL than any of the unsaturated fatty acid ASO conjugates 32–43 assessed in this study. Consistent with the chain length trend observed for the saturated conjugates, the binding affinity of the unsaturated fatty acid conjugate with the shortest chain (ASO 32, C14 ω 5Z) was lower than that of the longer chain unsaturated fatty acid conjugates. Double bond conformation and position appear to have minimal effect on the plasma protein binding of ASO conjugates.
ASOs 33–35, containing a cis double bond at different positions, and ASO conjugates 36–38, with trans double bonds at different positions, showed similar plasma protein binding affinity (Figure 7B). Fatty acid conjugates with different degrees of unsaturation exhibited interesting albumin binding affinity trends. ASO conjugate 43, with six cis double bonds, bound to albumin with lower affinity (Ki 5 μM) than ASOs 32–35 and 39–42 (Ki 1.7–2 μM), which have one to four cis double bonds.

To assess the effect of structural variation of the fatty acid on activity, we evaluated ASO conjugates 32–43 for inhibition of Malat-1 RNA expression in mice and compared them to 5′-palmitoyl ASO 6 and unconjugated ASO 1. Mice (C57BL/6, n = 4/group) were subcutaneously injected with 3.6 μmol/kg of ASOs 1, 6 and 32–43 for three weeks. Mice were sacrificed after 5 days and Malat-1 RNA expression was analyzed in quadriceps and heart (Figure 7C). All fatty acid conjugated ASOs showed improved activity in quadriceps and heart compared to unconjugated ASO 1 (Figure 7C). However, 5′-palmitoyl ASO 6 exhibited improved activity compared to all the unsaturated fatty acid conjugated ASOs examined in the study (Figure 7C). It appears that neither the geometry nor the number of double bonds significantly influences the functional uptake of unsaturated fatty acid conjugates. ASOs 32–35, containing fatty acids with a cis double bond at different positions, and ASOs 36–38, containing trans double bonds, showed similar activity. Similarly, ASO conjugates 39–43, containing two to six cis double bonds, also showed similar activity in muscle.
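As a quick arithmetic check on the albumin affinities quoted above, the fold difference between two inhibition constants is simply their ratio. The Python sketch below uses only the Ki values stated in the text (5 μM for ASO 43; 1.7–2 μM for ASOs 32–35 and 39–42). The function name `fold_weaker` is ours and purely illustrative; it is not part of the reported method.

```python
def fold_weaker(ki_test_um: float, ki_ref_um: float) -> float:
    """Return how many fold weaker the test binder is than the reference.

    A larger Ki means weaker binding, so Ki_test / Ki_ref is greater than 1
    when the test compound binds less tightly than the reference.
    """
    if ki_test_um <= 0 or ki_ref_um <= 0:
        raise ValueError("Ki values must be positive")
    return ki_test_um / ki_ref_um


# Albumin Ki values quoted in the text, in micromolar.
KI_ASO43_UM = 5.0
KI_REF_RANGE_UM = (1.7, 2.0)

# Fold difference against each end of the reference range, rounded to 0.1.
fold_range = tuple(round(fold_weaker(KI_ASO43_UM, ki), 1) for ki in KI_REF_RANGE_UM)
print(fold_range)
```

On these numbers, ASO 43 binds albumin roughly 2.5 to 3-fold more weakly than the conjugates carrying one to four cis double bonds.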
Consistent with saturated fatty acid conjugates, plasma protein binding affinity correlated with the skeletal and cardiac muscle activity of unsaturated fatty acid ASO conjugates.

Palmitoyl conjugation improves potency of cEt BNA CD36 ASO 3–7 fold in heart and muscle

Next, we examined the effects of palmitic acid conjugation on the potency of an ASO targeting CD36 mRNA in mouse heart and quadriceps. CD36 ASO 68 (Figure 8) is a fully PS modified cEt BNA gapmer ASO targeting CD36 mRNA, which is expressed in both murine liver and extra-hepatic tissues (36). The activity of ASOs targeting mouse CD36 mRNA has been characterized previously (36). To study the effect of palmitic acid modification on the activity of the CD36 ASO, we synthesized 5′-palmitoyl conjugated CD36 ASO 69 (Figure 8) using the method described in Scheme 1.

Figure 8. Palmitic acid conjugation enhances potency of CD36 ASO in mouse heart, quadriceps and liver. Mice (C57BL/6, n = 4/group) were injected intravenously with CD36 ASO 68 or 5′-palmitoyl CD36 ASO 69 at 1, 3 and 9 μmol/kg once a week for 3 weeks and sacrificed after 5 days. (A) CD36 mRNA expression analyzed in mouse heart, quadriceps, liver and kidney by qRT-PCR. All data are expressed as mean ± standard deviation. (B) ED50 (μmol/kg/week) for reducing CD36 mRNA in mouse heart, quadriceps, liver and kidney. ASO sequence: X = lipid, blue: cEt BNA, black: DNA, C: 5-methylcytidine, backbone all PS, underline: PO.

Mice (C57BL/6, n = 4/group) were injected intravenously with 1, 3 and 9 μmol/kg of ASO 68 or 69 for 3 weeks. Mice were sacrificed 5 days post-injection, and heart, quadriceps, liver and kidney were homogenized and analyzed for reduction of CD36 mRNA. Palmitoyl ASO 69 showed more than 7-fold enhanced potency in quadriceps and heart relative to the parent ASO 68 (Figure 8). In liver, the potency improvement was only 3-fold, whereas in kidney the unconjugated and palmitic acid conjugated ASOs exhibited similar potency. All ASOs were well tolerated, with no elevations in plasma transaminases or organ weights (data not shown).

Palmitoyl conjugation enhances potency of ASO targeting DMPK mRNA in mice

Myotonic dystrophy type 1 (DM1) is a neuromuscular disease caused by an inherited CTG repeat expansion in the gene encoding DM protein kinase (DMPK) (7). The CTG repeats are transcribed into mRNA, where they form hairpins that bind with high affinity to the muscleblind-like (MBNL) family of proteins. The MBNL proteins are sequestered in this complex, which prevents them from performing their normal function (7). In preclinical studies, antisense oligonucleotides targeted to DMPK mRNA efficiently reduced DMPK mRNA in different skeletal muscles (37,38), which led to improvements in body weight, muscle strength and muscle histology (38). However, in the clinic the DMPK ASO exhibited limited therapeutic benefit (7), indicating an urgent need for improved potency. In this report, we evaluated the potency of a previously characterized (37) DMPK ASO 71 (Figure 9) in muscle and show that palmitoyl conjugation improves the potency of the DMPK ASO in mouse quadriceps and heart (Figure 9).

Figure 9. Palmitic acid conjugation enhances potency of DMPK ASO in mouse heart and quadriceps.
Mice (BALB/c, n = 4/group) were administered DMPK ASO 71 subcutaneously at 1.84, 3.68 and 7.36 μmol/kg or 5′-palmitoyl DMPK ASO 72 at 0.85, 1.70 and 3.40 μmol/kg once a week for 3.5 weeks and sacrificed after 2 days. (A) DMPK mRNA expression analyzed in mouse heart and quadriceps by qRT-PCR. All data are expressed as mean ± standard deviation. (B) ED50 (μmol/kg/week) for reducing DMPK mRNA in mouse heart and quadriceps. ASO sequence: X = lipid, blue: cEt BNA, black: DNA, C: 5-methylcytidine, backbone all PS, o = PO.

Mice (n = 4/group) were injected subcutaneously with DMPK ASO 71 at 1.84, 3.68 and 7.36 μmol/kg and ASO 72 at 0.85, 1.70 and 3.40 μmol/kg per week for 4 weeks. Mice were sacrificed 2 days after their last dose and DMPK mRNA expression in quadriceps and heart was analyzed (Figure 9). Palmitoyl modification led to a 3–4-fold improvement in the activity of DMPK ASO 72 in quadriceps and heart relative to unconjugated ASO 71 (Figure 9).

Palmitoyl conjugation enhances potency of ASO targeting Cav3 mRNA more than 5-fold in mice skeletal and cardiac muscle

Caveolin 3 (Cav3) is a caveolin family protein expressed in muscle cell types, where it acts as a scaffolding protein for the organization and concentration of certain caveolin-interacting molecules.
Mutations in this gene have been linked to limb-girdle muscular dystrophy type 1C (LGMD-1C), hyperCKemia or rippling muscle disease (RMD) (39). Cav3 is reported to be expressed in mice primarily in heart and skeletal muscle (40). We designed antisense oligonucleotide 74 (Figure 10) targeting Cav3 mRNA, which showed modest activity in quadriceps and heart. To evaluate the effect of palmitic acid conjugation on the potency of the Cav3 ASO, we synthesized 5′-palmitoyl conjugated Cav3 ASO 75 (Figure 10, Scheme 1) and studied its activity in mice.

Figure 10. Activity of Cav3 ASO 74 and 5′-palmitoyl Cav3 ASO 75 in mouse heart and quadriceps. Mice (C57BL/6, n = 4/group) were injected subcutaneously with Cav3 ASO 74 or 5′-palmitoyl Cav3 ASO 75 at 1.68, 5 and 15 μmol/kg once a week for 2 weeks and sacrificed after 4 days. (A) Cav3 mRNA expression analyzed in mouse heart and quadriceps by qRT-PCR. All data are expressed as mean ± standard deviation. (B) ED50 (μmol/kg/week) for reducing Cav3 mRNA in mouse heart and quadriceps. ASO sequence: X = lipid, blue: cEt BNA, black: DNA, C: 5-methylcytidine, backbone all PS, o = PO.

We then evaluated the potency of ASOs 74–75 for inhibiting Cav3 mRNA in mouse quadriceps and heart. Mice (n = 4/group) were injected subcutaneously with ASO 74 or 75 at 1.68, 5 and 15 μmol/kg per week for two weeks.
Mice were sacrificed 4 days after the final dose and Cav3 mRNA expression was analyzed in quadriceps and heart (Figure 10). Palmitoyl Cav3 ASO 75 exhibited improved potency in quadriceps (5-fold) and in heart (>4-fold) compared to unconjugated ASO 74 (Figure 10). These data further confirm that palmitoyl conjugation is a worthwhile approach to improving the potency of clinically relevant ASOs in mice.

Solid phase synthesis of 5′-palmitic acid conjugated ASOs

We developed a solid-phase DNA synthesis method for convenient synthesis of 5′-palmitic acid conjugated ASOs. First, we developed 6-palmitamidohexyl phosphoramidite 77 (Scheme 3) for incorporating the palmitic acid modification by solid-phase DNA synthesis. Phosphoramidite 77 was synthesized according to the method described in Scheme 3. In brief, 6-palmitamidohexanol 78 (84%) was generated from palmitic acid, Pfp-TFA, triethylamine and 6-aminohexanol 79 in dichloromethane in a one-pot synthesis. Compound 78 was subsequently phosphitylated to afford phosphoramidite 77 (93%). 5′-Palmitic acid conjugated ASOs were synthesized in good yield on a solid-phase DNA synthesizer using phosphoramidite 77 and a standard protocol (29).

Scheme 3. Synthesis of 6-palmitamidohexyl phosphoramidite 77; Pfp-TFA: pentafluorophenyl trifluoroacetate.

DISCUSSION

Antisense oligonucleotide therapeutics is a maturing drug discovery platform, with six ASO drugs approved for clinical use (3,41,42) and over 40 more in clinical development (3). Given the predominant distribution of ASOs to the liver, it is not surprising that the majority of systemically administered ASOs in clinical use or development target liver-expressed genes.
Thus, delivery strategies will be essential to define a new class of RNA therapeutics that target extrahepatic sites such as muscle. We demonstrate that fatty acid conjugation can improve the potency of RNase H ASOs 3 to 7 fold for suppressing gene expression in skeletal and cardiac muscle.

Fatty acids constitute a fundamental source of energy production (43). The majority of fatty acids in plasma are bound to serum albumin, which has seven binding sites for long-chain non-esterified fatty acids (44). Albumin is one of the most abundant proteins in plasma and transports fatty acids, drugs, ions and other metabolites (45). Albumin interacts with multiple receptors, such as the glycoproteins Gp60, Gp30 and Gp18, SPARC, the megalin/cubilin complex and the neonatal Fc receptor (FcRn) (20). Albumin's interaction with these receptors is responsible for its recycling and cellular transcytosis (20). This property makes albumin an attractive ‘self’ drug delivery molecule for transit across tissue and cellular barriers (20). To be transported from the circulation into muscle cells, an ASO must pass the endothelium, the interstitial space and the plasma membrane (Figure 11). We hypothesized that conjugation of a fatty acid would increase the albumin binding affinity of ASOs, enhancing their ability to cross the endothelial barrier and improving their functional uptake into muscle (Figure 11). Previous studies suggested that unconjugated PS ASOs bind to human serum albumin with low affinity (370–480 μM) (46). A recent report suggests that fatty acid modified gapmer ASOs can self-assemble into constructs that offer improved tissue distribution (47).

Figure 11. Schematic representation of the likely route taken by lipid conjugated ASO from the capillary to the cytosol in muscle. AlbLA: albumin bound lipid ASO; AlbR: albumin receptor.

As expected, palmitoyl conjugation improved the binding affinity of the ASO to human serum albumin more than 150-fold (Figure 1). Palmitic acid modification also enhanced the binding affinity of the ASO to other prominent plasma proteins such as HDL and LDL, whereas binding affinity to HRG remained unchanged (Figure 1). We next studied the effect of palmitic acid conjugation on the potency of Malat-1 ASOs in mouse quadriceps, heart, liver and kidney. Interestingly, palmitic acid conjugation improved the potency of the Malat-1 ASO more than five-fold in heart and quadriceps (Figure 2) after systemic administration. In kidney, unconjugated and palmitoyl conjugated Malat-1 ASOs exhibited similar potency. In addition, palmitoyl conjugation improved tissue uptake of the Malat-1 ASO in heart and quadriceps (Table 1), providing a rationale for the enhanced potency. Interestingly, palmitic acid conjugation increased the plasma Cmax and plasma AUC of the ASO relative to the unconjugated ASO. Histological examination indicates that much of the ASO accumulated in heart at early time points appears diffuse or concentrated along intercellular boundaries, suggesting that it primarily resides in the interstitial space. By 24 h, however, the ASO appears in punctate foci, likely occupying intracellular vesicles. These data provide initial evidence that conjugation to palmitic acid improves delivery of the ASO to the muscle interstitial space compared to the unconjugated ASO.

One of the principal functions of human serum albumin is to transport fatty acids. The primary association constant of albumin for fatty acids is a function of chain length and degree of unsaturation (34), and both the primary association constant and the number of high affinity sites increase with fatty acid chain length (34).
We characterized the effect of fatty acid chain length and degree of unsaturation on the albumin binding affinity of fatty acid conjugated ASOs. Binding affinity increased with chain length, and the optimum affinity was observed for ASO conjugates with C16–C18 fatty acid chains. The number and stereochemistry of double bonds had little effect on the albumin binding affinity of the ASO conjugate. Fatty acid conjugates with one to four cis double bonds and those with one or two trans double bonds exhibited similar albumin binding. Interestingly, ASO conjugates containing C16 to C18 saturated or unsaturated fatty acids exhibited similar activity in mouse skeletal and cardiac muscle. Taken together, our data suggest that palmitoyl (C16, saturated) and oleoyl (C18, ω 9Z) conjugation provided the maximum potency enhancement for ASOs in skeletal and cardiac muscle.

We also demonstrate that the potency improvement observed for palmitoyl conjugation in mouse skeletal and cardiac muscle is not sequence or target specific. Palmitic acid conjugation improved the potency (3 to 7 fold) of ASOs targeting CD36, DMPK and Cav3 mRNA in mouse muscle compared to the corresponding unconjugated ASOs. The unconjugated DMPK ASO was evaluated in clinical trials for the treatment of myotonic dystrophy type 1.

Our data demonstrate that altering the interaction of PS ASOs with specific plasma proteins could modulate their tissue distribution. Conjugation of palmitic acid, a known albumin ligand, enhanced the albumin affinity of the PS ASO. Albumin is actively transported across the capillary endothelium via caveolin-1 mediated transcytosis, and a significant fraction (60%) of total albumin is associated with the interstitial space in muscle (48). We hypothesize that fatty acid conjugation enhances albumin binding, thereby facilitating ASO movement across the vascular wall into the interstitium.
Additional work is needed to understand this process more completely, as well as to determine whether myocyte delivery itself is also enhanced by fatty acid interactions with plasma membrane receptors.

In conclusion, we report a detailed structure-activity relationship of fatty acid conjugated ASOs for harnessing albumin-based delivery into extrahepatic tissues. We studied the activity and plasma protein binding of ASO fatty acid conjugates with varying fatty acid chain length, degree of unsaturation and cis/trans isomerism. The binding affinity of the ASO to plasma proteins improved with fatty acid chain length, with the highest binding affinity observed for chain lengths of 16 to 18 carbons. The degree of unsaturation and the conformation of the double bond appear to have no influence on protein binding of ASO fatty acid conjugates. The activity of fatty acid ASO conjugates correlates with their affinity for albumin, and the tightest albumin binder exhibited the greatest activity improvement in muscle. Palmitic and oleic acid appear to be the optimal fatty acid structures for improving functional uptake of ASOs into skeletal and cardiac muscle. To reach its site of action in muscle, an ASO must traverse the vascular wall and enter the interstitium to gain access to the myocyte. Our data suggest that fatty acid conjugation enables the ASO to bind tightly to albumin, thereby facilitating ASO delivery to the interstitium and leading to enhanced myocyte uptake and ASO activity. Further improvement in the effectiveness of fatty acid ASO conjugates could be achieved by combining them with ligands that target cell-surface receptors in muscle tissues. The strategy described in this report provides a foundation for designing more effective therapeutic ASOs that target muscle tissues, with the aim of developing treatments for muscle related diseases.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
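The fold improvements in potency quoted in this report (3 to 7 fold) compare ED50 values for a conjugated ASO and its unconjugated parent. The report does not say how the ED50 values were fitted, so the following Python sketch is only an illustration under stated assumptions: it estimates ED50 by linear interpolation on a log-dose axis (a stand-in for the authors' unspecified curve fit) and then takes the ratio of the two estimates. The function names and all dose-response numbers are hypothetical and are not data from this study.

```python
import math
from typing import Sequence


def ed50_log_interp(doses: Sequence[float], pct_of_control: Sequence[float]) -> float:
    """Estimate ED50 by linear interpolation on a log10(dose) axis.

    doses: ascending dose levels (for example, umol/kg/week).
    pct_of_control: target RNA remaining at each dose, as % of control.
    Returns the dose at which the interpolated response crosses 50%.
    """
    if len(doses) != len(pct_of_control) or len(doses) < 2:
        raise ValueError("need at least two matched dose/response points")
    points = list(zip(doses, pct_of_control))
    for (d_lo, r_lo), (d_hi, r_hi) in zip(points, points[1:]):
        # Find the pair of adjacent doses that brackets 50% remaining.
        if r_lo >= 50.0 >= r_hi and r_lo != r_hi:
            frac = (r_lo - 50.0) / (r_lo - r_hi)
            log_ed50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_ed50
    raise ValueError("50% reduction is not bracketed by the tested doses")


def fold_improvement(ed50_parent: float, ed50_conjugate: float) -> float:
    """Potency gain of the conjugate: parent ED50 divided by conjugate ED50."""
    return ed50_parent / ed50_conjugate


# HYPOTHETICAL numbers for illustration only; not data from this report.
doses = [1.0, 3.0, 9.0]
parent_remaining = [90.0, 70.0, 40.0]     # % of control at each dose
conjugate_remaining = [50.0, 25.0, 10.0]  # % of control at each dose

ed50_parent = ed50_log_interp(doses, parent_remaining)
ed50_conjugate = ed50_log_interp(doses, conjugate_remaining)
print(round(ed50_parent, 2), round(ed50_conjugate, 2))
print(round(fold_improvement(ed50_parent, ed50_conjugate), 1))
```

Interpolating on the log-dose axis suits dose series spaced by a constant factor (such as the threefold steps used in several experiments here), but with only three dose levels any such estimate is coarse; a fitted sigmoidal model would normally be preferred when more points are available.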
ACKNOWLEDGEMENTS

We thank Tracy Reigle for help in preparing Figure 11 and Amy Dan, Rick Carty, Chris Watson and Mark Andrade for their contributions to this work.

FUNDING

Funding for open access charge: Ionis Pharmaceuticals Inc.

Conflict of interest statement. None declared.

REFERENCES

1. Khvorova A., Watts J.K. The chemical evolution of oligonucleotide therapies of clinical utility. Nat. Biotech. 2017; 35: 238–248.
2. Stein C.A., Castanotto D. FDA-Approved Oligonucleotide Therapies in 2017. Mol. Ther. 2017; 25: 1069–1075.
3. Crooke S.T., Witztum J.L., Bennett C.F., Baker B.F. RNA-targeted therapeutics. Cell Metab. 2018; 27: 714–739.
4. Prakash T.P., Graham M.J., Yu J., Carty R., Low A., Chappell A., Schmidt K., Zhao C., Aghajan M., Murray H.F. et al. Targeted delivery of antisense oligonucleotides to hepatocytes using triantennary N-acetyl galactosamine improves potency 10-fold in mice. Nucleic Acids Res. 2014; 42: 8796–8807.
5. Ammala C., Drury W.J. 3rd, Knerr L., Ahlstedt I., Stillemark-Billton P., Wennberg-Huldt C., Andersson E.M., Valeur E., Jansson-Lofmark R., Janzen D. et al. Targeted delivery of antisense oligonucleotides to pancreatic β-cells. Sci Adv. 2018; 4: eaat3386.
6. Hung G., Xiao X., Peralta R., Bhattacharjee G., Murray S., Norris D., Guo S., Monia B.P. Characterization of target mRNA reduction through in situ RNA hybridization in multiple organ systems following systemic antisense treatment in animals. Nucleic Acid Therap. 2013; 23: 369–378.
7. Overby S.J., Cerro-Herreros E., Llamusi B., Artero R. RNA-mediated therapies in myotonic dystrophy. Drug Discov. Today. 2018; 23: 2013–2022.
8. Stein C.A., Yakubov L., Zhang L.M., Tonkinson J. Mode of uptake of 5′-cholesteryl-linked phosphodiester oligodeoxynucleotides in HL60 cells. Nucleic Acids Symp. Ser. 1991; 24: 155–156.
9. De Smidt P.C., Trung L.D., De Falco S., Van Berkel T.J.C. Association of antisense oligonucleotides with lipoproteins prolongs the plasma half-life and modifies the tissue distribution. Nucleic Acids Res. 1991; 19: 4695–4700.
10. Wolfrum C., Shi S., Jayaprakash K.N., Jayaraman M., Wang G., Pandey R.K., Rajeev K.G., Nakayama T., Charrise K., Ndungo E.M. et al. Mechanisms and optimization of in vivo delivery of lipophilic siRNAs. Nat. Biotechnol. 2007; 25: 1149–1157.
11. Petrova N.S., Chernikov I.V., Meschaninova M.I., Dovydenko I.S., Venyaminova A.G., Zenkova M.A., Vlassov V.V., Chernolovskaya E.L. Carrier-free cellular uptake and the gene-silencing activity of the lipophilic siRNAs is strongly affected by the length of the linker between siRNA and lipophilic group. Nucleic Acids Res. 2012; 40: 2330–2344.
12. Bijsterbosch M.K., Rump E.T., De Vrueh R.L.A., Dorland R., Van Veghel R., Tivel K.L., Biessen E.A.L., Van Berkel T.J.C., Manoharan M. Modulation of plasma protein binding and in vivo liver cell uptake of phosphorothioate oligodeoxynucleotides by cholesterol conjugation. Nucleic Acids Res. 2000; 28: 2717–2725.
13. Moroz E., Lee S.H., Yamada K., Halloy F., Martinez-Montero S., Jahns H., Hall J., Damha M.J., Castagner B., Leroux J.-C. Carrier-free gene silencing by amphiphilic nucleic acid conjugates in differentiated intestinal cells. Mol. Ther.–Nucleic Acids. 2016; 5: e364.
14. Osborn M.F., Coles A.H., Biscans A., Haraszti R.A., Roux L., Davis S., Ly S., Echeverria D., Hassler M.R., Godinho B.M.D.C. et al. Hydrophobicity drives the systemic distribution of lipid-conjugated siRNAs via lipid transport pathways. Nucleic Acids Res. 2018; 47: 1070–1081.
15. Biscans A., Coles A., Echeverria D., Khvorova A. The valency of fatty acid conjugates impacts siRNA pharmacokinetics, distribution, and efficacy in vivo. J. Controlled Release. 2019; 302: 116–125.
16. Khan T., Weber H., DiMuzio J., Matter A., Dogdas B., Shah T., Thankappan A., Disa J., Jadhav V., Lubbers L. et al. Silencing myostatin using Cholesterol-conjugated siRNAs induces muscle growth. Mol. Ther.–Nucleic Acids. 2016; 5: e342.
17. Van Der Vusse G.J., Glatz J.F.C., Van Nieuwenhoven F.A., Reneman R.S., Bassingthwaighte J.B. Transport of long-chain fatty acids across the muscular endothelium. Adv. Exp. Med. Biol. 1998; 441: 181–191.
18. Kratz F. Albumin as a drug carrier: design of prodrugs, drug conjugates and nanoparticles. J. Controlled Release. 2008; 132: 171–183.
19. Garcovich M., Zocco M.A., Gasbarrini A. Clinical use of albumin in hepatology. Blood Transfus. 2009; 7: 268–277.
20. Larsen M.T., Kuhlmann M., Hvam M.L., Howard K.A. Albumin-based drug delivery: harnessing nature to cure disease. Mol. Cell Ther. 2016; 4: 1–12.
21. Plum A., Jensen L.B., Kristensen J.B. In vitro protein binding of liraglutide in human plasma determined by reiterated stepwise equilibrium dialysis. J. Pharm. Sci. 2013; 102: 2882–2888.
22. Home P., Kurtzhals P. Insulin detemir: from concept to clinical experience. Expert Opin. Pharmacother. 2006; 7: 325–343.
23. Gaus H.J., Gupta R., Chappell A.E., Ostergaard M.E., Swayze E.E., Seth P.P. Characterization of the interactions of chemically-modified therapeutic nucleic acids with plasma proteins using a fluorescence polarization assay. Nucleic Acids Res. 2018; 57: 2061–2064.
24. Seth P.P., Tanowitz M., Bennett C.F. Selective tissue targeting of synthetic nucleic acid drugs. J. Clin. Invest. 2019; 129: 915–925.
25. Leeds J.M., Graham M.J., Truong L., Cummins L.L. Quantitation of phosphorothioate oligonucleotides in human plasma. Anal. Biochem. 1996; 235: 36–43.
26. Gaus H.J., Owens S.R., Cooper S., Cummins L.L. Online HPLC electrospray mass spectrometry of phosphorothioate oligonucleotide metabolites. Anal. Chem. 1997; 69: 313–319.
27. Crooke S.T., Wang S., Vickers T.A., Shen W., Liang X-h. Cellular uptake and trafficking of antisense oligonucleotides. Nat. Biotechnol. 2017; 35: 230–237.
28. Miller C.M., Harris E.N., Tanowitz M., Donner A.J., Prakash T.P., Swayze E.E., Seth P.P. Receptor-mediated uptake of phosphorothioate antisense oligonucleotides in different cell types of the liver. Nucleic Acid Ther. 2018; 28: 119–127.
29. Seth P.P., Siwkowski A., Allerson C.R., Vasquez G., Lee S., Prakash T.P., Wancewicz E.V., Witchell D., Swayze E.E. Short antisense oligonucleotides with novel 2′-4′ conformationaly restricted nucleoside analogues show improved potency without increased toxicity in animals. J. Med. Chem. 2009; 52: 10–13.
30. Gaus H., Miller C.M., Seth P.P., Harris E.N. Structural determinants for the interactions of chemically modified nucleic acids with the stabilin-2 clearance receptor. Biochemistry. 2018; 57: 2061–2064.
31. Graham M.J., Crooke S.T., Lemonidis K.M., Gaus H.J., Templin M.V., Crooke R.M. Hepatic distribution of a phosphorothioate oligodeoxynucleotide within rodents following intravenous administration. Biochem. Pharmacol. 2001; 62: 297–306.
32. Graham M.J., Crooke S.T., Monteith D.K., Cooper S.R., Lemonidis K.M., Stecker K.K., Martin M.J., Crooke R.M. In vivo distribution and metabolism of a phosphorothioate oligonucleotide within rat liver after intravenous administration. J. Pharmacol. Exp. Ther. 1998; 286: 447–458.
33. Prakash T.P., Yu J., Kinberger G.A., Low A., Jackson M., Rigo F., Swayze E.E., Seth P.P. Evaluation of the effect of 2′-O-methyl, fluoro hexitol, bicyclo and Morpholino nucleic acid modifications on potency of GalNAc conjugated antisense oligonucleotides in mice. Bioorg. Med. Chem. Lett. 2018; 28: 3774–3779.
34. Kragh-Hansen U., Watanabe H., Nakajou K., Iwao Y., Otagiri M. Chain Length-dependent Binding of Fatty Acid Anions to Human Serum Albumin Studied by Site-directed Mutagenesis. J. Mol. Biol. 2006; 363: 702–712.
35. Schmidt K., Prakash T.P., Donner A.J., Kinberger G.A., Gaus H.J., Low A., Østergaard M.E., Bell M., Swayze E.E., Seth P.P. Characterizing the effect of GalNAc and phosphorothioate backbone on binding of antisense oligonucleotides to the asialoglycoprotein receptor. Nucleic Acids Res. 2017; 45: 2294–2306.
36. Mullick A., Crooke R.M., Graham M.J. Antisense inhibition of CD36 expression and therapeutic uses thereof. 2012; WO2012149465A2.
37. Pandey S.K., Wheeler T.M., Justice S.L., Kim A., Younis H.S., Gattis D., Jauvin D., Puymirat J., Swayze E.E., Freier S.M. et al. Identification and characterization of modified antisense oligonucleotides targeting DMPK in mice and nonhuman primates for the treatment of myotonic dystrophy type 1. J. Pharmacol. Exp. Ther. 2015; 355: 310–321.
38. Jauvin D., Chretien J., Pandey S.K., Martineau L., Revillod L., Bassez G., Lachon A., McLeod A.R., Gourdon G., Wheeler T.M. et al. Targeting DMPK with antisense oligonucleotide improves muscle strength in myotonic dystrophy type 1 mice. Mol. Ther.–Nucleic Acids. 2017; 7: 465–474.
39. Campostrini G., Bonzanni M., Lissoni A., Bazzini C., Milanesi R., Vezzoli E., Francolini M., Baruscotti M., Bucchi A., Rivolta I. et al. The expression of the rare caveolin-3 variant T78M alters cardiac ion channels function and membrane excitability. Cardiovasc. Res. 2017; 113: 1256–1265.
40. Song K.S., Scherer P.E., Tang Z., Okamoto T., Li S., Chafel M., Chu C., Kohtz D.S., Lisanti M.P. Expression of Caveolin-3 in skeletal, cardiac, and smooth muscle Cells: Caveolin-3 is a component of the sarcolemma and co-fractionates with dystrophin and dystrophin-associated glycoproteins. J. Biol. Chem. 1996; 271: 15160–15165.
41. Benson M.D., Waddington-Cruz M., Berk J.L., Polydefkis M., Dyck P.J., Wang A.K., Plante-Bordeneuve V., Barroso F.A., Merlini G., Obici L. et al. Inotersen treatment for patients with hereditary transthyretin amyloidosis. N. Engl. J. Med. 2018; 379: 22–31.
42. Mickle K., Dreitlein W.B., Pearson S.D.
, Lasser K.E. , Hoch J.S. , Cipriano L.E. The effectiveness and value of patisiran and inotersen for hereditary transthyretin amyloidosis . J. Manag. Care Spec. Pharm. 2019 ; 25 : 10 – 15 . Google Scholar PubMed WorldCat 43. Rizzuti B. , Bartucci R. , Sportelli L. , Guzzi R. Fatty acid binding into the highest affinity site of human serum albumin observed in molecular dynamics simulation . Arch. Biochem. Biophys. 2015 ; 579 : 18 – 25 . Google Scholar Crossref Search ADS PubMed WorldCat 44. Curry S. , Mandelkow H. , Brick P. , Franks N. Crystal structure of human serum albumin complexed with fatty acid reveals an asymmetric distribution of binding sites . Nat. Struct. Biol. 1998 ; 5 : 827 – 835 . Google Scholar Crossref Search ADS PubMed WorldCat 45. Fasano M. , Curry S. , Terreno E. , Galliano M. , Fanali G. , Narciso P. , Notari S. , Ascenzi P. The extraordinary ligand binding properties of human serum albumin . IUBMB Life . 2005 ; 57 : 787 – 796 . Google Scholar Crossref Search ADS PubMed WorldCat 46. Srinivasan S.K. , Tewary H.K. , Iversen P.L. Characterization of binding sites, extent of binding, and drug interactions of oligonucleotides with albumin . Antisense Res. Dev. 1995 ; 5 : 131 – 139 . Google Scholar Crossref Search ADS PubMed WorldCat 47. Hvam M.L. , Cai Y. , Dagnæs-Hansen F. , Nielsen J.S. , Wengel J. , Kjems J. , Howard K.A. Fatty acid-modified gapmer antisense oligonucleotide and serum albumin constructs for pharmacokinetic modulation . Mol. Ther. 2017 ; 25 : 1710 – 1717 . Google Scholar Crossref Search ADS PubMed WorldCat 48. Ellmerer M. , Schaupp L. , Brunner G.A. , Sendlhofer G. , Wutte A. , Wach P. , Pieber T.R. Measurement of interstitial albumin in human skeletal muscle and adipose tissue by open-flow microperfusion . Am. J. Physiol. - Endocrinol. Metab. 2000 ; 278 : E352 – E356 . Google Scholar Crossref Search ADS PubMed WorldCat © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Probing G-quadruplex topologies and recognition concurrently in real time and 3D using a dual-app nucleoside probe
Nuthanakanti, Ashok; Ahmed, Ishtiyaq; Khatik, Saddam Y; Saikrishnan, Kayarat; Srivatsan, Seergazhi G
doi: 10.1093/nar/gkz419 pmid: 31106340
Abstract
Comprehensive understanding of the structure and recognition properties of regulatory nucleic acid elements in real time and at the atomic level is highly important for devising efficient therapeutic strategies. Here, we report the establishment of an innovative biophysical platform using a dual-app nucleoside analog, which serves as a common probe to detect and correlate different G-quadruplex (GQ) structures and ligand binding under equilibrium conditions and in 3D by fluorescence and X-ray crystallography techniques. The probe (SedU) is composed of a microenvironment-sensitive fluorophore and an excellent anomalous X-ray scatterer (Se), and is assembled by attaching a selenophene ring at the 5-position of 2′-deoxyuridine. SedU incorporated into the loop region of the human telomeric DNA repeat fluorescently distinguished subtle differences in GQ topologies and enabled quantification of ligand binding to different topologies. Importantly, the anomalous X-ray dispersion signal from Se could be used to determine the structure of GQs. As the probe is minimally perturbing, a direct comparison of fluorescence data and crystal structures provided structural insights into how the probe senses different GQ conformations without affecting the native fold. Taken together, our dual-app probe represents a new class of tool that opens up new experimental strategies to concurrently investigate nucleic acid structure and recognition in real time and 3D.
INTRODUCTION
Nucleic acids perform their cellular functions by adopting complex secondary and tertiary structures, which are composed of several structural domains (1−3). The functional role of domains, which support a binding event or serve as a signalling or regulatory element, is coded in the form of conformational dynamics of a set of nucleotides (4−6). Dysfunction in many such domains due to mutations, lesions, etc., can lead to disease states.
Hence, basic understanding of the conformation of therapeutically relevant structural motifs in real time and 3D will facilitate design platforms to identify small molecule functional modulators of clinical potential (7,8). One such important structural motif, which has gained prominence as a therapeutic target is the G-quadruplex (GQ) structure formed by sequences containing guanine tracts (9−11). GQ-forming sequences are widely present in the genome (12,13) and have been proven to play important roles in chromosome maintenance, telomerase dysfunction and regulation of expression of several oncogenes (14−21). Consequently, several small molecule ligands that bind and modulate GQ function have been evaluated as chemotherapeutic agents (22−29). However, the druggability of GQs in a clinical setup has not yet been realized. This is because GQ-forming motifs are highly diverse in sequence and exhibit structural polymorphism (30,31). Further, the majority of ligands and GQ sensors poorly distinguish different GQ structures (32). Depending on the number of contiguous G-tracts and the residues between them, a sequence can adopt various GQ topologies, which are generally classified as parallel-, antiparallel- and hybrid-type parallel-antiparallel-stranded conformations (30,31). These structures show differences in the conformation of the glycosidic bond (syn and anti) of guanosine residues of the tetrads, loop type (propeller, diagonal and lateral) and groove size. Fluorescence, NMR and X-ray crystallography techniques in combination with circular dichroism (CD) are commonly used to study GQ structure, dynamics and recognition properties (32−42). These methods invariably use custom-labeled oligonucleotides (ONs) as native bases are non-fluorescent and do not contain isotope or X-ray scattering label for efficient analysis in solution and 3D. 
For example, fluorescent ligands and metal complexes (43−49), and FRET pair-labeled ONs provide efficient means to study the formation as well as binding of ligands to GQs (50−52). However, detecting the formation of different GQ topologies and estimating the affinity of ligands to different GQ structures are difficult as most of the chemical probes do not efficiently distinguish subtle differences in conformations. Although the benefits of this traditional ‘one label-one technique’ approach are undeniable, direct correlation of structure and function under equilibrium conditions and in 3D is not straightforward as each technique uses a uniquely labeled ON sequence. In this context, it is of high priority to develop multifunctional probes, which (i) are structurally non-invasive, (ii) can detect subtle differences in the conformation of GQ topologies, (iii) can quantitatively report the affinity of ligands to different GQ topologies, and importantly (iv) can be concurrently deployed in complementary biophysical techniques (e.g., fluorescence and X-ray diffraction). This, we hypothesized, can be accomplished by developing a dual-app nucleoside analog probe composed of a conformation-sensitive fluorophore and an anomalous X-ray scattering label (e.g., Se atom, Figure 1). Such a nucleoside analog, incorporated into GQ-forming sequences, would serve as a common probe to investigate different GQ structures and their recognition properties in real time and at the atomic level concurrently by using fluorescence and X-ray crystallography techniques.

Figure 1. (A) A schematic diagram showing the dual-app nucleoside probe design. Probe 2 is intentionally designed to contain an environment-sensitive fluorophore (5-heterocycle-conjugated uracil) and an X-ray crystallography compatible label (Se atom) in the same electronic system so that the microenvironment experienced by the labels in different GQ conformations will be similar. (B) The nucleoside analog serves as a common probe to detect and correlate different GQ structures of H-Telo DNA and their ligand binding under equilibrium conditions and in 3D by two powerful techniques, namely fluorescence and X-ray crystallography.

We recently developed a ribonucleoside probe containing a microenvironment-sensitive fluorophore and an anomalous X-ray scattering label (53,54). The probe was derived by attaching a selenophene ring at the 5 position of uridine, which essentially expands the π-system, thereby generating a fluorescent nucleoside analog. The Se atom used in this probe design is highly beneficial as compared to the traditionally used halogen labels in nucleic acid X-ray analysis. Halogen-labeled ONs are prone to dehalogenation upon exposure to X-ray radiation, which can cause failures in phasing (55). Se exhibits a good anomalous X-ray dispersion signal, which is widely used in protein and recently in nucleic acid crystallography (55,56). Notably, Huang, Egli and others have used the anomalous diffraction signal from the Se atom to determine the structure of ONs containing Se in the phosphate backbone, sugar and nucleobase (57−62).
As a proof of principle, we showed the utility of our probe in monitoring antibiotic binding to a well-known RNA target, the bacterial ribosomal decoding site RNA (A-site, 54). However, the ability of the probe to experimentally determine X-ray crystallographic phase information using the anomalous signal of Se has not been tested. Encouraged by these key observations and given the importance of GQs in therapeutics, we decided to expand the proficiency of the dual-app probe in studying a polymorphic nucleic acid motif, namely various GQ topologies of the H-Telo DNA ON repeat and their binding affinity to small molecule ligands. Here, we describe an innovative biophysical platform to study different GQ topologies and their ligand binding in real time and 3D by using a dual-app nucleoside probe (SedU) and a highly polymorphic GQ-forming human telomeric (H-Telo) DNA ON repeat as a model system (Figure 1). The probe is made of a microenvironment-sensitive fluorophore and a Se atom, and is derived by conjugating selenophene at the 5-position of 2′-deoxyuridine. The phosphoramidite substrate of the nucleoside was incorporated into the loop regions, as different GQ structures of the H-Telo DNA ON repeat show significant differences in loop conformation. The fluorescent component of the nucleoside probe distinguished different GQ topologies and also enabled quantification of ligand binding to different topologies via changes in emission intensity and maximum. Single crystals of SedU-labeled H-Telo DNA ONs diffracted anomalously and yielded experimentally determined phases, indicative of their potential use in X-ray structure determination. Superimposition of native and labeled GQ structures indicated that the modification is minimally perturbing, which enabled direct comparison of fluorescence data and crystal structures to provide a structural basis for how the dual-app probe senses different GQ conformations without disrupting the overall native fold.
MATERIALS AND METHODS
Synthesis and characterization of SedU (2) and its phosphoramidite 5 are provided in the Supplementary Data. Solid-phase synthesis of SedU-labeled ONs and their characterization by HPLC, MALDI-TOF mass, UV-thermal melting and CD measurements are described in the Supplementary Data.

Fluorescence analysis of SedU-labeled H-Telo DNA ONs and their duplexes
Respective GQ structures of ONs 6−9 (10 μM) were formed by heating the samples at 90°C for 5 min in 10 mM Tris–HCl buffer (pH 7.5) containing 100 mM NaCl or 100 mM KCl. To obtain the parallel conformation, ONs were annealed in 50 mM Tris–HCl buffer (pH 7.5) containing 150 mM SrCl₂. The corresponding duplexes 6•11, 7•11, 8•11 and 9•11 were prepared by heating a 1:1 mixture of H-Telo DNA ONs (6−9) and complementary ON 11 at 90°C for 5 min in the different ionic conditions mentioned above. All the samples were cooled slowly to RT and kept in an ice bath for at least 1 h before fluorescence was recorded. The samples were excited at 330 nm with excitation and emission slit widths of 5 and 9 nm, respectively. Fluorescence measurements were performed in triplicate in a micro fluorescence cuvette (Hellma, path length 1.0 cm) at 20°C.

Fluorescence binding assay
A series of samples of respective GQ structures of ON 7 (1 μM), annealed in buffers containing NaCl/KCl/SrCl₂, was incubated with increasing concentrations of PDS (4 nM to 10 μM) and BRACO19 (4 nM to 10 μM). The samples were incubated for 30 min at RT. Samples were excited at 330 nm with excitation and emission slit widths of 5 and 9 nm, respectively. Fluorescence experiments were performed in triplicate in a micro fluorescence cuvette (Hellma, path length 1.0 cm) at 20°C. An appropriate blank containing the respective concentration of the ligand, but no ON, was subtracted from each individual spectrum.
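The data reduction for this assay (the blank subtraction above, followed by the normalization and Hill fit defined in the equations that follow) can be sketched in a few lines. This is only an illustration under stated assumptions: the titration values, the ligand-only blank signal and the "true" Kd and n are hypothetical, and a coarse grid search stands in for the nonlinear least-squares fit that was performed in Origin 8.5.

```python
import math

def normalize(f_i, f_0, f_s):
    """F_N = (F_i - F_s) / (F_0 - F_s): 1 without ligand, 0 at saturation."""
    return (f_i - f_s) / (f_0 - f_s)

def hill(ligand, kd, n, f_0=1.0, f_s=0.0):
    """F_N = F_0 + (F_s - F_0) * [L]^n / (Kd^n + [L]^n)."""
    return f_0 + (f_s - f_0) * ligand**n / (kd**n + ligand**n)

def fit_hill(ligand, f_n, kd_range=(1e-3, 10.0), n_range=(0.5, 3.0), steps=200):
    """Coarse grid search for (Kd, n) minimizing the sum of squared errors.

    A stand-in for a proper nonlinear least-squares fit; Kd is scanned on a
    log scale because the titration spans several decades of concentration.
    """
    best_sse, best_kd, best_n = float("inf"), None, None
    log_lo, log_hi = math.log10(kd_range[0]), math.log10(kd_range[1])
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        for j in range(51):
            n = n_range[0] + (n_range[1] - n_range[0]) * j / 50
            sse = sum((hill(l, kd, n) - y) ** 2 for l, y in zip(ligand, f_n))
            if sse < best_sse:
                best_sse, best_kd, best_n = sse, kd, n
    return best_kd, best_n

# Hypothetical titration (ligand in uM); none of these numbers are from the paper.
true_kd_um, true_n = 0.5, 1.0
f0_raw, fs_raw = 100.0, 20.0                      # intensity without ligand / at saturation
ligand_um = [0.004, 0.016, 0.064, 0.25, 1.0, 4.0, 10.0]
blank = [0.5 * l for l in ligand_um]              # assumed ligand-only signal
measured = [fs_raw + (f0_raw - fs_raw) * hill(l, true_kd_um, true_n) + b
            for l, b in zip(ligand_um, blank)]

corrected = [m - b for m, b in zip(measured, blank)]        # blank subtraction
f_n = [normalize(c, f0_raw, fs_raw) for c in corrected]     # normalization
kd_fit, n_fit = fit_hill(ligand_um, f_n)                    # Hill fit
print(f"Kd ~ {kd_fit:.2f} uM, n ~ {n_fit:.2f}")
```

With real data, the grid search would normally be replaced by a least-squares routine that also reports parameter uncertainties from the triplicate measurements.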
The dose-dependent quenching curves obtained for the binding of PDS or BRACO19 to H-Telo DNA ON 7 were fitted to a plot of normalized fluorescence intensity (FN) versus log [PDS] or log [BRACO19] using the Hill equation (Origin 8.5) to determine the apparent dissociation constants (Kd, 63,64).

\begin{equation*}F_{\rm N} = \frac{F_{\rm i} - F_{\rm s}}{F_0 - F_{\rm s}}\end{equation*}

Fi is the fluorescence intensity at each titration point. F0 and Fs are the fluorescence intensity in the absence of ligand (L) and at saturation, respectively.

\begin{equation*}F_{\rm N} = F_0 + \left(F_{\rm s} - F_0\right)\left(\frac{[{\rm L}]^n}{K_{\rm d}^n + [{\rm L}]^n}\right)\end{equation*}

n is the Hill coefficient or degree of cooperativity associated with the binding.

Crystallization

Native H-Telo DNA ON 10
A solution of ON 10 (3 mM) in 20 mM potassium cacodylate buffer (pH 6.5, 50 mM KCl) was annealed at 90°C for 5 min. The sample was slowly cooled to 25°C and stored at this temperature overnight. Crystals were grown by using the hanging drop vapor diffusion method at 4°C. Well solution was composed of 0.05 M sodium cacodylate (pH 7.2), 0.4 M ammonium sulfate, 0.05 M KCl, 0.01 M CaCl₂, 15% PEG400. A sub-stock of ON 10 (1 μl, 1.8 mM) and 0.5 μl of well solution were used to form the drop. Final concentration of the ON was 1.2 mM. Diffraction quality crystals grew in three months as hexagonal rods of dimensions nearly 0.26 × 0.10 × 0.08 mm³. The crystals were harvested and cryoprotected in a solution of the mother liquor containing 30% PEG400.

SedU-labeled H-Telo DNA ON 7
A solution of ON 7 (3 mM) in 20 mM potassium cacodylate buffer (pH 6.5, 50 mM KCl) was prepared as above. Well solution was composed of 0.05 M potassium cacodylate (pH 7.2), 0.625 M ammonium acetate, 0.2 M KCl, 15% PEG400.
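The drop concentrations quoted in this section follow from simple dilution of the ON sub-stock by the well solution. A minimal sketch, assuming the two volumes are additive and ignoring any later change in drop volume during vapor diffusion (the function name is illustrative only):

```python
def final_concentration(stock_mM, stock_ul, well_ul):
    """Concentration of the ON in a freshly mixed drop (additive volumes assumed)."""
    return stock_mM * stock_ul / (stock_ul + well_ul)

# 1 ul of a 1.8 mM sub-stock mixed with 0.5 ul of well solution
drop_mM = final_concentration(stock_mM=1.8, stock_ul=1.0, well_ul=0.5)
print(round(drop_mM, 2))  # 1.2
```

The same 1:0.5 mixing ratio is used for both the native and the SedU-labeled ONs, which is why both drops start at 1.2 mM.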
A sub-stock of ON 7 (1 μl, 1.8 mM) and 0.5 μl of well solution were used in growing the crystals by the hanging drop vapor diffusion method at 4°C. Final concentration of the ON was 1.2 mM. Diffraction quality crystals grew in two months as hexagonal rods of dimensions nearly 0.16 × 0.16 × 0.15 mm³. The crystals were harvested and cryoprotected in a solution of the mother liquor containing 30% PEG400.

SedU-labeled H-Telo DNA ON 8
A pre-annealed solution of ON 8 (3 mM) in 20 mM potassium cacodylate buffer (pH 6.5, 50 mM KCl) was incubated with various concentrations of BRACO19 at 25°C for 1 h. Well solution was composed of 0.05 M potassium cacodylate (pH 7.2), 0.7 M ammonium sulfate, 0.05 M KCl, 0.01 M CaCl₂ and 12.5% PEG400. Crystals were grown by using the hanging drop vapor diffusion method at 4°C. Sub-stocks of ON 8 containing BRACO19 (1 μl) and 0.5 μl of well solution were used to form the drop. A drop containing 0.87 mM ON 8 and 1.04 mM BRACO19 gave diffraction quality rhombic crystals (0.13 × 0.08 × 0.08 mm³) in four months. The crystals were harvested and cryoprotected in a solution of the mother liquor containing 30% PEG400.

Single crystal X-ray data collection, structure solution and refinement procedures are described in the Supplementary Data. PDB accession codes: ON 10 (6IP3), ON 7 (6IP7) and ON 8 (6ISW).

RESULTS AND DISCUSSION

Design, synthesis and photophysical properties of SedU
Fluorescent purine surrogates like 2-aminopurine and 6-methylisoxanthopterin and 8-vinyl, styryl- or heteroaryl-substituted guanosine analogs have been used as probes to detect GQ formation and electron transfer processes in GQ structures (65−71). However, many of these purine analogs placed in G-tetrads destabilize the GQ structure, as they do not have H-bonding sites like guanine, and modification at the 8 position of guanine is known to bias the glycosidic conformation (67).
Further, barring a few examples, the analogs do not photophysically distinguish different GQ conformations, possibly due to similarities in the tetrad conformation among various GQ forms. So we envisioned that placing a 5-heterocycle-modified pyrimidine nucleoside probe in the loop region could be advantageous on two counts. First, modification at the C5 position does not affect the glycosidic conformation of native pyrimidine nucleosides (63). Second, the loop orientation and conformation of loop residues are significantly different in different GQ structures, which upon ligand binding undergo further conformational changes (31,72). Hence, a microenvironment-sensitive probe placed in the loop position would be able to distinguish different GQ conformations, which could be further used in determining the affinity of ligands to different GQ conformations. This notion is supported by a recent study wherein 5-furyl-2′-deoxyuridine (73), a responsive fluorescent nucleoside probe, was found to be minimally perturbing and to report the microenvironment of various thymidine loop residues of an antiparallel GQ-forming thrombin binding aptamer (74). In the present probe design (SedU), we chose to replace the oxygen atom of the furan ring with a Se atom, so that the probe is responsive and has the added benefit of the Se atom to facilitate X-ray analysis. 5-Selenophene-modified 2′-deoxyuridine SedU (2) and its phosphoramidite substrate required for solid-phase ON synthesis were prepared in simple steps (Scheme 1). 5-Iodo-2′-deoxyuridine and 2-(tri-n-butyl stannyl) selenophene were coupled under Stille cross-coupling reaction conditions to give the dual-app probe 2 in moderate yields. 5′-O-DMT-protected 5-iodo-2′-deoxyuridine 3 was reacted with 2-(tri-n-butyl stannyl) selenophene in the presence of a palladium catalyst to give compound 4. Subsequent reaction in the presence of 2-cyanoethyl N,N-diisopropylchlorophosphoramidite gave phosphoramidite substrate 5.

Scheme 1. Synthesis of dual-app nucleoside probe SedU (2) and its phosphoramidite substrate 5. DMT = 4,4′-dimethoxytrityl.

Microenvironment sensitivity of the nucleoside analog was examined by recording its photophysical properties in solvents of different polarity using water, dioxane and water-dioxane mixtures. Both absorption and fluorescence properties were affected by solvent polarity changes (Figure 2, Table 1). The lowest energy absorption maximum of nucleoside 2 (SedU) was found to be slightly red shifted and hyperchromic as the solvent polarity was decreased from water to dioxane. When excited at its lowest energy absorption maximum, an aqueous solution of the nucleoside displayed a very large Stokes shift with an emission band centered at 452 nm. As the solvent polarity was decreased by varying the water-dioxane ratio, there was nearly a two-fold increase in quantum yield and a progressive shift in emission maximum to the blue region (452–432 nm). In comparison to 5-furyl-2′-deoxyuridine, the emission maximum of SedU is considerably red-shifted in water, dioxane and their mixtures (73,75). Further, SedU exhibits higher fluorescence in a non-polar solvent (dioxane) than in a polar solvent (water), whereas 5-furyl-2′-deoxyuridine shows higher fluorescence in a polar solvent (water) than in a non-polar solvent (dioxane). Time-resolved fluorescence measurements revealed a biexponential decay profile for SedU in the solvent mixtures tested. The average lifetime was found to decrease with decreasing water/dioxane ratio (Supplementary Figure S1, Table 1). Although this combination of solvent mixtures is commonly used to investigate the effect of polarity on photophysical properties (75,76), it exhibits small nonlinearity in viscosity (77).
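Two of the quantities discussed here can be checked directly from the Table 1 values: the Stokes shift, expressed in wavenumbers, and the average lifetime of the biexponential decay. The sketch below is an assumption-laden illustration: the source does not state how the average lifetime was computed, so an amplitude-weighted mean is assumed, and it happens to agree with the tabulated values to within about 0.02 ns.

```python
def stokes_shift_cm1(abs_nm, em_nm):
    """Stokes shift in cm^-1 from absorption and emission maxima given in nm."""
    return 1e7 / abs_nm - 1e7 / em_nm

def amplitude_weighted_lifetime(components):
    """Mean lifetime for (percent amplitude, lifetime in ns) pairs.

    Assumed definition: sum(a_i * tau_i) / sum(a_i).
    """
    total = sum(a for a, _ in components)
    return sum(a * tau for a, tau in components) / total

# Values transcribed from Table 1: absorption max, emission max,
# (amplitude %, lifetime ns) pairs, and the reported average lifetime.
table1 = {
    "water":       (324, 452, [(94, 0.16), (6, 3.25)], 0.34),
    "25% dioxane": (328, 449, [(94, 0.26), (7, 1.32)], 0.32),
    "50% dioxane": (330, 444, [(92, 0.24), (8, 1.15)], 0.31),
    "75% dioxane": (330, 439, [(93, 0.20), (7, 1.09)], 0.27),
    "dioxane":     (330, 432, [(95, 0.14), (5, 1.95)], 0.23),
}

results = {}
for solvent, (abs_nm, em_nm, decay, tau_reported) in table1.items():
    shift = stokes_shift_cm1(abs_nm, em_nm)
    tau = amplitude_weighted_lifetime(decay)
    results[solvent] = (shift, tau, tau_reported)
    print(f"{solvent:12s} Stokes shift {shift:7.0f} cm^-1  "
          f"tau_av {tau:.3f} ns (reported {tau_reported:.2f} ns)")
```

Under this assumption, the long-lived minor component contributes a large share of the average in water, which is one way to read the drop in average lifetime as the dioxane fraction increases.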
Since the nucleoside analog contains an aryl-aryl rotatable bond between the selenophene and uracil rings, the fluorescence outcome in different solvent mixtures is more likely due to a combined effect of polarity and viscosity of the medium (75,76). It is worth mentioning here that the responsiveness of 5-furyl-2′-deoxyuridine to its microenvironment has been used in GQ studies, in which the probe was found to be minimally perturbing (74). Collectively, these observations at the nucleoside level suggest that SedU, which is responsive to changes in its surrounding environment, could serve as a good GQ probe without affecting the native fold.

Figure 2. Absorption (solid line, 50 μM) and emission (dashed line, 5 μM) spectra of SedU in different volume % of water-dioxane mixture. All solutions for absorption and emission studies contained 5% and 0.5% DMSO, respectively. Emission spectra were recorded by exciting the samples at the respective absorption maximum (Table 1) with excitation and emission slit widths of 4 and 4 nm, respectively.

Table 1. Photophysical properties of SedU in different solvent mixtures

| Solvent mixture | λmaxᵃ (nm) | λem (nm) | ϕᵇ | τ1ᶜ (ns) | τ2ᶜ (ns) | τavᵇ (ns) |
| --- | --- | --- | --- | --- | --- | --- |
| water | 324 | 452 | 0.012 | 0.16 (94) | 3.25 (6) | 0.34 |
| 25% dioxane | 328 | 449 | 0.023 | 0.26 (94) | 1.32 (7) | 0.32 |
| 50% dioxane | 330 | 444 | 0.025 | 0.24 (92) | 1.15 (8) | 0.31 |
| 75% dioxane | 330 | 439 | 0.026 | 0.20 (93) | 1.09 (7) | 0.27 |
| dioxane | 330 | 432 | 0.022 | 0.14 (95) | 1.95 (5) | 0.23 |

ᵃ The lowest energy absorption maximum is provided. ᵇ Standard deviations for quantum yield (ϕ) and average lifetime (τav) are ≤0.002 and 0.02 ns, respectively. ᶜ % amplitude is given in parentheses.

Synthesis of SedU-labeled H-Telo DNA ONs
In order to evaluate the GQ sensing ability of SedU, the H-Telo DNA ON repeat sequence AGGG(TTAGGG)₃ was chosen as the study model.
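For orientation, the repeat notation can be expanded and the loop residues numbered from the 5′-end. The sketch below is illustrative only; it assumes that the three TTA segments form the loops and uses 1-based numbering, which reproduces the labeling positions named in the Results (T5 in loop 1, T11 and T12 in loop 2, T17 in loop 3).

```python
import re

# AGGG(TTAGGG)3 written out 5'->3'
H_TELO = "AGGG" + "TTAGGG" * 3

def find_loops(seq, motif="TTA"):
    """Map loop number -> (first, last) 1-based positions of each TTA segment."""
    return {k: (m.start() + 1, m.end())
            for k, m in enumerate(re.finditer(motif, seq), start=1)}

def loop_of(position, loops):
    """Return the loop number containing a 1-based position, or None."""
    for number, (first, last) in loops.items():
        if first <= position <= last:
            return number
    return None

loops = find_loops(H_TELO)
print(H_TELO, len(H_TELO))
for site in (5, 11, 12, 17):
    print(f"{H_TELO[site - 1]}{site} -> loop {loop_of(site, loops)}")
```

Each of the four positions is a thymidine, consistent with the probe replacing a T residue in the respective loop.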
This repeat sequence is ideally suited for studying GQs as it can adopt different GQ topologies depending on the metal ions, molecular crowding and confinement (30,31,78−80). Dual-app labeled H-Telo ONs 6−9 were prepared by incorporating phosphoramidite 5 in different loops by using a conventional solid-phase ON synthesis cycle (Figure 3). The deprotected ONs were purified by polyacrylamide gel electrophoresis (PAGE), and the purity and integrity of SedU-labeled ONs were confirmed by RP-HPLC and mass analysis (Supplementary Figures S2, S3 and Supplementary Table S1).

Figure 3. Sequence of SedU-labeled H-Telo DNA ONs 6−9. One of the T residues of loop 1 (6), loop 2 (7 and 8) and loop 3 (9) from the 5′-end was replaced with nucleoside 2. Sequences of control unmodified ON 10 and complementary ON 11 are also shown.

SedU labeling does not affect the native fold and stability
H-Telo DNA ON forms an antiparallel basket-type topology in the presence of Na⁺ ions, whereas it forms mixed parallel-antiparallel stranded hybrid-type structures in the presence of K⁺ ions (Figure 1, 81). In the presence of Sr²⁺ ions it forms an all-parallel stranded GQ structure (82). The CD profiles of modified (6–9) and control unmodified (10) H-Telo DNA ONs in Na⁺ ionic conditions were found to be similar and characteristic of an antiparallel GQ structure (positive peaks at 290 and 240 nm and a negative peak at 260 nm, Supplementary Figure S4A). Similarly, the characteristic CD pattern for hybrid-type and parallel GQ structures was displayed by all the H-Telo DNA ONs in the presence of K⁺ and Sr²⁺ ions, respectively (Supplementary Figure S4B and C).
The melting temperatures of GQ structures formed by control unmodified and modified H-Telo DNA ONs in a given ionic condition were found to be similar (Supplementary Figure S5, Supplementary Table S2, 83,84). Further, consistent with the reported data, different GQ topologies exhibited different Tm values, with the parallel conformation being the most stable form (81,83,84). It is important to mention here that both native and labeled ONs formed the respective GQ structures in different ionic conditions, which matched well with the CD spectrum and stability reported for the same sequence.

Fluorescence detection of different GQ topologies
H-Telo DNA ONs 6−9 were annealed to form the respective GQ structures in a buffer containing NaCl/KCl/SrCl₂ and their fluorescence profiles were compared with those of the corresponding duplexes. GQs of 6, 7 and 9 containing the probe in the first (T5), second (T11) and third (T17) loop, respectively, displayed significant enhancement in fluorescence intensity as compared to the corresponding duplexes 6•11, 7•11 and 9•11 (Figure 4). Notably, depending on the position of modification, the nucleoside probe distinguished different GQ topologies via changes in emission intensity and maximum. For example, mixed hybrid-type GQ structures of ON 7, formed in the presence of K⁺ ions, exhibited significant enhancement in fluorescence intensity (4.7-fold) with a noticeable red shift (λem = 451 nm) in emission maximum as compared to the duplex form (7•11, λem = 445 nm, Figure 4B). The antiparallel form exhibited further enhancement in fluorescence intensity (λem = 452 nm) as compared to the duplex. While the parallel GQ conformation, formed in the presence of Sr²⁺ ions, displayed fluorescence intensity similar to that of the antiparallel form, its emission maximum was discernibly blue shifted (λem = 444 nm) as compared to the antiparallel and hybrid GQ structures.
It is important to mention here that the fluorescence of SedU alone and SedU incorporated into a non-GQ forming ON sequence was only marginally affected by changes in ionic conditions (NaCl, KCl or SrCl2, Supplementary Figure S6). Though the difference in emission maximum of parallel and antiparallel structures is only about 8 nm, it was found to be reproducible (Figure 4B). Further, we have confirmed the formation of respective GQ structures by CD and thermal melting analysis (Supplementary Figure S4 and Supplementary Table S2). Taken together, these results indicate that the distinct fluorescence profile displayed by GQs is due to the differences in the environment of the nucleoside probe in different GQ conformations, and not due to the effect of salts on the probe. Although SedU placed in the first and third loops (ONs 6 and 9) reported the formation of GQ structures with enhanced fluorescence intensity relative to the duplex form, it poorly distinguished individual GQ topologies (Figure 4A and D). GQs of ON 8 and its corresponding duplexes, irrespective of the ionic conditions, exhibited a similar fluorescence profile, suggesting that SedU placed at the T12 position of the second loop failed to detect GQ structures (Figure 4C). Predicting the fluorescence properties of responsive probes incorporated into nucleic acid sequences is difficult, and hence, the implementation of the probes in biophysical assays is mostly empirical. Our results indicate that the dual-app probe incorporated at the T11 position (second loop) of the H-Telo DNA ON 7 is the most responsive among the ONs (6, 8 and 9). Hence, ON 7 was used as the model to study different GQ structures and their ligand binding by fluorescence and X-ray crystallography in greater detail.
Figure 4. Depending on the position of modification SedU fluorescently distinguishes different GQ structures of H-Telo DNA ON repeat. (A−D) Fluorescence spectrum of GQs of H-Telo DNA ONs 6−9 (solid lines) and corresponding duplexes (dashed lines) in different ionic conditions. Samples (1.0 μM) were excited at 330 nm with an excitation and emission slit width of 5 and 9 nm, respectively.
Interestingly, although the duplexes 6•11, 7•11 and 9•11 formed in the presence of Sr2+ ions displayed lower fluorescence intensity than the parallel GQ form of 6, 7 and 9, they were more emissive than the duplexes prepared in the presence of Na+/K+ ions. The Tm values of duplexes (∼63°C) formed in the presence of Na+ and K+ ions are considerably higher than the antiparallel structure (∼54°C) and similar to the hybrid-type GQs (∼65°C, Supplementary Table S3). Hence, in the presence of a complementary ON 11, H-Telo DNA ONs are likely to form a stable duplex in Na+ and K+ ionic conditions. However, the parallel form of H-Telo DNA ON repeat is significantly more stable (∼79°C) than the duplex form (∼65°C). Hence, it is likely that the duplexes annealed in the presence of Sr2+ could have some component of the more emissive parallel GQ structure, which would explain the higher fluorescence intensity exhibited by 6•11, 7•11 and 9•11.
SedU reports binding of ligands to different GQ topologies
The conformation sensitivity of SedU was put to use in determining the binding affinity of ligands to different GQ topologies. H-Telo DNA ON 7 was assembled into different GQs and titrated with two well-known GQ binders, pyridostatin (PDS) and BRACO19 (Figure 5A, 85,86).
As the concentration of the ligand was increased, the probe reported the binding of ligands to different GQs with a significant decrease in fluorescence intensity. The decrease in intensity was found to be dose-dependent, which enabled the determination of the dissociation constant Kd (Figure 5B−D, Table 2). Kd values indicate that PDS has a relatively higher binding affinity for the parallel structure, followed by the antiparallel and hybrid-type GQ structures of H-Telo DNA ON repeat. However, BRACO19 binds to the hybrid GQ topology slightly better as compared to the parallel and antiparallel forms. Collectively, these results indicate that ligands have different binding affinities for different topologies and SedU can be used in assessing the difference in their binding affinities. This attribute of the probe could facilitate setting up discovery platforms to identify topology-specific GQ binders.
Figure 5. (A) Chemical structure of pyridostatin (PDS) and BRACO19. (B) Representative plot showing the changes in fluorescence intensity upon titrating GQ of ON 7 with ligands. Here, fluorescence spectra of hybrid-type GQ structure of ON 7 in KCl as a function of increasing PDS is shown as an example. Red dashed line represents emission profile of ON 7 in the absence of PDS. (C and D) Curve fits for the binding of PDS and BRACO19, respectively, to various GQ structures of ON 7. Normalized fluorescence intensity at λem = 450 nm is plotted against log [ligand]. All samples were excited at 330 nm with excitation and emission slit widths of 5 and 9 nm, respectively.
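A dose-dependent decrease in normalized fluorescence can be converted into a Kd by curve fitting. The sketch below is one minimal way to do this and is not the fitting procedure used in this work, which is not specified here. It assumes a simple 1:1 binding isotherm, F = 1 − A·[L]/(Kd + [L]), takes the free ligand concentration to be equal to the total (ligand depletion is ignored), and uses synthetic data; the function names and the grid search are illustrative.

```python
def fraction_bound(ligand_um, kd_um):
    """Fraction of GQ bound for a 1:1 isotherm (assumed model)."""
    return ligand_um / (kd_um + ligand_um)


def fit_kd(ligand_um, norm_intensity, kd_grid=None):
    """Least-squares fit of F = 1 - A * theta(Kd); returns (Kd, A).

    Kd is scanned on a log-spaced grid; for each candidate Kd the
    quenching amplitude A has a closed-form least-squares solution.
    """
    if kd_grid is None:
        kd_grid = [10 ** (e / 100.0) for e in range(-300, 301)]  # 0.001 to 1000 uM
    best = None
    for kd in kd_grid:
        theta = [fraction_bound(c, kd) for c in ligand_um]
        denom = sum(t * t for t in theta)
        if denom == 0:
            continue
        amp = sum(t * (1.0 - f) for t, f in zip(theta, norm_intensity)) / denom
        sse = sum((f - (1.0 - amp * t)) ** 2 for t, f in zip(theta, norm_intensity))
        if best is None or sse < best[0]:
            best = (sse, kd, amp)
    return best[1], best[2]


# Synthetic, noise-free titration (hypothetical values, not measured data).
TRUE_KD_UM, TRUE_AMPLITUDE = 2.0, 0.8
concentrations_um = [0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 32]
intensities = [1 - TRUE_AMPLITUDE * fraction_bound(c, TRUE_KD_UM) for c in concentrations_um]

kd_fit, amp_fit = fit_kd(concentrations_um, intensities)

if __name__ == "__main__":
    print(f"fitted Kd = {kd_fit:.2f} uM, amplitude = {amp_fit:.2f}")
```

When Kd is comparable to the oligonucleotide concentration, as it is here (1.0 μM samples, Kd of about 1 to 2 μM), a model that accounts for ligand depletion would generally be more appropriate than this simplified form.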
Table 2. Kd values for PDS and BRACO19 binding to H-Telo DNA ON 7

| GQs of ON 7 | PDS (μM) | BRACO19 (μM) |
| --- | --- | --- |
| Antiparallel (in NaCl) | 1.77 ± 0.25 | 1.53 ± 0.09 |
| Hybrid (in KCl) | 2.26 ± 0.12 | 1.21 ± 0.08 |
| Parallel (in SrCl2) | 1.08 ± 0.17 | 1.54 ± 0.35 |

Crystal structure of native and SedU-labeled H-Telo DNA ONs
To study the effect of modification on the native fold at the atomic level and to understand the structural basis of the GQ sensing ability of the probe, control unmodified (10) and modified H-Telo DNA ONs (7 and 8) were crystallized and their 3D structures were determined by X-ray diffraction.
Structure of native ON 10
The ON crystallized in P6 space group and the crystal diffracted to 1.40 Å, the highest resolution reported for diffraction from the H-Telo DNA ON repeat (Table 3). The structure of ON 10 was determined by molecular replacement method using the PDB coordinate 1KF1 (87), which has an all-parallel-stranded GQ structure.
The overall structure of ON 10 (Figure 6A and B) was very similar to the structure reported for the same sequence (1KF1, 2.10 Å, Supplementary Table S4 and S5). However, unlike in the previously reported structure, we noted alternate conformations for nucleotides at two locations. The electron density of the first three nucleotides of ON 10 suggested two alternate conformations of nearly equal occupancy. In one conformation the nucleobase of A1 forms a Watson-Crick base pair with T12 of a symmetry-related molecule, while in the other they form a reversed Watson-Crick base pair (Supplementary Figure S7A). The density for the corresponding 2′-deoxyribose sugars in the two conformations was poor, suggestive of structural flexibility. Alternate conformations were also noted for the phosphate linkage between A13 and G14, with one having occupancy of 0.65 and the other 0.35. While the conformation with a higher occupancy is identical to that in 1KF1, the one with a lower occupancy has the phosphate backbone connecting A13 and G14 flipped inward between their sugar rings (Figure 7A).

Table 3. Crystallographic data and refinement statistics

| Structure | Native ON 10 | SedU-labeled ON 7 | SedU-labeled ON 8 |
| --- | --- | --- | --- |
| Space group | P6 | P6 | P21221 |
| Cell dimensions a, b, c (Å) | 56.839, 56.839, 42.411 | 56.483, 56.483, 42.415 | 35.204, 42.280, 49.924 |
| Cell dimensions α, β, γ (deg) | 90, 90, 120 | 90, 90, 120 | 90, 90, 90 |
| Wavelength (Å) | 0.9795 | 0.9795 | 0.9792 |
| Resolution (Å) (highest resolution shell) | 28.4−1.4 (1.42−1.40) | 48.9−1.55 (1.58−1.55) | 42.3−2.3 (2.38−2.30) |
| Rmerge (%) overall | 0.10 (0.88) | 0.058 (0.90) | 0.119 (0.744) |
| I/σ | 16.4 (3.5) | 23.6 (3.0) | 13.9 (3.7) |
| Completeness (%) | 100 (100) | 100 (100) | 99.7 (99.6) |
| Redundancy | 16.3 (14.7) | 12.5 (12.7) | 12.7 (12.3) |
| Refinement | | | |
| Resolution (Å) | 28.4−1.4 | 48.9−1.55 | 42.3−2.3 |
| No. of reflections | 15469 | 21923 | 6296 |
| Rwork/Rfree (%) | 15.1/17.3 | 18.9/21.3 | 23.2/27.6 |
| No. of atoms | 570 | 469 | 469 |
| No. of ions | 3 | 3 | 3 |
| No. of water molecules | 79 | 50 | 0 |
| RMS deviations in bond lengths (Å) | 0.010 | 0.012 | 0.003 |
| RMS deviations in bond angles (deg) | 0.990 | 1.424 | 0.616 |
| PDB ID | 6IP3 | 6IP7 | 6ISW |

Figure 6. Native (10) and SedU-labeled H-Telo DNA ONs 7 and 8 adopt a parallel GQ structure. Upper panels show the top view and bottom panels show the side view of the structure. (A) and (B) native H-Telo DNA ON 10. One of the GQ conformations of ON 10 is shown for clarity. (C) and (D) ON 7. (E) and (F) ON 8. Potassium ions and water molecules are represented as indigo and red spheres, respectively. The selenophene ring in ON 7 and ON 8 is colored green with Se atom in magenta color.
Figure 7. Comparison of the second propeller loop region in the parallel GQ structure of native and SedU-labeled ONs. (A) The structure of G8−G14 residues in the native ON 10 GQ structure showing alternate conformations of the phosphate backbone connecting A13 and G14. (B) The conformation of G8−G14 residues in ON 7 GQ structure illustrating the environment of SedU11. Potential hydrogen bonds involving selenophene ring are shown in green dotted lines. Also shown is the simple Fourier electron density map at 1.0σ level calculated using phases determined by SAD. (C) The conformation of G8−G14 residues in ON 8 GQ structure showing the environment of SedU12. Se atom is shown in magenta color.
The asymmetric unit of the crystal of ON 10 contains an intramolecular all-parallel-stranded GQ (Figure 6A and B). Typical of a parallel structure, the guanosines in the anti glycosidic conformation base pair through Watson–Crick and Hoogsteen faces forming the characteristic square planar G-tetrads, which stack one above the other with an interplanar distance of ∼3.3 Å (Supplementary Figure S8A).
The three TTA groups protrude laterally from the G-tetrads forming three propeller loops that adopt a type 1 loop conformation with adenine intercalating between the first and second thymine (Figure 6 and Supplementary Figure S9A, 72). In this loop arrangement, the adenine π−π stacks on the first thymine. The second thymine of each loop is located at the tip of the propeller, which partially stacks on the external face of the adenine residue. The GQ structure is stabilized by three K+ ions located in-between the stacked tetrads, which coordinate to eight C6 carbonyl oxygen atoms forming a bipyramidal antiprismatic geometry (Supplementary Figure S8A). Structure of SedU-labeled ON 7 The crystal form of ON 7 containing SedU at position 11 was the same as native ON 10 and diffracted to 1.55 Å (Table 3, Figure 6C and D). The selenium scattered anomalously and facilitated calculation of an electron density map using phases derived by single-wavelength anomalous diffraction (SAD) method (Figure 7B). This emphasised the use of SedU as a tool for SAD and multiwavelength anomalous diffraction (MAD) methods. While the overall GQ structure of ON 7 is similar to ON 10 (Supplementary Table S4 and S5), there are a few minor differences. In the structure of ON 7, the electron density of the first three nucleotides indicated a single conformation in which A1 forms a Watson–Crick base pair with the symmetry-related T12 (Supplementary Figure S7C). Another notable feature is the stabilization of the flipped-in conformation of the phosphate backbone connecting A13 and G14 residues (Figure 7B). C9 carbon of the selenophene ring of SedU11 forms a weak hydrogen bond with one of the phosphate oxygens. In the structure, the Se atom faces the carbonyl O4 of uracil and the selenophene ring is nearly coplanar with the uracil ring (–5.9°). 
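The SAD phasing mentioned above relies on collecting data near the selenium K absorption edge. As a quick consistency check, the sketch below converts the data-collection wavelengths listed in Table 3 into photon energies using E (keV) = 12.3984 / λ (Å). The Se K-edge value of about 12.658 keV is a tabulated reference value assumed here, not a number taken from this article, and the function names are illustrative.

```python
# hc in keV·Å, so that E [keV] = HC_KEV_ANGSTROM / wavelength [Å]
HC_KEV_ANGSTROM = 12.398419843

# Tabulated Se K absorption edge (assumed reference value, not from the article).
SE_K_EDGE_KEV = 12.658

# Data-collection wavelengths (Å) as listed in Table 3.
TABLE3_WAVELENGTHS = {"ON 10": 0.9795, "ON 7": 0.9795, "ON 8": 0.9792}


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in Å."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


def offset_from_se_edge_ev(wavelength_angstrom):
    """Energy offset from the assumed Se K edge, in eV (positive = above the edge)."""
    return (photon_energy_kev(wavelength_angstrom) - SE_K_EDGE_KEV) * 1000.0


if __name__ == "__main__":
    for name, wavelength in TABLE3_WAVELENGTHS.items():
        print(f"{name}: {photon_energy_kev(wavelength):.4f} keV, "
              f"{offset_from_se_edge_ev(wavelength):+.1f} eV from Se K edge")
```

Under these assumptions the listed wavelengths fall within a few eV of the Se K edge, which is where the anomalous signal of selenium is expected to be strong.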
A13 is slightly displaced to better stack with the labeled base of SedU11, which is slightly moved away from the groove formed along the middle tetrad to accommodate the selenophene ring (Figure 8A). The 2′-deoxyribose ring of G9 of the middle tetrad of ON 7 adopts a C2′-exo conformation as opposed to C2′-endo conformation in the structure of native ON 10, which, however, does not affect the anti glycosidic conformation of G9. As a consequence, the plane of 2′-deoxyribose ring is almost parallel to the selenophene ring of SedU11, which is positioned just below it. It is important to mention here that the superimposition of the parallel GQ structure of native ON 10 and labeled ON 7 showed that the global structure was not affected by the introduction of the probe (Supplementary Figure S10A). Further, when the structure of ON 10 in which the phosphate backbone connecting A13 and G14 is flipped inwards (conformation with lower occupancy, vide supra) was superimposed onto the structure of ON 7, the structures of native and modified ONs were found to be similar (Supplementary Figure S11A). Deviations in the position of atoms of G10, T11, T12 and A13 of modified ONs with respect to native ON were around or less than 2 Å. The shift in the position of bases of T11, T12 and A13 (second loop) due to the presence of SedU is 0.5–2.2 Å (Supplementary Figure S12). However, these perturbations did not affect the tetrad conformation or the overall fold, nor did they affect the ability of the probe to report the global conformation of different GQ topologies in different ionic conditions.
Figure 8. Superimposition of G9−G14 residues of the parallel GQ structures of H-Telo DNA ONs. (A) Superimposed structures of ON 10 (pink) and ON 7 (brown) showing minor differences in the conformation of the second propeller TTA loop due to the introduction of SedU at position 11. For clarity, only the major conformer of ON 10 in which the phosphate linkage between A13 and G14 is flipped out is shown. (B) Superimposed structures of ON 10 (pink) and ON 8 (green) containing the modification at T12 position of the second propeller loop. (C) Comparison of the second propeller loop region in ON 7 (brown) and ON 8 (green).
Structure of SedU-labeled ON 8
The ON containing SedU at position 12 crystallized in a distinct space group P21221. The best diffracting crystal gave a complete data set resolved to 2.3 Å (Table 3, Figure 6E and F). A structure solution could be obtained by molecular replacement using 1KF1 as the model (87). The electron density for the first three nucleotides indicated a single conformation as seen in the case of ON 7 (Supplementary Figure S7E). Also, the phosphate backbone connecting A13 and G14 adopted a flipped-in conformation as in ON 7 (Figure 7C). The position of A13 is slightly displaced similar to that in ON 7, possibly to better stack with T11, which, like SedU11 in ON 7, is also slightly displaced away from the groove formed along the middle tetrad (Figure 8B and C). The displacement of T11 and A13 appears to facilitate masking of the nucleobase of SedU12 from solvent from one side (Supplementary Figure S13).
The 2′-deoxyribose ring of G9 of ON 8 adopts a C2′-endo conformation similar to that seen in the native ON 10 structure. Importantly, superimposed structures of ON 10 and ON 8 revealed that the overall structure is similar and not affected by incorporation of the probe (Supplementary Figures S10B and S11B).
Crystal packing
The packing structure of native and modified H-Telo DNA ONs showed two parallel GQs interacting from 5′−5′ end via stacking interaction (Figure 9 and Supplementary Figure S14). The dimeric structure is stabilized by a K+ ion, which coordinates with C6 carbonyl oxygen atoms of tetrads of adjacent GQs in a sandwiched fashion. The average distance between the interacting tetrads is in the range of 3.4−3.5 Å. The packing is further expanded by two dimers strongly interacting via pairs of stacked T18−T6′ residues separated by 3.3 ± 0.1 Å.
Figure 9. Intermolecular interactions in the SedU-labeled ON 7 crystal. (A) GQ dimer is formed by head-to-head stacking of 5′ G-tetrads from two adjacent GQs (cyan and green). The dimer is also stabilized by a bridging K+ ion. (B) Further packing of the dimers is facilitated by strong π−π stacking interaction between two pairs of T18−T6′ residues.
Structural insights into the GQ sensing ability of SedU: A comparison of fluorescence and X-ray data
Crystal structure analysis, complemented by CD and thermal melting studies, indicates that SedU is minimally perturbing and its incorporation, in principle, should not affect the native fold of different telomeric GQ structures.
Based on these key observations, we sought to assess the conformation sensitivity of SedU by correlating the conformation of loop residues in different GQ structures with the fluorescence data. In general, stacking interaction between a fluorophore and adjacent bases and the presence of a guanine near the fluorophore can promote non-radiative decay pathways (88,89). In the duplex structure of 7•11 the base paired SedU stacks with flanking G10 and T12 like other bases in the double helix. Due to this stacking interaction and proximity to a guanine residue the nucleoside analog in duplex shows very low fluorescence. However, the probe placed in the loop region fluorescently distinguishes different GQ structures due to distinct conformation and microenvironment of the emissive analog in these topologies (Figure 4B). In KCl solution, the telomeric repeat forms multiple GQ structures with hybrid-type 1 and 2 structures as the predominant ones (81). In the native hybrid-1 structure, the solvent exposed T11 residue is projected away from the tetrad core (∼8 Å) and experiences no stacking interaction with adjacent bases (Supplementary Figure S15A, 90). SedU placed in the T11 position is likely to adopt a similar conformation. Hence, hybrid-1 structure is highly emissive as compared to the duplex form due to reduced stacking interaction and electron transfer from guanine. Further, in this conformation the probe is solvent exposed as evident from its emission maximum (451 nm), which is closer to the emission maximum of the free nucleoside in water (452 nm, Table 1). In the case of hybrid-2 structure, T11 strongly stacks with the G-tetrad core (Supplementary Figure S15A, 91), and hence, the fluorescence of SedU in this conformation should be highly quenched. Therefore, the enhanced fluorescence displayed by ON 7 in KCl as compared to the duplex is due to a combination of more emissive hybrid-1 and less emissive hybrid-2 forms. 
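The argument above treats the fluorescence observed in KCl as a population-weighted average over coexisting conformers. The sketch below illustrates that bookkeeping only. The populations and per-conformer relative intensities are hypothetical placeholders (they were not measured in this work), and the function names are illustrative.

```python
import math


def ensemble_intensity(populations, intensities):
    """Population-weighted fluorescence intensity of a conformer mixture.

    populations: mapping of conformer name to fractional population (must sum to 1).
    intensities: mapping of conformer name to relative emission intensity.
    """
    if not math.isclose(sum(populations.values()), 1.0, abs_tol=1e-9):
        raise ValueError("populations must sum to 1")
    return sum(populations[name] * intensities[name] for name in populations)


def fraction_of_first(observed, intensity_first, intensity_second):
    """Two-state inversion: fraction of the first conformer given the observed intensity."""
    if intensity_first == intensity_second:
        raise ValueError("conformers must differ in intensity")
    return (observed - intensity_second) / (intensity_first - intensity_second)


# Hypothetical values, relative to a duplex intensity of 1.0.
hypothetical_intensities = {"hybrid-1": 7.0, "hybrid-2": 1.0}
hypothetical_populations = {"hybrid-1": 0.6, "hybrid-2": 0.4}

mixture = ensemble_intensity(hypothetical_populations, hypothetical_intensities)

if __name__ == "__main__":
    print(f"ensemble intensity: {mixture:.2f} (relative to duplex)")
    print(f"recovered hybrid-1 fraction: {fraction_of_first(mixture, 7.0, 1.0):.2f}")
```

In this picture, a strongly quenched conformer lowers the ensemble intensity without moving the emission maximum much, because the emissive conformer dominates the observed spectrum.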
In NaCl, the telomeric ON repeat adopts only an antiparallel structure in which the T11 residue is solvent exposed, not stacked and projected away from the G-tetrad (∼7 Å, Supplementary Figure S15B, 92). In this scenario, the SedU-labeled antiparallel structure is expected to be highly emissive like the hybrid-1 GQ structure. In the absence of other weakly emissive forms (e.g., hybrid-2 GQ), the antiparallel structure of H-Telo ON 7 in NaCl exhibits higher fluorescence than hybrid-type structures in KCl. Since there is no change in emission maximum between antiparallel and hybrid-type GQ structures, it is likely that the weakly emissive hybrid-2 form reduces the overall intensity but does not affect the emission maximum of hybrid-type GQ structures formed in KCl conditions. A similar observation has been reported for a fluorescent GQ sensor based on 5-fluorobenzofuran-dU analog (64). In the parallel structure of ON 7, SedU11 is ∼5 Å away from the G-tetrad and shows a weak π−π interaction with A13 residue (Supplementary Figure S9B and S15D). Further, SedU11 is less solvent exposed and has a distinct microenvironment. The 5′-phosphate of G14 is in the vicinity of the selenophene ring of SedU11 and makes a weak hydrogen bond with C9 of selenophene ring (Figure 7B). Additionally, the C2′-exo ribose ring of G9 and the selenophene ring are involved in carbohydrate-aromatic interaction, with C4′ of the ribose ring being within hydrogen bonding distance of Se. Hence, owing to its distinct environment, SedU in the parallel form of ON 7 shows a blue-shifted (λem = 444 nm) emission spectrum as compared to the hybrid and antiparallel GQ structures (λem = 451 nm). The modified nucleoside placed in the T12 position of ON 8 did not distinguish different GQs from the duplex form. In the ON 8 sequence, SedU is flanked by T11 and A13 residues. In the duplex form (8•11), the base paired SedU stacks with the flanking bases and is nearly 6.8 Å away from the guanine residues on either side.
Hence, duplex 8•11, unlike 7•11, displays a reasonably higher fluorescence efficiency due to the reduced quenching effect of guanine residues. In the hybrid-1 structure of native ON, T12 is strongly stacked on the tetrad core (Supplementary Figure S16A). Hence, SedU in this conformation should exhibit very low fluorescence. However, in the hybrid-2 structure, SedU would be projected out of the tetrad core and experience no stacking interaction with neighboring bases (Supplementary Figure S16A). So a combination of strongly emissive hybrid-2 and weakly emissive hybrid-1 GQs in KCl results in moderate fluorescence, which is similar to the duplex. On similar grounds, antiparallel and parallel structures also show comparable fluorescence intensity, which can be ascribed to a combined effect of stacking interaction, reduced electron transfer process between guanine and the fluorophore and solvation-desolvation of the fluorophore (Supplementary Figures S16B and S16C).
CONCLUSIONS
We have established a simple platform to investigate the GQ structure and ligand binding of a highly polymorphic telomeric DNA repeat in real time and 3D by using a new nucleoside analog (SedU), which functions both as a conformation-sensitive fluorescent probe and X-ray crystallography phasing agent. The Se atom in the nucleoside was stable under solid-phase ON synthesis conditions and during X-ray irradiation. The fluorescent module of the dual-app probe allowed real-time analysis of the various GQ topologies formed by the H-Telo DNA ON repeat and estimation of the binding affinity of ligands for different topologies. On the other hand, the Se atom served as a reliable anomalous X-ray dispersion agent in determining the GQ structure by SAD method.
Notably, the native ON crystals diffracting at the highest resolution (1.40 Å) so far reported for the telomeric repeat revealed alternative conformations of residues at two regions, which were not seen in the previously reported structure determined at comparatively lower resolution (PDB ID 1KF1). Except for a few minor variations, the overall fold of SedU-labeled H-Telo DNA ONs was similar to that of the native ON. Collectively, both solution (CD and Tm) and X-ray analyses indicated that the probe is minimally perturbing. Further, by comparing fluorescence data and superimposed structures, we could gain structural insights on the GQ sensing ability of the dual-app probe, which otherwise is not straightforward using traditional unidimensional probes. Several covalently attached probes have been used in the study of GQs. However, to the best of our knowledge, this is the first example of GQ crystal structures containing a covalently attached probe, which provides atomic level understanding of how the probe senses different GQ conformations without affecting the native fold. Taken together, our results demonstrate that SedU, when judiciously placed, would not only provide a good understanding of the nucleic acid structure and function in solution and 3D but also could support discovery assays to identify structure-specific binders.
DATA AVAILABILITY
Atomic coordinates and structure factors for the reported crystal structures have been deposited with the Protein Data Bank under accession numbers 6IP3 (ON 10), 6IP7 (ON 7) and 6ISW (ON 8).
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
A.N. and I.A. thank University Grants Commission, India for a graduate research fellowship. S.Y.K. acknowledges graduate fellowship from IISER Pune.
We acknowledge the use of the Macromolecular Crystallography Facility at IISER Pune, the Diamond Light Source, Oxfordshire, UK, and the European Synchrotron Radiation Facility (ESRF), Grenoble, France, for access to their beamlines. The Department of Biotechnology, Government of India-ESRF partnership facilitated some of the X-ray diffraction experiments. S.G.S. thanks the Wellcome Trust-DBT India Alliance for a research grant [IA/S/16/1/502360].
FUNDING Wellcome Trust-DBT India Alliance [IA/S/16/1/502360 to S.G.S.]. Funding for open access charge: Wellcome Trust-DBT India Alliance [IA/S/16/1/502360 to S.G.S.].
Conflict of interest statement. None declared.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]