Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data

Statistical method evaluation for differentially methylated CpGs in base resolution... Abstract High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed. bisulfite next-generation sequencing, differential methylation, statistical method comparison Background Base resolution DNA methylation profiling through methylation microarray or high-throughput sequencing is widely used in biomedical research [1]. The former is represented by Illumina Infinium methylation microarray such as HumanMethylation450 BeadChip, and the latter is often achieved by reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing (WGBS). For the methylation microarray data, the average beta, representing the methylation ratio of methylated versus total signal intensity at a CpG site [2, 3], is the favorite format for data analysis and interpretation, as it is ‘self-normalized’ within a sample and intuitive to understand methylation changes. For differential methylation, both parametric and nonparametric tests are used directly on the beta value [3, 4], although an alternative approach to use M value (the ratio of methylated versus non-methylation signal) is proposed to address the heteroscedasticity issue across different CpG sites [5]. Comprehensive performance comparison and evaluation of these statistical methods for microarray data have been reported [4, 6]. In general, the performances of different methods differ more when sample sizes are small. Unlike microarray, the high-throughput bisulfite sequencing not only generates similar data as the average beta (i.e. the methylation ratio of methylated versus total coverage at a CpG site) but also much higher resolution of base counts, the numbers of methylated or unmethylated cytosines. For easier analysis, the methylation ratio data can be analyzed in the same way as those used for microarray, and the relative performance of different statistical methods obtained from Illumina microarray should apply [4, 6]. However, what makes the sequencing data unique is that the count data follow a particular distribution that can be modeled by count-based statistical models. The digital count also carries the sequence coverage information that can be used for better estimate of a methylation level. The most simple and common approach for differential methylation is to construct a 2 × 2 table from methylated and unmethylated cytosine counts from a pair of samples and conduct Fisher’s or χ2 test for independence [7, 8]. However, this approach does not take biological variability into consideration. An alternative is proposed with improved performance [9]. However, both cannot incorporate covariates and make them unfit for a larger study with multiple variables or batch effects. Generalized linear model (GLM) for binary data (i.e. logistic regression) is based on binomial distribution assumption, but methylation data show much higher variability. GLM with quasi-binomial density function can relax the variance structure to some extent. More recent proposal is to use beta-binomial model [10–14]. Beta distribution is a general distribution on the support of (0,1), and it entitles flexible environment for methylation counts to cope with variability from both biological and technical prospective. However, independent evaluation of its performance with other count-based methods and methylation ratio-based statistics has not been available. Statistical methods for differentially methylated CpGs (DMCs) not only are essential for base-level methylation difference but also the first step for differentially methylated regions (DMRs) by many programs for methylation sequence data [8, 11, 12, 15]; thus, the performance of DMC methods has multifold impacts. Application of statistical methods on the count data is not straightforward for most biologists, not to mention to choose the most appropriate one. It is often a question that how much I will compromise to use the convenient methylation ratio data over the count data. In this study, we hypothesized that conducting DMC detection on the ratio data may not be as accurate as modeling the counted-based statistical models; because the count-based models are so diverse, the accuracy can differ dramatically depending on whether technical and biological variabilities are considered. To confirm the hypotheses, we have systematically evaluated statistical methods on both the ratio and the count data in parallel for their performance in DMC detection. We tested their type I error control, sensitivity and specificity in detecting small to large methylation differences in the context of intermediate to small sample size. We also compared their agreement and dived deep into the cause of the discrepancies. Materials and methods Data sets Instead of simulating methylation sequencing data, we used a real RRBS data and simulated methylation differences. By this, we can preserve the technical and biological characteristics of bisulfite sequencing data across CpG sites and samples. For this, we downloaded a data set from GEO (GSE61163), where the tumor samples from 39 individuals with chronic myelomonocytic leukemia (CML) were sequenced with ERRBS protocol before treatment. The detailed library preparation and sequencing were described in the original publication [16]. The raw sequence data (single end, 59 base reads from HiSeq 2000) were processed by our internally developed bisulfite sequence analysis pipeline, SAAP-RRBS [17]. CpGs with at least 10X coverage were kept for further analysis. Thirty individual samples were selected from the 39 and assigned to two groups randomly. To save computational time, only the CpG data from chromosome 22 were used, which had 86 300 CpGs. The CpGs with missing value in any of the samples or constant methylation across samples (either 0 or 1) were removed. After the filtering, we had 53 422 CpGs for evaluation. The first data set was the randomized data, which imitate the null distribution for no methylation difference between the two groups. The purpose was to investigate the type I error control for all the statistical models. The P-value distribution and type I error rate from each test were compared. The second to fourth data sets were created from the first by adding (or subtracting) 2, 5, 10 and 15% methylation value to 10% of CpGs to evaluate each model’s sensitivity and specificity to detect the differences. The ratio and count data were tested separately by applying respective models, and, therefore, we have five simulations and 10 data sets (five methylation ratio data sets and five methylation count data sets) for evaluation as detailed in Figure 1. Figure 1 View largeDownload slide Simulation scheme and data sets. A total of five simulations with 10 data sets are generated, five for ratio and five for count data. Each data set is tested by eight statistical methods (some with multiple implementations). Figure 1 View largeDownload slide Simulation scheme and data sets. A total of five simulations with 10 data sets are generated, five for ratio and five for count data. Each data set is tested by eight statistical methods (some with multiple implementations). Statistical methods For ratio based data, we looked at t-test (parametric), U test (nonparametric), analysis of variance (ANOVA; linear models) and beta regression model. For count-based data, we evaluated contingency table method [7, 8], clustered data analysis [9], logistic regression [8] and beta-binomial model [10–14]. These methods are briefed as below. Analysis on methylation ratio Using methylation ratio data has several advantages. First, although all CpG sites start from the same number of DNA molecules, the depth of coverage for each CpG sites can vary dramatically because of genomic feature and amplification differences. However, the methylation ratio would normalize the coverage difference. Second, the normalized methylation ratio allows modeling the variability directly and easily. Third, the continuous variable can be easily fit into many commonly used statistics models for simple to complex multivariate analyses. However, the disadvantages of the ratio data are (1) the methylation ratio estimate at low sequence coverage is less reliable. (2) Methylation ratio is bounded between 0 and 1, and models such as t-test expect the data range from −∞ to +∞. (3) Variance of CpGs across different methylation range is different [5], and this heteroscedasticity can be an issue for statistical model. For this reason, M value instead of beta value in microarray data is proposed [5]. By the same token, this can be applied to the ratio data by logit transformation. T-test T-test tests if two sets of samples have significantly different means and come from different populations. It is a parametric test and requires the samples to be independent and normally distributed. For a two-sample t-test, if the unknown variance is believed to be equal for both groups, the common variance is estimated by pooling all samples and estimating one common variance. However, if variances for different groups are expected to be different, the unequal variance method called Welch’s t-test can be used. ANOVA ANOVA is a tool based on linear models for multiple group comparison. It tests whether the means of several groups are equal. Linear models are flexible to incorporate clinical covariates. In methylation studies, clinical covariates such as gender and age are shown to be influential on methylation level [18, 19]. CpGassoc [20] is an R package that fits DNA methylation data with phenotype covariates to linear fixed- or mixed-effects models. There is an option to logit-transform the ratio data before fitting the model. CpGassoc conducts ANOVA test and can easily include clinical covariates. When there is only one two-level categorical covariate (like two-group comparison), ANOVA is equivalent to t-test. Our evaluation focused on the logit-transformed data, equivalent to M value. Nonparametric test Considering the underlying distribution of methylation ratio among study population is unknown, a safer approach would be to use nonparametric tests such as Mann–Whitney U Test (a.k.a. Wilcoxon rank-sum test), as it is distribution assumption free. It works more efficiently when data are not normally distributed. Instead of comparing two population means in the t-test, U test evaluates if two comparison samples have identical medians, and it is less affected by outliers like means. The drawback of the Mann–Whitney U test is that it may generate many statistical ties for many CpGs in terms of P-values, although their methylation mean differences between two groups can be clearly different. It will be more problematic when sample size is small. Beta regression Beta distribution is a flexible continuous probability distribution that captures data distributed in the unit interval from 0 to 1. A regression method developed to model beta-distributed dependent variable in relation with a set of independent variables is called the beta regression model [21]. The model associates the mean of the dependent variable (methylation ratio or average beta) with a set of regressors through a linear predictor with unknown coefficients through a link function. The R package betareg [22] implements the beta regression model for general purposes, which estimates the model parameters through maximum likelihood algorithm. Analysis on read counts or CpG coverage data Read count is the number of methylated and unmethylated cytosines at a CpG site. Test of independence (χ2 or Fisher’s exact test) This approach sums the counts across samples within a group for a given CpG site, which results in a 2 × 2 contingency table (methylated/unmethylated × case/control). χ2 test or Fisher’s exact test is applied to test for independence. This approach only does group comparison. Noted is that by pooling reads in the group, the variability of sequence depths and interindividual variation information is lost. Clustered data analysis Realizing the issues of 2 × 2 table approach, Xu [9] proposed a method based on clustered data analysis [23] to incorporate the between-subject variability. For a given CpG site, the method treats sequencing reads as clusters for each individual and estimates a common methylation proportion adjusted by design effect. This method can be considered as performing the χ2 test for independence in a series of 2 × 2 contingency tables. The implementation of this method is simple and computationally efficient. It is robust to variation in sequencing depth for individual subjects. However, the test is based on large-sample approximation. Logistic regression A particular CpG site can be either methylated (denoted as 1) or unmethylated (denoted as 0) as a binary status. Logistic regression is a form of GLM for binary data. The regression is performed across individuals or biological replicates under study conditions. The most common logistic regression assumes the underlying (error) distribution is binomial. However, the methylation data clearly bare more biological variability than the binomial assumption. To overcome the extra variability, a quasi-binomial link can be used in the GLM implementation. Quasi-binomial distribution models binomial data with overdispersion. Beta-binomial model Beta-binomial deals with the overdispersion in methylation data by hierarchical modeling. For a given CpG site, let Xi be the number of methylated Cs for subject i, and ni be the total number of read coverage for subject i; the beta-binomial model assumes Xi∼Binomialni, p where p∼Betaα,β. We say that Xi follows a beta-binomial distribution. The beta-binomial distribution is another way to model binomial data with overdispersion. By assuming the success probability of a binomial distribution being random draws from a beta distribution, the beta-binomial model allows more variability than the binomial model. The beta part in this model can be re-parameterized using mean and a dispersion parameter. If the dispersion parameter is zero, it reduces to a binomial model. Using the two-stage modeling, the beta-binomial model is flexible enough to capture both the technical (coverage) and biological (methylation level) variability. Technical variability is presented in the subject-specific coverage ni from the binomial part. The success probability of the binomial distribution is indeed the methylation ratio. Following the same rationale as the beta regression model, biological variability in methylation ratios is captured in the beta part of the beta-binomial model. There are several software programs implementing the beta-binomial model. Dispersion Shrinkage for Sequencing data (DSS) [13, 14] and methylSig [11] are R packages; RADmeth [10] is an open source command line software. The original DSS [13] is for DMC detection between two groups and its recent addition extends to more flexible general study design with multifactors (hereinafter, DSS.general) [14]. DSS is a Bayesian approach that solves the beta-binomial model by assigning the dispersion parameter with a log-normal prior distribution. The null hypothesis of equal mean methylation between two groups is tested by the Wald test. Owing to the Bayesian nature of borrowing information across different CpG sites, DSS is designed when a sample size is small. Both methylSig and RADmeth use maximum likelihood estimator (MLE) for parameter estimation and pair with the likelihood ratio test for differential methylation detection between two groups. However, there is no closed form for the MLE; methylSig proposes to use an approximation method based on Wilks’ theorem, and RADmeth is based on a GLM framework for beta-binomial regression with logit link. DSS and methylSig are limited for two-group comparison. Though RADmeth is computationally intensive (as it is an iterative algorithm), its GLM framework allows it to be extended to include other experimental design factors (but only categorical factors). DSS.general [14] follows a GLM framework for beta-binomial distribution and uses an ‘arcsine’ link function. Parameter estimation is based on the arcsine transformed data with generalized least square approach without relying on an iterative algorithm. It greatly saves computational time. The disadvantage of the arcsine link is that it is more difficult to interpret compared with the commonly used logit or probit link. Summary of these methods are presented in Table 1. Noted is that for some statistical methods, multiple implementations are evaluated; for example, under the beta-binomial model RADmeth, MethylSig, DSS and DSS.general are evaluated. Table 1 Summary of different statistical methods on differential CpG methylation Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Note. Almost all ratio-based methods used for Illumina Infinium 450K microarray can be used for bisulfite sequence data, and some may need slight customization; many tests such as t-test, χ2 and Fisher’s exact tests are readily available in various statistical packages, functions/methods in R are used for this evaluation. Some DMR methods (metilene, MOABS) do not generate comparable CpG level statistics for evaluation. Table 1 Summary of different statistical methods on differential CpG methylation Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Note. Almost all ratio-based methods used for Illumina Infinium 450K microarray can be used for bisulfite sequence data, and some may need slight customization; many tests such as t-test, χ2 and Fisher’s exact tests are readily available in various statistical packages, functions/methods in R are used for this evaluation. Some DMR methods (metilene, MOABS) do not generate comparable CpG level statistics for evaluation. Results Characteristics of DNA methylation from RRBS data across individuals As the first step, we examined the diagnostic properties of the RRBS data: methylation and coverage variability across individuals. Figure 2A shows the relationship between the mean and SD of methylation ratio at each CpG among 30 samples. The methylation ratio displays high variation around intermediately methylated CpG sites but little for the CpGs that are fully methylated or not methylated at all. This heteroscedasticity is typical for methylation data as described previously for Illumina microarray data [5] and RRBS data [28]. Additionally, the data are not symmetrically distributed. The majority of CpG sites are clustered near the extreme values 0 (50.4% from 0 to 0.2) and 1 (36.7% from 0.8 to 1). For the count data, the biggest challenge is the high variability of coverage across samples as shown in Figure 2B. As an illustration, we highlighted one highly varied CpG with red points across samples, which run from 10s to 2500. These properties of RRBS data have significant implications to the model selections and result interpretation from different models, as we describe in this article. Figure 2 View largeDownload slide Properties of RRBS data. (A) Mean and SD of methylation ratio under the null at 53 422 CpG sites across the randomly selected 30 subjects. X-axis is the mean methylation ratio at each CpG site, and y-axis is the SD of the methylation ratios at corresponding site (average SD = 0.07, maximum SD = 0.41). (B) Boxplot of coverage count at each CpG site across samples. The red/star points represent one specific CpG site that shows largely varying coverage count across samples. Figure 2 View largeDownload slide Properties of RRBS data. (A) Mean and SD of methylation ratio under the null at 53 422 CpG sites across the randomly selected 30 subjects. X-axis is the mean methylation ratio at each CpG site, and y-axis is the SD of the methylation ratios at corresponding site (average SD = 0.07, maximum SD = 0.41). (B) Boxplot of coverage count at each CpG site across samples. The red/star points represent one specific CpG site that shows largely varying coverage count across samples. Performance of different methods under null and alternative simulations For the eight statistical methods on the 10 data sets, we looked at the performance of the methods through four perspectives: (1) under the null, what does the empirical distribution of P-values look like?, (2) Given there is no differential methylation, is the type I error well controlled under the null?, (3) For known level of differential methylation, how powerful are these methods in detecting true DMCs? and (4) How many positive findings are commonly identified by different methods? Computation time and handling of special data issue The computation time was not an issue for most of the methods (within a matter of minutes) except RADmeth and beta regression. RADmeth took 1–2 h, and beta regression took half a day to run the simulated data. RADmeth is an iterative algorithm, and its computation time can be improved on high-performance computing cluster. The problem with beta regression is that the original R function for beta regression was not designed for high-throughput genomic data. Although it takes reasonable amount of time for running a few beta regressions, it became a major issue when there are over hundreds of thousands of CpG sites. Note that some tests (i.e. beta regression and clustered data analysis) may fail at the CpG sites with less ideal data, which result NA for the P-values. For example, under the null, 946 sites failed in the beta regression with logit link, 3 sites failed with probit link and 6121 sites failed in the clustered data analysis test. The maximum likelihood algorithm for solving the beta regression fails with data that produce non-invertible matrices in the calculation. For clustered data analysis, the test may fail at the positions where all observed counts in one group happen to be methylated or unmethylated. This will make zero for observed unmethylated (or methylated) proportion in the pooled contingency table and lead to zero in denominator in the following calculation. P-value distribution Theoretically, P-values should form a uniform distribution under the null. Figure 3 shows the empirical distributions for all available P-values computed by each method. For illustration purpose, we also included the output from variants of some methods. The left column shows tests for ratio data. From top to bottom, the tests are t-test with equal variance, t-test with unequal variance, ANOVA test, ANOVA with logit-transformation, U test, beta regression with logit link and beta regression with probit link. We observed similar patterns for the first four histograms (in fact, Plot 1 and Plot 3 are exactly the same, as the t-test is the special case of ANOVA). The noticeable characteristics are that the P-values are all peaked around 0.3–0.35. This pattern was also observed before [9]. This may be explained by the overloading of the ratios around the extreme values 0 and 1 that may lead P-values to be overloaded at some certain value. Similarly, the P-values from the U test also have a spike around 0.35–0.4. However, the abundant P-values of 1 come from the excessive of ties in the data, which bring deficiency of the U test. The bottom two histograms from beta regression look much closer to a uniform distribution. Figure 3 View largeDownload slide Histogram of P-value distribution under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. Figure 3 View largeDownload slide Histogram of P-value distribution under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. The right column of Figure 3 shows tests for the count data. The first two plots are from tests of independence based on the pooled contingency table. In this scrambled data, we have 15 samples in each of the testing groups. By pooling all samples in the same group together, contingency tables for each CpG site usually contained hundreds to thousands of coverage counts, which lead to a high peak of significant small P-values between 0 and 0.05. Clustered data analysis was able to effectively correct the naïve method. Logistic regressions with and without overdispersion were problematic for having too many P-values equal to 1. Additionally, the standard logistic regression induced a great amount of P-values being significant. With quasi-binomial link, an extra variability can be captured, so the histogram was improved at the zero-end. According to the empirical P-value distribution, the best count-based test was the beta-binomial model. Among different implementations, DSS and DSS.general have the closest empirical distribution to a uniform distribution. Type I error control As no methylation difference is expected under the null simulation, any significant DMCs would be considered as false positives. Figure 4 compares the type I error rate for each test. In general, ratio-based tests are more conservative, and all tests have <5% of CpGs that were claimed as significant (left panel). T-test, equal or unequal variance, is the most stringent for type I error control, and U test is a little bit more relaxed than t-test. ANOVA with logit transformation has the type I error rate that is the closest to the expectation and so is the beta regression in which the different link functions appear not having much difference. Figure 4 View largeDownload slide Observed type I error rates under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. The numbers on the bars are the observed type I error rates. The dashed line is expected type I error rate (0.05). Figure 4 View largeDownload slide Observed type I error rates under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. The numbers on the bars are the observed type I error rates. The dashed line is expected type I error rate (0.05). The type I error control is dramatically variable for the count-based models where some appear problematic (Figure 4, right panel). The error rates of χ2 test, Fisher’s exact test and logistic regression are over 30% (33.3, 34.4 and 33.8%, respectively). The clustered method brought down the type I error dramatically, although still higher than expectation (7.5%). The logistic regression with quasi-binomial link, which captures the overdispersion in data, seems having the type I error well controlled or a little bit stringent (4%). Finally, all implementations of the beta-binomial model give close type I error rate (4–5.9%) to the expectation. Receiver operating characteristic curves for methylation difference detection We looked at the receiver operating characteristic (ROC) curve and area under the curve (AUC) of each test for their sensitivity and specificity under alternative simulations. Under the alternative, we assumed that 10% of the CpG sites was truly differentially methylated at four different magnitudes of 2, 5, 10 and 15%. All the tests evaluated under the null simulation were carried out to the four alternative data sets. As noticed in the null simulation, the statistic tests with different options (link functions or variance assumption) from the same class usually performed similarly, and some test options are redundant. For simplicity, we only report the results of representative tests from each modeling class here. Figure 5 shows the results for the alternative simulations. Each row represents a specific differential methylation level (2, 5, 10 and 15%). At the 2% methylation difference, all tests have a limited detection power. The ratio-based tests have AUC statistics ranging from 0.7 to 0.8, with a clear ordering of the ROC curves in which from the worst to the best performer are t-test, U test, ANOVA with logit transformation and beta regression. The count-based tests have similar AUC statistics, ranging from 0.71 to 0.79. Overall, the 2% methylation difference appears too small for most of the tests to have a reliable detection. Figure 5 View largeDownload slide ROC curves with 2, 5, 10 and 15% methylation difference (15 versus 15 samples). The grey vertical line in each plot is the nominal 0.05 significant level for the false-positive rate, and each open circle along the curve means the actual 0.05 cutoff point in the test. Figure 5 View largeDownload slide ROC curves with 2, 5, 10 and 15% methylation difference (15 versus 15 samples). The grey vertical line in each plot is the nominal 0.05 significant level for the false-positive rate, and each open circle along the curve means the actual 0.05 cutoff point in the test. At the 5% methylation difference, ANOVA with logit transformation becomes the best performer among the ratio-based tests, and the beta-binomial model (methylSig, RADmeth and DSS.general, although DSS had relatively weak performance in the class) is the best performer among the count-based tests. The slightly better performance of the logit-transformed ANOVA model over U test or beta regression may be explained by the better handling of the heteroscedasticity in methylation data, as reported before [5, 6]. No matter which implementation used, the better performance of the beta-binomial model was likely the result of its better handling of type I error, by accounting for both biological and technical variability through a two-layer model specification. As methylation difference increases, the detection power was boosted accordingly. The detection power is about 60, 80 and 90% at 5, 10 and 15% methylation difference, respectively. At the 10% methylation difference, the ROC curves of the ratio-based tests became close to each other. When reaching to 15% methylation difference, almost all the tests had a perfect ROC plot, except for those tests that might fail in some CpG sites. In general, t-test has the worst ROC curve and AUC statistic, and U test has better performance than the t-test. The performance of the beta regression model may be affected by the failure of the test for some CpG sites, as its curve jumps quickly and then declines the growing rate. For count-based tests, all other tests are much worse than the beta-binomial model in terms of both the shape of ROC curve and the value of AUC statistic (the larger, the better). For different implementations, the performances of methylSig and RADmeth are almost identical, and DSS.general has slightly better performance than these two. As methylation difference increases from 2 to 15%, the dominance of the best performer in each data class becomes clearer, i.e. the best performance curve dominates all other curves in respective plots. On Figure 5, we can also compare the nominal and actual 5% significance level. In terms of the type I error control, the results under the alternatives are consistent with the result under the null (Figure 4). All ratio-based tests were able to control the false-positive rate under the nominal level, and they were a bit conservative. For count-based tests, the logistic regression with overdispersion, DSS and DSS.general were able to control the false positives under the nominal level. The clustered data analysis, methylSig and RADmeth had a little inflated type I error rates, but the Fisher’s exact test (and/or χ2 test) lost control of the type I error badly. Though Fisher’s test achieved a higher power, it was paid at the higher cost of loss of type I error control. On the contrary, the two best performers from each data class, i.e. ANOVA with logit transformation and beta-binomial model (DSS.general), were able to achieve a high detection power while controlling the type I error at the nominal level. Consistency of findings from different tests No matter which statistical test is used, the important question is how much information a researcher will lose or be crammed with too many false findings compared with an alternative test being used. For this, we illustrated the commonly detected DMCs between any two tests in a heatmap. As the 2% methylation difference was too small to have a sound detection, Figure 6 shows the common findings under the 5% methylation difference for all the tests discussed above (similar plots for other alternative simulations can be found in Supplementary Figure S1). The diagonal entries with no background color are the numbers of DMCs identified by each test. Of 53 422 CpGs, Fisher’s test identified 21 156 significant ones. However, we believe that many of them were likely false positives, as >30% of CpGs were claimed as DMCs in our null simulation and other tests identified 5000–6700 DMCs, which is about 9.3–12.5% of the total CpG sites, the expected DMC range from 10% artificially changed CpGs plus the false positive under the null distribution. The row of blue cells and the column of almost all red cells are because of the large number of findings by Fisher’s test. Because of the large discrepancy with other tests, our discussion is focused on other tests. Figure 6 View largeDownload slide Common findings between different tests with 5% methylation difference (15 versus 15 samples). The numbers with white background are the significant CpGs (P<0.05) from each test, and other numbers in each cell represents the ratio of overlapping DMCs between a pair of tests on the row and column (note: the plot is not symmetric). Figure 6 View largeDownload slide Common findings between different tests with 5% methylation difference (15 versus 15 samples). The numbers with white background are the significant CpGs (P<0.05) from each test, and other numbers in each cell represents the ratio of overlapping DMCs between a pair of tests on the row and column (note: the plot is not symmetric). The first four tests in Figure 6 are ratio-based, and the rest seven tests are count-based. Each cell in the heatmap represents the ratio of overlapping DMCs between a pair of tests on the row and column. The plot is not symmetric. For example, the very bottom and far left cell of 0.76 means that 76% of the DMCs identified by DSS.general is also found by t-test with equal variance. Reversely, the number of 0.91 on the very top and far right means that 91% of DMCs by the equal variance t-test is overlapped with the DMCs by DSS.general. In other words, if you use t-test for DMC detection, you may lose >20% of significant CpGs picked up by DSS.general, but if you use DSS.general, you only lose 9% of DMCs by the t-test. Overall, all the tests (except Fisher’s test) achieved 60% or above agreement between each other. Figure 6 reconfirms that the two implementations of beta-binomial model, methylSig and RADmeth, performed similarly. The agreement between was 98 and 95% of the significant findings, respectively. The other two implementations, DSS and DSS.general, identified less number of significant DMCs. Also, all beta-binomial implementations had high agreement with t-test. They were able to identify >90% of the DMCs identified by the t-test. Methylation variability and sample size estimation Understanding the technical and biologic variability among biological samples in DNA methylation sequencing is important for the study design of a methylation experiment. It would be interesting to know the magnitude of random differences occurred in the methylation level between two sets of arbitrarily grouped samples. Such measurement can be considered as the noise level for a treatment effect. Knowing this would also help us to understand the detection of ‘truly’ CpG methylation difference. Figure 7 shows the distribution of mean methylation difference between the two randomly sampled groups for all CpG sites on chr22. We can see that the histogram is roughly bell-shaped and centered at 0 (median = 0 and mean =−0.003). The SD of the mean differences is 0.043. In our alternative simulations, the 2% methylation difference represents half of the SD of the between-group difference because of randomness, the 5% methylation difference is >1 SD and 10% is >2 SD (95% of CpGs have absolute methylation difference <10%) of the random differences. This means 5% CpGs can have over 10% methylation difference purely by chance in this data set. Figure 7 View largeDownload slide Distribution of mean methylation differences between groups under the null (15 versus 15 samples) at all CpG sites. The top right legend shows basic summary statistics for the differences: mean, SD, quantiles and 2.5th and 97.5th percentile. The vertical line indicates zero difference. Figure 7 View largeDownload slide Distribution of mean methylation differences between groups under the null (15 versus 15 samples) at all CpG sites. The top right legend shows basic summary statistics for the differences: mean, SD, quantiles and 2.5th and 97.5th percentile. The vertical line indicates zero difference. Knowing the noise level in RRBS data is helpful for determining an appropriate sample size in designing an experiment. With the minimal methylation difference to be detected and confidence level, the required sample size can be estimated. More explicitly, at significant level α and power 1-β, the sample size calculation based on the group mean methylation ratio is n=fα,β2σ2Δ2, where f(α,β) is a value determined by α and β, σ is the SD of the methylation ratio, and Δ is the expected difference that being regarded as minimal important. For example, for two-sample t-test, fα,β=Zβ+Zα22 where Zβ is the β-percentile of a standard normal distribution. If targeting at 5% significant level and 80% power, fα,β=7.85. In our simulated null ratios, the average SD is 0.07 (Figure 1A). If we consider 1 SD of the null methylation difference being minimal important ( Δ = 0.043), it is necessary to have 42 (n = 41.61) samples in each group to achieve the desired significant level and power. If the interested minimal important difference is ≥2 or 3 SDs, 15 (n = 15.02) samples in each group would be required. It is about the same sample size as we used in this study, so that it corresponds to the sensitivity and specificity shown for the 10 and 15% methylation differences in Figure 5. The above example shows the simplest case for sample size calculation, and the estimated sample size is most suitable for two-sample t-test. However, as illustrated in the previous section, t-test is less sensitive, and the required sample size would be smaller for the more sensitive test like the beta-binomial model. Sample size calculation itself is a complex statistical topic, particularly, for genome-wide studies, and a systematic evaluation of sample size estimation for RRBS experiment is out of the scope of this article. Readers can get more information on this similar topic about sample size estimation for high dense DNA methylation microarray [29]. DMC detection with small sample size To evaluate the sample size effect on DMC detection, we also performed simulation for smaller sample size, i.e. five samples in each group for the same null and alternative simulation scheme as the 15 versus 15 sample design, and these results are provided in Supplementary Figures S2–S4. Under the null, the empirical P-value distribution (Supplementary Figure S2) for most of the tests remained similar pattern as before. Note that the U test had a more problematic P-value distribution because of the severe ties in small sample size. The type I error control (Supplementary Figure S3) got a lot worse for the ratio-based tests. All beta-binomial implementations had a little inflated type I error rate, and DSS had the closest control to the expectation. Under the alternatives, the performance for all the tests became less satisfying (Supplementary Figure S4). With the small sample size, it was particularly difficult to identify small methylation difference. As discussed by [13], DSS indeed outperforms other approaches slightly when sample size is small. The agreement between two tests was limited (about 50% agreement on average) when both sample size and differential methylation signal were small (Supplementary Figure S5A). It is likely a consequence of the near diagonal ROC curves in the first row of Supplementary Figure S4. As the differential methylation signal got stronger, the beta-binomial model quickly regained the ability to identify the DMCs identified by t-test, U test, ANOVA with logit transformation and logistic regression with overdispersion (Supplementary Figure S5B–D). The results from the small sample simulation remind us that all tests have a limited power to detect methylation difference <5%, and these changes are well within the noise level. The beta-binomial model is more sensitive and specific, and DSS should be used to get a relatively trustable result for small sample size. Discussion In this study, we extensively looked into different statistical methods for analyzing DMCs in base resolution next-generation DNA sequencing data. We compared two general categories of statistics for such methylation data, which are the continuous model for methylation ratios and the discrete model for methylation counts. Specifically, we illustrated four tests in each category and evaluated their type I error control, sensitivity and specificity under the null and four alternative simulations. Many of these methods have inevitable shortcomings in modeling the methylation data. Loss of coverage information and variance heteroscedasticity are the major issues for the methylation ratio-based models. The normality assumption of parametric models may also not meet for t-test or ANOVA (Table 1 for summary). Logit transformation of the ratio data appears improving the performance of these tests. However, considering all performance measures, we would recommend keeping the methylation counts and modeling them with the beta-binomial distribution. The simple way to model the count data by pooling read coverages within a group to construct a contingency table and testing with χ2 or Fisher’s test is problematic. First, the sequence depth from each individual can be different (Figure 2B), and the ones with large sequencing coverage have dominating influence on the test statistic. Second, by pooling reads across samples together, biological variability is ignored. Ignoring an extra variability may induce spurious significant findings. Third, treating coverage counts as observations in the test also leads to inflated type I error. As there are more samples collected, the pooled coverage counts accumulate quickly for the case and control groups. As in methylKit [8], Fisher’s test is used in the absence of replicates; if there are multiple samples per group, methylKit will use the logistic regression test. Furthermore, even the logistic regression is used for a study with a relatively larger sample size [8], regular logistic regression performed poorly without overdispersion moderation. When both the dependent variable and the independent variable are dichotomous, χ2 test is a special case of logistic regression; thus, they had similar performance. The method using clustered data analysis techniques is an improvement to the pooled contingency table. We noted that this method is based on large sample approximation, and the simulation study by the proponents shows that the performance of this approach improves as sample size increases [9]. When sample size is small, we may expect an inflated type I error. We also pointed out that this test would fail when the data are in the extreme case, i.e. all observed counts in one group happen to be methylated (or unmethylated). The key for constructing a good count-based model for methylation sequencing experiment is the model’s ability to capture the variability in the data from multiple sources. Failing to do so can dramatically impact the results. As we can see in Figure 4, the type I error rate clearly reflects this shortcoming in the construction of these tests (χ2 test, Fisher’s test and the standard logistic regression). To model binary data with additional variability, a general way is to replace the binomial link with the quasi-binomial link in the logistic. More specifically to the methylation data, the beta-binomial model uses two layers of modeling, one for each, to capture one source of variation (technical and biological). The biological variance, modeled in the beta part of the model, is our key interest for DMC detection. Technical variability coming from sequencing depth difference is inevitable, and it is modeled in the binomial part. Though the beta-binomial model is computationally more complex than other methods, with powerful computing clusters, it is no longer an issue today. In this article, we evaluated four beta-binomial implementations (methylSig, RADmeth, DSS and DSS.general). DSS and methylSig are only for two-group comparison; DSS.general can fit the most general multiple factor experimental design. RADmeth is an iterative algorithm and requires longer computation time. DSS is the best performer for small sample size. The implementation is not limited to the ones that we listed here. For example, model-based analysis of bisulfite sequencing data (MOABS) [12] also uses a beta-binomial hierarchical model for DMC and DMR detection between group conditions and solves the model specification by the Empirical Bayes approach. This study focused on DMC statistical method classes but not necessarily all software implementations in the public domain. Statistical methods such as t-test, U test, Fisher’s exact or χ2 test are readily available in many different software packages, and there is no need to evaluate all the packages. DMR methods that do not provide CpG level statistical testing were not included. For example, Bisulfighter [30] uses Hidden Markov Model (HMM) to detect DMRs, but it only works with a pair of samples. Study design with biological replicates in a group would need to pool all samples together first, and a possibility of hyper, hypo or no change is calculated by comparing the two pooled samples, which is not comparable with our study. In this study, we used the hematologic tumor samples that likely have larger variance across individuals than normal tissues but smaller than solid tumors, as hematologic tumors generally are more homogeneous with less contamination of other cell types. To confirm this, we examined other RRBS data in our collection for normal adipose tissues, lung adenocarcinoma, stomach adenocarcinoma, colorectal adenocarcinoma and pancreatic adenocarcinoma, and indeed, this was the case. The normal adipose tissues had slightly lower variability across different individuals; however, when we tested the statistical methods, the similar results and conclusions were obtained (data not shown). All solid tumors evaluated had slightly higher variability across different individuals (mean SD of CpGs from 0.080 to 0.114 compared with CML’s 0.075).The observation and conclusions from our evaluations can be generalizable to RRBS or similar data, but sample size estimation needs to be tissue (tumor) type specific. A larger sample size is needed for solid tumors for DMC detection. In our study, we adopted a real RRBS data set for the purpose of best reflection of the reality. Before that, we also attempted to use an RRBS simulator ‘rrbs-sim’ [31], which generated totally simulated RRBS data. This simulator starts from modeling site locations, to coverage scores and lastly methylation levels. Though the simulator used advanced statistical techniques to model the spatial correlation and the spatial dependence structure in coverage scores and methylation levels, the generated data still have less satisfied variation level compared with the real-world data, which will then lead to misleadingly inflated type I error rates. In summary, our comprehensive assessment of both ratio- and count-based statistical models concludes that although ratio-based tests are easier to work with, they are less sensitive to detect minor methylation changes. The preferred method is the beta-binomial model that can capture both biological and technical variability to achieve the well-balanced sensitivity and specificity. Many implementations are designed for two-group comparison. To our best knowledge, the latest DSS. general [14] is the implementation that provides the most flexibility to fit the beta-binomial model from general experimental design. Key Points Base resolution DNA methylation measurement through bisulfite treatment and next-generation sequencing is an increasingly popular method for genome-wide DNA methylation profiling, which includes RRBS, targeted capturing and WGBS. Differential methylation detection at CpG level (DMC) is the most common statistics for biological interpretation and is often the first step for DMRs identification. Although methylation ratio at CpG level can be directly used for comparison, it does not carry the coverage information for accurate methylation estimate. By using real RRBS data along with simulation, we demonstrated the relative performances of eight statistical models (with multiple implementations), four for ratio-based and four for coverage count-based models. The ratio-based tests are easier to work with but are less sensitive to detect minor methylation changes between contrast groups, particularly when sample size is small. The beta-binomial model is the best method for balanced sensitivity and specificity in DMC detection; some count-based methods that ignore technical and biological variability have high false-positive rates and should be avoided. MethylSig, RADmeth, DSS and DSS.general give the similar result; however, MethylSig and DSS are only for two-group comparison, and RADmeth and DSS.general can incorporate covariates. Supplementary data Supplementary data are available online at http://bib.oxfordjournals.org/. Funding The Center for Individualized Medicine at Mayo Clinic. Yun Zhang is a PhD candidate in Statistics with concentration on Biostatistics and Computational Biology in the Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY. She was a summer intern in the Division of Biomedical Statistics and Informatics, Department of Health Sciences Research at Mayo Clinic Rochester, Minnesota when this work was performed. Saurabh Baheti is a Senior Informatics Specialist with Master’s Degree in Bioinformatics in Division of Biomedical Statistics and Informatics, Department of Health Sciences Research at Mayo Clinic Rochester, Minnesota. Zhifu Sun is a Senior Associate Consultant and Associate Professor in Division of Biomedical Statistics and Informatics, Department of Health Sciences Research at Mayo Clinic Rochester, Minnesota. His research focus is genomics and epigenomics. References 1 Sun Z , Cunningham J , Slager S , et al. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis . Epigenomics 2015 ; 7 : 813 – 28 . Google Scholar CrossRef Search ADS PubMed 2 Weisenberger DJ , Berg DVD , Pan F , et al. Comprehensive DNA Methylation Analysis on the Illumina Infinium Assay Platform. Application Notes: Illumina Epigenetic Analysis . Illumina, Inc , 2008 . marketing/documents/products/appnotes/appnote_dna_methylation_ analysis_infinium.pdf. 3 Dedeurwaerder S , Defrance M , Bizet M , et al. A comprehensive overview of Infinium HumanMethylation450 data processing . Briefings in Bioinformatics 2014 ; 15 : 929 – 41 . Google Scholar CrossRef Search ADS PubMed 4 Li D , Xie Z , Pape ML , et al. An evaluation of statistical methods for DNA methylation microarray data analysis . BMC Bioinformatics 2015 ; 16 : 217. Google Scholar CrossRef Search ADS PubMed 5 Du P , Zhang X , Huang CC , et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis . BMC Bioinformatics 2010 ; 11 : 587. Google Scholar CrossRef Search ADS PubMed 6 Zhuang J , Widschwendter M , Teschendorff AE. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform . BMC Bioinformatics 2012 ; 13 : 59. Google Scholar CrossRef Search ADS PubMed 7 Lister R , Pelizzola M , Dowen RH , et al. Human DNA methylomes at base resolution show widespread epigenomic differences . Nature 2009 ; 462 : 315 – 22 . Google Scholar CrossRef Search ADS PubMed 8 Akalin A , Kormaksson M , Li S , et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles . Genome Biology 2012 ; 13 : R87. Google Scholar CrossRef Search ADS PubMed 9 Xu H , Podolsky RH , Ryu D , et al. A method to detect differentially methylated loci with next-generation sequencing . Genetic Epidemiology 2013 ; 37 : 377 – 82 . Google Scholar CrossRef Search ADS PubMed 10 Dolzhenko E , Smith AD. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments . BMC Bioinformatics 2014 ; 15 : 215. Google Scholar CrossRef Search ADS PubMed 11 Park Y , Figueroa ME , Rozek LS , et al. MethylSig: a whole genome DNA methylation analysis pipeline . Bioinformatics 2014 ; 30 : 2414 – 22 . Google Scholar CrossRef Search ADS PubMed 12 Sun D , Xi Y , Rodriguez B , et al. MOABS: model based analysis of bisulfite sequencing data . Genome Biology 2014 ; 15 : R38. Google Scholar CrossRef Search ADS PubMed 13 Feng H , Conneely KN , Wu H. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data . Nucleic Acids Res 2014 ; 42 : e69. Google Scholar CrossRef Search ADS PubMed 14 Park Y , Wu H. Differential methylation analysis for BS-seq data under general experimental design . Bioinformatics 2016 ; 32 : 1446 – 53 . Google Scholar CrossRef Search ADS PubMed 15 Hansen KD , Langmead B , Irizarry RA. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions . Genome Biology 2012 ; 13 : R83. Google Scholar CrossRef Search ADS PubMed 16 Meldi K , Qin T , Buchi F , et al. Specific molecular signatures predict decitabine response in chronic myelomonocytic leukemia, The . Journal of Clinical Investigation 2015 ; 125 : 1857 – 72 . Google Scholar CrossRef Search ADS PubMed 17 Sun Z , Baheti S , Middha S , et al. SAAP-RRBS: Streamlined Analysis and Annotation Pipeline for Reduced Representation Bisulfite Sequencing . Bioinformatics 2012 ; 28 : 2180 – 1 . Google Scholar CrossRef Search ADS PubMed 18 Day K , Waite LL , Thalacker-Mercer A , et al. Differential DNA methylation with age displays both common and dynamic features across human tissues that are influenced by CpG landscape . Genome Biology 2013 ; 14 : R102. Google Scholar CrossRef Search ADS PubMed 19 Florath I , Butterbach K , Muller H , et al. Cross-sectional and longitudinal changes in DNA methylation with age: an epigenome-wide analysis revealing over 60 novel age-associated CpG sites . Human Molecular Genetics 2014 ; 23 : 1186 – 201 . Google Scholar CrossRef Search ADS PubMed 20 Barfield RT , Kilaru V , Smith AK , et al. CpGassoc: an R function for analysis of DNA methylation microarray data . Bioinformatics 2012 ; 28 : 1280 – 1 . Google Scholar CrossRef Search ADS PubMed 21 Ferrari S , Cribari-Neto F. Beta regression for modelling rates and proportions . Journal of Applied Statistics 2004 ; 31 : 799 – 815 . Google Scholar CrossRef Search ADS 22 Cribari-Neto F , Zeileis A. Beta regression in R . Journal of Statistical Software 2010 ; 34 : 1 – 24 . Google Scholar CrossRef Search ADS 23 Rao JN , Scott AJ. A simple method for the analysis of clustered binary data . Biometrics 1992 ; 48 : 577 – 85 . Google Scholar CrossRef Search ADS PubMed 24 Wang D , Yan L , Hu Q , et al. IMA: an R package for high-throughput analysis of Illumina's 450K Infinium methylation data . Bioinformatics 2012 ; 28 : 729 – 30 . Google Scholar CrossRef Search ADS PubMed 25 Aryee MJ , Jaffe AE , Corrada-Bravo H , et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA Methylation microarrays . Bioinformatics 2014 ; 30 : 1363 – 9 . Google Scholar CrossRef Search ADS PubMed 26 Warden CD , Lee H , Tompkins JD , et al. COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis . Nucleic Acids Research 2013 ; 41 : e117. Google Scholar CrossRef Search ADS PubMed 27 Juhling F , Kretzmer H , Bernhart SH , et al. metilene: fast and sensitive calling of differentially methylated regions from bisulfite sequencing data . Genome Research 2016 ; 26 : 256 – 62 . Google Scholar CrossRef Search ADS PubMed 28 Chatterjee A , Stockwell PA , Rodger EJ , et al. Genome-wide DNA methylation map of human neutrophils reveals widespread inter-individual epigenetic variation . Scientific Reports 2015 ; 5 : 17328. Google Scholar CrossRef Search ADS PubMed 29 Tsai PC , Bell JT. Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation . International Journal of Epidemiology 2015 ; 44 : 1429 – 1441 . Google Scholar CrossRef Search ADS PubMed 30 Saito Y , Tsuji J , Mituyama T. Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions . Nucleic Acids Research 2014 ; 42 : e45. Google Scholar CrossRef Search ADS PubMed 31 Lacey MR , Baribault C , Ehrlich M. Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments . Stat Appl Genet Mol Biol 2013 ; 12 : 723 – 42 . Google Scholar CrossRef Search ADS PubMed © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Briefings in Bioinformatics Oxford University Press

Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data

Loading next page...
 
/lp/ou_press/statistical-method-evaluation-for-differentially-methylated-cpgs-in-oGNEKTLZ54
Publisher
Oxford University Press
Copyright
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1467-5463
eISSN
1477-4054
D.O.I.
10.1093/bib/bbw133
Publisher site
See Article on Publisher Site

Abstract

Abstract High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed. bisulfite next-generation sequencing, differential methylation, statistical method comparison Background Base resolution DNA methylation profiling through methylation microarray or high-throughput sequencing is widely used in biomedical research [1]. The former is represented by Illumina Infinium methylation microarray such as HumanMethylation450 BeadChip, and the latter is often achieved by reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing (WGBS). For the methylation microarray data, the average beta, representing the methylation ratio of methylated versus total signal intensity at a CpG site [2, 3], is the favorite format for data analysis and interpretation, as it is ‘self-normalized’ within a sample and intuitive to understand methylation changes. For differential methylation, both parametric and nonparametric tests are used directly on the beta value [3, 4], although an alternative approach to use M value (the ratio of methylated versus non-methylation signal) is proposed to address the heteroscedasticity issue across different CpG sites [5]. Comprehensive performance comparison and evaluation of these statistical methods for microarray data have been reported [4, 6]. In general, the performances of different methods differ more when sample sizes are small. Unlike microarray, the high-throughput bisulfite sequencing not only generates similar data as the average beta (i.e. the methylation ratio of methylated versus total coverage at a CpG site) but also much higher resolution of base counts, the numbers of methylated or unmethylated cytosines. For easier analysis, the methylation ratio data can be analyzed in the same way as those used for microarray, and the relative performance of different statistical methods obtained from Illumina microarray should apply [4, 6]. However, what makes the sequencing data unique is that the count data follow a particular distribution that can be modeled by count-based statistical models. The digital count also carries the sequence coverage information that can be used for better estimate of a methylation level. The most simple and common approach for differential methylation is to construct a 2 × 2 table from methylated and unmethylated cytosine counts from a pair of samples and conduct Fisher’s or χ2 test for independence [7, 8]. However, this approach does not take biological variability into consideration. An alternative is proposed with improved performance [9]. However, both cannot incorporate covariates and make them unfit for a larger study with multiple variables or batch effects. Generalized linear model (GLM) for binary data (i.e. logistic regression) is based on binomial distribution assumption, but methylation data show much higher variability. GLM with quasi-binomial density function can relax the variance structure to some extent. More recent proposal is to use beta-binomial model [10–14]. Beta distribution is a general distribution on the support of (0,1), and it entitles flexible environment for methylation counts to cope with variability from both biological and technical prospective. However, independent evaluation of its performance with other count-based methods and methylation ratio-based statistics has not been available. Statistical methods for differentially methylated CpGs (DMCs) not only are essential for base-level methylation difference but also the first step for differentially methylated regions (DMRs) by many programs for methylation sequence data [8, 11, 12, 15]; thus, the performance of DMC methods has multifold impacts. Application of statistical methods on the count data is not straightforward for most biologists, not to mention to choose the most appropriate one. It is often a question that how much I will compromise to use the convenient methylation ratio data over the count data. In this study, we hypothesized that conducting DMC detection on the ratio data may not be as accurate as modeling the counted-based statistical models; because the count-based models are so diverse, the accuracy can differ dramatically depending on whether technical and biological variabilities are considered. To confirm the hypotheses, we have systematically evaluated statistical methods on both the ratio and the count data in parallel for their performance in DMC detection. We tested their type I error control, sensitivity and specificity in detecting small to large methylation differences in the context of intermediate to small sample size. We also compared their agreement and dived deep into the cause of the discrepancies. Materials and methods Data sets Instead of simulating methylation sequencing data, we used a real RRBS data and simulated methylation differences. By this, we can preserve the technical and biological characteristics of bisulfite sequencing data across CpG sites and samples. For this, we downloaded a data set from GEO (GSE61163), where the tumor samples from 39 individuals with chronic myelomonocytic leukemia (CML) were sequenced with ERRBS protocol before treatment. The detailed library preparation and sequencing were described in the original publication [16]. The raw sequence data (single end, 59 base reads from HiSeq 2000) were processed by our internally developed bisulfite sequence analysis pipeline, SAAP-RRBS [17]. CpGs with at least 10X coverage were kept for further analysis. Thirty individual samples were selected from the 39 and assigned to two groups randomly. To save computational time, only the CpG data from chromosome 22 were used, which had 86 300 CpGs. The CpGs with missing value in any of the samples or constant methylation across samples (either 0 or 1) were removed. After the filtering, we had 53 422 CpGs for evaluation. The first data set was the randomized data, which imitate the null distribution for no methylation difference between the two groups. The purpose was to investigate the type I error control for all the statistical models. The P-value distribution and type I error rate from each test were compared. The second to fourth data sets were created from the first by adding (or subtracting) 2, 5, 10 and 15% methylation value to 10% of CpGs to evaluate each model’s sensitivity and specificity to detect the differences. The ratio and count data were tested separately by applying respective models, and, therefore, we have five simulations and 10 data sets (five methylation ratio data sets and five methylation count data sets) for evaluation as detailed in Figure 1. Figure 1 View largeDownload slide Simulation scheme and data sets. A total of five simulations with 10 data sets are generated, five for ratio and five for count data. Each data set is tested by eight statistical methods (some with multiple implementations). Figure 1 View largeDownload slide Simulation scheme and data sets. A total of five simulations with 10 data sets are generated, five for ratio and five for count data. Each data set is tested by eight statistical methods (some with multiple implementations). Statistical methods For ratio based data, we looked at t-test (parametric), U test (nonparametric), analysis of variance (ANOVA; linear models) and beta regression model. For count-based data, we evaluated contingency table method [7, 8], clustered data analysis [9], logistic regression [8] and beta-binomial model [10–14]. These methods are briefed as below. Analysis on methylation ratio Using methylation ratio data has several advantages. First, although all CpG sites start from the same number of DNA molecules, the depth of coverage for each CpG sites can vary dramatically because of genomic feature and amplification differences. However, the methylation ratio would normalize the coverage difference. Second, the normalized methylation ratio allows modeling the variability directly and easily. Third, the continuous variable can be easily fit into many commonly used statistics models for simple to complex multivariate analyses. However, the disadvantages of the ratio data are (1) the methylation ratio estimate at low sequence coverage is less reliable. (2) Methylation ratio is bounded between 0 and 1, and models such as t-test expect the data range from −∞ to +∞. (3) Variance of CpGs across different methylation range is different [5], and this heteroscedasticity can be an issue for statistical model. For this reason, M value instead of beta value in microarray data is proposed [5]. By the same token, this can be applied to the ratio data by logit transformation. T-test T-test tests if two sets of samples have significantly different means and come from different populations. It is a parametric test and requires the samples to be independent and normally distributed. For a two-sample t-test, if the unknown variance is believed to be equal for both groups, the common variance is estimated by pooling all samples and estimating one common variance. However, if variances for different groups are expected to be different, the unequal variance method called Welch’s t-test can be used. ANOVA ANOVA is a tool based on linear models for multiple group comparison. It tests whether the means of several groups are equal. Linear models are flexible to incorporate clinical covariates. In methylation studies, clinical covariates such as gender and age are shown to be influential on methylation level [18, 19]. CpGassoc [20] is an R package that fits DNA methylation data with phenotype covariates to linear fixed- or mixed-effects models. There is an option to logit-transform the ratio data before fitting the model. CpGassoc conducts ANOVA test and can easily include clinical covariates. When there is only one two-level categorical covariate (like two-group comparison), ANOVA is equivalent to t-test. Our evaluation focused on the logit-transformed data, equivalent to M value. Nonparametric test Considering the underlying distribution of methylation ratio among study population is unknown, a safer approach would be to use nonparametric tests such as Mann–Whitney U Test (a.k.a. Wilcoxon rank-sum test), as it is distribution assumption free. It works more efficiently when data are not normally distributed. Instead of comparing two population means in the t-test, U test evaluates if two comparison samples have identical medians, and it is less affected by outliers like means. The drawback of the Mann–Whitney U test is that it may generate many statistical ties for many CpGs in terms of P-values, although their methylation mean differences between two groups can be clearly different. It will be more problematic when sample size is small. Beta regression Beta distribution is a flexible continuous probability distribution that captures data distributed in the unit interval from 0 to 1. A regression method developed to model beta-distributed dependent variable in relation with a set of independent variables is called the beta regression model [21]. The model associates the mean of the dependent variable (methylation ratio or average beta) with a set of regressors through a linear predictor with unknown coefficients through a link function. The R package betareg [22] implements the beta regression model for general purposes, which estimates the model parameters through maximum likelihood algorithm. Analysis on read counts or CpG coverage data Read count is the number of methylated and unmethylated cytosines at a CpG site. Test of independence (χ2 or Fisher’s exact test) This approach sums the counts across samples within a group for a given CpG site, which results in a 2 × 2 contingency table (methylated/unmethylated × case/control). χ2 test or Fisher’s exact test is applied to test for independence. This approach only does group comparison. Noted is that by pooling reads in the group, the variability of sequence depths and interindividual variation information is lost. Clustered data analysis Realizing the issues of 2 × 2 table approach, Xu [9] proposed a method based on clustered data analysis [23] to incorporate the between-subject variability. For a given CpG site, the method treats sequencing reads as clusters for each individual and estimates a common methylation proportion adjusted by design effect. This method can be considered as performing the χ2 test for independence in a series of 2 × 2 contingency tables. The implementation of this method is simple and computationally efficient. It is robust to variation in sequencing depth for individual subjects. However, the test is based on large-sample approximation. Logistic regression A particular CpG site can be either methylated (denoted as 1) or unmethylated (denoted as 0) as a binary status. Logistic regression is a form of GLM for binary data. The regression is performed across individuals or biological replicates under study conditions. The most common logistic regression assumes the underlying (error) distribution is binomial. However, the methylation data clearly bare more biological variability than the binomial assumption. To overcome the extra variability, a quasi-binomial link can be used in the GLM implementation. Quasi-binomial distribution models binomial data with overdispersion. Beta-binomial model Beta-binomial deals with the overdispersion in methylation data by hierarchical modeling. For a given CpG site, let Xi be the number of methylated Cs for subject i, and ni be the total number of read coverage for subject i; the beta-binomial model assumes Xi∼Binomialni, p where p∼Betaα,β. We say that Xi follows a beta-binomial distribution. The beta-binomial distribution is another way to model binomial data with overdispersion. By assuming the success probability of a binomial distribution being random draws from a beta distribution, the beta-binomial model allows more variability than the binomial model. The beta part in this model can be re-parameterized using mean and a dispersion parameter. If the dispersion parameter is zero, it reduces to a binomial model. Using the two-stage modeling, the beta-binomial model is flexible enough to capture both the technical (coverage) and biological (methylation level) variability. Technical variability is presented in the subject-specific coverage ni from the binomial part. The success probability of the binomial distribution is indeed the methylation ratio. Following the same rationale as the beta regression model, biological variability in methylation ratios is captured in the beta part of the beta-binomial model. There are several software programs implementing the beta-binomial model. Dispersion Shrinkage for Sequencing data (DSS) [13, 14] and methylSig [11] are R packages; RADmeth [10] is an open source command line software. The original DSS [13] is for DMC detection between two groups and its recent addition extends to more flexible general study design with multifactors (hereinafter, DSS.general) [14]. DSS is a Bayesian approach that solves the beta-binomial model by assigning the dispersion parameter with a log-normal prior distribution. The null hypothesis of equal mean methylation between two groups is tested by the Wald test. Owing to the Bayesian nature of borrowing information across different CpG sites, DSS is designed when a sample size is small. Both methylSig and RADmeth use maximum likelihood estimator (MLE) for parameter estimation and pair with the likelihood ratio test for differential methylation detection between two groups. However, there is no closed form for the MLE; methylSig proposes to use an approximation method based on Wilks’ theorem, and RADmeth is based on a GLM framework for beta-binomial regression with logit link. DSS and methylSig are limited for two-group comparison. Though RADmeth is computationally intensive (as it is an iterative algorithm), its GLM framework allows it to be extended to include other experimental design factors (but only categorical factors). DSS.general [14] follows a GLM framework for beta-binomial distribution and uses an ‘arcsine’ link function. Parameter estimation is based on the arcsine transformed data with generalized least square approach without relying on an iterative algorithm. It greatly saves computational time. The disadvantage of the arcsine link is that it is more difficult to interpret compared with the commonly used logit or probit link. Summary of these methods are presented in Table 1. Noted is that for some statistical methods, multiple implementations are evaluated; for example, under the beta-binomial model RADmeth, MethylSig, DSS and DSS.general are evaluated. Table 1 Summary of different statistical methods on differential CpG methylation Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Note. Almost all ratio-based methods used for Illumina Infinium 450K microarray can be used for bisulfite sequence data, and some may need slight customization; many tests such as t-test, χ2 and Fisher’s exact tests are readily available in various statistical packages, functions/methods in R are used for this evaluation. Some DMR methods (metilene, MOABS) do not generate comparable CpG level statistics for evaluation. Table 1 Summary of different statistical methods on differential CpG methylation Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Test category Advantage Disadvantage Package with method implementation Ratio data-based methods  T-test Simple Violation of normality assumption; violation of homoscedasticity; methylation ratio is restricted between 0 and 1; overloaded data near 0 and 1 BSmooth [15], IMA [24], Minfi [25], COHCAP [26]  ANOVA Flexible to incorporate clinical covariates; accommodates different study designs Same as t-test CpGassoc [20], Minfi [25], COHCAP [26]  Nonparametric test (U test) Free of underlying distribution Less power if parametric test available IMA [24], metilene [27]  Beta regression Fits naturally to the methylation ratio; incorporates covariates; accommodates data heteroscedasticity Slow to handle millions of CpG sites R function [21] Count data-based methods  Contingency table test Simple Ignores individual sequencing depth; ignores between-subject variability methylKit [8]; COHCAP [26]  Clustered data analysis Robust Reduced power R code from the authors [9]  Logistic regression Incorporates covariates Hard to capture overdispersion methylKit [8]  Beta-binomial model Models overdispersion; contains coverage information Computationally intensive RADmeth [10], MethylSig [11], MOABS [12], DSS [13], DSS.general [14] Note. Almost all ratio-based methods used for Illumina Infinium 450K microarray can be used for bisulfite sequence data, and some may need slight customization; many tests such as t-test, χ2 and Fisher’s exact tests are readily available in various statistical packages, functions/methods in R are used for this evaluation. Some DMR methods (metilene, MOABS) do not generate comparable CpG level statistics for evaluation. Results Characteristics of DNA methylation from RRBS data across individuals As the first step, we examined the diagnostic properties of the RRBS data: methylation and coverage variability across individuals. Figure 2A shows the relationship between the mean and SD of methylation ratio at each CpG among 30 samples. The methylation ratio displays high variation around intermediately methylated CpG sites but little for the CpGs that are fully methylated or not methylated at all. This heteroscedasticity is typical for methylation data as described previously for Illumina microarray data [5] and RRBS data [28]. Additionally, the data are not symmetrically distributed. The majority of CpG sites are clustered near the extreme values 0 (50.4% from 0 to 0.2) and 1 (36.7% from 0.8 to 1). For the count data, the biggest challenge is the high variability of coverage across samples as shown in Figure 2B. As an illustration, we highlighted one highly varied CpG with red points across samples, which run from 10s to 2500. These properties of RRBS data have significant implications to the model selections and result interpretation from different models, as we describe in this article. Figure 2 View largeDownload slide Properties of RRBS data. (A) Mean and SD of methylation ratio under the null at 53 422 CpG sites across the randomly selected 30 subjects. X-axis is the mean methylation ratio at each CpG site, and y-axis is the SD of the methylation ratios at corresponding site (average SD = 0.07, maximum SD = 0.41). (B) Boxplot of coverage count at each CpG site across samples. The red/star points represent one specific CpG site that shows largely varying coverage count across samples. Figure 2 View largeDownload slide Properties of RRBS data. (A) Mean and SD of methylation ratio under the null at 53 422 CpG sites across the randomly selected 30 subjects. X-axis is the mean methylation ratio at each CpG site, and y-axis is the SD of the methylation ratios at corresponding site (average SD = 0.07, maximum SD = 0.41). (B) Boxplot of coverage count at each CpG site across samples. The red/star points represent one specific CpG site that shows largely varying coverage count across samples. Performance of different methods under null and alternative simulations For the eight statistical methods on the 10 data sets, we looked at the performance of the methods through four perspectives: (1) under the null, what does the empirical distribution of P-values look like?, (2) Given there is no differential methylation, is the type I error well controlled under the null?, (3) For known level of differential methylation, how powerful are these methods in detecting true DMCs? and (4) How many positive findings are commonly identified by different methods? Computation time and handling of special data issue The computation time was not an issue for most of the methods (within a matter of minutes) except RADmeth and beta regression. RADmeth took 1–2 h, and beta regression took half a day to run the simulated data. RADmeth is an iterative algorithm, and its computation time can be improved on high-performance computing cluster. The problem with beta regression is that the original R function for beta regression was not designed for high-throughput genomic data. Although it takes reasonable amount of time for running a few beta regressions, it became a major issue when there are over hundreds of thousands of CpG sites. Note that some tests (i.e. beta regression and clustered data analysis) may fail at the CpG sites with less ideal data, which result NA for the P-values. For example, under the null, 946 sites failed in the beta regression with logit link, 3 sites failed with probit link and 6121 sites failed in the clustered data analysis test. The maximum likelihood algorithm for solving the beta regression fails with data that produce non-invertible matrices in the calculation. For clustered data analysis, the test may fail at the positions where all observed counts in one group happen to be methylated or unmethylated. This will make zero for observed unmethylated (or methylated) proportion in the pooled contingency table and lead to zero in denominator in the following calculation. P-value distribution Theoretically, P-values should form a uniform distribution under the null. Figure 3 shows the empirical distributions for all available P-values computed by each method. For illustration purpose, we also included the output from variants of some methods. The left column shows tests for ratio data. From top to bottom, the tests are t-test with equal variance, t-test with unequal variance, ANOVA test, ANOVA with logit-transformation, U test, beta regression with logit link and beta regression with probit link. We observed similar patterns for the first four histograms (in fact, Plot 1 and Plot 3 are exactly the same, as the t-test is the special case of ANOVA). The noticeable characteristics are that the P-values are all peaked around 0.3–0.35. This pattern was also observed before [9]. This may be explained by the overloading of the ratios around the extreme values 0 and 1 that may lead P-values to be overloaded at some certain value. Similarly, the P-values from the U test also have a spike around 0.35–0.4. However, the abundant P-values of 1 come from the excessive of ties in the data, which bring deficiency of the U test. The bottom two histograms from beta regression look much closer to a uniform distribution. Figure 3 View largeDownload slide Histogram of P-value distribution under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. Figure 3 View largeDownload slide Histogram of P-value distribution under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. The right column of Figure 3 shows tests for the count data. The first two plots are from tests of independence based on the pooled contingency table. In this scrambled data, we have 15 samples in each of the testing groups. By pooling all samples in the same group together, contingency tables for each CpG site usually contained hundreds to thousands of coverage counts, which lead to a high peak of significant small P-values between 0 and 0.05. Clustered data analysis was able to effectively correct the naïve method. Logistic regressions with and without overdispersion were problematic for having too many P-values equal to 1. Additionally, the standard logistic regression induced a great amount of P-values being significant. With quasi-binomial link, an extra variability can be captured, so the histogram was improved at the zero-end. According to the empirical P-value distribution, the best count-based test was the beta-binomial model. Among different implementations, DSS and DSS.general have the closest empirical distribution to a uniform distribution. Type I error control As no methylation difference is expected under the null simulation, any significant DMCs would be considered as false positives. Figure 4 compares the type I error rate for each test. In general, ratio-based tests are more conservative, and all tests have <5% of CpGs that were claimed as significant (left panel). T-test, equal or unequal variance, is the most stringent for type I error control, and U test is a little bit more relaxed than t-test. ANOVA with logit transformation has the type I error rate that is the closest to the expectation and so is the beta regression in which the different link functions appear not having much difference. Figure 4 View largeDownload slide Observed type I error rates under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. The numbers on the bars are the observed type I error rates. The dashed line is expected type I error rate (0.05). Figure 4 View largeDownload slide Observed type I error rates under the null (15 versus15 samples). The left panel is for ratio-based, and the right is for count-based statistical models. The numbers on the bars are the observed type I error rates. The dashed line is expected type I error rate (0.05). The type I error control is dramatically variable for the count-based models where some appear problematic (Figure 4, right panel). The error rates of χ2 test, Fisher’s exact test and logistic regression are over 30% (33.3, 34.4 and 33.8%, respectively). The clustered method brought down the type I error dramatically, although still higher than expectation (7.5%). The logistic regression with quasi-binomial link, which captures the overdispersion in data, seems having the type I error well controlled or a little bit stringent (4%). Finally, all implementations of the beta-binomial model give close type I error rate (4–5.9%) to the expectation. Receiver operating characteristic curves for methylation difference detection We looked at the receiver operating characteristic (ROC) curve and area under the curve (AUC) of each test for their sensitivity and specificity under alternative simulations. Under the alternative, we assumed that 10% of the CpG sites was truly differentially methylated at four different magnitudes of 2, 5, 10 and 15%. All the tests evaluated under the null simulation were carried out to the four alternative data sets. As noticed in the null simulation, the statistic tests with different options (link functions or variance assumption) from the same class usually performed similarly, and some test options are redundant. For simplicity, we only report the results of representative tests from each modeling class here. Figure 5 shows the results for the alternative simulations. Each row represents a specific differential methylation level (2, 5, 10 and 15%). At the 2% methylation difference, all tests have a limited detection power. The ratio-based tests have AUC statistics ranging from 0.7 to 0.8, with a clear ordering of the ROC curves in which from the worst to the best performer are t-test, U test, ANOVA with logit transformation and beta regression. The count-based tests have similar AUC statistics, ranging from 0.71 to 0.79. Overall, the 2% methylation difference appears too small for most of the tests to have a reliable detection. Figure 5 View largeDownload slide ROC curves with 2, 5, 10 and 15% methylation difference (15 versus 15 samples). The grey vertical line in each plot is the nominal 0.05 significant level for the false-positive rate, and each open circle along the curve means the actual 0.05 cutoff point in the test. Figure 5 View largeDownload slide ROC curves with 2, 5, 10 and 15% methylation difference (15 versus 15 samples). The grey vertical line in each plot is the nominal 0.05 significant level for the false-positive rate, and each open circle along the curve means the actual 0.05 cutoff point in the test. At the 5% methylation difference, ANOVA with logit transformation becomes the best performer among the ratio-based tests, and the beta-binomial model (methylSig, RADmeth and DSS.general, although DSS had relatively weak performance in the class) is the best performer among the count-based tests. The slightly better performance of the logit-transformed ANOVA model over U test or beta regression may be explained by the better handling of the heteroscedasticity in methylation data, as reported before [5, 6]. No matter which implementation used, the better performance of the beta-binomial model was likely the result of its better handling of type I error, by accounting for both biological and technical variability through a two-layer model specification. As methylation difference increases, the detection power was boosted accordingly. The detection power is about 60, 80 and 90% at 5, 10 and 15% methylation difference, respectively. At the 10% methylation difference, the ROC curves of the ratio-based tests became close to each other. When reaching to 15% methylation difference, almost all the tests had a perfect ROC plot, except for those tests that might fail in some CpG sites. In general, t-test has the worst ROC curve and AUC statistic, and U test has better performance than the t-test. The performance of the beta regression model may be affected by the failure of the test for some CpG sites, as its curve jumps quickly and then declines the growing rate. For count-based tests, all other tests are much worse than the beta-binomial model in terms of both the shape of ROC curve and the value of AUC statistic (the larger, the better). For different implementations, the performances of methylSig and RADmeth are almost identical, and DSS.general has slightly better performance than these two. As methylation difference increases from 2 to 15%, the dominance of the best performer in each data class becomes clearer, i.e. the best performance curve dominates all other curves in respective plots. On Figure 5, we can also compare the nominal and actual 5% significance level. In terms of the type I error control, the results under the alternatives are consistent with the result under the null (Figure 4). All ratio-based tests were able to control the false-positive rate under the nominal level, and they were a bit conservative. For count-based tests, the logistic regression with overdispersion, DSS and DSS.general were able to control the false positives under the nominal level. The clustered data analysis, methylSig and RADmeth had a little inflated type I error rates, but the Fisher’s exact test (and/or χ2 test) lost control of the type I error badly. Though Fisher’s test achieved a higher power, it was paid at the higher cost of loss of type I error control. On the contrary, the two best performers from each data class, i.e. ANOVA with logit transformation and beta-binomial model (DSS.general), were able to achieve a high detection power while controlling the type I error at the nominal level. Consistency of findings from different tests No matter which statistical test is used, the important question is how much information a researcher will lose or be crammed with too many false findings compared with an alternative test being used. For this, we illustrated the commonly detected DMCs between any two tests in a heatmap. As the 2% methylation difference was too small to have a sound detection, Figure 6 shows the common findings under the 5% methylation difference for all the tests discussed above (similar plots for other alternative simulations can be found in Supplementary Figure S1). The diagonal entries with no background color are the numbers of DMCs identified by each test. Of 53 422 CpGs, Fisher’s test identified 21 156 significant ones. However, we believe that many of them were likely false positives, as >30% of CpGs were claimed as DMCs in our null simulation and other tests identified 5000–6700 DMCs, which is about 9.3–12.5% of the total CpG sites, the expected DMC range from 10% artificially changed CpGs plus the false positive under the null distribution. The row of blue cells and the column of almost all red cells are because of the large number of findings by Fisher’s test. Because of the large discrepancy with other tests, our discussion is focused on other tests. Figure 6 View largeDownload slide Common findings between different tests with 5% methylation difference (15 versus 15 samples). The numbers with white background are the significant CpGs (P<0.05) from each test, and other numbers in each cell represents the ratio of overlapping DMCs between a pair of tests on the row and column (note: the plot is not symmetric). Figure 6 View largeDownload slide Common findings between different tests with 5% methylation difference (15 versus 15 samples). The numbers with white background are the significant CpGs (P<0.05) from each test, and other numbers in each cell represents the ratio of overlapping DMCs between a pair of tests on the row and column (note: the plot is not symmetric). The first four tests in Figure 6 are ratio-based, and the rest seven tests are count-based. Each cell in the heatmap represents the ratio of overlapping DMCs between a pair of tests on the row and column. The plot is not symmetric. For example, the very bottom and far left cell of 0.76 means that 76% of the DMCs identified by DSS.general is also found by t-test with equal variance. Reversely, the number of 0.91 on the very top and far right means that 91% of DMCs by the equal variance t-test is overlapped with the DMCs by DSS.general. In other words, if you use t-test for DMC detection, you may lose >20% of significant CpGs picked up by DSS.general, but if you use DSS.general, you only lose 9% of DMCs by the t-test. Overall, all the tests (except Fisher’s test) achieved 60% or above agreement between each other. Figure 6 reconfirms that the two implementations of beta-binomial model, methylSig and RADmeth, performed similarly. The agreement between was 98 and 95% of the significant findings, respectively. The other two implementations, DSS and DSS.general, identified less number of significant DMCs. Also, all beta-binomial implementations had high agreement with t-test. They were able to identify >90% of the DMCs identified by the t-test. Methylation variability and sample size estimation Understanding the technical and biologic variability among biological samples in DNA methylation sequencing is important for the study design of a methylation experiment. It would be interesting to know the magnitude of random differences occurred in the methylation level between two sets of arbitrarily grouped samples. Such measurement can be considered as the noise level for a treatment effect. Knowing this would also help us to understand the detection of ‘truly’ CpG methylation difference. Figure 7 shows the distribution of mean methylation difference between the two randomly sampled groups for all CpG sites on chr22. We can see that the histogram is roughly bell-shaped and centered at 0 (median = 0 and mean =−0.003). The SD of the mean differences is 0.043. In our alternative simulations, the 2% methylation difference represents half of the SD of the between-group difference because of randomness, the 5% methylation difference is >1 SD and 10% is >2 SD (95% of CpGs have absolute methylation difference <10%) of the random differences. This means 5% CpGs can have over 10% methylation difference purely by chance in this data set. Figure 7 View largeDownload slide Distribution of mean methylation differences between groups under the null (15 versus 15 samples) at all CpG sites. The top right legend shows basic summary statistics for the differences: mean, SD, quantiles and 2.5th and 97.5th percentile. The vertical line indicates zero difference. Figure 7 View largeDownload slide Distribution of mean methylation differences between groups under the null (15 versus 15 samples) at all CpG sites. The top right legend shows basic summary statistics for the differences: mean, SD, quantiles and 2.5th and 97.5th percentile. The vertical line indicates zero difference. Knowing the noise level in RRBS data is helpful for determining an appropriate sample size in designing an experiment. With the minimal methylation difference to be detected and confidence level, the required sample size can be estimated. More explicitly, at significant level α and power 1-β, the sample size calculation based on the group mean methylation ratio is n=fα,β2σ2Δ2, where f(α,β) is a value determined by α and β, σ is the SD of the methylation ratio, and Δ is the expected difference that being regarded as minimal important. For example, for two-sample t-test, fα,β=Zβ+Zα22 where Zβ is the β-percentile of a standard normal distribution. If targeting at 5% significant level and 80% power, fα,β=7.85. In our simulated null ratios, the average SD is 0.07 (Figure 1A). If we consider 1 SD of the null methylation difference being minimal important ( Δ = 0.043), it is necessary to have 42 (n = 41.61) samples in each group to achieve the desired significant level and power. If the interested minimal important difference is ≥2 or 3 SDs, 15 (n = 15.02) samples in each group would be required. It is about the same sample size as we used in this study, so that it corresponds to the sensitivity and specificity shown for the 10 and 15% methylation differences in Figure 5. The above example shows the simplest case for sample size calculation, and the estimated sample size is most suitable for two-sample t-test. However, as illustrated in the previous section, t-test is less sensitive, and the required sample size would be smaller for the more sensitive test like the beta-binomial model. Sample size calculation itself is a complex statistical topic, particularly, for genome-wide studies, and a systematic evaluation of sample size estimation for RRBS experiment is out of the scope of this article. Readers can get more information on this similar topic about sample size estimation for high dense DNA methylation microarray [29]. DMC detection with small sample size To evaluate the sample size effect on DMC detection, we also performed simulation for smaller sample size, i.e. five samples in each group for the same null and alternative simulation scheme as the 15 versus 15 sample design, and these results are provided in Supplementary Figures S2–S4. Under the null, the empirical P-value distribution (Supplementary Figure S2) for most of the tests remained similar pattern as before. Note that the U test had a more problematic P-value distribution because of the severe ties in small sample size. The type I error control (Supplementary Figure S3) got a lot worse for the ratio-based tests. All beta-binomial implementations had a little inflated type I error rate, and DSS had the closest control to the expectation. Under the alternatives, the performance for all the tests became less satisfying (Supplementary Figure S4). With the small sample size, it was particularly difficult to identify small methylation difference. As discussed by [13], DSS indeed outperforms other approaches slightly when sample size is small. The agreement between two tests was limited (about 50% agreement on average) when both sample size and differential methylation signal were small (Supplementary Figure S5A). It is likely a consequence of the near diagonal ROC curves in the first row of Supplementary Figure S4. As the differential methylation signal got stronger, the beta-binomial model quickly regained the ability to identify the DMCs identified by t-test, U test, ANOVA with logit transformation and logistic regression with overdispersion (Supplementary Figure S5B–D). The results from the small sample simulation remind us that all tests have a limited power to detect methylation difference <5%, and these changes are well within the noise level. The beta-binomial model is more sensitive and specific, and DSS should be used to get a relatively trustable result for small sample size. Discussion In this study, we extensively looked into different statistical methods for analyzing DMCs in base resolution next-generation DNA sequencing data. We compared two general categories of statistics for such methylation data, which are the continuous model for methylation ratios and the discrete model for methylation counts. Specifically, we illustrated four tests in each category and evaluated their type I error control, sensitivity and specificity under the null and four alternative simulations. Many of these methods have inevitable shortcomings in modeling the methylation data. Loss of coverage information and variance heteroscedasticity are the major issues for the methylation ratio-based models. The normality assumption of parametric models may also not meet for t-test or ANOVA (Table 1 for summary). Logit transformation of the ratio data appears improving the performance of these tests. However, considering all performance measures, we would recommend keeping the methylation counts and modeling them with the beta-binomial distribution. The simple way to model the count data by pooling read coverages within a group to construct a contingency table and testing with χ2 or Fisher’s test is problematic. First, the sequence depth from each individual can be different (Figure 2B), and the ones with large sequencing coverage have dominating influence on the test statistic. Second, by pooling reads across samples together, biological variability is ignored. Ignoring an extra variability may induce spurious significant findings. Third, treating coverage counts as observations in the test also leads to inflated type I error. As there are more samples collected, the pooled coverage counts accumulate quickly for the case and control groups. As in methylKit [8], Fisher’s test is used in the absence of replicates; if there are multiple samples per group, methylKit will use the logistic regression test. Furthermore, even the logistic regression is used for a study with a relatively larger sample size [8], regular logistic regression performed poorly without overdispersion moderation. When both the dependent variable and the independent variable are dichotomous, χ2 test is a special case of logistic regression; thus, they had similar performance. The method using clustered data analysis techniques is an improvement to the pooled contingency table. We noted that this method is based on large sample approximation, and the simulation study by the proponents shows that the performance of this approach improves as sample size increases [9]. When sample size is small, we may expect an inflated type I error. We also pointed out that this test would fail when the data are in the extreme case, i.e. all observed counts in one group happen to be methylated (or unmethylated). The key for constructing a good count-based model for methylation sequencing experiment is the model’s ability to capture the variability in the data from multiple sources. Failing to do so can dramatically impact the results. As we can see in Figure 4, the type I error rate clearly reflects this shortcoming in the construction of these tests (χ2 test, Fisher’s test and the standard logistic regression). To model binary data with additional variability, a general way is to replace the binomial link with the quasi-binomial link in the logistic. More specifically to the methylation data, the beta-binomial model uses two layers of modeling, one for each, to capture one source of variation (technical and biological). The biological variance, modeled in the beta part of the model, is our key interest for DMC detection. Technical variability coming from sequencing depth difference is inevitable, and it is modeled in the binomial part. Though the beta-binomial model is computationally more complex than other methods, with powerful computing clusters, it is no longer an issue today. In this article, we evaluated four beta-binomial implementations (methylSig, RADmeth, DSS and DSS.general). DSS and methylSig are only for two-group comparison; DSS.general can fit the most general multiple factor experimental design. RADmeth is an iterative algorithm and requires longer computation time. DSS is the best performer for small sample size. The implementation is not limited to the ones that we listed here. For example, model-based analysis of bisulfite sequencing data (MOABS) [12] also uses a beta-binomial hierarchical model for DMC and DMR detection between group conditions and solves the model specification by the Empirical Bayes approach. This study focused on DMC statistical method classes but not necessarily all software implementations in the public domain. Statistical methods such as t-test, U test, Fisher’s exact or χ2 test are readily available in many different software packages, and there is no need to evaluate all the packages. DMR methods that do not provide CpG level statistical testing were not included. For example, Bisulfighter [30] uses Hidden Markov Model (HMM) to detect DMRs, but it only works with a pair of samples. Study design with biological replicates in a group would need to pool all samples together first, and a possibility of hyper, hypo or no change is calculated by comparing the two pooled samples, which is not comparable with our study. In this study, we used the hematologic tumor samples that likely have larger variance across individuals than normal tissues but smaller than solid tumors, as hematologic tumors generally are more homogeneous with less contamination of other cell types. To confirm this, we examined other RRBS data in our collection for normal adipose tissues, lung adenocarcinoma, stomach adenocarcinoma, colorectal adenocarcinoma and pancreatic adenocarcinoma, and indeed, this was the case. The normal adipose tissues had slightly lower variability across different individuals; however, when we tested the statistical methods, the similar results and conclusions were obtained (data not shown). All solid tumors evaluated had slightly higher variability across different individuals (mean SD of CpGs from 0.080 to 0.114 compared with CML’s 0.075).The observation and conclusions from our evaluations can be generalizable to RRBS or similar data, but sample size estimation needs to be tissue (tumor) type specific. A larger sample size is needed for solid tumors for DMC detection. In our study, we adopted a real RRBS data set for the purpose of best reflection of the reality. Before that, we also attempted to use an RRBS simulator ‘rrbs-sim’ [31], which generated totally simulated RRBS data. This simulator starts from modeling site locations, to coverage scores and lastly methylation levels. Though the simulator used advanced statistical techniques to model the spatial correlation and the spatial dependence structure in coverage scores and methylation levels, the generated data still have less satisfied variation level compared with the real-world data, which will then lead to misleadingly inflated type I error rates. In summary, our comprehensive assessment of both ratio- and count-based statistical models concludes that although ratio-based tests are easier to work with, they are less sensitive to detect minor methylation changes. The preferred method is the beta-binomial model that can capture both biological and technical variability to achieve the well-balanced sensitivity and specificity. Many implementations are designed for two-group comparison. To our best knowledge, the latest DSS. general [14] is the implementation that provides the most flexibility to fit the beta-binomial model from general experimental design. Key Points Base resolution DNA methylation measurement through bisulfite treatment and next-generation sequencing is an increasingly popular method for genome-wide DNA methylation profiling, which includes RRBS, targeted capturing and WGBS. Differential methylation detection at CpG level (DMC) is the most common statistics for biological interpretation and is often the first step for DMRs identification. Although methylation ratio at CpG level can be directly used for comparison, it does not carry the coverage information for accurate methylation estimate. By using real RRBS data along with simulation, we demonstrated the relative performances of eight statistical models (with multiple implementations), four for ratio-based and four for coverage count-based models. The ratio-based tests are easier to work with but are less sensitive to detect minor methylation changes between contrast groups, particularly when sample size is small. The beta-binomial model is the best method for balanced sensitivity and specificity in DMC detection; some count-based methods that ignore technical and biological variability have high false-positive rates and should be avoided. MethylSig, RADmeth, DSS and DSS.general give the similar result; however, MethylSig and DSS are only for two-group comparison, and RADmeth and DSS.general can incorporate covariates. Supplementary data Supplementary data are available online at http://bib.oxfordjournals.org/. Funding The Center for Individualized Medicine at Mayo Clinic. Yun Zhang is a PhD candidate in Statistics with concentration on Biostatistics and Computational Biology in the Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY. She was a summer intern in the Division of Biomedical Statistics and Informatics, Department of Health Sciences Research at Mayo Clinic Rochester, Minnesota when this work was performed. Saurabh Baheti is a Senior Informatics Specialist with Master’s Degree in Bioinformatics in Division of Biomedical Statistics and Informatics, Department of Health Sciences Research at Mayo Clinic Rochester, Minnesota. Zhifu Sun is a Senior Associate Consultant and Associate Professor in Division of Biomedical Statistics and Informatics, Department of Health Sciences Research at Mayo Clinic Rochester, Minnesota. His research focus is genomics and epigenomics. References 1 Sun Z , Cunningham J , Slager S , et al. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis . Epigenomics 2015 ; 7 : 813 – 28 . Google Scholar CrossRef Search ADS PubMed 2 Weisenberger DJ , Berg DVD , Pan F , et al. Comprehensive DNA Methylation Analysis on the Illumina Infinium Assay Platform. Application Notes: Illumina Epigenetic Analysis . Illumina, Inc , 2008 . marketing/documents/products/appnotes/appnote_dna_methylation_ analysis_infinium.pdf. 3 Dedeurwaerder S , Defrance M , Bizet M , et al. A comprehensive overview of Infinium HumanMethylation450 data processing . Briefings in Bioinformatics 2014 ; 15 : 929 – 41 . Google Scholar CrossRef Search ADS PubMed 4 Li D , Xie Z , Pape ML , et al. An evaluation of statistical methods for DNA methylation microarray data analysis . BMC Bioinformatics 2015 ; 16 : 217. Google Scholar CrossRef Search ADS PubMed 5 Du P , Zhang X , Huang CC , et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis . BMC Bioinformatics 2010 ; 11 : 587. Google Scholar CrossRef Search ADS PubMed 6 Zhuang J , Widschwendter M , Teschendorff AE. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform . BMC Bioinformatics 2012 ; 13 : 59. Google Scholar CrossRef Search ADS PubMed 7 Lister R , Pelizzola M , Dowen RH , et al. Human DNA methylomes at base resolution show widespread epigenomic differences . Nature 2009 ; 462 : 315 – 22 . Google Scholar CrossRef Search ADS PubMed 8 Akalin A , Kormaksson M , Li S , et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles . Genome Biology 2012 ; 13 : R87. Google Scholar CrossRef Search ADS PubMed 9 Xu H , Podolsky RH , Ryu D , et al. A method to detect differentially methylated loci with next-generation sequencing . Genetic Epidemiology 2013 ; 37 : 377 – 82 . Google Scholar CrossRef Search ADS PubMed 10 Dolzhenko E , Smith AD. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments . BMC Bioinformatics 2014 ; 15 : 215. Google Scholar CrossRef Search ADS PubMed 11 Park Y , Figueroa ME , Rozek LS , et al. MethylSig: a whole genome DNA methylation analysis pipeline . Bioinformatics 2014 ; 30 : 2414 – 22 . Google Scholar CrossRef Search ADS PubMed 12 Sun D , Xi Y , Rodriguez B , et al. MOABS: model based analysis of bisulfite sequencing data . Genome Biology 2014 ; 15 : R38. Google Scholar CrossRef Search ADS PubMed 13 Feng H , Conneely KN , Wu H. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data . Nucleic Acids Res 2014 ; 42 : e69. Google Scholar CrossRef Search ADS PubMed 14 Park Y , Wu H. Differential methylation analysis for BS-seq data under general experimental design . Bioinformatics 2016 ; 32 : 1446 – 53 . Google Scholar CrossRef Search ADS PubMed 15 Hansen KD , Langmead B , Irizarry RA. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions . Genome Biology 2012 ; 13 : R83. Google Scholar CrossRef Search ADS PubMed 16 Meldi K , Qin T , Buchi F , et al. Specific molecular signatures predict decitabine response in chronic myelomonocytic leukemia, The . Journal of Clinical Investigation 2015 ; 125 : 1857 – 72 . Google Scholar CrossRef Search ADS PubMed 17 Sun Z , Baheti S , Middha S , et al. SAAP-RRBS: Streamlined Analysis and Annotation Pipeline for Reduced Representation Bisulfite Sequencing . Bioinformatics 2012 ; 28 : 2180 – 1 . Google Scholar CrossRef Search ADS PubMed 18 Day K , Waite LL , Thalacker-Mercer A , et al. Differential DNA methylation with age displays both common and dynamic features across human tissues that are influenced by CpG landscape . Genome Biology 2013 ; 14 : R102. Google Scholar CrossRef Search ADS PubMed 19 Florath I , Butterbach K , Muller H , et al. Cross-sectional and longitudinal changes in DNA methylation with age: an epigenome-wide analysis revealing over 60 novel age-associated CpG sites . Human Molecular Genetics 2014 ; 23 : 1186 – 201 . Google Scholar CrossRef Search ADS PubMed 20 Barfield RT , Kilaru V , Smith AK , et al. CpGassoc: an R function for analysis of DNA methylation microarray data . Bioinformatics 2012 ; 28 : 1280 – 1 . Google Scholar CrossRef Search ADS PubMed 21 Ferrari S , Cribari-Neto F. Beta regression for modelling rates and proportions . Journal of Applied Statistics 2004 ; 31 : 799 – 815 . Google Scholar CrossRef Search ADS 22 Cribari-Neto F , Zeileis A. Beta regression in R . Journal of Statistical Software 2010 ; 34 : 1 – 24 . Google Scholar CrossRef Search ADS 23 Rao JN , Scott AJ. A simple method for the analysis of clustered binary data . Biometrics 1992 ; 48 : 577 – 85 . Google Scholar CrossRef Search ADS PubMed 24 Wang D , Yan L , Hu Q , et al. IMA: an R package for high-throughput analysis of Illumina's 450K Infinium methylation data . Bioinformatics 2012 ; 28 : 729 – 30 . Google Scholar CrossRef Search ADS PubMed 25 Aryee MJ , Jaffe AE , Corrada-Bravo H , et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA Methylation microarrays . Bioinformatics 2014 ; 30 : 1363 – 9 . Google Scholar CrossRef Search ADS PubMed 26 Warden CD , Lee H , Tompkins JD , et al. COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis . Nucleic Acids Research 2013 ; 41 : e117. Google Scholar CrossRef Search ADS PubMed 27 Juhling F , Kretzmer H , Bernhart SH , et al. metilene: fast and sensitive calling of differentially methylated regions from bisulfite sequencing data . Genome Research 2016 ; 26 : 256 – 62 . Google Scholar CrossRef Search ADS PubMed 28 Chatterjee A , Stockwell PA , Rodger EJ , et al. Genome-wide DNA methylation map of human neutrophils reveals widespread inter-individual epigenetic variation . Scientific Reports 2015 ; 5 : 17328. Google Scholar CrossRef Search ADS PubMed 29 Tsai PC , Bell JT. Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation . International Journal of Epidemiology 2015 ; 44 : 1429 – 1441 . Google Scholar CrossRef Search ADS PubMed 30 Saito Y , Tsuji J , Mituyama T. Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions . Nucleic Acids Research 2014 ; 42 : e45. Google Scholar CrossRef Search ADS PubMed 31 Lacey MR , Baribault C , Ehrlich M. Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments . Stat Appl Genet Mol Biol 2013 ; 12 : 723 – 42 . Google Scholar CrossRef Search ADS PubMed © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Briefings in BioinformaticsOxford University Press

Published: Dec 31, 2016

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off