VarExp: estimating variance explained by genome-wide GxE summary statistics

Abstract

Summary: Many genome-wide association studies and genome-wide screenings for gene–environment (GxE) interactions have been performed to elucidate the underlying mechanisms of human traits and diseases. When the analyzed outcome is quantitative, the overall contribution of the identified genetic variants to the outcome is often expressed as the percentage of phenotypic variance explained. This is commonly computed from individual-level genotype data, which is challenging when results are derived through meta-analyses. Here, we present an R package, 'VarExp', that estimates the percentage of phenotypic variance explained using summary statistics only. It supports a range of models, including marginal genetic effects, GxE interaction effects and both effects jointly. Its implementation integrates all recent methodological developments and does not require users to upload external data.

Availability and implementation: The R package is available at https://gitlab.pasteur.fr/statistical-genetics/VarExp.git.

Contact: vincent.laville@pasteur.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Many genome-wide association studies (GWAS) and genome-wide screenings incorporating gene–environment (GxE) interactions (Aschard et al., 2012) have been performed to better understand the underlying mechanisms of human traits and diseases. When the analyzed outcome is continuous, a commonly used measure of the overall impact of the significant associations is the percentage of phenotypic variance explained. A standard way of estimating this percentage is to compare the coefficients of determination of models that include and models that exclude the significantly associated variants and/or interactions.
This requires individual-level genotype and phenotype data, which can be challenging to obtain in meta-analyses performed in large consortia, as pooling data from multiple cohorts raises practical and ethical issues.

An alternative is to use only GWAS or genome-wide GxE summary statistics. Recently, several methods (Pare et al., 2016; Shi et al., 2016) have been developed to estimate the variance explained by marginal genetic effects while taking into account linkage disequilibrium between variants and addressing statistical issues related to finite sample size and single nucleotide polymorphism (SNP) correlation matrices. Yet these works focused only on marginal genetic effects, while genome-wide GxE and joint-effect GWAS are now commonly performed and face the same need. In this work, we address this gap by extending the methodology to GxE screening and implementing the R package VarExp to rapidly and easily estimate the percentage of variance explained by variants and/or interactions of interest using only meta-analysis summary statistics from GWAS.

2 Materials and methods

2.1 Percentage of variance explained

Consider a set of K SNPs $(G_k)_{k=1,\dots,K}$, coded additively as {0, 1, 2}, and a quantitative outcome $Y$. The marginal genetic effect $\alpha_{G.k}$ of SNP $G_k$ is estimated in the marginal model:

$$Y = \alpha_0 + \alpha_{G.k} G_k + \varepsilon.$$

Shi et al. (2016) proposed a first naïve estimator of the variance explained by genetic effects, $f_G$, using summary statistics:

$$f_G = \frac{{\alpha'_G}^{T} \Sigma^{*} \alpha'_G}{\mathrm{var}(Y)}$$

where $\alpha'_G = (\alpha_{G.1}\sigma_1, \dots, \alpha_{G.k}\sigma_k, \dots, \alpha_{G.K}\sigma_K)^{T}$, $\sigma_k$ denotes the standard deviation of SNP $G_k$ and $\Sigma^{*}$ is the Moore–Penrose generalized inverse of the genotype correlation matrix. However, finite sample size implies statistical noise in the estimates of both the effect sizes and the correlation matrix, which can bias the estimation of $f_G$. Shi et al. (2016) derived a general formula that addresses this issue:

$$f_G^{*} = \frac{N \times {\alpha'_G}^{T} \Sigma^{*} \alpha'_G - q}{(N - q) \times \mathrm{var}(Y)}$$

where $N$ and $q$ denote the sample size and the rank of the correlation matrix, respectively.
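
The two marginal-effect estimators of Section 2.1 can be illustrated with a minimal NumPy sketch. This is not the VarExp package's code or API: the function names, the toy effect sizes, the SNP standard deviations and the sample size are all hypothetical, and the grouping of terms in the corrected estimator follows the $f_G^{*}$ expression as it is written in the text.

```python
import numpy as np


def naive_fraction(alpha_std, corr, var_y):
    """Naive estimator: alpha'^T Sigma* alpha' / var(Y).

    alpha_std holds effect sizes already multiplied by the SNP standard
    deviations; corr is the genotype correlation matrix.
    """
    sigma_pinv = np.linalg.pinv(corr)  # Moore-Penrose generalized inverse
    return float(alpha_std @ sigma_pinv @ alpha_std) / var_y


def corrected_fraction(alpha_std, corr, var_y, n):
    """Corrected estimator: (N * alpha'^T Sigma* alpha' - q) / ((N - q) * var(Y))."""
    q = np.linalg.matrix_rank(corr)  # rank of the correlation matrix
    quad = float(alpha_std @ np.linalg.pinv(corr) @ alpha_std)
    return (n * quad - q) / ((n - q) * var_y)


# Hypothetical summary statistics for two SNPs in moderate LD (r = 0.5).
alpha = np.array([0.10, 0.05])    # marginal effect sizes (made up)
sd_snp = np.array([0.70, 0.60])   # SNP standard deviations (made up)
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
var_y = 1.0                       # outcome assumed standardized here
n = 10_000                        # sample size (made up)

alpha_std = alpha * sd_snp        # alpha'_G: effects scaled by SNP sd
f_naive = naive_fraction(alpha_std, corr, var_y)
f_corrected = corrected_fraction(alpha_std, corr, var_y, n)
print(f"naive: {f_naive:.6f}, corrected: {f_corrected:.6f}")
```

In this toy setting the correction slightly lowers the estimate, which is the expected direction since the subtracted rank term accounts for noise in the effect sizes.
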
Now consider an exposure $E$ (either binary or quantitative). The main effect $\alpha_{G.k}$ of the SNP $G_k$ and the interaction effect $\alpha_{INT.k}$ can be estimated using a single-SNP model with an interaction term:

$$Y = \alpha_0 + \alpha_{G.k} G_k + \alpha_E E + \alpha_{INT.k} G_k \times E + \varepsilon.$$

We show in Supplementary Material that, when the effect estimates of the above model are re-parameterized to obtain parameters from a fully standardized model, the percentage of variance explained by interaction effects, $f_I$, or jointly by genetic and interaction effects, $f_{G+I}$, can also be derived using summary statistics only:

$$f_I = \frac{{\alpha'_{INT}}^{T} \Sigma^{*} \alpha'_{INT}}{\mathrm{var}(Y)}$$

$$f_{G+I} = f_G + f_I$$

where $\alpha'_{INT} = (\alpha_{INT.1}\sigma_1\sigma_E, \dots, \alpha_{INT.k}\sigma_k\sigma_E, \dots, \alpha_{INT.K}\sigma_K\sigma_E)^{T}$ and $\sigma_E$ is the standard deviation of $E$. In this model, $f_G$ is computed using effect sizes from the interaction model. However, for the reasons discussed above, we define our final estimators, $f_I^{*}$ and $f_{G+I}^{*}$, by applying the same corrections as proposed for the $f_G$ estimator by Shi et al. (see the $f_G^{*}$ equation).

2.2 Estimating the genotype correlation matrix

When the genotype correlation matrix is not available from the data, it can be estimated using genotype data from a reference panel such as the 1000 Genomes Project (Abecasis et al., 2012). We implemented a transparent function that derives this correlation matrix from 1000 Genomes Phase 3 data, either through web access for a small number of SNPs or from local data files for a larger number of SNPs, as computational time is dramatically reduced when querying local files (see Supplementary Material and Supplementary Fig. S3). To avoid matrix inversion issues, we also implemented an option to prune SNPs that have a perfect correlation of 1 with another SNP in the matrix.

3 Application example

In practice, application is performed in three main steps (see Supplementary Material and Supplementary Fig. S4): (i) estimating the SNP correlation matrix, (ii) computing the mean and variance of both the outcome and the exposure in the pooled sample and (iii) estimating the percentage of phenotypic variance explained by main genetic effects and/or interaction effects.

To illustrate the performance of our package, we performed a simulation study (see Supplementary Material) comparing the adjusted coefficients of determination from regressions with the estimates obtained using VarExp across 1000 replicates. Figure 1 and Supplementary Figures S1 and S2 demonstrate the high accuracy of our estimator, with an intraclass correlation coefficient between the coefficients of determination and their estimates equal to 0.99, 0.98 and 0.99 for the marginal genetic effects, interaction effects and joint effects, respectively.

Fig. 1. Percentage of phenotypic variance explained using summary statistics (estimated) and individual-level data (observed) for (a) main genetic, (b) interaction and (c) joint effects. The line corresponds to y=x and ICC is the intraclass correlation coefficient.

4 Concluding remarks

In this work, we provide the R package VarExp to easily estimate the percentage of phenotypic variance explained by genetic effects, GxE interaction effects or their joint contribution using summary statistics only, making this estimation straightforward in large-scale consortia. Importantly, several limitations of GxE screenings have previously been discussed [(Aschard, 2016; Robinson et al., 2017), see also Supplementary Material] and have to be taken into account by users before applying our approach.
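
As a rough illustration of the workflow in Section 3 and the interaction estimators in Section 2.1, the sketch below prunes perfectly correlated SNPs and then computes $f_G$, $f_I$ and $f_{G+I}$ from summary statistics. It is a simplified stand-in under several assumptions, not the package's implementation: all function names and input values are hypothetical, the effect sizes are assumed to be already re-parameterized to the standardized model (that step is described only in the Supplementary Material), pruning treats a correlation of -1 like a correlation of 1, and only the corrected $f_I^{*}$ is computed because the text does not spell out the exact form of $f_{G+I}^{*}$.

```python
import numpy as np


def prune_perfect_correlation(corr, tol=1e-8):
    """Return indices of SNPs kept after dropping any SNP whose absolute
    correlation with an already-kept SNP is 1 (up to tol)."""
    corr = np.asarray(corr, dtype=float)
    keep = []
    for j in range(corr.shape[0]):
        if all(abs(corr[j, i]) < 1.0 - tol for i in keep):
            keep.append(j)
    return keep


def quad_form(v, corr):
    """v^T Sigma* v, with Sigma* the Moore-Penrose generalized inverse."""
    return float(v @ np.linalg.pinv(corr) @ v)


def variance_explained(alpha_g, alpha_int, sd_snp, sd_e, corr, var_y, n):
    """Naive f_G, f_I, f_G+I and the corrected f_I* from summary statistics."""
    keep = prune_perfect_correlation(corr)
    corr_kept = np.asarray(corr, dtype=float)[np.ix_(keep, keep)]
    g_std = alpha_g[keep] * sd_snp[keep]            # alpha'_G
    i_std = alpha_int[keep] * sd_snp[keep] * sd_e   # alpha'_INT
    q = np.linalg.matrix_rank(corr_kept)
    f_g = quad_form(g_std, corr_kept) / var_y
    f_i = quad_form(i_std, corr_kept) / var_y
    f_i_star = (n * quad_form(i_std, corr_kept) - q) / ((n - q) * var_y)
    return {"kept": keep, "f_G": f_g, "f_I": f_i,
            "f_G+I": f_g + f_i, "f_I_star": f_i_star}


# Hypothetical inputs: SNP 3 duplicates SNP 1 (correlation of exactly 1).
corr = np.array([[1.0, 0.5, 1.0],
                 [0.5, 1.0, 0.5],
                 [1.0, 0.5, 1.0]])
alpha_g = np.array([0.10, 0.05, 0.10])    # main effects (made up)
alpha_int = np.array([0.02, 0.01, 0.02])  # interaction effects (made up)
sd_snp = np.array([0.70, 0.60, 0.70])     # SNP standard deviations (made up)

result = variance_explained(alpha_g, alpha_int, sd_snp, sd_e=1.5,
                            corr=corr, var_y=1.0, n=10_000)
print(result)
```

Pruning before computing the rank keeps the correlation matrix invertible in this example, so the generalized inverse coincides with the ordinary inverse.
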
Acknowledgements

We gratefully acknowledge all contributors to the CHARGE Gene-Lifestyle Interactions Working Group.

Funding

This work was supported by the R01HL118305 grant from the NHLBI. H.A. was also supported by R21HG007687 from NHGRI. A.R.B. was supported by the Intramural Research Program of the National Human Genome Research Institute in the Center for Research in Genomics and Global Health (CRGGH, Z01HG200362).

Conflict of Interest: none declared.

References

Abecasis G.R. et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.
Aschard H. (2016) A perspective on interaction effects in genetic association studies. Genet. Epidemiol., 40, 678–688.
Aschard H. et al. (2012) Challenges and opportunities in genome-wide environmental interaction (GWEI) studies. Hum. Genet., 131, 1591–1613.
Pare G. et al. (2016) A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics. Sci. Rep., 6, 27644.
Robinson M.R. et al. (2017) Genotype-covariate interaction effects and the heritability of adult body mass index. Nat. Genet., 49, 1174–1181.
Shi H. et al. (2016) Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet., 99, 139–153.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Publisher
Oxford University Press
ISSN
1367-4803
eISSN
1460-2059
D.O.I.
10.1093/bioinformatics/bty379


Journal

Bioinformatics, Oxford University Press

Published: May 3, 2018
