Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Multivariate generalized linear model for genetic pleiotropy

Multivariate generalized linear model for genetic pleiotropy Summary When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that |$k+1$| traits are associated, given that the null of |$k$| associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice. 1. Introduction Genetic pleiotropy—a single gene influencing more than one trait—has been of keen interest to geneticists, and rightly so. Understanding pleiotropy can lead to greater biological insights. However, discovering pleiotropy can be challenging, because a gene can be associated with more than one trait for multiple reasons (Solovieff and others, 2013). Statistical methods to evaluate pleiotropy have been developed from different angles, ranging from comparison of univariate marginal associations of a genetic variant with multiple traits, to multivariate analyses with simultaneous regression of all traits on a genetic variant, to reverse regression of a genetic variant on all traits. Surveys of statistical methods for pleiotropy are provided elsewhere (Schaid and others, 2016; Schriner, 2012; Solovieff and others, 2013; Yang and Wang, 2012; Zhang and others, 2014), so we summarize the main approaches here. Univariate analyses are often based on comparison of variant-specific p-values across multiple traits. Although simple and feasible for meta-analyses, this approach ignores correlation among the traits and is based on post-hoc analyses. More formal meta-analysis methods aggregate p-values to test whether any traits are associated with a variant, yet a significant association could be driven by just one trait. A slightly more sophisticated approach, also based on summary p-values, tests whether the distribution of p-values differs from the null distribution of no associations beyond those already detected (Cotsapas and others, 2011). Description of additional univariate methods are given elsewhere (Solovieff and others, 2013). Multivariate methods have been popular for quantitative traits. Although different statistical methods have been proposed, some of them result in the same statistical tests. The following three approaches to analyze quantitative traits result in the same F-statistic to test whether any of the traits are associated with a genetic variant: (i) simultaneous regression of all traits on a single variant; (ii) regression of the minor allele dose on all traits; and (iii) canonical correlation of the matrix of traits with the minor allele dose [using plink.multivariate (Ferreira and Purcell, 2009)]. The regression of the dose of the minor allele on all traits is a convenient approach, particularly if some of the traits are binary. A slightly different approach is to account for the categorical nature of the dose of the minor allele: instead of using linear regression, use ordinal logistic regression of the dose on the traits [R MultiPhen package, (O’Reilly and others, 2012)]. An advantage of this approach is that it allows for binary traits, unlike most methods that assume traits are quantitative with a multivariate normal distribution. Recently, score tests for generalized linear models, based on estimating equations, have been developed as a way to simultaneously test multiple traits, some of which could be binary (Xu and Pan, 2015). Alternatively, joint analysis of a single quantitative trait and a single categorial trait has been developed, based on conditioning on the categorical trait (Wu and others, 2013). An approach somewhat between univariate and multivariate is based on reducing the dimension of the multiple traits by principal components (PC) and using a reduced set of PCs as either the dependent or independent variables in regression. A comparison of univariate and multivariate approaches found that multivariate methods based on multivariate normality { e.g., canonical correlation, linear regression of traits on minor allele dose, reverse regression, MultiPhen, Bayes methods [BIMBAM (Stephens, 2013), and SNPTEST (Marchini and others, 2007)]} all had similar power and were generally more powerful than univariate methods (Galesloot and others, 2014). The power advantage of multivariate over univariate methods occurs when the direction of the residual correlation is opposite from that of the genetic correlation induced by the causal variant (Galesloot and others, 2014; Liu and others, 2009). In addition to the methods discussed above, a few new approaches have been proposed, but have not yet been compared with others. An interesting approach is to scale the different traits by their standard deviation, and then assume that the effect of a SNP is constant across all traits to in order to construct a test of association with 1 degree of freedom—so-called “scaled marginal models” (Roy and others, 2003; Schifano and others, 2013). Finally, an approach based on kernel machine regression extended the sequential kernel association test (Wu and others, 2010) to multiple traits, providing a simultaneous test of multiple traits with multiple genetic variants in a genomic region (Maity and others, 2012). A limitation of all current approaches is that they test whether any traits are associated with a genetic variant, and small p-values could be driven by the association of the genetic variant with a single trait. Hence, post-hoc analyses are required to interpret the possibility of pleiotropy. This can be quite challenging when scaling up to a large number of genetic variants. Because current analytic methods for pleiotropy limit biological understanding, we recently developed new statistical methods to evaluate pleiotropy, based on a likelihood ratio test for quantitative traits (Schaid and others, 2016). This method was developed as a sequential testing framework, such that one could test the null hypothesis that |$k+1$| traits are associated, given that the null of |$k$| associated traits was rejected. This approach allows one to determine the number of traits associated with a genetic variant, and which traits are associated, while accounting for correlations among the traits. A limitation of this past work is that it was based on the assumption of a multivariate normal distribution. Although robust to distributions with thicker tails than a normal distribution, this approach is too restrictive to other traits, such as binary or ordinal traits, or perhaps a mixture of different types of traits. For this reason, we extended our methods to account for a wide variety of types of traits, based on generalized linear models (to account for alternate types of traits) and generalized estimating equations (GEE, to account for correlation among the traits). 2. Methods An outline of our approach follows. We first define a general linear model for each trait, and in turn define regression parameters for both the genotype of interest and adjusting covariates. These trait models allow us to compute covariances among the residuals to account for correlated traits. We describe how to estimate the parameters of the models, with special consideration for ordinal traits that require a few extra details. We then describe how to set up and test hypotheses regarding the number of regression coefficients constrained to zero, providing a way to consider combinatorial sets of constrained parameters. This framework provides a formal test of the null hypothesis of no pleiotropy (i.e., no more than one trait associated with a genotype), as well as an extension to perform sequential tests of the number of traits associated with a genotype. 2.1 Model and notation Suppose that for each subject |$i=1,\ldots,n$|⁠, a vector of |$p$| traits is measured, |$Y_{i} =(\,y_{i1} ,\cdots ,y_{ip} {)}',$| where the traits could be a mixture of different types (e.g., some quantitive, some binomial, and some ordinal). Also assume a |$q$|-dimensional vector of covariates for each subject and each trait, |$X_{ij} =(x_{ij1} ,\cdots ,x_{ijq} {)}'$|⁠. For each trait |$j = 1, \ldots , p$|⁠, suppose that |$y_{ij} $| follows a generalized linear model $$ g_{j} (\mu_{ij} )=\alpha_{j} +{X}'_{ij} \beta_{j} ,\quad Var(\,y_{ij} \vert X_{ij} )=\phi_{j} \nu_{j} (\mu _{ij} )_{\mathrm{,}} $$ where the mean is modeled as a function of the covariates, |$\mu_{ij}=E(\,y_{ij} \mid X_{ij} )$| with |$\beta_{j} =(\beta_{j1} ,\cdots ,\beta_{jq} {)}'$| the regression parameters to be estimated. Here, |$g_{j} $| is a known link function which has an inverse function |$h_{j} $|⁠; the variance function |$\nu_{j} $| is a known function of |$\mu_{ij} $|⁠; |$\phi_{j} $| is the corresponding scale parameter that needs to be estimated or is known. For simplicity, we assume that |$q=1$| since the statistical inferences below are easily extended to the general case |$q>1$|⁠. We assume that the covariance matrix of |$Y_{i} $| has the form $$ \Sigma_{i} =C_{i}^{1/2} RC_{i}^{1/2} , $$ where |$C_{i} =\textrm{diag}{\kern 1pt}\{\phi_{1} \nu_{1} (\mu_{i1} ),\ldots ,\phi_{p} \nu_{p} (\mu_{ip} )\}$| and |$R$| is the correlation matrix that could be assumed to have a special structure (Diggle and others, 2009). The regression parameters are organized as a vector for intercepts, |$\alpha=(\alpha_{1} ,\cdots ,\alpha_{p} {)}'$|⁠, a vector for covariates, |$\beta=(\beta_{1} ,\cdots ,\beta_{p} {)}'$|⁠, and a collection of both, |$\theta=({\alpha }',{\beta }'{)}'$|⁠, which has dimension |$d=p+qp$|⁠. Denote the true value of |$\theta $| as |$\theta_{0} $|⁠, and let |$\phi =(\phi_{1} ,\cdots,\phi_{p} {)}'$| denote the vector of nuisance scale parameters. In the following sections, we will discuss the estimation of |$\theta $| and hypothesis tests about |$\beta $|⁠. 2.2 Parameter estimation To estimate |$\theta $| and |$\phi $|⁠, we take advantage of standard procedures with standard software. Initial estimates of |$\hat{{\theta }}$| can be obtained by two different approaches. If all the traits are of the same type (e.g., all binomial), then standard software for GEE could be used, such as gee or glmgee packages in R software. A limitation of standard software for GEE, however, is that they do not allow for mixtures of different types of traits (e.g., quantitative mixed with binomial). To allow for a mixture of trait types, initial estimates could be obtained by performing separate regresssions for each trait. For example, using linear regression for quantitative traits and logistic regression for binomial traits. From these initial estimates, standardized residuals can be computed according to |$\varepsilon_{ij} (\hat{{\theta }})=(\,y_{ij} -\mu_{ij} (\hat{{\theta}}))/\sqrt {\nu_{j} (\mu_{ij} (\hat{{\theta }}))} $|⁠. From these residuals, one can estimate scale parameters by $$ \hat{{\phi }}_{j} =n^{-1}\sum\limits_{i=1}^n \varepsilon_{ij} (\hat{{\theta }})^{2}, $$ and an empirical estimate of the correlation by $$ \hat{{R}}_{kj} =\frac{1}{n}\sum\limits_{i=1}^n \varepsilon_{ij} (\hat{{\theta }})\varepsilon_{ik} (\hat{{\theta }})/\sqrt {\hat{{\phi }}_{k} \hat{{\phi }}_{j} } . $$ Using the intial estimates |$\hat{{\theta }}$|⁠, |$\hat{{\phi }}_{j} $|⁠, and |$\hat{{R}}_{kj} $|⁠, it is then possible to iteratively update these estimates to solve the generalized estimation equations |$U_{n} (\theta)=\sum\limits_{i=1}^n D_{i} (\theta )\Sigma_{i} (\theta ,\phi,R)^{-1}[Y_{i} -\mu_{i} (\theta )]=0,$| where |$D_{i} =\partial \mu_{i} /\partial \theta $|⁠, |$\mu_{i} (\theta)=[h_{1} (\alpha_{1} +x_{i1} \beta_{1} ),\cdots ,h_{p} (\alpha_{p}+x_{ip} \beta_{p} ){]}'$| and |$\Sigma_{i} (\theta ,\phi ,R)=C_{i} (\theta,\phi )^{1/2}RC_{i} (\theta ,\phi )^{1/2}.$| In practice, we find that allowing an unspecified correlation structure can make it difficult to achieve convergence for the above iterative procedure, a limitation found in many GEE software packages. For this reason, we propose to obtain initial estimates of |$\theta $|⁠, |$\phi $|⁠, and |$R$| and use these in the subsequent hypothesis testing framework. 2.3 Modification for ordinal traits The above statistical methods are easily extended to the proportional odds model for ordinal traits, with a minor modification to allow for more than one intercept for each ordinal trait. For an ordinal trait with |$K$| ordered categories and a single covariate |$x$|⁠, the proportional odds model is |$P(\,y\geqslant k\vert x)=\frac{e^{\alpha_{k} +\beta x}}{1+e^{\alpha_{k}+\beta x}},$| for |$k=2,\ldots,K$|⁠. So, for our proposed methods, the vector of intercepts for all traits needs to be expanded to account for |$(K-1)$| intercepts for each ordinal trait. Further alterations are needed to compute residuals, |$\varepsilon_{ij}(\hat{{\theta }})=(\,y_{ij} -\mu_{ij} (\hat{{\theta }}))/\sqrt {\nu_{j} (\mu_{ij} (\hat{{\theta }}))} $|⁠, because ordinal regression is modeling |$P(\,y\geqslant k\vert x)$|⁠. Recall that for unordered categorical data with |$K$| categories, |$(K-1)$| indicator variables can be used to record an observed category for a subject, such as |$I_{k} =1$| if category |$k$| is observed, and 0 otherwise. Extending this to ordered categories, we can use the cumulative sum |$y_{k}^{\ast } =\sum\nolimits_k^K {I_{k} } $| to record ordered categories, for |$k=2,\ldots,K$| (⁠|$k =$| 1 is ignored, because in this case |$y_{1}^{\ast } $| always equals 1). This means that we expand from a single ordinal trait |$y$| having values |$1,\ldots,K$| to a vector of traits |$y\ast=(\,y_{2}^{\ast } ,y_{3}^{\ast } ,\ldots,y_{K}^{\ast } )$|⁠, where each of the |$y_{k}^{\ast } $| have values of 0 or 1. This type of coding corresponds with |$E(\,y_{k}^{\ast } \vert x,\theta )=\mu_{k} (\theta )=P(\,y\geqslant k\vert x,\theta )$|⁠, as desired for ordinal regression. So, in summary, we expand each ordinal trait from |$y$| to a vector |$y\ast =(\,y_{2}^{\ast },y_{3}^{\ast } ,\ldots,y_{K}^{\ast } )$|⁠, and then compute residuals |$\varepsilon_{ijk} (\hat{{\theta }})=[y_{ijk}^{\ast } -\mu_{ijk} (\hat{{\theta}})]/\sqrt {\nu_{j} (\mu_{ijk} (\hat{{\theta }}))} $|⁠, where the variance function |$\nu_{j} (\mu_{ijk} (\hat{{\theta }}))=\mu_{ijk} (\theta )[1-\mu_{ijk} (\theta )]$| is for the binary random variable |$y_{ijk}^{\ast } $|⁠. With these modifications, we can follow the above procedures to compute the residuals across all traits, the scale parameters |$\hat{{\phi }}$|⁠, and the empirical estimate of the correlation |$\hat{{R}}$|⁠. A bit more book keeping is required when computing the derivative matrix, |$D_{i} =\partial \mu_{i} /\partial \theta $|⁠, but the statistical methods follow through for analyzing ordinal traits. In general, we allow for any mixture of types of traits, such as binary, ordinal, or quantiative. 2.4 Hypothesis testing We now consider testing the null hypothesis of no pleiotropy, which is formally stated as the following hypothesis test: |$H_{\mathrm{No\,Pleio}}$|⁠: Of the parmeters |$\beta_{1} ,\ldots,\beta_{p},$| there exists at most one that is non-zero. This null hypothesis is equivalent to testing whether one of the following |$p+1$| sub-hypotheses holds: $$ H_{k0} :\beta_{k} \ne 0,\beta_{j} =0\ (j\ne k),\quad \mbox{for}\ k=0,\ldots,p. $$ Note that |$H_{00} $| represents all |$\beta_{k} =0(k=1,\cdots ,p)$|⁠, while for |$k>0$|⁠, |$H_{k0} $| allows |$\beta_{k} \ne 0$| while all other |$\beta_{j} =0\,(j\ne k)$|⁠. To represent these |$p+1$| hypotheses, we use |$H_{k0} :V_{k} \theta =0$|⁠. Let |$V_{0} $| be a matrix such that |$H_{00} :V_{0} \theta =0$| tests whether all |$\beta_{j}=0$|⁠. This is the usual multivariate test. In this case, |$V_{0} $| is created for binary and quantitative traits by taking the identity matrix of dimension |$2p$|⁠, and then removing rows 1 through |$p$| (to ignore the intercepts |$\alpha_{1} ,\ldots,\alpha_{p} )$|⁠. To construct |$V_{k} $| (⁠|$k>0)$|⁠, create an identity matrix of dimension |$2p$|⁠, then remove rows 1 through |$p$| and row |$p+k$| (to ignore |$\beta_{k} )$|⁠. This results in |$V_{k} \theta =(\beta_{1} ,\ldots,\beta_{k-1} ,\beta_{k+1} \beta_{p}{)}'$|⁠. Then, the null hypothesis of no pleiotropy is equivalent to |$H_{0}$|⁠: there exists one of |$H_{k0} :V_{k} \theta=0$|⁠, for |$k=0,\ldots ,p$|⁠. To account for ordinal traits, one must account for the additional intercepts for each ordinal trait. Because we do not assume a specific multivariate distribution of the traits, we cannot work within the framework of likelihood ratio testing, as we had done for traits following a multivariate normal distribution (Schaid and others, 2016). Instead, we define the mean square error (MSE) of a model by |$\rho $|⁠, where $$ \rho_{n} (\theta )=\sum\limits_{i=1}^n [Y_{i} -\mu_{i} (\theta ){]}'\Sigma _{i} (\theta ,\phi ,R)^{-1}[Y_{i} -\mu_{i} (\theta )]. $$ We use the MSE of models constrained under a null hypothesis to construct a test statistic. As shown in the supplementary material available at Biostatistics online, to test a sub-hypothesis |$H_{k0} $|⁠, the statistic |$t_{k}=\hat{{\theta }}_{V_{k} }^{{\ast }'} A_{n} \hat{{\theta }}_{V_{k} }^{\ast }$| can be used, where |$\hat{{\theta }}_{V_{k} }^{\ast } $| denotes the difference between the unconstrained and constrained estimators |$(\hat{{\theta }}_{V_{k} }^{\ast } =\hat{{\theta }}_{n} -\hat{{\theta}}_{V_{k} } )$|⁠, computed as |$\hat{{\theta }}_{V_{k} }^{\ast } =A_{n}^{-1}{V}'_{k} [V_{k} A_{n}^{-1} {V}'_{k} ]^{-1}V_{k} \hat{{\theta }}_{n} $|⁠, where |$\hat{{\theta }}_{n} $| is the unconstrained estimator, and where $$ A_{n} =\sum\nolimits_{i=1}^n D_{i} (\hat{{\theta }}_{n} )\Sigma_{i} (\hat{{\theta }}_{n} ,\hat{{\phi }},\hat{{R}})^{-1}D_{i} (\hat{{\theta }}_{n} {)}'. $$ When |$H_{k0} $| is true, |$t_{k} \sim \chi_{d}^{2} $|⁠, where |$d$| is the rank of the idempotent matrix |$P_{V_{k} } =A^{-1/2} {V}'_{k} [V_{k} A^{-1}{V}'_{k} ]^{-1}V_{k} A^{-1/2} $| (Corollary 1 of supplementary material available at Biostatistics online). The statistics |$t_{0} ,t_{1} ,\ldots,t_{p} $| provide a way to test each sub-hypothesis |$H_{00} ,H_{10} ,\ldots,H_{p0} $|⁠. In the next sections, we propose a way to test the null hypothesis of no pleiotropy, and a sequential testing procedure to evaluate the number of traits associated with a genetic variant. 2.5 Testing pleiotropy The usual multivariate null hypothesis of no associated traits, |$H_{00} $|⁠: |$\beta =0$|⁠, can be tested by the statistic |$t_{0} \sim \chi_{p}^{2} $|⁠, so reject |$H_{00} $| if |$t_{0} >\chi_{p}^{2} (\alpha )$|⁠, where |$\chi_{p}^{2}(\alpha )$| is the |$1-\alpha $| quantile of a |$\chi^{2}$| distribution with |$p$| degrees of freedom (df). Alternatively, the null hypothesis of no pleiotropy can be tested by simultaneously testing |$H_{00} $| (i.e., no associated traits) and testing the null hypotheses that only one |$H_{k0} $| holds for |$k=1,\cdots ,p$| (i.e, only one associated trait). This is achieved by the statistic |$T_{pleio} =\mathop {\min }\limits_{k=0,1,\cdots ,p} t_{k}.$| As shown Corollary 3 of supplementary material available at Biostatistics online, if only one of the |$\beta $|’s is non-zero, which means that one |$H_{k0} $| holds true |$(k>0)$|⁠, |$T_{pleio} \sim \chi_{p-1}^{2} $|⁠, so reject the null hypothesis of no pleiotropy if |$T_{pleio} >\chi_{p-1}^{2} (\alpha)$|⁠. An advantage of this pleiotropy statistic is that it is simple to compute on a large number of genetic markers. However, if no traits are associated, the statistic |$T_{pleio} $| is the minimum of correlated statistics, each with a |$\chi^{2}$| distribution, so the distribution of |$T_{pleio} $| under the null hypothesis of no associated traits is not well defined. To aid interpretation of how many traits, and which traits, are associated with a genetic marker, we propose a sequential test procedure. 2.6 Sequential testing As we previously proposed (Schaid and others, 2016), sequential testing provides a rigorous procedure to evaluate the number of traits associated with a genetic variant. At the first stage, test the usual multivariate null hypothesis of no associated traits, |$H_{00} $|⁠: |$\beta =0$|⁠, using the statistic |$t_{0} \sim \chi_{p}^{2} $|⁠. If |$H_{00} $| cannot be rejected, then the |$H_{\mathrm{No\ pleio}}$| cannot be rejected. In contrast, if |$H_{00} $| is rejected, we turn to the second stage to test the null hypothesis that one sub-hypothesis |$H_{k0} $| holds for |$k=1,\ldots ,p$|⁠. We denote this second stage hypothesis |$H_{1} $|⁠, with subscript 1 indicating the number of associated traits under the null hypothesis. For this, we ignore |$t_{0} $| and use the test statisitc $$ T_{1} =\mathop{\min }\limits_{k=1,\ldots p} t_{k} . $$ As shown in Corollary 3 of supplementary material available at Biostatistics online, if only one of the |$\beta $|’s is non-zero, which means that one |$H_{k0} $| holds true |$(k>0)$|⁠, |$T_{1} \sim \chi_{p-1}^{2} $|⁠, so reject the null hypothesis |$H_{1} $| if |$T_{1} >\chi_{p-1}^{2} (\alpha )$|⁠. In general, for stages |$k=1,\cdots ,p-1$|⁠, test |$H_{k}$|⁠: only |$k$| components of |$\beta$| are not zero. The statistic for stage |$k$| is |$T_{k} =\mathop {\min }\limits_{i_{1},\ldots,i_{k} \in C(p,k)} \hat{{\theta }}_{V_{i_{1} ,\cdots ,i_{k} } }^{{\ast}'} A_{n} \hat{{\theta }}_{V_{i_{1} ,\cdots ,i_{k} } }^{\ast } $|⁠, where the indices |$i_{1} ,\ldots,i_{k} $| are chosen from all possible ways of choosing |$k$| unconstrained |$\beta $|’s among |$p$| traits [|$C(p,k)$|]. That is, the indices |$i_{1} ,\ldots,i_{k} $| correspond to the sub-hypothesis |${\kern 1pt}\beta_{i_{1} } \ne \;0,\cdots ,\beta_{i_{k} } \ne \;0\;and\;\beta_{j} =0 \,(j\ne i_{1} ,\cdots ,i_{k} {\kern 1pt})$|⁠. The contrast matrix |$V_{i_{1} ,\cdots,i_{k} } $| is contructed by constituting an identity matrix whose dimension is the number of estimated parameters, then deleting the rows that correspond to intercepts (to exclude |$\alpha $| intercepts), and then deleting rows with indices |$i_{1} ,\ldots,i_{k} $| for |$\beta $|’s not constrained to equal zero. Reject |$H_{k} $| if |$T_{k} >\chi_{p-k}^{2} (\alpha )$|⁠. If reject |$H_{k} $|⁠, continue this sequential testing by incrementing |$k$| by 1. If fail to reject |$H_{k} $|⁠, stop testing and conclude there are |$k$| traits associated with the genetic variant. The details of the statistical procedures of this general sequential testing procedure are provided in the supplementary material available at Biostatistics online as well as a proof that the Type-I error is controlled. This general sequential procedure provides a formal way to determine the number of traits associated with a genetic variant, and which traits are associated. To determine which traits are associated, the indices |$i_{1} ,\ldots ,i_{k} $| that generate the minimum for the statistic |$T_{k} $| correspond to the traits that are associated, because these indices correspond to unconstrained |$\beta $|’s that provide the best model fit, while all other traits are inferred to be non-associated because their |$\beta $|’s are contrained to zero. 2.7 Simulations To evaluate the adequacy of our developed methods, we performed simulations. For the null hypothesis of no pleiotropy, we performed two sets of simulations. To simulate traits with a specified covariance structure, we simulated random errors from a multivariate normal distribution with a covariance assumed to be a constant |$\rho $| for all pairs of traits (i.e., exchangeable correlation structure) and variance of 1 for each trait. For binary and ordinal traits, we used thresholds to convert from a normally distributed trait to a binary or ordinal trait, with thresholds chosen to achieve the desired frequency for the levels of these categorical traits. By this approach, we were also able to simulate mixtures of quantitative and categorical traits that were correlated. For null hypotheses of no pleiotropy, we assumed either (i) all |$\beta_{j} =0$|⁠, the usual null for multivariate data or (ii) |$\beta_{1} \ne 0$| and all other |$\beta_{j} =0$| (⁠|$j=2,\ldots,p)$|⁠. The value of |$\beta_{1} $| was chosen to achieve 80% power for the marginal effect of the first trait at a nominal Type-I error rate of 0.05. We assumed two different sample sizes, |$n= 500,1000$|⁠, and two different values of the number of traits, |$p=4,10$|⁠. For all simulations, a single SNP was simulated, assuming a minor allele frequency of 0.2. All simulations were repeated 1000 times. 2.8 Data application Our multivariate methods were applied to a study of major depression, combining data from 725 Mayo Clinic patients who participated in two prior studies. All patients had severe depression, and recruitment of subjects and genotype assays are described in the original reports (Biernacka and others, 2015; Ji and others, 2013). For demonstration of our methods, the primary traits of interest were based on the baseline Hamilton Rating Scale for Depression (HAM-D). The HAM-D is a multiple item questionnaire used to provide an indication of depression. A patient is rated by a clinician according to 17 items, each item on a three- or five-point scale, with higher responses indicating more severe symptoms. The total of all 17 scores is used to measure depression, and a total score of 0–7 is considered to be normal, while scores of 20 or higher indicate moderate or worse depression. In our study, the majority of subjects answered the item “Loss of Insight” with the same response (acknowledges being depressed and ill), and because there was little information in this item, it was not used in the analyses. The remaining 16 items were analyzed as multivariate traits in a genome-wide association study (GWAS). The goal was to understand the genetic basis of the different symptoms of depression. The genetic data were based on the approximately 6.9 million measured and imputed single nucleotide polymorphisms (SNPs), having used the software Beagle to impute SNPs with the 1000 Genome cosmopolitan reference sample. 3. Results 3.1 Type-I error of |$t_{{0}}$| The statistic |$t_{{0}}$|⁠, which tests the usual multivariate null hypothesis of no associated traits, had well controlled Type-I error rates when all traits were binary (Table 1), or there was a 50:50 mixture of binary and quantitative traits (Table S1 of supplementary material available at Biostatistics online). When all traits are quantitative, our proposed methods reduce to the methods we presented elsewhere (Schaid and others, 2016), so simulations for purely quantitative traits were not performed here. In contrast, for ordinal traits, the |$t_{{0}}$| statistic sometimes had slightly inflated Type-I error rates (Table S2 and S3 of supplementary material available at Biostatistics online). There are likely several reasons for this inflated error rate. First, for |$K$| levels of an ordinal trait, one must estimate |$K-1$| intercept parameters. Second, to compute the residual correlation matrix, an ordinal trait with |$K$| levels contributes |$K-1$| indicator variables, increasing the number of correlation parameters to estimate for each ordinal trait. It is likely that the additional variability of estimating multiple intercepts and multiple correlations slows down the rate of convergence of |$t_{0} $| to a |$\chi_{p}^{2} $| distribution. As an alternative, we evaluated the benefit of treating ordinal traits as Gaussian and using the identity link function. Tables S2 and S3 of Supplementary material available at Biostatistics online show that treating ordinal traits as Gaussian had better control of the Type-I error rates. To further explore the impact of sample size, we simulated sample sizes of 500, 1000, and 5000, for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Results in Figure 1 show the empirical versus the expected quantiles of p-values for |$t_{0} $|⁠. Figure 1 illustrates that the ordinal link function results in a closer fit to a |$\chi_{p}^{2}$| distribution as sample size increases, and that treating ordinal traits as Gaussian gives a closer fit to a |$\chi_{p}^{2} $| distribution than an ordinal link when sample sizes are not large. Fig. 1. View largeDownload slide Observed versus expected quantiles of |$-$|log|$_{\mathrm{10}}$|(p-value) for |$t_{o} $| statistic when there are no associated traits for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Ordinal links on left panels and Gaussian analyses on right panels. Fig. 1. View largeDownload slide Observed versus expected quantiles of |$-$|log|$_{\mathrm{10}}$|(p-value) for |$t_{o} $| statistic when there are no associated traits for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Ordinal links on left panels and Gaussian analyses on right panels. Table 1. Type-I error rates for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits, all |$\beta_{1} =0$| Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 † Exceeds 95% upper confidence interval. Table 1. Type-I error rates for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits, all |$\beta_{1} =0$| Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 † Exceeds 95% upper confidence interval. 3.2 Type-I error of |$T_{pleio}$| When all |$\beta $|’s equal zero, |$T_{\mathrm{pleio}}$| tends to be very conservative (Table 1). This is not suprising, because under this null hypothesis, |$T_{\mathrm{pleio}}$| is based on the minimum of correlated |$\chi^{2}$| statistics, and the asymptotic distribution of |$T_{\mathrm{pleio}}$| is unknown. In contrast, simulation results for testing pleiotropy with |$T_{\mathrm{pleio}}$| when only a single |$\beta $| is non-zero are presented in Table 2 for when all traits are binary, in Table S4 of supplementary material available at Biostatistics online for a 50:50 mixture of binary and quantitative traits for when a single binary trait has a non-zero |$\beta$|⁠, and in Table S5 of supplementary material available at Biostatistics online for a 50:50 mixture of binary and quantitative traits for when a single quantitative trait has a non-zero |$\beta $|⁠. The size of the |$\beta $| was chosen such that the marginal power of a single trait was 0.80 for a nominal Type-I error rate of 0.05. For all these simulations, the Type-I error rate for |$T_{\mathrm{pleio}}$| is close to the nominal rate, illustrating that |$T_{\mathrm{pleio}}$| closely follows a |$\chi_{p-1}^{2} $| distribution. The results in Tables 2 and Tables S4 and S5 of supplementary material available at Biostatistics online also show the power of testing the usual multivariate null hypothesis with |$t_{0} $| when a single |$\beta $| is non-zero. It is not surprising to see that power increased with increasing trait correlation and with fewer number of tested traits. Table 2. Simulation results for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits when |$\beta_{1} \ne 0$| Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 Table 2. Simulation results for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits when |$\beta_{1} \ne 0$| Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 To evaluate the Type-I error rate of |$T_{\mathrm{pleio}}$| for ordinal traits, we simulated 5 equally correlated traits, with each trait having either 3 or 5 ordered categories. We allowed a single trait to have a non-zero |$\beta $|⁠, where the |$\beta $| was the effect size per allele and chosen such that the marginal power for a single trait was 0.80 for a nominal Type-I error rate of 0.05. For the analyses, we used both an ordinal logistic link and treated ordinal traits as Gaussian. The results in Table 3 show that both procedures have well-controlled Type-I error rates and comparable power to test the null hypothesis of no association using the |$t_{0} $| statistic. Table 3. Simulations for 5 ordinal traits with 3 or 5 categories each, when |$\beta_{1} \ne 0$| Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 † Exceeds 95% upper confidence interval. Table 3. Simulations for 5 ordinal traits with 3 or 5 categories each, when |$\beta_{1} \ne 0$| Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 † Exceeds 95% upper confidence interval. 3.3 Power for pleiotropy Simulation results for power to detect pleiotropy are presented in Table 4 for five binary traits, with effect size |$\beta $| chosen such that there is 80% power to detect the marginal effect of a single trait (Type-I error rate of 0.05). The results show that power for |$T_{\mathrm{pleio}}$| increases as either the trait correlation increases or the number of associated traits increases. Similar findings for ordinal traits are presented in Tables S6 and S7 of supplementary material available at Biostatistics online for five traits with three or five ordered categories each, respectively. The results for ordinal traits show similar power when analyzing ordinal traits with either an ordinal logistic link or treating them as Gaussian. Table 4. Power for 5 binary traits of which either 2 or 3 are associated with a SNP Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power for |$t_{o} $| tests the null hypothesis that no traits are associated, and power for |$T_{\mathrm{pleio}}$| tests the null hypothesis that no more than 1 trait is associated. Table 4. Power for 5 binary traits of which either 2 or 3 are associated with a SNP Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power for |$t_{o} $| tests the null hypothesis that no traits are associated, and power for |$T_{\mathrm{pleio}}$| tests the null hypothesis that no more than 1 trait is associated. 3.4 Properties of sequential test Following the simulation parameters in Table 4, we performed simulations to evaluate the properties of the sequential test. We performed simulations for five binary traits, allowing either two or three associated traits with effect size 0.3, which had 50% marginal power for a single trait association test with a sample size of 500, or 80% power for a sample size of 1000. The results in Table 5 illustrate the following main properties of the sequential test: (i) there was an increased chance of selecting at least one true |$\beta $| as sample size increased, as the number of associated traits increased, or as the trait correlation increased; (ii) when the trait correlation increased, there was in increased chance of selecting both true and false |$\beta $|’s; (iii) the chance of selecting the exact true model depends on the number of associated traits, the trait correlation, the effect size, and the sample size; (iv) there is a trade-off for selecting true and false |$\beta $|’s that depends on the trait correlation. For three of five associated traits with high correlation (⁠|$\rho =0.8)$| there was a 56% chance of correctly selecting all three true |$\beta $|’s, although there was a 23% chance of selecting at least one false |$\beta $|⁠. For |$\rho =0.2$|⁠, there was a 27% chance of selecting all three true betas, but a lower 6% chance of selecting at least one false |$\beta $|⁠; (v) the best scenario for correct model selection is a large sample size (determined by effect size), and moderately correlated traits (for increased power to select true betas, but correlation not too large to avoid also selecting false |$\beta $|’s). Table 5. Properties of sequential testing for five binary traits |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 Proportions out of 1000 simulations for selecting |$\beta $|’s according to true (⁠|$\beta =0.3)$| and false (⁠|$\beta =0)$| coefficients. Marginal power was 50% to detect a single trait for |$N =$| 500, and marginal power of 80% for |$N =$| 1000. Sequential testing was at |$\alpha =0.05$|⁠. Table 5. Properties of sequential testing for five binary traits |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 Proportions out of 1000 simulations for selecting |$\beta $|’s according to true (⁠|$\beta =0.3)$| and false (⁠|$\beta =0)$| coefficients. Marginal power was 50% to detect a single trait for |$N =$| 500, and marginal power of 80% for |$N =$| 1000. Sequential testing was at |$\alpha =0.05$|⁠. 3.5 Comparison with other methods Other methods to evaluate pleiotropy focus on testing the null hypothesis that no traits are associated with a genetic variant, in contrast to our methods that formally test the null hypothesis of one or fewer associated traits, or that provide a sequential test for which traits are associated with a genetic variant. Furthermore, few methods allow for multiple binary traits or a mixture of types of traits. To compare our methods with existing methods, we focused on testing the null hypothesis that no traits are associated with a genetic variant, and compared our proposed |$t_{0}$| statistic with two other methods: (i) MultiPhen (O’Reilly and others, 2012), which reverses the roles of traits and genetic variant by using ordinal logistic regression to regress the three categories of a genotype minor allele dosage on traits; (ii) a joint regression with outcome stratified samples proposed by Wu and others (2013). The Wu approach considers a quantitative trait and a categorical trait and creates a joint likelihood of the two traits by first modeling the probability of the categorical trait, conditional on the genetic variant, and then stratifying on the categorical trait and using linear regression within each strata to model the association of the quantitative trait with the genetic variant. To compare our |$t_{0}$| statistic with the other methods, we simulated a single quantitative trait and a single binary trait, using the simulation methods described above. In this situation, our |$t_{0} $| statistic and the MultiPhen ordinal regression statistic both have an asymptotic chi-square distribution with 2 df. In contrast, the stratified Wu approach has 3 df: one for the logistic regression of the binary trait on the genetic variant, and one for each of the two binary trait strata, for the regression of the quantitative trait on the genetic variant. This reveals the main difference or our approach versus Wu’s stratified approach. The stratified approach allows the effect of the genetic variant on the quantitative trait to vary across the binary trait strata. The power of the three methods are illustrated in Table 6 for when the effect of a genetic variant on a quantitative trait is constant over the binary trait strata. In this scenario, our |$t_{0} $| statistic and the MultiPhen approach had similar power, albeit slightly greater power for |$t_{0} $|⁠. Both of these methods had greater power than Wu’s stratified approach. The weaker power of Wu’s approach results from an extra degree of freedom when stratifying on the binary trait. For categorical traits with more than two levels, power is likely to diminish because of the extra parameters to test. In contrast, when the effect of a genetic variant on a quantitative trait varied over the binary trait strata, Table 7 illustrates that power of Wu’s method was greater than our |$t_{0} $| statistic and the MultiPhen approach, with the power advantage depending on the magnitude of the heterogeneity across strata. Table 6. Power of methods when effect of genetic variant on quantitative trait is constant over binary trait strata A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 Table 6. Power of methods when effect of genetic variant on quantitative trait is constant over binary trait strata A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 Table 7. Power of methods when effect of genetic variant on quantitative trait varies over binary trait strata A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 Table 7. Power of methods when effect of genetic variant on quantitative trait varies over binary trait strata A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 3.6 Application to GWAS symptoms of depression When we applied the multivariate |$t_{0} $| statistic to the GWAS symptoms of depression, treating the 16 response items as ordinal response traits, and the dose of the minor allele as a predictor variable, we found the quantile–quantile plot of the GWAS results to suggest an excess of small p-values, consistent with our simulation results. For this reason, we treated the 16 response items as Gaussian random variables and found the resulting quantile–quantile plot to be more acceptable. Although none of the individual |$t_{0} $| statistics for the SNPs achieved genome-wide significance (i.e., p-value < 5E|$-$|8), the smallest p-value of 4.0E|$-$|7 for the imputed SNP rs11635365 is intriguing, because this common variant (minor allele frequency of 0.38) is within an intron of a gene. In contrast to our multivariate analysis, the most common way to analyze the HAM-D assessment tool data is to sum all response items to create a single score. We illustrate the univariate associations of rs11635365 with each of the 16 response items in Table 8, along with the association of the sum of all 16 response items. This table illustrates that the multivariate |$t_{0}$| statistic results in a much smaller p-value than that for the sum of 16 responses, as well as for any of the individual HAM-D items. Some causes of this difference are likely the different signs of the regression coefficients as well as the correlation structure among the response items. The 120 correlations for the HAM-D data ranged |$-$|0.08 to 0.55, with the majority of correlations between 0 and 0.2. Table 8. Univariate regression results for top-hit SNP rs11635365 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 Table 8. Univariate regression results for top-hit SNP rs11635365 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 To illustrate use of the sequential test procedure, we illustrate results in Table 9 if one were to use a nominal Type-I error rate of 0.05 to decide when to stop sequential testing. Based on this threshold, the results in Table 9 show that HAM-D response items 7 (weight loss), 11 (somatic symptoms), 14 (agitation), and 16 (hypochondriasis) were the main drivers of the statistical association. It is interesting that item 11 was selected at the first step, despite this item not having the smallest marginal p-value. This is likely due to the signs of the regression coefficients and their correlation structure. Table 9. Results from sequential test of HAM-D response items with SNP rs11635365 Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) Table 9. Results from sequential test of HAM-D response items with SNP rs11635365 Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) 4. Discussion Understanding genetic pleiotropy can provide insights to complex biological mechanisms of genes and aid development of pharmacologic targets. Yet, most statistical methods to assess pleiotropy have resorted to ad hoc comparison of univariate statistical tests or multivariate methods that test the null hypothesis of no trait associations. We recently developed a formal statistical test of pleiotropy based on likelihood ratio tests for multivariate normally distributed traits (Schaid and others, 2016), and now extend our approach in this current work to traits that can be modeled by generalized linear models. Our test of pleiotropy, and its corresponding sequential testing framework, allows for a variety of traits commonly encountered in human genetic studies, such as quantitative, binary, or ordinal traits as well as a mixture of different types of traits. Our simulations show that the proposed methods work well for binary and quantitative traits, or a mixture of these types of traits. In contrast, our methods can have inflated Type-I error rates for ordinal traits if the sample size is not large. To correct this, our simulation results show that using treating ordinal traits as Gaussian is a reasonable strategy, providing controlled Type-I error rates while having reasonable power. We proposed a sequential testing procedure, where the null hypothesis of no associated traits could be tested first (using standard multivariate regression methods), and if significant, followed by sequentially testing the null of |$k$| associated traits (⁠|$k=1,\ldots,p-1)$|⁠, until the sequential test fails to reject the null hypothesis. This approach provides a way to assess the number of traits associated with a genetic variant, accounting for the correlations among the traits. As we emphasized elsewhere (Schaid and others, 2016), it is possible to extend our approach to genetically related subjects by using an |$n\times n$| matrix |$G$| to account for relationships. For pedigree data, |$G$| contains diagonal elements |$G_{ii} =1+h_{i} $| where |$h_{i} $| is the inbreeding coefficient for subject |$i$|⁠, and off-diagonal elements |$G_{ij} =2\varphi_{ij} $|⁠. The parameter |$\varphi_{ij} $| is the kinship coefficient between individuals |$i $| and |$j$|⁠, the probability that a randomly chosen allele at a given locus from individual |$i $| is identical by descent to a randomly chosen allele from individual |$j$|⁠, conditional on their ancestral relationship. For data with population structure, |${G}$| is an estimate of genetic relationships (Schaid and others, 2013). Then, the covariance between subjects |$i$| and |$j$|⁠, for traits |$k$| and |$k'$|⁠, is |$\Sigma_{ij,kk'} =\phi_{k} \nu_{k} (\mu_{ik} )\phi_{k'} \nu_{k'} (\mu_{jk'} )R_{kk'} G_{ij}$|⁠. By using simulations to compare our |$t_{o} $| statistic with MultiPhen ordinal regression and Wu’s stratified method, we found similar power between |$t_{o}$| and MultiPhen. The main advantage of our approach is that it allows the multiple traits to depend on different sets of adjusting covariates, in contrast to MultiPhen which would require a common set of covariates, along with the traits, to serve as predictor variables in ordinal regression. The relative power of Wu’s stratified approach depended on whether the effect of the genetic variant on a quantitative trait varied over strata, with weaker power in the absence of heterogeneity, but greater power as heterogeneity increased. A limitation of the stratified approach would be extending it to multiple binary traits, resulting in highly stratified samples. In contrast, our approach easily handles multiple binary traits by testing the marginal effects of a genetic variant on the multiple traits. When we applied our methods to a GWAS of symptoms of major depression, no SNP associations achieved genome wide significance, although the SNP with the smallest p-value of 4.0E|$-$|7 was interesting because this SNP is within an intron of the gene SH3GL3. This is a protein-coding gene on chromosome 4 that is preferentially expressed in brain and testis and interacts with the Huntington’s disease gene, HTT (Sittler and others, 1998). Huntington disease is inherited in an autosomal dominant manner, due primarily to genetic mutations in the HTT gene on chromosome 4. Although our subjects do not have Huntington disease, it is important to understand that some symptoms of Huntington disease include various emotional and psychiatric problems, of which the most common is depression; depression is not a reaction to having Huntington’s disease diagnosis, but rather appears to occur because of brain injury and brain changes from the disease. The characteristics of the interaction between SH3GL3 and HTT suggest that SH3GL3 could be involved in the selective neuronal cell death in patients with Huntington’s disease (Sittler and others, 1998). This raises the question of whether our results suggest that SH3GL3 also plays a role in major depression, independent of Huntington’s disease. Further studies, preferably functional studies, are required to evaluate whether genetic variants in SH3GL3 are associated with major depression, and in particular whether this gene is associated with certain features of depression, as suggested by our sequential analyses. To demonstrate how the sequential testing procedure can be used, we used a p-value threshold of 0.05 to stop the sequential testing. For analysis of GWAS data, this threshold might be too liberal, depending on how many SNPs are found to be associated with traits at prior steps in the sequential testing framework. In practice, it seems sensible to account for the number of tests at each sequential step by using a Bonferroni correction for the number of tests performed at each step. Software: The software package “pleio” is available as an R package from the CRAN web site (https://cran.r-project.org/web/packages/pleio/index.html). This package contains our original pleio.q functions for quantitative traits and pleio.glm functions for the methods discussed in this current work. Reproducible Research: The data and scripts used for analysis of the depression study are available at Github: https://github.com/sinnweja/rpleio/releases/tag/v1.0. Acknowledgments This research was supported by the U.S. Public Health Service, National Institutes of Health, contract grants numbers GM065450, GM28157, and GM61388. We gratefully acknowledge use of the depression data from the Mayo Clinic Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study. Finally, we would like to thank the patients who participated in the PGRN-AMPS SSRI trial as well as the Mayo Clinic psychiatrists who cared for them. Conflict of Interest: None declared. Funding Funding for this work was provided by the U.S. National Institutes of Health, contract grant number GM065450. References Biernacka J. M. , Sangkuhl K. , Jenkins G. , Whaley R. M. , Barman P. , Batzler A. , Altman R. B. , Arolt V. , Brockmoller J. , Chen C. H. and others. ( 2015 ). The International SSRI Pharmacogenomics Consortium (ISPC): a genome-wide association study of antidepressant treatment response. Translational psychiatry 5 , e553 . Google Scholar Crossref Search ADS PubMed Cotsapas C. , Voight B. F. , Rossin E. , Lage K. , Neale B. M. , Wallace C. , Abecasis G. R. , Barrett J. C. , Behrens T. , Cho J. and others. ( 2011 ). Pervasive sharing of genetic effects in autoimmune disease. PLoS genetics 7 ( 8 ), e1002254 . Google Scholar Crossref Search ADS PubMed Diggle P. , Heagerty P. , Liang K-Y. , Zeger S. ( 2009 ). Analysis of Longitudinal Data. Oxford : Oxford University Press . Ferreira M. A. , Purcell S. M. ( 2009 ). A multivariate test of association. Bioinformatics 25 ( 1 ), 132 – 133 . Google Scholar Crossref Search ADS PubMed Galesloot T. E. , van Steen K. , Kiemeney L. A. , Janss L. L. , Vermeulen S. H. ( 2014 ). A comparison of multivariate genome-wide association methods. PloS one 9 ( 4 ), e95923 . Google Scholar Crossref Search ADS PubMed Ji Y. , Biernacka J. M. , Hebbring S. , Chai Y. , Jenkins G. D. , Batzler A. , Snyder K. A. , Drews M. S. , Desta Z. , Flockhart D. and others. ( 2013 ). Pharmacogenomics of selective serotonin reuptake inhibitor treatment for major depressive disorder: genome-wide associations and functional genomics. The pharmacogenomics journal 13 ( 5 ), 456 – 463 . Google Scholar Crossref Search ADS PubMed Liu J. , Pei Y. , Papasian C. J. , Deng H. W. ( 2009 ). Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genetic epidemiology 33 ( 3 ), 217 – 227 . Google Scholar Crossref Search ADS PubMed Maity A. , Sullivan P. F. , Tzeng J. Y. ( 2012 ). Multivariate phenotype association analysis by marker-set kernel machine regression. Genetic epidemiology 36 ( 7 ), 686 – 695 . Google Scholar Crossref Search ADS PubMed Marchini J. , Howie B. , Myers S. , McVean G. , Donnelly P. ( 2007 ). A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics 39 ( 7 ), 906 – 913 . Google Scholar Crossref Search ADS PubMed O’Reilly P. F. , Hoggart C. J. , Pomyen Y. , Calboli F. C. , Elliott P. , Jarvelin M. R. , Coin L. J. ( 2012 ). MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one 7 ( 5 ), e34861 . Google Scholar Crossref Search ADS PubMed Roy J. , Lin X. , Ryan L. M. ( 2003 ). Scaled marginal models for multiple continuous outcomes. Biostatistics 4 ( 3 ), 371 – 383 . Google Scholar Crossref Search ADS PubMed Schaid D. J. , McDonnell S. K. , Sinnwell J. P. , Thibodeau S. N. ( 2013 ). Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic epidemiology 37 , 409 – 418 . Google Scholar Crossref Search ADS PubMed Schaid D. , Tong X. , Larrabee B. , Kennedy R. , Poland G. , Sinnwell J. ( 2016 ). Statistical methods for testing genetic pleiotropy. Genetics 204 , 483 – 497 . Google Scholar Crossref Search ADS PubMed Schifano E. D. , Li L. , Christiani D. C. , Lin X. ( 2013 ). Genome-wide association analysis for multiple continuous secondary phenotypes. American journal of human genetics 92 , 744 – 759 . Google Scholar Crossref Search ADS PubMed Schriner D. ( 2012 ). Moving toward system genetics through multiple trait analysis in genome-wide association studies. Frontiers in Genetics 16 , 1 – 7 . Sittler A. , Walter S. , Wedemeyer N , Hasenbank R , Scherzinger E. , Eickhoff H. , Bates G. P. , Lehrach H. , Wanker E. E. ( 1998 ). SH3GL3 associates with the Huntingtin exon 1 protein and promotes the formation of polygln-containing protein aggregates. Molecular cell 2 , 427 – 436 . Google Scholar Crossref Search ADS PubMed Solovieff N. , Cotsapas C. , Lee P. H. , Purcell S. M , Smoller J. W. ( 2013 ). Pleiotropy in complex traits: challenges and strategies. Nature reviews. Genetics 14 , 483 – 495 . Google Scholar Crossref Search ADS PubMed Stephens M. ( 2013 ). A unified framework for association analysis with multiple related phenotypes. PloS one 8 , e65245 . Google Scholar Crossref Search ADS PubMed Wu M. C. , Kraft P. , Epstein M. P. , Taylor D. M. , Chanock S. J. , Hunter D. J. , Lin X. ( 2010 ). Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86 , 929 – 942 . Google Scholar Crossref Search ADS PubMed Wu C. O. , Zheng G. , Kwak M. ( 2013 ). A joint regression analysis for genetic association studies with outcome stratified samples. Biometrics 69 , 417 – 426 . Google Scholar Crossref Search ADS PubMed Xu Z. , Pan W. ( 2015 ). Approximate score-based testing with application to multivariate trait association analysis. Genetic epidemiology 39 , 469 – 479 . Google Scholar Crossref Search ADS PubMed Yang Q. , Wang Y. ( 2012 ). Methods for analyzing multivariate phenotypes in genetic association studies. Journal of probability and statistics 2012 , 652569 . Google Scholar Crossref Search ADS PubMed Zhang Y. , Xu Z. , Shen X. , Pan W. ( 2014 ). Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. NeuroImage 96 , 309 – 325 . Google Scholar Crossref Search ADS PubMed © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Biostatistics Oxford University Press

Multivariate generalized linear model for genetic pleiotropy

Loading next page...
 
/lp/oxford-university-press/multivariate-generalized-linear-model-for-genetic-pleiotropy-vc8j2oVpIW

References (24)

Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
ISSN
1465-4644
eISSN
1468-4357
DOI
10.1093/biostatistics/kxx067
Publisher site
See Article on Publisher Site

Abstract

Summary When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that |$k+1$| traits are associated, given that the null of |$k$| associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice. 1. Introduction Genetic pleiotropy—a single gene influencing more than one trait—has been of keen interest to geneticists, and rightly so. Understanding pleiotropy can lead to greater biological insights. However, discovering pleiotropy can be challenging, because a gene can be associated with more than one trait for multiple reasons (Solovieff and others, 2013). Statistical methods to evaluate pleiotropy have been developed from different angles, ranging from comparison of univariate marginal associations of a genetic variant with multiple traits, to multivariate analyses with simultaneous regression of all traits on a genetic variant, to reverse regression of a genetic variant on all traits. Surveys of statistical methods for pleiotropy are provided elsewhere (Schaid and others, 2016; Schriner, 2012; Solovieff and others, 2013; Yang and Wang, 2012; Zhang and others, 2014), so we summarize the main approaches here. Univariate analyses are often based on comparison of variant-specific p-values across multiple traits. Although simple and feasible for meta-analyses, this approach ignores correlation among the traits and is based on post-hoc analyses. More formal meta-analysis methods aggregate p-values to test whether any traits are associated with a variant, yet a significant association could be driven by just one trait. A slightly more sophisticated approach, also based on summary p-values, tests whether the distribution of p-values differs from the null distribution of no associations beyond those already detected (Cotsapas and others, 2011). Description of additional univariate methods are given elsewhere (Solovieff and others, 2013). Multivariate methods have been popular for quantitative traits. Although different statistical methods have been proposed, some of them result in the same statistical tests. The following three approaches to analyze quantitative traits result in the same F-statistic to test whether any of the traits are associated with a genetic variant: (i) simultaneous regression of all traits on a single variant; (ii) regression of the minor allele dose on all traits; and (iii) canonical correlation of the matrix of traits with the minor allele dose [using plink.multivariate (Ferreira and Purcell, 2009)]. The regression of the dose of the minor allele on all traits is a convenient approach, particularly if some of the traits are binary. A slightly different approach is to account for the categorical nature of the dose of the minor allele: instead of using linear regression, use ordinal logistic regression of the dose on the traits [R MultiPhen package, (O’Reilly and others, 2012)]. An advantage of this approach is that it allows for binary traits, unlike most methods that assume traits are quantitative with a multivariate normal distribution. Recently, score tests for generalized linear models, based on estimating equations, have been developed as a way to simultaneously test multiple traits, some of which could be binary (Xu and Pan, 2015). Alternatively, joint analysis of a single quantitative trait and a single categorial trait has been developed, based on conditioning on the categorical trait (Wu and others, 2013). An approach somewhat between univariate and multivariate is based on reducing the dimension of the multiple traits by principal components (PC) and using a reduced set of PCs as either the dependent or independent variables in regression. A comparison of univariate and multivariate approaches found that multivariate methods based on multivariate normality { e.g., canonical correlation, linear regression of traits on minor allele dose, reverse regression, MultiPhen, Bayes methods [BIMBAM (Stephens, 2013), and SNPTEST (Marchini and others, 2007)]} all had similar power and were generally more powerful than univariate methods (Galesloot and others, 2014). The power advantage of multivariate over univariate methods occurs when the direction of the residual correlation is opposite from that of the genetic correlation induced by the causal variant (Galesloot and others, 2014; Liu and others, 2009). In addition to the methods discussed above, a few new approaches have been proposed, but have not yet been compared with others. An interesting approach is to scale the different traits by their standard deviation, and then assume that the effect of a SNP is constant across all traits to in order to construct a test of association with 1 degree of freedom—so-called “scaled marginal models” (Roy and others, 2003; Schifano and others, 2013). Finally, an approach based on kernel machine regression extended the sequential kernel association test (Wu and others, 2010) to multiple traits, providing a simultaneous test of multiple traits with multiple genetic variants in a genomic region (Maity and others, 2012). A limitation of all current approaches is that they test whether any traits are associated with a genetic variant, and small p-values could be driven by the association of the genetic variant with a single trait. Hence, post-hoc analyses are required to interpret the possibility of pleiotropy. This can be quite challenging when scaling up to a large number of genetic variants. Because current analytic methods for pleiotropy limit biological understanding, we recently developed new statistical methods to evaluate pleiotropy, based on a likelihood ratio test for quantitative traits (Schaid and others, 2016). This method was developed as a sequential testing framework, such that one could test the null hypothesis that |$k+1$| traits are associated, given that the null of |$k$| associated traits was rejected. This approach allows one to determine the number of traits associated with a genetic variant, and which traits are associated, while accounting for correlations among the traits. A limitation of this past work is that it was based on the assumption of a multivariate normal distribution. Although robust to distributions with thicker tails than a normal distribution, this approach is too restrictive to other traits, such as binary or ordinal traits, or perhaps a mixture of different types of traits. For this reason, we extended our methods to account for a wide variety of types of traits, based on generalized linear models (to account for alternate types of traits) and generalized estimating equations (GEE, to account for correlation among the traits). 2. Methods An outline of our approach follows. We first define a general linear model for each trait, and in turn define regression parameters for both the genotype of interest and adjusting covariates. These trait models allow us to compute covariances among the residuals to account for correlated traits. We describe how to estimate the parameters of the models, with special consideration for ordinal traits that require a few extra details. We then describe how to set up and test hypotheses regarding the number of regression coefficients constrained to zero, providing a way to consider combinatorial sets of constrained parameters. This framework provides a formal test of the null hypothesis of no pleiotropy (i.e., no more than one trait associated with a genotype), as well as an extension to perform sequential tests of the number of traits associated with a genotype. 2.1 Model and notation Suppose that for each subject |$i=1,\ldots,n$|⁠, a vector of |$p$| traits is measured, |$Y_{i} =(\,y_{i1} ,\cdots ,y_{ip} {)}',$| where the traits could be a mixture of different types (e.g., some quantitive, some binomial, and some ordinal). Also assume a |$q$|-dimensional vector of covariates for each subject and each trait, |$X_{ij} =(x_{ij1} ,\cdots ,x_{ijq} {)}'$|⁠. For each trait |$j = 1, \ldots , p$|⁠, suppose that |$y_{ij} $| follows a generalized linear model $$ g_{j} (\mu_{ij} )=\alpha_{j} +{X}'_{ij} \beta_{j} ,\quad Var(\,y_{ij} \vert X_{ij} )=\phi_{j} \nu_{j} (\mu _{ij} )_{\mathrm{,}} $$ where the mean is modeled as a function of the covariates, |$\mu_{ij}=E(\,y_{ij} \mid X_{ij} )$| with |$\beta_{j} =(\beta_{j1} ,\cdots ,\beta_{jq} {)}'$| the regression parameters to be estimated. Here, |$g_{j} $| is a known link function which has an inverse function |$h_{j} $|⁠; the variance function |$\nu_{j} $| is a known function of |$\mu_{ij} $|⁠; |$\phi_{j} $| is the corresponding scale parameter that needs to be estimated or is known. For simplicity, we assume that |$q=1$| since the statistical inferences below are easily extended to the general case |$q>1$|⁠. We assume that the covariance matrix of |$Y_{i} $| has the form $$ \Sigma_{i} =C_{i}^{1/2} RC_{i}^{1/2} , $$ where |$C_{i} =\textrm{diag}{\kern 1pt}\{\phi_{1} \nu_{1} (\mu_{i1} ),\ldots ,\phi_{p} \nu_{p} (\mu_{ip} )\}$| and |$R$| is the correlation matrix that could be assumed to have a special structure (Diggle and others, 2009). The regression parameters are organized as a vector for intercepts, |$\alpha=(\alpha_{1} ,\cdots ,\alpha_{p} {)}'$|⁠, a vector for covariates, |$\beta=(\beta_{1} ,\cdots ,\beta_{p} {)}'$|⁠, and a collection of both, |$\theta=({\alpha }',{\beta }'{)}'$|⁠, which has dimension |$d=p+qp$|⁠. Denote the true value of |$\theta $| as |$\theta_{0} $|⁠, and let |$\phi =(\phi_{1} ,\cdots,\phi_{p} {)}'$| denote the vector of nuisance scale parameters. In the following sections, we will discuss the estimation of |$\theta $| and hypothesis tests about |$\beta $|⁠. 2.2 Parameter estimation To estimate |$\theta $| and |$\phi $|⁠, we take advantage of standard procedures with standard software. Initial estimates of |$\hat{{\theta }}$| can be obtained by two different approaches. If all the traits are of the same type (e.g., all binomial), then standard software for GEE could be used, such as gee or glmgee packages in R software. A limitation of standard software for GEE, however, is that they do not allow for mixtures of different types of traits (e.g., quantitative mixed with binomial). To allow for a mixture of trait types, initial estimates could be obtained by performing separate regresssions for each trait. For example, using linear regression for quantitative traits and logistic regression for binomial traits. From these initial estimates, standardized residuals can be computed according to |$\varepsilon_{ij} (\hat{{\theta }})=(\,y_{ij} -\mu_{ij} (\hat{{\theta}}))/\sqrt {\nu_{j} (\mu_{ij} (\hat{{\theta }}))} $|⁠. From these residuals, one can estimate scale parameters by $$ \hat{{\phi }}_{j} =n^{-1}\sum\limits_{i=1}^n \varepsilon_{ij} (\hat{{\theta }})^{2}, $$ and an empirical estimate of the correlation by $$ \hat{{R}}_{kj} =\frac{1}{n}\sum\limits_{i=1}^n \varepsilon_{ij} (\hat{{\theta }})\varepsilon_{ik} (\hat{{\theta }})/\sqrt {\hat{{\phi }}_{k} \hat{{\phi }}_{j} } . $$ Using the intial estimates |$\hat{{\theta }}$|⁠, |$\hat{{\phi }}_{j} $|⁠, and |$\hat{{R}}_{kj} $|⁠, it is then possible to iteratively update these estimates to solve the generalized estimation equations |$U_{n} (\theta)=\sum\limits_{i=1}^n D_{i} (\theta )\Sigma_{i} (\theta ,\phi,R)^{-1}[Y_{i} -\mu_{i} (\theta )]=0,$| where |$D_{i} =\partial \mu_{i} /\partial \theta $|⁠, |$\mu_{i} (\theta)=[h_{1} (\alpha_{1} +x_{i1} \beta_{1} ),\cdots ,h_{p} (\alpha_{p}+x_{ip} \beta_{p} ){]}'$| and |$\Sigma_{i} (\theta ,\phi ,R)=C_{i} (\theta,\phi )^{1/2}RC_{i} (\theta ,\phi )^{1/2}.$| In practice, we find that allowing an unspecified correlation structure can make it difficult to achieve convergence for the above iterative procedure, a limitation found in many GEE software packages. For this reason, we propose to obtain initial estimates of |$\theta $|⁠, |$\phi $|⁠, and |$R$| and use these in the subsequent hypothesis testing framework. 2.3 Modification for ordinal traits The above statistical methods are easily extended to the proportional odds model for ordinal traits, with a minor modification to allow for more than one intercept for each ordinal trait. For an ordinal trait with |$K$| ordered categories and a single covariate |$x$|⁠, the proportional odds model is |$P(\,y\geqslant k\vert x)=\frac{e^{\alpha_{k} +\beta x}}{1+e^{\alpha_{k}+\beta x}},$| for |$k=2,\ldots,K$|⁠. So, for our proposed methods, the vector of intercepts for all traits needs to be expanded to account for |$(K-1)$| intercepts for each ordinal trait. Further alterations are needed to compute residuals, |$\varepsilon_{ij}(\hat{{\theta }})=(\,y_{ij} -\mu_{ij} (\hat{{\theta }}))/\sqrt {\nu_{j} (\mu_{ij} (\hat{{\theta }}))} $|⁠, because ordinal regression is modeling |$P(\,y\geqslant k\vert x)$|⁠. Recall that for unordered categorical data with |$K$| categories, |$(K-1)$| indicator variables can be used to record an observed category for a subject, such as |$I_{k} =1$| if category |$k$| is observed, and 0 otherwise. Extending this to ordered categories, we can use the cumulative sum |$y_{k}^{\ast } =\sum\nolimits_k^K {I_{k} } $| to record ordered categories, for |$k=2,\ldots,K$| (⁠|$k =$| 1 is ignored, because in this case |$y_{1}^{\ast } $| always equals 1). This means that we expand from a single ordinal trait |$y$| having values |$1,\ldots,K$| to a vector of traits |$y\ast=(\,y_{2}^{\ast } ,y_{3}^{\ast } ,\ldots,y_{K}^{\ast } )$|⁠, where each of the |$y_{k}^{\ast } $| have values of 0 or 1. This type of coding corresponds with |$E(\,y_{k}^{\ast } \vert x,\theta )=\mu_{k} (\theta )=P(\,y\geqslant k\vert x,\theta )$|⁠, as desired for ordinal regression. So, in summary, we expand each ordinal trait from |$y$| to a vector |$y\ast =(\,y_{2}^{\ast },y_{3}^{\ast } ,\ldots,y_{K}^{\ast } )$|⁠, and then compute residuals |$\varepsilon_{ijk} (\hat{{\theta }})=[y_{ijk}^{\ast } -\mu_{ijk} (\hat{{\theta}})]/\sqrt {\nu_{j} (\mu_{ijk} (\hat{{\theta }}))} $|⁠, where the variance function |$\nu_{j} (\mu_{ijk} (\hat{{\theta }}))=\mu_{ijk} (\theta )[1-\mu_{ijk} (\theta )]$| is for the binary random variable |$y_{ijk}^{\ast } $|⁠. With these modifications, we can follow the above procedures to compute the residuals across all traits, the scale parameters |$\hat{{\phi }}$|⁠, and the empirical estimate of the correlation |$\hat{{R}}$|⁠. A bit more book keeping is required when computing the derivative matrix, |$D_{i} =\partial \mu_{i} /\partial \theta $|⁠, but the statistical methods follow through for analyzing ordinal traits. In general, we allow for any mixture of types of traits, such as binary, ordinal, or quantiative. 2.4 Hypothesis testing We now consider testing the null hypothesis of no pleiotropy, which is formally stated as the following hypothesis test: |$H_{\mathrm{No\,Pleio}}$|⁠: Of the parmeters |$\beta_{1} ,\ldots,\beta_{p},$| there exists at most one that is non-zero. This null hypothesis is equivalent to testing whether one of the following |$p+1$| sub-hypotheses holds: $$ H_{k0} :\beta_{k} \ne 0,\beta_{j} =0\ (j\ne k),\quad \mbox{for}\ k=0,\ldots,p. $$ Note that |$H_{00} $| represents all |$\beta_{k} =0(k=1,\cdots ,p)$|⁠, while for |$k>0$|⁠, |$H_{k0} $| allows |$\beta_{k} \ne 0$| while all other |$\beta_{j} =0\,(j\ne k)$|⁠. To represent these |$p+1$| hypotheses, we use |$H_{k0} :V_{k} \theta =0$|⁠. Let |$V_{0} $| be a matrix such that |$H_{00} :V_{0} \theta =0$| tests whether all |$\beta_{j}=0$|⁠. This is the usual multivariate test. In this case, |$V_{0} $| is created for binary and quantitative traits by taking the identity matrix of dimension |$2p$|⁠, and then removing rows 1 through |$p$| (to ignore the intercepts |$\alpha_{1} ,\ldots,\alpha_{p} )$|⁠. To construct |$V_{k} $| (⁠|$k>0)$|⁠, create an identity matrix of dimension |$2p$|⁠, then remove rows 1 through |$p$| and row |$p+k$| (to ignore |$\beta_{k} )$|⁠. This results in |$V_{k} \theta =(\beta_{1} ,\ldots,\beta_{k-1} ,\beta_{k+1} \beta_{p}{)}'$|⁠. Then, the null hypothesis of no pleiotropy is equivalent to |$H_{0}$|⁠: there exists one of |$H_{k0} :V_{k} \theta=0$|⁠, for |$k=0,\ldots ,p$|⁠. To account for ordinal traits, one must account for the additional intercepts for each ordinal trait. Because we do not assume a specific multivariate distribution of the traits, we cannot work within the framework of likelihood ratio testing, as we had done for traits following a multivariate normal distribution (Schaid and others, 2016). Instead, we define the mean square error (MSE) of a model by |$\rho $|⁠, where $$ \rho_{n} (\theta )=\sum\limits_{i=1}^n [Y_{i} -\mu_{i} (\theta ){]}'\Sigma _{i} (\theta ,\phi ,R)^{-1}[Y_{i} -\mu_{i} (\theta )]. $$ We use the MSE of models constrained under a null hypothesis to construct a test statistic. As shown in the supplementary material available at Biostatistics online, to test a sub-hypothesis |$H_{k0} $|⁠, the statistic |$t_{k}=\hat{{\theta }}_{V_{k} }^{{\ast }'} A_{n} \hat{{\theta }}_{V_{k} }^{\ast }$| can be used, where |$\hat{{\theta }}_{V_{k} }^{\ast } $| denotes the difference between the unconstrained and constrained estimators |$(\hat{{\theta }}_{V_{k} }^{\ast } =\hat{{\theta }}_{n} -\hat{{\theta}}_{V_{k} } )$|⁠, computed as |$\hat{{\theta }}_{V_{k} }^{\ast } =A_{n}^{-1}{V}'_{k} [V_{k} A_{n}^{-1} {V}'_{k} ]^{-1}V_{k} \hat{{\theta }}_{n} $|⁠, where |$\hat{{\theta }}_{n} $| is the unconstrained estimator, and where $$ A_{n} =\sum\nolimits_{i=1}^n D_{i} (\hat{{\theta }}_{n} )\Sigma_{i} (\hat{{\theta }}_{n} ,\hat{{\phi }},\hat{{R}})^{-1}D_{i} (\hat{{\theta }}_{n} {)}'. $$ When |$H_{k0} $| is true, |$t_{k} \sim \chi_{d}^{2} $|⁠, where |$d$| is the rank of the idempotent matrix |$P_{V_{k} } =A^{-1/2} {V}'_{k} [V_{k} A^{-1}{V}'_{k} ]^{-1}V_{k} A^{-1/2} $| (Corollary 1 of supplementary material available at Biostatistics online). The statistics |$t_{0} ,t_{1} ,\ldots,t_{p} $| provide a way to test each sub-hypothesis |$H_{00} ,H_{10} ,\ldots,H_{p0} $|⁠. In the next sections, we propose a way to test the null hypothesis of no pleiotropy, and a sequential testing procedure to evaluate the number of traits associated with a genetic variant. 2.5 Testing pleiotropy The usual multivariate null hypothesis of no associated traits, |$H_{00} $|⁠: |$\beta =0$|⁠, can be tested by the statistic |$t_{0} \sim \chi_{p}^{2} $|⁠, so reject |$H_{00} $| if |$t_{0} >\chi_{p}^{2} (\alpha )$|⁠, where |$\chi_{p}^{2}(\alpha )$| is the |$1-\alpha $| quantile of a |$\chi^{2}$| distribution with |$p$| degrees of freedom (df). Alternatively, the null hypothesis of no pleiotropy can be tested by simultaneously testing |$H_{00} $| (i.e., no associated traits) and testing the null hypotheses that only one |$H_{k0} $| holds for |$k=1,\cdots ,p$| (i.e, only one associated trait). This is achieved by the statistic |$T_{pleio} =\mathop {\min }\limits_{k=0,1,\cdots ,p} t_{k}.$| As shown Corollary 3 of supplementary material available at Biostatistics online, if only one of the |$\beta $|’s is non-zero, which means that one |$H_{k0} $| holds true |$(k>0)$|⁠, |$T_{pleio} \sim \chi_{p-1}^{2} $|⁠, so reject the null hypothesis of no pleiotropy if |$T_{pleio} >\chi_{p-1}^{2} (\alpha)$|⁠. An advantage of this pleiotropy statistic is that it is simple to compute on a large number of genetic markers. However, if no traits are associated, the statistic |$T_{pleio} $| is the minimum of correlated statistics, each with a |$\chi^{2}$| distribution, so the distribution of |$T_{pleio} $| under the null hypothesis of no associated traits is not well defined. To aid interpretation of how many traits, and which traits, are associated with a genetic marker, we propose a sequential test procedure. 2.6 Sequential testing As we previously proposed (Schaid and others, 2016), sequential testing provides a rigorous procedure to evaluate the number of traits associated with a genetic variant. At the first stage, test the usual multivariate null hypothesis of no associated traits, |$H_{00} $|⁠: |$\beta =0$|⁠, using the statistic |$t_{0} \sim \chi_{p}^{2} $|⁠. If |$H_{00} $| cannot be rejected, then the |$H_{\mathrm{No\ pleio}}$| cannot be rejected. In contrast, if |$H_{00} $| is rejected, we turn to the second stage to test the null hypothesis that one sub-hypothesis |$H_{k0} $| holds for |$k=1,\ldots ,p$|⁠. We denote this second stage hypothesis |$H_{1} $|⁠, with subscript 1 indicating the number of associated traits under the null hypothesis. For this, we ignore |$t_{0} $| and use the test statisitc $$ T_{1} =\mathop{\min }\limits_{k=1,\ldots p} t_{k} . $$ As shown in Corollary 3 of supplementary material available at Biostatistics online, if only one of the |$\beta $|’s is non-zero, which means that one |$H_{k0} $| holds true |$(k>0)$|⁠, |$T_{1} \sim \chi_{p-1}^{2} $|⁠, so reject the null hypothesis |$H_{1} $| if |$T_{1} >\chi_{p-1}^{2} (\alpha )$|⁠. In general, for stages |$k=1,\cdots ,p-1$|⁠, test |$H_{k}$|⁠: only |$k$| components of |$\beta$| are not zero. The statistic for stage |$k$| is |$T_{k} =\mathop {\min }\limits_{i_{1},\ldots,i_{k} \in C(p,k)} \hat{{\theta }}_{V_{i_{1} ,\cdots ,i_{k} } }^{{\ast}'} A_{n} \hat{{\theta }}_{V_{i_{1} ,\cdots ,i_{k} } }^{\ast } $|⁠, where the indices |$i_{1} ,\ldots,i_{k} $| are chosen from all possible ways of choosing |$k$| unconstrained |$\beta $|’s among |$p$| traits [|$C(p,k)$|]. That is, the indices |$i_{1} ,\ldots,i_{k} $| correspond to the sub-hypothesis |${\kern 1pt}\beta_{i_{1} } \ne \;0,\cdots ,\beta_{i_{k} } \ne \;0\;and\;\beta_{j} =0 \,(j\ne i_{1} ,\cdots ,i_{k} {\kern 1pt})$|⁠. The contrast matrix |$V_{i_{1} ,\cdots,i_{k} } $| is contructed by constituting an identity matrix whose dimension is the number of estimated parameters, then deleting the rows that correspond to intercepts (to exclude |$\alpha $| intercepts), and then deleting rows with indices |$i_{1} ,\ldots,i_{k} $| for |$\beta $|’s not constrained to equal zero. Reject |$H_{k} $| if |$T_{k} >\chi_{p-k}^{2} (\alpha )$|⁠. If reject |$H_{k} $|⁠, continue this sequential testing by incrementing |$k$| by 1. If fail to reject |$H_{k} $|⁠, stop testing and conclude there are |$k$| traits associated with the genetic variant. The details of the statistical procedures of this general sequential testing procedure are provided in the supplementary material available at Biostatistics online as well as a proof that the Type-I error is controlled. This general sequential procedure provides a formal way to determine the number of traits associated with a genetic variant, and which traits are associated. To determine which traits are associated, the indices |$i_{1} ,\ldots ,i_{k} $| that generate the minimum for the statistic |$T_{k} $| correspond to the traits that are associated, because these indices correspond to unconstrained |$\beta $|’s that provide the best model fit, while all other traits are inferred to be non-associated because their |$\beta $|’s are contrained to zero. 2.7 Simulations To evaluate the adequacy of our developed methods, we performed simulations. For the null hypothesis of no pleiotropy, we performed two sets of simulations. To simulate traits with a specified covariance structure, we simulated random errors from a multivariate normal distribution with a covariance assumed to be a constant |$\rho $| for all pairs of traits (i.e., exchangeable correlation structure) and variance of 1 for each trait. For binary and ordinal traits, we used thresholds to convert from a normally distributed trait to a binary or ordinal trait, with thresholds chosen to achieve the desired frequency for the levels of these categorical traits. By this approach, we were also able to simulate mixtures of quantitative and categorical traits that were correlated. For null hypotheses of no pleiotropy, we assumed either (i) all |$\beta_{j} =0$|⁠, the usual null for multivariate data or (ii) |$\beta_{1} \ne 0$| and all other |$\beta_{j} =0$| (⁠|$j=2,\ldots,p)$|⁠. The value of |$\beta_{1} $| was chosen to achieve 80% power for the marginal effect of the first trait at a nominal Type-I error rate of 0.05. We assumed two different sample sizes, |$n= 500,1000$|⁠, and two different values of the number of traits, |$p=4,10$|⁠. For all simulations, a single SNP was simulated, assuming a minor allele frequency of 0.2. All simulations were repeated 1000 times. 2.8 Data application Our multivariate methods were applied to a study of major depression, combining data from 725 Mayo Clinic patients who participated in two prior studies. All patients had severe depression, and recruitment of subjects and genotype assays are described in the original reports (Biernacka and others, 2015; Ji and others, 2013). For demonstration of our methods, the primary traits of interest were based on the baseline Hamilton Rating Scale for Depression (HAM-D). The HAM-D is a multiple item questionnaire used to provide an indication of depression. A patient is rated by a clinician according to 17 items, each item on a three- or five-point scale, with higher responses indicating more severe symptoms. The total of all 17 scores is used to measure depression, and a total score of 0–7 is considered to be normal, while scores of 20 or higher indicate moderate or worse depression. In our study, the majority of subjects answered the item “Loss of Insight” with the same response (acknowledges being depressed and ill), and because there was little information in this item, it was not used in the analyses. The remaining 16 items were analyzed as multivariate traits in a genome-wide association study (GWAS). The goal was to understand the genetic basis of the different symptoms of depression. The genetic data were based on the approximately 6.9 million measured and imputed single nucleotide polymorphisms (SNPs), having used the software Beagle to impute SNPs with the 1000 Genome cosmopolitan reference sample. 3. Results 3.1 Type-I error of |$t_{{0}}$| The statistic |$t_{{0}}$|⁠, which tests the usual multivariate null hypothesis of no associated traits, had well controlled Type-I error rates when all traits were binary (Table 1), or there was a 50:50 mixture of binary and quantitative traits (Table S1 of supplementary material available at Biostatistics online). When all traits are quantitative, our proposed methods reduce to the methods we presented elsewhere (Schaid and others, 2016), so simulations for purely quantitative traits were not performed here. In contrast, for ordinal traits, the |$t_{{0}}$| statistic sometimes had slightly inflated Type-I error rates (Table S2 and S3 of supplementary material available at Biostatistics online). There are likely several reasons for this inflated error rate. First, for |$K$| levels of an ordinal trait, one must estimate |$K-1$| intercept parameters. Second, to compute the residual correlation matrix, an ordinal trait with |$K$| levels contributes |$K-1$| indicator variables, increasing the number of correlation parameters to estimate for each ordinal trait. It is likely that the additional variability of estimating multiple intercepts and multiple correlations slows down the rate of convergence of |$t_{0} $| to a |$\chi_{p}^{2} $| distribution. As an alternative, we evaluated the benefit of treating ordinal traits as Gaussian and using the identity link function. Tables S2 and S3 of Supplementary material available at Biostatistics online show that treating ordinal traits as Gaussian had better control of the Type-I error rates. To further explore the impact of sample size, we simulated sample sizes of 500, 1000, and 5000, for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Results in Figure 1 show the empirical versus the expected quantiles of p-values for |$t_{0} $|⁠. Figure 1 illustrates that the ordinal link function results in a closer fit to a |$\chi_{p}^{2}$| distribution as sample size increases, and that treating ordinal traits as Gaussian gives a closer fit to a |$\chi_{p}^{2} $| distribution than an ordinal link when sample sizes are not large. Fig. 1. View largeDownload slide Observed versus expected quantiles of |$-$|log|$_{\mathrm{10}}$|(p-value) for |$t_{o} $| statistic when there are no associated traits for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Ordinal links on left panels and Gaussian analyses on right panels. Fig. 1. View largeDownload slide Observed versus expected quantiles of |$-$|log|$_{\mathrm{10}}$|(p-value) for |$t_{o} $| statistic when there are no associated traits for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Ordinal links on left panels and Gaussian analyses on right panels. Table 1. Type-I error rates for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits, all |$\beta_{1} =0$| Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 † Exceeds 95% upper confidence interval. Table 1. Type-I error rates for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits, all |$\beta_{1} =0$| Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 Type-I error rate, nominal Type-I No. subjects No. traits Fraction cases Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 500 4 0.2 0.2 0.046 0.005 0.011 0 0.5 0.059 0.006 0.011 0.001 0.8 0.052 0.006 0.007 0 0.5 0.2 0.044 0.001 0.008 0 0.5 0.051 0.006 0.009 0 0.8 0.050 0.003 0.009 0.001 10 0.2 0.2 0.050 0.006 0.011 0 0.5 0.050 0.004 0.009 0 0.8 0.052 0.006 0.014 0.001 0.5 0.2 0.050 0.007 0.007 0.001 0.5 0.052 0.003 0.008 0 0.8 0.050 0.008 0.016 0.001 1000 4 0.2 0.2 0.044 0.003 0.006 0 0.5 0.044 0.002 0.009 0 0.8 0.034 0.006 0.005 0.002 0.5 0.2 0.048 0.002 0.010 0.001 0.5 0.050 0.002 0.010 0 0.8 0.046 0.005 0.007 0 10 0.2 0.2 0.054 0.004 0.007 0.001 0.5 0.049 0.005 0.009 0 0.8 0.045 0.006 0.012 0.001 0.5 0.2 0.050 0.006 0.006 0 0.5 0.049 0.003 0.010 0 0.8 0.060 0.008 0.017† 0.002 † Exceeds 95% upper confidence interval. 3.2 Type-I error of |$T_{pleio}$| When all |$\beta $|’s equal zero, |$T_{\mathrm{pleio}}$| tends to be very conservative (Table 1). This is not suprising, because under this null hypothesis, |$T_{\mathrm{pleio}}$| is based on the minimum of correlated |$\chi^{2}$| statistics, and the asymptotic distribution of |$T_{\mathrm{pleio}}$| is unknown. In contrast, simulation results for testing pleiotropy with |$T_{\mathrm{pleio}}$| when only a single |$\beta $| is non-zero are presented in Table 2 for when all traits are binary, in Table S4 of supplementary material available at Biostatistics online for a 50:50 mixture of binary and quantitative traits for when a single binary trait has a non-zero |$\beta$|⁠, and in Table S5 of supplementary material available at Biostatistics online for a 50:50 mixture of binary and quantitative traits for when a single quantitative trait has a non-zero |$\beta $|⁠. The size of the |$\beta $| was chosen such that the marginal power of a single trait was 0.80 for a nominal Type-I error rate of 0.05. For all these simulations, the Type-I error rate for |$T_{\mathrm{pleio}}$| is close to the nominal rate, illustrating that |$T_{\mathrm{pleio}}$| closely follows a |$\chi_{p-1}^{2} $| distribution. The results in Tables 2 and Tables S4 and S5 of supplementary material available at Biostatistics online also show the power of testing the usual multivariate null hypothesis with |$t_{0} $| when a single |$\beta $| is non-zero. It is not surprising to see that power increased with increasing trait correlation and with fewer number of tested traits. Table 2. Simulation results for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits when |$\beta_{1} \ne 0$| Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 Table 2. Simulation results for |$t_{o} $| and |$T_{\mathrm{pleio}}$| for binary traits when |$\beta_{1} \ne 0$| Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 Power and Type-I error rate, nominal type-I No. subjects No. traits Fraction cases |$\beta_{1} $| Trait correlation 0.05 0.01 |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 500 4 0.2 0.51 0.2 0.521 0.041 0.317 0.005 0.5 0.630 0.042 0.407 0.007 0.8 0.803 0.045 0.614 0.013 0.5 0.42 0.2 0.534 0.033 0.305 0.002 0.5 0.631 0.040 0.374 0.009 0.8 0.829 0.042 0.654 0.010 10 0.2 0.51 0.2 0.418 0.026 0.188 0.002 0.5 0.511 0.041 0.277 0.003 0.8 0.677 0.037 0.469 0.013 0.5 0.42 0.2 0.395 0.040 0.180 0.005 0.5 0.493 0.045 0.257 0.007 0.8 0.742 0.060 0.507 0.011 1000 4 0.2 0.37 0.2 0.599 0.033 0.374 0.005 0.5 0.638 0.033 0.411 0.007 0.8 0.840 0.058 0.662 0.011 0.5 0.30 0.2 0.584 0.039 0.307 0.008 0.5 0.648 0.039 0.396 0.007 0.8 0.862 0.050 0.672 0.009 10 0.2 0.37 0.2 0.453 0.039 0.244 0.008 0.5 0.510 0.053 0.288 0.009 0.8 0.763 0.053 0.548 0.012 0.5 0.30 0.2 0.427 0.037 0.191 0.003 0.5 0.538 0.042 0.298 0.007 0.8 0.762 0.050 0.558 0.009 To evaluate the Type-I error rate of |$T_{\mathrm{pleio}}$| for ordinal traits, we simulated 5 equally correlated traits, with each trait having either 3 or 5 ordered categories. We allowed a single trait to have a non-zero |$\beta $|⁠, where the |$\beta $| was the effect size per allele and chosen such that the marginal power for a single trait was 0.80 for a nominal Type-I error rate of 0.05. For the analyses, we used both an ordinal logistic link and treated ordinal traits as Gaussian. The results in Table 3 show that both procedures have well-controlled Type-I error rates and comparable power to test the null hypothesis of no association using the |$t_{0} $| statistic. Table 3. Simulations for 5 ordinal traits with 3 or 5 categories each, when |$\beta_{1} \ne 0$| Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 † Exceeds 95% upper confidence interval. Table 3. Simulations for 5 ordinal traits with 3 or 5 categories each, when |$\beta_{1} \ne 0$| Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 Ordinal link, nominal Type-I Gaussian analysis, nominal Type-I 0.05 0.01 0.05 0.01 No. categories No. subjects Trait correlation |$\beta_{1} $| |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I |$t_{o} $| power |$T_{\mathrm{pleio}}$| Type-I 3 500 0.2 0.41 0.585 0.049 0.340 0.005 0.593 0.047 0.360 0.005 0.5 0.722 0.040 0.482 0.002 0.728 0.042 0.493 0.002 0.8 0.966 0.059 0.880 0.004 0.967 0.057 0.879 0.004 1000 0.2 0.29 0.595 0.038 0.363 0.009 0.593 0.035 0.366 0.009 0.5 0.737 0.038 0.522 0.005 0.737 0.037 0.523 0.006 0.8 0.955 0.054 0.861 0.005 0.956 0.051 0.853 0.005 5 500 0.2 0.39 0.535 0.033 0.309 0.007 0.518 0.029 0.295 0.007 0.5 0.716 0.065† 0.490 0.008 0.706 0.060 0.472 0.007 0.8 0.962 0.058 0.899 0.012 0.960 0.049 0.888 0.008 1000 0.2 0.28 0.648 0.048 0.413 0.010 0.647 0.046 0.410 0.008 0.5 0.794 0.044 0.568 0.009 0.786 0.043 0.567 0.007 0.8 0.991 0.051 0.964 0.014 0.989 0.051 0.957 0.012 † Exceeds 95% upper confidence interval. 3.3 Power for pleiotropy Simulation results for power to detect pleiotropy are presented in Table 4 for five binary traits, with effect size |$\beta $| chosen such that there is 80% power to detect the marginal effect of a single trait (Type-I error rate of 0.05). The results show that power for |$T_{\mathrm{pleio}}$| increases as either the trait correlation increases or the number of associated traits increases. Similar findings for ordinal traits are presented in Tables S6 and S7 of supplementary material available at Biostatistics online for five traits with three or five ordered categories each, respectively. The results for ordinal traits show similar power when analyzing ordinal traits with either an ordinal logistic link or treating them as Gaussian. Table 4. Power for 5 binary traits of which either 2 or 3 are associated with a SNP Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power for |$t_{o} $| tests the null hypothesis that no traits are associated, and power for |$T_{\mathrm{pleio}}$| tests the null hypothesis that no more than 1 trait is associated. Table 4. Power for 5 binary traits of which either 2 or 3 are associated with a SNP Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power, nominal Type-I No. associated traits No. subjects Fraction cases |$\beta $| Trait correlation 0.05 0.01 |$t_{o} $| |$T_{\mathrm{pleio}}$| |$t_{o} $| |$T_{\mathrm{pleio}}$| 2 500 0.2 0.51 0.2 0.869 0.384 0.679 0.157 0.5 0.896 0.522 0.751 0.24 0.8 0.965 0.762 0.911 0.509 0.5 0.42 0.2 0.837 0.343 0.623 0.148 0.5 0.890 0.485 0.727 0.227 0.8 0.978 0.778 0.921 0.505 1000 0.2 0.37 0.2 0.862 0.400 0.683 0.159 0.5 0.869 0.464 0.701 0.209 0.8 0.969 0.721 0.897 0.461 0.5 0.30 0.2 0.826 0.331 0.611 0.132 0.5 0.874 0.468 0.698 0.212 0.8 0.958 0.737 0.895 0.462 3 500 0.2 0.51 0.2 0.955 0.783 0.894 0.566 0.5 0.967 0.828 0.886 0.610 0.8 0.986 0.910 0.957 0.761 0.5 0.42 0.2 0.953 0.760 0.867 0.516 0.5 0.940 0.754 0.828 0.507 0.8 0.982 0.908 0.948 0.751 1000 0.2 0.37 0.2 0.926 0.689 0.822 0.44 0.5 0.922 0.746 0.813 0.498 0.8 0.979 0.876 0.922 0.680 0.5 0.30 0.2 0.909 0.654 0.781 0.398 0.5 0.912 0.686 0.775 0.448 0.8 0.969 0.841 0.905 0.654 Power for |$t_{o} $| tests the null hypothesis that no traits are associated, and power for |$T_{\mathrm{pleio}}$| tests the null hypothesis that no more than 1 trait is associated. 3.4 Properties of sequential test Following the simulation parameters in Table 4, we performed simulations to evaluate the properties of the sequential test. We performed simulations for five binary traits, allowing either two or three associated traits with effect size 0.3, which had 50% marginal power for a single trait association test with a sample size of 500, or 80% power for a sample size of 1000. The results in Table 5 illustrate the following main properties of the sequential test: (i) there was an increased chance of selecting at least one true |$\beta $| as sample size increased, as the number of associated traits increased, or as the trait correlation increased; (ii) when the trait correlation increased, there was in increased chance of selecting both true and false |$\beta $|’s; (iii) the chance of selecting the exact true model depends on the number of associated traits, the trait correlation, the effect size, and the sample size; (iv) there is a trade-off for selecting true and false |$\beta $|’s that depends on the trait correlation. For three of five associated traits with high correlation (⁠|$\rho =0.8)$| there was a 56% chance of correctly selecting all three true |$\beta $|’s, although there was a 23% chance of selecting at least one false |$\beta $|⁠. For |$\rho =0.2$|⁠, there was a 27% chance of selecting all three true betas, but a lower 6% chance of selecting at least one false |$\beta $|⁠; (v) the best scenario for correct model selection is a large sample size (determined by effect size), and moderately correlated traits (for increased power to select true betas, but correlation not too large to avoid also selecting false |$\beta $|’s). Table 5. Properties of sequential testing for five binary traits |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 Proportions out of 1000 simulations for selecting |$\beta $|’s according to true (⁠|$\beta =0.3)$| and false (⁠|$\beta =0)$| coefficients. Marginal power was 50% to detect a single trait for |$N =$| 500, and marginal power of 80% for |$N =$| 1000. Sequential testing was at |$\alpha =0.05$|⁠. Table 5. Properties of sequential testing for five binary traits |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 |$N = 500$| |$N = 1000$| No. associated traits Trait correlation No. selected coefficients True |$\beta $|’s selected False |$\beta $|’s selected True |$\beta $|’s selected False |$\beta $|’s selected 2 0.2 0 0.613 0.924 0.209 0.928 1 0.329 0.074 0.499 0.065 2 0.058 0.002 0.292 0.007 0.5 0 0.528 0.902 0.181 0.900 1 0.353 0.089 0.431 0.090 2 0.119 0.005 0.388 0.010 3 0.000 0.004 0.000 0.000 0.8 0 0.410 0.868 0.069 0.917 1 0.370 0.116 0.249 0.067 2 0.220 0.015 0.682 0.009 3 0.000 0.001 0.000 0.007 3 0.2 0 0.380 0.923 0.072 0.940 1 0.372 0.070 0.238 0.057 2 0.192 0.007 0.419 0.003 3 0.056 0.000 0.271 0.000 0.5 0 0.455 0.861 0.097 0.879 1 0.264 0.128 0.176 0.109 2 0.185 0.011 0.353 0.012 3 0.096 0.000 0.374 0.000 0.8 0 0.457 0.716 0.189 0.774 1 0.215 0.180 0.079 0.086 2 0.187 0.104 0.176 0.140 3 0.141 0.000 0.556 0.000 Proportions out of 1000 simulations for selecting |$\beta $|’s according to true (⁠|$\beta =0.3)$| and false (⁠|$\beta =0)$| coefficients. Marginal power was 50% to detect a single trait for |$N =$| 500, and marginal power of 80% for |$N =$| 1000. Sequential testing was at |$\alpha =0.05$|⁠. 3.5 Comparison with other methods Other methods to evaluate pleiotropy focus on testing the null hypothesis that no traits are associated with a genetic variant, in contrast to our methods that formally test the null hypothesis of one or fewer associated traits, or that provide a sequential test for which traits are associated with a genetic variant. Furthermore, few methods allow for multiple binary traits or a mixture of types of traits. To compare our methods with existing methods, we focused on testing the null hypothesis that no traits are associated with a genetic variant, and compared our proposed |$t_{0}$| statistic with two other methods: (i) MultiPhen (O’Reilly and others, 2012), which reverses the roles of traits and genetic variant by using ordinal logistic regression to regress the three categories of a genotype minor allele dosage on traits; (ii) a joint regression with outcome stratified samples proposed by Wu and others (2013). The Wu approach considers a quantitative trait and a categorical trait and creates a joint likelihood of the two traits by first modeling the probability of the categorical trait, conditional on the genetic variant, and then stratifying on the categorical trait and using linear regression within each strata to model the association of the quantitative trait with the genetic variant. To compare our |$t_{0}$| statistic with the other methods, we simulated a single quantitative trait and a single binary trait, using the simulation methods described above. In this situation, our |$t_{0} $| statistic and the MultiPhen ordinal regression statistic both have an asymptotic chi-square distribution with 2 df. In contrast, the stratified Wu approach has 3 df: one for the logistic regression of the binary trait on the genetic variant, and one for each of the two binary trait strata, for the regression of the quantitative trait on the genetic variant. This reveals the main difference or our approach versus Wu’s stratified approach. The stratified approach allows the effect of the genetic variant on the quantitative trait to vary across the binary trait strata. The power of the three methods are illustrated in Table 6 for when the effect of a genetic variant on a quantitative trait is constant over the binary trait strata. In this scenario, our |$t_{0} $| statistic and the MultiPhen approach had similar power, albeit slightly greater power for |$t_{0} $|⁠. Both of these methods had greater power than Wu’s stratified approach. The weaker power of Wu’s approach results from an extra degree of freedom when stratifying on the binary trait. For categorical traits with more than two levels, power is likely to diminish because of the extra parameters to test. In contrast, when the effect of a genetic variant on a quantitative trait varied over the binary trait strata, Table 7 illustrates that power of Wu’s method was greater than our |$t_{0} $| statistic and the MultiPhen approach, with the power advantage depending on the magnitude of the heterogeneity across strata. Table 6. Power of methods when effect of genetic variant on quantitative trait is constant over binary trait strata A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 Table 6. Power of methods when effect of genetic variant on quantitative trait is constant over binary trait strata A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 A. Genetic effect on only binary trait No. subjects Fraction cases Trait correlation Binary Quantitative |$t_{0} $| MultiPhen Wu stratified OR |$\beta $| 500 0.2 0.2 1.67 0 0.719 0.689 0.666 0.5 0 0.743 0.707 0.688 0.8 0 0.876 0.855 0.852 0.5 0.2 1.52 0 0.644 0.652 0.596 0.5 0 0.737 0.721 0.683 0.8 0 0.868 0.855 0.828 1000 0.2 0.2 1.45 0 0.679 0.665 0.621 0.5 0 0.747 0.723 0.698 0.8 0 0.856 0.835 0.828 0.5 0.2 1.35 0 0.651 0.628 0.593 0.5 0 0.751 0.731 0.693 0.8 0 0.873 0.859 0.840 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 B. Genetic effect on both binary and quantitative traits 500 0.2 0.2 1.1 0.18 0.514 0.490 0.442 0.5 1.1 0.18 0.586 0.571 0.518 0.8 1.1 0.18 0.750 0.735 0.695 0.5 0.2 1.1 0.18 0.528 0.509 0.477 0.5 1.1 0.18 0.636 0.629 0.563 0.8 1.1 0.18 0.851 0.838 0.804 1000 0.2 0.2 1.1 0.18 0.873 0.865 0.834 0.5 1.1 0.18 0.948 0.928 0.913 0.8 1.1 0.18 0.989 0.984 0.983 0.5 0.2 1.1 0.18 0.904 0.888 0.870 0.5 1.1 0.18 0.957 0.950 0.945 0.8 1.1 0.18 0.996 0.996 0.992 Table 7. Power of methods when effect of genetic variant on quantitative trait varies over binary trait strata A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 Table 7. Power of methods when effect of genetic variant on quantitative trait varies over binary trait strata A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 A. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.18$| for binary stratum 1 No. subjects Fraction cases Trait correlation |$t_{0} $| MultiPhen Wu stratified 500 0.2 0.2 0.098 0.093 0.121 0.5 0.100 0.093 0.165 0.8 0.143 0.125 0.276 0.5 0.2 0.206 0.198 0.249 0.5 0.264 0.246 0.351 0.8 0.411 0.399 0.513 1000 0.2 0.2 0.131 0.125 0.249 0.5 0.164 0.162 0.312 0.8 0.201 0.203 0.495 0.5 0.2 0.390 0.365 0.501 0.5 0.512 0.500 0.643 0.8 0.722 0.713 0.850 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 B. OR |$=$| 1.1 for binary trait. Quantitative trait effects are |$\beta_{0} =0$| for binary stratum 0 and |$\beta_{1} =0.40$| for binary stratum 1 500 0.2 0.2 0.182 0.171 0.504 0.5 0.225 0.212 0.618 0.8 0.293 0.274 0.852 0.5 0.2 0.695 0.677 0.899 0.5 0.802 0.774 0.953 0.8 0.924 0.922 0.993 1000 0.2 0.2 0.349 0.336 0.828 0.5 0.428 0.393 0.912 0.8 0.545 0.519 0.993 0.5 0.2 0.945 0.940 0.994 0.5 0.976 0.968 0.999 0.8 1.000 0.999 1.000 3.6 Application to GWAS symptoms of depression When we applied the multivariate |$t_{0} $| statistic to the GWAS symptoms of depression, treating the 16 response items as ordinal response traits, and the dose of the minor allele as a predictor variable, we found the quantile–quantile plot of the GWAS results to suggest an excess of small p-values, consistent with our simulation results. For this reason, we treated the 16 response items as Gaussian random variables and found the resulting quantile–quantile plot to be more acceptable. Although none of the individual |$t_{0} $| statistics for the SNPs achieved genome-wide significance (i.e., p-value < 5E|$-$|8), the smallest p-value of 4.0E|$-$|7 for the imputed SNP rs11635365 is intriguing, because this common variant (minor allele frequency of 0.38) is within an intron of a gene. In contrast to our multivariate analysis, the most common way to analyze the HAM-D assessment tool data is to sum all response items to create a single score. We illustrate the univariate associations of rs11635365 with each of the 16 response items in Table 8, along with the association of the sum of all 16 response items. This table illustrates that the multivariate |$t_{0}$| statistic results in a much smaller p-value than that for the sum of 16 responses, as well as for any of the individual HAM-D items. Some causes of this difference are likely the different signs of the regression coefficients as well as the correlation structure among the response items. The 120 correlations for the HAM-D data ranged |$-$|0.08 to 0.55, with the majority of correlations between 0 and 0.2. Table 8. Univariate regression results for top-hit SNP rs11635365 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 Table 8. Univariate regression results for top-hit SNP rs11635365 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 Item HAM-D item Coefficient Standard error p-value 1 Initial insomnia |$-0.11$| 0.05 0.0270 2 Middle insomnia |$-0.05$| 0.04 0.2426 3 Delayed (late) insomnia |$-0.02$| 0.05 0.6954 4 Depressed mood |$-0.05$| 0.04 0.2541 5 Psychic anxiety |$-0.03$| 0.04 0.5307 6 Appetite |$-0.03$| 0.05 0.5137 7 Weight loss 0.08 0.04 0.0748 8 Guilt feelings and delusions |$-0.04$| 0.05 0.3907 9 Suicide |$-0.05$| 0.05 0.3725 10 Work and activities (interests) |$-0.11$| 0.04 0.0144 11 Somatic symptoms, general (energy) 0.06 0.03 0.0297 12 Libido |$-0.02$| 0.04 0.6366 13 Retardation |$-0.09$| 0.05 0.1021 14 Agitation |$-0.16$| 0.04 0.0004 15 Anxiety (somatic) |$-0.11$| 0.04 0.0166 16 Hypochondriasis |$-0.15$| 0.04 0.0005 Total Sum of responses |$-0.86$| 0.27 0.0016 To illustrate use of the sequential test procedure, we illustrate results in Table 9 if one were to use a nominal Type-I error rate of 0.05 to decide when to stop sequential testing. Based on this threshold, the results in Table 9 show that HAM-D response items 7 (weight loss), 11 (somatic symptoms), 14 (agitation), and 16 (hypochondriasis) were the main drivers of the statistical association. It is interesting that item 11 was selected at the first step, despite this item not having the smallest marginal p-value. This is likely due to the signs of the regression coefficients and their correlation structure. Table 9. Results from sequential test of HAM-D response items with SNP rs11635365 Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) Table 9. Results from sequential test of HAM-D response items with SNP rs11635365 Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) Sequential step: null no. associated traits Statistic p-value HAM-D item indices that generate the minimum for the statistic |$T_{k} $|⁠; the traits that provide best model fit under constrained hypothesis 0 |$t_{0} $| 4.0E|$-$|7 1 |$T_{1} $| 4.9E|$-$|5 11 2 |$T_{2} $| 1.5E|$-$|3 11,14 3 |$T_{3} $| 2.0E|$-$|2 11,14,16 4 |$T_{4} $| 1.7E|$-$|1 7,11,14,16 (Stop, p-value > 0.05) 4. Discussion Understanding genetic pleiotropy can provide insights to complex biological mechanisms of genes and aid development of pharmacologic targets. Yet, most statistical methods to assess pleiotropy have resorted to ad hoc comparison of univariate statistical tests or multivariate methods that test the null hypothesis of no trait associations. We recently developed a formal statistical test of pleiotropy based on likelihood ratio tests for multivariate normally distributed traits (Schaid and others, 2016), and now extend our approach in this current work to traits that can be modeled by generalized linear models. Our test of pleiotropy, and its corresponding sequential testing framework, allows for a variety of traits commonly encountered in human genetic studies, such as quantitative, binary, or ordinal traits as well as a mixture of different types of traits. Our simulations show that the proposed methods work well for binary and quantitative traits, or a mixture of these types of traits. In contrast, our methods can have inflated Type-I error rates for ordinal traits if the sample size is not large. To correct this, our simulation results show that using treating ordinal traits as Gaussian is a reasonable strategy, providing controlled Type-I error rates while having reasonable power. We proposed a sequential testing procedure, where the null hypothesis of no associated traits could be tested first (using standard multivariate regression methods), and if significant, followed by sequentially testing the null of |$k$| associated traits (⁠|$k=1,\ldots,p-1)$|⁠, until the sequential test fails to reject the null hypothesis. This approach provides a way to assess the number of traits associated with a genetic variant, accounting for the correlations among the traits. As we emphasized elsewhere (Schaid and others, 2016), it is possible to extend our approach to genetically related subjects by using an |$n\times n$| matrix |$G$| to account for relationships. For pedigree data, |$G$| contains diagonal elements |$G_{ii} =1+h_{i} $| where |$h_{i} $| is the inbreeding coefficient for subject |$i$|⁠, and off-diagonal elements |$G_{ij} =2\varphi_{ij} $|⁠. The parameter |$\varphi_{ij} $| is the kinship coefficient between individuals |$i $| and |$j$|⁠, the probability that a randomly chosen allele at a given locus from individual |$i $| is identical by descent to a randomly chosen allele from individual |$j$|⁠, conditional on their ancestral relationship. For data with population structure, |${G}$| is an estimate of genetic relationships (Schaid and others, 2013). Then, the covariance between subjects |$i$| and |$j$|⁠, for traits |$k$| and |$k'$|⁠, is |$\Sigma_{ij,kk'} =\phi_{k} \nu_{k} (\mu_{ik} )\phi_{k'} \nu_{k'} (\mu_{jk'} )R_{kk'} G_{ij}$|⁠. By using simulations to compare our |$t_{o} $| statistic with MultiPhen ordinal regression and Wu’s stratified method, we found similar power between |$t_{o}$| and MultiPhen. The main advantage of our approach is that it allows the multiple traits to depend on different sets of adjusting covariates, in contrast to MultiPhen which would require a common set of covariates, along with the traits, to serve as predictor variables in ordinal regression. The relative power of Wu’s stratified approach depended on whether the effect of the genetic variant on a quantitative trait varied over strata, with weaker power in the absence of heterogeneity, but greater power as heterogeneity increased. A limitation of the stratified approach would be extending it to multiple binary traits, resulting in highly stratified samples. In contrast, our approach easily handles multiple binary traits by testing the marginal effects of a genetic variant on the multiple traits. When we applied our methods to a GWAS of symptoms of major depression, no SNP associations achieved genome wide significance, although the SNP with the smallest p-value of 4.0E|$-$|7 was interesting because this SNP is within an intron of the gene SH3GL3. This is a protein-coding gene on chromosome 4 that is preferentially expressed in brain and testis and interacts with the Huntington’s disease gene, HTT (Sittler and others, 1998). Huntington disease is inherited in an autosomal dominant manner, due primarily to genetic mutations in the HTT gene on chromosome 4. Although our subjects do not have Huntington disease, it is important to understand that some symptoms of Huntington disease include various emotional and psychiatric problems, of which the most common is depression; depression is not a reaction to having Huntington’s disease diagnosis, but rather appears to occur because of brain injury and brain changes from the disease. The characteristics of the interaction between SH3GL3 and HTT suggest that SH3GL3 could be involved in the selective neuronal cell death in patients with Huntington’s disease (Sittler and others, 1998). This raises the question of whether our results suggest that SH3GL3 also plays a role in major depression, independent of Huntington’s disease. Further studies, preferably functional studies, are required to evaluate whether genetic variants in SH3GL3 are associated with major depression, and in particular whether this gene is associated with certain features of depression, as suggested by our sequential analyses. To demonstrate how the sequential testing procedure can be used, we used a p-value threshold of 0.05 to stop the sequential testing. For analysis of GWAS data, this threshold might be too liberal, depending on how many SNPs are found to be associated with traits at prior steps in the sequential testing framework. In practice, it seems sensible to account for the number of tests at each sequential step by using a Bonferroni correction for the number of tests performed at each step. Software: The software package “pleio” is available as an R package from the CRAN web site (https://cran.r-project.org/web/packages/pleio/index.html). This package contains our original pleio.q functions for quantitative traits and pleio.glm functions for the methods discussed in this current work. Reproducible Research: The data and scripts used for analysis of the depression study are available at Github: https://github.com/sinnweja/rpleio/releases/tag/v1.0. Acknowledgments This research was supported by the U.S. Public Health Service, National Institutes of Health, contract grants numbers GM065450, GM28157, and GM61388. We gratefully acknowledge use of the depression data from the Mayo Clinic Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study. Finally, we would like to thank the patients who participated in the PGRN-AMPS SSRI trial as well as the Mayo Clinic psychiatrists who cared for them. Conflict of Interest: None declared. Funding Funding for this work was provided by the U.S. National Institutes of Health, contract grant number GM065450. References Biernacka J. M. , Sangkuhl K. , Jenkins G. , Whaley R. M. , Barman P. , Batzler A. , Altman R. B. , Arolt V. , Brockmoller J. , Chen C. H. and others. ( 2015 ). The International SSRI Pharmacogenomics Consortium (ISPC): a genome-wide association study of antidepressant treatment response. Translational psychiatry 5 , e553 . Google Scholar Crossref Search ADS PubMed Cotsapas C. , Voight B. F. , Rossin E. , Lage K. , Neale B. M. , Wallace C. , Abecasis G. R. , Barrett J. C. , Behrens T. , Cho J. and others. ( 2011 ). Pervasive sharing of genetic effects in autoimmune disease. PLoS genetics 7 ( 8 ), e1002254 . Google Scholar Crossref Search ADS PubMed Diggle P. , Heagerty P. , Liang K-Y. , Zeger S. ( 2009 ). Analysis of Longitudinal Data. Oxford : Oxford University Press . Ferreira M. A. , Purcell S. M. ( 2009 ). A multivariate test of association. Bioinformatics 25 ( 1 ), 132 – 133 . Google Scholar Crossref Search ADS PubMed Galesloot T. E. , van Steen K. , Kiemeney L. A. , Janss L. L. , Vermeulen S. H. ( 2014 ). A comparison of multivariate genome-wide association methods. PloS one 9 ( 4 ), e95923 . Google Scholar Crossref Search ADS PubMed Ji Y. , Biernacka J. M. , Hebbring S. , Chai Y. , Jenkins G. D. , Batzler A. , Snyder K. A. , Drews M. S. , Desta Z. , Flockhart D. and others. ( 2013 ). Pharmacogenomics of selective serotonin reuptake inhibitor treatment for major depressive disorder: genome-wide associations and functional genomics. The pharmacogenomics journal 13 ( 5 ), 456 – 463 . Google Scholar Crossref Search ADS PubMed Liu J. , Pei Y. , Papasian C. J. , Deng H. W. ( 2009 ). Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genetic epidemiology 33 ( 3 ), 217 – 227 . Google Scholar Crossref Search ADS PubMed Maity A. , Sullivan P. F. , Tzeng J. Y. ( 2012 ). Multivariate phenotype association analysis by marker-set kernel machine regression. Genetic epidemiology 36 ( 7 ), 686 – 695 . Google Scholar Crossref Search ADS PubMed Marchini J. , Howie B. , Myers S. , McVean G. , Donnelly P. ( 2007 ). A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics 39 ( 7 ), 906 – 913 . Google Scholar Crossref Search ADS PubMed O’Reilly P. F. , Hoggart C. J. , Pomyen Y. , Calboli F. C. , Elliott P. , Jarvelin M. R. , Coin L. J. ( 2012 ). MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one 7 ( 5 ), e34861 . Google Scholar Crossref Search ADS PubMed Roy J. , Lin X. , Ryan L. M. ( 2003 ). Scaled marginal models for multiple continuous outcomes. Biostatistics 4 ( 3 ), 371 – 383 . Google Scholar Crossref Search ADS PubMed Schaid D. J. , McDonnell S. K. , Sinnwell J. P. , Thibodeau S. N. ( 2013 ). Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic epidemiology 37 , 409 – 418 . Google Scholar Crossref Search ADS PubMed Schaid D. , Tong X. , Larrabee B. , Kennedy R. , Poland G. , Sinnwell J. ( 2016 ). Statistical methods for testing genetic pleiotropy. Genetics 204 , 483 – 497 . Google Scholar Crossref Search ADS PubMed Schifano E. D. , Li L. , Christiani D. C. , Lin X. ( 2013 ). Genome-wide association analysis for multiple continuous secondary phenotypes. American journal of human genetics 92 , 744 – 759 . Google Scholar Crossref Search ADS PubMed Schriner D. ( 2012 ). Moving toward system genetics through multiple trait analysis in genome-wide association studies. Frontiers in Genetics 16 , 1 – 7 . Sittler A. , Walter S. , Wedemeyer N , Hasenbank R , Scherzinger E. , Eickhoff H. , Bates G. P. , Lehrach H. , Wanker E. E. ( 1998 ). SH3GL3 associates with the Huntingtin exon 1 protein and promotes the formation of polygln-containing protein aggregates. Molecular cell 2 , 427 – 436 . Google Scholar Crossref Search ADS PubMed Solovieff N. , Cotsapas C. , Lee P. H. , Purcell S. M , Smoller J. W. ( 2013 ). Pleiotropy in complex traits: challenges and strategies. Nature reviews. Genetics 14 , 483 – 495 . Google Scholar Crossref Search ADS PubMed Stephens M. ( 2013 ). A unified framework for association analysis with multiple related phenotypes. PloS one 8 , e65245 . Google Scholar Crossref Search ADS PubMed Wu M. C. , Kraft P. , Epstein M. P. , Taylor D. M. , Chanock S. J. , Hunter D. J. , Lin X. ( 2010 ). Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86 , 929 – 942 . Google Scholar Crossref Search ADS PubMed Wu C. O. , Zheng G. , Kwak M. ( 2013 ). A joint regression analysis for genetic association studies with outcome stratified samples. Biometrics 69 , 417 – 426 . Google Scholar Crossref Search ADS PubMed Xu Z. , Pan W. ( 2015 ). Approximate score-based testing with application to multivariate trait association analysis. Genetic epidemiology 39 , 469 – 479 . Google Scholar Crossref Search ADS PubMed Yang Q. , Wang Y. ( 2012 ). Methods for analyzing multivariate phenotypes in genetic association studies. Journal of probability and statistics 2012 , 652569 . Google Scholar Crossref Search ADS PubMed Zhang Y. , Xu Z. , Shen X. , Pan W. ( 2014 ). Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. NeuroImage 96 , 309 – 325 . Google Scholar Crossref Search ADS PubMed © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Journal

BiostatisticsOxford University Press

Published: Jan 1, 2019

There are no references for this article.