Biometrika
, Advance Article – Feb 16, 2018

21 pages


- Publisher: Oxford University Press
- Copyright: © 2018 Biometrika Trust
- ISSN: 0006-3444
- eISSN: 1464-3510
- DOI: 10.1093/biomet/asx085

SUMMARY

Multivariate regression with high-dimensional covariates has many applications in genomic and genetic research, in which some covariates are expected to be associated with multiple responses. This paper considers joint testing for regression coefficients over multiple responses and develops simultaneous testing methods with false discovery rate control. The test statistic is based on inverse regression and bias-corrected group lasso estimates of the regression coefficients and is shown to have an asymptotic chi-squared null distribution. A row-wise multiple testing procedure is developed to identify the covariates associated with the responses. The procedure is shown to control the false discovery proportion and false discovery rate at a prespecified level asymptotically. Simulations demonstrate the gain in power, relative to entrywise testing, in detecting the covariates associated with the responses. The test is applied to an ovarian cancer dataset to identify the microRNA regulators that regulate protein expression.

1. Introduction

In genetic and genomic applications, multiple correlated phenotypes are often measured on the same individuals. Examples include genetic association studies, where many correlated phenotypic measurements are analysed jointly in the hope of increasing the power to detect causal genetic variants (Schifano et al., 2013; Zhou et al., 2015). In the study of gene expression, many genetic variants are associated with expression of multiple genes through trans-regulation (Grundberg et al., 2012). Identification of such trans-variants has been difficult due to limited sample sizes and multiple comparisons. In cancer genomic studies, microRNAs, or miRNAs, have been found to modulate many cellular processes, and protein-miRNA interactions are essential for post-transcriptional regulation in normal development and may be deregulated in cancer (van Kouwenhove et al., 2011).
The biological goal of these studies is to identify associations between miRNA and protein expression in order to understand the regulatory roles of miRNAs. High-dimensional multivariate regression can be used in the applications mentioned above. One simple approach involves assessing the relationship between each response and each covariate individually and using a Bonferroni correction and false discovery rate control to account for the large number of tests conducted. This is often performed in genetic association analysis of gene expression data. Ruffieux et al. (2017) developed a Bayesian method for sparse multivariate regression models to select predictor-response pairs. Alternatively, one can apply a dimension reduction technique, such as principal component analysis of the responses, and test for association with the principal components rather than the individual responses (Suo et al., 2013). Classical Fisher combination tests have also been explored and applied (Yang et al., 2016). By taking advantage of the similarity across multivariate responses, significant gains in statistical power can potentially be achieved in an association analysis. 
This motivates us to consider the following high-dimensional multivariate regression model, where $$D$$ correlated responses are measured on $$n$$ independent individuals, together with $$p$$ covariates, where $$p$$ can be much greater than $$n$$: \begin{equation} \label{model1} {Y}_{n\times D}={\mu}_{n\times D}+{X}_{n\times p}{B}_{p\times D}+{\Upsilon}_{n\times D}, \end{equation} (1) where $${Y}=({Y}_{{\cdot},1},\ldots,{Y}_{{\cdot},D})\in {\mathbb{R}}^{n\times D}$$, with $${Y}_{{\cdot},d}=(Y_{1,d},\ldots,Y_{n,d})^{{\rm T}}$$, denotes the observed response matrix for $$n$$ samples over $$D$$ responses, $${\mu}=({\mu}_{{\cdot},1},\ldots,{\mu}_{{\cdot},D})\in {\mathbb{R}}^{n\times D}$$, with $${\mu}_{{\cdot},d}=(\mu_{1,d},\dots,\mu_{n,d})^{\rm T}$$, is the mean response matrix, the rows of which are the same, and $${X}=({X}_{1,{\cdot}}^{{\rm T}},\ldots,{X}_{n,{\cdot}}^{{\rm T}})^{{\rm T}}\in {\mathbb{R}}^{n\times p}$$ is the covariate matrix. In this model, $${B}=({B}_{{\cdot},1},\ldots,{B}_{{\cdot},D})\in {\mathbb{R}}^{p\times D}$$, with $${B}_{{\cdot},d}=(B_{1,d},\dots,B_{p,d})^{{\rm T}}\in {\mathbb{R}}^p$$, represents the matrix of regression coefficients, where the $$i$$th row represents the regression coefficients of the $$i$$th covariate on the $$D$$ responses, and $$D$$ is fixed. Finally, $${\Upsilon}=({\epsilon}_{{\cdot},1},\ldots,{\epsilon}_{{\cdot},D})\in {\mathbb{R}}^{n\times D}$$ with $${\epsilon}_{{\cdot},d}=(\epsilon_{1,d},\dots,\epsilon_{n,d})^{\rm T}$$, where $$\{\epsilon_{k,d}\}$$ are independent and identically distributed random variables with mean zero and variance $$\sigma_{\epsilon}^2$$ which are independent of $${X}$$. 
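For concreteness, data from model (1) with a row-sparse coefficient matrix can be simulated as follows. This is a minimal Python sketch with hypothetical dimensions (the paper's own code, per § 4, is in R):

```python
import numpy as np

# Hypothetical dimensions for illustration; the theory allows p >> n.
n, p, D, s = 100, 200, 10, 5
rng = np.random.default_rng(0)

X = rng.standard_normal((n, p))               # covariate matrix X, n x p
B = np.zeros((p, D))                          # coefficient matrix B, p x D
active = rng.choice(p, size=s, replace=False) # s nonzero rows (row sparsity)
B[active] = rng.standard_normal((s, D))
Upsilon = rng.standard_normal((n, D))         # errors, iid mean 0, independent of X
Y = X @ B + Upsilon                           # model (1) with mu = 0
```

Each nonzero row of $$B$$ corresponds to one covariate associated with all $$D$$ responses, which is exactly the row structure that the proposed tests target.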
To test whether the $$i$$th covariate is associated with any of the $$D$$ responses, in this paper we develop an efficient procedure for simultaneously testing \begin{equation} \label{hypothesis} H_{0,i}: {B}_{i,{\cdot}}=0\quad \mbox{versus} \quad H_{1,i}: {B}_{i,{\cdot}}\neq 0 \quad (i=1,\dots,p) \end{equation} (2) while controlling the false discovery rate and false discovery proportion. Of particular interest is the scenario where the effects of the $$i$$th variable on each of the $$D$$ responses share strong similarities; that is, if $$B_{i,d}\neq 0$$, then the rest of the entries in that row are more likely to be nonzero. Hence, a row-wise testing method using the group information should be more favourable than testing $${B}$$ column by column. Currently there is significant interest in statistical inference for high-dimensional linear regression. In the ultra-sparse setting, Javanmard & Montanari (2013), van de Geer et al. (2014) and Zhang & Zhang (2014) considered constructing confidence intervals and testing for a given coordinate of a high-dimensional coefficient vector in a linear model, and proposed procedures based on the debiased lasso estimators. Cai & Guo (2017) studied adaptive confidence intervals for a general linear functional of the regression vector in the high-dimensional setting, and showed that the adaptivity depends on the sparsity of the regression and the loading vectors. Zhu & Bradic (2017) considered testing a single entry of the regression coefficient in the setting of nonsparse linear regression. These papers focused on inference for a given coordinate or a given linear functional, but simultaneous testing of all coordinates with false discovery rate control was not considered. Furthermore, they only dealt with the regression setting with a single response, whereas here we consider joint testing of regression coefficients over multiple responses. 
We develop a sum-of-squares-type test statistic for testing a given row of the regression coefficient matrix $${B}$$, based on the covariance between the residuals from the fit of the regression model (1) and the corresponding $$pD$$ inverse regression models introduced in § 2.2. To get an estimate of the error terms $${\Upsilon}$$ in (1), this model is reformulated and the group lasso algorithm applied to obtain a bias-corrected estimator of $${B}$$, so as to make use of the group information. The test statistic is shown to have an asymptotic $$\chi^2$$ distribution under the null hypothesis that the corresponding row of $${B}$$ is zero. A transformed statistic is introduced and its asymptotic power is studied. In addition, a new row-wise multiple testing procedure based on the normal quantile transformation is proposed and its theoretical error rate control investigated. It is shown numerically that for a range of settings, the row-wise testing procedure significantly outperforms the entrywise method. We apply our method to analyse an ovarian cancer genomic dataset. The results demonstrate that the proposed test provides a powerful tool for identifying the miRNA regulators that are associated with a set of important proteins related to cancer progression.

2. Estimation and row-wise statistical test

2.1. Notation and definitions

We begin with some basic notation and definitions. For a vector $${a}=(a_{1},\ldots,a_{p})^{{\rm T}}\in {\mathbb{R}}^p$$, define the $$l_{q}$$-norm by $$|{a}|_{q}=(\sum_{i=1}^{p}|a_{i}|^{q})^{1/q}$$ for $$1\le q \le \infty$$. For subscripts, we use the convention that $$v_i$$ stands for the $$i$$th entry of a vector $$v$$ and $$M_{i,j}$$ for the entry in the $$i$$th row and $$j$$th column of a matrix $$M$$.
Throughout, $$n$$ independent and identically distributed random samples $$\{Y_{k,d}, {X}_{k,{\cdot}}: k=1,\ldots,n;\, d=1,\ldots,D\}$$ are observed, where $${X}_{k,{\cdot}}=(X_{k,1},\dots,X_{k,p})$$ is a random vector with covariance matrix $$\Sigma$$. Define $$\Sigma^{-1}={\Omega}=(\omega_{i,j})$$. For any vector $${\mu}_d\in {\mathbb{R}}^p$$, let $${\mu}_{-i,d}$$ denote the $$(p-1)$$-dimensional vector obtained by removing the $$i$$th entry from $${\mu}_d$$. For a symmetric matrix $${A}$$, let $$\lambda_{\max}({A})$$ and $$\lambda_{\min}({A})$$ denote the largest and smallest eigenvalues of $${A}$$. For any $$n\times p$$ matrix $${A}$$, let $${A}_{i,-j}$$ denote the $$i$$th row of $${A}$$ with its $$j$$th entry removed and let $${A}_{-i,j}$$ denote the $$j$$th column of $${A}$$ with its $$i$$th entry removed. Let $${A}_{-i,-j}$$ denote the $$(n-1)\times(p-1)$$ submatrix of $${A}$$ formed by removing its $$i$$th row and $$j$$th column, and let $${A}_{{\cdot},-j}$$ denote the $$n\times (p-1)$$ submatrix of $${A}$$ formed by removing the $$j$$th column. Let $${A}_{i,{\cdot}}$$ be the $$i$$th row of $${A}$$ and $${A}_{{\cdot},j}$$ the $$j$$th column of $${A}$$. Write $$\skew6\bar{A}_{{\cdot},j}=n^{-1}\sum_{k=1}^nA_{k,j}$$, $$\skew6\bar{{A}}_{{\cdot},-j}=n^{-1}\sum_{k=1}^n {A}_{k,-j}$$, $$\skew6\bar{{A}}_{{\cdot},j}=(\skew6\bar{A}_{{\cdot},j},\ldots,\skew6\bar{A}_{{\cdot},j})^{{\rm T}}_{n\times 1}$$, and $$\skew6\bar{{A}}_{({\cdot},-j)}=(\skew6\bar{{A}}_{{\cdot},-j}^{{\rm T}},\ldots,\skew6\bar{{A}}_{{\cdot},-j}^{{\rm T}})^{{\rm T}}_{n\times (p-1)}$$. Let $$\skew6\bar{{A}}=n^{-1}\sum_{k=1}^n{A}_{k,{\cdot}}$$. For a matrix $${\Omega}=(\omega_{i,j})_{p\times p}$$, the matrix elementwise infinity norm is defined by $$\|{\Omega}\|_{\infty}=\max_{1\leq i,j\leq p}|\omega_{i,j}|$$. For a set $$\mathcal{H}$$, $$\:|\mathcal{H}|$$ denotes the cardinality. 
For two sequences of real numbers $$\{a_{n}\}$$ and $$\{b_{n}\}$$, write $$a_{n} = O(b_{n})$$ if there exists a constant $$C$$ such that $$|a_{n}| \leq C|b_{n}|$$ for all $$n$$, write $$a_{n} = o(b_{n})$$ if $$\lim_{n\rightarrow\infty}a_{n}/b_{n} = 0$$, and write $$a_{n}\asymp b_{n}$$ if $$\lim_{n\rightarrow\infty}a_{n}/b_{n} = 1$$.

2.2. Parameter estimation and construction of test statistics

In order to construct a row-wise testing procedure, an inverse regression approach is used to first obtain a nearly unbiased estimator of $${B}$$. Based on model (1), for each $$i=1,\ldots,p$$ and $$d=1,\ldots,D$$, we run linear regression by taking $$X_{k,i}$$ as the response and $$(Y_{k,d},{X}_{k,-i})$$ as the covariates: \begin{equation} \label{model2} X_{k,i}=\alpha_{i,d}+(Y_{k,d}, {X}_{k,-i}){\gamma}_{i,d}+\eta_{k,i,d} \quad(k=1,\dots,n) \end{equation} (3) where $$\eta_{k,i,d}$$ has mean zero and variance $$\sigma_{i,d}^2$$ and is uncorrelated with $$(Y_{k,d},{X}_{k,-i})$$. The regression coefficients $${\gamma}_{i,d}=(\gamma_{i,1,d},\dots,\gamma_{i,p,d})^{{\rm T}}$$ satisfy \[ {\gamma}_{i,d}=-\sigma_{i,d}^2(-B_{i,d}/\sigma_{\epsilon}^2,\, B_{i,d}{B}_{-i,d}^{{\rm T}}/\sigma_{\epsilon}^2+{\Omega}_{i,-i})^{{\rm T}},\quad \sigma_{i,d}^2=(B_{i,d}^2/\sigma_{\epsilon}^2+\omega_{i,i})^{-1}\text{.} \] Note that $${\mathrm{cov}}(\epsilon_{k,d},\eta_{k,i,d})={\mathrm{cov}}\{\epsilon_{k,d},\,X_{k,i}-\alpha_{i,d}-(Y_{k,d}, {X}_{k,-i}){\gamma}_{i,d}\}$$. Because the $$\epsilon_{k,d}$$ are independent of the random design matrix $${X}$$, $$r_{i,d}={\mathrm{cov}}(\epsilon_{k,d},\eta_{k,i,d})$$ can be expressed as $$-\gamma_{i,1,d}\,{\mathrm{cov}}(\epsilon_{k,d},Y_{k,d})=-\gamma_{i,1,d}\sigma_{\epsilon}^2=-\sigma_{i,d}^2B_{i,d}$$.
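The closed-form expressions for $$\gamma_{i,d}$$, $$\sigma_{i,d}^2$$ and $$r_{i,d}$$ can be checked numerically at the population level: build the joint covariance of $$(Y_{k,d}, X_{k,\cdot})$$ implied by model (1), compute the population regression of $$X_{k,i}$$ on $$(Y_{k,d}, X_{k,-i})$$, and compare with the formulas above. A Python sketch (the dimensions, seed and the choice $$i=2$$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, sig_eps2 = 5, 0.5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # covariance of X_{k,.}, positive definite
Omega = np.linalg.inv(Sigma)             # precision matrix of X
Bd = rng.standard_normal(p)              # one column B_{.,d} of B

# Joint covariance of (Y_d, X) under model (1) with mu = 0:
# var(Y_d) = B_d' Sigma B_d + sigma_eps^2, cov(Y_d, X) = B_d' Sigma.
V = np.empty((p + 1, p + 1))
V[0, 0] = Bd @ Sigma @ Bd + sig_eps2
V[0, 1:] = V[1:, 0] = Sigma @ Bd
V[1:, 1:] = Sigma

i = 2                                    # check the formulas for covariate i
keep = [0] + [j + 1 for j in range(p) if j != i]
# Population regression of X_i on (Y_d, X_{-i}): gamma = Var(Z)^{-1} cov(Z, X_i).
gamma = np.linalg.solve(V[np.ix_(keep, keep)], V[keep, i + 1])

# Closed forms from the text.
sig_id2 = 1.0 / (Bd[i] ** 2 / sig_eps2 + Omega[i, i])
gamma_formula = -sig_id2 * np.concatenate(
    ([-Bd[i] / sig_eps2],
     Bd[i] * np.delete(Bd, i) / sig_eps2 + np.delete(Omega[i], i)))
r_id = -gamma[0] * sig_eps2              # r_{i,d} = cov(eps_{k,d}, eta_{k,i,d})
```

Here `gamma` and `gamma_formula` agree, and `r_id` equals $$-\sigma_{i,d}^2B_{i,d}$$, matching the displayed identities.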
Thus the null hypotheses $$H_{0,i}: {B}_{i,{\cdot}}=0$$ are equivalent to \[ H_{0,i}: \sum_{d=1}^D( r_{i,d}/\sigma_{i,d}^2)^2=0 \quad (i =1,\ldots,p)\text{.} \] Test statistics can then be based on the estimates of $$\{ r_{i,d}/\sigma_{i,d}^2: i=1,\dots,p; \, d=1,\ldots,D\}$$. To obtain estimates of $$r_{i,d}$$, estimates of $${B}$$ and $$\gamma_{i,d}$$ are first constructed. To utilize the row-wise similarity information, the estimator $$\hat{{B}}=(\hat{{B}}_{{\cdot},1},\ldots,\hat{{B}}_{{\cdot},D})\in {\mathbb{R}}^{p\times D}$$, with $$\hat{{B}}_{{\cdot},d}=(\hat{B}_{1,d},\dots,\hat{B}_{p,d})^{{\rm T}}\in {\mathbb{R}}^p$$, is obtained by using the group lasso method proposed in Yuan & Lin (2006). Because the response variables and the covariates can be centred so that the observed mean is zero, without loss of generality we assume that $${\mu}=0$$. Specifically, write vec$$({Y})=({Y}_{1,{\cdot}},\ldots,{Y}_{n,{\cdot}})^{{\rm T}}\in {\mathbb{R}}^{nD\times 1}$$, vec$$({B})=({B}_{1,{\cdot}},\ldots,{B}_{p,{\cdot}})^{{\rm T}}\in {\mathbb{R}}^{pD\times 1}$$ and vec$$({\Upsilon})=({\epsilon}_{1,{\cdot}},\ldots,{\epsilon}_{n,{\cdot}})^{{\rm T}}\in {\mathbb{R}}^{nD\times 1}$$. Model (1) can then be written as \begin{equation}\label{equiv} \text{vec}({Y})={X}\otimes {I}_{D\times D}\text{vec}({B})+\text{vec}({\Upsilon})\text{.} \end{equation} (4) Let vec$$(\hat{{B}})=(\hat{{B}}_{1,{\cdot}},\ldots,\hat{{B}}_{p,{\cdot}})^{{\rm T}}\in {\mathbb{R}}^{pD\times 1}$$. The group lasso estimator is defined as \begin{equation}\label{B.est} \text{vec}(\hat{{B}})=\arg\min\left\{\frac{1}{2}\,\bigg|\text{vec}({Y})-\sum_{j=1}^p({X}\otimes {I}_{D\times D})_{{\cdot},[(j-1)D+1]:jD}{B}_{j,{\cdot}}\bigg|_2^2+\lambda_n\sum_{j=1}^p|{B}_{j,{\cdot}}|_2\right\}\!, \end{equation} (5) where $$\lambda_n$$ is a tuning parameter. Selection of $$\lambda_n$$ will be discussed in § 3.3. 
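The paper computes (5) with the group lasso of Yuan & Lin (2006); a minimal solver for the same row-wise objective, via proximal gradient descent with block soft-thresholding, can be sketched as follows. This illustrates the structure of the estimator only and is not the implementation used in the paper:

```python
import numpy as np

def group_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient descent for the row-wise group lasso objective (5):
    (1/2)||Y - X B||_F^2 + lam * sum_j |B_{j,.}|_2.  Each iteration takes a
    gradient step on the squared-error term, then applies block
    soft-thresholding to the rows of B."""
    p, D = X.shape[1], Y.shape[1]
    B = np.zeros((p, D))
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1/L, L = lambda_max(X'X)
    for _ in range(n_iter):
        G = X.T @ (X @ B - Y)                  # gradient of the smooth part
        B = B - step * G
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        # block soft-thresholding: shrink each row's l2 norm by step * lam
        B = np.maximum(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0) * B
    return B
```

The row-wise penalty zeroes out entire rows at once, which is how the estimator exploits the group information across the $$D$$ responses.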
By Lemma A3 in the Appendix, if the row sparsity of $${B}$$ satisfies $$s(p)=o(n^{1/3}/{\log p})$$, then \begin{equation} \label{B1} \max_{1\leq d\leq D}\big|\hat{{B}}_{{\cdot},d}-{B}_{{\cdot},d}\big|_1=O_{\rm p}(a_{n1}),\quad \max_{1\leq d\leq D}\big|\hat{{B}}_{{\cdot},d}-{B}_{{\cdot},d}\big|_2=O_{\rm p}(a_{n2}) \end{equation} (6) for some $$a_{n1}$$ and $$a_{n2}$$ such that \begin{equation} \label{beta_eta2} \max(a_{n1}a_{n2}, a_{n2}^2)=o\{(n\log p)^{-1/2}\}, \quad a_{n1}=o(1/\log p)\text{.} \end{equation} (7) Remark 1. The estimator $$\hat{{B}}$$ in (5) provides one way to estimate the regression coefficients $$B$$, and other methods can be applied to estimate it if the coefficients satisfy (6) and (7). On the other hand, row-wise estimation as in (5) is better than entrywise estimation because the entries in the same row of $${B}$$ have similar patterns. Next, we construct estimates of $$\gamma_{i,d}$$ in the inverse regression (3) that satisfy \begin{equation} \label{gamma1} \max_{1\leq d\leq D}\max_{1\leq i\leq p}\big|\hat{{\gamma}}_{i,d}-{\gamma}_{i,d}\big|_1=O_{\rm p}(a_{n1}),\quad \max_{1\leq d\leq D}\max_{1\leq i\leq p}\big|\hat{{\gamma}}_{i,d}-{\gamma}_{i,d}\big|_2=O_{\rm p}(a_{n2}), \end{equation} (8) which can be obtained via the lasso estimator (Xia et al., 2015, 2018); see § 3.3. 
For $$k=1,\ldots, n$$ and $$d=1,\ldots,D$$, define the residuals \begin{eqnarray*} \hat{{\epsilon}}_{k,{\cdot}}&=&{Y}_{k,{\cdot}}-\bar{{Y}}-({X}_{k,{\cdot}}-\bar{{X}})\hat{{B}},\cr \hat{\eta}_{k,i,d}&=&X_{k,i}-\bar{X}_{i}-\{Y_{k,d}-\bar{Y}_d,({X}_{k,-i}-\bar{{X}}_{{\cdot},-i})\}\hat{{\gamma}}_{i,d}, \end{eqnarray*} where $$\hat{{B}}$$ and $$\hat{{\gamma}}_{i,d}$$ are estimators of $${B}$$ and $${\gamma}_{i,d}$$ that satisfy (7) and the following conditions by combining (6) and (8): \begin{align} \begin{split}\label{beta_eta1} \max\Big(\max_{1\leq d\leq D}|\hat{{B}}_{{\cdot},d}-{B}_{{\cdot},d}|_1,\:\max_{1\leq d\leq D}\max_{1\leq i\leq p}|\hat{{\gamma}}_{i,d}-{\gamma}_{i,d}|_1\Big)&=O_{\rm p}(a_{n1}),\\ \max\Big(\max_{1\leq d\leq D}|\hat{{B}}_{{\cdot},d}-{B}_{{\cdot},d}|_2,\:\max_{1\leq d\leq D}\max_{1\leq i\leq p}|\hat{{\gamma}}_{i,d}-{\gamma}_{i,d}|_2\Big)&=O_{\rm p}(a_{n2})\text{.} \end{split} \end{align} (9) We construct a nearly unbiased estimate of $$r_{i,d}$$ as follows. Since $$r_{i,d}={\mathrm{cov}}(\epsilon_{k,d},\eta_{k,i,d})$$, it can be estimated by the sample covariance between the error terms, $$R_{i,d}=n^{-1}\sum_{k=1}^{n}\epsilon_{k,d}\eta_{k,i,d}$$. By replacing the error terms with the residuals, $$R_{i,d}$$ can be estimated by $$\tilde{r}_{i,d}=n^{-1}\sum_{k=1}^{n}\hat{\epsilon}_{k,d}\hat{\eta}_{k,i,d}$$. However, the bias induced by the estimated parameters exceeds the desired rate $$(n\log p)^{-1/2}$$ and is not ignorable. 
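Given estimates $$\hat{{B}}$$ and $$\hat{{\gamma}}_{i,d}$$, the residuals defined above can be computed directly. A sketch, in which the hypothetical array `gamma` collects the vectors $$\hat{{\gamma}}_{i,d}$$ in a $$p\times D\times p$$ array:

```python
import numpy as np

def residuals(X, Y, B_hat, gamma):
    """Residuals hat-epsilon_{k,d} and hat-eta_{k,i,d} from Section 2.2.
    gamma[i, d] holds the p-vector hat-gamma_{i,d} for the regression of
    X_i on (Y_d, X_{-i}); all variables are centred at their sample means."""
    n, p = X.shape
    D = Y.shape[1]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    eps_hat = Yc - Xc @ B_hat                     # n x D matrix of hat-epsilon
    eta_hat = np.empty((n, p, D))
    for i in range(p):
        Xmi = np.delete(Xc, i, axis=1)            # centred X_{., -i}
        for d in range(D):
            Z = np.column_stack((Yc[:, d], Xmi))  # centred (Y_d, X_{-i})
            eta_hat[:, i, d] = Xc[:, i] - Z @ gamma[i, d]
    return eps_hat, eta_hat
```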
By comparing $$\tilde{r}_{i,d}$$ and $$R_{i,d}$$, Lemma A2 leads to \[ \tilde r_{i,d}=R_{i,d}+\tilde{\sigma}_{\epsilon}^2(\gamma_{i,1,d}-\hat \gamma_{i,1,d})+\tilde{\sigma}_{i,d}^2(B_{i,d}-\hat B_{i,d})+o_{\rm p}\{(n\log p)^{-1/2}\}\text{.} \] Define $$\hat{\sigma}_{\epsilon}^2=(Dn)^{-1}\sum_{k=1}^{n}\sum_{d=1}^D\hat{\epsilon}_{k,d}^2$$ and $$\hat{\sigma}_{i,d}^2=n^{-1}\sum_{k=1}^{n}\hat{\eta}_{k,i,d}^2$$ to be the empirical sample variances that satisfy \[ \max\Big(|\hat{\sigma}_{\epsilon}^2-\sigma_{\epsilon}^2|, \:\max_{1\leq i\leq p}|\hat{\sigma}_{i,d}^2-\sigma_{i,d}^2|\Big)=O_{\rm p}\{(\log p/n)^{1/2}\}\text{.} \] We have \begin{eqnarray*} \tilde{r}_{i,d}+\hat{\sigma}_{\epsilon}^2\hat{\gamma}_{i,1,d}+\hat{\sigma}_{i,d}^2\hat{B}_{i,d} &=& n^{-1}\sum_{k=1}^{n}\{\epsilon_{k,d}\eta_{k,i,d}-{E}(\epsilon_{k,d}\eta_{k,i,d})\}\cr &&-\,\sigma_{i,d}^2B_{i,d}(1-\tilde{\sigma}_{\epsilon}^2/\sigma_{\epsilon}^2-\tilde{\sigma}_{i,d}^2/\sigma_{i,d}^2)+o_{\rm p}\{(n\log p)^{-1/2}\}\text{.} \end{eqnarray*} Thus, a debiased estimator for $$r_{i,d}$$ can be defined by \[ \hat{r}_{i,d}=\tilde{r}_{i,d}+\hat{\sigma}_{\epsilon}^2\hat{\gamma}_{i,1,d}+\hat{\sigma}_{i,d}^2\hat{B}_{i,d}\text{.} \] Under the null hypothesis $$H_{0,i}$$, the bias of $$\hat{r}_{i,d}$$ is of order $$(n\log p)^{-1/2}$$, which is sufficiently small to construct the test statistics \[ T_{i,d}=\hat{r}_{i,d}/\hat{\sigma}_{i,d}^2 \quad (i=1,\dots,p; \, d=1,\ldots,D)\text{.} \] Under $$H_{0,i}$$, we have $$|T_{i,d}-\tilde{U}_{i,d}|= o_{\rm p}\{(n\log p)^{-1/2}\},$$ where \begin{equation} \label{U_i} U_{i,d}=n^{-1}\sum_{k=1}^{n}\{\epsilon_{k,d}\eta_{k,i,d}-{E}(\epsilon_{k,d}\eta_{k,i,d})\},\quad \tilde{U}_{i,d}=(B_{i,d}+U_{i,d})/\sigma_{i,d}^2\text{.} \end{equation} (10) The variance of $$T_{i,d}$$ can be approximated by \[ \theta_{i,d}\equiv {\mathrm{var}}(\tilde{U}_{i,d}) ={\mathrm{var}}(\epsilon_{k,d}\eta_{k,i,d}/\sigma_{i,d}^2)/n=(\sigma_{\epsilon}^2/\sigma_{i,d}^2+B_{i,d}^2)/n, \] where $$\theta_{i,d}$$ can be 
estimated by $$\hat{\theta}_{i,d}=(\hat{\sigma}_{\epsilon}^2/\hat{\sigma}_{i,d}^2+\hat{B}_{i,d}^2)/n$$. Define the standardized statistic \[ W_{i,d}={T_{i,d}}/{\hat{\theta}_{i,d}^{1/2}} \quad (i=1,\dots,p;\, d=1,\ldots,D)\text{.} \] The final test statistic for (2) based on $$\{W_{i,d}:i=1,\dots,p;\, d=1,\ldots,D\}$$ is then defined as \begin{equation} \label{test} S_i=\sum_{d=1}^DW_{i,d}^2\quad (i=1,\dots,p), \end{equation} (11) which will be studied in detail in § 2.3.

2.3. Asymptotic null distribution of $$S_i$$

To investigate the asymptotic null distribution of $$S_i$$, some regularity conditions are needed.

Condition 1. Assume that $$\log p=o(n^{1/5})$$ and that for some constants $$C_0, C_1,C_2>0$$, $$C_0^{-1}\leq \lambda_{\min}({\Omega})\leq \lambda_{\max}({\Omega})\leq C_0$$, $$\:C_1^{-1}\leq \sigma_{\epsilon}^2\leq C_1$$, $$\:\|{B}\|_{\infty}\leq C_2$$, and $${\mathrm{var}} (Y_{k,d})\leq C_2$$.

Condition 2. There exists some constant $$M>0$$ such that the quantities $${E}\{\exp(M\epsilon_{k,d}^2)\}$$ and $$\max_{{\mathrm{var}}({a}^{{\rm T}}{X}_{k,{\cdot}}^{{\rm T}})=1}{E} [\exp\{M({a}^{{\rm T}}{X}_{k,{\cdot}}^{{\rm T}})^2\}]$$ are finite.

Condition 3. Let $$\Lambda$$ be the diagonal of $${\Omega}$$ and let $$(\xi_{i,j})={R}=\Lambda^{-1/2}{\Omega}\Lambda^{-1/2}$$. Assume that $$\max_{1\leq i< j\leq p} |\xi_{i,j}| \leq \xi$$ for some constant $$0 < \xi < 1$$.

Condition 1, the eigenvalue condition, is common in high-dimensional settings (Cai et al., 2013; Liu, 2013; Xia et al., 2015). It implies that most of the variables are not highly correlated with each other and is sufficient to give Lemma A2 and to ensure error rate control in simultaneous inference, studied in § 3. The assumption $$\log p=o(n^{1/5})$$ is sufficient for the simultaneous Gaussian approximations (A3) and (A4), as shown in the proof of Theorem 1. As we shall see from the numerical results in § 4, $$p$$ can be much larger than $$n$$ in practice.
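Assembling the pieces, the statistics $$T_{i,d}$$, $$\hat{\theta}_{i,d}$$, $$W_{i,d}$$ and $$S_i$$ follow directly from the formulas above. A sketch, taking as inputs the $$p\times D$$ arrays of debiased estimates $$\hat{r}_{i,d}$$, coefficient estimates $$\hat{B}_{i,d}$$ and variance estimates $$\hat{\sigma}_{i,d}^2$$, plus the scalar $$\hat{\sigma}_{\epsilon}^2$$:

```python
import numpy as np

def row_statistics(r_hat, B_hat, sig_eps2, sig2, n):
    """Compute T_{i,d} = hat-r_{i,d} / hat-sigma_{i,d}^2, the variance
    estimate hat-theta_{i,d}, the standardized W_{i,d}, and the row-wise
    test statistic S_i = sum_d W_{i,d}^2 of (11)."""
    T = r_hat / sig2
    theta = (sig_eps2 / sig2 + B_hat ** 2) / n
    W = T / np.sqrt(theta)
    S = np.sum(W ** 2, axis=1)
    return W, S
```

Under $$H_{0,i}$$, the resulting $$S_i$$ is asymptotically $$\chi^2_D$$, as established in § 2.3.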
The same condition is commonly used (see, e.g., Cai et al., 2013; Xia et al., 2015), and stricter assumptions are also imposed; for example, Chang et al. (2017) assumed that $$\log p=o(n^{1/7})$$. Condition 2 is a sub-Gaussian tail condition, which can be weakened to a polynomial tail condition if $$p<n^c$$ for some constant $$c>0$$. This condition is needed for the Gaussian approximation of the proposed test statistics, and it is milder than the normality assumption used in the high-dimensional regression literature (Javanmard & Montanari, 2014; Zhang & Zhang, 2014). Condition 3 is also mild: for example, if $$\max_{1\leq i< j\leq p}|\xi_{i,j}|=1$$, then $${\Omega}$$ is singular. The following theorem gives the asymptotic null distribution of $$S_i$$ for any $$i=1,\ldots,p$$.

Theorem 1. Suppose that Conditions 1 and 2 and the asymptotic conditions (9) and (7) hold. Then under $$H_{0,i}: {B}_{i,{\cdot}}=0$$, for any $$t\in {\mathbb{R}}$$, \begin{equation*} {\text{pr}}(S_i\leq t) \rightarrow {\text{pr}}(\chi^2_D\leq t) \end{equation*} as $$n,p\rightarrow \infty$$.

The normal quantile transformation of $$S_i$$, defined as \begin{equation}\label{normal_transf} Q_i = \Phi^{-1}\{1-{\text{pr}}(\chi^2_D\geq S_i)/2\}, \end{equation} (12) where $$\Phi$$ is the standard normal cumulative distribution function, asymptotically has the same distribution as the absolute value of a standard normal random variable. A test $$\Phi^{i}_{\alpha}$$ can be defined as \begin{equation} \label{testi} \Phi^{i}_{\alpha}=I\{Q_i\geq \Phi^{-1}(1-\alpha/2)\}, \end{equation} (13) where the hypothesis $$H_{0,i}: {B}_{i,{\cdot}}=0$$ is rejected whenever $$\Phi^{i}_\alpha =1$$.

2.4. Asymptotic power

To analyse the asymptotic power of the test $$\Phi^{i}_{\alpha}$$ given in (13), for a given row index $$i$$, define the class of regression coefficients \begin{equation}\label{class1} \mathcal{W}_{i}(\alpha,\nu)=\left\{{B}: \sum_{d=1}^D\frac{B_{i,d}^2}{\theta_{i,d}}\geq (2+\delta)(\Psi^2_{1-\alpha}+{\Psi}^2_{1-\nu})\right\} \end{equation} (14) for any constant $$\delta>0$$, where $$\Psi_{1-\alpha}$$ is the $$1-\alpha$$ quantile of $$\chi^2_D$$. The next theorem shows that the test $$\Phi_\alpha^{i}$$ is able to asymptotically distinguish the null parameter set in which $${B}_{i,{\cdot}} = 0$$ from $$\mathcal{W}_{i}(\alpha,\nu)$$ for an arbitrarily small constant $$\delta>0$$, with $$\nu\rightarrow 0$$.

Theorem 2. Suppose that Conditions 1 and 2 and the asymptotic conditions (9) and (7) hold. Then as $$n,p\rightarrow\infty$$, for any $$\delta>0$$, \[ \inf_{{B}\in\mathcal{W}_{i}(\alpha,\nu)}{\text{pr}}\big(\Phi^{i}_{\alpha}=1\big)\geq 1-\nu\text{.} \]

Since $$\theta_{i,d}$$ is of order $$1/n$$, Theorem 2 shows that the proposed test rejects the null hypothesis $$H_{0,i}: {B}_{i,{\cdot}}=0$$ with high probability for a large class of regression coefficients satisfying the condition that there exists one entry $$B_{i,d}$$ having a magnitude larger than $$C/n^{1/2}$$ for $$C= \{(2+\delta)(C_0C_1+C_2^2)(\Psi^2_{1-\alpha}+\Psi^2_{1-\nu})\}^{1/2}$$, where $$C_0$$, $$C_1$$ and $$C_2$$ are given in Condition 1.

3. Multiple testing with error rate control

3.1. Multiple testing algorithm

In this section, a row-wise multiple testing procedure is introduced with error rate control for testing the $$p$$ hypotheses \[ H_{0,i}: {B}_{i,{\cdot}}=0\quad \mbox{versus} \quad H_{1,i}: {B}_{i,{\cdot}}\neq 0 \quad (i=1,\dots,p)\text{.} \] Let $${\mathcal{H}}=\{1,\ldots,p\}$$, let $$\mathcal{H}_0=\{i: {B}_{i,{\cdot}}= 0, \,i\in {\mathcal{H}}\}$$ be the set of true nulls, and let $${\mathcal{H}}_1={\mathcal{H}}\setminus{\mathcal{H}}_0$$ be the set of true alternatives.
We are interested in cases where most of the rows of $${B}$$ consist of zeros, that is, where $$|{\mathcal{H}}_1|$$ is small relative to $$|{\mathcal{H}}|$$. Theorem 1 shows that $$S_{i}$$ is asymptotically chi-squared distributed, and, as discussed in § 2.3, the normal quantile transformation of $$S_{i}$$ defined as $$Q_i = \Phi^{-1}\{1-{\text{pr}}(\chi^2_D\geq S_i)/2\}$$ has approximately the same distribution as the absolute value of a standard normal random variable under the null $$H_{0,i}$$. Let $$t$$ be the threshold level such that $$H_{0,i}$$ is rejected if $$Q_i\geq t$$. For any given $$t$$, denote the total number of false positives by $$R_{0}(t) = \sum_{i\in \mathcal{H}_0}I(Q_i\geq t)$$ and the total number of rejections by $$ R(t)= \sum_{i\in {\mathcal{H}}}I(Q_i\geq t)$$. Then the false discovery proportion and false discovery rate are defined as \[ {\small{\text{FDP}}}(t)=\frac{R_{0}(t)}{R(t)\vee 1}, \quad {\small{\text{FDR}}}(t)={E}\{{\small{\text{FDP}}}(t)\}\text{.} \] An ideal choice of $$t$$ is \[ t_0=\inf\bigl\{0\leq t\leq (2\log p)^{1/2}: {\small{\text{FDP}}}(t)\leq \alpha\bigr\}, \] which would reject as many true positives as possible while controlling the false discovery proportion at the prespecified level $$\alpha$$. Here $$R_{0}(t)$$ can be estimated by $$2\{1-\Phi(t)\}|\mathcal{H}_0|$$, and $$|\mathcal{H}_0|$$ is bounded above by $$p$$; because of the row sparsity of the regression coefficients, $$|\mathcal{H}_0|$$ is close to $$p$$, so we estimate it conservatively by $$p$$. Therefore, we propose the following multiple testing algorithm.

Step 1. Construct $$S_i$$ by (11); then calculate the row-wise test statistics $$Q_i$$ by (12), for $$i\in {\mathcal{H}}$$.

Step 2. For a given $$0\leq \alpha\leq 1$$, calculate \begin{equation}\label{t_hat} \hat{t}=\inf\!\left[0\leq t\leq (2\log p-2\log\log p)^{1/2}: \, \frac{2p\{1-\Phi(t)\}}{R(t)\vee 1}\leq \alpha\right]\text{.} \end{equation} (15) If the infimum in (15) does not exist, then set $$\hat{t}=(2\log p)^{1/2}$$.

Step 3.
For $$i\in {\mathcal{H}}$$, reject $$H_{0,i}$$ if $$Q_i\geq \hat{t}$$.

3.2. Theoretical error rate control

We now turn our attention to the theoretical error rate control of the proposed multiple testing algorithm. For any $$i\in {\mathcal{H}}$$, define \[ \Gamma_i(\gamma)=\{\,j: j\in {\mathcal{H}}, \: |\xi_{i,j}|\geq (\log p)^{-2-\gamma}\}, \] where $$\xi_{i,j}$$ is defined in Condition 3. The following theorem shows that the proposed multiple testing procedure controls the false discovery proportion and false discovery rate at the prespecified level $$\alpha$$ asymptotically.

Theorem 3. Assume $$p_0= |\mathcal{H}_0|\asymp p$$ and that (9) and (7) hold. Suppose there exists some $$\gamma>0$$ such that $$\max_{i\in {\mathcal{H}}_0}|\Gamma_i(\gamma)|=o(p^{\tau})$$ for any $$\tau>0$$. Then under Conditions 1–3 with $$p\leq cn^r$$ for some $$c>0$$ and $$r>0$$, we have \[ \limsup_{(n,p)\rightarrow \infty}{{\small{\text{FDR}}}(\hat{t})}\leq \alpha, \] and for any $$\epsilon>0$$ we have \[ \lim_{(n,p)\rightarrow \infty}{\text{pr}}\{{\small{\text{FDP}}}(\hat{t})\leq \alpha+\epsilon\}=1\text{.} \]

Remark 2. The condition on $$|\Gamma_i(\gamma)|$$ ensures that most of the row estimates of $${B}$$ are not highly correlated with each other for those indices belonging to the true nulls $${\mathcal{H}}_0$$, so as to control the variance of $$R_{0}(t)$$ in order to control the false discovery proportion. When $$\hat{t}$$ is not attained in the range $$[0, \, (2\log p-2\log\log p)^{1/2}]$$ as described in (15), it is thresholded at $$(2\log p)^{1/2}$$. The following theorem states a weak condition to ensure the existence of $$\hat{t}$$ in the range; as a result, the false discovery rate and false discovery proportion will converge to the prespecified level $$\alpha$$.

Theorem 4.
Let \[ \mathcal{S}_\rho= \big\{i\in {\mathcal{H}}:\text{there exists }d \text{ with } 1\leq d\leq D \text{ such that }{|B_{i,d}|}/{{\theta_{i,d}^{1/2}}} \geq (\log p)^{{1/ 2}+\rho}\big\}\text{.} \] Suppose that for some $$\rho>0$$ and some $$\delta>0$$, $$|\mathcal{S}_\rho| \geq \{{1}/({\pi}^{1/2}\alpha)+\delta\}({\log p})^{1/2}$$. Suppose there exists some $$\gamma>0$$ such that $$\max_{i\in {\mathcal{H}}_0}|\Gamma_i(\gamma)|=o(p^{\tau})$$ for any $$\tau>0$$. Assume that $$p_0= |\mathcal{H}_0|\geq cp$$ for some $$c>0$$ and that (9) and (7) hold. Then under Conditions 1–3 with $$p\leq cn^r$$ for some $$c>0$$ and $$r>0$$, \[ \lim_{(n,p)\rightarrow \infty}\frac{{\small{\text{FDR}}}(\hat{t})}{\alpha p_0/p}=1, \quad \frac{{\small{\text{FDP}}}(\hat{t})}{\alpha p_0/p} \rightarrow 1 \] in probability as $$(n,p)\rightarrow \infty$$. Remark 3. The condition on $$|\mathcal{S}_\rho| $$ requires that a few rows of $${B}$$ have one entry with magnitude exceeding $$(\log p)^{1/2+\rho}/n^{1/2}$$ for some constant $$\rho>0$$, among $$p$$ hypotheses in total, and is thus a mild condition. It is critical to restrict $$t$$ to the range $$[0,\,(2\log p-2\log \log p)^{1/2}]$$ and to threshold $$Q_i$$ at $$(2\log p)^{1/2}$$ when $$\hat{t}$$ is not in this range. First of all, when $$t\geq (2\log p-2\log \log p)^{1/2}$$, $$\:2p\{1-\Phi(t)\}\rightarrow 0$$ and is not even a consistent estimate of the false rejections $$R_0(t)$$ because $$|{R_0(t)}/[2p\{1-\Phi(t)\}]-1|\not\rightarrow 0$$ in probability as $$(n,p)\rightarrow \infty$$. Hence, if we use $$2p\{1-\Phi(t)\}$$ as an estimate of $$R_0(t)$$ for all $$t\leq (2\log p)^{1/2}$$, it may not be able to control $${\small{\text{FDP}}}(\hat{t})$$ with positive probability. Secondly, it is critical to threshold $$Q_i$$ at $$(2\log p)^{1/2}$$ instead of $$(2\log p-2\log \log p)^{1/2}$$. 
When $$\hat{t}$$ is not attained in the range, thresholding $$Q_i$$ at $$(2\log p-2\log \log p)^{1/2}$$ would cause too many false rejections, and consequently $${\small{\text{FDR}}}(\hat{t})$$ could not be controlled asymptotically at level $$\alpha$$.

3.3. Algorithm details and tuning parameter selection

To obtain the row-wise test statistics $$Q_i$$ in Step 1 of the above algorithm, we study the estimation of the regression coefficients in models (1) and (3). As discussed in § 2.2, the regression coefficient matrix $${B}$$ can be estimated by first reformulating model (1) as (4) and then applying the group lasso estimator of Yuan & Lin (2006) as in (5) with \[\lambda_n = b\, [8\hat{\sigma}_{{Y}}\{1+(5/2)\log p/D\}/(nD)]^{1/2}, \] where $$\text{vec}({Y})$$ and $${X}$$ are centred and $$\hat{\sigma}_{{Y}}$$ is the sample variance of $$\text{vec}({Y})$$; see, for example, Lounici et al. (2011). For the inverse regression models in (3), $${\gamma}_{i,d}$$ $$(d=1,\ldots, D;\, i=1,\ldots, p)$$ is estimated by applying the lasso as follows: \begin{equation} \label{ga_est} \hat{{\gamma}}_{i,d}=\Lambda_{i,d}^{-1/2}\!\mathop{\arg\min}_{{v}}\!\left(\!\frac{1}{2n}\Big{|}\big[({Y}_{{\cdot},d},{X}_{{\cdot},-i})-\{\bar{{Y}}_d,\bar{{X}}_{({\cdot},-i)}\}\big]\Lambda_{i,d}^{-1/2}{v}-({X}_{{\cdot},i}-\bar{{X}}_{{\cdot},i})\Big{|}_2^2\!+\lambda_{i,n}|{v}|_1\!\right)\!, \end{equation} (16) where $$\lambda_{i,n}=b({\hat{\sigma}_{i,i}\log p/n})^{1/2}$$ and $$\Lambda_{i,d}=\text{diag}(\hat{\sigma}_{{Y}_{d}},\hat{\Sigma}_{-i,-i})$$, in which $$\hat{\sigma}_{{Y}_{d}}$$ is the sample variance of $${Y}_{k,d}$$ and $$\hat{\Sigma}=(\hat{\sigma}_{i,j})$$ is the sample covariance matrix of $${X}_{k,{\cdot}}$$. The tuning parameters $$\lambda_n$$ and $$\lambda_{i,n}$$ in (5) and (16) are selected adaptively using the data by matching the number of false rejections $$R_0(t)$$ with its estimate $$2p\{1-\Phi(t)\}$$ as closely as possible.
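Steps 1–3 of the multiple testing algorithm in § 3.1 can be sketched as follows, given the row statistics $$S_i$$. This Python sketch assumes SciPy is available for the $$\chi^2_D$$ and normal distribution functions, and a grid search stands in for the infimum in (15):

```python
import numpy as np
from scipy.stats import chi2, norm

def q_statistics(S, D):
    """Step 1: normal quantile transformation (12); under H_{0,i} the Q_i
    behave asymptotically like |N(0,1)| variables."""
    return norm.ppf(1.0 - chi2.sf(S, df=D) / 2.0)

def fdr_threshold(Q, alpha):
    """Step 2: smallest t in [0, (2 log p - 2 log log p)^{1/2}] with
    2p{1 - Phi(t)} / (R(t) v 1) <= alpha; if no such t exists, fall back
    to (2 log p)^{1/2} as in the text."""
    p = len(Q)
    t_max = np.sqrt(2 * np.log(p) - 2 * np.log(np.log(p)))
    for t in np.linspace(0.0, t_max, 2000):
        R = max(int(np.sum(Q >= t)), 1)
        if 2 * p * norm.sf(t) / R <= alpha:
            return t
    return np.sqrt(2 * np.log(p))

def reject(S, D, alpha):
    """Step 3: reject H_{0,i} whenever Q_i >= hat-t."""
    Q = q_statistics(np.asarray(S), D)
    return Q >= fdr_threshold(Q, alpha)
```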
Since $${\mathcal{H}}_0$$ is unknown, the constant $$b$$ in the tuning parameters is chosen to minimize \[ \int_{c}^1\left[ \frac{\sum_{i\in{\mathcal{H}}}I\{Q_i^{(b)}\geq \Phi^{-1}(1-\alpha/2)\}}{\alpha p}-1 \right]^2{\rm d}\alpha, \] where $$c>0$$ and $$Q_i^{(b)}$$ is the statistic computed with the corresponding tuning parameter. A discretized criterion and the algorithm are summarized as follows.

Step 1. For $$b=1,\ldots,40$$ let $$\lambda_n=(b/20)[8\hat{\sigma}_{{Y}}\{1+(5/2)\log p/D\}/(nD)]^{1/2}$$ and $$\lambda_{i,n}=(b/20)(\hat{\sigma}_{i,i}\log p/n)^{1/2}$$. For each $$b$$, calculate $$\hat{{B}}^{(b)}$$ and $$\hat{{\gamma}}_{i,d}^{(b)}$$ ($$i=1,\ldots,p$$; $$d=1,\ldots,D$$). Based on the estimation of regression coefficients, construct the corresponding statistics $$S_i^{(b)}$$ for each $$b$$.

Step 2. Choose \[ \hat{b}=\mathop{\arg\min}_{1\leq b\leq 40}\sum_{s=1}^{10}\left[\frac{\sum_{1\leq i\leq p}I\{S_{i}^{(b)}\geq \Psi^{-1}(1-s[1-\Psi\{({2\log p})^{1/2}\}]/10)\}}{sp[1-\Psi\{({2\log p})^{1/2}\}]/10}-1\right]^2\text{.} \] Then the tuning parameters $$\lambda_{n}$$ and $$\lambda_{i,n}$$ are chosen by \[ \lambda_{n}=(\hat{b}/20)[8\hat{\sigma}_{{Y}}\{1+(5/2)\log p/D\}/(nD)]^{1/2},\quad \lambda_{i,n}=(\hat{b}/20)(\hat{\sigma}_{i,i}\log p/n)^{1/2}\text{.} \]

4. Simulation studies

4.1. Data generation

The false discovery rate and power of the multiple testing procedure proposed in § 3.1 are evaluated by simulation. The data are generated by considering three matrix models, with covariates being a combination of continuous and discrete random variables. The proposed row-wise multiple testing procedure is compared with the entrywise testing method of Xia et al. (2018). The R code is available in the Supplementary Material. The number of responses $$D$$ is chosen to be 10. The design matrices $${X}_{k,{\cdot}}$$ ($$k=1,\ldots,n$$) are generated as in Xia et al. (2018), with some covariates continuous and others discrete.
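The data-driven choice of the multiplier $$b$$ in § 3.3 can be sketched as follows. This is a minimal sketch, not the paper's R implementation: the function and argument names are hypothetical, the row statistics are taken as given, and the criterion is stated on the normal-quantile scale, with $$\Phi$$ applied to transformed statistics in place of $$\Psi$$.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: cdf and inverse cdf

def select_b(stats_by_b, p, n_bins=10):
    """Discretized tuning criterion (a sketch of Step 2): pick the
    candidate b whose statistics best match the expected number of
    null exceedances over a grid of tail levels.
    stats_by_b: dict mapping a candidate b to the list of p
    normal-quantile-transformed row statistics (hypothetical layout)."""
    # largest grid tail level is anchored at 1 - Phi{(2 log p)^{1/2}}
    tail = 1.0 - _N.cdf(math.sqrt(2.0 * math.log(p)))
    best_b, best_loss = None, float("inf")
    for b, stats in stats_by_b.items():
        loss = 0.0
        for s in range(1, n_bins + 1):
            alpha = s * tail / n_bins
            thr = _N.inv_cdf(1.0 - alpha)  # threshold Phi^{-1}(1 - alpha)
            count = sum(1 for q in stats if q >= thr)
            # under the null, roughly alpha * p statistics should exceed thr
            loss += (count / (alpha * p) - 1.0) ** 2
        if loss < best_loss:
            best_b, best_loss = b, loss
    return best_b
```

A candidate whose statistics produce far more exceedances than the null predicts incurs a large loss, so the criterion favours tuning parameters under which the null statistics behave like standard normal variables.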
Let $$\Lambda=(\Lambda_{i,j})$$ be a diagonal matrix with $$\Lambda_{i,i}$$ drawn from $$\text{Un}(1,3)$$ for $$i=1,\ldots,p$$. The following three models are used to generate the design matrices. Model 1 constructs $${\Omega}^{*(1)}=(\omega^{*(1)}_{i,j})$$ where $$\omega^{*(1)}_{i,i}=1$$, $$\omega^{*(1)}_{i,i+1}=\omega^{*(1)}_{i+1,i}=0{\cdot}6$$, $$\omega^{*(1)}_{i,i+2}=\omega^{*(1)}_{i+2,i}=0{\cdot}3$$ and $$\omega^{*(1)}_{i,j}=0$$ otherwise. Then the precision matrix is generated by $${\Omega}^{(1)}=\Lambda^{1/2}{\Omega}^{*(1)}\Lambda^{1/2}$$. Model 2 constructs $${\Omega}^{*(2)}=(\omega^{*(2)}_{i,j})$$ where $$\omega^{*(2)}_{i,j}=\omega^{*(2)}_{j,i}=0{\cdot} 5$$ for $$i=10(k-1)+1$$ and $$10(k-1)+2\leq j\leq 10(k-1)+10$$, with $$1\leq k\leq p/10$$, and $$\omega^{*(2)}_{i,j}=0$$ otherwise. Then the precision matrix is generated by $${\Omega}^{(2)}=\Lambda^{1/2}({\Omega}^{*(2)}+\delta{I})/(1+\delta)\Lambda^{1/2}$$ with $$\delta=|\lambda_{\min}({\Omega}^{*(2)})|+0{\cdot}05$$. Model 3 constructs $${\Omega}^{*(3)}=(\omega^{*(3)}_{i,j})$$ where $$\omega^{*(3)}_{i,i}=1$$, $$\omega^{*(3)}_{i,j}= 0{\cdot}8\times\text{Ber}(1,0{\cdot}05)$$ for $$i < j$$ and $$\omega^{*(3)}_{j,i}=\omega^{*(3)}_{i,j}$$. Then the precision matrix is generated by $${\Omega}^{(3)}=\Lambda^{1/2}({\Omega}^{*(3)}+\delta{I})/(1+\delta)\Lambda^{1/2}$$ with $$\delta=|\lambda_{\min}({\Omega}^{*(3)})|+0{\cdot}05$$. For each of the three matrix models, independent and identically distributed samples $${X}_{k,{\cdot}}\sim N(0,\Sigma^{(m)})$$ ($$k=1,\ldots,n$$) with $$m=1,2$$ and 3 are obtained, where $$\Sigma^{(m)}=({\Omega}^{(m)})^{-1}$$. Discrete covariates are simulated by letting $$l$$ of the covariates in $${X}_{k,{\cdot}}$$ take one of three discrete values, 0, 1 or 2, with probability $$1/3$$ each, where $$l$$ is a random integer between $$\lfloor p/2 \rfloor$$ and $$p$$. For the regression coefficient matrix $${B}$$, $$s$$ nonzero rows are randomly selected, with $$s = 10, 20, 30$$ and 50 for $$p=50, 200, 500$$ and 1000, respectively.
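The construction of Model 1 above can be sketched as follows; Models 2 and 3 are analogous, with the extra diagonal shift $$\delta I$$ and rescaling by $$1+\delta$$. This is a hypothetical helper, not the authors' R code, and it only builds the precision matrix, not the Gaussian samples.

```python
import random

def model1_precision(p, seed=0):
    """Model 1 (a sketch): banded base matrix Omega* with 1 on the
    diagonal, 0.6 on the first off-diagonals and 0.3 on the second,
    rescaled to Lambda^{1/2} Omega* Lambda^{1/2} with
    Lambda_{i,i} drawn from Un(1, 3)."""
    rng = random.Random(seed)
    lam = [rng.uniform(1.0, 3.0) for _ in range(p)]
    omega = [[0.0] * p for _ in range(p)]
    for i in range(p):
        omega[i][i] = 1.0
        if i + 1 < p:
            omega[i][i + 1] = omega[i + 1][i] = 0.6
        if i + 2 < p:
            omega[i][i + 2] = omega[i + 2][i] = 0.3
    # Lambda^{1/2} Omega* Lambda^{1/2}: scale entry (i, j) by (lam_i lam_j)^{1/2}
    return [[omega[i][j] * (lam[i] * lam[j]) ** 0.5 for j in range(p)]
            for i in range(p)]
```

Since the diagonal scaling cancels in correlations, the rescaled matrix keeps partial correlations of 0.6 and 0.3 on the two bands while giving the diagonal entries the Un(1, 3) scales.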
For the selected rows, we construct the regression coefficient matrix in the following two settings. In setting 1, all entries in the nonzero rows are nonzero: the magnitudes of $$B_{i,d}$$ are generated randomly from $$[-2(\log p/n)^{1/2}, \,-(\log p/n)^{1/2}]\cup[(\log p/n)^{1/2}, \,2(\log p/n)^{1/2}]$$ with $$n=100$$ and $$d=1,\ldots, D$$. In setting 2, a small proportion of entries of the nonzero rows are nonzero: for each nonzero row, we randomly select three nonzero locations $$\{l_1,l_2,l_3\}$$ and generate the magnitudes of the nonzero $$B_{i,d}$$ randomly from $$[-4(\log p/n)^{1/2}, \,-2(\log p/n)^{1/2}]\cup[2(\log p/n)^{1/2}, \,4(\log p/n)^{1/2}]$$ with $$n=100$$ and $$d=l_1,l_2,l_3$$, setting $$B_{i,d}=0$$ otherwise. The false discovery rate level is chosen as $$\alpha=5\%$$. Based on 50 replications, the empirical false discovery rate and power are calculated by \[ \frac{1}{50}\sum_{l=1}^{50}\frac{\sum_{i\in \mathcal{H}_0}I(N_{i,l}\geq \hat{t})}{\sum_{i\in \mathcal{H}}I(N_{i,l}\geq \hat{t})}, \quad\frac{1}{50}\sum_{l=1}^{50}\frac{\sum_{i\in \mathcal{H}_1}I(N_{i,l}\geq \hat{t})}{|\mathcal{H}_1|}, \] where $$N_{i,l}$$ denotes the normal-quantile-transformed statistic for the $$l$$th replication.

4.2. Simulation results

Figure 1 shows that the new method controls the false discovery rate well with $$\alpha=5\%$$ for all three matrix models with randomly selected nonzero regression coefficients as described above, across the whole range of dimensions. In contrast, the entrywise method shows severe false discovery rate distortion when the same significance level $$\alpha$$ is used. As an alternative, we can conservatively select the significance level for the entrywise method to be $$\alpha/D$$, for testing each of the $$D$$ columns. The false discovery rate is well controlled by this corrected entrywise method for Models 1 and 2, but still suffers from distortion for Model 3 when the dimension is large.
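The two displayed averages can be computed as follows. This is a minimal sketch with hypothetical names; the only departure from the display is a max(·, 1) guard in the denominator of the false discovery proportion, which protects replications with no rejections.

```python
def empirical_fdr_power(stats_by_rep, h0, h1, t_hat):
    """Empirical false discovery rate and power (a sketch): average
    over replications the false discovery proportion and the fraction
    of detected signals. stats_by_rep is a list (one dict per
    replication) mapping a row index i to its transformed statistic
    N_{i,l}; h0 and h1 are the null and non-null index sets."""
    fdr_terms, power_terms = [], []
    h0, h1 = set(h0), set(h1)
    for stats in stats_by_rep:
        rejected = {i for i, v in stats.items() if v >= t_hat}
        # false discovery proportion for this replication
        fdr_terms.append(len(rejected & h0) / max(len(rejected), 1))
        # fraction of true signals detected
        power_terms.append(len(rejected & h1) / len(h1))
    n_rep = len(stats_by_rep)
    return sum(fdr_terms) / n_rep, sum(power_terms) / n_rep
```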
Fig. 1. Simulation setting 1: comparison of false discovery rate and power of the three methods, with $$\alpha=5\%$$; in each panel the solid line with circles represents the proposed row-wise method, the dotted line with crosses represents the entrywise method with significance level $$\alpha$$, and the dashed line with triangles represents the entrywise method with significance level $$\alpha/D$$.

Figure 1 shows that the proposed method has a clear power advantage over the two entrywise methods. Even for the uncorrected entrywise method, which has serious false discovery rate distortion, the empirical power is still much lower than that of our new procedure. When the dimension is large, i.e., when $$p=1000$$, both entrywise methods suffer from trivial power, while the power of our method remains reasonably high. In Fig. 2 one can observe similar behaviour to that in Fig. 1: the proposed method attains the desired error rate control in all models with various dimensions and sparsity levels. In setting 2, because only a small proportion of entries in the nonzero rows of the regression coefficient matrix $$B$$ are nonzero, the entrywise method with the same significance level $$\alpha$$ exhibits less severe false discovery rate distortion, though obvious deviation from the nominal level can still be seen, especially in Model 3.
Furthermore, although the similarities between the columns of $$B$$ are relatively weak in setting 2, the proposed method still shows a substantial power gain over the alternative procedures, especially as the dimension increases.

Fig. 2. Simulation setting 2: comparison of false discovery rate and power of the three methods, with $$\alpha=5\%$$; in each panel the solid line with circles represents the proposed row-wise method, the dotted line with crosses represents the entrywise method with significance level $$\alpha$$, and the dashed line with triangles represents the entrywise method with significance level $$\alpha/D$$.

In summary, the numerical results suggest that when the regression coefficients share some row-wise similarities, groupwise testing is preferable to the entrywise method, in terms of both false discovery rate and power. The results of more simulations with weaker signal sizes are reported in the Supplementary Material.

5. Real-data analysis

We apply the proposed testing procedures to an ovarian cancer dataset, with the goal of identifying the miRNA regulators that regulate the expression of ovarian cancer-related proteins. MicroRNAs are a family of small noncoding RNAs that regulate a wide array of biological processes, including carcinogenesis. In cancer cells, miRNAs have been found to be heavily dysregulated and affect the expression of genes and their protein products.
To investigate the association between miRNA expression and protein expression in ovarian cancer, a total of 125 stage III and stage IV papillary serous primary ovarian cancer samples resected during debulking at the University of Turin were profiled for both miRNA expression and protein expression (Zsiros et al., 2015). A total of 3480 probes were profiled for miRNA expression and summarized as $$\log_2$$ expression ratios. Among these, 3132 miRNAs were characterized and are used in our analysis. The protein expression data on these 125 samples were measured using reverse-phase protein arrays for a total of 195 proteins; reverse-phase protein array is a quantitative antibody-based technology that can be used to assess multiple protein markers in many samples in a cost-effective, sensitive and high-throughput manner (Li et al., 2013). Since many of the proteins did not exhibit large variations in the ovarian cancer samples, they were first screened with the variance threshold set to 1, which resulted in the following 16 most variable proteins in our analysis: Annexin, Caveolin, Caveolin1, Claudin7, Cyclin.B1, E.cadherin, FAK.C, FAK.C1, HER2, HSP70, HSP70.1, IGFBP2, MAPK, p38, PARP, and PR.V. These proteins play functionally significant roles in ovarian cancer migration and invasion. For example, HER2 overexpression is a driving force in the carcinogenesis of several human cancers, including ovarian cancer (Pils et al., 2007). It has been reported that the overwhelming majority of human tumours overexpress members of the HSP70 family, and expression of these proteins is typically a marker of poor prognosis (Murphy, 2013). Insulin-like growth factor binding protein 2, IGFBP2, has been shown to enhance the invasion capacity of ovarian cancer cells (Lee et al., 2005). It is therefore important to identify the possible miRNA regulators that regulate the expression of such proteins in ovarian cancer. 
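The variance-based screening described above can be sketched as follows. The function name and input layout are hypothetical; the threshold of 1 follows the text.

```python
from statistics import variance

def screen_proteins(expression, threshold=1.0):
    """Keep proteins whose sample variance across tumour samples
    exceeds the threshold (set to 1 in the text).
    expression: dict mapping a protein name to its list of expression
    values across samples (hypothetical input layout)."""
    return sorted(name for name, values in expression.items()
                  if variance(values) > threshold)
```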
Our goal is to identify the key miRNA regulators of protein expression from the set of 3132 miRNAs using the data from these 125 ovarian cancer samples. Applying the proposed method, 25 miRNAs were identified as being associated with protein expression at a false discovery rate $$\alpha$$-level of 0$${\cdot}$$05; see Table 1. However, when the entrywise test was applied with an $$\alpha$$-level of 0$${\cdot}$$05/16, none of the miRNAs was selected. The identified miRNAs play various regulatory roles in cancer initiation and progression. Among them, miR-376a, miR-888, miR-187, miR-146a and miR-105 are associated with promotion of proliferation and metastasis, while miR-33b, miR-490-5p, miR-497, miR-596, miR-548-3p, miR-105, miR-185, miR-548i, miR-1247 and miR-136 are associated with inhibition of cancer metastasis and progression. A few identified miRNAs, including miR-593, are also involved in protein kinase R, PKR, regulation and regulate cell proliferation. Table 1 lists supporting literature for the involvement of some of the identified miRNAs in ovarian cancer progression. These results show that miRNAs play important roles in regulating the expression of proteins associated with ovarian cancer initiation and progression, and our proposed test provides a powerful tool for identifying such cancer-associated miRNAs.

Table 1. MicroRNAs that regulate protein expression in stage II to stage IV ovarian cancer, their biological functions, $$p$$-values from the proposed test, and relevant literature

MicroRNA | Biological function | $$p$$-value | Reference
hsa-miR-376a | Promotes proliferation and metastasis | 3$${\cdot}$$65$$\,\times\, 10^{-5}$$ | Yang et al. (2016)
hsa-miR-888 | Cancer metastasis | 9$${\cdot}$$76$$\,\times\, 10^{-6}$$ | Huang & Chen (2014)
hsa-miR-1288 | Cancer location and pathological staging | 7$${\cdot}$$00$$\,\times\, 10^{-6}$$ | Gopalan et al. (2014)
hsa-miR-33b | Inhibits cancer metastasis | 1$${\cdot}$$75$$\,\times\, 10^{-4}$$ | Lin et al. (2015)
hsa-miR-490-5p | Tumour suppression | 6$${\cdot}$$86$$\,\times\, 10^{-7}$$ | Lan et al. (2015)
hsa-miR-497 | Inhibitory roles | 3$${\cdot}$$55$$\,\times\, 10^{-4}$$ | Li et al. (2011)
mmu-miR-759 | — | 3$${\cdot}$$22$$\,\times\, 10^{-4}$$ | —
has-RNU48 | Tumour pathology and prognosis | 2$${\cdot}$$15$$\,\times\, 10^{-4}$$ | Gee et al. (2011)
hsa-miR-596 | Tumour suppression | 7$${\cdot}$$90$$\,\times\, 10^{-5}$$ | Endo et al. (2013)
hsa-miR-187 | Cancer progression | 5$${\cdot}$$49$$\,\times\, 10^{-6}$$ | Chao et al. (2012)
has-RNU19 | — | 1$${\cdot}$$68$$\,\times\, 10^{-4}$$ | —
hsa-miR-518f | — | 3$${\cdot}$$43$$\,\times\, 10^{-5}$$ | —
hsa-miR-548d-3p | Anti-oncogenic regulator | 9$${\cdot}$$73$$\,\times\, 10^{-5}$$ | Shi et al. (2015)
hsa-miR-105 | Inhibits tumour growth | 1$${\cdot}$$15$$\,\times\, 10^{-4}$$ | Honeywell et al. (2013)
hsa-miR-185 | Suppresses tumour growth | 2$${\cdot}$$92$$\,\times\, 10^{-4}$$ | Imam et al. (2010)
hsa-miR-146a | Enhanced tumourigenic potential | 2$${\cdot}$$89$$\,\times\, 10^{-4}$$ | Sandhu et al. (2014)
hsa-miR-548i | Anti-oncogenic regulator | 9$${\cdot}$$73$$\,\times\, 10^{-5}$$ | Shi et al. (2015)
hsv1-miR-H6 | — | 3$${\cdot}$$66$$\,\times\, 10^{-4}$$ | —
hsa-miR-129 | Promotes apoptosis | 5$${\cdot}$$23$$\,\times\, 10^{-7}$$ | Karaayvaz et al. (2013)
hsa-miR-1247 | Inhibits cell proliferation | 2$${\cdot}$$13$$\,\times\, 10^{-4}$$ | Shi et al. (2014)
hsa-miR-136 | Inhibits cancer stem cell activity | 1$${\cdot}$$38$$\,\times\, 10^{-4}$$ | Jeong et al. (2017)
hsa-miR-593 | Regulates cell proliferation | 1$${\cdot}$$65$$\,\times\, 10^{-6}$$ | Ito et al. (2011)
mmu-miR-105 | Promotes metastasis | 1$${\cdot}$$15$$\,\times\, 10^{-4}$$ | Zhou et al. (2014)
hsa-miR-886-5p | Associates with PKR and modulates its activity | 2$${\cdot}$$41$$\,\times\, 10^{-4}$$ | Lee et al. (2011)
hsa-miR-553 | — | 7$${\cdot}$$03$$\,\times\, 10^{-5}$$ | —

For comparison, we also analysed this dataset using the method of Ruffieux et al. (2017), a Bayesian implementation of a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. For each miRNA-protein pair, we obtained the marginal posterior inclusion probability. Among all the pairs, the maximum inclusion probability was 0$${\cdot}$$56, with only four pairs having inclusion probability over 0$${\cdot}$$25 and one pair having inclusion probability above 0$${\cdot}$$50. Using 400 permutations as suggested by Ruffieux et al. (2017), the procedure did not detect any miRNA-protein association at a false discovery rate level of 0$${\cdot}$$25. Since our test aims to test the association between a given miRNA and any of the proteins, for each miRNA we also took the maximum inclusion probability among all the proteins and calculated the false discovery rate using 400 permutations.
Again, no association was identified for a false discovery rate level of 0$${\cdot}$$25. These results were similar to the entrywise test results.

Acknowledgement

The research of Xia was supported in part by the National Natural Science Foundation of China, The Recruitment Program of Global Experts Youth Project from China, and Fudan University. Cai was supported in part by the U.S. National Science Foundation and National Institutes of Health. Li was partly supported by the National Cancer Institute. We thank Dr George Coukos for sharing the ovarian cancer datasets, and the associate editor and reviewer for very helpful comments.

Supplementary material

Supplementary material available at Biometrika online includes additional simulation results and the R code used in the simulations and for implementation of the proposed methods.

Appendix

Technical lemmas

In this section we prove the main theoretical results. We begin by stating lemmas that will be used in the proofs of the theorems. The following lemma was introduced in Berman (1962).

Lemma A1. If $$X$$ and $$Y$$ have a bivariate normal distribution with zero expectation, unit variance and correlation coefficient $$\rho$$, then \begin{eqnarray*} \lim_{c\rightarrow\infty}\frac{{\text{pr}}(X>c, Y>c)}{\{2\pi(1-\rho)^{1/2}c^{2}\}^{-1}\exp\bigl(-\frac{c^{2}}{1+\rho}\bigr)(1+\rho)^{3/2}}=1 \end{eqnarray*} uniformly for all $$\rho$$ such that $$|\rho|\leq \delta$$, for any $$\delta$$ with $$0<\delta<1$$. Based on the definitions of $$U_{i,d}$$ and $$\tilde{U}_{i,d}$$ in (10), the following lemma is essentially proved in an unpublished 2014 paper by W. Liu and S. Luo, available upon request from the first author.

Lemma A2. Suppose that Conditions 1 and 2 and the asymptotic conditions (9) and (7) hold.
Then \[ \tilde r_{i,d}=n^{-1}\sum_{k=1}^n(\epsilon_{k,d}-\bar{\epsilon})(\eta_{k,i,d}-\bar{\eta}_{i,d})+\tilde{\sigma}_{\epsilon}^2(\gamma_{i,1,d}-\hat \gamma_{i,1,d})+\tilde{\sigma}_{i,d}^2(B_{i,d}-\hat B_{i,d})+o_{\rm p}\{(n\log p)^{-1/2}\} \] and, consequently, \[ T_{i,d}=\tilde{U}_{i,d}+(\tilde{\sigma}_{\epsilon}^2/\sigma_{\epsilon}^2+\tilde{\sigma}_{i,d}^2/\sigma_{i,d}^2-2)B_{i,d}+o_{\rm p}\{(n\log p)^{-1/2}\}, \] where $$\tilde{\sigma}_{\epsilon}^2=(Dn)^{-1}\sum_{k=1}^{n}\sum_{d=1}^D(\epsilon_{k,d}-\bar{\epsilon})^2$$ and $$\tilde{\sigma}_{i,d}^2=n^{-1}\sum_{k=1}^{n}(\eta_{k,i,d}-\bar{\eta}_{i,d})^2$$ with $$\bar{\epsilon}=(Dn)^{-1}\sum_{k=1}^{n}\sum_{d=1}^D\epsilon_{k,d}$$ and $$\bar{\eta}_{i,d}=n^{-1}\sum_{k=1}^{n}\eta_{k,i,d}$$. As a result, uniformly in $$i=1,\dots,p$$, \[ |T_{i,d}-\tilde{U}_{i,d}|=O_{\rm p}\{B_{i,d}(\log p/n)^{1/2}\}+ o_{\rm p}\{(n\log p)^{-1/2}\}\text{.} \] We apply the group lasso estimator (5) with $$\lambda= \{8\hat{\sigma}_{{Y}}(1+A\log p/D)/(nD)\}^{1/2}$$, taking $$A>5/2$$, where $$\text{vec}({Y})$$ and $${X}$$ are centred and $$\hat{\sigma}_{{Y}}$$ is the sample variance of $$\text{vec}({Y})$$. Then, by Corollary 4.1 in Lounici et al. (2011), we have the following lemma.

Lemma A3. Consider the model (4). Under Conditions 1 and 2, if $$D=O(\log p)$$, then $$\hat{{B}}$$ satisfies \begin{eqnarray*} \max_{1\leq d\leq D}\big|\hat{{B}}_{{\cdot},d}-{B}_{{\cdot},d}\big|_1=O_{\rm p}\{s(p)(\log p/n)^{1/2}\},\quad \max_{1\leq d\leq D}\big|\hat{{B}}_{{\cdot},d}-{B}_{{\cdot},d}\big|_2=O_{\rm p}\big[\{s(p)\log p/n\}^{1/2}\big], \end{eqnarray*} where $$s(p)=\sum_{i=1}^pI({B}_{i,{\cdot}}\neq 0)$$ is the row sparsity of $${B}$$.
Proof of Theorem 1. Based on the definitions of $$U_{i,d}$$ and $$\tilde{U}_{i,d}$$ in (10), let $$V_{i,d}={U_{i,d}/(\sigma_{i,d}^2\theta_{i,d}^{1/2})},$$ where $$\theta_{i,d}={\mathrm{var}}(\tilde{U}_{i,d})={\mathrm{var}}(\epsilon_{k,d}\eta_{k,i,d}/\sigma_{i,d}^2)/n=(\sigma_{\epsilon}^2/\sigma_{i,d}^2+B_{i,d}^2)/n$$ ($$d=1,\ldots,D$$). By Lemma 2 in Xia et al. (2015), under conditions (9) and (7) we have \[ |\hat{\sigma}_{\epsilon}^2-\sigma_{\epsilon}^2|=O_{\rm p}\{(\log p/n)^{1/2}\}, \quad\max_{i,d}|\hat{\sigma}_{i,d}^2-\sigma_{i,d}^2|=O_{\rm p}\{(\log p/n)^{1/2}\}\text{.} \] Therefore \begin{equation} \label{p2} \max_{i,d}|\hat{\theta}_{i,d}-\theta_{i,d}|=o_{\rm p}\{1/(n\log p)\}\text{.} \end{equation} (A1) Lemma A2, together with (A1), implies that uniformly in $$1\leq i\leq p$$, $$W_{i,d}=V_{i,d}+o_{\rm p}\{(\log p)^{-1/2}\}$$; thus we have that uniformly in $$1\leq i\leq p$$, $$\:S_i = \sum_{d=1}^D V_{i,d}^2 +o_{\rm p}(1)$$. Hence, it suffices to show that $$\sum_{d=1}^D V_{i,d}^2$$ converges to $$\chi^2_D$$ in distribution. Define $$Z_{k,i,d}=\{\epsilon_{k,d}\eta_{k,i,d}-{E}(\epsilon_{k,d}\eta_{k,i,d})\}/\sigma_{i,d}^2$$ for $$1\leq k\leq n$$. Then \[ V_{i,d}={\sum_{k=1}^{n}Z_{k,i,d}}/{(n^2\theta_{i,d})^{1/2}}\text{.} \] Without loss of generality, we assume $$\sigma_{\epsilon}^2=\sigma_{i,d}^2=1$$. Define $$\hat{V}_{i,d}={\sum_{k=1}^{n}\hat{Z}_{k,i,d}}/{(n^2\theta_{i,d})^{1/2}},$$ where $$ \hat{Z}_{k,i,d}=Z_{k,i,d}I(|Z_{k,i,d}|\leq \tau_{n})-{E} \{Z_{k,i,d}I(|Z_{k,i,d}|\leq \tau_{n})\}$$ with $$\tau_{n}=(4/M)\log (p+n)$$.
Note that \begin{eqnarray*} &&\max_{1\leq i\leq p}n^{-{1/ 2}}\sum_{k=1}^{n}{E}\big[|Z_{k,i,d}|I\big\{|Z_{k,i,d}|\geq (4/M)\log (p+n)\big\}\big]\cr &&\quad \leq Cn^{1/ 2}\max_{1\leq k\leq n}\max_{1\leq i\leq p}{E}\big[|Z_{k,i,d}|I\big\{|Z_{k,i,d}|\geq (4/M)\log (p+n)\big\}\big]\cr &&\quad \leq Cn^{1/ 2}(p+n)^{-2}\max_{1\leq k\leq n}\max_{1\leq i\leq p}{E}\big[|Z_{k,i,d}|\exp\{(M/2)|Z_{k,i,d}|\}\big]\cr &&\quad \leq Cn^{1/ 2}(p+n)^{-2}\text{.} \end{eqnarray*} Hence \[ {\text{pr}}\Big{\{}\max_{1\leq i\leq p}|V_{i,d}-\hat{V}_{i,d}|\geq (\log p)^{-1}\Big{\}}\leq {\text{pr}}\Big{(}\max_{1\leq i\leq p}\max_{1\leq k\leq n}|Z_{k,i,d}|\geq \tau_{n}\Big{)}=O(p^{-1})\text{.} \] Since \begin{align*} \left|\max_{1\leq i\leq p}\sum_{d=1}^DV_{i,d}^{2}-\max_{1\leq i\leq p}\sum_{d=1}^D\hat{V}_{i,d}^{2}\right|&\leq 2D\max_{1\leq i\leq p}\max_{1\leq d\leq D}|\hat{V}_{i,d}|\:\max_{1\leq i\leq p}\max_{1\leq d\leq D}|V_{i,d}-\hat{V}_{i,d}|\\ & \quad +D\max_{1\leq i\leq p}\max_{1\leq d\leq D}|V_{i,d}-\hat{V}_{i,d}|^{2}, \end{align*} it suffices to prove that for any $$t\in {\mathbb{R}}$$, \begin{equation} \label{aaa1} {\text{pr}}\!\left(\sum_{d=1}^D \hat{V}_{i,d}^2\leq t\right)\rightarrow{\text{pr}}(\chi^2_D\leq t)\text{.} \end{equation} (A2) Let $$\tilde{Z}_{k,i,d}=\hat{Z}_{k,i,d}/\theta_{i,d}^{1/2}$$ for $$i=1,\dots,p$$ and $${W}_{k,i}=(\tilde{Z}_{k,i,1},\ldots,\tilde{Z}_{k,i,D})$$ for $$1\leq k\leq n$$. Then \[ {\text{pr}}\!\left(\sum_{d=1}^D \hat{V}_{i,d}^2\geq t\right) = {\text{pr}}\!\left(\left|n^{-1/2}\sum_{k=1}^n{W}_{k,i}\right|_2^2\geq t\right)\text{.} \] It follows from Theorem 1 in Zaïtsev (1987) that \begin{equation} \label{z1} {\text{pr}}\!\left(\left|n^{-1/2}\sum_{k=1}^n{W}_{k,i}\right|_2^2\geq t\right)\!
\leq {\text{pr}}\bigl\{ |{N}^{(D)}|_2^2\geq t-\epsilon_{n}(\log p)^{-{1/ 2}}\bigr\} + c_{1}D^{5/2}\exp\!\left\{-\frac{n^{1/2}\epsilon_{n}}{c_{2}D^{3}\tau_{n}(\log p)^{1/2}}\right\} \end{equation} (A3) and that \begin{equation} \label{z2} {\text{pr}}\!\left(\left|n^{-1/2}\sum_{k=1}^n{W}_{k,i}\right|_2^2\geq t\right)\!\geq {\text{pr}}\bigl\{ |{N}^{(D)}|_2^2\geq t+\epsilon_{n}(\log p)^{-{1/ 2}}\bigr\} - c_{1}D^{5/2}\exp\!\left\{-\frac{n^{1/2}\epsilon_{n}}{c_{2}D^{3}\tau_{n}(\log p)^{1/2}}\right\}\!, \end{equation} (A4) where $$c_{1}>0$$ and $$c_{2}>0$$ are constants, $$\epsilon_{n}\rightarrow 0$$ will be specified later, and $${N}^{(D)}=(N_1,\ldots,N_D)$$ is a normal random vector with $${E} \{{N}^{(D)}\}=0$$ and $${\mathrm{cov}}\{{N}^{(D)}\}={\mathrm{cov}}(W_{k,i})$$. Because $${\Upsilon}$$ is independent of $${X}$$ and $$\eta_{k,i,d_2}=X_{k,i}-(Y_{k,d_2},{X}_{k,-i}){\gamma}_{i,d_2}$$ with $$Y_{k,d_2}=X_{k,{\cdot}}B_{{\cdot}, d_2}+\epsilon_{k,d_2}$$, we have that for $$d_1\neq d_2$$, $$\epsilon_{k,d_1}$$ and $$\eta_{k,i,d_2}$$ are independent of each other. On the other hand, by definition, $$\epsilon_{k,d}$$ and $$\eta_{k,i,d}+\gamma_{i,1,d}\epsilon_{k,d}$$ are independent of each other. Hence, under the null $$H_{0,i}$$, we have that the $$\epsilon_{k,d}$$ are independent of $$\eta_{k,i,d}$$. Thus we have \begin{eqnarray*} {\mathrm{cov}}(\epsilon_{k,d_1}\eta_{k,i,d_1},\epsilon_{k,d_2}\eta_{k,i,d_2})={E}(\epsilon_{k,d_1}\eta_{k,i,d_1}\epsilon_{k,d_2}\eta_{k,i,d_2})=0\text{.} \end{eqnarray*} So $${\mathrm{cov}}\{{N}^{(D)}\}={\mathrm{cov}}(W_{k,i})={I}_{D\times D}$$. This, together with (A3) and (A4), proves (A2), and Theorem 1 then follows.
Proof of Theorem 2. By Lemma A2, we have \[ \max_{1\leq d\leq D}\left|\frac{T_{i,d}-\{1+o(1)\}{E} T_{i,d}}{{\hat{\theta}_{i,d}}^{1/2}}-U_{i,d}\right|=o_{\rm p}\{(\log p)^{-1/2}\}\text{.} \] Observe that \[ \sum_{1\leq d\leq D}\frac{[\{1+o(1)\}{E} T_{i,d}]^2}{\hat{\theta}_{i,d}}\leq 2\left(S_{i}+\sum_{1\leq d\leq D}\frac{[T_{i,d}-\{1+o(1)\}{E} T_{i,d}]^2}{\hat{\theta}_{i,d}}\right)\text{.} \] Also note that $${E} (T_{i,d})=B_{i,d}\{1+o(1)\}+o\{(n\log p)^{-1/2}\}$$. The result then follows from the definition (14) of the classes of precision matrices and Theorem 1.

Proof of Theorem 3. We start by showing that $${\text{pr}}[\sum_{i\in {\mathcal{H}}_0}I\{|Q_i|\geq (2\log p)^{1/2}\}=0]\rightarrow 1$$ as $$(n,p)\rightarrow \infty$$, and then we focus on the event that $$\hat{t}$$ in (15) exists. Next, we show the false discovery proportion result by dividing the null set into small subsets and controlling the variance of $$R_0(t)$$ for each subset, which also yields the false discovery rate result. Note that \[ {\text{pr}}\!\left[\sum_{i\in {\mathcal{H}}_0}I\{|Q_i|\geq (2\log p)^{1/2}\}\geq 1\right]\leq p_0\max_{i\in {\mathcal{H}}_0}\:{\text{pr}}\{|Q_i|\geq (2\log p)^{1/2}\}\text{.} \] By the proof of Theorem 1, $$\max_{i\in {\mathcal{H}}_0}\max_{1\leq d\leq D}|W_{i,d}-\hat{V}_{i,d}|=o_{\rm p}\{(\log p)^{-1/2}\}$$. Then, on the event $$\{\max_{i\in {\mathcal{H}}_0}$$$$\max_{1\leq d\leq D}|W_{i,d}-\hat{V}_{i,d}|=o\{(\log p)^{-1/2}\}\}$$, by (A3), (A4) and the fact that $$G[t+o\{(\log p)^{-1/2}\}]/G(t)=1+o(1)$$ uniformly in $$0\leq t\leq (2\log p)^{1/2}$$, where $$G(t)=2\{1-\Phi(t)\}$$, we have \[ {\text{pr}}\!\left[\sum_{i\in {\mathcal{H}}_0}I\{|Q_i|\geq (2\log p)^{1/2}\}\geq 1\right]\leq p_0G\{(2\log p)^{1/2}\}\{1+o(1)\}=o(1)\text{.} \] Hence we shall focus on the event $$\{\hat{t}$$ exists in the range $$[0,(2\log p-2\log \log p)^{1/2}]\}$$.
By definition of $$\hat{t}$$, it is easy to show that \[ \frac{2\{1-\Phi(\hat{t})\}p}{\max\{\sum_{i \in {\mathcal{H}}}I(|Q_i|\geq \hat{t}),1\}}=\alpha\text{.} \] Therefore it suffices to show that \[ \sup_{0\leq t\leq (2\log p-2\log \log p)^{1/2}}\left|\frac{\sum_{i\in {\mathcal{H}}_0}\{I(|Q_i|\geq t)-G(t)\}}{pG(t)}\right|\rightarrow 0 \] in probability. Let $$0\leq t_0<t_1<\cdots<t_b=t_{p}$$ be such that $$t_{\iota}-t_{{\iota}-1}=v_{p}$$ for $$1\leq {\iota}\leq b-1$$ and $$t_b-t_{b-1}\leq v_{p}$$, where $$v_{p}=\{\log p(\log_4 p)\}^{-1/2}$$. Thus we have $$b\sim t_{p}/v_{p}$$. For any $$t$$ such that $$t_{{\iota}-1}\leq t\leq t_{\iota}$$, \[ \frac{\sum_{i\in {\mathcal{H}}_0}I(|Q_i|\geq t_{\iota})}{p_0G(t_{\iota})}\frac{G(t_{\iota})}{G(t_{{\iota}-1})} \leq \frac{\sum_{i\in {\mathcal{H}}_0}I(|Q_i|\geq t)}{p_0G(t)} \leq \frac{\sum_{i\in {\mathcal{H}}_0}I(|Q_i|\geq t_{{\iota}-1})}{p_0G(t_{{\iota}-1})}\frac{G(t_{{\iota}-1})}{G(t_{\iota})}\text{.} \] So it suffices to prove that \[ \max_{0\leq\iota\leq b}\left|\frac{\sum_{i\in {\mathcal{H}}_0}\{I(|Q_i|\geq t_{\iota})-G(t_{\iota})\}}{pG(t_{\iota})}\right|\rightarrow 0 \] in probability. Define $$F_{i}=\sum_{1\leq d\leq D} \hat{V}_{i,d}^2$$ and $$M_{i}=\Phi^{-1}\{1-{\text{pr}}(\chi^2_D\geq F_{i})/2\}$$. By the proof of Theorem 1, we have $$\max_{i\in {\mathcal{H}}_0}|S_{i}-F_{i}|=o_{\rm p}(1)$$, because \[ {\text{pr}}\{\chi^2_D\geq t+o(1)\}/{\text{pr}}(\chi^2_D\geq t)=1+o(1) \] for all $$0\leq t\leq c\log p$$ with any constant $$c>0$$. Recall that $$Q_i=\Phi^{-1}\{1-{\text{pr}}(\chi^2_D\geq S_{i})/2\}$$. Note that $$G[t+o\{(\log p)^{-1/2}\}]/G(t)=1+o(1)$$ uniformly in $$0\leq t\leq (2\log p)^{1/2}$$. Based on the proof of Theorem 1, it suffices to show that \[ \max_{0\leq \iota \leq b} \left|\frac{\sum_{i\in {\mathcal{H}}_0}[I(|M_{i}|\geq t_\iota)-G(t_\iota)]}{p_0G(t_\iota)}\right|\rightarrow 0 \] in probability. 
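The transform $$Q_i=\Phi^{-1}\{1-{\text{pr}}(\chi^2_D\geq S_{i})/2\}$$ recalled above maps the chi-squared statistic onto the one-sided normal scale. A minimal scipy-based sketch (the function name is hypothetical):

```python
# Sketch of the normal-scale transform Q_i = Phi^{-1}{1 - pr(chi^2_D >= S_i)/2}.
# For D = 1 and S_i = z^2 it recovers |z|, since
# pr(chi^2_1 >= z^2) = 2{1 - Phi(|z|)}.
import numpy as np
from scipy import stats

def q_statistic(S, D):
    """Map a chi-squared statistic S with D degrees of freedom
    to the half-normal scale used for thresholding."""
    pval = stats.chi2.sf(S, df=D)            # pr(chi^2_D >= S)
    return stats.norm.ppf(1.0 - pval / 2.0)  # Phi^{-1}(1 - p/2)
```

For example, `q_statistic(2.5 ** 2, D=1)` returns approximately 2.5, confirming the degenerate one-dimensional case.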
Note that \begin{eqnarray*} &&{\text{pr}}\!\left[\max_{0\leq \iota\leq b}\left|\frac{\sum_{i\in \mathcal{H}_0}\{I(|M_i|\geq t_{\iota})-G(t_{\iota})\}}{p_0G(t_{\iota})}\right|\geq \epsilon\right] \leq \sum_{\iota=1}^b{\text{pr}}\!\left[\left|\frac{\sum_{i\in \mathcal{H}_0}\{I(|M_i|\geq t_{\iota})-G(t_{\iota})\}}{p_0G(t_{\iota})}\right|\geq \epsilon\right]\\[3pt] &&\leq \frac{1}{v_p}\int_0^{t_p}\!{\text{pr}}\!\left\{\left|\frac{\sum_{i\in \mathcal{H}_0}I(|M_i|\geq t)}{p_0G(t)}-1\right|\geq \epsilon\right\}{\rm d} t+\sum_{\iota=b-1}^b{\text{pr}}\!\left[\left|\frac{\sum_{i\in \mathcal{H}_0}\{I(|M_i|\geq t_{\iota})-G(t_{\iota})\}}{p_0G(t_{\iota})}\right|\geq \epsilon\right]\!\text{.} \end{eqnarray*} Therefore, it suffices to show that for any $$\epsilon>0$$, \begin{equation} \label{qqq1} \int_0^{t_p}{\text{pr}}\!\left[\left|\frac{\sum_{i\in \mathcal{H}_0}\{I(|M_i|\geq t)-{\text{pr}}(|M_i|\geq t)\}}{p_0G(t)}\right|\geq \epsilon\right]{\rm d} t=o(v_p)\text{.} \end{equation} (A5) Note that \begin{eqnarray*} &&{E}\left|\frac{\sum_{i\in \mathcal{H}_0}\{I(|M_i|\geq t)-{\text{pr}}(|M_i|\geq t)\}}{p_0G(t)}\right|^2\\[3pt] &&\quad = \frac{\sum_{i,j\in \mathcal{H}_0}\{{\text{pr}}(|M_i|\geq t, |M_j|\geq t)-{\text{pr}}(|M_i|\geq t){\text{pr}}(|M_j|\geq t)\}}{p_0^2G^2(t)}\text{.} \end{eqnarray*} We divide the indices $$i,j\in {\mathcal{H}}_0$$ into three subsets: $${\mathcal{H}}_{01} =\{i,j\in {\mathcal{H}}_{0}: i=j\}$$, $${\mathcal{H}}_{02}=\{i,j\in {\mathcal{H}}_{0}: i\neq j, \: i\in \Gamma_j(\gamma) \mbox{ or }j\in \Gamma_i(\gamma)\}$$, which contains the highly correlated pairs, and $${\mathcal{H}}_{03}={\mathcal{H}}_{0}\setminus ({\mathcal{H}}_{01}\cup {\mathcal{H}}_{02})$$. 
Then we have \begin{equation} \label{subset1} \frac{\sum_{i,j\in \mathcal{H}_{01}}\{{\text{pr}}(|M_i|\geq t, |M_j|\geq t)-{\text{pr}}(|M_i|\geq t){\text{pr}}(|M_j|\geq t)\}}{p_0^2G^2(t)}\leq \frac{C}{p_0G(t)}\text{.} \end{equation} (A6) Note that $${\mathrm{cov}}(\epsilon_{k,d}\eta_{k,i,d}, \epsilon_{k,d}\eta_{k,j,d})={E}(\epsilon_{k,d}^2\eta_{k,i,d}\eta_{k,j,d})-{E}(\epsilon_{k,d}\eta_{k,i,d}){E}(\epsilon_{k,d}\eta_{k,j,d})$$. Because $${\mathrm{cov}}(\epsilon_{k,d},\eta_{k,i,d})=-\sigma_{i,d}^2B_{i,d}$$, we have $${E}(\epsilon_{k,d}\eta_{k,i,d}){E}(\epsilon_{k,d}\eta_{k,j,d})=\sigma_{i,d}^2\sigma_{j,d}^2B_{i,d}B_{j,d}$$. Note that $${E}(\epsilon_{k,d}^2\eta_{k,i,d}\eta_{k,j,d})$$ equals \begin{align*} & {E}\{\epsilon_{k,d}^2(\eta_{k,i,d}+\epsilon_{k,d}\gamma_{i,1,d})(\eta_{k,j,d}+\epsilon_{k,d}\gamma_{j,1,d})\}\\ & -{E}\{\epsilon_{k,d}^2(\eta_{k,i,d}+\epsilon_{k,d}\gamma_{i,1,d})\epsilon_{k,d}\gamma_{j,1,d}\}-{E}(\epsilon_{k,d}^3\gamma_{i,1,d}\eta_{k,j,d})\text{.} \end{align*} Since $$\epsilon_{k,d}$$ is uncorrelated with $$\eta_{k,i,d}+\epsilon_{k,d}\gamma_{i,1,d}$$, we have $$-\gamma_{i,1,d}{\mathrm{var}}(\epsilon_{k,d})={\mathrm{cov}}(\epsilon_{k,d},\eta_{k,i,d})$$. 
Thus, \[ {E}(\epsilon_{k,d}^2\eta_{k,i,d}\eta_{k,j,d})=\sigma_{\epsilon}^2{E}\{(\eta_{k,i,d}+\epsilon_{k,d}\gamma_{i,1,d})(\eta_{k,j,d}+\epsilon_{k,d}\gamma_{j,1,d})\}-{E}(\epsilon_{k,d}^3\gamma_{i,1,d}\eta_{k,j,d})\text{.} \] Note that \[ {E}(\epsilon_{k,d}^3\gamma_{i,1,d}\eta_{k,j,d})={E}\{\epsilon_{k,d}^3\gamma_{i,1,d}(\eta_{k,j,d}+\epsilon_{k,d}\gamma_{j,1,d})\}-{E}(\epsilon_{k,d}^4\gamma_{i,1,d}\gamma_{j,1,d})= -3\gamma_{i,1,d}\gamma_{j,1,d}\sigma_{\epsilon_{d}}^4, \] where the first expectation vanishes because $$\epsilon_{k,d}$$ is independent of $$\eta_{k,j,d}+\epsilon_{k,d}\gamma_{j,1,d}$$ and $${E}(\epsilon_{k,d}^3)=0$$, while $${E}(\epsilon_{k,d}^4)=3\sigma_{\epsilon_{d}}^4$$. Note also that \begin{eqnarray*} &&{E}\{(\eta_{k,i,d}+\epsilon_{k,d}\gamma_{i,1,d})(\eta_{k,j,d}+\epsilon_{k,d}\gamma_{j,1,d})\}\cr &&\quad={\mathrm{cov}}(\eta_{k,i,d},\eta_{k,j,d})+\gamma_{i,1,d}{\mathrm{cov}}(\epsilon_{k,d},\eta_{k,j,d})+\gamma_{j,1,d}{\mathrm{cov}}(\epsilon_{k,d},\eta_{k,i,d})+\gamma_{i,1,d}\gamma_{j,1,d}\sigma_{\epsilon}^2\text{.} \end{eqnarray*} We have $${\mathrm{cov}}(\epsilon_{k,d}\eta_{k,i,d}, \epsilon_{k,d}\eta_{k,j,d})=(\omega_{i,j}\sigma_{\epsilon}^2+2B_{i,d}B_{j,d})\sigma_{i,d}^2\sigma_{j,d}^2\text{.}$$ Therefore, for $$i,j \in {\mathcal{H}}_0$$, \[ \tilde{\xi}_{i,j,d}={\mathrm{corr}}(\epsilon_{k,d}\eta_{k,i,d}, \epsilon_{k,d}\eta_{k,j,d})=\frac{(\omega_{i,j}\sigma_{\epsilon}^2+2B_{i,d}B_{j,d})}{\{(\omega_{i,i}\sigma_{\epsilon}^2+2B_{i,d}^2)(\omega_{j,j}\sigma_{\epsilon}^2+2B_{j,d}^2)\}^{1/2}}=\xi_{i,j}\text{.} \] By the proof of Theorem 1 we also have, for $$d_1\neq d_2$$, $${\mathrm{corr}}(\epsilon_{k,d_1}\eta_{k,i,d_1}, \epsilon_{k,d_2}\eta_{k,j,d_2})=0\text{.}$$ Thus we have $$|{\mathrm{corr}}(M_i,M_j)|\leq \xi<1$$, where $$\xi$$ is defined in Condition 3. 
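The fourth-moment manipulations above rest on the Gaussian identity $${E}(\epsilon^2 uv)={E}(\epsilon^2){E}(uv)+2{E}(\epsilon u){E}(\epsilon v)$$ for jointly Gaussian mean-zero variables (Isserlis' theorem). A Monte Carlo sanity check under an arbitrary illustrative covariance matrix (not taken from the paper):

```python
# Monte Carlo check (illustrative) of the Gaussian fourth-moment identity
#   E(e^2 u v) = E(e^2) E(u v) + 2 E(e u) E(e v),
# which underlies the covariance computations for eps_{k,d} eta_{k,i,d}.
# The covariance matrix Sigma below is an arbitrary illustrative choice.
import numpy as np

Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])   # cov of (e, u, v), all mean zero

rng = np.random.default_rng(0)
e, u, v = rng.multivariate_normal(np.zeros(3), Sigma, size=1_000_000).T

lhs = np.mean(e ** 2 * u * v)                           # E(e^2 u v)
rhs = Sigma[0, 0] * Sigma[1, 2] + 2 * Sigma[0, 1] * Sigma[0, 2]
# rhs = 1 * 0.5 + 2 * 0.3 * 0.2 = 0.62
```

With a million draws the Monte Carlo estimate agrees with the closed-form value to within sampling error.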
Hence, by Lemma A1 and Lemma 6.2 in Liu (2013), \begin{align} &\frac{\sum_{i,j\in \mathcal{H}_{02}}\{{\text{pr}}(|M_i|\geq t, |M_j|\geq t)-{\text{pr}}(|M_i|\geq t)\,{\text{pr}}(|M_j|\geq t)\}}{p_0^2G^2(t)}\nonumber\\ &\quad\leq C\frac{p^{1+\tau}t^{-2}\exp\{-t^2/(1+\xi)\}}{p^2G(t)}\leq \frac{C}{p^{1-\tau}\{G(t)\}^{2\xi/(1+\xi)}}\text{.}\label{subset2} \end{align} (A7) It remains to consider the subset $${\mathcal{H}}_{03}$$, in which $$M_i$$ and $$M_j$$ are weakly correlated with each other. It is easy to check that $$\max_{i,j\in {\mathcal{H}}_{03}}{\text{pr}}(|M_i|\geq t, |M_j|\geq t)=\big[1+O\{(\log p)^{-1-\gamma}\}\big]G^2(t)\text{.}$$ Thus we have \begin{equation} \label{subset3} \frac{\sum_{i,j\in \mathcal{H}_{03}}\{{\text{pr}}(|M_i|\geq t, |M_j|\geq t)-{\text{pr}}(|M_i|\geq t)\,{\text{pr}}(|M_j|\geq t)\}}{p_0^2G^2(t)}=O\{(\log p)^{-1-\gamma}\}\text{.} \end{equation} (A8) Equation (A5) follows by combining (A6), (A7) and (A8). Proof of Theorem 4 Under the conditions of Theorem 4, we have \[ \sum_{i \in {\mathcal{H}}}I\{|Q_i|\geq (2\log p)^{1/2}\}\geq \{{1}/({\pi^{1/2}\alpha})+\delta\}(\log p)^{1/2} \] with probability tending to 1. Hence, with probability approaching 1, \[ \frac{p}{\sum_{i \in {\mathcal{H}}}I\{|Q_i|\geq (2\log p)^{1/2}\}}\leq p\{{1}/({\pi^{1/2}\alpha})+\delta\}^{-1}(\log p)^{-1/2}\text{.} \] Let $$t_{p}=(2\log p-2\log\log p)^{1/2}$$. Because $$1-\Phi(t_p)\sim \exp(-t_{p}^2/2)/\{(2\pi)^{1/2}t_{p}\}$$, we have $${\text{pr}}(1\leq \hat{t}\leq t_p)\rightarrow 1$$ by the definition of $$\hat{t}$$ in the multiple testing procedure of § 3.1; that is, $${\text{pr}}(\hat{t} \text{ exists in }[0,t_{p}])\rightarrow 1$$. Theorem 4 then follows from the proof of Theorem 3.

References

Berman, S. M. (1962). A law of large numbers for the maximum in a stationary Gaussian sequence. Ann. Math. Statist. 33, 93–7.

Cai, T., Liu, W. & Xia, Y. (2013).
Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Am. Statist. Assoc. 108, 265–77.

Cai, T. T. & Guo, Z. (2017). Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Statist. 45, 615–46.

Chang, J., Zhou, W., Zhou, W.-X. & Wang, L. (2017). Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics 73, 31–41.

Chao, A., Lin, C. Y., Lee, Y. S., Tsai, C. L., Wei, P. C., Hsueh, S., Wu, T. I., Tsai, C. N., Wang, C. J., Chao, A. S. et al. (2012). Regulation of ovarian cancer progression by microRNA-187 through targeting Disabled homolog-2. Oncogene 31, 764–75.

Endo, H., Muramatsu, T., Furuta, M., Uzawa, N., Pimkhaokham, A., Amagasa, T., Inazawa, J. & Kozaki, K. (2013). Potential of tumor-suppressive miR-596 targeting LGALS3BP as a therapeutic agent in oral cancer. Carcinogenesis 34, 560–9.

Gee, H. E., Buffa, F. M., Camps, C., Ramachandran, A., Leek, R., Taylor, M., Patil, M., Sheldon, H., Betts, G., Homer, J. et al. (2011). The small-nucleolar RNAs commonly used for microRNA normalisation correlate with tumour pathology and prognosis. Br. J. Cancer 104, 1168–77.

Gopalan, V., Pillai, S., Ebrahimi, F., Salajegheh, A., Lam, T. C., Le, T. K., Langsford, N., Ho, Y. H., Smith, R. A. & Lam, A. K. (2014). Regulation of microRNA-1288 in colorectal cancer: Altered expression and its clinicopathological significance. Molec. Carcinogen. 53, Suppl. 1, E36–44.

Grundberg, E., Small, K. S., Hedman, A. K., Nica, A. C., Buil, A., Keildson, S., Bell, J. T., Yang, T.-P., Meduri, E., Barrett, A. et al. (2012).
Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nature Genet. 44, 1084–9.

Honeywell, D., Cabrita, M., Zhao, H., Dimitroulakos, J. & Addison, C. (2013). miR-105 inhibits prostate tumour growth by suppressing CDK6 levels. PLoS ONE 8, e70515.

Huang, S. & Chen, L. (2014). miR-888 regulates side population properties and cancer metastasis in breast cancer cells. Biochem. Biophys. Res. Commun. 450, 1234–40.

Imam, J., Buddavarapu, K., Lee-Chang, J., Ganapathy, S., Camosy, C., Chen, Y. & Rao, M. (2010). MicroRNA-185 suppresses tumor growth and progression by targeting the Six1 oncogene in human cancers. Oncogene 29, 4971–9.

Ito, T., Sato, F., Kan, T., Cheng, Y., David, S., Agarwal, R., Paun, B. C., Jin, Z., Olaru, A. V., Hamilton, J. P. et al. (2011). Polo-like kinase 1 regulates cell proliferation and is targeted by miR-593* in esophageal cancer. Int. J. Cancer 129, 2134–46.

Javanmard, A. & Montanari, A. (2013). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. IEEE Trans. Info. Theory 60, 6522–54.

Javanmard, A. & Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15, 2869–909.

Jeong, J., Kang, H., Kim, T., Kim, G., Heo, J., Kwon, A., Kim, S., Jung, S. & An, H. (2017). MicroRNA-136 inhibits cancer stem cell activity and enhances the anti-tumor effect of paclitaxel against chemoresistant ovarian cancer cells by targeting Notch3. Cancer Lett. 386, 168–78.

Karaayvaz, M., Zhai, H. & Ju, J. (2013). miR-129 promotes apoptosis and enhances chemosensitivity to 5-fluorouracil in colorectal cancer. Cell Death Dis.
4, e659.

Lan, G., L., Y., Xie, X., Peng, L. & Wang, Y. (2015). MicroRNA-490-5p is a novel tumor suppressor targeting c-FOS in human bladder cancer. Arch. Med. Sci. 11, 561–9.

Lee, E. J., Mircean, C., Shmulevich, I., Wang, H., Liu, J., Niemistö, A., Kavanagh, J. J., Lee, J. H. & Zhang, W. (2005). Insulin-like growth factor binding protein 2 promotes ovarian cancer cell invasion. Molec. Cancer 4, 7.

Lee, K., Kunkeaw, N., Jeon, S. H., Lee, I., Johnson, B. H., Kang, G. Y., Bang, J. Y., Park, H. S., Leelayuwat, C. & Lee, Y. S. (2011). Precursor miR-886, a novel noncoding RNA repressed in cancer, associates with PKR and modulates its activity. RNA 17, 1076–89.

Li, D., Zhao, Y., Liu, C., Chen, X., Qi, Y., Jiang, Y., Zou, C., Zhang, X., Liu, S., Wang, X. et al. (2011). Analysis of MiR-195 and MiR-497 expression, regulation and role in breast cancer. Clin. Cancer Res. 17, 1722–30.

Li, J., Lu, Y., Akbani, R., Ju, Z., Roebuck, P. L., Liu, W., Yang, J.-Y., Broom, B. M., Verhaak, R. G. W., Kane, D. W. et al. (2013). TCPA: A resource for cancer functional proteomics data. Nature Meth. 10, 1046–7.

Lin, Y., Liu, A. Y., Fan, C., Zheng, H., Li, Y., Zhang, C., Wu, S., Yu, D., Huang, Z., Liu, F. et al. (2015). MicroRNA-33b inhibits breast cancer metastasis by targeting HMGA2, SALL4 and Twist1. Sci. Rep. 5, 9995.

Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. Ann. Statist. 41, 2948–78.

Lounici, K., Pontil, M., van de Geer, S. & Tsybakov, A. B. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39, 2164–204.

Murphy, M. E. (2013).
The HSP70 family and cancer. Carcinogenesis 34, 1181–8.

Pils, D., Pinter, A., Reibenwein, J., Alfanz, A., Horak, P., Schmid, B. C., Hefler, L., Horvat, R., Reinthaller, A., Zeillinger, R. et al. (2007). In ovarian cancer the prognostic influence of HER2/neu is not dependent on the CXCR4/SDF-1 signalling pathway. Br. J. Cancer 96, 485–91.

Ruffieux, H., Davison, A. C., Hager, J. & Irincheeva, I. (2017). Efficient inference for genetic association studies with multiple outcomes. Biostatistics 18, 618–36.

Sandhu, R., Rein, J., D'Arcy, M., Herschkowitz, J. I., Hoadley, K. A. & Troester, M. A. (2014). Overexpression of miR-146a in basal-like breast cancer cells confers enhanced tumorigenic potential in association with altered p53 status. Carcinogenesis 35, 2567–75.

Schifano, E. D., Li, L., Christiani, D. C. & Lin, X. (2013). Genome-wide association analysis for multiple continuous secondary phenotypes. Am. J. Hum. Genet. 92, 744–59.

Shi, S., Lu, Y., Qin, Y., Li, W., Cheng, H., Xu, Y., Xu, J., Long, J., Liu, L., Liu, C. et al. (2014). miR-1247 is correlated with prognosis of pancreatic cancer and inhibits cell proliferation by targeting neuropilins. Curr. Molec. Med. 14, 316–27.

Shi, Y., Qiu, M., Wu, Y. & Hai, L. (2015). miR-548-3p functions as an anti-oncogenic regulator in breast cancer. Biomed. Pharmacother. 75, 111–6.

Suo, C., Toulopoulou, T., Bramon, E., Walshe, M., Picchioni, M., Murray, R. & Ott, J. (2013). Analysis of multiple phenotypes in genome-wide genetic mapping studies. BMC Bioinformatics 14, 151.

van de Geer, S., Bühlmann, P., Ritov, Y. & Dezeure, R. (2014).
On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42, 1166–202.

van Kouwenhove, M., Kedde, M. & Agami, R. (2011). MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nature Rev. Cancer 11, 644–56.

Xia, Y., Cai, T. & Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika 102, 247–66.

Xia, Y., Cai, T. & Cai, T. T. (2018). Two-sample tests for high-dimensional linear regression with an application to detecting interactions. Statist. Sinica 28, 63–92.

Yang, J. J., Li, J., Williams, L. K. & Buu, A. (2016). An efficient genome-wide association test for multivariate phenotypes based on the Fisher combination function. BMC Bioinformatics 17, 19.

Yuan, M. & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Statist. Soc. B 68, 49–67.

Zaïtsev, A. Y. (1987). On the Gaussian approximation of convolutions under multidimensional analogues of S. N. Bernstein's inequality conditions. Prob. Theory Rel. Fields 74, 535–66.

Zhang, C.-H. & Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Statist. Soc. B 76, 217–42.

Zhou, J. J., Cho, M. H., Lange, C., Lutz, S., Silverman, E. K. & Laird, N. M. (2015). Integrating multiple correlated phenotypes for genetic association analysis by maximizing heritability. Hum. Hered. 79, 93–104.

Zhou, W., Fong, M. Y., Min, Y., Somlo, G., Liu, L., Palomares, M. R., Yu, Y., Chow, A., O'Connor, S. T. F., Chin, A. R. et al. (2014).
Cancer-secreted miR-105 destroys vascular endothelial barriers to promote metastasis. Cancer Cell 25, 501–15.

Zhu, Y. & Bradic, J. (2017). Significance testing in non-sparse high-dimensional linear models. arXiv:1610.02122v3.

Zsiros, E., Duttagupta, P., Dangaj, D., Li, H., Frank, R., Garrabrant, T., Hagemann, I. S., Levine, B. L., June, C. H., Zhang, L. et al. (2015). The ovarian cancer chemokine landscape is conducive to homing of vaccine-primed and CD3/CD28-costimulated T cells prepared for adoptive therapy. Clin. Cancer Res. 21, 2840–50.

© 2018 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
