On Bayes factors for the linear model

Summary. We show that the Bayes factor for testing whether a subset of coefficients are zero in the normal linear regression model gives the uniformly most powerful test amongst the class of invariant tests discussed in Lehmann & Romano (2005) if the prior distributions for the regression coefficients are in a specific class of distributions. The priors in this class can have any elliptical distribution, with a specific scale matrix, for the subset of coefficients that are being tested. We also show under mild conditions that the Bayes factor gives the uniformly most powerful invariant test only if the prior for the coefficients being tested is an elliptical distribution with this scale matrix. The implications are discussed.

1. Introduction

This paper contributes to the understanding of Bayesian procedures by investigating some frequentist properties of the Bayes factor when regarded as a statistic for testing whether a subset of coefficients are zero in the normal linear regression model   \begin{equation} y = X_1\beta_1 + X_2\beta_2 + \phi^{-1/2}\,\epsilon, \end{equation} (1) where $$y$$ is an $$n \times 1$$ observation vector, $$\phi^{-1}$$ is the variance, $$X_1$$ and $$X_2$$ are $$n \times p$$ and $$n \times q$$ design matrices, respectively, with $$(X_1\;X_2)$$ of full column rank and $$p+q<n$$, $$\beta_1$$ and $$\beta_2$$ are $$p \times 1$$ and $$q \times 1$$ vectors of regression parameters, and $$\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$$ with $$\epsilon_i$$ being independent standard normal random variables. Specifically, the testing problem is of the type   $$ H_0{:}\left\|\beta_2\right\|=0\quad \mbox{versus} \quad H_1{:}\left\|\beta_2\right\|>0 $$ where $$\left\|\beta_2\right\|^2=\sum_{j=1}^q\beta_{2j}^2$$. 
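As a quick sanity check, model (1) can be simulated directly; the dimensions and coefficient values below are illustrative assumptions, not taken from the paper.

```python
# Simulate y = X1 b1 + X2 b2 + phi^{-1/2} eps from model (1) and check that
# the noise term has variance phi^{-1}; all settings here are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, p, q, phi = 100_000, 2, 2, 4.0
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, q))
beta1, beta2 = np.array([1.0, 2.0]), np.array([0.5, -1.0])
y = X1 @ beta1 + X2 @ beta2 + phi ** -0.5 * rng.standard_normal(n)

resid = y - X1 @ beta1 - X2 @ beta2        # the realised noise phi^{-1/2} eps
assert abs(resid.var() - 1 / phi) < 0.01   # sample variance close to 1/phi
```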
A Bayesian test rejects $$H_0$$ if the Bayes factor   \begin{align*} B=\frac{\int\int\int f(\,y \mid \beta_1,\beta_2,\phi)\,\pi_1(\beta_1,\beta_2,\phi)\,{\rm d}\beta_1{\rm d}\beta_2{\rm d}\phi}{\int\int f(\,y \mid \beta_1,\beta_2=0,\phi)\,\pi_0(\beta_1,\phi) \,{\rm d}\beta_1{\rm d}\phi} > \lambda, \end{align*} where $$f(\,y \mid \beta_1,\beta_2=0,\phi)$$ and $$f(\,y \mid \beta_1,\beta_2,\phi)$$ are the density functions for $$y$$ under $$H_0$$ and $$H_1$$, respectively, $$\pi_0(\beta_1,\phi)$$ and $$\pi_1(\beta_1,\beta_2,\phi)$$ are the prior distributions under $$H_0$$ and $$H_1$$, and $$\lambda>0$$ is specified by the user. Jeffreys (1961) and Kass & Raftery (1995) provide scales of evidence for the selection of $$\lambda$$. The classical $$F$$-test of $$H_0$$ versus $$H_1$$ rejects $$H_0$$ if   \begin{equation} F= \frac{y'(H-H_1)y/q}{y'(I-H)y/\left\{n-(p+q)\right\}} > \gamma, \end{equation} (2) where $$H_1=X_1(X_1'X_1)^{-1}X_1'$$, $$H=X(X'X)^{-1}X'$$ with $$X = (X_1\;X_2)$$, and $$\gamma$$ is chosen so that $${\rm pr}(F>\gamma \mid H_0)=\alpha$$, the significance level of the test. Lehmann & Romano (2005, p. 280) show that the $$F$$-test is uniformly most powerful amongst tests that are invariant with respect to the three groups of transformations given in Lehmann & Romano (2005, p. 279) and our § 2.1. The invariance properties seem reasonable as, for example, a change in the units of measurement for $$y$$ should not affect the result of the test. Many statisticians would consider it appropriate to restrict attention to Lehmann and Romano’s invariant tests (Lehmann & Romano, 2005; Liang et al., 2008). If attention is restricted to this class of tests, then no test is better than the classical $$F$$-test.

This paper makes two contributions to the literature. First, we specify a class of prior distributions such that if $$(\pi_0, \pi_1)$$ is in this class, then $$B = \psi(F)$$, where $$\psi$$ is a monotone function. 
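The classical test in (2) can be made concrete with a small simulation. The sketch below, under illustrative settings (design, sample size and replication count are all assumptions), checks by Monte Carlo that $${\rm pr}(F>\gamma \mid H_0)\approx\alpha$$.

```python
# Monte Carlo check of the size of the F-test in (2): under H0 (beta_2 = 0)
# the rejection rate should be close to alpha. All settings are illustrative.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(2)
n, p, q, alpha = 40, 2, 3, 0.05
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, q))
X = np.hstack([X1, X2])
H1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # hat matrix of X1
H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix of (X1 X2)
gamma = f_dist.ppf(1 - alpha, q, n - p - q)  # pr(F > gamma | H0) = alpha

rejections = 0
for _ in range(2000):
    y = X1 @ np.array([1.0, -0.5]) + rng.standard_normal(n)  # H0 is true
    F = (y @ (H - H1) @ y / q) / (y @ (np.eye(n) - H) @ y / (n - p - q))
    rejections += F > gamma
assert 0.02 < rejections / 2000 < 0.08   # rejection rate near alpha = 0.05
```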
In particular, the prior for $$\beta_1$$ given $$\phi$$ under $$H_0$$ and $$H_1$$ is a noninformative uniform prior, the prior for $$\beta_2$$ given $$\phi$$ under $$H_1$$ is any elliptical distribution with scale matrix $$\phi^{-1} \left\{X'_2(I-H_1)X_2\right\}^{-1}$$, and the prior for $$\phi$$ is $$\pi(\phi) \propto \phi^{-s}$$ with $$1 \leqslant s < (n-p)/2 + 1$$. Since the $$F$$-test is the uniformly most powerful invariant test, this implies that every prior in the class gives this test. This extends the results of Giron et al. (2006), who show that the intrinsic prior distribution gives a Bayes factor that is a monotone function of $$F$$, and of Bayarri et al. (2012), who show that a mixture of $$g$$-priors gives a Bayes factor that is a monotone function of $$F$$, to all elliptical priors with scale matrix $$\phi^{-1}\left\{X'_2(I-H_1)X_2\right\}^{-1}$$, including such priors as the nonlocal priors of Johnson & Rossell (2010). The second contribution of the paper is to show that under mild conditions $$B$$ is a monotone function of $$F$$, and therefore gives the uniformly most powerful invariant test, only if $$\beta_1$$ given $$\phi$$ has a noninformative uniform prior and $$\beta_2$$ given $$\phi$$ has an elliptical prior distribution with scale matrix $$\phi^{-1}\left\{X'_2(I-H_1)X_2\right\}^{-1}$$.

2. Hypothesis testing in normal regression models

2.1. Model transformation and invariance properties

Let $$P$$ be an $$n \times n$$ orthogonal matrix whose first $$p$$ rows span the subspace spanned by the columns of $$X_1$$, and whose first $$p+q$$ rows span the subspace spanned by the columns of $$X_1$$ and $$X_2$$. Also, let $$P_1$$ denote the submatrix of $$P$$ containing its first $$p$$ rows, let $$P_2$$ denote the submatrix containing its next $$q$$ rows, and let $$P_3$$ denote the submatrix containing its remaining $$n-p-q$$ rows. Following Lehmann & Romano (2005, p. 278), let $$T=Py$$ with $$T_j=P_jy$$. 
Then $$T=(T'_1,T'_2,T'_3)' \sim N(\eta, \phi^{-1} I)$$ where $$\eta = (\eta'_1, \eta'_2, 0')'$$ with $$\eta_1 = P_1(X_1 \beta_1 + X_2 \beta_2)$$ and $$\eta_2 = P_2 X_2 \beta_2$$. The test of $$H_0{:}\left\|\beta_2\right\|=0$$ versus $$H_1{:}\left\|\beta_2\right\|>0$$ is equivalent to that of $$H_0{:}\left\|\eta_2\right\|=0$$ versus $$H_1{:}\left\|\eta_2\right\|>0$$. Lehmann & Romano (2005, p. 279) show that this testing problem remains invariant under each of the following groups of transformations: (i) $$\widetilde{T}_{1j} = T_{1j} + c_j$$ for $$j = 1, \ldots, p$$; (ii) all orthogonal transformations of $$T_2$$; and (iii) $$\widetilde{T} = cT, c \ne 0$$. They also show that the test which rejects $$H_0$$ for large values of   $$F=\frac{T'_2 T_2 /q}{T'_3 T_3/\left\{n-(p+q)\right\}} $$ is uniformly most powerful amongst all tests that are invariant under the groups of transformations in (i)–(iii). This $$F$$-statistic is equivalent to that in (2).

2.2. Specification of the class of prior distributions

The Bayes factor in the transformed $$T/\eta$$-space is   \begin{equation} B=\frac{\int\int\int f(T \mid \eta_1,\eta_2,\phi)\,\pi_1(\eta_1,\eta_2,\phi)\,{\rm d}\eta_1{\rm d}\eta_2{\rm d}\phi}{\int\int f(T \mid \eta_1,\eta_2=0,\phi)\,\pi_0(\eta_1,\phi) \,{\rm d}\eta_1{\rm d}\phi} \end{equation} (3) where $$\pi_0(\eta_1,\phi)$$ and $$\pi_1(\eta_1,\eta_2,\phi)$$ are the prior distributions under $$H_0$$ and $$H_1$$. We assume that $$\eta_1$$ and $$\eta_2$$ given $$\phi$$ are independent, so $$\pi_1(\eta_1,\eta_2 \mid \phi)=\pi(\eta_1 \mid \phi)\pi(\eta_2 \mid \phi)$$. This is a reasonable condition to impose, because $$f(T_1,T_2 \mid \eta_1,\eta_2,\phi) = f(T_1 \mid \eta_1,\phi)f(T_2 \mid \eta_2,\phi)$$. Any dependence between $$\eta_1$$ and $$\eta_2$$ given $$\phi$$ cannot be learnt from the data and should not be imposed by the prior. 
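The construction of $$P$$ in § 2.1 and the equivalence of the two forms of the $$F$$-statistic can be checked numerically. In the sketch below, $$P$$ is obtained from a complete QR decomposition of $$(X_1\;X_2)$$, an assumed but standard way to build such a matrix; the simulated design is illustrative.

```python
# Build P from a complete QR decomposition of X = (X1 X2): the first p columns
# of Q span col(X1) and the first p+q columns span col(X1, X2), so P = Q'
# meets the requirements of Sec. 2.1. Then check that the T-space F statistic
# equals the classical form in (2).
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 3, 2
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, q))
X = np.hstack([X1, X2])
y = X @ rng.standard_normal(p + q) + rng.standard_normal(n)

Q, _ = np.linalg.qr(X, mode="complete")
P = Q.T
T = P @ y
T2, T3 = T[p:p + q], T[p + q:]
F_T = (T2 @ T2 / q) / (T3 @ T3 / (n - p - q))

H1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
H = X @ np.linalg.solve(X.T @ X, X.T)
F = (y @ (H - H1) @ y / q) / (y @ (np.eye(n) - H) @ y / (n - p - q))
assert np.isclose(F_T, F)   # the two expressions for F agree
```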
We also assume for the remainder of the paper that $$\pi(\phi) \propto \phi^{-s}$$ with $$1 \leqslant s < (n-p)/2 + 1$$, $$\pi(\eta_1 \mid \phi) = \phi^{p/2} \widetilde{r}_1(\phi^{1/2}\eta_1)$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}\widetilde{r}_2(\phi^{1/2}\eta_2)$$, where $$\widetilde{r}_1(\cdot) \geqslant 0$$ and $$\widetilde{r}_2(\cdot) \geqslant 0$$. These conditions ensure that the test is invariant under scale changes and are standard assumptions in the literature (Bayarri et al., 2012; Liang et al., 2008; Casella & Moreno, 2006). Definition 1. Let $$\mathcal{C}$$ represent the class of prior distributions $$(\pi_0, \pi_1)$$ that satisfy $$\pi_0(\eta_1,\phi)=\pi(\eta_1 \mid \phi)\pi(\phi)$$ and $$\pi_1(\eta_1,\eta_2,\phi)=\pi(\eta_1 \mid \phi)\pi(\eta_2 \mid \phi)\pi(\phi)$$ with $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. By Definition 1, $$r_2(\phi\eta_2'\eta_2) = \widetilde{r}_2(\phi^{1/2}\eta_2)$$. In the original $$\beta$$-space, the prior distributions in $$\mathcal{C}$$ are $$\pi_0(\beta_1,\phi)=\pi(\beta_1 \mid \phi)\pi(\phi)$$ and $$\pi_1(\beta_1,\beta_2,\phi)=\pi(\beta_1 \mid \phi)\pi(\beta_2 \mid \phi)\pi(\phi)$$ where $$\pi(\beta_1 \mid \phi) \propto 1$$ and   \begin{equation} \pi(\beta_2 \mid \phi) = \phi^{q/2} r_2(\beta_2'\Sigma^{-1}\beta_2) \end{equation} (4) with $$\Sigma = \phi^{-1} \left\{X'_2(I-H_1)X_2\right\}^{-1}$$, because we can write $$\beta_1 = A_{11}\eta_1 + A_{12}\eta_2$$ and $$\beta_2 = A_{22}\eta_2$$, where $$A_{11} = (P_1 X_1)^{-1}$$, $$A_{12} = -(P_1 X_1)^{-1}(P_1 X_2)(P_2 X_2)^{-1}$$ and $$A_{22} = (P_2 X_2)^{-1}$$. Therefore the noninformative uniform prior on $$\eta_1$$ implies such a prior on $$\beta_1$$, and $$\pi(\eta_2 \mid \phi) = \phi^{q/2} r_2(\phi \eta_2'\eta_2)$$ implies $$\pi(\beta_2 \mid \phi) = \phi^{q/2} r_2(\beta_2'\Sigma^{-1}\beta_2)$$. We also show in the Supplementary Material that $$\beta_1$$ and $$\beta_2$$ are independent conditional on $$\phi$$. 
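The scale-matrix claim behind (4) can also be verified numerically: since $$A_{22} = (P_2X_2)^{-1}$$ and $$P_2'P_2 = H - H_1$$, it follows that $$A_{22}A_{22}' = \left\{X_2'(I-H_1)X_2\right\}^{-1}$$, so a spherical prior on $$\eta_2$$ maps to an elliptical prior on $$\beta_2$$ with scale matrix $$\phi^{-1}\left\{X_2'(I-H_1)X_2\right\}^{-1}$$. A sketch of this check with an arbitrary simulated design:

```python
# Verify A22 A22' = {X2'(I - H1) X2}^{-1}, the identity behind the scale
# matrix Sigma in (4); the design matrices here are arbitrary simulated ones.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 25, 2, 3
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, q))
Q, _ = np.linalg.qr(np.hstack([X1, X2]), mode="complete")
P2 = Q[:, p:p + q].T                      # rows p+1, ..., p+q of P
A22 = np.linalg.inv(P2 @ X2)
H1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
lhs = A22 @ A22.T
rhs = np.linalg.inv(X2.T @ (np.eye(n) - H1) @ X2)
assert np.allclose(lhs, rhs)
```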
Prior distributions that are in $$\mathcal{C}$$ include the power-conditional-expected prior (Fouskakis & Ntzoufras, 2016), the power-expected-posterior prior (Fouskakis et al., 2015), the intrinsic prior (Womack et al., 2014; Giron et al., 2006; Casella & Moreno, 2006), the nonlocal method-of-moment and inverse method-of-moment priors (Johnson & Rossell, 2010), the hyper-$$g$$ prior (Liang et al., 2008) and the $$g$$-prior (Zellner, 1986).

2.3. Sufficient conditions for a monotone relationship between $$B$$ and $$F$$

Theorem 1 gives sufficient conditions for a monotone relationship between $$B$$ and $$F$$, while Theorem 2 and its corollaries in the next section give necessary conditions. Theorem 1. If $$(\pi_0, \pi_1) \in \mathcal{C}$$, then $$B = \psi(F)$$, where $$\psi$$ is a monotone increasing function. The following lemma, used in the proof of Theorem 1, is proved in the Supplementary Material. Lemma 1. Let $$u$$ and $$\xi$$ represent $$q \times 1$$ vectors, $$h(\xi'\xi)$$ represent a nonnegative spherically symmetric function of $$\xi$$, and $$a_j = \int{\xi_1^j h(\xi'\xi)\,{\rm d}\xi}$$ where $$\xi_1$$ is the first element of $$\xi$$. Then $$a_j \geqslant 0$$ for $$j = 0, 1, \ldots ,$$ and $$\int{(\xi'u)^j h(\xi'\xi)\,{\rm d}\xi} = a_j (u'u)^{\,j/2}$$. Proof of Theorem 1. 
Integrating the numerator and denominator of (3) with respect to $$\eta_1$$, setting $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi \eta_2'\eta_2)$$ and $$\pi(\phi) \propto \phi^{-s}$$, and rearranging terms gives   $$ B=\frac{\int{\phi^{(n-p)/2-s}\exp{\left\{ -\tfrac{1}{2} \phi(T_2'T_2 + T_3'T_3) \right\}}}\left\{ \int{\exp(\phi\eta_2'T_2)}\exp{\left(-{1\over2}\phi\eta_2'\eta_2 \right) \phi^{q/2}r_2(\phi\eta_2'\eta_2){\rm d}\eta_2} \right\}{\rm d}\phi}{\int{\phi^{(n-p)/2-s}\exp{\left\{ -{1\over2} \phi (T_2'T_2 + T_3'T_3) \right\}{\rm d}\phi}}}\text{.}$$ Let $$\delta_2 = (\delta_{21}, \ldots, \delta_{2q})' = \phi^{1/2}\eta_2$$ and $$h(\delta_2'\delta_2)=\exp \left(- \delta_2'\delta_2/2 \right)r_2(\delta_2'\delta_2)$$. Then the numerator is   \begin{equation} \int{\phi^{(n-p)/2-s}\exp{\left\{ -\tfrac{1}{2} \phi(T_2'T_2 + T_3'T_3) \right\}}}\left\{ \int{\exp(\phi^{1/2}\delta_2'T_2)h(\delta_2'\delta_2){\rm d}\delta_2} \right\}{\rm d}\phi\text{.} \end{equation} (5) Expanding the exponential term in the integral on the right-hand side of (5) using a Taylor series expansion and then applying Lemma 1 gives   \begin{equation} \int{\exp(\phi^{1/2}\delta_2'T_2)h(\delta_2'\delta_2)\,{\rm d}\delta_2} = \sum_{j=0}^{\infty}\frac{\phi^{j/2}}{j!}\int{(\delta_2'T_2)^{\,j}h(\delta_2'\delta_2)\,{\rm d}\delta_2}= \sum_{j=0}^{\infty}a_j\phi^{j/2}(T_2'T_2)^{\,j/2} \end{equation} (6) where $$a_j = j!^{-1}\int{\delta_{21}^j h(\delta_2'\delta_2)\,{\rm d}\delta_2} \geqslant 0$$ for $$j = 0, 1, \ldots$$, by Lemma 1; the odd-order coefficients vanish because $$h$$ is spherically symmetric. 
Substituting (6) into (5) and integrating with respect to $$\phi$$ gives the numerator as   $$\sum_{j=0}^{\infty}a_j(T_2'T_2)^{\,j/2} \left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\tfrac{n-p+j}{2}-s+1\right)} \Gamma \left(\tfrac{n-p+j}{2}-s+1 \right)\!\text{.}$$ The denominator of $$B$$ is   $$\int{\phi^{(n-p)/2-s}\exp\left\{ -\tfrac{ \phi}{2} (T_2'T_2 + T_3'T_3) \right\}{\rm d}\phi} = \left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\tfrac{n-p}{2}-s+1\right)} \Gamma \left(\tfrac{n-p}{2}-s+1 \right)\!,$$ so the Bayes factor is   \begin{align*} B&=\frac{\sum_{j=0}^{\infty}a_j(T_2'T_2)^{\,j/2} \left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\frac{n-p+j}{2}-s+1\right)} \Gamma \left(\frac{n-p+j}{2}-s+1 \right)}{\left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\frac{n-p}{2}-s+1\right)} \Gamma \left(\frac{n-p}{2}-s+1 \right)}\\ &=\sum_{j=0}^{\infty}a_j^* \left( \frac{T_2'T_2}{T_2'T_2 + T_3'T_3} \right)^{\,j/2} \end{align*} where   $$a_j^*=\frac{a_j 2^{j/2}\Gamma \left(\frac{n-p+j}{2}-s+1 \right)}{\Gamma \left(\frac{n-p}{2}-s+1 \right)}\text{.}$$ It is straightforward to show that $$T_2'T_2/(T_2'T_2 + T_3'T_3)$$ is a monotone increasing function of $$T_2'T_2/(T_3'T_3)$$ and therefore that $$B$$ is a monotone increasing function of $$F$$. □

2.4. Necessary conditions for a monotone relationship

Theorem 2 and its corollaries give necessary conditions on $$(\pi_0, \pi_1)$$ in both the $$\eta$$- and the $$\beta$$-space for the Bayes factor to be a monotone function of $$F$$. Recall from § 2.2 that we assume $$\pi(\phi) \propto \phi^{-s}$$ and $$\pi(\eta_1,\eta_2 \mid \phi) = \phi^{(p+q)/2}\widetilde{r}_1(\phi^{1/2}\eta_1)\widetilde{r}_2(\phi^{1/2}\eta_2)$$. These conditions ensure that the test is invariant under scale changes. Theorem 2. 
The Bayes factor $$B$$ is invariant under location changes, $$\widetilde{T}_{1j} = T_{1j} + c_j$$ for $$j = 1, \ldots, p$$, and orthogonal transformations of $$T_2$$ only if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. Proof of Theorem 2. See the Supplementary Material. □ Corollary 1. The Bayes factor $$B$$ is a monotone function of $$F$$ only if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. Proof of Corollary 1. The $$F$$-statistic is invariant, so Theorem 2 shows that $$B$$ is a function of $$F$$ only if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. We know from Theorem 1 that $$B$$ is a monotone function of $$F$$ if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$, so the corollary follows. □ Corollary 2 states the equivalent result in the original $$\beta$$-space. Corollary 2. The Bayes factor $$B$$ is a monotone function of $$F$$ only if $$\pi(\beta_1 \mid \phi) \propto 1$$ and $$\pi(\beta_2 \mid \phi) = \phi^{q/2}r_2(\beta_2'\Sigma^{-1}\beta_2)$$ where $$\Sigma = \phi^{-1}\left\{X'_2(I-H_1)X_2\right\}^{-1}$$. Proof of Corollary 2. This follows from the specification of $$\mathcal{C}$$ in § 2.2 in the $$\eta$$- and $$\beta$$-spaces. □

3. Discussion

The monotone relationship between $$B$$ and $$F$$ has important implications. First, if we set $$\gamma = \psi^{-1}(\lambda)$$, then for every set of priors $$(\pi_0, \pi_1) \in \mathcal{C}$$, and in particular for any elliptical prior in (4), the Bayes test $$B > \lambda$$ is equivalent to the classical test $$F > \gamma$$. More specifically, the two tests will give the same results for every dataset and the error rate functions will be identical. 
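This equivalence can be illustrated numerically. The sketch below computes $$B$$ in (3) by direct integration for $$q=1$$ with a Gaussian generator $$r_2$$, and checks that $$B$$ depends on $$(T_2, T_3)$$ only through $$F$$ and increases with $$F$$; the values of $$n$$, $$p$$, $$s$$ and the prior scale $$g$$ are illustrative assumptions.

```python
# Direct-integration check (q = 1, Gaussian generator r_2) that the Bayes
# factor in (3) depends on (T2, T3) only through F and increases with F.
# The settings n, p, s and the prior scale g are illustrative assumptions.
import numpy as np
from scipy.integrate import quad, dblquad

n, p, q, s, g = 20, 2, 1, 1.0, 1.0
r2 = lambda t: (2 * np.pi * g) ** -0.5 * np.exp(-t / (2 * g))

def bayes_factor(T2, T3):
    c = 0.5 * (T2 ** 2 + T3 ** 2)
    A = (n - p) / 2 - s
    num = dblquad(lambda e, f: f ** (A + q / 2)
                  * np.exp(-f * c + f * e * T2 - 0.5 * f * e ** 2) * r2(f * e ** 2),
                  0, 60, -8, 8)[0]          # integrate over phi = f, eta_2 = e
    den = quad(lambda f: f ** A * np.exp(-f * c), 0, 60)[0]
    return num / den

# B is invariant under a common rescaling of (T2, T3) ...
assert np.isclose(bayes_factor(1.0, 2.0), bayes_factor(2.0, 4.0), rtol=1e-2)
# ... and monotone increasing in F, which grows with T2 for fixed T3
assert bayes_factor(0.5, 2.0) < bayes_factor(1.0, 2.0) < bayes_factor(2.0, 2.0)
```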
Further, if one Bayesian chooses a prior $$\pi^{(1)} = (\pi_0^{(1)}, \pi_1^{(1)}) \in \mathcal{C}$$ and critical value $$\lambda_1$$, and a second Bayesian chooses the prior $$\pi^{(2)} = (\pi_0^{(2)}, \pi_1^{(2)}) \in \mathcal{C}$$, then there is a critical value $$\lambda_2$$ for the second test such that the two Bayesian tests give the same results for every dataset and therefore have the same error rate functions. Stated differently, if $$\pi^{(1)}$$ is chosen from $$\mathcal{C}$$, then the Bayesian test involving any other $$\pi^{(2)}\in\mathcal{C}$$ can be replicated by selecting an appropriate $$\lambda$$. That is, two Bayesians with priors $$\pi^{(1)}$$ and $$\pi^{(2)}$$ and critical value $$\lambda$$ behave equivalently to two Bayesians using the same prior $$\pi\in\mathcal{C}$$ and different critical values $$\lambda_1$$ and $$\lambda_2$$. These $$\lambda$$ values can be found using the fact that the Bayes test $$B > \lambda$$ is equivalent to $$F > \psi^{-1}(\lambda)$$. This is important, since it is well known that for a fixed $$\lambda$$ the choice of prior can have a significant impact on the outcome of the test; see Garcia-Donato & Chen (2005). The monotone relationship between $$B$$ and $$F$$ also has implications for the asymptotic properties of a test. For example, Johnson & Rossell (2010) consider the model in (1) with $$p = 0$$ and construct a special prior $$\widetilde{\pi}(\beta_2 \mid \phi)$$ such that $${\rm pr}(B_{\widetilde{\pi}}>\lambda_n \mid H_0)\rightarrow 0$$ for $$\log \lambda_n=-cn^{k/(k+1)}$$ for some $$c>0$$ and arbitrary integer $$k$$. An implication of the monotone relationship between $$B$$ and $$F$$ is that the same result is obtained for the classical test by selecting $$\gamma_n = \psi^{-1}(\lambda_n)$$. Then $${\rm pr}(F>\gamma_n \mid \beta_2) = {\rm pr}(B_{\widetilde{\pi}}>\lambda_n \mid \beta_2)$$ for every $$\beta_2$$, so the two tests have the same asymptotic properties. 
Similarly, for every $$\pi(\beta_2 \mid \phi)$$ that satisfies (4) there exists an $$\omega_n$$ such that $${\rm pr}(B_{\pi}>\omega_n \mid \beta_2) = {\rm pr}(B_{\widetilde{\pi}}>\lambda_n \mid \beta_2)$$, so the Bayesian test using $$\pi(\beta_2 \mid \phi)$$ and $$\omega_n$$ has the same properties as the test using $$\widetilde{\pi}(\beta_2 \mid \phi)$$ and $$\lambda_n$$.

Acknowledgement

The authors are grateful to the editor, associate editor and two referees, whose suggestions have improved the paper substantially.

Supplementary material

Supplementary material available at Biometrika online includes the proof that $$\beta_1$$ and $$\beta_2$$ are independent conditional on $$\phi$$, and the proofs of Lemma 1 and Theorem 2.

References

Bayarri M. J., Berger J. O., Forte A. & Garcia-Donato G. (2012). Criteria for Bayesian model choice with application to variable selection. Ann. Statist. 40, 1550–77.
Casella G. & Moreno E. (2006). Objective Bayesian variable selection. J. Am. Statist. Assoc. 101, 157–67.
Fouskakis D. & Ntzoufras I. (2016). Power-conditional-expected priors: Using $$g$$-priors with random imaginary data for variable selection. J. Comp. Graph. Statist. 25, 647–64.
Fouskakis D., Ntzoufras I. & Draper D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Anal. 10, 75–107.
Garcia-Donato G. & Chen M. H. (2005). Calibrating Bayes factor under prior predictive distributions. Statist. Sinica 15, 359–80.
Giron F. J., Martinez M. L., Moreno E. & Torres F. (2006). Objective testing procedures in linear models: Calibration of the $$p$$-values. Scand. J. Statist. 33, 765–84.
Jeffreys H. (1961). Theory of Probability. New York: Oxford University Press, 3rd ed.
Johnson V. & Rossell D. (2010). 
On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Statist. Soc. B 72, 143–70.
Kass R. E. & Raftery A. E. (1995). Bayes factors. J. Am. Statist. Assoc. 90, 773–95.
Lehmann E. L. & Romano J. P. (2005). Testing Statistical Hypotheses. New York: Springer, 3rd ed.
Liang F., Paulo R., Clyde M. A., Molina G. & Berger J. O. (2008). Mixtures of $$g$$-priors for Bayesian variable selection. J. Am. Statist. Assoc. 103, 410–23.
Womack A. J., Leon-Novelo L. & Casella G. (2014). Inference from intrinsic Bayes’ procedures under model selection and uncertainty. J. Am. Statist. Assoc. 109, 1040–53.
Zellner A. (1986). On assessing prior distributions and Bayesian regression analysis with $$g$$-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, Goel P. K. & Zellner A., eds. Amsterdam: North-Holland, pp. 233–43.

© 2018 Biometrika Trust

Biometrika , Volume Advance Article – May 10, 2018

Oxford University Press. ISSN 0006-3444; eISSN 1464-3510. doi: 10.1093/biomet/asy022.

Abstract

Summary We show that the Bayes factor for testing whether a subset of coefficients are zero in the normal linear regression model gives the uniformly most powerful test amongst the class of invariant tests discussed in Lehmann & Romano (2005) if the prior distributions for the regression coefficients are in a specific class of distributions. The priors in this class can have any elliptical distribution, with a specific scale matrix, for the subset of coefficients that are being tested. We also show under mild conditions that the Bayes factor gives the uniformly most powerful invariant test only if the prior for the coefficients being tested is an elliptical distribution with this scale matrix. The implications are discussed. 1. Introduction This paper contributes to the understanding of Bayesian procedures by investigating some frequentist properties of the Bayes factor when regarded as a statistic for testing whether a subset of coefficients are zero in the normal linear regression model   \begin{equation} y = X_1\beta_1 + X_2\beta_2 + \phi^{-1/2}\,\epsilon, \end{equation} (1) where $$y$$ is an $$n \times 1$$ observation vector, $$\phi^{-1}$$ is the variance, $$X_1$$ and $$X_2$$ are $$n \times p$$ and $$n \times q$$ design matrices, respectively, with $$(X_1\;X_2)$$ of full column rank and $$p+q<n$$, $$\beta_1$$ and $$\beta_2$$ are $$p \times 1$$ and $$q \times 1$$ vectors of regression parameters, and $$\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$$ with $$\epsilon_i$$ being independent standard normal random variables. Specifically, the testing problem is of the type   $$ H_0{:}\left\|\beta_2\right\|=0\quad \mbox{versus} \quad H_1{:}\left\|\beta_2\right\|>0 $$ where $$\left\|\beta_2\right\|=\sum_{j=1}^q\beta_{2j}^2$$. 
A Bayesian test rejects $$H_0$$ if the Bayes factor   \begin{align*} B=\frac{\int\int\int f(\,y \mid \beta_1,\beta_2,\phi)\,\pi_1(\beta_1,\beta_2,\phi)\,{\rm d}\beta_1{\rm d}\beta_2{\rm d}\phi}{\int\int f(\,y \mid \beta_1,\beta_2=0,\phi)\,\pi_0(\beta_1,\phi) \,{\rm d}\beta_1{\rm d}\phi} > \lambda, \end{align*} where $$f(\,y \mid \beta_1,\beta_2=0,\phi)$$ and $$f(\,y \mid \beta_1,\beta_2,\phi)$$ are the density functions for $$y$$ under $$H_0$$ and $$H_1$$, respectively, $$\pi_0(\beta_1,\phi)$$ and $$\pi_1(\beta_1,\beta_2,\phi)$$ are the prior distributions under $$H_0$$ and $$H_1$$, and $$\lambda>0$$ is specified by the user. Jeffreys (1961) and Kass & Raftery (1995) provide scales of evidence for the selection of $$\lambda$$. The classical $$F$$-test of $$H_0$$ versus $$H_1$$ rejects $$H_0$$ if   \begin{equation} F= \frac{y'(H-H_1)y/q}{y'(I-H)y/\left\{n-(p+q)\right\}} > \gamma, \end{equation} (2) where $$H_1=X_1(X_1'X_1)^{-1}X_1'$$, $$H=X(X'X)^{-1}X'$$ with $$X = (X_1\;X_2)$$, and $$\gamma$$ is chosen so that $${\rm pr}(F>\gamma \mid H_0)=\alpha$$, the significance level of the test. Lehmann & Romano (2005, p. 280) show that the $$F$$-test is uniformly most powerful amongst tests that are invariant with respect to the three groups of transformations given in Lehmann & Romano (2005, p. 279) and our § 2.1. The invariance properties seem reasonable as, for example, a change in the units of measurement for $$y$$ should not affect the result of the test. Many statisticians would consider it appropriate to consider only Lehmann and Romano’s invariant tests (Lehmann & Romano, 2005; Liang et al., 2008). If attention is restricted to this class of tests, then no test is better than the classical $$F$$-test. This paper makes two contributions to the literature. First, we specify a class of prior distributions such that if $$(\pi_0, \pi_1)$$ is in this class, then $$B = \psi(F)$$, where $$\psi$$ is a monotone function. 
In particular, the prior for $$\beta_1$$ given $$\phi$$ under $$H_0$$ and $$H_1$$ is a noninformative uniform prior, the prior for $$\beta_2$$ given $$\phi$$ under $$H_1$$ is any elliptical distribution with scale matrix $$\phi^{-1} \left\{X'_2(I-H_1)X_2\right\}^{-1}$$, and the prior for $$\phi$$ is $$\pi(\phi) \propto \phi^{-s}$$ with $$1 \leqslant s < (n-p)/2 + 1$$. Since the $$F$$-test is the uniformly most powerful invariant test, this implies that every prior in the class gives this test. This extends the results of Giron et al. (2006), who show that the intrinsic prior distribution gives a Bayes factor that is a monotone function of $$F$$, and of Bayarri et al. (2012), who show that a mixture of $$g$$-priors gives a Bayes factor that is a monotone function of $$F$$, to all elliptical priors with scale matrix $$\phi^{-1}\left\{X'_2(I-H_1)X_2\right\}^{-1}$$, including such priors as the nonlocal priors of Johnson & Rossell (2010). The second contribution of the paper is to show that under mild conditions $$B$$ is a monotone function of $$F$$, and therefore gives the uniformly most powerful invariant test, only if $$\beta_1$$ given $$\phi$$ has a noninformative uniform prior and $$\beta_2$$ given $$\phi$$ has an elliptical prior distribution with scale matrix $$\phi^{-1}\left\{X'_2(I-H_1)X_2\right\}^{-1}$$. 2. Hypothesis testing in normal regression models 2.1. Model transformation and invariance properties Let $$P$$ be an $$n \times n$$ orthogonal matrix whose first $$p$$ rows span the subspace spanned by the columns of $$X_1$$, and whose first $$p+q$$ rows span the subspace spanned by the columns of $$X_1$$ and $$X_2$$. Also, let $$P_1$$ denote the submatrix of $$P$$ containing its first $$p$$ rows, let $$P_2$$ denote the submatrix containing its next $$q$$ rows, and let $$P_3$$ denote the submatrix containing its remaining $$n-p-q$$ rows. Following Lehmann & Romano (2005, p. 278), let $$T=Py$$ with $$T_j=P_jy$$. 
Then $$T=(T'_1,T'_2,T'_3)' \sim N(\eta, \phi^{-1} I)$$ where $$\eta = (\eta'_1, \eta'_2, 0')'$$ with $$\eta_1 = P_1(X_1 \beta_1 + X_2 \beta_2)$$ and $$\eta_2 = P_2 X_2 \beta_2$$. The test of $$H_0{:}\left\|\beta_2\right\|=0$$ versus $$H_1{:}\left\|\beta_2\right\|>0$$ is equivalent to that of $$H_0{:}\left\|\eta_2\right\|=0$$ versus $$\: H_1{:}\left\|\eta_2\right\|>0$$. Lehmann & Romano (2005, p. 279) show that this testing problem remains invariant under each of the following groups of transformations: (i) $$\widetilde{T}_{1j} = T_{1j} + c_j$$ for $$j = 1, \ldots, p$$; (ii) all orthogonal transformations of $$T_2$$; and (iii) $$\widetilde{T} = cT, c \ne 0$$. They also show that the test which rejects $$H_0$$ for large values of   $$F=\frac{T'_2 T_2 /q}{T'_3 T_3/\left\{n-(p+q)\right\}} $$ is uniformly most powerful amongst all tests that are invariant under the groups of transformations in (i)’(iii). This $$F$$-statistic is equivalent to that in (2). 2.2. Specification of the class of prior distributions The Bayes factor in the transformed $$T/\eta$$-space is   \begin{equation} B=\frac{\int\int\int f(T \mid \eta_1,\eta_2,\phi)\,\pi_1(\eta_1,\eta_2,\phi)\,{\rm d}\eta_1{\rm d}\eta_2{\rm d}\phi}{\int\int f(T \mid \eta_1,\eta_2=0,\phi)\,\pi_0(\eta_1,\phi) \,{\rm d}\eta_1{\rm d}\phi} \end{equation} (3) where $$\pi_0(\eta_1,\phi)$$ and $$\pi_1(\eta_1,\eta_2,\phi)$$ are the prior distributions under $$H_0$$ and $$H_1$$. We assume that $$\eta_1$$ and $$\eta_2$$ given $$\phi$$ are independent, so $$\pi_1(\eta_1,\eta_2 \mid \phi)=\pi(\eta_1 \mid \phi)\pi(\eta_2 \mid \phi)$$. This is a reasonable condition to impose, because $$f(T_1,T_2 \mid \eta_1,\eta_2,\phi) = f(T_1 \mid \eta_1,\phi)f(T_2 \mid \eta_2,\phi)$$. Any dependence between $$\eta_1$$ and $$\eta_2$$ given $$\phi$$ cannot be learnt from the data and should not be imposed by the prior. 
We also assume for the remainder of the paper that $$\pi(\phi) \propto \phi^{-s}$$ with $$1 \leqslant s < (n-p)/2 + 1$$, $$\pi(\eta_1 \mid \phi) = \phi^{p/2} \widetilde{r}_1(\phi^{1/2}\eta_1)$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}\widetilde{r}_2(\phi^{1/2}\eta_2)$$, where $$\widetilde{r}_1(\cdot) \geqslant 0$$ and $$\widetilde{r}_2(\cdot) \geqslant 0$$. These conditions ensure that the test is invariant under scale changes and are standard assumptions in the literature (Bayarri et al., 2012; Liang et al., 2008; Casella & Moreno, 2006). Definition 1. Let $$\mathcal{C}$$ represent the class of prior distributions $$(\pi_0, \pi_1)$$ that satisfy $$\pi_0(\eta_1,\phi)=\pi(\eta_1 \mid \phi)\pi(\phi)$$ and $$\pi_1(\eta_1,\eta_2,\phi)=\pi(\eta_1 \mid \phi)\pi(\eta_2 \mid \phi)\pi(\phi)$$ with $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. By Definition 1, $$r_2(\phi\eta_2'\eta_2) = \widetilde{r}_2(\phi^{1/2}\eta_2)$$. In the original $$\beta$$-space, the prior distributions in $$\mathcal{C}$$ are $$\pi_0(\beta_1,\phi)=\pi(\beta_1 \mid \phi)\pi(\phi)$$ and $$\pi_1(\beta_1,\beta_2,\phi)=\pi(\beta_1 \mid \phi)\pi(\beta_2 \mid \phi)\pi(\phi)$$ where $$\pi(\beta_1 \mid \phi) \propto 1$$ and   \begin{equation} \pi(\beta_2 \mid \phi) = \phi^{q/2} r_2(\beta_2'\Sigma^{-1}\beta_2) \end{equation} (4) with $$\Sigma = \phi^{-1} \left\{X'_2(I-H_1)X_2\right\}^{-1}$$, because we can write $$\beta_1 = A_{11}\eta_1 + A_{12}\eta_2$$ and $$\beta_2 = A_{22}\eta_2$$, where $$A_{11} = (P_1 X_1)^{-1}$$, $$A_{12} = -(P_1 X_1)^{-1}(P_1 X_2)(P_2 X_2)^{-1}$$ and $$A_{22} = (P_2 X_2)^{-1}$$. Therefore the noninformative uniform prior on $$\eta_1$$ implies such a prior on $$\beta_1$$, and $$\pi(\eta_2 \mid \phi) = \phi^{q/2} r_2(\phi \eta_2'\eta_2)$$ implies $$\pi(\beta_2 \mid \phi) = \phi^{q/2} r_2(\beta_2'\Sigma^{-1}\beta_2)$$. We also show in the Supplementary Material that $$\beta_1$$ and $$\beta_2$$ are independent conditional on $$\phi$$. 
Prior distributions that are in $$\mathcal{C}$$ include the power-conditional-expected prior (Fouskakis & Ntzoufras 2016), the power-expected-posterior prior (Fouskakis et al. 2015), the intrinisic prior (Womack et al., 2014; Giron et al., 2006; Casella & Moreno, 2006), the nonlocal method-of-moment and inverse method-of-moment priors (Johnson & Rossell 2010), the hyper-$$g$$ prior (Liang et al. 2008) and the $$g$$-prior (Zellner 1986). 2.3. Sufficient conditions for a monotone relationship between $$B$$ and $$F$$ Theorem 1 gives sufficient conditions for a monotone relationship between $$B$$ and $$F$$, while Theorem 2 and its corollaries in the next section give necessary conditions. Theorem 1. If $$(\pi_0, \pi_1) \in \mathcal{C}$$, then $$B = \psi(F)$$, where $$\psi$$ is a monotone increasing function. The following lemma, used in the proof of Theorem 1, is proved in the Supplementary Material. Lemma 1. Let $$u$$ and $$\xi$$ represent $$q \times 1$$ vectors, $$h(\xi'\xi)$$ represent a nonnegative spherically symmetric function of $$\xi$$, and $$a_j = \int{\xi_1^j h(\xi'\xi)\,{\rm d}\xi}$$ where $$\xi_1$$ is the first element of $$\xi$$. Then $$a_j \geqslant 0$$ for $$j = 0, 1, \ldots ,$$ and $$\int{(\xi'u)^j h(\xi'\xi)\,{\rm d}\xi} = a_j (u'u)^{\,j/2}$$. Proof of Theorem 1. 
Integrating the numerator and denominator of (3) with respect to $$\eta_1$$, setting $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi \eta_2'\eta_2)$$ and $$\pi(\phi) \propto \phi^{-s}$$, and rearranging terms gives   $$ B=\frac{\int{\phi^{(n-p)/2-s}\exp{\left\{ -\tfrac{1}{2} \phi(T_2'T_2 + T_3'T_3) \right\}}}\left\{ \int{\exp(\phi\eta_2'T_2)}\exp{\left(-\tfrac{1}{2}\phi\eta_2'\eta_2 \right) \phi^{q/2}r_2(\phi\eta_2'\eta_2){\rm d}\eta_2} \right\}{\rm d}\phi}{\int{\phi^{(n-p)/2-s}\exp{\left\{ -\tfrac{1}{2} \phi (T_2'T_2 + T_3'T_3) \right\}{\rm d}\phi}}}\text{.}$$ Let $$\delta_2 = (\delta_{21}, \ldots, \delta_{2q})' = \phi^{1/2}\eta_2$$ and $$h(\delta_2'\delta_2)=\exp \left(- \delta_2'\delta_2/2 \right)r_2(\delta_2'\delta_2)$$. Then the numerator is   \begin{equation} \int{\phi^{(n-p)/2-s}\exp{\left\{ -\tfrac{1}{2} \phi(T_2'T_2 + T_3'T_3) \right\}}}\left\{ \int{\exp(\phi^{1/2}\delta_2'T_2)h(\delta_2'\delta_2){\rm d}\delta_2} \right\}{\rm d}\phi\text{.} \end{equation} (5) Expanding the exponential in the inner integral of (5) as a Taylor series and applying Lemma 1 gives   \begin{equation} \int{\exp(\phi^{1/2}\delta_2'T_2)h(\delta_2'\delta_2)\,{\rm d}\delta_2} = \sum_{j=0}^{\infty}\frac{\phi^{j/2}}{j!}\int{(\delta_2'T_2)^{\,j}h(\delta_2'\delta_2)\,{\rm d}\delta_2}= \sum_{j=0}^{\infty}a_j\phi^{j/2}(T_2'T_2)^{\,j/2} \end{equation} (6) where $$a_j = j!^{-1}\int{\delta_{21}^j h(\delta_2'\delta_2)\,{\rm d}\delta_2} \geqslant 0$$ for $$j = 0, 1, \ldots$$; by symmetry $$a_j = 0$$ for odd $$j$$, and $$a_j > 0$$ for even $$j$$.
Substituting (6) into (5) and integrating with respect to $$\phi$$ gives the numerator as   $$\sum_{j=0}^{\infty}a_j(T_2'T_2)^{\,j/2} \left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\tfrac{n-p+j}{2}-s+1\right)} \Gamma \left(\tfrac{n-p+j}{2}-s+1 \right)\!\text{.}$$ The denominator of $$B$$ is   $$\int{\phi^{(n-p)/2-s}\exp\left\{ -\tfrac{ \phi}{2} (T_2'T_2 + T_3'T_3) \right\}{\rm d}\phi} = \left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\tfrac{n-p}{2}-s+1\right)} \Gamma \left(\tfrac{n-p}{2}-s+1 \right)\!,$$ so the Bayes factor is   \begin{align*} B&=\frac{\sum_{j=0}^{\infty}a_j(T_2'T_2)^{\,j/2} \left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\frac{n-p+j}{2}-s+1\right)} \Gamma \left(\frac{n-p+j}{2}-s+1 \right)}{\left\{ {1\over2}(T_2'T_2 + T_3'T_3) \right\}^{-\left(\frac{n-p}{2}-s+1\right)} \Gamma \left(\frac{n-p}{2}-s+1 \right)}\\ &=\sum_{j=0}^{\infty}a_j^* \left( \frac{T_2'T_2}{T_2'T_2 + T_3'T_3} \right)^{\,j/2} \end{align*} where   $$a_j^*=\frac{a_j 2^{j/2}\Gamma \left(\frac{n-p+j}{2}-s+1 \right)}{\Gamma \left(\frac{n-p}{2}-s+1 \right)}\text{.}$$ It is straightforward to show that $$T_2'T_2/(T_2'T_2 + T_3'T_3)$$ is a monotone increasing function of $$T_2'T_2/(T_3'T_3)$$ and therefore that $$B$$ is a monotone increasing function of $$F$$. □ 2.4. Necessary conditions for a monotone relationship Theorem 2 and its corollaries give necessary conditions on $$(\pi_0, \pi_1)$$ in both the $$\eta$$ and the $$\beta$$-space for the Bayes factor to be a monotone function of $$F$$. Recall from § 2.2 that we assume $$\pi(\phi) \propto \phi^{-s}$$ and $$\pi(\eta_1,\eta_2 \mid \phi) = \phi^{(p+q)/2}\widetilde{r}_1(\phi^{1/2}\eta_1)\widetilde{r}_2(\phi^{1/2}\eta_2)$$. These conditions ensure that the test is invariant under scale changes. Theorem 2. 
The Bayes factor $$B$$ is invariant under location changes, $$\widetilde{T}_{1j} = T_{1j} + c_j$$ for $$j = 1, \ldots, p$$, and orthogonal transformations of $$T_2$$ only if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. Proof of Theorem 2. See the Supplementary Material. □ Corollary 1. The Bayes factor $$B$$ is a monotone function of $$F$$ only if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. Proof of Corollary 1. The $$F$$-statistic is invariant, so Theorem 2 shows that $$B$$ is a function of $$F$$ only if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$. We know from Theorem 1 that $$B$$ is a monotone function of $$F$$ if $$\pi(\eta_1 \mid \phi) \propto 1$$ and $$\pi(\eta_2 \mid \phi) = \phi^{q/2}r_2(\phi\eta_2'\eta_2)$$, so the corollary follows. □ Corollary 2 states the equivalent result in the original $$\beta$$-space. Corollary 2. The Bayes factor $$B$$ is a monotone function of $$F$$ only if $$\pi(\beta_1 \mid \phi) \propto 1$$ and $$\pi(\beta_2 \mid \phi) = \phi^{q/2}r_2(\beta_2'\Sigma^{-1}\beta_2)$$ where $$\Sigma = \phi^{-1}\left\{X'_2(I-H_1)X_2\right\}^{-1}$$. Proof of Corollary 2. This follows from the specification of $$\mathcal{C}$$ in § 2.2 in the $$\eta$$- and $$\beta$$-spaces. □ 3. Discussion The monotone relationship between $$B$$ and $$F$$ has important implications. First, if we set $$\gamma = \psi^{-1}(\lambda)$$, then for every set of priors $$(\pi_0, \pi_1) \in \mathcal{C}$$, and in particular for any elliptical prior in (4), the Bayes test $$B > \lambda$$ is equivalent to the classical test $$F > \gamma$$. More specifically, the two tests will give the same results for every dataset and the error rate functions will be identical. 
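The monotone map $$\psi$$ can be made explicit for one member of $$\mathcal{C}$$. In the sketch below (our own illustration, with arbitrary values of $$n$$, $$p$$, $$q$$, $$s$$ and $$g$$), we take Zellner's $$g$$-prior, so $$r_2(t) \propto \exp\{-t/(2g)\}$$; the series in the proof of Theorem 1 then sums, up to a constant free of $$F$$, to $$(1-\rho/c)^{-\nu}$$ with $$c = 1 + 1/g$$, $$\nu = (n-p)/2 - s + 1$$ and, taking $$F$$ in its canonical form, $$\rho = T_2'T_2/(T_2'T_2+T_3'T_3) = qF/(qF + n - p - q)$$.

```python
import math

# Illustrative sketch (ours, not the paper's): for the g-prior the series
# B = sum_j a_j* rho^{j/2} from the proof of Theorem 1 has a_j = 0 for odd j
# and, dropping constants that do not depend on F, sums to (1 - rho/c)^{-nu}.
# All parameter values below are arbitrary.

N, P, Q, S, G = 50, 2, 3, 1, 10.0
NU = (N - P) / 2 - S + 1            # nu = (n - p)/2 - s + 1
C = 1 + 1 / G                       # c = 1 + 1/g

def rho(F):
    return Q * F / (Q * F + N - P - Q)   # T2'T2 / (T2'T2 + T3'T3)

def B_series(F, terms=400):
    """Truncated series sum_m (nu)_m / m! * (rho/c)^m over the even-j terms."""
    x, total, log_coef = rho(F) / C, 0.0, 0.0   # log_coef = log((nu)_m / m!)
    for m in range(terms):
        total += math.exp(log_coef) * x ** m
        log_coef += math.log(NU + m) - math.log(m + 1)
    return total

def B_closed(F):
    return (1 - rho(F) / C) ** (-NU)

grid = [0.5, 1.0, 2.0, 4.0, 8.0]
# B is monotone increasing in F, and the truncated series matches the closed form.
assert all(B_series(a) < B_series(b) for a, b in zip(grid, grid[1:]))
assert all(abs(B_series(F) - B_closed(F)) < 1e-8 * B_closed(F) for F in grid)

# The Bayes test B > lambda is the F-test with gamma = psi^{-1}(lambda):
lam = B_closed(3.0)                 # threshold whose F-equivalent is gamma = 3
assert all((B_closed(F) > lam) == (F > 3.0) for F in grid)
```

The final assertion illustrates the equivalence above: the Bayes test $$B>\lambda$$ rejects exactly when $$F>\psi^{-1}(\lambda)$$, so the two tests share their error rate functions.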
Further, if one Bayesian chooses a prior $$\pi^{(1)} = (\pi_0^{(1)}, \pi_1^{(1)}) \in \mathcal{C}$$ and critical value $$\lambda_1$$, and a second Bayesian chooses the prior $$\pi^{(2)} = (\pi_0^{(2)}, \pi_1^{(2)}) \in \mathcal{C}$$, then there is a critical value $$\lambda_2$$ for the second test such that the two Bayesian tests give the same results for every dataset and therefore have the same error rate functions. In other words, the Bayesian test based on any other $$\pi^{(2)}\in\mathcal{C}$$ can be replicated by retaining $$\pi^{(1)}$$ and selecting an appropriate critical value: two Bayesians with priors $$\pi^{(1)}$$ and $$\pi^{(2)}$$ behave equivalently to two Bayesians who share a single prior $$\pi\in\mathcal{C}$$ but use different critical values $$\lambda_1$$ and $$\lambda_2$$. These values can be found using the fact that the Bayes test $$B > \lambda$$ is equivalent to $$F > \psi^{-1}(\lambda)$$. This matters because, for a fixed $$\lambda$$, the choice of prior can have a substantial impact on the outcome of the test; see Garcia-Donato & Chen (2005). The monotone relationship between $$B$$ and $$F$$ also has implications for the asymptotic properties of a test. For example, Johnson & Rossell (2010) consider the model in (1) with $$p = 0$$ and construct a special prior $$\widetilde{\pi}(\beta_2 \mid \phi)$$ such that $${\rm pr}(B_{\widetilde{\pi}}>\lambda_n \mid H_0)\rightarrow 0$$ for $$\log \lambda_n=-cn^{k/(k+1)}$$, for some $$c>0$$ and arbitrary integer $$k$$. The monotone relationship implies that the classical test attains the same result when $$\gamma_n = \psi^{-1}(\lambda_n)$$: then $${\rm pr}(F>\gamma_n \mid \beta_2) = {\rm pr}(B_{\widetilde{\pi}}>\lambda_n \mid \beta_2)$$ for every $$\beta_2$$, so the two tests have the same asymptotic properties.
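The threshold matching between two Bayesians can be carried out explicitly under an illustrative $$g$$-prior assumption (our own sketch: the closed form for $$\psi$$ below holds for Zellner's $$g$$-prior, up to an $$F$$-independent constant absorbed into the thresholds, and all parameter values are arbitrary). Given $$\lambda_1$$ for the first prior, setting $$\lambda_2 = \psi_2\{\psi_1^{-1}(\lambda_1)\}$$ makes the two Bayes tests reject on exactly the same datasets.

```python
import math

# Sketch (ours, not the paper's): two g-priors in C with matched thresholds.
# psi(F, g) is the g-prior Bayes factor up to an F-independent constant;
# n, p, q, s and the two g values are arbitrary illustrative choices.

N, P, Q, S = 50, 2, 3, 1
NU = (N - P) / 2 - S + 1

def psi(F, g):
    c = 1 + 1 / g
    r = Q * F / (Q * F + N - P - Q)          # rho = T2'T2/(T2'T2 + T3'T3)
    return (1 - r / c) ** (-NU)

def psi_inv(b, g):
    c = 1 + 1 / g
    r = c * (1 - b ** (-1 / NU))             # invert (1 - rho/c)^{-nu} = b
    return r * (N - P - Q) / (Q * (1 - r))   # invert rho(F) to recover F

g1, g2, lam1 = 5.0, 50.0, 20.0
lam2 = psi(psi_inv(lam1, g1), g2)            # matched threshold for Bayesian 2

# The two Bayes tests now make identical decisions for every value of F.
for F in [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]:
    assert (psi(F, g1) > lam1) == (psi(F, g2) > lam2)
```

The same construction, applied along a sequence $$\lambda_n$$, gives the matched sequence $$\omega_n$$ discussed below.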
Similarly, for every $$\pi(\beta_2 \mid \phi)$$ that satisfies (4) there exists an $$\omega_n$$ such that $${\rm pr}(B_{\pi}>\omega_n \mid \beta_2) = {\rm pr}(B_{\widetilde{\pi}}>\lambda_n \mid \beta_2)$$, so the Bayesian test using $$\pi(\beta_2 \mid \phi)$$ and $$\omega_n$$ has the same properties as the test using $$\widetilde{\pi}(\beta_2 \mid \phi)$$ and $$\lambda_n$$. Acknowledgement The authors are grateful to the editor, associate editor and two referees, whose suggestions have improved the paper substantially. Supplementary material Supplementary material available at Biometrika online includes the proof that $$\beta_1$$ and $$\beta_2$$ are independent conditional on $$\phi$$, and the proofs of Lemma 1 and Theorem 2. References

Bayarri M. J., Berger J. O., Forte A. & Garcia-Donato G. (2012). Criteria for Bayesian model choice with application to variable selection. Ann. Statist. 40, 1550–77.
Casella G. & Moreno E. (2006). Objective Bayesian variable selection. J. Am. Statist. Assoc. 101, 157–67.
Fouskakis D. & Ntzoufras I. (2016). Power-conditional-expected priors: Using $$g$$-priors with random imaginary data for variable selection. J. Comp. Graph. Statist. 25, 647–64.
Fouskakis D., Ntzoufras I. & Draper D. (2015). Power-expected-posterior priors for variable selection in Gaussian linear models. Bayesian Anal. 25, 75–107.
Garcia-Donato G. & Chen M. H. (2005). Calibrating Bayes factor under prior predictive distributions. Statist. Sinica 15, 359–80.
Giron F. J., Martinez M. L., Moreno E. & Torres F. (2006). Objective testing procedures in linear models: Calibration of the $$p$$-values. Scand. J. Statist. 33, 765–84.
Jeffreys H. (1961). Theory of Probability, 3rd ed. New York: Oxford University Press.
Johnson V. & Rossell D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Statist. Soc. B 72, 143–70.
Kass R. E. & Raftery A. E. (1995). Bayes factors. J. Am. Statist. Assoc. 90, 773–95.
Lehmann E. L. & Romano J. P. (2005). Testing Statistical Hypotheses, 3rd ed. New York: Springer.
Liang F., Paulo R., Clyde M. A., Molina G. & Berger J. O. (2008). Mixtures of $$g$$-priors for Bayesian variable selection. J. Am. Statist. Assoc. 103, 410–23.
Womack A. J., Leon-Novelo L. & Casella G. (2014). Inference from intrinsic Bayes’ procedures under model selection and uncertainty. J. Am. Statist. Assoc. 109, 1040–53.
Zellner A. (1986). On assessing prior distributions and Bayesian regression analysis with $$g$$-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, Goel P. K. & Zellner A., eds. Amsterdam: North-Holland, pp. 233–43.

© 2018 Biometrika Trust This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Biometrika, Oxford University Press. Published: May 10, 2018.