# A conditional composite likelihood ratio test with boundary constraints

A conditional composite likelihood ratio test with boundary constraints Summary Composite likelihood has been widely used in applications. The asymptotic distribution of the composite likelihood ratio statistic at the boundary of the parameter space is a complicated mixture of weighted $$\chi^2$$ distributions. In this paper we propose a conditional test with data-dependent degrees of freedom. We consider a modification of the composite likelihood which satisfies the second-order Bartlett identity. We show that the modified composite likelihood ratio statistic given the number of estimated parameters lying on the boundary converges to a simple $$\chi^2$$ distribution. This conditional testing procedure is validated through simulation studies. 1. Introduction Composite likelihood (Besag, 1974; Lindsay, 1988) is an inference function constructed as the product of a set of conditional and/or marginal density functions. Composite likelihoods have been widely used in longitudinal studies, spatial modelling, analysis of panel data or missing data, and other areas. When a working independence assumption is adopted, the composite likelihood is sometimes called the independence likelihood (Chandler & Bate, 2007). Varin et al. (2011) reviewed composite likelihood methods. Although Wald-based inference using composite likelihoods is common, likelihood ratio-based inference often has better finite-sample performance. The composite likelihood ratio statistic is generally distributed as a linear combination of independent $$\chi_1^2$$ distributions (Molenberghs & Verbeke, 2005). This is mainly due to the failure of the second-order Bartlett identity, which refers to the equality between the expected negative Hessian of the loglikelihood and the covariance of the score function. To deal with such complications, several adjustments have been proposed (Chandler & Bate, 2007; Pace et al., 2011), and the adjusted composite likelihood ratio statistics converge weakly to a simple $$\chi^2$$ distribution. A regularity condition underlying these adjustments is that the parameter $$\theta_0$$ under the null hypothesis $$H_0: \theta=\theta_0$$ is interior to its parameter space, so that the composite score function is zero at the maximum composite likelihood estimate. However, when the parameter $$\theta_0$$ lies on the boundary of the parameter space, which commonly occurs in variance component models, this first-order condition is usually not satisfied. Consequently, existing adjustments for composite likelihoods may not yield an asymptotic $$\chi^2$$ distribution. Another challenge for boundary problems is that the asymptotic distribution is often a mixture of $$\chi^2$$ distributions (Self & Liang, 1987; Chen & Liang, 2010; Chen et al., 2017), and calculation of the mixing proportions relies on a partition of the parameter space. Such a partition often depends on the geometry of the tangent cone at the null hypothesis as well as on the decomposition of the Fisher information, and must be worked out case by case; this limits the use of these asymptotic results. The primary aim of this paper is to develop a modified composite likelihood ratio statistic with a simple limiting distribution for hypothesis testing when the parameter of interest lies on the boundary of the parameter space. Specifically, to recover the second-order Bartlett identity, we propose a novel modification of the composite likelihood, whereby a new approximation is obtained via a quadratic function of a linear combination of the composite score function and the maximum composite likelihood estimator. To avoid the calculation of mixing proportions, we adopt a conditional testing procedure recently proposed by Susko (2013), which originated from the ideas of Bartholomew (1961). Susko showed that the standard likelihood ratio statistics given the number of parameters lying on the boundary converges weakly to a simple $$\chi^2$$ distribution with data-dependent degrees of freedom. A crucial assumption made in Bartholomew (1961) and Susko (2013) is the second-order Bartlett identity, which does not hold for composite likelihoods. The goals of this work are two-fold: we extend Chandler & Bate’s (2007) adjustment to boundary problems, and we generalize Susko’s (2013) result to more general likelihoods for which the second-order Bartlett identity does not hold. We give a general theorem on the asymptotic distribution of the modified composite likelihood ratio statistic, and we show empirically that the modified composite likelihood ratio statistic performs better than the naive test that ignores boundary problems. 2. Modified composite likelihood under boundary constraints Let $$g(x;\theta)$$ be the probability density function of a multi-dimensional random vector $$X$$, indexed by a $$p$$-dimensional parameter $$\theta=(\theta_1,\ldots,\theta_p)^{{\mathrm{T}}}$$, where $$\theta$$ belongs to the parameter space $$\Omega$$. We assume that distinct values of $$\theta$$ correspond to distinct probability distributions. Let $$\{\mathscr{A}_1,\ldots,\mathscr{A}_K\}$$ denote a set of marginal or conditional events associated with loglikelihoods $$\ell_k\{\theta;\mathscr{A}_k(x)\}=\log \int_{x\in\mathscr{A}_k} g(x;\theta)\,{\rm d} x$$, where $$k=1, \ldots, K$$ with $$K$$ being the total number of events. Suppose that $$N$$ independent random variables $$X_1,\ldots,X_N$$ are observed from the model $$g(x;\theta)$$. Following Lindsay (1988), a composite loglikelihood can be constructed as   $$\ell_{\rm c}(\theta)=\displaystyle\sum_{i=1}^{N} \sum_{k=1}^{K}\omega_{ik}\ell_k\{\theta;\mathscr{A}_k(x_i)\},$$ where $$\omega_{ik}$$ is a nonnegative deterministic weight associated with the loglikelihood component $$\ell_k\{\theta;\mathscr{A}_k(x_i)\}$$. Let $$\hat{\theta}_{\rm c}=\arg\max_{\theta\in \Omega} \ell_{\rm c}(\theta)$$ be the maximum composite likelihood estimator, and define the composite score function, sensitivity matrix and variability matrix as, respectively,   $${ U_{\rm c}(\theta)=\frac{\partial \ell_{\rm c}(\theta)}{\partial\theta},\quad H ={\displaystyle \lim _{N\rightarrow \infty}}-\frac{1}{N}E\!\left\{\frac{\partial^2\ell_{\rm c}(\theta)}{\partial\theta^{{\mathrm{T}}}\partial\theta}\right\}\!, \quad V={\displaystyle \lim _{N\rightarrow \infty}}\frac{1}{N}E\!\left[\left\{\frac{\partial\ell_{\rm c}(\theta)}{\partial\theta}\right\}\left\{\frac{\partial\ell_{\rm c}(\theta)}{\partial\theta}\right\}^{{\mathrm{T}}}\right]\!\!\text{.} }$$ The corresponding estimators of $$H$$ and $$V$$ are denoted by $$\hat H$$ and $$\hat V$$, evaluated at $$\hat\theta_{\rm c}$$. The second-order Bartlett identity generally does not hold for $$\ell_{\rm c}(\theta)$$, since $$H\,\,=\hspace{-8pt}|\,\,\, V$$. To circumvent this problem, Chandler & Bate (2007) proposed an adjusted composite likelihood through a vertical scaling:   $$\label{CHvert} \ell_{\rm A}(\theta) = \ell_{\rm c}(\hat\theta_{\rm c}){{+}}\left\{\left(\theta-\hat \theta_{\rm c}\right)^{{\mathrm{T}}} \hat H_{\rm A} \left(\theta-\hat \theta_{\rm c}\right) \right\} { \ell_{\rm c}(\theta)-\ell_{\rm c}(\hat \theta_{\rm c}) \over \left(\theta-\hat \theta_{\rm c}\right)^{{\mathrm{T}}} \hat H \left(\theta-\hat \theta_{\rm c}\right)}\!,$$ (1) where $$\hat H_{\rm A}$$ is the inverse of the robust variance estimator $$\hat H \hat V^{-1} \hat H$$ and $$(\theta-\hat \theta)^{{\mathrm{T}}}\hat H_{\rm A} (\theta-\hat \theta) /(\theta-\hat \theta)^{{\mathrm{T}}} \hat H (\theta-\hat \theta)$$ is the scaling factor. Pace et al. (2011) proposed another adjustment based on the composite score function. However, when the parameter $$\theta_0$$ of the null hypothesis lies on the boundary of its parameter space, the two adjustments above are not sufficient to enforce the second-order Bartlett identity at $$\theta_0$$, because both are based on approximation of $$\ell_{\rm c}(\hat \theta_{\rm c})$$ around $$\theta_0$$ by a quadratic form of $$(\hat \theta_{\rm c}-\theta_0)$$ (Chandler & Bate, 2007) or a quadratic form of $$U_{\rm c}(\theta_0)$$ (Pace et al., 2011), which relies on the asymptotic equivalence of $$N^{1/2}(\hat \theta_{\rm c}-\theta_0)$$ and $$N^{-1/2}\hat H^{-1} U_{\rm c}(\theta_0)$$. Such an equivalence is true only under the first-order condition $$U_{\rm c}(\hat \theta_{\rm c})=0$$. Under boundary constraints, $$\hat \theta_{\rm c}$$ may not solve this composite score equation, so the asymptotic equivalence does not hold. We therefore adapt the quadratic approximation in Lemma 1 of Self & Liang (1987) to deal with composite likelihood, and obtain the following approximation around $$\hat \theta_{\rm c}$$:   $$2\left\{\ell_{\rm c}(\theta)-\ell_{\rm c}(\hat\theta_{\rm c})\right\} = {{-}} T\left(\theta\right)^{{\mathrm{T}}} \hat{H}T(\theta){{+}}N^{-1}U_{\rm c}(\hat\theta_{\rm c})^{{\mathrm{T}}} \hat{H}^{-1}U_{\rm c}(\hat\theta_{\rm c})+{O}_{\mathrm{p}}\left(N\big\Vert\theta-\hat\theta_{\rm c}\big\Vert^3\right)\!, \label{eq:AB}$$ (2) where $$T(\theta)=N^{-1/2}\hat{H}^{-1}U_{\rm c}(\hat\theta_{\rm c}){{-}}N^{1/2}(\theta-\hat\theta_{\rm c})$$. Under boundary constraints, the composite likelihood $$\ell_{\rm c}(\theta)$$ is approximated by a quadratic form of a linear combination of $$U_{\rm c}(\hat\theta_{\rm c})$$ and $$(\theta-\hat\theta_{\rm c})$$. In a similar spirit to Chandler & Bate (2007) and Pace et al. (2011), we propose the following modified composite likelihood under boundary constraints:   $$\label{lm} \ell_{\rm M}(\theta)=\ell_{\rm c}(\hat\theta_{\rm c})-\left\{T(\theta)^{{\mathrm{T}}}\hat{H}_{\rm A} T(\theta)\right\}\phi(\theta),$$ (3) where $$T(\theta)=N^{-1/2}\hat{H}^{-1}U_{\rm c}(\hat{\theta}_{\rm c})-N^{1/2}(\theta-\hat{\theta}_{\rm c})$$ and   $$\phi(\theta)=\frac{\ell_{\rm c}(\theta)-\ell_{\rm c}l(\hat \theta_{\rm c})}{-T(\theta)^{{\mathrm{T}}}\hat{H}T(\theta)+N^{-1}U_{\rm c}(\hat{\theta}_{\rm c})^{{\mathrm{T}}} \hat{H}^{-1}U_{\rm c}(\hat{\theta}_{\rm c})}\text{.}$$ In (3), the modified composite likelihood $$\ell_{\rm M}(\theta)$$ is approximated by a quadratic form with a finite-sample adjustment $$\phi(\theta)$$. The quadratic term vertically scales the likelihood to enforce the second-order Bartlett identity. The proposed modification uses a second-order expansion based on both $$U_{\rm c}(\hat\theta_{\rm c})$$ and $$(\theta-\hat\theta_{\rm c})$$ to deal with boundary constraints. When there is no boundary constraint, the proposed modified composite likelihood reduces to Chandler & Bate’s (2007) adjustment in (1) upon noting that $$U_{\rm c}(\hat\theta_{\rm c})=0$$. 3. Theoretical results We decompose the $$p$$-dimensional parameter $$\theta$$ as $$\theta^{{\mathrm{T}}}=(\zeta^{{\mathrm{T}}}, \eta^{{\mathrm{T}}})$$, where the null hypothesis $$H_0: \zeta=0$$ and $$\eta=\eta_0$$ constrains the $$s$$-dimensional parameter $$\zeta$$ to lie on the boundary of the parameter space and sets the remaining $$(p-s)$$-dimensional parameter $$\eta$$ to a value in the interior of the parameter space. The composite likelihood ratio statistic based on the proposed modification is   $$W=-2\bigl\{\ell_{\rm M}(0, \eta_0)-\ell_{\rm M}(\hat{\theta}_{\rm M})\bigr\}\text{.}$$ Using the quadratic expansion for boundary parameters in (2) and the projections of normal random vectors onto cones in Shapiro (1985), we obtain our main results. Theorem 1. Under the regularity conditions R1–R5 in the Supplementary Material, the following properties hold.  (i) The second-order Bartlett identity holds asymptotically for $$\ell_{\rm M}(\theta)$$ at $$\theta_0$$, i.e., $$N^{-1}{\mathrm{var}}\{\partial \ell_{\rm M}(\theta_0)/\partial \theta\}=-N^{-1}E\{{\partial^2 \ell_{\rm M}(\theta_0)}/{\partial\theta^{{\mathrm{T}}}\partial \theta}\}+{O_{\rm p}({N^{-1/2}})}$$.  (ii) With probability tending to $$1$$, as $$N\rightarrow \infty$$, there exists a sequence of points $$\hat{\theta}_{\rm M}$$ in the parameter space $$\Omega$$ where local maxima of $$\ell_{\rm M}(\theta)$$ occur and which converges to $$\theta_0$$ in probability. Moreover, $$N^{1/2}(\hat{\theta}_{\rm M}-\theta_0)={O_{\rm p}(1)}$$.  (iii) Let $$V$$ denote the number of null hypothesis parameters that are estimated to be in the interior of the parameter space and let $$v$$ denote its observed value. Then  $$\mathrm{pr}(W\leq w \mid V=v) \rightarrow \mathrm{pr}(\chi_v^2\leq w)$$ as $$N\to\infty$$, where $$\chi^2_v$$ is a chi-squared variable with $$v$$ degrees of freedom.  This theorem shows that, given $$V=v$$, the modified composite likelihood ratio statistic converges weakly to $$\chi^2_v$$, which extends the results in Susko (2013). Without the modification of the composite likelihood, one could conduct a standard composite likelihood ratio test. The calculation of the asymptotic distribution of the test statistic under boundary constraints follows an argument similar to that in Chen & Liang (2010), where pseudolikelihood (Gong & Samaniego, 1981) is considered and the second-order Bartlett identity also does not hold. However, the calculation involves projection onto the tangent cone at the null hypothesis, as well as decomposition of the Fisher information, and must be worked out case by case. Another alternative is to calculate the composite likelihood ratio statistic and compare it with the $$\chi_{{p}}^2$$ distribution. This is a naive implementation of the composite likelihood ratio test, since the boundary constraints are ignored and the second-order Bartlett identity is taken to be valid even though it is not. Unlike in standard likelihood inference, the naive test using composite likelihood is not necessarily conservative, because the distribution of the composite likelihood ratio statistic converges to a mixture of distributions, some of which may have heavier tails than $$\chi^2$$ (Chen & Liang, 2010). The proposed modified composite likelihood ratio test is a good compromise, and its calculation is straightforward and easy to implement. 4. Simulation We conducted simulations to validate the theoretical results empirically. Our illustrative example is a hypothesis testing problem in stratified case-control studies (Liang, 1987). In a stratified case-control study, let $$x_{i1},\ldots,x_{ik_i}$$ denote the $$p\times 1$$ vectors of potential risk factors for $$k_i$$ cases, and let $$x_{ik_i+1},\ldots,x_{iK_i}$$ denote the potential risk factors of $$K_i-k_i$$ controls in the $$i$$th stratum ($$i=1,\ldots,N$$). A logistic regression model allowing for stratum-specific effects is   $$\label{logistic} \mathrm{logit\,pr}\left(\,y_{ij}=1 \mid {x_{ij}}\right) =\alpha_i+{{\beta}}^{{\mathrm{T}}} {x_{ij}} \quad (i=1,\ldots,N),$$ (4) where the coefficients $${\beta}$$ quantify the effects of the risk factors $$x_{ij}$$ on the disease status $$y_{ij}$$. To draw valid inference on $${\beta}$$, Liang (1987) proposed a composite likelihood method where the nuisance parameters $$\alpha_i$$ are eliminated by conditioning. For the $$(j,l)$$th case-control pair of subjects in the $$i$$th stratum $$(j=1,\ldots,k_i;\, l=k_i+1,\ldots,K_i)$$, the conditional probability that $$x_{ij}$$ is from the case given that $$x_{ij}$$ and $$x_{il}$$ are observed is   $${\mathrm{pr}}\left(\,y_{ij}=1,y_{il}=0 \mid y_{ij}+y_{il}=1, x_{ij}, x_{il}; \alpha_i, {\beta}\right)= \frac{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)}{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)+\exp\left({{\beta}}^{{\mathrm{T}}}{x_{il}}\right)} \text{.}$$ Hence, a composite loglikelihood combining all possible pairs from $$N$$ strata with weights $$w_{ijl}$$ is   $$\ell_{\rm c}({\beta})=\sum_{i=1}^{N}\sum_{j=1}^{k_i}\sum_{l=k_i+1}^{K_i} w_{ijl}\log \left\{\frac{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)}{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)+\exp\left({{\beta}}^{{\mathrm{T}}}{x_{il}}\right)}\right\}\! \text{.}$$ Without loss of generality we set $$w_{ijl}=K_i^{-1}$$, so the maximum composite likelihood estimator reduces to the Mantel–Haenszel estimator when a binary covariate is considered (Liang, 1987). Suppose that the covariates are three-dimensional and are known to be positively associated with the occurrence of disease, i.e., the null hypothesis is $$H_0: \beta = (0,0,0)$$ and the alternative is $$H_{\rm a}: \text{any of } \beta_1,\beta_2,\beta_3 >0$$. For this problem, the standard composite likelihood ratio statistic does not converge to $$\chi_3^2$$ because of the boundary constraint. Setting $$C_{\Omega_0}=\{0\}\times\{0\}\times\{0\}$$ and $$C_{\Omega}=[0,\infty)\times[0,\infty)\times[0,\infty)$$, the asymptotic distribution of the composite likelihood ratio statistic can be derived following Chen & Liang (2010), given that the test statistic is asymptotically equivalent to the difference between two quadratic forms, with   $$Z^{{\mathrm{T}}}H_0Z- \inf_{\beta \in C_{\Omega}}(Z-\beta)^{{\mathrm{T}}}H_0(Z-\beta), \quad Z \sim N\left(0, \,H_0^{-1}V_0H_0^{-1}\right)\!\text{.}$$ As $$\beta$$ is three-dimensional, the calculations of each individual distribution and of the mixing proportions in the limiting mixture are rather complicated. In this situation, a naive comparison of the composite likelihood ratio statistic with a $$\chi^2_3$$ distribution is a simple procedure, so we focus on comparing the modified composite likelihood ratio test and the naive tests. We simulate data from several different scenarios and compare the Type I error and power of the modified composite likelihood ratio test with those of the naive composite likelihood ratio test. The naive test uses the $$(1-\alpha)$$th quantile of $$\chi_{3}^2$$ as the critical value, where $$\alpha$$ is the nominal level, and the modified composite likelihood ratio test uses the $$(1-\alpha)$$th quantile of $$\chi_{v}^2$$, where $$v$$ is the number of $$\beta$$ coefficients that were estimated as positive. For all the scenarios, the number of cases and the number of controls for each stratum are fixed at five, whereas the number of strata varies from 25 to 400. For each stratum, we generate a stratum-specific intercept $$\alpha_i$$ from a uniform distribution on $$[-1, 1]$$, and generate covariates $$x_1$$, $$x_2$$ and $$x_3$$ from a standard normal distribution. We then calculate the individual probability of having disease using (4) and generate the binary disease status from the corresponding Bernoulli distribution. Finally, we randomly sample five cases and five controls from each stratum. Box-constrained optimization of the likelihoods is implemented through the R function optim (R Development Core Team, 2018) with the L-BFGS-B method. We use $$10^{-6}$$ as the threshold to determine whether the estimated parameter is on the boundary. Sensitivity analyses yield similar results when the threshold is varied from $$10^{-3}$$ to $$10^{-8}$$. Table 1 summarizes the Type I errors based on 5000 simulations. The naive composite likelihood ratio test has empirical Type I errors that are much smaller than the nominal levels, but the modified composite likelihood ratio test performs reasonably well. Since the modified likelihood ratio test cannot reject the null hypothesis when all the parameter estimates are on the boundary, the theoretical false positive probabilities for the proposed test corresponding to a nominal level of $$\alpha$$ can only be attained at $$\alpha\times \{1-\mathrm{pr}(\text{all }\hat \beta =0)\}$$; these probabilities are listed in the second and fourth rows in the mCLRT part of Table 1. Table 1. Type I errors $$(\times 100)$$ of the modified composite likelihood ratio test and the naive composite likelihood ratio test, for $$H_0: \beta_1=\beta_2=\beta_3=0$$ at nominal levels $$0{\cdot}05$$ and $$0{\cdot}1$$, where $$\mathrm{logit\,pr}(\,y_{ij}=1 \mid x_{ij}) =\alpha_i$$ and $$\alpha_i \sim {\rm Un}[-1,1]$$           Number of strata     Level (%)  Empirical/Theoretical  25  50  100  200  400  mCLRT  5  Empirical  5$$\cdot$$0  5$$\cdot$$3  5$$\cdot$$1  6$$\cdot$$1  4$$\cdot$$9       Theoretical  4$$\cdot$$3  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4     10  Empirical  8$$\cdot$$3  10$$\cdot$$1  10$$\cdot$$1  10$$\cdot$$7  9$$\cdot$$5       Theoretical  8$$\cdot$$7  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  Naive  5  Empirical  0$$\cdot$$2  0$$\cdot$$0  0$$\cdot$$1  0$$\cdot$$0  0$$\cdot$$0       Theoretical  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0     10  Empirical  0$$\cdot$$2  0$$\cdot$$3  0$$\cdot$$2  0$$\cdot$$4  0$$\cdot$$1        Theoretical  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0           Number of strata     Level (%)  Empirical/Theoretical  25  50  100  200  400  mCLRT  5  Empirical  5$$\cdot$$0  5$$\cdot$$3  5$$\cdot$$1  6$$\cdot$$1  4$$\cdot$$9       Theoretical  4$$\cdot$$3  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4     10  Empirical  8$$\cdot$$3  10$$\cdot$$1  10$$\cdot$$1  10$$\cdot$$7  9$$\cdot$$5       Theoretical  8$$\cdot$$7  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  Naive  5  Empirical  0$$\cdot$$2  0$$\cdot$$0  0$$\cdot$$1  0$$\cdot$$0  0$$\cdot$$0       Theoretical  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0     10  Empirical  0$$\cdot$$2  0$$\cdot$$3  0$$\cdot$$2  0$$\cdot$$4  0$$\cdot$$1        Theoretical  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  mCLRT, the modified composite likelihood ratio test; Naive, the naive composite likelihood ratio test; Empirical, the empirical Type I error; Theoretical, the theoretical false positive probability. To compare the power, we let one, two or three $$\beta$$ coefficients vary from 0$$\cdot$$0 to 0$$\cdot$$5 with $$50$$ strata. For the nonzero $$\beta$$ coefficients, we assume equal effect sizes. Panels (a)–(c) of Fig. 1 show the power curves at a 5% nominal level based on 5000 simulations; panels (d)–(f) show the frequencies of $$V =0, 1, 2 \text{ and }3$$ as the nonzero effect size increases from 0$$\cdot$$0 to 0$$\cdot$$5. The proposed test always has larger power than the naive test. The gain in power can be further explained by the proportion of times that $$\beta$$ is estimated as positive. Specifically, when all three $$\beta$$ coefficient estimates are far from the boundary in the data-generating model, most of the time we obtain three positive $$\beta$$ estimates. Thus, both the naive test and the proposed test refer to $$\chi^2_3$$ and perform more similarly, as illustrated in Fig. 1(c) and (f). On the other hand, when there are only one or two nonzero or three nonzero but small $$\beta$$ coefficient estimates, the proposed test frequently uses one or two degrees of freedom, even as the additional $$\beta$$ values get larger; see Fig. 1. There is a substantial gain in power in these situations. Fig. 1. View largeDownload slide Power comparisons for the matched case-control study example when one, two or three testing parameters are positive in the data-generating model: (a)–(c) power of the modified composite likelihood ratio test ($$\bullet$$), the naive test ($$\bigtriangleup$$), and the composite likelihood ratio test ($$\diamondsuit$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50; (d)–(f) the proportion of times that $${\beta}$$ coefficients were estimated as positive when $$V=0$$ ($$\bullet$$), 1 ($$\bigtriangleup$$), 2 ($$\blacksquare$$) or 3 ($$\boxplus$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50. Fig. 1. View largeDownload slide Power comparisons for the matched case-control study example when one, two or three testing parameters are positive in the data-generating model: (a)–(c) power of the modified composite likelihood ratio test ($$\bullet$$), the naive test ($$\bigtriangleup$$), and the composite likelihood ratio test ($$\diamondsuit$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50; (d)–(f) the proportion of times that $${\beta}$$ coefficients were estimated as positive when $$V=0$$ ($$\bullet$$), 1 ($$\bigtriangleup$$), 2 ($$\blacksquare$$) or 3 ($$\boxplus$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50. We also investigate the potential power loss due to the conditioning procedure and the modification. The critical value of the composite likelihood ratio test is obtained by calculating the 95% quantile of 5000 test statistics based on data generated under the null hypothesis. As indicated by Fig. 1(a)–(c), there is a moderate loss of power. However, theoretical calculation of the critical value of the composite likelihood ratio test can be tedious and must be done case by case. Therefore, the modified composite likelihood ratio test serves as an useful alternative at the cost of moderate power loss. 5. Discussion The proposed modification also applies to the composite likelihood when nuisance parameters are present. Assume that $$\gamma^{{\mathrm{T}}}=(\theta^{{\mathrm{T}}},\lambda^{{\mathrm{T}}})$$, where $$\lambda$$ is a finite-dimensional nuisance parameter and $$\theta^{{\mathrm{T}}}=(\zeta^{{\mathrm{T}}}, \eta^{{\mathrm{T}}})$$. Similarly, we test the null hypothesis $$H_0: \zeta=0 \text{ and } \eta=\eta_0$$, where $$0$$ is on the boundary of the parameter space for $$\zeta$$ and $$\eta_0$$ is in the interior of the parameter space. Let $$\ell_{\rm pc}(\theta)=\ell_{\rm c}(\theta,\hat\lambda_\theta)$$ be the profile composite likelihood for $$\theta$$, where $$\hat\lambda_\theta=\arg\max_{\lambda}\ell_{\rm c}(\theta,\lambda)$$. Let   \begin{align*} U_{\rm pc}(\theta)& =\frac{\partial \ell_{\rm pc}(\theta)}{\partial\theta}, \quad H_{\rm p} =\lim_{N\rightarrow\infty}{-}\frac{1}{N}E\!\left\{\frac{\partial^2\ell_{\rm pc}(\theta)}{\partial\theta^{{\mathrm{T}}}\partial\theta}\right\}\!,\\ V_{\rm p}&=\lim_{N\rightarrow\infty}\frac{1}{N}E\!\left[\left\{\frac{\partial\ell_{\rm pc}(\theta)}{\partial\theta}\right\}\left\{\frac{\partial\ell_{\rm pc}(\theta)}{\partial\theta}\right\}^{{\mathrm{T}}}\right] \end{align*} be, respectively, the profile composite score function, sensitivity matrix and variability matrix for $$\theta$$, and let $$\hat{H}_{\rm p}$$ and $$\hat{V}_{\rm p}$$ denote the corresponding estimates; also define $$\hat{H}_{\rm pA}=\hat H_{\rm p}\hat V_{\rm p}^{-1} \hat{H}_{\rm p}$$. The modified profile composite likelihood is $$\ell_{\rm MP}(\theta)=\ell_{\rm pc}(\hat\theta_{\rm c})-\{T_{\rm p}(\theta)^{{\mathrm{T}}} \hat{H}_{\rm pA}T_{\rm p}(\theta)\}\phi_{\rm p}(\theta)$$ where $$T_{\rm p}(\theta)=N^{-1/2}\hat{H}_{\rm p}^{-1} U_{\rm pc}(\hat\theta_{\rm c})-N^{1/2}(\theta-\hat\theta_{\rm c})$$ and   $$\phi_{\rm p}(\theta)=\frac{\ell_{\rm pc}(\theta)-\ell_{\rm pc}(\hat\theta_{\rm c})}{{-}T_{\rm p}(\theta)^{{\mathrm{T}}} \hat{H}_{\rm p}T_{\rm p}(\theta){+}N^{-1}U_{\rm pc}(\hat\theta_{\rm c})^{{\mathrm{T}}} \hat{H}_{\rm p}^{-1}U_{\rm pc}(\hat\theta_{\rm c})}\text{.}$$ We define the modified profile composite likelihood ratio statistic to be $$W_{\rm p}=2\{\ell_{\rm MP}(\hat{\theta}_{\rm M}) - \ell_{\rm MP}(0,\eta_0)\},$$ where $$\hat \theta_{\rm M}=\arg\max_\theta\ell_{\rm MP}(\theta)$$. Here $$\hat{H}_{\rm p}$$ and $$\hat{V}_{\rm p}$$ can be calculated empirically; see the Supplementary Material. When the parameter is on the boundary of the parameter space, Andrews (2000) established the inconsistency of the standard nonparametric and parametric bootstrap methods and proposed subsampling and $$m$$-out-of-$$n$$ bootstrap methods for obtaining consistent estimators of the limiting distribution. However, both these methods require additional tuning parameters. Our proposed method is computationally simpler and free of unknown tuning parameters. Acknowledgement The first four authors are truly indebted to the fifth author, Professor Bruce G. Lindsay of Pennsylvania State University, for his inspiration and encouragement; he will be remembered as a great scholar. The authors thank the referee, associate editor and editor for comments that substantially improved this work. This research was supported in part by the National Institutes of Health, U.S.A. Supplementary material Supplementary material available at Biometrika online includes regularity conditions, the proof of the asymptotic expansion in equation (2), the proof of Theorem 1, and a more general result involving nuisance parameters. References Andrews, D. W. ( 2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica  68, 399– 405. Google Scholar CrossRef Search ADS   Bartholomew, D. J. ( 1961). A test of homogeneity of means under restricted alternatives. J. R. Statist. Soc. B  23, 239– 81. Besag, J. E. ( 1974). Spatial interaction and the statistical analysis of lattice systems. J. R. Statist. Soc. B  36, 192– 236. Chandler, R. E. & Bate, S. ( 2007). Inference for clustered data using the independence loglikelihood. Biometrika  94, 167– 83. Google Scholar CrossRef Search ADS   Chen, Y. & Liang, K.-Y. ( 2010). On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems. Biometrika  97, 603– 20. Google Scholar CrossRef Search ADS PubMed  Chen, Y. , Ning, J., Ning, Y., Liang, K.-Y. & Bandeen-Roche, K. ( 2017). On pseudolikelihood inference for semiparametric models with boundary problems. Biometrika  104, 165– 79. Google Scholar CrossRef Search ADS PubMed  Gong, G. & Samaniego, F. J. ( 1981). Pseudo maximum likelihood estimation: Theory and applications. Ann. Statist.  9, 861– 9. Google Scholar CrossRef Search ADS   Liang, K.-Y. ( 1987). Extended Mantel-Haenszel estimating procedure for multivariate logistic regression models. Biometrics  43, 289– 99. Google Scholar CrossRef Search ADS PubMed  Lindsay, B. G. ( 1988). Composite likelihood methods. Contemp. Math.  80, 221– 39. Google Scholar CrossRef Search ADS   Molenberghs, G. & Verbeke, G. ( 2005). Models for Discrete Longitudinal Data . New York: Springer. Pace, L. , Salvan, A. & Sartori, N. ( 2011). Adjusting composite likelihood ratio statistics. Statist. Sinica  21, 129– 48. R Development Core Team ( 2018). R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.http://www.R-project.org. Self, S. G. & Liang, K.-Y. ( 1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Statist. Assoc.  82, 605– 10. Google Scholar CrossRef Search ADS   Shapiro, A. ( 1985). Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika  72, 133– 44. Google Scholar CrossRef Search ADS   Susko, E. ( 2013). Likelihood ratio tests with boundary constraints using data-dependent degrees of freedom. Biometrika  100, 1019– 23. Google Scholar CrossRef Search ADS   Varin, C. , Reid, N. & Firth, D. ( 2011). An overview of composite likelihood methods. Statist. Sinica  21, 5– 42. © 2017 Biometrika Trust http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Biometrika Oxford University Press

# A conditional composite likelihood ratio test with boundary constraints

, Volume 105 (1) – Mar 1, 2018
8 pages

/lp/ou_press/a-conditional-composite-likelihood-ratio-test-with-boundary-Gj4G8rEgUF
Publisher
Oxford University Press
ISSN
0006-3444
eISSN
1464-3510
D.O.I.
10.1093/biomet/asx066
Publisher site
See Article on Publisher Site

### Abstract

Summary Composite likelihood has been widely used in applications. The asymptotic distribution of the composite likelihood ratio statistic at the boundary of the parameter space is a complicated mixture of weighted $$\chi^2$$ distributions. In this paper we propose a conditional test with data-dependent degrees of freedom. We consider a modification of the composite likelihood which satisfies the second-order Bartlett identity. We show that the modified composite likelihood ratio statistic given the number of estimated parameters lying on the boundary converges to a simple $$\chi^2$$ distribution. This conditional testing procedure is validated through simulation studies. 1. Introduction Composite likelihood (Besag, 1974; Lindsay, 1988) is an inference function constructed as the product of a set of conditional and/or marginal density functions. Composite likelihoods have been widely used in longitudinal studies, spatial modelling, analysis of panel data or missing data, and other areas. When a working independence assumption is adopted, the composite likelihood is sometimes called the independence likelihood (Chandler & Bate, 2007). Varin et al. (2011) reviewed composite likelihood methods. Although Wald-based inference using composite likelihoods is common, likelihood ratio-based inference often has better finite-sample performance. The composite likelihood ratio statistic is generally distributed as a linear combination of independent $$\chi_1^2$$ distributions (Molenberghs & Verbeke, 2005). This is mainly due to the failure of the second-order Bartlett identity, which refers to the equality between the expected negative Hessian of the loglikelihood and the covariance of the score function. To deal with such complications, several adjustments have been proposed (Chandler & Bate, 2007; Pace et al., 2011), and the adjusted composite likelihood ratio statistics converge weakly to a simple $$\chi^2$$ distribution. A regularity condition underlying these adjustments is that the parameter $$\theta_0$$ under the null hypothesis $$H_0: \theta=\theta_0$$ is interior to its parameter space, so that the composite score function is zero at the maximum composite likelihood estimate. However, when the parameter $$\theta_0$$ lies on the boundary of the parameter space, which commonly occurs in variance component models, this first-order condition is usually not satisfied. Consequently, existing adjustments for composite likelihoods may not yield an asymptotic $$\chi^2$$ distribution. Another challenge for boundary problems is that the asymptotic distribution is often a mixture of $$\chi^2$$ distributions (Self & Liang, 1987; Chen & Liang, 2010; Chen et al., 2017), and calculation of the mixing proportions relies on a partition of the parameter space. Such a partition often depends on the geometry of the tangent cone at the null hypothesis as well as on the decomposition of the Fisher information, and must be worked out case by case; this limits the use of these asymptotic results. The primary aim of this paper is to develop a modified composite likelihood ratio statistic with a simple limiting distribution for hypothesis testing when the parameter of interest lies on the boundary of the parameter space. Specifically, to recover the second-order Bartlett identity, we propose a novel modification of the composite likelihood, whereby a new approximation is obtained via a quadratic function of a linear combination of the composite score function and the maximum composite likelihood estimator. To avoid the calculation of mixing proportions, we adopt a conditional testing procedure recently proposed by Susko (2013), which originated from the ideas of Bartholomew (1961). Susko showed that the standard likelihood ratio statistics given the number of parameters lying on the boundary converges weakly to a simple $$\chi^2$$ distribution with data-dependent degrees of freedom. A crucial assumption made in Bartholomew (1961) and Susko (2013) is the second-order Bartlett identity, which does not hold for composite likelihoods. The goals of this work are two-fold: we extend Chandler & Bate’s (2007) adjustment to boundary problems, and we generalize Susko’s (2013) result to more general likelihoods for which the second-order Bartlett identity does not hold. We give a general theorem on the asymptotic distribution of the modified composite likelihood ratio statistic, and we show empirically that the modified composite likelihood ratio statistic performs better than the naive test that ignores boundary problems. 2. Modified composite likelihood under boundary constraints Let $$g(x;\theta)$$ be the probability density function of a multi-dimensional random vector $$X$$, indexed by a $$p$$-dimensional parameter $$\theta=(\theta_1,\ldots,\theta_p)^{{\mathrm{T}}}$$, where $$\theta$$ belongs to the parameter space $$\Omega$$. We assume that distinct values of $$\theta$$ correspond to distinct probability distributions. Let $$\{\mathscr{A}_1,\ldots,\mathscr{A}_K\}$$ denote a set of marginal or conditional events associated with loglikelihoods $$\ell_k\{\theta;\mathscr{A}_k(x)\}=\log \int_{x\in\mathscr{A}_k} g(x;\theta)\,{\rm d} x$$, where $$k=1, \ldots, K$$ with $$K$$ being the total number of events. Suppose that $$N$$ independent random variables $$X_1,\ldots,X_N$$ are observed from the model $$g(x;\theta)$$. Following Lindsay (1988), a composite loglikelihood can be constructed as   $$\ell_{\rm c}(\theta)=\displaystyle\sum_{i=1}^{N} \sum_{k=1}^{K}\omega_{ik}\ell_k\{\theta;\mathscr{A}_k(x_i)\},$$ where $$\omega_{ik}$$ is a nonnegative deterministic weight associated with the loglikelihood component $$\ell_k\{\theta;\mathscr{A}_k(x_i)\}$$. Let $$\hat{\theta}_{\rm c}=\arg\max_{\theta\in \Omega} \ell_{\rm c}(\theta)$$ be the maximum composite likelihood estimator, and define the composite score function, sensitivity matrix and variability matrix as, respectively,   $${ U_{\rm c}(\theta)=\frac{\partial \ell_{\rm c}(\theta)}{\partial\theta},\quad H ={\displaystyle \lim _{N\rightarrow \infty}}-\frac{1}{N}E\!\left\{\frac{\partial^2\ell_{\rm c}(\theta)}{\partial\theta^{{\mathrm{T}}}\partial\theta}\right\}\!, \quad V={\displaystyle \lim _{N\rightarrow \infty}}\frac{1}{N}E\!\left[\left\{\frac{\partial\ell_{\rm c}(\theta)}{\partial\theta}\right\}\left\{\frac{\partial\ell_{\rm c}(\theta)}{\partial\theta}\right\}^{{\mathrm{T}}}\right]\!\!\text{.} }$$ The corresponding estimators of $$H$$ and $$V$$ are denoted by $$\hat H$$ and $$\hat V$$, evaluated at $$\hat\theta_{\rm c}$$. The second-order Bartlett identity generally does not hold for $$\ell_{\rm c}(\theta)$$, since $$H\,\,=\hspace{-8pt}|\,\,\, V$$. To circumvent this problem, Chandler & Bate (2007) proposed an adjusted composite likelihood through a vertical scaling:   $$\label{CHvert} \ell_{\rm A}(\theta) = \ell_{\rm c}(\hat\theta_{\rm c}){{+}}\left\{\left(\theta-\hat \theta_{\rm c}\right)^{{\mathrm{T}}} \hat H_{\rm A} \left(\theta-\hat \theta_{\rm c}\right) \right\} { \ell_{\rm c}(\theta)-\ell_{\rm c}(\hat \theta_{\rm c}) \over \left(\theta-\hat \theta_{\rm c}\right)^{{\mathrm{T}}} \hat H \left(\theta-\hat \theta_{\rm c}\right)}\!,$$ (1) where $$\hat H_{\rm A}$$ is the inverse of the robust variance estimator $$\hat H \hat V^{-1} \hat H$$ and $$(\theta-\hat \theta)^{{\mathrm{T}}}\hat H_{\rm A} (\theta-\hat \theta) /(\theta-\hat \theta)^{{\mathrm{T}}} \hat H (\theta-\hat \theta)$$ is the scaling factor. Pace et al. (2011) proposed another adjustment based on the composite score function. However, when the parameter $$\theta_0$$ of the null hypothesis lies on the boundary of its parameter space, the two adjustments above are not sufficient to enforce the second-order Bartlett identity at $$\theta_0$$, because both are based on approximation of $$\ell_{\rm c}(\hat \theta_{\rm c})$$ around $$\theta_0$$ by a quadratic form of $$(\hat \theta_{\rm c}-\theta_0)$$ (Chandler & Bate, 2007) or a quadratic form of $$U_{\rm c}(\theta_0)$$ (Pace et al., 2011), which relies on the asymptotic equivalence of $$N^{1/2}(\hat \theta_{\rm c}-\theta_0)$$ and $$N^{-1/2}\hat H^{-1} U_{\rm c}(\theta_0)$$. Such an equivalence is true only under the first-order condition $$U_{\rm c}(\hat \theta_{\rm c})=0$$. Under boundary constraints, $$\hat \theta_{\rm c}$$ may not solve this composite score equation, so the asymptotic equivalence does not hold. We therefore adapt the quadratic approximation in Lemma 1 of Self & Liang (1987) to deal with composite likelihood, and obtain the following approximation around $$\hat \theta_{\rm c}$$:   $$2\left\{\ell_{\rm c}(\theta)-\ell_{\rm c}(\hat\theta_{\rm c})\right\} = {{-}} T\left(\theta\right)^{{\mathrm{T}}} \hat{H}T(\theta){{+}}N^{-1}U_{\rm c}(\hat\theta_{\rm c})^{{\mathrm{T}}} \hat{H}^{-1}U_{\rm c}(\hat\theta_{\rm c})+{O}_{\mathrm{p}}\left(N\big\Vert\theta-\hat\theta_{\rm c}\big\Vert^3\right)\!, \label{eq:AB}$$ (2) where $$T(\theta)=N^{-1/2}\hat{H}^{-1}U_{\rm c}(\hat\theta_{\rm c}){{-}}N^{1/2}(\theta-\hat\theta_{\rm c})$$. Under boundary constraints, the composite likelihood $$\ell_{\rm c}(\theta)$$ is approximated by a quadratic form of a linear combination of $$U_{\rm c}(\hat\theta_{\rm c})$$ and $$(\theta-\hat\theta_{\rm c})$$. In a similar spirit to Chandler & Bate (2007) and Pace et al. (2011), we propose the following modified composite likelihood under boundary constraints:   $$\label{lm} \ell_{\rm M}(\theta)=\ell_{\rm c}(\hat\theta_{\rm c})-\left\{T(\theta)^{{\mathrm{T}}}\hat{H}_{\rm A} T(\theta)\right\}\phi(\theta),$$ (3) where $$T(\theta)=N^{-1/2}\hat{H}^{-1}U_{\rm c}(\hat{\theta}_{\rm c})-N^{1/2}(\theta-\hat{\theta}_{\rm c})$$ and   $$\phi(\theta)=\frac{\ell_{\rm c}(\theta)-\ell_{\rm c}l(\hat \theta_{\rm c})}{-T(\theta)^{{\mathrm{T}}}\hat{H}T(\theta)+N^{-1}U_{\rm c}(\hat{\theta}_{\rm c})^{{\mathrm{T}}} \hat{H}^{-1}U_{\rm c}(\hat{\theta}_{\rm c})}\text{.}$$ In (3), the modified composite likelihood $$\ell_{\rm M}(\theta)$$ is approximated by a quadratic form with a finite-sample adjustment $$\phi(\theta)$$. The quadratic term vertically scales the likelihood to enforce the second-order Bartlett identity. The proposed modification uses a second-order expansion based on both $$U_{\rm c}(\hat\theta_{\rm c})$$ and $$(\theta-\hat\theta_{\rm c})$$ to deal with boundary constraints. When there is no boundary constraint, the proposed modified composite likelihood reduces to Chandler & Bate’s (2007) adjustment in (1) upon noting that $$U_{\rm c}(\hat\theta_{\rm c})=0$$. 3. Theoretical results We decompose the $$p$$-dimensional parameter $$\theta$$ as $$\theta^{{\mathrm{T}}}=(\zeta^{{\mathrm{T}}}, \eta^{{\mathrm{T}}})$$, where the null hypothesis $$H_0: \zeta=0$$ and $$\eta=\eta_0$$ constrains the $$s$$-dimensional parameter $$\zeta$$ to lie on the boundary of the parameter space and sets the remaining $$(p-s)$$-dimensional parameter $$\eta$$ to a value in the interior of the parameter space. The composite likelihood ratio statistic based on the proposed modification is   $$W=-2\bigl\{\ell_{\rm M}(0, \eta_0)-\ell_{\rm M}(\hat{\theta}_{\rm M})\bigr\}\text{.}$$ Using the quadratic expansion for boundary parameters in (2) and the projections of normal random vectors onto cones in Shapiro (1985), we obtain our main results. Theorem 1. Under the regularity conditions R1–R5 in the Supplementary Material, the following properties hold.  (i) The second-order Bartlett identity holds asymptotically for $$\ell_{\rm M}(\theta)$$ at $$\theta_0$$, i.e., $$N^{-1}{\mathrm{var}}\{\partial \ell_{\rm M}(\theta_0)/\partial \theta\}=-N^{-1}E\{{\partial^2 \ell_{\rm M}(\theta_0)}/{\partial\theta^{{\mathrm{T}}}\partial \theta}\}+{O_{\rm p}({N^{-1/2}})}$$.  (ii) With probability tending to $$1$$, as $$N\rightarrow \infty$$, there exists a sequence of points $$\hat{\theta}_{\rm M}$$ in the parameter space $$\Omega$$ where local maxima of $$\ell_{\rm M}(\theta)$$ occur and which converges to $$\theta_0$$ in probability. Moreover, $$N^{1/2}(\hat{\theta}_{\rm M}-\theta_0)={O_{\rm p}(1)}$$.  (iii) Let $$V$$ denote the number of null hypothesis parameters that are estimated to be in the interior of the parameter space and let $$v$$ denote its observed value. Then  $$\mathrm{pr}(W\leq w \mid V=v) \rightarrow \mathrm{pr}(\chi_v^2\leq w)$$ as $$N\to\infty$$, where $$\chi^2_v$$ is a chi-squared variable with $$v$$ degrees of freedom.  This theorem shows that, given $$V=v$$, the modified composite likelihood ratio statistic converges weakly to $$\chi^2_v$$, which extends the results in Susko (2013). Without the modification of the composite likelihood, one could conduct a standard composite likelihood ratio test. The calculation of the asymptotic distribution of the test statistic under boundary constraints follows an argument similar to that in Chen & Liang (2010), where pseudolikelihood (Gong & Samaniego, 1981) is considered and the second-order Bartlett identity also does not hold. However, the calculation involves projection onto the tangent cone at the null hypothesis, as well as decomposition of the Fisher information, and must be worked out case by case. Another alternative is to calculate the composite likelihood ratio statistic and compare it with the $$\chi_{{p}}^2$$ distribution. This is a naive implementation of the composite likelihood ratio test, since the boundary constraints are ignored and the second-order Bartlett identity is taken to be valid even though it is not. Unlike in standard likelihood inference, the naive test using composite likelihood is not necessarily conservative, because the distribution of the composite likelihood ratio statistic converges to a mixture of distributions, some of which may have heavier tails than $$\chi^2$$ (Chen & Liang, 2010). The proposed modified composite likelihood ratio test is a good compromise, and its calculation is straightforward and easy to implement. 4. Simulation We conducted simulations to validate the theoretical results empirically. Our illustrative example is a hypothesis testing problem in stratified case-control studies (Liang, 1987). In a stratified case-control study, let $$x_{i1},\ldots,x_{ik_i}$$ denote the $$p\times 1$$ vectors of potential risk factors for $$k_i$$ cases, and let $$x_{ik_i+1},\ldots,x_{iK_i}$$ denote the potential risk factors of $$K_i-k_i$$ controls in the $$i$$th stratum ($$i=1,\ldots,N$$). A logistic regression model allowing for stratum-specific effects is   $$\label{logistic} \mathrm{logit\,pr}\left(\,y_{ij}=1 \mid {x_{ij}}\right) =\alpha_i+{{\beta}}^{{\mathrm{T}}} {x_{ij}} \quad (i=1,\ldots,N),$$ (4) where the coefficients $${\beta}$$ quantify the effects of the risk factors $$x_{ij}$$ on the disease status $$y_{ij}$$. To draw valid inference on $${\beta}$$, Liang (1987) proposed a composite likelihood method where the nuisance parameters $$\alpha_i$$ are eliminated by conditioning. For the $$(j,l)$$th case-control pair of subjects in the $$i$$th stratum $$(j=1,\ldots,k_i;\, l=k_i+1,\ldots,K_i)$$, the conditional probability that $$x_{ij}$$ is from the case given that $$x_{ij}$$ and $$x_{il}$$ are observed is   $${\mathrm{pr}}\left(\,y_{ij}=1,y_{il}=0 \mid y_{ij}+y_{il}=1, x_{ij}, x_{il}; \alpha_i, {\beta}\right)= \frac{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)}{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)+\exp\left({{\beta}}^{{\mathrm{T}}}{x_{il}}\right)} \text{.}$$ Hence, a composite loglikelihood combining all possible pairs from $$N$$ strata with weights $$w_{ijl}$$ is   $$\ell_{\rm c}({\beta})=\sum_{i=1}^{N}\sum_{j=1}^{k_i}\sum_{l=k_i+1}^{K_i} w_{ijl}\log \left\{\frac{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)}{\exp\left({{\beta}}^{{\mathrm{T}}}{x_{ij}}\right)+\exp\left({{\beta}}^{{\mathrm{T}}}{x_{il}}\right)}\right\}\! \text{.}$$ Without loss of generality we set $$w_{ijl}=K_i^{-1}$$, so the maximum composite likelihood estimator reduces to the Mantel–Haenszel estimator when a binary covariate is considered (Liang, 1987). Suppose that the covariates are three-dimensional and are known to be positively associated with the occurrence of disease, i.e., the null hypothesis is $$H_0: \beta = (0,0,0)$$ and the alternative is $$H_{\rm a}: \text{any of } \beta_1,\beta_2,\beta_3 >0$$. For this problem, the standard composite likelihood ratio statistic does not converge to $$\chi_3^2$$ because of the boundary constraint. Setting $$C_{\Omega_0}=\{0\}\times\{0\}\times\{0\}$$ and $$C_{\Omega}=[0,\infty)\times[0,\infty)\times[0,\infty)$$, the asymptotic distribution of the composite likelihood ratio statistic can be derived following Chen & Liang (2010), given that the test statistic is asymptotically equivalent to the difference between two quadratic forms, with   $$Z^{{\mathrm{T}}}H_0Z- \inf_{\beta \in C_{\Omega}}(Z-\beta)^{{\mathrm{T}}}H_0(Z-\beta), \quad Z \sim N\left(0, \,H_0^{-1}V_0H_0^{-1}\right)\!\text{.}$$ As $$\beta$$ is three-dimensional, the calculations of each individual distribution and of the mixing proportions in the limiting mixture are rather complicated. In this situation, a naive comparison of the composite likelihood ratio statistic with a $$\chi^2_3$$ distribution is a simple procedure, so we focus on comparing the modified composite likelihood ratio test and the naive tests. We simulate data from several different scenarios and compare the Type I error and power of the modified composite likelihood ratio test with those of the naive composite likelihood ratio test. The naive test uses the $$(1-\alpha)$$th quantile of $$\chi_{3}^2$$ as the critical value, where $$\alpha$$ is the nominal level, and the modified composite likelihood ratio test uses the $$(1-\alpha)$$th quantile of $$\chi_{v}^2$$, where $$v$$ is the number of $$\beta$$ coefficients that were estimated as positive. For all the scenarios, the number of cases and the number of controls for each stratum are fixed at five, whereas the number of strata varies from 25 to 400. For each stratum, we generate a stratum-specific intercept $$\alpha_i$$ from a uniform distribution on $$[-1, 1]$$, and generate covariates $$x_1$$, $$x_2$$ and $$x_3$$ from a standard normal distribution. We then calculate the individual probability of having disease using (4) and generate the binary disease status from the corresponding Bernoulli distribution. Finally, we randomly sample five cases and five controls from each stratum. Box-constrained optimization of the likelihoods is implemented through the R function optim (R Development Core Team, 2018) with the L-BFGS-B method. We use $$10^{-6}$$ as the threshold to determine whether the estimated parameter is on the boundary. Sensitivity analyses yield similar results when the threshold is varied from $$10^{-3}$$ to $$10^{-8}$$. Table 1 summarizes the Type I errors based on 5000 simulations. The naive composite likelihood ratio test has empirical Type I errors that are much smaller than the nominal levels, but the modified composite likelihood ratio test performs reasonably well. Since the modified likelihood ratio test cannot reject the null hypothesis when all the parameter estimates are on the boundary, the theoretical false positive probabilities for the proposed test corresponding to a nominal level of $$\alpha$$ can only be attained at $$\alpha\times \{1-\mathrm{pr}(\text{all }\hat \beta =0)\}$$; these probabilities are listed in the second and fourth rows in the mCLRT part of Table 1. Table 1. Type I errors $$(\times 100)$$ of the modified composite likelihood ratio test and the naive composite likelihood ratio test, for $$H_0: \beta_1=\beta_2=\beta_3=0$$ at nominal levels $$0{\cdot}05$$ and $$0{\cdot}1$$, where $$\mathrm{logit\,pr}(\,y_{ij}=1 \mid x_{ij}) =\alpha_i$$ and $$\alpha_i \sim {\rm Un}[-1,1]$$           Number of strata     Level (%)  Empirical/Theoretical  25  50  100  200  400  mCLRT  5  Empirical  5$$\cdot$$0  5$$\cdot$$3  5$$\cdot$$1  6$$\cdot$$1  4$$\cdot$$9       Theoretical  4$$\cdot$$3  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4     10  Empirical  8$$\cdot$$3  10$$\cdot$$1  10$$\cdot$$1  10$$\cdot$$7  9$$\cdot$$5       Theoretical  8$$\cdot$$7  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  Naive  5  Empirical  0$$\cdot$$2  0$$\cdot$$0  0$$\cdot$$1  0$$\cdot$$0  0$$\cdot$$0       Theoretical  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0     10  Empirical  0$$\cdot$$2  0$$\cdot$$3  0$$\cdot$$2  0$$\cdot$$4  0$$\cdot$$1        Theoretical  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0           Number of strata     Level (%)  Empirical/Theoretical  25  50  100  200  400  mCLRT  5  Empirical  5$$\cdot$$0  5$$\cdot$$3  5$$\cdot$$1  6$$\cdot$$1  4$$\cdot$$9       Theoretical  4$$\cdot$$3  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4  4$$\cdot$$4     10  Empirical  8$$\cdot$$3  10$$\cdot$$1  10$$\cdot$$1  10$$\cdot$$7  9$$\cdot$$5       Theoretical  8$$\cdot$$7  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  8$$\cdot$$8  Naive  5  Empirical  0$$\cdot$$2  0$$\cdot$$0  0$$\cdot$$1  0$$\cdot$$0  0$$\cdot$$0       Theoretical  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0  5$$\cdot$$0     10  Empirical  0$$\cdot$$2  0$$\cdot$$3  0$$\cdot$$2  0$$\cdot$$4  0$$\cdot$$1        Theoretical  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  10$$\cdot$$0  mCLRT, the modified composite likelihood ratio test; Naive, the naive composite likelihood ratio test; Empirical, the empirical Type I error; Theoretical, the theoretical false positive probability. To compare the power, we let one, two or three $$\beta$$ coefficients vary from 0$$\cdot$$0 to 0$$\cdot$$5 with $$50$$ strata. For the nonzero $$\beta$$ coefficients, we assume equal effect sizes. Panels (a)–(c) of Fig. 1 show the power curves at a 5% nominal level based on 5000 simulations; panels (d)–(f) show the frequencies of $$V =0, 1, 2 \text{ and }3$$ as the nonzero effect size increases from 0$$\cdot$$0 to 0$$\cdot$$5. The proposed test always has larger power than the naive test. The gain in power can be further explained by the proportion of times that $$\beta$$ is estimated as positive. Specifically, when all three $$\beta$$ coefficient estimates are far from the boundary in the data-generating model, most of the time we obtain three positive $$\beta$$ estimates. Thus, both the naive test and the proposed test refer to $$\chi^2_3$$ and perform more similarly, as illustrated in Fig. 1(c) and (f). On the other hand, when there are only one or two nonzero or three nonzero but small $$\beta$$ coefficient estimates, the proposed test frequently uses one or two degrees of freedom, even as the additional $$\beta$$ values get larger; see Fig. 1. There is a substantial gain in power in these situations. Fig. 1. View largeDownload slide Power comparisons for the matched case-control study example when one, two or three testing parameters are positive in the data-generating model: (a)–(c) power of the modified composite likelihood ratio test ($$\bullet$$), the naive test ($$\bigtriangleup$$), and the composite likelihood ratio test ($$\diamondsuit$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50; (d)–(f) the proportion of times that $${\beta}$$ coefficients were estimated as positive when $$V=0$$ ($$\bullet$$), 1 ($$\bigtriangleup$$), 2 ($$\blacksquare$$) or 3 ($$\boxplus$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50. Fig. 1. View largeDownload slide Power comparisons for the matched case-control study example when one, two or three testing parameters are positive in the data-generating model: (a)–(c) power of the modified composite likelihood ratio test ($$\bullet$$), the naive test ($$\bigtriangleup$$), and the composite likelihood ratio test ($$\diamondsuit$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50; (d)–(f) the proportion of times that $${\beta}$$ coefficients were estimated as positive when $$V=0$$ ($$\bullet$$), 1 ($$\bigtriangleup$$), 2 ($$\blacksquare$$) or 3 ($$\boxplus$$) as the nonzero effect size increases from 0 to 0$$\cdot$$5, with the number of strata equal to 50. We also investigate the potential power loss due to the conditioning procedure and the modification. The critical value of the composite likelihood ratio test is obtained by calculating the 95% quantile of 5000 test statistics based on data generated under the null hypothesis. As indicated by Fig. 1(a)–(c), there is a moderate loss of power. However, theoretical calculation of the critical value of the composite likelihood ratio test can be tedious and must be done case by case. Therefore, the modified composite likelihood ratio test serves as an useful alternative at the cost of moderate power loss. 5. Discussion The proposed modification also applies to the composite likelihood when nuisance parameters are present. Assume that $$\gamma^{{\mathrm{T}}}=(\theta^{{\mathrm{T}}},\lambda^{{\mathrm{T}}})$$, where $$\lambda$$ is a finite-dimensional nuisance parameter and $$\theta^{{\mathrm{T}}}=(\zeta^{{\mathrm{T}}}, \eta^{{\mathrm{T}}})$$. Similarly, we test the null hypothesis $$H_0: \zeta=0 \text{ and } \eta=\eta_0$$, where $$0$$ is on the boundary of the parameter space for $$\zeta$$ and $$\eta_0$$ is in the interior of the parameter space. Let $$\ell_{\rm pc}(\theta)=\ell_{\rm c}(\theta,\hat\lambda_\theta)$$ be the profile composite likelihood for $$\theta$$, where $$\hat\lambda_\theta=\arg\max_{\lambda}\ell_{\rm c}(\theta,\lambda)$$. Let   \begin{align*} U_{\rm pc}(\theta)& =\frac{\partial \ell_{\rm pc}(\theta)}{\partial\theta}, \quad H_{\rm p} =\lim_{N\rightarrow\infty}{-}\frac{1}{N}E\!\left\{\frac{\partial^2\ell_{\rm pc}(\theta)}{\partial\theta^{{\mathrm{T}}}\partial\theta}\right\}\!,\\ V_{\rm p}&=\lim_{N\rightarrow\infty}\frac{1}{N}E\!\left[\left\{\frac{\partial\ell_{\rm pc}(\theta)}{\partial\theta}\right\}\left\{\frac{\partial\ell_{\rm pc}(\theta)}{\partial\theta}\right\}^{{\mathrm{T}}}\right] \end{align*} be, respectively, the profile composite score function, sensitivity matrix and variability matrix for $$\theta$$, and let $$\hat{H}_{\rm p}$$ and $$\hat{V}_{\rm p}$$ denote the corresponding estimates; also define $$\hat{H}_{\rm pA}=\hat H_{\rm p}\hat V_{\rm p}^{-1} \hat{H}_{\rm p}$$. The modified profile composite likelihood is $$\ell_{\rm MP}(\theta)=\ell_{\rm pc}(\hat\theta_{\rm c})-\{T_{\rm p}(\theta)^{{\mathrm{T}}} \hat{H}_{\rm pA}T_{\rm p}(\theta)\}\phi_{\rm p}(\theta)$$ where $$T_{\rm p}(\theta)=N^{-1/2}\hat{H}_{\rm p}^{-1} U_{\rm pc}(\hat\theta_{\rm c})-N^{1/2}(\theta-\hat\theta_{\rm c})$$ and   $$\phi_{\rm p}(\theta)=\frac{\ell_{\rm pc}(\theta)-\ell_{\rm pc}(\hat\theta_{\rm c})}{{-}T_{\rm p}(\theta)^{{\mathrm{T}}} \hat{H}_{\rm p}T_{\rm p}(\theta){+}N^{-1}U_{\rm pc}(\hat\theta_{\rm c})^{{\mathrm{T}}} \hat{H}_{\rm p}^{-1}U_{\rm pc}(\hat\theta_{\rm c})}\text{.}$$ We define the modified profile composite likelihood ratio statistic to be $$W_{\rm p}=2\{\ell_{\rm MP}(\hat{\theta}_{\rm M}) - \ell_{\rm MP}(0,\eta_0)\},$$ where $$\hat \theta_{\rm M}=\arg\max_\theta\ell_{\rm MP}(\theta)$$. Here $$\hat{H}_{\rm p}$$ and $$\hat{V}_{\rm p}$$ can be calculated empirically; see the Supplementary Material. When the parameter is on the boundary of the parameter space, Andrews (2000) established the inconsistency of the standard nonparametric and parametric bootstrap methods and proposed subsampling and $$m$$-out-of-$$n$$ bootstrap methods for obtaining consistent estimators of the limiting distribution. However, both these methods require additional tuning parameters. Our proposed method is computationally simpler and free of unknown tuning parameters. Acknowledgement The first four authors are truly indebted to the fifth author, Professor Bruce G. Lindsay of Pennsylvania State University, for his inspiration and encouragement; he will be remembered as a great scholar. The authors thank the referee, associate editor and editor for comments that substantially improved this work. This research was supported in part by the National Institutes of Health, U.S.A. Supplementary material Supplementary material available at Biometrika online includes regularity conditions, the proof of the asymptotic expansion in equation (2), the proof of Theorem 1, and a more general result involving nuisance parameters. References Andrews, D. W. ( 2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica  68, 399– 405. Google Scholar CrossRef Search ADS   Bartholomew, D. J. ( 1961). A test of homogeneity of means under restricted alternatives. J. R. Statist. Soc. B  23, 239– 81. Besag, J. E. ( 1974). Spatial interaction and the statistical analysis of lattice systems. J. R. Statist. Soc. B  36, 192– 236. Chandler, R. E. & Bate, S. ( 2007). Inference for clustered data using the independence loglikelihood. Biometrika  94, 167– 83. Google Scholar CrossRef Search ADS   Chen, Y. & Liang, K.-Y. ( 2010). On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems. Biometrika  97, 603– 20. Google Scholar CrossRef Search ADS PubMed  Chen, Y. , Ning, J., Ning, Y., Liang, K.-Y. & Bandeen-Roche, K. ( 2017). On pseudolikelihood inference for semiparametric models with boundary problems. Biometrika  104, 165– 79. Google Scholar CrossRef Search ADS PubMed  Gong, G. & Samaniego, F. J. ( 1981). Pseudo maximum likelihood estimation: Theory and applications. Ann. Statist.  9, 861– 9. Google Scholar CrossRef Search ADS   Liang, K.-Y. ( 1987). Extended Mantel-Haenszel estimating procedure for multivariate logistic regression models. Biometrics  43, 289– 99. Google Scholar CrossRef Search ADS PubMed  Lindsay, B. G. ( 1988). Composite likelihood methods. Contemp. Math.  80, 221– 39. Google Scholar CrossRef Search ADS   Molenberghs, G. & Verbeke, G. ( 2005). Models for Discrete Longitudinal Data . New York: Springer. Pace, L. , Salvan, A. & Sartori, N. ( 2011). Adjusting composite likelihood ratio statistics. Statist. Sinica  21, 129– 48. R Development Core Team ( 2018). R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.http://www.R-project.org. Self, S. G. & Liang, K.-Y. ( 1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Statist. Assoc.  82, 605– 10. Google Scholar CrossRef Search ADS   Shapiro, A. ( 1985). Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika  72, 133– 44. Google Scholar CrossRef Search ADS   Susko, E. ( 2013). Likelihood ratio tests with boundary constraints using data-dependent degrees of freedom. Biometrika  100, 1019– 23. Google Scholar CrossRef Search ADS   Varin, C. , Reid, N. & Firth, D. ( 2011). An overview of composite likelihood methods. Statist. Sinica  21, 5– 42. © 2017 Biometrika Trust

### Journal

BiometrikaOxford University Press

Published: Mar 1, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Search Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly ### Organize Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place. ### Access Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. DeepDyve ### Freelancer DeepDyve ### Pro Price FREE$49/month
\$360/year

Save searches from
PubMed

Create lists to

Export lists, citations