# On bias reduction and incidental parameters

## Summary

Firth (1993) introduced a method for reducing the bias of the maximum likelihood estimator. Here it is shown that the approach is also effective in reducing the sensitivity of inferential procedures to incidental parameters.

## 1. Introduction

Consider the model $$\{f_i(y;\psi,\lambda_i),\,i\,{=}\,1,\ldots,q\}$$ for independent stratified observations as in Sartori (2003). The unknown parameter is $$\theta\,{=}\,(\psi,\lambda)$$, where $$\psi$$ is a $$p_0$$-dimensional parameter of interest and $$\lambda\,{=}\,(\lambda_1,\ldots,\lambda_q)$$ is a $$q$$-dimensional nuisance parameter. The loglikelihood function for $$\theta$$ based on the independent observations $$y_{i}=(y_{i1},\ldots,y_{im_i})$$ is
\begin{equation*} \ell(\theta)=\sum_{i=1}^{q}\sum_{j=1}^{m_i} \log f_i(y_{ij};\psi,\lambda_i)=\sum_{i=1}^{q} \ell_i(\psi,\lambda_{i})\text{.} \end{equation*}
The dimension $$q$$ of the nuisance parameter, or equivalently the number of strata, and the number $$m_i$$ of observations in each stratum may diverge. Neyman & Scott (1948) refer to $$\psi$$ and $$\lambda$$ as the structural and incidental parameters, respectively. Since their contribution, obtaining a consistent estimator of the structural parameter has been a challenge for statisticians. In the likelihood framework, this problem can be tackled by resorting to marginal, conditional and profile likelihoods (Kalbfleisch & Sprott, 1970). Typically, marginal and conditional likelihoods provide consistent estimators of the structural parameter, but their specification requires partially sufficient and partially distribution-constant statistics, respectively, so the availability of such likelihoods is restricted to exponential and composite group families (Pace & Salvan, 1997, §§ 5.4 and 7.5).
The derivation of the profile likelihood is not troublesome in itself, but the resulting estimator of the structural parameter is usually inconsistent, because the profile likelihood is not a genuine likelihood and therefore the Bartlett identities do not hold. For instance, failure of the first identity entails that the bias of the profile score function is $$O(q)$$. Using modified profile likelihoods can be regarded as a way to recover, at least approximately, the first Bartlett identity and thus obtain more reliable inference about the structural parameter (Barndorff-Nielsen, 1983; Cox & Reid, 1987; McCullagh & Tibshirani, 1990; DiCiccio et al., 1996; Stern, 1997). Under the assumption that the numbers of observations in the strata are bounded by positive finite numbers proportional to an integer $$m$$, Sartori (2003) showed that the bias of modified profile score functions is $$O(q/m)$$. This result is used to establish the limiting properties of inferential procedures derived from modified profile likelihoods in terms of $$m$$, $$q$$ and $$n=mq$$. Specifically, if $$q/m^3=o(1)$$, then the derived estimator is root-$$n$$ consistent and the modified profile Wald, score and loglikelihood ratio test statistics differ by a relative error of order $$n^{-1/2}$$; otherwise the error rates are suitable powers of $$m$$.

In this note we show that the bias reduction approach of Firth (1993) provides an inferential framework which is, from an asymptotic perspective, equivalent to that of modified profile likelihoods when dealing with incidental parameters. This equivalence allows a user to choose between bias reduction and modified profile likelihoods on finite-sample and subject-matter grounds. For example, neither approach is in general invariant under interest-respecting reparameterizations, and software is available for many common models (Brazzale, 2005; Kosmidis, 2017).
On the other hand, bias reduction can handle the problem of monotone likelihood (see, e.g., Firth, 1993; Heinze & Schemper, 2001, 2002; Sartori, 2006; Kosmidis, 2014).

## 2. Notation and relevant quantities

Vectors and matrices are partitioned into components pertaining to the structural and incidental parameters. For instance, the score vector $$\ell_\theta(\theta)=\partial\ell(\theta)/\partial\theta$$ has components $$\ell_\psi(\theta)$$ and $$\ell_\lambda(\theta)$$, while the observed information matrix $$k(\theta)=-\partial^2\ell(\theta)/(\partial\theta\,\partial\theta^{ \mathrm{\scriptscriptstyle T} })$$ has blocks $$k_{\psi\psi}(\theta)$$, $$k_{\lambda\lambda}(\theta)$$ and $$k_{\psi\lambda}(\theta)$$. Blocks of the inverse of a matrix are indicated by superscripts, e.g., $$k^{\psi\psi}(\theta)$$.

The bias reduction approach is based on the adjusted score function (Firth, 1993, § 2)
$$\bar\ell_\theta(\theta)=\ell_\theta(\theta)-\nu(\theta)\Delta(\theta), \tag{1}$$
where $$\nu(\theta)$$ can be either $$k(\theta)$$ or the expected information matrix $$i(\theta)$$, and $$\Delta(\theta)$$ is the leading term of the bias of the maximum likelihood estimator. The adjusted profile score function for $$\psi$$ is $$\bar\ell_\psi\{\bar\theta(\psi)\}$$, where $$\bar\theta(\psi)=\{\psi,\bar\lambda(\psi)\}$$ is the root of $$\bar\ell_\lambda(\theta)=0$$ for fixed $$\psi$$.
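As a concrete one-parameter instance of (1), not taken from this note, consider a single binomial observation $$y$$ out of $$n$$ trials with canonical parameter $$\theta=\log\{p/(1-p)\}$$. With $$\nu(\theta)=i(\theta)=np(1-p)$$, the adjustment term equals $$\mathrm{d}\{\tfrac{1}{2}\log i(\theta)\}/\mathrm{d}\theta=\tfrac{1}{2}-p$$ (Firth, 1993), so the adjusted score is $$y+\tfrac{1}{2}-(n+1)p$$ and its root corresponds to $$\bar{p}=(y+\tfrac{1}{2})/(n+1)$$, which remains finite at $$y=0$$, where the maximum likelihood estimate of $$\theta$$ diverges. A minimal sketch:

```python
import math

def p_of(theta):
    # inverse logit
    return 1.0 / (1.0 + math.exp(-theta))

def adjusted_score(theta, y, n):
    # u*(theta) = (y - n p) + (1/2 - p) = y + 1/2 - (n + 1) p
    return y + 0.5 - (n + 1) * p_of(theta)

def solve_adjusted(y, n, lo=-30.0, hi=30.0, tol=1e-10):
    # the adjusted score is strictly decreasing in theta, so bisection works
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if adjusted_score(mid, y, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# the numerical root matches the closed form p_bar = (y + 1/2)/(n + 1)
assert abs(p_of(solve_adjusted(7, 20)) - 7.5 / 21) < 1e-8
# at y = 0 the maximum likelihood estimate is -infinity, yet the
# adjusted score still has a finite root
assert abs(p_of(solve_adjusted(0, 20)) - 0.5 / 21) < 1e-8
```

The boundary case $$y=0$$ is exactly the monotone likelihood situation that the adjustment resolves.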
Denoting by $$\bar\psi$$ the solution to $$\bar\ell_\psi\{\bar\theta(\psi)\}=0$$ and writing $$\bar{k}(\theta)=-\partial\bar\ell_\theta(\theta)/\partial\theta^{ \mathrm{\scriptscriptstyle T} }$$, the adjusted profile Wald and score test statistics are, respectively,
\begin{equation*} \bar W^e(\psi)=(\bar\psi-\psi)^{ \mathrm{\scriptscriptstyle T} } [\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}]^{-1}(\bar\psi-\psi),\quad \bar W^u(\psi)=\bar\ell_\psi\{\bar\theta(\psi)\}^{ \mathrm{\scriptscriptstyle T} }\,\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}\,\bar\ell_\psi\{\bar\theta(\psi)\}, \end{equation*}
and $$\bar W(\psi)=2[\bar\ell\{\bar\theta(\bar\psi)\}-\bar\ell\{\bar\theta(\psi)\}]$$ is the adjusted profile loglikelihood ratio test statistic, where $$\bar\ell(\theta)$$ is the adjusted loglikelihood function. Existence of $$\bar\ell(\theta)$$ is guaranteed, for instance, when $$\theta$$ is the canonical parameter in exponential family models; see Kosmidis & Firth (2009).

## 3. Reconciliation

### 3.1. Main results

The results presented here hold under the conditions in Sartori (2003); the conditions are reviewed, and the proofs sketched, in the Supplementary Material. The bias of the adjusted profile score function has the expansion
$$E_{\theta}[\bar\ell_\psi\{\bar\theta(\psi)\}]=\tau_{\psi}(\theta)-\{i^{\psi\psi}(\theta)\}^{-1}\Delta_\psi(\theta)+O(q/m)=O\{\max(1,\,q/m)\}, \tag{2}$$
where $$\tau_{\psi}(\theta)$$ is the leading term of the bias of the profile score function. Expression (2) highlights the fact that the bias of the adjusted profile score function has components arising from the profile score function, $$\tau_{\psi}(\theta)$$, and from the bias reduction approach, $$-\{i^{\psi\psi}(\theta)\}^{-1}\Delta_\psi(\theta)$$. Although both terms are $$O(q)$$, the second equals $$-\tau_{\psi}(\theta)+O(1)$$ and so compensates for most of the profile score bias. The order of (2) is the same as that of the bias of a modified profile score function, i.e., $$O(q/m)$$, when $$q/m\neq o(1)$$.
However, in some cases the bias of the adjusted profile score function is asymptotically smaller than the bound given in (2), and can even be zero, as shown in the following example.

**Example 1** (Neyman–Scott problem). Let $$(y_{i1},\ldots,y_{im})$$ be $$m$$ independent realizations of normal random variables $$Y_{ij}$$ with means $$\lambda_i$$ and common variance $$\psi$$ $$(i=1,\ldots,q;\,j=1,\ldots,m)$$. Consider the case $$m=2$$, as in Firth (1993, § 4.5). The profile score function is $$q(\hat\psi-\psi)/\psi^2$$, while, when (1) is computed with $$\nu(\theta)=i(\theta)$$, the adjusted profile score function is $$\bar\ell_\psi\{\bar\theta(\psi)\}=q(2\hat\psi-\psi)/(2\psi^2)$$, where $$\hat\psi=\sum_{i=1}^q(Y_{i1}-Y_{i2})^2/(4q)$$ is the maximum likelihood estimator of $$\psi$$. Direct computation gives $$E_{\theta}[\bar\ell_\psi\{\bar\theta(\psi)\}]=0$$.

The result in (2) is used to derive the expansion of the adjusted profile score function,
$$\bar\ell_\psi\{\bar\theta(\psi)\}=\ell_{\psi}(\theta)-i_{\psi\lambda}(\theta)\{i_{\lambda\lambda}(\theta)\}^{-1}\ell_{\lambda}(\theta) + O_{\rm p}\{\max(q/m, \,q^{1/2})\}; \tag{3}$$
this and the expansion of the observed adjusted profile information,
$$[\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}]^{-1}=\{i^{\psi\psi}(\theta)\}^{-1}\{1+O_{\rm p}(m^{-1})+O_{\rm p}(n^{-1/2})\}=O(n)\{1+o_{\rm p}(1)\}, \tag{4}$$
provide the key quantities needed to establish the limiting behaviour of inferential procedures derived from the bias reduction approach. Expansions (3) and (4) are asymptotically equivalent to their counterparts derived from modified profile likelihoods, which implies that inferential procedures based on bias reduction and on modified profile likelihoods have the same limiting behaviour.

### 3.2. Inference

Following the arguments of Sartori (2003), the limiting properties of inferential procedures depend on the asymptotic normality of the studentized adjusted profile score function
$$[\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}]^{1/2}\bar\ell_\psi\{\bar\theta(\psi)\}=O_{\rm p}(1) + O_{\rm p}[\max\{(q/m^3)^{1/2}, m^{-1/2}\}], \tag{5}$$
where $$\{\bar{k}^{\psi\psi}(\theta)\}^{1/2}$$ is the matrix square root of $$\bar{k}^{\psi\psi}(\theta)$$. The term that is bounded in probability is $$\{i^{\psi\psi}(\theta)\}^{1/2}[\ell_{\psi}(\theta)-i_{\psi\lambda}(\theta)\{i_{\lambda\lambda}(\theta)\}^{-1}\ell_{\lambda}(\theta)]$$ and asymptotically follows a $$p_0$$-variate standard normal distribution. Asymptotic normality requires $$O_{\rm p}[\max\{(q/m^3)^{1/2}, m^{-1/2}\}]=o_{\rm p}(1)$$, and a sufficient condition is $$q/m^3=o(1)$$, as found by Sartori (2003) for modified profile likelihoods.

Expansion of $$\bar{\ell}_\psi\{\bar{\theta}(\bar\psi)\}$$ about $$\psi$$, followed by inversion, yields
\begin{eqnarray*} (\bar\psi-\psi)&=&\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}\,\bar{\ell}_\psi\{\bar{\theta}(\psi)\}+O_{\rm p}\{ \lVert\bar\psi-\psi\rVert^2 \}\notag\\ &=&O_{\rm p}(n^{-1/2})+O_{\rm p}[\max\{1/(m q^{1/2}),\, m^{-2}\}]\text{.} \end{eqnarray*}
The estimator $$\bar\psi$$ is root-$$n$$ consistent if $$q/m^3=o(1)$$; otherwise it is $$m^2$$-consistent. From (5), the limiting distribution of $$\bar W^u(\psi)$$ is chi-squared with $$p_0$$ degrees of freedom, provided that $$q/m^3=o(1)$$. The result also holds for $$\bar W^e(\psi)$$ and $$\bar W(\psi)$$, because they are asymptotically equivalent to $$\bar W^u(\psi)$$ up to a relative error of magnitude $$n^{-1/2}$$ or $$m^{-2}$$, depending on whether the condition $$q/m^3=o(1)$$ is met or not.

## 4. Empirical evidence

### 4.1. Binary matched observations

Let $$\psi$$ be the common odds ratio in $$2\times2$$ tables arising from a series of binary observations $$(y_{i1}, y_{i2})$$ $$(i\,{=}\,1,\ldots,q)$$.
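Example 1 lends itself to a quick numerical check. The sketch below (illustrative code, not from the note) simulates $$q$$ strata of size $$m=2$$: the maximum likelihood estimator $$\hat\psi$$ concentrates around $$\psi/2$$ and is therefore inconsistent, while the root $$\bar\psi=2\hat\psi$$ of the adjusted profile score is exactly unbiased for $$\psi$$.

```python
import random

random.seed(1)
q, psi_true = 20000, 1.0          # many strata, two observations each
sd = psi_true ** 0.5

ss = 0.0
for _ in range(q):
    lam = random.gauss(0.0, 3.0)  # incidental stratum mean; cancels in y1 - y2
    y1 = random.gauss(lam, sd)
    y2 = random.gauss(lam, sd)
    ss += (y1 - y2) ** 2

psi_hat = ss / (4 * q)            # maximum likelihood estimator
psi_bar = 2 * psi_hat             # root of the adjusted profile score

assert abs(psi_hat - psi_true / 2) < 0.05   # inconsistent: limit is psi/2
assert abs(psi_bar - psi_true) < 0.1        # bias-reduced: centred at psi
```

No number of strata rescues $$\hat\psi$$ here: the inconsistency comes from $$m$$ being fixed, not from $$q$$ being small.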
The latter are realizations of independent binomial random variables $$(Y_{i1}, Y_{i2})$$ with denominators $$1$$ and $$m_i$$ and with success probabilities $$\pi_{i1}=\exp(\psi+\lambda_i)/\{1+\exp(\psi+\lambda_i)\}$$ and $$\pi_{i2}\,{=}\,\exp(\lambda_i)/\{1+\exp(\lambda_i)\}$$, respectively. The loglikelihood for $$\theta=(\psi,\lambda_1, \ldots,\lambda_q)$$ is
\begin{align*} \ell(\theta)=\psi\sum_{i=1}^q y_{i1} + \sum_{i=1}^q \lambda_i(y_{i1}+y_{i2})-\sum_{i=1}^q\bigl[\log\{1+\exp(\psi+\lambda_i)\} + m_i\log\{1+\exp(\lambda_i)\}\bigr]\text{.} \end{align*}
This is a full-rank exponential family model with $$t=\sum_{i=1}^q y_{i1}$$ and $$s_i=y_{i1}+y_{i2}$$ the sufficient statistics for $$\psi$$ and $$\lambda_i$$, respectively. The conditional likelihood is based on the distribution of $$Y_{i1}$$ given $$S_{i}=s_i$$ in each stratum (Davison, 1988). Bias reduction is achieved via the adjusted loglikelihood $$\bar\ell(\theta)=\ell(\theta) + \log| k(\theta)|^{1/2}$$, because $$\partial\bar\ell(\theta)/\partial\theta=\ell_\theta(\theta)-i(\theta)\Delta(\theta)$$ (Firth, 1993, § 3.1).

We compare the finite-sample bias and variance of $$\bar\psi$$ with those of estimators derived from conditional, profile and modified profile likelihoods, denoted by $$\hat\psi_{\text{C}}$$, $$\hat\psi$$ and $$\hat\psi_{\text{M}}$$, respectively. The comparison is also extended to assess the coverage probability and length of 95% confidence intervals for $$\psi$$ based on the chi-squared approximation to the distribution of $$\bar W(\psi)$$ and to the distributions of the conditional, profile and modified profile loglikelihood ratios, denoted by $$W_{\text{C}}(\psi)$$, $$W(\psi)$$ and $$W_{\text{M}}(\psi)$$, respectively. Given the totals in each table, estimators and confidence intervals depend on $$T=\sum_{i=1}^q Y_{i1}$$, so it is feasible to obtain their exact properties through complete enumeration.
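The adjusted loglikelihood $$\bar\ell(\theta)=\ell(\theta)+\log|k(\theta)|^{1/2}$$ can be evaluated cheaply in this model because $$k(\theta)$$ has an arrowhead structure: $$k_{\lambda\lambda}(\theta)$$ is diagonal, so $$|k(\theta)|$$ follows from a Schur complement. The function below is an illustrative sketch written for this note, with hypothetical argument names; `y1[i]` lies in $$\{0,1\}$$ and `y2[i]` in $$\{0,\ldots,m_i\}$$.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adjusted_loglik(psi, lam, y1, y2, m):
    """ell(theta) + 0.5 * log det k(theta) for the matched binary model."""
    ell = 0.0
    b = []  # b[i] = pi_i1 (1 - pi_i1): the k_{psi lambda_i} entries
    c = []  # c[i] = b[i] + m_i pi_i2 (1 - pi_i2): diagonal of k_{lambda lambda}
    for i in range(len(lam)):
        p1 = sigmoid(psi + lam[i])
        p2 = sigmoid(lam[i])
        ell += psi * y1[i] + lam[i] * (y1[i] + y2[i])
        ell -= math.log1p(math.exp(psi + lam[i])) + m[i] * math.log1p(math.exp(lam[i]))
        b.append(p1 * (1.0 - p1))
        c.append(p1 * (1.0 - p1) + m[i] * p2 * (1.0 - p2))
    a = sum(b)  # k_{psi psi}
    # |k| = (prod_i c_i) * (a - sum_i b_i^2 / c_i): Schur complement of the
    # diagonal block; the second factor is positive because c_i > b_i
    logdet = sum(math.log(ci) for ci in c)
    logdet += math.log(a - sum(bi * bi / ci for bi, ci in zip(b, c)))
    return ell + 0.5 * logdet
```

The penalty $$\tfrac{1}{2}\log|k(\theta)|$$ decreases without bound as the fitted probabilities degenerate, which is how the adjustment prevents monotone likelihood (Firth, 1993).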
We consider $$m_i=m$$, tables with totals $$s_i=(m+1)/2$$ as in Sartori (2003, Example 3), and compute the distribution of $$T$$ given $$S_1=s_1,\ldots,S_q=s_q$$ numerically following Butler & Stephens (2017). Table 1 reports the bias and variance of the estimators, together with the coverage probabilities and average lengths of 95% confidence intervals, when the true odds ratio is unity, with $$m\in\{1, 3, 11, 39\}$$ and $$q\,{\in}\,\{30, 100, 1000\}$$. The adjustment in $$\bar\ell(\theta)$$ prevents monotonicity when $$t\in\{0,q\}$$ (Firth, 1993), but monotonicity does occur for the remaining loglikelihoods. For these latter functions the values $$t\in\{0,q\}$$ must be excluded from the analysis, so summaries are computed only for $$t\in\{1, \ldots, q-1\}$$; that is, the distribution of $$T$$ given $$S_1=s_1,\ldots,S_q=s_q$$ is restricted and normalized within this set.

Overall, the bias and variance of the estimators $$\hat\psi_{\text{M}}$$, $$\bar\psi$$ and $$\hat\psi$$ decrease as $$m$$ increases, with the reduction being slower for $$\hat\psi$$. The estimator $$\hat\psi_{\text{M}}$$ tends to behave like $$\hat\psi_{\text{C}}$$ as $$m$$ increases, confirming that the modified profile likelihood approximates the conditional likelihood. For $$q\in\{30,100\}$$, the estimator $$\bar\psi$$ has smaller bias and variance than $$\hat\psi_{\text{M}}$$, and this is also the case when it is compared with $$\hat\psi_{\text{C}}$$ for $$m>3$$. Concerning confidence intervals, the discreteness of $$W_{\text{C}}(\psi)$$ and the chi-squared approximation to its distribution entail that the corresponding intervals achieve the nominal coverage when $$q=1000$$. Intervals derived from $$W_{\text{M}}(\psi)$$ and $$\bar W(\psi)$$ exhibit the same coverage in almost all settings and are consistent with those from $$W_{\text{C}}(\psi)$$ for $$m\geqslant 3$$.
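For the simplest case $$m_i=1$$ the enumeration can be written down directly: given $$S_i=1$$, each $$Y_{i1}$$ is Bernoulli with success probability $$e^{\psi}/(1+e^{\psi})$$, so $$T$$ given the totals is binomial, restricted and renormalized on $$\{1,\ldots,q-1\}$$, where $$\hat\psi_{\text{C}}=\log\{t/(q-t)\}$$ is finite. The sketch below is illustrative only; it is not the computation behind Table 1, which follows Butler & Stephens (2017) for general $$m$$.

```python
import math

def exact_bias_conditional(psi, q):
    # T | S_1 = ... = S_q = 1  ~  Bin(q, p) with p = e^psi / (1 + e^psi),
    # restricted and renormalized on t in {1, ..., q - 1}, where the
    # conditional estimator log{t / (q - t)} is finite
    p = math.exp(psi) / (1.0 + math.exp(psi))
    mass = mean = 0.0
    for t in range(1, q):
        prob = math.comb(q, t) * p ** t * (1.0 - p) ** (q - t)
        mass += prob
        mean += prob * math.log(t / (q - t))
    return mean / mass - psi

# at psi = 0 the restricted distribution is symmetric about q/2, so the
# conditional estimator is exactly unbiased there
assert abs(exact_bias_conditional(0.0, 30)) < 1e-9
```

The same device, summing a functional of $$T$$ against its exact restricted distribution, yields variances and coverage probabilities as well.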
Results not reported here show that unconditional and conditional summaries for the bias reduction inferential procedures are indistinguishable at the precision shown in Table 1.

*Table 1. Binary matched observations with true odds ratio $$\psi=1$$: the second to fifth columns show the bias and variance (in parentheses) of the estimators derived from the conditional, modified profile, adjusted profile and profile likelihoods, with all entries multiplied by $$10$$; the sixth to ninth columns show the coverage probability and average length (in parentheses) of confidence intervals with nominal level $$95\%$$ derived from the conditional, modified profile, adjusted profile and profile loglikelihood ratios, with all coverage probabilities multiplied by $$100$$*

| $$m$$ | $$\hat\psi_{\text{C}}$$ | $$\hat\psi_{\text{M}}$$ | $$\bar\psi$$ | $$\hat\psi$$ | $$W_{\text{C}}$$ | $$W_{\text{M}}$$ | $$\bar W$$ | $$W$$ |
|---|---|---|---|---|---|---|---|---|
| $$q=30$$ | | | | | | | | |
| 1 | 0·44 (1·97) | 2·91 (2·33) | 2·76 (2·27) | 10·89 (7·87) | 96·1 (2·0) | 85·0 (1·8) | 93·0 (1·8) | 57·8 (2·8) |
| 3 | | 0·69 (1·82) | 0·48 (1·72) | 3·56 (3·06) | | 96·1 (1·9) | 96·1 (2·1) | 85·0 (2·2) |
| 11 | | 0·46 (1·94) | 0·09 (1·75) | 1·28 (2·23) | | 96·1 (2·0) | 96·1 (2·2) | 92·0 (2·0) |
| 39 | | 0·44 (1·96) | 0·03 (1·75) | 0·68 (2·04) | | 96·1 (2·0) | 96·1 (2·2) | 96·1 (2·0) |
| $$q=100$$ | | | | | | | | |
| 1 | 0·12 (0·53) | 2·79 (0·69) | 2·74 (0·68) | 10·24 (2·11) | 94·6 (1·1) | 77·3 (1·0) | 77·3 (1·0) | 15·0 (1·6) |
| 3 | | 0·48 (0·51) | 0·42 (0·50) | 3·23 (0·84) | | 93·9 (1·1) | 95·7 (1·2) | 69·9 (1·2) |
| 11 | | 0·15 (0·52) | 0·05 (0·51) | 0·96 (0·61) | | 94·6 (1·1) | 94·6 (1·2) | 91·8 (1·2) |
| 39 | | 0·12 (0·53) | 0·01 (0·51) | 0·36 (0·55) | | 94·6 (1·1) | 94·6 (1·2) | 93·9 (1·2) |
| $$q=1000$$ | | | | | | | | |
| 1 | 0·01 (0·05) | 2·74 (0·07) | 2·74 (0·07) | 10·02 (0·20) | 95·0 (0·4) | 6·3 (0·3) | 6·3 (0·3) | 0·0 (0·5) |
| 3 | | 0·40 (0·05) | 0·40 (0·05) | 3·12 (0·08) | | 91·2 (0·4) | 91·2 (0·4) | 4·1 (0·4) |
| 11 | | 0·04 (0·05) | 0·03 (0·05) | 0·85 (0·06) | | 95·0 (0·4) | 95·0 (0·4) | 79·1 (0·4) |
| 39 | | 0·01 (0·05) | 0·01 (0·05) | 0·25 (0·05) | | 95·0 (0·4) | 95·0 (0·4) | 93·6 (0·4) |

To provide a real-data example with different stratum sizes, we reanalyse an example in Davison (1988, § 6.1).
The data come from an experiment intended to assess the effectiveness of rocking motion on the crying of babies and were collected according to a matched case-control design with one case and $$m_i$$ controls per stratum $$(i=1,\ldots,18)$$. Davison (1988) obtained $$\hat\psi_{\text{C}}=1{\cdot}256$$ and $$\hat\psi=1{\cdot}432$$, with estimated standard errors 0·686 and 0·734, respectively. We found that $$\hat\psi_{\text{M}}=1{\cdot}269$$ and $$\bar\psi=1{\cdot}156$$, with estimated standard errors 0·688 and 0·652, respectively.

As the true odds ratio $$\psi$$ is unknown, it is not possible to judge which estimate should be preferred. Nonetheless, we assess the reliability of the estimators by computing their actual bias and variance, conditional on the observed totals, for values of $$\psi$$ ranging from $$-3$$ to $$3$$. The results are reported in Table 2. For all estimators except $$\bar\psi$$, the bias and variance increase sharply as $$\psi$$ approaches $$-3$$ and $$3$$. In the light of this analysis and of the estimates provided by $$\hat\psi_{\text{C}}$$, $$\hat\psi_{\text{M}}$$ and $$\bar\psi$$, we may conclude that $$\bar\psi$$ is preferable, as its mean squared error is smaller at all values of $$\psi$$ considered.

*Table 2. Real-data example: bias and variance (in parentheses) of estimators as the true odds ratio $$\psi$$ varies; estimators derived from conditional, modified profile, adjusted profile and profile likelihoods are denoted by $$\hat\psi_{\mathrm{C}}$$, $$\hat\psi_{\mathrm{M}}$$, $$\bar\psi$$ and $$\hat\psi$$, respectively, and all entries are multiplied by $$10$$*

| Estimator | $$\psi=-3$$ | $$-2$$ | $$-1$$ | $$0$$ | $$1$$ | $$2$$ | $$3$$ |
|---|---|---|---|---|---|---|---|
| $$\hat\psi_{\text{C}}$$ | 2·75 (21·43) | −0·61 (4·16) | −0·12 (3·00) | 0·36 (3·30) | 0·76 (4·73) | −3·49 (8·06) | −17·25 (15·00) |
| $$\hat\psi_{\text{M}}$$ | 2·08 (11·54) | −1·31 (4·37) | −0·42 (3·23) | 0·34 (3·42) | 0·88 (4·83) | −3·31 (8·22) | −17·12 (15·30) |
| $$\bar\psi$$ | −0·14 (3·51) | −0·68 (3·85) | −0·17 (2·95) | 0·07 (3·03) | 0·18 (4·55) | 0·18 (7·49) | −1·48 (7·45) |
| $$\hat\psi$$ | −5·34 (21·52) | −5·49 (7·82) | −2·02 (4·48) | 0·37 (4·40) | 2·33 (6·35) | −0·91 (11·01) | −15·18 (20·36) |

## Acknowledgement

I wish to thank G. Adimari, N. Sartori, R. De Bin, D. O. Scharfstein and N. L. Hjort for helpful discussions, as well as the referees and the editor for their comments.

## Supplementary material

The Supplementary Material available at Biometrika online includes the conditions and sketches of the proofs for the claims in § 3.

## References

- Barndorff-Nielsen, O. E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343–65.
- Brazzale, A. R. (2005). hoa: An R package bundle for higher order likelihood inference. Rnews 5, 20–7.
- Butler, K. & Stephens, M. A. (2017). The distribution of a sum of independent binomial random variables. Methodol. Comp. Appl. Prob. 19, 557–71.
- Cox, D. R. & Reid, N. (1987). Parameter orthogonality and approximate conditional inference (with Discussion). J. R. Statist. Soc. B 49, 1–39.
- Davison, A. C. (1988). Approximate conditional inference in generalized linear models. J. R. Statist. Soc. B 50, 445–61.
- DiCiccio, T. J., Martin, M. A., Stern, S. E. & Young, G. A. (1996). Information bias and adjusted profile likelihoods. J. R. Statist. Soc. B 58, 189–203.
- Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
- Heinze, G. & Schemper, M. (2001). A solution to the problem of monotone likelihood in Cox regression.
Biometrics 57, 114–9.
- Heinze, G. & Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statist. Med. 21, 2409–19.
- Kalbfleisch, J. D. & Sprott, D. A. (1970). Application of likelihood methods to models involving large numbers of parameters. J. R. Statist. Soc. B 32, 175–208.
- Kosmidis, I. (2014). Improved estimation in cumulative link models. J. R. Statist. Soc. B 76, 169–96.
- Kosmidis, I. (2017). brglm2: Bias reduction in generalized linear models. R package version 0.1.4.
- Kosmidis, I. & Firth, D. (2009). Bias reduction in exponential family nonlinear models. Biometrika 96, 793–804.
- McCullagh, P. & Tibshirani, R. J. (1990). A simple method for the adjustment of profile likelihoods. J. R. Statist. Soc. B 52, 325–44.
- Neyman, J. & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
- Pace, L. & Salvan, A. (1997). Principles of Statistical Inference: From a Neo-Fisherian Perspective. Singapore: World Scientific.
- Sartori, N. (2003). Modified profile likelihoods in models with stratum nuisance parameters. Biometrika 90, 533–49.
- Sartori, N. (2006). Bias prevention of maximum likelihood estimates for scalar skew normal and skew $$t$$ distributions. J. Statist. Plan. Infer. 136, 4259–75.
- Stern, S. E. (1997). A second-order adjustment to the profile likelihood in the case of a multidimensional parameter of interest. J. R. Statist. Soc. B 59, 653–65.

© 2018 Biometrika Trust

Biometrika, Volume 105, Issue 1, March 2018, 6 pages. Publisher: Oxford University Press. ISSN 0006-3444, eISSN 1464-3510. DOI: 10.1093/biomet/asx079.

### Abstract

Summary Firth (1993) introduced a method for reducing the bias of the maximum likelihood estimator. Here it is shown that the approach is also effective in reducing the sensitivity of inferential procedures to incidental parameters. 1. Introduction Consider the model $$\{f_i(y;\psi,\lambda_i),\,i\,{=}\,1,\ldots,q\}$$ for independent stratified observations as in Sartori (2003). The unknown parameter is $$\theta\,{=}\,(\psi,\lambda)$$, where $$\psi$$ is a $$p_0$$-dimensional parameter of interest and $$\lambda\,{=}\,(\lambda_1,\ldots,\lambda_q)$$ is a $$q$$-dimensional nuisance parameter. The loglikelihood function for $$\theta$$ based on independent observations $$y_{ij}=(y_{i1},\ldots,y_{im_i})$$ is   \begin{equation*} \ell(\theta)=\sum_{i=1}^{q}\sum_{j=1}^{m_i} \log f_i(y_{ij};\psi,\lambda_i)=\sum_{i=1}^{q} \ell_i(\psi,\lambda_{i})\text{.} \end{equation*} The dimension $$q$$ of the nuisance parameter, or equivalently the number of strata, and the number $$m_i$$ of observations in each stratum may diverge. Neyman & Scott (1948) refer to $$\psi$$ and $$\lambda$$ as the structural and incidental parameters, respectively. Since their contribution, obtaining a consistent estimator of the structural parameter has been a challenge for statisticians. In the likelihood framework, this problem can be tackled by resorting to marginal, conditional and profile likelihoods (Kalbfleisch & Sprott, 1970). Typically, marginal and conditional likelihoods provide consistent estimators of the structural parameter, but their specification requires partially sufficient and partially distribution-constant statistics, respectively, so the availability of such likelihoods is restricted to exponential and composite group families (Pace & Salvan, 1997, § § 5.4 and 7.5). 
The derivation of the profile likelihood is not troublesome in itself, but the resulting estimator of the structural parameter is usually inconsistent, because the profile likelihood is not a genuine likelihood and therefore the Bartlett identities do not hold. For instance, failure of the first identity entails that the bias of the profile score function is $$O(q)$$. Using modified profile likelihoods can be regarded as a way to recover, at least approximately, the first Bartlett identity and thus obtain more reliable inference about the structural parameter (Barndorff-Nielsen, 1983; Cox & Reid, 1987; McCullagh & Tibshirani, 1990; DiCiccio et al., 1996; Stern, 1997). Under the assumption that the number of observations in each stratum is bounded by positive finite numbers that are proportional to an integer $$m$$, Sartori (2003) showed that the bias of modified profile score functions is $$O(q/m)$$. This result is used to establish the limiting properties of inferential procedures derived from modified profile likelihoods in terms of $$m$$, $$q$$ and $$n=mq$$. Specifically, if $$q/m^3=o(1)$$, then the derived estimator is root-$$n$$ consistent and the modified profile Wald, score and loglikelihood ratio test statistics differ by a relative error of order $$n^{-1/2}$$; otherwise the error rates are in suitable powers of $$m$$. In this note we show that the bias reduction approach of Firth (1993) provides an inferential framework which is, from an asymptotic perspective, equivalent to that for modified profile likelihoods when dealing with incidental parameters. This equivalence allows a user to choose either bias reduction or modified profile likelihoods based on finite-sample and subject matter considerations. For example, both approaches are not in general invariant under interest-respecting reparameterizations, and software is available for many common models (Brazzale, 2005; Kosmidis, 2017). 
On the other hand, bias reduction can handle the problem of monotone likelihood (see, e.g., Firth, 1993; Heinze & Schemper, 2001, 2002; Sartori, 2006; Kosmidis, 2014). 2. Notation and relevant quantities Vectors and matrices are partitioned into components pertaining to the structural and incidental parameters. For instance, the score vector $$\ell_\theta(\theta)=\partial\ell(\theta)/\partial\theta$$ has components $$\ell_\psi(\theta)$$ and $$\ell_\lambda(\theta)$$, while the observed information matrix $$k(\theta)=-\partial^2\ell(\theta)/(\partial\theta\partial\theta^{ \mathrm{\scriptscriptstyle T} })$$ has blocks $$k_{\psi\psi}(\theta)$$, $$k_{\lambda\lambda}(\theta)$$ and $$k_{\psi\lambda}(\theta)$$. Blocks of the inverse of a matrix are indicated by superscripts, e.g., $$k^{\psi\psi}(\theta)$$. The bias reduction approach is based on the adjusted score function (Firth, 1993, § 2)   $$\bar\ell_\theta(\theta)=\ell_\theta(\theta)-\nu(\theta)\Delta(\theta),$$ (1) where $$\nu(\theta)$$ can be either $$k(\theta)$$ or the expected information matrix $$i(\theta)$$, and $$\Delta(\theta)$$ is the leading term of the bias of the maximum likelihood estimator. The adjusted profile score function for $$\psi$$ is $$\bar\ell_\psi\{\bar\theta(\psi)\}$$, where $$\bar\theta(\psi)=\{\psi,\bar\lambda(\psi)\}$$ is the root of $$\bar\ell_\lambda(\theta)=0$$ for fixed $$\psi$$. 
Denoting by $$\bar\psi$$ the solution to $$\bar\ell_\psi\{\bar\theta(\psi)\}=0$$ and writing $$\bar{k}(\theta)=-\partial\bar\ell_\theta(\theta)/\partial\theta^{ \mathrm{\scriptscriptstyle T} }$$, the adjusted profile Wald and score test statistics are, respectively,
\begin{equation*}
\bar W^e(\psi)=(\bar\psi-\psi)^{ \mathrm{\scriptscriptstyle T} } [\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}]^{-1}(\bar\psi-\psi),\quad \bar W^u(\psi)=\bar\ell_\psi\{\bar\theta(\psi)\}^{ \mathrm{\scriptscriptstyle T} }\,\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}\,\bar\ell_\psi\{\bar\theta(\psi)\},
\end{equation*}
and $$\bar W(\psi)=2[\bar\ell\{\bar\theta(\bar\psi)\}-\bar\ell\{\bar\theta(\psi)\}]$$ is the loglikelihood ratio test statistic, where $$\bar\ell(\theta)$$ is the adjusted loglikelihood function. Existence of $$\bar\ell(\theta)$$ is guaranteed, for instance, when $$\theta$$ is the canonical parameter in exponential family models; see Kosmidis & Firth (2009).

3. Reconciliation

3.1. Main results

The results presented here hold under the conditions in Sartori (2003); we review the conditions and sketch the proofs in the Supplementary Material. The bias of the adjusted profile score function has the expansion
\begin{equation*}
E_{\theta}[\bar\ell_\psi\{\bar\theta(\psi)\}]=\tau_{\psi}(\theta)-\{i^{\psi\psi}(\theta)\}^{-1}\Delta_\psi(\theta)+O(q/m)=O\{\max(1,\,q/m)\}, \tag{2}
\end{equation*}
where $$\tau_{\psi}(\theta)$$ is the leading term of the bias of the profile score function. Expression (2) highlights the fact that the bias of the adjusted profile score function has components arising from the profile score function, $$\tau_{\psi}(\theta)$$, and from the bias reduction approach, $$-\{i^{\psi\psi}(\theta)\}^{-1}\Delta_\psi(\theta)$$. Although both terms are $$O(q)$$, the second is equal to $$-\tau_{\psi}(\theta)+O(1)$$ and compensates for most of the profile score bias. The order of (2) is equivalent to that of the bias of a modified profile score function, i.e., $$O(q/m)$$, if $$q/m\neq o(1)$$.
However, in some cases the bias of the adjusted profile score function is asymptotically smaller than the bound in (2), and it can even be zero, as the following example shows.

Example 1 (Neyman–Scott problem). Let $$(y_{i1},\ldots,y_{im})$$ be $$m$$ independent realizations of normal random variables $$Y_{ij}$$ with means $$\lambda_i$$ and common variance $$\psi$$ $$(i=1,\ldots,q;\, j=1,\ldots,m)$$. Consider the case $$m=2$$, as in Firth (1993, § 4.5). Then the profile score function is $$q(\hat\psi-\psi)/\psi^2$$, while, when (1) is computed with $$\nu(\theta)=i(\theta)$$, the adjusted profile score function is $$\bar\ell_\psi\{\bar\theta(\psi)\}=q(2\hat\psi-\psi)/(2\psi^2)$$, where $$\hat\psi=\sum_{i=1}^q(Y_{i1}-Y_{i2})^2/(4q)$$ is the maximum likelihood estimator of $$\psi$$. Direct computation gives $$E_{\theta}[\bar\ell_\psi\{\bar\theta(\psi)\}]=0$$.

The result in (2) is used to derive the expansion of the adjusted profile score function,
\begin{equation*}
\bar\ell_\psi\{\bar\theta(\psi)\}=\ell_{\psi}(\theta)-i_{\psi\lambda}(\theta)\{i_{\lambda\lambda}(\theta)\}^{-1}\ell_{\lambda}(\theta) + O_{\rm p}\{\max(q/m, \,q^{1/2})\}; \tag{3}
\end{equation*}
this and the expansion of the observed adjusted profile information,
\begin{equation*}
[\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}]^{-1}=\{i^{\psi\psi}(\theta)\}^{-1}\{1+O_{\rm p}(m^{-1})+O_{\rm p}(n^{-1/2})\}=O(n)\{1+o_{\rm p}(1)\}, \tag{4}
\end{equation*}
provide the key quantities needed to prove the limiting behaviour of inferential procedures derived from the bias reduction approach. Expansions (3) and (4) are asymptotically equivalent to their counterparts derived from modified profile likelihoods, which implies that inferential procedures based on bias reduction and on modified profile likelihoods have the same limiting behaviour.
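The exact unbiasedness claimed in Example 1 is easy to check numerically. In the $$m=2$$ case the root of the adjusted profile score is $$\bar\psi=2\hat\psi$$, and since $$Y_{i1}-Y_{i2}\sim N(0,2\psi)$$ this estimator is exactly unbiased. The sketch below, with arbitrarily chosen incidental means, verifies both expectations by simulation.

```python
import numpy as np

# Numerical check of Example 1 (m = 2): psi_hat = sum_i (Y_i1 - Y_i2)^2 / (4q)
# has expectation psi/2, while the root of the adjusted profile score,
# psi_bar = 2 * psi_hat, is exactly unbiased for psi.
rng = np.random.default_rng(1)
psi, q, reps = 3.0, 50, 100_000
lam = rng.normal(size=q)                       # arbitrary incidental means
y = rng.normal(loc=lam, scale=np.sqrt(psi), size=(reps, 2, q))
d2 = (y[:, 0, :] - y[:, 1, :]) ** 2            # (Y_i1 - Y_i2)^2, N(0, 2 psi)
psi_hat = d2.sum(axis=1) / (4 * q)             # maximum likelihood estimator
psi_bar = 2 * psi_hat                          # bias-reduced estimator
print(psi_hat.mean())  # close to psi/2 = 1.5
print(psi_bar.mean())  # close to psi = 3
```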
3.2. Inference

Following the arguments of Sartori (2003), the limiting properties of inferential procedures depend on the asymptotic normality of the studentized adjusted profile score function
\begin{equation*}
[\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}]^{1/2}\bar\ell_\psi\{\bar\theta(\psi)\}=O_{\rm p}(1) + O_{\rm p}[\max\{(q/m^3)^{1/2}, m^{-1/2}\}], \tag{5}
\end{equation*}
where $$\{\bar{k}^{\psi\psi}(\theta)\}^{1/2}$$ is the matrix square root of $$\bar{k}^{\psi\psi}(\theta)$$. The term that is bounded in probability is $$\{i^{\psi\psi}(\theta)\}^{1/2}[\ell_{\psi}(\theta)-i_{\psi\lambda}(\theta)\{i_{\lambda\lambda}(\theta)\}^{-1}\ell_{\lambda}(\theta)]$$, which asymptotically follows a $$p_0$$-variate standard normal distribution. Asymptotic normality requires $$O_{\rm p}[\max\{(q/m^3)^{1/2}, m^{-1/2}\}]=o_{\rm p}(1)$$, and a sufficient condition is $$q/m^3=o(1)$$, as found by Sartori (2003) for modified profile likelihoods. Expansion of $$\bar{\ell}_\psi\{\bar{\theta}(\bar\psi)\}$$ about $$\psi$$, followed by inversion, yields
\begin{eqnarray*}
(\bar\psi-\psi)&=&\bar{k}^{\psi\psi}\{\bar\theta(\psi)\}\,\bar{\ell}_\psi\{\bar{\theta}(\psi)\}+O_{\rm p}\{ \lVert\bar\psi-\psi\rVert^2 \}\\
&=&O_{\rm p}(n^{-1/2})+O_{\rm p}[\max\{1/(m q^{1/2}),\, m^{-2}\}]\text{.}
\end{eqnarray*}
The estimator $$\bar\psi$$ is root-$$n$$ consistent if $$q/m^3=o(1)$$; otherwise it is $$m^2$$-consistent. From (5), the limiting distribution of $$\bar W^u(\psi)$$ is chi-squared with $$p_0$$ degrees of freedom, provided that $$q/m^3=o(1)$$. The result also holds for $$\bar W^e(\psi)$$ and $$\bar W(\psi)$$, because they are asymptotically equivalent to $$\bar W^u(\psi)$$ up to a relative error of order $$n^{-1/2}$$ when $$q/m^3=o(1)$$, and of order $$m^{-2}$$ otherwise.

4. Empirical evidence

4.1. Binary matched observations

Let $$\psi$$ be the common log odds ratio in $$2\times2$$ tables arising from a series of binary observations $$(y_{i1}, y_{i2})$$ $$(i\,{=}\,1,\ldots,q)$$.
The latter are realizations of independent binomial random variables $$(Y_{i1}, Y_{i2})$$ with denominators $$1$$ and $$m_i$$ and with success probabilities $$\pi_{i1}=\exp(\psi+\lambda_i)/\{1+\exp(\psi+\lambda_i)\}$$ and $$\pi_{i2}=\exp(\lambda_i)/\{1+\exp(\lambda_i)\}$$, respectively. The loglikelihood for $$\theta=(\psi,\lambda_1, \ldots,\lambda_q)$$ is
\begin{align*} \ell(\theta)=\psi\sum_{i=1}^q y_{i1} + \sum_{i=1}^q \lambda_i(y_{i1}+y_{i2})-\sum_{i=1}^q\bigl[\log\{1+\exp(\psi+\lambda_i)\} + m_i\log\{1+\exp(\lambda_i)\}\bigr]\text{.} \end{align*}
This is a full-rank exponential family model in which $$t=\sum_{i=1}^q y_{i1}$$ and $$s_i=y_{i1}+y_{i2}$$ are the sufficient statistics for $$\psi$$ and $$\lambda_i$$, respectively. The conditional likelihood is based on the distribution of $$Y_{i1}$$ given $$S_{i}=s_i$$ in each stratum (Davison, 1988). Bias reduction is achieved via the adjusted loglikelihood $$\bar\ell(\theta)=\ell(\theta) + \log| k(\theta)|^{1/2}$$, because $$\partial\bar\ell(\theta)/\partial\theta=\ell_\theta(\theta)-i(\theta)\Delta(\theta)$$ (Firth, 1993, § 3.1). We compare the finite-sample bias and variance of $$\bar\psi$$ with those of the estimators derived from the conditional, profile and modified profile likelihoods, denoted by $$\hat\psi_{\text{C}}$$, $$\hat\psi$$ and $$\hat\psi_{\text{M}}$$, respectively. The comparison is also extended to the coverage probability and length of 95% confidence intervals for $$\psi$$ based on the chi-squared approximation to the distribution of $$\bar W(\psi)$$ and to the distributions of the conditional, profile and modified profile loglikelihood ratios, denoted by $$W_{\text{C}}(\psi)$$, $$W(\psi)$$ and $$W_{\text{M}}(\psi)$$, respectively. Given the totals in each table, the estimators and confidence intervals depend on the data only through $$T=\sum_{i=1}^q Y_{i1}$$, so it is feasible to obtain their exact properties through complete enumeration.
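The adjusted loglikelihood $$\bar\ell(\theta)=\ell(\theta)+\tfrac12\log|k(\theta)|$$ can be maximized directly for small $$q$$. The sketch below uses hypothetical data, not from the paper: $$q$$ identical matched pairs $$(m_i=1)$$ in which every case responds and every control does not, so the ordinary likelihood is monotone in $$\psi$$ and the maximum likelihood estimate diverges. For these symmetric data we restrict attention to a common value of the $$\lambda_i$$ (an assumed symmetry reduction, used here only to allow a simple grid search instead of a full $$(q+1)$$-dimensional maximization), exploiting the closed-form factorization $$|k(\theta)|=q\,w\,v\,(w+v)^{q-1}$$ with $$w=\pi_{1}(1-\pi_{1})$$ and $$v=\pi_{2}(1-\pi_{2})$$.

```python
import numpy as np

# Sketch (hypothetical separated data): adjusted loglikelihood for q matched
# pairs with y_i1 = 1, y_i2 = 0, m_i = 1.  The unpenalized loglikelihood keeps
# increasing as psi -> infinity, but l + 0.5*log|k| has a finite maximizer.
q = 6

def adjusted_loglik(psi, lam):
    pi1 = 1 / (1 + np.exp(-(psi + lam)))      # case success probability
    pi2 = 1 / (1 + np.exp(-lam))              # control success probability
    ll = q * (psi + lam
              - np.log1p(np.exp(psi + lam)) - np.log1p(np.exp(lam)))
    w, v = pi1 * (1 - pi1), pi2 * (1 - pi2)   # per-stratum information pieces
    logdet = np.log(q) + np.log(w) + np.log(v) + (q - 1) * np.log(w + v)
    return ll + 0.5 * logdet

# Grid search over (psi, lambda); the penalized surface peaks in the interior.
psis = np.linspace(-10, 10, 801)
lams = np.linspace(-10, 10, 801)
P, L = np.meshgrid(psis, lams, indexing="ij")
surface = adjusted_loglik(P, L)
i, j = np.unravel_index(surface.argmax(), surface.shape)
psi_bar = psis[i]
print(psi_bar)   # finite bias-reduced estimate, well inside the grid
```

In contrast, maximizing the unpenalized loglikelihood over the same grid would put $$\psi$$ at the grid boundary, mirroring the monotone-likelihood problem discussed below.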
We consider $$m_i=m$$ and tables with totals $$s_i=(m+1)/2$$, as in Sartori (2003, Example 3), and compute numerically the distribution of $$T$$ given $$S_1=s_1,\ldots,S_q=s_q$$ following Butler & Stephens (2017). Table 1 reports the bias and variance of the estimators, together with the coverage probabilities and average lengths of 95% confidence intervals, when the true log odds ratio is $$\psi=1$$, with $$m\in\{1, 3, 11, 39\}$$ and $$q\,{\in}\,\{30, 100, 1000\}$$. The adjustment in $$\bar\ell(\theta)$$ prevents monotone likelihood when $$t\in\{0,q\}$$ (Firth, 1993), but monotonicity does occur for the other loglikelihoods. For those functions the values $$t\in\{0,q\}$$ must be excluded from the analysis, so summaries are computed only for $$t\in\{1, \ldots, q-1\}$$; that is, the distribution of $$T$$ given $$S_1=s_1,\ldots,S_q=s_q$$ is restricted to this set and renormalized. Overall, the bias and variance of the estimators $$\hat\psi_{\text{M}}$$, $$\bar\psi$$ and $$\hat\psi$$ decrease as $$m$$ increases, with the reduction being slower for $$\hat\psi$$. The estimator $$\hat\psi_{\text{M}}$$ tends to behave like $$\hat\psi_{\text{C}}$$ as $$m$$ increases, confirming that the modified profile likelihood approximates the conditional likelihood. For $$q\in\{30,100\}$$, the estimator $$\bar\psi$$ has smaller bias and variance than $$\hat\psi_{\text{M}}$$, and the same holds in comparison with $$\hat\psi_{\text{C}}$$ for $$m>3$$. Concerning confidence intervals, the discreteness of $$W_{\text{C}}(\psi)$$ and the chi-squared approximation to its distribution entail that the corresponding intervals achieve the nominal coverage when $$q=1000$$. Intervals derived from $$W_{\text{M}}(\psi)$$ and $$\bar W(\psi)$$ exhibit the same coverage in almost all settings and are consistent with those from $$W_{\text{C}}(\psi)$$ for $$m\geqslant 3$$.
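The exact computation just described can be sketched as follows. Given $$S_i=s_i$$, the conditional distribution of $$Y_{i1}$$ is Bernoulli with odds $$e^{\psi}\binom{m_i}{s_i-1}/\binom{m_i}{s_i}$$, free of $$\lambda_i$$, so $$T$$ is a Poisson-binomial variable whose distribution is obtained by convolving the stratum probability mass functions, in the spirit of Butler & Stephens (2017). The sketch assumes informative strata with $$1\leqslant s_i\leqslant m_i$$.

```python
import numpy as np
from math import comb

# Conditional pmf of T = sum_i Y_i1 given the stratum totals s_1,...,s_q,
# for strata with denominators 1 and m_i.  Each Y_i1 | S_i = s_i is Bernoulli
# with odds exp(psi) * C(m_i, s_i - 1) / C(m_i, s_i); T is their sum.
def pmf_T(psi, m, s):
    odds = [np.exp(psi) * comb(mi, si - 1) / comb(mi, si)
            for mi, si in zip(m, s)]
    p = [o / (1 + o) for o in odds]
    pmf = np.array([1.0])
    for pi in p:                        # sequential convolution of pmfs
        pmf = np.convolve(pmf, [1 - pi, pi])
    return pmf                          # pmf[t] = pr(T = t | s_1,...,s_q)

q, m_val = 4, 3
m = [m_val] * q
s = [(m_val + 1) // 2] * q              # tables with totals (m + 1)/2
pmf = pmf_T(psi=0.0, m=m, s=s)          # here each odds is 1, so p_i = 1/2
print(pmf.sum())                        # 1.0
```

With $$\psi=0$$ and these totals each conditional success probability equals $$1/2$$, so $$T$$ is binomial and the convolution reproduces the binomial probabilities exactly.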
Results not reported here show that unconditional and conditional summaries for bias reduction inferential procedures are indistinguishable at the precision shown in Table 1.

Table 1. Binary matched observations with true log odds ratio $$\psi=1$$: the second to fifth columns show the bias and variance (in parentheses) of the estimators derived from the conditional, modified profile, adjusted profile and profile likelihoods, with all entries multiplied by 10; the sixth to ninth columns show the coverage probability and average length (in parentheses) of confidence intervals with nominal level 95% derived from the corresponding loglikelihood ratios, with all coverage probabilities multiplied by 100

| $$m$$ | $$\hat\psi_{\text{C}}$$ | $$\hat\psi_{\text{M}}$$ | $$\bar\psi$$ | $$\hat\psi$$ | $$W_{\text{C}}$$ | $$W_{\text{M}}$$ | $$\bar W$$ | $$W$$ |
|---|---|---|---|---|---|---|---|---|
| $$q=30$$ | | | | | | | | |
| 1 | 0.44 (1.97) | 2.91 (2.33) | 2.76 (2.27) | 10.89 (7.87) | 96.1 (2.0) | 85.0 (1.8) | 93.0 (1.8) | 57.8 (2.8) |
| 3 | | 0.69 (1.82) | 0.48 (1.72) | 3.56 (3.06) | | 96.1 (1.9) | 96.1 (2.1) | 85.0 (2.2) |
| 11 | | 0.46 (1.94) | 0.09 (1.75) | 1.28 (2.23) | | 96.1 (2.0) | 96.1 (2.2) | 92.0 (2.0) |
| 39 | | 0.44 (1.96) | 0.03 (1.75) | 0.68 (2.04) | | 96.1 (2.0) | 96.1 (2.2) | 96.1 (2.0) |
| $$q=100$$ | | | | | | | | |
| 1 | 0.12 (0.53) | 2.79 (0.69) | 2.74 (0.68) | 10.24 (2.11) | 94.6 (1.1) | 77.3 (1.0) | 77.3 (1.0) | 15.0 (1.6) |
| 3 | | 0.48 (0.51) | 0.42 (0.50) | 3.23 (0.84) | | 93.9 (1.1) | 95.7 (1.2) | 69.9 (1.2) |
| 11 | | 0.15 (0.52) | 0.05 (0.51) | 0.96 (0.61) | | 94.6 (1.1) | 94.6 (1.2) | 91.8 (1.2) |
| 39 | | 0.12 (0.53) | 0.01 (0.51) | 0.36 (0.55) | | 94.6 (1.1) | 94.6 (1.2) | 93.9 (1.2) |
| $$q=1000$$ | | | | | | | | |
| 1 | 0.01 (0.05) | 2.74 (0.07) | 2.74 (0.07) | 10.02 (0.20) | 95.0 (0.4) | 6.3 (0.3) | 6.3 (0.3) | 0.0 (0.5) |
| 3 | | 0.40 (0.05) | 0.40 (0.05) | 3.12 (0.08) | | 91.2 (0.4) | 91.2 (0.4) | 4.1 (0.4) |
| 11 | | 0.04 (0.05) | 0.03 (0.05) | 0.85 (0.06) | | 95.0 (0.4) | 95.0 (0.4) | 79.1 (0.4) |
| 39 | | 0.01 (0.05) | 0.01 (0.05) | 0.25 (0.05) | | 95.0 (0.4) | 95.0 (0.4) | 93.6 (0.4) |

To provide a real-data example with different stratum sizes, we reanalyse an example in Davison (1988, § 6.1).
The data come from an experiment intended to assess the effectiveness of rocking motion on the crying of babies and were collected according to a matched case-control design with one case and $$m_i$$ controls per stratum $$(i=1,\ldots,18)$$. Davison (1988) obtained $$\hat\psi_{\text{C}}=1.256$$ and $$\hat\psi=1.432$$, with estimated standard errors 0.686 and 0.734, respectively. We found that $$\hat\psi_{\text{M}}=1.269$$ and $$\bar\psi=1.156$$, with estimated standard errors 0.688 and 0.652, respectively. As the true value of $$\psi$$ is unknown, it is not possible to judge which estimate should be preferred. Nonetheless, we assess the reliability of the estimators by computing their actual bias and variance, conditional on the observed totals, for a set of values of $$\psi$$ ranging from $$-3$$ to $$3$$. The results are reported in Table 2. For all estimators except $$\bar\psi$$, the bias and variance increase sharply as $$\psi$$ approaches $$-3$$ and $$3$$. In the light of this analysis and of the estimates provided by $$\hat\psi_{\text{C}}$$, $$\hat\psi_{\text{M}}$$ and $$\bar\psi$$, we may conclude that $$\bar\psi$$ is preferable, as its mean squared error is the smallest at all the values of $$\psi$$ considered.
Table 2. Real-data example: bias and variance (in parentheses) of the estimators as the true log odds ratio $$\psi$$ varies; estimators derived from the conditional, modified profile, adjusted profile and profile likelihoods are denoted by $$\hat\psi_{\mathrm{C}}$$, $$\hat\psi_{\mathrm{M}}$$, $$\bar\psi$$ and $$\hat\psi$$, respectively, and all entries are multiplied by 10

| | $$\psi=-3$$ | $$-2$$ | $$-1$$ | $$0$$ | $$1$$ | $$2$$ | $$3$$ |
|---|---|---|---|---|---|---|---|
| $$\hat\psi_{\text{C}}$$ | 2.75 (21.43) | $$-$$0.61 (4.16) | $$-$$0.12 (3.00) | 0.36 (3.30) | 0.76 (4.73) | $$-$$3.49 (8.06) | $$-$$17.25 (15.00) |
| $$\hat\psi_{\text{M}}$$ | 2.08 (11.54) | $$-$$1.31 (4.37) | $$-$$0.42 (3.23) | 0.34 (3.42) | 0.88 (4.83) | $$-$$3.31 (8.22) | $$-$$17.12 (15.30) |
| $$\bar\psi$$ | $$-$$0.14 (3.51) | $$-$$0.68 (3.85) | $$-$$0.17 (2.95) | 0.07 (3.03) | 0.18 (4.55) | 0.18 (7.49) | $$-$$1.48 (7.45) |
| $$\hat\psi$$ | $$-$$5.34 (21.52) | $$-$$5.49 (7.82) | $$-$$2.02 (4.48) | 0.37 (4.40) | 2.33 (6.35) | $$-$$0.91 (11.01) | $$-$$15.18 (20.36) |

Acknowledgement

I wish to thank G. Adimari, N. Sartori, R. De Bin, D. O. Scharfstein and N. L. Hjort for helpful discussions, as well as the referees and the editor for their comments.

Supplementary material

Supplementary Material available at Biometrika online includes the conditions and sketches of proofs for the claims in § 3.

References

Barndorff-Nielsen, O. E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343–65.
Brazzale, A. R. (2005). hoa: An R package bundle for higher order likelihood inference. Rnews 5, 20–7.
Butler, K. & Stephens, M. A. (2017). The distribution of a sum of independent binomial random variables. Methodol. Comp. Appl. Prob. 19, 557–71.
Cox, D. R. & Reid, N. (1987). Parameter orthogonality and approximate conditional inference (with Discussion). J. R. Statist. Soc. B 49, 1–39.
Davison, A. C. (1988). Approximate conditional inference in generalized linear models. J. R. Statist. Soc. B 50, 445–61.
DiCiccio, T. J., Martin, M. A., Stern, S. E. & Young, G. A. (1996). Information bias and adjusted profile likelihoods. J. R. Statist. Soc. B 58, 189–203.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
Heinze, G. & Schemper, M. (2001). A solution to the problem of monotone likelihood in Cox regression. Biometrics 57, 114–9.
Heinze, G. & Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statist. Med. 21, 2409–19.
Kalbfleisch, J. D. & Sprott, D. A. (1970). Application of likelihood methods to models involving large numbers of parameters. J. R. Statist. Soc. B 32, 175–208.
Kosmidis, I. (2014). Improved estimation in cumulative link models. J. R. Statist. Soc. B 76, 169–96.
Kosmidis, I. (2017). brglm2: Bias reduction in generalized linear models. R package version 0.1.4.
Kosmidis, I. & Firth, D. (2009). Bias reduction in exponential family nonlinear models. Biometrika 96, 793–804.
McCullagh, P. & Tibshirani, R. J. (1990). A simple method for the adjustment of profile likelihoods. J. R. Statist. Soc. B 52, 325–44.
Neyman, J. & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
Pace, L. & Salvan, A. (1997). Principles of Statistical Inference: From a Neo-Fisherian Perspective. Singapore: World Scientific.
Sartori, N. (2003). Modified profile likelihoods in models with stratum nuisance parameters. Biometrika 90, 533–49.
Sartori, N. (2006). Bias prevention of maximum likelihood estimates for scalar skew normal and skew $$t$$ distributions. J. Statist. Plan. Infer. 136, 4259–75.
Stern, S. E. (1997). A second-order adjustment to the profile likelihood in the case of a multidimensional parameter of interest. J. R. Statist. Soc. B 59, 653–65.

© 2018 Biometrika Trust

Biometrika, Oxford University Press. Published: Mar 1, 2018.
