Biostatistics, Volume 19 (2) – Apr 1, 2018

16 pages

/lp/ou_press/conditional-copula-models-for-right-censored-clustered-event-time-data-w1fwRsiInA

- Publisher
- Oxford University Press
- ISSN
- 1465-4644
- eISSN
- 1468-4357
- D.O.I.
- 10.1093/biostatistics/kxx034
- Publisher site
- See Article on Publisher Site

SUMMARY This article proposes a modeling strategy to infer the impact of a covariate on the dependence structure of right-censored clustered event time data. The joint survival function of the event times is modeled using a conditional copula whose parameter depends on a cluster-level covariate in a functional way. We use a local likelihood approach to estimate the form of the copula parameter and outline a generalized likelihood ratio-type test strategy to formally test its constancy. A bootstrap procedure is employed to obtain an approximate $$p$$-value for the test. The performance of the proposed estimation and testing methods is evaluated in simulations under different rates of right-censoring and for various parametric copula families, considering both parametrically and nonparametrically estimated margins. We apply the methods to data from the Diabetic Retinopathy Study to assess the impact of age at diabetes onset on the time to loss of visual acuity. 1. Introduction Many biomedical studies involve clustered time-to-event data, which can be right-censored and which may exhibit strong dependence. For instance, lifetimes of twins or married couples are often dependent due to shared genetic or environmental factors, and characterizing these dependencies helps making informed decisions in health research. Other examples include time to failure of matched organs, such as eyes or kidneys, and occurrence times of linked diseases. In such studies, the data analysis should be directed toward unraveling the within-cluster dependence, or one should at least account for its presence in the applied modeling strategy. Copula models are well suited for this task. Copulas are dependence functions that link together the marginal survival functions to form the joint survival function. Their use in survival analysis has a long history dating back to Clayton (1978), followed by Oakes (1982), Hougaard (1986), Shih and Louis (1995), and Chen and others (2010), among others. In these articles, the focus is mainly on the unconditional dependence structure of event times and not on the presence of covariates that could provide additional information on the joint survival function. One exception is Clayton (1978), which devotes a section on strategies to include covariates in the association analysis of bivariate failure times and suggests adjusting both the marginal survival functions and the dependence parameter for covariates, but without any elaborate treatment. Despite Clayton’s suggestion, most commonly used approaches in survival analysis incorporate covariates only in the marginal models, and neglect their potential impact on the association structure. For instance, in an effort to perform covariate adjustment, Huster and others (1989) proposed a parametric analysis of paired right-censored event time data in the presence of a binary covariate, with an application to diabetic retinopathy data. In this analysis, the type of diabetes, classified into juvenile or adult groups based on age at diabetes onset, is considered as the covariate; and its impact is accounted for only in the marginal models for the time to loss of visual acuity in the laser-treated and untreated eyes, but not in the association structure. This amounts to an implicit assumption that the dependence parameter is the same for the juvenile and adult diabetes groups, which may not be realistic or at least needs to be verified. Based on a visual representation of the data, it is difficult to track whether the dependence parameters of the two groups differ or not, mainly due to the high rate of right-censoring (see Figure S3 of the supplementary material available at Biostatistics online). While there exists many tools to account for covariates in the marginal survival functions of clustered right-censored time-to-event data, there is a need to extend copula-based models to include covariate information in the association structure. This article proposes covariate-adjusted dependence analysis for clustered right-censored event time data using a conditional copula whose parameter is allowed to depend on a cluster-level covariate. When the latter is binary or discrete with few categories, one can form two or more strata according to the covariate value and fit a copula to each stratum separately. The impact of a continuous covariate on the dependence parameter is notoriously more difficult to formulate, as it should be specified in functional terms and is typically data specific. This invites the use of nonparametric techniques for function estimation. In the case of complete data, i.e. when there is no censoring, nonparametric estimation of the copula parameter function has been previously studied in Acar and others (2011) and Abegaz and others (2012) for parametrically and nonparametrically estimated margins, respectively. These proposals are built on local likelihood methods Tibshirani and Hastie, 1987 combined with local polynomial estimation (Fan and Gijbels, 1996). They are, however, not directly applicable to right-censored event times. The presence of right-censoring in the event times greatly challenges the statistical analysis, and its incorporation in the copula parameter estimation is necessary. A recent work in this domain is Ding and others (2015), which proposes a nonparametric estimator for the concordance probability as a function of covariates. However, this approach does not readily generalize to a likelihood-based model formulation. Here, the first contribution is an extension of the conditional copula framework in Acar and others (2011) and Abegaz and others (2012) to handle right-censored event time data, for both parametrically and nonparametrically estimated marginal survival functions. For the former, we focus on the Weibull model as employed in Huster and others (1989), and for the latter we consider the Beran estimator (Beran, 1981). Under both cases, we address the issue of bandwidth selection and investigate the impact of censoring rate and sample size on the estimation. We further examine how misspecification of the copula family affects the estimation results. The second contribution is a test strategy for the constancy of the conditional copula parameter across the range of a cluster-level covariate. In the case of a discrete covariate, one can employ the traditional likelihood ratio test to assess whether the dependence parameters for different strata can be deemed the same. However, for a continuous covariate, one is required to test the constancy of the whole dependence parameter function. Here, this is achieved by adopting the test strategy in Acar and others (2013). The test is built on the generalized likelihood ratio (GLR) statistic of Fan and others (2001) for testing a parametric or a nonparametric null hypothesis versus a nonparametric alternative hypothesis. For conditional copulas with complete data, Acar and others (2013) derived the asymptotic null distribution of the test statistic and used it to obtain a decision rule. The presence of right-censoring complicates the development of the asymptotic null distribution. Therefore, we alternatively propose a bootstrap procedure to obtain an approximate $$p$$-value for the test. The proposed estimation and testing methods are detailed in Sections 2 and 3, respectively. Section 4 contains the results from our simulations under different rates of right-censoring and for various parametric copula families, considering both parametrically and nonparametrically estimated margins. In Section 5, we revisit the diabetic retinopathy data and assess the impact of age at diabetes onset on the time to loss of visual acuity. The article concludes with a brief discussion. The bootstrap algorithms and part of the simulation and data analysis results are collected in the supplementary material available at Biostatistics online. 2. Conditional copula model for right-censored event time data In this section, we introduce the notation and describe the proposed conditional copula approach for right-censored clustered event times. To ease the presentation, we focus on the bivariate setting. However, the results can be extended to multivariate settings using a multivariate copula. Let $$(T_1, T_2)$$ be a vector of bivariate event times, and let $$X$$ be a continuous cluster-level covariate. Then, for each $$x$$ in the support of $$X$$, the conditional joint survival function $$S_X(t_1,t_2|x)=P(T_1>t_1,T_2>t_2|X=x)$$ of the vector $$(T_1,T_2)|X=x$$ has a unique representation (Patton, 2006) given by \begin{equation} \label{eq1} S_X(t_1,t_2 \mid X= x) = \mathbb{C}_{X}\left( S_{1\mid x}(t_1 \mid x), S_{2 \mid x}(t_2 \mid x) \mid X=x \right), \end{equation} (2.1) where $$\mathbb{C}_{X}$$ is the conditional copula of the event times, and $$S_{k \mid x}(t_k|x)=P(T_k>t_k|X=x)$$ is the continuous conditional marginal survival function of $$T_k|X=x$$ ($$k=1,2$$). A major complication in fitting the model in (2.1) is that for right-censored data, the true event time is not always recorded, but instead, a lower time, called the censoring time, is observed. Let $$(C_1, C_2)$$ be a vector of bivariate censoring times, independent of $$(T_1,T_2)$$. We observe the minima $$(Y_1,Y_2)= \left(\min\{T_1,C_1\},\min\{T_2,C_2\}\right)$$, together with the corresponding right-censoring indicators $$(\delta_1,~\delta_2)=(I\{T_1\leq C_1\},~I\{T_2\leq C_2\})$$. In the special case where the same censoring time applies to all members of a cluster, we have $$C= C_1=C_2$$, a situation referred to as univariate censoring. Given a random sample $$\{(Y_{1i}, Y_{2i}, \delta_{1i}, \delta_{2i}, X_i)\}_{i=1}^n$$, the fitting of the model in (2.1) is typically performed in two stages; first for the conditional marginal survival functions $$S_{k\mid x}$$, and second for the conditional copula $$\mathbb{C}_{X}$$. To estimate $$S_{k\mid x}$$, one can employ parametric or nonparametric strategies. These are outlined briefly in Section 2.1. We then describe the proposed nonparametric strategy for fitting the conditional copula in Section 2.2. The details on the bandwidth selection for the nonparametric methods are given in Section 2.3. 2.1. Estimation of the conditional marginal survival functions In the case of parametric conditional margins, e.g. Weibull as employed in Section 4, we have $$ S_{k\mid x}(t_k \mid x) = S_{k\mid x}(t_k \mid x, \boldsymbol{{\gamma}}_k )$$, with $$\boldsymbol{{\gamma}}_k$$ an unknown parameter vector ($$k=1,2$$). One can then use maximum likelihood estimation to obtain $$\hat{S}_{k\mid x}(t_k \mid x) = S_{k\mid x}(t_k \mid x, \boldsymbol{\hat{\gamma}}_k ),$$ where $$\boldsymbol{\hat{\gamma}}_k = {\rm{argmax}}_{\boldsymbol{\gamma}_k}{ \sum_{i=1}^n \{ \delta_{ki} \ln f_{k\mid x}(Y_{ki} | X_i , \boldsymbol{\gamma}_k) + (1- \delta_{ki}) \ln S_{k\mid x}(Y_{ki} | X_i , \boldsymbol{\gamma}_k) \} }$$ is the maximum likelihood estimate of the vector of marginal parameters ($$k=1,2$$). In the absence of a suitable parametric model, the conditional margins can be estimated nonparametrically using the Beran estimator (Beran, 1981), also called the conditional Kaplan–Meier estimator, which takes the form $$ \tilde{S}_{k\mid x}(t_k \mid x) = \prod_{Y_{ki} \leq t_k, \delta_{ki} = 1} \left( 1- \dfrac{ w_{nki} (x; h_{k}) } { \sum_{j=1}^{n} I\{Y_{kj}\geq Y_{ki}\} w_{nkj} (x; h_{k}) } \right)\!. $$ The weights $$w_{nki}$$ are typically defined as $$ w_{nki} (x; h_{k}) = \dfrac{ K_{h_k}\left( X_{i} - x \right) }{\sum_{j=1}^n K_{h_k}\left( X_{j}- x \right) }, $$ where $$K_{h_k} (\cdot)= K(\cdot /h_k) /h_k $$, with $$K$$ the kernel function and $$h_k$$ the bandwidth parameter ($$k=1,2$$). In our implementations, we use the Epanechnikov kernel. 2.2. Estimation of the conditional copula Given the estimated margins, and assuming that for each value of $$x$$, the conditional copula $$\mathbb{C}_{X}$$ belongs to the same parametric copula family, one can fit $$\mathbb{C}_{X}$$ within the likelihood framework. In this case, the impact of a covariate is considered to be solely on the strength of dependence, which is captured by the copula parameter $$\theta(X)$$ of $$\mathbb{C}_{X}$$. Due to the restricted parameter range of some copula families, instead of directly modeling $$\theta(X)$$, we consider the reparametrization $$ \theta(x) = g^{-1}(\eta(x))$$, where $$\eta(\cdot)$$ is called the calibration function and $$g^{-1}: \mathbb{R} \rightarrow \Theta $$ is a prespecified inverse-link function with $$\Theta$$ being the parameter space of a given copula family. For some commonly used copula families, possible inverse link functions are provided in Table 1. Table 1 Inverse link functions and Kendall’s tau conversions for some copula families Family $$\mathbb{C}(u_{1},u_{2})$$ $$\theta $$ $$g^{-1}(\eta)$$ $$\tau$$ Clayton $$ (u_{1}^{-\theta}+u_{2}^{-\theta}-1)^{-\frac{1}{\theta}} $$ $$(0,\infty)$$ $$\exp(\eta)$$ $$\frac{\theta}{\theta +2}$$ Frank $$- \frac{1}{\theta} \ln \Big\{ 1+ \frac{({\rm e}^{-\theta u_{1}}-1) ({\rm e}^{-\theta u_{2}}-1)}{ {\rm e}^{-\theta }-1} \Big\} $$ $$(-\infty,\infty) \setminus \{0\}$$ $$\eta$$ $$1+ \frac{4}{\theta} [D_{1}(\theta)-1]$$ Gumbel $$ \exp \Big\{ - [(-\ln u_{1})^{\theta} + (- \ln u_{2})^{\theta} ]^{\frac{1}{\theta}} \Big\} $$ $$[1,\infty) $$ $$\exp(\eta)+1$$ $$1- \frac{1}{\theta}$$ Family $$\mathbb{C}(u_{1},u_{2})$$ $$\theta $$ $$g^{-1}(\eta)$$ $$\tau$$ Clayton $$ (u_{1}^{-\theta}+u_{2}^{-\theta}-1)^{-\frac{1}{\theta}} $$ $$(0,\infty)$$ $$\exp(\eta)$$ $$\frac{\theta}{\theta +2}$$ Frank $$- \frac{1}{\theta} \ln \Big\{ 1+ \frac{({\rm e}^{-\theta u_{1}}-1) ({\rm e}^{-\theta u_{2}}-1)}{ {\rm e}^{-\theta }-1} \Big\} $$ $$(-\infty,\infty) \setminus \{0\}$$ $$\eta$$ $$1+ \frac{4}{\theta} [D_{1}(\theta)-1]$$ Gumbel $$ \exp \Big\{ - [(-\ln u_{1})^{\theta} + (- \ln u_{2})^{\theta} ]^{\frac{1}{\theta}} \Big\} $$ $$[1,\infty) $$ $$\exp(\eta)+1$$ $$1- \frac{1}{\theta}$$ where $$D_{1}(\theta) = \frac{1}{\theta} \int_{0}^{\theta} \frac{t}{{\rm e}^{t}-1} {\rm d}t $$ is the Debye function. Letting $$U_k \equiv S_{k\mid x}(\cdot \mid x)$$ for $$k=1,2$$, the model in (2.1) becomes \begin{equation} \label{eq2} \left(U_{1},U_{2}\right) \mid X= x \sim \mathbb{C}_{X} \left( u_{1},u_{2} \mid g^{-1}(\eta(x)) \right). \end{equation} (2.2) Hence, the loglikelihood function takes the form (Shih and Louis, 1995; Massonnet and others, 2009) \begin{equation} \label{eq3} \displaystyle \sum_{i=1}^n \ell \left( g^{-1}(\eta(X_i)), {U}_{1i}, {U}_{2i} \right), \end{equation} (2.3) where \begin{eqnarray*} \ell (v, u_1,u_2 ) &=& (1-\delta_{1}) (1-\delta_{2}) \ln \mathbb{C}_{X}(u_1,u_2 \mid v ) + \delta_{1} (1-\delta_{2}) \ln \left[\frac{\partial \mathbb{C}_{X}(u_1,u_2 \mid v )}{\partial u_1} \right] \\ && + (1-\delta_{1}) \delta_{2} \ln \left[\frac{\partial \mathbb{C}_{X}(u_1,u_2 \mid v)}{\partial u_2} \right] + \delta_{1} \delta_{2} \ln \left[\frac{\partial^2 \mathbb{C}_{X}(u_1,u_2 \mid v)}{\partial u_1 \partial u_2} \right]. \end{eqnarray*} Due to right-censoring, the loglikelihood contributions of the data vectors are nontrivial, i.e. they involve the copula function $$\mathbb{C}_{X}$$ as well as its first and second order derivatives. This aspect induces an additional computational challenge as compared to the case of complete data. To fit the conditional copula, one can use maximum likelihood estimation by specifying a parametric form for $$\eta(X)$$. However, in most applications it is difficult to prespecify the relation between the covariate and the dependence strength. Therefore, it is often advised to employ nonparametric strategies (Acar and others, 2011; Abegaz and others, 2012). Suppose that $$\eta(\cdot)$$ is sufficiently smooth, i.e., $$\eta(\cdot)$$ has the ($$p+1$$)th derivative at the point $$x$$. Then, for a covariate value $$X_i$$ in a neighborhood of $$x$$, we can approximate $$\eta(X_i)$$ by \begin{eqnarray*} \eta(X_i) &\approx& \eta(x) + \eta^{(1)}(x) (X_i - x) + \ldots + \displaystyle \frac{\eta^{(p)}(x)}{p!} (X_i - x)^p \\ & \equiv & \beta_{0,x} + \beta_{1,x} (X_i - x) + \ldots + \beta_{p,x} (X_i - x)^p \equiv {\boldsymbol{x}}_{i,x}^{T}~\boldsymbol{\beta}_x, \end{eqnarray*} where $${\boldsymbol x}_{i, x}= (1, X_{i}-x,\ldots, (X_{i}-x)^{p} ) ^{T} $$ and $$\boldsymbol{\beta}_x = (\beta_{0,x},\beta_{1,x},\ldots,\beta_{p,x})^{T}$$ with $$\beta_{r,x}=\eta^{(r)}(x) / r!$$. We then estimate $$\boldsymbol{\beta}_x$$ by maximizing a local version of (2.3), which is given by \begin{equation} \label{eq4} \sum_{i=1}^n \ell \left( g^{-1}( \boldsymbol{x}_{i,x}^{T} \boldsymbol{\beta}_x), {U}_{1i}, {U}_{2i} \right) \; K_{h_\mathbb{C}}(X_i - x), \end{equation} (2.4) where $$K_{h_\mathbb{C}} (\cdot)= K(\cdot /h_\mathbb{C}) /h_\mathbb{C} $$, with $$K$$ the kernel function (e.g. the Epanechnikov kernel), $$h_\mathbb{C}$$ the bandwidth parameter and $${U}_{ki} \equiv {S}_{k\mid x}(Y_{ki} \mid X_i) $$ ($$k=1,2$$). In our implementations, we use the local linear estimation with $$p = 1$$. An odd order polynomial is preferred over an even order one as the latter has an induced bias (Fan and Gijbels, 1996). The choice of $$p=1$$ suffices in most applications. In practice, the conditional survival margins $$U_{ki}$$ in (2.4) are replaced by either parametric estimates, $$\hat{U}_{ki} \equiv \hat{S}_{k\mid x}(Y_{ki} \mid X_i) $$, or nonparametric estimates $$\tilde{U}_{ki} \equiv \tilde{S}_{k\mid x}(Y_{ki}~\mid~X_i)$$ ($$k=1,2$$). The resulting local maximum likelihood estimates are denoted by $$\boldsymbol{\hat{\beta}}_x = (\hat{\beta}_{0,x},\hat{\beta}_{1,x}, \ldots, \hat{\beta}_{p,x})^{T} $$ and $$\boldsymbol{\tilde{\beta}}_x= (\tilde{\beta}_{0,x},\tilde{\beta}_{1,x}, \ldots, \tilde{\beta}_{p,x})^{T}$$, respectively. From these, one can obtain $$\hat{\eta}(x) = \hat{\beta}_{0,x}$$ and $$\tilde{\eta}(x) = \tilde{\beta}_{0,x}$$, which in turn yield the estimates of the copula parameter at covariate value $$x$$ via $$\hat{\theta}(x) = g^{-1} ( \hat{\eta}(x) )$$ and $$\tilde{\theta}(x) = g^{-1} ( \tilde{\eta}(x) )$$. 2.3. Bandwidth selection The choice of the bandwidth parameter $$h_\mathbb{C} $$ plays an important role in the bias-variance trade-off. If the conditional marginal survival functions are estimated parametrically, we select the bandwidth value that maximizes the leave-one-out cross-validated loglikelihood function (Acar and others, 2011) \begin{equation} \label{eq:5} B(h_\mathbb{C} ) = \displaystyle \sum_{i=1}^n \ell \left( \hat{\theta}^{(-i)}_{h_\mathbb{C}}(X_i) , \hat{U}_{1i}, \hat{U}_{2i} \right), \end{equation} (2.5) where $$\hat{\theta}^{(-i)}_{h_\mathbb{C}}(X_i)$$ is the estimated copula parameter at bandwidth value $${h_\mathbb{C}}$$, obtained using the data points ($$\hat{U}_{1j},\hat{U}_{2j}, \delta_{1j}, \delta_{2j}, X_j$$) with $$j = 1,\ldots, i-1, i+1, \ldots, n$$. If the Beran estimator is used to obtain the conditional marginal survival functions, the bandwidths $$h_1$$ and $$h_2$$ also need to be selected. Since the model fitting is performed in two stages, we separately choose $$h_1$$ and $$h_2$$ in stage 1 and then determine $$h_\mathbb{C}$$ in stage 2. For each conditional margin, we select the bandwidth value that minimizes the leave-one-out cross-validated criterion \begin{eqnarray} \label{eq:mar_band} \tilde{B}(h_k) = \displaystyle \underset{h}{\rm{argmin}} \sum_{i=1}^n \sum_{j=1}^n \Delta_{kij} \left( I(Y_{ki} \leq Y_{kj}) - \tilde{F}_{k|x}^{(-i)}(Y_{kj}|X_i) \right)^2, \end{eqnarray} (2.6) where $$\tilde{F}_{k|x}^{(-i)}(\cdot|X_i) = 1 - \tilde{S}_{k|x}^{(-i)}(\cdot|X_i)$$ is the Beran estimator of the k$${\rm{th}}$$ conditional margin at bandwidth value $$h_k$$, obtained using the data points ($${Y}_{1j},{Y}_{2j}, \delta_{1j}, \delta_{2j}, X_j$$) with $$j = 1,\ldots, i-1, i+1, \ldots, n$$ ($$k=1,2$$). Further, $$\Delta_{kij}$$ is $$1$$ for each so-called useful pair of observed times and is $$0$$ for all other pairs. A pair of observed times $$(Y_{ki},Y_{kj})$$ is considered to be useful if the value of the indicator $$I\{Y_{ki} \leq Y_{kj}\}$$ gives an unambiguous correct value for the indicator $$I\{T_{ki} \leq T_{kj}\}$$, which contains the corresponding true (possibly unknown) event times $$(T_{ki},T_{kj})$$. From the definition it follows that a pair of observed times $$(Y_{ki},Y_{kj})$$ is useful if one of the following holds: (i) $$(\delta_{ki},\delta_{kj}) = (1,1)$$; (ii) $$(\delta_{ki},\delta_{kj}) = (1,0)$$ and $$Y_{ki} \leq Y_{kj}$$; (iii) $$(\delta_{ki},\delta_{kj}) = (0,1)$$ and $$Y_{ki} \geq Y_{kj}$$; or (iv) $$i=j$$$$(i,j=1,\ldots,n$$ and $$k=1,2)$$. Gannoun and others (2007) use a criterion similar to (2.6) to perform bandwidth selection for the Beran estimator, but consider only the pairs of true event times in their bandwidth selector. Note that inclusion of useful pairs of observed times would be advantageous especially when the censoring rate is high. Once the bandwidth values $$h_1$$ and $$h_2$$ are determined, the criterion in (2.5) can be used to choose $$h_\mathbb{C}$$, with $$\hat{U}_{1i}$$ and $$\hat{U}_{2i}$$ replaced by $$\tilde{U}_{1i}$$ and $$\tilde{U}_{2i}$$, respectively. 3. Generalized likelihood ratio test A relevant question in applications is whether a covariate has a significant impact on the underlying dependence structure of the clustered event times. This is equivalent to testing the constancy of the conditional copula parameter as formalized by \begin{eqnarray} \label{eq:null} \rm{H}_0 : \theta(\cdot) \in \mathfrak{f}_c \qquad \quad \text{versus} \qquad \quad \rm{H}_1 : \theta(\cdot) \notin \mathfrak{f}_c, \end{eqnarray} (3.1) where $$\mathfrak{f}_c = \{ \theta(\cdot): \exists \; \theta_0 \in \Theta \mbox{ such that } \theta(X)=\theta_0, \; \; \forall X\in \mathcal{X} \}$$ is the set of all constant functions on $$X\in \mathcal{X}$$. The null hypothesis corresponds to the so-called simplifying assumption in pair-copula constructions (Hobæk Haff and others, 2010; Acar and others, 2012). The hypothesis testing problem in (3.1) can be evaluated through the difference (Acar and others, 2013) $$ \lambda_n = \sup_{\theta(\cdot) \notin \mathfrak{f}_c } \{\mathbb{L}_n (\rm{H}_1) \} - \sup_{\theta(\cdot) \in \mathfrak{f}_c} \{\mathbb{L}_n (\rm{H}_0)\}, $$ where $$ \mathbb{L}_n ({\rm{H}}_0) =\sum_{i=1}^n \ell \left( \theta_0 , U_{1i}, U_{2i} \right) \qquad \text{and} \qquad \mathbb{L}_n ({\rm{H}}_1) = \sum_{i=1}^n \ell \left( {\theta}(X_i), U_{1i}, U_{2i} \right). $$ The statistic $$\lambda_n$$ is referred to as the generalized likelihood ratio (GLR), and differs from the canonical likelihood ratio in that the model under the alternative hypothesis is specified nonparametrically. Hence, the distribution of the test statistic depends on the bandwidth parameter and the kernel function used in the nonparametric estimation. Further, as mentioned before, the presence of right-censoring complicates the loglikelihood expressions, making the assessment of the asymptotic null distribution of the test statistic quite cumbersome. Even when available, the convergence of the null distribution to the asymptotic distribution might be slow; hence a bootstrap estimate is typically used in finite samples to approximate the null distribution (Fan and Zhang, 2004; Fan and Jiang, 2005). Here, we follow a similar strategy and propose a parametric bootstrap algorithm to obtain an approximate $$p$$-value for the test. When the conditional marginal survival functions are estimated parametrically, the supremum of the loglikelihood function under the null hypothesis is given by $$ \mathbb{L}_n ({\rm{H}}_0, \hat{\theta}_0)= \sum_{i=1}^n \ell \left( \hat{\theta}_0, \hat{U}_{1i}, \hat{U}_{2i} \right), $$ where $$ \hat{\theta}_0$$ is the maximum likelihood estimator of the constant copula parameter $$\theta_0$$ based on observations $$(\hat{U}_{1i}, \hat{U}_{2i}, \delta_{1i}, \delta_{2i})$$, $$i=1,\ldots,n$$. For the alternative hypothesis, we use the local likelihood estimator $$\hat \theta_{h_\mathbb{C}}$$ at each covariate value (Section 2.2) and obtain $$ \mathbb{L}_n ({\rm{H}}_1,\hat \theta_{h_\mathbb{C}} ) = \sum_{i=1}^n \ell \left( \hat{\theta}_{h_\mathbb{C}} (X_i), \hat{U}_{1i}, \hat{U}_{2i} \right), $$ where $$\hat{\theta}_{h_\mathbb{C}}$$ is the estimated copula parameter obtained by maximization of (2.5) with the optimal bandwidth value $$h_\mathbb{C}$$. The GLR statistic then becomes $$ \lambda_n (h_\mathbb{C}) = \sum_{i=1}^n \ell \left( \hat{\theta}_{h_\mathbb{C}} (X_i), \hat{U}_{1i}, \hat{U}_{2i} \right) - \sum_{i=1}^n \ell \left( \hat{\theta}_0, \hat{U}_{1i}, \hat{U}_{2i} \right). $$ To obtain the null distribution of $$\lambda_n (h_\mathbb{C})$$, we use the bootstrap procedure outlined in Algorithm 1 (see Appendix A of the supplementary material available at Biostatistics online) (Davison and Hinkley, 1997; Geerdens and others, 2013, 2016). In the case of nonparametrically estimated conditional margins, a similar strategy is followed to obtain $$ \lambda_n (h_1, h_2, h_\mathbb{C}) = \sum_{i=1}^n \ell \left( \tilde{\theta}_{h_\mathbb{C}} (X_i), \tilde{U}_{1i}, \tilde{U}_{2i} \right) - \sum_{i=1}^n \ell \left( \tilde{\theta}_0, \tilde{U}_{1i}, \tilde{U}_{2i} \right), $$ where $$\tilde{\theta}_0$$ is the maximum likelihood estimator of the constant copula parameter $$\theta_0$$ based on observations $$(\tilde{U}_{1i}, \tilde{U}_{2i}, \delta_{1i}, \delta_{2i})$$, $$i=1,\ldots,n$$. The latter are obtained using $$h_1$$ and $$h_2$$, the optimal bandwidth values minimizing (2.6). For the alternative model, we use the local likelihood estimator $$\tilde{\theta}_{h_\mathbb{C}}$$ at each covariate value (Section 2.2). To obtain the null distribution of $$\lambda_n (h_1, h_2, h_\mathbb{C})$$, we employ the bootstrap in Algorithm 2 (see Appendix A of the supplementary material available at Biostatistics online), which differs from Algorithm 1 mainly in that $$(\hat{U}_{1i}, \hat{U}_{2i})$$ and $$\hat{S}_{k|x}$$ are replaced by $$(\tilde{U}_{1i}, \tilde{U}_{2i})$$ and $$\tilde{S}_{k|x}$$. In the bootstrap, the bandwidth values are taken to be the same as the ones obtained for the original data. Alternatively, one can update the bandwidth value(s) in each bootstrap sample, but at an increased computational cost (see Section 4.1 for a discussion). 4. Simulation study We investigate the finite sample performance of the proposed methods in a simulation study, using Clayton, Frank, and Gumbel copulas with the following dependence structures: Constant model: $$\quad \tau(X) = 0.6$$ Convex model: $$\quad \tau(X) = 0.1(X-3)^2 +0.3$$ Concave model: $$\quad \tau(X) = -0.1 (X-3)^2 +0.7.$$ A visual representation of these functions is given in Figure S1 of Appendix B of the supplementary material available at Biostatistics online. In the constant model, the covariate does not affect the strength of dependence, while in the convex and concave models, the covariate effect has the respective form with Kendall’s tau varying from $$0.3$$ to $$0.7$$. The models for the covariate effect are specified in terms of Kendall’s tau to allow comparisons across different copulas. The conversions between the copula parameter $$\theta$$ and Kendall’s tau are given in Table 1 for the considered copulas. Although the overall dependence, quantified by Kendall’s tau, is the same for all three copulas, their local dependence patterns are quite different; while the Frank copula has no tail dependence, the Clayton and Gumbel copula exhibit lower and upper tail dependence, respectively. Under each scenario, we generate the copula data $$\{ (U_{1i},U_{2i} \mid X_{i}): i=1,2,\ldots,n \}$$ as outlined in Acar and others (2011). That is, we first obtain covariate values $$X_{i}$$ from Uniform $$[2, 5]$$. Then, for each $$i=1,2,\ldots,n$$, we calculate the corresponding Kendall’s tau $$\tau_i \equiv\tau(X_i)$$ and copula parameter $$\theta_i \equiv \theta(X_i)$$. To obtain the event times, we apply the inverse-cdf method to the copula data using the Weibull model $$S_{k|x}(t_k|x)=\exp\{ -\lambda_{T} ~t_k^{\rho_{T}} \exp(\beta_{T} x) \}$$ with parameters $$\lambda_T=0.5$$, $$\rho_T=1.5$$, and $$\beta_T=0.8$$. To introduce right-censoring, we generate the (univariate) censoring times from the Weibull model $$S(c)=\exp(-\lambda_{C} ~c^{\rho_{C}})$$ with parameters $$\lambda_C=1.5$$ and $$\rho_C=1.5$$ for the case of low censoring (approximately 20% censoring rate), and with parameters $$\lambda_C=1.5$$ and $$\rho_C=0.5$$ for the case of moderate censoring (approximately 50% {censoring rate). The observed data are then given by the minima of the event times and the censoring times. We also consider non-censored event time data to assess the impact of censoring on the results. 4.1. Estimation and test results under correctly specified copula family We estimate the conditional marginal survival functions parametrically, using the Weibull model, and nonparametrically, using the Beran estimator. Based on the resulting estimates, we perform local linear estimation $$(p=1)$$ under the correct copula and obtain the estimates of the calibration function. To evaluate the impact of marginal estimation, we include the setting of known marginal survival functions. For the bandwidth parameter(s), we consider $$6$$ candidate values, ranging from 0.3 to 3 and equidistant on a logarithmic scale. The bandwidth selection for the Beran estimator is based on the criterion in (2.6), while the bandwidth selection for the conditional copula estimation is based on the criterion in (2.5), and this for each simulated data set. Results are based on the local calibration estimates at the chosen optimal bandwidth(s). The calibration estimates are converted into the copula parameter and Kendall’s tau via the link functions in Table 1. We evaluate the estimation strategy through the integrated Mean Squared Error (IMSE) along with the integrated squared Bias (IBIAS$$^{2}$$) and integrated Variance (IVAR), given by \begin{eqnarray} && \text{IBIAS}^{2}(\hat{\tau}) = \int_{\mathcal{X}} \big[ E [ \hat{\tau}(x)] - \tau(x) \big]^{2} \; {\rm d}x, \nonumber \qquad\quad \text{IVAR}(\hat{\tau}) = \int_{\mathcal{X}} E \big[ \{ \hat{\tau}(x) - E [ \hat \tau(x)] \}^{2} \big] \; {\rm d}x, \nonumber \\ && \text{IMSE}(\hat{\tau}) = \int_{\mathcal{X}} E \big[ \{ \hat{\tau}(x) - \tau(x) \}^{2} \big] \; {\rm d}x \; = \; \text{IBIAS}^{2}(\hat{\tau}) + \text{IVAR}(\hat{\tau}). \nonumber \end{eqnarray} These quantities are approximated by a Riemann sum over a grid of points $$\{2,2.1,\ldots,4.9,5\}$$ in the covariate range. Under each scenario for Kendall’s tau, we conduct experiments with sample sizes $$n=250$$ and $$n=500$$. The results under the Clayton copula, based on $$M=500$$ Monte Carlo samples for each setting, are displayed in Table 2. For the settings under the Frank and Gumbel copulas, we restrict our investigations to $$M=200$$ Monte Carlo samples in consideration with the computational cost and defer the results to Appendix B of the supplementary material available at Biostatistics online. Table 2 Integrated squared bias, integrated variance, and integrated mean square error (multiplied by 100) of Kendall’s tau estimates, based on $$M=500$$ replications, under the Clayton copula Known margins Parametric margins Nonparametric margins Censoring rate n IBIAS$$^{2}$$ IVAR IMSE IBIAS$$^{2}$$ IVAR IMSE IBIAS$$^{2}$$ IVAR IMSE Constant 0 % 250 0.002 0.382 0.384 0.002 0.604 0.606 0.232 0.956 1.188 500 0.000 0.191 0.191 0.001 0.303 0.304 0.127 0.478 0.605 20 % 250 0.002 0.516 0.518 0.003 0.744 0.746 0.270 1.136 1.406 500 0.000 0.268 0.268 0.001 0.393 0.394 0.131 0.547 0.678 50 % 250 0.003 0.788 0.791 0.013 1.157 1.170 0.524 1.866 2.390 500 0.001 0.429 0.430 0.006 0.636 0.642 0.264 0.980 1.244 Convex 0 % 250 0.091 1.059 1.150 0.080 1.337 1.417 0.289 1.586 1.875 500 0.054 0.514 0.568 0.050 0.644 0.693 0.184 0.794 0.978 20 % 250 0.101 1.502 1.603 0.099 1.720 1.819 0.308 1.898 2.206 500 0.057 0.780 0.837 0.054 0.914 0.968 0.177 1.034 1.211 50 % 250 0.153 2.395 2.548 0.160 2.682 2.842 0.317 3.346 3.663 500 0.073 1.249 1.322 0.068 1.437 1.505 0.199 1.699 1.898 Concave 0 % 250 0.058 0.690 0.748 0.091 0.905 0.996 0.424 1.360 1.785 500 0.032 0.369 0.402 0.046 0.478 0.524 0.243 0.662 0.905 20 % 250 0.065 0.887 0.952 0.101 1.118 1.219 0.509 1.516 2.025 500 0.042 0.456 0.498 0.057 0.576 0.633 0.273 0.761 1.034 50 % 250 0.090 1.396 1.487 0.162 1.781 1.943 0.871 2.515 3.386 500 0.067 0.704 0.771 0.103 0.881 0.984 0.497 1.290 1.787 Known margins Parametric margins Nonparametric margins Censoring rate n IBIAS$$^{2}$$ IVAR IMSE IBIAS$$^{2}$$ IVAR IMSE IBIAS$$^{2}$$ IVAR IMSE Constant 0 % 250 0.002 0.382 0.384 0.002 0.604 0.606 0.232 0.956 1.188 500 0.000 0.191 0.191 0.001 0.303 0.304 0.127 0.478 0.605 20 % 250 0.002 0.516 0.518 0.003 0.744 0.746 0.270 1.136 1.406 500 0.000 0.268 0.268 0.001 0.393 0.394 0.131 0.547 0.678 50 % 250 0.003 0.788 0.791 0.013 1.157 1.170 0.524 1.866 2.390 500 0.001 0.429 0.430 0.006 0.636 0.642 0.264 0.980 1.244 Convex 0 % 250 0.091 1.059 1.150 0.080 1.337 1.417 0.289 1.586 1.875 500 0.054 0.514 0.568 0.050 0.644 0.693 0.184 0.794 0.978 20 % 250 0.101 1.502 1.603 0.099 1.720 1.819 0.308 1.898 2.206 500 0.057 0.780 0.837 0.054 0.914 0.968 0.177 1.034 1.211 50 % 250 0.153 2.395 2.548 0.160 2.682 2.842 0.317 3.346 3.663 500 0.073 1.249 1.322 0.068 1.437 1.505 0.199 1.699 1.898 Concave 0 % 250 0.058 0.690 0.748 0.091 0.905 0.996 0.424 1.360 1.785 500 0.032 0.369 0.402 0.046 0.478 0.524 0.243 0.662 0.905 20 % 250 0.065 0.887 0.952 0.101 1.118 1.219 0.509 1.516 2.025 500 0.042 0.456 0.498 0.057 0.576 0.633 0.273 0.761 1.034 50 % 250 0.090 1.396 1.487 0.162 1.781 1.943 0.871 2.515 3.386 500 0.067 0.704 0.771 0.103 0.881 0.984 0.497 1.290 1.787 From Table 2, it can be seen that the estimation performance deteriorates with increasing censoring rate. Since right-censoring causes loss of information, this result is to be expected. Further, Table 2 shows that the results with parametrically estimated conditional marginal survival functions are close to those with known conditional margins, and that with the former better precision and accuracy is attained than with nonparametrically estimated conditional marginal survival functions. This observation confirms the additional uncertainty induced by marginal estimation, and in particular by the Beran estimator, which is nonparametric and has a slower convergence rate than its (correctly-specified) parametric counterpart. Table 2 also indicates that the estimation performance improves with increasing sample size. Similar conclusions are reached under the Frank and Gumbel copulas (see Tables S3 and S5 in Appendix B of the supplementary material available at Biostatistics online). A graphical representation of the results is provided in Figure 1 for the convex model with sample size $$n=250$$ under the Clayton copula. We observe that, on average, the underlying functional form is estimated successfully, with slightly wider confidence intervals for nonparametrically estimated margins and at higher censoring rates. Fig. 1. View largeDownload slide The Kendall’s tau estimates for the convex model with sample size $$n=250$$ under the Clayton copula. Known margins (left panel), parametrically estimated margins (middle panel) and nonparametrically estimated margins (right panel)—no censoring (top panel), $$20\%$$ of censoring (middle panel) and $$50\%$$ of censoring (bottom). The gray line represents the true Kendall’s tau; the dashed black line corresponds to the Kendall’s tau estimates for the Clayton copula averaged over 200 Monte Carlo samples and the dotted black lines represents the $$5{\rm{th}}$$ and $$95{\rm{th}}$$ percentiles of the Kendall’s tau estimates for the Clayton copula, respectively. Fig. 1. View largeDownload slide The Kendall’s tau estimates for the convex model with sample size $$n=250$$ under the Clayton copula. Known margins (left panel), parametrically estimated margins (middle panel) and nonparametrically estimated margins (right panel)—no censoring (top panel), $$20\%$$ of censoring (middle panel) and $$50\%$$ of censoring (bottom). The gray line represents the true Kendall’s tau; the dashed black line corresponds to the Kendall’s tau estimates for the Clayton copula averaged over 200 Monte Carlo samples and the dotted black lines represents the $$5{\rm{th}}$$ and $$95{\rm{th}}$$ percentiles of the Kendall’s tau estimates for the Clayton copula, respectively. We evaluate the proposed testing strategy through the empirical type I error and the empirical power attained at significance level $$\alpha = 0.05$$. Under each of the considered scenarios for Kendall’s tau, we perform the test focusing on the cases with sample size $$n=250$$. We obtain an approximate $$p$$-value based on $$B=200$$ bootstrap samples for each simulated sample under the Clayton copula and use $$B=100$$ bootstrap samples in our supplemental investigations under the Frank and Gumbel copulas. The rejection rates under the Clayton copula are reported in Table 3, those under the Frank and Gumbel copulas are listed in Appendix B of the supplementary material available at Biostatistics online. For known conditional margins we consider two approaches regarding the bandwidth choice in the bootstrap procedure: (i) in each bootstrap sample we use the bandwidth that has been selected for the analysis of the original data; (ii) for each bootstrap sample we select a bandwidth that is optimal for the bootstrap sample under consideration. Since approach (ii) is computationally demanding, we only investigate it in the case of known margins. Table 3 The empirical type I error $$($$constant model$$)$$ and the empirical power $$($$convex and concave models$$)$$ at $$5%$$ significance level, based on $$M=500$$ replications each with $$B=200$$ bootstrap samples, under the Clayton copula with sample size $$n=250$$, obtained reusing a sample-optimal bandwidth in each bootstrap sample. For known margins, reported in brackets are the rejection rates when bandwidth is updated in each bootstrap sample Known margins Parametric margins Nonparametric margins Censoring rate Rejection rate Rejection rate Rejection rate Constant 0 % 0.100 (0.050) 0.074 0.140 20 % 0.092 (0.054) 0.080 0.126 50 % 0.084 (0.038) 0.072 0.126 Convex 0 % 0.992 (0.874) 0.954 0.892 20 % 0.986 (0.776) 0.948 0.912 50 % 0.878 (0.566) 0.796 0.724 Concave 0 % 0.998 (0.956) 0.984 0.868 20 % 0.994 (0.934) 0.976 0.776 50 % 0.954 (0.728) 0.864 0.546 Known margins Parametric margins Nonparametric margins Censoring rate Rejection rate Rejection rate Rejection rate Constant 0 % 0.100 (0.050) 0.074 0.140 20 % 0.092 (0.054) 0.080 0.126 50 % 0.084 (0.038) 0.072 0.126 Convex 0 % 0.992 (0.874) 0.954 0.892 20 % 0.986 (0.776) 0.948 0.912 50 % 0.878 (0.566) 0.796 0.724 Concave 0 % 0.998 (0.956) 0.984 0.868 20 % 0.994 (0.934) 0.976 0.776 50 % 0.954 (0.728) 0.864 0.546 The rejection rate under the constant model allows to investigate the empirical type I error of the test. From Table 3, we see that reusing the bandwidth value in the bootstrap, generally leads to an empirical type I error that is (slightly) higher than the nominal level $$\alpha=0.05$$. However, if the bandwidth value is updated for each bootstrap sample, the empirical type I error is much closer to the nominal level $$\alpha=0.05$$. Further, the test tends to be less conservative when the conditional margins are estimated nonparametrically. The rejection rates under the convex and concave models allow to assess the empirical power of the test. If the bandwidth is reused in the bootstrap, we observe a higher empirical power for known and parametrically estimated conditional margins than for nonparametrically estimated conditional margins. Updating the bandwidth in each bootstrap sample yields a more accurate estimate of the true power, which is considerably lower than the ones obtained using the sample-optimal bandwidth. Further, the empirical power decreases as the censoring rate increases. The test results under the Frank and Gumbel copulas support these conclusions (see Tables S4 and S6 in Appendix B of the supplementary material available at Biostatistics online). To evaluate the test performance under small deviations from the null hypothesis, we focus on the known margins setting under the Clayton copula with $$20\%$$ censoring rate and examine two other convex models: (a) $$\tau(X) = 0.033(X-3)^2 +0.3$$ and (b) $$\tau(X) = 0.066(X-3)^2 +0.3$$. Model (a) is closer to a constant model than Model (b), and both are considerably flatter than the convex Model (c) $$\tau(X) = 0.1(X-3)^2 +0.3$$ examined so far. The empirical power values for these two convex models, (a) $$0.260$$ and (b) $$0.698$$, are much lower than the corresponding entry (c) $$0.986$$ in Table 3, confirming a boost in power with larger deviations from a constant model. A comparison of the results across different copula families indicates a better performance in the estimation and in the testing for the Frank and Gumbel copulas than for the Clayton copula, especially in case of nonparametrically estimated conditional margins and at higher censoring rates. In the context of joint survival models, a copula with lower (upper) tail dependence represents the association between late (early) event times. Typically, late event times are subject to right-censoring; the loss of information thus occurs mainly in the lower tail area. Hence, it follows that the Clayton copula is affected more by the right-censoring, leading to a relatively inferior performance in the estimation and in the testing strategy. 4.2. Estimation and test results under a misspecified copula family To assess the impact of copula misspecification, we perform local linear estimation $$(p=1)$$ and obtain calibration function estimates under both the Frank and Gumbel copulas for data generated from the Clayton copula. To evaluate the sole effect of copula misspecification, we assume known conditional Weibull marginal survival functions. We assess the performance of the estimation and the testing method under misspecified copulas following the same steps as in Section 4.1. Under each scenario for Kendall’s tau, we limit our investigations to $$M=200$$ simulated data sets of size $$n=250$$; for each data set we draw $$B=100$$ bootstrap samples for $$p$$-value calculation. The estimation results are summarized in Table S1 and visualized in Figure S2 while the rejection rates are reported in Table S2 all collected in Appendix B of the supplementary material available at Biostatistics online. It follows that, under a true Clayton copula, estimation is reasonably accurate and empirical power remains high if the Frank copula is used. However, estimation performance deteriorates drastically and empirical power lowers substantially if the Gumbel copula is applied. Since the Frank copula resembles the Clayton copula more closely than does the Gumbel copula, especially in the presence of censoring, this conclusion is as expected. 5. Data analysis In this section, we analyze a subset of the data from the Diabetic Retinopathy Study (DRS), which was considered in Huster and others (1989), among others. The clinical objective of the study was to investigate the effectiveness of laser photocoagulation in delaying the onset of blindness in diabetic retinopathy patients. One eye of each patient was randomly selected for treatment and the other eye was observed without treatment. The patients were followed up, for approximately $$6.5$$ years, from randomization to blindness, i.e. until their visual acuity got below a threshold at two consecutive visits. A scientific objective of the study was to understand the dependence between the times to blindness (in months) of the treated and untreated eyes, as well as the impact of age at onset of diabetes (in years) on this dependence. The analysis subset consists of $$n=197$$ patients, randomly selected among the participants who are identified as at high risk for blindness by the DRS Research Group. The first two datalines are given by $$(Y_{11},Y_{21},\delta_{11},\delta_{21},X_1)=(46.23,46.23,0,0,28)$$ and $$(Y_{12},Y_{22},\delta_{12},\delta_{2n},X_2)=(42.50,31.30,0,1,12)$$, respectively. For instance, the first subject had diabetes at age 28, and did not experience blindness till lost to follow-up at 46.23 months. On the other hand, the second subject had diabetes at age 12 and experienced blindness only in the untreated eye at 31.30 months after participating in the study; time to blindness in the treated eye was censored at 42.50 months. The (univariate) censoring rates are 73% for the treated eyes and 49% for the untreated eyes. More details on the dataset can be found in Huster and others (1989). Our objective is to understand the dependence between the times to blindness of the treated and untreated eyes. We consider the age at onset of diabetes as a continuous covariate, as opposed to the dichotomous age groups (juvenile versus adult) (Huster and others, 1989), and investigate if it has a significant effect on this dependence. Following Huster and others (1989), we use Weibull marginal survival functions and the Clayton copula to account for dependence between the event times. As outlined in Section 2, we fit the conditional joint survival function of the event times given the age at diabetes onset in two stages. For the conditional marginal survival functions, our preliminary model comparisons using a likelihood ratio test indicate a significant treatment effect on the time to blindness ($$p$$-value $$< 0.001$$), and presence of interaction between treatment and age at diabetes onset ($$p$$-value $$= 0.004$$). Hence, we use Weibull margins $$S_{1|x}(t|x) = \exp\{-\lambda_1 t^{\rho_1} \exp(\beta_1 x)\}$$ and $$S_{2|x}(t|x) = \exp\{-\lambda_2 t^{\rho_2} \exp(\beta_2 x)\}$$, where $$x$$ denotes the age at diabetes onset. The parameter estimates under these models are provided in Table S8 under Appendix C of the supplementary material available at Biostatistics online. The opposite signs of the coefficients for the covariate in the two models support the interaction between treatment and age at diabetes onset. In particular, the negative coefficient for treated eyes implies that time to blindness is delayed with age of diabetes onset, hence treatment is more effective for patients who experience diabetes later in life. We also employ the Beran estimator with the Epanechnikov kernel and obtain nonparametric estimates of each conditional margin at $$10$$ bandwidth values, chosen equidistant on a logarithmic scale between $$3$$ and $$57$$ years. The same $$10$$ bandwidth values are considered in the local likelihood estimation. For the conditional copula model with parametrically estimated conditional margins, the optimal bandwidth value is $$h_\mathbb{C} =42$$. When the conditional margins are estimated nonparametrically using the Beran estimator, we performed bandwidth selection in two steps, as outlined in Section 2.3, and obtained the optimal bandwidth vector $$(h_1, h_2,h_\mathbb{C} ) = (17, 17, 42)$$. The resulting nonparametric estimates for the conditional margins match closely with the estimates obtained under the Weibull model (the mean square deviations are 0.0030 and 0.0019 for the first and second margin, respectively), supporting the appropriateness of this model for the univariate event times. The local likelihood estimates of Kendall’s tau at the selected bandwidths under each type of conditional margins are shown in Figure 2, together with $$90\%$$ bootstrap confidence intervals. The latter are obtained by resampling the original data points $$B=1000$$ times, and fitting a joint model for each bootstrap sample at the bandwidth values selected for the original data. The results from the parametric and nonparametric conditional margins both suggest an increasing linear pattern in the strength of dependence with age at diabetes onset. The relatively weaker dependence at younger onset of diabetes indicates that the progression of diabetic retinopathy in the treated and untreated eyes may not be as similar as it is when diabetes onset is at a later age. The physical reliance of treated and untreated eyes seems to be higher at older ages of diabetes onset, which can perhaps be explained as an aging effect. For comparisons, we also fit constant and linear calibration models using maximum likelihood. The resulting estimates are displayed in Figure 2 in Kendall’s tau scale. As can be seen, the local likelihood estimates are in close agreement with the parametric estimates under the linear calibration model. Furthermore, we observe slightly wider bootstrap confidence intervals, hence a larger variation in Kendall’s tau estimates when the Beran estimator is used. This observation is in line with our findings in Section 4. There is more uncertainty in the local likelihood estimates when age at diabetes onset is greater than 40. This is mainly due to the limited number of patients (31 out of 197) with high onset age, for which most of the observations are censored for at least one eye. Fig. 2. View largeDownload slide Local likelihood estimates of Kendall’s tau (dashed lines) as a function of age at onset of diabetes obtained from parametrically (left panel) and nonparametrically (right panel) estimated conditional survival functions, along with $$90\%$$ bootstrap confidence intervals (dotted lines) under the Clayton copula. Also shown are maximum likelihood estimates of Kendall’s tau obtained under the constant (gray dot-dashed line) and linear (gray solid line) calibration models. Fig. 2. View largeDownload slide Local likelihood estimates of Kendall’s tau (dashed lines) as a function of age at onset of diabetes obtained from parametrically (left panel) and nonparametrically (right panel) estimated conditional survival functions, along with $$90\%$$ bootstrap confidence intervals (dotted lines) under the Clayton copula. Also shown are maximum likelihood estimates of Kendall’s tau obtained under the constant (gray dot-dashed line) and linear (gray solid line) calibration models. To decide whether the observed variation in the strength of dependence (ranging approximately from $$0.2$$ to $$0.7$$ in Kendall’s tau) is significant or not, we perform the GLR test as outlined in Section 3. The $$p$$-values based on $$B=1000$$ bootstrap samples under the null hypothesis are $$0.164$$ for the parametric conditional margins and $$0.168$$ for the nonparametric conditional margins. Hence, there is not enough evidence in the data to reject the constant conditional copula model. The traditional likelihood ratio test between the constant and linear calibration models also supports this conclusion with $$p$$-values $$0.111$$ and $$0.102$$ for parametrically and nonparametrically estimated conditional margins, respectively. We reach similar conclusions under the Frank and Gumbel copulas (see Table S9 in Appendix C of the supplementary material available at Biostatistics online). These results together suggest that the impact of age at diabetes onset on the dependence between the times-to-blindness in the treated and untreated eyes is not statistically significant. Nevertheless, the observed increasing pattern in the strength of dependence may be of clinical interest, and would be worth further investigation in a larger sample. 6. Discussion In this article, we outline an estimation and a testing strategy to assess the impact of a continuous cluster-level covariate on the strength of within-cluster dependence of right-censored event time data, as modeled via a conditional copula. A local likelihood approach is used to estimate the functional form of the conditional copula parameter and a GLR test is described to test its constancy. A bootstrap procedure is employed to obtain an approximate $$p$$-value for the test. The performance of the estimation and the testing method is evaluated in a simulation study, under different rates of right-censoring and for various parametric copula families, considering both parametrically and nonparametrically estimated margins. The results indicate that the proposed local likelihood approach leads to on target estimation, with more uncertainty for nonparametrically estimated conditional margins than for parametrically estimated conditional margins, and that, depending on the considered parametric copula family, the testing strategy has reasonable to high power. We further observe that the bootstrap-based type I errors are somewhat inflated if the bandwidth for the original data is re-used, but are close to the nominal level if the bandwidth is updated in each bootstrap sample. The second option, however, is quite demanding from a computational point of view. Details on computation times can be found in Table S7 under Appendix B of the supplementary material available at Biostatistics online. In the implementation of the proposed methods, it is first assumed that the copula family is correctly specified. Second, we investigate the effect of a misspecified copula family. Our results suggest that selection of an appropriate copula family is indispensable. In practice, one can select the copula family via a likelihood-based criterion, such as Akaike information criterion as typically employed in copula modeling, or by comparing the predictive performance of competitive copulas as suggested in Acar and others (2011). In our analysis of the diabetic retinopathy data, we followed Huster and others (1989) and employed the Clayton copula. A cross-validated likelihood based comparison (as reported in Table S9 of the supplementary material available at Biostatistics online) further confirms the appropriateness of this choice. The estimation results under the Frank and Gumbel copula yield a similar pattern for the impact of age at onset on the strength of dependence between the time to blindness of treated and untreated eyes (see Appendix C of the supplementary material available at Biostatistics online). The procedures in this article are developed for clustered right-censored event time data, assuming that the censoring times are independent of the event times and the covariate. This assumption is required to obtain the likelihood function in (2.3), which contains only the distribution of the event times but not the censoring distribution. Interesting topics for further research, include the extension of the proposed method to event time data subject to dependent censoring or to interval censored event time data. It is possible to extend the methods to include multiple event times using a multivariate copula, where the latter can be constructed, for instance, via pair-copula constructions. A recent work that considers pair-copula constructions for joint modeling of unconditional multiple event times is Barthel and others (2017). Incorporating multiple covariates in the proposed methods, on the other hand, is not straight-forward, especially when the conditional marginals are estimated using the Beran estimator. While one can use single-index models (Fermanian and Lopez, 2015) or additive models (e.g.] Vatter and Chavez-Demoulin, 2015) to overcome the so-called curse of dimensionality in estimating conditional copula parameter, such extensions require modifications in the likelihood methodology to account for right-censoring, and their practical value might be limited by the computational burden. 7. Software Software in the form of an R package, along with the R codes for the data analysis and a simulation example are available at https://github.com/EFAcar/CondiCopSurv. Supplementary material supplementary material is available at http://biostatistics.oxfordjournals.org. Acknowledgments We would like to thank the co-editors, the associate editor, and the two referees for their constructive comments that help improve the content and presentation of our manuscript. Conflict of Interest: None declared. Funding Hasselt University Visiting Scholar Program (BOF Short Research Stay); Natural Sciences and Engineering Research Council (NSERC) of Canada’s Discovery Grant Program (Grant 435943-2013); Canadian Statistical Sciences Institute (CANSSI) Collaborative Research Team Project; Interuniversity Attraction Poles Programme (IAP-network P7/06), Belgian Science Policy Office. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation—Flanders (FWO) and the Flemish Government—Department EWI. References Abegaz F., Gijbels I. and Veraverbeke N. ( 2012). Semiparametric estimation of conditional copulas. Journal of Multivariate Analysis 110, 43– 73. Google Scholar CrossRef Search ADS Acar E. F., Craiu R. V. and Yao F. ( 2011). Dependence calibration in conditional copulas: a nonparametric approach. Biometrics 67, 445– 453. Google Scholar CrossRef Search ADS PubMed Acar E. F., Craiu R. V. and Yao F. ( 2013). Statistical testing of covariate effects in conditional copula models. Electronic Journal of Statistics 7, 2822– 2850. Google Scholar CrossRef Search ADS Acar E. F., Genest C. and Nešlehová J. ( 2012). Beyond simplified pair-copula constructions. Journal of Multivariate Analysis 110, 74– 90. Google Scholar CrossRef Search ADS Barthel N., Geerdens C., Killiches M., Janssen P. and Czado C. ( 2017). Vine copula based inference of multivariate event time data. Computational Statistics & Data Analysis (to appear), arXiv:1603.01476. Beran R. ( 1981). Nonparametric regression with randomly censored survival data. https://www.researchgate.net/ publication/316173118_Nonparametric_regression_with_randomly_censored_survival_data. Chen X., Fan Y., Pouzo D. and Ying Z. ( 2010). Estimation and model selection of semiparametric multivariate survival functions under general censorship. Journal of Econometrics 157, 129– 142. Google Scholar CrossRef Search ADS PubMed Clayton D. G. ( 1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141– 151. Google Scholar CrossRef Search ADS Davison A. and Hinkley D. ( 1997). Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge. Google Scholar CrossRef Search ADS Ding A. A., Hsieh J.-J. and Wang W. ( 2015). Local linear estimation of concordance probability with application to covariate effects models on association for bivariate failure-time data. Lifetime Data Analysis 21, 42– 74. Google Scholar CrossRef Search ADS PubMed Fan J. and Gijbels I. ( 1996). Local Polynomial Modelling and Its Applications , 1st edition, Volume 66. London: Chapman & Hall. Fan J. and Jiang J. ( 2005). Nonparametric inferences for additive models. Journal of American Statistical Association 100, 890– 907. Google Scholar CrossRef Search ADS Fan J., Zhang C. M. and Zhang J. ( 2001). Generalized likelihood ratio statistics and wilks phenomenon. Annals of Statistics 29, 153– 193. Google Scholar CrossRef Search ADS Fan J. and Zhang W. Y. ( 2004). Generalized likelihood ratio tests for spectral density. Biometrika 91, 195– 209. Google Scholar CrossRef Search ADS Fermanian J.-D. and Lopez O. ( 2015). Single-index copulae. arXiv:1512.07621. Gannoun A., Saracco J. and Yu K. ( 2007). Comparison of kernel estimators of conditional distribution function and quantile regression under censoring. Statistical Modelling 7, 329– 344. Google Scholar CrossRef Search ADS Geerdens C., Claeskens G. and Janssen P. ( 2013). Goodness-of-fit tests for the frailty distribution in proportional hazards models with shared frailty. Biostatistics 14, 433– 446. Google Scholar CrossRef Search ADS PubMed Geerdens C., Claeskens G. and Janssen P. ( 2016). Copula based flexible modeling of associations between clustered event times. Lifetime Data Analysis 22, 363– 381. Google Scholar CrossRef Search ADS PubMed Hobæk Haff I., Aas K. and Frigessi A. ( 2010). On the simplified pair-copula construction—simply useful or too simplistic? Journal of Multivariate Analysis 101, 1296– 1310. Google Scholar CrossRef Search ADS Hougaard P. ( 1986). A class of multivariate failure time distributions. Biometrika 73, 671– 678. Huster W. J., Brookmeyer R. and Self S. G. ( 1989). Modelling paired survival data with covariates. Biometrics 45, 145– 156. Google Scholar CrossRef Search ADS PubMed Massonnet G., Janssen P. and Duchateau L. ( 2009). Modeling udder data using copula models for quadruples. Journal of Statistical Planning and Inference 139, 3865– 3877. Google Scholar CrossRef Search ADS Oakes D. ( 1982). A model for association in bivariate survival data. Journal of Royal Statistical Society Series B 44, 414– 422. Patton A. J. ( 2006). Modelling asymmetric exchange rate dependence. International Economics Review 47, 527– 556. Google Scholar CrossRef Search ADS Shih J. H. and Louis T. A. ( 1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics 51, 1384– 1399. Google Scholar CrossRef Search ADS PubMed Tibshirani R. and Hastie T. ( 1987). Local likelihood estimation. Journal of the American Statistical Association 82, 559– 567. Google Scholar CrossRef Search ADS Vatter T. and Chavez-Demoulin V. ( 2015). Generalized additive models for conditional dependence structures. Journal of Multivariate Analysis 141, 147– 167. Google Scholar CrossRef Search ADS © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Biostatistics – Oxford University Press

**Published: ** Apr 1, 2018

Loading...

personal research library

It’s your single place to instantly

**discover** and **read** the research

that matters to you.

Enjoy **affordable access** to

over 18 million articles from more than

**15,000 peer-reviewed journals**.

All for just $49/month

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Read from thousands of the leading scholarly journals from *SpringerNature*, *Elsevier*, *Wiley-Blackwell*, *Oxford University Press* and more.

All the latest content is available, no embargo periods.

## “Hi guys, I cannot tell you how much I love this resource. Incredible. I really believe you've hit the nail on the head with this site in regards to solving the research-purchase issue.”

Daniel C.

## “Whoa! It’s like Spotify but for academic articles.”

@Phil_Robichaud

## “I must say, @deepdyve is a fabulous solution to the independent researcher's problem of #access to #information.”

@deepthiw

## “My last article couldn't be possible without the platform @deepdyve that makes journal papers cheaper.”

@JoseServera

DeepDyve ## Freelancer | DeepDyve ## Pro | |
---|---|---|

Price | FREE | $49/month |

Save searches from | ||

Create lists to | ||

Export lists, citations | ||

Read DeepDyve articles | Abstract access only | Unlimited access to over |

20 pages / month | ||

PDF Discount | 20% off | |

Read and print from thousands of top scholarly journals.

System error. Please try again!

or

By signing up, you agree to DeepDyve’s Terms of Service and Privacy Policy.

Already have an account? Log in

Bookmark this article. You can see your Bookmarks on your DeepDyve Library.

To save an article, **log in** first, or **sign up** for a DeepDyve account if you don’t already have one.

All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.

ok to continue