Biometrics 74, 100–108 DOI: 10.1111/biom.12746
A Pairwise Likelihood Augmented Cox Estimator for
and Yi Li
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health,
Bethesda, Maryland 20892, U.S.A.
Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
Summary. Survival data collected from a prevalent cohort are subject to left truncation, which makes the analysis challenging.
Conditional approaches for left-truncated data can be inefficient because they ignore the information in the marginal likelihood
of the truncation times. Length-biased sampling methods may improve the estimation efficiency, but only when the underlying
truncation time is uniform; otherwise, they may generate biased estimates. We propose a semiparametric method for
left-truncated data under the Cox model with no parametric distributional assumption about the truncation times. Our
approach is to make inference based on the conditional likelihood augmented with a pairwise likelihood, which eliminates the
truncation distribution yet retains the information about the regression coefficients and the baseline hazard function in the
marginal likelihood. An iterative algorithm is provided to solve for the regression coefficients and the baseline hazard function
simultaneously. Using empirical process and U-process theories, we show that the proposed estimator is consistent and
asymptotically normal, with a closed-form consistent variance estimator. Simulation studies show a substantial efficiency gain
of our estimator in both the regression coefficients and the cumulative baseline hazard function over the conditional approach
estimator. When the uniform truncation assumption holds, our estimator enjoys smaller biases and efficiency comparable
to that of the full maximum likelihood estimator. An application to the analysis of a chronic kidney disease cohort study
illustrates the utility of the method.
Key words: Chronic kidney disease; Composite likelihood; Empirical process; Self-consistency; U-process.
Survival data collected from a prevalent cohort, which includes
patients who already have the disease at study enrollment,
are subject to left truncation. This is because those
who died with the disease before enrollment had no chance
of being selected, whereas the selected patients, having
survived until enrollment, are healthier on average.
To avoid overestimating survival, conventional approaches
make inferences conditional on the truncation times (Kalbfleisch
and Lawless, 1991; Wang et al., 1993). These approaches disregard
the information about the regression coefficients in the
marginal likelihood of the truncation times, and hence a loss
of efficiency is expected when additional knowledge of the
underlying truncation distribution is available (Huang and
If the underlying truncation time is uniformly distributed,
left truncation reduces to length-biased sampling (Vardi,
1989), that is, the probability of selecting a subject is pro-
portional to the length of his or her underlying failure time;
see a comprehensive review by Shen et al. (2017). Among the
newly developed regression methods for length-biased data,
many are considerably more efficient than the conditional
approach because they incorporate information from the
observed truncation times (Qin and Shen, 2010; Qin et al.,
2011; Huang et al., 2012; Huang and Qin, 2012; Ning et al.,
2014). Nevertheless, when the uniform
truncation assumption is violated, these methods may yield
inconsistent estimates (Huang and Qin, 2012).
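The selection mechanism described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the proposed method: it assumes exponential failure times with rate 1 and a truncation time that is uniform on (0, tau), and the function name, sample size, and parameter values are all hypothetical choices made for illustration. Under these assumptions, a subject enters the cohort only if the failure time exceeds the truncation time, so the selection probability is proportional to the failure time and the selected subjects should survive noticeably longer on average than the underlying population.

```python
import random


def sample_prevalent_cohort(n_population, tau, rate, seed=0):
    """Simulate left truncation under an assumed uniform truncation time.

    Returns the failure times of the whole underlying population and the
    (truncation time, failure time) pairs of the subjects who are selected,
    that is, those who have not yet failed at enrollment.
    """
    rng = random.Random(seed)
    population, selected = [], []
    for _ in range(n_population):
        t = rng.expovariate(rate)   # underlying failure time T (assumed exponential)
        a = rng.uniform(0.0, tau)   # underlying truncation time A (assumed uniform)
        population.append(t)
        if t >= a:                  # subject is observed only if T >= A
            selected.append((a, t))
    return population, selected


# Illustrative values only; tau is chosen large relative to the failure times.
population, selected = sample_prevalent_cohort(200_000, tau=20.0, rate=1.0)
pop_mean = sum(population) / len(population)
sel_mean = sum(t for _, t in selected) / len(selected)

print(f"mean failure time, underlying population: {pop_mean:.3f}")
print(f"mean failure time, prevalent cohort:      {sel_mean:.3f}")
```

With these settings the cohort mean should be roughly twice the population mean, because the length-biased version of a rate-1 exponential distribution has mean E[T^2]/E[T] = 2. Replacing the uniform draw with a non-uniform truncation distribution changes the selection probabilities, which is the situation in which length-biased methods may no longer be appropriate.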
The motivating study is a prevalent cohort study of patients
with chronic kidney disease (CKD), sponsored by the Renal
Research Institute (Perlman et al., 2003). Following diagnosis,
CKD patients are generally referred to nephrologists
to receive specialized care and treatment. The investigators were
interested in whether the patient characteristics at referral
were associated with the disease progression to end-stage
renal disease (ESRD) or death. During study recruitment, from
June 2000 to January 2006, subjects with a glomerular filtration
rate (GFR) less than or equal to 50 ml/minute/1.73 m² were
invited to participate. The dataset is of moderate sample
size, so improving the estimation efficiency is important. However,
the statistical assessment in Section 4 indicated that the
motivating data deviate from the uniform truncation assumption,
which prompted us to seek a method that improves efficiency
while still yielding consistent estimates.
Recently, Huang and Qin (2013) proposed a more efficient
estimator for the additive hazards model under general left
truncation. They used a pairwise likelihood of the truncation
times to eliminate the unspecified truncation distribution
(Liang and Qin, 2000). In practice, however, the Cox model
is more commonly used than the additive hazards model,
© 2017, The International Biometric Society