# The ROC curve for regularly measured longitudinal biomarkers

The ROC curve for regularly measured longitudinal biomarkers Summary The receiver operating characteristic (ROC) curve is a commonly used graphical summary of the discriminative capacity of a thresholded continuous scoring system for a binary outcome. Estimation and inference procedures for the ROC curve are well-studied in the cross-sectional setting. However, there is a paucity of research when both biomarker measurements and disease status are observed longitudinally. In a motivating example, we are interested in characterizing the value of longitudinally measured CD4 counts for predicting the presence or absence of a transient spike in HIV viral load, also time-dependent. The existing method neither appropriately characterizes the diagnostic value of observed CD4 counts nor efficiently uses status history in predicting the current spike status. We propose to jointly model the binary status as a Markov chain and the biomarkers levels, conditional on the binary status, as an autoregressive process, yielding a dynamic scoring procedure for predicting the occurrence of a spike. Based on the resulting prediction rule, we propose several natural extensions of the ROC curve to the longitudinal setting and describe procedures for statistical inference. Lastly, extensive simulations have been conducted to examine the small sample operational characteristics of the proposed methods. 1. Introduction The receiver operating characteristic (ROC) curve is a graphical summary of the discriminative capacity of a thresholded continuous scoring system for a binary outcome. The curve consists of pairs of true positive and false positive rates as the threshold is varied (Swets and Pickett, 1982). Although many alternative measures have been proposed (Uno and others, 2007, 2013; Pencina and others, 2008; Steyerberg and others, 2010), the ROC curve remains the most commonly used in many fields. In medical research, ROC curves are often used to characterize the quality of a continuous biomarker as a diagnostic for binary statuses such as “diseased” versus “non-diseased” (Pepe, 2003; Zhou and others, 2009). Well-studied in the cross-sectional setting, the ROC curve has been generalized to settings where the outcome of interest is time-to-event (Heagerty and others, 2000; Zheng and Heagerty, 2004; Heagerty and Zheng, 2005) and where the biomarker is longitudinally measured (Foulkes and others, 2010; Liu and Albert, 2014). However, less research has considered both longitudinal biomarker measurements and binary statuses. A motivating example is data gathered from the Yale Prospective Longitudinal Pediatric HIV Cohort. The cohort comprises 97 children born to HIV-infected mothers in the New Haven, CT, area since 1985. Various measurements were taken on the participants every 2–3 months over the 10-year period 1996–2006. Among these measurements, we focus on a continuous biomarker, CD4+ lymphocyte count, as a predictor of a binary outcome, “blip” status, the presence or absence of a transient spike in viral load (Paintsil and others, 2008). Let $$X_{ij}$$ and $$Y_{ij}$$ denote the biomarker value and a binary status of patient $$i$$ at visit $$j$$, respectively, $$i=1,\ldots,n$$, $$j=1,\ldots, n_i.$$ The values $$X_{ij}$$ may be direct assay measurements, as in the motivating example, or they may be derived or composite quantities. To assess the predictive value of the longitudinal biomarker $$X_{ij}$$ for predicting $$Y_{ij}$$, Liu and Wu (2003) and Liu and others (2005) propose a simple mixed effect regression model (Breslow and Clayton, 1993)   $$g\{\mbox{P}(Y_{ij}=1 \mid X_{i1},\cdots,X_{in_i})\}=\alpha_i+\beta_i X_{ij}, \label{math:mixedmodel}$$ (1.1) where $$g(\cdot)$$ is the logit function and $$\alpha_i$$ and $$\beta_i$$ are, respectively, the subject-specific random intercept and slope. Similar models are described in Foulkes and others (2010) and Albert (2012). The random vector $$(\alpha_i, \beta_i)'$$ is assumed to follow a Gaussian distribution   $$\left( \begin{array}{c} \alpha_i \\ \beta_i \end{array} \right)\sim N\left\{\left( \begin{array}{c} \alpha_0 \\ \beta_0 \end{array} \right)\!, {\boldsymbol{\Gamma}}_0 \right\}\!.$$ The ROC curve summarizing the diagnostic value of $$X_{ij}$$ is then constructed based on pairs   $$\left\{(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij}), i=1, \cdots, n; j=1, \cdots, n_i\right\}\!,$$ where $$(\hat \alpha_i, \hat \beta_i)'$$ are estimates of the subject-specific random effects obtained from the observed data, for example,   $$\left( \begin{array}{c} \hat \alpha_i \\ \hat \beta_i \end{array} \right)=E\left\{\left( \begin{array}{c} \alpha_i \\ \beta_i \end{array} \right)\biggm | X_{i1}, \ldots, X_{in_i}, Y_{i1}, \ldots, Y_{in_i}, \hat\alpha_0, \hat \beta_0, \hat {\boldsymbol{\Gamma}}_0 \right\}\!,$$ and $$\hat\alpha_0$$, $$\hat \beta_0$$, and $$\hat \Gamma_0$$ are maximum likelihood estimators for the corresponding population parameters. While this approach is simple and intuitive, we mention several limitations. First, the parametric assumptions may be too restrictive for some applications. For example, as discussed below, the Yale pediatric HIV data suggest greater dependency among biomarkers and disease statuses nearer in time. Neither is accounted for by model (1.1), which is symmetric in time. Second, the subject-specific random effect estimate $$\hat \alpha_i$$ and $$\hat \beta_i,$$ as defined above, are not available at visit $$j$$ as the biomarker levels $$(X_{i(\,j+1)}, \ldots, X_{in_i})$$ and responses $$(Y_{i(\,j+1)}, \ldots, Y_{in_i})$$ are not yet observed. Third, the approach uses the same data both to fit the model and, by using the fitted biomarkers to construct the ROC curve, to assess the quality of the fit. One expects such an assessment to overestimate the true diagnostic quality of the biomarker (Janes and others, 2009). Efforts to set aside a group of patients for validation after estimation encounter the difficulty that subject effect estimates for the validation patients are unavailable (Foulkes and others, 2010). Lastly and more conceptually, the notion of ROC curve stands to be refined in the context of longitudinal measurements of multiple patients. In contrast to the cross-sectional setting, several useful ROC curves suggest themselves. For example, the predictive performance of the biomarker for a given patient, as determined by that patient’s history, can be quite different from the predictive performance for the entire patient population. In the next section, we propose a general framework to address these limitations. 2. Methods We first note two properties desirable in a framework for assessing diagnostic performance in the longitudinal, multiple subject design under consideration. First, to assess the predictive performance of a biomarker, the biomarker should depend only on data that is available when a prediction is to be made. We adopt the vantage of a practitioner who has previous biomarker and status data for a patient, is confronted with a current biomarker for the patient, and must now predict current status. In the HIV example, due to the turnaround time of the tests involved, CD4 count or percentage is normally available before blip status. The patient’s history includes previous blip statuses, and the clinician may need to determine a course of treatment based on as-yet unavailable current blip status as predicted from current CD4 count or percentage. A similar problem is described by Yang and others (2015). Here, the “status” is the presence of absence of influenza-like illness in periodic reports issued by the Center for Disease Control (CDC), and the predictor or “biomarker” is real-time internet search data. The CDC’s reports describe outbreaks at a 1–3 week delay. When making real-time predictions with the search data, only CDC outbreak data referencing previous time points are available. In this case, an accurate early warning of the outbreak can be very important for public health. As predictions may be made at different times, the accuracy of the prediction and the associated ROC curve will depend on time, with the corresponding prediction depending only on patient history available at that time. Second, two types of prediction performance should be differentiated: that for an individual patient and that for a patient population. For the former, we target the performance of $${\cal F}({\cal H}_{ij})$$ as a predictor of $$Y_{ij},$$ where $${\cal F}({\cal H}_{ij})$$ is a continuous score summarizing the predictive information contained in the history up to visit $$j$$ of a given patient $$i,$$ and $$\mathcal{H}_{ij}=\{X_{i1},\ldots,X_{ij},Y_{i1},\ldots,Y_{i(\,j-1)}\}.$$ For the latter, we are interested in the predictive performance of $${\cal F}({\cal H}_{ij})$$ in the entire patient population at a time $$j$$, that is, marginalizing across patients. In the following, we first generalize the simple mixed effect model (1.1) and discuss the two types of predictive performance under the proposed model. As discussed further below, easy extensions lead to more sophisticated models allowing for more flexible prediction rules. We assume that the longitudinal biomarker levels $$X_{ij}$$ follow an autoregressive process conditional on disease status $$Y_{ij} \in \{0, 1\},$$ which are generated by a Markov chain as in, e.g., Azzalini (1994). Specifically, we assume that for the $$i$$th patient   \begin{aligned} \mbox{P}\{Y_{ij}=b | Y_{i(\,j-1)}=a, {\cal H}_{i(\,j-1)}\}& = p_i^{(ab)}, a,b\in\{0,1\}, j=2, \ldots, \\ X_{ij} | \left\{Y_{ij}=a, {\cal H}_{ij}\right\}&=\theta_i^{(a)}+\rho_0X_{i(\,j-1)}+\epsilon_{ij}, j=1, \ldots, \\ \mbox{P}(Y_{i1}=a)& =p^{(a)} ~~~\mbox{and}~~~ X_{i0}=0, \end{aligned} \label{math:jointmodel} (2.1) where   \begin{align*} \rho_0 &\in (-1,1)\\ \epsilon_{ij} &\overset{iid}{\sim} \mathcal{N}(0, \sigma_0^2), j=1,\cdots, n_i,\\ \xi_i = \left( \begin{array}{c} \theta_i^{(0)} \\ \theta_i^{(1)} \\ g(p^{(01)}_i) \\ g(p^{(11)}_i) \end{array} \right) &\overset{iid}{\sim} N\left\{\left( \begin{array}{c} \theta^{(0)} \\ \theta^{(1)} \\ \mu^{(01)} \\ \mu^{(11)}\end{array} \right)\!, \left(\begin{array}{cc} {\boldsymbol{\Sigma}}_\theta & 0 \\ 0 & {\boldsymbol{\Sigma}}_p \end{array}\right) \right\}\!, i=1,\ldots,n, \end{align*} independently and identically, and $$\eta_0=(\rho_0,p^{(0)}, \theta^{(0)},\theta^{(1)},\mu^{(01)},\mu^{(11)},{\boldsymbol{\Sigma}}_\theta,{\boldsymbol{\Sigma}}_p)$$ are hyperparameters. $$X_{i0}$$ is set to 0 to initialize the autoregressive, implying that the baseline biomarker level $$X_{i1}$$ follows a Gaussian distribution conditional on the blip status. This set of parametric distributions for the random effects is chosen in part for convenience as they permit the model parameters to be estimated using many standard statistical software packages. Specifically, $$\{\theta^{(0)}, \theta^{(1)}, {\boldsymbol{\Sigma}}_\theta, \rho_0, \sigma_0^2\}$$ can be estimated by fitting the linear mixed effects model (Laird and Ware, 1982)   $$X_{ij} =\theta_i^{(0)}+(\theta_i^{(1)}-\theta_i^{(0)})Y_{ij}+\rho_0X_{i(\,j-1)}+\epsilon_{i(\,j-1)}, j=2, \ldots, n_i,$$ and $$\{\mu_{01}, \mu_{11}, {\boldsymbol{\Sigma}}_p\}$$ can be estimated by fitting the generalized mixed effects model   $$g\{\mbox{P}(Y_{ij}=1 \mid Y_{i(\,j-1)})\}=\mu^{(01)}_i+(\mu^{(11)}_i-\mu^{(01)}_i)Y_{i(\,j-1)}, j=2, \ldots, n_i,$$ where $$\mu^{(ab)}_i=g(p^{(ab)}_i), a=0,1$$ and $$b=0,1.$$ In addition, $$p^{(0)}$$ can be estimated by the observed proportion across patients at the initial visit, i.e., the baseline. More importantly, under this model, we may link $$Y_{ij}$$ with the observed history at visit $$j$$ via a random effects model   \begin{align} g\left\{\mbox{P}\left(Y_{ij}=1\mid {\cal H}_{ij}, \xi_i\right)\right\}&=g\left\{\mbox{P}\left(Y_{ij}=1\mid X_{i(\,j-1)}, X_{ij}, Y_{i(\,j-1)}, \xi_i\right)\right\}\nonumber\\ &=\alpha_{i0}+\alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}, \label{math:masterreg} \end{align} (2.2) where   \begin{aligned} \alpha_{i0} &=\frac{1}{2\sigma_0^2}\left\{(\theta^{(0)}_i)^2-(\theta^{(1)}_i)^2\right\} + \mu^{(01)}_i,\\ \alpha_{i1} &= \mu^{(11)}_i-\mu^{(01)}_i,\\ \alpha_{i2}&=\frac{1}{\sigma^2_0}\rho_0\left(\theta_i^{(0)}-\theta_i^{(1)}\right)\!,\\ \alpha_{i3}&=-\frac{1}{\sigma^2_0}\left(\theta_i^{(0)}-\theta_i^{(1)}\right)\!. \end{aligned} \label{math:randomcoeff} (2.3) Thus model (2.1) generalizes model (1.1), insofar as the log-odds of positive disease status is modeled as linear in the subject’s most current biomarker level, although the distribution of the coefficients in this linear combination may differ between the two models. Generalizations of (2.1) that include more biomarkers, e.g., $$g\{\mbox{P}(Y_{ij}=1 \mid X_{i1},\ldots,X_{in_i})\}=\beta_0+\beta_1 X_{ij} + \beta_2X_{i(\,j-1)} + \cdots + \beta_mX_{i(\,j-m+1)}$$, correspond to higher order autoregressive processes in model (1.1). 2.1. Individual patient ROC curve We would like to evaluate the predictive performance of the biomarker or score $$M_{ij}$$ (or its history) for patient $$i$$ at time $$j$$ by contrasting two survival functions, $$\tilde S_{ij}^{(1)}(m)$$ and $$\tilde S_{ij}^{(0)}(m),$$ where   $$\tilde S_{ij}^{(a)}(m)=\mbox{P}\left(M_{ij}> m \mid Y_{ij}=a, \xi_i, \rho_0, \sigma_0\right)\!, a=0,1,$$ and   $$M_{ij}=\alpha_{i0}+\alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}.$$ $$M_{ij}$$ uses the available history, since under model (2.1), $$Y_{ij}$$ is conditionally independent of the remaining history given $$M_{ij}$$. We may then use the ROC curve $$\tilde S_{ij}^{(1)}\big\{\big(\tilde S_{ij}^{(0)}\big)^{-1}(u)\big\}$$ or derived statistics such as the area under the ROC curve $$\int_{\mathbb{R}}\tilde S_{ij}^{(1)}(x){\rm d}\big\{1-\tilde S_{ij}^{(0)}(x)\big\}$$ to summarize this contrast. As $$\alpha_{ij}, j=0,1,2,3$$, depend on the unknown subject-specific random effect $$\xi_i,$$ the score $$M_{ij}$$ is unavailable in practice. An ROC curve based on $$M_{ij}$$ can only serve as a theoretical benchmark. We therefore estimate the random effect $$\xi_i$$ based on its conditional distribution $$\xi_i \mid \bar{\cal H}_{i(\,j-1)},$$ where $${\bar{\mathcal{H}}}_{i(\,j-1)}= \{(Y_{ik}, X_{ik}), k=1, \ldots, (\,j-1) \},$$ and use a plug-in estimator for $$M_{ij}$$ . For example, we may estimate the random effects and $$M_{ij}$$ by the posterior mean   \begin{align*} \hat\xi_i({\bar{\cal H}}_{i(\,j-1)}, \eta_0)= \left( \begin{array}{c} \hat \theta_i^{(0)}({ \bar{\cal H}}_{i(\,j-1)}, \eta_0) \\[5pt] \hat \theta_i^{(1)}({ \bar{\cal H}}_{i(\,j-1)}, \eta_0)\\[5pt] g\left\{\hat p^{(01)}_i({ \bar{\cal H}}_{i(\,j-1)}, \eta_0)\right\} \\[5pt] g\left\{\hat p^{(11)}_i({\bar{\cal H}}_{i(\,j-1)}, \eta_0)\right\} \end{array}\right)=E\left\{\left( \begin{array}{c} \theta_i^{(0)} \\[5pt] \theta_i^{(1)} \\[5pt] \mu^{(01)}_i \\[5pt] \mu^{(11)}_i\end{array} \right) \biggm| { \bar{\cal H}}_{i(\,j-1)}, \eta_0 \right\}\!, \end{align*} and   \begin{eqnarray} \hat M_{j}({\cal H}_{ij}, \eta_0)&=&\alpha_{0j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)+\alpha_{1j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)Y_{i(\,j-1)} \nonumber\\ &&+\alpha_{2j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)X_{i(\,j-1)}+\alpha_{3j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)X_{ij}, \label{math:mhat} \end{eqnarray} (2.4) respectively (Robinson, 1991), where the functions $$\alpha_{kj}(\bar{\cal H}_{i(\,j-1)}, \eta_0), k=0, 1, 2, 3,$$ are obtained by replacing all the relevant subject-specific random effects in (2.3) with their estimated counterparts based on $${\bar{\cal H}}_{i(\,j-1)}.$$ For example,   $$\alpha_{3j}(\bar{\cal H}_{ij}, \eta_0) =\frac{1}{\sigma_0^2}\left\{ E(\theta_i^{(1)} \mid {\bar{\cal H}}_{i(\,j-1)}, \eta_0)-E(\theta_i^{(0)}\mid {\bar{\cal H}}_{i(\,j-1)}, \eta_0)\right\}\!.$$ Here, the subscript $$j$$ is used to emphasize that the prediction of the subject-specific random effect is made at visit $$j$$ using information up to visit $$j-1.$$ The estimator $$\hat{M}_{j}(\cdot,\cdot)$$ depends on the subject only through the first argument, i.e., the patient history, and so the subscript $$i$$ has been dropped. An explicit expression for this choice of $$\hat\xi_i({\bar{\cal H}}_{i(\,j-1)}, \eta_0)$$ can be found in Appendix A of the supplementary material available at Biostatistics online. Using the estimated score $$\hat M_{j}({\cal H}_{ij}, \eta_0)$$ (or $$\hat M_{j}({\cal H}_{ij},\hat\eta_0)$$ if $$\eta_0$$ is unknown) to predict the disease status at visit $$j$$, the predictive performance of patient $$i$$’s biomarker at visit $$j$$ can be summarized by the ROC curve   \begin{align*} {\rm ROC}_{ij}(u\mid \eta_0)= {\rm ROC}(u\mid \xi_i,\eta_0, j)=S_{ij}^{(1)}\left\{ \left( S_{ij}^{(0)}\right)^{-1}(u)\right\}\!, \end{align*} where   $$S_{ij}^{(a)}(m)=\mbox{P}\left(\hat M_{j}({\cal H}_{ij}, \eta_0)> m \mid Y_{ij}=a, \xi_i, \eta_0 \right)\!, a=0, 1,$$ is the subject- and visit-specific survival function of the estimated score. $${\rm ROC}_{ij}(u\mid \eta_0)$$ depends on the joint distribution of the random history $${\cal H}_{ij}$$ and the response $$Y_{ij}$$ and thus also on the subject-specific random effect $$\xi_i.$$ Since we do not have a convenient analytic expression for $${\rm ROC}_{ij}(u\mid \eta_0)=ROC(u|\xi, \eta_0, j),$$ we resort to a Monte-Carlo method. Specifically, for the $$i$$th patient: Simulate $${\cal H}_{ij}^*=\{Y_{i1}^*, \ldots, Y_{i(\,j-1)}^*, X_{i1}^*, \ldots, X_{ij}^*\}$$ and $$Y_{ij}^*$$ according to model (2.1) using consistent estimates of the subject-specific random effect $$\hat{\xi}_i$$ and the population parameter $$\eta_0$$ Compute $$\hat M_{j}({\cal H}_{ij}^*, \eta_0)$$ according to (2.4). Repeat steps 1–2 a large number of times and calculate the empirical ROC curve $$\hat{{\rm ROC}}_{ij}(u\mid \eta_0)=\hat{{\rm ROC}}(u|\hat{\xi}_i, \eta_0, j)$$ of the resulting pairs $$\left\{\hat M_{j}({\cal H}_{ij}^*, \eta_0), Y_{ij}^*\right\}\!.$$ $$\hat{{\rm ROC}}_{ij}(u\mid \eta_0)$$ can serve as an approximation to the subject-specific ROC curve of the $$i$$th patient at the $$j$$th visit provided that this patient’s subject-specific parameters are known or can be estimated up to the desired accuracy. When this assumption is unmet, e.g., when the time $$j$$ is small and few observations on the patient of interest are available, we instead propose two alternative summaries of the diagnostic performance of the biomarker at the individual level. The first is the average individual-specific ROC curve over the patient population,   $${\rm ROC}_1(u|\,j)=E\left\{{\rm ROC}(u |\xi, \eta_0, j)\right\}\!, \label{math:roc1}$$ (2.5) where the expectation is taken with respect to the random effect $$\xi.$$ In practice, we may use Monte-Carlo methods, simulating a large number $$B$$ of random effects $$\{\tilde{\xi}_1, \ldots, \tilde{\xi}_B\}$$ from the distribution for the random effect and estimating $${\rm ROC}_1(u|\,j)$$ by   $$\hat{{\rm ROC}}_1(u|\eta_0, j)=B^{-1}\sum_{b=1}^B \hat{{\rm ROC}}(u|\tilde{\xi}_b, \eta_0, j).$$ The resulting $$\hat{{\rm ROC}}_1(u|\eta_0, j)$$ is not the ROC curve for any individual patient but the expected patient-level ROC curve for a typical patient from the given population. As before, when $$\eta_0$$ is unknown, we may replace it by a consistent estimator $$\hat \eta$$ and let   $$\hat{{\rm ROC}}_1(u|\,j)=\hat{{\rm ROC}}_1(u|\hat \eta, j).$$ Since $$\hat{{\rm ROC}}_1(u|\eta_0, j)$$ is a smooth function of $$\eta_0$$, $$\hat{{\rm ROC}}_1(u|\,j)$$ is a consistent estimator for $${\rm ROC}_1(u|\,j)$$ and $$\sqrt{n}\big\{\hat{{\rm ROC}}_1(u|\,j)-{\rm ROC}_1(u|\,j)\big\}$$ converges weakly to a mean zero Gaussian process indexed by $$u$$ when $$\sqrt{n}(\hat \eta-\eta_0)$$ converges weakly to a mean zero Gaussian distribution. The second option is the limit   $${\rm ROC}_2(u | \xi_i)=\lim_{j\rightarrow\infty}{\rm ROC}(u|\xi_i, \eta_0, j).\label{math:roc2}$$ (2.6) As $$j\rightarrow \infty,$$$$\hat \xi_i({\bar{\cal H}}_{ij}),$$$$\hat M_{j}({\cal H}_{ij})$$ and $$\mbox{P}(Y_{ij}=a)$$ converge to $$\xi_i,$$$$M_{ij}$$, and $$\pi_{ai}$$, respectively, where   $$\pi^{(a)}_i=\frac{p^{((1-a)a)}_i}{p^{((1-a)a)}_i+p^{(a(1-a))}_i}, a=0,1$$ are subject-specific state probabilities of the stationary distribution of the 2-state Markov chain. Therefore, provided $$\alpha_{i3}\neq 0$$,   \begin{align} S_{i\infty}^{(a)}(m)&=\lim_{j\rightarrow \infty} S_{ij}^{(a)}(m) \nonumber \\ &=\lim_{j\rightarrow \infty}\mbox{P}\left\{\hat M_{j}({\cal H}_{ij})> m \mid Y_{ij}=a, \xi_i, \rho_0, \sigma_0, \right\} \nonumber\\ &= \lim_{j\rightarrow \infty}\sum_{b=0,1}\mbox{P}\left(X_{ij}-\rho_0X_{i(\,j-1)} > \frac{m - \alpha_{i0} - \alpha_{i1}b}{\alpha_{i3}} \biggm| Y_{ij}=a,Y_{i(\,j-1)}=b\right)\frac{p^{(ba)}_i\pi^{(b)}_i}{\pi^{(a)}_i} \nonumber\\ &= \sum_{b=0,1}\left\{1-\Phi\left(\frac{m-\alpha_{i0}-\alpha_{i1}b-\theta_i^{(a)}\sigma_0}{\sigma_0\alpha_{i3}}\right)\right\} \frac{p^{(ba)}_i\pi^{(b)}_i}{\pi^{(a)}_i}, \label{math:survlimit} \end{align} (2.7) where $$\Phi(\cdot)$$ is cumulative distribution function of the standard normal. Here, we used the fact that under model (2.1), $$X_{ij}-\rho_0X_{i(\,j-1)}$$ given $$\{Y_{ij}=a\}$$ is normally distributed with mean $$\theta_i^{(a)}$$ and variance $$\sigma_0^2$$. When $$\alpha_{i3}=\sigma_0^{-2}\left(\theta_i^{(1)}-\theta_i^{(0)}\right)=0$$, i.e., a patient’s diseased and non-diseased biomarker means are the same, the posterior probability of positive event status (2.2) reduces to   \begin{align*} pr(Y_{ij}=1\mid\mathcal{H}_{ij},\xi_i)=\frac{p^{(Y_{i(\,j-1)}1)}_i}{p^{(Y_{i(\,j-1)}0)}_i+p^{(Y_{i(\,j-1)}1)}_i}, \end{align*} posterior probability of a 2-state Markov chain. Consequently, the ROC curve summarizes the performance of a 2-state Markov chain in predicting the next state in this case. This performance serves as a limiting case when $$\alpha_{i3}$$ becomes small in magnitude, the biomarkers cease to provide useful discrimination, and the patient’s prior status carries all the information about current status. $${\rm ROC}_2(u | \xi_i)= S_{i\infty}^{(1)}\big\{(S_{i\infty}^{(0)})^{-1}(u)\big\}$$ can be viewed as the ROC curve for subject $$i$$ after adequate follow-up and therefore reflects the ultimate personalized diagnostic value of the biomarker for the $$i$$th patient with the subject random effect $$\xi_i.$$ It may or may not be similar to the population counterpart described in the next section. $${\rm ROC}_2(u | \xi_i)$$ can be estimated by $$\hat{{\rm ROC}}_2(u | \hat \xi_i),$$ which is the same as $${\rm ROC}_2(u | \xi_i)$$ with $$\xi_i$$ and $$\eta_0$$ being replaced by $$\hat \xi_i=\hat \xi_i({\bar{\cal H}}_{in_i}, \hat \eta)$$ and $$\hat \eta,$$ respectively. Assuming that $$n_i/n=r_0 \in (0, 1)$$ and $$n \rightarrow \infty,$$$$\hat{{\rm ROC}}_2(u | \hat \xi_i)$$ is consistent and $$\sqrt{n_i+n}\big\{\hat{{\rm ROC}}_2(u | \hat \xi_i)-{\rm ROC}_2(u | \xi_i)\big\}$$ converges to a mean zero Gaussian process. Therefore, the key assumption for estimating $${\rm ROC}_2(u | \xi_i)$$ in practice is that $$n_i$$ be sufficiently large to allow acceptable estimation of the individual-specific random effect. The resulting estimated ROC curve can then be used to characterize the diagnostic value of the biomarker for an individual patient after sufficient follow-up. Inference for $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_2(u | \xi_i)$$ can be carried out with the parametric bootstrap. One simulates fresh data using the estimated population parameter $$\hat \eta$$ from model (2.1) and obtains $${\rm ROC}^*_1(u|\,j)$$ and $${\rm ROC}_{2}^*(u|\xi^*_i)$$, the estimators for the corresponding ROC curves, from the simulated data. The empirical distributions $${\rm ROC}^*_1(u|\,j)-\hat{{\rm ROC}}_1(u|\,j)$$ and $${\rm ROC}_{2}^*(u|\xi^*_i)-\hat{{\rm ROC}}_2(u | \hat\xi_i)$$ based on a large number of simulations serve as approximations to the distributions of $$\hat{{\rm ROC}}_1(u|\,j)-{\rm ROC}_1(u|\,j)$$ and $$\hat{{\rm ROC}}_2(u | \hat\xi_i)-{\rm ROC}_2(u | \xi_i),$$ respectively. Point-wise confidence intervals (CIs) of $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_2(u | \xi_i)$$ can be constructed along these lines. Remark 2.1 One may be interested in the diagnostic value of $$X_{ij}$$ at the $$j$$th visit given the past history $$\bar{{\cal H}}_{i(\,j-1)}.$$ In this case, the ROC curve can be constructed based on the conditional survival function   \begin{align*} &S_{ij}^{(a)}(m \mid \bar{{\cal H}}_{i(\,j-1)})\\ =&\mbox{P}\left(X_{ij}> m \mid Y_{ij}=a, \xi_i, \bar{{\cal H}}_{i(\,j-1)}, \eta_0 \right)\\ =&\mbox{P}\left(X_{ij}> m \mid Y_{ij}=a, X_{i(\,j-1)}, \xi_i, \rho_0, \sigma_0 \right)\\ =&1-\Phi\left(\frac{m-\rho_0X_{i(\,j-1)}-\sigma_0\theta_i^{(a)}}{\sigma_0}\right)\!. \end{align*} In contrast to ROC curves based on $$M_{ij}$$ or its estimator, this ROC curve reflects the predictive value of $$X_{ij}$$ only. It also depends on the random effect $$\theta_i^{(a)}$$, unknown at the visit $$j.$$ One may also consider its expectation with respect to random effects or its limit when $$j \rightarrow \infty$$ as an estimable alternative. Remark 2.2 Both $${\rm ROC}_{1}(u|\,j)$$ and $${\rm ROC}_2(u | \xi_i)$$ are parametric in nature in that their summarization of the diagnostic value of the longitudinally measured biomarker are valid only if model (2.1) is correctly specified. 2.2. ROC curve for the patient population The predictive performance of the biomarker across the entire population may be very different from that for an individual patient. For example, the latter does not take into account biomarker variation between patients, or differences between patients in the prior probabilities of positive status events. Were the data not longitudinal, we might consider the empirical ROC curve of biomarker–status pairs $$(X_i,Y_i), i=1,\ldots n$$. To take accumulated patient data into account, we instead consider the ROC curve of $$\hat M_{j}({\mathcal H}_{ij}, \eta_0), i=1,\ldots, n$$, the patients’ biomarker scores (2.4) at a given time $$j$$. The scores synthesize all the predictive information in the past history under model (2.1). Conditionally on the population parameter $$\eta_0$$, the patient scores are iid, and the empirical ROC curve is a valid metric for the predictive value of the scores regardless of the validity of the model being used to derive them. If model (2.1) is a good approximation to the true relationship between $${\cal H}_{ij}$$ and $$Y_{ij}$$, one may anticipate good prediction accuracy of the resulting score. A severely misspecified model may give a prediction score with poor performance. In either case, the ROC curve and derived statistics such as the area under the ROC curve remain objective measures for the predictive value of the scoring system. Formally, assuming that $$\lim_{n\rightarrow \infty}\hat \eta = \tilde \eta_0$$ in probability, the score $$\hat M_{j}({\cal H}_{ij}, \hat \eta)$$ converges to   \begin{eqnarray*} \hat M_{j}({\cal H}_{ij}, \tilde\eta_0)&=&\alpha_{0j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)+\alpha_{1j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)Y_{i(\,j-1)} \nonumber \\ &&+\alpha_{2j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)X_{i(\,j-1)}+\alpha_{3j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)X_{ij}, \end{eqnarray*} where $$\tilde \eta_0=\eta_0$$ if the model is correctly specified. We are interested in estimating the ROC curve for the predictive value at the $$j$$th visit   $${\rm ROC}_{3}(u|\,j)= S_{j}^{(1)}\left\{ \left( S_{j}^{(0)}\right)^{-1}(u)\right\}\!, \label{math:roc3}$$ (2.8) where   $$S_{j}^{(a)}(m)=\mbox{P}\left(\hat M_{j}({\cal H}_{ij}, \tilde \eta_0)> m \mid Y_{ij}=a \right)\!, a=0, 1.$$ We do so by plugging in the empirical survival function:   $$\hat{{\rm ROC}}_3(u|\,j)=\hat S_{j}^{(1)}\left\{ \left( \hat S_{j}^{(0)}\right)^{-1}(u)\right\}\!,$$ (2.9) where $$N_{aj}=\sum_{i=1}^n I(Y_{ij}=a),$$  $$\hat S_j^{(a)}(m)= N_{aj}^{-1}\sum_{i=1}^{n} I\left\{\hat M_j({\cal H}_{ij},\hat{\eta})> m \right\} I( Y_{ij}=a)$$ and $$I(\cdot)$$ is the event indicator function. Similarly, the area under the ROC curve, the concordance statistics, may be estimated as   $$\skew7\hat A_j=\frac{1}{N_{1j}N_{0j}}\sum_{i=1}^{N_{0j}}\sum_{k=1}^{N_{1j}} I\left\{\hat M_{j}({\cal H}_{ij}, \hat \eta)>\hat M_{j}({\cal H}_{kj}, \hat \eta)\right\}I(Y_{ij}=1)I(Y_{kj}=0).$$ In Appendix B of supplementary material, available at Biostatistics online, we show that $$\hat{{\rm ROC}}_3(u|\,j)$$ is a consistent estimator for $${\rm ROC}_3(u|\,j)$$ and the distribution of $$\sqrt{n}\big\{\hat{ROC}_3(u|\,j)-{\rm ROC}_3(u|\,j)\big\}$$ converges to a mean zero Gaussian process under mild regularity conditions. The variance of $$\hat{\rm ROC}_3(u|\,j)$$ can be approximated by an efficient resampling method. At each iteration, we first generate random weights $$\{W_1, \ldots, W_n\}$$ from the unit exponential distribution and estimate $$\eta_0$$ under model (2.1) with the $$i$$th observation weighted by $$W_i.$$ Denote the estimator by $$\eta^*$$ and let   $${\rm ROC}_3^*(u|\,j)=\hat S_{j}^{(1)*}\left\{ \left( \hat S_{j}^{(0)*}\right)^{-1}(u)\right\}\!,$$ where   $$\hat S_j^{(a)*}(m)= \frac{\sum_{i=1}^{N} I\left\{\hat M_{j}({\cal H}_{ij}, \eta^*)> m \right\} I( Y_{ij}=a)W_i}{\sum_{i=1}^{N}I(Y_{ij}=a)W_i}.$$ Obtaining in this way a large number of realizations of $${\rm ROC}_3^*(u|\,j),$$ their empirical variance can be used to approximate that of $$\hat{{\rm ROC}}_3(u|\,j)-{\rm ROC}_3(u|\,j).$$ Similar resampling methods can be used to make inference on $$A_j$$, the area under the ROC curve at the $$j$$th visit. The predictive value of the biomarker in the entire population also varies with the visit $$j$$. With more visits and richer data observed, the predictive ability of the updated scoring system is expected to increase. We may study the trend of predictive value from visit $$1$$ to $$J$$ by simultaneously estimating $${\rm ROC}_3(u|2),\ldots,$$ and $${\rm ROC}_3(u|\,J)$$. It is not difficult to show that   $$\left\{\hat {\rm ROC}_3(u|2)-{\rm ROC}_3(u|2), \ldots, \hat{{\rm ROC}}_3(u|\,J)-{\rm ROC}_3(u|\,J)\right\}$$ can be approximated by a multivariate mean-zero Gaussian distribution, based on which joint inference for the predictive value at all visits of interest may be conducted. When the predictive value of the constructed scoring system only varies moderately from visit $$j_1$$ to $$j_2$$, i.e., $${\rm ROC}_3(u|\,j), j_1\le j\le j_2$$ are similar, it is tempting to estimate the ROC curve by the average predictive value between these two visits. To this end, one may empirically construct a ROC curve as   $$\hat{{\rm ROC}}_4(u|\,j_1, j_2)=\hat{S}_{j_1\,j_2}^{(1)}\left\{\left(\hat{S}_{j_1\,j_2}^{(0)}\right)^{-1}(u)\right\}\!,$$ where   $$\hat{S}_{j_1\,j_2}^{(a)}(m)=N_{aj_1\,j_2}\sum_{i=1}^n\sum_{j=j_1}^{j_2}I\left\{\hat{M}_{ij}({\cal H}_{ij})>m \right\}I(Y_{ij}=a) ,$$ and $$N_{aj_1\,j_2}=\sum_{i=1}^n \sum_{j=j_1}^{j_2}I(Y_{ij}=a).$$ Since it averages observations from multiple visits, $$\hat{{\rm ROC}}_4(u|\,j_1, j_2)$$ can be substantially more stable than $$\hat{{\rm ROC}}_3(u|\,j).$$$$\hat{{\rm ROC}}_4(u|\,j_1, j_2)$$ is a consistent estimator of   $${\rm ROC}_4(u|\,j_1, j_2)=S_{j_1\,j_2}^{(1)}\left\{\left(S_{j_1\,j_2}^{(0)}\right)^{-1}(u)\right\}\!,$$ where   $$S_{j_1\,j_2}^{(a)}(m)=\frac{\sum_{j=j_1}^{j_2} S_j^{(a)}(m)\mbox{P}(Y_{ij}=a)}{\sum_{j=j_1}^{j_2}\mbox{P}(Y_{ij}=a) },$$ a weighted average of $$S_j^{(a)}(m).$$ Statistical inference based on $$\hat{{\rm ROC}}_4(u|\,j_1, j_2)$$ can be made by resampling methods similar to those previously described. Remark 2.3 Despite some similarities, $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$ are quite different. The former is parametric and interpretable only when model (2.1) is correctly specified, while and latter is non-parametric in nature. The former, ignoring the differentiability in biomarkers across patients, tends to be smaller than the latter. Remark 2.4 The proposed ROC curves depend on the patient history $$\mathcal{H}_j$$ through the biomarkers estimates $$\hat M_j(\mathcal{H}_j, \hat \eta).$$ We may consider other functions of $$\mathcal{H}_j$$ given by different statistical models of the response. More generally, one may consider a working regression model   $$\mbox{P}\left(Y_{ij}=1|{\mathcal{H}_j}\right)=\mathcal{F}(Y_{i1}, \ldots, Y_{ij}, X_{i1}, \ldots, X_{i(\,j-1)}; {\boldsymbol \theta})$$ and construct the ROC curve based on   $$\left\{\left(\mathcal{F}(Y_{i1}, \ldots, Y_{ij}, X_{i1}, \ldots, X_{i(\,j-1)}; \hat{\boldsymbol \theta}), Y_{ij}\right)\!, i=1, \ldots, n\right\}\!,$$ where $$\mathcal{F}(\cdot; {\boldsymbol \theta})$$ is a parametric function of observed history and $${\boldsymbol \theta}$$ and $$\hat{\boldsymbol \theta}$$ are the model parameter and its appropriate estimator, respectively. 2.3. Extension In model (2.1), we assume that (i) the underlying disease status follows a simple Markov chain, i.e., the distribution of $$Y_{ij}$$ only depends on $$Y_{i(\,j-1)};$$ and (ii) the distribution of the biomarker level at visit $$j$$, $$X_{ij},$$ only depends on $$X_{i(\,j-1)}$$ and $$Y_{ij}$$; see Figure 1a, which diagrams the probability generating process described in (2.1). There are several obvious extensions: The distribution of $$X_{ij}$$ depends on $$(X_{i(\,j-1)}, Y_{i(\,j-1)}, Y_{ij})$$ (Figure 1b). The distribution of $$Y_{ij}$$ depends on $$(Y_{i(\,j-1)}, X_{i(\,j-1)})$$ (Figure 1c). Fig. 1. View largeDownload slide Schematic of the data-generation process described by (2.1) and extensions. (a) Model (2.1). (b) $$X_{ij}$$ generated from $$(X_{i(\,j-1)}, Y_{i(\,j-1)}, Y_{ij})$$. (c) $$Y_{ij}$$ generated from $$(Y_{i(\,j-1)}, X_{i(\,j-1)})$$. Fig. 1. View largeDownload slide Schematic of the data-generation process described by (2.1) and extensions. (a) Model (2.1). (b) $$X_{ij}$$ generated from $$(X_{i(\,j-1)}, Y_{i(\,j-1)}, Y_{ij})$$. (c) $$Y_{ij}$$ generated from $$(Y_{i(\,j-1)}, X_{i(\,j-1)})$$. Adapting model (2.1) to the first setting, where the biomarker value depends not only on the current disease status but also the status at the previous visit, gives:   $$X_{ij} | (X_{i(\,j-1)},Y_{i(\,j-1)},Y_{ij})=\theta_i^{(Y_{i(\,j-1)}Y_{ij})}+\rho_0X_{i(\,j-1)}+\epsilon_{ij}, \enspace\epsilon_{ij}\sim \mathcal{N}(0,\sigma_0^2), \label{math:modelextend1}$$ (2.10) where $$(\theta_{i1}, \theta_{i2}, \theta_{i3}, \theta_{i4})'$$ is the subject-specific random effect and $$\epsilon_{ij}$$ is independent $$\mathcal{N}(0,\sigma_0^2)$$. Under this model   \begin{align*} g\left\{\mbox{P}(Y_{ij}=1\mid {\cal H}_{ij},\xi_i)\right\}&=g\left\{\mbox{P}(Y_{ij}=1\mid X_{i(\,j-1)}, X_{ij}, Y_{i(\,j-1)})\right\}\\ &=\alpha_{i0}+\alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}+\alpha_{i4}X_{i(\,j-1)}Y_{i(\,j-1)} + \alpha_{i5}X_{ij}Y_{i(\,j-1)},\\ \end{align*} where   \begin{align*} \alpha_{i0} &= \frac{(\theta_{i}^{(00)})^2-(\theta_{i}^{(01)})^2}{2\sigma_0^2} + \mu^{(01)}_i\\ \alpha_{i1} &= -\frac{(\theta_{i}^{(11)})^2-(\theta_i^{(01)})^2-(\theta_i^{(10)})^2+(\theta_i^{(00)})^2}{2\sigma_0^2}+(\mu^{(11)}_i-\mu^{(01)}_i)\\ \alpha_{i2} &= \frac{\rho_0(\theta_{i}^{(00)}-\theta_i^{(01)})}{\sigma_0^2}, \alpha_{i3} = -\frac{(\theta_{i}^{(00)}-\theta_i^{(01)})}{\sigma_0^2}, \alpha_{i4} = -\frac{\rho_0(\theta_{i}^{(11)}-\theta_i^{(01)}-\theta_i^{(10)}+\theta_i^{(00)})}{\sigma_0^2},\\ \alpha_{i5} &= \frac{(\theta_{i}^{(11)}-\theta_i^{(01)}-\theta_i^{(10)}+\theta_i^{(00)})}{\sigma_0^2}. \end{align*} Therefore, besides the terms in (2.2), model (2.10) leads to additional interaction terms $$X_{i(\,j-1)}Y_{i(\,j-1)}$$ and $$X_{ij}Y_{i(\,j-1)}$$ contributing to the prediction of the disease status at the $$j$$th visit, $$Y_{ij}.$$ For the second setting, we may assume that   $$P\{Y_{ij}=b | Y_{i(\,j-1)}=a,X_{i(\,j-1)}\} = p^{(ab)}_i(X_{i(\,j-1)}), a,b\in\{0,1\},$$ where   $$\left( \begin{array}{c} g\{p_{01i}(X_{i(\,j-1)})\} \\ g\{p_{11i}(X_{i(\,j-1)})\} \end{array} \right) \sim N\left\{ \left( \begin{array}{c} \mu^{(01)}_i \\ \mu^{(11)}_i \end{array} \right)+\left( \begin{array}{c} \gamma_0 \\ \gamma_1 \end{array} \right)X_{i(\,j-1)}, \boldsymbol{\Sigma}_p\right\}\!.$$ In other words, the transition probability of the underlying disease status depends on the biomarker level at the prior visit. Under this model   \begin{align*} g\left\{\mbox{P}(Y_{ij}=1\mid {\cal H}_{ij},\xi_i)\right\}&=g\left\{\mbox{P}(Y_{ij}=1 \mid X_{i(\,j-1)}, X_{ij}, Y_{i(\,j-1)})\right\}\\ &=\alpha_0 + \alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}+\alpha_{i4}X_{i(\,j-1)}Y_{i(\,j-1)},\\ \end{align*} where   \begin{align*} \alpha_{i0} &= \mu^{(01)}_i+\frac{(\theta^{(0)}_i)^2-(\theta^{(1)}_i)^2}{2\sigma_0^2}\\ \alpha_{i1} &= \mu^{(11)}_i-\mu^{(01)}_i\\ \alpha_{i2} &= \frac{\rho_0(\theta^{(0)}_i-\theta^{(1)}_i)}{\sigma_0^2}+\gamma_0, \alpha_{i3} = -\frac{\theta^{(0)}_i-\theta^{(1)}_i}{\sigma_0^2}, \mbox{and} \alpha_{i4} = \gamma_1-\gamma_0. \end{align*} Therefore, compared with (2.2), there is an additional interaction term $$X_{i(\,j-1)}Y_{i(\,j-1)}$$ contributing to the prediction of the disease status at the $$j$$th visit, $$Y_{ij}.$$ There may be more extensive generalizations of model (2.2) such as the combination of extensions of (1) and (2) or higher order Markov chains for $$Y_{ij}$$. As mentioned in the previous sections, while the validity of individualized ROC curve depends on the correct model specification, the population-based ROC curve $${\rm ROC}_3(u|\,j)$$ and $${\rm ROC}_4(u|\,j_1, j_2)$$ can be constructed for scoring systems developed under different modeling assumptions and used to compare different models in terms of their predictive ability. 3. Example The goal of highly active antiretroviral therapy in the treatment of HIV is to keep a patient’s CD4 count high and to suppress viral load. CD4 count measures immunosuppression, the risk of opportunistic infections, and the strength of the immune system. Viral load is the amount of HIV in a sample, indicative of, among other things, transmission risk. Although viral load is regarded as a better indicator of disease status, it is also more expensive and time-consuming to measure than CD4 count. According to clinical guidelines, both are tested regularly in a typical treatment regimen and used to guide subsequent treatment. Even when therapy is effective and viral load is clinically categorized as suppressed, transient spikes in viral load, or “blips,” are observed. The clinical significance of viral blips is not understood well. While some studies have reported that viral blips are of no clinical significance, others have reported an association between viral blips and virologic failure. The identification of the predictors of viral blips and the association between viral blips and CD4+ T-cell changes over time are subjects of ongoing research. (see Paintsil and others, 2016 and references therein.) We consider the accuracy of absolute CD4+ T-lymphocyte count as a predictor of blip status among children. We analyzed longitudinal data from HIV-infected children enrolled in the Yale Prospective Longitudinal Cohort study comprising 97 children born to HIV-infected mothers in the greater New Haven, CT, area since 1985. The predictor CD4 count measures the number of CD4 cells/mm$$^3$$ of blood and the response blip status is defined as a viral load equal or exceeding 50 copies/mL. The median number of visits/patient is 33, with 1st and 3rd quartiles of 15 and 47 visits, respectively. For all of the 3309 visits in the data set, the median time between visits is exactly 90 days, with 1st and 3rd quartiles of 57 and 112 days, respectively, giving approximately evenly spaced visits during the follow-up. The average age at enrollment is 6.7 years (standard deviation: 3.9 years). Figure S1(a) of supplementary material available at Biostatistics online summarizes the dates of visits in the lifetimes of the subjects. Further details on the cohort and definitions used here can be found in Paintsil and others (2008) and the references therein. Sixteen patients with fewer than 10 visits were removed in order to allow for estimation of the individual ROC, $${\rm ROC}_2(u|\xi_i),$$ as discussed in Section 2. Eighty-one subjects remain after excluding those with fewer than 10 visits. The choice of how to group longitudinal observations is an important issue in many cohort-based longitudinal data analyzes, including ours. At each visit, measurements including CD4 count and blip status are taken, and antiretroviral treatment is administered. Therefore, the visit may serve as a surrogate for the number of treatments administered since baseline. While the specific enrollment time varies, the majority of the enrolled children (average age of 6.7 years) are in the early stages of treatment at baseline, and therefore it may be sensible to align observations according to their visit numbers. When the sample size allows, the analysis can be restricted to a more homogeneous subgroup of children with similar conditions at the baseline, which makes grouping by visits still more interpretable. A crude indication of the value of CD4 as a predictor of blip status is given in Figure S1(b) of supplementary material available at Biostatistics online, which plots the histograms of CD4, aggregated over patients and visits, conditional on positive and negative blip status. Despite the large overlap, there is a clear location shift between the two measures. Figure S2 of supplementary material available at Biostatistics online plots the trajectories of CD4, with plotting shape encoding blip status, for a representative sample of subjects. The long sequences of like shapes even as CD4 fluctuates wildly suggests previous blip status as a predictor of future blip status, motivating the Markov structure in model (2.1). Finally, Figure S3 of supplementary material available at Biostatistics online is a heatmap of the empirical correlation matrix among CD4 measurements on the first 40 visits, for the 44 patients with 40 or more visits. The entries tend to decrease in magnitude moving away from the diagonal. This correlation structure accords with the weak dependency implied by the autocorrelative structure in model (2.1). We apply the ROC estimation procedure described in Section 2 to the pediatric HIV data in order to assess the value of the past CD4 counts and blip statuses as a predictor of current blip status. The MLE $$\hat{\rho}_0=0.49,$$ (95% CI (0.28,0.63)), describes the strong autoregressive dependency of CD4, as suggested by the heatmap. Similarly, the strong dependence between previous and current blip status is confirmed by the population transition probabilities $$\{g^{-1}(\hat{\mu}^{(00)}), g^{-1}(\hat{\mu}^{(11)})\}=(0.93, 0.73)$$, giving the probabilities of remaining in the negative and positive, respectively, blip status states in successive visits. A 95% CI for the difference $$g^{-1}(\hat{\mu}^{(11)})-g^{-1}(\hat{\mu}^{(00)})$$ is (0.05,0.12). The difference between the CD4 standardized means conditional on negative versus positive blip status, $$(\hat{\theta}^{(1)}- \hat{\theta}^{(0)})/\hat{\sigma}_0=0.82,$$ (95% bootstrap CI (0.63,0.91)), is consistent with the location shift apparent from Figure S1(b) of supplementary material available at Biostatistics online. The resulting time-indexed ROC curves at time $$j=10$$ and their associated 95% CIs are summarized in Figure S4 of supplementary material available at Biostatistics online. The time $$j=10$$ was chosen to be consistent with our exclusion of patients with fewer than 10 visits from the analysis. As Figure S1(a) of supplementary material available at Biostatistics online shows, ROC curves at later time points are available if one is willing to exclude patients with insufficient visits. The CIs are constructed by a bootstrap with $$500$$ bootstrap samples. Despite the noisy data presented in Figures S1(b) and S2 of supplementary material available at Biostatistics online, the risk score taking into account both previous CD4 values and blip status performs reasonably well as a predictor of the current disease status. We also plot the time-asymptotic individual ROC curve $${\rm ROC}_2(u | \xi_i)$$ for selected patients. Patient no. 14 exhibits a non-smooth curve. The “elbow” arises when a patient’s previous disease status is significantly more predictive than the patient’s biomarkers of future disease status. In such cases, the ROC curve approximates the discrete behavior of a threshold predictor. As mentioned above, the validity of the individualized ROC curves depends on the correct specification of model (2.1). If we view model (2.1) as a working device used to derive a risk score for predicting the blip status, we may use $${\rm ROC}_3(u|\,j)$$ as well as $${\rm ROC}_4(u|\,j_1, j_2)$$ to summarize the predictive value of the scoring system between visits $$j_1$$ and $$j_2.$$ Due to the small sample size and infrequent occurrence of blips, we construct $${\rm ROC}_4(u|6,8)$$ and its 95% CI as shown in Figure S4 of supplementary material available at Biostatistics online. The area under the ROC curve is 0.865, with a bootstrap standard error of 0.008, also indicating good predictive value. The jagged shape of the ROC curve reflects the fact that few of the 81 patient scores lie in the overlap of the case and control distributions. As a comparison, we also plot in Figure 2 the ROC curve based on scores $$(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij})$$ by fitting the simple random effect model (1.1). As expected, the resulting ROC curve is higher than $$\hat{{\rm ROC}}_4(u|8,10)$$ and $$\hat{{\rm ROC}}_1(u|10)$$. However, using fitted values as scores to predict the blip status requires information not available at visit $$j=10$$ and is not therefore a comparable measure of the predictiveness of the biomarker at that time. Fig. 2. View largeDownload slide The ROC curve when fitted values under a random effects model (1.1) are used as scores compared with $${\rm ROC}_1(u|10)$$ and $${\rm ROC}_4(u|8,10)$$ (pediatric HIV data). Fig. 2. View largeDownload slide The ROC curve when fitted values under a random effects model (1.1) are used as scores compared with $${\rm ROC}_1(u|10)$$ and $${\rm ROC}_4(u|8,10)$$ (pediatric HIV data). 4. Simulation In this section, we investigate the finite-sample performance of the proposed method. To this end, we generate data sets mimicking the pediatric HIV data. Specifically, $$\{(X_{ij}, Y_{ij}), i=1, \ldots, n, j=1, \ldots, n_i\}$$ are simulated under model (2.1) with the population parameter $$\eta_0$$ being the maximum likelihood estimator obtained from the HIV data. We use a Monte-Carlo approximation for the underlying true ROC curves $${\rm ROC}_1(u|\,j)=E\{{\rm ROC}(u|\xi, \eta_0, j)\}$$ and $${\rm ROC}_3(u|\,j).$$ We also calculate $${\rm ROC}_2(u|\xi)=\lim_{j\rightarrow\infty} {\rm ROC}(u|\xi, \eta_0, j)$$ based on the analytic expression of $$S_\infty^{(a)}(m|\xi)$$ given in (2.7) for selected random effects $$\xi.$$ Next, we generate 500 data sets, each consisting of $$n=80$$ patients with $$n_i=40$$ visits each to match the HIV example. For each simulated data set, we estimate the expected individual-specific ROC curve $${\rm ROC}_1(u|\,j)$$ at $$j=5, 15, 25, 35$$ and its 95% point-wise CI using the parametric bootstrap method; the limiting individual-specific ROC curve $${\rm ROC}_2(u | \xi_i)$$ for selected patients; the population ROC curve $${\rm ROC}_3(u|\,j)$$ and its 95% point-wise CI using the resampling method, for $$j=5, 15, 25, 35.$$ The resulting ROC curves estimates and 95% CIs based on one generated data set are presented in Figure 3. To evaluate the performance of the proposed method, we estimate the empirical bias of the point estimators as well as the coverage level of the 95% CIs at selected $$u$$ for both $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$, $$j=5, 15, 25, 35.$$ For each estimator of interest, we also calculate the empirical average of the estimated standard errors and the empirical standard error. The detailed simulation results for $$u=10\%, 25\%, 50\%,$$ and $$75\%$$ are summarized at Table 1. The empirical biases are reasonably small in magnitude. The average estimated standard errors of all estimators are very close to the empirical standard errors and the coverage level of the 95% CIs are consistent with the nominal level allowed by the Monte-Carlo simulation error. In general, as expected, the population ROC curve tends to be higher than the individualized counterpart at the same visit. For estimating $${\rm ROC}_2(u | \xi_i),$$ we compare the AUC under ROC curve, $$\int_0^1 \hat{{\rm ROC}}_2(u|\xi, \hat{\eta}, j){\rm d}u,$$ based on data from increasing number of visits with the true limiting AUC value for selected random effects $$\xi$$s. We focus in particular on whether the estimator converges to the truth as the number of visits increases under this correct model specification. Figure S5 of supplementary material available at Biostatistics online plots the number of visits against the difference between estimated AUCs and the truth with five different realizations of $$\xi$$, showing the expected convergence. The convergence may be too slow for some purposes, requiring data from a large number of visits in order to achieve the required estimation accuracy of the individual random effect. Fig. 3. View largeDownload slide Expected individual ROC $${\rm {ROC}}_1(u|\,j)$$ (solid) and population ROC $${\rm ROC}_3(u|\,j)$$ (dotted) at visit $$j=5$$ with 95% bootstrap CI; limiting individual ROC $${\rm ROC}_2(u | \xi_i)$$ for four patients. The data were generated under model (2.1) with hyperparameters estimated from the pediatric HIV data. Fig. 3. View largeDownload slide Expected individual ROC $${\rm {ROC}}_1(u|\,j)$$ (solid) and population ROC $${\rm ROC}_3(u|\,j)$$ (dotted) at visit $$j=5$$ with 95% bootstrap CI; limiting individual ROC $${\rm ROC}_2(u | \xi_i)$$ for four patients. The data were generated under model (2.1) with hyperparameters estimated from the pediatric HIV data. Table 1. Nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (synthetic data using hyperparameters estimated from the pediatric HIV data, $$n=80$$ patients)       FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)        FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)  Table 1. Nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (synthetic data using hyperparameters estimated from the pediatric HIV data, $$n=80$$ patients)       FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)        FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)  In the second set of simulations, we examine the performance of the proposal under model mis-specification. Specifically, we simulate data under the random effect model (1.1), with model parameters taken to be the maximum likelihood estimators from the HIV data. As discussed above, the diagnostic value represented by the ROC curve based on $$\{(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij}), i=1, \ldots, n, j=1, \ldots, n_i\}$$ cannot be achieved in practice but may serve as a benchmark. Since model (2.1) is misspecified, we focus on the population ROC curve only. First, we plot in Figure S6 of supplementary material available at Biostatistics online the true $${\rm ROC}_3(u|\,j), j=5, 15, 25, 35,$$ and that based on $$(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij})$$ by setting $$n=10^6$$ and $$n_i=40.$$ As expected, by comparison with the benchmark $${\rm ROC}_3(u|\,j)$$ fails to reflect the predictive value of the observed history due to model misspecification. Next, we repeat the simulation with a sample size $$n=80$$ and $$n_i=40$$ 500 times to examine the finite-sample biases of the point estimators and coverage levels of the 95% CIs for estimating the true ROC curves. Table 2 confirms our expectation that the inference procedure for $${\rm ROC}_3(u|\,j)$$ remains valid in the presence of model misspecification. Table 2. Misspecified model: nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (random slope-intercept logistic model, $$n=80$$ patients)    FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)     FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)  Table 2. Misspecified model: nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (random slope-intercept logistic model, $$n=80$$ patients)    FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)     FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)  5. Discussion We have proposed a set of ROC-based metrics and statistical methods for evaluating the predictive value of a biomarker in a longitudinal, multiple patient design. We emphasize three keys in extending the ROC curve from the cross-sectional to longitudinal setting: (i) the score used to construct the ROC curve should take into account all of the observed history; (ii) the score should not require unobserved history; and (iii) the predictive value of the biomarker at the individual and the population levels should be treated differently. These objectives are not met satisfactorily by the mixed effects model (1.1) available in the literature, where (i) a patient’s observations are conditionally independent given the subject effects, and in particular past observations are not taken into account in using the observation as a score; (ii) all time points are used to estimate the subject effects, so that the score estimate for a given time point is a function of the disease status it is intended to predict; and (iii) there is no distinction between patient and population ROC curves. The current approach is developed based on a simple parametric model. While the parametric assumptions are plausible in light of our data, they are chosen mainly for convenience in implementation and motivated by the HIV data. Necessary model checking analysis for other data is needed. In the proposed approach, we assume that the biomarker is measured at a regular time interval, which is true in the HIV example. However, in clinical practice, the measurement time is often irregular and it may not be possible to group the measurements into comparable 1st, 2nd $$\ldots$$ visits. Furthermore, even with regular measurements, grouping measurements by visits may not be interpretable if the baseline does not represent an interpretable “origin,” such as onset of disease or start of treatment. In such cases, the predictiveness of the biomarker needs to be evaluated with respect to the measurement history, including the actual measurement times. Doing so may require complex joint modeling of measurement time, biomarker level and disease status. Further research in this direction is warranted. Supplementary material Supplementary material is available at http://biostatistics.oxfordjournals.org. Acknowledgments This research was partially supported by grants NIH/NAIDS 2P30 AI060354-11 from the National Institutes of Health and R01HL089778-05 from the National Heart, Lung, and Blood Institute. We thank the two referees and associate editor for their constructive comments. The authors also thank the study participants and the principal investigator of the study, Dr. Elijah Paintsil, for sharing the data with us. Conflict of Interest: None declared. References Albert, P. S. ( 2012). A linear mixed model for predicting a binary event from longitudinal data under random effects misspecification. Statistics in Medicine  31, 143– 154. Google Scholar CrossRef Search ADS PubMed  Azzalini, A. ( 1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika  81, 767– 775. Google Scholar CrossRef Search ADS   Breslow, N. E. and Clayton, D. G. ( 1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association  88, 9– 25. Foulkes, A. S., Azzoni, L., Li, X., Johnson, M. A., Smith, C., Mounzer, K. and Montaner, L. J. ( 2010). Prediction based classification for longitudinal biomarkers. The Annals of Applied Statistics  4, 1476. Google Scholar CrossRef Search ADS PubMed  Heagerty, P. J. and Zheng, Y. ( 2005). Survival model predictive accuracy and ROC curves. Biometrics  61, 92– 105. Google Scholar CrossRef Search ADS PubMed  Heagerty, P. J., Lumley, T. and Pepe, M. S. ( 2000). Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics  56, 337– 344. Google Scholar CrossRef Search ADS PubMed  Janes, H., Longton, G. and Pepe, M. ( 2009). Accommodating covariates in ROC analysis. Stata Journal  9, 17– 39. Google Scholar PubMed  Laird, N. M. and Ware, J. H. ( 1982). Random-effects models for longitudinal data. Biometrics  38, 963– 974. Google Scholar CrossRef Search ADS PubMed  Liu, D. and Albert, P. S. ( 2014). Combination of longitudinal biomarkers in predicting binary events. Biostatistics  15, 706– 718. Google Scholar CrossRef Search ADS PubMed  Liu, H. and Wu, T. ( 2003). Estimating the area under a receiver operating characteristic (ROC) curve for repeated measures design. Journal of Statistical Software  8, 1– 18. Google Scholar CrossRef Search ADS   Liu, H., Li, G., Cumberland, W. and Wu, T. ( 2005). Testing statistical significance of the area under a receiving operating characteristics curve for repeated measures design with bootstrapping. Journal of Data Science  3, 257– 278. Paintsil, E., Ghebremichael, M., Romano, S. and Andiman, W. ( 2008). Absolute CD4+ T-lymphocyte count as a surrogate marker of pediatric HIV disease progression. Pediatric Infectious Disease Journal  7, 629– 635. Google Scholar CrossRef Search ADS   Paintsil, E., Martin, R., Goldenthal, A., Bhandari, S., Andiman, W. and Ghebremichael, M. ( 2016). Frequent episodes of detectable viremia in HIV treatment-experienced children is associated with a decline in CD4+ T-cells over time. Journal of AIDS & Clinical Research  7, 565– 577. Google Scholar CrossRef Search ADS PubMed  Pencina, M. J., D’Agostino, R. B. and Vasan, R. S. ( 2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine  27, 157– 172. Google Scholar CrossRef Search ADS PubMed  Pepe, M. S. ( 2003). The Statistical Evaluation of Medical Tests for Classification and Prediction . New York: Oxford University Press, USA. Robinson, G. K. ( 1991). That BLUP is a good thing: the estimation of random effects. Statistical Science  6, 15– 32. Google Scholar CrossRef Search ADS   Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M. J. and Kattan, M. W. ( 2010). Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology (Cambridge, MA)  21, 128. Google Scholar CrossRef Search ADS   Swets, J. A. and Pickett, R. M. ( 1982). Evaluation of Diagnostic Systems: Methods from Signal Detection Theory . Academic Press Series in Cognition and Perception. New York, NY: Elsevier Science & Technology Books. Uno, H., Cai, T., Tian, L. and Wei, L.-J. ( 2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association  102, 527– 537. Google Scholar CrossRef Search ADS   Uno, H., Tian, L, Cai, T., Kohane, I. S. and Wei, L.-J. ( 2013). A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Statistics in Medicine  32, 2430– 2442. Google Scholar CrossRef Search ADS PubMed  Yang, S., Santillana, M. and Kou, S. C. ( 2015). Accurate estimation of influenza epidemics using Google search data via ARGO. Proceedings of the National Academy of Sciences  112, 14473– 14478. Google Scholar CrossRef Search ADS   Zheng, Y. and Heagerty, P. J. ( 2004). Semiparametric estimation of time-dependent ROC curves for longitudinal marker data. Biostatistics  5, 615– 632. Google Scholar CrossRef Search ADS PubMed  Zhou, X.-H., McClish, D. K. and Obuchowski, N. A. ( 2009). Statistical Methods in Diagnostic Medicine , Volume 569. New York: John Wiley & Sons. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Biostatistics Oxford University Press

# The ROC curve for regularly measured longitudinal biomarkers

, Volume Advance Article – Mar 28, 2018
19 pages

/lp/ou_press/the-roc-curve-for-regularly-measured-longitudinal-biomarkers-0jRJ0xtEEy
Publisher
Oxford University Press
ISSN
1465-4644
eISSN
1468-4357
D.O.I.
10.1093/biostatistics/kxy010
Publisher site
See Article on Publisher Site

### Abstract

Summary The receiver operating characteristic (ROC) curve is a commonly used graphical summary of the discriminative capacity of a thresholded continuous scoring system for a binary outcome. Estimation and inference procedures for the ROC curve are well-studied in the cross-sectional setting. However, there is a paucity of research when both biomarker measurements and disease status are observed longitudinally. In a motivating example, we are interested in characterizing the value of longitudinally measured CD4 counts for predicting the presence or absence of a transient spike in HIV viral load, also time-dependent. The existing method neither appropriately characterizes the diagnostic value of observed CD4 counts nor efficiently uses status history in predicting the current spike status. We propose to jointly model the binary status as a Markov chain and the biomarkers levels, conditional on the binary status, as an autoregressive process, yielding a dynamic scoring procedure for predicting the occurrence of a spike. Based on the resulting prediction rule, we propose several natural extensions of the ROC curve to the longitudinal setting and describe procedures for statistical inference. Lastly, extensive simulations have been conducted to examine the small sample operational characteristics of the proposed methods. 1. Introduction The receiver operating characteristic (ROC) curve is a graphical summary of the discriminative capacity of a thresholded continuous scoring system for a binary outcome. The curve consists of pairs of true positive and false positive rates as the threshold is varied (Swets and Pickett, 1982). Although many alternative measures have been proposed (Uno and others, 2007, 2013; Pencina and others, 2008; Steyerberg and others, 2010), the ROC curve remains the most commonly used in many fields. In medical research, ROC curves are often used to characterize the quality of a continuous biomarker as a diagnostic for binary statuses such as “diseased” versus “non-diseased” (Pepe, 2003; Zhou and others, 2009). Well-studied in the cross-sectional setting, the ROC curve has been generalized to settings where the outcome of interest is time-to-event (Heagerty and others, 2000; Zheng and Heagerty, 2004; Heagerty and Zheng, 2005) and where the biomarker is longitudinally measured (Foulkes and others, 2010; Liu and Albert, 2014). However, less research has considered both longitudinal biomarker measurements and binary statuses. A motivating example is data gathered from the Yale Prospective Longitudinal Pediatric HIV Cohort. The cohort comprises 97 children born to HIV-infected mothers in the New Haven, CT, area since 1985. Various measurements were taken on the participants every 2–3 months over the 10-year period 1996–2006. Among these measurements, we focus on a continuous biomarker, CD4+ lymphocyte count, as a predictor of a binary outcome, “blip” status, the presence or absence of a transient spike in viral load (Paintsil and others, 2008). Let $$X_{ij}$$ and $$Y_{ij}$$ denote the biomarker value and a binary status of patient $$i$$ at visit $$j$$, respectively, $$i=1,\ldots,n$$, $$j=1,\ldots, n_i.$$ The values $$X_{ij}$$ may be direct assay measurements, as in the motivating example, or they may be derived or composite quantities. To assess the predictive value of the longitudinal biomarker $$X_{ij}$$ for predicting $$Y_{ij}$$, Liu and Wu (2003) and Liu and others (2005) propose a simple mixed effect regression model (Breslow and Clayton, 1993)   $$g\{\mbox{P}(Y_{ij}=1 \mid X_{i1},\cdots,X_{in_i})\}=\alpha_i+\beta_i X_{ij}, \label{math:mixedmodel}$$ (1.1) where $$g(\cdot)$$ is the logit function and $$\alpha_i$$ and $$\beta_i$$ are, respectively, the subject-specific random intercept and slope. Similar models are described in Foulkes and others (2010) and Albert (2012). The random vector $$(\alpha_i, \beta_i)'$$ is assumed to follow a Gaussian distribution   $$\left( \begin{array}{c} \alpha_i \\ \beta_i \end{array} \right)\sim N\left\{\left( \begin{array}{c} \alpha_0 \\ \beta_0 \end{array} \right)\!, {\boldsymbol{\Gamma}}_0 \right\}\!.$$ The ROC curve summarizing the diagnostic value of $$X_{ij}$$ is then constructed based on pairs   $$\left\{(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij}), i=1, \cdots, n; j=1, \cdots, n_i\right\}\!,$$ where $$(\hat \alpha_i, \hat \beta_i)'$$ are estimates of the subject-specific random effects obtained from the observed data, for example,   $$\left( \begin{array}{c} \hat \alpha_i \\ \hat \beta_i \end{array} \right)=E\left\{\left( \begin{array}{c} \alpha_i \\ \beta_i \end{array} \right)\biggm | X_{i1}, \ldots, X_{in_i}, Y_{i1}, \ldots, Y_{in_i}, \hat\alpha_0, \hat \beta_0, \hat {\boldsymbol{\Gamma}}_0 \right\}\!,$$ and $$\hat\alpha_0$$, $$\hat \beta_0$$, and $$\hat \Gamma_0$$ are maximum likelihood estimators for the corresponding population parameters. While this approach is simple and intuitive, we mention several limitations. First, the parametric assumptions may be too restrictive for some applications. For example, as discussed below, the Yale pediatric HIV data suggest greater dependency among biomarkers and disease statuses nearer in time. Neither is accounted for by model (1.1), which is symmetric in time. Second, the subject-specific random effect estimate $$\hat \alpha_i$$ and $$\hat \beta_i,$$ as defined above, are not available at visit $$j$$ as the biomarker levels $$(X_{i(\,j+1)}, \ldots, X_{in_i})$$ and responses $$(Y_{i(\,j+1)}, \ldots, Y_{in_i})$$ are not yet observed. Third, the approach uses the same data both to fit the model and, by using the fitted biomarkers to construct the ROC curve, to assess the quality of the fit. One expects such an assessment to overestimate the true diagnostic quality of the biomarker (Janes and others, 2009). Efforts to set aside a group of patients for validation after estimation encounter the difficulty that subject effect estimates for the validation patients are unavailable (Foulkes and others, 2010). Lastly and more conceptually, the notion of ROC curve stands to be refined in the context of longitudinal measurements of multiple patients. In contrast to the cross-sectional setting, several useful ROC curves suggest themselves. For example, the predictive performance of the biomarker for a given patient, as determined by that patient’s history, can be quite different from the predictive performance for the entire patient population. In the next section, we propose a general framework to address these limitations. 2. Methods We first note two properties desirable in a framework for assessing diagnostic performance in the longitudinal, multiple subject design under consideration. First, to assess the predictive performance of a biomarker, the biomarker should depend only on data that is available when a prediction is to be made. We adopt the vantage of a practitioner who has previous biomarker and status data for a patient, is confronted with a current biomarker for the patient, and must now predict current status. In the HIV example, due to the turnaround time of the tests involved, CD4 count or percentage is normally available before blip status. The patient’s history includes previous blip statuses, and the clinician may need to determine a course of treatment based on as-yet unavailable current blip status as predicted from current CD4 count or percentage. A similar problem is described by Yang and others (2015). Here, the “status” is the presence of absence of influenza-like illness in periodic reports issued by the Center for Disease Control (CDC), and the predictor or “biomarker” is real-time internet search data. The CDC’s reports describe outbreaks at a 1–3 week delay. When making real-time predictions with the search data, only CDC outbreak data referencing previous time points are available. In this case, an accurate early warning of the outbreak can be very important for public health. As predictions may be made at different times, the accuracy of the prediction and the associated ROC curve will depend on time, with the corresponding prediction depending only on patient history available at that time. Second, two types of prediction performance should be differentiated: that for an individual patient and that for a patient population. For the former, we target the performance of $${\cal F}({\cal H}_{ij})$$ as a predictor of $$Y_{ij},$$ where $${\cal F}({\cal H}_{ij})$$ is a continuous score summarizing the predictive information contained in the history up to visit $$j$$ of a given patient $$i,$$ and $$\mathcal{H}_{ij}=\{X_{i1},\ldots,X_{ij},Y_{i1},\ldots,Y_{i(\,j-1)}\}.$$ For the latter, we are interested in the predictive performance of $${\cal F}({\cal H}_{ij})$$ in the entire patient population at a time $$j$$, that is, marginalizing across patients. In the following, we first generalize the simple mixed effect model (1.1) and discuss the two types of predictive performance under the proposed model. As discussed further below, easy extensions lead to more sophisticated models allowing for more flexible prediction rules. We assume that the longitudinal biomarker levels $$X_{ij}$$ follow an autoregressive process conditional on disease status $$Y_{ij} \in \{0, 1\},$$ which are generated by a Markov chain as in, e.g., Azzalini (1994). Specifically, we assume that for the $$i$$th patient   \begin{aligned} \mbox{P}\{Y_{ij}=b | Y_{i(\,j-1)}=a, {\cal H}_{i(\,j-1)}\}& = p_i^{(ab)}, a,b\in\{0,1\}, j=2, \ldots, \\ X_{ij} | \left\{Y_{ij}=a, {\cal H}_{ij}\right\}&=\theta_i^{(a)}+\rho_0X_{i(\,j-1)}+\epsilon_{ij}, j=1, \ldots, \\ \mbox{P}(Y_{i1}=a)& =p^{(a)} ~~~\mbox{and}~~~ X_{i0}=0, \end{aligned} \label{math:jointmodel} (2.1) where   \begin{align*} \rho_0 &\in (-1,1)\\ \epsilon_{ij} &\overset{iid}{\sim} \mathcal{N}(0, \sigma_0^2), j=1,\cdots, n_i,\\ \xi_i = \left( \begin{array}{c} \theta_i^{(0)} \\ \theta_i^{(1)} \\ g(p^{(01)}_i) \\ g(p^{(11)}_i) \end{array} \right) &\overset{iid}{\sim} N\left\{\left( \begin{array}{c} \theta^{(0)} \\ \theta^{(1)} \\ \mu^{(01)} \\ \mu^{(11)}\end{array} \right)\!, \left(\begin{array}{cc} {\boldsymbol{\Sigma}}_\theta & 0 \\ 0 & {\boldsymbol{\Sigma}}_p \end{array}\right) \right\}\!, i=1,\ldots,n, \end{align*} independently and identically, and $$\eta_0=(\rho_0,p^{(0)}, \theta^{(0)},\theta^{(1)},\mu^{(01)},\mu^{(11)},{\boldsymbol{\Sigma}}_\theta,{\boldsymbol{\Sigma}}_p)$$ are hyperparameters. $$X_{i0}$$ is set to 0 to initialize the autoregressive, implying that the baseline biomarker level $$X_{i1}$$ follows a Gaussian distribution conditional on the blip status. This set of parametric distributions for the random effects is chosen in part for convenience as they permit the model parameters to be estimated using many standard statistical software packages. Specifically, $$\{\theta^{(0)}, \theta^{(1)}, {\boldsymbol{\Sigma}}_\theta, \rho_0, \sigma_0^2\}$$ can be estimated by fitting the linear mixed effects model (Laird and Ware, 1982)   $$X_{ij} =\theta_i^{(0)}+(\theta_i^{(1)}-\theta_i^{(0)})Y_{ij}+\rho_0X_{i(\,j-1)}+\epsilon_{i(\,j-1)}, j=2, \ldots, n_i,$$ and $$\{\mu_{01}, \mu_{11}, {\boldsymbol{\Sigma}}_p\}$$ can be estimated by fitting the generalized mixed effects model   $$g\{\mbox{P}(Y_{ij}=1 \mid Y_{i(\,j-1)})\}=\mu^{(01)}_i+(\mu^{(11)}_i-\mu^{(01)}_i)Y_{i(\,j-1)}, j=2, \ldots, n_i,$$ where $$\mu^{(ab)}_i=g(p^{(ab)}_i), a=0,1$$ and $$b=0,1.$$ In addition, $$p^{(0)}$$ can be estimated by the observed proportion across patients at the initial visit, i.e., the baseline. More importantly, under this model, we may link $$Y_{ij}$$ with the observed history at visit $$j$$ via a random effects model   \begin{align} g\left\{\mbox{P}\left(Y_{ij}=1\mid {\cal H}_{ij}, \xi_i\right)\right\}&=g\left\{\mbox{P}\left(Y_{ij}=1\mid X_{i(\,j-1)}, X_{ij}, Y_{i(\,j-1)}, \xi_i\right)\right\}\nonumber\\ &=\alpha_{i0}+\alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}, \label{math:masterreg} \end{align} (2.2) where   \begin{aligned} \alpha_{i0} &=\frac{1}{2\sigma_0^2}\left\{(\theta^{(0)}_i)^2-(\theta^{(1)}_i)^2\right\} + \mu^{(01)}_i,\\ \alpha_{i1} &= \mu^{(11)}_i-\mu^{(01)}_i,\\ \alpha_{i2}&=\frac{1}{\sigma^2_0}\rho_0\left(\theta_i^{(0)}-\theta_i^{(1)}\right)\!,\\ \alpha_{i3}&=-\frac{1}{\sigma^2_0}\left(\theta_i^{(0)}-\theta_i^{(1)}\right)\!. \end{aligned} \label{math:randomcoeff} (2.3) Thus model (2.1) generalizes model (1.1), insofar as the log-odds of positive disease status is modeled as linear in the subject’s most current biomarker level, although the distribution of the coefficients in this linear combination may differ between the two models. Generalizations of (2.1) that include more biomarkers, e.g., $$g\{\mbox{P}(Y_{ij}=1 \mid X_{i1},\ldots,X_{in_i})\}=\beta_0+\beta_1 X_{ij} + \beta_2X_{i(\,j-1)} + \cdots + \beta_mX_{i(\,j-m+1)}$$, correspond to higher order autoregressive processes in model (1.1). 2.1. Individual patient ROC curve We would like to evaluate the predictive performance of the biomarker or score $$M_{ij}$$ (or its history) for patient $$i$$ at time $$j$$ by contrasting two survival functions, $$\tilde S_{ij}^{(1)}(m)$$ and $$\tilde S_{ij}^{(0)}(m),$$ where   $$\tilde S_{ij}^{(a)}(m)=\mbox{P}\left(M_{ij}> m \mid Y_{ij}=a, \xi_i, \rho_0, \sigma_0\right)\!, a=0,1,$$ and   $$M_{ij}=\alpha_{i0}+\alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}.$$ $$M_{ij}$$ uses the available history, since under model (2.1), $$Y_{ij}$$ is conditionally independent of the remaining history given $$M_{ij}$$. We may then use the ROC curve $$\tilde S_{ij}^{(1)}\big\{\big(\tilde S_{ij}^{(0)}\big)^{-1}(u)\big\}$$ or derived statistics such as the area under the ROC curve $$\int_{\mathbb{R}}\tilde S_{ij}^{(1)}(x){\rm d}\big\{1-\tilde S_{ij}^{(0)}(x)\big\}$$ to summarize this contrast. As $$\alpha_{ij}, j=0,1,2,3$$, depend on the unknown subject-specific random effect $$\xi_i,$$ the score $$M_{ij}$$ is unavailable in practice. An ROC curve based on $$M_{ij}$$ can only serve as a theoretical benchmark. We therefore estimate the random effect $$\xi_i$$ based on its conditional distribution $$\xi_i \mid \bar{\cal H}_{i(\,j-1)},$$ where $${\bar{\mathcal{H}}}_{i(\,j-1)}= \{(Y_{ik}, X_{ik}), k=1, \ldots, (\,j-1) \},$$ and use a plug-in estimator for $$M_{ij}$$ . For example, we may estimate the random effects and $$M_{ij}$$ by the posterior mean   \begin{align*} \hat\xi_i({\bar{\cal H}}_{i(\,j-1)}, \eta_0)= \left( \begin{array}{c} \hat \theta_i^{(0)}({ \bar{\cal H}}_{i(\,j-1)}, \eta_0) \\[5pt] \hat \theta_i^{(1)}({ \bar{\cal H}}_{i(\,j-1)}, \eta_0)\\[5pt] g\left\{\hat p^{(01)}_i({ \bar{\cal H}}_{i(\,j-1)}, \eta_0)\right\} \\[5pt] g\left\{\hat p^{(11)}_i({\bar{\cal H}}_{i(\,j-1)}, \eta_0)\right\} \end{array}\right)=E\left\{\left( \begin{array}{c} \theta_i^{(0)} \\[5pt] \theta_i^{(1)} \\[5pt] \mu^{(01)}_i \\[5pt] \mu^{(11)}_i\end{array} \right) \biggm| { \bar{\cal H}}_{i(\,j-1)}, \eta_0 \right\}\!, \end{align*} and   \begin{eqnarray} \hat M_{j}({\cal H}_{ij}, \eta_0)&=&\alpha_{0j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)+\alpha_{1j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)Y_{i(\,j-1)} \nonumber\\ &&+\alpha_{2j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)X_{i(\,j-1)}+\alpha_{3j}({\bar{\cal H}}_{i(\,j-1)}, \eta_0)X_{ij}, \label{math:mhat} \end{eqnarray} (2.4) respectively (Robinson, 1991), where the functions $$\alpha_{kj}(\bar{\cal H}_{i(\,j-1)}, \eta_0), k=0, 1, 2, 3,$$ are obtained by replacing all the relevant subject-specific random effects in (2.3) with their estimated counterparts based on $${\bar{\cal H}}_{i(\,j-1)}.$$ For example,   $$\alpha_{3j}(\bar{\cal H}_{ij}, \eta_0) =\frac{1}{\sigma_0^2}\left\{ E(\theta_i^{(1)} \mid {\bar{\cal H}}_{i(\,j-1)}, \eta_0)-E(\theta_i^{(0)}\mid {\bar{\cal H}}_{i(\,j-1)}, \eta_0)\right\}\!.$$ Here, the subscript $$j$$ is used to emphasize that the prediction of the subject-specific random effect is made at visit $$j$$ using information up to visit $$j-1.$$ The estimator $$\hat{M}_{j}(\cdot,\cdot)$$ depends on the subject only through the first argument, i.e., the patient history, and so the subscript $$i$$ has been dropped. An explicit expression for this choice of $$\hat\xi_i({\bar{\cal H}}_{i(\,j-1)}, \eta_0)$$ can be found in Appendix A of the supplementary material available at Biostatistics online. Using the estimated score $$\hat M_{j}({\cal H}_{ij}, \eta_0)$$ (or $$\hat M_{j}({\cal H}_{ij},\hat\eta_0)$$ if $$\eta_0$$ is unknown) to predict the disease status at visit $$j$$, the predictive performance of patient $$i$$’s biomarker at visit $$j$$ can be summarized by the ROC curve   \begin{align*} {\rm ROC}_{ij}(u\mid \eta_0)= {\rm ROC}(u\mid \xi_i,\eta_0, j)=S_{ij}^{(1)}\left\{ \left( S_{ij}^{(0)}\right)^{-1}(u)\right\}\!, \end{align*} where   $$S_{ij}^{(a)}(m)=\mbox{P}\left(\hat M_{j}({\cal H}_{ij}, \eta_0)> m \mid Y_{ij}=a, \xi_i, \eta_0 \right)\!, a=0, 1,$$ is the subject- and visit-specific survival function of the estimated score. $${\rm ROC}_{ij}(u\mid \eta_0)$$ depends on the joint distribution of the random history $${\cal H}_{ij}$$ and the response $$Y_{ij}$$ and thus also on the subject-specific random effect $$\xi_i.$$ Since we do not have a convenient analytic expression for $${\rm ROC}_{ij}(u\mid \eta_0)=ROC(u|\xi, \eta_0, j),$$ we resort to a Monte-Carlo method. Specifically, for the $$i$$th patient: Simulate $${\cal H}_{ij}^*=\{Y_{i1}^*, \ldots, Y_{i(\,j-1)}^*, X_{i1}^*, \ldots, X_{ij}^*\}$$ and $$Y_{ij}^*$$ according to model (2.1) using consistent estimates of the subject-specific random effect $$\hat{\xi}_i$$ and the population parameter $$\eta_0$$ Compute $$\hat M_{j}({\cal H}_{ij}^*, \eta_0)$$ according to (2.4). Repeat steps 1–2 a large number of times and calculate the empirical ROC curve $$\hat{{\rm ROC}}_{ij}(u\mid \eta_0)=\hat{{\rm ROC}}(u|\hat{\xi}_i, \eta_0, j)$$ of the resulting pairs $$\left\{\hat M_{j}({\cal H}_{ij}^*, \eta_0), Y_{ij}^*\right\}\!.$$ $$\hat{{\rm ROC}}_{ij}(u\mid \eta_0)$$ can serve as an approximation to the subject-specific ROC curve of the $$i$$th patient at the $$j$$th visit provided that this patient’s subject-specific parameters are known or can be estimated up to the desired accuracy. When this assumption is unmet, e.g., when the time $$j$$ is small and few observations on the patient of interest are available, we instead propose two alternative summaries of the diagnostic performance of the biomarker at the individual level. The first is the average individual-specific ROC curve over the patient population,   $${\rm ROC}_1(u|\,j)=E\left\{{\rm ROC}(u |\xi, \eta_0, j)\right\}\!, \label{math:roc1}$$ (2.5) where the expectation is taken with respect to the random effect $$\xi.$$ In practice, we may use Monte-Carlo methods, simulating a large number $$B$$ of random effects $$\{\tilde{\xi}_1, \ldots, \tilde{\xi}_B\}$$ from the distribution for the random effect and estimating $${\rm ROC}_1(u|\,j)$$ by   $$\hat{{\rm ROC}}_1(u|\eta_0, j)=B^{-1}\sum_{b=1}^B \hat{{\rm ROC}}(u|\tilde{\xi}_b, \eta_0, j).$$ The resulting $$\hat{{\rm ROC}}_1(u|\eta_0, j)$$ is not the ROC curve for any individual patient but the expected patient-level ROC curve for a typical patient from the given population. As before, when $$\eta_0$$ is unknown, we may replace it by a consistent estimator $$\hat \eta$$ and let   $$\hat{{\rm ROC}}_1(u|\,j)=\hat{{\rm ROC}}_1(u|\hat \eta, j).$$ Since $$\hat{{\rm ROC}}_1(u|\eta_0, j)$$ is a smooth function of $$\eta_0$$, $$\hat{{\rm ROC}}_1(u|\,j)$$ is a consistent estimator for $${\rm ROC}_1(u|\,j)$$ and $$\sqrt{n}\big\{\hat{{\rm ROC}}_1(u|\,j)-{\rm ROC}_1(u|\,j)\big\}$$ converges weakly to a mean zero Gaussian process indexed by $$u$$ when $$\sqrt{n}(\hat \eta-\eta_0)$$ converges weakly to a mean zero Gaussian distribution. The second option is the limit   $${\rm ROC}_2(u | \xi_i)=\lim_{j\rightarrow\infty}{\rm ROC}(u|\xi_i, \eta_0, j).\label{math:roc2}$$ (2.6) As $$j\rightarrow \infty,$$$$\hat \xi_i({\bar{\cal H}}_{ij}),$$$$\hat M_{j}({\cal H}_{ij})$$ and $$\mbox{P}(Y_{ij}=a)$$ converge to $$\xi_i,$$$$M_{ij}$$, and $$\pi_{ai}$$, respectively, where   $$\pi^{(a)}_i=\frac{p^{((1-a)a)}_i}{p^{((1-a)a)}_i+p^{(a(1-a))}_i}, a=0,1$$ are subject-specific state probabilities of the stationary distribution of the 2-state Markov chain. Therefore, provided $$\alpha_{i3}\neq 0$$,   \begin{align} S_{i\infty}^{(a)}(m)&=\lim_{j\rightarrow \infty} S_{ij}^{(a)}(m) \nonumber \\ &=\lim_{j\rightarrow \infty}\mbox{P}\left\{\hat M_{j}({\cal H}_{ij})> m \mid Y_{ij}=a, \xi_i, \rho_0, \sigma_0, \right\} \nonumber\\ &= \lim_{j\rightarrow \infty}\sum_{b=0,1}\mbox{P}\left(X_{ij}-\rho_0X_{i(\,j-1)} > \frac{m - \alpha_{i0} - \alpha_{i1}b}{\alpha_{i3}} \biggm| Y_{ij}=a,Y_{i(\,j-1)}=b\right)\frac{p^{(ba)}_i\pi^{(b)}_i}{\pi^{(a)}_i} \nonumber\\ &= \sum_{b=0,1}\left\{1-\Phi\left(\frac{m-\alpha_{i0}-\alpha_{i1}b-\theta_i^{(a)}\sigma_0}{\sigma_0\alpha_{i3}}\right)\right\} \frac{p^{(ba)}_i\pi^{(b)}_i}{\pi^{(a)}_i}, \label{math:survlimit} \end{align} (2.7) where $$\Phi(\cdot)$$ is cumulative distribution function of the standard normal. Here, we used the fact that under model (2.1), $$X_{ij}-\rho_0X_{i(\,j-1)}$$ given $$\{Y_{ij}=a\}$$ is normally distributed with mean $$\theta_i^{(a)}$$ and variance $$\sigma_0^2$$. When $$\alpha_{i3}=\sigma_0^{-2}\left(\theta_i^{(1)}-\theta_i^{(0)}\right)=0$$, i.e., a patient’s diseased and non-diseased biomarker means are the same, the posterior probability of positive event status (2.2) reduces to   \begin{align*} pr(Y_{ij}=1\mid\mathcal{H}_{ij},\xi_i)=\frac{p^{(Y_{i(\,j-1)}1)}_i}{p^{(Y_{i(\,j-1)}0)}_i+p^{(Y_{i(\,j-1)}1)}_i}, \end{align*} posterior probability of a 2-state Markov chain. Consequently, the ROC curve summarizes the performance of a 2-state Markov chain in predicting the next state in this case. This performance serves as a limiting case when $$\alpha_{i3}$$ becomes small in magnitude, the biomarkers cease to provide useful discrimination, and the patient’s prior status carries all the information about current status. $${\rm ROC}_2(u | \xi_i)= S_{i\infty}^{(1)}\big\{(S_{i\infty}^{(0)})^{-1}(u)\big\}$$ can be viewed as the ROC curve for subject $$i$$ after adequate follow-up and therefore reflects the ultimate personalized diagnostic value of the biomarker for the $$i$$th patient with the subject random effect $$\xi_i.$$ It may or may not be similar to the population counterpart described in the next section. $${\rm ROC}_2(u | \xi_i)$$ can be estimated by $$\hat{{\rm ROC}}_2(u | \hat \xi_i),$$ which is the same as $${\rm ROC}_2(u | \xi_i)$$ with $$\xi_i$$ and $$\eta_0$$ being replaced by $$\hat \xi_i=\hat \xi_i({\bar{\cal H}}_{in_i}, \hat \eta)$$ and $$\hat \eta,$$ respectively. Assuming that $$n_i/n=r_0 \in (0, 1)$$ and $$n \rightarrow \infty,$$$$\hat{{\rm ROC}}_2(u | \hat \xi_i)$$ is consistent and $$\sqrt{n_i+n}\big\{\hat{{\rm ROC}}_2(u | \hat \xi_i)-{\rm ROC}_2(u | \xi_i)\big\}$$ converges to a mean zero Gaussian process. Therefore, the key assumption for estimating $${\rm ROC}_2(u | \xi_i)$$ in practice is that $$n_i$$ be sufficiently large to allow acceptable estimation of the individual-specific random effect. The resulting estimated ROC curve can then be used to characterize the diagnostic value of the biomarker for an individual patient after sufficient follow-up. Inference for $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_2(u | \xi_i)$$ can be carried out with the parametric bootstrap. One simulates fresh data using the estimated population parameter $$\hat \eta$$ from model (2.1) and obtains $${\rm ROC}^*_1(u|\,j)$$ and $${\rm ROC}_{2}^*(u|\xi^*_i)$$, the estimators for the corresponding ROC curves, from the simulated data. The empirical distributions $${\rm ROC}^*_1(u|\,j)-\hat{{\rm ROC}}_1(u|\,j)$$ and $${\rm ROC}_{2}^*(u|\xi^*_i)-\hat{{\rm ROC}}_2(u | \hat\xi_i)$$ based on a large number of simulations serve as approximations to the distributions of $$\hat{{\rm ROC}}_1(u|\,j)-{\rm ROC}_1(u|\,j)$$ and $$\hat{{\rm ROC}}_2(u | \hat\xi_i)-{\rm ROC}_2(u | \xi_i),$$ respectively. Point-wise confidence intervals (CIs) of $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_2(u | \xi_i)$$ can be constructed along these lines. Remark 2.1 One may be interested in the diagnostic value of $$X_{ij}$$ at the $$j$$th visit given the past history $$\bar{{\cal H}}_{i(\,j-1)}.$$ In this case, the ROC curve can be constructed based on the conditional survival function   \begin{align*} &S_{ij}^{(a)}(m \mid \bar{{\cal H}}_{i(\,j-1)})\\ =&\mbox{P}\left(X_{ij}> m \mid Y_{ij}=a, \xi_i, \bar{{\cal H}}_{i(\,j-1)}, \eta_0 \right)\\ =&\mbox{P}\left(X_{ij}> m \mid Y_{ij}=a, X_{i(\,j-1)}, \xi_i, \rho_0, \sigma_0 \right)\\ =&1-\Phi\left(\frac{m-\rho_0X_{i(\,j-1)}-\sigma_0\theta_i^{(a)}}{\sigma_0}\right)\!. \end{align*} In contrast to ROC curves based on $$M_{ij}$$ or its estimator, this ROC curve reflects the predictive value of $$X_{ij}$$ only. It also depends on the random effect $$\theta_i^{(a)}$$, unknown at the visit $$j.$$ One may also consider its expectation with respect to random effects or its limit when $$j \rightarrow \infty$$ as an estimable alternative. Remark 2.2 Both $${\rm ROC}_{1}(u|\,j)$$ and $${\rm ROC}_2(u | \xi_i)$$ are parametric in nature in that their summarization of the diagnostic value of the longitudinally measured biomarker are valid only if model (2.1) is correctly specified. 2.2. ROC curve for the patient population The predictive performance of the biomarker across the entire population may be very different from that for an individual patient. For example, the latter does not take into account biomarker variation between patients, or differences between patients in the prior probabilities of positive status events. Were the data not longitudinal, we might consider the empirical ROC curve of biomarker–status pairs $$(X_i,Y_i), i=1,\ldots n$$. To take accumulated patient data into account, we instead consider the ROC curve of $$\hat M_{j}({\mathcal H}_{ij}, \eta_0), i=1,\ldots, n$$, the patients’ biomarker scores (2.4) at a given time $$j$$. The scores synthesize all the predictive information in the past history under model (2.1). Conditionally on the population parameter $$\eta_0$$, the patient scores are iid, and the empirical ROC curve is a valid metric for the predictive value of the scores regardless of the validity of the model being used to derive them. If model (2.1) is a good approximation to the true relationship between $${\cal H}_{ij}$$ and $$Y_{ij}$$, one may anticipate good prediction accuracy of the resulting score. A severely misspecified model may give a prediction score with poor performance. In either case, the ROC curve and derived statistics such as the area under the ROC curve remain objective measures for the predictive value of the scoring system. Formally, assuming that $$\lim_{n\rightarrow \infty}\hat \eta = \tilde \eta_0$$ in probability, the score $$\hat M_{j}({\cal H}_{ij}, \hat \eta)$$ converges to   \begin{eqnarray*} \hat M_{j}({\cal H}_{ij}, \tilde\eta_0)&=&\alpha_{0j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)+\alpha_{1j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)Y_{i(\,j-1)} \nonumber \\ &&+\alpha_{2j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)X_{i(\,j-1)}+\alpha_{3j}({\bar{\cal H}}_{i(\,j-1)}, \tilde \eta_0)X_{ij}, \end{eqnarray*} where $$\tilde \eta_0=\eta_0$$ if the model is correctly specified. We are interested in estimating the ROC curve for the predictive value at the $$j$$th visit   $${\rm ROC}_{3}(u|\,j)= S_{j}^{(1)}\left\{ \left( S_{j}^{(0)}\right)^{-1}(u)\right\}\!, \label{math:roc3}$$ (2.8) where   $$S_{j}^{(a)}(m)=\mbox{P}\left(\hat M_{j}({\cal H}_{ij}, \tilde \eta_0)> m \mid Y_{ij}=a \right)\!, a=0, 1.$$ We do so by plugging in the empirical survival function:   $$\hat{{\rm ROC}}_3(u|\,j)=\hat S_{j}^{(1)}\left\{ \left( \hat S_{j}^{(0)}\right)^{-1}(u)\right\}\!,$$ (2.9) where $$N_{aj}=\sum_{i=1}^n I(Y_{ij}=a),$$  $$\hat S_j^{(a)}(m)= N_{aj}^{-1}\sum_{i=1}^{n} I\left\{\hat M_j({\cal H}_{ij},\hat{\eta})> m \right\} I( Y_{ij}=a)$$ and $$I(\cdot)$$ is the event indicator function. Similarly, the area under the ROC curve, the concordance statistics, may be estimated as   $$\skew7\hat A_j=\frac{1}{N_{1j}N_{0j}}\sum_{i=1}^{N_{0j}}\sum_{k=1}^{N_{1j}} I\left\{\hat M_{j}({\cal H}_{ij}, \hat \eta)>\hat M_{j}({\cal H}_{kj}, \hat \eta)\right\}I(Y_{ij}=1)I(Y_{kj}=0).$$ In Appendix B of supplementary material, available at Biostatistics online, we show that $$\hat{{\rm ROC}}_3(u|\,j)$$ is a consistent estimator for $${\rm ROC}_3(u|\,j)$$ and the distribution of $$\sqrt{n}\big\{\hat{ROC}_3(u|\,j)-{\rm ROC}_3(u|\,j)\big\}$$ converges to a mean zero Gaussian process under mild regularity conditions. The variance of $$\hat{\rm ROC}_3(u|\,j)$$ can be approximated by an efficient resampling method. At each iteration, we first generate random weights $$\{W_1, \ldots, W_n\}$$ from the unit exponential distribution and estimate $$\eta_0$$ under model (2.1) with the $$i$$th observation weighted by $$W_i.$$ Denote the estimator by $$\eta^*$$ and let   $${\rm ROC}_3^*(u|\,j)=\hat S_{j}^{(1)*}\left\{ \left( \hat S_{j}^{(0)*}\right)^{-1}(u)\right\}\!,$$ where   $$\hat S_j^{(a)*}(m)= \frac{\sum_{i=1}^{N} I\left\{\hat M_{j}({\cal H}_{ij}, \eta^*)> m \right\} I( Y_{ij}=a)W_i}{\sum_{i=1}^{N}I(Y_{ij}=a)W_i}.$$ Obtaining in this way a large number of realizations of $${\rm ROC}_3^*(u|\,j),$$ their empirical variance can be used to approximate that of $$\hat{{\rm ROC}}_3(u|\,j)-{\rm ROC}_3(u|\,j).$$ Similar resampling methods can be used to make inference on $$A_j$$, the area under the ROC curve at the $$j$$th visit. The predictive value of the biomarker in the entire population also varies with the visit $$j$$. With more visits and richer data observed, the predictive ability of the updated scoring system is expected to increase. We may study the trend of predictive value from visit $$1$$ to $$J$$ by simultaneously estimating $${\rm ROC}_3(u|2),\ldots,$$ and $${\rm ROC}_3(u|\,J)$$. It is not difficult to show that   $$\left\{\hat {\rm ROC}_3(u|2)-{\rm ROC}_3(u|2), \ldots, \hat{{\rm ROC}}_3(u|\,J)-{\rm ROC}_3(u|\,J)\right\}$$ can be approximated by a multivariate mean-zero Gaussian distribution, based on which joint inference for the predictive value at all visits of interest may be conducted. When the predictive value of the constructed scoring system only varies moderately from visit $$j_1$$ to $$j_2$$, i.e., $${\rm ROC}_3(u|\,j), j_1\le j\le j_2$$ are similar, it is tempting to estimate the ROC curve by the average predictive value between these two visits. To this end, one may empirically construct a ROC curve as   $$\hat{{\rm ROC}}_4(u|\,j_1, j_2)=\hat{S}_{j_1\,j_2}^{(1)}\left\{\left(\hat{S}_{j_1\,j_2}^{(0)}\right)^{-1}(u)\right\}\!,$$ where   $$\hat{S}_{j_1\,j_2}^{(a)}(m)=N_{aj_1\,j_2}\sum_{i=1}^n\sum_{j=j_1}^{j_2}I\left\{\hat{M}_{ij}({\cal H}_{ij})>m \right\}I(Y_{ij}=a) ,$$ and $$N_{aj_1\,j_2}=\sum_{i=1}^n \sum_{j=j_1}^{j_2}I(Y_{ij}=a).$$ Since it averages observations from multiple visits, $$\hat{{\rm ROC}}_4(u|\,j_1, j_2)$$ can be substantially more stable than $$\hat{{\rm ROC}}_3(u|\,j).$$$$\hat{{\rm ROC}}_4(u|\,j_1, j_2)$$ is a consistent estimator of   $${\rm ROC}_4(u|\,j_1, j_2)=S_{j_1\,j_2}^{(1)}\left\{\left(S_{j_1\,j_2}^{(0)}\right)^{-1}(u)\right\}\!,$$ where   $$S_{j_1\,j_2}^{(a)}(m)=\frac{\sum_{j=j_1}^{j_2} S_j^{(a)}(m)\mbox{P}(Y_{ij}=a)}{\sum_{j=j_1}^{j_2}\mbox{P}(Y_{ij}=a) },$$ a weighted average of $$S_j^{(a)}(m).$$ Statistical inference based on $$\hat{{\rm ROC}}_4(u|\,j_1, j_2)$$ can be made by resampling methods similar to those previously described. Remark 2.3 Despite some similarities, $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$ are quite different. The former is parametric and interpretable only when model (2.1) is correctly specified, while and latter is non-parametric in nature. The former, ignoring the differentiability in biomarkers across patients, tends to be smaller than the latter. Remark 2.4 The proposed ROC curves depend on the patient history $$\mathcal{H}_j$$ through the biomarkers estimates $$\hat M_j(\mathcal{H}_j, \hat \eta).$$ We may consider other functions of $$\mathcal{H}_j$$ given by different statistical models of the response. More generally, one may consider a working regression model   $$\mbox{P}\left(Y_{ij}=1|{\mathcal{H}_j}\right)=\mathcal{F}(Y_{i1}, \ldots, Y_{ij}, X_{i1}, \ldots, X_{i(\,j-1)}; {\boldsymbol \theta})$$ and construct the ROC curve based on   $$\left\{\left(\mathcal{F}(Y_{i1}, \ldots, Y_{ij}, X_{i1}, \ldots, X_{i(\,j-1)}; \hat{\boldsymbol \theta}), Y_{ij}\right)\!, i=1, \ldots, n\right\}\!,$$ where $$\mathcal{F}(\cdot; {\boldsymbol \theta})$$ is a parametric function of observed history and $${\boldsymbol \theta}$$ and $$\hat{\boldsymbol \theta}$$ are the model parameter and its appropriate estimator, respectively. 2.3. Extension In model (2.1), we assume that (i) the underlying disease status follows a simple Markov chain, i.e., the distribution of $$Y_{ij}$$ only depends on $$Y_{i(\,j-1)};$$ and (ii) the distribution of the biomarker level at visit $$j$$, $$X_{ij},$$ only depends on $$X_{i(\,j-1)}$$ and $$Y_{ij}$$; see Figure 1a, which diagrams the probability generating process described in (2.1). There are several obvious extensions: The distribution of $$X_{ij}$$ depends on $$(X_{i(\,j-1)}, Y_{i(\,j-1)}, Y_{ij})$$ (Figure 1b). The distribution of $$Y_{ij}$$ depends on $$(Y_{i(\,j-1)}, X_{i(\,j-1)})$$ (Figure 1c). Fig. 1. View largeDownload slide Schematic of the data-generation process described by (2.1) and extensions. (a) Model (2.1). (b) $$X_{ij}$$ generated from $$(X_{i(\,j-1)}, Y_{i(\,j-1)}, Y_{ij})$$. (c) $$Y_{ij}$$ generated from $$(Y_{i(\,j-1)}, X_{i(\,j-1)})$$. Fig. 1. View largeDownload slide Schematic of the data-generation process described by (2.1) and extensions. (a) Model (2.1). (b) $$X_{ij}$$ generated from $$(X_{i(\,j-1)}, Y_{i(\,j-1)}, Y_{ij})$$. (c) $$Y_{ij}$$ generated from $$(Y_{i(\,j-1)}, X_{i(\,j-1)})$$. Adapting model (2.1) to the first setting, where the biomarker value depends not only on the current disease status but also the status at the previous visit, gives:   $$X_{ij} | (X_{i(\,j-1)},Y_{i(\,j-1)},Y_{ij})=\theta_i^{(Y_{i(\,j-1)}Y_{ij})}+\rho_0X_{i(\,j-1)}+\epsilon_{ij}, \enspace\epsilon_{ij}\sim \mathcal{N}(0,\sigma_0^2), \label{math:modelextend1}$$ (2.10) where $$(\theta_{i1}, \theta_{i2}, \theta_{i3}, \theta_{i4})'$$ is the subject-specific random effect and $$\epsilon_{ij}$$ is independent $$\mathcal{N}(0,\sigma_0^2)$$. Under this model   \begin{align*} g\left\{\mbox{P}(Y_{ij}=1\mid {\cal H}_{ij},\xi_i)\right\}&=g\left\{\mbox{P}(Y_{ij}=1\mid X_{i(\,j-1)}, X_{ij}, Y_{i(\,j-1)})\right\}\\ &=\alpha_{i0}+\alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}+\alpha_{i4}X_{i(\,j-1)}Y_{i(\,j-1)} + \alpha_{i5}X_{ij}Y_{i(\,j-1)},\\ \end{align*} where   \begin{align*} \alpha_{i0} &= \frac{(\theta_{i}^{(00)})^2-(\theta_{i}^{(01)})^2}{2\sigma_0^2} + \mu^{(01)}_i\\ \alpha_{i1} &= -\frac{(\theta_{i}^{(11)})^2-(\theta_i^{(01)})^2-(\theta_i^{(10)})^2+(\theta_i^{(00)})^2}{2\sigma_0^2}+(\mu^{(11)}_i-\mu^{(01)}_i)\\ \alpha_{i2} &= \frac{\rho_0(\theta_{i}^{(00)}-\theta_i^{(01)})}{\sigma_0^2}, \alpha_{i3} = -\frac{(\theta_{i}^{(00)}-\theta_i^{(01)})}{\sigma_0^2}, \alpha_{i4} = -\frac{\rho_0(\theta_{i}^{(11)}-\theta_i^{(01)}-\theta_i^{(10)}+\theta_i^{(00)})}{\sigma_0^2},\\ \alpha_{i5} &= \frac{(\theta_{i}^{(11)}-\theta_i^{(01)}-\theta_i^{(10)}+\theta_i^{(00)})}{\sigma_0^2}. \end{align*} Therefore, besides the terms in (2.2), model (2.10) leads to additional interaction terms $$X_{i(\,j-1)}Y_{i(\,j-1)}$$ and $$X_{ij}Y_{i(\,j-1)}$$ contributing to the prediction of the disease status at the $$j$$th visit, $$Y_{ij}.$$ For the second setting, we may assume that   $$P\{Y_{ij}=b | Y_{i(\,j-1)}=a,X_{i(\,j-1)}\} = p^{(ab)}_i(X_{i(\,j-1)}), a,b\in\{0,1\},$$ where   $$\left( \begin{array}{c} g\{p_{01i}(X_{i(\,j-1)})\} \\ g\{p_{11i}(X_{i(\,j-1)})\} \end{array} \right) \sim N\left\{ \left( \begin{array}{c} \mu^{(01)}_i \\ \mu^{(11)}_i \end{array} \right)+\left( \begin{array}{c} \gamma_0 \\ \gamma_1 \end{array} \right)X_{i(\,j-1)}, \boldsymbol{\Sigma}_p\right\}\!.$$ In other words, the transition probability of the underlying disease status depends on the biomarker level at the prior visit. Under this model   \begin{align*} g\left\{\mbox{P}(Y_{ij}=1\mid {\cal H}_{ij},\xi_i)\right\}&=g\left\{\mbox{P}(Y_{ij}=1 \mid X_{i(\,j-1)}, X_{ij}, Y_{i(\,j-1)})\right\}\\ &=\alpha_0 + \alpha_{i1}Y_{i(\,j-1)}+\alpha_{i2}X_{i(\,j-1)}+\alpha_{i3}X_{ij}+\alpha_{i4}X_{i(\,j-1)}Y_{i(\,j-1)},\\ \end{align*} where   \begin{align*} \alpha_{i0} &= \mu^{(01)}_i+\frac{(\theta^{(0)}_i)^2-(\theta^{(1)}_i)^2}{2\sigma_0^2}\\ \alpha_{i1} &= \mu^{(11)}_i-\mu^{(01)}_i\\ \alpha_{i2} &= \frac{\rho_0(\theta^{(0)}_i-\theta^{(1)}_i)}{\sigma_0^2}+\gamma_0, \alpha_{i3} = -\frac{\theta^{(0)}_i-\theta^{(1)}_i}{\sigma_0^2}, \mbox{and} \alpha_{i4} = \gamma_1-\gamma_0. \end{align*} Therefore, compared with (2.2), there is an additional interaction term $$X_{i(\,j-1)}Y_{i(\,j-1)}$$ contributing to the prediction of the disease status at the $$j$$th visit, $$Y_{ij}.$$ There may be more extensive generalizations of model (2.2) such as the combination of extensions of (1) and (2) or higher order Markov chains for $$Y_{ij}$$. As mentioned in the previous sections, while the validity of individualized ROC curve depends on the correct model specification, the population-based ROC curve $${\rm ROC}_3(u|\,j)$$ and $${\rm ROC}_4(u|\,j_1, j_2)$$ can be constructed for scoring systems developed under different modeling assumptions and used to compare different models in terms of their predictive ability. 3. Example The goal of highly active antiretroviral therapy in the treatment of HIV is to keep a patient’s CD4 count high and to suppress viral load. CD4 count measures immunosuppression, the risk of opportunistic infections, and the strength of the immune system. Viral load is the amount of HIV in a sample, indicative of, among other things, transmission risk. Although viral load is regarded as a better indicator of disease status, it is also more expensive and time-consuming to measure than CD4 count. According to clinical guidelines, both are tested regularly in a typical treatment regimen and used to guide subsequent treatment. Even when therapy is effective and viral load is clinically categorized as suppressed, transient spikes in viral load, or “blips,” are observed. The clinical significance of viral blips is not understood well. While some studies have reported that viral blips are of no clinical significance, others have reported an association between viral blips and virologic failure. The identification of the predictors of viral blips and the association between viral blips and CD4+ T-cell changes over time are subjects of ongoing research. (see Paintsil and others, 2016 and references therein.) We consider the accuracy of absolute CD4+ T-lymphocyte count as a predictor of blip status among children. We analyzed longitudinal data from HIV-infected children enrolled in the Yale Prospective Longitudinal Cohort study comprising 97 children born to HIV-infected mothers in the greater New Haven, CT, area since 1985. The predictor CD4 count measures the number of CD4 cells/mm$$^3$$ of blood and the response blip status is defined as a viral load equal or exceeding 50 copies/mL. The median number of visits/patient is 33, with 1st and 3rd quartiles of 15 and 47 visits, respectively. For all of the 3309 visits in the data set, the median time between visits is exactly 90 days, with 1st and 3rd quartiles of 57 and 112 days, respectively, giving approximately evenly spaced visits during the follow-up. The average age at enrollment is 6.7 years (standard deviation: 3.9 years). Figure S1(a) of supplementary material available at Biostatistics online summarizes the dates of visits in the lifetimes of the subjects. Further details on the cohort and definitions used here can be found in Paintsil and others (2008) and the references therein. Sixteen patients with fewer than 10 visits were removed in order to allow for estimation of the individual ROC, $${\rm ROC}_2(u|\xi_i),$$ as discussed in Section 2. Eighty-one subjects remain after excluding those with fewer than 10 visits. The choice of how to group longitudinal observations is an important issue in many cohort-based longitudinal data analyzes, including ours. At each visit, measurements including CD4 count and blip status are taken, and antiretroviral treatment is administered. Therefore, the visit may serve as a surrogate for the number of treatments administered since baseline. While the specific enrollment time varies, the majority of the enrolled children (average age of 6.7 years) are in the early stages of treatment at baseline, and therefore it may be sensible to align observations according to their visit numbers. When the sample size allows, the analysis can be restricted to a more homogeneous subgroup of children with similar conditions at the baseline, which makes grouping by visits still more interpretable. A crude indication of the value of CD4 as a predictor of blip status is given in Figure S1(b) of supplementary material available at Biostatistics online, which plots the histograms of CD4, aggregated over patients and visits, conditional on positive and negative blip status. Despite the large overlap, there is a clear location shift between the two measures. Figure S2 of supplementary material available at Biostatistics online plots the trajectories of CD4, with plotting shape encoding blip status, for a representative sample of subjects. The long sequences of like shapes even as CD4 fluctuates wildly suggests previous blip status as a predictor of future blip status, motivating the Markov structure in model (2.1). Finally, Figure S3 of supplementary material available at Biostatistics online is a heatmap of the empirical correlation matrix among CD4 measurements on the first 40 visits, for the 44 patients with 40 or more visits. The entries tend to decrease in magnitude moving away from the diagonal. This correlation structure accords with the weak dependency implied by the autocorrelative structure in model (2.1). We apply the ROC estimation procedure described in Section 2 to the pediatric HIV data in order to assess the value of the past CD4 counts and blip statuses as a predictor of current blip status. The MLE $$\hat{\rho}_0=0.49,$$ (95% CI (0.28,0.63)), describes the strong autoregressive dependency of CD4, as suggested by the heatmap. Similarly, the strong dependence between previous and current blip status is confirmed by the population transition probabilities $$\{g^{-1}(\hat{\mu}^{(00)}), g^{-1}(\hat{\mu}^{(11)})\}=(0.93, 0.73)$$, giving the probabilities of remaining in the negative and positive, respectively, blip status states in successive visits. A 95% CI for the difference $$g^{-1}(\hat{\mu}^{(11)})-g^{-1}(\hat{\mu}^{(00)})$$ is (0.05,0.12). The difference between the CD4 standardized means conditional on negative versus positive blip status, $$(\hat{\theta}^{(1)}- \hat{\theta}^{(0)})/\hat{\sigma}_0=0.82,$$ (95% bootstrap CI (0.63,0.91)), is consistent with the location shift apparent from Figure S1(b) of supplementary material available at Biostatistics online. The resulting time-indexed ROC curves at time $$j=10$$ and their associated 95% CIs are summarized in Figure S4 of supplementary material available at Biostatistics online. The time $$j=10$$ was chosen to be consistent with our exclusion of patients with fewer than 10 visits from the analysis. As Figure S1(a) of supplementary material available at Biostatistics online shows, ROC curves at later time points are available if one is willing to exclude patients with insufficient visits. The CIs are constructed by a bootstrap with $$500$$ bootstrap samples. Despite the noisy data presented in Figures S1(b) and S2 of supplementary material available at Biostatistics online, the risk score taking into account both previous CD4 values and blip status performs reasonably well as a predictor of the current disease status. We also plot the time-asymptotic individual ROC curve $${\rm ROC}_2(u | \xi_i)$$ for selected patients. Patient no. 14 exhibits a non-smooth curve. The “elbow” arises when a patient’s previous disease status is significantly more predictive than the patient’s biomarkers of future disease status. In such cases, the ROC curve approximates the discrete behavior of a threshold predictor. As mentioned above, the validity of the individualized ROC curves depends on the correct specification of model (2.1). If we view model (2.1) as a working device used to derive a risk score for predicting the blip status, we may use $${\rm ROC}_3(u|\,j)$$ as well as $${\rm ROC}_4(u|\,j_1, j_2)$$ to summarize the predictive value of the scoring system between visits $$j_1$$ and $$j_2.$$ Due to the small sample size and infrequent occurrence of blips, we construct $${\rm ROC}_4(u|6,8)$$ and its 95% CI as shown in Figure S4 of supplementary material available at Biostatistics online. The area under the ROC curve is 0.865, with a bootstrap standard error of 0.008, also indicating good predictive value. The jagged shape of the ROC curve reflects the fact that few of the 81 patient scores lie in the overlap of the case and control distributions. As a comparison, we also plot in Figure 2 the ROC curve based on scores $$(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij})$$ by fitting the simple random effect model (1.1). As expected, the resulting ROC curve is higher than $$\hat{{\rm ROC}}_4(u|8,10)$$ and $$\hat{{\rm ROC}}_1(u|10)$$. However, using fitted values as scores to predict the blip status requires information not available at visit $$j=10$$ and is not therefore a comparable measure of the predictiveness of the biomarker at that time. Fig. 2. View largeDownload slide The ROC curve when fitted values under a random effects model (1.1) are used as scores compared with $${\rm ROC}_1(u|10)$$ and $${\rm ROC}_4(u|8,10)$$ (pediatric HIV data). Fig. 2. View largeDownload slide The ROC curve when fitted values under a random effects model (1.1) are used as scores compared with $${\rm ROC}_1(u|10)$$ and $${\rm ROC}_4(u|8,10)$$ (pediatric HIV data). 4. Simulation In this section, we investigate the finite-sample performance of the proposed method. To this end, we generate data sets mimicking the pediatric HIV data. Specifically, $$\{(X_{ij}, Y_{ij}), i=1, \ldots, n, j=1, \ldots, n_i\}$$ are simulated under model (2.1) with the population parameter $$\eta_0$$ being the maximum likelihood estimator obtained from the HIV data. We use a Monte-Carlo approximation for the underlying true ROC curves $${\rm ROC}_1(u|\,j)=E\{{\rm ROC}(u|\xi, \eta_0, j)\}$$ and $${\rm ROC}_3(u|\,j).$$ We also calculate $${\rm ROC}_2(u|\xi)=\lim_{j\rightarrow\infty} {\rm ROC}(u|\xi, \eta_0, j)$$ based on the analytic expression of $$S_\infty^{(a)}(m|\xi)$$ given in (2.7) for selected random effects $$\xi.$$ Next, we generate 500 data sets, each consisting of $$n=80$$ patients with $$n_i=40$$ visits each to match the HIV example. For each simulated data set, we estimate the expected individual-specific ROC curve $${\rm ROC}_1(u|\,j)$$ at $$j=5, 15, 25, 35$$ and its 95% point-wise CI using the parametric bootstrap method; the limiting individual-specific ROC curve $${\rm ROC}_2(u | \xi_i)$$ for selected patients; the population ROC curve $${\rm ROC}_3(u|\,j)$$ and its 95% point-wise CI using the resampling method, for $$j=5, 15, 25, 35.$$ The resulting ROC curves estimates and 95% CIs based on one generated data set are presented in Figure 3. To evaluate the performance of the proposed method, we estimate the empirical bias of the point estimators as well as the coverage level of the 95% CIs at selected $$u$$ for both $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$, $$j=5, 15, 25, 35.$$ For each estimator of interest, we also calculate the empirical average of the estimated standard errors and the empirical standard error. The detailed simulation results for $$u=10\%, 25\%, 50\%,$$ and $$75\%$$ are summarized at Table 1. The empirical biases are reasonably small in magnitude. The average estimated standard errors of all estimators are very close to the empirical standard errors and the coverage level of the 95% CIs are consistent with the nominal level allowed by the Monte-Carlo simulation error. In general, as expected, the population ROC curve tends to be higher than the individualized counterpart at the same visit. For estimating $${\rm ROC}_2(u | \xi_i),$$ we compare the AUC under ROC curve, $$\int_0^1 \hat{{\rm ROC}}_2(u|\xi, \hat{\eta}, j){\rm d}u,$$ based on data from increasing number of visits with the true limiting AUC value for selected random effects $$\xi$$s. We focus in particular on whether the estimator converges to the truth as the number of visits increases under this correct model specification. Figure S5 of supplementary material available at Biostatistics online plots the number of visits against the difference between estimated AUCs and the truth with five different realizations of $$\xi$$, showing the expected convergence. The convergence may be too slow for some purposes, requiring data from a large number of visits in order to achieve the required estimation accuracy of the individual random effect. Fig. 3. View largeDownload slide Expected individual ROC $${\rm {ROC}}_1(u|\,j)$$ (solid) and population ROC $${\rm ROC}_3(u|\,j)$$ (dotted) at visit $$j=5$$ with 95% bootstrap CI; limiting individual ROC $${\rm ROC}_2(u | \xi_i)$$ for four patients. The data were generated under model (2.1) with hyperparameters estimated from the pediatric HIV data. Fig. 3. View largeDownload slide Expected individual ROC $${\rm {ROC}}_1(u|\,j)$$ (solid) and population ROC $${\rm ROC}_3(u|\,j)$$ (dotted) at visit $$j=5$$ with 95% bootstrap CI; limiting individual ROC $${\rm ROC}_2(u | \xi_i)$$ for four patients. The data were generated under model (2.1) with hyperparameters estimated from the pediatric HIV data. Table 1. Nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (synthetic data using hyperparameters estimated from the pediatric HIV data, $$n=80$$ patients)       FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)        FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)  Table 1. Nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}_1(u|\,j)$$ and $${\rm ROC}_3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (synthetic data using hyperparameters estimated from the pediatric HIV data, $$n=80$$ patients)       FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)        FPR  0.10  0.25  0.50  0.75  Visit  ROC                          CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5  $${\rm ROC}_1$$     0.94 (0.01,0.05,0.05)  0.96 ($$-$$0.00,0.04,0.04)  0.94 ($$-$$0.01,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.94 ($$-$$0.04,0.12,0.12)  0.95 ($$-$$0.02,0.12,0.11)  0.97 ($$-$$0.00,0.10,0.10)  0.97 ($$-$$0.00,0.06,0.06)  15  $${\rm ROC}_1$$     0.93 ($$-$$0.01,0.05,0.05)  0.93 ($$-$$0.01,0.04,0.04)  0.91 ($$-$$0.02,0.03,0.03)  0.97 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 ($$-$$0.03,0.13,0.12)  0.96 ($$-$$0.01,0.12,0.11)  0.98 (0.00,0.08,0.08)  0.95 (0.01,0.05,0.04)  25  $${\rm ROC}_1$$     0.93 ($$-$$0.00,0.05,0.05)  0.94 ($$-$$0.01,0.04,0.04)  0.95 ($$-$$0.01,0.03,0.03)  0.94 ($$-$$0.01,0.02,0.02)     $${\rm ROC}_3$$     0.95 ($$-$$0.02,0.13,0.12)  0.97 ($$-$$0.00,0.11,0.11)  0.95 ($$-$$0.01,0.07,0.07)  0.92 (0.00,0.04,0.04)  35  $${\rm ROC}_1$$     0.94 ($$-$$0.01,0.05,0.05)  0.92 ($$-$$0.01,0.04,0.04)  0.96 ($$-$$0.00,0.03,0.03)  0.95 ($$-$$0.00,0.02,0.02)     $${\rm ROC}_3$$     0.96 (0.02,0.13,0.12)  0.94 (0.01,0.11,0.11)  0.94 ($$-$$0.00,0.08,0.07)  0.92 ($$-$$0.00,0.04,0.04)  In the second set of simulations, we examine the performance of the proposal under model mis-specification. Specifically, we simulate data under the random effect model (1.1), with model parameters taken to be the maximum likelihood estimators from the HIV data. As discussed above, the diagnostic value represented by the ROC curve based on $$\{(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij}), i=1, \ldots, n, j=1, \ldots, n_i\}$$ cannot be achieved in practice but may serve as a benchmark. Since model (2.1) is misspecified, we focus on the population ROC curve only. First, we plot in Figure S6 of supplementary material available at Biostatistics online the true $${\rm ROC}_3(u|\,j), j=5, 15, 25, 35,$$ and that based on $$(\hat \alpha_i+\hat \beta_i X_{ij}, Y_{ij})$$ by setting $$n=10^6$$ and $$n_i=40.$$ As expected, by comparison with the benchmark $${\rm ROC}_3(u|\,j)$$ fails to reflect the predictive value of the observed history due to model misspecification. Next, we repeat the simulation with a sample size $$n=80$$ and $$n_i=40$$ 500 times to examine the finite-sample biases of the point estimators and coverage levels of the 95% CIs for estimating the true ROC curves. Table 2 confirms our expectation that the inference procedure for $${\rm ROC}_3(u|\,j)$$ remains valid in the presence of model misspecification. Table 2. Misspecified model: nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (random slope-intercept logistic model, $$n=80$$ patients)    FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)     FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)  Table 2. Misspecified model: nominal 95% CI coverage (CVL), bias (BS), average standard error (ASE), and empirical standard error (ESE) of $${\rm ROC}3(u|\,j)$$ for false positive rates (FPR) 10%, 25%, 50%, and 75% at visits $$j=5, 15, 25,$$ and 35 (random slope-intercept logistic model, $$n=80$$ patients)    FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)     FPR  0.10  0.25  0.50  0.75  Visit                       CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  CVL (BS,ASE,ESE)  5     0.97 ($$-$$0.00,0.14,0.13)  0.97 ($$-$$0.00,0.12,0.10)  0.96 (0.00,0.09,0.08)  0.92 (0.00,0.06,0.05)  15     0.94 (0.01,0.15,0.16)  0.95 (0.05,0.11,0.10)  0.92 ($$-$$0.00,0.07,0.06)  0.92 (0.00,0.04,0.03)  25     0.94 ($$-$$0.01,0.15,0.15)  0.93 (0.03,0.11,0.11)  0.96 (0.03,0.07,0.07)  0.93 (0.02,0.04,0.03)  35     0.93 ($$-$$0.08,0.14,0.11)  0.97 ($$-$$0.03,0.11,0.09)  0.95 ($$-$$0.00,0.06,0.06)  0.95 (0.01,0.03,0.03)  5. Discussion We have proposed a set of ROC-based metrics and statistical methods for evaluating the predictive value of a biomarker in a longitudinal, multiple patient design. We emphasize three keys in extending the ROC curve from the cross-sectional to longitudinal setting: (i) the score used to construct the ROC curve should take into account all of the observed history; (ii) the score should not require unobserved history; and (iii) the predictive value of the biomarker at the individual and the population levels should be treated differently. These objectives are not met satisfactorily by the mixed effects model (1.1) available in the literature, where (i) a patient’s observations are conditionally independent given the subject effects, and in particular past observations are not taken into account in using the observation as a score; (ii) all time points are used to estimate the subject effects, so that the score estimate for a given time point is a function of the disease status it is intended to predict; and (iii) there is no distinction between patient and population ROC curves. The current approach is developed based on a simple parametric model. While the parametric assumptions are plausible in light of our data, they are chosen mainly for convenience in implementation and motivated by the HIV data. Necessary model checking analysis for other data is needed. In the proposed approach, we assume that the biomarker is measured at a regular time interval, which is true in the HIV example. However, in clinical practice, the measurement time is often irregular and it may not be possible to group the measurements into comparable 1st, 2nd $$\ldots$$ visits. Furthermore, even with regular measurements, grouping measurements by visits may not be interpretable if the baseline does not represent an interpretable “origin,” such as onset of disease or start of treatment. In such cases, the predictiveness of the biomarker needs to be evaluated with respect to the measurement history, including the actual measurement times. Doing so may require complex joint modeling of measurement time, biomarker level and disease status. Further research in this direction is warranted. Supplementary material Supplementary material is available at http://biostatistics.oxfordjournals.org. Acknowledgments This research was partially supported by grants NIH/NAIDS 2P30 AI060354-11 from the National Institutes of Health and R01HL089778-05 from the National Heart, Lung, and Blood Institute. We thank the two referees and associate editor for their constructive comments. The authors also thank the study participants and the principal investigator of the study, Dr. Elijah Paintsil, for sharing the data with us. Conflict of Interest: None declared. References Albert, P. S. ( 2012). A linear mixed model for predicting a binary event from longitudinal data under random effects misspecification. Statistics in Medicine  31, 143– 154. Google Scholar CrossRef Search ADS PubMed  Azzalini, A. ( 1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika  81, 767– 775. Google Scholar CrossRef Search ADS   Breslow, N. E. and Clayton, D. G. ( 1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association  88, 9– 25. Foulkes, A. S., Azzoni, L., Li, X., Johnson, M. A., Smith, C., Mounzer, K. and Montaner, L. J. ( 2010). Prediction based classification for longitudinal biomarkers. The Annals of Applied Statistics  4, 1476. Google Scholar CrossRef Search ADS PubMed  Heagerty, P. J. and Zheng, Y. ( 2005). Survival model predictive accuracy and ROC curves. Biometrics  61, 92– 105. Google Scholar CrossRef Search ADS PubMed  Heagerty, P. J., Lumley, T. and Pepe, M. S. ( 2000). Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics  56, 337– 344. Google Scholar CrossRef Search ADS PubMed  Janes, H., Longton, G. and Pepe, M. ( 2009). Accommodating covariates in ROC analysis. Stata Journal  9, 17– 39. Google Scholar PubMed  Laird, N. M. and Ware, J. H. ( 1982). Random-effects models for longitudinal data. Biometrics  38, 963– 974. Google Scholar CrossRef Search ADS PubMed  Liu, D. and Albert, P. S. ( 2014). Combination of longitudinal biomarkers in predicting binary events. Biostatistics  15, 706– 718. Google Scholar CrossRef Search ADS PubMed  Liu, H. and Wu, T. ( 2003). Estimating the area under a receiver operating characteristic (ROC) curve for repeated measures design. Journal of Statistical Software  8, 1– 18. Google Scholar CrossRef Search ADS   Liu, H., Li, G., Cumberland, W. and Wu, T. ( 2005). Testing statistical significance of the area under a receiving operating characteristics curve for repeated measures design with bootstrapping. Journal of Data Science  3, 257– 278. Paintsil, E., Ghebremichael, M., Romano, S. and Andiman, W. ( 2008). Absolute CD4+ T-lymphocyte count as a surrogate marker of pediatric HIV disease progression. Pediatric Infectious Disease Journal  7, 629– 635. Google Scholar CrossRef Search ADS   Paintsil, E., Martin, R., Goldenthal, A., Bhandari, S., Andiman, W. and Ghebremichael, M. ( 2016). Frequent episodes of detectable viremia in HIV treatment-experienced children is associated with a decline in CD4+ T-cells over time. Journal of AIDS & Clinical Research  7, 565– 577. Google Scholar CrossRef Search ADS PubMed  Pencina, M. J., D’Agostino, R. B. and Vasan, R. S. ( 2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine  27, 157– 172. Google Scholar CrossRef Search ADS PubMed  Pepe, M. S. ( 2003). The Statistical Evaluation of Medical Tests for Classification and Prediction . New York: Oxford University Press, USA. Robinson, G. K. ( 1991). That BLUP is a good thing: the estimation of random effects. Statistical Science  6, 15– 32. Google Scholar CrossRef Search ADS   Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M. J. and Kattan, M. W. ( 2010). Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology (Cambridge, MA)  21, 128. Google Scholar CrossRef Search ADS   Swets, J. A. and Pickett, R. M. ( 1982). Evaluation of Diagnostic Systems: Methods from Signal Detection Theory . Academic Press Series in Cognition and Perception. New York, NY: Elsevier Science & Technology Books. Uno, H., Cai, T., Tian, L. and Wei, L.-J. ( 2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association  102, 527– 537. Google Scholar CrossRef Search ADS   Uno, H., Tian, L, Cai, T., Kohane, I. S. and Wei, L.-J. ( 2013). A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Statistics in Medicine  32, 2430– 2442. Google Scholar CrossRef Search ADS PubMed  Yang, S., Santillana, M. and Kou, S. C. ( 2015). Accurate estimation of influenza epidemics using Google search data via ARGO. Proceedings of the National Academy of Sciences  112, 14473– 14478. Google Scholar CrossRef Search ADS   Zheng, Y. and Heagerty, P. J. ( 2004). Semiparametric estimation of time-dependent ROC curves for longitudinal marker data. Biostatistics  5, 615– 632. Google Scholar CrossRef Search ADS PubMed  Zhou, X.-H., McClish, D. K. and Obuchowski, N. A. ( 2009). Statistical Methods in Diagnostic Medicine , Volume 569. New York: John Wiley & Sons. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

### Journal

BiostatisticsOxford University Press

Published: Mar 28, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Search Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly ### Organize Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place. ### Access Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. DeepDyve ### Freelancer DeepDyve ### Pro Price FREE$49/month
\$360/year

Save searches from
PubMed

Create lists to

Export lists, citations