Using pseudo-observations for estimation in relative survival

Using pseudo-observations for estimation in relative survival Summary A common goal in the analysis of the long-term survival related to a specific disease is to estimate a measure that is comparable between populations with different general population mortality. When cause of death is unavailable or unreliable, as for example in cancer registry studies, relative survival methodology is used—in addition to the mortality data of the patients, we use the data on the mortality of the general population. In this article, we focus on the marginal relative survival measure that summarizes the information about the disease-specific hazard. Under additional assumptions about latent times to death of each cause, this measure equals net survival. We propose a new approach to estimation based on pseudo-observations and derive two estimators of its variance. The properties of the new approach are assessed both theoretically and with simulations, showing practically no bias and a close to nominal coverage of the confidence intervals with the precise formula for the variance. The approximate formula for the variance has sufficiently good performance in large samples where the precise formula calculation becomes computationally intensive. Using bladder cancer data and simulations, we show that the behavior of the new approach is very close to that of the Pohar Perme estimator but has the important advantage of a simpler formula that does not require numerical integration and therefore lends itself more naturally to further extensions. 1. Introduction When analyzing long-term survival of patients with a specific disease, we can expect many deaths due to causes that are not of our primary interest. The hazard of dying due to causes other than the disease can of course differ between different populations or periods in time, so a measure, that accounts for these differences is required. In order to distinguish between the causes, we assume the competing risks model (in the field of relative survival often referred to as the additive model, Estève and others, 1990) \begin{equation} \lambda_{O}(t|X) = \lambda_{P}(t|D) + \lambda_{E}(t|X), \end{equation} (1.1) where $$\lambda_{O}$$ is the overall hazard of each individual, $$\lambda_{E}$$ is the disease-specific hazard (excess hazard due to the disease) and $$\lambda_{P}$$ the other-cause hazard; $$X=(D,Z)$$, $$D$$ denotes the demographic covariates (those by which the population tables are split) and $$Z$$ denotes other variables of interest. If the cause of death in the data is not given or is unreliable, only $$\lambda_{O}$$ can be observed and additional information must be brought into the analysis to discern between two terms on the right hand of (1.1). The idea of the relative survival field is to assume that $$\lambda_{P}(t|D)$$ is known, that is that it can be obtained from the general population mortality tables (hence the subscript P in the notation) for each combination of values of demographic variables $$D$$. When the main goal is to consider the patient’s survival as an indicator of treatment success and to compare it between different populations or time periods, a measure of survival that does not depend on the population hazard is the basic goal of the estimation. Therefore, our interest does not lie in cumulative incidence functions [i.e. cumulative probability of dying with cancer which depend on both hazards ($$\lambda_E$$ and $$ \lambda_P$$ in 1.1)], but rather in a measure that summarizes the information given in the excess term $$\lambda_E$$ only. While the estimation of such a measure in the form of net survival has been considered as the goal of the analysis of population-based cancer survival for decades (Ederer and others, 1961; Hakulinen and Tenkanen, 1987; Estève and others, 1990; Brenner and Hakulinen, 2003; Brenner and others, 2004; Pokhrel and Hakulinen, 2008, 2009) with relative survival methodology being part of major statistical packages (Pohar and Stare, 2006, 2007; Clerc-Urmès and others, 2014; Coviello and others, 2015; Dickman and Coviello, 2015; Pohar Perme, 2017), it was only in 2012, that a consistent estimator of net survival was proposed (Pohar Perme and others, 2012), we shall refer to it as the PP estimator. With practical use of this estimator, very large variability and peculiarly jumpy behavior has been observed in the following years (Dickman and others, 2013; Seppä and others, 2016), which led some authors to search for alternative estimators (Lambert and others, 2015). In 2016, a doubly robust version of the PP estimator was proposed (Komukai and Hattori, 2016). As we shall show in Section 7, the large variability follows from the definition of the net survival measure and is hence not a property of a particular estimator. Nevertheless, the PP estimator does have some undesired properties, in particular, the need of approximations required to perform the numerical integration is precluding direct theoretical extensions, for example, the link between non-parametric estimation and regression modeling. In this article, we define an alternative non-parametric estimator based on pseudo-observations. The idea for the use of pseudo-observations comes from the wish to formulate the estimator directly from the definition of the measure, which allows us to distinguish between the properties of the measure and the properties of the methods for its estimation. This article is organized as follows: in Section 2 the measure of interest is defined, and in Section 3 the new approach to estimation is proposed and its properties explained. The vast majority of papers about pseudo-observations focused on their usage as response variables in regression models (Andersen and others, 2003), an exception being the (Cortese and others, 2013) paper where the idea of pseudo-observations was used to calculate the Brier score. However, to the best of our knowledge no special attention has been devoted to variance of the pseudo-observation and its estimation and this is where this article makes a step forward. We derive the variance formula in Section 4 and consider also several approaches to its estimation. Section 5 is devoted to a simulation study that compares the new approach to the existing PP method and explores the properties of the different proposals for the variance calculation. Section 6 includes a real data example, Section 7 discusses the properties of the measure, and Section 8 closes the article. 2. The measure of interest In this section, we define the measure that is the focus of this work. Our goal is a non-parametric estimator for the group as a whole, but we do not assume that hazards are homogeneous, hence the covariates are explicitly used in the notation. Since our interest lies in the excess term only, we wish to consider some sort of a weighted average of $$\lambda_E(t|X)$$. In the field of population-based cancer survival, where the relative survival methodology is most frequently used, it is usual to consider the cumulative hazard $$\Lambda_E(t|X)$$ for each individual and then report a measure defined on the “survival scale,” i.e. to use the operator $$v \mapsto \exp(-v)$$ on (1.1) to obtain the following relation \begin{equation} S_E(t|X) = \frac{S_{O}(t|X)}{S_{P}(t|D)}. \end{equation} (2.1) On individual level, we shall refer to this ratio as the relative survival. In the relative survival field, we assume that the denominator in (2.1) equals the population survival $$S_P(t|D)$$ obtained from the general population mortality tables. For a group of individuals with different values of $$X$$, we define the marginal relative survival as \begin{equation} S_E(t)=\int S_E(t|x){\rm d}H(x), \end{equation} (2.2) where $$H$$ is the distribution function of covariates $$X$$. The definition could also allow additional weights, i.e. \begin{equation} S_E^w(t) = \int w(z)S_E(t|z){\rm d}H(z), \end{equation} (2.3) that can be useful when trying to compare populations with considerably different covariate distribution (more on this in Section 7). The marginal relative survival (2.2) is of interest when the sole focus is the excess term $$\lambda_E$$—it summarizes it over time and over individuals. It has a simple interpretation: for each individual, it presents the ratio of his personal survival probability $$S_{O}$$ to the survival probability of his counterparts (defined by a vector of demographic covariates $$D$$) in the population $$S_P$$ (2.1); for a group of individuals, it can be interpreted as the average ratio. Note that (2.2) is not the same as the classical relative survival ratio (Ederer and others, 1961), which is defined as the ratio between the overall survival of a group and the population survival of a comparable group in the population, i.e. the ratio of averages and not the average ratio. The measure (2.2) is commonly interpreted also as marginal net survival, i.e. the survival in a hypothetical world where one could not die of other causes but cancer, we state the additional implicit assumptions required for this interpretation in Section 7. One of the crucial points of this work is to propose an estimator whose derivation does not depend on assumptions in a hypothetical world. 3. A new approach to estimation We assume that our sample consists of $$n$$ possibly right-censored pairs $$(T_i,\delta_i)$$, where $$T_i$$ is the observed time and $$\delta_i$$ is the censoring indicator for the $$i$$th patient. To simplify the notation, we will use weights $$w$$ (formula (2.3)) that are constantly equal to 1, i.e. we propose the estimator for (2.2). However, the derivations and arguments used therein can be directly carried over to the more general case (2.3). Following the definition of the measure, we propose the estimator \begin{equation*} \hat{S}_E(t) = \frac{1}{n} \sum_{i=1}^n \hat{S}_{Ei}(t), \end{equation*} where $$\hat{S}_{Ei}$$ is the estimator of this measure for patient $$i$$ from the sample ($$\hat{S}_{Ei}(t) = \hat{S}_E(t|X=x_i)$$). The individual values $$S_{Ei}$$ could be estimated assuming a model for the excess hazard (see e.g. Lambert and others, 2015; Quaresma and others, 2015; Seppä and others, 2016), we attempt a non-parametric approach here. Following the definition of relative survival (2.1), $$S_{Ei}$$ can be estimated using $$\frac{\hat{S}_{Oi}}{S_{Pi}}$$ [here $$\hat{S}_{Oi}(t) = \hat{S}_{O}(t|X=x_i)$$ and $$S_{Pi}(t) = S_P(t|D=d_i)$$]. The values $$S_{Pi}$$ can be obtained from the population mortality tables. To get $$\hat{S}_{Oi}(t)$$, we consider two cases: Complete data: If the data are complete, i.e. there are no censored observations ($$\delta_i=0$$ for all $$i$$), the natural estimator of $$\hat{S}_{Oi}(t)$$ is the indicator function $$I(T_i > t)$$. Measure (2.2) can be estimated as \begin{equation*} \hat{S}_E(t) = \frac{1}{n} \sum_{i=1}^n \frac{I(T_i> t)}{S_{Pi}(t)}. \end{equation*} Right-censored data: If some individuals are right-censored, the above mentioned indicator functions can no longer be used as estimators. Instead of defining the estimator on the hazard scale (as for example the PP estimator, Pohar Perme and others, 2012), we keep the same idea as in the complete case by plugging-in the pseudo-observations. The pseudo-observation for $$i$$th individual is defined as (Andersen and others, 2003) \begin{equation*} \hat{S}_{Oi}(t) = n \cdot \hat{S}_{O}(t) - (n-1)\cdot \hat{S}_{O}^{(-i)}(t), \end{equation*} where $$n$$ is the sample size and $$\hat{S}$$ and $$\hat{S}^{(-i)}$$ are the estimators of overall survival evaluated on the whole sample and on the reduced sample without the $$i$$th observation, respectively. With this definition, we obtain the individual contribution to the overall estimator. This contribution is, in spite of censoring, defined for all individuals $$i$$ at each time $$t$$ on the available follow-up interval. Therefore, we propose to estimate $$S_E$$ using \begin{equation} \hat{S}_E(t) = \frac{1}{n} \sum_{i=1}^n \frac{\hat{S}_{Oi}(t)}{S_{Pi}(t)}, \end{equation} (3.1) where $$\hat{S}_{Oi}$$ is the $$i$$th person pseudo-observation and the Kaplan–Meier estimator is used to estimate the overall survival functions $$\hat{S}_{O}(t)$$ and $$\hat{S}_{O}^{(-i)}(t)$$. 3.1. Further comments The idea of pseudo-observations is that whenever we have an estimator that estimates some quantity on the whole sample (i.e. a group-level estimator, the Kaplan–Meier estimator in our case) we can use pseudo-observations formed with that estimator to estimate the same quantity on the individual level. Since the average of pseudo-observations calculated with the Kaplan–Meier equals the overall Kaplan–Meier estimate (except maybe at the last two observed times, see Stute and Wang, 1994, for details), pseudo-observations can be interpreted as an individual contribution to the overall estimate. Pseudo-observations inherit some of the theoretical properties of the group-level estimator that is used for their calculation. For example, if one uses an unbiased estimator it follows from the linearity of expected value that the pseudo-observations are also unbiased. Since the Kaplan–Meier estimator is asymptotically unbiased, the same is true for the pseudo-observations (see Graw and others, 2009, for details). 4. Derivation and estimation of variance To estimate the variance, we first notice that by using the definition of pseudo-observations, we can rewrite (3.1) to get \begin{equation*} \hat{S}_E(t) = \hat{S}_{O}(t) \cdot \sum_{i=1}^n \frac{1}{S_{Pi}(t)} - \frac{n-1}{n} \cdot \sum_{i=1}^n \frac{\hat{S}_{O}^{(-i)}(t)}{S_{Pi}(t)}. \end{equation*} The variance then equals (see Appendix A of the supplementary material available at Biostatistics online for details) \begin{eqnarray} \mathop{\hbox{Var}}\big(\hat{S}_E(t)\big) &=&\bigg( \sum_{i=1}^n \frac{1}{S_{Pi}(t)} \bigg)^2 \cdot \mathop{\hbox{Var}} \big( \hat{S}_{O}(t)\big) \nonumber \\ &&-\, 2 \cdot \frac{n-1}{n} \cdot \sum_{i=1}^n \frac{1}{S_{Pi}(t)} \cdot \bigg\{ \sum_{j=1}^n \frac{1}{S_{Pj}(t)} \cdot \mathop{\hbox{Cov}}\big(\hat{S}_{O}(t), \hat{S}_{O}^{(-j)}(t) \big) \bigg\} \\ &&+\, \bigg( \frac{n-1}{n}\bigg)^2 \cdot \sum_{i=1}^n \sum_{j=1}^n \frac{1}{S_{Pi}(t) S_{Pj}(t)} \cdot \mathop{\hbox{Cov}} \big( \hat{S}_{O}^{(-i)}(t), \hat{S}_{O}^{(-j)}(t) \big).\nonumber \end{eqnarray} (4.1) To calculate the variance one has to calculate $$n \choose 2$$ different covariances $$\mathop{\hbox{Cov}} \big( \hat{S}_{O}^{(-i)}(t), \hat{S}_{O}^{(-j)}(t) \big)$$ and $$n$$ variances $$\mathop{\hbox{Var}} \big( \hat{S}_{O}^{(-i)}(t) \big)$$. Therefore, this approach is very computationally intensive, and a simpler approximation of this formulae might be useful in practice. To obtain one, we can use the result that the $$i$$th pseudo-observation $$\hat{S}_{Oi}$$ asymptotically depends only on the $$i$$th observation (see Lemma 1 from Graw and others, 2009) and is thus asymptotically independent of other pseudo observations. If there are no censored observations in the data, this is also trivially true on finite samples. With this assumption we can express the variance as (see Appendix A of the supplementary material available at Biostatistics online for details) \begin{eqnarray} \mathop{\hbox{Var}}\big(\hat{S}_E(t)\big) & \approx& \mathop{\hbox{Var}}\big(\hat{S}_{O}(t)\big) \cdot \bigg( \sum_{i=1}^n \frac{1}{S_{Pi}^2(t)} \bigg) \nonumber \\ && -\, 2\cdot \frac{n-1}{n} \cdot \sum_{i=1}^n \frac{1}{S_{Pi}^2(t)} \cdot \mathop{\hbox{Cov}}\big( \hat{S}_{O}(t), \hat{S}_{O}^{(-i)}(t) \big)\\ &&+\, \bigg( \frac{n-1}{n} \bigg)^2 \sum_{i=1}^n \frac{1}{S_{Pi}^2(t)} \cdot \mathop{\hbox{Var}}\big( \hat{S}_O^{(-i)}(t) \big) \nonumber. \end{eqnarray} (4.2) Using this approach, we have to calculate only $$n$$ variances and, if this approximation works reasonably well, this implies a huge improvement of the speed of calculations and reduction of the amount of computer memory needed for the calculations. To estimate the variance following the formulae (4.1) or (4.2), we thus have to estimate the quantities $$\mathop{\hbox{Var}}\big( \hat{S}_O^{(-i)}(t)\big)$$, $$\mathop{\hbox{Cov}}\big( \hat{S}_O(t),\hat{S}_O^{(-i)}(t) \big),$$ and $$\mathop{\hbox{Cov}} \big( \hat{S}_O^{(-i)}(t),\hat{S}_O^{(-j)}(t) \big)$$. To this end, we shall build upon the ideas for the variance estimation of the Kaplan–Meier estimator. Let $$N_i(t) = I(T_i \le t, \delta_i=1)$$, $$Y_i(t) = I(T_i \ge t)$$, $$N(t) = \sum_{i=1}^n N_i(t)$$, $$N^{(-j)}(t) = \sum_{i\neq j} N_i(t)$$ and similarly for $$Y$$, $$Y^{(-j)}$$, and $$Y^{(-j,-k)}$$. Let $$S_O^*(t) = \exp\big( -\Lambda_O^*(t) \big)$$, where $$\Lambda_O^*(t) = \int_0^t I\big( Y(u) > 0 \big)\lambda_O(u){\rm d}u$$ and similarly for $$S_O^{*(-i)}(t)$$. The terms $$\mathop{\hbox{Var}}\big( \hat{S}_O(t) \big)$$ and $$\mathop{\hbox{Var}}\big( \hat{S}_O^{(-i)}(t) \big)$$ are simply the variances of the Kaplan–Meier estimator calculated on the whole and on the reduced samples, respectively. To estimate these two terms, the standard counting processes-based estimators can be used (Andersen and others, 1993). The remaining covariances can be approximated using the facts that $$\mathop{\hbox{Cov}} \Big( \frac{\hat{S}_O}{S_O^*} - 1, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1 \Big)(t) = E \Big\langle \frac{\hat{S}_O}{S_O^*} - 1, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1 \Big\rangle(t)$$ and $$\mathop{\hbox{Cov}}\Big( \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} - 1 \Big)(t) = E\Big\langle \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} - 1 \Big\rangle(t)$$. These predictable covariation processes can be calculated using similar techniques as in Andersen and others (1993) (see Appendix B of the supplementary material available at Biostatistics online for details). We obtain the following results $$\matrix{ {\langle {{{{\hat S}_O}} \over {S_O^*}} - 1,{{\hat S_O^{( - i)}} \over {S_O^{*( - i)}}} - 1\rangle (t)} \hfill \cr { = \int_0^t {{{{{\hat S}_O}(s - )} \over {S_O^*(s)}}} \cdot {{I(Y(s) > 0)} \over {Y(s)}} \cdot {{\hat S_O^{( - i)}(s - )} \over {S_O^{*( - i)}(s)}} \cdot {{I({Y^{( - i)}}(s) > 0)} \over {{Y^{( - i)}}(s)}} \cdot (1 - \Delta {\Lambda _O}(s)){Y^{( - i)}}(s){\rm{d}}{\Lambda _O}(s)} \hfill \cr } $$ (4.3) and $$\matrix{ {\langle {{\hat S_O^{( - i)}} \over {S_O^{*( - i)}}} - 1,{{\hat S_O^{( - j)}} \over {S_O^{*( - j)}}} - 1\rangle (t)} \hfill \cr { = \int_0^t {{{\hat S_O^{( - i)}(s - )} \over {S_O^{*( - i)}(s)}}} \cdot {{I({Y^{( - i)}}(s) > 0)} \over {{Y^{( - i)}}(s)}} \cdot {{\hat S_O^{( - j)}(s - )} \over {S_O^{*( - j)}(s)}} \cdot {{I({Y^{( - j)}}(s) > 0)} \over {{Y^{( - j)}}(s)}} \cdot (1 - \Delta {\Lambda _O}(s)){Y^{( - i, - j)}}(s){\rm{d}}{\Lambda _O}(s)}. \hfill \cr } $$ (4.4) Replacing $$S_O^*(s)$$ by $$\hat{S}_O(s)$$, $$S_O^{*(-i)}(s)$$ by $$\hat{S}_O^{(-i)}(s),$$ and $$\Lambda_O(s)$$ by $$\hat{\Lambda}_O(s) $$ in the expressions (4.3) and (4.4), we arrive at the following estimators of the covariances \begin{eqnarray*} \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O}{S^*_O}, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} \bigg)(t) &=& \int_0^t \frac{I\big(Y^{(-i)}(s) > 0 \big)}{Y^{(-i)}(s) - \Delta N^{(-i)}(s)} \cdot \frac{Y^{(-i)}(s)}{Y^2(s)}{\rm d}N(s)\\ \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}}, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} \bigg)(t)&=& \int_0^t \frac{I\big(Y^{(-i)}(s) > 0 \big)}{Y^{(-i)}(s) - \Delta N^{(-i)}(s)} \cdot \frac{I\big(Y^{(-j)}(s) > 0 \big)}{Y^{(-j)}(s) - \Delta N^{(-j)}(s)} \cdot \big( Y(s) - \Delta N(s) \big)\nonumber\\ && \cdot \frac{Y^{(-i,-j)}(s)}{Y^2(s)}{\rm d}N(s) \nonumber \end{eqnarray*} Using these two estimators, we can estimate the covariances required in formulas (4.1) and (4.2): \begin{eqnarray*} \widehat{\mathop{\hbox{Cov}}}\big( \hat{S}_O, \hat{S}_O^{(-i)} \big)(t) &=& \hat{S}_O(t) \cdot \hat{S}_O^{(-i)}(t)\cdot \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O}{S_O^*}, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} \bigg)(t)\\ \widehat{\mathop{\hbox{Cov}}}\big( \hat{S}_O^{(-i)}, \hat{S}_O^{(-j)}\big)(t) &=& \hat{S}_O^{(-i)}(t) \cdot \hat{S}_O^{(-j)}(t) \cdot \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}}, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} \bigg)(t). \end{eqnarray*} 5. Simulation study This section has two goals. The first one is to illustrate that the new approach is practically unbiased. The second goal is to compare the two formulae for the estimation of variance: we wish to check the precise formula (4.1) in practice and show that it can be replaced by the approximate formula (4.2) as the sample size gets larger and the formula (4.1) becomes overly computationally intensive. 5.1. Simulation design All the simulations have some common features: The demographic variables $$D$$ used in the simulations are age (uniform), sex (binary), and year (uniform), they are simulated independently. The times to death from other causes ($$T_P$$) are simulated using Slovene population mortality tables. The times to death from cancer ($$T_E$$) are simulated with $$\lambda_E=\lambda_{E0}\exp(\beta X)$$. The basic scenario uses a fixed $$\lambda_{E0}$$ and $$\beta=0$$ for all covariates $$X$$. Times to death from either cause are simulated independently, their minimum is taken to be the observed time ($$T=\min(T_P,T_E)$$). Censoring times are generated from a uniform distribution, cca 10 % of individuals are censored by the end of follow-up time (5 years). Each simulation scenario consists of 5000 repetitions, all results (bias and coverage in different scenarios) are assessed at 5 years. The main goal of the simulations is to check that the derived theory works as expected in practice, we therefore change the parameters of the simulations to consider the situations which have proven as important in the previous literature on relative survival (e.g. Pohar Perme and others, 2009; Danieli and others, 2012; Pavlič and Pohar Perme, 2017), but keep others simple so that the observed properties can be tracked to their real cause (e.g. a constant baseline excess hazard is of course completely unrealistic, but also irrelevant as we are dealing with non-parametric estimation that makes no assumptions about the baseline). The key properties we are changing in simulation scenarios are the sample size and the proportion of patients with events. Since our measure summarizes the information on $$\lambda_E$$, the number of patients with $$T_E<5$$ is the most important parameter. On the other hand, the patients who die of other causes serve as a nuisance parameter. We illustrate this by two scenarios which have approximately equal number of patients who die of cancer ($$\sim$$40%), but a very different other cause hazard: we change the age distribution of the patients and the size of the baseline hazard to get 52% and 77% of deaths due to excess hazard, respectively. The upper limit for age never exceeds 85 and thus ensures that the probability of the patients to be still alive at the end of follow-up time (5 years) is high enough to make our interest in $$\lambda_E$$ still sensible. Based on the experience in the literature, we add a third scenario, which has proven as problematic with other studies in relative survival (e.g. Pohar Perme and others, 2009, Supplementary material) has shown both a huge variance and biased regression coefficient estimates in such scenarios): we consider a very low proportion of deaths due to excess hazard (36 %). Coupled with a low overall number of events (22%), this is also an example where the sample size 500 contains very little information on $$\lambda_E$$ (only 8% of patients die due to $$\lambda_E$$). We also changed other parameters, for example the size of covariate effect $$\beta$$ in the excess hazard. However, while these scenarios were important with other issues of relative survival (see e.g. Danieli and others, 2012), they proved irrelevant in our case, the results are hence not reported or are moved to the Appendix D of the supplementary material available at Biostatistics online, where also the details on both the basic scenarios (included in the article) and other scenarios considered can be found. 5.2. Simulation results The first goal is to assess the magnitude of bias of the new approach on small samples, the results are presented in Table 1. We can see that the magnitude of bias of the new approach is of the order of $$10^{-4}$$ (the standard error of the estimate is of the order of $$10^{-2}$$ in this case), the bias is therefore practically negligible. Furthermore, we note that new approach and the PP approach are almost numerically equal in all scenarios considered, the differences are negligible even in the rather extreme scenario (22%/36%), where the weights of the PP estimator are known to be very large. Due to this equality no further comparisons with other methods need to be done, since in terms of bias, difference between the existing estimators and variability the results will be the same as in the 7s comparing the PP estimator with other approaches (Danieli and others, 2012; Roche and others, 2012; Lambert and others, 2015; Seppä and others, 2015, 2016). Table 1. Bias of different estimators on samples of increasing sizes (assessed at 5 years) Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Data generated from a model with no effect of covariates; proportion of deaths = 22%; proportion of deaths due to excess hazard among all deaths = 36%. Table 1. Bias of different estimators on samples of increasing sizes (assessed at 5 years) Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Data generated from a model with no effect of covariates; proportion of deaths = 22%; proportion of deaths due to excess hazard among all deaths = 36%. In order to compare the two formulae for estimation of variance (4.1) and (4.2), we consider the coverage of the confidence intervals. Since the new approach is practically unbiased, the coverage entirely reflects the precision of the two estimators of variance. The simulation results for three different scenarios with different proportions of deaths and different proportions of deaths due to excess hazard among all deaths for three different sample sizes are given in Table 2. They indicate that the precise formula gives coverages that are close to the nominal value of 0.95 in all three scenarios with the most extreme scenario (the smallest proportion of events) tending to have a slightly too low coverage. This is in line with the previous work, which indicates disproportionately large variability with small proportions of excess hazard events (e.g. Figure 1 in the supplementary material of Pohar Perme and others, 2009), in our case, this large variability seems to be underestimated. The approximate formula gives coverages that are constantly below the nominal level (see Table 2), but the discrepancies are minimal in the two realistic scenarios with the higher proportion of deaths. As a standard to compare to, we also report the coverage of the PP estimator, which is close to the nominal value in all cases. Table 2. Coverages of CI, data generated from a model with no effect of covariates (assessed at 5 years) $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 d, number of deaths from any cause; $$d_E$$, number of deaths due to excess hazard; P, the new approach based on pseudo-observations; PP, Pohar Perme estimator; pr, precise formula for the variance of P; apr, approximate formula for the variance of P. Table 2. Coverages of CI, data generated from a model with no effect of covariates (assessed at 5 years) $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 d, number of deaths from any cause; $$d_E$$, number of deaths due to excess hazard; P, the new approach based on pseudo-observations; PP, Pohar Perme estimator; pr, precise formula for the variance of P; apr, approximate formula for the variance of P. Comparing all the results in Table 2, it is clear that the problems do not lie in small samples, instead, the proportion of excess hazard deaths among all deaths is crucial for the properties of our estimators. In all of the above, the $$\lambda_E$$ was equal for all individuals and hence independent of covariates $$X$$. We have also generated data from the models where covariates had effect, but the results were similar and are therefore not reported (the covariate effect changes the proportion of deaths and the proportion of deaths due to cancer and are similar to scenarios mentioned above—for details see Appendix D of the supplementary material available at Biostatistics online). 6. Example We now explore the properties of the estimator on real data. Consider the data of all Slovene female colon cancer patients—538 female patients diagnosed between January 1, 1994 and December 30, 2000, data are taken from Slovene cancer registry (Zadnik and others, 2012) and available in Pohar Perme (2017). Their age span is between 26 and 85. Figure 1 (left side) presents their 20-year survival curve. This sample was chosen on purpose—as an example of a truly bizarre behavior we can get with routine use of the estimator, a point raised by Dickman and others (2013) and Lambert and others (2015). To understand the reasons for this behavior, note first, that despite the weird behavior both the PP estimator and our new proposal actually give the same estimates. The same would be true also if we had no censoring, i.e. in the case, where the new approach follows directly from the definition of the measure. It is thus clear that this behavior is not a property of the PP estimator, but rather the property of the measure. The weights ($$1 / S_P$$) that cause the variable behavior of the PP estimator are present already in the definition in formula (2.1) and hence affect any estimator of this measure. The oldest individuals in our sample were aged 85 and their probability of still being alive after 20 years ($$S_P(20)$$) is of order $$10^{-6}$$, hence we cannot expect to have any information on their $$\lambda_E$$ in the data. Before dropping from this sample, these individuals can have very large weights—in the case of our sample, there were 267 patients still at risk at 17 years, with two patients having weights of size 50 (the median weight at that time is 1.3), these two patients almost entirely control the behavior of the curve at that time. Figure 1 View largeDownload slide The PP and the new approach on the sample of Slovene female colon cancer patients. Left: 1538 patients, age at diagnosis between 26 and 85. Right: 815 patients, age at diagnosis between 26 and 70. Figure 1 View largeDownload slide The PP and the new approach on the sample of Slovene female colon cancer patients. Left: 1538 patients, age at diagnosis between 26 and 85. Right: 815 patients, age at diagnosis between 26 and 70. Some authors propose alternative estimators to deal with such data (age-stratified Ederer II, model-based estimator, etc., see e.g. Dickman and others, 2013), i.e. introducing assumptions to deal with the problem of little or no information available in the data. As the above example clearly illustrates, the large variability is not a problem of a specific estimator (PP) but rather a problem of the measure. Instead of reasoning why the assumptions made (that are almost always untestable) are acceptable, we focus on understanding this property of the measure and the limitations it brings with. The measure is a function of $$\lambda_E$$ of all patients. With the probability of still being alive in the population $$S_p$$ getting very low, we cannot expect to get any information on $$\lambda_E$$ from the data. One way to move forward is to set some assumptions, but, apart from being untestable, any assumptions about the value of $$\lambda_E$$ for very old patients are also not at all of practical interest. We therefore propose one should limit to a cohort for which the value of $$\lambda_E$$ is of interest throughout the follow-up interval. In our example—if the interest lies in 20 year survival, it seems reasonable to limit ourselves to patients under 70 at diagnosis, thus ensuring that their probability of still being alive in the population after 20 years is large enough to allow estimation. The graph of the resulting 815 women is presented on the right side of Figure 1. Again, we see that the values of the PP and the new estimator are practically identical, the same is true for the confidence intervals. When limiting the analysis to follow-up for which the measure is meaningful, both estimators (PP and the newly proposed approach) shall have reasonable properties in terms of variance. 7. Discussion on the measure This article is focused on the estimation of the measure defined in (2.2). This is of course not the only measure that could be of interest in the relative survival setting (see e.g. Ederer and others, 1961; Cronin and Feuer, 2000; Sasieni and Brentnall, 2016). An important drawback lies in the way it summarizes the excess term information—by definition, the information on $$\lambda_E$$ should be available throughout the follow-up time for the entire cohort. Section 6 presented an example where this condition was clearly not met. Sasieni and Brentnall (2016) instead propose a measure that can again be seen as a weighted average of $$\lambda_E$$, but define weights in a way that optimizes the use of the information available, i.e. consider the older patients for a period shorter than the whole follow-up time. The interpretation of the measure obtained is rather complex and the authors themselves propose that, for a non-specialist audience, it should be described merely as a standardized relative survival index. They claim the same is true for net survival (Sasieni and Brentnall, 2016). We believe that particularly the average ratio interpretation of $$S_E$$ is rather easy to grasp but agree that net survival is often interpreted overly loosely without paying the proper respect to all the assumptions it warrants and the comparability between populations is taken for granted. In this section, we state these assumptions and thus warn against oversimplifications. Comparability between populations Net survival is often referred to as the measure “that takes into account the population mortality differences” or the measure “that does not depend on the population mortality hazard,” its comparability between groups with different population mortality is hence a key issue. In terms of comparability, the key issue is the assumption that $$\lambda_P$$ is given by the population mortality tables. This is often not true, since $$D$$ does not include all the covariates that affect both $$\lambda_P$$ and $$\lambda_E$$, for example, population tables do not include smoking even though this would be crucial for lung cancer data. With this assumption violated, populations differing with respect to these, missing covariates (e.g. smoking), become incomparable. To see this, say we wish to compare $$S_E$$ between Slovene and Chinese male lung cancer patients. A vast majority of these patients are smokers and it is clear that the $$\lambda_P$$ calculated on both smokers and non-smokers in the population is underestimating their other-cause mortality, hence their $$\lambda_E$$ shall be overestimated. However, since the prevalence of smoking in the two countries is very different ($$\sim$$20% in Slovenia versus 50% in China), the amount of this bias shall be very different in the two countries, thus making the estimated $$\widehat{S}_E$$ differ considerably even if the true $$\lambda_E$$ is in fact the same. Several papers have been published investigating this topic (see e.g. Grafféo and others, 2012). Equation (1.1) is commonly stated as the basic starting model when defining net survival, even though it is not always clearly defined in the literature. Note that the overall hazard $$\lambda_O$$ is the conditional probability of dying: $$\lambda_O(t|X)=\lim\limits_{\Delta t\rightarrow 0}\frac{P(t \leq T \leq t+ \Delta t |X,T \geq t)}{\Delta t} $$ which implies \begin{eqnarray} \lambda_E(t|X)&=&\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_E \leq t+ \Delta t|X, T \geq t)}{\Delta t} \\ \lambda_P(t|D)&=&\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_P \leq t+ \Delta t|D, T \geq t)}{\Delta t} \notag. \end{eqnarray} (7.1) The common assumption of the relative survival field is that $$ \lambda_P(t|D)=\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_P \leq t+ \Delta t|D, T_P \geq t)}{\Delta t} $$ and can be obtained from the survival probabilities $$S_P$$ reported in the population mortality tables. Using $$\lambda_E$$ as in (7.1) and $$\lambda_P$$ from the population tables, we define the relative survival in (2.2). The net survival measure definition assumes also that $$S_E$$ is a survival function, i.e. \begin{eqnarray} \lambda_E(t|X)=\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_E \leq t+ \Delta t|X, T_E \geq t)}{\Delta t}. \end{eqnarray} (7.2) A sufficient but not necessary assumption would be that $$T_P$$ and $$T_E$$ are independent conditional on $$X$$. The assumption (7.2) is necessary if $$S_E$$ is to be independent of $$T_P$$. Note that the definition of net survival has thus also implicitly assumed the existence of latent times $$T_E$$ and $$T_P$$—the time of dying due to the disease and due to other causes, respectively (the observed time $$T=\min(T_E,T_P)$$). The marginal relative survival of course depends on the distribution of covariates $$X$$, a difference in $$S_E$$ between two populations might therefore stem from a different covariate distribution $$H$$. To take this into account, the formula (2.3) can be used as it allows weights $$w$$. The idea of the weights is to report results with a pregiven standard distribution of covariates (see e.g. Corazziari and others, 2004 for age-standardization). Such a standardization is of course only crude, and setting weights that ensure truly comparable results may not be easy. To understand the observed differences in $$\widehat{S}_E$$, one shall usually rather resort to regression models. Interpretation When the population mortality tables do not adequately describe $$\lambda_P$$, the interpretation of net survival fails; however, this assumption is not needed if interpreting $$S_E$$ as the average ratio—the observed survival of an individual is compared to his counterparts in the population, where the counterparts are defined by the covariates $$D$$ that are available (the patient does not need to be equal to the defined population in all aspects but the fact he has been diagnosed with the disease). The comparability between populations is of course lost regardless of the interpretation. The assumption (7.2) cannot be verified from the observed data (Kalbfleisch and Prentice, 2002, p. 259–261), in fact verifying it could only make sense if the $$T_E$$ and $$T_P$$ did not indicate the times to death, but rather the times to some other events which do not preclude one another. Without this assumption, net survival is not identifiable with real world data (see e.g. Tsiatis, 1975; Andersen and Keiding, 2012), as the next example shows. Say the data follow the additive model in (1.1) with $$T_E$$ and $$T_P$$ conditionally independent, the observed times $$T_O$$ are calculated as the minimum of the two latent times. We shall refer to this data set as data set A. Using the times $$T_E$$ only, we estimate survival in the hypothetical world using Kaplan–Meier. Using the observed time $$T_O$$ we add the curves estimated by PP estimator and the new approach to the plot (see Figure 2). There are various scenarios of dependency between $$T_P$$ and $$T_E$$ which would result in the same observed times. One option is to change the time of the other event after the first one has happened. Two extreme cases in terms of the marginal distribution of $$T_E$$ can be imagined: after an individual experienced the “other cause” event ($$T_O=T_P$$, data set B), the individuals “dies of cancer” immediately ($$T_E=T_P+\Delta t$$) or, the individual becomes immortal with respect to cancer ($$T_E=\infty$$, data set C). Using times $$T_E$$ only, we again estimate survival in the hypothetical world. The estimators on the three data sets (A, B, C) in Figure 2 are wildly different and the value of net survival between the two extrema (B and C) is unidentifiable. To summarize: both PP and the new estimator estimate the same quantity, i.e. marginal relative survival. As this example shows, this may be very far from net survival if (7.2) is not true. In practice, we never know if (7.2) is fulfilled, which makes the interpretation of the estimate as net survival overly optimistic. We believe the historical interest in net survival does not stem from the interest in what would happen if we could remove other causes, rather, it is driven by the need of a measure independent of the population mortality. To avoid the long-standing debate on the sensibility of net survival and the need for untestable assumptions, the more cautious interpretation of the measure in terms of the average ratio can be used. Without assuming (7.2) (and conditional on $$\lambda_P$$ being correct), the extent of comparability has been summarized as the criterium A2 in Sasieni’s work (Sasieni and Brentnall, 2016): the criterium demands that the measure depends on the population mortality only through $$\lambda_E$$ (it does not directly depend on the size of $$\lambda_P$$ as is the case with relative survival ratio or cumulative incidence function). Whether $$\lambda_E$$ is defined as in (7.1) or (7.2) is irrelevant for this criterium. Figure 2 View largeDownload slide The PP and the new approach versus the three true net survivals in the hypothetical world data sets A, B, and C. Figure 2 View largeDownload slide The PP and the new approach versus the three true net survivals in the hypothetical world data sets A, B, and C. All things being said, the marginal relative survival shall in practice never be completely comparable, but it is (possibly with some additional weighting to achieve equal covariate distribution) a reasonable and simple survival measure to report when focusing on $$\lambda_E$$. While the data are never perfect and the above listed assumptions clearly indicate what is relevant when judging them, it is important that the estimator at least avoids depending on $$\lambda_P$$ when this is not necessary (a counter-example is the Ederer II estimator which always depends on $$\lambda_P$$, unless $$\lambda_E$$ is equal for all the patients regardless of $$X$$). 8. Conclusions A measure that compares survival of cancer patients between countries accounting for their differences in the general population mortality has been one of the main aims of the analyzes of cancer registry data in the last 50 years. While most of the debate in the past was focusing on choosing the right estimator, more theoretical work has been done in the recent years focusing on the possible measures and their properties (Pohar Perme and others, 2012; Sasieni and Brentnall, 2016). In this article, we have focused on the marginal relative survival measure that summarizes the information about the excess term $$\lambda_E(t)$$. Our new approach to estimation of marginal relative survival is based on pseudo-observations, that are calculated on the whole sample. They represent the individual contribution to the overall estimate and are defined at all times regardless of censoring. With their aid, we construct the new estimator directly from the definition, which enables us to distinguish between the properties of the measure and the properties of its estimators. In particular, we show that the weights ($$1/S_P$$) are intrinsic to this measure and explain the limitations of focusing on the excess term $$\lambda_E$$ with real world data. Both net survival and marginal relative survival are measures defined as an average over all individuals in the sample, we should thus only attempt their estimation in group of individuals for whom the probability of their counterparts still being alive in the population is high enough—it is impossible to get a reasonable estimator of this measure for groups containing very old individuals for whom $$S_p(t)$$ is practically equal to 0. A common critique with any approach based on pseudo-observations is that it requires more stringent conditions in terms of censoring compared to alternative approaches. However, this is only true and needs to be dealt with in regression models (Binder and others, 2014). In our case, a non-parametric measure is of interest, and the pseudo-observations based approach is thus no exception—any non-parametric method requires censoring independent of covariates. We have shown that the new approach is practically unbiased even on small samples with few events. We have derived the variance formula for this estimator and proposed two options for its estimation. Their performance has been studied with simulations, where important discrepancies were found only with smaller data sizes, for which the precise formula can be calculated in a sufficiently fast manner. The new approach to estimation gives an estimate which is very close to that of the PP estimator, from the practical view, the results are almost equal, which we have also shown in theory. The limit of both estimators is hence equal to the marginal relative survival. Claiming that it is also equal to net survival requires strong additional untestable assumptions (7.1 equaling 7.2). The main importance of this work is thus not in replacing the existing estimator, but rather in providing an alternative approach that gives important insight into the measure and that may be easier to implement in software. The approach does not depend on the definition of latent failure times and on the hypothetical world assumptions. With its simpler form, we have removed the need for numerical integration, which is a disadvantage of the PP estimator blocking further extensions. In particular, a log-rank type test for comparison of net survival curves has recently been developed (following the logic of the PP estimator, see Grafféo and others, 2016 for details) but no theoretical link to semi-parametric models can be established because of the numerical approximations (Pavlič and Pohar Perme, 2017). On the other hand, the usage of pseudo-observations in regression models is rather straightforward—having the same general theory in both non-parametric estimation and regression models could improve the understanding of the properties of the methods. 9. Software Due to numeric integration, PP needs to be evaluated in short enough intervals (daily intervals are the default in the relsurv package, Pohar Perme, 2017) regardless of whether a single point estimate or a whole survival curve is required. With our code the new approach is considerably faster (10% of the time at $$n=1000$$, 20% of the time at $$n=10\,000$$) and remains faster even when monthly estimates are required (50% to 70%). Precise variance formula calculation is always very intensive but can still be done in a reasonable time when it is in fact needed (see Table 2) whereas its memory requirements prevent its usage in very large samples. The approximate formula does not increase the computational time of the estimator by more than 10%. The GitHub repository https://github.com/Klemen-Pavlic/Marginal-relative-survival contains the implementation of the new estimation approach and the code for the example presented in Section 6. Supplementary material Supplementary material is available online at http://biostatistics.oxfordjournals.org. Acknowledgements The authors thank the CENSUR working survival group (ANR Grant Number ANR-12-BSV1-0028; coordinator Prof. Roch Giorgi) for helpful discussions. Conflict of Interest: None declared. Funding The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P3-0154). The authors acknowledge the project Methods of estimation of key indicators in population cancer survival J3-7272 was financially supported by the Slovenian Research Agency. Furthermore, Klemen Pavlič is a young researcher supported by the Slovenian Research Agency. References Andersen P. K. and Keiding N. ( 2012 ). Interpretability and importance of functionals in competing risks and multistate models. Statistics in Medicine 31 , 1074 – 1088 . Google Scholar CrossRef Search ADS PubMed Andersen P. K. , Borgan O. , Gill R. D. and Keiding N. ( 1993 ). Statistical Models Based on Counting Processes . New York : Springer. Google Scholar CrossRef Search ADS Andersen P. K. , Klein J. P. and Rosthøj S. ( 2003 ). Generalized linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika 90 , 15 – 27 . Google Scholar CrossRef Search ADS Binder N. , Gerds T. A. and Andersen P. K. ( 2014 ). Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Analysis 20 , 303 – 315 . Google Scholar CrossRef Search ADS PubMed Brenner H. and Hakulinen T. ( 2003 ). On crude and age-adjusted relative survival rates. Journal of Clinical Epidemiology 56 , 1185 – 1191 . Google Scholar CrossRef Search ADS PubMed Brenner H. , Arndt V. , Gefeller O. and Hakulinen T. ( 2004 ). An alternative approach to age-adjustment of cancer survival rates. European Journal of Cancer 40 , 2317 – 2322 . Google Scholar CrossRef Search ADS PubMed Clerc-Urmès I. , Grzebyk M. and Hédelin G. ( 2014 ). Net survival estimation with stns. Stata Journal 14 , 87 – 102 . Corazziari I. , Quinn M. and Capocaccia R. ( 2004 ). Standard cancer patient population for age standardizing survival ratios. European Journal of Cancer 40 , 2307 – 2316 . Google Scholar CrossRef Search ADS PubMed Cortese G. , Gerds T. A. and Andersen P. K. ( 2013 ). Comparing predictions among competing risks models with time-dependent covariates. Statistics in Medicine 32 , 3089 – 3101 . Google Scholar CrossRef Search ADS PubMed Coviello E. , Dickman P. W. , Seppä K. and Pokhrel A. ( 2015 ). Estimating net survival using a life-table approach. Stata Journal 15 , 173 – 185 . Cronin K. A. and Feuer E. J. ( 2000 ). Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in Medicine 19 , 1729 – 1740 . Google Scholar CrossRef Search ADS PubMed Danieli C. , Remontet L. , Bossard N. , Roche L. and Belot A. ( 2012 ). Estimating net survival: the importance of allowing for informative censoring. Statistics in Medicine 31 , 775 – 786 . Google Scholar CrossRef Search ADS PubMed Dickman P. W. and Coviello E. ( 2015 ). Estimating and modeling relative survival. Stata Journal 15 , 186 – 215 . Dickman P. W. , Lambert P. C. , Coviello E. and Rutherford M. J. ( 2013 ). Estimating net survival in population-based cancer studies, letter to editor. International Journal of Cancer 133 , 519 – 521 . Google Scholar CrossRef Search ADS PubMed Ederer F. , Axtell L. M. and Cutler S. J. ( 1961 ). The relative survival rate: a statistical methodology. National Cancer Institute Monograph 6 , 101 – 121 . Google Scholar PubMed Estève J. , Benhamou E. , Croasdale M. and Raymond M. ( 1990 ). Relative survival and the estimation of net survival: elements for further discussion. Statistics in Medicine 9 , 529 – 538 . Google Scholar CrossRef Search ADS PubMed Grafféo N. , Castell F. , Belot A. and Giorgi R. ( 2016 ). A log-rank-type test to compare net survival distributions. Biometrics 72 , 760 – 769 . Google Scholar CrossRef Search ADS PubMed Grafféo N. , Jooste V. and Giorgi R. ( 2012 ). The impact of additional life-table variables on excess mortality estimates. Statistics in Medicine 31 , 4219 – 4230 . Google Scholar CrossRef Search ADS PubMed Graw F. , Gerds T. A. and Schumacher M. ( 2009 ). On pseudo-values for regression analysis in multi-state models. Lifetime Data Analysis 15 , 241 – 255 . Google Scholar CrossRef Search ADS PubMed Hakulinen T. and Tenkanen L. ( 1987 ). Regression analysis of relative survival rates. Journal of the Royal Statistical Society—Series C 36 , 309 – 317 . Kalbfleisch J. D. and Prentice R. ( 2002 ). The Statistical Analysis of Failure Time Data , 2nd edition . Hoboken, NJ : John Wiley and Sons, Inc. Google Scholar CrossRef Search ADS Komukai S. and Hattori S. ( 2016 ). Doubly robust estimator for net survival rate in analyses of cancer registry data. Biometrics , 124 – 133 . Lambert P. C. , Dickman P. W. and Rutherford M. J. ( 2015 ). Comparison of different approaches to estimating age standardized net survival. BMC Medical Research Methodology 15 , 64 . Google Scholar CrossRef Search ADS PubMed Pavlič K. and Pohar Perme M. ( 2017 ). On comparison of net survival curves. BMC Medical Research Methodology 17 , 79 . Google Scholar CrossRef Search ADS PubMed Pohar M. and Stare J. ( 2006 ). Relative survival analysis in R. Computer Methods and Programs in Biomedicine 81 , 272 – 278 . Google Scholar CrossRef Search ADS PubMed Pohar M. and Stare J. ( 2007 ). Making relative survival analysis relatively easy. Computers in Biology and Medicine 37 , 1741 – 1749 . Google Scholar CrossRef Search ADS PubMed Pohar Perme M. ( 2017 ). relsurv: Relative Survival . R package version 2.1-1. Vienna, Austria : R Foundation for Statistical Computing. Pohar Perme M. , Henderson R. and Stare J. ( 2009 ). An approach to estimation in relative survival regression. Biostatistics 10 , 136 – 146 . Google Scholar CrossRef Search ADS PubMed Pohar Perme M. , Stare J. and Estève J. ( 2012 ). On estimation in relative survival. Biometrics 68 , 113 – 120 . Google Scholar CrossRef Search ADS PubMed Pokhrel A. and Hakulinen T. ( 2008 ). How to interpret the relative survival ratios of cancer patients. European Journal of Cancer 44 , 2661 – 2667 . Google Scholar CrossRef Search ADS PubMed Pokhrel A. and Hakulinen T. ( 2009 ). Age-standardisation of relative survival ratios of cancer patients in a comparison between countries, genders and time periods. European Journal of Cancer 45 , 642 – 647 . Google Scholar CrossRef Search ADS PubMed Quaresma M. , Coleman M. P. and Rachet B. ( 2015 ). 40-year trends in an index of survival for all cancers combined and survival adjusted for age and sex for each cancer in England and Wales, 1971–2011: a population-based study. The Lancet 385 , 1206 – 1218 . Google Scholar CrossRef Search ADS Roche L. , Danieli C. , Belot A. , Grosclaude P. , Bouvier A. M. , Velten M. , Iwaz J. , Remontet L. and Bossard N. ( 2012 ). Cancer net survival on registry data: use of the new unbiased Pohar-Perme estimator and magnitude of the bias with the classical methods. International Journal of Cancer 132 , 2359 – 2369 . Google Scholar CrossRef Search ADS PubMed Sasieni P. and Brentnall A. R. ( 2017 ). On standardized relative survival. Biometrics 73 , 473 – 482 . Google Scholar CrossRef Search ADS PubMed Seppä K. , Hakulinen T. and Pokhrel A. ( 2015 ). Choosing the net survival method for cancer survival estimation. European Journal of Cancer 51 , 1123 – 1129 . Google Scholar CrossRef Search ADS PubMed Seppä K. , Hakulinen T. , Läärä E. and Pitkäniemi J. ( 2016 ). Comparing net survival estimators of cancer patients. Statistics in Medicine 35 , 1866 – 1879 . Google Scholar CrossRef Search ADS PubMed Stute W. and Wang J. L. ( 1994 ). The jackknife estimate of a Kaplan–Meier integral. Biometrika 81 , 603 – 606 . Tsiatis A. ( 1975 ). A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences of the United States of America 72 , 20 – 22 . Google Scholar CrossRef Search ADS PubMed Zadnik V. , Primic Žakelj M. and Krajc M. ( 2012 ). Cancer burden in Slovenia in comparison with the burden in other European countries. Zdravniški vestnik—Slovenian Medical Journal 81 , 407 – 412 . © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Biostatistics Oxford University Press

Using pseudo-observations for estimation in relative survival

Loading next page...
 
/lp/ou_press/using-pseudo-observations-for-estimation-in-relative-survival-M8wX05MfBR
Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
ISSN
1465-4644
eISSN
1468-4357
D.O.I.
10.1093/biostatistics/kxy008
Publisher site
See Article on Publisher Site

Abstract

Summary A common goal in the analysis of the long-term survival related to a specific disease is to estimate a measure that is comparable between populations with different general population mortality. When cause of death is unavailable or unreliable, as for example in cancer registry studies, relative survival methodology is used—in addition to the mortality data of the patients, we use the data on the mortality of the general population. In this article, we focus on the marginal relative survival measure that summarizes the information about the disease-specific hazard. Under additional assumptions about latent times to death of each cause, this measure equals net survival. We propose a new approach to estimation based on pseudo-observations and derive two estimators of its variance. The properties of the new approach are assessed both theoretically and with simulations, showing practically no bias and a close to nominal coverage of the confidence intervals with the precise formula for the variance. The approximate formula for the variance has sufficiently good performance in large samples where the precise formula calculation becomes computationally intensive. Using bladder cancer data and simulations, we show that the behavior of the new approach is very close to that of the Pohar Perme estimator but has the important advantage of a simpler formula that does not require numerical integration and therefore lends itself more naturally to further extensions. 1. Introduction When analyzing long-term survival of patients with a specific disease, we can expect many deaths due to causes that are not of our primary interest. The hazard of dying due to causes other than the disease can of course differ between different populations or periods in time, so a measure, that accounts for these differences is required. In order to distinguish between the causes, we assume the competing risks model (in the field of relative survival often referred to as the additive model, Estève and others, 1990) \begin{equation} \lambda_{O}(t|X) = \lambda_{P}(t|D) + \lambda_{E}(t|X), \end{equation} (1.1) where $$\lambda_{O}$$ is the overall hazard of each individual, $$\lambda_{E}$$ is the disease-specific hazard (excess hazard due to the disease) and $$\lambda_{P}$$ the other-cause hazard; $$X=(D,Z)$$, $$D$$ denotes the demographic covariates (those by which the population tables are split) and $$Z$$ denotes other variables of interest. If the cause of death in the data is not given or is unreliable, only $$\lambda_{O}$$ can be observed and additional information must be brought into the analysis to discern between two terms on the right hand of (1.1). The idea of the relative survival field is to assume that $$\lambda_{P}(t|D)$$ is known, that is that it can be obtained from the general population mortality tables (hence the subscript P in the notation) for each combination of values of demographic variables $$D$$. When the main goal is to consider the patient’s survival as an indicator of treatment success and to compare it between different populations or time periods, a measure of survival that does not depend on the population hazard is the basic goal of the estimation. Therefore, our interest does not lie in cumulative incidence functions [i.e. cumulative probability of dying with cancer which depend on both hazards ($$\lambda_E$$ and $$ \lambda_P$$ in 1.1)], but rather in a measure that summarizes the information given in the excess term $$\lambda_E$$ only. While the estimation of such a measure in the form of net survival has been considered as the goal of the analysis of population-based cancer survival for decades (Ederer and others, 1961; Hakulinen and Tenkanen, 1987; Estève and others, 1990; Brenner and Hakulinen, 2003; Brenner and others, 2004; Pokhrel and Hakulinen, 2008, 2009) with relative survival methodology being part of major statistical packages (Pohar and Stare, 2006, 2007; Clerc-Urmès and others, 2014; Coviello and others, 2015; Dickman and Coviello, 2015; Pohar Perme, 2017), it was only in 2012, that a consistent estimator of net survival was proposed (Pohar Perme and others, 2012), we shall refer to it as the PP estimator. With practical use of this estimator, very large variability and peculiarly jumpy behavior has been observed in the following years (Dickman and others, 2013; Seppä and others, 2016), which led some authors to search for alternative estimators (Lambert and others, 2015). In 2016, a doubly robust version of the PP estimator was proposed (Komukai and Hattori, 2016). As we shall show in Section 7, the large variability follows from the definition of the net survival measure and is hence not a property of a particular estimator. Nevertheless, the PP estimator does have some undesired properties, in particular, the need of approximations required to perform the numerical integration is precluding direct theoretical extensions, for example, the link between non-parametric estimation and regression modeling. In this article, we define an alternative non-parametric estimator based on pseudo-observations. The idea for the use of pseudo-observations comes from the wish to formulate the estimator directly from the definition of the measure, which allows us to distinguish between the properties of the measure and the properties of the methods for its estimation. This article is organized as follows: in Section 2 the measure of interest is defined, and in Section 3 the new approach to estimation is proposed and its properties explained. The vast majority of papers about pseudo-observations focused on their usage as response variables in regression models (Andersen and others, 2003), an exception being the (Cortese and others, 2013) paper where the idea of pseudo-observations was used to calculate the Brier score. However, to the best of our knowledge no special attention has been devoted to variance of the pseudo-observation and its estimation and this is where this article makes a step forward. We derive the variance formula in Section 4 and consider also several approaches to its estimation. Section 5 is devoted to a simulation study that compares the new approach to the existing PP method and explores the properties of the different proposals for the variance calculation. Section 6 includes a real data example, Section 7 discusses the properties of the measure, and Section 8 closes the article. 2. The measure of interest In this section, we define the measure that is the focus of this work. Our goal is a non-parametric estimator for the group as a whole, but we do not assume that hazards are homogeneous, hence the covariates are explicitly used in the notation. Since our interest lies in the excess term only, we wish to consider some sort of a weighted average of $$\lambda_E(t|X)$$. In the field of population-based cancer survival, where the relative survival methodology is most frequently used, it is usual to consider the cumulative hazard $$\Lambda_E(t|X)$$ for each individual and then report a measure defined on the “survival scale,” i.e. to use the operator $$v \mapsto \exp(-v)$$ on (1.1) to obtain the following relation \begin{equation} S_E(t|X) = \frac{S_{O}(t|X)}{S_{P}(t|D)}. \end{equation} (2.1) On individual level, we shall refer to this ratio as the relative survival. In the relative survival field, we assume that the denominator in (2.1) equals the population survival $$S_P(t|D)$$ obtained from the general population mortality tables. For a group of individuals with different values of $$X$$, we define the marginal relative survival as \begin{equation} S_E(t)=\int S_E(t|x){\rm d}H(x), \end{equation} (2.2) where $$H$$ is the distribution function of covariates $$X$$. The definition could also allow additional weights, i.e. \begin{equation} S_E^w(t) = \int w(z)S_E(t|z){\rm d}H(z), \end{equation} (2.3) that can be useful when trying to compare populations with considerably different covariate distribution (more on this in Section 7). The marginal relative survival (2.2) is of interest when the sole focus is the excess term $$\lambda_E$$—it summarizes it over time and over individuals. It has a simple interpretation: for each individual, it presents the ratio of his personal survival probability $$S_{O}$$ to the survival probability of his counterparts (defined by a vector of demographic covariates $$D$$) in the population $$S_P$$ (2.1); for a group of individuals, it can be interpreted as the average ratio. Note that (2.2) is not the same as the classical relative survival ratio (Ederer and others, 1961), which is defined as the ratio between the overall survival of a group and the population survival of a comparable group in the population, i.e. the ratio of averages and not the average ratio. The measure (2.2) is commonly interpreted also as marginal net survival, i.e. the survival in a hypothetical world where one could not die of other causes but cancer, we state the additional implicit assumptions required for this interpretation in Section 7. One of the crucial points of this work is to propose an estimator whose derivation does not depend on assumptions in a hypothetical world. 3. A new approach to estimation We assume that our sample consists of $$n$$ possibly right-censored pairs $$(T_i,\delta_i)$$, where $$T_i$$ is the observed time and $$\delta_i$$ is the censoring indicator for the $$i$$th patient. To simplify the notation, we will use weights $$w$$ (formula (2.3)) that are constantly equal to 1, i.e. we propose the estimator for (2.2). However, the derivations and arguments used therein can be directly carried over to the more general case (2.3). Following the definition of the measure, we propose the estimator \begin{equation*} \hat{S}_E(t) = \frac{1}{n} \sum_{i=1}^n \hat{S}_{Ei}(t), \end{equation*} where $$\hat{S}_{Ei}$$ is the estimator of this measure for patient $$i$$ from the sample ($$\hat{S}_{Ei}(t) = \hat{S}_E(t|X=x_i)$$). The individual values $$S_{Ei}$$ could be estimated assuming a model for the excess hazard (see e.g. Lambert and others, 2015; Quaresma and others, 2015; Seppä and others, 2016), we attempt a non-parametric approach here. Following the definition of relative survival (2.1), $$S_{Ei}$$ can be estimated using $$\frac{\hat{S}_{Oi}}{S_{Pi}}$$ [here $$\hat{S}_{Oi}(t) = \hat{S}_{O}(t|X=x_i)$$ and $$S_{Pi}(t) = S_P(t|D=d_i)$$]. The values $$S_{Pi}$$ can be obtained from the population mortality tables. To get $$\hat{S}_{Oi}(t)$$, we consider two cases: Complete data: If the data are complete, i.e. there are no censored observations ($$\delta_i=0$$ for all $$i$$), the natural estimator of $$\hat{S}_{Oi}(t)$$ is the indicator function $$I(T_i > t)$$. Measure (2.2) can be estimated as \begin{equation*} \hat{S}_E(t) = \frac{1}{n} \sum_{i=1}^n \frac{I(T_i> t)}{S_{Pi}(t)}. \end{equation*} Right-censored data: If some individuals are right-censored, the above mentioned indicator functions can no longer be used as estimators. Instead of defining the estimator on the hazard scale (as for example the PP estimator, Pohar Perme and others, 2012), we keep the same idea as in the complete case by plugging-in the pseudo-observations. The pseudo-observation for $$i$$th individual is defined as (Andersen and others, 2003) \begin{equation*} \hat{S}_{Oi}(t) = n \cdot \hat{S}_{O}(t) - (n-1)\cdot \hat{S}_{O}^{(-i)}(t), \end{equation*} where $$n$$ is the sample size and $$\hat{S}$$ and $$\hat{S}^{(-i)}$$ are the estimators of overall survival evaluated on the whole sample and on the reduced sample without the $$i$$th observation, respectively. With this definition, we obtain the individual contribution to the overall estimator. This contribution is, in spite of censoring, defined for all individuals $$i$$ at each time $$t$$ on the available follow-up interval. Therefore, we propose to estimate $$S_E$$ using \begin{equation} \hat{S}_E(t) = \frac{1}{n} \sum_{i=1}^n \frac{\hat{S}_{Oi}(t)}{S_{Pi}(t)}, \end{equation} (3.1) where $$\hat{S}_{Oi}$$ is the $$i$$th person pseudo-observation and the Kaplan–Meier estimator is used to estimate the overall survival functions $$\hat{S}_{O}(t)$$ and $$\hat{S}_{O}^{(-i)}(t)$$. 3.1. Further comments The idea of pseudo-observations is that whenever we have an estimator that estimates some quantity on the whole sample (i.e. a group-level estimator, the Kaplan–Meier estimator in our case) we can use pseudo-observations formed with that estimator to estimate the same quantity on the individual level. Since the average of pseudo-observations calculated with the Kaplan–Meier equals the overall Kaplan–Meier estimate (except maybe at the last two observed times, see Stute and Wang, 1994, for details), pseudo-observations can be interpreted as an individual contribution to the overall estimate. Pseudo-observations inherit some of the theoretical properties of the group-level estimator that is used for their calculation. For example, if one uses an unbiased estimator it follows from the linearity of expected value that the pseudo-observations are also unbiased. Since the Kaplan–Meier estimator is asymptotically unbiased, the same is true for the pseudo-observations (see Graw and others, 2009, for details). 4. Derivation and estimation of variance To estimate the variance, we first notice that by using the definition of pseudo-observations, we can rewrite (3.1) to get \begin{equation*} \hat{S}_E(t) = \hat{S}_{O}(t) \cdot \sum_{i=1}^n \frac{1}{S_{Pi}(t)} - \frac{n-1}{n} \cdot \sum_{i=1}^n \frac{\hat{S}_{O}^{(-i)}(t)}{S_{Pi}(t)}. \end{equation*} The variance then equals (see Appendix A of the supplementary material available at Biostatistics online for details) \begin{eqnarray} \mathop{\hbox{Var}}\big(\hat{S}_E(t)\big) &=&\bigg( \sum_{i=1}^n \frac{1}{S_{Pi}(t)} \bigg)^2 \cdot \mathop{\hbox{Var}} \big( \hat{S}_{O}(t)\big) \nonumber \\ &&-\, 2 \cdot \frac{n-1}{n} \cdot \sum_{i=1}^n \frac{1}{S_{Pi}(t)} \cdot \bigg\{ \sum_{j=1}^n \frac{1}{S_{Pj}(t)} \cdot \mathop{\hbox{Cov}}\big(\hat{S}_{O}(t), \hat{S}_{O}^{(-j)}(t) \big) \bigg\} \\ &&+\, \bigg( \frac{n-1}{n}\bigg)^2 \cdot \sum_{i=1}^n \sum_{j=1}^n \frac{1}{S_{Pi}(t) S_{Pj}(t)} \cdot \mathop{\hbox{Cov}} \big( \hat{S}_{O}^{(-i)}(t), \hat{S}_{O}^{(-j)}(t) \big).\nonumber \end{eqnarray} (4.1) To calculate the variance one has to calculate $$n \choose 2$$ different covariances $$\mathop{\hbox{Cov}} \big( \hat{S}_{O}^{(-i)}(t), \hat{S}_{O}^{(-j)}(t) \big)$$ and $$n$$ variances $$\mathop{\hbox{Var}} \big( \hat{S}_{O}^{(-i)}(t) \big)$$. Therefore, this approach is very computationally intensive, and a simpler approximation of this formulae might be useful in practice. To obtain one, we can use the result that the $$i$$th pseudo-observation $$\hat{S}_{Oi}$$ asymptotically depends only on the $$i$$th observation (see Lemma 1 from Graw and others, 2009) and is thus asymptotically independent of other pseudo observations. If there are no censored observations in the data, this is also trivially true on finite samples. With this assumption we can express the variance as (see Appendix A of the supplementary material available at Biostatistics online for details) \begin{eqnarray} \mathop{\hbox{Var}}\big(\hat{S}_E(t)\big) & \approx& \mathop{\hbox{Var}}\big(\hat{S}_{O}(t)\big) \cdot \bigg( \sum_{i=1}^n \frac{1}{S_{Pi}^2(t)} \bigg) \nonumber \\ && -\, 2\cdot \frac{n-1}{n} \cdot \sum_{i=1}^n \frac{1}{S_{Pi}^2(t)} \cdot \mathop{\hbox{Cov}}\big( \hat{S}_{O}(t), \hat{S}_{O}^{(-i)}(t) \big)\\ &&+\, \bigg( \frac{n-1}{n} \bigg)^2 \sum_{i=1}^n \frac{1}{S_{Pi}^2(t)} \cdot \mathop{\hbox{Var}}\big( \hat{S}_O^{(-i)}(t) \big) \nonumber. \end{eqnarray} (4.2) Using this approach, we have to calculate only $$n$$ variances and, if this approximation works reasonably well, this implies a huge improvement of the speed of calculations and reduction of the amount of computer memory needed for the calculations. To estimate the variance following the formulae (4.1) or (4.2), we thus have to estimate the quantities $$\mathop{\hbox{Var}}\big( \hat{S}_O^{(-i)}(t)\big)$$, $$\mathop{\hbox{Cov}}\big( \hat{S}_O(t),\hat{S}_O^{(-i)}(t) \big),$$ and $$\mathop{\hbox{Cov}} \big( \hat{S}_O^{(-i)}(t),\hat{S}_O^{(-j)}(t) \big)$$. To this end, we shall build upon the ideas for the variance estimation of the Kaplan–Meier estimator. Let $$N_i(t) = I(T_i \le t, \delta_i=1)$$, $$Y_i(t) = I(T_i \ge t)$$, $$N(t) = \sum_{i=1}^n N_i(t)$$, $$N^{(-j)}(t) = \sum_{i\neq j} N_i(t)$$ and similarly for $$Y$$, $$Y^{(-j)}$$, and $$Y^{(-j,-k)}$$. Let $$S_O^*(t) = \exp\big( -\Lambda_O^*(t) \big)$$, where $$\Lambda_O^*(t) = \int_0^t I\big( Y(u) > 0 \big)\lambda_O(u){\rm d}u$$ and similarly for $$S_O^{*(-i)}(t)$$. The terms $$\mathop{\hbox{Var}}\big( \hat{S}_O(t) \big)$$ and $$\mathop{\hbox{Var}}\big( \hat{S}_O^{(-i)}(t) \big)$$ are simply the variances of the Kaplan–Meier estimator calculated on the whole and on the reduced samples, respectively. To estimate these two terms, the standard counting processes-based estimators can be used (Andersen and others, 1993). The remaining covariances can be approximated using the facts that $$\mathop{\hbox{Cov}} \Big( \frac{\hat{S}_O}{S_O^*} - 1, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1 \Big)(t) = E \Big\langle \frac{\hat{S}_O}{S_O^*} - 1, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1 \Big\rangle(t)$$ and $$\mathop{\hbox{Cov}}\Big( \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} - 1 \Big)(t) = E\Big\langle \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} - 1, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} - 1 \Big\rangle(t)$$. These predictable covariation processes can be calculated using similar techniques as in Andersen and others (1993) (see Appendix B of the supplementary material available at Biostatistics online for details). We obtain the following results $$\matrix{ {\langle {{{{\hat S}_O}} \over {S_O^*}} - 1,{{\hat S_O^{( - i)}} \over {S_O^{*( - i)}}} - 1\rangle (t)} \hfill \cr { = \int_0^t {{{{{\hat S}_O}(s - )} \over {S_O^*(s)}}} \cdot {{I(Y(s) > 0)} \over {Y(s)}} \cdot {{\hat S_O^{( - i)}(s - )} \over {S_O^{*( - i)}(s)}} \cdot {{I({Y^{( - i)}}(s) > 0)} \over {{Y^{( - i)}}(s)}} \cdot (1 - \Delta {\Lambda _O}(s)){Y^{( - i)}}(s){\rm{d}}{\Lambda _O}(s)} \hfill \cr } $$ (4.3) and $$\matrix{ {\langle {{\hat S_O^{( - i)}} \over {S_O^{*( - i)}}} - 1,{{\hat S_O^{( - j)}} \over {S_O^{*( - j)}}} - 1\rangle (t)} \hfill \cr { = \int_0^t {{{\hat S_O^{( - i)}(s - )} \over {S_O^{*( - i)}(s)}}} \cdot {{I({Y^{( - i)}}(s) > 0)} \over {{Y^{( - i)}}(s)}} \cdot {{\hat S_O^{( - j)}(s - )} \over {S_O^{*( - j)}(s)}} \cdot {{I({Y^{( - j)}}(s) > 0)} \over {{Y^{( - j)}}(s)}} \cdot (1 - \Delta {\Lambda _O}(s)){Y^{( - i, - j)}}(s){\rm{d}}{\Lambda _O}(s)}. \hfill \cr } $$ (4.4) Replacing $$S_O^*(s)$$ by $$\hat{S}_O(s)$$, $$S_O^{*(-i)}(s)$$ by $$\hat{S}_O^{(-i)}(s),$$ and $$\Lambda_O(s)$$ by $$\hat{\Lambda}_O(s) $$ in the expressions (4.3) and (4.4), we arrive at the following estimators of the covariances \begin{eqnarray*} \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O}{S^*_O}, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} \bigg)(t) &=& \int_0^t \frac{I\big(Y^{(-i)}(s) > 0 \big)}{Y^{(-i)}(s) - \Delta N^{(-i)}(s)} \cdot \frac{Y^{(-i)}(s)}{Y^2(s)}{\rm d}N(s)\\ \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}}, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} \bigg)(t)&=& \int_0^t \frac{I\big(Y^{(-i)}(s) > 0 \big)}{Y^{(-i)}(s) - \Delta N^{(-i)}(s)} \cdot \frac{I\big(Y^{(-j)}(s) > 0 \big)}{Y^{(-j)}(s) - \Delta N^{(-j)}(s)} \cdot \big( Y(s) - \Delta N(s) \big)\nonumber\\ && \cdot \frac{Y^{(-i,-j)}(s)}{Y^2(s)}{\rm d}N(s) \nonumber \end{eqnarray*} Using these two estimators, we can estimate the covariances required in formulas (4.1) and (4.2): \begin{eqnarray*} \widehat{\mathop{\hbox{Cov}}}\big( \hat{S}_O, \hat{S}_O^{(-i)} \big)(t) &=& \hat{S}_O(t) \cdot \hat{S}_O^{(-i)}(t)\cdot \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O}{S_O^*}, \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}} \bigg)(t)\\ \widehat{\mathop{\hbox{Cov}}}\big( \hat{S}_O^{(-i)}, \hat{S}_O^{(-j)}\big)(t) &=& \hat{S}_O^{(-i)}(t) \cdot \hat{S}_O^{(-j)}(t) \cdot \widehat{\mathop{\hbox{Cov}}}\bigg( \frac{\hat{S}_O^{(-i)}}{S_O^{*(-i)}}, \frac{\hat{S}_O^{(-j)}}{S_O^{*(-j)}} \bigg)(t). \end{eqnarray*} 5. Simulation study This section has two goals. The first one is to illustrate that the new approach is practically unbiased. The second goal is to compare the two formulae for the estimation of variance: we wish to check the precise formula (4.1) in practice and show that it can be replaced by the approximate formula (4.2) as the sample size gets larger and the formula (4.1) becomes overly computationally intensive. 5.1. Simulation design All the simulations have some common features: The demographic variables $$D$$ used in the simulations are age (uniform), sex (binary), and year (uniform), they are simulated independently. The times to death from other causes ($$T_P$$) are simulated using Slovene population mortality tables. The times to death from cancer ($$T_E$$) are simulated with $$\lambda_E=\lambda_{E0}\exp(\beta X)$$. The basic scenario uses a fixed $$\lambda_{E0}$$ and $$\beta=0$$ for all covariates $$X$$. Times to death from either cause are simulated independently, their minimum is taken to be the observed time ($$T=\min(T_P,T_E)$$). Censoring times are generated from a uniform distribution, cca 10 % of individuals are censored by the end of follow-up time (5 years). Each simulation scenario consists of 5000 repetitions, all results (bias and coverage in different scenarios) are assessed at 5 years. The main goal of the simulations is to check that the derived theory works as expected in practice, we therefore change the parameters of the simulations to consider the situations which have proven as important in the previous literature on relative survival (e.g. Pohar Perme and others, 2009; Danieli and others, 2012; Pavlič and Pohar Perme, 2017), but keep others simple so that the observed properties can be tracked to their real cause (e.g. a constant baseline excess hazard is of course completely unrealistic, but also irrelevant as we are dealing with non-parametric estimation that makes no assumptions about the baseline). The key properties we are changing in simulation scenarios are the sample size and the proportion of patients with events. Since our measure summarizes the information on $$\lambda_E$$, the number of patients with $$T_E<5$$ is the most important parameter. On the other hand, the patients who die of other causes serve as a nuisance parameter. We illustrate this by two scenarios which have approximately equal number of patients who die of cancer ($$\sim$$40%), but a very different other cause hazard: we change the age distribution of the patients and the size of the baseline hazard to get 52% and 77% of deaths due to excess hazard, respectively. The upper limit for age never exceeds 85 and thus ensures that the probability of the patients to be still alive at the end of follow-up time (5 years) is high enough to make our interest in $$\lambda_E$$ still sensible. Based on the experience in the literature, we add a third scenario, which has proven as problematic with other studies in relative survival (e.g. Pohar Perme and others, 2009, Supplementary material) has shown both a huge variance and biased regression coefficient estimates in such scenarios): we consider a very low proportion of deaths due to excess hazard (36 %). Coupled with a low overall number of events (22%), this is also an example where the sample size 500 contains very little information on $$\lambda_E$$ (only 8% of patients die due to $$\lambda_E$$). We also changed other parameters, for example the size of covariate effect $$\beta$$ in the excess hazard. However, while these scenarios were important with other issues of relative survival (see e.g. Danieli and others, 2012), they proved irrelevant in our case, the results are hence not reported or are moved to the Appendix D of the supplementary material available at Biostatistics online, where also the details on both the basic scenarios (included in the article) and other scenarios considered can be found. 5.2. Simulation results The first goal is to assess the magnitude of bias of the new approach on small samples, the results are presented in Table 1. We can see that the magnitude of bias of the new approach is of the order of $$10^{-4}$$ (the standard error of the estimate is of the order of $$10^{-2}$$ in this case), the bias is therefore practically negligible. Furthermore, we note that new approach and the PP approach are almost numerically equal in all scenarios considered, the differences are negligible even in the rather extreme scenario (22%/36%), where the weights of the PP estimator are known to be very large. Due to this equality no further comparisons with other methods need to be done, since in terms of bias, difference between the existing estimators and variability the results will be the same as in the 7s comparing the PP estimator with other approaches (Danieli and others, 2012; Roche and others, 2012; Lambert and others, 2015; Seppä and others, 2015, 2016). Table 1. Bias of different estimators on samples of increasing sizes (assessed at 5 years) Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Data generated from a model with no effect of covariates; proportion of deaths = 22%; proportion of deaths due to excess hazard among all deaths = 36%. Table 1. Bias of different estimators on samples of increasing sizes (assessed at 5 years) Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Bias sample size Complete data Right-censored data P PP h P PP $$n=500$$ $$8.2 \times 10^{-4}$$ $$5.73 \times 10^{-4}$$ $$-5.37 \times 10^{-5}$$ $$9.19 \times 10^{-4}$$ $$7.18 \times 10^{-4}$$ $$n=2500$$ $$3.97 \times 10^{-4}$$ $$3.44 \times 10^{-4}$$ $$-2.33 \times 10^{-6}$$ $$3.61 \times 10^{-4}$$ $$3.26 \times 10^{-4}$$ $$n=5000$$ $$3.75 \times 10^{-4}$$ $$3.46 \times 10^{-4}$$ $$-2.44 \times 10^{-5}$$ $$4.08 \times 10^{-4}$$ $$3.87 \times 10^{-4}$$ Data generated from a model with no effect of covariates; proportion of deaths = 22%; proportion of deaths due to excess hazard among all deaths = 36%. In order to compare the two formulae for estimation of variance (4.1) and (4.2), we consider the coverage of the confidence intervals. Since the new approach is practically unbiased, the coverage entirely reflects the precision of the two estimators of variance. The simulation results for three different scenarios with different proportions of deaths and different proportions of deaths due to excess hazard among all deaths for three different sample sizes are given in Table 2. They indicate that the precise formula gives coverages that are close to the nominal value of 0.95 in all three scenarios with the most extreme scenario (the smallest proportion of events) tending to have a slightly too low coverage. This is in line with the previous work, which indicates disproportionately large variability with small proportions of excess hazard events (e.g. Figure 1 in the supplementary material of Pohar Perme and others, 2009), in our case, this large variability seems to be underestimated. The approximate formula gives coverages that are constantly below the nominal level (see Table 2), but the discrepancies are minimal in the two realistic scenarios with the higher proportion of deaths. As a standard to compare to, we also report the coverage of the PP estimator, which is close to the nominal value in all cases. Table 2. Coverages of CI, data generated from a model with no effect of covariates (assessed at 5 years) $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 d, number of deaths from any cause; $$d_E$$, number of deaths due to excess hazard; P, the new approach based on pseudo-observations; PP, Pohar Perme estimator; pr, precise formula for the variance of P; apr, approximate formula for the variance of P. Table 2. Coverages of CI, data generated from a model with no effect of covariates (assessed at 5 years) $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 $$\frac{d}{n} (\frac{d_E}{d})$$ 0.22 (0.36) 0.51 (0.77) 0.77 (0.52) estimator P(pr) P(apr) PP P(pr) P(apr) PP P(pr) P(apr) PP $$n=500$$ 0.9404 0.9108 0.947 0.954 0.9468 0.9546 0.9454 0.9424 0.9458 $$n=2500$$ 0.9378 0.9128 0.949 0.9442 0.9382 0.9464 0.956 0.9544 0.9562 $$n=5000$$ 0.9386 0.9116 0.9488 0.943 0.9366 0.9462 0.9482 0.946 0.9498 d, number of deaths from any cause; $$d_E$$, number of deaths due to excess hazard; P, the new approach based on pseudo-observations; PP, Pohar Perme estimator; pr, precise formula for the variance of P; apr, approximate formula for the variance of P. Comparing all the results in Table 2, it is clear that the problems do not lie in small samples, instead, the proportion of excess hazard deaths among all deaths is crucial for the properties of our estimators. In all of the above, the $$\lambda_E$$ was equal for all individuals and hence independent of covariates $$X$$. We have also generated data from the models where covariates had effect, but the results were similar and are therefore not reported (the covariate effect changes the proportion of deaths and the proportion of deaths due to cancer and are similar to scenarios mentioned above—for details see Appendix D of the supplementary material available at Biostatistics online). 6. Example We now explore the properties of the estimator on real data. Consider the data of all Slovene female colon cancer patients—538 female patients diagnosed between January 1, 1994 and December 30, 2000, data are taken from Slovene cancer registry (Zadnik and others, 2012) and available in Pohar Perme (2017). Their age span is between 26 and 85. Figure 1 (left side) presents their 20-year survival curve. This sample was chosen on purpose—as an example of a truly bizarre behavior we can get with routine use of the estimator, a point raised by Dickman and others (2013) and Lambert and others (2015). To understand the reasons for this behavior, note first, that despite the weird behavior both the PP estimator and our new proposal actually give the same estimates. The same would be true also if we had no censoring, i.e. in the case, where the new approach follows directly from the definition of the measure. It is thus clear that this behavior is not a property of the PP estimator, but rather the property of the measure. The weights ($$1 / S_P$$) that cause the variable behavior of the PP estimator are present already in the definition in formula (2.1) and hence affect any estimator of this measure. The oldest individuals in our sample were aged 85 and their probability of still being alive after 20 years ($$S_P(20)$$) is of order $$10^{-6}$$, hence we cannot expect to have any information on their $$\lambda_E$$ in the data. Before dropping from this sample, these individuals can have very large weights—in the case of our sample, there were 267 patients still at risk at 17 years, with two patients having weights of size 50 (the median weight at that time is 1.3), these two patients almost entirely control the behavior of the curve at that time. Figure 1 View largeDownload slide The PP and the new approach on the sample of Slovene female colon cancer patients. Left: 1538 patients, age at diagnosis between 26 and 85. Right: 815 patients, age at diagnosis between 26 and 70. Figure 1 View largeDownload slide The PP and the new approach on the sample of Slovene female colon cancer patients. Left: 1538 patients, age at diagnosis between 26 and 85. Right: 815 patients, age at diagnosis between 26 and 70. Some authors propose alternative estimators to deal with such data (age-stratified Ederer II, model-based estimator, etc., see e.g. Dickman and others, 2013), i.e. introducing assumptions to deal with the problem of little or no information available in the data. As the above example clearly illustrates, the large variability is not a problem of a specific estimator (PP) but rather a problem of the measure. Instead of reasoning why the assumptions made (that are almost always untestable) are acceptable, we focus on understanding this property of the measure and the limitations it brings with. The measure is a function of $$\lambda_E$$ of all patients. With the probability of still being alive in the population $$S_p$$ getting very low, we cannot expect to get any information on $$\lambda_E$$ from the data. One way to move forward is to set some assumptions, but, apart from being untestable, any assumptions about the value of $$\lambda_E$$ for very old patients are also not at all of practical interest. We therefore propose one should limit to a cohort for which the value of $$\lambda_E$$ is of interest throughout the follow-up interval. In our example—if the interest lies in 20 year survival, it seems reasonable to limit ourselves to patients under 70 at diagnosis, thus ensuring that their probability of still being alive in the population after 20 years is large enough to allow estimation. The graph of the resulting 815 women is presented on the right side of Figure 1. Again, we see that the values of the PP and the new estimator are practically identical, the same is true for the confidence intervals. When limiting the analysis to follow-up for which the measure is meaningful, both estimators (PP and the newly proposed approach) shall have reasonable properties in terms of variance. 7. Discussion on the measure This article is focused on the estimation of the measure defined in (2.2). This is of course not the only measure that could be of interest in the relative survival setting (see e.g. Ederer and others, 1961; Cronin and Feuer, 2000; Sasieni and Brentnall, 2016). An important drawback lies in the way it summarizes the excess term information—by definition, the information on $$\lambda_E$$ should be available throughout the follow-up time for the entire cohort. Section 6 presented an example where this condition was clearly not met. Sasieni and Brentnall (2016) instead propose a measure that can again be seen as a weighted average of $$\lambda_E$$, but define weights in a way that optimizes the use of the information available, i.e. consider the older patients for a period shorter than the whole follow-up time. The interpretation of the measure obtained is rather complex and the authors themselves propose that, for a non-specialist audience, it should be described merely as a standardized relative survival index. They claim the same is true for net survival (Sasieni and Brentnall, 2016). We believe that particularly the average ratio interpretation of $$S_E$$ is rather easy to grasp but agree that net survival is often interpreted overly loosely without paying the proper respect to all the assumptions it warrants and the comparability between populations is taken for granted. In this section, we state these assumptions and thus warn against oversimplifications. Comparability between populations Net survival is often referred to as the measure “that takes into account the population mortality differences” or the measure “that does not depend on the population mortality hazard,” its comparability between groups with different population mortality is hence a key issue. In terms of comparability, the key issue is the assumption that $$\lambda_P$$ is given by the population mortality tables. This is often not true, since $$D$$ does not include all the covariates that affect both $$\lambda_P$$ and $$\lambda_E$$, for example, population tables do not include smoking even though this would be crucial for lung cancer data. With this assumption violated, populations differing with respect to these, missing covariates (e.g. smoking), become incomparable. To see this, say we wish to compare $$S_E$$ between Slovene and Chinese male lung cancer patients. A vast majority of these patients are smokers and it is clear that the $$\lambda_P$$ calculated on both smokers and non-smokers in the population is underestimating their other-cause mortality, hence their $$\lambda_E$$ shall be overestimated. However, since the prevalence of smoking in the two countries is very different ($$\sim$$20% in Slovenia versus 50% in China), the amount of this bias shall be very different in the two countries, thus making the estimated $$\widehat{S}_E$$ differ considerably even if the true $$\lambda_E$$ is in fact the same. Several papers have been published investigating this topic (see e.g. Grafféo and others, 2012). Equation (1.1) is commonly stated as the basic starting model when defining net survival, even though it is not always clearly defined in the literature. Note that the overall hazard $$\lambda_O$$ is the conditional probability of dying: $$\lambda_O(t|X)=\lim\limits_{\Delta t\rightarrow 0}\frac{P(t \leq T \leq t+ \Delta t |X,T \geq t)}{\Delta t} $$ which implies \begin{eqnarray} \lambda_E(t|X)&=&\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_E \leq t+ \Delta t|X, T \geq t)}{\Delta t} \\ \lambda_P(t|D)&=&\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_P \leq t+ \Delta t|D, T \geq t)}{\Delta t} \notag. \end{eqnarray} (7.1) The common assumption of the relative survival field is that $$ \lambda_P(t|D)=\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_P \leq t+ \Delta t|D, T_P \geq t)}{\Delta t} $$ and can be obtained from the survival probabilities $$S_P$$ reported in the population mortality tables. Using $$\lambda_E$$ as in (7.1) and $$\lambda_P$$ from the population tables, we define the relative survival in (2.2). The net survival measure definition assumes also that $$S_E$$ is a survival function, i.e. \begin{eqnarray} \lambda_E(t|X)=\lim_{\Delta t\rightarrow 0}\frac{P(t \leq T_E \leq t+ \Delta t|X, T_E \geq t)}{\Delta t}. \end{eqnarray} (7.2) A sufficient but not necessary assumption would be that $$T_P$$ and $$T_E$$ are independent conditional on $$X$$. The assumption (7.2) is necessary if $$S_E$$ is to be independent of $$T_P$$. Note that the definition of net survival has thus also implicitly assumed the existence of latent times $$T_E$$ and $$T_P$$—the time of dying due to the disease and due to other causes, respectively (the observed time $$T=\min(T_E,T_P)$$). The marginal relative survival of course depends on the distribution of covariates $$X$$, a difference in $$S_E$$ between two populations might therefore stem from a different covariate distribution $$H$$. To take this into account, the formula (2.3) can be used as it allows weights $$w$$. The idea of the weights is to report results with a pregiven standard distribution of covariates (see e.g. Corazziari and others, 2004 for age-standardization). Such a standardization is of course only crude, and setting weights that ensure truly comparable results may not be easy. To understand the observed differences in $$\widehat{S}_E$$, one shall usually rather resort to regression models. Interpretation When the population mortality tables do not adequately describe $$\lambda_P$$, the interpretation of net survival fails; however, this assumption is not needed if interpreting $$S_E$$ as the average ratio—the observed survival of an individual is compared to his counterparts in the population, where the counterparts are defined by the covariates $$D$$ that are available (the patient does not need to be equal to the defined population in all aspects but the fact he has been diagnosed with the disease). The comparability between populations is of course lost regardless of the interpretation. The assumption (7.2) cannot be verified from the observed data (Kalbfleisch and Prentice, 2002, p. 259–261), in fact verifying it could only make sense if the $$T_E$$ and $$T_P$$ did not indicate the times to death, but rather the times to some other events which do not preclude one another. Without this assumption, net survival is not identifiable with real world data (see e.g. Tsiatis, 1975; Andersen and Keiding, 2012), as the next example shows. Say the data follow the additive model in (1.1) with $$T_E$$ and $$T_P$$ conditionally independent, the observed times $$T_O$$ are calculated as the minimum of the two latent times. We shall refer to this data set as data set A. Using the times $$T_E$$ only, we estimate survival in the hypothetical world using Kaplan–Meier. Using the observed time $$T_O$$ we add the curves estimated by PP estimator and the new approach to the plot (see Figure 2). There are various scenarios of dependency between $$T_P$$ and $$T_E$$ which would result in the same observed times. One option is to change the time of the other event after the first one has happened. Two extreme cases in terms of the marginal distribution of $$T_E$$ can be imagined: after an individual experienced the “other cause” event ($$T_O=T_P$$, data set B), the individuals “dies of cancer” immediately ($$T_E=T_P+\Delta t$$) or, the individual becomes immortal with respect to cancer ($$T_E=\infty$$, data set C). Using times $$T_E$$ only, we again estimate survival in the hypothetical world. The estimators on the three data sets (A, B, C) in Figure 2 are wildly different and the value of net survival between the two extrema (B and C) is unidentifiable. To summarize: both PP and the new estimator estimate the same quantity, i.e. marginal relative survival. As this example shows, this may be very far from net survival if (7.2) is not true. In practice, we never know if (7.2) is fulfilled, which makes the interpretation of the estimate as net survival overly optimistic. We believe the historical interest in net survival does not stem from the interest in what would happen if we could remove other causes, rather, it is driven by the need of a measure independent of the population mortality. To avoid the long-standing debate on the sensibility of net survival and the need for untestable assumptions, the more cautious interpretation of the measure in terms of the average ratio can be used. Without assuming (7.2) (and conditional on $$\lambda_P$$ being correct), the extent of comparability has been summarized as the criterium A2 in Sasieni’s work (Sasieni and Brentnall, 2016): the criterium demands that the measure depends on the population mortality only through $$\lambda_E$$ (it does not directly depend on the size of $$\lambda_P$$ as is the case with relative survival ratio or cumulative incidence function). Whether $$\lambda_E$$ is defined as in (7.1) or (7.2) is irrelevant for this criterium. Figure 2 View largeDownload slide The PP and the new approach versus the three true net survivals in the hypothetical world data sets A, B, and C. Figure 2 View largeDownload slide The PP and the new approach versus the three true net survivals in the hypothetical world data sets A, B, and C. All things being said, the marginal relative survival shall in practice never be completely comparable, but it is (possibly with some additional weighting to achieve equal covariate distribution) a reasonable and simple survival measure to report when focusing on $$\lambda_E$$. While the data are never perfect and the above listed assumptions clearly indicate what is relevant when judging them, it is important that the estimator at least avoids depending on $$\lambda_P$$ when this is not necessary (a counter-example is the Ederer II estimator which always depends on $$\lambda_P$$, unless $$\lambda_E$$ is equal for all the patients regardless of $$X$$). 8. Conclusions A measure that compares survival of cancer patients between countries accounting for their differences in the general population mortality has been one of the main aims of the analyzes of cancer registry data in the last 50 years. While most of the debate in the past was focusing on choosing the right estimator, more theoretical work has been done in the recent years focusing on the possible measures and their properties (Pohar Perme and others, 2012; Sasieni and Brentnall, 2016). In this article, we have focused on the marginal relative survival measure that summarizes the information about the excess term $$\lambda_E(t)$$. Our new approach to estimation of marginal relative survival is based on pseudo-observations, that are calculated on the whole sample. They represent the individual contribution to the overall estimate and are defined at all times regardless of censoring. With their aid, we construct the new estimator directly from the definition, which enables us to distinguish between the properties of the measure and the properties of its estimators. In particular, we show that the weights ($$1/S_P$$) are intrinsic to this measure and explain the limitations of focusing on the excess term $$\lambda_E$$ with real world data. Both net survival and marginal relative survival are measures defined as an average over all individuals in the sample, we should thus only attempt their estimation in group of individuals for whom the probability of their counterparts still being alive in the population is high enough—it is impossible to get a reasonable estimator of this measure for groups containing very old individuals for whom $$S_p(t)$$ is practically equal to 0. A common critique with any approach based on pseudo-observations is that it requires more stringent conditions in terms of censoring compared to alternative approaches. However, this is only true and needs to be dealt with in regression models (Binder and others, 2014). In our case, a non-parametric measure is of interest, and the pseudo-observations based approach is thus no exception—any non-parametric method requires censoring independent of covariates. We have shown that the new approach is practically unbiased even on small samples with few events. We have derived the variance formula for this estimator and proposed two options for its estimation. Their performance has been studied with simulations, where important discrepancies were found only with smaller data sizes, for which the precise formula can be calculated in a sufficiently fast manner. The new approach to estimation gives an estimate which is very close to that of the PP estimator, from the practical view, the results are almost equal, which we have also shown in theory. The limit of both estimators is hence equal to the marginal relative survival. Claiming that it is also equal to net survival requires strong additional untestable assumptions (7.1 equaling 7.2). The main importance of this work is thus not in replacing the existing estimator, but rather in providing an alternative approach that gives important insight into the measure and that may be easier to implement in software. The approach does not depend on the definition of latent failure times and on the hypothetical world assumptions. With its simpler form, we have removed the need for numerical integration, which is a disadvantage of the PP estimator blocking further extensions. In particular, a log-rank type test for comparison of net survival curves has recently been developed (following the logic of the PP estimator, see Grafféo and others, 2016 for details) but no theoretical link to semi-parametric models can be established because of the numerical approximations (Pavlič and Pohar Perme, 2017). On the other hand, the usage of pseudo-observations in regression models is rather straightforward—having the same general theory in both non-parametric estimation and regression models could improve the understanding of the properties of the methods. 9. Software Due to numeric integration, PP needs to be evaluated in short enough intervals (daily intervals are the default in the relsurv package, Pohar Perme, 2017) regardless of whether a single point estimate or a whole survival curve is required. With our code the new approach is considerably faster (10% of the time at $$n=1000$$, 20% of the time at $$n=10\,000$$) and remains faster even when monthly estimates are required (50% to 70%). Precise variance formula calculation is always very intensive but can still be done in a reasonable time when it is in fact needed (see Table 2) whereas its memory requirements prevent its usage in very large samples. The approximate formula does not increase the computational time of the estimator by more than 10%. The GitHub repository https://github.com/Klemen-Pavlic/Marginal-relative-survival contains the implementation of the new estimation approach and the code for the example presented in Section 6. Supplementary material Supplementary material is available online at http://biostatistics.oxfordjournals.org. Acknowledgements The authors thank the CENSUR working survival group (ANR Grant Number ANR-12-BSV1-0028; coordinator Prof. Roch Giorgi) for helpful discussions. Conflict of Interest: None declared. Funding The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P3-0154). The authors acknowledge the project Methods of estimation of key indicators in population cancer survival J3-7272 was financially supported by the Slovenian Research Agency. Furthermore, Klemen Pavlič is a young researcher supported by the Slovenian Research Agency. References Andersen P. K. and Keiding N. ( 2012 ). Interpretability and importance of functionals in competing risks and multistate models. Statistics in Medicine 31 , 1074 – 1088 . Google Scholar CrossRef Search ADS PubMed Andersen P. K. , Borgan O. , Gill R. D. and Keiding N. ( 1993 ). Statistical Models Based on Counting Processes . New York : Springer. Google Scholar CrossRef Search ADS Andersen P. K. , Klein J. P. and Rosthøj S. ( 2003 ). Generalized linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika 90 , 15 – 27 . Google Scholar CrossRef Search ADS Binder N. , Gerds T. A. and Andersen P. K. ( 2014 ). Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Analysis 20 , 303 – 315 . Google Scholar CrossRef Search ADS PubMed Brenner H. and Hakulinen T. ( 2003 ). On crude and age-adjusted relative survival rates. Journal of Clinical Epidemiology 56 , 1185 – 1191 . Google Scholar CrossRef Search ADS PubMed Brenner H. , Arndt V. , Gefeller O. and Hakulinen T. ( 2004 ). An alternative approach to age-adjustment of cancer survival rates. European Journal of Cancer 40 , 2317 – 2322 . Google Scholar CrossRef Search ADS PubMed Clerc-Urmès I. , Grzebyk M. and Hédelin G. ( 2014 ). Net survival estimation with stns. Stata Journal 14 , 87 – 102 . Corazziari I. , Quinn M. and Capocaccia R. ( 2004 ). Standard cancer patient population for age standardizing survival ratios. European Journal of Cancer 40 , 2307 – 2316 . Google Scholar CrossRef Search ADS PubMed Cortese G. , Gerds T. A. and Andersen P. K. ( 2013 ). Comparing predictions among competing risks models with time-dependent covariates. Statistics in Medicine 32 , 3089 – 3101 . Google Scholar CrossRef Search ADS PubMed Coviello E. , Dickman P. W. , Seppä K. and Pokhrel A. ( 2015 ). Estimating net survival using a life-table approach. Stata Journal 15 , 173 – 185 . Cronin K. A. and Feuer E. J. ( 2000 ). Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in Medicine 19 , 1729 – 1740 . Google Scholar CrossRef Search ADS PubMed Danieli C. , Remontet L. , Bossard N. , Roche L. and Belot A. ( 2012 ). Estimating net survival: the importance of allowing for informative censoring. Statistics in Medicine 31 , 775 – 786 . Google Scholar CrossRef Search ADS PubMed Dickman P. W. and Coviello E. ( 2015 ). Estimating and modeling relative survival. Stata Journal 15 , 186 – 215 . Dickman P. W. , Lambert P. C. , Coviello E. and Rutherford M. J. ( 2013 ). Estimating net survival in population-based cancer studies, letter to editor. International Journal of Cancer 133 , 519 – 521 . Google Scholar CrossRef Search ADS PubMed Ederer F. , Axtell L. M. and Cutler S. J. ( 1961 ). The relative survival rate: a statistical methodology. National Cancer Institute Monograph 6 , 101 – 121 . Google Scholar PubMed Estève J. , Benhamou E. , Croasdale M. and Raymond M. ( 1990 ). Relative survival and the estimation of net survival: elements for further discussion. Statistics in Medicine 9 , 529 – 538 . Google Scholar CrossRef Search ADS PubMed Grafféo N. , Castell F. , Belot A. and Giorgi R. ( 2016 ). A log-rank-type test to compare net survival distributions. Biometrics 72 , 760 – 769 . Google Scholar CrossRef Search ADS PubMed Grafféo N. , Jooste V. and Giorgi R. ( 2012 ). The impact of additional life-table variables on excess mortality estimates. Statistics in Medicine 31 , 4219 – 4230 . Google Scholar CrossRef Search ADS PubMed Graw F. , Gerds T. A. and Schumacher M. ( 2009 ). On pseudo-values for regression analysis in multi-state models. Lifetime Data Analysis 15 , 241 – 255 . Google Scholar CrossRef Search ADS PubMed Hakulinen T. and Tenkanen L. ( 1987 ). Regression analysis of relative survival rates. Journal of the Royal Statistical Society—Series C 36 , 309 – 317 . Kalbfleisch J. D. and Prentice R. ( 2002 ). The Statistical Analysis of Failure Time Data , 2nd edition . Hoboken, NJ : John Wiley and Sons, Inc. Google Scholar CrossRef Search ADS Komukai S. and Hattori S. ( 2016 ). Doubly robust estimator for net survival rate in analyses of cancer registry data. Biometrics , 124 – 133 . Lambert P. C. , Dickman P. W. and Rutherford M. J. ( 2015 ). Comparison of different approaches to estimating age standardized net survival. BMC Medical Research Methodology 15 , 64 . Google Scholar CrossRef Search ADS PubMed Pavlič K. and Pohar Perme M. ( 2017 ). On comparison of net survival curves. BMC Medical Research Methodology 17 , 79 . Google Scholar CrossRef Search ADS PubMed Pohar M. and Stare J. ( 2006 ). Relative survival analysis in R. Computer Methods and Programs in Biomedicine 81 , 272 – 278 . Google Scholar CrossRef Search ADS PubMed Pohar M. and Stare J. ( 2007 ). Making relative survival analysis relatively easy. Computers in Biology and Medicine 37 , 1741 – 1749 . Google Scholar CrossRef Search ADS PubMed Pohar Perme M. ( 2017 ). relsurv: Relative Survival . R package version 2.1-1. Vienna, Austria : R Foundation for Statistical Computing. Pohar Perme M. , Henderson R. and Stare J. ( 2009 ). An approach to estimation in relative survival regression. Biostatistics 10 , 136 – 146 . Google Scholar CrossRef Search ADS PubMed Pohar Perme M. , Stare J. and Estève J. ( 2012 ). On estimation in relative survival. Biometrics 68 , 113 – 120 . Google Scholar CrossRef Search ADS PubMed Pokhrel A. and Hakulinen T. ( 2008 ). How to interpret the relative survival ratios of cancer patients. European Journal of Cancer 44 , 2661 – 2667 . Google Scholar CrossRef Search ADS PubMed Pokhrel A. and Hakulinen T. ( 2009 ). Age-standardisation of relative survival ratios of cancer patients in a comparison between countries, genders and time periods. European Journal of Cancer 45 , 642 – 647 . Google Scholar CrossRef Search ADS PubMed Quaresma M. , Coleman M. P. and Rachet B. ( 2015 ). 40-year trends in an index of survival for all cancers combined and survival adjusted for age and sex for each cancer in England and Wales, 1971–2011: a population-based study. The Lancet 385 , 1206 – 1218 . Google Scholar CrossRef Search ADS Roche L. , Danieli C. , Belot A. , Grosclaude P. , Bouvier A. M. , Velten M. , Iwaz J. , Remontet L. and Bossard N. ( 2012 ). Cancer net survival on registry data: use of the new unbiased Pohar-Perme estimator and magnitude of the bias with the classical methods. International Journal of Cancer 132 , 2359 – 2369 . Google Scholar CrossRef Search ADS PubMed Sasieni P. and Brentnall A. R. ( 2017 ). On standardized relative survival. Biometrics 73 , 473 – 482 . Google Scholar CrossRef Search ADS PubMed Seppä K. , Hakulinen T. and Pokhrel A. ( 2015 ). Choosing the net survival method for cancer survival estimation. European Journal of Cancer 51 , 1123 – 1129 . Google Scholar CrossRef Search ADS PubMed Seppä K. , Hakulinen T. , Läärä E. and Pitkäniemi J. ( 2016 ). Comparing net survival estimators of cancer patients. Statistics in Medicine 35 , 1866 – 1879 . Google Scholar CrossRef Search ADS PubMed Stute W. and Wang J. L. ( 1994 ). The jackknife estimate of a Kaplan–Meier integral. Biometrika 81 , 603 – 606 . Tsiatis A. ( 1975 ). A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences of the United States of America 72 , 20 – 22 . Google Scholar CrossRef Search ADS PubMed Zadnik V. , Primic Žakelj M. and Krajc M. ( 2012 ). Cancer burden in Slovenia in comparison with the burden in other European countries. Zdravniški vestnik—Slovenian Medical Journal 81 , 407 – 412 . © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

BiostatisticsOxford University Press

Published: Mar 13, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off