Pitfalls of linear regression for estimating slopes over time and how to avoid them by using linear mixed-effects models

Abstract

Clinical epidemiological studies often focus on investigating the underlying causes of disease. For instance, a nephrologist may be interested in the association between blood pressure and the development of chronic kidney disease (CKD). However, instead of focusing on the mere occurrence of CKD, the decline of kidney function over time might be the outcome of interest. For examining this kidney function trajectory, patients are typically followed over time with their kidney function estimated at several time points. During follow-up, some patients may drop out earlier than others and for different reasons. Furthermore, some patients may have greater kidney function at study entry or faster kidney function decline than others. Also, a substantial heterogeneity may exist in the number of kidney function estimates available for each patient. This heterogeneity with respect to kidney function, dropout and number of kidney function estimates is important to take into account when estimating kidney function trajectories. In general, two methods are used in the literature to estimate kidney function trajectories over time: linear regression to estimate individual slopes and the linear mixed-effects model (LMM), i.e. repeated measures analysis. Importantly, the linear regression method does not properly take into account the above-mentioned heterogeneity, whereas the LMM is able to retain all information and variability in the data. However, the underlying concepts, use and interpretation of LMMs are not always straightforward. Therefore we illustrate this using a clinical example and offer a framework of how to model and interpret the LMM.

Keywords: dropout, GFR trajectory, kidney function trajectory, linear mixed-effects model, linear regression

INTRODUCTION

In epidemiological research, studies often focus on investigating risk factors for diseases.
For instance, the effect of blood pressure or glycated haemoglobin A1c (HbA1c) levels on the development of end-stage renal disease is investigated [1, 2]. In addition to the mere occurrence of end-stage renal disease, clinicians may also be interested in chronic kidney disease (CKD) progression. Then one might study the effect of blood pressure or HbA1c levels on CKD progression, in other words the kidney function decline or, more generally, the trajectory of kidney function over time [3−10]. When investigating trajectories of kidney function, patients are typically followed over time with their kidney function estimated at several time points. In addition, the number of estimated glomerular filtration rate (eGFR) values may vary across patients. Also, during the follow-up, some patients may drop out during the study and thus their follow-up period is terminated earlier than intended. Furthermore, some patients may have greater kidney function at study entry or show a much faster CKD progression than others. This heterogeneity—in baseline eGFR, dropout and number of eGFR values between patients—should be taken into account when investigating risk factors associated with kidney function decline. In the literature investigating changes in kidney function over time, two methods are commonly used: linear regression of individual slopes and linear mixed-effects models (LMMs). Both methods use repeated eGFR values within an individual over time. The methods differ in the way the overall GFR decline is estimated. In the linear regression method, individual eGFR declines or slopes are estimated using linear regression based on at least two eGFR estimates over time. All values of a patient are collapsed into a single summarizing eGFR decline, yielding an individual eGFR slope for each patient. Subsequently a risk factor such as blood pressure is associated with this summarized decline rate using yet another linear regression with the individual slopes as the outcome. 
By summarizing these individual eGFR declines, this method is unable to take account of the above-mentioned heterogeneity in dropout, baseline kidney function values and number of eGFR values between individuals. A method that does take account of these sources of heterogeneity when analysing eGFR trajectories is the LMM. LMMs, used for repeated measures designs, are a special case of multilevel or hierarchical linear models [11]. The differences between the two methods and the interpretation and use of an LMM are not always straightforward. Therefore we aimed to highlight the differences between linear regression on individual slopes and LMMs when used for the purpose of estimating the GFR decline over time and its association with a certain risk factor. This will be illustrated by a clinical example of the effect of baseline diastolic blood pressure (DBP) on the decline of kidney function over time.

CLINICAL EXAMPLE: EFFECT OF DBP ON KIDNEY FUNCTION DECLINE

Study population

We used the prospective Predialysis Patient Record-2 (PREPARE-2) cohort, described elsewhere in more detail [12, 13]. In summary, incident adult CKD 4–5 patients starting pre-dialysis care were included when referred to one of the 25 participating Dutch specialized pre-dialysis outpatient clinics (inclusion period 2004–2011). Clinical and laboratory data were collected every 6 months. Patients were followed until the start of dialysis, receiving a kidney transplant, death or censoring. Censoring was defined as recovery of kidney function prior to the start of renal replacement therapy, refusal of further study participation, moving to an outpatient clinic not participating in the PREPARE-2 study, loss to follow-up or 18 October 2016 (end of follow-up), whichever came first. The study was approved by the medical ethics committee or institutional review board (as appropriate) of all participating centres.

Study exposure and outcome

The study exposure in this illustrative example is baseline DBP.
Baseline was defined as the first available measurement at cohort entry. DBP was dichotomized based on the median value of DBP, i.e. 80 mmHg. The study outcome was kidney function decline per year. Kidney function, based on serum creatinine levels, was estimated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [14]. Kidney function decline was estimated based on all available individual eGFR values during the first 2 years of pre-dialysis care. In patients initiating dialysis, eGFR values until 2 weeks before the start of dialysis were used, because eGFR values after this point in time were no longer representative of actual kidney function [13]. Analyses were performed with and without adjustment for potential baseline confounders: sex, age, race, smoking, alcohol use, primary kidney disease and the comorbidities cardiovascular disease (angina pectoris, coronary disease and/or myocardial infarction) and diabetes. Statistical analyses were performed with SPSS Statistics 23 (IBM, Armonk, NY, USA).

Results using linear regression versus LMM

We used both linear regression on individual slopes and the LMM to investigate the association between baseline DBP and eGFR decline. We now demonstrate the differences in results obtained when using these methods. In Supplementary Materials 1 and 2, we provide equations and an example SPSS syntax for both linear regression on individual slopes and the LMM, including general technical issues to keep in mind when modelling the LMM and an example of how to interpret LMM results obtained in SPSS, using the example below. To estimate the eGFR decline, we first use linear regression on individual slopes, for which at least two eGFR values within an individual over time are needed. In total, 271 patients of the study population had at least two eGFR values available and were included in the analysis. All results are shown in Table 1.
For frequencies of different reasons of dropout after the 2-year follow-up period, see Supplementary Material 3. For categorical risk factors, the estimated effect is relative to a reference category. First, in the linear regression analysis, the adjusted additional change in eGFR decline is 2.05 [95% confidence interval (CI) 1.44–2.66] mL/min/1.73 m2/year in patients with a DBP ≥80 mmHg compared with individuals with a DBP <80 mmHg, i.e. the reference category. In other words, patients with a DBP ≥80 mmHg on average have a 2.05 mL/min/1.73 m2 faster eGFR decline per year than patients with a DBP <80 mmHg, given a fixed sex, age, etc. Second, using the LMM in the same study population yielded an adjusted additional change in the annual eGFR decline of 1.70 (95% CI 0.90–2.51) mL/min/1.73 m2 in individuals with a DBP ≥80 mmHg compared with individuals with a DBP <80 mmHg.

Table 1 Association of DBP with decline in kidney function during the first 2 years of pre-dialysis
Additional change in eGFR decline (95% CI), mL/min/1.73 m2/year

DBP (mmHg)    n      Unadjusted          Adjusted (a)

Linear regression on individual slopes
<80           129    0                   0
≥80           142    2.03 (1.43–2.62)    2.05 (1.44–2.66)

Linear mixed models on subjects for which linear regression on individual slopes was performed (b)
<80           129    0                   0
≥80           142    1.65 (0.82–2.49)    1.70 (0.90–2.51)

Linear mixed models in total study population (b)
<80           202    0                   0
≥80           214    1.80 (0.98–2.63)    1.91 (1.12–2.71)

(a) Adjusted for sex, age, race, smoking, alcohol use, primary kidney disease and comorbidities, cardiovascular disease and diabetes.
(b) The fixed effects included time, baseline DBP and baseline DBP*time. For the adjusted results, confounders and the interaction terms for each confounder*time were added. A random intercept and slope model was used.

Remarkably, this example shows that the obtained additional annual eGFR decline estimates are not the same when directly comparing the linear regression method to LMMs.
How could this be explained and which is the better estimate? In the population of 271 patients, dropout was already at 22% after 1 year of follow-up. Could this dropout rate have influenced the results? Below, we will explain the underlying concepts and provide answers using this example. Before discussing the differences between the two methods, we note an important strength of the LMM: it allows us to also include individuals with only one eGFR value available during the follow-up period. Estimating the LMM in the extended sample of 416 patients, the adjusted additional annual kidney function decline was 1.91 (95% CI 1.12–2.71) mL/min/1.73 m2 for patients with a DBP ≥80 mmHg versus DBP <80 mmHg. Although in this particular case this estimate seems to be similar to that obtained in linear regression on individual slopes, we should not forget that the LMM uses the full sample, making use of all available information and thereby reducing the risk of selection bias. Of note, the wider 95% CIs are inherent to the use of LMMs, which we will touch upon below.

Underlying concepts of linear regression versus LMMs

To obtain population-averaged eGFR declines in association with a risk factor (DBP), linear regression on individual slopes is a commonly used method. This is achieved in a two-stage approach [15]. In the first stage, individual slopes of kidney function over time are estimated, also called patient-specific regression coefficients. For this purpose, using all values of a single patient, a simple linear regression model is estimated with eGFR as the outcome variable, defined as the kidney function estimated at different time points, and time as exposure, meaning the time between baseline and each time point at which the kidney function was estimated. This first stage is based on the assumption that the underlying eGFR trajectory is linear for each patient.
The estimated slope of a patient represents the eGFR decline for a pre-specified time period; in our example, an annual eGFR decline (mL/min/1.73 m2/year). Thus, in this first step, all eGFR values of a patient are collapsed into a single summary measure, yielding one eGFR slope for each individual patient. In the second stage, a linear regression model is used in which these previously estimated slopes per individual are analysed as outcome. In our example, this outcome variable (decline in eGFR) is related to baseline DBP (exposure). In aetiological research, we further adjust for potential confounders in this stage using a more elaborate model [16]. Following the clinical example, clear differences in obtained eGFR decline are present using the two-stage linear regression approach versus LMMs. Linear regression on individual slopes is quite simple and easy to understand. However, four important drawbacks exist. The solution for these drawbacks is provided by the LMM: the key characteristics of the LMM align with the problems encountered with the aforementioned two-stage approach. The LMM retains all information and variability in the data when examining eGFR change over time. But how is the LMM able to do this? Below we discuss the four drawbacks of the two-stage linear regression approach and provide the associated solutions using LMMs (Box 1).

Box 1. Differences between LMMs and linear regression on individual slopes
LMMs retain all information and variability in the data.
Variability in different baseline eGFRs or eGFR slopes between individuals is taken into account by the LMM.
LMMs take account of variation in the number of eGFR values between individuals.
LMMs deal accurately with dropout in longitudinal studies.
In the LMM, individuals with only one eGFR value can be included to estimate the eGFR decline at the population level.
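The two-stage approach described above can be sketched in a few lines. This is a minimal illustration, not the authors' SPSS syntax (which is in their Supplementary Material): the patient data, the identifiers `p1` to `p5` and the helper `ols_slope` are all hypothetical, and the second stage is unadjusted, whereas the paper also adjusts for confounders.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical data (NOT from PREPARE-2):
# patient id -> (1 if DBP >= 80 mmHg else 0, [(time in years, eGFR), ...])
patients = {
    "p1": (0, [(0.0, 16.0), (0.5, 15.5), (1.0, 15.0), (1.5, 14.5), (2.0, 14.0)]),
    "p2": (0, [(0.0, 12.0), (0.5, 11.0), (1.0, 10.0)]),
    "p3": (1, [(0.0, 14.0), (0.5, 12.5), (1.0, 11.0), (1.5, 9.5)]),
    "p4": (1, [(0.0, 13.0), (1.0, 9.0)]),
    "p5": (1, [(0.0, 15.0)]),  # one value only: no individual slope exists
}

# Stage 1: collapse each patient's eGFR values into one slope (per year).
stage1 = {}
for pid, (dbp_high, visits) in patients.items():
    if len(visits) < 2:
        continue  # patients with a single eGFR value are dropped here
    times = [t for t, _ in visits]
    egfr = [value for _, value in visits]
    stage1[pid] = (dbp_high, ols_slope(times, egfr))

# Stage 2: regress the individual slopes on the exposure. With a binary
# exposure this equals the difference in mean slope between the groups.
exposure = [dbp_high for dbp_high, _ in stage1.values()]
slopes = [slope for _, slope in stage1.values()]
additional_change = ols_slope(exposure, slopes)

print(stage1)             # p5 is absent
print(additional_change)  # -2.0: slopes are 2 mL/min/1.73 m2/year steeper
```

Note that a slope based on two values (`p4`) counts exactly as much in the second stage as one based on five values (`p1`), and that `p5` disappears from the analysis altogether. The sign convention here is eGFR change per year, so a more negative number corresponds to what the paper reports as a larger (positive) additional decline.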
First, in linear regression on individual slopes, all eGFR values of a patient are collapsed into a single-summary individual eGFR slope, as illustrated in Figure 1, which is then used as the outcome in the second stage. Consequently, the variability in the estimates of an individual, on which the eGFR slope is based, is not handled properly. In addition, the variability in baseline eGFR values between individuals is totally ignored by the linear regression model. The LMM provides a solution for these problems, because the LMM is able to take into account both the variability of baseline eGFR and eGFR slopes between patients (Figure 2). In general, we aim to estimate the trajectory at the population level, for instance, the mean eGFR trajectory at the population level, characterized by a population baseline value and slope. These are also called fixed effects (Box 2). However, an individual’s eGFR trajectory could deviate from this mean eGFR trajectory in the overall study population. Due to variability around the population-averaged baseline eGFR, the baseline eGFR between individuals could vary. For instance, the overall population-averaged baseline eGFR could be 14 mL/min/1.73 m2, while a certain individual had a baseline eGFR value of 12 mL/min/1.73 m2. This difference is represented by Subject 1 compared with the population mean at Time 0 in Figure 2. In addition, the eGFR slope of an individual over time could be the same as the population-averaged eGFR slope, just like in Subject 1 (i.e. 2 mL/min/1.73 m2/year), or could deviate from the population-averaged eGFR slope, as is the case for Subject 2 (i.e. 1 mL/min/1.73 m2/year). The individual deviations from the population-level trajectory are quantified by defining the so-called random-effects model (see Box 2 for more details). 
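A random intercept and slope model of the kind described above can be written compactly. The notation below is an assumption chosen for illustration (the authors' own equations are in their Supplementary Material 1 and may use different symbols): $i$ indexes patients, $j$ indexes measurements, $t_{ij}$ is time in years since baseline, and $\mathrm{DBP}_i$ is 1 for a baseline DBP ≥80 mmHg and 0 otherwise.

```latex
\begin{aligned}
\mathrm{eGFR}_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij}
  + \beta_2\, \mathrm{DBP}_i + \beta_3\, \mathrm{DBP}_i\, t_{ij} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim N(\mathbf{0}, \mathbf{D}), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

Here $\beta_0, \dots, \beta_3$ are the fixed effects (time, baseline DBP and DBP*time), with $\beta_3$ the additional change in slope associated with a DBP ≥80 mmHg, while $b_{0i}$ and $b_{1i}$ are the random intercept and random slope of patient $i$. Using the numbers quoted for Figure 2 and ignoring the DBP terms: with a population baseline of $\beta_0 = 14$ and a population slope of $\beta_1 = -2$ per year, Subject 1 (baseline 12, decline 2 per year) has $b_{01} = -2$ and $b_{11} = 0$, and Subject 2 (decline 1 per year) has $b_{12} = +1$.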
Because the LMM deals properly with the variability in baseline eGFR values and eGFR slopes, wider 95% CIs are inherent to the use of LMMs compared with linear regression on individual slopes, which ignores this variability. The change over time is not necessarily linear, i.e. the rate of decline is not necessarily constant in time. By forcing a linear trend, information could be lost. The LMM allows for modelling nonlinearities over time.

Box 2. Fixed- and random-effects model in the LMM
The ‘fixed-effects model’ contains the effects at the population level. We aim to estimate the trajectory at the population level, for instance, the mean eGFR trajectory at the population level, characterized by a population baseline value and slope.
The ‘random-effects model’ may include:
Random intercepts model: The baseline eGFR value is also called the intercept and the LMM takes into account the variability in baseline eGFR values between individuals by defining a random intercepts model. For a given individual, the random intercept quantifies the difference between the observed baseline eGFR value of the individual and the population-averaged baseline eGFR value.
Random slopes model: For a given individual, the random slope quantifies the difference between the observed eGFR slope of the individual and the population-averaged eGFR slope.

FIGURE 1: Illustration of the fitted line by linear regression on individual slopes during the first step of the two-stage approach. The dashed line is the line fitted by the linear regression model based on the available eGFR values for each subject. The individual slope for all the subjects is 2 mL/min/1.73 m2/year despite the presence of different intercepts and the large heterogeneity in eGFR values between the subjects. Also, the heterogeneity in the available number of eGFR values is ignored. These issues are not taken into account in the linear regression model on individual slopes.

FIGURE 2: Illustration of LMM to model eGFR trajectories over time with a mixture of fixed and random effects. The fixed-effects model is represented by the population mean. The individual’s baseline eGFR at Time 0 of Subject 1 deviates from the population-averaged baseline eGFR, which is taken into account in the random intercepts model. The eGFR slope of Subject 2 deviates from the population-averaged eGFR slope and is taken into account by the random slopes model.

Second, using linear regression, estimated individual slopes might be accurate for patients with many repeated eGFR values available during the whole follow-up, but they will be less accurate for patients with only a few values available.
Again, the individual slopes in linear regression are obtained by fitting a straight line through all available eGFR values over time for each individual. In Figure 1, all subjects (Subjects 1–3) have the same annual eGFR decline of 2 mL/min/1.73 m2 as estimated by the linear regression model. However, Subject 3 only has three eGFR measurements available compared with five eGFR values available for Subjects 1 and 2. All values are collapsed into one summarized eGFR decline, thus the variability in the number of values between individuals is ignored. Importantly, the LMM takes this variation in the number of eGFR values between individuals into account due to the fact that individuals with more eGFR values available contribute more to the overall population mean than individuals with fewer eGFR values available. Third, linear regression does not take into account whether the follow-up period is ended earlier than intended due to dropout for a certain individual when estimating the population-averaged slope. Individual slopes in linear regression are obtained by fitting a straight line through all available eGFR values over time within each individual, ignoring whether follow-up was complete or not. When an individual drops out, the observed slope is extrapolated over the complete study period. This can result in biased estimates. For each individual observed, eGFR values could deviate from the true underlying eGFR value due to random measurement errors or random noise. In general, some of the observed eGFR values are higher or lower than the true eGFR value (Figure 3). In addition, repeated eGFR values could be missing for several reasons. The reasons for missing data are formally described by the missing data mechanism. In practice, three mechanisms can be distinguished: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) [17−19]. MCAR applies when missingness is unrelated to the outcome of interest, e.g. 
relocation or device malfunction. In this case, the observed data are a random sample of the target population and unbiased estimates can be obtained even when using linear regression on individual slopes. However, such a mechanism is hardly likely to hold in practice. Instead, MAR is more realistic to apply in practice. Under MAR, the reason for dropout is related to previously observed eGFR values. In this case, the observed data cannot be considered as a random sample from the target population anymore. Thus the use of linear regression on individual slopes will lead to biased estimates. In contrast, unbiased estimates are obtained using LMMs. Especially when the observed eGFR value is lower than the true eGFR value, the estimated kidney function decline will be overestimated using linear regression on individual slopes. This is reflected in a frequent clinical scenario where the observed low eGFR value could be a reason for starting renal replacement therapy and thus for dropout of a patient from the study (based on previously observed eGFR values). Importantly, instead of extrapolating individual slopes based only on measurements of that individual, the LMM estimates the individual slope also based on complete observed data of other similar individuals in the data set. In this way, the LMM is able to take the dropout into account. The anticipated result of using LMMs is that an overall eGFR decline at the population level is obtained closer to the true eGFR slope than linear regression. In longitudinal studies with high dropout rates, especially early in follow-up, LMM will provide more accurate eGFR declines than linear regression [20, 21]. This is reflected in our example: the adjusted additional change in annual eGFR decline was 2.05 (95% CI 1.44–2.66) for individuals with a DBP ≥80 mmHg compared with individuals with a DBP <80 mmHg and 1.70 (95% CI 0.90–2.51)  mL/min/1.73 m2 using the two-stage linear regression approach and the LMM, respectively. 
Clearly, in this example, the obtained additional annual eGFR decline is overestimated using linear regression, due to a dropout rate of 22% after 1 year. The last possible missing data mechanism, MNAR, applies when the reason for dropout is related to unobserved eGFR values, e.g. a patient is lost to follow-up due to an improvement or deterioration of their condition, which we never got the chance to measure. In this case, neither the linear regression on individual slopes nor LMMs will provide valid results. More sophisticated methods of analysis are required in this case [22]. However, this mechanism is unlikely to hold in clinical practice.

FIGURE 3: Illustration of the conceptual difference in dealing with dropout using linear regression on individual slopes and the LMM. Consider an individual who drops out after 1 year, with the illustrated eGFR values: the square boxes are the observed eGFR values, with the second value randomly low compared with the true underlying eGFR decline. Due to the extrapolation of the observed eGFR slope from an individual after dropout by linear regression, the overall eGFR decline will be overestimated. The LMM is able to take the dropout into account and provides an eGFR decline closer to the true kidney function decline.
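The mechanism illustrated in Figure 3 can be made concrete with a small numeric sketch. All numbers are hypothetical and not taken from PREPARE-2: a true decline of 2 mL/min/1.73 m2 per year from a baseline of 14, yearly measurements, a second value that happens to be 1.5 too low, and dropout right after it. The sketch only shows the linear-regression side of the comparison; it does not fit an LMM.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


TRUE_SLOPE = -2.0  # hypothetical true eGFR change per year

# What would have been measured with complete follow-up. The true values
# are 14, 12 and 10; the second measurement is randomly 1.5 too low.
full_times = [0.0, 1.0, 2.0]
full_egfr = [14.0, 10.5, 10.0]

# The low second value triggers dropout (e.g. start of renal replacement
# therapy), so only the first two measurements are actually observed.
observed_times = full_times[:2]
observed_egfr = full_egfr[:2]

slope_truncated = ols_slope(observed_times, observed_egfr)
slope_complete = ols_slope(full_times, full_egfr)

print(slope_truncated)  # -3.5: decline overestimated after dropout
print(slope_complete)   # -2.0: complete follow-up recovers the true slope
```

In the two-stage approach the value of -3.5 enters the second stage as if it described the whole study period. An LMM would instead pull this patient's estimated slope towards the population-averaged slope, to a degree that depends on the estimated variance components, which is the sense in which it "takes the dropout into account" under MAR.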
Fourth, as we saw in the example above using linear regression, an individual slope could only be estimated in the presence of at least two eGFR values. Patients with only one eGFR value available are therefore excluded from the analysis [23]. However, these values could also contribute to a better estimation of the intercept of the fitted line, which represents the baseline eGFR, and thereby to the estimation of the eGFR trajectory. The omission of these values will reduce the sample size for the analysis and may introduce selection bias. Selection bias in linear regression on individual slopes could lead to either an overestimation or underestimation of the true underlying kidney function decline. An overestimation could occur when patients with at least two eGFR values have a worse prognosis, a reason that eGFR is more often estimated, than patients with one eGFR value. In contrast, an underestimation could occur when the former patients have a better prognosis and if, for instance, patients with only one eGFR value died prior to the next eGFR value. However, using the LMM allows us to include those patients with only one eGFR value available, thereby fully using the sample size and reducing the risk of selection bias when estimating the eGFR trajectory over time at the population level. In our example, this resulted in the inclusion of 416 patients instead of 271 patients. Coincidentally, the estimate from linear regression on individual slopes in 271 patients (2.05) lies closer to the LMM estimate in 416 patients (1.91) than to the LMM estimate in the same 271 patients (1.70), but we have to keep in mind that linear regression only includes a subgroup of the study population used in the LMM. Of note, the results based on the LMM in the full sample of 416 patients and the linear regression on individual slopes in 271 patients should not be compared.
If linear regression could be performed in the full sample of 416 patients, an even higher additional change in eGFR decline than 2.05 mL/min/1.73 m2/year would likely have been obtained; however, it is impossible to estimate this.

CONCLUSIONS

We aimed at creating awareness for the distinction between the LMM and linear regression analysis on individual slopes for the purpose of estimating kidney function decline over time. The LMM is the preferred and recommended model for research questions regarding eGFR trajectories over time at the population level. Dropouts and heterogeneity in the number of eGFR values between individuals are accurately handled by LMMs. Also, individual differences in both baseline eGFR and eGFR slopes are taken into account by the fixed and random effects in LMMs.

SUPPLEMENTARY DATA

Supplementary data are available at ndt online.

CONFLICT OF INTEREST STATEMENT

None declared. The results presented in this article have not been published previously in whole or part.

REFERENCES

1 Navaneethan SD, Schold JD, Jolly SE et al. Diabetes control and the risks of ESRD and mortality in patients with CKD. Am J Kidney Dis 2017; 70: 191–198
2 Qureshi S, Lorch R, Navaneethan SD. Blood pressure parameters and their associations with death in patients with chronic kidney disease. Curr Hypertens Rep 2017; 19: 92
3 Bell EK, Gao L, Judd S et al. Blood pressure indexes and end-stage renal disease risk in adults with chronic kidney disease. Am J Hypertens 2012; 25: 789–796
4 de Goeij MC, Voormolen N, Halbesma N et al. Association of blood pressure with decline in renal function and time until the start of renal replacement therapy in pre-dialysis patients: a cohort study. BMC Nephrol 2011; 12: 38
5 Peralta CA, Norris KC, Li S et al. Blood pressure components and end-stage renal disease in persons with chronic kidney disease: the Kidney Early Evaluation Program (KEEP). Arch Intern Med 2012; 172: 41–47
6 Rifkin DE, Katz R, Chonchol M et al. Blood pressure components and decline in kidney function in community-living older adults: the Cardiovascular Health Study. Am J Hypertens 2013; 26: 1037–1044
7 Sood MM, Akbari A, Manuel D et al. Time-varying association of individual BP components with eGFR in late-stage CKD. Clin J Am Soc Nephrol 2017; 12: 904–911
8 Anderson AH, Yang W, Townsend RR et al. Time-updated systolic blood pressure and the progression of chronic kidney disease. A cohort study. Ann Intern Med 2015; 162: 258–265
9 Kovesdy CP, Lu JL, Molnar MZ et al. Observational modeling of strict vs conventional blood pressure control in patients with chronic kidney disease. JAMA Intern Med 2014; 174: 1442–1449
10 Cummings DM, Larsen LC, Doherty L et al. Glycemic control patterns and kidney disease progression among primary care patients with diabetes mellitus. J Am Board Fam Med 2011; 24: 391–398
11 Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Hoboken, NJ: John Wiley & Sons, 2004: 99–102
12 de Goeij MC, Rotmans JI, Matthijssen X et al. Lipid levels and renal function decline in pre-dialysis patients. Nephron Extra 2015; 5: 19–29
13 Nacak H, van Diepen M, de Goeij MC et al. Uric acid: association with rate of renal function decline and time until start of dialysis in incident pre-dialysis patients. BMC Nephrol 2014; 15: 91
14 Levey AS, Stevens LA. Estimating GFR using the CKD Epidemiology Collaboration (CKD-EPI) creatinine equation: more accurate GFR estimates, lower CKD prevalence estimates, and better risk predictions. Am J Kidney Dis 2010; 55: 622–627
15 Pfister R, Schwarz K, Carson R et al. Easy methods for extracting individual regression slopes: comparing SPSS, R, and Excel. Tutor Quant Methods Psychol 2013; 9: 72–78
16 van Diepen M, Ramspek CL, Jager KJ et al. Prediction versus aetiology: common pitfalls and how to avoid them. Nephrol Dial Transplant 2017; 32(Suppl 2): ii1–ii5
17 de Goeij MC, van Diepen M, Jager KJ et al. Multiple imputation: dealing with missing data. Nephrol Dial Transplant 2013; 28: 2415–2420
18 Graham JW. Missing data analysis: making it work in the real world. Annu Rev Psychol 2009; 60: 549–576
19 Leffondre K, Boucquemont J, Tripepi G et al. Analysis of risk factors associated with renal function trajectory over time: a comparison of different statistical approaches. Nephrol Dial Transplant 2015; 30: 1237–1243
20 Diggle PJ, Heagerty P, Liang KY et al. Analysis of Longitudinal Data. Oxford: Oxford University Press, 2002
21 Fitzmaurice G, Davidian M, Verbeke G et al. Longitudinal Data Analysis. New York, NY: Chapman & Hall/CRC, 2008
22 Tsonaka R, Verbeke G, Lesaffre E. A semi-parametric shared parameter model to handle nonmonotone nonignorable missingness. Biometrics 2009; 65: 81–87
23 Thiebaut R, Walker S. When it is better to estimate a slope with only one point. QJM 2008; 101: 821–824

© The Author(s) 2018. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Nephrology Dialysis Transplantation, Oxford University Press

Publisher: Oxford University Press
ISSN: 0931-0509
eISSN: 1460-2385
DOI: 10.1093/ndt/gfy128

Abstract

Abstract Clinical epidemiological studies often focus on investigating the underlying causes of disease. For instance, a nephrologist may be interested in the association between blood pressure and the development of chronic kidney disease (CKD). However, instead of focusing on the mere occurrence of CKD, the decline of kidney function over time might be the outcome of interest. For examining this kidney function trajectory, patients are typically followed over time with their kidney function estimated at several time points. During follow-up, some patients may drop out earlier than others and for different reasons. Furthermore, some patients may have greater kidney function at study entry or faster kidney function decline than others. Also, a substantial heterogeneity may exist in the number of kidney function estimates available for each patient. This heterogeneity with respect to kidney function, dropout and number of kidney function estimates is important to take into account when estimating kidney function trajectories. In general, two methods are used in the literature to estimate kidney function trajectories over time: linear regression to estimate individual slopes and the linear mixed-effects model (LMM), i.e. repeated measures analysis. Importantly, the linear regression method does not properly take into account the above-mentioned heterogeneity, whereas the LMM is able to retain all information and variability in the data. However, the underlying concepts, use and interpretation of LMMs are not always straightforward. Therefore we illustrate this using a clinical example and offer a framework of how to model and interpret the LMM. dropout, GFR trajectory, kidney function trajectory, linear mixed-effects model, linear regression INTRODUCTION In epidemiological research, studies often focus on investigating risk factors for diseases. 
For instance, the effect of blood pressure or glycated haemoglobin A1c (HbA1c) levels on the development of end-stage renal disease is investigated [1, 2]. In addition to the mere occurrence of end-stage renal disease, clinicians may also be interested in chronic kidney disease (CKD) progression. Then one might study the effect of blood pressure or HbA1c levels on CKD progression, in other words the kidney function decline or, more generally, the trajectory of kidney function over time [3−10]. When investigating trajectories of kidney function, patients are typically followed over time with their kidney function estimated at several time points. In addition, the number of estimated glomerular filtration rate (eGFR) values may vary across patients. Also, during the follow-up, some patients may drop out during the study and thus their follow-up period is terminated earlier than intended. Furthermore, some patients may have greater kidney function at study entry or show a much faster CKD progression than others. This heterogeneity—in baseline eGFR, dropout and number of eGFR values between patients—should be taken into account when investigating risk factors associated with kidney function decline. In the literature investigating changes in kidney function over time, two methods are commonly used: linear regression of individual slopes and linear mixed-effects models (LMMs). Both methods use repeated eGFR values within an individual over time. The methods differ in the way the overall GFR decline is estimated. In the linear regression method, individual eGFR declines or slopes are estimated using linear regression based on at least two eGFR estimates over time. All values of a patient are collapsed into a single summarizing eGFR decline, yielding an individual eGFR slope for each patient. Subsequently a risk factor such as blood pressure is associated with this summarized decline rate using yet another linear regression with the individual slopes as the outcome. 
By summarizing these individual eGFR declines, this method is unable to take account of the above-mentioned heterogeneity in dropout, baseline kidney function values and number of eGFR values between individuals. A method that does take account of these sources of heterogeneity when analysing eGFR trajectories is the LMM. LMMs, used for repeated measures designs, are a special case of multilevel or hierarchical linear models [11]. The differences between the two methods and the interpretation and use of an LMM are not always straightforward. Therefore we aimed to highlight the differences between linear regression on individual slopes and LMMs when used for the purpose of estimating the GFR decline over time and its association with a certain risk factor. This will be illustrated by a clinical example of the effect of baseline diastolic blood pressure (DBP) on the decline of kidney function over time. CLINICAL EXAMPLE: EFFECT OF DBP ON KIDNEY FUNCTION DECLINE Study population We used the prospective Predialysis Patient Record-2 (PREPARE-2) cohort, described elsewhere in more detail [12, 13]. In summary, incident adult CKD 4–5 patients starting pre-dialysis care were included when referred to one of the 25 participating Dutch specialized pre-dialysis outpatient clinics (inclusion period 2004−11). Clinical and laboratory data were collected every 6 months. Patients were followed until the start of dialysis, receiving a kidney transplant, death or censoring. Censoring was defined as recovery of kidney function prior to the start of renal replacement therapy, refusal of further study participation, moving to an outpatient clinic not participating in the PREPARE-2 study, loss to follow-up or 18 October 2016 (end of follow-up), whichever came first. The study was approved by the medical ethics committee or institutional review board (as appropriate) of all participating centres. Study exposure and outcome The study exposure in this illustrative example is baseline DBP. 
Baseline was defined as the first available measurement at cohort entry. DBP was dichotomized based on the median value of DBP, i.e. 80 mmHg. The study outcome was kidney function decline per year. Kidney function, based on serum creatinine levels, was estimated using the Chronic Kidney disease Epidemiology Collaboration equation [14]. Kidney function decline was estimated based on all available individual eGFR values during the first 2 years of pre-dialysis care. In patients initiating dialysis, eGFR values until 2 weeks before the start of dialysis were used, because eGFR values after this point in time were no longer representative of actual kidney function [13]. Analyses were performed with and without adjustment for potential baseline confounders: sex, age, race, smoking, alcohol use, primary kidney disease and comorbidities, cardiovascular disease (angina pectoris, coronary disease and/or myocardial infarction) and diabetes. Statistical analyses were performed with SPSS Statistics 23 (IBM, Armonk, NY, USA). Results using linear regression versus LMM We used both linear regression on individual slopes and the LMM to investigate the association between baseline DBP and eGFR decline. We now demonstrate the differences in results obtained when using these methods. In Supplementary Materials 1 and 2, we provide equations and an example SPSS syntax for both linear regression on individual slopes and the LMM, including general technical issues to keep in mind when modelling the LMM and an example of how to interpret LMM results obtained in SPSS, using the example below. To estimate the eGFR decline, we use linear regression on individual slopes, for which at least two eGFR values within an individual over time are needed. In total, 271 patients of the study population had at least two eGFR values available and were included in the analysis. All results are shown in Table 1. 
For frequencies of different reasons of dropout after the 2-year follow-up period, see Supplementary Material 3. For categorical risk factors, it applies that the estimated effect is relative to a reference category. First, in the linear regression analysis, the adjusted additional change in eGFR decline is 2.05 [95% confidence interval (CI) 1.44–2.66] mL/min/1.73 m2/year in patients with a DBP ≥80 mmHg compared with individuals with a DBP <80 mmHg, i.e. the reference category. In other words, patients with a DBP ≥80 mmHg on average have a 2.05 mL/min/1.73 m2 faster eGFR decline per year than patients with a DBP <80 mmHg, given a fixed sex, age, etc. Second, using the LMM in the same study population yielded an adjusted additional change in the annual eGFR decline of 1.70 (95% CI 0.90–2.51)  mL/min/1.73 m2 in individuals with a DBP ≥80 mmHg compared with individuals with a DBP <80 mmHg. Table 1 Association of DBP with decline in kidney function during the first 2 years of pre-dialysis DBP (mmHg) n Unadjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year) Adjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year)a Linear regression on individual slopes <80 129 0 0 ≥80 142 2.03 (1.43–2.62) 2.05 (1.44–2.66) Linear mixed models on subjects for which linear regression on individual slopes was performedb <80 129 0 0 ≥80 142 1.65 (0.82–2.49) 1.70 (0.90–2.51) Linear mixed models in total study populationb <80 202 0 0 ≥80 214 1.80 (0.98–2.63) 1.91 (1.12–2.71) DBP (mmHg) n Unadjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year) Adjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year)a Linear regression on individual slopes <80 129 0 0 ≥80 142 2.03 (1.43–2.62) 2.05 (1.44–2.66) Linear mixed models on subjects for which linear regression on individual slopes was performedb <80 129 0 0 ≥80 142 1.65 (0.82–2.49) 1.70 (0.90–2.51) Linear mixed models in total study populationb <80 202 0 0 ≥80 214 1.80 (0.98–2.63) 
1.91 (1.12–2.71) a Adjusted for sex, age, race, smoking, alcohol use, primary kidney disease and comorbidities, cardiovascular disease and diabetes. b The fixed effects included time, baseline DBP and baseline DBP*time. For the adjusted results, confounders and the interaction terms for each confounder*time were added. A random intercept and slope model was used. Table 1 Association of DBP with decline in kidney function during the first 2 years of pre-dialysis DBP (mmHg) n Unadjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year) Adjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year)a Linear regression on individual slopes <80 129 0 0 ≥80 142 2.03 (1.43–2.62) 2.05 (1.44–2.66) Linear mixed models on subjects for which linear regression on individual slopes was performedb <80 129 0 0 ≥80 142 1.65 (0.82–2.49) 1.70 (0.90–2.51) Linear mixed models in total study populationb <80 202 0 0 ≥80 214 1.80 (0.98–2.63) 1.91 (1.12–2.71) DBP (mmHg) n Unadjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year) Adjusted additional change in eGFR decline (95% CI) (mL/min/1.73 m2/year)a Linear regression on individual slopes <80 129 0 0 ≥80 142 2.03 (1.43–2.62) 2.05 (1.44–2.66) Linear mixed models on subjects for which linear regression on individual slopes was performedb <80 129 0 0 ≥80 142 1.65 (0.82–2.49) 1.70 (0.90–2.51) Linear mixed models in total study populationb <80 202 0 0 ≥80 214 1.80 (0.98–2.63) 1.91 (1.12–2.71) a Adjusted for sex, age, race, smoking, alcohol use, primary kidney disease and comorbidities, cardiovascular disease and diabetes. b The fixed effects included time, baseline DBP and baseline DBP*time. For the adjusted results, confounders and the interaction terms for each confounder*time were added. A random intercept and slope model was used. Remarkably, this example shows that the obtained additional annual eGFR decline estimates are not the same when directly comparing the linear regression method to LMMs. 
How could this be explained and which is the better estimate? In the population of 271 patients, dropout was already at 22% after 1 year of follow-up. Could this dropout rate have influenced the results? Below, we will explain the underlying concepts and provide answers using this example. Before discussing the differences between the two methods, an important strength of the LMM is that it allows us to also include individuals with only one eGFR value available during the follow-up period. Estimating the LMM in the extended sample of 416 patients, the adjusted additional annual kidney function decline was 1.91 (95% CI 1.12–2.71)  mL/min/1.73 m2 for patients with a DBP ≥80 mmHg versus DBP <80 mmHg. Although in this particular case this estimate seems to be similar to that obtained in linear regression on individual slopes, we should not forget that the LMM uses the full sample, making use of all available information and thereby reducing the risk of selection bias. Of note, the wider 95% CIs are inherent to the use of LMMs, which we will touch upon below. Underlying concepts of linear regression versus LMMs To obtain population-averaged eGFR declines in association with a risk factor (DBP), linear regression on individual slopes is a commonly used method. This is achieved in a two-stage approach [15]. In the first stage, individual slopes of kidney function over time are estimated, also called patient-specific regression coefficients. For this purpose, using all values of a single patient, a simple linear regression model is estimated with eGFR as the outcome variable, defined as the kidney function estimated at different time points, and time as exposure, meaning the time between baseline and each time point at which the kidney function was estimated. This first stage is based on the assumption that the underlying eGFR trajectory is linear for each patient. 
The estimated slope of a patient represents the eGFR decline for a pre-specified time period; in our example, an annual eGFR decline (mL/min/1.73 m2/year). Thus, in this first step, all eGFR values of a patient are collapsed into a single summary measure, yielding one eGFR slope for each individual patient. In the second stage, a linear regression model is used in which these previously estimated slopes per individual are analysed as outcome. In our example, this outcome variable (decline in eGFR) is related to baseline DBP (exposure). In aetiological research, we further adjust for potential confounders in this stage using a more elaborate model [16]. Following the clinical example, clear differences in obtained eGFR decline are present using the two-stage linear regression approach versus LMMs. Linear regression on individual slopes is quite simple and easy to understand. However, four important drawbacks exist. The solution for these drawbacks is provided by the LMM: the key characteristics of the LMM align with the problems encountered with the aforementioned two-stage approach. The LMM retains all information and variability in the data when examining eGFR change over time. But how is the LMM able to do this? Below we discuss the four drawbacks of the two-stage linear regression approach and provide the associated solutions using LMMs (Box 1). Box 1. Differences between LMMs and linear regression on individual models LMMs retain all information and variability in the data. Variability in different baseline eGFRs or eGFR slopes between individuals is taken into account by the LMM. LMMs take account of variation in the number of eGFR values between individuals. LMMs deal accurately with dropout in longitudinal studies. In the LMM, individuals with only one eGFR value can be included to estimate the eGFR decline at the population level. 
First, in linear regression on individual slopes, all eGFR values of a patient are collapsed into a single-summary individual eGFR slope, as illustrated in Figure 1, which is then used as the outcome in the second stage. Consequently, the variability in the estimates of an individual, on which the eGFR slope is based, is not handled properly. In addition, the variability in baseline eGFR values between individuals is totally ignored by the linear regression model. The LMM provides a solution for these problems, because the LMM is able to take into account both the variability of baseline eGFR and eGFR slopes between patients (Figure 2). In general, we aim to estimate the trajectory at the population level, for instance, the mean eGFR trajectory at the population level, characterized by a population baseline value and slope. These are also called fixed effects (Box 2). However, an individual’s eGFR trajectory could deviate from this mean eGFR trajectory in the overall study population. Due to variability around the population-averaged baseline eGFR, the baseline eGFR between individuals could vary. For instance, the overall population-averaged baseline eGFR could be 14 mL/min/1.73 m2, while a certain individual had a baseline eGFR value of 12 mL/min/1.73 m2. This difference is represented by Subject 1 compared with the population mean at Time 0 in Figure 2. In addition, the eGFR slope of an individual over time could be the same as the population-averaged eGFR slope, just like in Subject 1 (i.e. 2 mL/min/1.73 m2/year), or could deviate from the population-averaged eGFR slope, as is the case for Subject 2 (i.e. 1 mL/min/1.73 m2/year). The individual deviations from the population-level trajectory are quantified by defining the so-called random-effects model (see Box 2 for more details). 
Because the model deals properly with the variability in baseline eGFR values and eGFR slopes, wider 95% CIs are inherent to the use of LMMs compared with linear regression on individual slopes, which ignores this variability. The change in time may not be necessarily linear, i.e. the rate of decline is not necessarily constant in time. By forcing a linear trend, information could be lost. The LMM allows for modelling nonlinearities over time. Box 2. Fixed- and random-effects model in the LMM The ‘fixed-effects model’ contains the effects at the population level. We aim to estimate the trajectory at the population level, for instance, the mean eGFR trajectory at the population level, characterized by a population baseline value and slope. The ‘random-effects model’ may include Random intercepts model  The baseline eGFR value is also called the intercept and the LMM takes into account the variability in baseline eGFR values between individuals by defining a random intercepts model. For a given individual, the random intercept quantifies the difference between the observed baseline eGFR value of the individual and the population-averaged baseline eGFR value. Random-slopes model  For a given individual, the random slope quantifies the difference between the observed eGFR slope of the individual and the population-averaged eGFR slope. FIGURE 1: View largeDownload slide Illustration of the fitted line by linear regression on individual slopes during the first step of the two-stage approach. The dashed line is the line fitted by the linear regression model based on the available eGFR values for each subject. The individual slope for all the subjects is 2 mL/min/1.73 m2/year despite the presence of different intercepts and the large heterogeneity in eGFR values between the subjects. Also, the heterogeneity in the available number of eGFR values is ignored. These issues are not taken into account in the linear regression model on individual slopes. 
FIGURE 1: View largeDownload slide Illustration of the fitted line by linear regression on individual slopes during the first step of the two-stage approach. The dashed line is the line fitted by the linear regression model based on the available eGFR values for each subject. The individual slope for all the subjects is 2 mL/min/1.73 m2/year despite the presence of different intercepts and the large heterogeneity in eGFR values between the subjects. Also, the heterogeneity in the available number of eGFR values is ignored. These issues are not taken into account in the linear regression model on individual slopes. FIGURE 2: View largeDownload slide Illustration of LMM to model eGFR trajectories over time with a mixture of fixed and random effects. The fixed-effects model is represented by the population mean. The individual’s baseline eGFR at Time 0 of Subject 1 deviates from the population-averaged baseline eGFR, which is taken into account in the random intercepts model. The eGFR slope of Subject 2 deviates from the population-averaged eGFR slope and is taken into account by the random slopes model. FIGURE 2: View largeDownload slide Illustration of LMM to model eGFR trajectories over time with a mixture of fixed and random effects. The fixed-effects model is represented by the population mean. The individual’s baseline eGFR at Time 0 of Subject 1 deviates from the population-averaged baseline eGFR, which is taken into account in the random intercepts model. The eGFR slope of Subject 2 deviates from the population-averaged eGFR slope and is taken into account by the random slopes model. Second, using linear regression, estimated individual slopes might be accurate for patients with many repeated eGFR values available during the whole follow-up, but it will result in less accurate estimated slopes for patients with only a few values available. 
Again, the individual slopes in linear regression are obtained by fitting a straight line through all available eGFR values over time for each individual. In Figure 1, all subjects (Subjects 1–3) have the same annual eGFR decline of 2 mL/min/1.73 m2 as estimated by the linear regression model. However, Subject 3 only has three eGFR measurements available compared with five eGFR values available for Subjects 1 and 2. All values are collapsed into one summarized eGFR decline, thus the variability in the number of values between individuals is ignored. Importantly, the LMM takes this variation in the number of eGFR values between individuals into account due to the fact that individuals with more eGFR values available contribute more to the overall population mean than individuals with fewer eGFR values available. Third, linear regression does not take into account whether the follow-up period is ended earlier than intended due to dropout for a certain individual when estimating the population-averaged slope. Individual slopes in linear regression are obtained by fitting a straight line through all available eGFR values over time within each individual, ignoring whether follow-up was complete or not. When an individual drops out, the observed slope is extrapolated over the complete study period. This can result in biased estimates. For each individual observed, eGFR values could deviate from the true underlying eGFR value due to random measurement errors or random noise. In general, some of the observed eGFR values are higher or lower than the true eGFR value (Figure 3). In addition, repeated eGFR values could be missing for several reasons. The reasons for missing data are formally described by the missing data mechanism. In practice, three mechanisms can be distinguished: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) [17−19]. MCAR applies when missingness is unrelated to the outcome of interest, e.g. 
relocation or device malfunction. In this case, the observed data are a random sample of the target population and unbiased estimates can be obtained even when using linear regression on individual slopes. However, such a mechanism is hardly likely to hold in practice. Instead, MAR is more realistic to apply in practice. Under MAR, the reason for dropout is related to previously observed eGFR values. In this case, the observed data cannot be considered as a random sample from the target population anymore. Thus the use of linear regression on individual slopes will lead to biased estimates. In contrast, unbiased estimates are obtained using LMMs. Especially when the observed eGFR value is lower than the true eGFR value, the estimated kidney function decline will be overestimated using linear regression on individual slopes. This is reflected in a frequent clinical scenario where the observed low eGFR value could be a reason for starting renal replacement therapy and thus for dropout of a patient from the study (based on previously observed eGFR values). Importantly, instead of extrapolating individual slopes based only on measurements of that individual, the LMM estimates the individual slope also based on complete observed data of other similar individuals in the data set. In this way, the LMM is able to take the dropout into account. The anticipated result of using LMMs is that an overall eGFR decline at the population level is obtained closer to the true eGFR slope than linear regression. In longitudinal studies with high dropout rates, especially early in follow-up, LMM will provide more accurate eGFR declines than linear regression [20, 21]. This is reflected in our example: the adjusted additional change in annual eGFR decline was 2.05 (95% CI 1.44–2.66) for individuals with a DBP ≥80 mmHg compared with individuals with a DBP <80 mmHg and 1.70 (95% CI 0.90–2.51)  mL/min/1.73 m2 using the two-stage linear regression approach and the LMM, respectively. 
Clearly, in this example, the obtained additional annual eGFR decline is overestimated using linear regression, due to a dropout rate of 22% after 1 year. The last possible missing data mechanism, MNAR, applies when the reason for dropout is related to unobserved eGFR values, e.g. a patient is lost to follow-up due to an improvement or deterioration of her condition, which we never got the chance to measure. In this case, neither the linear regression on individual slopes nor LMMs will provide valid results. More sophisticated methods of analysis are required in this case [22]. However, this mechanism is unlikely to hold in clinical practice. FIGURE 3: View largeDownload slide Illustration of the conceptual difference in dealing with dropout using linear regression on individual slopes and the LMM. Suppose an individual with dropout after 1 year and the illustrated eGFR values: the squared boxes are the observed eGFR values, with the second value randomly low compared with the true underlying eGFR decline. Due to the extrapolation of the observed eGFR slope from an individual after dropout by linear regression, the overall eGFR decline will be overestimated. The LMM is able to take the dropout into account and provides an eGFR decline closer to the true kidney function decline. FIGURE 3: View largeDownload slide Illustration of the conceptual difference in dealing with dropout using linear regression on individual slopes and the LMM. Suppose an individual with dropout after 1 year and the illustrated eGFR values: the squared boxes are the observed eGFR values, with the second value randomly low compared with the true underlying eGFR decline. Due to the extrapolation of the observed eGFR slope from an individual after dropout by linear regression, the overall eGFR decline will be overestimated. The LMM is able to take the dropout into account and provides an eGFR decline closer to the true kidney function decline. 
Fourth, as we saw in the example above using linear regression, an individual slope could only be estimated in the presence of at least two eGFR values. Patients with only one eGFR value available are therefore excluded from the analysis [23]. However, these values could also contribute to a better estimation of the intercept of the fitted line, which represents the eGFR decline. The omission of these values will reduce the sample size for the analysis and may introduce selection bias. Selection bias in linear regression on individual slopes could lead to either an overestimation or underestimation of the true underlying kidney function decline. An overestimation could occur when patients with at least two eGFR values have a worse prognosis, a reason that eGFR is more often estimated, than patients with one eGFR value. In contrast, an underestimation could occur when the former patients have a better prognosis and if, for instance, patients with only one eGFR value died prior to the next eGFR value. However, using the LMM allows us to include those patients with only one eGFR value available, thereby fully using the sample size and eliminating selection bias for estimating the eGFR trajectory over time at the population level. In our example, this resulted in the inclusion of 416 patients instead of 271 patients. Coincidentally, the obtained results are closer together using linear regression on individual slopes in 271 patients compared with LMMs in 416 patients, but of course we have to keep in mind that linear regression only includes a subgroup of the study population used in the LMM. Of note, the results based on the LMM in the full sample of 416 patients and the linear regression on individual slopes in 271 patients should not be compared. 
If linear regression could be performed in the full sample of 416 patients, an even higher additional change in eGFR decline than 2.05 mL/min/1.73 m²/year would likely have been obtained; however, it is impossible to estimate this.

CONCLUSIONS
We aimed to raise awareness of the distinction between the LMM and linear regression analysis on individual slopes for the purpose of estimating kidney function decline over time. The LMM is the preferred and recommended model for research questions regarding eGFR trajectories over time at the population level. Dropouts and heterogeneity in the number of eGFR values between individuals are accurately handled by LMMs. Also, individual differences in both baseline eGFR and eGFR slopes are taken into account by the fixed and random effects in LMMs.

SUPPLEMENTARY DATA
Supplementary data are available at ndt online.

CONFLICT OF INTEREST STATEMENT
None declared. The results presented in this article have not been published previously in whole or part.

REFERENCES
1. Navaneethan SD, Schold JD, Jolly SE et al. Diabetes control and the risks of ESRD and mortality in patients with CKD. Am J Kidney Dis 2017; 70: 191–198
2. Qureshi S, Lorch R, Navaneethan SD. Blood pressure parameters and their associations with death in patients with chronic kidney disease. Curr Hypertens Rep 2017; 19: 92
3. Bell EK, Gao L, Judd S et al. Blood pressure indexes and end-stage renal disease risk in adults with chronic kidney disease. Am J Hypertens 2012; 25: 789–796
4. de Goeij MC, Voormolen N, Halbesma N et al. Association of blood pressure with decline in renal function and time until the start of renal replacement therapy in pre-dialysis patients: a cohort study. BMC Nephrol 2011; 12: 38
5. Peralta CA, Norris KC, Li S et al. Blood pressure components and end-stage renal disease in persons with chronic kidney disease: the Kidney Early Evaluation Program (KEEP). Arch Intern Med 2012; 172: 41–47
6. Rifkin DE, Katz R, Chonchol M et al. Blood pressure components and decline in kidney function in community-living older adults: the Cardiovascular Health Study. Am J Hypertens 2013; 26: 1037–1044
7. Sood MM, Akbari A, Manuel D et al. Time-varying association of individual BP components with eGFR in late-stage CKD. Clin J Am Soc Nephrol 2017; 12: 904–911
8. Anderson AH, Yang W, Townsend RR et al. Time-updated systolic blood pressure and the progression of chronic kidney disease. A cohort study. Ann Intern Med 2015; 162: 258–265
9. Kovesdy CP, Lu JL, Molnar MZ et al. Observational modeling of strict vs conventional blood pressure control in patients with chronic kidney disease. JAMA Intern Med 2014; 174: 1442–1449
10. Cummings DM, Larsen LC, Doherty L et al. Glycemic control patterns and kidney disease progression among primary care patients with diabetes mellitus. J Am Board Fam Med 2011; 24: 391–398
11. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Hoboken, NJ: John Wiley & Sons, 2004: 99–102
12. de Goeij MC, Rotmans JI, Matthijssen X et al. Lipid levels and renal function decline in pre-dialysis patients. Nephron Extra 2015; 5: 19–29
13. Nacak H, van Diepen M, de Goeij MC et al. Uric acid: association with rate of renal function decline and time until start of dialysis in incident pre-dialysis patients. BMC Nephrol 2014; 15: 91
14. Levey AS, Stevens LA. Estimating GFR using the CKD Epidemiology Collaboration (CKD-EPI) creatinine equation: more accurate GFR estimates, lower CKD prevalence estimates, and better risk predictions. Am J Kidney Dis 2010; 55: 622–627
15. Pfister R, Schwarz K, Carson R et al. Easy methods for extracting individual regression slopes: comparing SPSS, R, and Excel. Tutor Quant Methods Psychol 2013; 9: 72–78
16. van Diepen M, Ramspek CL, Jager KJ et al. Prediction versus aetiology: common pitfalls and how to avoid them. Nephrol Dial Transplant 2017; 32(Suppl 2): ii1–ii5
17. de Goeij MC, van Diepen M, Jager KJ et al. Multiple imputation: dealing with missing data. Nephrol Dial Transplant 2013; 28: 2415–2420
18. Graham JW. Missing data analysis: making it work in the real world. Annu Rev Psychol 2009; 60: 549–576
19. Leffondre K, Boucquemont J, Tripepi G et al. Analysis of risk factors associated with renal function trajectory over time: a comparison of different statistical approaches. Nephrol Dial Transplant 2015; 30: 1237–1243
20. Diggle PJ, Heagerty P, Liang KY et al. Analysis of Longitudinal Data. Oxford: Oxford University Press, 2002
21. Fitzmaurice G, Davidian M, Verbeke G et al. Longitudinal Data Analysis. New York, NY: Chapman & Hall/CRC, 2008
22. Tsonaka R, Verbeke G, Lesaffre E. A semi-parametric shared parameter model to handle nonmonotone nonignorable missingness. Biometrics 2009; 65: 81–87
23. Thiebaut R, Walker S. When it is better to estimate a slope with only one point. QJM 2008; 101: 821–824

© The Author(s) 2018. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Nephrology Dialysis Transplantation, Oxford University Press

Published: May 22, 2018
