Nephrology Dialysis Transplantation, Volume Advance Article (6) – Jul 8, 2017

3 pages

/lp/ou_press/cautionary-note-propensity-score-matching-does-not-account-for-bias-MsORljFwB1

- Publisher: Oxford University Press
- Copyright: © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
- ISSN: 0931-0509
- eISSN: 1460-2385
- DOI: 10.1093/ndt/gfx198

ABSTRACT

This article reviews the limitations of propensity score matching as a tool for confounding control in the presence of censoring. Using an illustrative simulation study, we emphasize the importance of explicit adjustment for selective loss to follow-up and explain how this may be achieved.

Keywords: censoring, collider stratification bias, propensity score matching

In epidemiological research, valid causal inference is often hampered by confounding and selective loss to follow-up. Confounding is increasingly addressed by means of propensity score (PS) matching. The analysis of a PS-matched data set closely resembles that of a randomized controlled trial (RCT): one expects that, on average, the distribution of covariates will be similar between treatment groups after PS matching or randomization, so that, in the absence of other forms of bias, systematic differences in outcomes between treatment groups can be attributed to treatment. Importantly, like randomization in an RCT [1], PS matching typically does not account for selective loss to follow-up, and the confounder balance that was achieved through PS matching (or randomization) may falsely reassure researchers and readers that the exposure groups under study were, and remained, comparable. However, selective loss to follow-up can potentially be remedied by the same methods that have been proposed to address the problem in RCTs, namely inverse probability weighting (IPW), multiple imputation or regression adjustment [1].

TWO EXAMPLES

In a study on the dose–response relationship between sulphonylurea (SU) derivatives and major adverse cardiovascular events in elderly patients with type 2 diabetes, patients were censored if they switched their treatment regimen [2].
Matching on a high-dimensional PS created treatment groups (high- and low-dose SU) that were very similar in terms of baseline characteristics, including those reflecting disease severity, co-medication use and comorbidity state. However, those who switched treatments at any point during follow-up may represent a selective subset, for example because switching occurred more often among those who used more concomitant medication. Over time, this may have distorted the balance in co-medication that was initially achieved through PS matching.

Another example is a study comparing outcomes between incremental and thrice-weekly initiation of haemodialysis [3]. Following PS matching, the groups were similar in terms of a number of baseline characteristics, including age, sex and primary renal disease. However, approximately half of the participants were lost to follow-up at 12 months. Again, this may have induced a selection bias if the loss to follow-up affected the treatment groups differentially.

AN ILLUSTRATION OF THE PROBLEM

Through a small simulation study, we illustrate the effect of ignoring selectively missing outcomes, while focusing on PS matching to control for confounding. Throughout, we assume exchangeability for treatment and censoring, consistency, no model misspecification and positivity, so that the observed covariates are sufficient to adjust for both confounding and selection bias due to loss to follow-up [4–6].

For this illustration, we consider a hypothetical setting representing an observational study of a binary treatment variable T, a binary outcome variable Y and a trichotomous confounder X. The probability of a subject dropping out before their outcome could be assessed depends on both T and X. Data were generated for 10 000 subjects using the mechanism detailed in the Supplementary material. The estimand of interest is the average treatment effect on the treated (ATT), expressed as a marginal odds ratio (OR); its true value is 2.
However, in this observational setting, causal inference is hampered by confounding. This motivates the use of PS matching, which typically provides an estimate of the ATT [7]. Here, PSs were estimated by a logistic regression of T on X. We then matched treated to untreated subjects on the estimated PSs, with replacement. As an alternative to PS matching for estimating the ATT, we also used IPW, with weights of 1 for treated subjects and PS/(1 – PS) for untreated subjects. Treatment effects were estimated by applying a logistic regression to the matched or weighted pseudopopulations. We refer to these approaches as PS1 and IPW1, respectively. This procedure was repeated 1000 times. Bias was estimated on the log-OR scale as the average deviation from the true log-OR, log 2.

The results in Table 1 show that both PS1 and IPW1 yielded substantial bias. The reason for this bias is apparent from Figure 1, which depicts the balance in the population before and after matching and/or weighting. Although PS1 and IPW1 are suited to balance confounders (Figure 1A and B), as subjects are lost to follow-up, the balance achieved through matching or weighting is not guaranteed to be maintained in the data set used for the analysis (Figure 1C). In fact, since the probability of dropping out depends on both T and X, conditioning on not being lost to follow-up (i.e. performing an analysis on those subjects for whom an outcome is observed) induces an association between X and T [8, 9], thereby biasing the relation between T and Y through what is formally known as collider stratification bias.
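The loss of balance described above can be reproduced with a small exact calculation. The Python sketch below is not the authors' simulation (their data-generating mechanism is in the Supplementary material): the distribution of X, the propensity scores and the censoring probabilities are made-up values chosen only for illustration, and the function names are hypothetical. It applies the ATT weights of 1 and PS/(1 – PS) and compares the weighted distribution of X between treatment groups, first in everyone and then in uncensored subjects only.

```python
# Hypothetical population-level illustration; all numbers are assumptions,
# not the values used in the article's simulation.
P_X = [0.5, 0.3, 0.2]            # P(X = x) for the trichotomous confounder
PS = [0.2, 0.4, 0.6]             # P(T = 1 | X = x), the propensity score
PC = {0: [0.1, 0.2, 0.3],        # P(censored | T = 0, X = x)
      1: [0.2, 0.4, 0.6]}        # P(censored | T = 1, X = x)


def att_weight(t, x):
    """ATT weight: 1 for treated, PS/(1 - PS) for untreated subjects."""
    return 1.0 if t == 1 else PS[x] / (1.0 - PS[x])


def weighted_x_distribution(t, drop_censored):
    """Weighted distribution of X within treatment group t."""
    masses = []
    for x in range(len(P_X)):
        p_treatment = PS[x] if t == 1 else 1.0 - PS[x]
        p_observed = (1.0 - PC[t][x]) if drop_censored else 1.0
        masses.append(P_X[x] * p_treatment * p_observed * att_weight(t, x))
    total = sum(masses)
    return [m / total for m in masses]


def max_imbalance(drop_censored):
    """Largest absolute difference in the X distribution between groups."""
    treated = weighted_x_distribution(1, drop_censored)
    untreated = weighted_x_distribution(0, drop_censored)
    return max(abs(a - b) for a, b in zip(treated, untreated))


if __name__ == "__main__":
    print("imbalance, everyone:       ", round(max_imbalance(False), 4))
    print("imbalance, uncensored only:", round(max_imbalance(True), 4))
```

With these assumed values, the weighted groups agree on X when everyone is included and disagree once censored subjects are dropped, which mirrors the step from panel (B) to panel (C) of Figure 1.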
Table 1 Performance of IPW and PS matching estimators

| Estimator | Bias (95% CI) | OR |
|---|---|---|
| PS1 | −0.134 (−0.139 to −0.129) | 1.749 |
| IPW1 | −0.135 (−0.139 to −0.130) | 1.748 |
| PS2 | 0.002 (−0.003 to 0.008) | 2.004 |
| IPW2 | 0.002 (−0.003 to 0.007) | 2.003 |

CI, confidence interval. For definitions of PS1, IPW1, PS2 and IPW2, see text. Bias was estimated by the average deviation of the estimated log-ORs $\hat{\beta}$ from the true effect $\beta = \log 2$ across 1000 simulated samples. 95% CI $= \bar{\hat{\beta}} - \beta \pm 1.96\sqrt{\hat{\sigma}^2/1000}$, where $\hat{\sigma}^2$ is the empirical variance of $\hat{\beta}$. OR $= \exp \bar{\hat{\beta}}$ (true OR = 2).

FIGURE 1 Balance on the confounder X across treatment groups in a hypothetical setting. The untreated group is represented in grey; the treated group in white. Frequencies are relative to treatment (treated/untreated) group size; hence, equally sized bars indicate confounder balance. In the following, PS and PC denote the PS and the probability of censoring (being lost to follow-up) given T and X, respectively. Panel (A) shows the balance in the original unweighted population. Reweighting observations using weights of 1 and PS/(1 – PS) for treated and untreated subjects, respectively, results in the balance shown in (B). The same result is obtained by matching treated subjects to untreated subjects with similar PS. Removing observations with censored outcomes from this inverse probability weighted or PS-matched data set results in imbalance (C). The balance shown in (D) is obtained by weighting the original observations with 1/(1 – PC) and PS/[(1 – PS)(1 – PC)] for treated and untreated subjects, respectively, and conditioning on uncensored observations. The same result is obtained by reweighting the PS-matched data set by 1/(1 – PC) for each subject.

To account for selective loss to follow-up, we applied inverse probability of censoring weighting (IPCW) [4–6].
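As a minimal sketch of how the censoring-adjusted weights from the Figure 1 caption, 1/(1 – PC) and PS/[(1 – PS)(1 – PC)], act on the effect estimate, the Python code below computes population-level odds ratios exactly rather than by simulation. Everything numeric here is an assumption for illustration (the distribution of X, the PS and PC values and the outcome model `p_outcome`), and censoring is assumed to be independent of Y given T and X; none of it is taken from the article's Supplementary material.

```python
import math

# Hypothetical values (assumptions for illustration only).
P_X = [0.5, 0.3, 0.2]                          # P(X = x)
PS = [0.2, 0.4, 0.6]                           # P(T = 1 | X = x)
PC = {0: [0.1, 0.2, 0.3], 1: [0.2, 0.4, 0.6]}  # P(censored | T = t, X = x)


def p_outcome(t, x):
    """Assumed outcome model: P(Y = 1 | T = t, X = x)."""
    return 1.0 / (1.0 + math.exp(-(-2.0 + 0.8 * t + 0.7 * x)))


def odds(p):
    return p / (1.0 - p)


def true_att_or():
    """Marginal OR with X distributed as it is among the treated."""
    mass = [P_X[x] * PS[x] for x in range(len(P_X))]
    p1 = sum(m * p_outcome(1, x) for x, m in enumerate(mass)) / sum(mass)
    p0 = sum(m * p_outcome(0, x) for x, m in enumerate(mass)) / sum(mass)
    return odds(p1) / odds(p0)


def complete_case_or(adjust_for_censoring):
    """Weighted OR among uncensored subjects.

    With T as the only (binary) regressor, a weighted logistic regression
    of Y on T returns this ratio of weighted odds.
    """
    risk = {}
    for t in (0, 1):
        events = total = 0.0
        for x in range(len(P_X)):
            p_treatment = PS[x] if t == 1 else 1.0 - PS[x]
            weight = 1.0 if t == 1 else PS[x] / (1.0 - PS[x])
            if adjust_for_censoring:
                weight /= 1.0 - PC[t][x]      # extra factor 1/(1 - PC)
            mass = P_X[x] * p_treatment * (1.0 - PC[t][x]) * weight
            total += mass
            events += mass * p_outcome(t, x)
        risk[t] = events / total
    return odds(risk[1]) / odds(risk[0])


if __name__ == "__main__":
    print("true ATT OR:              ", round(true_att_or(), 3))
    print("ATT weights only:         ", round(complete_case_or(False), 3))
    print("ATT and censoring weights:", round(complete_case_or(True), 3))
```

In this made-up configuration, the analysis with ATT weights alone misses the ATT odds ratio, whereas adding the factor 1/(1 – PC) reproduces it, because the factor cancels the probability of remaining uncensored within each combination of T and X.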
In this simple setting with only one point of follow-up, the IPCW weights reduce to the inverse probability of not being lost to follow-up (i.e. of not being censored). Probabilities of censoring (PCs) were estimated by a logistic regression of C, a censoring indicator, on T and X, applied to the original data sets. We then applied two additional estimators, PS2 and IPW2. In PS2, the matched sets obtained through PS1 were additionally weighted by 1/(1 – PC) for each subject. In IPW2, the weights 1/(1 – PC) for treated subjects and PS/[(1 – PS)(1 – PC)] for untreated subjects were applied to the original data sets, and only subjects with observed outcomes were included in the analysis. Again, treatment effects were estimated by applying a logistic regression to the matched and/or weighted pseudopopulations. The results in Table 1 show that both PS2 and IPW2 yielded estimates that were, on average, very close to the true effect. The reason is that PS2 and IPW2 remove the imbalance that resulted from conditioning on not being lost to follow-up: observations are reweighted such that X and T are no longer associated and X takes the distribution it has among the treated subjects (Figure 1D).

COVARIATE IMBALANCE IN THE ABSENCE OF CENSORING

It should be borne in mind that with two or more points of follow-up, covariate imbalance can develop even in the absence of censoring, that is, even when no subject leaves the risk set for reasons other than sustaining the outcome of interest. Conditioning on past survival by restricting the analysis to those individuals who have not yet sustained the outcome may induce an association between treatment and marginally independent covariates if past survival is a common effect of both [9–13]. If these covariates are also predictive of survival at a subsequent point of follow-up, this conditioning may therefore open a backdoor path, thereby inducing a selection bias.
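The mechanism just described can be checked with a short exact calculation. The Python sketch below assumes a hypothetical randomized setting with a binary covariate X that is independent of treatment T at baseline; the survival probabilities in `p_survive` are invented for illustration and do not come from the article.

```python
# Hypothetical randomized setting; all probabilities are assumptions.
P_X1 = 0.5          # P(X = 1), independent of T at baseline


def p_survive(t, x):
    """Assumed P(event-free at the first follow-up | T = t, X = x)."""
    return 0.9 - 0.3 * t - 0.3 * x


def p_x1_given_t(t, survivors_only):
    """P(X = 1 | T = t), optionally restricted to first-interval survivors."""
    keep_x1 = p_survive(t, 1) if survivors_only else 1.0
    keep_x0 = p_survive(t, 0) if survivors_only else 1.0
    mass_x1 = P_X1 * keep_x1
    mass_x0 = (1.0 - P_X1) * keep_x0
    return mass_x1 / (mass_x1 + mass_x0)


if __name__ == "__main__":
    for t in (0, 1):
        print("T =", t,
              "| baseline P(X=1):", round(p_x1_given_t(t, False), 3),
              "| among survivors:", round(p_x1_given_t(t, True), 3))
```

At baseline X is distributed identically in both arms, but among those still event-free at the first follow-up the arms differ on X, even though nobody was censored. If `p_survive` is replaced by a product of a treatment term and a covariate term, the two conditional probabilities coincide in this calculation, in line with the "may" in the text.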
Thus neither RCTs nor PS matching nor weighting analyses are guaranteed to be free of selection bias, because such selection occurs after baseline imbalances have been removed through randomization, matching or weighting.

CONCLUSION

PS methods have gained increasing interest as a means to adjust for confounding [14]. However, as illustrated, PS matching does not account for bias due to censoring. In fact, the balance of confounders across exposure groups that was achieved by PS matching may be ruined by selective censoring. This problem can potentially be remedied by IPW (as shown here), multiple imputation or regression adjustment. It is important to be aware, however, that in contrast to PS matching and IPW, the estimand of conventional multivariable regression analysis is not typically a marginal effect such as the ATT. Also, our simulations were done under the assumption that the censoring mechanism was independent of the outcome. Importantly, none of the above methods is suited to solve the problem of censored data when the missingness depends on unobserved variables that are predictive of the outcome or on the outcome itself. It is only when the missingness can be explained by observed data, as in our illustration, that such biases may be adequately addressed by one of the above methods. If loss to follow-up is a completely random process, the confounder balance that was achieved by PS matching is expected to be preserved, and a conventional analysis of those for whom an outcome was observed will still be appropriate.

SUPPLEMENTARY DATA

Supplementary data are available online at http://ndt.oxfordjournals.org.

FUNDING

R.H.H.G. was funded by the Netherlands Organization for Scientific Research (NWO-Vidi project 917.16.430). The views expressed in this article are those of the authors and not necessarily those of any funding body.

CONFLICT OF INTEREST STATEMENT

None declared.

REFERENCES

1. Groenwold RH, Moons KG, Vandenbroucke JP. Randomized trials with missing outcome data: how to analyze and what to report. CMAJ 2014; 186: 1153–1157
2. Abdelmoneim AS, Eurich DT, Senthilselvan A et al. Dose-response relationship between sulfonylureas and major adverse cardiovascular events in elderly patients with type 2 diabetes. Pharmacoepidemiol Drug Saf 2016; 25: 1186–1195
3. Park JI, Park JT, Kim YL et al. Comparison of outcomes between the incremental and thrice-weekly initiation of hemodialysis: a propensity-matched study of a prospective cohort in Korea. Nephrol Dial Transplant 2017; 32: 355–363
4. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550–560
5. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000; 11: 561–570
6. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol 2008; 168: 656–664
7. Williamson E, Morley R, Lucas A et al. Propensity scores: from naive enthusiasm to intuitive understanding. Stat Methods Med Res 2012; 21: 273–293
8. Pearl J. Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press, 2009.
9. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004; 15: 615–625
10. Hernán MA. The hazards of hazard ratios. Epidemiology 2010; 21: 13–15
11. Aalen OO, Cook RJ, Røysland K. Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Anal 2015; 21: 579–593
12. Sjölander A, Dahlqwist E, Zetterqvist J. A note on the noncollapsibility of rate differences and rate ratios. Epidemiology 2016; 27: 356–359
13. Hernán MA, Robins JM. Chapter 8: Selection Bias. In: Causal Inference. Ed. Hernán MA, Robins JM. Boca Raton, FL: Chapman & Hall/CRC, 2017 (forthcoming). https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ (9 May 2017, date last accessed)
14. Stürmer T, Joshi M, Glynn RJ et al. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol 2006; 59: 437.e1–437.e24

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
