Abstract

Context: Theory and simulation suggest that randomized controlled trials (RCTs) stopped early for benefit (truncated RCTs) systematically overestimate treatment effects for the outcome that precipitated early stopping.

Objective: To compare the treatment effect from truncated RCTs with that from meta-analyses of RCTs addressing the same question but not stopped early (nontruncated RCTs) and to explore factors associated with overestimates of effect.

Data Sources: Search of MEDLINE, EMBASE, Current Contents, and full-text journal content databases to identify truncated RCTs up to January 2007; search of MEDLINE, Cochrane Database of Systematic Reviews, and Database of Abstracts of Reviews of Effects to identify systematic reviews from which individual RCTs were extracted up to January 2008.

Study Selection: Selected studies were RCTs reported as having stopped early for benefit and matching nontruncated RCTs from systematic reviews. Independent reviewers with medical content expertise, working blinded to trial results, judged the eligibility of the nontruncated RCTs based on their similarity to the truncated RCTs.

Data Extraction: Reviewers with methodological expertise conducted data extraction independently.

Results: The analysis included 91 truncated RCTs asking 63 different questions and 424 matching nontruncated RCTs. The pooled ratio of relative risks in truncated RCTs vs matching nontruncated RCTs was 0.71 (95% confidence interval, 0.65-0.77). This difference was independent of the presence of a statistical stopping rule and of the methodological quality of the studies as assessed by allocation concealment and blinding. Large differences in treatment effect size between truncated and nontruncated RCTs (ratio of relative risks <0.75) occurred with truncated RCTs having fewer than 500 events. In 39 of the 63 questions (62%), the pooled effects of the nontruncated RCTs failed to demonstrate significant benefit.
Conclusions: Truncated RCTs were associated with greater effect sizes than RCTs not stopped early. This difference was independent of the presence of statistical stopping rules and was greatest in smaller studies.

Although randomized controlled trials (RCTs) generally provide credible evidence of treatment effects, multiple problems may emerge when investigators terminate a trial earlier than planned,1 especially when the decision to terminate the trial is based on the finding of an apparently beneficial treatment effect. Bias may arise because large random fluctuations of the estimated treatment effect can occur, particularly early in the progress of a trial.2 When investigators stop a trial based on an apparently beneficial treatment effect, their results may therefore provide misleading estimates of the benefit.3,4 Statistical modeling suggests that RCTs stopped early for benefit (truncated RCTs) will systematically overestimate treatment effects,5 and empirical data demonstrate that truncated RCTs often show implausibly large treatment effects.6 Empirical evidence addressing the magnitude of the bias from stopping early, and the factors that may influence that magnitude, remains limited, and the appropriate interpretation of truncated RCTs remains a matter of controversy.6-11 We therefore undertook a systematic review to determine the treatment effect from truncated RCTs compared with meta-analyses of RCTs addressing the same research question that were not stopped early (nontruncated RCTs) and to explore factors associated with overestimates of effect.

Methods

A prior report provides a detailed description of the design and methods of this study (Study of Trial Policy of Interim Truncation-2 [STOPIT-2]).12 In summary, we conducted extensive literature searches to identify truncated RCTs and systematic reviews addressing the same question.
We retrieved all RCTs included in the systematic reviews, extracted data and conducted new meta-analyses of the nontruncated RCTs addressing the outcome that led to the early termination of the truncated RCTs, and compared the relative risk (RR) generated by the truncated RCTs with the RR from all matching nontruncated RCTs.

Literature Search

We updated the database from our prior study following the same search strategy.6 In January 2007 we searched MEDLINE, EMBASE, Current Contents, and full-text journal content databases from their inception for truncated RCTs. In addition, we identified truncated RCTs through hand searching, by personal contact with trial investigators, and by a citation search linked to 2 key articles.6,13 For systematic reviews, we searched the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects, and MEDLINE from their inception to January 2008.

Eligibility Criteria for Truncated RCTs and Matching Systematic Reviews

We included RCTs of any intervention reported as having stopped earlier than initially planned owing to interim results in favor of the intervention. We excluded matching systematic reviews that did not have a methods section and did not describe a literature search that, at minimum, included MEDLINE.12

Identification, Retrieval, and Eligibility of Nontruncated RCTs

We retrieved the full text of all RCTs included in each systematic review. If a systematic review was published prior to the matching truncated RCT and thus did not include the truncated RCT, we updated this review.12 Eligible nontruncated RCTs addressed the outcome that led to the early termination of the truncated RCT and stated clearly that allocation was randomized. We assessed the eligibility of nontruncated RCTs based on their similarity to the question addressed by the matching truncated RCT (see Briel et al12 for details).
Teams of 2 reviewers with relevant clinical expertise made independent eligibility and similarity decisions and resolved disagreement by discussion and, if necessary, by consulting a third party. Reviewers who judged eligibility were blinded to the results of the trials through electronic or manual masking.12

Data Extraction and Analysis

Working in pairs, reviewers with methodological expertise conducted data extraction independently.12 From each RCT (truncated or nontruncated), we collected information about early termination, the journal of publication (we categorized Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine as high-impact journals), the year of publication, methodological quality, data monitoring committees, stopping rules at the outset of the trial, interim analyses, and the measure of treatment effect for the outcome that terminated the truncated RCT. The only study characteristic tested for a statistically significant difference between truncated and nontruncated RCTs was publication in a high-impact journal.

We calculated an RR for each RCT in our study. For studies that provided results using continuous data, we estimated an approximate dichotomous equivalent.12,14 For each question, we used meta-analysis to calculate a pooled RR and 95% confidence interval (CI) for all nontruncated RCTs. If more than 1 truncated RCT addressed the same question, we calculated a pooled RR and CI for those truncated RCTs. Pooled estimates of RRs were calculated using an inverse-variance weighted random-effects model. We performed a z test for each meta-analysis to assess differences between the truncated and nontruncated RCTs with respect to their pooled RRs. As a summary measure we calculated a ratio of RRs, and its logarithm, for each question as follows:

log[ratio of RRs] = log[RR of truncated RCT(s)/pooled RR of nontruncated RCTs] = log[RR of truncated RCT(s)] − log[pooled RR of nontruncated RCTs]
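As an illustrative sketch (not the authors' code; the function names and the normal-approximation P value are assumptions), the per-question summary and z test described above might look like this:

```python
import math

def log_ratio_of_rrs(rr_truncated, pooled_rr_nontruncated):
    # log[ratio of RRs] = log[RR of truncated RCT(s)]
    #                     - log[pooled RR of nontruncated RCTs]
    return math.log(rr_truncated) - math.log(pooled_rr_nontruncated)

def z_test_log_rrs(log_rr_truncated, se_truncated,
                   log_rr_nontruncated, se_nontruncated):
    """Two-sided z test comparing two independent log relative risks."""
    z = (log_rr_truncated - log_rr_nontruncated) / math.sqrt(
        se_truncated ** 2 + se_nontruncated ** 2)
    # Two-sided P value from the standard normal distribution
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

For example, a truncated RR of 0.5 against a pooled nontruncated RR of 0.8 gives log(0.5/0.8), ie, a ratio of RRs of 0.625.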
We estimated the pooled log[ratio of RRs] using a random-effects inverse-variance meta-analysis and then, for purposes of presentation, back transformed to the overall ratio of RRs. To explore factors associated with the magnitude of the ratio of RRs, we performed a meta-regression analysis in which the dependent variable was the log[ratio of RRs] and independent variables were whether the truncated RCTs used a formal stopping rule and the number of outcome events in the truncated RCTs. When more than 1 truncated RCT addressed the same question, the stopping rule status was assigned to "has a rule" if at least 1 truncated RCT had a rule. Similarly, when there was more than 1 truncated RCT for the same question, we used the truncated RCT with the largest number of events as the source for our analyses of the influence of the number of events.

To allow consideration of methodological quality as a predictor and to test whether restriction to nontruncated RCTs that are more similar to the truncated RCTs would change the results, we constructed a second meta-regression described fully in a prior report.12 In brief, this meta-regression used a hierarchical model with 2 levels: individual RCT (study) level and meta-analysis (question) level. The dependent variable in this analysis was the logarithm of the RR for each study. Predictor variables considered included a combined group variable (truncated RCT with a rule, truncated RCT without a rule, nontruncated RCT), number of events, concealment of allocation, use of blinding, and the interaction between the group variable and the other variables. We performed this meta-regression on different data sets based on various thresholds for the similarity of the nontruncated RCTs in each question to the matching truncated RCTs.
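A minimal sketch of the pooling step can make the inverse-variance weighted random-effects model concrete. The report does not name the specific between-study variance estimator, so the common DerSimonian-Laird estimator is an assumption here:

```python
import math

def pool_ratio_of_rrs(log_ratios, variances):
    """Pool per-question log[ratio of RRs] values with an inverse-variance
    weighted random-effects meta-analysis (DerSimonian-Laird tau^2, an
    assumption), then back transform to a ratio of RRs with a 95% CI.
    Assumes at least 2 questions."""
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_ratios)) / sum(w)
    # Cochran's Q and the between-question variance tau^2
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ratios))
    df = len(log_ratios) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ratios)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # Back transform the pooled log ratio and its 95% CI
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))
```

When every question shows no difference (all log ratios 0), the pooled ratio of RRs is 1.0, as expected.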
To test for an order effect (the hypothesis being that studies published earlier will have more responsive populations), for each review question we established where in the sequence of published studies (by publication date) the truncated RCT stood and referred to this as the "rank" of the truncated RCT. We then calculated a "standardized rank" [100 × (rank − 1)/(total number of studies − 1)]. If there was more than 1 truncated RCT in the review question, we used the median among the truncated RCTs as the standardized rank of the truncated RCTs. We then divided the review questions into 2 groups (those with the standardized rank of the truncated RCT equal to or less than 50 [n = 27] and those with the standardized rank of the truncated RCT greater than 50 [n = 37]) and repeated the meta-analysis for each group.

As a secondary analysis, we compared the RR of the truncated RCT(s) with the pooled estimate for all trials including the truncated RCTs. Analyses were performed using SAS version 9.2 (SAS Institute Inc, Cary, North Carolina); tests were 2-sided, and P < .05 was used as the threshold for statistical significance.

Results

Literature Search

A total of 195 truncated RCTs formed the basis for the search for systematic reviews; we identified matching systematic reviews for 79 questions. We extracted 2488 nontruncated RCTs from 202 matching systematic reviews (of which 32 were updated). Of these 2488 studies, 22 (0.9%) proved to be truncated RCTs, which we added to the truncated RCT database. We excluded 2012 nontruncated RCTs based on insufficient similarity to the truncated RCTs or unclear randomization and 30 because the RR could not be calculated. The remaining 424 nontruncated RCTs and 91 matching truncated RCTs addressed 63 questions (Figure 1). An eSupplement reporting the references of the included studies is available.

Study Characteristics

Table 1 describes the characteristics of the eligible studies.
Truncated RCTs were more likely than matching nontruncated RCTs to be published in high-impact journals (68% vs 30%, P < .001).

Quantification of Differences in Treatment Effect Size

Of 63 comparisons, the ratio of RRs was equal to or less than 1.0 in 55 (87%); the weighted average ratio of RRs was 0.71 (95% CI, 0.65-0.77; P < .001) (Figure 2). In 39 of 63 comparisons (62%), the pooled estimates for nontruncated RCTs were not statistically significant. Comparison of the truncated RCTs with all RCTs (including the truncated RCTs) demonstrated a weighted average ratio of RRs of 0.85; in 16 of 63 comparisons (25%), the pooled estimate failed to demonstrate a significant effect.

Determinants of Differences in Treatment Effect Size

Table 2 summarizes the findings from the single-level meta-regression analysis to determine predictors of differences in the treatment effect size between truncated and nontruncated RCTs. In the univariable models, both the number of events (P < .001) and the presence of a statistical stopping rule (P = .02) were significant. When we included both variables in the model, only the number of events remained significant (P < .001). The results from the multilevel meta-regression confirmed significant interactions between the combined group variable (truncated vs nontruncated RCT) and the number of events (P < .001). Large differences in treatment effect size between truncated and nontruncated RCTs (ratio of RRs <0.75) occurred in truncated RCTs with fewer than 500 events (Figure 3). The multilevel meta-regression analysis using the entire data set demonstrated that neither concealment of allocation (P = .96) nor blinding (P = .32) was a significant predictor of the differences in treatment effect size.
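To make the weighted average ratio of RRs concrete, a short sketch (illustrative only, using the reported 0.71) shows the truncated-RCT estimate implied by a given nontruncated RR:

```python
def implied_truncated_rr(nontruncated_rr, ratio_of_rrs=0.71):
    # ratio of RRs = RR(truncated) / RR(nontruncated), so
    # RR(truncated) = ratio of RRs * RR(nontruncated)
    return ratio_of_rrs * nontruncated_rr

# A nontruncated RR of 0.8 (a 20% relative risk reduction) implies a truncated
# RR of about 0.57, ie, roughly a 43% relative risk reduction.
# A nontruncated RR of 1.0 (no effect) implies a truncated RR of 0.71,
# ie, a 29% apparent relative risk reduction.
```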
Different Data Sets and Order of Publication

The findings were similar, irrespective of either the closeness of the match between nontruncated and truncated RCTs or the order of publication of the truncated RCTs relative to that of matching nontruncated RCTs. In the multilevel meta-regression analysis, adjusted ratios of RRs of truncated vs nontruncated RCTs were 0.64 when questions were very closely matched, 0.70 when they were moderately close, and 0.69 when they were least close. The ratio of RRs of the group in which the truncated RCTs were published in early years (standardized rank ≤50) was 0.74 (95% CI, 0.66-0.83) and for the later years (standardized rank >50) was 0.68 (95% CI, 0.60-0.77). The P value for the difference between the 2 estimates was .33.

Comment

Summary of Findings

In this empirical study including 91 truncated RCTs and 424 matching nontruncated RCTs addressing 63 questions, we found that truncated RCTs provide biased estimates of effects on the outcome that precipitated early stopping. On average, the ratio of RRs in the truncated RCTs and matching nontruncated RCTs was 0.71. This implies that, for instance, if the RR from the nontruncated RCTs was 0.8 (a 20% relative risk reduction), the RR from the truncated RCTs would be on average approximately 0.57 (a 43% relative risk reduction, more than double the estimate of benefit). Nontruncated RCTs with no evidence of benefit (ie, with an RR of 1.0) would on average be associated with a 29% relative risk reduction in truncated RCTs addressing the same question. In nearly two-thirds of comparisons, the pooled estimate for nontruncated RCTs failed to demonstrate a statistically significant effect. We found substantial heterogeneity in our analysis of the pooled ratio of RRs for truncated vs nontruncated RCTs, suggesting that differences between truncated and nontruncated RCT effects will differ across study questions.
This heterogeneity could be partially explained by the total number of outcome events in the truncated RCTs, with larger differences between truncated and nontruncated RCTs in studies with a smaller number of events. The methodological quality and the presence of a statistical stopping rule failed to predict the observed difference in the treatment effect.

Strengths and Limitations

We used rigorous search strategies and, blinded to trial results, undertook an intensive independent evaluation of the eligibility and similarity of several thousand RCTs. Our analysis had considerable statistical power to link the estimates of treatment effect from truncated and nontruncated RCTs addressing the same question and demonstrated consistent results across degrees of similarity of the question addressed by the truncated RCTs and the matching nontruncated RCTs.

Our literature search, while extensive, missed some truncated RCTs. Assessment of the 2488 RCTs included in the systematic reviews revealed 22 additional truncated RCTs not initially identified. Whether results would differ in other unidentified truncated RCTs remains speculative. We relied on systematic reviews to identify nontruncated RCTs but did not assess the reviews' susceptibility to publication bias. However, we know that trials with positive findings have nearly 4 times the odds of being published compared with those with negative findings.15 To the extent that publication bias is present, inclusion of unpublished studies would lead to a diminished pooled effect from the nontruncated RCTs. This would in turn likely lead to a larger gradient of effect between truncated and nontruncated RCTs. Thus, to the extent that publication bias exists, our results probably represent a conservative estimate of the exaggeration in treatment benefit associated with stopping early.
Relation to Recent Empirical Studies, Simulation, and Commentaries

Korn and colleagues recently reviewed the results of cancer trials that stopped early and either continued with further follow-up or released results early.16 They found that substantial differences between results at the time of early stopping and results at subsequent follow-up seldom occurred.16 Freidlin and Korn published a related simulation study that supported these findings, suggesting that if the true effect is large, differences between stopped-early results and full follow-up results will be small.17 These recent studies thus address only circumstances in which the true effect is indeed large. More important, the authors do not address circumstances in which the true underlying effect is small or absent. Clinicians seek the best estimate of an unknown true underlying effect with appropriate safeguards against bias. As Goodman18 points out in a commentary on the simulation by Freidlin and Korn, "since we do not know what the true effect is, we cannot know in any particular case whether the observed effect is biased or not; the fact that the trial is stopped early is not prima facie evidence that the estimate is wrong." We support this statement; unfortunately, neither do we know that the stopped-early result is close to the truth. Our findings suggest that often it is not.

Implications

Consensus exists that rigorous data monitoring practice requires a predefined statistical stopping rule.19,20 Our findings, however, indicate that even a formal rule is insufficient to prevent bias consequent on stopping early and suggest the advisability of rules that require a large number of outcome events before early stopping is contemplated. In this review we have focused only on RCTs stopped early for benefit.
Although ethical concerns make decisions regarding stopping trials early for safety more complex than those regarding stopping trials early for benefit, inferences regarding harm and those regarding benefit are equally susceptible to the bias associated with stopping early.

Our results have important implications for systematic reviews and ethics.21,22 If reviewers do not note truncation and do not consider early stopping for benefit, meta-analyses will report overestimates of effects.21 Investigators and funding bodies, in particular drug and device manufacturers, have different but convergent interests in stopping a study as soon as an important difference between experimental and control groups emerges, and journals have an interest in publishing the apparently exciting findings. Furthermore, data monitoring committees are well aware of their ethical obligation to ensure that patients are offered effective treatment as soon as it is clear that effective treatment is indeed available, providing a justification for stopping early. However, data monitoring committees also have an ethical obligation to future patients, who need to know more than whether data crossed a significance threshold; these patients need precise and accurate data on patient-important outcomes, of both risks and benefits, to make treatment choices.22 Such patients will often number in the tens or hundreds of thousands and sometimes in the millions.

To the extent that substantial overestimates of treatment effect are widely disseminated, patients and clinicians will be misled when trying to balance the benefits, harms, inconvenience, and cost of a possible health care intervention. If the true treatment effect is negligible or absent, as our results suggest it sometimes might be, acting on the results of a trial stopped early will be even more problematic. Thus, for trial investigators, our results suggest the desirability of stopping rules demanding large numbers of events.
For clinicians, they suggest the necessity of assuming the likelihood of appreciable overestimates of effect in trials stopped early.

Article Information

Corresponding Author: Victor Montori, MD, MSc, Knowledge and Encounter Research Unit, Mayo Clinic, Plummer 3-35, 200 First St SW, Rochester, MN 55905 (email@example.com).

Author Contributions: Drs Montori and Guyatt had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Bassler, Briel, Montori, Glasziou, Zhou, Heels-Ansdell, Walter, Guyatt.

Acquisition of data: Bassler, Briel, Montori, Lane, Malaga, Akl, Ferreira-Gonzalez, Alonso-Coello, Urrutia, Kunz, Ruiz Culebro, Alves da Silva, Flynn, Elamin, Strahm, Murad, Djulbegovic, Adhikari, Mills, Gwadry-Sridhar, Kirpalani, Soares, Abu Elnour, You, Karanicolas, Bucher, Lampropulos, Nordmann, Burns, Mulla, Raatz, Sood, Kaur, Bankhead, Mullan, Nerenberg, Vandvik, Coto-Yglesias, Schünemann, Tuche, Chrispim, Cook, Lutz, Ribic, Vale, Erwin, Montori.

Analysis and interpretation of data: Bassler, Briel, Montori, Glasziou, Zhou, Heels-Ansdell, Ramsay, Walter, Guyatt.

Drafting of the manuscript: Bassler, Briel, Montori, Lane, Glasziou, Zhou, Heels-Ansdell, Walter, Guyatt.

Critical revision of the manuscript for important intellectual content: Bassler, Briel, Montori, Lane, Glasziou, Zhou, Heels-Ansdell, Malaga, Akl, Ferreira-Gonzalez, Alonso-Coello, Urrutia, Kunz, Ruiz Culebro, Alves da Silva, Flynn, Elamin, Strahm, Murad, Djulbegovic, Adhikari, Mills, Gwadry-Sridhar, Kirpalani, Soares, Abu Elnour, You, Karanicolas, Bucher, Lampropulos, Nordmann, Burns, Mulla, Raatz, Sood, Kaur, Bankhead, Mullan, Nerenberg, Vandvik, Coto-Yglesias, Schünemann, Tuche, Chrispim, Cook, Lutz, Ribic, Vale, Erwin, Perera, Ramsay, Walter, Guyatt.

Statistical analysis: Zhou, Heels-Ansdell, Walter.

Obtained funding: Glasziou, Bassler, Perera, Walter, Guyatt.
Administrative, technical, or material support: Glasziou, Montori, Lane, Flynn, Elamin, Bassler, Guyatt.

Study supervision: Montori, Glasziou, Walter, Bassler, Briel, Guyatt.

Financial Disclosures: None reported.

Funding/Support: This study was funded by the UK Medical Research Council (reference G0600561). Dr Briel is supported by a scholarship for advanced researchers from the Swiss National Science Foundation (PASMA-112951/1) and the Roche Research Foundation. Drs Kunz, Bucher, Nordmann, and Raatz are supported by grants from Santésuisse and the Gottfried and Julia Bangerter-Rhyner Foundation. Dr Cook holds a Research Chair from the Canadian Institutes of Health Research.

Role of the Sponsor: The UK Medical Research Council had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of data; or the preparation, review, or approval of the manuscript.

Additional Contributions: We thank Monica Owen, Michelle Vanderby, Shelley Anderson, BA, and Deborah Maddock, all from the Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada, and Amanda Bedard, Knowledge and Encounter Research Unit, Mayo Clinic, Rochester, Minnesota, for their administrative assistance. We are grateful to Ratchada Kitsommart, MD, Department of Pediatrics, Siriraj Hospital, Mahidol University, Bangkok, Thailand, and Chusak Okascharoen, MD, PhD, Department of Pediatrics, Ramathibodi Faculty of Medicine, Mahidol University, Bangkok, for their help with blinding of articles, and Luma Muhtadie, BSc, University of California, Berkeley, and Kayi Li, BHSc, University of Toronto, Toronto, Ontario, Canada, for their help with the literature search. None of these individuals received any extra compensation for their contributions.
Authors/Members of the Study of Trial Policy of Interim Truncation-2 (STOPIT-2) Group: Dirk Bassler, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada, and Department of Neonatology, University Children's Hospital Tuebingen, Tuebingen, Germany); Matthias Briel, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University, and Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland); Victor M. Montori, MD, MSc, Melanie Lane, BA, David N. Flynn, BS, Mohamed B. Elamin, MBBS, Mohammad Hassan Murad, MD, MPH, Nisrin O. Abu Elnour, MBBS, Julianna F. Lampropulos, MD, Amit Sood, MD, MSc, Rebecca J. Mullan, MSc, and Patricia J. Erwin, MLS (Knowledge and Encounter Research Unit, Mayo Clinic, Rochester, Minnesota); Paul Glasziou, MBBS, PhD, Clare R. Bankhead, DPhil, and Rafael Perera, DPhil, MSc (Centre for Evidence-Based Medicine, Department of Primary Health Care, University of Oxford, Oxford, United Kingdom); Qi Zhou, PhD, Diane Heels-Ansdell, MSc, Carolina Ruiz Culebro, MD, John J. You, MD, MSc, Sohail M. Mulla, Jagdeep Kaur, PhD, CRA, Kara A. Nerenberg, MD, MSc, Holger Schünemann, MD, PhD, Deborah J. Cook, MD, MSc, Kristina Lutz, Christine M. Ribic, MD, MSc, Noah Vale, MD, Stephen D. Walter, PhD, and Gordon H. Guyatt, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University); German Malaga, MD, MSc (Universidad Peruana Cayetano Heredia, Lima, Peru); Elie A. 
Akl, MD, PhD (Department of Clinical Epidemiology and Biostatistics, McMaster University, and Departments of Medicine and Family Medicine, State University of New York at Buffalo); Ignacio Ferreira-Gonzalez, PhD (Cardiology Department, Vall d’Hebron Hospital, CIBER de Epidemiología y Salud Pública, Barcelona, Spain); Pablo Alonso-Coello, MD, PhD, and Gerard Urrutia, MD (Centro Cochrane Iberoamericano, Hospital Sant Pau, Barcelona, and CIBER de Epidemiologia y Salud Publica, Barcelona); Regina Kunz, MD, MSc, Heiner C. Bucher, MD, MPH, Alain J. Nordmann, MD, MSc, and Heike Raatz, MD, MSc (Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel); Suzana Alves da Silva, MD, MSc, and Fabio Tuche, MD (Teaching and Research Center of Pro-Cardiaco, Rio de Janeiro, Brazil); Brigitte Strahm, MD (Pediatric Hematology and Oncology Centre for Pediatrics and Adolescent Medicine, University Hospital Freiburg, Freiburg, Germany); Benjamin Djulbegovic, MD, PhD (Center for Evidence-based Medicine, USF Health Clinical Research, Tampa, Florida); Neill K. J. Adhikari, MD, MSc (Sunnybrook Health Sciences Centre and University of Toronto, Toronto, Ontario, Canada); Edward J. Mills, PhD (British Columbia Centre for Excellence in HIV/AIDS, University of British Columbia, Vancouver, Canada); Femida Gwadry-Sridhar, PhD (University of Western Ontario, London, Canada); Haresh Kirpalani, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University, and Children's Hospital Philadelphia, Philadelphia, Pennsylvania); Heloisa P. Soares, MD (Mount Sinai Medical Center, Miami Beach, Florida); Paul J. Karanicolas, MD, PhD (Memorial Sloan-Kettering Cancer Center, New York, New York); Karen E. A. Burns, MD, MSC (St. 
Michael's Hospital, Keenan Research Centre and Li Ka Shing Knowledge Institute, University of Toronto); Per Olav Vandvik, MD, PhD (Department of Medicine, Gjøvik, Innlandet Hospital Trust, Norway); Fernando Coto-Yglesias, MD (Hospital Nacional de Geriatría y Gerontología San José, Costa Rica); Pedro Paulo M. Chrispim, MSc (National School of Public Health, Rio de Janeiro); and Tim Ramsay, PhD (Ottawa Hospital Research Institute, University of Ottawa, Ottawa, Ontario, Canada).

References

1. Psaty BM, Rennie D. Stopping medical research to save money. JAMA. 2003;289(16):2128-2131.
2. Wheatley K, Clayton D. Be skeptical about unexpected large apparent treatment effects. Control Clin Trials. 2003;24(1):66-70.
3. Pocock S, White I. Trials stopped early: too good to be true? Lancet. 1999;353(9157):943-944.
4. Schulz KF, Grimes DA. Multiplicity in randomised trials: subgroup and interim analyses. Lancet. 2005;365(9471):1657-1661.
5. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials. 1989;10(4)(suppl):209S-221S.
6. Montori VM, Devereaux PJ, Adhikari NK, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294(17):2203-2209.
7. Bassler D, Montori VM, Briel M, et al. Early stopping of randomized clinical trials for overt efficacy is problematic. J Clin Epidemiol. 2008;61(3):241-246.
8. Sydes MR, Parmar MK. Interim monitoring of efficacy data is important and appropriate. J Clin Epidemiol. 2008;61(3):203-204.
9. Trotta F, Apolone G, Garattini S, Tafuri G. Stopping a trial early in oncology: for patients or for industry? Ann Oncol. 2008;19(7):1347-1353.
10. Sargent D. Early stopping for benefit in National Cancer Institute–sponsored randomized phase III trials. J Clin Oncol. 2009;27(10):1543-1544.
11. Goodman SN. Stopping at nothing? Some dilemmas of data monitoring in clinical trials. Ann Intern Med. 2007;146(12):882-887.
12. Briel M, Lane M, Montori VM, et al. Stopping randomized trials early for benefit: a protocol of the Study Of Trial Policy Of Interim Truncation-2 (STOPIT-2). Trials. 2009;10:49.
13. Pocock SJ. When (not) to stop a clinical trial for benefit. JAMA. 2005;294(17):2228-2230.
14. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life. Med Care. 2003;41(5):582-592.
15. Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev. 2009;(1):MR000006.
16. Korn EL, Freidlin B, Mooney M. Stopping or reporting early for positive results in randomized clinical trials. J Clin Oncol. 2009;27(10):1712-1721.
17. Freidlin B, Korn EL. Stopping clinical trials early for benefit: impact on estimation. Clin Trials. 2009;6(2):119-125.
18. Goodman SN. Stopping trials for efficacy. Clin Trials. 2009;6(2):133-135.
19. DAMOCLES Study Group; NHS Health Technology Assessment Programme. A proposed charter for clinical trial data monitoring committees. Lancet. 2005;365(9460):711-722.
20. Pocock SJ. Current controversies in data monitoring for clinical trials. Clin Trials. 2006;3(6):513-521.
21. Bassler D, Ferreira-Gonzalez I, Briel M, et al. Systematic reviewers neglect bias that results from trials stopped early for benefit. J Clin Epidemiol. 2007;60(9):869-873.
22. Mueller PS, Montori VM, Bassler D, Koenig BA, Guyatt GH. Ethical issues in stopping randomized trials early because of apparent benefit. Ann Intern Med. 2007;146(12):878-881.
JAMA – American Medical Association
Published: Mar 24, 2010
Keywords: Cochrane Collaboration, MEDLINE