
Reporting of Noninferiority and Equivalence Randomized Trials

The CONSORT (Consolidated Standards of Reporting Trials) Statement, including a checklist and a flow diagram, was developed to help authors improve their reporting of randomized controlled trials. Its primary focus was on individually randomized trials with 2 parallel groups that assess the possible superiority of one treatment compared with another, but it is now being extended to other trial designs. Noninferiority and equivalence trials have methodological features that differ from superiority trials and present particular difficulties in design, conduct, analysis, and interpretation. Although the rationale for such trials occurs frequently, those designed and described specifically as noninferiority or equivalence trials appear less commonly in the medical literature. The quality of reporting of those that are published is often inadequate. In this article, we present an adapted CONSORT checklist for reporting noninferiority and equivalence trials and provide illustrative examples and explanations for those items amended from the original CONSORT checklist. The intent is to improve reporting of noninferiority and equivalence trials, enabling readers to assess the validity of their results and conclusions.

The Consolidated Standards of Reporting Trials (CONSORT) statement was developed to alleviate the problem of inadequate reporting of randomized controlled trials (RCTs), which has been associated with biased estimates of treatment effects. The statement comprises evidence-based recommendations for reporting RCTs, including a flowchart of participants through the trial. CONSORT's primary focus is on parallel-group trials aiming to identify treatment superiority if it really exists. Most CONSORT recommendations apply equally well to other trial designs, but some need adaptation. Herein we extend the CONSORT recommendations to noninferiority and equivalence trials.
First, we explain the rationale for and key methodological features of such trials. Second, we consider how commonly noninferiority and equivalence trials are published and provide empirical evidence about their quality. Last, we present an adapted CONSORT checklist for reporting noninferiority and equivalence trials and give illustrative examples (and further elaboration) for those items that have been amended. For convenience, we will refer to “treatments” and “patients,” although we recognize that not all interventions evaluated in RCTs are technically treatments and the participants in trials are not always patients.

Rationale for Noninferiority or Equivalence Designs

Most RCTs aim to determine whether one intervention is superior to another. By contrast, equivalence trials aim to determine whether one (typically new) intervention is therapeutically similar to another, usually an existing treatment. We use “new” to refer to the treatment under test; the comparison or reference treatment is often called an active control. A noninferiority trial seeks to determine whether a new treatment is no worse than a reference treatment. Because proof of exact equality is impossible, a prestated margin of noninferiority (Δ) for the treatment effect in a primary patient outcome is defined. Equivalence trials are very similar, except that equivalence is defined as the treatment effect lying between −Δ and Δ. True (2-sided) equivalence therapeutic or prophylactic trials are rare because most such trials address the question of noninferiority.

In trials that investigate noninferiority, the question of interest is not symmetric. The new treatment will be recommended if it is similar to or better than an existing one, but not if it is worse (by more than Δ). Superiority of the new treatment would be a bonus.
This article focuses mainly on noninferiority trials but applies also to 2-sided equivalence trials. Noninferiority trials are intended to show whether a new treatment has at least as much efficacy as the standard or is worse by an amount less than Δ, often on the premise that it has some other advantage, eg, greater availability, reduced cost, less invasiveness, fewer side effects (harms), or greater ease of administration (for instance, one daily dose rather than 2 or more doses). Some noninferiority trials have been criticized for merely studying a new marketable product (“me-too” drugs).

Noninferiority and equivalence trials are not limited to drugs. For example, a new antenatal care model with fewer clinic visits and reduced cost was investigated for its equivalence to the standard model with regard to maternal and neonatal outcomes. A noninferiority trial compared 2 interventional strategies for coronary revascularization in diabetic patients.

Methodological Issues in Noninferiority and Equivalence Trials

Noninferiority and equivalence trials present particular difficulties in design, conduct, analysis, and interpretation.

Hypotheses

In a superiority trial the null hypothesis is that treatments are equally effective and the alternative hypothesis is that they differ. A type I error is falsely finding a treatment effect when there is none, and a type II error is failing to detect a treatment effect when one truly exists.
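As a compact statement of the hypotheses in this section, let θ denote the true difference (new minus reference) in a primary outcome measured as harm, so that positive values favor the reference treatment, as in the Figure. This sign convention is our assumption; for outcomes where higher is better, the inequalities flip. The superiority hypotheses, the reversed noninferiority hypotheses, and the 2-sided equivalence hypotheses can then be written as:

```latex
\begin{aligned}
&\text{Superiority:}    && H_0:\ \theta = 0 && \text{vs} && H_1:\ \theta \neq 0 \\
&\text{Noninferiority:} && H_0:\ \theta \ge \Delta && \text{vs} && H_1:\ \theta < \Delta \\
&\text{Equivalence:}    && H_0:\ \theta \le -\Delta \ \text{or}\ \theta \ge \Delta && \text{vs} && H_1:\ -\Delta < \theta < \Delta
\end{aligned}
```

Under this formulation, a type I error in the noninferiority test means rejecting H₀ when θ ≥ Δ, that is, accepting a new treatment that is in fact worse by Δ or more.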
In noninferiority trials, the null and alternative hypotheses are reversed: a type I error is the erroneous acceptance of an inferior new treatment, whereas a type II error is the erroneous rejection of a truly noninferior treatment.

Design

A noninferiority or equivalence trial requires that the reference treatment's efficacy be established, or that the treatment be in such widespread use that a placebo or untreated control group would be deemed unethical. Both participants and outcome measures in a noninferiority or equivalence trial should be similar to those in the trial(s) that established the efficacy of the reference treatment. Outcome measures should also be similar to those in previous trials. The required sample size is calculated using the confidence interval (CI) approach, considering where the CI for the treatment effect lies with respect to both the margin of noninferiority Δ and a null effect. Sample size depends on the level of confidence chosen, the risk of type II error (or desired power), and Δ.

A prestated margin of noninferiority Δ can be specified as a difference in means or proportions, or as the logarithm of an odds ratio, risk ratio, or hazard ratio. It is often chosen as the smallest value that would be a clinically important effect. If relevant, Δ should be smaller than the “clinically relevant” effect chosen to investigate superiority of the reference treatment against placebo. For example, if mortality with treatment A is better than placebo by 10% (absolute difference), a new treatment B might need to be at least 5% better than placebo (and thus no more than 5% worse than A). The required size of noninferiority trials is therefore usually larger than that for superiority trials. Unfortunately, sample sizes for noninferiority and equivalence trials are often too small. Given several previous trials, the effect of the reference treatment can be estimated from a meta-analysis.
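To show how Δ, the confidence level, and power drive the sample size, here is a minimal sketch assuming a binary adverse outcome analyzed as a difference in proportions, equal allocation, and the usual normal approximation. The function name and defaults are our own illustration, not a method prescribed by CONSORT. With inputs like those of the ximelagatran example quoted under item 7a (3.1% event rate in both groups, Δ of 2%, 1-sided α of .025, 90% power), it gives roughly 1,580 per group, in line with the approximately 1600 patient-years quoted there; that calculation worked with annual event rates, so the match is only approximate.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n_per_group(p_ref, p_new, delta, alpha_one_sided=0.025, power=0.90):
    """Approximate patients per group for a noninferiority comparison of two
    proportions of an adverse outcome (higher = worse), with equal allocation
    and the normal approximation. delta is the margin on the risk-difference
    scale (new minus reference); p_ref and p_new are the assumed true rates.
    """
    true_diff = p_new - p_ref
    if true_diff >= delta:
        raise ValueError("the assumed true difference must lie below the margin")
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)  # 1-sided confidence
    z_beta = NormalDist().inv_cdf(power)                  # desired power
    variance = p_ref * (1 - p_ref) + p_new * (1 - p_new)
    # n grows with the inverse square of the room left between the assumed
    # true difference and the margin.
    return ceil((z_alpha + z_beta) ** 2 * variance / (delta - true_diff) ** 2)


# Inputs resembling the ximelagatran example: 3.1% event rate in both arms,
# margin of 2 percentage points, 1-sided alpha 0.025, 90% power.
n = noninferiority_n_per_group(0.031, 0.031, 0.02)
# Halving the margin, or assuming the new treatment is truly slightly worse,
# both raise the requirement.
n_half_margin = noninferiority_n_per_group(0.031, 0.031, 0.01)
n_new_worse = noninferiority_n_per_group(0.031, 0.036, 0.02)
print(n, n_half_margin, n_new_worse)
```

Halving Δ roughly quadruples the required size, and assuming a true difference of half a percentage point against the new treatment nearly doubles it, which is why a point estimate of 0 is the usual planning assumption.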
There are several techniques to determine Δ, its magnitude being influenced by several factors, eg, efficacy, safety, cost, acceptability, and adherence.

Conduct

Trial conduct should closely match that of any trial that demonstrated the efficacy of the reference treatment, provided such trials were of high quality. One should avoid features that might dilute true differences between treatments, thereby enhancing the risk of erroneously concluding noninferiority, eg, poor adherence, dropouts, recruitment of patients unlikely to respond, and treatment crossovers.

Analysis

Although a modified hypothesis testing framework exists, a more informative CI approach is preferred in the design, analysis, and reporting of noninferiority and equivalence trials. For superiority trials, intention-to-treat (ITT) analysis (analyzing all patients within their randomized groups, regardless of whether they completed allocated treatment) is recommended. ITT analysis often leads to smaller observed treatment effects than if all patients had adhered to treatment. In noninferiority trials, ITT analysis will therefore often, although not always, increase the risk of falsely claiming noninferiority (type I error). In practice, a strict ITT analysis is often not possible, and one uses a “full analysis set,” the set of patients whose follow-up is “as complete . . . and . . . close as possible” to the ITT ideal. Alternative analyses that exclude patients not taking allocated treatment or otherwise not protocol-adherent could bias the trial in either direction. The terms on-treatment and per-protocol analysis are often used but may be inadequately defined. Potentially biased non-ITT analysis is less desirable than ITT in superiority trials but may still provide some insight.
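To make the dilution argument concrete, here is a minimal sketch under a deliberately simple, hypothetical crossover model: patients who switch treatment take on the other arm's event rate, and nothing else changes. The event rates, margin, and crossover fraction are invented for illustration and do not come from any trial cited here.

```python
def itt_event_rate(rate_if_adherent, rate_if_crossed_over, crossover_fraction):
    """Expected event rate in one randomized arm when a fraction of its patients
    switch to the other arm's treatment (hypothetical crossover model)."""
    return ((1 - crossover_fraction) * rate_if_adherent
            + crossover_fraction * rate_if_crossed_over)


# Hypothetical true event rates for an adverse outcome (higher is worse).
p_reference, p_new = 0.10, 0.13   # true difference of 3 percentage points
delta = 0.025                     # margin, so the new treatment is truly inferior

# Suppose 25% of the new-treatment arm crosses over to the reference treatment.
p_new_itt = itt_event_rate(p_new, p_reference, 0.25)
itt_difference = p_new_itt - p_reference   # diluted toward zero
adherent_difference = p_new - p_reference  # what full adherence would show

print(f"adherent: {adherent_difference:.4f}, ITT: {itt_difference:.4f}")
print("ITT point estimate below margin:", itt_difference < delta)
```

Only point estimates are shown; a real analysis would examine the CI, but the direction of the bias is the point: in this model nonadherence pulls the ITT estimate toward zero, and therefore toward a conclusion of noninferiority.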
In noninferiority and equivalence trials, non-ITT analyses might be desirable as a protection from ITT's increase of type I error risk (falsely concluding noninferiority). There is greater confidence in results when the conclusions of ITT and non-ITT analyses are consistent. Subgroup analysis requires the same caveats in noninferiority trials as it requires in superiority trials.

Interim analyses in noninferiority trials have some differences in rationale from those in superiority trials. If noninferiority is established before the trial is completed, there may be no ethical requirement to stop early because of lack of efficacy. However, other advantages (adverse effects, cost) could justify stopping the trial to expedite availability of the new treatment. If a treatment is clearly inferior, then stopping the trial (or a particular trial arm) is ethically justified. Stopping rules might be asymmetric, a trial being allowed to continue longer if the new treatment appears superior, although this result is unlikely.

Interpretation

Interpreting a noninferiority trial's results depends on where the CI for the treatment effect lies relative to both the margin of noninferiority Δ and a null effect. The observed treatment effect is not by itself sufficiently informative. With 2-sided equivalence the interpretation is analogous, but both margins, Δ and −Δ, need to be considered, and claiming equivalence requires the CI to lie wholly between −Δ and Δ. Many noninferiority trials have based their interpretation on the upper limit of a 1-sided 97.5% CI, which is the same as the upper limit of a 2-sided 95% CI. Although both 1-sided and 2-sided CIs allow for inferences about noninferiority, we suggest that 2-sided CIs are appropriate in most noninferiority trials. If a 1-sided 5% significance level is deemed acceptable for the noninferiority hypothesis test (a decision open to question), a 90% 2-sided CI could then be used.
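The CI approach just described can be sketched as follows, assuming a binary adverse outcome analyzed as a risk difference (new minus reference) with a simple Wald interval; real trials may use other interval methods or effect measures such as the log odds ratio. The classification follows the scenarios labeled A to H in the Figure. Function names, event counts, and margins are hypothetical.

```python
from statistics import NormalDist


def risk_difference_ci(events_new, n_new, events_ref, n_ref, level=0.95):
    """Wald 2-sided CI for the risk difference (new minus reference)."""
    p_new, p_ref = events_new / n_new, events_ref / n_ref
    diff = p_new - p_ref
    se = (p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


def classify_noninferiority(lower, upper, delta):
    """Interpret a 2-sided CI for an adverse outcome (positive = new is worse)
    against the margin delta > 0 and the null effect 0, as in the Figure."""
    if upper < 0:
        return "superior"                                # case A
    if upper < delta:
        if lower <= 0:
            return "noninferior"                         # cases B and C
        return "noninferior but significantly worse"     # case D
    if lower > delta:
        return "inferior"                                # case H
    if lower > 0:
        return "inconclusive, significantly worse"       # case G
    return "inconclusive"                                # cases E and F


def classify_equivalence(lower, upper, delta):
    """2-sided equivalence: the whole CI must lie between -delta and delta."""
    if -delta < lower and upper < delta:
        return "equivalent"
    return "not shown equivalent"


# Hypothetical trial: 110/1000 events on the new treatment, 100/1000 on reference.
lo, hi = risk_difference_ci(110, 1000, 100, 1000)
print(classify_noninferiority(lo, hi, delta=0.05))   # generous margin
print(classify_noninferiority(lo, hi, delta=0.03))   # stricter margin
```

The same data can support noninferiority under one prestated margin and be inconclusive under a stricter one, which is why Δ must be fixed before the data are seen.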
The Figure interprets several possible scenarios with 2-sided CIs for a noninferiority trial.

Figure. Possible Scenarios of Observed Treatment Differences for Adverse Outcomes (Harms) in Noninferiority Trials

Error bars indicate 2-sided 95% confidence intervals (CIs). Tinted area indicates zone of inferiority. A, If the CI lies wholly to the left of zero, the new treatment is superior. B and C, If the CI lies to the left of Δ and includes zero, the new treatment is noninferior but not shown to be superior. D, If the CI lies wholly to the left of Δ and wholly to the right of zero, the new treatment is noninferior in the sense already defined, but it is also inferior in the sense that a null treatment difference is excluded. This puzzling case is rare, since it requires a very large sample size. It can also result from having too wide a noninferiority margin. E and F, If the CI includes Δ and zero, the difference is nonsignificant but the result regarding noninferiority is inconclusive. G, If the CI includes Δ and is wholly to the right of zero, the difference is statistically significant but the result is inconclusive regarding possible inferiority of magnitude Δ or worse. H, If the CI is wholly above Δ, the new treatment is inferior.

*This CI indicates noninferiority in the sense that it does not include Δ, but the new treatment is significantly worse than the standard. Such a result is unlikely because it would require a very large sample size.
†This CI is inconclusive in that it is still plausible that the true treatment difference is less than Δ, but the new treatment is significantly worse than the standard.

Once noninferiority is evident, it is acceptable to then assess whether the new treatment appears superior to the reference treatment, using an appropriate test or CI (ie, not just the point estimate), preferably defined a priori and with an ITT analysis. It is inappropriate to claim noninferiority post hoc from a superiority trial unless clearly related to a predefined margin of equivalence. That is, both superiority and noninferiority hypotheses need explicit specification in the trial protocol. It is, however, always reasonable to interpret a CI as excluding an effect of a particular prestated size.

Having demonstrated noninferiority against the reference treatment, some authors then make claims for efficacy of a new treatment relative to placebo by also using evidence from earlier trials of the reference treatment vs placebo. Such inferences assume assay constancy, ie, that current and earlier trials are identical in all relevant aspects, eg, participants, outcome definitions, and use of standard therapy. Regarding patient populations, for example, this implies no differences in the effect of treatment across subgroups or a similar distribution of relevant subgroups. In the absence of assay constancy, an adjustment method has been proposed. Since assay constancy is inevitably questionable, any claims regarding efficacy of the new treatment relative to placebo require cautious interpretation.

How Common Are Noninferiority and Equivalence Trials?

Assessing the frequency of noninferiority and equivalence trials is not straightforward because of inconsistencies in terminology. A search of the Cochrane Controlled Trials Register in October 2004 for the words “equivalence” or “noninferiority” yielded 1021 of 415,918 trials (0.2%), but these figures are likely to be misleading.
Not all noninferiority or equivalence trials use these words, and the term “equivalence” is often inappropriately used when reporting negative results of superiority trials; such trials often lack the statistical power to rule out important differences. Identifying noninferiority trials is difficult because they are often labeled as equivalence trials. A recent study found that only 3 of 188 (1.6%) cancer trials in a PubMed electronic search were designed to evaluate equivalence or noninferiority (Luciano Costa, MD, written communication, April 26, 2005). From these findings, it seems that whereas the objective of testing for noninferiority or equivalence is likely to be common, relatively few noninferiority and equivalence trials have been both designed and described as such. However, we perceive that these designs are becoming more widespread.

Quality of Reporting of Noninferiority and Equivalence Trials

We do not know of a study that looked at reporting of a cohort of trials actually designed as noninferiority or equivalence trials. There have been several reviews of the quality of trials claiming equivalence, without differentiation between noninferiority and 2-sided equivalence, because many authors use the term “equivalence” to mean either. Greene et al identified methodological flaws in a systematic review of 88 studies claiming equivalence, published from 1992 to 1996. Equivalence was inappropriately claimed in 67% of them on the basis of nonsignificant tests for superiority. Fifty-one percent stated equivalence as an aim, but only 23% were designed with a preset margin of equivalence. Only 22% adopted appropriate practice: a predefined aim of equivalence, a preset Δ, consequent sample size determination, and actual testing of equivalence. Other disease- or field-specific reviews reveal similar findings.
Only 2 trials (8%) in a review of 25 RCTs in childhood bacterial meningitis published between 1980 and 2000 that claimed equivalent mortality were designed to test equivalence. A review of 90 RCTs with nonstatistically significant or “negative” results published in 3 surgical journals from 1988 to 1998 found that 39% met predefined criteria for establishing equivalence. Only 3 studies in a recent review of 188 cancer trials with negative results used a noninferiority or equivalence analysis. In a review of 20 trials intended to detect equivalence in reproductive health, only 4 stated a margin of equivalence. McAlister and Sackett evaluated 4 large “negative” RCTs in hypertension with regard to the methodological requirements for active-control equivalence trials. Only 2 trials published both ITT and per-protocol (or on-treatment) analyses, only 1 trial specified the margin of equivalence in advance, and none was sufficiently large to address the equivalence hypothesis. This illustrates how failure to properly design, conduct, and analyze equivalence trials leads to incorrect conclusions about equivalence.

Extension of CONSORT Statement

To accommodate noninferiority or equivalence trials, an extension of the CONSORT statement should encompass the following issues: (1) the rationale for adopting a noninferiority or equivalence design; (2) how study hypotheses were incorporated into the design; (3) choice of participants, interventions (especially the reference treatment), and outcomes; (4) statistical methods, including sample size calculation; and (5) how the design affects interpretation and conclusions. Consequences for the CONSORT checklist and flow diagram, including specific changes, are described below.

Checklist

We build on the work of McAlister and Sackett in modifying the CONSORT checklist (Table), especially items 1 to 7, 12, 16, 17, and 20. New text is shown in italics.
For each modification, we include 1 or more examples of good reporting (and further elaboration where appropriate). In some examples, we have added text in brackets to explain the context. We mainly concentrate on noninferiority trials but make some reference to equivalence trials, which are much less common.

Table. Checklist of Items for Reporting Noninferiority or Equivalence Trials (Additions or Modifications to the CONSORT Checklist Are Shown in Italics)

Entries give paper section, then item number and topic, then the descriptor (adapted for noninferiority or equivalence trials).

Title and abstract
1*. Title and abstract: How participants were allocated to interventions (eg, “random allocation,” “randomized,” or “randomly assigned”), specifying that the trial is a noninferiority or equivalence trial.

Introduction
2*. Background: Scientific background and explanation of rationale, including the rationale for using a noninferiority or equivalence design.

Methods
3*. Participants: Eligibility criteria for participants (detailing whether participants in the noninferiority or equivalence trial are similar to those in any trial[s] that established efficacy of the reference treatment) and the settings and locations where the data were collected.
4*. Interventions: Precise details of the interventions intended for each group, detailing whether the reference treatment in the noninferiority or equivalence trial is identical (or very similar) to that in any trial(s) that established efficacy, and how and when they were actually administered.
5*. Objectives: Specific objectives and hypotheses, including the hypothesis concerning noninferiority or equivalence.
6*. Outcomes: Clearly defined primary and secondary outcome measures, detailing whether the outcomes in the noninferiority or equivalence trial are identical (or very similar) to those in any trial(s) that established efficacy of the reference treatment and, when applicable, any methods used to enhance the quality of measurements (eg, multiple observations, training of assessors).
7*. Sample size: How sample size was determined, detailing whether it was calculated using a noninferiority or equivalence criterion and specifying the margin of equivalence with the rationale for its choice. When applicable, explanation of any interim analyses and stopping rules (and whether related to a noninferiority or equivalence hypothesis).
8. Randomization, sequence generation: Method used to generate the random allocation sequence, including details of any restriction (eg, blocking, stratification).
9. Randomization, allocation concealment: Method used to implement the random allocation sequence (eg, numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.
10. Randomization, implementation: Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
11. Blinding (masking): Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. When relevant, how the success of blinding was evaluated.
12*. Statistical methods: Statistical methods used to compare groups for primary outcome(s), specifying whether a 1- or 2-sided confidence interval approach was used. Methods for additional analyses, such as subgroup analyses and adjusted analyses.

Results
13. Participant flow: Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the trial protocol, and analyzed for the primary outcome. Describe protocol deviations from trial as planned, together with reasons.
14. Recruitment: Dates defining the periods of recruitment and follow-up.
15. Baseline data: Baseline demographic and clinical characteristics of each group.
16*. Numbers analyzed: Number of participants (denominator) in each group included in each analysis and whether “intention-to-treat” and/or alternative analyses were conducted. State the results in absolute numbers when feasible (eg, 10/20, not 50%).
17*. Outcomes and estimation: For each primary and secondary outcome, a summary of results for each group and the estimated effect size and its precision (eg, 95% confidence interval). For the outcome(s) for which noninferiority or equivalence is hypothesized, a figure showing confidence intervals and margins of equivalence may be useful.
18. Ancillary analyses: Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those prespecified and those exploratory.
19. Adverse events: All important adverse events or side effects in each intervention group.

Comment
20*. Interpretation: Interpretation of the results, taking into account the noninferiority or equivalence hypothesis and any other trial hypotheses, sources of potential bias or imprecision, and the dangers associated with multiplicity of analyses and outcomes.
21. Generalizability: Generalizability (external validity) of the trial findings.
22. Overall evidence: General interpretation of the results in the context of current evidence.

*Expansion of corresponding item on CONSORT checklist.

Title and Abstract

Title and Abstract: Item 1. How participants were allocated to interventions (eg, “random allocation,” “randomized,” or “randomly assigned”), specifying that the trial is a noninferiority or equivalence trial.

Title. “Oral Pristinamycin versus Standard Penicillin Regimen to Treat Erysipelas in Adults: Randomised, Noninferiority, Open Trial”

Abstract. “Design—Multicentre, parallel group, open labelled, randomised noninferiority trial.”

Introduction

Background: Item 2. Scientific background and explanation of rationale, including the rationale for using a noninferiority or equivalence design.

Example. “Up to 40 million children worldwide are estimated to suffer from vitamin A deficiency. . . .
A dose of 200,000 IU retinyl palmitate to children over 1 year old is most widely used and has generally been regarded as safe and potentially effective. . . . In developing countries, animal products that provide retinyl esters are too expensive. . . . Vegetables and fruit . . . are cheap and good sources of vitamin A in the form of beta carotene. . . . Beta carotene is also considered to be virtually non-toxic. . . . In a preliminary study, . . . after 20 days there was a reversion of the clinical and subclinical signs of vitamin A deficiency in the study group. . . . Since beta carotene is the principal source of vitamin A in developing countries and is non-toxic, we compared retinyl palmitate and beta carotene for treatment of vitamin A deficiency.”

Elaboration. The rationale should cite evidence for the efficacy of the reference treatment. If previous trials, or their systematic review, demonstrate the superiority of the reference treatment relative to placebo, they should be cited with effect sizes and CIs. If no such trials exist, other evidence for efficacy of the reference treatment should be given. Evidence for other advantages of the new treatment over the reference treatment, if present, should be given to justify use of the new treatment if it is not inferior. One aim of the current trial might be to provide or support such evidence. In the case of “me-too” drugs, it should be clear whether there are other advantages.

Methods

Participants: Item 3. Eligibility criteria for participants (detailing whether participants in the noninferiority or equivalence trial are similar to those in any trial[s] that established efficacy of the reference treatment) and the settings and locations where the data were collected.

Example. “From Sept 1, 1992, to Dec 30, 1994, we enrolled 6628 men and women in 312 health centres in Sweden . . . who had hypertension (blood pressure ≥180 mm Hg systolic, ≥105 mm Hg diastolic, or both), aged 70-84 years.
The only difference in inclusion criteria between this trial and the STOP-Hypertension trial was that patients with isolated systolic hypertension could be included in STOP-Hypertension-2, based on previous positive findings in patients with isolated systolic hypertension treated with diuretics and calcium antagonists.”

Elaboration. Relevant changes in participants' characteristics compared with previous trial(s) should be reported and explained. Clinical trial participants may differ, especially if time has elapsed between trials; therefore, such a description should concentrate on relevant departures (those that might affect response to treatments).

Interventions: Item 4. Precise details of the interventions intended for each group, detailing whether the reference treatment in the noninferiority or equivalence trial is identical (or very similar) to that in any trial(s) that established efficacy, and how and when they were actually administered.

Example. “[W]e randomly assigned women about to deliver vaginally to receive 600 μg misoprostol orally or 10 IU oxytocin intravenously or intramuscularly, according to practice. . . . The use of uterotonic agents [oxytocin, a type of uterotonic, is the reference treatment] in the management of the third stage of labour reduces the amount of bleeding and the need for blood transfusion . . . ” (The authors reference a Cochrane systematic review showing that uterotonic agents reduced bleeding and blood transfusions compared with placebo.)

Elaboration. Any differences between the control intervention in the trial and the reference treatment in the previous trial(s) in which efficacy was established should be reported and explained.
For example, differences may exist because background treatment and patient management change with time and concomitant therapies may differ. Dose changes may also occur: if the dose of the reference treatment is reduced, its efficacy may be reduced; if it is increased, possibly leading to tolerability problems, the new treatment's advantages could be overestimated.

Objectives: Item 5. Specific objectives and hypotheses, including the hypothesis concerning noninferiority or equivalence.

Example. “[A] bodyweight-adjusted single bolus of 0.50-0.55 mg/kg tenecteplase would be equivalent to a 90 min regimen of alteplase for efficacy and safety [the primary endpoint for efficacy was all-cause 30-day mortality from acute myocardial infarction]. In this double-blind, randomised, controlled study, we formally tested this hypothesis.”

Elaboration. The authors should specify for which outcomes noninferiority or equivalence hypotheses apply and for which superiority hypotheses apply. Usually the noninferiority or equivalence hypothesis refers to the primary end point, whereas the new treatment is expected to offer other advantages, eg, fewer adverse effects or lower cost.

Outcomes: Item 6. Clearly defined primary and secondary outcome measures, detailing whether the outcomes in the noninferiority or equivalence trial are identical (or very similar) to those in any trial(s) that established efficacy of the reference treatment and, when applicable, any methods used to enhance the quality of measurements (eg, multiple observations, training of assessors).

Example. “Over the past decade seven large, randomised, placebo-controlled trials involving a total of 16,770 patients who underwent percutaneous interventions have established that the overall reduction in the risk of death or nonfatal myocardial infarction 30 days after adjunctive inhibition of platelet glycoprotein IIb/IIIa receptors is 38 percent. Three glycoprotein IIb/IIIa inhibitors were assessed in these trials.
The primary end point [in the present trial] was a composite of death, nonfatal myocardial infarction, or urgent target-vessel revascularization within 30 days after the index procedure.”

Elaboration. Any differences in outcome measures in the new trial compared with trial(s) that established efficacy of the reference treatment should be noted and justified. In particular, note any changes in timing of evaluation. Ideally, outcomes should remain unchanged, but new insights often lead to change as the understanding, management, and prognosis of a disease improve. For example, early AIDS trials used death as the outcome; when deaths became uncommon, trials shifted to AIDS clinical events, and when clinical events became uncommon, they shifted to surrogate markers.

Sample Size: Item 7a. How sample size was determined, detailing whether it was calculated using a noninferiority or equivalence criterion, and specifying the margin of equivalence with the rationale for its choice.

Examples. “Considering previous studies, a primary event rate of 3.1% per year was estimated for patients in both treatment groups. To obtain 90% statistical power with a 1-sided α equal to 0.025, approximately 1600 patient-years of exposure per treatment groups are necessary to establish the noninferiority of ximelagatran compared with dose-adjusted warfarin within 2% per year. . . . Assuming an average follow up of 16 months, approximately 2400 patients are required.”

“Sample size was based on . . . [an] 8.0% primary quadruple end point event rate in the control (heparin plus Gp IIb/IIIa blockade) group [reference treatment] and a 12.5% relative reduction in the bivalirudin arm.
Using a 2-sided α level of .05 and 3000 patients per group, the trial had a 99% power to detect superiority over the imputed heparin control [historical control] and a 92% power to satisfy noninferiority criteria relative to heparin plus Gp IIb/IIIa.”

Elaboration. The margin of noninferiority or equivalence should be specified and preferably justified on clinical grounds. Its relation to the effect of the reference treatment relative to placebo in any previous trials should be noted (see second example).

Sample size calculations are usually based on the assumption that the point estimate of the difference between treatments will be 0 (as in the first example above). Examples F and G in the Figure would have met the noninferiority criterion had the observed point estimates been 0. That is, the precision of the estimates would have been adequate had the 2 treatments been equally effective. With a large enough sample, it is possible to demonstrate noninferiority even when the point estimate is between 0 and Δ. If the true effect is assumed to be greater than 0, the sample size will need to be increased, perhaps substantially.

Stopping Rules: Item 7b. When applicable, explanation of any interim analyses and stopping rules (and whether related to a noninferiority or equivalence hypothesis).

Example. “Interim safety analyses were planned when 40 and 70 percent of the total number of women had been enrolled. An increased rate of HIV transmission associated with the shorter regimens, as compared with the long-long regimen, would be considered significant if any of the nominal P values for the differences were less than 0.007 in the first interim analysis and less than 0.012 in the second. . . .
”Elaboration.It is customary to base interim stopping criteria on Pvalues, and these adjusted Pvalues are analogous to widened CIs.Statistical Methods: Item 12.Statistical methods used to compare groups for primary outcome(s), specifying whether a 1- or 2-sided confidence interval approach was used.Methods for additional analyses, such as subgroup analyses and adjusted analyses.Examples. Binary outcome.“The proportion of the intention-to-treat population experiencing primary events per year for both treatment groups, and the associated 1-sided 97.5% CI for the difference, will be estimated using the time to first event . . . The noninferiority margin (&Dgr;) defined in the primary analysis is based on absolute event rate differences . . . Noninferiority of ximelagatran over warfarin will be accepted [in a 0.025 level test] if the upper bound of the 97.5% CI around the estimated difference in primary event rates lies below &Dgr;. For these studies, an absolute &Dgr; of 2% was adopted. . . . ”Continuous outcome.“Regimens were regarded as equivalent if the difference between treatments in change in FEV1 (using 95% CI) was less than 4% of predicted FEV1 . . . Since we were undertaking an equivalence study, the primary analysis was per protocol but an intention-to treat analysis was also undertaken. The mean difference between treatments and 95% CI for the true difference was obtained from analysis of variance, with adjustment for centre and type of clinic. . . . ”Elaboration.The upper bound of the 1-sided (1 − α) × 100% CI (or correspondingly, the upper bound of the 2-sided (1 − α/2) × 100% CI) for the treatment effect has to be below the margin &Dgr; to declare that noninferiority has been shown, with a significance level α. 
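The sample size reasoning in Item 7a and the CI decision rule in Item 12 can be sketched in a few lines of Python. This is an illustrative sketch only. It rests on three stated assumptions:

- A normal approximation for the difference of 2 Poisson event rates. The SPORTIF investigators' actual method is not given in the quoted example.
- A treatment effect expressed as (new minus reference) and oriented so that larger values are worse, as in the Figure.
- Hypothetical helper names (`exposure_per_group`, `classify`, `equivalent`) that are not part of any library.

```python
from statistics import NormalDist


def z_quantile(p: float) -> float:
    """Quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def two_sided_level(alpha_one_sided: float) -> float:
    """Level of the 2-sided CI whose upper bound equals the upper bound of a
    1-sided (1 - alpha) CI, ie, 1 - 2 * alpha (0.025 -> 0.95)."""
    return 1 - 2 * alpha_one_sided


def exposure_per_group(rate: float, margin: float,
                       alpha_one_sided: float = 0.025,
                       power: float = 0.90) -> float:
    """Patient-years per group needed to show that the new treatment's event
    rate exceeds the reference rate by less than `margin`.

    Assumption (not from the source): Poisson event counts, the same true
    `rate` in both groups (true difference 0), and a normal approximation in
    which Var(rate difference) = 2 * rate / T for T patient-years per group.
    """
    z_sum = z_quantile(1 - alpha_one_sided) + z_quantile(power)
    return z_sum ** 2 * 2 * rate / margin ** 2


def classify(lower: float, upper: float, margin: float, null: float = 0.0) -> str:
    """Place a 2-sided CI for (new - reference), larger = worse, relative to
    the margin and the null effect, following Figure cases A-H.

    For ratio measures (RR, OR, HR) pass null=1.0 and a ratio margin such as
    1.35. The comparisons still hold because the log scale is monotonic.
    """
    excludes_null = lower > null  # new treatment significantly worse
    if upper < null:
        return "superior"                                   # case A
    if upper < margin:
        return ("noninferior, significantly worse"          # case D
                if excludes_null else "noninferior")        # cases B, C
    if lower > margin:
        return "inferior"                                   # case H
    return ("inconclusive, significantly worse"             # case G
            if excludes_null else "inconclusive")           # cases E, F


def equivalent(lower: float, upper: float, margin: float) -> bool:
    """2-sided equivalence: the whole CI lies strictly within (-margin, margin)."""
    return -margin < lower and upper < margin


if __name__ == "__main__":
    # Ximelagatran inputs: 3.1%/year in both groups, margin 2%/year.
    py = exposure_per_group(rate=0.031, margin=0.02)
    patients = 2 * py / (16 / 12)  # 2 groups, 16 months mean follow-up
    print(f"{py:.0f} patient-years per group, {patients:.0f} patients")
    # Misoprostol vs oxytocin: RR 1.39 (95% CI 1.19-1.63), margin 1.35.
    print(classify(1.19, 1.63, margin=1.35, null=1.0))
```

Under these assumptions, the ximelagatran inputs give roughly 1630 patient-years per group. With 16 months of average follow-up, that corresponds to roughly 2440 patients, close to the approximate figures quoted in the example. The misoprostol result reported under Item 17 is classified as significantly worse but inconclusive with respect to the margin, which matches case G.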
Both Δ and α should be prespecified in the noninferiority hypothesis.

Results

Numbers Analyzed: Item 16. Number of participants (denominator) in each group included in each analysis and whether “intention-to-treat” and/or alternative analyses were conducted. State the results in absolute numbers when feasible (eg, 10/20, not just 50%).

Example. “Efficacy variables were analyzed on an intent-to-treat basis . . . and on an as-treated basis. In the intent-to-treat analysis, patients were considered treatment failures if they made any treatment changes, prematurely discontinued randomized treatment for any reason, or had missing data for 2 consecutive evaluations. In the as-treated analysis, only data from patients continuing randomized treatment were considered for analysis.”

Outcomes and Estimation: Item 17. For each primary and secondary outcome, a summary of results for each group and the estimated effect size and its precision (eg, 95% confidence interval). For the outcome[s] for which noninferiority or equivalence is hypothesized, a figure showing confidence intervals and margins of equivalence may be useful.

Examples. Inferiority of new treatment, figure legend. “Relative risk of blood loss of 1000 mL or more with misoprostol compared with oxytocin [1.39, 95% CI 1.19 to 1.63]. Vertical dotted lines represent margins of clinical equivalence determined a priori [0.74 and 1.35 on the relative scale]. Solid line represents null effect.” (A figure similar to case G in the Figure was presented on the relative scale.)

Noninferiority of new treatment. “The primary quadruple composite end point of death, myocardial infarction, urgent repeat revascularization, or in-hospital major bleeding by 30 days occurred in 299 (10.0%) of 2991 patients in the heparin plus Gp IIb/IIIa inhibitor group vs 275 (9.2%) of 2975 patients in the bivalirudin group (OR, 0.92; 95% CI, 0.77-1.09; P=0.32). Relative to heparin alone, the imputed OR was 0.62 (95% CI, 0.47-0.82), satisfying statistical criteria for noninferiority to heparin plus Gp IIb/IIIa blockade and superiority to heparin alone.” (A figure similar to case B in the Figure was presented on the relative scale but without the margin of noninferiority.)

Elaboration. In the first example the new treatment was inferior (statistically significantly worse), but it was uncertain whether the treatment effect was smaller or larger than the margin of equivalence of 1.35. The second example demonstrated noninferiority.

Comment

Interpretation: Item 20. Interpretation of the results, taking into account the noninferiority or equivalence hypothesis and any other trial hypotheses, sources of potential bias or imprecision, and the dangers associated with multiplicity of analyses and outcomes.

Examples. Concluding noninferiority. “According to our definition of equivalence, the efficacy of the . . . long-short regimen (was) statistically equivalent to the efficacy of the long-long regimen . . . The upper limit of the 95 percent confidence interval for the difference between the rates in the two groups was 5.3 percent (close to the boundary of 6.0 percent).”

Concluding inferiority of new drug (or conventional superiority of reference drug). “Although the trial was intended to assess the noninferiority of tirofiban as compared with abciximab, the findings demonstrated that tirofiban offered less protection from major ischemic events than did abciximab. . . . In order to meet the present definition of equivalence, the upper bound of the 95% confidence interval of the hazard ratio for the comparison of tirofiban with abciximab had to be less than 1.47. . . . The primary endpoint occurred more frequently among the 2398 patients in the tirofiban group than among the 2411 patients in the abciximab group (7.6 percent vs. 6.0 percent; hazard ratio, 1.26; . . . two-sided 95 percent confidence interval of 1.01 to 1.57, demonstrating the superiority of abciximab over tirofiban; P=0.038).”

Concluding noninferiority of new drug from a trial designed to assess superiority. “The SYNERGY protocol prespecified that if enoxaparin was not demonstrated to be superior to unfractionated heparin, a noninferiority analysis was to be performed. . . . Enoxaparin was not superior to unfractionated heparin but was noninferior for the treatment of high-risk patients with non-ST-segment elevation ACS.”

Comment

It is not our intent to promote noninferiority or equivalence trials: the design should be appropriate to the question to be answered. The availability of efficacious reference treatments can make placebo controls unethical. Yet even when a treatment is efficacious on some measures, eg, depression scales, it may not be efficacious for a rarer but more important outcome, eg, suicide.

Reports of noninferiority and equivalence trials must be clear enough to allow readers to interpret results reliably. Accordingly, we herein propose extensions to the CONSORT statement to facilitate appropriate reporting of noninferiority and equivalence trials.

We advocate that editors extend their support of the original CONSORT statement to include this extension for noninferiority and equivalence trials and refer to it in their “Instructions to Authors.” Adoption of the original CONSORT statement by journals is associated with improved quality of reporting, so we hope that this proposed extension will result in similar improvements for noninferiority and equivalence trials.

The CONSORT Group continues to update and extend its recommendations. The current recommendations add to recent extensions for cluster randomized trials and for the reporting of harms. Further extensions are in preparation.
The current versions of all CONSORT recommendations are available at http://www.consort-statement.org.

Corresponding Author: Gilda Piaggio, PhD, Department of Reproductive Health and Research, World Health Organization, 1211 Geneva 27, Switzerland (piaggiog@who.int).

Financial Disclosures: None reported.

Funding/Support: No funding was received for writing this article, although Drs Altman, Elbourne, and Piaggio and Mr Evans were supported by the CONSORT Group to attend a meeting in Canada on this topic. The wider CONSORT Group commented on earlier drafts and endorsed its submission for publication. Dr Altman is supported by Cancer Research United Kingdom. Dr Piaggio is supported by the UNDP/UNFPA/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction, Department of Reproductive Health and Research, World Health Organization.

Acknowledgment: We thank the members of the CONSORT Group, especially David Moher, MSc, Thomas C. Chalmers Centre for Systematic Reviews, Children's Hospital of Eastern Ontario, Ottawa; Ken Schulz, PhD, Quantitative Science, Family Health International, Research Triangle Park, NC; Susan Eastwood, ELS, Publications and Grants Writing, Department of Neurological Surgery, University of California at San Francisco, Emeryville; Peter C. Gøtzsche, MD, Nordic Cochrane Centre, Copenhagen, Denmark; Barbara Hawkins, PhD, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Md; Tom Lang, MA, Lakewood, Ohio; Ingram Olkin, PhD, Stanford University, Stanford, Calif; David L. Sackett, OC, FRSC, MD, FRCP, Trout Research and Education Centre, Markdale, Ontario; and David Simel, MD, MHS, Duke University, Durham, NC, and Simon Day, PhD, Licensing Division, Medicines and Healthcare Products Regulatory Agency, London, England, for comments on earlier drafts.
We also thank Luciano Costa, MD, Division of Medical Oncology, University of Colorado Health Science Center, Aurora, for providing unpublished data.

REFERENCES

Begg C, Cho M, Eastwood S. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276:637-639. PMID: 8773637
Moher D, Schulz KF, Altman DG; for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357:1191-1194. PMID: 11323066
Altman DG, Schulz KF, Moher D. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663-694. PMID: 11304107
Ioannidis JPA, Evans SJW, Gøtzsche PC. Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med. 2004;141:781-788. PMID: 15545678
Schulz KF. Randomized trials, human nature, and reporting guidelines. Lancet. 1996;348:596-598. PMID: 8774577
Jüni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001;323:42-46. PMID: 11440947
Egger M, Jüni P, Bartlett C. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? empirical study. Health Technol Assess. 2003;7:1-76. PMID: 12583822
Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton, Fla: Chapman Hall/CRC; 2003.
Committee for Proprietary Medicinal Products. Note for Guidelines on Evaluation of Medicinal Products Indicated for Treatment of Bacterial Infections. London, England: European Medicines Agency (EMEA); April 22, 2004. Available at: http://www.emea.eu.int/pdfs/human/ewp/055895en.pdf. Accessed July 2004.
Durkalski VL, Palesch YY, Pineau BC. The virtual colonoscopy study: a large multicenter clinical trial designed to compare two diagnostic screening procedures. Control Clin Trials. 2002;23:570-583. PMID: 12392872
Clinical Outcomes of Surgical Therapy Study Group. A comparison of laparoscopically assisted and open colectomy for colon cancer. N Engl J Med. 2004;350:2050-2059. PMID: 15141043
Chadwick D; Vigabatrin European Monotherapy Study Group. Safety and efficacy of vigabatrin and carbamazepine in newly diagnosed epilepsy: a multicentre randomised double-blind study. Lancet. 1999;354:13-19. PMID: 10406359
Assessment of the Safety and Efficacy of a New Thrombolytic (ASSENT-2) Investigators. Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction: the ASSENT-2 double-blind randomised trial. Lancet. 1999;354:716-722. PMID: 10475182
von Hertzen H, Piaggio G, Ding J. Low dose mifepristone and two regimens of levonorgestrel for emergency contraception: a WHO multicentre randomised trial. Lancet. 2002;360:1803-1810. PMID: 12480356
Smyth A, Tan KH-V, Hyman-Taylor P. Once versus three times daily regimens of tobramycin treatment for pulmonary exacerbations of cystic fibrosis–the TOPIC study: a randomised controlled trial. Lancet. 2005;365:573-578. PMID: 15708100
Pocock SJ. The pros and cons of non-inferiority (equivalence) trials. In: Guess HA, Kleinman A, Kusek JW, Engel LW, eds. The Science of Placebo: Towards an Interdisciplinary Research Agenda. London, England: BMJ Books; 2000:236-248.
Villar J, Ba'aqeel H, Piaggio G. WHO antenatal care randomised trial for the evaluation of a new model of routine antenatal care. Lancet. 2001;357:1551-1564. PMID: 11377642
Kapur A, Malik IS, Bagger JP. The Coronary Artery Revascularisation in Diabetes (CARDia) trial: background, aims, and design. Am Heart J. 2005;149:13-19. PMID: 15660030
D'Agostino RB Sr, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues—the encounters of academic consultants in statistics. Stat Med. 2003;22:169-186. PMID: 12520555
ICH Steering Committee. Harmonised Tripartite Guideline: Choice of Control Group and Related Issues in Clinical Trials (E10). Geneva, Switzerland: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; 2000. Available at: http://www.ich.org/LOB/media/MEDIA486.pdf. Accessed February 9, 2006.
Djulbegovic B, Clarke M. Scientific and ethical issues in equivalence trials. JAMA. 2001;285:1206-1208. PMID: 11231752
Points to Consider on Switching Between Superiority and Non-inferiority. London, England: European Medicines Agency (EMEA); July 27, 2000. Available at: http://www.emea.eu.int/pdfs/human/ewp/048299en.pdf. Accessed February 9, 2006.
Tamayo-Sarver JH, Albert JM, Tamayo-Sarver M. Advanced statistics: how to determine whether your intervention is different, at least as effective as, or equivalent: a basic introduction. Acad Emerg Med. 2005;12:536-542. PMID: 15930406
Wiens BL. Choosing an equivalence limit for non-inferiority or equivalence studies. Control Clin Trials. 2002;23:2-14. PMID: 11852160
Jones B, Jarvis P, Lewis JA. Trials to assess equivalence: the importance of rigorous methods. BMJ. 1996;313:36-39. PMID: 8664772
Detsky AS, Sackett DL. When was a negative clinical trial big enough? how many patients you needed depends on what you found. Arch Intern Med. 1985;145:709-712. PMID: 3985731
Rothmann M, Li N, Chen G. Design and analysis of non-inferiority mortality trials in oncology. Stat Med. 2003;22:239-264. PMID: 12520560
Lewis JA. The European regulatory experience. Stat Med. 2002;21:2931-2938. PMID: 12325109
Points to Consider on the Choice of Non-inferiority Margin. London, England: European Medicines Agency (EMEA); February 26, 2004. Available at: http://www.emea.eu.int/pdfs/human/ewp/215899en.pdf. Accessed February 9, 2006.
Senn S. Statistical Issues in Drug Development. Chichester: John Wiley & Sons; 2002:207-217.
Kim MY, Goldberg JD. The effects of outcome misclassification and measurement error on the design and analysis of therapeutic equivalence trials. Stat Med. 2001;20:2065-2078. PMID: 11439421
Dunnett CW, Gent M. Significance testing to establish equivalence between treatments with special reference to data in the form of 2 x 2 tables. Biometrics. 1977;33:593-602. PMID: 588654
Dunnett CW, Gent M. An alternative to the use of two-sided tests in clinical trials. Stat Med. 1996;15:1729-1738. PMID: 8870155
Rothman KJ. Significance questing. Ann Intern Med. 1986;105:445-447. PMID: 3740684
Brittain E, Lin D. A comparison of intent-to-treat and per-protocol results in antibiotic non-inferiority trials. Stat Med. 2005;24:1-10. PMID: 15532089
ICH Steering Committee. Statistical Principles for Clinical Trials (E9). Geneva, Switzerland: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; 1998. Available at: http://www.ich.org/LOB/media/MEDIA485.pdf. Accessed February 9, 2006.
McAlister FA, Sackett DL. Active-control equivalence trials and antihypertensive agents. Am J Med. 2001;111:553-558. PMID: 11705432
Lallemant M, Jourdain G, Le Coeur S. A trial of shortened zidovudine regimens to prevent mother-to-child transmission of human immunodeficiency virus type 1. N Engl J Med. 2000;343:982-991. PMID: 11018164
Prandoni P, Bruchi O, Sabbion P. Prolonged thromboprophylaxis with oral anticoagulants after total hip arthroplasty. Arch Intern Med. 2002;162:1966-1971. PMID: 12230419
Chan FKL, Chung S, Suen BY. Preventing recurrent upper gastrointestinal bleeding in patients with Helicobacter pylori infection who are taking low-dose aspirin or naproxen. N Engl J Med. 2001;344:967-973. PMID: 11274623
Durrleman S, Simon R. Planning and monitoring of equivalence studies. Biometrics. 1990;46:329-336. PMID: 2194579
Sackett DL. Superiority trials, non inferiority trials, and prisoners of the 2-sided null hypothesis. ACP J Club. 2004;140:A11. PMID: 15122874
Fleiss JL. General design issues in efficacy, equivalency and superiority trials. J Periodontal Res. 1992;27:306-313. PMID: 1507018
SYNERGY Trial Investigators. Enoxaparin vs unfractionated heparin in high-risk patients with non–ST-segment elevation acute coronary syndromes managed with an intended early invasive strategy: primary results of the SYNERGY randomized trial. JAMA. 2004;292:45-54. PMID: 15238590
Elbourne D, Dezateux C, Arthur R; UK Collaborative Hip Trial Group. Ultrasonography in the diagnosis and management of developmental hip dysplasia (UK Hip Trial): clinical and economic results of a multicentre randomised controlled trial. Lancet. 2002;360:2009-2017. PMID: 12504396
Lincoff AM, Bittl JA, Harrington RA. Bivalirudin and provisional glycoprotein IIb/IIIa blockade compared with heparin and planned glycoprotein IIb/IIIa blockade during percutaneous coronary intervention: REPLACE-2 randomized trial. JAMA. 2003;289:853-863. PMID: 12588269
The Cochrane Library. Issue 3. Chichester, England: Wiley; 2004.
Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA. 1994;272:122-124. PMID: 8015121
Greene WL, Concato J, Feinstein AR. Claims of equivalence in medical research: are they supported by the evidence? Ann Intern Med. 2000;132:715-722. PMID: 10787365
Costa LJ, Xavier ACG, del Giglio A. Negative results in cancer clinical trials—equivalence or poor accrual? Control Clin Trials. 2004;25:525-533. PMID: 15465621
Krysan DJ, Kemper AR. Claims of equivalence in randomized controlled trials of the treatment of bacterial meningitis in children. Pediatr Infect Dis J. 2002;21:753-758. PMID: 12192164
Dimick JB, Diener-West M, Lipsett PA. Negative results of randomized clinical trials published in the surgical literature: equivalency or error? Arch Surg. 2001;136:796-800. PMID: 11448393
Piaggio G, Pinol AP. Use of the equivalence approach in reproductive health clinical trials. Stat Med. 2001;20:3571-3577. PMID: 11746338
Bernard P, Chosidow O, Vaillant L; French Erysipelas Study Group. Oral pristinamycin versus standard penicillin regimen to treat erysipelas in adults: randomised, non-inferiority, open trial. BMJ. 2002;325:864. PMID: 12386036
Carlier C, Coste J, Etchepare M. A randomised controlled trial to test equivalence between retinyl palmitate and carotene for vitamin A deficiency. BMJ. 1993;307:1106-1110. PMID: 8251808
Hansson L, Lindholm LH, Ekbom T; STOP-Hypertension-2 Study Group. Randomised trial of old and new antihypertensive drugs in elderly patients: cardiovascular mortality and morbidity in the Swedish Trial in Old Patients with Hypertension-2 study. Lancet. 1999;354:1751-1756. PMID: 10577635
Gülmezoglu AM, Villar J, Ngoc NTN; WHO Collaborative Group to Evaluate Misoprostol in the Management of the Third Stage of Labour. WHO multicentre randomised trial of misoprostol in the management of the third stage of labour. Lancet. 2001;358:689-695. PMID: 11551574
Topol EJ, Moliterno DJ, Herrmann HC. Comparison of two platelet glycoprotein IIb/IIIa inhibitors, tirofiban and abciximab, for the prevention of ischemic events with percutaneous coronary revascularization. N Engl J Med. 2001;344:1888-1894. PMID: 11419425
Halperin JL; Executive Committee, SPORTIF III and V Study Investigators. Ximelagatran compared with warfarin for prevention of thromboembolism in patients with nonvalvular atrial fibrillation: rationale, objectives, and design of a pair of clinical studies and baseline patient characteristics (SPORTIF III and V). Am Heart J. 2003;146:431-438. PMID: 12947359
Staszewski S, Keiser P, Montaner J. Abacavir-lamivudine-zidovudine vs indinavir-lamivudine-zidovudine in antiretroviral-naïve HIV-infected adults. JAMA. 2001;285:1155-1163. PMID: 11231744
Rothman KJ. Placebo mania. BMJ. 1996;313:3-4. PMID: 8664770
Gunnell D, Saperia J, Ashby D. Selective serotonin reuptake inhibitors (SSRIs) and suicide in adults: meta-analysis of drug company data from placebo controlled, randomised controlled trials submitted to the MHRA's safety review. BMJ. 2005;330:385-388. PMID: 15718537
Moher D, Jones A, Lepage L; CONSORT Group. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA. 2001;285:1992-1995. PMID: 11308436
Egger M, Jüni P, Bartlett C; CONSORT Group. Value of flow diagrams in reports of randomized controlled trials. JAMA. 2001;285:1996-1999. PMID: 11308437
Devereaux PJ, Manns BJ, Ghali WA. The reporting of methodological factors in randomized controlled trials and the association with a journal policy to promote adherence to the Consolidated Standards of Reporting Trials (CONSORT) checklist. Control Clin Trials. 2002;23:380-388. PMID: 12161081
Campbell MK, Elbourne DR, Altman DG; CONSORT Group. CONSORT statement: extension to cluster randomised trials. BMJ. 2004;328:702-708. PMID: 15031246

JAMA, American Medical Association

Publisher: American Medical Association
Copyright: Copyright 2006 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
ISSN: 0098-7484
eISSN: 1538-3598
DOI: 10.1001/jama.295.10.1152
PMID: 16522836

Abstract

The CONSORT (Consolidated Standards of Reporting Trials) Statement, including a checklist and a flow diagram, was developed to help authors improve their reporting of randomized controlled trials. Its primary focus was on individually randomized trials with 2 parallel groups that assess the possible superiority of one treatment compared with another but is now being extended to other trial designs. Noninferiority and equivalence trials have methodological features that differ from superiority trials and present particular difficulties in design, conduct, analysis, and interpretation. Although the rationale for such trials occurs frequently, those designed and described specifically as noninferiority or equivalence trials appear less commonly in the medical literature. The quality of reporting of those that are published is often inadequate. In this article, we present an adapted CONSORT checklist for reporting noninferiority and equivalence trials and provide illustrative examples and explanations for those items amended from the original CONSORT checklist. The intent is to improve reporting of noninferiority and equivalence trials, enabling readers to assess the validity of their results and conclusions.The Consolidated Standards of Reporting Trials (CONSORT) statement was developed to alleviate the problem of inadequate reporting of randomized controlled trials (RCTs),which has been associated with biased treatment effects.The statement comprises evidence-based recommendations for reporting RCTs, including a flowchart of participants through the trial.CONSORT's primary focus is on parallel group trials,aiming to identify treatment superiority if it really exists. Most CONSORT recommendations apply equally well to other trial designs, but some need adaptation. Herein we extend the CONSORT recommendations to noninferiority and equivalence trials. First, we explain the rationale for and key methodological features of such trials. 
Second, we consider how commonly noninferiority and equivalence trials are published and provide empirical evidence about their quality. Last, we present an adapted CONSORT checklist for reporting noninferiority and equivalence trials and give illustrative examples (and further elaboration) for those items that have been amended.For convenience, we will refer to treatmentsand patients, although we recognize that not all interventions evaluated in RCTs are technically treatments and the participants in trials are not always patients.Rationale for Noninferiority or Equivalence DesignsMost RCTs aim to determine whether one intervention is superior to another. By contrast, equivalence trialsaim to determine whether one (typically new) intervention is therapeutically similar to another, usually an existing treatment. We use newto refer to the treatment under test, and the comparison or reference treatment is often called an active control.A noninferiority trial seeks to determine whether a new treatment is no worse than a reference treatment. Because proof of exact equality is impossible, a prestated margin of noninferiority (&Dgr;) for the treatment effect in a primary patient outcome is defined. Equivalence trials are very similar, except that equivalence is defined as the treatment effect being between −&Dgr; and &Dgr;. True (2-sided) equivalence therapeutic or prophylactic trials are rare because most such trials address the question of noninferiority.In trials that investigate noninferiority, the question of interest is not symmetric.The new treatment will be recommended if it is similar to or better than an existing one, but not if it is worse (by more than &Dgr;). Superiority of the new treatment would be a bonus. 
This article focuses mainly on noninferiority trials but applies also to 2-sided equivalence trials.Noninferiority trials are intended to show whether a new treatment has at least as much efficacy as the standard or is worse by an amount less than &Dgr;, often on the premise that it has some other advantage, eg, greater availability, reduced cost, less invasiveness,fewer side effects (harms),or greater ease of administration,for instance, one daily dose rather than 2 dosesor more than 2 doses.Some noninferiority trials have been criticized for merely studying a new marketable product (“me-too” drugs).Noninferiority and equivalence trials are not limited to drugs. For example, a new antenatal care model with fewer clinic visits and reduced cost was investigated for its equivalence to the standard model as regards maternal and neonatal outcomes.A noninferiority trial compared 2 interventional strategies for coronary revascularization in diabetic patients.Methodological Issues in Noninferiority and Equivalence TrialsNoninferiority and equivalence trials present particular difficulties in design, conduct, analysis, and interpretation.HypothesesIn a superiority trial the null hypothesis is that treatments are equally effective and the alternative hypothesis is that they differ. A type I error is falsely finding a treatment effect when there is none, and a type II error is failing to detect a treatment effect when truly one exists. 
In noninferiority trials, the null and alternative hypotheses are reversed; a type I error is the erroneous acceptance of an inferior new treatment, whereas a type II error is the erroneous rejection of a truly noninferior treatment.DesignA noninferiority or equivalence trial requires that the reference treatment's efficacy is establishedor is in widespread use so that a placebo or untreated control group would be deemed unethical.Both participants and outcome measures in a noninferiority or equivalence trial should be similar to those in trial(s) that established the efficacy of the reference treatment. Outcome measures should also be similar to those in previous trials.The required sample size is calculated using the confidence interval (CI) approach, considering where the CI for the treatment effect lies with respect to both the margin of noninferiority &Dgr; and a null effect. Sample size depends on the level of confidence chosen, the risk of type II error (or desired power), and &Dgr;.A prestated margin of noninferiority &Dgr; can be specified as a difference in means or proportions or the logarithm of an odds ratio, risk ratio, or hazard ratio. A prestated margin of noninferiority is often chosen as the smallest value that would be a clinically important effect.If relevant, &Dgr; should be smaller than the “clinically relevant” effect chosen to investigate superiority of reference treatment against placebo.For example, if mortality with treatment A is better than placebo by 10% (absolute difference), a new treatment B might need to be at least 5% better than placebo (and thus no more than 5% worse than A). The required size of noninferiority trials is therefore usually larger than that for superiority trials.Unfortunately, sample sizes for noninferiority and equivalence trials are often too small.Given several previous trials, the effect of the reference treatment can be estimated from a meta-analysis. 
There are several techniques to determine &Dgr;,its magnitude being influenced by several factors, eg, efficacy, safety, cost, acceptability, and adherence.ConductTrial conduct should closely match any trial that demonstrated efficacy of the reference treatment, provided they were of high quality.One should avoid features that might dilute true differences between treatments, thereby enhancing the risk of erroneously concluding noninferiority,eg, poor adherence, dropouts, recruitment of patients unlikely to respond, and treatment crossovers.AnalysisAlthough a modified hypothesis testing framework exists,a more informative CI approach is preferred in the design, analysis, and reporting of noninferiority and equivalence trials.For superiority trials, intention-to-treat (ITT) analysis (analyzing all patients within their randomized groups, regardless of whether they completed allocated treatment) is recommended.Intention-to-treat analysis often leads to smaller observed treatment effects than if all patients had adhered to treatment. In noninferiority trials, ITT analysis will often increase the risk of falsely claiming noninferiority (type I error),although not always.In practice, ITT analysis is often not possible and one uses a “full analysis set” to describe that patient follow-up, which is “as complete . . . and . . . close as possible” to ITT.Alternative analyses that exclude patients not taking allocated treatment or otherwise not protocol-adherent could bias the trial in either direction.The terms on-treatmentor per-protocolanalysis are often used but may be inadequately defined. Potentially biased non-ITT analysis is less desirable than ITT in superiority trials but may still provide some insight. 
In noninferiority and equivalence trials, non-ITT analyses might be desirable as a protection from ITT's increase of type I error risk (falsely concluding noninferiority).There is greater confidence in results when the conclusions are consistent.Subgroup analysis requires the same caveats in noninferiority trials as it requires in superiority trials. Interim analyses in noninferiority trials have some differences in rationale from superiority trials. If noninferiority is established before the trial is completed, there may be no ethical requirement to stop early because of lack of efficacy.However, other advantages (adverse effects, cost) could justify stopping the trial, to expedite availability of the new treatment. If a treatment is clearly inferior, then stopping the trial (or a particular trial arm) is ethically justified.Stopping rules might be asymmetric, a trial being allowed to continue longer if the new treatment appears superior,although this result is unlikely.InterpretationInterpreting a noninferiority trial's results depends on where the CI for the treatment effect lies relative to both the margin of noninferiority &Dgr; and a null effect. The observed treatment effect is not by itself sufficiently informative. With 2-sided equivalence the interpretation is analogous, but both margins &Dgr; and −&Dgr; need considering, and claiming equivalence requires the CI to lie wholly between −&Dgr; and &Dgr;.Many noninferiority trials based their interpretation on the upper limit of a 1-sided 97.5% CI, which is the same as the upper limit of a 2-sided 95% CI. Although both 1-sided and 2-sided CIs allow for inferences about noninferiority, we suggest that 2-sided CIs are appropriate in most noninferiority trials.If a 1-sided 5% significance level is deemed acceptable for the noninferiority hypothesis test(a decision open to question), a 90% 2-sided CI could then be used. 
The Figureinterprets several possible scenarios with 2-sided CIs for a noninferiority trial.Figure.Possible Scenarios of Observed Treatment Differences for Adverse Outcomes (Harms) in Noninferiority TrialsError bars indicate 2-sided 95% confidence intervals (CIs). Tinted area indicates zone of inferiority. A, If the CI lies wholly to the left of zero, the new treatment is superior. B and C, If the CI lies to the left of &Dgr; and includes zero, the new treatment is noninferior but not shown to be superior. D, If the CI lies wholly to the left of &Dgr; and wholly to the right of zero, the new treatment is noninferior in the sense already defined, but it is also inferior in the sense that a null treatment difference is excluded. This puzzling case is rare, since it requires a very large sample size. It can also result from having too wide a noninferiority margin. E and F, If the CI includes &Dgr; and zero, the difference is nonsignificant but the result regarding noninferiority is inconclusive. G, If the CI includes &Dgr; and is wholly to the right of zero, the difference is statistically significant but the result is inconclusive regarding possible inferiority of magnitude &Dgr; or worse. H, If the CI is wholly above &Dgr;, the new treatment is inferior.*This CI indicates noninferiority in the sense that it does not include &Dgr;, but the new treatment is significantly worse than the standard. Such a result is unlikely because it would require a very large sample size. 
†This CI is inconclusive in that it is still plausible that the true treatment difference is less than Δ, but the new treatment is significantly worse than the standard.
Once noninferiority is evident, it is acceptable to go on to assess whether the new treatment appears superior to the reference treatment, using an appropriate test or CI (ie, not just the point estimate), preferably defined a priori and with an ITT analysis. It is inappropriate to claim noninferiority post hoc from a superiority trial unless the claim is clearly related to a predefined margin of equivalence. That is, both superiority and noninferiority hypotheses should be specified explicitly in the trial protocol. It is, however, always reasonable to interpret a CI as excluding an effect of a particular prestated size.
Having demonstrated noninferiority against the reference treatment, some authors then claim efficacy of the new treatment relative to placebo by also using evidence from earlier trials of the reference treatment vs placebo. Such inferences assume assay constancy, ie, that the current and earlier trials are identical in all relevant aspects (eg, participants, outcome definitions, and use of standard therapy). Regarding patient populations, for example, this implies either no differences in the effect of treatment across subgroups or a similar distribution of relevant subgroups in the current and earlier trials. In the absence of assay constancy, an adjustment method has been proposed. Since assay constancy is inevitably questionable, any claims regarding efficacy of the new treatment relative to placebo require cautious interpretation.
How Common Are Noninferiority and Equivalence Trials?
Assessing the frequency of noninferiority and equivalence trials is not straightforward because of inconsistencies in terminology. A search of the Cochrane Controlled Trials Register in October 2004 for the words "equivalence" or "noninferiority" yielded 1021 of 415 918 trials (0.2%), but these figures are likely to be misleading.
Not all noninferiority or equivalence trials use these words, and the term "equivalence" is often inappropriately used when reporting negative results of superiority trials; such trials often lack the statistical power to rule out important differences. Identifying noninferiority trials is difficult because they are often labeled as equivalence trials. A recent study found that only 3 of 188 (1.6%) cancer trials in a PubMed electronic search were designed to evaluate equivalence or noninferiority (Luciano Costa, MD, written communication, April 26, 2005). From these findings, it seems that whereas the objective of testing for noninferiority or equivalence is likely to be common, relatively few noninferiority and equivalence trials have been both designed and described as such. However, we perceive that these designs are becoming more widespread.
Quality of Reporting of Noninferiority and Equivalence Trials
We are not aware of any study that has examined the reporting of a cohort of trials actually designed as noninferiority or equivalence trials. There have been several reviews of the quality of trials claiming equivalence, without differentiation between noninferiority and 2-sided equivalence, because many authors use the term "equivalence" to mean either. Greene et al identified methodological flaws in a systematic review of 88 studies claiming equivalence, published from 1992 to 1996. Equivalence was inappropriately claimed in 67% of them, on the basis of nonsignificant tests for superiority. Fifty-one percent stated equivalence as an aim, but only 23% were designed with a preset margin of equivalence. Only 22% adopted appropriate practice: a predefined aim of equivalence, a preset Δ, consequent sample size determination, and actually testing equivalence. Other disease- or field-specific reviews reveal similar findings.
In a review of 25 RCTs in childhood bacterial meningitis published between 1980 and 2000 that claimed equivalent mortality, only 2 (8%) were designed to test equivalence. A review of 90 RCTs with statistically nonsignificant ("negative") results published in 3 surgical journals from 1988 to 1998 found that 39% met predefined criteria for establishing equivalence. Only 3 studies in a recent review of 188 cancer trials with negative results used a noninferiority or equivalence analysis. In a review of 20 trials intended to detect equivalence in reproductive health, only 4 stated a margin of equivalence. McAlister and Sackett evaluated 4 large "negative" RCTs in hypertension against the methodological requirements for active-control equivalence trials. Only 2 trials published both ITT and per-protocol (or on-treatment) analyses, only 1 trial specified the margin of equivalence in advance, and none was sufficiently large to address the equivalence hypothesis. This illustrates how failure to properly design, conduct, and analyze equivalence trials leads to incorrect conclusions about equivalence.
Extension of CONSORT Statement
To accommodate noninferiority or equivalence trials, an extension of the CONSORT statement should encompass the following issues: (1) the rationale for adopting a noninferiority or equivalence design; (2) how study hypotheses were incorporated into the design; (3) choice of participants, interventions (especially the reference treatment), and outcomes; (4) statistical methods, including sample size calculation; and (5) how the design affects interpretation and conclusions. Consequences for the CONSORT checklist and flow diagram, including specific changes, are described below.
Checklist
We build on the work of McAlister and Sackett in modifying the CONSORT checklist (Table), especially items 1 to 7, 12, 16, 17, and 20. New text is shown in italics.
For each modification, we include 1 or more examples of good reporting (and further elaboration where appropriate). In some examples, we have added text in brackets to explain the context. We mainly concentrate on noninferiority trials but make some reference to equivalence trials, which are much less common.
Table. Checklist of Items for Reporting Noninferiority or Equivalence Trials (Additions or Modifications to the CONSORT Checklist Are Shown in Italics)
Paper Section and Topic | Item No. | Descriptor (Adapted for Noninferiority or Equivalence Trials)
Title and abstract | 1* | How participants were allocated to interventions (eg, “random allocation,” “randomized,” or “randomly assigned”), specifying that the trial is a noninferiority or equivalence trial.
Introduction: Background | 2* | Scientific background and explanation of rationale, including the rationale for using a noninferiority or equivalence design.
Methods: Participants | 3* | Eligibility criteria for participants (detailing whether participants in the noninferiority or equivalence trial are similar to those in any trial[s] that established efficacy of the reference treatment) and the settings and locations where the data were collected.
Methods: Interventions | 4* | Precise details of the interventions intended for each group, detailing whether the reference treatment in the noninferiority or equivalence trial is identical (or very similar) to that in any trial(s) that established efficacy, and how and when they were actually administered.
Methods: Objectives | 5* | Specific objectives and hypotheses, including the hypothesis concerning noninferiority or equivalence.
Methods: Outcomes | 6* | Clearly defined primary and secondary outcome measures, detailing whether the outcomes in the noninferiority or equivalence trial are identical (or very similar) to those in any trial(s) that established efficacy of the reference treatment and, when applicable, any methods used to enhance the quality of measurements (eg, multiple observations, training of assessors).
Methods: Sample size | 7* | How sample size was determined, detailing whether it was calculated using a noninferiority or equivalence criterion and specifying the margin of equivalence with the rationale for its choice. When applicable, explanation of any interim analyses and stopping rules (and whether related to a noninferiority or equivalence hypothesis).
Methods: Randomization, sequence generation | 8 | Method used to generate the random allocation sequence, including details of any restriction (eg, blocking, stratification).
Methods: Randomization, allocation concealment | 9 | Method used to implement the random allocation sequence (eg, numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.
Methods: Randomization, implementation | 10 | Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
Methods: Blinding (masking) | 11 | Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. When relevant, how the success of blinding was evaluated.
Methods: Statistical methods | 12* | Statistical methods used to compare groups for primary outcome(s), specifying whether a 1- or 2-sided confidence interval approach was used. Methods for additional analyses, such as subgroup analyses and adjusted analyses.
Results: Participant flow | 13 | Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the trial protocol, and analyzed for the primary outcome. Describe protocol deviations from trial as planned, together with reasons.
Results: Recruitment | 14 | Dates defining the periods of recruitment and follow-up.
Results: Baseline data | 15 | Baseline demographic and clinical characteristics of each group.
Results: Numbers analyzed | 16* | Number of participants (denominator) in each group included in each analysis and whether “intention-to-treat” and/or alternative analyses were conducted. State the results in absolute numbers when feasible (eg, 10/20, not 50%).
Results: Outcomes and estimation | 17* | For each primary and secondary outcome, a summary of results for each group and the estimated effect size and its precision (eg, 95% confidence interval). For the outcome(s) for which noninferiority or equivalence is hypothesized, a figure showing confidence intervals and margins of equivalence may be useful.
Results: Ancillary analyses | 18 | Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those prespecified and those exploratory.
Results: Adverse events | 19 | All important adverse events or side effects in each intervention group.
Comment: Interpretation | 20* | Interpretation of the results, taking into account the noninferiority or equivalence hypothesis and any other trial hypotheses, sources of potential bias or imprecision and the dangers associated with multiplicity of analyses and outcomes.
Comment: Generalizability | 21 | Generalizability (external validity) of the trial findings.
Comment: Overall evidence | 22 | General interpretation of the results in the context of current evidence.
*Expansion of corresponding item on CONSORT checklist.
Title and Abstract
Title and Abstract: Item 1. How participants were allocated to interventions (eg, “random allocation,” “randomized,” or “randomly assigned”), specifying that the trial is a noninferiority or equivalence trial.
Title. “Oral Pristinamycin versus Standard Penicillin Regimen to Treat Erysipelas in Adults: Randomised, Noninferiority, Open Trial”
Abstract. “Design—Multicentre, parallel group, open labelled, randomised noninferiority trial.”
Introduction
Background: Item 2. Scientific background and explanation of rationale, including the rationale for using a noninferiority or equivalence design.
Example. “Up to 40 million children worldwide are estimated to suffer from vitamin A deficiency. . . .
A dose of 200,000 IU retinyl palmitate to children over 1 year old is most widely used and has generally been regarded as safe and potentially effective. . . . In developing countries, animal products that provide retinyl esters are too expensive. . . . Vegetables and fruit . . . are cheap and good sources of vitamin A in the form of beta carotene. . . . Beta carotene is also considered to be virtually non-toxic. . . . In a preliminary study, . . . after 20 days there was a reversion of the clinical and subclinical signs of vitamin A deficiency in the study group. . . . Since beta carotene is the principal source of vitamin A in developing countries and is non-toxic, we compared retinyl palmitate and beta carotene for treatment of vitamin A deficiency.”
Elaboration. The rationale should cite evidence for the efficacy of the reference treatment. If previous trials, or their systematic review, demonstrate the superiority of the reference treatment relative to placebo, they should be cited with effect sizes and CIs. If no such trials exist, other evidence for efficacy of the reference treatment should be given. Any evidence for other advantages of the new treatment over the reference treatment should also be given, to justify use of the new treatment if it is not inferior. One aim of the current trial might be to provide or support such evidence. In the case of “me-too” drugs, it should be clear whether there are other advantages.
Methods
Participants: Item 3. Eligibility criteria for participants (detailing whether participants in the noninferiority or equivalence trial are similar to those in any trial[s] that established efficacy of the reference treatment) and the settings and locations where the data were collected.
Example. “From Sept 1, 1992, to Dec 30, 1994, we enrolled 6628 men and women in 312 health centres in Sweden . . . who had hypertension (blood pressure ≥180 mm Hg systolic, ≥105 mm Hg diastolic, or both), aged 70-84 years.
The only difference in inclusion criteria between this trial and the STOP-Hypertension trial was that patients with isolated systolic hypertension could be included in STOP-Hypertension-2, based on previous positive findings in patients with isolated systolic hypertension treated with diuretics and calcium antagonists.”
Elaboration. Relevant changes in participants' characteristics compared with previous trial(s) should be reported and explained. Trial populations can differ, particularly if time has elapsed between trials; such descriptions should therefore concentrate on relevant departures (ie, those that might affect response to treatments).
Interventions: Item 4. Precise details of the interventions intended for each group, detailing whether the reference treatment in the noninferiority or equivalence trial is identical (or very similar) to that in any trial(s) that established efficacy, and how and when they were actually administered.
Example. “[W]e randomly assigned women about to deliver vaginally to receive 600 μg misoprostol orally or 10 IU oxytocin intravenously or intramuscularly, according to practice. . . . The use of uterotonic agents [oxytocin, a type of uterotonic, is the reference treatment] in the management of the third stage of labour reduces the amount of bleeding and the need for blood transfusion . . . ” (The authors cite a Cochrane systematic review showing that uterotonic agents reduced bleeding and blood transfusions compared with placebo.)
Elaboration. Any differences between the control intervention in the trial and the reference treatment in the previous trial(s) in which efficacy was established should be reported and explained.
For example, differences may exist because background treatment and patient management change with time, and concomitant therapies may differ. Dose changes may also occur: if the dose of the reference treatment is reduced, its efficacy might be reduced, which would favor the new treatment; if it is increased, possibly leading to tolerability problems, the new treatment's advantages could be overestimated.
Objectives: Item 5. Specific objectives and hypotheses, including the hypothesis concerning noninferiority or equivalence.
Example. “[A] bodyweight-adjusted single bolus of 0.50-0.55 mg/kg tenecteplase would be equivalent to a 90 min regimen of alteplase for efficacy and safety [the primary endpoint for efficacy was all-cause 30-day mortality from acute myocardial infarction]. In this double-blind, randomised, controlled study, we formally tested this hypothesis.”
Elaboration. The authors should specify for which outcomes noninferiority or equivalence hypotheses apply and for which superiority hypotheses apply. Usually the noninferiority or equivalence hypothesis refers to the primary end point, whereas the new treatment is expected to offer other advantages, eg, fewer adverse effects or lower cost.
Outcomes: Item 6. Clearly defined primary and secondary outcome measures, detailing whether the outcomes in the noninferiority or equivalence trial are identical (or very similar) to those in any trial(s) that established efficacy of the reference treatment and, when applicable, any methods used to enhance the quality of measurements (eg, multiple observations, training of assessors).
Example. “Over the past decade seven large, randomised, placebo-controlled trials involving a total of 16,770 patients who underwent percutaneous interventions have established that the overall reduction in the risk of death or nonfatal myocardial infarction 30 days after adjunctive inhibition of platelet glycoprotein IIb/IIIa receptors is 38 percent. Three glycoprotein IIb/IIIa inhibitors were assessed in these trials.
The primary end point [in the present trial] was a composite of death, nonfatal myocardial infarction, or urgent target-vessel revascularization within 30 days after the index procedure.”
Elaboration. Any differences in outcome measures in the new trial compared with trial(s) that established efficacy of the reference treatment should be noted and justified. In particular, note any changes in timing of evaluation. Ideally, outcomes should remain unchanged, but new insights often lead to change as the understanding, management, and prognosis of a disease improve. For example, early AIDS trials used death as the outcome; when deaths became uncommon, trials shifted to AIDS clinical events, and when clinical events in turn became uncommon, they shifted to surrogate markers.
Sample Size: Item 7a. How sample size was determined, detailing whether it was calculated using a noninferiority or equivalence criterion, and specifying the margin of equivalence with the rationale for its choice.
Examples. “Considering previous studies, a primary event rate of 3.1% per year was estimated for patients in both treatment groups. To obtain 90% statistical power with a 1-sided α equal to 0.025, approximately 1600 patient-years of exposure per treatment groups are necessary to establish the noninferiority of ximelagatran compared with dose-adjusted warfarin within 2% per year. . . . Assuming an average follow up of 16 months, approximately 2400 patients are required.”
“Sample size was based on . . . [an] 8.0% primary quadruple end point event rate in the control (heparin plus Gp IIb/IIIa blockade) group [reference treatment] and a 12.5% relative reduction in the bivalirudin arm.
Using a 2-sided α level of .05 and 3000 patients per group, the trial had a 99% power to detect superiority over the imputed heparin control [historical control] and a 92% power to satisfy noninferiority criteria relative to heparin plus Gp IIb/IIIa.”
Elaboration. The margin of noninferiority or equivalence should be specified and preferably justified on clinical grounds. Its relation to the effect of the reference treatment relative to placebo in any previous trials should be noted (see second example). Sample size calculations are usually based on the assumption that the point estimate of the difference between treatments will be 0 (as in the first example above). Cases F and G in the Figure would have met the noninferiority criterion had the observed point estimates been 0. That is, the precision of the estimates would have been adequate had the 2 treatments been equally effective. With a large enough sample, it is possible to demonstrate noninferiority even when the point estimate is between 0 and Δ. If the true effect is assumed to be greater than 0, the sample size will need to be increased, perhaps substantially.
Stopping Rules: Item 7b. When applicable, explanation of any interim analyses and stopping rules (and whether related to a noninferiority or equivalence hypothesis).
Example. “Interim safety analyses were planned when 40 and 70 percent of the total number of women had been enrolled. An increased rate of HIV transmission associated with the shorter regimens, as compared with the long-long regimen, would be considered significant if any of the nominal P values for the differences were less than 0.007 in the first interim analysis and less than 0.012 in the second. . . .”
Elaboration. It is customary to base interim stopping criteria on P values, and these adjusted P values are analogous to widened CIs.
Statistical Methods: Item 12. Statistical methods used to compare groups for primary outcome(s), specifying whether a 1- or 2-sided confidence interval approach was used. Methods for additional analyses, such as subgroup analyses and adjusted analyses.
Examples. Binary outcome. “The proportion of the intention-to-treat population experiencing primary events per year for both treatment groups, and the associated 1-sided 97.5% CI for the difference, will be estimated using the time to first event . . . The noninferiority margin (Δ) defined in the primary analysis is based on absolute event rate differences . . . Noninferiority of ximelagatran over warfarin will be accepted [in a 0.025 level test] if the upper bound of the 97.5% CI around the estimated difference in primary event rates lies below Δ. For these studies, an absolute Δ of 2% was adopted. . . . ”
Continuous outcome. “Regimens were regarded as equivalent if the difference between treatments in change in FEV1 (using 95% CI) was less than 4% of predicted FEV1 . . . Since we were undertaking an equivalence study, the primary analysis was per protocol but an intention-to treat analysis was also undertaken. The mean difference between treatments and 95% CI for the true difference was obtained from analysis of variance, with adjustment for centre and type of clinic. . . . ”
Elaboration. The upper bound of the 1-sided (1 − α) × 100% CI (or, correspondingly, the upper bound of the 2-sided (1 − 2α) × 100% CI) for the treatment effect has to be below the margin Δ to declare that noninferiority has been shown, with a significance level α.
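The decision rule just stated can be sketched in a few lines. This is a minimal illustration, assuming a Wald (normal-approximation) CI for a difference in event rates per patient-year with Poisson event counts; that modeling choice is ours and is not necessarily the method used in the quoted trials. The margin of 2 events per 100 patient-years echoes the ximelagatran example, but the event counts are hypothetical, and the function names (`rate_difference_ci`, `shows_noninferiority`, `shows_equivalence`) are illustrative only. The 2-sided level is derived as 1 − 2α, so its upper limit coincides with that of the 1-sided (1 − α) CI.

```python
from math import sqrt
from statistics import NormalDist


def rate_difference_ci(events_new, years_new, events_ref, years_ref, level=0.95):
    """Wald CI for the difference in event rates per patient-year
    (new minus reference), treating event counts as Poisson."""
    rate_new = events_new / years_new
    rate_ref = events_ref / years_ref
    # Under a Poisson model, Var(events / years) is estimated by events / years**2.
    se = sqrt(events_new / years_new**2 + events_ref / years_ref**2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = rate_new - rate_ref
    return diff - z * se, diff + z * se


def shows_noninferiority(upper, delta):
    """Noninferiority: the upper CI limit lies strictly below the margin delta."""
    return upper < delta


def shows_equivalence(lower, upper, delta):
    """2-sided equivalence: the whole CI lies between -delta and delta."""
    return -delta < lower and upper < delta


ALPHA = 0.025             # prespecified 1-sided significance level
LEVEL = 1 - 2 * ALPHA     # matching 2-sided confidence level (0.95)
DELTA = 0.02              # absolute margin: 2 events per 100 patient-years

if __name__ == "__main__":
    # Hypothetical counts: 50 vs 48 events, each over 1600 patient-years.
    lower, upper = rate_difference_ci(50, 1600, 48, 1600, LEVEL)
    print(f"{LEVEL:.0%} CI for rate difference: {lower:.4f} to {upper:.4f}")
    print("noninferior:", shows_noninferiority(upper, DELTA))
```

The equivalence check mirrors the FEV1 example: the whole 2-sided (1 − 2α) CI must lie inside (−Δ, Δ), which corresponds to carrying out two 1-sided tests, each at level α.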
Both Δ and α should be prespecified in the noninferiority hypothesis.
Results
Numbers Analyzed: Item 16. Number of participants (denominator) in each group included in each analysis and whether “intention-to-treat” and/or alternative analyses were conducted. State the results in absolute numbers when feasible (eg, 10/20, not just 50%).
Example. “Efficacy variables were analyzed on an intent-to-treat basis . . . and on an as-treated basis. In the intent-to-treat analysis, patients were considered treatment failures if they made any treatment changes, prematurely discontinued randomized treatment for any reason, or had missing data for 2 consecutive evaluations. In the as-treated analysis, only data from patients continuing randomized treatment were considered for analysis.”
Outcomes and Estimation: Item 17. For each primary and secondary outcome, a summary of results for each group and the estimated effect size and its precision (eg, 95% confidence interval). For the outcome[s] for which noninferiority or equivalence is hypothesized, a figure showing confidence intervals and margins of equivalence may be useful.
Examples. Inferiority of new treatment, figure legend. “Relative risk of blood loss of 1000 mL or more with misoprostol compared with oxytocin [1.39, 95% CI 1.19 to 1.63]. Vertical dotted lines represent margins of clinical equivalence determined a priori [0.74 and 1.35 on the relative scale]. Solid line represents null effect.” (A figure similar to case G in the Figure was presented on the relative scale.)
Noninferiority of new treatment. “The primary quadruple composite end point of death, myocardial infarction, urgent repeat revascularization, or in-hospital major bleeding by 30 days occurred in 299 (10.0%) of 2991 patients in the heparin plus Gp IIb/IIIa inhibitor group vs 275 (9.2%) of 2975 patients in the bivalirudin group (OR, 0.92; 95% CI, 0.77-1.09; P=0.32).
Relative to heparin alone, the imputed OR was 0.62 (95% CI, 0.47-0.82), satisfying statistical criteria for noninferiority to heparin plus Gp IIb/IIIa blockade and superiority to heparin alone.” (A figure similar to case B in the Figure was presented on the relative scale but without the margin of noninferiority.)
Elaboration. In the first example the new treatment was significantly worse than the reference treatment, but it was uncertain whether the treatment effect was smaller or larger than the margin of equivalence of 1.35. The second example demonstrated noninferiority.
Comment
Interpretation: Item 20. Interpretation of the results, taking into account the noninferiority or equivalence hypothesis and any other trial hypotheses, sources of potential bias or imprecision, and the dangers associated with multiplicity of analyses and outcomes.
Examples. Concluding noninferiority. “According to our definition of equivalence, the efficacy of the . . . long-short regimen (was) statistically equivalent to the efficacy of the long-long regimen . . . The upper limit of the 95 percent confidence interval for the difference between the rates in the two groups was 5.3 percent (close to the boundary of 6.0 percent).”
Concluding inferiority of new drug (or conventional superiority of reference drug). “Although the trial was intended to assess the noninferiority of tirofiban as compared with abciximab, the findings demonstrated that tirofiban offered less protection from major ischemic events than did abciximab. . . . In order to meet the present definition of equivalence, the upper bound of the 95% confidence interval of the hazard ratio for the comparison of tirofiban with abciximab had to be less than 1.47. . . . The primary endpoint occurred more frequently among the 2398 patients in the tirofiban group than among the 2411 patients in the abciximab group (7.6 percent vs. 6.0 percent; hazard ratio, 1.26; . . . two-sided 95 percent confidence interval of 1.01 to 1.57, demonstrating the superiority of abciximab over tirofiban; P=0.038).”
Concluding noninferiority of new drug from a trial designed to assess superiority. “The SYNERGY protocol prespecified that if enoxaparin was not demonstrated to be superior to unfractionated heparin, a noninferiority analysis was to be performed. . . . Enoxaparin was not superior to unfractionated heparin but was noninferior for the treatment of high-risk patients with non-ST-segment elevation ACS.”
Comment
It is not our intent to promote noninferiority or equivalence trials: the design should be appropriate to the question to be answered. The availability of efficacious reference treatments can make the use of placebo controls unethical. Even when a treatment is efficacious on some measures (eg, depression scales), however, it may not be efficacious for a rarer but more important outcome (eg, suicide).
Reports of noninferiority and equivalence trials must be clear enough to allow readers to interpret results reliably. Accordingly, we herein propose extensions to the CONSORT statement to facilitate appropriate reporting of noninferiority and equivalence trials.
We advocate that editors extend support of the original CONSORT statement to include use of this extension to noninferiority and equivalence trials and refer to it in their “Instructions to Authors.” Adoption by journals of the original CONSORT statement is associated with improved quality, so we hope this proposed extension will result in similar improvements for noninferiority and equivalence trials.
The CONSORT Group continues to update and extend its recommendations. The current recommendations add to recent extensions for cluster randomized trials and for the reporting of harms. Further extensions are in preparation.
The current versions of all CONSORT recommendations are available at http://www.consort-statement.org.
Corresponding Author: Gilda Piaggio, PhD, Department of Reproductive Health and Research, World Health Organization, 1211 Geneva 27, Switzerland (piaggiog@who.int).
Financial Disclosures: None reported.
Funding/Support: No funding was received for writing this article, although Drs Altman, Elbourne, and Piaggio and Mr Evans were supported by the CONSORT group to attend a meeting in Canada on this topic. The wider CONSORT group commented on earlier drafts and endorsed its submission for publication. Dr Altman is supported by Cancer Research United Kingdom. Dr Piaggio is supported by the UNDP/UNFPA/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction, Department of Reproductive Health and Research, World Health Organization.
Acknowledgment: We thank the members of the CONSORT Group, especially David Moher, MSc, Thomas C. Chalmers Centre for Systematic Reviews, Children's Hospital of Eastern Ontario, Ottawa; Ken Schulz, PhD, Quantitative Science, Family Health International, Research Triangle Park, NC; and Susan Eastwood, ELS, Publications and Grants Writing, Department of Neurological Surgery, University of California at San Francisco, Emeryville; Peter C. Gøtzsche, MD, Nordic Cochrane Centre, Copenhagen, Denmark; Barbara Hawkins, PhD, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Md; Tom Lang, MA, Lakewood, Ohio; Ingram Olkin, PhD, Stanford University, Stanford, Calif; David L. Sackett, OC, FRSC, MD, FRCP, Trout Research and Education Centre, Markdale, Ontario; and David Simel, MD, MHS, Duke University, Durham, NC, and Simon Day, PhD, Licensing Division, Medicines and Healthcare Products Regulatory Agency, London, England, for comments on earlier drafts.
We also thank Luciano Costa, MD, Division of Medical Oncology, University of Colorado Health Science Center, Aurora, for providing unpublished data.
REFERENCES
Begg C, Cho M, Eastwood S. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276:637-639.
Moher D, Schulz KF, Altman DG; for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357:1191-1194.
Altman DG, Schulz KF, Moher D. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663-694.
Ioannidis JPA, Evans SJW, Gøtzsche PC. Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med. 2004;141:781-788.
Schulz KF. Randomized trials, human nature, and reporting guidelines. Lancet. 1996;348:596-598.
Jüni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001;323:42-46.
Egger M, Jüni P, Bartlett C. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technol Assess. 2003;7:1-76.
Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton, Fla: Chapman Hall/CRC; 2003.
Committee for Proprietary Medicinal Products. Note for Guidelines on Evaluation of Medicinal Products Indicated for Treatment of Bacterial Infections. London, England: European Medicines Agency (EMEA); April 22, 2004. Available at: http://www.emea.eu.int/pdfs/human/ewp/055895en.pdf. Accessed July 2004.
Durkalski VL, Palesch YY, Pineau BC. The virtual colonoscopy study: a large multicenter clinical trial designed to compare two diagnostic screening procedures. Control Clin Trials. 2002;23:570-583.
Clinical Outcomes of Surgical Therapy Study Group. A comparison of laparoscopically assisted and open colectomy for colon cancer. N Engl J Med. 2004;350:2050-2059.
Chadwick D; Vigabatrin European Monotherapy Study Group. Safety and efficacy of vigabatrin and carbamazepine in newly diagnosed epilepsy: a multicentre randomised double-blind study. Lancet. 1999;354:13-19.
Assessment of the Safety and Efficacy of a New Thrombolytic (ASSENT-2) Investigators. Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction: the ASSENT-2 double-blind randomised trial. Lancet. 1999;354:716-722.
von Hertzen H, Piaggio G, Ding J. Low dose mifepristone and two regimens of levonorgestrel for emergency contraception: a WHO multicentre randomised trial. Lancet. 2002;360:1803-1810.
Smyth A, Tan KH-V, Hyman-Taylor P. Once versus three times daily regimens of tobramycin treatment for pulmonary exacerbations of cystic fibrosis–the TOPIC study: a randomised controlled trial. Lancet. 2005;365:573-578.
Pocock SJ. The pros and cons of non-inferiority (equivalence) trials. In: Guess HA, Kleinman A, Kusek JW, Engel LW, eds. The Science of Placebo: Towards an Interdisciplinary Research Agenda. London, England: BMJ Books; 2000:236-248.
Villar J, Ba'aqeel H, Piaggio G. WHO antenatal care randomised trial for the evaluation of a new model of routine antenatal care. Lancet. 2001;357:1551-1564.
Kapur A, Malik IS, Bagger JP. The Coronary Artery Revascularisation in Diabetes (CARDia) trial: background, aims, and design. Am Heart J. 2005;149:13-19.
D'Agostino RB Sr, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues—the encounters of academic consultants in statistics. Stat Med. 2003;22:169-186.
ICH Steering Committee. Harmonised Tripartite Guideline: Choice of Control Group and Related Issues in Clinical Trials (E10). Geneva, Switzerland: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; 2000. Available at: http://www.ich.org/LOB/media/MEDIA486.pdf. Accessed February 9, 2006.
Djulbegovic B, Clarke M. Scientific and ethical issues in equivalence trials. JAMA. 2001;285:1206-1208.
Points to Consider on Switching Between Superiority and Non-inferiority. London, England: European Medicines Agency (EMEA); July 27, 2000. Available at: http://www.emea.eu.int/pdfs/human/ewp/048299en.pdf. Accessed February 9, 2006.
Tamayo-Sarver JH, Albert JM, Tamayo-Sarver M. Advanced statistics: how to determine whether your intervention is different, at least as effective as, or equivalent: a basic introduction. Acad Emerg Med. 2005;12:536-542.
Wiens BL. Choosing an equivalence limit for non-inferiority or equivalence studies. Control Clin Trials. 2002;23:2-14.
Jones B, Jarvis P, Lewis JA. Trials to assess equivalence: the importance of rigorous methods. BMJ. 1996;313:36-39.
Detsky AS, Sackett DL. When was a negative clinical trial big enough? How many patients you needed depends on what you found. Arch Intern Med. 1985;145:709-712.
Rothmann M, Li N, Chen G. Design and analysis of non-inferiority mortality trials in oncology. Stat Med. 2003;22:239-264.
Lewis JA. The European regulatory experience. Stat Med. 2002;21:2931-2938.
Points to Consider on the Choice of Non-inferiority Margin. London, England: European Medicines Agency (EMEA); February 26, 2004. Available at: http://www.emea.eu.int/pdfs/human/ewp/215899en.pdf. Accessed February 9, 2006.
Senn S. Statistical Issues in Drug Development. Chichester: John Wiley & Sons; 2002:207-217.
Kim MY, Goldberg JD. The effects of outcome misclassification and measurement error on the design and analysis of therapeutic equivalence trials. Stat Med. 2001;20:2065-2078.
Dunnett CW, Gent M. Significance testing to establish equivalence between treatments with special reference to data in the form of 2 × 2 tables. Biometrics. 1977;33:593-602.
Dunnett CW, Gent M. An alternative to the use of two-sided tests in clinical trials. Stat Med. 1996;15:1729-1738.
Rothman KJ. Significance questing. Ann Intern Med. 1986;105:445-447.
Brittain E, Lin D. A comparison of intent-to-treat and per-protocol results in antibiotic non-inferiority trials. Stat Med. 2005;24:1-10.
ICH Steering Committee. Statistical Principles for Clinical Trials (E9). Geneva, Switzerland: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; 1998. Available at: http://www.ich.org/LOB/media/MEDIA485.pdf.
Accessed February 9, 2006FAMcAlisterDLSackettActive-control equivalence trials and antihypertensive agents.Am J Med200111155355811705432MLallemantGJourdainSLe CoeurA trial of shortened zidovudine regimens to prevent mother-to-child transmission of human immunodeficiency virus type 1.N Engl J Med200034398299111018164PPrandoniOBruchiPSabbionProlonged thromboprophylaxis with oral anticoagulants after total hip arthroplasty.Arch Intern Med20021621966197112230419FKLChanSChungBYSuenPreventing recurrent upper gastrointestinal bleeding in patients with Helicobacter pyloriinfection who are taking low-dose aspirin or naproxen.N Engl J Med200134496797311274623SDurrlemanRSimonPlanning and monitoring of equivalence studies.Biometrics1990463293362194579DLSackettSuperiority trials, non inferiority trials, and prisoners of the 2-sided null hypothesis.ACP J Club2004140A1115122874JLFleissGeneral design issues in efficacy, equivalency and superiority trials.J Periodontal Res1992273063131507018SYNERGY Trial InvestigatorsEnoxaparin vs unfractionated heparin in high-risk patients with non–st-segment elevation acute coronary syndromes managed with an intended early invasive strategy: primary results of the SYNERGY randomized trial.JAMA2004292455415238590DElbourneCDezateuxRArthurUK Collaborative Hip Trial GroupUltrasonography in the diagnosis and management of developmental hip dysplasia (UK Hip Trial): clinical and economic results of a multicentre randomised controled trial.Lancet20023602009201712504396AMLincoffJABittlRAHarringtonBivalrudin and provisional glycoprotein IIb/IIIa blockade compared with heparin and planned glycoprotein IIb/IIIa blockade during percutaneous coronary intervention: REPLACE-2 randomized trial.JAMA200328985386312588269The Cochrane Library.Issue 3. 
Chichester, England: Wiley; 2004DMoherCSDulbergGAWellsStatistical power, sample size, and their reporting in randomized controlled trials.JAMA19942721221248015121WLGreeneJConcatoARFeinsteinClaims of equivalence in medical research: are they supported by the evidence?Ann Intern Med200013271572210787365LJCostaACGXavierAdel GiglioNegative results in cancer clinical trials—equivalence or poor accrual?Control Clin Trials20042552553315465621DJKrysanARKemperClaims of equivalence in randomized controlled trials of the treatment of bacterial meningitis in children.Pediatr Infect Dis J20022175375812192164JBDimickMDiener-WestPALipsettNegative results of randomized clinical trials published in the surgical literature: equivalency or error?Arch Surg200113679680011448393GPiaggioAPPinolUse of the equivalence approach in reproductive health clinical trials.Stat Med2001203571357711746338PBernardOChosidowLVaillantFrench Erysipelas Study GroupOral pristinamycin versus standard penicillin regimen to treat erysipelas in adults: randomised, non-inferiority, open trial.BMJ200232586412386036CCarlierJCosteMEtchepareA randomised controlled trial to test equivalence between retinyl palmate and carotene for vitamin A deficiency.BMJ1993307110611108251808LHanssonLHLindholmTEkbomthe STOP-Hypertension-2 study groupRandomised trial of old and new antihypertensive drugs in elderly patients: cardiovascular mortality and morbidity in the Swedish trial in Old Patients with Hypertension-2 study.Lancet19993541751175610577635AMGülmezogluJVillarNTNNgocWHO Collaborative Group to Evaluate Misoprostol in the Management of the Third Stage of LabourWHO multicentre randomised trial of misoprostol in the management of the third stage of labour.Lancet200135868969511551574EJTopolDJMoliternoHCHerrmannComparison of two platelet glycoprotein IIb/IIIa inhibitors, tirobifan and abciximab, for the prevention of ischemic events with percutaneous coronary revascularization.N Engl J 
Med20013441888189411419425JLHalperinExecutive Committee, Sportif III and V Study InvestigatorsXimelataragan compared with warfarin for prevention of thromboembolism in patients with nonvalvular atrial fibrillation: rationale, objectives, and design of a pair of clinical studies and baseline patient characteristics (SPORTIF III and V).Am Heart J200314643143812947359SStaszewskiPKeiserJMontanerAbacavir-lamivudine-zidovudine vs indinavir-lamivudine-zidovudine in antiretroviral-naïve HIV-infected adults.JAMA20012851155116311231744KJRothmanPlacebo mania.BMJ1996313348664770DGunnellJSaperiaDAshbySelective serotonin reuptake inhibitors (SSRIs) and suicide in adults: meta-analysis of drug company data from placebo controlled, randomised controlled trials submitted to the MHRA's safety review.BMJ200533038538815718537DMoherAJonesLLepageCONSORT GroupUse of the CONSORT statement and quality of reports of randomised trials: a comparative before-and-after evaluation.JAMA20012851992199511308436MEggerPJüniCBartlettCONSORT GroupValue of flow diagrams in reports of randomized controlled trials.JAMA20012851996199911308437PJDevereauxBJMannsWAGhaliThe reporting of methodological factors in randomized controlled trials and the association with a journal policy to promote adherence to the Consolidated Standards of Reporting Trials (CONSORT) checklist.Control Clin Trials20022338038812161081MKCampbellDRElbourneDGAltmanthe CONSORT GroupCONSORT statement: extension to cluster randomised trials.BMJ200432870270815031246

Journal: JAMA (American Medical Association)

Published: Mar 8, 2006
