Network meta-analysis: users’ guide for pediatricians

Network meta-analysis: users’ guide for pediatricians Background: Network meta-analysis (NMA) is a powerful analytic tool that allows simultaneous comparison between several management/treatment alternatives even when direct comparisons of the alternatives (such as the case in which treatments are compared against placebo and have not been compared against each other) are unavailable. Though there are still a limited number of pediatric NMAs published, the rapid increase in NMAs in other areas suggests pediatricians will soon be frequently facing this new form of evidence summary. Discussion: Evaluating the NMA evidence requires serial judgments on the creditability of the process of NMA conduct, and evidence quality assessment. First clinicians need to evaluate the basic standards applicable to any meta-analysis (e.g. comprehensive search, duplicate assessment of eligibility, risk of bias, and data abstraction). Then evaluate specific issues related to NMA including precision, transitivity, coherence, and rankings. Conclusions: In this article we discuss how clinicians can evaluate the credibility of NMA methods, and how they can make judgments regarding the quality (certainty) of the evidence. We illustrate the concepts using recent pediatric NMA publications. Keywords: Network meta-analysis, Multiple treatment comparisons, Multiple-treatment meta-analysis evidence synthesis, Evidence credibility, Evidence certainty, Pediatric Background possible paired comparison, but these have two major Randomized control trials (RCTs) constitute the optimal limitations. First, for the clinician or patient consumer, methodology to determine the effectiveness of medical making sense of multiple meta-analyses would be chal- interventions. When results against placebo or standard lenging. Second, it is extremely likely that many of the care suggest benefits outweigh harms, clinicians, patients possible paired comparisons will not have direct and families must choose among several interventions. comparisons available; in such instances, there will be Making this choice optimally requires access to system- no conventional meta-analysis to consider. atic summaries of the best available evidence. Network meta-analysis (NMA), also known as For decades, investigators have provided these multiple-treatment comparisons or multiple-treatment evidence summaries using systematic reviews and meta- meta-analysis, provides a methodology to address this analyses. By combining across studies, meta-analyses dilemma, taking advantage of two statistical innovations: increase the precision of the effect estimate [1]. Conven- the first is use of indirect comparisons—we can estimate tional meta-analyses, however, address only single paired the effect of A-B indirectly if both A and B have been comparisons and are therefore of limited use when mul- compared against C (see next section). The second is tiple reasonable options exist. One could envision a that NMA statistical methods combining direct and in- series of conventional meta-analyses addressing each direct comparisons allow estimates of the relative effect of every alternative versus every other alternative. Although the majority of published NMAs summarize * Correspondence: ralkhalifah@ksu.edu.sa; Reem_ah@yahoo.com 1 evidence from RCTs, NMA of cohort studies – most Department of Clinical Epidemiology & Biostatistics, McMaster University, Hamilton, ON, Canada often addressing the evidence regarding adverse events - Department of Pediatrics, Division of Pediatric Endocrinology and are increasing [2, 3]. Moreover, given the recent Metabolism King Saud University, Riyadh, Saudi Arabia Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 2 of 12 development of the required methods, diagnostic test ac- (nodes) represent each intervention, and the lines be- curacy NMA may soon be available [4]. tween the nodes (called edges) represent head-to-head The first NMA addressing a pediatric issue evaluated comparisons (Fig. 2)[25]. Some network graphs use the the effects of indomethacin, ibuprofen, and placebo on size of the nodes and the width of the edges to convey patent ductus arteriosus closure in preterm infants [5]. information about the amount of information available Since then, the number of pediatric NMAs has increased (circles convey the sample size of studies of a particular [6–23] and, given development in other fields, one can intervention and edges the number, sample size, or vari- anticipate a substantial further increase. This increase ance associated with the related direct comparisons, i.e. might, however, occur at a slower rate in the pediatric large node means larger sample size, and thick edge field because of the smaller number of RCTs relative to means increased number of studies included). the adult literature. In comparison to conventional meta-analysis that relies The goal of this paper is to provide a users’ guide for exclusively on direct evidence, the NMA provides esti- pediatricians considering the application of the results of mates of relative effectiveness among all interventions be- NMA addressing a therapeutic issue to their practice. ing compared, increases precision around effect estimates, Nonetheless, a minimum knowledge on Conventional ranks treatments, and enhances generalizability [26–28]. meta-analysis is needed to understand most of the im- portant concepts of NMA [24]. First, we introduce the Credibility of NMA methods reader to NMAs and provide criteria for evaluating the The conduct of NMA should adhere to standards of a credibility of the NMA method. We then discuss the traditional systematic review. Like a conventional meta- quality of the evidence (synonyms: certainty or confi- analysis, a credible NMA requires explicit eligibility cri- dence in evidence) obtained from a NMA (the NMA teria, comprehensive search, and assessment of evidence may have used optimal methods, but limitations of the quality (Table 1). underlying studies may still result in low quality evi- dence). To illustrate the processes of interpretation and implementation in the context of pediatric literature, we Did the review explicitly address a sensible question? will present an example of the effects of 16 different A well-formulated clinical question will typically follow mechanical ventilation modes on mortality among pre- the PICO format (P: population, I: intervention, C: term infants with respiratory distress syndrome (RDS) [9], in addition to other examples from the pediatric lit- erature when we could not illustrate the presented con- cepts using the mechanical ventilation NMA. Discussion Indirect evidence Let us suppose that we are interested in the relative merits of two treatments, A and B. It may be that no study has directly compared the two treatments. If, how- ever, investigators have compared both A and B against the same third alternative C, we can infer the relative ef- fect of A-B. We do so by comparing the effect of A-C and B-C (the indirect comparison, Fig. 1.1). For instance, if the relative risk (RR) of death in A-C is 0.5 (A reduces deaths relative to C by 50%) the RR of Fig. 1 The concept of network meta-analysis. Each node (circle) is death in B-C is 1.0 (B has no effect on deaths relative to considered an intervention (A, B or C), sold lines represent loops of C), then it would be reasonable to infer that A will re- pairwise comparison (direct evidence), and doted lines represent duce death relative to B by 50%. Furthermore, if investi- loops of indirect comparison (indirect evidence). Indirect compari- gators have conducted both direct and indirect sons can be made via deduction from the common comparator. 1.1. Indirect evidence of A versus B inferred from direct estimates of comparisons, we can combine the two and produce a A versus C and B versus C Four studies formed the effect estimate for mixed or network estimate (Fig. 1.2). A-C, and 3 studies formed the effect estimate for C-B. The effect esti- mate of A-B was obtained from indirect evidence. 1.2.Closed network shows the a closed network meta-analysis in a hypothetical example Network meta-analysis where all interventions were compared in RCT’s, therefore; direct and Ideally, an NMA will depict the available direct evidence indirect evidence is available for all comparisons in a figure; we refer to as a network graph. The circles Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 3 of 12 Fig. 2 The geometry of the mechanical ventilation for premature infants NMA. A/C, assist-control ventilation; VG, volume guarantee ventilation; RM, recruitment maneuver; CMV, continuous mandatory ventilation; HFFIV, high-frequency flow interrupted ventilation; HFJV, high-frequency-jet ventilation; HFOV, high-frequency oscillatory ventilation; IMV, intermittent mandatory ventilation; PSV, pressure support ventilation; PTV, patient- triggered ventilation; SIMV, synchronized intermittent mechanical ventilation; SIPPV, synchronized intermittent positive pressure ventilation;V-C,volume- controlled. Wang C et al. Mechanical ventilation modes for respiratory distress syndrome in infants: a systematic review and network meta-analysis.Critical care (London, England). 2015, reprinted by permission of the publisher [9] comparator, O: outcomes) [29]. NMA uses the same for- disease severity that might lead to differences in treat- mat except that “I” and “C” (intervention and compari- ment response [33]. sons), include all the interventions compared against Another example relates to differences in the meas- each other. Successful definitions of each element of the urement of outcome [27, 34]. Two systematic reviews PICO are required to determine the studies eligible for in asthma began with the goal of conducting an NMA; the review and develop a priori hypotheses to address only one was successful. The first study evaluated the possible heterogeneity. effectiveness of the various inhalation regimens on Although, the scope of the research question can vary FEV1 improvement [18]. The systematic review re- from narrow to broad, it is essential that for any paired vealed large variations in the way the 23 trials measured comparison within the NMA, it is plausible that we will, and reported FEV1. This heterogeneity prevented the for each outcome of interest, observe similar effects review team from performing an NMA. The second across all patient populations being addressed [30, 31]. Eligibility criteria can be wide enough to permit the Table 1 Guide for appraising NMA evidence possibility of differences in effect across the included pa- Credibility Did the review explicitly address sensible question? tients, interventions, and outcomes. For instance, effects Was the search for studies and selection comprehensive? Did the review assess evidence certainty? may differ – among eligible studies- in more or less Did the review present results for the reader? severely affected patients; across high and low doses and Certainty What is the risk of bias of included studies? across shorter and longer follow-up. Were the results precise? An NMA that assessed the efficacy of asthma treat- Were results consistent across studies? How trustworthy are the indirect comparisons? ments strategies included all children with chronic Were results consistent between direct and indirect asthma [12]. The definition of chronic asthma was not comparisons? based on the Global Initiative for Asthma (GINA) Is there evidence for publication bias? Were treatment ranks presented and were they trustworthy? asthma guidelines staging [32], nor did the authors present data on disease severity, or attempt subgroup Applicability What is the overall quality of the evidence? What are the limitations of the evidence? analyses. The broad inclusion criteria and lack of Can I apply the results to my patients? subgroup hypotheses fail to address the differences in Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 4 of 12 NMA assessed the efficacy of treatments on reducing information is not available, then the credibility of the exacerbation [12]. Severe exacerbation was defined as NMA is compromised [31]. patients needing hospital admission, a visit to the emer- Consider, for instance, the GRADE profile for the dir- gency department or a standard course of systemic cor- ect evidence of an NMA of antidepressant medications ticosteroids. In this case, outcomes were reported for improving depression symptoms in children (Table 2) similarly across trials and the authors presented [41]. The evidence certainty for Fluoxetine versus pla- pooled estimates. cebo was rated as very low as a result of high RoB, im- precision, and inconsistency. The Imipramine versus Was the search for studies and selection comprehensive? placebo comparison was rated as moderate, the only A comprehensive systematic search that identifies all concern being imprecision. With this variation in evi- pertinent available studies minimizes the risk of spurious dence certainty, making sense of the results requires rat- findings from unrepresentative selection of studies. Since ings of evidence quality for each pairwise comparison. many reviews articles have demonstrated the inadequacy of searching only one database [35–37], an optimal How do NMAs conduct analyses and present results? search include all relevant electronic databases (e.g., There are two statistical approaches to perform NMA: a Medline, Embase, Psycinfo, CENTRAL, CINAHL) [38]. frequentist and a Bayesian approach [41, 42]. The fre- Ideally, a search of the grey literature will minimize the quentist approach is what clinicians will generally see in risk of publication bias. individual RCTs and conventional meta-analyses. The Subsequently, the team selects eligible studies [38]. additional major aspect of Bayesian approaches is the The report should provide evidence of the reproducibil- specification of prior probabilities of treatment effects ity of assessment of study eligibility through review by at before beginning the data analysis and combining these least 2 independent assessors, and present a figure sum- priors and their precision with the estimate from the marizing each selection step in the eligibility determin- data to produce a posterior probability and its credible ation process (identification of titles and abstracts; interval. Results in NMA are presented as effect esti- culling of titles and abstracts; review of full texts; final mates, typically odds ratios (ORs), or RR, hazard ratio determination of eligibility) [39]. (HR), mean difference (MD) with their 95% confidence interval (CI) (frequentist approach) or credible interval Did the review assess evidence certainty? (CrI) (Bayesian approach), both of which describe the Certainty in effect estimates represents how trustworthy range of plausible truth around the point estimate. are the results and their conclusions [31]. Within any Ideally, NMAs will present direct, indirect, and net- network, it is likely that the quality of the evidence dif- work estimates for each paired comparison. When, how- fers across paired comparisons: high quality evidence ever, there are large numbers of comparisons, this may reveal that one treatment is superior to another, becomes a challenging task. For example, the mechanical whereas we may have only low quality evidence regard- ventilation modes for RDS in preterm infants NMA in- ing the relative merit of other treatments. cluded 16 different ventilation modes, yielding 120 Making that rating requires a sequence of judgments comparisons- this probably requires an online appendix relying on assessments of the quality of the direct and [9]. Ways to deal with this profusion of comparisons is indirect evidence. Three articles were published on 2014 to present effect estimates in a league table (all possible by the Grades of Recommendation, Assessment, Devel- pairwise interventions compared to each other by cross- opment and Evaluation (GRADE) working group, the matching the interventions on the raw with those in the Cochrane Collaboration, and the ISPOR-AMCP-NPC column), forest plots (all pairwise interventions good practice task force [30, 31, 40] that extend quality compared to one reference intervention, or to the least of evidence assessment of meta-analysis to NMA. efficacious intervention such as placebo), or evidence Following the GRADE approach, the overall confidence comparisons (direct, indirect, and NMA) for each inter- starts as high for direct, indirect, and network estimates vention compared to one reference [13, 41, 43, 44]. that are derived from RCTs [31]. The evidence can be rated down from high to moderate, low, or very low quality based on the presence and magnitude of any of Certainty of NMA evidence the 5 domains: Risk of bias (RoB), indirectness, impreci- What is the risk of bias of included studies? sion, inconsistency, and publication bias [31]. RoB conveys the likelihood that limitations in design or Many prior published NMAs have not explicitly ad- conduct of studies will result in estimates of treatment dressed all the recommended elements. Fortunately, effect that vary systematically from the truth. The however, some present the information required for a greater the RoB, the more appropriate it becomes to rate reader to make the necessary judgments. If the down the quality of the evidence [45, 46]. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 5 of 12 Table 2 GRADE evidence profile showing differences in the evidence certainty among two direct evidence comparisons in the depression treatment NMA for depression symptoms Quality assessment Quality № of studies Risk of bias Inconsistency Indirectness Imprecision Other considerations Absolute Effect (95% CI) Fluoxetine vs. placebo a b c 8 Serious Serious Not serious Serious None SMD 0.26 SD lower ⊕OOO VERY LOW (0.5 lower to 0.03 lower) Imipramine vs. placebo 2 Not serious Not serious Not serious Serious None SMD 0 SD ⊕⊕⊕O MODERATE (0.27 lower to 0.26 higher) RCT Randomised trials; CI Confidence interval, SMD Standardised mean difference [41] Selective outcome reporting, and incomplete outcome data b 2 Moderate heterogeneity I = 67.4% Upper CI very close to no effect SMD includes no effect For assessing the RoB, authors may use an instrument For example, in the NMA of ventilation modes for such as the Cochrane RoB tool for RCTs [38]. This in- infants, the comparison of synchronized-intermittent strument assesses six elements: randomization sequence mechanical ventilation with volume-guarantee (SIMV+VG) generation, concealment of allocation, blinding of partic- versus high-frequency-jet-ventilation (HFJV), the point ipants, personnel and outcome assessors, completeness estimate suggests that SIMV+VG reduced mortality (HR = of follow-up, selective outcome reporting, and presence 0.23) [9]. However, the 95%CrI ranged from an extremely of other biases. large reduction in mortality (HR = 0.03, reduction in hazard In the NMA of strategies for preventing asthma exac- by 97%) to an almost doubling of hazard (HR = 1.46). Since erbations, the authors used the Cochrane instrument to the treatment choice will be different at each CrI end, the assess RoB [12] and judged all trials to be at low RoB. evidence quality is reduced for imprecision. Although the authors did not provide an overall RoB On the other hand, for the comparison SIMV+VG judgment per comparison, it is possible -although tedi- versus SIMV with pressure-support ventilation (SIMV ous- for the pediatrician to make this rating if the NMA +PSV), mortality was lower with SIMV+VG (HR = 0.12; authors have presented ratings of RoB for each study in 95%CrI 0.01, 0.86). Here, even the upper suggests a 14% a table or figure. In this case, it is not a problem: since reduction in hazard with SIMV+VG. Therefore, in this all studies were at low RoB, there is no need to rate instance, there is no need to rate down the quality of the down for RoB for any comparison. evidence for imprecision. Although, the width of the CrI may still be considered large and thus could be consid- ered imprecise for outcomes such as hospital length of Were the results precise? stay, any but the smallest reduction in mortality is crit- The lack of adequate power to inform a particular out- ical. The judgment of importance is critically dependent come leads to imprecision [47]. One standard for asses- on the absolute difference, in this case the absolute mor- sing precision is to consider whether differences tality risk difference: for instance, for 27 weeks infants between intervention and control exclude chance (i.e. with baseline mortality risk of 10%, the absolute mortal- statistically significant). This has two limitations: first ity risk reduction with SIMV+VG versus SIMV+PSV results may exclude no effect, but may not exclude an would approximately be 9% if the point estimate of the effect too small to be important; second, using this HR (0.12) were accurate, and approximately 1.4% if the criterion, one would always rate down for precision if upper boundary of the CrI (0.86) represented the truth. results were not statistically significant, no matter how The magnitude of the absolute difference is greater for narrow the CI or CrI. even younger infants with higher mortality, and less for Therefore, we suggest an alternative standard. To older infants with lower mortality (Table 3)[49–51]. assess imprecision, one can consider whether decisions In a complementary approach, authors can, for each regarding choice of therapy will differ if the upper and direct comparison, assess imprecision by calculating the lower CI or CrI represents the truth. Another way of optimal information size (OIS), the number of patients thinking about this approach is to consider whether the or events needed for adequately powered individual CI or CrI excludes a minimally important difference study to avoid spurious findings [47]. This, however, ig- (MID). The MID is a measure of the smallest change in nores the contribution of the indirect comparisons to the value of a patient-reported outcome, typically the network estimate. Methods to incorporate indirect applied to outcomes such as quality of life measures [48]. estimates of OIS to NMA are under development [26]. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 6 of 12 Table 3 Anticipated absolute mortality among premature infants using SIMV+VG versus SIMV+PSV Relative effect Anticipated absolute effects Hazard ratio Mortality risk with regular care Mortality risk difference with SIMV+VG (95% CrI) GA > 30 weeks 0.12 (0.01 to 0.86) 5 per 100 4 fewer per 100 (5 fewer to 1 fewer) GA 27–30 weeks 0.12 (0.01 to 0.86) 10 per 100 9 fewer per 100 (10 fewer to 1 fewer) GA 25–26 weeks 0.12 (0.01 to 0.86) 50 per 100 42 fewer per 100 (49 fewer to 5 fewer) The relative effect of SIMV+VG (and its 95% CrI) is based on the NMA estimates [9]; the absolute effect (and its 95% CI) is based on the assumed risk in the comparison group; mortality estimates with regular care are based on previous literature [49–51] GA gestational age Were results consistent across studies? heterogeneity that is attributable to differences between One can expect variation between treatment effects –we the studies and ranges from 0 to 100%) [38]. call such variation “heterogeneity”. Heterogeneity can For example, in the chronic asthma NMA, the au- result from chance, or from differences in patients, inter- thors presented direct comparison between low-dose ventions, comparisons, outcomes and methodology be- inhaled corticosteroids (ICS-L) and placebo for moder- tween studies (Table 4). ate or severe exacerbation. Six trials contributed to the Assessing the degree of inconsistency in direct com- pooled estimate OR = 0.41 (95%CrI 0.29, 0.56). The for- parisons involves inspecting the point estimates and the est plot shows similar point estimates, and CIs over- degree of confidence or credible intervals overlap of lapped across all trials. The P-value for heterogeneity each study in a forest plot. Two methods for formal stat- assessment was 0.54 (not significant), and I =0% (Fig. 3), istical testing can complement visual inspection of forest indicating a high level of consistency between results. plots – the test for heterogeneity (Cochran’s Q-test), and Conversely, if there is substantial heterogeneity that is I (which quantifies the proportion of the total unexplained by subgroup analysis or meta-regression, we lose confidence in treatment effects and, in the GRADE approach, rate down the quality of evidence for inconsist- ency [31, 34, 52]. Table 4 Possible effect modifiers that may contribute to between study variability How trustworthy are the indirect comparisons? Pure chance Trustworthiness of indirect comparisons - for instance, Different Risk of Bias Studies with high RoB might show large effect than those with low RoB. inferring the relative effect of A-B from A-C and B-C comparisons -requires similarity of patient population, Different study Population: Baseline risk like gender, age (e.g., in some interventions, the effect comparators, outcomes, RoB, and optimal administra- could be larger in infants than in adolescents). tion of the interventions under consideration (Fig. 4). In Disease severity (e.g., in children with severe diseases the effect other words, A and B must both be optimally adminis- of x intervention might be smaller than in case of patients with mild disease). tered; the A-C and B-C comparisons must include simi- Treatment setting (e.g., patient with asthma enrolled from the lar patients; C must be similar; outcomes must be emergency room will have different characteristics than those enrolled measured similarly; and studies would ideally be at low from the outpatient clinic). RoB. We refer to situations when this is not the case as Different Interventions: Dose (larger doses are expected to be associated with larger effect “intransitivity”. Intransitivity reduces confidence in the ad sometimes with larger effect in terms of side effects). results of indirect comparisons. Route (intravenous administration may have larger effect if oral To illustrate the concept of intransitivity consider an administration is impacted by absorption or hepatic metabolism). Duration (using the medication for longer duration may be associated NMA of comparative efficacy of psychotherapies for with larger effect compared to shorter duration). depression in children [10]. The comparison of interest Different comparators: is cognitive behavioral therapy (CBT) versus Problem- Different standards of care when the standard of care is the comparator solving therapy (PST). We wish to make inferences re- (e.g., in a diarrhea study, oral rehydration solution (ORS) is given to the garding the effects of CBT versus PST from an indirect control group in study A vs. ORS+ zinc supplement given to the control group in study B). comparison: studies have compared both CBT and PST to wait list (WL) controls. The 14 RCTs comparing Different ways in Outcome assessment: Definition (e.g., if fever is defined as 38.0 C in study A vs. 39.0 C in study CBT versus WL used 8 different instruments to define B, this may result in diagnosing more patients with the fever in the depression; the 3 RCTs comparing PST versus WL study A). (Table 5) used 2 of the 8, and a ninth that was not used Measurement (e.g., if fever is measured using rectal temperature, compared to axillary temperature in another study; or standard methods at all in the CEB versus WL studies. Use of the different in one study compared to non standard way). instruments could create differences in depression Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 7 of 12 Fig. 3 Forest plot comparing ICS-L vs. placebo for moderate or severe asthma exacerbations. Visual assessment indicates low heterogeneity, similar point estimates, overlapped CI, and I =0 [12]. Zhao Y, et al. Effectiveness of drug treatment strategies to prevent asthma exacerbations and increase symptom-free days in asthmatic children: a network meta-analysis. The Journal of asthma: official journal of the Association for the Care of Asthma. 2015, reprinted by permission of the publisher (Taylor & Francis Ltd., WWW.tandfonline.com)[12] severity in the population that in turn could influence investigators may conduct statistical tests that addresses the magnitude of the treatment effect, suggesting pos- whether chance can explain difference between direct sible intransitivity and consideration of consequent rat- and indirect comparisons [55, 56]. Unexplained incoher- ing down of quality. ence requires rating down evidence quality. In the asthma NMA, the direct evidence comparing Were results consistent between direct and indirect ICS-L versus leukotriene receptor antagonists (LTRA) comparisons? suggested a large reduction in exacerbation favoring Whenever a closed loop is present (Fig. 1.2, and Table 6) ICS-L (OR = 0.38; 95%CrI 0.21, 0.68), and the network there is a possibility that the available direct and indirect estimate showed a significant reduction (OR = 0.56, comparisons will yield very different effect estimates, a 95%CrI 0.39, 0.76) [12] – from which, one might infer condition we refer to as incoherence, or “inconsistency” that the indirect estimate showed a substantially smaller used by other authorities [26, 27, 43, 53, 54]. Incoher- effect or, depending on the amount of indirect evidence, ence can arise for reasons similar to those that can ex- none at all. If the authors had provided the indirect esti- plain heterogeneity and intransitivity (Table 4). mate and its CrI, one could make the judgment regard- One can assess incoherence through inspecting the ing the degree of incoherence. The authors’ statement point estimates and the degree of CI or CrI overlap be- that they found no incoherence in the network on the tween direct and indirect evidence. In addition, basis of statistical tests is somewhat reassuring. Like conventional meta-analysis, when heterogeneity is high, NMA can use techniques of subgroup analysis and meta-regression to try and explain heterogeneity by identifying modifiers of treatment effects [57, 58]. For example in the NMA addressing adverse events associ- ated with antidepressant medications in children and adolescents [41], the OR for adverse events associated with sertraline use compared to placebo 2.94 (95%CrI 0.94,17.19, I = 79.3). The authors performed a subgroup analysis based on age and found increased adverse events with sertraline compared to placebo; for children age < 13 years (OR = 12.64, 95%CrI 2.72, 678.43), and in children age > 13 years (OR = 0.59, 95%CrI 0.15, 6.03). A somewhat less satisfactory way of exploring hetero- geneity is to omit studies and determine if the omission influences results. For example, in the mechanical venti- Fig. 4 The diagram shows the concept of intransitivity. The doted line A–C shows the indirect evidence were inferences are being lation NMA, the authors examined the robustness of the made. B is not shown as a unique intervention, rather as two different analysis by excluding 2 studies that included only new- ways of B (Blue and Red). Intransitivity can occur when the distribution borns with gestational age 25–26 weeks [9]. The results of a possible effect modifier is different between two groups showed no changes in the effect estimates. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 8 of 12 Table 5 Depression definition used in the psychotherapies NMA argue that, having committed oneself to an NMA, one in the wait list (the common comparator) to illustrate the concept should always use the network estimates. of intransitivity in the indirect evidence For example, the pediatric antidepressants medications Pairwise Cognitive-behavioral Problem-solving NMA included a comparison of Fluoxetine versus Pla- comparison therapy vs. Wait list therapy vs. Wait list cebo (Table 7)[41]. In this comparison, one can infer Definition of APAI > 32 20-item CES-D > 16 from the information presented a rating of the quality of depression 21-item BDI > 15 27-item CDI > 16 the direct evidence as very low, the indirect evidence as 21-item BDI > 10 DSM-IV 27-item CDI > 15 moderate, and the network estimate as very low quality. CDRS-R >30 In this case, following the GRADE approach, the clin- DSM-III ician is better off using the effect estimates from the in- DSM-III-R DSM-IV direct evidence. APAI Acholi Psychosocial Assessment Instrument depression symptom scale, BDI Beck Depression Inventory, CES-D Center for Epidemiologic Study Is there evidence for publication Bias? Depression Scale, CDI Children’s Depression Inventory, CDRS-R Children’s Publication bias results from missing studies [59]. This Depression Rating Scale-Revised [10] is because some studies, particularly those with nega- When direct and indirect evidence vary, and the net- tive results, may never be published. A low risk NMA work estimate is between the two and rated down for in- for publication bias will demonstrate comprehensive coherence, what estimate is the clinician going to believe? search for studies, present symmetrical funnel, and The GRADE approach suggests using the effect estimates demonstrate insignificant statistical test for publication from the highest quality evidence, which most commonly bias [38]. This assessment requires, however, at least 10 will be the direct estimate [31]. Other authorities would studies. If publication bias is very likely rating down the evidence is warranted. Table 6 Glossary of terms Were treatment ranks presented and were they trustworthy? Certainty: Methods are available that allow NMA authors to rank Quality of the evidence or confidence in the evidence. treatments from best to worst [26, 60]. They are often Direct estimates: expressed as probabilities that treatments are 1st, 2nd, Effect estimate determined from a head-to-head comparison (such as study 3rd etc. best, either in tables (Table 8) or graphically of A versus B). (rankograms). Surface under the ranking (SUCRA) sum- Indirect estimates: marizes the information from the rankograms as a single Effect estimate determined from two or more head-to-head comparisons through a common (such as the relative effect of A versus B by comparing number. Ranking need be made for each outcome –a the effect of A versus C and B versus C). treatment that is best for one outcome (e.g. benefit) may Network (multiple-treatment comparisons or multiple-treatment meta-analysis): be worst for another (e.g. harm) [60]. Effect estimate determined for a particular comparison from the combination Although intuitively appealing, there are a number of of direct and indirect effect estimates. reasons why clinicians should not routinely choose a Loop: treatment with the higher rankings [61]. First, a treat- A loop of evidence exists when 2 or more direct comparisons contribute to an indirect estimate (e.g., A-B and A-C, contribute to indirect B-C) this ment that is best in one outcome (e.g., a benefit out- loop is considered closed if direct evidence exists between B-C, and open come) may be the worst in another (e.g., a harm when this direct evidence does not exists. outcome). Second, issues such as cost and a clinician’s Indirectness: familiarity with use of a particular treatment may also Term used in direct evidence to describe the presence of systematic bear consideration. Third, rankings do not take into clinical or methodological differences between head-to-head studies that can act as effect modifiers. These can be in different patients account the magnitude of differences in effects between characteristics, ways of administering the interventions, measuring treatments (a first ranked treatment may be only slightly, outcomes, or ROB. or a great deal better than the second ranked treatment). Intransitivity: Fourth, chance may explain apparent difference between Term used in indirect evidence to describe the presence of systematic clinical or methodological differences between head-to-head studies treatments; the use of a measure of uncertainty such as that can act as effect modifiers. These can be in different patients credible intervals for the SUCRA or p-value might help characteristics, ways of administering the interventions, measuring to consider the precision of these probabilities [62]. outcomes, or ROB. Finally, and most important, the evidence on which Heterogeneity (Inconsistency): rankings are based may be very low quality, and there- The presence of differences in effect estimates between head-to-head studies that assessed the same comparison. fore untrustworthy [61]. Although the first ranking may be secure, others are Incoherence: The presence of differences in effect estimates between direct and indirect not: the asthma NMA showed that the treatment ranks evidence. for 2nd, 3rd, and 4th orders were ICS-L, ICS-H, ICS + Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 9 of 12 Table 7 Differences in the evidence certainty across evidence sources in the depression treatment NMA Comparison Direct evidence Direct evidence Indirect evidence Indirect evidence Network Network certainty certainty in estimates certainty in estimates in estimates a,b,c d b,e Fluoxetine vs. Placebo −0.26 ⊕OOO VERY LOW −1.41 ⊕⊕⊕O MODERATE − 0.51 ⊕OOO VERY LOW (−0.50, −0.03) (−2.35, − 0.47) (− 0.99, − 0.03) rated down for RoB rated down for imprecision (upper CI close to the null) c 2 rated down for heterogeneity (I = 67.4%) d 2 loops informed the indirect evidence were of lowROB, imprecise (Duloxetine-placebo[SMD= − 0.11 95%CrI -0.3, 0.08; I = 17%], Duloxetine- Fluoxetine [SMD = − 0.09 95%CrI -0.26, 0.08; I = 0%], no intransitivity e 2 rated down for incoherence (τ = 0.33, P value = 0.02) Effect estimates are SMD (95th CI) [41] Assessed from first order loop Duloxetine-placebo (n = 552), Duloxetine- Fluoxetine (n = 557), included 7–17 years old children, treated for 10 weeks LTRA (Table 8) for improving symptom-free days [12]. strategy included 5 databases, and a grey literature However, the probability for each treatment were close: search. Two independent reviewers performed title and 0.38, 0.33, 0.24 respectively, the NMA estimates were abstract screening, full text eligibility, data extraction, imprecise, and of low quality evidence. Therefore, the and quality assessment, resulting in 20 eligible RCTs, treatment ranks for the 2nd, 3rd, and 4th orders are comparing 16 ventilation modes in 2832 infants with untrustworthy. gestational age 25–32 weeks (Fig. 2). The authors reported baseline characteristics, and assessed RoB using Applicability the Jadad instrument [63]. The authors did not present Just as in conventional systematic reviews and pairwise evidence quality assessment but, as we note in the next meta-analysis, applicability may be limited by differences paragraph, they present enough information to make between the clinicial setting and the setting in which the this judgment. trials were conducted. These limitations may include All included studies were low RoB. The only NMA es- differences in the patients (e.g. the patient may be younger timates for mortality in the entire network in which the than those included in the trials); the intervention (e.g. the CrI did not include HR = 1.0 suggested benefit for time- clinician is considering use of doses differing from those cycled pressure-limited ventilation (TCPL) (HR = 0.29 tested in the trials); comparators (e.g. trials used standard 95%CrI 0.07, 0.97), HFOV (HR = 0.29 95%CrI 0.08, 0.85), care as a comparator, and standard care delivered in the SIMV+VG (HR = 0.12, 95%CrI 0.01, 0.86), and V-C trials differs from standard care in the clinician’s setting); (HR = 0.14 95%CrI 0.02, 0.68) modes compared to and outcomes (e.g. the clinician is concerned about long- SIMV+PSV. Although, the upper CrI of those estimates term effects of treatment and trials examined only shorter are close to no difference, you decide to not rate them term outcomes). In any of these situations, the clinician down for imprecision (refer to the earlier discussion on must consider the extent to which trial results apply to imprecision). The contributing direct comparisons en- their patients and, if such differences exist, potentially rolled similar appropriate patients, the interventions refer to other evidence or their own experience in decid- appeared to be administered optimally, and the authors ing on optimal management. reported no heterogeneity or incoherence. You see little reason why, depending on the direction of results, Implementation authors would choose not to submit, or editors to Returning to the NMA of ventilation modes in preterm publish these studies, and therefore rate publication infants with RDS [9] (P infants with RDS; I and C; all bias as undetected. All these comparisons constitute mechanical ventilation modes; O; mortality), the search high quality evidence. For every other paired comparison in the network, precision is a major concern. In the ranking, SIMV+VG mode had the highest Table 8 Asthma treatments strategies effectiveness NMA in probability of being ranked first, though that probabil- improving symptom free days ity was only 29.7%. The V-C mode had the second 1st Rank 2nd Rank 3rd Rank 4th Rank highest probability of being ranked first, at 22.8%. ICS + LABA 0.95 0.05 0.01 0 Given that there is clear difference between these two modes versus only SIMV+PSV (all other CrI were not ICS low dose 0.02 0.38 0.37 0.24 precise) the only convincing result is that it is wise to ICS high dose 0.01 0.33 0.36 0.29 avoid using SIMV+PSV. You therefore conclude that ICS + LTRA 0.02 0.24 0.26 0.45 use of TCPL, HFOV, SIMV+VG, or V-C – all of Ranks are expressed as probabilities that sums to 1. ICS-L low-dose inhaled which the pediatrician uses regularly - is reasonable corticosteroids, ICS-H medium or high-dose inhaled corticosteroids, LTRAs leukotriene receptor antagonists, LABA, long-acting b-agonists strategies [12] and appropriate. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 10 of 12 Conclusion Received: 30 January 2017 Accepted: 30 April 2018 NMA is a powerful analytic tool that offers many advan- tages over a conventional meta-analysis. NMA may, however, be misleading because of a number of prob- References 1. Guyatt G, Rennie D, Meade MO, Cook DJ. Chapter 22: the process of a lems. First authors may not have not followed the basic systematic review and meta-analysis. In: Users’ guides to the medical standards applicable to any meta-analysis (e.g. compre- literature: a manual for evidence-based clinical practice. 3rd ed. New York: hensive search, duplicate assessment of eligibility, risk of McGraw-Hill Education; 2015. p. 459–69. 2. Stegeman BH, de Bastos M, Rosendaal FR, van Hylckama Vlieg A, bias, and data abstraction). Second, trials may suffer lim- Helmerhorst FM, Stijnen T, Dekkers OM. Different combined oral itations in risk of bias, precision, consistency, and indir- contraceptives and the risk of venous thrombosis: systematic review and ectness. Third, there may be limitations specific to NMA network meta-analysis. BMJ. 2013;347:f5298. including intransitivity, incoherence, or uncritical reli- 3. Sundaresh V, Brito JP, Wang Z, Prokop LJ, Stan MN, Murad MH, Bahn RS. Comparative effectiveness of therapies for graves’ hyperthyroidism: a systematic ance on rankings. Therefore, evaluating the NMA evi- review and network meta-analysis. J Clin Endocrinol Metab. 2013;98(9):3671–7. dence requires serial judgments on the creditability of 4. Menten J, Lesaffre E. A general framework for comparative Bayesian meta- the process of NMA conduct, and evidence quality as- analysis of diagnostic studies. BMC Med Res Methodol. 2015;15:70. 5. Jones LJ, Craven PD, Attia J, Thakkinstian A, Wright I. Network meta-analysis sessment. This introductory guide will assist clinicians in of indomethacin versus ibuprofen versus placebo for PDA in preterm their understanding of NMA. infants. Arch Dis Child Fetal Neonatal Ed. 2011;96(1):F45–52. 6. Chinnadurai S, Fonnesbeck C, Snyder KM, Sathe NA, Morad A, Likis FE, McPheeters ML. Pharmacologic interventions for infantile hemangioma: a Abbreviations meta-analysis. Pediatrics. 2016;137(2):1–10. A/C: Assist-control ventilation; CI: Confidence or credible interval; CrI: Credible 7. Huang J, Wen D, Wang Q, McAlinden C, Flitcroft I, Chen H, Saw SM, Chen H, interval; FEV1: Forced expiratory volume in 1 s; GINA: Global initiative for Bao F, Zhao Y, et al. Efficacy comparison of 16 interventions for myopia asthma; GRADE: Grading recommendations assessment development and control in children: a network meta-analysis. Ophthalmology. 2016;123(4): evaluation; HFOV: High-frequency oscillatory ventilation; HR: Hazard ratio; 697–708. ICS: Inhaled corticosteroids; ICS-H: Medium or high-dose inhaled corticosteroids; 8. Littlewood KJ, Higashi K, Jansen JP, Capkun-Niggli G, Balp MM, Doering G, ICS-L: Low-dose inhaled corticosteroids; IMV: Intermittent mandatory ventilation; Tiddens HA, Angyalosi G. A network meta-analysis of the efficacy of inhaled LABA: Long-acting b-agonists strategies; LTRAs: Leukotriene receptor antibiotics for chronic Pseudomonas infections in cystic fibrosis. J Cyst antagonists; MA: Meta-analysis; MD: Mean difference; NMA: Network meta- Fibros. 2012;11(5):419–26. analysis; OR: Odds ratio; RCT: Randomized controlled trial; RD: Risk difference; 9. Wang C, Guo L, Chi C, Wang X, Guo L, Wang W, Zhao N, Wang Y, Zhang Z, RDS: Respiratory distress syndrome; ROB: Risk of bias; RR: Risk ratio; Li E. Mechanical ventilation modes for respiratory distress syndrome in SIMV: Synchronized intermittent mechanical ventilation; SIPPV: Synchronized infants: a systematic review and network meta-analysis. Critical Care. intermittent positive pressure ventilation; VG: Volume-guarantee ventilation; 2015;19:108. WL: Wait list 10. Zhou X, Hetrick SE, Cuijpers P, Qin B, Barth J, Whittington CJ, Cohen D, Del Giovane C, Liu Y, Michael KD, et al. Comparative efficacy and Availability of data and materials acceptability of psychotherapies for depression in children and Data sharing is not applicable to this article as no datasets were generated adolescents: a systematic review and network meta-analysis. World or analyzed during the current study. Psychiat. 2015;14(2):207–22. 11. Knottnerus BJ, Grigoryan L, Geerlings SE, Moll van Charante EP, Verheij TJ, Kessels AG, ter Riet G. Comparative effectiveness of antibiotics for Authors’ contributions uncomplicated urinary tract infections: network meta-analysis of RA: conceptualized and designed the study, drafted the initial manuscript, randomized trials. Fam Pract. 2012;29(6):659–70. and approved the final manuscript as submitted; IF: conceptualized the 12. Zhao Y, Han S, Shang J, Zhao X, Pu R, Shi L. Effectiveness of drug study, reviewed the, and approved the final manuscript as submitted; GG: treatment strategies to prevent asthma exacerbations and increase conceptualized the study, reviewed the, and approved the final manuscript symptom-free days in asthmatic children: a network meta-analysis. as submitted; LT: conceptualized the study, reviewed the, and approved the J Asthma. 2015;52(8):846–57. final manuscript as submitted. 13. Caldwell DM, Welton NJ, Dias S, Ades AE. Selecting the best scale for measuring treatment effect in a network meta-analysis: a case study in childhood nocturnal enuresis. Res Synthesis Methods. 2012;3(2):126–41. Ethics approval and consent to participate 14. Fang XZ, Gao J, Ge YL, Zhou LJ, Zhang Y. Network meta-analysis on the Not applicable. efficacy of Dexmedetomidine, midazolam, ketamine, Propofol, and fentanyl for the prevention of sevoflurane-related emergence agitation in children. Competing interests Am J Ther. 2015;23:e1032–42. The authors declare that they have no competing interests. 15. Huang X, Xu B. Efficacy and safety of tacrolimus versus Pimecrolimus for the treatment of atopic dermatitis in children: a network meta-analysis. Dermatology. 2015;231(1):41–9. Publisher’sNote 16. Achana FA, Sutton AJ, Kendrick D, Wynn P, Young B, Jones DR, Hubbard SJ, Cooper NJ. The effectiveness of different interventions to promote poison Springer Nature remains neutral with regard to jurisdictional claims in published prevention behaviours in households with children: a network meta- maps and institutional affiliations. analysis. PLoS One. 2015;10(3):e0121122. Author details 17. Hubbard S, Cooper N, Kendrick D, Young B, Wynn PM, He Z, Miller P, Department of Clinical Epidemiology & Biostatistics, McMaster University, Achana F, Sutton A. Network meta-analysis to evaluate the effectiveness of Hamilton, ON, Canada. Department of Pediatrics, Division of Pediatric interventions to prevent falls in children under age 5 years. Inj Prevent. Endocrinology and Metabolism King Saud University, Riyadh, Saudi Arabia. 2015;21(2):98–108. Department of Pediatrics, Universidad de Antioquia, Medellín, Colombia. 18. van der Mark LB, Lyklema PH, Geskus RB, Mohrs J, Bindels PJ, van Aalderen Department of Medicine, McMaster University, Hamilton, Canada. WM, Ter Riet G. A systematic review with attempted network meta-analysis Department of Pediatrics and Anesthesia, McMaster University, Hamilton, of asthma therapy recommended for five to eighteen year olds in GINA ON, Canada. steps three and four. BMC Pulmonary Med. 2012;12:63. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 11 of 12 19. Guo C, Sun X, Wang X, Guo Q, Chen D. Network meta-analysis comparing statement for reporting of systematic reviews incorporating network meta- the efficacy of therapeutic treatments for bronchiolitis in children. JPEN J analyses of health care interventions: checklist and explanations. Ann Intern Parenter Enteral Nutr. 2018;42(1):186–95. Med. 2015;162(11):777–84. 20. Padilha S, Virtuoso S, Tonin FS, Borba HHL, Pontarolo R. Efficacy and safety 40. Jansen JP, Trikalinos T, Cappelleri JC, Daw J, Andes S, Eldessouki R, Salanti G. of drugs for attention deficit hyperactivity disorder in children and Indirect treatment comparison/network meta-analysis study questionnaire adolescents: a network meta-analysis. Eur Child Adol Psychiat. 2018; https:// to assess relevance and credibility to inform health care decision making: doi.org/10.1007/s00787-018-1125-0. an ISPOR-AMCP-NPC good practice task force report. Value Health. 2014; 17(2):157–73. 21. Zeng L, Tian J, Song F, Li W, Jiang L, Gui G, Zhang Y, Ge L, Shi J, Sun X, et al. Corticosteroids for the prevention of bronchopulmonary dysplasia in preterm 41. Cipriani A, Zhou X, Del Giovane C, Hetrick SE, Qin B, Whittington C, Coghill infants: a network meta-analysis. Arch Dis Child Fetal Neonatal Ed. 2018; D, Zhang Y, Hazell P, Leucht S, et al. Comparative efficacy and tolerability of https://doi.org/10.1136/archdischild-2017-313759. antidepressants for major depressive disorder in children and adolescents: a 22. Fu H-D, Qian G-L, Jiang Z-Y. Comparison of second-line immunosuppressants network meta-analysis. Lancet. 2016;388:881–90. for childhood refractory nephrotic syndrome: a systematic review and network 42. Windecker S, Stortecky S, Stefanini GG, da Costa BR, Rutjes AW, Di Nisio M, meta-analysis. J Investig Med. 2017;65(1):65–71. Silletta MG, Maione A, Alfonso F, Clemmensen PM, et al. Revascularisation 23. Gutierrez-Castrellon P, Indrio F, Bolio-Galvis A, Jimenez-Gutierrez C, Jimenez- versus medical treatment in patients with stable coronary artery disease: Escobar I, Lopez-Velazquez G. Efficacy of lactobacillus reuteri DSM 17938 for network meta-analysis. BMJ Clin Res ed. 2014;348:g3859. infantile colic: systematic review with network meta-analysis. Medicine. 43. Lumley T. Network meta-analysis for indirect treatment comparisons. Stat 2017;96(51):e9375. Med. 2002;21(16):2313–24. 24. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux PJ, Prasad K, 44. Friedrich JO, Adhikari NK, Beyene J. Ratio of means for analyzing continuous Neumann I, Carrasco-Labra A, Agoritsas T, Hatala R, et al. How to read a outcomes in meta-analysis performed as well as mean difference methods. systematic review and meta-analysis and apply the results to patient care: J Clin Epidemiol. 2011;64(5):556–64. users’ guides to the medical literature. JAMA. 2014;312(2):171–9. 45. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen 25. Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, TP. Does quality of reports of randomised trials affect estimates of intervention Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment efficacy reported in meta-analyses? Lancet. 1998;352(9128):609–13. comparisons and network meta-analysis for health-care decision making: 46. Chaimani A, Vasiliadis HS, Pandis N, Schmid CH, Welton NJ, Salanti G. Effects report of the ISPOR task force on indirect treatment comparisons good of study precision and risk of bias in networks of interventions: a network research practices: part 1. Value Health. 2011;14(4):417–28. meta-epidemiological study. Int J Epidemiol. 2013;42(4):1120–31. 26. Cipriani A, Higgins JP, Geddes JR, Salanti G. Conceptual and technical 47. Guyatt GH,Oxman AD,KunzR,BrozekJ,Alonso-Coello P, Rind D, challenges in network meta-analysis. Ann Intern Med. 2013;159(2):130–7. Devereaux PJ, Montori VM, Freyschuss B, Vist G, et al. GRADE guidelines 27. Salanti G. Indirect and mixed-treatment comparison, network, or multiple- 6. Rating the quality of evidence–imprecision. J Clin Epidemiol. treatments meta-analysis: many names, many benefits, many concerns for 2011;64(12):1283–93. the next generation evidence synthesis tool. Res Synthesis Methods. 2012; 48. Schunemann HJ, Guyatt GH. Commentary–goodbye M (C) ID! Hello MID, 3(2):80–97. where do you come from? Health Serv Res. 2005;40(2):593–7. 28. Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma 49. Counseling parents before high-risk delivery. In: Lacy GT, Eyal FG, Zenk KE, C, Thompson D, Larholt KM, Diaz M, et al. Conducting indirect-treatment- editors. Neonatology: management, procedures, on-call problems, diseases, comparison and network-meta-analysis studies: report of the ISPOR task and drugs. Edn. Stamford: Appleton & Lange; 1999. p. 223. force on indirect treatment comparisons good research practices: part 2. 50. Ancel PY, Goffinet F, Kuhn P, Langer B, Matis J, Hernandorena X, Value Health. 2011;14(4):429–37. Chabanier P, Joly-Pedespan L, Lecomte B, Vendittelli F, et al. Survival 29. Thabane L, Thomas T, Ye C, Paul J. Posing the research question: not so and morbidity of preterm children born at 22 through 34 weeks’ simple. Can J Anaesth. 2009;56(1):71–9. gestation in France in 2011: results of the EPIPAGE-2 cohort study. 30. Salanti G, Del Giovane C, Chaimani A, Caldwell DM, Higgins JP. Evaluating JAMA Pediatr. 2015;169(3):230–8. the quality of evidence from a network meta-analysis. PLoS One. 2014;9(7): 51. Sun H, Cheng R, Kang W, Xiong H, Zhou C, Zhang Y, Wang X, Zhu C. High- e99682. frequency oscillatory ventilation versus synchronized intermittent mandatory ventilation plus pressure support in preterm infants with severe 31. Puhan MA, Schunemann HJ, Murad MH, Li T, Brignardello-Petersen R, Singh respiratory distress syndrome. Respir Care. 2014;59(2):159–69. JA, Kessels AG, Guyatt GH. A GRADE working group approach for rating the quality of treatment effect estimates from network meta-analysis. BMJ. 2014; 52. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso- 349:g5630. Coello P, Falck-Ytter Y, Jaeschke R, Vist G, et al. GRADE guidelines: 8. Rating 32. Reddel HK, Levy ML. The GINA asthma strategy report: what's new for the quality of evidence–indirectness. J Clin Epidemiol. 2011;64(12):1303–10. primary care? NPJ Prim Care Resp Med. 2015;25:15050. 53. Higgins J. Identifying and addressing inconsistency in network meta- 33. Expert Panel Report 3 (EPR-3). Guidelines for the diagnosis and analysis. In: Cochrane comparing multiple interventions methods group Management of Asthma-Summary Report 2007. J Allergy Clin Immunol. Oxford training event, vol. 2013: Cochrane Collaboration; 2013. 2007;120(5 Suppl):S94–138. 54. Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, Holland 34. Jansen JP, Naci H. Is network meta-analysis as valid as standard pairwise R, Chen Y-F, Glenny A-M, Deeks JJ, et al. Inconsistency between direct and meta-analysis? It all depends on the distribution of effect modifiers. BMC indirect comparisons of competing interventions: meta-epidemiological Med. 2013;11:159. study. BMJ. 2011;343:d4909. 35. Betran AP, Say L, Gulmezoglu AM, Allen T, Hampson L. Effectiveness of 55. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed different databases in identifying studies for systematic reviews: experience treatment comparison meta-analysis. Stat Med. 2010;29(7–8):932–44. from the WHO systematic review of maternal morbidity and mortality. BMC 56. Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence Med Res Methodol. 2005;5(1):6. synthesis for decision making 4: inconsistency in networks of evidence 36. Kwon Y, Powelson SE, Wong H, Ghali WA, Conly JM. An assessment of based on randomized controlled trials. Med Decis Mak. 2013;33(5):641–56. the efficacy of searching in biomedical databases beyond MEDLINE in 57. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann identifying studies for a systematic review on ward closures as an Intern Med. 1992;116(1):78–84. infection control intervention to control outbreaks. Systematic Rev. 58. Sun X, Ioannidis JP, Agoritsas T, Alba AC, Guyatt G. How to use a subgroup 2014;3:135. analysis: users’ guide to the medical literature. JAMA. 2014;311(4):405–11. 37. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic 59. Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, Alonso-Coello reviews. BMJ. Br Med J. 1994;309(6964):1286–91. P, Djulbegovic B,AtkinsD,Falck-Ytter Y, et al.GRADE guidelines:5.Rating 38. Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews the quality of evidence–publication bias. J Clin Epidemiol. 2011;64(12): of Interventions Version 5.1.0 [updated March 2011]. The Cochrane 1277–82. Collaboration, 2011. Available from www.cochrane-handbook.org. 60. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical 39. Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, summaries for presenting results from multiple-treatment meta-analysis: an Ioannidis JP, Straus S, Thorlund K, Jansen JP, et al. The PRISMA extension overview and tutorial. J Clin Epidemiol. 2011;64(2):163–71. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 12 of 12 61. Mbuagbaw L, Rochwerg B, Jaeschke R, Heels-Andsell D, Alhazzani W, Thabane L, Guyatt GH. Approaches to interpreting and choosing the best treatments in network meta-analyses. Systematic Rev. 2017;6(1):79. 62. Veroniki AA, Straus SE, Rücker G, Tricco AC. Is providing uncertainty intervals in treatment ranking helpful in a network meta-analysis? J Clin Epidemiol. 2018; https://doi.org/10.1016/j.jclinepi.2018.02.009. 63. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17(1):1–12. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Pediatrics Springer Journals

Network meta-analysis: users’ guide for pediatricians

Free
12 pages
Loading next page...
 
/lp/springer_journal/network-meta-analysis-users-guide-for-pediatricians-mBvl3arBPL
Publisher
BioMed Central
Copyright
Copyright © 2018 by The Author(s).
Subject
Medicine & Public Health; Pediatrics; Internal Medicine
eISSN
1471-2431
D.O.I.
10.1186/s12887-018-1132-9
Publisher site
See Article on Publisher Site

Abstract

Background: Network meta-analysis (NMA) is a powerful analytic tool that allows simultaneous comparison between several management/treatment alternatives even when direct comparisons of the alternatives (such as the case in which treatments are compared against placebo and have not been compared against each other) are unavailable. Though there are still a limited number of pediatric NMAs published, the rapid increase in NMAs in other areas suggests pediatricians will soon be frequently facing this new form of evidence summary. Discussion: Evaluating the NMA evidence requires serial judgments on the creditability of the process of NMA conduct, and evidence quality assessment. First clinicians need to evaluate the basic standards applicable to any meta-analysis (e.g. comprehensive search, duplicate assessment of eligibility, risk of bias, and data abstraction). Then evaluate specific issues related to NMA including precision, transitivity, coherence, and rankings. Conclusions: In this article we discuss how clinicians can evaluate the credibility of NMA methods, and how they can make judgments regarding the quality (certainty) of the evidence. We illustrate the concepts using recent pediatric NMA publications. Keywords: Network meta-analysis, Multiple treatment comparisons, Multiple-treatment meta-analysis evidence synthesis, Evidence credibility, Evidence certainty, Pediatric Background possible paired comparison, but these have two major Randomized control trials (RCTs) constitute the optimal limitations. First, for the clinician or patient consumer, methodology to determine the effectiveness of medical making sense of multiple meta-analyses would be chal- interventions. When results against placebo or standard lenging. Second, it is extremely likely that many of the care suggest benefits outweigh harms, clinicians, patients possible paired comparisons will not have direct and families must choose among several interventions. comparisons available; in such instances, there will be Making this choice optimally requires access to system- no conventional meta-analysis to consider. atic summaries of the best available evidence. Network meta-analysis (NMA), also known as For decades, investigators have provided these multiple-treatment comparisons or multiple-treatment evidence summaries using systematic reviews and meta- meta-analysis, provides a methodology to address this analyses. By combining across studies, meta-analyses dilemma, taking advantage of two statistical innovations: increase the precision of the effect estimate [1]. Conven- the first is use of indirect comparisons—we can estimate tional meta-analyses, however, address only single paired the effect of A-B indirectly if both A and B have been comparisons and are therefore of limited use when mul- compared against C (see next section). The second is tiple reasonable options exist. One could envision a that NMA statistical methods combining direct and in- series of conventional meta-analyses addressing each direct comparisons allow estimates of the relative effect of every alternative versus every other alternative. Although the majority of published NMAs summarize * Correspondence: ralkhalifah@ksu.edu.sa; Reem_ah@yahoo.com 1 evidence from RCTs, NMA of cohort studies – most Department of Clinical Epidemiology & Biostatistics, McMaster University, Hamilton, ON, Canada often addressing the evidence regarding adverse events - Department of Pediatrics, Division of Pediatric Endocrinology and are increasing [2, 3]. Moreover, given the recent Metabolism King Saud University, Riyadh, Saudi Arabia Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 2 of 12 development of the required methods, diagnostic test ac- (nodes) represent each intervention, and the lines be- curacy NMA may soon be available [4]. tween the nodes (called edges) represent head-to-head The first NMA addressing a pediatric issue evaluated comparisons (Fig. 2)[25]. Some network graphs use the the effects of indomethacin, ibuprofen, and placebo on size of the nodes and the width of the edges to convey patent ductus arteriosus closure in preterm infants [5]. information about the amount of information available Since then, the number of pediatric NMAs has increased (circles convey the sample size of studies of a particular [6–23] and, given development in other fields, one can intervention and edges the number, sample size, or vari- anticipate a substantial further increase. This increase ance associated with the related direct comparisons, i.e. might, however, occur at a slower rate in the pediatric large node means larger sample size, and thick edge field because of the smaller number of RCTs relative to means increased number of studies included). the adult literature. In comparison to conventional meta-analysis that relies The goal of this paper is to provide a users’ guide for exclusively on direct evidence, the NMA provides esti- pediatricians considering the application of the results of mates of relative effectiveness among all interventions be- NMA addressing a therapeutic issue to their practice. ing compared, increases precision around effect estimates, Nonetheless, a minimum knowledge on Conventional ranks treatments, and enhances generalizability [26–28]. meta-analysis is needed to understand most of the im- portant concepts of NMA [24]. First, we introduce the Credibility of NMA methods reader to NMAs and provide criteria for evaluating the The conduct of NMA should adhere to standards of a credibility of the NMA method. We then discuss the traditional systematic review. Like a conventional meta- quality of the evidence (synonyms: certainty or confi- analysis, a credible NMA requires explicit eligibility cri- dence in evidence) obtained from a NMA (the NMA teria, comprehensive search, and assessment of evidence may have used optimal methods, but limitations of the quality (Table 1). underlying studies may still result in low quality evi- dence). To illustrate the processes of interpretation and implementation in the context of pediatric literature, we Did the review explicitly address a sensible question? will present an example of the effects of 16 different A well-formulated clinical question will typically follow mechanical ventilation modes on mortality among pre- the PICO format (P: population, I: intervention, C: term infants with respiratory distress syndrome (RDS) [9], in addition to other examples from the pediatric lit- erature when we could not illustrate the presented con- cepts using the mechanical ventilation NMA. Discussion Indirect evidence Let us suppose that we are interested in the relative merits of two treatments, A and B. It may be that no study has directly compared the two treatments. If, how- ever, investigators have compared both A and B against the same third alternative C, we can infer the relative ef- fect of A-B. We do so by comparing the effect of A-C and B-C (the indirect comparison, Fig. 1.1). For instance, if the relative risk (RR) of death in A-C is 0.5 (A reduces deaths relative to C by 50%) the RR of Fig. 1 The concept of network meta-analysis. Each node (circle) is death in B-C is 1.0 (B has no effect on deaths relative to considered an intervention (A, B or C), sold lines represent loops of C), then it would be reasonable to infer that A will re- pairwise comparison (direct evidence), and doted lines represent duce death relative to B by 50%. Furthermore, if investi- loops of indirect comparison (indirect evidence). Indirect compari- gators have conducted both direct and indirect sons can be made via deduction from the common comparator. 1.1. Indirect evidence of A versus B inferred from direct estimates of comparisons, we can combine the two and produce a A versus C and B versus C Four studies formed the effect estimate for mixed or network estimate (Fig. 1.2). A-C, and 3 studies formed the effect estimate for C-B. The effect esti- mate of A-B was obtained from indirect evidence. 1.2.Closed network shows the a closed network meta-analysis in a hypothetical example Network meta-analysis where all interventions were compared in RCT’s, therefore; direct and Ideally, an NMA will depict the available direct evidence indirect evidence is available for all comparisons in a figure; we refer to as a network graph. The circles Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 3 of 12 Fig. 2 The geometry of the mechanical ventilation for premature infants NMA. A/C, assist-control ventilation; VG, volume guarantee ventilation; RM, recruitment maneuver; CMV, continuous mandatory ventilation; HFFIV, high-frequency flow interrupted ventilation; HFJV, high-frequency-jet ventilation; HFOV, high-frequency oscillatory ventilation; IMV, intermittent mandatory ventilation; PSV, pressure support ventilation; PTV, patient- triggered ventilation; SIMV, synchronized intermittent mechanical ventilation; SIPPV, synchronized intermittent positive pressure ventilation;V-C,volume- controlled. Wang C et al. Mechanical ventilation modes for respiratory distress syndrome in infants: a systematic review and network meta-analysis.Critical care (London, England). 2015, reprinted by permission of the publisher [9] comparator, O: outcomes) [29]. NMA uses the same for- disease severity that might lead to differences in treat- mat except that “I” and “C” (intervention and compari- ment response [33]. sons), include all the interventions compared against Another example relates to differences in the meas- each other. Successful definitions of each element of the urement of outcome [27, 34]. Two systematic reviews PICO are required to determine the studies eligible for in asthma began with the goal of conducting an NMA; the review and develop a priori hypotheses to address only one was successful. The first study evaluated the possible heterogeneity. effectiveness of the various inhalation regimens on Although, the scope of the research question can vary FEV1 improvement [18]. The systematic review re- from narrow to broad, it is essential that for any paired vealed large variations in the way the 23 trials measured comparison within the NMA, it is plausible that we will, and reported FEV1. This heterogeneity prevented the for each outcome of interest, observe similar effects review team from performing an NMA. The second across all patient populations being addressed [30, 31]. Eligibility criteria can be wide enough to permit the Table 1 Guide for appraising NMA evidence possibility of differences in effect across the included pa- Credibility Did the review explicitly address sensible question? tients, interventions, and outcomes. For instance, effects Was the search for studies and selection comprehensive? Did the review assess evidence certainty? may differ – among eligible studies- in more or less Did the review present results for the reader? severely affected patients; across high and low doses and Certainty What is the risk of bias of included studies? across shorter and longer follow-up. Were the results precise? An NMA that assessed the efficacy of asthma treat- Were results consistent across studies? How trustworthy are the indirect comparisons? ments strategies included all children with chronic Were results consistent between direct and indirect asthma [12]. The definition of chronic asthma was not comparisons? based on the Global Initiative for Asthma (GINA) Is there evidence for publication bias? Were treatment ranks presented and were they trustworthy? asthma guidelines staging [32], nor did the authors present data on disease severity, or attempt subgroup Applicability What is the overall quality of the evidence? What are the limitations of the evidence? analyses. The broad inclusion criteria and lack of Can I apply the results to my patients? subgroup hypotheses fail to address the differences in Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 4 of 12 NMA assessed the efficacy of treatments on reducing information is not available, then the credibility of the exacerbation [12]. Severe exacerbation was defined as NMA is compromised [31]. patients needing hospital admission, a visit to the emer- Consider, for instance, the GRADE profile for the dir- gency department or a standard course of systemic cor- ect evidence of an NMA of antidepressant medications ticosteroids. In this case, outcomes were reported for improving depression symptoms in children (Table 2) similarly across trials and the authors presented [41]. The evidence certainty for Fluoxetine versus pla- pooled estimates. cebo was rated as very low as a result of high RoB, im- precision, and inconsistency. The Imipramine versus Was the search for studies and selection comprehensive? placebo comparison was rated as moderate, the only A comprehensive systematic search that identifies all concern being imprecision. With this variation in evi- pertinent available studies minimizes the risk of spurious dence certainty, making sense of the results requires rat- findings from unrepresentative selection of studies. Since ings of evidence quality for each pairwise comparison. many reviews articles have demonstrated the inadequacy of searching only one database [35–37], an optimal How do NMAs conduct analyses and present results? search include all relevant electronic databases (e.g., There are two statistical approaches to perform NMA: a Medline, Embase, Psycinfo, CENTRAL, CINAHL) [38]. frequentist and a Bayesian approach [41, 42]. The fre- Ideally, a search of the grey literature will minimize the quentist approach is what clinicians will generally see in risk of publication bias. individual RCTs and conventional meta-analyses. The Subsequently, the team selects eligible studies [38]. additional major aspect of Bayesian approaches is the The report should provide evidence of the reproducibil- specification of prior probabilities of treatment effects ity of assessment of study eligibility through review by at before beginning the data analysis and combining these least 2 independent assessors, and present a figure sum- priors and their precision with the estimate from the marizing each selection step in the eligibility determin- data to produce a posterior probability and its credible ation process (identification of titles and abstracts; interval. Results in NMA are presented as effect esti- culling of titles and abstracts; review of full texts; final mates, typically odds ratios (ORs), or RR, hazard ratio determination of eligibility) [39]. (HR), mean difference (MD) with their 95% confidence interval (CI) (frequentist approach) or credible interval Did the review assess evidence certainty? (CrI) (Bayesian approach), both of which describe the Certainty in effect estimates represents how trustworthy range of plausible truth around the point estimate. are the results and their conclusions [31]. Within any Ideally, NMAs will present direct, indirect, and net- network, it is likely that the quality of the evidence dif- work estimates for each paired comparison. When, how- fers across paired comparisons: high quality evidence ever, there are large numbers of comparisons, this may reveal that one treatment is superior to another, becomes a challenging task. For example, the mechanical whereas we may have only low quality evidence regard- ventilation modes for RDS in preterm infants NMA in- ing the relative merit of other treatments. cluded 16 different ventilation modes, yielding 120 Making that rating requires a sequence of judgments comparisons- this probably requires an online appendix relying on assessments of the quality of the direct and [9]. Ways to deal with this profusion of comparisons is indirect evidence. Three articles were published on 2014 to present effect estimates in a league table (all possible by the Grades of Recommendation, Assessment, Devel- pairwise interventions compared to each other by cross- opment and Evaluation (GRADE) working group, the matching the interventions on the raw with those in the Cochrane Collaboration, and the ISPOR-AMCP-NPC column), forest plots (all pairwise interventions good practice task force [30, 31, 40] that extend quality compared to one reference intervention, or to the least of evidence assessment of meta-analysis to NMA. efficacious intervention such as placebo), or evidence Following the GRADE approach, the overall confidence comparisons (direct, indirect, and NMA) for each inter- starts as high for direct, indirect, and network estimates vention compared to one reference [13, 41, 43, 44]. that are derived from RCTs [31]. The evidence can be rated down from high to moderate, low, or very low quality based on the presence and magnitude of any of Certainty of NMA evidence the 5 domains: Risk of bias (RoB), indirectness, impreci- What is the risk of bias of included studies? sion, inconsistency, and publication bias [31]. RoB conveys the likelihood that limitations in design or Many prior published NMAs have not explicitly ad- conduct of studies will result in estimates of treatment dressed all the recommended elements. Fortunately, effect that vary systematically from the truth. The however, some present the information required for a greater the RoB, the more appropriate it becomes to rate reader to make the necessary judgments. If the down the quality of the evidence [45, 46]. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 5 of 12 Table 2 GRADE evidence profile showing differences in the evidence certainty among two direct evidence comparisons in the depression treatment NMA for depression symptoms Quality assessment Quality № of studies Risk of bias Inconsistency Indirectness Imprecision Other considerations Absolute Effect (95% CI) Fluoxetine vs. placebo a b c 8 Serious Serious Not serious Serious None SMD 0.26 SD lower ⊕OOO VERY LOW (0.5 lower to 0.03 lower) Imipramine vs. placebo 2 Not serious Not serious Not serious Serious None SMD 0 SD ⊕⊕⊕O MODERATE (0.27 lower to 0.26 higher) RCT Randomised trials; CI Confidence interval, SMD Standardised mean difference [41] Selective outcome reporting, and incomplete outcome data b 2 Moderate heterogeneity I = 67.4% Upper CI very close to no effect SMD includes no effect For assessing the RoB, authors may use an instrument For example, in the NMA of ventilation modes for such as the Cochrane RoB tool for RCTs [38]. This in- infants, the comparison of synchronized-intermittent strument assesses six elements: randomization sequence mechanical ventilation with volume-guarantee (SIMV+VG) generation, concealment of allocation, blinding of partic- versus high-frequency-jet-ventilation (HFJV), the point ipants, personnel and outcome assessors, completeness estimate suggests that SIMV+VG reduced mortality (HR = of follow-up, selective outcome reporting, and presence 0.23) [9]. However, the 95%CrI ranged from an extremely of other biases. large reduction in mortality (HR = 0.03, reduction in hazard In the NMA of strategies for preventing asthma exac- by 97%) to an almost doubling of hazard (HR = 1.46). Since erbations, the authors used the Cochrane instrument to the treatment choice will be different at each CrI end, the assess RoB [12] and judged all trials to be at low RoB. evidence quality is reduced for imprecision. Although the authors did not provide an overall RoB On the other hand, for the comparison SIMV+VG judgment per comparison, it is possible -although tedi- versus SIMV with pressure-support ventilation (SIMV ous- for the pediatrician to make this rating if the NMA +PSV), mortality was lower with SIMV+VG (HR = 0.12; authors have presented ratings of RoB for each study in 95%CrI 0.01, 0.86). Here, even the upper suggests a 14% a table or figure. In this case, it is not a problem: since reduction in hazard with SIMV+VG. Therefore, in this all studies were at low RoB, there is no need to rate instance, there is no need to rate down the quality of the down for RoB for any comparison. evidence for imprecision. Although, the width of the CrI may still be considered large and thus could be consid- ered imprecise for outcomes such as hospital length of Were the results precise? stay, any but the smallest reduction in mortality is crit- The lack of adequate power to inform a particular out- ical. The judgment of importance is critically dependent come leads to imprecision [47]. One standard for asses- on the absolute difference, in this case the absolute mor- sing precision is to consider whether differences tality risk difference: for instance, for 27 weeks infants between intervention and control exclude chance (i.e. with baseline mortality risk of 10%, the absolute mortal- statistically significant). This has two limitations: first ity risk reduction with SIMV+VG versus SIMV+PSV results may exclude no effect, but may not exclude an would approximately be 9% if the point estimate of the effect too small to be important; second, using this HR (0.12) were accurate, and approximately 1.4% if the criterion, one would always rate down for precision if upper boundary of the CrI (0.86) represented the truth. results were not statistically significant, no matter how The magnitude of the absolute difference is greater for narrow the CI or CrI. even younger infants with higher mortality, and less for Therefore, we suggest an alternative standard. To older infants with lower mortality (Table 3)[49–51]. assess imprecision, one can consider whether decisions In a complementary approach, authors can, for each regarding choice of therapy will differ if the upper and direct comparison, assess imprecision by calculating the lower CI or CrI represents the truth. Another way of optimal information size (OIS), the number of patients thinking about this approach is to consider whether the or events needed for adequately powered individual CI or CrI excludes a minimally important difference study to avoid spurious findings [47]. This, however, ig- (MID). The MID is a measure of the smallest change in nores the contribution of the indirect comparisons to the value of a patient-reported outcome, typically the network estimate. Methods to incorporate indirect applied to outcomes such as quality of life measures [48]. estimates of OIS to NMA are under development [26]. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 6 of 12 Table 3 Anticipated absolute mortality among premature infants using SIMV+VG versus SIMV+PSV Relative effect Anticipated absolute effects Hazard ratio Mortality risk with regular care Mortality risk difference with SIMV+VG (95% CrI) GA > 30 weeks 0.12 (0.01 to 0.86) 5 per 100 4 fewer per 100 (5 fewer to 1 fewer) GA 27–30 weeks 0.12 (0.01 to 0.86) 10 per 100 9 fewer per 100 (10 fewer to 1 fewer) GA 25–26 weeks 0.12 (0.01 to 0.86) 50 per 100 42 fewer per 100 (49 fewer to 5 fewer) The relative effect of SIMV+VG (and its 95% CrI) is based on the NMA estimates [9]; the absolute effect (and its 95% CI) is based on the assumed risk in the comparison group; mortality estimates with regular care are based on previous literature [49–51] GA gestational age Were results consistent across studies? heterogeneity that is attributable to differences between One can expect variation between treatment effects –we the studies and ranges from 0 to 100%) [38]. call such variation “heterogeneity”. Heterogeneity can For example, in the chronic asthma NMA, the au- result from chance, or from differences in patients, inter- thors presented direct comparison between low-dose ventions, comparisons, outcomes and methodology be- inhaled corticosteroids (ICS-L) and placebo for moder- tween studies (Table 4). ate or severe exacerbation. Six trials contributed to the Assessing the degree of inconsistency in direct com- pooled estimate OR = 0.41 (95%CrI 0.29, 0.56). The for- parisons involves inspecting the point estimates and the est plot shows similar point estimates, and CIs over- degree of confidence or credible intervals overlap of lapped across all trials. The P-value for heterogeneity each study in a forest plot. Two methods for formal stat- assessment was 0.54 (not significant), and I =0% (Fig. 3), istical testing can complement visual inspection of forest indicating a high level of consistency between results. plots – the test for heterogeneity (Cochran’s Q-test), and Conversely, if there is substantial heterogeneity that is I (which quantifies the proportion of the total unexplained by subgroup analysis or meta-regression, we lose confidence in treatment effects and, in the GRADE approach, rate down the quality of evidence for inconsist- ency [31, 34, 52]. Table 4 Possible effect modifiers that may contribute to between study variability How trustworthy are the indirect comparisons? Pure chance Trustworthiness of indirect comparisons - for instance, Different Risk of Bias Studies with high RoB might show large effect than those with low RoB. inferring the relative effect of A-B from A-C and B-C comparisons -requires similarity of patient population, Different study Population: Baseline risk like gender, age (e.g., in some interventions, the effect comparators, outcomes, RoB, and optimal administra- could be larger in infants than in adolescents). tion of the interventions under consideration (Fig. 4). In Disease severity (e.g., in children with severe diseases the effect other words, A and B must both be optimally adminis- of x intervention might be smaller than in case of patients with mild disease). tered; the A-C and B-C comparisons must include simi- Treatment setting (e.g., patient with asthma enrolled from the lar patients; C must be similar; outcomes must be emergency room will have different characteristics than those enrolled measured similarly; and studies would ideally be at low from the outpatient clinic). RoB. We refer to situations when this is not the case as Different Interventions: Dose (larger doses are expected to be associated with larger effect “intransitivity”. Intransitivity reduces confidence in the ad sometimes with larger effect in terms of side effects). results of indirect comparisons. Route (intravenous administration may have larger effect if oral To illustrate the concept of intransitivity consider an administration is impacted by absorption or hepatic metabolism). Duration (using the medication for longer duration may be associated NMA of comparative efficacy of psychotherapies for with larger effect compared to shorter duration). depression in children [10]. The comparison of interest Different comparators: is cognitive behavioral therapy (CBT) versus Problem- Different standards of care when the standard of care is the comparator solving therapy (PST). We wish to make inferences re- (e.g., in a diarrhea study, oral rehydration solution (ORS) is given to the garding the effects of CBT versus PST from an indirect control group in study A vs. ORS+ zinc supplement given to the control group in study B). comparison: studies have compared both CBT and PST to wait list (WL) controls. The 14 RCTs comparing Different ways in Outcome assessment: Definition (e.g., if fever is defined as 38.0 C in study A vs. 39.0 C in study CBT versus WL used 8 different instruments to define B, this may result in diagnosing more patients with the fever in the depression; the 3 RCTs comparing PST versus WL study A). (Table 5) used 2 of the 8, and a ninth that was not used Measurement (e.g., if fever is measured using rectal temperature, compared to axillary temperature in another study; or standard methods at all in the CEB versus WL studies. Use of the different in one study compared to non standard way). instruments could create differences in depression Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 7 of 12 Fig. 3 Forest plot comparing ICS-L vs. placebo for moderate or severe asthma exacerbations. Visual assessment indicates low heterogeneity, similar point estimates, overlapped CI, and I =0 [12]. Zhao Y, et al. Effectiveness of drug treatment strategies to prevent asthma exacerbations and increase symptom-free days in asthmatic children: a network meta-analysis. The Journal of asthma: official journal of the Association for the Care of Asthma. 2015, reprinted by permission of the publisher (Taylor & Francis Ltd., WWW.tandfonline.com)[12] severity in the population that in turn could influence investigators may conduct statistical tests that addresses the magnitude of the treatment effect, suggesting pos- whether chance can explain difference between direct sible intransitivity and consideration of consequent rat- and indirect comparisons [55, 56]. Unexplained incoher- ing down of quality. ence requires rating down evidence quality. In the asthma NMA, the direct evidence comparing Were results consistent between direct and indirect ICS-L versus leukotriene receptor antagonists (LTRA) comparisons? suggested a large reduction in exacerbation favoring Whenever a closed loop is present (Fig. 1.2, and Table 6) ICS-L (OR = 0.38; 95%CrI 0.21, 0.68), and the network there is a possibility that the available direct and indirect estimate showed a significant reduction (OR = 0.56, comparisons will yield very different effect estimates, a 95%CrI 0.39, 0.76) [12] – from which, one might infer condition we refer to as incoherence, or “inconsistency” that the indirect estimate showed a substantially smaller used by other authorities [26, 27, 43, 53, 54]. Incoher- effect or, depending on the amount of indirect evidence, ence can arise for reasons similar to those that can ex- none at all. If the authors had provided the indirect esti- plain heterogeneity and intransitivity (Table 4). mate and its CrI, one could make the judgment regard- One can assess incoherence through inspecting the ing the degree of incoherence. The authors’ statement point estimates and the degree of CI or CrI overlap be- that they found no incoherence in the network on the tween direct and indirect evidence. In addition, basis of statistical tests is somewhat reassuring. Like conventional meta-analysis, when heterogeneity is high, NMA can use techniques of subgroup analysis and meta-regression to try and explain heterogeneity by identifying modifiers of treatment effects [57, 58]. For example in the NMA addressing adverse events associ- ated with antidepressant medications in children and adolescents [41], the OR for adverse events associated with sertraline use compared to placebo 2.94 (95%CrI 0.94,17.19, I = 79.3). The authors performed a subgroup analysis based on age and found increased adverse events with sertraline compared to placebo; for children age < 13 years (OR = 12.64, 95%CrI 2.72, 678.43), and in children age > 13 years (OR = 0.59, 95%CrI 0.15, 6.03). A somewhat less satisfactory way of exploring hetero- geneity is to omit studies and determine if the omission influences results. For example, in the mechanical venti- Fig. 4 The diagram shows the concept of intransitivity. The doted line A–C shows the indirect evidence were inferences are being lation NMA, the authors examined the robustness of the made. B is not shown as a unique intervention, rather as two different analysis by excluding 2 studies that included only new- ways of B (Blue and Red). Intransitivity can occur when the distribution borns with gestational age 25–26 weeks [9]. The results of a possible effect modifier is different between two groups showed no changes in the effect estimates. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 8 of 12 Table 5 Depression definition used in the psychotherapies NMA argue that, having committed oneself to an NMA, one in the wait list (the common comparator) to illustrate the concept should always use the network estimates. of intransitivity in the indirect evidence For example, the pediatric antidepressants medications Pairwise Cognitive-behavioral Problem-solving NMA included a comparison of Fluoxetine versus Pla- comparison therapy vs. Wait list therapy vs. Wait list cebo (Table 7)[41]. In this comparison, one can infer Definition of APAI > 32 20-item CES-D > 16 from the information presented a rating of the quality of depression 21-item BDI > 15 27-item CDI > 16 the direct evidence as very low, the indirect evidence as 21-item BDI > 10 DSM-IV 27-item CDI > 15 moderate, and the network estimate as very low quality. CDRS-R >30 In this case, following the GRADE approach, the clin- DSM-III ician is better off using the effect estimates from the in- DSM-III-R DSM-IV direct evidence. APAI Acholi Psychosocial Assessment Instrument depression symptom scale, BDI Beck Depression Inventory, CES-D Center for Epidemiologic Study Is there evidence for publication Bias? Depression Scale, CDI Children’s Depression Inventory, CDRS-R Children’s Publication bias results from missing studies [59]. This Depression Rating Scale-Revised [10] is because some studies, particularly those with nega- When direct and indirect evidence vary, and the net- tive results, may never be published. A low risk NMA work estimate is between the two and rated down for in- for publication bias will demonstrate comprehensive coherence, what estimate is the clinician going to believe? search for studies, present symmetrical funnel, and The GRADE approach suggests using the effect estimates demonstrate insignificant statistical test for publication from the highest quality evidence, which most commonly bias [38]. This assessment requires, however, at least 10 will be the direct estimate [31]. Other authorities would studies. If publication bias is very likely rating down the evidence is warranted. Table 6 Glossary of terms Were treatment ranks presented and were they trustworthy? Certainty: Methods are available that allow NMA authors to rank Quality of the evidence or confidence in the evidence. treatments from best to worst [26, 60]. They are often Direct estimates: expressed as probabilities that treatments are 1st, 2nd, Effect estimate determined from a head-to-head comparison (such as study 3rd etc. best, either in tables (Table 8) or graphically of A versus B). (rankograms). Surface under the ranking (SUCRA) sum- Indirect estimates: marizes the information from the rankograms as a single Effect estimate determined from two or more head-to-head comparisons through a common (such as the relative effect of A versus B by comparing number. Ranking need be made for each outcome –a the effect of A versus C and B versus C). treatment that is best for one outcome (e.g. benefit) may Network (multiple-treatment comparisons or multiple-treatment meta-analysis): be worst for another (e.g. harm) [60]. Effect estimate determined for a particular comparison from the combination Although intuitively appealing, there are a number of of direct and indirect effect estimates. reasons why clinicians should not routinely choose a Loop: treatment with the higher rankings [61]. First, a treat- A loop of evidence exists when 2 or more direct comparisons contribute to an indirect estimate (e.g., A-B and A-C, contribute to indirect B-C) this ment that is best in one outcome (e.g., a benefit out- loop is considered closed if direct evidence exists between B-C, and open come) may be the worst in another (e.g., a harm when this direct evidence does not exists. outcome). Second, issues such as cost and a clinician’s Indirectness: familiarity with use of a particular treatment may also Term used in direct evidence to describe the presence of systematic bear consideration. Third, rankings do not take into clinical or methodological differences between head-to-head studies that can act as effect modifiers. These can be in different patients account the magnitude of differences in effects between characteristics, ways of administering the interventions, measuring treatments (a first ranked treatment may be only slightly, outcomes, or ROB. or a great deal better than the second ranked treatment). Intransitivity: Fourth, chance may explain apparent difference between Term used in indirect evidence to describe the presence of systematic clinical or methodological differences between head-to-head studies treatments; the use of a measure of uncertainty such as that can act as effect modifiers. These can be in different patients credible intervals for the SUCRA or p-value might help characteristics, ways of administering the interventions, measuring to consider the precision of these probabilities [62]. outcomes, or ROB. Finally, and most important, the evidence on which Heterogeneity (Inconsistency): rankings are based may be very low quality, and there- The presence of differences in effect estimates between head-to-head studies that assessed the same comparison. fore untrustworthy [61]. Although the first ranking may be secure, others are Incoherence: The presence of differences in effect estimates between direct and indirect not: the asthma NMA showed that the treatment ranks evidence. for 2nd, 3rd, and 4th orders were ICS-L, ICS-H, ICS + Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 9 of 12 Table 7 Differences in the evidence certainty across evidence sources in the depression treatment NMA Comparison Direct evidence Direct evidence Indirect evidence Indirect evidence Network Network certainty certainty in estimates certainty in estimates in estimates a,b,c d b,e Fluoxetine vs. Placebo −0.26 ⊕OOO VERY LOW −1.41 ⊕⊕⊕O MODERATE − 0.51 ⊕OOO VERY LOW (−0.50, −0.03) (−2.35, − 0.47) (− 0.99, − 0.03) rated down for RoB rated down for imprecision (upper CI close to the null) c 2 rated down for heterogeneity (I = 67.4%) d 2 loops informed the indirect evidence were of lowROB, imprecise (Duloxetine-placebo[SMD= − 0.11 95%CrI -0.3, 0.08; I = 17%], Duloxetine- Fluoxetine [SMD = − 0.09 95%CrI -0.26, 0.08; I = 0%], no intransitivity e 2 rated down for incoherence (τ = 0.33, P value = 0.02) Effect estimates are SMD (95th CI) [41] Assessed from first order loop Duloxetine-placebo (n = 552), Duloxetine- Fluoxetine (n = 557), included 7–17 years old children, treated for 10 weeks LTRA (Table 8) for improving symptom-free days [12]. strategy included 5 databases, and a grey literature However, the probability for each treatment were close: search. Two independent reviewers performed title and 0.38, 0.33, 0.24 respectively, the NMA estimates were abstract screening, full text eligibility, data extraction, imprecise, and of low quality evidence. Therefore, the and quality assessment, resulting in 20 eligible RCTs, treatment ranks for the 2nd, 3rd, and 4th orders are comparing 16 ventilation modes in 2832 infants with untrustworthy. gestational age 25–32 weeks (Fig. 2). The authors reported baseline characteristics, and assessed RoB using Applicability the Jadad instrument [63]. The authors did not present Just as in conventional systematic reviews and pairwise evidence quality assessment but, as we note in the next meta-analysis, applicability may be limited by differences paragraph, they present enough information to make between the clinicial setting and the setting in which the this judgment. trials were conducted. These limitations may include All included studies were low RoB. The only NMA es- differences in the patients (e.g. the patient may be younger timates for mortality in the entire network in which the than those included in the trials); the intervention (e.g. the CrI did not include HR = 1.0 suggested benefit for time- clinician is considering use of doses differing from those cycled pressure-limited ventilation (TCPL) (HR = 0.29 tested in the trials); comparators (e.g. trials used standard 95%CrI 0.07, 0.97), HFOV (HR = 0.29 95%CrI 0.08, 0.85), care as a comparator, and standard care delivered in the SIMV+VG (HR = 0.12, 95%CrI 0.01, 0.86), and V-C trials differs from standard care in the clinician’s setting); (HR = 0.14 95%CrI 0.02, 0.68) modes compared to and outcomes (e.g. the clinician is concerned about long- SIMV+PSV. Although, the upper CrI of those estimates term effects of treatment and trials examined only shorter are close to no difference, you decide to not rate them term outcomes). In any of these situations, the clinician down for imprecision (refer to the earlier discussion on must consider the extent to which trial results apply to imprecision). The contributing direct comparisons en- their patients and, if such differences exist, potentially rolled similar appropriate patients, the interventions refer to other evidence or their own experience in decid- appeared to be administered optimally, and the authors ing on optimal management. reported no heterogeneity or incoherence. You see little reason why, depending on the direction of results, Implementation authors would choose not to submit, or editors to Returning to the NMA of ventilation modes in preterm publish these studies, and therefore rate publication infants with RDS [9] (P infants with RDS; I and C; all bias as undetected. All these comparisons constitute mechanical ventilation modes; O; mortality), the search high quality evidence. For every other paired comparison in the network, precision is a major concern. In the ranking, SIMV+VG mode had the highest Table 8 Asthma treatments strategies effectiveness NMA in probability of being ranked first, though that probabil- improving symptom free days ity was only 29.7%. The V-C mode had the second 1st Rank 2nd Rank 3rd Rank 4th Rank highest probability of being ranked first, at 22.8%. ICS + LABA 0.95 0.05 0.01 0 Given that there is clear difference between these two modes versus only SIMV+PSV (all other CrI were not ICS low dose 0.02 0.38 0.37 0.24 precise) the only convincing result is that it is wise to ICS high dose 0.01 0.33 0.36 0.29 avoid using SIMV+PSV. You therefore conclude that ICS + LTRA 0.02 0.24 0.26 0.45 use of TCPL, HFOV, SIMV+VG, or V-C – all of Ranks are expressed as probabilities that sums to 1. ICS-L low-dose inhaled which the pediatrician uses regularly - is reasonable corticosteroids, ICS-H medium or high-dose inhaled corticosteroids, LTRAs leukotriene receptor antagonists, LABA, long-acting b-agonists strategies [12] and appropriate. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 10 of 12 Conclusion Received: 30 January 2017 Accepted: 30 April 2018 NMA is a powerful analytic tool that offers many advan- tages over a conventional meta-analysis. NMA may, however, be misleading because of a number of prob- References 1. Guyatt G, Rennie D, Meade MO, Cook DJ. Chapter 22: the process of a lems. First authors may not have not followed the basic systematic review and meta-analysis. In: Users’ guides to the medical standards applicable to any meta-analysis (e.g. compre- literature: a manual for evidence-based clinical practice. 3rd ed. New York: hensive search, duplicate assessment of eligibility, risk of McGraw-Hill Education; 2015. p. 459–69. 2. Stegeman BH, de Bastos M, Rosendaal FR, van Hylckama Vlieg A, bias, and data abstraction). Second, trials may suffer lim- Helmerhorst FM, Stijnen T, Dekkers OM. Different combined oral itations in risk of bias, precision, consistency, and indir- contraceptives and the risk of venous thrombosis: systematic review and ectness. Third, there may be limitations specific to NMA network meta-analysis. BMJ. 2013;347:f5298. including intransitivity, incoherence, or uncritical reli- 3. Sundaresh V, Brito JP, Wang Z, Prokop LJ, Stan MN, Murad MH, Bahn RS. Comparative effectiveness of therapies for graves’ hyperthyroidism: a systematic ance on rankings. Therefore, evaluating the NMA evi- review and network meta-analysis. J Clin Endocrinol Metab. 2013;98(9):3671–7. dence requires serial judgments on the creditability of 4. Menten J, Lesaffre E. A general framework for comparative Bayesian meta- the process of NMA conduct, and evidence quality as- analysis of diagnostic studies. BMC Med Res Methodol. 2015;15:70. 5. Jones LJ, Craven PD, Attia J, Thakkinstian A, Wright I. Network meta-analysis sessment. This introductory guide will assist clinicians in of indomethacin versus ibuprofen versus placebo for PDA in preterm their understanding of NMA. infants. Arch Dis Child Fetal Neonatal Ed. 2011;96(1):F45–52. 6. Chinnadurai S, Fonnesbeck C, Snyder KM, Sathe NA, Morad A, Likis FE, McPheeters ML. Pharmacologic interventions for infantile hemangioma: a Abbreviations meta-analysis. Pediatrics. 2016;137(2):1–10. A/C: Assist-control ventilation; CI: Confidence or credible interval; CrI: Credible 7. Huang J, Wen D, Wang Q, McAlinden C, Flitcroft I, Chen H, Saw SM, Chen H, interval; FEV1: Forced expiratory volume in 1 s; GINA: Global initiative for Bao F, Zhao Y, et al. Efficacy comparison of 16 interventions for myopia asthma; GRADE: Grading recommendations assessment development and control in children: a network meta-analysis. Ophthalmology. 2016;123(4): evaluation; HFOV: High-frequency oscillatory ventilation; HR: Hazard ratio; 697–708. ICS: Inhaled corticosteroids; ICS-H: Medium or high-dose inhaled corticosteroids; 8. Littlewood KJ, Higashi K, Jansen JP, Capkun-Niggli G, Balp MM, Doering G, ICS-L: Low-dose inhaled corticosteroids; IMV: Intermittent mandatory ventilation; Tiddens HA, Angyalosi G. A network meta-analysis of the efficacy of inhaled LABA: Long-acting b-agonists strategies; LTRAs: Leukotriene receptor antibiotics for chronic Pseudomonas infections in cystic fibrosis. J Cyst antagonists; MA: Meta-analysis; MD: Mean difference; NMA: Network meta- Fibros. 2012;11(5):419–26. analysis; OR: Odds ratio; RCT: Randomized controlled trial; RD: Risk difference; 9. Wang C, Guo L, Chi C, Wang X, Guo L, Wang W, Zhao N, Wang Y, Zhang Z, RDS: Respiratory distress syndrome; ROB: Risk of bias; RR: Risk ratio; Li E. Mechanical ventilation modes for respiratory distress syndrome in SIMV: Synchronized intermittent mechanical ventilation; SIPPV: Synchronized infants: a systematic review and network meta-analysis. Critical Care. intermittent positive pressure ventilation; VG: Volume-guarantee ventilation; 2015;19:108. WL: Wait list 10. Zhou X, Hetrick SE, Cuijpers P, Qin B, Barth J, Whittington CJ, Cohen D, Del Giovane C, Liu Y, Michael KD, et al. Comparative efficacy and Availability of data and materials acceptability of psychotherapies for depression in children and Data sharing is not applicable to this article as no datasets were generated adolescents: a systematic review and network meta-analysis. World or analyzed during the current study. Psychiat. 2015;14(2):207–22. 11. Knottnerus BJ, Grigoryan L, Geerlings SE, Moll van Charante EP, Verheij TJ, Kessels AG, ter Riet G. Comparative effectiveness of antibiotics for Authors’ contributions uncomplicated urinary tract infections: network meta-analysis of RA: conceptualized and designed the study, drafted the initial manuscript, randomized trials. Fam Pract. 2012;29(6):659–70. and approved the final manuscript as submitted; IF: conceptualized the 12. Zhao Y, Han S, Shang J, Zhao X, Pu R, Shi L. Effectiveness of drug study, reviewed the, and approved the final manuscript as submitted; GG: treatment strategies to prevent asthma exacerbations and increase conceptualized the study, reviewed the, and approved the final manuscript symptom-free days in asthmatic children: a network meta-analysis. as submitted; LT: conceptualized the study, reviewed the, and approved the J Asthma. 2015;52(8):846–57. final manuscript as submitted. 13. Caldwell DM, Welton NJ, Dias S, Ades AE. Selecting the best scale for measuring treatment effect in a network meta-analysis: a case study in childhood nocturnal enuresis. Res Synthesis Methods. 2012;3(2):126–41. Ethics approval and consent to participate 14. Fang XZ, Gao J, Ge YL, Zhou LJ, Zhang Y. Network meta-analysis on the Not applicable. efficacy of Dexmedetomidine, midazolam, ketamine, Propofol, and fentanyl for the prevention of sevoflurane-related emergence agitation in children. Competing interests Am J Ther. 2015;23:e1032–42. The authors declare that they have no competing interests. 15. Huang X, Xu B. Efficacy and safety of tacrolimus versus Pimecrolimus for the treatment of atopic dermatitis in children: a network meta-analysis. Dermatology. 2015;231(1):41–9. Publisher’sNote 16. Achana FA, Sutton AJ, Kendrick D, Wynn P, Young B, Jones DR, Hubbard SJ, Cooper NJ. The effectiveness of different interventions to promote poison Springer Nature remains neutral with regard to jurisdictional claims in published prevention behaviours in households with children: a network meta- maps and institutional affiliations. analysis. PLoS One. 2015;10(3):e0121122. Author details 17. Hubbard S, Cooper N, Kendrick D, Young B, Wynn PM, He Z, Miller P, Department of Clinical Epidemiology & Biostatistics, McMaster University, Achana F, Sutton A. Network meta-analysis to evaluate the effectiveness of Hamilton, ON, Canada. Department of Pediatrics, Division of Pediatric interventions to prevent falls in children under age 5 years. Inj Prevent. Endocrinology and Metabolism King Saud University, Riyadh, Saudi Arabia. 2015;21(2):98–108. Department of Pediatrics, Universidad de Antioquia, Medellín, Colombia. 18. van der Mark LB, Lyklema PH, Geskus RB, Mohrs J, Bindels PJ, van Aalderen Department of Medicine, McMaster University, Hamilton, Canada. WM, Ter Riet G. A systematic review with attempted network meta-analysis Department of Pediatrics and Anesthesia, McMaster University, Hamilton, of asthma therapy recommended for five to eighteen year olds in GINA ON, Canada. steps three and four. BMC Pulmonary Med. 2012;12:63. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 11 of 12 19. Guo C, Sun X, Wang X, Guo Q, Chen D. Network meta-analysis comparing statement for reporting of systematic reviews incorporating network meta- the efficacy of therapeutic treatments for bronchiolitis in children. JPEN J analyses of health care interventions: checklist and explanations. Ann Intern Parenter Enteral Nutr. 2018;42(1):186–95. Med. 2015;162(11):777–84. 20. Padilha S, Virtuoso S, Tonin FS, Borba HHL, Pontarolo R. Efficacy and safety 40. Jansen JP, Trikalinos T, Cappelleri JC, Daw J, Andes S, Eldessouki R, Salanti G. of drugs for attention deficit hyperactivity disorder in children and Indirect treatment comparison/network meta-analysis study questionnaire adolescents: a network meta-analysis. Eur Child Adol Psychiat. 2018; https:// to assess relevance and credibility to inform health care decision making: doi.org/10.1007/s00787-018-1125-0. an ISPOR-AMCP-NPC good practice task force report. Value Health. 2014; 17(2):157–73. 21. Zeng L, Tian J, Song F, Li W, Jiang L, Gui G, Zhang Y, Ge L, Shi J, Sun X, et al. Corticosteroids for the prevention of bronchopulmonary dysplasia in preterm 41. Cipriani A, Zhou X, Del Giovane C, Hetrick SE, Qin B, Whittington C, Coghill infants: a network meta-analysis. Arch Dis Child Fetal Neonatal Ed. 2018; D, Zhang Y, Hazell P, Leucht S, et al. Comparative efficacy and tolerability of https://doi.org/10.1136/archdischild-2017-313759. antidepressants for major depressive disorder in children and adolescents: a 22. Fu H-D, Qian G-L, Jiang Z-Y. Comparison of second-line immunosuppressants network meta-analysis. Lancet. 2016;388:881–90. for childhood refractory nephrotic syndrome: a systematic review and network 42. Windecker S, Stortecky S, Stefanini GG, da Costa BR, Rutjes AW, Di Nisio M, meta-analysis. J Investig Med. 2017;65(1):65–71. Silletta MG, Maione A, Alfonso F, Clemmensen PM, et al. Revascularisation 23. Gutierrez-Castrellon P, Indrio F, Bolio-Galvis A, Jimenez-Gutierrez C, Jimenez- versus medical treatment in patients with stable coronary artery disease: Escobar I, Lopez-Velazquez G. Efficacy of lactobacillus reuteri DSM 17938 for network meta-analysis. BMJ Clin Res ed. 2014;348:g3859. infantile colic: systematic review with network meta-analysis. Medicine. 43. Lumley T. Network meta-analysis for indirect treatment comparisons. Stat 2017;96(51):e9375. Med. 2002;21(16):2313–24. 24. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux PJ, Prasad K, 44. Friedrich JO, Adhikari NK, Beyene J. Ratio of means for analyzing continuous Neumann I, Carrasco-Labra A, Agoritsas T, Hatala R, et al. How to read a outcomes in meta-analysis performed as well as mean difference methods. systematic review and meta-analysis and apply the results to patient care: J Clin Epidemiol. 2011;64(5):556–64. users’ guides to the medical literature. JAMA. 2014;312(2):171–9. 45. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen 25. Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, TP. Does quality of reports of randomised trials affect estimates of intervention Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment efficacy reported in meta-analyses? Lancet. 1998;352(9128):609–13. comparisons and network meta-analysis for health-care decision making: 46. Chaimani A, Vasiliadis HS, Pandis N, Schmid CH, Welton NJ, Salanti G. Effects report of the ISPOR task force on indirect treatment comparisons good of study precision and risk of bias in networks of interventions: a network research practices: part 1. Value Health. 2011;14(4):417–28. meta-epidemiological study. Int J Epidemiol. 2013;42(4):1120–31. 26. Cipriani A, Higgins JP, Geddes JR, Salanti G. Conceptual and technical 47. Guyatt GH,Oxman AD,KunzR,BrozekJ,Alonso-Coello P, Rind D, challenges in network meta-analysis. Ann Intern Med. 2013;159(2):130–7. Devereaux PJ, Montori VM, Freyschuss B, Vist G, et al. GRADE guidelines 27. Salanti G. Indirect and mixed-treatment comparison, network, or multiple- 6. Rating the quality of evidence–imprecision. J Clin Epidemiol. treatments meta-analysis: many names, many benefits, many concerns for 2011;64(12):1283–93. the next generation evidence synthesis tool. Res Synthesis Methods. 2012; 48. Schunemann HJ, Guyatt GH. Commentary–goodbye M (C) ID! Hello MID, 3(2):80–97. where do you come from? Health Serv Res. 2005;40(2):593–7. 28. Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma 49. Counseling parents before high-risk delivery. In: Lacy GT, Eyal FG, Zenk KE, C, Thompson D, Larholt KM, Diaz M, et al. Conducting indirect-treatment- editors. Neonatology: management, procedures, on-call problems, diseases, comparison and network-meta-analysis studies: report of the ISPOR task and drugs. Edn. Stamford: Appleton & Lange; 1999. p. 223. force on indirect treatment comparisons good research practices: part 2. 50. Ancel PY, Goffinet F, Kuhn P, Langer B, Matis J, Hernandorena X, Value Health. 2011;14(4):429–37. Chabanier P, Joly-Pedespan L, Lecomte B, Vendittelli F, et al. Survival 29. Thabane L, Thomas T, Ye C, Paul J. Posing the research question: not so and morbidity of preterm children born at 22 through 34 weeks’ simple. Can J Anaesth. 2009;56(1):71–9. gestation in France in 2011: results of the EPIPAGE-2 cohort study. 30. Salanti G, Del Giovane C, Chaimani A, Caldwell DM, Higgins JP. Evaluating JAMA Pediatr. 2015;169(3):230–8. the quality of evidence from a network meta-analysis. PLoS One. 2014;9(7): 51. Sun H, Cheng R, Kang W, Xiong H, Zhou C, Zhang Y, Wang X, Zhu C. High- e99682. frequency oscillatory ventilation versus synchronized intermittent mandatory ventilation plus pressure support in preterm infants with severe 31. Puhan MA, Schunemann HJ, Murad MH, Li T, Brignardello-Petersen R, Singh respiratory distress syndrome. Respir Care. 2014;59(2):159–69. JA, Kessels AG, Guyatt GH. A GRADE working group approach for rating the quality of treatment effect estimates from network meta-analysis. BMJ. 2014; 52. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso- 349:g5630. Coello P, Falck-Ytter Y, Jaeschke R, Vist G, et al. GRADE guidelines: 8. Rating 32. Reddel HK, Levy ML. The GINA asthma strategy report: what's new for the quality of evidence–indirectness. J Clin Epidemiol. 2011;64(12):1303–10. primary care? NPJ Prim Care Resp Med. 2015;25:15050. 53. Higgins J. Identifying and addressing inconsistency in network meta- 33. Expert Panel Report 3 (EPR-3). Guidelines for the diagnosis and analysis. In: Cochrane comparing multiple interventions methods group Management of Asthma-Summary Report 2007. J Allergy Clin Immunol. Oxford training event, vol. 2013: Cochrane Collaboration; 2013. 2007;120(5 Suppl):S94–138. 54. Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, Holland 34. Jansen JP, Naci H. Is network meta-analysis as valid as standard pairwise R, Chen Y-F, Glenny A-M, Deeks JJ, et al. Inconsistency between direct and meta-analysis? It all depends on the distribution of effect modifiers. BMC indirect comparisons of competing interventions: meta-epidemiological Med. 2013;11:159. study. BMJ. 2011;343:d4909. 35. Betran AP, Say L, Gulmezoglu AM, Allen T, Hampson L. Effectiveness of 55. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed different databases in identifying studies for systematic reviews: experience treatment comparison meta-analysis. Stat Med. 2010;29(7–8):932–44. from the WHO systematic review of maternal morbidity and mortality. BMC 56. Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence Med Res Methodol. 2005;5(1):6. synthesis for decision making 4: inconsistency in networks of evidence 36. Kwon Y, Powelson SE, Wong H, Ghali WA, Conly JM. An assessment of based on randomized controlled trials. Med Decis Mak. 2013;33(5):641–56. the efficacy of searching in biomedical databases beyond MEDLINE in 57. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann identifying studies for a systematic review on ward closures as an Intern Med. 1992;116(1):78–84. infection control intervention to control outbreaks. Systematic Rev. 58. Sun X, Ioannidis JP, Agoritsas T, Alba AC, Guyatt G. How to use a subgroup 2014;3:135. analysis: users’ guide to the medical literature. JAMA. 2014;311(4):405–11. 37. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic 59. Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, Alonso-Coello reviews. BMJ. Br Med J. 1994;309(6964):1286–91. P, Djulbegovic B,AtkinsD,Falck-Ytter Y, et al.GRADE guidelines:5.Rating 38. Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews the quality of evidence–publication bias. J Clin Epidemiol. 2011;64(12): of Interventions Version 5.1.0 [updated March 2011]. The Cochrane 1277–82. Collaboration, 2011. Available from www.cochrane-handbook.org. 60. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical 39. Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, summaries for presenting results from multiple-treatment meta-analysis: an Ioannidis JP, Straus S, Thorlund K, Jansen JP, et al. The PRISMA extension overview and tutorial. J Clin Epidemiol. 2011;64(2):163–71. Al Khalifah et al. BMC Pediatrics (2018) 18:180 Page 12 of 12 61. Mbuagbaw L, Rochwerg B, Jaeschke R, Heels-Andsell D, Alhazzani W, Thabane L, Guyatt GH. Approaches to interpreting and choosing the best treatments in network meta-analyses. Systematic Rev. 2017;6(1):79. 62. Veroniki AA, Straus SE, Rücker G, Tricco AC. Is providing uncertainty intervals in treatment ranking helpful in a network meta-analysis? J Clin Epidemiol. 2018; https://doi.org/10.1016/j.jclinepi.2018.02.009. 63. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17(1):1–12.

Journal

BMC PediatricsSpringer Journals

Published: May 29, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off