Data Extraction Errors in Meta-analyses That Use Standardized Mean Differences

Abstract

Context: Meta-analysis of trials that have used different continuous or rating scales to record outcomes of a similar nature requires sophisticated data handling and data transformation to a uniform scale, the standardized mean difference (SMD). It is not known how reliable such meta-analyses are.

Objective: To study whether SMDs in meta-analyses are accurate.

Data Sources: Systematic review of meta-analyses published in 2004 that reported a result as an SMD, with no language restrictions. Two trials were randomly selected from each meta-analysis. We attempted to replicate the results in each meta-analysis by independently calculating SMD using Hedges adjusted g.

Data Extraction: Our primary outcome was the proportion of meta-analyses for which our result differed from that of the authors by 0.1 or more, either for the point estimate or for its confidence interval, for at least 1 of the 2 selected trials. We chose 0.1 as the cut point because many commonly used treatments have an effect of 0.1 to 0.5, compared with placebo.

Results: Of the 27 meta-analyses included in this study, we could not replicate the result for at least 1 of the 2 trials within 0.1 in 10 of the meta-analyses (37%), and in 4 cases, the discrepancy was 0.6 or more for the point estimate. Common problems were erroneous number of patients, means, standard deviations, and sign for the effect estimate. In total, 17 meta-analyses (63%) had errors for at least 1 of the 2 trials examined. For the 10 meta-analyses with errors of at least 0.1, we checked the data from all the trials and conducted our own meta-analysis, using the authors' methods. Seven of these 10 meta-analyses were erroneous (70%); 1 was subsequently retracted, and in 2 a significant difference disappeared or appeared.
Conclusions: The high proportion of meta-analyses based on SMDs that show errors indicates that although the statistical process is ostensibly simple, data extraction is particularly liable to errors that can negate or even reverse the findings of the study. This has implications for researchers and implies that all readers, including journal reviewers and policy makers, should approach such meta-analyses with caution.

Results from trials that have measured the same outcome on the same scale, eg, diastolic blood pressure in mm Hg, can readily be combined in a meta-analysis by calculating the weighted mean difference [1]. Sometimes, trials have used outcomes of a similar nature that were measured on different scales, eg, pain on a 5-point ranking scale or on a 100-mm visual analog scale, or depression on a clinician-rated scale such as the Hamilton Rating Scale for Depression [2] or a self-rating scale such as the Beck Depression Inventory [3]. In such cases, it is necessary to standardize the measurements on a uniform scale before they can be pooled in a meta-analysis. This is done by calculating the standardized mean difference (SMD) for each trial, which is the difference in means between the 2 groups, divided by the pooled standard deviation of the measurements [1]. By this transformation, the outcome becomes dimensionless and the scales become uniform. For example, for the same degree of pain, values measured on a 100-mm analog scale would be expected to be 20 times larger than values measured on a 5-point ranking scale, but the standard deviation would also be expected to be 20 times larger.

Although simple in principle, it is not known how reliable this method is in practice. In contrast to a meta-analysis of binary data, which usually involves only the extraction of the number of patients and events from the trial reports, a meta-analysis using SMDs requires much more sophisticated data handling, and there are many pitfalls.
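The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the trial values, and the particular approximations for the small-sample correction and the standard error are assumptions chosen for the example. It shows that rescaling a trial by a factor of 20 leaves the SMD unchanged, and what happens when a standard error is entered where a standard deviation belongs.

```python
import math


def hedges_g(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Hedges adjusted g (experimental minus control) with an approximate 95% CI."""
    df = n_e + n_c - 2
    # Pooled standard deviation of the 2 groups.
    sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / df)
    # Small-sample correction factor (a widely used approximation).
    correction = 1 - 3 / (4 * df - 1)
    g = correction * (mean_e - mean_c) / sd_pooled
    # Large-sample approximation to the standard error of g (an assumption here).
    n_total = n_e + n_c
    se_g = math.sqrt(n_total / (n_e * n_c) + g**2 / (2 * n_total))
    return g, g - 1.96 * se_g, g + 1.96 * se_g


def sd_from_se(se, n):
    """Convert the standard error of a mean back to a standard deviation."""
    return se * math.sqrt(n)


# Hypothetical pain trial on a 5-point scale, 30 patients per group.
g_5pt, low_5pt, high_5pt = hedges_g(2.0, 1.0, 30, 2.5, 1.0, 30)

# The same hypothetical trial on a 100-mm scale: means and SDs are 20 times larger.
g_100mm, _, _ = hedges_g(40.0, 20.0, 30, 50.0, 20.0, 30)

# Pitfall: the standard error (SD / sqrt(n)) is entered as if it were the SD.
se = 1.0 / math.sqrt(30)
g_wrong, _, _ = hedges_g(2.0, se, 30, 2.5, se, 30)

print(f"5-point scale: g = {g_5pt:.2f} (95% CI, {low_5pt:.2f} to {high_5pt:.2f})")
print(f"100-mm scale:  g = {g_100mm:.2f}")
print(f"SE used as SD: g = {g_wrong:.2f}, inflated by a factor of {g_wrong / g_5pt:.2f}")
```

With equal group sizes, mistaking the standard error for the standard deviation inflates the SMD by the square root of the group size, which is why `sd_from_se` must be applied before the data are entered.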
Standard errors may be mistaken for standard deviations, which will inflate the estimates substantially, and standard deviations may need to be calculated or estimated from P values or other data. Some trials may have used changes from baseline instead of values after treatment but may have failed to report data that allow the calculation of within-patient standard deviations. Data extractors also need to know the direction of the scales, which is not always clear in the trial reports. When a high value on one scale means a poor effect, eg, on a depression scale, but a good effect on another scale, eg, a mood scale, it is necessary to change the sign of those values that mean the opposite. Adding to this complexity is that trial authors often give changes from baseline as positive values when they should have been negative, eg, when the average value after treatment is lower than the baseline value, or they say they have used changes from baseline when in reality they have used values after treatment. In 1 case, the review authors used the wrong sign for some of the estimates, which led to an erroneous conclusion of harm and retraction of the review; when corrected and republished, the review concluded that the intervention was beneficial [4].

We studied whether trial SMDs in published meta-analyses are accurate and described the frequency and nature of any data extraction errors and their impact on the meta-analysis result.

Methods

We performed a PubMed search on March 3, 2005, for meta-analyses that had used the SMD and that were published in 2004. We used the search strategy (effect size or standardised mean difference or standardized mean difference or SMD) and (systematic review [title and abstract {tiab}] or meta-analysis [publication type {pt}] or review [pt]). There were no language restrictions. We included meta-analyses with abstracts that reported an SMD or indicated that there was such a result in the article.
The first result in the abstract or in the results section if there was none in the abstract was our index result. We excluded meta-analyses if (1) the index result was clearly not based exclusively on randomized trials; (2) the index result was based on crossover trials; (3) the index result was not based on at least 2 trials; (4) the authors had used Bayesian statistics; (5) the authors had performed an individual patient data meta-analysis; (6) the meta-analysis had been performed by ourselves; or (7) the meta-analysis was not restricted to humans. For each meta-analysis, the intervention that appeared to be the authors' primary interest was labeled the experimental intervention. It was easy to determine from the title, introduction, graphs, statistical advice, or grants which intervention was experimental. The other intervention, whether active or inactive, was defined as control. We noted the SMD and its timing for the index result, interventions, disease, any explicit statements about methods for selection of 1 of several possible outcomes or time points in a trial, statistical methods used for pooling, whether values after treatment or changes from baseline had been used, source of funding, and conflicts of interest. We randomly selected 2 trials from each meta-analysis by using a random numbers table, starting at a new place in the table for every new trial. In one case, the selected trial report could not be retrieved, so we randomly selected another. We extracted outcome data from the trial reports, ensuring that the data extractor on a trial report was different from the one on the corresponding meta-analysis. The trial data extractor was provided with a data sheet with information on the experimental intervention, disease and measurement scale, including any timing if available in the meta-analysis, eg, Hamilton depression score after 6 weeks. 
Furthermore, the data extractor was informed about the trial result, with its 95% confidence interval (CI), and the group sizes, means and standard deviations for the particular trial's outcome if available, the statistical method used for pooling, and whether final values or changes had been used. The reason for the lack of blinding was that we wished to see whether we could replicate the published results. We therefore focused on what the authors of the meta-analysis had done and not on what they could have done instead, eg, selected another, perhaps more appropriate, scale when several had been used for measuring the same outcome. Trial data extractors retrieved the necessary information for calculating the SMD from each trial report, including the direction of the effect in relation to the scale used, and could write comments. Two persons extracted data independently and disagreements (which were mainly caused by simple oversight) were resolved by discussion. We contacted the authors of the meta-analyses for clarification when we could not replicate their data, or when essential data in the trial report for the calculations were missing, ambiguous, or appeared to be erroneous. When the authors had received unpublished data from the trial authors, we used the same unpublished data for our calculations. Our main outcome was the proportion of meta-analyses for which 1 or both of our 2 trial SMDs differed from that of the authors by 0.1 or more, either for the point estimate or for its CI. We chose 0.1 as the cut point because many commonly used treatments have an effect of 0.1 to 0.5 compared with placebo. 
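The main outcome defined above amounts to a simple decision rule, sketched here in Python. The function names and all numbers are hypothetical, and rounding each difference to 2 decimals is an assumption made so that differences of exactly 0.1 between estimates reported to 2 decimals are not misjudged by floating-point arithmetic.

```python
def differs(authors, ours, cut_point=0.1):
    """True if the point estimate or either CI bound differs by cut_point or more.

    Both arguments are (smd, ci_low, ci_high) tuples.
    """
    return any(round(abs(a - o), 2) >= cut_point for a, o in zip(authors, ours))


def meta_analysis_flagged(trial_pairs, cut_point=0.1):
    """True if at least 1 of the selected trials cannot be replicated."""
    return any(differs(a, o, cut_point) for a, o in trial_pairs)


# Hypothetical (authors, ours) results for the 2 selected trials of one meta-analysis.
trial_pairs = [
    ((-0.30, -0.70, 0.10), (-0.27, -0.66, 0.12)),  # replicated within 0.1
    ((0.55, 0.10, 1.00), (0.34, -0.10, 0.78)),     # point estimate off by 0.21
]
print(meta_analysis_flagged(trial_pairs))
```

Because the rule checks the CI bounds as well as the point estimate, a trial can be flagged even when the point estimates agree, for example when a wrong standard deviation or group size changes only the width of the interval.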
As examples of typical effect sizes, the effect of acetaminophen on pain in patients with osteoarthritis is SMD −0.13 (95% CI, −0.22 to −0.04) [5]; the effect of antidepressants on mood in trials with active placebos is SMD 0.17 (95% CI, 0.00 to 0.34) [6]; and the effect of physical and chemical methods to reduce house dust mite allergens on asthma symptoms is SMD −0.01 (95% CI, −0.10 to 0.13) [7], whereas the effect of inhaled corticosteroids on asthma symptoms is relatively large, SMD −0.49 (95% CI, −0.56 to −0.43) [8]. Furthermore, an error of 0.1 can be important when 2 active treatments have been compared, because there is usually little difference between active treatments.

We used Microsoft Excel for our initial calculations of Hedges adjusted g, and Review Manager [9] and Comprehensive Meta Analysis [10] for our final estimates.

Results

We identified 148 potentially eligible reviews. Fifty-five were excluded based on the abstracts, another 61 after reading the full text, and 5 after reading the 2 randomly selected trial reports (Figure 1). The main reasons for exclusion were lack of a reported pooled SMD in the meta-analysis (n = 35) or for the individual trials (n = 16) and that the reviews were clearly not based solely on randomized trials (n = 29). We included 27 reviews [11-37], of which 16 were Cochrane reviews. Two reviews had industry funding, 18 nonindustry funding, 1 had no funding, 5 had no statements about funding, and 1 was unclear. All 16 Cochrane reviews had a conflict of interest statement, which is a standard heading, whereas 9 of the other 11 reviews had no such declaration. The outcome in our index meta-analysis result was a clinical or functional score in 10 reviews, depression in 5, pain in 4, and other in 8. It was unclear whether the calculations were preferentially based on change from baseline or on final values in 15 meta-analyses; in 7, change from baseline was used; in 4, final values; and in 1, both approaches.
In 22 reviews, the statistical method used for meta-analysis was Hedges adjusted g; in 3, Cohen d; and in 2, the method was not stated. Five reviews explicitly reported use of unpublished data in relation to one or both trials we selected.

Accuracy of the Published Data

In 10 of the 27 meta-analyses (37%), we could not replicate the result or its 95% CI within our predefined cut point of 0.1 for at least 1 of the 2 randomly selected trials [38-49] (Figure 2). Seven meta-analyses (26%) had a trial with a discrepancy of 0.2 or more in the point estimate, and 4 (15%) a discrepancy of 0.6 or more, with a maximum of 1.45 [48]. Common errors were that the authors' number of patients, means, standard deviations, and sign for the effect estimate were wrong (after we had taken into account that some authors had reversed the sign for all trials, for convenience, to obtain a positive value for a beneficial effect; Figure 2). We also found errors that led to a discrepancy of less than 0.1 in the SMD, eg, wrong standard deviation [30,50], the use of number of patients and standard deviations at baseline rather than after treatment [27,51], wrong time point [24,52], and double counting of the control group when there were 2 treatment groups [26,53]. In total, we found 17 meta-analyses (63%) with errors for at least 1 of the 2 trials examined.

Other Problems

Multiplicity of Available Data.
The authors of a meta-analysis of osteoporosis had based their calculations on exact P values, although means and standard deviations were available, but we found that the P values in both trials were seriously wrong [36]. We replicated the authors' SMDs from the P values, but when we used means and standard deviations for the same outcome, we found an SMD of 0.34 vs the authors' 0.55 for the first trial [54], and 1.42 vs 0.60 for the second [55]. In the second trial [55], there were 12 different data sets to choose from: intact or hemiplegic side, 2 measurement methods for bone mineral content, and values after treatment or changes, and 4 sets of P values. The SMDs for these 12 possibilities varied between −0.02 and 1.42.

Ten meta-analyses (37%) described methods for selection of 1 of several possible outcomes in a trial. In 4, however, the selected outcome was the most frequently reported one, which suggests that it might have been a post hoc decision rather than having been stated in a review protocol. Two meta-analyses had pooled the reported outcomes for each trial [21,31], but pooling was inappropriate for one trial in which psychometric scales had been pooled with number of visits to the infirmary for psychiatric prison inmates [21,46] (if a person is mentally disturbed, he may score high on a psychometric scale but low on visits to a physician because his problems keep him from making an appointment; in fact, the SMD was 0.67 for 1 of the psychometric scales and −0.70 for 1 of the visit outcomes).

Eight meta-analyses (30%) had statements about the selection of 1 of several possible time points in a trial, but they were often unclear or appeared to have been post hoc decisions.
One meta-analysis stated that “Day three clinical score was most often reported” [32]; another that it had “trial durations of at least 6 weeks and for most 12 or more weeks, which is sufficient time for antidepressant effects to occur” [31]. In a third meta-analysis, the length varied between 2 and 8 weeks and the 2-week data were used because they included all study participants in both trials [14]. A fourth meta-analysis selected “results obtained during the whole circumcision procedure” [16], but in 1 of the trials [43], there were 9 different data sets, corresponding to various time points. In a fifth meta-analysis [19], the authors had used 8-week data for one of the trials but 20-week data for the other when only half of the patients in the experimental group remained, although data were reported for each of the 20 weeks separately. Over these 20 weeks, the SMD varied substantially, between −0.73 and 0.41 (Figure 3) [45].

Adjusted Data. In a meta-analysis of nursing care, the authors had used statistically adjusted data and found an SMD of 0.31, whereas we found an SMD of 0.21, based on unadjusted data [22,56]. Because we could replicate the authors' result with adjusted data, we did not consider this a discrepancy but nevertheless believe that one should use unadjusted data in meta-analyses, since trial authors are more prone to use adjustment when it results in smaller P values than unadjusted analyses [57]. In another meta-analysis, the authors had “adjusted” their data by subtracting baseline values from values after treatment [27,50]. Because of dropouts and missing data, there were more patients at baseline. We calculated SMDs that differed from those the authors reported and believe such corrections should be avoided because the patients at baseline are different from those after treatment.

Non-Gaussian Distributions. The data were often not normally distributed, and in some cases, the deviations from normality were substantial.
In 6 meta-analyses, the standard deviation was larger than half the mean for at least 1 of the 2 trials, although the scale did not allow values below 0. In 3 meta-analyses, the SD even exceeded the mean, and in one case, the average number of sick days was 5.5 while the SD was 25 [26,53]. Calculation of the SMD may be questionable in such cases.

Replication of Full Meta-analyses. For the 10 meta-analyses with important errors in 1 or both of our 2 selected trial results, we checked the data from all the trials and did our own meta-analysis, using the authors' methods. We shared our results with the authors, including those for the individual trials, and asked them whether they could explain the differences. For 7 (70%) of these meta-analyses [11,13,18,21,25,32,35], we could not replicate the authors' pooled result within our cut point of 0.1 in SMD for the point estimate or its CI, and for 5 of them, the discrepancy exceeded 0.2 (Figure 4). Because of our findings, 1 of these 7 meta-analyses was retracted by the editor, who was also an author of the meta-analysis [11]; in another, the authors reported a significant effect we could not reproduce [21]; and in a third, we found a significant effect in contrast to the authors [32].

Comment

We found erroneous SMD estimates of potential clinical relevance for at least 1 of our 2 selected trials in 10 of 27 meta-analyses (37%). When we tried to replicate the 10 full meta-analyses by including all the trials, we found erroneous pooled estimates in 7 of them (70%). Our choice of 0.1 as a cut point for errors can be discussed, but there were also many errors that were larger than 0.2, and several were larger than 0.6. Because it can be difficult for readers to grasp what a certain SMD means, we suggest that authors of meta-analyses use the pooled SMD to calculate back what the effect corresponds to on a commonly used scale, eg, an analog scale for pain or Hamilton scale for depression.
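The back-calculation suggested above can be sketched as follows, assuming a fixed-effect inverse-variance pooling of the trial SMDs (the reviews may have used other pooling methods) and a hypothetical typical standard deviation on the target scale. The function names and all trial values are illustrative assumptions, not data from the study.

```python
import math


def pool_fixed_effect(smds, standard_errors):
    """Inverse-variance (fixed-effect) pooled SMD with an approximate 95% CI."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * d for w, d in zip(weights, smds)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled


def smd_to_scale_units(smd, typical_sd):
    """Express an SMD in the units of a familiar scale, given a typical SD on it."""
    return smd * typical_sd


# Hypothetical SMDs and standard errors for 3 trials of a pain treatment.
smds = [-0.30, -0.10, -0.20]
standard_errors = [0.10, 0.20, 0.20]

pooled, low, high = pool_fixed_effect(smds, standard_errors)

# Hypothetical typical SD of 20 mm on a 100-mm visual analog scale.
effect_mm = smd_to_scale_units(pooled, 20.0)

print(f"Pooled SMD {pooled:.2f} (95% CI, {low:.2f} to {high:.2f})")
print(f"Corresponds to about {effect_mm:.1f} mm on a 100-mm scale")
```

The choice of the typical standard deviation drives the result, so it should come from a representative trial or population that used the target scale and should be reported alongside the converted effect.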
Although the error rates were high, they are very likely underestimates. First, we only checked a single outcome in only 2 randomly selected trials in each meta-analysis. Second, we did not check the full meta-analyses in the majority of cases for which we did not find errors of at least 0.1 in the SMDs in the 2 selected trials. But we could not avoid finding errors even in those meta-analyses. For example, we noted incidentally that in 1 of them [37], there was extreme heterogeneity for some of the trials that we had not selected; in one trial, SMD was −1.38 (95% CI, −2.07 to −0.68), corresponding to a large, significantly beneficial effect, and in another, the SMD was 0.80 (95% CI, 0.02 to 1.57), corresponding to a large, significantly harmful effect, with a distance of 0.70 between the borders of the 2 nonoverlapping CIs. This suggests that 1 of the estimates is highly likely to be wrong. Third, when we checked the full meta-analyses in the remaining cases, we found many additional errors. Of the 40 new trials for this analysis, we found errors in 16 (40%); in 12 of these, the discrepancy in SMD exceeded 0.2, and in 6, it exceeded 0.6. Some errors were extremely large but tended to neutralize each other as they went in both directions, eg, in 1 meta-analysis, the 4 largest discrepancies were 0.47, −1.35, 1.33, and −1.45 [32]; in another, the 3 largest discrepancies were −0.79, 0.64, and 0.65 [35].

It should be noted that the use of SMD in meta-analyses is far more common than our results suggest. We had narrow inclusion criteria and excluded many meta-analyses because they were not based solely on randomized trials, or because there were insufficient data for our analyses (Figure 1). Furthermore, our PubMed search must have missed many meta-analyses because authors quite often do not indicate in their abstract that they have used the SMD.
It is therefore likely that our sample consisted of meta-analyses that were relatively well done, well reported, and therefore well indexed, and that the problems could be more pronounced than we have described. We also note that our search technique may have led to oversampling of Cochrane reviews because the abstracts and methods of these reviews are standardized [1].

Our study was small and needs to be replicated. It is also a limitation that we were primarily interested in detecting and discussing the possible consequences of obvious errors in published meta-analyses. The persons who extracted data from the trial reports were therefore aware of the data that had been used in the corresponding meta-analysis in order to focus on what the authors of the meta-analysis had done and not on what they could have done, sometimes with better justification, as illustrated in our examples.

There are only a few previous studies on the accuracy of continuous and ordinal-scale data in meta-analyses. A statistician with experience in systematic reviews found errors in 20 of 34 published Cochrane reviews in cystic fibrosis and other genetic disorders [58]. This study was not limited to checking continuous data, but for these, some of the same types of errors were reported as those we found. The authors gave no data on the discrepancies but only noted that they did not lead to “substantial changes in any conclusion.” In another study, we tried to replicate a meta-analysis of analgesic effects of placebos [59,60] but found many serious errors, and after correcting for them, we rejected the authors' claim of large effects of placebo [61].

Our study suggests that statistical expertise and considerable attention to detail are required to get SMDs right.
We found examples from which it was necessary to extract information from analysis of variance tables, results of F tests and graphs with a nonzero origin; to combine baseline and follow-up data; and to judge whether results reported as medians and percentiles could be used with reasonable approximations. We also found examples of errors made by the trial authors, eg, an asymmetric CI, which is impossible for an untransformed continuous outcome; grossly erroneous P values; and apparently erroneous unpublished data (Figure 2).

It is usually recommended that 2 observers extract trial data independently and compare their findings [1], and a study based on 30 trials of the effect of melatonin for sleep disorders showed that single-data extraction with verification by a second observer led to more errors than double data extraction [62]. However, in 1 of our meta-analyses [11], the data were very different from those in the trial reports for both included trials, although the review reported having used 2 independent observers (Figure 2). This suggests that this precaution may not have taken place as reported or that the observers may not have checked what was entered in a statistical program and what was published.

Because data handling can so easily go wrong, it was unfortunate that it was rarely clear what the meta-analysts had done. Although we consulted the “Methods” sections and knew which estimates the meta-analysts had arrived at when we tried to replicate them, we often had to do extensive detective work in order to understand where they came from, for there was too little detail in the reviews. Cochrane reviews were the easiest to follow because graphs are always published that show, for each trial, the number of patients, the mean and standard deviation for each group, and the SMD and its CI. Other meta-analyses sometimes gave only the point estimate for the SMD.
The reporting could be improved if authors adhered to the Quality of Reporting of Meta-analyses (QUOROM) guidelines [63] that are currently being updated under the name of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses). We find it essential that meta-analysts report detailed data for each trial and detailed methods on why, how, and which trial data they extracted and whether decision rules on selection of outcomes and time points were prespecified in a review protocol and adhered to.

Although our sample was limited, we found examples that SMDs in the same trial varied between −0.02 and 1.42 for the same type of outcome, and between −0.73 and 0.41 for the same outcome measured at different time points. These variations are extreme compared with the small effects some of our treatments have over placebo and the even smaller differences between most active treatments, and they suggest that the potential for error due to erroneous data handling and bias is far greater in meta-analyses of continuous and ordinal-scale outcomes than in those of binary data.

Further Research

Our study is the first step toward elucidating the reliability of the SMD when used in practice as a uniform outcome measure in meta-analyses. We will explore in another study the observer variability, when the same meta-analysis is performed by independent researchers using the same protocol. There is no tradition among statisticians for letting several people analyze the same set of raw data independently and comparing their results. However, observer variation studies among clinicians have shown that clinicians' diagnostic skills and mutual agreement are generally small and, indeed, much smaller than what they thought themselves before their beliefs were put on trial [64]. It would be interesting to know whether the same applies to statisticians and other methodologists.
We will also explore whether meta-analyses using the weighted mean difference suffer from similar problems as meta-analyses using SMD.

Conclusions

The high prevalence of errors that may potentially negate or even reverse the findings of the included studies implies that all readers, including journal reviewers and policy makers, should approach meta-analyses using SMDs with caution. Editors should be particularly careful when editing SMD meta-analyses.

Article Information

Corresponding Author: Peter C. Gøtzsche, MD, DrMedSci, Nordic Cochrane Centre, Rigshospitalet, Dept 3343, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark (pcg@cochrane.dk).

Author Contributions: Dr Gøtzsche had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Gøtzsche, Hróbjartsson. Acquisition of data: Gøtzsche, Hróbjartsson, Marić, Tendal. Analysis and interpretation of data: Gøtzsche, Hróbjartsson, Tendal. Drafting of the manuscript: Gøtzsche. Critical revision of the manuscript for important intellectual content: Gøtzsche, Hróbjartsson, Marić, Tendal. Statistical analysis: Gøtzsche, Tendal. Administrative, technical, or material support: Gøtzsche, Tendal. Study supervision: Gøtzsche.

Financial Disclosures: None reported.

Funding/Support: The study was not funded.
Additional Contributions: We thank senior statistician Julian Higgins, MRC, Biostatistics Unit, University of Cambridge, England, for comments on the manuscript and the following authors for providing additional information on their meta-analyses: Ruth Barclay-Goddard, MHSc, University of Manitoba; Barbara Brady-Fryer, RN, University of Ottawa, Ottawa, Ontario; Peter den Boer, MD, University Hospital Groningen, Groningen, the Netherlands; Chen Junmin, MD, Australasian Cochrane Centre, Melbourne, Australia; Chris Deery, MD, Edinburgh Dental Institute, Edinburgh, Scotland; Pasquale Frisina, PhD, City University of New York, New York; Peter Gibson, MB, BS, FRACP, John Hunter Hospital, Newcastle, Australia; Peter Griffiths, MD, Florence Nightingale School of Nursing and Midwifery at King's College, London, England; Kåre B. Hagen, MD, Diakonhjemmet Hospital, Oslo, Norway; Lisa Hartling, BScPT, MSc, University of Alberta, Edmonton; Jan Kool, PhD, Klinik Valens Rehabilitationszentrum, Valens, Switzerland; Gert Kwakkel, MD, University Hospital Vrije Universiteit, Amsterdam, the Netherlands; Hugh McGuire, Trials Search Coordinator, Cochrane Depression, Anxiety and Neurosis Group, London, England; Colleen Murphy, International Medical Corps, Santa Monica, California; Dr Edward Nuñes, MD, New York State Psychiatric Institute, New York; Hema Patel, MD, MSc, FRCP, Montreal Children's Hospital, Montreal, Quebec; M. Florent Richy, MSc, University of Liège, Belgium; Natasha Wiebe, MMath, University of Alberta, Edmonton. None of those acknowledged received compensation for their contributions. References 1. Higgins JPT, , Green S, . Cochrane Handbook for Systematic Reviews of Interventions, 4.2.5. http://www.cochrane.org/resources/handbook/hbook.htm. Updated May 2005. Accessed May 31, 2005 2. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56-6214399272Google ScholarCrossref 3. Beck AT, Ward CH, Medelson M, Mock J, Erbaugh J. 
An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561-57113688369Google ScholarCrossref 4. Murray E, Burns J, See TS, Lai R, Nazareth I. Interactive Health Communication Applications for people with chronic disease. Cochrane Database Syst Rev. 2005;(4):CD00427416235356Google Scholar 5. Towheed TE, Maxwell L, Judd MG, Catton M, Hochberg MC, Wells G. Acetaminophen for osteoarthritis. Cochrane Database Syst Rev. 2006;(1):CD00425716437479Google Scholar 6. Moncrieff J, Wessely S, Hardy R. Active placebos versus antidepressants for depression. Cochrane Database Syst Rev. 2004;(1):CD00301214974002Google Scholar 7. Gøtzsche PC, Johansen HK, Schmidt LM, Burr ML. House dust mite control measures for asthma. Cochrane Database Syst Rev. 2004;(4):CD00118715495009Google Scholar 8. Adams NP, Bestall JC, Lasserson TJ, Jones PW, Cates CJ. Fluticasone versus placebo for chronic asthma in adults and children. Cochrane Database Syst Rev. 2005;(4):CD00313516235315Google Scholar 9. Manager R. Review Manager [computer program]. Version 4.2 for Windows. Copenhagen, Denmark: The Nordic Cochrane Centre, The Cochrane Collaboration; 2003 10. Comprehensive Meta Analysis [computer program]. Version 2.2.030; Englewood, NJ: Biostat Inc; July 2006 11. Brosseau L, Welch V, Wells G. et al. Low level laser therapy (Classes I, II and III) for treating osteoarthritis. Cochrane Database Syst Rev. 2004;(3):CD00204615266461Google Scholar 12. Barlow J, Coren E. Parent-training programmes for improving maternal psychosocial health. Cochrane Database Syst Rev. 2004;(1):CD00202014973981Google Scholar 13. Edmonds M, McGuire H, Price J. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2004;(3):CD00320015266475Google Scholar 14. Barclay-Goddard R, Stevenson T, Poluha W, Moffatt ME, Taback SP. Force platform feedback for standing balance training after stroke. Cochrane Database Syst Rev. 2004;(4):CD00412915495079Google Scholar 15. Castro-Rodriguez JA, Rodrigo GJ. 
Beta-agonists through metered-dose inhaler with valved holding chamber versus nebulizer for acute exacerbation of wheezing or asthma in children under 5 years of age: a systematic review with meta-analysis. J Pediatr. 2004;145(2):172-17715289762Google ScholarCrossref 16. Brady-Fryer B, Wiebe N, Lander JA. Pain relief for neonatal circumcision. Cochrane Database Syst Rev. 2004;(4):CD00421715495086Google Scholar 17. Deery C, Heanue M, Deacon S. et al. The effectiveness of manual versus powered toothbrushes for dental health: a systematic review. J Dent. 2004;32(3):197-21115001285Google ScholarCrossref 18. Chen J, Liu C. Methotrexate for ankylosing spondylitis. Cochrane Database Syst Rev. 2004;(3):CD00452415266537Google Scholar 19. den Boer PC, Wiersma D, Van den Bosch RJ. Why is self-help neglected in the treatment of emotional disorders? a meta-analysis. Psychol Med. 2004;34(6):959-97115554567Google ScholarCrossref 20. Ekeland E, Heian F, Hagen KB, Abbott J, Nordheim L. Exercise to improve self-esteem in children and young people. Cochrane Database Syst Rev. 2004;(1):CD00368314974029Google Scholar 21. Frisina PG, Borod JC, Lepore SJ. A meta-analysis of the effects of written emotional disclosure on the health outcomes of clinical populations. J Nerv Ment Dis. 2004;192(9):629-63415348980Google ScholarCrossref 22. Griffiths PD, Edwards MH, Forbes A, Harris RL, Ritchie G. Effectiveness of intermediate care in nursing-led in-patient units. Cochrane Database Syst Rev. 2004;(4):CD00221415495030Google Scholar 23. Gross AR, Hoving JL, Haines TA. et al. Manipulation and mobilisation for mechanical neck disorders. Cochrane Database Syst Rev. 2004;(1):CD00424914974063Google Scholar 24. Hagen KB, Hilde G, Jamtvedt G, Winnem M. Bed rest for acute low-back pain and sciatica. Cochrane Database Syst Rev. 2004;(4):CD00125415495012Google Scholar 25. Hartling L, Wiebe N, Russell K, Patel H, Klassen TP. Epinephrine for bronchiolitis. Cochrane Database Syst Rev. 
2004;(1):CD003123. 26. Kool J, de Bie R, Oesch P, Knusel O, van den Brandt P, Bachmann S. Exercise reduces sick leave in patients with non-acute non-specific low back pain: a meta-analysis. J Rehabil Med. 2004;36(2):49-62. 27. Kwakkel G, van Peppen R, Wagenaar RC, et al. Effects of augmented exercise therapy time after stroke: a meta-analysis. Stroke. 2004;35(11):2529-2539. 28. Latham NK, Bennett DA, Stretton CM, Anderson CS. Systematic review of progressive resistance strength training in older adults. J Gerontol A Biol Sci Med Sci. 2004;59(1):48-61. 29. Merry S, McDowell H, Hetrick S, Bir J, Muller N. Psychological and/or educational interventions for the prevention of depression in children and adolescents. Cochrane Database Syst Rev. 2004;(1):CD003380. 30. Murphy C, Hahn S, Volmink J. Reduced osmolarity oral rehydration solution for treating cholera. Cochrane Database Syst Rev. 2004;(4):CD003754. 31. Nunes EV, Levin FR. Treatment of depression in patients with alcohol or other drug dependence: a meta-analysis. JAMA. 2004;291(15):1887-1896. 32. Patel H, Platt R, Lozano JM, Wang EE. Glucocorticoids for acute viral bronchiolitis in infants and young children. Cochrane Database Syst Rev. 2004;(3):CD004878. 33. Perrott DA, Piira T, Goodenough B, Champion GD. Efficacy and safety of acetaminophen vs ibuprofen for treating children's pain or fever: a meta-analysis. Arch Pediatr Adolesc Med. 2004;158(6):521-526. 34. Powell H, Gibson PG. High dose versus low dose inhaled corticosteroid as initial starting dose for asthma in adults and children. Cochrane Database Syst Rev. 2004;(2):CD004109. 35. Ramakrishnan U, Aburto N, McCabe G, Martorell R.
Multimicronutrient interventions but not vitamin A or iron interventions alone improve child growth: results of 3 meta-analyses. J Nutr. 2004;134(10):2592-2602. 36. Richy F, Ethgen O, Bruyere O, Reginster JY. Efficacy of alphacalcidol and calcitriol in primary and corticosteroid-induced osteoporosis: a meta-analysis of their effects on bone mineral density and fracture rate. Osteoporos Int. 2004;15(4):301-310. 37. Tuunainen A, Kripke DF, Endo T. Light therapy for non-seasonal depression. Cochrane Database Syst Rev. 2004;(2):CD004050. 38. Stelian J, Gil I, Habot B, et al. Improvement of pain and disability in elderly patients with degenerative osteoarthritis of the knee treated with narrow-band light therapy. J Am Geriatr Soc. 1992;40(1):23-26. 39. Bülow PM, Jensen H, Danneskiold-Samsoe B. Low power Ga-Al-As laser treatment of painful osteoarthritis of the knee. A double-blind placebo-controlled study. Scand J Rehabil Med. 1994;26(3):155-159. 40. Wearden AJ, Morriss RK, Mullis R, et al. Randomised, double-blind, placebo-controlled treatment trial of fluoxetine and graded exercise for chronic fatigue syndrome. Br J Psychiatry. 1998;172:485-490. 41. Powell P, Bentall RP, Nye FJ, Edwards RH. Randomised controlled trial of patient education to encourage graded exercise in chronic fatigue syndrome. BMJ. 2001;322(7283):387-390. 42. Sackley CM, Lincoln NB. Single blind randomized controlled trial of visual feedback after stroke: effects on stance symmetry and function. Disabil Rehabil. 1997;19(12):536-546. 43. Lander J, Brady-Fryer B, Metcalfe JB, Nazarali S, Muttitt S. Comparison of ring block, dorsal penile nerve block, and topical anesthesia for neonatal circumcision: a randomized controlled trial. JAMA. 1997;278(24):2157-2162. 44.
Roychowdhury B, Bintley-Bagot S, Bulgen DY, Thompson RN, Tunn EJ, Moots RJ. Is methotrexate effective in ankylosing spondylitis? Rheumatology (Oxford). 2002;41(11):1330-1332. 45. Rosner R, Beutler LE, Daldrup R. Depressionsverläufe in unterschiedlichen Psychotherapieformen: modellierung durch Hierarchische lineare Modelle (HLM) [Course of depression in different psychotherapies: an application of hierarchical linear models]. Zeitschrift für Klinische Psychologie. 1999;28(2):112-120. 46. Richards JM, Beal WE, Seagal JD, Pennebaker JW. Effects of disclosure of traumatic events on illness behavior among psychiatric prison inmates. J Abnorm Psychol. 2000;109(1):156-160. 47. Abul-Ainine A, Luyt D. Short term effects of adrenaline in bronchiolitis: a randomised controlled trial. Arch Dis Child. 2002;86(4):276-279. 48. Goebel J, Estrada B, Quinonez J, Nagji N, Sanford D, Boerth RC. Prednisolone plus albuterol versus albuterol alone in mild to moderate bronchiolitis. Clin Pediatr (Phila). 2000;39(4):213-220. 49. Lie C, Ying C, Wang EL, Brun T, Geissler C. Impact of large-dose vitamin A supplementation on childhood diarrhoea, respiratory disease and growth. Eur J Clin Nutr. 1993;47(2):88-96. 50. Alam NH, Majumder RN, Fuchs GJ; CHOICE study group. Efficacy and safety of oral rehydration solution with reduced osmolarity in adults with cholera: a randomised double-blind clinical trial. Lancet. 1999;354(9175):296-299. 51. Partridge C, Mackenzie M, Edwards S, et al. Is dosage of physiotherapy a critical factor in deciding patterns of recovery from stroke: a pragmatic randomized controlled trial. Physiother Res Int. 2000;5(4):230-240. 52. Rozenberg S, Delval C, Rezvani Y, et al.
Bed rest or normal activity for patients with acute low back pain: a randomized controlled trial. Spine. 2002;27(14):1487-1493. 53. Härkäpää K, Mellin G, Järvikoski A, Hurri H. A controlled study on the outcome of inpatient and outpatient treatment of low back pain. Part III. Long-term follow-up of pain, disability, and compliance. Scand J Rehabil Med. 1990;22(4):181-188. 54. Orimo H, Shiraki M, Hayashi Y, et al. Effects of 1 alpha-hydroxyvitamin D3 on lumbar bone mineral density and vertebral fractures in patients with postmenopausal osteoporosis. Calcif Tissue Int. 1994;54(5):370-376. 55. Sato Y, Maruoka H, Oizumi K. Amelioration of hemiplegia-associated osteopenia more than 4 years after stroke by 1 alpha-hydroxyvitamin D3 and calcium supplementation. Stroke. 1997;28(4):736-739. 56. Steiner A, Walsh B, Pickering RM, Wiles R, Ward J, Brooking JI. Therapeutic nursing or unblocking beds? A randomised controlled trial of a post-acute intermediate care unit. BMJ. 2001;322(7284):453-460. 57. Gøtzsche PC. Believability of relative risks and odds ratios in abstracts: cross-sectional study. BMJ. 2006;333(7561):231-234. 58. Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. J Clin Epidemiol. 2005;58(7):741-742. 59. Vase L, Riley JL III, Price DD. A comparison of placebo effects in clinical analgesic trials versus studies of placebo analgesia. Pain. 2002;99(3):443-452. 60. Price DD, Riley JL III, Vase L. Reliable differences in placebo effects between clinical analgesic trials and studies of placebo analgesia mechanisms [author reply]. Pain. 2003;104:715-716. 61. Hróbjartsson A, Gøtzsche PC.
Unsubstantiated claims of large effects of placebo on pain: serious errors in meta-analysis of placebo analgesia mechanism studies. J Clin Epidemiol. 2006;59(4):336-338. 62. Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. J Clin Epidemiol. 2006;59(7):697-703. 63. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999;354(9193):1896-1900. 64. Wulff HR, Gøtzsche PC. Rational Diagnosis and Treatment: Evidence-Based Clinical Decision-Making. 3rd ed. Oxford, England: Blackwell Scientific; 2000.

JAMA, American Medical Association

Data Extraction Errors in Meta-analyses That Use Standardized Mean Differences

Publisher
American Medical Association
Copyright
Copyright © 2007 American Medical Association. All Rights Reserved.
ISSN
0098-7484
eISSN
1538-3598
DOI
10.1001/jama.298.4.430

Abstract

Context: Meta-analysis of trials that have used different continuous or rating scales to record outcomes of a similar nature requires sophisticated data handling and data transformation to a uniform scale, the standardized mean difference (SMD). It is not known how reliable such meta-analyses are.

Objective: To study whether SMDs in meta-analyses are accurate.

Data Sources: Systematic review of meta-analyses published in 2004 that reported a result as an SMD, with no language restrictions. Two trials were randomly selected from each meta-analysis. We attempted to replicate the results in each meta-analysis by independently calculating SMD using Hedges adjusted g.

Data Extraction: Our primary outcome was the proportion of meta-analyses for which our result differed from that of the authors by 0.1 or more, either for the point estimate or for its confidence interval, for at least 1 of the 2 selected trials. We chose 0.1 as the cut point because many commonly used treatments have an effect of 0.1 to 0.5, compared with placebo.

Results: Of the 27 meta-analyses included in this study, we could not replicate the result for at least 1 of the 2 trials within 0.1 in 10 of the meta-analyses (37%), and in 4 cases, the discrepancy was 0.6 or more for the point estimate. Common problems were erroneous number of patients, means, standard deviations, and sign for the effect estimate. In total, 17 meta-analyses (63%) had errors for at least 1 of the 2 trials examined. For the 10 meta-analyses with errors of at least 0.1, we checked the data from all the trials and conducted our own meta-analysis, using the authors' methods. Seven of these 10 meta-analyses were erroneous (70%); 1 was subsequently retracted, and in 2 a significant difference disappeared or appeared.
Conclusions: The high proportion of meta-analyses based on SMDs that show errors indicates that although the statistical process is ostensibly simple, data extraction is particularly liable to errors that can negate or even reverse the findings of the study. This has implications for researchers and implies that all readers, including journal reviewers and policy makers, should approach such meta-analyses with caution.

Results from trials that have measured the same outcome on the same scale, eg, diastolic blood pressure in mm Hg, can readily be combined in a meta-analysis by calculating the weighted mean difference.[1] Sometimes, trials have used outcomes of a similar nature but that were measured on different scales, eg, pain on a 5-point ranking scale or on a 100-mm visual analog scale, or depression on a clinician-rated scale such as the Hamilton Rating Scale for Depression[2] or a self-rating scale such as the Beck Depression Inventory.[3] In such cases, it is necessary to standardize the measurements on a uniform scale before they can be pooled in a meta-analysis. This is done by calculating the standardized mean difference (SMD) for each trial, which is the difference in means between the 2 groups, divided by the pooled standard deviation of the measurements.[1] By this transformation, the outcome becomes dimensionless and the scales become uniform, eg, for the same degree of pain, values measured on a 100-mm analog scale would be expected to be 20 times larger than values measured on a 5-point ranking scale, but the standard deviation would also be expected to be 20 times larger.

Although simple in principle, it is not known how reliable this method is in practice. In contrast to a meta-analysis of binary data, which usually involves only the extraction of the number of patients and events from the trial reports, a meta-analysis using SMDs requires much more sophisticated data handling, and there are many pitfalls.
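As an illustration of the calculation just described, the following is a minimal sketch of Hedges adjusted g for a single trial, using the small-sample correction and standard error as they are commonly given in the Review Manager statistical documentation. The function name `hedges_g` and the trial values in the usage lines are hypothetical and are not taken from any of the reviewed meta-analyses.

```python
import math


def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl, z=1.96):
    """Return (g, ci_low, ci_high) for one two-group trial.

    g is the difference in means divided by the pooled standard
    deviation, multiplied by a small-sample correction factor.
    """
    n_total = n_exp + n_ctl
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_total - 2)
    )
    # Small-sample correction (the "adjusted" part of Hedges adjusted g).
    correction = 1 - 3 / (4 * n_total - 9)
    g = (mean_exp - mean_ctl) / pooled_sd * correction
    # Approximate standard error of g, then a normal-theory interval.
    se = math.sqrt(n_total / (n_exp * n_ctl) + g ** 2 / (2 * (n_total - 3.94)))
    return g, g - z * se, g + z * se


if __name__ == "__main__":
    # Hypothetical trial: experimental group first, control group second.
    g, low, high = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
    print(f"g = {g:.2f} (95% CI, {low:.2f} to {high:.2f})")
    # Entering the groups in the opposite order reverses the sign.
    g_swapped, _, _ = hedges_g(8.0, 2.0, 20, 10.0, 2.0, 20)
    print(f"groups swapped: g = {g_swapped:.2f}")
```

Every input to the function (2 means, 2 standard deviations, 2 group sizes, and the order of the groups) has to be extracted correctly from the trial report, so an error in any one of them changes the estimate.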
Standard errors may be mistaken for standard deviations, which will inflate the estimates substantially, and standard deviations may need to be calculated or estimated from P values or other data. Some trials may have used changes from baseline instead of values after treatment but may have failed to report data that allow the calculation of within-patient standard deviations. Data extractors also need to know the direction of the scales, which is not always clear in the trial reports. When a high value on one scale means a poor effect, eg, on a depression scale, but a good effect on another scale, eg, a mood scale, it is necessary to change the sign of those values that mean the opposite. Adding to this complexity is that trial authors often give changes from baseline as positive values when they should have been negative, eg, when the average value after treatment is lower than the baseline value, or they say they have used changes from baseline when in reality they have used values after treatment. In 1 case, the review authors used the wrong sign for some of the estimates, which led to an erroneous conclusion of harm and to retraction of the review; when corrected and republished, the review concluded that the intervention was beneficial.[4]

We studied whether trial SMDs in published meta-analyses are accurate and described the frequency and nature of any data extraction errors and their impact on the meta-analysis result.

Methods

We performed a PubMed search on March 3, 2005, for meta-analyses that had used the SMD and that were published in 2004. We used the search strategy (effect size or standardised mean difference or standardized mean difference or SMD) and (systematic review [title and abstract {tiab}] or meta-analysis [publication type {pt}] or review [pt]). There were no language restrictions. We included meta-analyses with abstracts that reported an SMD or indicated that there was such a result in the article.
The first result in the abstract, or in the results section if there was none in the abstract, was our index result. We excluded meta-analyses if (1) the index result was clearly not based exclusively on randomized trials; (2) the index result was based on crossover trials; (3) the index result was not based on at least 2 trials; (4) the authors had used Bayesian statistics; (5) the authors had performed an individual patient data meta-analysis; (6) the meta-analysis had been performed by ourselves; or (7) the meta-analysis was not restricted to humans.

For each meta-analysis, the intervention that appeared to be the authors' primary interest was labeled the experimental intervention. It was easy to determine from the title, introduction, graphs, statistical advice, or grants which intervention was experimental. The other intervention, whether active or inactive, was defined as control. We noted the SMD and its timing for the index result, interventions, disease, any explicit statements about methods for selection of 1 of several possible outcomes or time points in a trial, statistical methods used for pooling, whether values after treatment or changes from baseline had been used, source of funding, and conflicts of interest.

We randomly selected 2 trials from each meta-analysis by using a random numbers table, starting at a new place in the table for every new trial. In one case, the selected trial report could not be retrieved, so we randomly selected another.

We extracted outcome data from the trial reports, ensuring that the data extractor on a trial report was different from the one on the corresponding meta-analysis. The trial data extractor was provided with a data sheet with information on the experimental intervention, disease, and measurement scale, including any timing if available in the meta-analysis, eg, Hamilton depression score after 6 weeks.
Furthermore, the data extractor was informed about the trial result, with its 95% confidence interval (CI), and the group sizes, means, and standard deviations for the particular trial's outcome if available, the statistical method used for pooling, and whether final values or changes had been used. The reason for the lack of blinding was that we wished to see whether we could replicate the published results. We therefore focused on what the authors of the meta-analysis had done and not on what they could have done instead, eg, selected another, perhaps more appropriate, scale when several had been used for measuring the same outcome.

Trial data extractors retrieved the necessary information for calculating the SMD from each trial report, including the direction of the effect in relation to the scale used, and could write comments. Two persons extracted data independently and disagreements (which were mainly caused by simple oversight) were resolved by discussion. We contacted the authors of the meta-analyses for clarification when we could not replicate their data, or when essential data in the trial report for the calculations were missing, ambiguous, or appeared to be erroneous. When the authors had received unpublished data from the trial authors, we used the same unpublished data for our calculations.

Our main outcome was the proportion of meta-analyses for which 1 or both of our 2 trial SMDs differed from that of the authors by 0.1 or more, either for the point estimate or for its CI. We chose 0.1 as the cut point because many commonly used treatments have an effect of 0.1 to 0.5 compared with placebo.
For example, the effect of acetaminophen on pain in patients with osteoarthritis is SMD −0.13 (95% CI, −0.22 to −0.04),[5] the effect of antidepressants on mood in trials with active placebos is SMD 0.17 (95% CI, 0.00-0.34),[6] the effect of physical and chemical methods to reduce house dust mite allergens on asthma symptoms is SMD −0.01 (95% CI, −0.10 to 0.13),[7] whereas the effect of inhaled corticosteroids on asthma symptoms is relatively large, SMD −0.49 (95% CI, −0.56 to −0.43).[8] Furthermore, an error of 0.1 can be important when 2 active treatments have been compared, for there is usually little difference between active treatments.

We used Microsoft Excel for our initial calculations of Hedges adjusted g, and Review Manager[9] and Comprehensive Meta Analysis[10] for our final estimates.

Results

We identified 148 potentially eligible reviews. Fifty-five were excluded based on the abstracts, another 61 after reading the full text, and 5 after reading the 2 randomly selected trial reports (Figure 1). The main reasons for exclusion were lack of a reported pooled SMD in the meta-analysis (n = 35) or for the individual trials (n = 16) and that the reviews were clearly not based solely on randomized trials (n = 29).

We included 27 reviews,[11-37] of which 16 were Cochrane reviews. Two reviews had industry funding, 18 nonindustry funding, 1 had no funding, 5 had no statements about funding, and 1 was unclear. All 16 Cochrane reviews had a conflict of interest statement, which is a standard heading, whereas 9 of the other 11 reviews had no such declaration. The outcome in our index meta-analysis result was a clinical or functional score in 10 reviews, depression in 5, pain in 4, and other in 8. It was unclear whether the calculations were preferentially based on change from baseline or on final values in 15 meta-analyses; in 7, change from baseline was used; in 4, final values; and in 1, both approaches.
In 22 reviews, the statistical method used for meta-analysis was Hedges adjusted g; in 3, Cohen d; and in 2, the method was not stated. Five reviews explicitly reported use of unpublished data in relation to one or both trials we selected.

Accuracy of the Published Data

In 10 of the 27 meta-analyses (37%), we could not replicate the result or its 95% CI within our predefined cut point of 0.1 for at least 1 of the 2 randomly selected trials[38-49] (Figure 2). Seven meta-analyses (26%) had a trial with a discrepancy of 0.2 or more in the point estimate, and 4 (15%) a discrepancy of 0.6 or more, with a maximum of 1.45.[48] Common errors were that the authors' number of patients, means, standard deviations, and sign for the effect estimate were wrong (after we had taken into account that some authors had reversed the sign for all trials, for convenience, to obtain a positive value for a beneficial effect; Figure 2).

We also found errors that led to a discrepancy of less than 0.1 in the SMD, eg, wrong standard deviation,[30,50] the use of number of patients and standard deviations at baseline rather than after treatment,[27,51] wrong time point,[24,52] and double counting of the control group when there were 2 treatment groups.[26,53] In total, we found 17 meta-analyses (63%) with errors for at least 1 of the 2 trials examined.

Other Problems

Multiplicity of Available Data.
The authors of a meta-analysis of osteoporosis had based their calculations on exact P values, although means and standard deviations were available, but we found that the P values in both trials were seriously wrong.[36] We replicated the authors' SMDs from the P values, but when we used means and standard deviations for the same outcome, we found an SMD of 0.34 vs the authors' 0.55 for the first trial,[54] and 1.42 vs 0.60 for the second.[55] In the second trial,[55] there were 12 different data sets to choose from: intact or hemiplegic side, 2 measurement methods for bone mineral content, and values after treatment or changes (8 combinations), plus 4 sets of P values. The SMDs for these 12 possibilities varied between −0.02 and 1.42.

Ten meta-analyses (37%) described methods for selection of 1 of several possible outcomes in a trial. In 4, however, the selected outcome was the most frequently reported one, which suggests that it might have been a post hoc decision rather than having been stated in a review protocol. Two meta-analyses had pooled the reported outcomes for each trial,[21,31] but pooling was inappropriate for one trial in which psychometric scales had been pooled with number of visits to the infirmary for psychiatric prison inmates[21,46] (if a person is mentally disturbed, he may score high on a psychometric scale but low on visits to a physician because his problems keep him from making an appointment; in fact, the SMD was 0.67 for 1 of the psychometric scales and −0.70 for 1 of the visit outcomes).

Eight meta-analyses (30%) had statements about the selection of 1 of several possible time points in a trial, but they were often unclear or appeared to have been post hoc decisions.
One meta-analysis stated that “Day three clinical score was most often reported,”[32] another that it had “trial durations of at least 6 weeks and for most 12 or more weeks, which is sufficient time for antidepressant effects to occur.”[31] In a third meta-analysis, the length varied between 2 and 8 weeks and the 2-week data were used because they included all study participants in both trials.[14] A fourth meta-analysis selected “results obtained during the whole circumcision procedure,”[16] but in 1 of the trials,[43] there were 9 different data sets, corresponding to various time points. In a fifth meta-analysis,[19] the authors had used 8-week data for one of the trials but 20-week data for the other, when only half of the patients in the experimental group remained, although data were reported for each of the 20 weeks separately. Over these 20 weeks, the SMD varied substantially, between −0.73 and 0.41 (Figure 3).[45]

Adjusted Data.
In a meta-analysis of nursing care, the authors had used statistically adjusted data and found an SMD of 0.31, whereas we found an SMD of 0.21, based on unadjusted data.[22,56] Because we could replicate the authors' result with adjusted data, we did not consider this a discrepancy but nevertheless believe that one should use unadjusted data in meta-analyses, since trial authors are more prone to use adjustment when it results in smaller P values than unadjusted analyses.[57] In another meta-analysis, the authors had “adjusted” their data by subtracting baseline values from values after treatment.[27,50] Because of dropouts and missing data, there were more patients at baseline. We calculated SMDs that differed from those the authors reported and believe such corrections should be avoided because the patients at baseline are different from those after treatment.

Non-Gaussian Distributions.
The data were often not normally distributed, and in some cases, the deviations from normality were substantial.
In 6 meta-analyses, the standard deviation was larger than half the mean for at least 1 of the 2 trials, although the scale did not allow values below 0. In 3 meta-analyses, the SD even exceeded the mean, and in one case, the average number of sick days was 5.5 while the SD was 25.[26,53] Calculation of the SMD may be questionable in such cases.

Replication of Full Meta-analyses.
For the 10 meta-analyses with important errors in 1 or both of our 2 selected trial results, we checked the data from all the trials and did our own meta-analysis, using the authors' methods. We shared our results with the authors, including those for the individual trials, and asked them whether they could explain the differences. For 7 (70%) of these meta-analyses,[11,13,18,21,25,32,35] we could not replicate the authors' pooled result within our cut point of 0.1 in SMD for the point estimate or its CI, and for 5 of them, the discrepancy exceeded 0.2 (Figure 4). Because of our findings, 1 of these 7 meta-analyses was retracted by the editor, who was also an author of the meta-analysis;[11] in another, the authors reported a significant effect we could not reproduce;[21] and in a third, we found a significant effect in contrast to the authors.[32]

Comment

We found erroneous SMD estimates of potential clinical relevance for at least 1 of our 2 selected trials in 10 of 27 meta-analyses (37%). When we tried to replicate the 10 full meta-analyses by including all the trials, we found erroneous pooled estimates in 7 of them (70%). Our choice of 0.1 as a cut point for errors can be discussed, but there were also many errors that were larger than 0.2, and several were larger than 0.6. Because it can be difficult for readers to grasp what a certain SMD means, we suggest that authors of meta-analyses use the pooled SMD to calculate back what the effect corresponds to on a commonly used scale, eg, an analog scale for pain or Hamilton scale for depression.
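The back-calculation suggested here amounts to multiplying the SMD by a standard deviation that is typical for the familiar scale. The sketch below applies this to the acetaminophen estimate quoted in the Methods (SMD −0.13; 95% CI, −0.22 to −0.04). The standard deviation of 25 mm on a 100-mm visual analog scale is a hypothetical value chosen only for illustration, and the function name `smd_to_scale_units` is likewise an assumption.

```python
def smd_to_scale_units(smd, ci_low, ci_high, typical_sd):
    """Re-express an SMD and its CI limits in the units of a familiar scale.

    typical_sd is the standard deviation assumed to be representative of
    that scale in the patient population of interest.
    """
    return smd * typical_sd, ci_low * typical_sd, ci_high * typical_sd


if __name__ == "__main__":
    # SMD and CI from the text; the SD of 25 mm is a hypothetical choice.
    effect, low, high = smd_to_scale_units(-0.13, -0.22, -0.04, typical_sd=25.0)
    print(f"{effect:.2f} mm (95% CI, {low:.2f} to {high:.2f} mm)")
```

The result depends directly on the standard deviation chosen, so the source of that value would need to be reported alongside the back-calculated effect.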
Although the error rates were high, they are very likely underestimates. First, we only checked a single outcome in only 2 randomly selected trials in each meta-analysis. Second, we did not check the full meta-analyses in the majority of cases for which we did not find errors of at least 0.1 in the SMDs in the 2 selected trials. But we could not avoid finding errors even in those meta-analyses. For example, we noted incidentally that in 1 of them,[37] there was extreme heterogeneity for some of the trials that we had not selected; in one trial, SMD was −1.38 (95% CI, −2.07 to −0.68), corresponding to a large, significantly beneficial effect, and in another, the SMD was 0.80 (95% CI, 0.02-1.57), corresponding to a large, significantly harmful effect, with a distance of 0.70 between the borders of the 2 nonoverlapping CIs. This suggests that 1 of the estimates is highly likely to be wrong. Third, when we checked the full meta-analyses in the remaining cases, we found many additional errors. Of the 40 new trials for this analysis, we found errors in 16 (40%); in 12 of these, the discrepancy in SMD exceeded 0.2, and in 6, it exceeded 0.6. Some errors were extremely large but tended to neutralize each other as they went in both directions, eg, in 1 meta-analysis, the 4 largest discrepancies were 0.47, −1.35, 1.33, and −1.45;[32] in another, the 3 largest discrepancies were −0.79, 0.64, and 0.65.[35]

It should be noted that the use of SMD in meta-analyses is far more common than our results suggest. We had narrow inclusion criteria and excluded many meta-analyses because they were not based solely on randomized trials, or because there were insufficient data for our analyses (Figure 1). Furthermore, our PubMed search must have missed many meta-analyses because authors quite often do not indicate in their abstract that they have used the SMD.
It is therefore likely that our sample consisted of meta-analyses that were relatively well done, well reported, and therefore well indexed, and that the problems could be more pronounced than we have described. We also note that our search technique may have led to oversampling of Cochrane reviews because the abstracts and methods of these reviews are standardized.[1]

Our study was small and needs to be replicated. It is also a limitation that we were primarily interested in detecting and discussing the possible consequences of obvious errors in published meta-analyses. The persons who extracted data from the trial reports were therefore aware of the data that had been used in the corresponding meta-analysis in order to focus on what the authors of the meta-analysis had done and not on what they could have done, sometimes with better justification, as illustrated in our examples.

There are only a few previous studies on the accuracy of continuous and ordinal-scale data in meta-analyses. A statistician with experience in systematic reviews found errors in 20 of 34 published Cochrane reviews in cystic fibrosis and other genetic disorders.[58] This study was not limited to checking continuous data, but for these, some of the same types of errors were reported as those we found. The authors gave no data on the discrepancies but only noted that they did not lead to “substantial changes in any conclusion.” In another study, we tried to replicate a meta-analysis of analgesic effects of placebos[59,60] but found many serious errors, and after correcting for them, we rejected the authors' claim of large effects of placebo.[61]

Our study suggests that statistical expertise and considerable attention to detail are required to get SMDs right.
We found examples in which it was necessary to extract information from analysis of variance tables, results of F tests, and graphs with a nonzero origin; to combine baseline and follow-up data; and to judge whether results reported as medians and percentiles could be used with reasonable approximations. We also found examples of errors made by the trial authors, eg, an asymmetric CI, which is impossible for an untransformed continuous outcome; grossly erroneous P values; and apparently erroneous unpublished data (Figure 2).

It is usually recommended that 2 observers extract trial data independently and compare their findings,[1] and a study based on 30 trials of the effect of melatonin for sleep disorders showed that single-data extraction with verification by a second observer led to more errors than double data extraction.[62] However, in 1 of our meta-analyses,[11] the data were very different from those in the trial reports for both included trials, although the review reported having used 2 independent observers (Figure 2). This suggests that this precaution may not have taken place as reported or that the observers may not have checked what was entered in a statistical program and what was published.

Because data handling can so easily go wrong, it was unfortunate that it was rarely clear what the meta-analysts had done. Although we consulted the “Methods” sections and knew which estimates the meta-analysts had arrived at when we tried to replicate them, we often had to do extensive detective work in order to understand where they came from, for there was too little detail in the reviews. Cochrane reviews were the easiest to follow because graphs are always published that show, for each trial, the number of patients, the mean and standard deviation for each group, and the SMD and its CI. Other meta-analyses sometimes gave only the point estimate for the SMD.
The reporting could be improved if authors adhered to the Quality of Reporting of Meta-analyses (QUOROM) guidelines,63 which are currently being updated under the name of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses). We find it essential that meta-analysts report detailed data for each trial and detailed methods on why, how, and which trial data they extracted and whether decision rules on selection of outcomes and time points were prespecified in a review protocol and adhered to. Although our sample was limited, we found examples in which SMDs in the same trial varied between −0.02 and 1.42 for the same type of outcome, and between −0.73 and 0.41 for the same outcome measured at different time points. These variations are extreme compared with the small effects some of our treatments have over placebo and the even smaller differences between most active treatments, and they suggest that the potential for error due to erroneous data handling and bias is far greater in meta-analyses of continuous and ordinal-scale outcomes than in those of binary data.

Further Research

Our study is the first step toward elucidating the reliability of the SMD when used in practice as a uniform outcome measure in meta-analyses. In another study, we will explore observer variability when the same meta-analysis is performed by independent researchers using the same protocol. There is no tradition among statisticians for letting several people analyze the same set of raw data independently and comparing their results. However, observer variation studies among clinicians have shown that clinicians' diagnostic skills and mutual agreement are generally small and, indeed, much smaller than what they thought themselves before their beliefs were put on trial.64 It would be interesting to know whether the same applies to statisticians and other methodologists.
We will also explore whether meta-analyses using the weighted mean difference suffer from similar problems as meta-analyses using SMD.

Conclusions

The high prevalence of errors that may potentially negate or even reverse the findings of the included studies implies that all readers, including journal reviewers and policy makers, should approach meta-analyses using SMDs with caution. Editors should be particularly careful when editing SMD meta-analyses.

Article Information

Corresponding Author: Peter C. Gøtzsche, MD, DrMedSci, Nordic Cochrane Centre, Rigshospitalet, Dept 3343, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark (pcg@cochrane.dk).

Author Contributions: Dr Gøtzsche had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Gøtzsche, Hróbjartsson. Acquisition of data: Gøtzsche, Hróbjartsson, Marić, Tendal. Analysis and interpretation of data: Gøtzsche, Hróbjartsson, Tendal. Drafting of the manuscript: Gøtzsche. Critical revision of the manuscript for important intellectual content: Gøtzsche, Hróbjartsson, Marić, Tendal. Statistical analysis: Gøtzsche, Tendal. Administrative, technical, or material support: Gøtzsche, Tendal. Study supervision: Gøtzsche.

Financial Disclosures: None reported.

Funding/Support: The study was not funded.
Additional Contributions: We thank senior statistician Julian Higgins, MRC, Biostatistics Unit, University of Cambridge, England, for comments on the manuscript and the following authors for providing additional information on their meta-analyses: Ruth Barclay-Goddard, MHSc, University of Manitoba; Barbara Brady-Fryer, RN, University of Ottawa, Ottawa, Ontario; Peter den Boer, MD, University Hospital Groningen, Groningen, the Netherlands; Chen Junmin, MD, Australasian Cochrane Centre, Melbourne, Australia; Chris Deery, MD, Edinburgh Dental Institute, Edinburgh, Scotland; Pasquale Frisina, PhD, City University of New York, New York; Peter Gibson, MB, BS, FRACP, John Hunter Hospital, Newcastle, Australia; Peter Griffiths, MD, Florence Nightingale School of Nursing and Midwifery at King's College, London, England; Kåre B. Hagen, MD, Diakonhjemmet Hospital, Oslo, Norway; Lisa Hartling, BScPT, MSc, University of Alberta, Edmonton; Jan Kool, PhD, Klinik Valens Rehabilitationszentrum, Valens, Switzerland; Gert Kwakkel, MD, University Hospital Vrije Universiteit, Amsterdam, the Netherlands; Hugh McGuire, Trials Search Coordinator, Cochrane Depression, Anxiety and Neurosis Group, London, England; Colleen Murphy, International Medical Corps, Santa Monica, California; Dr Edward Nuñes, MD, New York State Psychiatric Institute, New York; Hema Patel, MD, MSc, FRCP, Montreal Children's Hospital, Montreal, Quebec; M. Florent Richy, MSc, University of Liège, Belgium; Natasha Wiebe, MMath, University of Alberta, Edmonton. None of those acknowledged received compensation for their contributions.

References

1. Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions, 4.2.5. http://www.cochrane.org/resources/handbook/hbook.htm. Updated May 2005. Accessed May 31, 2005.
2. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56-62. PMID: 14399272.
3. Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561-571. PMID: 13688369.
4. Murray E, Burns J, See TS, Lai R, Nazareth I. Interactive Health Communication Applications for people with chronic disease. Cochrane Database Syst Rev. 2005;(4):CD004274. PMID: 16235356.
5. Towheed TE, Maxwell L, Judd MG, Catton M, Hochberg MC, Wells G. Acetaminophen for osteoarthritis. Cochrane Database Syst Rev. 2006;(1):CD004257. PMID: 16437479.
6. Moncrieff J, Wessely S, Hardy R. Active placebos versus antidepressants for depression. Cochrane Database Syst Rev. 2004;(1):CD003012. PMID: 14974002.
7. Gøtzsche PC, Johansen HK, Schmidt LM, Burr ML. House dust mite control measures for asthma. Cochrane Database Syst Rev. 2004;(4):CD001187. PMID: 15495009.
8. Adams NP, Bestall JC, Lasserson TJ, Jones PW, Cates CJ. Fluticasone versus placebo for chronic asthma in adults and children. Cochrane Database Syst Rev. 2005;(4):CD003135. PMID: 16235315.
9. Review Manager [computer program]. Version 4.2 for Windows. Copenhagen, Denmark: The Nordic Cochrane Centre, The Cochrane Collaboration; 2003.
10. Comprehensive Meta Analysis [computer program]. Version 2.2.030. Englewood, NJ: Biostat Inc; July 2006.
11. Brosseau L, Welch V, Wells G, et al. Low level laser therapy (Classes I, II and III) for treating osteoarthritis. Cochrane Database Syst Rev. 2004;(3):CD002046. PMID: 15266461.
12. Barlow J, Coren E. Parent-training programmes for improving maternal psychosocial health. Cochrane Database Syst Rev. 2004;(1):CD002020. PMID: 14973981.
13. Edmonds M, McGuire H, Price J. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2004;(3):CD003200. PMID: 15266475.
14. Barclay-Goddard R, Stevenson T, Poluha W, Moffatt ME, Taback SP. Force platform feedback for standing balance training after stroke. Cochrane Database Syst Rev. 2004;(4):CD004129. PMID: 15495079.
15. Castro-Rodriguez JA, Rodrigo GJ. Beta-agonists through metered-dose inhaler with valved holding chamber versus nebulizer for acute exacerbation of wheezing or asthma in children under 5 years of age: a systematic review with meta-analysis. J Pediatr. 2004;145(2):172-177. PMID: 15289762.
16. Brady-Fryer B, Wiebe N, Lander JA. Pain relief for neonatal circumcision. Cochrane Database Syst Rev. 2004;(4):CD004217. PMID: 15495086.
17. Deery C, Heanue M, Deacon S, et al. The effectiveness of manual versus powered toothbrushes for dental health: a systematic review. J Dent. 2004;32(3):197-211. PMID: 15001285.
18. Chen J, Liu C. Methotrexate for ankylosing spondylitis. Cochrane Database Syst Rev. 2004;(3):CD004524. PMID: 15266537.
19. den Boer PC, Wiersma D, Van den Bosch RJ. Why is self-help neglected in the treatment of emotional disorders? a meta-analysis. Psychol Med. 2004;34(6):959-971. PMID: 15554567.
20. Ekeland E, Heian F, Hagen KB, Abbott J, Nordheim L. Exercise to improve self-esteem in children and young people. Cochrane Database Syst Rev. 2004;(1):CD003683. PMID: 14974029.
21. Frisina PG, Borod JC, Lepore SJ. A meta-analysis of the effects of written emotional disclosure on the health outcomes of clinical populations. J Nerv Ment Dis. 2004;192(9):629-634. PMID: 15348980.
22. Griffiths PD, Edwards MH, Forbes A, Harris RL, Ritchie G. Effectiveness of intermediate care in nursing-led in-patient units. Cochrane Database Syst Rev. 2004;(4):CD002214. PMID: 15495030.
23. Gross AR, Hoving JL, Haines TA, et al. Manipulation and mobilisation for mechanical neck disorders. Cochrane Database Syst Rev. 2004;(1):CD004249. PMID: 14974063.
24. Hagen KB, Hilde G, Jamtvedt G, Winnem M. Bed rest for acute low-back pain and sciatica. Cochrane Database Syst Rev. 2004;(4):CD001254. PMID: 15495012.
25. Hartling L, Wiebe N, Russell K, Patel H, Klassen TP. Epinephrine for bronchiolitis. Cochrane Database Syst Rev. 2004;(1):CD003123. PMID: 14974006.
26. Kool J, de Bie R, Oesch P, Knusel O, van den Brandt P, Bachmann S. Exercise reduces sick leave in patients with non-acute non-specific low back pain: a meta-analysis. J Rehabil Med. 2004;36(2):49-62. PMID: 15180219.
27. Kwakkel G, van Peppen R, Wagenaar RC, et al. Effects of augmented exercise therapy time after stroke: a meta-analysis. Stroke. 2004;35(11):2529-2539. PMID: 15472114.
28. Latham NK, Bennett DA, Stretton CM, Anderson CS. Systematic review of progressive resistance strength training in older adults. J Gerontol A Biol Sci Med Sci. 2004;59(1):48-61. PMID: 14718486.
29. Merry S, McDowell H, Hetrick S, Bir J, Muller N. Psychological and/or educational interventions for the prevention of depression in children and adolescents. Cochrane Database Syst Rev. 2004;(1):CD003380. PMID: 14974014.
30. Murphy C, Hahn S, Volmink J. Reduced osmolarity oral rehydration solution for treating cholera. Cochrane Database Syst Rev. 2004;(4):CD003754. PMID: 15495063.
31. Nunes EV, Levin FR. Treatment of depression in patients with alcohol or other drug dependence: a meta-analysis. JAMA. 2004;291(15):1887-1896. PMID: 15100209.
32. Patel H, Platt R, Lozano JM, Wang EE. Glucocorticoids for acute viral bronchiolitis in infants and young children. Cochrane Database Syst Rev. 2004;(3):CD004878. PMID: 15266547.
33. Perrott DA, Piira T, Goodenough B, Champion GD. Efficacy and safety of acetaminophen vs ibuprofen for treating children's pain or fever: a meta-analysis. Arch Pediatr Adolesc Med. 2004;158(6):521-526. PMID: 15184213.
34. Powell H, Gibson PG. High dose versus low dose inhaled corticosteroid as initial starting dose for asthma in adults and children. Cochrane Database Syst Rev. 2004;(2):CD004109. PMID: 15106238.
35. Ramakrishnan U, Aburto N, McCabe G, Martorell R. Multimicronutrient interventions but not vitamin A or iron interventions alone improve child growth: results of 3 meta-analyses. J Nutr. 2004;134(10):2592-2602. PMID: 15465753.
36. Richy F, Ethgen O, Bruyere O, Reginster JY. Efficacy of alphacalcidol and calcitriol in primary and corticosteroid-induced osteoporosis: a meta-analysis of their effects on bone mineral density and fracture rate. Osteoporos Int. 2004;15(4):301-310. PMID: 14740153.
37. Tuunainen A, Kripke DF, Endo T. Light therapy for non-seasonal depression. Cochrane Database Syst Rev. 2004;(2):CD004050. PMID: 15106233.
38. Stelian J, Gil I, Habot B, et al. Improvement of pain and disability in elderly patients with degenerative osteoarthritis of the knee treated with narrow-band light therapy. J Am Geriatr Soc. 1992;40(1):23-26. PMID: 1727843.
39. Bülow PM, Jensen H, Danneskiold-Samsoe B. Low power Ga-Al-As laser treatment of painful osteoarthritis of the knee. A double-blind placebo-controlled study. Scand J Rehabil Med. 1994;26(3):155-159. PMID: 7801065.
40. Wearden AJ, Morriss RK, Mullis R, et al. Randomised, double-blind, placebo-controlled treatment trial of fluoxetine and graded exercise for chronic fatigue syndrome. Br J Psychiatry. 1998;172:485-490. PMID: 9828987.
41. Powell P, Bentall RP, Nye FJ, Edwards RH. Randomised controlled trial of patient education to encourage graded exercise in chronic fatigue syndrome. BMJ. 2001;322(7283):387-390. PMID: 11179154.
42. Sackley CM, Lincoln NB. Single blind randomized controlled trial of visual feedback after stroke: effects on stance symmetry and function. Disabil Rehabil. 1997;19(12):536-546. PMID: 9442992.
43. Lander J, Brady-Fryer B, Metcalfe JB, Nazarali S, Muttitt S. Comparison of ring block, dorsal penile nerve block, and topical anesthesia for neonatal circumcision: a randomized controlled trial. JAMA. 1997;278(24):2157-2162. PMID: 9417009.
44. Roychowdhury B, Bintley-Bagot S, Bulgen DY, Thompson RN, Tunn EJ, Moots RJ. Is methotrexate effective in ankylosing spondylitis? Rheumatology (Oxford). 2002;41(11):1330-1332. PMID: 12422010.
45. Rosner R, Beutler LE, Daldrup R. Depressionsverläufe in unterschiedlichen Psychotherapieformen: modellierung durch Hierarchische lineare Modelle (HLM) [Course of depression in different psychotherapies: an application of hierarchical linear models]. Zeitschrift für Klinische Psychologie. 1999;28(2):112-120.
46. Richards JM, Beal WE, Seagal JD, Pennebaker JW. Effects of disclosure of traumatic events on illness behavior among psychiatric prison inmates. J Abnorm Psychol. 2000;109(1):156-160. PMID: 10740948.
47. Abul-Ainine A, Luyt D. Short term effects of adrenaline in bronchiolitis: a randomised controlled trial. Arch Dis Child. 2002;86(4):276-279. PMID: 11919104.
48. Goebel J, Estrada B, Quinonez J, Nagji N, Sanford D, Boerth RC. Prednisolone plus albuterol versus albuterol alone in mild to moderate bronchiolitis. Clin Pediatr (Phila). 2000;39(4):213-220. PMID: 10791133.
49. Lie C, Ying C, Wang EL, Brun T, Geissler C. Impact of large-dose vitamin A supplementation on childhood diarrhoea, respiratory disease and growth. Eur J Clin Nutr. 1993;47(2):88-96. PMID: 8436094.
50. Alam NH, Majumder RN, Fuchs GJ; CHOICE study group. Efficacy and safety of oral rehydration solution with reduced osmolarity in adults with cholera: a randomised double-blind clinical trial. Lancet. 1999;354(9175):296-299. PMID: 10440307.
51. Partridge C, Mackenzie M, Edwards S, et al. Is dosage of physiotherapy a critical factor in deciding patterns of recovery from stroke: a pragmatic randomized controlled trial. Physiother Res Int. 2000;5(4):230-240. PMID: 11129665.
52. Rozenberg S, Delval C, Rezvani Y, et al. Bed rest or normal activity for patients with acute low back pain: a randomized controlled trial. Spine. 2002;27(14):1487-1493. PMID: 12131705.
53. Härkäpää K, Mellin G, Järvikoski A, Hurri H. A controlled study on the outcome of inpatient and outpatient treatment of low back pain. Part III. Long-term follow-up of pain, disability, and compliance. Scand J Rehabil Med. 1990;22(4):181-188. PMID: 2148221.
54. Orimo H, Shiraki M, Hayashi Y, et al. Effects of 1 alpha-hydroxyvitamin D3 on lumbar bone mineral density and vertebral fractures in patients with postmenopausal osteoporosis. Calcif Tissue Int. 1994;54(5):370-376. PMID: 8062152.
55. Sato Y, Maruoka H, Oizumi K. Amelioration of hemiplegia-associated osteopenia more than 4 years after stroke by 1 alpha-hydroxyvitamin D3 and calcium supplementation. Stroke. 1997;28(4):736-739. PMID: 9099188.
56. Steiner A, Walsh B, Pickering RM, Wiles R, Ward J, Brooking JI. Therapeutic nursing or unblocking beds? A randomised controlled trial of a post-acute intermediate care unit. BMJ. 2001;322(7284):453-460. PMID: 11222419.
57. Gøtzsche PC. Believability of relative risks and odds ratios in abstracts: cross-sectional study. BMJ. 2006;333(7561):231-234. PMID: 16854948.
58. Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. J Clin Epidemiol. 2005;58(7):741-742. PMID: 15939227.
59. Vase L, Riley JL III, Price DD. A comparison of placebo effects in clinical analgesic trials versus studies of placebo analgesia. Pain. 2002;99(3):443-452. PMID: 12406519.
60. Price DD, Riley JL III, Vase L. Reliable differences in placebo effects between clinical analgesic trials and studies of placebo analgesia mechanisms [author reply]. Pain. 2003;104:715-716.
61. Hróbjartsson A, Gøtzsche PC. Unsubstantiated claims of large effects of placebo on pain: serious errors in meta-analysis of placebo analgesia mechanism studies. J Clin Epidemiol. 2006;59(4):336-338. PMID: 16549252.
62. Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. J Clin Epidemiol. 2006;59(7):697-703. PMID: 16765272.
63. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999;354(9193):1896-1900. PMID: 10584742.
64. Wulff HR, Gøtzsche PC. Rational Diagnosis and Treatment: Evidence-Based Clinical Decision-Making. 3rd ed. Oxford, England: Blackwell Scientific; 2000.

Journal: JAMA (American Medical Association)

Published: Jul 25, 2007
