The intriguing evolution of effect sizes in biomedical research over time: smaller but more often statistically significant

Background: In medicine, effect sizes (ESs) allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. Given that many public health decisions and health care policies are based on ES estimates, it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time.

Results: Through a big data approach, the text mining process automatically extracted 814 120 ESs from 13 322 754 PubMed abstracts. Eligible ESs were risk ratios, odds ratios, and hazard ratios, along with their confidence intervals. Here we show a remarkable decrease of ES values in PubMed abstracts between 1990 and 2015 while, concomitantly, results have become more often statistically significant. Medians of ES values have decreased over time (i.e., moved closer to 1) for both "risk" and "protective" values. This trend was found in nearly all fields of biomedical research, with the most marked downward tendency in genetics. Over the same period, the proportion of statistically significant ESs increased regularly: among the abstracts with at least 1 ES, 74% were statistically significant in 1990–1995, vs 85% in 2010–2015.

Conclusions: Whereas decreasing ESs could be an intrinsic evolution in biomedical research, the concomitant increase of statistically significant results is more intriguing. Although it is likely that growing sample sizes in biomedical research could explain these results, another explanation may lie in the "publish or perish" context of scientific research, with a probable growing orientation toward sensationalism in research reports. Important provisions must be made to improve the credibility of biomedical research and limit waste of resources.
Keywords: meta-research; effect size; biomedical research; "publish or perish"; data mining

Received: 17 July 2017; Revised: 29 October 2017; Accepted: 28 November 2017

The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4705900 by Ed 'DeepDyve' Gillespie user on 16 March 2018

Monsarrat and Vergnes
Global trends of effect sizes in medical research

Background

Effect sizes (ESs) are useful for describing associations in studies that examine relationships between variables [1]. In medicine, ESs allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. There are many different types of ESs [2], but in human biomedical research, ESs are predominantly derived from risk (or rate) ratios (RRs), odds ratios (ORs), or hazard ratios (HRs) [3]. These estimates are no longer confined to the early domains of epidemiological research (such as epidemiological oncology) [4]. They are now used across all of biomedical research (e.g., environmental epidemiology [5], genetics [6], or interventional research [7]). As there is no straightforward relationship between P-values and strengths of association [2], adequate reporting of ESs is strongly recommended by recent statistical guidelines [8]. Given that many public health decisions and health care policies are based on ES estimates [9], it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time. Consequently, in this study we aim (1) to describe the global use of ESs in the biomedical literature during the last 25 years, (2) to analyze their temporal evolution in terms of strength and statistical significance, and (3) to identify and discuss factors associated with potential evolutions.

Data Description

PubMed is the most commonly used database of biomedical information [10] and was considered the primary source. A "Knowledge Discovery in Databases" (KDD) process led us to add the PubMed Central (PMC) database as an additional source of data, according to the aims and modalities described in the Knowledge checking subsection of the Methods section.

All PubMed citations were bulk-downloaded in XML format (2017 release dated 13 December 2016) from the FTP servers of the US National Library of Medicine (NLM). Among the 26 759 399 citations, 16 820 871 (63%) provided an abstract and were thus considered preprocessed data (Additional Fig. S1A–C). A data mining process was then run to automatically detect ESs (OR, RR, HR) within PubMed abstracts, along with several characteristics of the abstracts (see details in the Methods).

Analyses

Unless otherwise specified, the results presented relate to nonreview abstracts with 95% confidence intervals (95% CIs). Details may be found in the flow diagram of the selection process for abstracts (Additional Fig. S1C and Additional Table S4) and, for identification of the type of CI, in the Supplementary Methods.

Reporting of ESs increased greatly over time

Of all PubMed abstracts, 2.1% contained at least 1 ES. The relative proportion of ES reports increased markedly over time (Additional Fig. S2A). More than half of the ESs were ORs, with a trend for RRs to be replaced by HRs (Additional Fig. S2B). ESs >1 were still largely predominant. However, abstracts in which all ESs were <1, or which mixed ESs >1 and ESs <1, also increased (Additional Figs S2C and S3).

Geographic and thematic disparities in reporting of ESs

Europe and North America were by far the biggest providers of abstracts with ESs (Fig. 1A), although the number was growing considerably in Asia (Additional Fig. S2D). There were notable disparities in ES values among different geographical areas: they were higher in South America, Africa, and Asia, and lower in Europe, Oceania, and North America (Fig. 1A). ESs were more likely to be significant in regions where they were highest (Fig. 1B, Additional Table S5). Higher ES values and proportions of significant ESs were found in fields dealing with infectious diseases (Fig. 2, Additional Fig. S4).

Figure 1: ESs are subject to geographic disparities. (A) Treemap of medians of ESs (T#3) by continent. All detected ESs were considered for the comparisons between continents. For each continent, the size of the rectangle is proportional to the absolute number of abstracts with at least 1 detected ES. ESs from abstracts with cross-continental affiliations (5.2% of abstracts) were counted in each continent concerned. The grayscale indicates median values of ESs (on a linear scale, T#3) by continent: lighter gray corresponds to lower ES values, and darker gray to higher ES values. In rectangles, different letters correspond to statistically different ESs (Kruskal-Wallis pairwise comparisons test). Europe and North America were by far the biggest providers of abstracts with ESs. ES values were higher in abstracts from South America, Africa, and Asia, and lower in abstracts from Europe, Oceania, and North America. Number of abstracts: 238 954. (B) Histogram of mean and standard error of proportions of statistically significant ESs per abstract, according to continent. Beneath the bars, different letters correspond to statistically different values (Kruskal-Wallis pairwise comparisons test). ESs were more likely to be statistically significant in abstracts from South America, Africa, and Asia.

ES values are decreasing over time

A major finding was that ES values were decreasing over time. Fig. 3A shows a clear, progressive evolution between the 1990s and the 2010s, with a massive concentration of ES values nearer to the value 1 at the present time. This result was very robust, as the decrease was observed with all tested outcomes per abstract (i.e., minimal, maximal, and mean transformed ES values) (Fig. 3B, C). It also concerned both "risk" and "protective" values (Additional Fig. S5A, B). Overall medians of "risk" ES values decreased from ES ∼2.50 in 1990–1995 to ES ∼2.11 in 2010–2015. Over the same period, medians of "protective" values moved closer to 1, from ES ∼0.59 to ES ∼0.63. The decrease was observed for all types of ESs when analyzed separately (Additional Fig. S5C). It was also consistent with a diminishing volume of "large" ESs and a proliferation of "tiny" ESs in recent years (Additional Fig. S5D). The trend was found in nearly all fields of biomedical research, with the most marked downward trend concerning genetic phenomena (Fig. 2). It was also found on nearly all continents (Additional Fig. S5E). ESs from abstracts of reviews showed a modest decrease (Additional Fig. S6A), but the decrease was not found in subgroups of ESs with 90% or 99% CIs (Additional Fig. S6B). Analysis of full-text PMC articles confirmed the decreasing trend for abstracts and tables (τ values of –0.44 and –0.21, respectively, P < 0.001) but not for Results sections (τ = –0.04, P = 0.41) (Additional Fig. S6C).

Figure 2: Heatmap of the temporal evolution of median ESs (T#3) by research field. ESs were considered at the abstract level, using the mean of ES(s) per abstract (on a linear scale, T#3). Abstracts were linked to specific research field(s) according to their MeSH keywords, so a single abstract could be linked to multiple fields of research (overall ratio: 801 839/229 581 = 3.49). Research fields (at the right of the figure) were defined from 2 main branches of the MeSH Tree (US NLM): [C] "diseases" and [G] "phenomena and processes." Numbers in brackets are the total number of abstracts with at least 1 detected ES during the 25-year period in a specific research field. Three branches (out of 43) with fewer than 1000 abstracts with at least 1 ES were eliminated. Trends were calculated at the monthly level but are represented in the graph at the yearly level for readability. The grayscale indicates yearly median values of ESs (T#3): lighter gray corresponds to lower ES values, and darker gray to higher ES values. On the left, research fields are grouped using a hierarchical cluster analysis and represented as a dendrogram. Higher ESs are found in fields dealing with infectious diseases (e.g., microbiological phenomena, virus diseases, bacterial infections and mycoses; see top of the figure). The color scale indicates the τ value of the evolution of monthly medians of ESs for each research field. Blank rectangles mean nonsignificant trends. All colored rectangles are red (none are blue), and their intensity indicates a significant monotonic downward trend of ESs in nearly all research fields. The most marked decrease is observed for genetic phenomena (τ = –0.52, P < 0.001). Number of abstracts: 229 581.

ESs are becoming more often statistically significant

At the same time as ES values have fallen, the proportion of statistically significant ESs has increased. Again, this finding was constant for each outcome considered (i.e., presence of at least 1 statistically significant ES per abstract, or proportion of statistically significant ESs per abstract) (Fig. 4A, B). It held for both "risk" and "protective" ESs, whatever their type (OR, RR, HR) or continent (Additional Fig. S7A–D). CIs are now narrower than in the past (Fig. 3C), while limits near 1 are quite stable, and the upper limits of "protective" ESs are even slightly farther from 1. Between 1990–1995 and 2010–2015, overall medians of 95% CI limits evolved from 1.23–4.96 to 1.21–3.54 for "risk" values, and from 0.32–0.95 to 0.42–0.91 for "protective" values. There was no evidence of an increasing trend in abstracts of reviews (Additional Fig. S6D) or in subgroups of ESs with 90% or 99% CIs (Additional Fig. S6E). However, the proportion of statistically significant ESs in PMC full-text articles also increased (τ = +0.50, P < 0.001 for abstracts and Results sections) (Additional Fig. S6F).

Factors associated with observed trends

Both decreasing ESs and increasing significance were found in abstracts with evidence of a multivariate analysis, in abstracts from Open Access (OA) journals, and in abstracts from Core Clinical Journals (CCJ) (Fig. 5). However, we found some evolutions in the general environment of publishing: (1) a growing use of multivariate analyses (Additional Fig. S2E), (2) a growing appeal of Open Access publication (Additional Fig. S2F), and (3) a smaller proportion of abstracts from Core Clinical Journals (Additional Fig. S2G). These changes could accentuate the observed trends for three reasons:
- ESs from abstracts with multivariate analysis were lower than unadjusted ES values, with no difference in statistical significance (Fig. 5A, B).
- ES values reported in abstracts from OA journals were lower than those from non-OA journals, with a similar proportion of statistical significance (Fig. 5C, D).
- ESs from CCJ also decreased but, above all, became less often statistically significant than those from non-CCJ over time (Fig. 5E, F).

Discussion

Epidemiology has now reached the paradoxical situation where ESs are decreasing remarkably over time, while these same ESs are becoming more and more often statistically significant. We call this surprising phenomenon the in silico effect. The name is an analogy with the evolution of processors, whose size has decreased as their performance has grown. It also reflects the fact that the rise of computer science is, at least indirectly, linked with this general trend (advances in statistical methods and software, availability of huge electronic databases and larger studies, etc.).

The global decrease of ESs could be explained by several inter-related considerations. First, as already pointed out by Taubes in 1995, there could be a true rarefaction over time of undiscovered conspicuous determinants of diseases, such as smoking or alcohol [11]. We showed that this trend could be observed worldwide and in most fields of biomedical research. Second, methodological improvements in biomedical research [12] could also have led to smaller ESs. Most importantly, it is highly probable that larger sample sizes could lead to smaller effect sizes (e.g., through better management of confounders), which are likely to be statistically significant (through an increase in statistical power). Indeed, multivariate analyses are used more frequently as time goes on, which could lead to weaker effects than those obtained with univariate analyses [13].

Third, cultural effects should also be considered. We found that ESs have become smaller in contemporary CCJ. "Modest" ESs (i.e., RR < ∼3) are no longer "discredited," as may have been the case in the past (e.g., by some former editors of Core Clinical Journals) [11], and slight associations have now become the rule [14]. It is now accepted, at least in some fields of research, that most true associations have small effects [15]. Another kind of cultural explanation appears when different geographical areas are examined. The "five eyes" countries (Australia, Canada, New Zealand, the United Kingdom, and the United States; the greatest producers and influencers of biomedical research) [16] and the Scandinavian monarchies (Denmark, Sweden, and Norway) are among the countries reporting the lowest ESs. Interestingly, it has been shown that scientists from these countries may be more cautious when reporting results, as evidenced by their prominent use of words implying uncertainty in their abstracts [17]. This is also consistent with stronger ESs being found in Asian studies than in the European and American literature, e.g., for gene-disease associations [18]. The desire to "compete" with Europe and the United States may be an explanation [14].

Finally, another explanation would be the file drawer effect (i.e., publication bias) [19, 20]. By underestimating the amount of null or negative effects, it could mask a more pronounced decrease of ESs than the one we identified. The increased rejection rates and the increased emphasis on risk factors have encouraged editors and authors to select and present manuscripts with bigger effect sizes and/or significant differences [19].

This structural trend at the level of the whole literature should not be interpreted in the same way as the declines already described for particular topics in biology [21] or in medical research [22]. Gehr described the "fading of reported effectiveness" in randomized controlled trials [23]. Among several explanations [21], the "Proteus phenomenon" [24] has been described as "rapidly alternating extreme research claims and extremely opposite refutations" [25]. Decreasing ESs in a particular topic are likely to lead to a loss of statistical significance [21], as observed in several cumulative meta-analyses [26]. In contrast, while we also measured decreasing ESs, our findings indicated a clear trend toward a growing proportion of statistically significant results over time. This result is consistent with several other trans-disciplinary meta-research results:
- a trend toward lower P-values reported in PubMed abstracts between 1990 and 2015 [27];
- increasing reporting of significant tiny effects in the literature [28];
- an increasing proportion of positive results [29].

Although the decrease in ESs over time does not seem problematic in itself, the growing proportion of statistically significant results is more intriguing and may reflect the "publish or perish" context of scientific research. The population of researchers is growing worldwide [30], all competing to obtain funds. There is also a probable tendency toward placing greater emphasis on novelty and sensationalism [29]. In this context, maintaining statistically significant results may have become the way to "compensate" for the decrease of ESs. We also found that the growing proportion of statistically significant results was unaffected by the development of Open Access publishing [31] but could be accentuated by the increasing relative importance of Asian papers.

Figure 3: ESs are decreasing over time. (A) Heatmap of the temporal evolution of ESs (i.e., odds ratio, relative risk, or hazard ratio) on their original log scale (T#1) (see details in the Supplementary Methods). All detected ESs (n = 690 196) are considered. ESs <1 were transformed according to the T#1 transformation (inverse transformation) (Additional Table S3A).
ESs >1 were not transformed. The vertical axis corresponds to a logarithmic scale ranging from 1 to 100, with 25 regular cutoff values (ESs that were >100, corresponding to 0.16% of all detected ESs, are not reported on the graph). The color scale indicates the monthly relative proportion of ESs in each interval: cold colors correspond to lower proportions and hot colors to higher. There is a trend toward a massive concentration of ES values near 1 at present. The black dots represent the overall relative proportion of ESs, by year and by interval. The lowest ESs of the more recent abstracts are the most numerous ESs overall. (B) Scatter plot of the temporal evolution of monthly medians of ESs on a linear scale (T#3). ESs were considered at the abstract level (n = 247 339). Three different outcomes were considered: minimal, maximal, and mean of ES(s) of each abstract. The 3 temporal evolutions are decreasing, with τ values of –0.64 (P < 0.001), –0.59 (P < 0.001), and –0.63 (P < 0.001), respectively. (C) Scatter plot of the temporal evolution of monthly medians of confidence interval (CI) magnitudes on a linear scale (T#3). CI magnitudes were considered at the abstract level (n = 247 339). Three different outcomes are considered: minimal, maximal, and mean of CI magnitude(s) of each abstract. The 3 temporal evolutions are decreasing, with τ values of –0.76 (P < 0.001), –0.67 (P < 0.001), and –0.72 (P < 0.001), respectively.

Figure 4: Proportions of statistically significant ESs have increased with time in 247 339 abstracts. (A) Scatter plot of the temporal evolution of monthly proportions of abstracts with at least 1 statistically significant ES. There is a monotonic upward trend: τ = 0.65 (P < 0.001). (B) Scatter plot of the temporal evolution of the monthly mean of proportions of statistically significant ESs per abstract. There is a monotonic upward trend: τ = 0.77 (P < 0.001).

Figure 5: Factors associated with observed trends. Scatter plots of the temporal evolution of monthly medians of ESs (A, C, E) or mean proportions of statistically significant ESs per abstract (B, D, F), according to presence (yes/no) of the following factors: a multivariate analysis (A, B), the Open Access status of the article (C, D), or the "Core Clinical Journal" status of the article (E, F). The full line represents the temporal trend for abstracts with evidence of the factor, and the dotted line for abstracts without evidence of the factor. ESs were considered at the abstract level. The outcome was the mean of ES(s) of each abstract (on a linear scale, T#3). (A) ESs from abstracts with multivariate analysis were generally lower than values from abstracts without multivariate analysis during the 25-year period (P < 0.001, Mann-Whitney test). (B) There was no statistical difference between the 2 categories regarding statistical significance during the 25-year period (P = 0.59, Mann-Whitney test). Number of abstracts: 136 724 and 110 615 abstracts with and without multivariate analysis, respectively. (C) ESs from Open Access abstracts were generally lower than values from non–Open Access abstracts during the 25-year period (P < 0.001, Mann-Whitney test). (D) There was no statistical difference between the 2 categories regarding statistical significance during the 25-year period (P = 0.57, Mann-Whitney test). Number of abstracts: 92 040 Open Access and 155 299 non–Open Access abstracts. (E) ESs from CCJ abstracts were generally lower than values from non-CCJ abstracts during the 25-year period (P < 0.001, Mann-Whitney test), especially from around the year 2000 onward. (F) There was no difference between the 2 categories regarding statistical significance during the 25-year period (P = 0.08, Mann-Whitney test). However, the curves cross around 2005. When the period between 2005 and 2015 was considered, ESs from CCJ abstracts were less often statistically significant (P < 0.001, Mann-Whitney test).

One limitation of this study is the incomplete representation of the different possible metrics of ESs [2]: RRs, ORs, and HRs are not the only way to report measures of association. Other ES metrics could, in principle, be standardized. For example, Cohen's d, Hedges' g, and correlation coefficients can be converted to odds ratios following standard transformations [32], as already done in other meta-research [33]. However, we could not perform data mining on all existing metrics with sufficient accuracy to guarantee the best measurement quality. Even so, it is rather unlikely that the in silico effect would be specific to particular metrics. We also did not filter out RRs/ORs/HRs expressed per unit of a continuous variable, but this limitation should not have any effect on temporal trends.

One could argue that the heterogeneity of the underlying data makes it impossible to infer the meaning of these trends. ESs reflect the effects of continuous, categorical, or binary measures. They include risk factors for diseases, treatment effects of new drugs vs placebo, genetic effects, effects of risk scores, etc. However, considering the biomedical literature as a whole is the only way to assess macro-trends in the way ESs are reported. Given that the practical interpretation of ESs has not really changed over time, it is important to identify such trends.

Other limitations are related to the data available in the XML files of PubMed abstracts and to the automatic nature of the data mining process. Both prevented us from carrying out in-depth analysis of results in relation to factors such as sample sizes, quality of studies, or conflicts of interest.

Potential implications

In this era of alternative truths and bullying of the press, the public and politicians need a science of epidemiology that is credible and trustworthy. Echoing Taubes [11], it is still important for epidemiology to avoid becoming an "unending source of fear," with too many studies having too little real impact on public health. The medical and research community should acknowledge the forces and constraints that influence the design of studies and the way their results are interpreted, because these have a significant impact on health decisions and policies. We suggest that biomedical researchers should be skilled in meta-research in order to take a bird's eye view of science [34]. More than ever, efforts to improve the credibility of biomedical research and limit waste of resources must be continued [35]. This implies important provisions, described by Ioannidis [36] among others. These include:
- the adoption of a replication culture;
- changes in the way statistical methods are designed and used in the reporting and interpretation of results [37];
- modifications in the reward system of science [38].
To these, our results add the attention that should be paid to Core Clinical Journals when making health decisions and policies. Their role seems to be growing, both in maintaining the quality of research and in filtering articles of clinical or scientific importance. Finally, intensifying transdisciplinarity with the humanities would help epidemiologists provide research that would be regarded in terms of its "potential uses and misuses in serving and affecting the human condition" [39].

Methods

We followed a KDD approach. The KDD process is iterative and involves several steps, combining automated methods with human decisions [40]. The following subsections describe all final iterations. The overall process is described in Additional Fig. S1A–C. Algorithms and statistical scripts are explained in the Supplementary Information and are downloadable [41].

Data mining

Using an iterative process, we developed an algorithm to automatically detect the 3 main types of ESs (OR, RR, HR) in PubMed abstracts. As terminology was poorly standardized, we iteratively refreshed a list of ES terms frequently used in biomedical research, e.g., "RR," "OR," "HR," "relative risk," "odds ratio," "hazard ratio," "aRR," "aOR," "aHR," etc. (Additional Table S1). We also filtered numeric values not likely to be ES values and checked for polysemy of acronyms. Some medical abbreviations share an acronym with an ES term and report values that could be confused with ESs (e.g., "respiratory rate" for RR, "ovulation rate" for OR, "heart rate" for HR). The algorithm [41] was tailored to detect the full wording of all such abbreviations (Additional Table S1). Each attempt to improve the detection of ESs was tested for diagnostic performance on random samples of 200 abstracts, and iterations were validated if both sensitivity and specificity were improved. At the final iteration, a sensitivity greater than 95% and a specificity of 99.9% (interobserver κ > 0.97) were reached (Supplementary Methods, Additional Table S2, and Supplementary File 1 for performance testing).

The algorithm automatically recognized the type of ES, its value, and the values of the upper and lower limits of its CI (Supplementary Methods). Other characteristics of the citation from which the ES was drawn were retrieved:
- PubMed identifier (PMID) and, where available, PMC identifier (PMCID);
- month/year of publication;
- authors' affiliation country(ies);
- Medical Subject Headings (MeSH) keywords;
- detection of a multivariate analysis (yes/no);
- OA publication (yes/no);
- publication in a CCJ (yes/no);
- CI level (i.e., 90%, 95%, or 99%);
- type of publication ("review": yes/no).

Given the small number of abstracts indexed per year before 1990 [27] and the as-yet incomplete indexing of abstracts from 2016, only the 1990–2015 period was considered. This process led to the generation of a comprehensive database of 814 120 ES values (fully available in GigaDB [41]).

Data transformation

By nature, OR/RR/HR values are expressed on a logarithmic scale (between 0 and 1 for "protective" values, and between 1 and +∞ for "risk" values). The ln-transformed ESs have the useful property of being approximately normally distributed [42], and the absolute value of the ln-transformed ESs provides a standardization of "protective" and "risk" values. Depending on whether ES values were normalized and/or standardized, 4 different transformations were defined (rationale and mathematical explanations in Additional Table S3A).

Data analysis

Outcomes

We defined 3 types of ESs: ORs, RRs, and HRs. Original ES values were categorized as:
- "protective" if <1, "risk" if >1, "neutral" if =1;
- "large" [43] when ≤0.2 or ≥5, and "tiny" [28] if between 0.95 and 1.05;
- statistically significant if the CI did not encompass 1.

As multiple ESs are often found within a single abstract, ES values were condensed in different ways for analyses at the abstract level (Additional Table S3B):
- minimal and maximal ES values per abstract (i.e., the nearest value to 1 and the farthest value from 1, respectively);
- mean of ES values per abstract (after logarithmic transformation);
- magnitude of CIs (minimal, maximal, and mean per abstract after logarithmic transformation);
- presence of at least 1 statistically significant ES value in the abstract (yes/no) and proportion of statistically significant ESs per abstract.

Primary analyses were confined to nonreviews, to avoid overrepresentation of some ES values, and to ESs with 95% CIs, to allow magnitude comparisons of CIs.

Analysis plan

An iterative analysis plan was designed for the 3 aims of the study. Specific objectives were listed (Additional Table S4).

Statistical analyses

Descriptive analyses involved calculations of frequency distribution, percentages, means, and tabular statistics for the reporting of ESs. These were computed both by type of ES and, for readability, for all types taken together. The monotonic upward or downward trend of monthly medians of ES values over time was assessed using the Mann-Kendall (MK) test [44]. ES comparisons between classes of binary variables were tested using Mann-Whitney statistics. A Kruskal-Wallis pairwise comparison (using Dunn's test for multiple comparisons) was performed to compare values across continents. The significance level of statistical tests was set at P < 0.001. Statistics and graphics for data visualization were produced using R 3.2.3 (Vienna, Austria, 2015; R Project for Statistical Computing, RRID:SCR_001905). A "loess" fitted curve [45] was added to scatterplots in order to visualize temporal trends.

Knowledge checking [40]

Systematic reviews and other types of CI

Complementary analyses on the temporal evolution of ESs were conducted on 2 subgroups not included in the primary analyses: ESs detected in citations identified as "review" and ESs with CIs at 90% or 99% (Additional Fig. S2H, I).

PMC database

As an abstract may not be fully representative of the full-text article, we extended the data mining process to full-text articles. To this end, 64 829 citations with a PMCID number were selected from the comprehensive database. XML data from the corresponding PMC articles (25 868 available articles) were then downloaded, and a similar data mining strategy was applied to the Results sections: 135 542 values were detected. In addition, 589 743 ESs were detected within tables and analyzed separately [41].

Availability of source code and requirements

Project name: PubMed ES Detector
Source code available at: https://github.com/gigascience/paper-monsarrat2017 & http://dx.doi.org/10.5524/100385
Operating systems: platform independent
Programming language: Perl
License: GNU GPL v3

Availability of supporting data and materials

Further supporting data are available in the GigaScience repository, GigaDB [41].

Additional files

Additional information may be found in the Supplementary Information pdf file.

The Supplementary Methods contains additional information about the data mining method, programming algorithm, performance tests of the algorithm, and definition of citation characteristics.

Additional Table 1: Check for polysemy of terms related to types of ESs. The algorithm checked for the polysemy of acronyms. Through the MediLexicon online database of pharmaceutical and medical abbreviations (http://www.medilexicon.com/), all potential synonyms were identified by text mining on the entire abstract. All the terms considered are presented below. Using regular expressions, some variations were considered to increase the detection of ES acronyms: presence or absence of plural, hyphen, or spaces. The presence of any of these terms in an abstract oriented the data mining process toward a more restrictive procedure in order to minimize the "false positive" rate.
Additional Table 2: Examples of "undetectable" ESs, false-negative ESs, and false-positive ESs.
Additional Table 3: Mathematical transformations and main outcomes.
Additional Table 4: Summary table of the analysis plan.
Additional Table 5: Geographical analysis.
Additional Figure 1: Overview of the "Knowledge Discovery in Databases" (KDD) approach used in this study: the different steps that compose the KDD process, the flowchart of the algorithm for PubMed data mining, and the flow diagram of the selection process for abstracts.
Additional Figure 2: Descriptive analysis of the comprehensive database and descriptive analysis of ESs in abstracts.
Additional Figure 3: Histogram distribution of the effect sizes.
Additional Figure 4: Heatmap of the temporal evolution of proportion of statistically significant ESs per abstract: disparities among fields of research.
Additional Figure 5: Descriptive analysis of ES values in abstracts for protective and risk values, type of ESs, tiny and large effects, and geographical areas.
Additional Figure 6: Descriptive analysis of ES values and significance from reviews, according to confidence intervals, from PMC full texts.
Additional Figure 7: Descriptive analysis of ES significance in abstracts for protective and risk values, type of ESs, tiny and large effects, and geographical areas.
Supplementary File 1 contains additional information about performance testing: kappa, sensitivity, and specificity.
Supplementary References

Abbreviations

CCJ: Core Clinical Journal; CI: confidence interval; ES: effect size; HR: hazard ratio; KDD: Knowledge Discovery in Databases; MK: Mann-Kendall; NLM: National Library of Medicine; OA: Open Access
The dataset contains the comprehensive cess; OR: odds ratio; PMC: PubMed Central; PMID: PubMed ID; RR: database of detected ESs in Pubmed, the database of detected relative risk; XML: eXtensible Markup Language. ESs in PubMed Central, and snapshots of the source code of the program that helped to generate these databases. Three spe- cific modules were developed: ES˙detector.pm, Load module.pm Competing financial interests and Mesh detector.pm. The flow diagram of the program can be found in Additional Fig. S1. The authors declare that they have no competing interests. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4705900 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Global trends of effect sizes in medical research 9 Funding of risk factors and the impact of model specifications. J Clin Epidemiol 2016;71:58–67. This work was supported by Toulouse University Hospi- 14. Ioannidis JPA. Exposure-wide epidemiology: revisiting Brad- tal (CHU de Toulouse), Toulouse University (UniversiteP ´ aul ford Hill. Statist Med 2016;35(11):1749–62. Sabatier), the Midi-Pyrenees region, the research platform of 15. Khoury MJ, Little J, Gwinn M et al. On the synthesis and the Toulouse Dental Faculty (PLTRO), and the French National interpretation of consistent but weak gene-disease associ- Research Agency (Agence Nationale de la Recherche—ANR— ations in the era of genome-wide association studies. Int J http://dx.doi.org/10.13039/501100001665) under grant ANR-16- Epidemiol 2007;36(2):439–45. CE18–0019-01. 16. Xu Q, Boggio A, Ballabeni A. Countries’ biomedical publi- cations and attraction scores. A PubMed-based assessment Author contributions [version 2; referees: 2 approved]. F1000Research 2015;3:292. doi:10.12688/f1000research.5775.2. P.M. and J.N.V. designed the research, analyzed and interpreted 17. Netzel R, Perez-Iratxeta C, Bork P et al. 
The way we the data, performed the statistical analysis, and drafted the write: country-specific variations of the English lan- manuscript. P.M. acquired the data and coded the algorithm. guage in the biomedical literature. EMBO Rep 2003;4(5): J.N.V. supervised the study. 446–51. 18. Pan Z, Trikalinos TA, Kavvoura FK et al. Local literature Acknowledgements bias in genetic epidemiology: an empirical evaluation of the The authors thank Ms. Susan Becker for her assistance with Chinese literature. PLoS Med 2005;2(12):e334. English language editing. 19. Pautasso M. Worsening file-drawer problem in the abstracts of natural, medical and social science databases. Sciento- metrics 2010;85(1):193–202. References 20. Mueck L. Report the awful truth! Nat Nanotechnol 1. Rosenthal JA. Qualitative descriptors of strength of associa- 2013;8:693–5. 21. Koricheva J, Jennions M, Lau J. Temporal Trends in Ef- tion and effect size. J Soc Serv Res 1996;21(4):37–59. 2. Durlak JA. How to select, calculate, and interpret effect sizes. fect Sizes: Causes, Detection, and Implications. Prince- J Pediatr Psychol 2009;34(9):917–28. ton, NJ: Princeton University Press; 2013. https:// 3. Anglemyer A, Horvath HT, Bero L. Healthcare outcomes openresearch-repository.anu.edu.au/handle/1885/65531. assessed with observational study designs compared with Accessed 23 December 2016. those assessed in randomized trials. Cochrane Database Syst 22. Ioannidis JPA, Lau J. Evolution of treatment effects over time: Rev 2014;MR000034. empirical insight from recursive cumulative metaanalyses. 4. Schachter J, Hill EC, King EB et al. Chlamydia trachomatis and Proc Natl Acad Sci U S A 2001;98(3):831–6. cervical neoplasia. JAMA 1982;248(17):2134–8. 23. Gehr BT, Weiss C, Porzsolt F. The fading of reported effective- 5. National Research Council (US) Committee on Envi- ness. A meta-analysis of randomised controlled trials. BMC ronmental Epidemiology, National Research Council Med Res Methodol 2006;6(1):25. 
(US) Commission on Life Sciences. Environmental- 24. Ioannidis JPA, Trikalinos TA. Early extreme contradictory es- Epidemiology Studies: Their Design and Conduct. timates may appear in published research: the Proteus phe- nomenon in molecular genetics research and randomized Washington, DC: The National Academies Press; 1997. https://www.ncbi.nlm.nih.gov/books/NBK233644/. Accessed trials. J Clin Epidemiol 2005;58(6):543–9. 29 September 2016. 25. Ioannidis JP. Why most published research findings are false. 6. Khoury MJ, Beaty TH, Cohen BH. Fundamentals of Genetic PLoS Med 2005;2(8):e124. Epidemiology. Oxford: Oxford University Press; 1993. 26. Trikalinos TA, Churchill R, Ferri M et al. Effect sizes 7. Crowther MA, Ginsberg J, Schunemann ¨ H et al. Evidence- in cumulative meta-analyses of mental health random- Based Hematology. Oxford, UK: John Wiley & Sons; 2009. ized trials evolved over time. J Clin Epidemiol 2004;57(11): 8. Lang T, Altman D. Basic statistical reporting for articles pub- 1124–30. lished in biomedical journals: The “Statistical Analyses and 27. Chavalarias D, Wallach J, Li A et al. Evolution of report- Methods in the Published Literature” or “The SAMPL Guide- ing P values in the biomedical literature, 1990-2015. JAMA lines.” Oxford, UK: Science Editors’ Handbook, European As- 2016;315(11):1141–8. 28. Siontis GCM, Ioannidis JPA. Risk factors and interventions sociation of Science Editors; 2013. 9. Committee on Decision Making Under Uncertainty, Board with statistically significant tiny effects. Int J Epidemiol 2011;40(5):1292–307. on Population Health and Public Health Practice, Insti- tute of Medicine. Environmental Decisions in the Face of 29. Fanelli D. Negative results are disappearing from most Uncertainty. Washington, DC: National Academies Press; disciplines and countries. Scientometrics 2012;90(3):891– 2013. http://www.ncbi.nlm.nih.gov/books/NBK200848/. Ac- 904. cessed 3 October 2016. 30. Pautasso M. 
Publication growth in biological sub-fields: 10. Falagas ME, Giannopoulou KP, Issaris EA et al. World patterns, predictability and sustainability. Sustainability databases of summaries of articles in the biomedical fields. 2012;4(12):3234–47. Arch Intern Med 2007;167(11):1204–6. 31. Kurata K, Morioka T, Yokoi K et al. Remarkable growth of 11. Taubes G. Epidemiology faces its limits. Science open access in the biomedical field: analysis of PubMed 1995;269(5221):164–9. articles from 2006 to 2010. PLoS One 2013;8(5):e60925. 12. Reveiz L, Chapman E, Asial S et al. Risk of bias of randomized doi:10.1371/journal.pone.0060925. 32. Lipsey MW, Wilson D. Practical Meta-Analysis. 1 edi- trials over time. J Clin Epidemiol 2015;68(9):1036–45. 13. Serghiou S, Patel CJ, Tan YY et al. Field-wide meta-analyses tion. Thousand Oaks, CA: SAGE Publications, Inc; of observational associations can map selective availability Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4705900 by Ed 'DeepDyve' Gillespie user on 16 March 2018 10 Monsarrat and Vergnes 33. Fanelli D, Ioannidis JPA. US studies may overestimate ef- in a biotechnological future. Philos Ethics Humanit Med fect sizes in softer research. Proc Natl Acad Sci U S A 2009;4:12. 2013;110(37):15031–6. 40. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining 34. Ioannidis JP, Fanelli D, Dunne DD et al. Meta-research: to knowledge discovery in databases. AI Mag 1996;17:37–54. evaluation and improvement of research methods and 41. Monsarrat P, Vergnes J. Supporting data for “The intriguing practices. PLoS Biol 2015;13(10):e1002264. doi:10.1371/ evolution of effect sizes in biomedical research over time: journal.pbio.1002264. smaller but more often statistically significant.” GigaScience 35. Macleod MR, Michie S, Roberts I et al. Biomedical research: Database 2017. http://dx.doi.org/10.5524/100385. increasing value, reducing waste. Lancet North Am Ed 42. Bland JM, Altman DG. 
Statistics notes: the odds ratio. BMJ 2014;383:101–4. 2000;320:1468. 36. Ioannidis JPA. How to make more published research true. 43. Pereira TV, Horwitz RI, Ioannidis JA. Empirical evaluation of PLoS Med 2014;11:e1001747. very large treatment effects of medical interventions. JAMA 37. Sterne JAC, Smith GD. Sifting the evidence—what’s wrong 2012;308:1676–84. with significance tests? Another comment on the role of sta- 44. Esterby SR. Review of methods for the detection and estima- tistical methods. BMJ 2001;322:226–31. tion of trends with emphasis on water quality applications. 38. Ioannidis JPA, Khoury MJ. Assessing value in biomedical re- Hydrol Process 1996;10:127–49. search. JAMA 2014;312:483–4. 45. Jacoby WG. Loess: a nonparametric, graphical tool for de- 39. Giordano J. Quo vadis? Philosophy, ethics, and humanities in picting relationships between variables. Electoral Studies medicine-preserving the humanistic character of medicine 2000;19:577–613. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/1/1/4705900 by Ed 'DeepDyve' Gillespie user on 16 March 2018 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png GigaScience Oxford University Press
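The ES categories and abstract-level outcomes defined in the Outcomes subsection of the Methods can be made concrete with a short Python sketch. It is not the authors' code (the PubMed ES Detector is written in Perl and the statistics were run in R), and the names `EffectSize`, `direction`, `size_class`, `is_significant`, and `summarize_abstract` are hypothetical. The thresholds follow the text, but three details are assumptions, also flagged in the comments: the "tiny" bounds are treated as inclusive, CI magnitude is taken as the width of the CI on the natural-log scale, and the per-abstract mean is computed on |ln(ES)| and back-transformed, which may differ from the transformations T#1 to T#4 detailed in Additional Table S3.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class EffectSize:
    """One extracted ratio (OR, RR, or HR) with its 95% CI limits."""
    value: float
    ci_low: float
    ci_high: float


def direction(es: EffectSize) -> str:
    """'protective' if ES < 1, 'risk' if ES > 1, 'neutral' if ES == 1."""
    if es.value < 1:
        return "protective"
    if es.value > 1:
        return "risk"
    return "neutral"


def size_class(es: EffectSize) -> str:
    """'large' if ES <= 0.2 or ES >= 5; 'tiny' if 0.95 <= ES <= 1.05.

    Assumption: inclusive 'tiny' bounds and the 'intermediate' label.
    """
    if es.value <= 0.2 or es.value >= 5:
        return "large"
    if 0.95 <= es.value <= 1.05:
        return "tiny"
    return "intermediate"


def is_significant(es: EffectSize) -> bool:
    """Statistically significant when the CI does not encompass 1."""
    return not (es.ci_low <= 1 <= es.ci_high)


def strength(value: float) -> float:
    """Distance from 1 on the log scale: |ln(ES)| treats 0.5 and 2 alike."""
    return abs(math.log(value))


def summarize_abstract(effect_sizes: Sequence[EffectSize]) -> Dict[str, float]:
    """Condense all ESs of one abstract into abstract-level outcomes."""
    if not effect_sizes:
        raise ValueError("abstract has no effect sizes")
    # Assumption: CI magnitude = width of the CI on the natural-log scale.
    ci_widths = [math.log(es.ci_high) - math.log(es.ci_low) for es in effect_sizes]
    significant = [is_significant(es) for es in effect_sizes]
    return {
        # Nearest to 1 and farthest from 1, reported on the original scale.
        "min_es": min(effect_sizes, key=lambda es: strength(es.value)).value,
        "max_es": max(effect_sizes, key=lambda es: strength(es.value)).value,
        # Assumption: mean of |ln(ES)|, back-transformed to the ratio scale.
        "mean_es": math.exp(mean(strength(es.value) for es in effect_sizes)),
        "ci_min": min(ci_widths),
        "ci_max": max(ci_widths),
        "ci_mean": mean(ci_widths),
        "any_significant": any(significant),
        "prop_significant": sum(significant) / len(significant),
    }


if __name__ == "__main__":
    abstract = [
        EffectSize(2.0, 1.2, 3.3),    # risk, CI excludes 1
        EffectSize(0.4, 0.25, 0.8),   # protective, CI excludes 1
        EffectSize(1.02, 0.9, 1.15),  # tiny, CI includes 1
    ]
    print([direction(es) for es in abstract])
    print(summarize_abstract(abstract))
```

Ranking by |ln(ES)| is what makes "nearest to 1" symmetric for protective and risk values: in the example, an ES of 0.4 is treated as farther from 1 than an ES of 2.0, because 1/0.4 = 2.5.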

The intriguing evolution of effect sizes in biomedical research over time: smaller but more often statistically significant

Publisher: Oxford University Press
Copyright: © The Author 2017. Published by Oxford University Press.
eISSN: 2047-217X
DOI: 10.1093/gigascience/gix121

Abstract

Background: In medicine, effect sizes (ESs) allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. Given that many public health decisions and health care policies are based on ES estimates, it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time.

Results: Through a big data approach, the text mining process automatically extracted 814 120 ESs from 13 322 754 PubMed abstracts. Eligible ESs were risk ratios, odds ratios, and hazard ratios, along with their confidence intervals. Here we show a remarkable decrease of ES values in PubMed abstracts between 1990 and 2015 while, concomitantly, results became more often statistically significant. Medians of ES values decreased over time for both “risk” and “protective” values. This trend was found in nearly all fields of biomedical research, with the most marked downward tendency in genetics. Over the same period, the proportion of statistically significant ESs increased regularly: among the abstracts with at least 1 ES, 74% were statistically significant in 1990–1995, vs 85% in 2010–2015.

Conclusions: Whereas decreasing ESs could be an intrinsic evolution in biomedical research, the concomitant increase of statistically significant results is more intriguing. Although it is likely that growing sample sizes in biomedical research could explain these results, another explanation may lie in the “publish or perish” context of scientific research, with a probable growing orientation toward sensationalism in research reports. Important provisions must be made to improve the credibility of biomedical research and limit waste of resources.

Keywords: meta-research; effect size; biomedical research; “publish or perish”; data mining

Received: 17 July 2017; Revised: 29 October 2017; Accepted: 28 November 2017

© The Author(s) 2017.
Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Background
Effect sizes (ESs) are useful to describe associations in studies that focus broadly on associations between variables [1]. In medicine, ESs allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. There are many different types of ESs [2], but in human biomedical research, ESs are predominantly derived from risk (or rate) ratios (RRs), odds ratios (ORs), or hazard ratios (HRs) [3]. No longer confined to the early domains of epidemiological research (such as epidemiological oncology) [4], these estimates are now used throughout biomedical research (e.g., environmental epidemiology [5], genetics [6], or interventional research [7]). As there is no straightforward relationship between P-values and strengths of association [2], adequate reporting of ESs is strongly recommended by recent statistical guidelines [8]. Given that many public health decisions and health care policies are based on ES estimates [9], it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time. Consequently, in this study we aim (1) to describe the global use of ESs in the biomedical literature during the last 25 years, (2) to analyze their temporal evolution in terms of strength and statistical significance, and (3) to identify and discuss factors associated with potential evolutions.

Data Description
PubMed is the most commonly used database of biomedical information [10] and was considered the primary source. A “Knowledge Discovery in Databases” (KDD) process led us to add the PubMed Central (PMC) database as an additional source of data, according to the aims and modalities described in the Knowledge checking subsection of the Methods section.
All PubMed citations were bulk-downloaded in XML format (2017 release dated 13 December 2016) from the FTP servers of the US National Library of Medicine (NLM). Among the 26 759 399 citations, 16 820 871 (63%) provided an abstract and were thus considered preprocessed data (Additional Fig. S1A–C). A data mining process was then run to automatically detect ESs (OR, RR, HR) within PubMed abstracts, along with several characteristics of the abstracts (see details in the Methods).

Analyses
Unless otherwise specified, the results presented relate to nonreview abstracts with 95% confidence intervals (95% CIs). Details may be found in the flow diagram of the selection process for abstracts (Additional Fig. S1C and Additional Table S4) and, for the identification of the type of CI, in the Supplementary Methods.

Reporting of ESs increased greatly over time
Overall, 2.1% of PubMed abstracts contained at least 1 ES. The relative proportion of ES reports increased markedly over time (Additional Fig. S2A). More than half of the ESs were ORs, with a trend for RRs to be replaced by HRs (Additional Fig. S2B). ESs >1 were still largely predominant, despite an increase in abstracts with all ESs <1, or with a mix of ESs >1 and ESs <1 (Additional Figs S2C and S3).

Geographic and thematic disparities in reporting of ESs
Europe and North America were by far the biggest providers of abstracts with ESs (Fig. 1A), although the number was growing considerably in Asia (Additional Fig. S2D). There were notable disparities in ES values among geographical areas: they were higher in South America, Africa, and Asia, and lower in Europe, Oceania, and North America (Fig. 1A). ESs were more likely to be significant in regions where they were the highest (Fig. 1B, Additional Table S5). Higher ES values and proportions of significant ESs were found in fields dealing with infectious diseases (Fig. 2, Additional Fig. S4).

ES values are decreasing over time
A major finding was that ES values were decreasing over time. In Fig. 3A, there is a clear, progressive evolution between the 1990s and the 2010s, with a massive concentration of ES values near 1 at the present time. This result was very robust, as the decrease was observed with all tested outcomes per abstract (i.e., minimal, maximal, and mean transformed ES values) (Fig. 3B, C). It also concerned both “risk” and “protective” values (Additional Fig. S5A, B): overall medians of ES values for “risk” decreased from ES ∼ 2.50 in 1990–1995 to ES ∼ 2.11 in 2010–2015, and those for “protective” values moved toward 1, from ES ∼ 0.59 to ES ∼ 0.63. The decrease was observed for all types of ESs when analyzed separately (Additional Fig. S5C). It was also consistent with a diminishing volume of “large” ESs and a proliferation of “tiny” ESs in recent years (Additional Fig. S5D). The trend was found in nearly all fields of biomedical research, with the most marked downward trend concerning genetic phenomena (Fig. 2). It was also found on nearly all continents (Additional Fig. S5E). Abstracts of reviews showed a modest decrease of ESs (Additional Fig. S6A), but the decrease was not found in subgroups of ESs with 90% or 99% CIs (Additional Fig. S6B). Analysis of full-text PMC articles confirmed the decreasing trend for abstracts and tables (τ values of –0.44 and –0.21, respectively; P < 0.001) but not for Results sections (τ = –0.04, P = 0.41) (Additional Fig. S6C).

ESs are becoming more often statistically significant
At the same time as ES values have fallen, the proportion of statistically significant ESs has increased. Again, this finding was consistent for each outcome considered (i.e., presence of at least 1 statistically significant ES per abstract, or proportion of statistically significant ESs per abstract) (Fig. 4A, B), for both “risk” and “protective” ESs, and whatever their type (OR, RR, HR) or the continent in question (Additional Fig. S7A–D). CIs are now narrower than in the past (Fig. 3C), while the limits near 1 are quite stable, and even slightly farther from 1 for the upper limits of “protective” ESs: between 1990–1995 and 2010–2015, overall medians of 95% CI limits evolved from 1.23–4.96 to 1.21–3.54 for “risk” values, and from 0.32–0.95 to 0.42–0.91 for “protective” values. There was no evidence of an increasing trend in abstracts of reviews (Additional Fig. S6D), nor in subgroups of ESs with 90% or 99% CIs (Additional Fig. S6E), but the proportion of statistically significant ESs in PMC full-text articles also increased (τ = +0.50, P < 0.001 for abstracts and Results sections) (Additional Fig. S6F).

Factors associated with observed trends
Both decreasing ESs and increasing significance were found in abstracts with evidence of a multivariate analysis, from Open Access (OA) journals, and from Core Clinical Journals (CCJ) (Fig. 5). However, we found some changes in the general publishing environment: (1) a growing use of multivariate analyses (Additional Fig. S2E), (2) an increasing uptake of Open Access publication (Additional Fig. S2F), and (3) a smaller proportion of abstracts from Core Clinical Journals (Additional Fig. S2G). These changes could accentuate the observed trends because (1) ESs from abstracts with multivariate analysis were lower than unadjusted ES values (with no difference concerning statistical significance) (Fig. 5A, B), (2) ES values reported in abstracts from OA journals were lower than those from non-OA journals (but with a similar proportion of statistical significance) (Fig. 5C, D), and (3) ESs from CCJ also decreased but, above all, became less often statistically significant than those from non-CCJ over time (Fig. 5E, F).

Figure 1: ESs are subject to geographic disparities. (A) Treemap of medians of ESs (T#3) by continent. All detected ESs were considered for the comparisons between continents. For each continent, the size of the rectangle is proportional to the absolute number of abstracts with at least 1 detected ES. ESs from abstracts with cross-continental affiliations (5.2% of abstracts) were counted in each continent concerned. The grayscale indicates median values of ESs (on a linear scale, T#3) by continent: lighter gray corresponds to lower ES values, and darker gray to higher ES values. In rectangles, different letters correspond to statistically different ESs (Kruskal-Wallis pairwise comparisons test). Europe and North America were by far the biggest providers of abstracts with ESs. ES values were higher in abstracts from South America, Africa, and Asia, and lower in abstracts from Europe, Oceania, and North America. Number of abstracts: 238 954. (B) Histogram of mean and standard error of proportions of statistically significant ESs per abstract, according to continent. Beneath the bars, different letters correspond to statistically different values (Kruskal-Wallis pairwise comparisons test). ESs were more likely to be statistically significant in abstracts from South America, Africa, and Asia.

Discussion
Epidemiology has now reached the paradoxical situation where ESs are decreasing remarkably over time, while these same ESs are becoming more and more often statistically significant. We call this surprising phenomenon the in silico effect, by analogy with the evolution of processors (the size of which has decreased as their performance has grown) and because the rise of computer science is, at least indirectly, linked with this general trend (advances in statistical methods and software, availability of huge electronic databases and larger studies, etc.).
The global decrease of ESs could be explained by several inter-related considerations. First, as already pointed out by Taubes in 1995, there could be a true rarefaction over time of undiscovered conspicuous determinants of diseases, such as smoking or alcohol [11]. We showed that this trend could be observed worldwide and in most fields of biomedical research. Second, methodological improvements in biomedical research [12] could also have led to smaller ESs. Most importantly, it is highly probable that larger sample sizes could lead to smaller effect sizes (e.g., through better management of confounders), which are likely to be statistically significant (through an increase in statistical power). Indeed, multivariate analyses are more frequently used as time goes on, which could lead to weaker effects than those obtained with univariate analyses [13]. Third, cultural effects should also be considered. We found that ESs have become smaller in contemporary CCJ. “Modest” ESs (i.e., RR < ∼3) are no longer “discredited,” as may have been the case in the past (e.g., by some former editors of Core Clinical Journals) [11], and slight associations have now become the rule [14]. It is now accepted, at least in some fields of research, that most true associations have small effects [15]. Another kind of cultural explanation appears when different geographical areas are examined: the “five eyes” countries (Australia, Canada, New Zealand, the United Kingdom, and the United States, the greatest producers and influencers of biomedical research) [16] and the Scandinavian monarchies (Denmark, Sweden, and Norway) are among the countries reporting the lowest ESs. Interestingly, it has been shown that scientists from these countries may be more cautious when reporting results, as evidenced by their prominent use of words implying uncertainty in their abstracts [17]. This is also consistent with stronger ESs being found in Asian studies than in the European and American literature, e.g., for gene-disease associations [18]. The desire to “compete” with Europe and the United States may be an explanation [14]. Finally, another explanation would be the file drawer effect (i.e., publication bias) [19, 20], which could mask a more pronounced decrease of ESs than the one we identified by underestimating the amount of null or negative effects. The increased rejection rates and the increased emphasis on risk factors have encouraged editors and authors to select and present manuscripts with bigger effect sizes and/or significant differences [19].
One should not directly interpret this structural trend at the whole-literature level in the same way as has already been described at the level of particular topics in biology [21] or in medical research [22]. Gehr et al. described the “fading of reported effectiveness” in randomized controlled trials [23]. Among several explanations [21], the “Proteus phenomenon” [24] has been described as “rapidly alternating extreme research claims and extremely opposite refutations” [25]. Decreasing ESs in a particular topic are likely to lead to a loss of statistical significance [21], as observed in several cumulative meta-analyses [26]. In contrast, while we also measured decreasing ESs, our findings indicated a clear trend toward a growing proportion of statistically significant results over time. This result is consistent with several other trans-disciplinary meta-research results: a trend toward lower P-values reported in PubMed abstracts between 1990 and 2015 [27], increasing reporting of significant tiny effects in the literature [28], and an increasing proportion of positive results [29].
Although the decrease in ESs over time does not seem problematic in itself, the growing proportion of statistically significant results is more intriguing and may reflect the “publish or perish” context of scientific research. With a growing population of researchers worldwide [30], all competing to obtain funds, and a probable tendency toward placing greater emphasis on novelty and sensationalism [29], maintaining statistically significant results may have become the way to “compensate” for the decrease of ESs. We also found that the growing proportion of statistically significant results was unaffected by the development of Open Access publishing [31] but could be accentuated by the increasing relative importance of Asian papers.

Figure 2: Heatmap of the temporal evolution of median ESs (T#3) by research field. ESs were considered at the abstract level, using the mean of ES(s) per abstract (on a linear scale, T#3). Abstracts were linked to specific research field(s) according to their Medical Subject Headings (MeSH) keywords, so a single abstract could be linked to multiple fields of research (overall ratio: 801 839/229 581 = 3.49). Research fields (at the right of the figure) were defined from 2 main branches of the MeSH Tree (US NLM): [C] “diseases” and [G] “phenomena and processes.” Numbers in brackets are the total number of abstracts with at least 1 detected ES during the 25-year period in a specific research field. Three branches (out of 43) with fewer than 1000 abstracts with at least 1 ES were eliminated. Trends were calculated at the monthly level but are represented in the graph at the yearly level for readability. The grayscale indicates yearly median values of ESs (T#3): lighter gray corresponds to lower ES values, and darker gray to higher ES values. On the left, research fields are grouped using a hierarchical cluster analysis and represented as a dendrogram: higher ESs are found in fields dealing with infectious diseases (e.g., microbiological phenomena, virus diseases, bacterial infections and mycoses; see top of the figure). The color scale indicates the τ value of the evolution of monthly medians of ESs for each research field. Blank rectangles indicate nonsignificant trends. Colored rectangles are red (not blue), with variable intensity, indicating a significant monotonic downward trend of ESs in nearly all research fields. The most marked decrease is observed for genetic phenomena (τ = –0.52, P < 0.001). Number of abstracts: 229 581.

Figure 3: ESs are decreasing over time. (A) Heatmap of the temporal evolution of ESs (i.e., odds ratio, relative risk, or hazard ratio) on their original log scale (T#1) (see details in the Supplementary Methods). All detected ESs (n = 690 196) are considered. ESs <1 were transformed according to the T#1 transformation (inverse transformation) (Additional Table S3A).
ESs >1 were not transformed. The vertical axis corresponds to a logarithmic scale ranging from 1 to 100, with 25 regular cutoff values (ESs that were >100, corresponding to 0.16% of all detected ESs, are not reported on the graph). The color scale indicates the monthly relative proportion of ESs in each interval: cold colors correspond to lower proportions and hot colors to higher ones. We can see a trend toward a massive concentration of ES values near 1 at present. The black dots represent the overall relative proportion of ESs, by year and by interval. We can see that the lowest ESs of the more recent abstracts are the most numerous ESs overall. (B) Scatter plot of the temporal evolution of monthly medians of ESs on a linear scale (T#3). ESs were considered at the abstract level (n = 247 339). Three different outcomes were considered: minimal, maximal, and mean of ES(s) of each abstract. The 3 temporal evolutions are decreasing, with τ values of –0.64 (P < 0.001), –0.59 (P < 0.001), and –0.63 (P < 0.001), respectively. (C) Scatter plot of the temporal evolution of monthly medians of confidence interval (CI) magnitudes on a linear scale (T#3). CI magnitudes were considered at the abstract level (n = 247 339). Three different outcomes were considered: minimal, maximal, and mean of CI magnitude(s) of each abstract. The 3 temporal evolutions are decreasing, with τ values of –0.76 (P < 0.001), –0.67 (P < 0.001), and –0.72 (P < 0.001), respectively.

Among the limitations of this study is the incomplete representation of the different possible ES metrics [2]: RRs, ORs, and HRs are not the only way to report measures of association. Although it is mathematically conceivable to standardize other ES metrics (e.g., to convert Cohen’s d, Hedges’ g, and correlation coefficients to odds ratios following standard transformations [32], as already done in other meta-research [33]), we could not perform data mining on all existing metrics with sufficient accuracy to guarantee the best measurement quality. However, it is rather unlikely that the trends observed in silico would be specific to particular metrics. We also did not filter out RRs, ORs, or HRs expressed per unit of a continuous variable, but this limitation should not have any effect on temporal trends. One could argue that the heterogeneity of the data that forms the basis of the analysis makes it impossible to infer the meaning of these trends: ESs reflect the effects of continuous, categorical, or binary measures and include risk factors for diseases, treatment effects of new drugs vs placebo, genetic effects, effects of risk scores, etc. However, considering the biomedical literature as a whole is the only way to assess macro-trends in the way ESs are reported. Given that the practical interpretation of ESs has not really changed over time, it is important to identify such trends.

Other limitations are related to the data available in the XML files of PubMed abstracts and to the automatic nature of the data mining process: both these considerations prevented us from carrying out in-depth analyses of results in relation to factors such as sample size, study quality, or conflicts of interest.

6 Monsarrat and Vergnes

Figure 4: Proportions of statistically significant ESs have increased with time in 247 339 abstracts. (A) Scatter plot of the temporal evolution of monthly proportions of abstracts with at least 1 statistically significant ES. There is a monotonic upward trend: τ value = 0.65 (P < 0.001). (B) Scatter plot of the temporal evolution of the monthly mean of proportions of statistically significant ESs per abstract. There is a monotonic upward trend: τ value = 0.77 (P < 0.001).

Potential implications

In this era of alternative truths and bullying of the press, the public and politicians need a science of epidemiology that is credible and trustworthy. Echoing Taubes [11], it is still important for epidemiology to avoid becoming an “unending source of fear,” with too many studies having too little real impact on public health. The medical and research community should acknowledge the forces and constraints that influence the design of studies and the way their results are interpreted, because they have a significant impact on health decisions and policies.

We suggest that biomedical researchers should be skilled in meta-research in order to take a bird’s eye view of science [34]. More than ever, efforts to improve the credibility of biomedical research and limit waste of resources must be continued [35]. This implies important provisions, described by Ioannidis [36], among others, such as the adoption of a replication culture, changes in the way statistical methods are designed and used in the reporting and interpretation of results [37], and modifications in the reward system of science [38], to name but a few. From our results, we would add that Core Clinical Journals deserve particular consideration when making health decisions and policies: their role both in maintaining the quality of research and in filtering articles of clinical or scientific importance seems to be growing. Finally, intensifying transdisciplinarity with the humanities would help epidemiologists to provide research that would be regarded in terms of its “potential uses and misuses in serving and affecting the human condition” [39].

Methods

We followed a Knowledge Discovery in Databases (KDD) approach. The KDD process is iterative and involves several steps, combining automated methods with human decisions [40]. The following subsections describe all final iterations. The overall process is described in Additional Fig. S1A–C. Algorithms and statistical scripts are explained in the Supplementary Information and are downloadable [41].

Data mining

Using an iterative process, we developed an algorithm aimed at automatically detecting the 3 main types of ESs (OR, RR, HR) in PubMed abstracts. As terminology was poorly standardized, we iteratively refreshed a list of ES terms frequently used in biomedical research, e.g., “RR,” “OR,” “HR,” “relative risk,” “odds ratio,” “hazard ratio,” “aRR,” “aOR,” “aHR,” etc. (Additional Table S1). We also filtered out numeric values not likely to be ES values and checked for polysemy of acronyms. The algorithm [41] was tailored to detect the full wording of all medical abbreviations whose reported values could be confused with those of ES terms using the same abbreviation (e.g., “respiratory rate” for RR, “ovulation rate” for OR, “heart rate” for HR) (Additional Table S1). Each attempt to improve the detection of ESs was tested for diagnostic performance on random samples of 200 abstracts, and iterations were validated if both sensitivity and specificity were improved. At the final iteration, a sensitivity greater than 95% and a specificity of 99.9% (interobserver κ > 0.97) were reached (Supplementary Methods, Additional Table S2, and Supplementary File 1 for performance testing).

The algorithm automatically recognized the type of ES, its value, and the values of the upper and lower limits of its CI (Supplementary Methods). Other characteristics of the citation that the ES was drawn from were retrieved: PubMed identifier (PMID), PMC identifier (PMCID) when available, month/year of publication, authors’ affiliation country(ies), Medical Subject Headings (MeSH) keywords, detection of a multivariate analysis (yes/no), OA publication (yes/no), publication in a CCJ (yes/no), CI level (i.e., 90%, 95%, or 99%), and type of publication (“review”: yes/no).

Given the small number of abstracts indexed per year before 1990 [27] and the as-yet incomplete indexing of abstracts from 2016, only the 1990–2015 period was considered. This process led to the generation of a comprehensive database of 814 120 ES values (fully available in GigaDB [41]).

Data transformation

By nature, OR/RR/HR values are expressed on a logarithmic scale (between 0 and 1 for “protective” values, and between 1 and +∞ for “risk” values). The logarithms of these ESs have the useful property of being approximately normally distributed [42].

Figure 5: Factors associated with observed trends. Scatter plots of the temporal evolution of monthly medians of ESs (A, C, E) or mean proportions of statistically significant ESs per abstract (B, D, F), according to the presence (yes/no) of the following factors: a multivariate analysis (A, B), the Open Access status of the article (C, D), or the “Core Clinical Journal” status of the article (E, F). The full line represents the temporal trend for abstracts with evidence of the factor, and the dotted line the trend for abstracts without evidence of the factor. ESs were considered at the abstract level. The outcome was the mean of ES(s) of each abstract (on a linear scale, T#3). (A) ESs from abstracts with multivariate analysis were generally lower than values from abstracts without multivariate analysis during the 25-year period (P < 0.001, Mann-Whitney test). (B) There was no statistical difference between the 2 categories regarding statistical significance during the 25-year period (P = 0.59, Mann-Whitney test).
Number of abstracts: 136 724 and 110 615 abstracts with and without multivariate analysis, respectively. (C) ESs from Open Access abstracts were generally lower than values from non–Open Access abstracts during the 25-year period (P < 0.001, Mann-Whitney test). (D) There was no statistical difference between the 2 categories regarding statistical significance during the 25-year period (P = 0.57, Mann-Whitney test). Number of abstracts: 92 040 Open Access and 155 299 non–Open Access abstracts. (E) ESs from CCJ abstracts were generally lower than values from non-CCJ abstracts during the 25-year period (P < 0.001, Mann-Whitney test), especially from around the year 2000 onwards. (F) There was no difference between the 2 categories regarding statistical significance during the 25-year period (P = 0.08, Mann-Whitney test). However, we can see that the curves cross around 2005. When the period between 2005 and 2015 was considered, ESs from CCJ abstracts were less often statistically significant (P < 0.001, Mann-Whitney test).

Moreover, the absolute value of the ln-transformed ESs provides a standardization of “protective” and “risk” values. Depending on whether ES values were normalized and/or standardized, 4 different transformations were defined (rationale and mathematical explanations in Additional Table S3A).

Data analysis

Outcomes

We defined 3 types of ESs: ORs, RRs, and HRs. Original ES values were categorized as:
- “protective” if <1, “risk” if >1, and “neutral” if =1;
- “large” [43] when ≤0.2 or ≥5, and “tiny” [28] if between 0.95 and 1.05;
- statistically significant if the CI did not encompass 1.

As multiple ESs are often found within a single abstract, for analyses at the abstract level, ES values were condensed in different ways (Additional Table S3B):
- minimal and maximal ES values per abstract (i.e., the value nearest to 1 and the value farthest from 1, respectively);
- mean of ES values per abstract (after logarithmic transformation);
- magnitude of CIs (minimal, maximal, and mean per abstract, after logarithmic transformation);
- presence of at least 1 statistically significant ES value in the abstract (yes/no) and proportion of statistically significant ESs per abstract.

Primary analyses were confined to non-reviews to avoid overrepresentation of some ES values, and to ESs with 95% CI to allow magnitude comparisons of CIs.

Analysis plan

An iterative analysis plan was designed for the 3 aims of the study. Specific objectives were listed (Additional Table S4).

Statistical analyses

Descriptive analyses involved calculations of frequency distributions, percentages, means, and tabular statistics for the reporting of ESs (both by type of ES and all types taken together, for readability purposes). The monotonic upward or downward trend of monthly medians of ES values over time was assessed using the Mann-Kendall (MK) test [44]. ES comparisons between classes of binary variables were tested using Mann-Whitney statistics. A Kruskal-Wallis test with pairwise comparisons (Dunn’s test for multiple comparisons) was used to compare values across continents. The significance level of statistical tests was set at P < 0.001. Statistics and graphics for data visualization were produced using R 3.2.3 (Vienna, Austria, 2015; R Project for Statistical Computing, RRID:SCR_001905). A “loess” fitted curve [45] was added to scatterplots in order to visualize temporal trends.

Knowledge checking [40]

Systematic reviews and other types of CI

Complementary analyses on the temporal evolution of ESs were conducted on 2 subgroups not included in the primary analyses: ESs detected in citations identified as “review” and ESs with CI at 90% or 99% (Additional Fig. S2H, I).

PMC database

As an abstract may not be fully representative of the full-text article, we extended the data mining process to full-text articles; 64 829 citations with a PMCID number were thus selected from the comprehensive database. XML data from the corresponding PMC articles (25 868 available articles) were then downloaded, and a similar data mining strategy was applied to the Results sections: 135 542 values were detected; 589 743 ESs were also detected within tables and analyzed separately [41].

Availability of source code and requirements

Project name: PubMed ES Detector
Source code available at: https://github.com/gigascience/paper-monsarrat2017 and http://dx.doi.org/10.5524/100385
Operating systems: platform independent
Programming language: Perl
License: GNU GPL v3

Availability of supporting data and materials

Further supporting data are available in the GigaScience repository, GigaDB [41]. The dataset contains the comprehensive database of detected ESs in PubMed, the database of detected ESs in PubMed Central, and snapshots of the source code of the program that helped to generate these databases. Three specific modules were developed: ES_detector.pm, Load_module.pm, and Mesh_detector.pm. The flow diagram of the program can be found in Additional Fig. S1.

Additional files

Additional information may be found in the Supplementary Information pdf file.
The Supplementary Methods contains additional information about the data mining method, programming algorithm, performance tests of the algorithm, and definition of citation characteristics.
Additional Table 1: Check for polysemy of terms related to types of ESs. The algorithm checked for the polysemy of acronyms. Through the MediLexicon online database of pharmaceutical and medical abbreviations (http://www.medilexicon.com/), all potential synonyms were identified by text mining on the entire abstract. All the terms considered are presented below. Using regular expressions, some variations were considered to increase the detection of ES acronyms: presence or absence of plural, hyphen, or spaces. The presence of any of these terms in an abstract oriented the data mining process toward a more restrictive procedure in order to minimize the “false positive” rate.
Additional Table 2: Examples of “undetectable” ESs, false-negative ESs, and false-positive ESs.
Additional Table 3: Mathematical transformations and main outcomes.
Additional Table 4: Summary table of the analysis plan.
Additional Table 5: Geographical analysis.
Additional Figure 1: Overview of the “Knowledge Discovery in Databases” (KDD) approach used in this study: the different steps that compose the KDD process, the flowchart of the algorithm for PubMed data mining, and the flow diagram of the selection process for abstracts.
Additional Figure 2: Descriptive analysis of the comprehensive database and descriptive analysis of ESs in abstracts.
Additional Figure 3: Histogram distribution of the effect sizes.
Additional Figure 4: Heatmap of the temporal evolution of the proportion of statistically significant ESs per abstract: disparities among fields of research.
Additional Figure 5: Descriptive analysis of ES values in abstracts for protective and risk values, type of ESs, tiny and large effects, and geographical areas.
Additional Figure 6: Descriptive analysis of ES values and significance from reviews, according to confidence intervals, from PMC full texts.
Additional Figure 7: Descriptive analysis of ES significance in abstracts for protective and risk values, type of ESs, tiny and large effects, and geographical areas.
Supplementary File 1 contains additional information about performance testing: kappa, sensitivity, and specificity.
Supplementary References

Abbreviations

CCJ: Core Clinical Journal; CI: confidence interval; ES: effect size; HR: hazard ratio; KDD: Knowledge Discovery in Databases; MK: Mann-Kendall; NLM: National Library of Medicine; OA: Open Access; OR: odds ratio; PMC: PubMed Central; PMID: PubMed ID; RR: relative risk; XML: eXtensible Markup Language.

Competing financial interests

The authors declare that they have no competing interests.

Funding

This work was supported by Toulouse University Hospital (CHU de Toulouse), Toulouse University (Université Paul Sabatier), the Midi-Pyrenees region, the research platform of the Toulouse Dental Faculty (PLTRO), and the French National Research Agency (Agence Nationale de la Recherche, ANR, http://dx.doi.org/10.13039/501100001665) under grant ANR-16-CE18-0019-01.

Author contributions

P.M. and J.N.V. designed the research, analyzed and interpreted the data, performed the statistical analysis, and drafted the manuscript. P.M. acquired the data and coded the algorithm. J.N.V. supervised the study.

Acknowledgements

The authors thank Ms. Susan Becker for her assistance with English language editing.

References

1. Rosenthal JA. Qualitative descriptors of strength of association and effect size. J Soc Serv Res 1996;21(4):37–59.
2. Durlak JA. How to select, calculate, and interpret effect sizes. J Pediatr Psychol 2009;34(9):917–28.
3. Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev 2014;MR000034.
4. Schachter J, Hill EC, King EB et al. Chlamydia trachomatis and cervical neoplasia. JAMA 1982;248(17):2134–8.
5. National Research Council (US) Committee on Environmental Epidemiology, National Research Council (US) Commission on Life Sciences. Environmental-Epidemiology Studies: Their Design and Conduct. Washington, DC: The National Academies Press; 1997. https://www.ncbi.nlm.nih.gov/books/NBK233644/. Accessed 29 September 2016.
6. Khoury MJ, Beaty TH, Cohen BH. Fundamentals of Genetic Epidemiology. Oxford: Oxford University Press; 1993.
7. Crowther MA, Ginsberg J, Schünemann H et al. Evidence-Based Hematology. Oxford, UK: John Wiley & Sons; 2009.
8. Lang T, Altman D. Basic statistical reporting for articles published in biomedical journals: The “Statistical Analyses and Methods in the Published Literature” or “The SAMPL Guidelines.” Oxford, UK: Science Editors’ Handbook, European Association of Science Editors; 2013.
9. Committee on Decision Making Under Uncertainty, Board on Population Health and Public Health Practice, Institute of Medicine. Environmental Decisions in the Face of Uncertainty. Washington, DC: National Academies Press; 2013. http://www.ncbi.nlm.nih.gov/books/NBK200848/. Accessed 3 October 2016.
10. Falagas ME, Giannopoulou KP, Issaris EA et al. World databases of summaries of articles in the biomedical fields. Arch Intern Med 2007;167(11):1204–6.
11. Taubes G. Epidemiology faces its limits. Science 1995;269(5221):164–9.
12. Reveiz L, Chapman E, Asial S et al. Risk of bias of randomized trials over time. J Clin Epidemiol 2015;68(9):1036–45.
13. Serghiou S, Patel CJ, Tan YY et al. Field-wide meta-analyses of observational associations can map selective availability of risk factors and the impact of model specifications. J Clin Epidemiol 2016;71:58–67.
14. Ioannidis JPA. Exposure-wide epidemiology: revisiting Bradford Hill. Statist Med 2016;35(11):1749–62.
15. Khoury MJ, Little J, Gwinn M et al. On the synthesis and interpretation of consistent but weak gene-disease associations in the era of genome-wide association studies. Int J Epidemiol 2007;36(2):439–45.
16. Xu Q, Boggio A, Ballabeni A. Countries’ biomedical publications and attraction scores. A PubMed-based assessment [version 2; referees: 2 approved]. F1000Research 2015;3:292. doi:10.12688/f1000research.5775.2.
17. Netzel R, Perez-Iratxeta C, Bork P et al. The way we write: country-specific variations of the English language in the biomedical literature. EMBO Rep 2003;4(5):446–51.
18. Pan Z, Trikalinos TA, Kavvoura FK et al. Local literature bias in genetic epidemiology: an empirical evaluation of the Chinese literature. PLoS Med 2005;2(12):e334.
19. Pautasso M. Worsening file-drawer problem in the abstracts of natural, medical and social science databases. Scientometrics 2010;85(1):193–202.
20. Mueck L. Report the awful truth! Nat Nanotechnol 2013;8:693–5.
21. Koricheva J, Jennions M, Lau J. Temporal Trends in Effect Sizes: Causes, Detection, and Implications. Princeton, NJ: Princeton University Press; 2013. https://openresearch-repository.anu.edu.au/handle/1885/65531. Accessed 23 December 2016.
22. Ioannidis JPA, Lau J. Evolution of treatment effects over time: empirical insight from recursive cumulative metaanalyses. Proc Natl Acad Sci U S A 2001;98(3):831–6.
23. Gehr BT, Weiss C, Porzsolt F. The fading of reported effectiveness. A meta-analysis of randomised controlled trials. BMC Med Res Methodol 2006;6(1):25.
24. Ioannidis JPA, Trikalinos TA. Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol 2005;58(6):543–9.
25. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124.
26. Trikalinos TA, Churchill R, Ferri M et al. Effect sizes in cumulative meta-analyses of mental health randomized trials evolved over time. J Clin Epidemiol 2004;57(11):1124–30.
27. Chavalarias D, Wallach J, Li A et al. Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA 2016;315(11):1141–8.
28. Siontis GCM, Ioannidis JPA. Risk factors and interventions with statistically significant tiny effects. Int J Epidemiol 2011;40(5):1292–307.
29. Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics 2012;90(3):891–904.
30. Pautasso M. Publication growth in biological sub-fields: patterns, predictability and sustainability. Sustainability 2012;4(12):3234–47.
31. Kurata K, Morioka T, Yokoi K et al. Remarkable growth of open access in the biomedical field: analysis of PubMed articles from 2006 to 2010. PLoS One 2013;8(5):e60925. doi:10.1371/journal.pone.0060925.
32. Lipsey MW, Wilson D. Practical Meta-Analysis. 1st edition. Thousand Oaks, CA: SAGE Publications, Inc;
33. Fanelli D, Ioannidis JPA. US studies may overestimate effect sizes in softer research. Proc Natl Acad Sci U S A 2013;110(37):15031–6.
34. Ioannidis JP, Fanelli D, Dunne DD et al. Meta-research: evaluation and improvement of research methods and practices. PLoS Biol 2015;13(10):e1002264. doi:10.1371/journal.pbio.1002264.
35. Macleod MR, Michie S, Roberts I et al. Biomedical research: increasing value, reducing waste. Lancet North Am Ed 2014;383:101–4.
36. Ioannidis JPA. How to make more published research true. PLoS Med 2014;11:e1001747.
37. Sterne JAC, Smith GD. Sifting the evidence: what’s wrong with significance tests? Another comment on the role of statistical methods. BMJ 2001;322:226–31.
38. Ioannidis JPA, Khoury MJ. Assessing value in biomedical research. JAMA 2014;312:483–4.
39. Giordano J. Quo vadis? Philosophy, ethics, and humanities in medicine-preserving the humanistic character of medicine in a biotechnological future. Philos Ethics Humanit Med 2009;4:12.
40. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. AI Mag 1996;17:37–54.
41. Monsarrat P, Vergnes J. Supporting data for “The intriguing evolution of effect sizes in biomedical research over time: smaller but more often statistically significant.” GigaScience Database 2017. http://dx.doi.org/10.5524/100385.
42. Bland JM, Altman DG. Statistics notes: the odds ratio. BMJ 2000;320:1468.
43. Pereira TV, Horwitz RI, Ioannidis JA. Empirical evaluation of very large treatment effects of medical interventions. JAMA 2012;308:1676–84.
44. Esterby SR. Review of methods for the detection and estimation of trends with emphasis on water quality applications. Hydrol Process 1996;10:127–49.
45. Jacoby WG. Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies 2000;19:577–613.

Journal

GigaScience, Oxford University Press

Published: Jan 1, 2018
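Among the limitations, the authors note that other ES metrics could in principle be converted to odds ratios "following standard transformations [32]". The following is a minimal Python sketch of such a conversion. It assumes the widely used logistic approximation ln(OR) = d·π/√3 for Cohen's d or Hedges' g, and d = 2r/√(1 − r²) for a correlation coefficient r. Whether [32] uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math

# Logistic approximation: ln(OR) = d * pi / sqrt(3). This is ASSUMED to be the
# "standard transformation"; reference [32] may describe other variants.
LOGIT_SCALE = math.pi / math.sqrt(3)


def d_to_odds_ratio(d: float) -> float:
    """Convert a standardized mean difference (Cohen's d or Hedges' g) to an OR."""
    return math.exp(d * LOGIT_SCALE)


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Inverse conversion: an odds ratio back to a standardized mean difference."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) / LOGIT_SCALE


def r_to_d(r: float) -> float:
    """Convert a correlation coefficient r to d with d = 2r / sqrt(1 - r^2)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # Cohen's conventional small, medium, and large d values.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d:.1f} -> OR = {d_to_odds_ratio(d):.2f}")
    # A correlation can be chained through d to reach an odds ratio.
    print(f"r = 0.3 -> OR = {d_to_odds_ratio(r_to_d(0.3)):.2f}")
```

Under this approximation, d = 0.5 maps to an OR of about 2.48. This is the kind of re-expression that would let other metrics be compared with the ORs mined from abstracts.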

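The Data mining subsection describes three ingredients: a list of ES terms, a filter on numeric values, and a restrictive procedure when an acronym is polysemous. The authors' Perl modules are not reproduced here. Instead, the following is a hypothetical, much-simplified Python illustration of the same idea, with three parts:

- a regular expression for "term, value, CI" statements;
- a plausibility filter (0 < lower ≤ value ≤ upper);
- a restrictive mode that ignores a bare acronym when its competing full wording (e.g., "heart rate" for HR) appears in the same abstract.

The term list, the pattern, the default 95% CI level, and all names are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List

# HYPOTHETICAL, much-simplified term list (the real one is in Additional Table S1).
# Keys are lower case because matching is case-insensitive.
ES_TERMS = {
    "odds ratio": "OR", "aor": "OR", "or": "OR",
    "relative risk": "RR", "risk ratio": "RR", "arr": "RR", "rr": "RR",
    "hazard ratio": "HR", "ahr": "HR", "hr": "HR",
}
# Competing full wordings that make a bare acronym ambiguous (polysemy).
POLYSEMY = {"RR": "respiratory rate", "OR": "ovulation rate", "HR": "heart rate"}

NUM = r"\d+(?:\.\d+)?"
TERMS = "|".join(sorted(map(re.escape, ES_TERMS), key=len, reverse=True))
ES_PATTERN = re.compile(
    rf"\b(?P<term>{TERMS})s?\b\s*[=:,]?\s*(?P<value>{NUM})\s*"  # "OR = 1.52"
    rf"[(\[,;]?\s*(?:(?P<level>90|95|99)\s*%\s*)?CI\s*[=:,]?\s*"  # ", 95% CI"
    rf"(?P<lower>{NUM})\s*(?:-|–|to|,)\s*(?P<upper>{NUM})",  # "1.21-1.90"
    re.IGNORECASE,
)


@dataclass
class EffectSize:
    es_type: str   # "OR", "RR" or "HR"
    value: float
    lower: float
    upper: float
    ci_level: int  # 95 is ASSUMED when the level is not written


def detect_effect_sizes(abstract: str) -> List[EffectSize]:
    """Return plausible ESs (with CIs) found in one abstract; illustrative only."""
    text_lower = abstract.lower()
    found = []
    for m in ES_PATTERN.finditer(abstract):
        key = m.group("term").lower()
        es_type = ES_TERMS[key]
        # Restrictive mode: skip a bare acronym when its competing full wording
        # (e.g. "heart rate" for HR) also occurs in the abstract.
        if key == es_type.lower() and POLYSEMY[es_type] in text_lower:
            continue
        value, lower, upper = (float(m.group(g)) for g in ("value", "lower", "upper"))
        # Numeric filter: discard numbers that cannot be a ratio inside its CI.
        if not 0 < lower <= value <= upper:
            continue
        level = int(m.group("level")) if m.group("level") else 95
        found.append(EffectSize(es_type, value, lower, upper, level))
    return found
```

Applied to "(OR = 1.52, 95% CI 1.21-1.90)", the function returns a single OR with its 95% CI. A bare "HR 0.80 (95% CI 0.70-0.91)" is dropped when "heart rate" also occurs in the abstract, but the same estimate written out as "hazard ratio" is kept.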

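The validation step in the Data mining subsection relies on sensitivity, specificity, and an interobserver κ computed on random samples of 200 abstracts. The standard definitions are shown in the following Python sketch. It assumes that each candidate ES mention is one yes/no item, judged both by a human reader (the reference) and by the algorithm. `iteration_accepted` is a hypothetical, literal reading of "validated if both sensitivity and specificity were improved".

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of automatic detection against manual reading.

    Each item is ASSUMED to be one candidate ES mention (True = real ES).
    """
    if len(truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative item")
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters giving yes/no judgements on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def iteration_accepted(previous: Tuple[float, float], candidate: Tuple[float, float]) -> bool:
    """HYPOTHETICAL literal rule: keep a new iteration only if both metrics improve."""
    return candidate[0] > previous[0] and candidate[1] > previous[1]
```

Requiring both metrics to rise keeps an added term from buying sensitivity at the cost of new false positives. This matters when specificity is already close to 99.9%.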

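The Data transformation and Outcomes subsections define three things:

- the inversion used for Figure 3 (T#1);
- the |ln ES| standardization;
- the outcome categories.

The following Python sketch encodes only what the text states. The other transformations of Additional Table S3 are not reproduced. Two boundary rules are assumptions: the 0.95 to 1.05 "tiny" range is treated as inclusive, and a CI limit exactly equal to 1 counts as encompassing 1. The function names are hypothetical.

```python
import math
from typing import Dict, Union


def inverse_transform(es: float) -> float:
    """T#1 as described for Figure 3: ESs below 1 are inverted, others are kept."""
    if es <= 0:
        raise ValueError("a ratio must be positive")
    return 1.0 / es if es < 1 else es


def abs_log(es: float) -> float:
    """|ln ES|: protective (<1) and risk (>1) values on one non-negative scale."""
    return abs(math.log(es))


def categorize(es: float, lower: float, upper: float) -> Dict[str, Union[str, bool]]:
    """Outcome categories from the Outcomes subsection (thresholds from the text)."""
    if es < 1:
        direction = "protective"
    elif es > 1:
        direction = "risk"
    else:
        direction = "neutral"
    return {
        "direction": direction,
        "large": es <= 0.2 or es >= 5,
        # "between 0.95 and 1.05": inclusive bounds are an ASSUMPTION.
        "tiny": 0.95 <= es <= 1.05,
        # Significant when the CI does not encompass 1 (a limit equal to 1 is
        # treated as encompassing it, which is an ASSUMPTION).
        "significant": not (lower <= 1 <= upper),
    }
```

For any positive ES, `inverse_transform(es)` equals `exp(abs_log(es))`. For example, an OR of 0.5 and an OR of 2 become indistinguishable after standardization, which is what allows "protective" and "risk" values to be analyzed together.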

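For abstract-level analyses, the Outcomes subsection condenses the ESs of one abstract into several values. The following is a minimal Python sketch, assuming each ES arrives as an (ES, lower, upper) triple with a 95% CI. It makes three choices:

- "nearest to 1" and "farthest from 1" are measured on the |ln ES| scale;
- the mean is taken over |ln ES|. The text says only "after logarithmic transformation", so using |ln| rather than ln is an assumption;
- CI magnitude is ln(upper) − ln(lower).

All names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple


@dataclass
class AbstractSummary:
    nearest_to_1: float      # "minimal" ES: smallest |ln ES|
    farthest_from_1: float   # "maximal" ES: largest |ln ES|
    mean_abs_log: float      # mean of |ln ES| (|ln| rather than ln is ASSUMED)
    ci_width_min: float      # CI magnitudes measured as ln(upper) - ln(lower)
    ci_width_max: float
    ci_width_mean: float
    any_significant: bool
    prop_significant: float


def summarize_abstract(effects: Sequence[Tuple[float, float, float]]) -> AbstractSummary:
    """Condense the (ES, lower, upper) triples of one abstract into abstract-level outcomes."""
    if not effects:
        raise ValueError("abstract has no effect sizes")
    dist = [abs(math.log(es)) for es, _, _ in effects]
    widths = [math.log(hi) - math.log(lo) for _, lo, hi in effects]
    signif = [not (lo <= 1 <= hi) for _, lo, hi in effects]
    nearest = min(range(len(effects)), key=dist.__getitem__)
    farthest = max(range(len(effects)), key=dist.__getitem__)
    return AbstractSummary(
        nearest_to_1=effects[nearest][0],
        farthest_from_1=effects[farthest][0],
        mean_abs_log=mean(dist),
        ci_width_min=min(widths),
        ci_width_max=max(widths),
        ci_width_mean=mean(widths),
        any_significant=any(signif),
        prop_significant=sum(signif) / len(signif),
    )
```

For an abstract reporting OR 0.5 (0.3 to 0.8), RR 1.1 (0.9 to 1.3), and HR 3.0 (1.5 to 6.0), the sketch returns 1.1 as the value nearest to 1 and 3.0 as the farthest. Two of the three ESs are statistically significant.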

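The Statistical analyses subsection assesses monotonic trends in monthly medians with the Mann-Kendall test [44]. The following Python sketch shows the textbook form of the test:

- S is the sum of the signs of all later-minus-earlier differences;
- its variance includes the usual correction for ties;
- z uses a continuity correction, and the two-sided P value comes from the normal approximation;
- τ is reported as Kendall's tau-a.

The authors ran the test in R, so the exact τ variant they report may differ slightly when ties are present. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float, float, float]:
    """Mann-Kendall trend test on a time-ordered series.

    Returns (S, tau, z, two-sided p). tau is Kendall's tau-a (an R implementation
    may report a tie-adjusted variant instead); p uses the normal approximation.
    """
    x = list(series)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    # Tie correction: sum over groups of equal values of t(t - 1)(2t + 5).
    counts: Dict[float, int] = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    tau = s / (n * (n - 1) / 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return s, tau, z, p
```

Applied to a series of monthly medians ordered by date, a negative τ with P < 0.001 corresponds to the kind of decreasing trend reported in Figure 3. A perfectly increasing series of 10 values gives τ = 1.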