Biannual azithromycin distribution and child mortality among malnourished children: A subgroup analysis of the MORDOR cluster-randomized trial in Niger
O’Brien, Kieran S.; Arzika, Ahmed M.; Maliki, Ramatou; Manzo, Farouk; Mamkara, Alio K.; Lebas, Elodie; Cook, Catherine; Bailey, Robin L.; West, Sheila K.; Oldenburg, Catherine E.; Porco, Travis C.; Arnold, Benjamin; Keenan, Jeremy D.; Lietman, Thomas M.; for the MORDOR Study Group
doi: 10.1371/journal.pmed.1003285
pmid: 32931496
Background Biannual azithromycin distribution has been shown to reduce child mortality as well as increase antimicrobial resistance. Targeting distributions to vulnerable subgroups such as malnourished children is one approach to reaching those at the highest risk of mortality while limiting selection for resistance. The objective of this analysis was to assess whether the effect of azithromycin on mortality differs by nutritional status. Methods and findings A large simple trial randomized communities in Niger to receive biannual distributions of azithromycin or placebo to children 1–59 months old over a 2-year timeframe. In exploratory subgroup analyses, the effect of azithromycin distribution on child mortality was assessed for underweight subgroups using weight-for-age Z-score (WAZ) thresholds of −2 and −3. Modification of the effect of azithromycin on mortality by underweight status was examined on the additive and multiplicative scale. Between December 2014 and August 2017, 27,222 children 1–11 months of age from 593 communities had weight measured at their first study visit. Overall, the average age among included children was 4.7 months (interquartile range [IQR] 3–6), 49.5% were female, 23% had a WAZ < −2, and 10% had a WAZ < −3. This analysis included 523 deaths in communities assigned to azithromycin and 661 deaths in communities assigned to placebo. The mortality rate was lower in communities assigned to azithromycin than placebo overall, with larger reductions among children with lower WAZ: −12.6 deaths per 1,000 person-years (95% CI −18.5 to −6.9, P < 0.001) overall, −17.0 (95% CI −28.0 to −7.0, P = 0.001) among children with WAZ < −2, and −25.6 (95% CI −42.6 to −9.6, P = 0.003) among children with WAZ < −3. 
No statistically significant evidence of effect modification was demonstrated by WAZ subgroup on either the additive or multiplicative scale (WAZ < −2, additive: 95% CI −6.4 to 16.8, P = 0.34; WAZ < −2, multiplicative: 95% CI 0.8 to 1.4, P = 0.50; WAZ < −3, additive: 95% CI −2.2 to 31.1, P = 0.14; WAZ < −3, multiplicative: 95% CI 0.9 to 1.7, P = 0.26). The estimated number of deaths averted with azithromycin was 388 (95% CI 214 to 574) overall, 116 (95% CI 48 to 192) among children with WAZ < −2, and 76 (95% CI 27 to 127) among children with WAZ < −3. Limitations include the availability of a single weight measurement on only the youngest children and the lack of power to detect small effect sizes with this rare outcome. Despite the trial’s large size, formal tests for effect modification did not reach statistical significance at the 95% confidence level. Conclusions Although mortality rates were higher in the underweight subgroups, this study was unable to demonstrate that nutritional status modified the effect of biannual azithromycin distribution on mortality. Even if the effect were greater among underweight children, a nontargeted intervention would result in the greatest absolute number of deaths averted. Trial registration The MORDOR trial is registered at clinicaltrials.gov NCT02047981. Why was this study done? Previous research has found that distributing oral azithromycin to children 1–59 months old biannually reduces mortality in that age group. Despite the promise of this intervention to promote child survival in high-mortality settings, community distribution of antibiotics like azithromycin has the potential to increase antimicrobial resistance, which could reduce the efficacy of azithromycin and similar antibiotics over time. To harness the benefit of the intervention while reducing the potential for increasing resistance, some propose targeting the intervention to smaller groups of the population that face a particularly high risk of mortality.
As malnutrition is implicated in a large proportion of childhood mortality, malnourished children are one potential subgroup to target. What did the researchers do and find? We conducted a subgroup analysis using data from a trial that compared communities randomized to azithromycin distribution or placebo distribution to children 1–59 months over 2 years in Niger. We included 27,222 children from the original trial who had weight measured at their first study visit and compared the effect of azithromycin to placebo on mortality among those who were not underweight, were underweight, and were severely underweight, and we calculated the number of deaths that would have been averted if this intervention had been given to the overall group as well as these groups defined by underweight status. The observed effect of azithromycin on mortality was strongest in the severely underweight subgroup, but we found no statistical evidence that the effect differed by underweight status. As only 10% of children were in the severe underweight group, the overall number of deaths averted would be greatest with an intervention treating all children 1–11 months, rather than only underweight children. What do these findings mean? The mortality rates and the observed effect of azithromycin on mortality were largest for the severe underweight group, suggesting a potential group to target with this intervention, although the study was not powered to detect this effect. Even if this study provided evidence that the effect was strongest among underweight children, a nontargeted intervention would have prevented the greatest number of deaths in this population. Introduction Biannual azithromycin distribution reduced mortality among children 1–59 months of age in a large cluster-randomized trial in Malawi, Niger, and Tanzania (MORDOR trial, “Macrolides Oraux pour Réduire les Décès avec un Oeil sur la Résistance”) [1,2]. 
The strongest effects were observed in Niger, which had the highest baseline mortality rates, and in children 1–11 months of age [1]. In conjunction with existing child survival activities, this intervention has the potential to bolster progress in reducing under-5 mortality, particularly in high-mortality settings. However, these distributions increase the prevalence of antimicrobial resistance [3,4]. Limiting antibiotic distributions to smaller subgroups at the highest risk of mortality might be an approach to reduce selection for resistance [5]. Malnutrition is implicated in up to 45% of all childhood deaths globally [6]. Malnourished children are at increased risk of mortality from infectious diseases such as diarrhea and respiratory tract infections [6]. Moreover, the relationship between malnutrition and infection is complex, with undernutrition suppressing the immune system and increasing the risk of infection, and infection causing a reduction in appetite, malabsorption of nutrients, and competition for nutrients [7,8]. Provision of antibiotics to malnourished children could lead to clearance of both overt and subclinical infections associated with mortality. Use of antibiotics with a long half-life, like azithromycin, could also prevent the development of infections during the 1–2 weeks after administration [9]. Other proposed mechanisms for a beneficial effect of antibiotics in undernourished children involve modulation of the intestinal microbiota, which could result in a reduction in gut flora that compete for nutrients and affect chronic conditions like environmental enteropathy [8,10–14]. Multiple studies have examined the role of antibiotics in malnourished children, with varying results. Three individual-randomized trials have compared antibiotics to placebo in the management of severe acute malnutrition [12,15,16].
One trial in Malawi found that children receiving antibiotics experienced greater nutritional recovery and less mortality than those receiving placebo [15], whereas two other trials found no difference in either nutritional recovery or mortality between arms [12,16]. Fewer studies have focused on children with moderate malnutrition, although one multi-country trial evaluating the effect of antibiotics on a number of outcomes in children with moderate acute malnutrition is currently underway [17]. Targeting high-risk subgroups such as malnourished children with azithromycin could preserve resources and lower the risk of selecting for antimicrobial resistance. However, evidence on the effect of antibiotics on mortality in malnourished children is mixed. The MORDOR trial provides an opportunity to examine the role of antibiotics in reducing mortality in malnourished children in a sub-Saharan African setting. The objective of this subgroup analysis was to assess whether the effect of biannual distribution of oral azithromycin on child mortality differed by nutritional status in Niger. Methods Trial design, setting, and participants MORDOR was a large, simple, multisite cluster-randomized trial designed to compare the effect of biannual distribution of oral azithromycin to placebo on child mortality [1]. The protocol and statistical analysis plan for the main trial have been published, and the analyses presented here are exploratory [1]. This analysis included the Niger site, which enrolled communities in the Boboye and Loga districts (now Boboye, Loga, and Falmey districts after nation-wide redistricting). Communities with populations between 200 and 2,000 inhabitants according to the Niger 2012 census were eligible for inclusion in the main trial. Children 1–59 months of age who weighed ≥ 3.8 kg were eligible for treatment. 
This subgroup analysis included children 1–11 months old who had weight recorded at the time of the child’s first census, which could have been in any one of the censuses. Children 12–59 months old were excluded because crude height intervals were used to determine dose in children able to stand, and nutritional status indicators could not be accurately calculated for this group. Ethical approval for the Niger site was obtained from the Niger Ministry of Health and the University of California, San Francisco Committee on Human Research. Verbal informed consent was obtained from households and caregivers before inclusion. The trial was conducted in accordance with the principles of the Declaration of Helsinki and was registered at ClinicalTrials.gov (NCT02047981). Census A door-to-door census was conducted every 6 months to enumerate households in the study area between December 2014 and August 2017. Demographic information (age, sex) was recorded for each child 1–59 months old. During follow-up census data collection, vital status (alive, dead, or unknown) and residence (living in community, moved outside community, or unknown) were recorded. Five censuses (four inter-census periods) were completed during the 2-year study. Data were collected electronically using a custom-designed mobile application (Conexus, Los Gatos, CA) and uploaded to a cloud-based server (Salesforce, San Francisco, CA). Interventions At every biannual census, each child 1–59 months old was offered a single, directly observed dose of oral azithromycin or placebo (Pfizer, New York, NY). Children were given a dose of 20 mg per kg, which was assessed by height-stick approximation according to Niger’s trachoma program guidelines or by weight for children unable to stand. Children known to be allergic to macrolides were not treated. Adverse events were monitored and have been reported elsewhere [1,18].
Outcomes The outcome for this analysis is mortality, defined as community mortality rate (deaths per 1,000 person-years at risk). Data collected during the biannual census were used to assess the outcome. A death was included if a child was recorded as alive at one census and as dead at the subsequent census. Person-time at risk was calculated as the number of days between consecutive census periods or until death. Children who moved or had an unknown status at the subsequent census contributed half of the days during that inter-census period. Assessment of nutritional status The trial protocol included assessment of weight for the purpose of determining dosage in children unable to stand. Trained study personnel recorded weight (if measured) and dose administered for all children in the mobile application. To determine dosage, children unable to stand were weighed (Amw-tl440 digital hanging scale, American Weigh Scales, Cumming, GA), and weight was recorded to the nearest 0.1 kg. A single weight measurement was taken at each visit. Age- and sex-adjusted weight-for-age Z-scores (WAZs) were calculated using the 2006 WHO Child Growth Standards with the zscorer package in R (R Foundation for Statistical Computing, Vienna, Austria) [19–21]. WAZ was dichotomized to group children without or with moderate to severe malnutrition (WAZ ≥ −2 and WAZ < −2, respectively) and without or with severe malnutrition (WAZ ≥ −3 and WAZ < −3, respectively). These categories were chosen to align with current classification standards used in nutritional policies and programs. Children with a baseline WAZ of less than −6 or greater than 5 were excluded according to WHO recommendations [20]. As WAZ was calculated after program completion, underweight children were not actively identified during the study period, and no additional measures were taken to address nutritional status during the trial.
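The person-time rules and WAZ groupings described above can be illustrated with a minimal Python sketch. It is not the trial's code (the analyses were run in R). The `Interval` record, the function names, and the 365.25-day year are assumptions made for illustration, and WAZ values are taken as already computed.

```python
from dataclasses import dataclass
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumption: the source does not state the conversion used


@dataclass
class Interval:
    """One child's record for one inter-census period (hypothetical structure)."""
    days_between_censuses: int
    status: str  # "alive", "dead", "moved", or "unknown" at the subsequent census
    days_until_death: Optional[int] = None  # only meaningful when status == "dead"


def person_days(iv: Interval) -> float:
    """Days at risk contributed by one child in one inter-census period."""
    if iv.status == "dead":
        if iv.days_until_death is None:
            raise ValueError("this sketch needs the time until death for deaths")
        return float(iv.days_until_death)
    if iv.status in ("moved", "unknown"):
        # Moved or unknown status: half of the inter-census period.
        return iv.days_between_censuses / 2
    return float(iv.days_between_censuses)


def mortality_rate_per_1000_py(intervals: list) -> float:
    """Deaths per 1,000 person-years at risk."""
    deaths = sum(1 for iv in intervals if iv.status == "dead")
    person_years = sum(person_days(iv) for iv in intervals) / DAYS_PER_YEAR
    return 1000 * deaths / person_years


def waz_group(waz: float) -> Optional[dict]:
    """Return underweight flags, or None when WAZ is outside [-6, 5] (excluded)."""
    if waz < -6 or waz > 5:
        return None
    return {"moderate_to_severe": waz < -2, "severe": waz < -3}


# Made-up records, not trial data.
records = [
    Interval(180, "alive"),
    Interval(180, "moved"),
    Interval(180, "dead", days_until_death=90),
]
rate = mortality_rate_per_1000_py(records)
```

Note that the two dichotomies overlap: a child with WAZ < −3 is counted in both the moderate to severe and the severe group.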
Randomization and masking Within each country, communities were randomized 1:1 to receive biannual azithromycin or placebo. The randomization sequence was generated in R by the trial biostatistician and was implemented by unmasked members of the data team and Pfizer. The allocation was concealed by simultaneous randomization assignment. Participants, investigators, data collectors, and data analysts were masked to treatment assignment. Placebo was packaged to be identical in appearance to the azithromycin to maintain masking. Sample size and statistical methods The MORDOR trial was designed and powered for the primary outcome, which has been previously published [1]. Briefly, the overall trial had 80% power to detect a 10% difference in all-cause mortality among communities receiving azithromycin compared to placebo, and the Niger site included 594 eligible communities [1]. Given the fixed design, the prevalence of moderate to severe and severe underweight, and the mortality rates within subgroups, this subgroup analysis had 80% power to detect additive interaction effects of the following sizes, interpreted as the mortality rate among underweight children receiving placebo in excess of the individual effects of underweight or placebo on mortality: 17 deaths per 1,000 person-years for the moderate to severe subgroup and 25 deaths per 1,000 person-years for the severe subgroup [22]. Analyses were conducted in R. Participant characteristics, WAZ, and outcomes were summarized by arm using frequency and percentage for categorical variables, mean and standard deviation for continuous variables, and incidence rate (deaths per 1,000 person-years, hereafter referred to as “mortality rate”) and 95% confidence interval for outcomes. Confidence intervals were constructed using percentiles from bootstrap resampling with 1,000 replicates. 
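As a rough illustration of the percentile bootstrap mentioned above, the following Python sketch resamples communities with replacement within each arm and takes the 2.5th and 97.5th percentiles of the resampled rate differences. Resampling at the community level is an assumption that follows from the cluster-randomized design; the source does not state the resampling unit. The function names and the `(deaths, person_years)` tuples are hypothetical.

```python
import random


def rate_per_1000_py(clusters):
    """Pooled mortality rate from (deaths, person_years) tuples, one per community."""
    deaths = sum(d for d, _ in clusters)
    person_years = sum(py for _, py in clusters)
    return 1000 * deaths / person_years


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def bootstrap_rate_difference_ci(azithro, placebo, replicates=1000, seed=0):
    """95% percentile interval for the azithromycin-minus-placebo rate difference."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(replicates):
        # Assumption: communities are resampled with replacement within each arm.
        a = rng.choices(azithro, k=len(azithro))
        p = rng.choices(placebo, k=len(placebo))
        diffs.append(rate_per_1000_py(a) - rate_per_1000_py(p))
    diffs.sort()
    return percentile(diffs, 0.025), percentile(diffs, 0.975)


# Made-up communities (deaths, person-years); not trial data.
azithro_arm = [(1, 50.0), (0, 40.0), (2, 60.0), (1, 55.0)]
placebo_arm = [(2, 50.0), (1, 45.0), (3, 60.0), (1, 40.0)]
low, high = bootstrap_rate_difference_ci(azithro_arm, placebo_arm)
```

The same loop could wrap any other statistic, such as an interaction contrast, by swapping the quantity computed inside it.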
Participant characteristics were also compared among those included in the analysis and those excluded for having missing or invalid weight measurements. No multiple comparisons corrections were made. Effect modification was evaluated non-parametrically with interaction contrasts [23]. To calculate the contrasts, subgroups were coded such that the groups with the lowest mortality rates were the reference categories (i.e., R00 = mortality rate among higher-weight children in communities assigned to azithromycin, R01 = mortality rate among underweight children in communities assigned to azithromycin, R10 = mortality rate among higher-weight children in communities assigned to placebo, and R11 = mortality rate among underweight children in communities assigned to placebo) [24]. An additive interaction contrast greater than 0 indicates the joint effect of receiving placebo and being underweight is greater than the sum of the individual effects considered separately. A multiplicative interaction contrast greater than 1 indicates the joint effect of receiving placebo and being underweight is greater than the product of the individual effects considered separately. The absolute number of deaths averted with azithromycin in each subgroup was also estimated using person-time at risk in both arms and the subgroup-level mortality rates. Several sensitivity analyses were conducted. Survival probability was summarized by treatment arm and WAZ subgroup using Kaplan-Meier survival curves. Effect modification was also examined using Cox proportional hazards models. To determine the presence of multiplicative interaction, models included a shared frailty assuming a gamma distribution to account for clustering, the Efron method for ties, and treatment and WAZ as covariates with their product as an interaction term. 
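To make the interaction contrasts defined above concrete, here is a small Python sketch using the standard two-by-two forms: R11 − R10 − R01 + R00 on the additive scale and (R11 × R00) / (R10 × R01) on the multiplicative scale. The function names and example rates are illustrative, not trial data. The deaths-averted function is one reading of "person-time at risk in both arms and the subgroup-level mortality rates": the rate difference applied to the total person-time. That reading is an assumption, although it gives about 389 for the overall figures reported in the Results (−12.6 per 1,000 person-years over 30,852 person-years), close to the published 388.

```python
def additive_contrast(r00, r01, r10, r11):
    """Additive interaction contrast: > 0 means the joint effect exceeds the sum."""
    return r11 - r10 - r01 + r00


def multiplicative_contrast(r00, r01, r10, r11):
    """Multiplicative interaction contrast: > 1 means it exceeds the product."""
    return (r11 * r00) / (r10 * r01)


def deaths_averted(rate_difference_per_1000_py, person_years):
    """Deaths averted, assuming the rate difference applies to all person-time.

    The rate difference is azithromycin minus placebo, so a negative
    difference gives a positive number of deaths averted.
    """
    return -rate_difference_per_1000_py * person_years / 1000


# Illustrative rates per 1,000 person-years (made up, not from Table 2):
# r00 higher-weight/azithromycin, r01 underweight/azithromycin,
# r10 higher-weight/placebo,      r11 underweight/placebo.
r00, r01, r10, r11 = 30.0, 40.0, 40.0, 60.0
ic_add = additive_contrast(r00, r01, r10, r11)        # 60 - 40 - 40 + 30
ic_mult = multiplicative_contrast(r00, r01, r10, r11)  # (60 * 30) / (40 * 40)

# Overall figures reported in the Results section.
overall_averted = deaths_averted(-12.6, 30852)
```

Under this reading, a subgroup holding a small share of the person-time averts few deaths in absolute terms even when its rate difference is large, which is the arithmetic behind the nontargeted-intervention conclusion.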
Model estimates were reported with hazard ratios for each subgroup against a single reference category and with hazard ratios for the effect of treatment within each stratum of WAZ [23,25]. The estimated hazard ratios were used to calculate the Relative Excess Risk due to Interaction (RERI_HR) to assess the presence and direction of additive interaction, with the same coding as used for the interaction contrasts [23–26]. The delta method was used to calculate standard errors for the RERI_HR [23]. As treatment arm was randomized and is the primary intervention of interest, confounding of the relationship between nutritional status and mortality was not considered, and no additional factors were controlled for in the models [23]. Model assumptions were evaluated graphically with ln(-ln) survival plots and analytically with tests of scaled Schoenfeld residuals as well as with models including terms for interactions with time to event for each covariate. The appropriateness of the distributional assumptions for the shared frailty was assessed by comparing results against models using a lognormal distribution for the shared frailty and estimated with generalized estimating equations (GEEs) to account for clustering. Additional sensitivity analyses included evaluating the potential for bias introduced by the selection of the analysis sample by restricting the analysis to children eligible during the first inter-census period only and by restricting to children 1–5 months of age. To assess the impact of the use and form of WAZ, baseline weight, age, and sex were included in the models, and baseline WAZ was assessed in continuous form. To evaluate assumptions made in determining time to mortality when no exact date was available, an interval censoring method was also used. This was implemented as a generalized linear mixed model, with a binary outcome for death, a complementary log-log link, a term for inter-census period, and a random effect for community.
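Two of the quantities named in this section have simple closed forms, sketched below in Python. The RERI is computed from hazard ratios expressed against a single reference category as HR11 − HR10 − HR01 + 1, and the complementary log-log link maps a linear predictor η to a death probability 1 − exp(−exp(η)). The function names and input values are illustrative assumptions, not estimates from the trial. The delta-method standard error is omitted because it requires the covariance matrix of the model coefficients.

```python
import math


def reri_hr(hr01, hr10, hr11):
    """Relative excess risk due to interaction from hazard ratios.

    All three hazard ratios are relative to the same reference group (HR00 = 1).
    A value above 0 suggests positive additive interaction.
    """
    return hr11 - hr10 - hr01 + 1.0


def inverse_cloglog(eta):
    """Probability of death in an interval under a complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


# Illustrative hazard ratios (made up, not from Table 3).
example_reri = reri_hr(hr01=1.2, hr10=1.4, hr11=1.9)  # 1.9 - 1.4 - 1.2 + 1

# With this link, exp(coefficient) can be read as a hazard ratio for
# interval-censored data, which is why it suits census-based follow-up.
p_death = inverse_cloglog(-4.0)
```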
Results In December 2014, 615 communities in Niger were randomized to receive biannual azithromycin or placebo in the main trial, of which 594 communities were successfully censused and included in analyses (Fig 1). Treatment coverage among children 1–59 months old was greater than 91% over the 4 inter-census periods in both arms. The final sample for this analysis included 593 communities with 27,222 children 1–11 months old who had a valid weight recorded at the time of the child’s first entry into the study. One community was not included because it had no eligible children, and 12,086 children 1–11 months old at their first census were excluded either for having no weight recorded (11,899 children, of which 10,271 had approximate height measured) or having a WAZ less than −6 or greater than 5 recorded (187 children). Over the 2-year study period, 5,189 children were lost to follow-up, with a similar percentage of children lost in each arm.
Fig 1. CONSORT participant flow diagram. CONSORT, Consolidated Standards of Reporting Trials. https://doi.org/10.1371/journal.pmed.1003285.g001
Characteristics of included children at the time of the child’s first census are shown by treatment arm in Table 1. Overall, the median age was 4 months (interquartile range [IQR] 3–6), and 49.5% of children (13,484/27,222) were female. Mean WAZ was −0.8 (SD 1.7), with 23.0% (6,268/27,222) of all children having a WAZ < −2 and 10.1% (2,755/27,222) having a WAZ < −3. All characteristics were similar in both arms. Excluded children were older than included children (median age 9 months, IQR 6–11), and a similar percentage were female (49.1%; S1 Table).
Table 1. Characteristics of children 1–11 months old with weight recorded at the time of entry into the study. https://doi.org/10.1371/journal.pmed.1003285.t001
The analysis included 1,184 deaths and a total of 30,852 person-years at risk (Table 2). The overall difference in the incidence of mortality comparing communities assigned to azithromycin to communities assigned to placebo was −12.6 deaths per 1,000 person-years (95% CI −18.5 to −6.9, P < 0.001). By subgroup, this difference was −17.0 (95% CI −28.0 to −7.0, P = 0.001) among those with WAZ < −2, and −25.6 (95% CI −42.6 to −9.6, P = 0.003) among those with WAZ < −3. Fig 2 compares mortality rates by treatment arm and subgroup. Interaction contrasts on the additive scale were 5.7 deaths per 1,000 person-years (95% CI −6.4 to 16.8, P = 0.34) for the moderate to severe subgroup and 14.4 deaths per 1,000 person-years (95% CI −2.2 to 31.1, P = 0.14) for the severe subgroup. On the multiplicative scale, these contrasts were 1.1 (95% CI 0.8 to 1.4, P = 0.50) and 1.2 (95% CI 0.9 to 1.7, P = 0.26), respectively.
The estimated number of deaths averted with azithromycin among children 1–11 months old was 388 (95% CI 214 to 574) overall, 116 (95% CI 48 to 192) among children with WAZ < −2, and 76 (95% CI 27 to 127) among children with WAZ < −3. Fig 2. Comparison of mortality rates by treatment arm and WAZ subgroup with interaction contrasts. (A, B) Comparisons of mortality rate (deaths per 1,000 person-years) by treatment arm overall and by WAZ subgroup on the additive (A) and multiplicative (B) scales. (A) Mortality rate differences (mortality rate in communities assigned to azithromycin minus mortality rate in communities assigned to placebo). (B) Mortality rate ratios (mortality rate in communities assigned to azithromycin divided by mortality rate in communities assigned to placebo). (C, D) Interaction contrasts on the additive (C) and multiplicative (D) scales. Interaction contrasts defined subgroups such that the groups with the lowest mortality rates were the reference categories (i.e., R00 = mortality rate among higher-weight children in communities assigned to azithromycin, R01 = mortality rate among underweight in communities assigned to azithromycin, R10 = mortality rate among higher-weight children in communities assigned to placebo, and R11 = mortality rate among underweight children in communities assigned to placebo). (C) Interaction contrasts on the additive scale. (D) Interaction contrasts on the multiplicative scale. WAZ, weight-for-age Z-score. https://doi.org/10.1371/journal.pmed.1003285.g002 Table 2. Number of deaths, person-time at risk, and mortality rates by treatment arm and subgroups of WAZ. https://doi.org/10.1371/journal.pmed.1003285.t002 Figs 3 and 4 display survival probabilities by arm and subgroup, and Table 3 reports model-based estimates of mortality and effect modification by subgroup. 
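The interaction contrasts built from R00, R01, R10, and R11 in the Fig 2 caption can be illustrated with a minimal sketch. It assumes the standard definitions (additive: R11 − R10 − R01 + R00; multiplicative: (R11 × R00)/(R10 × R01)), which the text does not spell out. The function names and the four example rates are hypothetical and are not taken from Table 2; only the total deaths and person-years in the crude rate check come from the text.

```python
def rate_per_1000(deaths: int, person_years: float) -> float:
    """Crude mortality rate in deaths per 1,000 person-years."""
    return 1000.0 * deaths / person_years


def additive_contrast(r00: float, r01: float, r10: float, r11: float) -> float:
    """Additive interaction contrast, assumed here to be R11 - R10 - R01 + R00."""
    return r11 - r10 - r01 + r00


def multiplicative_contrast(r00: float, r01: float, r10: float, r11: float) -> float:
    """Multiplicative interaction contrast, assumed to be (R11 * R00) / (R10 * R01)."""
    return (r11 * r00) / (r10 * r01)


# Totals reported in the text: 1,184 deaths over 30,852 person-years.
overall_rate = rate_per_1000(1184, 30852)

# Hypothetical rates (deaths per 1,000 person-years), NOT the values in Table 2.
# First index: 0 = azithromycin, 1 = placebo; second index: 0 = higher weight, 1 = underweight.
r00, r01, r10, r11 = 30.0, 40.0, 42.0, 58.0

ic_additive = additive_contrast(r00, r01, r10, r11)
ic_multiplicative = multiplicative_contrast(r00, r01, r10, r11)

# The additive contrast equals the difference between the two subgroup rate differences.
difference_of_differences = (r11 - r01) - (r10 - r00)

print(round(overall_rate, 2), ic_additive, round(ic_multiplicative, 3))
```

An additive contrast of 0 or a multiplicative contrast of 1 would indicate no effect modification on the respective scale, which is how the confidence intervals reported above can be read.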
Among children in placebo-treated communities, lower WAZ was associated with an increased hazard of mortality (HR 1.32, 95% CI 1.11–1.57, P = 0.002 comparing WAZ < −2 to WAZ ≥ −2 and HR 1.56, 95% CI 1.25–1.95, P < 0.001 comparing WAZ < −3 to WAZ ≥ −3). The hazard for mortality was lower in communities assigned to azithromycin than communities assigned to placebo, with a more pronounced effect for the subgroups of underweight children (27% lower in WAZ ≥ −2, 95% CI 15–38, P < 0.001; 30% lower in WAZ < −2, 95% CI 11–45, P = 0.003; and 38% lower in WAZ < −3, 95% CI 14–55, P = 0.005). When comparing underweight children in communities assigned to azithromycin to higher-weight children in communities assigned to placebo, the hazards for mortality were similar in both subgroups (HR 0.93, 95% CI 0.75–1.14, P = 0.48 comparing WAZ < −2 to WAZ ≥ −2 and HR 0.97, 95% CI 0.74–1.27, P = 0.82 comparing WAZ < −3 to WAZ ≥ −3). No evidence of effect modification was identified. Similar results were found in all sensitivity analyses. Fig 3. Kaplan-Meier estimates of survival probability by treatment arm and the moderate to severe underweight subgroup. Each curve depicts a different subgroup, with placebo represented by dotted lines in shades of blue and azithromycin represented by solid lines in shades of red. The darker shades indicate the higher-weight subgroup (WAZ ≥ −2), and the lighter shades indicate the underweight subgroup (WAZ < −2). The y-axis is broken for clarity and jumps from 0.00 to 0.85. WAZ, weight-for-age Z-score. https://doi.org/10.1371/journal.pmed.1003285.g003 Fig 4. Kaplan-Meier estimates of survival probability by treatment arm and the severe underweight subgroup. Each curve depicts a different subgroup, with placebo represented by dotted lines in shades of blue and azithromycin represented by solid lines in shades of red. 
The darker shades indicate the higher-weight subgroup (WAZ ≥ −3), and the lighter shades indicate the underweight subgroup (WAZ < −3). The y-axis is broken for clarity and jumps from 0.00 to 0.85. WAZ, weight-for-age Z-score. https://doi.org/10.1371/journal.pmed.1003285.g004 Table 3. Sensitivity analysis using Cox proportional hazards regression to evaluate the association between biannual oral azithromycin distribution and mortality by WAZ subgroups1. https://doi.org/10.1371/journal.pmed.1003285.t003 Discussion This subgroup analysis evaluated whether the effect of biannual azithromycin distribution on child mortality differed by underweight status in a high-mortality West African setting. Azithromycin was associated with an overall 28% reduction in mortality compared to placebo in children 1–11 months old with weight measured, similar to the age-based subgroup results from the main trial [1]. As expected given evidence on the relationship between malnutrition and mortality [6,27,28], lower weight for age was associated with increased mortality. The observed time to mortality in underweight children receiving azithromycin was approximately the same as that of higher-weight children receiving placebo. Although the absolute reduction in mortality between arms appears larger in both underweight groups, no evidence of effect modification by WAZ subgroup was found at the 95% confidence level. The number of deaths averted was greatest if all children were treated with azithromycin, regardless of nutritional status. The nonspecific distribution of azithromycin to reduce child mortality presents an ethical dilemma: given the strong evidence of efficacy, it may be unethical to withhold such an intervention, yet the intervention’s effect on antimicrobial resistance warrants caution [29]. 
Increasing resistance could reduce the efficacy of essential antibiotics, potentially causing additional morbidity and mortality in the longer term. Targeting the intervention to high-risk subgroups is one solution to preserve resources and reduce negative consequences; targeting all children 1–11 months in this study population required 10 times the amount of azithromycin compared to targeting WAZ < −3. A targeted approach may also be more cost-effective than a broader distribution strategy [30]. The assumption that targeting vulnerable subgroups results in the greatest population health benefits has been questioned, however, since more lives are saved by intervening on a population with a wider risk spectrum [31–33]. Here, although there is some indication that intervening on those with the lowest WAZ may be particularly beneficial, the absolute number of deaths averted was 5 times greater when including all children 1–11 months as opposed to only the 10% with WAZ < −3. In addition, possible indirect effects might be lost with a more focused intervention. Finally, targeting a subgroup of the population presents its own ethical complexity, as providing a beneficial intervention more broadly might be more equitable when resources are available to do so [29]. Approximately 23% of the children included in this analysis were underweight, similar to other estimates indicating that Niger bears a high burden of malnutrition [34]. A single weight measurement was taken on a subset of children 1–11 months old who were unable to stand, which has several implications for interpretation of these results. First, other nutritional status indicators like wasting and stunting were not assessed, nor were the causes of underweight status. Underweight status has been shown to increase the risk of mortality in multiple settings [6,27,28,35,36], with some evidence demonstrating that WAZ alone is a highly sensitive and specific indicator of concurrent wasting and stunting [37]. 
In addition, as malnutrition is caused by a wide variety of factors, azithromycin might be more effective in cause-specific subgroups of underweight children, though this study was neither designed nor powered to assess the effect by smaller subgroups. Similarly, as underweight status could be a proxy for other child, household, and community characteristics, the mechanism of effect modification is likely more complex than modeled here. Second, although being underweight at the first visit likely predicts being underweight at later visits, we were unable to examine the impact of changing nutritional status over time. Children who became underweight after their first visit thus might be misclassified by this analysis, which we would expect to bias any effect modification towards the null. Third, the selection of children 1–11 months of age who had weight measurements available could introduce bias, since children at the older end of that range who were able to stand were more likely not to be weighed. However, exclusions among the older age group were balanced by arm, overall and across census periods, and sensitivity analyses restricting the population to children 1–5 months produced similar results to the main analysis. Additionally, older children were not weighed. The analysis population thus might not be representative of the general population, as it might include a higher prevalence of underweight children and does not reflect the experience of children 12–59 months old. Fourth, the SD for WAZ was greater than 1 [20], likely due to measurement error since weight was assessed primarily for the purpose of intervention delivery. Only one measurement was taken at each visit for each child in order to determine dosage. As mean WAZ and SD were similar across arms, any information bias is likely to be conservative, which could have masked the presence of effect modification. 
Fifth, both mortality and malnutrition are known to vary seasonally in West Africa [38,39]. Seasonality-focused analyses were not pursued given the low power to further stratify the population, and the lack of an overall seasonal effect of azithromycin on mortality in the main trial [39]. Finally, the use of cutoffs to categorize malnourished groups has been criticized for creating a false separation of subgroups in which to intervene [40], particularly in high-burden areas where the entire distribution of anthropometric indicators is shifted downwards. As these cutoffs are actively used in current programs and policy, their use in this application provides readily available information to these sectors while also calling into question the impact of a targeted strategy that would exclude many children with mild to moderate malnutrition who also face an increased burden of mortality [27]. Additional limitations of this study include those shared by most subgroup analyses of trials, such as the potential for false negatives from lack of power and bias from use of improper subgroups. The effect sizes observed in this analysis were smaller than detectable by the design (5.7 versus 17 deaths per 1,000 person-years for the moderate to severe subgroup, and 14.4 versus 25 deaths per 1,000 person-years for the severe subgroup), indicating the analysis was underpowered. The use of baseline WAZ from children who entered the study after azithromycin had been distributed at the community level could result in bias since WAZ for these children is a post-randomization characteristic that could be influenced by treatment arm. A sensitivity analysis restricted to the first phase of the study did not reveal differences in results. Also, underweight prevalence did not differ by arm across census period, so more complex approaches to assessing or controlling for this potential bias were not pursued. 
In this type of dynamic cohort, differential loss to follow-up can result in selection bias. Although loss to follow-up was present, it was similar when compared by arm. Further research would be required to determine whether these results were generalizable to settings beyond those similar to Niger, which has a high burden of both malnutrition and mortality. Strengths of this study include the large sample size, the assessment of both additive and multiplicative interaction, and the randomized design. In summary, a placebo-controlled trial found that biannual azithromycin distribution reduced mortality among children 1–11 months old regardless of underweight status. Although the observed mortality reduction with azithromycin was larger among subgroups of underweight children, underweight status was not a statistically significant effect modifier in this trial. Treatment of all children 1–11 months old would save 5 times as many lives as restricting treatments only to children with a WAZ < −3. Supporting information S1 CONSORT Checklist. Details about where the trial-specific information outlined by the CONSORT guidelines can be found in this manuscript. CONSORT, Consolidated Standards of Reporting Trials. https://doi.org/10.1371/journal.pmed.1003285.s001 (DOC) S1 Table. Summary of characteristics of children 1–11 months old at the time of entry into the study among included children (n = 27,222) and excluded children (n = 12,086). Excluded children include those who were 1–11 months of age at the time of entry into the study and did not have weight measured (n = 11,899) or had an invalid weight recorded (n = 187). https://doi.org/10.1371/journal.pmed.1003285.s002 (DOCX)
Combined associations of body mass index and adherence to a Mediterranean-like diet with all-cause and cardiovascular mortality: A cohort study
Michaëlsson, Karl; Baron, John A.; Byberg, Liisa; Höijer, Jonas; Larsson, Susanna C.; Svennblad, Bodil; Melhus, Håkan; Wolk, Alicja; Lemming, Eva Warensjö
doi: 10.1371/journal.pmed.1003331 pmid: 32941436
Background It is unclear whether the effect on mortality of a higher body mass index (BMI) can be compensated for by adherence to a healthy diet and whether the effect on mortality by a low adherence to a healthy diet can be compensated for by a normal weight. We aimed to evaluate the associations of BMI combined with adherence to a Mediterranean-like diet on all-cause and cardiovascular disease (CVD) mortality. Methods and findings Our longitudinal cohort design included the Swedish Mammography Cohort (SMC) and the Cohort of Swedish Men (COSM) (1997–2017), with a total of 79,003 women (44%) and men (56%) and a mean baseline age of 61 years. BMI was categorized into normal weight (20–24.9 kg/m2), overweight (25–29.9 kg/m2), and obesity (30+ kg/m2). Adherence to a Mediterranean-like diet was assessed by means of the modified Mediterranean-like diet (mMED) score, ranging from 0 to 8; mMED was classified into 3 categories (0 to <4, 4 to <6, and 6–8 score points), forming a total of 9 BMI × mMED combinations. We identified mortality by use of national Swedish registers. Cox proportional hazard models with time-updated information on exposure and covariates were used to calculate the adjusted hazard ratios (HRs) of mortality with their 95% confidence intervals (CIs). Our HRs were adjusted for age, baseline educational level, marital status, leisure time physical exercise, walking/cycling, height, energy intake, smoking habits, baseline Charlson’s weighted comorbidity index, and baseline diabetes mellitus. During up to 21 years of follow-up, 30,389 (38%) participants died, corresponding to 22 deaths per 1,000 person-years. We found the lowest HR of all-cause mortality among overweight individuals with high mMED (HR 0.94; 95% CI 0.90, 0.98) compared with those with normal weight and high mMED. Using the same reference, obese individuals with high mMED did not experience significantly higher all-cause mortality (HR 1.03; 95% CI 0.96–1.11). 
In contrast, compared with those with normal weight and high mMED, individuals with a low mMED had a high mortality despite a normal BMI (HR 1.60; 95% CI 1.48–1.74). We found similar estimates among women and men. For CVD mortality (12,064 deaths) the findings were broadly similar, though obese individuals with high mMED retained a modestly increased risk of CVD death (HR 1.29; 95% CI 1.16–1.44) compared with those with normal weight and high mMED. A main limitation of the present study is the observational design with self-reported lifestyle information with risk of residual or unmeasured confounding (e.g., genetic liability), and no causal inferences can be made based on this study alone. Conclusions These findings suggest that diet quality modifies the association between BMI and all-cause mortality in women and men. A healthy diet may, however, not completely counter higher CVD mortality related to obesity. Why was this study done? It is unclear whether the effect on mortality of a higher BMI can be compensated for by adherence to a healthy diet. It is also unclear whether the effect on mortality by a low adherence to a healthy diet can be compensated for by a normal weight. What did the researchers do and find? We conducted a population-based cohort study that included women and men with time-updated lifestyle information. Obese individuals with high adherence to a Mediterranean-type diet did not experience the increased overall mortality otherwise associated with high BMI, although higher CVD mortality remained. Lower BMI did not counter the elevated mortality associated with a low adherence to a Mediterranean diet. What do these findings mean? Our results indicate that adherence to healthy diets such as a Mediterranean-like diet may modify the association between BMI and mortality. 
Methods The study population consisted of participants from 2 population-based cohort studies in Sweden: the Swedish Mammography Cohort (SMC) and the Cohort of Swedish Men (COSM), belonging to the national research infrastructure SIMPLER (www.simpler4health.se). The SMC was established in 1987–1990 when women (born 1914–1948, n = 90,303) residing in 2 counties (Uppsala and Västmanland) were invited to a questionnaire survey covering diet and lifestyle, which was completed by 74% of the women. In the fall of 1997, a second extended questionnaire was sent to all SMC participants who were still alive and residing in the study area (n = 56,030). COSM was established in late 1997 when all male residents (n = 100,303) of 2 counties (Örebro and Västmanland) and born between 1918 and 1952 were invited to participate. When compared with the Official Statistics of Sweden, the cohorts well represented the Swedish population in 1997 in terms of age distribution, educational level, prevalence of overweight and obesity, and smoking status [32]. The 1997 questionnaires in both the SMC and COSM were similar except for the sex-specific questions and included almost 350 items that covered life style factors such as body weight and height, diet (using a validated food frequency questionnaire [FFQ]), dietary supplement use, alcohol consumption, smoking, physical activity, sociodemographic data, and self-perceived health status. This questionnaire was completed by 70% of the women and by 49% of the men. Participants with a prior cancer diagnosis or with energy intakes deemed implausible (±3 SDs from the mean of ln-transformed energy intake) were excluded. The final cohorts consisted of 38,984 women in the SMC and 45,906 men in the COSM followed from January 1, 1998. In 2008, a questionnaire covering general health, lifestyle, and diseases was sent to all participants that had completed the 1997 questionnaire and who were still alive and living in the study area. 
The response rate was 63% in the SMC and 78% in the COSM. Those who responded to the 2008 questionnaire received an expanded semiquantitative FFQ in 2009; the response rate was 84% and 90% in the SMC and COSM, respectively. This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (S1 Checklist). The study has ethical approval by the Regional Ethical Review Boards in Uppsala and Stockholm, Sweden. The questionnaires included a written informed consent. A prespecified analysis plan in Swedish can be found at dx.doi.org/10.17504/protocols.io.bgftjtnn and in S1 Protocol in Swedish and with English translation. BMI We categorized BMI into normal (20 to <25 kg/m2), overweight (25 to <30 kg/m2), and obese (≥30 kg/m2), using self-reported weight and height in 1997 and 2008. Overall, 4% of data points were missing. Those with a BMI below 20 kg/m2 at baseline were excluded (n = 3,226, 4%; 2,354 women and 872 men) since a low body mass can reflect frailty or prevalent disease, which were not intended to be examined in this analysis. Therefore, our final data set used for the analyses contains 79,003 women (44%) and men (56%). Modified Mediterranean-like diet (mMED) score The dietary assessment has been described previously [33]. Briefly, the FFQs included 96 and 132 food items in 1997 and 2009, respectively. We calculated an mMED score adapted from the Mediterranean diet scale devised by Trichopoulou and colleagues [34] using previously defined food items [35], but the scoring was modified according to Knudsen and colleagues, rendering a continuous score [36]. Details of the scoring, ranging from 0 to 8 on a continuous scale, are found below; participants with higher score points were more adherent to the diet. For analysis, mMED was classified into 3 predefined categories (0 to <4, 4 to <6, and 6–8 score points) chosen to balance exposure range and numbers of individuals in each category [37]. 
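The BMI and mMED cut points described above, and the 9 joint BMI × mMED strata they form, can be sketched as follows. This is a minimal illustration only: the function names and the labels "low", "medium", and "high" are assumptions introduced here, and the original analysis was carried out in Stata and R rather than Python.

```python
from itertools import product
from typing import Optional, Tuple

BMI_LABELS = ("normal", "overweight", "obese")
MMED_LABELS = ("low", "medium", "high")  # hypothetical labels for 0 to <4, 4 to <6, 6-8


def bmi_category(bmi: float) -> Optional[str]:
    """Return the BMI category, or None for BMI < 20 kg/m2 (excluded from analysis)."""
    if bmi < 20:
        return None
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def mmed_category(score: float) -> str:
    """Classify a continuous mMED score (0 to 8) into the 3 predefined categories."""
    if not 0 <= score <= 8:
        raise ValueError("mMED score must lie between 0 and 8")
    if score < 4:
        return "low"
    if score < 6:
        return "medium"
    return "high"


def joint_stratum(bmi: float, score: float) -> Optional[Tuple[str, str]]:
    """Combine both classifications into one of the 9 BMI x mMED strata."""
    bmi_label = bmi_category(bmi)
    if bmi_label is None:
        return None
    return (bmi_label, mmed_category(score))


# All 9 possible strata; per the abstract, normal weight with high mMED is the reference.
ALL_STRATA = list(product(BMI_LABELS, MMED_LABELS))
REFERENCE = ("normal", "high")

print(joint_stratum(27.3, 6.2))
```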
Participants indicated in the FFQs how often, on average, they had consumed each food item during the last year, choosing from 8 predefined frequency categories ranging from "never/seldom" to "3 or more times per day". Frequently consumed foods such as dairy products and bread were reported as the number of servings per day (open question). Information on fat type including vegetable oils used in cooking and as salad dressing was also reported. At baseline in 1997, 19% of the women and 12% of the men reported use of olive oil in dressing. The corresponding frequencies were 25% and 19% for use of olive oil in cooking. In 2009, 41% of the women and 37% of the men reported use of olive oil in dressing, with similar proportions reporting use of olive oil in cooking. Total amount of alcohol consumed per day was derived from the FFQ by multiplying the reported frequencies with the reported amounts on a single occasion. Energy intake was estimated by multiplying the portion-specific consumption frequency of each food item with the nutrient content obtained from the Swedish food database [33]. The mMED score comprises 8 components: fruit and vegetables (apple, banana, berry, orange/citrus, and other fruit; carrot, beetroot, broccoli, cabbage, cauliflower, lettuce, onion, garlic, pepper, spinach, tomato, and other vegetables), legumes (peas, lentils, beans, and pea soup) and nuts, unrefined or high-fiber grains (whole-meal bread, crisp bread, oatmeal, and bran of wheat), fermented dairy products (sour milk, yoghurt, and cheese), fish (excluding shellfish), red and processed meat, any use of olive or rapeseed oil for cooking or as dressing, and alcohol intake. An individual with a reported intake above or below a specific cut point for each component of a diet score usually receives discrete score points (0 or 1), but in the method of Knudsen and colleagues [36], each individual receives 1 or a ratio between the actual intake and a chosen intake amount. 
Such an approach generates continuous component variables and improves precision of the exposure assessment. In the present study, the reference points for fruit and vegetables, legumes and nuts, nonrefined or high-fiber grains, fermented dairy products, and fish were the median intakes in the 1997 data and are lower-intake thresholds. A participant with an intake of legumes and nuts of x grams will thus receive the score = x/median1997. For red and processed meat, consumption below the population median intake rendered a score of 1 point, intakes of 2 or more times the population median rendered 0 points, and intakes above the median (but below 2 × population median) rendered a score of 1 − (actual intake − median1997)/median1997. Any use of olive or rapeseed oil gave 1 point and otherwise 0 points. The alcohol component was coded as intake divided by 5 in the range 0–5 grams/day, as 1 in the intake range 5–15 grams/day, as 1 − (intake − 15)/15 in the range above 15 up to 30 grams/day, and 0 for intakes above 30 grams/day. The same 1997 cutoff points were applied using the 2009 data in order to avoid secular trends and intake differences caused by the fact that the number of food items was higher in the 2009 FFQ. The more detailed FFQ in 2009 is a reflection of a greater diversification of diet over time. Assessment of covariates Covariates obtained from the questionnaires (1997 and 2008/2009) were age, smoking status (including cigarettes per day at different ages), walking/cycling, leisure time physical exercise during the past year, and, as markers of socioeconomic status, cohabiting/marital status as well as educational level. The exercise questions have been validated against activity records and accelerometer data [38]. Comorbidity, expressed as Charlson’s weighted comorbidity index [39, 40], was defined using ICD diagnosis codes (versions 8, 9, and 10) from the National Patient Register from 1964 to before baseline 1 January 1998. 
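The continuous mMED component scoring described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the component keys and function names are hypothetical, the 1997 medians are assumed to be positive and supplied by the caller, the beneficial components are capped at 1 as implied by "receives 1 or a ratio", and the alcohol score between 15 and 30 grams/day is assumed to fall linearly from 1 to 0 so that it joins the neighboring ranges without a jump.

```python
from typing import Mapping

# Components scored as intake / median_1997, capped at 1 (keys are hypothetical).
BENEFICIAL = ("fruit_vegetables", "legumes_nuts", "grains", "fermented_dairy", "fish")
MEAT = "red_processed_meat"


def beneficial_score(intake: float, median_1997: float) -> float:
    """Ratio of intake to the 1997 median, capped at 1 point."""
    return min(1.0, intake / median_1997)


def meat_score(intake: float, median_1997: float) -> float:
    """1 point at or below the median, 0 at twice the median or more, linear in between."""
    if intake <= median_1997:
        return 1.0
    if intake >= 2 * median_1997:
        return 0.0
    return 1.0 - (intake - median_1997) / median_1997


def alcohol_score(grams_per_day: float) -> float:
    """Piecewise score: rising to 1 at 5 g/day, flat to 15 g/day, assumed to fall to 0 at 30 g/day."""
    if grams_per_day < 0:
        raise ValueError("alcohol intake cannot be negative")
    if grams_per_day < 5:
        return grams_per_day / 5
    if grams_per_day <= 15:
        return 1.0
    if grams_per_day <= 30:
        return 1.0 - (grams_per_day - 15) / 15
    return 0.0


def mmed_score(intakes: Mapping[str, float], medians_1997: Mapping[str, float],
               uses_olive_or_rapeseed_oil: bool, alcohol_g_per_day: float) -> float:
    """Sum of the 8 component scores, giving a continuous total between 0 and 8."""
    total = sum(beneficial_score(intakes[c], medians_1997[c]) for c in BENEFICIAL)
    total += meat_score(intakes[MEAT], medians_1997[MEAT])
    total += 1.0 if uses_olive_or_rapeseed_oil else 0.0
    total += alcohol_score(alcohol_g_per_day)
    return total


# Hypothetical medians and intakes in grams/day, for illustration only.
medians = {c: 100.0 for c in BENEFICIAL + (MEAT,)}
at_median = dict(medians)
example_total = mmed_score(at_median, medians, True, 10.0)
print(example_total)
```

A participant whose intakes all equal the 1997 medians, who uses olive or rapeseed oil, and who drinks 10 grams of alcohol per day reaches the maximum of 8 points in this sketch.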
Information on diabetes mellitus was retrieved from the questionnaire and from the National Patient Register. Assessment of deaths All-cause mortality was our primary outcome, with information obtained from the continuously updated Swedish Total Population Register. A complete linkage with the register is possible since all Swedish residents have a unique personal identity number. Since 1952, the National Board of Health and Welfare has maintained information with yearly updates on the causes of death for all Swedish residents in the Cause of Death Registry. We used the underlying cause of death to define our secondary outcome, mortality from CVD (ICD-10 codes I00–I99). Statistical analysis For each participant, follow-up time accrued from 1 January 1998 until the date of death, a questionnaire response indicating a BMI <20 kg/m2 in 2008 (n = 1,824), or the end of the study period (31 October 2018 for all-cause mortality and 31 December 2017 for CVD mortality). The associations of mMED and BMI with all-cause mortality and CVD mortality were assessed as age and multivariable-adjusted hazard ratios (HRs) with 95% confidence intervals (CIs) by Cox proportional hazards regression models, with time-updated information of all variables except Charlson’s comorbidity index and diabetes mellitus (defined only at baseline) and calendar date as the timescale. Both exposures were initially treated as continuous variables. To select suitable covariates for the multivariable model, we used current knowledge and a directed acyclic graph [41], presented as S2 Protocol. 
The overall model included sex, age (splines with 3 knots), educational level (≤9, 10–12, >12 years, other), living alone (yes or no), leisure time exercise during the past year (<1 h/w, 1 h/w, 2–3 h/w, 4–5 h/w, >5 h/w), walking/cycling (almost never, <20 min/d, 20–40 min/d, 40–60 min/d, 1–1.5 h/d, >1.5 h/d), height (splines with 3 knots), energy intake (splines with 3 knots), smoking habits (current, former, never), Charlson’s weighted comorbidity index (continuous), and diabetes mellitus as a separate marker variable (yes/no). Missing data were imputed (20 imputations) using Stata’s “mi” package (multiple imputations using chained equations). The proportion of missing data in the cohorts was 4% for BMI, 3% for height, 9% for walking/bicycling, 11% for exercise, and 6% for marital status. For all other covariates, the percentage of missing was less than 2%. Missingness of foods at baseline was for fruit and vegetables 0.1%, legumes and nuts 3.4%, grains 0.9%, fermented dairy products 2.9%, fish 1.1%, meat 0.7%, olive or rapeseed oil 0%, and alcohol intake 0%. Nonlinear trends of mortality were assessed using restricted cubic splines with 3 knots placed at centiles 10, 50, and 90 of mMED and BMI, respectively. We performed stratified analyses in subgroups of potential confounders in which BMI below or above the median of 26 kg/m2 and mMED score were examined as continuous variables. The purpose of these analyses in homogeneous strata was 2-fold: to visualize and evaluate potential confounding, although with a limitation of different baseline hazards in the strata, and to evaluate potential effect modification of the exposures. Combinations of the categories of BMI and mMED were used to jointly classify study participants into 9 strata. Participants with normal BMI and in the highest category of mMED were used as the reference category in these analyses. Test for homogeneity of HRs across strata was done according to Fleiss [42]. 
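The restricted cubic splines with 3 knots at centiles 10, 50, and 90 mentioned above can be illustrated with a minimal sketch. The original analysis used Stata; the code below instead assumes the common truncated-power parameterization with a scaling by the squared knot range, and an inclusive centile definition from the Python standard library, either of which may differ in detail from what the authors' software applied. All function names and the example data are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def spline_knots(values: Sequence[float]) -> Tuple[float, float, float]:
    """Knots at the 10th, 50th, and 90th centiles (inclusive method, an assumption)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return (cuts[9], cuts[49], cuts[89])


def cube_plus(u: float) -> float:
    """Truncated cubic: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x: float, knots: Tuple[float, float, float]) -> Tuple[float, float]:
    """Return the linear term and the single nonlinear term of a 3-knot restricted cubic spline."""
    t1, t2, t3 = knots
    if not t1 < t2 < t3:
        raise ValueError("knots must be strictly increasing")
    nonlinear = (
        cube_plus(x - t1)
        - cube_plus(x - t2) * (t3 - t1) / (t3 - t2)
        + cube_plus(x - t3) * (t2 - t1) / (t3 - t2)
    )
    # Dividing by the squared knot range keeps both terms on a comparable scale.
    return (x, nonlinear / (t3 - t1) ** 2)


# Hypothetical exposure values 0, 1, ..., 100, used only to show the knot placement.
example_values = list(range(101))
example_knots = spline_knots(example_values)
print(example_knots, rcs_basis(60.0, example_knots))
```

With 3 knots the spline contributes only 2 regression terms, and the fitted curve is constrained to be linear below the first knot and above the last knot, which limits unstable behavior in the tails of BMI or mMED.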
We conducted a stratified analysis by sex with all-cause mortality as outcome. A complementary analysis of risk differences (RDs) was suggested by one of the reviewers. By using the approach described by Austin [43], multivariable-adjusted RDs and relative risks (RRs) were calculated from the predicted survival curve based on the Cox model for all-cause mortality. For the main analysis, the RDs and RRs (with 95% CIs based on 500 bootstrap replicates) were calculated at 20 years, and for the sensitivity analysis starting follow-up in 2009, RDs and RRs at 9 years of follow-up were calculated. For the sensitivity analysis, in which no variable information was updated, yet another method based on pseudo-observations [44], as suggested by the reviewer, was used to estimate the RDs. Additional sensitivity analyses were conducted, excluding those with a BMI greater than 35 kg/m2, restricting those with normal BMI to 22–25 kg/m2, adding adjustment for pack-years of smoking, restricting analysis to never-smokers, excluding those with pre-existing diseases before baseline (chronic obstructive lung disease, cancer, myocardial infarction or other ischemic heart disease, heart failure, peripheral arterial disease, and stroke as suggested by the reviewer), and excluding the first 2 years of follow-up. Statistical analyses were carried out in Stata version 15.1 (StataCorp, College Station, TX, USA) and in R, version 4.0 (R Core Team, 2020). 
Modified Mediterranean-like diet (mMED) score The dietary assessment has been described previously [33]. Briefly, the FFQs included 96 and 132 food items in 1997 and 2009, respectively. We calculated an mMED score adapted from the Mediterranean diet scale devised by Trichopoulou and colleagues [34] using previously defined food items [35], but the scoring was modified according to Knudsen and colleagues, rendering a continuous score [36]. Details of the scoring, ranging from 0 to 8 on a continuous scale, are found below; participants with higher score points were more adherent to the diet. For analysis, mMED was classified into 3 predefined categories (0 to <4, 4 to <6, and 6–8 score points) chosen to balance exposure range and numbers of individuals in each category [37]. Participants indicated in the FFQs how often, on average, they had consumed each food item during the last year, choosing from 8 predefined frequency categories ranging from "never/seldom" to "3 or more times per day". Frequently consumed foods such as dairy products and bread were reported as the number of servings per day (open question). Information on fat type including vegetable oils used in cooking and as salad dressing was also reported. At baseline in 1997, 19% of the women and 12% of the men reported use of olive oil in dressing. The corresponding frequencies were 25% and 19% for use of olive oil in cocking. In 2009, 41% of the women and 37% of the men reported use of olive oil in dressing and with similar proportions of olive oil use in cocking. Total amount of alcohol consumed per day was derived from the FFQ by multiplying the reported frequencies with the reported amounts on a single occasion. Energy intake was estimated by multiplying the portion-specific consumption frequency of each food item with the nutrient content obtained from the Swedish food database [33]. 
The mMED score comprises 8 components: fruit and vegetables (apple, banana, berry, orange/citrus, and other fruit; carrot, beetroot, broccoli, cabbage, cauliflower, lettuce, onion, garlic, pepper, spinach, tomato, and other vegetables), legumes (peas, lentils, beans, and pea soup) and nuts, unrefined or high-fiber grains (whole-meal bread, crisp bread, oatmeal, and bran of wheat), fermented dairy products (sour milk, yoghurt, and cheese), fish (excluding shellfish), red and processed meat, any use of olive or rapeseed oil for cooking or as dressing, and alcohol intake. An individual with a reported intake above or below a specific cut point for each component of a diet score usually receives discrete score points (0 or 1), but in the method of Knudsen and colleagues [36], each individual receives 1 or a ratio between the actual intake and a chosen intake amount. Such an approach generates continuous component variables and improves precision of the exposure assessment. In the present study, the reference points for fruit and vegetables, legumes and nuts, unrefined or high-fiber grains, fermented dairy products, and fish were the median intakes in the 1997 data, which serve as lower-intake thresholds. A participant with an intake of legumes and nuts of x grams, below the median, will thus receive a score of x/median1997. For red and processed meat, consumption below the population median intake rendered a score of 1 point, intakes of 2 or more times the population median rendered 0 points, and intakes above the median (but below 2 × population median) rendered a score of 1 − (actual intake − median1997)/median1997. Any use of olive or rapeseed oil gave 1 point and otherwise 0 points. The alcohol component was coded as intake divided by 5 in the range 0–5 grams/day, as 1 in the intake range 5–15 grams/day, as 1 − (intake − 15)/15 in the range above 15 up to 30 grams/day, and 0 for intakes above 30 grams/day. 
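The component rules described above can be written out as a short Python sketch. This is an illustration only, not the authors' code: all function names are hypothetical, the median values in the example are invented, and two readings of the text are assumed. First, beneficial components are capped at 1 point once intake reaches the 1997 median. Second, the alcohol score falls linearly from 1 at 15 grams/day to 0 at 30 grams/day, which is the only form that joins the neighbouring ranges continuously.

```python
def benefit_component(intake, median_1997):
    """Beneficial food group: intake relative to the 1997 median, capped at 1 (assumed cap)."""
    if median_1997 <= 0:
        raise ValueError("median_1997 must be positive")
    return min(max(intake, 0.0) / median_1997, 1.0)


def meat_component(intake, median_1997):
    """Red and processed meat: 1 at or below the median, 0 at twice the median or more."""
    if intake <= median_1997:
        return 1.0
    if intake >= 2 * median_1997:
        return 0.0
    return 1.0 - (intake - median_1997) / median_1997


def oil_component(uses_olive_or_rapeseed_oil):
    """Any use of olive or rapeseed oil gives 1 point, otherwise 0."""
    return 1.0 if uses_olive_or_rapeseed_oil else 0.0


def alcohol_component(grams_per_day):
    """Rises to 1 at 5 g/day, stays at 1 up to 15 g/day, falls to 0 at 30 g/day (assumed linear)."""
    if grams_per_day <= 5:
        return max(grams_per_day, 0.0) / 5
    if grams_per_day <= 15:
        return 1.0
    if grams_per_day <= 30:
        return 1.0 - (grams_per_day - 15) / 15
    return 0.0


def mmed_score(benefit_intakes, benefit_medians, meat, meat_median, uses_oil, alcohol_g):
    """Sum of 8 components: 5 beneficial groups, meat, oil, and alcohol (range 0 to 8)."""
    if len(benefit_intakes) != 5 or len(benefit_medians) != 5:
        raise ValueError("expected 5 beneficial food groups")
    total = sum(benefit_component(i, m) for i, m in zip(benefit_intakes, benefit_medians))
    return (total + meat_component(meat, meat_median)
            + oil_component(uses_oil) + alcohol_component(alcohol_g))


# Hypothetical medians and intakes (grams/day), for illustration only.
medians = [300.0, 20.0, 100.0, 150.0, 30.0]
intakes = [150.0, 20.0, 50.0, 300.0, 30.0]
score = mmed_score(intakes, medians, meat=90.0, meat_median=60.0,
                   uses_oil=True, alcohol_g=22.5)
print(score)
```

With 5 capped beneficial components plus meat, oil, and alcohol, the maximum attainable total is 8, matching the stated 0 to 8 range of the score.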
The same 1997 cutoff points were applied using the 2009 data in order to avoid secular trends and intake differences caused by the fact that the number of food items was higher in the 2009 FFQ. The more detailed FFQ in 2009 is a reflection of a greater diversification of diet over time. Assessment of covariates Covariates obtained from the questionnaires (1997 and 2008/2009) were age, smoking status (including cigarettes per day at different ages), walking/cycling, leisure time physical exercise during the past year, and, as markers of socioeconomic status, cohabiting/marital status as well as educational level. The exercise questions have been validated against activity records and accelerometer data [38]. Comorbidity, expressed as Charlson’s weighted comorbidity index [39, 40], was defined using ICD diagnosis codes (versions 8, 9, and 10) from the National Patient Register from 1964 to before baseline 1 January 1998. Information on diabetes mellitus was retrieved from the questionnaire and from the National Patient Register. Assessment of deaths All-cause mortality was our primary outcome, with information obtained from the continuously updated Swedish Total Population Register. A complete linkage with the register is possible since all Swedish residents have a unique personal identity number. Since 1952, the National Board of Health and Welfare has maintained information with yearly updates on the causes of death for all Swedish residents in the Cause of Death Registry. We used the underlying cause of death to define our secondary outcome, mortality from CVD (ICD-10 codes I00–I99). Statistical analysis For each participant, follow-up time accrued from 1 January 1998 until the date of death, a questionnaire response indicating a BMI <20 kg/m2 in 2008 (n = 1,824), or the end of the study period (31 October 2018 for all-cause mortality and 31 December 2017 for CVD mortality). 
The associations of mMED and BMI with all-cause mortality and CVD mortality were assessed as age and multivariable-adjusted hazard ratios (HRs) with 95% confidence intervals (CIs) by Cox proportional hazards regression models, with time-updated information of all variables except Charlson’s comorbidity index and diabetes mellitus (defined only at baseline) and calendar date as the timescale. Both exposures were initially treated as continuous variables. To select suitable covariates for the multivariable model, we used current knowledge and a directed acyclic graph [41], presented as S2 Protocol. The overall model included sex, age (splines with 3 knots), educational level (≤9, 10–12, >12 years, other), living alone (yes or no), leisure time exercise during the past year (<1 h/w, 1 h/w, 2–3 h/w, 4–5 h/w, >5 h/w), walking/cycling (almost never, <20 min/d, 20–40 min/d, 40–60 min/d, 1–1.5 h/d, >1.5 h/d), height (splines with 3 knots), energy intake (splines with 3 knots), smoking habits (current, former, never), Charlson’s weighted comorbidity index (continuous), and diabetes mellitus as a separate marker variable (yes/no). Missing data were imputed (20 imputations) using Stata’s “mi” package (multiple imputations using chained equations). The proportion of missing data in the cohorts was 4% for BMI, 3% for height, 9% for walking/bicycling, 11% for exercise, and 6% for marital status. For all other covariates, the percentage of missing was less than 2%. Missingness of foods at baseline was for fruit and vegetables 0.1%, legumes and nuts 3.4%, grains 0.9%, fermented dairy products 2.9%, fish 1.1%, meat 0.7%, olive or rapeseed oil 0%, and alcohol intake 0%. Nonlinear trends of mortality were assessed using restricted cubic splines with 3 knots placed at centiles 10, 50, and 90 of mMED and BMI, respectively. 
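To make the spline specification concrete, the following Python sketch builds a restricted cubic spline basis with knots at the 10th, 50th, and 90th centiles, using the standard truncated-power construction that is constrained to be linear beyond the outer knots. It is not the authors' Stata or R code. The helper names, the centile interpolation rule, and the BMI values are assumptions made for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def cube_plus(u):
    """Truncated cube: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Return [x, nonlinear terms] of a restricted cubic spline with k >= 3 knots.

    With k knots there are k - 2 nonlinear terms, so 3 knots give one term.
    Terms are divided by (last knot - first knot)**2 to keep them on the scale of x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")
    gap = t[k - 1] - t[k - 2]
    scale = (t[k - 1] - t[0]) ** 2
    terms = [x]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[k - 2]) * (t[k - 1] - t[j]) / gap
                + cube_plus(x - t[k - 1]) * (t[k - 2] - t[j]) / gap)
        terms.append(term / scale)
    return terms


# Hypothetical BMI values (kg/m2), for illustration only.
bmi_values = [20.5, 22.0, 23.5, 25.0, 26.0, 27.5, 29.0, 31.0, 34.0]
knots = [percentile(bmi_values, q) for q in (10, 50, 90)]
print(knots)
print(rcs_basis(28.0, knots))
```

The resulting columns would enter the Cox model in place of the raw exposure; a nonzero coefficient on the nonlinear term is what allows a J-shaped curve.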
We performed stratified analyses in subgroups of potential confounders in which BMI below or above the median of 26 kg/m2 and mMED score were examined as continuous variables. The purpose of these analyses in homogeneous strata was 2-fold: to visualize and evaluate potential confounding, although with a limitation of different baseline hazards in the strata, and to evaluate potential effect modification of the exposures. Combinations of the categories of BMI and mMED were used to jointly classify study participants into 9 strata. Participants with normal BMI and in the highest category of mMED were used as the reference category in these analyses. Test for homogeneity of HRs across strata was done according to Fleiss [42]. We conducted a stratified analysis by sex with all-cause mortality as outcome. A complementary analysis of risk differences (RDs) was suggested by one of the reviewers. By using the approach described by Austin [43], multivariable-adjusted RDs and relative risks (RRs) were calculated from the predicted survival curve based on the Cox model for all-cause mortality. For the main analysis, the RDs and RRs (with 95% CIs based on 500 bootstrap replicates) were calculated at 20 years, and for the sensitivity analysis starting follow-up in 2009, RDs and RRs at 9 years of follow-up were calculated. For the sensitivity analysis, in which no variable information was updated, yet another method based on pseudo-observations [44], as suggested by the reviewer, was used to estimate the RDs. 
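The joint classification into 9 strata can be sketched in Python as follows. The BMI and mMED cut points are those given in the methods. The function names and the label "medium" for the middle mMED category are assumptions for illustration, since the text names only the low and high categories.

```python
from itertools import product

BMI_LABELS = ("normal", "overweight", "obese")
MMED_LABELS = ("low", "medium", "high")  # "medium" is an assumed label
REFERENCE = ("normal", "high")           # reference stratum for the HRs


def bmi_category(bmi):
    """Normal 20 to <25, overweight 25 to <30, obese >=30 kg/m2; below 20 is excluded."""
    if bmi < 20:
        return None
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def mmed_category(score):
    """Low 0 to <4, medium 4 to <6, high 6 to 8 score points."""
    if not 0 <= score <= 8:
        raise ValueError("mMED score must lie between 0 and 8")
    if score < 4:
        return "low"
    if score < 6:
        return "medium"
    return "high"


def joint_stratum(bmi, score):
    """Return (BMI category, mMED category), or None for excluded participants."""
    bmi_cat = bmi_category(bmi)
    if bmi_cat is None:
        return None
    return (bmi_cat, mmed_category(score))


all_strata = list(product(BMI_LABELS, MMED_LABELS))
print(len(all_strata))
print(joint_stratum(33.0, 6.5))
```

In a regression model, each stratum other than the reference would be represented by its own indicator variable, so that every HR is read against participants with normal BMI and high mMED.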
Additional sensitivity analyses were conducted, excluding those with a BMI greater than 35 kg/m2, restricting those with normal BMI to 22–25 kg/m2, adding adjustment for pack-years of smoking, restricting analysis to never-smokers, excluding those with pre-existing diseases before baseline (chronic obstructive lung disease, cancer, myocardial infarction or other ischemic heart disease, heart failure, peripheral arterial disease, and stroke as suggested by the reviewer), and excluding the first 2 years of follow-up. Statistical analyses were carried out in Stata version 15.1 (StataCorp, College Station, TX, USA) and in R, version 4.0 (R Core Team, 2020). Results Age-standardized baseline characteristics (mean age 61 years, range 45–83) within categories of BMI and mMED are displayed in Table 1. Ten percent of the participants were obese, and 46% had a normal BMI; 44% of the participants reported dietary habits consistent with high adherence to mMED and 8% low adherence. Individuals who were overweight or obese reported lower educational attainment, a higher prevalence of diabetes, and less exercise than those with a normal BMI. Those with high mMED had higher educational attainment, higher physical activity level and energy intake, and a higher prevalence of cohabitation. Table 1. Age-standardized baseline characteristics of the participants by 3 categories of BMI and 3 categories of Mediterranean diet score, respectively. https://doi.org/10.1371/journal.pmed.1003331.t001 During up to 21 years of follow-up (mean 17.4 years) that accrued 1,372,266 person-years of observation, 30,389 (38%) participants died (22 deaths per 1,000 person-years). HRs of death were related to BMI in a J-shaped pattern (Fig 1A for all-cause mortality and Fig 1B for cardiovascular mortality) and inversely with adherence to mMED (Fig 1C for all-cause mortality and Fig 1D for cardiovascular mortality). 
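The crude follow-up figures reported above can be checked with a few lines of Python. The counts are taken from the text, with the analysis cohort of 79,003 participants as stated in the methods; the function name is hypothetical.

```python
def rate_per_1000_person_years(events, person_years):
    """Crude event rate expressed per 1,000 person-years."""
    return 1000 * events / person_years


deaths = 30_389
person_years = 1_372_266
participants = 79_003

rate = rate_per_1000_person_years(deaths, person_years)
mean_follow_up_years = person_years / participants
percent_died = 100 * deaths / participants

print(round(rate), round(mean_follow_up_years, 1), round(percent_died))
```

Rounded, these reproduce the reported 22 deaths per 1,000 person-years, a mean follow-up of 17.4 years, and 38% of participants dying.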
The nadir in HRs of all-cause mortality was around a BMI of 26 kg/m2, the median, with an HR of 1.022 (95% CI 1.017–1.027) per 1 kg/m2 above this level. Each unit higher mMED score was associated with a multivariable-adjusted HR of 0.860 (95% CI 0.849–0.871). Fig 1. Association between BMI (A for all-cause mortality and B for cardiovascular mortality) and an mMED score (C for all-cause mortality and D for cardiovascular mortality) with mortality. The dark gray shaded regions in the figures correspond to 95% CIs, and the spike plots represent the distribution of BMI and mMED scores, respectively. Assessed by multivariable-adjusted HRs using Cox regression analysis and restricted cubic splines, with a BMI of 25 kg/m2 and mMED score of 8 units as references. HRs adjusted for sex, age (splines with 2 knots), educational level (≤9, 10–12, >12 years, other), living alone (yes or no), leisure time physical exercise during the past year (<1 h/w, 1 h/w, 2–3 h/w, 4–5 h/w, >5 h/w), walking/cycling (almost never, <20 min/d, 20–40 min/d, 40–60 min/d, 1–1.5 h/d, >1.5 h/d), height (splines with 2 knots), energy intake (splines with 2 knots), smoking habits (current, former, never), Charlson’s weighted comorbidity index (continuous; 1–16), and diabetes mellitus (yes/no). BMI, body mass index; CI, confidence interval; CVD, cardiovascular disease; HR, hazard ratio; mMED, modified Mediterranean-like diet. https://doi.org/10.1371/journal.pmed.1003331.g001 Fig 2A and Fig 2B illustrate the analysis of mortality by categories of covariates for BMI as a continuous variable (in subgroups below or above the median of 26 kg/m2 to take into account the J-shaped association with risk) and for mMED score as a continuous variable. The overall pattern of the HRs for all-cause mortality was in the same direction within each subgroup. Fig 2. 
Subgroup analysis for BMI as continuous variable below or above the median of 26 kg/m2 (A) and for mMED (B) as a continuous variable by categories of the covariates. The whiskers represent 95% CIs. Associations expressed as multivariable-adjusted HRs of all-cause mortality by 1 unit change in BMI or mMED score. HRs adjusted for sex, age (splines with 2 knots), educational level (≤9, 10–12, >12 years, other), living alone (yes or no), leisure time physical exercise during the past year (<1 h/w, 1 h/w, 2–3 h/w, 4–5 h/w, >5 h/w), walking/cycling (almost never, <20 min/d, 20–40 min/d, 40–60 min/d, 1–1.5 h/d, >1.5 h/d), height (splines with 2 knots), energy intake (splines with 2 knots), smoking habits (current, former, never), Charlson’s weighted comorbidity index (continuous; 1–16), and diabetes mellitus (yes/no). BMI, body mass index; CI, confidence interval; HR, hazard ratio; mMED, modified Mediterranean-like diet. https://doi.org/10.1371/journal.pmed.1003331.g002 Associations of cross-classified categories of BMI and mMED with total mortality are illustrated in Fig 3A, using normal BMI (mean 23 kg/m2) and high mMED as the reference. We found the lowest mortality among overweight (mean 27 kg/m2) individuals with high mMED (HR 0.94; 95% CI 0.90–0.98). Whatever the BMI category, a high mMED score brought the point estimate of the HR to the reference level or below. In particular, obese individuals (mean BMI 33 kg/m2) with high mMED scores did not have significantly elevated HR of all-cause mortality (HR of 1.03; 95% CI 0.96–1.11). In contrast, lower BMI did not compensate for a low mMED score. No matter what the BMI, participants with a low mMED score retained an elevated risk. Indeed, participants with a normal BMI but a low mMED score had an overall mortality HR of 1.60 (95% CI 1.48–1.74), which was actually higher than that for obese individuals with high mMED (p < 0.0001 for homogeneity). 
We found similar estimates among women and men (Table 2) as in the pooled analysis (Fig 3A). The attenuation of the estimates after multivariable adjustment was mainly driven by differences in physical activity. Fig 3. Associations of combinations of BMI and adherence to an mMED with all-cause (A) and CVD mortality (B). Estimated by multivariable-adjusted HRs by use of Cox regression analysis with a normal BMI and high adherence to mMED as the reference. The CI in each subpanel is expressed both in numbers and as a line representing the width. HRs adjusted for sex, age (splines with 2 knots), educational level (≤9, 10–12, >12 years, other), living alone (yes or no), leisure time physical exercise during the past year (<1 h/w, 1 h/w, 2–3 h/w, 4–5 h/w, >5 h/w), walking/cycling (almost never, <20 min/d, 20–40 min/d, 40–60 min/d, 1–1.5 h/d, >1.5 h/d), height (splines with 2 knots), energy intake (splines with 2 knots), smoking habits (current, former, never), Charlson’s weighted comorbidity index (continuous; 1–16), and diabetes mellitus (yes/no). BMI, body mass index; CI, confidence interval; CVD, cardiovascular disease; HR, hazard ratio; mMED, modified Mediterranean-like diet. https://doi.org/10.1371/journal.pmed.1003331.g003 Table 2. Combined associations of a Mediterranean diet score and BMI on all-cause mortality in women and in men. High adherence to a Mediterranean diet and a normal body mass index is the reference with an HR of 1.0. https://doi.org/10.1371/journal.pmed.1003331.t002 RDs and RRs are presented in S1 Table. Generally, the results followed the same pattern as that for the HRs. 
At 20 years of follow-up, the mortality risk difference for participants with a normal BMI and a low mMED score compared with those with a high mMED score and a normal BMI was 0.094 (95% CI 0.090–0.097), corresponding to a number needed to treat of 11 individuals. Our secondary outcome was CVD mortality, with 12,064 cardiovascular deaths during follow-up (Fig 3B). For this outcome, the lowest mortality HRs were in participants with high mMED scores and normal or overweight BMI (Fig 3B). A high mMED score was associated with lower CVD mortality within each BMI stratum, but in contrast to findings for total mortality, individuals with high mMED scores and obesity retained a modestly elevated HR of 1.29 (95% CI 1.16–1.44). Otherwise, the patterns of the HRs were similar to those for all-cause mortality. Participants with a normal BMI but low mMED score had a CVD mortality HR of 1.76 (95% CI 1.55–1.99), which was statistically indistinguishable from the HR for the obese. Further sensitivity analyses revealed similar estimates as the primary analyses for the combined exposures of BMI and mMED, including exclusion of participants with a BMI higher than 35 kg/m2 (S1 Fig), restricting the analysis to those with normal BMI to a more narrow 22–25 kg/m2 range (S2 Fig), additionally adjusting for pack-years of smoking (S3 Fig), restricting analysis to never-smokers (S4 Fig), and excluding individuals with any of the following criteria: ever smokers, BMI below 22 kg/m2, and those with pre-existing diseases before baseline (chronic obstructive lung disease, cancer, myocardial infarction or other ischemic heart disease, heart failure, peripheral arterial disease, and stroke), as well as excluding the first 2 years of follow-up (S5 Fig). 
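The step from the reported risk difference to a number needed to treat is the reciprocal of the absolute risk difference, conventionally rounded up to a whole person. A minimal Python sketch follows. The function names are hypothetical, the RD of 0.094 and its CI are from the text, and the two absolute risks used to illustrate RD and RR are invented for illustration.

```python
import math


def risk_difference(risk_exposed, risk_reference):
    """Absolute difference in cumulative risk at a fixed follow-up time."""
    return risk_exposed - risk_reference


def relative_risk(risk_exposed, risk_reference):
    """Ratio of cumulative risks at a fixed follow-up time."""
    if risk_reference <= 0:
        raise ValueError("reference risk must be positive")
    return risk_exposed / risk_reference


def number_needed_to_treat(rd):
    """Reciprocal of the absolute risk difference, rounded up."""
    if rd == 0:
        raise ValueError("risk difference of 0 gives no finite NNT")
    return math.ceil(1 / abs(rd))


# Reported 20-year RD for normal BMI with low versus high mMED, with its 95% CI.
nnt_point = number_needed_to_treat(0.094)
nnt_bounds = (number_needed_to_treat(0.097), number_needed_to_treat(0.090))
print(nnt_point, nnt_bounds)

# Invented cumulative risks, for illustration only: 1 minus predicted survival.
rd_example = risk_difference(0.45, 0.35)
rr_example = relative_risk(0.45, 0.35)
print(round(rd_example, 3), round(rr_example, 3))
```

Because the RD comes from an observational comparison, the resulting figure describes the contrast between the two groups and does not establish what a dietary intervention would achieve.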
Discussion In this large, population-based cohort analysis of middle-aged and older men and women, obese individuals with high adherence to a Mediterranean-type diet did not experience the increased overall mortality otherwise associated with high BMI, although a higher CVD mortality remained. However, lower BMI did not appear to counter the elevated mortality associated with a low adherence to a Mediterranean-like diet: individuals with a low mMED score retained an increased mortality even with a normal BMI. These results indicate that adherence to healthy diets such as a Mediterranean-like diet may be a more appropriate focus than avoidance of obesity for the prevention of overall mortality. Ours is the first large cohort study examining the combined association of BMI and a Mediterranean-like diet with rates of mortality. The novelty of our study is the examination of combined strata of BMI and Mediterranean diet. The J-shaped association between BMI and all-cause as well as cardiovascular mortality confirms results from previous observational studies [2–5] as well as a mendelian randomization study based on a genetic instrument for BMI [45]. A modestly sized secondary prevention trial of a Mediterranean diet after a myocardial infarction reported more than a halved rate of all-cause mortality after 4 years among those randomized to the diet [19]. Our results are also partially consistent with those of the larger primary prevention PREDIMED trial. This study included 7,447 participants 55–80 years of age with a mean BMI of 30 kg/m2 at high risk of CVD [20]. After 5 years of follow-up, there was a 30% reduction in risk of myocardial infarction, stroke, or cardiovascular death in those randomized to a Mediterranean diet, with an even larger effect in obese individuals. However, all-cause mortality was not affected by the intervention [20]. 
The magnitude of differences in the 14-point score between the Mediterranean diet intervention and the control diet group during different time points of follow-up in the PREDIMED trial was not large, ranging from 1.4 to 1.8 points, a smaller exposure contrast than in our study. Potential mechanisms A high BMI has been associated with a negative impact on risk factors for premature death and CVD, including hypertension, insulin resistance, hyperlipidemia, low-grade inflammation, and oxidative stress [46]. In contrast, intervention studies have shown reduced blood pressure, improved insulin resistance, lower blood lipids, and lower inflammation and oxidative stress marker levels with Mediterranean-like diet even in those with continuing high body weight [47–53]. Additionally, these diets have effects on gut-microbiota–mediated production of metabolites influencing metabolic health [49], higher circulating adiponectin concentrations [52], and improved endothelial function [52]. Even though a Mediterranean-like diet seems to have counteracted higher all-cause mortality associated with obesity in our study, these individuals still had modestly higher CVD mortality, albeit with lower rates than obese individuals who had lower mMED scores. This remaining elevation in risk could have several different explanations; one might be the consequence of a common genetic predisposition to both high BMI and CVD [54–56]. Another biological explanation may be that an even higher adherence to a classical Mediterranean diet is needed to fully compensate for obesity or that the negative effect of obesity on cardiovascular risk factors cannot be fully compensated for by healthy eating. The relatively high mortality rates in our study among individuals with a normal weight, even among never-smokers, might seem counterintuitive. 
However, nutritional reserves may be particularly needed at older ages, and sarcopenia associated with low body weight and malnutrition is a strong independent predictor of early death [57, 58]. A healthy diet, including a Mediterranean diet, is related to a lower future risk of sarcopenia, frailty, and falls [59–62]. A low BMI and a low adherence to mMED are both strongly associated with higher risk of fragility fractures [63–65], which in turn leads to high mortality rates [66, 67]. In elderly individuals, concomitant low BMI and malnutrition have also led to decreased immune function, followed by a higher risk of infections [68, 69] and higher risk of surgical complications [70], more frequent hospital admissions, and a 4-fold greater risk of mortality [71]. Strengths and limitations Our analysis was made possible by use of 2 population-based cohorts in a setting with wide variation in dietary habits. We had a long follow-up with a large number of deaths, ascertained by use of national register information and personal identification numbers without loss to follow-up. We used time-updated information on diet, other lifestyle factors such as exercise and walking, socioeconomic status, and comorbidity information in our statistical analysis. Exclusion of very lean individuals from the analysis lowered the risk of reverse causation bias. The results were independent of other major known risk factors for early death, and we found consistency of the HRs in subgroups of covariates, an indication of no major confounding or effect modification. However, our results might not apply to people in other settings with different dietary patterns, to those with more extreme obesity (BMI >35 kg/m2), or to younger age groups. Measurement errors in self-reported lifestyle factors such as the diet are inevitable, generally leading to conservatively biased estimates of association. 
Although recall of weight and height on average are quite accurate, those with high body weight tend to slightly underreport their weight [72], and therefore, some truly obese individuals might have been classified as overweight. Most importantly, our observational study of the associations of diet and BMI with mortality cannot prove that weight loss or dietary change can reduce the risk of death, and therefore, our RDs and corresponding numbers needed to treat are recommended to be cautiously interpreted. Clinical trials would be required for that level of certainty, but long-term adherence to the allocated diet is an issue with such design. Replication of our results by independent researchers and with use of other cohorts with time-updated lifestyle information would also be of additional value since recommendations cannot be based on our findings alone. Conclusions The results from this longitudinal cohort study indicate that for both women and men during the last decades of life, diet can modify the association of a higher BMI with mortality; obese individuals adhering to a Mediterranean diet did not have an increased mortality in comparison to more lean individuals. In contrast, a lean BMI did not offset a poor diet. Nonetheless, a healthy diet may not completely counter higher CVD mortality related to obesity. Supporting information S1 STROBE Checklist. Checklist according to STROBE guidelines. STROBE, Strengthening the Reporting of Observational Studies in Epidemiology. https://doi.org/10.1371/journal.pmed.1003331.s001 (DOCX) S1 Protocol. Prospective study plan. https://doi.org/10.1371/journal.pmed.1003331.s002 (DOCX) S2 Protocol. Directed acyclic graph with code displaying the selection of covariates for the analysis of association of BMI combined with adherence to a Mediterranean-like diet with mortality. BMI, body mass index https://doi.org/10.1371/journal.pmed.1003331.s003 (DOCX) S1 Fig. Associations of combinations of BMI and adherence to an mMED with all-cause mortality after exclusion of those with BMI higher than 35 kg/m2. BMI, body mass index; mMED, modified Mediterranean-like diet https://doi.org/10.1371/journal.pmed.1003331.s004 (DOCX) S2 Fig. Associations of combinations of BMI and adherence to an mMED with all-cause mortality after restriction of the analysis to those with normal BMI to a more narrow 22–25 kg/m2 range. BMI, body mass index; mMED, modified Mediterranean-like diet https://doi.org/10.1371/journal.pmed.1003331.s005 (DOCX) S3 Fig. Associations of combinations of BMI and adherence to an mMED with all-cause mortality after extending the multivariable model by additional adjustment for pack-years of smoking. BMI, body mass index; mMED, modified Mediterranean-like diet https://doi.org/10.1371/journal.pmed.1003331.s006 (DOCX) S4 Fig. 
Associations of combinations of BMI and adherence to an mMED with all-cause mortality after restriction to never-smokers. BMI, body mass index; mMED, modified Mediterranean-like diet https://doi.org/10.1371/journal.pmed.1003331.s007 (DOCX) S5 Fig. Associations of combinations of BMI and adherence to an mMED with all-cause mortality excluding individuals with any of the following criteria: ever smokers, BMI below 22 kg/m2, and those with pre-existing diseases before baseline (chronic obstructive lung disease, cancer, myocardial infarction or other ischemic heart disease, heart failure, peripheral arterial disease, and stroke) and excluding the first 2 years of follow-up. BMI, body mass index; mMED, modified Mediterranean-like diet https://doi.org/10.1371/journal.pmed.1003331.s008 (DOCX) S1 Table. Associations of combinations of BMI and adherence to an mMED with all-cause mortality. The upper part of the table presents results with use of time-updated information and 20 years of follow-up from 1997 and the lower part with use of 9 years of follow-up from 2009. The estimated associations are all multivariable-adjusted*. Absolute RDs and RRs (at 20 years and 9 years of follow-up, respectively) are calculated from the predicted survival curves based on the multivariable-adjusted Cox model. The last column of the 9 years follow-up from 2009 presents absolute RDs calculated from pseudo-observations using a GEE model with identity link. BMI, body mass index; GEE, generalized estimating equation; mMED, modified Mediterranean-like diet; RD, risk difference; RR, relative risk. https://doi.org/10.1371/journal.pmed.1003331.s009 (DOCX) Acknowledgments We acknowledge the Swedish Research Council-supported national research infrastructure SIMPLER for provisioning of facilities and experimental support, and we thank Anna-Karin Kolseth for her assistance. 
The computations were performed on resources provided by the Swedish National Infrastructure for Computing’s (https://www.snic.se/) support for sensitive data (SNIC-SENS) through the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project SIMP2019004.
Severe mental illness diagnosis in English general hospitals 2006-2017: A registry linkage study
Mansour, Hassan; Mueller, Christoph; Davis, Katrina A. S.; Burton, Alexandra; Shetty, Hitesh; Hotopf, Matthew; Osborn, David; Stewart, Robert; Sommerlad, Andrew
doi: 10.1371/journal.pmed.1003306
pmid: 32941435
Background The higher mortality rates in people with severe mental illness (SMI) may be partly due to inadequate integration of physical and mental healthcare. Accurate recording of SMI during hospital admissions has the potential to facilitate integrated care including tailoring of treatment to account for comorbidities. We therefore aimed to investigate the sensitivity of SMI recording within general hospitals, changes in diagnostic accuracy over time, and factors associated with accurate recording. Methods and findings We undertook a cohort study of 13,786 adults with SMI diagnosed during 2006–2017, using data from a large secondary mental healthcare database as reference standard, linked to English national records for 45,706 emergency hospital admissions. We examined general hospital record sensitivity across patients’ subsequent hospital records, for each subsequent emergency admission, and at different levels of diagnostic precision. We analyzed time trends during the study period and used logistic regression to examine sociodemographic and clinical factors associated with psychiatric recording accuracy, with multiple imputation for missing data. Sensitivity for recording of SMI as any mental health diagnosis was 76.7% (95% CI 76.0–77.4). Category-level sensitivity (e.g., proportion of individuals with schizophrenia spectrum disorders (F20-29) who received any F20-29 diagnosis in hospital records) was 56.4% (95% CI 55.4–57.4) for schizophrenia spectrum disorder and 49.7% (95% CI 48.1–51.3) for bipolar affective disorder. Sensitivity for SMI recording in emergency admissions increased from 47.8% (95% CI 43.1–52.5) in 2006 to 75.4% (95% CI 68.3–81.4) in 2017 (ptrend < 0.001). Minority ethnicity, being married, and having better mental and physical health were associated with less accurate diagnostic recording. The main limitation of our study is the potential for misclassification of diagnosis in the reference-standard mental healthcare data. 
Conclusions Our findings suggest that there have been improvements in recording of SMI diagnoses, but concerning under-recording, especially in minority ethnic groups, persists. Training in culturally sensitive diagnosis, expansion of liaison psychiatry input in general hospitals, and improved data sharing between physical and mental health services may be required to reduce inequalities in diagnostic practice. Why was this study done? People with severe mental illness (SMI) have increased mortality and morbidity, largely due to preventable medical conditions, and these disparities have the potential to be ameliorated through better healthcare integration. Accurate recognition of SMI during hospital admissions can be critical as it allows continuity of previous pharmacological and supportive treatments and tailoring of inpatient and discharge care to individual needs. What did the researchers do and find? We examined the hospital discharge records of 13,786 individuals with SMI diagnosis from a mental health service, who had 45,706 admissions to English general hospitals between 2006 and 2017. We found that a psychiatric condition is recorded in around two-thirds of general hospital admissions of people with SMI. Recording of SMI diagnosis increased between 2006 and 2017. However, people from ethnic minority and married backgrounds were less likely to have psychiatric diagnosis recorded. Similarly, those with less severe mental or physical health symptoms were also less likely to have diagnosis recorded. What do these findings mean? Despite improvements over the past decade, inequities related to ethnicity remain. Policy-makers and clinicians should endeavor to improve recognition and recording of SMI in general hospital settings to promote integrated physical and mental healthcare. A limitation of our study is that our use of electronic health records for the reference-standard means that some people with SMI may have been misclassified. 
Introduction Severe mental illnesses (SMI), defined as schizophrenia spectrum and bipolar affective disorders, have lifetime prevalence of around 0.5% and 1% respectively [1, 2], and are associated with several physical comorbidities that increase the risk of general hospital admission [3, 4]. Life expectancy is 10–15 years lower for people with SMI [5], and this health gap is increasing over time [6], with SMI contributing to 3.5% of worldwide years lost to disability [7]. The 2019 Lancet Commission on Physical Health of People with Mental Illness identified the need for multidisciplinary approaches to multimorbidity in people with mental illness [8]. An important aspect of such healthcare integration is detection of psychiatric conditions in healthcare settings to allow continuity of previous care such as medication and tailoring of inpatient and discharge plans to account for comorbidity [9]. However, the separation of physical and mental healthcare in the United Kingdom, with different hospital settings using discrete electronic health records (EHRs), may inhibit this. There has been limited research into recognition of mental health conditions within secondary physical healthcare settings. One study investigated SMI recording in primary care [10], and another investigated recognition specifically of deliberate self-poisoning within one UK general hospital’s EHRs [11]; both were conducted over 20 years ago. A further study examined recording of alcohol disorders in primary and secondary care [12]. Other studies have investigated accuracy of psychiatric, rather than general hospital, records, in which recognition of mental illness is likely to be higher [13]. No studies have investigated time-trend changes or factors associated with systematic differences in SMI recording. 
Understanding such changes may inform future approaches to improving illness recording and elucidate potential biases in hospital records, which are being increasingly used for case-ascertainment in epidemiological studies [14]. We therefore aimed to evaluate the accuracy of SMI recording during English general hospital admissions of a person with preexisting SMI who is admitted to a general hospital for any health condition, using a large secondary mental healthcare database to identify people with SMI, and linked national general hospital data to assess diagnostic recording. Our specific objectives were to calculate the sensitivity of SMI diagnosis in general hospital records, evaluate time-trend changes of SMI diagnosis recording in general hospital records from 2006 to 2017, and examine association of sociodemographic and clinical factors with accuracy of psychiatric recording in general hospital records. Methods Study design and participants We undertook a cohort study, using data from the South London and Maudsley National Health Service (NHS) Foundation Trust (SLaM), one of Europe’s largest secondary mental healthcare trusts providing support to around 1.36 million people living in 4 ethnically diverse communities in South London, UK (Croydon, Lambeth, Lewisham, and Southwark). We used the “Clinical Record Interactive Search” (CRIS) data extraction tool, which enables construction of databases suitable for research by identifying, retrieving, and linking a pseudonymized version of SLaM patient records for over 450,000 individuals [15]. CRIS uses natural language processing (NLP) algorithms developed on General Architecture for Text Engineering (GATE) software [16] to extract information from unstructured fields of the clinical record. SLaM data were linked using deterministic matching procedures to NHS Digital’s Hospital Episode Statistics data source to identify admissions to any English hospital. 
Oxfordshire Research Ethics Committee C (18/SC/0372) approved these resources for secondary analysis. The terms of the ethical approval do not require consent to be provided, but all participants have the right to opt out of data use at any time. This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline (S1 STROBE Checklist) [17]. These data were selected to generate “reference-standard” SMI diagnoses because they included information from the predominant diagnostic and treatment service for mental illnesses in the catchment area. Patients were diagnosed after assessment by mental health staff, e.g., a psychiatrist, nurse, or psychologist, according to the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) [18], which is the predominant diagnostic framework in the UK. We included study participants who (1) had clinical contact with SLaM mental health services between 1 January 2006 and 31 March 2017 while aged 18 or over, (2) were diagnosed at any time with schizophrenia (ICD-10 code F20), schizotypal disorder (F21), delusional disorder (F22), schizoaffective disorder (F25), other nonorganic psychotic disorder (F28), unspecified nonorganic psychosis (F29), manic episode (F30), or bipolar affective disorder (F31), including those described in clinical coding or records as “probable” or “in remission,” and (3) were admitted to an English NHS general (nonpsychiatric) hospital after the first diagnosis of SMI in CRIS. Participants were identified from diagnoses in structured EHR diagnostic fields or unstructured text. Participants were assigned to only 1 diagnostic category following the “hierarchy” of psychiatric diagnoses [19] (e.g., an individual diagnosed sequentially with schizophrenia [F20] and a manic episode [F30] would be assigned to F20 group), as in previous studies [20]. 
We did this to avoid double-counting patients who may have been diagnosed at different times within the reference-standard database as having a schizophrenia-like disorder and bipolar disorder; we judged that this most likely represented evolving clinical opinion rather than coexistence of the 2 disorders, which is clinically unlikely [21]. Outcomes We obtained outcome data from Hospital Episode Statistics (HES) records, which include clinical diagnoses according to ICD-10 criteria from admissions to any hospital within England [22]. The HES database is primarily designed to allocate payment to hospitals for the care they provide, and its secondary uses are for research and health service planning. The index date for our study was the first SMI diagnosis within SLaM, and we obtained data from HES on all subsequent admissions of included patients to general (i.e., nonpsychiatric) hospitals, including dates of admission and discharge, admission method (emergency, i.e., unplanned admission, or elective, such as admission for renal dialysis, wound dressing, chemotherapy, or elective surgery), and up to 20 primary and secondary recorded diagnoses. Our outcome of interest was recording of psychiatric illness in any of the 20 recorded primary or secondary diagnoses. Diagnoses recorded in HES include those clinically identified by hospital staff during admissions, those derived from preexisting clinical records from secondary mental health trusts or previous hospital medical records, or those obtained following communication with primary care. Separate EHRs are used in different hospital trusts, meaning that diagnoses are not automatically entered in general hospital records from secondary mental healthcare or primary care records. Some EHRs may automatically populate diagnosis fields with previously recorded chronic conditions, although there are no data available to determine the extent of this practice. 
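As a rough illustration of the outcome definition above, the following Python sketch flags an admission as having a psychiatric illness recorded when any of its diagnosis fields holds an ICD-10 chapter F code. The input layout (a list of up to 20 code strings, with or without a decimal point) and the function names `icd10_block` and `psychiatric_recorded` are assumptions made for illustration; they do not reproduce the study's HES extract or analysis code.

```python
def icd10_block(code):
    """Return (chapter letter, two-digit category) for a code such as 'F20.0' or 'F209'.

    Returns None for empty or malformed entries.
    """
    code = (code or "").strip().upper()
    if len(code) < 3 or not code[0].isalpha() or not code[1:3].isdigit():
        return None
    return code[0], int(code[1:3])


def psychiatric_recorded(diagnoses):
    """True if any primary or secondary diagnosis is an ICD-10 chapter F code."""
    for code in diagnoses:
        block = icd10_block(code)
        if block is not None and block[0] == "F":
            return True
    return False


if __name__ == "__main__":
    # Hypothetical admission: hypertension, diabetes, and a schizophrenia code.
    print(psychiatric_recorded(["I10", "E119", "F20.0"]))
```

Parsing only the leading letter and two digits keeps the check independent of how the fourth character is stored, which is a design choice of this sketch rather than something stated in the source.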
Covariates We obtained age, sex, ethnicity (White, Mixed, Asian or British Asian, Black or Black British, and other), and marital status (single, married or cohabiting, divorced or separated, and widowed) from SLaM records. We used the Index of Multiple Deprivation (IMD) [23] to rate neighborhood-level socioeconomic deprivation. Clinical presentation was derived using the Health of the Nation Outcome Scale (HoNOS), a clinician-rated measure with good validity and adequate reliability that is routinely applied to assessed patients in UK mental health services [24]. HoNOS comprises 12 subscales, each rated on a 5-point Likert scale with higher values indicating more severe problems; we dichotomized these into 0–1, indicating no or minor problems, and 2–4, indicating more severe problems. As we aimed to assess the association of mental illness severity with diagnostic recording, we combined the subscales reflecting mental health symptoms (agitation, self-injury, drug/alcohol use, cognitive impairment, delusions/hallucinations, and depressed mood) into an ordinal scale indicating 0, 1, 2, or 3+ current mental health symptoms, as in previous studies [25]. We also used the physical illness and activity of daily living (ADL) impairment subscales as covariates. All covariates were taken from the assessment closest in time to the first general hospital admission. Analysis Our prospective analysis plan is in S1 Text. We described sociodemographic characteristics according to whether SMI had ever been recorded in hospital records, using chi-squared tests for categorical data and independent t tests for continuous data. We described the primary diagnoses for admissions. Sensitivity of general hospital SMI diagnosis.
We examined sensitivity of psychiatric diagnosis recording at the patient level (proportion of people with SLaM SMI diagnosis who ever had a mental illness recorded in their complete general hospital records as a primary or secondary diagnosis) and at the emergency admission level (proportion of emergency admissions in which a mental illness was recorded in primary or secondary diagnoses). We chose to examine emergency admissions as nonemergency admissions are usually recurrent brief admissions, which we considered unlikely to warrant a full diagnostic assessment [26]. In response to peer-reviewer comments, we additionally calculated (1) patient-level sensitivity according to the number of previous hospital admissions (0, 1–5, ≥6) and (2) admission-level sensitivity according to the primary diagnosis recorded for each hospital admission (grouped into ICD-10 categories). We report our sensitivity calculations at different levels of diagnostic precision. Our primary analyses calculated sensitivity at the level of any psychiatric diagnosis (proportion of individuals with SLaM diagnosis of any F20-31 disorder who received any psychiatric diagnosis [F00-99] in HES). We also examined category-specific sensitivity (e.g., proportion of individuals with schizophrenia spectrum disorders [F20-29] who received any F20-29 diagnosis in HES) and disorder-specific sensitivity (e.g., proportion of individuals with schizophrenia diagnosis [F20] who specifically received an F20 diagnosis in HES). We calculated 95% confidence intervals for sensitivity using the Clopper–Pearson method [27]. Time-trend changes from 2006 to 2017. We calculated sensitivity for each participant’s first emergency hospital admission following SMI diagnosis, stratified by year of admission, and used the Cochran–Armitage test to examine sensitivity changes over time [28]. Association of sociodemographic and clinical factors with psychiatric diagnosis being unrecorded.
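The Clopper–Pearson interval and Cochran–Armitage trend test named in the two preceding subsections can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (the analyses were run in Stata): the function names are hypothetical, the exact interval is found by bisection on the binomial tail probabilities, the trend test uses equally spaced scores by default, and the yearly counts in the usage example are invented because the per-year data are not given in the text.

```python
import math


def binom_logpmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_cdf(k, n, p):
    """P(X <= k); each term is computed on the log scale so large n does not overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return min(1.0, sum(math.exp(binom_logpmf(i, n, p)) for i in range(k + 1)))


def _solve(f, target, increasing, iters=50):
    """Bisection on [0, 1] for a monotone function f, returning p with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for x successes out of n trials."""
    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2.0, increasing=True)
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2.0, increasing=False)
    return lower, upper


def cochran_armitage(successes, totals, scores=None):
    """Chi-squared statistic (1 df) and p-value for a linear trend in proportions."""
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores, e.g. calendar years
    n_all = sum(totals)
    p_bar = sum(successes) / n_all
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_all)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = t_stat ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared with 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Patient-level counts reported in the Results: 10,574 of 13,786.
    low, high = clopper_pearson(10574, 13786)
    print(f"sensitivity {10574 / 13786:.1%}, 95% CI {low:.1%} to {high:.1%}")
    # Hypothetical counts for three consecutive years (not study data).
    print(cochran_armitage([2, 5, 8], [10, 10, 10]))
```

The first usage line can be compared against the patient-level interval reported in the Results; the second only demonstrates the call signature on made-up numbers.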
We calculated the association of sociodemographic and clinical characteristics with unrecorded diagnosis (no psychiatric diagnosis ever being recorded at a patient level) using multivariable logistic regression. Included variables were age, sex, ethnicity, marital status, IMD, number of mental illness symptoms, physical illness, ADL impairment, and number of hospital admissions (log-transformed because of the skewed distribution). To explore the impact of missing data, we conducted 2 non-prespecified sensitivity analyses. We first repeated the analysis without HoNOS variables because these data were missing for 21% of the cohort. We also used multiple imputation with chained equations to create 20 imputed datasets using STATA’s mi package by replacing missing values using a model constructed from all available covariates and outcome variables; our imputation used predictive mean matching for continuous data and logistic regression for categorical data. We then conducted logistic regression on each imputed dataset before combining coefficients using Rubin’s rules [29]. The fraction of missing information after imputation was 0.33, so 20 imputed datasets is likely to give replicable estimates of standard error [30]. All analyses were undertaken using STATA SE version 15 (Stata Corp. https://www.stata.com/).
Results We identified 28,832 individuals with F20-31 diagnosis who were seen by SLaM between 1 January 2006 and 31 March 2017 (Fig 1). We excluded 2,410 because of diagnosis of acute and transient psychotic disorder (F23) or induced delusional disorders (F24).
A further 12,636 individuals were excluded as they had no hospital admissions in the study period. Our final sample included 13,786 individuals who had 45,706 emergency admissions (Fig 1). Primary ICD-10 diagnoses are described in S1 Table; mental disorder was the primary diagnosis in 7.9% of emergency hospital admissions. Fig 1. Cohort of individuals with SMI who were also admitted into general hospitals. CRIS, Clinical Record Interactive Search; F20, schizophrenia; F21, schizotypal disorder; F22, delusional disorder; F23, acute and transient psychotic disorder; F24, induced delusional disorders; F25, schizoaffective disorder; F28, other nonorganic psychotic disorder; F29, unspecified nonorganic psychosis; F30, manic episode; F31, bipolar affective disorder; HES, Hospital Episode Statistics; ICD-10, International Statistical Classification of Diseases and Related Health Problems, Tenth Revision; SMI, severe mental illness. https://doi.org/10.1371/journal.pmed.1003306.g001 The mean age was 46.9 years, 51% of participants were female, and the majority were single (64.7%) (Table 1). Most were of White ethnicity (59.5%), and the largest ethnic minority group was Black African/Caribbean (27.7%). Three-quarters (10,574, 76.7%) of participants had a schizophrenia spectrum disorder, and 3,212 (23.3%) had bipolar disorder. At the time closest to first general hospital admission, about half of participants displayed no (24.8%) or only one (25.1%) significant mental health symptom; full information on prevalence of clinical problems in individual HoNOS domains is in S2 Table. The median number of hospital admissions was 3 (interquartile range [IQR] 1, 5), and the median duration of follow-up was 7.19 years (IQR 0.4, 10); 3,047 people died during follow-up. Table 1.
Sociodemographic and clinical characteristics of all participants, according to whether a psychiatric diagnosis was ever made in subsequent general hospital records. https://doi.org/10.1371/journal.pmed.1003306.t001 Sensitivity of general hospital SMI diagnosis We found that 10,574 of 13,786 people with SMI had any psychiatric illness recorded during their subsequent general hospital admissions, meaning that sensitivity at the level of each individual patient’s complete hospital records was 76.7% (95% CI 76.0–77.4) (Table 2). Patient-level sensitivity was 57.3% (55.8–58.8) for patients with only 1 admission, 80.7% (79.7–81.6) for those with 2–5 admissions, 92.8% (91.5–93.9) for those with 6–10 admissions, and 96.7% (95.7–97.6) for those with 11+ admissions. Table 2. Sensitivity of general hospital diagnoses of SMI 2006–17, at the level of each individual patient’s whole-hospital records. https://doi.org/10.1371/journal.pmed.1003306.t002 Sensitivity was lower when examining more specific diagnoses. Sensitivity was 56.4% (55.4–57.4) for category-specific ICD-10 diagnosis for schizophrenia spectrum disorder (F20-29) and 49.7% (48.1–51.3) for bipolar affective disorders (F30-31). Full results for disorder-specific recording can be found in S3 Table. Patient-level sensitivity was 45.1% (44.5–45.7) for schizophrenia (F20) and 40.4% (39.5–41.3) for bipolar affective disorder (F31). Disorder-specific sensitivity for other rarer conditions such as schizoaffective or schizotypal disorder was lower. For admission-level recordings, 32,033 of 45,706 emergency hospital admissions had any psychiatric diagnosis recorded, meaning that sensitivity was 70.1% (69.7–70.5) (Table 3). Recording of any psychiatric diagnosis was 71.1% (70.7–71.6) for schizophrenia spectrum disorders and 67.1% (66.2–67.9) for bipolar affective disorders.
Category-specific admission-level sensitivity was 45.5% (45.0–46.1) for admissions of people with schizophrenia spectrum disorder (F20-29) and 39.6% (38.7–40.5) for bipolar affective disorders. Admission-level sensitivity according to primary diagnosis varied from 33.1% (31.3–34.9) for admissions related to pregnancy, childbirth, and the puerperium (ICD-10 codes O00-O99) to 79.7% (78.4–81.0) for poisoning-related admissions (T36-65) and 81.0% (79.0–82.9) for endocrine, nutritional, and metabolic diseases (E00-E90) (see S1 Table for full results). Table 3. Sensitivity of general hospital diagnoses of SMI 2006–17, at the level of emergency hospital admissions only. https://doi.org/10.1371/journal.pmed.1003306.t003 Time-trend changes from 2006 to 2017 Sensitivity of recording for any psychiatric diagnosis increased from 47.8% (43.1–52.5) for emergency admissions during 2006 to 75.4% (68.3–81.4) for admissions during 2017 (ptrend < 0.001 [χ2 = 326, 1 df]), although much of this change was observed between 2009 and 2012 (Fig 2; full data in S4 Table). Fig 2. Time-trend of sensitivity for schizophrenia spectrum disorders and bipolar affective disorders diagnosis in general hospitals. Points represent the proportion of people’s first emergency hospital admissions following severe mental illness diagnosis in which a mental illness is recorded. Error bars show 95% confidence interval, and linear trend line shows change over time. https://doi.org/10.1371/journal.pmed.1003306.g002 Association of sociodemographic and clinical factors with psychiatric diagnosis being unrecorded In unadjusted analyses, age, ethnicity, marital status, mental and physical symptoms, functional impairment, and number of hospital admissions were associated with diagnostic recording (Table 4).
In mutually adjusted multivariable analysis, those from Black African/Caribbean backgrounds were more likely (odds ratio [OR] = 1.38 [95% CI 1.24, 1.55; p < 0.001]) to have no psychiatric diagnosis ever recorded compared with those from White ethnic backgrounds. Marital status was also associated with diagnostic accuracy; single people were less likely (OR = 0.78 [95% CI 0.63–0.92; p < 0.001]) to have no psychiatric disorder recorded compared with married individuals, as were divorced (OR = 0.76 [95% CI 0.63–0.92; p = 0.004]) or widowed people (OR = 0.77 [95% CI 0.60–1.00; p = 0.046]). More mental health symptoms were associated with greater diagnostic accuracy, with 2 (OR = 0.71 [95% CI 0.62–0.83; p < 0.001]) or 3 plus symptoms (OR = 0.61 [95% CI 0.52–0.73; p < 0.001]) being associated with lower risk of no psychiatric diagnosis being recorded. Difficulties with activities of daily living (OR = 0.71; 95% CI 0.74–0.95; p < 0.001) and physical illness (OR = 0.84; 95% CI 0.74–0.95; p = 0.004) were also associated with lower risk of unrecorded diagnosis. Results were consistent in sensitivity analyses without inclusion of HoNOS symptoms (S5 Table) and with multiple imputation for missing covariates (S6 Table).
Table 4. Association of clinical and sociodemographic characteristics with psychiatric diagnosis of people with severe mental illness not being recorded in general hospital records: Univariate and multivariable logistic regression. https://doi.org/10.1371/journal.pmed.1003306.t004
Discussion
In this study, we examined accuracy of general hospital records for people with SMI admitted to general hospitals, finding that at least some form of psychiatric diagnosis was recorded in 70.1% of individual emergency hospital admissions and in 76.7% of patients’ complete hospital discharge records. Our findings suggest that accuracy of SMI recording in general hospitals has improved over time with sensitivity for any psychiatric diagnosis in those experiencing emergency hospital admissions increasing from 47.8% in 2006 to 75.4% in 2017. Unrecorded psychiatric diagnosis was more likely in people with milder symptoms or higher ADL scores, married individuals, and ethnic minority groups. Although there has been improvement in recording of SMI within the general hospitals investigated, our findings show there are nearly 25% of individuals with established SMI diagnosis who have never had a preexisting SMI recorded throughout their, on average, 3 subsequent general hospital admissions.
Moreover, recording is lower for category-specific sensitivity. Thus, the proportion of individuals with schizophrenia spectrum disorders who received a diagnosis within that same ICD-10 F20-29 category was 56.4%, compared with 77.5% for recording of any psychiatric diagnosis, and for bipolar affective disorders, these figures were 49.7% and 74.6%, respectively. The lower sensitivity for more specific diagnoses may reflect uncertainty about diagnostic labels, including whether these are accurate and used or communicated appropriately [31]; there is the potential for disagreement between clinicians and for different diagnostic practices in different settings. The relatively low sensitivity at the category level is potentially problematic, given the importance of specific diagnosis for treatment outcomes and continuity of care, such as the different symptom clusters, illness severity, and pharmacological treatment indicated for schizophrenia compared with, for example, an anxiety disorder [32]. We acknowledge, however, that it is not necessarily the role of a physician or surgeon in a physical healthcare setting to diagnose a specific psychiatric condition; instead, recognition of the presence of a mental illness may suffice if it leads to a referral to a more specialist mental healthcare service such as liaison psychiatry. Other holistic measures to promote the wellbeing of a general hospital inpatient with SMI may also be instituted after identification of the presence of a psychiatric disorder. These may include consultation with family to identify whether psychiatric symptoms are new or preexisting, discussion with community mental health teams, or provision of a side room so that sleep can be prioritized. We are not aware of any previous studies that have examined sensitivity of psychiatric diagnosis recording in English general hospitals, so we cannot compare our findings directly with others.
Our sensitivity estimates are lower than the 88% and 96.3% coding accuracy reported for general practice EHRs [10, 33] and for deliberate self-poisoning admissions in UK hospitals [34], respectively. However, general practice records are likely to include lifelong diagnostic records, rather than be a snapshot of a clinical episode such as a hospital admission, so may be more likely to include a diagnosis received in another service sector. Self-poisoning is likely to have been the cause of presentation to hospital, and so recording of this is expected to be higher. Sensitivity in our study was higher than the figure of 52% reported for recognition of alcohol disorders by hospital staff [12]. Increased recording of diagnosis since the introduction of payment by results in 2006/2007 from 49.5% to 81.3% in 2017 is likely due to a range of reasons, including changes to policy approaches such as the “Five-Year Forwards” plan for whole-person centered care [34], financial incentives [35], improvements in coding practices [35], and expansion of liaison psychiatric services [36], whereby psychiatric diagnoses are more readily available (whether assigned anew or confirmed from previous records) from mental health specialists. The association between clinical symptoms and unrecorded psychiatric diagnosis showed a dose–response gradient. It is likely that increasing symptom severity reflects clinical complexity, meaning that difficulties are more prominent and diagnosis is therefore more likely to be recorded. Similarly, symptom severity might reflect more frequent or longer hospital admissions during which clinicians have more opportunity to investigate clinical records. These individuals might be better known to clinical staff as they have more regularly updated HES records. People with milder SMI are less likely to have acute psychiatric healthcare needs, so lower accuracy in milder cases may be of less concern.
Unrecorded diagnosis was more common in individuals from Black African/Caribbean or other ethnic minority backgrounds. There are several potential reasons for this, relating to patient or service-level factors [37]. Patient-level factors include language barriers, lack of information disclosure, or distrust of clinicians from different backgrounds, and service-level factors include clinician bias, stigma, or lack of cultural awareness [38]. Considering poor health outcomes for people from minority ethnic groups with SMI, including increased cardiovascular risk factors [39], reduced support post discharge [40], and increased rates of compulsory admissions [41], this is particularly concerning. Recording of SMI in general hospitals might be an early opportunity for support as this is a setting for key treatment decisions and an opportunity for enhanced continuity of care [9, 42]. We did not find an association of diagnostic accuracy with Asian or mixed backgrounds, possibly because of the smaller sample size giving less statistical power or reflecting other factors such as socioeconomic status. Counter to our expectation based on analyses of other mental disorders [43, 44], being married was associated with higher likelihood of diagnosis being unrecorded. This may be explained by marital status being a marker of SMI severity and chronicity, whereby symptoms are higher in those who are single as compared with those in relationships. However, adjustment for symptom severity did not attenuate this association. Alternatively, it may be that support from a partner makes mental health symptoms less apparent as clinicians might assume that someone is receiving support from home or feel less inclined to ask about symptoms of mental illness. Similarly, it might be that partners are more reluctant to disclose information about an SMI diagnosis because of the stigma associated with such conditions.
Sensitivity appeared to vary according to the primary reason for admission so that admissions for poisoning (which may have been precipitated by mental health symptoms) or those for endocrine and metabolic conditions (which may result from psychotropic adverse effects) had higher diagnostic recording. The lower recording for pregnancy and childbirth-related admissions is of concern and warrants further investigation, though it may reflect coding practice differences. Important strengths of this study include its large sample size, representativeness of people with SMI diagnosis, and data availability over a decade enabling analysis of changes over time. The main limitation is that EHRs are not primarily collected for research purposes, meaning that using SLaM records as the “reference standard” might be problematic because the amount of clinical contact before diagnosis and the diagnostic process may have varied, so some of our cohort may have been misdiagnosed. However, SLaM provides specialist diagnostic services, and we grouped patients according to the most recent recorded diagnoses from either structured quantitative health outcomes or rich unstructured clinical records. This approach also allowed construction of a large cohort that was representative of people with clinically diagnosed SMI, which would not have been feasible had we interviewed all participants with a standardized clinical assessment. Although there is wider debate about the validity of psychiatric diagnostic constructs [45], SMI categories are considered stable and persistent over time [46]. Furthermore, our primary analysis considered sensitivity at the level of all psychiatric diagnoses, for example, whether an individual with schizophrenia had any mental illness recorded, as we acknowledged the potential limitations of the reference standard.
Our cohort’s derivation from secondary mental health services may have meant that participants had more pronounced symptoms and care-seeking behaviors, which would likely result in overestimation of sensitivity, but we included individuals with SMI in remission and included a range of clinical severity. Although HES records cover all English hospitals, most admissions will have been in South London hospitals. Diagnostic practice may differ in rural settings, where clinical populations and resources may differ. Finally, use of general hospital discharge diagnoses may miss nuances of diagnostic practice in general hospitals, in which the SMI may have been recognized during the admission but not recorded on final discharge, and our observational study design meant that we were unable to examine whether diagnostic recording affected clinical outcomes. Future studies should examine whether diagnostic recording affects outcomes, such as length of stay or readmission rate, and has any negative effects such as stigma or diagnostic overshadowing of physical illness. This may require more detailed scrutiny of individuals’ case records and could be analyzed as part of future evaluations of liaison/consultation psychiatry services. It is also important to consider other relevant metrics of diagnostic accuracy, such as positive predictive value or specificity, which would require reference-standard data representative of people without psychiatric illness. Future research should also aim to elucidate the mechanisms for association between ethnicity, marital status, and symptom severity and diagnostic recording in general hospitals and evaluate effective approaches to improving diagnostic practice. Our findings have important clinical, research, and policy implications.
Researchers wishing to use hospital EHRs such as HES to ascertain cases of schizophrenia-like illnesses and bipolar disorder should be aware that around one quarter of known cases are likely to be unrecorded and there is potential for systematic bias whereby ethnic minorities, married people, and people with milder symptoms will be missed. Although our findings suggest that sensitivity is improving over time, there were around 30% of admissions in which people with established SMI did not have any psychiatric diagnosis recorded, suggesting that more needs to be done by policymakers to bridge the gap for “whole-person centered” care. Hospital settings should endeavor to improve diagnostic recording, particularly for high-risk groups. This may include training staff in culturally sensitive diagnosis for ethnic minority populations, expansion of mental healthcare input in general hospitals and collaborative working with these liaison psychiatry services, and proactive contact with primary care and mental health services to elicit information about past psychiatric history. Better data sharing between physical and mental health services, such as through harmonized clinical records, could improve the accuracy of mental illness recording in physical healthcare, and of physical illness recording in mental healthcare services, to move towards truly integrated healthcare for people with mental illness.
Supporting information
S1 STROBE Checklist. STROBE, Strengthening the Reporting of Observational Studies in Epidemiology. https://doi.org/10.1371/journal.pmed.1003306.s001 (DOCX)
S1 Text. Prospective analysis plan. https://doi.org/10.1371/journal.pmed.1003306.s002 (DOCX)
S1 Table. Primary diagnosis (ICD-10 code) for emergency hospital admissions of people with severe mental illness. ICD-10, International Statistical Classification of Diseases and Related Health Problems, Tenth Revision. https://doi.org/10.1371/journal.pmed.1003306.s003 (DOCX)
S2 Table. Clinical characteristics of people with severe mental illness admitted to a general hospital, according to whether mental illness was recorded in their hospital records. https://doi.org/10.1371/journal.pmed.1003306.s004 (DOCX)
S3 Table. Disorder-specific sensitivity of general hospital emergency admission records for people with severe mental illness. https://doi.org/10.1371/journal.pmed.1003306.s005 (DOCX)
S4 Table. Recording of mental illness in people with severe mental illness admitted to general hospitals: By year of first emergency hospital admission. https://doi.org/10.1371/journal.pmed.1003306.s006 (DOCX)
S5 Table. Association of sociodemographic characteristics with recording of mental illness in general hospital records: Multivariable regression without Health of Nation Outcome Scale clinical characteristics. https://doi.org/10.1371/journal.pmed.1003306.s007 (DOCX)
S6 Table. Association of clinical and sociodemographic characteristics with psychiatric diagnosis of people with severe mental illness not being recorded in general hospital records: Multivariable regression with multiple imputation for missing variables. https://doi.org/10.1371/journal.pmed.1003306.s008 (DOCX)
How to make your research jump off the page: Co-creation to broaden public engagement in medical research
Finley, Nina; Swartz, Talia H.; Cao, Kevin; Tucker, Joseph D.
doi: 10.1371/journal.pmed.1003246 pmid: 32925970
Summary points
Many scientific research manuscripts are intended for other researchers and not the public. However, the public are involved in research as participants, taxpayers, and patients. We discuss co-creation and how it can be used to enhance medical research. Co-creation is an iterative, bidirectional collaboration between researchers and laypeople to create knowledge. This process can broaden public engagement in medical research. Co-creation is related to theories of crowdsourcing, community-based participatory research, citizen science, and participatory action research. Public online calls for input, crowdsourcing contests, hackathons, and participatory design sessions are all examples of activities to co-create with the public. Infographics and videos are two tools that can be used to broaden public engagement in medical research.
Methods
We identified systematic reviews, randomized controlled trials, and observational studies that show how co-creation can be used to enhance public engagement in medical research. We searched three databases using the terms “co-creation,” “public engagement,” and “research manuscript.” The search was initially undertaken January 20, 2019, and updated June 19, 2020. Although we included some theoretical literature related to co-creation, the focus was on applications related to writing research manuscripts. In this narrative opinion piece, we introduce the conventional approach to public engagement and suggest co-creation as a tool to help medical researchers engage the public. As part of the piece, we issued a public online call to solicit feedback on an infographic on June 18, 2019 [7]. An infographic is an image that presents information in a manner easily understood by nonexperts. The open call noted that suggestions would be used to improve the infographic and that compiled open access resources would be shared.
The problem
The conventional approach to public engagement in medical research is one of benign neglect.
A systematic review found that patient engagement was feasible in many medical research settings [8]. However, public engagement has generally been limited to the early phases of a study and not the final phases of creating a manuscript [8]. For example, engagement in clinical trials often takes the form of a community advisory board reviewing ways to optimize participant recruitment from the perspective of people living with the disease. While this input is useful for developing studies, it risks lapsing into tokenistic relationships between researchers and community members [8]. Fewer research studies engage the public in later research phases, such as developing a manuscript and sharing findings in a way that could be understood by the public.
Co-creation with the public
Co-creation is an iterative, bidirectional collaboration between researchers and laypeople to create knowledge. We focus on co-creation as it relates to writing medical research manuscripts. Co-creation could include making research results available to the public earlier, in the form of a preprint or other publicly accessible form. Co-creation provides a structured process for broadening public engagement in research. This process is related to several types of engagement models, including crowdsourcing [9], citizen science [10], community-based participatory research [11], youth participatory action research [12], and patient and public involvement [13]. Using co-creation in research introduces several questions about who should be involved, the extent of participation, how to acknowledge and recognize participation, and related ethical issues [14, 15]. Co-creation approaches include public online calls for input, crowdsourcing contests [15], hackathons [16], and participatory design sessions [17]. Public online calls for input are the simplest co-creation method.
Social media platforms such as Facebook and Twitter allow researchers to post drafts of research content (e.g., infographics, preprints, videos) and receive real-time feedback from experts and nonexperts alike. For example, in creating this manuscript, we posted a draft infographic online (Fig 1) in order to solicit public feedback. The message received 2,647 impressions (the number of times a tweet shows up in someone’s timeline) according to Twitter analytics and drew helpful feedback that improved the message (Fig 1). In addition, preprints allow the public free access to scientific research.
Fig 1. Co-creation in public engagement, developed using an open call through social media. Two authors posted the image on their respective Twitter feeds. https://doi.org/10.1371/journal.pmed.1003246.g001
Crowdsourcing contests (also called innovation challenges, challenge contests, and prize contests) allow a group of individuals to tackle a problem and propose solutions [9]. Contests have been used to create concepts, images, videos, and songs related to medical research. A recent systematic review identified 188 studies that used crowdsourcing in health and medical research [18]. The steps of conducting a crowdsourcing contest are typically to identify a steering group, solicit ideas from the public online, and select exceptional solutions to use or publish [15]. For example, a Chinese public health group used crowdsourcing to develop a campaign focused on increasing rates of testing for HIV (S1 Fig) [4]. In this project, members of the public designed messages, images, and service models that were ultimately implemented in eight cities. Data from randomized controlled trials suggest that crowdsourcing contests are effective in creating sexual health messages [4, 19–21].
Hackathons (also known as hackfests, hack days, or designathons) are brief, sprint-like events in which individuals physically convene to focus on one topic for a short period [16]. Participants, judges, and steering committee members are often members of the public, and expert mentors are available to provide guidance. Although originally focused on developing computer software or hardware, hackathons are now used to spur innovation in both the content [22] and presentation [23] of medical research. Participatory design sessions are in-person community gatherings organized by researchers. They can be used to brainstorm ideas, understand local perceptions of research, and elicit feedback on how results are presented. A team from Columbia University found that health-related infographics co-created with community members of diverse ages, languages, and health literacy levels were more informative, contextualized, and understandable for readers [17].
Co-creating infographics and videos
Two excellent tools for public engagement are infographics and videos [6, 24]. These complementary tools are commonly accepted at major medical journals and may be useful for engaging public audiences. Infographics are similar to journal figures in that they are compact, data-rich visuals. The difference is that infographics should be easier to read and focus on one key message [25] and can engage people with varying literacy capabilities [17, 26, 27]. Infographics can be created by hand, as in the technique of sketch-noting, or by computer software. The key questions that need to be considered when creating an infographic are the “who, what, why, when, how, and where” of the message (Table 1). The University of Leeds and Public Health England published an open-access guide to creating infographics (S1 Text). This guide provides clear, user-tested advice on how to define the audience, align key components, and arrange visual elements.
Infographics can be disseminated in many places: social media, email newsletters, blogs, or the local university bulletin board. A study in Northern Ireland found that patients who viewed an infographic were more likely to understand cancer risk factors than those who read the same information as text [26]. Two small studies suggest that people are more likely to read the abstracts of research articles with infographics than of those without [28, 29]. One Croatian study found that readers of Cochrane systematic reviews interpreted infographic and text summaries with equal accuracy, but enjoyed the infographics more [30].
Table 1. Key questions to answer in preparing an infographic or video related to medical research. https://doi.org/10.1371/journal.pmed.1003246.t001
Videos have the advantage of reaching some audiences who may not read medical research journals, including individuals for whom English is a second language and those who face socioeconomic barriers [6, 31, 32]. Short videos co-created with the public suggest new ways to empower groups that are underrepresented in research [24, 31, 33]. Video formats include whiteboard time lapses, filmed interviews with investigators, and short animations. Creating a video to explain key research findings does not require specialist training or resources of a professional production, such as a TED Talk (S2 Text). We have developed a guide for how to make a time-lapse video using only a whiteboard, smartphone, and basic video-editing software available on most computers (S3 Text). Researchers in China used a crowdsourcing contest to co-create sexual health videos with the public [20, 34]. The results showed that crowdsourced videos worked equally well or better than videos produced by a commercial social media firm. An example of a co-created video is included (S4 Text).
Co-creation to enhance public engagement has some limitations that should be noted. First, sharing of potentially identifiable research data online must adhere to the same guidelines for ensuring patient confidentiality that exist with any identifiable research data. In instances in which identifiable information is shared, specific consent for online sharing is important. Second, infographics and videos will not replace conventional figures and tables in medical research articles. However, they could be a useful adjunct to extend public engagement. Third, there may be disciplines or settings in which the public may not be able or willing to be engaged. For example, research on proprietary materials or potentially traumatic topics (e.g., child maltreatment, female genital mutilation) may be less suitable for public engagement [35]. We encourage researchers to think beyond academic audiences and co-create with the public. What projects are you working on now that could lend themselves to co-creation? Start simple: try sketching an infographic of the results from your latest project and posting it on social media for public feedback. Craft a visual abstract when you submit your next research manuscript. Present your preliminary results to local community partners and incorporate their insights and wisdom; you may be surprised at how much your team and the local partners can gain from this process. The co-creation process can ultimately create greater transparency and accountability in research. Co-creation of research and visual aids can be the difference between a dusty manuscript on the shelf and an article that sparks conversation.
Supporting information
S1 Fig. Two images developed through a crowdsourcing contest. The contest had a steering committee, open call for submissions, evaluation of submissions, prizes awarded to finalists, and recognition of all those who contributed. https://doi.org/10.1371/journal.pmed.1003246.s001 (DOCX)
S1 Text. Open access resources for designing infographics for public health (noncommercial). https://doi.org/10.1371/journal.pmed.1003246.s002 (DOCX)
S2 Text. Open access resources for designing videos. https://doi.org/10.1371/journal.pmed.1003246.s003 (DOCX)
S3 Text. How to make a time-lapse video. https://doi.org/10.1371/journal.pmed.1003246.s004 (DOCX)
S4 Text. Example of a co-created video presentation. https://doi.org/10.1371/journal.pmed.1003246.s005 (DOCX)
Acknowledgments
We would like to thank the SESH (Social Entrepreneurship to Spur Health) team for administrative support.
The prevalence of mental illness in refugees and asylum seekers: A systematic review and meta-analysis
Blackmore, Rebecca; Boyle, Jacqueline A.; Fazel, Mina; Ranasinha, Sanjeeva; Gray, Kylie M.; Fitzgerald, Grace; Misso, Marie; Gibson-Helm, Melanie
doi: 10.1371/journal.pmed.1003337
pmid: 32956381
Background Globally, the number of refugees and asylum seekers has reached record highs. Past research in refugee mental health has reported wide variation in mental illness prevalence data, partially attributable to methodological limitations. This systematic review aims to summarise the current body of evidence for the prevalence of mental illness in global refugee populations and overcome methodological limitations of individual studies. Methods and findings A comprehensive search of electronic databases covered the period from 1 January 2003 to 4 February 2020 (MEDLINE, MEDLINE In-Process, EBM Reviews, Embase, PsycINFO, CINAHL, PILOTS, Web of Science). Quantitative studies were included if the diagnosis of mental illness involved a clinical interview and use of a validated assessment measure and if they reported at least 50 participants. Study quality was assessed using a descriptive approach based on a template according to study design (modified Newcastle-Ottawa Scale). Random-effects models, based on inverse variance weights, were fitted. Subgroup analyses were performed for sex, sample size, displacement duration, visa status, country of origin, current residence, type of interview (interpreter-assisted or native language), and diagnostic measure. The systematic review was registered with PROSPERO (CRD42016046349). The search yielded 21,842 records. Twenty-six studies, which included one randomised controlled trial and 25 observational studies, provided results for 5,143 adult refugees and asylum seekers. Studies were undertaken across 15 countries: Australia (652 refugees), Austria (150), China (65), Germany (1,104), Italy (297), Lebanon (646), Nepal (574), Norway (64), South Korea (200), Sweden (86), Switzerland (164), Turkey (238), Uganda (77), United Kingdom (420), and the United States of America (406). 
The prevalence of posttraumatic stress disorder (PTSD) was 31.46% (95% CI 24.43–38.50), the prevalence of depression was 31.5% (95% CI 22.64–40.38), the prevalence of anxiety disorders was 11% (95% CI 6.75–15.43), and the prevalence of psychosis was 1.51% (95% CI 0.63–2.40). A limitation of the study is that substantial heterogeneity was present in the prevalence estimates of PTSD, depression, and anxiety, and limited covariates were reported in the included studies. Conclusions This comprehensive review generates current prevalence estimates for not only PTSD but also depression, anxiety, and psychosis. Refugees and asylum seekers have high and persistent rates of PTSD and depression, and the results of this review highlight the need for ongoing, long-term mental health care beyond the initial period of resettlement. Why was this study done? Globally, the numbers of refugees and asylum seekers have reached record highs. This systematic review aims to estimate how common mental illnesses are in current adult refugee and asylum-seeker populations. What did the researchers do and find? We performed a comprehensive literature search looking for studies that diagnosed mental illness in refugee and asylum-seeker populations. For studies to be included, the diagnosis must have resulted from a clinical interview using a validated diagnostic assessment measure. We found that adult refugees and asylum seekers have high and persistent rates of posttraumatic stress disorder (PTSD) and depression. The prevalence of anxiety disorders and psychosis is more comparable to findings from general populations. What do these findings mean? The increased prevalence of PTSD and depression appears to persist for many years after displacement. These results highlight the importance of early and ongoing mental health care, extending beyond the period of initial resettlement, to promote the health of refugees and asylum seekers. 
Introduction Globally, the numbers of refugees and asylum seekers have reached record highs [1]. Ongoing conflicts around the world raise challenging social, political, and humanitarian issues [2]. For host-country health systems, the refugee crisis can have major implications for service planning and provision. Refugees and asylum seekers may have been exposed to traumatic events such as conflict, loss or separation from family, a life-threatening journey to safety, long waiting periods, and complexities with acculturation [3,4]. A sizable proportion are therefore at risk of developing psychological symptoms and major mental illness that can persist for many years after resettlement [5]. Estimates of the prevalence of mental illness in refugees vary greatly, even at the level of systematic reviews. Fazel and colleagues (2005) [6] conducted a systematic review and meta-analysis of refugees resettled in high-income countries, covering the period 1986–2004, and reported a prevalence of 9% for posttraumatic stress disorder (PTSD), 5% for major depressive disorder, and 4% for generalised anxiety disorder, based on studies reporting at least 200 participants. A subsequent systematic review into the association between torture or other traumatic events and PTSD and depression, covering studies between 1987 and 2009 and comprising 81,866 refugees and conflict-affected populations, reported an unadjusted weighted prevalence of 30% for PTSD and 30% for depression [7]. A recent systematic review of 8,176 Syrian refugees resettled in 10 countries reported a prevalence of 43% for PTSD, 40% for depression, and 26% for anxiety [8]. As the literature has focused on either specific cultural groups or specific host nations or has combined internally displaced populations with refugees and asylum seekers, there is a lack of estimates on the prevalence of mental illness in more representative refugee and asylum-seeker populations [9–12]. 
There is also a lack of research investigating the full breadth of mental illness, as the literature has mainly focused on PTSD and depression; hence the need for a comprehensive, worldwide, systematic review to investigate mental illness in current refugee populations. Some of the variation across individual studies may be attributable to methodological differences. For example, self-report measures tend to overestimate symptomatology, yet the literature relies heavily on these data rather than comprehensive psychiatric assessments using validated diagnostic tools [7,13]. There is also no uniform refugee experience: country of origin or resettlement, duration of displacement, and experience of displacement all vary, amongst other important factors. Given the changing nature of forced displacement and record numbers of refugees and asylum seekers, it is timely to reexamine this topic based on the many studies published since the two previously mentioned major reviews. Current prevalence information could be a powerful tool for advocacy and also assist host countries and humanitarian agencies to strengthen health services to provide the essential components of timely diagnosis and treatment for mental illnesses, in line with the priorities and objectives of the World Health Organization (WHO) Draft Global Action Plan ‘Promoting the health of refugees and migrants’ (2019–2023) [14]. Providing appropriate, early, and ongoing mental health care to refugees and asylum seekers benefits not only the individual but also the host nation: it improves the chances of successful reintegration, which has long-term benefits for the social and economic capital of that country and will likely affect not only the displaced generation but the second generation as well [15]. 
Bringing together the global literature on the prevalence of mental illness in refugee and asylum-seeker populations would also enable the research community to move ahead and focus on different components of the mental health needs of this population, for example, on interventions, on less well-understood mental health conditions, or on longitudinal mental health trajectories. This systematic review aims to establish the current overall prevalence of mental illnesses in refugee and asylum-seeker populations by summarising the current global body of evidence and overcoming some methodological limitations of individual studies. Methods Search strategy and selection criteria We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (S1 Prisma Checklist) [16] and registered the protocol with PROSPERO (record CRD42016046349) (https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=46349). The search was based on that used in the earlier systematic review by Fazel and colleagues [6] but expanded the range of databases searched and the number of search terms and applied stricter criteria for study inclusion. This review also placed no restrictions on resettlement countries. In total, eight databases were searched: MEDLINE, MEDLINE In-Process, EBM Reviews, Embase, PsycINFO, CINAHL, PILOTS, and Web of Science. The search strategy included terms for refugees and asylum seekers and terms related to mental illness, diagnosis, and trauma. An example of a complete search string is provided in S1 Table. The date limits of the search were 1 January 2003 to 4 February 2020. This start date reflects the end date of the search conducted by Fazel and colleagues [6], in order to provide a contemporary estimate of mental illness within this population. The reference lists of 92 relevant systematic reviews identified during the search were also screened, resulting in an additional 37 articles to review. 
Studies were included if (1) the sample solely comprised adult refugees and/or asylum seekers residing outside their country of origin, (2) the sample size was larger than 50, and (3) the study reported quantitative prevalence estimates of a mental illness as classified by the Diagnostic and Statistical Manual of Mental Disorders (DSM) [17] or the International Classification of Diseases (ICD) [18]. This diagnosis must have resulted from a clinical interview using a validated diagnostic assessment measure. The interview needed to be conducted either by a mental health professional (psychiatrist, psychologist, psychiatric nurse) or trained paraprofessional (psychology research assistant, trained researcher). In studies administering the WHO World Mental Health Composite International Diagnostic Interview (WMH-CIDI) [19], nonclinicians who had completed official WHO training were accepted, as this fully structured interview measure is intended for use by trained lay interviewers. If multiple articles reported data from the same study, the article providing data best meeting the selection criteria was included. Randomised controlled trials (RCTs), longitudinal cohort, and cross-sectional studies were considered for inclusion, whereas retrospective registry reviews, medical records audits, and qualitative studies were excluded. Case-control studies were excluded if cases were selected based on the presence of our outcomes of interest. Studies were also excluded if (1) participants were recruited from psychiatric or mental health clinics (to reduce selection bias), although studies that recruited participants from primary healthcare clinics were still included; (2) the sample included asylum seekers whose applications had been rejected and the results were not disaggregated or the assessment was not conducted prior to rejection (when the individuals met the definition of asylum seekers); or (3) diagnoses were based solely on self-report questionnaires or symptom rating scales. 
Two reviewers (RB and MG-H or GF) independently assessed the title, abstract, and keywords of every article retrieved against the selection criteria. Full text was then assessed if the title and abstract suggested the study met the selection criteria. We contacted 31 study authors for further information regarding methodology and data and received 28 responses. Studies in languages other than English were assessed first by a native speaker when possible or via Google Translate and then professionally translated if assessed as potentially eligible. Data analysis Using a fixed protocol, two review authors (RB and MG-H) independently extracted statistical data and study characteristics: host country, publication year, sample size, country or region of origin, sampling method, diagnostic tool and criteria, use of interpreter, age, proportion of female participants, visa status, duration of displacement, and prevalence of mental illness (numerator and denominator). Data regarding the sex distribution of samples were extracted separately for males and females, when possible. Meta-analyses were performed in Stata version 14.1 (StataCorp), and results were expressed as pooled prevalence estimates of mental illness with 95% confidence intervals (CIs). Random-effects meta-analyses using a DerSimonian and Laird estimator based on inverse variance weights were employed [20]. Random-effects meta-analysis was chosen, as heterogeneity was anticipated because of between-study variations in clinical factors due to the heterogeneous nature of refugees and asylum seekers (e.g., country of origin, language, host nations, etc.). The DerSimonian and Laird method incorporates a measure of the heterogeneity between studies. Heterogeneity was assessed using the I² statistic [21]. 
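The DerSimonian and Laird pooling and the I² statistic described above can be illustrated with a minimal Python sketch. This is not the authors' Stata code: the function name `dersimonian_laird`, the example counts, and the choice to pool raw (untransformed) proportions with binomial variances are assumptions made only for illustration.

```python
import math

def dersimonian_laird(events, totals, z=1.96):
    """Illustrative DerSimonian-Laird random-effects pooling of raw proportions.

    Assumes at least one study and no study with a prevalence of exactly
    0 or 1 (the binomial variance would be zero).
    """
    p = [e / n for e, n in zip(events, totals)]           # per-study prevalence
    v = [pi * (1 - pi) / n for pi, n in zip(p, totals)]   # binomial variance
    w = [1 / vi for vi in v]                              # inverse-variance weights
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

    # Cochran's Q measures the spread of study estimates around the fixed-effect mean.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

    # Between-study variance (tau^2), truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's own variance.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - z * se, pooled + z * se), tau2, i2

# Hypothetical counts, not taken from the included studies.
pooled, ci, tau2, i2 = dersimonian_laird([30, 45, 12, 80], [100, 120, 60, 400])
print(f"pooled={pooled:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), tau2={tau2:.4f}, I2={i2:.1f}%")
```

When the studies disagree by more than sampling error alone would explain, tau² becomes positive, the weights even out across studies, and the confidence interval widens.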
When five or more studies were available, publication bias was assessed by visual inspection of funnel plots and by Egger’s test, with a p-value less than 0.05 taken to indicate funnel plot asymmetry [22]. Prevalence rates were for current diagnoses, except in studies reporting 1-year prevalence as assessed by the WHO WMH-CIDI [23–25]. Sources of heterogeneity between studies were investigated, when reported data allowed, by subgroup analyses. These included sex, sample size, displacement duration, visa status, country or region of origin, current residence, type of interview (interpreter-assisted or native language), and diagnostic measure. As prevalence of mental illness is related to sample size [6], the subgroup analysis for sample size compared studies with fewer than 200 participants against those with 200 or more. Risk of bias assessment Methodological quality was independently assessed by two reviewers (RB and JAB) using an assessment template for risk of bias, developed a priori according to study design, which meant the criteria to assess an RCT were different from the criteria of an observational study (S1 Risk of Bias) [26]. These templates are based upon the Newcastle-Ottawa Scale (NOS) [27], with the addition of further risk of bias components assessing internal and external validity such as use of appropriate study design, explicit and appropriate use of inclusion criteria, reporting bias, confounding, sufficient power for analyses, and any apparent conflicts of interest, as has been used in international evidence-based guidelines and other systematic reviews [28–30]. Using a descriptive approach, studies were assigned a rating of low, moderate, or high risk of bias. Any disagreement was resolved by discussion with other reviewers (MG-H and MF) to reach a consensus. Such discussions occurred on two occasions, both times regarding papers assigned a high risk of bias [31,32]. 
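The Egger’s test named in the data analysis can be sketched as a simple regression of each study's standardised effect on its precision, where an intercept far from zero suggests funnel plot asymmetry. The sketch below is illustrative only: the function name `egger_intercept` and the input values are hypothetical, and it returns the intercept and its t statistic rather than a p-value, which would be read from a t distribution with k − 2 degrees of freedom using a statistics library.

```python
import math

def egger_intercept(effects, std_errors):
    """Illustrative Egger regression: effect/SE regressed on 1/SE by ordinary least squares.

    Returns (intercept, standard error of the intercept, t statistic, degrees of freedom).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are needed")
    y = [e / s for e, s in zip(effects, std_errors)]  # standardised effects
    x = [1 / s for s in std_errors]                   # precisions
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with k - 2 degrees of freedom.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)
    se_int = math.sqrt(s2 * (1 / k + mx ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t_stat, k - 2

# Hypothetical prevalence estimates in which less precise studies report higher values.
intercept, se_int, t_stat, df = egger_intercept(
    [0.20, 0.25, 0.30, 0.35, 0.45], [0.02, 0.03, 0.04, 0.05, 0.08]
)
print(f"intercept={intercept:.2f}, se={se_int:.2f}, t={t_stat:.2f}, df={df}")
```

With the hypothetical inputs, the smaller studies pull the intercept above zero, which is the pattern a funnel plot would show as asymmetry.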
Results The entire search yielded 21,842 records (Fig 1). After removing duplicates, 12,517 records were excluded based on title and abstract and a further 1,186 records were selected for full text review. Twenty-six studies met the inclusion criteria, providing data for 5,143 adult refugees and asylum seekers (Fig 1). Characteristics of the included studies are provided in Table 1. All were observational, except one RCT from which we included baseline prevalence data [24]. 
Studies were undertaken in 15 countries: Australia (652 refugees) [33–37], Austria (150) [38], China (65) [32], Germany (1,104) [39–44], Italy (297) [39], Lebanon (646) [45,46], Nepal (574) [25], Norway (64) [23], South Korea (200) [47], Sweden (86) [48], Switzerland (164) [49,50], Turkey (238) [51], Uganda (77) [24], UK (420) [39,52], and the US (406) [31,53]. Participants were from four geographic regions: the Middle East (43%), Europe (29%), Asia (20%), and Africa (5%), with two studies reporting refugee samples coming from 18 different countries (3%) [26, 41] (97% of total sample due to unreported countries of origin).
Fig 1. Search results and selection of studies reporting prevalence of mental illness among refugees and asylum seekers. https://doi.org/10.1371/journal.pmed.1003337.g001
Table 1. Characteristics of included studies. https://doi.org/10.1371/journal.pmed.1003337.t001
Four diagnostic measures were used (S1 Table): Structured Clinical Interview for DSM (SCID) [54], Mini-International Neuropsychiatric Interview (M.I.N.I.) [55], Clinician Administered PTSD Scale (CAPS) [56], and WHO WMH-CIDI [19]. None of these instruments was developed specifically for refugee populations, but all have been widely used in different cultural contexts. Nine studies mentioned the reliability or validity of the instruments used [23,35,36,39,41,44–46,51]. Thirteen studies conducted the assessment in the refugee’s native language [25,31,32,37,39,42,43,45–47,51–53]. Thirteen studies were conducted with assistance from interpreters [23,24,33–36,38,40,41,44,48–50]. Twenty-two studies of PTSD were identified (n = 4,639) [23–25,32,33,35,36,38–40,42–53]. Participants had a weighted mean age of 35.2 years and 44% were women. Overall, 31.46% (95% CI 24.43–38.50) were diagnosed with PTSD (1,376/4,639) (Fig 2). 
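The crude proportion implied by the PTSD counts above (1,376/4,639) is not expected to equal the pooled 31.46%, because a random-effects model weights studies rather than individual participants. The following minimal sketch computes the crude proportion with a simple Wald interval for comparison; the function name `crude_prevalence` is hypothetical, and the Wald interval is an illustrative choice, not a method reported by the authors.

```python
import math

def crude_prevalence(events, total, z=1.96):
    """Crude proportion with a Wald 95% CI, treating all participants as one sample."""
    p = events / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se

# Counts reported for PTSD in the text: 1,376 diagnosed out of 4,639 participants.
p, lower, upper = crude_prevalence(1376, 4639)
print(f"crude prevalence = {p * 100:.2f}% (Wald 95% CI {lower * 100:.2f} to {upper * 100:.2f})")
```

The crude interval is much narrower than the pooled interval of 24.43–38.50 because it ignores between-study heterogeneity, which is the reason a random-effects model was preferred. The same function can be applied to the counts reported for depression, anxiety disorders, and psychosis.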
There was substantial heterogeneity between studies (Fig 2), and subgroup analyses indicated PTSD prevalence was significantly higher for women (34.02%, 95% CI 31.12–37.01, p = 0.02), in the smaller studies (n < 200) (37.35%, 95% CI 34.86–39.90, p < 0.001), those with refugee status (31.01%, 95% CI 29.52–32.54, p < 0.001), and those originating from Africa (48.25%, 95% CI 39.82–56.75, p < 0.001) (Fig 3). In the eight largest studies with 200 participants or more, PTSD prevalence was significantly lower (29.30%, 95% CI 27.72–30.91, p < 0.001) [25,39,42,43,46,47,51,53]. Duration of displacement had no significant impact on PTSD prevalence (p = 0.11). The prevalence of PTSD for those displaced less than 4 years was 30.17% (95% CI 28.24–32.14) compared to 33.14% (95% CI 29.99–36.41) for those displaced longer than 4 years. The PTSD prevalence for interpreter-assisted interviews was 35.75% (95% CI 33.80–39.70) compared to 27.82% (95% CI 26.40–29.30) for interviews conducted in the native language (p < 0.001). There was a statistically significant difference across diagnostic measures (p < 0.001) with the CAPS yielding the highest prevalence of PTSD (40.41%, 95% CI 36.20–44.70), followed by the WHO-CIDI (31.6%, 95% CI 28.20–35.20), the SCID (30.55%, 95% CI 28–33.20), and the M.I.N.I. (25.8%, 95% CI 24–27.70).
Fig 2. Prevalence of PTSD in refugees and asylum seekers. *Study with sample size of ≥200. Horizontal lines indicate 95% CIs; horizontal points of the open diamond are the limits of the overall 95% CIs; and the red dashed line shows the position of the overall prevalence. AF, Afghanistan; AZR, Azerbaijan; CI, confidence interval; Frmr Yug, former Yugoslavia; ME, Middle East; PTSD, posttraumatic stress disorder; W Afr, West Africa. https://doi.org/10.1371/journal.pmed.1003337.g002
Fig 3. Prevalence of PTSD by various study characteristics. p-Values derived from random-effects models; horizontal lines indicate 95% CIs. CAPS, Clinician Administered PTSD Scale; CI, confidence interval; M.I.N.I., the Mini-International Neuropsychiatric Interview; PTSD, posttraumatic stress disorder; SCID, Structured Clinical Interview for DSM; WHO-CIDI, World Health Organization–Composite International Diagnostic Interview. https://doi.org/10.1371/journal.pmed.1003337.g003
Seventeen studies of depression were identified (n = 3,877) [23,25,33,35–37,39–45,49–51,53]. Participants had a weighted mean age of 35.7 years and 48% were women. Overall, 31.51% (95% CI 22.64–40.38) were diagnosed with depression (1,066/3,877) (Fig 4). Three studies provided separate data for dysthymia (n = 1,135) [39,41,45]. The overall prevalence of dysthymia was 6.72% (95% CI 3.63–9.81) with moderate heterogeneity between studies (I² = 65.6%, p = 0.055). There was considerable heterogeneity between the studies of depression (Fig 4). Subgroup analyses indicated depression prevalence was significantly higher in the smaller studies (32.89%, 95% CI 30.06–35.82, p < 0.001), for those deemed asylum seekers (30.14%, 95% CI 27.10–33.32, p = 0.04), those originating from Europe (35.82%, 95% CI 32.81–38.92, p < 0.0001), and for those living in the community (30.70%, 95% CI 28.74–32.72, p < 0.0001) (Fig 5). The subgroup analysis for sex could not be conducted, owing to a lack of reported data. In the seven larger studies with 200 or more participants [25,37,39,42,43,51,53], the reported depression prevalence was 20.65% (95% CI 18.88–22.51), which was significantly lower (p < 0.001) than in the smaller studies, 32.89% (95% CI 30.06–35.82). Duration of displacement had no significant impact on depression prevalence (p = 0.17). The prevalence of depression for those displaced less than 4 years was 32.44% (95% CI 30.00–34.95) and 35.12% (95% CI 32.08–38.25) for those displaced longer than 4 years. 
The depression prevalence for interpreter-assisted interviews was 35.35% (95% CI 32.05–38.76) compared to 24.87% (95% CI 23.33–26.45) for interviews conducted in the native language (p < 0.0001). There was a statistically significant difference across type of diagnostic measures (p < 0.0001) with the SCID yielding the highest prevalence of depression (34.52%, 95% CI 31.74–37.39), followed by the M.I.N.I. (30.55%, 95% CI 28.59–32.56) and the WHO-CIDI (5.02%, 95% CI 3.46–7.01).
Fig 4. Prevalence of depression in refugees and asylum seekers. *Study with sample size of ≥200. Horizontal lines indicate 95% CIs; horizontal points of the open diamond are the limits of the overall 95% CIs; and the red dashed line shows the position of the overall prevalence. AZR, Azerbaijan; CI, confidence interval; Frmr Yug, former Yugoslavia; ME, Middle East. https://doi.org/10.1371/journal.pmed.1003337.g004
Fig 5. Prevalence of depression by various study characteristics. p-Values derived from random-effects models; horizontal lines indicate 95% CI. Subgroup analysis for sex could not be conducted, owing to a lack of reported data. CI, confidence interval; M.I.N.I., the Mini-International Neuropsychiatric Interview; SCID, Structured Clinical Interview for DSM; WHO-CIDI, World Health Organization–Composite International Diagnostic Interview. https://doi.org/10.1371/journal.pmed.1003337.g005
Eleven studies of anxiety disorders were identified (n = 2,840) [23,25,34,36,39,41–43,45,49,50]. Participants had a weighted mean age of 36.8 years and 31% were women. Four studies reported prevalence for generalised anxiety disorder [25,39,41,45], six reported any anxiety disorder [23,36,42,43,49,50], and one study diagnosed adult separation anxiety disorder [34]. Overall, 11.09% (95% CI 6.75–15.43) were diagnosed with an anxiety disorder (305/2,840) (Fig 6). 
There was substantial heterogeneity between studies (Fig 6). Subgroup analyses indicated anxiety prevalence was higher for those displaced less than 4 years (21.72%, 95% CI 18.74–24.94, p < 0.0001), those granted formal refugee status (11.44%, 95% CI 10.12–12.87, p = 0.0009), those originating from the Middle East (26.73%, 95% CI 22.86–30.89, p < 0.0001), and those living in temporary refugee accommodation (13.18%, 95% CI 11.46–15.06, p < 0.0001) (Fig 7). The subgroup analysis for sex could not be conducted, owing to a lack of reported data. Sample size had no significant impact on anxiety disorder prevalence (p = 0.21). The prevalence of anxiety disorders in the smaller studies (N < 200) was 9.24% (95% CI 7.36–11.42), and in the larger studies (N ≥ 200), the prevalence was 10.83% (95% CI 9.50–12.27). The use of an interpreter to conduct assessments had no significant impact on the reported prevalence of anxiety disorders (p = 0.34). The prevalence of anxiety for interpreter-assisted interviews was 9.70 (95% CI 7.50–12.30) and 11.04% (95% CI 9.76–12.40) for those interviews conducted in the native language. The subgroup analysis for diagnostic measure could not be conducted, owing to insufficient studies for each measure. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 6. Prevalence of anxiety in refugees and asylum seekers. *Study with sample size of ≥200. Horizontal lines indicate 95% CIs; horizontal points of the open diamond are the limits of the overall 95% CIs; and the red dashed line shows the position of the overall prevalence. AZR, Azerbaijan; CI, confidence interval; Frmr Yug, former Yugoslavia; ME, Middle East. https://doi.org/10.1371/journal.pmed.1003337.g006 Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 7. Prevalence of anxiety by various study characteristics. p-Values are derived from random-effects models; horizontal lines indicate 95% CI. 
Subgroup analysis for sex could not be conducted, owing to a lack of reported data. Subgroup analysis for diagnostic measure could not be conducted, owing to insufficient studies for each measure. CI, confidence interval. https://doi.org/10.1371/journal.pmed.1003337.g007 Six studies of psychotic illness were identified (n = 1,695) [23,31,36,39,43,45]. Participants had a weighted mean age of 37.6 years and 51% were female. Overall, 1.51% (95% CI 0.63–2.40) were diagnosed with psychosis (31/1,695), with low heterogeneity between studies (Fig 8). Fig 8. Prevalence of psychosis in refugees and asylum seekers. Horizontal lines indicate 95% CIs. Horizontal points of the open diamond are the limits of the overall 95% CIs; and the red dashed line shows the position of the overall prevalence. AZR, Azerbaijan; CI, confidence interval; Frmr Yug, former Yugoslavia; ME, Middle East. https://doi.org/10.1371/journal.pmed.1003337.g008 Publication bias There was no evidence of publication bias for PTSD, depression, anxiety, or psychosis (S1–S4 Egger’s Tests). Risk of bias Thirteen studies were assigned a low risk of bias and determined to be of high quality [23,24,35,37,38,40,45–50,53]. Nine studies demonstrated moderate risk of bias [25,33,34,36,39,41,44,51,52]. A moderate rating was assigned to studies that had issues with the representativeness of their sample or used nonrandom sampling techniques. Additionally, in one study, only male psychologists conducted the diagnostic assessments, and this was associated with fewer than expected reports of sexual assault [51]. Four studies were assigned a high risk of bias [31,32,42,43]. One study, providing data for PTSD and depression, assessed the mental health consequences of captivity by the Islamic State (IS) militant group on a sample of Yazidi women. It was reported that some of the women were not yet ready to receive psychotherapy for their symptoms [42].
This may have impacted upon the reported prevalence rates, particularly PTSD, as some women may have been reluctant and not ready to disclose trauma details during the research interviews. Another study, providing PTSD data, conducted diagnostic assessments in nonconfidential areas of a detention facility [32]. The reported PTSD prevalence was low but similar to two other studies assigned a low risk of bias. Two studies recruited help-seeking populations through the use of advertisements or flyers offering psychological treatment for those affected by war [31,43]. One of these studies compared their help-seeking population to a randomly recruited sample, and there was a difference in prevalence rates, with higher rates in the help-seeking population [43].
Discussion Our results indicate that refugees and asylum seekers experience high rates of mental illness, in particular PTSD and depression. PTSD and depression appear to persist for many years post displacement, as there was no difference in prevalence between those displaced less than 4 years and those displaced longer. However, this was not the case for the prevalence of anxiety disorders, which we found to be higher among those displaced less than 4 years. PTSD and depression in refugees and asylum seekers appear to be more prevalent than in the general population. According to data from the World Mental Health Surveys, lifetime prevalence in the general population is 3.9% for PTSD [57] and 12% for any depressive disorder [58], compared to our findings of 31% for PTSD and 31.5% for depression. However, the prevalence of anxiety disorders (11%) and psychosis (1.5%) in refugees and asylum seekers appears to be less than the lifetime prevalence in general population samples: 16% [58] and 3% [59], respectively. Only 11 studies reporting data on anxiety prevalence met the inclusion criteria for this review, and of those 11, only six assessed the full range of DSM anxiety disorders. With a heavy emphasis on PTSD and depression, the full breadth of anxiety disorders is less frequently examined and reported in the literature.
It was only recently, with the release of DSM-5, that PTSD was no longer classified as an anxiety disorder but in a separate category of trauma and stressor-related disorders [60]. Further research on the prevalence of the full range of anxiety disorders and comorbidities is needed. With the aim of including all possible refugee populations that have been studied, this systematic review placed few restrictions on characteristics of refugee experiences (region of origin or resettlement, duration of displacement, etc.). As a result, the review’s criteria could in fact have been a contributing factor to the resulting substantial statistical heterogeneity. Despite this high heterogeneity, which is expected when investigating and analysing prevalence across global refugee populations, knowledge of current prevalence estimates provides a foundation for the field to build on. Researchers can progress with this knowledge and focus their attention on addressing the critical need for immediate, appropriate, and ongoing mental health support and interventions. Without the progression of further high-quality research that explores the different components of mental health needs, culturally appropriate and effective interventions, and longitudinal mental illness trajectories, untreated mental illnesses will severely impact upon successful integration into host communities. For host countries and humanitarian agencies, current prevalence estimates of mental illness within this ever-growing population can be used in advocacy and health service planning to strengthen mental health services for refugees and asylum seekers, in line with WHO priorities and objectives [14]. Subgroup analysis for sex was only possible for PTSD, owing to a lack of sex data for the other outcomes, and this is a major limitation of the current literature. 
The subgroup analysis indicated a higher PTSD prevalence for women, consistent with studies of sex differences and PTSD within general populations [61–63]. During times of conflict, women face not only an increased risk of sexual violence [64–66], which is considered to confer a high risk for developing PTSD, but other risks associated with migration trauma such as safety concerns, child-rearing pressures, and exploitation and trafficking [67]. Although trauma type in relation to PTSD diagnosis was not adequately described in the studies, many of the studies included participants from countries such as the former Yugoslavia, Syria, and Iraq, areas with conflicts reported to have perpetrated systematic sexual violence [68]. In line with best-practice research reporting, future research in the field must ensure outcome measures are disaggregated by sex. The studies with populations from Africa reported the highest prevalence of PTSD. This result likely reflects how countries within Africa are consistently ranked at the highest levels of the Political Terror Scale [69]. This scale is a five-point rating system based on data from Amnesty International and the US State Department and measures the levels of extensive human rights violations and violence within nations. In our review, the refugee populations from Europe, which mostly consisted of individuals from the former Yugoslavia, had the highest prevalence of depression, and the Middle East refugee populations had the highest prevalence of anxiety. The prevalence of PTSD and depression appeared to be higher in studies that utilised interpreter-assisted diagnostic assessments. However, this was not the case for anxiety disorders, for which we did not find evidence for a difference between the interpreter-assisted interviews and those conducted in the native language. 
This difference could be due to a number of factors, such as language fluency, which plays an important role in the diagnosis of mental illness because the clinician relies heavily on the self-reported symptoms of the individual [70]. However, further research is required to understand the differences in diagnosis rates between interpreter-assisted interviews and clinicians conducting the assessment in the native language and whether there are cultural and linguistic nuances that can impact on diagnostic rates that might only be accessible to native interviewers. Even though the different diagnostic measures are considered comparable in performance and diagnosis precision [71], our results suggest some differences, which highlight the importance of careful consideration of the method and instrument used in the mental health assessments of refugee populations. Although beyond the scope of this review, further investigation is required to understand potential differences in case identification between diagnostic measures. Our findings suggest that the prevalence of PTSD and depression persists for many years post displacement, suggesting ongoing suffering from mental illnesses in the postmigration environment. This environment can include complexities of social and cultural isolation, reconfigured family relationships, difficulties adjusting to life in a foreign country, and often limited opportunities to contribute economically and socially to their new communities. Previous longitudinal studies have demonstrated how these hallmarks of the postmigration environment, alongside poor social support and acculturation difficulties, may contribute to a deterioration in mental health [5,72–74]. In contrast to the findings for PTSD and depression, anxiety prevalence was higher for those individuals recently displaced. 
Factors contributing to anxiety might be influenced by the uncertainty of the resettlement process and participation in the refugee determination process, which might have a detrimental effect on psychological well-being; however, robust longitudinal research is needed in this field. We found that the prevalence of PTSD and depression is higher than in the review by Fazel and colleagues [6]. This could reflect the fact that this current systematic review included refugee populations from low- and middle-income countries or that the more recent refugee flows might be exposed to higher numbers of risk factors. In contrast, the results for anxiety disorders and psychosis are comparable with previously reported prevalence rates [6]. The influence of sample size is further supported, with the larger studies reporting lower prevalence rates for PTSD and depression. However, this was not the case for anxiety, for which sample size did not influence prevalence. The results for PTSD and depression are comparable to the findings by Steel and colleagues [7] and slightly lower than other systematic reviews, which have reported PTSD prevalence in the range of 36%–43% and depression 40%–44% [12,75]. Two phenomena currently affecting refugee and asylum-seeker populations should be considered when interpreting the results of this review. First is the increased targeting of civilian populations in areas of mass conflict. Second is the postmigration environment in countries with increasingly harsh immigration policies including detention, deportation, and delayed granting of refugee status—possibly mirroring local population shifts against immigration and heightened hostility towards refugee populations [76,77]. Investigation of these situations and their impact on mental health is warranted. 
Limitations and strengths Some statistical heterogeneity is to be expected as a result of the review’s design, which set no exclusion criteria for host country, country of origin, sex, or duration of displacement. We addressed this by using random-effects models to calculate more conservative 95% CIs. The conventional method to investigate potential sources of heterogeneity is to conduct a meta-regression; however, this was not possible, because of the limited covariates reported in the studies. We conducted subgroup analyses to investigate potential sources of heterogeneity, but some subgroup analyses were also not possible, and some studies were excluded from subgroup analyses because of a lack of reported data. There are many challenges to conducting research with refugee populations, one of which is sampling. Ideally, this review would have restricted the inclusion criteria to studies that incorporated multistage representative sampling. However, such a restriction in this field would have yielded so few studies that the prevalence estimates could not have been made. In fact, only two of the included studies in this review would have met this criterion. Other limitations were imposed when studies combined illnesses to form diagnostic groups and/or reported only the number of comorbidities rather than the actual diagnoses. Although many of the diagnostic measures had been widely used in different cultural contexts, none had been specifically developed for refugee populations or cross-cultural use. Although the DSM-5 attempts to enhance cultural validity, all of the included studies used the DSM-IV, DSM-III-R, or ICD-10 criteria, previously criticized for limited recognition of cultural perspectives [78]. In particular, the diagnostic framework for PTSD has largely been investigated using military personnel and single-incident trauma survivors from high-income nations [79].
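To illustrate why a random-effects model gives more conservative intervals than a fixed-effect model when studies disagree, the following is a minimal sketch of DerSimonian-Laird pooling. It is not the review's analysis code: the study values are hypothetical, the pooling of untransformed proportions is a simplifying assumption, and the function name is introduced here for illustration only.

```python
import math

def dersimonian_laird(estimates, variances, z=1.96):
    """Pool study estimates with a DerSimonian-Laird random-effects model."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_fixed = math.sqrt(1.0 / sum_w)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "fixed": fixed,
        "fixed_ci": (fixed - z * se_fixed, fixed + z * se_fixed),
        "pooled": pooled,
        "pooled_ci": (pooled - z * se_re, pooled + z * se_re),
        "tau2": tau2,
        "q": q,
    }

# Hypothetical studies: (prevalence, sample size). Not data from the review.
studies = [(0.10, 150), (0.25, 300), (0.05, 220), (0.18, 90)]
estimates = [p for p, _ in studies]
variances = [p * (1 - p) / n for p, n in studies]

result = dersimonian_laird(estimates, variances)
print("fixed-effect CI:  ", result["fixed_ci"])
print("random-effects CI:", result["pooled_ci"])
```

When the estimated between-study variance is greater than zero, every random-effects weight is smaller than its fixed-effect counterpart, so the pooled standard error grows and the interval widens.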
Somatic symptoms and related disorders were outside the scope of this review but warrant specific investigation and characterization. As far as we are aware, this is the only systematic review to implement strict inclusion criteria regarding the diagnosis of mental illness in current refugee and asylum-seeker populations. This allowed for the selective analysis of higher-quality studies reporting the prevalence of mental illness based on clinical interviews with trained assessors using validated diagnostic measures. This review also expands the current evidence base by not only focusing on PTSD but also reporting depression, anxiety, and psychosis. To the best of our knowledge, this is the first systematic review to place no restrictions on language or on countries of origin or settlement. The majority of studies in this field are undertaken in high-income countries, which are often not countries of first asylum. Although most studies in this review came from countries such as the UK, Germany, Switzerland, and Australia, it also included studies from key refugee host nations such as Lebanon, Turkey, Uganda, and Nepal. The ever-growing refugee and asylum-seeker populations pose a major global public health crisis with serious implications for mental health. This review provides current prevalence estimates for PTSD, depression, anxiety, and psychosis and suggests that both short-term and ongoing mental health services, beyond the period of initial resettlement, are required to promote the health of refugees.
Supporting information S1 Prisma Checklist. From [16]. For more information, visit: www.prisma-statement.org. https://doi.org/10.1371/journal.pmed.1003337.s001 (DOCX) S1 Table. *Truncation symbol. MeSH term, Medical Subject Headings. https://doi.org/10.1371/journal.pmed.1003337.s002 (DOCX) S1 Risk of Bias. https://doi.org/10.1371/journal.pmed.1003337.s003 (DOCX) S1 Egger’s Test PTSD. Figure: Funnel plot using data from 22 studies providing data for the prevalence of posttraumatic stress disorder. Each dot represents a study. ES, effect size; s.e, standard error. Table: Egger’s test set at a threshold of a p-value less than 0.05 to indicate funnel plot asymmetry. Coef., coefficient; Conf. Interval, confidence interval; Std_Eff, standard effect; Std.
Err, standard error; Test of HO, test of null hypothesis. https://doi.org/10.1371/journal.pmed.1003337.s004 (DOCX) S2 Egger’s Test Depression. Figure: Funnel plot using data from 17 studies providing data for the prevalence of depression. Each dot represents a study. ES, effect size; s.e, standard error. Table: Egger’s test set at a threshold of a p-value less than 0.05 to indicate funnel plot asymmetry. Coef., coefficient; Conf. Interval, confidence interval; Std_Eff, standard effect; Std. Err, standard error; Test of HO, test of null hypothesis. https://doi.org/10.1371/journal.pmed.1003337.s005 (DOCX) S3 Egger’s Test Anxiety. Figure: Funnel plot using data from 11 studies providing data for the prevalence of anxiety disorders. Each dot represents a study. ES, effect size; s.e, standard error. Table: Egger’s test set at a threshold of a p-value less than 0.05 to indicate funnel plot asymmetry. Coef., coefficient; Conf. Interval, confidence interval; Std_Eff, standard effect; Std. Err, standard error; Test of HO, test of null hypothesis. https://doi.org/10.1371/journal.pmed.1003337.s006 (DOCX) S4 Egger’s Test Psychosis. Figure: Funnel plot using data from six studies providing data for the prevalence of psychosis. Each dot represents a study. ES, effect size; s.e, standard error. Table: Egger’s test set at a threshold of a p-value less than 0.05 to indicate funnel plot asymmetry. Coef., coefficient; Conf. Interval, confidence interval; Std_Eff, standard effect; Std. Err, standard error; Test of HO, test of null hypothesis. https://doi.org/10.1371/journal.pmed.1003337.s007 (DOCX) Acknowledgments We thank the following authors for providing additional information regarding their studies: C. Acarturk, M. Aoun, C. Eckart, E. Kaltenbach, C. J. Laban, A. Nickerson, F. Neuner, A. Rasmussen, Z. Steel, and S. Thapa. We would also like to thank A. Young from the Monash University library for her assistance with conducting the database search. 
We sincerely thank the Monash University staff and students and non-Monash colleagues who assisted with the screening of articles across a number of languages: R. Goldstein, C. Tay, C. Pickett (Edith Cowan University and Victoria University), D. Coles, K. Petersen, N. Pekin, K. Hammarberg, K. Stanzel, and R. Hasanov.
Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis
Buitrago-Garcia, Diana; Egli-Gany, Dianne; Counotte, Michel J.; Hossmann, Stefanie; Imeri, Hira; Ipekci, Aziz Mert; Salanti, Georgia; Low, Nicola
doi: 10.1371/journal.pmed.1003346 pmid: 32960881
Background There is disagreement about the level of asymptomatic severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We conducted a living systematic review and meta-analysis to address three questions: (1) Amongst people who become infected with SARS-CoV-2, what proportion does not experience symptoms at all during their infection? (2) Amongst people with SARS-CoV-2 infection who are asymptomatic when diagnosed, what proportion will develop symptoms later? (3) What proportion of SARS-CoV-2 transmission is accounted for by people who are either asymptomatic throughout infection or presymptomatic? Methods and findings We searched PubMed, Embase, bioRxiv, and medRxiv using a database of SARS-CoV-2 literature that is updated daily, on 25 March 2020, 20 April 2020, and 10 June 2020. Studies of people with SARS-CoV-2 diagnosed by reverse transcriptase PCR (RT-PCR) that documented follow-up and symptom status at the beginning and end of follow-up or modelling studies were included. One reviewer extracted data and a second verified the extraction, with disagreement resolved by discussion or a third reviewer. Risk of bias in empirical studies was assessed with an adapted checklist for case series, and the relevance and credibility of modelling studies were assessed using a published checklist. We included a total of 94 studies. The overall estimate of the proportion of people who become infected with SARS-CoV-2 and remain asymptomatic throughout infection was 20% (95% confidence interval [CI] 17–25) with a prediction interval of 3%–67% in 79 studies that addressed this review question. There was some evidence that biases in the selection of participants influence the estimate. In seven studies of defined populations screened for SARS-CoV-2 and then followed, 31% (95% CI 26%–37%, prediction interval 24%–38%) remained asymptomatic. The proportion of people that is presymptomatic could not be summarised, owing to heterogeneity. 
The secondary attack rate was lower in contacts of people with asymptomatic infection than those with symptomatic infection (relative risk 0.35, 95% CI 0.10–1.27). Modelling studies fit to data found a higher proportion of all SARS-CoV-2 infections resulting from transmission from presymptomatic individuals than from asymptomatic individuals. Limitations of the review include that most included studies were not designed to estimate the proportion of asymptomatic SARS-CoV-2 infections and were at risk of selection biases; we did not consider the possible impact of false negative RT-PCR results, which would underestimate the proportion of asymptomatic infections; and the database does not include all sources. Conclusions The findings of this living systematic review suggest that most people who become infected with SARS-CoV-2 will not remain asymptomatic throughout the course of the infection. The contribution of presymptomatic and asymptomatic infections to overall SARS-CoV-2 transmission means that combination prevention measures, with enhanced hand hygiene, masks, testing tracing, and isolation strategies and social distancing, will continue to be needed. Why was this study done? The proportion of people who will remain asymptomatic throughout the course of infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), is not known. Studies that assess people at just one time point will overestimate the proportion of true asymptomatic infection because those who go on to develop COVID-19 symptoms will be wrongly classified as asymptomatic rather than presymptomatic. The amount, and infectiousness, of asymptomatic SARS-CoV-2 infection will determine what kind of measures will prevent transmission most effectively. What did the researchers do and find? 
We did a living systematic review through 10 June 2020, using automated workflows that speed up the review processes and allow the review to be updated when relevant new evidence becomes available. Overall, in 79 studies in a range of different settings, 20% (95% confidence interval [CI] 17%–25%, prediction interval 3%–67%) of people with SARS-CoV-2 infection remained asymptomatic during follow-up, but biases in study designs limit the certainty of this estimate. In seven studies of defined populations screened for SARS-CoV-2 and then followed, 31% (95% CI 26%–37%, prediction interval 24%–38%) remained asymptomatic. We found some evidence that SARS-CoV-2 infection in contacts of people with asymptomatic infection is less likely than in contacts of people with symptomatic infection (relative risk 0.35, 95% CI 0.10–1.27). What do these findings mean? The findings of this living systematic review suggest that most people who become infected with SARS-CoV-2 will not remain asymptomatic throughout the course of infection. Future studies should be designed specifically to determine the true proportion of asymptomatic SARS-CoV-2 infections, using methods to minimise biases in the selection of study participants and ascertainment of symptom status during follow-up. The contribution of presymptomatic and asymptomatic infections to overall SARS-CoV-2 transmission means that combination prevention measures, with enhanced hand hygiene, masks, testing tracing, and isolation strategies and social distancing, will continue to be needed. Introduction There is ongoing discussion about the level of asymptomatic severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The authors of a narrative review report a range of proportions of participants positive for SARS-CoV-2 but asymptomatic in different studies from 6% to 96% [1]. 
The discrepancy results, in part, from the interpretation of studies that report a proportion of asymptomatic people with SARS-CoV-2 detected at a single point. The studies cited include both people who will remain asymptomatic throughout and those, known as presymptomatic, who will develop symptoms of coronavirus disease 2019 (COVID-19) if followed up [2]. The full spectrum and distribution of COVID-19, from completely asymptomatic, to mild and nonspecific symptoms, viral pneumonia, respiratory distress syndrome, and death, are not yet known [3]. Without follow-up, however, the proportions of asymptomatic and presymptomatic infections cannot be determined. Accurate estimates of the proportions of true asymptomatic and presymptomatic infections are needed urgently because their contribution to overall SARS-CoV-2 transmission at the population level will determine the appropriate balance of control measures [3]. If the predominant route of transmission is from people who have symptoms, then strategies should focus on testing, followed by isolation of infected individuals and quarantine of their contacts. If, however, most transmission is from people without symptoms, social distancing measures that reduce contact with people who might be infectious should be prioritised, enhanced by active case-finding through testing of asymptomatic people. The objectives of this study were to address three questions: (1) Amongst people who become infected with SARS-CoV-2, what proportion do not experience symptoms at all during their infection? (2) Amongst people with SARS-CoV-2 infection who are asymptomatic when diagnosed, what proportion will develop symptoms later? (3) What proportion of SARS-CoV-2 transmission is accounted for by people who are either asymptomatic throughout infection or presymptomatic? 
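The distinction between questions (1) and (2) can be made concrete with a small worked example. The sketch below uses a hypothetical cohort (the counts are invented for illustration and are not from any included study) to show how a single-time-point estimate overstates the proportion of truly asymptomatic infections when some people later develop symptoms.

```python
def symptom_status_proportions(n_infected, n_symptom_free_at_test, n_developed_later):
    """Contrast a single-time-point estimate with estimates after follow-up.

    All arguments are counts of people; names are illustrative.
    """
    if not 0 <= n_developed_later <= n_symptom_free_at_test <= n_infected:
        raise ValueError("counts must satisfy 0 <= later <= symptom-free <= infected")
    if n_symptom_free_at_test == 0:
        raise ValueError("need at least one person without symptoms at testing")
    n_asymptomatic = n_symptom_free_at_test - n_developed_later
    return {
        # What a study without follow-up would report as 'asymptomatic'.
        "single_time_point": n_symptom_free_at_test / n_infected,
        # Question 1: infected people who never develop symptoms.
        "asymptomatic": n_asymptomatic / n_infected,
        # Question 2: initially symptom-free people who develop symptoms later.
        "presymptomatic_among_symptom_free": n_developed_later / n_symptom_free_at_test,
    }

# Hypothetical cohort: 200 infected, 80 without symptoms at testing,
# of whom 40 develop symptoms during follow-up.
example = symptom_status_proportions(200, 80, 40)
print(example)
```

In this invented cohort a study without follow-up would label 40% of infections as asymptomatic, whereas only 20% remain symptom-free throughout and half of the initially symptom-free group are in fact presymptomatic.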
Methods We conducted a living systematic review, a systematic review that provides an online summary of findings and is updated when relevant new evidence becomes available [4]. The review follows a published protocol (https://osf.io/9ewys/), which describes in detail the methods used to speed up review tasks [5] and to assess relevant evidence rapidly during a public health emergency [6]. The first two versions of the review have been published as preprints [7,8]. We report our findings according to the statement on preferred reporting items for systematic reviews and meta-analyses (S1 PRISMA Checklist) [9]. Ethics committee review was not required for this study. Box 1 shows our definitions of symptoms, asymptomatic infection, and presymptomatic status. We use the term asymptomatic SARS-CoV-2 infection for people without symptoms of COVID-19 who remain asymptomatic throughout the course of infection. We use the term presymptomatic for people who do not have symptoms of COVID-19 when enrolled in a study but who develop symptoms during adequate follow-up. Box 1. Definitions of symptoms and symptom status in a person with SARS-CoV-2 infections Symptoms: symptoms that a person experiences and reports. We used the authors’ definitions. We searched included manuscripts for an explicit statement that the study participant did not report symptoms that they experienced. Some authors defined ‘asymptomatic’ as an absence of self-reported symptoms. We did not include clinical signs observed or elicited on examination. Asymptomatic infection: a person with laboratory-confirmed SARS-CoV-2 infection, who has no symptoms, according to the authors’ report, at the time of first clinical assessment and had no symptoms at the end of follow-up. 
The end of follow-up was defined as any of the following: virological cure, with one or more negative reverse transcriptase PCR (RT-PCR) test results; follow-up for 14 days or more after the last possible exposure to an index case; follow-up for 7 days or more after the first RT-PCR positive result. Presymptomatic: a person with laboratory-confirmed SARS-CoV-2 infection, who has no symptoms, according to the authors’ report, at the time of first clinical assessment but who developed symptoms by the end of follow-up. The end of follow-up was defined as any of the following: virological cure, with one or more negative RT-PCR test results; follow-up for 14 days or more after the last possible exposure to an index case; follow-up for 7 days or more after the first RT-PCR positive result. Information sources and search We conducted the first search on 25 March 2020 and updated it on 20 April and 10 June 2020. We searched the COVID-19 living evidence database [10], which is generated using automated workflow processes [5] to (1) provide daily updates of searches of four electronic databases (Medline PubMed, Ovid Embase, bioRxiv, and medRxiv), using medical subject headings and free-text keywords for SARS-CoV-2 infection and COVID-19; (2) de-duplicate the records; (3) tag records that are preprints; and (4) allow searches of titles and abstracts using Boolean operators. We used the search function to identify studies of asymptomatic or presymptomatic SARS-CoV-2 infection using a search string of medical subject headings and free-text keywords (S1 Text). We also examined articles suggested by experts and the reference lists of retrieved mathematical modelling studies and systematic reviews. Reports from this living rapid systematic review will be updated at 3-monthly intervals, with continuously updated searches. 
Eligibility criteria We included studies in any language of people with SARS-CoV-2 diagnosed by RT-PCR that documented follow-up and symptom status at the beginning and end of follow-up or investigated the contribution to SARS-CoV-2 transmission of asymptomatic or presymptomatic infection. We included contact-tracing investigations, case series, cohort studies, case-control studies, and statistical and mathematical modelling studies. We excluded the following study types: case reports of a single patient and case series in which participants were not enrolled consecutively. When multiple records included data from the same study population, we linked the records and extracted data from the most complete report. Study selection and data extraction Reviewers worked in pairs to screen records using an application programming interface in the electronic data capture system (REDCap, Vanderbilt University, Nashville, TN, USA). One reviewer selected potentially eligible studies and a second reviewer verified all included and excluded studies. We reported the identification, exclusion, and inclusion of studies in a flowchart (S1 Fig). The reviewers determined which of the three review questions each study addressed, using the definitions in Box 1. One reviewer extracted data using a pre-piloted extraction form in REDCap, and a second reviewer verified the extracted data using the query system. A third reviewer adjudicated on disagreements that could not be resolved by discussion. We contacted study authors for clarification when the study description was insufficient to reach a decision on inclusion or if reported data in the manuscript were internally inconsistent. The extracted variables included, but were not limited to, study design, country and/or region, study setting, population, age, primary outcomes, and length of follow-up. From empirical studies, we extracted raw numbers of individuals with any outcome and its relevant denominator. 
From statistical and mathematical modelling studies, we extracted proportions and uncertainty intervals reported by the authors. The primary outcomes for each review question were (1) proportion with asymptomatic SARS-CoV-2 infection who did not experience symptoms at all during follow-up; (2) proportion with SARS-CoV-2 infections who did not have symptoms at the time of testing but developed symptoms during follow-up; (3) estimated proportion (with uncertainty interval) of SARS-CoV-2 transmission accounted for by people who are asymptomatic or presymptomatic. A secondary outcome for review question 3 was the secondary attack rate from asymptomatic or presymptomatic index cases. Risk of bias in included studies Two authors independently assessed the risk of bias. A third reviewer resolved disagreements. For observational epidemiological studies, we adapted the Joanna Briggs Institute Critical Appraisal Checklist for Case Series [11]. The adapted tool included items about inclusion criteria, measurement of asymptomatic status, follow-up of course of disease, and statistical analysis. We added items about selection biases affecting the study population from a tool for the assessment of risk of bias in prevalence studies [12]. For mathematical modelling studies, we used a checklist for assessing relevance and credibility [13]. Synthesis of the evidence We used the ‘metaprop’ and ‘metabin’ functions from the ‘meta’ package (version 4.11–0) [14] in R (version 3.5.1) to display the study findings in forest plots and synthesise their findings. The 95% confidence intervals (CIs) for each study are estimated using the Clopper-Pearson method [15]. We examined heterogeneity visually in forest plots. We stratified studies according to the methods used to identify people with asymptomatic SARS-CoV-2 infection and the study setting. To synthesise proportions from comparable studies, in terms of design and population, we used stratified random-effects meta-analysis. 
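The Clopper-Pearson (exact binomial) interval mentioned above can be illustrated with a minimal sketch. This is not the authors' R code: the function names (`binom_cdf`, `clopper_pearson`) and the example counts are hypothetical, and the bounds are found by simple bisection on the binomial tail probabilities rather than by whatever routine the 'meta' package uses internally.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_increasing(g, iterations: int = 80) -> float:
    """Bisection root of a function g that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (zero when x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (one when x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: alpha / 2.0 - binom_cdf(x, n, p))
    return lower, upper


# Hypothetical study: 3 asymptomatic people among 10 with confirmed infection.
low, high = clopper_pearson(3, 10)
print(f"3/10 = 0.30, 95% CI {low:.3f} to {high:.3f}")  # roughly 0.067 to 0.652
```

With small studies such as single-family clusters, the exact interval is wide, which is one reason the forest plots show considerable overlap between individual study estimates.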
For the stratified and overall summary estimates, we calculated prediction intervals, to represent the likely range of proportions that would be obtained in subsequent studies conducted in similar settings [16]. We calculated the secondary attack rate as the number of cases among contacts as a proportion of all close contacts ascertained. We did not account for potential clustering of contacts because the included studies did not report the size of clusters. We compared the secondary attack rate from asymptomatic or presymptomatic index cases with that from symptomatic cases. If there were no events in a group, we added 0.5 to each cell in the 2 × 2 table. We used random-effects meta-analysis with the Mantel-Haenszel method to estimate a summary risk ratio (with 95% CI).
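The secondary attack rate comparison described above can be sketched for a single study as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names and the counts are hypothetical, the confidence interval is a standard Wald interval on the log scale, and the random-effects Mantel-Haenszel pooling across studies is not reproduced here.

```python
import math


def secondary_attack_rate(events: int, contacts: int) -> float:
    """Cases among contacts as a proportion of all close contacts ascertained."""
    return events / contacts


def risk_ratio(events_a: int, contacts_a: int,
               events_b: int, contacts_b: int,
               z: float = 1.96) -> tuple:
    """Risk ratio (group A vs group B) with a Wald CI computed on the log scale.

    Group A: contacts of asymptomatic or presymptomatic index cases.
    Group B: contacts of symptomatic index cases.
    """
    a, b = float(events_a), float(contacts_a - events_a)
    c, d = float(events_b), float(contacts_b - events_b)
    # If either group has no events, add 0.5 to every cell of the 2 x 2 table.
    if events_a == 0 or events_b == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n_a, n_b = a + b, c + d
    rr = (a / n_a) / (c / n_b)
    se_log_rr = math.sqrt(1.0 / a - 1.0 / n_a + 1.0 / c - 1.0 / n_b)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)


# Hypothetical counts: 0 infections among 10 contacts of asymptomatic index
# cases versus 5 infections among 20 contacts of symptomatic index cases.
rr, low, high = risk_ratio(0, 10, 5, 20)
print(f"RR = {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Without the 0.5 correction, a group with no events would give a risk ratio of zero and an undefined standard error, so the study could not contribute to a pooled estimate.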
Results The living evidence database contained a total of 25,538 records about SARS-CoV-2 or COVID-19 by 10 June 2020. The searches for studies about asymptomatic or presymptomatic SARS-CoV-2 on 25 March, 20 April, and 10 June resulted in 89, 230, and 688 records for screening (S1 Fig). In the first version of the review [7], 11 articles were eligible for inclusion [17–27], version 2 [8] identified another 26 eligible records [28–53], and version 3 identified another 61 eligible records [54–114]. After excluding four articles for which more recent data became available in a subsequent version [25,29,30,35], the total number of articles included was 94 (S1 Table) [17–24,26–28,31–34,36–114]. The types of evidence changed across the three versions of the review (S1 Table). In the first version, six of 11 studies were contact investigations of single-family clusters with a total of 39 people. In the next versions, study designs included larger investigations of contacts and outbreaks, screening of defined groups, and studies of hospitalised adults and children. Across all three review versions, data from 79 empirical observational studies were collected in 19 countries or territories (Tables 1 and 2) and included 6,832 people with SARS-CoV-2 infection. Forty-seven of the studies, including 3,802 infected people, were done in China (S2 Table). At the time of their inclusion in the review, 23 of the included records were preprints; six of these had been published in peer-reviewed journals by 17 July 2020 [19,20,27,81,82,106]. Table 1.
Characteristics of studies reporting on proportion of asymptomatic SARS-CoV-2 infections. https://doi.org/10.1371/journal.pmed.1003346.t001 Table 2. Characteristics of studies that measured the proportion of people with SARS-CoV-2 infection that develops symptoms. https://doi.org/10.1371/journal.pmed.1003346.t002 Proportion of people with asymptomatic SARS-CoV-2 infection We included 79 studies that reported empirical data about 6,616 people with SARS-CoV-2 infection (1,287 defined as having asymptomatic infection) [17,18,21–23,26–28,31,32,34,36,39–45,47–50,52–54,56–62,64,66–68,70–77,79–90,92–112,114] and one statistical modelling study [24] (Table 1). The sex distribution of the people with asymptomatic infection was reported in 41/79 studies, and the median age was reported in 35/79 studies (Table 1). The results of the studies were heterogeneous (S2 Fig). We defined seven strata, according to the method of selection of asymptomatic status and study settings. Study findings within some of these strata were more consistent (Fig 1). We considered the statistical modelling study of passengers on the Diamond Princess cruise ship [24] separately, because of the different method of analysis and overlap with the study population reported by Tabata and colleagues [27]. Fig 1. Forest plot of proportion (‘Prop.’) of people with asymptomatic SARS-CoV-2 infection, stratified by setting. In the setting 'Contact investigations', in which more than one cluster was reported, clusters are annotated with '[cluster]'. The diamond shows the summary estimate and its 95% CI. The red bar and red text show the prediction interval. CI, confidence interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
https://doi.org/10.1371/journal.pmed.1003346.g001 The main risks of bias across all categories of empirical studies were in the selection and enrolment of people with asymptomatic infection and mismeasurement of asymptomatic status because of absent or incomplete definitions (S3 Fig). Sources of bias specific to studies in particular settings are discussed with the relevant results. The overall estimate of the proportion of people who become infected with SARS-CoV-2 and remain asymptomatic throughout the course of infection was 20% (95% CI 17%–25%, 79 studies), with a prediction interval of 3%–67% (Fig 1). One statistical modelling study was based on data from all 634 passengers from the Diamond Princess cruise ship with RT-PCR positive test results [24]. The authors adjusted for the proportion of people who would develop symptoms (right censoring) in a Bayesian framework to estimate that, if all were followed up until the end of the incubation period, the probability of asymptomatic infections would be 17.9% (95% credibility interval [CrI] 15.5%–20.2%). The summary estimates of the proportion of people with asymptomatic SARS-CoV-2 infection differed according to study setting, although prediction intervals for all groups overlapped. The first three strata in Fig 1 involve studies that reported on different types of contact investigation, which start with an identified COVID-19 case. The studies reporting on single-family clusters (21 estimates from 16 studies in China, n = 102 people with SARS-CoV-2) all included at least one asymptomatic person [17,18,21–23,26,44,49,50,70,73–76,85,110]. The summary estimate was 34% (95% CI 26%–44%, prediction interval 25%–45%). In nine studies that reported on close contacts of infected individuals and aggregated data from clusters of both asymptomatic and symptomatic people with SARS-CoV-2 the summary estimate was 14% (95% CI 8%–23%, prediction interval 2%–53%) [36,47,60,62,66,72,105,108,111]. 
We included 12 studies (n = 675 people) that reported on outbreak investigations arising from a single symptomatic person or from the country’s first imported cases of people with COVID-19 [32,43,48,58,61,68,71,90,94,95,97,100]. Four of the outbreaks involved nursing homes [58,68,71,94] and four involved occupational settings [43,61,90,95]. The summary estimate of the proportion of asymptomatic SARS-CoV-2 infections was 18% (95% CI 10%–28%, prediction interval 2%–64%). In seven studies, people with SARS-CoV-2 infection were detected through screening of all people in defined populations who were potentially exposed (303 infected people amongst 10,090 screened) [28,31,34,81,82,93,101]. The screened populations included healthcare workers [82,93,101]; people evacuated from a setting where SARS-CoV-2 transmission was confirmed, irrespective of symptom status [28,34]; the whole population of one village in Italy [81]; and blood donors [31]. In these studies, the summary estimate of the proportion asymptomatic was 31% (95% CI 26%–37%, prediction interval 24%–38%). There is a risk of selection bias in studies of certain groups, such as healthcare workers and blood donors, because people with symptoms are excluded [31,82,93,101], or from nonresponders in population-based screening [81]. Retrospective symptom ascertainment could also increase the proportion determined asymptomatic [81,82,101]. The remaining studies, in hospital settings, included adult patients only (15 studies, n = 3,228) [27,39,45,52,53,56,57,64,80,83,89,92,103,107,114], children only (10 studies, n = 285) [40–42,59,84,87,98,99,104,106], or adults and children (10 studies, n = 1,518) [54,67,77,79,86,88,96,102,109,112] (Table 1, Fig 1). The types of hospital and clinical severity of patients differed, including settings in which anyone with SARS-CoV-2 infection was admitted for isolation and traditional hospitals. 
Proportion of presymptomatic SARS-CoV-2 infections We included 31 studies in which the people with no symptoms of COVID-19 at enrolment were followed up, and the proportion that develops symptoms is defined as presymptomatic (Table 2, Fig 2) [21,27,28,31,34,37,38,41,45,46,49,52,55,56,58,67,68,71,73,76,77,79,81,90,93,95,103,110,111,113,114]. Four studies addressed only this review question [37,38,55,113]. The findings from the 31 studies were heterogeneous (S4 Fig), even when categorised according to the method of selection of asymptomatic participants, and we did not estimate a summary measure (Fig 2). Fig 2. Forest plot of proportion (‘Prop.’) of people with presymptomatic SARS-CoV-2 infection, stratified by setting. CI, confidence interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2. https://doi.org/10.1371/journal.pmed.1003346.g002 Additional analyses We investigated heterogeneity in the estimates of the proportion of asymptomatic SARS-CoV-2 infections in subgroup analyses that were not specified in the original protocol. In studies of hospitalised children, the point estimate was higher (27%, 95% CI 22%–32%, 10 studies) than in adults (11%, 95% CI 6%–19%, 15 studies) (Fig 1). The proportion of asymptomatic SARS-CoV-2 infection estimated in studies of hospitalised patients (35 studies, 19%, 95% CI 14%–25%) was similar to that in all other settings (44 studies, 22%, 95% CI 17%–29%, S5 Fig). To examine publication status, we conducted a sensitivity analysis, omitting studies that were identified as preprints at the time of data extraction (S6 Fig). The estimate of the proportion of asymptomatic infection in all settings (18%, 95% CI 14%–22%) and setting-specific estimates were very similar to the main analysis.
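The difference between a summary confidence interval and a prediction interval in these results can be illustrated with a minimal sketch of random-effects pooling. Everything here is an assumption made for illustration: the study counts are hypothetical, the pooling uses a logit transformation with the DerSimonian-Laird estimator of between-study variance, and the prediction interval uses a t distribution with k − 2 degrees of freedom. The 'meta' package may have been run with different settings, so this should not be read as a reproduction of the reported estimates.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions(studies, t_crit: float, z: float = 1.96) -> dict:
    """Random-effects pooling of proportions on the logit scale.

    studies: list of (events, total) pairs with 0 < events < total
             (studies with zero cells would need a continuity correction).
    t_crit:  97.5th percentile of Student's t with len(studies) - 2 degrees
             of freedom, supplied by the caller (for example from a table).
    """
    k = len(studies)
    y = [logit(x / n) for x, n in studies]                 # logit proportions
    v = [1.0 / x + 1.0 / (n - x) for x, n in studies]      # within-study variances
    w = [1.0 / vi for vi in v]                             # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird estimate of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]                 # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    half_ci = z * se
    half_pi = t_crit * math.sqrt(tau2 + se ** 2)
    return {
        "estimate": expit(mu),
        "ci": (expit(mu - half_ci), expit(mu + half_ci)),
        "prediction_interval": (expit(mu - half_pi), expit(mu + half_pi)),
        "tau2": tau2,
    }


# Five hypothetical studies (asymptomatic, infected); t_crit = 3.182 for 3 df.
example = [(3, 10), (5, 40), (12, 60), (2, 25), (20, 50)]
result = pool_proportions(example, t_crit=3.182)
print(result)
```

The confidence interval describes uncertainty about the average proportion, whereas the prediction interval also includes the between-study variance, so it is the wider of the two whenever studies disagree. This is consistent with an overall confidence interval of 17%–25% sitting inside a prediction interval of 3%–67%.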
Contribution of asymptomatic and presymptomatic infection to SARS-CoV-2 transmission Five of the studies that conducted detailed contact investigations provided enough data to calculate a secondary attack rate according to the symptom status of the index cases (Fig 3) [36,65,66,90,111]. The summary risk ratio for asymptomatic compared with symptomatic people was 0.35 (95% CI 0.1–1.27) and for presymptomatic compared with symptomatic people was 0.63 (95% CI 0.18–2.26) [66,90]. The risk of bias in ascertainment of contacts was judged to be low in all studies. Fig 3. Forest plot of the RR and 95% CI of the SAR, comparing infections in contacts of asymptomatic and presymptomatic index cases with infections in contacts of symptomatic cases. The RR is on a logarithmic scale. CI, confidence interval; E, number of secondary transmission events; N, number of close contacts; RR, risk ratio; SAR, secondary attack rate. https://doi.org/10.1371/journal.pmed.1003346.g003 We included eight mathematical modelling studies (Fig 4) [19,20,33,51,63,69,78,91]. The models in five studies were informed by analysis of data from contact investigations in China, South Korea, Singapore, and the Diamond Princess cruise ship, using data to estimate the serial interval or generation time [19,20,33,69,78], and in three studies the authors used previously published estimates [51,63,91]. Fig 4. Forest plot of proportion (‘Prop.’) of SARS-CoV-2 infection resulting from asymptomatic or presymptomatic transmission. For studies that report outcomes in multiple settings, these are annotated in brackets. CI, confidence interval; GI, generation interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; SI, serial interval.
https://doi.org/10.1371/journal.pmed.1003346.g004 Estimates of the contributions of both asymptomatic and presymptomatic infections to SARS-CoV-2 transmission were very heterogeneous. In two studies, the contributions to SARS-CoV-2 transmission of asymptomatic infection were estimated to be 6% (95% CrI 0%–57%) [19] and 69% (95% CrI 20%–85%) [69] (Fig 4). The estimates have large uncertainty intervals and the disparate predictions result from differences in the proportion of asymptomatic infections and relative infectiousness of asymptomatic infection. Ferretti and colleagues provide an interactive web application [19] that shows how these parameters affect the model results. Models of the contribution of presymptomatic transmission used different assumptions about the durations and distributions of infection parameters such as incubation period, generation time, and serial interval [19,20,33,51,63,78,91]. In models that accounted for uncertainty appropriately, most estimates of the proportion of transmission resulting from people with SARS-CoV-2 who are presymptomatic ranged from 20% to 70%. In one study that estimated a contribution of <1% [91], the model-fitted serial interval was longer than observed in empirical studies [115]. The credibility of most modelling studies was limited by the absence of external validation. The data to which the models were fitted were generally from small samples (S7 Fig).
We defined seven strata, according to the method of selection of asymptomatic status and study settings. Study findings within some of these strata were more consistent (Fig 1). We considered the statistical modelling study of passengers on the Diamond Princess cruise ship passengers [24] separately, because of the different method of analysis and overlap with the study population reported by Tabata and colleagues [27]. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 1. Forest plot of proportion (‘Prop.’) of people with asymptomatic SARS-CoV-2 infection, stratified by setting. In the setting 'Contact investigations', in which more than one cluster was reported, clusters are annotated with '[cluster]'. The diamond shows the summary estimate and its 95% CI. The red bar and red text show the prediction interval. CI, confidence interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2. https://doi.org/10.1371/journal.pmed.1003346.g001 The main risks of bias across all categories of empirical studies were in the selection and enrolment of people with asymptomatic infection and mismeasurement of asymptomatic status because of absent or incomplete definitions (S3 Fig). Sources of bias specific to studies in particular settings are discussed with the relevant results. The overall estimate of the proportion of people who become infected with SARS-CoV-2 and remain asymptomatic throughout the course of infection was 20% (95% CI 17%–25%, 79 studies), with a prediction interval of 3%–67% (Fig 1). One statistical modelling study was based on data from all 634 passengers from the Diamond Princess cruise ship with RT-PCR positive test results [24]. 
The authors adjusted for the proportion of people who would develop symptoms (right censoring) in a Bayesian framework to estimate that, if all were followed up until the end of the incubation period, the probability of asymptomatic infections would be 17.9% (95% credibility interval [CrI] 15.5%–20.2%). The summary estimates of the proportion of people with asymptomatic SARS-CoV-2 infection differed according to study setting, although prediction intervals for all groups overlapped. The first three strata in Fig 1 involve studies that reported on different types of contact investigation, which start with an identified COVID-19 case. The studies reporting on single-family clusters (21 estimates from 16 studies in China, n = 102 people with SARS-CoV-2) all included at least one asymptomatic person [17,18,21–23,26,44,49,50,70,73–76,85,110]. The summary estimate was 34% (95% CI 26%–44%, prediction interval 25%–45%). In nine studies that reported on close contacts of infected individuals and aggregated data from clusters of both asymptomatic and symptomatic people with SARS-CoV-2 the summary estimate was 14% (95% CI 8%–23%, prediction interval 2%–53%) [36,47,60,62,66,72,105,108,111]. We included 12 studies (n = 675 people) that reported on outbreak investigations arising from a single symptomatic person or from the country’s first imported cases of people with COVID-19 [32,43,48,58,61,68,71,90,94,95,97,100]. Four of the outbreaks involved nursing homes [58,68,71,94] and four involved occupational settings [43,61,90,95]. The summary estimate of the proportion of asymptomatic SARS-CoV-2 infections was 18% (95% CI 10%–28%, prediction interval 2%–64%). In seven studies, people with SARS-CoV-2 infection were detected through screening of all people in defined populations who were potentially exposed (303 infected people amongst 10,090 screened) [28,31,34,81,82,93,101]. 
The screened populations included healthcare workers [82,93,101]; people evacuated from a setting where SARS-CoV-2 transmission was confirmed, irrespective of symptom status [28,34]; the whole population of one village in Italy [81]; and blood donors [31]. In these studies, the summary estimate of the proportion asymptomatic was 31% (95% CI 26%–37%, prediction interval 24%–38%). There is a risk of selection bias in studies of certain groups, such as healthcare workers and blood donors, because people with symptoms are excluded [31,82,93,101], or from nonresponders in population-based screening [81]. Retrospective symptom ascertainment could also increase the proportion determined asymptomatic [81,82,101]. The remaining studies, in hospital settings, included adult patients only (15 studies, n = 3,228) [27,39,45,52,53,56,57,64,80,83,89,92,103,107,114], children only (10 studies, n = 285) [40–42,59,84,87,98,99,104,106], or adults and children (10 studies, n = 1,518) [54,67,77,79,86,88,96,102,109,112] (Table 1, Fig 1). The types of hospital and clinical severity of patients differed, including settings in which anyone with SARS-CoV-2 infection was admitted for isolation and traditional hospitals. Proportion of presymptomatic SARS-CoV-2 infections We included 31 studies in which the people with no symptoms of COVID-19 at enrolment were followed up, and the proportion that develops symptoms is defined as presymptomatic (Table 2, Fig 2) [21,27,28,31,34,37,38,41,45,46,49,52,55,56,58,67,68,71,73,76,77,79,81,90,93,95,103,110,111,113,114]. Four studies addressed only this review question [37,38,55,113]. The findings from the 31 studies were heterogeneous (S4 Fig), even when categorised according to the method of selection of asymptomatic participants, and we did not estimate a summary measure (Fig 2). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. 
Forest plot of proportion (‘Prop.’) of people with presymptomatic SARS-CoV-2 infection, stratified by setting. CI, confidence interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2. https://doi.org/10.1371/journal.pmed.1003346.g002 Additional analyses We investigated heterogeneity in the estimates of the proportion of asymptomatic SARS-CoV-2 infections in subgroup analyses that were not specified in the original protocol. In studies of hospitalised children, the point estimate was higher (27%, 95% CI 22%–32%, 10 studies) than in adults (11%, 95% CI 6%–19%, 15 studies) (Fig 1). The proportion of asymptomatic SARS-CoV-2 infection estimated in studies of hospitalised patients (35 studies, 19%, 95% CI 14%–25%) was similar to that in all other settings (44 studies, 22%, 95% CI 17%–29%, S5 Fig). To examine publication status, we conducted a sensitivity analysis, omitting studies that were identified as preprints at the time of data extraction (S6 Fig). The estimate of the proportion of asymptomatic infection in all settings (18%, 95% CI 14%–22%) and setting-specific estimates were very similar to the main analysis. Contribution of asymptomatic and presymptomatic infection to SARS-CoV-2 transmission Five of the studies that conducted detailed contact investigations provided enough data to calculate a secondary attack rate according to the symptom status of the index cases (Fig 3) [36,65,66,90,111]. The summary risk ratio for asymptomatic compared with symptomatic people was 0.35 (95% CI 0.1–1.27) and for presymptomatic compared with symptomatic people was 0.63 (95% CI 0.18–2.26) [66,90]. The risk of bias in ascertainment of contacts was judged to be low in all studies. Fig 3. Forest plot of the RR and 95% CI of the SAR, comparing infections in contacts of asymptomatic and presymptomatic index cases with infections in contacts of symptomatic cases. The RR is on a logarithmic scale. 
CI, confidence interval; E, number of secondary transmission events; N, number of close contacts; RR, risk ratio; SAR, secondary attack rate. https://doi.org/10.1371/journal.pmed.1003346.g003 We included eight mathematical modelling studies (Fig 4) [19,20,33,51,63,69,78,91]. The models in five studies were informed by analysis of data from contact investigations in China, South Korea, Singapore, and the Diamond Princess cruise ship, using data to estimate the serial interval or generation time [19,20,33,69,78], and in three studies the authors used previously published estimates [51,63,91]. Fig 4. Forest plot of proportion (‘Prop.’) of SARS-CoV-2 infection resulting from asymptomatic or presymptomatic transmission. For studies that report outcomes in multiple settings, these are annotated in brackets. CI, confidence interval; GI, generation interval; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; SI, serial interval. https://doi.org/10.1371/journal.pmed.1003346.g004 Estimates of the contributions of both asymptomatic and presymptomatic infections to SARS-CoV-2 transmission were very heterogeneous. In two studies, the contributions to SARS-CoV-2 transmission of asymptomatic infection were estimated to be 6% (95% CrI 0%–57%) [19] and 69% (95% CrI 20%–85%) [69] (Fig 4). The estimates have large uncertainty intervals and the disparate predictions result from differences in the proportion of asymptomatic infections and relative infectiousness of asymptomatic infection. Ferretti and colleagues provide an interactive web application [19] that shows how these parameters affect the model results. Models of the contribution of presymptomatic transmission used different assumptions about the durations and distributions of infection parameters such as incubation period, generation time, and serial interval [19,20,33,51,63,78,91]. 
In models that accounted for uncertainty appropriately, most estimates of the proportion of transmission resulting from people with SARS-CoV-2 who are presymptomatic ranged from 20% to 70%. In one study that estimated a contribution of less than 1% [91], the model-fitted serial interval was longer than observed in empirical studies [115]. The credibility of most modelling studies was limited by the absence of external validation. The data to which the models were fitted were generally from small samples (S7 Fig). Discussion Summary of main findings The summary proportion of SARS-CoV-2 infections that are asymptomatic throughout the course of infection was estimated, across all study settings, to be 20% (95% CI 17%–25%, 79 studies), with a prediction interval of 3%–67%. In studies that identified SARS-CoV-2 infection through screening of defined populations, the proportion of asymptomatic infections was 31% (95% CI 26%–37%, 7 studies). In 31 studies reporting on people who are presymptomatic but who go on to develop symptoms, the results were too heterogeneous to combine. The secondary attack rate from asymptomatic infections may be lower than that from symptomatic infections (relative risk 0.35, 95% CI 0.1–1.27). Modelling studies estimated a wide range for the proportion of all SARS-CoV-2 infections that result from transmission from asymptomatic and presymptomatic individuals. Strengths and weaknesses A strength of this review is that we used clear definitions and separated review questions to distinguish SARS-CoV-2 infections that remain asymptomatic throughout their course from those that become symptomatic and to separate proportions of people with infection from their contribution to transmission in a population. This living systematic review uses methods to minimise bias whilst increasing the speed of the review process [5,6] and will be updated regularly. 
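The relative risk quoted in the summary of main findings compares secondary attack rates between contacts of asymptomatic and symptomatic index cases. A minimal sketch of how such a ratio and its 95% CI can be computed for a single study is given below. It assumes the standard log-scale (Katz) approximation rather than the meta-analytic pooling used in the review, and the function name and counts are hypothetical, not data from any included study.

```python
import math


def risk_ratio(events_a, contacts_a, events_b, contacts_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale Wald CI.

    events_*: number of secondary transmission events (E)
    contacts_*: number of close contacts followed up (N)
    Requires non-zero event counts in both groups.
    """
    sar_a = events_a / contacts_a  # secondary attack rate, group A
    sar_b = events_b / contacts_b  # secondary attack rate, group B
    rr = sar_a / sar_b
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_a - 1 / contacts_a + 1 / events_b - 1 / contacts_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts for illustration only (not from the review):
# 5 infections among 100 contacts of asymptomatic index cases versus
# 10 infections among 100 contacts of symptomatic index cases.
rr, lower, upper = risk_ratio(5, 100, 10, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With few transmission events the interval is wide and can span 1, as it does for the review's estimate of 0.35 (95% CI 0.1–1.27), which is why the text describes a possible rather than a definite reduction.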
We only included studies that provided information about follow-up through the course of infection, which allowed reliable assessment about the proportion of asymptomatic people in different settings. In the statistical synthesis of proportions, we used a method that accounts for the binary nature of the data and avoids the normality approximation (weighted logistic regression). Limitations of the review are that most included studies were not designed to estimate the proportion of asymptomatic SARS-CoV-2 infection and definitions of asymptomatic status were often incomplete or absent. The risks of bias, particularly those affecting selection of participants, differed between studies and could result in both underestimation and overestimation of the true proportion of asymptomatic infections. Also, we did not consider the possible impact of false negative RT-PCR results, which might be more likely to occur in asymptomatic infections [116] and would underestimate the proportion of asymptomatic infections [117]. The four databases that we searched are not comprehensive, but they cover the majority of publications and we do not believe that we have missed studies that would change our conclusions. Comparison with other reviews We found narrative reviews that reported wide ranges (5%–96%) of infections that might be asymptomatic [1,118]. These reviews presented cross-sectional studies alongside longitudinal studies and did not distinguish between asymptomatic and presymptomatic infection. We found three systematic reviews, which reported similar summary estimates from meta-analysis of studies published up to May [119–121]. In two reviews, authors applied inclusion criteria to reduce the risks of selection bias, with summary estimates of 11% (95% CI 4%–18%, 6 studies) [120] and 15% (95% CI 12%–18%, 9 studies) [121]. Our review includes all these studies, mostly in the categories of aggregated contact or outbreak investigations, with compatible summary estimates (Fig 1). 
We categorised one report [81] with other studies in which a defined population was screened. The summary estimate in the third systematic review (16%, 95% CI 10%–23%, 41 studies) [119] was similar to that of other systematic reviews, despite inclusion of studies with no information about follow-up. In comparison with other reviews, rather than restricting inclusion, we give a comprehensive overview of studies with adequate follow-up, with assessment of risks of bias and exploration of heterogeneity (S2–S7 Figs). The three versions of this review to date have shown how types of evidence change over time, from single-family investigations to large screening studies (S1 Table). Interpretation The findings from systematic reviews, including ours [119–121], do not support the claim that a large majority of SARS-CoV-2 infections are asymptomatic [122]. We estimated that, across all study settings, the proportion of SARS-CoV-2 infections that are asymptomatic throughout the course of infection is 20% (95% CI 17%–25%). The wider prediction interval reflects the heterogeneity between studies and indicates that future studies with similar study designs and in similar settings will estimate a proportion of asymptomatic infections from 3% to 67%. Studies that detect SARS-CoV-2 through screening of defined populations irrespective of infection status at enrolment should be less affected by selection biases. In this group of studies, the estimated proportion of asymptomatic infection was 31% (95% CI 26%–37%, prediction interval 24%–38%). This estimate suggests that other studies might have had an overrepresentation of participants diagnosed because of symptoms, but there were also potential selection biases in screening studies that might have overestimated the proportion of asymptomatic infections. Our knowledge to date is based on data collected during the acute phase of an international public health emergency, mostly for other purposes. 
To estimate the true proportion of asymptomatic SARS-CoV-2 infections, researchers need to design prospective longitudinal studies with clear definitions, methods that minimise selection and measurement biases, and transparent reporting. Serological tests, in combination with virological diagnostic methods, might improve ascertainment of SARS-CoV-2 infection in asymptomatic populations. Prospective documentation of symptom status would be required, and improvements in the performance of serological tests are still needed [123]. Our review adds to information about the relative contributions of asymptomatic and presymptomatic infection to overall SARS-CoV-2 transmission. Since all people infected with SARS-CoV-2 are initially asymptomatic, the proportion that will go on to develop symptoms can be derived by subtraction from the estimated proportion with true asymptomatic infections; from our review, we would estimate this fraction to be 80% (95% CI 75%–83%). Since SARS-CoV-2 can be transmitted a few days before the onset of symptoms [124], presymptomatic transmission likely contributes substantially to overall SARS-CoV-2 epidemics. The analysis of secondary attack rates provides some evidence of lower infectiousness of people with asymptomatic than symptomatic infection (Fig 3) [36,65,66,90,111], but more studies are needed to quantify this association more precisely. If both the proportion and transmissibility of asymptomatic infection are relatively low, people with asymptomatic SARS-CoV-2 infection should account for a smaller proportion of overall transmission than presymptomatic individuals. This is consistent with the findings of the only mathematical modelling study in our review that explored this question [19]. 
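As a worked illustration of the subtraction described in this paragraph, the sketch below converts the estimated asymptomatic proportion and its 95% CI into the complementary proportion expected to develop symptoms. The function name `complement_interval` is hypothetical and not part of the review; the only inputs are the review's own estimate of 20% (95% CI 17%–25%). The interval bounds swap roles: the upper bound of the asymptomatic proportion gives the lower bound of the symptomatic proportion.

```python
def complement_interval(estimate, lower, upper):
    """Return (100 - estimate) with its interval, all in percent.

    Subtracting from 100 reverses the order of the bounds, so the new
    lower bound comes from the old upper bound and vice versa.
    """
    return 100 - estimate, 100 - upper, 100 - lower


# Review estimate: 20% asymptomatic (95% CI 17%-25%).
symptomatic, ci_low, ci_high = complement_interval(20, 17, 25)
print(f"{symptomatic}% (95% CI {ci_low}%-{ci_high}%)")  # 80% (95% CI 75%-83%)
```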
Uncertainties in estimates of the true proportion and the relative infectiousness of asymptomatic SARS-CoV-2 infection and other infection parameters contributed to heterogeneous predictions about the proportion of presymptomatic transmission [20,33,51,63,78,91]. Implications and unanswered questions Integration of evidence from epidemiological, clinical, and laboratory studies will help to clarify the relative infectiousness of asymptomatic SARS-CoV-2 infection. Studies using viral culture as well as RNA detection are needed, since RT-PCR-defined viral loads appear to be broadly similar in asymptomatic and symptomatic people [116,125]. Age might play a role as children appear more likely than adults to have an asymptomatic course of infection (Fig 1) [126]; age was poorly reported in studies included in this review (Table 1). SARS-CoV-2 transmission from people who are either asymptomatic or presymptomatic has implications for prevention. Social distancing measures will need to be sustained at some level because droplet transmission from close contact with people with asymptomatic and presymptomatic infection occurs. Easing of restrictions will, however, only be possible with wide access to testing, contact tracing, and rapid isolation of infected individuals. Quarantine of close contacts is also essential to prevent onward transmission during asymptomatic or presymptomatic periods of those who have become infected. Digital proximity tracing could supplement classical contact tracing to speed up detection of contacts to interrupt transmission during the presymptomatic phase if shown to be effective [19,127]. The findings of this systematic review of publications early in the pandemic suggest that most SARS-CoV-2 infections are not asymptomatic throughout the course of infection. 
The contribution of presymptomatic and asymptomatic infections to overall SARS-CoV-2 transmission means that combination prevention measures, with enhanced hand and respiratory hygiene, testing tracing, and isolation strategies and social distancing, will continue to be needed. Supporting information S1 PRISMA Checklist. https://doi.org/10.1371/journal.pmed.1003346.s001 (DOCX) S1 Text. Search strings. https://doi.org/10.1371/journal.pmed.1003346.s002 (DOCX) S1 Fig. Flowchart. https://doi.org/10.1371/journal.pmed.1003346.s003 (PDF) S2 Fig. Review question 1, forest plot of included studies, by study precision. https://doi.org/10.1371/journal.pmed.1003346.s004 (PDF) S3 Fig. Risk of bias in studies included in review question 1 and review question 2. https://doi.org/10.1371/journal.pmed.1003346.s005 (PDF) S4 Fig. Review question 2, forest plot of included studies, by study precision. https://doi.org/10.1371/journal.pmed.1003346.s006 (PDF) S5 Fig. Review question 1, subgroup analysis comparing studies of hospitalised patients with all other settings. https://doi.org/10.1371/journal.pmed.1003346.s007 (PDF) S6 Fig. Review question 1, sensitivity analysis, omitting studies that were preprints at the time of literature search. https://doi.org/10.1371/journal.pmed.1003346.s008 (PDF) S7 Fig. Assessment of credibility of mathematical modelling studies. https://doi.org/10.1371/journal.pmed.1003346.s009 (PDF) S1 Table. Types of study included in successive versions of the living systematic review, as of 10 June 2020. https://doi.org/10.1371/journal.pmed.1003346.s010 (DOCX) S2 Table. Location of studies contributing data to review questions 1 and 2. https://doi.org/10.1371/journal.pmed.1003346.s011 (DOCX)
Economic influences on population health in the United States: Toward policymaking driven by data and evidence
Venkataramani, Atheendar S.; O’Brien, Rourke; Whitehorn, Gregory L.; Tsai, Alexander C.
doi: 10.1371/journal.pmed.1003319
pmid: 32877406
Summary points The United States is in the midst of a 40-year-long population health crisis. Life expectancy has declined since 2014, an unprecedented event that has followed on the heels of a decades-long slowing in secular gains in longevity in the US relative to peer countries. These adverse population health trends appear to be primarily driven by worsening health among working-age individuals of lower socioeconomic status. A growing body of research suggests that worsening economic outcomes—e.g., fading employment opportunities and increasing economic insecurity—may be a primary causal driver of adverse health trends among low-income and less-educated working-age US residents. Evidence-based public policies to address widening gaps in economic and health outcomes include expanding early childhood health and educational investments, increasing the scope of programs that assist displaced workers in developing new skills and finding new jobs, reinforcing the social safety net, and improving the reach of public health efforts to help moderate the health consequences of adverse economic shocks. Policymakers will also need to consider and rigorously evaluate new approaches, such as basic income grants, investments to direct automation toward complementing rather than replacing the work force, or job guarantee programs. The size and scope of the population health challenges that have arisen with the changing economy highlight the importance of new data sources and evidence-based engagement by policymakers. Introduction Over the last 40 years, the secular increase in longevity in the US has slowed relative to peer countries, a phenomenon capped in more recent years by an unprecedented decline in US life expectancy [1]. These broad trends mask widening disparities in health outcomes by socioeconomic status. 
Stagnant (and, more recently, rising) mortality among working-age adults in lower income brackets who have less formal education, primarily driven by rising drug overdose and suicide death rates, appears to entirely account for the growing gap in health outcomes between the US and other high-income countries [1–4]. During this time, economic outcomes in the same population subgroups have worsened as well. Since the late 1970s, growth in wages and incomes has stagnated for most US residents [5]. The ability of individuals from poor families to achieve upward socioeconomic mobility has fallen considerably [6–8], whereas insecurity in many aspects of life, such as earnings, work, and housing, has risen [8,9]. Consequently, income inequality has increased dramatically over this period, reaching levels not seen since the eve of the 1929–1939 Great Depression [10]. We argue that worsening economic outcomes among low-income and less-educated working-age adults may be a key driver of adverse population health trends in the US. We begin by describing trends in economic outcomes since the late 1970s and their underlying drivers. We then discuss the myriad ways in which these trends have affected population health, focusing primarily on working-age adults. We review existing and new interventions that may jointly improve economic outcomes and population health (or mitigate the adverse health consequences of negative economic shocks). We also highlight the importance of new and more accessible data to better inform policy and the role of political processes in realizing the full potential of evidence-based policymaking. Worsening economic outcomes in the US One of the most striking changes in the US economy has been the divergence, since 1980, in economic outcomes by level of education and relative position in the income distribution. Through the late 1970s, inflation-adjusted (i.e., real) wages grew at similar rates for US residents across the income distribution. 
Thereafter, however, real incomes stagnated for individuals in the bottom 50%, even as they continued to increase for those in the top 10% [5,11]. This divergence is even more stark when stratified by level of education. Over the same 40-year period, real earnings increased for those with a college education but decreased for those with only a high school education or less. At the same time, individuals growing up in poor families have found it increasingly difficult to exit poverty later in life. Economic opportunity, the ability to achieve upward socioeconomic mobility regardless of one’s background, has declined dramatically, particularly for those entering the US labor market in the early 1980s and thereafter [6,7]. Several forces may have contributed to these trends. The disappearance of employment opportunities (e.g., manufacturing jobs) that had previously provided individuals without a college education with a credible path to the middle class has played an outsized role [3,12,13]. These jobs have disappeared, in part, because of technological advances that have allowed firms to automate many tasks previously performed by workers [14] and also because of increases in foreign trade that have led to large declines in employment within industries and areas most exposed to competition with foreign firms [13,15]. The growth of low-wage healthcare industry jobs has buffered some areas against the loss of manufacturing employment opportunities but, overall, has done little to counter the worsening economic outcomes among low-income and less-educated workers that have been caused by automation and trade [16,17]. In addition to these market forces, shifts in public policy have also contributed to rising income inequality and falling social mobility in the US. First, since the late 1970s, federal minimum wage increases have failed to keep up with inflation, and the inflation-adjusted minimum wage today is lower than it was in 1980. 
Recent studies have demonstrated the critical importance of minimum wage increases in sustaining incomes of low-wage workers, with attendant consequences for the shape of the overall income distribution [18,19]. Second, the deregulation of the financial industry has contributed to increasing “financialization” of the economy, which has both increased the relative importance of financial firms—and the wages of those employed within them—and exposed small businesses and individuals to greater financial risk [20]. The consequences of financialization are illustrated by the substantial declines in wealth among low- and middle-income households after the post-2007 Great Recession, compared with increasing wealth among higher-income households [21]. Third, emerging evidence suggests that deunionization has reduced the bargaining power of workers without college degrees, thereby undermining union-associated earnings premia and worsening income inequality [22,23]. Fourth, further entrenchment of structural racism has contributed to economic inequality by eroding economic outcomes among historically minoritized populations [24]. For example, after a period of narrowing, the gap in earnings between Black and White Americans has grown since the late 1970s [25]. Shifts in policing, criminal justice, and sentencing policies resulting in mass incarceration have contributed to this rise in economic inequality [26,27]. In addition, the persistence of racial segregation [28–30]—and the inability of local, state, and federal governments to mount definitive policy responses to effectively eliminate it [31]—has undermined the resilience of Black Americans to negative economic shocks [32], while remaining a continuing driver of poverty among Black Americans and Black-White disparities in income and wealth [33]. Policy-driven erosion of the social safety net has compounded the adverse socioeconomic consequences of the changing economy on low-income and less-educated individuals [34]. 
For example, cuts to cash welfare programs under both Republican and Democratic presidential administrations have reduced the share of poor families receiving benefits during times of need, from 4 in 5 families accessing welfare in 1980 to 1 in 5 in the present day [35]. The combination of falling wages, fading economic opportunities, and a shrinking safety net has introduced increasing precarity in the lives of the poor, a state marked by income insecurity and unpredictability, disengagement from social and economic structures, decreasing resilience to compounding social and economic shocks, and the need to make hard choices around basic needs [34,36–39].

Economic outcomes and population health

Theoretical mechanisms linking economic conditions and population health

Worsening economic outcomes for less-educated and low-income individuals can influence health through several channels [2,40,41] (Fig 1). Falling incomes can reduce access to basic material resources (e.g., stable housing, food, health insurance, and healthcare) needed to ensure good health [41,42]. Worsening economic outcomes may also increase exposure to stressors such as poor environmental conditions (e.g., worse air pollution) [43,44]. Increasing economic insecurity and precarity may directly harm health through increasing biological and psychosocial stress [40,45,46].

Fig 1. Drivers of economic outcomes and consequences for population health. The figure summarizes the key relationships between economic outcomes (and their underlying drivers) and health. For clarity, the figure focuses only on drivers and pathways discussed in this review. In doing so, we do not directly specify some relationships that are likely important for population health, such as the direct effects of public policy (e.g., Medicaid expansions) and structural racism on health outcomes. https://doi.org/10.1371/journal.pmed.1003319.g001

The diverging fortunes between the “haves” and “have-nots” may change or constrain the way individuals think and plan for the future, leading to underinvestment in behaviors that may improve health and economic outcomes. For example, worsening economic outcomes can diminish one’s expectations for a better future, which can undermine individuals’ motivations to engage in health-promoting behaviors [47–50]. In addition, economic insecurity may reduce the mental bandwidth needed to make productive, future-oriented economic decisions, reducing individuals’ ability to exit their current economic circumstances and thereby contributing to their worsening health [51,52].

Empirical evidence linking worsening economic outcomes to population health

Descriptive evidence on trends in life expectancy and mortality by socioeconomic status implicates changing economic conditions as a key driver of worsening population health in the US. Most starkly, worsening health outcomes in the past 40 years have been concentrated among working-age adults in the bottom of the income distribution and among those with relatively low levels of formal education (i.e., high school or less). These are the very groups that have increasingly fallen behind in economic outcomes since the 1980s [1,3]. For example, a recent study using data from 1.4 billion individual-level tax records linked to administrative data on mortality found widening income-based mortality gaps from 1999 to 2014 [4]. Life expectancy increased by approximately 2–3 years over the 15-year study period for men and women with pre-tax incomes in the top 5% nationally. By contrast, life expectancies for men and women with incomes in the bottom 5% remained virtually unchanged.
The growth in the mortality gap by level of education has been even more striking, with several studies demonstrating increases in mortality (or decreases in life expectancy) for adults in the bottom of the US education distribution [3,53–56]. Growing mortality gaps by education remain [57], even after accounting for bias from growing selection into lower levels of education over time [58].

Differential experiences across geographic areas are also consistent with economic factors playing a critical role in driving population health outcomes. This phenomenon can be revealed by even cursory inspection: for example, age-adjusted mortality rates in counties in different income and education deciles tracked closely with each other until the 1980s, after which the richest and more highly educated counties experienced much larger declines in death rates (Fig 2). Excess all-cause and drug overdose mortality relative to historical trends has increased in states where employment opportunities for less-educated workers have fallen the most in recent decades [1]. Areas with lower economic opportunity (operationalized as county-level differences in rates of upward mobility for individuals born into poorer families) tend to have higher mortality and morbidity [49,59,60]. Similarly, an extensive literature has demonstrated associations between rising income inequality and worsening health [61,62] (although other studies have challenged these findings [63]). Socioeconomic gaps in longevity even vary within geographic areas, with gaps growing markedly in some areas (e.g., the South and industrial Midwest) relative to others [4,54]. Metropolitan areas with higher proportions of college-educated individuals, lower unemployment, richer tax bases, and higher social mobility also tend to have narrower income-based gaps in health outcomes [4,64].

Fig 2. Age-adjusted mortality rates for working-age adults (ages 25–64 years) by county-of-residence per-capita income and education deciles, 1970–2016. County-level age-adjusted, all-cause mortality rates for adults ages 25–64 years were obtained from the US Centers for Disease Control and Prevention WONDER database. Data on county-level per-capita incomes and share of individuals with some college education or greater were obtained from US Decennial Census and ACS data made available through Social Explorer. We assigned counties into deciles of the per-capita income and education variables by each census-ACS year (so a given county could be grouped into different deciles in different years). This operationalization is particularly important for level of education, as focusing on fixed education groups (e.g., high school or less) may result in selection bias [58]. Population-weighted average mortality rates were then calculated for each socioeconomic decile-year. We then plotted these average mortality rates for the bottom (1st), middle (5th), and top (10th) decile of each socioeconomic variable. After the 1980s, the beginning of which is marked by the vertical red lines, income and education disparities began widening. ACS, American Community Survey; WONDER, Wide-Ranging Online Data for Epidemiologic Research. https://doi.org/10.1371/journal.pmed.1003319.g002

Studies examining the health consequences of specific economic insecurities among low-income and less-educated adults provide still more supporting evidence. Many of these studies have the important advantage of leveraging “natural experiments,” which are research designs that use sudden unanticipated events or shifts in policy to account for unmeasured factors that may bias estimates of the associations between economic outcomes and health [65]. Such designs allow researchers to gain more purchase in assessing causal relationships [66] and in disentangling age-, period-, and cohort-based explanations [67,68].
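As a brief methodological aside, the grouping described in the Fig 2 legend (counties re-ranked within each census-ACS year, followed by population-weighted averaging of mortality within each decile-year) can be illustrated with a minimal Python sketch. This is not the authors' code. The record fields (`year`, `population`, `mortality_rate`, `per_capita_income`), the rank-based decile rule, the tie handling, and the example values are all assumptions made purely for illustration.

```python
from collections import defaultdict


def assign_deciles(values):
    """Rank values and map each to a decile from 1 (lowest) to 10 (highest).

    Ties are broken by input order, an arbitrary choice made for this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def decile_year_mortality(records, ses_key):
    """Population-weighted mean mortality rate for each (year, decile) cell.

    Counties are re-ranked within every year, so a given county can fall
    into different deciles in different years.
    """
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["year"]].append(rec)

    sums = defaultdict(float)  # (year, decile) -> sum of rate * population
    pops = defaultdict(float)  # (year, decile) -> total population
    for year, rows in by_year.items():
        deciles = assign_deciles([row[ses_key] for row in rows])
        for row, decile in zip(rows, deciles):
            sums[(year, decile)] += row["mortality_rate"] * row["population"]
            pops[(year, decile)] += row["population"]
    return {cell: sums[cell] / pops[cell] for cell in sums}


# Made-up records for 20 hypothetical counties in a single year.
example = [
    {
        "year": 1980,
        "per_capita_income": 10_000 + 500 * i,
        "population": 100 if i % 2 == 0 else 300,
        "mortality_rate": 900.0 - 20.0 * i,
    }
    for i in range(20)
]
rates = decile_year_mortality(example, "per_capita_income")
bottom, middle, top = (rates[(1980, d)] for d in (1, 5, 10))
```

Because deciles are recomputed from the cross-section of counties in each year, the comparison tracks relative socioeconomic position rather than a fixed set of counties, which is the property the legend highlights for the education variable.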
Recent studies have used policy-driven increases in local exposure to foreign trade to investigate the effects of fading economic opportunities in sectors such as manufacturing, finding large impacts on drug overdose mortality [69,70]. Other studies have demonstrated worsening mental health and increases in overall and drug overdose mortality after manufacturing plant closures [71–73]. Falling participation in labor unions has been linked to rising mortality rates from suicide and drug overdose [74]. Structural racism (as manifested by racial gaps in economic opportunity, racial gaps in wealth, racial segregation, and risk of incarceration) has also been negatively associated with a range of health outcomes, particularly for Black Americans [60,75–77].

A growing literature has also investigated the mechanisms that may plausibly link worsening economic outcomes to health. Demonstrating that key hypothesized mediators influence health provides even greater confidence that economic changes in the past 40 years have adversely affected health. Several studies have examined the health consequences of increasing precarity: food, housing, and economic insecurity, for example, have been robustly linked to a range of health outcomes, including mortality [78–80]. A number of studies have demonstrated negative effects of large economic shocks (e.g., financial hardship during the Great Recession or sudden, unexpected loss of wealth) on physiological stress as measured by cortisol, blood pressure, and blood glucose [46,81], and on premature death [82].

Our review of the evidence thus far has focused primarily on short- and medium-run impacts of economic shifts on the working-age individuals who are directly affected. However, worsening economic conditions may also negatively affect the health of those exposed to these changes early in life, prior to entering the labor market.
Young adults entering the US job market during economic downturns are more likely to earn lower lifetime wages, engage in health risk behaviors such as smoking and excessive alcohol use, and, by their late 30s, have higher mortality risk, compared with their counterparts entering the labor market during better times [83,84]. Similarly, income inequality and fading economic opportunities have been linked to risky health behaviors and lower educational investments among adolescents, both of which influence health later in adulthood [85,86]. A large, robust literature has found strong links between exposure to economic insecurity in early childhood and a range of adult health outcomes [87–89].

Social and economic policies to address population health challenges

Evidence to date

Existing research highlights a number of social and economic policies that may jointly improve economic and health outcomes. Intervening in infancy and childhood appears to be most effective in achieving these objectives, and consequently, such interventions tend to have the highest social returns on investment [90]. The collective impacts of early investments made to date may explain why socioeconomic gaps in mortality rates for children, adolescents, and young adults have actually narrowed over the same period during which they have widened for working-age adults [91]. Evidence from randomized and natural experiments demonstrates that investments made before the age of 5 can lead to improved economic and health outcomes in adulthood [92–94]. Policies that provide early-life access to high-quality preschool and healthcare (e.g., through expansions of the Medicaid program) and reduce early exposure to pollution have also been shown to improve educational attainment, income, and health in adulthood [95–100]. Investments made later in childhood, adolescence, and young adulthood can also have large impacts on economic and health outcomes.
Policies that enable children to grow up in better neighborhoods may be particularly transformative: children under the age of 13 whose families were randomly assigned to receive vouchers to move to low-poverty neighborhoods as part of the US Department of Housing and Urban Development’s Moving to Opportunity (MTO) experiment in the late 1990s were more likely to attend and complete college and achieve incomes that were over 30% higher, compared with children whose families did not receive vouchers [101]. Researchers strongly agree on the value of increasing access to high-quality education, with reinforcing investments made throughout the K–12 years appearing to have the greatest impact in reducing economic inequality [102]. In an economy where a 4-year college education increasingly demarcates economic success from economic failure, it will be important to ensure access to high-quality colleges and universities [93,94]. The positive long-run health consequences of investing in education (particularly college education) have been noted in a number of studies [53,84,92,103]. It will also be important to bolster opportunities for those for whom a 4-year college degree may not be a good fit, for example, through increasing access to high-return vocational education [104].

There is comparatively less evidence to inform the deployment of policy interventions aimed at working-age adults to address worsening economic outcomes and population health. The evidence that does exist, however, suggests that the right approach will likely involve bundling multiple strategies. There is emerging evidence that minimum wage increases and the Earned Income Tax Credit result in fewer deaths from suicide among individuals with lower levels of education [105–107]. Policies and programs that assist workers displaced by automation and trade to acquire new skills may also play an important role.
More generous unemployment insurance benefits have been found to mitigate the impact of economic downturns on suicide mortality [108] and overall self-reported health [109]. A recent study of the US Trade Adjustment Assistance Program, which aims to retrain workers who lose their jobs in industries affected by foreign trade, demonstrates large positive effects on employment and income [110]. Studies examining the consequences of similar policies in Europe also find important positive health impacts [111]. Recent evidence also suggests that adults moving to better neighborhoods as part of the MTO experiment achieved better health outcomes, despite little change in their economic outcomes [112]. At present, programs that foster geographic mobility and financial support to obtain new skills after job displacement remain relatively small in scope. This early evidence suggests value in scale-up and further testing. For historically marginalized population groups such as Black Americans, improving health outcomes may additionally require supplementing universal economic and social policies with specific interventions to reduce long-standing, historically patterned barriers to social mobility. Longevity gaps between Black and White Americans persist and closely track with similar gaps in economic opportunity [60]. The positive impacts of Civil Rights–era policies on short- and long-run labor market outcomes and health [113,114] should motivate investment in a broad set of next-generation policies that include reforming law enforcement, the judicial system, and corrections; enhancing access to and affordability of high-quality education; implementing interventions aimed at narrowing the racial wealth gap; and reducing discrimination in labor markets [115–118]. It will also be important to invest in social policies that break the link between worsening economic outcomes and worsening health. 
A growing body of research demonstrates that expansions of state Medicaid programs have both reduced debt-driven economic insecurity among low-income adults and improved health outcomes [119–121]. Bolstering other safety net programs (e.g., Temporary Assistance for Needy Families, the Supplemental Nutrition Assistance Program) and expanding access to federal housing programs can help reduce food and housing insecurity during hard times [122]. These efforts should explicitly address administrative features that introduce inconvenience and stigma and may thereby discourage participation even in generous social programs [123]. Additionally, there is growing evidence that eviction and foreclosure prevention policies may have an important role to play in severing the link between adverse economic shocks and semipermanent declines in health and well-being [80,124,125].

Future policy frontiers

The size and scope of population health declines among low-income and less-educated working-age adults also call for identifying, developing, and testing new intervention approaches. The threat of continued job displacement from automation has sparked interest in Universal Basic Income (UBI) cash-transfer programs. Early evidence from universal income programs (including ongoing programs in Alaska and the Eastern Cherokee nation) suggests positive impacts on health and educational attainment (without any evidence that the programs induce exit from the labor force) [126]. These findings reinforce earlier work showing how income supplements can improve health and developmental outcomes among children [127,128]. Large experimental evaluations of newly instituted UBI programs across the world are currently underway.
Researchers and policymakers have also proposed a number of other novel transfer programs, such as automatic stabilizers to help decouple business cycle fluctuations from one’s economic circumstances [129] and reparation policies to close structurally mediated racial gaps in wealth [130]. It will also be important to consider new labor market interventions. For example, ongoing threats to economic opportunities due to automation could motivate policies that direct investment toward artificial intelligence that complements, rather than replaces, workers [131]. Job guarantee programs, which seek to provide voluntary public-sector work opportunities to those in need, have also been suggested as a potential policy option [132], particularly given positive early evidence from India [133]. Innovations in public health and healthcare systems—specifically those that help buffer population health against threats from adverse economic shocks—should also be explored. For example, there is growing interest in addressing social determinants of health among state Medicaid programs and hospital systems. Specific pilot programs (such as providing housing, community health worker programs, or interventions to increase access to other social programs) have been shown to be effective in this regard [134–137], though the evidence base as a whole remains sparse. In addition, heightened screening and surveillance in areas facing economic downturns, empowering healthcare practitioners to identify social drivers of health and refer patients to local resources, and fostering partnerships between community-based organizations and health systems may represent a comprehensive and cohesive action plan to mitigate the negative consequences of worsening economic outcomes [138–141]. 
Implementation and scale-up of these interventions may require redesigning financial incentives in healthcare in ways that direct investment toward addressing socioeconomic determinants of health, e.g., alternative payment models or financial partnerships between healthcare and social sector organizations to address key social issues affecting health and economic outcomes within communities [141,142]. It will also be important to ensure programs implemented by healthcare organizations do not crowd out or replace similar, but higher-return, efforts by other organizations (e.g., public health or housing departments). Evidence to date Existing research highlights a number of social and economic policies that may jointly improve economic and health outcomes. Intervening in infancy and childhood appears to be most effective in achieving these objectives, and consequently, such interventions tend to have the highest social returns on investment [90]. The collective impacts of early investments made to date may explain why socioeconomic gaps in mortality rates for children, adolescents, and young adults have actually narrowed over the same period during which they have widened for working-age adults [91]. Evidence from randomized and natural experiments demonstrates that investments made before the age of 5 can lead to improved economic and health outcomes in adulthood [92–94]. Policies that provide early-life access to high-quality preschool and healthcare (e.g., through expansions of the Medicaid program) and reduce early exposure to pollution have also been shown to improve educational attainment, income, and health in adulthood [95–100]. Investments made later in childhood, adolescence, and young adulthood can also have large impacts on economic and health outcomes. 
Policies that enable children to grow up in better neighborhoods may be particularly transformative: children under the age of 13 whose families were randomly assigned to receive vouchers to move to low-poverty neighborhoods as part of the US Housing and Urban Development’s Moving to Opportunity (MTO) experiment in the late 1990s were more likely to attend and complete college and achieve incomes that were over 30% higher, compared with children whose families did not receive vouchers [101]. Researchers strongly agree on the value of increasing access to high-quality education, with reinforcing investments made throughout the K–12 years appearing to have the greatest impact in reducing economic inequality [102]. In an economy where a 4-year college education increasingly demarcates economic success from economic failure, it will be important to ensure access to high-quality colleges and universities [93,94]. The positive long-run health consequences from investing in education—particularly, college education—have been noted in a number of studies [53,84,92,103]. It will also be important to bolster opportunities for those for whom a 4-year college degree may not be a good fit, for example, through increasing access to high-return vocational education [104]. There is comparatively less evidence to inform the deployment of policy interventions aimed at working-age adults to address worsening economic outcomes and population health. Whatever evidence exists, however, suggests that the right approach will likely involve bundling multiple strategies. There is emerging evidence that minimum wage increases and the Earned Income Tax Credit result in fewer deaths from suicide among individuals with lower levels of education [105–107]. Policies and programs that assist workers displaced by automation and trade to acquire new skills may also play an important role. 
More generous unemployment insurance benefits have been found to mitigate the impact of economic downturns on suicide mortality [108] and overall self-reported health [109]. A recent study of the US Trade Adjustment Assistance Program, which aims to retrain workers who lose their jobs in industries affected by foreign trade, demonstrates large positive effects on employment and income [110]. Studies examining the consequences of similar policies in Europe also find important positive health impacts [111]. Recent evidence also suggests that adults moving to better neighborhoods as part of the MTO experiment achieved better health outcomes, despite little change in their economic outcomes [112]. At present, programs that foster geographic mobility and financial support to obtain new skills after job displacement remain relatively small in scope. This early evidence suggests value in scale-up and further testing. For historically marginalized population groups such as Black Americans, improving health outcomes may additionally require supplementing universal economic and social policies with specific interventions to reduce long-standing, historically patterned barriers to social mobility. Longevity gaps between Black and White Americans persist and closely track with similar gaps in economic opportunity [60]. The positive impacts of Civil Rights–era policies on short- and long-run labor market outcomes and health [113,114] should motivate investment in a broad set of next-generation policies that include reforming law enforcement, the judicial system, and corrections; enhancing access to and affordability of high-quality education; implementing interventions aimed at narrowing the racial wealth gap; and reducing discrimination in labor markets [115–118]. It will also be important to invest in social policies that break the link between worsening economic outcomes and worsening health. 
A growing body of research demonstrates that expansions of state Medicaid programs have both reduced debt-driven economic insecurity among low-income adults and improved health outcomes [119–121]. Bolstering other safety net programs (e.g., Temporary Assistance for Needy Families and the Supplemental Nutrition Assistance Program) and expanding access to federal housing programs can help reduce food and housing insecurity during hard times [122]. These efforts should explicitly address administrative features that may introduce inconveniences and stigma and thereby discourage participation even in generous social programs [123]. Additionally, there is growing evidence that eviction and foreclosure prevention policies may have an important role to play in severing the link between adverse economic shocks and semipermanent declines in health and well-being [80,124,125]. Future policy frontiers The size and scope of population health declines among low-income and less-educated working-age adults also call for identifying, developing, and testing new intervention approaches. The threat of continued job displacement from automation has sparked interest in Universal Basic Income (UBI) cash-transfer programs. Early evidence from universal income programs (including ongoing programs in Alaska and the Eastern Cherokee nation) suggests positive impacts on health and educational attainment (without any evidence of the program inducing exit from the labor force) [126]. These findings reinforce earlier work showing how income supplements can improve health and developmental outcomes among children [127,128]. Large experimental evaluations of newly instituted UBI programs across the world are currently underway. 
Researchers and policymakers have also proposed a number of other novel transfer programs, such as automatic stabilizers to help decouple business cycle fluctuations from one’s economic circumstances [129] and reparation policies to close structurally mediated racial gaps in wealth [130]. It will also be important to consider new labor market interventions. For example, ongoing threats to economic opportunities due to automation could motivate policies that direct investment toward artificial intelligence that complements, rather than replaces, workers [131]. Job guarantee programs, which seek to provide voluntary public-sector work opportunities to those in need, have also been suggested as a potential policy option [132], particularly given positive early evidence from India [133]. Innovations in public health and healthcare systems—specifically those that help buffer population health against threats from adverse economic shocks—should also be explored. For example, there is growing interest in addressing social determinants of health among state Medicaid programs and hospital systems. Specific pilot programs (such as providing housing, community health worker programs, or interventions to increase access to other social programs) have been shown to be effective in this regard [134–137], though the evidence base as a whole remains sparse. In addition, heightened screening and surveillance in areas facing economic downturns, empowering healthcare practitioners to identify social drivers of health and refer patients to local resources, and fostering partnerships between community-based organizations and health systems may represent a comprehensive and cohesive action plan to mitigate the negative consequences of worsening economic outcomes [138–141]. 
Implementation and scale-up of these interventions may require redesigning financial incentives in healthcare in ways that direct investment toward addressing socioeconomic determinants of health, e.g., alternative payment models or financial partnerships between healthcare and social sector organizations to address key social issues affecting health and economic outcomes within communities [141,142]. It will also be important to ensure programs implemented by healthcare organizations do not crowd out or replace similar, but higher-return, efforts by other organizations (e.g., public health or housing departments). Investing in new data and enhancing data access to inform policy A detailed understanding of how the changing economy has affected population health has been made possible by newly available large administrative databases and innovations in statistical techniques. New insights on the changing relationship between income and health have come from combining data from tax databases with mortality data from Social Security death records [4]. Evidence on the impacts of the US Trade Adjustment Assistance program on the incomes of workers comes from unprecedented linkages between US Census data and program administrative data [110]. Landmark contributions on the impacts of environmental policies on health and economic outcomes result from combining individual- and firm-level employment and wage data collected by states with detailed local-area data on air pollution [98]. Researchers studying the impacts of specific social and economic policies on a range of health outcomes have creatively combined hand-collected information on the timing of policy adoption with data on health outcomes from publicly available data from long-standing surveys, such as the Behavioral Risk Factor Surveillance Study and National Health Interview Survey, or vital statistics data [72,86,106,143]. 
Although many of these novel datasets have been gathered or built by individual research groups, there is an increasing trend toward data sharing, further catalyzing research on the economic determinants of health. For example, researchers releasing deidentified, aggregate versions of their tax-record data have enabled other scholars to identify new insights on the relationship between economic opportunity and health [50,64,144,145]. The growth of federally and privately funded “data aggregators” (e.g., the Integrated Public Use Microdata Series [IPUMS] or Inter-university Consortium for Political and Social Research [ICPSR]) has also been important for researchers studying the economy and health. Data aggregators serve a variety of functions, including enabling easy access to cleaned and harmonized census, health, and economic survey data; collating and disseminating data used in other academic papers; and collecting vetted information on policy implementation across a variety of domains. State and federal government databases are becoming increasingly more complete and easier to access through public-facing web portals, with prime examples being the US Centers for Disease Control Wide-Ranging Online Data for Epidemiologic Research (WONDER) database for vital statistics and the US Bureau of Labor Statistics databases for unemployment and income data. Despite these encouraging trends, researchers and policymakers still face significant challenges obtaining the data needed to best inform policymaking. Large, annual sample surveys primarily collect detailed information on health outcomes or economic outcomes, but not both, making it difficult to study the nuances of how economic trends shape various dimensions of health in different populations. Linking large-scale, individual-level federal economic datasets to health datasets such as death records or medical claims remains a costly, time-consuming process for most researchers. 
As a result, researchers studying the health consequences of adverse economic trends have had to rely on aggregate data, which are prone to their own biases (e.g., researchers cannot reliably account for nonrandom migration) and make it difficult to study heterogeneity across population groups. Moreover, most of these data are released 1–2 years after collection, making it difficult to provide real-time information on key questions of interest. The increasing availability of proprietary private data sources and high-frequency, crowd-sourced data has helped address some, but unfortunately not all, of these constraints. Consequently, new policies to increase access to existing large-scale administrative and survey data are needed to maximize our understanding of which policies are most effective in addressing the twin challenges of worsening economic outcomes and population health in the US. Steps to reduce administrative and financial barriers to accessing and linking state and federal data resources will be critical. In this respect, state and federal agencies could follow the lead of the National Institutes of Health (NIH), which recently made data linkages with the death registration system free for NIH-supported researchers, streamlined the process to apply for these data, and increased the frequency at which these data were updated. Insights from the development and management of wide-ranging linked registries, such as those in Sweden or Denmark, could also provide a useful model. Agencies can also consider making some currently restricted-access data (e.g., vital statistics for counties with small numbers of births or deaths) public by adopting newly developed techniques to add noise to these measures in a manner that reduces loss of privacy while maintaining statistical fidelity [146]. Evidence-based policy and politics The full potential of data-driven policymaking to improve economic and health outcomes cannot be realized without buy-in from policymakers. 
There needs to be consensus and political will among policymakers around investing in data and acting on evidence. Translating evidence into policy will require a detailed understanding of the political considerations, policymaker values, and decision-making processes that influence policy adoption [147,148]. Evidence may be valued differently by different actors and in different circumstances [149–151]. Wherever it is possible to do so, collaboration between researchers and policymakers to incorporate data collection efforts and rigorous evaluation designs as part of the policy implementation process [151,152] will be important to help build consensus around the value of data and evidence-based policymaking. In addition, it will be important to elucidate the complex feedback loops between health, economic outcomes, and voting patterns that will ultimately dictate which policies are adopted [147]. Exposure to international trade, which has been tied to worse economic outcomes and deteriorating health among working-class individuals [15,70], has led to increased support for presidential candidates whose policy agendas have often run counter to emerging evidence on how to improve economic well-being and population health [153]. These dynamics were particularly salient in the 2016 presidential election, in which areas with worsening health outcomes saw increased support for Donald Trump [154,155], a candidate who campaigned, for example, on deconstructing the Affordable Care Act and reducing environmental regulation. On the other hand, expansion of the Medicaid program, which has been shown to improve both health and economic outcomes among low-income US residents [99,119,120], was associated with increased support for the program [156] and greater voter turnout [157]. 
Incorporating the study of such “political feedback loops” into analyses of the effects of social and economic policies will thus be important to fully elucidate their long-run economic and health consequences. Conclusion Population health in the US stands at a critical juncture. Growing evidence suggests that fading economic opportunities and rising economic insecurity have played an important role in the deteriorating health outcomes and rising mortality rates experienced by working-age individuals. These trends may be further exacerbated by the SARS-CoV-2 pandemic, whose adverse economic consequences—and wide-ranging direct and indirect negative effects on a variety of health outcomes—are expected to most heavily affect low-income individuals. Despite these troubling trends, there are reasons for optimism. New data have enabled researchers and policymakers to better delineate emerging population health challenges and identify policies that can reduce growing health and economic inequality. Further investments in new data and ensuring better access to, and linkages between, existing databases will be necessary to fully realize the potential of evidence-based social and economic policy. However, data alone will not be sufficient. Policymaker consensus around the value of data—and the political will to act on it—will be critical for translating evidence into improvements in population health.
STrengthening the Reporting Of Pharmacogenetic Studies: Development of the STROPS guideline
Chaplin, Marty; Kirkham, Jamie J.; Dwan, Kerry; Sloan, Derek J.; Davies, Geraint; Jorgensen, Andrea L.
doi: 10.1371/journal.pmed.1003344
pmid: 32956352
Background Large sample sizes are often required to detect statistically significant associations between pharmacogenetic markers and treatment response. Meta-analysis may be performed to synthesize data from several studies, increasing sample size and, consequently, power to detect significant genetic effects. However, performing robust synthesis of data from pharmacogenetic studies is often challenging because of poor reporting of key data in study reports. There is currently no guideline for the reporting of pharmacogenetic studies that has been developed using a widely accepted robust methodology. The objective of this project was to develop the STrengthening the Reporting Of Pharmacogenetic Studies (STROPS) guideline. Methods and findings We established a preliminary checklist of reporting items to be considered for inclusion in the guideline. We invited representatives of key stakeholder groups to participate in a 2-round Delphi survey. A total of 52 individuals participated in both rounds of the survey, scoring items with regard to their importance for inclusion in the STROPS guideline. We then held a consensus meeting, at which 8 individuals considered the results of the Delphi survey and voted on whether each item ought to be included in the final guideline. The STROPS guideline consists of 54 items and is accompanied by an explanation and elaboration document. The guideline contains items that are particularly important in the field of pharmacogenetics, such as the drug regimen of interest and whether adherence to treatment was accounted for in the conducted analyses. The guideline also requires that outcomes be clearly defined and justified, because in pharmacogenetic studies, there may be a greater number of possible outcomes than in other types of study (for example, disease–gene association studies). A limitation of this project is that our consensus meeting involved a small number of individuals, the majority of whom are based in the United Kingdom. 
Conclusions Our aim is for the STROPS guideline to improve the transparency of reporting of pharmacogenetic studies and also to facilitate the conduct of high-quality systematic reviews and meta-analyses. We encourage authors to adhere to the STROPS guideline when publishing pharmacogenetic studies. Introduction Pharmacogenetic studies investigate associations between genetic variants and treatment response for a particular drug in terms of both efficacy and adverse events. If a significant association between a genetic variant and a treatment response outcome is identified, patients may eventually be genotyped in clinical practice before being prescribed treatment. Healthcare providers may then refer to the genotyping test result when determining whether to prescribe the drug, and if prescribed, the appropriate drug dosage. This approach is known as “personalized medicine”. Outcomes from pharmacogenetic studies are often complex traits; genetic influence may be explained by several genetic variants each having a small effect on outcome. Consequently, large sample sizes are typically required to detect pharmacogenetic associations. Meta-analysis improves sample size and increases power to detect significant associations while also allowing researchers to investigate the possibility that significant associations observed in individual studies may be spurious. However, authors may encounter difficulties when synthesizing evidence from pharmacogenetic studies because of poor reporting of data in study reports. For example, if study authors do not report outcomes for each genotype group separately, it may not be possible for researchers to include this study in meta-analyses. Furthermore, lack of reporting of participants’ ethnicities can also hinder investigations of heterogeneity, which form a key part of any systematic review and/or meta-analysis. 
Genetic associations often vary according to ethnicity; it is, therefore, recommended that meta-analyses are stratified by ethnicity, and pooling of results should only be performed if effect estimates for different ethnic groups appear sufficiently similar [1]. Although reporting guidelines are available for observational studies [2] and genetic association studies [3], to the best of our knowledge, no reporting guideline has been developed using rigorous methodology specifically for pharmacogenetic studies. Pharmacogenetic studies have different characteristics than other types of observational and, indeed, genetic association studies. Although some items from existing guidelines can be applied to pharmacogenetic studies, there are many additional pharmacogenetic-specific characteristics that could be reported; clear guidance on which items are essential to report is needed. In this article, we present results of a research project, the aim of which was to develop a reporting guideline for pharmacogenetic studies (the STrengthening the Reporting Of Pharmacogenetic Studies [STROPS] guideline) and an explanation and elaboration (E+E) document. Our aim is that the STROPS guideline will set a robust standard of reporting for pharmacogenetic studies and will consequently facilitate the conduct of high-quality systematic reviews and meta-analyses, thus improving power to detect pharmacogenetic associations. Methods The protocol outlining the prespecified methods of this project has been published [4]. 
The 6 authors of this article form the steering committee for the project: Marty Chaplin (researcher into meta-analysis of pharmacogenetic studies), Jamie Kirkham (researcher into consensus methodology and developer of reporting guidelines), Kerry Dwan (researcher into systematic review methodology), Derek Sloan (clinical infectious disease researcher), Gerry Davies (clinical pharmacogenetic researcher in infectious diseases), and Andrea Jorgensen (researcher into statistical methods for pharmacogenetics, including evidence synthesis methods). In accordance with methodology proposed by Enhancing the QUAlity and Transparency Of health Research (EQUATOR) [4], we developed the STROPS guideline in the following stages: (1) development of a preliminary checklist, (2) two-round Delphi survey, (3) consensus meeting, and (4) development of the STROPS guideline and accompanying E+E document. Preliminary checklist of reporting items To establish a preliminary checklist of reporting items, we first included items from existing relevant guidelines. We considered all guidelines listed on the EQUATOR website [5] under the clinical area of genetics. Two authors (MC and ALJ) assessed guidelines to be relevant if they were applicable to pharmacogenetics studies. Two authors (MC and ALJ) discussed whether items from these guidelines would ensure transparency of reporting of pharmacogenetic studies and consequently decided whether to include each item in the preliminary checklist. For example, the GRIPS statement [6] includes some items that can be applied to pharmacogenetic studies; however, we did not include all items from this guideline because many items are only relevant to studies in which a genetic risk prediction model is being developed, and these studies are outside the remit of our guideline. We modified some items from existing guidelines; the majority of these modifications were intended to make items more relevant to pharmacogenetic studies. 
Second, we supplemented this list with additional items thought to be important. These items were either suggested by steering committee members based on our own experience in pharmacogenetic research or were drafted by MC and ALJ to cover issues identified by Jorgensen and Williamson [7], which relate specifically to the conduct of pharmacogenetic research. Finally, we drafted help text for each item, to ensure that language used was comprehensible to all Delphi participants. All steering committee members approved this preliminary checklist before the Delphi survey began. Delphi survey Participants. From March 7, 2019, to April 30, 2019, we invited 3 groups of stakeholders to participate in the Delphi survey. Stakeholder groups were chosen to encompass all aspects of pharmacogenetic research. Primary researchers. We asked coordinators of 10 national and international pharmacogenetics networks (UK Pharmacogenetics and Stratified Medicine Network, Pharmacogenomics Research Network [PGRN], Canadian Pharmacogenomics Network for Drug Safety [CPNDS], South East Asian Pharmacogenomics Research Network [SEAPharm], Surveillance and Pharmacogenomics Initiative for Adverse Drug Reactions [SAPhIRE], Brazilian Pharmacogenetics Research Network [REFARGEN], European Society of Pharmacogenomics and Personalised Therapy [ESPT], European Federation for Pharmaceutical Sciences [EUFEPS] Network on Pharmacogenetics and Pharmacogenomics Research, Clinical Pharmacogenetics Implementation Consortium [CPIC], and Ubiquitous Pharmacogenomics [U-PGx]) to forward the survey on to network members. We performed searches using Google to ensure that all major networks across the globe were identified. Systematic reviewers. We identified 89 contact authors of systematic reviews of pharmacogenetics studies by searching PubMed, using the following search terms: “pharmacogenetics,” “pharmacogenomics,” “systematic review,” and “meta-analysis.” An information specialist designed the search strategy. 
We used a snowball technique, asking contact authors to complete the survey and to forward the survey on to their coauthors. Journal editors. We contacted 210 editors-in-chief of 168 journals that may publish pharmacogenetic studies. We used a snowball technique, asking editors-in-chief to participate in the survey and also to forward the survey on to editors at their journal. We performed searches using Google to identify journals using search terms “pharmacogenetics,” “pharmacogenomics,” “precision medicine,” “personalised/personalized medicine,” and “journal.” We also considered journals listed on the “SCImago Journal & Country Rank” website [8] under the category “Genetics.” Design. The Delphi process consisted of 2 rounds of survey, response and feedback. The first-round survey (Round 1, March 27, 2019–May 17, 2019) invited participants to score items from the preliminary list and to submit additional reporting items. The second-round survey (Round 2, May 31, 2019–July 12, 2019) provided feedback from the previous round and invited participants to rescore items. Additional reporting items submitted by participants in Round 1 (and approved by the steering committee) were included for scoring by participants in Round 2. The Delphi survey was conducted using DelphiManager, a web-based system designed by the COMET Initiative (http://www.comet-initiative.org/delphimanager/) to facilitate the building and management of Delphi surveys. Recruitment process. We e-mailed individuals from stakeholder groups with information about the STROPS project and Delphi process and an invitation to complete Round 1 within 3 weeks. We informed invitees that participation was optional and that scoring data would be anonymized; we allocated a unique identification number to each Delphi participant. We sent a reminder e-mail at the end of the second week to prompt completion of the survey. All participants who completed Round 1 were invited to participate in Round 2. 
However, we informed invitees that completing Round 1 did not oblige them to complete Round 2. Ethics statement. The University of Liverpool Ethics Committee confirmed ethical approval for this study in January 2019 (Reference: 3586). We informed invitees to the Delphi survey that we would assume informed consent if an invitee responded to the survey. Participant characteristics. We asked participants to provide their name, e-mail address, and their consent to be acknowledged as a Delphi participant in the published guideline. Delphi scoring and consensus definition. Participants were asked to score each reporting item using a scale of 1–9, with 1–3 labeled “not important for inclusion in the guideline,” 4–6 labeled “important but not critical for inclusion in the guideline,” and 7–9 labeled “critical for inclusion into the guideline” [9]. Participants were also given the option to score a reporting item as “unable to score” if they were unable to offer an opinion on the importance of the item. We considered a stakeholder group to have reached consensus for an item if at least 70% of members of that group scored the item as “critical for inclusion into the guideline.” Delphi Round 1. Reporting items were presented in the order in which they would be addressed in the pharmacogenetic study report and were grouped under relevant headings: title and abstract, introduction, methods, results, discussion, and other information. Participants were asked to score each item as described previously and were also invited to suggest additional items for inclusion in the reporting guideline. For each item, we summarized the number of respondents and the distribution of scores. Participants who scored an item as “unable to score” were excluded from the analysis for that particular item. We felt that this would not adversely affect our conclusions because for most items, the proportion of “unable to score” responses was minimal. 
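The consensus definition and the handling of “unable to score” responses can be illustrated with a short sketch. It assumes scores are stored as integers from 1 to 9, with `None` standing in for “unable to score”; the function name, data layout, and example scores are hypothetical and are not taken from the STROPS project data.

```python
from typing import Optional, Sequence

CRITICAL_MIN = 7        # scores 7-9 are labeled "critical for inclusion"
THRESHOLD_PERCENT = 70  # at least 70% of a stakeholder group


def group_reached_consensus(scores: Sequence[Optional[int]]) -> bool:
    """Return True if at least 70% of scorable responses fall in 7-9.

    None represents an "unable to score" response and is excluded
    from the analysis for the item, as described in the text.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        # Assumption for this sketch: no scorable responses means no consensus.
        return False
    for s in valid:
        if not 1 <= s <= 9:
            raise ValueError(f"score out of range: {s}")
    critical = sum(1 for s in valid if s >= CRITICAL_MIN)
    # Integer arithmetic avoids floating-point issues at exactly 70%.
    return critical * 100 >= THRESHOLD_PERCENT * len(valid)


# Hypothetical scores for a single reporting item, by stakeholder group.
item_scores = {
    "primary researchers": [9, 8, 7, 7, 6, 9, None, 8, 7, 9],
    "systematic reviewers": [7, 5, 6, 8, 4],
    "journal editors": [9, 9, 8, None, 7],
}
consensus = {group: group_reached_consensus(s) for group, s in item_scores.items()}
```

In this illustration, consensus is assessed separately within each stakeholder group, so an item can reach consensus in some groups and not in others.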
Indeed, in Round 1, there was only 1 item for which more than 10% of participants responded “unable to score.” The steering committee reviewed all additional reporting items suggested by participants. If items were not already covered by the existing list, we added these items to the list of reporting items presented in Round 2, or we covered the item as part of the E+E text for existing items. Delphi Round 2. In Round 2, each participant was shown the number of respondents and distribution of scores for each item from Round 1, for each stakeholder group separately. Participants were also reminded how they personally scored each item in Round 1. Participants were asked to consider responses from other Delphi participants and to rescore the items. Additional items identified as part of Round 1 were scored by participants in Round 2. For each item, the number of respondents and the distribution of scores were summarized. Participants who scored an item as “unable to score” were excluded from analysis for that particular item. Once again, for most items, the proportion of “unable to score” responses was minimal; there were 3 items for which more than 10% of participants responded “unable to score.” If participants that did not respond to Round 2 have different opinions to participants from the same stakeholder group who completed both rounds, then Delphi results may be at risk of attrition bias. We investigated this risk by calculating average Round 1 scores for each participant and plotting these scores according to whether participants completed Round 2 or not for each stakeholder group. We visually examined these plots to assess the likelihood of attrition bias. Consensus meeting The steering committee and stakeholder group representatives met to consider the Delphi results and to finalize the list of items for the reporting guideline. The meeting was conducted via conference call (Zoom). 
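The attrition check described for Round 2 (average Round 1 score per participant, compared by Round 2 completion within each stakeholder group) can be sketched as follows. The records, field names, and function name are hypothetical. The authors examined plots visually; this sketch only tabulates the values that such plots would display.

```python
from collections import defaultdict
from statistics import mean


def round1_means_by_completion(participants):
    """Collect each participant's mean Round 1 score, keyed by
    (stakeholder group, completed Round 2)."""
    grouped = defaultdict(list)
    for p in participants:
        # Skip "unable to score" responses, represented here as None.
        scores = [s for s in p["round1_scores"] if s is not None]
        if scores:
            key = (p["group"], p["completed_round2"])
            grouped[key].append(mean(scores))
    return dict(grouped)


# Hypothetical participants; real data would hold one record per respondent.
participants = [
    {"group": "primary researchers", "completed_round2": True,
     "round1_scores": [8, 7, 9]},
    {"group": "primary researchers", "completed_round2": False,
     "round1_scores": [6, None, 8]},
    {"group": "journal editors", "completed_round2": True,
     "round1_scores": [7, 7, 8, 6]},
]

summary = round1_means_by_completion(participants)
for (group, completed), means in sorted(summary.items()):
    status = "completed Round 2" if completed else "Round 1 only"
    print(group, status, means)
```

A marked difference between the two lists for a given stakeholder group would suggest that participants who dropped out held different views from those who completed both rounds.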
We aimed to include 1 or 2 representatives (with at least 1 being non-UK based) from each stakeholder group in the consensus meeting. We invited individuals to the meeting using the following principles: (1) Delphi participants who completed both rounds, (2) a balance across stakeholder groups, and (3) a reasonable geographic spread. If an individual could not attend, they were replaced by an individual from the same stakeholder group. Prior to and during the meeting, attendees were shown a summary of how each stakeholder group scored each reporting item at Round 2 and the number of stakeholder groups who achieved consensus. Attendees discussed each reporting item in turn and decided whether to include the item in the reporting guideline or not. Where necessary, attendees voted using TurningPoint polling software (Turning Technologies, https://www.turningtechnologies.com/turningpoint/); the item was retained if at least 70% of participants voted for its inclusion. Items were considered in the order they were presented in the Delphi survey. Postconsensus meeting development We drafted the initial reporting guideline and E+E document concurrently. The purpose of the E+E document is to provide the rationale for and meaning of each reporting item alongside examples of good reporting practice. We also provided the origin of each reporting item in the E+E document. 
For example, the GRIPS statement [6] includes some items that can be applied to pharmacogenetic studies; however, we did not include all items from this guideline because many items are only relevant to studies in which a genetic risk prediction model is being developed, and these studies are outside the remit of our guideline. We modified some items from existing guidelines; the majority of these modifications were intended to make items more relevant to pharmacogenetic studies. Second, we supplemented this list with additional items thought to be important. These items were either suggested by steering committee members based on our own experience in pharmacogenetic research or were drafted by MC and ALJ to cover issues identified by Jorgensen and Williamson [7], which relate specifically to the conduct of pharmacogenetic research. Finally, we drafted help text for each item, to ensure that language used was comprehensible to all Delphi participants. All steering committee members approved this preliminary checklist before the Delphi survey began. Delphi survey Participants. In March 7, 2019–April 30, 2019, we invited 3 groups of stakeholders to participate in the Delphi survey. Stakeholder groups were chosen to encompass all aspects of pharmacogenetic research. Primary researchers. 
We asked coordinators of 10 national and international pharmacogenetics networks (UK Pharmacogenetics and Stratified Medicine Network, Pharmacogenomics Research Network [PGRN], Canadian Pharmacogenomics Network for Drug Safety [CPNDS], South East Asian Pharmacogenomics Research Network [SEAPharm], Surveillance and Pharmacogenomics Initiative for Adverse Drug Reactions [SAPhIRE], Brazilian Pharmacogenetics Research Network [REFARGEN], European Society of Pharmacogenomics and Personalised Therapy [ESPT], European Federation for Pharmaceutical Sciences [EUFEPS] Network on Pharmacogenetics and Pharmacogenomics Research, and Clinical Pharmacogenetics Implementation Consortium [CPIC] and Ubiquitous Pharmacogenomics [U-PGx]) to forward the survey on to network members. We performed searches using Google to ensure that all major networks across the globe were identified. Systematic reviewers. We identified 89 contact authors of systematic reviews of pharmacogenetics studies by searching PubMed, using the following search terms: “pharmacogenetics,” “pharmacogenomics,” “systematic review,” and “meta-analysis.” An information specialist designed the search strategy. We used a snowball technique, asking contact authors to complete the survey and to forward the survey on to their coauthors. Journal editors. We contacted 210 editors-in-chief of 168 journals that may publish pharmacogenetic studies. We used a snowball technique, asking editors-in-chief to participate in the survey and also to forward the survey on to editors at their journal. We performed searches using Google to identify journals using search terms “pharmacogenetics,” “pharmacogenomics,” “precision medicine,” “personalised/personalized medicine,” and “journal.” We also considered journals listed on the “SCImago Journal & Country Rank” website [8] under the category “Genetics.” Design. The Delphi process consisted of 2 rounds of survey, response and feedback. 
The first-round survey (Round 1, March 27, 2019–May 17, 2019) invited participants to score items from the preliminary list and to submit additional reporting items. The second-round survey (Round 2, May 31, 2019–July 12, 2019) provided feedback from the previous round and invited participants to rescore items. Additional reporting items submitted by participants in Round 1 (and approved by the steering committee) were included for scoring by participants in Round 2. The Delphi survey was conducted using DelphiManager, a web-based system designed by the COMET Initiative (http://www.comet-initiative.org/delphimanager/) to facilitate the building and management of Delphi surveys.

Recruitment process. We e-mailed individuals from stakeholder groups with information about the STROPS project and Delphi process and an invitation to complete Round 1 within 3 weeks. We informed invitees that participation was optional and that scoring data would be anonymized; we allocated a unique identification number to each Delphi participant. We sent a reminder e-mail at the end of the second week to prompt completion of the survey. All participants who completed Round 1 were invited to participate in Round 2. However, we informed invitees that completing Round 1 did not oblige them to complete Round 2.

Ethics statement. The University of Liverpool Ethics Committee confirmed ethical approval for this study in January 2019 (Reference: 3586). We informed invitees to the Delphi survey that we would assume informed consent if an invitee responded to the survey.

Participant characteristics. We asked participants to provide their name, e-mail address, and their consent to be acknowledged as a Delphi participant in the published guideline.

Delphi scoring and consensus definition.
Participants were asked to score each reporting item using a scale of 1–9, with 1–3 labeled “not important for inclusion in the guideline,” 4–6 labeled “important but not critical for inclusion in the guideline,” and 7–9 labeled “critical for inclusion into the guideline” [9]. Participants were also given the option to score a reporting item as “unable to score” if they were unable to offer an opinion on the importance of the item. We considered a stakeholder group to have reached consensus for an item if at least 70% of members of that group scored the item as “critical for inclusion into the guideline.”

Delphi Round 1. Reporting items were presented in the order in which they would be addressed in the pharmacogenetic study report and were grouped under relevant headings: title and abstract, introduction, methods, results, discussion, and other information. Participants were asked to score each item as described previously and were also invited to suggest additional items for inclusion in the reporting guideline. For each item, we summarized the number of respondents and the distribution of scores. Participants who scored an item as “unable to score” were excluded from the analysis for that particular item. We felt that this would not adversely affect our conclusions because, for most items, the proportion of “unable to score” responses was minimal. Indeed, in Round 1, there was only 1 item for which more than 10% of participants responded “unable to score.” The steering committee reviewed all additional reporting items suggested by participants. If items were not already covered by the existing list, we either added them to the list of reporting items presented in Round 2 or covered them as part of the E+E text for existing items.

Delphi Round 2. In Round 2, each participant was shown the number of respondents and distribution of scores for each item from Round 1, for each stakeholder group separately.
Participants were also reminded how they personally scored each item in Round 1. Participants were asked to consider responses from other Delphi participants and to rescore the items. Additional items identified as part of Round 1 were scored by participants in Round 2. For each item, the number of respondents and the distribution of scores were summarized. Participants who scored an item as “unable to score” were excluded from analysis for that particular item. Once again, for most items, the proportion of “unable to score” responses was minimal; there were 3 items for which more than 10% of participants responded “unable to score.” If participants that did not respond to Round 2 have different opinions to participants from the same stakeholder group who completed both rounds, then Delphi results may be at risk of attrition bias. We investigated this risk by calculating average Round 1 scores for each participant and plotting these scores according to whether participants completed Round 2 or not for each stakeholder group. We visually examined these plots to assess the likelihood of attrition bias.

Consensus meeting

The steering committee and stakeholder group representatives met to consider the Delphi results and to finalize the list of items for the reporting guideline. The meeting was conducted via conference call (Zoom). We aimed to include 1 or 2 representatives (with at least 1 being non-UK based) from each stakeholder group in the consensus meeting. We invited individuals to the meeting using the following principles: (1) Delphi participants who completed both rounds, (2) a balance across stakeholder groups, and (3) a reasonable geographic spread. If an individual could not attend, they were replaced by an individual from the same stakeholder group. Prior to and during the meeting, attendees were shown a summary of how each stakeholder group scored each reporting item at Round 2 and the number of stakeholder groups who achieved consensus.
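As an illustration only, the consensus rule defined above (at least 70% of a stakeholder group scoring an item 7–9, with "unable to score" responses excluded) could be tabulated along the following lines. This is a minimal sketch, not the software used in the study: the `(group, item, score)` row layout, the function names, and the toy scores are assumptions made for the example, while the codes 10 ("unable to score") and -9 (not scored) follow the description of S1 Data.

```python
from collections import defaultdict

UNABLE_TO_SCORE = 10  # S1 Data code for "unable to score"
NOT_SCORED = -9       # S1 Data code for an item that was not scored
CRITICAL_MIN = 7      # scores 7-9 are labeled "critical"
THRESHOLD_PCT = 70    # consensus: at least 70% of a group scores the item critical


def consensus_by_group(rows):
    """Summarize (group, item, score) rows; the row layout is hypothetical.

    Returns {item: {group: (n_valid, share_critical, reached_consensus)}}.
    """
    # counts[item][group] holds [n_valid, n_critical]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, item, score in rows:
        if score in (UNABLE_TO_SCORE, NOT_SCORED):
            continue  # excluded from the analysis for this item
        tally = counts[item][group]
        tally[0] += 1
        if score >= CRITICAL_MIN:
            tally[1] += 1

    summary = {}
    for item, groups in counts.items():
        summary[item] = {}
        for group, (n_valid, n_critical) in groups.items():
            # Integer comparison avoids floating-point edge cases at exactly 70%.
            reached = n_critical * 100 >= THRESHOLD_PCT * n_valid
            summary[item][group] = (n_valid, n_critical / n_valid, reached)
    return summary


def groups_reaching_consensus(summary, item):
    """Count the stakeholder groups that reached consensus for one item."""
    return sum(1 for _, _, reached in summary[item].values() if reached)


# Toy scores for illustration only; these are not data from the survey.
toy_rows = [
    ("journal editors", "item 1", 7), ("journal editors", "item 1", 8),
    ("journal editors", "item 1", 9), ("journal editors", "item 1", 5),
    ("journal editors", "item 1", 10),
    ("primary researchers", "item 1", 7), ("primary researchers", "item 1", 6),
    ("primary researchers", "item 1", 6),
    ("systematic reviewers", "item 1", 9), ("systematic reviewers", "item 1", 9),
    ("systematic reviewers", "item 1", -9),
]
toy_summary = consensus_by_group(toy_rows)
print(groups_reaching_consensus(toy_summary, "item 1"))
```

In the toy data, the journal editor who selected "unable to score" is dropped from the denominator, so that group's share is 3 of 4 valid scores rather than 3 of 5.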
Attendees discussed each reporting item in turn and decided whether to include the item in the reporting guideline or not. Where necessary, attendees voted using TurningPoint polling software (Turning Technologies, https://www.turningtechnologies.com/turningpoint/); the item was retained if at least 70% of participants voted for its inclusion. Items were considered in the order in which they were presented in the Delphi survey.

Postconsensus meeting development

We drafted the initial reporting guideline and E+E document concurrently. The purpose of the E+E document is to provide the rationale for and meaning of each reporting item alongside examples of good reporting practice. We also provided the origin of each reporting item in the E+E document.

Results

Delphi survey

In Round 1, participants were asked to score 92 reporting items (the preliminary checklist of items) (S1 Table). The items are labeled 1 to 85, as some items have subitems, e.g., 52a, 52b, 52c. A total of 71 individuals completed this round: 15 journal editors, 41 primary researchers, and 15 systematic reviewers. A total of 10 participants suggested 31 additional reporting items. In addition, during Round 1, Delphi participants notified us of 2 publications containing relevant reporting items. After reviewing additional reporting items suggested by participants and the 2 relevant publications [10, 11], we included 7 additional items in Round 2 (S1 Table); we also covered some suggested reporting items by including additional detail in the E+E text for existing items. A total of 52 individuals scored 99 reporting items in Round 2: 10 journal editors, 31 primary researchers, and 11 systematic reviewers. Anonymized data from both Delphi survey rounds are available in S1 Data. A list of individuals who gave their permission to be listed as participants in the Delphi survey is provided in S1 Document.
As we asked network coordinators, systematic reviewers, and journal editors to contact individuals on our behalf, it is impossible to determine a response rate to Round 1. However, we considered the response received to Round 2 to be reasonable (overall: 52/71, 73%; journal editors: 10/15, 67%; primary researchers: 31/41, 76%; systematic reviewers: 11/15, 73%). Considering the boxplots presented in S2 Document, the distributions of scores were similar between those who completed both rounds of the Delphi survey and those who completed Round 1 only. There was therefore no evidence to suggest that attrition bias occurred.

Consensus meeting

The consensus meeting took place on November 7, 2019, and included 6 steering committee members and 4 representatives of stakeholder groups (1 journal editor, based in Germany; 1 primary researcher, based in Switzerland; 2 systematic reviewers, based in the UK and Spain). Names and affiliations of these representatives are provided in S1 Document. Two steering committee members did not participate in voting (JK chaired and KD took notes), so there were 8 voting individuals in attendance. The consensus matrix (S2 Table) documents how each stakeholder group scored each item at Round 1 and at Round 2 and was provided to attendees prior to the meeting. Consensus meeting slides are provided in S1 Presentation. Decisions made at the consensus meeting are summarized in S3 Table. We decided whether to include or exclude items and whether to combine multiple items under a single item. Where a vote was taken, this is indicated in the table; otherwise, decisions made were based solely on consideration of the Delphi results and discussion.

Postconsensus meeting development

Following the consensus meeting, MC drafted the reporting guideline with guidance from the steering committee.
The following minor amendments were made:

We excluded item 14 and item 49; while searching for examples for these items, we found very few pharmacogenetic studies that used a matched cohort design or a cross-sectional design with a complex sampling strategy; these items would therefore be irrelevant to the vast majority of guideline users.

We removed “Identify variables likely to be associated with population stratification (confounding by ethnic origin)” from item 22, because this is covered by item 54.

We added more terms (“major,” “reference,” “risk,” and “effect”) that might be used to describe alleles to item 27.

Although we decided to cover item 55 by adding to the E+E text for item 34 at the consensus meeting, the steering committee subsequently agreed that relatedness of participants is a separate issue to genotype quality control. We decided to keep item 55 as a standalone item in the guideline.

We introduced a new subitem to item 42 to cover confounding and made item 42 a generic introduction to the statistical methods subitems.

We modified item 68 to indicate that average and/or total follow-up time is sufficient.

Although we voted to exclude item 80 from the “Other analyses” section of the reporting guideline at the consensus meeting, the intention was to consider this item under the “Databases” section. However, time constraints meant that we did not discuss this item again. The steering committee subsequently agreed that this item relates to additional results, rather than individual patient data in databases. We decided to keep the item in its original position and to add “i.e. in supplementary materials” so that the meaning of the item is clear.

The resulting draft guideline was circulated to all consensus meeting attendees in March 2020. All comments and revisions were taken into consideration, and the checklist was revised accordingly.

STROPS guideline

In Table 1, we report the STROPS guideline. The accompanying E+E document is provided in S3 Document.
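The attrition check reported in the Delphi survey results (an average Round 1 score per participant, compared between those who did and did not complete Round 2 within each stakeholder group) can be sketched as follows. This is a hedged illustration rather than the analysis code behind S2 Document: the input dictionaries, function names, and toy values are assumptions, and the grouped lists it returns would still need to be passed to a plotting tool to produce boxplots.

```python
from collections import defaultdict
from statistics import mean

EXCLUDED_CODES = {-9, 10}  # S1 Data codes: not scored / "unable to score"


def mean_round1_scores(round1_scores):
    """Map {participant_id: [Round 1 scores]} to {participant_id: mean valid score}."""
    means = {}
    for pid, scores in round1_scores.items():
        valid = [s for s in scores if s not in EXCLUDED_CODES]
        if valid:  # skip participants with no usable scores
            means[pid] = mean(valid)
    return means


def group_by_completion(means, group_of, completed_round2):
    """Return {(stakeholder_group, completed_round_2): [participant means]}.

    group_of maps participant_id -> stakeholder group (hypothetical input);
    completed_round2 is the set of participant ids who completed Round 2.
    """
    grouped = defaultdict(list)
    for pid, participant_mean in means.items():
        grouped[(group_of[pid], pid in completed_round2)].append(participant_mean)
    return dict(grouped)


# Toy values for illustration only; these are not data from the survey.
toy_round1 = {1: [7, 8, 9], 2: [4, 10, 6], 3: [9, 9, -9]}
toy_group_of = {1: "journal editor", 2: "journal editor", 3: "systematic reviewer"}
toy_completed = {1, 3}

toy_grouped = group_by_completion(
    mean_round1_scores(toy_round1), toy_group_of, toy_completed
)
for key, values in sorted(toy_grouped.items()):
    print(key, values)
```

Each list in the result corresponds to one box in a plot of Round 1 averages by stakeholder group and Round 2 completion status, which is the comparison that was examined visually.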
Table 1. STROPS reporting guideline. https://doi.org/10.1371/journal.pmed.1003344.t001

Discussion

The objective of this project was to develop the STROPS guideline. We used rigorous methodology for the development of reporting guidelines proposed by EQUATOR [4], including a 2-round Delphi survey and consensus meeting, both of which involved representatives of 3 key stakeholder groups. The final guideline consists of 54 items, 17 of which are novel items that have not been included in any existing guideline.
A further 14 items originate from existing guidelines but have been modified for this pharmacogenetic-specific guideline. We encourage pharmacogenetic researchers to adhere to the STROPS guideline to ensure the transparency and completeness of their study reports. Because of a lack of funding to cover travel and accommodation costs, we were unable to arrange a face-to-face consensus meeting as recommended by EQUATOR [4]. Our meeting was conducted via conference call, and the majority of meeting attendees were UK based. However, we invited a large, international and multidisciplinary cohort to participate in the Delphi survey, and meeting attendees were able to base their decisions on the opinions of this wider cohort. At the consensus meeting, we prioritized items for inclusion in the guideline if all stakeholder groups reached consensus, i.e., at least 70% of participants in each stakeholder group scored the item as “critical.” Although choice of threshold is subjective, prespecification of the threshold in the protocol ought to provide assurance that we did not define consensus in a post hoc way and therefore, that our own opinions did not bias the Delphi results [12]. The final phase of activities described by Moher and colleagues [4] relates to dissemination and implementation of the published guideline. We plan to circulate the STROPS guideline to individuals who completed both Delphi rounds and to ask coordinators of pharmacogenetic networks to notify their members of the publication of the guideline. We will also register the guideline on the EQUATOR website, present the guideline at conferences relevant to pharmacogenetic research, and seek guideline endorsement from relevant journals. It is important to note that the STROPS guideline has been developed to improve the reporting of primary pharmacogenetic studies; to the best of our knowledge, no guideline exists for the reporting of systematic reviews and meta-analyses of pharmacogenetic studies. 
Evidence synthesis is an indispensable tool to researchers who are striving to improve the strength of the evidence base for pharmacogenetic associations, and a specific guideline designed to improve the reporting of systematic reviews and meta-analyses of pharmacogenetic studies would certainly be a useful addition in this field of research. Indeed, setting a robust standard for reporting of systematic reviews may improve the likelihood of pharmacogenetic findings being translated into clinical practice.

Supporting information

S1 Table. Items scored in the Delphi survey. https://doi.org/10.1371/journal.pmed.1003344.s001 (DOCX)

S2 Table. Consensus matrix. https://doi.org/10.1371/journal.pmed.1003344.s002 (DOCX)

S3 Table. Summary of decisions made at consensus meeting. https://doi.org/10.1371/journal.pmed.1003344.s003 (DOCX)

S1 Data. Anonymized data from Round 1 and Round 2 of the Delphi survey. A score of “-9” indicates that the participant did not score an item or that the item was not scored because it was not available at Round 1. These items will also have a “0” in the Round column. Data from participants that only partially completed a round (i.e., did not score all items) were not included in our analyses. A score of “10” indicates that the participant selected “unable to score.” Participants who scored an item as “unable to score” were excluded from the analysis for that particular item. https://doi.org/10.1371/journal.pmed.1003344.s004 (CSV)

S1 Document. Delphi participants and consensus meeting attendees. https://doi.org/10.1371/journal.pmed.1003344.s005 (DOCX)

S2 Document. Investigation of attrition bias in the Delphi survey. https://doi.org/10.1371/journal.pmed.1003344.s006 (DOCX)

S3 Document. STROPS guideline: Explanation and elaboration document. STROPS, STrengthening the Reporting Of Pharmacogenetic Studies. https://doi.org/10.1371/journal.pmed.1003344.s007 (DOCX)

S1 Presentation. Consensus meeting slides. https://doi.org/10.1371/journal.pmed.1003344.s008 (PDF)

Acknowledgments

The study team would like to thank all individuals who contributed to the consensus meeting and all individuals who participated in both rounds of the Delphi survey. We would also like to thank Eleanor Kotas for her assistance in drafting and implementing the search strategy to identify systematic reviews of pharmacogenetic studies.