Reporting Items for Updated Clinical Guidelines: Checklist for the Reporting of Updated Guidelines (CheckUp)

doi: 10.1371/journal.pmed.1002207
pmid: 28072838

Background

Scientific knowledge is in constant development. Consequently, regular review to assure the trustworthiness of clinical guidelines is required. However, there is still a lack of preferred reporting items of the updating process in updated clinical guidelines. The present article describes the development process of the Checklist for the Reporting of Updated Guidelines (CheckUp).

Methods and Findings

We developed an initial list of items based on an overview of research evidence on clinical guideline updating, the Appraisal of Guidelines for Research and Evaluation (AGREE) II Instrument, and the advice of the CheckUp panel (n = 33 professionals). A multistep process was used to refine this list, including an assessment of ten existing updated clinical guidelines, interviews with key informants (response rate: 54.2%; 13/24), a three-round Delphi consensus survey with the CheckUp panel (33 participants), and an external review with clinical guideline methodologists (response rate: 90%; 53/59) and users (response rate: 55.6%; 10/18). CheckUp includes 16 items that address (1) the presentation of an updated guideline, (2) editorial independence, and (3) the methodology of the updating process. In this article, we present the methodology to develop CheckUp and include as a supplementary file an explanation and elaboration document.

Conclusions

CheckUp can be used to evaluate the completeness of reporting in updated guidelines and as a tool to inform guideline developers about reporting requirements. Editors may request its completion from guideline authors when submitting updated guidelines for publication. Adherence to CheckUp will likely enhance the comprehensiveness and transparency of clinical guideline updating for the benefit of patients and the public, health care professionals, and other relevant stakeholders.

Background

Trustworthy clinical guidelines aim to assist decision making by providing recommendations that are informed by the best available evidence and include an assessment of the benefits and harms of alternative care options [1,2]. Because of the continuous emergence of new research evidence (i.e., changes in available interventions, effects, or cost) [3], appropriate updating to maintain the trustworthiness of clinical guidelines is challenging since it requires regular surveillance and reviewing of the new evidence [4,5].

Updating clinical guidelines is a process that includes different stages: (1) prioritisation of candidate guidelines or recommendations to update [6], (2) identification of new scientific evidence [3,6–8], (3) assessment of the need to update [3,6,9], (4) updating the recommendations [6,10–12], and (5) publication of the updated guideline [6,13]. However, there is no consensus about the optimal methodology to operationalise each of these steps or about how to report on the process [5,14,15,16]; the available guidance from guideline institutions is suboptimal [17,18].

Trustworthiness standards for guidelines have been published by both the Institute of Medicine (IOM) and the Guidelines International Network (G-I-N) [1,2]. Additionally, instruments are available for assessing the quality of clinical guidelines, such as the Appraisal of Guidelines for Research and Evaluation (AGREE) II Instrument [19], while others, such as the GIN-McMaster Guideline Development Checklist [20], support developing and implementing trustworthy clinical guidelines. However, guideline updating requires some different methodological considerations and unique communication procedures. Currently, none of the existing tools address these issues.

To address this gap, in a partnership of the Iberoamerican Cochrane Centre (www.cochrane.org), the AGREE Collaboration (www.agreetrust.org), and the G-I-N Updating Guidelines Working Group (www.g-i-n.net/working-groups/updating-guidelines), we have developed the Checklist for the Reporting of Updated Guidelines (CheckUp). This article about CheckUp is targeted at guideline developers and users of guidelines. In the article, we present the methodology of the development process and the final checklist. In a supplementary file, we present explanations and examples for each item (S1 Appendix).

Methodology

For reporting the development process of CheckUp, we followed Enhancing the QUAlity and Transparency Of health Research (EQUATOR) and Moher’s criteria [21,22]. The development of CheckUp consisted of four phases: (1) panel selection, (2) generation of the initial checklist, (3) optimisation of the checklist, and (4) approval of the final checklist (Fig 1).

Fig 1. Checklist development process. Abbreviation: AGREE, Appraisal of Guidelines for Research and Evaluation. https://doi.org/10.1371/journal.pmed.1002207.g001

Panel Selection

To advise on the development of the CheckUp, a panel comprising individuals with relevant experience in clinical guideline development and updating and/or in systematic reviews/guidelines research methodology was convened. Invited panel participants were identified based on a review of the main authors in the field, as well as the AGREE Trust (www.agreetrust.org) and the G-I-N (www.g-i-n.net) members. The purpose of the panel was to provide expert advice during the development process and to participate in the Delphi survey. A core group (RWMV, PAC, MB, and LMG) was established to design the protocol and provide more time-sensitive and operational advice.

Generation of the Initial Checklist

The core group first developed an initial list of items (including explanation and examples) through brainstorming and discussion, taking into account (1) research evidence in the field [15,16,18], (2) the AGREE II Instrument [19], and (3) the panel experience. We used three core updating publications as a starting point from which the initial version of the checklist was generated [15,16,18]. These studies include an overview of the available guidance from guideline methodological handbooks [18], a systematic review of the published methodological research [16], and an international survey about the experiences of the main guideline developers [15].

Optimisation of the Checklist

We optimised the initial checklist through a multistep process that included an assessment of existing updated clinical guidelines, semistructured interviews, a Delphi consensus survey, and an external review with clinical guideline methodologists and users (Fig 1).

Assessment of updated clinical guidelines. We piloted the initial checklist among a convenience sample of updated clinical guidelines to assess the terminology used, to identify missing items, and as a first step to explore its face validity. We included updated clinical guidelines that were (1) developed by G-I-N members, (2) published in English or Spanish, and (3) published between 2011 and 2013. We searched the G-I-N library (www.g-i-n.net) and the National Guideline Clearinghouse (www.guidelines.gov). Two reviewers (RWMV and SPS) applied the checklist, solving disagreements by consensus. The core group discussed the results and refined the initial list of items.

Semistructured interviews. To refine the checklist and to identify missing items, we conducted semistructured interviews with clinical guideline experts.
We chose a convenience sample of participants, outside the CheckUp panel, with (1) experience in updating clinical guidelines, defined as having participated in the updating process of at least one clinical guideline over the past year, and (2) fluency in English. We identified the participants by contacting professionals associated with G-I-N or researchers in the field. When someone did not respond or could not participate, a new person was recruited. We continued to recruit participants and collect data until data saturation was achieved. In each interview, participants were asked about their experiences and challenges in updating clinical guidelines. Subsequently, the participant was prompted by the interviewer (RWMV) to reflect on the strengths, weaknesses, missing concepts, and redundancies in the checklist. The interviews were audiotaped, and key themes were identified. The core group discussed the results and refined the list of items.

Delphi consensus survey. We reviewed the refined list of items through a Delphi consensus survey [23] with all members of the CheckUp panel. The Delphi participants assessed the inclusion, comprehensiveness, clarity, and coverage of each item and tried to identify potential additional items for the checklist. Using a seven-point Likert scale (one meaning strongly disagree and seven meaning strongly agree) [24], we asked participants to rate whether the item should be included in the checklist. For each item, participants were asked whether their perceptions of (1) the completeness, (2) the usability, and (3) the quality of a clinical guideline would be influenced if the item was reported. We included a free text box for suggestions to modify the items, the explanation, or the examples. We used online software to design the survey and to collect the responses (www.surveymonkey.com).
We calculated the median score for inclusion, completeness, usability, and quality for each item and classified them into (1) items with a median score of 0 to 3 points, which were excluded; (2) items with a median score of 4 to 5 points or with substantial comments that needed important revision, which were retained, modified, and further tested; and (3) items with a median score of 6 to 7 points and without substantial comments, which were included and not evaluated further in the following rounds. One reviewer (RWMV) analysed the quantitative and qualitative results and suggested potential solutions. The core group discussed the results and potential solutions and refined the list of items accordingly. We continued with additional rounds until consensus for inclusion or exclusion was reached and no more relevant comments were provided.

External review with clinical guideline methodologists. To evaluate the usability of the checklist, we conducted a survey with clinical guideline methodologists who had experience in updating clinical guidelines, as measured by having participated in the updating process of at least one clinical guideline over the past year. We also invited all of the G-I-N institutional member contacts to participate in the external review. If the contact person was not able to participate, we asked them to provide contact details of another expert working at the same institution. Using a seven-point scale (one meaning strongly disagree and seven meaning strongly agree), we asked participants to rate the usability of each item and its influence on the confidence in an updated clinical guideline if the item was reported. A free text option was included for suggestions to modify the items, the explanation, or the examples. We used online software to design the survey and to collect the responses (www.surveymonkey.com). We calculated the median score for usability and confidence for each item.
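
The median-based classification rule described for the Delphi survey can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function name `classify_item`, the item labels, and the example ratings are hypothetical. The text does not say how half-point medians (for example 3.5 or 5.5) were handled, nor whether substantial comments could rescue a low-scoring item, so the sketch assumes that a median of 3 or less excludes the item and that any other median below 6 sends it to revision.

```python
from statistics import median


def classify_item(scores, substantial_comments=False):
    """Classify one checklist item from its seven-point Likert ratings.

    Assumed boundaries (not stated in the source for half-point medians):
    median <= 3 -> exclude; median < 6 or substantial comments -> revise;
    otherwise -> include.
    """
    m = median(scores)
    if m <= 3:
        return "exclude"
    if m < 6 or substantial_comments:
        return "revise"  # retained, modified, and tested again
    return "include"  # not evaluated further in later rounds


# Hypothetical ratings: (scores from panellists, substantial comments?)
ratings = {
    "item_a": ([7, 6, 6, 5, 7], False),
    "item_b": ([4, 5, 5, 6, 3], False),
    "item_c": ([1, 2, 3, 3, 4], False),
    "item_d": ([7, 7, 6], True),
}

decisions = {name: classify_item(s, c) for name, (s, c) in ratings.items()}
for name, decision in decisions.items():
    print(name, decision)
```

Checking the exclusion threshold first is a design choice made for this sketch; a team applying the rule could equally decide that substantial comments take precedence.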
One reviewer (RWMV) analysed the quantitative and qualitative results and suggested potential solutions. The core group discussed the results and potential solutions and refined the list of items accordingly.

External review with clinical guideline users. We conducted semistructured interviews with clinical guideline users to evaluate the usability of the checklist. We engaged individuals who were (1) health care professionals who used clinical guidelines in clinical practice and (2) located in Canada, Spain, or the Netherlands. We identified the participants with the help of the panel members. When someone did not respond or could not participate, a new person was recruited. We continued to recruit participants and collect data until the information was repeated and no new information emerged (data saturation). For each interview, participants were asked whether reporting of the item in an updated clinical guideline would increase their confidence in the guideline. The participant and interviewer (RWMV) reviewed the checklist, and the participants were prompted to consider missing concepts, redundancy, and the usability of the checklist. The interviews were audiotaped, and key themes were identified. The core group discussed the results and refined the list of items.

Approval of the final checklist. The checklist was presented and discussed in a workshop at the 2015 G-I-N Conference in Amsterdam. In this workshop, we asked the participants whether the checklist was deemed adequate for assessment of the updated clinical guidelines [25]. We also asked the participants to give an overall impression of the checklist. The core group discussed the results and agreed on the final list of items.

Results

CheckUp Panel

Fifty-six potential individuals were invited to be part of the CheckUp Panel. In total, 33 professionals (20 males and 13 females; response rate: 58.9%, 33/56) confirmed their participation (17 from Europe, 9 from South America, 5 from North America, and 2 from Oceania). The primary role of the panellists was health care researcher (60.6%; 20/33), guideline developer (30.3%; 10/33), and clinical guideline user (9.1%; 3/33).

Generation of the Initial Checklist

The initial checklist included 13 items within the following domains: updating rationale, scope and purpose of the updated clinical guideline, participants in the updating panel, conflicts of interest, updating methodology, differentiating original and new information, and reasons for the changes in the recommendations.

Optimisation of the Checklist

Assessment of updated clinical guidelines. Initially, we assessed a convenience sample of ten updated clinical guidelines from the G-I-N library with the initial checklist [26–35]. The items most frequently reported in the included clinical guidelines were related to the literature search strategy (60%), the composition of the panel (50%), and the external review (40%). The other checklist items (e.g., assessment for the need of updating, evidence selection, rationale for updating, and rationale for changes) were reported in less than 20% of the included clinical guidelines. No additional concepts or items emerged (Table 1).

Table 1. CheckUp: Stages of the optimisation process (objective, sample, and results by optimisation processes). https://doi.org/10.1371/journal.pmed.1002207.t001

Semistructured interviews. We conducted semistructured interviews with clinical guideline developers (5 from Europe, 5 from North America, and 3 from South America). In total, we interviewed 13 participants, at which point saturation was reached. As a result, we modified five items and added four new ones. The modifications were related to (a) differences in the objectives, purpose, or aim between the original and updated version; (b) the identification of new evidence; (c) the rationale for changing recommendations; and (d) the funding. The new items were related to (a) the scope of the update (partial or complete), (b) the target audience, (c) the changes in the recommendations, and (d) the plans and methodology reported to update the clinical guideline (Table 1).

Delphi consensus survey. All the members of the CheckUp panel (n = 33) were invited to participate in the Delphi consensus survey. Twenty-seven (82%) members participated in the first Delphi round, thirty-one members (93.9%) in the second Delphi round, and all (100%) members in the third and final round. In the first round, the participants provided substantial feedback on various items, their explanation, and the accompanying examples. This feedback triggered modification in the order of the items, phrasing of the items, explanations, and examples (Table 1). All items met the inclusion criteria, and no major comments were reported. The median score for whether the participants believed that an updated clinical guideline would be more complete, usable, and of higher quality whenever the item was reported was six for all questions. In the second round (n = 17 items), the amount of feedback was substantially smaller than in the first round (Table 1).
We merged some items, and the checklist was reduced to 16 items. Again, all items met the inclusion criteria, and no major comments were reported. The median score for whether the participants believed the clinical guideline would be more complete, usable, and of higher quality was 7.0, 6.0, and 6.0, respectively. In the third and last round, a general consensus was reached for all items, explanations, and examples. The median score for item inclusion, completeness, usability, and quality was ≥6 in all items (except for two items with a median score of 5.5 in the usability and quality question) (Table 2).

Table 2. Results of the Delphi survey (third round). https://doi.org/10.1371/journal.pmed.1002207.t002

External review.

External review with clinical guideline methodologists. We conducted a survey with 53 clinical guideline methodologists (53/59, response rate 90%). The median scores of usability and confidence for each item were ≥6 (Table 3). Participants provided comments that improved the writing style of the items, explanations, and examples (Table 1).

Table 3. Results of the external review with clinical guideline methodologists. https://doi.org/10.1371/journal.pmed.1002207.t003

External review with clinical guideline users. We conducted semistructured interviews with 10 clinical guideline users (3 from Spain, 2 from the Netherlands, and 5 from Canada), at which point saturation was reached. All participants acknowledged that all items were useful to evaluate the reporting of the updating process in updated clinical guidelines. Neither new items nor modifications were proposed (Table 1).
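
The participation figures reported across the development stages can be checked with simple arithmetic. The sketch below recomputes each rate from the counts given in the text; the helper name `response_rate` and the stage labels are chosen here for illustration and are not part of the article. Note that 53/59 works out to 89.8%, which the article reports rounded to 90%.

```python
def response_rate(responders, invited):
    """Return the response rate as a percentage rounded to one decimal."""
    if invited <= 0:
        raise ValueError("invited must be a positive count")
    return round(100 * responders / invited, 1)


# (responders, invited) as reported in the article text
stages = {
    "CheckUp panel": (33, 56),
    "Key informant interviews": (13, 24),
    "Delphi round 1": (27, 33),
    "Delphi round 2": (31, 33),
    "Delphi round 3": (33, 33),
    "Methodologist review": (53, 59),
    "Guideline user interviews": (10, 18),
}

rates = {stage: response_rate(*counts) for stage, counts in stages.items()}
for stage, rate in rates.items():
    print(f"{stage}: {rate}%")
```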
Final Checklist The checklist includes 16 items that can be broadly categorised into three themes: (1) presentation (e.g., clinical guideline sections and recommendations), (2) editorial independence (e.g., the working group and funding), and (3) the methodology used (e.g., search strategy and evidence synthesis) (Table 4). Those attending the presentation of the checklist workshop at the G-I-N 2015 conference reviewed and agreed with the final version of the checklist. Table 4. Final version of CheckUp. https://doi.org/10.1371/journal.pmed.1002207.t004 CheckUp Panel Fifty-six individuals were invited to be part of the CheckUp Panel. In total, 33 professionals (20 male and 13 female; response rate: 58.9%, 33/56) confirmed their participation (17 from Europe, 9 from South America, 5 from North America, and 2 from Oceania). The primary role of the panellists was health care researcher (60.6%; 20/33), guideline developer (30.3%; 10/33), and clinical guideline user (9.1%; 3/33). Generation of the Initial Checklist The initial checklist included 13 items within the following domains: updating rationale, scope and purpose of the updated clinical guideline, participants in the updating panel, conflicts of interest, updating methodology, differentiating original and new information, and reasons for the changes in the recommendations. Optimisation of the Checklist Assessment of updated clinical guidelines. Initially, we assessed a convenience sample of ten updated clinical guidelines from the G-I-N library with the initial checklist [26–35]. The items most frequently reported in the included clinical guidelines were related to the literature search strategy (60%), the composition of the panel (50%), and the external review (40%). 
The other checklist items (e.g., assessment for the need of updating, evidence selection, rationale for updating, and rationale for changes) were reported in less than 20% of the included clinical guidelines. No additional concepts or items emerged (Table 1). 
Discussion We developed CheckUp through a comprehensive development process, including the use of systematic reviews, assessment of updated clinical guidelines, and engagement of the international guideline community through semistructured interviews, a Delphi consensus survey, and an external review. Main Findings Across the different processes, an alignment and consensus of opinion emerged between what was documented in the literature and the expectations of clinical guideline developers, users, and researchers in regards to what information ought to be reported in an updated clinical guideline. 
CheckUp includes 16 items regarding the presentation of the updated clinical guideline, editorial independence, and the methodology used in the clinical guideline updating process. CheckUp was primarily developed to evaluate the completeness of reporting in updated guidelines. Additionally, the tool can inform guideline developers about strategies for updating clinical guidelines and their reporting requirements. An explanation and elaboration article for the CheckUp is published as a supporting information article (S1 Appendix). CheckUp can be used in several ways. The checklist can provide guidance to developers who update clinical guidelines, by providing methodological principles that should be incorporated into the updating process, as well as strategies for reporting this information. The checklist can be applied by users or appraisers of clinical guidelines to assess whether updated clinical guidelines align with the CheckUp items. We suggest that a minimum of two reviewers assess the reporting of the guideline updating process independently, with the help of a third reviewer if there is a need of reaching consensus. Our Results in the Context of Previous Research Updating is a crucial part of maintaining the trustworthiness of clinical guidelines [1,2]. Since clinical guidelines have a limited lifespan, updating clinical guidelines is crucial to maintain the validity of the recommendations [3,9,36,37]. Although the importance of regular updating has been recognised and clinical guidelines may have an “expiration date,” little research has been conducted in the field so far [13,15–18]. Published standards for trustworthy guidelines require the description of updating plans [1,2]; however, these standards do not provide specific guidance about the detailed reporting of the updating process of guidelines. Strengths and Limitations Our CheckUp proposal has several strengths. 
For the development process, we systematically reviewed the evidence and followed EQUATOR and Moher’s criteria [21,22]. Also, by applying a formal consensus method (Delphi survey) and collecting experts’ opinions (semistructured interviews and external reviews), we reached a fair understanding of clinical guideline methodologists’ and users’ perceptions about the updating of clinical guidelines. Finally, there was fairly strong overall consensus during the development of CheckUp. Our study has some limitations. We used consensus methods and convenience samples of clinical guideline stakeholders. However, across the different processes, an alignment and consensus of opinion emerged on what clinical guideline developers, users, and researchers expect to see reported in updated clinical guidelines. Another potential limitation is that CheckUp includes some items that may partially overlap with some items that are present in other instruments [19,20]; however, we think this is a minor limitation as CheckUp differs for the most part and has a very specific and differentiated goal. Finally, we did not collect potential conflicts of interest in our panel. Implications for Practice and Research CheckUp can be used for multiple purposes. Firstly, guideline developers can use it both for the reporting of their guidelines and to plan their updating processes. Guideline users can assess the reporting of updated guidelines. Editors may request its completion from guideline authors. CheckUp provides an overall picture of how complete the updating process is reported in updated clinical guidelines. Being a reporting checklist, CheckUp does not evaluate the quality of the updating processes, as there are no gold standards for this process. Currently, the G-I-N Updating Guidelines Working Group (http://www.g-i-n.net/working-groups/updating-guidelines) is undertaking an analysis of current guideline updating methods worldwide. 
From this work, strategies or advice might come on how we might assess guideline updating quality. There are currently no gold standards for guideline updating methodology. Nonetheless, updating is key to ensuring trustworthy, implementable, and clinically relevant recommendations. Current guideline evaluation tools or guideline method resources (e.g., AGREE II, Grading of Recommendations Assessment, Development, and Evaluation (GRADE), IOM Standards, and the like) are not simply transferable to the conceptual requirements of an updated guideline. CheckUp addresses the gap: it has been supported by our study participants and is a resource that complements (rather than competes with) the other high-quality tools available in the guideline enterprise. Further rigorous research in updating clinical guidelines is warranted, and we invite users to comment on the items and the usability of CheckUp. It would be important to assess the impact of CheckUp in the updating clinical guideline field over the next few years [16]. When dynamic or living guidelines become a reality, [38] some adaptation of CheckUp could potentially be necessary. Finally, the G-I-N Updating Guidelines Working Group will continue to play a key role in this work and in moving forward the updating agenda in the clinical guideline enterprise. 
Supporting Information S1 Appendix. CheckUp: Explanation and elaboration of a checklist for the reporting of updating clinical guidelines. https://doi.org/10.1371/journal.pmed.1002207.s001 (DOCX) Acknowledgments Robin W. M. Vernooij is a doctoral candidate at the Paediatrics, Obstetrics and Gynaecology and Preventive Medicine Department, Universitat Autònoma de Barcelona, Barcelona, Spain. The authors thank Sandra Pequeño Saco (Iberoamerican Cochrane Centre) for her support in assessing updated clinical guidelines. The authors also thank María Victoria Leo for her help editing the manuscript. Contributors. Vernooij RWM, Alonso-Coello P, Brouwers M, Martínez García L, Florez ID, Iorio A, James R, Sanabria AJ, Selva A, Shekelle PG, and Vandvik PO are members of the G-I-N Updating Guidelines Working Group. 
The members of the CheckUp panel are as follows: Vernooij RWM, Alonso-Coello P, Brouwers M, Martínez García L, Ada L, Alemán A, Arévalo-Rodriguez I, Becker M, Burgers JS, Chan W, Delvaux N, Duggan G, Enciso Olivera CO, Etxeandia-Ikobaltzeta I, Florez ID, Follmann M, Gartlehner G, Iorio A, James R, Jones SL, Kotzeva A, Lloyd M, López Gallegos D, Louro-González A, Marin Leon I, Martí-Carvajal A, Meerpohl JJ, Pardo R, Rojas-Reyes MX, Rotaeche R, Sanabria AJ, Selva A, Shekelle PG, Sierra Matamoros FA, van de Velde S, Vandvik PO, and Willett S.
What Is the Purpose of the Orphan Drug Act? doi: 10.1371/journal.pmed.1002191 pmid: 28045908
The Definition of an Orphan Before the ODA became law, Congress heard diverse views about which R&D “orphans” the legislation should attempt to rescue [10]. Some witnesses focused on rare diseases during Congressional hearings, whereas others advocated for “orphan medical devices and medically necessary foods” [10]. Still others spoke in favor of “drugs for less developed countries,” or vaccines, which manufacturers had moved away from due to high product liability concerns at that time. Worried that this lack of consensus might undermine the bill’s progress, the ODA’s authors made a political choice to focus on rare diseases [10]. Even so, the ODA did not originally include a prevalence-based definition of rare disease. Rather, the ODA defined a “rare disease or condition” as one that “occurs so infrequently in the United States that there is no reasonable expectation that the cost of developing and making available in the United States a drug for such disease or condition will be recovered from the sales in the United States of such drug” [11]. Orphan drug status was therefore not granted simply because it targeted a rare disease; rather, the disease had to be rare enough to occasion market neglect. However, just as the Food and Drug Administration (FDA) began making “determinations” about whether a given disease met the ODA’s test of market neglect, the FDA’s task was grossly simplified. In 1984 the ODA was amended, redefining rare diseases as those affecting “less than 200,000 persons in the United States” (the prevalence-based definition) or more than 200,000 persons, but for whom “there is no reasonable expectation that the cost of developing and making available in the United States a drug for such disease or condition will be recovered from the sale in the United States” (a commercial viability definition) [12]. 
With that change, the FDA went from requiring evidence of the commercial non-viability of orphan drug R&D to assuming commercial non-viability, provided the drug targeted fewer than 200,000 persons. Numerous studies that suggest that orphan drugs are actually more profitable than non-orphan drugs call this underlying assumption into serious question [7,13,14]. Avoiding any accounting of their actual R&D costs, nearly all of the more than 2,000 orphan drug designations sought and obtained by drug-makers between 1983 and 2011 fall within the prevalence-based definition [15]. And without upfront scrutiny of the relationship between disease prevalence and anticipated profits, drug manufacturers are routinely able to price orphan drugs at US$100,000–US$200,000 per patient per year, needing only 5,000–10,000 patients to generate US$1 billion in annual revenues. Revise and Reclaim the ODA What should be done? One idea is for the FDA to try yet again to cut down on the practice of salami slicing and, in turn, to better discriminate between genuine and artificial rare diseases. In 1992, the FDA first purported to curb salami slicing by requiring that, for subsets of common diseases to be considered rare, they needed to be “medically plausible” [16], a term it failed to define. Twenty-one years later, the FDA finally promulgated more promising regulations [17] that hold drug manufacturers to a higher standard of evidence. When seeking an orphan drug indication, manufacturers must now show not only why one subset of a disease should be targeted by their drug, but also why the drug is inappropriate outside the selected subset [18]. However, the findings of Kesselheim and colleagues [4] suggest that these new regulations—or the FDA’s application of them—may not be adequate to the task. Another potentially more fruitful approach is to limit orphan drug designation to disease pathways rather than the rarity of the disease per se [4]. 
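The revenue figure quoted earlier in this section (US$100,000–US$200,000 per patient per year, with 5,000–10,000 patients yielding US$1 billion annually) follows from simple multiplication. The sketch below reproduces that arithmetic; the function names are illustrative assumptions, and the calculation ignores discounts, rebates, and partial-year treatment.

```python
def annual_revenue(patients: int, price_per_patient_year: int) -> int:
    """Gross annual revenue in US dollars for a given patient count and list price."""
    return patients * price_per_patient_year


def patients_needed(target_revenue: int, price_per_patient_year: int) -> int:
    """Smallest whole number of patients whose annual spend reaches the target."""
    if price_per_patient_year <= 0:
        raise ValueError("price must be positive")
    # Ceiling division using integers only, to avoid floating-point rounding.
    return -(-target_revenue // price_per_patient_year)


TARGET = 1_000_000_000  # US$1 billion, the figure used in the text

for price in (100_000, 200_000):
    n = patients_needed(TARGET, price)
    print(f"US${price:,} per patient per year -> {n:,} patients")
```

At the upper price of US$200,000 the target is reached with 5,000 patients, and at US$100,000 with 10,000 patients, both far below the 200,000-person prevalence threshold.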
Fundamentally, though, the purpose of the ODA merits re-examination. At bottom, the ODA was intended to redistribute resources to medical needs that would otherwise be marginalized by market forces. With the introduction of the prevalence-based definition of rare disease, we began losing sight of the ODA’s core, redistributive function. To restore that function, we need to open up the very concept of an orphan disease or condition under the ODA and resume the scrutiny of claims of market-mediated, unmet medical need instead of policing the ever-shifting boundaries of disease. After all, “the category of orphan diseases bears no essential relationship to their prevalence, morbidity, or mortality” [19]. Markets also discourage a range of other research areas, including research involving pregnant women (because of perceived risks to the foetus) [20], comparative effectiveness research that seeks to assess the risks and benefits of competing drug treatments (because of the difficulty involved in patenting that type of information) [21], and research into diseases that disproportionately affect the world’s poor (because of the population’s low purchasing power) [22]. All of these areas of research carry tremendous social welfare gains; however, they have been effectively orphaned by the ODA’s current focus on rare diseases. Policymakers can reclaim the ODA by removing the prevalence-based definition of a rare disease, allowing other areas of research to qualify as orphans and reviving the FDA’s original, albeit short-lived, task of scrutinizing the demonstrable level of market neglect of the affected population, whether for reasons of rarity, poverty, gender, or otherwise. Waiting to see what comes down the pipe and then attempting to negotiate better prices for orphan drugs seems unlikely to succeed as a strategy for securing access to marginalized, but socially valuable, health innovations. 
Acknowledgments The author would like to thank Aidan Hollis and Catherine Bryan for helpful comments on a draft of this Perspective article.
Evaluating Hospital-Based Surveillance for Outbreak Detection in Bangladesh: Analysis of Healthcare Utilization Data doi: 10.1371/journal.pmed.1002218 pmid: 28095468
Background The International Health Regulations outline core requirements to ensure the detection of public health threats of international concern. Assessing the capacity of surveillance systems to detect these threats is crucial for evaluating a country’s ability to meet these requirements. Methods and Findings We propose a framework to evaluate the sensitivity and representativeness of hospital-based surveillance and apply it to severe neurological infectious diseases and fatal respiratory infectious diseases in Bangladesh. We identified cases in selected communities within surveillance hospital catchment areas using key informant and house-to-house surveys and ascertained where cases had sought care. We estimated the probability of surveillance detecting different sized outbreaks by distance from the surveillance hospital and compared characteristics of cases identified in the community and cases attending surveillance hospitals. We estimated that surveillance detected 26% (95% CI 18%–33%) of severe neurological disease cases and 18% (95% CI 16%–21%) of fatal respiratory disease cases residing at 10 km distance from a surveillance hospital. Detection probabilities decreased markedly with distance. The probability of detecting small outbreaks (three cases) dropped below 50% at distances greater than 26 km for severe neurological disease and at distances greater than 7 km for fatal respiratory disease. Characteristics of cases attending surveillance hospitals were largely representative of all cases; however, neurological disease cases aged <5 y or from the lowest socioeconomic group and fatal respiratory disease cases aged ≥60 y were underrepresented. Our estimates of outbreak detection rely on suspected cases that attend a surveillance hospital receiving laboratory confirmation of disease and being reported to the surveillance system. 
The extent to which this occurs will depend on disease characteristics (e.g., severity and symptom specificity) and surveillance resources. Conclusion We present a new approach to evaluating the sensitivity and representativeness of hospital-based surveillance, making it possible to predict its ability to detect emerging threats. Why Was This Study Done? Many countries rely on hospital-based surveillance for the detection of infectious diseases of national and global public health relevance. It is often difficult to access suitable external reference data to assess the capacity of a surveillance system to detect cases and outbreaks or to characterize cases. What Did the Researchers Do and Find? We demonstrate a novel approach using healthcare utilization data to evaluate the sensitivity and representativeness of severe infectious disease surveillance in Bangladesh. The capacity to detect cases and outbreaks decreased with distance from surveillance hospitals. Cases captured by surveillance differed from cases in communities by age and socioeconomic status. Geographic coverage of surveillance could be improved by including other hospitals in the surveillance system. What Do These Findings Mean? The presented approach is applicable for a wide range of infectious diseases in different settings, taking some practical considerations into account. Hospital-based surveillance may have low sensitivity in rural areas at greater distances from surveillance hospitals, suggesting a risk of unrecognized transmission of emerging infectious diseases. Alternative surveillance strategies, such as including additional hospitals in the surveillance system or considering alternative data streams, may help to increase surveillance performance in such remote regions. Introduction A well-functioning disease surveillance system is crucial for the identification and control of outbreaks, and hence the prevention of national and global health emergencies [1]. 
The World Health Organization (WHO) highlighted the value of national surveillance systems in the International Health Regulations (2005), an agreement among all member states to develop and maintain sufficient capacity for the detection, reporting, and control of public health threats of international concern [2]. Infectious disease surveillance should enable (i) the timely detection of outbreaks, (ii) the quantification of health problems, (iii) the identification of subpopulations at risk, and (iv) the assessment of temporal trends including the impact of control strategies [3,4].

National surveillance systems typically collect data from patients seeking care at sentinel hospitals or other healthcare facilities and can provide useful information for public health purposes. However, hospital-based surveillance generally underestimates disease burden since only a proportion of cases visit a hospital for care [5]. Low case detection may also undermine the value of hospital-based surveillance for outbreak detection. Moreover, if patients captured by the surveillance system are not representative of all cases in the community, surveillance statistics could lead to erroneous interpretations of disease patterns and misallocation of prevention resources. In particular, sex, socioeconomic status, or distance can affect healthcare seeking at hospitals, especially where access to care is limited [6–9].

Surveillance evaluation guidelines, such as those established by the US Centers for Disease Control and Prevention, list sensitivity and representativeness among the attributes that a public health surveillance system should possess and that require assessment [10,11]. Following these guidelines requires external reference data, which are often unavailable in resource-poor settings [12].
Here, we present a new approach to evaluating the capacity of a surveillance system to detect and characterize disease cases, with emphasis on outbreaks of emerging infections that often occur as small case clusters in remote areas. We apply our methodology to assess hospital-based surveillance of severe neurological infectious disease and fatal respiratory infectious disease in Bangladesh.

Methods

Ethics Statement

The field teams obtained written informed consent from participants or their guardians (if <18 y of age) during community surveys. Healthcare utilization survey protocols were reviewed and approved by the Ethical Review Committee of the International Centre for Diarrhoeal Disease Research, Bangladesh.

Protocol for Evaluating Sensitivity and Representativeness of Surveillance Systems

Evaluating the sensitivity and representativeness of surveillance systems may be hampered by difficulties in identifying and characterizing the underlying case population. Here we describe how epidemiological studies can be used to identify cases with severe symptoms in communities and capture their personal and healthcare utilization characteristics (data collection stage) (Fig 1). In addition to detailing how we collected the data in this study, we provide information about how the approach could be varied in other settings. We subsequently demonstrate how such data can be used to evaluate the sensitivity and representativeness of surveillance systems (evaluation stage). We then apply our approach to the detection of severe neurological infectious diseases and fatal respiratory infectious diseases in Bangladesh as a case study.

Fig 1. Key steps of the collection of healthcare utilization data to evaluate the sensitivity and representativeness of surveillance systems.
In the Bangladesh example, the catchment areas of surveillance hospitals were first defined based on hospital records (e.g., areas where >50% or >75% of cases reside) [13,14]. Subsequently, small administrative units were chosen at random from within the catchment area, and all communities in the selected areas were surveyed. Cases in the community were identified based on lists of deaths in addition to community networking strategies (rural settings) or house-to-house surveys (urban settings). Information on symptoms (to establish case definitions), healthcare seeking behavior, and characteristics of cases was collected. In other settings, the exact survey procedures may vary according to the context.
https://doi.org/10.1371/journal.pmed.1002218.g001

Data Collection

Selecting study locations. The first step was to randomly select communities at differing distances from the surveillance hospitals. We specified catchment areas of selected hospitals based on hospital records and subsequently randomly selected small administrative units from which all communities were surveyed. Selection of communities could also be done through census data or using detailed population maps of the area.

Identifying people with diseases in the selected community. Study teams visited the selected communities and identified cases that had had the disease of interest. The retrospective identification of severe disease cases in the community was based on syndromic criteria, used as a proxy for clinical case definitions that would be applied in healthcare facilities. The identification of such cases in the community is often the most problematic step, and the optimum strategy will depend on the local context, the severity of the disease, and the specificity of disease symptoms.

Collecting information on healthcare seeking and personal case characteristics.
To estimate case detection probabilities, identify biases in case statistics, and characterize the healthcare utilization behavior in the population, we needed information about the healthcare seeking and personal characteristics of cases. In particular, we needed to identify whether the cases attended a surveillance hospital. Such information was obtained during household visits of identified cases. To understand the impact of distance from the hospital, we approximated the locations of households by the central positions of the small administrative units. Alternatively, household locations could be recorded precisely using GPS devices.

Evaluation of the Surveillance System

Quantifying the probability of detecting a case. We estimated the case detection probability as the proportion of cases who reportedly attended a surveillance hospital among all cases identified in the community. We further assessed how this probability changed with distance from the surveillance hospital.

Quantifying the probability of detecting outbreaks. We subsequently used the estimated case detection probabilities to quantify the capacity of the surveillance system to identify disease outbreaks. We estimated outbreak detection probabilities for varying outbreak sizes and for outbreaks occurring at different distances from surveillance hospitals.

Assessing the representativeness of detected cases. We evaluated the representativeness of detected cases by estimating the difference between case statistics (proportions of specific case characteristics) based on all cases in the community and based on identified cases who attended the surveillance hospital. The investigated characteristics included sex, age, and socioeconomic status.

Assessing alternative surveillance strategies.
To investigate how sensitivity and representativeness of the surveillance system could be improved by integrating other healthcare providers, we applied the evaluation procedures as described above to other healthcare provider types.

Example Using Severe Neurological Infectious Diseases and Fatal Respiratory Infectious Diseases in Bangladesh

We demonstrate the application of the proposed evaluation strategy by using it to assess the capacity of hospital-based surveillance for severe infectious diseases in Bangladesh, which is based on tertiary care hospitals located throughout the country. We used data from two surveys carried out in catchment areas of some of these hospitals that investigated the healthcare utilization behavior of individuals with severe neurological infectious disease or fatal respiratory infectious disease (Fig 2A) [14,15]. These disease types are of great public health relevance in Bangladesh (e.g., Japanese encephalitis and influenza) but also represent symptoms typical of other emerging infectious diseases (e.g., Nipah and severe acute respiratory syndrome).

A first survey collected data between 10 June 2008 and 30 March 2009 about cases with symptoms of severe neurological infection that occurred within the previous 12 mo in 60 small administrative units (mean population size of 28,000 people) in the catchment areas of three surveillance hospitals [14]. A second survey collected data between 3 April 2012 and 22 February 2013 about acute respiratory infection (ARI)–related deaths that occurred within the previous 24 mo in 22 administrative units in the catchment areas of 11 surveillance hospitals [15]. We considered ARI-related deaths as a proxy for respiratory disease of sufficient severity to require medical attention.
The surveillance hospital in Dhaka City was excluded from the original studies because of the difficulty of defining the catchment area (a step necessary for the original study purpose), as people nationwide seek medical care in Dhaka. The surveys followed procedures as previously described and summarized below [13,14]. Characteristics of the study population are described in Fig. A in S1 Text.

Fig 2. Location of administrative units and case detection probabilities by distance. (A) Location of surveillance hospitals and administrative units. The hospital in Dhaka City was excluded from the original studies. (B) Population density map of Bangladesh [16]. Sixty-eight percent of the population in Bangladesh lives >30 km from a surveillance hospital (including the Dhaka surveillance hospital), a distance at which case and outbreak detection probabilities are low. (C) Probability of surveillance case detection by distance. The observed probability was calculated as a moving average over a 25 km distance window. Case detection probabilities were estimated using log-binomial regression models including distance as an explanatory variable.
https://doi.org/10.1371/journal.pmed.1002218.g002

Community Healthcare Utilization Surveys

The catchment areas of selected hospitals were first specified based on hospital records (S1 Text). Small administrative units (mean population of 28,000 people) were subsequently selected randomly within the catchment areas, and all communities in the selected areas were surveyed. The identification of cases in selected communities was based on social structures, i.e., cases were identified by visiting public meeting points, such as mosques, markets, or tea-stalls, where health problems in the community are often publicly discussed. Cases were subsequently confirmed by household visits.
In urban areas, house-to-house surveys were conducted to compensate for less pronounced community structures. Additional fatal respiratory infectious disease cases were identified through lists of deaths provided by administrative officers.

For both disease types, the identification of cases was based on syndromic criteria. We defined severe neurological infectious disease as fever with altered mental status for >6 h or with unconsciousness for ≥1 h, or fever with altered mental status, unconsciousness, or a new onset seizure that resulted in death. Fatal respiratory infectious disease (ARI-related death) was defined as having any two of the following symptoms in the 30 d prior to death: sudden onset of fever, cough, breathing difficulty, feeding difficulty, or runny nose. Deaths in children aged <5 y were also classified as ARI-related deaths if there was a sudden onset of breathing difficulty in the 30 d prior to death.

During surveys, information was collected on healthcare utilization behavior and personal characteristics of identified severe neurological and fatal respiratory disease cases. Cases or their household members were asked whether the case visited the surveillance hospital or any other healthcare provider, including other nonlocal hospitals, during his/her illness. Further, information on sex, age, socioeconomic status, and geographic location of households of cases was collected.

Classification of Case Characteristics

We defined “community cases” as all severe neurological or fatal respiratory disease cases identified during community surveys (whether they attended a surveillance hospital or not) and “surveillance cases” as the subset of community cases who reportedly attended a surveillance hospital. For each case identified in community surveys, we identified whether they attended their nearest surveillance hospital.
We then estimated the distance to that surveillance hospital as the distance between the residence administrative unit centroid and that specific surveillance hospital using QGIS [17]. Age was categorized as <5, 5–14, 15–59, and ≥60 y. A socioeconomic status index was generated by principal component analysis based on household assets (electricity, working television, bicycle, motorcycle, sewing machine, mobile phone) and categorized into tertiles (lowest, middle, and highest) [18]. In sensitivity analyses, we explored the use of continuous age and socioeconomic status classified into quintiles (S1 Text). Socioeconomic status was missing for 45 of 1,633 fatal respiratory disease cases, who were excluded from the analysis where this information was required. Three fatal respiratory disease cases were excluded from all analyses due to missing healthcare seeking information.

Quantifying the Probability of Detecting Cases

We estimated the disease-specific case detection probability as the proportion of cases who reportedly sought care at a surveillance hospital among all cases identified during community surveys (number of surveillance cases/number of community cases) and computed 95% confidence intervals (95% CIs) based on the Clopper-Pearson exact method [19].

We quantified case detection probabilities by distance from a surveillance hospital using log-binomial regression analysis separately for severe neurological and fatal respiratory disease cases. We further investigated more complex functional forms of distance in log-binomial regression models. We fitted models with polynomial terms up to the fifth degree and models with basic splines with knots at various positions (between 20 and 50 km distance). Model fit was compared based on the Akaike information criterion (AIC), and the models with lowest AIC were selected.
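As a rough illustration of the overall case detection probability and its exact interval described above, the sketch below computes the proportion and a Clopper-Pearson interval using only the Python standard library. It is not the authors' code (the analyses were run in R). The function names `binom_cdf` and `clopper_pearson` and the counts in the usage lines are hypothetical, and the interval bounds are found by simple bisection on the binomial distribution function rather than by a beta quantile routine.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total


def clopper_pearson(k, n, alpha=0.05, tol=1e-10):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k / n."""

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p grows, so bisect on [0, 1] for cdf_at(p) == target.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2, i.e. P(X <= k - 1 | p) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical counts for illustration only:
# 26 surveillance cases among 100 community cases.
k, n = 26, 100
lower, upper = clopper_pearson(k, n)
print(f"case detection probability {k / n:.2f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Working in log space keeps the binomial terms finite for samples of well over a thousand cases, which would overflow if the binomial coefficient were converted to a float directly.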
The fit of selected models was compared to the observed proportion of cases who attended surveillance hospitals at different distances (moving average over a distance window of 25 km). We estimated the proportion of the population living >30 km and >50 km from a surveillance hospital using gridded population density estimates of 100 × 100 m resolution [16].

Quantifying the Probability of Detecting Outbreaks

To quantify the capacity of the surveillance system to detect outbreaks of varying sizes, we calculated the probability that at least one case was detected:

$$\mathrm{Pr}_{\mathrm{outbreak1}} = 1 - (1 - \mathrm{Pr})^{s}$$

where $\mathrm{Pr}_{\mathrm{outbreak1}}$ is the outbreak detection probability based on a one-case threshold, $\mathrm{Pr}$ is the case detection probability, and $s$ is the outbreak size. This calculation assumes that the probability of detecting a sentinel case is independent of other cases. We used distance-specific case detection probabilities estimated by log-binomial regression and obtained confidence intervals of outbreak detection probabilities based on the 95% CI limits of case detection probabilities. We further estimated the size of the smallest outbreak that would be detected with ≥90% probability by distance from the surveillance hospital.

For emerging infectious diseases of global health importance, such as Nipah, severe acute respiratory syndrome, or avian influenza, a single detected case may be considered an outbreak. For other disease systems (e.g., endemic diseases or diseases for which differential diagnosis is difficult), an outbreak may be declared only after more than a single case is detected over a specified period of time and within specified geographic boundaries [20]. We can extend the framework to estimate the probability of identifying an outbreak with different outbreak thresholds applied, and we provide examples for outbreaks defined as detection of at least two cases or at least five cases.
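The outbreak detection calculation can be sketched as a binomial tail probability: if each of s cases is detected independently with probability Pr, an outbreak is detected when the number of detected cases reaches the chosen threshold. The following is a minimal Python sketch under that independence assumption, not the authors' R implementation. The function names and the `max_size` search limit are hypothetical, and the value 0.26 in the usage line is the abstract's estimate for severe neurological disease at 10 km, used purely as an example input.

```python
from math import comb


def outbreak_detection_probability(p_case, size, threshold=1):
    """P(at least `threshold` of `size` cases are detected), assuming each case
    is detected independently with probability `p_case`."""
    if not 0.0 <= p_case <= 1.0:
        raise ValueError("p_case must lie between 0 and 1")
    # Probability of detecting fewer than `threshold` cases (binomial lower tail).
    p_fewer = sum(
        comb(size, k) * p_case ** k * (1 - p_case) ** (size - k)
        for k in range(min(threshold, size + 1))
    )
    return 1.0 - p_fewer


def smallest_detectable_outbreak(p_case, target=0.9, threshold=1, max_size=10_000):
    """Smallest outbreak size detected with probability >= `target`,
    or None if no size up to `max_size` reaches it."""
    for size in range(threshold, max_size + 1):
        if outbreak_detection_probability(p_case, size, threshold) >= target:
            return size
    return None


# Example input: a three-case outbreak with a case detection probability of 0.26.
print(outbreak_detection_probability(0.26, 3))
print(smallest_detectable_outbreak(0.26))
```

With a threshold of one, the function reduces to one minus the probability that all s cases are missed; raising `threshold` to 2 or 5 gives the stricter outbreak definitions.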
We calculated the probability of detecting at least two cases ($\mathrm{Pr}_{\mathrm{outbreak2}}$) as one minus the probabilities of detecting no cases ($\mathrm{Pr}_{0}$) and exactly one case ($\mathrm{Pr}_{1}$):

$$\mathrm{Pr}_{\mathrm{outbreak2}} = 1 - \mathrm{Pr}_{0} - \mathrm{Pr}_{1}$$

Likewise, we estimated the probability of detecting at least five cases ($\mathrm{Pr}_{\mathrm{outbreak5}}$) as one minus the probabilities of detecting no cases ($\mathrm{Pr}_{0}$) and exactly one ($\mathrm{Pr}_{1}$), two ($\mathrm{Pr}_{2}$), three ($\mathrm{Pr}_{3}$), and four cases ($\mathrm{Pr}_{4}$):

$$\mathrm{Pr}_{\mathrm{outbreak5}} = 1 - \sum_{k=0}^{4} \mathrm{Pr}_{k}$$

Under the assumption that cases are detected independently, the probability of detecting exactly $k$ of $s$ cases is binomial, $\mathrm{Pr}_{k} = \binom{s}{k}\,\mathrm{Pr}^{k}\,(1 - \mathrm{Pr})^{s-k}$, with $\mathrm{Pr}$ the case detection probability and $s$ the outbreak size.

Assessing the Representativeness of Surveillance Cases

We investigated the representativeness of surveillance cases (sex, age, and socioeconomic group) by comparing the proportion of cases with a specific characteristic (and exact binomial confidence intervals) among community cases to the proportion of cases with that characteristic among surveillance cases. We quantified the absolute difference in proportions (proportion of cases with characteristic among surveillance cases minus proportion among community cases) with 95% CIs and p-values using bootstrapping (2,000 bootstrap iterations) [21].

Evaluating Alternative Surveillance Strategies

Based on the collected healthcare utilization data, we evaluated how the sensitivity and representativeness of a surveillance system may be improved by integrating other healthcare providers. We classified healthcare providers as (i) surveillance hospitals, (ii) other hospitals (government and private clinics), (iii) qualified private practitioners, and (iv) the informal sector (unqualified practitioners such as traditional healers, village doctors, homeopaths, and pharmacies). We estimated the proportion of cases attending each healthcare provider class, with exact binomial confidence intervals, and estimated outbreak detection probabilities based on proportions attending the surveillance hospital plus (i) other hospitals, (ii) qualified private practitioners, or (iii) informal healthcare providers.
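One way to read the comparison of surveillance strategies is as a change in the case detection probability when cases attending an additional provider class would also be captured. The Python sketch below illustrates that reading. It assumes that "surveillance hospital plus another provider class" means cases who attended at least one of the two, which the text does not state explicitly. The record layout, provider labels, and function names are all hypothetical, and the four case records are invented for illustration.

```python
def proportion_attending(cases, provider_classes):
    """Share of cases that attended at least one of the given provider classes.

    `cases` is a list in which each entry is the set of provider classes
    that one case reported visiting during the illness.
    """
    if not cases:
        raise ValueError("at least one case is required")
    wanted = set(provider_classes)
    attended = sum(1 for visited in cases if wanted & set(visited))
    return attended / len(cases)


def one_case_threshold_detection(p_case, size):
    """Probability that at least one of `size` independent cases is detected."""
    return 1.0 - (1.0 - p_case) ** size


# Invented records for illustration only (not survey data).
cases = [
    {"surveillance_hospital"},
    {"other_hospital"},
    {"informal", "other_hospital"},
    set(),  # sought no care
]

current = proportion_attending(cases, ["surveillance_hospital"])
expanded = proportion_attending(cases, ["surveillance_hospital", "other_hospital"])
for label, p in (("current", current), ("expanded", expanded)):
    print(label, p, one_case_threshold_detection(p, size=3))
```

Counting each case once, even when several provider classes were visited, avoids overstating the gain from adding providers that the same patients already use.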
Furthermore, we compared the proportion of cases with each characteristic (sex, age, and socioeconomic group) among community cases to the proportion among those attending each healthcare provider class and quantified absolute differences in proportions with 95% CIs and p-values using bootstrapping (2,000 bootstrap iterations).

All statistical analyses and graphics were implemented in the R computing environment; maps were created using QGIS software [17,22].
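The bootstrap comparison of proportions used in the representativeness analyses can be sketched as follows. This is a minimal Python illustration, not the authors' R code. The resampling scheme (resampling community cases with replacement and recomputing both proportions in each replicate) and the percentile interval are assumptions, since the text specifies only the number of iterations. The function names and the 40 case records are hypothetical, and p-values are not computed here.

```python
import random


def difference_in_proportions(cases):
    """Proportion with the characteristic among surveillance cases minus the
    proportion among all community cases.

    Each case is a tuple (has_characteristic, attended_surveillance_hospital).
    Returns None when the sample contains no surveillance cases.
    """
    surveillance = [c for c in cases if c[1]]
    if not surveillance:
        return None
    p_surveillance = sum(1 for c in surveillance if c[0]) / len(surveillance)
    p_community = sum(1 for c in cases if c[0]) / len(cases)
    return p_surveillance - p_community


def bootstrap_ci(cases, iterations=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the difference in proportions."""
    rng = random.Random(seed)
    n = len(cases)
    diffs = []
    for _ in range(iterations):
        # Resample community cases with replacement, keeping the sample size.
        sample = [cases[rng.randrange(n)] for _ in range(n)]
        diff = difference_in_proportions(sample)
        if diff is not None:
            diffs.append(diff)
    if not diffs:
        raise ValueError("no bootstrap replicate contained surveillance cases")
    diffs.sort()
    lower = diffs[int((alpha / 2) * len(diffs))]
    upper = diffs[int((1 - alpha / 2) * len(diffs)) - 1]
    return lower, upper


# Invented data for illustration only: 40 community cases, 20 of whom
# attended a surveillance hospital.
cases = ([(True, True)] * 10 + [(False, True)] * 10
         + [(True, False)] * 5 + [(False, False)] * 15)

print(difference_in_proportions(cases))
print(bootstrap_ci(cases))
```

Because surveillance cases are a subset of community cases, the two proportions are correlated; resampling whole community cases keeps that dependence in each replicate.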
In the Bangladesh example, the catchment areas of surveillance hospitals were first defined based on hospital records (e.g., areas where >50% or >75% of cases reside) [13,14]. Subsequently, small administrative units were chosen at random from within the catchment area, and all communities in the selected areas were surveyed. Cases in the community were identified based on lists of deaths in addition to community networking strategies (rural settings) or house-to-house surveys (urban settings). Information on symptoms (to establish case definitions), healthcare seeking behavior, and characteristics of cases was collected. In other settings, the exact survey procedures may vary according to the context. https://doi.org/10.1371/journal.pmed.1002218.g001 Data Collection Selecting study locations. The first step was to randomly select communities at differing distances from the surveillance hospitals. We specified catchment areas of selected hospitals based on hospital records and subsequently randomly selected small administrative units from which all communities were surveyed. Selection of communities could also be done through census data or using detailed population maps of the area. Identifying people with diseases in the selected community. Study teams visited the selected communities and identified cases that had had the disease of interest. The retrospective identification of severe disease cases in the community was based on syndromic criteria, used as a proxy for clinical case definitions that would be applied in healthcare facilities. The identification of such cases in the community is often the most problematic step, and the optimum strategy will depend on the local context, the severity of the disease, and the specificity of disease symptoms. Collecting information on healthcare seeking and personal case characteristics. 
To estimate case detection probabilities, identify biases in case statistics, and characterize the healthcare utilization behavior in the population, we needed information about the healthcare seeking and personal characteristics of cases. In particular, we needed to identify whether the cases attended a surveillance hospital. Such information was obtained during household visits of identified cases. To understand the impact of distance from the hospital, we approximated the locations of households by the central positions of the small administrative units. Alternatively, household locations could be recorded precisely using GPS devices. Selecting study locations. The first step was to randomly select communities at differing distances from the surveillance hospitals. We specified catchment areas of selected hospitals based on hospital records and subsequently randomly selected small administrative units from which all communities were surveyed. Selection of communities could also be done through census data or using detailed population maps of the area. Identifying people with diseases in the selected community. Study teams visited the selected communities and identified cases that had had the disease of interest. The retrospective identification of severe disease cases in the community was based on syndromic criteria, used as a proxy for clinical case definitions that would be applied in healthcare facilities. The identification of such cases in the community is often the most problematic step, and the optimum strategy will depend on the local context, the severity of the disease, and the specificity of disease symptoms. Collecting information on healthcare seeking and personal case characteristics. To estimate case detection probabilities, identify biases in case statistics, and characterize the healthcare utilization behavior in the population, we needed information about the healthcare seeking and personal characteristics of cases. 
In particular, we needed to identify whether the cases attended a surveillance hospital. Such information was obtained during household visits of identified cases. To understand the impact of distance from the hospital, we approximated the locations of households by the central positions of the small administrative units. Alternatively, household locations could be recorded precisely using GPS devices. Evaluation of the Surveillance System Quantifying the probability of detecting a case. We estimated the case detection probability as the proportion of cases who reportedly attended a surveillance hospital among all cases identified in the community. We further assessed how this probability changed with distance from the surveillance hospital. Quantifying the probability of detecting outbreaks. We subsequently used the estimated case detection probabilities to quantify the capacity of the surveillance system to identify disease outbreaks. We estimated outbreak detection probabilities for varying outbreak sizes and for outbreaks occurring at different distances from surveillance hospitals. Assessing the representativeness of detected cases. We evaluated the representativeness of detected cases by estimating the difference between case statistics (proportions of specific case characteristics) based on all cases in the community and based on identified cases who attended the surveillance hospital. The investigated characteristics included sex, age, and socioeconomic status. Assessing alternative surveillance strategies. To investigate how sensitivity and representativeness of the surveillance system could be improved by integrating other healthcare providers, we applied the evaluation procedures as described above to other healthcare provider types. Quantifying the probability of detecting a case. We estimated the case detection probability as the proportion of cases who reportedly attended a surveillance hospital among all cases identified in the community. 
We further assessed how this probability changed with distance from the surveillance hospital. Quantifying the probability of detecting outbreaks. We subsequently used the estimated case detection probabilities to quantify the capacity of the surveillance system to identify disease outbreaks. We estimated outbreak detection probabilities for varying outbreak sizes and for outbreaks occurring at different distances from surveillance hospitals. Assessing the representativeness of detected cases. We evaluated the representativeness of detected cases by estimating the difference between case statistics (proportions of specific case characteristics) based on all cases in the community and based on identified cases who attended the surveillance hospital. The investigated characteristics included sex, age, and socioeconomic status. Assessing alternative surveillance strategies. To investigate how sensitivity and representativeness of the surveillance system could be improved by integrating other healthcare providers, we applied the evaluation procedures as described above to other healthcare provider types. Example Using Severe Neurological Infectious Diseases and Fatal Respiratory Infectious Diseases in Bangladesh We demonstrate the application of the proposed evaluation strategy by using it to assess the capacity of hospital-based surveillance for severe infectious diseases in Bangladesh, which is based on tertiary care hospitals located throughout the country. We used data from two surveys carried out in catchment areas of some of these hospitals that investigated the healthcare utilization behavior of individuals with severe neurological infectious disease or fatal respiratory infectious disease (Fig 2A) [14,15]. These disease types are of great public health relevance in Bangladesh (e.g., Japanese encephalitis and influenza) but also represent symptoms typical of other emerging infectious diseases (e.g., Nipah and severe acute respiratory syndrome). 
A first survey collected data between 10 June 2008 and 30 March 2009 about cases with symptoms of severe neurological infection that occurred within the previous 12 mo in 60 small administrative units (mean population size of 28,000 people) in the catchment areas of three surveillance hospitals [14]. A second survey collected data between 3 April 2012 and 22 February 2013 about acute respiratory infection (ARI)–related deaths that occurred within the previous 24 mo in 22 administrative units in the catchment areas of 11 surveillance hospitals [15]. We considered ARI-related deaths as a proxy for respiratory disease of sufficient severity to require medical attention. The surveillance hospital in Dhaka City was excluded from the original studies because of the difficulty of defining the catchment area (a step necessary for the original study purpose), as people nationwide seek medical care in Dhaka. The surveys followed procedures as previously described and summarized below [13,14]. Characteristics of the study population are described in Fig. A in S1 Text. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. Location of administrative units and case detection probabilities by distance. (A) Location of surveillance hospitals and administrative units. The hospital in Dhaka City was excluded from the original studies. (B) Population density map of Bangladesh [16]. Sixty-eight percent of the population in Bangladesh lives >30 km from a surveillance hospital (including the Dhaka surveillance hospital), a distance at which case and outbreak detection probabilities are low. (C) Probability of surveillance case detection by distance. The observed probability was calculated as a moving average over a 25 km distance window. Case detection probabilities were estimated using log-binomial regression models including distance as an explanatory variable. 
https://doi.org/10.1371/journal.pmed.1002218.g002 Community Healthcare Utilization Surveys The catchment areas of selected hospitals were first specified based on hospital records (S1 Text). Small administrative units (mean population of 28,000 people) were subsequently selected randomly within the catchment areas, and all communities in the selected areas surveyed. The identification of cases in selected communities was based on social structures, i.e., cases were identified by visiting public meeting points, such as mosques, markets, or tea-stalls, where health problems in the community are often publicly discussed. Cases were subsequently confirmed by household visits. In urban areas, house-to-house surveys were conducted to compensate for less pronounced community structures. Additional fatal respiratory infectious disease cases were identified through lists of deaths provided by administrative officers. For both disease types, the identification of cases was based on syndromic criteria. We defined severe neurological infectious disease as fever with altered mental status for >6 h or with unconsciousness for ≥1 h, or fever with altered mental status, unconsciousness, or a new onset seizure that resulted in death. Fatal respiratory infectious disease (ARI-related death) was defined as having any two of the following symptoms in the 30 d prior to death: sudden onset of fever, cough, breathing difficulty, feeding difficulty, or runny nose. Deaths in children aged <5 y were also classified as ARI-related deaths if there was a sudden onset of breathing difficulty in the 30 d prior to death. During surveys, information was collected on healthcare utilization behavior and personal characteristics of identified severe neurological and fatal respiratory disease cases. Cases or their household members were asked whether the case visited the surveillance hospital or any other healthcare provider, including other nonlocal hospitals, during his/her illness. 
Further, information on sex, age, socioeconomic status, and geographic location of households of cases was collected. Classification of Case Characteristics We defined “community cases” as all severe neurological or fatal respiratory disease cases identified during community surveys (whether they attended a surveillance hospital or not) and “surveillance cases” as the subset of community cases who reportedly attended a surveillance hospital. For each case identified in community surveys, we identified whether they attended their nearest surveillance hospital. We then estimated the distance to that surveillance hospital as the distance between the residence administrative unit centroid and that specific surveillance hospital using QGIS [17]. Age was categorized as <5, 5–14, 15–59, and ≥60 y. A socioeconomic status index was generated by principal component analysis based on household assets (electricity, working television, bicycle, motorcycle, sewing machine, mobile phone) and categorized into tertiles (lowest, middle, and highest) [18]. In sensitivity analyses, we explored the use of continuous age and socioeconomic status classified into quintiles (S1 Text). Socioeconomic status was missing for 45 of 1,633 fatal respiratory disease cases, who were excluded from the analysis where this information was required. Three fatal respiratory disease cases were excluded from all analyses due to missing healthcare seeking information. Quantifying the Probability of Detecting Cases We estimated the disease-specific case detection probability as the proportion of cases who reportedly sought care at a surveillance hospital among all cases identified during community surveys (number of surveillance cases/number of community cases) and computed 95% confidence intervals (95% CIs) based on the Clopper-Pearson exact method [19]. 
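The exact (Clopper-Pearson) interval mentioned above can be illustrated with a short sketch. This is not the authors' code (their analyses were run in R); it is a minimal standard-library Python version, and the function names `binom_cdf` and `clopper_pearson` are hypothetical. It finds each limit by bisection on the binomial distribution function.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for j in range(k + 1):
        # Log space keeps the binomial coefficient from overflowing for large n.
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def _bisect(is_below, iterations=60):
    """Locate the p in [0, 1] where is_below(p) switches from True to False."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if is_below(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence limits for k detected cases out of n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Lower limit: the p at which P(X >= k) rises to alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p) < alpha / 2)
    # Upper limit: the p at which P(X <= k) falls to alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(76, 426)` uses the counts reported in the Results for severe neurological disease and should give limits close to the published 14%–22%.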
We quantified case detection probabilities by distance from a surveillance hospital using log-binomial regression analysis separately for severe neurological and fatal respiratory disease cases. We further investigated more complex functional forms of distance in log-binomial regression models. We fitted models with polynomial terms up to the fifth degree and models with basic splines with knots at various positions (between 20 and 50 km distance). Model fit was compared based on the Akaike information criterion (AIC), and the models with the lowest AIC were selected. The fit of selected models was compared to the observed proportion of cases who attended surveillance hospitals at different distances (moving average over a distance window of 25 km). We estimated the proportion of the population living >30 km and >50 km from a surveillance hospital using gridded population density estimates of 100 × 100 m resolution [16]. Quantifying the Probability of Detecting Outbreaks To quantify the capacity of the surveillance system to detect outbreaks of varying sizes, we calculated the probability that at least one case was detected:

$$\mathrm{Pr}_{\mathrm{outbreak1}} = 1 - (1 - \mathrm{Pr})^{s}$$

where Proutbreak1 is the outbreak detection probability based on a one-case threshold, Pr is the case detection probability, and s is the outbreak size. This calculation assumes that the probability of detecting a sentinel case is independent of other cases. We used distance-specific case detection probabilities estimated by log-binomial regression and obtained confidence intervals of outbreak detection probabilities based on the 95% CI limits of case detection probabilities. We further estimated the size of the smallest outbreak that would be detected with ≥90% probability by distance from the surveillance hospital. For emerging infectious diseases of global health importance, such as Nipah, severe acute respiratory syndrome, or avian influenza, a single detected case may be considered an outbreak.
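The outbreak detection calculation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names and the `threshold` parameter are hypothetical, and the only assumption is the one stated in the text, that cases are detected independently with the same probability. A threshold of 1 corresponds to the one-case definition; larger thresholds subtract the binomial probabilities of detecting fewer cases than the threshold.

```python
from math import comb


def outbreak_detection_probability(p_case, size, threshold=1):
    """P(at least `threshold` of `size` cases are detected).

    Assumes each case is detected independently with probability p_case.
    """
    if not 0.0 <= p_case <= 1.0:
        raise ValueError("p_case must lie in [0, 1]")
    if size < 0 or threshold < 1:
        raise ValueError("size must be >= 0 and threshold >= 1")
    # Sum the binomial probabilities of detecting 0 .. threshold-1 cases.
    p_fewer = sum(
        comb(size, j) * p_case ** j * (1.0 - p_case) ** (size - j)
        for j in range(min(threshold, size + 1))
    )
    return 1.0 - p_fewer


def smallest_detectable_outbreak(p_case, threshold=1, target=0.9,
                                 max_size=100_000):
    """Smallest outbreak size detected with probability >= target.

    Returns None if no size up to max_size reaches the target.
    """
    for size in range(threshold, max_size + 1):
        if outbreak_detection_probability(p_case, size, threshold) >= target:
            return size
    return None
```

As a usage example, with the rounded 10 km case detection estimates reported in the Results (0.26 for severe neurological disease, 0.18 for fatal respiratory disease), `smallest_detectable_outbreak(0.26)` returns 8 and `smallest_detectable_outbreak(0.18)` returns 12, matching the published one-case-threshold figures. Results computed from rounded inputs can differ by a case or so from the published values at other distances and thresholds.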
For other disease systems (e.g., endemic diseases or diseases for which differential diagnosis is difficult), an outbreak may be declared only after more than a single case is detected over a specified period of time and within specified geographic boundaries [20]. We can extend the framework to estimate the probability of identifying an outbreak with different outbreak thresholds applied, and we provide examples for outbreaks defined as detection of at least two cases or at least five cases. We calculated the probability of detecting at least two cases (Proutbreak2) as one minus the probability of detecting no cases (Pr0) and exactly one case (Pr1):

$$\mathrm{Pr}_{\mathrm{outbreak2}} = 1 - (\mathrm{Pr}_{0} + \mathrm{Pr}_{1})$$

Likewise, we estimated the probability of detecting at least five cases (Proutbreak5) as one minus the probability of detecting no cases (Pr0) and exactly one (Pr1), two (Pr2), three (Pr3), and four cases (Pr4):

$$\mathrm{Pr}_{\mathrm{outbreak5}} = 1 - (\mathrm{Pr}_{0} + \mathrm{Pr}_{1} + \mathrm{Pr}_{2} + \mathrm{Pr}_{3} + \mathrm{Pr}_{4})$$

Under the same independence assumption, the number of detected cases in an outbreak of size s is binomially distributed, so the probability of detecting exactly k cases is $\mathrm{Pr}_{k} = \binom{s}{k}\,\mathrm{Pr}^{k}\,(1 - \mathrm{Pr})^{s-k}$, where Pr is the case detection probability.

Assessing the Representativeness of Surveillance Cases We investigated the representativeness of surveillance cases (sex, age, and socioeconomic group) by comparing the proportion of cases with a specific characteristic (and exact binomial confidence intervals) among community cases to the proportion of cases with that characteristic among surveillance cases. We quantified the absolute difference in proportions (proportion of cases with characteristic among surveillance cases minus proportion among community cases) with 95% CIs and p-values using bootstrapping (2,000 bootstrap iterations) [21]. Evaluating Alternative Surveillance Strategies Based on the collected healthcare utilization data, we evaluated how the sensitivity and representativeness of a surveillance system may be improved by integrating other healthcare providers.
We classified healthcare providers as (i) surveillance hospitals, (ii) other hospitals (government and private clinics), (iii) qualified private practitioners, and (iv) the informal sector (unqualified practitioners such as traditional healers, village doctors, homeopaths, and pharmacies). We estimated the proportion of cases attending each healthcare provider class, with exact binomial confidence intervals, and estimated outbreak detection probabilities based on proportions attending the surveillance hospital plus (i) other hospitals, (ii) qualified private practitioners, or (iii) informal healthcare providers. Furthermore, we compared the proportion of cases with each characteristic (sex, age, and socioeconomic group) among community cases to the proportion among those attending each healthcare provider class and quantified absolute differences in proportions with 95% CIs and p-values using bootstrapping (2,000 bootstrap iterations). All statistical analyses and graphics were implemented in the R computing environment; maps were created using QGIS software [17,22]. Results The studied communities were located within 95 km (severe neurological infectious disease) and 62 km (fatal respiratory infectious disease) of a surveillance hospital. In these communities, 76 of 426 severe neurological disease cases (18%, 95% CI 14%–22%) and 234 of 1,630 fatal respiratory disease cases (14%, 95% CI 13%–16%) attended a surveillance hospital. Adjusting for distance, the case detection probability was nearly twice as high among severe neurological disease cases than among fatal respiratory disease cases (risk ratio 1.8, 95% CI 1.4–2.3; p < 0.001). At 10 km distance, an estimated 26% (95% CI 18%–33%) of severe neurological disease cases and 18% (95% CI 16%–21%) of fatal respiratory disease cases were detected by the hospital-based surveillance. 
The detection probability decreased with distance from the surveillance hospital, and the decline was faster for fatal respiratory disease than for severe neurological disease. A 10 km distance increase resulted in a 12% (95% CI 4%–19%; p = 0.003) relative reduction in case detection probability for severe neurological disease but a 36% (95% CI 29%–43%; p < 0.001) relative reduction for fatal respiratory disease (Fig 2C). Including more complex functional forms of distance in the log-binomial regression models did not improve model fit based on AIC (Table A and Figs. B and C in S1 Text). The probability of detecting an outbreak of exactly three cases (if a single detected case was considered an outbreak) dropped below 50% at distances greater than 26 km for severe neurological disease and at distances greater than 7 km for fatal respiratory disease (Fig 3A). Fig 3B and 3C show the minimum number of cases required for surveillance to detect outbreaks with a probability of ≥90% if different outbreak thresholds are applied. For outbreaks defined as detection of at least one case, we found that an outbreak of fatal respiratory disease required 12 cases (95% CI 11–13) to be detected with 90% probability at 10 km from a surveillance hospital, but 30 cases (95% CI 24–39) to be detected at 30 km. In contrast, the impact of distance on the outbreak size requirement was much more limited for severe neurological disease: eight cases (95% CI 6–12) at 10 km and 11 cases (95% CI 9–14) at 30 km. For outbreaks defined as detection of at least two cases, 14 severe neurological disease cases (95% CI 11–20) and 20 fatal respiratory disease cases (95% CI 18–23) would be necessary for an outbreak to be detected at 10 km distance, and 19 severe neurological disease cases (95% CI 15–24) and 51 fatal respiratory disease cases (95% CI 41–66) at 30 km. 
The necessary outbreak sizes increased further when a five-case threshold was applied, so that 28 severe neurological disease cases (95% CI 21–39) and 39 fatal respiratory disease cases (95% CI 35–44) would need to occur for an outbreak to be detected at 10 km distance, and 36 (95% CI 30–46) and 97 (95% CI 79–128) cases, respectively, at 30 km. Fig 3. Outbreak detection capacity. (A) Probability of detecting outbreaks with exactly three cases of severe neurological or fatal respiratory disease by distance from surveillance hospital if a single detected case is considered an outbreak. (B) Smallest size of severe neurological disease outbreak that would be detected with ≥90% probability by distance from surveillance hospital for outbreak thresholds of at least one, two, or five detected cases. (C) Smallest size of fatal respiratory disease outbreak that would be detected with ≥90% probability by distance from surveillance hospital for outbreak thresholds of at least one, two, or five detected cases. https://doi.org/10.1371/journal.pmed.1002218.g003 Surveillance hospital attendance among community cases varied by case characteristics, sometimes leading to biased disease statistics among surveillance cases (Table B in S1 Text). For severe neurological disease, individuals aged <5 y represented 48% of community cases but only 29% of surveillance cases (p < 0.001). Additionally, the proportion of cases in the lowest socioeconomic group was lower among surveillance cases than among community cases (43% versus 57%; p = 0.012), while the proportion of individuals aged 15–59 y was higher (43% versus 29%; p = 0.005) (Fig 4A).
For fatal respiratory disease, the proportion of individuals aged ≥60 y (47% versus 62%; p < 0.001) was lower among surveillance cases than among community cases, while the proportions of individuals aged <5 y (24% versus 18%; p = 0.020), individuals aged 15–59 y (27% versus 18%; p < 0.001), and cases in the highest socioeconomic group (43% versus 37%; p = 0.022) were higher (Fig 4B). We observed a slight difference in the proportion of females for fatal respiratory disease (34% among surveillance cases versus 38% among community cases; p = 0.108), but not for severe neurological disease (39% versus 40%; p = 0.861). Results were consistent in sensitivity analyses with age as a continuous variable and socioeconomic status classified into quintiles (Figs. D and E in S1 Text). Fig 4. Representativeness of surveillance cases. Comparison of case statistics (proportion of cases with a characteristic) estimated for community cases to those estimated for surveillance cases for (A) severe neurological infectious disease and (B) fatal respiratory infectious disease. Significant differences (bootstrap p ≤ 0.05) are indicated with an asterisk. SES, socioeconomic status. https://doi.org/10.1371/journal.pmed.1002218.g004 A substantial proportion of cases (severe neurological disease 42% [95% CI 38%–47%]; fatal respiratory disease 26% [95% CI 24%–28%]) visited multiple healthcare providers during their illness. Forty-eight percent (95% CI 44%–53%) of severe neurological disease cases and 31% (95% CI 29%–34%) of fatal respiratory disease cases attended any hospital, including surveillance hospitals (Fig 5). Including other hospitals that were attended by cases in the surveillance system could have increased the overall case detection probability by 31% (absolute increase) for severe neurological disease cases and 17% for fatal respiratory disease cases.
The capacity to detect outbreaks would have increased, so that outbreaks containing four severe neurological or eight fatal respiratory disease cases would have been detected with ≥90% probability for any distance in the range 0–40 km from the original surveillance hospital, compared to 13 and 47 cases, respectively, with the current system (Fig. F in S1 Text). However, since individuals who attended any hospital had similar characteristics in terms of sex, age, and socioeconomic status as those attending surveillance hospitals (Fig. G in S1 Text), this expansion would not have increased disease detection in key groups such as the lowest socioeconomic group. Only with the informal sector incorporated in the surveillance system would cases in such groups be detected. Fig 5. Attendance at surveillance hospitals and alternative healthcare providers. Proportion of (A) severe neurological and (B) fatal respiratory disease cases attending surveillance hospitals and other healthcare providers. Cases may attend several different healthcare providers during their sickness. Cases who attended a surveillance hospital at any time are indicated with diagonal hatching. https://doi.org/10.1371/journal.pmed.1002218.g005 Discussion We described an analytic approach for evaluating the sensitivity and representativeness of hospital-based surveillance systems and applied it to surveillance for severe neurological diseases and fatal respiratory infectious diseases in Bangladesh. We quantified the proportion of cases detected and the probability that the surveillance system would detect different sized outbreaks by distance from the surveillance hospital. Finally, we characterized biases in surveillance statistics and identified potential improvements to the surveillance platform.
We estimated that approximately one-quarter of severe neurological disease cases and one-fifth of fatal respiratory disease cases occurring 10 km from a surveillance hospital would be detected with current surveillance. The proportion of cases attending a surveillance hospital declined significantly with increasing distance between individuals’ residence and the surveillance hospital, substantially faster for fatal respiratory disease than for severe neurological disease. These low detection probabilities mean that hospital-based surveillance in Bangladesh (like in most other resource-poor countries presumably) would likely miss a high proportion of single-case public health events. Of greater relevance is that surveillance system capacity to detect outbreaks and detection probabilities increased substantially with the number of cases. The required number of cases to detect outbreaks with high probability varied with disease type and distance from the surveillance hospital. It could be as low as about ten cases if the outbreak occurred <10 km from the surveillance hospital but increased quickly with distance for fatal respiratory disease. For outbreaks defined as a single detected case, we found that more than half of outbreaks with ten cases of fatal respiratory disease would be missed if the outbreak occurred >32 km from the hospital. Such detailed quantification of outbreak detection probability is essential to ascertain the likelihood that an emerging threat can be detected early enough to be contained [23]. In some circumstances, authorities may have to wait until more than a single case is detected to recognize that an outbreak is occurring. In particular, difficult differential diagnoses and lack of appropriate diagnostic tests mean that only when a number of cases are detected from the same area and over a short time frame will an outbreak be identified and further investigations conducted. 
In addition, where a low background level of transmission is expected (such as with endemic diseases), public health authorities may wait until a particular threshold is exceeded before declaring an outbreak. In both of these scenarios, where multiple cases need to be detected by the hospital before an outbreak is recognized, the optimal number of detected cases and their spatial and temporal separation will depend on the disease system. We can incorporate this information into our flexible framework and provide examples where we calculate the size an outbreak needs to be for scenarios where at least two or five cases need to be detected (Fig 3B and 3C). In particular, this demonstrates that if an outbreak is identified only once five cases are detected at the surveillance hospital, the size of the outbreak would have to be substantially larger (e.g., nearly 100 total cases of a fatal respiratory disease at 30 km from a surveillance hospital) for there to be a 90% chance of an outbreak being identified. This highlights the possibility that, by the time an outbreak reaches sufficient size to be detected by the system, outbreak control measures may be much less effective at controlling spread. Thresholds for case counts that trigger an outbreak response should be crafted taking this possibility into account. Low detection probabilities for outbreaks that occur far from surveillance hospitals are an important concern because pathogens with high case fatality such as Japanese encephalitis and Nipah virus are nearly exclusively found in rural communities in Bangladesh [24,25], and these communities are usually located far from surveillance hospitals. Rural environments are also considered to be at highest risk for the emergence of novel pathogens [26,27]. 
Population distribution maps suggest that 68% of the population in Bangladesh live in communities >30 km from a surveillance hospital (representing 108 million individuals) and 40% live >50 km from a surveillance hospital (representing 63 million people) (Fig 2B). Strengthening healthcare-based surveillance in these areas should be a priority, and cost-effective approaches to achieving surveillance targets need to be identified. There is increasing recognition of the value of novel data sources to improve the sensitivity of infectious disease surveillance, some of which can provide crucial information in remote areas [20]. Novel approaches include surveillance for media reports of disease clusters, as used for various infectious diseases in Bangladesh [12,28], and training of local drug sellers to recognize and report disease symptoms, as rolled out nationally to enhance tuberculosis surveillance in Ghana [29]. Other surveillance data streams, such as monitoring over-the-counter medication sales, telephone triage, and web-based queries, have been successfully integrated in surveillance systems in resource-rich settings [30]. We found that cases attending surveillance hospitals were not necessarily representative of all cases in the community. In particular, the youngest severe neurological disease cases and the oldest fatal respiratory disease cases were less likely to attend surveillance hospitals, and attendance was also lower among cases in the lowest socioeconomic group. Similar variation in hospital attendance has been reported in other resource-poor settings [6,8,9], indicating that hospital-based surveillance in these countries may have comparable limitations. Disease statistics obtained through hospital-based surveillance have to be interpreted in the light of detected biases, and correction factors may need to be applied. 
For example, underestimating severe neurological disease among young children may mislead any future Japanese encephalitis vaccination strategy [31,32]. Differential surveillance hospital attendance may also influence the capacity to detect emerging infections, such as the avian influenza A (H7N9) virus that emerged in 2013 in China with observed cases mainly among elderly men [33]. Overall, access to appropriate care was poor—over 30% of community cases with severe disease or who died in our study never saw a qualified provider. Such low access is a common problem in low-income settings and means that a large proportion of the population, and particularly subgroups that are potentially at highest need, do not receive the required medical attention [6–9]. For example, difficulties accessing qualified healthcare providers for elderly people, who are often at greatest risk of respiratory disease, can have severe consequences for the outcome of disease. Previous studies have demonstrated that accessibility to healthcare is a significant predictor of morbidity and mortality among elderly individuals with respiratory disease [34]. The study showed that healthcare utilization behavior varied by disease type, which may be due to different characteristics of cases such as their age, socioeconomic status, and geographic location (Fig. A in S1 Text). The majority of fatal respiratory disease cases were ≥60 y old and may have faced limitations in mobility; moreover, rapid progression of disease to death may have prevented cases in this age group from seeking appropriate care. Cases and their family members in general may have also perceived neurological symptoms as more severe, resulting in higher motivation to attend a qualified healthcare provider [7]. We evaluated potential improvements of surveillance by analyzing healthcare seeking behavior among cases identified in communities. 
While the majority of individuals did seek care, much of this was in the informal sector, which cannot easily be incorporated into surveillance activity. Nevertheless, including other hospitals attended by cases in the surveillance system (the exact location and number of these hospitals was unfortunately not identified during surveys) would double case detection probabilities and allow detection of medium-sized outbreaks (<10 cases) in a wider geographic area. However, in the case of Bangladesh, such extension is likely to be prohibitively expensive. Mapping other hospitals in Bangladesh that may serve as surveillance sites would allow testing of various surveillance scenarios to identify the optimal location of surveillance sites while keeping the same total number or to quantify the number of sites needed to achieve a target surveillance coverage [35]. Many emerging infectious diseases originate as spillover infections of zoonotic diseases into the human population [36]. Therefore, mapping the occurrence of relevant zoonotic diseases (e.g., avian influenza) and combining such maps with the estimated outbreak detection probabilities would allow highlighting of surveillance gaps for particular types of emerging infectious diseases. The capacity of surveillance systems to detect outbreaks will depend not only on the probability that a case attends a surveillance hospital, but also on whether the case undergoes confirmatory laboratory testing and is subsequently reported through the surveillance system by the hospital. Here we assumed a “best-case scenario” with a fully functional surveillance system at the hospital level, where each case who attends the surveillance hospital is ultimately recognized and confirmed as a case. 
Case detection at the hospital may however be incomplete, since case definitions at hospitals may differ from syndromic definitions, a surveillance sampling frame may be applied, or resources and trained personnel for the diagnosis and reporting of cases may be limited [37]. The calculation of case and outbreak detection probabilities may be adjusted for misdiagnosis and underreporting at hospitals if such information is available. We further assumed complete detection of cases in communities during surveys. Although a few cases may have been missed, this assumption is justified as we investigated severe disease conditions that are easily remembered by family and community members. Moreover, survey procedures combining interviews of key informants and house-to-house visits were specifically designed to capture near-complete case information. Further, any missed cases are unlikely to impact our estimates, as such impacts would occur only if there was differential healthcare seeking between those detected and those missed. We investigated spatial differences in hospital attendance based on the straight-line distance of communities from the surveillance hospital. If available, other distance measures such as travel distance or travel time may provide a more accurate indicator of distance from the surveillance hospital. In some cases, these measures may strongly vary with the season, and it would be interesting to explore how that may impact the probability of detecting an outbreak. We assumed that cases did not visit other surveillance hospitals than the catchment hospital. Given the poor road infrastructure in the country, it would be very unusual to travel to a tertiary care hospital that was not the closest one. It is possible that some individuals traveled to Dhaka; however, these are likely to be wealthier individuals who would visit small private healthcare facilities that are not part of the surveillance network. 
The surveillance hospital in Dhaka was not included in our study. This is unlikely to have biased our assessment of the performance of the surveillance system outside the capital. Indeed, this would introduce a bias only under the unlikely scenario that many cases in our study who did not attend the nearest surveillance hospital (and were therefore not captured there) instead attended the surveillance hospital in Dhaka (and were captured there). Surveillance system performance in the capital city may however differ from elsewhere, and a comprehensive assessment of the national surveillance system would therefore have to include Dhaka. Moreover, hospital-based surveillance is only one surveillance type in Bangladesh, and other data sources need to be considered to assess the country’s overall capacity to detect public health events. The described methodology is applicable to assessing surveillance for other severe diseases in resource-poor settings, keeping in mind practical constraints. Conducting community surveys may be labor intensive, time consuming, and expensive depending on the setting and may be particularly challenging in densely populated areas such as Dhaka. Nonetheless, such surveys are valuable tools for obtaining external reference data and simultaneously assess heterogeneities in healthcare access. The effectiveness of community networking may depend on the social structures in the study area; where social links are weaker (e.g., in urban areas), house-to-house surveys, even though more labor intensive, may be more suitable for the identification of cases in the community. The proposed strategy is valid for diseases of sufficient severity to require medical attention and to be remembered by cases and family members. The approach is syndromic (i.e., disease types are classified based on a set of symptoms), and the classification specificity may vary by disease. 
In conclusion, this study allowed us to quantify the sensitivity and representativeness of hospital-based surveillance and to identify weaknesses, particularly in detecting small- to medium-sized outbreaks in remote areas. These findings highlight difficulties that low-middle-income countries may have in meeting International Health Regulations requirements, despite considerable investment in hospital-based surveillance platforms. Supporting Information S1 Dataset. Healthcare utilization by distance from surveillance hospital. https://doi.org/10.1371/journal.pmed.1002218.s001 (XLSX) S2 Dataset. Healthcare utilization by case characteristics. https://doi.org/10.1371/journal.pmed.1002218.s002 (XLSX) S1 Text. Additional results and figures. https://doi.org/10.1371/journal.pmed.1002218.s003 (PDF) Acknowledgments The International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b) acknowledges with gratitude the commitment of the US Centers for Disease Control and Prevention to its research efforts. icddr,b is also grateful to the governments of Bangladesh, Canada, Sweden, and the UK for providing core/unrestricted support.
Sick Children Crying for Help: Fostering Adverse Event Reports doi: 10.1371/journal.pmed.1002216 pmid: 28095409
In this week’s PLOS Medicine, Philippa Rees and a team of patient safety experts delved into a decade of reports to the United Kingdom National Reporting and Learning System to cull more than 2,000 events occurring in sick children [1]. Their retrospective finding of key areas of outpatient harm and risk points to a number of specific areas for improvement. The study is valuable both for the specific findings, lessons, and insights and also for encouraging us to grapple with the value of such reporting systems, analysis of collected reports, and ways of better leveraging findings to prevent harm in the future. Representative of the state of the art in incident reporting and review, there is cause for celebration as well as concern—celebration because assembling a large number of case reports allows us to step back and get beyond these single/isolated/disconnected reports to see a bigger picture, and concern because one has to worry that two decades of such efforts to collect, classify, draw conclusions, and effectively stimulate corrective changes from adverse events has resulted in a paucity of measurable benefits, with considerable wasted effort and lost opportunities for meaningful learning and improvement [2–4]. While not a pediatrician, as a primary care physician and patient safety researcher I have spent considerable time both submitting and reviewing safety reports [5,6]. At one point, I had filed more error and adverse drug reactions reports than all the other physicians at my public hospital in Chicago combined, making me either the institution’s most dangerous prescriber or its most diligent reporter [7]. Hoping it is more the latter, it is sobering to consider how infrequently adverse events and errors are being reported. Not only are we missing many adverse events, but those being reported likely are not perfectly representative of all errors that are occurring [8]. 
Thus, it is neither advisable nor fair to use report rates (or changes in rates) as measures of the epidemiology of quality or improvement efforts. For the same reason, I would caution readers to be wary of accepting the authors’ opening suggestion that we can correlate these reported safety issues with poorer outcome measures or higher mortality rates of the United Kingdom relative to other European countries. But what about the reports themselves, this wealth of rich case examples of actual problems transmitted straight from the front lines? We certainly have a debt to those who took the time and effort and perhaps even took risks to report these adverse events and owe them (as well as patients who may have been harmed) meaningful follow-up, learning, feedback, sharing, and improvement. In-depth examination both horizontally (to connect the reports with each other to see aggregate data and trends) and vertically (to dig deeper, delving into the rich free-text narrative details that accompany a good report) as the authors have done is needed, yet is more often the exception than the rule. While I am not aware of any studies that have quantified the extent of this failure, this waste, and these wasted opportunities, I suspect it is huge. A key aspect of meaningfully bringing together these reports is classification of the events. Thinking critically about this step is necessary to avoid compounding empty collecting exercises with pointless taxonomy counting rituals. Analysis should breathe additional life into safety reports, rather than simply “putting them to bed.” What does a vision of such breathing life look and feel like, and to what extent does the Rees et al. work give us a glimpse of it? Two key ingredients in my view are (1) careful, timely review and contemplation of report narratives and (2) envisioning the ways such an event could happen elsewhere with an eye to error-proofing redesign.
Basic quality improvement conceptual tools such as a) the “5 Whys” (digging progressively deeper by asking “why” five times to get to the root of underlying contributing causes) [9], b) distinguishing “special cause” from “common cause” (a statistical process control tool begging for more use in health care), c) minimizing “tampering” (well-meaning attempts to make changes that can introduce more variation and quality problems), and d) avoiding “suboptimization” (another improvement pitfall whereby changes are recommended or made that may help a narrowly conceived problem but create a more complex and dysfunctional system overall) need to be applied more regularly and rigorously [10,11]. With this perspective, how does the study by Rees and colleagues help move us forward? One way is by shining a light on two somewhat new or mostly untapped venues for collecting errors and adverse event reports—community pharmacies and telephone triage call centers. These two settings led the pack in reports of issues, which originated from and illustrated a number of vulnerabilities of special relevance to a pediatric population—particularly, special dosing/dispensing considerations (often requiring individualized medication compounding) and delays in recognizing septicemia. Also noteworthy was the finding that diagnostic delays had the highest burden of harm, something we have also argued and seen in malpractice cases [12]. Diagnostic errors are relatively infrequently reported to adverse event reporting systems, so finding substantial numbers here suggests this is the tip of a larger iceberg [13]. In their list of contributing causes, one item jarringly stands out both for its frequency and disharmony with a systems and just culture perspective: “failure to follow protocol” [14]. 
Were such reports perhaps more akin to incident reports submitted by supervisors to “write up” an employee who may have committed an error as documentation for the employee’s personnel record and as a warning? To the extent these reports were grounded in a retributive workplace culture, rather than a more ideal model of frontline staff submitting reports of errors or problems that they had seen, been involved with, or personally committed, based on caring deeply about the need to share these widely to help others avoid such pitfalls, these reports fulfill a less noble and valuable function. Lest we throw out the babies with the bathwater, we need to listen carefully to these incidents, analyze them, and act on them better (Table 1). Safety experts are debating and rethinking incident reporting on multiple continents [4,15]. Meanwhile, there is much to be learned from these reports, and the paper by Rees and colleagues just scratches the surface. Quality and learning and improvement from incident reporting need to be supported and enhanced at every step, but currently, there is a paucity of resources, responsiveness, and responsibility to do this well [16,17]. For these children and their parents, each institution should use these reports to ask and examine the questions: is this happening here; if so, how often; and how can we work at the front line and at the larger health authority level to make sure such incidents are less likely in the future [16]? Table 1. Framework for Nurturing Reporting: Opportunities/Imperatives for Incident Reporting Improvement. https://doi.org/10.1371/journal.pmed.1002216.t001
Socioeconomic Inequalities in Body Mass Index across Adulthood: Coordinated Analyses of Individual Participant Data from Three British Birth Cohort Studies Initiated in 1946, 1958 and 1970 doi: 10.1371/journal.pmed.1002214 pmid: 28072856
Background High body mass index (BMI) is an important contributor to the global burden of ill-health and health inequality. Lower socioeconomic position (SEP) in both childhood and adulthood is associated with higher adult BMI, but how these associations have changed across time is poorly understood. We used longitudinal data to examine how childhood and adult SEP relates to BMI across adulthood in three national British birth cohorts. Methods and Findings The sample comprised up to 22,810 participants with 77,115 BMI observations in the 1946 MRC National Survey of Health and Development (ages 20 to 60–64), the 1958 National Child Development Study (ages 23 to 50), and the 1970 British Cohort Study (ages 26 to 42). Harmonized social class-based SEP data (Registrar General’s Social Class) were ascertained in childhood (father’s class at 10/11 y) and adulthood (42/43 y), and BMI was ascertained repeatedly across adulthood, spanning 1966 to 2012. Associations between SEP and BMI were examined using linear regression and multilevel models. Lower childhood SEP was associated with higher adult BMI in both genders, and differences were typically larger at older ages and similar in magnitude in each cohort. Associations between adult SEP and BMI did not vary with age in any consistent pattern in these cohorts, but were more evident in women than men, and inequalities were larger among women in the 1970 cohort compared with earlier-born cohorts. For example, mean differences in BMI at 42/43 y amongst women in the lowest compared with highest social class were 2.0 kg/m2 (95% CI: −0.1, 4.0) in the 1946 NSHD, 2.3 kg/m2 (1.1, 3.4) in the 1958 NCDS, and 3.9 kg/m2 (2.3, 5.4) in the 1970 BCS; mean (SD) BMI in the highest and lowest social classes were as follows: 24.9 (0.8) versus 26.8 (0.7) in the 1946 NSHD, 24.2 (0.4) versus 26.5 (0.4) in the 1958 NCDS, and 24.2 (0.3) versus 28.1 (0.8) in the 1970 BCS. 
Findings did not differ whether using overweight or obesity as an outcome. Limitations of this work include the use of social class as the sole indicator of SEP; while it was available in each cohort in both childhood and adulthood, trends in BMI inequalities may differ according to other dimensions of SEP such as education or income. Although harmonized data were used to aid inferences about birth cohort differences in BMI inequality, differences in other factors, for example in missing data, may have also contributed to findings. Conclusions Given these persisting inequalities and their public health implications, new and effective policies to reduce inequalities in adult BMI that tackle inequality with respect to both childhood and adult SEP are urgently required. Why Was This Study Done? High body mass index (BMI) is thought to be harmful to human health; in most adults, a high BMI is due to having high amounts of fat mass in the body. Previous studies have found that those with fewer socioeconomic resources (both as children and as adults) are more likely, on average, to have a higher BMI as adults. Reducing these socioeconomic inequalities in BMI is an important health policy goal, yet there is limited existing data to help us understand comprehensively how these inequalities have changed across time. What Did the Researchers Do and Find? We used data from three national British birth cohort studies of those born in 1946, 1958, and 1970; these studies contain comparable data on social class in childhood and adulthood, and on BMI across adult life. We confirm that large inequalities in BMI exist, according to both childhood and adult SEP; these were stronger among women, but also found among men. Inequalities according to childhood SEP generally become progressively larger at older ages in all cohorts and in both genders; inequalities according to adult SEP were larger among more recently born generations of women. What Do These Findings Mean? 
The fact that BMI inequalities have persisted or increased across different generations, despite policies designed to reduce them, suggests that new policies are required. Results support the need to intervene earlier rather than later in adult life, since inequalities tend to become larger at older ages. Limitations include the use of only one aspect of socioeconomic circumstances (social class), and the fact that not all participants continue to provide data in longitudinal studies; this may have led us to underestimate the size of BMI inequalities. Introduction High body mass index (BMI) is an important modifiable contributor to the global burden of ill-health and health expenditure,[1, 2] and its prevalence increased markedly between the 1980s and 2014.[3–5] National attempts to reduce population BMI levels have thus far largely been unsuccessful,[3–5] suggesting that it is likely to be an important threat to the health of future generations.[3] Indeed, the increasing prevalence of high BMI at younger ages suggests that later-born generations are at risk of spending longer periods of life either overweight or obese.[6] Systematic reviews have also shown that in high income countries, lower socioeconomic position (SEP) in childhood and adulthood are associated with higher adult BMI and increased obesity risk.[7–9] Due to the links between higher BMI and adverse health outcomes,[10, 11] inequalities in BMI are likely to be an important contributor to socioeconomic inequalities in health. Accordingly, reducing BMI inequalities is a stated goal of numerous health policymakers and organizations;[12] achieving this requires high-quality evidence on how such inequalities have changed across time, in response to changing policy and economic contexts. 
Existing evidence on how inequalities in BMI have changed across generations is largely restricted to repeated analyses of cross-sectional data.[13–21] These studies have tended to report persisting inequalities in BMI that are stronger among women, yet are limited to investigating relatively recent changes (e.g., from 1993/1994 to 2002/2003,[13] or 1994 to 2008[14]). They also do not elucidate how inequalities in BMI change with age; understanding these patterns may help inform the development of interventions targeted at the most effective ages. Such differences may reflect age differences in susceptibility to weight gain, which in turn may differ by cohort depending on the period of life when exposed to more obesogenic environments. The use of repeated cross-sectional adult data also leads to an almost exclusive focus on the changing consequences of adult SEP in previous studies, since such studies do not have (by design) prospective measures of childhood SEP. However, childhood socioeconomic circumstances strongly determine those in adulthood, and childhood SEP has been repeatedly related to higher adult adiposity and other health outcomes independent of adult SEP.[22–27] Relations between childhood SEP and adult BMI are also less likely to be affected by reverse causation than those with adult SEP, since obesity may impair adult economic outcomes;[28–31] childhood SEP is therefore an important dimension of socioeconomic circumstances to consider when understanding how inequalities have changed across time. Many previous studies have not investigated inequality on both relative (e.g., % change) and absolute (difference in kg/m2) scales. 
Since changes in relative inequality can occur despite opposing or no changes in absolute inequality, particularly when dichotomized outcomes are used and the overall population outcome prevalence changes, analyzing both is likely to be important in order to better understand how inequalities have changed across time, and their population health impact.[32, 33] To our knowledge, the literature currently lacks a comprehensive coordinated analysis to investigate how inequalities in BMI across adulthood have changed over the course of the obesity epidemic, with respect to both childhood and adult SEP. The objectives of this study were to examine trends in the socioeconomic distribution of BMI and overweight or obesity across adult life, using data from the British birth cohort studies initiated in 1946, 1958, and 1970. These data have previously been used to show that the obesity epidemic has hit more recently born generations at increasingly younger ages in adulthood;[6] as with evidence from repeated cross-sectional studies,[4] this paper found that obesity prevalence increased markedly from the 1980s onwards in the United Kingdom and remained persistently high up to 2012. We use data collected between 1966 and 2012, which covers this period. Consistent with a fundamental cause hypothesis for understanding health inequality,[34, 35] we expected that due to the increasing public dissemination of the health harms of obesity, those of higher SEP may have been increasingly able to use their greater social, financial, and educational resources to protect themselves against excessive weight gain across adulthood; principally by modifying their diet and/or physical activity levels. As such, we hypothesized that socioeconomic inequalities in BMI would be larger in cohorts born later in the 20th century, and that these differences would be evident for both childhood and adult SEP. 
Methods Study samples Each study used in this manuscript has received relevant ethical approval and obtained parental/participant consent; this information is available from the study websites and/or cohort profiles. Britain’s birth cohort studies with participants followed up through adulthood were used. These were designed to be nationally representative when initiated in 1946 (MRC National Survey of Health and Development[36, 37]—1946 NSHD), 1958 (National Child Development Study[38]—1958 NCDS), and 1970 (British Cohort Study[39]—1970 BCS). The history, design, and characteristics of these studies have been previously described in detail in papers[36–40] and books;[41, 42] studies have also examined the characteristics of those lost due to attrition.[43–46] To aid the comparability of associations between SEP and BMI across studies, analyses were restricted to singleton births in England, Scotland, and Wales from those born and included in cohorts in the relevant weeks in March/April 1946 (N = 5,362), 1958 (N = 16,383), and 1970 (N = 16,172). In the 1946 NSHD, participants were restricted to those from married mothers due to tracing difficulties in the minority of babies born to unmarried mothers.[40] The weeks of initiation were chosen on the basis of practical considerations at the time of study development. The analytic sample sizes were those with valid data for each SEP measure and at least one adult BMI measure; for childhood SEP, a total of 22,810 participants with 77,115 BMI observations; for adult SEP, a total of 17,898 participants with 36,702 BMI observations (sample sizes for all analyses, in each cohort, age, and gender are shown in Table 1 and Table 2, and in S1 Table and S2 Table). All participants in the 1946 NSHD were white, as were 98.7% of those in the 1958 NCDS, and 95.4% of 1970 BCS. All analyses using the 1946 NSHD were weighted to account for the stratified sampling design[37]. 
Table 1. Father’s social class (10/11 y) and BMI across adulthood in the 1946 NSHD, the 1958 NCDS, and the 1970 BCS British birth cohort studies https://doi.org/10.1371/journal.pmed.1002214.t001 Table 2. Own social class (42/43 y) and BMI across mid-adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies https://doi.org/10.1371/journal.pmed.1002214.t002 BMI measurement BMI (kg/m2), the main outcome measure, was derived in each study from measured or self-reported weight and height obtained at all available adulthood ages: 1946 NSHD: 20*, 26*, 36, 43, 53, and 60–64 y; 1958 NCDS: 23*, 33, 42*, 44, and 50* y; 1970 BCS: 26*, 30*, 34*, and 42* y (*self-report). The main protocol differences and methods used to harmonize height and weight have been described elsewhere.[6] Briefly, all measures were converted to metric units, women were excluded when pregnant (N = 257 (1946), 684 (1958), 110 (1970)), and a standardized cleaning process was used to remove participants with implausible values. SEP ascertainment Indicators of SEP in childhood and adulthood, the main exposures in this study, were derived in each cohort. Social class measures were used, since comparable measures were available in both childhood and adulthood, and in each cohort. Childhood SEP was indicated by father’s occupational social class, measured at 10/11 y, and adult SEP by own occupational social class measured at 42/43 y. To aid comparability of results across cohorts, the Registrar General’s Social Class was used to classify social class: I (professional), II (managerial and technical), IIIN (skilled nonmanual), IIIM (skilled manual), IV (partly-skilled), and V (unskilled) occupations. 
The 1990 classification schema was used for childhood and adult SEP in all cohorts, and (due to a lack of conversion schema) the 1970 version was used for childhood SEP in the 1946 NSHD. Those in the armed forces were not assigned a social class, nor those not employed. Analytical strategy DB, RH, WJ, LL, and DK determined which analyses to perform and include in the paper in August 2016. Following request from peer review, we conducted additional analyses in October 2016 to examine the extent to which associations between childhood SEP and BMI were explained by adult SEP, and to examine in greater detail the characteristics of those with missing data. SEP differences in BMI. We derived mean BMI (and standard error [SE]) at each age for each childhood and adult SEP group, separately for each cohort and gender. To assess absolute inequalities in BMI at each age for childhood and adult SEP, we applied linear regression models to estimate differences in mean BMI (kg/m2) between each SEP class and the referent group (class I). We additionally checked if results differed when using regression models with log-transformed BMI to estimate relative differences (i.e., percentage) in BMI. To limit the potential for reverse causality, analyses using adult SEP were limited to contemporaneous and future BMI (≥42 y). Given expected gender differences in association (with larger inequalities in women),[7, 8] all analyses were conducted separately in each gender, and gender differences were formally tested by including an interaction term (gender*SEP). Deviation from linearity in the association between SEP and BMI was examined using Wald tests to determine whether coefficients for SEP (modelled as a categorical term) were equal to zero, in a model which also contained SEP modelled as a continuous term. 
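The absolute and relative comparisons described above can be illustrated with a minimal sketch. It relies on the fact that, in an unweighted linear regression with one dummy variable per non-referent class, each coefficient equals the difference between that class mean and the referent mean. The data values, the function names, and the conversion of log-scale differences to percentages via 100 × (exp(coefficient) − 1) are assumptions made for illustration; the paper does not state how percentages were derived, and the sketch ignores the survey weights used for the 1946 NSHD.

```python
import math
from statistics import mean

# Hypothetical toy data: (social class, BMI in kg/m2). These values are
# invented for illustration and do not come from any of the cohorts.
records = [
    ("I", 23.0), ("I", 25.0),
    ("IIIM", 25.0), ("IIIM", 27.0),
    ("V", 26.0), ("V", 30.0),
]

REFERENT = "I"  # class I is the referent group, as in the text


def group_values(rows):
    """Collect BMI values per social class."""
    groups = {}
    for sep_class, bmi in rows:
        groups.setdefault(sep_class, []).append(bmi)
    return groups


def absolute_differences(rows, referent=REFERENT):
    """Difference in mean BMI (kg/m2) between each class and the referent.

    These equal the coefficients of an unweighted linear regression of BMI
    on dummy variables for each non-referent class.
    """
    groups = group_values(rows)
    ref_mean = mean(groups[referent])
    return {c: mean(v) - ref_mean for c, v in groups.items() if c != referent}


def relative_differences(rows, referent=REFERENT):
    """Percentage difference in BMI based on log-transformed BMI.

    The difference in mean log(BMI) is the coefficient of the log-scale
    regression; 100 * (exp(coef) - 1) is one common way to express it as a
    percentage (an assumption here, not a detail given in the paper).
    """
    groups = group_values(rows)
    ref_log = mean(math.log(b) for b in groups[referent])
    return {
        c: 100 * (math.exp(mean(math.log(b) for b in v) - ref_log) - 1)
        for c, v in groups.items()
        if c != referent
    }


abs_diff = absolute_differences(records)
rel_diff = relative_differences(records)
```

With these toy values, `abs_diff` gives the kg/m2 gap for each class relative to class I, and `rel_diff` gives the corresponding percentage gap on the multiplicative scale.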
To examine if the size of inequalities in BMI differed by cohort, regression models were also fitted on BMI at 42/43 y for all three cohorts combined, where differences in age of BMI measurements were smallest so that inequalities could be compared with respect to both childhood and adult SEP. A dummy term for each cohort was included in the model, and SEP*cohort interaction was tested. In all cohort-combined models, weights to account for the stratified nature of the 1946 NSHD cohort were applied, and all participants from 1958 NCDS and 1970 BCS were given the same weighting value of one. All analyses were conducted using statistical software STATA 14 (StataCorp, 2009). SEP differences in risk of overweight or obesity. The primary analyses described above used BMI as a continuous outcome, since preservation of the continuous nature of the outcome preserves statistical power and may enable a greater understanding of the nature of adiposity inequalities than analyses using binary outcomes. However, to aid public health interpretation, all analyses were repeated using a binary outcome indicating normal (BMI < 25) or overweight/obese (BMI ≥ 25) as an outcome. Overweight and obese were grouped together given the low obesity prevalence at younger ages, and participants classified as thin were excluded from analyses (2% of observations).[6] Inequalities were estimated using linear probability models to derive differences in prevalence by SEP group, and using log-binomial generalized linear models to estimate the relative risks of overweight/obesity in each SEP group. Do SEP differences in BMI differ by age? Trajectories of BMI were modelled using multilevel models—BMI measurements (level 1) were nested within individuals (level 2). We adopted a quadratic function for age to summarize the longitudinal changes of BMI. We specified a random intercept and random slope (linear term for age). 
SEP (as a categorical variable) was added to the models to examine its associations with BMI across adulthood. Differences in rate of BMI change between SEP groups were examined by including age*SEP interaction terms (age²*SEP interactions were not found and therefore not included in models; p > 0.05). Finally, a model combining all three cohorts was fitted, and an age*SEP*cohort interaction term was included to test whether the change in association by age differed by cohort. This model also contained cohort main effects, and all two-way interactions (age*SEP, age*cohort, SEP*cohort). Only fixed effects models were fitted when analysing adult SEP in 1970 BCS, since only one age point of BMI was available. Finally, additional models were conducted to examine whether associations between childhood SEP and BMI were explained by adult SEP. In these models, participants with valid data for SEP and valid BMI data on at least one age were included in analyses. Additional and sensitivity analyses. To examine the extent to which self-reported BMI data could bias SEP and BMI associations, we calculated differences in BMI in the 1958 NCDS at 42 y (self-reported) and 44 y (objectively measured), the closest proximity of BMI self-reporting and objective measures in our data. We then examined relations between SEP and this difference measure: larger scores would indicate misreporting and/or excessive weight change. We were unable to adjust for an indicator of self-reporting or objective BMI measurement method in our models due to collinearity between measurement method and age/cohort. Finally, to inform the extent to which differences in missing data between cohorts could affect results, we compared the extent of missing SEP and BMI data by cohort and examined the characteristics of those with missing data. 
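The binary-outcome analyses described under "SEP differences in risk of overweight or obesity" can be sketched in the same spirit. When SEP class is the only predictor, a linear probability model returns the difference in prevalence between each class and the referent, and a log-binomial model returns the ratio of prevalences (the relative risk), so both can be computed directly from group proportions. The data, the function names, and the 18.5 kg/m2 cutoff used to exclude thin participants are assumptions for illustration; the text gives only the BMI ≥ 25 threshold.

```python
# Hypothetical toy data: (social class, BMI in kg/m2), invented for illustration.
records = [
    ("I", 17.5), ("I", 22.0), ("I", 24.0), ("I", 26.0), ("I", 23.0),
    ("V", 24.5), ("V", 27.0), ("V", 31.0), ("V", 25.0), ("V", 18.0),
]

OVERWEIGHT_CUTOFF = 25.0  # BMI >= 25 is overweight/obese (from the text)
THIN_CUTOFF = 18.5        # assumption: the text does not state this cutoff


def prevalence_by_class(rows):
    """Proportion overweight/obese per class, excluding thin participants."""
    counts = {}
    for sep_class, bmi in rows:
        if bmi < THIN_CUTOFF:
            continue  # thin participants are dropped from the analysis
        n, cases = counts.get(sep_class, (0, 0))
        counts[sep_class] = (n + 1, cases + (bmi >= OVERWEIGHT_CUTOFF))
    return {c: cases / n for c, (n, cases) in counts.items()}


def compare_to_referent(rows, referent="I"):
    """Prevalence difference and relative risk of each class vs the referent.

    Assumes the referent prevalence is non-zero; otherwise the relative
    risk is undefined.
    """
    prevalence = prevalence_by_class(rows)
    ref = prevalence[referent]
    return {
        c: {"prevalence_difference": p - ref, "relative_risk": p / ref}
        for c, p in prevalence.items()
        if c != referent
    }


result = compare_to_referent(records)
```

Reporting both measures matters because, as the Introduction notes, relative and absolute inequality can move in different directions when overall prevalence changes.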
Study samples Each study used in this manuscript has received relevant ethical approval and obtained parental/participant consent; this information is available from the study websites and/or cohort profiles. Britain’s birth cohort studies with participants followed up through adulthood were used. These were designed to be nationally representative when initiated in 1946 (MRC National Survey of Health and Development[36, 37]—1946 NSHD), 1958 (National Child Development Study[38]—1958 NCDS), and 1970 (British Cohort Study[39]—1970 BCS). The history, design, and characteristics of these studies have been previously described in detail in papers[36–40] and books;[41, 42] studies have also examined the characteristics of those lost due to attrition.[43–46] To aid the comparability of associations between SEP and BMI across studies, analyses were restricted to singleton births in England, Scotland, and Wales from those born and included in cohorts in the relevant weeks in March/April 1946 (N = 5,362), 1958 (N = 16,383), and 1970 (N = 16,172). In the 1946 NSHD, participants were restricted to those from married mothers due to tracing difficulties in the minority of babies born to unmarried mothers.[40] The weeks of initiation were chosen on the basis of practical considerations at the time of study development. The analytic sample sizes were those with valid data for each SEP measure and at least one adult BMI measure; for childhood SEP, a total of 22,810 participants with 77,115 BMI observations; for adult SEP, a total of 17,898 participants with 36,702 BMI observations (sample sizes for all analyses, in each cohort, age, and gender are shown in Table 1 and Table 2, and in S1 Table and S2 Table). All participants in the 1946 NSHD were white, as were 98.7% of those in the 1958 NCDS, and 95.4% of 1970 BCS. All analyses using the 1946 NSHD were weighted to account for the stratified sampling design[37]. 
Download: PPT PowerPoint slide PNG larger image TIFF original image Table 1. Father’s social class (10/11 y) and BMI across adulthood in the 1946 NSHD, the 1958 NCDS, and the 1970 BCS British birth cohort studies https://doi.org/10.1371/journal.pmed.1002214.t001 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 2. Own social class (42/43 y) and BMI across mid-adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies https://doi.org/10.1371/journal.pmed.1002214.t002 BMI measurement BMI (kg/m2), the main outcome measure, was derived in each study from measured or self-reported weight and height obtained at all available adulthood ages: 1946 NSHD: 20*, 26*, 36, 43, 53, and 60–64 y; 1958 NCDS: 23*, 33, 42*, 44, and 50* y; 1970 BCS: 26*, 30*, 34*, and 42* y (*self-report). The main protocol differences and methods used to harmonize height and weight have been described elsewhere.[6] Briefly, all measures were converted to metric units, women were excluded when pregnant (N = 257 (1946), 684 (1958), 110 (1970)), and a standardized cleaning process was used to remove participants with implausible values. SEP ascertainment Indicators of SEP in childhood and adulthood were derived in each cohort—the main exposures in this study. Social class measures were used, since comparable measures were available in both childhood and adulthood, and in each cohort. Childhood SEP was indicated by father’s occupational social class, measured at 10/11 y, and adult SEP by own occupational social class measured at 42/43 y. To aid comparability of results across cohorts, the Registrar General’s Social Class was used to classify social class—from I (professional), II (managerial and technical), IIIN (skilled nonmanual), IIIM (skilled manual), IV (partly-skilled), and V (unskilled) occupations. 
The 1990 classification schema was used for childhood and adult SEP in all cohorts, and (due to a lack of conversion schema) the 1970 version was used for childhood SEP in the 1946 NSHD. Those in the armed forces were not assigned a social class, nor those not employed. Analytical strategy DB, RH, WJ, LL, and DK determined which analyses to perform and include in the paper in August 2016. Following request from peer review, we conducted additional analyses in October 2016 to examine the extent to which associations between childhood SEP and BMI were explained by adult SEP, and to examine in greater detail the characteristics of those with missing data. SEP differences in BMI. We derived mean BMI (and standard error [SE]) at each age for each childhood and adult SEP group, separately for each cohort and gender. To assess absolute inequalities in BMI at each age for childhood and adult SEP, we applied linear regression models to estimate differences in mean BMI (kg/m2) between each SEP class and the referent group (class I). We additionally checked if results differed when using regression models with log-transformed BMI to estimate relative differences (i.e., percentage) in BMI. To limit the potential for reverse causality, analyses using adult SEP were limited to contemporaneous and future BMI (≥42 y). Given expected gender differences in association (with larger inequalities in women),[7, 8] all analyses were conducted separately in each gender, and gender differences were formally tested by including an interaction term (gender*SEP). Deviation from linearity in the association between SEP and BMI was examined using Wald tests to determine whether coefficients for SEP (modelled as a categorical term) were equal to zero, in a model which also contained SEP modelled as a continuous term. 
To examine if the size of inequalities in BMI differed by cohort, regression models were also fitted on BMI at 42/43 y for all three cohorts combined, where differences in age of BMI measurements were smallest so that inequalities could be compared with respect to both childhood and adult SEP. A dummy term for each cohort was included in the model, and SEP*cohort interaction was tested. In all cohort-combined models, weights to account for the stratified nature of the 1946 NSHD cohort were applied, and all participants from 1958 NCDS and 1970 BCS were given the same weighting value of one. All analyses were conducted using statistical software STATA 14 (StataCorp, 2009). SEP differences in risk of overweight or obesity. The primary analyses described above used BMI as a continuous outcome, since preservation of the continuous nature of the outcome preserves statistical power and may enable a greater understanding of the nature of adiposity inequalities than analyses using binary outcomes. However, to aid public health interpretation, all analyses were repeated using a binary outcome indicating normal (BMI < 25) or overweight/obese (BMI ≥ 25) as an outcome. Overweight and obese were grouped together given the low obesity prevalence at younger ages, and participants classified as thin were excluded from analyses (2% of observations).[6] Inequalities were estimated using linear probability models to derive differences in prevalence by SEP group, and using log-binomial generalized linear models to estimate the relative risks of overweight/obesity in each SEP group. Do SEP differences in BMI differ by age? Trajectories of BMI were modelled using multilevel models—BMI measurements (level 1) were nested within individuals (level 2). We adopted a quadratic function for age to summarize the longitudinal changes of BMI. We specified a random intercept and random slope (linear term for age). 
SEP (as a categorical variable) was added to the models to examine its associations with BMI across adulthood. Differences in rate of BMI change between SEP groups were examined by including age*SEP interaction terms (age2*SEP interactions were not found and therefore not included in models; p > 0.05). Finally, a cohort combining all three cohorts was fitted, and an age*SEP*cohort interaction term was included to test whether the change in association by age differed by cohort. This model also contained cohort main effects, and all two-way interactions (age*SEP, age*cohort, SEP*cohort). Only fixed effects models were fitted when analysing adult SEP in 1970 BCS, since only one age point of BMI was available. Finally, additional models were conducted to examine whether associations between childhood SEP and BMI were explained by adult SEP. In these models, participants with valid data for SEP and valid BMI data on at least one age were included in analyses. Additional and sensitivity analyses. To examine the extent to which self-reported BMI data could bias SEP and BMI associations, we calculated differences in BMI in the 1958 NCDS at 42 y (self-reported) and 44 y (objectively measured)—the closest proximity of BMI self-reporting and objective measures in our data. We then examined relations between SEP and this difference measure: larger scores would indicate misreporting and/or excessive weight change. We were unable to adjust for an indicator of self-reporting or objective BMI measurement method in our models due to collinearity between measurement method and age/cohort. Finally, to inform the extent to which differences in missing data between cohorts could affect results, we compared the extent of missing SEP and BMI data by cohort and examined the characteristics of those with missing data. SEP differences in BMI. We derived mean BMI (and standard error [SE]) at each age for each childhood and adult SEP group, separately for each cohort and gender. 
To assess absolute inequalities in BMI at each age for childhood and adult SEP, we applied linear regression models to estimate differences in mean BMI (kg/m2) between each SEP class and the referent group (class I). We additionally checked if results differed when using regression models with log-transformed BMI to estimate relative differences (i.e., percentage) in BMI. To limit the potential for reverse causality, analyses using adult SEP were limited to contemporaneous and future BMI (≥42 y). Given expected gender differences in association (with larger inequalities in women),[7, 8] all analyses were conducted separately in each gender, and gender differences were formally tested by including an interaction term (gender*SEP). Deviation from linearity in the association between SEP and BMI was examined using Wald tests to determine whether coefficients for SEP (modelled as a categorical term) were equal to zero, in a model which also contained SEP modelled as a continuous term. To examine if the size of inequalities in BMI differed by cohort, regression models were also fitted on BMI at 42/43 y for all three cohorts combined, where differences in age of BMI measurements were smallest so that inequalities could be compared with respect to both childhood and adult SEP. A dummy term for each cohort was included in the model, and SEP*cohort interaction was tested. In all cohort-combined models, weights to account for the stratified nature of the 1946 NSHD cohort were applied, and all participants from 1958 NCDS and 1970 BCS were given the same weighting value of one. All analyses were conducted using statistical software STATA 14 (StataCorp, 2009). SEP differences in risk of overweight or obesity. The primary analyses described above used BMI as a continuous outcome, since preservation of the continuous nature of the outcome preserves statistical power and may enable a greater understanding of the nature of adiposity inequalities than analyses using binary outcomes. 
However, to aid public health interpretation, all analyses were repeated using a binary outcome indicating normal weight (BMI < 25) or overweight/obese (BMI ≥ 25). Overweight and obese were grouped together given the low obesity prevalence at younger ages, and participants classified as thin were excluded from analyses (2% of observations).[6] Inequalities were estimated using linear probability models to derive differences in prevalence by SEP group, and using log-binomial generalized linear models to estimate the relative risks of overweight/obesity in each SEP group. Do SEP differences in BMI differ by age? Trajectories of BMI were modelled using multilevel models in which BMI measurements (level 1) were nested within individuals (level 2). We adopted a quadratic function for age to summarize the longitudinal changes of BMI. We specified a random intercept and random slope (linear term for age). SEP (as a categorical variable) was added to the models to examine its associations with BMI across adulthood. Differences in rate of BMI change between SEP groups were examined by including age*SEP interaction terms (there was no evidence of age²*SEP interactions, p > 0.05, so these were not included in the models). Finally, a model combining all three cohorts was fitted, and an age*SEP*cohort interaction term was included to test whether the change in association by age differed by cohort. This model also contained cohort main effects, and all two-way interactions (age*SEP, age*cohort, SEP*cohort). Only fixed effects models were fitted when analysing adult SEP in the 1970 BCS, since only one age point of BMI was available. In addition, further models were fitted to examine whether associations between childhood SEP and BMI were explained by adult SEP. In these models, participants with valid data for SEP and valid BMI data for at least one age were included in analyses. Additional and sensitivity analyses. 
To examine the extent to which self-reported BMI data could bias SEP and BMI associations, we calculated differences in BMI in the 1958 NCDS at 42 y (self-reported) and 44 y (objectively measured)—the closest proximity of BMI self-reporting and objective measures in our data. We then examined relations between SEP and this difference measure: larger scores would indicate misreporting and/or excessive weight change. We were unable to adjust for an indicator of self-reporting or objective BMI measurement method in our models due to collinearity between measurement method and age/cohort. Finally, to inform the extent to which differences in missing data between cohorts could affect results, we compared the extent of missing SEP and BMI data by cohort and examined the characteristics of those with missing data. Results BMI was typically higher at older ages within each cohort and in cohorts born more recently. BMI was also generally higher amongst those of lower rather than higher SEP (S1 and S2 Tables). Obesity or overweight prevalence followed the same patterns (S3 and S4 Tables). Childhood SEP In each cohort, lower childhood SEP was associated with higher mean BMI at all ages (Table 1). Associations tended to be stronger among women than men: evidence for gender interaction with childhood SEP (P<0.05) was found in the 1946 NSHD (at 60–64 y), the 1958 NCDS (33, 42, 44, and 50 y), and the 1970 BCS (30 and 42 y). Associations were also found to be nonlinear in many cases, with kg/m2 differences in BMI between classes I and IV (partly skilled) being larger than those between I and V (unskilled). The sizes of SEP differences in BMI were of considerable magnitude. For example, comparing mean BMI at 42/43 y amongst women in the lowest compared with highest childhood SEP, there was a 1.7 kg/m2 (95% CI: 0.2, 3.2) difference in the 1946 NSHD, 1.5 kg/m2 (0.6, 2.5) difference in the 1958 NCDS, and 2.7 kg/m2 (1.6, 3.9) difference in the 1970 BCS. 
However, there was little evidence that the size of BMI inequalities differed systematically by cohort (p-value for cohort*childhood SEP term = 0.33 in men and 0.20 in women; findings were similar when analyzing BMI differences in relative instead of absolute scales). Analyses using overweight or obese as a binary outcome yielded similar results (S5 Table). For example, comparing the difference in prevalence at 42/43 y amongst women in the lowest compared with highest childhood SEP, there was a 22.5% (5.9%, 39.2%) difference in the 1946 NSHD, 14.2% (5.1%, 23.3%) in the 1958 NCDS, and 15.7% (5.4%, 25.9%) in the 1970 BCS. Inequalities in BMI within each cohort were typically larger at older ages, as indicated by positive age*SEP interaction terms for the lower SEP groups (Figs 1 and 2, multilevel model estimates shown in S6 Table). For example, among women in the 1946 NSHD, the estimated BMI difference between class I and V of 1.45 kg/m2 (0.61, 2.29) at 26 y had increased to 3.30 kg/m2 (1.38, 5.21) at 60–64 y. The size of these age-related increases in inequality did not appear to differ systematically by cohort except in men, where age-related increases appeared to be stronger in the 1958 NCDS and 1970 BCS than the 1946 NSHD (p = 0.001 for the age*SEP*cohort interaction; p = 0.01 in women). These associations occurred alongside both age-related increases in mean BMI and secular increases in mean BMI in later born cohorts: mean BMI in the highest SEP group in the 1970 BCS was comparable to that of the lowest SEP group in the 1946 NSHD. Finally, associations between childhood SEP and BMI were typically only partly attenuated after adjustment for adult SEP (S7 Table). Fig 1. Male BMI across adulthood in relation to father’s social class (10/11 y) in the 1946, 1958, and 1970 British birth cohort studies. 
Note: lines show estimated BMI along with 95% confidence intervals at each age, estimated using multilevel general linear regression models. https://doi.org/10.1371/journal.pmed.1002214.g001 Fig 2. Female BMI across adulthood in relation to father’s social class (10/11 y) in the 1946, 1958, and 1970 British birth cohort studies. Note: lines show estimated BMI along with 95% confidence intervals at each age, estimated using multilevel general linear regression models. https://doi.org/10.1371/journal.pmed.1002214.g002 Adult SEP Lower adult SEP was typically associated with higher BMI (Table 2). As with childhood SEP differences, these associations were typically stronger among women than men: evidence for gender interaction with adult SEP (p < 0.05) was found in the 1946 NSHD (at 43, 53, and 60–64 y), the 1958 NCDS (42 and 44 y), and the 1970 BCS (42 y). There was also evidence for nonlinearity in the shape of associations. Among women, the size of BMI inequalities was progressively larger in each subsequent cohort. For example, the mean difference in BMI at 42/43 y amongst women in the lowest compared with highest social class was 2.0 kg/m2 (95% CI: −0.1, 4.0) in the 1946 NSHD, 2.3 (1.1, 3.4) in the 1958 NCDS, and 3.9 (2.3, 5.4) in the 1970 BCS (p for cohort*SEP term = 0.01). These cohort differences were driven by slight decreases in mean BMI in the highest social class and increases in BMI amongst those in the lowest social classes (S2 Table). There was no such evidence of cohort differences in men (p = 0.7), and findings for either gender did not differ when using relative instead of absolute measures of inequality. For example, among women, percentage differences in BMI at 42/43 y in the lowest compared with highest social class were 6.4% (−1.3, 14.1) in the 1946 NSHD, 8.1% (4.0, 12.3) in the 1958 NCDS, and 14.0% (8.5, 19.6) in the 1970 BCS. 
Mean (standard deviation [SD]) BMI values in the highest and lowest social classes were as follows: 24.9 (0.8) versus 26.8 (0.7) in the 1946 NSHD, 24.2 (0.4) versus 26.5 (0.4) in the 1958 NCDS, and 24.2 (0.3) versus 28.1 (0.8) in the 1970 BCS (see S2 Table). Analyses using overweight or obese as binary outcomes yielded similar results (S8 Table). For example, when comparing the percentage difference in prevalence at 42/43 y amongst women in the lowest compared with highest adult SEP, there was a 5.2% (−26.2, 36.5) difference in the 1946 NSHD, 20.2% (8.6, 31.9) in the 1958 NCDS, and 26.9% (12.5, 41.2) in the 1970 BCS. Among men in the 1958 NCDS, adult SEP and BMI associations tended to become larger at older ages: the estimated difference in BMI between class IV and I was 0.98 kg/m2 (0.43, 1.54) at 43 y and 1.57 kg/m2 (0.91, 2.23) at 50 y (Figs 3 and 4; S9 Table). These differences were not found among men in the 1946 NSHD (p < 0.001 for the age*SEP*cohort interaction). Among women, there was no evidence that associations systematically changed with age in either cohort (p = 0.97 for the age*SEP*cohort interaction). Fig 3. Male BMI across adulthood in relation to own social class (42/43 y) in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. Note: lines show estimated BMI along with 95% confidence intervals at each age, estimated using multilevel general linear regression models (age terms not included in the 1970 BCS due to only one age of measurement). https://doi.org/10.1371/journal.pmed.1002214.g003 Fig 4. Female BMI across adulthood in relation to own social class (42/43 y) in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. 
Note: lines show estimated BMI along with 95% confidence intervals at each age, estimated using multilevel general linear regression models (age terms not included in the 1970 BCS due to only one age of measurement). https://doi.org/10.1371/journal.pmed.1002214.g004 Additional and sensitivity analyses We did not find evidence that differences in BMI at 42 (self-reported) and 44 y (objectively measured) differed according to childhood (p = 0.9 in men, p = 0.2 in women) or adult SEP (p = 0.3 in men, p = 0.3 in women) in the 1958 NCDS (S10 Table). More recently born cohorts had greater missing SEP and BMI data; similarly in each cohort, those with lower childhood SEP or higher preceding BMI were more likely to have missing adult SEP and BMI data (S1 Text). 
Discussion Main findings Using longitudinal data from three British birth cohorts who experienced the obesity epidemic at increasingly younger ages in adulthood, we identified large and persisting socioeconomic inequalities in BMI from age 20 up to ages 60–64. Inequalities according to childhood SEP were evident in both men and women, were typically larger at older ages, and were similar in magnitude at any given age in all three cohorts. Inequalities according to adult SEP were more evident in women than men, yet did not consistently differ by age. Among women, inequalities were found to be larger in both absolute and relative terms in the 1970 BCS compared with earlier-born cohorts in midlife BMI (i.e., BMI at age 42/43 y when measured in 2012 compared with 2000 or 1989), while in men they remained constant. Comparison with previous studies Our findings are broadly consistent with evidence of persisting inequalities in obesity or BMI derived from separate studies utilizing repeated cross-sectional data and adult SEP indicators. 
For example, in English adults aged 18–64, BMI inequalities according to adult manual or nonmanual social class did not change (in absolute terms) from 1993/4 to 2002/3,[13] yet obesity inequalities increased in absolute but not relative terms according to area-based socioeconomic circumstances among older (age ≥55) but not younger (16–54) adults from 1994 to 2008.[14] In Scotland, obesity inequalities (in relative terms) according to education or income in adults were estimated to have decreased from 1995 to 2011 (age 16–65).[15] Studies using birth cohort data have also reported inequalities in BMI yet are either limited to investigation of single birth cohorts,[22–25, 27] or use multiple birth cohorts in a more limited manner (e.g., among regional samples with limited overlap between ages/y born,[47] or among two national birth cohorts using BMI at one age and a binary SEP indicator[26]). Our findings add to the international evidence base, also derived from repeated cross-sectional data, suggesting persisting inequalities in high income countries and/or emerging inequalities in low–middle income countries.[17–21] Our study adds to previous knowledge by providing a more detailed understanding of how inequalities in BMI have changed across a longer timeframe. Despite large societal changes in occupations and a reduction in absolute disadvantage in the later 20th century, we found that social class inequalities have persisted across three generations. Two additional findings also suggest a greater potential cause for concern than previously thought: for childhood SEP, associations typically widened with increasing age in all three cohorts and, for adult SEP, associations appear to have become larger in more recently born generations of women. Explanation of findings Inequalities in BMI are likely to be explained by inequalities in the determinants of weight gain both before and during adulthood. 
In support of this, systematic reviews and large population-based studies have found evidence for inequalities in both indicators of physical activity[48, 49] and diet.[50–52] The persistence of inequalities in BMI across cohorts suggests that inequalities in the determinants of BMI have not substantially changed,[53] despite policies designed to reduce them.[54, 55] The net influence of childhood SEP on the determinants of BMI, operating both in childhood and adulthood, may therefore have not differed in each cohort. If this were the case, our finding of increasing size of inequalities in BMI found at older compared with younger ages (according to childhood SEP) may be due to the accumulation of periods of weight gain which would be expected to track across life.[56] Part of the childhood SEP and BMI associations are also likely to be explained by continuity of SEP into adulthood, although we found limited evidence for a prominent mediating role of adult social class (as in other studies[22–26]). However, such pathways may be more appropriately identified in analyses that account for potential intermediary confounding[57] and the multidimensional and time-varying nature of adult SEP. Although we are not aware of studies that have examined trends in diet and physical activity inequalities in these birth cohorts, there are some co-occurring cultural and economic changes which may have been expected to have increased BMI inequalities across time. 
For example, the relative cost of a high-quality diet increased in the UK (and other high income countries) from the 1990s–2010,[58] although it is unclear if this was matched by increased inequalities in dietary intake.[14, 59] In addition, studies have suggested that there have been increasing cultural expectations from the 1970s onwards for women to be thin,[60] yet for men to be muscular.[61, 62] Since women of higher SEP are likely to have had greater resources to follow these trends, these changes could explain the increasing inequalities observed in the 1970 BCS compared with earlier born cohorts; mean BMI in the most advantaged group remained similar in all three cohorts. Women also experience additional putative risk factors for weight gain which men do not, such as childbirth and the menopausal transition,[63] yet these factors have not explained BMI inequalities in previous studies.[64] Inequalities in leisure time physical activity widened among older women but not men in England from 1994–2008 (according to an area-based socioeconomic indicator)[14], and occupational-related physical activity may have declined, especially amongst manual social classes, due to declines in the number of physical occupations.[65] However, it should be noted that understanding the relative contribution of diet and physical activity to BMI inequalities, and how these may change across time, is likely to be challenging due to the difficulties in accurately assessing these behaviours in the population. In addition, labour market participation has changed across time, with increasing participation from women in the 1970 BCS compared with the 1946 NSHD.[66] Such differences could potentially affect estimates of how inequalities in BMI have changed across time, particularly among women. 
A conservative interpretation of our results would therefore limit them to providing evidence of how socioeconomic inequality in BMI has changed amongst those whose fathers were employed during the participants' childhood, or those who were themselves employed as adults. Despite overall declining population smoking rates, relative inequalities in smoking have increased,[67] a pattern which could affect BMI inequality, since smoking has a modest association with lower weight. However, smoking has been found not to explain associations between SEP and BMI in other cohorts,[68] and smoking was not associated with obesity in the 1946 NSHD,[69] nor in the 1970 BCS.[70] Finally, mortality rates are likely to have been highest in lower SEP groups[71] and those with higher BMI,[72] which might partly explain the lower-than-expected BMI found in the lowest SEP group among men at older ages. Strengths and limitations Strengths of this study include the use of three national birth cohort studies initiated in 1946, 1958, and 1970, with harmonized data for SEP and BMI across life. These data are especially well-suited to examine the emergence of inequalities in BMI across age and generation. Limitations include the use of BMI which, although strongly positively correlated with fat mass,[73] does not distinguish between fat and lean (muscle) mass. Lower SEP has been associated with lower lean mass in the 1946 NSHD (after adjustment),[23] so the use of BMI could have resulted in underestimating socioeconomic inequalities in fat mass, compared with direct measures of fat mass. Indeed, given evidence that BMI and blood pressure associations are stronger in the 1958 NCDS than the 1946 NSHD,[74] it may be that amongst later born cohorts a given BMI value reflects increasingly more fat than lean mass compared with earlier born cohorts. All included studies experienced attrition that, as expected,[43, 44] was generally more pronounced amongst those of lower SEP and/or with higher BMI. 
The use of multilevel models enabled those with incomplete information to be included in our analyses (under the assumption of missing at random). However, as in all observational studies, we cannot rule out the possibility that missing data are nonignorable and therefore might bias our findings. Missing SEP and BMI data were also more frequent in later born cohorts, which might bias results regarding cross-cohort changes in BMI inequality. Studies have found that greater attrition or nonresponse in longitudinal studies typically leads to a reduction in the magnitude of health inequalities observed;[75] an analogous pattern was seen in the present study, since childhood SEP and BMI associations were generally weaker in the samples which additionally had valid adult SEP data (S7 Table). This direction of bias may therefore have led us to underestimate increases in BMI inequality amongst more recently born cohorts (which were observed for women), and increases in the size of BMI inequality at older ages (which were found for both men and women). In support of the representativeness of the included birth cohorts, obesity estimates have been found to be similar to those from the Health Survey for England.[6] However, such biases should be considered when interpreting these, and other, results on how SEP differences may have changed across time: cross-sectional health surveys have moderate response rates (around 40 to 60%) which have declined in recent decades.[76] We used harmonized measures of occupational-based social class, yet these cohorts predate the 2000 move to NS-SEC social class used in the UK.[77] The use of social class omitted participants not currently employed; class is also only one dimension of SEP, and others (e.g., education, income, and wealth) may yield health-relevant information independent of class, and therefore warrant further study. 
Finally, we focused on average differences across social groups, whereas within-group differences may have also changed by cohort, and this warrants investigation.[78] Implications The persistence of inequalities in BMI throughout adulthood across different generations suggests that new and/or improved strategies are required to reduce them. Indeed, the UK government has, through a number of initiatives, aimed to reduce both obesity and its inequality since 1998, with limited current success.[54, 55] While there is some evidence that interventions targeted at disadvantaged populations can result in reductions in BMI, there is limited high-quality evidence on population-wide interventions, as well as on the relative timing of interventions across life.[79] Given our findings of progressively widening BMI inequalities across adulthood, and the fact that BMI tends to track across life,[56] interventions may be most effective when initiated as early as possible in adulthood. Interventions which require little individual agency[80] may be especially efficacious at lowering population BMI levels and reducing inequalities. For example, increasing taxation on unhealthy foods while subsidizing others may be effective, despite potentially being financially regressive in the short term.[81, 82] Coordinated analyses of how inequalities in BMI differ across countries may also help to identify strategies which successfully reduce BMI inequalities.[83] Ultimately, targeting inequalities in socioeconomic resources may be more effective than targeting specific mediating factors, since the relative importance of the mediators may change across time and interventions may inadvertently increase inequalities.[84] Reducing inequalities in socioeconomic resources in childhood may be particularly beneficial by affecting both early life determinants of adult BMI and subsequent adult SEP inequalities. 
Indeed, such early interventions will also be best placed to reduce inequalities in childhood obesity, which are found in cohorts born more recently (e.g., those born in 2000/1);[85] understanding how adiposity inequalities have changed in different generations in childhood requires investigation of childhood growth in terms of weight, height, and height-adjusted weight.
A conservative interpretation of our results would therefore limit them to providing evidence of how socioeconomic inequality in BMI has changed amongst those whose fathers were employed during the participants' childhood, or those who were themselves subsequently employed as adults. Despite overall declining population smoking rates, relative inequalities in smoking have increased,[67] a pattern which could affect BMI inequality, since smoking has a modest association with lower weight. However, smoking has been found not to explain associations between SEP and BMI in other cohorts,[68] and smoking was not associated with obesity in the 1946 NSHD,[69] nor in the 1970 BCS.[70] Finally, mortality rates are likely to have been highest in lower SEP groups[71] and those with higher BMI,[72] which might partly explain the lower-than-expected BMI found in the lowest SEP group among men at older ages. Strengths and limitations Strengths of this study include the use of three national birth cohort studies initiated in 1946, 1958, and 1970, with harmonized data for SEP and BMI across life. These data are especially well suited to examining the emergence of inequalities in BMI across age and generation. Limitations include the use of BMI which, although strongly positively correlated with fat mass,[73] does not distinguish between fat and lean (muscle) mass. Lower SEP has been associated with lower lean mass in the 1946 NSHD (after adjustment),[23] so the use of BMI could have resulted in underestimating socioeconomic inequalities in fat mass, compared with direct measures of fat mass. Indeed, given evidence that BMI and blood pressure associations are stronger in the 1958 NCDS than the 1946 NSHD,[74] it may be that, amongst later-born cohorts, a given BMI value reflects more fat relative to lean mass than in earlier-born cohorts. All included studies experienced attrition that, as expected,[43, 44] was generally more pronounced amongst those of lower SEP and/or with higher BMI. 
The use of multilevel models enabled those with incomplete information to be included in our analyses (under the assumption of missing at random). However, as in all observational studies, we cannot rule out the possibility that missing data are nonignorable and therefore might bias our findings. Missing SEP and BMI data were also more frequent in later-born cohorts, which might bias results regarding cross-cohort changes in BMI inequality. Studies have found that greater attrition or nonresponse in longitudinal studies typically leads to a reduction in the magnitude of health inequalities observed;[75] an analogous pattern was observed in the present study, since childhood SEP and BMI associations were generally weaker in the samples which additionally had valid adult SEP data (S7 Table). This direction of bias may therefore have led us to underestimate increases in BMI inequality amongst more recently born cohorts (which were observed for women), and increases in the size of BMI inequality at older ages (which were found for both men and women). In support of the representativeness of the included birth cohorts, obesity estimates have been found to be similar to those from the Health Survey for England.[6] However, such biases should be considered when interpreting these, and other, results on how SEP differences may have changed across time; cross-sectional health surveys also have moderate response rates (around 40 to 60%), which have declined in recent decades.[76] We used harmonized measures of occupational-based social class, yet these cohorts predate the 2000 move to NS-SEC social class used in the UK.[77] The use of social class omitted participants not currently employed; class is also only one dimension of SEP, and others (e.g., education, income, and wealth) may yield health-relevant information independent of class and therefore warrant further study. 
Finally, we focused on average differences across social groups, whereas within-group differences may have also changed by cohort, and this warrants investigation.[78] Implications The persistence of inequalities in BMI throughout adulthood across different generations suggests that new and/or improved strategies are required to reduce them. Indeed, the UK government has, through a number of initiatives, aimed to reduce both obesity and its inequality since 1998, with limited current success.[54, 55] While there is some evidence that interventions targeted at disadvantaged populations can result in reductions in BMI, there is limited high-quality evidence on population-wide interventions, as well as on the relative timing of interventions across life.[79] Given our findings of progressively widening BMI inequalities across adulthood, and the fact that BMI tends to track across life,[56] interventions may be most effective when initiated as early as possible in adulthood. Interventions which require little individual agency[80] may be especially efficacious at lowering population BMI levels and reducing inequalities. For example, increasing taxation on unhealthy foods while subsidizing others may be effective, despite potentially being financially regressive in the short term.[81, 82] Coordinated analyses of how inequalities in BMI differ across countries may also help to identify strategies which successfully reduce BMI inequalities.[83] Ultimately, targeting inequalities in socioeconomic resources may be more effective than targeting specific mediating factors, since the relative importance of the mediators may change across time and interventions may inadvertently increase inequalities.[84] Reducing inequalities in socioeconomic resources in childhood may be particularly beneficial by affecting both early life determinants of adult BMI and subsequent adult SEP inequalities. 
Indeed, such early interventions will also be best placed to reduce inequalities in childhood obesity, which are found in cohorts born more recently (e.g., those born in 2000/1);[85] understanding how adiposity inequalities have changed in different generations in childhood requires investigation of childhood growth in terms of weight, height, and height-adjusted weight. Conclusions Our findings, based on historic longitudinal data, demonstrate that the overweight and obesity epidemic has disproportionately impacted adults in Britain born in 1946, 1958, and 1970 who were more socioeconomically disadvantaged in childhood or adulthood. They prompt consideration of how inequalities can be reduced amongst these and future cohorts, given the considerable expected adverse health impacts. Supporting Information S1 Table. Father’s occupational class (10/11 y) and mean BMI across adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. https://doi.org/10.1371/journal.pmed.1002214.s001 (DOC) S2 Table. Own occupational class (42/43 y) and mean BMI across adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. https://doi.org/10.1371/journal.pmed.1002214.s002 (DOC) S3 Table. Father’s occupational class (10/11 y) and overweight or obesity prevalence across adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. https://doi.org/10.1371/journal.pmed.1002214.s003 (DOC) S4 Table. Own occupational class (42/43 y) and overweight or obesity prevalence across adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. https://doi.org/10.1371/journal.pmed.1002214.s004 (DOC) S5 Table. Father’s occupational class (10/11 y) and obesity or overweight across adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies. https://doi.org/10.1371/journal.pmed.1002214.s005 (DOC) S6 Table. 
Father’s occupational class (10/11 y) and BMI across adulthood (≥20 y) in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies: estimates from separate multilevel models, scaled to show estimated BMI differences at 26 y. https://doi.org/10.1371/journal.pmed.1002214.s006 (DOC) S7 Table. Father’s occupational class (10/11 y) and BMI across adulthood (≥20 y) in the 1946 NSHD, 1958 NCDS, and 1970 BCS British birth cohort studies, adjusted for adult occupational class (42/43 y): estimates from separate multilevel models, scaled to show estimated BMI differences at 26 y. https://doi.org/10.1371/journal.pmed.1002214.s007 (DOC) S8 Table. Own occupational class (42/43 y) and obesity or overweight across mid-adulthood in the 1946 NSHD, 1958 NCDS, and 1970 BCS birth cohort studies. https://doi.org/10.1371/journal.pmed.1002214.s008 (DOC) S9 Table. Own occupational class (42/43 y) and adult BMI (≥42 y) in the 1946 NSHD and 1958 NCDS British birth cohort studies: estimates from separate multilevel models, scaled to show estimated BMI differences at 43 y. https://doi.org/10.1371/journal.pmed.1002214.s009 (DOC) S10 Table. Socioeconomic position in relation to calculated change in BMI between a self-reported (at 42 y) and objective measure (at 44 y) in the 1958 NCDS British birth cohort study. https://doi.org/10.1371/journal.pmed.1002214.s010 (DOC) S1 PRISMA Checklist. https://doi.org/10.1371/journal.pmed.1002214.s011 (DOC) S1 Text. Missing data appendix. https://doi.org/10.1371/journal.pmed.1002214.s012 (DOC) Acknowledgments We thank Dr. Shaun Scholes for providing helpful comments on an earlier version of this manuscript and Brian Dodgeon for preparing harmonized social class data. We thank participants from all three studies for their invaluable long-standing contribution.
Association of Body Mass Index with DNA Methylation and Gene Expression in Blood Cells and Relations to Cardiometabolic Disease: A Mendelian Randomization Approach doi: 10.1371/journal.pmed.1002215 pmid: 28095459
Background The link between DNA methylation, obesity, and adiposity-related diseases in the general population remains uncertain. Methods and Findings We conducted an association study of body mass index (BMI) and differential methylation for over 400,000 CpGs assayed by microarray in whole-blood-derived DNA from 3,743 participants in the Framingham Heart Study and the Lothian Birth Cohorts, with independent replication in three external cohorts of 4,055 participants. We examined variations in whole blood gene expression and conducted Mendelian randomization analyses to investigate the functional and clinical relevance of the findings. We identified novel and previously reported BMI-related differential methylation at 83 CpGs that replicated across cohorts; BMI-related differential methylation was associated with concurrent changes in the expression of genes in lipid metabolism pathways. Genetic instrumental variable analysis of alterations in methylation at one of the 83 replicated CpGs, cg11024682 (intronic to sterol regulatory element binding transcription factor 1 [SREBF1]), demonstrated links to BMI, adiposity-related traits, and coronary artery disease. Independent genetic instruments for expression of SREBF1 supported the findings linking methylation to adiposity and cardiometabolic disease. Methylation at a substantial proportion (16 of 83) of the identified loci was found to be secondary to differences in BMI. However, the cross-sectional nature of the data limits definitive causal determination. Conclusions We present robust associations of BMI with differential DNA methylation at numerous loci in blood cells. BMI-related DNA methylation and gene expression provide mechanistic insights into the relationship between DNA methylation, obesity, and adiposity-related diseases. Why Was This Study Done? Genetic sequence variants explain only a modest proportion of the variation in body mass index (BMI) and cardiometabolic disease in the general population. 
There is limited understanding of the link of DNA methylation—a well-characterized epigenetic modification—with BMI and cardiometabolic disease in the general population. What Did the Researchers Do and Find? We conducted a cross-sectional analysis of the association of BMI with leukocyte DNA methylation at over 400,000 sites in the genome among 7,798 community-dwelling adults. We identified associations between BMI and methylation at 83 replicated sites (including 50 novel sites) and concurrent differences in expression in whole blood of genes overrepresented in lipid metabolism pathways. Using genetic sequence variants to model exposure to differential DNA methylation and tissue-specific gene expression, we found differential methylation and expression of SREBF1 to be implicated in BMI, adiposity-related traits, and coronary artery disease. Using genetic sequence variants to model exposure to differences in BMI, we found a substantial proportion of the differentially methylated sites (16 of 83) to be downstream of BMI. What Do These Findings Mean? Evidence is accumulating that epigenetic modifications, such as DNA methylation, are related to obesity-related diseases in the general population. We provide support for a role of genomic regulation of a lipid metabolism transcription factor, SREBF1, in adiposity and coronary artery disease. Mendelian randomization approaches can help prioritize relevant loci for future functional studies, but the cross-sectional observational nature of our study limits definitive causal inference. Introduction Obesity is highly prevalent in developed nations [1] and contributes to a substantial burden of morbidity and mortality [2,3]. Despite advances in the understanding of genetic variants, lifestyle factors, and gene–environment interactions associated with obesity [4–7], much of the interindividual variation in body weight remains unexplained by measurable lifestyle and genetic factors. 
DNA methylation, one of the most frequent and well-characterized epigenetic modifications, reflects at the molecular level a wide range of environmental exposures and genetic influences [8]. By stabilizing chromatin structure and altering gene expression, DNA methylation has the potential to affect an individual’s susceptibility to obesity (see review in [9]). Further, changes in the methylation of DNA may occur secondarily to obesity and may consequently influence the development of adiposity-related diseases such as diabetes, dyslipidemia, hypertension, and cardiovascular disease. Large gaps in knowledge remain as to how human epigenetic modifications relate to obesity and its sequelae. Epigenetic biomarkers represent a largely untapped precision medicine resource to guide therapy decisions using an individual’s epigenetic profile obtained from blood samples [10]. Identification of clinically relevant epigenetic loci in blood holds the potential to create a foundation upon which to base future functional studies and trials to test epigenetically guided clinical decision making for cardiometabolic diseases. In addition, we may gain novel insights into the molecular underpinnings of obesity and adiposity-related diseases through the study of differentially methylated DNA loci in blood. Doing so may lead to the identification of biologically relevant therapeutic targets. The present study provides results of an epigenome-wide association study (EWAS) of body mass index (BMI) in over 3,700 participants from the Framingham Heart Study (FHS) and the Lothian Birth Cohorts (LBCs) of 1921 and 1936 (LBC1921 and LBC1936). We conducted independent external replication in over 4,000 individuals from the Atherosclerosis Risk in Communities (ARIC), Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), and Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) cohort studies. 
We examined the functional relevance of the identified loci by interrogating the known trans-tissue regulatory functions and concomitant changes in gene expression in blood. In addition, we explored the clinical relevance of the findings for adiposity-related diseases with genetic instrumental variable (IV) analyses using bidirectional and two-step trans-tissue Mendelian randomization (MR) approaches [11–13]. Methods Study Design The study includes two major components. First, we conducted an EWAS of BMI. Second, BMI-related differentially methylated loci were taken forward for further analyses to better understand the magnitude of association, regulatory annotation, functional implications, and clinical relevance (Fig 1). The discovery/replication design and secondary models for the BMI EWAS were defined a priori (S1 Text). Downstream analyses to characterize the discovered loci were outlined a priori, but the final approach was primarily driven by the findings and concurrent advancements in the field. Fig 1. Series of analyses conducted for the epigenome-wide association study of body mass index. ARIC, Atherosclerosis Risk in Communities; BMI, body mass index; DHS, DNase I hypersensitive site; FHS, Framingham Heart Study; GO, Gene Ontology; GOLDN, Genetics of Lipid Lowering Drugs and Diet Network; GWAS, genome-wide association study; LBC, Lothian Birth Cohorts; MR, Mendelian randomization; PIVUS, Prospective Investigation of the Vasculature in Uppsala Seniors; TSS, transcription start site. https://doi.org/10.1371/journal.pmed.1002215.g001 Ethics The FHS protocols and participant consent forms were approved by the institutional review board of Boston University School of Medicine. Ethics permission for the LBC1921 was obtained from the Lothian Research Ethics Committee (Wave 1: LREC/1998/4/183). 
Ethics permission for the LBC1936 was obtained from the Multi-Centre Research Ethics Committee for Scotland (Wave 1: MREC/01/0/56) and the Lothian Research Ethics Committee (Wave 1: LREC/2003/2/29). Written informed consent was obtained from all discovery cohort (FHS and LBC) and replication cohort (ARIC, GOLDN, and PIVUS) participants. Study Participants Data for the discovery phase of this investigation were drawn from the FHS offspring cohort [14] and the LBCs of 1921 and 1936 [15–17]. As previously described [14], the FHS offspring cohort was initially recruited in 1971 and included 5,124 offspring (and their spouses) from the FHS original cohort [18]. The eligible sample for this investigation was from the 3,021 participants in the FHS offspring cohort who attended the eighth examination cycle from 2005 to 2008. The LBC1921 and LBC1936 samples derive from the Scottish Mental Surveys of 1932 and 1947, respectively, when nearly all 11-y-old children in Scotland completed an IQ-type test in school. The LBC studies provided follow-up of surviving participants, most of whom were living in the Lothian region (Edinburgh city and outskirts) of Scotland. The current study draws upon the older-age baseline examinations of 551 participants in LBC1921 recruited in 1999–2001 and 1,091 participants in LBC1936 recruited in 2004–2007. Anthropometric Measurements Height and weight were measured in each study using established protocols as described in detail in the S1 Methods. BMI was calculated as weight (in kilograms) divided by height (in meters) squared. Molecular Genomics DNA from whole blood samples was collected at the same examination assessment as the anthropometric and covariate measurements in both studies. DNA methylation, assayed with the Infinium HumanMethylation450 BeadChip [19] (Illumina), was available for 2,846 FHS participants and 1,518 LBC participants (514 from LBC1921 and 1,004 from LBC1936). 
Details of rigorous quality control, normalization procedures, and exclusions of non-autosomal probes, cross-hybridizing probes, and probes with underlying single nucleotide polymorphisms (SNPs) are described in S1 Methods. Each discovery and replication cohort conducted cohort-specific preprocessing pipelines that allowed each cohort to address study-specific technical and batch effects. This design allowed for the selection of true biological signals independent of bias introduced from uniform processing methods. After quality control in the discovery cohorts, there were 402,358 shared CpG (cytosine-phosphate-guanine) methylation probes available for analyses in 2,377 FHS and 1,366 LBC participants (446 from LBC1921 and 920 from LBC1936). Final sample size was determined by the number of community-based participants in the discovery cohorts who consented to genomic studies and who had available DNA and methylation assays passing quality control measures. In the FHS, SNP data were obtained from the Affymetrix 550K Array imputed to the 1000 Genomes Project reference panel, as previously reported [20]. The LBC samples were genotyped using the Illumina Human610-Quad v1.0 genotyping platform and imputed to the 1000 Genomes Project reference panel as well. Gene expression in blood was available in the FHS and was measured using the Affymetrix Human Exon 1.0 ST GeneChip as described in S1 Methods. Epigenome-Wide Association Study of BMI In the FHS, linear mixed effects regression models were conducted to test the association between site-specific DNA methylation and BMI. The primary model was adjusted for age, sex, family relatedness (random effect), and surrogate variables (to account for differential cell proportions and technical effects) [21], with BMI as the independent variable of interest and DNA methylation (inverse-normal transformed) as the dependent variable. 
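The inverse-normal transformation applied to the methylation values in the FHS model can be illustrated with a short sketch. It ranks the values and maps the ranks to standard normal quantiles. The rank offset (Blom's 3/8), the average-rank handling of ties, and the helper name `inverse_normal_transform` are assumptions made for illustration; the text does not state which variant of the transformation was used.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=3.0 / 8.0):
    """Rank-based inverse-normal transform (hypothetical helper).

    Ties receive the average of their ranks. The Blom offset of 3/8 is an
    assumed default, not a detail taken from the study.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()
    denominator = n - 2.0 * offset + 1.0
    return [std_normal.inv_cdf((r - offset) / denominator) for r in ranks]


# Illustrative methylation beta-values for one CpG (made-up numbers).
betas = [0.12, 0.85, 0.40, 0.33, 0.91]
transformed = inverse_normal_transform(betas)
```

The transformed values keep the ordering of the original beta-values while following an approximately standard normal distribution, which limits the influence of extreme methylation values on the regression.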
In the LBC, linear regression models were conducted adjusting for age, sex, and white blood cell counts, with each DNA methylation probe (residual taken forward from a generalized linear model with a logistic link function adjusting for technical and batch effects) as the dependent variable and BMI as the independent variable of interest. Further analytical details for the discovery cohorts are described in S1 Methods. In both cohorts, secondary models were conducted: (1) additionally adjusting for smoking status, (2) restricted to participants with BMI 18–35 kg/m2 in order to avoid confounding due to frailty or morbid obesity and obesity-related diseases, and (3) testing for age and sex interactions. Results from the FHS and LBCs were meta-analyzed using methods that weighted the p-value by sample size [22]. Directional consistency of statistically significant cohort-specific effects was confirmed for all methylome-wide significant findings from the discovery meta-analysis. We focused our analyses on the resultant test statistic and direction of effect from the independent variable of interest (BMI) as the cohort-specific linear regression coefficients were not directly comparable due to the differences in the preprocessing approach between cohorts. The threshold for statistical significance in the discovery phase was defined by Bonferroni correction for multiple testing to be 0.05/405,000 (p-value < 1.2 × 10−7). A flowchart of analyses conducted is presented in Fig 1. External Replication of EWAS Findings The methylome-wide significant CpGs from the FHS and LBC meta-analysis were taken forward to external replication in three independent cohorts that used the same methylation microarray: the ARIC study, using whole-blood-derived DNA from 2,096 participants of African ancestry; the GOLDN study, using DNA derived from CD4+ cells from 992 participants of European ancestry; and the PIVUS study, using whole blood-derived DNA from 967 participants of Swedish ancestry. 
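The discovery-phase Bonferroni threshold and the sample-size-weighted combination of cohort p-values described above can be sketched as follows. This is a minimal illustration that assumes a Stouffer-type scheme with weights equal to the square root of each cohort's sample size; whether this matches the cited method [22] in every detail is an assumption, and the function name `weighted_z_meta` and the example p-values are hypothetical.

```python
import math
from statistics import NormalDist

BONFERRONI_THRESHOLD = 0.05 / 405_000  # discovery threshold quoted in the text


def weighted_z_meta(p_values, directions, sample_sizes):
    """Combine two-sided p-values across cohorts, weighting by sqrt(N).

    directions holds +1 or -1 for the sign of each cohort's BMI effect.
    Returns the combined Z-score and its two-sided p-value.
    """
    std_normal = NormalDist()
    numerator = 0.0
    sum_sq_weights = 0.0
    for p, sign, n in zip(p_values, directions, sample_sizes):
        z = -std_normal.inv_cdf(p / 2.0) * sign  # signed Z from a two-sided p
        weight = math.sqrt(n)
        numerator += weight * z
        sum_sq_weights += weight ** 2
    z_meta = numerator / math.sqrt(sum_sq_weights)
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))  # two-sided p-value
    return z_meta, p_meta


# Made-up p-values for one CpG in the two discovery cohorts (FHS, LBC),
# using the post-quality-control sample sizes given in the text.
z_meta, p_meta = weighted_z_meta([1e-5, 1e-4], [+1, +1], [2377, 1366])
significant = p_meta < BONFERRONI_THRESHOLD
```

Because only the p-value and the direction of effect enter the calculation, this kind of scheme does not require regression coefficients to be on a common scale, which is consistent with the note above that cohort-specific coefficients were not directly comparable.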
Description and analytical methods of the replication cohorts are supplied in S1 Methods. Replication cohorts also conducted cohort-specific preprocessing. Replication was examined within each cohort individually and then in a meta-analysis of all three replication cohorts (using p-value-weighted methods and ensuring directional consistency as described above). The threshold for statistically significant replication was determined by Bonferroni correction to be 0.05 divided by the number of CpGs taken forward from discovery. Sensitivity Models Adjusting EWAS Findings for Potential Confounding by Genetic Variation In order to demonstrate whether the DNA methylation and BMI association results were independent of genetic variants influencing methylation (methylation quantitative trait loci [meQTLs]), we conducted sensitivity models in the FHS for the replicated BMI-related CpGs conditional on the top cis-meQTL (selected by lowest p-value; ±500 kb from the CpG) for each replicated CpG. The approach to identify cis-meQTLs for the BMI-related CpGs is described in S1 Methods. Interindividual Variation in BMI and Distribution of Obesity in Relation to EWAS Findings In order to determine the magnitude of variation in BMI contained within the studied epigenetic signatures in blood, we examined the variation captured in three ways. First, we examined the increase in model R2 starting from the baseline covariate-only linear regression model, with BMI as the dependent variable, when adding nonredundant (|r| < 0.7) replicated CpGs as independent variables in order of decreasing statistical significance. We conducted this analysis in two discovery test sets: (1) methylome-wide significant CpGs in the FHS only were tested in the LBCs and (2) replicated nonredundant CpGs from the BMI EWAS were tested in one of the replication cohorts, PIVUS. 
Due to differences from the discovery cohorts in ethnicity (African ancestry in ARIC) and cell line (CD4+ cells in GOLDN), we conducted the variation analyses only in PIVUS. Second, we created an additive composite measure of the same nonredundant statistically significant replicated CpGs weighted by effect size. The composite methylation measure was generated for each individual by summing the product of the methylation beta-value and the cohort-specific effect size (including direction of effect) for each of the nonredundant replicated CpGs. The distribution of BMI and prevalence of obesity (BMI ≥ 30 kg/m2) was assessed across deciles of the additive weighted composite measure in the PIVUS cohort. Third, the change in BMI and odds of overweight (BMI 25–29.9 kg/m2) and obesity were tested in age- and sex-adjusted linear and logistic regression models for each standard deviation (SD) change in the additive weighted composite measure in the PIVUS cohort. The weighted summation of the composite methylation measure was converted to SD units (mean = 0, SD = 1) to enhance interpretability of results. As some of the cross-sectional differential methylation changes were expected to be secondary to BMI differences, the purpose of these analyses was not to develop a biomarker or risk predictor for cross-sectional BMI measures but to determine if a large proportion of variation in BMI and obesity, and hence obesity-related cardiometabolic risk, is reflected in the blood DNA methylation patterns. Further analyses examine the molecular pathways that are affected and attempt to infer which methylation changes are causally influencing BMI, which are secondary to BMI differences, and which have relevance for clinical disease outcomes. Gene Expression Analyses We analyzed whole blood gene expression data in the FHS to identify which BMI-related differentially methylated CpGs demonstrated association with altered gene expression. 
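Returning to the additive weighted composite measure described above, the following sketch shows one way such a score could be computed, converted to SD units, and split into deciles. All names and numbers are hypothetical; the use of the population SD and of rank-based decile assignment are assumptions, since the text does not specify these details.

```python
from statistics import mean, pstdev


def composite_scores(beta_matrix, effect_sizes):
    """Weighted sum of methylation beta-values per individual.

    beta_matrix: one row per individual, one column per nonredundant CpG.
    effect_sizes: signed cohort-specific effect size for each CpG.
    """
    return [sum(b * w for b, w in zip(row, effect_sizes)) for row in beta_matrix]


def standardize(scores):
    """Convert scores to SD units (mean = 0, SD = 1)."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


def quantile_groups(scores, n_groups=10):
    """Assign each score to a rank-based group from 1 to n_groups (deciles by default)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(scores) + 1
    return groups


# Made-up beta-values for four individuals at three CpGs, with made-up weights.
betas = [
    [0.20, 0.70, 0.50],
    [0.30, 0.60, 0.55],
    [0.40, 0.50, 0.60],
    [0.50, 0.40, 0.65],
]
weights = [1.5, -2.0, 0.8]
z_scores = standardize(composite_scores(betas, weights))
```

With real data, the decile assignment from `quantile_groups` would be used to tabulate mean BMI and obesity prevalence per decile, and the standardized score would enter the age- and sex-adjusted regression models as the predictor.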
The replicated CpGs were tested using linear mixed effects models for association, with the expression level of the corresponding gene in whole blood (based on annotation by the manufacturer) as the dependent variable and DNA methylation as the independent variable, adjusted for age, sex, and technical and batch effects (further details in S1 Methods). Functional and Regulatory Annotation We studied the Gene Ontology (GO) biological process, molecular function, and cellular component pathways (release 2016-08-22) of the genes identified in the BMI EWAS using the PANTHER (Protein ANalysis THrough Evolutionary Relationships) overrepresentation test [23]. Secondarily, we restricted analysis to the higher certainty genes shown to have altered whole blood gene expression in association with BMI-related differential methylation, as described in the previous section. If multiple probes were annotated to the same gene, then the gene was included only once (unweighted). As the methylation array covers 99% of RefSeq genes, the background universe of genes tested was not restricted. Results were corrected for multiple testing within each category. In addition, we used eFORGE v1.2 (http://eforge.cs.ucl.ac.uk/) [24] to identify whether the replicated CpGs were enriched in DNase I hypersensitive sites (DHSs) (markers of active regulatory regions) and loci with overlapping histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K36me3) across available cell lines and tissues from Roadmap Epigenomics Project, BLUEPRINT Epigenome, and ENCODE (Encyclopedia of DNA Elements) consortia data [25–27]. Bidirectional and Two-Step Trans-tissue Mendelian Randomization IV analyses using SNPs as IVs for (1) DNA methylation, (2) gene expression, and (3) BMI were conducted in order to infer potential causal relationships between EWAS findings, BMI, and adiposity-related diseases (the series of analyses conducted is outlined in Table 1). The detailed approach is provided in S1 Methods. 
In brief, differences in methylation and expression were modeled using quantitative trait loci (QTLs), thus leveraging the contribution of genetic variation to epigenetic traits to infer causal relations. Blood QTL IVs were selected as the single top SNP methylation or expression association (by lowest p-value) in the FHS with replication in the external cohorts or public datasets. As QTLs vary in effect in different tissue types, we selected tissue-specific methylation and expression QTLs to examine tissue-specific effects (details in S1 Methods). To model the effect of BMI on methylation (reverse causation), the IV for BMI was assembled as an additive weighted genetic risk score from the 97 genome-wide significant SNPs from the Genetic Investigation of ANthropometric Traits (GIANT) consortium 2015 genome-wide association study (GWAS) results [7]. A sensitivity analysis utilizing a single SNP in the FTO (fat mass and obesity associated) locus as the IV for BMI was conducted to examine an IV less prone to pleiotropy bias but also less powerful to detect potential causal relations. Download: PPT PowerPoint slide PNG larger image TIFF original image Table 1. Schema of instrumental variable analyses conducted in order to infer the potential causal relations between DNA methylation, gene expression, BMI, and adiposity-related disease. https://doi.org/10.1371/journal.pmed.1002215.t001 Forward MR, using the two-stage least squares method, tests the causal relation of differential methylation with BMI. SNP IVs that implicated a causal effect of differential methylation on BMI from the forward MR (Bonferroni-corrected and, secondarily, nominal causal p-value < 0.05) were tested in the trans-tissue two-step MR. 
The trans-tissue two-step MR was implemented to further break down the relationship between DNA methylation and BMI and to infer whether the hypothesized mediator (gene expression in multiple tissues) is influenced by the exposure (DNA methylation) and, second, whether the mediator (gene expression in multiple tissues) affects the outcome (BMI). SNP IVs that implicated a causal effect of differential methylation and expression on BMI were tested for associations with adiposity-related phenotypes from published GWAS results. Finally, the reverse MR was conducted to test the causal relation of BMI with downstream changes in DNA methylation. Study Design The study includes two major components. First, we conducted an EWAS of BMI. Second, BMI-related differentially methylated loci were taken forward for further analyses to better understand the magnitude of association, regulatory annotation, functional implications, and clinical relevance (Fig 1). The discovery/replication design and secondary models for the BMI EWAS were defined a priori (S1 Text). Downstream analyses to characterize the discovered loci were outlined a priori, but the final approach was primarily driven by the findings and concurrent advancements in the field. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 1. Series of analyses conducted for the epigenome-wide association study of body mass index. ARIC, Atherosclerosis Risk in Communities; BMI, body mass index; DHS, DNase I hypersensitive site; FHS, Framingham Heart Study; GO, Gene Ontology; GOLDN, Genetics of Lipid Lowering Drugs and Diet Network; GWAS, genome-wide association study; LBC, Lothian Birth Cohorts; MR, Mendelian randomization; PIVUS, Prospective Investigation of the Vasculature in Uppsala Seniors; TSS, transcription start site. 
https://doi.org/10.1371/journal.pmed.1002215.g001 Ethics The FHS protocols and participant consent forms were approved by the institutional review board of Boston University School of Medicine. Ethics permission for the LBC1921 was obtained from the Lothian Research Ethics Committee (Wave 1: LREC/1998/4/183). Ethics permission for the LBC1936 was obtained from the Multi-Centre Research Ethics Committee for Scotland (Wave 1: MREC/01/0/56) and the Lothian Research Ethics Committee (Wave 1: LREC/2003/2/29). Written informed consent was obtained from all discovery cohort (FHS and LBC) and replication cohort (ARIC, GOLDN, and PIVUS) participants. Study Participants Data for the discovery phase of this investigation were drawn from the FHS offspring cohort [14] and the LBCs of 1921 and 1936 [15–17]. As previously described [14], the FHS offspring cohort was initially recruited in 1971 and included 5,124 offspring (and their spouses) from the FHS original cohort [18]. The eligible sample for this investigation was from the 3,021 participants in the FHS offspring cohort who attended the eighth examination cycle from 2005 to 2008. The LBC1921 and LBC1936 samples derive from the Scottish Mental Surveys of 1932 and 1947, respectively, when nearly all 11-y-old children in Scotland completed an IQ-type test in school. The LBC studies provided follow-up of surviving participants, most of whom were living in the Lothian region (Edinburgh city and outskirts) of Scotland. The current study draws upon the older-age baseline examinations of 551 participants in LBC1921 recruited in 1999–2001 and 1,091 participants in LBC1936 recruited in 2004–2007. Anthropometric Measurements Height and weight were measured in each study using established protocols as described in detail in the S1 Methods. BMI was calculated as weight (in kilograms) divided by height (in meters) squared. 
Molecular Genomics DNA from whole blood samples was collected at the same examination assessment as the anthropometric and covariate measurements in both studies. DNA methylation, assayed with the Infinium HumanMethylation450 BeadChip [19] (Illumina), was available for 2,846 FHS participants and 1,518 LBC participants (514 from LBC1921 and 1,004 from LBC1936). Details of rigorous quality control, normalization procedures, and exclusions of non-autosomal probes, cross-hybridizing probes, and probes with underlying single nucleotide polymorphisms (SNPs) are described in S1 Methods. Each discovery and replication cohort conducted cohort-specific preprocessing pipelines that allowed each cohort to address study-specific technical and batch effects. This design allowed for the selection of true biological signals independent of bias introduced from uniform processing methods. After quality control in the discovery cohorts, there were 402,358 shared CpG (cytosine-phosphate-guanine) methylation probes available for analyses in 2,377 FHS and 1,366 LBC participants (446 from LBC1921 and 920 from LBC1936). Final sample size was determined by the number of community-based participants in the discovery cohorts who consented to genomic studies and who had available DNA and methylation assays passing quality control measures. In the FHS, SNP data were obtained from the Affymetrix 550K Array imputed to the 1000 Genomes Project reference panel, as previously reported [20]. The LBC samples were genotyped using the Illumina Human610-Quad v1.0 genotyping platform and imputed to the 1000 Genomes Project reference panel as well. Gene expression in blood was available in the FHS and was measured using the Affymetrix Human Exon 1.0 ST GeneChip as described in S1 Methods. Epigenome-Wide Association Study of BMI In the FHS, linear mixed effects regression models were conducted to test the association between site-specific DNA methylation and BMI. 
The primary model was adjusted for age, sex, family relatedness (random effect), and surrogate variables (to account for differential cell proportions and technical effects) [21], with BMI as the independent variable of interest and DNA methylation (inverse-normal transformed) as the dependent variable. In the LBC, linear regression models were conducted adjusting for age, sex, and white blood cell counts, with each DNA methylation probe (residual taken forward from a generalized linear model with a logistic link function adjusting for technical and batch effects) as the dependent variable and BMI as the independent variable of interest. Further analytical details for the discovery cohorts are described in S1 Methods. In both cohorts, secondary models were conducted: (1) additionally adjusting for smoking status, (2) restricted to participants with BMI 18–35 kg/m2 in order to avoid confounding due to frailty or morbid obesity and obesity-related diseases, and (3) testing for age and sex interactions. Results from the FHS and LBCs were meta-analyzed using methods that weighted the p-value by sample size [22]. Directional consistency of statistically significant cohort-specific effects was confirmed for all methylome-wide significant findings from the discovery meta-analysis. We focused our analyses on the resultant test statistic and direction of effect from the independent variable of interest (BMI) as the cohort-specific linear regression coefficients were not directly comparable due to the differences in the preprocessing approach between cohorts. The threshold for statistical significance in the discovery phase was defined by Bonferroni correction for multiple testing to be 0.05/405,000 (p-value < 1.2 × 10−7). A flowchart of analyses conducted is presented in Fig 1. 
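The significance threshold and the sample-size-weighted meta-analysis described above can be illustrated with a short sketch. It assumes a Stouffer-type weighted Z-score scheme, which is one common way to weight p-values by sample size; the source does not spell out the exact formula, so the function names and the input p-values below are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def weighted_z_meta(p_values, directions, sample_sizes):
    """Combine two-sided p-values across cohorts with sample-size weights.

    directions holds +1 or -1 for the sign of each cohort's effect.
    Returns the combined Z-score and its two-sided p-value.
    (Assumed scheme: Stouffer-type weighted Z, weights = sqrt(n).)
    """
    std_normal = NormalDist()
    # Convert each two-sided p-value to a signed Z-score.
    z_scores = [d * -std_normal.inv_cdf(p / 2.0)
                for p, d in zip(p_values, directions)]
    # Weight each cohort by the square root of its sample size.
    weights = [math.sqrt(n) for n in sample_sizes]
    z_meta = (sum(w * z for w, z in zip(weights, z_scores))
              / math.sqrt(sum(w * w for w in weights)))
    # Two-sided p-value from the standard normal tail.
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))
    return z_meta, p_meta


# Threshold quoted in the text: 0.05 / 405,000 tests.
threshold = bonferroni_threshold(0.05, 405_000)

# Hypothetical cohort-level p-values for one CpG (illustrative only),
# paired with the discovery sample sizes reported for the FHS and LBCs.
z, p = weighted_z_meta([1e-5, 1e-4], [+1, +1], [2377, 1366])
is_significant = p < threshold
```

Because the combination uses only the signed test statistic and the sample size, it does not require the cohort-specific regression coefficients to be on a comparable scale, which matches the rationale given above for focusing on the test statistic and direction of effect.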
External Replication of EWAS Findings The methylome-wide significant CpGs from the FHS and LBC meta-analysis were taken forward to external replication in three independent cohorts that used the same methylation microarray: the ARIC study, using whole-blood-derived DNA from 2,096 participants of African ancestry; the GOLDN study, using DNA derived from CD4+ cells from 992 participants of European ancestry; and the PIVUS study, using whole blood-derived DNA from 967 participants of Swedish ancestry. Description and analytical methods of the replication cohorts are supplied in S1 Methods. Replication cohorts also conducted cohort-specific preprocessing. Replication was examined within each cohort individually and then in a meta-analysis of all three replication cohorts (using p-value-weighted methods and ensuring directional consistency as described above). The threshold for statistically significant replication was determined by Bonferroni correction to be 0.05 divided by the number of CpGs taken forward from discovery. Sensitivity Models Adjusting EWAS Findings for Potential Confounding by Genetic Variation In order to demonstrate whether the DNA methylation and BMI association results were independent of genetic variants influencing methylation (methylation quantitative trait loci [meQTLs]), we conducted sensitivity models in the FHS for the replicated BMI-related CpGs conditional on the top cis-meQTL (selected by lowest p-value; ±500 kb from the CpG) for each replicated CpG. The approach to identify cis-meQTLs for the BMI-related CpGs is described in S1 Methods. Interindividual Variation in BMI and Distribution of Obesity in Relation to EWAS Findings In order to determine the magnitude of variation in BMI contained within the studied epigenetic signatures in blood, we examined the variation captured in three ways. 
First, we examined the increase in model R2 starting from the baseline covariate-only linear regression model, with BMI as the dependent variable, when adding nonredundant (|r| < 0.7) replicated CpGs as independent variables in order of decreasing statistical significance. We conducted this analysis in two discovery test sets: (1) methylome-wide significant CpGs in the FHS only were tested in the LBCs and (2) replicated nonredundant CpGs from the BMI EWAS were tested in one of the replication cohorts, PIVUS. Due to differences from the discovery cohorts in ethnicity (African ancestry in ARIC) and cell line (CD4+ cells in GOLDN), we conducted the variation analyses only in PIVUS. Second, we created an additive composite measure of the same nonredundant statistically significant replicated CpGs weighted by effect size. The composite methylation measure was generated for each individual by summing the product of the methylation beta-value and the cohort-specific effect size (including direction of effect) for each of the nonredundant replicated CpGs. The distribution of BMI and prevalence of obesity (BMI ≥ 30 kg/m2) was assessed across deciles of the additive weighted composite measure in the PIVUS cohort. Third, the change in BMI and odds of overweight (BMI 25–29.9 kg/m2) and obesity were tested in age- and sex-adjusted linear and logistic regression models for each standard deviation (SD) change in the additive weighted composite measure in the PIVUS cohort. The weighted summation of the composite methylation measure was converted to SD units (mean = 0, SD = 1) to enhance interpretability of results. 
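The additive weighted composite measure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the toy beta-values, and the toy effect sizes are hypothetical, and the standardization uses the sample mean and sample SD of the cohort in which the score is computed.

```python
from statistics import mean, stdev


def composite_methylation_score(beta_values, effect_sizes):
    """Weighted composite methylation score in SD units.

    beta_values: one row per individual, one methylation beta-value per CpG.
    effect_sizes: signed cohort-specific effect size for each CpG.
    """
    # Sum of (beta-value x effect size) across CpGs for each individual.
    raw = [sum(b * w for b, w in zip(row, effect_sizes))
           for row in beta_values]
    # Standardize so the score has mean 0 and SD 1.
    mu, sd = mean(raw), stdev(raw)
    return [(r - mu) / sd for r in raw]


# Toy data: four individuals, three CpGs (hypothetical values).
betas = [
    [0.20, 0.55, 0.80],
    [0.25, 0.50, 0.75],
    [0.30, 0.60, 0.70],
    [0.35, 0.45, 0.65],
]
effects = [1.5, -0.8, 0.4]  # signed effect sizes (hypothetical)
scores = composite_methylation_score(betas, effects)
```

With the score in SD units, a regression coefficient for the score reads directly as the change in BMI (or in the log-odds of obesity) per SD increase in the composite measure.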
As some of the cross-sectional differential methylation changes were expected to be secondary to BMI differences, the purpose of these analyses was not to develop a biomarker or risk predictor for cross-sectional BMI measures but to determine if a large proportion of variation in BMI and obesity, and hence obesity-related cardiometabolic risk, is reflected in the blood DNA methylation patterns. Further analyses examine the molecular pathways that are affected and attempt to infer which methylation changes are causally influencing BMI, which are secondary to BMI differences, and which have relevance for clinical disease outcomes. Gene Expression Analyses We analyzed whole blood gene expression data in the FHS to identify which BMI-related differentially methylated CpGs demonstrated association with altered gene expression. The replicated CpGs were tested using linear mixed effects models for association, with the expression level of the corresponding gene in whole blood (based on annotation by the manufacturer) as the dependent variable and DNA methylation as the independent variable, adjusted for age, sex, and technical and batch effects (further details in S1 Methods). Functional and Regulatory Annotation We studied the Gene Ontology (GO) biological process, molecular function, and cellular component pathways (release 2016-08-22) of the genes identified in the BMI EWAS using the PANTHER (protein annotation through evolutionary relationship) overrepresentation test [23]. Secondarily, we restricted analysis to the higher certainty genes shown to have altered whole blood gene expression in association with BMI-related differential methylation, as described in the previous section. If multiple probes were annotated to the same gene, then the gene was included only once (unweighted). As the methylation array covers 99% of RefSeq genes, the background universe of genes tested was not restricted. Results were corrected for multiple testing within each category. 
In addition, we used eFORGE v1.2 (http://eforge.cs.ucl.ac.uk/) [24] to identify if the replicated CpGs were enriched in DNase I hypersensitive sites (DHSs) (markers of active regulatory regions) and loci with overlapping histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K36me3) across available cell lines and tissues from Roadmap Epigenomics Project, BLUEPRINT Epigenome, and ENCODE (Encyclopedia of DNA Elements) consortia data [25–27]. Bidirectional and Two-Step Trans-tissue Mendelian Randomization IV analyses using SNPs as IVs for (1) DNA methylation, (2) gene expression, and (3) BMI were conducted in order to infer potential causal relationships between EWAS findings, BMI, and adiposity-related diseases (the series of analyses conducted is outlined in Table 1). The detailed approach is provided in S1 Methods. In brief, differences in methylation and expression were modeled using quantitative trait loci (QTLs), thus leveraging the contribution of genetic variation to epigenetic traits to infer causal relations. Blood QTL IVs were selected as the single top SNP methylation or expression association (by lowest p-value) in the FHS with replication in the external cohorts or public datasets. As QTLs vary in effect in different tissue types, we selected tissue-specific methylation and expression QTLs to examine tissue-specific effects (details in S1 Methods). To model the effect of BMI on methylation (reverse causation), the IV for BMI was assembled as an additive weighted genetic risk score from the 97 genome-wide significant SNPs from the Genetic Investigation of ANthropometric Traits (GIANT) consortium 2015 genome-wide association study (GWAS) results [7]. A sensitivity analysis utilizing a single SNP in the FTO (fat mass and obesity associated) locus as the IV for BMI was conducted to examine an IV less prone to pleiotropy bias but also less powerful to detect potential causal relations. 
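The logic of using a SNP as an instrumental variable can be illustrated with a single-instrument two-stage least squares estimate. The sketch below is not the authors' implementation (their detailed approach is in S1 Methods); it omits covariates and standard errors, and the function names and toy data are hypothetical. The toy data are constructed so that an unmeasured confounder biases the ordinary regression of outcome on exposure, while the instrument recovers the true effect of 0.5.

```python
from statistics import mean


def ols(x, y):
    """Intercept and slope of a simple least squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(instrument, exposure, outcome):
    """Single-instrument 2SLS estimate of the exposure's effect on the outcome."""
    # Stage 1: predict the exposure (e.g. methylation) from the SNP.
    intercept, slope = ols(instrument, exposure)
    fitted = [intercept + slope * z for z in instrument]
    # Stage 2: regress the outcome (e.g. BMI) on the predicted exposure.
    _, causal_effect = ols(fitted, outcome)
    return causal_effect


# Toy data: SNP allele counts and a confounder that is uncorrelated
# with the SNP by construction (all values hypothetical).
snp = [0, 1, 2, 0, 1, 2]
confounder = [1, 1, 1, -1, -1, -1]
exposure = [2 * z + u for z, u in zip(snp, confounder)]
outcome = [0.5 * x + 3 * u for x, u in zip(exposure, confounder)]

iv_estimate = two_stage_least_squares(snp, exposure, outcome)
_, naive_estimate = ols(exposure, outcome)  # biased by the confounder
```

The same two stages apply in the reverse direction, with a genetic risk score for BMI as the instrument and methylation as the outcome.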
Table 1. Schema of instrumental variable analyses conducted in order to infer the potential causal relations between DNA methylation, gene expression, BMI, and adiposity-related disease. https://doi.org/10.1371/journal.pmed.1002215.t001 Forward MR, using the two-stage least squares method, tests the causal relation of differential methylation with BMI. SNP IVs that implicated a causal effect of differential methylation on BMI from the forward MR (Bonferroni-corrected and, secondarily, nominal causal p-value < 0.05) were tested in the trans-tissue two-step MR. The trans-tissue two-step MR was implemented to further break down the relationship between DNA methylation and BMI and to infer whether the hypothesized mediator (gene expression in multiple tissues) is influenced by the exposure (DNA methylation) and, second, whether the mediator (gene expression in multiple tissues) affects the outcome (BMI). SNP IVs that implicated a causal effect of differential methylation and expression on BMI were tested for associations with adiposity-related phenotypes from published GWAS results. Finally, the reverse MR was conducted to test the causal relation of BMI with downstream changes in DNA methylation. Results Discovery Cohort Characteristics The discovery sample included 3,743 individuals: 2,377 from the FHS and 1,366 from the LBCs (n = 446 from LBC1921 and n = 920 from LBC1936). The FHS, LBC1921, and LBC1936 cohorts were older adults (mean [SD] age 67 [9], 79 [1], and 70 [1] y, respectively) and had similar sex distribution (50%–60% female) and proportion of current smokers (8%–11%) (Table 2). Table 2. Study characteristics of the Framingham Heart Study and Lothian Birth Cohort participants (discovery cohorts) at the time of DNA methylation assays. 
https://doi.org/10.1371/journal.pmed.1002215.t002 Epigenome-Wide Association Study of BMI Discovery. In the FHS-LBC EWAS meta-analysis, 135 CpGs were significantly associated with BMI after correction for multiple testing in the primary age- and sex-adjusted model (p < 1.2 × 10−7; full list and regression coefficients are provided in S1 Table; Q-Q plots in S1 and S2 Figs; Manhattan plot in S3 Fig; genomic inflation factor of discovery meta-analysis, λ = 1.14). Similar results were observed following additional adjustment for smoking status and after excluding 313 individuals with BMI outside of 18–35 kg/m2 (Models 2–3 in S2 Table; S4 Fig). External replication. The 135 statistically significant CpGs from the discovery BMI EWAS meta-analysis (primary model) were tested for external replication in the ARIC (n = 2,096), GOLDN (n = 992), and PIVUS (n = 967) cohorts. There was external replication of 83 of 135 CpGs in at least one cohort (73 in ARIC, 22 in GOLDN, and 19 in PIVUS; S5 Fig) at p-value < 3.7 × 10−4 (Bonferroni-corrected p-value for 135 tests), and 83 of 135 CpGs replicated in the meta-analyses of the three replication cohorts and were taken forward for subsequent analyses (S3 Table). Greater methylation was associated with higher BMI at 49 (59%) of the 83 replicated CpGs. The majority of BMI-related CpGs (65%–85% of CpGs depending on the cohort) had mean sample CpG methylation levels between 20% and 80% (S4 Table). Fifty of the 83 replicated differentially methylated CpGs have not been previously reported in microarray-based EWASs of BMI [28–36] (Table 3). Table 3. Fifty novel replicated differentially methylated CpGs associated with BMI sorted by p-value in the discovery cohorts. https://doi.org/10.1371/journal.pmed.1002215.t003 Age and sex interactions among the BMI EWAS findings. 
Among the 135 discovery CpGs, a significant sex interaction was demonstrated in the discovery cohorts for one unannotated CpG (cg26651978 on Chromosome 17q25.3; <3 kbp from the 3′ end of LGALS3BP [lectin galactoside-binding soluble 3-binding protein]), and a significant age interaction for one CpG (cg24678869; DENND4B [DENN domain 4B Rab GDP-GTP exchange factor]) at p-value < 3.7 × 10−4 (Bonferroni-corrected p-value for 135 tests) (S4 Table). The sex interaction identified at cg26651978 (LGALS3BP) modestly replicated in the external cohorts (replication meta-analyses p = 0.02), with larger regression coefficients and lower p-values in stratified models among men than among women (replication meta-analyses p = 1.73 × 10−6 and 0.002 in men and women, respectively; overall and sex-stratified regression coefficients for each cohort in S5 Table). The age interaction at cg24678869 (DENND4B) did not replicate in the external cohorts (replication meta-analyses p = 0.9). Due to the narrow age range in PIVUS, however, this interaction was tested only in ARIC and GOLDN (n = 3,079). HIF3A locus methylation. Examining a previously identified BMI-related differential methylation at the HIF3A locus [28], we demonstrated modest associations with BMI in the FHS-LBC discovery cohorts for the three reported CpGs (p = 0.02 for cg22891070, p = 0.03 for cg16672562, and p = 0.04 for cg27146050; no significant sex interactions). Stratifying models at the median age of 66 y in the FHS (age range too narrow in LBC for stratification) revealed stronger associations in the younger subset and null associations in the older subset (for cg22891070, cg16672562, and cg27146050, p = 0.003, p = 0.008, and p = 0.046, respectively, among participants ≤66 y of age, and p = 0.9, p = 0.6, and p = 0.4, respectively, among participants >66 y of age). Sensitivity models conditioning on cis methylation quantitative trait loci. 
Sensitivity models conditioning on the top cis-meQTL (selected by lowest p-value; ±500 kb from the CpG) in the FHS demonstrated minimal attenuation of the test statistic for the association of BMI with differential methylation: at the majority of CpGs (81/83 [98%]), the test statistic was attenuated by less than 20% (S6 Table). Interindividual Variation in BMI and Distribution of Obesity The interindividual variation in BMI and distribution of obesity captured in the BMI EWAS findings was evaluated. Regressing BMI on the 77 nonredundant (inter-probe correlation |r| < 0.7) CpGs from the 83 replicated CpGs identified in the BMI EWAS revealed that 18% of the interindividual variation (adjusted R2) in BMI is captured by differential methylation beyond age and sex in the external replication cohort PIVUS (S6 Fig). This proportion is similar to that observed when examining a completely independent discovery test set using the 75 CpGs that were methylome-wide significant in the FHS discovery cohort (no replication), which accounted for 17.5% of the interindividual variation in BMI (adjusted R2) beyond age and sex in the LBCs. Creating an additive weighted composite measure of the 77 nonredundant replicated CpGs and examining the distribution of BMI and obesity (BMI ≥ 30 kg/m2) across deciles of the measure demonstrated that the median BMI increased in a graded manner from 22 to 34 kg/m2 and the prevalence of obesity rose from 0% to 50% (Figs 2 and S7). For each SD increase in the composite DNA methylation measure in the PIVUS replication cohort, BMI increased by 1.63 (standard error 0.13) kg/m2 (p = 3.7 × 10−34). The odds ratios for obesity (BMI ≥ 30 kg/m2) and overweight (BMI 25–29.9 kg/m2) compared to the reference group (BMI < 25 kg/m2) were 2.8 (95% CI 2.3–3.5; p = 1.6 × 10−25) and 1.9 (95% CI 1.6–2.2; p = 2.5 × 10−18), respectively, for each SD increase in methylation measure in age- and sex-adjusted models. Fig 2. 
Histogram of the proportion of obese individuals (BMI ≥ 30 kg/m2) in the PIVUS cohort across deciles of the additive weighted composite methylation measure of the 77 nonredundant replicated CpGs (|r| < 0.7) from the BMI epigenome-wide association study. BMI, body mass index; PIVUS, Prospective Investigation of the Vasculature in Uppsala Seniors. https://doi.org/10.1371/journal.pmed.1002215.g002 Three-Way Association of DNA Methylation, Gene Expression, and BMI We examined the association of DNA methylation at the 83 replicated BMI-related CpGs with gene expression among 2,246 FHS participants, in order to determine which genes in blood may be influenced by differential methylation of the BMI EWAS CpGs. Of the 83 replicated CpGs, annotated gene expression from whole blood was available for 62 CpG–gene expression pairs (three transcript results were unavailable on the microarray, and 18 CpGs were intergenic). There were significant associations (p-value < 8 × 10−4; 0.05/62) between differential DNA methylation and gene expression in whole blood for 19 CpG–gene expression pairs, representing ten unique gene transcripts (ABCG1, CPT1A, SREBF1, LGALS3BP, DHCR24, PHGDH, SARS, NOD2, CACNA2D3, and SLC1A5), with almost all of the CpG–gene expression pairs (18/19; 95%) demonstrating an inverse association of methylation with expression (S7 Table). There were significant three-way associations (CpG versus BMI; CpG versus gene expression; gene expression versus BMI) for 11 CpGs with seven unique annotated genes (Table 4). Five of the seven genes (71%) with significant three-way associations between CpG–gene expression–BMI are known to exhibit cardiometabolic phenotypes in murine gene knockout models [37–44]. Table 4. Association results from 11 replicated CpGs with significant three-way associations in whole blood between CpG methylation and BMI, CpG methylation and gene expression, and gene expression and BMI. 
https://doi.org/10.1371/journal.pmed.1002215.t004 Functional and Regulatory Annotation of the BMI EWAS Findings Gene Ontology pathway analyses. GO analyses of biological process, molecular function, and cellular component pathways of the 55 unique genes annotated to the 83 replicated CpGs (ten CpGs were annotated to genes annotated to other replicated CpGs, and 18 CpGs were intergenic) did not identify any statistically significant pathways after adjustment for multiple testing. Secondarily, in order to further refine gene selection for GO analyses to the genes that demonstrated altered expression, we restricted the GO analyses to the ten unique genes for which variation in expression was associated with differential methylation, as described in the previous section. We identified significant overrepresentation of a biological process pathway in the positive regulation of lipid metabolic processes (GO:0045834; adjusted p-value = 0.002; 64-fold enrichment; four overlapping genes [ABCG1, SREBF1, CPT1A and NOD2] of 130 total genes in pathway) and two related processes (positive regulation of the cholesterol biosynthetic [GO:0045542] and cholesterol metabolic [GO:0090205] processes; adjusted p-value = 0.02–0.03). Regulatory annotation of CpGs associated with gene expression in blood. Most BMI-related CpGs associated with altered gene expression were located within 50 kb of the transcription start site and were within known enhancer or DHSs (S8 and S9 Figs). CpGs associated with BMI were more likely to be in enhancers and DHSs (enrichment p-value = 4.5 × 10−7 and 9.4 × 10−4, respectively) and less likely to reside in CpG islands (depletion p-value = 3.2 × 10−11) compared to the full set of measured CpGs on the microarray (S8 Table). DNase I hypersensitive site testing of all identified CpGs. 
Tissue- and cell-type-specific DHS enrichment testing using the eFORGE v1.2 tool demonstrated that the BMI-related CpGs are enriched in DHSs across almost every tissue and cell type assayed in the included ENCODE, BLUEPRINT Epigenome, and Roadmap Epigenomics Project datasets (S10 and S11 Figs), thus supporting the notion that the CpGs identified in blood are also situated in known active regulatory regions in not only blood, but also other metabolically active tissues. Further stratification by whether BMI-related CpGs had overlapping H3 histone methylation revealed that the BMI-related CpGs predominantly overlapped regions with mono-methylation and, to a lesser extent, tri-methylation of lysine 4 on histone H3 (H3K4me1 and H3K4me3) across numerous tissues from the consolidated Roadmap Epigenomics Project data (S12–S14 Figs). H3K4me1 marks are indicative of enhancers, H3K4me3 marks are indicative of promoters, and both are known markers of transcriptional activation. Genetic Instrumental Variable Analyses (Mendelian Randomization) Successive genetic IV analyses were conducted to infer causal relations between differential methylation, gene expression, and BMI, followed by evaluation of the modeled epigenetic changes on adiposity-related traits using GWAS results (Table 1). Forward Mendelian randomization. Testing the causal association of DNA methylation with BMI revealed that differential methylation at two CpGs had nominally significant causal associations (p-value < 0.05) with BMI: (1) cg11024682 (SREBF1; cis-meQTL SNP IV rs752579) and (2) cg07730360 (a non-annotated CpG on Chromosome 3q21.3; trans-meQTL SNP IV rs13437553), with causal p-value = 0.02 and 0.04, respectively (S15 Fig; S9 Table). 
Taking forward the two causal CpGs in discovery for external validation, we found that modeled differential methylation at one of the two CpGs (cis-meQTL SNP IV rs752579 for differential methylation at cg11024682 [SREBF1]) was associated with BMI in the 2015 GIANT consortium results (p = 0.0003; all ancestries). Two-step Mendelian randomization (first step). In the first step (DNA methylation affecting the mediator, gene expression), the SNP IV (rs752579) utilized in the forward MR analyses to model differential methylation of the SREBF1 locus (cg11024682) was also found to be strongly associated with altered SREBF1 gene expression in blood in the FHS (p = 3 × 10−12; decreased expression in relation to the C allele), a published [45] blood expression quantitative trait locus (eQTL) dataset (p = 3.2 × 10−6; direction of effect in blood consistent with that seen in the FHS), and liver (p = 1 × 10−15; in the same direction as observed in blood in a reanalysis of 958 samples [46,47]). Two-step Mendelian randomization (second step). In the second step (gene expression in blood and alternate tissues affecting BMI), we identified adequate eQTLs for SREBF1 expression in whole blood (rs1889018; p = 1.7 × 10−15) from the FHS; in adrenal gland (rs4925138; p = 1.1 × 10−6) and liver (rs11078366; p = 1.8 × 10−6) from the Genotype-Tissue Expression (GTEx) Project; and in adipose tissue (rs4985779; p = 8.4 × 10−4) from the larger MuTHER dataset [48]. The multi-tissue SREBF1 eQTLs were selected to be largely independent from the SREBF1 methylation locus SNP IV (details in S1 Methods). We identified significant associations with BMI (adjusted for the four tests, p < 0.013) in the GIANT consortium results for two of the four tissue types; specifically, BMI was associated with the SNP IV for SREBF1 expression in whole blood (rs1889018, p = 0.002) and adrenal gland (rs4925138, p = 0.0098), but not liver (rs11078366, p = 0.89) or adipose tissue (rs4985779, p = 0.80). 
Adiposity-related traits in GWASs. Assessing other cardiometabolic disease associations from published GWASs, the SNP IV (rs752579) for exposure to differential methylation at the SREBF1 locus (cg11024682) was also found to be associated with (1) adiposity-related traits [49–51] (waist-hip ratio adjusted for BMI [p = 2.0 × 10−4], adiponectin [p = 0.007], birthweight [p = 0.046]), (2) diabetes traits [52–55] (type 2 diabetes [p = 0.002], fasting insulin adjusted for BMI [p = 0.001], HbA1C [p = 0.003], HOMA-B [p = 0.007]), (3) lipid levels [56] (triglycerides [p = 0.001], high-density lipoprotein cholesterol [p = 0.03]), and (4) coronary artery disease [57] (p = 1.7 × 10−6). Additionally, the SNP IV for increased SREBF1 expression in whole blood (rs1889018) was also associated with waist-hip ratio (p = 0.0002), adiponectin (p = 0.003), and triglycerides (p = 0.02) based on GWAS results [49,50,56]. The SNP IV for increased SREBF1 expression in the adrenal gland (rs4925138) was also nominally associated with adiponectin (p = 0.02), triglycerides (p = 0.022), and low-density lipoprotein change in response to statin treatment (p = 0.04) [50,56,58]. Causal effect estimates. Each SD increase in DNA methylation at the SREBF1 locus (cg11024682) was predicted to result in a 2.8-kg/m2 decrease in BMI in the FHS (modeling the effect of allele C for rs752579). In contrast, the observed relationship between methylation in blood and BMI in the FHS was in the opposite direction: a 1.0-kg/m2 increase in BMI per SD increase in DNA methylation at cg11024682. The predicted direction of effect between methylation and BMI is partly derived from the observed direction of effect between the SNP IV and methylation in blood. Previous literature has reported cell-type-dependent QTLs with opposite directions of effect between a SNP and methylation or expression depending on the cell or tissue type examined [59]. 
As extensive databases of trans-tissue methylation are unavailable, we examined trans-tissue eQTLs for SREBF1 from the GTEx Portal [60]. A series of eQTLs for SREBF1 (false discovery rate ≤ 0.05) demonstrate opposite direction of effect between blood versus adrenal gland (p-value < 10−6) and additional tissues (at p-value < 10−5) such as skeletal muscle, esophagus, aorta tissue, and tibial nerve (http://www.gtexportal.org/home/bubbleHeatmapPage/SREBF1). Strong eQTLs for SREBF1 are likely present in adrenal tissue as SREBF1 is highly expressed in the adrenal gland compared to other tissues (http://www.proteinatlas.org/ENSG00000072310-SREBF1/tissue). For example, rs854764 is a strong eQTL for SREBF1 in both blood and adrenal tissue but in opposite directions (p = 3.8 × 10−12 and p = 4 × 10−6, respectively, in the GTEx catalog) and is associated with BMI in GIANT (p = 0.001) and waist-hip ratio (p = 9.2 × 10−4), adiponectin (p = 0.02), HbA1C (p = 0.02), type 2 diabetes (p = 0.03), triglycerides (p = 0.04), and coronary artery disease (p = 1.1 × 10−5) in GWAS results [4,7,50,52,54,57,61]. This SNP, rs854764, is also a meQTL for SREBF1 locus methylation at cg11024682 in the FHS (p = 2.8 × 10−18), but the association with SREBF1 locus methylation in adrenal gland, the potential tissue of effect, is unknown. See S10 Table for causal effect estimates and confidence intervals for the second step of the two-step MR analyses. Reverse Mendelian randomization. To test whether BMI affects methylation at the identified CpGs, the additive weighted genetic risk score of 97 known BMI SNPs [7] was used as an IV for BMI (F-test statistic = 26). Sixteen CpGs were found to be differentially methylated as a consequence of BMI using a nominal causal p-value < 0.05 cutoff (full list in S11 Table). The 16 downstream CpGs were annotated to 12 genes (ABCG1, USP22, DPF1, RARA, KDM2B, KANK2, RALB, NT5DC2, DENND4B, B3GNT7, DKK4, and ABAT). 
A sensitivity analysis using a single SNP in the FTO locus as a BMI IV (S12 Table) further supported causal associations downstream of BMI at two of the 16 CpGs (nominal causal p-value < 0.05 for cg06500161 and cg04286697, at the ABCG1 and B3GNT7 loci, respectively). The annotated genes with BMI-related differential methylation are characterized in Fig 3.
Fig 3. Annotated genes of replicated differentially methylated CpGs identified in the BMI epigenome-wide association study. Genes are grouped by association with gene expression, association of gene expression with BMI, and Mendelian randomization analyses for causal support. Duplicate gene names within the same group are not shown. Figure does not include 18 intergenic CpGs without a gene annotation. BMI, body mass index; EWAS, epigenome-wide association study. https://doi.org/10.1371/journal.pmed.1002215.g003
Discovery Cohort Characteristics
The discovery sample included 3,743 individuals: 2,377 from the FHS and 1,366 from the LBCs (n = 446 from LBC1921 and n = 920 from LBC1936). The FHS, LBC1921, and LBC1936 cohorts were older adults (mean [SD] age 67 [9], 79 [1], and 70 [1] y, respectively) and had similar sex distribution (50%–60% female) and proportion of current smokers (8%–11%) (Table 2).
Table 2. Study characteristics of the Framingham Heart Study and Lothian Birth Cohort participants (discovery cohorts) at the time of DNA methylation assays. https://doi.org/10.1371/journal.pmed.1002215.t002
Epigenome-Wide Association Study of BMI
Discovery.
In the FHS-LBC EWAS meta-analysis, 135 CpGs were significantly associated with BMI after correction for multiple testing in the primary age- and sex-adjusted model (p < 1.2 × 10−7; full list and regression coefficients are provided in S1 Table; Q-Q plots in S1 and S2 Figs; Manhattan plot in S3 Fig; genomic inflation factor of discovery meta-analysis, λ = 1.14). Similar results were observed following additional adjustment for smoking status and after excluding 313 individuals with BMI outside of 18–35 kg/m2 (Models 2–3 in S2 Table; S4 Fig). External replication. The 135 statistically significant CpGs from the discovery BMI EWAS meta-analysis (primary model) were tested for external replication in the ARIC (n = 2,096), GOLDN (n = 992), and PIVUS (n = 967) cohorts. There was external replication of 83 of 135 CpGs in at least one cohort (73 in ARIC, 22 in GOLDN, and 19 in PIVUS; S5 Fig) at p-value < 3.7 × 10−4 (Bonferroni-corrected p-value for 135 tests), and 83 of 135 CpGs replicated in the meta-analyses of the three replication cohorts and were taken forward for subsequent analyses (S3 Table). Greater methylation was associated with higher BMI at 49 (59%) of the 83 replicated CpGs. The majority of BMI-related CpGs (65%–85% of CpGs depending on the cohort) had mean sample CpG methylation levels between 20% and 80% (S4 Table). Fifty of the 83 replicated differentially methylated CpGs have not been previously reported in microarray-based EWASs of BMI [28–36] (Table 3).
Table 3. Fifty novel replicated differentially methylated CpGs associated with BMI sorted by p-value in the discovery cohorts. https://doi.org/10.1371/journal.pmed.1002215.t003
Age and sex interactions among the BMI EWAS findings.
Among the 135 discovery CpGs, a significant sex interaction was demonstrated in the discovery cohorts for one unannotated CpG (cg26651978 on Chromosome 17q25.3; <3 kbp from the 3′ end of LGALS3BP [lectin galactoside-binding soluble 3-binding protein]), and a significant age interaction for one CpG (cg24678869; DENND4B [DENN domain 4B Rab GDP-GTP exchange factor]) at p-value < 3.7 × 10−4 (Bonferroni-corrected p-value for 135 tests) (S4 Table). The sex interaction identified at cg26651978 (LGALS3BP) modestly replicated in the external cohorts (replication meta-analyses p = 0.02), with larger regression coefficients and lower p-values in stratified models among men than among women (replication meta-analyses p = 1.73 × 10−6 and 0.002 in men and women, respectively; overall and sex-stratified regression coefficients for each cohort in S5 Table). The age interaction at cg24678869 (DENND4B) did not replicate in the external cohorts (replication meta-analyses p = 0.9). Due to the narrow age range in PIVUS, however, this interaction was tested only in ARIC and GOLDN (n = 3,079). HIF3A locus methylation. Examining a previously identified BMI-related differential methylation at the HIF3A locus [28], we demonstrated modest associations with BMI in the FHS-LBC discovery cohorts for the three reported CpGs (p = 0.02 for cg22891070, p = 0.03 for cg16672562, and p = 0.04 for cg27146050; no significant sex interactions). Stratifying models at the median age of 66 y in the FHS (age range too narrow in LBC for stratification) revealed stronger associations in the younger subset and null associations in the older subset (for cg22891070, cg16672562, and cg27146050, p = 0.003, p = 0.008, and p = 0.046, respectively, among participants ≤66 y of age, and p = 0.9, p = 0.6, and p = 0.4, respectively, among participants >66 y of age). Sensitivity models conditioning on cis methylation quantitative trait loci. 
Sensitivity models conditioning on the top cis-meQTL (selected by lowest p-value; ±500 kb from the CpG) in the FHS demonstrated minimal attenuation of the test statistic for the association of BMI, with differential methylation at the majority of CpGs (81/83 [98%]) attenuated by less than 20% (S6 Table).
Interindividual Variation in BMI and Distribution of Obesity
The interindividual variation in BMI and distribution of obesity captured in the BMI EWAS findings was evaluated. Regressing BMI on the 77 nonredundant (inter-probe correlation |r| < 0.7) CpGs from the 83 replicated CpGs identified in the BMI EWAS revealed that 18% of the interindividual variation (adjusted R2) in BMI is captured by differential methylation beyond age and sex in the external replication cohort PIVUS (S6 Fig). This proportion is similar to that observed when examining a completely independent discovery test set using the 75 CpGs that were methylome-wide significant in the FHS discovery cohort (no replication), which accounted for 17.5% of the interindividual variation in BMI (adjusted R2) beyond age and sex in the LBCs. Creating an additive weighted composite measure of the 77 nonredundant replicated CpGs and examining the distribution of BMI and obesity (BMI ≥ 30 kg/m2) across deciles of the measure demonstrated that the median BMI increased in a graded manner from 22 to 34 kg/m2 and the prevalence of obesity rose from 0% to 50% (Figs 2 and S7).
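The decile analysis described above can be illustrated with a minimal sketch. It assumes the composite measure is a weighted sum of per-CpG methylation values and that deciles are assigned by rank; the source does not give the implementation, and the weights, CpG names, and data below are hypothetical toy values rather than study data.

```python
from statistics import median


def composite_score(methylation, weights):
    """Additive weighted score: sum over CpGs of weight * methylation value."""
    return sum(weights[cpg] * value for cpg, value in methylation.items())


def decile_index(scores):
    """Assign each score a decile (0 = lowest, 9 = highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    deciles = [0] * len(scores)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(scores)
    return deciles


def summarize_by_decile(scores, bmi, obesity_cutoff=30.0):
    """Return {decile: (median BMI, obesity prevalence)}."""
    groups = {}
    for d, b in zip(decile_index(scores), bmi):
        groups.setdefault(d, []).append(b)
    return {
        d: (median(v), sum(x >= obesity_cutoff for x in v) / len(v))
        for d, v in sorted(groups.items())
    }


# Hypothetical weights and toy data for 20 individuals (not study data).
weights = {"cpg_a": 2.0, "cpg_b": -1.0}
people = [{"cpg_a": 0.30 + 0.01 * i, "cpg_b": 0.60 - 0.005 * i} for i in range(20)]
bmi = [22.0 + 0.6 * i for i in range(20)]

scores = [composite_score(p, weights) for p in people]
summary = summarize_by_decile(scores, bmi)
for d, (med, prev) in summary.items():
    print(f"decile {d + 1}: median BMI {med:.1f}, obesity prevalence {prev:.0%}")
```

In practice the weights would come from the per-CpG regression coefficients; which coefficients the authors used is not stated in this section.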
For each SD increase in the composite DNA methylation measure in the PIVUS replication cohort, BMI increased by 1.63 (standard error 0.13) kg/m2 (p = 3.7 × 10−34). The odds ratios for obesity (BMI ≥ 30 kg/m2) and overweight (BMI 25–29.9 kg/m2) compared to the reference group (BMI < 25 kg/m2) were 2.8 (95% CI 2.3–3.5; p = 1.6 × 10−25) and 1.9 (95% CI 1.6–2.2; p = 2.5 × 10−18), respectively, for each SD increase in methylation measure in age- and sex-adjusted models.
Fig 2. Histogram of the proportion of obese individuals (BMI ≥ 30 kg/m2) in the PIVUS cohort across deciles of the additive weighted composite methylation measure of the 77 nonredundant replicated CpGs (|r| < 0.7) from the BMI epigenome-wide association study. BMI, body mass index; PIVUS, Prospective Investigation of the Vasculature in Uppsala Seniors. https://doi.org/10.1371/journal.pmed.1002215.g002
Three-Way Association of DNA Methylation, Gene Expression, and BMI
We examined the association of DNA methylation at the 83 replicated BMI-related CpGs with gene expression among 2,246 FHS participants, in order to determine which genes in blood may be influenced by differential methylation of the BMI EWAS CpGs. Of the 83 replicated CpGs, annotated gene expression from whole blood was available for 62 CpG–gene expression pairs (three transcript results were unavailable on the microarray, and 18 CpGs were intergenic). There were significant associations (p-value < 8 × 10−4; 0.05/62) between differential DNA methylation and gene expression in whole blood for 19 CpG–gene expression pairs, representing ten unique gene transcripts (ABCG1, CPT1A, SREBF1, LGALS3BP, DHCR24, PHGDH, SARS, NOD2, CACNA2D3, and SLC1A5), with almost all of the CpG–gene expression pairs (18/19; 95%) demonstrating an inverse association of methylation with expression (S7 Table).
There were significant three-way associations (CpG versus BMI; CpG versus gene expression; gene expression versus BMI) for 11 CpGs with seven unique annotated genes (Table 4). Five of the seven genes (71%) with significant three-way associations between CpG–gene expression–BMI are known to exhibit cardiometabolic phenotypes in murine gene knockout models [37–44].
Table 4. Association results from 11 replicated CpGs with significant three-way associations in whole blood between CpG methylation and BMI, CpG methylation and gene expression, and gene expression and BMI. https://doi.org/10.1371/journal.pmed.1002215.t004
Functional and Regulatory Annotation of the BMI EWAS Findings
Gene Ontology pathway analyses. GO analyses of biological process, molecular function, and cellular component pathways of the 55 unique genes annotated to the 83 replicated CpGs (ten CpGs were annotated to genes annotated to other replicated CpGs, and 18 CpGs were intergenic) did not identify any statistically significant pathways after adjustment for multiple testing. Secondarily, in order to further refine gene selection for GO analyses to the genes that demonstrated altered expression, we restricted the GO analyses to the ten unique genes for which variation in expression was associated with differential methylation, as described in the previous section. We identified significant overrepresentation of a biological process pathway in the positive regulation of lipid metabolic processes (GO:0045834; adjusted p-value = 0.002; 64-fold enrichment; four overlapping genes [ABCG1, SREBF1, CPT1A and NOD2] of 130 total genes in pathway) and two related processes (positive regulation of the cholesterol biosynthetic [GO:0045542] and cholesterol metabolic [GO:0090205] processes; adjusted p-value = 0.02–0.03). Regulatory annotation of CpGs associated with gene expression in blood.
Most BMI-related CpGs associated with altered gene expression were located within 50 kb of the transcription start site and were within known enhancer or DHSs (S8 and S9 Figs). CpGs associated with BMI were more likely to be in enhancers and DHSs (enrichment p-value = 4.5 × 10−7 and 9.4 × 10−4, respectively) and less likely to reside in CpG islands (depletion p-value = 3.2 × 10−11) compared to the full set of measured CpGs on the microarray (S8 Table). DNase I hypersensitive site testing of all identified CpGs. Tissue- and cell-type-specific DHS enrichment testing using the eFORGE v1.2 tool demonstrated that the BMI-related CpGs are enriched in DHSs across almost every tissue and cell type assayed in the included ENCODE, BLUEPRINT Epigenome, and Roadmap Epigenomics Project datasets (S10 and S11 Figs), thus supporting the notion that the CpGs identified in blood are also situated in known active regulatory regions in not only blood, but also other metabolically active tissues. Further stratification by whether BMI-related CpGs had overlapping H3 histone methylation revealed that the BMI-related CpGs predominately overlapped regions with mono-methylation and, to a lesser extent, tri-methylation of lysine 4 on histone H3K4 (H3K4me1 and H3K4me3) across numerous tissues from the consolidated Roadmap Epigenomics Project data (S12–S14 Figs). H3K4me1 marks are indicative of enhancers, H3K4me3 marks are indicative of promoters, and both are known markers of transcriptional activation.
Genetic Instrumental Variable Analyses (Mendelian Randomization)
Successive genetic IV analyses were conducted to infer causal relations between differential methylation, gene expression, and BMI, followed by evaluation of the modeled epigenetic changes on adiposity-related traits using GWAS results (Table 1). Forward Mendelian randomization. Testing the causal association of DNA methylation with BMI revealed that differential methylation at two CpGs had nominally significant causal associations (p-value < 0.05) with BMI: (1) cg11024682 (SREBF1; cis-meQTL SNP IV rs752579) and (2) cg07730360 (a non-annotated CpG on Chromosome 3q21.3; trans-meQTL SNP IV rs13437553), with causal p-value = 0.02 and 0.04, respectively (S15 Fig; S9 Table). Taking forward the two causal CpGs in discovery for external validation, we found that modeled differential methylation at one of the two CpGs (cis-meQTL SNP IV rs752579 for differential methylation at cg11024682 [SREBF1]) was associated with BMI in the 2015 GIANT consortium results (p = 0.0003; all ancestries). Two-step Mendelian randomization (first step).
In the first step (DNA methylation affecting the mediator, gene expression), the SNP IV (rs752579) utilized in the forward MR analyses to model differential methylation of the SREBF1 locus (cg11024682) was also found to be strongly associated with altered SREBF1 gene expression in blood in the FHS (p = 3 × 10−12; decreased expression in relation to the C allele), a published [45] blood expression quantitative trait locus (eQTL) dataset (p = 3.2 × 10−6; direction of effect in blood consistent with that seen in the FHS), and liver (p = 1 × 10−15; in the same direction as observed in blood in a reanalysis of 958 samples [46,47]). Two-step Mendelian randomization (second step). In the second step (gene expression in blood and alternate tissues affecting BMI), we identified adequate eQTLs for SREBF1 expression in whole blood (rs1889018; p = 1.7 × 10−15) from the FHS; in adrenal gland (rs4925138; p = 1.1 × 10−6) and liver (rs11078366; p = 1.8 × 10−6) from the Genotype-Tissue Expression (GTEx) Project; and in adipose tissue (rs4985779; p = 8.4 × 10−4) from the larger MuTHER dataset [48]. The multi-tissue SREBF1 eQTLs were selected to be largely independent from the SREBF1 methylation locus SNP IV (details in S1 Methods). We identified significant associations with BMI (adjusted for the four tests, p < 0.013) in the GIANT consortium results for two of the four tissue types; specifically, BMI was associated with the SNP IV for SREBF1 expression in whole blood (rs1889018, p = 0.002) and adrenal gland (rs4925138, p = 0.0098), but not liver (rs11078366, p = 0.89) or adipose tissue (rs4985779, p = 0.80). Adiposity-related traits in GWASs. 
Assessing other cardiometabolic disease associations from published GWASs, the SNP IV (rs752579) for exposure to differential methylation at the SREBF1 locus (cg11024682) was also found to be associated with (1) adiposity-related traits [49–51] (waist-hip ratio adjusted for BMI [p = 2.0 × 10−4], adiponectin [p = 0.007], birthweight [p = 0.046]), (2) diabetes traits [52–55] (type 2 diabetes [p = 0.002], fasting insulin adjusted for BMI [p = 0.001], HbA1C [p = 0.003], HOMA-B [p = 0.007]), (3) lipid levels [56] (triglycerides [p = 0.001], high-density lipoprotein cholesterol [p = 0.03]), and (4) coronary artery disease [57] (p = 1.7 × 10−6). Additionally, the SNP IV for increased SREBF1 expression in whole blood (rs1889018) was also associated with waist-hip ratio (p = 0.0002), adiponectin (p = 0.003), and triglycerides (p = 0.02) based on GWAS results [49,50,56]. The SNP IV for increased SREBF1 expression in the adrenal gland (rs4925138) was also nominally associated with adiponectin (p = 0.02), triglycerides (p = 0.022), and low-density lipoprotein change in response to statin treatment (p = 0.04) [50,56,58]. Causal effect estimates. Each SD increase in DNA methylation at the SREBF1 locus (cg11024682) was predicted to result in a 2.8-kg/m2 decrease in BMI in the FHS (modeling the effect of allele C for rs752579). In contrast, the observed relationship between methylation in blood and BMI in the FHS was in the opposite direction: a 1.0-kg/m2 increase in BMI per SD increase in DNA methylation at cg11024682. The predicted direction of effect between methylation and BMI is partly derived from the observed direction of effect between the SNP IV and methylation in blood. Previous literature has reported cell-type-dependent QTLs with opposite directions of effect between a SNP and methylation or expression depending on the cell or tissue type examined [59]. 
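To illustrate why the predicted direction of effect depends on the direction of the SNP-methylation association, here is a minimal sketch assuming a single-instrument ratio (Wald) estimator. The text does not confirm which estimator was used (details are deferred to the supplementary methods), and every number below is hypothetical.

```python
def wald_ratio(beta_snp_outcome: float, beta_snp_exposure: float) -> float:
    """Ratio (Wald) estimate: outcome change per unit change in the exposure."""
    if beta_snp_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    return beta_snp_outcome / beta_snp_exposure


# Hypothetical per-allele effects, chosen only for illustration.
beta_snp_bmi = -0.10           # kg/m2 per allele (SNP -> outcome)
beta_snp_meth_blood = 0.25     # SD of methylation per allele in one tissue
beta_snp_meth_other = -0.25    # same SNP, opposite direction in another tissue

est_blood = wald_ratio(beta_snp_bmi, beta_snp_meth_blood)
est_other = wald_ratio(beta_snp_bmi, beta_snp_meth_other)

print(f"estimate using the blood meQTL direction:  {est_blood:+.2f} kg/m2 per SD")
print(f"estimate using the opposite direction:     {est_other:+.2f} kg/m2 per SD")
```

With the SNP-outcome association held fixed, flipping the sign of the SNP-exposure association flips the sign of the estimate, which is why tissue-dependent QTL directions matter for interpreting the predicted effect.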
As extensive databases of trans-tissue methylation are unavailable, we examined trans-tissue eQTLs for SREBF1 from the GTEx Portal [60]. A series of eQTLs for SREBF1 (false discovery rate ≤ 0.05) demonstrate opposite direction of effect between blood versus adrenal gland (p-value < 10−6) and additional tissues (at p-value < 10−5) such as skeletal muscle, esophagus, aorta tissue, and tibial nerve (http://www.gtexportal.org/home/bubbleHeatmapPage/SREBF1). Strong eQTLs for SREBF1 are likely present in adrenal tissue as SREBF1 is highly expressed in the adrenal gland compared to other tissues (http://www.proteinatlas.org/ENSG00000072310-SREBF1/tissue). For example, rs854764 is a strong eQTL for SREBF1 in both blood and adrenal tissue but in opposite directions (p = 3.8 × 10−12 and p = 4 × 10−6, respectively, in the GTEx catalog) and is associated with BMI in GIANT (p = 0.001) and waist-hip ratio (p = 9.2 × 10−4), adiponectin (p = 0.02), HbA1C (p = 0.02), type 2 diabetes (p = 0.03), triglycerides (p = 0.04), and coronary artery disease (p = 1.1 × 10−5) in GWAS results [4,7,50,52,54,57,61]. This SNP, rs854764, is also a meQTL for SREBF1 locus methylation at cg11024682 in the FHS (p = 2.8 × 10−18), but the association with SREBF1 locus methylation in adrenal gland, the potential tissue of effect, is unknown. See S10 Table for causal effect estimates and confidence intervals for the second step of the two-step MR analyses. Reverse Mendelian randomization. To test whether BMI affects methylation at the identified CpGs, the additive weighted genetic risk score of 97 known BMI SNPs [7] was used as an IV for BMI (F-test statistic = 26). Sixteen CpGs were found to be differentially methylated as a consequence of BMI using a nominal causal p-value < 0.05 cutoff (full list in S11 Table). The 16 downstream CpGs were annotated to 12 genes (ABCG1, USP22, DPF1, RARA, KDM2B, KANK2, RALB, NT5DC2, DENND4B, B3GNT7, DKK4, and ABAT). 
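As a rough sketch of how an additive weighted genetic risk score and its instrument strength might be computed, assuming the score is a sum of effect-allele dosages weighted by published per-SNP effect sizes and that the F statistic comes from a first-stage regression of BMI on the score: the dosages, weights, R2, and sample size below are hypothetical, and the study's own procedure is not described in this section.

```python
def weighted_grs(dosages, weights):
    """Additive weighted genetic risk score: sum of dosage * per-SNP weight."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def first_stage_f(r_squared: float, n_samples: int, n_instruments: int = 1) -> float:
    """F statistic for regressing the exposure on the instrument(s)."""
    k = n_instruments
    return (r_squared / k) / ((1.0 - r_squared) / (n_samples - k - 1))


# Hypothetical example: three SNPs instead of 97, made-up weights.
dosages = [0, 1, 2]              # effect-allele counts for one person
weights = [0.02, 0.05, 0.01]     # per-allele effect sizes on BMI
score = weighted_grs(dosages, weights)

# Hypothetical first-stage fit: the score explains 2% of BMI variance.
f_stat = first_stage_f(r_squared=0.02, n_samples=1000)

print(f"GRS = {score:.2f}, first-stage F = {f_stat:.1f}")
```

A commonly cited rule of thumb treats a first-stage F statistic above about 10 as indicating an adequately strong instrument, which the reported value of 26 satisfies.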
A sensitivity analysis using a single SNP in the FTO locus as a BMI IV (S12 Table) further supported causal associations downstream of BMI at two of the 16 CpGs (nominal causal p-value < 0.05 for cg06500161 and cg04286697, at the ABCG1 and B3GNT7 loci, respectively). The annotated genes with BMI-related differential methylation are characterized in Fig 3. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 3. Annotated genes of replicated differentially methylated CpGs identified in the BMI epigenome-wide association study. Genes are grouped by association with gene expression, association of gene expression with BMI, and Mendelian randomization analyses for causal support. Duplicate gene names within the same group are not shown. Figure does not include 18 intergenic CpGs without a gene annotation. BMI, body mass index; EWAS, epigenome-wide association study. https://doi.org/10.1371/journal.pmed.1002215.g003 Forward Mendelian randomization. Testing the causal association of DNA methylation with BMI revealed that differential methylation at two CpGs had nominally significant causal associations (p-value < 0.05) with BMI: (1) cg11024682 (SREBF1; cis-meQTL SNP IV rs752579) and (2) cg07730360 (a non-annotated CpG on Chromosome 3q21.3; trans-meQTL SNP IV rs13437553), with causal p-value = 0.02 and 0.04, respectively (S15 Fig; S9 Table). Taking forward the two causal CpGs in discovery for external validation, we found that modeled differential methylation at one of the two CpGs (cis-meQTL SNP IV rs752579 for differential methylation at cg11024682 [SREBF1]) was associated with BMI in the 2015 GIANT consortium results (p = 0.0003; all ancestries). Two-step Mendelian randomization (first step). 
In the first step (DNA methylation affecting the mediator, gene expression), the SNP IV (rs752579) utilized in the forward MR analyses to model differential methylation of the SREBF1 locus (cg11024682) was also found to be strongly associated with altered SREBF1 gene expression in blood in the FHS (p = 3 × 10−12; decreased expression in relation to the C allele), a published [45] blood expression quantitative trait locus (eQTL) dataset (p = 3.2 × 10−6; direction of effect in blood consistent with that seen in the FHS), and liver (p = 1 × 10−15; in the same direction as observed in blood in a reanalysis of 958 samples [46,47]). Two-step Mendelian randomization (second step). In the second step (gene expression in blood and alternate tissues affecting BMI), we identified adequate eQTLs for SREBF1 expression in whole blood (rs1889018; p = 1.7 × 10−15) from the FHS; in adrenal gland (rs4925138; p = 1.1 × 10−6) and liver (rs11078366; p = 1.8 × 10−6) from the Genotype-Tissue Expression (GTEx) Project; and in adipose tissue (rs4985779; p = 8.4 × 10−4) from the larger MuTHER dataset [48]. The multi-tissue SREBF1 eQTLs were selected to be largely independent from the SREBF1 methylation locus SNP IV (details in S1 Methods). We identified significant associations with BMI (adjusted for the four tests, p < 0.013) in the GIANT consortium results for two of the four tissue types; specifically, BMI was associated with the SNP IV for SREBF1 expression in whole blood (rs1889018, p = 0.002) and adrenal gland (rs4925138, p = 0.0098), but not liver (rs11078366, p = 0.89) or adipose tissue (rs4985779, p = 0.80). Adiposity-related traits in GWASs. 
Assessing other cardiometabolic disease associations from published GWASs, the SNP IV (rs752579) for exposure to differential methylation at the SREBF1 locus (cg11024682) was also found to be associated with (1) adiposity-related traits [49–51] (waist-hip ratio adjusted for BMI [p = 2.0 × 10−4], adiponectin [p = 0.007], birthweight [p = 0.046]), (2) diabetes traits [52–55] (type 2 diabetes [p = 0.002], fasting insulin adjusted for BMI [p = 0.001], HbA1C [p = 0.003], HOMA-B [p = 0.007]), (3) lipid levels [56] (triglycerides [p = 0.001], high-density lipoprotein cholesterol [p = 0.03]), and (4) coronary artery disease [57] (p = 1.7 × 10−6). Additionally, the SNP IV for increased SREBF1 expression in whole blood (rs1889018) was also associated with waist-hip ratio (p = 0.0002), adiponectin (p = 0.003), and triglycerides (p = 0.02) based on GWAS results [49,50,56]. The SNP IV for increased SREBF1 expression in the adrenal gland (rs4925138) was also nominally associated with adiponectin (p = 0.02), triglycerides (p = 0.022), and low-density lipoprotein change in response to statin treatment (p = 0.04) [50,56,58]. Causal effect estimates. Each SD increase in DNA methylation at the SREBF1 locus (cg11024682) was predicted to result in a 2.8-kg/m2 decrease in BMI in the FHS (modeling the effect of allele C for rs752579). In contrast, the observed relationship between methylation in blood and BMI in the FHS was in the opposite direction: a 1.0-kg/m2 increase in BMI per SD increase in DNA methylation at cg11024682. The predicted direction of effect between methylation and BMI is partly derived from the observed direction of effect between the SNP IV and methylation in blood. Previous literature has reported cell-type-dependent QTLs with opposite directions of effect between a SNP and methylation or expression depending on the cell or tissue type examined [59]. 
As extensive databases of trans-tissue methylation are unavailable, we examined trans-tissue eQTLs for SREBF1 from the GTEx Portal [60]. A series of eQTLs for SREBF1 (false discovery rate ≤ 0.05) demonstrate opposite directions of effect in blood versus adrenal gland (p-value < 10⁻⁶) and additional tissues (at p-value < 10⁻⁵) such as skeletal muscle, esophagus, aorta tissue, and tibial nerve (http://www.gtexportal.org/home/bubbleHeatmapPage/SREBF1). Strong eQTLs for SREBF1 are likely present in adrenal tissue as SREBF1 is highly expressed in the adrenal gland compared to other tissues (http://www.proteinatlas.org/ENSG00000072310-SREBF1/tissue). For example, rs854764 is a strong eQTL for SREBF1 in both blood and adrenal tissue but in opposite directions (p = 3.8 × 10⁻¹² and p = 4 × 10⁻⁶, respectively, in the GTEx catalog) and is associated with BMI in GIANT (p = 0.001) and with waist-hip ratio (p = 9.2 × 10⁻⁴), adiponectin (p = 0.02), HbA1C (p = 0.02), type 2 diabetes (p = 0.03), triglycerides (p = 0.04), and coronary artery disease (p = 1.1 × 10⁻⁵) in GWAS results [4,7,50,52,54,57,61]. This SNP, rs854764, is also a meQTL for SREBF1 locus methylation at cg11024682 in the FHS (p = 2.8 × 10⁻¹⁸), but the association with SREBF1 locus methylation in adrenal gland, the potential tissue of effect, is unknown. See S10 Table for causal effect estimates and confidence intervals for the second step of the two-step MR analyses.
Reverse Mendelian randomization. To test whether BMI affects methylation at the identified CpGs, the additive weighted genetic risk score of 97 known BMI SNPs [7] was used as an IV for BMI (F-test statistic = 26). Sixteen CpGs were found to be differentially methylated as a consequence of BMI using a nominal causal p-value < 0.05 cutoff (full list in S11 Table). The 16 downstream CpGs were annotated to 12 genes (ABCG1, USP22, DPF1, RARA, KDM2B, KANK2, RALB, NT5DC2, DENND4B, B3GNT7, DKK4, and ABAT).
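As a rough illustration of how an additive weighted genetic risk score and an instrument-strength F statistic can be computed, consider the minimal sketch below. It uses three SNPs rather than 97, and the allele counts, weights, variance explained, and sample size are all hypothetical; the function names are illustrative and do not reflect the study's actual pipeline.

```python
def weighted_genetic_risk_score(allele_counts, weights):
    """Sum of effect-allele counts (0, 1, or 2 per SNP) times per-SNP weights.

    The weights would typically be published per-allele effect sizes on the
    trait being instrumented; here they are hypothetical.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def first_stage_f_statistic(r_squared, n_samples, n_instruments=1):
    """Regression F statistic for the instrument-exposure (first-stage) model.

    A single risk score counts as one instrument, so n_instruments defaults to 1.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    numerator = r_squared / n_instruments
    denominator = (1 - r_squared) / (n_samples - n_instruments - 1)
    return numerator / denominator


# Hypothetical individual carrying 2, 1, and 0 effect alleles at three SNPs.
counts = [2, 1, 0]
weights = [0.10, 0.20, 0.30]   # hypothetical per-allele effects on BMI
score = weighted_genetic_risk_score(counts, weights)

# Hypothetical first stage: the score explains 10% of BMI variance in 102 people.
f_stat = first_stage_f_statistic(r_squared=0.10, n_samples=102)

print(f"risk score: {score:.2f}, first-stage F: {f_stat:.1f}")
```

Larger F statistics indicate a stronger instrument; the value reported above for the 97-SNP score (F = 26) comes from the study itself, not from this sketch.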
A sensitivity analysis using a single SNP in the FTO locus as a BMI IV (S12 Table) further supported causal associations downstream of BMI at two of the 16 CpGs (nominal causal p-value < 0.05 for cg06500161 and cg04286697, at the ABCG1 and B3GNT7 loci, respectively). The annotated genes with BMI-related differential methylation are characterized in Fig 3.
Fig 3. Annotated genes of replicated differentially methylated CpGs identified in the BMI epigenome-wide association study. Genes are grouped by association with gene expression, association of gene expression with BMI, and Mendelian randomization analyses for causal support. Duplicate gene names within the same group are not shown. Figure does not include 18 intergenic CpGs without a gene annotation. BMI, body mass index; EWAS, epigenome-wide association study. https://doi.org/10.1371/journal.pmed.1002215.g003
Discussion
In this analysis of the association of BMI with differential methylation of blood-derived DNA, we provide robust evidence of a connection between replicable epigenetic signaling at 83 CpGs and BMI. We also demonstrate the correlation of BMI-related differential methylation with the altered expression of ten genes in whole blood that are overrepresented in lipid metabolism pathways. Among the 83 replicated BMI-related CpGs, one differentially methylated locus (cg11024682) at the lipid metabolism transcription factor SREBF1 demonstrated evidence of a causal effect on BMI; genetically predicted exposure to differential methylation and expression of SREBF1 was found to be associated with BMI and other adiposity traits, glycemic traits, dyslipidemia, and coronary artery disease. In contrast, we found that a substantial proportion (16 out of 83 [19%]) of the BMI-related differentially methylated CpGs identified in this EWAS are likely a consequence of BMI (i.e., downstream signals).
BMI Variation Is Reflected in DNA Methylation Signatures in Blood
A substantial proportion (~18%) of interindividual variation in BMI is captured by the replicated differentially methylated CpGs in blood. The magnitude of BMI difference (~12 kg/m² between the highest and lowest deciles) equates to substantial health risks; for example, each 5-kg/m² increase in BMI in the general population is associated with a 30% increase in mortality [62]. Our results suggest that epigenetic biomarkers hold the potential to improve risk prediction and help tailor therapy choices to prevent or treat cardiometabolic diseases. For example, at the population level, BMI is an effective measure of average future cardiometabolic disease risk [63], but it is insufficiently predictive at the individual level. Regardless of causality, blood-based biomarkers can be useful for prognostic or diagnostic purposes. Further research is required to determine whether refining BMI-related risk by incorporating epigenetic biomarkers can improve risk prediction and help guide treatment decisions.
Differential Methylation Is Identified in Loci Known to Be Involved in Adiposity
Lipid metabolism. Previously conducted experiments support a causal role of SREBF1 in adiposity [64]. SREBF1 (also known as SREBP1) plays a central role in energy homeostasis by promoting glycolysis, lipogenesis, and adipogenesis via induction of the conversion of acetyl-CoA to triglycerides (S16 Fig). SREBF1 promotes the conversion of free fatty acids to triglycerides in the liver and to triglyceride-rich lipoproteins in the bloodstream. In situations of caloric excess, SREBF1 is a key mediator of the induction of lipogenesis in humans [64]. In mice with diet-induced insulin resistance, inhibition of SREBF1 attenuates accelerated atherosclerosis, supporting a link to atherosclerosis and coronary artery disease [65].
The causal connection between increased triglyceride-rich lipoproteins and coronary disease is supported by human genetic studies [66]. We highlight the potential role of SREBF1 expression in the adrenal gland in weight regulation and adiposity-related diseases based on results from the MR analyses. Diseases of the adrenal gland are known to be linked to severe obesity, and adrenalectomy in murine models can reverse genetically induced obesity [67,68]. Our results suggest that altered genomic regulation of SREBF1 is causally related to BMI; however, the lack of large datasets of meQTLs in numerous tissues and under various conditions, in combination with the inability to conduct tissue-targeted epigenetic editing in relevant experimental models, limits our ability to make a definitive causal inference. Regulation of SREBF1 is an underexplored target for the prevention of coronary artery disease. Another of our top genes, CPT1A (carnitine palmitoyltransferase 1A), is an outer mitochondrial membrane enzyme involved in the utilization of acetyl-CoA, functioning as a key enzyme in the beta-oxidation of long-chain fatty acids in mitochondrial energy metabolism. Acetyl-CoA has recently been identified as a central link between altered lipolysis due to adiposity or inflammation and resultant changes in hepatic insulin resistance with cross-communication between liver and adipose tissue [69]. ABCG1 (ATP binding cassette G1), a cell-membrane lipid transporter, has an established role in reverse cholesterol transport; its role in obesity is supported in previous animal [37] and human studies [70]. DHCR24 (24-dehydrocholesterol reductase) catalyzes the reduction of sterol intermediates during cholesterol synthesis. Differential methylation of SREBF1, CPT1A, ABCG1, and DHCR24 has been reported in previous EWASs of adiposity, glycemic traits, and lipids [29–31,71–76]. 
We add to the published literature and provide evidence that differential methylation at the ABCG1 locus is likely a downstream effect of BMI. Taken together, these findings indicate that epigenetic dysregulation is emerging as a common link between obesity and obesity-related comorbidities. Although further functional research is required, we hypothesize that obesity and adiposity-related diseases are partly driven by changes in DNA methylation, with resultant dysregulation of energy balance via effects on expression of lipid metabolism pathway genes. Regulatory mechanisms involved in energy homeostasis have been proposed as attractive targets for the treatment of obesity, metabolic syndrome, and heart disease [77,78]. Our results demonstrate that these connections are evident in humans, adding to previous evidence from animal models [78].
Inflammatory pathways. Aside from the lipid metabolism genes, a number of loci involved in inflammatory pathways were identified by our EWAS. Enlarged adipocytes in obese individuals are known to promote inflammation. One BMI-related differentially methylated CpG was identified at the NOD2 (nucleotide binding oligomerization domain 2) locus. NOD2, an innate immune receptor, is involved in the immune response to bacterial lipopolysaccharides (LPSs) by activating NF-κB signaling. Uptake of LPS from gut microbiota has been demonstrated to result in increased internalization of LPS-rich lipoproteins into adipocytes and to promote macrophage conversion from the M2 form to the inflammatory M1 form [79]. NOD2 is also included in the GO pathway for regulation of lipid metabolism (0045834) as it is a positive regulator of phosphatidylinositol 3-kinase activity and has been demonstrated to promote vascular inflammation and formation of lipid-rich atherosclerotic lesions in hypercholesterolemic LDLR−/− mice [80,81].
NOD2 interacts with another BMI-related differentially methylated inflammatory gene locus at SOCS3 (suppressor of cytokine signaling 3), a negative regulator of cytokine signaling. In addition, LGALS3BP (lectin, galactoside binding soluble 3 binding protein), also known as MAC2BP (Mac-2-binding protein), is involved in the immune response associated with lymphokine-activated killer cell cytotoxicity and platelet activation, signaling, and aggregation. LGALS3BP has been found to stimulate host defenses and is elevated in individuals with various types of cancer such as breast, lung, colorectal, ovarian, and endometrial cancers, many of which are obesity-related. In addition, LGALS3BP was recently identified as a promising biomarker for non-alcoholic steatohepatitis and pancreatitis [82,83], known obesity-related diseases. Methylation at the LGALS3BP locus demonstrated a significant sex interaction, with a stronger effect in men. This may be related to environmental factors more common in men (such as specific dietary patterns) or male-specific physiology.
Differential Methylation Is Identified in Loci Not Previously Linked to Adiposity
Serine metabolism. Two of the ten genes differentially expressed in association with BMI-related methylation (PHGDH and SARS) are involved in L-serine metabolism. PHGDH (phosphoglycerate dehydrogenase) is involved in the early steps of the synthesis of the amino acid L-serine, which plays a role in oxidoreductase as a NADP acceptor in the tricarboxylic acid cycle. SARS (seryl-tRNA synthetase) catalyzes the transfer of L-serine to tRNA. In addition, RPS6KA2 (ribosomal protein S6 kinase A2), a locus not previously reported as being BMI-related, is a serine/threonine kinase that acts downstream of MAPK signaling and is involved in cell proliferation.
L-serine is necessary for specific functions in the central nervous system; however, the link between adiposity and functional health consequences via effects on serine metabolism is currently unknown.
Cell-membrane transporters. In addition to the cell-membrane transporters discussed, two additional membrane transporters were identified among the ten genes associated with differential methylation. SLC1A5 (solute carrier family 1 member 5), which was found to have significant three-way associations with altered gene expression in blood and BMI, is a sodium-dependent transporter of amino acids. It is activated by insulin concentration, which is often elevated in individuals with obesity. BMI-related differential methylation was also identified at the CACNA2D3 (calcium channel, voltage-dependent, alpha 2/delta subunit 3) locus. CACNA2D3 is involved in nerve signal transmission and cardiac conduction.
BMI EWAS Findings in the Context of Published Epigenetic Epidemiology Studies
Previous in silico methods of identifying putative epigenetically regulated obesity genes highlighted SOCS3 (suppressor of cytokine signaling 3) and RARA (retinoic acid receptor alpha) [84], both of which were identified in the FHS-LBC meta-analysis (p = 2.7 × 10⁻¹¹ for cg27637521 in SOCS3 and p = 1.3 × 10⁻⁸ for cg13274938 in RARA). An association study of DNA methylation and BMI in 459 individuals from the Cardiogenics Consortium identified an association of methylation at three CpGs intronic to HIF3A (hypoxia inducible factor 3A) in blood and adipose cells with BMI [28]. We found modest associations of differential methylation and expression at the HIF3A locus with BMI in our study. However, the associations were stronger in younger individuals in the FHS, suggesting that the connection may be less apparent at older ages. At a nominal causal p-value < 0.05, we found that many (16 [19%]) of the replicated CpGs are downstream of BMI.
This is consistent with recent findings from longitudinal methylation data and bidirectional MR in the Avon Longitudinal Study of Parents and Children [85] that BMI-related HIF3A methylation is likely secondary to differences in BMI.
There is substantial overlap between the identified BMI-related CpGs and reported CpG–metabolite associations in blood from 1,814 participants in the KORA cohort (Kooperative Gesundheitsforschung in der Region Augsburg) [86] (S13 Table). Notably, ceramides and sphingolipids, which are known to have altered levels among obese individuals and are implicated in the development of the metabolic syndrome [87–89], were identified. In addition, the BMI-related differentially methylated CpG (cg03725309) at the SARS locus, as discussed above in the serine metabolism section, was found to be associated with blood levels of serine.
BMI EWAS Findings in the Context of BMI GWAS Results and Nearby Genetic Variants
Of note, none of the CpGs associated with BMI was near genes previously identified in GWASs of obesity-related traits, such as FTO (fat mass and obesity associated) or MC4R (melanocortin 4 receptor). We hypothesize that many of the replicated differentially methylated loci reflect novel pathways involved in the regulation of adiposity or adiposity-related diseases. Long-range interactions of DNA methylation with known obesity-related loci, however, may exist [90]. Further work to understand the role of the novel loci in relation to adiposity is also required. In addition, combining information from DNA methylation with genetic markers identified from DNA sequence variation may allow for improvements in risk prediction previously not possible with sequence variants alone [91]. Many of the significant loci from the discovery phase (73 of 135) were replicated in African-Americans from the ARIC study [30].
Similarly, many of the BMI-related differentially methylated CpGs identified in this study were also reported in relation to BMI in people of Arabic ancestry [34]. In GWASs, failure to replicate across racial/ethnic groups may be due to differences in allele frequencies and linkage disequilibrium patterns. In contrast, the high rate of replication of DNA methylation results for BMI in individuals of European, African, and other ancestries suggests that shared environmental exposures or changes secondary to differences in BMI, and not genetic variation, may underlie many of the associations. Further work is needed to identify environmental factors that promote or mitigate disease-relevant obesity-related epigenetic dysregulation. Our analyses that conditioned on top meQTLs showed minimal attenuation, suggesting that the association between differential methylation and BMI is largely independent of genetic variants near the reported CpGs.
Study Limitations
Our study has several limitations. Results from MR analyses utilizing genetically predicted methylation and expression levels do not prove causation but provide supportive evidence. The results of the MR analyses are based on numerous assumptions, for example, that there are no alternative pathways through which the SNP IV may act on BMI (i.e., pleiotropic effects). The MR assumptions cannot be tested directly, and violations may bias the results. The forward MR results did not reach Bonferroni-adjusted significance thresholds for multiple testing; however, validation of the nominally significant results in the larger GIANT consortium supports our findings. We avoided the use of multi-SNP score IVs as we had already identified adequate single SNP meQTLs and using multi-SNP score IVs would have further risked introducing bias due to pleiotropy. The meQTLs for the MR analyses were derived in the FHS and the outcome was tested within the same cohort, which can potentially result in bias toward significance.
The MR analyses, using the blood meQTL IV, suggest an inverse relationship between the predicted methylation of the SREBF1 locus and BMI, the reverse of the observed relationship, which can be interpreted as a null result. This finding is potentially explained by different directions of effect of QTLs in alternate tissues, which was supported by examining the association of genetic variants in blood versus other metabolically active tissues in the GTEx Project resource. Unfortunately, there are limited datasets of meQTLs in various tissues to explore this further. Associations of BMI with methylation at the same CpG in different directions of effect in blood versus adipose-derived DNA have been previously reported at BMI-related CpG sites [30]. For SREBF1, we presume that the metabolic consequences of altered methylation and the effect on BMI occur in tissues other than blood, such as the adrenal gland, with the methylation changes in blood that we were able to detect representing a biomarker of trans-tissue differential methylation [92]. In addition, it is possible that positive and negative feedback loops cause regulation of the same gene to be both a cause and a downstream effect of adiposity. We would not be able to discern this scenario from the observational cross-sectional data in this study.
An alternate methylation assay would be required for clinical purposes as the current microarrays are unsuitable in a clinical setting, and future research would be required for its technical validation. Our study supports blood cells as a useful accessible tissue for epigenetic biomarker discovery in large population studies. However, our study would not be able to detect tissue-specific methylation changes occurring in non-blood cell lines (e.g., neuron-specific epigenetic modifications in relation to BMI).
Many of our top CpGs replicated in the GOLDN study, which assessed DNA methylation in a single blood cell type (CD4+), suggesting that the associations we detected are not likely to be due to confounding by blood cell heterogeneity. Many of the genes associated with BMI-related differential methylation were known to have a role in adiposity and cardiometabolic traits from murine knockout models; however, the universe of knockout models is likely enriched for the study of adiposity and cardiometabolic traits, and we could not directly test whether our results identified more than expected. Our study was conducted among older-age adults, and the findings may not be generalizable to younger ages.
Conclusions
We provide the results of a large EWAS of BMI in almost 8,000 individuals that identified 83 replicable DNA methylation loci and evidence of complementary transcriptomic differences that were enriched for gene products involved in lipid metabolism. The genetic IV analyses prioritize the SREBF1 locus for future functional studies to further define the causal relation with adiposity, insulin resistance, obesity-related dyslipidemia, and coronary artery disease. Our findings provide a foundation for further research to determine if individualized epigenetic profiles can be used to guide clinical decision making and improve health outcomes. Our findings may have additional clinical and therapeutic relevance if other loci that are differentially methylated in relation to BMI represent attractive targets for the treatment or prevention of obesity and adiposity-related diseases.
The magnitude of BMI difference (~12 kg/m2 between the highest and lowest deciles) equates to substantial health risks; for example, each 5-kg/m2 increase in BMI in the general population is associated with a 30% increase in mortality [62]. Our results suggest that epigenetic biomarkers hold the potential to improve risk prediction and help tailor therapy choices to prevent or treat cardiometabolic diseases. For example, at the population level, BMI is an effective measure of average future cardiometabolic disease risk [63], but it is insufficiently predictive at the individual level. Regardless of causality, blood-based biomarkers can be useful for prognostic or diagnostic purposes. Further research is required to determine whether refining BMI-related risk by incorporating epigenetic biomarkers can improve risk prediction and help guide treatment decisions. Differential Methylation Is Identified in Loci Known to Be Involved in Adiposity Lipid metabolism. Previously conducted experiments support a causal role of SREBF1 in adiposity [64]. SREBF1 (also known as SREBP1) plays a central role in energy homeostasis by promoting glycolysis, lipogenesis, and adipogenesis via induction of the conversion of acetyl-CoA to triglycerides (S16 Fig). SREBF1 promotes the conversion of free fatty acids to triglycerides in the liver and to triglyceride-rich lipoproteins in the bloodstream. In situations of caloric excess, SREBF1 is a key mediator of the induction of lipogenesis in humans [64]. In mice with diet-induced insulin resistance, inhibition of SREBF1 attenuates accelerated atherosclerosis, supporting a link to atherosclerosis and coronary artery disease [65]. The causal connection between increased triglyceride-rich lipoproteins and coronary disease is supported by human genetic studies [66]. We highlight the potential role of SREBF1 expression in the adrenal gland in weight regulation and adiposity-related diseases based on results from the MR analyses. 
Diseases of the adrenal gland are known to be linked to severe obesity, and adrenalectomy in murine models can reverse genetically induced obesity [67,68]. Our results suggest that altered genomic regulation of SREBF1 is causally related to BMI; however, the lack of large datasets of meQTLs in numerous tissues and under various conditions, in combination with the inability to conduct tissue-targeted epigenetic editing in relevant experimental models, limits our ability to make a definitive causal inference. Regulation of SREBF1 is an underexplored target for the prevention of coronary artery disease. Another of our top genes, CPT1A (carnitine palmitoyltransferase 1A), is an outer mitochondrial membrane enzyme involved in the utilization of acetyl-CoA, functioning as a key enzyme in the beta-oxidation of long-chain fatty acids in mitochondrial energy metabolism. Acetyl-CoA has recently been identified as a central link between altered lipolysis due to adiposity or inflammation and resultant changes in hepatic insulin resistance with cross-communication between liver and adipose tissue [69]. ABCG1 (ATP binding cassette G1), a cell-membrane lipid transporter, has an established role in reverse cholesterol transport; its role in obesity is supported in previous animal [37] and human studies [70]. DHCR24 (24-dehydrocholesterol reductase) catalyzes the reduction of sterol intermediates during cholesterol synthesis. Differential methylation of SREBF1, CPT1A, ABCG1, and DHCR24 has been reported in previous EWASs of adiposity, glycemic traits, and lipids [29–31,71–76]. We add to the published literature and provide evidence that differential methylation at the ABCG1 locus is likely a downstream effect of BMI. From these findings taken together, epigenetic dysregulation is emerging as a common link between obesity and obesity-related comorbidities. 
Although further functional research is required, we hypothesize that obesity and adiposity-related diseases are partly driven by changes in DNA methylation, with resultant dysregulation of energy balance via effects on expression of lipid metabolism pathway genes. Regulatory mechanisms involved in energy homeostasis have been proposed as attractive targets for the treatment of obesity, metabolic syndrome, and heart disease [77,78]. Our results demonstrate that these connections are evident in humans, adding to previous evidence from animal models [78]. Inflammatory pathways. Aside from the lipid metabolism genes, a number of loci involved in inflammatory pathways were identified by our EWAS. Enlarged adipocytes in obese individuals are known to promote inflammation. One BMI-related differentially methylated CpG was identified at the NOD2 (nucleotide binding oligomerization domain 2) locus. NOD2, an innate immune receptor, is involved in the immune response to bacterial lipopolysaccharides (LPSs) by activating NF-κB signaling. Uptake of LPS from gut microbiota has been demonstrated to result in increased internalization of LPS-rich lipoproteins into adipocytes and promote macrophage conversion from the M2 form to the inflammatory M1 form [79]. NOD2 is also included in the GO pathway for regulation of lipid metabolism (0045834) as it is a positive regulator of phosphatidylinositol 3-kinase activity and has been demonstrated to promote vascular inflammation and formation of lipid-rich atherosclerotic lesions in hypercholesterolemic LDLR−/− mice [80,81]. NOD2 interacts with another BMI-related differentially methylated inflammatory gene locus at SOCS3 (suppressor of cytokine signaling 3), a negative regulator of cytokine signaling. 
In addition, LGALS3BP (lectin, galactoside binding soluble 3 binding protein), also known as MAC2BP (Mac-2-binding protein), is involved in the immune response associated with lymphokine-activated killer cell cytotoxicity and platelet activation, signaling, and aggregation. LGALS3BP has been found to stimulate host defenses and is elevated in individuals with various types of cancer such as breast, lung, colorectal, ovary, and endometrial cancers, many of which are obesity-related. In addition, LGALS3BP was recently identified as a promising biomarker for non-alcoholic steatohepatitis and pancreatitis [82,83], known obesity-related diseases. Methylation at the LGALS3BP locus demonstrated a significant sex interaction, with a stronger effect in men. This may be related to environmental factors more common in men (such as specific dietary patterns) or male-specific physiology. Lipid metabolism. Previously conducted experiments support a causal role of SREBF1 in adiposity [64]. SREBF1 (also known as SREBP1) plays a central role in energy homeostasis by promoting glycolysis, lipogenesis, and adipogenesis via induction of the conversion of acetyl-CoA to triglycerides (S16 Fig). SREBF1 promotes the conversion of free fatty acids to triglycerides in the liver and to triglyceride-rich lipoproteins in the bloodstream. In situations of caloric excess, SREBF1 is a key mediator of the induction of lipogenesis in humans [64]. In mice with diet-induced insulin resistance, inhibition of SREBF1 attenuates accelerated atherosclerosis, supporting a link to atherosclerosis and coronary artery disease [65]. The causal connection between increased triglyceride-rich lipoproteins and coronary disease is supported by human genetic studies [66]. We highlight the potential role of SREBF1 expression in the adrenal gland in weight regulation and adiposity-related diseases based on results from the MR analyses. 
Diseases of the adrenal gland are known to be linked to severe obesity, and adrenalectomy in murine models can reverse genetically induced obesity [67,68]. Our results suggest that altered genomic regulation of SREBF1 is causally related to BMI; however, the lack of large datasets of meQTLs in numerous tissues and under various conditions, in combination with the inability to conduct tissue-targeted epigenetic editing in relevant experimental models, limits our ability to make a definitive causal inference. Regulation of SREBF1 is an underexplored target for the prevention of coronary artery disease. Another of our top genes, CPT1A (carnitine palmitoyltransferase 1A), is an outer mitochondrial membrane enzyme involved in the utilization of acetyl-CoA, functioning as a key enzyme in the beta-oxidation of long-chain fatty acids in mitochondrial energy metabolism. Acetyl-CoA has recently been identified as a central link between altered lipolysis due to adiposity or inflammation and resultant changes in hepatic insulin resistance with cross-communication between liver and adipose tissue [69]. ABCG1 (ATP binding cassette G1), a cell-membrane lipid transporter, has an established role in reverse cholesterol transport; its role in obesity is supported in previous animal [37] and human studies [70]. DHCR24 (24-dehydrocholesterol reductase) catalyzes the reduction of sterol intermediates during cholesterol synthesis. Differential methylation of SREBF1, CPT1A, ABCG1, and DHCR24 has been reported in previous EWASs of adiposity, glycemic traits, and lipids [29–31,71–76]. We add to the published literature and provide evidence that differential methylation at the ABCG1 locus is likely a downstream effect of BMI. From these findings taken together, epigenetic dysregulation is emerging as a common link between obesity and obesity-related comorbidities. 
Although further functional research is required, we hypothesize that obesity and adiposity-related diseases are partly driven by changes in DNA methylation, with resultant dysregulation of energy balance via effects on expression of lipid metabolism pathway genes. Regulatory mechanisms involved in energy homeostasis have been proposed as attractive targets for the treatment of obesity, metabolic syndrome, and heart disease [77,78]. Our results demonstrate that these connections are evident in humans, adding to previous evidence from animal models [78]. Inflammatory pathways. Aside from the lipid metabolism genes, a number of loci involved in inflammatory pathways were identified by our EWAS. Enlarged adipocytes in obese individuals are known to promote inflammation. One BMI-related differentially methylated CpG was identified at the NOD2 (nucleotide binding oligomerization domain 2) locus. NOD2, an innate immune receptor, is involved in the immune response to bacterial lipopolysaccharides (LPSs) by activating NF-κB signaling. Uptake of LPS from gut microbiota has been demonstrated to result in increased internalization of LPS-rich lipoproteins into adipocytes and promote macrophage conversion from the M2 form to the inflammatory M1 form [79]. NOD2 is also included in the GO pathway for regulation of lipid metabolism (0045834) as it is a positive regulator of phosphatidylinositol 3-kinase activity and has been demonstrated to promote vascular inflammation and formation of lipid-rich atherosclerotic lesions in hypercholesterolemic LDLR−/− mice [80,81]. NOD2 interacts with another BMI-related differentially methylated inflammatory gene locus at SOCS3 (suppressor of cytokine signaling 3), a negative regulator of cytokine signaling. 
In addition, LGALS3BP (lectin, galactoside binding soluble 3 binding protein), also known as MAC2BP (Mac-2-binding protein), is involved in the immune response associated with lymphokine-activated killer cell cytotoxicity and platelet activation, signaling, and aggregation. LGALS3BP has been found to stimulate host defenses and is elevated in individuals with various types of cancer such as breast, lung, colorectal, ovary, and endometrial cancers, many of which are obesity-related. In addition, LGALS3BP was recently identified as a promising biomarker for non-alcoholic steatohepatitis and pancreatitis [82,83], known obesity-related diseases. Methylation at the LGALS3BP locus demonstrated a significant sex interaction, with a stronger effect in men. This may be related to environmental factors more common in men (such as specific dietary patterns) or male-specific physiology. Differential Methylation Is Identified in Loci Not Previously Linked to Adiposity Serine metabolism. Two of the ten genes differentially expressed in association with BMI-related methylation (PHGDH and SARS) are involved in L-serine metabolism. PHGDH (phosphoglycerate dehydrogenase) is involved in the early steps of the synthesis of the amino acid L-serine, which plays a role in oxidoreductase as a NADP acceptor in the tricarboxylic acid cycle. SARS (seryl-TRNA synthetase) catalyzes the transfer of L-serine to tRNA. In addition, RPS6KA2 (ribosomal protein S6 kinase A2), a locus not previously reported as being BMI-related, is a serine/threonine kinase that acts downstream of MAPK signaling and is involved in cell proliferation. L-serine is necessary for specific functions in the central nervous system; however, the link between adiposity and functional health consequences via effects on serine metabolism is currently unknown. Cell-membrane transporters. 
In addition to the cell-membrane transporters discussed, two additional membrane transporters were identified among the ten genes associated with differential methylation. SLC1A5 (solute carrier family 1 member 5), which was found to have significant three way associations with altered gene expression in blood and BMI, is a sodium-dependent transporter of amino acids. 
It is activated by insulin concentration, which is often elevated in individuals with obesity. BMI-related differential methylation was also identified at the CACNA2D3 (calcium channel, voltage-dependent, alpha 2/delta subunit 3) locus. CACNA2D3 is involved in nerve signal transmission and cardiac conduction. BMI EWAS Findings in the Context of Published Epigenetic Epidemiology Studies Previous in silico methods of identifying putative epigenetically regulated obesity genes highlighted SOCS3 (suppressor of cytokine signaling 3) and RARA (retinoic acid receptor alpha) [84], both of which were identified in the FHS-LBC meta-analysis (p = 2.7 × 10−11 for cg27637521 in SOCS3 and p = 1.3 × 10−8 for cg13274938 in RARA). An association study of DNA methylation and BMI in 459 individuals from the Cardiogenics Consortium identified an association of methylation at three CpGs intronic to HIF3A (hypoxia inducible factor 3A) in blood and adipose cells with BMI [28]. We found modest associations of differential methylation and expression at the HIF3A locus with BMI in our study. However, the associations were stronger in younger individuals in the FHS, suggesting that the connection may be less apparent at older ages. At a nominal causal p-value < 0.05, we found that many (16 [19%]) of the replicated CpGs are downstream of BMI. This is consistent with recent findings from longitudinal methylation data and bidirectional MR in the Avon Longitudinal Study of Parents and Children [85] that BMI-related HIF3A methylation is likely secondary to differences in BMI. There is substantial overlap between the identified BMI-related CpGs and reported CpG–metabolite associations in blood from 1,814 participants in the KORA cohort (Kooperative Gesundheitsforschung in der Region Augsburg) [86] (S13 Table). Notably, ceramides and sphingolipids—known to have altered levels among obese individuals and implicated in the development of the metabolic syndrome [87–89]—were identified. 
In addition, the BMI-related differentially methylated CpG (cg03725309) at the SARS locus, as discussed above in the serine metabolism section, was found to be associated with blood levels of serine. BMI EWAS Findings in the Context of BMI GWAS Results and Nearby Genetic Variants Of note, none of the CpGs associated with BMI was near genes previously identified in GWASs of obesity-related traits, such as FTO (fat mass and obesity associated) or MC4R (melanocortin 4 receptor). We hypothesize that many of the replicated differentially methylated loci reflect novel pathways involved in the regulation of adiposity or adiposity-related diseases. Long-range interactions of DNA methylation with known obesity-related loci, however, may exist [90]. Further work to understand the role of the novel loci in relation to adiposity is also required. In addition, combining information from DNA methylation with genetic markers identified from DNA sequence variation may allow for improvements in risk prediction previously not possible with sequence variants alone [91]. Many of the significant loci from the discovery phase (73 of 135) were replicated in African-Americans from the ARIC study [30]. Similarly, many of the BMI-related differentially methylated CpGs identified in this study were also reported in relation to BMI in people of Arabic ancestry [34]. In GWASs, failure to replicate across racial/ethnic groups may be due to differences in allele frequencies and linkage disequilibrium patterns. In contrast, the high rate of replication of DNA methylation results for BMI in individuals of European and African and other ancestries suggests that shared environmental exposures or changes secondary to differences in BMI, and not genetic variation, may underlie many of the associations. Further work is needed to identify environmental factors that promote or mitigate disease-relevant obesity-related epigenetic dysregulation. 
Our analyses that conditioned on top meQTLs showed minimal attenuation, suggesting that the association between differential methylation and BMI is largely independent of genetic variants near the reported CpGs. Study Limitations Our study has several limitations. Results from MR analyses utilizing genetically predicted methylation and expression levels do not prove causation but provide supportive evidence. The results of the MR analyses are based on numerous assumptions, for example, that there are not alternative pathways through which the SNP IV may act on BMI (i.e., pleiotropic effects). The MR assumptions cannot be tested directly and may bias the results. The forward MR results did not reach Bonferroni-adjusted significance thresholds for multiple testing; however, validation of the nominally significant results in the larger GIANT consortium supports our findings. We avoided the use of multi-SNP score IVs as we had already identified adequate single SNP meQTLs and using multi-SNP score IVs would have further risked introducing bias due to pleiotropy. The meQTLs for the MR analyses were derived in the FHS and the outcome was tested within the same cohort, which can potentially result in bias toward significance. The MR analyses, using the blood meQTL IV, suggest an inverse relationship between the predicted methylation of the SREBF1 locus and BMI, the reverse of the observed relationship, which can be interpreted as a null result. This finding is potentially explained by different directions of effect of QTLs in alternate tissues, which was supported by examining the association of genetic variants in blood versus other metabolically active tissues in the GTEx Project resource. Unfortunately, there are limited datasets of meQTLs in various tissues to explore this further. 
The observation of associations of BMI with methylation at the same CpG in different directions of effect in blood versus adipose-derived DNA has been previously reported at BMI-related CpG sites [30]. For SREBF1, we presume that the metabolic consequences of altered methylation and the effect on BMI occur in tissues other than blood, such as the adrenal gland, with the methylation changes in blood that we were able to detect representing a biomarker of trans-tissue differential methylation [92]. In addition, it is possible that positive and negative feedback loops can result in regulation of the same gene to be both a causal and a downstream effect of adiposity. We would not be able to discern this scenario from the observational cross-sectional data in this study. An alternate methylation assay would be required for clinical purposes as the current microarrays are unsuitable in a clinical setting. Future research would be required for technical validation for clinical purposes. Our study supports blood cells as a useful accessible tissue for epigenetic biomarker discovery in large population studies. However, our study would not be able to detect tissue-specific methylation changes occurring in non-blood cell lines (e.g., neuron-specific epigenetic modifications in relation to BMI). Many of our top CpGs replicated in the GOLDN study, which assessed DNA methylation in a single blood cell type (CD4+), suggesting that the associations we detected are not likely to be due to confounding by blood cell heterogeneity. Many of the genes associated with BMI-related differential methylation were known to have a role in adiposity and cardiometabolic traits from murine knockout models; however, the universe of knockout models is likely enriched for the study of adiposity and cardiometabolic traits, and we could not directly test whether our results identified more than expected. 
Our study was conducted among older-age adults, and the findings may not be generalizable to younger ages. Conclusions We provide the results of a large EWAS of BMI in almost 8,000 individuals that identified 83 replicable DNA methylation loci and evidence of complementary transcriptomic differences that were enriched for gene products involved in lipid metabolism. The genetic IV analyses prioritize the SREBF1 locus for future functional studies to further define the causal relation with adiposity, insulin resistance, obesity-related dyslipidemia, and coronary artery disease. Our findings provide a foundation for further research to determine if individualized epigenetic profiles can be used to guide clinical decision making and improve health outcomes. Our findings may have additional clinical and therapeutic relevance if other loci that are differentially methylated in relation to BMI represent attractive targets for the treatment or prevention of obesity and adiposity-related diseases. Supporting Information S1 Fig. Quantile-quantile plot of expected versus observed −log10 p-values from the epigenome-wide association study of BMI in the FHS-LBC meta-analysis. Models: (A) age- and sex-adjusted, (B) additionally smoking-adjusted, and (C) additionally excluding frailty/morbid obesity (BMI < 18 kg/m2 and > 35 kg/m2). https://doi.org/10.1371/journal.pmed.1002215.s001 (EPS) S2 Fig. Quantile-quantile plot of expected versus observed −log10 p-values from the epigenome-wide association study of BMI in FHS alone. Using surrogate variables to adjust for cell count proportion and technical effects (A) compared to the alternate approach of imputed cell counts and measured technical effects (B). Genomic inflation factor lambda is lower in the surrogate variable analysis approach compared to the approach of imputed cell counts and measured technical effects (1.04 versus 1.25), suggesting fewer potential false positives and a more conservative approach. 
https://doi.org/10.1371/journal.pmed.1002215.s002 (EPS) S3 Fig. Manhattan plot of the epigenome-wide association study of BMI in the FHS-LBC meta-analysis in age- and sex-adjusted models. The dotted line indicates the Bonferroni cutoff for significance of p-value < 1.2 × 10−7. The top six CpGs with the lowest p-values are shown, annotated to their closest gene transcript. https://doi.org/10.1371/journal.pmed.1002215.s003 (EPS) S4 Fig. Comparison of −log10 p-values of results of the FHS-LBC BMI epigenome-wide association study. (A) Model 1 (age- and sex-adjusted) + Model 2 (additionally smoking-adjusted). (B) Model 1 + Model 3 (excluding BMI < 18 and > 35 kg/m2). https://doi.org/10.1371/journal.pmed.1002215.s004 (EPS) S5 Fig. Three-dimensional scatterplot of −log10 p-values for 135 epigenome-wide significant CpGs from the FHS-LBC discovery cohorts in three external replication cohorts. Replication significance defined as Bonferroni-adjusted p-value < 3.7 × 10−4 (0.05/135). CpGs significant in one, two, and all three replication cohorts are depicted in green, yellow, and red, respectively. Annotated genes are labeled for CpGs replicated in all three cohorts. Full list of replication results is available in S2 Table. https://doi.org/10.1371/journal.pmed.1002215.s005 (EPS) S6 Fig. Variation in BMI explained (adjusted R2) by differential methylation of 77 nonredundant replicated CpGs in the FHS-LBC epigenome-wide association study and tested in the independent PIVUS cohort. CpGs are added in decreasing order of significance and are adjusted for age, sex, and preceding CpGs. https://doi.org/10.1371/journal.pmed.1002215.s006 (EPS) S7 Fig. Boxplot of BMI in the PIVUS cohort across deciles of the additive weighted composite measure of differential DNA methylation at 77 nonredundant replicated CpGs. https://doi.org/10.1371/journal.pmed.1002215.s007 (EPS) S8 Fig. 
Relationship between location of CpG relative to the transcription start site and proportion of variation in changes in corresponding gene expression, stratified by whether the CpG resides in a known DHS or enhancer region. CpGs located in known DHS or enhancer regions are depicted in red. bp, base pairs; DHS, DNase I hypersensitive site; TSS, transcription start site. https://doi.org/10.1371/journal.pmed.1002215.s008 (EPS) S9 Fig. Relationship between location of CpG relative to the transcription start site and proportion of variation in changes in corresponding gene expression, stratified based on location relative to nearest CpG island. Shores are defined as up to 2 kb from the CpG island, and shelves are defined as up to 2 kb from the CpG shore. TSS, transcription start site. https://doi.org/10.1371/journal.pmed.1002215.s009 (EPS) S10 Fig. Enrichment of nonredundant replicated differentially methylated CpGs from the BMI epigenome-wide association study in DNase I hypersensitive sites among various cell and tissue types using ENCODE and 2012 Roadmap Epigenomics Project data. (A) ENCODE and (B) 2012 Roadmap Epigenomics Project. https://doi.org/10.1371/journal.pmed.1002215.s010 (EPS) S11 Fig. Enrichment of nonredundant replicated differentially methylated CpGs from the BMI epigenome-wide association study in DNAse I hypersensitive sites among various cell and tissue types using consolidated Roadmap Epigenomics Project and BLUEPRINT Epigenome data. (C) consolidated Roadmap Epigenomics Project and (D) BLUEPRINT Epigenome. https://doi.org/10.1371/journal.pmed.1002215.s011 (EPS) S12 Fig. Enrichment of nonredundant replicated differentially methylated CpGs from the BMI epigenome-wide association study in regions overlapping histone modifications in the consolidated Roadmap Epigenomics Project data: H3K4me1 and H3K4me3 histone modifications. Presented is enrichment of BMI EWAS CpGs in regions overlapping (A) H3K4me1 and (B) H3K4me3 histone modifications. 
https://doi.org/10.1371/journal.pmed.1002215.s012 (EPS) S13 Fig. Enrichment of nonredundant replicated differentially methylated CpGs from the BMI epigenome-wide association study in regions overlapping histone modifications in the consolidated Roadmap Epigenomics Project data: H3K9me3 and H3K27me3 histone modifications. Presented is enrichment of BMI EWAS CpGs in regions overlapping (C) H3K9me3 and (D) H3K27me3 histone modifications. https://doi.org/10.1371/journal.pmed.1002215.s013 (EPS) S14 Fig. Enrichment of nonredundant replicated differentially methylated CpGs from the BMI epigenome-wide association study in regions overlapping histone modifications in the consolidated Roadmap Epigenomics Project data: H3K36me3 histone modifications. Presented is enrichment of BMI EWAS CpGs in regions overlapping (E) H3K36me3 histone modifications. https://doi.org/10.1371/journal.pmed.1002215.s014 (EPS) S15 Fig. Depiction of an example result for SREBF1 from the bidirectional Mendelian randomization analyses for each of the replicated CpGs and BMI. Example shown illustrates the bidirectional relationship of cg11024682 intronic to SREBF1 and BMI using a meQTL to model the exposure of differential methylation at that locus and an additive weighted genetic risk score using known BMI-related SNPs to model the exposure of elevated BMI. https://doi.org/10.1371/journal.pmed.1002215.s015 (EPS) S16 Fig. DNA methylation and mRNA expression of CPT1A and SREBF1 in whole blood in triglyceride and fatty acid catabolism (beta-oxidation) pathways was observed in association with higher BMI. https://doi.org/10.1371/journal.pmed.1002215.s016 (EPS) S1 Methods. Supplemental methods. https://doi.org/10.1371/journal.pmed.1002215.s017 (DOCX) S1 STROBE Checklist. https://doi.org/10.1371/journal.pmed.1002215.s018 (DOC) S1 Table. Complete list of methylome-wide significant (p-value < 1.2 × 10−7) CpGs associated with BMI in the FHS-LBC meta-analysis. 
https://doi.org/10.1371/journal.pmed.1002215.s019 (XLSX) S2 Table. Secondary models including additional adjustment for smoking and exclusion of BMI < 18 and > 35 kg/m2. https://doi.org/10.1371/journal.pmed.1002215.s020 (XLSX) S3 Table. External replication of methylome-wide significant CpGs. https://doi.org/10.1371/journal.pmed.1002215.s021 (XLSX) S4 Table. Distribution and variability of replicated BMI-related differentially methylated CpGs. https://doi.org/10.1371/journal.pmed.1002215.s022 (XLSX) S5 Table. Secondary models testing age and sex interactions. https://doi.org/10.1371/journal.pmed.1002215.s023 (XLSX) S6 Table. Sex-stratified models for cg26651978 (LGALS3BP) in the replication cohorts. https://doi.org/10.1371/journal.pmed.1002215.s024 (XLSX) S7 Table. Association of BMI with the replicated CpGs conditional on the top methylation QTL in the FHS. https://doi.org/10.1371/journal.pmed.1002215.s025 (XLSX) S8 Table. Three-way association results of CpGs, expression levels of nearby annotated genes, and BMI. https://doi.org/10.1371/journal.pmed.1002215.s026 (XLSX) S9 Table. Enrichment of BMI-related CpGs associated with gene expression in DNase I hypersensitive sites and enhancers. https://doi.org/10.1371/journal.pmed.1002215.s027 (XLSX) S10 Table. Results from the forward Mendelian randomization (DNA methylation affecting BMI) for the 83 replicated CpGs. https://doi.org/10.1371/journal.pmed.1002215.s028 (XLSX) S11 Table. Results from the mediator-to-outcome analyses of the two-step trans-tissue Mendelian randomization. https://doi.org/10.1371/journal.pmed.1002215.s029 (XLSX) S12 Table. Results from the reverse Mendelian randomization for the 83 replicated CpGs using the BMI genetic risk score instrumental variable. https://doi.org/10.1371/journal.pmed.1002215.s030 (XLSX) S13 Table. Sensitivity analyses for the reverse Mendelian randomization for the 16 implicated CpGs using the FTO locus SNP instrumental variable. 
https://doi.org/10.1371/journal.pmed.1002215.s031 (XLSX) S14 Table. Overlap between replicated BMI-related CpGs and metabolites as reported from the KORA cohort [86]. https://doi.org/10.1371/journal.pmed.1002215.s032 (XLSX) S1 Text. Project proposal. https://doi.org/10.1371/journal.pmed.1002215.s033 (DOCX) Acknowledgments Data on adiposity traits have been contributed by GIANT investigators and have been downloaded from https://www.broadinstitute.org/collaboration/giant/index.php/. Data on glycemic traits have been contributed by MAGIC investigators and have been downloaded from https://www.magicinvestigators.org. Data on diabetes traits have been contributed by the DIAGRAM consortium and have been downloaded from http://diagram-consortium.org/. Data on coronary artery disease/myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from http://www.cardiogramplusc4d.org/. The views expressed in this paper are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the US Department of Health and Human Services. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Health and Medical Research Council or the Australian Research Council.
Effect of a Primary Care Walking Intervention with and without Nurse Support on Physical Activity Levels in 45- to 75-Year-Olds: The Pedometer And Consultation Evaluation (PACE-UP) Cluster Randomised Clinical Trial doi: 10.1371/journal.pmed.1002210 pmid: 28045890
Background Pedometers can increase walking and moderate-to-vigorous physical activity (MVPA) levels, but their effectiveness with or without support has not been rigorously evaluated. We assessed the effectiveness of a pedometer-based walking intervention in predominantly inactive adults, delivered by post or through primary care nurse-supported physical activity (PA) consultations. Methods and Findings A parallel three-arm cluster randomised trial was randomised by household, with 12-mo follow-up, in seven London, United Kingdom, primary care practices. Eleven thousand fifteen randomly selected patients aged 45–75 y without PA contraindications were invited. Five hundred forty-eight self-reporting achieving PA guidelines were excluded. One thousand twenty-three people from 922 households were randomised between 2012–2013 to one of the following groups: usual care (n = 338); postal pedometer intervention (n = 339); and nurse-supported pedometer intervention (n = 346). Of these, 956 participants (93%) provided outcome data (usual care n = 323, postal n = 312, nurse-supported n = 321). Both intervention groups received pedometers, 12-wk walking programmes, and PA diaries. The nurse group was offered three PA consultations. Primary and main secondary outcomes were changes from baseline to 12 mo in average daily step-counts and time in MVPA (in ≥10-min bouts), respectively, measured objectively by accelerometry. Only statisticians were masked to group. Analysis was by intention-to-treat. Average baseline daily step-count was 7,479 (standard deviation [s.d.] 2,671), and average time in MVPA bouts was 94 (s.d. 102) min/wk. At 12 mo, mean steps/d, with s.d. in parentheses, were as follows: control 7,246 (2,671); postal 8,010 (2,922); and nurse support 8,131 (3,228). 
PA increased in both intervention groups compared with the control group; additional steps/d were 642 for postal (95% CI 329–955) and 677 for nurse support (95% CI 365–989); additional MVPA in bouts (min/wk) were 33 for postal (95% CI 17–49) and 35 for nurse support (95% CI 19–51). There were no significant differences between the two interventions at 12 mo. The 10% (1,023/10,467) recruitment rate was a study limitation. Conclusions A primary care pedometer-based walking intervention in predominantly inactive 45- to 75-y-olds increased step-counts by about one-tenth and time in MVPA in bouts by about one-third. Nurse and postal delivery achieved similar 12-mo PA outcomes. A primary care pedometer intervention delivered by post or with minimal support could help address the public health physical inactivity challenge. Clinical Trial Registration isrctn.com ISRCTN98538934. Why Was This Study Done? Brisk walking for at least 30 min daily on five or more days weekly is a good way to achieve moderate-to-vigorous physical activity (MVPA) guidelines for health, yet many adults and older adults do not achieve these levels. Pedometers measure steps taken (step-count) and can increase walking and physical activity (PA) levels. Pedometer trials have usually measured short-term outcomes, combined pedometer effects with other support provided, and reported only step-counts, not time spent in MVPA. What Did the Researchers Do and Find? One thousand twenty-three inactive 45- to 75-y-olds from seven family (general) practices in London, UK, were randomly allocated to either a usual PA (control) group or to one of two intervention groups. The postal group were sent a pedometer, a PA diary, and instructions for a 12-wk walking programme to add in 3,000 steps or a 30-min walk on five or more days weekly; the nurse group received these materials through practice nurse PA consultations. 
Both intervention groups significantly increased their walking from baseline to 12 mo (step-counts increased by about 10% and time in MVPA increased by about one-third) compared to controls, with similar effect sizes for nurse and postal groups. What Do These Findings Mean? The findings suggest that a primary care pedometer intervention, delivered by post or with minimal support, could provide an effective way to increase PA levels in adults and older adults. Introduction Physical activity (PA) helps adults remain healthy and improves physical function, quality of life, and emotional well-being [1]. Physical inactivity is the fourth leading risk factor for global mortality [2], leading to high health service costs [1, 3]. PA guidelines in adults and older adults advise at least 150 min of moderate-to-vigorous PA (MVPA) or 75 min of vigorous intensity PA weekly, or a combination of both, in at least 10-min bouts [1, 4, 5]. One way to achieve this is by 30 min of MVPA on at least 5 d weekly [1]. Although setting such goals is helpful, a graded dose–response relationship exists for PA and health, so for inactive people any PA increase is valuable [6]. Emphasising that the MVPA can occur in 10- rather than 30-min bouts enables older adults and those with disabilities to increase their MVPA gradually. Walking is the most common adult PA; a pace of 5 km/hour qualifies as moderate intensity [7]. Walking is safe, as both frequency and intensity can be increased gradually [7]. Despite individual variation, moderate-intensity walking approximates 100 steps/min [8], or 3,000 steps in 30 min. Adding “3,000 steps in 30 min” to habitual activity can increase step-counts [9] and reduce fasting glucose [9] in people with impaired glucose tolerance, but evidence for a change in MVPA in bouts is lacking. Reducing sedentary time may also be beneficial [1]. Programmes using personalised PA goals and behavioural strategies [10–12] can achieve PA increases. 
Cochrane Reviews called for PA interventions to include objective PA measurement [13, 14], adverse events [13], and comparisons of face-to-face with remote interventions [14]. Comparative evidence on individuals, couples, or households is also needed [15]. Systematic reviews of pedometer-based walking interventions showed increases of 2,000–2,500 steps/d [10, 16, 17]. However, studies were mainly small, volunteer based, and short term, the independence of pedometer effects was unclear, and outcomes focused on step-counts, not MVPA [10, 16, 17]. Primary care provides an ideal context for PA interventions, allowing population-based sampling, practice nurse involvement, and continuity of care. Brief PA advice in primary care is advocated [18]. However, to date, primary care has had little success in playing its part in the challenge of increasing population PA levels. Some small primary care pedometer-based walking interventions in older adults have increased PA levels at 3 [19], 6 [20], and 12 mo [21], but the effects of exercise referral schemes have been disappointing [22]. We therefore conducted a trial of a pedometer-based walking intervention in predominantly inactive 45- to 75-y-old primary care patients, with novel separate evaluation of pedometer and nurse-support effects on objective PA outcomes, including MVPA in bouts. The research questions were as follows: (i) Does a 3-mo postal pedometer-based walking intervention increase PA in inactive 45- to 75-y-olds at 12 mo follow-up, and (ii) Do practice nurse PA consultations provide additional benefit? We also present effects on patient-reported outcomes, anthropometric measures, and adverse events. Cost-effectiveness analyses will be published separately. Methods Study Design and Participants The trial protocol is published (S1 Text) [23]. 
A three-arm parallel cluster trial, randomised by household (allowing individuals and couples to participate), compared a 3-mo pedometer-based walking intervention, by post or with nurse support, with usual care. We recruited from an ethnically and socioeconomically diverse population in South London, UK, between September 2012 and October 2013, and follow-up was completed by October 2014. Six general (family) practices were selected; a seventh was later added to ensure recruitment to target in the available time period. Eligible patients were 45–75 y old without contraindications to increasing MVPA. Care-home residents and those with unsuitable conditions were excluded [23]. All eligible participants were classified by household. Households were selected at random using Stata’s random number generator. All participants in single-person households were included. In multi-person households, an index person was selected at random, and a second person was randomly selected from amongst those aged within 15 y of the index person. Random samples of 400 eligible households were selected per practice [23], and individual invitations were posted. Those reporting achieving ≥150 min of MVPA weekly on a validated self-report PA question [24] were excluded. The London Research Ethics Committee (Hampstead) provided approval (12L/LO/0219). Trial registration: ISRCTN 98538934. Randomisation and Masking Random allocation by household, avoiding couple contamination, was in a 1:1:1 ratio using the King’s College Clinical Trials Unit internet service, ensuring allocation concealment. Block randomisation was used within practice, with random-sized blocks for balanced groups and an even nurse workload. Participants, nurses, and researchers were unmasked to intervention allocation. Main outcome analyses were conducted by statisticians masked to study group. 
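The block randomisation described above can be illustrated with a minimal sketch. It is not the King's College Clinical Trials Unit service, whose internals are not described here; the function name, the block sizes (multiples of 3, here 3 or 6), and the household identifiers are all assumptions chosen for illustration. The sketch would be run once per practice to mirror randomisation within practice.

```python
import random
from collections import Counter

ARMS = ("usual care", "postal", "nurse support")


def allocate_households(household_ids, block_multiples=(1, 2), seed=None):
    """Permuted-block 1:1:1 allocation with randomly sized blocks.

    Hypothetical helper: each block holds every arm the same number of
    times (1 or 2 by default, giving blocks of 3 or 6 households), and
    the order within a block is shuffled.
    """
    rng = random.Random(seed)
    allocation = {}
    block = []
    for household in household_ids:
        if not block:
            # Start a new block of random size, balanced across arms.
            block = list(ARMS) * rng.choice(block_multiples)
            rng.shuffle(block)
        allocation[household] = block.pop()
    return allocation


# Illustrative use for one practice with 90 made-up household IDs.
example = allocate_households([f"household-{i}" for i in range(90)], seed=1)
arm_counts = Counter(example.values())
```

Because every completed block is balanced, arm sizes within a practice can differ only by what is left in the final, partly used block, which is how this kind of scheme keeps group sizes and nurse workload even.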
Procedures Trial procedures, including individual informed written consent, baseline and 3- and 12-mo follow-up assessments, and complex intervention components are fully described elsewhere (S1 Text) [23] and summarised in S1 Fig. Of note, if participants were unable to be contacted at 3 mo, contact was still attempted again at the main 12-mo outcome. Assessments of outcomes were conducted identically for all three groups; an accelerometer (GT3X+, Actigraph LLC) was used for baseline, 3- and 12-mo masked PA assessment of step-counts and time in different PA intensities. A simple pedometer, the SW-200 Yamax Digi-Walker, was used by both nurse and postal groups to record their own step-counts, as part of the intervention. The interventions incorporated behaviour change techniques (BCTs) and included individualised step-count and PA goals and the “3,000-in-30” PA intensity message. Key intervention components were as follows: pedometers (SW-200 Yamax Digi-Walker); patient handbook; PA diary (including individual 12-wk walking plan); and three individually tailored 10- to 20-min practice nurse PA consultations (nurse-support group only), offered at approximately weeks 1, 5, and 9. The handbook and diary are available on the Pedometer and Consultation Evaluation (PACE-UP) website www.paceup.sgul.ac.uk/materials and both explain that adding 3,000 steps/d (approximating a 30-min walk) on five or more days weekly to an individual’s baseline step-count, progressing over 12 wk, would help achieve PA guidelines. BCTs, including goals and planning, self-monitoring and feedback, and encouraging social support, were included in the handbook, diary, and nurse consultations [23]. Participants in both postal and nurse intervention groups were encouraged to continue using the pedometer to monitor their walking and step-count beyond the 3-mo intervention period if they found this helpful. 
Control group participants were not provided with any feedback on their PA levels or materials promoting PA during the trial. They had follow-up assessments as per the intervention groups and were informed at the start of the trial that after 12-mo follow-up they would be offered feedback on their PA levels over the trial, a pedometer, a trial handbook, and a diary, either by post or as part of a single nurse consultation (according to their preference).
Outcomes
The primary outcome is change in average daily step-count, assessed by accelerometry over 7 d, between baseline and 12 mo. Secondary PA outcomes (all by accelerometry) are as follows: changes in step-counts between baseline and 3 mo; and changes in time spent weekly in MVPA in ≥10-min bouts and in time spent sedentary between baseline and 3 and 12 mo. Ancillary outcomes reported are as follows: (i) changes in anthropometry (body mass index, waist circumference, body fat) [23] at 12 mo; (ii) changes in patient-reported outcomes (exercise self-efficacy, anxiety, depression, health-related quality of life, and pain; see protocol for full references [S1 Text] [23]) at 3 and 12 mo; and (iii) adverse outcomes (falls, injuries, fractures, cardiovascular disease events, and deaths), assessed from trial monitoring procedures, questionnaires at 3 and 12 mo, and primary care records. The following additional outcomes specified in the trial registry and trial protocol (S1 Text) will be published separately: economic (cost-effectiveness, including health service use outcomes and a Markov model to simulate long-term cost-effectiveness); self-report PA variables [23]; and a process evaluation. Qualitative evaluations from nonparticipants [25], participants [26], and practice nurses [27] are already published. An additional paper comparing trial participants and nonparticipants is also in progress.
Statistical Analysis
A sample of 993 (331 per group) was required to detect a 1,000 steps/d difference (assuming a standard deviation of 2,700) at 12 mo when comparing any two groups, with 90% power, at a significance level of p = 0.01. Household clustering was allowed for, assuming an intra-cluster correlation of 0.5 and an average household size of 1.6, and we assumed 15% attrition [23]. Analysis and reporting followed CONSORT guidelines (S2 Text).
Actigraph data were reduced using Actilife software (v 6.6.0), ignoring runs of ≥60 min of zero counts [23]. Vertical counts were used, as these are the basis of the validated step-count and MVPA algorithms. The analysis summary variables used were as follows: step-counts; accelerometer wear-time; time spent in MVPA (≥1,952 Counts Per Minute [CPM], equivalent to ≥3 Metabolic Equivalents [METs] [28]); time spent in ≥10-min MVPA bouts; and time spent sedentary (≤100 CPM, equivalent to ≤1.5 METs) [29]. Changes from protocol planned analyses (S1 Text) [23] were approved by the Trial Steering Committee prior to analyses. We report MVPA in ≥10-min bouts, as this relates more closely to PA guidelines [1, 4]. Only 20% of participants were nonwhite; ethnic group was therefore excluded from subgroup analyses due to low power. To lessen attrition bias, our primary analysis included all participants with ≥1 d of ≥540 min wear-time at 12 mo.
All analyses were carried out using Stata, version 12.0 (StataCorp). Regression analyses used the xtmixed procedure. For accelerometry, this was done in two stages. Stage 1 estimated average daily step-count at 12 mo and at baseline, using the same two-level model for each (level 1 was day within individual, level 2 was individual), in which daily step-counts were regressed on day-order-of-wear and day-of-week. Random effects were assumed to be independent.
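To make the accelerometer data-reduction rules in this subsection concrete, the sketch below applies the quoted thresholds to one day of minute-level vertical counts. It is not the Actilife algorithm: it assumes that a bout is an uninterrupted run of MVPA minutes (no tolerance for brief drops) and that sedentary time is counted only during wear-time. The function names and the synthetic day are hypothetical and are not trial data.

```python
MVPA_CPM = 1952       # >= 1,952 counts per minute: MVPA (>= 3 METs)
SEDENTARY_CPM = 100   # <= 100 counts per minute: sedentary (<= 1.5 METs)
NONWEAR_RUN = 60      # runs of >= 60 zero-count minutes are ignored as non-wear
BOUT_MIN = 10         # minimum length of an MVPA bout, in minutes
VALID_DAY_MIN = 540   # wear-time needed for a day to count in the primary analysis

def runs(flags):
    """Yield (start, length) for each maximal run of True values in flags."""
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(flags) - start

def summarise_day(counts):
    """Summarise one day of minute-level vertical counts (a list of ints)."""
    worn = [True] * len(counts)
    for start, length in runs([c == 0 for c in counts]):
        if length >= NONWEAR_RUN:                     # long zero run: non-wear
            worn[start:start + length] = [False] * length
    mvpa = [w and c >= MVPA_CPM for w, c in zip(worn, counts)]
    sedentary = [w and c <= SEDENTARY_CPM for w, c in zip(worn, counts)]
    bout_minutes = sum(n for _, n in runs(mvpa) if n >= BOUT_MIN)
    wear = sum(worn)
    return {
        "wear_min": wear,
        "valid_day": wear >= VALID_DAY_MIN,
        "mvpa_min": sum(mvpa),
        "mvpa_bout_min": bout_minutes,
        "sedentary_min": sum(sedentary),
    }

# Synthetic day: light activity, a 12-min and a 5-min brisk walk,
# 90 min of zeros (non-wear), then 20 quiet minutes and 30 min of zeros (worn).
day = [500] * 600 + [2500] * 12 + [500] * 3 + [2500] * 5 + [0] * 90 + [50] * 20 + [0] * 30
summary = summarise_day(day)
print(summary)
```

In this synthetic day only the 12-min walk counts towards MVPA in bouts, although both walks count towards total MVPA, which is the distinction behind the further analyses of total MVPA versus MVPA in ≥10-min bouts.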
In stage 2, we regressed estimated average daily step-count at 12 mo on estimated average daily baseline step-count, month of baseline accelerometry, age, gender, general practice, and treatment group. This effectively measured change in step-count over the 12 mo, minimising bias and maintaining power. In this analysis, level 1 was individual and level 2 was household. The pwcompare (pairwise comparison) postestimation command was used to generate estimates and confidence limits for the difference in change between the nurse and control groups and between the postal and control groups. The same command was used to provide a direct comparison of the nurse and postal groups; although this difference is effectively the difference of the previous two estimates, it is important to put confidence limits on this comparison. The secondary outcome measures (MVPA in ≥10-min bouts and sedentary time) were analysed using identical approaches, as were 3-mo outcomes. Checks confirmed that residuals from the regression models were normally distributed (S2 Fig). Changes in anthropometric measures and patient-reported outcomes were estimated using models identical to stage 2 above.
Sensitivity analyses were carried out for our primary outcome. We assessed (i) the effect of restricting analyses to those with ≥600 min of daily wear-time (both with ≥1 d of accelerometry at 12 mo and ≥5 d of accelerometry data at 12 mo); (ii) whether participants who were lost to follow-up, or who failed to record a single adequate day at 12 mo, might have introduced bias, using multiple imputation with the Stata procedure mi impute; (iii) the possible impact of outcomes not being missing at random; and (iv) the effect of adjusting for wear-time. We also conducted further analyses examining total time in MVPA, as opposed to time in MVPA in ≥10-min bouts.
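The sample-size figures quoted at the start of the Statistical Analysis section can be approximately reproduced with a standard normal-approximation formula for comparing two means, inflated for household clustering and attrition. The exact formula used for the trial is not stated in the text, so the formula and variable names below are assumptions; the inputs are the values given in the text.

```python
from statistics import NormalDist

alpha = 0.01          # two-sided significance level
power = 0.90
sd = 2700             # assumed standard deviation, steps/d
delta = 1000          # difference to detect, steps/d
icc = 0.5             # intra-cluster (household) correlation
household_size = 1.6  # average household size
attrition = 0.15

z = NormalDist().inv_cdf   # standard normal quantile function

# Per-group size for a two-group comparison of means, ignoring clustering.
n_unclustered = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2

# Inflate for household clustering with the usual design effect 1 + (m - 1) * ICC.
design_effect = 1 + (household_size - 1) * icc
n_clustered = n_unclustered * design_effect

# Inflate again so that the target remains after the assumed attrition.
n_recruited = n_clustered / (1 - attrition)

print(f"unclustered: {n_unclustered:.1f}")
print(f"design effect: {design_effect:.2f}")
print(f"with clustering: {n_clustered:.1f}")
print(f"with attrition: {n_recruited:.1f}")
```

Under these assumptions the calculation gives roughly 217 per group before clustering, a design effect of 1.3, and a little under 332 per group after allowing for attrition, which is close to the reported 331 per group; the small gap may reflect rounding or a slightly different formula.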
Patient Involvement
Pilot work with older primary care patients from three general practices was carried out ahead of seeking trial funding, with focus groups at each practice discussing ideas for a pedometer-based PA intervention. Patients were enthusiastic about the study and felt that the postal approach to recruitment and the interventions offered would be acceptable. They had input into aspects of the study design; for example, they encouraged us to offer the usual care arm a pedometer at the end of the follow-up period and they encouraged us to recruit couples as well as individuals, and to allow couples to attend nurse appointments together. A patient advisor was a Trial Steering Committee member and was involved in discussions about recruitment and study conduct, as well as advising about patient materials, dissemination of results to participants, and safety reporting mechanisms. All participants were provided with timely feedback of their individual trial results after completion of 12-mo follow-up, including their PA and body size measures over the trial duration. Summaries of results for the whole trial were disseminated to all trial participants as A4 feedback sheets after completion of baseline assessments and after analysis of the main results. A trial website (http://www.paceup.sgul.ac.uk/) has been created, and details have been circulated to participants. This also provides a summary of the trial results and details about further trial follow-up. All publications relating to the trial are provided on the website. The burden of the intervention was assessed by all participants in the nurse group with a questionnaire as part of the process evaluation and by samples of both intervention groups as part of the qualitative evaluation [26].
Results
Participants
Of 11,015 invited, 6,399 did not respond, 548 were excluded due to self-reported PA guideline achievement, 127 were recruited but did not attend baseline assessment or provided inadequate baseline accelerometry data, and 1,023/10,467 (10%) were randomised (Fig 1). Of the 1,023 participants, 32 (3%) withdrew, and 8 (1%) were uncontactable at 12 mo. In total, 956/1,023 (93%) participants provided at least 1 d of ≥540 min wear-time accelerometer data and were included in 12-mo primary analyses. Baseline findings (Table 1) showed recruitment was balanced across age-groups; over a third were male. Characteristics were similar between groups. The nurse-support group had a slightly higher baseline adjusted average daily step-count (7,653, s.d. 2,826) and minutes spent weekly in MVPA in bouts of ≥10 min (105, s.d. 116) compared with the postal (steps 7,402, s.d. 2,476; MVPA in bouts 92, s.d. 90) and control groups (steps 7,379, s.d. 2,696; MVPA in bouts 84, s.d. 97). Overall, 218/1,023 (21%) achieved the PA guideline of ≥150 min of MVPA weekly in bouts at baseline. Accelerometer wear-time was similar between groups at baseline and 3- and 12-mo follow-ups (Tables 1 and 2). Over 90% of all groups provided ≥5 d of ≥540 min wear-time at 12 mo (S1 Table).
Fig 1. PACE-UP CONSORT diagram. https://doi.org/10.1371/journal.pmed.1002210.g001
Table 1. Baseline characteristics of 1,023 randomised participants. https://doi.org/10.1371/journal.pmed.1002210.t001
Table 2. Primary and secondary accelerometry outcome data. https://doi.org/10.1371/journal.pmed.1002210.t002
Among intervention participants, 256/346 (74%) of the nurse-support group attended all three sessions, and 268/339 (79%) of the postal group and 281/346 (81%) of the nurse-support group sent back PA diaries completed with their pedometer step-counts after the intervention.
Effect of the Intervention on PA at 3 and 12 Mo
Three-mo (interim) outcomes (Table 2). There were significant differences for change in step-counts from baseline to 3 mo between intervention groups and the control group: additional step-counts (steps/day) postal 692 (95% CI 363, 1,020; p < 0.001), nurse-support 1,172 (95% CI 844, 1,501; p < 0.001); the difference between the intervention groups was statistically significant: 481 (95% CI 153, 809; p = 0.004). Findings for MVPA showed a similar pattern: additional MVPA in bouts (min/wk) postal 43 (95% CI 26, 60; p < 0.001), nurse-support 61 (95% CI 44, 78; p < 0.001); the difference between intervention groups was 18 (95% CI 1, 35; p = 0.04). Sedentary time was similar between groups. Summary data for 3-mo PA outcomes are shown in S2 Table.
Twelve-mo (main) outcomes (Table 2). Both intervention groups increased their step-counts at 12 mo compared with controls: additional step-counts (steps/day) postal 642 (95% CI 329, 955; p < 0.001) and nurse-support 677 (95% CI 365, 989; p < 0.001), with no statistically significant difference between intervention groups, 36 (95% CI −277, 349). Time spent in MVPA in bouts showed a similar pattern; both intervention groups increased at 12 mo compared with controls: additional MVPA in bouts (min/wk) postal 33 (95% CI 17, 49; p < 0.001) and nurse-support 35 (95% CI 19, 51; p < 0.001), with no statistically significant difference between intervention groups, 2 (95% CI −14, 17). Sedentary time was similar between groups. Summary data for 12-mo PA outcomes are shown in S2 Table.
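As a rough check on why the nurse versus postal comparison needs its own confidence limits, the sketch below recovers approximate standard errors from the 12-mo step-count intervals reported above. It assumes symmetric normal-based 95% intervals (multiplier 1.96), which is an assumption about how the intervals were formed; the variable names are illustrative.

```python
Z95 = 1.96  # normal multiplier for a 95% interval (assumed)

def se_from_ci(lower, upper):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)

# 12-mo additional steps/day, as reported in the text.
postal_vs_control = {"est": 642, "ci": (329, 955)}
nurse_vs_control = {"est": 677, "ci": (365, 989)}
nurse_vs_postal = {"est": 36, "ci": (-277, 349)}

# The point estimate is just the difference of the two contrasts (up to rounding).
diff_of_estimates = nurse_vs_control["est"] - postal_vs_control["est"]

se_postal = se_from_ci(*postal_vs_control["ci"])
se_nurse = se_from_ci(*nurse_vs_control["ci"])
se_direct = se_from_ci(*nurse_vs_postal["ci"])

# If the two contrasts were independent, their difference would have this SE.
# They share the control group, so they are correlated and this is too large.
se_if_independent = (se_postal ** 2 + se_nurse ** 2) ** 0.5

print(diff_of_estimates, round(se_direct, 1), round(se_if_independent, 1))
```

The difference of the two estimates (35) matches the reported 36 up to rounding, but the standard error implied by the reported interval (about 160) is well below the value obtained by naively combining the two contrasts (about 225), so the interval cannot be reconstructed from the two control comparisons alone.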
Effect of the Intervention on Other Health-Related Outcomes
Fat mass was slightly reduced at 12 mo in both intervention groups, but the reductions did not differ significantly from the control group (Table 3). There was no change in body mass index or waist circumference. The interventions had no significant effects on anxiety, depression, health-related quality of life, or pain scores at 3 or 12 mo. Exercise self-efficacy score significantly increased in both intervention groups at 3 mo compared with controls, and there was a greater effect in the nurse group compared with postal. By 12 mo, there was a difference in self-efficacy score between only the nurse and control groups; the postal group was intermediate between, but not significantly different from, the other groups (Table 3). Summary data for health-related outcomes are shown in S2 Table.
Table 3. Ancillary outcomes. https://doi.org/10.1371/journal.pmed.1002210.t003
Subgroup analyses. There was no evidence of effect modification on change in step-count at 12 mo for either of the intervention groups versus control for any of the following factors: age, gender, taking part as a couple, body mass index, disability, pain, socioeconomic group, and exercise self-efficacy (Fig 2).
Fig 2. Treatment effect for primary outcome by subgroup at 12 mo. (a) Postal and control groups; (b) nurse and control groups. Abbreviations: BMI, body mass index; NS-SEC, National Statistics Socioeconomic Classification. https://doi.org/10.1371/journal.pmed.1002210.g002
Effect of the Intervention on Adverse Events and Serious Adverse Events
Total adverse events did not differ between groups at 3 or 12 mo, whether self-reported on the questionnaire (falls, fractures, sprains, and injuries) or from primary care records (any adverse event) (Table 4).
There was also no between-group difference in trial serious adverse events reported for safety monitoring. Self-reported falls were lower in the nurse group at 12 mo (p = 0.02). Falls reported in primary care records over 12 mo are fewer, but also in the same direction, although differences are nonsignificant (p = 0.13). Primary care recorded cardiovascular events over 0–12 mo were lower in the intervention groups than in controls (p = 0.04). Download: PPT PowerPoint slide PNG larger image TIFF original image Table 4. Adverse events. https://doi.org/10.1371/journal.pmed.1002210.t004 Sensitivity analyses and imputations. Restricting analyses to those with ≥600 min daily wear-time (and either ≥1 or ≥5 d of accelerometry data at 12 mo) and imputations with both missing at random and missing not at random assumptions and analyses, adjusting for accelerometer wear-time, gave broadly similar effect size estimates for both interventions compared with control and to each other and made no difference to interpretation (S3 Table). Analyses of total MVPA as the outcome produced almost identical effect size estimates as found with MVPA in ≥10-min bouts; at 12 mo, postal versus control was 36 (95% CI 17, 55) min/wk and nurse versus control was 32 (95% CI 13, 50) min/wk. In other words, all of the increase in MVPA was in ≥10-min bouts. Participants Of 11,015 invited, 6,399 did not respond, 548 were excluded due to self-reported PA guideline achievement, 127 were recruited but did not attend baseline assessment or provided inadequate baseline accelerometry data, and 1,023/10,467 (10%) were randomised (Fig 1). Of the 1,023 participants, 32 (3%) withdrew, and 8 (1%) were uncontactable at 12 mo. In total, 956/1,023 (93%) participants provided at least 1 d of 540 min wear-time accelerometer data and were included in 12-mo primary analyses. Baseline findings (Table 1) showed recruitment was balanced across age-groups; over a third were male. 
Characteristics were similar between groups. The nurse-support group had a slightly higher baseline adjusted average daily step-count (7,653, s.d. 2,826) and minutes spent weekly in MVPA in bouts of ≥10 min (105, s.d. 116) compared with the postal (steps 7,402, s.d. 2,476; MVPA in bouts 92, s.d. 90) and control groups (steps 7,379, s.d. 2,696; MVPA in bouts 84, s.d. 97). Overall, 218/1,023 (21%) achieved PA guidelines of ≥150 min of MVPA in bouts. Accelerometer wear-time was similar between groups at baseline and 3- and 12-mo follow-ups (Tables 1 and 2). Over 90% of all groups provided ≥5 d of ≥540 min wear-time at 12 mo (S1 Table). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 1. PACE-UP CONSORT diagram. https://doi.org/10.1371/journal.pmed.1002210.g001 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 1. Baseline characteristics of 1,023 randomised participants. https://doi.org/10.1371/journal.pmed.1002210.t001 Download: PPT PowerPoint slide PNG larger image TIFF original image Table 2. Primary and secondary accelerometry outcome data. https://doi.org/10.1371/journal.pmed.1002210.t002 Among intervention participants, 256/346 (74%) of the nurse-support group attended all three sessions and 268/339 (79%) of the postal and 281/346 (81%) of the nurse-support group sent back PA diaries completed with their pedometer step-counts after the intervention. Effect of the Intervention on PA at 3 and 12 Mo Three-mo (interim) outcomes (Table 2). There were significant differences for change in step-counts from baseline to 3 mo between intervention groups and the control group: additional step-counts (steps/day) postal 692 (95% CI 363, 1,020; p < 0.001), nurse-support 1,172 (95% CI 844, 1,501; p < 0.001); the difference between the intervention groups was statistically significant: 481 (95% CI 153, 809; p = 0.004). 
Findings for MVPA showed a similar pattern: additional MVPA in bouts (min/wk) postal 43 (95% CI 26, 60; p < 0.001), nurse-support 61 (95% CI 44, 78; p < 0.001); the difference between intervention groups was 18 (95% CI 1, 35; p = 0.04). Sedentary time was similar between groups. Summary data for 3-mo PA outcomes are shown in S2 Table. Twelve-mo (main) outcomes (Table 2). Both intervention groups increased their step-counts at 12 mo compared with controls: additional step-counts (steps/day) postal 642 (95% CI 329, 955; p < 0.001) and nurse-support 677 (95% CI 365, 989; p < 0.001), with no statistically significant difference between intervention groups, 36 (-277, 349). Time spent in MVPA in bouts showed a similar pattern; both intervention groups increased at 12 mo compared with controls; additional MVPA in bouts (min/wk) postal 33 (95% CI 17, 49; p < 0.001) and nurse-support 35 (95% CI 19, 51; p < 0.001), with no statistically significant difference between intervention groups 2 (-14, 17). Sedentary time was similar between groups. Summary data for 12-mo PA outcomes are shown in S2 Table. Three-mo (interim) outcomes (Table 2). There were significant differences for change in step-counts from baseline to 3 mo between intervention groups and the control group: additional step-counts (steps/day) postal 692 (95% CI 363, 1,020; p < 0.001), nurse-support 1,172 (95% CI 844, 1,501; p < 0.001); the difference between the intervention groups was statistically significant: 481 (95% CI 153, 809; p = 0.004). Findings for MVPA showed a similar pattern: additional MVPA in bouts (min/wk) postal 43 (95% CI 26, 60; p < 0.001), nurse-support 61 (95% CI 44, 78; p < 0.001); the difference between intervention groups was 18 (95% CI 1, 35; p = 0.04). Sedentary time was similar between groups. Summary data for 3-mo PA outcomes are shown in S2 Table. Twelve-mo (main) outcomes (Table 2). 
Both intervention groups increased their step-counts at 12 mo compared with controls: additional step-counts (steps/day) postal 642 (95% CI 329, 955; p < 0.001) and nurse-support 677 (95% CI 365, 989; p < 0.001), with no statistically significant difference between intervention groups, 36 (-277, 349). Time spent in MVPA in bouts showed a similar pattern; both intervention groups increased at 12 mo compared with controls; additional MVPA in bouts (min/wk) postal 33 (95% CI 17, 49; p < 0.001) and nurse-support 35 (95% CI 19, 51; p < 0.001), with no statistically significant difference between intervention groups 2 (-14, 17). Sedentary time was similar between groups. Summary data for 12-mo PA outcomes are shown in S2 Table. Effect of the Intervention on Other Health-Related Outcomes Fat mass was slightly reduced at 12 mo in both intervention groups, but these differences did not differ significantly from the control group (Table 3). There was no change in body mass index or waist circumference. The interventions had no significant effects on anxiety, depression, health-related quality of life, or pain scores at 3 or 12 mo. Exercise self-efficacy score significantly increased in both intervention groups at 3 mo compared with controls, and there was a greater effect in the nurse group compared with postal. By 12 mo, there was a difference in self-efficacy score between only the nurse and control groups; the postal group was intermediate between, but not significantly different from, the other groups (Table 3). Summary data for health-related outcomes are shown S2 Table. Download: PPT PowerPoint slide PNG larger image TIFF original image Table 3. Ancillary outcomes. https://doi.org/10.1371/journal.pmed.1002210.t003 Subgroup analyses. 
There was no evidence of effect modification on change in step-count at 12 mo for either of the intervention groups versus control for any of the following factors: age, gender, taking part as a couple, body mass index, disability, pain, socioeconomic group, exercise self-efficacy (Fig 2). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. Treatment effect for primary outcome by subgroup at 12 mo. (a) Postal and control groups (b) nurse and control groups. Abbreviations: BMI, body mass index; NS-SEC, National Statistics Socioeconomic Classification. https://doi.org/10.1371/journal.pmed.1002210.g002 Subgroup analyses. There was no evidence of effect modification on change in step-count at 12 mo for either of the intervention groups versus control for any of the following factors: age, gender, taking part as a couple, body mass index, disability, pain, socioeconomic group, exercise self-efficacy (Fig 2). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. Treatment effect for primary outcome by subgroup at 12 mo. (a) Postal and control groups (b) nurse and control groups. Abbreviations: BMI, body mass index; NS-SEC, National Statistics Socioeconomic Classification. https://doi.org/10.1371/journal.pmed.1002210.g002 Effect of the Intervention on Adverse Events and Serious Adverse Events Total adverse events did not differ between groups at 3 or 12 mo whether self-reported on the questionnaire (falls, fractures, sprains, and injuries) or from primary care records (any adverse event) (Table 4). There was also no between-group difference in trial serious adverse events reported for safety monitoring. Self-reported falls were lower in the nurse group at 12 mo (p = 0.02). Falls reported in primary care records over 12 mo are fewer, but also in the same direction, although differences are nonsignificant (p = 0.13). 
Primary care recorded cardiovascular events over 0–12 mo were lower in the intervention groups than in controls (p = 0.04). Table 4. Adverse events. https://doi.org/10.1371/journal.pmed.1002210.t004 Sensitivity analyses and imputations. Restricting analyses to those with ≥600 min daily wear-time (and either ≥1 or ≥5 d of accelerometry data at 12 mo), imputations under both missing-at-random and missing-not-at-random assumptions, and analyses adjusting for accelerometer wear-time all gave broadly similar effect size estimates for both interventions compared with control and with each other, and made no difference to interpretation (S3 Table). Analyses of total MVPA as the outcome produced effect size estimates almost identical to those found with MVPA in ≥10-min bouts; at 12 mo, postal versus control was 36 (95% CI 17, 55) min/wk and nurse versus control was 32 (95% CI 13, 50) min/wk. In other words, all of the increase in MVPA was in ≥10-min bouts. Discussion Principal Findings The interventions increased objectively assessed PA (step-counts by about 650–700 steps per day and MVPA in bouts by about 33–35 min/wk) among predominantly inactive 45- to 75-y-olds at 12 mo. 
Whilst nurse delivery had a greater effect than postal delivery at 3 mo, by 12 mo this difference was not sustained. Exercise self-efficacy was significantly increased by both interventions compared to control at 3 mo and in the nurse group at 12 mo. The interventions had no effect on sedentary time, anthropometry, or other outcomes and did not increase adverse events. Both interventions were well accepted; three-quarters of the nurse group attended all three sessions and ~80% of both groups returned completed step-count diaries. The trial was novel in clearly separating out the effects of pedometer provision and nurse support in a general population sample of adults and older adults and demonstrating the effects on both step-counts and MVPA in bouts, thus making the outcome assessment relevant to current national and international PA guidelines. Study Strengths and Limitations Study strengths include the following: a large, population-based, primary care sample; household randomisation, allowing comparison of individual and couple effects; three arms, allowing separation of nurse support and pedometer/handbook effects; practice nurses, rather than researchers or exercise specialists, delivering the intervention; good uptake of nurse appointments and return of completed step-count diaries; an objective PA outcome, relevant to PA guidelines; adverse event measurement from primary care records; a 93% follow-up rate; and embedded economic and qualitative evaluations (not presented here). There were some study limitations. The 10% (1,023/10,467) recruitment rate raises issues of generalizability, which are dealt with in the later section on Implications for Policy, Practice, and Future Research. At baseline, 218/1,023 (21%) achieved PA guidelines based on accelerometry. They were not excluded because, if rolled out in primary care, self-report would define participation. 
Our nurse intervention group had slightly higher baseline PA levels; however, results were not biased, as analyses were based on individual change, controlling for baseline PA level. It was impossible to mask participants and nurses to group and, pragmatically, research assistants recruited and followed up the same participants, so they were unmasked to group at outcome assessment. However, all the primary and secondary PA outcomes were assessed objectively. Participants might have tried harder with their PA when monitored, but this would also have affected controls and would be reduced by using a 7-d protocol [16]. Also, our intervention groups increased MVPA in bouts of ≥10 min, implying that participants made changes suggested by the programme. Despite recruiting to target and having excellent follow-up, our confidence intervals for the difference between intervention groups cannot rule out a small 12-mo difference. Main Results in Context of Other Literature To our knowledge, this is the largest population-based trial of a pedometer-based walking intervention with 12-mo follow-up and is consistent with our findings in 60- to 75-y-olds in the smaller PACE-Lift trial [21]. Whilst the PACE-Lift intervention also included pedometer feedback, step-count diary, and practice nurse PA consultations based around BCTs, it comprised four longer consultations, which also included accelerometer feedback on PA intensity. PACE-Lift only had a single intervention arm and was therefore unable to separate out PA monitor effects from those of the nurse support. Despite a less intense intervention, PACE-UP has delivered similar levels of effect at both 3 and 12 mo and additionally has shown what can be achieved via a postal route. Compared with systematic reviews [10, 16, 17], our absolute step-count increase was modest. 
However, most trials with 12 mo of data have been based on small numbers and either volunteers [30], high-risk groups [9], or self-report PA data [31], likely leading to larger effects. PA guidelines focus on time in MVPA, not step-counts; the reviews presented no data on this important outcome [10, 16, 17]. PACE-UP results confirm PACE-Lift findings [21], with significant 12-mo increases in MVPA in bouts. Based on the “3,000-in-30” formula, 35 extra min of MVPA/wk in bouts corresponds to 500 extra steps/day. Thus, three-quarters of the extra steps achieved contributed to MVPA in bouts. We believe our trial is the first to show that the “3,000-in-30” message [8] can lead to an approximately one-third increase in weekly MVPA in bouts at 12 mo, achieved across both intervention groups. It is also reassuring that our interventions did not increase sedentary time, given its potential harm, as compensation can sometimes occur. Most pedometer-based interventions have not separated pedometer and support effects [14, 16, 21]. The Healthy Steps trial showed pedometers achieved an additional effect compared with a primary care PA prescription, but PA outcomes were self-reported [31]. PACE-UP demonstrates that whilst the nurse intervention group had a significantly greater effect on both step-counts and time in MVPA at 3 mo, by 12 mo both nurse and postal interventions still had a significant effect, but with no evidence of difference between them. This stronger effect during the period of contact with the nurse, which was not sustainable longer term, has also been shown in other interventions with health professionals [32]. Both nurse and postal groups received a pedometer, diary, and handbook as part of the PACE-UP package; it is not possible to know how much the individual components contributed. 
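The "3,000-in-30" arithmetic used earlier in this section can be reproduced with a short sketch. It is illustrative only: it assumes a constant cadence of 100 steps/min during MVPA bouts (3,000 steps in 30 min) and uses the 12-mo estimates reported in the Results; the function and variable names are hypothetical and not part of the trial analysis.

```python
# Illustrative arithmetic only; all input values are taken from the text.
STEPS_PER_BOUT = 3000    # "3,000-in-30" message: 3,000 steps ...
MINUTES_PER_BOUT = 30    # ... in 30 minutes


def mvpa_minutes_to_daily_steps(extra_min_per_week: float) -> float:
    """Convert extra weekly MVPA minutes into extra steps per day,
    assuming every MVPA minute is walked at 3,000 steps per 30 min."""
    steps_per_min = STEPS_PER_BOUT / MINUTES_PER_BOUT  # 100 steps/min
    return extra_min_per_week * steps_per_min / 7      # spread over 7 days


# Nurse-support group at 12 mo: +35 min/wk MVPA in bouts, +677 steps/day.
steps_from_mvpa = mvpa_minutes_to_daily_steps(35)
share_of_extra_steps = steps_from_mvpa / 677

print(f"{steps_from_mvpa:.0f} steps/day from MVPA bouts")
print(f"{share_of_extra_steps:.0%} of the extra daily steps")
```

Under these assumptions, the nurse-support figures give 500 steps/day, or roughly three-quarters of the extra steps; the postal figures (33 min/wk and 642 steps/day) give a similar share.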
A systematic review suggested that step-count diaries were common to successful pedometer interventions [16], and approximately 80% of both of our intervention groups returned completed step-count diaries. Also, our qualitative findings suggest that participants from both groups valued the handbook and diary as well as the pedometer [26]. We found no effect of the interventions on body mass index or fat mass, consistent with other studies [21, 30]. Our interventions did not affect anxiety or depression scores, consistent with other primary care pedometer-based interventions, suggesting either no effect or insensitivity of these measures to change, particularly when levels are in the normal range for most people [19, 21]. However, whilst a few participants mentioned negative effects from overdoing walking, most intervention participants talked about feeling fitter, sleeping better, improved mood, having more energy and less pain, and keeping more active into older age [26]. There is a lack of data comparing individual, couple, or household participation in walking studies [15, 21]. Household sampling allowed us to investigate this, but only 20% participated as couples, reducing the power of our subgroup analyses, which showed no effect. Self-efficacy differences between both intervention groups and controls at 3 mo and between the nurse group and controls at 12 mo are consistent with the positive relationship between changing self-efficacy and PA behaviour [33]. The BCTs most associated with self-efficacy and successful outcomes are goal and action planning, prompting self-monitoring and feedback, and planning of social support/change [33]. All these BCTs were specifically recommended in recent guidance [11] and were included in our study in written materials for both intervention groups and as a focus of nurse PA consultations [23]. 
Our qualitative interviews found that more BCT comments were made by the nurse group than by the postal group, except in relation to self-monitoring [26]. Increased self-efficacy is important for long-term PA adherence [34]. Walking is a safe intervention indicated in many chronic diseases [1, 7], although empirical data are limited [13] and a large trial on 40- to 74-y-old women encouraging a single 30-min brisk walk 5 d weekly reported increased falls and injuries [24]. Our finding of no increase in adverse events builds on similar evidence from PACE-Lift [21], using both self-report and primary care data, and highlights the potential importance of building up MVPA gradually, particularly in those who are inactive or have comorbidities [1, 6]. The suggestion of a protective effect of the interventions on falls and cardiovascular events is plausible, but not definitive, as it is based on small numbers of events. Implications for Policy, Practice, and Future Research Individual PA behaviour change approaches such as PACE-UP are important in tackling the public health challenge of physical inactivity but for maximum benefit need to occur alongside environmental and policy approaches [12]. Our results support current guidance for pedometers, which suggests that they are used as part of a package that includes support to set realistic goals, monitoring, and feedback [35]. Only 10% of eligible individuals were randomised, similar to other primary care PA trials [19, 36] but lower than the 30% in our recent older adult trial [21]. However, 10% of a population sample is still a very useful percentage to be participating in a public health intervention, and this trial shows the potential of primary care to contribute to PA public health goals. It is important to consider whether the participants randomised are representative of the target population from which they were drawn, particularly given the uptake rate of 10%. 
From Table 1, we can see that, of those randomised, there were more women than men, and the proportion of participants of Asian origin and from deprived areas was low and fewer than expected from the areas sampled. While approximately 4/5 of those randomised reported their health as good or very good, about 2/3 were overweight or obese, half reported one to two chronic diseases, nearly 2/5 reported slight/some disability, and over 1/5 reported a limiting, longstanding illness. Older adults were well represented. Thus, although it is unlikely that those randomised are entirely typical of the practice populations (it would be surprising if they were), there was substantial representation from groups who are particularly likely to benefit from the intervention, specifically older adults, women, and the overweight. Moreover, 1/3 of those randomised rated their self-efficacy for exercise as low. Nevertheless, some groups, for example Asians, will be underrepresented, and we are carrying out further work comparing participants and nonparticipants to identify these. Tailoring future interventions to be more acceptable to such groups will be important. If the intervention were to be rolled out in routine primary care, take-up could be higher, with no requirement for informed consent, randomisation, and rigorous evaluation. Handing out the intervention materials (pedometer, handbook, and diary) in primary care consultations where advice to increase low PA levels is already being offered is also likely to increase the intervention’s reach (e.g., in relevant chronic disease consultations or as part of preventive health checks, such as the UK National Health Service Health Checks, which cover a similar age-group and aim to reduce cardiovascular risk [37]). 
The intervention could also be a valuable addition to diabetes prevention strategies, such as the National Health Service Diabetes Prevention Programme [38], where primary care is being used to identify patients at high risk of developing diabetes, the majority of whom are inactive. The “3,000 steps-in-30 min” message neatly captures intensity and could become a commendable new public health goal, with many people now having the ability to measure steps easily with their mobile phones. Our interventions led to an extra 33–35 min weekly of MVPA in bouts (an increase of about a third from baseline) and an extra 642–677 steps per day in a predominantly inactive cohort. Based on a systematic review, which has quantified the strength of association between PA (particularly walking) and developing coronary heart disease [39], the increase of 33 min/wk in the postal group in our study at 12 mo, if sustained, would be expected to reduce coronary heart disease risk by 4.5% (95% CI 3%, 6%; see S3 Text for details). Similarly, a cohort study relating pedometer-measured steps to mortality [40] allowed us to estimate that a sustained increase of 642 steps/day would be expected to decrease all-cause mortality by 4% (95% CI 1%, 7%). Whilst the nurse intervention produced greater effects at 3 mo, by 12 mo both interventions performed similarly. However, maintenance is important to consider, as long-term health effects require sustained PA increases and little is known about the effectiveness of PA interventions beyond 12 mo [13, 16]. We designed both PACE-UP interventions to have lasting effects [23], including techniques shown to help maintain behaviour change (e.g., encouraging feedback and self-monitoring; relapse prevention strategies and “if-then” plans in case of relapse; building social support; and incorporating new behaviours into daily routines [11]). 
Some strategies may have been more effective in the nurse group; the sustained self-efficacy difference between nurse and control groups at 12 mo supports this possibility. It is therefore important to test the long-term effectiveness of both interventions, and we are currently following up the PACE-UP cohort at 3 y. 
Conclusion The PACE-UP pedometer-based walking intervention increased step-counts by approximately a tenth and time in MVPA in bouts by a third in predominantly inactive 45- to 75-y-old primary care patients. Nurse delivery over three consultations had no greater effect on 12-mo PA outcomes than postal delivery. A primary care pedometer intervention, delivered by post or with minimal contact, would provide an effective approach to addressing the public health physical inactivity challenge. Supporting Information S1 Text. PACE-UP trial protocol. https://doi.org/10.1371/journal.pmed.1002210.s001 (PDF) S2 Text. PACE-UP CONSORT checklist. https://doi.org/10.1371/journal.pmed.1002210.s002 (DOCX) S3 Text. Details on cardiovascular and overall mortality risk reduction. https://doi.org/10.1371/journal.pmed.1002210.s003 (DOCX) S1 Fig. Trial procedures and complex intervention components. https://doi.org/10.1371/journal.pmed.1002210.s004 (TIF) S2 Fig. Residuals from 12-mo models for steps and weekly MVPA in ≥10-min bouts. https://doi.org/10.1371/journal.pmed.1002210.s005 (TIF) S1 Table. 
Number of days with ≥540 min accelerometer wear-time by treatment group at baseline, 3 mo, and 12 mo. https://doi.org/10.1371/journal.pmed.1002210.s006 (DOCX) S2 Table. Summary data for main outcome and ancillary outcome variables. https://doi.org/10.1371/journal.pmed.1002210.s007 (DOCX) S3 Table. Sensitivity and imputation analyses for the primary outcome (step-count at 12 mo). https://doi.org/10.1371/journal.pmed.1002210.s008 (DOCX) Acknowledgments We would first like to thank Dr. Sunil Shah for his invaluable contribution to the study as a trial investigator from inception through to trial follow-up. Unfortunately, he died suddenly in 2015. We would like to thank the South-West London (UK) general practices, their practice nurses who supported this study, and all the patients from these practices who participated: Upper Tooting Road Practice, Tooting; Chatfield Practice, Battersea; Wrythe Green Practice, Carshalton; Francis Grove Practice, Wimbledon; Putneymead Practice, Putney; Heathfield Practice, Putney; and Cricket Green Practice, Mitcham. We would also like to thank our supportive Trial Steering Committee: Professor Sarah Lewis (chair); Professor Paul Little (GP representative); and Mr. Bob Laventure (Patient and Public Involvement representative). Dr. Iain Carey helped with processing downloaded GP data. Disclaimer: the views and opinions expressed herein are those of the authors and do not necessarily reflect those of the Health Technology Assessment (HTA) programme, National Institute for Health Research (NIHR), National Health Service, or the Department of Health.
Mosquito-Disseminated Insecticide for Citywide Vector Control and Its Potential to Block Arbovirus Epidemics: Entomological Observations and Modeling Results from Amazonian Brazil doi: 10.1371/journal.pmed.1002213 pmid: 28095414
Background Mosquito-borne viruses threaten public health worldwide. When the ratio of competent vectors to susceptible humans is low enough, the virus’s basic reproductive number (R0) falls below 1.0 (each case generating, on average, <1.0 additional case) and the infection fades out from the population. Conventional mosquito control tactics, however, seldom yield R0 < 1.0. A promising alternative uses mosquitoes to disseminate a potent growth-regulator larvicide, pyriproxyfen (PPF), to aquatic larval habitats; this kills most mosquito juveniles and substantially reduces adult mosquito emergence. We tested mosquito-disseminated PPF in Manacapuru, a 60,000-inhabitant city (~650 ha) in Amazonian Brazil. Methods and Findings We sampled juvenile mosquitoes monthly in 100 dwellings over four periods in February 2014–January 2016: 12 baseline months, 5 mo of citywide PPF dissemination, 3 mo of focal PPF dissemination around Aedes-infested dwellings, and 3 mo after dissemination ended. We caught 19,434 juvenile mosquitoes (66% Aedes albopictus, 28% Ae. aegypti) in 8,271 trap-months. Using generalized linear mixed models, we estimated intervention effects on juvenile catch and adult emergence while adjusting for dwelling-level clustering, unequal sampling effort, and weather-related confounders. Following PPF dissemination, Aedes juvenile catch decreased by 79%–92% and juvenile mortality increased from 2%–7% to 80%–90%. Mean adult Aedes emergence fell from 1,077 per month (range 653–1,635) at baseline to 50.4 per month during PPF dissemination (range 2–117). Female Aedes emergence dropped by 96%–98%, such that the number of females emerging per person decreased to 0.06 females per person-month (range 0.002–0.129). Deterministic models predict, under plausible biological-epidemiological scenarios, that the R0 of typical Aedes-borne viruses would fall from 3–45 at baseline to 0.004–0.06 during PPF dissemination. 
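As a rough check on the scale of the change in adult emergence reported above, the crude percentage reduction can be computed from the two monthly means. This is a minimal sketch using only the figures quoted in the abstract; the helper name is hypothetical, and the result is a simple ratio of means for all adult Aedes, not the model-adjusted estimate for females (96%–98%) that the authors report.

```python
# Illustrative arithmetic only; input values are the monthly means quoted above.
baseline_emergence = 1077.0  # mean adult Aedes emerging per month at baseline
ppf_emergence = 50.4         # mean per month during PPF dissemination


def percent_reduction(before: float, after: float) -> float:
    """Crude percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


reduction = percent_reduction(baseline_emergence, ppf_emergence)
print(f"Crude reduction in mean adult emergence: {reduction:.1f}%")
```

The crude figure comes out at about 95%, the same order as the adjusted estimates, although it ignores clustering, sampling effort, and weather, which the mixed models account for.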
The main limitations of our study were that it was a before–after trial lacking truly independent replicates and that we did not measure mosquito-borne virus transmission empirically. Conclusions Mosquito-disseminated PPF has potential to block mosquito-borne virus transmission citywide, even under adverse scenarios. Our results signal new avenues for mosquito-borne disease prevention, likely including the effective control of Aedes-borne dengue, Zika, and chikungunya epidemics. Cluster-randomized controlled trials will help determine whether mosquito-disseminated PPF can, as our findings suggest, develop into a major tool for improving global public health. Why Was This Study Done? Urban mosquitoes are global public health threats. They transmit dengue, Zika, and many other diseases for which vaccines or drugs are not available. Mosquito control is the key to preventing these diseases, yet conventional control tactics seldom yield satisfactory results. One key drawback is that many mosquitoes (especially Aedes) use small, hidden, or inaccessible breeding sites (aquatic larval habitats) that often remain untreated during control campaigns. One way of increasing the fraction of breeding sites that are treated is to use the mosquitoes themselves to transfer pyriproxyfen (PPF), a potent juvenile-killing insecticide, from resting sites to untreated breeding sites; there, PPF impedes juvenile mosquito development. What Did the Researchers Do and Find? Working with municipal vector control staff, we tested mosquito-disseminated PPF in a 60,000-inhabitant town of central Amazonia, in Brazil. We sampled juvenile mosquitoes monthly in 100 dwellings over 12 baseline months, eight months of PPF dissemination (five months citywide and three months focal), and three months post-dissemination. We caught 12,817 Aedes albopictus and 5,346 Ae. aegypti juveniles, and kept them in the laboratory to measure juvenile mortality and adult emergence. 
Following PPF dissemination, we observed an 80%–90% decrease in Aedes juvenile catch, while Aedes juvenile mortality increased from 2%–7% to 80%–90%. Adult Aedes emergence dropped by 96%–98%, such that the number of females emerging per person decreased to 0.002–0.129 females per person-month. Mathematical models predict that this reduction would effectively block transmission of mosquito-borne viruses like dengue, Zika, or chikungunya. What Do These Findings Mean? Our findings suggest that mosquito-disseminated PPF has the potential to become an important public health tool; larger, carefully designed trials are now necessary to determine the impact of this tactic on disease transmission. Introduction Fast global spread of mosquito-borne viruses is among the most pressing contemporary public health challenges [1,2]. The dengue, West Nile, and Japanese encephalitis viruses are well-known mosquito-transmitted pathogens, but we are currently witnessing the emergence of novel threats including chikungunya and Zika [1–6]. Both African in origin, these two viruses are causing large epidemics in the Americas and more restricted outbreaks in Europe, Southeast Asia, and the Pacific [5–7]. Ongoing Zika epidemics are particularly worrying because infection with this virus can cause Guillain-Barré syndrome and congenital central nervous system malformations including microcephaly [6–13]. Aedes aegypti and Ae. albopictus are considered the main urban vectors of dengue, Zika, and chikungunya, while Culex spp. mosquitoes transmit West Nile and Japanese encephalitis viruses [1–7]. The presence of urban mosquito vectors also increases the emergence or re-emergence potential of other viruses including yellow fever and Mayaro [1,2]. Effective vaccines exist for yellow fever and Japanese encephalitis, and recent advances in dengue [14] and Zika [15] vaccine development are relatively encouraging. 
However, major challenges remain (e.g., [16,17]), and for most mosquito-borne viral infections vector control is still the cornerstone of disease prevention [3,18,19]. In theory, effective control of disease spread requires lowering the ratio of competent vectors to susceptible human hosts below a critical threshold value, which, in turn, brings an infection’s basic reproductive number, R0, below unity [20–22]. R0 is a fundamental quantity in infectious disease epidemiology [22]. It measures the number of new (secondary) cases that arise from a primary (index) case entering a susceptible host population; with R0 < 1.0, each case produces, on average, fewer than one new infection, and the disease fades out from the host population. R0 thus also provides a measure of the control effort needed to effectively stop transmission [20–22]. Despite the large (and mounting) burden imposed by Aedes- and Culex-transmitted viruses [1,2,4–13,23,24], current mosquito control tactics have often failed to reliably reduce vector:human ratios to values that would keep R0 below the 1.0 threshold [18,19,25]. Mosquito control tactics usually combine insecticide spraying to kill adult mosquitoes with the identification and elimination of mosquito breeding sites (i.e., aquatic larval habitats) to limit juvenile mosquito numbers [3,18,19,25]. Unfortunately, insecticide spraying has only transient effects on the adult mosquito population, and the proportion of breeding sites that are detected and treated or eliminated (“breeding-site coverage”) is often so low as to render control campaigns largely ineffective (see [18,25]). An attractive way to increase breeding-site coverage is to use adult mosquitoes to disseminate tiny particles of juvenile-killing insecticides (larvicides or pupicides) to breeding sites [26,27].
One such insecticide is pyriproxyfen (PPF), an insect juvenile hormone analogue that kills mosquito juveniles at minute doses and can safely be used even in drinking water [3,27,28]. Mosquito-disseminated PPF has been shown to yield high breeding-site coverage and large reductions of adult mosquito emergence across a tropical neighborhood [29]. One crucial open question is whether mosquito-disseminated PPF can effectively reduce mosquito populations at the spatial scale relevant for vector control and disease prevention—the scale of cities and towns. To address this question, we conducted a 2-y trial in a Brazilian Amazon city. First, we asked whether and to what extent some key demographic parameters of local mosquito populations (with a focus on Ae. aegypti and Ae. albopictus) would change following citywide deployment of mosquito-disseminated PPF. We then used simple deterministic models to explore the possible impact of observed changes in female Aedes emergence on the basic reproductive number, R0, for dengue and similar pathogens, including Zika, under epidemiological-entomological scenarios ranging from somewhat optimistic to essentially catastrophic. Methods This project was led by the Fundação Oswaldo Cruz (Brazilian Ministry of Health) in a joint initiative with local state and municipal health departments. Formal approval was not required for urban mosquito collection. Setting and Mosquito Surveillance We conducted a 2-y trial in Manacapuru, a 60,000-inhabitant city (~13,500 dwellings in ~650 ha) in the Brazilian Amazon (data from the Brazilian Institute of Geography and Statistics; http://www.ibge.gov.br/) (Fig 1). We selected 100 dwellings roughly evenly distributed across the city (Fig 1) for mosquito surveillance including two surveys per month from February 2014 to January 2016 (except that no surveys were conducted in November 2015, and just one survey in February 2015 and January 2016; see S1 Data). 
Residents in these 100 dwellings gave written informed consent to participate in the study. Fig 1. Study site: the city of Manacapuru, state of Amazonas, Brazil. The black circles indicate the location of the 100 dwellings monitored for mosquito vectors; the green circles indicate the location of the 1,000 pyriproxyfen dissemination stations. Circles are overlaid on a schematic of Manacapuru; locations are approximate. The map of Latin America was drawn using the maps library in R 3.1.2 (https://www.r-project.org/). https://doi.org/10.1371/journal.pmed.1002213.g001 Each month, we set four sentinel breeding sites (SBSs; two per survey) in each surveillance dwelling. SBSs were 580-ml dark-brown plastic cups containing 400 ml of tap water; they were retrieved after 5–6 d of operation, and their contents kept in the laboratory as previously described [29]. We then recorded (a) the number of juvenile mosquitoes developing or dying in each SBS and (b) the number of adult mosquitoes emerging from each SBS. These data allowed us to assess monthly, for four mosquito taxa (Aedes aegypti, Ae. albopictus, Culex spp., and Limatus spp.), the following metrics: (a) house infestation, measured as the percent of surveillance dwellings with at least one juvenile mosquito; (b) juvenile mosquito catch, or the number of larvae in each SBS; (c) juvenile mosquito mortality, or the proportion of mosquito juveniles that died before reaching the adult stage in each SBS; and (d) adult mosquito emergence, or the number of adults emerging from each SBS. Here, we focus on our results on juvenile mosquito catch and adult mosquito emergence, and particularly on the epidemiologically most relevant quantity, female mosquito emergence. Full raw data are provided in S1 Data. Intervention In March 2015, after 1 y of monthly monitoring, the intervention started. Citywide PPF dissemination occurred from March through July 2015.
Working under our supervision, municipal vector control staff deployed 1,000 PPF dissemination stations (DSs) scattered over the entire urban area (Fig 1); all site owners gave oral informed consent. DSs were 2-l plastic cups containing 600–700 ml of tap water and with the inner wall lined with black, Oxford-type polyester cloth dusted with 5 g of PPF 0.5% (SumiLarv 0.5G; Sumitomo Chemical, Tokyo) ground to fine powder (see also [29]). Municipal vector control staff visited DSs fortnightly for maintenance (re-dusting with PPF and refilling with water). Logistic constraints, however, precluded DS maintenance in some city sectors at some time points (see S1 Data and S1 Fig); we investigated the possible effects of these operational failures using generalized linear mixed models (GLMMs) (see below). From August through October 2015, PPF dissemination was scheduled to be “focal”—i.e., limited to dwellings with evidence of infestation by Aedes spp. based on the SBS surveillance. Again, logistic constraints did not allow for full coverage, with focal dissemination not taking place in eight of the 37 dwellings found to be infested at least once over this 3-mo period; in addition, our field team noted that PPF used in focal dissemination in October 2015 (26 dwellings) was not ground to sufficiently fine powder (see S1 Data and S1 Fig). Final PPF dissemination occurred in October 2015, with SBS-based monitoring maintained until January 2016. Thus, the trial spanned 12 mo before PPF dissemination, 5 mo of citywide PPF dissemination, 3 mo of focal PPF dissemination, and 3 mo after PPF dissemination stopped. Importantly, conventional Aedes control measures (active breeding-site searches and breeding-site elimination by municipal vector control staff) were in place over the first 12 and last 6 mo of the trial—i.e., over the periods with no citywide PPF dissemination. 
Descriptive Analyses We first described our data using graphs and tables, and calculated summary statistics including percentages with score 95% confidence intervals, means with standard errors, and quantiles. Statistical Modeling We used GLMMs to quantify changes in (a) juvenile mosquito catch (number of larvae caught in SBSs) and (b) adult Aedes emergence (number of adults emerging from SBSs) following PPF dissemination. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [30,31] unambiguously selected the negative binomial error structure (versus Poisson) as the best fit for our count data (see S1 Table). Our GLMMs accounted for unequal sampling effort due to missing SBS surveillance data by including the (log)number of operational SBSs in each dwelling each month as an offset. Since repeated observations were made over time in each dwelling, we specified dwelling ID as a random factor. Six dwelling-months produced no data (closed dwellings) and were excluded from the analyses, for a total of 2,294 SBS surveillance data points clustered in 100 dwellings. We specified intervention as a factor, indexing four consecutive periods: (1) before the intervention, or baseline; (2) citywide PPF dissemination (with some operational failures as noted above); (3) focal PPF dissemination (also with some operational failures); and (4) after PPF dissemination. We also tested alternative models excluding intervention effects (“null” models) or specifying, for each dwelling and month, (a) whether PPF dissemination (including DS maintenance) had/had not taken place at least once in the previous month (coded 1/0) or (b) the intensity/quality of dissemination, with 0 = no dissemination, 1 = unsupervised dissemination with possible operational failures, and 2 = supervised dissemination. 
For the second variable, for each month we summed the scores of the two fortnightly dissemination/maintenance events of the previous month, so this variable could take on integer values from 0 (no events) to 4 (two supervised events) (see S1 Data and S1 Fig). Dissemination/maintenance was recorded at the city sector level during citywide PPF dissemination and at the dwelling level during focal PPF dissemination. All these alternative GLMMs had, however, much larger AIC and BIC scores (consistently >50 units; see S1 Table) than the basic four-period models, on which we therefore base inference [30,31]. Our models controlled for the effects of rainfall (monthly total) and temperature (monthly average of maximum daily values); since these covariates were correlated (Pearson’s ρ = −0.704), we fit separate GLMMs adjusting for (standardized) rainfall and temperature. The Brazilian National Institute of Meteorology (INMET), which operates a meteorological station at the study locality, provided daily weather data (see S1 Data). GLMMs were fit using package lme4 1.1–10 in R 3.1.2 [32,33]. See S1 Table for details on the structure and relative performance of the full set of models used in each analysis, and S1 Text for a brief description of our original statistical modeling plan. Deterministic Modeling Using our empirical data and a simple Ross-Macdonald–type model [20–22], we explored the potential effects of observed changes in Aedes spp. female emergence on pathogen transmission. We calculated the basic reproductive number, R0, for pathogens resembling Aedes-borne viruses, including dengue (see Table 1), and the ratio (denoted m) of female Aedes mosquitoes to susceptible humans, which was the parameter we aimed to affect with our intervention. R0 is given by with parameters as defined in Table 1. 
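As a rough illustration of how a Ross-Macdonald-type calculation links the vector:human ratio m to R0, the following Python sketch uses the textbook form R0 = m a² b c e^(−μn) / (r μ). This is an assumption made for illustration and is not necessarily the exact expression used in this study. The function names and all parameter values are hypothetical placeholders, not the Table 1 values, and m is treated here simply as females per person.

```python
import math


def basic_reproductive_number(m, a, b, c, mu, n, r):
    """Textbook Ross-Macdonald R0 = m * a^2 * b * c * exp(-mu * n) / (r * mu)."""
    return m * a ** 2 * b * c * math.exp(-mu * n) / (r * mu)


def critical_vector_ratio(a, b, c, mu, n, r):
    """Vector:human ratio m at which R0 equals exactly 1 in the form above."""
    return (r * mu) / (a ** 2 * b * c * math.exp(-mu * n))


# Hypothetical parameter values, chosen only for illustration (NOT Table 1).
params = dict(
    a=0.5,    # bites per female mosquito per day
    b=0.5,    # mosquito-to-human transmission probability per bite
    c=0.5,    # human-to-mosquito transmission probability per bite
    mu=0.1,   # daily mosquito mortality rate
    n=10.0,   # extrinsic incubation period, in days
    r=1 / 7,  # daily human recovery rate
)

m_before = 1.0                   # hypothetical baseline females per person
m_after = m_before * (1 - 0.97)  # a 97% drop in m, within the reported 96%-98%

r0_before = basic_reproductive_number(m_before, **params)
r0_after = basic_reproductive_number(m_after, **params)

print(f"R0 before: {r0_before:.3f}")
print(f"R0 after:  {r0_after:.3f}")
print(f"critical m: {critical_vector_ratio(**params):.3f}")
```

Because R0 is linear in m in this form, any proportional reduction in m carries over unchanged to R0; whether R0 ends up below 1.0 then depends on the baseline value and on the remaining parameters.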
We estimated monthly m ratios as the number of Aedes females emerging from SBSs each month in each dwelling divided by 4.5, the average number of people per dwelling in our study setting. We hence assumed that 100% of the local human population was susceptible to the pathogen, mirroring the current spread of Zika and chikungunya outside Africa [5–7]. To provide much more conservative estimates of intervention effects on R0, we repeated these analyses using three times as many emerging females as observed (i.e., using 3m instead of m). This represents the (unlikely) possibility that eight further breeding sites with mean productivity similar to that of our SBSs were present, on average, in each dwelling each month. Table 1. Parameter values used to investigate the expected variation of the basic reproductive number, R0, of a mosquito-borne viral infection as a function of the ratio of emerging Aedes females to humans under five hypothetical scenarios. https://doi.org/10.1371/journal.pmed.1002213.t001 Results Descriptive Analyses Ae. albopictus was the dominant mosquito species at the study site; overall, we caught 12,817 Ae. albopictus and 5,346 Ae. aegypti juveniles in our SBSs. House infestation by Aedes spp. fell from monthly values consistently about 70%–90% at baseline (mean 84.5%, median 87%, range 67%–97%) to a mean of 33% during citywide PPF dissemination (median 24%, range 15%–61%) and to a lowest value of 9% in the first month of focal dissemination (mean 16%, median 13%, range 9%–26%); afterwards, infestation gradually recovered to baseline values (Fig 2A). We also collected 58 Culex spp. (Cx. quinquefasciatus and a few Cx. nigripalpus) and 1,213 Limatus spp. (mainly L. durhami) larvae during the trial. House infestation by Culex spp. was consistently low before dissemination (median 1%, range 0%–6%); afterwards, just four dwellings were positive in just one month (April 2015). Dwelling infestation by Limatus spp. was recorded only before PPF dissemination (median 17.6%, range 0%–83%). Fig 2. Changes in mosquito population metrics following deployment of mosquito-disseminated pyriproxyfen: descriptive graphs. (A) Monthly dwelling infestation by Aedes spp.
(percent of dwellings in which at least one Ae. albopictus or Ae. aegypti juvenile was present in sentinel breeding sites [SBSs]); error bars are score 95% confidence intervals. (B) Mean monthly numbers of Aedes juveniles per SBS. (C) Monthly Aedes juvenile mortality (overall percent of juveniles that died before reaching adulthood); error bars are score 95% confidence intervals. (D) Mean monthly adult Aedes emergence (number of juvenile Aedes that developed into adults in each SBS); error bars are two standard errors. In all panels, the periods of citywide (dark grey) and focal (light grey) pyriproxyfen (PPF) dissemination are highlighted on the x-axes. Color coding in (A–C): red, pre-intervention (baseline) period, with orange indicating that just one survey was conducted in February 2015; dark green, citywide PPF dissemination; light green, focal PPF dissemination; blue, post-intervention period, with light blue indicating that just one survey was conducted in January 2016. Color coding in (D): red, females; blue, males; shaded area, total adult emergence; lighter red/blue, single-survey months (February 2015 and January 2016). https://doi.org/10.1371/journal.pmed.1002213.g002 Juvenile Aedes catch fell from a median value of 3.20 individuals per SBS per month before the intervention (range 1.94–4.82, mean 3.28) to less than one juvenile per SBS per month during citywide (median 0.77, range 0.17–1.36, mean 0.78) and focal (median 0.17, range 0.14–0.46, mean 0.26) PPF dissemination. Aedes catch rose back to a mean of more than three larvae per SBS over the last 3 mo of the trial (Fig 2B). At the dwelling level, these figures translate into typical mean catches of about 7–17 juvenile Aedes per month before PPF dissemination, falling to a minimum of 0.52 (52 Aedes juveniles in 100 dwellings) in the first month of focal dissemination. Mean monthly catch per dwelling was 1.01 for Limatus spp. and 0.04 for Culex spp. 
before dissemination; except for seven Culex larvae caught in the second month of focal dissemination, neither genus appeared in samples taken during or after the intervention. Before PPF dissemination, most Aedes juveniles survived to adulthood in our SBSs. Mean baseline monthly mortality was 1.9% (median 2.4%, range 0.0%–3.8%) for Ae. albopictus and 6.6% (median 5.5%, range 0.0%–17.8%) for Ae. aegypti. Monthly Aedes spp. mortality soared to 79.7% on average (range 61.2%–92.7%) during citywide PPF dissemination and reached a peak value of 96.2% (95% CI 87.0%–98.9%) in the first month of focal dissemination (Fig 2C). We could not investigate possible changes in juvenile Limatus mortality (mean at baseline 3.95%) because no larvae were caught after dissemination started. All Culex spp. juveniles caught before, but just three of seven caught during, PPF dissemination survived to adulthood. The combined effects of much lower juvenile mosquito catches (Fig 2B) and much higher juvenile mortality (Fig 2C) yielded a striking citywide decrease of adult mosquito emergence during PPF dissemination (Fig 2D). Mean monthly Aedes adult emergence from SBSs was 1,077 (median 1,034, range 653–1,635) at baseline, for a mean of 3.2 adults per SBS per month (median 3.1, range 1.9–4.8) and 10.8 adults per dwelling per month (median 10.3, range 6.7–16.5). During citywide PPF dissemination, monthly emergence fell about 20-fold to just 56 adults on average (median 26, range 21–117), or 0.14 adults per SBS (median 0.07, range 0.06–0.30) and 0.56 adults per dwelling (median 0.26, range 0.21–1.17). Comparing extreme values (1,635 adults in January 2015 versus 21 adults in May 2015), adult Aedes emergence fell about 80-fold during citywide PPF dissemination. Further decreases were recorded during focal dissemination, down to a minimum of just two adult Aedes in total (a male and a female Ae.
albopictus) emerging from SBSs, each in a different dwelling, in August 2015, an 800-fold reduction relative to January 2015. As with other metrics, adult Aedes emergence rose back to baseline values after PPF dissemination stopped (Fig 2D). Since mosquito females but not males transmit human pathogens, we separately assessed Aedes female emergence from our SBSs. Table 2 summarizes monthly female emergence, and Figs 3 and 4 show, respectively, the numbers of Ae. albopictus and Ae. aegypti females emerging from SBSs in each dwelling and month. Monthly Aedes female emergence fell from an average of 536.6 (median 530, range 306–750) before to 28.8 (median 16, range 6–58) during citywide PPF dissemination; median values were therefore 33-fold lower, and extreme values 125-fold lower, during than before citywide dissemination. Again, this reduction became even larger over the focal dissemination period, with just one Aedes female emerging from the SBSs in August 2015, a >500-fold decrease compared to January 2015 (Table 2; Figs 2D, 3 and 4). Fig 3. Monthly female Aedes albopictus emergence in each of the 100 surveillance dwellings. The distribution of dwellings (black dots) and pyriproxyfen dissemination stations (green dots) is shown in the first panel, where dots are overlaid on a schematic of Manacapuru. In the remaining panels, bubble size is proportional to the number of emerging Ae. albopictus females; the scale is shown as a grey bubble in the second panel. For each month, the total number of emerging Ae. albopictus females is shown in the upper right corner of the panel. Color coding: brown, pre-intervention (baseline) period, with yellow indicating a single-survey month; dark green, citywide PPF dissemination; light green, focal PPF dissemination; blue, post-intervention period, with light blue indicating a single-survey month.
Temporal boundaries between periods are highlighted by colored vertical bars. https://doi.org/10.1371/journal.pmed.1002213.g003 Fig 4. Monthly female Aedes aegypti emergence in each of the 100 surveillance dwellings. The distribution of dwellings (black dots) and pyriproxyfen dissemination stations (green dots) is shown in the first panel, where dots are overlaid on a schematic of Manacapuru. In the remaining panels, bubble size is proportional to the number of emerging Ae. aegypti females; the scale is shown as a grey bubble in the second panel. For each month, the total number of emerging Ae. aegypti females is shown in the upper right corner of the panel. Color coding: brown, pre-intervention (baseline) period, with yellow indicating a single-survey month; dark green, citywide PPF dissemination; light green, focal PPF dissemination; blue, post-intervention period, with light blue indicating a single-survey month. Temporal boundaries between periods are highlighted by colored vertical bars. https://doi.org/10.1371/journal.pmed.1002213.g004 Table 2. Monthly female Aedes spp. emergence from sentinel breeding sites set in 100 surveillance dwellings, Manacapuru, Amazonas, Brazil, February 2014–January 2016. https://doi.org/10.1371/journal.pmed.1002213.t002 Statistical Modeling GLMMs estimated strong negative effects of PPF dissemination on juvenile mosquito catch (Table 3; Fig 5A). Compared with baseline values, mean juvenile catch (all species) was estimated to fall by 80.2% (95% CI 76.3%–83.5%) and 92.1% (95% CI 89.9%–94.0%) during, respectively, citywide and focal PPF dissemination. The largest effect estimate was for Ae. albopictus catch, with a 94.1% reduction (95% CI 92.0%–95.6%) during focal dissemination and with negative effects still evident after dissemination stopped (36.2% reduction, 95% CI 18.9%–53.7%) (Fig 5B). Ae. 
aegypti mean catch was estimated to fall by 72.7% (95% CI 63.9%–79.4%) and 83.1% (95% CI 74.8%–88.7%) during citywide and focal dissemination, respectively (Fig 5B); the estimated increase in Ae. aegypti catch after PPF dissemination (Fig 5B) is driven by four outlier dwellings with a mean monthly catch of 30.2 Ae. aegypti juveniles per SBS (see S2 Fig). Fig 5. Estimated impact of mosquito-disseminated pyriproxyfen on Aedes populations: results of generalized linear mixed models. (A) Mean monthly Aedes juvenile catch per dwelling: dots, observed values (in grey, values adjusted for single-survey months); solid line, model predictions (red, pre-intervention period; dark green, citywide pyriproxyfen [PPF] dissemination; light green, focal PPF dissemination; blue, post-intervention period); dotted line, model-predicted trajectory in the absence of intervention as a function of monthly rainfall and adjusted for the number of operational sentinel breeding sites and dwelling-level clustering. (B) Model-predicted percent change (with 95% confidence intervals) in juvenile mosquito catch relative to the pre-intervention period (CW, citywide PPF dissemination; F, focal dissemination; A, after PPF dissemination): diamonds, Ae. albopictus; triangles, Ae. aegypti; black circles, Aedes spp.; gold circles, all mosquito species. (C) Mean monthly Aedes adult emergence per dwelling, with coding as in (A). (D) Model-predicted percent change (with 95% confidence intervals) in adult Aedes emergence relative to the pre-intervention period, with periods coded as in (B); black circles, all Aedes adults; red circles, Aedes females; blue circles, Aedes males. In (A) and (C), the periods of citywide (dark grey) and focal (light grey) PPF dissemination are highlighted on the x-axes. https://doi.org/10.1371/journal.pmed.1002213.g005 Table 3. 
Estimated effects of mosquito-disseminated pyriproxyfen on juvenile mosquito catch: results from generalized linear mixed models. https://doi.org/10.1371/journal.pmed.1002213.t003 Mean reductions in adult Aedes emergence relative to baseline, as estimated by a GLMM (Table 4), were 96.0% (95% CI 95.2%–96.8%) during citywide and 96.8% (95% CI 95.8%–97.6%) during focal PPF dissemination, with emergence rising back to baseline values after dissemination ended (17.3% mean reduction relative to baseline but with a 95% CI ranging from a 31.6% decrease to a 2.0% increase). For Aedes females, GLMM estimates suggest emergence reductions of 95.6% (95% CI 94.6%–96.5%) during citywide and 95.1% (95% CI 93.4%–96.3%) during focal dissemination, with emergence still 22.1% (95% CI 5.8%–34.9%) lower in the post-intervention period than at baseline (Table 4; Fig 5C and 5D). Results were similar for Aedes males, with a maximum estimated reduction in emergence of 98.0% (95% CI 97.1%–98.7%) in the focal dissemination period (Table 4; Fig 5D). Table 4. Estimated effects of mosquito-disseminated pyriproxyfen on adult Aedes emergence: results from generalized linear mixed models. https://doi.org/10.1371/journal.pmed.1002213.t004 All the above intervention effect estimates were fully consistent with those derived from models in which we used temperature instead of rainfall to provide adjustment for weather conditions; as expected, our GLMMs overall suggest moderate positive effects of rainfall and weaker negative effects of maximum temperature on mosquito population metrics (see S2 and S3 Tables). Deterministic Modeling Monthly values of m, an estimate of the mean number of Aedes females emerging per person, fell from 1.2 (median 1.2; range 0.7–1.7) before PPF dissemination to 0.06 during both citywide (median 0.04, range 0.01–0.13) and focal (median 0.06, range 0.002–0.13) PPF dissemination. 
Fig 6 shows R0 values as a function of monthly m ratios in five scenarios ranging from optimistic to worst case (see Table 1). Recall that R0 measures the number of new infections arising from a primary case [20–22], so an infection can persist in a host population only if R0 > 1.0; recall also that our models assume a naïve human population with no immunity against the pathogen. Fig 6 shows that, across scenarios, the reduction of monthly m ratio seen during PPF dissemination is predicted to consistently bring R0 to <1.0, whereas baseline m values (first 12 mo) predict R0 values typically between 2.0 (optimistic scenario; range 1.15–2.75) and 5.5 (fair/realistic scenario; 3.18–7.63). Even in the worst-case scenario, with baseline R0 = 32 (range 19–45), the intervention would bring R0 to <1.0 for 4 and 9 mo assuming, respectively, daily vector death rates of μ = 0.1 and μ = 0.3 (see [34–37]) (Fig 6). A reanalysis using three times as many emerging Aedes females as observed predicts R0 < 1.0 for 1 mo (worst-case scenario, with baseline R0 from 56 to 133) to 8 mo (optimistic scenario, with baseline R0 ranging from 3 to 8) (see S3 Fig). Fig 6. Monthly estimates of the basic reproductive number (R0) of mosquito-borne viruses similar to dengue, Zika, or chikungunya. We considered scenarios ranging from optimistic to very adverse (see parameter values for each scenario in Table 1); the grey line corresponds to the worst-case scenario but with a higher value of the mean daily female mosquito death rate (μ = 0.3 instead of 0.1) to approximate data from wild Ae. aegypti populations (see [34–37]). The pink dotted line shows empirical monthly values of the number of Aedes females per person (parameter m) in our study setting and period. The periods of citywide (dark grey) and focal (light grey) PPF dissemination are highlighted on the x-axis. 
https://doi.org/10.1371/journal.pmed.1002213.g006 Discussion In this study we have shown that a sharp citywide decrease in mosquito vector populations followed the application of a low-technology tactic based on mosquito-disseminated PPF in a tropical town. Population declines were observed for Aedes and Culex spp., two foremost vectors of human disease, and for Limatus spp. The 95%–96% reduction in Aedes female emergence we report has the potential of blocking arbovirus transmission under scenarios ranging from somewhat optimistic to overtly adverse. The control of urban Culex spp. could have similar effects on the spread of important pathogens ranging from West Nile virus to lymphatic filariae; Culex spp. might in addition transmit Zika virus [38], although this is yet to be confirmed. Suppression of urban Limatus populations has less clear public health implications, but several bunyaviruses capable of infecting mammals have been isolated from mosquitoes of this day-biting genus in Amazonia [39]. Our findings suggest that mosquito-disseminated PPF could be particularly relevant for the control of epidemic outbreaks such as those seen when Zika, dengue, or chikungunya virus sweeps through immunologically naïve populations. Given partial herd immunity, the more stable endemic-epidemic transmission of, for example, dengue in many countries [40] would be even easier to interrupt. We note, in addition, that some of the parameter values used in our calculations (Table 1) probably exceed typical real values; in fact, our baseline R0 estimates are higher than those reported for dengue epidemics in Brazil [34]. Daily death rates of Aedes females, for example, have been estimated as μ ≈ 0.2–0.4 in Brazil and Puerto Rico [35–37]. If, moreover, PPF reduces the lifespan of female Aedes as it does with Anopheles gambiae [41], this would further increase μ. 
Using μ = 0.3 instead of 0.1, our models suggest that R0 would be brought to <1.0 for 9 mo under the worst-case scenario—and for 6 mo even assuming three times as many emerging females as observed (Figs 6 and S3). Further, Aedes infectivity (parameter b in Table 1) is probably lower, on average, than we assumed in our calculations (see, e.g., [42,43] for Zika virus). Thus, in general, our models likely underestimate the potential intervention effects on R0. This suggests that mosquito-disseminated PPF might block arbovirus transmission citywide even under very adverse circumstances—an entirely susceptible population, long-lasting viremias, frequent mosquito biting, short extrinsic incubation periods, and high probabilities of virus transmission from vector to human and vice versa (Table 1; Figs 6 and S3). The findings we report come, however, with several important caveats. The most obvious is that ours was a before–after, single-site trial lacking independent replicates, which limits the strength of the evidence we present. To partly mitigate this limitation, we estimated intervention effects using detailed 12-mo baseline data, and accounted for dwelling-level repeated measures, weather conditions, and unequal sampling effort using GLMMs. The size, sign, and consistency of effect estimates—which fully align with expectations based on mosquito biology and previous reports of PPF effects [26,27,29,44]—reinforce our confidence in the outcome of our analyses. In addition, that our findings are consistent with results from a neighborhood in a different city [29] suggests that PPF dissemination effects might be replicable elsewhere. Control results might however depend on the local availability of alternative breeding sites, which we did not measure. The impact of the intervention could thus be reduced in places where competing larval habitats are widespread enough to distract egg-laying females away from DSs. 
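The threshold argument for R0 can be made concrete with a small sketch. It assumes, as in Ross-Macdonald-type models, that R0 scales linearly with the female-per-person ratio m when all other parameters are held fixed. That proportionality is an assumption made here for illustration (the model itself is specified in the Methods), although it is consistent with the baseline ranges quoted in the Results. The function names are hypothetical, and the inputs are the summary values reported in the Results (baseline m = 1.2; typical baseline R0 of 2.0, 5.5, and 32).

```python
# Sketch only: assumes R0 is proportional to m (Aedes females per person) with
# all other transmission parameters fixed, as in Ross-Macdonald-type models.

BASELINE_M = 1.2  # mean baseline females per person, as reported in the Results

# Typical baseline R0 per scenario, as quoted in the Results
BASELINE_R0 = {"optimistic": 2.0, "fair/realistic": 5.5, "worst-case": 32.0}


def scaled_r0(baseline_r0: float, m: float, baseline_m: float = BASELINE_M) -> float:
    """R0 expected at ratio m under the linear-scaling assumption."""
    return baseline_r0 * m / baseline_m


def critical_m(baseline_r0: float, baseline_m: float = BASELINE_M) -> float:
    """Value of m at which R0 equals 1 under the same assumption."""
    return baseline_m / baseline_r0


if __name__ == "__main__":
    for name, r0 in BASELINE_R0.items():
        during = scaled_r0(r0, 0.06)  # 0.06 = mean m during PPF dissemination
        print(f"{name}: R0 at m=0.06 is {during:.2f}; "
              f"R0 < 1 requires m < {critical_m(r0):.3f}")
```

Under this assumption, the mean m of 0.06 brings R0 well below 1.0 in the optimistic and fair/realistic scenarios but not in the worst case, where only the lowest monthly m values (below roughly 0.04) would do so. This matches the qualitative pattern in the Results, where the worst-case scenario reaches R0 < 1.0 in only some months.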
A second key limitation is that our SBS data are difficult to interpret in terms of true adult mosquito density or abundance [45]. While this should not greatly affect our estimates of relative change in juvenile catch and adult emergence, it calls for a cautious reading of our R0 results. Even if our complementary analyses using three times as many emerging females as observed are reassuring, we regard our R0 estimates as explicit, plausible hypotheses to be tested in future trials—ideally, cluster-randomized trials with replicate intervention and control sites and including a blind prospective assessment of arboviral infection incidence [46]. Finally, we note that we did not have the means to measure PPF in our SBSs, and therefore lack direct evidence of PPF dissemination. However, we did not record any extreme weather event or vector control intervention that could account for our observations. Further, alternative GLMMs investigating dissemination intensity/quality (coded as a 1-mo-lagged 0–4 variable; see Methods) suggested “dose-dependent” effects—with, for instance, a 57.1% (95% CI 54.3%–59.8%) reduction in monthly adult Aedes emergence for each unit increase in dissemination intensity/quality (see S4 Fig). In sum, we are confident that the striking changes in mosquito demographics we report were real and were a direct consequence of our intervention. The caveats discussed above call, however, for a cautious interpretation of our results, particularly regarding virus transmission—which we did not measure empirically. In our trial, local vector control staff deployed and maintained PPF DSs, with the research team providing initial training and nearly continuous supervision—in which we monitored, but did not interfere with, PPF dissemination or DS maintenance. 
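The "dose-dependent" estimate quoted above can be read as a compounding effect. The sketch below assumes a log-link count model in which the 0–4 intensity/quality score enters as a single linear term, so that each unit multiplies expected emergence by the same factor. That model form is an assumption made for illustration only, and the function name is hypothetical.

```python
# Sketch only: assumes each unit of the 0-4 dissemination score multiplies
# expected adult emergence by the same factor (log link, linear score term).

PER_UNIT_REDUCTION = 0.571  # 57.1% reduction per unit; point estimate from the text


def cumulative_reduction(score: int, per_unit: float = PER_UNIT_REDUCTION) -> float:
    """Fractional reduction in emergence at a given score, relative to score 0."""
    if not 0 <= score <= 4:
        raise ValueError("score must be between 0 and 4")
    remaining = (1.0 - per_unit) ** score  # share of emergence left after compounding
    return 1.0 - remaining


if __name__ == "__main__":
    for score in range(5):
        print(f"score {score}: {100 * cumulative_reduction(score):.1f}% reduction")
```

Under these assumptions, a score of 4 (two rounds of supervised dissemination) corresponds to a reduction of roughly 96.6%, close to the approximately 96% reduction estimated for the dissemination periods by the period-based GLMM.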
This hands-off arrangement led to some operational problems, including failure to maintain or deploy some DSs as scheduled (mainly due to lack of fuel for reaching the more distant northwestern city sector) and suboptimal PPF grinding (see S1 Data and S1 Fig). Model comparisons showed, however, that four-intervention-period GLMMs performed much better (as measured by much lower AIC and BIC scores [30,31]) than alternative GLMMs with more detailed descriptions of PPF dissemination dynamics (S1 Table). This suggests that operational problems had little impact on overall intervention effects, and hence that the strategy may work under the constraints of real-life vector control efforts. In practical terms, the most relevant obstacle was that the PPF we used is formulated as coarse sand-like granules that had to be manually ground to talc-like powder; this was time-consuming and yielded dust particles of variable, unknown size. In preparation for larger-scale trials, we are using mechanical micronizers to obtain PPF dust of standardized particle size. One additional asset of our approach is that it may easily be combined with other interventions, traditional or novel, in integrated mosquito control strategies. For example, control agents and community members could focus on treating or destroying large, conspicuous, and accessible breeding sites, while mosquitoes disseminate PPF to the small, cryptic, and inaccessible larval habitats often used by Aedes spp. The community could engage in DS maintenance with support from local health agents; this would empower communities and may enhance acceptability while reducing costs. During outbreaks, indoor insecticide spraying could act synergistically with PPF dissemination to quickly block transmission; in sites with effective early warning systems, spatially targeted interventions could be deployed as soon as the first cases of infection (in humans, vectors, or sentinel hosts) are detected. 
In general, flexible PPF dissemination strategies can be designed to suit particular needs in time and space. For example, focally deploying DSs at high densities could protect people in transmission-prone places such as hospitals, schools, stadiums, markets, churches, cemeteries, hotels, or prisons. Specific dissemination schemes in airports, bus/train stations, or ports (even on ships) might help limit human-mediated Aedes spread. Mosquito-disseminated PPF also holds promise for sites without mosquito-borne disease transmission but where mosquito bites cause skin lesions, allergies, distress, or economic losses (e.g., by affecting tourism). We also note that the 95%–98% reduction in adult Aedes emergence we recorded (Fig 5) could allow PPF dissemination to contribute to strategies based on the release of sterile, transgenic, or Wolbachia-transinfected mosquitoes, which require a high enough ratio of modified to wild mosquitoes [47–50]. The scale of mosquito releases (and associated costs, including those of mass rearing) could be considerably reduced after a pulse of mosquito-disseminated PPF crashes local wild populations. Finally, we stress that combining PPF with products or tactics with different modes of action [46] would help reduce the odds of selecting resistant mosquitoes, a concern we also plan to address by testing larvicides other than PPF in experimental dissemination trials. Here we have shown, in summary, that mosquito-disseminated PPF has the potential to become a major tool for urban mosquito control and, consequently, for the prevention of mosquito-borne diseases. These findings might be equally relevant for rapidly spreading emerging arboviral infections, including Zika and chikungunya, and for better-established endemic pathogens, including dengue, West Nile, and Japanese encephalitis viruses. Cluster-randomized, multi-site controlled trials are now necessary to provide stronger evidence for (or against) these hypotheses [46]. 
We plan to conduct one such trial in the context of the Brazilian dengue control program, which recently recommended considering our approach for inclusion in national guidelines [51]. Based on the present findings, we anticipate that randomized controlled trials will show that mosquito-disseminated PPF can develop into a new, crucial means for improving global public health. Supporting Information S1 Data. Raw data. Mosquito catch and emergence by species, plus intervention and weather covariate values. https://doi.org/10.1371/journal.pmed.1002213.s001 (XLSX) S1 Fig. Pyriproxyfen dissemination in the vicinity of each surveillance dwelling. Each circle is centered on a surveillance dwelling, with circle size proportional to dissemination intensity/quality: two rounds of supervised dissemination (largest circles, value 4); one round of supervised dissemination and one round of unsupervised dissemination (value 3); one round of supervised dissemination (value 2); one round of unsupervised dissemination (value 1); or no dissemination (smallest circles, value 0). Dissemination was scheduled to be citywide in March–July 2015 and focal in August–October 2015. Note that dissemination failures mainly affected the northwestern sector of the town (the most distant from vector control headquarters), where three consecutive dissemination cycles (in May–June 2015) were not completed. https://doi.org/10.1371/journal.pmed.1002213.s002 (PDF) S2 Fig. Monthly number of Ae. aegypti juveniles caught in each dwelling. Boxplots show the 10th, 25th, 50th, 75th, and 90th quantiles; note the four outliers in the last 2 mo of monitoring. https://doi.org/10.1371/journal.pmed.1002213.s003 (PDF) S3 Fig. Monthly estimates of the basic reproductive number (R0) of mosquito-borne viruses similar to dengue, Zika, or chikungunya. 
We considered scenarios ranging from optimistic to very adverse (see parameter values for each scenario in Table 1) and used three times as many emerging females as observed in our study (i.e., 3m instead of m; pink dotted line); the grey line corresponds to the worst-case scenario but with a higher value of the mean daily female mosquito death rate (μ = 0.3 instead of 0.1) to approximate data from wild Ae. aegypti populations (see [34–37]). https://doi.org/10.1371/journal.pmed.1002213.s004 (PDF) S4 Fig. Reduction in adult Aedes emergence (with 95% confidence intervals) as a function of pyriproxyfen dissemination intensity/quality (measured as a 0–4 score). Predictions from a generalized linear mixed model adjusting for monthly rainfall, the number of operational sentinel breeding sites, and dwelling-level clustering. https://doi.org/10.1371/journal.pmed.1002213.s005 (PDF) S1 Table. The full set of generalized linear mixed models used in each analysis. Model structure and relative model performance, as measured through Akaike and Bayesian information criteria, are provided. https://doi.org/10.1371/journal.pmed.1002213.s006 (XLSX) S2 Table. Juvenile mosquito catch: results of generalized linear mixed models with either rainfall or temperature as the weather covariate. Parameter estimates, standard errors, and values of the Akaike and Bayesian information criteria are provided. https://doi.org/10.1371/journal.pmed.1002213.s007 (PDF) S3 Table. Adult Aedes emergence: results of generalized linear mixed models with either rainfall or temperature as the weather covariate. Parameter estimates, standard errors, and values of the Akaike and Bayesian information criteria are provided. https://doi.org/10.1371/journal.pmed.1002213.s008 (PDF) S1 Text. Original plan for statistical modeling. https://doi.org/10.1371/journal.pmed.1002213.s009 (PDF) Acknowledgments We thank R. 
Mota for field and laboratory assistance, and the Manacapuru municipal vector control staff and Manacapuru Health Department for their crucial help during the trial. We also thank the Fundação de Vigilância em Saúde do Amazonas of the Amazonas State Health Department. This paper is contribution number 27 of the Research Program on Infectious Disease Ecology in the Amazon (RP-IDEA) of the Instituto Leônidas e Maria Deane–Fundação Oswaldo Cruz.
Population Pharmacokinetic Properties of Piperaquine in Falciparum Malaria: An Individual Participant Data Meta-Analysis doi: 10.1371/journal.pmed.1002212 pmid: 28072872
Background Artemisinin-based combination therapies (ACTs) are the mainstay of the current treatment of uncomplicated Plasmodium falciparum malaria, but ACT resistance is spreading across Southeast Asia. Dihydroartemisinin-piperaquine is one of the five ACTs currently recommended by the World Health Organization. Previous studies suggest that young children (<5 y) with malaria are under-dosed. This study utilised a population-based pharmacokinetic approach to optimise the antimalarial treatment regimen for piperaquine. Methods and Findings Published pharmacokinetic studies on piperaquine were identified through a systematic literature review of articles published between 1 January 1960 and 15 February 2013. Individual plasma piperaquine concentration–time data from 11 clinical studies (8,776 samples from 728 individuals) in adults and children with uncomplicated malaria and healthy volunteers were collated and standardised by the WorldWide Antimalarial Resistance Network. Data were pooled and analysed using nonlinear mixed-effects modelling. Piperaquine pharmacokinetics were described successfully by a three-compartment disposition model with flexible absorption. Body weight influenced clearance and volume parameters significantly, resulting in lower piperaquine exposures in small children (<25 kg) compared to larger children and adults (≥25 kg) after administration of the manufacturers’ currently recommended dose regimens. Simulated median (interquartile range) day 7 plasma concentration was 29.4 (19.3–44.3) ng/ml in small children compared to 38.1 (25.8–56.3) ng/ml in larger children and adults, with the recommended dose regimen. The final model identified a mean (95% confidence interval) increase of 23.7% (15.8%–32.5%) in piperaquine bioavailability between each piperaquine dose occasion. The model also described an enzyme maturation function in very young children, resulting in 50% maturation at 0.575 (0.413–0.711) y of age. 
An evidence-based optimised dose regimen was constructed that would provide piperaquine exposures across all ages comparable to the exposure currently seen in a typical adult with standard treatment, without exceeding the concentration range observed with the manufacturers’ recommended regimen. Limited data were available in infants and pregnant women with malaria as well as in healthy individuals. Conclusions The derived population pharmacokinetic model was used to develop a revised dose regimen of dihydroartemisinin-piperaquine that is expected to provide equivalent piperaquine exposures safely in all patients, including in small children with malaria. Use of this dose regimen is expected to prolong the useful therapeutic life of dihydroartemisinin-piperaquine by increasing cure rates and thereby slowing resistance development. This work was part of the evidence that informed the World Health Organization technical guidelines development group in the development of the recently published treatment guidelines (2015). Why Was This Study Done? Despite expansion of malaria prevention and treatment in the last decade, malaria still kills around 1,200 people each day, mostly children below the age of five. Reduced drug exposure to one of the commonly used antimalarial drug combinations, dihydroartemisinin-piperaquine, has been reported in small children. It is crucial to develop an optimised dose regimen that achieves similar drug exposure in all patient groups, in order to give all patients an equal chance of cure. What Did the Researchers Do and Find? Drug concentration–time data (8,776 piperaquine concentration measurements) from 728 individuals in 11 separate clinical trials were collated and pooled for an individual participant data meta-analysis. A pharmacokinetic model was developed to describe the pharmacological properties of piperaquine, the expected variability between patients, and the influence of biologically important covariates. 
Small children had a substantially lower piperaquine exposure after recommended dosing regimens. The developed pharmacokinetic model was used to derive a new optimised dose regimen. What Do These Findings Mean? The proposed improved dose regimen of dihydroartemisinin-piperaquine is expected to provide equivalent piperaquine exposures safely in all patients, including in small children with malaria. An optimised dose regimen should prolong the useful therapeutic life of dihydroartemisinin-piperaquine by increasing cure rates and thereby slowing resistance development. Background Malaria currently causes an estimated 1,200 deaths each day [1]. Most malaria-related deaths occur in Africa in children under the age of 5 y. In endemic areas, young children lack sufficient acquired immunity and are more likely to develop severe forms of the disease. Artemisinin-based combination therapy (ACT) is the recommended first-line treatment for uncomplicated Plasmodium falciparum malaria. The 3-d fixed-dose combination of dihydroartemisinin and piperaquine is one of five ACTs currently recommended by the World Health Organization (WHO) [2]. The rapidly eliminated dihydroartemisinin component has a very potent antimalarial effect and eliminates the majority of the parasite biomass during the first 3 d of treatment [3]. The partner drug, piperaquine, is a slowly eliminated antimalarial that kills the residual parasites that remain after two asexual life cycles of exposure to dihydroartemisinin, thereby preventing recrudescent malaria. Piperaquine also prevents reinfections for approximately 1 mo after treatment [4–11]. The principal determinant of the therapeutic response of a slowly eliminated antimalarial drug is the duration for which the plasma (and thus free drug) level exceeds the minimum inhibitory concentration, which is reflected by the area under the plasma concentration–time curve, or its surrogate, the day 7 level [12]. 
Although there are several producers of dihydroartemisinin-piperaquine, three main manufacturers are producing and distributing dihydroartemisinin-piperaquine in endemic countries: Sigma-Tau Pharmaceuticals produces Eurartesim, registered with the European Medicines Agency in 2012; Guilin Pharmaceutical produces D-Artepp; and Beijing Holley-Cotec Pharmaceuticals produces Duo-Cotexin. Sigma-Tau’s recommendation is a target daily dosage of 18 mg piperaquine phosphate per kilogram body weight across all age groups, with a practical weight-based dosing schedule provided [13]. Beijing Holley-Cotec provides two weight-based dosing schedules, one for children, with a target daily dosage of 16 mg/kg, and one for adults [14,15]. Both manufacturers’ dosage recommendations are based on evidence from the early stages of piperaquine development before there was extensive information on the pharmacokinetic properties of piperaquine in young children (<5 y of age) and before resistance to artemisinins was established. Artemisinin resistance results in lower fractional reductions in parasite numbers per asexual cycle, leaving a larger residual biomass of parasites for the partner drug to remove. This increases the probability of recrudescence and drives the spread of resistance. First artemisinin, and now piperaquine, resistance has emerged in Cambodia [16–18]. Elsewhere the dihydroartemisinin-piperaquine combination has shown excellent efficacy and tolerability, although young children treated with dihydroartemisinin-piperaquine have a 3-fold greater risk of recrudescent malaria compared with older children and adults [19–21]. Piperaquine is highly bound to plasma proteins (>98%), with a very large volume of distribution (>100 l/kg), a low hepatic elimination clearance (<1.4 l/h/kg), and a consequently long terminal plasma elimination half-life (estimates range from 18 to 28 d) [22–28].
The pharmacokinetic properties of piperaquine are affected by body weight, pregnancy, and age [24,25,27,29,30]. A large quantity of co-administered fat enhances absorption significantly, particularly in healthy volunteers, whereas a small amount of fat does not [25,31,32]. Previous reports on the pharmacokinetic properties of piperaquine in children are conflicting [22–24,33]. The larger studies indicated that small children have an inadequate plasma exposure to piperaquine after standard dosing, which led to a proposed increased dose regimen of dihydroartemisinin-piperaquine in order to achieve adequate exposures in small children [24,30]. Pharmacokinetic studies are often small and so have limited power to detect important covariates. Provided the assay performances are comparable, pooling of individual participant data from several studies increases the power to determine covariates with higher precision and accuracy. Nonlinear mixed-effects modelling for pharmacokinetic meta-analyses permits a unifying structural, covariate, and statistical model to be developed [34]. Even with the best of tools, however, the heterogeneity of study designs and assay methods makes these analyses challenging. The WorldWide Antimalarial Resistance Network (WWARN) is a unique data sharing platform providing scientists and clinical investigators with an opportunity to share their data, knowledge, and experience. The aim of this study was to use pooled individual participant pharmacokinetic data from WWARN to characterise the pharmacokinetic properties of piperaquine, with a special focus on small children. Stochastic simulations from the final model were used to develop an evidence-based optimised dose regimen based on maximum concentration (as a measurement of toxicity) and piperaquine concentration at day 7 (as a measurement of efficacy) [35]. 
Methods Ethical Approval Participating investigators agreed to the WWARN terms of submission [36], which ensure that all data uploaded are anonymized and obtained with informed consent, and in accordance with any laws and ethics committee approvals applicable in the country of origin. Ethics committee approval for the pooled analysis of individual participant data was granted by the Oxford Tropical Research Ethics Committee. Clinical Studies All published pharmacology studies reported in PubMed, Google Scholar, Embase, ClinicalTrials.gov, or conference proceedings were identified through a systematic literature review of articles published between 1 January 1960 and 15 February 2013 according to PRISMA guidelines (Fig 1); the PRISMA checklist can be found in S1 PRISMA IPD Checklist. Principal investigators were invited to contribute individual patient data to the WWARN repository as part of a study group conducting a collaborative pooled analysis provided that their studies met the following criteria: (i) prospective dihydroartemisinin-piperaquine study in patients with uncomplicated P. falciparum infection or in healthy volunteers and (ii) validated measure of capillary and/or venous plasma piperaquine concentrations available. All data were uploaded to the WWARN repository and standardised using a methodology described in the WWARN clinical and pharmacology data management and statistical analysis plans [37,38]. Study reports generated from the formatted datasets were sent back to investigators for clarification and/or validation. Pharmacokinetic data from ten previously published clinical studies and one unpublished (at the time) clinical study were contributed and used for modelling [22,24,25,27,29,30,39–43]. Demographic data from each study are summarised in Table 1. Study protocols for the studies were available in the original publication or on request from the data contributor. 
Individual-patient-level data are available through WWARN (http://www.wwarn.org). Requests for access will be reviewed by a Data Access Committee to ensure that use of data is within the terms of consent and ethics approval. WWARN is registered with the Registry of Research Data Repositories (http://re3data.org). Fig 1. Flowchart of the literature search. https://doi.org/10.1371/journal.pmed.1002212.g001 Table 1. Demographic data of study population for the pooled analysis. https://doi.org/10.1371/journal.pmed.1002212.t001 Pharmacokinetic Analysis Plasma piperaquine concentrations (base form), transformed into their natural logarithms, were analysed using nonlinear mixed-effects modelling implemented in NONMEM v7.3 (ICON Development Solutions) with the first-order conditional estimation method [45,46]. Perl-Speaks-NONMEM 3.5.3, R v2.14.2 (R Foundation for Statistical Computing) with the Xpose package v4.3.5, and Piraña v2.6.0 were used for diagnostics and automation throughout the modelling process [47–49]. Piperaquine was administered as piperaquine phosphate, which was converted to piperaquine base with a scale factor of 57.7%. Two-, three-, and four-compartment disposition models were evaluated with first-order absorption. The best performing model was used to evaluate the most appropriate absorption model. First-order absorption with and without lag time and a more flexible transit absorption model with a fixed number (1–8) of transit compartments were investigated. Inter-individual variability was added exponentially to all parameters according to Eq 1: $P_i = \theta_p \cdot \exp(\eta_{i,p})$ (1) where $P_i$ is the individual parameter estimate for the ith individual, $\theta_p$ is the population value of the investigated parameter, and $\eta_{i,p}$ is the individual deviation from the population parameter value for the ith individual.
The $\eta$ is drawn from a normal distribution with mean zero and variance $\omega^2$ (diagonal correlation matrix). The bioavailability was fixed to unity for the population, though inter-individual variability of this parameter was allowed. Inter-occasion variability was evaluated on absorption parameters (i.e., bioavailability and mean transit time) to allow variability in rate and amount of piperaquine absorption between dosing occasions: $P_{i,j} = \theta_p \cdot \exp(\eta_{i,p} + \kappa_{j,p})$ (2) where $P_{i,j}$ is the individual estimate of the investigated parameter at the jth dosing occasion for individual i, and $\kappa_{j,p}$ is the deviation from the population parameter value for the jth dose. The $\kappa$ is drawn from a normal distribution with mean zero and variance $\Pi^2$. The unknown variability in concentration was described by an additive error on the individually predicted logarithmic concentrations (i.e., equivalent to an exponential error on non-transformed concentrations). Body weight was evaluated by adding it as an allometric function to all clearance (power of 0.75) and volume of distribution (power of 1) parameters before any other covariates were investigated, but an attempt was also made to estimate these exponents. The maturation process of enzyme-dependent biotransformation pathways in infants and its effect on elimination clearance in children below the age of 5 y was evaluated according to Eq 3: $CL_i = \theta_{CL} \cdot \frac{AGE_i^{Hill}}{MF_{50}^{Hill} + AGE_i^{Hill}}$ (3) where $CL_i$ is the individual clearance parameter estimate for the ith individual, $\theta_{CL}$ is the population value of the elimination clearance parameter, $AGE_i$ is the individual’s age, $MF_{50}$ is the age that results in 50% maturation, and $Hill$ is the Hill coefficient describing the slope of the maturation process. Disease effect and gender (i.e., sex) were evaluated as proportional categorical covariates in a subset of data to avoid false positive/negative relationships resulting from correlated covariates. Disease effect was evaluated on all parameters in a dataset with only adult (>18 y of age) male patients and healthy volunteers.
The effect of gender was investigated on all parameters in malaria-infected non-pregnant adults. Dosing occasion as a categorical covariate for absorption parameters was investigated using all the available data. Total daily dose per body weight was also investigated as a linear covariate for absorption parameters. Substantial systematic differences in matched venous and capillary plasma piperaquine concentrations have been reported in a previous clinical study [50]. A proportional scaling factor between venous and capillary concentrations was therefore estimated to allow fitting of all data simultaneously: $C_{CAP} = C_{VEN} \cdot \theta_S$ (4) where $C_{CAP}$ is the individually predicted capillary concentration, $C_{VEN}$ is the individually predicted venous concentration, and $\theta_S$ is the population scale parameter between the two biological matrices. Model discrimination was based on the objective function value (OFV), which is proportional to −2 times the log-likelihood of the data. A reduction in OFV of 3.84 and 10.8 was considered significant at p = 0.05 and p = 0.001, respectively, for a nested model with one degree of freedom difference. All covariates, except the allometric function of weight, were analysed in a step-wise manner with a forward selection step (p = 0.05) and a stricter backward elimination step (p = 0.001). Model diagnostics and predictive performance were evaluated by goodness-of-fit plots and simulation-based diagnostics (i.e., non-corrected, prediction-corrected, and variability-corrected visual predictive checks), respectively [51]. Parameter precision was investigated by generating 1,000 resampled datasets, stratified by clinical study, in a bootstrap approach. Parameter shrinkages were calculated [52] to determine the reliability of diagnostic plots.
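The parameter model described in this section can be illustrated with a minimal Python sketch. It assumes the usual exponential form for inter-individual and inter-occasion variability, a sigmoid (Hill-type) maturation function of age, and a simple proportional venous-to-capillary scaling. The function names, the population clearance, the variability magnitude, and the Hill coefficient are hypothetical placeholders, not estimates from this analysis; only the age at 50% maturation (0.575 y) is taken from the reported results.

```python
import math
import random


def individual_parameter(theta: float, eta: float, kappa: float = 0.0) -> float:
    """Exponential variability model: theta * exp(eta + kappa).

    eta   -- individual deviation (inter-individual variability)
    kappa -- occasion-specific deviation (inter-occasion variability); 0 if unused
    """
    return theta * math.exp(eta + kappa)


def maturation_fraction(age_y: float, mf50_y: float, hill: float) -> float:
    """Sigmoid maturation: fraction of mature clearance reached at a given age."""
    return age_y ** hill / (mf50_y ** hill + age_y ** hill)


def capillary_from_venous(c_ven: float, theta_s: float) -> float:
    """Assumed proportional scaling between venous and capillary concentrations."""
    return c_ven * theta_s


# Hypothetical values for illustration only.
THETA_CL = 55.0   # population clearance (L/h), placeholder
OMEGA_CL = 0.3    # standard deviation of eta on clearance, placeholder
MF50_Y = 0.575    # age at 50% maturation (y), as reported in this analysis
HILL = 5.0        # Hill coefficient, placeholder

rng = random.Random(1)
for age in (0.25, 0.575, 2.0):
    eta = rng.gauss(0.0, OMEGA_CL)  # eta ~ N(0, omega^2)
    cl = individual_parameter(THETA_CL, eta) * maturation_fraction(age, MF50_Y, HILL)
    print(f"age {age:5.3f} y: CL = {cl:6.2f} L/h")
```

Because the deviations enter through an exponential, every sampled individual parameter stays positive, which is the practical reason this form is commonly chosen for clearances and volumes.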
Dose Optimisation Stochastic simulations of the final mixed-effects model were performed to evaluate the exposure to piperaquine after (i) the dose regimen recommended by Sigma-Tau, (ii) the dose regimen recommended by Beijing Holley-Cotec, and (iii) a putative optimised dose regimen, as presented in Table 2 [24]. A total of 1,000 malaria-infected non-pregnant patients were simulated per kilogram of body weight (range: 5 to 100 kg) and dose regimen. As a simple surrogate of total exposure, predicted day 7 venous plasma piperaquine concentrations were compared between simulations, while the maximum concentrations were compared as an indicator of possible acute toxicity. Table 2. Evaluated dose regimens for piperaquine simulations. https://doi.org/10.1371/journal.pmed.1002212.t002
Results Clinical Studies A total of 11 different clinical studies were shared with WWARN, containing 8,776 plasma piperaquine concentrations from 728 individuals that could be included in the pooled analysis (Fig 1) [22,24,25,27,29,30,39–43]. Demographic data are presented in Table 1. Pharmacokinetic Properties of Piperaquine Of the 8,776 samples included, 141 concentrations (1.74%) were measured to be below the lower limit of quantification.
These were the only measurements omitted from the analysis. A three-compartment disposition model proved superior to a two-compartment disposition model (p < 0.001). There was no further improvement from an additional fourth compartment (p > 0.05). The absorption phase was described successfully by a transit compartment model with two transit compartments (kA and kTR were assumed equal). This model proved superior to all other tested absorption models (ΔOFV = −215). The addition of inter-individual variability in relative bioavailability improved the model fit significantly (p < 0.001). The final structural model is presented in Fig 2. Inter-individual variability was retained on all parameters (except inter-compartment clearance), and inter-occasion variability was significantly associated with relative bioavailability (p < 0.001) and mean transit time (p < 0.001). Fig 2. A graphical overview of the final piperaquine population pharmacokinetic model. kTR is the absorption transit rate constant. CL is the elimination clearance. VC is the volume of distribution of the central compartment. VP1 and VP2 are the volumes of distribution of the first and second peripheral compartments, respectively. Q1 and Q2 are the inter-compartment clearances for the first and second peripheral compartments, respectively. F is the relative oral bioavailability. https://doi.org/10.1371/journal.pmed.1002212.g002 Body weight as a fixed allometric function on all clearance and volume of distribution parameters improved the fit of the model significantly (ΔOFV = −419). Estimating the exponent on elimination clearance did not result in a significant drop in OFV (p > 0.001) compared to using a fixed exponent, and the estimated exponent was similar to the fixed value.
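As a rough illustration of the structural model just described, the sketch below integrates a three-compartment disposition model fed by a dose depot and two transit compartments, with allometric weight scaling (power 0.75 on clearances, power 1 on volumes). Several details are assumptions made for the example and are not taken from this analysis: the exact layout of the absorption chain, the relation kTR = (n + 1)/MTT, the 70 kg reference weight, the fixed-step Runge-Kutta integrator, and all numerical parameter values. An extra "eliminated" state is tracked only so that mass balance can be checked.

```python
from dataclasses import dataclass


@dataclass
class PKParams:
    cl: float    # elimination clearance (L/h)
    vc: float    # central volume of distribution (L)
    q1: float    # inter-compartment clearance, first peripheral (L/h)
    vp1: float   # first peripheral volume (L)
    q2: float    # inter-compartment clearance, second peripheral (L/h)
    vp2: float   # second peripheral volume (L)
    mtt: float   # mean absorption transit time (h)
    f: float = 1.0  # relative bioavailability


def scale_to_weight(ref: PKParams, wt_kg: float, ref_wt_kg: float = 70.0) -> PKParams:
    """Allometric scaling: power 0.75 on clearances, power 1 on volumes."""
    r = wt_kg / ref_wt_kg
    return PKParams(ref.cl * r ** 0.75, ref.vc * r, ref.q1 * r ** 0.75,
                    ref.vp1 * r, ref.q2 * r ** 0.75, ref.vp2 * r, ref.mtt, ref.f)


def derivatives(a, p):
    """Amounts: [depot, transit1, transit2, central, periph1, periph2, eliminated]."""
    depot, t1, t2, c, p1, p2, _ = a
    ktr = 3.0 / p.mtt  # assumed (n + 1) / MTT with n = 2 transit compartments
    to_p1 = p.q1 * (c / p.vc - p1 / p.vp1)
    to_p2 = p.q2 * (c / p.vc - p2 / p.vp2)
    elim = p.cl * c / p.vc
    return [-ktr * depot, ktr * (depot - t1), ktr * (t1 - t2),
            ktr * t2 - elim - to_p1 - to_p2, to_p1, to_p2, elim]


def rk4_step(a, p, h):
    """One classical fourth-order Runge-Kutta step of size h (hours)."""
    k1 = derivatives(a, p)
    k2 = derivatives([x + 0.5 * h * k for x, k in zip(a, k1)], p)
    k3 = derivatives([x + 0.5 * h * k for x, k in zip(a, k2)], p)
    k4 = derivatives([x + h * k for x, k in zip(a, k3)], p)
    return [x + h / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
            for x, d1, d2, d3, d4 in zip(a, k1, k2, k3, k4)]


def simulate(p, dose_mg, dose_times_h=(0.0, 24.0, 48.0), t_end_h=168.0, h=0.05):
    """Return [(time_h, central concentration in mg/L)] and the final state."""
    state = [0.0] * 7
    dose_steps = {round(t / h) for t in dose_times_h}
    n_steps = round(t_end_h / h)
    profile = []
    for i in range(n_steps + 1):
        if i in dose_steps:
            state[0] += p.f * dose_mg  # bioavailable dose enters the depot
        profile.append((i * h, state[3] / p.vc))
        if i < n_steps:
            state = rk4_step(state, p, h)
    return profile, state


# Illustrative reference values only; these are not the estimates from this analysis.
REF = PKParams(cl=55.0, vc=2900.0, q1=320.0, vp1=4800.0, q2=130.0, vp2=26000.0, mtt=2.0)

child = scale_to_weight(REF, wt_kg=10.0)
dose_base_mg = 18.0 * 10.0 * 0.577  # 18 mg/kg phosphate, 10 kg child, 57.7% base
profile, final_state = simulate(child, dose_base_mg)
print(f"day 7 concentration: {profile[-1][1] * 1000.0:.1f} ng/mL")  # mg/L -> ng/mL
```

The same routine can be looped over body weights and sampled parameter sets to mimic a stochastic simulation, although a real analysis would use the estimated parameters and their variability rather than these placeholders.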
A categorical disease effect had a significant impact on elimination clearance, mean transit time, and central volume of distribution in the forward selection step (p < 0.05) and could be retained on mean transit time and elimination clearance in the backward elimination step (p < 0.001). However, during antimalarial treatment the patient usually recovers from the disease, so a disease effect on pharmacokinetics would be present only during the early assessments (day 1–3). Attempts to model a time-dependent disease effect failed, probably because of the low number of healthy volunteers (n = 50; 6.87%), of whom only 14 were dosed more than once. Thus, this covariate effect was not retained in the final model. Similarly, an exploratory analysis showed that disease could also have an effect on relative bioavailability, resulting in an increase in the exposure to piperaquine in healthy volunteers compared to patients. However, the disease effect was not retained in the final analysis, again because of the low number of healthy individuals. This needs to be addressed in future studies. A 24% increase (p < 0.001) in relative bioavailability was observed between dose occasions, whereas the total daily milligram/kilogram dosage did not influence absorption. This finding is likely to be related to the recovery from malaria illness (and also increasing food intake) during the 3 d of treatment [25,27], as the systematic dose-occasion effect was estimated as close to zero in healthy volunteers. However, this covariate effect was not separated for patients and healthy volunteers in the final model due to the small number of healthy volunteers who had been dosed more than once (n = 14). Gender was found to affect the mean transit time significantly among malaria-infected non-pregnant adults, but this covariate relationship could not be retained in the more stringent backward step. 
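The forward and backward selection thresholds used above can be related to changes in the objective function value (OFV). The sketch below assumes that the OFV is proportional to −2 × log-likelihood, that the two models are nested, and that the covariate adds exactly one parameter, so that the change in OFV is approximately chi-square distributed with 1 degree of freedom. The function names are hypothetical, and the 1-degree-of-freedom assumption will not hold for every covariate relationship tested in the study.

```python
import math


def lrt_p_value(ofv_change):
    """p-value of a likelihood ratio test with 1 degree of freedom (assumed).

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(abs(ofv_change) / 2.0))


def passes_forward(ofv_change, alpha=0.05):
    """Forward selection: keep the covariate if the OFV drop is significant."""
    return lrt_p_value(ofv_change) < alpha


def passes_backward(ofv_change, alpha=0.001):
    """Backward elimination: keep the covariate only at the stricter level."""
    return lrt_p_value(ofv_change) < alpha


# An OFV change of 5.0 (arbitrary example value) clears the forward step
# but not the stricter backward step.
example = (passes_forward(5.0), passes_backward(5.0))
```

Under these assumptions the forward threshold of p = 0.05 corresponds to an OFV change of about 3.84, and the backward threshold of p = 0.001 to about 10.83, which is why a covariate can enter in the forward step and still be dropped in the backward step.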
Inclusion of a maturation factor produced only a minor improvement in model fit (ΔOFV = −3.29) among children below 5 y of age. However, inclusion of this covariate resulted in an estimated enzymatic maturation that reflected the biological maturation of infants’ piperaquine biotransformation pathways; therefore, this factor was included in the final model. Parameter estimates were reliable, with small relative standard errors (Table 3). Predicted secondary pharmacokinetic parameters (i.e., elimination half-life, maximum concentration, time to maximum concentration, total exposure, and day 7 plasma concentration) obtained from the final model are presented in Table 3. Calculated η shrinkages were high due to the sparseness of data in some of the individual studies, but the calculated ε shrinkage was low (14.6%). Goodness-of-fit diagnostics and the prediction-corrected visual predictive check (n = 2,000) demonstrated that the model described the observed data well (Figs 3 and 4). The visual predictive check showed a small model misspecification for capillary versus venous plasma concentrations, but overall the diagnostics supported using the developed model for stochastic simulations and dose optimisations.

Fig 3. Basic goodness-of-fit plots for the final piperaquine model. Observed plasma piperaquine concentrations (from 11 clinical studies) plotted against population predicted concentrations (A) and against individual predicted concentrations (B). Conditional weighted residuals plotted against population prediction (C) and time (D). The solid line is the identity line, and the dashed line is the locally weighted least squares regression line.
https://doi.org/10.1371/journal.pmed.1002212.g003

Fig 4. Prediction-corrected visual predictive check of the final population pharmacokinetic model of piperaquine.
Based on 2,000 stochastic simulations. The inset shows the first 25 h after dosing. Open circles represent the observations, and solid lines represent the 5th, 50th, and 95th percentiles of the observed data. The shaded areas represent the 95% confidence intervals around the simulated 5th, 50th, and 95th percentiles.
https://doi.org/10.1371/journal.pmed.1002212.g004

Table 3. Final parameter estimates describing piperaquine population pharmacokinetics.
https://doi.org/10.1371/journal.pmed.1002212.t003

Dose Optimisation

The derived pharmacokinetic model predicted lower plasma piperaquine exposures in small children (5–24 kg) and adults with body weight between 60 and 75 kg (or between 60 and 100 kg if following the dose recommendation from Beijing Holley-Cotec) compared to other adult patients after dihydroartemisinin-piperaquine administration following the manufacturers’ recommended dose regimens. Simulated median (interquartile range) day 7 plasma concentration was 29.4 (19.3–44.3) ng/ml in small children (<25 kg) compared to 38.1 (25.8–56.3) ng/ml in larger children and adults (≥25 kg), with Sigma-Tau’s recommended dose regimen. The revised dose regimen proposed here is expected to achieve predicted exposures at all body weights comparable to those currently seen in a typical adult after appropriate standard treatment (Table 2; Fig 5). Simulated maximum plasma piperaquine concentrations after manufacturers’ dosing and the new optimised dose regimen are presented in Fig 5. The minimum and maximum piperaquine dosages of the three evaluated dose regimens are summarised in Table 2. It is important to note that the predicted maximum plasma piperaquine concentrations with the optimised regimen (upper 75th percentile of approximately 600 ng/ml) are not higher than those observed with the manufacturers’ regimens.

Fig 5.
Stochastic simulations of dose regimens. Maximum plasma piperaquine concentration (A–C) and day 7 plasma piperaquine concentration (D–F) after Sigma-Tau’s recommended dosing (left panels), Beijing Holley-Cotec’s recommended dosing (middle panels), and the revised dose regimen (right panels). Circles represent the median values, and vertical lines represent the 25th to 75th percentiles of simulated concentrations. The dashed line indicates the previously defined venous plasma piperaquine day 7 concentration threshold value for therapeutic success of 30 ng/ml [35].
https://doi.org/10.1371/journal.pmed.1002212.g005

Discussion

This study developed and validated a population pharmacokinetic meta-model to describe the pharmacological properties of piperaquine and the influence of demographic and clinical covariates.
Body weight influenced clearance and volume parameters significantly, resulting in lower piperaquine exposures in small children compared to larger children and adults after administration of the manufacturers’ currently recommended dose regimens. The final model was used to develop a revised dose regimen of dihydroartemisinin-piperaquine that is expected to provide therapeutic piperaquine exposures safely in all patients, including in small children with malaria. This could improve the treatment of malaria in small children. Dihydroartemisinin-piperaquine is an excellent fixed-dose ACT that has shown consistently good efficacy in patients with uncomplicated P. falciparum malaria infection [4–8], but cumulative evidence shows that dosing could be improved [19]. Under-dosing increases the risk of treatment failure and therefore drives the development of resistance. Piperaquine resistance developed in China, where piperaquine monotherapy was used for prevention and treatment between 1978 and 1994, and has emerged again recently in Cambodia on a background of artemisinin resistance [53–55]. To maximise its therapeutic lifespan, it is essential to optimise dihydroartemisinin-piperaquine recommended dose regimens, which would lower the risk of treatment failure and reduce the selective pressure for the development of resistance. The present study used a pooled pharmacokinetic modelling approach to evaluate dihydroartemisinin-piperaquine treatment and presents an optimised dose regimen that should improve the treatment of uncomplicated P. falciparum malaria. A large WWARN pooled analysis of the clinical efficacy of dihydroartemisinin-piperaquine in 7,072 patients enrolled in 26 studies highlighted a significant risk of recrudescent malaria in young children (1–5 y of age) [19]. Younger children typically have the highest treatment failure rates in meta-analyses of antimalarial treatments [19,21,56,57], usually attributed to relative lack of immunity [20]. 
But lower drug exposure is another possible contributor. The previously published pooled analysis demonstrated that the milligram/kilogram piperaquine dosage administered was a significant predictor of treatment failure, with the risk of recrudescence by day 42 increasing by 13% (95% CI 5.0%–21%) for every 5-mg/kg decrease in dosage [19]. Previous clinical studies as well as pharmacokinetic-pharmacodynamic modelling show that children achieved a lower plasma exposure to piperaquine, compared to adults, after equivalent weight-based (mg/kg) piperaquine dosage [24,30,35]. Taken together, these findings suggest the need to increase dose regimens of dihydroartemisinin-piperaquine for small children. Several studies have investigated the pharmacokinetic properties of piperaquine [6,10,23–25,27,29,30,32,35,40,58–60]. However, the present study is, to our knowledge, the largest analysis to date, incorporating data from 8,776 pharmacokinetic samples from 728 individuals from different target populations and continents. Our results can be used to inform evidence-based optimised treatment in vulnerable young children with malaria, who account for an estimated 78% of malaria-related deaths [1].

Pharmacokinetics of Piperaquine

A three-compartment disposition model with a transit compartment absorption model was found to describe the pooled data adequately, which is in accordance with other recent studies [24,25,27,29,30]. Inter-occasion (within-individual) variability had a significant impact on both relative bioavailability and the mean transit time of the absorption, indicating a large degree of variability not just among patients but also between dose occasions for a specific patient. The final model showed satisfactory goodness-of-fit diagnostics and high precision in parameter estimates. Estimated parameters were also comparable to those found in previous studies [22,24,25,27,29].
The visual predictive check showed a small model misspecification for the capillary versus venous plasma data, probably because capillary measurements were performed only in studies of children with malaria, whereas the overall model is based on a more diverse dataset. The model simulation therefore included more variability compared to the observed data. In agreement with previous studies [25,27], piperaquine absorption in patients with malaria could be characterised further with a categorical covariate that increased the relative bioavailability by 24% with each subsequent dose within a single course of treatment. This increase in relative bioavailability may result from the improved gastrointestinal function and food intake that accompanies clinical recovery. However, a disease effect on bioavailability could not be evaluated given the few healthy volunteers (n = 14) who received more than one dose. Most drug measurement data in children were based on capillary samples, compared to venous plasma sampling in adult patients. A constant scale factor to convert between capillary and venous plasma piperaquine measurements was therefore required to allow simultaneous modelling of all individual patient data. The estimated capillary piperaquine concentrations were 106% higher than venous piperaquine concentrations, which is a greater difference than that reported previously [58]. However, this conversion factor resulted in a good description of both venous and capillary observations. An additional advantage of a simultaneous approach is that the model can be used to predict drug exposures from any sampling technique and therefore enables literature comparisons. The exact mechanism underlying this matrix-dependent difference cannot be elucidated from the data pooled in this analysis and needs further evaluation. Metabolic enzymes mature during the first years of life in a way that cannot be explained by an allometric function of body weight [61]. 
Most hepatic enzymes reach 70% to 100% maturation during the first 12 mo of life [62]. Sambol et al. investigated age as a nonlinear covariate (in addition to weight) and identified that, both covariates taken together, 6-mo-old infants have approximately half the clearance of 2-y-old children [30]. To investigate this further, the present study assessed age as a maturation function on piperaquine elimination clearance. The inclusion of a maturation function improved the model fit. Although the improvement was not statistically significant, the factor was kept in the final model to reflect the known changes in biotransformation pathways that occur as the infant grows. More information is needed on piperaquine disposition in the first 2 y of life. A malaria disease effect was investigated but not retained in the final model given the few healthy volunteers who received more than one dose (n = 14). However, the preliminary results suggest that healthy patients have higher maximum concentrations and higher exposures to piperaquine. This is an important caveat to the revised dosing recommendations—they apply to the treatment of malaria. More information is needed on the pharmacokinetic properties of 3-d dihydroartemisinin-piperaquine regimens in healthy individuals. This is particularly important as dihydroartemisinin-piperaquine is also being used in mass treatment campaigns where most of the recipients are healthy. Pregnancy was not evaluated in this analysis, since only 4.9% (n = 36) of the patients in the study population were pregnant. However, pregnancy is expected to have a limited influence on the pharmacokinetic model presented here. Separate pharmacokinetic evaluations of the two studies that included pregnant women concluded that there were no differences in total piperaquine exposures between pregnant and non-pregnant patients [27,29]. Young children in areas of high malaria transmission are at increased risk of developing life-threatening severe malaria. 
Indeed, most of the deaths from malaria occur in African children. Partially protective immunity that also boosts parasite elimination after treatment only develops after repeated infections. It is important that children receive effective treatments at the doses needed to provide adequate exposure to all antimalarial drug components, but particularly the artemisinin partner in the case of an ACT. Sub-therapeutic exposure also increases the risk of drug resistance. The Sigma-Tau-recommended piperaquine dosage targets a daily dosage of 18 mg/kg, with a range between 13 and 27 mg/kg at different body weights (Table 2), while the dosage recommended by Beijing Holley-Cotec ranges from 9.6 to 32 mg/kg. A constant weight-based (mg/kg) dosage target would be appropriate only if there were a linear relationship between drug exposure and body weight, but physiological processes do not scale linearly with body weight [30,61,63]. Body weight has been identified as an important predictor of piperaquine exposure in previous studies [22,24,25,29], as confirmed in this study. Consequently, children achieve a lower drug exposure compared to adults after a standard target daily dosage of 18 mg/kg, which increases their risk of treatment failure and could shorten the useful therapeutic life of dihydroartemisinin-piperaquine [19]. This has potentially important consequences for therapeutic outcome and the development of drug resistance. It likely contributes to the >3-fold higher risk of recrudescence observed in children aged 1–5 y (hazard ratio 3.71; 95% CI 1.66–8.26; p = 0.002) compared to patients >12 y of age [19]. Consequently, in order to achieve exposure similar to that of adults, small children need higher doses of piperaquine than currently recommended by the manufacturers. 
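The argument above, that a constant mg/kg target under-doses small children when clearance does not scale linearly with body weight, can be made explicit with a short derivation. This is a sketch under stated assumptions rather than the fitted model: it assumes linear kinetics, the same relative bioavailability F in all patients, no maturation effect, and the conventional allometric exponent of 0.75 on clearance (this section says only that the exponent was fixed). The symbols d (dose per kilogram) and the 50 kg reference weight are introduced here purely for illustration.

```latex
\begin{align}
CL_i &= CL_{\mathrm{ref}} \left( \frac{WT_i}{WT_{\mathrm{ref}}} \right)^{0.75},
\qquad D_i = d \cdot WT_i \\
AUC_i &= \frac{F \cdot D_i}{CL_i}
       = \frac{F \cdot d \cdot WT_{\mathrm{ref}}}{CL_{\mathrm{ref}}}
         \left( \frac{WT_i}{WT_{\mathrm{ref}}} \right)^{0.25} \\
\frac{AUC_i}{AUC_{\mathrm{ref}}} &= \left( \frac{WT_i}{WT_{\mathrm{ref}}} \right)^{0.25},
\qquad \text{e.g. } \left( \frac{10}{50} \right)^{0.25} \approx 0.67
\end{align}
```

Under these assumptions a 10 kg child given the same mg/kg dosage as a 50 kg adult would reach roughly two-thirds of the adult total exposure, and would need about 1.5 times the mg/kg dosage to match it.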
Dose Optimisation

The pooled analysis presented here is, to our knowledge, the largest pharmacokinetic analysis of piperaquine to date and incorporates data collected in phase III clinical trials and post-marketing studies in a large variety of populations, resulting in a greater power to identify important pharmacokinetic differences between key target populations. The final model showed adequate predictive performance, demonstrating its suitability for dose optimisation simulations. The final pharmacokinetic model was therefore used to simulate piperaquine exposures and maximum piperaquine concentrations at different body weights using the dose regimens recommended by the manufacturers (Sigma-Tau and Beijing Holley-Cotec) and was used to develop an evidence-based improved dose regimen for small children. Our simulations (Fig 5) show that both small children and adults with body weights between 60 and 75 kg for the Sigma-Tau–recommended dose regimen (or between 60 and 100 kg for the Beijing Holley-Cotec–recommended dose regimen) achieve lower plasma piperaquine exposures than typical adult patients (35–65 kg) with acute falciparum malaria after standard dosing (Table 2). The revised dosing scheme (Table 2) is predicted to achieve equivalent plasma piperaquine exposures in all patient groups, including small children and larger adults, without risking higher maximum piperaquine concentrations (Fig 5). A previous pooled efficacy analysis of dihydroartemisinin-piperaquine treatment [19] showed that a total minimum piperaquine dosage of 59 mg/kg would result in successful treatment in 95% of small children. Our revised dose scheme proposes a total minimum dosage of 64 mg/kg for children weighing 5–15 kg (Table 2), compared to the minimum dosage of 40 mg/kg recommended by the manufacturers. This adjustment would ensure similar plasma piperaquine exposure across all weight groups and, most importantly, would improve the treatment of small children.
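The simulation design described above (a virtual population at each kilogram of body weight, summarised by median and interquartile range) can be sketched in a few lines. This is a deliberately simplified stand-in, not the study's simulation: it uses total exposure (AUC = F × dose / CL) in place of the day 7 and maximum concentrations reported in the paper, assumes an allometric exponent of 0.75 on clearance, and uses hypothetical log-normal variability terms (`omega_cl`, `omega_f`) and a hypothetical reference clearance. Only the 18 mg/kg daily target, the 5 to 100 kg range, and the 1,000 patients per weight come from the text.

```python
import math
import random
import statistics


def simulate_exposure(weight_kg, dose_mg_per_kg, n_patients=1000, seed=2017,
                      cl_ref=50.0, ref_weight_kg=50.0,
                      omega_cl=0.3, omega_f=0.3):
    """Return (median, (q25, q75)) of simulated AUC in mg*h/L at one body weight."""
    rng = random.Random(seed)
    # Typical clearance scaled allometrically (exponent 0.75 is an assumption).
    typical_cl = cl_ref * (weight_kg / ref_weight_kg) ** 0.75
    aucs = []
    for _ in range(n_patients):
        # Log-normal inter-individual variability on clearance and bioavailability.
        cl = typical_cl * math.exp(rng.gauss(0.0, omega_cl))
        f = math.exp(rng.gauss(0.0, omega_f))
        aucs.append(f * dose_mg_per_kg * weight_kg / cl)
    q25, q50, q75 = statistics.quantiles(aucs, n=4)
    return q50, (q25, q75)


# One virtual population per kilogram of body weight, flat 18 mg/kg dosing.
# The same seed is reused so that weights differ only through the covariate;
# pass different seeds for independent virtual populations.
summary = {w: simulate_exposure(w, dose_mg_per_kg=18.0) for w in range(5, 101)}
```

Replacing the flat `dose_mg_per_kg` with a weight-banded regimen would show how exposure varies within and between bands, which is the comparison Fig 5 makes for the three regimens.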
Most ACTs have a target artesunate dosage of 4 mg/kg/d according to WHO guidelines for the treatment of malaria. This corresponds to a dihydroartemisinin dosage of 2.96 mg/kg/d (based on molar equivalents). However, the current manufacturer-recommended dihydroartemisinin dosage ranges between 1.62 and 3.33 mg/kg/d for Sigma-Tau, and between 1.20 and 4.00 mg/kg/d for Beijing Holley-Cotec, giving the lowest dosage of the artemisinin component of all WHO-recommended ACTs. As dihydroartemisinin-piperaquine is a fixed-dose combination, the increased piperaquine doses recommended by this analysis will also increase the dihydroartemisinin dosage, while remaining within the 2–10 mg/kg target range recommended by WHO. Increasing both dihydroartemisinin and piperaquine concentrations should contribute to a more effective therapy by reducing the residual parasite biomass remaining on day 3, and reduce the risk of recrudescence and reinfection, thereby potentially prolonging the useful therapeutic life of dihydroartemisinin-piperaquine. Piperaquine prolongs ventricular repolarisation, and this is reflected in electrocardiographic QT prolongation. Manning et al. [64] stopped a recent study in healthy volunteers given a 50% higher than recommended daily dose of piperaquine for 2 d because four volunteers had potentially unsafe QT prolongation (>500 ms). However, that study used the automated electrocardiograph reading, most likely resulting in a reported QU interval instead of the correct manual QT reading, which would be substantially shorter. Furthermore, increasing the piperaquine dose by 50% in these healthy volunteers resulted in very high maximum plasma piperaquine concentrations, with a mean value of 1,750 ng/ml [64], compared to a predicted median maximum concentration of 310 ng/ml following the suggested optimised dose regimen (Fig 5).
Small children receiving higher body-weight-based daily doses of piperaquine, after dose adjustment, are not expected to achieve higher maximum piperaquine concentrations than a typical non-pregnant adult patient given the manufacturer’s recommended dose regimen (Fig 5). Thus, maximum plasma piperaquine concentrations after optimised dosing are not expected to increase the risk of cardiac adverse events. However, the safety and efficacy of this suggested revised dosing will need to be evaluated prospectively. Such prospective studies could also assess whether more pragmatic dose regimens with fewer body weight bands could be achieved safely. In the context of antimalarial drug registration for uncomplicated P. falciparum malaria, patients recruited for phase II or phase III trials usually exclude important sub-populations such as infants, pregnant women, and patients with co-morbidities (e.g., malnutrition, co-infections). Thus, these sub-populations are unlikely to be represented in sufficient numbers to draw a conclusion on their optimal dosing at the time of the initial registration of the drug with a medicines regulatory authority. Pooled analyses of individual patient data accumulated in the post-marketing phase are needed to allow dose optimisation for such vulnerable target population groups, as these are the populations that carry the highest malaria morbidity and mortality rates. However, limited data were available in infants (≤1 y of age), pregnant women with malaria, patients with co-morbidities, and healthy individuals for this pooled meta-analysis. Thus, the pharmacological properties of piperaquine could not be assessed reliably in these groups within the present analysis. Prospective pharmacological studies are urgently needed to address potential differences in these sub-groups. Other study limitations are the lack of general safety data, and the lack of data from large monotherapy piperaquine trials performed in China between 1978 and 1994. 
In conclusion, suboptimal plasma piperaquine exposures in small children given the current manufacturers’ recommended dose regimens were confirmed in this pooled pharmacokinetic analysis. In addition, low exposure in adults with body weights of 60–100 kg (depending on dose regimen) was also detected. Pharmacokinetic analysis was used to derive an optimised antimalarial dose regimen. It is essential that currently used antimalarial treatments are optimised in the post-registration phase so that all patient groups achieve similar drug exposures, and thus an equal chance of being cured. This optimisation would also reduce the selective pressure for the development of resistance, thus slowing the development of drug resistance and prolonging the useful therapeutic life of dihydroartemisinin-piperaquine. It is essential that currently available antimalarials remain effective until novel treatments can be produced to overcome artemisinin resistance. This evidence-based improved dose regimen has been adopted by WHO in their recently published guidelines for the treatment of malaria [2].
Indeed, most of the deaths from malaria occur in African children. Partially protective immunity that also boosts parasite elimination after treatment only develops after repeated infections. It is important that children receive effective treatments at the doses needed to provide adequate exposure to all antimalarial drug components, but particularly the artemisinin partner in the case of an ACT. Sub-therapeutic exposure also increases the risk of drug resistance. The Sigma-Tau-recommended piperaquine dosage targets a daily dosage of 18 mg/kg, with a range between 13 and 27 mg/kg at different body weights (Table 2), while the dosage recommended by Beijing Holley-Cotec ranges from 9.6 to 32 mg/kg. A constant weight-based (mg/kg) dosage target would be appropriate only if there were a linear relationship between drug exposure and body weight, but physiological processes do not scale linearly with body weight [30,61,63]. Body weight has been identified as an important predictor of piperaquine exposure in previous studies [22,24,25,29], as confirmed in this study. Consequently, children achieve a lower drug exposure compared to adults after a standard target daily dosage of 18 mg/kg, which increases their risk of treatment failure and could shorten the useful therapeutic life of dihydroartemisinin-piperaquine [19]. This has potentially important consequences for therapeutic outcome and the development of drug resistance. It likely contributes to the >3-fold higher risk of recrudescence observed in children aged 1–5 y (hazard ratio 3.71; 95% CI 1.66–8.26; p = 0.002) compared to patients >12 y of age [19]. Consequently, in order to achieve exposure similar to that of adults, small children need higher doses of piperaquine than currently recommended by the manufacturers. 
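The argument above, that a constant mg/kg target under-exposes small children, can be illustrated with a minimal Python sketch. It assumes that clearance scales allometrically with body weight to the power 0.75 and that exposure is dose divided by clearance. The exponent, the 50 kg reference weight, and the function names are illustrative assumptions, not parameters reported in this analysis.

```python
# Illustrative sketch only. The allometric exponent (0.75) and the 50 kg
# reference weight are conventional assumptions, not values from this study.

ALLOMETRIC_EXPONENT = 0.75   # assumed
REFERENCE_WEIGHT_KG = 50.0   # assumed reference adult weight


def relative_clearance(weight_kg: float) -> float:
    """Clearance relative to the reference adult under allometric scaling."""
    return (weight_kg / REFERENCE_WEIGHT_KG) ** ALLOMETRIC_EXPONENT


def relative_exposure(weight_kg: float, dose_mg_per_kg: float = 18.0) -> float:
    """Exposure (dose / clearance) relative to the reference adult
    when both receive the same mg/kg dosage."""
    dose_mg = dose_mg_per_kg * weight_kg
    reference_dose_mg = dose_mg_per_kg * REFERENCE_WEIGHT_KG
    return (dose_mg / relative_clearance(weight_kg)) / reference_dose_mg


def mg_per_kg_for_equal_exposure(weight_kg: float,
                                 adult_dose_mg_per_kg: float = 18.0) -> float:
    """mg/kg dosage that would match the reference adult's exposure."""
    return adult_dose_mg_per_kg / relative_exposure(weight_kg, adult_dose_mg_per_kg)


if __name__ == "__main__":
    for weight in (5, 10, 20, 50, 80):
        print(weight,
              round(relative_exposure(weight), 2),
              round(mg_per_kg_for_equal_exposure(weight), 1))
```

Under these assumptions, relative exposure at a fixed mg/kg dosage reduces to (weight / reference weight) raised to the power 0.25, so it falls as body weight falls. The pooled model described in this article is considerably richer (three compartments, a maturation function, dose-occasion effects), so the sketch shows only the direction of the effect.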
Dose Optimisation The pooled analysis presented here is, to our knowledge, the largest pharmacokinetic analysis of piperaquine to date and incorporates data collected in phase III clinical trials and post-marketing studies in a large variety of populations, resulting in a greater power to identify important pharmacokinetic differences between key target populations. The final model showed adequate predictive performance, demonstrating its suitability for dose optimisation simulations. The final pharmacokinetic model was therefore used to simulate piperaquine exposures and maximum piperaquine concentrations at different body weights using the dose regimens recommended by the manufacturers (Sigma-Tau and Beijing Holley-Cotec) and was used to develop an evidence-based improved dose regimen for small children. Our simulations (Fig 5) show that both small children and adults with body weights between 60 and 75 kg for the Sigma-Tau–recommended dose regimen (or between 60 and 100 kg for the Beijing Holley-Cotec–recommended dose regimen) achieve lower plasma piperaquine exposures than typical adult patients (35–65 kg) with acute falciparum malaria after standard dosing (Table 2). The revised dosing scheme (Table 2) is predicted to achieve equivalent plasma piperaquine exposures in all patient groups, including small children and larger adults, without risking higher maximum piperaquine concentrations (Fig 5). A previously pooled efficacy analysis of dihydroartemisinin-piperaquine treatment [19] showed that a total minimum piperaquine dosage of 59 mg/kg would result in successful treatment in 95% of small children. Our revised dose scheme proposes a total minimum dosage of 64 mg/kg for children weighing 5–15 kg (Table 2), compared to the minimum dosage of 40 mg/kg recommended by the manufacturers. This adjustment would ensure similar plasma piperaquine exposure across all weight groups and, most importantly, would improve the treatment of small children. 
Most ACTs have a target artesunate dosage of 4 mg/kg/d according to WHO guidelines for the treatment of malaria. This corresponds to a dihydroartemisinin dosage of 2.96 mg/kg/d (based on molar equivalents). However, the current manufacturer-recommended dihydroartemisinin dosage ranges between 1.62 and 3.33 mg/kg/d for Sigma-Tau, and between 1.20 and 4.00 mg/kg/d for Beijing Holley-Cotec, giving the lowest dosage of the artemisinin component of all WHO-recommended ACTs. As dihydroartemisinin-piperaquine is a fixed-dose combination, the increased piperaquine doses recommended by this analysis will also increase the dihydroartemisinin dosage, while remaining within the 2–10 mg/kg target range recommended by WHO. Increasing both dihydroartemisinin and piperaquine concentrations should contribute to a more effective therapy by reducing the residual parasite biomass remaining on day 3, and reduce the risk of recrudescence and reinfection, thereby potentially prolonging the useful therapeutic life of dihydroartemisinin-piperaquine. Piperaquine prolongs ventricular repolarisation, and this is reflected in electrocardiographic QT prolongation. Manning et al. [64] stopped a recent study in healthy volunteers given a 50% higher than recommended daily dose of piperaquine for 2 d because four volunteers had potentially unsafe QT prolongation (>500 ms). However, that study used the automated electrocardiograph reading, most likely resulting in a reported QU interval instead of the correct manual QT reading, which would be substantially shorter. Furthermore, increasing the piperaquine dose by 50% in these healthy volunteers resulted in very high maximum plasma piperaquine concentrations, with a mean value of 1,750 ng/ml [64], compared to a predicted median maximum concentration of 310 ng/ml following the suggested optimised dose regimen (Fig 5).
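The molar-equivalent conversion quoted in this paragraph (artesunate 4 mg/kg/d corresponding to dihydroartemisinin 2.96 mg/kg/d) can be checked with a short Python sketch. The molar masses below are standard values for artesunate (C19H28O8) and dihydroartemisinin (C15H24O5) and are supplied here as an assumption, since the text does not state them; the function name is illustrative.

```python
# Molar masses in g/mol; standard values, not taken from the article text.
ARTESUNATE_G_PER_MOL = 384.42          # C19H28O8
DIHYDROARTEMISININ_G_PER_MOL = 284.35  # C15H24O5


def artesunate_to_dha_dose(artesunate_mg_per_kg: float) -> float:
    """Convert an artesunate dosage to the equimolar dihydroartemisinin dosage."""
    return artesunate_mg_per_kg * DIHYDROARTEMISININ_G_PER_MOL / ARTESUNATE_G_PER_MOL


if __name__ == "__main__":
    # WHO target artesunate dosage quoted in the text: 4 mg/kg/d
    print(round(artesunate_to_dha_dose(4.0), 2))  # 2.96
```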
Supporting Information S1 PRISMA IPD Checklist. PRISMA checklist. https://doi.org/10.1371/journal.pmed.1002212.s001 (DOCX) Acknowledgments We sincerely thank all participants for their significant contribution to these difficult and important studies, and the different clinical study teams. The opinions expressed are those of the authors and do not necessarily reflect those of the Australian Defence Organisation or any extant policy.
Priority-Setting for Novel Drug Regimens to Treat Tuberculosis: An Epidemiologic Model doi: 10.1371/journal.pmed.1002202 pmid: 28045934
Background Novel drug regimens are needed for tuberculosis (TB) treatment. New regimens aim to improve on characteristics such as duration, efficacy, and safety profile, but no single regimen is likely to be ideal in all respects. By linking these regimen characteristics to a novel regimen’s ability to reduce TB incidence and mortality, we sought to prioritize regimen characteristics from a population-level perspective. Methods and Findings We developed a dynamic transmission model of multi-strain TB epidemics in hypothetical populations reflective of the epidemiological situations in India (primary analysis), South Africa, the Philippines, and Brazil. We modeled the introduction of various novel rifampicin-susceptible (RS) or rifampicin-resistant (RR) TB regimens that differed on six characteristics, identified in consultation with a team of global experts: (1) efficacy, (2) duration, (3) ease of adherence, (4) medical contraindications, (5) barrier to resistance, and (6) baseline prevalence of resistance to the novel regimen. We compared scale-up of these regimens to a baseline reflective of continued standard of care. For our primary analysis situated in India, our model generated baseline TB incidence and mortality of 157 (95% uncertainty range [UR]: 113–187) and 16 (95% UR: 9–23) per 100,000 per year at the time of novel regimen introduction and RR TB incidence and mortality of 6 (95% UR: 4–10) and 0.6 (95% UR: 0.3–1.1) per 100,000 per year. An optimal RS TB regimen was projected to reduce 10-y TB incidence and mortality in the India-like scenario by 12% (95% UR: 6%–20%) and 11% (95% UR: 6%–20%), respectively, compared to current-care projections. An optimal RR TB regimen reduced RR TB incidence by an estimated 32% (95% UR: 18%–46%) and RR TB mortality by 30% (95% UR: 18%–44%). 
Efficacy was the greatest determinant of impact; compared to a novel regimen meeting all minimal targets only, increasing RS TB treatment efficacy from 94% to 99% reduced TB mortality by 6% (95% UR: 1%–13%, half the impact of a fully optimized regimen), and increasing the efficacy against RR TB from 76% to 94% lowered RR TB mortality by 13% (95% UR: 6%–23%). Reducing treatment duration or improving ease of adherence had smaller but still substantial impact: shortening RS TB treatment duration from 6 to 2 mo lowered TB mortality by 3% (95% UR: 1%–6%), and shortening RR TB treatment from 20 to 6 mo reduced RR TB mortality by 8% (95% UR: 4%–13%), while reducing nonadherence to the corresponding regimens by 50% reduced TB and RR TB mortality by 2% (95% UR: 1%–4%) and 6% (95% UR: 3%–10%), respectively. Limitations include sparse data on key model parameters and necessary simplifications to model structure and outcomes. Conclusions In designing clinical trials of novel TB regimens, investigators should consider that even small changes in treatment efficacy may have considerable impact on TB-related incidence and mortality. Other regimen improvements may still have important benefits for resource allocation and outcomes such as patient quality of life. Why Was This Study Done? Improvements in tuberculosis (TB) treatment are expected to play an important role in reaching the WHO goal of reducing TB deaths by 95%—that is, by more than 1.3 million deaths per year—between 2015 and 2035. Multiple aspects of existing treatment regimens have room for improvement, but it is unclear which types of improvement are most important. This study sought to prioritize different features of TB treatment regimens based on their potential to prevent TB deaths and new TB cases. What Did the Researchers Do and Find? The researchers developed a mathematical model to simulate the introduction of a novel regimen for the treatment of either drug-susceptible or multidrug-resistant (MDR) TB. 
They then compared regimens with different characteristics in terms of their ability to reduce TB deaths and new TB cases. Of the six characteristics modeled, regimen efficacy was the characteristic with the greatest potential to reduce TB cases and deaths, but other characteristics also had important effects (for example, shorter regimens could save health care resources). What Do These Findings Mean? It is important for developers of novel regimens to at least maintain the efficacy of existing regimens, and improvements in efficacy could prevent a large number of deaths. Other improvements, such as shorter duration and increased ease of adherence, may still have important effects by enabling more people with TB to receive appropriate and timely treatment. Introduction The number of available or prospective drugs for treating tuberculosis (TB) is undergoing a long-overdue expansion. Delamanid and bedaquiline, both recently approved for the treatment of multidrug-resistant (MDR) TB [1,2], are the first novel agents registered for TB treatment in decades. Antibiotic classes such as carbapenems [3] and oxazolidinones [4] are also being repurposed to treat highly resistant TB cases. There is hope that later-generation fluoroquinolones [5], rifamycins [6], and newer drug classes [7,8] could shorten first-line treatment for TB (usually six mo), and in 2016 WHO endorsed a regimen that shortens MDR TB treatment to 9–11 mo [9] from a conventional duration of at least 18–20 mo. Despite these advances, however, many characteristics of TB regimens could be further improved, including not only treatment duration but also tolerability [10,11], efficacy [12,13], drug–drug interactions and medical indications [14,15], and the barrier against acquiring drug resistance while on therapy [16,17]. 
The development of improved treatment regimens within the next decade is recognized as a critical component of efforts to achieve the drastic reductions in TB cases and deaths that have been set as targets by the global community [18]. The WHO’s End TB Strategy, adopted by the World Health Assembly in 2015, highlights new drugs and shorter regimens as part of the path to a 95% reduction in global TB deaths by 2035, relative to the estimated 1.4 million that occurred in 2015 [19,20]. The Stop TB Partnership, similarly, names development of “drug regimens (including for drug-resistant TB) that are highly effective, faster-acting and nontoxic” as an essential investment if we are to meet TB elimination goals set forth in the United Nations’ Sustainable Development Goals [21]. In September 2016, WHO released target regimen profiles, describing characteristics desired in future TB regimens [22]. In the pursuit of these improved TB treatment regimens, improving all possible characteristics simultaneously in a single regimen will likely be impossible in the short term [23], leading to inevitable trade-offs. For example, higher cure rates may be difficult to achieve simultaneously with shorter treatment duration, and simpler or better-tolerated regimens may be less robust to emergence of drug resistance. Few tools currently exist to understand specific regimens’ population-level impact or to help prioritize different characteristics from this epidemiologic perspective when constructing and evaluating new regimens. We therefore developed a population-level model of novel regimens for TB, implemented within a representative set of hypothetical TB epidemics, for purposes of systematically understanding the relationships between regimen characteristics and potential population-level impact. 
Methods We created a deterministic compartmental transmission model of a pulmonary TB epidemic in an adult population, similar to prior models with respect to the natural history of TB and HIV [24,25], but incorporating additional structure related to TB treatment and drug-susceptibility phenotypes in order to simultaneously model resistance to rifampicin and to components of novel regimens (Fig 1). Parameters related to novel regimen characteristics (Table 1) were determined through an expert consultation process described below and in S1 Methods. Fig 1. Model structure. The model (panel A) includes infection, rapid or slow progression to active TB, and initiation of treatment with a standard regimen or novel regimen (the transition from Active TB to Treatment, shown in more detail in panels B and C). (Also included in model but not shown in Fig 1: parallel structure for eight different drug resistance phenotypes; parallel structure for HIV infected/uninfected and treatment naïve/experienced; and death/spontaneous resolution.) Six novel drug regimen characteristics were evaluated within this transmission model; improved novel regimen (a) efficacy increases the probability of durable cure. A high barrier to resistance (b) prevents acquisition of resistance to drugs in the novel regimen. Less preexisting resistance to components of the novel regimen (c) and fewer medication contraindications or treatment-limiting toxicities associated with the novel regimen (d) increase the number of patients for whom the novel regimen is prescribed. Shorter regimen duration (e) and greater ease of adherence (f) both increase treatment completion, and shortened duration also reduces the probability of cure after loss to follow-up at any given time point. https://doi.org/10.1371/journal.pmed.1002202.g001 Table 1.
Modeled novel regimen characteristics and target values*. https://doi.org/10.1371/journal.pmed.1002202.t001 TB Natural History A complete description of the model depiction of TB natural history is provided in S2 Methods. Briefly, the risk of TB infection at each point in time reflects the number of active TB cases of each drug-susceptibility phenotype. A fraction of those who become infected (or re-infected) progress rapidly to active disease, while the remainder develop latent infection with a small but persistent hazard of reactivation. Active TB results in transmission as well as additional mortality risk, and HIV modifies multiple aspects of TB natural history. Populations with active TB seek care and receive a TB diagnosis at a defined rate according to treatment history and HIV status. Once diagnosed, most immediately start treatment, while a smaller fraction experience pretreatment loss to follow-up and remain in the active compartment. Nonadherence is modeled as a rate of loss to follow-up each month; the modeled rate is higher than that reported in treatment cohorts, in order to account for documented losses to follow-up as well as estimates of intermittent nonadherence. Treatment Regimens Three treatment regimens are modeled in each analysis: current standard of care for rifampicin-susceptible (RS) TB, standard of care for known rifampicin-resistant (RR) or MDR TB (modeled as lasting 20 mo), and a novel regimen intended for the treatment of either RS TB or RR TB. Treatment regimens are assigned on the basis of drug susceptibility testing and patient eligibility (Fig 1B and 1C), assuming gradual novel regimen scale-up over 3 y. Novel regimens are modeled as consisting of a “companion” component (one or more drugs in current use) and a “novel” component (one or more novel agents to which resistance is negligible at baseline). 
Infections may be susceptible or resistant to each of the companion component, novel component, and rifampicin, for a total of 2³ = 8 modeled drug-susceptibility phenotypes. New resistance may be acquired during use of a regimen containing the element in question. We assume a modest 15%–45% reduction (S1 Table) in transmission fitness for infections resistant to rifampicin and/or the novel component [26,27]. We modeled introduction of a single type of novel regimen (i.e., intended either for RS TB or for RR TB) in each analysis. We assumed linear introduction of novel regimens over 3 y up to a total population coverage of 75% and measured impact at 10 y after initiation. For comparability, we assumed continued gradual scale-up of rifampicin drug-susceptibility testing (DST) through increased use of Xpert MTB/RIF or other molecular assays (as described in S2 Methods) and no other changes in current practice apart from the novel regimen, and we assumed that treatment with the novel regimen was only initiated after performing DST for drugs in the regimen. Treatment Outcomes A fraction of patients treated with a given regimen are assumed to relapse with acquired drug resistance, according to a regimen’s barrier to resistance. Among other patients, the probability of durable cure reflects the fraction of the intended treatment course that is completed, the efficacy of the regimen, and the initial drug susceptibility (section 2.4 in S2 Methods). Efficacy (Table 1 and S1 Table) is defined as the proportion of patients who, in the absence of drug resistance and conditional on completing the full treatment course, experience durable cure. When durable cure is not achieved, the result may be either treatment failure (persistent active disease) or relapse to active disease after a short period of noninfectiousness (modeled as a “pending relapse” state from which relapse occurs at a specified rate).
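As a minimal illustration of two quantities described under Treatment Regimens, the following Python sketch enumerates the eight drug-susceptibility phenotypes and computes novel regimen coverage under linear scale-up to 75% over 3 y. The authors' model was written in R; the function names and data layout here are hypothetical and are not taken from their code.

```python
from itertools import product

# Drug components tracked by the model, as listed in the text.
COMPONENTS = ("rifampicin", "companion", "novel")


def enumerate_phenotypes():
    """Return every susceptible/resistant combination across the three components."""
    return [
        dict(zip(COMPONENTS, states))
        for states in product(("susceptible", "resistant"), repeat=len(COMPONENTS))
    ]


def novel_regimen_coverage(years_since_introduction: float,
                           scale_up_years: float = 3.0,
                           max_coverage: float = 0.75) -> float:
    """Coverage under linear scale-up to max_coverage, constant afterwards."""
    if years_since_introduction <= 0:
        return 0.0
    fraction = min(years_since_introduction / scale_up_years, 1.0)
    return max_coverage * fraction


if __name__ == "__main__":
    print(len(enumerate_phenotypes()))   # 8
    print(novel_regimen_coverage(1.5))   # 0.375
```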
Model Initialization and Calibration We started each model simulation by calibrating to epidemiologic targets based on present-day India (TB prevalence 195/100,000, HIV coprevalence 4% of individuals with TB, and RR TB 2.2% of new TB cases [28]); to explore the impact of novel regimens in epidemiologic settings with a range of TB and HIV burden, alternative analyses were also performed with the model calibrated to epidemiologic targets for Brazil, the Philippines, and South Africa (S2 Table). For each set of calibration targets, we randomly selected sets of model parameter values for a drug-susceptible TB epidemic from the ranges presented in S1 Table using Latin Hypercube Sampling (LHS). We adjusted the TB transmission rate and HIV infection rate in each simulation to achieve the target TB prevalence and HIV coprevalence when the drug-susceptible epidemic was at equilibrium. We then introduced drug resistance to each simulation by randomly sampling (again using LHS) 20 sets of parameters related to rifampicin resistance (S1 Table) for each drug-susceptible simulation—thereby resulting in 20 separate simulations for each drug-susceptible epidemic. After the introduction of rifampicin resistance, we allowed the model to progress for 25 y, reflecting the slow emergence of drug resistance over a prolonged time period prior to the historical introduction of effective second-line therapy. During the final 10 y of each calibration period, we gradually introduced second-line treatment, thereby enabling us to replicate the current situation in which most previously treated RR TB cases and a minority of treatment-naïve RR TB cases are identified and appropriately treated (S1 Table). We then evaluated the prevalence of RR TB among incident TB cases in each simulation at the end of this calibration period, excluding those that differed from our calibration target (2.2% in the primary analysis) by more than a factor of 1.5. 
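The sampling and filtering steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: parameters are drawn uniformly within their ranges (the text does not specify the distributions), the parameter names and ranges in the usage example are hypothetical, and the transmission model itself is omitted. The acceptance rule mirrors the stated criterion of staying within a factor of 1.5 of the calibration target.

```python
import random


def latin_hypercube(ranges, n_samples, rng):
    """Latin Hypercube Sampling over uniform ranges (uniformity is an assumption).

    ranges: dict mapping parameter name -> (low, high).
    Returns a list of n_samples dicts; each parameter has exactly one draw
    in each of n_samples equal-width strata.
    """
    columns = {}
    for name, (low, high) in ranges.items():
        # One random point inside each stratum, then shuffle the strata so
        # that parameters are paired at random across samples.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


def within_factor(value, target, factor=1.5):
    """True if value differs from target by no more than the given factor."""
    return target / factor <= value <= target * factor


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameter names and ranges, for illustration only.
    example_ranges = {"acquisition_risk": (0.001, 0.01), "fitness_cost": (0.15, 0.45)}
    for sample in latin_hypercube(example_ranges, 20, rng)[:3]:
        print(sample)
    print(within_factor(3.0, 2.2), within_factor(3.4, 2.2))
```

In the workflow described in the text, each sampled parameter set would be run through the transmission model, and `within_factor` would then be applied to the simulated RR TB prevalence among incident cases against the 2.2% target.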
The resulting calibrated epidemics were used to model the introduction of novel regimens at the end of this calibration period. To ensure an adequate number of simulations, we doubled the number of simulations until results reached stability. A sensitivity analysis described in S3 Methods considers an alternative Bayesian approach to model calibration in which we weighted all simulations according to a joint Gaussian likelihood function based on WHO estimates of TB incidence, mortality, and RR TB prevalence. Selection of Novel Regimen Characteristics and Their Target Values In consultation with a WHO-appointed group of experts, we selected six characteristics of novel regimens for inclusion in our model of population impact (Table 1). These characteristics were not meant to form an exhaustive list but rather were chosen based on their potential to guide drug development and their ease of conceptualization. Regimen efficacy (which refers to the proportion cured within a specified duration) was distinguished from regimen duration and from ease of adherence (defined per month of regimen duration) because of the different mechanisms by which they impact treatment effectiveness and because of the potential for tradeoffs between these characteristics. (For example, the same drug combination could be used for a shorter course with lower efficacy or for a longer course with higher efficacy, or another drug could be added to enhance efficacy and shorten duration but would reduce ease of adherence.) For each characteristic, we relied on literature review and expert consultation to define a minimum acceptable value for a new regimen, an optimistic target, and an intermediate target (Table 1). S1 Methods contains additional details of the process.
For the characteristic of regimen efficacy, minimal targets for novel RS and RR TB regimens were based on the proportions achieving durable cure, among those who completed treatment, for participants in recent drug-susceptible TB treatment trials [29–31] who received standard treatment, and for patients in a systematic review of observational MDR TB cohorts [13]. Intermediate and optimistic efficacy targets represented consensus about attainable and more ambitious targets, respectively (S1 Methods 1.2.2). Targets for barrier to resistance for an RS TB regimen ranged from minimal resistance to the risk of resistance amplification for patients with isoniazid monoresistant TB treated with the standard regimen. For an RR TB regimen, this barrier ranged from that of the current standard RS TB regimen to that of current RR TB standard of care (section 1.2.3 in S1 Methods). Prevalence of preexisting resistance to the novel regimen was assumed to range from no resistance to the approximate prevalence of isoniazid and fluoroquinolone resistance among RS TB and RR TB patients, respectively (section 1.2.4 in S1 Methods). Regimen duration varied from current standard durations to the most optimistic durations considered plausible within the next decade (section 1.2.5 in S1 Methods). Proportions of patients who could be excluded from novel regimens for reasons other than drug resistance were determined by estimating the prevalence among TB patients of each of multiple possible contraindications (see list in section 1.2.6 in S1 Methods) and considering that a regimen could have zero, one, or multiple such contraindications; sensitivity analyses considered HIV-specific exclusions. 
Finally, the adherence characteristic combined observed rates of loss to follow-up with estimates of intermittent nonadherence (section 1.2.7 in S1 Methods), modeling both processes as a monthly attrition rate in order to fully capture the potential impact of shortened treatment durations on adherence and resulting effectiveness (section 2.4.2 in S2 Methods). Outcome Measures and Reporting Our primary outcome was the reduction in TB mortality (for RS TB regimens) or RR TB mortality (for RR TB regimens) in the India-like setting, 10 y after introduction of a given regimen, relative to a novel regimen meeting only minimal targets and to a novel regimen meeting all optimal targets (Fig 2). Secondary outcomes included reduction in incidence, reduction in total number of patient-months on treatment, reduction in mortality in other epidemiologic settings, and reduction in mortality when regimen improvements enhanced or limited scale-up of the novel regimen (causing an RS TB regimen to reach from 50% to 100% of eligible patients after 3 y and causing an RR TB regimen to expand its reach more quickly through accompanying accelerated scale-up of rifampicin DST). Fig 2. Illustration of resulting mortality trends and comparisons for different novel RS and RR TB regimens. Trajectories illustrate the median impact of novel regimens on the median projections of TB mortality. The impact of variation in each individual characteristic (such as efficacy, illustrated here) was evaluated as a fraction of the total impact of regimen optimization (distance between solid red and green trend lines).
This evaluation was performed by optimizing the characteristic in question with an otherwise minimal baseline (difference between solid and dashed red lines, corresponding to the results shown in Fig 3A and 3C) and then by removing the characteristic from an otherwise optimized novel regimen (difference between solid and dashed green lines, corresponding to Fig 3B and 3D). Scale-up of the novel regimen was assumed to occur over 3 y following regimen introduction, and analyses were performed over the 10 y following the novel regimen’s introduction (including the 3 y of scale-up). https://doi.org/10.1371/journal.pmed.1002202.g002

The model was coded and statistical analyses performed in R version 3.2.3 [32]. Unless otherwise specified, results are presented as the median and 95% uncertainty range (UR) (representing the 0.025 through 0.975 quantiles) over all simulations that met calibration targets.

Sensitivity Analyses

To understand the role of scale-up of a novel regimen, we considered variation in the mortality impact of a novel RS TB regimen as its reach ranged between 50% and 100% of eligible patients. For an RR TB regimen, we evaluated the extent to which its impact increased if its introduction were accompanied by accelerated scale-up of rapid rifampicin susceptibility testing. We evaluated the sensitivity of the relative impact of each particular regimen characteristic, and of the total impact of a fully optimized novel regimen, to each of the model input parameters (S1 Table) by calculating partial rank correlation coefficients (PRCCs). For sensitivity analysis of the impact of HIV-specific exclusions, we modeled a scenario in which the same total fraction of patients was excluded, but those exclusions were concentrated among people living with HIV, as well as an extreme scenario in which all HIV-positive individuals in the South African setting were excluded from the novel regimen.
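As an illustration of the PRCC calculation mentioned above, the following is a minimal Python sketch rather than the authors' R code. It assumes the common definition of a PRCC: rank-transform the inputs and the output, regress out the other inputs by least squares, and correlate the residuals. The parameter names (`param_a`, `param_b`, `param_c`) and the toy outcome are hypothetical and serve only to exercise the function.

```python
import random

def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    n = len(y)
    design = [[1.0] + [cov[i] for cov in covariates] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return [y[i] - sum(beta[a] * design[i][a] for a in range(p)) for i in range(n)]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5

def prcc(inputs, output, target):
    """PRCC between inputs[target] and output, controlling for all other inputs."""
    ranked = {name: ranks(vals) for name, vals in inputs.items()}
    others = [ranked[name] for name in ranked if name != target]
    return pearson(residuals(ranked[target], others), residuals(ranks(output), others))

# Toy data (hypothetical): the outcome rises with param_a, falls with param_b,
# and does not depend on param_c.
rng = random.Random(0)
n = 200
samples = {name: [rng.random() for _ in range(n)] for name in ("param_a", "param_b", "param_c")}
outcome = [2 * samples["param_a"][i] - samples["param_b"][i] + rng.gauss(0, 0.1) for i in range(n)]
prcc_values = {name: prcc(samples, outcome, name) for name in samples}
print(prcc_values)
```

In this toy setup the coefficient for `param_a` should be strongly positive, that for `param_b` negative, and that for `param_c` close to zero, which is the kind of ranking the sensitivity analysis relies on.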
In structural sensitivity analyses, we tested sensitivity to our assumption of homogeneous contact structure by repeating our primary analysis after dividing the modeled population into two groups with 50% higher and 50% lower transmission rates than the base case, partitioning the population between these groups in a ratio that maintained the same overall TB prevalence. We also tested sensitivity of these impacts to our assumption of an underlying RS TB epidemic at equilibrium by instead modeling an epidemic in which TB incidence was decreasing at a rate of 2%–3%/year due to secular declines in the transmission coefficient, probability of progressing rapidly to active disease, latent TB reactivation rate, and TB diagnosis rate (the four parameters to which TB incidence was most sensitive).

TB Natural History

A complete description of the model depiction of TB natural history is provided in S2 Methods. Briefly, the risk of TB infection at each point in time reflects the number of active TB cases of each drug-susceptibility phenotype. A fraction of those who become infected (or re-infected) progress rapidly to active disease, while the remainder develop latent infection with a small but persistent hazard of reactivation. Active TB results in transmission as well as additional mortality risk, and HIV modifies multiple aspects of TB natural history. Populations with active TB seek care and receive a TB diagnosis at a defined rate according to treatment history and HIV status. Once diagnosed, most immediately start treatment, while a smaller fraction experience pretreatment loss to follow-up and remain in the active compartment. Nonadherence is modeled as a rate of loss to follow-up each month; the modeled rate is higher than that reported in treatment cohorts, in order to account for documented losses to follow-up as well as estimates of intermittent nonadherence.
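To make the monthly loss-to-follow-up idea concrete, here is a small Python sketch of how a constant monthly attrition translates into the chance of completing a course of a given length. It assumes a discrete, independent monthly probability of loss, which is a simplification and not necessarily the formulation in S2 Methods. The 3% monthly value is hypothetical; the durations are those discussed in the text.

```python
def completion_probability(monthly_loss, duration_months):
    """Probability of staying on treatment for the whole course, assuming the
    same independent probability of loss to follow-up in every month."""
    return (1.0 - monthly_loss) ** duration_months

monthly_loss = 0.03  # hypothetical monthly attrition, for illustration only

# Current versus shortened durations for RS TB (6 vs 2 mo) and RR TB (20 vs 6 mo).
for label, months in (("RS TB, 6 mo", 6), ("RS TB, 2 mo", 2),
                      ("RR TB, 20 mo", 20), ("RR TB, 6 mo", 6)):
    print(label, round(completion_probability(monthly_loss, months), 3))
```

Under this assumption, the same monthly attrition produces far more incomplete courses for a 20-mo regimen than for a 6-mo one, which is why duration and adherence interact in the model.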
Treatment Regimens

Three treatment regimens are modeled in each analysis: current standard of care for rifampicin-susceptible (RS) TB, standard of care for known rifampicin-resistant (RR) or MDR TB (modeled as lasting 20 mo), and a novel regimen intended for the treatment of either RS TB or RR TB. Treatment regimens are assigned on the basis of drug susceptibility testing and patient eligibility (Fig 1B and 1C), assuming gradual novel regimen scale-up over 3 y. Novel regimens are modeled as consisting of a “companion” component (one or more drugs in current use) and a “novel” component (one or more novel agents to which resistance is negligible at baseline). Infections may be susceptible or resistant to each of the companion component, novel component, and rifampicin, for a total of 2^3 = 8 modeled drug-susceptibility phenotypes. New resistance may be acquired during use of a regimen containing the element in question. We assume a modest 15%–45% reduction (S1 Table) in transmission fitness for infections resistant to rifampicin and/or the novel component [26,27]. We modeled introduction of a single type of novel regimen (i.e., intended either for RS TB or for RR TB) in each analysis. We assumed linear introduction of novel regimens over 3 y up to a total population coverage of 75% and measured impact at 10 y after initiation. For comparability, we assumed continued gradual scale-up of rifampicin drug-susceptibility testing (DST) through increased use of Xpert MTB/RIF or other molecular assays (as described in S2 Methods) and no other changes in current practice apart from the novel regimen, and we assumed that treatment with the novel regimen was only initiated after performing DST for drugs in the regimen.

Treatment Outcomes

A fraction of patients treated with a given regimen are assumed to relapse with acquired drug resistance, according to a regimen’s barrier to resistance.
Among other patients, the probability of durable cure reflects the fraction of the intended treatment course that is completed, the efficacy of the regimen, and the initial drug susceptibility (section 2.4 in S2 Methods). Efficacy (Table 1 and S1 Table) is defined as the proportion of patients who, in the absence of drug resistance and conditional on completing the full treatment course, experience durable cure. When durable cure is not achieved, the result may be either treatment failure (persistent active disease) or relapse to active disease after a short period of noninfectiousness (modeled as a “pending relapse” state from which relapse occurs at a specified rate).

Model Initialization and Calibration

We started each model simulation by calibrating to epidemiologic targets based on present-day India (TB prevalence 195/100,000, HIV coprevalence 4% of individuals with TB, and RR TB 2.2% of new TB cases [28]); to explore the impact of novel regimens in epidemiologic settings with a range of TB and HIV burden, alternative analyses were also performed with the model calibrated to epidemiologic targets for Brazil, the Philippines, and South Africa (S2 Table). For each set of calibration targets, we randomly selected sets of model parameter values for a drug-susceptible TB epidemic from the ranges presented in S1 Table using Latin Hypercube Sampling (LHS). We adjusted the TB transmission rate and HIV infection rate in each simulation to achieve the target TB prevalence and HIV coprevalence when the drug-susceptible epidemic was at equilibrium. We then introduced drug resistance to each simulation by randomly sampling (again using LHS) 20 sets of parameters related to rifampicin resistance (S1 Table) for each drug-susceptible simulation, thereby resulting in 20 separate simulations for each drug-susceptible epidemic.
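For readers unfamiliar with Latin Hypercube Sampling, the sketch below shows the basic idea in Python. It is an illustration under stated assumptions, not the sampling code used in the study. The function name `latin_hypercube` and the parameter `acquired_rr_fraction` with its range are hypothetical; the 15%–45% fitness-reduction range and the choice of 20 parameter sets are taken from the text.

```python
import random

def latin_hypercube(ranges, n_samples, rng):
    """Draw n_samples parameter sets so that, for every parameter, each of
    n_samples equal-width strata of its range is sampled exactly once.

    ranges: dict mapping parameter name -> (low, high)
    """
    columns = {}
    for name, (low, high) in ranges.items():
        # One uniform draw inside each stratum of the unit interval.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across parameters.
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]

rng = random.Random(42)
resistance_ranges = {
    "fitness_reduction": (0.15, 0.45),      # range quoted in the text
    "acquired_rr_fraction": (0.001, 0.01),  # hypothetical range, illustration only
}
# 20 resistance parameter sets per drug-susceptible simulation, as in the text.
resistance_sets = latin_hypercube(resistance_ranges, 20, rng)
print(resistance_sets[0])
```

Compared with simple random sampling, this stratification covers each parameter's range evenly even with a small number of draws, which is presumably why it suits a design with only 20 resistance parameter sets per epidemic.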
After the introduction of rifampicin resistance, we allowed the model to progress for 25 y, reflecting the slow emergence of drug resistance over a prolonged time period prior to the historical introduction of effective second-line therapy. During the final 10 y of each calibration period, we gradually introduced second-line treatment, thereby enabling us to replicate the current situation in which most previously treated RR TB cases and a minority of treatment-naïve RR TB cases are identified and appropriately treated (S1 Table). We then evaluated the prevalence of RR TB among incident TB cases in each simulation at the end of this calibration period, excluding those that differed from our calibration target (2.2% in the primary analysis) by more than a factor of 1.5. The resulting calibrated epidemics were used to model the introduction of novel regimens at the end of the calibration period. To ensure an adequate number of simulations, we doubled the number of simulations until results reached stability. A sensitivity analysis described in S3 Methods considers an alternative Bayesian approach to model calibration in which we weighted all simulations according to a joint Gaussian likelihood function based on WHO estimates of TB incidence, mortality, and RR TB prevalence.

Selection of Novel Regimen Characteristics and Their Target Values

In consultation with a WHO-appointed group of experts, we selected six characteristics of novel regimens for inclusion in our model of population impact (Table 1). These characteristics were not meant to form an exhaustive list but rather were chosen based on their potential to guide drug development and their ease of conceptualization.
Regimen efficacy (which refers to the proportion cured within a specified duration) was distinguished from regimen duration and from ease of adherence (defined per month of regimen duration) because of the different mechanisms by which they impact treatment effectiveness and because of the potential for tradeoffs between these characteristics. (For example, the same drug combination could be used for a shorter course with lower efficacy or for a longer course with higher efficacy, or another drug could be added to enhance efficacy and shorten duration but would reduce ease of adherence.) For each characteristic, we relied on literature review and expert consultation to define a minimum acceptable value for a new regimen, an optimistic target, and an intermediate target (Table 1). S1 Methods contains additional details of the process.

Results

Calibration and Baseline Projections

For the evaluation of RS TB regimens in India, 4,917 simulations met calibration targets, with an estimated baseline TB incidence and mortality of 157 (95% UR: 113–187) and 16 (95% UR: 9–23) per 100,000 per year. Corresponding estimates for the 5,298 simulations calibrated to evaluate the RR TB regimen scenario were as follows: TB incidence of 143 (95% UR: 103–170), TB mortality of 16 (95% UR: 9–24), RR TB incidence of 6.0 (95% UR: 3.5–10.2), and RR TB mortality of 0.6 (95% UR: 0.3–1.1), all expressed in units per 100,000 per year. S3 Table shows corresponding outputs for the other epidemiological settings modeled.
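The summaries above (a median with a 95% UR over simulations that met calibration targets) can be sketched in Python as follows. This is an illustrative reading of the Methods, not the authors' R code. It assumes that "differing by more than a factor of 1.5" means a value outside the interval from target/1.5 to target × 1.5, and it uses linear-interpolation quantiles. The simulated values are random placeholders, not model output.

```python
import random

def meets_target(value, target, factor=1.5):
    """True if value lies within a multiplicative factor of the target."""
    return target / factor <= value <= target * factor

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight

def summarize(values):
    """Return (median, lower bound, upper bound) of the 95% uncertainty range."""
    ordered = sorted(values)
    return quantile(ordered, 0.5), quantile(ordered, 0.025), quantile(ordered, 0.975)

# Placeholder simulations: an RR TB fraction among incident cases and a
# TB mortality rate per 100,000 per year (random values for illustration only).
rng = random.Random(1)
simulations = [{"rr_fraction": rng.uniform(0.005, 0.06),
                "tb_mortality": rng.gauss(16, 3)} for _ in range(1000)]

target_rr = 0.022  # calibration target from the primary analysis
calibrated = [s for s in simulations if meets_target(s["rr_fraction"], target_rr)]
median, ur_low, ur_high = summarize([s["tb_mortality"] for s in calibrated])
print(len(calibrated), round(median, 1), round(ur_low, 1), round(ur_high, 1))
```

The filter is applied before summarizing, so the reported range reflects only simulations consistent with the calibration target rather than the full sampled parameter space.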
Impact of an Optimal Novel Regimen

A novel regimen for RS TB, if it met all optimistic development targets (Table 1), was projected to reduce TB incidence by 12% (95% UR: 6%–22%) and TB mortality by 11% (95% UR: 6%–20%) relative to current practice at 10 y after implementation in the primary (India-like) setting. Given the much greater room for improvement in current RR TB treatment, a novel regimen for RR TB that met all optimistic targets could reduce RR TB incidence by 32% (95% UR: 18%–46%) and RR TB mortality by 30% (95% UR: 18%–44%) within 10 y.

Primary Analysis: Relative Impact of Individual Novel Regimen Characteristics

Upon varying each of the six regimen characteristics in isolation (compared to a regimen that met either minimal targets only or all optimal targets), regimen efficacy had the greatest potential impact on mortality and incidence (Fig 3 and S1 Fig). Improving the efficacy of a novel RS TB regimen from 94% (current-regimen estimate) to 99% was projected to reduce TB mortality by 6% (95% UR: 1%–13%) relative to current practice; this impact of improved efficacy alone was nearly half (44%, 95% UR: 33%–52%) of the total achievable impact of a fully optimized regimen (Fig 3A). Conversely, a novel RS TB regimen that met all optimistic development targets except for efficacy had 60% (95% UR: 47%–68%) of the impact of a regimen that was fully optimized, including increased efficacy (Fig 3B). Similar results were seen for a novel RR TB regimen when efficacy was increased from 76% (estimate for current RR TB regimen) to 94% (comparable to current RS TB treatment) (Fig 3C and 3D).

Fig 3. Relative mortality impact of different individual characteristics of novel regimens for the treatment of RS or RR TB. Characteristics and levels are defined in Table 1.
Impact is measured as a relative change in TB mortality (RS TB regimen, A and B) or RR TB mortality (RR TB regimen, C and D) 10 y after introduction of the novel regimen, as illustrated in Fig 2. In A and C, the benefit of partially (striped bars) or fully (solid bars) optimizing only one aspect of a regimen, with the remaining characteristics meeting only minimal targets, is compared to the impact of a regimen that is fully optimized in all aspects. In B and D, the mortality reduction achievable by a regimen that fails to meet only one optimistic target (relative to mortality projections using standard regimens) is compared to mortality reduction with a regimen that meets all optimistic targets. Percentages need not sum to 100% due to synergy between multiple characteristics of the regimen. Error bars show the 95% UR for the impact of each fully optimized characteristic. https://doi.org/10.1371/journal.pmed.1002202.g003

The impact of shortening treatment duration on treatment outcomes and resulting TB mortality and transmission was substantial but less than that of improving regimen efficacy. Compared to a regimen with the minimal value of all characteristics, a shortening of RS TB treatment duration from 6 mo to 2 mo, or of RR TB treatment duration from 20 mo to 6 mo, achieved approximately one quarter of the mortality impact that could be achieved by optimizing all six regimen characteristics rather than only the duration characteristic (Fig 3A–3C). However, this magnitude of effect was only seen in settings of poor efficacy and poor adherence; if efficacy and tolerability of the regimen were improved to optimal levels, the additional impact of achieving a short duration was limited to about 10% of total novel regimen impact (Fig 3B and 3D).
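The "fraction of total impact" comparisons used in this subsection can be illustrated with a short Python sketch. It assumes that both comparisons share one denominator, the mortality difference between an all-minimal and an all-optimistic regimen at 10 y, which is one reading of the Fig 2 caption and may differ in detail from the authors' calculation. All mortality values below are hypothetical.

```python
def fraction_gained_by_one(mort_minimal, mort_one_optimized, mort_all_optimized):
    """Share of the total achievable mortality reduction obtained by optimizing a
    single characteristic of an otherwise minimal regimen (as in Fig 3A and 3C)."""
    total = mort_minimal - mort_all_optimized
    return (mort_minimal - mort_one_optimized) / total

def fraction_retained_without_one(mort_minimal, mort_all_but_one, mort_all_optimized):
    """Share of the total achievable mortality reduction retained when a single
    characteristic is left at its minimal level (as in Fig 3B and 3D)."""
    total = mort_minimal - mort_all_optimized
    return (mort_minimal - mort_all_but_one) / total

# Hypothetical TB mortality at 10 y, per 100,000 per year.
mort_minimal = 16.0        # novel regimen meeting only minimal targets
mort_efficacy_only = 15.3  # only efficacy optimized
mort_no_efficacy = 15.0    # everything optimized except efficacy
mort_optimal = 14.4        # all characteristics optimized

gained = fraction_gained_by_one(mort_minimal, mort_efficacy_only, mort_optimal)
retained = fraction_retained_without_one(mort_minimal, mort_no_efficacy, mort_optimal)
print(round(gained, 4), round(retained, 4))
```

In this toy example the two fractions sum to more than 1, which mirrors the caption's note that percentages need not sum to 100% when characteristics interact.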
Reducing nonadherence by 50% (i.e., achieving the optimistic adherence level for a novel regimen) had similar but slightly less impact than aggressively shortening treatment duration (Fig 3) and had similarly diminishing yield as efficacy and duration improved relative to current care (Fig 3B and 3D). Among the other regimen characteristics modeled, medical contraindications and exclusions due to preexisting resistance each had negligible impact when the novel regimen offered little advantage over standard therapy (Fig 3A and 3C) but became more influential when the novel regimen was optimized in other respects (Fig 3B and 3D). Concentrating the same number of contraindications among people living with HIV (e.g., due to drug–drug interactions with antiretrovirals) had slightly greater mortality impact than other types of medical contraindications, and excluding all people living with HIV from an otherwise effective regimen would cause a very large reduction in impact in a high-HIV-prevalence setting such as South Africa (S2 Results). Low barriers to acquired resistance could substantially reduce the impact of novel regimens. For example, even under our optimistic assumption that novel-regimen DST was available and consistently used, a low barrier to resistance (e.g., 5% of RS TB patients acquiring resistance, Fig 2A, striped yellow bar) lowered the impact of an otherwise optimal novel RS TB regimen on TB mortality by 27% (95% UR: 19%–40%) within this 10-y time frame. Considerations for each regimen characteristic were similar whether evaluating incidence or mortality as the outcome (S1 Fig).

Ancillary Impact of Novel Regimen Characteristics: Resource Use and Scalability

Reductions in treatment duration, in particular, had potential ancillary effects on resource requirements.
For example, reducing RS TB treatment duration from 6 to 2 mo, which we projected could reduce mortality by 22% (95% UR: 13%–29%), was also projected to reduce total patient-months of TB treatment in year 10 by 35% (95% UR: 33%–37%) (S5 Fig). By contrast, the impact of improved efficacy (and other regimen characteristics) on total treatment time reflects only the ability of such regimens to reduce the number of incident TB cases requiring treatment; thus, their treatment-related resource savings are smaller and accrue more gradually. Although we assumed the same scale-up for all novel regimens in the primary analyses above, the potential for regimen characteristics such as improved duration or safety to facilitate wider scale-up of a novel regimen is also an important consideration. In our secondary analyses of variation in regimen scale-up, we found that the mortality impact of an optimized RS TB regimen would be twice as large if it reached all eligible patients as if it reached only half of eligible patients while the remainder continued to receive current care (14% reduction, 95% UR: 8%–26%, versus 7% reduction, 95% UR: 4%–14%). If a particular regimen characteristic, such as elimination of a cold chain requirement or the expansion of opportunities to create fixed dose combinations, allowed such a substantial increase in the proportion of eligible patients reached by a superior regimen, then that characteristic could be as influential as efficacy. However, similar to the impact of other characteristics, the impact of scalability reflected the novel regimen’s ability to offer additional advantages over standard therapy, with negligible epidemiologic advantage when a novel regimen otherwise met only minimal targets.
Similarly, if introduction of a novel regimen for treating RR TB facilitated rapid scale-up of universal rifampicin DST, the estimated impact on RR TB mortality increased: an optimized RR TB regimen could reduce RR TB mortality by 30% (95% UR: 18%–44%) under continued gradual DST scale-up, compared to 45% (95% UR: 29%–60%) when accompanied by universal RR TB detection within 3 y (S6 Fig).

Primary Results for Other Epidemiologic Settings

Results for the other settings modeled (Brazil, the Philippines, and South Africa) were similar overall to those obtained for India (S2 Fig, S3 Fig, S4 Fig), but the high HIV coprevalence in South Africa did result in some small differences. The higher TB case fatality before people with HIV-TB coinfection start TB treatment slightly reduced the proportion of TB incidence and mortality that an optimized novel RS TB regimen could prevent (S4 Table). The higher annual infection and mortality risks in the South African setting also resulted in a small increase in the relative importance to a novel regimen’s impact of preexisting resistance and a small decrease in the relative importance of regimen duration and adherence (S4 Fig).

Other Sensitivity Analyses

Using an “intermediate” novel regimen as the baseline for comparison, the model parameters that most influenced a standardized (meeting all intermediate targets) novel regimen’s mortality impact (S7 Fig) were the efficacy and loss to follow-up associated with the standard regimens and, for RR TB regimens, the extent of RR TB detection. The relative amounts of relapse versus failure and the timing of relapse were also important (S7 Fig). The relative importance of regimen characteristics was sensitive to underlying assumptions about the values of model parameters in ways that differed between RS and RR TB regimens (Fig 4).
For example, although improvements in the efficacy of an RS TB regimen consistently had greater impact than improvements in other characteristics of an RS TB regimen, this impact was greatest when new cases were detected quickly, relapses (as opposed to outright failures, who could be immediately re-treated) were a large proportion of those not cured by treatment, and re-diagnosis of those relapses was slow. The impact of RR TB regimen efficacy improvements was instead most sensitive to the extent of RR TB detection (among both new and retreatment patients) and the amount of loss to follow-up experienced with existing regimens at baseline. Improvements in duration and ease of adherence had greater impact when rates of loss to follow-up were high at baseline and when fractional treatment courses were associated with large increases in relapse risk. Further sensitivity analysis results, including consideration of declining TB incidence and HIV-specific regimen exclusions, are shown in S2 Results, S6 Table, S7 Table, S7 Fig, S8 Fig and S9 Fig.

Fig 4. Sensitivity of the impact of individual regimen characteristics to values of model parameters. Impact of each regimen characteristic is summarized here as the difference in the percent of TB or RR TB mortality reduction that results from achieving the minimal versus the optimal target for that characteristic when intermediate targets are met for all other characteristics. For the impact of each regimen characteristic, sensitivity to model input parameters is described by the partial rank correlation coefficient, a measure of the degree of correlation between projected impact and input variable value, while holding all other input variables constant. More intense color represents greater sensitivity to the parameter, with all parameters defined such that the strongest associations are in the positive direction.
Parameters that did not rank among the top four for any regimen characteristic’s impact were excluded from this figure. https://doi.org/10.1371/journal.pmed.1002202.g004 Calibration and Baseline Projections For the evaluation of RS TB regimens in India, 4,917 simulations met calibration targets, with an estimated baseline TB incidence and mortality of 157 (95% UR: 113–187) and 16 (95% UR: 9–23) per 100,000 per year. Corresponding estimates for the 5,298 simulations calibrated to evaluate the RR TB regimen scenario were as follows: TB incidence of 143 (95% UR: 103–170), TB mortality of 16 (95% UR: 9–24), RR TB incidence of 6.0 (95% UR: 3.5–10.2), and RR TB mortality of 0.6 (95% UR: 0.3–1.1)—all expressed in units per 100,000 per year. S3 Table shows corresponding outputs for the other epidemiological settings modeled. Impact of an Optimal Novel Regimen A novel regimen for RS TB, if it met all optimistic development targets (Table 1), was projected to reduce TB incidence by 12% (95% UR: 6%–22%) and TB mortality by 11% (95% UR: 6%–20%) relative to current practice at 10 y after implementation in the primary (India-like) setting. Given the much greater room for improvement in current RR TB treatment, a novel regimen for RR TB that met all optimistic targets could reduce RR TB incidence by 32% (95% UR: 18%–46%) and RR TB mortality by 30% (95% UR: 18%–44%) within 10 y. Primary Analysis: Relative Impact of Individual Novel Regimen Characteristics Upon varying each of the six regimen characteristics in isolation (compared to a regimen that met either minimal targets only or all optimal targets), regimen efficacy had the greatest potential impact on mortality and incidence (Fig 3 and S1 Fig). 
Improving the efficacy of a novel RS TB regimen from 94% (current-regimen estimate) to 99% was projected to reduce TB mortality by 6% (95% UR: 1%–13%) relative to current practice; this impact of improved efficacy alone was nearly half (44%, 95% UR: 33%–52%) of the total achievable impact of a fully optimized regimen (Fig 3A). Conversely, a novel RS TB regimen that met all other optimistic development targets except for efficacy had 60% (47%–68%) of the impact of a regimen that was fully optimized, including increased efficacy (Fig 3B). Similar results were seen for a novel RR TB regimen when efficacy was increased from 76% (estimate for current RR TB regimen) to 94% (comparable to current RS TB treatment) (Fig 3C and 3D). Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 3. Relative mortality impact of different individual characteristics of novel regimens for the treatment of RS or RR TB. Characteristics and levels are defined in Table 1. Impact is measured as a relative change in TB mortality (RS TB regimen, A and B) or RR TB mortality (RR TB regimen, C and D) 10 y after introduction of the novel regimen, as illustrated in Fig 3. In A and C, the benefit of partially (striped bars) or fully (solid bars) optimizing only one aspect of a regimen, with the remaining characteristics meeting only minimal targets, is compared to the impact of a regimen that is fully optimized in all aspects. In B and D, the mortality reduction achievable by a regimen that fails to meet only one optimistic target (relative to mortality projections using standard regimens) is compared to mortality reduction with a regimen that meets all optimistic targets. Percentages need not sum to 100% due to synergy between multiple characteristics of the regimen. Error bars show the 95% UR for the impact of each fully optimized characteristic. 
https://doi.org/10.1371/journal.pmed.1002202.g003 The impact of shortening treatment duration on treatment outcomes and resulting TB mortality and transmission was substantial but less than that of improving regimen efficacy. Compared to a regimen with the minimal value of all characteristics, a shortening of RS TB treatment duration from 6 mo to 2 mo, or of RR TB treatment duration from 20 mo to 6 mo, achieved approximately one quarter of the mortality impact that could achieved by optimizing all six regimen characteristics rather than only the duration characteristic (Fig 3A–3C). However, this magnitude of effect was only seen in settings of poor efficacy and poor adherence; if efficacy and tolerability of the regimen were improved to optimal levels, the additional impact of achieving a short duration was limited to about 10% of total novel regimen impact (Fig 3B and 3D). Reducing nonadherence by 50% (i.e., achieving the optimistic adherence level for a novel regimen) had similar but slightly less impact than aggressively shortening treatment duration (Fig 3) and had similarly diminishing yield as efficacy and duration improved relative to current care (Fig 3B and 3D). Among the other regimen characteristics modeled, medical contraindications and exclusions due to preexisting resistance each had negligible impact when the novel regimen offered little advantage over standard therapy (Fig 3A and 3C) but became more influential when the novel regimen was optimized in other respects (Fig 3B and 3D). Concentrating the same number of contraindications among people living with HIV (e.g., due to drug–drug interactions with antiretrovirals) had slightly greater mortality impact than other types of medical contraindications, and excluding all people living with HIV from an otherwise effective regimen would cause a very large reduction in impact in a high-HIV-prevalence setting such as South Africa (S2 Results). 
Low barriers to acquired resistance could substantially reduce the impact of novel regimens. For example, even under our optimistic assumption that novel-regimen DST was available and consistently used, a low barrier to resistance (e.g., 5% of RS TB patients acquiring resistance, Fig 2A, striped yellow bar) lowered the impact of an otherwise optimal novel RS TB regimen on TB mortality by 27% (95% UR: 19%–40%) within this 10-y time frame. Considerations for each regimen characteristic were similar whether evaluating incidence or mortality as the outcome (S1 Fig).

Ancillary Impact of Novel Regimen Characteristics: Resource Use and Scalability

Reductions in treatment duration, in particular, had potential ancillary effects on resource requirements. For example, reducing RS TB treatment duration from 6 to 2 mo, which we projected could reduce mortality by 22% (95% UR: 13%–29%), was also projected to reduce total patient-months of TB treatment in year 10 by 35% (95% UR: 33%–37%) (S5 Fig). By contrast, the impact of improved efficacy (and other regimen characteristics) on total treatment time reflects only the ability of such regimens to reduce the number of incident TB cases requiring treatment; thus, their treatment-related resource savings are smaller and accrue more gradually. Although we assumed the same scale-up for all novel regimens in the primary analyses above, the potential for regimen characteristics such as improved duration or safety to facilitate wider scale-up of a novel regimen is also an important consideration. In our secondary analyses of variation in regimen scale-up, we found that the mortality impact of an optimized RS TB regimen would be twice as large if it reached all eligible patients than if it reached only half of eligible patients while the remainder continued to receive current care (14% reduction, 95% UR: 8%–26%, versus 7% reduction, 95% UR: 4%–14%).
If a particular regimen characteristic, such as elimination of a cold chain requirement or the expansion of opportunities to create fixed dose combinations, allowed such a substantial increase in the proportion of eligible patients reached by a superior regimen, then that characteristic could be as influential as efficacy. However, similar to the impact of other characteristics, the impact of scalability reflected the novel regimen’s ability to offer additional advantages over standard therapy, with negligible epidemiologic advantage when a novel regimen otherwise met only minimal targets. Similarly, if introduction of a novel regimen for treating RR TB facilitated rapid scale-up of universal rifampicin DST, the estimated impact on RR TB mortality increased: an optimized RR TB regimen could reduce RR TB mortality by 30% (95% UR: 18%–44%) under continued gradual DST scale-up, compared to 45% (95% UR: 29%–60%) when accompanied by universal RR TB detection within 3 y (S6 Fig).

Primary Results for Other Epidemiologic Settings

Results for the other settings modeled (Brazil, the Philippines, and South Africa) were similar overall to those obtained for India (S2 Fig, S3 Fig, S4 Fig), but the high HIV coprevalence in South Africa did result in some small differences. The higher TB case fatality before people with HIV-TB coinfection start TB treatment slightly reduced the proportion of TB incidence and mortality that an optimized novel RS TB regimen could prevent (S4 Table). The higher annual infection and mortality risks in the South African setting also resulted in a small increase in the relative importance to a novel regimen’s impact of preexisting resistance and a small decrease in the relative importance of regimen duration and adherence (S4 Fig).
Other Sensitivity Analyses

Using an “intermediate” novel regimen as the baseline for comparison, the model parameters that most influenced a standardized (meeting all intermediate targets) novel regimen’s mortality impact (S7 Fig) were the efficacy and loss to follow-up associated with the standard regimens and, for RR TB regimens, the extent of RR TB detection. The relative amounts of relapse versus failure and the timing of relapse were also important (S7 Fig). The relative importance of regimen characteristics was sensitive to underlying assumptions about the values of model parameters in ways that differed between RS and RR TB regimens (Fig 4). For example, although improvements in the efficacy of an RS TB regimen consistently had greater impact than improvements in other characteristics of an RS TB regimen, this impact was greatest when new cases were detected quickly, relapses (as opposed to outright failures, who could be immediately re-treated) were a large proportion of those not cured by treatment, and re-diagnosis of those relapses was slow. The impact of RR TB regimen efficacy improvements was instead most sensitive to the extent of RR TB detection (among both new and retreatment patients) and the amount of loss to follow-up experienced with existing regimens at baseline. Improvements in duration and ease of adherence had greater impact when rates of loss to follow-up were high at baseline and when fractional treatment courses were associated with large increases in relapse risk. Further sensitivity analysis results, including consideration of declining TB incidence and HIV-specific regimen exclusions, are shown in S2 Results, S6 Table, S7 Table, S7 Fig, S8 Fig and S9 Fig.

Fig 4. Sensitivity of the impact of individual regimen characteristics to values of model parameters.
Impact of each regimen characteristic is summarized here as the difference in the percent of TB or RR TB mortality reduction that results from achieving the minimal versus the optimal target for that characteristic when intermediate targets are met for all other characteristics. For the impact of each regimen characteristic, sensitivity to model input parameters is described by the partial rank correlation coefficient, a measure of the degree of correlation between projected impact and input variable value, while holding all other input variables constant. More intense color represents greater sensitivity to the parameter, with all parameters defined such that the strongest associations are in the positive direction. Parameters that did not rank among the top four for any regimen characteristic’s impact were excluded from this figure.

https://doi.org/10.1371/journal.pmed.1002202.g004

Discussion

We used a dynamic transmission model in a series of idealized settings to help prioritize characteristics of novel drug regimens for treating TB. We found that increases in efficacy, for both RS TB and RR TB regimens, have the greatest potential to reduce TB incidence and mortality through direct impacts on treatment outcomes and resulting TB transmission. Shortened duration and improved tolerability may also yield substantial population-level benefits, but these will come in part through facilitating expanded treatment availability or reallocation of resources from treatment to other aspects of TB control. This process of using an epidemiological model, in ongoing consultation with worldwide experts, to help prioritize elements of new drug regimens offers a new approach to inform the development of combination antimicrobial regimens. For RS TB regimens, our finding that further improvements in efficacy could be more important than regimen shortening runs counter to the prevailing focus on developing a non-inferior, shorter regimen.
This result reflects our use of evidence that (a) existing RS TB treatment already cures a majority of patients who complete as little as 2 mo of therapy [33] and (b) 85% or more of TB patients currently complete a full course of treatment [28]. These data suggest that more patients currently relapse because of incomplete regimen efficacy than because of loss to follow-up. Unfortunately, changes of a few percentage points in efficacy may be the characteristic most difficult to demonstrate in randomized trials of feasible size and scope. This finding has important implications for clinical trial design, suggesting that non-inferiority margins for any novel RS TB regimen should be as narrow as possible to avoid unintended harm from a shorter but marginally less effective regimen. Notably, efficacy and duration of treatment are not truly independent measures; as more potent anti-TB regimens are developed, a choice may be faced between the operational benefit of reducing treatment duration and the epidemiological value of using those same potent agents for a full 6 mo. These results also highlight the potential importance of developing biomarkers to identify individual patients who are at highest risk for relapse and may benefit from extended or intensified therapy. For RR TB, the importance of efficacy largely reflects the poor efficacy of the existing regimen and the substantial gains that remain to be made. Ultimately, the potential impact of novel drug regimens must be assessed from a holistic perspective; impact of specific regimen characteristics on incidence and mortality is only one consideration. In attempting to attain the ambitious targets of the End TB Strategy [34], indirect and ancillary effects of regimen improvements (for example, reduced resource requirements or improved patient experience) may be even more important, as better treatment outcomes in isolation will not achieve these goals.
Specific regimen characteristics may facilitate more complete or more rapid scale-up of a more effective regimen—for instance, dosing frequency or safety monitoring requirements may determine whether a novel regimen is adopted for widespread use in particular settings. Because of unpredictability of the extent to which such features will limit uptake in different contexts, and because some such features (e.g., availability of fixed dose combinations) may be determined after a regimen is largely developed, our analysis standardized scale-up between regimens and settings. However, characteristics that determine scalability (e.g., dosing frequency or safety monitoring requirements) could be the most critical regimen characteristics in particular settings. In addition, synergies with other interventions—such as improved diagnosis, case-finding, and preventive therapy—must also be considered. For RR TB, for example, the availability of simpler and safer regimens could motivate TB programs to expand RR TB diagnosis and treatment [35], outweighing the direct effect of any particular regimen improvement in many settings. Averting adverse events such as liver, renal, and oto-toxicity is also important to individual patients, and reductions in regimen duration or visit frequency could reduce often-devastating patient costs and lost productivity [36,37]. This analysis has several limitations. First, as with all modeling analyses, we adopted a simplified structure and used parameters with substantial uncertainty. In particular, we simplified HIV natural history, age, and contact structure. These simplifications are unlikely to change the relative impact of different regimen characteristics as long as regimen improvements apply similarly across the population, but they could bias our results if simultaneous differences exist both in TB epidemiology (e.g., transmission) and in the differential impact of different regimen variables. 
More specific to this analysis, the particular task of linking regimen characteristics to anticipated population-level impact presents unique challenges, in that some characteristics (and the interdependence between them) are not easily represented in simplified models. In some cases, multiple regimen features all influence a single aspect of epidemic dynamics (e.g., the multiple reasons patients may be excluded from or poorly adherent to a regimen), while in other cases, a single outcome assessed during regimen development comprises multiple processes within such a mechanistic model (e.g., regimen effectiveness depends on both regimen potency and patient adherence). We therefore left the mechanism for achieving some specifications (e.g., “50% reduction in nonadherence”) open to developer interpretation. This precludes direct mapping of some elements of a typical target profile (e.g., dosing frequency or number of tablets) onto the model, while making the model better suited to its primary purpose of weighing the relative importance of different types of regimen strengths from an epidemiologic perspective. Similarly, synergies between different regimen characteristics may make it difficult to interpret a measure of an individual characteristic’s impact in the absence of a single specific regimen under study. For example, the potential gain from making a regimen more tolerable depends in part on the treatment duration, with greater impact when duration is longer (and, similarly, when efficacy is lower). The baseline and minimal levels selected may also change over time; of particular note, we used 20 mo as the worst-case duration for new RR regimens, but a 9-mo MDR TB regimen has already been endorsed for widespread use [9], setting a new benchmark for RR regimen duration and perhaps also efficacy [38]. 
We also deferred consideration of scalability to secondary analyses due to its context dependence, and, in doing so, we may have underestimated the impact of characteristics such as duration in those settings in which shorter duration would result in wider adoption of a novel regimen. Finally, there is much uncertainty about the selection, amplification, and transmission of drug resistance associated with novel regimens; we attempt to mitigate the effects of such uncertainty by assuming use of DST for novel regimens and by limiting analyses to a 10-y time horizon, but the relationships between preexisting drug resistance, emergence and amplification of resistance during treatment, and impact of resistance on treatment efficacy and disease transmission warrant further exploration. Consistent DST may be essential for regimens that have low barrier to resistance or significant overlap with regimens already in use. In conclusion, this analysis suggests that TB drug development could achieve substantial impact on mortality and TB incidence by capitalizing on new, more potent TB drugs and drug combinations to improve treatment efficacy. Other regimen characteristics such as duration and safety are also critically important, but much of their impact on population-level dynamics may occur through indirect effects on the health system. The importance of even small changes in efficacy implies that the reported efficacy gains of new MDR regimens may be at least as impactful as their reduced duration [38], that clinical trials of new RS TB regimens should ensure that efficacy is at least maintained in new regimens, and that a strategy of increasing RS TB regimen potency rather than shortening duration merits further consideration. 
The development of novel drug regimens will be an essential component of ending the global TB epidemic, and priority-setting frameworks such as the one presented here can help to focus resources on those regimens likely to have the greatest impact at the population level.

Supporting Information

S1 Methods. Modeling of novel regimen characteristics. https://doi.org/10.1371/journal.pmed.1002202.s001 (DOCX)

S2 Methods. Transmission model specification. https://doi.org/10.1371/journal.pmed.1002202.s002 (DOCX)

S3 Methods. Details of model calibration and epidemiologic settings. https://doi.org/10.1371/journal.pmed.1002202.s003 (DOCX)

S1 Results. Results of model calibration. https://doi.org/10.1371/journal.pmed.1002202.s004 (DOCX)

S2 Results. Sensitivity of results to HIV-specific exclusions and to contact structure. https://doi.org/10.1371/journal.pmed.1002202.s005 (DOCX)

S1 Table. Model parameters. https://doi.org/10.1371/journal.pmed.1002202.s006 (DOCX)

S2 Table. Calibration targets for all modeled epidemiologic settings. https://doi.org/10.1371/journal.pmed.1002202.s007 (DOCX)

S3 Table. Baseline TB incidence and mortality of calibrated models. https://doi.org/10.1371/journal.pmed.1002202.s008 (DOCX)

S4 Table. Estimated 10-y mortality and incidence impact of optimal novel regimen for all modeled settings. https://doi.org/10.1371/journal.pmed.1002202.s009 (DOCX)

S5 Table. Sensitivity analysis for nonequilibrium underlying RS TB epidemic: parameters used and resulting TB incidence. https://doi.org/10.1371/journal.pmed.1002202.s010 (DOCX)

S6 Table. Sensitivity analysis results for nonequilibrium epidemic: comparing impacts of improving a single regimen characteristic. https://doi.org/10.1371/journal.pmed.1002202.s011 (DOCX)

S7 Table. Sensitivity analysis results for nonequilibrium epidemic: comparing impacts of failing to optimize a single regimen characteristic. https://doi.org/10.1371/journal.pmed.1002202.s012 (DOCX)

S1 Fig.
Results for incidence outcome, primary (India) setting. In contrast to other figures showing impact on the TB or RR TB mortality reduction resulting from a regimen, this analysis considers the impact of different novel regimen characteristics on the regimen’s ability to reduce TB incidence (RS TB regimens) or RR TB incidence (RR TB regimens). https://doi.org/10.1371/journal.pmed.1002202.s013 (TIF)

S2 Fig. Impact of novel regimens on TB mortality in an epidemic reflective of Brazil (lower TB prevalence, higher HIV coprevalence). https://doi.org/10.1371/journal.pmed.1002202.s014 (TIF)

S3 Fig. Impact of novel regimens on TB mortality in an epidemic reflective of the Philippines (high TB prevalence, low HIV coprevalence). https://doi.org/10.1371/journal.pmed.1002202.s015 (TIF)

S4 Fig. Impact of novel regimens on TB mortality in an epidemic reflective of South Africa (high TB prevalence and high HIV coprevalence). https://doi.org/10.1371/journal.pmed.1002202.s016 (TIF)

S5 Fig. Total patient-months on treatment resulting from different RS TB regimen improvements. Different improvements in a novel RS TB regimen have different population-level impacts on total TB treatment person-time. Efficacy improvements reduce the use of all regimens by lowering incidence most dramatically. Inclusive eligibility allows more patients to receive the novel rather than standard regimen, which reduces total treatment time only if the novel regimen is also shorter. Shortening the regimen duration has a direct and immediate impact on the total patient-months on treatment. https://doi.org/10.1371/journal.pmed.1002202.s017 (TIF)

S6 Fig. Role of expanded RR TB detection and treatment in novel RR TB regimen impact. This sensitivity analysis considers a scenario in which an improved RR TB regimen allows or motivates more rapid scale-up of rifampin DST, such that universal rifampin DST is achieved by the end of the 3-y scale-up period for the novel regimen.
Compared to the baseline scenario that assumes continued gradual scale-up of DST, the indirect effect of simultaneous rapid DST scale-up is expected to approximately double the direct effect of a novel regimen improvement such as shortened duration or improved tolerability. https://doi.org/10.1371/journal.pmed.1002202.s018 (TIF)

S7 Fig. Sensitivity analysis: influence of individual parameter estimates on overall novel regimen impact. Partial rank correlation (adjusted for other parameters) was calculated for each parameter with the percent reduction in TB mortality (or, for RR TB regimens, the percent reduction in RR TB mortality) achieved by a novel regimen that met all intermediate target criteria. https://doi.org/10.1371/journal.pmed.1002202.s019 (TIF)

S8 Fig. Primary results with contact structure changed to heterogeneous, as described in S3 Methods. https://doi.org/10.1371/journal.pmed.1002202.s020 (TIF)

S9 Fig. Sensitivity analysis for calibration method. The calibration method used in the primary analysis, in which all simulations that fell inside of uncertainty intervals were included in the analysis with equal weight, is compared to an alternative approach weighted according to a Gaussian-based likelihood function as described in S3 Methods. In order to summarize many results in a single figure, regimen characteristics not being varied are set at an intermediate baseline; scalability is included among the characteristics varied; and reduction in mortality is shown relative to projections without any novel regimen. https://doi.org/10.1371/journal.pmed.1002202.s021 (TIF)
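
Illustrative note on partial rank correlation. The sensitivity results in Fig 4 and S7 Fig are summarized with partial rank correlation coefficients. The sketch below shows one common way such a coefficient can be computed and is not taken from the authors' code: the function name prcc, the synthetic parameter matrix, and the simulated impact values are all hypothetical, and ties between values are not handled. Each input and the output are converted to ranks, the linear effect of the other ranked inputs is regressed out of both, and the residuals are correlated.

```python
import numpy as np


def ranks(values):
    """Return 1-based ranks of a 1-D array (assumes no tied values)."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result


def prcc(inputs, output):
    """Partial rank correlation of each input column with the output.

    inputs: array of shape (n_simulations, n_parameters)
    output: array of shape (n_simulations,)
    """
    ranked_inputs = np.column_stack([ranks(col) for col in inputs.T])
    ranked_output = ranks(output)
    n, k = ranked_inputs.shape
    coefficients = np.empty(k)
    for j in range(k):
        # Design matrix: intercept plus the ranks of all other inputs.
        others = np.column_stack(
            [np.ones(n), np.delete(ranked_inputs, j, axis=1)]
        )
        # Residuals of input j and of the output after removing the
        # linear effect of the other ranked inputs.
        beta_x = np.linalg.lstsq(others, ranked_inputs[:, j], rcond=None)[0]
        beta_y = np.linalg.lstsq(others, ranked_output, rcond=None)[0]
        residual_x = ranked_inputs[:, j] - others @ beta_x
        residual_y = ranked_output - others @ beta_y
        coefficients[j] = np.corrcoef(residual_x, residual_y)[0, 1]
    return coefficients


# Synthetic example (hypothetical data): the output rises with the first
# parameter, falls with the second, and does not depend on the third.
rng = np.random.default_rng(0)
params = rng.uniform(size=(500, 3))
impact = 5 * params[:, 0] - 2 * params[:, 1] + rng.normal(scale=0.1, size=500)
result = prcc(params, impact)
print(result)
```

In this synthetic case the first coefficient should be strongly positive, the second clearly negative, and the third close to zero, which is the pattern of strong, moderate, and negligible sensitivity that a heat map such as Fig 4 is designed to display.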