Abstract
Measurement and characterization of the human microbiome in large population-based human studies has recently become a reality secondary to technological advances in high-throughput DNA sequencing. These advances bring new challenges and knowledge gaps for study planning, data analysis, and interpretation that are novel to large-scale epidemiologic studies. In this issue of the Journal, Sinha et al. (Am J Epidemiol. 2018;187(6):1282–1290) have provided data with which to inform statistical power and sample size requirements for microbiome studies in population-based settings. This work serves as a helpful starting point for study planning while also serving as a springboard for discussion regarding additional considerations for improving microbiome research. This commentary emphasizes the importance of selecting microbiome metrics appropriate for the biological hypothesis under investigation, as well as the need for new analytical tools that can better capitalize on the unique yet rich information contained in microbiome data sets.
Keywords: microbiome, microbiota
Measurement and characterization of the human microbiome in large population-based human studies has recently become a reality secondary to technological advances in high-throughput DNA sequencing (1). Accordingly, there has been an exponential rise in the number of research projects and publications directly addressing the role of the microbiome in disease etiology. Since the introduction of the term by Joshua Lederberg in 2001 (2), there have been over 11,000 publications with “microbiome” in the title or abstract indexed in PubMed (National Library of Medicine, Bethesda, Maryland). Among these publications are numerous reports of associations between a wide array of microbiome metrics and several disease biomarkers (3–8), as well as clinical diseases, including cardiovascular disease, cancer, diabetes, periodontitis, allergies, autism, inflammatory bowel disease, fibromyalgia, and depression (1, 9, 10).
While results from many of these studies are thought-provoking, the majority of microbiome research to date has been conducted using animal models or human studies with extreme phenotype contrasts (e.g., case-control studies comparing people with and without a clinical disease), and less research is available from large population-based longitudinal studies that can establish temporality. In this issue of the Journal, Sinha et al. (11) have provided an elegant and practical overview of statistical power and sample size requirements for microbiome studies in population-based settings. Specifically, they report intraclass correlation coefficients corresponding to several microbiome metrics reflecting alpha diversity, beta diversity, and phylum-level relative abundance. The data arise from multiple body sites and 3 different study populations and are used to inform the temporal stability of microbiome metrics and the corresponding influence of repeat sampling on sample size requirements. Their report advances existing research concerning statistical power for microbiome studies (12) in some meaningful ways, including the following: 1) quantifying the value of repeat sampling for sample size reduction; 2) the provision of an R-based tool (R Foundation for Statistical Computing, Vienna, Austria) for project-specific sample size calculations; 3) providing a formula allowing for unbalanced comparison groups; and 4) reporting measures of association using odds ratios and risk ratios, which are more commonly utilized in population-based studies. In addition to providing a helpful starting point for study planning, Sinha et al.’s findings (11) also raise important questions about long-term research needs for enhancing study design, data analysis, and interpretation of results from microbiome research to ensure that the eventual interventional arm of etiological epidemiology is properly informed by its observational counterpart. 
These questions have direct implications not only for power estimation methods but also for the trajectory of scientific inquiry into the potential health effects of the microbiome and the likelihood of realizing novel prevention and treatment approaches leveraging the microbiome. One such question that has not been systematically explored but is central to the results presented by Sinha et al. relates to the best exposure (or outcome) microbiome metric to use vis-à-vis specific biological hypotheses. This issue is of high relevance in microbiome studies due to the vast array of summary metrics available and the fact that each metric captures different features of microbiome complexity. A key observation reported by Sinha et al. is the generally high intraclass correlation coefficient of principal components obtained from the unweighted UniFrac beta-diversity metric (13), which consistently outperformed other metrics, particularly in comparison with phylum-level relative abundance measures. If this finding is replicated in other settings, it suggests that from the standpoint of statistical power, unweighted UniFrac would be the best metric to use, as it reduces sample size requirements. However, in settings where unweighted UniFrac does not capture variation in microbiome features relevant to the hypothesis, the issue of power becomes secondary. While defining and operationalizing precise exposure constructs (and well-defined interventions with which to manipulate these exposures) has always been of critical importance to etiological epidemiology, this concept is of heightened importance in the context of microbiome studies, where the metrics are numerous and limited population-based data exist to help inform their meaning. 
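As a rough illustration of why a higher intraclass correlation coefficient (ICC) and repeat sampling lower sample size requirements, the sketch below applies two textbook results. Both are assumptions introduced here and are not the formulas used by Sinha et al.: the Spearman-Brown formula for the reliability of the mean of k repeated measurements, and the classical measurement-error approximation that the required sample size scales with the inverse of that reliability. The function names and the example values (an ICC of 0.5 and 100 participants under error-free measurement) are hypothetical.

```python
import math


def sample_size_inflation(icc: float, k: int = 1) -> float:
    """Factor by which sample size grows relative to an error-free metric.

    Uses the Spearman-Brown reliability of a mean of k repeated measurements,
    reliability = k * icc / (1 + (k - 1) * icc), and returns its inverse.
    """
    if not 0.0 < icc <= 1.0:
        raise ValueError("icc must be in (0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1.0 + (k - 1) * icc) / (k * icc)


def required_n(n_ideal: int, icc: float, k: int = 1) -> int:
    """Approximate participants needed when the metric has the given ICC.

    n_ideal is the sample size that would suffice if the metric were
    measured without within-person variability (a hypothetical input).
    """
    # Round before taking the ceiling to absorb floating-point noise.
    return math.ceil(round(n_ideal * sample_size_inflation(icc, k), 9))


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"ICC=0.5, samples per person={k}: n={required_n(100, 0.5, k)}")
```

Under these assumptions, a metric with a higher ICC needs fewer participants for the same hypothesis, and each additional sample per person yields a diminishing reduction. None of this speaks to whether the metric captures the biology relevant to the hypothesis.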
From a historical perspective, much can be learned about the importance of well-informed exposure constructs in relation to a specific biological hypothesis, by revisiting previous research interrogating the microbial etiologies of various chronic diseases. In the case of human papillomavirus, which is now known to be a necessary cause of virtually all cervical cancers (14), it was important to carefully study different types of human papillomavirus and learn which of the more than 100 types were causal (15). Doing so has been critical for establishing causality and precisely estimating effect size, as well as for the development of appropriate screening and prevention programs. In contrast, research investigating microbial etiologies of coronary artery disease (CAD) has had limited success in establishing causality and translating findings into clinical practice. As Anderson previously discussed (16), this may be due to a truly null hypothesis. However, the rapid progression from observational study designs to interventional designs, prior to the establishment of appropriate exposure constructs and corresponding interventions, might also provide some explanation. Specifically, despite a large body of observational evidence suggesting that multiple microbial exposures—both viral and bacterial—were associated with modestly increased risk of CAD (17, 18), subsequent clinical trials were designed to test the effectiveness of antibiotic therapy against a single organism, Chlamydia pneumoniae. Importantly, there was little evidence from observational studies concerning the role of antibiotic therapy in CAD risk reduction. Therefore, the intervention and intervention target were not well-aligned with the totality of the evidence from observational data used to rationalize the interventions. We now move into a new era of increasingly complex, and often countervailing, microbial hypotheses. A few examples are illustrative of the complexity and are worth mentioning. 
The gut bacteria–derived metabolite trimethylamine-N-oxide has been strongly linked to increased risk of clinical cardiovascular events (6, 19), which represents a potentially deleterious role for the gut microbiome. In contrast, many gut bacteria are potentially beneficial to host health through a variety of hypothesized mechanisms, including production of short-chain fatty acids important for intestinal integrity, energy homeostasis, and reducing inflammation (5). As another example, microbes in the mouth are important for the generation of nitric oxide (20), an important signaling molecule in cardiovascular physiology, and experimental studies demonstrate that antimicrobial mouthwash modestly increases blood pressure (via blockage of oral nitrate reduction and nitric oxide generation) (21), suggesting a potential cardiovascular benefit from oral colonization by nitrate reducers. However, long-standing recommendations for maintaining optimal oral health include hygiene behaviors that rely on suppression of the oral microbiota (e.g., brushing, flossing, and use of chemical adjuvants with antimicrobial activity) without consideration of other health effects possibly related to oral commensal organisms. There are also novel hypotheses emerging that suggest synergy between potential pathogens and the broader microbiome. Consider an emerging body of science regarding the etiology of periodontitis, a highly prevalent oral infection arising from a chronic, nonresolving host inflammatory response to biofilms. Recent evidence suggests that keystone pathogens have the ability to subvert certain aspects of host immunity, enabling the formation of dysbiotic microbial communities that are central to the development of clinical disease; importantly, results from germ-free animal models have demonstrated that putative keystone pathogens produce clinical disease only in the presence of baseline commensal microbiota (22, 23).
In the context of the gut microbiome, the development and resolution of Clostridium difficile infection also appear to depend on synergy between a known pathogen and the broader microbial community. Randomized clinical trials have shown that fecal transplants dramatically increase cure rates from 30% to more than 90%, and clinical improvement is accompanied by broad changes in gut microbial communities (24). The C. difficile example is particularly noteworthy, as it suggests a potentially important role for the broader microbial community in the context of a classical infectious disease with a known etiological agent. Beyond these examples, numerous additional hypotheses are emerging that suggest an important, and often beneficial, role for the microbiome. Consequently, it is possible that microbiome metrics that are too broad (or too narrow) in scope may result in mixed and attenuated measures of association via competing risks/benefits conferred by different members of the microbial community. Moreover, while certain broad metrics of microbial community membership and function provide important clues that can help refine old hypotheses or generate new hypotheses, it will often be difficult to form focused interventions using these metrics. Therefore, careful hypothesis formation directly linked to a clear causal exposure construct will be necessary to enable the conduct of large intervention studies with a high probability of success. Additionally, interventions designed to influence the microbiome exposure construct must also consider the intervention’s influence on the microbiome more broadly to avoid unintended consequences of untargeted approaches, as others have suggested (6). To address these questions, more population-based research is needed to inform appropriate exposure constructs.
While many studies will understandably be designed with a focus on a narrow set of microbiome metrics, the value of reporting and discussing results for a wide array of metrics should not be overlooked, as our understanding will be enriched by the patterns that emerge through replication in the literature. In addition to the value of high-quality exposure constructs in regard to enhancing the power and validity of research findings, another approach for improving power is the development of new analytical methods. For example, an important feature of microbiome data operationalization is the common use of relative (as opposed to absolute) abundance to summarize microbial levels. While many notable advances in analyzing and interpreting relative abundance data have occurred (25), it is possible that rich biological information is lost when ignoring absolute abundance. Consider hypotheses concerning keystone pathogens, where it is hypothesized that low-abundance keystone pathogens can influence the broader microbial community (discussed above) despite being minority members of the community (23, 26). Despite the potential importance of low–relative-abundance organisms, large changes in absolute levels of a low-abundance organism are difficult to accurately measure on a relative abundance scale; in some situations, simple data rounding can mask these changes. Conversely, small changes in a highly prevalent taxon’s abundance cause artificial changes in the relative abundance of many other taxa, even if their absolute abundances remain unchanged. This latter point is particularly relevant when performing multiple hypothesis testing at the level of individual taxa (or genes), because many of the observed changes in relative abundance are artificial and thus increase the risk of false-positive findings.
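The two relative-abundance artifacts described above can be made concrete with a small numerical sketch. All taxon names and counts below are invented for illustration and do not come from any study; the helper functions are hypothetical and use only the Python standard library.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert absolute counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: count / total for taxon, count in counts.items()}


def percent_rounded(count: float, total: float, digits: int = 2) -> float:
    """Relative abundance in percent, rounded as it might be in a report."""
    return round(100.0 * count / total, digits)


# Artifact 1: only the dominant taxon changes in absolute terms, yet the
# relative abundance of every other taxon shifts.
before = {"dominant": 8000, "taxon_b": 1500, "taxon_c": 500}
after = {"dominant": 6000, "taxon_b": 1500, "taxon_c": 500}
rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

# Artifact 2: a fourfold absolute increase in a rare organism disappears
# after rounding on the relative scale.
rare_before = percent_rounded(1, 100000)
rare_after = percent_rounded(4, 100003)

if __name__ == "__main__":
    for taxon in before:
        print(taxon, rel_before[taxon], rel_after[taxon])
    print("rare organism (%):", rare_before, rare_after)
```

In the first scenario, taxon_b and taxon_c appear to increase although their absolute counts are unchanged. Testing each taxon separately would flag such shifts as differences, which is the false-positive concern raised above.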
To date, methods for power calculation do not address this issue or provide information on the probability of detecting true positives (sensitivity) while controlling the false-negative rate. Sinha et al. have made considerable efforts to provide a baseline for power and sample-size calculations, based on existing metrics of microbial community composition (11). While the nascent field of microbiome epidemiology emerges, their findings serve as a valuable genesis for discussion and the new collaborations necessary to address many knowledge gaps related to the design of high-quality epidemiologic studies and subsequent data analysis and interpretation, which are central to the improvement of population health.
ACKNOWLEDGMENTS
Author affiliations: Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota (Ryan T. Demmer); Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York (Ryan T. Demmer).
This commentary was supported by a grant from the National Institute of Diabetes and Digestive and Kidney Diseases (grant R01 DK102932) to R.T.D.
Conflict of interest: none declared.
Abbreviations
CAD: coronary artery disease
REFERENCES
1. Morgan XC, Huttenhower C. Chapter 12: human microbiome analysis. PLoS Comput Biol. 2012;8(12):e1002808.
2. Lederberg J, McCray AT. ‘Ome sweet ’omics—a genealogical treasury of words. Scientist. 2001;15(7):8.
3. Demmer RT, Breskin A, Rosenbaum M, et al. The subgingival microbiome, systemic inflammation and insulin resistance: the Oral Infections, Glucose Intolerance and Insulin Resistance Study. J Clin Periodontol. 2017;44(3):255–265.
4. Demmer RT, Jacobs DR Jr, Singh R, et al. Periodontal bacteria and prediabetes prevalence in ORIGINS: the Oral Infections, Glucose Intolerance, and Insulin Resistance Study. J Dent Res. 2015;94(9 suppl):201S–211S.
5. Wang J, Jia H. Metagenome-wide association studies: fine-mining the microbiome. Nat Rev Microbiol. 2016;14(8):508–522.
6. Tang WH, Wang Z, Levison BS, et al. Intestinal microbial metabolism of phosphatidylcholine and cardiovascular risk. N Engl J Med. 2013;368(17):1575–1584.
7. Le Chatelier E, Nielsen T, Qin J, et al. Richness of human gut microbiome correlates with metabolic markers. Nature. 2013;500(7464):541–546.
8. Arumugam M, Raes J, Pelletier E, et al. Enterotypes of the human gut microbiome. Nature. 2011;473(7346):174–180.
9. Cho I, Blaser MJ. The human microbiome: at the interface of health and disease. Nat Rev Genet. 2012;13(4):260–270.
10. Ganesan SM, Joshi V, Fellows M, et al. A tale of two risks: smoking, diabetes and the subgingival microbiome. ISME J. 2017;11(9):2075–2089.
11. Sinha R, Goedert JJ, Vogtmann E, et al. Quantification of human microbiome stability over 6 months: implications for epidemiologic studies. Am J Epidemiol. 2018;187(6):1282–1290.
12. Kelly BJ, Gross R, Bittinger K, et al. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics. 2015;31(15):2461–2468.
13. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol. 2005;71(12):8228–8235.
14. de Martel C, Ferlay J, Franceschi S, et al. Global burden of cancers attributable to infections in 2008: a review and synthetic analysis. Lancet Oncol. 2012;13(6):607–615.
15. Muñoz N, Bosch FX, de Sanjosé S, et al. Epidemiologic classification of human papillomavirus types associated with cervical cancer. N Engl J Med. 2003;348(6):518–527.
16. Anderson JL. Infection, antibiotics, and atherothrombosis—end of the road or new beginnings? N Engl J Med. 2005;352(16):1706–1709.
17. Danesh J, Whincup P, Walker M, et al. Chlamydia pneumoniae IgG titres and coronary heart disease: prospective study and meta-analysis. BMJ. 2000;321(7255):208–213.
18. Danesh J. Coronary heart disease, Helicobacter pylori, dental disease, Chlamydia pneumoniae, and cytomegalovirus: meta-analyses of prospective studies. Am Heart J. 1999;138(5):S434–S437.
19. Wang Z, Klipfell E, Bennett BJ, et al. Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature. 2011;472(7341):57–63.
20. Duncan C, Dougall H, Johnston P, et al. Chemical generation of nitric oxide in the mouth from the enterosalivary circulation of dietary nitrate. Nat Med. 1995;1(6):546–551.
21. Kapil V, Haydar SM, Pearl V, et al. Physiological role for nitrate-reducing oral bacteria in blood pressure control. Free Radic Biol Med. 2013;55:93–100.
22. Hajishengallis G. Periodontitis: from microbial immune subversion to systemic inflammation. Nat Rev Immunol. 2015;15:30–44.
23. Hajishengallis G, Darveau RP, Curtis MA. The keystone-pathogen hypothesis. Nat Rev Microbiol. 2012;10(10):717–725.
24. van Nood E, Vrieze A, Nieuwdorp M, et al. Duodenal infusion of donor feces for recurrent Clostridium difficile. N Engl J Med. 2013;368(5):407–415.
25. McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol. 2014;10(4):e1003531.
26. Hajishengallis G, Liang S, Payne MA, et al. Low-abundance biofilm species orchestrates inflammatory periodontal disease through the commensal microbiota and complement. Cell Host Microbe. 2011;10(5):497–506.
© The Author(s) 2018. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: firstname.lastname@example.org. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
American Journal of Epidemiology – Oxford University Press
Published: Apr 3, 2018