Access the full text.
Sign up today, get DeepDyve free for 14 days.
Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 Syst. Biol. 67(4):719–728, 2018 © The Author(s) 2018. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For Permissions, please email: email@example.com DOI:10.1093/sysbio/syy007 Advance Access publication February 7, 2018 Modeling the Growth and Decline of Pathogen Effective Population Size Provides Insight into Epidemic Dynamics and Drivers of Antimicrobial Resistance ERIK M. VOLZ AND XAVIER DIDELOT Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK Correspondence to be sent to: Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place W2 1PG, UK; Email: firstname.lastname@example.org. Received 19 October 2017; reviews returned 1 February 2018; accepted 4 February 2018 Associate Editor: Jeffrey Townsend Abstract.—Nonparametric population genetic modeling provides a simple and ﬂexible approach for studying demographic history and epidemic dynamics using pathogen sequence data. Existing Bayesian approaches are premised on stochastic processes with stationary increments which may provide an unrealistic prior for epidemic histories which feature extended period of exponential growth or decline. We show that nonparametric models deﬁned in terms of the growth rate of the effective population size can provide a more realistic prior for epidemic history. We propose a nonparametric autoregressive model on the growth rate as a prior for effective population size, which corresponds to the dynamics expected under many epidemic situations. We demonstrate the use of this model within a Bayesian phylodynamic inference framework. Our method correctly reconstructs trends of epidemic growth and decline from pathogen genealogies even when genealogical data are sparse and conventional skyline estimators erroneously predict stable population size. We also propose a regression approach for relating growth rates of pathogen effective population size and time-varying variables that may impact the replicative ﬁtness of a pathogen. The model is applied to real data from rabies virus and Staphylococcus aureus epidemics. We ﬁnd a close correspondence between the estimated growth rates of a lineage of methicillin-resistant S. aureus and population- level prescription rates of -lactam antibiotics. The new models are implemented in an open source R package called skygrowth which is available at https://github.com/mrc-ide/skygrowth.[Antimicrobial resistance; effective population size; growth rate; MRSA; phylodynamics; skygrowth.] Nonparametric population genetic modeling has Minin 2013). The choice of a process with stationary emerged as a simple, ﬂexible, popular, and powerful increments as prior can have large inﬂuence on size tool for interrogating genetic sequence data to reveal estimates especially when genealogical data are sparse demographic history (Ho and Shapiro 2011). This and uninformative. Genealogies often provide very approach has proved especially useful for analysis little information about effective population size near of pathogen sequence data to reconstruct epidemic the present (or most recent sample), especially in history, and such models are increasingly incorporated exponentially increasing populations (de Silva et al. into surveillance systems for infectious diseases (Volz 2012). In such cases, skyline estimators with BM priors et al. 2013). The most commonly used techniques are on the effective population size may produce estimates derivatives of the original skyline coalescent model, which stabilize at a constant level even when the true which describes the evolution of effective population size is increasing or decreasing exponentially. We argue size as a piecewise constant function of time (Pybus et al. that in many situations, a more realistic prior can be 2000). The basic skyline model is prone to overﬁtting and deﬁned in terms of the growth rate of the effective estimating drastic ﬂuctuations in effective population population size. Below, we describe such a prior based on size, so that numerous approaches were subsequently a simple autoregressive stochastic process deﬁned on the developed for smoothing population size trajectories. growth rate of effective population size. We show how Initial approaches to smoothing skyline estimators were this prior can lead to substantially different estimates based on aggregating adjacent coalescent intervals and argue that these estimates are more accurate in within a maximum likelihood framework (Strimmer many situations. When genealogical data are sparse, our and Pybus 2001). Subsequent development has largely model will retain the growth rate learned from other focused on Bayesian approaches where a more complex parts of the genealogy and will correctly capture trends stochastic diffusion process provides a prior for of exponential growth or decline. Even though our the evolution of a piecewise constant function of approach is nonparametric, we consider its relationship effective population size (Drummond et al. 2005). with parametric models of epidemic population genetics Nonparametric Bayesian approaches are now the most to show that our estimates of growth rates of pathogen popular approach for phylodynamic inference, and such effective population size are often likely to correspond approaches have illuminated the epidemic history of to growth rates of an infectious disease epidemic. numerous pathogens in humans and animals (Ho and Smoothing effective population size trajectories using Shapiro 2011). a prior on growth rates also have important advantages To date, all Bayesian nonparametric models have when incorporating nongenetic covariate data into assumed that the effective population size (or its phylodynamic inference (Baele et al. 2016). Recent logarithm) follows a stochastic process such as a work has focused on reﬁning effective population size Brownian motion (BM) (Minin et al. 2008; Palacios and estimates using both the times of sequencing sampling [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 719 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 720 SYSTEMATIC BIOLOGY VOL. 67 (Karcher et al. 2016) or using environmental data This BM prior has been adapted and applied in a which are expected to correlate with size estimates, variety of ways to enable statistical inference. In the such as independent epidemic size estimates based on skygrid model (Gill et al. 2013), time is discretized, and nongenetic data (Gill et al. 2016). Existing statistical is deﬁned to be a piecewise constant function of time models have assumed that the effective population size over a grid with time increments h, and the value is has a linear or log-linear relationship with temporal estimated for each interval i. Time intervals do not in covariates. However in many cases, a more realistic general correspond to coalescent times in the genealogy. model would specify that the growth rate of effective In this case, the BM prior is computed over increments population size is correlated with covariates, as when of : for example an environmental variable impacts the m−1 replicative ﬁtness of a pathogen. We provide a similar p( |) ∝ p( − |), (2) extension of previous skyride models with covariate data 1:m i+1 i (Gill et al. 2016) to show how such data can be used i=1 to test hypotheses concerning their effect and, when a where signiﬁcant effect exists, to reﬁne estimates of both the growth rates and the effective population sizes. − ( − ) We illustrate the potential advantages of our growth i+1 i 2h p( − |) = e . i+1 i 2h rate model using a rabies virus data set that has been thoroughly studied using previous phylodynamic The genealogical data take the form G = (c ,s ), methods (Biek et al. 2007; Gill et al. 2016). In particular, we 1:(n−1) 1:n where c and s are respectively ordered coalescent times show how our model correctly estimates a recent decline (internal nodes of the genealogy) and sampling times in epidemic size whereas previous models mistakenly (terminal nodes of the genealogy). In the coalescent predict a stabilization of the epidemic prevalence. We framework, the sampling times are usually considered also apply our methodology to a genomic data set to be ﬁxed, so that p(s) = 1 and p(G) = p(c|s). Alternatively, of methicilin-resistant S. aureus (MRSA) that had not in some variations of this model, a prior p(s|Ne) is also formally been analyzed using phylodynamic methods provided for the sequence of sampling times, making (Uhlemann et al. 2014). We show how time series this approach similar to but more ﬂexible than sampling- on prescription rates of -lactam antibiotics correlate birth-death-models (Volz and Frost 2014; Karcher et al. strongly with growth and decline of the effective 2016). population size, revealing the impact of antibiotic use Given a genealogy, the posterior distribution of the on the emergence and spread of resistant bacterial parameters and is decomposed as: pathogens. 1:m p( ,|G) ∝ p(G| )p( |)p(). (3) 1:m 1:m 1:m METHODS AND MATERIALS The second term is given by Equation 2 and the last term We model effective population size through time as by the prior on . To assist with the deﬁnition of the ﬁrst a ﬁrst order autoregressive stochastic process on the term, we ﬁrst denote A(t) to be the number of extant growth rate. This provides an intuitive link between the lineages at time t: growth rate of effective population size of pathogens and epidemic size as well as the reproduction number of the n n−1 epidemic. We further show how to incorporate time- A(t) = I(s > t) − I(c > t), (4) varying environmental covariates into phylodynamic i i inference. i=1 i=1 where I(x) is an indicator function equal to one when x is true and equal to zero otherwise. The probability Previous Bayesian Nonparametric Phylodynamic Models density of the genealogical data given the population Several nonparametric phylodynamic models have size history is then equal to (Grifﬁths and Tavare 1:m been proposed based on BM processes and the Kingman 1994): coalescent genealogical model (Kingman 1982). In particular, the Bayesian nonparametric skyride model 2n−2 A(t ) i t A(t ) i+1 1 − − dt uses a BM prior to smooth trajectories of the logarithm 2 t 2 Ne(t) p(G| ) = I(t ∈ c ) e 1:m i i of the effective population size (Minin et al. 2008). Let Ne(t ) i+1 i=1 (t) = log(Ne(t)) denote the logarithm of the effective A(t ) i+1 1 population size as a function of time. The BM prior is − − dt t Ne(t) +(1 −I(t ∈ c ))e , i i deﬁned as: (5) (t +dt) ∼ (t) +N (0,dt/), (1) where t = c ∪s is the set union of sample where is an estimated precision parameter, for which 1:(2n−1) 1:(n−1) 1:n an uninformative Gamma prior is typically used. and coalescent times in descending order. [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 720 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 2018 VOLZ AND DIDELOT–GROWTH AND DECLINE OF EFFECTIVE POPULATION SIZE 721 Relationship Between the Growth Rate of Effective expected number of transmissions by an infected host infected at time t (Fraser 2007). Assuming that all infected Population Size and Epidemic Properties individuals are equally infectious (as is the case e.g., Several recent studies have investigated the in the SIR model), we have that during periods when relationship between the effective population size the epidemic growth rate is constant, each infected of a pathogen and the number of infected hosts individual transmits at rate (t) = R(t)/, where is (Rosenberg and Nordborg 2002; Koelle et al. 2011; the mean duration of infections. With these deﬁnitions, Dearlove and Wilson 2013). A simple link between these the number of infections Y(t) varies according to the quantities does not exist, since the relationship depends following differential equation: on how incidence and epidemic size change through time (Volz et al. 2009), population structure (Volz 2012), R(t) −1 Y(t) = Y(t) (9) and complex evolution of the pathogen within hosts (Didelot et al. 2016; Volz et al. 2017). Under idealized situations, there is however a simple relationship Combining Equations 7, 8, and 9 leads to the following between the growth rate of effective population size and approximate estimator for the reproduction number the growth rate of an epidemic (Frost and Volz 2010; through time: Volz et al. 2013). Ne(t) Let Y(t) and (t) denote the number of infected R(t) = 1 + (10) Ne(t) hosts and per-capita transmission rate, respectively, as functions of time. Note that (t) may depend on the This estimator makes use of the quantity Ne(t)/Ne(t) density of susceptible individuals in the population, as in which will be estimated in our model below. Equation the common susceptible-infected-removed (SIR) model, 10 is likely to be a good estimator over periods of in which case (t) ∝ S(t)/N (Allen 2008). The coalescent the epidemic where per-capita transmission rates are rate for an infectious disease epidemic was previously invariant. A special case of this occurs at the start derived under the assumption that within-host effective of an epidemic, in which case Equation 10 can be population size is negligible and that superinfection used to estimate the basic reproduction number R ,as does not occur (Volz et al. 2009; Frost and Volz 2010): previously noted (Pybus 2001). A(t) 2(t) (t) = . (6) 2 Y(t) A Growth Rate Prior for Effective Population Size Equating this rate with the coalescent rate under We propose a model in which the growth rate of A(t) the coalescent model (t) = /Ne(t)(Kingman 1982) the effective population size, as opposed to effective yields the following formula for the effective population population size itself, is an autoregressive process with size: stationary increments. This growth rate is deﬁned as: Y(t) Ne(t) = . (7) ˙ Ne(t) 2(t) (t) = . (11) Ne(t) Differentiating with respect to time (denoting with a dot superscript) yields: Note that (t) is a real-valued quantity, with negative and positive values respectively indicating an increase and ˙ ˙ Y(t) (t)Y(t) decrease in the effective population size. In particular, Ne(t) = − . (8) if the population is exponentially growing or declining 2(t) 2((t)) from t = 0, then we would have Ne(t) = Ne(0)exp(t)so Note that, in general the growth rate of the effective that (t) = at every time t ≥ 0. More generally, we model population size does not correspond to the growth (t) using a BM process: (t) ∼ BM() (cf Equation 1). To rate of Y, however if the per-capita transmission rate facilitate statistical inference, we work with a discretized ˙ ˙ ˙ ˙ is constant ( = 0), we have Ne = Y/(2) ∝ Y. Thus, we time axis with m intervals of length h as in the skygrid expect that over phases of the epidemic where per-capita model (Gill et al. 2013). We deﬁne the growth rate in transmission rates are nearly constant there will be close time interval i as: correspondence between the growth or decline of the Ne −Ne i+1 i effective population size and the growth or decline of = . (12) hNe the unobserved number of infected hosts. This condition i is often satisﬁed near the beginning of an outbreak We use the following approximate model for which has an exponential phase. It is also often satisﬁed p( | ): i+1 i towards the end of epidemics when the epidemic size is ∼ +N (0,h/). (13) i+1 i decreasing at a constant exponential rate. The basic reproduction number R describes the Note that Equation 12 implies ∈ (−1/h,∞) since Ne 0 i expected number of transmission events caused by a cannot decline below zero, whereas the approximate single infected individual in an otherwise susceptible model in Equation 13 assumes support on the entire real population. By extension, we can deﬁne R(t)asthe line. We have found performance with this approximate [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 721 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 722 SYSTEMATIC BIOLOGY VOL. 67 model to be superior to exact models on the log precision parameter . Metropolis-Hastings sampling transformation of Ne provided that h is small. is also performed for regression coefﬁcients if 1:q With the above deﬁnitions, the prior density of a covariate data are provided with univariate normal sequence is deﬁned in terms of the increments: proposals. The elements of are sampled in sequence 1:m 1:m (from past to present), and multiple Gibbs iterations m−2 (by default 100) are performed before updating other p( |) ∝ p( − |), (14) 1:m i+1 i parameters using Metropolis–Hastings steps. i=1 MAP is used as a starting point for the MCMC. The MAP estimator alternates between optimization of where 1:m using gradient descent (BFGS in R, Goldfarb 1970) and − ( − ) i+1 i univariate optimization of until convergence in the 2h p( − |) = e . i+1 i 2h posterior is observed. Approximate credible intervals are provided for the MAP estimator based on curvature of This equation can be compared with the skygrid density, the posterior around the optimum. Equation 2. RESULTS Incorporating Covariates into Phylodynamic Inference Simulations A simple model was recently proposed We evaluated the ability of the skygrowth model for incorporating time-varying covariates into to infer epidemic trends by simulating partially- phylodynamic inference with skygrid models (Gill sampled genealogies from a stochastic individual-based et al. 2016). Suppose we observe q covariates at m time susceptible-infected-removed (SIR) model. Simulated points denoted X = (X ), and such that observation 1:m,1:q data were generated using the BEAST2 package times correspond to the grid used in the phylodynamic MASTER (Vaughan and Drummond 2013), and model. The following linear model for the marginal code to reproduce simulated results is available at distribution of with covariate vector was proposed: 1:q https://github.com/emvolz/skygrowth-experiments. p( |X, , ) ∼ N ( +X , ), (15) The skygrowth model was also compared to skygrid model i 1:q 0 i,1:q 1:q as implemented in the phylodyn R package (Karcher where is the expected mean of without covariate et al. 2016, 2017) which estimates effective population effects. size using a fast approximate Bayesian nonparametric This implies, along with the BM model, the following reconstruction (BNPR). The SIR model was density marginal distribution of the increments: dependent with a reaction rate S(t)I(t) of generating new infections. Figure 1 shows results of a single p( − |X, ,, i+1 i 1:q simulation with R = 1.3 and 10,000 initial susceptible ∼ N (X −X ,h/ +2 ). (16) i+1,1:q 1:q i,1:q 1:q individuals. Additional simulations are shown in Supplementary Fig. S1 available on Dryad at When covariates are likely to be associated with http://dx.doi.org/10.5061/dryad.9qh7t9t. Estimates growth rates of the effective population size instead of with skygrowth were obtained using the MCMC the logarithm of the effective population size, we can algorithm and an Exponential(0.1) prior on the analogously deﬁne the density of increments of : precision parameter. We report the posterior means p( − |X, ,, ) from both skygrowth and skygrid BNPR. Genealogies i+1 i 1:q were reconstructed by sampling 200 or 1000 infected ∼ N (X −X ,h/ +2 ). (17) i+1,1:q 1:q i,1:q 1:q individuals at random from the entire history of the epidemic. In this scenario, both the skygrowth and When ﬁtting this model, we drop for simplicity (as in skygrid models reproduce the true epidemic trend, Gill et al. 2016), and estimate a single variance parameter capturing both the rate of initial exponential increase, along with the regression coefﬁcients . the time of peak prevalence, and the rate of epidemic decline. However, when sampling only 200 lineages (Fig. 1B), the genealogy contains relatively little Inference and Software Implementation information about later epidemic dynamics, and the Our growth rate model is implemented in an skygrid estimates an unrealistic levelling-off of Ne. open source R package called skygrowth , available Estimates using the skygrid BNPR model were highly from https://mrc-ide.github.io/skygrowth/, and similar to results using an exact MCMC algorithm for which includes both maximum a posteriori (MAP) and sampling the posterior also included in the phylodyn Bayesian Markov Chain Monte Carlo (MCMC) methods package. for model ﬁtting. While the results in Figure 1A and B suggest that Ne(t) The MCMC procedure uses a Gibbs-within- can serve as a very effective proxy for epidemic size, the Metropolis algorithm that alternates between sampling degree of correspondence will depend on details of the the growth rate vector and sampling of the epidemic model as discussed in the Methods section. 1:m [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 722 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 2018 VOLZ AND DIDELOT–GROWTH AND DECLINE OF EFFECTIVE POPULATION SIZE 723 A B Simulation (rescaled) Skygrowth BNPR FIGURE 1. Comparison of effective population size estimates using the skygrowth and skygrid models applied to data from a susceptible- infected-recovered simulated epidemic. Effective population size estimates are also compared to the number of infected hosts through time under a linear rescaling (red). a) Estimates using a SIR model and simulated genealogy with 1000 sampled lineages and R = 1.3. b) Estimates using a SIR model and simulated genealogy with 200 sampled lineages and R = 1.3. c) Estimates using a SIR model and simulated genealogy with 200 sampled lineages and R = 5. Figure 1C and Supplementary Fig. S2 available on Dryad These data were recently reanalyzed using the show a scenario where estimates of Ne(t) capture the skygrid model with covariates (Gill et al. 2016). No initial rate of exponential growth but fail to estimate the signiﬁcant association was found between Ne and V, time of peak epidemic prevalence, and the skygrid model but the authors noted that since V is the newly affected also fails to detect that the epidemic ever decreases. area, V would be expected to be associated with a change This scenario was based on a higher R = 5 and only 0 in Ne rather than Ne itself. Since the skyride method is 2000 initially susceptible individuals, such that almost focused on Ne, like all previous phylodynamic methods, all hosts are eventually infected and the rate of epidemic the authors considered the cumulative distribution of V decline predominantly reﬂects the host recovery rate. and showed that this is slightly associated with Ne [with This is easily understood using the formula Ne(t) ∝ a 95% credible interval of (0.18–2.86) on the covariate I(t)/S(t) (cf. Equation 7). When R is large, S(t) will effect size, Gill et al. 2016]. However, this approach is change drastically over the course of the epidemic. In not fully satisfactory. In particular, since V is always the later stages, almost all hosts have been infected so positive, the cumulative distribution of V is always that 1/S(t) is large, producing correspondingly large increasing, whereas Ne is in principle equally likely effective population sizes. There is a very slight signal to increase or decrease over time. Furthermore both V of decreasing growth rate which is detected shortly and its cumulative distribution were considered on a following epidemic peak using the skygrowth model. In logarithm scale, so that the latter ﬂattens over time by the absence of other information, the skygrowth model deﬁnition. retains this growth rate which produces estimates of A more natural solution is to keep the covariate V decreasing Ne(t). untransformed and investigate its association with the growth rate (t) rather than Ne(t) as implemented in our Rabies Virus methodology (Fig. 2). For this analysis, we used exactly An epidemic of rabies broke out in the late 1970s the same dated phylogeny as previously published in the North American raccoon population, following (Biek et al. 2007) (reproduced in Supplementary Fig. S3 the emergence of a host-adapted variant of the virus available on Dryad). When the covariate was not used called raccoon rabies virus (RRV). By the end of the (red results in Fig. 2), the growth rate was inferred to be 1990s, this outbreak had spread to a vast geographical positive but declining progressively to zero from 1973 to area including all Northeast and mid-Atlantic US states ∼1983, then stable around zero up to ∼1990, followed by (Childs et al. 2000). A sample of 47 RRV isolates has been a period of positive growth until ∼2000, after which the sequenced in a previous study (Biek et al. 2007), and growth rate decreased below zero. This implies that the BEAST (Drummond et al. 2012) was used to reconstruct effective population size increased from 1973 to ∼1983, a dated phylogenetic tree. A standard skyline analysis then was stable until ∼1990, increased to a peak in ∼1997 (Drummond et al. 2005) was performed, which visually and afterwards decreased. Two waves of spread have suggested a correlation between the inferred effective therefore been inferred as in previous analyses (Biek et al. population size (Ne) and the monthly area newly 2007; Gill et al. 2016), with the ﬁrst one starting in the affected by RRV (hereafter denoted V), but without 1970s and ending in ∼1983 and the second one lasting attempting to quantify the strength or signiﬁcance of this from ∼1990 to ∼1997. association. [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 723 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 724 SYSTEMATIC BIOLOGY VOL. 67 FIGURE 2. Results on the rabies application. Top: covariate data, representing the area in km newly affected by rabies recorded monthly between September 1978 and October 1999. Middle: growth rate estimates. Bottom: log effective population size estimates. The middle and bottom plots show results without (red) and with (blue) the use of the covariate data, and with a solid line indicating posterior means and shaded areas indicating the 95% credible regions. Unfortunately the covariate data V start in September previous estimate around R = 1.1 based on the same data 1978 and therefore do not cover the ﬁrst wave. However, (Biek et al. 2007). One of the main novel ﬁndings of our analysis is the covariate data show that the epidemic was spreading that we found a signiﬁcant decline of the effective very quickly between 1992 and 1997, much faster than population size of raccoon rabies post-2000, whereas before or after these dates, and this timing corresponds previous phylodynamic studies based on the same data fairly precisely to the second wave of spread. When found this to be constant (Biek et al. 2007; Gill et al. 2016). the covariate data were integrated into phylodynamic Previous methods consider a BM on the logarithm of inference, the covariate effect size was found to be Ne, which results in a strong prior that Ne is constant statistically signiﬁcant but only slightly so, with a large in recent time. In contrast, our model results in the 95% credible interval for the covariate effect size of growth rate being a priori constant, so that the clear (0.03–4.61) and posterior mean of 1.09. The reconstructed decline in growth rate started in the mid-1990s is likely growth rate and effective population size when using the to have continued to the point that the growth rate covariate data (blue results in Fig. 2) were compatible became negative and Ne declined. Our result is in good with results without covariate data. Using additional agreement with Centers for Disease Control surveillance informative data tighten the credible interval as would that shows a clear decline in rabid raccoons after the peak be expected, except in the second wave during which the in the mid-1990s (Monroe et al. 2016). covariate data suggest higher values for both the growth rate and effective population size. The mean posterior growth rate reached a value of about 2.5 per year in the Staphylococcus aureus USA300 1990s (Fig. 2) and the average generation time of raccoon rabies has previously been estimated to be around 2 Staphylococcus aureus is a bacterium that causes months (Biek et al. 2007). We can use Equation 10 to infer infections ranging from mild skin infections to a reproduction number of R = 1.4, slightly higher than a life-threatening septicemia. In the 1980s and 1990s, [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 724 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 2018 VOLZ AND DIDELOT–GROWTH AND DECLINE OF EFFECTIVE POPULATION SIZE 725 FIGURE 3. Results on the USA300 application. Top: covariate data, representing the consumption of -lactams between 1980 to 2012 in the USA, measured in standard units per 1000 population. Middle: growth rate estimates. Bottom: effective population size estimates. The middle and bottom plots show results without (red) and with (blue) the use of the covariate data, and with a solid line indicating posterior means and shaded areas indicating the 95% credible regions. several variants of S. aureus have emerged that are data (red results in Fig. 3) and found that the growth resistant to methicilin and other -lactam antibiotics, rate had been around zero up until 1985, after which and collectively called MRSA (Chambers and Deleo it steadily increased until ∼1995, and subsequently 2009). MRSA are well known as a leading cause of decreased almost linearly, becoming negative in ∼2002 hospital infections worldwide, but the MRSA variant and continuing to decrease afterwards. The effective called USA300 differs from most others by causing population size was accordingly found to have been very infections mostly in communities rather than hospitals. small until the mid-1990s, to have peaked in ∼2002 and USA300 was ﬁrst reported in 2000 and has since spread to have declined since. These results are in very good throughout the USA and internationally (Tenover and agreement with a phylodynamic analysis of USA300 Goering 2009; Challagundla et al. 2018). A recent study performed using a traditional skyline plot on a different sequenced the genomes from 387 isolates of USA300 genomic data set (Glaser et al. 2016), and epidemiological sampled from New York between 2009 and 2011, and data also suggest that USA300 may be declining (Planet reconstructed phylogeographic spread that frequently 2017). The overall MRSA incidence has declined by 31.2% involved transmission within households (Uhlemann in the USA between 2005 and 2011 (Dantes et al. 2013), et al. 2014). with some MRSA lineages showing encouraging The USA300 phylogenetic tree (Uhlemann et al. signs of reverting to methicilin susceptibility 2014) was dated using a previously described method (Ledda et al. 2017). (Didelot et al. 2012) and a clock rate of ∼3 substitutions We hypothesized that the dynamics of USA300 may be per year for USA300 (Uhlemann et al. 2014; Alam et al. driven by the consumption of -lactams in the USA, and 2015). We analyzed the resulting dated phylogeny we therefore gathered data on this from three different (Supplementary Fig. S4 available on Dryad) using sources covering respectively the periods between 1980 our phylodynamic methodology (Fig. 3). We initially and 1992 (McCaig and Hughes 1995), between 1992 performed this analysis without the use of any covariate and 2000 (McCaig et al. 2003), and between 2000 and [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 725 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 726 SYSTEMATIC BIOLOGY VOL. 67 2012 (CDDEP 2017). There ﬁrst and second sources the molecular clock rate of USA300 is approximately 3 overlapped in the year 1992, and the second and third substitutions per year (Uhlemann et al. 2014; Alam et al. sources overlapped in the year 2000. We used these 2015), the average duration of infections in this outbreak 2 years of overlap to scale the data for consistency is therefore around = 1.4 ×2/3 = 0.93 year. In the ﬁrst between the three sources. Speciﬁcally, the values from half of the 1990s, the growth rate peaked around 1 per the second source (McCaig et al. 2003) were scaled so year (Fig. 3) and using Equation 10 we estimate that that the 2000 value is equal to the one in the third the reproduction number was around R = 1.93, which source (CDDEP 2017), and values from the ﬁrst source is in good agreement with the recent estimate R = 1.5for (McCaig and Hughes 1995) were then scaled so that the MRSA in the US population (Hogea et al. 2014). The fact 1992 value is equal to the one in the previously rescaled that this estimate is only modestly above the minimum second source. The ﬁnal rescaled data are therefore threshold of R = 1 required for outbreaks to take place measured as in the third source, namely in standard could help explain why the USA300 is declining, even units of -lactams (i.e., narrow-spectrum and broad though -lactams are still widely used. The consumption spectrum penicilins plus cephalosporins) consumed per level may have lowered below the threshold caused by 1000 population in the USA (CDDEP 2017). These data the ﬁtness cost of resistance, as previously discussed for show that the consumption of -lactams almost doubled other resistant bacteria (Dingle et al. 2017; Whittles et al. between 1980 and 1991 and subsequently decreased 2017). to reach around 2010 levels comparable to the early 1980s (Fig. 3). These trends on -lactams consumption therefore appear to be very similar to the ones observed DISCUSSION for the USA300 growth rate without the use of covariates (red results in Fig. 3). To conﬁrm this observation, we Many environmental covariates, particularly those with a mechanistic inﬂuence on replicative ﬁtness of repeated our phylodynamic analysis with integration pathogens, are closely related to the growth rate of of the -lactam use as a covariate (blue results in epidemic size but not necessarily related to absolute Fig. 3). We found that the covariate was signiﬁcantly epidemic size. We have found that these relationships associated with growth rate, with a mean posterior can be inferred from random samples of pathogen effect of 0.48% and 95% credible interval (0.18–0.71). The genetic sequences by relating environmental covariates growth rate dynamics inferred when using covariate to the growth rate of the effective population size. data were almost identical to those inferred without This enables the estimation of the ﬁtness effect of the use of covariate data, except for a clear reduction environmental covariates as well as the prediction of of the width of the intervals which reﬂects the gain in future epidemic dynamics should conditions change. We information when combining two independent types have found a clear and highly signiﬁcant relationship of data. USA300 is also partly resistant to macrolides between the growth and decline of community- and quinolones, but the consumption of these antibiotics associated MRSA USA300 and the population-level increased in the USA throughout the 1990s (McCaig et al. prescription rates of -lactam antibiotics (Fig. 3). 2003), and the subset of sensitive genomes (31.9% for This relationship is not apparent when comparing ciproﬂoxacin and 6.3% for erythromycin) did not exhibit antibiotic usage directly with the effective population different phylodynamic properties compared to resistant size of MRSA USA300. Our methodology focused on genomes (Uhlemann et al. 2014), so that these antibiotics growth rate is therefore well suited to investigate the could not explain the USA300 growth rate dynamics. drivers of antibiotic resistance, compared to previous Our analysis therefore suggests that the rise in phylodynamic methods focused on the effective -lactams consumption in the 1980s was responsible for the emergence of the highly successful USA300 population size. lineage. From the mid-1990s, the use of -lactams has The skygrowth model can provide a more realistic prior declined, both due to an overall reduction in antibiotic for many infectious disease epidemics where the growth use and a diversiﬁcation of the type of antibiotics rate of epidemic size is likely to approach stationarity as opposed to the absolute effective population size. prescribed (McCaig et al. 2003; CDDEP 2017), and the Conventional skyride and skygrid models are prone to growth rate of USA300 has consequently decreased. erroneously estimating a stable effective population Importantly, the consumption of antibiotics is expected size when genealogical data are uninformative, as for to be associated with the growth rates of resistant example when estimating epidemic trends in the latter bacterial pathogens, rather than with their effective stages of SIR epidemics (Fig. 1). The skygrowth model population sizes, which here is not at all correlated with will correctly predict epidemic decline in this situation. the covariate (Fig. 3). Amongst pairs of genomes sampled Moreover, under ideal conditions, the estimated growth from the same hosts, the distribution of genomic distance rate can be related to the reproduction number of an had a mean of 1.4 substitution (Uhlemann et al. 2014). epidemic, and the skygrowth model provides a simple If we assume that sampling occurred on average in the nonparametric estimator of the reproduction number middle of the carriage duration (i.e., /2 time after through time given additional information about the infection), the evolutionary time separating the two natural history of infection (Equation 10). Caution genomes is between 0 depending on the level of should be exercised when using the effective population within-host genetic drift (Didelot et al. 2016). Given that [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 726 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 2018 VOLZ AND DIDELOT–GROWTH AND DECLINE OF EFFECTIVE POPULATION SIZE 727 size as a proxy for epidemic size, as the relationship Baele G., Suchard, M. A., Rambaut A., Lemey P. 2016. Emerging concepts of data integration in pathogen phylodynamics. Syst. Biol. between the two is complex (cf. Simulation results). 66:e47–e65. In general, there will be close correspondence between Biek R., Henderson J.C., Waller L.A., Rupprecht C.E., Real L.A. 2007. the growth of epidemic size and growth of effective A high-resolution genetic signature of demographic and spatial population size during periods where the growth rate expansion in epizootic rabies virus. Proc. Natl. Acad. Sci. USA 104:7993–8. is relatively constant. Bonhoeffer S., Lipsitch M., Levin B.R. 1997. Evaluating treatment The methods presented here can be applied more protocols to prevent antibiotic resistance. Proc. Natl. Acad. Sci. USA generally to evaluate the role of antibiotic stewardship, 94:12106–12111. vaccine campaigns, or other public health interventions CDDEP. 2017. The Center for Disease Dynamics Economics on epidemic growth rates. Some environmental and Policy. ResistanceMap. Available from: URL http://resistancemap.cddep.org/ (accessed July 2017). covariates, such as independent prevalence estimates, Challagundla L., Luo X., Tickler I.A., Didelot X., Coleman D.C., Shore may be more closely related to effective population size A.C., Coombs G.W., Sordelli D.O., Brown E.L., Skov R., Larsen, R., rather than growth rates, and future work is indicated Reyes J., Robledo I.E., Vazquez G.J., Rivera R., Fey P.D., Stevenson K., on the development of regression models in terms Wang S.-H., Kreiswirth B.N., Mediavilla J.R., Arias C.A., Planet P.J., Nolan R.L., Tenover F.C., Goering R.V., Robinson D.A. 2018. Range of both statistics. More complex stochastic models expansion and the origin of USA300 North American epidemic can also be considered, such as processes with both methicillin-resistant Staphylococcus aureus. MBio 9:e02016–17. autoregressive and moving average components. A Chambers H.F., Deleo F.R. 2009. Waves of resistance: Staphylococcus variety of mathematical models have been developed aureus in the antibiotic era. Nat. Rev. Microbiol. 7:629–41. to explain de novo evolution of antimicrobial resistance Childs J.E., Curns A.T., Dey M.E., Real L.A., Feinstein L., Bjornstad O.N., Krebs J.W. 2000. Predicting the local dynamics of epizootic as a function of population-level antimicrobial usage rabies among raccoons in the United States. Proc. Natl. Acad. Sci. (Bonhoeffer et al. 1997; Austin et al. 1999; Spicknall et al. USA 97:13666–13671. 2013; Whittles et al. 2017), and an important direction Dantes R., Mu Y., Belﬂower R., Aragon D., Dumyati G., Harrison L.H., for future work will be the development of parametric Lessa F.C., Lynﬁeld R., Nadle J., Petit S., Ray S.M., Schaffner W., Townes, J., Fridkin S. 2013. National burden of invasive methicillin- and semiparametric structured coalescent models (Volz resistant Staphylococcus aureus infections, United States, 2011. JAMA 2012) that can be applied to bacterial phylogenies Intern. Med. 173:1970–1979. featuring a mixture of antibiotic sensitive and resistant de Silva E., Ferguson N.M., Fraser C. 2012. Inferring pandemic growth lineages. This methodology will allow us to estimate rates from sequence data. J. R. Soc. Interface 9:1797–1808. key evolutionary parameters, such as the ﬁtness cost Dearlove B., Wilson D. 2013. Coalescent inference for infectious disease: meta-analysis of hepatitis C. Philos. Trans. R. Soc. B Biol. Sci. and beneﬁt of resistance, or the rate of mutation from 368:20120314. sensitive to resistant status, which are needed to make Didelot X., Eyre D.W., Cule M., Ip C.L.C., Ansari M.A., Grifﬁths well-informed recommendations on resistance control D., Vaughan A., O’Connor L., Golubchik T., Batty E.M., Piazza strategies. P., Wilson D.J., Bowden R., Donnelly P.J., Dingle K.E., Wilcox M., Walker A. S., Crook D. W., Peto T. E., Harding R.M. 2012. Microevolutionary analysis of Clostridium difﬁcile genomes to investigate transmission. Genome Biol. 13:R118. Didelot X., Walker A.S., Peto T.E., Crook D.W., Wilson D.J. 2016. SUPPLEMENTARY MATERIAL Within-host evolution of bacterial pathogens. Nat. Rev. Microbiol. Data available from the Dryad Digital Repository: 14:150–162. Dingle K.E., Didelot X., Quan T.P., Eyre D.W., Stoesser N., Golubchik http://dx.doi.org/10.5061/dryad.9qh7t9t. T., Harding R.M., Wilson D.J., Grifﬁths D., Vaughan A., Others. 2017. Effects of control interventions on Clostridium difﬁcile infection in England: an observational study. Lancet Infect. Dis. 17:411–421. FUNDING Drummond A.J., Rambaut A., Shapiro B., and Pybus O.G. 2005. Bayesian coalescent inference of past population dynamics from This work was supported by the National Institute molecular sequences. Mol. Biol. Evol. 22:1185–92. of General Medical Sciences [U01 GM110749 to E.M.V.]; Drummond A.J., Suchard M.A., Xie D., Rambaut A. 2012. Bayesian and the Medical Research Councils Centre for Outbreak phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. Analysis and Modelling [MR/K010174]. 29:1969–1973. Fraser C. 2007. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One 2:e758. Frost S.D.W., Volz E.M. 2010. Viral phylodynamics and the search for REFERENCES an ’effective number of infections’. Philos. Trans. R. Soc. B 365:1879– Alam M.T., Read T.D., Petit R.A., Boyle-Vavra S., Miller L.G., Eells S.J., Gill M.S., Lemey P., Bennett S.N., Biek R., Suchard M.A. 2016. Daum R.S., David M.Z. 2015. Transmission and microevolution of Understanding past population dynamics: Bayesian coalescent- USA300 MRSA in U.S. households: evidence from whole-genome based modeling with covariates. Syst. Biol. 65:1041–1056. sequencing. MBio 6:1–10. Gill M.S., Lemey P., Faria N.R., Rambaut A., Shapiro B., Suchard Allen L. 2008. An introduction to stochastic epidemic models. M.A. 2013. Improving Bayesian population dynamics inference: In: Jean-Michel Morel, Bernard Teissier, editors, Mathematical a coalescent-based model for multiple loci. Mol. Biol. Evol. epidemiology. Lecture Notes in Mathematics. vol. 1945. Berlin: 30:713–724. Springer. p. 81–130. Glaser P., Martins-Simões P., Villain A., Barbier M., Tristan A. Austin D.J., Kristinsson K.G., Anderson R.M. 1999. The relationship Bouchier C., Ma L., Bes M., Laurent F., Guillemot D., Wirth T., between the volume of antimicrobial consumption in human Vandenesch F. 2016. Demography and intercontinental spread of the communities and the frequency of resistance. Proc. Natl. Acad. Sci. USA300 community-acquired methicillin-resistant Staphylococcus USA 96:1152–1156. aureus lineage. MBio 7:1–11. [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 727 719–729 Downloaded from https://academic.oup.com/sysbio/article/67/4/719/4844079 by DeepDyve user on 12 July 2022 728 SYSTEMATIC BIOLOGY VOL. 67 Goldfarb D. 1970. A family of variable-metric methods derived by Pybus O.G. 2001. The epidemic behavior of the hepatitis C virus. variational means. Math. Comput. 24:23–26. Science 292:2323–2325. Grifﬁths R., Tavare S. 1994. Sampling theory for neutral alleles in a Pybus O.G., Rambaut A., Harvey P.H. 2000. An integrated framework varying environment. Philos. Trans. R. Soc. B Biol. Sci. 344:403–410. for the inference of viral population history from reconstructed Ho S.Y., Shapiro B. 2011. Skyline-plot methods for estimating genealogies. Genetics 155:1429–1437. demographic history from nucleotide sequences. Mol. Ecol. Resour. Rosenberg N.A., Nordborg M. 2002. Genealogical trees, coalescent 11:423–434. theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. Hogea C., Van Effelterre T., Acosta C.J. 2014. A basic dynamic 3:380–90. transmission model of Staphylococcus aureus in the US population. Spicknall I.H., Foxman B., Marrs C.F., Eisenberg J.N.S. 2013. A Epidemiol. Infect. 142:468–478. modeling framework for the evolution and spread of antibiotic Karcher M.D., Palacios J.A., Bedford T., Suchard M.A., Minin, V.N. 2016. resistance: literature review and model categorization. Am. J. Quantifying and mitigating the effect of preferential sampling on Epidemiol. 178:508–520. phylodynamic inference. PLoS Comput. Biol. 12:e1004789. Strimmer K., Pybus O.G. 2001. Exploring the demographic history of Karcher M.D., Palacios J.A., Lan S., Minin V.N. 2017. phylodyn: an DNA sequences using the generalized skyline plot. Mol. Biol. Evol. R package for phylodynamic simulation and inference. Mol. Ecol. 18:2298–2305. Resour. 17:96–100. Tenover F.C., Goering R.V. 2009. Methicillin-resistant Staphylococcus Kingman J. 1982. The coalescent. Stoch. Process. Appl. 13:235–248. aureus strain USA300: origin and epidemiology. J. Antimicrob. Koelle K., Ratmann O., Rasmussen D.A. Pasour V., Mattingly J. Chemother. 64:441–446. 2011. A dimensionless number for understanding the evolutionary Uhlemann A.-C., Dordel J., Knox J.R., Raven K.E., Parkhill J., Holden dynamics of antigenically variable RNA viruses. Proc. R. Soc. B Biol. M.T.G., Peacock S.J., Lowy F.D. 2014. Molecular tracing of the Sci. 278:3723–3730. emergence, diversiﬁcation, and transmission of S. aureus sequence Ledda A., Price J.R., Cole K., Llewelyn M.J., Kearns A.M., Crook D.W., type 8 in a New York community. Proc. Natl. Acad. Sci. USA Paul J., Didelot X. 2017. Re-emergence of methicillin susceptibility 111:6738–43. in a resistant lineage of Staphylococcus aureus. J. Antimicrob. Vaughan T.G., Drummond A.J. 2013. A stochastic simulator of birth- Chemother. 72:1285–1288. death master equations with application to phylodynamics. Mol. McCaig L.F., Besser R.E., Hughes J.M. 2003. Antimicrobial drug Biol. Evol. 30:1480–1493. prescription in ambulatory care settings, United States, 1992–2000. Volz E.M. 2012. Complex population dynamics and the coalescent Emerg. Infect. Dis. 9:432–437. under neutrality. Genetics 190:187–201. McCaig L.F., Hughes J.M. 1995. Trends in antimicrobial drug Volz E.M., Frost S.D.W. 2014. Sampling through time and prescribing among ofﬁce-based physicians in the United States. J. phylodynamic inference with coalescent and birth – death models. Am. Med. Assoc. 273:214–219. J. R. Soc. Interface 11:20140945. Minin V.N., Bloomquist E.W., Suchard M.A. 2008. Smooth skyride Volz E.M., Koelle K., Bedford T. 2013. Viral phylodynamics. PLoS through a rough skyline: Bayesian coalescent-based inference of Comput. Biol. 9:e1002947. population dynamics. Mol. Biol. Evol. 25:1459–1471. Volz E.M., Kosakovsky Pond S.L., Ward M.J., Leigh Brown A. J., Monroe B., Yager P., Blanton J., Birhane M., Wadhwa A., Orciari, L., Frost S.D.W. 2009. Phylodynamics of infectious disease epidemics. Petersen B., Wallace R. 2016. Rabies surveillance in the United States Genetics 183:1421–30. during 2014. J. Am. Vet. Med. Assoc. 248:777–788. Volz E.M., Romero-Severson E., Leitner T. 2017. Phylodynamic Palacios J.A., Minin V.N. 2013. Gaussian process-based Bayesian inference across epidemic scales. Mol. Biol. Evol. nonparametric inference of population size trajectories from gene 34:1276–1288. genealogies. Biometrics 69:8–18. Whittles L., White P., Didelot X. 2017. Estimating the ﬁtness cost and Planet P.J. 2017. Life after USA300: the rise and fall of a superbug. J. beneﬁt of ceﬁxime resistance in Neisseria gonorrhoeae to inform Infect. Dis. 215:S71–S77. prescription policy: a modelling study. PLoS Med. 14:e1002416. [19:31 4/6/2018 Sysbio-OP-SYSB180007.tex] Page: 728 719–729
Systematic Biology – Oxford University Press
Published: Jul 1, 2018
Access the full text.
Sign up today, get DeepDyve free for 14 days.