Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Phylogenetic Tools for Generalized HIV-1 Epidemics: Findings from the PANGEA-HIV Methods Comparison

Phylogenetic Tools for Generalized HIV-1 Epidemics: Findings from the PANGEA-HIV Methods Comparison Viral phylogenetic methods contribute to understanding how HIV spreads in populations, and thereby help guide the design of prevention interventions. So far, most analyses have been applied to well-sampled concentrated HIV-1 epi- demics in wealthy countries. To direct the use of phylogenetic tools to where the impact of HIV-1 is greatest, the Phylogenetics And Networks for Generalized HIV Epidemics in Africa (PANGEA-HIV) consortium generates full-genome viral sequences from across sub-Saharan Africa. Analyzing these data presents new challenges, since epidemics are principally driven by heterosexual transmission and a smaller fraction of cases is sampled. Here, we show that viral phylogenetic tools can be adapted and used to estimate epidemiological quantities of central importance to HIV-1 prevention in sub-Saharan Africa. We used a community-wide methods comparison exercise on simulated data, where participants were blinded to the true dynamics they were inferring. Two distinct simulations captured generalized HIV-1 epidemics, before and after a large community-level intervention that reduced infection levels. Five research groups participated. Structured coalescent modeling approaches were most successful: phylogenetic estimates of HIV-1 inci- dence, incidence reductions, and the proportion of transmissions from individuals in their first 3 months of infection correlated with the true values (Pearson correlation> 90%), with small bias. However, on some simulations, true values were markedly outside reported confidence or credibility intervals. The blinded comparison revealed current limits and strengths in using HIV phylogenetics in challenging settings, provided benchmarks for future methods’ development, and supports using the latest generation of phylogenetic tools to advance HIV surveillance and prevention. Key words: HIV transmission and prevention, molecular epidemiology of infectious diseases, viral phylogenetic methods validation. Introduction Incorporating these strategies into routine care services and Recent breakthroughs in human immunodeficiency virus delivering on the commitment to end the HIV-1 epidemic by type 1 (HIV-1) prevention and treatment have provided a 2030 remains a major challenge (UNAIDS 2014), particularly range of tools to reduce HIV-1 transmission (WHO 2015). in sub-Saharan Africa where the burden of HIV-1 is greatest. The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons. org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is Open Access properly cited. Mol. Biol. Evol. 34(1):185–203 doi:10.1093/molbev/msw217 Advance Access publication October 7, 2016 185 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE This region suffers 75% of all new HIV-1 infections worldwide, reductions in HIV-1 incidence over a short period. Viral phy- with adult HIV-1 prevalence exceeding 25% in some regions, logenetics could be an effective tool to measure similar re- and averaging 5% overall (UNAIDS 2015). To sustain public ductions, especially in contexts where incidence cohorts do health interventions at this scale with limited resources, a not exist, and thereby contribute to monitoring the impact of sufficiently detailed understanding of the local and regional prevention strategies. First, participants were asked to esti- drivers of HIV-1 spread is often indispensable. Universal pre- mate recent reductions in HIV-1 incidence resulting from a vention packages (Iwuji et al. 2013; Hayes et al. 2014) benefit simulated community-based intervention over a 3- to 5-year from data that allows monitoring incidence trends and driv- period. Here, incidence was defined as the proportion of new ers of residual spread, whereas more targeted prevention cases per year among uninfected adults, and reductions in approaches (Vassall et al. 2014) by definition require a de- incidenceweremeasuredinterms of the incidence ratiobe- tailed knowledge of at-risk populations. fore and after the intervention. Second, it has been debated The Phylogenetics And Networks for Generalized HIV whether frequent transmission during the early acute phase Epidemics in Africa (PANGEA-HIV) consortium aims to pro- of HIV infection could undermine the impact in reducing vide viral sequence data from across sub-Saharan Africa, and incidence of universal test and treat (Cohen et al. 2012). In to evaluate their viral phylogenetic relationship as a marker of concentrated epidemics, viral phylogenetics based on partial recent HIV-1 transmission dynamics (Pillay et al. 2015). pol sequences have been used to provide estimates of the Previous molecular epidemiological studies indicate that proportion of transmissions arising from individuals in their this approach can characterize transmission landscapes first year of infection (Volz et al. 2013; Ratmann et al. 2016). across a diverse array of epidemic contexts in order to guide Here, we sought to evaluate whether viral phylogenetics prevention efforts (Fisher et al. 2010; Kouyos et al. 2010; von based on full-genome sequences can provide accurate esti- Wyl et al. 2011; Stadler et al. 2013; Volz et al. 2013; Grabowski mates of the proportion of transmissions from individuals in et al. 2014; Bezemer et al. 2015; Ratmann et al. 2016). Rather early and acute HIV (defined here as in their first 3 months of than the partial gene sequences frequently used, the consor- infection), because these are likely not preventable in current tium is generating near full-length HIV-1 sequences in order prevention trials where testing intervals are 1 year or more to further increase the resolution and power of viral phylo- (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014). Third, genomic methods (Dennis et al. 2014). Indeed, such increases as sequence data are now collected as part of HIV-1 preven- in power are needed to disentangle signal from noise in ep- tion trials (HPTN 071 (PopART) Phylogenetics Protocol Team idemic settings with frequent co-infection and recombination 2015; Novitsky et al. 2015), different approaches to prospec- events (Grabowski et al. 2014), and to shift focus to recent tive sequence sampling have emerged. Sequences could be transmission dynamics (Dennis et al. 2014). collected at high coverage in villages or smaller townships at Available viral phylogenetic techniques can provide esti- the risk of missing long-range transmissions, or at lower cov- mates of key epidemiological quantities of concentrated erage over geographically much larger areas. We sought to HIV-1 epidemics (Brenner et al. 2007; Fisher et al. 2010; compare the impact of these sampling strategies on viral Stadler and Bonhoeffer 2013; Volz et al. 2013; Bezemer phylogenetic analyses by simulating epidemics in village and et al. 2015; Ratmann et al. 2016). But the generalized epi- larger regional populations, and sampling sequences at high demics in sub-Saharan Africa and sequence availability in and low coverage respectively. Other objectives included eval- these resource-poor settings differ fundamentally from uating the benefit of using concatenated HIV-1 sequences well sampled concentrated epidemics in wealthy countries, comprising simulated pol, gag and env genes, as compared where viral phylogenetic tools have been proven to be most with using simulated pol sequences alone, and the impact of effective to date (Dennis et al. 2014). To strengthen the frequent viral introductions into the modeled population as a application of viral phylogenetics in sub-Saharan Africa, in result of long-distance transmission. Table 1 describes the October 2014 PANGEA-HIV invited research groups to par- objectives and reporting variables of the exercise more fully. ticipate in a blinded methods comparison exercise. Two Five external research groups participated in the exercise, individual-level HIV epidemic models were used to simu- out of eight teams that initially indicated interest. Table 2 lists late generalized HIV-1 epidemics. From these, we gener- the phylogenetic methods that were used: the ABC-kernel ated corresponding viral sequence datasets comprising method (A.Poon, J. Joy,R.Liang; teamVancouver)(Poon simulated pol, gag and env genes (which we refer to as 2015), the birth-death skyline method with sampled ancestors full genome sequences for brevity), as well as basic (C. Weis, G.E. Leventhal, D. Ku ¨hnert,D.A.Rasmussen, T. Stadler; individual-level epidemiological data on those infected team Basel-Zu ¨rich) (Gavryushkina et al. 2014; Ku ¨hnert et al. individuals that were sequenced in the simulations. 2016), a metapopulation coalescent approach (B. Dearlove, M. External research groups then analyzed the blinded data. Hossain, S. Frost; team Cambridge) (Dearlove and Wilson 2013), Overall, we aimed to evaluate if the most recent genera- thestructuredcoalescent(E. Volz,M.Hossain,S.Frost;team tion of viral phylogenetic tools could be adapted and used to Cambridge-London) (Volz et al. 2009), and a Bayesian trans- estimate epidemiological quantities of central importance to mission chain analyser (C. Colijn, M. Kendall, X. Didelot, G. HIV-1 prevention in sub-Saharan Africa. The specific objec- Plazotta; team London) (Didelot et al. 2014). These methods tives were inspired by current HIV-1 prevention trials in sub- differed in the underlying transmission and intervention mod- Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. els, assumptions to facilitate estimation of the reporting vari- 2014). The primary goal of these trials is to achieve substantial ables, and computational estimation routines. Here, we 186 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Table 1. Aims of the PANGEA Phylodynamic Methods Comparison Exercise. Objectives Reporting Variable Primary objectives 1 Identify incident trends during the intervention Consider the year t before the intervention started, and the second last year t of the s e simulation. Participants were asked to report HIV-1 incidence trends from t to t in s e terms of “declining”, “stable”, “increasing” 2 Estimate HIV-1 incidence after the intervention Participants were asked to report %Incidence defined as %INCðt Þ¼ INCðt Þ=Sðt Þ, e e e where INCðt Þ is the number of new cases in year t , and Sðt Þ is the number of e e e sexually active individuals that were not infected in year t 3 Quantify the reduction in HIV-1 incidence at the Participants were asked to report the incidence ratio %INCðt Þ=%INCðt Þ e s end of the intervention 4 Estimate the proportion of transmissions from Participants were asked to report the proportion of new cases in year t from indi- early and acute HIV, just before the intervention viduals in their first 3 months of infection 5 Estimate the proportion of transmissions from Participants were asked to report the proportion of new cases in year t from indi- early and acute HIV, after the intervention: viduals in their first 3 months of infection Secondary objectives To estimate the impact of the following controlled covariates on the reporting variables: 6 Availability of full genome sequences (HIV-1 gag, pol and env genes) as compared with partial sequences (HIV-1 pol gene only) 7 Sequence sampling frame: Sequence coverage at the end of the simulation; Rapid increases in sequence coverage; Sampling duration after intervention start 8 Frequency of viral introductions into the modeled study population 9 Inference of dated viral phylogenies from sequence data summarizethe findings of theexercise, and discuss their impli- simulations, a combination prevention intervention was cations for using phylogenetic methods to estimate recent as- started in 2015 for 3 years at varying degrees of uptake and pects of HIV-1 transmission dynamics in generalized epidemics. coverage, resulting in 30% or 60% reductions in incidence Datasets and simulations generated here may be of use for relative to the start of the intervention, when incidence was testing other applications of viral phylogenetic methods, and close to 2% per year. In half of the 20 simulations, the pro- are made available alongside this article. portion of early transmissions in 2015 was respectively cali- brated to 10% and 40% (fig. 2). Ranges in incidence reduction reflect modeled, optimistic and pessimistic scenarios in on- Results going prevention trials in sub-Saharan Africa (Iwuji et al. 2013; PANGEA-HIV Reference Datasets for Benchmarking Moore et al. 2013; Hayes et al. 2014). The proportion of trans- Molecular Epidemiological Transmission Analysis missions from early and acute HIV has been challenging to estimate without sequence data, and the ranges used here Methods reflect estimates from several settings in sub-Saharan Africa The simulations capture a variety of transmission and inter- (Cohen et al. 2012). About 5–20% of all transmissions per year vention scenarios across two demographic settings in sub- occurred from outside the model population, which hindered Saharan Africa, and are available from https://dx.doi.org/10. prevention efforts in the simulations through continual re- 6084/m9.figshare.3103015 (last accessed October 14, 2016). plenishment of the epidemic. 20 datasets correspond to generalized HIV-1 epidemics in a 13 simulated datasets capture generalized HIV-1 epidemics region of 80,000 individuals between 1980 and 2020 (table over 45 years in a smaller village population of 8,000 indi- 3). The proportion of infected individuals of whom one se- viduals (table 3). Sequence coverage was higher in this smaller quence was sampled (sequence coverage) was 8–16% by the population, 25–50% by the end of the simulation. These data end of the simulation. These data were simulated under the were simulated under an individual-based household model individual-based HPTN071 (PopART) model, version 1.1, de- using the Discrete Spatial Phylo Simulator for HIV, developed veloped at Imperial College London (“Regional” model). The at the University of Edinburgh (“Village” model). Model com- overall simulation pipeline and model components are illus- ponents are illustrated in figure 1, and further information is trated in figure 1, and further information is provided in sup provided in supplementary table S2, Supplementary Material plementary table S1, Supplementary Material online. The online. The Village model was parameterized to simulate an Regional model was calibrated to generate an epidemic HIV-1 epidemic mostly contained within a small rural African with a comparable prevalence atthe startofthe intervention village, with a peak prevalence of 20–25% and peak incidence to that seen currently in HPTN071 (PopART) trial sites in of 5–7% without treatment (fig. 2). In 12 out of 13 simula- South Africa (Hayes et al. 2014). In the model, standard of tions, a community-level intervention providing antiretroviral care improved according to national guidelines over time, treatment took place for the last 5 years of the simulation. resulting in steady declines in incidence. In 18 of the 20 187 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Table 2. Phylogenetic Methods Used in the PANGEA Phylodynamic Methods Comparison Exercise. Team Team Members Method Model-based Model Overview Simulated Data Used To Fitting Process Availability analysis Inform Inference Basel-Zu ¨ rich C. Weis, G.E. Birth–death sky- Yes Stochastic birth–death model with sampled ances- All sequences and full Markov Chain http://beast2.org/ Leventhal, line method tors to estimate incidence and incidence reduc- trees to estimate Monte Carlo (last accessed D. Ku ¨ hnert, with sampled tions, and multi-type birth death model birth–death parame- October 14, 2016) D.A. ancestors corresponding to two stages of infection to esti- ters; cross-sectional using add-ons Rasmussen, mate the proportion of early transmissions. Time survey data bdsky, SA, bdmm T. Stadler trends in parameters were modeled with serial time intervals during which parameters were assumed constant. Viral introductions were not modeled Cambridge B. Dearlove, Meta-population Yes Standard SI, SIS and SIR models were averaged. Model All sequences and full Markov Chain http://beast.bio.ed. M. Hossain, coalescent parameters did not change over time. Viral intro- trees. Monte Carlo ac.uk/ (last S. Frost approach ductions were not modeled. accessed October 14, 2016) using XML specification described in (Dearlove and Wilson 2013) Cambridge- E. Volz, M. Structured Yes Deterministic compartment model stratified by gen- All sequences and sub- Parallel Markov http://colgem.r- London Hossain, coalescent der, disease progression, diagnosis and treatment trees including all in- Chain Monte forge.r-proj- S. Frost status, risk behavior. Time trends in baseline ternal nodes 30 before Carlo ect.org/ (last transmission rates were modeled with 4-parameter the last sample; cross- accessed October generalized logistic function. Diagnosis and treat- sectional survey data; 14, 2016) ment uptake rates changed at intervention start. and gender and CD4 Viral introductions were modeled with a source count at time of diag- deme. nosis for Regional datasets. London C. Colijn, Bayesian trans- Yes Stochastic generalized branching model with gener- All sequences and full Reversible-jump https://github.com/ M. Kendall, mission chain ation time modeled to represent three infection trees on village data- Markov Chain xavierdidelot/ G. Plazotta, analyzer stages. Model parameters did not change over time. sets; sequences in Monte Carlo TransPhylo (last X. Didelot Viral introductions were not modeled. trees with at least 80 accessed October tips on regional 14, 2016) PANGEA datasets. release available from authors Vancouver A. Poon, J. Joy, ABC kernel Yes Deterministic compartment model stratified by in- All sequences and full Approximate https://github.com/ R. Liang method fection status, three stages of infection, and risk trees. Bayesian ArtPoon/kamphir behavior. Model parameters did not change over Computation (last accessed time. Viral introductions were modeled with a October 14, 2016) source deme. PANGEA release available from authors Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Table 3. Simulated Datasets of the Phylodynamic Methods Comparison Exercise. a,b Model Dataset Purpose %Acute Intervention Viral Sequences Sequence Sequences Sampling a,c (Low¼L, Scale Up Introduction- (#) Coverage in the After Duration a,d High¼H) (Fast¼F, s (% of All Last Year of the Intervention After a,e f Slow¼S) Transmission- Simulation (% Start (% of All Intervention s per Year) of All Infected Sequences) Start (Years) and Alive) Regional D Identify 60% reduction in incidence during intervention and 10% L F 5 1,600 8 50 5 early transmissions. C Identify 30% reduction in incidence during intervention and 10% L S 5 1,600 8 50 5 early transmissions. A Identify 60% reduction in incidence during intervention and 40% H F 5 1,600 8 50 5 early transmissions. B Identify 30% reduction in incidence during intervention and 40% H S 5 1,600 8 50 5 early transmissions. O As D, and evaluate impact of sampling frame: shorter duration of L F 5 1,280 8 50 3 intensive sequencing. T As D, and evaluate impact of tree reconstruction. L F 5 1,600 8 50 5 S As D, and evaluate impact of sampling frame: most sequences L F 5 1,600 8 85 5 from after intervention start. I As D, and evaluate impact of sampling frame: higher sequence L F 5 3,200 16 50 5 coverage. R As C, and evaluate impact of tree reconstruction. L S 5 1,600 8 50 5 Q As C, and evaluate impact of sampling frame: most sequences L S 5 1,600 8 85 5 from after intervention start. G As C, and evaluate impact of sampling frame: higher sequence L S 5 3,200 16 50 5 coverage. N Control simulation, no intervention. L None 5 1,600 8 50 5 F As A, and evaluate impact of sampling frame: shorter duration of H F 5 1,280 8 50 3 intensive sequencing. L As A, and evaluate impact of tree reconstruction. H F 5 1,600 8 50 5 J As A, and evaluate impact of sampling frame: higher sequence H F 5 3,200 16 50 5 coverage. P As A, and evaluate impact of higher proportion of viral H F 20 1,600 8 50 5 introductions. H As B, and evaluate impact of tree reconstruction. H S 5 1,600 8 50 5 K As B, and evaluate impact of sampling frame: higher sequence H S 5 3,200 16 50 5 coverage. E As B, and evaluate impact of higher proportion of viral H S 20 1,600 8 50 5 introductions. M Control simulation, no intervention. H None 5 1,600 8 50 5 Village 3 Identify 40% reduction in incidence during intervention and 4% LF <2 777 25 >95 5 early transmissions. 2 Identify 15% reduction in incidence during intervention and 4% LS <2 857 25 >95 5 early transmissions. (continued) Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Table 3. Continued a,b Model Dataset Purpose %Acute Intervention Viral Sequences Sequence Sequences Sampling a,c (Low¼L, Scale Up Introduction- (#) Coverage in the After Duration a,d High¼H) (Fast¼F, s (% of All Last Year of the Intervention After a,e f Slow¼S) Transmission- Simulation (% Start (% of All Intervention s per Year) of All Infected Sequences) Start (Years) and Alive) 1 Identify 40% reduction in incidence during intervention and 20% HF <2 957 25 >95 5 early transmissions. 4 Identify 15% reduction in incidence during intervention and 20% HS <2 1,040 25 >95 5 early transmissions. 5 As 3, and evaluate impact of sampling frame: higher sequence LF <2 1,469 50 >95 5 coverage. 11 Similar to 3, without imported sequences. L F 0 638 25 >95 5 8 As 2, and evaluate impact of sampling frame: higher sequence LS <2 1,630 50 >95 5 coverage. 9 Similar to 2, without imported sequences. L S 0 686 25 >95 5 0 Control simulation, no intervention. L None <2 872 25 >95 5 6 As 1, and evaluate impact of sampling frame: higher sequence HF <2 1,831 50 >95 5 coverage. 12 Similar to 1, without imported sequences. H F 0 956 25 >95 5 7 As 4, and evaluate impact of sampling frame: higher sequence HS <2 1,996 50 >95 5 coverage. 10 Similar to 4, without imported sequences. H S 0 1,012 25 >95 5 Variables in shaded columns were unknown to participants at time of analysis. Values range from 5% to 40%, reflecting recent estimates for endemic-phase epidemics in sub-Saharan Africa (Cohen et al. 2012). Range reflects optimistic and pessimistic scenarios in prevention trials in sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014). Range includes frequent viral introductions as reported in settings with highly mobile populations (Grabowski et al. 2014). In comparison to the large sequence datasets that are available for concentrated epidemics in Europe or North America, the lower values here reflect challenges in achieving high sequence coverage where large populations are infected. Higher values reflect geographically focused sequencing efforts such as in Mochudi, Botswana (Carnegie et al. 2014). Values reflect the duration of typical prevention trial settings, and that most sequences are obtained after intervention start (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014). Out of all individuals that were alive and infected in the last calendar year of the simulation, the proportion that had ever a sequence taken. For datasets in bold, only viral sequences were disclosed. For all other datasets, only viral phylogenies were provided. Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Model Regional simulations Village simulations Simulation Output Component Demographics FIG.1. Simulation pipeline to generate HIV-1 sequence data, viral phylogenies, and accompanying individual-level data. Two simulation models (Regional and Village) were implemented for the methods comparison. The two individual-level epidemic and intervention models generated HIV-1 transmission chains in the model population, and its components are shown in blue to green. Next, individuals were sampled for sequencing, and a viral tree was generated for these individuals. Tree generation accounted for within-host viral evolution under a neutral coalescent model. Finally, viral sequences comprising the gag, pol and env genes were simulated along the viral tree. Sequence generation accounted for known variation in evolutionary rates across genes, codon positions, and along within-host lineages. Further details are provided in supplementary tables S1 and S2, Supplementary Material online. Treatment uptake was either “fast” or “slow”, with reductions in sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; in incidence averaging between 10% and 40% relative to be- Dennis et al. 2014; Grabowski et al. 2014;HPTN071 (PopART) fore intervention start. Additionally, simulations were config- Phylogenetics Protocol Team 2015; Pillay et al. 2015). ured so that either a small (4%) or large (20%) proportion of Sequence sampling biases can be substantial in real datasets, transmissions occurred during the first 3 months of infection. but were not included in the model (Carnegie et al. 2014; Some infections originated from outside the model popula- Ratmann et al. 2016). Second, viral trees were generated un- tion in half of the simulations. der a hybrid within- and between-host coalescent model. The Viral sequences were generated from the simulated trans- viral trees did not always correspond to the transmission mission chains (fig. 1). First, individuals were sampled at ran- trees, because viruses diversified within infected individuals dom for sequencing. The majority of individuals were only before transmission (Pybus and Rambaut 2009). In 25 of the 33 sampled in the last years of the simulations, reflecting that datasets, these viral trees were made available, in order to re- sequences are only beginning to be more routinely collected duce the computational burden of molecular epidemiological 191 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Regional simulation model Adult population size HIV prevalence Early transmissions from individuals in their first three months of infection 60000 20 1990 2000 2010 2020 1990 2000 2010 2020 1990 2000 2010 2020 ART coverage among infected adults New cases among uninfected adults per 12 months Sequence coverage among infected adults 2.00 10.0 1.75 7.5 1.50 5.0 1.25 25 2.5 1.00 0.75 0.0 2016 2018 2020 2016 2018 2020 1990 2000 2010 2020 Calendar year, simulation started in 1980 Data sets D, O, S, T A, F, L C, R, Q B, H N M Village simulation model Adult population size HIV prevalence Early transmissions from individuals in their first three months of infection 0 0 10 15 20 25 30 35 40 45 10 15 20 25 30 35 40 45 10 15 20 25 30 35 40 45 New cases among uninfected adults per 12 months Sequence coverage among infected adults ART coverage among infected adults 7 50 6 40 4 20 3 10 0 2 40 41 42 43 44 45 39 40 41 42 43 44 45 46 39 40 41 42 43 44 45 46 Years since start of the simulation Data sets 3, 5 2, 8 1, 6 4, 7 9 10 11 12 0 FIG.2. Simulated epidemic scenarios under the Regional and Village models. (A) Six generalized HIV-1 epidemic scenarios were simulated in a region of 80,000 adult individuals using the Regional model, and (B) nine scenarios were simulated in a rural village population with an initial population of 6,000 individuals using the Village model. The scenarios differ in terms of incidence, the proportion of early transmissions, and scale-up of the combination prevention package during the intervention period (gray-shaded time period). From these, 33 datasets were generated, that included either viral sequences or viral trees. These datasets further varied in the sequence sampling frame and the frequency of viral introductions; see also figure 1 and table 3.DatasetsE,G,I,J,K,Phadmorefrequent viralintroductions or higher sequencecoverage, andare notshown.The proportion of early transmissions under the Village model was smoothed with a 3-year sliding window to better visualize trends in this smaller model population. Total Total % % % % % Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Team Cambridge Team Cambridge-London rich Team London Team Vancouver A L D S F C Q B H M 03 11 01 06 12 10 07 J P I T O G R E K N 05 09 02 08 00 04 A L D S F C Q B H M 03 11 01 06 12 10 07 A L D S F C Q B H M 03 11 01 06 12 10 07 J P I T O G R E K N 05 09 02 08 00 04 J P I T O G R E K N 05 09 02 08 00 04 PANGEA data set Estimates based on concatenated gag, pol, env sequences true tree FIG.3. Estimates of HIV-1 incidence from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team (panel) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black. analyses (table 3 and supplementary figs. S1 and S2, level intervention scenarios through the viral sequences Supplementary Material online). For the remaining 13 data- provided (supplementary fig. S5, Supplementary Material on- sets, viral sequences of HIV-1 gag, pol and env genes were line). However, we expected that rapid increases in sequence simulated along the viral trees (1,500, 3,000 and 2,500 coverage after the intervention would complicate phyloge- nucleotides respectively, for a total of approximately 6,000 netic inference. The simulations also retained, on average, nucleotides), from an HIV-1 subtype C starting sequence. The information for differentiating between the 10% and 40% sequences thus represent generalized subtype C epidemics, early transmission scenarios of the Regional simulations at as in most Southern African countries. The nucleotide se- very low sequence coverage (supplementary fig. S6, quence evolution model that was used incorporated known Supplementary Material online). More challenges were ex- differences in evolutionary rates by gene and codon position pected on the Village simulations despite higher sequence and relative differences in substitution rates by gene and coverage, partly because the effect size between the low codon position (Shapiro et al. 2006; Alizon and Fraser and high %Acute scenarios was smaller (supplementary fig. 2013). The coalescent and sequence evolution models did S7, Supplementary Material online). not account for recombination, sequencing errors, or selec- tion beyond differential evolutionary rates across genes, co- Responses to the Methods Comparison Exercise dons and within-host lineages (supplementary tables S1 and Participants were primarily asked to estimate incidence re- S2, Supplementary Material online). As a key indicator of the ductions from before the intervention (year 39 or 2014) to realism of the simulated sequences, we calculated the pro- just after the intervention (year 43 or 2018), and to estimate portionofthe variationinevolutionary diversification among the proportion of early transmissions in the year before and the simulated HIV-1 sequences, that can be explained by a after the intervention (table 1). Participating teams developed constant molecular clock model. The proportion explained fast computational strategies for handling full-genome HIV ranged from 25% to 60% (supplementary figs. S3 and S4, sequence datasets within given timelines (3 months for 13 Supplementary Material online), broadly in line with esti- Village datasets and 6 months for 20 regional datasets). First, mates on real HIV-1 sequence datasets (Lemey et al. 2006). where only sequences were provided, viral phylogenies were The simulations were designed to retain signal for differ- reconstructed with maximum likelihood methods (Price et al. entiating between the “fast”, “slow” and “no” community- 2010; Stamatakis 2014). Second, these phylogenies were dated Estimated and true incidence after the intervention (%) Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Team Cambridge-London rich Team Cambridge 1.5 1.0 0.5 0.0 Team London Team Vancouver D S A L F 03 01 C Q 12 E K 02 04 M N 00 I T J P O 05 06 G R B H 10 08 07 11 09 1.5 1.0 0.5 0.0 D S A L F 03 01 C Q 12 E K 02 04 M N 00 D S A L F 03 01 C Q 12 E K 02 04 M N 00 I T J P O 05 06 G R B H 10 08 07 11 09 I T J P O 05 06 G R B H 10 08 07 11 09 PANGEA data set Estimates based on concatenated gag, pol, env sequences true tree FIG.4. Estimates of HIV-1 incidence reductions from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team (panel) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black. under least-squares criteria or similar fast approaches (To mentary table S2, Supplementary Material online, team et al. 2015). Third, dated phylogenies were used as input to Cambridge-London who used a structured coalescent the transmission analysis methods described in table 2.This model). Bias in these estimates was relatively small for esti- sequential approach allowed the teams to obtain phyloge- mates of two teams (on an average 0.35% by team netic estimates to all reporting variables for the large majority Cambridge-London and 0.57% by team London). Team of the datasets (see supplementary table S3, Supplementary Basel-Zu ¨rich achieved substantially more accurate estimates Material online). Team Vancouver did not provide estimates on the Regional datasets than the Village datasets, whereas to datasets of the Regional model that contained true phy- the converse was true for team London (supplementary table logenetic trees; and teams Cambridge-London and Basel- S2, Supplementary Material online). Zu ¨rich did not provide estimates to datasets of the The accuracy of phylogenetic estimates of changes in in- Regional model that contained sequences. The most com- cidence as a result of the intervention largely reflected the mon reasons for incomplete recall were limited availability of accuracy of the underlying incidence estimates (fig. 4). computing resources, tight timelines to evaluate the simula- Phylogenetic estimates of incidence ratios correlated with tions, and difficulties in tree estimation when viral introduc- the true values by 93% under the structured coalescent ap- tions occurred frequently. Nearly all participants focused on proach of team Cambridge-London, and had only slight up- inference from full viral genomes (supplementary table S3, ward bias (supplementary table S4, Supplementary Material Supplementary Material online), meaning that the impact of online). This meant that large reductions in incidence, which full genome sequences (concatenated HIV-1 gag, pol and env are expected from combination prevention interventions, genes) as compared with partial sequences (HIV-1 pol gene could be correctly detected at relatively low sequence cover- only) could not be evaluated. age when sequences were sampled for 5 years since interven- tion start by the most successful method. Epidemic simulations with >25% reductions in incidence were cor- Estimating Incidence and Reductions in Incidence rectly classified as declining in 15/17 (88%) of all simulations Phylogenetic methods differed in their ability to estimate in- with a submission by team Cambridge-London, although the cidenceafter theintervention(fig. 3). Under the most suc- true positive rate was lower with other phylogenetic methods cessful computational approach, phylogenetic estimates of (supplementary table S5, Supplementary Material online). incidence were correlated with true values by 91% (supple Estimated and true incidence ratio Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Regional Simulation Model Team Cambridge-London rich Team London Team Vancouver OD S I T N R C Q G E P MA L F J K H B O D S I T NRC QG E P M A L F J K H B OD S I T N R CQG E PMA LF JK H B OD S I T N R C Q G E P MA L F J K H B Village Simulation Model Team Cambridge-London rich Team London Team Vancouver 09 11 00 02 03 05 08 01 04 06 07 10 12 09 11 00 02 03 05 08 01 04 06 07 10 12 09 11 00 02 03 05 08 01 04 06 07 10 12 09 11 00 02 03 05 08 01 04 06 07 10 12 PANGEA data set Estimates based on concatenated gag, pol, env sequences true tree FIG.5. Estimates of the proportion of transmissions from individuals in their first 3 months of infection (early and acute HIV), before the intervention from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team and model simulation (panels) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black. Estimating the Proportion of Transmissions from differences in the simulation datasets (referred to as “covar- iates”), such as sequence coverage and frequency of viral in- Individuals in Their First Three Months of Infection troductions (table 3). Figure 6A illustrates the phylogenetic (Early and Acute HIV) estimates that deviated largely from the true values (referred Phylogenetic estimates of the proportion of early transmis- to as “outliers”). We focused on quantifying the association of sions just before and after the intervention were more accu- outlier presence with the covariates listed in table 3 using a rate on the Regional simulations than the Village simulations, partial least squares regression approach, which enabled us to potentially reflecting stronger signal as a result of larger effect handle a relatively large number of co-dependent covariates sizes in the Regional simulations (fig. 5 and supplementary (see “Materials and Methods” section). figs. S6–S8, Supplementary Material online). On the regional Several covariates could be excluded from this analysis. simulations, estimates by team Cambridge-London had a Estimates obtained from the simulated full genome sequence mean absolute error of 3.9% and correlated with true values datasets were not more strongly associated with estimation by 92%. However, on the Village simulations, the mean abso- error than estimates obtained using the phylogenetic trees lute error in estimates by team Cambridge-London was 12% from which the sequences were simulated (supplementary (supplementary table S6, Supplementary Material online). fig. S9 and supplementary table S7, Supplementary Material Other teams had, overall, difficulties recovering the frequent online). Shorter, intense sampling periods after intervention early transmission scenarios. Team Basel-Zu ¨rich achieved the start of 3 years compared with a default of 5 years were also smallest mean absolute error on the Village simulations (sup not strongly associated with larger estimation error (supple plementary table S6, Supplementary Material online). mentary table S7, Supplementary Material online). Figure 6B shows the proportion of variance in outlier Predictors of Large Error in Phylogenetic Estimates presence that is explained by each of the remaining co- We evaluated to what extent the variation in errors of phy- variates. Signs indicate the impact of a change in predic- logenetic estimates could be associated to systematic tor values on the number of phylogenetic estimates with Estimated and true % early transmissions just before the intervention Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Incidence Incidence reduction Proportion of early transmissions Proportion of early transmissions after intervention during intervention just before intervention after intervention 04 07 07 03 B Team Vancouver J H Team London A J M T S J J L P G Team Cambridge-London 09 11 Team Cambridge −2 0 2 01234 −20 0 20 −20 0 20 error in phylogenetic estimate a a Outlier No Yes Incidence Incidence reduction Proportion of early transmissions Proportion of early transmissions after intervention during intervention just before intervention after intervention Team Vancouver −+ + + − + − − − − − −− Team London ++ − − − − − −− − − − ++ − − − ++ − + −− Team Cambridge-London + +− + +− − 0 20406080 100 0 20 40 60 80 100 0 20406080 100 0 20 40 60 80 100 variance in outlier presence explained (%) Error predictor and predictor values Impact of change in predictor values True % incidence, increasing Positive impact, fewer outliers True incidence ratio, increasing Negative impact, more outliers True % early transmissions just before intervention, increasing True % early transmissions after intervention, increasing Village simulation model vs. Regional simulation model Frequency of viral introductions 20%/year vs. <=5%/year High sequence coverage (50% for Village, 16% for Regional) vs. lower coverage (25% for Village, 8% for Regional) Proportion of sequences from after intervention start >80% vs. 50% FIG.6. Predictors of large error in phylogenetic estimates. (A) For each response, the error in the phylogenetic estimate was calculated, and statistical outliers were identified. The plot shows error in phylogenetic estimates by team and outcome measure. For large errors, the corre- sponding PANGEA dataset code in table 1 is indicated. (B) The contribution of the systematically varied covariates in table 1 to the presence of outliers was quantified through partial least squares regression (PLS, see “Materials and Methods” section). The plot shows the contribution of each predictor to the variance in outlier presence in colors, and the corresponding signs of the regression coefficients are added. Estimates from team Cambridge could not be characterized due to small sample size. The impact of the error predictors varied across the primary objectives of phylogenetic inference, as well as the phylogenetic methods used. With regard to estimates of incidence and incidence reduction, a subset of phylogenetic methods was particularly sensitive to high sequence coverage, a very large proportion of sequences obtained after intervention start, and a large frequency of viral introductions. With regard to estimates of the proportion of early transmissions, outliers were in several cases best explained by true differences in the proportion of early transmissions. very large error. Subplots are empty when phylogenetic (>80% vs. 50%) were associated with more outliers for methods did not produce estimates with large error (in- more than one phylogenetic method. Frequent viral in- dicating ahigherdegreeofsuccess). Overall, with regard troductions (20%/year vs. < ¼5%/year) were associated to estimates of incidence and incidence reduction, higher with more outliers by team Basel-Zu ¨ rich. These predictors sequence coverage (16% vs. 8% in the Regional datasets tended to outweigh the impact that true differences in and 50% vs. 25% in the Village datasets) and a large pro- incidence and incidence reduction had on outlier portion of sequences obtained after intervention start presence. 196 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE In contrast, with regard to estimates of the proportion of (Faria et al. 2014), or to undertake descriptive analyses of early transmissions, outliers were in several cases best ex- putative transmission chains (Brenner et al. 2007; Dennis plained by true differences in the proportion of early trans- et al. 2012). To be precise, the challenge here was in obtaining missions. Several phylogenetic methods had substantial quantitative estimates of HIV-1 incidence and the sources of difficulty estimating frequent early transmissions. Low sam- transmission in generalized epidemics, and to do so close to pling coverage did not contribute substantially to the pres- the present, when the phylogenetic signal weakens (de Silva ence of outliers. To substantiate this observation further, we et al. 2012). Second, sequence coverage was relatively low in compared phylogenetic estimates from just before the inter- most simulations, as is expected for most endemic-phase vention to those after the intervention, and found no con- settings in sub-Saharan Africa. Furthermore, frequent viral sistent improvements in accuracy with a doubling of introductions complicated the interpretation of viral trees, sampling coverage (supplementary fig. S10, Supplementary timelines were tight (3 months for the Village datasets, and Material online). Instead, outlier presence could be explained 6 months for the Regional datasets), and phylodynamic mod- through the simulation model, with more outliers on the els had to represent viral spread in heterogeneous popula- Village datasets. These simulations were characterized by tions (males and females with different risk profiles). We smaller sample sizes and smaller effect size (table 3 and sup aspired to evaluate the extent to which these challenges plementary figs. S6 and S7, Supplementary Material online). can be addressed with full genome HIV-1 sequences, and through customized phylogenetic methods. The methods comparison exercise demonstrates that viral Discussion phylogenetic tools can successfully estimate aspects of recent The PANGEA methods comparison exercise represents a transmission dynamics of generalized HIV-1 epidemics at community-wide effort for advancing the use of phylogenetic limited sequence coverage of the infected population, when methods to estimate aspects of recent HIV-1 transmission full-genome sequences are available. Two methods, the ABC dynamics of generalized epidemics in sub-Saharan Africa. kernel method of team Vancouver and the Bayesian trans- This region is affected by the largest HIV-1 epidemics world- mission analyzer of team London (table 2), were newly de- wide. Viral phylogenetics could be a central tool to guide HIV- veloped in response to the exercise. The birth–death skyline 1 prevention in these settings (Dennis et al. 2014). model with sampled ancestors (Gavryushkina et al. 2014)and It is not possible for phylogenetic methods to capture all its multi-type analogue (Ku ¨hnert et al. 2016) are readily avail- factors that influence the spread of HIV-1, ranging all the way able through the BEAST2 software package. The structured from biological factors determining person-to-person trans- coalescent (Volz et al. 2009) was customized to reflect avail- mission (Cohen et al. 2011) to the structure of sexual net- able information on the simulated epidemics, and required works on the community level (Gregson et al. 2002; Tanser considerable resources (roughly 1 week of computation time et al. 2011), and the broader impact of prevention and care on a 64-core machine of 2.5 Ghz processors per analysis). The services (Gardner et al. 2011). Of course, capturing all such methods comparison reflects these different stages in devel- features may not be needed: particular aspects of HIV-1 opment and customization. In this context, the structured spread in generalized epidemics could be estimable from se- coalescent approach was overall most accurate, producing quence data under the simplifying assumptions of phyloge- accurate estimates of incidence and changes in incidence, netic methods, and at relatively low sequence coverage. as well as broadly accurate estimates into the proportion of To validate this hypothesis from the outset, the PANGEA- early transmissions on the Regional simulations from full- HIV team simulated data under two highly complex HIV genome sequences. Confidence intervals were sufficiently transmission and intervention models, whose components tight for epidemiological interpretation, bearing in mind are considered essential for understanding long-term HIV that uncertainty in tree reconstructions was ignored. This transmission dynamics (Eaton et al. 2012). The aspects of indicates that the latest generation of viral phylogenetic HIV-1 spread evaluated here (table 1) were chosen both be- methods can complement standard incidence estimation cause molecular epidemiological studies into the sources of techniques where full-genome sequences are available from transmission and temporal changes in epidemic spread are in the general population. The use of sequence data for estimat- principle feasible (von Wyl et al. 2011; Stadler et al. 2013; Volz ing incidence trends in sub-Saharan Africa could be particu- et al. 2013; Dennis et al. 2014; Ratmann et al. 2016), and larly useful where demographic and health survey data are because of their relevance to on-going HIV-1 prevention ef- sparse (Pillay et al. 2015), no relevant observational HIV co- forts in sub-Saharan Africa. Crucially, the model simulations horts exist, or where estimates would otherwise be solely were constrained to pessimistic and optimistic projections of reliant on data from particular population groups such as the likely outcomes of on-going HIV-1 prevention efforts in pregnant women (Montana et al. 2008). Further, this study sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes supports using viral phylogenetic methods for identifying et al. 2014), as well as what sequence data could become sources of HIV-1 transmission from full-genome sequences available in these settings. in certain settings. Broadly accurate estimates into the frac- The methods comparison exercise was challenging. First, tion of transmissions attributable to a population group were the exercise focused on quantifying recent transmission dy- obtained when both transmission from that group was not namics, whereas HIV-1 sequence data are more routinely infrequent (at least 10%) and sample size was not too small used to characterize the origins and spread of the virus (thousands of sequences for the HIV-infected populations 197 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE considered). Viral phylogenetic methods could thus help to to epidemic settings in sub-Saharan Africa where multiple quantify the contribution of several other source populations subtypes and recombinant forms circulate at high frequen- that are of key interest for prevention in sub-Saharan Africa, cies.Third,phylogeneticanalyses of full-genome sequences including the proportion of individuals infected within local- were not compared with similar analyses using shorter frag- ized high prevalence areas (Tanser et al. 2013), or the propor- ments of the genome such as, e.g., several 250 base pair re- tion of young women infected by male peers (Dellar et al. gions from the gag, pol or env genes. Full-genome sequences 2015). may not be required for estimating recent changes in HIV-1 We varied aspects of transmission dynamics and the sam- incidence or for quantifying the sources of HIV-1 transmis- pling frame in the simulations, to obtain a more systematic sion, and more cost-effective sequencing approaches could understanding of methods’ performance (fig. 5). Most phylo- provide similar results. genetic methods did not identify significant differences be- The PANGEA-HIV methods comparison exercise showed tween the high/low early transmission scenarios, and this was viral phylogenetic methods can be adapted to provide quan- also the case when basic genetic distance measures recovered titative estimates on aspects of recent HIV-1 transmission differences between the high/low early transmission scenarios dynamics in sub-Saharan Africa, where sequence coverage (regional simulations, supplementary fig. S6, Supplementary remains limited. On simulations, the structured coalescent Material online). The true proportions of early transmissions approach was overall most accurate for estimating recent were also frequently outside 95% confidence or credibility changes in incidence and the proportion of early transmis- intervals. This indicates that further methods’ improvement sions in modeled populations with generalized, and large is needed for estimating the proportion of early transmissions, HIV-1 epidemics. Future molecular epidemiological analyses and potentially for attributing sources of HIV-1 transmission would ideally make use of several of the evaluated phyloge- more broadly at the low sequence coverage scenarios consid- netic tools, in order to obtain robust insights into HIV-1 ered. Further, nearly all participants reported difficulties in transmission flows and how to disrupt them. Further meth- achieving numerical convergence of their methods on full- ods’ refinement is required to this end, with our analysis genome sequence data (unpublished submission reports). suggesting a focus on estimating the sources of HIV-1 trans- This could explain the above observations in part, and in mission from full-genome HIV-1 sequence data. These find- particular why the accuracy of early transmission estimates ings were obtained through a community-wide, blinded did not improve when using larger datasets with higher se- evaluation, and thereby add confidence into the use and in- quence coverage (fig. 5 and supplementary fig. S10, terpretation of viral phylogenetic tools for HIV-1 surveillance Supplementary Material online). Further investigations are and prevention in sub-Saharan Arica and beyond. needed. Finally, our error analysis suggests that explicit mod- eling of unobserved source demes (team Cambridge- Materials and Methods London) or identification of spatially localized phylogenetic clusters prior to transmission analyses (team London) could Study Design be effective approaches for mitigating the negative impact of The blinded PANGEA-HIV methods comparison exercise was viral introductions on phylogenetic analyses on mobile pop- announced in October 2014 at HIV Dynamics & Evolution, ulations (Grabowski et al. 2014). The simulated PANGEA and later on the PANGEA-HIV website. In a training round datasets as well as various aspects of the corresponding (round 1), participants were asked to identify trends in inci- true epidemics and interventions are available for future dence on simulated sequence datasets that were similar in benchmarking. size to the datasets in table 3, but that had qualitatively dif- This study has limitations. First, phylogenetic methods ferent epidemic dynamics. Data included full-genome viral were evaluated on simulated HIV-1 epidemics. While the sequences, patient meta-data, and further broad information use of two models guards to some extent against over- on the simulated epidemic (supplementary text S1, interpretation, analyses of real datasets may be more complex Supplementary Material online). Participation was unre- and could be associated with overall larger error. Of note, the stricted. In December 2014, the training data were un- simulated datasets are free of sequence sampling biases, blinded. All participants shared their findings. PANGEA-HIV which can substantially distort phylogenetic inferences and the participants agreed on the objectives and reporting (Carnegie et al. 2014). Second, the evolutionary components variables listed in table 1; on the timelines for the second final of the two models generated sequences that do not contain round; and that participation will be retrospectively restricted gaps or sequencing errors, cannot be translated to amino to teams addressing at least one of the pre-specified reporting acids, were correctly aligned, and did not contain recombi- variables. Simulation models were updated to include explicit nant sequences. Viral trees reconstructed from real sequence HIV care and intervention components, and re-calibrated to data are likely less accurate than those used in this analysis, a generate the epidemic scenarios shown in figures 1 and 2. potential source of error that is not represented in our eval- Blinded datasets were released on 10 February 2015 (supple uations. Frequent recombination could imply that full HIV-1 mentary text S2, Supplementary Material online). The dead- genomes are more appropriately analyzed on a gene-by-gene line for submissions was 8 May 2015. Questions and clarifica- basis (Hollingsworth et al. 2010; Ward et al. 2013), in contrast tions during the exercise were disseminated to all to our full-genome analyses of simulated sequences that ex- participants. Submissions were checked manually, and teams cluded recombinants. This limitation is particularly relevant were given the opportunity to fix conceptual errors. Few 198 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE submissions to the Regional simulations were obtained, and interventions (population-level effectiveness of ART); the deadline for submission to Regional datasets was ex- within-host evolution (neutral coalescent model, no co- tended to 18 August 2015. The Village simulations were infection and no recombination); between-host evolution un-blinded on 14 May 2015, and a preliminary evaluation (transmission of one virion, no recombination); and sequence was presented and reviewed by all participants at the 22nd sampling (at time of diagnosis of randomly selected individ- HIV Dynamics & Evolution conference. Teams Vancouver uals). To obtain the six epidemic scenarios shown in figure 2, and Basel-Zu ¨rich informed the evaluation group of a concep- we varied the relative transmission rate from early infections tual misunderstanding of the reporting variables, and pro- as well as parameters relating to uptake of the combination vided updated incidence estimates after the intervention intervention respectively. The simulation algorithm is avail- 1 day after the presentation. These updates on the Village able from https://github.com/olli0601/PANGEA.HIV.sim (last datasets were used in the evaluation reported here. The accessed October 14, 2016), and combines (with further Regional datasets were un-blinded on 3 September 2015. code): the individual-based HPTN071 (PopART) model ver- sion 1.1 to generate transmission chains, the Village Simulations VirusTreeSimulator (https://github.com/PangeaHIV/ The Village simulations were generated using the Discrete VirusTreeSimulator; last accessed October 14, 2016) to gen- Spatial Phylo Simulator with HIV-specific components erate viral trees from transmission chains, and SeqGen version (DSPS-HIV, https://github.com/PangeaHIV/DSPS-HIV_ 1.3 (Rambaut and Grassly 1997) to simulate viral sequences PANGEA; last accessed October 14, 2016). The DSPS-HIV is along viral trees. an individual-based stochastic simulator which models HIV-1 transmissions along a specifiable contact network of individ- Protocols for Phylogenetic Transmission Analyses uals and produces a line-list of all events (Hodcroft 2015). All participants adopted overall similar computational strat- Viral phylogenies that reflect between- and within-host viral egies that first reconstructed dated maximum-likelihood trees evolution were generated along transmission chains using (Price et al. 2010; Stamatakis 2014; To et al. 2015), and then VirusTreeSimulator (https://github.com/PangeaHIV/ considered the viral trees fixed in one of the following trans- VirusTreeSimulator; last accessed October 14, 2016). HIV-1 mission analyses: subtype C sequences were simulated along these viral phy- logenies using pBUSS (Bielejec et al. 2014), with substitution rates parameterized from analyses of African subtype C se- ABC Kernel Method quences. An overview of the simulation pipeline is shown in Reporting variables were estimated with an experimental figure 1, and details about the parameter values and assump- kernel-ABC method that combines a kernel method on tions used in the DSPS-HIV and to generate phylogenies and tree shapes (Poon et al. 2013) with a framework for approx- sequences are found in supplementary table S2, imate Bayesian computation (ABC). The basic premise of Supplementary Material online. Notably, assumptions were ABC is that it is usually easier to simulate data from a model made in sexual mixing partners, partner duration, interven- than to calculate its exact likelihood for the observed data. A tions, sampling, and between- and within-evolution complex- model can then be fit to the observed data by adjusting its ity. Disease progression and transmission within the DSPS- parameters until it yields simulations that resemble these HIV are determined by set-point viral load using previously data, bypassing the calculation of likelihoods altogether. We described relationships (Fraser et al. 2007). Simulations were formulated a structured compartmental SI model (Jacquez parameterized to reflect estimates of prevalence and inci- et al. 1988) that was informed by the descriptions of the dencefromthe peak of theHIV-1 epidemic in thelate agent-based simulations that were distributed to all partici- 1980s and early 1990s (Serwadda et al. 1992; Wawer et al. pants. Specifically, the model comprised three populations: a 1994), before treatment was widely available, with the root of main local population, a second local high-risk minority pop- the sequences dating back 40 years previously, coinciding ulation, and an external source population. Each population with the recent subtype C estimates of a common ancestor in was further partitioned into susceptible and infected groups, the 1940s (Faria et al. 2014). Further information about the where the latter was stratified into three stages of infection DSPS-HIV will be available in a forthcoming publication. (acute, asymptomatic, and chronic). Mixing rates between the main and minority local populations were controlled by Regional Simulations two parameters to allow for asymmetric mixing. Individuals The Regional simulation model consists of a stochastic, with acute or asymptomatic infections migrated from the individual-level epidemic transmission and intervention external region to the local region at a constant rate m, model, and an evolutionary model that generates viral phy- and replaced with new susceptible individuals in the external logenies and sequence data to simulated transmission chains. region. One infected individual in the external source popu- Figure 1 and supplementary table S1, Supplementary Material lation started the simulation. Coalescent trees were then sim- online, describe the overall simulation pipeline, model com- ulated based on population trajectories derived from the ponents, parameters, and parameter values. Notably, assump- numerical solution of the ordinary differential equations tions were made on: sexual risk behavior (proportion of that represent the model, using the R package rcolgem. The individuals in risk groups, mixing between risk groups, partner subset tree kernel (Poon et al. 2013) was used as a distance change rates); HIV infection (relative transmission rates); measure between the simulated coalescent trees and the 199 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE reconstructed viral phylogeniesonavailablesequencedata, or sequence was not available, likelihood terms were adjusted the provided phylogenies. A Markov chain Monte Carlo im- by numerically calculating the probability that a case infected plementation of ABC was used to fit the model. This kernel- at a given time had no sampled descendant cases by the time ABC approach was validated on simulated data from more the study finished, and then conditioning on each case’s num- conventional compartmental models (Poon 2015). ber of sampled and unsampled descendants. A reversible- jump Bayesian MCMC approach with proposal moves as described in (Didelot et al. 2014) was used to fit the model. Birth–Death Skyline Method with Sampled Ancestors This approach produces a posterior collection of trans- Phylodynamic analyses were performed in BEAST v2.0 mission trees. From these, we extracted the portion of (Bouckaert et al. 2014) using the add-ons “bdsky” (Stadler infections in the acute stage, recent changes in incidence et al. 2013), “SA” (Gavryushkina et al. 2014)and “bdmm” and other outcomes required for the comparison study. (Ku ¨hnert et al. 2016). Under the birth–death skyline model The generation time t had prior t  0.4 gen gen with sampled ancestors (“SA” module), individuals could gamma(1.3,1) þ 0.6 gamma(3.5,3.5) where the arguments transmit with some probability after sampling which im- are the shape and scale parameters. The time to sampling proved estimation of the reporting variables in preliminary had prior t  gamma(0.7, 1.5). samp analyses (round 1 of the exercise). To estimate the proportion of early transmissions, the multi-type birth–death model was used with two compartments (“bdmm” module) to consider Structured Coalescent individuals in their first 3 months of infection separately from Structured coalescent models were implemented in the rcol- those in later stages of infection. In all analyses, time was gem R package and were based on compartmental infectious partitioned into different intervals to obtain estimates of vary- disease models using the approach described in (Volz 2012). ing transmission rates through time. As further described in These models were tailored to the Regional and Village sce- supplementary text S3, Supplementary Material online, for narios, and included compartments for stage of infection both Village and Regional simulations, lognormal priors (early HIV infection through AIDS as in Cori et al. 2014), were used for the effective reproductive number (mu¼ 0 sex, and diagnosis/treatment status. Transmission rates and sigma¼ 0.75) and the becoming-non-infectious rate were allowed to vary between compartments, and general- (lognormal with mu¼1 and sigma 0.5). Uniform priors ized logistic functions described secular trends in the force of were used for the sampling proportion, and specified based infection through time. Coalescent models also included a on available meta-data. For the Village datasets 0, 1, 2, 3, 4, 9, deme for the unsampled source deme to capture the effects 10, 11 and 12, we assumed a priori a sampling proportion of lineage importation into the surveyed region. Models were between 15% and 40%; for Village datasets 5, 6, 7 and 8 be- fitted to the dated viral phylogenetic trees and to available tween 40% and 100%; and for the Regional datasets between epidemiological data under the approximation that the cor- 5% and 10%. The prior distribution for the removal probabil- responding likelihood terms are independent. For the ity r was chosen based on an estimate of the proportion of Regional simulations, the contribution to the likelihood sampled infected individuals that are on treatment, and cal- model of the CD4 counts at diagnosis and gender of all se- culated from available survey data before intervention start. quenced individuals was assumed multinomial; the propor- Sensitivity analyses on these prior choices were conducted. tion of diagnoses with a sequence was assumed binomial; and The reporting variables were estimated from MCMC output that of survey data (sex, diagnosis, and treatment status) was of the posterior model parameters using a customized pro- assumed multinomial. For the Village simulations, fewer cedure that is fully described in supplementary text S3, meta-data variables were available. The likelihood model as- Supplementary Material online. sumed that estimated HIV prevalence was within the bounds given by the available survey data. A parallel Bayesian MCMC technique (Calderhead 2014) was used to obtain posterior Bayesian Transmission Chain Analyser distributions of model parameters. The Bayesian approach reported in (Didelot et al. 2014)was adapted to account for incomplete sampling as well as het- erogeneity in HIV transmission rates. In place of a susceptible- Statistical Analysis infectious-recovered (SIR) model (as in Didelotetal. 2014)a Phylogenetic estimates and true values were transformed so generalized branching model was used to describe transmis- that their differences were approximately normally distrib- sion dynamics. In this model, the (prior) time interval be- uted. For incidence and incidence reductions, the error tween a case becoming infected and infecting others (t ) e of response i was calculated as e ¼ logð^x Þ logðx Þ, i i i i gen is distributed such that there is a peak after infection, a where ^x is the phylogenetic estimate and x thetruevalue i i chronic phase, and increased infectivity with progression to on dataset i; for proportions, the error was calculated as AIDS. Cases were sampled after a random time since becom- e ¼ ^x  x . Data points outside the whiskers of Tukey box- i i i ing infected (t ). The prior distribution of the numbers of plots were considered as outliers. samp secondary cases was negative binomial (n¼ 5, P¼ 0.7), re- To identify covariates associated with large error in phylo- flecting a convolution of a Poisson distribution conditioned genetic estimates, stepwise model selection with the on a gamma-distributed overall infectivity. To account for stepGAIV.VR procedure in the gamlss Rpackage wasused infected individuals in transmission chains for whom a to reduce the number of covariates at significance level 0.01 200 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE (supplementary table S4, Supplementary Material online). consortium (to O.R., E.H., A.L.B., and C.F.); the NIH through The contribution of the remaining covariates to outlier pres- the NIAID cooperative agreement UM1AI068619 for work on ence (response) was evaluated with partial least squares (PLS) the HPTN 071 trial (to A.C., M.P., and C.F.); the Wellcome regression (Boulesteix and Strimmer 2007), because of the Trust (WR092311MF to O.R.); the European Research Council limited number of datasets and dependencies amongst the (PBDR-339251 to C.F., PhyPD-335529 to T.S.); the National covariates. PLS regression is a dimension reduction technique Institutes of Health (NIH MIDAS U01 GM110749 to E.V. and that identifies combinations of covariates (PLS latent factors) A.L.B., NIH R01 AI087520 to E.V.); the Biotechnology and that are maximally correlated with the response variable, Biological Sciences Research Council (BB/J004227/1 to S.J.L.); and then regresses the response variable against the latent the Canadian Institutes of Health Research (CIHR HOP- factors. The first four latent factors that explained most of the 111406 to A.F.Y.P., New Investigator Award 175594 to A.F.Y. variance in outlier presence were considered in the error P.), the Michael Smith Foundation for Health Research/St. analysis. Figure 5B shows, in the notation of (Boulesteix and Paul’s Hospital Foundation/the Providence Health Care Strimmer 2007), thesignofthe PLSregression coefficients B Research Institute (Scholar Award 5127 to A.F.Y.P.); the j1 for each covariate j to the univariate response variable across ETH Zu ¨rich Postdoctoral Fellowship Program (to D.R. and the first c ¼ 4 latent factors. The proportion of variance p in D.K.); theMarie CurieActions forPeople COFUND the response variable attributable to each covariate j is calcu- Program (to D.R. and D.K.); the University of Edinburgh c jk 2 lated as p ¼ Þ v ,where w is the weight of co- Chancellor’s Fellowship scheme (to S.J.L.); the Centre of j k jk k¼1 w variate j to the kth latent factor and v is the variance Expertise in Animal Disease Outbreaks (to S.J.L.); and the explained by the kth latent factor. PLS regression was per- Swiss National Science Foundation (162251 to G.E.L.). The formed with the plsr routine in the pls Rpackage. funders had no role in study design, data collection and anal- ysis, decision to publish, or preparation of the article. Supplementary Material References Supplementary figures S1–S10, tables S1–S7,and text S1–S4 areavailableat Molecular Biology and Evolution online. Alizon S, Fraser C. 2013. Within-host and between-host evolution- ary rates across the HIV-1 genome. Retrovirology 10. Available from: https://retrovirology.biomedcentral.com/articles/10.1186/ Author Contributions 1742-4690-10-49. A.L.B. and C.F. conceived the study. O.R., E.H., A.L.B., and C.F. Bezemer D, Cori A, Ratmann O, van Sighem A, Hermanides HS, Dutilh designed and coordinated the study. E.H., M.H., S.L., and A.L.B. BE, Gras L, Rodrigues Faria N, van den Hengel R, Duits AJ, et al. 2015. Dispersion of the HIV-1 epidemic in men who have sex with men in designed and generated the Village simulations. M.H. contrib- the Netherlands: a combined mathematical model and phylogenetic uted the virus tree simulator from transmission chains. M.P., analysis. PLoS Med. 12:e1001898. A.C.,O.R., andC.F.designedand generatedthe Regional sim- Bielejec F, Lemey P, Carvalho LM, Baele G, Rambaut A, Suchard MA. ulations. O.R. checked the submissions received, performed 2014. piBUSS: a parallel BEAST/BEAGLE utility for sequence simula- the statistical analysis and wrote the first draft except parts of tion under complex evolutionary scenarios. BMC Bioinformatics 15:133. the “Methods” section. C.C., M.K., X.D., G.P., A.P., J.J., R.L., C.W., BouckaertR,Heled J, KuhnertD,Vaughan T, Wu CH,Xie D, SuchardMA, G.L., D.R., D.K., T.S., E.V., B.D., M.H., and S.F. evaluated the Rambaut A, Drummond AJ. 2014. BEAST 2: a software platform for simulated data and wrote parts of the “Methods” section. Bayesian evolutionary analysis. PLoS Comput Biol. 10:e1003537. All authors reviewed and approved the statistical analysis, Boulesteix AL, Strimmer K. 2007. Partial least squares: a versatile tool for and the final version of the article. the analysis of high-dimensional genomic data. Brief Bioinform. 8:32–44. Brenner BG, Roger M, Routy JP, Moisi D, Ntemgwa M, Matte C, Baril JG, Acknowledgments Thomas R, Rouleau D, Bruneau J, et al. 2007. High rates of forward We thank Andrew Rambaut for his comments on the design transmission events after acute/early HIV-1 infection. JInfectDis. 195:951–959. of the exercise; the PANGEA-HIV steering committee and Calderhead B. 2014. A general construction for parallelizing Metropolis- participants of the PANGEA-HIV satellite workshop of the Hastings algorithms. Proc Natl Acad Sci U S A. 111:17408–17413. 21st and 22nd HIV Dynamics & Evolution conference for their Carnegie NB, Wang R, Novitsky V, De Gruttola V. 2014. Linkage of viral comments during the exercise; and three anonymous re- sequences among HIV-infected village residents in Botswana: esti- viewers and associate editors for their comments that im- mation of linkage rates in the presence of missing data. PLoS Comput Biol. 10:e1003430. proved an earlier version of the article. Regional simulations Cohen MS, Dye C, Fraser C, Miller WC, Powers KA, Williams BG. 2012. were designed and generated using resources at the Imperial HIV treatment as prevention: debate and commentary–will early College High Performance Computing Service (http://www3. infection compromise treatment-as-prevention strategies?. PLoS imperial.ac.uk/ict/services/hpc). Team Cambridge-London Med. 9:e1001232. thanks the MRC Centre for Outbreak Analysis and Cohen MS, Shaw GM, McMichael AJ, Haynes BF. 2011. Acute HIV-1 Infection. NEnglJMed. 364:1943–1954. Modeling for support. Team Vancouver thanks Rosemary Cori A, AylesH,BeyersN,SchaapA,Floyd S, SabapathyK,Eaton McCloskey for help with tree reconstructions; their contribu- JW, Hauck K, Smith P, Griffith S, et al. 2014. HPTN 071 tion was enabled in part by support provided by Westgrid (PopART): a cluster-randomized trial of the population impact (www.westgrid.ca) and Compute Canada Calcul Canada of an HIV combination prevention intervention including uni- (www.computecanada.ca). This work was supported by the versal testing and treatment: mathematical model. PLoS One 9:e84511. Bill & Melinda Gates Foundation through the PANGEA-HIV 201 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE de Silva E, Ferguson NM, Fraser C. 2012. Inferring pandemic growth rates a combination prevention package on population-level HIV incidence from sequence data. JRSocInterface 9:1797–1808. in Zambia and South Africa” [Internet]. 2015. HIV Prevention Trials Dearlove B, Wilson DJ. 2013. Coalescent inference for infectious disease: Network. Available from: https://www.hptn.org/sites/default/files/ meta-analysis of hepatitis C. Philos Trans R Soc Lond B Biol Sci. 2016-05/HPTN%20071-2_Phylogenetics%20Ancillary%20Protocol_v% 368:20120314. 201.0_15Jan2015.pdf. Dellar RC, Dlamini S, Karim QA. 2015. Adolescent girls and young Iwuji CC, Orne-Gliemann J, Tanser F, Boyer S, Lessells RJ, Lert F, Imrie J, women: key populations for HIV epidemic control. JInt AIDS Soc. Barnighausen T, Rekacewicz C, Bazin B, et al. 2013. Evaluation of the 18:19408. impact of immediate versus WHO recommendations-guided an- Dennis AM, Herbeck JT, Brown AL, Kellam P, de Oliveira T, Pillay D, tiretroviral therapy initiation on HIV incidence: the ANRS 12249 Fraser C, Cohen MS. 2014. Phylogenetic studies of transmission dy- TasP (Treatment as Prevention) trial in Hlabisa sub-district, namics in generalized HIV epidemics: an essential tool where the KwaZulu-Natal, South Africa: study protocol for a cluster rando- burden is greatest?. J Acquir Immune Defic Syndr. 67:181–195. misedcontrolledtrial. Trials 14:230. Dennis AM, Hue S, Hurt CB, Napravnik S, Sebastian J, Pillay D, Eron JJ. Jacquez JA, Simon CP, Koopman J, Sattenspiel L, Perry T. 1988. Modeling 2012. Phylogenetic insights into regional HIV transmission. Aids and analyzing HIV transmission – the effect of contact patterns. 26:1813–1822. Math Biosci. 92:119–199. Didelot X, Gardy J, Colijn C. 2014. Bayesian inference of infectious disease Kouyos RD, von Wyl V, Yerly S, Boni J, Taffe P, Shah C, Burgisser P, transmission from whole-genome sequence data. Mol Biol Evol. Klimkait T, Weber R, Hirschel B, et al. 2010. Molecular epidemiology 31:1869–1879. reveals long-term changes in HIV type 1 subtype B transmission in Eaton JW, Johnson LF, Salomon JA, Barnighausen T, Bendavid E, Switzerland. JInfect Dis. 201:1488–1497. Bershteyn A, Bloom DE, Cambiano V,FraserC, HontelezJA, et al. Ku ¨hnert D, Stadler T, Vaughan TG, Drummond A. 2016. Phylodynamics 2012. HIV treatment as prevention: systematic comparison of math- with migration: a computational framework to quantify population ematical models of the potential impact of antiretroviral therapy on structure from genomic data. Mol Biol Evol. 33:2102–2116. HIV incidence in South Africa. PLoS Med. 9:e1001245. Lemey P, Rambaut A, Pybus OG. 2006. HIV evolutionary dynamics Faria NR, Rambaut A, Suchard MA, Baele G, Bedford T, Ward MJ, Tatem within and among hosts. AIDS Rev. 8:125–140. AJ, Sousa JD, Arinaminpathy N, Pepin J, et al. 2014. HIV epidemiology. Montana LS, Mishra V, Hong R. 2008. Comparison of HIV prevalence The early spread and epidemic ignition of HIV-1 in human popula- estimates from antenatal care surveillance and population- tions. Science 346:56–61. based surveys in sub-Saharan Africa. Sex Transm Infect. Fisher M, PaoD,Brown AE,SudarshiD,Gill ON,CaneP,Buckton AJ, 84(Suppl 1):i78–i84. Parry JV, Johnson AM, Sabin C, et al. 2010. Determinants of HIV- Moore JS, Essex M, Lebelonyane R, El Halabi S, Makhema J, Lockman 1 transmission in men who have sex with men: a combined S, Tchetgen E, Holme MP, Mills L, Bachanas P, Marukutira T, et al. clinical, epidemiological and phylogenetic approach. Aids 2013. Botswana Combination Prevention Project (BCPP). 24:1739–1747. ClincialTrials.gov. Available from: https://clinicaltrials.gov/ct2/ Fraser C, Hollingsworth TD, Chapman R, de Wolf F, Hanage WP. 2007. show/NCT01965470. Variation in HIV-1 set-point viral load: epidemiological analysis and Novitsky V, Ku ¨hnert D, Moyo S, Widenfelt E, Okui L, Essex M. 2015. an evolutionary hypothesis. Proc Natl Acad Sci U S A. Phylodynamic analysis of HIV sub-epidemics in Mochudi, Botswana. 104:17441–17446. Epidemics 13:44–55. Gardner EM, McLees MP, Steiner JF, Del Rio C, Burman WJ. 2011. The Pillay D, HerbeckJ,Cohen MS,de OliveiraT,FraserC,Ratmann O, spectrum of engagement in HIV care and its relevance to test-and- Brown AL, Kellam P, Consortium P-H. 2015. PANGEA-HIV: phy- treat strategies for prevention of HIV infection. Clin Infect Dis. logenetics for generalised epidemics in Africa. Lancet Infect Dis. 52:793–800. 15:259–261. Gavryushkina A, Welch D, Stadler T, Drummond AJ. 2014. Bayesian Poon AF. 2015. Phylodynamic inference with kernel ABC and its appli- inference of sampled ancestor trees for epidemiology and fossil cal- cation to HIV epidemiology. Mol Biol Evol. 32:2483–2495. ibration. PLoS Comput Biol. 10:e1003919. Poon AF, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH. Grabowski MK, Lessler J, Redd AD, Kagaayi J, Laeyendecker O, Ndyanabo 2013. Mapping the shapes of phylogenetic trees from human and A, Nelson MI, Cummings DA, Bwanika JB, Mueller AC, et al. 2014. zoonotic RNA viruses. PLoS One 8:e78122. The role of viral introductions in sustaining community-based HIV Price MN, Dehal PS, Arkin AP. 2010. FastTree 2–approximately epidemics in rural Uganda: evidence from spatial clustering, phylo- maximum-likelihood trees for large alignments. PLoS One 5:e9490. genetics, and egocentric transmission models. PLoS Med. Pybus OG, Rambaut A. 2009. Evolutionary analysis of the dynamics of 11:e1001610. viral infectious disease. Nat Rev Genet. 10:540–550. Gregson S, Nyamukapa CA, Garnett GP, Mason PR,ZhuwauT,CaraelM, Rambaut A, Grassly NC. 1997. Seq-Gen: an application for the Monte Chandiwana SK, Anderson RM. 2002. Sexual mixing patterns and Carlo simulation of DNA sequence evolution along phylogenetic sex-differentials in teenage exposure to HIV infection in rural trees. Comput Appl Biosci. 13:235–238. Zimbabwe. Lancet 359:1896–1903. Ratmann O, van Sighem A, Bezemer D, Gavryushkina A, Juurrians S, Hayes R, Ayles H, Beyers N, Sabapathy K, Floyd S, Shanaube K, Bock P, Wensing AM, de Wolf F, Reiss P, Fraser C. 2016. Sources of HIV Griffith S, Moore A, Watson-Jones D, et al. 2014. HPTN 071 infection among men having sex with men and implications for (PopART): rationale and design of a cluster-randomised trial of prevention. Sci Transl Med. 8:320ra322. the population impact of an HIV combination prevention interven- Serwadda D, Wawer MJ, Musgrave SD, Sewankambo NK, Kaplan JE, Gray tion including universal testing and treatment – a study protocol for RH. 1992. HIV risk factors in three geographic strata of rural Rakai a cluster randomised trial. Trials 15:57. District, Uganda. Aids 6:983–989. Hodcroft E. 2015. Estimating the heritability of virulence in HIV. PhD Shapiro B, Rambaut A, Drummond AJ. 2006. Choosing appropriate sub- thesis, University of Edinburgh. Available from: https://www.era.lib. stitution models for the phylogenetic analysis of protein-coding se- ed.ac.uk/handle/1842/15814. quences. MolBiolEvol. 23:7–9. Hollingsworth TD, Laeyendecker O, Shirreff G, Donnelly CA, Serwadda D, Stadler T, Bonhoeffer S. 2013. Uncovering epidemiological dynamics in Wawer MJ, Kiwanuka N, Nalugoda F, Collinson-Streng A, Ssempijja heterogeneous host populations using phylogenetic methods. Philos V, et al. 2010. HIV-1 transmitting couples have similar viral load set- Trans R Soc Lond B Biol Sci. 368:20120198. points in Rakai, Uganda. PLoS Pathog. 6:e1000876. Stadler T, Kuhnert D, Bonhoeffer S, Drummond AJ. 2013. Birth- HPTN 071-2 Phylogenetics in HPTN 071: An ancillary study to death skyline plot reveals temporal changes of epidemic spread “Population Effects of Antiretroviral Therapy to Reduce HIV in HIV and hepatitis C virus (HCV). Proc Natl Acad Sci U S A. Transmission (PopART): A cluster-randomized trial of the impact of 110:228–233. 202 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis Volz E, Ionides E, Romero-Severson E, Brandt MG, Mokotoff E, and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. Koopman J. 2013. HIV-1 transmission during early infection in Tanser F, Barnighausen T, Grapsa E, Zaidi J, Newell ML. 2013. High cov- men who have sex with men: a phylodynamic analysis. PLoS erage of ART associated with decline in risk of HIV acquisition in Med. 10:e1001568. rural KwaZulu-Natal, South Africa. Science 339:966–971. Volz EM. 2012. Complex population dynamics and the coalescent under Tanser F, Barnighausen T, Hund L, Garnett GP, McGrath N, Newell ML. neutrality. Genetics 190:187–201. 2011. Effect of concurrent sexual partnerships on rate of new HIV Volz EM, Kosakovsky Pond SL, Ward MJ, Leigh Brown AJ, Frost SD. 2009. infections in a high-prevalence, rural South African population: a Phylodynamics of infectious disease epidemics. Genetics cohort study. Lancet 378:247–255. 183:1421–1430. To TH, Jung M, Lycett S, Gascuel O. 2015. Fast dating using least-squares von WylV,KouyosRD, YerlyS,BoniJ,Shah C,BurgisserP,Klimkait T, criteria and algorithms. Syst Biol. 65:82–97. Weber R, Hirschel B, Cavassini M, et al. 2011. The role of migration UNAIDS. 2014. Fast-Track – Ending the AIDS epidemic by 2030. Geneva: and domestic transmission in the spread of HIV-1 non-B subtypes in UNAIDS. Available from: http://www.unaids.org/en/resources/docu Switzerland. JInfect Dis. 204:1095–1103. ments/2014/JC2686_WAD2014report Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Leigh Brown A. 2013. UNAIDS. 2015. AIDS by the numbers 2015. Geneva: UNAIDS. Available Estimating therateofintersubtyperecombination in earlyHIV-1 from: http://www.unaids.org/sites/default/files/media_asset/AIDS_ group M strains. JVirol. 87:1967–1973. by_the_numbers_2015_en.pdf. Wawer MJ, Sewankambo NK, Berkley S, Serwadda D, Musgrave SD, Vassall A, Pickles M, Chandrashekar S, Boily MC, Shetty G, Guinness L, Gray RH, Musagara M, Stallings RY, Konde-Lule JK. 1994. Incidence Lowndes CM, Bradley J, Moses S, Alary M, et al. 2014. Cost-effective- of HIV-1 infection in a rural region of Uganda. BMJ 308:171–173. ness of HIV prevention for high-risk groups at scale: an economic WHO. 2015. Guideline on when to start antiretroviral therapy and on evaluation of the Avahan programme in south India. Lancet Glob pre-exposure prophylaxis for HIV. Geneva: WHO Press. Available Health 2:e531–e540. from: http://www.who.int/hiv/pub/guidelines/earlyrelease-arv/en/. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Molecular Biology and Evolution Oxford University Press

Loading next page...
 
/lp/ou_press/phylogenetic-tools-for-generalized-hiv-1-epidemics-findings-from-the-5OidOYUbtJ

References (73)

Publisher
Oxford University Press
Copyright
Copyright © 2022 Society for Molecular Biology and Evolution
ISSN
0737-4038
eISSN
1537-1719
DOI
10.1093/molbev/msw217
pmid
28053012
Publisher site
See Article on Publisher Site

Abstract

Viral phylogenetic methods contribute to understanding how HIV spreads in populations, and thereby help guide the design of prevention interventions. So far, most analyses have been applied to well-sampled concentrated HIV-1 epi- demics in wealthy countries. To direct the use of phylogenetic tools to where the impact of HIV-1 is greatest, the Phylogenetics And Networks for Generalized HIV Epidemics in Africa (PANGEA-HIV) consortium generates full-genome viral sequences from across sub-Saharan Africa. Analyzing these data presents new challenges, since epidemics are principally driven by heterosexual transmission and a smaller fraction of cases is sampled. Here, we show that viral phylogenetic tools can be adapted and used to estimate epidemiological quantities of central importance to HIV-1 prevention in sub-Saharan Africa. We used a community-wide methods comparison exercise on simulated data, where participants were blinded to the true dynamics they were inferring. Two distinct simulations captured generalized HIV-1 epidemics, before and after a large community-level intervention that reduced infection levels. Five research groups participated. Structured coalescent modeling approaches were most successful: phylogenetic estimates of HIV-1 inci- dence, incidence reductions, and the proportion of transmissions from individuals in their first 3 months of infection correlated with the true values (Pearson correlation> 90%), with small bias. However, on some simulations, true values were markedly outside reported confidence or credibility intervals. The blinded comparison revealed current limits and strengths in using HIV phylogenetics in challenging settings, provided benchmarks for future methods’ development, and supports using the latest generation of phylogenetic tools to advance HIV surveillance and prevention. Key words: HIV transmission and prevention, molecular epidemiology of infectious diseases, viral phylogenetic methods validation. Introduction Incorporating these strategies into routine care services and Recent breakthroughs in human immunodeficiency virus delivering on the commitment to end the HIV-1 epidemic by type 1 (HIV-1) prevention and treatment have provided a 2030 remains a major challenge (UNAIDS 2014), particularly range of tools to reduce HIV-1 transmission (WHO 2015). in sub-Saharan Africa where the burden of HIV-1 is greatest. The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons. org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is Open Access properly cited. Mol. Biol. Evol. 34(1):185–203 doi:10.1093/molbev/msw217 Advance Access publication October 7, 2016 185 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE This region suffers 75% of all new HIV-1 infections worldwide, reductions in HIV-1 incidence over a short period. Viral phy- with adult HIV-1 prevalence exceeding 25% in some regions, logenetics could be an effective tool to measure similar re- and averaging 5% overall (UNAIDS 2015). To sustain public ductions, especially in contexts where incidence cohorts do health interventions at this scale with limited resources, a not exist, and thereby contribute to monitoring the impact of sufficiently detailed understanding of the local and regional prevention strategies. First, participants were asked to esti- drivers of HIV-1 spread is often indispensable. Universal pre- mate recent reductions in HIV-1 incidence resulting from a vention packages (Iwuji et al. 2013; Hayes et al. 2014) benefit simulated community-based intervention over a 3- to 5-year from data that allows monitoring incidence trends and driv- period. Here, incidence was defined as the proportion of new ers of residual spread, whereas more targeted prevention cases per year among uninfected adults, and reductions in approaches (Vassall et al. 2014) by definition require a de- incidenceweremeasuredinterms of the incidence ratiobe- tailed knowledge of at-risk populations. fore and after the intervention. Second, it has been debated The Phylogenetics And Networks for Generalized HIV whether frequent transmission during the early acute phase Epidemics in Africa (PANGEA-HIV) consortium aims to pro- of HIV infection could undermine the impact in reducing vide viral sequence data from across sub-Saharan Africa, and incidence of universal test and treat (Cohen et al. 2012). In to evaluate their viral phylogenetic relationship as a marker of concentrated epidemics, viral phylogenetics based on partial recent HIV-1 transmission dynamics (Pillay et al. 2015). pol sequences have been used to provide estimates of the Previous molecular epidemiological studies indicate that proportion of transmissions arising from individuals in their this approach can characterize transmission landscapes first year of infection (Volz et al. 2013; Ratmann et al. 2016). across a diverse array of epidemic contexts in order to guide Here, we sought to evaluate whether viral phylogenetics prevention efforts (Fisher et al. 2010; Kouyos et al. 2010; von based on full-genome sequences can provide accurate esti- Wyl et al. 2011; Stadler et al. 2013; Volz et al. 2013; Grabowski mates of the proportion of transmissions from individuals in et al. 2014; Bezemer et al. 2015; Ratmann et al. 2016). Rather early and acute HIV (defined here as in their first 3 months of than the partial gene sequences frequently used, the consor- infection), because these are likely not preventable in current tium is generating near full-length HIV-1 sequences in order prevention trials where testing intervals are 1 year or more to further increase the resolution and power of viral phylo- (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014). Third, genomic methods (Dennis et al. 2014). Indeed, such increases as sequence data are now collected as part of HIV-1 preven- in power are needed to disentangle signal from noise in ep- tion trials (HPTN 071 (PopART) Phylogenetics Protocol Team idemic settings with frequent co-infection and recombination 2015; Novitsky et al. 2015), different approaches to prospec- events (Grabowski et al. 2014), and to shift focus to recent tive sequence sampling have emerged. Sequences could be transmission dynamics (Dennis et al. 2014). collected at high coverage in villages or smaller townships at Available viral phylogenetic techniques can provide esti- the risk of missing long-range transmissions, or at lower cov- mates of key epidemiological quantities of concentrated erage over geographically much larger areas. We sought to HIV-1 epidemics (Brenner et al. 2007; Fisher et al. 2010; compare the impact of these sampling strategies on viral Stadler and Bonhoeffer 2013; Volz et al. 2013; Bezemer phylogenetic analyses by simulating epidemics in village and et al. 2015; Ratmann et al. 2016). But the generalized epi- larger regional populations, and sampling sequences at high demics in sub-Saharan Africa and sequence availability in and low coverage respectively. Other objectives included eval- these resource-poor settings differ fundamentally from uating the benefit of using concatenated HIV-1 sequences well sampled concentrated epidemics in wealthy countries, comprising simulated pol, gag and env genes, as compared where viral phylogenetic tools have been proven to be most with using simulated pol sequences alone, and the impact of effective to date (Dennis et al. 2014). To strengthen the frequent viral introductions into the modeled population as a application of viral phylogenetics in sub-Saharan Africa, in result of long-distance transmission. Table 1 describes the October 2014 PANGEA-HIV invited research groups to par- objectives and reporting variables of the exercise more fully. ticipate in a blinded methods comparison exercise. Two Five external research groups participated in the exercise, individual-level HIV epidemic models were used to simu- out of eight teams that initially indicated interest. Table 2 lists late generalized HIV-1 epidemics. From these, we gener- the phylogenetic methods that were used: the ABC-kernel ated corresponding viral sequence datasets comprising method (A.Poon, J. Joy,R.Liang; teamVancouver)(Poon simulated pol, gag and env genes (which we refer to as 2015), the birth-death skyline method with sampled ancestors full genome sequences for brevity), as well as basic (C. Weis, G.E. Leventhal, D. Ku ¨hnert,D.A.Rasmussen, T. Stadler; individual-level epidemiological data on those infected team Basel-Zu ¨rich) (Gavryushkina et al. 2014; Ku ¨hnert et al. individuals that were sequenced in the simulations. 2016), a metapopulation coalescent approach (B. Dearlove, M. External research groups then analyzed the blinded data. Hossain, S. Frost; team Cambridge) (Dearlove and Wilson 2013), Overall, we aimed to evaluate if the most recent genera- thestructuredcoalescent(E. Volz,M.Hossain,S.Frost;team tion of viral phylogenetic tools could be adapted and used to Cambridge-London) (Volz et al. 2009), and a Bayesian trans- estimate epidemiological quantities of central importance to mission chain analyser (C. Colijn, M. Kendall, X. Didelot, G. HIV-1 prevention in sub-Saharan Africa. The specific objec- Plazotta; team London) (Didelot et al. 2014). These methods tives were inspired by current HIV-1 prevention trials in sub- differed in the underlying transmission and intervention mod- Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. els, assumptions to facilitate estimation of the reporting vari- 2014). The primary goal of these trials is to achieve substantial ables, and computational estimation routines. Here, we 186 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Table 1. Aims of the PANGEA Phylodynamic Methods Comparison Exercise. Objectives Reporting Variable Primary objectives 1 Identify incident trends during the intervention Consider the year t before the intervention started, and the second last year t of the s e simulation. Participants were asked to report HIV-1 incidence trends from t to t in s e terms of “declining”, “stable”, “increasing” 2 Estimate HIV-1 incidence after the intervention Participants were asked to report %Incidence defined as %INCðt Þ¼ INCðt Þ=Sðt Þ, e e e where INCðt Þ is the number of new cases in year t , and Sðt Þ is the number of e e e sexually active individuals that were not infected in year t 3 Quantify the reduction in HIV-1 incidence at the Participants were asked to report the incidence ratio %INCðt Þ=%INCðt Þ e s end of the intervention 4 Estimate the proportion of transmissions from Participants were asked to report the proportion of new cases in year t from indi- early and acute HIV, just before the intervention viduals in their first 3 months of infection 5 Estimate the proportion of transmissions from Participants were asked to report the proportion of new cases in year t from indi- early and acute HIV, after the intervention: viduals in their first 3 months of infection Secondary objectives To estimate the impact of the following controlled covariates on the reporting variables: 6 Availability of full genome sequences (HIV-1 gag, pol and env genes) as compared with partial sequences (HIV-1 pol gene only) 7 Sequence sampling frame: Sequence coverage at the end of the simulation; Rapid increases in sequence coverage; Sampling duration after intervention start 8 Frequency of viral introductions into the modeled study population 9 Inference of dated viral phylogenies from sequence data summarizethe findings of theexercise, and discuss their impli- simulations, a combination prevention intervention was cations for using phylogenetic methods to estimate recent as- started in 2015 for 3 years at varying degrees of uptake and pects of HIV-1 transmission dynamics in generalized epidemics. coverage, resulting in 30% or 60% reductions in incidence Datasets and simulations generated here may be of use for relative to the start of the intervention, when incidence was testing other applications of viral phylogenetic methods, and close to 2% per year. In half of the 20 simulations, the pro- are made available alongside this article. portion of early transmissions in 2015 was respectively cali- brated to 10% and 40% (fig. 2). Ranges in incidence reduction reflect modeled, optimistic and pessimistic scenarios in on- Results going prevention trials in sub-Saharan Africa (Iwuji et al. 2013; PANGEA-HIV Reference Datasets for Benchmarking Moore et al. 2013; Hayes et al. 2014). The proportion of trans- Molecular Epidemiological Transmission Analysis missions from early and acute HIV has been challenging to estimate without sequence data, and the ranges used here Methods reflect estimates from several settings in sub-Saharan Africa The simulations capture a variety of transmission and inter- (Cohen et al. 2012). About 5–20% of all transmissions per year vention scenarios across two demographic settings in sub- occurred from outside the model population, which hindered Saharan Africa, and are available from https://dx.doi.org/10. prevention efforts in the simulations through continual re- 6084/m9.figshare.3103015 (last accessed October 14, 2016). plenishment of the epidemic. 20 datasets correspond to generalized HIV-1 epidemics in a 13 simulated datasets capture generalized HIV-1 epidemics region of 80,000 individuals between 1980 and 2020 (table over 45 years in a smaller village population of 8,000 indi- 3). The proportion of infected individuals of whom one se- viduals (table 3). Sequence coverage was higher in this smaller quence was sampled (sequence coverage) was 8–16% by the population, 25–50% by the end of the simulation. These data end of the simulation. These data were simulated under the were simulated under an individual-based household model individual-based HPTN071 (PopART) model, version 1.1, de- using the Discrete Spatial Phylo Simulator for HIV, developed veloped at Imperial College London (“Regional” model). The at the University of Edinburgh (“Village” model). Model com- overall simulation pipeline and model components are illus- ponents are illustrated in figure 1, and further information is trated in figure 1, and further information is provided in sup provided in supplementary table S2, Supplementary Material plementary table S1, Supplementary Material online. The online. The Village model was parameterized to simulate an Regional model was calibrated to generate an epidemic HIV-1 epidemic mostly contained within a small rural African with a comparable prevalence atthe startofthe intervention village, with a peak prevalence of 20–25% and peak incidence to that seen currently in HPTN071 (PopART) trial sites in of 5–7% without treatment (fig. 2). In 12 out of 13 simula- South Africa (Hayes et al. 2014). In the model, standard of tions, a community-level intervention providing antiretroviral care improved according to national guidelines over time, treatment took place for the last 5 years of the simulation. resulting in steady declines in incidence. In 18 of the 20 187 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Table 2. Phylogenetic Methods Used in the PANGEA Phylodynamic Methods Comparison Exercise. Team Team Members Method Model-based Model Overview Simulated Data Used To Fitting Process Availability analysis Inform Inference Basel-Zu ¨ rich C. Weis, G.E. Birth–death sky- Yes Stochastic birth–death model with sampled ances- All sequences and full Markov Chain http://beast2.org/ Leventhal, line method tors to estimate incidence and incidence reduc- trees to estimate Monte Carlo (last accessed D. Ku ¨ hnert, with sampled tions, and multi-type birth death model birth–death parame- October 14, 2016) D.A. ancestors corresponding to two stages of infection to esti- ters; cross-sectional using add-ons Rasmussen, mate the proportion of early transmissions. Time survey data bdsky, SA, bdmm T. Stadler trends in parameters were modeled with serial time intervals during which parameters were assumed constant. Viral introductions were not modeled Cambridge B. Dearlove, Meta-population Yes Standard SI, SIS and SIR models were averaged. Model All sequences and full Markov Chain http://beast.bio.ed. M. Hossain, coalescent parameters did not change over time. Viral intro- trees. Monte Carlo ac.uk/ (last S. Frost approach ductions were not modeled. accessed October 14, 2016) using XML specification described in (Dearlove and Wilson 2013) Cambridge- E. Volz, M. Structured Yes Deterministic compartment model stratified by gen- All sequences and sub- Parallel Markov http://colgem.r- London Hossain, coalescent der, disease progression, diagnosis and treatment trees including all in- Chain Monte forge.r-proj- S. Frost status, risk behavior. Time trends in baseline ternal nodes 30 before Carlo ect.org/ (last transmission rates were modeled with 4-parameter the last sample; cross- accessed October generalized logistic function. Diagnosis and treat- sectional survey data; 14, 2016) ment uptake rates changed at intervention start. and gender and CD4 Viral introductions were modeled with a source count at time of diag- deme. nosis for Regional datasets. London C. Colijn, Bayesian trans- Yes Stochastic generalized branching model with gener- All sequences and full Reversible-jump https://github.com/ M. Kendall, mission chain ation time modeled to represent three infection trees on village data- Markov Chain xavierdidelot/ G. Plazotta, analyzer stages. Model parameters did not change over time. sets; sequences in Monte Carlo TransPhylo (last X. Didelot Viral introductions were not modeled. trees with at least 80 accessed October tips on regional 14, 2016) PANGEA datasets. release available from authors Vancouver A. Poon, J. Joy, ABC kernel Yes Deterministic compartment model stratified by in- All sequences and full Approximate https://github.com/ R. Liang method fection status, three stages of infection, and risk trees. Bayesian ArtPoon/kamphir behavior. Model parameters did not change over Computation (last accessed time. Viral introductions were modeled with a October 14, 2016) source deme. PANGEA release available from authors Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Table 3. Simulated Datasets of the Phylodynamic Methods Comparison Exercise. a,b Model Dataset Purpose %Acute Intervention Viral Sequences Sequence Sequences Sampling a,c (Low¼L, Scale Up Introduction- (#) Coverage in the After Duration a,d High¼H) (Fast¼F, s (% of All Last Year of the Intervention After a,e f Slow¼S) Transmission- Simulation (% Start (% of All Intervention s per Year) of All Infected Sequences) Start (Years) and Alive) Regional D Identify 60% reduction in incidence during intervention and 10% L F 5 1,600 8 50 5 early transmissions. C Identify 30% reduction in incidence during intervention and 10% L S 5 1,600 8 50 5 early transmissions. A Identify 60% reduction in incidence during intervention and 40% H F 5 1,600 8 50 5 early transmissions. B Identify 30% reduction in incidence during intervention and 40% H S 5 1,600 8 50 5 early transmissions. O As D, and evaluate impact of sampling frame: shorter duration of L F 5 1,280 8 50 3 intensive sequencing. T As D, and evaluate impact of tree reconstruction. L F 5 1,600 8 50 5 S As D, and evaluate impact of sampling frame: most sequences L F 5 1,600 8 85 5 from after intervention start. I As D, and evaluate impact of sampling frame: higher sequence L F 5 3,200 16 50 5 coverage. R As C, and evaluate impact of tree reconstruction. L S 5 1,600 8 50 5 Q As C, and evaluate impact of sampling frame: most sequences L S 5 1,600 8 85 5 from after intervention start. G As C, and evaluate impact of sampling frame: higher sequence L S 5 3,200 16 50 5 coverage. N Control simulation, no intervention. L None 5 1,600 8 50 5 F As A, and evaluate impact of sampling frame: shorter duration of H F 5 1,280 8 50 3 intensive sequencing. L As A, and evaluate impact of tree reconstruction. H F 5 1,600 8 50 5 J As A, and evaluate impact of sampling frame: higher sequence H F 5 3,200 16 50 5 coverage. P As A, and evaluate impact of higher proportion of viral H F 20 1,600 8 50 5 introductions. H As B, and evaluate impact of tree reconstruction. H S 5 1,600 8 50 5 K As B, and evaluate impact of sampling frame: higher sequence H S 5 3,200 16 50 5 coverage. E As B, and evaluate impact of higher proportion of viral H S 20 1,600 8 50 5 introductions. M Control simulation, no intervention. H None 5 1,600 8 50 5 Village 3 Identify 40% reduction in incidence during intervention and 4% LF <2 777 25 >95 5 early transmissions. 2 Identify 15% reduction in incidence during intervention and 4% LS <2 857 25 >95 5 early transmissions. (continued) Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Table 3. Continued a,b Model Dataset Purpose %Acute Intervention Viral Sequences Sequence Sequences Sampling a,c (Low¼L, Scale Up Introduction- (#) Coverage in the After Duration a,d High¼H) (Fast¼F, s (% of All Last Year of the Intervention After a,e f Slow¼S) Transmission- Simulation (% Start (% of All Intervention s per Year) of All Infected Sequences) Start (Years) and Alive) 1 Identify 40% reduction in incidence during intervention and 20% HF <2 957 25 >95 5 early transmissions. 4 Identify 15% reduction in incidence during intervention and 20% HS <2 1,040 25 >95 5 early transmissions. 5 As 3, and evaluate impact of sampling frame: higher sequence LF <2 1,469 50 >95 5 coverage. 11 Similar to 3, without imported sequences. L F 0 638 25 >95 5 8 As 2, and evaluate impact of sampling frame: higher sequence LS <2 1,630 50 >95 5 coverage. 9 Similar to 2, without imported sequences. L S 0 686 25 >95 5 0 Control simulation, no intervention. L None <2 872 25 >95 5 6 As 1, and evaluate impact of sampling frame: higher sequence HF <2 1,831 50 >95 5 coverage. 12 Similar to 1, without imported sequences. H F 0 956 25 >95 5 7 As 4, and evaluate impact of sampling frame: higher sequence HS <2 1,996 50 >95 5 coverage. 10 Similar to 4, without imported sequences. H S 0 1,012 25 >95 5 Variables in shaded columns were unknown to participants at time of analysis. Values range from 5% to 40%, reflecting recent estimates for endemic-phase epidemics in sub-Saharan Africa (Cohen et al. 2012). Range reflects optimistic and pessimistic scenarios in prevention trials in sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014). Range includes frequent viral introductions as reported in settings with highly mobile populations (Grabowski et al. 2014). In comparison to the large sequence datasets that are available for concentrated epidemics in Europe or North America, the lower values here reflect challenges in achieving high sequence coverage where large populations are infected. Higher values reflect geographically focused sequencing efforts such as in Mochudi, Botswana (Carnegie et al. 2014). Values reflect the duration of typical prevention trial settings, and that most sequences are obtained after intervention start (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014). Out of all individuals that were alive and infected in the last calendar year of the simulation, the proportion that had ever a sequence taken. For datasets in bold, only viral sequences were disclosed. For all other datasets, only viral phylogenies were provided. Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Model Regional simulations Village simulations Simulation Output Component Demographics FIG.1. Simulation pipeline to generate HIV-1 sequence data, viral phylogenies, and accompanying individual-level data. Two simulation models (Regional and Village) were implemented for the methods comparison. The two individual-level epidemic and intervention models generated HIV-1 transmission chains in the model population, and its components are shown in blue to green. Next, individuals were sampled for sequencing, and a viral tree was generated for these individuals. Tree generation accounted for within-host viral evolution under a neutral coalescent model. Finally, viral sequences comprising the gag, pol and env genes were simulated along the viral tree. Sequence generation accounted for known variation in evolutionary rates across genes, codon positions, and along within-host lineages. Further details are provided in supplementary tables S1 and S2, Supplementary Material online. Treatment uptake was either “fast” or “slow”, with reductions in sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; in incidence averaging between 10% and 40% relative to be- Dennis et al. 2014; Grabowski et al. 2014;HPTN071 (PopART) fore intervention start. Additionally, simulations were config- Phylogenetics Protocol Team 2015; Pillay et al. 2015). ured so that either a small (4%) or large (20%) proportion of Sequence sampling biases can be substantial in real datasets, transmissions occurred during the first 3 months of infection. but were not included in the model (Carnegie et al. 2014; Some infections originated from outside the model popula- Ratmann et al. 2016). Second, viral trees were generated un- tion in half of the simulations. der a hybrid within- and between-host coalescent model. The Viral sequences were generated from the simulated trans- viral trees did not always correspond to the transmission mission chains (fig. 1). First, individuals were sampled at ran- trees, because viruses diversified within infected individuals dom for sequencing. The majority of individuals were only before transmission (Pybus and Rambaut 2009). In 25 of the 33 sampled in the last years of the simulations, reflecting that datasets, these viral trees were made available, in order to re- sequences are only beginning to be more routinely collected duce the computational burden of molecular epidemiological 191 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Regional simulation model Adult population size HIV prevalence Early transmissions from individuals in their first three months of infection 60000 20 1990 2000 2010 2020 1990 2000 2010 2020 1990 2000 2010 2020 ART coverage among infected adults New cases among uninfected adults per 12 months Sequence coverage among infected adults 2.00 10.0 1.75 7.5 1.50 5.0 1.25 25 2.5 1.00 0.75 0.0 2016 2018 2020 2016 2018 2020 1990 2000 2010 2020 Calendar year, simulation started in 1980 Data sets D, O, S, T A, F, L C, R, Q B, H N M Village simulation model Adult population size HIV prevalence Early transmissions from individuals in their first three months of infection 0 0 10 15 20 25 30 35 40 45 10 15 20 25 30 35 40 45 10 15 20 25 30 35 40 45 New cases among uninfected adults per 12 months Sequence coverage among infected adults ART coverage among infected adults 7 50 6 40 4 20 3 10 0 2 40 41 42 43 44 45 39 40 41 42 43 44 45 46 39 40 41 42 43 44 45 46 Years since start of the simulation Data sets 3, 5 2, 8 1, 6 4, 7 9 10 11 12 0 FIG.2. Simulated epidemic scenarios under the Regional and Village models. (A) Six generalized HIV-1 epidemic scenarios were simulated in a region of 80,000 adult individuals using the Regional model, and (B) nine scenarios were simulated in a rural village population with an initial population of 6,000 individuals using the Village model. The scenarios differ in terms of incidence, the proportion of early transmissions, and scale-up of the combination prevention package during the intervention period (gray-shaded time period). From these, 33 datasets were generated, that included either viral sequences or viral trees. These datasets further varied in the sequence sampling frame and the frequency of viral introductions; see also figure 1 and table 3.DatasetsE,G,I,J,K,Phadmorefrequent viralintroductions or higher sequencecoverage, andare notshown.The proportion of early transmissions under the Village model was smoothed with a 3-year sliding window to better visualize trends in this smaller model population. Total Total % % % % % Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Team Cambridge Team Cambridge-London rich Team London Team Vancouver A L D S F C Q B H M 03 11 01 06 12 10 07 J P I T O G R E K N 05 09 02 08 00 04 A L D S F C Q B H M 03 11 01 06 12 10 07 A L D S F C Q B H M 03 11 01 06 12 10 07 J P I T O G R E K N 05 09 02 08 00 04 J P I T O G R E K N 05 09 02 08 00 04 PANGEA data set Estimates based on concatenated gag, pol, env sequences true tree FIG.3. Estimates of HIV-1 incidence from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team (panel) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black. analyses (table 3 and supplementary figs. S1 and S2, level intervention scenarios through the viral sequences Supplementary Material online). For the remaining 13 data- provided (supplementary fig. S5, Supplementary Material on- sets, viral sequences of HIV-1 gag, pol and env genes were line). However, we expected that rapid increases in sequence simulated along the viral trees (1,500, 3,000 and 2,500 coverage after the intervention would complicate phyloge- nucleotides respectively, for a total of approximately 6,000 netic inference. The simulations also retained, on average, nucleotides), from an HIV-1 subtype C starting sequence. The information for differentiating between the 10% and 40% sequences thus represent generalized subtype C epidemics, early transmission scenarios of the Regional simulations at as in most Southern African countries. The nucleotide se- very low sequence coverage (supplementary fig. S6, quence evolution model that was used incorporated known Supplementary Material online). More challenges were ex- differences in evolutionary rates by gene and codon position pected on the Village simulations despite higher sequence and relative differences in substitution rates by gene and coverage, partly because the effect size between the low codon position (Shapiro et al. 2006; Alizon and Fraser and high %Acute scenarios was smaller (supplementary fig. 2013). The coalescent and sequence evolution models did S7, Supplementary Material online). not account for recombination, sequencing errors, or selec- tion beyond differential evolutionary rates across genes, co- Responses to the Methods Comparison Exercise dons and within-host lineages (supplementary tables S1 and Participants were primarily asked to estimate incidence re- S2, Supplementary Material online). As a key indicator of the ductions from before the intervention (year 39 or 2014) to realism of the simulated sequences, we calculated the pro- just after the intervention (year 43 or 2018), and to estimate portionofthe variationinevolutionary diversification among the proportion of early transmissions in the year before and the simulated HIV-1 sequences, that can be explained by a after the intervention (table 1). Participating teams developed constant molecular clock model. The proportion explained fast computational strategies for handling full-genome HIV ranged from 25% to 60% (supplementary figs. S3 and S4, sequence datasets within given timelines (3 months for 13 Supplementary Material online), broadly in line with esti- Village datasets and 6 months for 20 regional datasets). First, mates on real HIV-1 sequence datasets (Lemey et al. 2006). where only sequences were provided, viral phylogenies were The simulations were designed to retain signal for differ- reconstructed with maximum likelihood methods (Price et al. entiating between the “fast”, “slow” and “no” community- 2010; Stamatakis 2014). Second, these phylogenies were dated Estimated and true incidence after the intervention (%) Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Team Cambridge-London rich Team Cambridge 1.5 1.0 0.5 0.0 Team London Team Vancouver D S A L F 03 01 C Q 12 E K 02 04 M N 00 I T J P O 05 06 G R B H 10 08 07 11 09 1.5 1.0 0.5 0.0 D S A L F 03 01 C Q 12 E K 02 04 M N 00 D S A L F 03 01 C Q 12 E K 02 04 M N 00 I T J P O 05 06 G R B H 10 08 07 11 09 I T J P O 05 06 G R B H 10 08 07 11 09 PANGEA data set Estimates based on concatenated gag, pol, env sequences true tree FIG.4. Estimates of HIV-1 incidence reductions from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team (panel) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black. under least-squares criteria or similar fast approaches (To mentary table S2, Supplementary Material online, team et al. 2015). Third, dated phylogenies were used as input to Cambridge-London who used a structured coalescent the transmission analysis methods described in table 2.This model). Bias in these estimates was relatively small for esti- sequential approach allowed the teams to obtain phyloge- mates of two teams (on an average 0.35% by team netic estimates to all reporting variables for the large majority Cambridge-London and 0.57% by team London). Team of the datasets (see supplementary table S3, Supplementary Basel-Zu ¨rich achieved substantially more accurate estimates Material online). Team Vancouver did not provide estimates on the Regional datasets than the Village datasets, whereas to datasets of the Regional model that contained true phy- the converse was true for team London (supplementary table logenetic trees; and teams Cambridge-London and Basel- S2, Supplementary Material online). Zu ¨rich did not provide estimates to datasets of the The accuracy of phylogenetic estimates of changes in in- Regional model that contained sequences. The most com- cidence as a result of the intervention largely reflected the mon reasons for incomplete recall were limited availability of accuracy of the underlying incidence estimates (fig. 4). computing resources, tight timelines to evaluate the simula- Phylogenetic estimates of incidence ratios correlated with tions, and difficulties in tree estimation when viral introduc- the true values by 93% under the structured coalescent ap- tions occurred frequently. Nearly all participants focused on proach of team Cambridge-London, and had only slight up- inference from full viral genomes (supplementary table S3, ward bias (supplementary table S4, Supplementary Material Supplementary Material online), meaning that the impact of online). This meant that large reductions in incidence, which full genome sequences (concatenated HIV-1 gag, pol and env are expected from combination prevention interventions, genes) as compared with partial sequences (HIV-1 pol gene could be correctly detected at relatively low sequence cover- only) could not be evaluated. age when sequences were sampled for 5 years since interven- tion start by the most successful method. Epidemic simulations with >25% reductions in incidence were cor- Estimating Incidence and Reductions in Incidence rectly classified as declining in 15/17 (88%) of all simulations Phylogenetic methods differed in their ability to estimate in- with a submission by team Cambridge-London, although the cidenceafter theintervention(fig. 3). Under the most suc- true positive rate was lower with other phylogenetic methods cessful computational approach, phylogenetic estimates of (supplementary table S5, Supplementary Material online). incidence were correlated with true values by 91% (supple Estimated and true incidence ratio Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Regional Simulation Model Team Cambridge-London rich Team London Team Vancouver OD S I T N R C Q G E P MA L F J K H B O D S I T NRC QG E P M A L F J K H B OD S I T N R CQG E PMA LF JK H B OD S I T N R C Q G E P MA L F J K H B Village Simulation Model Team Cambridge-London rich Team London Team Vancouver 09 11 00 02 03 05 08 01 04 06 07 10 12 09 11 00 02 03 05 08 01 04 06 07 10 12 09 11 00 02 03 05 08 01 04 06 07 10 12 09 11 00 02 03 05 08 01 04 06 07 10 12 PANGEA data set Estimates based on concatenated gag, pol, env sequences true tree FIG.5. Estimates of the proportion of transmissions from individuals in their first 3 months of infection (early and acute HIV), before the intervention from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team and model simulation (panels) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black. Estimating the Proportion of Transmissions from differences in the simulation datasets (referred to as “covar- iates”), such as sequence coverage and frequency of viral in- Individuals in Their First Three Months of Infection troductions (table 3). Figure 6A illustrates the phylogenetic (Early and Acute HIV) estimates that deviated largely from the true values (referred Phylogenetic estimates of the proportion of early transmis- to as “outliers”). We focused on quantifying the association of sions just before and after the intervention were more accu- outlier presence with the covariates listed in table 3 using a rate on the Regional simulations than the Village simulations, partial least squares regression approach, which enabled us to potentially reflecting stronger signal as a result of larger effect handle a relatively large number of co-dependent covariates sizes in the Regional simulations (fig. 5 and supplementary (see “Materials and Methods” section). figs. S6–S8, Supplementary Material online). On the regional Several covariates could be excluded from this analysis. simulations, estimates by team Cambridge-London had a Estimates obtained from the simulated full genome sequence mean absolute error of 3.9% and correlated with true values datasets were not more strongly associated with estimation by 92%. However, on the Village simulations, the mean abso- error than estimates obtained using the phylogenetic trees lute error in estimates by team Cambridge-London was 12% from which the sequences were simulated (supplementary (supplementary table S6, Supplementary Material online). fig. S9 and supplementary table S7, Supplementary Material Other teams had, overall, difficulties recovering the frequent online). Shorter, intense sampling periods after intervention early transmission scenarios. Team Basel-Zu ¨rich achieved the start of 3 years compared with a default of 5 years were also smallest mean absolute error on the Village simulations (sup not strongly associated with larger estimation error (supple plementary table S6, Supplementary Material online). mentary table S7, Supplementary Material online). Figure 6B shows the proportion of variance in outlier Predictors of Large Error in Phylogenetic Estimates presence that is explained by each of the remaining co- We evaluated to what extent the variation in errors of phy- variates. Signs indicate the impact of a change in predic- logenetic estimates could be associated to systematic tor values on the number of phylogenetic estimates with Estimated and true % early transmissions just before the intervention Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE Incidence Incidence reduction Proportion of early transmissions Proportion of early transmissions after intervention during intervention just before intervention after intervention 04 07 07 03 B Team Vancouver J H Team London A J M T S J J L P G Team Cambridge-London 09 11 Team Cambridge −2 0 2 01234 −20 0 20 −20 0 20 error in phylogenetic estimate a a Outlier No Yes Incidence Incidence reduction Proportion of early transmissions Proportion of early transmissions after intervention during intervention just before intervention after intervention Team Vancouver −+ + + − + − − − − − −− Team London ++ − − − − − −− − − − ++ − − − ++ − + −− Team Cambridge-London + +− + +− − 0 20406080 100 0 20 40 60 80 100 0 20406080 100 0 20 40 60 80 100 variance in outlier presence explained (%) Error predictor and predictor values Impact of change in predictor values True % incidence, increasing Positive impact, fewer outliers True incidence ratio, increasing Negative impact, more outliers True % early transmissions just before intervention, increasing True % early transmissions after intervention, increasing Village simulation model vs. Regional simulation model Frequency of viral introductions 20%/year vs. <=5%/year High sequence coverage (50% for Village, 16% for Regional) vs. lower coverage (25% for Village, 8% for Regional) Proportion of sequences from after intervention start >80% vs. 50% FIG.6. Predictors of large error in phylogenetic estimates. (A) For each response, the error in the phylogenetic estimate was calculated, and statistical outliers were identified. The plot shows error in phylogenetic estimates by team and outcome measure. For large errors, the corre- sponding PANGEA dataset code in table 1 is indicated. (B) The contribution of the systematically varied covariates in table 1 to the presence of outliers was quantified through partial least squares regression (PLS, see “Materials and Methods” section). The plot shows the contribution of each predictor to the variance in outlier presence in colors, and the corresponding signs of the regression coefficients are added. Estimates from team Cambridge could not be characterized due to small sample size. The impact of the error predictors varied across the primary objectives of phylogenetic inference, as well as the phylogenetic methods used. With regard to estimates of incidence and incidence reduction, a subset of phylogenetic methods was particularly sensitive to high sequence coverage, a very large proportion of sequences obtained after intervention start, and a large frequency of viral introductions. With regard to estimates of the proportion of early transmissions, outliers were in several cases best explained by true differences in the proportion of early transmissions. very large error. Subplots are empty when phylogenetic (>80% vs. 50%) were associated with more outliers for methods did not produce estimates with large error (in- more than one phylogenetic method. Frequent viral in- dicating ahigherdegreeofsuccess). Overall, with regard troductions (20%/year vs. < ¼5%/year) were associated to estimates of incidence and incidence reduction, higher with more outliers by team Basel-Zu ¨ rich. These predictors sequence coverage (16% vs. 8% in the Regional datasets tended to outweigh the impact that true differences in and 50% vs. 25% in the Village datasets) and a large pro- incidence and incidence reduction had on outlier portion of sequences obtained after intervention start presence. 196 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE In contrast, with regard to estimates of the proportion of (Faria et al. 2014), or to undertake descriptive analyses of early transmissions, outliers were in several cases best ex- putative transmission chains (Brenner et al. 2007; Dennis plained by true differences in the proportion of early trans- et al. 2012). To be precise, the challenge here was in obtaining missions. Several phylogenetic methods had substantial quantitative estimates of HIV-1 incidence and the sources of difficulty estimating frequent early transmissions. Low sam- transmission in generalized epidemics, and to do so close to pling coverage did not contribute substantially to the pres- the present, when the phylogenetic signal weakens (de Silva ence of outliers. To substantiate this observation further, we et al. 2012). Second, sequence coverage was relatively low in compared phylogenetic estimates from just before the inter- most simulations, as is expected for most endemic-phase vention to those after the intervention, and found no con- settings in sub-Saharan Africa. Furthermore, frequent viral sistent improvements in accuracy with a doubling of introductions complicated the interpretation of viral trees, sampling coverage (supplementary fig. S10, Supplementary timelines were tight (3 months for the Village datasets, and Material online). Instead, outlier presence could be explained 6 months for the Regional datasets), and phylodynamic mod- through the simulation model, with more outliers on the els had to represent viral spread in heterogeneous popula- Village datasets. These simulations were characterized by tions (males and females with different risk profiles). We smaller sample sizes and smaller effect size (table 3 and sup aspired to evaluate the extent to which these challenges plementary figs. S6 and S7, Supplementary Material online). can be addressed with full genome HIV-1 sequences, and through customized phylogenetic methods. The methods comparison exercise demonstrates that viral Discussion phylogenetic tools can successfully estimate aspects of recent The PANGEA methods comparison exercise represents a transmission dynamics of generalized HIV-1 epidemics at community-wide effort for advancing the use of phylogenetic limited sequence coverage of the infected population, when methods to estimate aspects of recent HIV-1 transmission full-genome sequences are available. Two methods, the ABC dynamics of generalized epidemics in sub-Saharan Africa. kernel method of team Vancouver and the Bayesian trans- This region is affected by the largest HIV-1 epidemics world- mission analyzer of team London (table 2), were newly de- wide. Viral phylogenetics could be a central tool to guide HIV- veloped in response to the exercise. The birth–death skyline 1 prevention in these settings (Dennis et al. 2014). model with sampled ancestors (Gavryushkina et al. 2014)and It is not possible for phylogenetic methods to capture all its multi-type analogue (Ku ¨hnert et al. 2016) are readily avail- factors that influence the spread of HIV-1, ranging all the way able through the BEAST2 software package. The structured from biological factors determining person-to-person trans- coalescent (Volz et al. 2009) was customized to reflect avail- mission (Cohen et al. 2011) to the structure of sexual net- able information on the simulated epidemics, and required works on the community level (Gregson et al. 2002; Tanser considerable resources (roughly 1 week of computation time et al. 2011), and the broader impact of prevention and care on a 64-core machine of 2.5 Ghz processors per analysis). The services (Gardner et al. 2011). Of course, capturing all such methods comparison reflects these different stages in devel- features may not be needed: particular aspects of HIV-1 opment and customization. In this context, the structured spread in generalized epidemics could be estimable from se- coalescent approach was overall most accurate, producing quence data under the simplifying assumptions of phyloge- accurate estimates of incidence and changes in incidence, netic methods, and at relatively low sequence coverage. as well as broadly accurate estimates into the proportion of To validate this hypothesis from the outset, the PANGEA- early transmissions on the Regional simulations from full- HIV team simulated data under two highly complex HIV genome sequences. Confidence intervals were sufficiently transmission and intervention models, whose components tight for epidemiological interpretation, bearing in mind are considered essential for understanding long-term HIV that uncertainty in tree reconstructions was ignored. This transmission dynamics (Eaton et al. 2012). The aspects of indicates that the latest generation of viral phylogenetic HIV-1 spread evaluated here (table 1) were chosen both be- methods can complement standard incidence estimation cause molecular epidemiological studies into the sources of techniques where full-genome sequences are available from transmission and temporal changes in epidemic spread are in the general population. The use of sequence data for estimat- principle feasible (von Wyl et al. 2011; Stadler et al. 2013; Volz ing incidence trends in sub-Saharan Africa could be particu- et al. 2013; Dennis et al. 2014; Ratmann et al. 2016), and larly useful where demographic and health survey data are because of their relevance to on-going HIV-1 prevention ef- sparse (Pillay et al. 2015), no relevant observational HIV co- forts in sub-Saharan Africa. Crucially, the model simulations horts exist, or where estimates would otherwise be solely were constrained to pessimistic and optimistic projections of reliant on data from particular population groups such as the likely outcomes of on-going HIV-1 prevention efforts in pregnant women (Montana et al. 2008). Further, this study sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes supports using viral phylogenetic methods for identifying et al. 2014), as well as what sequence data could become sources of HIV-1 transmission from full-genome sequences available in these settings. in certain settings. Broadly accurate estimates into the frac- The methods comparison exercise was challenging. First, tion of transmissions attributable to a population group were the exercise focused on quantifying recent transmission dy- obtained when both transmission from that group was not namics, whereas HIV-1 sequence data are more routinely infrequent (at least 10%) and sample size was not too small used to characterize the origins and spread of the virus (thousands of sequences for the HIV-infected populations 197 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE considered). Viral phylogenetic methods could thus help to to epidemic settings in sub-Saharan Africa where multiple quantify the contribution of several other source populations subtypes and recombinant forms circulate at high frequen- that are of key interest for prevention in sub-Saharan Africa, cies.Third,phylogeneticanalyses of full-genome sequences including the proportion of individuals infected within local- were not compared with similar analyses using shorter frag- ized high prevalence areas (Tanser et al. 2013), or the propor- ments of the genome such as, e.g., several 250 base pair re- tion of young women infected by male peers (Dellar et al. gions from the gag, pol or env genes. Full-genome sequences 2015). may not be required for estimating recent changes in HIV-1 We varied aspects of transmission dynamics and the sam- incidence or for quantifying the sources of HIV-1 transmis- pling frame in the simulations, to obtain a more systematic sion, and more cost-effective sequencing approaches could understanding of methods’ performance (fig. 5). Most phylo- provide similar results. genetic methods did not identify significant differences be- The PANGEA-HIV methods comparison exercise showed tween the high/low early transmission scenarios, and this was viral phylogenetic methods can be adapted to provide quan- also the case when basic genetic distance measures recovered titative estimates on aspects of recent HIV-1 transmission differences between the high/low early transmission scenarios dynamics in sub-Saharan Africa, where sequence coverage (regional simulations, supplementary fig. S6, Supplementary remains limited. On simulations, the structured coalescent Material online). The true proportions of early transmissions approach was overall most accurate for estimating recent were also frequently outside 95% confidence or credibility changes in incidence and the proportion of early transmis- intervals. This indicates that further methods’ improvement sions in modeled populations with generalized, and large is needed for estimating the proportion of early transmissions, HIV-1 epidemics. Future molecular epidemiological analyses and potentially for attributing sources of HIV-1 transmission would ideally make use of several of the evaluated phyloge- more broadly at the low sequence coverage scenarios consid- netic tools, in order to obtain robust insights into HIV-1 ered. Further, nearly all participants reported difficulties in transmission flows and how to disrupt them. Further meth- achieving numerical convergence of their methods on full- ods’ refinement is required to this end, with our analysis genome sequence data (unpublished submission reports). suggesting a focus on estimating the sources of HIV-1 trans- This could explain the above observations in part, and in mission from full-genome HIV-1 sequence data. These find- particular why the accuracy of early transmission estimates ings were obtained through a community-wide, blinded did not improve when using larger datasets with higher se- evaluation, and thereby add confidence into the use and in- quence coverage (fig. 5 and supplementary fig. S10, terpretation of viral phylogenetic tools for HIV-1 surveillance Supplementary Material online). Further investigations are and prevention in sub-Saharan Arica and beyond. needed. Finally, our error analysis suggests that explicit mod- eling of unobserved source demes (team Cambridge- Materials and Methods London) or identification of spatially localized phylogenetic clusters prior to transmission analyses (team London) could Study Design be effective approaches for mitigating the negative impact of The blinded PANGEA-HIV methods comparison exercise was viral introductions on phylogenetic analyses on mobile pop- announced in October 2014 at HIV Dynamics & Evolution, ulations (Grabowski et al. 2014). The simulated PANGEA and later on the PANGEA-HIV website. In a training round datasets as well as various aspects of the corresponding (round 1), participants were asked to identify trends in inci- true epidemics and interventions are available for future dence on simulated sequence datasets that were similar in benchmarking. size to the datasets in table 3, but that had qualitatively dif- This study has limitations. First, phylogenetic methods ferent epidemic dynamics. Data included full-genome viral were evaluated on simulated HIV-1 epidemics. While the sequences, patient meta-data, and further broad information use of two models guards to some extent against over- on the simulated epidemic (supplementary text S1, interpretation, analyses of real datasets may be more complex Supplementary Material online). Participation was unre- and could be associated with overall larger error. Of note, the stricted. In December 2014, the training data were un- simulated datasets are free of sequence sampling biases, blinded. All participants shared their findings. PANGEA-HIV which can substantially distort phylogenetic inferences and the participants agreed on the objectives and reporting (Carnegie et al. 2014). Second, the evolutionary components variables listed in table 1; on the timelines for the second final of the two models generated sequences that do not contain round; and that participation will be retrospectively restricted gaps or sequencing errors, cannot be translated to amino to teams addressing at least one of the pre-specified reporting acids, were correctly aligned, and did not contain recombi- variables. Simulation models were updated to include explicit nant sequences. Viral trees reconstructed from real sequence HIV care and intervention components, and re-calibrated to data are likely less accurate than those used in this analysis, a generate the epidemic scenarios shown in figures 1 and 2. potential source of error that is not represented in our eval- Blinded datasets were released on 10 February 2015 (supple uations. Frequent recombination could imply that full HIV-1 mentary text S2, Supplementary Material online). The dead- genomes are more appropriately analyzed on a gene-by-gene line for submissions was 8 May 2015. Questions and clarifica- basis (Hollingsworth et al. 2010; Ward et al. 2013), in contrast tions during the exercise were disseminated to all to our full-genome analyses of simulated sequences that ex- participants. Submissions were checked manually, and teams cluded recombinants. This limitation is particularly relevant were given the opportunity to fix conceptual errors. Few 198 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE submissions to the Regional simulations were obtained, and interventions (population-level effectiveness of ART); the deadline for submission to Regional datasets was ex- within-host evolution (neutral coalescent model, no co- tended to 18 August 2015. The Village simulations were infection and no recombination); between-host evolution un-blinded on 14 May 2015, and a preliminary evaluation (transmission of one virion, no recombination); and sequence was presented and reviewed by all participants at the 22nd sampling (at time of diagnosis of randomly selected individ- HIV Dynamics & Evolution conference. Teams Vancouver uals). To obtain the six epidemic scenarios shown in figure 2, and Basel-Zu ¨rich informed the evaluation group of a concep- we varied the relative transmission rate from early infections tual misunderstanding of the reporting variables, and pro- as well as parameters relating to uptake of the combination vided updated incidence estimates after the intervention intervention respectively. The simulation algorithm is avail- 1 day after the presentation. These updates on the Village able from https://github.com/olli0601/PANGEA.HIV.sim (last datasets were used in the evaluation reported here. The accessed October 14, 2016), and combines (with further Regional datasets were un-blinded on 3 September 2015. code): the individual-based HPTN071 (PopART) model ver- sion 1.1 to generate transmission chains, the Village Simulations VirusTreeSimulator (https://github.com/PangeaHIV/ The Village simulations were generated using the Discrete VirusTreeSimulator; last accessed October 14, 2016) to gen- Spatial Phylo Simulator with HIV-specific components erate viral trees from transmission chains, and SeqGen version (DSPS-HIV, https://github.com/PangeaHIV/DSPS-HIV_ 1.3 (Rambaut and Grassly 1997) to simulate viral sequences PANGEA; last accessed October 14, 2016). The DSPS-HIV is along viral trees. an individual-based stochastic simulator which models HIV-1 transmissions along a specifiable contact network of individ- Protocols for Phylogenetic Transmission Analyses uals and produces a line-list of all events (Hodcroft 2015). All participants adopted overall similar computational strat- Viral phylogenies that reflect between- and within-host viral egies that first reconstructed dated maximum-likelihood trees evolution were generated along transmission chains using (Price et al. 2010; Stamatakis 2014; To et al. 2015), and then VirusTreeSimulator (https://github.com/PangeaHIV/ considered the viral trees fixed in one of the following trans- VirusTreeSimulator; last accessed October 14, 2016). HIV-1 mission analyses: subtype C sequences were simulated along these viral phy- logenies using pBUSS (Bielejec et al. 2014), with substitution rates parameterized from analyses of African subtype C se- ABC Kernel Method quences. An overview of the simulation pipeline is shown in Reporting variables were estimated with an experimental figure 1, and details about the parameter values and assump- kernel-ABC method that combines a kernel method on tions used in the DSPS-HIV and to generate phylogenies and tree shapes (Poon et al. 2013) with a framework for approx- sequences are found in supplementary table S2, imate Bayesian computation (ABC). The basic premise of Supplementary Material online. Notably, assumptions were ABC is that it is usually easier to simulate data from a model made in sexual mixing partners, partner duration, interven- than to calculate its exact likelihood for the observed data. A tions, sampling, and between- and within-evolution complex- model can then be fit to the observed data by adjusting its ity. Disease progression and transmission within the DSPS- parameters until it yields simulations that resemble these HIV are determined by set-point viral load using previously data, bypassing the calculation of likelihoods altogether. We described relationships (Fraser et al. 2007). Simulations were formulated a structured compartmental SI model (Jacquez parameterized to reflect estimates of prevalence and inci- et al. 1988) that was informed by the descriptions of the dencefromthe peak of theHIV-1 epidemic in thelate agent-based simulations that were distributed to all partici- 1980s and early 1990s (Serwadda et al. 1992; Wawer et al. pants. Specifically, the model comprised three populations: a 1994), before treatment was widely available, with the root of main local population, a second local high-risk minority pop- the sequences dating back 40 years previously, coinciding ulation, and an external source population. Each population with the recent subtype C estimates of a common ancestor in was further partitioned into susceptible and infected groups, the 1940s (Faria et al. 2014). Further information about the where the latter was stratified into three stages of infection DSPS-HIV will be available in a forthcoming publication. (acute, asymptomatic, and chronic). Mixing rates between the main and minority local populations were controlled by Regional Simulations two parameters to allow for asymmetric mixing. Individuals The Regional simulation model consists of a stochastic, with acute or asymptomatic infections migrated from the individual-level epidemic transmission and intervention external region to the local region at a constant rate m, model, and an evolutionary model that generates viral phy- and replaced with new susceptible individuals in the external logenies and sequence data to simulated transmission chains. region. One infected individual in the external source popu- Figure 1 and supplementary table S1, Supplementary Material lation started the simulation. Coalescent trees were then sim- online, describe the overall simulation pipeline, model com- ulated based on population trajectories derived from the ponents, parameters, and parameter values. Notably, assump- numerical solution of the ordinary differential equations tions were made on: sexual risk behavior (proportion of that represent the model, using the R package rcolgem. The individuals in risk groups, mixing between risk groups, partner subset tree kernel (Poon et al. 2013) was used as a distance change rates); HIV infection (relative transmission rates); measure between the simulated coalescent trees and the 199 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE reconstructed viral phylogeniesonavailablesequencedata, or sequence was not available, likelihood terms were adjusted the provided phylogenies. A Markov chain Monte Carlo im- by numerically calculating the probability that a case infected plementation of ABC was used to fit the model. This kernel- at a given time had no sampled descendant cases by the time ABC approach was validated on simulated data from more the study finished, and then conditioning on each case’s num- conventional compartmental models (Poon 2015). ber of sampled and unsampled descendants. A reversible- jump Bayesian MCMC approach with proposal moves as described in (Didelot et al. 2014) was used to fit the model. Birth–Death Skyline Method with Sampled Ancestors This approach produces a posterior collection of trans- Phylodynamic analyses were performed in BEAST v2.0 mission trees. From these, we extracted the portion of (Bouckaert et al. 2014) using the add-ons “bdsky” (Stadler infections in the acute stage, recent changes in incidence et al. 2013), “SA” (Gavryushkina et al. 2014)and “bdmm” and other outcomes required for the comparison study. (Ku ¨hnert et al. 2016). Under the birth–death skyline model The generation time t had prior t  0.4 gen gen with sampled ancestors (“SA” module), individuals could gamma(1.3,1) þ 0.6 gamma(3.5,3.5) where the arguments transmit with some probability after sampling which im- are the shape and scale parameters. The time to sampling proved estimation of the reporting variables in preliminary had prior t  gamma(0.7, 1.5). samp analyses (round 1 of the exercise). To estimate the proportion of early transmissions, the multi-type birth–death model was used with two compartments (“bdmm” module) to consider Structured Coalescent individuals in their first 3 months of infection separately from Structured coalescent models were implemented in the rcol- those in later stages of infection. In all analyses, time was gem R package and were based on compartmental infectious partitioned into different intervals to obtain estimates of vary- disease models using the approach described in (Volz 2012). ing transmission rates through time. As further described in These models were tailored to the Regional and Village sce- supplementary text S3, Supplementary Material online, for narios, and included compartments for stage of infection both Village and Regional simulations, lognormal priors (early HIV infection through AIDS as in Cori et al. 2014), were used for the effective reproductive number (mu¼ 0 sex, and diagnosis/treatment status. Transmission rates and sigma¼ 0.75) and the becoming-non-infectious rate were allowed to vary between compartments, and general- (lognormal with mu¼1 and sigma 0.5). Uniform priors ized logistic functions described secular trends in the force of were used for the sampling proportion, and specified based infection through time. Coalescent models also included a on available meta-data. For the Village datasets 0, 1, 2, 3, 4, 9, deme for the unsampled source deme to capture the effects 10, 11 and 12, we assumed a priori a sampling proportion of lineage importation into the surveyed region. Models were between 15% and 40%; for Village datasets 5, 6, 7 and 8 be- fitted to the dated viral phylogenetic trees and to available tween 40% and 100%; and for the Regional datasets between epidemiological data under the approximation that the cor- 5% and 10%. The prior distribution for the removal probabil- responding likelihood terms are independent. For the ity r was chosen based on an estimate of the proportion of Regional simulations, the contribution to the likelihood sampled infected individuals that are on treatment, and cal- model of the CD4 counts at diagnosis and gender of all se- culated from available survey data before intervention start. quenced individuals was assumed multinomial; the propor- Sensitivity analyses on these prior choices were conducted. tion of diagnoses with a sequence was assumed binomial; and The reporting variables were estimated from MCMC output that of survey data (sex, diagnosis, and treatment status) was of the posterior model parameters using a customized pro- assumed multinomial. For the Village simulations, fewer cedure that is fully described in supplementary text S3, meta-data variables were available. The likelihood model as- Supplementary Material online. sumed that estimated HIV prevalence was within the bounds given by the available survey data. A parallel Bayesian MCMC technique (Calderhead 2014) was used to obtain posterior Bayesian Transmission Chain Analyser distributions of model parameters. The Bayesian approach reported in (Didelot et al. 2014)was adapted to account for incomplete sampling as well as het- erogeneity in HIV transmission rates. In place of a susceptible- Statistical Analysis infectious-recovered (SIR) model (as in Didelotetal. 2014)a Phylogenetic estimates and true values were transformed so generalized branching model was used to describe transmis- that their differences were approximately normally distrib- sion dynamics. In this model, the (prior) time interval be- uted. For incidence and incidence reductions, the error tween a case becoming infected and infecting others (t ) e of response i was calculated as e ¼ logð^x Þ logðx Þ, i i i i gen is distributed such that there is a peak after infection, a where ^x is the phylogenetic estimate and x thetruevalue i i chronic phase, and increased infectivity with progression to on dataset i; for proportions, the error was calculated as AIDS. Cases were sampled after a random time since becom- e ¼ ^x  x . Data points outside the whiskers of Tukey box- i i i ing infected (t ). The prior distribution of the numbers of plots were considered as outliers. samp secondary cases was negative binomial (n¼ 5, P¼ 0.7), re- To identify covariates associated with large error in phylo- flecting a convolution of a Poisson distribution conditioned genetic estimates, stepwise model selection with the on a gamma-distributed overall infectivity. To account for stepGAIV.VR procedure in the gamlss Rpackage wasused infected individuals in transmission chains for whom a to reduce the number of covariates at significance level 0.01 200 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE (supplementary table S4, Supplementary Material online). consortium (to O.R., E.H., A.L.B., and C.F.); the NIH through The contribution of the remaining covariates to outlier pres- the NIAID cooperative agreement UM1AI068619 for work on ence (response) was evaluated with partial least squares (PLS) the HPTN 071 trial (to A.C., M.P., and C.F.); the Wellcome regression (Boulesteix and Strimmer 2007), because of the Trust (WR092311MF to O.R.); the European Research Council limited number of datasets and dependencies amongst the (PBDR-339251 to C.F., PhyPD-335529 to T.S.); the National covariates. PLS regression is a dimension reduction technique Institutes of Health (NIH MIDAS U01 GM110749 to E.V. and that identifies combinations of covariates (PLS latent factors) A.L.B., NIH R01 AI087520 to E.V.); the Biotechnology and that are maximally correlated with the response variable, Biological Sciences Research Council (BB/J004227/1 to S.J.L.); and then regresses the response variable against the latent the Canadian Institutes of Health Research (CIHR HOP- factors. The first four latent factors that explained most of the 111406 to A.F.Y.P., New Investigator Award 175594 to A.F.Y. variance in outlier presence were considered in the error P.), the Michael Smith Foundation for Health Research/St. analysis. Figure 5B shows, in the notation of (Boulesteix and Paul’s Hospital Foundation/the Providence Health Care Strimmer 2007), thesignofthe PLSregression coefficients B Research Institute (Scholar Award 5127 to A.F.Y.P.); the j1 for each covariate j to the univariate response variable across ETH Zu ¨rich Postdoctoral Fellowship Program (to D.R. and the first c ¼ 4 latent factors. The proportion of variance p in D.K.); theMarie CurieActions forPeople COFUND the response variable attributable to each covariate j is calcu- Program (to D.R. and D.K.); the University of Edinburgh c jk 2 lated as p ¼ Þ v ,where w is the weight of co- Chancellor’s Fellowship scheme (to S.J.L.); the Centre of j k jk k¼1 w variate j to the kth latent factor and v is the variance Expertise in Animal Disease Outbreaks (to S.J.L.); and the explained by the kth latent factor. PLS regression was per- Swiss National Science Foundation (162251 to G.E.L.). The formed with the plsr routine in the pls Rpackage. funders had no role in study design, data collection and anal- ysis, decision to publish, or preparation of the article. Supplementary Material References Supplementary figures S1–S10, tables S1–S7,and text S1–S4 areavailableat Molecular Biology and Evolution online. Alizon S, Fraser C. 2013. Within-host and between-host evolution- ary rates across the HIV-1 genome. Retrovirology 10. Available from: https://retrovirology.biomedcentral.com/articles/10.1186/ Author Contributions 1742-4690-10-49. A.L.B. and C.F. conceived the study. O.R., E.H., A.L.B., and C.F. Bezemer D, Cori A, Ratmann O, van Sighem A, Hermanides HS, Dutilh designed and coordinated the study. E.H., M.H., S.L., and A.L.B. BE, Gras L, Rodrigues Faria N, van den Hengel R, Duits AJ, et al. 2015. Dispersion of the HIV-1 epidemic in men who have sex with men in designed and generated the Village simulations. M.H. contrib- the Netherlands: a combined mathematical model and phylogenetic uted the virus tree simulator from transmission chains. M.P., analysis. PLoS Med. 12:e1001898. A.C.,O.R., andC.F.designedand generatedthe Regional sim- Bielejec F, Lemey P, Carvalho LM, Baele G, Rambaut A, Suchard MA. ulations. O.R. checked the submissions received, performed 2014. piBUSS: a parallel BEAST/BEAGLE utility for sequence simula- the statistical analysis and wrote the first draft except parts of tion under complex evolutionary scenarios. BMC Bioinformatics 15:133. the “Methods” section. C.C., M.K., X.D., G.P., A.P., J.J., R.L., C.W., BouckaertR,Heled J, KuhnertD,Vaughan T, Wu CH,Xie D, SuchardMA, G.L., D.R., D.K., T.S., E.V., B.D., M.H., and S.F. evaluated the Rambaut A, Drummond AJ. 2014. BEAST 2: a software platform for simulated data and wrote parts of the “Methods” section. Bayesian evolutionary analysis. PLoS Comput Biol. 10:e1003537. All authors reviewed and approved the statistical analysis, Boulesteix AL, Strimmer K. 2007. Partial least squares: a versatile tool for and the final version of the article. the analysis of high-dimensional genomic data. Brief Bioinform. 8:32–44. Brenner BG, Roger M, Routy JP, Moisi D, Ntemgwa M, Matte C, Baril JG, Acknowledgments Thomas R, Rouleau D, Bruneau J, et al. 2007. High rates of forward We thank Andrew Rambaut for his comments on the design transmission events after acute/early HIV-1 infection. JInfectDis. 195:951–959. of the exercise; the PANGEA-HIV steering committee and Calderhead B. 2014. A general construction for parallelizing Metropolis- participants of the PANGEA-HIV satellite workshop of the Hastings algorithms. Proc Natl Acad Sci U S A. 111:17408–17413. 21st and 22nd HIV Dynamics & Evolution conference for their Carnegie NB, Wang R, Novitsky V, De Gruttola V. 2014. Linkage of viral comments during the exercise; and three anonymous re- sequences among HIV-infected village residents in Botswana: esti- viewers and associate editors for their comments that im- mation of linkage rates in the presence of missing data. PLoS Comput Biol. 10:e1003430. proved an earlier version of the article. Regional simulations Cohen MS, Dye C, Fraser C, Miller WC, Powers KA, Williams BG. 2012. were designed and generated using resources at the Imperial HIV treatment as prevention: debate and commentary–will early College High Performance Computing Service (http://www3. infection compromise treatment-as-prevention strategies?. PLoS imperial.ac.uk/ict/services/hpc). Team Cambridge-London Med. 9:e1001232. thanks the MRC Centre for Outbreak Analysis and Cohen MS, Shaw GM, McMichael AJ, Haynes BF. 2011. Acute HIV-1 Infection. NEnglJMed. 364:1943–1954. Modeling for support. Team Vancouver thanks Rosemary Cori A, AylesH,BeyersN,SchaapA,Floyd S, SabapathyK,Eaton McCloskey for help with tree reconstructions; their contribu- JW, Hauck K, Smith P, Griffith S, et al. 2014. HPTN 071 tion was enabled in part by support provided by Westgrid (PopART): a cluster-randomized trial of the population impact (www.westgrid.ca) and Compute Canada Calcul Canada of an HIV combination prevention intervention including uni- (www.computecanada.ca). This work was supported by the versal testing and treatment: mathematical model. PLoS One 9:e84511. Bill & Melinda Gates Foundation through the PANGEA-HIV 201 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Ratmann et al. doi:10.1093/molbev/msw217 MBE de Silva E, Ferguson NM, Fraser C. 2012. Inferring pandemic growth rates a combination prevention package on population-level HIV incidence from sequence data. JRSocInterface 9:1797–1808. in Zambia and South Africa” [Internet]. 2015. HIV Prevention Trials Dearlove B, Wilson DJ. 2013. Coalescent inference for infectious disease: Network. Available from: https://www.hptn.org/sites/default/files/ meta-analysis of hepatitis C. Philos Trans R Soc Lond B Biol Sci. 2016-05/HPTN%20071-2_Phylogenetics%20Ancillary%20Protocol_v% 368:20120314. 201.0_15Jan2015.pdf. Dellar RC, Dlamini S, Karim QA. 2015. Adolescent girls and young Iwuji CC, Orne-Gliemann J, Tanser F, Boyer S, Lessells RJ, Lert F, Imrie J, women: key populations for HIV epidemic control. JInt AIDS Soc. Barnighausen T, Rekacewicz C, Bazin B, et al. 2013. Evaluation of the 18:19408. impact of immediate versus WHO recommendations-guided an- Dennis AM, Herbeck JT, Brown AL, Kellam P, de Oliveira T, Pillay D, tiretroviral therapy initiation on HIV incidence: the ANRS 12249 Fraser C, Cohen MS. 2014. Phylogenetic studies of transmission dy- TasP (Treatment as Prevention) trial in Hlabisa sub-district, namics in generalized HIV epidemics: an essential tool where the KwaZulu-Natal, South Africa: study protocol for a cluster rando- burden is greatest?. J Acquir Immune Defic Syndr. 67:181–195. misedcontrolledtrial. Trials 14:230. Dennis AM, Hue S, Hurt CB, Napravnik S, Sebastian J, Pillay D, Eron JJ. Jacquez JA, Simon CP, Koopman J, Sattenspiel L, Perry T. 1988. Modeling 2012. Phylogenetic insights into regional HIV transmission. Aids and analyzing HIV transmission – the effect of contact patterns. 26:1813–1822. Math Biosci. 92:119–199. Didelot X, Gardy J, Colijn C. 2014. Bayesian inference of infectious disease Kouyos RD, von Wyl V, Yerly S, Boni J, Taffe P, Shah C, Burgisser P, transmission from whole-genome sequence data. Mol Biol Evol. Klimkait T, Weber R, Hirschel B, et al. 2010. Molecular epidemiology 31:1869–1879. reveals long-term changes in HIV type 1 subtype B transmission in Eaton JW, Johnson LF, Salomon JA, Barnighausen T, Bendavid E, Switzerland. JInfect Dis. 201:1488–1497. Bershteyn A, Bloom DE, Cambiano V,FraserC, HontelezJA, et al. Ku ¨hnert D, Stadler T, Vaughan TG, Drummond A. 2016. Phylodynamics 2012. HIV treatment as prevention: systematic comparison of math- with migration: a computational framework to quantify population ematical models of the potential impact of antiretroviral therapy on structure from genomic data. Mol Biol Evol. 33:2102–2116. HIV incidence in South Africa. PLoS Med. 9:e1001245. Lemey P, Rambaut A, Pybus OG. 2006. HIV evolutionary dynamics Faria NR, Rambaut A, Suchard MA, Baele G, Bedford T, Ward MJ, Tatem within and among hosts. AIDS Rev. 8:125–140. AJ, Sousa JD, Arinaminpathy N, Pepin J, et al. 2014. HIV epidemiology. Montana LS, Mishra V, Hong R. 2008. Comparison of HIV prevalence The early spread and epidemic ignition of HIV-1 in human popula- estimates from antenatal care surveillance and population- tions. Science 346:56–61. based surveys in sub-Saharan Africa. Sex Transm Infect. Fisher M, PaoD,Brown AE,SudarshiD,Gill ON,CaneP,Buckton AJ, 84(Suppl 1):i78–i84. Parry JV, Johnson AM, Sabin C, et al. 2010. Determinants of HIV- Moore JS, Essex M, Lebelonyane R, El Halabi S, Makhema J, Lockman 1 transmission in men who have sex with men: a combined S, Tchetgen E, Holme MP, Mills L, Bachanas P, Marukutira T, et al. clinical, epidemiological and phylogenetic approach. Aids 2013. Botswana Combination Prevention Project (BCPP). 24:1739–1747. ClincialTrials.gov. Available from: https://clinicaltrials.gov/ct2/ Fraser C, Hollingsworth TD, Chapman R, de Wolf F, Hanage WP. 2007. show/NCT01965470. Variation in HIV-1 set-point viral load: epidemiological analysis and Novitsky V, Ku ¨hnert D, Moyo S, Widenfelt E, Okui L, Essex M. 2015. an evolutionary hypothesis. Proc Natl Acad Sci U S A. Phylodynamic analysis of HIV sub-epidemics in Mochudi, Botswana. 104:17441–17446. Epidemics 13:44–55. Gardner EM, McLees MP, Steiner JF, Del Rio C, Burman WJ. 2011. The Pillay D, HerbeckJ,Cohen MS,de OliveiraT,FraserC,Ratmann O, spectrum of engagement in HIV care and its relevance to test-and- Brown AL, Kellam P, Consortium P-H. 2015. PANGEA-HIV: phy- treat strategies for prevention of HIV infection. Clin Infect Dis. logenetics for generalised epidemics in Africa. Lancet Infect Dis. 52:793–800. 15:259–261. Gavryushkina A, Welch D, Stadler T, Drummond AJ. 2014. Bayesian Poon AF. 2015. Phylodynamic inference with kernel ABC and its appli- inference of sampled ancestor trees for epidemiology and fossil cal- cation to HIV epidemiology. Mol Biol Evol. 32:2483–2495. ibration. PLoS Comput Biol. 10:e1003919. Poon AF, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH. Grabowski MK, Lessler J, Redd AD, Kagaayi J, Laeyendecker O, Ndyanabo 2013. Mapping the shapes of phylogenetic trees from human and A, Nelson MI, Cummings DA, Bwanika JB, Mueller AC, et al. 2014. zoonotic RNA viruses. PLoS One 8:e78122. The role of viral introductions in sustaining community-based HIV Price MN, Dehal PS, Arkin AP. 2010. FastTree 2–approximately epidemics in rural Uganda: evidence from spatial clustering, phylo- maximum-likelihood trees for large alignments. PLoS One 5:e9490. genetics, and egocentric transmission models. PLoS Med. Pybus OG, Rambaut A. 2009. Evolutionary analysis of the dynamics of 11:e1001610. viral infectious disease. Nat Rev Genet. 10:540–550. Gregson S, Nyamukapa CA, Garnett GP, Mason PR,ZhuwauT,CaraelM, Rambaut A, Grassly NC. 1997. Seq-Gen: an application for the Monte Chandiwana SK, Anderson RM. 2002. Sexual mixing patterns and Carlo simulation of DNA sequence evolution along phylogenetic sex-differentials in teenage exposure to HIV infection in rural trees. Comput Appl Biosci. 13:235–238. Zimbabwe. Lancet 359:1896–1903. Ratmann O, van Sighem A, Bezemer D, Gavryushkina A, Juurrians S, Hayes R, Ayles H, Beyers N, Sabapathy K, Floyd S, Shanaube K, Bock P, Wensing AM, de Wolf F, Reiss P, Fraser C. 2016. Sources of HIV Griffith S, Moore A, Watson-Jones D, et al. 2014. HPTN 071 infection among men having sex with men and implications for (PopART): rationale and design of a cluster-randomised trial of prevention. Sci Transl Med. 8:320ra322. the population impact of an HIV combination prevention interven- Serwadda D, Wawer MJ, Musgrave SD, Sewankambo NK, Kaplan JE, Gray tion including universal testing and treatment – a study protocol for RH. 1992. HIV risk factors in three geographic strata of rural Rakai a cluster randomised trial. Trials 15:57. District, Uganda. Aids 6:983–989. Hodcroft E. 2015. Estimating the heritability of virulence in HIV. PhD Shapiro B, Rambaut A, Drummond AJ. 2006. Choosing appropriate sub- thesis, University of Edinburgh. Available from: https://www.era.lib. stitution models for the phylogenetic analysis of protein-coding se- ed.ac.uk/handle/1842/15814. quences. MolBiolEvol. 23:7–9. Hollingsworth TD, Laeyendecker O, Shirreff G, Donnelly CA, Serwadda D, Stadler T, Bonhoeffer S. 2013. Uncovering epidemiological dynamics in Wawer MJ, Kiwanuka N, Nalugoda F, Collinson-Streng A, Ssempijja heterogeneous host populations using phylogenetic methods. Philos V, et al. 2010. HIV-1 transmitting couples have similar viral load set- Trans R Soc Lond B Biol Sci. 368:20120198. points in Rakai, Uganda. PLoS Pathog. 6:e1000876. Stadler T, Kuhnert D, Bonhoeffer S, Drummond AJ. 2013. Birth- HPTN 071-2 Phylogenetics in HPTN 071: An ancillary study to death skyline plot reveals temporal changes of epidemic spread “Population Effects of Antiretroviral Therapy to Reduce HIV in HIV and hepatitis C virus (HCV). Proc Natl Acad Sci U S A. Transmission (PopART): A cluster-randomized trial of the impact of 110:228–233. 202 Downloaded from https://academic.oup.com/mbe/article/34/1/185/2670195 by DeepDyve user on 14 July 2022 Phylogenetic Tools for Generalized HIV-1 Epidemics doi:10.1093/molbev/msw217 MBE Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis Volz E, Ionides E, Romero-Severson E, Brandt MG, Mokotoff E, and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. Koopman J. 2013. HIV-1 transmission during early infection in Tanser F, Barnighausen T, Grapsa E, Zaidi J, Newell ML. 2013. High cov- men who have sex with men: a phylodynamic analysis. PLoS erage of ART associated with decline in risk of HIV acquisition in Med. 10:e1001568. rural KwaZulu-Natal, South Africa. Science 339:966–971. Volz EM. 2012. Complex population dynamics and the coalescent under Tanser F, Barnighausen T, Hund L, Garnett GP, McGrath N, Newell ML. neutrality. Genetics 190:187–201. 2011. Effect of concurrent sexual partnerships on rate of new HIV Volz EM, Kosakovsky Pond SL, Ward MJ, Leigh Brown AJ, Frost SD. 2009. infections in a high-prevalence, rural South African population: a Phylodynamics of infectious disease epidemics. Genetics cohort study. Lancet 378:247–255. 183:1421–1430. To TH, Jung M, Lycett S, Gascuel O. 2015. Fast dating using least-squares von WylV,KouyosRD, YerlyS,BoniJ,Shah C,BurgisserP,Klimkait T, criteria and algorithms. Syst Biol. 65:82–97. Weber R, Hirschel B, Cavassini M, et al. 2011. The role of migration UNAIDS. 2014. Fast-Track – Ending the AIDS epidemic by 2030. Geneva: and domestic transmission in the spread of HIV-1 non-B subtypes in UNAIDS. Available from: http://www.unaids.org/en/resources/docu Switzerland. JInfect Dis. 204:1095–1103. ments/2014/JC2686_WAD2014report Ward MJ, Lycett SJ, Kalish ML, Rambaut A, Leigh Brown A. 2013. UNAIDS. 2015. AIDS by the numbers 2015. Geneva: UNAIDS. Available Estimating therateofintersubtyperecombination in earlyHIV-1 from: http://www.unaids.org/sites/default/files/media_asset/AIDS_ group M strains. JVirol. 87:1967–1973. by_the_numbers_2015_en.pdf. Wawer MJ, Sewankambo NK, Berkley S, Serwadda D, Musgrave SD, Vassall A, Pickles M, Chandrashekar S, Boily MC, Shetty G, Guinness L, Gray RH, Musagara M, Stallings RY, Konde-Lule JK. 1994. Incidence Lowndes CM, Bradley J, Moses S, Alary M, et al. 2014. Cost-effective- of HIV-1 infection in a rural region of Uganda. BMJ 308:171–173. ness of HIV prevention for high-risk groups at scale: an economic WHO. 2015. Guideline on when to start antiretroviral therapy and on evaluation of the Avahan programme in south India. Lancet Glob pre-exposure prophylaxis for HIV. Geneva: WHO Press. Available Health 2:e531–e540. from: http://www.who.int/hiv/pub/guidelines/earlyrelease-arv/en/.

Journal

Molecular Biology and EvolutionOxford University Press

Published: Jan 1, 2017

There are no references for this article.