Background: Emerging pathogens such as Zika, chikungunya, Ebola, and dengue viruses are serious threats to national and global health security. Accurate forecasts of emerging epidemics and their severity are critical to minimizing subsequent mortality, morbidity, and economic loss. The recent introduction of chikungunya and Zika viruses to the Americas underscores the need for better methods of disease surveillance and forecasting.
Methods: To explore the suitability of current approaches to forecasting emerging diseases, the Defense Advanced Research Projects Agency (DARPA) launched the 2014–2015 DARPA Chikungunya Challenge to forecast the number of cases and spread of chikungunya disease in the Americas. Challenge participants (n = 38 during final evaluation) provided predictions of chikungunya epidemics across the Americas for a six-month period, from September 1, 2014 to February 16, 2015, to be evaluated by comparison with incidence data reported to the Pan American Health Organization (PAHO). This manuscript presents an overview of the challenge and a summary of the approaches used by the winners.
Results: Participant submissions were evaluated by a team of non-competing government subject matter experts based on numerical accuracy and methodology. Although this manuscript does not include in-depth analyses of the results, cursory analyses suggest that simpler models appear to outperform more complex approaches that included, for example, demographic information and transportation dynamics, likely due to reporting biases, which can be implicitly captured in statistical models. Mosquito dynamics, population-specific information, and dengue-specific information correlated best with prediction accuracy.
Conclusion: We conclude that with careful consideration and understanding of the relative advantages and disadvantages of particular methods, implementation of an effective prediction system is feasible. However, there is a need to improve the quality of the data in order to more accurately predict the course of epidemics.
Keywords: Chikungunya, Forecasting, Morphological models, Mechanistic models
*Correspondence: email@example.com
Analytics, Intelligence, and Technology Division, Los Alamos National Laboratory, P.O. Box 1663, Bikini Atoll Road, Los Alamos, New Mexico 87544, USA
Full list of author information is available at the end of the article
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Del Valle et al. BMC Infectious Diseases (2018) 18:245
Background
Mathematical models for infectious diseases have been used to gain insight into disease dynamics for more than a century [1–4]. However, only recently have models and systems begun to be designed specifically for the task of providing regularly updated quantitative forecasts of infectious disease spread that are analogous to those available for weather prediction. Forecasting approaches vary substantially in both method and complexity; for example, some use human judgment or prediction markets, some use purely statistical or machine learning approaches, and others rely upon disease transmission models of varying complexity [5–8]. In parallel, recent experiences responding to outbreaks have highlighted the significant utility of infectious disease forecasts to support decision-making [9, 10]. Models provide critical insight in the face of limited data by forecasting the international spread of viruses, illustrating the value of different mitigation strategies, and assessing the risk of continued danger in cases such as the 2009 influenza pandemic [11, 12]. Early predictions for the 2014–2015 Ebola outbreak in West Africa indicated that incidence would continue to grow rapidly unless significant mitigation measures were undertaken. This information helped galvanize the international response to the crisis and indicated the importance of rapid deployment of resources. As the outbreak progressed, incidence forecasts were used to inform the planning and execution of clinical trials for vaccines and therapeutics by ensuring that activities were responding to the rapidly changing situation and that decision makers had adequate time to develop contingency plans [14, 15].
Disease forecasting has received significant attention among the mathematical epidemiology community as well as among decision makers. For example, the 2012 National Strategy for Biosurveillance specifically identified forecasting as one of the core functions of a national biosurveillance enterprise. Building upon this, the 2013 National Biosurveillance Science and Technology Roadmap identified several key research priorities, including additional research and development for disease forecasting technology, which are critical to achieving the overall goal of providing decision makers with more accurate and timely information during biological incidents.
In response to this mandate, several United States (US) Government agencies have conducted challenge and prize competitions that involved infectious disease forecasting in an effort to help mature operational forecasting technologies. The Centers for Disease Control and Prevention has organized consecutive challenges for the 2013–2018 influenza seasons that have focused on predicting the timing and intensity of influenza-like illness (ILI) in the US at the regional level [17, 18]. In 2015, several departments in the US Government joined together with the support of the National Science and Technology Council to launch an open dengue challenge that strove to forecast disease incidence using previously unpublished data from Peru and Puerto Rico. The 2014–2015 DARPA Chikungunya Challenge was conceived as an effort to mobilize a wide variety of participants to foster innovation and advance the state of the art by attempting to predict chikungunya incidence across the Americas.
Nonetheless, significant challenges remain for the development of operational forecasting as a mature technology. The fundamental science of forecasting needs to be developed and supported by a robust research program. Data availability is often limited, especially during outbreak responses, and this hampers the ability to provide critical insights in a timely fashion. While some decision makers have embraced the use of modeling and forecasting, others remain skeptical, having been presented with forecasts that were inaccurate and that did not make the inherent underlying uncertainties clear.
This manuscript summarizes the challenge and describes the top six solver submissions, including their data sources and methodologies.
Chikungunya challenge
Chikungunya is a mosquito-borne viral infection of humans. Although rarely fatal, chikungunya is an emerging, debilitating viral disease that is transmitted among humans by mosquitoes. There is no specific treatment for the disease, although palliative care has been shown to reduce its severity and duration. The chikungunya virus (CHIKV) was originally detected in Tanzania in 1952, with the name meaning 'to become contorted' in the Kimakonde language of Mozambique, referring to the effects of severe joint pain. Chikungunya expanded to Asia and the Indo-Pacific islands, causing notably large outbreaks over the past 10–20 years.
The CHIKV epidemic was well suited for this Challenge because its spread to the Western Hemisphere had been expected for some years and presented a valuable opportunity to evaluate disease progression in a naive population. Further, a reporting system via the Pan American Health Organization (PAHO) was already in place for tracking disease incidence across the Americas. The goal of the DARPA Chikungunya Challenge was to evaluate state-of-the-art epidemic modeling methods to forecast outbreaks of CHIKV throughout the Americas, to compare modeling strategies, and to provide insight into how different data streams could be incorporated into these models. The Challenge provided a baseline of current forecasting capabilities for infectious diseases and their applicability to vector-borne infectious diseases.
Design and execution of the DARPA Chikungunya challenge
The introduction of CHIKV into the Western Hemisphere had been anticipated, and the first case was recorded in Saint Martin in December 2013. Its emergence in the Caribbean caused substantial morbidity in the population and concern about subsequent spread in the Americas. After the first cases were reported in December 2013, the virus spread throughout the Eastern Caribbean islands and into Central and South America, reaching the United States in mid-July 2014. Since then, Zika has been detected in several countries and territories of the Americas. As of epidemiological week 35 of 2014 (September 18, 2014), when the DARPA Chikungunya Challenge was initiated, 659,367 cases, including 37 deaths, had been reported in the Americas. The disease was determined to be an ideal candidate for the DARPA Chikungunya Challenge because of the predictable spread of the virus among
an immunologically naive population, and the availability of incidence data reported by participating countries to PAHO.
The Department of Defense's (DOD) role in global health includes conducting timely, relevant, and comprehensive health surveillance to promote, maintain, and enhance the health of both the military and associated populations. Tracking disease outbreaks and the emergence of new pathogens is an intrinsic component of this effort. Force health protection and readiness, protection of civilian populations, medical stability operations, and partnership engagement are key components of this mandate. Conducting health surveillance that can detect, contain, and prevent impacts of intentional or natural biological events is a critical part of the DOD's ability to maintain force health while promoting stability and security abroad. To accomplish this, a proactive approach is needed to anticipate the geographic and temporal trajectory of infectious disease outbreaks.
Mathematical and statistical models (grouped under the morphological category in this manuscript) are used not only to forecast the spatial-temporal evolution of real-world outbreaks, but also to estimate the potential value of mitigation efforts. The latter requires an accurate understanding of both public policy and the behavior of people in novel situations. A further challenge is how existing methods account for delayed reporting and underreporting, and how to use additional data streams to reduce systematic errors (bias) and forecasting uncertainties. The DARPA Chikungunya Challenge addressed this data gap by promoting innovation in data integration techniques.
The DARPA Chikungunya Challenge asked participants to forecast the cumulative total cases (suspected and confirmed, the latter including imported-confirmed) per week per country. This format was selected to inspire innovative approaches and to encourage non-traditional participants, forecasting approaches, and data sources, in order to improve overall infectious disease forecasting capabilities.
The forecast submissions were evaluated and scored on a weighted basis (Table 1). The forecasts were submitted at various stages of the epidemic progression across the Americas (Fig. 1), which shows the epidemic progression as reported by PAHO during the Challenge. Evaluation of methodology was performed by a panel of non-competing government subject matter experts in infectious disease modeling, CHIKV, and other vector-borne diseases.
Table 1 Description of the DARPA Chikungunya Challenge deliverables and points
Deliverable | Due date | Content | Max points
1 | September 1, 2014 | Initial methodology, documentation, and data sources | 5
2 | September 8, 2014 | Forecast for 6-month period (Epidemic week 36-9) | 5
3 | October 1, 2014 | Forecast for peak new cases | 10
4 | October 1, 2014 | Forecast for 5-month period (Epidemic week 36-9) | 15
5 | November 1, 2014 | Forecast for 4-month period (Epidemic week 36-9) | 20
6 | December 1, 2014 | Forecast for 3-month period (Epidemic week 36-9) | 15
7 | January 1, 2015 | Forecast for 2-month period (Epidemic week 36-9) | 10
8 | February 1, 2015 | Forecast for 1-month period (Epidemic week 36-9) | 5
9 | February 1, 2015 | Final methodology, documentation, and data sources | 15
Maximum total points | | | 100
Accuracy was scored based on the predicted number of cases and spread of CHIKV in the Americas compared to weekly, publicly available PAHO reporting of suspected and confirmed cases. Participants were encouraged to use any publicly available data for modeling and forecasting, such as climate, clinical surveillance data, genetic information, and social media. Proprietary data were permitted for incorporation into models if obtained independently by participants. Participants were not required to disclose the content of proprietary data but had to include a detailed description of how the data were obtained and used in the Challenge methodology deliverables. The methodology reports required sections describing: (1) data sources used, (2) model robustness, (3) applicability, (4) presentation, and (5) computational requirements.
Fig. 1 Weekly incidence of chikungunya cases, aggregated by region from PAHO reports (symbols) and smoothed epidemic curves (lines). The two vertical lines show the beginning and end of the prediction period for the DARPA Challenge
Methods
Summaries of participants' approaches
DARPA awarded cash prizes to six leading participants, including $150,000 for first place, $100,000 for second place, and $50,000 to each of four honorable mentions. The leading participants used varying methodologies and model types to inform their forecasts. The following are descriptions of their overall approaches, their methodologies to forecast the spread of chikungunya in the Americas, and a brief summary of their results.
First place submission (henceforth participant 1)
A simple model for the recent outbreaks of chikungunya in the Americas
Modeling Approach: Participant 1 relied on estimating the growth rate $G(N)$ of the outbreak in each country as a function of $N$, where $G = dN/dt$, $N$ is a smooth interpolation of the total number of cases reported on the PAHO website, and $t$ is time in weeks. The function $G$ implicitly reflects the combined effects of the meteorological, geographic, human, and vector characteristics that describe vector-borne diseases. Participant 1 fitted $G$ to a quadratic or piecewise quadratic function $G_f$, which describes $N$ as proportional to the number of infected and recovered individuals in a Susceptible-Infectious-Recovered (SIR) model. Participant 1 solved the differential equation $dN/dt = G_f(N)$ and chose the parameters of $G_f$ so as to optimize both ($C_1$), i.e., how well $G_f(N)$ approximates $G(N)$, and ($C_2$), i.e., how well $N_f(t)$, obtained from solving $dN/dt = G_f(N)$, fits the reported cumulative epidemiological curve.
Results: Model parameters were estimated by hand, with the help of a MATLAB graphical user interface, displayed in Fig. 2. The top right plot shows how $G(N)$ (solid blue curve) for the Dominican Republic may be approximated by a quadratic function (inverted parabola in red). Parameter values are set by the sliders on the left. The bottom right plot compares the predicted and observed cumulative epidemiological curves: the red stars are the model predictions obtained by solving $dN/dt = G_f(N)$; the reported data are shown as blue circles. By observing how changes in the model parameters affected these plots, the parameter values that best fitted the data for each country were selected. Participant 1 organized the PAHO countries into groups, depending on dengue and CHIKV incidence and on whether a quadratic or piecewise quadratic fit for $G$ was used. Attempts to connect these groups to economic (Gini coefficient, per capita Gross Domestic Product), demographic (population density and percent of population living in urban areas), connectivity (number of ports, number of port calls, and distance between islands), and health indices (infant mortality and life expectancy) were unsuccessful.
Fig. 2 The MATLAB interface used for the model (developed by Participant 1). The growth rate $G(N)$ for the Dominican Republic is shown by the solid blue curve. The inverted parabola in red represents its quadratic approximation $G_f(N)$. Parameter values are set by the sliders on the left. The bottom right plot compares the predicted and observed cumulative epidemiological curves: the red stars are the model predictions obtained by solving $dN/dt = G_f(N)$; the reported data (from PAHO) are shown as blue circles
Second place submission (henceforth participant 2)
Predicting the spread of chikungunya using a logistic S-curve
Modeling Approach: Participant 2 used a Bounded Geometric Growth approach (shown by a logistic function or S-curve in Fig. 3) to model CHIKV across the Americas. Participant 2 used a macro-enabled Excel workbook to manually fit each curve to the PAHO data for each country.
Results: This approach described the overall dynamics for about half the countries. The results show that the model worked better for countries with higher incidence than for countries with low incidence.
Fig. 3 The straight orange line represents the prediction by participant 2 based on data as of the 35th week (August 29th, 2014). The blue squares represent the actual data (adjusted to when they occurred, not when they were reported) as of the 56th week (January 23rd, 2015)
Honorable mention #1 (henceforth participant 3)
Forecasting chikungunya fever
Modeling Approach: Participant 3 implemented three different predictive models for each country, namely the logistic model, the Cauchy model, and an epidemiological SIR model, which were fitted to the smoothed PAHO data. The basic assumption shared by all the predictive models is that the total number of cases for each country is a sigmoidal function of time (Fig. 4). The parameters of each model were estimated by regularized weighted non-linear least squares. In detail, the iterative Gauss-Newton algorithm was used to minimize the error (or cost) function. The weighting procedure assigns more weight to recent data than to past data, reflecting the fact that data from the distant past contain less information about the future. Furthermore, due to the typical lack of sufficient data, especially at the early stages of an outbreak, the minimization problem can be ill-posed; therefore, the problem is regularized using Tikhonov (or ridge) regularization. All considered, each model had only three parameters to be estimated.
Results: The forecasts were obtained for each country by projecting the estimated predictive model into the future. Confidence intervals were computed for the estimated parameter vector from its covariance matrix and were used to construct upper and lower bounds for the predicted values. Figure 4 shows the three-month forecasts for the USA. Notice that the SIR prediction performs best for the USA, but the logistic or Cauchy predictions were found to perform better in other countries.
Fig. 4 Smoothed PAHO data (magenta) and the logistic model (red), Cauchy model (green), and SIR model (black) for the USA's total cases from Participant 3. Data from the first 38 weeks were used for the prediction. The upper and lower bounds were computed from the covariance matrix of the estimated parameter vector
Honorable mention #2 (henceforth participant 4)
A simple empirical approach to predict the spread of epidemics
Modeling Approach: Participant 4 used an empirical approach, fitting the observed incidence provided by PAHO by least squares. For epidemics where there is active transmission in a population, the incidence as a function of time, $I(t)$, can be fitted by $I(t) = A t^{m} e^{-nt}$, where $A$, $m$, and $n$ are constants and $m > 0$, as depicted in Fig. 5. The cumulative incidence of autochthonous and imported cases for each territory was obtained from the weekly PAHO data and used to derive the weekly incidence for each territory. For simplicity, countries were considered to have either autochthonous transmission or imported cases. The cumulative number of cases was fitted to the model's incidence function using the weekly incidence data derived from PAHO. Conditions were imposed to allow a solution to be derived. The solutions were found to be optimal when the total cases and the cases in the last six weeks predicted by the model were matched with the observed data, and when transmission was assumed to last no longer than one year. Imported cases were predicted to follow the total infections in the region, and were scaled to the historical proportion of imported cases to total cases for each country.
Results: This simple and robust method provides satisfactory solutions, which may circumvent some of the problems of classical analytic methods for basic epidemics. The method gives a good approximation for short-term forecasting, especially with limited data, but it can neither give probabilistic forecasts nor provide an analytical model that can be refined using more detailed data on transmission, incident cases, and population movement.
Fig. 5 Empirical model for disease progression used by Participant 4. Incidence ($I$) is plotted as a function of time, $I(t)$, and can be fitted by $I(t) = A t^{m} e^{-nt}$, where $A$, $m$, and $n$ are constants and $m > 0$
Honorable mention #3 (henceforth participant 5)
Forecasting the Spread of Chikungunya Virus using a Coupled SEIR Transmission Model
Modeling Approach: Participant 5 used a stochastic, mechanistic model of transmission dynamics in each locality to forecast chikungunya epidemics for each country and territory in the PAHO data. A susceptible-exposed-infectious-recovered (SEIR) transmission model was developed to describe viral transmission between human and mosquito populations. People in the susceptible class experience a force of infection and become infected at a rate $\lambda_H = \alpha \beta Z^{\phi}/N + \xi$, which depends on the biting rate of mosquitoes ($\alpha$), the transmission efficiency of the virus from mosquitoes to humans ($\beta$), and the number of infectious mosquitoes per human ($Z/N$). The force of infection scales non-linearly with the number of infectious mosquitoes ($Z^{\phi}$), where $\phi < 1$. The human force of infection also includes exposed individuals coming into the population from elsewhere at rate $\xi$, which was represented using a gravity model: the rate of entry into the population from another locality depends on the sizes of the two populations and is inversely proportional to the distance between them. This mechanistic model was implemented in a state-space modeling framework, with an imperfect observation process on top of the transmission dynamics and stochasticity in both the infection and observation processes. Model parameter values were estimated with an iterated filtering method for calculating maximum likelihood estimates, implemented in the pomp package in R, and were then used to generate weekly forecasts.
Results: The weekly forecasts were calculated as the median of 2000 simulations (Fig. 6). The overall number of cases predicted was fairly accurate, particularly for the one- to four-month forecasts. The number of country forecasts that were significantly over- or underestimated also decreased over time. However, five large outbreaks (> 1000 reported cases) were severely (> 50%) underestimated in the five-month forecast.
Fig. 6 Weekly simulation of reported chikungunya cases in (a) Puerto Rico and (b) Saint Barthelemy from Participant 5. Simulations are one-month forecasts for February 2015. Red circles represent reported cases and each light blue line represents one of 2000 simulated outbreaks. The dark blue line is the median used for prediction and the dashed lines are 95% prediction intervals
Honorable mention #4 (henceforth participant 6)
Modeling the chikungunya epidemic in the Americas: Distributional ecology and population dynamics
Modeling Approach: Participant 6 used vector occurrence and climate variables to generate ecological niche models (ENM) for vectors as multidimensional ellipsoids enclosing occurrences in a multidimensional environmental space, as described previously [36, 37]. The models depended on two main estimates: (i) the rates at which the virus is transmitted locally, and (ii) the rates of importation of infections. To obtain these estimates, four "ingredients" were employed: primary occurrence data for mosquito species, 50-year climate data averages, estimated pairwise city-to-city airline passenger travel rates, and case report data from PAHO. Aedes aegypti and Aedes albopictus occurrences were drawn from Campbell et al. Principal components analysis (PCA) was applied to the original climate variables to reduce their number and correlation; the first three components (which explained 84.9% of the overall variance) were used as axes to define the multidimensional environmental space (NicheA 3.0). To identify areas with environmental conditions ideal for transmission [40–44], Participant 6 divided the ellipsoid for each vector into 100 layers summarizing proximity to the niche centroid, in order to identify areas close to or far from the ENM centroid. Areas close to the niche centroid (i.e., areas ideal for transmission) were thus identified as potential transmission hotspots (Fig. 7).
Fig. 7 Identification of transmission hot spots as areas close to the niche centroid, showing human movement vectors (air travel) used to estimate connectivity, from Participant 6
Results: Participant 6 found that most countries showed a dramatic pattern of intensive reporting in the early weeks of the epidemic, followed by reduced reporting in later stages. This phenomenon was termed "surveillance fatigue", referring to the reduction in the collection, reporting, and publication of epidemiological data after explosive and sustained disease outbreak events. These models support the idea that incidence during late surveillance was higher than reported, suggesting that reduced reporting rates may be driven by a reduction in surveillance effort rather than by a dramatic pause in local transmission. Countries closest to the centroid of the vectors' niches showed higher CHIKV prevalence. For a complete description of the model and methodology, please refer to the corresponding reference.
Results
Reported PAHO data
The distribution of chikungunya cases across the 50 participating PAHO countries at three times during the Challenge is shown in Fig. 8, to complement the weekly incidences shown in Fig. 1. An interactive version of this map, showing the CHIKV epidemic progression across the Western Hemisphere, is available at http://bsvgateway.org/chikv/ (courtesy and copyright, LANL). PAHO groups countries based on their geographic location into the following regions: North America (Bermuda, Canada, Mexico, USA); Central America (Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua and Panama); Latin Caribbean (Cuba, Dominican Republic, French Guiana, Guadeloupe, Haiti, Martinique, Puerto Rico, Saint Barthelemy and Saint Martin (French part)); Andean Area (Bolivia, Colombia, Ecuador, Peru and Venezuela); South Zone (Argentina, Brazil, Chile, Paraguay and Uruguay); and the Non-Latin Caribbean countries (Anguilla, Antigua and Barbuda, Aruba, Bahamas, Cayman Islands, Curacao, Dominica, Grenada, Guyana, Jamaica, Montserrat, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and the Grenadines, Saint Martin (Dutch part), Suriname, Trinidad and Tobago, Turks and Caicos, US Virgin Islands and UK Virgin Islands).
Fig. 8 Progression of the CHIKV epidemic in the Americas as a function of time, as reported by participating countries to PAHO
By week 36 of 2014 (corresponding to the week of September 6, 2014), at the beginning of the Challenge, 651,344 suspected cases had been reported to PAHO, mostly in the Latin Caribbean region, with 8210 confirmed cases. The United States reported 762 imported cases. By week 48 of 2014, the epidemic was largely over in the Latin Caribbean region but was peaking in Central America and the Andean region, with a total of 914,960 suspected and 15,906 confirmed cases. By the end of the Challenge in week 8 of 2015 (corresponding to the week of February 22, 2015), 1,247,359 cases had been reported to PAHO, of which 24,982 were confirmed. The epidemic had largely ended in the Latin Caribbean, with a reported incidence of 2.2%; had subsided for the year in Central America, with a reported incidence of 0.4%; and was still near a broad peak in the Andean area, with a reported incidence of 0.16%.
The 20 most-affected countries accounted for 98% of all reported chikungunya cases. The Dominican Republic reported the most cases, followed by El Salvador and Colombia. Both delayed and sporadic reporting were evident in the reported data, which should be kept in mind when this information is used to derive predictions of future epidemics. The accuracy and timeliness of the reported number of new cases may depend on the socio-economic structure, health care infrastructure, economic strength, and other factors.
We focused our discussion on a subset of the 50 PAHO countries with more complete data that allowed us to cross-check with alternative reports. The countries chosen represent the spectrum of variability associated with geography, socio-economic strata, population, weather, and other parameters. Specifically, we analyzed Guadeloupe, Martinique, Dominican Republic, Haiti, United States, Mexico, El Salvador, Guatemala, Colombia, and Venezuela. Below, we present an analysis of solver entries for these countries. We chose to highlight different solver entries, including some that did not rank among the top 6, because certain submissions were better suited to demonstrating a particular concept and certain methodologies merited attention regardless of their ranking.
Choice of models
To better understand the participant submissions, it is important to define and describe the general modeling approaches used by top participants. Classifying participant-submitted models was challenging, as participants typically used hybrid models that combined aspects of different approaches. For the purpose of this manuscript and the ensuing discussion, we have categorized the models submitted by all participants (not just the winning ones) into three broad categories: morphological models, mechanistic models, and subject matter expert (SME) models. Morphological models represent a curve-fitting approach, wherein the curves can be defined analytically or via a set of differential equations. The curves are fitted independently to each outbreak and/or derived from an entirely different outbreak (e.g., dengue), suitably scaled and translated (solvers 1–4 in this manuscript). Mechanistic models attempt to capture the dynamic interplay of outbreaks in multiple countries and/or the dynamic interplay between the host (humans) and vectors (mosquitoes) (solvers 5 and 6 in this manuscript). The SME-based model (i.e., participant-defined subject matter experts), used by only one participant (who did not rank in the top 6 and is not discussed in this manuscript), relied on the consensus subjective opinion of various experts in the field and did not require any computation to generate a prediction. This approach relied exclusively on expert judgment as an alternative to explicit modeling, leveraging collective expertise to maximize forecast accuracy while minimizing the number and strength of the assumptions made. It is worth noting that this approach has traditionally been used by public health practitioners in the absence of models to inform their decisions. As expected from their descriptions, the model types overlap in many cases. For example, many participants used subject matter expertise to inform mechanistic and morphological models.
Data sources for effective predictions of Chikungunya
Participants typically used several data sources to complement the information provided by PAHO. It is important to note that not all of these data sources were used to derive the predictions made in the final submissions. These data types included online web searches (e.g., Wikipedia, Google searches, government websites), climate information (e.g., temperature and humidity), vector-specific information (e.g., reporting of other mosquito-borne illnesses such as dengue in the same population, mosquito dynamics, ecology), and others (Table 2). Figure 9a shows the effect of the number of data sources used on the accuracy of prediction, differentiated by the main model categories defined above, for the top 10 participants of the Challenge. Participants with higher accuracy (i.e., 3, 4, 1, and 2) used between 1 and 8 data sources. However, not all data sources were considered or included in deriving the final prediction. Interestingly, all four of these top-ranking participants used a morphological approach to arrive at their prediction.
Table 2 Major categories of data sources used by the top 6 participants in the DARPA Chikungunya Challenge, although not all data sources were incorporated into the modeling by the solvers
Columns: Solver # | PAHO | Online/News | Population | Climate | Transportation | Economic Index | Vector | Dengue
There is no significant correlation between the number of data sources used and the accuracy of the forecasts, irrespective of the type of model used. In short, more data does not necessarily translate into better forecasts; what mattered most was obtaining the right kind of data and using it appropriately. A regression analysis relating forecast accuracy to the types of data sources used by each participant (Fig. 9b) showed that some data streams, such as those related to dengue epidemiology or mosquito dynamics, are used in models that have smaller forecasting errors. Conversely, models that exploit demographic and transportation data have worse forecast accuracy than models that do not use them. Online searches correlated positively with accurate outcomes, although the specificity of this data stream is difficult to define because of the wide variety of information types that can be tapped through the Internet. One possible explanation is that Internet searches are used to validate, and sometimes correct, other data streams. In summary, not all data sources lead to improved forecasting accuracy. However, models that leverage specific data sources to fill missing links in surveillance data (e.g., dengue epidemiology data) or to help improve data quality (e.g., Internet searches) typically have more accurate forecasts.
Fig. 9 Effect of choice of data streams on accuracy of predictions, and relevant data streams for effective predictions. Participants 1–4 used morphological models, whereas participants 5 and 6 developed mechanistic models. (a) The effect of the number of data sources on accuracy of prediction, differentiated by the main categories of models employed by the participants. Note that the subject matter expert category refers to participant-defined subject matter experts. (b) The positive versus negative correlation between use of a data source and the accuracy of the prediction, by regression analysis
Predicting the peak of the epidemic
Although the peak of an outbreak is one of the most significant features of an epidemic, it was relatively difficult for the solvers to predict. We analyzed the peak predictions provided by the top 11 participants for the 20 hardest-hit countries. As mentioned earlier, by the time the first prediction was submitted, the epidemic had ended in the Latin Caribbean countries and was just getting started in Central America and the Andean region. Since participants were not allowed to "backcast" (i.e., predict in the past), the best choice for countries whose peak had already passed was to select week 40 as the peak week, as a consequence of the challenge design. Figure 10 shows the peak predictions for a subset of countries.
and showed very little variability) in the predictions pro-
Only some of all 36 vided for all countries considered here (e.g., participants participants were able to accurately predict the exact 1 and 4), whereas others showed more variation (e.g., week of the peak, and only in a few countries. The peak participant 3) (data not shown). Indeed, the standard devi- week as reported by PAHO clearly varies from partici- ationfor thePAHOdatawas larger duetothe fact that pant submissions. A statistical analysis of the predicted the peak for these countries was spread out starting from peaks indicates that some participants showed very lit- week 8.5 for Saint Barthelemy to week 55 for Guyana tle variation (i.e., predictions were extremely conservative, (data not shown). Fig. 10 Peak Week Predictions. This figure shows the boxplot of peak week predictions for the top 9 participants for 10 countries. The box contains 50% of the predictions. The blue dots show the actual peak week as reported by PAHO (not shown for the first four countries because the epidemic had already peaked prior to week 35) Del Valle et al. BMC Infectious Diseases (2018) 18:245 Page 12 of 14 Discussion important for predictions to be judged against reliable The ability to go beyond health surveillance and provide reported data, such as a controlled test-bed, wherein the timely predictions of disease spread to mitigate disease evaluation of different models and methodologies can be outbreaks is a capability gap in global health. The DARPA performed accurately and the value of various strategies Chikungunya Challenge (also referred to as the Challenge) clearly delineated. These findings, and further efforts to attempted to address this gap by promoting innovation understand reported data and integrate multiple surveil- in data collection techniques and infectious disease mod- lance systems, could improve both the quality and quality eling and prediction. 
The Challenge also aimed to iden- of reporting and the associated response to an outbreak, tify and characterize methodologies, data streams, and making the dream of an effective infectious disease fore- approaches beyond the traditional winners that demon- casting architecture a reality. strate critical value or lack thereof in predicting CHIKV Acknowledgements outbreaks, with the intention of developing an integral The authors thank the Defense Advanced Research Projects Agency (DARPA) multi-aspect forecasting system for future use. and in particular COL Matthew Hepburn, MD (Program Manager, BTO), Dr. Anne Cheever (Associate/Lead Scientist at Booz Allen Hamilton and Technical It is a health security imperative to detect, contain, and Advisor to DARPA) and Dr. David Fang (Sr. Lead Technologist at Booz Allen prevent impacts of intentional or natural biological events. Hamilton and Technical Advisor to DARPA) for the design and execution of In order to accomplish this, proactive anticipation of the the Challenge, technical direction, and advise. In addition, we would like to thank the Pan American Health Organization (PAHO) for providing the data trajectory of infectious diseases outbreaks is required for required for the Challenge, and subsequent participation in discussions and public health planning. The results from this Challenge deliberations. Many thanks are due to the volunteers that allowed for effective may inform future efforts in response to Zika outbreaks, judging, and to the team of Subject Matter Experts that facilitated the design and execution of the Challenge. The LANL authors would like to thank Jonas or that associated with existing vector-borne diseases like Lukasczyk, a summer student who made the Chikunguya visualization shown dengue. in Fig. 8. 
Although most participants utilized multiple data Funding streams, the use of a large number of data streams did not The LANL authors would like to thank DARPA for supporting the analysis of necessarily improve the accuracy of the predictions. It was the Challenge, as well as for providing administrative assistance during the the choice of the data streams, and how they were utilized DARPA Chikungunya Challenge workshop. The challenge and the independent analysis performed by LANL were supported by DARPA. LANL is that enabled successful predictions. Participants that used operated by Los Alamos National Security, LLC for the Department of Energy alternative data streams to understand gaps and limita- under contract DE-AC52-06NA25396. tionsinthe availabledatawerebetterabletopredictthe Availability of data and materials epidemic. Mosquito-dynamics, population specific infor- As this study involved the description of existing studies, all data supporting mation, and dengue-specific information correlated best the described models can be obtained by contacting the respective with prediction accuracy. participants. Authors’ contributions Conclusion SYD, BM, NWH, and HM drafted the manuscript, independently analyzed the The results of this Challenge highlighted the fact that outcomes of the challenge, and assisted in evaluation of challenge entries. JA with careful consideration and understanding of the rela- and RH contributed to the challenge design, evaluation of outcomes, and contributed to the manuscript. JCL, HEB (aka Participant 1), MEL (aka tive advantages and disadvantages of particular methods, Participant 2), YP (aka Participant 3), DJR (aka Participant 4), SM (aka Participant implementation of an effective prediction system is feasi- 5), ATP, LEE, and HQ (aka Participant 6) participated in the challenge and ble. Indeed, the ability of a model to forecast the reported provided summaries of their entries for this manuscript. 
All authors read, edited, and approved the final manuscript. data may not always translate into the ability of a model to forecast the epidemic. Furthermore, it may be of crit- Ethics approval and consent to participate ical importance to also capture emergent behavior and Not applicable. mitigation strategies implemented in response to a deadly Competing interests epidemic, which may require the use of more complex The authors declare that they have no competing interests. modeling approaches. Publisher’s Note Improved data reporting might not always be possi- Springer Nature remains neutral with regard to jurisdictional claims in ble, as this depends on the socio-economic and cultural published maps and institutional affiliations. framework of participating countries. However, uniform Author details application of case definitions, reporting of geographic Analytics, Intelligence, and Technology Division, Los Alamos National and demographic subsets of people, and reporting of dates Laboratory, P.O. Box 1663, Bikini Atoll Road, Los Alamos, New Mexico 87544, of disease onset, rather than date of report may improve USA. Theoretical Division, Los Alamos National Laboratory, P.O. Box 1663, Bikini Atoll Road, Los Alamos, New Mexico, 87544, USA. Leidos Supporting the overall usability of the reported data. Also, qualifi- Biomedical Advanced Research and Development Authority, 200 cation of data with parallel epidemics (e.g., dengue, in Independence Avenue, S.W., Washington, District of Columbia 20201, USA. this case) that rely on the same climactic factors and vec- 4 Office of the Assistant Secretary for Preparedness and Response, U.S. tor dynamics can significantly improve predictions. It is Department of Health and Human Services, 200 Independence Avenue, S.W., Del Valle et al. BMC Infectious Diseases (2018) 18:245 Page 13 of 14 Washington, District of Columbia 20201, USA. Department of Mathematics, default/files/National_Strategy_for_Biosurveillance_July_2012.pdf. 
University of Arizona, 617 N. Santa Rita Ave, Tucson, Arizona, 85721, USA. Accessed 23 Jan 2017. Epidemiology and Biostatistics Department, University of Arizona, 1295 N. 17. Biggerstaff M, Alper D, Dredze M, Fox S, Fung IC, Hickmann KS, Lewis B, Martin Ave, Tucson, Arizona 85724, USA. Utah Valley University, 800 W Rosenfeld R, Shaman J, Tsou MH, Velardi P, Vespignani A, Finelli L. University Pkwy, Orem, Utah 84058, USA. Department of Mathematics and Results from the centers for disease control and prevention’s predict the Statistics, University of Massachusetts, 710 N. Pleasant St, Amherst, 2013-2014 influenza season challenge. BMC Infect Dis. 2016;16:357. Massachusetts 01003, USA. Present Address: Institute of Applied and https://doi.org/10.1186/s12879-016-1669-x. Computational Mathematics, Foundation for Research and Technology - 18. Center for Disease Control and Prevention: Epidemic Prediction Initiative. Hellas, Heraklion, Greece. NHS Blood and Transplant-Oxford, BRC https://predict.phiresearchlab.org Accessed 05 Feb 2018. Haematology Theme and Radcliffe Department of Medicine, John Radcliffe 19. CDC: Epidemic Prediction Initiative. https://predict.phiresearchlab.org/ Hospital, Headley Way, Oxford OX3 9BQ, UK. Department of Biological legacy/dengue/index.html Accessed 23 Jan 2017. Sciences, University of Notre Dame, Notre Dame, IN 46556, USA. Biodiversity 20. DARPA: DARPA Forecasting Chikungunya Challenge. https://www. Institute, University of Kansas, 1345 Jayhawk Blvd, Lawrence, Kansas 66045, innocentive.com/ar/challenge/9933617 Accessed 15 Aug 2014. USA. Department of Fish and Wildlife Conservation, Virginia Tech, 21. Moran KR, Fairchild G, Generous N, Hickmann K, Osthus D, Priedhorsky Blacksburg, VA 24061, USA. Institute of Zoology, Chinese Academy of R, Hyman J, Del Valle SY. Epidemic forecasting is messier than weather Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China. 
forecasting: The role of human behavior and Internet data streams in Chemistry Division, Los Alamos National Laboratory, P.O. Box 1663, Bikini epidemic forecast. J Infect Dis. 2016;214(suppl 4):404-8. Atoll Road, Los Alamos, New Mexico 87544, USA. 22. Staples JE, Breiman RF, Powers AM. Chikungunya fever: An epidemiological review of a re-emerging infectious disease. Clin Inf Dis. 2009;49(6):942-8. Received: 26 April 2017 Accepted: 30 April 2018 23. World Health Organization (WHO): Chikungunya. http://www.who.int/ denguecontrol/arbo-viral/other_arboviral_chikungunya/en/ Accessed 09 Mar 2016. References 24. World Health Organization (WHO): Emergency Preparedness and 1. Hamer WH. The Milroy lectures on epidemic disease in England –The Response: Chikungunya in the French Part of the Caribbean Isle of Saint evidence of variability and of persistency of type. The Lancet. 1906;167: Martin. http://www.who.int/csr/don/2013_12_10a/en/ Accessed 09 Mar 665-662. 2. Kermack WO, McKendrick AG. A contribution to the Mathematical Theory 25. World Health Organization (WHO) Collaborating Centres: Global of Epidemics. In: Proceedings of the Royal Society of London. Series A, Database. http://apps.who.int/whocc/ Accessed 09 Mar 2016. Containing Papers of a Mathematical and Physical Character, vol. 115. No. 26. Pan American Health Organization (PAHO): Chikungunya. http://www. 772; 1927. p. 700-21. paho.org/chikungunya Accessed 09 Mar 2016. 3. Anderson R, May R. Population biology of infectious diseases: Part I. 27. Hethcote HW. The mathematics of infectious diseases. SIAM Rev. Nature. 1979;280:361. 2000;42(4):599-653. 4. Anderson RM, May RM, Boily M, Garnett G, Rowley J, May R. The spread 28. Lega J, Brown HE. Data-driven outbreak forecasting with a simple of HIV-1 in Africa: sexual contact patterns and the predicted demographic nonlinear growth model. Epidemics. 2016;17:19-26. impact of AIDS. Nature. 1991;352(6336):581–9. 29. Tarantola A. 
Inverse Problem Theory and Methods for Model Parameter 5. Shaman J, Karspeck A. Forecasting seasonal outbreaks of influenza. Proc Estimation. New York: Elsevier Sci; 1987. Natl Acad Sci U S A. 2012;109(50):20425–30. 30. Pan American Health Organization (PAHO): Chikungunya Incidence Data. 6. Tizzoni M, Bajardi P, Poletto C, Ramasco JJ, Balcan D, Gonçalves B, Perra http://www.paho.org/hq/index.php?option=com_topics&view= N, Colizza V, Vespignani A. Real-time numerical forecast of global epidemic readall&cid=5927&Itemid=40931&lang=en. Accessed 09 Mar 2016. spreading: case study of 2009 A/H1N1pdm. BMC Med. 2012;10(1):1. 31. Yakob L, Clements AC. A mathematical model of Chikungunya dynamics 7. Generous N, Fairchild G, Deshpande A, Del Valle SY, Priedhorsky R. Global and control: The major epidemic on Reunion Island. PLoS ONE. 2013;8(3): disease monitoring and forecasting with Wikipedia. PLoS Comput Biol. 2014;10(11):1003892. 32. Keeling MJ, Rohani P. Modeling Infectious Diseases in Humans and 8. Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. Flexible Animals. NJ: Princeton University Press; 2008. modeling of epidemics with an empirical Bayes framework. PLoS Comput 33. Xia Y, Bjørnstad ON, Grenfell BT. Measles metapopulation dynamics: A Biol. 2015;11(8):1004382. gravity model for epidemiological coupling and dynamics. Am Nat. 9. Cretien J-P, Riley S, George DB. Mathematical modeling of the West Africa 2004;164(2):267-81. ebola epidemic. eLIFE. 2015;4:09186. 34. Ionides E, Bretó C, King A. Inference for nonlinear dynamical systems. 10. Moghadas SM, Pizzi NJ, Wu J, Yan P. Managing public health crises: the Proc Natl Acad Sci U S A. 2006;103(49):18438–43. role of models in pandemic preparedness. Influenza Other Respir Viruses. 35. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high 2009;3(2):75-79. resolution interpolated climate surfaces for global land areas. Int J Climatol. 11. Colizza V, Barrat A, Barthelemy M, Valleron A-J, Vespignani A. 
Modeling 2005;25(15):1965–78. the worldwide spread of pandemic influenza: Baseline case and 36. Soberón J, Nakamura M. Niches and distributional areas: Concepts, containment interventions. PLoS Med. 2007;4(1):13. methods, and assumptions. Proc Natl Acad Sci U S A. 12. Chan J, Holmes A, Rabadan R. Network analysis of global influenza 2009;106(Supplement 2):19644–50. spread. PLoS Comput Biol. 2010;6(11):1001005. 37. Van Aelst S, Rousseeuw P. Minimum volume ellipsoid. Wiley Interdiscip Rev Comput Stat. 2009;1(1):71-82. 13. Meltzer MI, Atkins CY, Santibanez S, Knust B, Petersen BW, Ervin ED, Nichol ST, Damon IK, Washington ML, et al. Estimating the future number 38. Campbell LP, Luther C, Moo-Llanes D, Ramsey JM, Danis-Lozano R, of cases in the ebola epidemic – Liberia and Sierra Leone, 2014-2015. Peterson AT. Climate change influences on global distributions of dengue MMWR Surveill Summ. 2014;63(Suppl 3):1-14. and Chikungunya virus vectors. Philos Trans R Soc Lond B Biol Sci. 2015;370(1665):20140135. 14. Bellan SE, Pulliam JR, Pearson CA, Champredon D, Fox SJ, Skrip L, Galvani AP, Galvani M, Gambhir M, Lopman BA, Porco TC, Meyers LA, 39. Qiao H, Peterson AT, Campbell LP, Soberón J, Ji L, Escobar LE. NicheA: Dusho J. Statistical power and validity of Ebola vaccine trials in Sierra Creating virtual species and ecological niches in multivariate Leone: A simulation study of trial design and analysis. Lancet Infect Dis. environmental scenarios. Ecography. 2016;39:805–13. 2015;15(6):703-10. 40. Martínez-Meyer E, Díaz-Porras D, Peterson AT, Yáñez-Arenas C. Ecological niche structure and rangewide abundance patterns of species. 15. Kucharski AJ, Eggo RM, Watson C, Camacho A, Funk S, Edmunds WJ. Biol Lett. 2013;9(1):20120637. Effectiveness of ring vaccination as control strategy for Ebola virus disease. 41. Yáñez-Arenas C, Peterson AT, Mokondoko P, Rojas-Soto O, Martínez- Emerg Infect Dis. 2016;22(1):105-8. Meyer E. 
The use of ecological niche modeling to infer potential risk areas 16. White House: Office of Science and Technology Policy (OSTP): National of snakebite in the Mexican state of Veracruz. PLoS ONE. 2014;9(6):100957. Strategy for Biosurveillance. https://obamawhitehouse.archives.gov/sites/ Del Valle et al. BMC Infectious Diseases (2018) 18:245 Page 14 of 14 42. Holt RD. Bringing the Hutchinsonian niche into the 21st century: Ecological and evolutionary perspectives. Proc Natl Acad Sci U S A. 2009;106(Supplement 2):19659–65. 43. Manthey JD, Campbell LP, Saupe EE, Soberón J, Hensz CM, Myers CE, Owens HL Ingenlo K, Peterson AT, Barve N, et al. A test of niche centrality as a determinant of population trends and conservation status in threatened and endangered North American birds. Endanger Species Res. 2015;26(3):201–8. 44. Lira-Noriega A, Manthey JD. Relationship of genetic diversity and niche centrality: A survey and analysis. Evolution. 2014;68(4):1082–93. 45. Romero-Alvarez D, Peterson AT, Escobar LE. Surveillance fatigue (fatigatio vigilantiae) during epidemics. Rev Chil Infectología. 2017;34:289–292.
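The sketch below illustrates two ideas from the main text in Python. The first is the morphological approach described under “Choice of models”, in which a curve from a reference outbreak (e.g., dengue) is suitably scaled and translated and then fitted to reported cases. The second is the no-back-casting constraint discussed under “Predicting the peak of the epidemic”. The sketch is minimal and rests on stated assumptions. It uses a least-squares fit over integer week shifts, with a closed-form scale factor for each shift. All function names, toy curves and week indices are hypothetical. It is not a reconstruction of any solver's actual method.

```python
# Minimal sketch (hypothetical; not any solver's actual method): fit a
# reference outbreak curve to reported weekly cases by choosing an integer
# week shift and a least-squares scale factor, then read off a peak week
# that is not allowed to fall before the first admissible forecast week.
from typing import List, Tuple


def aligned_template(template: List[float], shift: int, n: int) -> List[float]:
    """Template values for weeks 0..n-1 after delaying it by `shift` weeks (0 outside it)."""
    return [template[t - shift] if 0 <= t - shift < len(template) else 0.0
            for t in range(n)]


def fit_scaled_shifted(template: List[float], observed: List[float],
                       max_shift: int) -> Tuple[float, int, float]:
    """Return (scale, shift, sse) minimizing sum_t (observed[t] - scale * template[t - shift])**2."""
    best = (0.0, 0, float("inf"))
    for shift in range(-max_shift, max_shift + 1):
        x = aligned_template(template, shift, len(observed))
        denom = sum(v * v for v in x)
        if denom == 0.0:
            continue  # template does not overlap the observed weeks
        # For a fixed shift, the least-squares scale has a closed form.
        scale = sum(a * b for a, b in zip(x, observed)) / denom
        sse = sum((b - scale * a) ** 2 for a, b in zip(x, observed))
        if sse < best[2]:
            best = (scale, shift, sse)
    return best


def forecast(template: List[float], scale: float, shift: int,
             horizon: int) -> List[float]:
    """Project the fitted (scaled, shifted) curve over weeks 0..horizon-1."""
    return [scale * v for v in aligned_template(template, shift, horizon)]


def peak_week(curve: List[float], earliest_allowed: int) -> int:
    """Week of maximum predicted incidence, clamped so it is never earlier
    than `earliest_allowed` (a stand-in for a no-back-casting rule)."""
    peak = max(range(len(curve)), key=lambda t: curve[t])
    return max(peak, earliest_allowed)


if __name__ == "__main__":
    reference = [0, 1, 4, 9, 4, 1, 0]   # toy "dengue-like" weekly curve
    reported = [0, 0, 0, 2, 8, 18]      # toy reported cases so far
    scale, shift, sse = fit_scaled_shifted(reference, reported, max_shift=4)
    curve = forecast(reference, scale, shift, horizon=10)
    print(scale, shift, sse)            # 2.0 2 0.0
    print(peak_week(curve, earliest_allowed=3), peak_week(curve, earliest_allowed=7))  # 5 7
```

Because the scale is solved in closed form for each candidate shift, the fit reduces to a one-dimensional loop over shifts. The clamp in `peak_week` expresses only the idea that a forecast cannot place the peak in the past. The toy week indices do not correspond to the Challenge's actual week numbering or its week-40 choice.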
BMC Infectious Diseases – Springer Journals
Published: May 30, 2018