# Multicomponent ensemble models to forecast induced seismicity

## SUMMARY

In recent years, human-induced seismicity has become an increasingly relevant topic due to its economic and social implications. Several models and approaches have been developed to explain the underlying physical processes or to forecast induced seismicity. They range from simple statistical models to coupled numerical models incorporating complex physics. We advocate the need for forecast testing as currently the best method for ascertaining whether or not models are capable of reasonably accounting for the key governing physical processes. Moreover, operational forecast models are of great interest for supporting on-site decision-making in projects entailing induced earthquakes. We previously introduced the Induced Seismicity Test Bench, a standardized framework following the guidelines of the Collaboratory for the Study of Earthquake Predictability, to test, validate, and rank induced seismicity models. In this study, we describe how to construct multicomponent ensemble models based on Bayesian weightings that deliver more accurate forecasts than individual models in the case of the Basel 2006 and Soultz-sous-Forêts 2004 enhanced geothermal stimulation projects. For this, we examine five calibrated variants of two significantly different model groups: (1) Shapiro and Smoothed Seismicity, based on the seismogenic index, a simple modified Omori-law-type seismicity decay, and temporally weighted smoothed seismicity; and (2) Hydraulics and Seismicity, based on numerically modelled pore pressure evolution that triggers seismicity via the Mohr–Coulomb failure criterion. We also demonstrate how the individual and ensemble models would perform as part of an operational Adaptive Traffic Light System. Investigating seismicity forecasts based on a range of potential injection scenarios, we use forecast periods of different durations to compute the occurrence probabilities of seismic events M ≥ 3.
We show that, in the case of the Basel 2006 geothermal stimulation, the models forecast hazardous levels of seismicity days before the occurrence of felt events.

Key words: Earthquake interaction, forecasting, and prediction; Induced seismicity; Statistical seismology

## 1 INTRODUCTION

Increasing demand for energy is giving rise to more and more underground engineering projects accompanied by induced earthquakes. These projects are associated with various types of operations, such as the controlled filling of surface reservoirs (e.g. Koyna, India, Gupta 2002), mining (e.g. Witwatersrand, South Africa, Cook 1976), hydraulic fracturing (e.g. Horn River Basin, Canada, Farahbod et al. 2015), gas production (e.g. Groningen, The Netherlands, van Thienen-Visser & Breunese 2015), gas injection (e.g. the Castor gas-storage project, Spain, Cesca et al. 2014), waste water injection (e.g. Oklahoma, USA, Keranen et al. 2013) and geothermal energy projects (e.g. Basel, Switzerland, Häring et al. 2008; Cooper Basin, Australia, Baisch et al. 2009). Not only the scientific community but also the general public is becoming increasingly aware of induced seismicity as the number of felt or damaging induced earthquakes increases (e.g. Basel, Hirschberg et al. 2015; the Prague, Fairview, and Pawnee earthquakes in Oklahoma, Langenbruch & Zoback 2016). Clearly, efficient monitoring systems and hazard management procedures are needed to guarantee the safety of such underground engineering operations and thus safeguard their economic profitability. In the scientific literature, great attention is paid to modelling reservoir behaviour (e.g. Baisch et al. 2010; Gischig et al. 2014; Segall & Lu 2015; Catalli et al. 2016). The models in question examine the governing processes involved either by attempting to reproduce the entire history of a project (e.g. the injection rate, well-head pressure evolution, and seismicity) or in conceptual studies without any calibration against observations.
We argue that sound calibration and careful testing of model forecasts are important because (1) this is an unbiased way to test whether models are able to reasonably capture the most important governing physical processes; and (2) in ongoing reservoir stimulations, models can deliver quick, reasonable forecasts of the seismicity rate, magnitude, and spatial distribution for the near future. In the latter case, models become part of induced seismic hazard control schemes (such as traffic light systems; Király-Proag et al. 2016; Grigoli et al. 2017) that support operators’ on-site decisions. For such near real-time model calibration and forecasting, computational efficiency is a key issue, as most existing models are currently more computationally expensive than required, especially if they involve numerical modelling. We believe that high-performance computing will resolve this issue in the near future. Another key issue is forecast accuracy. To assess accuracy, we introduced the Induced Seismicity Test Bench, a standardized framework for validating and ranking induced seismicity forecasts (Király-Proag et al. 2016). This is done in a pseudo-prospective way: a limited period of an already recorded data set is given to the models being tested to train them (i.e. calibration on a learning period), followed by the generation of forecasts for the next few hours, days or weeks (i.e. the forecast period). We test several durations of learning and forecast periods to chart their effect on the models’ forecast skills. Forecast performance is measured using statistical tests developed by the Collaboratory for the Study of Earthquake Predictability (the CSEP community) for natural earthquakes (e.g. Gerstenberger & Rhoades 2010; Schorlemmer et al. 2010; Zechar et al. 2010; Nanjo et al. 2011; Eberhard et al. 2012; Mignan et al. 2013; Taroni et al. 2013; Zechar et al. 2013).
The aforementioned testing framework applies to any induced-seismicity-related project. In Király-Proag et al. (2016), we showed in detail how well two models performed in the case of two deep geothermal projects (Soultz-sous-Forêts 2004 and Basel 2006). The conclusion reached was that neither of the tested models was capable of perfect forecasts and that different models capture different spatiotemporal aspects of the seismic sequences. In this paper, we go one step further and show how to take advantage of different models and their various forecast skills. We combine the best features of existing models to build multicomponent ensemble models, which yield more accurate forecasts than previously used ensemble or individual models. We further demonstrate that, in the case of the Basel project, the suggested seismicity forecast strategy could have predicted the ensuing high level of seismicity.

## 2 DATA

In this study, we again use seismicity data from the Soultz-sous-Forêts 2004 and Basel 2006 geothermal projects (Fig. 1).

Figure 1. Data sets used in this study. (A) Soultz-sous-Forêts 2004 stimulation data. (B) Basel 2006 stimulation data. (a) Hydraulic data: the solid black line represents the injection rate [l s−1]; the dotted grey line shows the well-head pressure [MPa] as a function of time. Insets geolocate the projects. (b) Magnitudes of observed seismicity over the first few days. The colours and sizes of the circles denote magnitude. (c) Map views of observed seismicity. (d) E–W cross-section of observed seismicity. (e) N–S cross-section of observed seismicity.

The Soultz-sous-Forêts geothermal project is located roughly 70 km north of Strasbourg in the Alsace region (France), between Kutzenhausen and Soultz-sous-Forêts in the Upper Rhine Graben. It is France’s first enhanced geothermal power plant. Over more than 30 yr, several stimulation and circulation tests were carried out in four wells at depths of 3.5 and 5 km. A detailed description of this project can be found in Evans et al. (2012), Gérard et al. (2006), Calò et al. (2014) and Genter et al. (2010, 2012). For the purposes of this study, we use hydraulic and seismic data from the pre-stimulation and stimulation phases in September 2004 (Dyer 2005). We corrected local magnitudes using the scaling relationship of Douglas et al. (2013). Since some seismograms were clipped, magnitudes saturated at 1.8 (85 per cent of the earthquakes with Mw > 1.8 clipped the sensor used for magnitude estimation; Dyer 2005). Events with magnitudes greater than 1.8 were therefore assigned a value of Mw = 1.8.

The Basel geothermal reservoir is also located in the Rhine Graben, directly beneath the city of Basel (Switzerland). The reservoir lies at a depth of 5 km. Three weeks of stimulation were initially planned for December 2006; however, due to extensive ground shaking and feedback from the general public, the injection was stopped after almost 6 d. After shut-in, four Mw > 2.6 events occurred, one just a few hours later and three more in early 2007. After studying the feasibility and acceptance of the project (Secanell et al. 2009), the operators decided to abandon the project. A detailed description of the project can be found in Häring et al. (2008), Dyer et al. (2008), Baisch et al. (2009) and Deichmann et al. (2014).
Here, we use about 15 d of hydraulic and seismic data (Dyer et al. 2010) starting from the onset of the stimulation (2006-12-02, 18:00), and also use the pre-stimulation injection test data.

## 3 MODELS

In this section, we briefly describe the two models we use for this study: the Shapiro and Smoothed Seismicity (SaSS) and the Hydraulics and Seismicity (HySei) models. Even though pore pressure perturbation plays a certain role in both of these models, their forecasts are calculated in significantly different ways: the SaSS model is based mainly on seismic observations and the anticipated injected volume, while HySei is based on numerical modelling of the pore pressure diffusion that triggers earthquakes in the reservoir. Altogether, five variants of these models are calibrated every 6 hr using learning periods between 1 and 11.5 d for Basel 2006 and between 1 and 7 d for Soultz-sous-Forêts 2004.

### 3.1 Shapiro and Smoothed Seismicity (SaSS) model

The SaSS model is based on three pillars: the rate, the magnitude distribution, and the spatial distribution of earthquakes. The seismicity rate during injection is computed using the seismogenic index (eq. 1; Shapiro et al. 2010):

$$\log_{10}(N_{m}(t)) = \log_{10}(Q_{c}(t)) - bm + \Sigma$$ (1)

where Nm(t) is the number of induced events above magnitude m up until time t, Qc(t) is the cumulative injected volume of fluid at time t, b is the Gutenberg–Richter b-value of the observed seismicity (i.e. the slope of the cumulative magnitude distribution), and m is the magnitude above which all events are expected to be reliably recorded. Σ and b are estimated on the learning period, and we estimate the total volume that will be injected by the end of the forecast period. For the total injected volume, we assume the injection continues at the final injection rate, as this was the previously planned injection strategy. After shut-in, the decaying seismicity follows the equation of Langenbruch & Shapiro (2010) (we apply the original notation for consistency, eq.
2):

$$R_{0b}\left(\frac{t}{t_{0}}\right) = \frac{R_{0a}}{\left(\frac{t}{t_{0}}\right)^{p}}$$ (2)

where R0b is the post-stimulation seismicity rate at time t, t0 is the length of the stimulation period before shut-in, R0a is the average seismicity rate during stimulation, and p controls the seismicity rate decay. The magnitude distribution follows the Gutenberg–Richter relation (Gutenberg & Richter 1944). Its slope is the same b-value estimated on the learning period for the number component, and the total number of events is likewise given by eqs (1) and (2). The spatial distribution of seismicity (Fig. 2) is based on smoothing the seismicity recorded during the learning period with 3-D Gaussian kernels (Király-Proag et al. 2016). The kernels are calibrated every 6 hr. For this, 1000 kernels represented by one (for isotropic kernels) or three (for anisotropic kernels) smoothing widths (i.e. bandwidths) are examined, and the one that best forecasts the latter part of the learning period is chosen to compute the forecast probability density function (PDF). Forecasts fail consistency tests if an earthquake occurs where a rate of zero was forecast (i.e. where the model suggested an event was impossible). Thus, we also calibrate a low background rate (a ‘trust level’) of the model jointly with the aforementioned three bandwidths. In other words, we trust the model to a certain extent (usually > 90 per cent) and distribute the rest of the PDF equally among all voxels.

Figure 2. Construction of smoothed seismicity models with different temporal weightings. (a) Explanation of the general concept of smoothed seismicity in 3-D. Note that the illustrations are in 2-D; however, all smoothed seismicity models are 3-D. (b) Left: past seismicity over the learning period without temporal weights. The inset in the upper right-hand corner shows the temporal weighting. Centre: Gaussian kernels in the x and z directions.
Gaussian kernels are centred on past seismic events one by one. Summing and normalizing the contributions of all earthquakes observed during the learning period gives a 3-D probability density function (PDF). Right: vertical cross-section of the PDF. (c) Same as panel (b) with Heaviside temporal weights. (d) Same as panel (b) with exponential temporal weights.

To model the migration of seismic activity from the vicinity of the well towards the edges of the reservoir after the pump has been stopped, we apply two types of temporal weights that emphasize more recent earthquakes: Heaviside temporal weights (i.e. a moving window) and exponential temporal weights. In total, three types of spatial component were examined: SaSS S, a smoothed seismicity model without temporal weights; SaSS H, a smoothed seismicity model with one-day Heaviside weights (Fig. 2c); and SaSS E, a smoothed seismicity model with exponential weights (Fig. 2d). Table 1 summarizes all parameters of the SaSS model. Further details of the SaSS model can be found in Király-Proag et al. (2016).

Table 1. Parameters of SaSS. LP stands for learning period.
| Component | Parameter | Description | Calibration |
| --- | --- | --- | --- |
| Number (during stimulation) | m | magnitude of completeness | fixed |
| | Σ | seismogenic index | estimated on LP |
| | b | b-value | estimated on LP |
| | Qc(t) | total injected volume | based on planned injection |
| Number (after stimulation) | t0 | length of the stimulation period | estimated on LP |
| | R0a | co-stimulation seismicity rate | estimated on LP |
| | p | decay parameter | generic value (p = 2) or estimated on LP |
| Space | LWtemp | length of temporal weighting period | fixed |
| | σ1, σ2, σ3 | spatial kernel bandwidths | estimated on LP |
| | T | trust level | estimated on LP |

### 3.2 Hydraulics and Seismicity (HySei) model

Gischig & Wiemer (2013) introduced the HySei model (Fig. 3), in which seismicity is triggered by pressure diffusion with irreversible permeability enhancement. As the name suggests, the model consists of two main parts: (1) hydraulic and (2) seismicity modelling. The first goal is to characterize the pressure evolution in the reservoir. To do this, we invert for the best hydraulic parameters by matching the well-head pressure generated by a 1-D radial flow model with the observed well-head pressure. To evaluate the reservoir’s initial permeability, we solve the diffusion equation (eq. 3) with constant permeability (κ = κ0) using the pre-stimulation test injection. To invert for stimulation parameters, we use hydraulic data from the stimulation and solve the diffusion equation (eq.
3) with irreversibly changing permeability (eq. 4) due to increasing pressure (eq. 5):

$$\rho S \frac{\partial p}{\partial t} = \nabla \cdot \left( \frac{\kappa \rho}{\nu} \nabla p \right) + q_{m}$$ (3)

$$\kappa = \kappa_{0}\, (u + 1)$$ (4)

$$\frac{\partial u}{\partial t} = C_{u}\, H_{pt}\!\left( \frac{\partial p}{\partial t} \right) H_{u}(u_{t}-u)\, H_{p}(p-p_{t})$$ (5)

where ρ denotes fluid density, S is the storage coefficient, κ is permeability, ν is fluid viscosity, and qm is a mass source; κ0 is the initial permeability before the stimulation; u is the overall permeability enhancement of the reservoir, called the stimulation factor; Cu is a constant that scales the rate at which permeability changes, called the stimulation velocity; ut is the maximum stimulation factor; and pt is the threshold pressure. Hpt is a Heaviside function that is one if pressure increases and zero otherwise, and Hp and Hu are Heaviside functions for pressure and stimulation factor. These are not Heaviside functions in a strict sense but are smoothed to avoid numerical instability. Permeability begins to increase once the pressure reaches pt; if the pressure rises any further, the permeability of the reservoir increases until u reaches ut.

Figure 3. Schematic plot of the HySei model. Black dots represent events simulated by HySei either with a constant (HySei C) or with a variable (HySei V) b-value. Red dots show the seismicity of the current learning period oriented in the direction of the minimum and maximum principal axes, az and ax, respectively. Solid black lines indicate the lengths of the minimum and maximum axes. Black ellipses indicate 95 per cent of the seismicity cloud. The red curve displays the empirical event distribution along the minimum principal axis (i.e. in the off-plane direction).

In the seismicity model, randomly placed potential nucleation points are triggered by the radially symmetric pressure evolution following the Mohr–Coulomb failure criterion. Minimum and maximum stresses (σ3 and σ1) of all seed points are taken from a normal distribution around the average in-situ stress estimate with a standard deviation of 10 per cent. To determine the magnitudes of triggered events, we draw a random magnitude from a distribution defined by a certain b-value. There is no geometric constraint on earthquake size. We assign b-values at the seed points in two different ways: (1) a spatially homogeneous b-value is assigned to each seed (HySei C, for a constant b-value); (2) spatially heterogeneous b-values are attributed to the seed points (HySei V, for a variable b-value). In the former case, we only calibrate the homogeneous b-value on the learning period. In the latter case, we define the differential stress (σ1 − σ3) at each seed point, drawn from a normal distribution, and a b-value that follows a linear relationship with differential stress: the bmax and bmin parameters are the b-values at the minimum and maximum values of differential stress, respectively. Additional free parameters are the scaling factor Fs (the ratio between the number of synthetic and observed events), the stress drop coefficient dτ (the change of stress conditions after a seed has been triggered), and a criticality threshold μc, which accounts for the fact that seed points cannot be too close to the failure limit (proximity to failure).
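The triggering logic just described can be sketched in a few lines. The following is a minimal, illustrative sketch only: the pressure function is a toy stand-in for the 1-D radial diffusion solution that HySei computes numerically, and all numerical values (stresses, friction angle, cohesion, diffusivity, b-value) are assumptions for demonstration, not the calibrated HySei parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative values only -- NOT the calibrated HySei parameters
n_seeds = 5000
sigma1 = rng.normal(140e6, 14e6, n_seeds)  # max principal stress [Pa], 10 per cent std
sigma3 = rng.normal(60e6, 6e6, n_seeds)    # min principal stress [Pa], 10 per cent std
phi = np.deg2rad(30.0)                     # friction angle (fixed in HySei)
c = 2e6                                    # cohesion [Pa] (fixed in HySei)
r = rng.uniform(10.0, 500.0, n_seeds)      # seed distance from the well [m]
b_value = 1.2                              # homogeneous b-value (HySei C style)

def pressure(r, t_days, D=0.05, p0=10e6):
    """Toy radially decaying pore-pressure perturbation [Pa]: a stand-in for
    the 1-D radial diffusion solution of eq. (3)."""
    return p0 * np.exp(-r**2 / (4.0 * D * t_days * 86400.0))

def mohr_coulomb_fails(sigma1, sigma3, p, c, phi):
    """Failure on the optimally oriented plane, with pore pressure reducing
    the effective normal stress."""
    tau = 0.5 * (sigma1 - sigma3) * np.cos(phi)
    sigma_n = 0.5 * (sigma1 + sigma3) - 0.5 * (sigma1 - sigma3) * np.sin(phi)
    return tau >= c + (sigma_n - p) * np.tan(phi)

# Discard seeds that already fail at zero overpressure (a crude analogue of
# the criticality threshold mu_c), then trigger the rest day by day.
available = ~mohr_coulomb_fails(sigma1, sigma3, 0.0, c, phi)
triggered = np.zeros(n_seeds, dtype=bool)
daily_counts = []
for day in range(1, 7):
    fails = mohr_coulomb_fails(sigma1, sigma3, pressure(r, day), c, phi)
    new = fails & available & ~triggered
    daily_counts.append(int(new.sum()))
    triggered |= new

# Gutenberg-Richter magnitudes above Mc = 0.9 via inverse-CDF sampling
mags = 0.9 - np.log10(rng.random(int(triggered.sum()))) / b_value
```

HySei additionally evolves permeability (eqs 4 and 5), applies the stress drop dτ to triggered seeds, and samples heterogeneous b-values for HySei V; none of that is captured in this sketch.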
We calculate the pressure evolution in the reservoir for the forecast period in question using the injection rates of the actual stimulation conducted on site. We then construct the forecast PDF by computing the spatial distribution of 1000 catalogues generated for the same forecast period. Events simulated by HySei are distributed on the main fault plane. To construct a PDF in 3-D, we add an off-fault component to the simulated events (Fig. 3). First, we rotate the observed seismicity of the current learning period to the orientation of its principal components; then we draw random off-fault locations from the empirical distribution of observed events along the smallest principal axis. Similarly to the SaSS model variants, we calibrate trust levels to assign a small seismicity rate to those voxels where the model would otherwise never expect any event. The parameters of HySei are listed in Table 2.

Table 2. Parameters of HySei. LP stands for learning period.

| Phase | Parameter | Description | Calibration |
| --- | --- | --- | --- |
| Pre-stimulation | h | open-hole section | fixed (from geotechnical data) |
| | ν | water viscosity | fixed |
| | ρ | water density | fixed |
| | κ0 | permeability | estimated on pre-stimulation |
| | reff | effective borehole radius | estimated on pre-stimulation |
| | S | storage coefficient | estimated on pre-stimulation |
| Stimulation (hydraulic) | Cu | stimulation velocity | estimated on stimulation LP |
| | pt | threshold pressure | estimated on stimulation LP |
| | ut | maximum stimulation factor | estimated on stimulation LP |
| | Δpt, Δut | transition zones for Hp and Hu | estimated on stimulation LP |
| Stimulation (seismic) | σ1, σ3 | maximum and minimum stress magnitudes | randomly drawn from geotechnical data |
| | c | cohesion | fixed |
| | ϕ | friction angle | fixed |
| | N | seed density | fixed |
| | Fs | scale factor | estimated on stimulation LP |
| | μc | criticality threshold | estimated on stimulation LP |
| | dτ | stress drop coefficient | estimated on stimulation LP |
| | bcons | constant b-value for HySei C | estimated on stimulation LP |
| | bmin, bmax | minimum and maximum b-values for HySei V | estimated on stimulation LP |
| | T | trust level | estimated on LP |

## 4 METHOD

### 4.1 Testing of model forecast performance

As stated in the introduction, testing of model forecast performance is based on statistical tests established by the CSEP community (Rhoades et al. 2011). In general, these tests address two questions: (1) is the forecast consistent with the observations, and (2) which models are better than others? First, we apply three consistency tests, the Number-, Magnitude- and Space-tests, assuming Poissonian earthquake distributions. Note that throughout this study we refer to these entities as model or forecast components. In the case of the Number-test, we examine whether the overall number of forecast earthquakes is consistent with the number of observed events. For the Magnitude-test, we check whether the forecast magnitudes match the observed distribution.
In the case of the Space-test, we use a 4 × 4 × 4 km testing grid divided into 200 × 200 × 200 m voxels to calculate the Poisson log-likelihood of the observation in each voxel (Zechar et al. 2010; Rhoades et al. 2011). Summing the individual log-likelihood values yields the joint log-likelihood (LL) of a specific experiment. To ascertain whether the forecast is accurate enough, we simulate 1000 synthetic catalogues consistent with the given forecast. If the LL for the current observation is higher than the 5th percentile of the simulated catalogues, the forecast passes the Space-test. Although it is more intuitive to understand the results of the Number-test in absolute numbers, to be consistent with the other tests we express all results in terms of log-likelihoods, which are generally negative values. The closer the results are to zero, the better the forecast. To compare models, we use the joint LL values of the entire model (all components together) and of its separate model components. We test all forecast skills over 6, 24, 48 and 72 hr forecast periods. Further details of the testing procedure are described in Király-Proag et al. (2016).

### 4.2 Ensemble models

Forecasting tests allow us to check model consistency and to rank different models, so that we know which performs best over any given forecast period. But how should we use these results for operational purposes? A trivial and straightforward solution would be to choose the best model and rely exclusively on its forecasts, but there is no guarantee that its forecasts would continue to be the most accurate in the future (Oreskes et al. 1994; Marzocchi et al. 2012). While the best model prevails because it captures some features of seismicity very well (e.g. the spatial distribution), other models may better represent other features (e.g. seismicity rates). Our preferred solution, then, is to keep all models and merge their results using a weighted average model, that is, an ensemble model.
Assigning weights inevitably involves some subjectivity, but we consider an approach based on forecast performance superior to giving all models equal weight. Weighting takes account of two main issues: forecast correlation and cumulative forecast performance (Marzocchi et al. 2012). Correlation weights are a function of the correlation between models and should fulfil three criteria: (1) the weight for highly correlated models should be lower than for less-correlated models; (2) if a new model is identical to an existing model, correlation weights should be shared between the duplicate models; (3) when a new model is added to an existing set of models, none of the model weights should increase. According to Marzocchi et al. (2012), no formal weighting scheme fully satisfies all three criteria; we therefore apply their scheme, which approximately achieves these goals, and refer the reader to their article for details. Here, we consider a weighting scheme called Bayesian Model Averaging (BMA; Hoeting et al. 1999), where the weight of the jth model (Wj) depends on the Poisson probability of each model’s previous forecasts (Pj) and the corresponding correlation weights, $$\sigma_{j}^{\text{corr}}$$ (eq. 6). All weights are normalized so that the sum across all models equals one:

$$W_{j} = \frac{\sigma_{j}^{\text{corr}} P_{j}}{\sum\limits_{k=1}^{J} \sigma_{k}^{\text{corr}} P_{k}}.$$ (6)

In this scheme, we consider a model’s entire performance when calculating its weight, although this can be a drawback if model components perform significantly differently with regard to rate or spatial distribution. For instance, our previous results showed that the rate forecasts of both HySei variants are superior to those of the SaSS variants, and that smoothed seismicity models perform better than a radially symmetric seismic cloud (Király-Proag et al. 2016).
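As a minimal sketch of eq. (6), assuming hypothetical cumulative log-likelihoods and correlation weights for five model variants (the shift by the maximum log-likelihood before exponentiating is a standard numerical-stability trick, not part of the original formulation):

```python
import numpy as np

def bma_weights(log_likelihoods, corr_weights):
    """Bayesian Model Averaging weights in the spirit of eq. (6): each model's
    weight is its correlation weight times the probability of its past
    forecasts, normalized so that all weights sum to one."""
    ll = np.asarray(log_likelihoods, dtype=float)
    corr = np.asarray(corr_weights, dtype=float)
    p = np.exp(ll - ll.max())      # relative forecast probabilities P_j
    w = corr * p                   # sigma_j^corr * P_j
    return w / w.sum()

# Hypothetical cumulative LLs and correlation weights for five model variants
ll = [-120.0, -118.5, -119.2, -130.0, -131.5]
corr = [0.19, 0.18, 0.18, 0.225, 0.225]
weights = bma_weights(ll, corr)
```

The same function could, in principle, produce component-wise weights by passing number-only or space-only log-likelihoods (with uniform correlation weights for the number component, cf. eqs 7 and 8 below in the text).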
Different performances of model components can counterbalance each other and lower the weight of a model that has some good components but, in combination, produces a mediocre performance. Thus, we treat the model components separately, building a multicomponent ensemble model with different weights for the number and spatial components. The number weight of the jth model ($$W^{\text{Number}}_{j}$$) is directly proportional to the probability of that model’s expected rate forecast:

$$W_{j}^{\text{Number}} = \frac{P^{\text{Number}}_{j}}{\sum\limits_{k=1}^{J} P^{\text{Number}}_{k}}.$$ (7)

To calculate the spatial weights ($$W_{j}^{\text{Space}}$$) for the jth model, we account for previous spatial performance ($$P_{j}^{\text{Space}}$$) and the correlation weights ($$\sigma_{j}^{\text{corr}}$$, whose calculation is detailed in the Supporting Information, eqs S1–S5), as formulated in eq. (8):

$$W_{j}^{\text{Space}} = \frac{\sigma_{j}^{\text{corr}} P_{j}^{\text{Space}}}{\sum\limits_{k=1}^{J} \sigma_{k}^{\text{corr}} P_{k}^{\text{Space}}}.$$ (8)

Thus, the spatial weights indicate how the spatial forecasts of the different models are combined. The ensemble number of events (i.e. the event rate), computed by combining the number forecasts of the individual models with the number weights, is then distributed in space according to the ensemble spatial distribution. All models are recalibrated and all weights are updated every 6 hr throughout the experiment. To avoid overlapping forecast periods, the weights are the cumulative sums of the most recently calibrated models’ performances over the first 6 hr long forecast time window (Supporting Information Fig. S1). In other words, the weight of a forecast model is computed from the model’s correlation weight and its cumulative LL by the end of the current learning period. This implies that the weights at the end of the very first learning period are equal to the correlation weights.
Weights corresponding to the learning period ending at day 1.25 are computed from the correlation weights and the forecast performance of the individual models between days 1 and 1.25. Weights at the end of day 1.5 are computed from the correlation weights and the forecast performance of the individual models between days 1–1.25 and, with recalibrated parameters, between days 1.25–1.5, and so on.

## 5 RESULTS

In this section, we compare the forecast performance of individual models, one-component (1C) ensemble models (i.e. weights based on overall model performance) and two-component (2C) ensemble models (i.e. weights based separately on number and spatial performance). In the following figures, blue colours indicate SaSS model variants (light, intermediate and dark shades of blue stand for SaSS S, SaSS H and SaSS E, respectively); brown and orange denote HySei C and HySei V, respectively. Vertical grey dashed lines indicate when shut-in occurred. Basel 2006 is always on the left-hand side of each figure and Soultz-sous-Forêts 2004 on the right. Fig. 4 summarizes the weights calculated from the previous forecast performance of the most recently calibrated models. Correlation weights (Figs 4a and e) vary within a small range centred around 0.2. Model variants that are inherently more similar, especially in their spatial component, receive the same weight, for example the HySei variants. Initially, the same trend can be observed for the SaSS variants, but subsequently their spatial components differ more widely, owing to the different temporal weightings of the smoothed seismicity models, and their correlation weights also diverge more than earlier on.

Figure 4. Weights of SaSS and HySei model variants. The vertical grey dashed line indicates the shut-in moment. (a,e) Correlation weights for the data sets of Basel 2006 and Soultz-sous-Forêts 2004, respectively. (b,f) One-component ensemble weights. (c,g) Two-component number weights.
(d,h) Two-component spatial weights. One-component ensemble weights (Figs 4b and f) strongly emphasize the best model (the SaSS H and SaSS E variants), suppressing the other models, especially during the post-shut-in period. In the case of the two-component ensemble weights (Figs 4c, d, g and h), the different performances of the model components found in our previous paper (Király-Proag et al. 2016) are clearly reflected in both data sets. In terms of event numbers, SaSS performs better than HySei during injection, whereas HySei performs better after shut-in. Thus, the number weights of SaSS are higher than those of HySei during injection, but HySei's number weights are higher after shut-in. In terms of spatial forecasts, SaSS performs better throughout the entire experiment; thus, the spatial weights of the SaSS variants are higher than those of any other model variant. Fig. 5 presents the cumulative LL of the individual models using the same colours as before, with the one- and two-component ensemble models for both data sets represented by yellow and red triangles, respectively. There are several clear similarities between the two graphs: in the early stages, all models perform similarly, but by the end of the experiment the SaSS variants outperform the HySei variants. In both data sets, the one-component ensemble models often incorporate only the best performing model. As a consequence, the final performance of the one-component ensemble model is equal to that of the best individual model. Two-component ensemble models improve the forecast performance during the late shut-in period. Figure 5.
Evolution of cumulative LL of the model variants for both data sets. Colours and symbols are explained in the legend. BMA Ens 1C and BMA Ens 2C stand for the one-component and the two-component ensemble model, respectively. The vertical grey dashed line indicates the shut-in moment. Whereas Fig. 5 shows the cumulative LL corresponding to all earthquakes in previous forecast periods, Fig. 6 represents the normalized cumulative LL (LL/Eqks) of the events that occurred during the specific forecast periods for both data sets, namely during 6 hr forecast periods (Figs 6a and e), 24 hr periods (Figs 6b and f), 48 hr periods (Figs 6c and g), and 72 hr periods (Figs 6d and h). Again, the models initially perform very similarly, but LL/Eqks declines for the SaSS variants and the one-component ensemble models. The HySei variants perform more stably for Basel 2006; they are also stable for Soultz-sous-Forêts 2004, apart from a clear drop at shut-in. The two-component ensemble model's performance also declines towards the end of the experiment, but it is nonetheless superior to any other model. Examining the number and spatial components separately, it becomes clear that the one-component ensemble models always highlight the most strongly performing SaSS variant, whereas the two-component ensemble models combine the best components: usually a HySei variant as the number component and a SaSS variant as the spatial component (Supporting Information Figs S2 and S3). Figure 6. Evolution of LL/Eqks. The colour coding (explained in the legend) is the same as in Fig. 5.
The vertical grey dashed line indicates when the shut-in occurred. (a–d) LL/Eqks in Basel 2006 for 6, 24, 48, and 72 hr forecast periods, respectively. (e–h) LL/Eqks in Soultz-sous-Forêts 2004 for the same forecast periods. Here we demonstrate how testing model forecasts against observations not only helps to assess their performance but also offers a way of building ensemble models that are superior to individual models. Below, we explore the sensitivity of the results to the choice of the learning period and of the weighting approach. In a first sensitivity analysis, we examine whether forecasting performance differs systematically over various segments of the learning periods. We consider periods ranging in duration from a one-day moving window (representing the most recent observations over the past 24 hr) to the full duration of the available data, and conclude that there is no significant difference in forecast skill. We then examine another weighting scheme, Score Model Averaging (SMA, see Marzocchi et al. 2012), to construct ensemble models. SMA weights are inversely proportional to the absolute value of the LL ($$W_{\text{SMA}} \propto \frac{1}{|\text{LL}|}$$). This scheme generally distributes weights more equally among all models than the BMA weighting scheme does. These weights do not highlight major differences between the models in either data set (Supporting Information Fig. S4). In terms of cumulative LL, SMA one-component ensembles outperform their BMA counterparts.
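A minimal sketch of the SMA weighting, assuming each model's cumulative LL is already available (the values used below are hypothetical):

```python
def sma_weights(lls):
    """Score Model Averaging (Marzocchi et al. 2012): weights inversely
    proportional to |LL| (LLs are negative), normalized to sum to one."""
    inv = [1.0 / abs(ll) for ll in lls]
    total = sum(inv)
    return [w / total for w in inv]
```

Because the weights scale with 1/|LL| rather than with exponentiated likelihood differences, even a model whose LL is several times worse than the best one retains an appreciable weight, which is why SMA spreads weight more evenly than BMA.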
Two-component SMA weights yield slightly better results than SMA one-component ensembles, yet perform worse than BMA two-component ensembles (Supporting Information Fig. S5). The same pattern is observed in terms of LL/Eqks for 6, 24, 48 and 72 hr forecasts (Supporting Information Fig. S6). Even though the SMA weighting scheme does not always choose the best model based on previous forecast skill, it outperformed the BMA one-component ensemble model. This is a reminder that forecasts can hold surprises, regardless of the models' past performance. In other words, even if we build an ensemble from the currently best performing model components, there is no guarantee that it will give the best forecast in the future.

6 TOWARDS AN ADAPTIVE TRAFFIC LIGHT SYSTEM

6.1 Near-real-time hazard forecasting

One goal of forecast modelling is to build an on-site, real-time hazard estimation tool (i.e. an ATLS; Király-Proag et al. 2016; Grigoli et al. 2017) to decide whether injection strategies can continue as planned or have to be adjusted due to high hazard levels. An operator's decisions should be assisted by a hazard management framework to avoid arbitrary choices (i.e. ones led by gut feeling or personal interests) and should be based on clear, predefined guidelines agreed upon by all stakeholders (project managers, authorities, insurers, etc.) prior to a project. Thus, part of such a hazard forecasting tool would be a suite of alternative injection strategies, in addition to the planned injection, defined by the operators together with the other stakeholders. Further, the stakeholders have to agree upon the hazard or risk thresholds that trigger the change from the planned to an alternative injection strategy; such hazard thresholds may be based on expert judgement. At any time, such an ATLS would provide hazard estimates (e.g. the occurrence probability of an M ≥ 3 event) for each considered injection scenario.
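For instance, under a Poisson occurrence assumption, the probability of at least one M ≥ 3 event in a forecast period follows from the forecast rate above the completeness magnitude, scaled with a Gutenberg–Richter b-value. This is a generic sketch of that conversion, not the specific computation used in the paper; all numerical values in the usage example are placeholders:

```python
import math

def prob_exceedance(rate_above_mc, mc, b_value, m_target):
    """P(at least one event with M >= m_target) over the forecast period,
    assuming Poisson occurrence and Gutenberg-Richter magnitude scaling.

    rate_above_mc : expected number of events with M >= mc in the period
    mc            : completeness (reference) magnitude of the forecast
    b_value       : Gutenberg-Richter b-value
    m_target      : target magnitude (e.g. 3.0)
    """
    # expected number of events with M >= m_target in the period
    rate_target = rate_above_mc * 10 ** (-b_value * (m_target - mc))
    return 1.0 - math.exp(-rate_target)

# placeholder usage: 100 events with M >= 0.9 forecast, b = 1.0
p = prob_exceedance(100.0, 0.9, 1.0, 3.0)
```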
If the planned injection scenario indicates that the threshold will be exceeded, then the alternative scenarios are considered. In the worst case, that is, when even the scenario in which injection is stopped immediately indicates too high a hazard, immediate shut-in follows. Here, we present a prototype of an ATLS based on our models and show how it could give practical assistance to on-site operators. In our example, we calibrate the models at six moments (learning periods of 1, 2, 3, 4, 5 and 5.5 d) of the ongoing injection and issue 20 d forecasts in terms of the occurrence probability of at least one M ≥ 3 event. Probabilities are calculated only for the forecast periods, without taking into account any previous events (i.e. those occurring during the corresponding learning period). Before the injection starts, we have to decide which injection strategies are of interest and what level of seismic hazard is acceptable. We decide to run four scenarios: Scenario A: the original stepwise increasing stimulation with 12 d of constant injection at the end (simulating an 'original plan' strategy), Fig. 7(a). Scenario B: the stimulation as actually carried out at Basel (11 000 m3 injected volume), Fig. 8(a). Scenario C: constant injection for one more day after the current learning period, after which injection ceases, Fig. 9(a); this scenario would come into play if the above scenarios indicate too high a hazard but immediate shut-in (i.e. Scenario D) still seems safe. Scenario D: immediate shut-in after the current learning period, Fig. 10(a). Figure 7. Injection scenario A. (a) Injection rates of the scenario. Yellow, brown, pink, purple, blue and green rectangles show the 20 d forecast periods corresponding to learning periods ending after 1, 2, 3, 4, 5 and 5.5 d, respectively. (b) Occurrence probability of M ≥ 3 calculated between days 1 and 21. Symbols are explained in the legend.
(c–g) Same as (b) between days 2 and 22, 3 and 23, 4 and 24, 5 and 25, and 5.5 and 25.5, respectively. Figure 8. Injection scenario B. The structure of the figure is the same as in Fig. 7. Figure 9. Injection scenario C. The structure of the figure is the same as in Figs 7 and 8. At the end of the current learning period, injection continues for 1 d at a constant rate and then ceases; injection curves are indicated with the colour of the corresponding learning periods. Figure 10. Injection scenario D. The structure of the figure is the same as in Figs 7–9. Injection always ends at the end of the current learning period; injection curves are indicated with the colour of the corresponding learning periods.
Scenarios A and D are the two end-members: A is the most optimistic scenario of all, while D is the safest in terms of seismicity, indicating the limits of the current injection strategy. Scenarios B and C are two injection strategies that provide alternatives in case the injection should be changed. We decide that a 5 per cent occurrence probability of an M ≥ 3 event is the hazard threshold above which immediate action should be taken. This threshold is arbitrary and serves for demonstration purposes only. Panels (b)–(g) in Figs 7–10 would appear on the operators' screens as the stimulation advances. After one day of injection (panel b in Figs 7–10), Scenarios A, B and C forecast occurrence probabilities higher than 5 per cent, meaning that future seismicity would be too hazardous; for immediate shut-in, the most conservative scenario, however, the occurrence probabilities are low. An operator would decide that there is no need for immediate shut-in, but would anticipate that the hazard may become too high within one day. After 2 d (panel c in Figs 7–10), occurrence probabilities have decreased for all scenarios, and even immediate shut-in forecasts less than a 5 per cent occurrence probability. The decrease in the hazard forecasts for Scenarios A and B is due to the fact that the volume still to be injected decreases because, unlike in Scenarios C and D, the total volume is predefined. Note, however, that changing hazard estimates may also reflect the improving calibration of the forecast models as more data become available after 2 d. Nevertheless, the results of Scenarios C and D indicate that injection can continue. Scenario C shows that keeping the same injection rate for a day and then ceasing injection would result in an occurrence probability lower than 5 per cent, so an increased injection rate could even be encouraged. After 3 d (panel d in Figs 7–10), the forecasts of Scenarios A and B tend to decrease relative to previous forecasts.
Again, the decrease in occurrence probabilities with increasing learning periods could be because learning periods shorter than 2 d tend to overestimate seismicity, while more available data improve the rate estimates. Occurrence probabilities increase for Scenarios C and D: in the case of C, most models forecast an occurrence probability of about 10 per cent, whereas immediate shut-in still results in occurrence probabilities below 5 per cent. The operator would continue the injection but keep the injection rate moderate. After 4 d of injection (panel e in Figs 7–10), all scenarios urge immediate action: the most conservative scenario (D) reaches the 5 per cent threshold, and all other scenarios indicate significantly higher occurrence probabilities. According to our ATLS, this is the moment when the stimulation should have been ceased immediately, based on 20 d seismicity forecasts and a 5 per cent occurrence probability of M ≥ 3 events as the hazard threshold. Later forecasts (learning periods of 5 and 5.5 d, panels f and g in Figs 7–10) confirm that the occurrence of at least one M ≥ 3 earthquake becomes increasingly probable even under the cautious (C) and conservative (D) scenarios. Actual observations showed that one such event, with Mw = 3.1, occurred in Basel at 16:48 on 8 December 2006. It has to be acknowledged that seismicity around the time of shut-in tends to be underpredicted by our models. We stress that such forecast deficiencies might be alleviated if models that more accurately represent the physical processes are included in the ensemble forecasts. Nevertheless, model calibration and forecasting will never be perfectly accurate; hence, uncertainties in the forecasts themselves have to be anticipated. Choosing low hazard thresholds can partially compensate for potential hazard underpredictions.
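The scenario-based decision rule walked through above can be sketched as a simple lookup; the scenario labels and the 5 per cent threshold mirror the demonstration, while the function, its action strings, and the example probabilities are our own schematic choices:

```python
def atls_decision(probs, threshold=0.05):
    """Pick an action from per-scenario forecasts of P(at least one M >= 3).

    probs maps scenario name -> occurrence probability, where 'A'/'B' are
    planned strategies, 'C' is one more day of injection then stop, and
    'D' is immediate shut-in (the most conservative option).
    """
    if probs["D"] >= threshold:
        return "shut-in"              # even stopping now is too hazardous
    if probs["C"] >= threshold:
        return "prepare shut-in"      # only immediate shut-in stays below threshold
    if max(probs["A"], probs["B"]) >= threshold:
        return "switch to scenario C" # planned path too hazardous, C still safe
    return "continue"                 # all scenarios below threshold
```

For example, the day-4 situation described above (all scenarios at or above the threshold, including D) maps to "shut-in", whereas the day-2 situation (all scenarios below it) maps to "continue".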
This demonstration of a prototype ATLS shows that such a system would already have warned operators after 3 d of injection to significantly alter the original injection plan and to shut in earlier, avoiding the occurrence of any M ≥ 3 events. We argue that at Basel there would have been a good chance of avoiding felt earthquakes with such an ATLS.

6.2 Long-term forecasts

While the performance of short-term forecasts (i.e. 6 hr to 20 d) has been thoroughly examined above, we also investigate our models' forecast skill over several weeks to a year. Seismicity in the Basel reservoir, in particular, has been observed to decline very slowly, with events still occurring years after the stimulation in 2006 (e.g. Deichmann et al. 2014; Wiemer et al. 2017). Accordingly, for both model groups we consider a 1 yr forecast horizon. Fig. 11 shows weekly forecast rates for all models with M ≥ 0.9 (Fig. 11a) and with M ≥ 3 (Fig. 11b). In the former case, HySei C and V (brown and orange dots) forecast low seismicity rates from day 10 and no seismicity after the second half of January 2007. Thus, the model severely underpredicts the data (open black squares) observed throughout the period under scrutiny. From February 2007 on, only about 37 per cent of the weekly forecasts (the weeks when no earthquake occurred) are well forecast by the HySei variants. SaSS (blue squares) produces higher weekly rates, providing conservative overpredictions or good forecasts until early March 2007, after which the observed fluctuations in seismicity are not captured by the smoothly decaying models. Between February and December 2007, SaSS forecasts the seismicity rate well in more than 70 per cent of the cases. Forecasts of the ensemble models (yellow and red triangles) fall between those of the SaSS and HySei variants: 60 per cent of the weekly forecasts are within the ensemble forecast uncertainties for the same period. Looking at Fig.
11(b), all models, including the ensemble models, forecast low or close-to-zero rates for M ≥ 3 events over the entire 1 yr period, thereby assigning a low probability to an Mw 3 event occurring in February 2007. Whereas one would usually expect better rate forecasts from numerical models based on pore pressure diffusion than from those based on simple seismicity decay, it should be borne in mind that whenever the pressure drops below a certain threshold in the simulated reservoir, seeds are no longer triggered. Heterogeneities in the pressure evolution are not well represented by a radially symmetric model in which the inversion is based solely on well-head pressure; this model also raises the question of whether pore pressure is the only phenomenon responsible for triggering events after shut-in, as seismicity migrates further from the well. As Segall & Lu (2015) showed, poroelastic stresses may dominate at greater distances, and stresses induced by poroelastic coupling break the symmetry of diffusion even if faults are uniformly distributed. Figure 11. One-year weekly forecast rates of all models in Basel. Symbols and colours are indicated in the legend. Note the overlap between the shaded areas representing 95 per cent confidence intervals. Models are calibrated on a 10 d learning period. (a) Number of events per week with M ≥ 0.9. The inset shows a zoom between December 2006 and February 2007. Only SaSS, HySei C, the two-component ensemble model and the observed seismicity rate are plotted for clarity. (b) Number of events per week with M ≥ 3.
6.3 Limitations

Although good results are achieved, operational induced seismicity forecasting is still in its infancy. One of the main limitations of operational forecast testing, and hence of the applicability of an ATLS, is the near-real-time availability of accurate seismic catalogues. Usually, only rough estimates of locations and magnitudes are available shortly after the origin time of an earthquake, and the uncertainties are larger than in catalogues created later. In our analysis, we purposely decided to use, for both projects, earlier catalogues that had not been thoroughly post-processed. This limitation, however, does not inhibit testing model forecasts in a pseudo-prospective way to validate the physical phenomena governing the reservoir creation process. We also acknowledge that our results are limited to two model groups and to two geothermal data sets in similar geological environments. Although several models have been developed, many of them are not suited to systematic subsequent forecast testing. Model calibration over many (70+) learning periods is a major effort, especially for numerical models. Nonetheless, we plan to expand the test bench with further models, such as the TOUGH2-SEED model by Nespoli et al. (2015), which couples 3-D hydraulics with stochastic seismicity modelling; the Coulomb stress transfer model developed by Catalli et al. (2016); and the ETAS model (Bachmann et al. 2011) with additional 3-D spatial kernels. A major obstacle for the further development of models and the ATLS is the limited availability of reliable data sets. Although seismic catalogues have recently become more widely available, it is still difficult to gain access to hydraulic data due to legal and proprietary issues.
Since physics-based and hybrid models require injection rate and/or pressure data, limited access to such data hinders their testing.

7 CONCLUSIONS

In this study, we used the Induced Seismicity Test Bench (Király-Proag et al. 2016) on five model variants calibrated and systematically tested against data from two EGS reservoirs. We paid special attention to the forecast skill of previously applied and newly introduced ensemble models and to the practical utility of induced seismicity forecasting. Our main findings are summarized as follows: Cumulative LL and LL/Eqks for different forecast periods demonstrated that a one-component BMA ensemble model performs as well as the best individual model, meaning that this weighting acts as a model selector. We introduced multicomponent ensemble models, that is, separate weighting of the number and spatial forecast components. Cumulative LL and LL/Eqks for different forecast periods also showed that two-component BMA ensemble models improve forecasting performance, especially during the post-stimulation period. We found that parameter calibration over different segments of the learning periods produces insignificant random variation in the models' forecast skills. Numerical models based solely on pore pressure diffusion may not forecast well over the long term; for example, once the modelled pressure has decreased below a sufficiently low value, these models trigger no earthquakes even when observations indicate the contrary. Scenario analysis of different injection strategies within the framework of an Adaptive Traffic Light System can be of practical assistance to on-site decision-making by operators. In the case of Basel, an ATLS might have initiated shut-in after 3–4 d due to excessive induced seismic hazard.
In the future, it will be important to incorporate more models that differ significantly from the previously tested ones and to include more data sets in the Induced Seismicity Test Bench, as these would provide a broader picture of the physical processes in engineered reservoirs and of best modelling practices. Moreover, similarly to the CSEP testing centres, truly prospective testing on operational projects, or in appropriately scaled laboratory experiments, would take us a step further. We believe that not only the induced seismicity community but also researchers in other fields (e.g. statistical seismology) would benefit from such experiments.

ACKNOWLEDGEMENTS

We acknowledge the financial support of the GEOTHERM, GEOTHERM-2 and GEISERS projects for developing the fundamental ideas concerning the Adaptive Traffic Light System and for providing the stimulation data of Soultz-sous-Forêts 2004. The authors wish to thank EEIG Heat Mining for permission to publish those data. Acknowledgement is also due to the numerous bodies that have supported the Soultz-sous-Forêts project over the years, including the European Union, ADEME of France, BMU of Germany, SER and SFOE of Switzerland, and the EEIG 'Exploitation Minière de la Chaleur' consortium. Access to the data is provided by contacting the authors. We are also grateful to Eduard Kissling for discussing an earlier version of this manuscript. E.K.-P. thanks the GEOTHERM-2 and DESTRESS projects for funding her PhD. Some of this work was done at the Swiss Competence Center on Energy Research - Supply of Electricity (SCCER-SoE), with support from the Swiss Commission for Technology and Innovation (CTI).

REFERENCES

Bachmann C.E., Wiemer S., Woessner J., Hainzl S., 2011. Statistical analysis of the induced Basel 2006 earthquake sequence: introducing a probability-based monitoring approach for Enhanced Geothermal Systems, Geophys. J. Int., 186(2), 793–807.
Baisch S., Vörös R., Weidler R., Wyborn D., 2009. Investigation of fault mechanisms during geothermal reservoir stimulation experiments in the Cooper Basin, Australia, Bull. seism. Soc. Am., 99(1), 148–158.
Baisch S., Vörös R., Rothert E., Stang H., Jung R., Schellschmidt R., 2010. A numerical model for fluid injection induced seismicity at Soultz-sous-Forêts, Int. J. Rock Mech. Min. Sci., 47(3), 405–413.
Calò M., Dorbath C., Frogneux M., 2014. Injection tests at the EGS reservoir of Soultz-sous-Forêts. Seismic response of the GPK4 stimulations, Geothermics, 52, 50–58.
Catalli F., Rinaldi A.P., Gischig V., Nespoli M., Wiemer S., 2016. The importance of earthquake interactions for injection-induced seismicity: retrospective modeling of the Basel Enhanced Geothermal System, Geophys. Res. Lett., 43(10), 4992–4999.
Cesca S., Grigoli F., Heimann S., González Á., Buforn E., Maghsoudi S., Blanch E., Dahm T., 2014. The 2013 September–October seismic sequence offshore Spain: a case of seismicity triggered by gas injection?, Geophys. J. Int., 198(2), 941–953.
Cook N.G.W., 1976. Seismicity associated with mining, Eng. Geol., 10(2–4), 99–122.
Deichmann N., Kraft T., Evans K.F., 2014. Identification of faults activated during the stimulation of the Basel geothermal project from cluster analysis and focal mechanisms of the larger magnitude events, Geothermics, 52, 84–97.
Douglas J. et al., 2013. Predicting ground motion from induced earthquakes in geothermal areas, Bull. seism. Soc. Am., 103(3), 1875–1897.
Dyer B., 2005. Soultz GPK4 stimulation September 2004 to April 2005. Seismic monitoring report, Semore Seismic Report, Tech.
rep.
Dyer B.C., Schanz U., Ladner F., Häring M.O., Spillmann T., 2008. Microseismic imaging of a geothermal reservoir stimulation, Leading Edge, 27(7), 856–869.
Dyer B.C., Schanz U., Spillmann T., Ladner F., Haering M.O., 2010. Application of microseismic multiplet analysis to the Basel geothermal reservoir stimulation events, Geophys. Prospect., 58(5), 791–807.
Eberhard D.A.J., Zechar J.D., Wiemer S., 2012. A prospective earthquake forecast experiment in the western Pacific, Geophys. J. Int., 190(3), 1579–1592.
Evans K.F., Zappone A., Kraft T., Deichmann N., Moia F., 2012. A survey of the induced seismic responses to fluid injection in geothermal and CO2 reservoirs in Europe, Geothermics, 41, 30–54.
Farahbod A.M., Kao H., Cassidy J.F., Walker D., 2015. How did hydraulic-fracturing operations in the Horn River Basin change seismicity patterns in northeastern British Columbia, Canada?, Leading Edge, 34, 658–663.
Genter A., Evans K., Cuenot N., Fritsch D., Sanjuan B., 2010. Contribution of the exploration of deep crystalline fractured reservoir of Soultz to the knowledge of enhanced geothermal systems (EGS), C. R. Geosci., 342(7–8), 502–516.
Genter A., Cuenot N., Goerke X., Melchert B., Sanjuan B., Scheiber J., 2012. Status of the Soultz geothermal project during exploitation between 2010 and 2012, in Thirty-Seventh Workshop on Geothermal Reservoir Engineering, pp. 1–12, Stanford, CA, USA.
Gérard A., Genter A., Kohl T., Lutz P., Rose P., Rummel F., 2006. The deep EGS (Enhanced Geothermal System) project at Soultz-sous-Forêts (Alsace, France), Geothermics, 35(5–6), 473–483.
Gerstenberger M.C., Rhoades D.A., 2010. New Zealand earthquake forecast testing centre, Pure appl. Geophys.,
167(8–9), 877–892.
Gischig V., Wiemer S., Alcolea A., 2014. Balancing reservoir creation and seismic hazard in enhanced geothermal systems, Geophys. J. Int., 198, 1585–1598.
Gischig V.S., Wiemer S., 2013. A stochastic model for induced seismicity based on non-linear pressure diffusion and irreversible permeability enhancement, Geophys. J. Int., 194(2), 1229–1249.
Grigoli F. et al., 2017. Current challenges in monitoring, discrimination and management of induced seismicity related to underground industrial activities: a European perspective, Rev. Geophys., 55, 1–31.
Gupta H.K., 2002. A review of recent studies of triggered earthquakes by artificial water reservoirs with special emphasis on earthquakes in Koyna, India, Earth-Sci. Rev., 58(3–4), 279–310.
Gutenberg B., Richter C., 1944. Frequency of earthquakes in California, Bull. seism. Soc. Am., 34, 185–188.
Häring M.O., Schanz U., Ladner F., Dyer B.C., 2008. Characterisation of the Basel 1 enhanced geothermal system, Geothermics, 37(5), 469–495.
Hirschberg S., Wiemer S., Burgherr P., 2015. Energy from the Earth—Deep Geothermal as a Resource for the Future?, Die Deutsche Nationalbibliothek.
Hoeting J.A., Madigan D., Raftery A.E., Volinsky C.T., 1999. Bayesian model averaging: a tutorial, Stat. Sci., 14(4), 382–417.
Keranen K.M., Savage H.M., Abers G.A., Cochran E.S., 2013. Potentially induced earthquakes in Oklahoma, USA: links between wastewater injection and the 2011 Mw 5.7 earthquake sequence, Geology, 41(6), 699–702.
Király-Proag E., Zechar J.D., Gischig V., Wiemer S., Karvounis D., Doetsch J., 2016. Validating induced seismicity forecast models–Induced Seismicity Test Bench, J. geophys. Res.,
121, 6009–6029.
Langenbruch C., Shapiro S.A., 2010. Decay rate of fluid-induced seismicity after termination of reservoir stimulations, Geophysics, 75(6), 1–10.
Langenbruch C., Zoback M.D., 2016. How will induced seismicity in Oklahoma respond to decreased saltwater injection rates?, Sci. Adv., 2, 1–9.
Marzocchi W., Douglas Zechar J., Jordan T.H., 2012. Bayesian forecast evaluation and ensemble earthquake forecasting, Bull. seism. Soc. Am., 102(6), 2574–2584.
Mignan A., Jiang C., Zechar J.D., Wiemer S., Wu Z., Huang Z., 2013. Completeness of the Mainland China earthquake catalog and implications for the setup of the China earthquake forecast testing center, Bull. seism. Soc. Am., 103(2A), 845–859.
Nanjo K.Z., Tsuruoka H., Hirata N., Jordan T.H., 2011. Overview of the first earthquake forecast testing experiment in Japan, Earth Planets Space, 63(3), 159–169.
Nespoli M., Rinaldi A.P., Wiemer S., 2015. TOUGH2-SEED: a coupled fluid flow mechanical-statistical model for the study of injection-induced seismicity, in TOUGH Symposium 2015, pp. 284–292.
Oreskes N., Shrader-Frechette K., Belitz K., 1994. Verification, validation, and confirmation of numerical models in the earth sciences, Science, 263(5147), 641–646.
Rhoades D.A., Schorlemmer D., Gerstenberger M.C., Christophersen A., Zechar J.D., Imoto M., 2011. Efficient testing of earthquake forecasting models, Acta Geophys., 59(4), 728–747.
Schorlemmer D., Christophersen A., Rovida A., Mele F., Stucchi M., Marzocchi W., 2010. Setting up an earthquake forecast experiment in Italy, Ann. Geophys., 53(3), 1–9.
Secanell R., Carbon D., Dunand F., Martin C., 2009.
AP5000 Report - Seismic Hazard and Risk assessments during three reference time periods (normal, stimulation and circulation), Tech. Rep. October 2009, GEOTER S.A.S . Segall P., Lu S., 2015. Injection-induced seismicity: Poroelastic and earthquake nucleation effects, J. geophys. Res. , 120, 1– 22. Google Scholar CrossRef Search ADS   Shapiro S.A., Dinske C., Langenbruch C., Wenzel F., 2010. Seismogenic index and magnitude probability of earthquakes induced during reservoir fluid stimulations, Leading Edge , 29( 3), 304– 309. Google Scholar CrossRef Search ADS   Taroni M., Zechar J.D., Marzocchi W., 2013. Assessing annual global M6+ seismicity forecasts, Geophys. J. Int. , 196( 1), 422– 431. Google Scholar CrossRef Search ADS   van Thienen-Visser K., Breunese J.N., 2015. Induced seismicity of the Groningen gas field: history and recent developments, Leading Edge , 34( 6), 664– 671. Google Scholar CrossRef Search ADS   Wiemer S., Tormann T., Herrmann M., Karvounis D., Kraft T., Marti M., 2017. Induzierte Erdbeben im Nachgang des eingestellten Geothermieprojekts in Basel, Tech. rep., Schweizerischer Erdbebendienst (SED) an der ETH Zürich Autoren , Zürich, Switzerland. Zechar J.D., Schorlemmer D., Liukis M., Yu J., Euchner F., Maechling P.J., Jordan T.H., 2010. The Collaboratory for the Study of Earthquake Predictability perspective on computational earthquake science, Concurrency Comput. Pract. Exp. , 22( 12), 1836– 1847. Google Scholar CrossRef Search ADS   Zechar J.D., Schorlemmer D., Werner M.J., Gerstenberger M.C., Rhoades D.A., Jordan T.H., 2013. Regional Earthquake Likelihood Models I: First-order results, Bull. seism. Soc. Am. , 103( 2A), 787– 798. Google Scholar CrossRef Search ADS   SUPPORTING INFORMATION Supplementary data are available at GJI online. Figure S1. Explanation of weight calculations for ensemble models. Wcorr stands for correlation weights, FTW indicates forecast time window (6 hours). Figure S2. Evolution of LL/Eqks of the number components. 
Symbols are explained in the legend. The vertical grey dashed line indicates the shut-in moment. Panels a–d: LL/Eqks for 6-, 24-, 48- and 72-hr forecast periods in Basel 2006; panels e–h: the same for Soultz-sous-Forêts 2004.
Figure S3. Evolution of LL/Eqks of the spatial components. Symbols are explained in the legend. The vertical grey dashed line indicates the shut-in moment. Panels a–d: LL/Eqks for 6-, 24-, 48- and 72-hr forecast periods in Basel 2006; panels e–h: the same for Soultz-sous-Forêts 2004.
Figure S4. Score Model Averaging (SMA) weights of SaSS and HySei variants. The vertical grey dashed line indicates the shut-in moment. Panels a–d: correlation weights, one-component ensemble weights, two-component number weights and two-component spatial weights for Basel 2006; panels e–h: the same for Soultz-sous-Forêts 2004.
Figure S5. Evolution of cumulative LL of SaSS and HySei model variants as well as BMA and SMA ensemble models. Symbols are explained in the legend. The vertical grey dashed line indicates the shut-in moment. (a) Basel 2006. (b) Soultz-sous-Forêts 2004.
Figure S6. Comparison of LL/Eqks with BMA and SMA weighted ensemble models. Symbols are explained in the legend. The vertical grey dashed line indicates the shut-in moment. Panels a–d: LL/Eqks for 6-, 24-, 48- and 72-hr forecast periods in Basel 2006; panels e–h: the same for Soultz-sous-Forêts 2004.
Equations (S1) to (S5).

Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper.

© The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.

# Multicomponent ensemble models to forecast induced seismicity

Geophysical Journal International, Volume 212 (1) – Jan 1, 2018
15 pages

Publisher: The Royal Astronomical Society
ISSN: 0956-540X
eISSN: 1365-246X
DOI: 10.1093/gji/ggx393

### Abstract

In recent years, human-induced seismicity has become an increasingly relevant topic due to its economic and social implications. Several models and approaches have been developed to explain the underlying physical processes or to forecast induced seismicity. They range from simple statistical models to coupled numerical models incorporating complex physics. We advocate forecast testing as currently the best method for ascertaining whether models are capable of reasonably accounting for the key governing physical processes. Moreover, operational forecast models are of great interest for supporting on-site decision-making in projects entailing induced earthquakes. We previously introduced a standardized framework following the guidelines of the Collaboratory for the Study of Earthquake Predictability, the Induced Seismicity Test Bench, to test, validate and rank induced seismicity models. In this study, we describe how to construct multicomponent ensemble models based on Bayesian weightings that deliver more accurate forecasts than individual models in the case of the Basel 2006 and Soultz-sous-Forêts 2004 enhanced geothermal stimulation projects. For this, we examine five calibrated variants of two significantly different model groups: (1) Shapiro and Smoothed Seismicity, based on the seismogenic index, simple modified Omori-law-type seismicity decay and temporally weighted smoothed seismicity; and (2) Hydraulics and Seismicity, based on numerically modelled pore pressure evolution that triggers seismicity using the Mohr–Coulomb failure criterion. We also demonstrate how the individual and ensemble models would perform as part of an operational Adaptive Traffic Light System. Investigating seismicity forecasts based on a range of potential injection scenarios, we use forecast periods of different durations to compute the occurrence probabilities of seismic events M ≥ 3.
We show that in the case of the Basel 2006 geothermal stimulation the models forecast hazardous levels of seismicity days before the occurrence of felt events.

Keywords: Earthquake interaction, forecasting, and prediction; Induced seismicity; Statistical seismology

1 INTRODUCTION

Increasing demand for energy is giving rise to a growing number of underground engineering projects accompanied by induced earthquakes. These projects are associated with various types of operations, such as controlled filling of surface reservoirs (Koyna, India; Gupta 2002), mining (e.g. Witwatersrand, South Africa; Cook 1976), hydraulic fracturing (e.g. Horn River Basin, Canada; Farahbod et al. 2015), gas production (e.g. Groningen, The Netherlands; van Thienen-Visser & Breunese 2015), gas injection (e.g. the Castor gas-storage project, Spain; Cesca et al. 2014), waste water injection (e.g. Oklahoma, USA; Keranen et al. 2013) or geothermal energy projects (e.g. Basel, Switzerland, Häring et al. 2008; Cooper Basin, Australia, Baisch et al. 2009). Not only the scientific community but also the general public is becoming increasingly aware of induced seismicity as the number of felt or damaging induced earthquakes increases (e.g. Basel, Hirschberg et al. 2015; the Prague, Fairview and Pawnee earthquakes in Oklahoma, Langenbruch & Zoback 2016). Clearly, efficient monitoring systems and hazard management procedures are needed to guarantee the safety of such underground engineering operations and thus safeguard their economic profitability. In the scientific literature, great attention is paid to modelling reservoir behaviour (e.g. Baisch et al. 2010; Gischig et al. 2014; Segall & Lu 2015; Catalli et al. 2016). The models in question examine the governing processes involved either by attempting to reproduce the entire history of a project (e.g. the injection rate, well-head pressure evolution and seismicity) or in conceptual studies without any calibration against observations.
We argue that sound calibration and careful testing of model forecasts are important because (1) testing is an unbiased way to assess whether models are able to reasonably capture the most important governing physical processes; and (2) in ongoing reservoir stimulations, models can deliver quick, reasonable forecasts of the seismicity rate, magnitude and spatial distribution for the near future. In the latter case, models become part of induced seismic hazard control schemes (such as traffic light systems; Király-Proag et al. 2016; Grigoli et al. 2017) that support operators' on-site decisions. For such near real-time model calibration and forecasting, computational efficiency is a key issue, as most existing models are currently more computationally expensive than required, especially if they involve numerical modelling. We believe that high-performance computing will resolve this issue in the near future. Another key issue is forecast accuracy. To assess accuracy, we introduced the Induced Seismicity Test Bench, a standardized framework for validating and ranking induced seismicity forecasts (Király-Proag et al. 2016). This is done in a pseudo-prospective way: a limited period of an already recorded data set is given to the models being tested to train them (i.e. calibration on a learning period), followed by the generation of forecasts for the next few hours, days or weeks (i.e. the forecast period). We test several durations of learning and forecast periods to chart their effect on the models' forecast skills. Forecast performance is measured using statistical tests developed by the Collaboratory for the Study of Earthquake Predictability (CSEP) community for natural earthquakes (e.g. Gerstenberger & Rhoades 2010; Schorlemmer et al. 2010; Zechar et al. 2010; Nanjo et al. 2011; Eberhard et al. 2012; Mignan et al. 2013; Taroni et al. 2013; Zechar et al. 2013).
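The pseudo-prospective splitting described above can be sketched as a rolling pair of learning and forecast windows that advances every 6 hr. This is an illustrative sketch only; the durations follow the paper's setup, but the helper function and its parameters are hypothetical, not the Test Bench implementation:

```python
# Sketch of pseudo-prospective testing: a growing learning period followed by
# a fixed-length forecast period, advancing every 6 hr. The 6-hr step follows
# the paper; everything else is an illustrative assumption.

def rolling_windows(t_start, t_end, min_learning=24.0, forecast_len=6.0, step=6.0):
    """Yield (learning_start, split, forecast_end) tuples, all in hours."""
    split = t_start + min_learning
    while split + forecast_len <= t_end:
        # models learn on [t_start, split), then forecast [split, split + forecast_len)
        yield (t_start, split, split + forecast_len)
        split += step

windows = list(rolling_windows(0.0, 72.0))
print(windows[0])    # first split: learn on [0, 24), forecast on [24, 30)
print(len(windows))  # number of 6-hr forecast windows in 72 hr of data
```

Each window yields a calibrated model and a scored forecast, so forecast skill can be tracked as the injection proceeds.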
The aforementioned testing framework applies to any induced-seismicity-related project. In Király-Proag et al. (2016), we showed in detail how well two models performed in the case of two deep geothermal projects (Soultz-sous-Forêts 2004 and Basel 2006). The conclusion reached was that neither of the tested models was capable of perfect forecasts and that different models capture different spatiotemporal aspects of the seismic sequences. In this paper, we go one step further and show how to take advantage of different models and their various forecast skills. We combine the best features of existing models to build multicomponent ensemble models, which yield more accurate forecasts than previously used ensemble or individual models. We further demonstrate that, in the case of the Basel project, the suggested seismicity forecast strategy could have predicted the ensuing high level of seismicity.

2 DATA

In this study, we again use seismicity data from the Soultz-sous-Forêts 2004 and Basel 2006 geothermal projects (Fig. 1).

Figure 1. Data sets used in this study. (A) Soultz-sous-Forêts 2004 stimulation data. (B) Basel 2006 stimulation data. (a) Hydraulic data: the solid black line represents the injection rate [l s−1], the dotted grey line shows the well-head pressure [MPa] as a function of time. Insets show the project locations. (b) Magnitudes of observed seismicity over the first few days. The colours and sizes of circles denote magnitude. (c) Map views of observed seismicity. (d) E–W cross-section of observed seismicity. (e) N–S cross-section of observed seismicity.

The Soultz-sous-Forêts geothermal project is located roughly 70 km north of Strasbourg in the Alsace region (France), between Kutzenhausen and Soultz-sous-Forêts in the Upper Rhine Graben. It is France's first enhanced geothermal power plant. Over more than 30 yr, several stimulation and circulation tests were carried out in four wells at depths of 3.5 and 5 km. A detailed description of this project can be found in Evans et al. (2012), Gérard et al. (2006), Calò et al. (2014) and Genter et al. (2010, 2012). For the purposes of this study, we use hydraulic and seismic data from the pre-stimulation and stimulation phases in September 2004 (Dyer 2005). We corrected local magnitudes using the scaling relationship of Douglas et al. (2013). Since some seismograms were clipped, magnitudes saturated at 1.8 (85 per cent of the earthquakes with Mw > 1.8 clipped the sensor used for magnitude estimation; Dyer 2005). Events with magnitudes greater than Mw = 1.8 were therefore assigned a value of Mw = 1.8. The Basel geothermal reservoir is also located in the Rhine Graben, directly beneath the city of Basel (Switzerland). The reservoir lies at a depth of 5 km. Three weeks of stimulation were initially planned for December 2006; however, due to extensive ground shaking and feedback from the general public, the injection was stopped after almost 6 d. After shut-in, four Mw > 2.6 events occurred, one just a few hours later and three more in early 2007. After studying the feasibility and acceptance of the project (Secanell et al. 2009), the operators decided to abandon the project. A detailed description of the project can be found in Häring et al. (2008), Dyer et al. (2008), Baisch et al. (2009) and Deichmann et al. (2014).
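The magnitude-saturation handling described above for the Soultz-sous-Forêts catalogue amounts to capping magnitudes at the saturation level. A minimal sketch; the Mw 1.8 threshold is from the text, while the helper function and sample values are made up:

```python
# Soultz-sous-Forets magnitudes saturate at Mw 1.8 because larger events
# clipped the sensor, so the catalogue assigns Mw = 1.8 to such events.
# The function name and sample magnitudes are illustrative assumptions.

SATURATION_MW = 1.8  # saturation level stated in the text

def cap_magnitudes(mags, cap=SATURATION_MW):
    """Assign the saturation magnitude to any event exceeding it."""
    return [min(m, cap) for m in mags]

print(cap_magnitudes([0.9, 1.5, 2.1, 2.6]))  # [0.9, 1.5, 1.8, 1.8]
```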
Here, we use about 15 d of hydraulic and seismic data (Dyer et al. 2010) starting from the onset of the stimulation (2006-12-02, 18:00), as well as the pre-stimulation injection test data.

3 MODELS

In this section, we briefly describe the two models used in this study: the Shapiro and Smoothed Seismicity (SaSS) and the Hydraulics and Seismicity (HySei) models. Although pore pressure perturbation plays a role in both models, their forecasts are computed in significantly different ways: the SaSS model is based mainly on seismic observations and the anticipated injected volume, while HySei is based on numerical modelling of the pore pressure diffusion that triggers earthquakes in the reservoir. Altogether, five variants of these models are calibrated every 6 hr, using learning periods of between 1 and 11.5 d for Basel 2006 and between 1 and 7 d for Soultz-sous-Forêts 2004.

3.1 Shapiro and Smoothed Seismicity (SaSS) model

The SaSS model rests on three pillars: the rate, the magnitude distribution and the spatial distribution of earthquakes. The seismicity rate during injection is computed using the seismogenic index (eq. 1; Shapiro et al. 2010):

$$\text{log}_{10}(N_{m}(t)) = \text{log}_{10}(Q_{c}(t))-bm-\Sigma$$ (1)

where Nm(t) is the number of induced events above magnitude m up until time t, Qc(t) is the cumulative injected volume of fluid at time t, b is the Gutenberg–Richter b-value of the observed seismicity (i.e. the slope of the cumulative magnitude distribution), and m is the magnitude above which all events are expected to be reliably recorded. Σ and b are estimated on the learning period, and we estimate the total volume that will be injected by the end of the forecast period. For the total injected volume, we assume that injection continues at the final rate, as this was the previously planned injection strategy. After shut-in, the decaying seismicity follows the equation of Langenbruch & Shapiro (2010) (we apply the original notation for consistency, eq. 2):

$$R_{0b}\left (\frac{t}{t_{0}}\right ) = \frac{R_{0a}}{\left (\frac{t}{t_{0}}\right )^p}$$ (2)

where R0b is the post-stimulation seismicity rate at time t, t0 is the length of the stimulation period before shut-in, R0a is the average seismicity rate during stimulation, and p controls the seismicity rate decay. The magnitude distribution follows the Gutenberg–Richter relation (Gutenberg & Richter 1944). The slope is the same b-value estimated on the learning period for the number component, and the total number of events is also provided by the number component via eqs (1) and (2). The spatial distribution of seismicity (Fig. 2) is based on smoothing the seismicity recorded during the learning period with 3-D Gaussian kernels (Király-Proag et al. 2016). The kernels are calibrated every 6 hr. For this, 1000 kernels represented by one (for isotropic kernels) or three (for anisotropic kernels) smoothing widths (i.e. bandwidths) are examined, and the one that best forecasts the latter part of the learning period is chosen to compute the forecast probability density function (PDF). Forecasts fail consistency tests if an earthquake occurs where a rate of zero was forecast (i.e. an earthquake occurs where the model deemed it impossible). Thus, we also calibrate a low background rate (i.e. a 'trust level') of the model jointly with the aforementioned three bandwidths. In other words, we trust the model to a certain extent (usually > 90 per cent) and distribute the rest of the PDF equally among all voxels.

Figure 2. Construction of smoothed seismicity models with different temporal weightings. (a) Explanation of the general concept of smoothed seismicity in 3-D. Note that the illustrations are in 2-D; however, all smoothed seismicity models are 3-D. (b) Left: past seismicity over the learning period without temporal weights. The inset in the upper right-hand corner shows the temporal weighting. Centre: Gaussian kernels in the x and z directions. Gaussian kernels are centred on past seismic events one by one. Summing and normalizing the contributions of all earthquakes observed during the learning period gives a 3-D probability density function (PDF). Right: vertical cross-section of the PDF. (c) Same as panel (b) with Heaviside temporal weights. (d) Same as panel (b) with exponential temporal weights.

To model the migration of seismic activity from the vicinity of the well towards the edges of the reservoir after the pump has been stopped, we apply two types of temporal weights that emphasize more recent earthquakes: Heaviside temporal weights (i.e. a moving window) and exponential temporal weights. In total, three types of spatial component were examined: SaSS S, a smoothed seismicity model without temporal weights; SaSS H, a smoothed seismicity model with one-day Heaviside weights (Fig. 2c); and SaSS E, a smoothed seismicity model with exponential weights (Fig. 2d). Table 1 summarizes all parameters of the SaSS model. Further details of the SaSS model can be found in Király-Proag et al. (2016).

Table 1. Parameters of SaSS. LP stands for learning period.
| Component | Parameter | Description | How determined |
| --- | --- | --- | --- |
| Number (during stimulation) | m | magnitude of completeness | fixed |
| | Σ | seismogenic index | estimated on LP |
| | b | b-value | estimated on LP |
| | Qc(t) | total injected volume | based on planned injection |
| Number (after stimulation) | t0 | length of the stimulation period | estimated on LP |
| | R0a | co-stimulation seismicity rate | estimated on LP |
| | p | decay parameter | generic value (p = 2) or estimated on LP |
| Space | LWtemp | length of temporal weighting period | fixed |
| | σ1, σ2, σ3 | spatial kernel bandwidths | estimated on LP |
| | T | trust level | estimated on LP |

3.2 Hydraulics and Seismicity (HySei) model

Gischig & Wiemer (2013) introduced the HySei model (Fig. 3), in which seismicity is triggered by pressure diffusion with irreversible permeability enhancement. As the name suggests, the model consists of two main parts: (1) hydraulic and (2) seismicity modelling. The first goal is to characterize the pressure evolution in the reservoir. To do this, we invert for the best hydraulic parameters by matching the well-head pressure generated by a 1-D radial flow model to the observed well-head pressure. To evaluate the reservoir's initial permeability, we solve the diffusion equation (eq. 3) with constant permeability (κ = κ0) using the pre-stimulation test injection. To invert for the stimulation parameters, we use hydraulic data from the stimulation and solve the diffusion equation (eq.
3) with irreversibly changing permeability (eq. 4) due to increasing pressure (eq. 5):

$$\rho S \frac{\partial p}{\partial t} = \nabla \left ( \frac{\kappa \rho }{\nu } \nabla p \right ) + q_{m}$$ (3)

$$\kappa = \kappa _{0} (u + 1)$$ (4)

$$\frac{\partial u}{\partial t} = C_{u} H_{pt}\left ( \frac{\partial p}{\partial t} \right ) H_{u} (u_{t}-u)H_{p}(p-p_{t})$$ (5)

where ρ denotes the fluid density, S is the storage coefficient, κ is the permeability, ν is the fluid viscosity, and qm is a mass source; κ0 is the initial permeability before the stimulation, u is the overall permeability enhancement of the reservoir, called the stimulation factor; Cu is a constant that scales the rate at which permeability changes, called the stimulation velocity; ut is the maximum stimulation factor, and pt is the threshold pressure. Hpt is a Heaviside function that is one if pressure increases and zero otherwise, and Hp and Hu are Heaviside functions for pressure and stimulation factor. These are not Heaviside functions in a strict sense, but are smoothed to avoid numerical instability. Permeability starts to increase once the pressure reaches pt; if the pressure rises any further, the permeability of the reservoir increases until u reaches ut.

Figure 3. Schematic plot of the HySei model. Black dots represent events simulated by HySei either with a constant (HySei C) or with a variable (HySei V) b-value. Red dots show the seismicity of the current learning period oriented in the direction of the minimum and maximum principal axes, az and ax, respectively. Solid black lines indicate the lengths of the minimum and maximum axes. Black ellipses indicate 95 per cent of the seismicity cloud. The red curve displays the empirical event distribution along the minimum principal axis (i.e. in the off-plane direction).

In the seismicity model, randomly placed potential nucleation points are triggered by the radially symmetric pressure evolution following the Mohr–Coulomb failure criterion. The minimum and maximum stresses (σ3 and σ1) of all seed points are drawn from a normal distribution around the average in-situ stress estimate with a standard deviation of 10 per cent. To determine the magnitudes of triggered events, we draw a random magnitude from a distribution defined by a certain b-value. There is no geometric constraint on earthquake size. We assign b-values at the seed points in two different ways: (1) a spatially homogeneous b-value is assigned to each seed (HySei C, for constant b-value); (2) spatially heterogeneous b-values are attributed to the seed points (HySei V, for variable b-value). In the former case, we only calibrate the homogeneous b-value on the learning period. In the latter case, we define the differential stress (σ1 − σ3) at each seed point, drawn from a normal distribution, and a b-value that follows a linear relationship between differential stress and b-value: the bmax and bmin parameters are the b-values at the minimum and maximum values of differential stress, respectively. Additional free parameters are the scaling factor Fs (the ratio between the number of synthetic and observed events), the stress drop coefficient dτ (the change of stress conditions after a seed has been triggered), and a criticality threshold μc, which accounts for the fact that seed points cannot be too close to the failure limit (proximity to failure).
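The seed-triggering logic can be illustrated with a strongly simplified 2-D Mohr–Coulomb sketch: stresses are drawn around mean in-situ values with a 10 per cent standard deviation, and a seed fails once the pore pressure exceeds its critical pressure. All numerical values, the optimally-oriented-plane simplification and the helper names are assumptions, not the calibrated HySei implementation:

```python
import random

# Illustrative Mohr-Coulomb triggering at random seed points. Stresses are
# drawn around mean in-situ estimates with 10 per cent standard deviation,
# as in the text; all numbers and the simplified failure geometry are made up.

random.seed(1)
MU, COHESION = 0.6, 5.0          # friction coefficient and cohesion [MPa] (assumed)
S1_MEAN, S3_MEAN = 160.0, 90.0   # mean principal stresses [MPa] (assumed)

def critical_pressure(s1, s3, mu=MU, c=COHESION):
    """Pore pressure at which an optimally oriented plane reaches failure
    (simplified 2-D Mohr-Coulomb: tangency of the shifted Mohr circle)."""
    tau_max = 0.5 * (s1 - s3)        # radius of the Mohr circle
    sn_centre = 0.5 * (s1 + s3)      # centre of the Mohr circle
    # failure when tau_max = c + mu * (sn_centre - p), solved for p:
    return sn_centre - (tau_max - c) / mu

seeds = [(random.gauss(S1_MEAN, 0.1 * S1_MEAN),
          random.gauss(S3_MEAN, 0.1 * S3_MEAN)) for _ in range(1000)]

p_now = 60.0  # modelled pore pressure perturbation at the seeds [MPa] (made up)
triggered = [s for s in seeds if p_now >= critical_pressure(*s)]
print(len(triggered), "of", len(seeds), "seeds triggered")
```

Because the stresses are randomized, only a fraction of seeds reach failure at a given pressure, which is what lets the calibrated parameters (seed density, criticality threshold, stress drop) shape the synthetic rate.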
We calculate the pressure evolution in the reservoir for the forecast period in question using the injection rates of the actual stimulation conducted on site. We then construct the forecast PDF by computing the spatial distribution of 1000 catalogues generated for the same forecast period. Events simulated by HySei are distributed on the main fault plane. To construct a PDF in 3-D, we add an off-fault component to the simulated events (Fig. 3). First, we rotate the observed seismicity of the current learning period to the orientation of the principal components; then we draw random off-fault locations from the empirical distribution of observed events along the smallest principal axis. Similarly to the SaSS model variants, we calibrate trust levels to assign a seismicity rate to those voxels where the model would otherwise never expect any event. The parameters of HySei are listed in Table 2.

Table 2. Parameters of HySei. LP stands for learning period.

| Phase | Parameter | Description | How determined |
| --- | --- | --- | --- |
| Pre-stimulation | h | open-hole section | fixed (from geotechnical data) |
| | ν | water viscosity | fixed |
| | ρ | water density | fixed |
| | κ0 | permeability | estimated on pre-stimulation |
| | reff | effective borehole radius | estimated on pre-stimulation |
| | S | storage coefficient | estimated on pre-stimulation |
| Stimulation (hydraulic) | Cu | stimulation velocity | estimated on stimulation LP |
| | pt | threshold pressure | estimated on stimulation LP |
| | ut | maximum stimulation factor | estimated on stimulation LP |
| | Δpt, Δut | transition zones for Hp and Hu | estimated on stimulation LP |
| Stimulation (seismic) | σ1, σ3 | maximum and minimum stress magnitudes | randomly drawn from geotechnical data |
| | c | cohesion | fixed |
| | ϕ | friction angle | fixed |
| | N | seed density | fixed |
| | Fs | scale factor | estimated on stimulation LP |
| | μc | criticality threshold | estimated on stimulation LP |
| | dτ | stress drop coefficient | estimated on stimulation LP |
| | bcons | constant b-value for HySei C | estimated on stimulation LP |
| | bmin, bmax | minimum and maximum b-values for HySei V | estimated on stimulation LP |
| | T | trust level | estimated on LP |

4 METHOD

4.1 Testing of model forecast performance

As stated in the introduction, the testing of model forecast performance is based on statistical tests established by the CSEP community (Rhoades et al. 2011). In general, these tests address two questions: (1) whether a forecast is consistent with observations, and (2) which models perform better than others. First, we apply three consistency tests, the Number-, Magnitude- and Space-tests, assuming Poissonian earthquake distributions. Note that throughout this study we refer to these entities as model or forecast components. In the Number-test, we examine whether the overall number of forecast earthquakes is consistent with the number of observed events. In the Magnitude-test, we check whether the forecast magnitudes match the observed distribution.
In the Space-test, we use a 4 × 4 × 4 km testing grid divided into 200 × 200 × 200 m voxels to calculate the Poisson log-likelihood of the observation in each voxel (Zechar et al. 2010; Rhoades et al. 2011). Summing the individual log-likelihood values yields the joint log-likelihood (LL) of a specific experiment. To ascertain whether the forecast is accurate enough, we simulate 1000 synthetic catalogues consistent with the given forecast. If the LL for the current observation is higher than the 5th percentile of the simulated catalogues, the forecast passes the Space-test. Although it is more intuitive to understand the results of the Number-test with absolute numbers, to be consistent with the other tests we express all results in terms of log-likelihoods, which are generally negative values. The closer the results are to zero, the better the forecast. To compare models, we here use the joint LL values of the entire model (all components together) and of its separate model components. We test all forecast skills over 6, 24, 48 and 72 hr forecast periods. Further details of the testing procedure are described in Király-Proag et al. (2016).

4.2 Ensemble models

Forecasting tests allow us to check model consistency and to rank different models, so that we know which performs best over any given forecast period. But how should we use these results for operational purposes? A trivial and straightforward solution would be to choose the best model and rely exclusively on its forecasts, but there is no guarantee that its forecasts would continue to be the most accurate in the future (Oreskes et al. 1994; Marzocchi et al. 2012). While the best model prevails because it captures some features of the seismicity very well (e.g. the spatial distribution), other models may better represent other features (e.g. seismicity rates). Our preferred solution, then, is to keep all models and merge their results using a weighted average model, that is, an ensemble model.
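The joint log-likelihood scoring described above can be sketched per voxel; a toy example with invented rates and counts (in practice the calibrated trust level prevents zero-rate voxels, which would otherwise yield a log-likelihood of minus infinity):

```python
import math

# Joint Poisson log-likelihood of an observed catalogue given a gridded
# forecast: each voxel contributes log Poisson(n_obs | rate), and the sum
# over voxels is the joint LL. Rates and counts below are toy numbers,
# not a real 4 x 4 x 4 km grid.

def poisson_loglik(n_obs, rate):
    # log of the Poisson pmf: -rate + n*log(rate) - log(n!)
    return -rate + n_obs * math.log(rate) - math.lgamma(n_obs + 1)

forecast_rates = [0.5, 1.2, 0.05, 2.0]   # expected events per voxel (made up)
observed_counts = [1, 1, 0, 3]           # observed events per voxel (made up)

LL = sum(poisson_loglik(n, r) for r, n in zip(forecast_rates, observed_counts))
print(round(LL, 4))  # negative; closer to zero means a better forecast
```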
Assigning weights inevitably involves some subjectivity, but we consider an approach based on forecast performance superior to giving all models equal weight. Weighting takes account of two main issues: forecast correlation and cumulative forecast performance (Marzocchi et al. 2012). Correlation weights are a function of the correlation between models and should fulfil three criteria: (1) the weight for highly correlated models should be lower than for less-correlated models; (2) if a new model is identical to an existing model, correlation weights should be shared between the duplicate models; (3) when a new model is added to an existing set of models, none of the model weights should increase. According to Marzocchi et al. (2012), no formal weighting scheme fully satisfies all criteria; we therefore apply their scheme, which approximately achieves these goals, and refer the reader to their article for details. Here, we consider a weighting scheme called Bayesian Model Averaging (BMA, Hoeting et al. 1999) in which the weight of the jth model (Wj) depends on the Poisson probability of each model's previous forecasts (Pj) and the corresponding correlation weights, $$\sigma _{j}^{\text{corr}}$$ (eq. 6). All weights are normalized so the sum across all models equals one:

$$W_{j} = \frac{\sigma _{j}^{\text{corr}} P_{j}}{\sum \limits _{k = 1}^{J} \sigma _{k}^{\text{corr}} P_{k}}.$$ (6)

In this scheme, we consider each model's entire performance when calculating its weights, although this can be a drawback if model components perform significantly differently regarding rate or spatial distribution. For instance, our previous results showed that the rate forecast of both HySei variants is superior to that of the SaSS variants, and that smoothed seismicity models perform better than a radially symmetric seismic cloud (Király-Proag et al. 2016).
Different performances of model components can counterbalance each other and lower the weights of models that have some good components but, in combination, produce a mediocre performance. Thus, we treat model components separately, building a multicomponent ensemble model with different weights for the number and spatial components. The number component of the ensemble forecast ($$W^{\text{Number}}_{j}$$) is constructed using weights directly proportional to the probability of each model's expected rate forecast:

$$W_{j}^{\text{Number}} = \frac{P^{\text{Number}}_{j}}{\sum \limits _{k = 1}^{J}P^{\text{Number}}_{k}}.$$ (7)

To calculate spatial weights ($$W_{j}^{\text{Space}}$$) for the jth model, we account for previous performance ($$P_{j}^{\text{Space}}$$) and correlation weights ($$\sigma _{j}^{\text{corr}}$$; their calculation is detailed in Supporting Information eqs S1–S5), as formulated in eq. (8):

$$W_{j}^{\text{Space}} = \frac{\sigma _{j}^{\text{corr}} P_{j}^{\text{Space}}}{\sum \limits _{k = 1}^{J} \sigma _{k}^{\text{corr}} P_{k}^{\text{Space}}}.$$ (8)

Thus, spatial weights indicate how the spatial forecasts of the different models are combined. The ensemble number of events (i.e. the event rate), computed by combining the number forecasts of the individual models with the number weights, is then distributed in space according to the ensemble spatial distribution. All models are recalibrated and all weights are updated every 6 hr throughout the experiment. To avoid overlapping forecast periods, weights are the cumulative sums of the most recently calibrated models' performances over the first 6 hr long forecast time window (Supporting Information Fig. S1). In other words, the weight of a forecast model is computed based on the model's correlation weight and its cumulative LL by the end of the current learning period. This implies that weights at the end of the very first learning period are equal to the correlation weights.
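In code, eqs (6)–(8) reduce to simple normalized products. A minimal sketch, assuming the correlation weights and per-model performance probabilities have already been computed elsewhere:

```python
import numpy as np

def bma_weights(corr_w, perf):
    """One-component BMA weights (eq. 6): W_j proportional to
    sigma_j^corr * P_j, normalized to sum to one."""
    w = np.asarray(corr_w, dtype=float) * np.asarray(perf, dtype=float)
    return w / w.sum()

def two_component_weights(corr_w, perf_number, perf_space):
    """Two-component weights: number weights (eq. 7) use rate performance
    only; spatial weights (eq. 8) combine correlation weights with
    spatial performance."""
    p_num = np.asarray(perf_number, dtype=float)
    w_number = p_num / p_num.sum()
    w_space = np.asarray(corr_w, dtype=float) * np.asarray(perf_space, dtype=float)
    w_space = w_space / w_space.sum()
    return w_number, w_space
```

The ensemble rate is then the number-weighted sum of the individual rate forecasts, distributed in space with the spatial weights.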
Weights corresponding to the learning period ending at day 1.25 are computed based on the correlation weights and the forecast performance of the individual models between days 1 and 1.25. Weights at the end of day 1.5 are also computed based on the correlation weights and the forecast performance of the individual models between days 1 and 1.25 and, with recalibrated parameters, between days 1.25 and 1.5, and so on.

5 RESULTS

In this section, we compare the forecast performance of individual models, one-component (1C) ensemble models (i.e. weights based on overall model performance) and two-component (2C) ensemble models (i.e. weights based separately on number and spatial performance). In the following figures, blue colours indicate SaSS model variants (light, intermediate and dark shades of blue stand for SaSS S, SaSS H and SaSS E, respectively); brown and orange denote HySei C and HySei V, respectively. Vertical grey dashed lines indicate when the shut-in occurred. Basel 2006 is always on the left-hand side of each figure and Soultz-sous-Forêts 2004 on the right. Fig. 4 summarizes the weights calculated based on the previous forecast performance of the most recently calibrated models. Correlation weights (Figs 4a and e) vary within a small range centred around 0.2. Model variants that are inherently more similar, especially in their spatial component, receive the same weight, for example the HySei variants. Initially, the same trend can be observed for the SaSS variants, but subsequently their spatial components differ more widely, owing to the different temporal weighting of the smoothed seismicity models, and their correlation weights then also diverge more than earlier on.

Figure 4. Weights of SaSS and HySei model variants. Vertical grey dashed line indicates the shut-in moment. (a,e) Correlation weights for the data sets of Basel 2006 and Soultz-sous-Forêts 2004, respectively. (b,f) One-component ensemble weights. (c,g) Two-component number weights.
(d,h) Two-component spatial weights.

One-component ensemble weights (Figs 4b and f) strongly emphasize the best model (the SaSS H and SaSS E variants), suppressing the other models, especially during the post-shut-in period. In the case of the two-component ensemble weights (Figs 4c, d, g and h), the different performances of the model components found in our previous paper (Király-Proag et al. 2016) are clearly reflected in both data sets. In terms of event numbers, SaSS first performs better than HySei during injection, but HySei performs better after shut-in. Thus, the number weights of SaSS are higher than those of HySei during injection, whereas HySei's number weights are higher after shut-in. In terms of spatial forecasts, SaSS performs better throughout the entire experiment; thus, the spatial weights of the SaSS variants are higher than those of any other model variant. Fig. 5 presents the cumulative LL of the individual models using the same colours as before, with the one- and two-component ensemble models for both data sets represented by yellow and red triangles, respectively. There are several clear similarities between the two graphs: in the early stages, all models perform similarly, but by the end of the experiment the SaSS variants outperform the HySei variants. In both data sets, the one-component ensemble models often incorporate only the best performing model. As a consequence, the final performance of the one-component ensemble model is equal to that of the best individual model. The two-component ensemble models improve the forecast performance during the late shut-in period. Figure 5.
Evolution of cumulative LL of the model variants for both data sets. Colours and symbols are explained in the legend. BMA Ens 1C and BMA Ens 2C stand for the one-component and two-component ensemble models, respectively. Vertical grey dashed line indicates the shut-in moment.

Whereas Fig. 5 shows the cumulative LL corresponding to all earthquakes in previous forecast periods, Fig. 6 represents the normalized cumulative LL of events that occurred during the specific forecast periods for both data sets, namely during 6 hr forecast periods (Figs 6a and e), 24 hr periods (Figs 6b and f), 48 hr periods (Figs 6c and g), and 72 hr periods (Figs 6d and h). Again, the models initially perform very similarly, but LL/Eqks declines for the SaSS variants and the one-component ensemble models. The HySei variants perform more stably for Basel 2006; they are also stable for Soultz-sous-Forêts 2004, apart from a clear fall at shut-in. The two-component ensemble model's performance also declines towards the end of the experiment, but is nonetheless superior to any other model. Examining the number and spatial components separately, it becomes clear that the one-component ensemble models always highlight the most strongly performing SaSS variant, while the two-component ensemble models combine the best components: usually a HySei variant as the number component and a SaSS variant as the spatial component (Supporting Information Figs S2 and S3).

Figure 6. Evolution of LL/Eqks. The colour coding (explained in the legend) is the same as in Fig. 5.
Vertical grey dashed line indicates when the shut-in occurred. (a–d) LL/Eqks in Basel 2006 for 6, 24, 48 and 72 hr forecast periods, respectively. (e–h) LL/Eqks in Soultz-sous-Forêts 2004 for the same forecast periods.

Here we demonstrate how testing model forecasts against observations not only helps to assess model performance, but also offers a way of building ensemble models that are superior to individual models. Below, we explore the sensitivity of the results to the choice of the learning period and the weighting approach. In a first sensitivity analysis, we examine whether forecasting performance systematically differed over various segments of the learning periods. We consider periods ranging in duration from a one-day moving window (representing the most recent observations over the past 24 hr) to the full duration of the available data, and conclude that there is no significant difference in forecast skills. We then examine another weighting scheme, Score Model Averaging (SMA, see Marzocchi et al. 2012), to construct ensemble models. SMA weights are inversely proportional to the absolute value of the LL ($$W_{\text{SMA}} \propto \frac{1}{|\text{LL}|}$$). This scheme generally distributes weights more equally among all models than the BMA weighting scheme does. These weights do not highlight major differences between the models in either data set (Supporting Information Fig. S4). Comparing the BMA and SMA ensemble models with the individual models in terms of cumulative LL indicates that SMA one-component ensembles outperform their BMA counterparts.
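The SMA scheme can be written in a couple of lines. A minimal sketch, assuming each model's cumulative LL is available (the input values below are illustrative only):

```python
import numpy as np

def sma_weights(cum_ll):
    """Score Model Averaging: weights inversely proportional to |LL|,
    normalized to sum to one. cum_ll: cumulative log-likelihoods
    (negative values; closer to zero is better)."""
    w = 1.0 / np.abs(np.asarray(cum_ll, dtype=float))
    return w / w.sum()
```

Because 1/|LL| varies much less than the Poisson probability used in BMA, the resulting weights are spread more evenly across models, as noted above.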
Two-component SMA weights yield slightly better results than SMA one-component ensembles, yet perform worse than BMA two-component ensembles (Supporting Information Fig. S5). The same pattern is observed in terms of LL/Eqks for 6, 24, 48 and 72 hr forecasts (Supporting Information Fig. S6). Even though the SMA weighting scheme does not always choose the best model based on previous forecast skills, it outperformed the BMA one-component ensemble model. This shows that forecasts can hold surprises, regardless of the models' past performance. In other words, even if we build a model from the currently best performing model components, there is no guarantee that this model will give the best forecast in the future.

6 TOWARDS AN ADAPTIVE TRAFFIC LIGHT SYSTEM

6.1 Near-real-time hazard forecasting

One goal of forecast modelling is to build an on-site, real-time hazard estimation tool (i.e. an ATLS; Király-Proag et al. 2016; Grigoli et al. 2017) to decide whether injection strategies can continue as planned or have to be adjusted due to high hazard levels. An operator's decision should be assisted by a hazard management framework to avoid arbitrary choices (i.e. ones led by gut feelings or personal interests), and should also be based on clear, predefined guidelines agreed upon by all stakeholders (project managers, authorities, insurers, etc.) prior to a project. Thus, part of such a hazard forecasting tool would be a suite of alternative injection strategies, defined by operators together with other stakeholders, in addition to the planned injection. Further, the stakeholders have to agree upon hazard or risk thresholds that trigger the change from the planned to an alternative injection strategy. Such hazard thresholds may be based on expert judgement. At any time, such an ATLS would provide hazard estimates (e.g. the probability of an event of M ≥ 3) for each considered injection scenario.
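Under the Poisson assumption used throughout this study, the occurrence probability of at least one M ≥ 3 event follows directly from the forecast event rate. A minimal sketch; the scenario rates below are hypothetical placeholders, not values from our experiments:

```python
import math

def occurrence_probability(rate_m3):
    """Probability of at least one M >= 3 event in the forecast period,
    given the forecast number of such events N (Poisson assumption):
    P = 1 - exp(-N)."""
    return 1.0 - math.exp(-rate_m3)

# Hypothetical 20 d forecast rates of M >= 3 events for four scenarios
scenario_rates = {"A": 0.15, "B": 0.12, "C": 0.04, "D": 0.02}
probs = {s: occurrence_probability(r) for s, r in scenario_rates.items()}
```

An ATLS would recompute such probabilities for every scenario each time the models are recalibrated.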
If the planned injection scenario indicates that the threshold will be exceeded, then the alternative scenarios are considered. In the worst case, that is, when even the scenario in which injection is stopped immediately indicates too high a hazard, immediate shut-in follows. Here, we present a prototype of an ATLS based on our models and show how it could give practical assistance to on-site operators. In our example, we calibrate models at six moments of the on-going injection (learning periods of 1, 2, 3, 4, 5 and 5.5 d) and give 20 d forecasts in terms of the occurrence probability of at least one M ≥ 3 event. Probabilities are only calculated for the forecast periods, without taking into account any previous events (i.e. those occurring during the corresponding learning period). Before the injection starts, we have to decide which injection strategies are of interest and what level of seismic hazard is acceptable. We decide to run four scenarios:

Scenario A: the original stepwise increasing stimulation with 12 d of constant injection at the end (to simulate an 'original plan' strategy), Fig. 7(a).

Scenario B: the actual stimulation design carried out at Basel (11 000 m3 injected volume), Fig. 8(a).

Scenario C: constant injection for one more day after the current learning period, then cessation of injection, Fig. 9(a). This scenario would come into play if the above scenarios indicate too high a hazard, but immediate shut-in (i.e. Scenario D) still seems to be safe.

Scenario D: immediate shut-in after the current learning period, Fig. 10(a).

Figure 7. Injection scenario A. (a) Injection rates of the scenario. Yellow, brown, pink, purple, blue and green rectangles show the 20 d forecast periods corresponding to learning periods ending after 1, 2, 3, 4, 5 and 5.5 d, respectively. (b) Occurrence probability of M ≥ 3 calculated between days 1 and 21. Symbols are explained in the legend.
(c–g) Same as (b) between days 2 and 22, 3 and 23, 4 and 24, 5 and 25, and 5.5 and 25.5, respectively.

Figure 8. Injection scenario B. Structure of the figure is the same as in Fig. 7.

Figure 9. Injection scenario C. Structure of the figure is the same as in Figs 7 and 8. At the end of the current learning period, injection continues for 1 d at a constant rate and then ceases; injection curves are indicated with the colour of the corresponding learning periods.

Figure 10. Injection scenario D. Structure of the figure is the same as in Figs 7–9. Injections always end at the end of the current learning periods; injection curves are indicated with the colour of the corresponding learning periods.
Scenarios A and D are the two end-members: A is the most optimistic scenario of all, while D is the safest in terms of seismicity, indicating the limits of the current injection strategy. Scenarios B and C are two injection strategies providing alternatives in case the injection has to be changed. We decide that a 5 per cent occurrence probability of an M ≥ 3 event is the hazard threshold above which immediate action should be taken. The threshold is arbitrary and serves only for demonstration purposes in the following. Panels (b)–(g) in Figs 7–10 would appear on operators' screens as the stimulation advances. After one day of injection (panel b in Figs 7–10), scenarios A, B and C forecast occurrence probabilities higher than 5 per cent, meaning that future seismicity would be too hazardous; however, for immediate shut-in, the most conservative scenario, the occurrence probabilities are low. An operator would decide that there is no need for immediate shut-in, but would anticipate that the hazard may become too high within one day. After 2 d (panel c in Figs 7–10), occurrence probabilities have decreased for all scenarios, and even immediate shut-in forecasts an occurrence probability of less than 5 per cent. The decrease in hazard forecasts for Scenarios A and B arises because, unlike in Scenarios C and D, the total volume is predefined, so the volume still to be injected decreases. Note, however, that hazard estimates may also change because the calibration of the forecast models improves with the additional data available after 2 d. Nevertheless, the results of Scenarios C and D indicate that the injection can continue. Scenario C shows that keeping the same injection rate for a day and then ceasing injection would result in an occurrence probability of less than 5 per cent, so an increased injection rate could even be encouraged. After 3 d (panel d in Figs 7–10), the forecasts of Scenarios A and B tend to decrease relative to previous forecasts.
Again, decreasing occurrence probabilities with increasing learning periods could be because learning periods of less than 2 d tend to overestimate seismicity, whereas more available data improve the rate estimates. Occurrence probabilities increase for Scenarios C and D: in the case of C, most models forecast an occurrence probability of about 10 per cent, while immediate shut-in still results in probabilities below 5 per cent. The operator would continue the injection but keep the injection rate moderate. After 4 d of injection (panel e in Figs 7–10), all scenarios urge immediate action: the most conservative scenario (D) reaches the 5 per cent threshold, and all other scenarios indicate significantly higher occurrence probabilities. According to our ATLS, this is the moment when stimulation should have been immediately ceased, based on 20 d seismicity forecasts and a 5 per cent occurrence probability of M ≥ 3 events as the hazard threshold. Later forecasts (learning periods of 5 and 5.5 d, panels f and g in Figs 7–10) confirm that the occurrence of at least one M ≥ 3 earthquake is increasingly probable even with the cautious (C) and conservative (D) scenarios. Actual observations showed that one such event, with Mw = 3.1, occurred in Basel at 16:48 on 8 December 2006. It has to be acknowledged that seismicity around the time of shut-in tends to be underpredicted by our models. We stress that such forecast deficiencies might be alleviated if models that more accurately represent the physical processes are included in the ensemble forecasts. Nevertheless, model calibration and forecasting will never be perfectly accurate. Hence, uncertainties in the forecasts themselves have to be anticipated. Choosing low hazard thresholds can partially compensate for potential hazard underpredictions.
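The decision logic walked through above can be condensed into a simple rule. The function below is purely illustrative of the demonstration logic; the threshold, scenario labels and return strings are our own sketch, not a prescribed operational standard:

```python
def atls_decision(probs, threshold=0.05):
    """Illustrative ATLS decision rule. probs maps scenario labels to the
    occurrence probability of at least one M >= 3 event over the forecast
    horizon: 'A' planned injection, 'B' alternative design, 'C' one more
    day of injection then stop, 'D' immediate shut-in."""
    if probs["A"] < threshold and probs["B"] < threshold:
        return "continue as planned"
    if probs["C"] < threshold:
        # planned injection is too hazardous, but a soft landing is still safe
        return "switch to Scenario C (inject one more day, then stop)"
    # only immediate shut-in (or not even that) stays below the threshold
    return "immediate shut-in"
```

Applied to the walkthrough above, such a rule would have returned "immediate shut-in" once Scenario C crossed the threshold around day 4.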
This demonstration of a prototype ATLS shows that such a system would have warned operators after 3 d of injection to significantly alter the original injection plan and shut in earlier to avoid the occurrence of any M ≥ 3 events. We argue that at Basel, there would have been a good chance of avoiding felt earthquakes with such an ATLS.

6.2 Long-term forecasts

While the performance of short-term forecasts (i.e. 6 hr to 20 d) has been thoroughly examined, we also investigate our models' forecast skill over several weeks to a year. Seismicity in the Basel reservoir, in particular, has been observed to decline very slowly, with events still occurring years after the stimulation in 2006 (e.g. Deichmann et al. 2014; Wiemer et al. 2017). Accordingly, for both model groups we consider a 1 yr forecast horizon. Fig. 11 presents weekly forecast rates for all models with M ≥ 0.9 (Fig. 11a) and with M ≥ 3 (Fig. 11b). In the former case, HySei C and V (brown and orange dots) forecast low seismicity rates from day 10 and no seismicity after the second half of January 2007. Thus, these models severely underpredict the data (open black squares) observed throughout the period under scrutiny. From February 2007 on, only about 37 per cent of the weekly forecasts (the weeks when no earthquake occurred) are well forecast by the HySei variants. SaSS (blue squares) produces higher weekly rates, providing conservative overpredictions or good forecasts until early March 2007, after which the observed fluctuations in seismicity are not captured by the smoothly decaying models. Between February and December 2007, SaSS forecasts the seismicity rate well in more than 70 per cent of cases. The forecasts of the ensemble models (yellow and red triangles) fall between those of the SaSS and HySei variants: 60 per cent of the weekly forecasts are within the ensemble forecast uncertainties for the same period. Looking at Fig.
11(b), all models, including the ensemble models, forecast low or close-to-zero rates for M ≥ 3 events over the entire 1 yr period, assigning a low probability to the Mw 3 event that occurred in February 2007. Whereas one would usually expect better rate forecasts from numerical models based on pore pressure diffusion than from those based on simple seismicity decay, it should be borne in mind that whenever the pressure drops below a certain threshold in the simulated reservoir, seeds are no longer triggered. Heterogeneities in the pressure evolution are not well represented by a radially symmetric model whose inversion is based solely on well-head pressure, and this model also raises the question of whether pore pressure is the only phenomenon responsible for triggering events after shut-in, as seismicity migrates further from the well. As Segall & Lu (2015) showed, poroelastic stresses may dominate at greater distances, and poroelastic coupling-induced stresses break the symmetry of diffusion even if faults are uniformly distributed.

Figure 11. One-year weekly forecast rates of all models in Basel. Symbols and colours are indicated in the legend. Note the overlap between the shaded areas representing 95 per cent confidence intervals. Models are calibrated on a 10 d learning period. (a) Number of events per week with M ≥ 0.9. Inset shows a zoom between December 2006 and February 2007. Only SaSS, HySei C, the two-component ensemble model and the observed seismicity rate are plotted for clarity. (b) Number of events per week with M ≥ 3.
6.3 Limitations

Although good results have been achieved, operational induced seismicity forecasting is still in its infancy. One of the main limitations of operational forecast testing, and hence of the applicability of an ATLS, is the near-real-time availability of accurate seismic catalogues. Usually, only rough estimates of locations and magnitudes are available shortly after the origin time of an earthquake, and uncertainties are larger than in catalogues created later. In our analysis, we purposefully decided to use, for both projects, earlier catalogues that had not been thoroughly analysed in post-processing. This limitation, however, does not inhibit testing model forecasts in a pseudo-prospective way to validate the governing physical phenomena of the reservoir creation process. We also acknowledge that our results are limited to two model groups and to two geothermal data sets in similar geological environments. Although several models have been developed, many of them are not applicable for systematic subsequent forecast testing. Model calibration over many (70+) learning periods is a major effort, especially for numerical models. Nonetheless, we plan to expand the test bench with further models, such as the TOUGH2-SEED model by Nespoli et al. (2015), which incorporates 3-D hydraulics and stochastic seismicity modelling; the model based on Coulomb stress transfer developed by Catalli et al. (2016); and the ETAS model (Bachmann et al. 2011) with additional 3-D spatial kernels. A major obstacle to further development of models and the ATLS is the limited availability of reliable data sets. Although seismic catalogues have recently become more widely available, it is still difficult to get access to hydraulic data due to legal and proprietary issues.
Since physics-based and hybrid models require injection rate and/or pressure data, limited access to such data hinders their testing.

7 CONCLUSIONS

In this study, we used the Induced Seismicity Test Bench (Király-Proag et al. 2016) on five model variants calibrated and systematically tested against data from two EGS reservoirs. We paid special attention to the forecast skills of previously applied and newly introduced ensemble models and to the practical utility of induced seismicity forecasting. Our main findings are summarized as follows:

Cumulative LL and LL/Eqks for different forecast periods demonstrated that a one-component BMA ensemble model performs as well as the best individual model, meaning that this weighting acts as a model selector.

We introduced multicomponent ensemble models, that is, separate weighting of the number and spatial forecast components. Cumulative LL and LL/Eqks for different forecast periods also showed that two-component BMA ensemble models improve forecasting performance, especially during the post-stimulation period.

We found that parameter calibration over different segments of the learning periods produces insignificant random variation in the models' forecast skills.

Numerical models based solely on pore pressure diffusion may not forecast well over the long term; for example, once the modelled pressure has decreased to a sufficiently low value, no further earthquake is triggered in these models, even when observations show the contrary.

Scenario analysis of different injection strategies within the frame of an Adaptive Traffic Light System can be of practical assistance to on-site decision-making by operators. In the case of Basel, an ATLS might have initiated shut-in after 3–4 d due to excessively high induced seismic hazard.
In the future, it will be important to incorporate more models that differ significantly from previously tested ones and to involve more data sets in the Induced Seismicity Test Bench, as these would provide a broader picture of the physical processes in engineered reservoirs and of the best modelling practices. Moreover, similarly to the CSEP testing centres, real prospective testing on operational projects or in appropriately scaled laboratory experiments would take us a step further. We believe that not only the induced seismicity community, but also researchers in other fields (e.g. statistical seismology) would benefit from such experiments.

ACKNOWLEDGEMENTS

We acknowledge the financial support of the GEOTHERM, GEOTHERM-2 and GEISERS projects for developing fundamental ideas concerning the Adaptive Traffic Light System and providing the stimulation data of Soultz-sous-Forêts 2004. The authors wish to thank EEIG Heat Mining for permission to publish those data. Acknowledgement is also due to the numerous bodies that have supported the Soultz-sous-Forêts project over the years, including the European Union, ADEME of France, BMU of Germany, SER and SFOE of Switzerland, and the EEIG 'Exploitation Minière de la Chaleur' consortium. Access to the data is provided by contacting the authors. We are also grateful to Eduard Kissling for discussing an earlier version of this manuscript. E.K.-P. thanks the GEOTHERM-2 and DESTRESS projects for funding her PhD. Some of this work was done at the Swiss Competence Center on Energy Research - Supply of Electricity (SCCER-SoE), with support from the Swiss Commission for Technology and Innovation (CTI).

REFERENCES

Bachmann C.E., Wiemer S., Woessner J., Hainzl S., 2011. Statistical analysis of the induced Basel 2006 earthquake sequence: introducing a probability-based monitoring approach for Enhanced Geothermal Systems, Geophys. J. Int., 186(2), 793–807.
Baisch S., Vörös R., Weidler R., Wyborn D., 2009. Investigation of fault mechanisms during geothermal reservoir stimulation experiments in the Cooper Basin, Australia, Bull. seism. Soc. Am., 99(1), 148–158.

Baisch S., Vörös R., Rothert E., Stang H., Jung R., Schellschmidt R., 2010. A numerical model for fluid injection induced seismicity at Soultz-sous-Forêts, Int. J. Rock Mech. Min. Sci., 47(3), 405–413.

Calò M., Dorbath C., Frogneux M., 2014. Injection tests at the EGS reservoir of Soultz-sous-Forêts. Seismic response of the GPK4 stimulations, Geothermics, 52, 50–58.

Catalli F., Rinaldi A.P., Gischig V., Nespoli M., Wiemer S., 2016. The importance of earthquake interactions for injection-induced seismicity: retrospective modeling of the Basel Enhanced Geothermal System, Geophys. Res. Lett., 43(10), 4992–4999.

Cesca S., Grigoli F., Heimann S., González Á., Buforn E., Maghsoudi S., Blanch E., Dahm T., 2014. The 2013 September–October seismic sequence offshore Spain: a case of seismicity triggered by gas injection?, Geophys. J. Int., 198(2), 941–953.

Cook N.G.W., 1976. Seismicity associated with mining, Eng. Geol., 10(2–4), 99–122.

Deichmann N., Kraft T., Evans K.F., 2014. Identification of faults activated during the stimulation of the Basel geothermal project from cluster analysis and focal mechanisms of the larger magnitude events, Geothermics, 52, 84–97.

Douglas J. et al., 2013. Predicting ground motion from induced earthquakes in geothermal areas, Bull. seism. Soc. Am., 103(3), 1875–1897.

Dyer B., 2005. Soultz GPK4 stimulation September 2004 to April 2005. Seismic monitoring report, Semore Seismic Report, Tech. rep.

Dyer B.C., Schanz U., Ladner F., Häring M.O., Spillmann T., 2008. Microseismic imaging of a geothermal reservoir stimulation, Leading Edge, 27(7), 856–869.

Dyer B.C., Schanz U., Spillmann T., Ladner F., Haering M.O., 2010. Application of microseismic multiplet analysis to the Basel geothermal reservoir stimulation events, Geophys. Prospect., 58(5), 791–807.

Eberhard D.A.J., Zechar J.D., Wiemer S., 2012. A prospective earthquake forecast experiment in the western Pacific, Geophys. J. Int., 190(3), 1579–1592.

Evans K.F., Zappone A., Kraft T., Deichmann N., Moia F., 2012. A survey of the induced seismic responses to fluid injection in geothermal and CO2 reservoirs in Europe, Geothermics, 41, 30–54.

Farahbod A.M., Kao H., Cassidy J.F., Walker D., 2015. How did hydraulic-fracturing operations in the Horn River Basin change seismicity patterns in northeastern British Columbia, Canada?, Leading Edge, 34, 658–663.

Genter A., Evans K., Cuenot N., Fritsch D., Sanjuan B., 2010. Contribution of the exploration of deep crystalline fractured reservoir of Soultz to the knowledge of enhanced geothermal systems (EGS), C. R. Geosci., 342(7–8), 502–516.

Genter A., Cuenot N., Goerke X., Melchert B., Sanjuan B., Scheiber J., 2012. Status of the Soultz geothermal project during exploitation between 2010 and 2012, in Thirty-Seventh Workshop on Geothermal Reservoir Engineering, pp. 1–12, Stanford, CA, USA.

Gérard A., Genter A., Kohl T., Lutz P., Rose P., Rummel F., 2006. The deep EGS (Enhanced Geothermal System) project at Soultz-sous-Forêts (Alsace, France), Geothermics, 35(5–6), 473–483.

Gerstenberger M.C., Rhoades D.A., 2010. New Zealand earthquake forecast testing centre, Pure appl. Geophys., 167(8–9), 877–892.

Gischig V., Wiemer S., Alcolea A., 2014. Balancing reservoir creation and seismic hazard in enhanced geothermal systems, Geophys. J. Int., 198, 1585–1598.

Gischig V.S., Wiemer S., 2013. A stochastic model for induced seismicity based on non-linear pressure diffusion and irreversible permeability enhancement, Geophys. J. Int., 194(2), 1229–1249.

Grigoli F. et al., 2017. Current challenges in monitoring, discrimination and management of induced seismicity related to underground industrial activities: a European perspective, Rev. Geophys., 55, 1–31.

Gupta H.K., 2002. A review of recent studies of triggered earthquakes by artificial water reservoirs with special emphasis on earthquakes in Koyna, India, Earth-Sci. Rev., 58(3–4), 279–310.

Gutenberg B., Richter C., 1944. Frequency of earthquakes in California, Bull. seism. Soc. Am., 34, 185–188.

Häring M.O., Schanz U., Ladner F., Dyer B.C., 2008. Characterisation of the Basel 1 enhanced geothermal system, Geothermics, 37(5), 469–495.

Hirschberg S., Wiemer S., Burgherr P., 2015. Energy from the Earth—Deep Geothermal as a Resource for the Future?, Die Deutsche Nationalbibliothek.

Hoeting J.A., Madigan D., Raftery A.E., Volinsky C.T., 1999. Bayesian model averaging: a tutorial, Stat. Sci., 14(4), 382–417.

Keranen K.M., Savage H.M., Abers G.A., Cochran E.S., 2013. Potentially induced earthquakes in Oklahoma, USA: links between wastewater injection and the 2011 Mw 5.7 earthquake sequence, Geology, 41(6), 699–702.

Király-Proag E., Zechar J.D., Gischig V., Wiemer S., Karvounis D., Doetsch J., 2016. Validating induced seismicity forecast models—Induced Seismicity Test Bench, J. geophys. Res.
, 121, 6009– 6029. Google Scholar CrossRef Search ADS   Langenbruch C., Shapiro S.A., 2010. Decay rate of fluid-induced seismicity after termination of reservoir stimulations, Geophysics , 75( 6), 1– 10. Google Scholar CrossRef Search ADS   Langenbruch C., Zoback M.D., 2016. How will induced seismicity in Oklahoma respond to decreased saltwater injection rates?, Sci. Adv. , 2, 1– 9. Google Scholar CrossRef Search ADS   Marzocchi W., Douglas Zechar J., Jordan T.H., 2012. Bayesian forecast evaluation and ensemble earthquake forecasting, Bull. seism. Soc. Am. , 102( 6), 2574– 2584. Google Scholar CrossRef Search ADS   Mignan A., Jiang C., Zechar J.D., Wiemer S., Wu Z., Huang Z., 2013. Completeness of the Mainland China earthquake catalog and implications for the setup of the China earthquake forecast testing center, Bull. seism. Soc. Am. , 103( 2 A), 845– 859. Google Scholar CrossRef Search ADS   Nanjo K.Z., Tsuruoka H., Hirata N., Jordan T.H., 2011. Overview of the first earthquake forecast testing experiment in Japan, Earth Planets Space , 63( 3), 159– 169. Google Scholar CrossRef Search ADS   Nespoli M., Rinaldi A.P., Wiemer S., 2015. TOUGH2-SEED: A coupled fluid flow mechanical-statistical model for the study of injection-induced seismicity, in TOUGH Symposium 2015 , pp. 284– 292. Oreskes N., Shrader-Frechette K., Belitz K., 1994. Verification, validation, and confirmation of numerical models in the earth sciences, Science , 263( 5147), 641– 646. Google Scholar CrossRef Search ADS PubMed  Rhoades D.A., Schorlemmer D., Gerstenberger M.C., Christophersen A., Zechar J.D., Imoto M., 2011. Efficient testing of earthquake forecasting models, Acta Geophys. , 59( 4), 728– 747. Google Scholar CrossRef Search ADS   Schorlemmer D., Christophersen A., Rovida A., Mele F., Stucchi M., Marzocchi W., 2010. Setting up an earthquake forecast experiment in Italy, Ann. Geophys. , 53( 3), 1– 9. Secanell R., Carbon D., Dunand F., Martin C., 2009. 
AP5000 Report - Seismic Hazard and Risk assessments during three reference time periods (normal, stimulation and circulation), Tech. Rep. October 2009, GEOTER S.A.S . Segall P., Lu S., 2015. Injection-induced seismicity: Poroelastic and earthquake nucleation effects, J. geophys. Res. , 120, 1– 22. Google Scholar CrossRef Search ADS   Shapiro S.A., Dinske C., Langenbruch C., Wenzel F., 2010. Seismogenic index and magnitude probability of earthquakes induced during reservoir fluid stimulations, Leading Edge , 29( 3), 304– 309. Google Scholar CrossRef Search ADS   Taroni M., Zechar J.D., Marzocchi W., 2013. Assessing annual global M6+ seismicity forecasts, Geophys. J. Int. , 196( 1), 422– 431. Google Scholar CrossRef Search ADS   van Thienen-Visser K., Breunese J.N., 2015. Induced seismicity of the Groningen gas field: history and recent developments, Leading Edge , 34( 6), 664– 671. Google Scholar CrossRef Search ADS   Wiemer S., Tormann T., Herrmann M., Karvounis D., Kraft T., Marti M., 2017. Induzierte Erdbeben im Nachgang des eingestellten Geothermieprojekts in Basel, Tech. rep., Schweizerischer Erdbebendienst (SED) an der ETH Zürich Autoren , Zürich, Switzerland. Zechar J.D., Schorlemmer D., Liukis M., Yu J., Euchner F., Maechling P.J., Jordan T.H., 2010. The Collaboratory for the Study of Earthquake Predictability perspective on computational earthquake science, Concurrency Comput. Pract. Exp. , 22( 12), 1836– 1847. Google Scholar CrossRef Search ADS   Zechar J.D., Schorlemmer D., Werner M.J., Gerstenberger M.C., Rhoades D.A., Jordan T.H., 2013. Regional Earthquake Likelihood Models I: First-order results, Bull. seism. Soc. Am. , 103( 2A), 787– 798. Google Scholar CrossRef Search ADS   SUPPORTING INFORMATION Supplementary data are available at GJI online. Figure S1. Explanation of weight calculations for ensemble models. Wcorr stands for correlation weights, FTW indicates forecast time window (6 hours). Figure S2. Evolution of LL/Eqks of the number components. 
Symbols are explained in the legend; the vertical grey dashed line indicates the shut-in moment. Panels a–d: LL/Eqks for 6-, 24-, 48- and 72-hour forecast periods in Basel 2006. Panels e–h: the same for Soultz-sous-Forêts 2004.

Figure S3. Evolution of LL/Eqks of the spatial components. Symbols are explained in the legend; the vertical grey dashed line indicates the shut-in moment. Panels a–d: LL/Eqks for 6-, 24-, 48- and 72-hour forecast periods in Basel 2006. Panels e–h: the same for Soultz-sous-Forêts 2004.

Figure S4. Score Model Averaging (SMA) weights of SaSS and HySei variants. The vertical grey dashed line indicates the shut-in moment. Panels a–d, for Basel 2006: a. correlation weights; b. one-component ensemble weights; c. two-component number weights; d. two-component spatial weights. Panels e–h: the same for Soultz-sous-Forêts 2004.

Figure S5. Evolution of cumulative LL of SaSS and HySei model variants as well as BMA and SMA ensemble models. Symbols are explained in the legend; the vertical grey dashed line indicates the shut-in moment. a. Basel 2006. b. Soultz-sous-Forêts 2004.

Figure S6. Comparison of LL/Eqks with BMA- and SMA-weighted ensemble models. Symbols are explained in the legend; the vertical grey dashed line indicates the shut-in moment. Panels a–d: LL/Eqks for 6-, 24-, 48- and 72-hour forecast periods in Basel 2006. Panels e–h: the same for Soultz-sous-Forêts 2004.

[Equations (S1) to (S5)]

Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper.

© The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.

### Journal

Geophysical Journal International, Oxford University Press

Published: Jan 1, 2018
