# Constraining the magnitude of the largest event in a foreshock–main shock–aftershock sequence

Constraining the magnitude of the largest event in a foreshock–main shock–aftershock sequence Abstract Extreme value statistics and Bayesian methods are used to constrain the magnitudes of the largest expected earthquakes in a sequence governed by the parametric time-dependent occurrence rate and frequency–magnitude statistics. The Bayesian predictive distribution for the magnitude of the largest event in a sequence is derived. Two types of sequences are considered, that is, the classical aftershock sequences generated by large main shocks and the aftershocks generated by large foreshocks preceding a main shock. For the former sequences, the early aftershocks during a training time interval are used to constrain the magnitude of the future extreme event during the forecasting time interval. For the latter sequences, the earthquakes preceding the main shock are used to constrain the magnitudes of the subsequent extreme events including the main shock. The analysis is applied retrospectively to past prominent earthquake sequences. Probabilistic forecasting, Statistical methods, Earthquake hazards, Earthquake interaction, forecasting, and prediction, Statistical seismology 1 INTRODUCTION The occurrence of extreme earthquakes is a manifestation of the scaling of the frequency–magnitude statistics of seismicity (Gutenberg & Richter 1954; Vere-Jones 1969). In addition, it is controlled by the temporal and spatial variability of the seismicity rate. For example, large aftershocks typically happen with high probability right after the main shock and less often later in the aftershock sequences. This is a direct consequence of the fast decaying temporal aftershock rate, which can be approximated in many cases by the Omori–Utsu law (Utsu et al. 1995). From the physics point of view, the occurrence of earthquakes in general and extreme ones in particular is the outcome of the multiscale physical processes operating in the seismogenic crust and upper mantle (Ben-Zion 2008; Rundle et al. 2003; Shcherbakov et al. 2015). Among those, the stress transfer mechanism plays a critical role. Large aftershocks constitute significant hazard and can inflict additional damage to the already weakened infrastructure by the main shock. Therefore, constraining the magnitude of the largest event in an earthquake sequence is a crucial problem in statistical seismology (Reasenberg & Jones 1989; Shcherbakov 2014; Page et al. 2016). In this work, we combine extreme value statistics with Bayesian methods to derive the Bayesian predictive distribution for the magnitude of the largest expected event in a sequence of earthquakes. To accomplish this, we use the information of the early events in the sequence to constrain the variability of the model parameters describing the frequency–magnitude statistics and the occurrence rates. We assume that the occurrence of earthquakes in the sequence can be described by a non-homogeneous Poisson (NHP) marked point process (Utsu et al. 1995; Shcherbakov et al. 2005b), where magnitudes are not correlated and their frequency–magnitude statistics and occurrence rates can be described by a parametric model. One typical example of such a sequence of events is ubiquitous aftershock sequences. Similarly, large foreshocks, which in turn generate their own aftershock sequences, can also initiate a sequence of earthquakes culminating in the occurrence of a main shock. In both cases, the decay rates can be approximated well by the Omori–Utsu law and the frequency–magnitude statistics follow Gutenberg–Richter scaling. We analyse both types of sequences associated with prominent past main shocks to estimate the probabilities of having a large aftershock following the main shock and also to estimate the probability of the main shock magnitude following the preceding foreshocks. As a main result of this work, we provide a robust scheme in estimating the above probabilities, where we incorporate all the uncertainties associated with the model parameters into the calculation of respective probabilities. The suggested framework can be incorporated into the existing or future operating forecasting methods devoted to the estimation of magnitudes of extreme earthquakes. Extreme value theory and Bayesian analysis play an important role in statistical seismology (Campbell 1982; Daley & Vere-Jones 2003). They were used to address the problem of estimating the maximum expected magnitude of a main shock (Campbell 1982; Pisarenko 1991; Pisarenko et al. 1996; Kijko 2004), where the Bayesian approach is the most promising (Campbell 1982; Holschneider et al. 2011; Zöller et al. 2013, 2014). The search for reliable probabilistic models to forecast the occurrence of large aftershocks is justified by the hazard they pose (Gerstenberger et al. 2005; Shebalin et al. 2011; Page et al. 2016). In one such approach, the aftershock rate is recovered using the information of early aftershocks and can be used to estimate the probability of larger subsequent events (Omi et al. 2013; Ebrahimian et al. 2014; Omi et al. 2016). The occurrence of aftershocks increases significantly the seismic hazard and have to be taken into account in any operational earthquake forecasting schemes (Reasenberg & Jones 1989; Page et al. 2016). One of the first systematic empirical aftershock studies states that the magnitude of the largest aftershock on average is 1.2 less than the magnitude of the main shock (Båth 1965) and is known as Båth’s law. It was also suggested that the observed difference was independent of the number of events and its mean value was proportional to the inverse of the b-value assuming that aftershocks are independent and identically distributed random variables drawn from the exponential distribution (Vere-Jones 1969, 1975). Recent studies have explored this difference in more detail (Console et al. 2003; Shcherbakov & Turcotte 2004; Tahir et al. 2012; Shearer 2012; Shcherbakov et al. 2013). The existence of Båth’s law was also tested in the synthetic aftershock data produced by the epidemic-type aftershock sequence (ETAS) model (Helmstetter & Sornette 2003a) as well as derived theoretically (Saichev & Sornette 2005; Zhuang & Ogata 2006; Vere-Jones & Zhuang 2008; Luo & Zhuang 2016). In most of these studies, the point estimator for the difference between the magnitude of the main shock and the largest aftershock was analysed. However, those studies neither addressed the problem of constraining the variability of the difference nor the problem of quantifying the uncertainty caused by the estimation error of model parameters. The occurrence of statistically significant foreshocks before large subsequent main shocks is not as consistent as the occurrence of aftershocks (Ogata et al. 1995, 1996; Molchan et al. 1999; Reasenberg 1999; Felzer et al. 2004). However, there are clear examples when past large main shocks were preceded by relatively large foreshocks, which generated their own aftershock sequences. Typically, the analysis of foreshock sequence is performed by stacking many sequences of events based on specific clustering algorithms (Molchan et al. 1999; Helmstetter & Sornette 2003b; Christophersen & Smith 2008; Zhuang et al. 2008; Marzocchi & Zhuang 2011; Ogata & Katsura 2012, 2014). The paper is organized as follows. In Section 2, we derive the Bayesian predictive distribution for the magnitude of the largest event in a sequence by assuming specific parametric models for the frequency–magnitude statistics of earthquakes and time-dependent earthquake rates. In Section 3, we apply the developed framework to several prominent earthquake sequences to constrain the magnitude of the largest event in the sequences. We provide the final remarks and future directions in Sections 4 and 5. 2 BAYESIAN PREDICTIVE DISTRIBUTION FOR THE MAGNITUDE OF THE LARGEST EVENT IN A SEQUENCE 2.1 Extreme value and Bayesian analyses Earthquakes can be considered as an outcome of a stochastic marked point process. Each earthquake is characterized by its size or magnitude mi, which is drawn from a particular distribution, and by its occurrence time ti. In most cases, it is assumed that the magnitudes are not correlated and the temporal variation of the occurrence of events is governed by a time-dependent rate, λω(t), where ω specifies a set of parameters. With these assumptions, the earthquakes can be modeled as an NHP process in time (Utsu et al. 1995; Shcherbakov et al. 2005b). The spatial extent of the occurrence of earthquakes can also be considered, however, in this study, we restrict ourselves to the time domain only. In the analysis that follows, we define a training time interval ]0 , T] during which early earthquakes in the sequence are considered. The times and magnitudes of these earthquakes Mn = {(ti, mi): i = 1, …, n} are used to constrain the model parameters. During the forecasting time interval ]T, T + ΔT], we estimate the probability of having the magnitude of the extreme event larger than a given value. Within sequences of earthquakes during a forecasting time interval ΔT, the distribution of the largest event can be studied using the framework of extreme value statistics (Coles 2001; David & Nagaraja 2003). Suppose that earthquake magnitudes {mi}, i = 1, 2, … are described by a parametric distribution function Fθ(m), where θ is a set of parameters. If one observes k earthquakes during a specified future time interval ]T, T + ΔT], then the maximum magnitude μk = max i = 1, …, k{mi} can be considered as a random variable. Using the order statistics of extremes, the probability that the maximum observed event μk is smaller or equal to m is (Daley & Vere-Jones 2003, p. 7)   $$\mathrm{Pr}(\mu _k \le m|\theta ) = \left[F_\theta (m)\right]^k.$$ (1) On the other hand, the probability of observing k events during the interval ]T, T + ΔT], assuming that they are generated by an NHP process with the productivity $$\Lambda _\omega (\Delta T)=\int _T^{T+\Delta T} \lambda _\omega (t)\, dt$$, is (Daley & Vere-Jones 2003, p. 22)   $$\mathrm{Pr}(k|\Delta T,\omega ) = \frac{[\Lambda _\omega (\Delta T)]^k}{k!}\, e^{-\Lambda _\omega (\Delta T)}.$$ (2) The probability that the maximum event magnitude is less than or equal to m for all possible number of events k during a given time interval ]T, T + ΔT] can be evaluated by combining eqs (1) and (2) and applying the total probability theorem (Campbell 1982; Coles 2001; Daley & Vere-Jones 2003; Reiss & Thomas 2007; Zöller et al. 2013)   \begin{eqnarray} P_\mathrm{EV}(m_\mathrm{ex} \le m|\theta ,\omega ) & = & \sum \limits _{k=0}^{\infty }\frac{[\Lambda _\omega (\Delta T)]^k}{k!}\, e^{-\Lambda _\omega (\Delta T)}\, [F_\theta (m)]^k \nonumber \\ & = & \exp \lbrace -\Lambda _\omega (\Delta T)\left[1-F_\theta (m)\right]\rbrace . \end{eqnarray} (3) The above eq. (3) can be used to compute the probability 1 − PEV(mex ≤ m|θ, ω) that the largest event will be larger than m given the estimates of the model parameters {θ, ω}. In real applications, the parameters are estimated with a degree of uncertainty. Therefore, when computing this probability one needs to incorporate those uncertainties into the analysis. In order to do this, we assume that the model parameters {θ, ω} and their uncertainties can be estimated from the early observed events during a training time interval ]0, T] preceding the forecasting time interval ]T, T + ΔT]. According to the Bayesian analysis, the posterior distribution function p(θ, ω|Mn) for the model parameters {θ, ω}, given the information for the magnitudes and times of n early events, Mn, observed during the interval ]0, T], has the following form:   $$p(\theta ,\omega |M_n) \propto L\left(M_n|\theta ,\omega \right) \pi (\theta ,\omega ),$$ (4)where π(θ, ω) is a prior distribution for the model parameters. In this study, we assume that π(θ, ω) is a uniform density function over the space of all the possible values. Thus, p(θ, ω|Mn) ∝ L(Mn|θ, ω), i = 1, …, n. The likelihood function L(Mn|θ, ω) for an NHP process driven by the time-dependent rate λω(t) with event magnitudes distributed according to Fθ(m), can be written (Daley & Vere-Jones 2003, p. 23)   $$L\left(M_n|\theta ,\omega \right) = e^{-\Lambda _\omega (T)}\, \prod \limits _{i=1}^n\lambda _\omega (t_i)\, \prod \limits _{i=1}^n f_\theta (m_i),$$ (5)where $$f_\theta (m)=\frac{dF_\theta (m)}{dm}$$ is the probability density function and $$\Lambda _\omega (T)=\int _0^T \lambda _\omega (t)\, dt$$ is the productivity of the NHP process during the training time interval ]0, T]. The posterior distribution function (eq. 4), can be combined with the probability (eq. 3), to obtain the Bayesian predictive distribution for the magnitude of the maximum event mex during the forecasting time interval ]T, T + ΔT] given the information for the magnitudes and times of each n early observed earthquakes Mn during the training time interval ]0, T] (Zöller et al. 2013):   \begin{eqnarray} P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) \!=\! \int \limits _\Omega \!\int \limits _\Theta P_\mathrm{EV}(m_\mathrm{ex}\le m|\theta ,\omega )\, p(\theta ,\omega |M_n)\, d\theta \, d\omega ,\nonumber\\ \end{eqnarray} (6)where Θ and Ω define the multidimensional domains of frequency–magnitude distribution and rate parameters. The Bayesian predictive distribution (eq. 6), includes both uncertainties associated with stochastic nature of the problem and parameter variations. It is a marginal distribution where one integrates over parameter uncertainties using the posterior distribution (eq. 4). Using eqs (3)–(5), the Bayesian predictive distribution (eq. 6), can be written in the following form   \begin{eqnarray} P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) \!=\! \frac{1}{A}\, \!\int \limits _\Omega \!\int \limits _\Theta e^{-\Lambda _\omega (\Delta T)\left[1-F_\theta (m)\right]}\, L\left(M_n|\theta ,\omega \right)\, d\theta \, d\omega ,\nonumber\\ \end{eqnarray} (7)where we used a non-informative prior distribution, π(θ, ω) = const, for the model parameters. When evaluating numerically the integrals in eqs (6), (7) and other similar integrals, we always consider a typical finite range of the model parameters. This effectively defines a flat prior, which is non-zero between finite lower and upper bounds. The normalization constant A is computed by integrating the likelihood function (eq. 5) over the model parameter space {Θ, Ω}   $$A = \int \limits _\Omega \int \limits _\Theta L\left(M_n|\theta ,\omega \right)\, d\theta \, d\omega .$$ (8) The corresponding probability density function $$p_\mathrm{B}(m)=\frac{d}{dm}P_\mathrm{B}(m)$$ is   \begin{eqnarray} p_\mathrm{B}(m|M_n) &=& \frac{1}{A} \int \limits _\Omega \int \limits _\Theta \Lambda _\omega (\Delta T) f_\theta (m) e^{-\Lambda _\omega (\Delta T)\left[1-F_\theta (m)\right]}\nonumber\\ &&\times\, L\left(M_n|\theta ,\omega \right) d\theta d\omega . \end{eqnarray} (9)Eqs (7) or (9) give the distribution for the magnitude of the largest expected aftershock during a future time interval ]T, T + ΔT] given the magnitudes and times Mn of early n earthquakes, which occurred during the training time interval ]0, T]. In addition, one can consider the future time interval ΔT to the next maximum expected event as a random variable (Zöller et al. 2013). The probability distribution function of having the next maximum earthquake to occur during this time interval and to exceed a specific value mex is   \begin{eqnarray} P_\mathrm{B}(\Delta T \le t|M_n,m_\mathrm{ex}) &=& 1 - \frac{1}{A}\, \int \limits _\Omega \int \limits _\Theta e^{-\Lambda _\omega (t)\left[1-F_\theta (m_\mathrm{ex})\right]}\,\nonumber\\ &&\times\, L\left(M_n|\theta ,\omega \right)\, d\theta \, d\omega , \end{eqnarray} (10)where $$\Lambda _\omega (t)=\int _T^{T+t} \lambda _\omega (t^{\prime })\, dt^{\prime }$$ is the productivity of the NHP process during the time interval ]T, T + t]. This equation has the same functional form as eq. (7). By fixing a given probability level one can estimate the interarrival time t = ΔT to the next extreme earthquake with the magnitude greater than mex. Similarly to eq. (9), one can also compute the probability density function $$p_\mathrm{B}(t)=\frac{d}{dt}P_\mathrm{B}(\Delta T \le t|M_n,m_\mathrm{ex})$$ as   \begin{eqnarray} p_\mathrm{B}(t|M_n,m_\mathrm{ex}) &=& \frac{1}{A} \int \limits _\Omega \int \limits _\Theta \lambda _\omega (t) \left[1-F_\theta (m)\right] e^{-\Lambda _\omega (t)\left[1-F_\theta (m)\right]}\nonumber\\ &&\times\, L\left(M_n|\theta ,\omega \right) d\theta d\omega . \end{eqnarray} (11) As stated above, the occurrence of earthquakes can be considered as a marked point process. The sequence Mn = {(ti, mi)} of arrival times ti and marks (magnitudes) mi can be approximated as an NHP process, where the arrival times form an NHP process and the marks mi are iid random variables with the common distribution function F. From this marked point process, one can extract a sequence of exceedance events, where each event has a magnitude above a certain threshold mex. This threshold model is characterized by exceedance arrival times and exceedances and can be modeled as an NHP process as well (Coles 2001; Reiss & Thomas 2007). The statistics of extreme events can be obtained from the analysis of exceedances over a pre-defined high threshold. In this framework, eq. (10) gives the distribution of interarrival time intervals between such extreme events. 2.2 Parametric earthquake model Here, we analyse a specific model for the distribution of earthquake magnitudes and occurrence rates. This is a reasonably good approximation for aftershock sequences generated by large earthquakes. The model is also applicable to the aftershocks generated by some large foreshock preceding a subsequent main shock. In this case, the main shock can be considered as an extreme aftershock triggered by a large foreshock. We assume that aftershock magnitudes are distributed according to the exponential distribution with the probability density function fθ(m) with the corresponding distribution function Fθ(m):   \begin{eqnarray} f_\theta (m) & = & \beta \exp \left[-\beta \, (m - m_0)\right], \end{eqnarray} (12)  \begin{eqnarray} F_\theta (m) & = & 1 - \exp \left[-\beta \, (m - m_0)\right], \end{eqnarray} (13)where θ = {β}. The parameter β is related to the b-value of the Gutenberg–Richter scaling relation, β = ln (10)b (Gutenberg & Richter 1954). m0 is a given lower magnitude cut-off set above the catalogue completeness level. It is typically observed that the rate of aftershocks λω(t) can be approximated by the Omori–Utsu law with parameters ω = {τ, c, p} (Utsu et al. 1995; Shcherbakov et al. 2004, 2005a; Holschneider et al. 2012):   $$\lambda _\omega (t) = \frac{1}{\tau }\frac{1}{\left(1+t/c\right)^p}.$$ (14) Using the above parametric model, the Bayesian predictive distribution (eq. 7), for the largest expected aftershock can be written   \begin{eqnarray} P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) & = & \frac{1}{A} \, \!\int \limits _0^\infty \int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, e^{-\Lambda _\omega (T)-\Lambda _\omega (\Delta T)\, \exp \left[-\beta \, (m - m_0)\right] } \nonumber \\ &&\times\, \beta ^n e^{\!-\!\beta n(\overline{m} \!-\! m_0)} \frac{1}{\tau ^n} \prod \limits _{i=1}^n \left(1\!+\!\frac{t_i}{c}\right)^{-p} dp dc d\beta d\tau ,\nonumber\\ \end{eqnarray} (15)with   \begin{eqnarray} A &=& \int \limits _0^\infty \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, e^{-\Lambda _\omega (T)}\, \beta ^n\, e^{-\beta \, n(\overline{m} - m_0)}\, \frac{1}{\tau ^n}\, \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p} \nonumber\\ &&\times\, dp\, dc\, d\beta \, d\tau , \end{eqnarray} (16)where $$\overline{m}=\frac{1}{n}\sum _{i=1}^n m_i$$ is the average magnitude above m0. The corresponding probability density function is   \begin{eqnarray} p_\mathrm{B}(m|M_n) & = & \frac{1}{A} \int \limits _0^\infty \int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \Lambda _\omega (\Delta T)\, e^{-\Lambda _\omega (T)-\Lambda _\omega (\Delta T) \exp \left[\!-\!\beta (m \!-\! m_0)\right] } \nonumber \\ &&\times\,\beta ^{n+1} e^{\!-\!\beta n(\overline{m} \!-\! m_0) \!-\! \beta (m \!-\! m_0)} \frac{1}{\tau ^n} \prod \limits _{i\!=\!1}^n \left(1\!+\!\frac{t_i}{c}\right)^{-p} \!\!dp dc d\beta d\tau. \nonumber\\ \end{eqnarray} (17)The limits of integration for the β, c and p parameters can be specified through the prior distribution π(θ, ω). For this purpose, one can use typical ranges of observed values or provide a full range of variability for these parameters. It is possible to integrate eq. (15) with respect to τ to obtain   \begin{eqnarray} {P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) = \frac{\Gamma (n-1)}{A} \, \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p}} \nonumber \\ &&\times\, \frac{\beta ^n\, e^{-\beta \, n(\overline{m} - m_0)}}{\left[\varphi (c,p,0,T) + e^{-\beta (m-m_0)}\varphi (c,p,T,\Delta T)\right]^{n-1}} dp\, dc\, d\beta, \end{eqnarray} (18)where Γ(n) is the Gamma function and   $$\varphi (c,p;u,v) = \frac{c^p}{p-1}\left[(c+u)^{1-p}-(c+u+v)^{1-p}\right].$$ (19) For the normalization constant A, one can also integrate eq. (16) over τ to obtain:   \begin{eqnarray} A &=& \Gamma (n-1) \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, \frac{\beta ^n e^{-\beta \, n(\overline{m} - m_0)}}{\left[\varphi (c,p\,0,T)\right]^{n-1}}\nonumber\\ &&\times\, \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p} dp\, dc\, d\beta . \end{eqnarray} (20) The corresponding probability density function (eq. 17), after integration over τ becomes   \begin{eqnarray} p_\mathrm{B}(m|M_n) & = & \frac{\Gamma (n)}{A} \int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \varphi (c,p,T,\Delta T) \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p} \nonumber \\ &\times &\,\frac{\beta ^{n\!+\!1} e^{\!-\!\beta (m \!-\! m_0)} e^{\!-\!\beta n(\overline{m} \!-\! m_0)}}{\left[\varphi (c,p,0,T) \!+\! e^{\!-\!\beta (m\!-\!m_0)}\varphi (c,p,T,\Delta T)\right]^{n}} dp dc d\beta .\nonumber\\ \end{eqnarray} (21) The derived predictive distribution eq. (18) and the corresponding probability density function eq. (21) can be used to estimate the probabilities of having an extreme earthquake above a certain magnitude during a specified future time interval ]T, ΔT]. The proposed method incorporates the full uncertainties associated with each parameter of the models used to describe the frequency–magnitude statistics and decay rates of the early events in the sequence during the training time interval ]0, T]. The integration in the above equations has to be performed numerically. The limits of integration are finite and chosen based on the typical values of each parameter observed for past aftershock sequences. This effectively defines a flat prior π(β, c, p) for the model parameters bounded between finite minimum and maximum values. And finally, we provide the Bayesian predictive distribution for the interarrival time interval to the next largest expected event (eq. 10), for the magnitude of the earthquake to be above a certain threshold mex:   \begin{eqnarray} {P_\mathrm{B}(\Delta T \le t|M_n,m_\mathrm{ex}) = 1 \!-\! \frac{\Gamma (n-1)}{A} \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p}} \nonumber \\ &&\times\, \frac{\beta ^n\, e^{-\beta \, n(\overline{m} - m_0)}}{\left[\varphi (c,p,0,T) + e^{-\beta (m_\mathrm{ex}-m_0)}\varphi (c,p,T,t)\right]^{n-1}} dp\, dc\, d\beta, \end{eqnarray} (22)and the corresponding probability density function $$p_\mathrm{B}(t)=\frac{d}{dt}P_\mathrm{B}(t|M_n,m_\mathrm{ex})$$:   \begin{eqnarray} {p_\mathrm{B}(t|M_n,m_\mathrm{ex}) = \frac{\Gamma (n)}{A} \!\int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\!\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \left(1\!+\!\frac{T+t}{c}\right)^{\!-\!p} \prod \limits _{i\!=\!1}^n \left(1+\frac{t_i}{c}\right)^{-p}} \nonumber \\ &&\times\, \frac{\beta ^{n} e^{\!-\!\beta (m_\mathrm{ex} \!-\! m_0)} e^{-\beta n(\overline{m} \!-\! m_0)}}{\left[\varphi (c,p,0,T) \!+\! e^{\!-\!\beta (m_\mathrm{ex}\!-\!m_0)}\varphi (c,p,T,t)\right]^{n}} dp dc d\beta . \end{eqnarray} (23) Eq. (22) can be used to estimate the time to the next extreme earthquake by fixing a certain probability level. This interval ΔT can be obtained by solving the equation PB(ΔT|Mn, mex) = α, where α can be set to a specific value, for example, α = 0.05. 3 APPLICATION TO PAST PROMINENT EARTHQUAKE SEQUENCES To illustrate the application of the method, which is developed in the previous section, we consider two types of earthquake sequences. The first type comprises usual main shock–aftershock sequences, where the decay rate of aftershocks can be approximated well by the Omori–Utsu law eq. (14) and the frequency–magnitude statistics is given by the exponential distribution eq. (12). For these sequences, we use the information of the early aftershocks observed right after the main shock during the training time interval ]0, T], to estimate the probability of having an extreme aftershock above a certain magnitude later in the sequence during the forecasting time interval ]T, ΔT]. For the second type of sequences, we consider the occurrence of large foreshocks, which in turn generate their own aftershock sequences and may trigger the main shock. Therefore, we can estimate the probability of having an event of the size of the main shock or larger. 3.1 The 2016 Mms 7.3 Kumamoto, Japan, earthquake sequence As the first example, we consider the recent 2016 Mms 7.3 Kumamoto, Japan, earthquake sequence. The main shock occurred on 2016 April 16. It was preceded by two large foreshocks of magnitude 6.5 and 6.4, which occurred in the preceding day before the main shock. The largest aftershock up until 2017 September 01 had magnitude 5.9, which occurred in the first day after the main shock. First, we consider the sequence initiated by the first foreshock of magnitude 6.5, which occurred 1.16 d before the main shock. This foreshock generated its own aftershock sequence with well-defined frequency–magnitude statistics and a decay rate (Nanjo et al. 2016; Nanjo & Yoshida 2017). Therefore, we use the magnitudes and times of those aftershocks during 1.16 d after the magnitude 6.5 foreshock to compute the Bayesian predictive distribution, eqs (18) and (21), for the subsequent extreme event (main shock). This is illustrated in Fig. 1. We fix T = 1.16 d and consider several forecasting time intervals ΔT = 1, 2 and 5 d with all events larger than m0 = 3.2. From the plots, we can estimate the probability of having an extreme event larger than a certain magnitude. For example, considering the forecasting time interval ΔT = 5 d, there is 5.1 per cent chance of having an event larger than 6.5 and 1.3 per cent chance of having an event larger than 7.3, which is the magnitude of the subsequent main shock. Figure 1. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the 2016 April 14, Mms 6.5 foreshock of the Mms 7.3 Kumamoto earthquake (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1.16 d with all the events above magnitude m ≥ 3.2 and for the different forecasting time intervals ΔT = 1, 2 and 5 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 1.16 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.2, 3.5 and 3.8. The probabilities to have extreme earthquakes above magnitudes m = 6.5 and 7.3 are given in the legend. Figure 1. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the 2016 April 14, Mms 6.5 foreshock of the Mms 7.3 Kumamoto earthquake (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1.16 d with all the events above magnitude m ≥ 3.2 and for the different forecasting time intervals ΔT = 1, 2 and 5 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 1.16 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.2, 3.5 and 3.8. The probabilities to have extreme earthquakes above magnitudes m = 6.5 and 7.3 are given in the legend. The completeness of the catalogue can also play an important role in the analysis of the early aftershocks during the training time interval T (Zhuang et al. 2017). To analyse the effect of the lower magnitude cut-off m0, we consider fixed training and forecasting time intervals and vary the lower magnitude cut-off m0. This is shown in Fig. 1(b) for the fixed T = 1.16 and ΔT = 5 d and the three magnitude cut-offs m0 = 3.2, 3.5 and 3.8. The analysis shows a moderate dependence on the cut-off magnitude and slight increase in the probabilities for the lowest cut-off 3.2. This can be attributed to the fact that there are missing events right after the magnitude 6.5 foreshock. This illustrates the importance of having a complete earthquake catalogue to perform the analysis. On the other hand, the lower cut-off allows to use more events, which in turn reduce the associated uncertainties of the parameters used to model the frequency–magnitude statistics and the decay rates but increases the estimation biases. Next, we consider the aftershock sequence generated by the main shock itself. We use the information of the early aftershocks after the main shock to compute the Bayesian predictive distribution for the magnitudes of the extreme aftershocks in the future time interval. In Fig. 2, we plot the cumulative distribution (eq. 18), and the probability density function (eq. 21), for several future time intervals ΔT = 1, 2, 5 and 10 d, where we use the aftershocks above magnitude m0 = 4.0 in the first T = 1 d after the main shock. The effect of the lower magnitude cut-off m0 on the estimated probabilities is shown in Fig. 2(b), where we consider m0 = 3.5, 4.0 and 4.5. Figure 2. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1 d with all the events above magnitude m ≥ 4.0 and for the different forecasting time intervals ΔT = 1, 2, 5 and 10 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 5 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.5, 4.0 and 4.5. For both figures, the probabilities to have extreme earthquakes above magnitudes m = 6.3 and 7.3 are given in the legend. Figure 2. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1 d with all the events above magnitude m ≥ 4.0 and for the different forecasting time intervals ΔT = 1, 2, 5 and 10 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 5 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.5, 4.0 and 4.5. For both figures, the probabilities to have extreme earthquakes above magnitudes m = 6.3 and 7.3 are given in the legend. In addition, we analysed the distribution of the interarrival time interval ΔT to the next largest expected earthquake above a certain magnitude mex. Using the derived equations for the cumulative distribution (eq. 23), and the probability density function (eq. 23), we plot them in Fig. 3 for the analysis of the aftershock sequence generated by the Kumamoto main shock. We consider a fixed training time interval of T = 1 d and estimate the distribution of interarrival times to the next extreme earthquake above several magnitude thresholds mex = 6.0, 6.3 and 7.3. For a fixed probability level of 5 per cent the estimated interarrival times are ΔT = 0.21, 0.42 and 5.32 d, respectively. Figure 3. View largeDownload slide Bayesian predictive distribution PB(ΔT ≤ t|Mn, mex) (eq. 22), as dashed curves and the corresponding probability density function pB(t|Mn, mex) (eq. 23), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). The results correspond to the same early training time interval of T = 1 d with all the events above magnitude m0 ≥ 4.0 and for different forecasting extreme magnitudes mex = 6.0, 6.3 and 7.3. The horizontal solid line corresponds to the probability level of 5 per cent. The intercepts of this line with the cumulative distribution functions give the forecasting or interarrival time intervals ΔT = 0.21, 0.42 and 5.32 d to the next extreme earthquake with the magnitude larger than mex. Figure 3. View largeDownload slide Bayesian predictive distribution PB(ΔT ≤ t|Mn, mex) (eq. 22), as dashed curves and the corresponding probability density function pB(t|Mn, mex) (eq. 23), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). The results correspond to the same early training time interval of T = 1 d with all the events above magnitude m0 ≥ 4.0 and for different forecasting extreme magnitudes mex = 6.0, 6.3 and 7.3. The horizontal solid line corresponds to the probability level of 5 per cent. The intercepts of this line with the cumulative distribution functions give the forecasting or interarrival time intervals ΔT = 0.21, 0.42 and 5.32 d to the next extreme earthquake with the magnitude larger than mex. To analyse the interplay between the two time intervals T and ΔT, we plot in Fig. 4 the dependence of the probability to have an extreme aftershock above magnitude mex ≥ 6.3 on varying T ∈ [1, 20] and ΔT ∈ [1, 15] d. The probability to have an extreme event typically decreases with increasing the training time interval T for a fixed forecasting interval ]T, ΔT]. This is a direct consequence of the power-law time-decaying Omori–Utsu law. On the other hand, the increasing forecasting time interval ]T, ΔT] for a fixed training time interval T results in the increase of the probability to have an extreme event. Figure 4. View largeDownload slide (a) The probability to have an extreme aftershock above a magnitude mex = 6.3 for the varying initial T and future time intervals ΔT. The sequence is initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). All the aftershocks above magnitude m0 = 4.0 were used. (b) The probability to have an extreme event above the magnitude mex = 7.3 for the sequence initiated by the magnitude 6.5 Kumamoto foreshock (2016 April 14). The maximum initial time interval T = 1.16 d was used. All the earthquake above magnitude m0 = 3.1 were used. Figure 4. View largeDownload slide (a) The probability to have an extreme aftershock above a magnitude mex = 6.3 for the varying initial T and future time intervals ΔT. The sequence is initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). All the aftershocks above magnitude m0 = 4.0 were used. (b) The probability to have an extreme event above the magnitude mex = 7.3 for the sequence initiated by the magnitude 6.5 Kumamoto foreshock (2016 April 14). The maximum initial time interval T = 1.16 d was used. All the earthquake above magnitude m0 = 3.1 were used. 3.2 Comparative analysis of other sequences To extend our analysis, we consider other past prominent earthquake sequences and provide the probabilities of the occurrence of the largest events within a specified forecasting time interval ]T, T + ΔT]. The considered aftershock sequences generated by past large main shocks are summarized in Table 1. In Table 2, we list sequences associated with large foreshocks of the corresponding main shocks. Table 1. Summary of the data for the aftershock sequences used in the analysis. The columns are: the date of the occurrence of a main shock, magnitude of the main shock Mms, lower magnitude cut-off for aftershocks m0, the magnitude of the largest aftershock and the time interval in days when the largest aftershock occurred, respectively. Footnotes provide the references to the corresponding earthquake catalogues used in the study. Date  Name  Mms  m0  mas  Days  2016 November 13  Kaikourad  7.8  4.0  6.3  0.563  2016 April 16  Kumamotoa  7.3  4.0  5.9  0.014  2011 March 11  Tohokua  9.0  5.5  7.6  0.020  2010 September 3  Darfieldd  7.1  3.5  6.2  171.3  2010 April 4  El Mayorb  7.2  3.2  5.7  71.24  2008 June 14  Iwatea  7.2  3.1  5.7  0.025  2007 July 16  Chuetsua  6.8  3.1  5.8  0.225  2007 March 25  Notoa  6.9  3.1  5.3  0.354  2005 March 20  Fukuokaa  7.0  3.1  5.8  30.80  2004 October 23  Niigataa  6.8  3.1  6.5  0.026  2003 September 26  Tokachi-okia  8.0  4.0  7.1  0.054  2002 November 3  Denalic  7.9  3.0  5.8  0.014  2000 October 6  Tottoria  7.3  3.1  5.6  1.991  1999 October 17  Hector Mineb  7.1  2.7  5.8  0.009  1995 January 16  Kobea  7.3  3.1  5.4  0.078  1994 January 17  Northridgeb  6.7  3.0  5.9  0.001  1992 June 28  Landersb  7.3  3.0  6.3  0.131  Date  Name  Mms  m0  mas  Days  2016 November 13  Kaikourad  7.8  4.0  6.3  0.563  2016 April 16  Kumamotoa  7.3  4.0  5.9  0.014  2011 March 11  Tohokua  9.0  5.5  7.6  0.020  2010 September 3  Darfieldd  7.1  3.5  6.2  171.3  2010 April 4  El Mayorb  7.2  3.2  5.7  71.24  2008 June 14  Iwatea  7.2  3.1  5.7  0.025  2007 July 16  Chuetsua  6.8  3.1  5.8  0.225  2007 March 25  Notoa  6.9  3.1  5.3  0.354  2005 March 20  Fukuokaa  7.0  3.1  5.8  30.80  2004 October 23  Niigataa  6.8  3.1  6.5  0.026  2003 September 26  Tokachi-okia  8.0  4.0  7.1  0.054  2002 November 3  Denalic  7.9  3.0  5.8  0.014  2000 October 6  Tottoria  7.3  3.1  5.6  1.991  1999 October 17  Hector Mineb  7.1  2.7  5.8  0.009  1995 January 16  Kobea  7.3  3.1  5.4  0.078  1994 January 17  Northridgeb  6.7  3.0  5.9  0.001  1992 June 28  Landersb  7.3  3.0  6.3  0.131  aThe Japan Meteorological Agency earthquake catalog (http://www.jma.go.jp/en/quake/). bHauksson et al. waveform relocated catalog for Southern California (http://web.gps.caltech.edu/∼hauksson/catalogs/index.html) (Hauksson et al., 2012). cAlaska Earthquake Information Center catalog (http://www.aeic.alaska.edu/html_docs/db2catalog.html). dNew Zealand GEONET earthquake catalog (http://quakesearch.geonet.org.nz/). View Large Table 2. Summary of the data for the aftershock sequences generated by the foreshocks of the large main shocks. The columns are: the date of occurrence of a foreshock, the name of the corresponding main shock, the magnitude of the foreshock mfs, the magnitude of the main shock Mms, the lower magnitude cut-off for aftershocks m0 and the training time interval T, respectively. Date  Name  mfs  Mms  m0  T (days)  2016 April 14  Kumamotoa  6.5  7.3  3.1  1.16  2011 March 9  Tohokua  7.3  9.0  4.0  2.12  2010 April 3  El Mayorb  4.3  7.2  2.0  0.98  2002 October 23  Denalic  6.7  7.9  2.5  11.31  1992 April 23  Landersb  6.1  7.3  2.5  2.0  Date  Name  mfs  Mms  m0  T (days)  2016 April 14  Kumamotoa  6.5  7.3  3.1  1.16  2011 March 9  Tohokua  7.3  9.0  4.0  2.12  2010 April 3  El Mayorb  4.3  7.2  2.0  0.98  2002 October 23  Denalic  6.7  7.9  2.5  11.31  1992 April 23  Landersb  6.1  7.3  2.5  2.0  a,b,cThe same earthquake catalogs given in Table 1. View Large For aftershock sequences generated by large main shocks, we consider the training time intervals of T = 1, 2, 5 and 10 d and estimate the probabilities PB(mex ≥ m) of having extreme aftershocks above magnitude m for the fixed forecasting time interval ΔT = 5 d. We report these probabilities in Fig. 5(a) for m = Mms − 1.0 and in Fig. 5(b) for m = Mms. It is evident from the figures that the estimated probabilities are reduced by increasing the early training time intervals. This is related to the fact that for longer training time intervals the corresponding aftershock decay rate starts to decrease according to the Omori–Utsu law which in turn reduces the corresponding probabilities of having large subsequent events. The observed variation of the computed probabilities for different main shocks is directly related to the variability of the corresponding frequency–magnitude statistics and decay rates of each aftershock sequence. In general, the aftershock sequences with lower b-values and shallow decay rates (smaller p values) will produce higher probabilities for subsequent extreme aftershocks during the forecasting time interval ΔT. Figure 5. View largeDownload slide The probabilities of occurrence of extreme aftershocks after several prominent main shocks are shown. The left vertical axis indicate the date of the main shock and the right vertical axis give the magnitude above which the probabilities are computed. The probabilities are estimated using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = Mms − 1.0 and (b) m = Mms. The training time intervals T = 1, 2, 5 and 10 d and the fixed forecasting time interval ΔT = 5 d were used and indicated with different colours from light to dark, respectively. Figure 5. View largeDownload slide The probabilities of occurrence of extreme aftershocks after several prominent main shocks are shown. The left vertical axis indicate the date of the main shock and the right vertical axis give the magnitude above which the probabilities are computed. The probabilities are estimated using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = Mms − 1.0 and (b) m = Mms. The training time intervals T = 1, 2, 5 and 10 d and the fixed forecasting time interval ΔT = 5 d were used and indicated with different colours from light to dark, respectively. In some cases, main shocks are preceded by large foreshocks, which generate their own aftershock sequences. In this context, we can treat the occurrence of the subsequent main shock as an extreme aftershock by assuming that it belongs to the same sequence of events governed by the corresponding frequency–magnitude statistics and decay rates. Therefore, we can use the information of earthquakes before the occurrence of the main shock to estimate the probability of having an extreme event equal to or larger than the magnitude of the main shock. This is reported in Fig. 6 and the parameters of the corresponding foreshock sequences are given in Table 2. In Fig. 6(a), we computed the probabilities of having events larger than the magnitude of the largest foreshock. In Fig. 6(b), we provide the probabilities of having events larger than the subsequent main shock magnitudes. For most of the sequences, we used the training time interval T that ends before the occurrence of the main shocks. We also used several forecasting time intervals ΔT = 1, 5 and 10 d. In most case, the obtained probabilities are rather low (Fig. 6) which is consistent with other studies (Zhuang et al. 2008). This is especially evident for the analysis of the foreshock sequence before the 2002 Mms 7.9 Denali, Alaska, main shock. The obtained probabilities are several orders of magnitude lower compared to the other foreshock sequences summarized in Fig. 6, where we employed the logarithmic scale to emphasize this variability. Figure 6. View largeDownload slide The probabilities of occurrence of extreme earthquakes after the foreshocks preceding several prominent main shocks are shown. The left vertical axis indicate the dates of the foreshocks and the right vertical axis give the magnitudes of the extreme events. The probabilities were computed using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = mfs and (b) m = Mms. The training time interval T is given in Table 2 for each sequence. Three forecasting time intervals ΔT = 1, 2 and 5 d were used and indicated with different colours from light to dark, respectively. Figure 6. View largeDownload slide The probabilities of occurrence of extreme earthquakes after the foreshocks preceding several prominent main shocks are shown. The left vertical axis indicate the dates of the foreshocks and the right vertical axis give the magnitudes of the extreme events. The probabilities were computed using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = mfs and (b) m = Mms. The training time interval T is given in Table 2 for each sequence. Three forecasting time intervals ΔT = 1, 2 and 5 d were used and indicated with different colours from light to dark, respectively. 4 DISCUSSION The proposed statistical model to estimate the occurrence of extreme earthquake magnitudes in a sequence assumes that earthquakes can be modeled by an NHP process driven by the time-dependent earthquake rate. In addition, it assumes that the frequency–magnitude statistics of earthquakes are stationary in time. It also assumes that the extreme events are drawn from the same population of earthquakes as the earthquakes used to constrain the parameters of the model. For real earthquake sequences, not all of these assumptions are always satisfied. For example, some large main shocks can trigger rather large aftershocks later in a sequence, however, these aftershocks do not necessarily belong to the same statistical population of events as the main shock and its direct aftershocks. This usually applies to the occurrence of large aftershocks on separate faults. For example, this may be the case for the occurrence of three large aftershocks during 1 yr and 3 months time interval after the 2010 Mms 7.1 Darfield, New Zealand, main shock (Beavan et al. 2012; Shcherbakov et al. 2012). Another example is the occurrence of a strong aftershock (the 1992 m 6.3 Big Bear event) after the 1992 Mms 7.3 Landers main shock. In these and other cases, those large aftershocks may have been triggered by complex stress redistribution after the corresponding main shocks and it is not assured that they belong to the same statistical population of aftershocks. In some case, the main shocks themselves can rupture several fault segments and can be subdivided into several subevents. The typical examples are the 1992 Mms 7.3 Landers, 2002 Mms 7.9 Denali, 2010 Mms 7.1 Darfield, 2012 Mms 8.6 Indian Ocean, 2016 Mms 7.8 Kaikoura and other earthquakes. Similar limitations also apply to the frequency–magnitude statistics and the earthquake occurrence rates. For some aftershock sequences, the Omori–Utsu law is not the best-fitting model as some large aftershocks can generate their own aftershock sequences and the total observed rate deviates from the Omori–Utsu functional form. In many cases, the ETAS model can be a better rate model (Ogata 1988, 1999). However, the temporal ETAS model has more parameters than the Omori–Utsu model and this translates into higher dimensional integration of the Bayesian predictive distribution function (eq. 7), which is usually computationally expensive. The distribution of aftershock magnitudes can also deviate from the exponential functional form assumed in this study. Therefore, these limitations of the assumed parametric models can lead to improperly computed probabilities for the magnitudes of extreme earthquakes. Another important aspect of the majority of aftershock sequences is the early aftershock incompleteness (Kagan 2004; Peng et al. 2006; Hainzl 2016a,b). This affects the estimation of the model parameters if the magnitude of completeness is underestimated. As a result, this can significantly influence the calculation of probabilities for the extreme events. To recover partially, the true rate a variable magnitude of completeness can be considered (Helmstetter et al. 2006; Omi et al. 2014; Page et al. 2016). In our analysis, we use a uniform non-informative prior for the model parameters when we derive the Bayesian predictive distribution (eq. 7), for the magnitude of the expected largest aftershock. The only constraint we impose is that the model parameters are found between finite bounds. However, it is possible to incorporate a non-uniform prior based on historical seismicity. For example, Page et al. (2016) consider a worldwide distribution of aftershock sequences subdivided into different seismogenic zones and estimate the regional parameters of the Omori–Utsu (eq. 14), and Gutenberg–Richter (eq. 12), laws. Shcherbakov et al. (2013) carried out the analysis of the parameters of the above empirical laws for large subduction earthquakes. Therefore, one can incorporate this information into the computation of occurrence probabilities. Our suggested method of estimating the probabilities of the magnitudes of large aftershocks can be compared with the Reasenberg & Jones (1989) approach introduced to model the aftershock occurrence probabilities in California. The Reasenberg & Jones (1989) model has been reformulated recently by Page et al. (2016) to account for the early aftershock incompleteness and has been extended to the worldwide seismogenic regions. The calculation of the probabilities for the magnitudes of large earthquakes in the Reasenberg & Jones (1989) model (their eq. 4) is in fact equivalent to the use of the extreme value distribution given in eq. (3) with the same parametric models for the frequency–magnitude statistics (eq. 13), and the aftershock occurrence rate (eq. 14), with the following relationship between parameters $$a=\log _{10}\left[\beta \, K\, e^{-\beta (M_\mathrm{ms}-m_0)}\right]$$ in the limit M2 → ∞, where a and M2 are the parameters defined in eq. (3) of Reasenberg & Jones (1989). In our approach, we incorporate the uncertainties associated with the model parameters by deriving the Bayesian predictive distribution (eq. 18). For comparison, we computed the probabilities of having the magnitudes of large aftershocks greater than Mms − 1.0 for the above past prominent sequences by using both the Bayesian predictive distribution (eq. 18), and the extreme value distribution (eq. 3). This is reported in Table 3. For the extreme value distribution calculations, we used the point estimates of the model parameters {β, K, c, p} for each aftershock sequence during the first T = 2 d after the main shocks. The results show that the extreme value distribution consistently overestimates the probabilities of the magnitudes of large aftershocks. Table 3. Comparison of the probabilities of having the magnitudes of future large aftershocks larger than m = Mms − 1.0 by using both the Bayesian predictive distribution PB (eq. 18), and the extreme value distribution PEV (eq. 3). The training time interval T = 2 d and the forecasting time interval ΔT = 5 d were used. Date  Name  Mms  m0  PB(mex ≥ m)  PEV(mex ≥ m)  2016 November 13  Kaikoura  7.8  4.0  0.140  0.180  2016 April 16  Kumamoto  7.3  4.0  0.120  0.222  2011 March 11  Tohoku  9.0  5.5  0.054  0.103  2010 September 03  Darfield  7.1  3.5  0.495  0.643  2010 April 04  El Mayor  7.2  3.2  0.141  0.193  2008 June 14  Iwate  7.2  3.1  0.042  0.080  2007 July 16  Chuetsu  6.8  3.1  0.011  0.032  2007 March 25  Noto  6.9  3.1  0.090  0.162  2005 March 20  Fukuoka  7.0  3.1  0.009  0.023  2004 October 23  Niigata  6.8  3.1  0.429  0.554  2003 September 26  Tokachi-oki  8.0  4.0  0.122  0.197  2002 November 03  Denali  7.9  3.0  0.202  0.372  2000 October 06  Tottori  7.3  3.1  0.015  0.032  1999 October 17  Hector Mine  7.1  2.7  0.166  0.130  1995 January 16  Kobe  7.3  3.1  0.088  0.146  1994 January 17  Northridge  6.7  3.0  0.415  0.569  1992 June 28  Landers  7.3  3.0  0.131  0.099  Date  Name  Mms  m0  PB(mex ≥ m)  PEV(mex ≥ m)  2016 November 13  Kaikoura  7.8  4.0  0.140  0.180  2016 April 16  Kumamoto  7.3  4.0  0.120  0.222  2011 March 11  Tohoku  9.0  5.5  0.054  0.103  2010 September 03  Darfield  7.1  3.5  0.495  0.643  2010 April 04  El Mayor  7.2  3.2  0.141  0.193  2008 June 14  Iwate  7.2  3.1  0.042  0.080  2007 July 16  Chuetsu  6.8  3.1  0.011  0.032  2007 March 25  Noto  6.9  3.1  0.090  0.162  2005 March 20  Fukuoka  7.0  3.1  0.009  0.023  2004 October 23  Niigata  6.8  3.1  0.429  0.554  2003 September 26  Tokachi-oki  8.0  4.0  0.122  0.197  2002 November 03  Denali  7.9  3.0  0.202  0.372  2000 October 06  Tottori  7.3  3.1  0.015  0.032  1999 October 17  Hector Mine  7.1  2.7  0.166  0.130  1995 January 16  Kobe  7.3  3.1  0.088  0.146  1994 January 17  Northridge  6.7  3.0  0.415  0.569  1992 June 28  Landers  7.3  3.0  0.131  0.099  View Large 5 CONCLUSIONS In this work, we proposed a statistical scheme to estimate the probabilities of having extreme events above a certain magnitude in an earthquake sequence by using the information of early earthquakes in the sequence. We derived the Bayesian predictive distribution for the magnitudes of extreme events during the future time interval by assuming specific parametric forms of the frequency–magnitude statistics and earthquake occurrence rates. Specifically, we assume that the distribution of earthquake magnitudes can be approximated by the exponential distribution (eq. 12) and the temporal rate is given by the Omori–Utsu law (eq. 14). In addition, we treat the occurrence of earthquakes as an NHP process. The advantage of the derived Bayesian predictive distribution is that it fully incorporates the uncertainties associated with the model parameters. It also combines in a uniform framework the two important characteristics of the occurrence of extreme earthquakes, that is, the frequency–magnitude statistics and the corresponding earthquake occurrence rate. To illustrate the applicability of the derived framework, we considered several prominent past earthquake sequences in order to estimate the probabilities for the occurrence of extreme events. In the analysis, we studied individual aftershocks sequences generated by several large main shocks and estimated the corresponding probabilities of having large subsequent aftershocks during a fixed forecasting time interval by using the information of early events in the sequences. We also considered several past foreshock sequences which culminated in the occurrence of large main shocks. We were able to compute the probabilities of having events larger than the magnitude of the main shock for each sequence. Acknowledgements RS would like to acknowledge the hospitality of the Institute of Statistical Mathematics, where part of this work was done. We would also like to thank two anonymous reviewers for their constructive criticism and suggestions. RS has been supported partially by the NSERC Discovery grant, NSERC CRD grant 453034 and CFI grant 25816. REFERENCES Båth M., 1965. Lateral inhomogeneities of the upper mantle, Tectonophysics , 2, 483– 514. Google Scholar CrossRef Search ADS   Beavan J., Motagh M., Fielding E.J., Donnelly N., Collett D., 2012. Fault slip models of the 2010-2011 Canterbury, New Zealand, earthquakes from geodetic data and observations of postseismic ground deformation, N. Z. J. Geol. Geophys. , 55( 3), 207– 221. Google Scholar CrossRef Search ADS   Ben-Zion Y., 2008. Collective behavior of earthquakes and faults: Continuum-discrete transitions, progressive evolutionary changes, and different dynamic regimes, Rev. Geophys. , 46( 4), RG4006, doi:10.1029/2008RG000260. Campbell K.W., 1982. Bayesian analysis of extreme earthquake occurrences. Part I. Probabilistic hazard model, Bull. seism. Soc. Am. , 72( 5), 1689– 1705. Christophersen A., Smith E.G.C., 2008. Foreshock rates from aftershock abundance, Bull. seism. Soc. Am. , 98( 5), 2133– 2148. Google Scholar CrossRef Search ADS   Coles S., 2001. An Introduction to Statistical Modeling of Extreme Values , 208, Springer, London. Google Scholar CrossRef Search ADS   Console R., Lombardi A.M., Murru M., Rhoades D., 2003. Båth’s law and the self-similarity of earthquakes, J. geophys. Res. , 108( B2), 2128, doi:10.1029/2001JB001651. Google Scholar CrossRef Search ADS   Daley D.J., Vere-Jones D., 2003. An Introduction to the Theory of Point Processes , 1, 2nd edn, Springer, New York. David H.A., Nagaraja H.N., 2003. Order Statistics , 3rd edn, Wiley, New Jersey. Google Scholar CrossRef Search ADS   Ebrahimian H., Jalayer F., Asprone D., Lombardi A.M., Marzocchi W., Prota A., Manfredi G., 2014. Adaptive daily forecasting of seismic aftershock hazard, Bull. seism. Soc. Am. , 104( 1), 145– 161. Google Scholar CrossRef Search ADS   Felzer K.R., Abercrombie R.E., Ekstrom G., 2004. A common origin for aftershocks, foreshocks, and multiplets, Bull. seism. Soc. Am. , 94( 1), 88– 98. Google Scholar CrossRef Search ADS   Gerstenberger M.C., Wiemer S., Jones L.M., Reasenberg P.A., 2005. Real-time forecasts of tomorrow’s earthquakes in California, Nature , 435( 7040), 328– 331. Google Scholar CrossRef Search ADS PubMed  Gutenberg B., Richter C.F., 1954. Seismicity of the Earth and Associated Phenomenon , 2nd edn, Princeton University Press, Princeton, NJ. Hainzl S., 2016a. Rate-dependent incompleteness of earthquake catalogs, Seismol. Res. Lett. , 87( 2), 337– 344. Google Scholar CrossRef Search ADS   Hainzl S., 2016b. Apparent triggering function of aftershocks resulting from rate-dependent incompleteness of earthquake catalogs, J. geophys. Res. , 121( 9), 6499– 6509. Google Scholar CrossRef Search ADS   Hauksson E., Yang W., Shearer P.M., 2012. Waveform relocated earthquake catalog for Southern California (1981 to 2011), Bull. seism. Soc. Am. , 102( 5), 2239– 2244. Google Scholar CrossRef Search ADS   Helmstetter A., Sornette D., 2003a. Båth’s law derived from the Gutenberg-Richter law and from aftershock properties, Geophys. Res. Lett. , 30( 20), 2069, doi:10.1029/2003GL018186. Google Scholar CrossRef Search ADS   Helmstetter A., Sornette D., 2003b. Foreshocks explained by cascades of triggered seismicity, J. geophys. Res. , 108( B10), 2457, doi:10.1029/2003JB002409. Helmstetter A., Kagan Y.Y., Jackson D.D., 2006. Comparison of short-term and time-independent earthquake forecast models for southern California, Bull. seism. Soc. Am. , 96( 1), 90– 106. Google Scholar CrossRef Search ADS   Holschneider M., Zöller G., Hainzl S., 2011. Estimation of the maximum possible magnitude in the framework of a doubly truncated Gutenberg-Richter model, Bull. seism. Soc. Am. , 101( 4), 1649– 1659. Google Scholar CrossRef Search ADS   Holschneider M., Narteau C., Shebalin P., Peng Z., Schorlemmer D., 2012. Bayesian analysis of the modified Omori law, J. geophys. Res. , 117, B06317, doi:10.1029/2011JB009054. Google Scholar CrossRef Search ADS   Kagan Y.Y., 2004. Short-term properties of earthquake catalogs and models of earthquake source, Bull. seism. Soc. Am. , 94( 4), 1207– 1228. Google Scholar CrossRef Search ADS   Kijko A., 2004. Estimation of the maximum earthquake magnitude, mmax, Pure appl. Geophys. , 161( 8), 1655– 1681. Google Scholar CrossRef Search ADS   Luo J.W., Zhuang J., 2016. Three regimes of the distribution of the largest event in the critical ETAS model, Bull. seism. Soc. Am. , 106( 3), 1364– 1369. Google Scholar CrossRef Search ADS   Marzocchi W., Zhuang J., 2011. Statistics between mainshocks and foreshocks in Italy and Southern California, Geophys. Res. Lett. , 38, L09310, doi:10.1029/2011GL047165. Google Scholar CrossRef Search ADS   Molchan G.M., Kronrod T.L., Nekrasova A.K., 1999. Immediate foreshocks: time variation of the b-value, Phys. Earth planet. Inter. , 111( 3–4), 229– 240. Google Scholar CrossRef Search ADS   Nanjo K.Z., Yoshida A., 2017. Anomalous decrease in relatively large shocks and increase in the p and b values preceding the April 16, 2016, M7.3 earthquake in Kumamoto, Japan, Earth Planets Space , 69, 13, doi:10.1186/s40623-017-0598-2. Google Scholar CrossRef Search ADS   Nanjo K.Z.et al.  , 2016. Seismicity prior to the 2016 Kumamoto earthquakes, Earth Planets Space , 68, 187, doi:10.1186/s40623-016-0558-2. Google Scholar CrossRef Search ADS   Ogata Y., 1988. Statistical-models for earthquake occurrences and residual analysis for point-processes, J. Am. Stat. Assoc. , 83( 401), 9– 27. Google Scholar CrossRef Search ADS   Ogata Y., 1999. Seismicity analysis through point-process modeling: a review, Pure appl. Geophys. , 155( 2–4), 471– 507. Google Scholar CrossRef Search ADS   Ogata Y., Katsura K., 2012. Prospective foreshock forecast experiment during the last 17 years, Geophys. J. Int. , 191( 3), 1237– 1244. Ogata Y., Katsura K., 2014. Comparing foreshock characteristics and foreshock forecasting in observed and simulated earthquake catalogs, J. geophys. Res. , 119( 11), 8457– 8477. Google Scholar CrossRef Search ADS   Ogata Y., Utsu T., Katsura K., 1995. Statistical features of foreshocks in comparison with other earthquake clusters, Geophys. J. Int. , 121( 1), 233– 254. Google Scholar CrossRef Search ADS   Ogata Y., Utsu T., Katsura K., 1996. Statistical discrimination of foreshocks from other earthquake clusters, Geophys. J. Int. , 127( 1), 17– 30. Google Scholar CrossRef Search ADS   Omi T., Ogata Y., Hirata Y., Aihara K., 2013. Forecasting large aftershocks within one day after the main shock, Sci. Rep. , 3, 2218, doi:10.1038/srep02218. Google Scholar CrossRef Search ADS PubMed  Omi T., Ogata Y., Hirata Y., Aihara K., 2014. Estimating the ETAS model from an early aftershock sequence, Geophys. Res. Lett. , 41( 3), 850– 857. Google Scholar CrossRef Search ADS   Omi T., Ogata Y., Shiomi K., Enescu B., Sawazaki K., Aihara K., 2016. Automatic aftershock forecasting: a test using real-time seismicity data in Japan, Bull. seism. Soc. Am. , 106( 6), 2450– 2458. Google Scholar CrossRef Search ADS   Page M.T., van der Elst N., Hardebeck J., Felzer K., Michael A.J., 2016. Three ingredients for improved global aftershock forecasts: tectonic region, time-dependent catalog incompleteness, and intersequence variability, Bull. seism. Soc. Am. , 106( 5), 2290– 2301. Google Scholar CrossRef Search ADS   Peng Z.G., Vidale J.E., Houston H., 2006. Anomalous early aftershock decay rate of the 2004 Mw 6.0 Parkfield, California, earthquake, Geophys. Res. Lett. , 33( 17), L17307, doi:10.1029/2006GL026744. Google Scholar CrossRef Search ADS   Pisarenko V.F., 1991. Statistical evaluation of maximum possible earthquakes, Izv. Earth Phys. , 27( 9), 757– 763. Pisarenko V.F., Lyubushin A.A., Lysenko V.B., Golubeva T.V., 1996. Statistical estimation of seismic hazard parameters: maximum possible magnitude and related parameters, Bull. seism. Soc. Am. , 86( 3), 691– 700. Reasenberg P.A., 1999. Foreshock occurrence before large earthquakes, J. geophys. Res. , 104( B3), 4755– 4768. Google Scholar CrossRef Search ADS   Reasenberg P.A., Jones L.M., 1989. Earthquake hazard after a mainshock in California, Science , 243( 4895), 1173– 1176. Google Scholar CrossRef Search ADS PubMed  Reiss R.-D., Thomas M., 2007. Statistical Analysis of Extreme Values , 3rd edn, Springer, Basel. Rundle J.B., Turcotte D.L., Shcherbakov R., Klein W., Sammis C., 2003. Statistical physics approach to understanding the multiscale dynamics of earthquake fault systems, Rev. Geophys. , 41( 4), 1019, doi:10.1029/2003RG000135. Google Scholar CrossRef Search ADS   Saichev A., Sornette D., 2005. Distribution of the largest aftershocks in branching models of triggered seismicity: theory of the universal Båth law, Phys. Rev. E , 71( 5), 056127, doi:10.1103/PhysRevE.71.056127. Google Scholar CrossRef Search ADS   Shcherbakov R., 2014. Bayesian confidence intervals for the magnitude of the largest aftershock, Geophys. Res. Lett. , 41( 18), 6380– 6388. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., 2004. A modified form of Båth’s law, Bull. seism. Soc. Am. , 94( 5), 1968– 1975. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., Rundle J.B., 2004. A generalized Omori’s law for earthquake aftershock decay, Geophys. Res. Lett. , 31( 11), L11613, doi:10.1029/2004GL019808. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., Rundle J.B., 2005a. Aftershock statistics, Pure appl. Geophys. , 162( 6–7), 1051– 1076. Google Scholar CrossRef Search ADS   Shcherbakov R., Yakovlev G., Turcotte D.L., Rundle J.B., 2005b. Model for the distribution of aftershock interoccurrence times, Phys. Rev. Lett. , 95, 218501, doi:10.1103/PhysRevLett.95.218501. Google Scholar CrossRef Search ADS   Shcherbakov R., Nguyen M., Quigley M., 2012. Statistical analysis of the 2010 Mw 7.1 Darfield earthquake aftershock sequence, N. Z. J. Geol. Geophys. , 55( 3), 305– 311. Google Scholar CrossRef Search ADS   Shcherbakov R., Goda K., Ivanian A., Atkinson G.M., 2013. Aftershock statistics of major subduction earthquakes, Bull. seism. Soc. Am. , 103( 6), 3222– 3234. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., Rundle J.B., 2015. Complexity and earthquakes, in Earthquake Seismology , 4 of Treatise on Geophysics, 2nd edn, pp. 627– 653, ed. Kanamori H., Elsevier. Google Scholar CrossRef Search ADS   Shearer P.M., 2012. Self-similar earthquake triggering, Båth’s law, and foreshock/aftershock magnitudes: Simulations, theory, and results for southern California, J. geophys. Res. , 117, B06310, doi:10.1029/2011jb008957. Google Scholar CrossRef Search ADS   Shebalin P., Narteau C., Holschneider M., Schorlemmer D., 2011. Short-term earthquake forecasting using early aftershock statistics, Bull. seism. Soc. Am. , 101( 1), 297– 312. Google Scholar CrossRef Search ADS   Tahir M., Grasso J.R., Amorese D., 2012. The largest aftershock: how strong, how far away, how delayed?, Geophys. Res. Lett. , 39, L04301, doi:10.1029/2011gl050604. Google Scholar CrossRef Search ADS   Utsu T., Ogata Y., Matsu’ura R.S., 1995. The centenary of the Omori formula for a decay law of aftershock activity, J. Phys. Earth , 43( 1), 1– 33. Google Scholar CrossRef Search ADS   Vere-Jones D., 1969. A note on the statistical interpretation of Båth’s law, Bull. seism. Soc. Am. , 59, 1535– 1541. Vere-Jones D., 1975. Stochastic models for earthquake sequences, Geophys. J. Int. , 42( 2), 811– 826. Google Scholar CrossRef Search ADS   Vere-Jones D., Zhuang J., 2008. Distribution of the largest event in the critical epidemic-type aftershock-sequence model, Phys. Rev. E , 78( 4), 047102, doi:10.1103/PhysRevE.78.047102. Google Scholar CrossRef Search ADS   Zhuang J., Ogata Y., 2006. Properties of the probability distribution associated with the largest event in an earthquake cluster and their implications to foreshocks, Phys. Rev. E , 73( 4), 046134, doi:10.1103/PhysRevE.73.046134. Google Scholar CrossRef Search ADS   Zhuang J., Christophersen A., Savage M.K., Vere-Jones D., Ogata Y., Jackson D.D., 2008. Differences between spontaneous and triggered earthquakes: their influences on foreshock probabilities, J. geophys. Res. , 113( B11), B11302, doi:10.1029/2008jb005579. Google Scholar CrossRef Search ADS   Zhuang J., Ogata Y., Wang T., 2017. Data completeness of the Kumamoto earthquake sequence in the JMA catalog and its influence on the estimation of the ETAS parameters, Earth Planets Space , 69, 36, doi:10.1186/s40623-017-0614-6. Google Scholar CrossRef Search ADS   Zöller G., Holschneider M., Hainzl S., 2013. The maximum earthquake magnitude in a time horizon: theory and case studies, Bull. seism. Soc. Am. , 103( 2A), 860– 875. Google Scholar CrossRef Search ADS   Zöller G., Holschneider M., Hainzl S., Zhuang J., 2014. The largest expected earthquake magnitudes in Japan: the statistical perspective, Bull. seism. Soc. Am. , 104( 2), 769– 779. Google Scholar CrossRef Search ADS   © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Geophysical Journal International Oxford University Press

# Constraining the magnitude of the largest event in a foreshock–main shock–aftershock sequence

, Volume 212 (1) – Jan 1, 2018
13 pages

/lp/ou_press/constraining-the-magnitude-of-the-largest-event-in-a-foreshock-main-hazSxy0Z7w
Publisher
The Royal Astronomical Society
ISSN
0956-540X
eISSN
1365-246X
D.O.I.
10.1093/gji/ggx407
Publisher site
See Article on Publisher Site

### Abstract

Abstract Extreme value statistics and Bayesian methods are used to constrain the magnitudes of the largest expected earthquakes in a sequence governed by the parametric time-dependent occurrence rate and frequency–magnitude statistics. The Bayesian predictive distribution for the magnitude of the largest event in a sequence is derived. Two types of sequences are considered, that is, the classical aftershock sequences generated by large main shocks and the aftershocks generated by large foreshocks preceding a main shock. For the former sequences, the early aftershocks during a training time interval are used to constrain the magnitude of the future extreme event during the forecasting time interval. For the latter sequences, the earthquakes preceding the main shock are used to constrain the magnitudes of the subsequent extreme events including the main shock. The analysis is applied retrospectively to past prominent earthquake sequences. Probabilistic forecasting, Statistical methods, Earthquake hazards, Earthquake interaction, forecasting, and prediction, Statistical seismology 1 INTRODUCTION The occurrence of extreme earthquakes is a manifestation of the scaling of the frequency–magnitude statistics of seismicity (Gutenberg & Richter 1954; Vere-Jones 1969). In addition, it is controlled by the temporal and spatial variability of the seismicity rate. For example, large aftershocks typically happen with high probability right after the main shock and less often later in the aftershock sequences. This is a direct consequence of the fast decaying temporal aftershock rate, which can be approximated in many cases by the Omori–Utsu law (Utsu et al. 1995). From the physics point of view, the occurrence of earthquakes in general and extreme ones in particular is the outcome of the multiscale physical processes operating in the seismogenic crust and upper mantle (Ben-Zion 2008; Rundle et al. 2003; Shcherbakov et al. 2015). Among those, the stress transfer mechanism plays a critical role. Large aftershocks constitute significant hazard and can inflict additional damage to the already weakened infrastructure by the main shock. Therefore, constraining the magnitude of the largest event in an earthquake sequence is a crucial problem in statistical seismology (Reasenberg & Jones 1989; Shcherbakov 2014; Page et al. 2016). In this work, we combine extreme value statistics with Bayesian methods to derive the Bayesian predictive distribution for the magnitude of the largest expected event in a sequence of earthquakes. To accomplish this, we use the information of the early events in the sequence to constrain the variability of the model parameters describing the frequency–magnitude statistics and the occurrence rates. We assume that the occurrence of earthquakes in the sequence can be described by a non-homogeneous Poisson (NHP) marked point process (Utsu et al. 1995; Shcherbakov et al. 2005b), where magnitudes are not correlated and their frequency–magnitude statistics and occurrence rates can be described by a parametric model. One typical example of such a sequence of events is ubiquitous aftershock sequences. Similarly, large foreshocks, which in turn generate their own aftershock sequences, can also initiate a sequence of earthquakes culminating in the occurrence of a main shock. In both cases, the decay rates can be approximated well by the Omori–Utsu law and the frequency–magnitude statistics follow Gutenberg–Richter scaling. We analyse both types of sequences associated with prominent past main shocks to estimate the probabilities of having a large aftershock following the main shock and also to estimate the probability of the main shock magnitude following the preceding foreshocks. As a main result of this work, we provide a robust scheme in estimating the above probabilities, where we incorporate all the uncertainties associated with the model parameters into the calculation of respective probabilities. The suggested framework can be incorporated into the existing or future operating forecasting methods devoted to the estimation of magnitudes of extreme earthquakes. Extreme value theory and Bayesian analysis play an important role in statistical seismology (Campbell 1982; Daley & Vere-Jones 2003). They were used to address the problem of estimating the maximum expected magnitude of a main shock (Campbell 1982; Pisarenko 1991; Pisarenko et al. 1996; Kijko 2004), where the Bayesian approach is the most promising (Campbell 1982; Holschneider et al. 2011; Zöller et al. 2013, 2014). The search for reliable probabilistic models to forecast the occurrence of large aftershocks is justified by the hazard they pose (Gerstenberger et al. 2005; Shebalin et al. 2011; Page et al. 2016). In one such approach, the aftershock rate is recovered using the information of early aftershocks and can be used to estimate the probability of larger subsequent events (Omi et al. 2013; Ebrahimian et al. 2014; Omi et al. 2016). The occurrence of aftershocks increases significantly the seismic hazard and have to be taken into account in any operational earthquake forecasting schemes (Reasenberg & Jones 1989; Page et al. 2016). One of the first systematic empirical aftershock studies states that the magnitude of the largest aftershock on average is 1.2 less than the magnitude of the main shock (Båth 1965) and is known as Båth’s law. It was also suggested that the observed difference was independent of the number of events and its mean value was proportional to the inverse of the b-value assuming that aftershocks are independent and identically distributed random variables drawn from the exponential distribution (Vere-Jones 1969, 1975). Recent studies have explored this difference in more detail (Console et al. 2003; Shcherbakov & Turcotte 2004; Tahir et al. 2012; Shearer 2012; Shcherbakov et al. 2013). The existence of Båth’s law was also tested in the synthetic aftershock data produced by the epidemic-type aftershock sequence (ETAS) model (Helmstetter & Sornette 2003a) as well as derived theoretically (Saichev & Sornette 2005; Zhuang & Ogata 2006; Vere-Jones & Zhuang 2008; Luo & Zhuang 2016). In most of these studies, the point estimator for the difference between the magnitude of the main shock and the largest aftershock was analysed. However, those studies neither addressed the problem of constraining the variability of the difference nor the problem of quantifying the uncertainty caused by the estimation error of model parameters. The occurrence of statistically significant foreshocks before large subsequent main shocks is not as consistent as the occurrence of aftershocks (Ogata et al. 1995, 1996; Molchan et al. 1999; Reasenberg 1999; Felzer et al. 2004). However, there are clear examples when past large main shocks were preceded by relatively large foreshocks, which generated their own aftershock sequences. Typically, the analysis of foreshock sequence is performed by stacking many sequences of events based on specific clustering algorithms (Molchan et al. 1999; Helmstetter & Sornette 2003b; Christophersen & Smith 2008; Zhuang et al. 2008; Marzocchi & Zhuang 2011; Ogata & Katsura 2012, 2014). The paper is organized as follows. In Section 2, we derive the Bayesian predictive distribution for the magnitude of the largest event in a sequence by assuming specific parametric models for the frequency–magnitude statistics of earthquakes and time-dependent earthquake rates. In Section 3, we apply the developed framework to several prominent earthquake sequences to constrain the magnitude of the largest event in the sequences. We provide the final remarks and future directions in Sections 4 and 5. 2 BAYESIAN PREDICTIVE DISTRIBUTION FOR THE MAGNITUDE OF THE LARGEST EVENT IN A SEQUENCE 2.1 Extreme value and Bayesian analyses Earthquakes can be considered as an outcome of a stochastic marked point process. Each earthquake is characterized by its size or magnitude mi, which is drawn from a particular distribution, and by its occurrence time ti. In most cases, it is assumed that the magnitudes are not correlated and the temporal variation of the occurrence of events is governed by a time-dependent rate, λω(t), where ω specifies a set of parameters. With these assumptions, the earthquakes can be modeled as an NHP process in time (Utsu et al. 1995; Shcherbakov et al. 2005b). The spatial extent of the occurrence of earthquakes can also be considered, however, in this study, we restrict ourselves to the time domain only. In the analysis that follows, we define a training time interval ]0 , T] during which early earthquakes in the sequence are considered. The times and magnitudes of these earthquakes Mn = {(ti, mi): i = 1, …, n} are used to constrain the model parameters. During the forecasting time interval ]T, T + ΔT], we estimate the probability of having the magnitude of the extreme event larger than a given value. Within sequences of earthquakes during a forecasting time interval ΔT, the distribution of the largest event can be studied using the framework of extreme value statistics (Coles 2001; David & Nagaraja 2003). Suppose that earthquake magnitudes {mi}, i = 1, 2, … are described by a parametric distribution function Fθ(m), where θ is a set of parameters. If one observes k earthquakes during a specified future time interval ]T, T + ΔT], then the maximum magnitude μk = max i = 1, …, k{mi} can be considered as a random variable. Using the order statistics of extremes, the probability that the maximum observed event μk is smaller or equal to m is (Daley & Vere-Jones 2003, p. 7)   $$\mathrm{Pr}(\mu _k \le m|\theta ) = \left[F_\theta (m)\right]^k.$$ (1) On the other hand, the probability of observing k events during the interval ]T, T + ΔT], assuming that they are generated by an NHP process with the productivity $$\Lambda _\omega (\Delta T)=\int _T^{T+\Delta T} \lambda _\omega (t)\, dt$$, is (Daley & Vere-Jones 2003, p. 22)   $$\mathrm{Pr}(k|\Delta T,\omega ) = \frac{[\Lambda _\omega (\Delta T)]^k}{k!}\, e^{-\Lambda _\omega (\Delta T)}.$$ (2) The probability that the maximum event magnitude is less than or equal to m for all possible number of events k during a given time interval ]T, T + ΔT] can be evaluated by combining eqs (1) and (2) and applying the total probability theorem (Campbell 1982; Coles 2001; Daley & Vere-Jones 2003; Reiss & Thomas 2007; Zöller et al. 2013)   \begin{eqnarray} P_\mathrm{EV}(m_\mathrm{ex} \le m|\theta ,\omega ) & = & \sum \limits _{k=0}^{\infty }\frac{[\Lambda _\omega (\Delta T)]^k}{k!}\, e^{-\Lambda _\omega (\Delta T)}\, [F_\theta (m)]^k \nonumber \\ & = & \exp \lbrace -\Lambda _\omega (\Delta T)\left[1-F_\theta (m)\right]\rbrace . \end{eqnarray} (3) The above eq. (3) can be used to compute the probability 1 − PEV(mex ≤ m|θ, ω) that the largest event will be larger than m given the estimates of the model parameters {θ, ω}. In real applications, the parameters are estimated with a degree of uncertainty. Therefore, when computing this probability one needs to incorporate those uncertainties into the analysis. In order to do this, we assume that the model parameters {θ, ω} and their uncertainties can be estimated from the early observed events during a training time interval ]0, T] preceding the forecasting time interval ]T, T + ΔT]. According to the Bayesian analysis, the posterior distribution function p(θ, ω|Mn) for the model parameters {θ, ω}, given the information for the magnitudes and times of n early events, Mn, observed during the interval ]0, T], has the following form:   $$p(\theta ,\omega |M_n) \propto L\left(M_n|\theta ,\omega \right) \pi (\theta ,\omega ),$$ (4)where π(θ, ω) is a prior distribution for the model parameters. In this study, we assume that π(θ, ω) is a uniform density function over the space of all the possible values. Thus, p(θ, ω|Mn) ∝ L(Mn|θ, ω), i = 1, …, n. The likelihood function L(Mn|θ, ω) for an NHP process driven by the time-dependent rate λω(t) with event magnitudes distributed according to Fθ(m), can be written (Daley & Vere-Jones 2003, p. 23)   $$L\left(M_n|\theta ,\omega \right) = e^{-\Lambda _\omega (T)}\, \prod \limits _{i=1}^n\lambda _\omega (t_i)\, \prod \limits _{i=1}^n f_\theta (m_i),$$ (5)where $$f_\theta (m)=\frac{dF_\theta (m)}{dm}$$ is the probability density function and $$\Lambda _\omega (T)=\int _0^T \lambda _\omega (t)\, dt$$ is the productivity of the NHP process during the training time interval ]0, T]. The posterior distribution function (eq. 4), can be combined with the probability (eq. 3), to obtain the Bayesian predictive distribution for the magnitude of the maximum event mex during the forecasting time interval ]T, T + ΔT] given the information for the magnitudes and times of each n early observed earthquakes Mn during the training time interval ]0, T] (Zöller et al. 2013):   \begin{eqnarray} P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) \!=\! \int \limits _\Omega \!\int \limits _\Theta P_\mathrm{EV}(m_\mathrm{ex}\le m|\theta ,\omega )\, p(\theta ,\omega |M_n)\, d\theta \, d\omega ,\nonumber\\ \end{eqnarray} (6)where Θ and Ω define the multidimensional domains of frequency–magnitude distribution and rate parameters. The Bayesian predictive distribution (eq. 6), includes both uncertainties associated with stochastic nature of the problem and parameter variations. It is a marginal distribution where one integrates over parameter uncertainties using the posterior distribution (eq. 4). Using eqs (3)–(5), the Bayesian predictive distribution (eq. 6), can be written in the following form   \begin{eqnarray} P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) \!=\! \frac{1}{A}\, \!\int \limits _\Omega \!\int \limits _\Theta e^{-\Lambda _\omega (\Delta T)\left[1-F_\theta (m)\right]}\, L\left(M_n|\theta ,\omega \right)\, d\theta \, d\omega ,\nonumber\\ \end{eqnarray} (7)where we used a non-informative prior distribution, π(θ, ω) = const, for the model parameters. When evaluating numerically the integrals in eqs (6), (7) and other similar integrals, we always consider a typical finite range of the model parameters. This effectively defines a flat prior, which is non-zero between finite lower and upper bounds. The normalization constant A is computed by integrating the likelihood function (eq. 5) over the model parameter space {Θ, Ω}   $$A = \int \limits _\Omega \int \limits _\Theta L\left(M_n|\theta ,\omega \right)\, d\theta \, d\omega .$$ (8) The corresponding probability density function $$p_\mathrm{B}(m)=\frac{d}{dm}P_\mathrm{B}(m)$$ is   \begin{eqnarray} p_\mathrm{B}(m|M_n) &=& \frac{1}{A} \int \limits _\Omega \int \limits _\Theta \Lambda _\omega (\Delta T) f_\theta (m) e^{-\Lambda _\omega (\Delta T)\left[1-F_\theta (m)\right]}\nonumber\\ &&\times\, L\left(M_n|\theta ,\omega \right) d\theta d\omega . \end{eqnarray} (9)Eqs (7) or (9) give the distribution for the magnitude of the largest expected aftershock during a future time interval ]T, T + ΔT] given the magnitudes and times Mn of early n earthquakes, which occurred during the training time interval ]0, T]. In addition, one can consider the future time interval ΔT to the next maximum expected event as a random variable (Zöller et al. 2013). The probability distribution function of having the next maximum earthquake to occur during this time interval and to exceed a specific value mex is   \begin{eqnarray} P_\mathrm{B}(\Delta T \le t|M_n,m_\mathrm{ex}) &=& 1 - \frac{1}{A}\, \int \limits _\Omega \int \limits _\Theta e^{-\Lambda _\omega (t)\left[1-F_\theta (m_\mathrm{ex})\right]}\,\nonumber\\ &&\times\, L\left(M_n|\theta ,\omega \right)\, d\theta \, d\omega , \end{eqnarray} (10)where $$\Lambda _\omega (t)=\int _T^{T+t} \lambda _\omega (t^{\prime })\, dt^{\prime }$$ is the productivity of the NHP process during the time interval ]T, T + t]. This equation has the same functional form as eq. (7). By fixing a given probability level one can estimate the interarrival time t = ΔT to the next extreme earthquake with the magnitude greater than mex. Similarly to eq. (9), one can also compute the probability density function $$p_\mathrm{B}(t)=\frac{d}{dt}P_\mathrm{B}(\Delta T \le t|M_n,m_\mathrm{ex})$$ as   \begin{eqnarray} p_\mathrm{B}(t|M_n,m_\mathrm{ex}) &=& \frac{1}{A} \int \limits _\Omega \int \limits _\Theta \lambda _\omega (t) \left[1-F_\theta (m)\right] e^{-\Lambda _\omega (t)\left[1-F_\theta (m)\right]}\nonumber\\ &&\times\, L\left(M_n|\theta ,\omega \right) d\theta d\omega . \end{eqnarray} (11) As stated above, the occurrence of earthquakes can be considered as a marked point process. The sequence Mn = {(ti, mi)} of arrival times ti and marks (magnitudes) mi can be approximated as an NHP process, where the arrival times form an NHP process and the marks mi are iid random variables with the common distribution function F. From this marked point process, one can extract a sequence of exceedance events, where each event has a magnitude above a certain threshold mex. This threshold model is characterized by exceedance arrival times and exceedances and can be modeled as an NHP process as well (Coles 2001; Reiss & Thomas 2007). The statistics of extreme events can be obtained from the analysis of exceedances over a pre-defined high threshold. In this framework, eq. (10) gives the distribution of interarrival time intervals between such extreme events. 2.2 Parametric earthquake model Here, we analyse a specific model for the distribution of earthquake magnitudes and occurrence rates. This is a reasonably good approximation for aftershock sequences generated by large earthquakes. The model is also applicable to the aftershocks generated by some large foreshock preceding a subsequent main shock. In this case, the main shock can be considered as an extreme aftershock triggered by a large foreshock. We assume that aftershock magnitudes are distributed according to the exponential distribution with the probability density function fθ(m) with the corresponding distribution function Fθ(m):   \begin{eqnarray} f_\theta (m) & = & \beta \exp \left[-\beta \, (m - m_0)\right], \end{eqnarray} (12)  \begin{eqnarray} F_\theta (m) & = & 1 - \exp \left[-\beta \, (m - m_0)\right], \end{eqnarray} (13)where θ = {β}. The parameter β is related to the b-value of the Gutenberg–Richter scaling relation, β = ln (10)b (Gutenberg & Richter 1954). m0 is a given lower magnitude cut-off set above the catalogue completeness level. It is typically observed that the rate of aftershocks λω(t) can be approximated by the Omori–Utsu law with parameters ω = {τ, c, p} (Utsu et al. 1995; Shcherbakov et al. 2004, 2005a; Holschneider et al. 2012):   $$\lambda _\omega (t) = \frac{1}{\tau }\frac{1}{\left(1+t/c\right)^p}.$$ (14) Using the above parametric model, the Bayesian predictive distribution (eq. 7), for the largest expected aftershock can be written   \begin{eqnarray} P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) & = & \frac{1}{A} \, \!\int \limits _0^\infty \int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, e^{-\Lambda _\omega (T)-\Lambda _\omega (\Delta T)\, \exp \left[-\beta \, (m - m_0)\right] } \nonumber \\ &&\times\, \beta ^n e^{\!-\!\beta n(\overline{m} \!-\! m_0)} \frac{1}{\tau ^n} \prod \limits _{i=1}^n \left(1\!+\!\frac{t_i}{c}\right)^{-p} dp dc d\beta d\tau ,\nonumber\\ \end{eqnarray} (15)with   \begin{eqnarray} A &=& \int \limits _0^\infty \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, e^{-\Lambda _\omega (T)}\, \beta ^n\, e^{-\beta \, n(\overline{m} - m_0)}\, \frac{1}{\tau ^n}\, \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p} \nonumber\\ &&\times\, dp\, dc\, d\beta \, d\tau , \end{eqnarray} (16)where $$\overline{m}=\frac{1}{n}\sum _{i=1}^n m_i$$ is the average magnitude above m0. The corresponding probability density function is   \begin{eqnarray} p_\mathrm{B}(m|M_n) & = & \frac{1}{A} \int \limits _0^\infty \int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \Lambda _\omega (\Delta T)\, e^{-\Lambda _\omega (T)-\Lambda _\omega (\Delta T) \exp \left[\!-\!\beta (m \!-\! m_0)\right] } \nonumber \\ &&\times\,\beta ^{n+1} e^{\!-\!\beta n(\overline{m} \!-\! m_0) \!-\! \beta (m \!-\! m_0)} \frac{1}{\tau ^n} \prod \limits _{i\!=\!1}^n \left(1\!+\!\frac{t_i}{c}\right)^{-p} \!\!dp dc d\beta d\tau. \nonumber\\ \end{eqnarray} (17)The limits of integration for the β, c and p parameters can be specified through the prior distribution π(θ, ω). For this purpose, one can use typical ranges of observed values or provide a full range of variability for these parameters. It is possible to integrate eq. (15) with respect to τ to obtain   \begin{eqnarray} {P_\mathrm{B}(m_\mathrm{ex}\le m|M_n) = \frac{\Gamma (n-1)}{A} \, \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p}} \nonumber \\ &&\times\, \frac{\beta ^n\, e^{-\beta \, n(\overline{m} - m_0)}}{\left[\varphi (c,p,0,T) + e^{-\beta (m-m_0)}\varphi (c,p,T,\Delta T)\right]^{n-1}} dp\, dc\, d\beta, \end{eqnarray} (18)where Γ(n) is the Gamma function and   $$\varphi (c,p;u,v) = \frac{c^p}{p-1}\left[(c+u)^{1-p}-(c+u+v)^{1-p}\right].$$ (19) For the normalization constant A, one can also integrate eq. (16) over τ to obtain:   \begin{eqnarray} A &=& \Gamma (n-1) \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}}\, \frac{\beta ^n e^{-\beta \, n(\overline{m} - m_0)}}{\left[\varphi (c,p\,0,T)\right]^{n-1}}\nonumber\\ &&\times\, \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p} dp\, dc\, d\beta . \end{eqnarray} (20) The corresponding probability density function (eq. 17), after integration over τ becomes   \begin{eqnarray} p_\mathrm{B}(m|M_n) & = & \frac{\Gamma (n)}{A} \int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \varphi (c,p,T,\Delta T) \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p} \nonumber \\ &\times &\,\frac{\beta ^{n\!+\!1} e^{\!-\!\beta (m \!-\! m_0)} e^{\!-\!\beta n(\overline{m} \!-\! m_0)}}{\left[\varphi (c,p,0,T) \!+\! e^{\!-\!\beta (m\!-\!m_0)}\varphi (c,p,T,\Delta T)\right]^{n}} dp dc d\beta .\nonumber\\ \end{eqnarray} (21) The derived predictive distribution eq. (18) and the corresponding probability density function eq. (21) can be used to estimate the probabilities of having an extreme earthquake above a certain magnitude during a specified future time interval ]T, ΔT]. The proposed method incorporates the full uncertainties associated with each parameter of the models used to describe the frequency–magnitude statistics and decay rates of the early events in the sequence during the training time interval ]0, T]. The integration in the above equations has to be performed numerically. The limits of integration are finite and chosen based on the typical values of each parameter observed for past aftershock sequences. This effectively defines a flat prior π(β, c, p) for the model parameters bounded between finite minimum and maximum values. And finally, we provide the Bayesian predictive distribution for the interarrival time interval to the next largest expected event (eq. 10), for the magnitude of the earthquake to be above a certain threshold mex:   \begin{eqnarray} {P_\mathrm{B}(\Delta T \le t|M_n,m_\mathrm{ex}) = 1 \!-\! \frac{\Gamma (n-1)}{A} \int \limits _0^{\beta _\mathrm{max}} \int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \prod \limits _{i=1}^n \left(1+\frac{t_i}{c}\right)^{-p}} \nonumber \\ &&\times\, \frac{\beta ^n\, e^{-\beta \, n(\overline{m} - m_0)}}{\left[\varphi (c,p,0,T) + e^{-\beta (m_\mathrm{ex}-m_0)}\varphi (c,p,T,t)\right]^{n-1}} dp\, dc\, d\beta, \end{eqnarray} (22)and the corresponding probability density function $$p_\mathrm{B}(t)=\frac{d}{dt}P_\mathrm{B}(t|M_n,m_\mathrm{ex})$$:   \begin{eqnarray} {p_\mathrm{B}(t|M_n,m_\mathrm{ex}) = \frac{\Gamma (n)}{A} \!\int \limits _0^{\beta _\mathrm{max}} \!\int \limits _{c_\mathrm{min}}^{c_\mathrm{max}}\!\int \limits _{p_\mathrm{min}}^{p_\mathrm{max}} \left(1\!+\!\frac{T+t}{c}\right)^{\!-\!p} \prod \limits _{i\!=\!1}^n \left(1+\frac{t_i}{c}\right)^{-p}} \nonumber \\ &&\times\, \frac{\beta ^{n} e^{\!-\!\beta (m_\mathrm{ex} \!-\! m_0)} e^{-\beta n(\overline{m} \!-\! m_0)}}{\left[\varphi (c,p,0,T) \!+\! e^{\!-\!\beta (m_\mathrm{ex}\!-\!m_0)}\varphi (c,p,T,t)\right]^{n}} dp dc d\beta . \end{eqnarray} (23) Eq. (22) can be used to estimate the time to the next extreme earthquake by fixing a certain probability level. This interval ΔT can be obtained by solving the equation PB(ΔT|Mn, mex) = α, where α can be set to a specific value, for example, α = 0.05. 3 APPLICATION TO PAST PROMINENT EARTHQUAKE SEQUENCES To illustrate the application of the method, which is developed in the previous section, we consider two types of earthquake sequences. The first type comprises usual main shock–aftershock sequences, where the decay rate of aftershocks can be approximated well by the Omori–Utsu law eq. (14) and the frequency–magnitude statistics is given by the exponential distribution eq. (12). For these sequences, we use the information of the early aftershocks observed right after the main shock during the training time interval ]0, T], to estimate the probability of having an extreme aftershock above a certain magnitude later in the sequence during the forecasting time interval ]T, ΔT]. For the second type of sequences, we consider the occurrence of large foreshocks, which in turn generate their own aftershock sequences and may trigger the main shock. Therefore, we can estimate the probability of having an event of the size of the main shock or larger. 3.1 The 2016 Mms 7.3 Kumamoto, Japan, earthquake sequence As the first example, we consider the recent 2016 Mms 7.3 Kumamoto, Japan, earthquake sequence. The main shock occurred on 2016 April 16. It was preceded by two large foreshocks of magnitude 6.5 and 6.4, which occurred in the preceding day before the main shock. The largest aftershock up until 2017 September 01 had magnitude 5.9, which occurred in the first day after the main shock. First, we consider the sequence initiated by the first foreshock of magnitude 6.5, which occurred 1.16 d before the main shock. This foreshock generated its own aftershock sequence with well-defined frequency–magnitude statistics and a decay rate (Nanjo et al. 2016; Nanjo & Yoshida 2017). Therefore, we use the magnitudes and times of those aftershocks during 1.16 d after the magnitude 6.5 foreshock to compute the Bayesian predictive distribution, eqs (18) and (21), for the subsequent extreme event (main shock). This is illustrated in Fig. 1. We fix T = 1.16 d and consider several forecasting time intervals ΔT = 1, 2 and 5 d with all events larger than m0 = 3.2. From the plots, we can estimate the probability of having an extreme event larger than a certain magnitude. For example, considering the forecasting time interval ΔT = 5 d, there is 5.1 per cent chance of having an event larger than 6.5 and 1.3 per cent chance of having an event larger than 7.3, which is the magnitude of the subsequent main shock. Figure 1. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the 2016 April 14, Mms 6.5 foreshock of the Mms 7.3 Kumamoto earthquake (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1.16 d with all the events above magnitude m ≥ 3.2 and for the different forecasting time intervals ΔT = 1, 2 and 5 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 1.16 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.2, 3.5 and 3.8. The probabilities to have extreme earthquakes above magnitudes m = 6.5 and 7.3 are given in the legend. Figure 1. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the 2016 April 14, Mms 6.5 foreshock of the Mms 7.3 Kumamoto earthquake (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1.16 d with all the events above magnitude m ≥ 3.2 and for the different forecasting time intervals ΔT = 1, 2 and 5 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 1.16 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.2, 3.5 and 3.8. The probabilities to have extreme earthquakes above magnitudes m = 6.5 and 7.3 are given in the legend. The completeness of the catalogue can also play an important role in the analysis of the early aftershocks during the training time interval T (Zhuang et al. 2017). To analyse the effect of the lower magnitude cut-off m0, we consider fixed training and forecasting time intervals and vary the lower magnitude cut-off m0. This is shown in Fig. 1(b) for the fixed T = 1.16 and ΔT = 5 d and the three magnitude cut-offs m0 = 3.2, 3.5 and 3.8. The analysis shows a moderate dependence on the cut-off magnitude and slight increase in the probabilities for the lowest cut-off 3.2. This can be attributed to the fact that there are missing events right after the magnitude 6.5 foreshock. This illustrates the importance of having a complete earthquake catalogue to perform the analysis. On the other hand, the lower cut-off allows to use more events, which in turn reduce the associated uncertainties of the parameters used to model the frequency–magnitude statistics and the decay rates but increases the estimation biases. Next, we consider the aftershock sequence generated by the main shock itself. We use the information of the early aftershocks after the main shock to compute the Bayesian predictive distribution for the magnitudes of the extreme aftershocks in the future time interval. In Fig. 2, we plot the cumulative distribution (eq. 18), and the probability density function (eq. 21), for several future time intervals ΔT = 1, 2, 5 and 10 d, where we use the aftershocks above magnitude m0 = 4.0 in the first T = 1 d after the main shock. The effect of the lower magnitude cut-off m0 on the estimated probabilities is shown in Fig. 2(b), where we consider m0 = 3.5, 4.0 and 4.5. Figure 2. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1 d with all the events above magnitude m ≥ 4.0 and for the different forecasting time intervals ΔT = 1, 2, 5 and 10 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 5 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.5, 4.0 and 4.5. For both figures, the probabilities to have extreme earthquakes above magnitudes m = 6.3 and 7.3 are given in the legend. Figure 2. View largeDownload slide Bayesian predictive distribution PB(mex ≤ m|Mn) (eq. 18), as dashed curves and the corresponding probability density function pB(m|Mn) (eq. 21), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). (a) Each curve corresponds to the same early training time interval of T = 1 d with all the events above magnitude m ≥ 4.0 and for the different forecasting time intervals ΔT = 1, 2, 5 and 10 d. (b) Each curve corresponds to the fixed training and forecasting time intervals of T = 5 and ΔT = 5 d and varying lower magnitude cut-offs m0 = 3.5, 4.0 and 4.5. For both figures, the probabilities to have extreme earthquakes above magnitudes m = 6.3 and 7.3 are given in the legend. In addition, we analysed the distribution of the interarrival time interval ΔT to the next largest expected earthquake above a certain magnitude mex. Using the derived equations for the cumulative distribution (eq. 23), and the probability density function (eq. 23), we plot them in Fig. 3 for the analysis of the aftershock sequence generated by the Kumamoto main shock. We consider a fixed training time interval of T = 1 d and estimate the distribution of interarrival times to the next extreme earthquake above several magnitude thresholds mex = 6.0, 6.3 and 7.3. For a fixed probability level of 5 per cent the estimated interarrival times are ΔT = 0.21, 0.42 and 5.32 d, respectively. Figure 3. View largeDownload slide Bayesian predictive distribution PB(ΔT ≤ t|Mn, mex) (eq. 22), as dashed curves and the corresponding probability density function pB(t|Mn, mex) (eq. 23), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). The results correspond to the same early training time interval of T = 1 d with all the events above magnitude m0 ≥ 4.0 and for different forecasting extreme magnitudes mex = 6.0, 6.3 and 7.3. The horizontal solid line corresponds to the probability level of 5 per cent. The intercepts of this line with the cumulative distribution functions give the forecasting or interarrival time intervals ΔT = 0.21, 0.42 and 5.32 d to the next extreme earthquake with the magnitude larger than mex. Figure 3. View largeDownload slide Bayesian predictive distribution PB(ΔT ≤ t|Mn, mex) (eq. 22), as dashed curves and the corresponding probability density function pB(t|Mn, mex) (eq. 23), as solid curves, for the sequence initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). The results correspond to the same early training time interval of T = 1 d with all the events above magnitude m0 ≥ 4.0 and for different forecasting extreme magnitudes mex = 6.0, 6.3 and 7.3. The horizontal solid line corresponds to the probability level of 5 per cent. The intercepts of this line with the cumulative distribution functions give the forecasting or interarrival time intervals ΔT = 0.21, 0.42 and 5.32 d to the next extreme earthquake with the magnitude larger than mex. To analyse the interplay between the two time intervals T and ΔT, we plot in Fig. 4 the dependence of the probability to have an extreme aftershock above magnitude mex ≥ 6.3 on varying T ∈ [1, 20] and ΔT ∈ [1, 15] d. The probability to have an extreme event typically decreases with increasing the training time interval T for a fixed forecasting interval ]T, ΔT]. This is a direct consequence of the power-law time-decaying Omori–Utsu law. On the other hand, the increasing forecasting time interval ]T, ΔT] for a fixed training time interval T results in the increase of the probability to have an extreme event. Figure 4. View largeDownload slide (a) The probability to have an extreme aftershock above a magnitude mex = 6.3 for the varying initial T and future time intervals ΔT. The sequence is initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). All the aftershocks above magnitude m0 = 4.0 were used. (b) The probability to have an extreme event above the magnitude mex = 7.3 for the sequence initiated by the magnitude 6.5 Kumamoto foreshock (2016 April 14). The maximum initial time interval T = 1.16 d was used. All the earthquake above magnitude m0 = 3.1 were used. Figure 4. View largeDownload slide (a) The probability to have an extreme aftershock above a magnitude mex = 6.3 for the varying initial T and future time intervals ΔT. The sequence is initiated by the magnitude Mms 7.3 Kumamoto main shock (2016 April 16). All the aftershocks above magnitude m0 = 4.0 were used. (b) The probability to have an extreme event above the magnitude mex = 7.3 for the sequence initiated by the magnitude 6.5 Kumamoto foreshock (2016 April 14). The maximum initial time interval T = 1.16 d was used. All the earthquake above magnitude m0 = 3.1 were used. 3.2 Comparative analysis of other sequences To extend our analysis, we consider other past prominent earthquake sequences and provide the probabilities of the occurrence of the largest events within a specified forecasting time interval ]T, T + ΔT]. The considered aftershock sequences generated by past large main shocks are summarized in Table 1. In Table 2, we list sequences associated with large foreshocks of the corresponding main shocks. Table 1. Summary of the data for the aftershock sequences used in the analysis. The columns are: the date of the occurrence of a main shock, magnitude of the main shock Mms, lower magnitude cut-off for aftershocks m0, the magnitude of the largest aftershock and the time interval in days when the largest aftershock occurred, respectively. Footnotes provide the references to the corresponding earthquake catalogues used in the study. Date  Name  Mms  m0  mas  Days  2016 November 13  Kaikourad  7.8  4.0  6.3  0.563  2016 April 16  Kumamotoa  7.3  4.0  5.9  0.014  2011 March 11  Tohokua  9.0  5.5  7.6  0.020  2010 September 3  Darfieldd  7.1  3.5  6.2  171.3  2010 April 4  El Mayorb  7.2  3.2  5.7  71.24  2008 June 14  Iwatea  7.2  3.1  5.7  0.025  2007 July 16  Chuetsua  6.8  3.1  5.8  0.225  2007 March 25  Notoa  6.9  3.1  5.3  0.354  2005 March 20  Fukuokaa  7.0  3.1  5.8  30.80  2004 October 23  Niigataa  6.8  3.1  6.5  0.026  2003 September 26  Tokachi-okia  8.0  4.0  7.1  0.054  2002 November 3  Denalic  7.9  3.0  5.8  0.014  2000 October 6  Tottoria  7.3  3.1  5.6  1.991  1999 October 17  Hector Mineb  7.1  2.7  5.8  0.009  1995 January 16  Kobea  7.3  3.1  5.4  0.078  1994 January 17  Northridgeb  6.7  3.0  5.9  0.001  1992 June 28  Landersb  7.3  3.0  6.3  0.131  Date  Name  Mms  m0  mas  Days  2016 November 13  Kaikourad  7.8  4.0  6.3  0.563  2016 April 16  Kumamotoa  7.3  4.0  5.9  0.014  2011 March 11  Tohokua  9.0  5.5  7.6  0.020  2010 September 3  Darfieldd  7.1  3.5  6.2  171.3  2010 April 4  El Mayorb  7.2  3.2  5.7  71.24  2008 June 14  Iwatea  7.2  3.1  5.7  0.025  2007 July 16  Chuetsua  6.8  3.1  5.8  0.225  2007 March 25  Notoa  6.9  3.1  5.3  0.354  2005 March 20  Fukuokaa  7.0  3.1  5.8  30.80  2004 October 23  Niigataa  6.8  3.1  6.5  0.026  2003 September 26  Tokachi-okia  8.0  4.0  7.1  0.054  2002 November 3  Denalic  7.9  3.0  5.8  0.014  2000 October 6  Tottoria  7.3  3.1  5.6  1.991  1999 October 17  Hector Mineb  7.1  2.7  5.8  0.009  1995 January 16  Kobea  7.3  3.1  5.4  0.078  1994 January 17  Northridgeb  6.7  3.0  5.9  0.001  1992 June 28  Landersb  7.3  3.0  6.3  0.131  aThe Japan Meteorological Agency earthquake catalog (http://www.jma.go.jp/en/quake/). bHauksson et al. waveform relocated catalog for Southern California (http://web.gps.caltech.edu/∼hauksson/catalogs/index.html) (Hauksson et al., 2012). cAlaska Earthquake Information Center catalog (http://www.aeic.alaska.edu/html_docs/db2catalog.html). dNew Zealand GEONET earthquake catalog (http://quakesearch.geonet.org.nz/). View Large Table 2. Summary of the data for the aftershock sequences generated by the foreshocks of the large main shocks. The columns are: the date of occurrence of a foreshock, the name of the corresponding main shock, the magnitude of the foreshock mfs, the magnitude of the main shock Mms, the lower magnitude cut-off for aftershocks m0 and the training time interval T, respectively. Date  Name  mfs  Mms  m0  T (days)  2016 April 14  Kumamotoa  6.5  7.3  3.1  1.16  2011 March 9  Tohokua  7.3  9.0  4.0  2.12  2010 April 3  El Mayorb  4.3  7.2  2.0  0.98  2002 October 23  Denalic  6.7  7.9  2.5  11.31  1992 April 23  Landersb  6.1  7.3  2.5  2.0  Date  Name  mfs  Mms  m0  T (days)  2016 April 14  Kumamotoa  6.5  7.3  3.1  1.16  2011 March 9  Tohokua  7.3  9.0  4.0  2.12  2010 April 3  El Mayorb  4.3  7.2  2.0  0.98  2002 October 23  Denalic  6.7  7.9  2.5  11.31  1992 April 23  Landersb  6.1  7.3  2.5  2.0  a,b,cThe same earthquake catalogs given in Table 1. View Large For aftershock sequences generated by large main shocks, we consider the training time intervals of T = 1, 2, 5 and 10 d and estimate the probabilities PB(mex ≥ m) of having extreme aftershocks above magnitude m for the fixed forecasting time interval ΔT = 5 d. We report these probabilities in Fig. 5(a) for m = Mms − 1.0 and in Fig. 5(b) for m = Mms. It is evident from the figures that the estimated probabilities are reduced by increasing the early training time intervals. This is related to the fact that for longer training time intervals the corresponding aftershock decay rate starts to decrease according to the Omori–Utsu law which in turn reduces the corresponding probabilities of having large subsequent events. The observed variation of the computed probabilities for different main shocks is directly related to the variability of the corresponding frequency–magnitude statistics and decay rates of each aftershock sequence. In general, the aftershock sequences with lower b-values and shallow decay rates (smaller p values) will produce higher probabilities for subsequent extreme aftershocks during the forecasting time interval ΔT. Figure 5. View largeDownload slide The probabilities of occurrence of extreme aftershocks after several prominent main shocks are shown. The left vertical axis indicate the date of the main shock and the right vertical axis give the magnitude above which the probabilities are computed. The probabilities are estimated using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = Mms − 1.0 and (b) m = Mms. The training time intervals T = 1, 2, 5 and 10 d and the fixed forecasting time interval ΔT = 5 d were used and indicated with different colours from light to dark, respectively. Figure 5. View largeDownload slide The probabilities of occurrence of extreme aftershocks after several prominent main shocks are shown. The left vertical axis indicate the date of the main shock and the right vertical axis give the magnitude above which the probabilities are computed. The probabilities are estimated using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = Mms − 1.0 and (b) m = Mms. The training time intervals T = 1, 2, 5 and 10 d and the fixed forecasting time interval ΔT = 5 d were used and indicated with different colours from light to dark, respectively. In some cases, main shocks are preceded by large foreshocks, which generate their own aftershock sequences. In this context, we can treat the occurrence of the subsequent main shock as an extreme aftershock by assuming that it belongs to the same sequence of events governed by the corresponding frequency–magnitude statistics and decay rates. Therefore, we can use the information of earthquakes before the occurrence of the main shock to estimate the probability of having an extreme event equal to or larger than the magnitude of the main shock. This is reported in Fig. 6 and the parameters of the corresponding foreshock sequences are given in Table 2. In Fig. 6(a), we computed the probabilities of having events larger than the magnitude of the largest foreshock. In Fig. 6(b), we provide the probabilities of having events larger than the subsequent main shock magnitudes. For most of the sequences, we used the training time interval T that ends before the occurrence of the main shocks. We also used several forecasting time intervals ΔT = 1, 5 and 10 d. In most case, the obtained probabilities are rather low (Fig. 6) which is consistent with other studies (Zhuang et al. 2008). This is especially evident for the analysis of the foreshock sequence before the 2002 Mms 7.9 Denali, Alaska, main shock. The obtained probabilities are several orders of magnitude lower compared to the other foreshock sequences summarized in Fig. 6, where we employed the logarithmic scale to emphasize this variability. Figure 6. View largeDownload slide The probabilities of occurrence of extreme earthquakes after the foreshocks preceding several prominent main shocks are shown. The left vertical axis indicate the dates of the foreshocks and the right vertical axis give the magnitudes of the extreme events. The probabilities were computed using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = mfs and (b) m = Mms. The training time interval T is given in Table 2 for each sequence. Three forecasting time intervals ΔT = 1, 2 and 5 d were used and indicated with different colours from light to dark, respectively. Figure 6. View largeDownload slide The probabilities of occurrence of extreme earthquakes after the foreshocks preceding several prominent main shocks are shown. The left vertical axis indicate the dates of the foreshocks and the right vertical axis give the magnitudes of the extreme events. The probabilities were computed using the Bayesian predictive distribution PB(mex > m) = 1.0 − PB(mex ≤ m) (eq. 18), for the following magnitudes (a) m = mfs and (b) m = Mms. The training time interval T is given in Table 2 for each sequence. Three forecasting time intervals ΔT = 1, 2 and 5 d were used and indicated with different colours from light to dark, respectively. 4 DISCUSSION The proposed statistical model to estimate the occurrence of extreme earthquake magnitudes in a sequence assumes that earthquakes can be modeled by an NHP process driven by the time-dependent earthquake rate. In addition, it assumes that the frequency–magnitude statistics of earthquakes are stationary in time. It also assumes that the extreme events are drawn from the same population of earthquakes as the earthquakes used to constrain the parameters of the model. For real earthquake sequences, not all of these assumptions are always satisfied. For example, some large main shocks can trigger rather large aftershocks later in a sequence, however, these aftershocks do not necessarily belong to the same statistical population of events as the main shock and its direct aftershocks. This usually applies to the occurrence of large aftershocks on separate faults. For example, this may be the case for the occurrence of three large aftershocks during 1 yr and 3 months time interval after the 2010 Mms 7.1 Darfield, New Zealand, main shock (Beavan et al. 2012; Shcherbakov et al. 2012). Another example is the occurrence of a strong aftershock (the 1992 m 6.3 Big Bear event) after the 1992 Mms 7.3 Landers main shock. In these and other cases, those large aftershocks may have been triggered by complex stress redistribution after the corresponding main shocks and it is not assured that they belong to the same statistical population of aftershocks. In some case, the main shocks themselves can rupture several fault segments and can be subdivided into several subevents. The typical examples are the 1992 Mms 7.3 Landers, 2002 Mms 7.9 Denali, 2010 Mms 7.1 Darfield, 2012 Mms 8.6 Indian Ocean, 2016 Mms 7.8 Kaikoura and other earthquakes. Similar limitations also apply to the frequency–magnitude statistics and the earthquake occurrence rates. For some aftershock sequences, the Omori–Utsu law is not the best-fitting model as some large aftershocks can generate their own aftershock sequences and the total observed rate deviates from the Omori–Utsu functional form. In many cases, the ETAS model can be a better rate model (Ogata 1988, 1999). However, the temporal ETAS model has more parameters than the Omori–Utsu model and this translates into higher dimensional integration of the Bayesian predictive distribution function (eq. 7), which is usually computationally expensive. The distribution of aftershock magnitudes can also deviate from the exponential functional form assumed in this study. Therefore, these limitations of the assumed parametric models can lead to improperly computed probabilities for the magnitudes of extreme earthquakes. Another important aspect of the majority of aftershock sequences is the early aftershock incompleteness (Kagan 2004; Peng et al. 2006; Hainzl 2016a,b). This affects the estimation of the model parameters if the magnitude of completeness is underestimated. As a result, this can significantly influence the calculation of probabilities for the extreme events. To recover partially, the true rate a variable magnitude of completeness can be considered (Helmstetter et al. 2006; Omi et al. 2014; Page et al. 2016). In our analysis, we use a uniform non-informative prior for the model parameters when we derive the Bayesian predictive distribution (eq. 7), for the magnitude of the expected largest aftershock. The only constraint we impose is that the model parameters are found between finite bounds. However, it is possible to incorporate a non-uniform prior based on historical seismicity. For example, Page et al. (2016) consider a worldwide distribution of aftershock sequences subdivided into different seismogenic zones and estimate the regional parameters of the Omori–Utsu (eq. 14), and Gutenberg–Richter (eq. 12), laws. Shcherbakov et al. (2013) carried out the analysis of the parameters of the above empirical laws for large subduction earthquakes. Therefore, one can incorporate this information into the computation of occurrence probabilities. Our suggested method of estimating the probabilities of the magnitudes of large aftershocks can be compared with the Reasenberg & Jones (1989) approach introduced to model the aftershock occurrence probabilities in California. The Reasenberg & Jones (1989) model has been reformulated recently by Page et al. (2016) to account for the early aftershock incompleteness and has been extended to the worldwide seismogenic regions. The calculation of the probabilities for the magnitudes of large earthquakes in the Reasenberg & Jones (1989) model (their eq. 4) is in fact equivalent to the use of the extreme value distribution given in eq. (3) with the same parametric models for the frequency–magnitude statistics (eq. 13), and the aftershock occurrence rate (eq. 14), with the following relationship between parameters $$a=\log _{10}\left[\beta \, K\, e^{-\beta (M_\mathrm{ms}-m_0)}\right]$$ in the limit M2 → ∞, where a and M2 are the parameters defined in eq. (3) of Reasenberg & Jones (1989). In our approach, we incorporate the uncertainties associated with the model parameters by deriving the Bayesian predictive distribution (eq. 18). For comparison, we computed the probabilities of having the magnitudes of large aftershocks greater than Mms − 1.0 for the above past prominent sequences by using both the Bayesian predictive distribution (eq. 18), and the extreme value distribution (eq. 3). This is reported in Table 3. For the extreme value distribution calculations, we used the point estimates of the model parameters {β, K, c, p} for each aftershock sequence during the first T = 2 d after the main shocks. The results show that the extreme value distribution consistently overestimates the probabilities of the magnitudes of large aftershocks. Table 3. Comparison of the probabilities of having the magnitudes of future large aftershocks larger than m = Mms − 1.0 by using both the Bayesian predictive distribution PB (eq. 18), and the extreme value distribution PEV (eq. 3). The training time interval T = 2 d and the forecasting time interval ΔT = 5 d were used. Date  Name  Mms  m0  PB(mex ≥ m)  PEV(mex ≥ m)  2016 November 13  Kaikoura  7.8  4.0  0.140  0.180  2016 April 16  Kumamoto  7.3  4.0  0.120  0.222  2011 March 11  Tohoku  9.0  5.5  0.054  0.103  2010 September 03  Darfield  7.1  3.5  0.495  0.643  2010 April 04  El Mayor  7.2  3.2  0.141  0.193  2008 June 14  Iwate  7.2  3.1  0.042  0.080  2007 July 16  Chuetsu  6.8  3.1  0.011  0.032  2007 March 25  Noto  6.9  3.1  0.090  0.162  2005 March 20  Fukuoka  7.0  3.1  0.009  0.023  2004 October 23  Niigata  6.8  3.1  0.429  0.554  2003 September 26  Tokachi-oki  8.0  4.0  0.122  0.197  2002 November 03  Denali  7.9  3.0  0.202  0.372  2000 October 06  Tottori  7.3  3.1  0.015  0.032  1999 October 17  Hector Mine  7.1  2.7  0.166  0.130  1995 January 16  Kobe  7.3  3.1  0.088  0.146  1994 January 17  Northridge  6.7  3.0  0.415  0.569  1992 June 28  Landers  7.3  3.0  0.131  0.099  Date  Name  Mms  m0  PB(mex ≥ m)  PEV(mex ≥ m)  2016 November 13  Kaikoura  7.8  4.0  0.140  0.180  2016 April 16  Kumamoto  7.3  4.0  0.120  0.222  2011 March 11  Tohoku  9.0  5.5  0.054  0.103  2010 September 03  Darfield  7.1  3.5  0.495  0.643  2010 April 04  El Mayor  7.2  3.2  0.141  0.193  2008 June 14  Iwate  7.2  3.1  0.042  0.080  2007 July 16  Chuetsu  6.8  3.1  0.011  0.032  2007 March 25  Noto  6.9  3.1  0.090  0.162  2005 March 20  Fukuoka  7.0  3.1  0.009  0.023  2004 October 23  Niigata  6.8  3.1  0.429  0.554  2003 September 26  Tokachi-oki  8.0  4.0  0.122  0.197  2002 November 03  Denali  7.9  3.0  0.202  0.372  2000 October 06  Tottori  7.3  3.1  0.015  0.032  1999 October 17  Hector Mine  7.1  2.7  0.166  0.130  1995 January 16  Kobe  7.3  3.1  0.088  0.146  1994 January 17  Northridge  6.7  3.0  0.415  0.569  1992 June 28  Landers  7.3  3.0  0.131  0.099  View Large 5 CONCLUSIONS In this work, we proposed a statistical scheme to estimate the probabilities of having extreme events above a certain magnitude in an earthquake sequence by using the information of early earthquakes in the sequence. We derived the Bayesian predictive distribution for the magnitudes of extreme events during the future time interval by assuming specific parametric forms of the frequency–magnitude statistics and earthquake occurrence rates. Specifically, we assume that the distribution of earthquake magnitudes can be approximated by the exponential distribution (eq. 12) and the temporal rate is given by the Omori–Utsu law (eq. 14). In addition, we treat the occurrence of earthquakes as an NHP process. The advantage of the derived Bayesian predictive distribution is that it fully incorporates the uncertainties associated with the model parameters. It also combines in a uniform framework the two important characteristics of the occurrence of extreme earthquakes, that is, the frequency–magnitude statistics and the corresponding earthquake occurrence rate. To illustrate the applicability of the derived framework, we considered several prominent past earthquake sequences in order to estimate the probabilities for the occurrence of extreme events. In the analysis, we studied individual aftershocks sequences generated by several large main shocks and estimated the corresponding probabilities of having large subsequent aftershocks during a fixed forecasting time interval by using the information of early events in the sequences. We also considered several past foreshock sequences which culminated in the occurrence of large main shocks. We were able to compute the probabilities of having events larger than the magnitude of the main shock for each sequence. Acknowledgements RS would like to acknowledge the hospitality of the Institute of Statistical Mathematics, where part of this work was done. We would also like to thank two anonymous reviewers for their constructive criticism and suggestions. RS has been supported partially by the NSERC Discovery grant, NSERC CRD grant 453034 and CFI grant 25816. REFERENCES Båth M., 1965. Lateral inhomogeneities of the upper mantle, Tectonophysics , 2, 483– 514. Google Scholar CrossRef Search ADS   Beavan J., Motagh M., Fielding E.J., Donnelly N., Collett D., 2012. Fault slip models of the 2010-2011 Canterbury, New Zealand, earthquakes from geodetic data and observations of postseismic ground deformation, N. Z. J. Geol. Geophys. , 55( 3), 207– 221. Google Scholar CrossRef Search ADS   Ben-Zion Y., 2008. Collective behavior of earthquakes and faults: Continuum-discrete transitions, progressive evolutionary changes, and different dynamic regimes, Rev. Geophys. , 46( 4), RG4006, doi:10.1029/2008RG000260. Campbell K.W., 1982. Bayesian analysis of extreme earthquake occurrences. Part I. Probabilistic hazard model, Bull. seism. Soc. Am. , 72( 5), 1689– 1705. Christophersen A., Smith E.G.C., 2008. Foreshock rates from aftershock abundance, Bull. seism. Soc. Am. , 98( 5), 2133– 2148. Google Scholar CrossRef Search ADS   Coles S., 2001. An Introduction to Statistical Modeling of Extreme Values , 208, Springer, London. Google Scholar CrossRef Search ADS   Console R., Lombardi A.M., Murru M., Rhoades D., 2003. Båth’s law and the self-similarity of earthquakes, J. geophys. Res. , 108( B2), 2128, doi:10.1029/2001JB001651. Google Scholar CrossRef Search ADS   Daley D.J., Vere-Jones D., 2003. An Introduction to the Theory of Point Processes , 1, 2nd edn, Springer, New York. David H.A., Nagaraja H.N., 2003. Order Statistics , 3rd edn, Wiley, New Jersey. Google Scholar CrossRef Search ADS   Ebrahimian H., Jalayer F., Asprone D., Lombardi A.M., Marzocchi W., Prota A., Manfredi G., 2014. Adaptive daily forecasting of seismic aftershock hazard, Bull. seism. Soc. Am. , 104( 1), 145– 161. Google Scholar CrossRef Search ADS   Felzer K.R., Abercrombie R.E., Ekstrom G., 2004. A common origin for aftershocks, foreshocks, and multiplets, Bull. seism. Soc. Am. , 94( 1), 88– 98. Google Scholar CrossRef Search ADS   Gerstenberger M.C., Wiemer S., Jones L.M., Reasenberg P.A., 2005. Real-time forecasts of tomorrow’s earthquakes in California, Nature , 435( 7040), 328– 331. Google Scholar CrossRef Search ADS PubMed  Gutenberg B., Richter C.F., 1954. Seismicity of the Earth and Associated Phenomenon , 2nd edn, Princeton University Press, Princeton, NJ. Hainzl S., 2016a. Rate-dependent incompleteness of earthquake catalogs, Seismol. Res. Lett. , 87( 2), 337– 344. Google Scholar CrossRef Search ADS   Hainzl S., 2016b. Apparent triggering function of aftershocks resulting from rate-dependent incompleteness of earthquake catalogs, J. geophys. Res. , 121( 9), 6499– 6509. Google Scholar CrossRef Search ADS   Hauksson E., Yang W., Shearer P.M., 2012. Waveform relocated earthquake catalog for Southern California (1981 to 2011), Bull. seism. Soc. Am. , 102( 5), 2239– 2244. Google Scholar CrossRef Search ADS   Helmstetter A., Sornette D., 2003a. Båth’s law derived from the Gutenberg-Richter law and from aftershock properties, Geophys. Res. Lett. , 30( 20), 2069, doi:10.1029/2003GL018186. Google Scholar CrossRef Search ADS   Helmstetter A., Sornette D., 2003b. Foreshocks explained by cascades of triggered seismicity, J. geophys. Res. , 108( B10), 2457, doi:10.1029/2003JB002409. Helmstetter A., Kagan Y.Y., Jackson D.D., 2006. Comparison of short-term and time-independent earthquake forecast models for southern California, Bull. seism. Soc. Am. , 96( 1), 90– 106. Google Scholar CrossRef Search ADS   Holschneider M., Zöller G., Hainzl S., 2011. Estimation of the maximum possible magnitude in the framework of a doubly truncated Gutenberg-Richter model, Bull. seism. Soc. Am. , 101( 4), 1649– 1659. Google Scholar CrossRef Search ADS   Holschneider M., Narteau C., Shebalin P., Peng Z., Schorlemmer D., 2012. Bayesian analysis of the modified Omori law, J. geophys. Res. , 117, B06317, doi:10.1029/2011JB009054. Google Scholar CrossRef Search ADS   Kagan Y.Y., 2004. Short-term properties of earthquake catalogs and models of earthquake source, Bull. seism. Soc. Am. , 94( 4), 1207– 1228. Google Scholar CrossRef Search ADS   Kijko A., 2004. Estimation of the maximum earthquake magnitude, mmax, Pure appl. Geophys. , 161( 8), 1655– 1681. Google Scholar CrossRef Search ADS   Luo J.W., Zhuang J., 2016. Three regimes of the distribution of the largest event in the critical ETAS model, Bull. seism. Soc. Am. , 106( 3), 1364– 1369. Google Scholar CrossRef Search ADS   Marzocchi W., Zhuang J., 2011. Statistics between mainshocks and foreshocks in Italy and Southern California, Geophys. Res. Lett. , 38, L09310, doi:10.1029/2011GL047165. Google Scholar CrossRef Search ADS   Molchan G.M., Kronrod T.L., Nekrasova A.K., 1999. Immediate foreshocks: time variation of the b-value, Phys. Earth planet. Inter. , 111( 3–4), 229– 240. Google Scholar CrossRef Search ADS   Nanjo K.Z., Yoshida A., 2017. Anomalous decrease in relatively large shocks and increase in the p and b values preceding the April 16, 2016, M7.3 earthquake in Kumamoto, Japan, Earth Planets Space , 69, 13, doi:10.1186/s40623-017-0598-2. Google Scholar CrossRef Search ADS   Nanjo K.Z.et al.  , 2016. Seismicity prior to the 2016 Kumamoto earthquakes, Earth Planets Space , 68, 187, doi:10.1186/s40623-016-0558-2. Google Scholar CrossRef Search ADS   Ogata Y., 1988. Statistical-models for earthquake occurrences and residual analysis for point-processes, J. Am. Stat. Assoc. , 83( 401), 9– 27. Google Scholar CrossRef Search ADS   Ogata Y., 1999. Seismicity analysis through point-process modeling: a review, Pure appl. Geophys. , 155( 2–4), 471– 507. Google Scholar CrossRef Search ADS   Ogata Y., Katsura K., 2012. Prospective foreshock forecast experiment during the last 17 years, Geophys. J. Int. , 191( 3), 1237– 1244. Ogata Y., Katsura K., 2014. Comparing foreshock characteristics and foreshock forecasting in observed and simulated earthquake catalogs, J. geophys. Res. , 119( 11), 8457– 8477. Google Scholar CrossRef Search ADS   Ogata Y., Utsu T., Katsura K., 1995. Statistical features of foreshocks in comparison with other earthquake clusters, Geophys. J. Int. , 121( 1), 233– 254. Google Scholar CrossRef Search ADS   Ogata Y., Utsu T., Katsura K., 1996. Statistical discrimination of foreshocks from other earthquake clusters, Geophys. J. Int. , 127( 1), 17– 30. Google Scholar CrossRef Search ADS   Omi T., Ogata Y., Hirata Y., Aihara K., 2013. Forecasting large aftershocks within one day after the main shock, Sci. Rep. , 3, 2218, doi:10.1038/srep02218. Google Scholar CrossRef Search ADS PubMed  Omi T., Ogata Y., Hirata Y., Aihara K., 2014. Estimating the ETAS model from an early aftershock sequence, Geophys. Res. Lett. , 41( 3), 850– 857. Google Scholar CrossRef Search ADS   Omi T., Ogata Y., Shiomi K., Enescu B., Sawazaki K., Aihara K., 2016. Automatic aftershock forecasting: a test using real-time seismicity data in Japan, Bull. seism. Soc. Am. , 106( 6), 2450– 2458. Google Scholar CrossRef Search ADS   Page M.T., van der Elst N., Hardebeck J., Felzer K., Michael A.J., 2016. Three ingredients for improved global aftershock forecasts: tectonic region, time-dependent catalog incompleteness, and intersequence variability, Bull. seism. Soc. Am. , 106( 5), 2290– 2301. Google Scholar CrossRef Search ADS   Peng Z.G., Vidale J.E., Houston H., 2006. Anomalous early aftershock decay rate of the 2004 Mw 6.0 Parkfield, California, earthquake, Geophys. Res. Lett. , 33( 17), L17307, doi:10.1029/2006GL026744. Google Scholar CrossRef Search ADS   Pisarenko V.F., 1991. Statistical evaluation of maximum possible earthquakes, Izv. Earth Phys. , 27( 9), 757– 763. Pisarenko V.F., Lyubushin A.A., Lysenko V.B., Golubeva T.V., 1996. Statistical estimation of seismic hazard parameters: maximum possible magnitude and related parameters, Bull. seism. Soc. Am. , 86( 3), 691– 700. Reasenberg P.A., 1999. Foreshock occurrence before large earthquakes, J. geophys. Res. , 104( B3), 4755– 4768. Google Scholar CrossRef Search ADS   Reasenberg P.A., Jones L.M., 1989. Earthquake hazard after a mainshock in California, Science , 243( 4895), 1173– 1176. Google Scholar CrossRef Search ADS PubMed  Reiss R.-D., Thomas M., 2007. Statistical Analysis of Extreme Values , 3rd edn, Springer, Basel. Rundle J.B., Turcotte D.L., Shcherbakov R., Klein W., Sammis C., 2003. Statistical physics approach to understanding the multiscale dynamics of earthquake fault systems, Rev. Geophys. , 41( 4), 1019, doi:10.1029/2003RG000135. Google Scholar CrossRef Search ADS   Saichev A., Sornette D., 2005. Distribution of the largest aftershocks in branching models of triggered seismicity: theory of the universal Båth law, Phys. Rev. E , 71( 5), 056127, doi:10.1103/PhysRevE.71.056127. Google Scholar CrossRef Search ADS   Shcherbakov R., 2014. Bayesian confidence intervals for the magnitude of the largest aftershock, Geophys. Res. Lett. , 41( 18), 6380– 6388. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., 2004. A modified form of Båth’s law, Bull. seism. Soc. Am. , 94( 5), 1968– 1975. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., Rundle J.B., 2004. A generalized Omori’s law for earthquake aftershock decay, Geophys. Res. Lett. , 31( 11), L11613, doi:10.1029/2004GL019808. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., Rundle J.B., 2005a. Aftershock statistics, Pure appl. Geophys. , 162( 6–7), 1051– 1076. Google Scholar CrossRef Search ADS   Shcherbakov R., Yakovlev G., Turcotte D.L., Rundle J.B., 2005b. Model for the distribution of aftershock interoccurrence times, Phys. Rev. Lett. , 95, 218501, doi:10.1103/PhysRevLett.95.218501. Google Scholar CrossRef Search ADS   Shcherbakov R., Nguyen M., Quigley M., 2012. Statistical analysis of the 2010 Mw 7.1 Darfield earthquake aftershock sequence, N. Z. J. Geol. Geophys. , 55( 3), 305– 311. Google Scholar CrossRef Search ADS   Shcherbakov R., Goda K., Ivanian A., Atkinson G.M., 2013. Aftershock statistics of major subduction earthquakes, Bull. seism. Soc. Am. , 103( 6), 3222– 3234. Google Scholar CrossRef Search ADS   Shcherbakov R., Turcotte D.L., Rundle J.B., 2015. Complexity and earthquakes, in Earthquake Seismology , 4 of Treatise on Geophysics, 2nd edn, pp. 627– 653, ed. Kanamori H., Elsevier. Google Scholar CrossRef Search ADS   Shearer P.M., 2012. Self-similar earthquake triggering, Båth’s law, and foreshock/aftershock magnitudes: Simulations, theory, and results for southern California, J. geophys. Res. , 117, B06310, doi:10.1029/2011jb008957. Google Scholar CrossRef Search ADS   Shebalin P., Narteau C., Holschneider M., Schorlemmer D., 2011. Short-term earthquake forecasting using early aftershock statistics, Bull. seism. Soc. Am. , 101( 1), 297– 312. Google Scholar CrossRef Search ADS   Tahir M., Grasso J.R., Amorese D., 2012. The largest aftershock: how strong, how far away, how delayed?, Geophys. Res. Lett. , 39, L04301, doi:10.1029/2011gl050604. Google Scholar CrossRef Search ADS   Utsu T., Ogata Y., Matsu’ura R.S., 1995. The centenary of the Omori formula for a decay law of aftershock activity, J. Phys. Earth , 43( 1), 1– 33. Google Scholar CrossRef Search ADS   Vere-Jones D., 1969. A note on the statistical interpretation of Båth’s law, Bull. seism. Soc. Am. , 59, 1535– 1541. Vere-Jones D., 1975. Stochastic models for earthquake sequences, Geophys. J. Int. , 42( 2), 811– 826. Google Scholar CrossRef Search ADS   Vere-Jones D., Zhuang J., 2008. Distribution of the largest event in the critical epidemic-type aftershock-sequence model, Phys. Rev. E , 78( 4), 047102, doi:10.1103/PhysRevE.78.047102. Google Scholar CrossRef Search ADS   Zhuang J., Ogata Y., 2006. Properties of the probability distribution associated with the largest event in an earthquake cluster and their implications to foreshocks, Phys. Rev. E , 73( 4), 046134, doi:10.1103/PhysRevE.73.046134. Google Scholar CrossRef Search ADS   Zhuang J., Christophersen A., Savage M.K., Vere-Jones D., Ogata Y., Jackson D.D., 2008. Differences between spontaneous and triggered earthquakes: their influences on foreshock probabilities, J. geophys. Res. , 113( B11), B11302, doi:10.1029/2008jb005579. Google Scholar CrossRef Search ADS   Zhuang J., Ogata Y., Wang T., 2017. Data completeness of the Kumamoto earthquake sequence in the JMA catalog and its influence on the estimation of the ETAS parameters, Earth Planets Space , 69, 36, doi:10.1186/s40623-017-0614-6. Google Scholar CrossRef Search ADS   Zöller G., Holschneider M., Hainzl S., 2013. The maximum earthquake magnitude in a time horizon: theory and case studies, Bull. seism. Soc. Am. , 103( 2A), 860– 875. Google Scholar CrossRef Search ADS   Zöller G., Holschneider M., Hainzl S., Zhuang J., 2014. The largest expected earthquake magnitudes in Japan: the statistical perspective, Bull. seism. Soc. Am. , 104( 2), 769– 779. Google Scholar CrossRef Search ADS   © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.

### Journal

Geophysical Journal InternationalOxford University Press

Published: Jan 1, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Search Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly ### Organize Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place. ### Access Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. DeepDyve ### Freelancer DeepDyve ### Pro Price FREE$49/month
\$360/year

Save searches from
PubMed

Create lists to

Export lists, citations