# Impact of magnitude uncertainties on seismic catalogue properties

Impact of magnitude uncertainties on seismic catalogue properties Summary Catalogue-based studies are of central importance in seismological research, to investigate the temporal, spatial and size distribution of earthquakes in specified study areas. Methods for estimating the fundamental catalogue parameters like the Gutenberg–Richter (G-R) b-value and the completeness magnitude (Mc) are well established and routinely applied. However, the magnitudes reported in seismicity catalogues contain measurement uncertainties which may significantly distort the estimation of the derived parameters. In this study, we use numerical simulations of synthetic data sets to assess the reliability of different methods for determining b-value and Mc, assuming the G-R law validity. After contaminating the synthetic catalogues with Gaussian noise (with selected standard deviations), the analysis is performed for numerous data sets of different sample size (N). The noise introduced to the data generally leads to a systematic overestimation of magnitudes close to and above Mc. This fact causes an increase of the average number of events above Mc, which in turn leads to an apparent decrease of the b-value. This may result to a significant overestimation of seismicity rate even well above the actual completeness level. The b-value can in general be reliably estimated even for relatively small data sets (N < 1000) when only magnitudes higher than the actual completeness level are used. Nevertheless, a correction of the total number of events belonging in each magnitude class (i.e. 0.1 unit) should be considered, to deal with the magnitude uncertainty effect. Because magnitude uncertainties (here with the form of Gaussian noise) are inevitable in all instrumental catalogues, this finding is fundamental for seismicity rate and seismic hazard assessment analyses. Also important is that for some data analyses significant bias cannot necessarily be avoided by choosing a high Mc value for analysis. In such cases, there may be a risk of severe miscalculation of seismicity rate regardless the selected magnitude threshold, unless possible bias is properly assessed. Numerical approximations and analysis, Statistical methods, Statistical seismology 1 INTRODUCTION Seismicity catalogues are comprehensive databases which result from the operation of seismic networks and they include information, mainly, on earthquakes’ location, origin time and magnitude. These data are the basis for extensive analyses of the spatial, temporal and size distribution of seismicity, possible seismicity rate changes, as well as hazard assessment. Local, national and global networks have in general been (gradually) significantly improved, recording increasingly more events in seismogenic areas. However, data heterogeneity is inevitable, as catalogues embody measurement uncertainties (both random and systematic) stemming from human artefacts (instrumentation, methodology, seismological network modifications, etc.) or from geological interference (e.g. attenuation, site effects and aftershock sequences). Such factors result in discrepancies which are strongly imprinted on earthquake catalogues and may significantly distort statistical analysis results (Mignan & Woessner 2012). Therefore, it is important that seismological catalogues be scrutinized for their consistency and homogeneity. One of the most important issues to consider is that estimated magnitudes contain various uncertainties which may result to significant biases, for example on the estimation of the completeness magnitude (Mc, defined as the magnitude above which all events are believed to be recorded). Two different approaches are usually followed for the determination of Mc (Mignan & Woessner 2012). The first family is network-based methods (Schorlemmer & Woessner 2008; Mignan et al.2011; Plenkers et al.2011; Maghsoudi et al., 2013, 2015; Kwiatek & Ben-Zion 2016) which are based upon the assessed probability of the detection of an earthquake, given the observation sensitivity and distribution of the seismological network. The alternative approach is catalogue-based methods, either relying on the day-to-night ratio of earthquake frequency (Rydelek & Sacks 1989; Taylor et al.1990) or on the assumption of a self-similar earthquake generation process (Woessner & Wiemer 2005, and references therein). In such cases, the Frequency–Magnitude Distribution (FMD) of earthquakes is assumed to be sufficiently described by the well-known empirical Gutenberg–Righter (G-R) law (Ishimoto & Iida 1939; Gutenberg & Richter 1944). The frequency N(M) of the events with magnitudes greater than or equal to M is given by:   $${\log _{10}}N(M) = \alpha - bM$$ (1)where parameter α describes the general activity level and b the proportion of larger relative to smaller events. The exponential behaviour of the FMD has been observed in many data sets and is consistent with laboratory experiments (Scholz 1968) and various theoretical models (Kanamori & Anderson 1975; Tinti & Mulargia 1985). Eq. (1) can also be transformed into a power law (Pareto) distribution of the scalar seismic moment (Kagan 1997). Detailed investigations of some seismicity data sets have occasionally revealed cases of apparent significant deviations from the G-R law (Utsu 1999) in, for example anthropogenic seismicity (Urban et al.2015) or global data (Serra & Corral 2017). Still, the exponential distribution is usually dominant (e.g. Kagan 1991; Kijko et al.2001; Pisarenko & Sornette 2003) and as a result, the G-R law remains the simplest and most commonly accepted formulation for describing the FMD of earthquakes from nano (e.g. Boettcher et al.2009; Kwiatek et al.2010) to global seismicity scale (Kagan et al. 2003). Assuming that the G-R law is valid, and taking into account that the estimated magnitudes in a seismicity catalogue incorporate errors (uncertainties), a reasonable question that arises is whether these errors can affect the estimation of b-value and Mc and what is the degree of their potential influence. Since Mc determination is essential, for example for the correct estimation of b-values, an inaccurate Mc assessment may significantly degrade seismic hazard assessment, seismicity rate investigation, cluster analysis or any other form of relevant statistical analysis. Tinti & Mulargia (1985) and Tinti et al. (1987) showed that α and b estimates of the G-R relationship strongly depend on the magnitude uncertainties rising from (i) systematic errors (due to e.g. the approximations made while applying a model or methodology), (ii) discretization (i.e. magnitudes are rounded although they are naturally continuous quantities), (iii) random errors (e.g. related to the data acquisition and analysis process) and (iv) catalogue heterogeneities (e.g. conversions among different magnitude scales). In a homogeneous catalogue in the sense of including the same magnitude scale for all events, if the magnitude errors are disregarded, the α-value may be overestimated, even if the b-value is not affected. In heterogeneous catalogues, neglecting the random magnitude errors may lead to more significant bias in the estimated parameters (Tinti et al.1987). A conservative strategy often followed is to select a rather high completeness magnitude threshold, resulting in smaller data sets for analysis. The advantages of using a high threshold to try to assure completeness must be weighed against the loss of information, when a considerable number of events are excluded, leading to higher statistical errors. The aim of this study is to investigate and assess the impact of magnitude uncertainties (inherently existing in seismic catalogues) on the estimation of b-value and Mc. Thus, we will identify how these uncertainties affect seismicity rate determination for M greater or equal to a pre-defined value. Since other seismicity parameters (spatial and temporal) are not studied here, synthetic magnitude catalogues following a specified distribution are produced. Two different approaches for creating the synthetic data sets are applied and the results are compared with each other. Then, the properties of the synthetic catalogues are demonstrated by examining idealized cases where data sets are large enough to provide statistically significant estimations of the basic seismicity parameters (b-value and Mc). The calculations are then repeated after introducing noise of specified distribution to the original data. We rely on three catalogue-based methodologies for assessing Mc and apply four different approaches (see Section 2.2), whereas the b-value (and the corresponding uncertainty) is calculated throughout as a function of the minimum included magnitude (Mmin) by the maximum likelihood estimator of Aki (1965). Several scenarios are tested, corresponding to different noise properties and sample size, and the significance of the results is evaluated by Monte Carlo simulations. Finally, the influence of Mc and b-value determination on seismicity rate and seismic hazard assessment implications is discussed. 2 METHODS 2.1 b-value The general method for estimating the b-value is based on the double truncated exponential distribution, where the probability density function of magnitude M (limited between a minimum Mmin and a maximum Mmax) for the G-R relation is given by (Page 1968; Kijko & Sellevoll 1989):   \begin{eqnarray} {f(M) = \frac{{\beta [\exp ( - \beta (M - {M_{\min }} + \Delta M/2))]}}{{1 - \exp [ - \beta ({M_{\max }} - {M_{\min }} + \Delta M/2)]}}} \nonumber\\ \quad {\rm{for}}\quad {M_{\min }} \le M \le {M_{\max }},\quad 0\quad {\rm{otherwise}}\end{eqnarray} (2)with parameter β being approximately 2.3 times (ln10) larger than the b-value of the G-R law. The term ΔM/2 is introduced to eq. (2) as a correction for the finite binning width of the catalogue. ΔM is the round-off interval of the magnitudes, that is the minimum non-zero difference in the reported magnitudes. ΔM is set equal to 0.1 in this study, in accordance with the real catalogues we refer to. The corresponding cumulative distribution function of eq. (2) is given by:   \begin{eqnarray} F(M) \!=\! \left\lbrace {\begin{array}{c@{}c@{}} 0&\qquad\qquad\qquad {{\rm{for}}\quad M < {M_{\min }}}\\ &\!\!\!{\frac{{1 - \exp [( - \beta (M - {M_{\min }} \!+\! \Delta M/2))]}}{{1 - \exp [ - \beta ({M_{\max }} \!-\! {M_{\min }} \!+\! \Delta M/2)]}}}\,\,\,{{\rm{for}}\quad {M_{\min }} \le M \le {M_{\max }}}\\ 1& \qquad\qquad\qquad{{\rm{for}}\quad M > {M_{\max }}} \end{array}} \right. \end{eqnarray} (3) A common assumption that is frequently followed is the so-called unbounded G-R law, in which an infinitely large maximum magnitude is assumed to be possible. This approach leads to a simplification of eqs (2) and (3), by eliminating the denominators. The unbounded exponential distribution may result to the energy catastrophe paradox (Knopoff & Kagan 1977), which can be prevented by applying the original, upper bounded form of eqs (2) and (3) (Kijko & Sellevoll 1989; Kijko & Singh 2011; Lasocki & Urban 2011). However, the two approaches lead to essentially identical results for sufficiently large samples (N > ∼1000) and a robustly constrained Mmax (see Fig. S1, Supporting Information). The b-value for the unbounded G-R approach is considered to be most appropriately estimated by the maximum likelihood estimator (Aki 1965; Bender 1983; Utsu 1999):   $$b = \frac{1}{{\ln (10)\left[ {\left\langle M \right\rangle - ({M_{\min }} - \Delta M/2)} \right]}}$$ (4)where <M> is the sample mean of the events included. Aki (1965) also estimated the b-value standard error, σb, for a sample size, N, expressed as:   $${\sigma _b} = \frac{b}{{\sqrt N }}$$ (5) Note that eq. (5) is an asymptotic maximum likelihood estimator and can be considered to be reliable when N is essentially large (∼ >100). In our study, we applied eqs (4) and (5) in data sets with N > 300, so the asymptotic estimator can be considered as appropriate. 2.2 Completeness magnitude (Mc) Once the b-value has been estimated as a function of Mmin, the completeness magnitude (Mc) of the data sets can be estimated. For the Mc evaluation, we rely on four catalogue-based methodologies: (i) Maximum Curvature (MaxC, Wiemer & Wyss 2000), (ii) Goodness-of-Fit Test (GFT, Wiemer & Wyss 2000), (iii) modified GFT (mGFT, Leptokaropoulos et al.2013) and (iv) the b-value stability (MBS, Cao & Gao 2002). In all these methods, there is a direct dependency between the estimated Mc and the estimated b-value and there are cases when the b-values are sensitive to even small variations of Mc (0.1 unit). These methods are not computationally expensive and they are easily applied, constituting suitable tools for Monte Carlo simulations over numerous synthetic data sets, as in this study. The MaxC technique defines Mc as the most frequent magnitude in the non-cumulative distribution, which corresponds to the maximum value of the first derivative of the G-R law. According to the GFT, the completeness threshold is estimated by fitting an exponential distribution over real and synthetic magnitude samples. These two approaches (real and synthetic) are combined under the frame of a quantitative testing procedure comparing R-values, that is, the offset between real and synthetic data:   $$R = \frac{{\sum\limits_{{M_i}}^{{M_{\max }}} {\left| {{N_o} - {N_s}} \right|} }}{{\sum\limits_i N }}$$ (6) where R is the normalized absolute difference between the number of observed (No) and simulated (Ns) events. Two thresholds for R, 5 and 10 per cent, are commonly used in published studies to justify the selection of Mc. In mGFT, k synthetic FMDs with the same number of events and same exponential distribution as the observed data are created in each case. Goodness of fit to the G-R law in each magnitude bin is found for each one of these synthetic catalogues. Hence, the total residual value RmGFT is calculated by averaging the R-values from the k synthetic FMDs (eq. 7). The minimum RmGFT indicates the value of the Mc. In this study, k is set equal to 500, in all cases investigated. Given the R from eq. (6) and k synthetic FMDs, RmGFT is given by:   $${R_{{\rm{mGFT}}}} = \frac{{\sum\limits_{j = 1}^k R }}{k}$$ (7) The MBS approach is built upon the assumption that b-value increases for Mmin < Mc, but stabilizes for M > Mc. Cao and Gao (2002) applied this technique considering the b stability as a function of Mmin. The completeness threshold is defined as the minimum magnitude for which   $$\Delta b = \left| {{b_{{\rm{average}}}} - b} \right| \le 0.03$$ (8) with baverage being the arithmetic mean of the b-values corresponding to successive magnitude thresholds within half a magnitude range (i.e. 0.5 units for 0.1 bin size). Instead of using the 0.03 value as the criterion for b-value stability, we here follow the alternative technique of Woessner & Wiemer (2005). The authors suggested the use of the b-value uncertainty, δb, as the criterion, introduced by Shi & Bolt (1982):   $$\delta b = 2.3{b^2}\sqrt {\frac{{\sum\limits_{i = 1}^N {({M_i} - \left\langle M \right\rangle )} }}{{N(N - 1)}}}$$ (9) where N is the number of all events with M > Mc and <M> is their mean magnitude. Thus, Mc is defined as the minimum magnitude for which   $$\Delta b = \left| {{b_{{\rm{average}}}} - {\rm{b}}} \right| \le {\rm{\delta b}}$$ (10) There are other catalogue-based methods that can be used to estimate the completeness magnitude (e.g. Mignan & Woessner 2012; Godano 2017, and references therein). For example, Godano (2017) recently suggested that a robust estimation of Mc can be achieved based on the calculation of the harmonic mean of the magnitudes larger than a given magnitude threshold. The method was tested on synthetic and real data and the results were compared to the MBS technique, revealing the necessity of an accurately defined Mc in order to obtain a correctly estimated b-value (see also later sections). Another broadly used technique is the Entire Magnitude Range (EMR) method, developed by Ogata & Katsura (1993) and modified by Woessner & Wiemer (2005). For EMR to be applied, a complete FMD model needs to be assumed, necessitating the estimation of four parameters. This makes the technique more complex in comparison with other methods such as the non-parametric MaxC or the MBS/GFTs methods described above. A technique like EMR incorporates stronger assumptions concerning the incomplete part when evaluating the completeness magnitude. As discussed in Kagan (2002), strong implicit or explicit assumptions about the character of the incompleteness may be problematic. There are multiple reasons of losing data below Mc (e.g. spatiotemporal network inefficiency, variations of the spatiotemporal signal-to-noise ratio, even operators’ ability and decisions). Therefore, the distribution of the incomplete data may be statistically complicated. Although modeling of the entire catalogue for generating realistic synthetic data sets is applied here, the fitting of parameters to the incomplete part will be reliable if our detailed assumptions about the character of the incompleteness are correct, but may be misleading otherwise. Therefore, we have chosen not to use this approach in this study. 2.3 Synthetic data We study synthetic magnitude catalogues which are constructed using two models, both based upon the G-R law with some data missing at smaller magnitudes. The first synthetic catalogue (A) consists of two groups of magnitudes which are produced separately and are then combined. The incomplete part corresponds to a half Gaussian distribution, that is a set of normally distributed event magnitudes which is produced with mean equal to 1.9 and a standard deviation equal to 0.2. As described by Woessner & Wiemer (2005), various distributions are sometimes assumed to describe the incomplete fraction of a catalogue. For our simulations, we choose a generic normal distribution for simplicity. The completeness magnitude (Mc) of the entire data set is fixed at magnitude 2. Therefore, the distribution of M ≥ 2 exactly follows the G-R law, with b = 1.0, which is a commonly observed value for a wide variety of instrumental catalogues on several spatial and temporal scales (e.g. El-Isa & Eaton 2014, and references therein). Although Mc = 2 might seem an arbitrary selection, the self-similar properties of observed G-R distributions indicate scale invariance, and thus the choice of Mc does not really affect the results, since they can be extrapolated towards both higher as well as lower values. We generate a set of approximately 73 000 synthetic magnitudes, including more than 30 000 events with M > Mc, rounded to the first decimal as is common in many regional and global catalogues. The maximum ‘observed’ magnitude in this catalogue is 6.5, so the range of the complete part is 4.5 mag units. That catalogue A as a compilation of two distinct parts of known properties gives the advantage of well-determined initial conditions, with exact and known Mc and b-value for the complete part. This allows us to explicitly assess the performance of the different analysis methods in estimating Mc and b-value. However, it is unlikely that real data sets have such a clear discontinuity. Therefore, in order to assess the possible sensitivity of our analysis to our initial assumptions, we chose also to examine a synthetic data set with no explicit discontinuity between the complete and incomplete parts, which may be more realistic. We use the model suggested by Ogata & Katsura (1993) and modified by Woessner & Wiemer (2005), based on the observed seismicity in Japan. According to their model, a detection function q (M|μ,σ) is used to describe the detection rate, which is given by a cumulative normal distribution:   $$q(M|\mu ,\sigma ) = \left\{ {\begin{array}{@{}*{1}{c}@{}} {\frac{1}{{\sigma \sqrt {2\pi } }}\int\limits_{{ - \infty }}^{{{M_c}}}{{\exp \left[ { - \frac{{{{(M - \mu )}^2}}}{{2{\sigma ^2}}}} \right]}}dm,\quad M < {M_c}}\\ {1,\quad {\rm{else}}} \end{array}} \right\}$$ (11) where μ is the magnitude corresponding to 50 per cent of detected earthquakes and is based upon the network detectability, while σ represents how fast the detection rate decays as magnitude decreases. Using this approach, we build a second set of synthetic magnitudes (catalogue B), where μ = 1.5 and σ = 0.2. More details on the choice of these parameters can be found in the work of Huang et al. (2016). The resulting stochastic catalogue includes ‘events’ randomly selected (following the rejection method described by Zhuang & Touati 2015) from a joint distribution of the G-R law and the detection function. Catalogue B eventually includes 100 000 events, with magnitudes up to 6.3, which are later rounded to the first decimal digit. For these rounded magnitudes, ‘events’ with M ≥ 2 have 99 per cent probability to be included in the catalogue, meaning that all events above this threshold are essentially recorded. Although this technique cannot provide a precise a priori completeness threshold, it is in some ways perhaps more realistic than catalogue A. We then add random errors (i.e. noise) to the magnitude estimates in the two catalogues. These are produced using a normal distribution with zero mean (μM = 0), assuming that there is no systematic trend for overestimating or underestimating the magnitudes in the catalogue (Tinti & Mulargia 1985). The effect of using a non-zero mean has also been investigated (Figs S2 and S3, Supporting Information) leading to a shift to higher or lower magnitudes. The assumed standard deviation of the Gaussian noise (σΜ) is in the range 0.1–0.4 (each distribution is investigated separately). The noise we introduce is magnitude independent, that is both higher and lower magnitudes demonstrate identical standard deviation. This choice was motivated by two examples of empirical data, from Greece and Spain. Gonzalez (2016) reports that the standard deviation of the reported magnitude errors in the Spanish national earthquake catalogue, from 1997 November onwards, is approximately 0.2 units, regardless the magnitude range. For the Greek catalogue, we ended up with similar conclusions, after investigating nearly 49 000 events obtained from the monthly bulletins provided by the Aristotle University of Thessaloniki (http://geophysics.geo.auth.gr/ss/) covering a period of 6.5 yr. During that time, the Hellenic Unified Seismological Network (HUSN) which provides the data have been relatively stable and the corresponding catalogue is homogeneous as far as the magnitude scale is concerned. The histogram in Fig. 1 shows that the reported magnitudes include errors with an estimated standard deviation of about 0.2. According to the results shown in Table 1, the standard deviation of the reported magnitude errors appears to be independent of the event magnitude, exhibiting essentially identical (σΜ) values (approximately 0.2) for all magnitude classes in the range of the available data. Figure 1. View largeDownload slide Histogram of reported magnitude uncertainties for ∼49 000 events of the HUSN seismicity catalogue for the period 2008 August, 1 to 2014 December, 31. Figure 1. View largeDownload slide Histogram of reported magnitude uncertainties for ∼49 000 events of the HUSN seismicity catalogue for the period 2008 August, 1 to 2014 December, 31. Table 1. Mean reported standard deviation, σΜ, for subsets of data from the HUSN seismicity catalogue (2008 August 1–2014 December 31). Magnitude  Mean σM  0.0 < M < 1.0  0.214  1.1 < M < 2.0  0.212  2.1 < M < 3.0  0.213  3.1 < M < 4.0  0.214  4.1 < M < 5.0  0.217  5.1 < M < 6.2  0.222  Magnitude  Mean σM  0.0 < M < 1.0  0.214  1.1 < M < 2.0  0.212  2.1 < M < 3.0  0.213  3.1 < M < 4.0  0.214  4.1 < M < 5.0  0.217  5.1 < M < 6.2  0.222  View Large The first examination of the effect of noise on the magnitude distribution can be made with Fig. 2, where the apparent change of the shape of the magnitude distribution due to the presence of a Gaussian noise with σΜ = 0.2 is shown. As the noise level is increased (larger σΜ), more events are sorted into different magnitude bins. The influence of the noise on the derived b-values using different minimum magnitudes is demonstrated in Fig. 3. Because there are fewer larger events than smaller and because the errors have a zero mean and magnitude independent standard deviation, we expect numerically more ‘leakage’ to larger than to smaller magnitudes. This means that more events are distributed towards the right tail of the distribution and fewer to the left. In this case, the practically estimated completeness threshold will in general be higher than the true underlying value. For reference, the completeness magnitudes and b-values of the initial catalogues A and B described above are shown in Table 2. Figure 2. View largeDownload slide Red circles: cumulative Frequency Magnitude Distributions (FMD) for catalogue A (top graphs) and catalogue B (bottom graphs), before (left-hand column) and after (right-hand column) the contamination with Gaussian noise of μM = 0 and σΜ = 0.2. The histograms show the number of events in each magnitude bin. Figure 2. View largeDownload slide Red circles: cumulative Frequency Magnitude Distributions (FMD) for catalogue A (top graphs) and catalogue B (bottom graphs), before (left-hand column) and after (right-hand column) the contamination with Gaussian noise of μM = 0 and σΜ = 0.2. The histograms show the number of events in each magnitude bin. Figure 3. View largeDownload slide The estimated b-value as a function of the minimum included magnitude for both catalogues (A on the left and B on the right graph), before (blue lines) and after (red lines) Gaussian noise of μM = 0 and σΜ = 0.2 is introduced to the data (see also the FMDs of the data in Fig. 2). The dashed lines show the estimated standard error of b (eq. 5). Figure 3. View largeDownload slide The estimated b-value as a function of the minimum included magnitude for both catalogues (A on the left and B on the right graph), before (blue lines) and after (red lines) Gaussian noise of μM = 0 and σΜ = 0.2 is introduced to the data (see also the FMDs of the data in Fig. 2). The dashed lines show the estimated standard error of b (eq. 5). Table 2. Four techniques are used to estimate the completeness magnitude (Mc) and b-value for the two catalogues, A and B (noise-free). Catalogue A  MaxC  GFT90  GFT95  mGFT  MBS  Mc  1.9  1.7  1.8  2  1.9  b  0.99  0.86  0.93  1.00  1.00  Catalogue B  Mc  1.6  1.5  1.6  2  1.9  b  0.91  0.85  0.91  0.99  0.98  Catalogue A  MaxC  GFT90  GFT95  mGFT  MBS  Mc  1.9  1.7  1.8  2  1.9  b  0.99  0.86  0.93  1.00  1.00  Catalogue B  Mc  1.6  1.5  1.6  2  1.9  b  0.91  0.85  0.91  0.99  0.98  View Large 3 RESULTS 3.1 Introducing magnitude uncertainties Using the four techniques described earlier (MaxC, GFT, mGFT and MBS), we investigate the influence of magnitude uncertainties on the estimated b-values and completeness threshold of two catalogues (A and B). Four levels of noise are introduced to the initial magnitude data, with σΜ = 0.1, 0.2, 0.3 and 0.4, that is corresponding to increasingly higher standard deviation of the noise. For GFT, the results are tested for two degrees of fit as suggested by earlier studies (e.g. Woessner &Wiemer 2005), 90 and 95 per cent, referred to as GFT90 and GFT95, respectively. Results are summarized in Figs 4 and 5. Numerical results can be found in the Table S1 of the Supporting Information. For catalogue A, where Mc is explicitly defined in the initial magnitude set (Mc = 2), the results reveal that MaxC tends to underestimate the completeness threshold, without major changes as higher noise level is introduced (Fig. 4). GFT95 also underestimates Mc for low noise levels, but the influence of the magnitude uncertainty is more profound (higher Mc is estimated as noise standard deviation gets larger). mGFT and MBS provide similar results to each other concerning Mc, for the different values of noise standard deviation. The results are similar for the case of catalogue B, although the response of MaxC seems slightly different between the two data sets. Figure 4. View largeDownload slide Mc estimated using the MaxC, GFT, mGFT, MBS methods when Gaussian noise of different standard deviation (σΜ) is added to catalogues A (Mc = 2 and b = 1) and B (50 per cent detection for M = 1.5 and b = 1). The results are the means of 1000 realizations and the error bars show the standard deviation (blue for catalogue A and red for B). The straight lines show the input Mc = 2 for comparison. The stars show the estimated Mc from each method in the error-free case. Figure 4. View largeDownload slide Mc estimated using the MaxC, GFT, mGFT, MBS methods when Gaussian noise of different standard deviation (σΜ) is added to catalogues A (Mc = 2 and b = 1) and B (50 per cent detection for M = 1.5 and b = 1). The results are the means of 1000 realizations and the error bars show the standard deviation (blue for catalogue A and red for B). The straight lines show the input Mc = 2 for comparison. The stars show the estimated Mc from each method in the error-free case. Figure 5. View largeDownload slide As Fig. 4, but mean b-values (and their standard deviation as error bars) from 1000 Monte Carlo realizations of noise contaminated catalogues A and B (the narrow error bars obtained by GFT are explained in Fig. S5 in the Supporting Information). Figure 5. View largeDownload slide As Fig. 4, but mean b-values (and their standard deviation as error bars) from 1000 Monte Carlo realizations of noise contaminated catalogues A and B (the narrow error bars obtained by GFT are explained in Fig. S5 in the Supporting Information). The b-values calculated using the different Mc estimates tend to decrease as noise increases (Fig. 5). mGFT and MBS provide more robust estimates than the other methods, with smaller changes in b and values close to the correct (input) b = 1 and the noise-free estimates (Table 2). The error bars in the figures correspond to one standard deviation from 1000 Monte Carlo realizations. The corresponding completeness threshold is also more ‘realistic’ (in terms of G-R law validity) when it is defined with these methods. Given that real magnitude estimates contain errors, our results imply that empirical b-values based on Mc from MaxC and GFT may be strongly biased, with considerable implications for seismic hazard assessment. 3.2 Magnitude uncertainties and sample size Considering that the b-value is a catalogue property statistically estimated from the available data, it is reasonable to assume that a sufficient sample size, N is required for a reliable estimation. As pointed out in the recent study of Nava et al. (2017), for small N (e.g. for small geographical areas or short time periods) results can be unreliable. We now investigate the bias in estimated Mc and b due to magnitude noise when N varies between 300 and 50 000. Events are extracted randomly from catalogues A and B and we show results only for magnitude noise with μM = 0 and σΜ = 0.2 (the latter motivated by what we observed for real data sets, see Section 2.3). We investigate the effect on the b-value of adding such noise by performing 1000 Monte Carlo trials for each sample size, and calculating the mean difference and the standard deviation of the result. The results are summarized in Figs 6 and 7 and are similar for both catalogues. Therefore, we show only a representative part of the results (more information can be found in Figs S5–S8 in the Supporting Information). Figure 6. View largeDownload slide Mc (top graphs) and b (bottom graphs) for subsets of catalogue A (Mc = 2 and b = 1) with N from 300 to 50 000, estimated using the four techniques (MaxC, GFT, mGFT and MBS). Markers show the means from 1000 separate realizations. Left: no added noise. Right: noise (μM = 0 and σΜ = 0.2) added to magnitudes. The straight lines indicate the input Mc and b. Figure 6. View largeDownload slide Mc (top graphs) and b (bottom graphs) for subsets of catalogue A (Mc = 2 and b = 1) with N from 300 to 50 000, estimated using the four techniques (MaxC, GFT, mGFT and MBS). Markers show the means from 1000 separate realizations. Left: no added noise. Right: noise (μM = 0 and σΜ = 0.2) added to magnitudes. The straight lines indicate the input Mc and b. Figure 7. View largeDownload slide Similar to Fig. 6, but for catalogue B (50 per cent detection for M = 1.5 and b = 1) and now showing differences between input and estimated values of Mc and b. Figure 7. View largeDownload slide Similar to Fig. 6, but for catalogue B (50 per cent detection for M = 1.5 and b = 1) and now showing differences between input and estimated values of Mc and b. Working with subsets from catalogue A and without the addition of any noise (Fig. 6, left-hand column), we can see that the first two methods, MaxC and GFT, underestimate Mc irrespectively of the sample size. For GFT, we present only results for the 90 per cent goodness of fit level, as the 95 per cent level was not reached for the majority of the subsets in both catalogues (95 per cent goodness of fit was achieved only in 30–40 per cent of the subsets). The mGFT and MBS methods define higher Mc for larger N, tending to approach the initially assumed value for catalogue A (see Table 2). However, MBS seems to be more sensitive to small N. This is what we would expect given that the stability of the estimated b-value is used as the criterion here, and more ‘events’ and wider magnitude range ensure more robust and realistic estimates of b. As also illustrated in Fig. 6, the three methods converge to the ‘real’ value of b = 1 as more ‘events’ are included in the subsets, except GFT90 which provides the lowest Mc and b, regardless the sample size. When the magnitude noise is added (Fig. 6, right-hand column), the performances of MaxC and GFT do not change significantly. Both methods define low completeness thresholds which are almost constant despite the changes in the sample size. However, because the added noise perturbs the FMD of the original catalogue (Fig. 2), the b-values are also lower than the input (b = 1.0) value (also shown in Fig. 3), similarly for the two techniques. For large N (N > ∼2000), the b-values given by mGFT and MBS are very close to the input value (less than 5 per cent are lower than b = 1.0), even though Mc is overestimated when it is compared to the input value (Mc = 2.0). For better understanding the potential combined effect of sample size and magnitude uncertainties on Mc and b, we use catalogue B to calculate the difference in Mc and b (as a function of the sample size) when these are estimated for data sets before and after the noise contamination. The results are illustrated in Fig. 7. Here, MaxC and GFT tend to estimate similar Mc values for all subsets, regardless the existence or absence of noise. As for catalogue A, mGFT and MBS tend to estimate higher Mc when errors are introduced and more events are used, which is expressed by the downward trend of the plotted differences (i.e. the absolute difference gets higher). In general, the estimated b-value is more accurate for larger subsets, that is closer to the initial input, irrespective of whether noise is introduced or not. The most accurate b-value estimation is achieved for N > ∼3000, by applying MBS and mGFT. 4 DISCUSSION We have investigated the potential biases introduced to b-value and Mc calculations due to the interference of various factors in real seismicity catalogues, such as the catalogue size and random errors in reported magnitudes. We also explore if different common methods for the b and Mc estimation are more susceptible to bias when data noise is present. Under the assumption that each event has an equal probability of being either overestimated or underestimated (i.e. normally distributed magnitude noise), there is an inherent trend of reporting an overestimated magnitude (compared to the real one) for the majority of the events close to and above Mc. This occurs because there are more small events that are shifted towards higher magnitudes than large events shifted to lower magnitude values. Therefore, since the noise is uniformly introduced to all magnitudes, an apparent increase of the larger to smaller events ratio in the complete part of the catalogue leads to an apparent decrease in the estimated b-value. This b-value decrease is inevitable to all catalogues contaminated by any symmetrically distributed noise (e.g. normal, uniform and t-distribution) and is proportional to the standard deviation of the noise. The underestimation of the b-value may in turn result in significant artefacts regarding seismicity rate evaluation and seismic hazard assessment. The effect of magnitude uncertainties in catalogue A is demonstrated in Fig. 8 (see also Table S2 in the Supporting Information for the numerical results). For catalogue A, where Mc is explicitly set at M = 2, we can easily calculate the number of events shifted from the incomplete to the complete part of the catalogue, and vice versa. The larger the number of events with M < Mc, the higher the resulting noise-induced contamination of the ‘complete’ data set from events with true magnitude below Mc. Note that the percentage of the additional events (erroneously included above a given magnitude threshold) in the contaminated catalogues stabilizes at 2.9, 11.5 and ∼27 per cent of the total number of events included in the original catalogue, from magnitudes of about 2.2 (σΜ = 0.1), 2.4–2.5 (σΜ = 0.2) and 2.7–2.8 (σΜ = 0.3), respectively. The mGFT and MBS methods estimate higher Mc than the other methods when noise with larger standard deviation is introduced. In previous studies that are based on data sets which do not include any noise, Mignan & Woessner (2012) have also reported the overestimation of Mc when MBS is applied, while the recently suggested method by Godano (2017) is reported to define even higher Mc than MBS. These methods (which report higher Mc) provide a more reliable b-value estimation. Although there are still events shifted from the apparent incomplete to the complete part of the catalogue, the resulting ratio of the smaller to larger magnitudes stabilizes, yielding a more realistic b-value and eliminating in some sense the effect of the magnitude noise (uncertainty). This is potentially significant, for example because an underestimate of b implies an overestimation of the number of larger future events. Figure 8. View largeDownload slide The difference in the apparent number of events over a given magnitude threshold and the corresponding (‘true’) number before noise (μM = 0 and σΜ = 0.1, 0.2 and 0.3) is added (catalogue A). Figure 8. View largeDownload slide The difference in the apparent number of events over a given magnitude threshold and the corresponding (‘true’) number before noise (μM = 0 and σΜ = 0.1, 0.2 and 0.3) is added (catalogue A). The magnitude of completeness (Mc) values estimated with MaxC and GFT are lower than those from the other algorithms, meaning more intense contamination of the presumed complete data set. Similarly, Mignan & Woessner (2012) showed that EMR also gives lower estimations of Mc. This implies that the absolute seismicity rate will be overestimated, with various implications for seismicity studies. However, when investigating rate changes this may not matter because the entire (complete) catalogue will be affected in a similar manner. This is demonstrated in Fig. 9, where the percentage of the events (catalogue A) switching between the deduced incomplete and complete magnitude ranges are shown for a noise level of σΜ = 0.2. Note that for M ≥ ∼2.2, the three proportions (overestimation, underestimation and same value) tend to stable values of about 57, 27 and 16 per cent, respectively, for completeness thresholds larger than 2.2. This threshold of 2.2 (implying b = 0.992 ± 0.007) is the magnitude threshold above which the b-value stabilizes close to the ‘real’ value (Figs 8 and 9). The anomalies observed at higher magnitudes are due to the limited sample size. Figure 9. View largeDownload slide Percentage of events in the synthetic catalogue A which have underestimated (blue crosses), overestimated (black circles) or correctly estimated (red squares) magnitudes when Gaussian noise (μM = 0 and σΜ = 0.2) is added, plotted against true (synthetic) magnitude. The shaded area indicates 0.2 units on both sides of M = 2.0. Note the shift towards larger magnitudes. Figure 9. View largeDownload slide Percentage of events in the synthetic catalogue A which have underestimated (blue crosses), overestimated (black circles) or correctly estimated (red squares) magnitudes when Gaussian noise (μM = 0 and σΜ = 0.2) is added, plotted against true (synthetic) magnitude. The shaded area indicates 0.2 units on both sides of M = 2.0. Note the shift towards larger magnitudes. To summarize the above, for catalogue A, contaminated with Gaussian noise (μM = 0 and σΜ = 0.2) and magnitudes rounded to 0.1, for relatively large data sets (N > ∼800): For Mmin  = Mc, where Mmin is the lowest magnitude used in estimating b-value: there are ∼50 per cent of events below Mc included in the catalogue and ∼30 per cent of events above Mc are discarded. For Mmin  > Mc + 0.2: there are ∼57 per cent of events with M <  Mmin erroneously included in the catalogue and ∼27 per cent of events with M >  Mmin which are wrongly discarded. It follows from our analysis that whatever method is used, the part of any noise contaminated catalogue which is assessed to be ‘complete’ is unlikely ever to be truly complete, whatever magnitude threshold we use. The effect of magnitude noise is that some events with magnitudes below our threshold ‘leak’ to above the threshold, and vice versa. This may have significant implications for statistical investigations of, for example interevent times or distances analysis. For some studies, it may be possible to apply straightforward approximate corrections. For example, with the parameters used in our example (Gaussian noise, μM = 0 and σΜ = 0.2; Fig. 8, red curve), the number of events apparently over our threshold will be about 11.5 per cent greater than the true value. 5 SUMMARY AND CONCLUSIONS In order to investigate possible bias effects, we generated synthetic data under some clear assumptions, which were to the extent possible based on our understanding of the character of real data: Both catalogues (A and B) include a complete part which follows the G-R law with b = 1 and an incomplete part corresponding to a specified detection level. Magnitudes are rounded at their first decimal. The noise assigned to the original synthetic catalogue is magnitude independent and normally distributed (Gaussian noise) owing a zero mean (μM = 0) and a selected standard deviation (σΜ: 0.1–0.4). Under these assumptions the following conclusions can be obtained: The b-values are always underestimated when noise of the character we have investigated is introduced to magnitude values. The degree of underestimation is proportional to the noise standard deviation. MaxC generally provides a relatively reliable (although underestimated) estimate of the actual Mc. However, this might not always be the case (e.g. when no obvious most frequent magnitude is evident or a complex incomplete part is present). Therefore, it is suggested that if the MaxC method is used to estimate Mc, it should be in combination with other methods, along with a visual inspection of the FMD (i.e. a completely automatic algorithm might be sometimes unreliable). MaxC is definitely inadequate for b-value estimation. GFT90 is inadequate for estimating the actual Mc and may introduce severe bias in the analysis. GFT95 is an appropriate method for estimating Mc, although 95 per cent fitting is not usually achieved for data sets with N < 1000. The 90 per cent level of fitting using GFT is definitely unacceptable for b-value evaluation. For N < 1000, GFT95 provides the most robust estimate of the b-value. For N > 1000, mGFT and MBS overestimate the actual Mc in general, but they are by far the most appropriate methods for estimating the b-value and they are highly recommended. The use of b-values based on Mc derived by MaxC and GFT may be highly problematic. Even above a well-defined Mc, the catalogue is unlikely to be complete because of a bidirectional flow of data below and above Mc. For Mmin ≫Mc, the actual number of events is overestimated by a factor depending primarily on the noise standard deviation (at least for relatively large data sets, N > ∼800). In this work, we showed that adequate knowledge of the noise properties incorporated in the seismic catalogues is of utmost importance for correct catalogue analysis. Real data may, at least sometimes, be significantly more complex in character than our simulations and for example different statistical distributions in the data could affect bias and uncertainty estimates. Additional complications may include, for example the effect of magnitude conversions from different scales on seismicity rates, as has been discussed by Musson (2012); magnitude-dependent noise; mixing of data sets with different spatial and temporal variation of b-value; incomplete data sets; spatial and temporal variation of Mc (Márquez-Ramírez et al.2015; Hainzl 2016; Kijko et al.2016), etc. The possible implications of such effects, for example hazard evaluation may be significant, and several of them merit further investigation. However, our models are based on observed characteristics of data and our conclusions probably have some general validity, even though it is possible that there may be data sets with rather different characteristics. Acknowledgements This work was supported within SHEER: ‘SHale gas Exploration and Exploitation induced Risks’ project funded from the European Union Horizon 2020—Research and Innovation Programme, under grant agreement 640896. The work was also partially supported within statutory activities no. 3841/E-41/S/2017 of the Ministry of Science and Higher Education of Poland. AA acknowledges funding from the Jubilee scholarship (Uppsala University). The magnitude distribution parameters and the magnitude of completeness were estimated by MCCalc program, which can be downloaded from IS-EPOS: Induced Seismicity European Plate Observing System web platform https://tcs.ah-epos.eu/ after registration. The data used in this study can be found at http://geophysics.geo.auth.gr/ss/station_index_en.html. REFERENCES Aki K., 1965. Maximum likelihood estimate of b in the formula log N = a − bM and its confidence limits, Bull. Earthq. Res. Inst. Tokyo Univ. , 43, 237– 239. Aristotle University of Thessaloniki Seismological Network, 1981. Permanent regional seismological network operated by the Aristotle University of Thessaloniki. International Federation of Digital Seismograph Networks. Other/Seismic Network, doi:10.7914/SN/HT. Bender B., 1983. Maximum likelihood estimation of b-values for magnitude grouped data, Bull. seism. Soc. Am. , 73, 831– 851. Boettcher M.S., McGarr A., Johnston M., 2009. Extension of Gutenberg-Richter distribution to Mw-1.3, no lower limit in sight, Geophys. Res. Lett. , 36( 10), doi:10.1029/2009GL038080. https://doi.org/10.1029/2009GL038080 Cao A.M., Gao S.S., 2002. Temporal variation of seismic b-values beneath northeastern Japan island arc, Geophys. Res. Lett. , 29( 9), doi:10.1029/2001GL013775. https://doi.org/10.1029/2001GL013775 El-Isa Z.H., Eaton D.W., 2014. Spatiotemporal variations in the b-value of earthquake magnitude-frequency distributions: classification and causes, Tectonophysics , 615–616, 1– 11. https://doi.org/10.1016/j.tecto.2013.12.001 Google Scholar CrossRef Search ADS   Godano C., 2017. A new method for the estimation of the completeness magnitude, Phys. Earth planet. Inter. , 263, 7– 11. https://doi.org/10.1016/j.pepi.2016.12.003 Google Scholar CrossRef Search ADS   Gonzalez A., 2016. The Spanish National Earthquake Catalogue: evolution, precision and completeness, J. Seismol. , doi:10.1007/s10950-016-9610-8. Gutenberg B., Richter C.F., 1944. Frequency of earthquakes in California, Bull. seism. Soc. Am. , 34, 184– 188. Hainzl S., 2016. Apparent triggering function of aftershocks resulting from rate-dependent incompleteness of earthquake catalogs, J. geophys. Res. , 121( 9), 6499– 6509. https://doi.org/10.1002/2016JB013319 Google Scholar CrossRef Search ADS   Huang Y.L., Zhou S.Y., Zhuang J.C., 2016. Numerical tests on catalog-based methods to estimate magnitude of completeness, Chin. J. Geophys. , 59, 101– 110. Google Scholar CrossRef Search ADS   Ishimoto M., Iida K., 1939. Observations of earthquakes registered with the microseismograph constructed recently, Bull. Earthq. Res. Inst. , 17, 443– 478. Kagan Y.Y., 1991. Likelihood analysis of earthquake catalogues, Geophys. J. Int. , 106( 1), 135– 148. https://doi.org/10.1111/j.1365-246X.1991.tb04607.x Google Scholar CrossRef Search ADS   Kagan Y.Y., 1997. Seismic moment-frequency relation for shallow earthquakes: regional comparison, J. geophys. Res. , 102( B2), 2835– 2852. https://doi.org/10.1029/96JB03386 Google Scholar CrossRef Search ADS   Kagan Y.Y., 2002. Seismic moment distribution revisited: I. Statistical results, Geophys. J. Int. , 148( 3), 520– 541. https://doi.org/10.1046/j.1365-246x.2002.01594.x Google Scholar CrossRef Search ADS   Kagan Y.Y., Rong Y.F., Jackson D.D., 2003. Probabilistic forecasting of seismicity, in Earthquake Science and Seismic Risk Reduction , pp 185– 200, eds Mulargia F., Geller R.J., Kluwer Academic Publishing, Dordrecht. Kanamori H., Anderson D., 1975. Theoretical basis of some empirical relations in seismology, Bull. seism. Soc. Am. , 65( 5), 1073– 1095. Kijko A., Sellevoll M.A., 1989. Estimation of earthquake hazard parameters from incomplete data files. Part I. Utilization of extreme and complete catalogs with different threshold magnitudes, Bull. seism. Soc. Am. , 79, 645– 654. Kijko A., Lasocki S., Graham G., 2001. Non-parametric seismic hazard in mines, Pure appl. Geophys. , 158( 9), 1655– 1675. https://doi.org/10.1007/PL00001238 Google Scholar CrossRef Search ADS   Kijko A., Singh M., 2011. Statistical tools for maximum possible earthquake magnitude estimation, Acta Geophys. , 59( 4), 674– 700. https://doi.org/10.2478/s11600-011-0012-6 Google Scholar CrossRef Search ADS   Kijko A., Smit A., Selevoll M.A., 2016. Estimation of earthquake hazard parameters from incomplete data files. Part III. Incorporation of uncertainty of earthquake-occurrence model, Bull. seism. Soc. Am. , ( 3), doi:10.1785/0120150252. https://doi.org/10.1785/0120150252 Knopoff L., Kagan Y.Y., 1977. Analysis of the theory of extremes as applied to earthquake problems, J. geophys. Res. , 82( 36), 5647– 5657. https://doi.org/10.1029/JB082i036p05647 Google Scholar CrossRef Search ADS   Kwiatek G., Plenkers K., Nakatami M., Yabe Y., Dresen G., & the JAGUARS-Group, 2010. Frequency-magnitude characteristics down to magnitude -4.4 for induced seismicity recorded at Mponeng gold mine, South Africa, Bull. seism. Soc. Am. , 100( 3), 1165– 1173 https://doi.org/10.1785/0120090277 Google Scholar CrossRef Search ADS   Kwiatek G., Ben-Zion Y., 2016. Theoretical limits on detection and analysis of small earthquakes, J. geophys. Res. , ( 8), doi:10.1002/2016JB012908. https://doi.org/10.1002/2016JB012908 Lasocki S., Urban P., 2011. Bias, variance and computational properties of Kijko's estimators of the upper limit of magnitude distribution, Mmax, Acta Geophys. , 59( 4), 5898– 5916. https://doi.org/10.2478/s11600-010-0049-y Google Scholar CrossRef Search ADS   Leptokaropoulos K.M., Karakostas V.G., Papadimitriou E.E., Adamaki A.K., Tan O., İnan S., 2013. A homogeneous earthquake catalog for Western Turkey and magnitude of completeness determination, Bull. seism. Soc. Am. , 103( 5), 2739– 2751. https://doi.org/10.1785/0120120174 Google Scholar CrossRef Search ADS   Maghsoudi S., Cesca S., Hainzl S., Kaiser D., Becker D., Dahm T., 2013. Improving the estimation of detection probability and magnitude of completeness in strongly heterogeneous media, an application to acoustic emission (AE), Geophys. J. Int. , 193( 3), 1556– 1569. https://doi.org/10.1093/gji/ggt049 Google Scholar CrossRef Search ADS   Maghsoudi S., Cesca S., Hainzl S., Dahm T., Zöller G., Kaiser D., 2015. Maximum magnitude of completeness in a salt mine, Bull. seism. Soc. Am. , 105, doi:10.1785/0120140039. https://doi.org/10.1785/0120140039 Márquez-Ramírez V.H., Nava F.A., Zúñiga F.R., 2015. Correcting the Gutenberg-Richter b-value for effects of rounding and noise, Earthq. Sci ., 28, 129– 134. https://doi.org/10.1007/s11589-015-0116-1 Google Scholar CrossRef Search ADS   Mignan A., Werner M., Wiemer S., Chen C.-C., Wu Y.-M., 2011. Bayesian estimation of the spatially varying completeness magnitude of earthquake catalogs, Bull. seism. Soc. Am. , 101, 1371– 1385. https://doi.org/10.1785/0120100223 Google Scholar CrossRef Search ADS   Mignan A., Woessner J., 2012. Estimating the magnitude of completeness for earthquake catalogs, in Community Online Resource for Statistical Seismicity Analysis , doi:10.5078/corssa-00180805. Musson R.M.W., 2012. The effect of magnitude uncertainty on earthquake activity rates, Bull. seism. Soc. Am. , 102, 2771– 2775. https://doi.org/10.1785/0120110224 Google Scholar CrossRef Search ADS   Nava F.A. et al.  , 2017. Gutenberg-Richter b-value maximum likelihood estimation and sample size, J. Seismol. , 1– 9, doi:10.1007/s10950-016-9589-1. Ogata Y., Katsura K., 1993. Analysis of temporal and spatial heterogeneity of magnitude frequency distribution inferred from earthquake catalogues, Geophys. J. Int. , 113, 727– 738. https://doi.org/10.1111/j.1365-246X.1993.tb04663.x Google Scholar CrossRef Search ADS   Page R., 1968. Aftershocks and microaftershocks, Bull. seism. Soc. Am. , 58, 1131– 1168. Pisarenko V.F., Sornette D., 2003. Characterization of the frequency of extreme Earthquake events by the generalized pareto distribution, Pure appl. Geophys. , 160, 2343– 2364. https://doi.org/10.1007/s00024-003-2397-x Google Scholar CrossRef Search ADS   Plenkers K., Schorlemmer D., Kwiatek G., 2011. On the probability of detecting Picoseismicity, Bull. seism. Soc. Am. , 101, 2579– 2591. https://doi.org/10.1785/0120110017 Google Scholar CrossRef Search ADS   Rydelek P.A., Sacks I.S., 1989. Testing the completeness of earthquake catalogues and the hypothesis of self-similarity, Nature , 337, 251– 253 https://doi.org/10.1038/337251a0 Google Scholar CrossRef Search ADS   Scholz C.H., 1968. Mechanism of creep in brittle rock, J. geophys. Res. , doi:10.1029/JB073i010p03295. https://doi.org/10.1029/JB073i010p03295 Schorlemmer D., Woessner J., 2008. Probability of detecting an earthquake, Bull. seism. Soc. Am. , 98, 2103– 2117. https://doi.org/10.1785/0120070105 Google Scholar CrossRef Search ADS   Serra I., Corral A., 2017. Deviation from power law of the global seismic moment distribution, Sci. Rep. , 7, 40045, doi:10.1038/srep40045. https://doi.org/10.1038/srep40045 PMID28053311 Google Scholar CrossRef Search ADS PubMed  Shi Y., Bolt B.A., 1982. The standard error of the magnitude frequency b-value, Bull. seism. Soc. Am. , 72, 1677– 1687. Taylor D.W.A., Snoke J.A., Sacks I.S., Takanami T., 1990. Non-linear frequency-magnitude relationship for the Hokkaido corner, Japan, Bull. seism. Soc. Am. , 80, 605– 609. Tinti S., Mulargia F., 1985. Effects of magnitude uncertainties on estimating the parameters in the Gutenberg-Richter frequency-magnitude law, Bull. seism. Soc. Am. , 75, 1681– 1697. Tinti S., Rimondi R., Mulargia F., 1987. On estimating frequency-magnitude relations from heterogeneous catalogs, Pageoph , 125, 1– 18. https://doi.org/10.1007/BF00878611 Google Scholar CrossRef Search ADS   Urban P., Lasocki S., Blascheck P., do Nascimento A.F., Van Giang N., Kwiatek G., 2015. Violations of Gutenberg– Richter relation in anthropogenic seismicity, Pure appl. Geophys. , doi:10.1007/s00024-015-1188-5. Utsu T., 1999. Representation and analysis of the Earthquake Size Distribution: a historical review and some new approaches, in Seismicity Patterns, Their Statistical Significance and Physical Meaning, Pageoph Topical Volumes . pp. 509– 535, eds Wyss M., Shimazaki K., Ito A., Birkhäuser, Basel. Wiemer S., Wyss M., 2000. Minimum magnitude of completeness in Earthquake catalogs: examples from Alaska, the Western United States, and Japan, Bull. seism. Soc. Am. , 90, 859– 869. https://doi.org/10.1785/0119990114 Google Scholar CrossRef Search ADS   Woessner J., Wiemer S., 2005. Assessing the quality of earthquake catalogues: estimating the magnitude of completeness and its uncertainty, Bull. seism. Soc. Am. , 95, 684– 698. https://doi.org/10.1785/0120040007 Google Scholar CrossRef Search ADS   Zhuang J.C., Touati S., 2015. Stochastic simulation of earthquake catalogs, in Community Online Resource for Statistical Seismicity Analysis , doi:10.5078/corssa-43806322. SUPPORTING INFORMATION Supplementary data are available at GJI online. Figure S1. Left: histogram of differences between b-values as they were obtained by the unbounded and truncated G-R law for 10 000 catalogues of different size and maximum magnitude, Mmax. Middle: the b-value differences as a function of maximum magnitude. Right: b-value differences as a function of catalogue size (N from 500 to 50 000). In all cases, the b-values estimated by the unlimited G-R were greater than or equal to the one derived from the upper bounded G-R, however the differences were very small (less than 0.01 for 95 per cent of the cases), indicating a very satisfactory approximation by the unbounded G-R. We may conclude that for adequately large data sets (N > ∼1000) and well-determined maximum magnitude, the two GR approaches lead to essentially identical results. Figure S2. Histograms of Mc values, after introducing magnitude errors to the original synthetic catalogue A, with μM = 0, σΜ = 0.2 (grey bars) and μM = 0.3 and σΜ = 0.2 (blue bars), respectively. Results were derived from 500 synthetic catalogues of 50 000 events each, drawn randomly with permutation from our original synthetic data set. Figure S3.b-value (solid lines) and its estimated standard error (dashed lines) as a function of magnitude, after introducing magnitude errors to the original synthetic catalogue, with μM = 0, σΜ = 0.2 (blue curves) and μM = 0.3, σΜ = 0.2 (green curves), respectively. Results were derived from 500 synthetic catalogues of 50 000 events each, drawn randomly by permutation from our original synthetic data set. Figure S4. Histograms of Mc obtained using the different methods for 1000 synthetic catalogues derived from the original synthetic catalogue A, after introducing Gaussian error with μM = 0, and σM = 0.1 (top frame), σM = 0.2 (central frame) and σM = 0.3 (bottom frame). Note the stable (but underestimated) values from GFT90 σM = 0.1 and 0.3 (99 per cent of results in one magnitude bin). This results to the narrow error bars in both Mc and b-values shown in Figs 4 and 5 of the manuscript. Figure S5. Left-hand column: subsets of varying sample size (300–50 000) are extracted from catalogue A and the completeness threshold is found by means of MaxC, GFT90, mGFT and MBS. The process is repeated 1000 times per sample size, and the points on the graph represent the average Mc. The error bars correspond to one standard deviation. Right-hand column: after contaminating catalogue A with Gaussian error (μM = 0 and σM = 0.2), the same process is followed (for all sample sizes) and the difference between the magnitude thresholds before and after the noise addition is found, for all four methods. Figure S6. Same as in Fig. S5, but now the results for the estimated of b-value. Figure S7. As Fig. S5, but data from catalogue B. Figure S8. As Fig. S6, but catalogue B. Table S1. Monte Carlo simulation results on b-value (mean and standard deviation) for catalogues A and B (after being contaminated with Gaussian noise) using the different approaches discussed in the main text. Table S2. Seismicity rate overestimation as a function of Mmin for different levels of noise. Cumulative number of events (second column) for each magnitude bin (first column) for a catalogue with a theoretical distribution with b = 1.0. Columns 3–5 correspond to the difference between column 2 and the cumulative number of events per magnitude bin of catalogues contaminated by errors with μM = 0 and σM = 0.1, 0.2 and 0.3, respectively). The numbers in the parentheses denote the difference as a percentage. Positive and negative values denote that the contaminated catalogues contain more or fewer events than the theoretical distribution, respectively. Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper. © The Author(s) 2018. Published by Oxford University Press on behalf of The Royal Astronomical Society. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Geophysical Journal International Oxford University Press

# Impact of magnitude uncertainties on seismic catalogue properties

, Volume 213 (2) – May 1, 2018
12 pages

/lp/ou_press/impact-of-magnitude-uncertainties-on-seismic-catalogue-properties-cww0E3LoBr
Publisher
The Royal Astronomical Society
ISSN
0956-540X
eISSN
1365-246X
D.O.I.
10.1093/gji/ggy023
Publisher site
See Article on Publisher Site

### Abstract

Summary Catalogue-based studies are of central importance in seismological research, to investigate the temporal, spatial and size distribution of earthquakes in specified study areas. Methods for estimating the fundamental catalogue parameters like the Gutenberg–Richter (G-R) b-value and the completeness magnitude (Mc) are well established and routinely applied. However, the magnitudes reported in seismicity catalogues contain measurement uncertainties which may significantly distort the estimation of the derived parameters. In this study, we use numerical simulations of synthetic data sets to assess the reliability of different methods for determining b-value and Mc, assuming the G-R law validity. After contaminating the synthetic catalogues with Gaussian noise (with selected standard deviations), the analysis is performed for numerous data sets of different sample size (N). The noise introduced to the data generally leads to a systematic overestimation of magnitudes close to and above Mc. This fact causes an increase of the average number of events above Mc, which in turn leads to an apparent decrease of the b-value. This may result to a significant overestimation of seismicity rate even well above the actual completeness level. The b-value can in general be reliably estimated even for relatively small data sets (N < 1000) when only magnitudes higher than the actual completeness level are used. Nevertheless, a correction of the total number of events belonging in each magnitude class (i.e. 0.1 unit) should be considered, to deal with the magnitude uncertainty effect. Because magnitude uncertainties (here with the form of Gaussian noise) are inevitable in all instrumental catalogues, this finding is fundamental for seismicity rate and seismic hazard assessment analyses. Also important is that for some data analyses significant bias cannot necessarily be avoided by choosing a high Mc value for analysis. In such cases, there may be a risk of severe miscalculation of seismicity rate regardless the selected magnitude threshold, unless possible bias is properly assessed. Numerical approximations and analysis, Statistical methods, Statistical seismology 1 INTRODUCTION Seismicity catalogues are comprehensive databases which result from the operation of seismic networks and they include information, mainly, on earthquakes’ location, origin time and magnitude. These data are the basis for extensive analyses of the spatial, temporal and size distribution of seismicity, possible seismicity rate changes, as well as hazard assessment. Local, national and global networks have in general been (gradually) significantly improved, recording increasingly more events in seismogenic areas. However, data heterogeneity is inevitable, as catalogues embody measurement uncertainties (both random and systematic) stemming from human artefacts (instrumentation, methodology, seismological network modifications, etc.) or from geological interference (e.g. attenuation, site effects and aftershock sequences). Such factors result in discrepancies which are strongly imprinted on earthquake catalogues and may significantly distort statistical analysis results (Mignan & Woessner 2012). Therefore, it is important that seismological catalogues be scrutinized for their consistency and homogeneity. One of the most important issues to consider is that estimated magnitudes contain various uncertainties which may result to significant biases, for example on the estimation of the completeness magnitude (Mc, defined as the magnitude above which all events are believed to be recorded). Two different approaches are usually followed for the determination of Mc (Mignan & Woessner 2012). The first family is network-based methods (Schorlemmer & Woessner 2008; Mignan et al.2011; Plenkers et al.2011; Maghsoudi et al., 2013, 2015; Kwiatek & Ben-Zion 2016) which are based upon the assessed probability of the detection of an earthquake, given the observation sensitivity and distribution of the seismological network. The alternative approach is catalogue-based methods, either relying on the day-to-night ratio of earthquake frequency (Rydelek & Sacks 1989; Taylor et al.1990) or on the assumption of a self-similar earthquake generation process (Woessner & Wiemer 2005, and references therein). In such cases, the Frequency–Magnitude Distribution (FMD) of earthquakes is assumed to be sufficiently described by the well-known empirical Gutenberg–Righter (G-R) law (Ishimoto & Iida 1939; Gutenberg & Richter 1944). The frequency N(M) of the events with magnitudes greater than or equal to M is given by:   $${\log _{10}}N(M) = \alpha - bM$$ (1)where parameter α describes the general activity level and b the proportion of larger relative to smaller events. The exponential behaviour of the FMD has been observed in many data sets and is consistent with laboratory experiments (Scholz 1968) and various theoretical models (Kanamori & Anderson 1975; Tinti & Mulargia 1985). Eq. (1) can also be transformed into a power law (Pareto) distribution of the scalar seismic moment (Kagan 1997). Detailed investigations of some seismicity data sets have occasionally revealed cases of apparent significant deviations from the G-R law (Utsu 1999) in, for example anthropogenic seismicity (Urban et al.2015) or global data (Serra & Corral 2017). Still, the exponential distribution is usually dominant (e.g. Kagan 1991; Kijko et al.2001; Pisarenko & Sornette 2003) and as a result, the G-R law remains the simplest and most commonly accepted formulation for describing the FMD of earthquakes from nano (e.g. Boettcher et al.2009; Kwiatek et al.2010) to global seismicity scale (Kagan et al. 2003). Assuming that the G-R law is valid, and taking into account that the estimated magnitudes in a seismicity catalogue incorporate errors (uncertainties), a reasonable question that arises is whether these errors can affect the estimation of b-value and Mc and what is the degree of their potential influence. Since Mc determination is essential, for example for the correct estimation of b-values, an inaccurate Mc assessment may significantly degrade seismic hazard assessment, seismicity rate investigation, cluster analysis or any other form of relevant statistical analysis. Tinti & Mulargia (1985) and Tinti et al. (1987) showed that α and b estimates of the G-R relationship strongly depend on the magnitude uncertainties rising from (i) systematic errors (due to e.g. the approximations made while applying a model or methodology), (ii) discretization (i.e. magnitudes are rounded although they are naturally continuous quantities), (iii) random errors (e.g. related to the data acquisition and analysis process) and (iv) catalogue heterogeneities (e.g. conversions among different magnitude scales). In a homogeneous catalogue in the sense of including the same magnitude scale for all events, if the magnitude errors are disregarded, the α-value may be overestimated, even if the b-value is not affected. In heterogeneous catalogues, neglecting the random magnitude errors may lead to more significant bias in the estimated parameters (Tinti et al.1987). A conservative strategy often followed is to select a rather high completeness magnitude threshold, resulting in smaller data sets for analysis. The advantages of using a high threshold to try to assure completeness must be weighed against the loss of information, when a considerable number of events are excluded, leading to higher statistical errors. The aim of this study is to investigate and assess the impact of magnitude uncertainties (inherently existing in seismic catalogues) on the estimation of b-value and Mc. Thus, we will identify how these uncertainties affect seismicity rate determination for M greater or equal to a pre-defined value. Since other seismicity parameters (spatial and temporal) are not studied here, synthetic magnitude catalogues following a specified distribution are produced. Two different approaches for creating the synthetic data sets are applied and the results are compared with each other. Then, the properties of the synthetic catalogues are demonstrated by examining idealized cases where data sets are large enough to provide statistically significant estimations of the basic seismicity parameters (b-value and Mc). The calculations are then repeated after introducing noise of specified distribution to the original data. We rely on three catalogue-based methodologies for assessing Mc and apply four different approaches (see Section 2.2), whereas the b-value (and the corresponding uncertainty) is calculated throughout as a function of the minimum included magnitude (Mmin) by the maximum likelihood estimator of Aki (1965). Several scenarios are tested, corresponding to different noise properties and sample size, and the significance of the results is evaluated by Monte Carlo simulations. Finally, the influence of Mc and b-value determination on seismicity rate and seismic hazard assessment implications is discussed. 2 METHODS 2.1 b-value The general method for estimating the b-value is based on the double truncated exponential distribution, where the probability density function of magnitude M (limited between a minimum Mmin and a maximum Mmax) for the G-R relation is given by (Page 1968; Kijko & Sellevoll 1989):   \begin{eqnarray} {f(M) = \frac{{\beta [\exp ( - \beta (M - {M_{\min }} + \Delta M/2))]}}{{1 - \exp [ - \beta ({M_{\max }} - {M_{\min }} + \Delta M/2)]}}} \nonumber\\ \quad {\rm{for}}\quad {M_{\min }} \le M \le {M_{\max }},\quad 0\quad {\rm{otherwise}}\end{eqnarray} (2)with parameter β being approximately 2.3 times (ln10) larger than the b-value of the G-R law. The term ΔM/2 is introduced to eq. (2) as a correction for the finite binning width of the catalogue. ΔM is the round-off interval of the magnitudes, that is the minimum non-zero difference in the reported magnitudes. ΔM is set equal to 0.1 in this study, in accordance with the real catalogues we refer to. The corresponding cumulative distribution function of eq. (2) is given by:   \begin{eqnarray} F(M) \!=\! \left\lbrace {\begin{array}{c@{}c@{}} 0&\qquad\qquad\qquad {{\rm{for}}\quad M < {M_{\min }}}\\ &\!\!\!{\frac{{1 - \exp [( - \beta (M - {M_{\min }} \!+\! \Delta M/2))]}}{{1 - \exp [ - \beta ({M_{\max }} \!-\! {M_{\min }} \!+\! \Delta M/2)]}}}\,\,\,{{\rm{for}}\quad {M_{\min }} \le M \le {M_{\max }}}\\ 1& \qquad\qquad\qquad{{\rm{for}}\quad M > {M_{\max }}} \end{array}} \right. \end{eqnarray} (3) A common assumption that is frequently followed is the so-called unbounded G-R law, in which an infinitely large maximum magnitude is assumed to be possible. This approach leads to a simplification of eqs (2) and (3), by eliminating the denominators. The unbounded exponential distribution may result to the energy catastrophe paradox (Knopoff & Kagan 1977), which can be prevented by applying the original, upper bounded form of eqs (2) and (3) (Kijko & Sellevoll 1989; Kijko & Singh 2011; Lasocki & Urban 2011). However, the two approaches lead to essentially identical results for sufficiently large samples (N > ∼1000) and a robustly constrained Mmax (see Fig. S1, Supporting Information). The b-value for the unbounded G-R approach is considered to be most appropriately estimated by the maximum likelihood estimator (Aki 1965; Bender 1983; Utsu 1999):   $$b = \frac{1}{{\ln (10)\left[ {\left\langle M \right\rangle - ({M_{\min }} - \Delta M/2)} \right]}}$$ (4)where <M> is the sample mean of the events included. Aki (1965) also estimated the b-value standard error, σb, for a sample size, N, expressed as:   $${\sigma _b} = \frac{b}{{\sqrt N }}$$ (5) Note that eq. (5) is an asymptotic maximum likelihood estimator and can be considered to be reliable when N is essentially large (∼ >100). In our study, we applied eqs (4) and (5) in data sets with N > 300, so the asymptotic estimator can be considered as appropriate. 2.2 Completeness magnitude (Mc) Once the b-value has been estimated as a function of Mmin, the completeness magnitude (Mc) of the data sets can be estimated. For the Mc evaluation, we rely on four catalogue-based methodologies: (i) Maximum Curvature (MaxC, Wiemer & Wyss 2000), (ii) Goodness-of-Fit Test (GFT, Wiemer & Wyss 2000), (iii) modified GFT (mGFT, Leptokaropoulos et al.2013) and (iv) the b-value stability (MBS, Cao & Gao 2002). In all these methods, there is a direct dependency between the estimated Mc and the estimated b-value and there are cases when the b-values are sensitive to even small variations of Mc (0.1 unit). These methods are not computationally expensive and they are easily applied, constituting suitable tools for Monte Carlo simulations over numerous synthetic data sets, as in this study. The MaxC technique defines Mc as the most frequent magnitude in the non-cumulative distribution, which corresponds to the maximum value of the first derivative of the G-R law. According to the GFT, the completeness threshold is estimated by fitting an exponential distribution over real and synthetic magnitude samples. These two approaches (real and synthetic) are combined under the frame of a quantitative testing procedure comparing R-values, that is, the offset between real and synthetic data:   $$R = \frac{{\sum\limits_{{M_i}}^{{M_{\max }}} {\left| {{N_o} - {N_s}} \right|} }}{{\sum\limits_i N }}$$ (6) where R is the normalized absolute difference between the number of observed (No) and simulated (Ns) events. Two thresholds for R, 5 and 10 per cent, are commonly used in published studies to justify the selection of Mc. In mGFT, k synthetic FMDs with the same number of events and same exponential distribution as the observed data are created in each case. Goodness of fit to the G-R law in each magnitude bin is found for each one of these synthetic catalogues. Hence, the total residual value RmGFT is calculated by averaging the R-values from the k synthetic FMDs (eq. 7). The minimum RmGFT indicates the value of the Mc. In this study, k is set equal to 500, in all cases investigated. Given the R from eq. (6) and k synthetic FMDs, RmGFT is given by:   $${R_{{\rm{mGFT}}}} = \frac{{\sum\limits_{j = 1}^k R }}{k}$$ (7) The MBS approach is built upon the assumption that b-value increases for Mmin < Mc, but stabilizes for M > Mc. Cao and Gao (2002) applied this technique considering the b stability as a function of Mmin. The completeness threshold is defined as the minimum magnitude for which   $$\Delta b = \left| {{b_{{\rm{average}}}} - b} \right| \le 0.03$$ (8) with baverage being the arithmetic mean of the b-values corresponding to successive magnitude thresholds within half a magnitude range (i.e. 0.5 units for 0.1 bin size). Instead of using the 0.03 value as the criterion for b-value stability, we here follow the alternative technique of Woessner & Wiemer (2005). The authors suggested the use of the b-value uncertainty, δb, as the criterion, introduced by Shi & Bolt (1982):   $$\delta b = 2.3{b^2}\sqrt {\frac{{\sum\limits_{i = 1}^N {({M_i} - \left\langle M \right\rangle )} }}{{N(N - 1)}}}$$ (9) where N is the number of all events with M > Mc and <M> is their mean magnitude. Thus, Mc is defined as the minimum magnitude for which   $$\Delta b = \left| {{b_{{\rm{average}}}} - {\rm{b}}} \right| \le {\rm{\delta b}}$$ (10) There are other catalogue-based methods that can be used to estimate the completeness magnitude (e.g. Mignan & Woessner 2012; Godano 2017, and references therein). For example, Godano (2017) recently suggested that a robust estimation of Mc can be achieved based on the calculation of the harmonic mean of the magnitudes larger than a given magnitude threshold. The method was tested on synthetic and real data and the results were compared to the MBS technique, revealing the necessity of an accurately defined Mc in order to obtain a correctly estimated b-value (see also later sections). Another broadly used technique is the Entire Magnitude Range (EMR) method, developed by Ogata & Katsura (1993) and modified by Woessner & Wiemer (2005). For EMR to be applied, a complete FMD model needs to be assumed, necessitating the estimation of four parameters. This makes the technique more complex in comparison with other methods such as the non-parametric MaxC or the MBS/GFTs methods described above. A technique like EMR incorporates stronger assumptions concerning the incomplete part when evaluating the completeness magnitude. As discussed in Kagan (2002), strong implicit or explicit assumptions about the character of the incompleteness may be problematic. There are multiple reasons of losing data below Mc (e.g. spatiotemporal network inefficiency, variations of the spatiotemporal signal-to-noise ratio, even operators’ ability and decisions). Therefore, the distribution of the incomplete data may be statistically complicated. Although modeling of the entire catalogue for generating realistic synthetic data sets is applied here, the fitting of parameters to the incomplete part will be reliable if our detailed assumptions about the character of the incompleteness are correct, but may be misleading otherwise. Therefore, we have chosen not to use this approach in this study. 2.3 Synthetic data We study synthetic magnitude catalogues which are constructed using two models, both based upon the G-R law with some data missing at smaller magnitudes. The first synthetic catalogue (A) consists of two groups of magnitudes which are produced separately and are then combined. The incomplete part corresponds to a half Gaussian distribution, that is a set of normally distributed event magnitudes which is produced with mean equal to 1.9 and a standard deviation equal to 0.2. As described by Woessner & Wiemer (2005), various distributions are sometimes assumed to describe the incomplete fraction of a catalogue. For our simulations, we choose a generic normal distribution for simplicity. The completeness magnitude (Mc) of the entire data set is fixed at magnitude 2. Therefore, the distribution of M ≥ 2 exactly follows the G-R law, with b = 1.0, which is a commonly observed value for a wide variety of instrumental catalogues on several spatial and temporal scales (e.g. El-Isa & Eaton 2014, and references therein). Although Mc = 2 might seem an arbitrary selection, the self-similar properties of observed G-R distributions indicate scale invariance, and thus the choice of Mc does not really affect the results, since they can be extrapolated towards both higher as well as lower values. We generate a set of approximately 73 000 synthetic magnitudes, including more than 30 000 events with M > Mc, rounded to the first decimal as is common in many regional and global catalogues. The maximum ‘observed’ magnitude in this catalogue is 6.5, so the range of the complete part is 4.5 mag units. That catalogue A as a compilation of two distinct parts of known properties gives the advantage of well-determined initial conditions, with exact and known Mc and b-value for the complete part. This allows us to explicitly assess the performance of the different analysis methods in estimating Mc and b-value. However, it is unlikely that real data sets have such a clear discontinuity. Therefore, in order to assess the possible sensitivity of our analysis to our initial assumptions, we chose also to examine a synthetic data set with no explicit discontinuity between the complete and incomplete parts, which may be more realistic. We use the model suggested by Ogata & Katsura (1993) and modified by Woessner & Wiemer (2005), based on the observed seismicity in Japan. According to their model, a detection function q (M|μ,σ) is used to describe the detection rate, which is given by a cumulative normal distribution:   $$q(M|\mu ,\sigma ) = \left\{ {\begin{array}{@{}*{1}{c}@{}} {\frac{1}{{\sigma \sqrt {2\pi } }}\int\limits_{{ - \infty }}^{{{M_c}}}{{\exp \left[ { - \frac{{{{(M - \mu )}^2}}}{{2{\sigma ^2}}}} \right]}}dm,\quad M < {M_c}}\\ {1,\quad {\rm{else}}} \end{array}} \right\}$$ (11) where μ is the magnitude corresponding to 50 per cent of detected earthquakes and is based upon the network detectability, while σ represents how fast the detection rate decays as magnitude decreases. Using this approach, we build a second set of synthetic magnitudes (catalogue B), where μ = 1.5 and σ = 0.2. More details on the choice of these parameters can be found in the work of Huang et al. (2016). The resulting stochastic catalogue includes ‘events’ randomly selected (following the rejection method described by Zhuang & Touati 2015) from a joint distribution of the G-R law and the detection function. Catalogue B eventually includes 100 000 events, with magnitudes up to 6.3, which are later rounded to the first decimal digit. For these rounded magnitudes, ‘events’ with M ≥ 2 have 99 per cent probability to be included in the catalogue, meaning that all events above this threshold are essentially recorded. Although this technique cannot provide a precise a priori completeness threshold, it is in some ways perhaps more realistic than catalogue A. We then add random errors (i.e. noise) to the magnitude estimates in the two catalogues. These are produced using a normal distribution with zero mean (μM = 0), assuming that there is no systematic trend for overestimating or underestimating the magnitudes in the catalogue (Tinti & Mulargia 1985). The effect of using a non-zero mean has also been investigated (Figs S2 and S3, Supporting Information) leading to a shift to higher or lower magnitudes. The assumed standard deviation of the Gaussian noise (σΜ) is in the range 0.1–0.4 (each distribution is investigated separately). The noise we introduce is magnitude independent, that is both higher and lower magnitudes demonstrate identical standard deviation. This choice was motivated by two examples of empirical data, from Greece and Spain. Gonzalez (2016) reports that the standard deviation of the reported magnitude errors in the Spanish national earthquake catalogue, from 1997 November onwards, is approximately 0.2 units, regardless the magnitude range. For the Greek catalogue, we ended up with similar conclusions, after investigating nearly 49 000 events obtained from the monthly bulletins provided by the Aristotle University of Thessaloniki (http://geophysics.geo.auth.gr/ss/) covering a period of 6.5 yr. During that time, the Hellenic Unified Seismological Network (HUSN) which provides the data have been relatively stable and the corresponding catalogue is homogeneous as far as the magnitude scale is concerned. The histogram in Fig. 1 shows that the reported magnitudes include errors with an estimated standard deviation of about 0.2. According to the results shown in Table 1, the standard deviation of the reported magnitude errors appears to be independent of the event magnitude, exhibiting essentially identical (σΜ) values (approximately 0.2) for all magnitude classes in the range of the available data. Figure 1. View largeDownload slide Histogram of reported magnitude uncertainties for ∼49 000 events of the HUSN seismicity catalogue for the period 2008 August, 1 to 2014 December, 31. Figure 1. View largeDownload slide Histogram of reported magnitude uncertainties for ∼49 000 events of the HUSN seismicity catalogue for the period 2008 August, 1 to 2014 December, 31. Table 1. Mean reported standard deviation, σΜ, for subsets of data from the HUSN seismicity catalogue (2008 August 1–2014 December 31). Magnitude  Mean σM  0.0 < M < 1.0  0.214  1.1 < M < 2.0  0.212  2.1 < M < 3.0  0.213  3.1 < M < 4.0  0.214  4.1 < M < 5.0  0.217  5.1 < M < 6.2  0.222  Magnitude  Mean σM  0.0 < M < 1.0  0.214  1.1 < M < 2.0  0.212  2.1 < M < 3.0  0.213  3.1 < M < 4.0  0.214  4.1 < M < 5.0  0.217  5.1 < M < 6.2  0.222  View Large The first examination of the effect of noise on the magnitude distribution can be made with Fig. 2, where the apparent change of the shape of the magnitude distribution due to the presence of a Gaussian noise with σΜ = 0.2 is shown. As the noise level is increased (larger σΜ), more events are sorted into different magnitude bins. The influence of the noise on the derived b-values using different minimum magnitudes is demonstrated in Fig. 3. Because there are fewer larger events than smaller and because the errors have a zero mean and magnitude independent standard deviation, we expect numerically more ‘leakage’ to larger than to smaller magnitudes. This means that more events are distributed towards the right tail of the distribution and fewer to the left. In this case, the practically estimated completeness threshold will in general be higher than the true underlying value. For reference, the completeness magnitudes and b-values of the initial catalogues A and B described above are shown in Table 2. Figure 2. View largeDownload slide Red circles: cumulative Frequency Magnitude Distributions (FMD) for catalogue A (top graphs) and catalogue B (bottom graphs), before (left-hand column) and after (right-hand column) the contamination with Gaussian noise of μM = 0 and σΜ = 0.2. The histograms show the number of events in each magnitude bin. Figure 2. View largeDownload slide Red circles: cumulative Frequency Magnitude Distributions (FMD) for catalogue A (top graphs) and catalogue B (bottom graphs), before (left-hand column) and after (right-hand column) the contamination with Gaussian noise of μM = 0 and σΜ = 0.2. The histograms show the number of events in each magnitude bin. Figure 3. View largeDownload slide The estimated b-value as a function of the minimum included magnitude for both catalogues (A on the left and B on the right graph), before (blue lines) and after (red lines) Gaussian noise of μM = 0 and σΜ = 0.2 is introduced to the data (see also the FMDs of the data in Fig. 2). The dashed lines show the estimated standard error of b (eq. 5). Figure 3. View largeDownload slide The estimated b-value as a function of the minimum included magnitude for both catalogues (A on the left and B on the right graph), before (blue lines) and after (red lines) Gaussian noise of μM = 0 and σΜ = 0.2 is introduced to the data (see also the FMDs of the data in Fig. 2). The dashed lines show the estimated standard error of b (eq. 5). Table 2. Four techniques are used to estimate the completeness magnitude (Mc) and b-value for the two catalogues, A and B (noise-free). Catalogue A  MaxC  GFT90  GFT95  mGFT  MBS  Mc  1.9  1.7  1.8  2  1.9  b  0.99  0.86  0.93  1.00  1.00  Catalogue B  Mc  1.6  1.5  1.6  2  1.9  b  0.91  0.85  0.91  0.99  0.98  Catalogue A  MaxC  GFT90  GFT95  mGFT  MBS  Mc  1.9  1.7  1.8  2  1.9  b  0.99  0.86  0.93  1.00  1.00  Catalogue B  Mc  1.6  1.5  1.6  2  1.9  b  0.91  0.85  0.91  0.99  0.98  View Large 3 RESULTS 3.1 Introducing magnitude uncertainties Using the four techniques described earlier (MaxC, GFT, mGFT and MBS), we investigate the influence of magnitude uncertainties on the estimated b-values and completeness threshold of two catalogues (A and B). Four levels of noise are introduced to the initial magnitude data, with σΜ = 0.1, 0.2, 0.3 and 0.4, that is corresponding to increasingly higher standard deviation of the noise. For GFT, the results are tested for two degrees of fit as suggested by earlier studies (e.g. Woessner &Wiemer 2005), 90 and 95 per cent, referred to as GFT90 and GFT95, respectively. Results are summarized in Figs 4 and 5. Numerical results can be found in the Table S1 of the Supporting Information. For catalogue A, where Mc is explicitly defined in the initial magnitude set (Mc = 2), the results reveal that MaxC tends to underestimate the completeness threshold, without major changes as higher noise level is introduced (Fig. 4). GFT95 also underestimates Mc for low noise levels, but the influence of the magnitude uncertainty is more profound (higher Mc is estimated as noise standard deviation gets larger). mGFT and MBS provide similar results to each other concerning Mc, for the different values of noise standard deviation. The results are similar for the case of catalogue B, although the response of MaxC seems slightly different between the two data sets. Figure 4. View largeDownload slide Mc estimated using the MaxC, GFT, mGFT, MBS methods when Gaussian noise of different standard deviation (σΜ) is added to catalogues A (Mc = 2 and b = 1) and B (50 per cent detection for M = 1.5 and b = 1). The results are the means of 1000 realizations and the error bars show the standard deviation (blue for catalogue A and red for B). The straight lines show the input Mc = 2 for comparison. The stars show the estimated Mc from each method in the error-free case. Figure 4. View largeDownload slide Mc estimated using the MaxC, GFT, mGFT, MBS methods when Gaussian noise of different standard deviation (σΜ) is added to catalogues A (Mc = 2 and b = 1) and B (50 per cent detection for M = 1.5 and b = 1). The results are the means of 1000 realizations and the error bars show the standard deviation (blue for catalogue A and red for B). The straight lines show the input Mc = 2 for comparison. The stars show the estimated Mc from each method in the error-free case. Figure 5. View largeDownload slide As Fig. 4, but mean b-values (and their standard deviation as error bars) from 1000 Monte Carlo realizations of noise contaminated catalogues A and B (the narrow error bars obtained by GFT are explained in Fig. S5 in the Supporting Information). Figure 5. View largeDownload slide As Fig. 4, but mean b-values (and their standard deviation as error bars) from 1000 Monte Carlo realizations of noise contaminated catalogues A and B (the narrow error bars obtained by GFT are explained in Fig. S5 in the Supporting Information). The b-values calculated using the different Mc estimates tend to decrease as noise increases (Fig. 5). mGFT and MBS provide more robust estimates than the other methods, with smaller changes in b and values close to the correct (input) b = 1 and the noise-free estimates (Table 2). The error bars in the figures correspond to one standard deviation from 1000 Monte Carlo realizations. The corresponding completeness threshold is also more ‘realistic’ (in terms of G-R law validity) when it is defined with these methods. Given that real magnitude estimates contain errors, our results imply that empirical b-values based on Mc from MaxC and GFT may be strongly biased, with considerable implications for seismic hazard assessment. 3.2 Magnitude uncertainties and sample size Considering that the b-value is a catalogue property statistically estimated from the available data, it is reasonable to assume that a sufficient sample size, N is required for a reliable estimation. As pointed out in the recent study of Nava et al. (2017), for small N (e.g. for small geographical areas or short time periods) results can be unreliable. We now investigate the bias in estimated Mc and b due to magnitude noise when N varies between 300 and 50 000. Events are extracted randomly from catalogues A and B and we show results only for magnitude noise with μM = 0 and σΜ = 0.2 (the latter motivated by what we observed for real data sets, see Section 2.3). We investigate the effect on the b-value of adding such noise by performing 1000 Monte Carlo trials for each sample size, and calculating the mean difference and the standard deviation of the result. The results are summarized in Figs 6 and 7 and are similar for both catalogues. Therefore, we show only a representative part of the results (more information can be found in Figs S5–S8 in the Supporting Information). Figure 6. View largeDownload slide Mc (top graphs) and b (bottom graphs) for subsets of catalogue A (Mc = 2 and b = 1) with N from 300 to 50 000, estimated using the four techniques (MaxC, GFT, mGFT and MBS). Markers show the means from 1000 separate realizations. Left: no added noise. Right: noise (μM = 0 and σΜ = 0.2) added to magnitudes. The straight lines indicate the input Mc and b. Figure 6. View largeDownload slide Mc (top graphs) and b (bottom graphs) for subsets of catalogue A (Mc = 2 and b = 1) with N from 300 to 50 000, estimated using the four techniques (MaxC, GFT, mGFT and MBS). Markers show the means from 1000 separate realizations. Left: no added noise. Right: noise (μM = 0 and σΜ = 0.2) added to magnitudes. The straight lines indicate the input Mc and b. Figure 7. View largeDownload slide Similar to Fig. 6, but for catalogue B (50 per cent detection for M = 1.5 and b = 1) and now showing differences between input and estimated values of Mc and b. Figure 7. View largeDownload slide Similar to Fig. 6, but for catalogue B (50 per cent detection for M = 1.5 and b = 1) and now showing differences between input and estimated values of Mc and b. Working with subsets from catalogue A and without the addition of any noise (Fig. 6, left-hand column), we can see that the first two methods, MaxC and GFT, underestimate Mc irrespectively of the sample size. For GFT, we present only results for the 90 per cent goodness of fit level, as the 95 per cent level was not reached for the majority of the subsets in both catalogues (95 per cent goodness of fit was achieved only in 30–40 per cent of the subsets). The mGFT and MBS methods define higher Mc for larger N, tending to approach the initially assumed value for catalogue A (see Table 2). However, MBS seems to be more sensitive to small N. This is what we would expect given that the stability of the estimated b-value is used as the criterion here, and more ‘events’ and wider magnitude range ensure more robust and realistic estimates of b. As also illustrated in Fig. 6, the three methods converge to the ‘real’ value of b = 1 as more ‘events’ are included in the subsets, except GFT90 which provides the lowest Mc and b, regardless the sample size. When the magnitude noise is added (Fig. 6, right-hand column), the performances of MaxC and GFT do not change significantly. Both methods define low completeness thresholds which are almost constant despite the changes in the sample size. However, because the added noise perturbs the FMD of the original catalogue (Fig. 2), the b-values are also lower than the input (b = 1.0) value (also shown in Fig. 3), similarly for the two techniques. For large N (N > ∼2000), the b-values given by mGFT and MBS are very close to the input value (less than 5 per cent are lower than b = 1.0), even though Mc is overestimated when it is compared to the input value (Mc = 2.0). For better understanding the potential combined effect of sample size and magnitude uncertainties on Mc and b, we use catalogue B to calculate the difference in Mc and b (as a function of the sample size) when these are estimated for data sets before and after the noise contamination. The results are illustrated in Fig. 7. Here, MaxC and GFT tend to estimate similar Mc values for all subsets, regardless the existence or absence of noise. As for catalogue A, mGFT and MBS tend to estimate higher Mc when errors are introduced and more events are used, which is expressed by the downward trend of the plotted differences (i.e. the absolute difference gets higher). In general, the estimated b-value is more accurate for larger subsets, that is closer to the initial input, irrespective of whether noise is introduced or not. The most accurate b-value estimation is achieved for N > ∼3000, by applying MBS and mGFT. 4 DISCUSSION We have investigated the potential biases introduced to b-value and Mc calculations due to the interference of various factors in real seismicity catalogues, such as the catalogue size and random errors in reported magnitudes. We also explore if different common methods for the b and Mc estimation are more susceptible to bias when data noise is present. Under the assumption that each event has an equal probability of being either overestimated or underestimated (i.e. normally distributed magnitude noise), there is an inherent trend of reporting an overestimated magnitude (compared to the real one) for the majority of the events close to and above Mc. This occurs because there are more small events that are shifted towards higher magnitudes than large events shifted to lower magnitude values. Therefore, since the noise is uniformly introduced to all magnitudes, an apparent increase of the larger to smaller events ratio in the complete part of the catalogue leads to an apparent decrease in the estimated b-value. This b-value decrease is inevitable to all catalogues contaminated by any symmetrically distributed noise (e.g. normal, uniform and t-distribution) and is proportional to the standard deviation of the noise. The underestimation of the b-value may in turn result in significant artefacts regarding seismicity rate evaluation and seismic hazard assessment. The effect of magnitude uncertainties in catalogue A is demonstrated in Fig. 8 (see also Table S2 in the Supporting Information for the numerical results). For catalogue A, where Mc is explicitly set at M = 2, we can easily calculate the number of events shifted from the incomplete to the complete part of the catalogue, and vice versa. The larger the number of events with M < Mc, the higher the resulting noise-induced contamination of the ‘complete’ data set from events with true magnitude below Mc. Note that the percentage of the additional events (erroneously included above a given magnitude threshold) in the contaminated catalogues stabilizes at 2.9, 11.5 and ∼27 per cent of the total number of events included in the original catalogue, from magnitudes of about 2.2 (σΜ = 0.1), 2.4–2.5 (σΜ = 0.2) and 2.7–2.8 (σΜ = 0.3), respectively. The mGFT and MBS methods estimate higher Mc than the other methods when noise with larger standard deviation is introduced. In previous studies that are based on data sets which do not include any noise, Mignan & Woessner (2012) have also reported the overestimation of Mc when MBS is applied, while the recently suggested method by Godano (2017) is reported to define even higher Mc than MBS. These methods (which report higher Mc) provide a more reliable b-value estimation. Although there are still events shifted from the apparent incomplete to the complete part of the catalogue, the resulting ratio of the smaller to larger magnitudes stabilizes, yielding a more realistic b-value and eliminating in some sense the effect of the magnitude noise (uncertainty). This is potentially significant, for example because an underestimate of b implies an overestimation of the number of larger future events. Figure 8. View largeDownload slide The difference in the apparent number of events over a given magnitude threshold and the corresponding (‘true’) number before noise (μM = 0 and σΜ = 0.1, 0.2 and 0.3) is added (catalogue A). Figure 8. View largeDownload slide The difference in the apparent number of events over a given magnitude threshold and the corresponding (‘true’) number before noise (μM = 0 and σΜ = 0.1, 0.2 and 0.3) is added (catalogue A). The magnitude of completeness (Mc) values estimated with MaxC and GFT are lower than those from the other algorithms, meaning more intense contamination of the presumed complete data set. Similarly, Mignan & Woessner (2012) showed that EMR also gives lower estimations of Mc. This implies that the absolute seismicity rate will be overestimated, with various implications for seismicity studies. However, when investigating rate changes this may not matter because the entire (complete) catalogue will be affected in a similar manner. This is demonstrated in Fig. 9, where the percentage of the events (catalogue A) switching between the deduced incomplete and complete magnitude ranges are shown for a noise level of σΜ = 0.2. Note that for M ≥ ∼2.2, the three proportions (overestimation, underestimation and same value) tend to stable values of about 57, 27 and 16 per cent, respectively, for completeness thresholds larger than 2.2. This threshold of 2.2 (implying b = 0.992 ± 0.007) is the magnitude threshold above which the b-value stabilizes close to the ‘real’ value (Figs 8 and 9). The anomalies observed at higher magnitudes are due to the limited sample size. Figure 9. View largeDownload slide Percentage of events in the synthetic catalogue A which have underestimated (blue crosses), overestimated (black circles) or correctly estimated (red squares) magnitudes when Gaussian noise (μM = 0 and σΜ = 0.2) is added, plotted against true (synthetic) magnitude. The shaded area indicates 0.2 units on both sides of M = 2.0. Note the shift towards larger magnitudes. Figure 9. View largeDownload slide Percentage of events in the synthetic catalogue A which have underestimated (blue crosses), overestimated (black circles) or correctly estimated (red squares) magnitudes when Gaussian noise (μM = 0 and σΜ = 0.2) is added, plotted against true (synthetic) magnitude. The shaded area indicates 0.2 units on both sides of M = 2.0. Note the shift towards larger magnitudes. To summarize the above, for catalogue A, contaminated with Gaussian noise (μM = 0 and σΜ = 0.2) and magnitudes rounded to 0.1, for relatively large data sets (N > ∼800): For Mmin  = Mc, where Mmin is the lowest magnitude used in estimating b-value: there are ∼50 per cent of events below Mc included in the catalogue and ∼30 per cent of events above Mc are discarded. For Mmin  > Mc + 0.2: there are ∼57 per cent of events with M <  Mmin erroneously included in the catalogue and ∼27 per cent of events with M >  Mmin which are wrongly discarded. It follows from our analysis that whatever method is used, the part of any noise contaminated catalogue which is assessed to be ‘complete’ is unlikely ever to be truly complete, whatever magnitude threshold we use. The effect of magnitude noise is that some events with magnitudes below our threshold ‘leak’ to above the threshold, and vice versa. This may have significant implications for statistical investigations of, for example interevent times or distances analysis. For some studies, it may be possible to apply straightforward approximate corrections. For example, with the parameters used in our example (Gaussian noise, μM = 0 and σΜ = 0.2; Fig. 8, red curve), the number of events apparently over our threshold will be about 11.5 per cent greater than the true value. 5 SUMMARY AND CONCLUSIONS In order to investigate possible bias effects, we generated synthetic data under some clear assumptions, which were to the extent possible based on our understanding of the character of real data: Both catalogues (A and B) include a complete part which follows the G-R law with b = 1 and an incomplete part corresponding to a specified detection level. Magnitudes are rounded at their first decimal. The noise assigned to the original synthetic catalogue is magnitude independent and normally distributed (Gaussian noise) owing a zero mean (μM = 0) and a selected standard deviation (σΜ: 0.1–0.4). Under these assumptions the following conclusions can be obtained: The b-values are always underestimated when noise of the character we have investigated is introduced to magnitude values. The degree of underestimation is proportional to the noise standard deviation. MaxC generally provides a relatively reliable (although underestimated) estimate of the actual Mc. However, this might not always be the case (e.g. when no obvious most frequent magnitude is evident or a complex incomplete part is present). Therefore, it is suggested that if the MaxC method is used to estimate Mc, it should be in combination with other methods, along with a visual inspection of the FMD (i.e. a completely automatic algorithm might be sometimes unreliable). MaxC is definitely inadequate for b-value estimation. GFT90 is inadequate for estimating the actual Mc and may introduce severe bias in the analysis. GFT95 is an appropriate method for estimating Mc, although 95 per cent fitting is not usually achieved for data sets with N < 1000. The 90 per cent level of fitting using GFT is definitely unacceptable for b-value evaluation. For N < 1000, GFT95 provides the most robust estimate of the b-value. For N > 1000, mGFT and MBS overestimate the actual Mc in general, but they are by far the most appropriate methods for estimating the b-value and they are highly recommended. The use of b-values based on Mc derived by MaxC and GFT may be highly problematic. Even above a well-defined Mc, the catalogue is unlikely to be complete because of a bidirectional flow of data below and above Mc. For Mmin ≫Mc, the actual number of events is overestimated by a factor depending primarily on the noise standard deviation (at least for relatively large data sets, N > ∼800). In this work, we showed that adequate knowledge of the noise properties incorporated in the seismic catalogues is of utmost importance for correct catalogue analysis. Real data may, at least sometimes, be significantly more complex in character than our simulations and for example different statistical distributions in the data could affect bias and uncertainty estimates. Additional complications may include, for example the effect of magnitude conversions from different scales on seismicity rates, as has been discussed by Musson (2012); magnitude-dependent noise; mixing of data sets with different spatial and temporal variation of b-value; incomplete data sets; spatial and temporal variation of Mc (Márquez-Ramírez et al.2015; Hainzl 2016; Kijko et al.2016), etc. The possible implications of such effects, for example hazard evaluation may be significant, and several of them merit further investigation. However, our models are based on observed characteristics of data and our conclusions probably have some general validity, even though it is possible that there may be data sets with rather different characteristics. Acknowledgements This work was supported within SHEER: ‘SHale gas Exploration and Exploitation induced Risks’ project funded from the European Union Horizon 2020—Research and Innovation Programme, under grant agreement 640896. The work was also partially supported within statutory activities no. 3841/E-41/S/2017 of the Ministry of Science and Higher Education of Poland. AA acknowledges funding from the Jubilee scholarship (Uppsala University). The magnitude distribution parameters and the magnitude of completeness were estimated by MCCalc program, which can be downloaded from IS-EPOS: Induced Seismicity European Plate Observing System web platform https://tcs.ah-epos.eu/ after registration. The data used in this study can be found at http://geophysics.geo.auth.gr/ss/station_index_en.html. REFERENCES Aki K., 1965. Maximum likelihood estimate of b in the formula log N = a − bM and its confidence limits, Bull. Earthq. Res. Inst. Tokyo Univ. , 43, 237– 239. Aristotle University of Thessaloniki Seismological Network, 1981. Permanent regional seismological network operated by the Aristotle University of Thessaloniki. International Federation of Digital Seismograph Networks. Other/Seismic Network, doi:10.7914/SN/HT. Bender B., 1983. Maximum likelihood estimation of b-values for magnitude grouped data, Bull. seism. Soc. Am. , 73, 831– 851. Boettcher M.S., McGarr A., Johnston M., 2009. Extension of Gutenberg-Richter distribution to Mw-1.3, no lower limit in sight, Geophys. Res. Lett. , 36( 10), doi:10.1029/2009GL038080. https://doi.org/10.1029/2009GL038080 Cao A.M., Gao S.S., 2002. Temporal variation of seismic b-values beneath northeastern Japan island arc, Geophys. Res. Lett. , 29( 9), doi:10.1029/2001GL013775. https://doi.org/10.1029/2001GL013775 El-Isa Z.H., Eaton D.W., 2014. Spatiotemporal variations in the b-value of earthquake magnitude-frequency distributions: classification and causes, Tectonophysics , 615–616, 1– 11. https://doi.org/10.1016/j.tecto.2013.12.001 Google Scholar CrossRef Search ADS   Godano C., 2017. A new method for the estimation of the completeness magnitude, Phys. Earth planet. Inter. , 263, 7– 11. https://doi.org/10.1016/j.pepi.2016.12.003 Google Scholar CrossRef Search ADS   Gonzalez A., 2016. The Spanish National Earthquake Catalogue: evolution, precision and completeness, J. Seismol. , doi:10.1007/s10950-016-9610-8. Gutenberg B., Richter C.F., 1944. Frequency of earthquakes in California, Bull. seism. Soc. Am. , 34, 184– 188. Hainzl S., 2016. Apparent triggering function of aftershocks resulting from rate-dependent incompleteness of earthquake catalogs, J. geophys. Res. , 121( 9), 6499– 6509. https://doi.org/10.1002/2016JB013319 Google Scholar CrossRef Search ADS   Huang Y.L., Zhou S.Y., Zhuang J.C., 2016. Numerical tests on catalog-based methods to estimate magnitude of completeness, Chin. J. Geophys. , 59, 101– 110. Google Scholar CrossRef Search ADS   Ishimoto M., Iida K., 1939. Observations of earthquakes registered with the microseismograph constructed recently, Bull. Earthq. Res. Inst. , 17, 443– 478. Kagan Y.Y., 1991. Likelihood analysis of earthquake catalogues, Geophys. J. Int. , 106( 1), 135– 148. https://doi.org/10.1111/j.1365-246X.1991.tb04607.x Google Scholar CrossRef Search ADS   Kagan Y.Y., 1997. Seismic moment-frequency relation for shallow earthquakes: regional comparison, J. geophys. Res. , 102( B2), 2835– 2852. https://doi.org/10.1029/96JB03386 Google Scholar CrossRef Search ADS   Kagan Y.Y., 2002. Seismic moment distribution revisited: I. Statistical results, Geophys. J. Int. , 148( 3), 520– 541. https://doi.org/10.1046/j.1365-246x.2002.01594.x Google Scholar CrossRef Search ADS   Kagan Y.Y., Rong Y.F., Jackson D.D., 2003. Probabilistic forecasting of seismicity, in Earthquake Science and Seismic Risk Reduction , pp 185– 200, eds Mulargia F., Geller R.J., Kluwer Academic Publishing, Dordrecht. Kanamori H., Anderson D., 1975. Theoretical basis of some empirical relations in seismology, Bull. seism. Soc. Am. , 65( 5), 1073– 1095. Kijko A., Sellevoll M.A., 1989. Estimation of earthquake hazard parameters from incomplete data files. Part I. Utilization of extreme and complete catalogs with different threshold magnitudes, Bull. seism. Soc. Am. , 79, 645– 654. Kijko A., Lasocki S., Graham G., 2001. Non-parametric seismic hazard in mines, Pure appl. Geophys. , 158( 9), 1655– 1675. https://doi.org/10.1007/PL00001238 Google Scholar CrossRef Search ADS   Kijko A., Singh M., 2011. Statistical tools for maximum possible earthquake magnitude estimation, Acta Geophys. , 59( 4), 674– 700. https://doi.org/10.2478/s11600-011-0012-6 Google Scholar CrossRef Search ADS   Kijko A., Smit A., Selevoll M.A., 2016. Estimation of earthquake hazard parameters from incomplete data files. Part III. Incorporation of uncertainty of earthquake-occurrence model, Bull. seism. Soc. Am. , ( 3), doi:10.1785/0120150252. https://doi.org/10.1785/0120150252 Knopoff L., Kagan Y.Y., 1977. Analysis of the theory of extremes as applied to earthquake problems, J. geophys. Res. , 82( 36), 5647– 5657. https://doi.org/10.1029/JB082i036p05647 Google Scholar CrossRef Search ADS   Kwiatek G., Plenkers K., Nakatami M., Yabe Y., Dresen G., & the JAGUARS-Group, 2010. Frequency-magnitude characteristics down to magnitude -4.4 for induced seismicity recorded at Mponeng gold mine, South Africa, Bull. seism. Soc. Am. , 100( 3), 1165– 1173 https://doi.org/10.1785/0120090277 Google Scholar CrossRef Search ADS   Kwiatek G., Ben-Zion Y., 2016. Theoretical limits on detection and analysis of small earthquakes, J. geophys. Res. , ( 8), doi:10.1002/2016JB012908. https://doi.org/10.1002/2016JB012908 Lasocki S., Urban P., 2011. Bias, variance and computational properties of Kijko's estimators of the upper limit of magnitude distribution, Mmax, Acta Geophys. , 59( 4), 5898– 5916. https://doi.org/10.2478/s11600-010-0049-y Google Scholar CrossRef Search ADS   Leptokaropoulos K.M., Karakostas V.G., Papadimitriou E.E., Adamaki A.K., Tan O., İnan S., 2013. A homogeneous earthquake catalog for Western Turkey and magnitude of completeness determination, Bull. seism. Soc. Am. , 103( 5), 2739– 2751. https://doi.org/10.1785/0120120174 Google Scholar CrossRef Search ADS   Maghsoudi S., Cesca S., Hainzl S., Kaiser D., Becker D., Dahm T., 2013. Improving the estimation of detection probability and magnitude of completeness in strongly heterogeneous media, an application to acoustic emission (AE), Geophys. J. Int. , 193( 3), 1556– 1569. https://doi.org/10.1093/gji/ggt049 Google Scholar CrossRef Search ADS   Maghsoudi S., Cesca S., Hainzl S., Dahm T., Zöller G., Kaiser D., 2015. Maximum magnitude of completeness in a salt mine, Bull. seism. Soc. Am. , 105, doi:10.1785/0120140039. https://doi.org/10.1785/0120140039 Márquez-Ramírez V.H., Nava F.A., Zúñiga F.R., 2015. Correcting the Gutenberg-Richter b-value for effects of rounding and noise, Earthq. Sci ., 28, 129– 134. https://doi.org/10.1007/s11589-015-0116-1 Google Scholar CrossRef Search ADS   Mignan A., Werner M., Wiemer S., Chen C.-C., Wu Y.-M., 2011. Bayesian estimation of the spatially varying completeness magnitude of earthquake catalogs, Bull. seism. Soc. Am. , 101, 1371– 1385. https://doi.org/10.1785/0120100223 Google Scholar CrossRef Search ADS   Mignan A., Woessner J., 2012. Estimating the magnitude of completeness for earthquake catalogs, in Community Online Resource for Statistical Seismicity Analysis , doi:10.5078/corssa-00180805. Musson R.M.W., 2012. The effect of magnitude uncertainty on earthquake activity rates, Bull. seism. Soc. Am. , 102, 2771– 2775. https://doi.org/10.1785/0120110224 Google Scholar CrossRef Search ADS   Nava F.A. et al.  , 2017. Gutenberg-Richter b-value maximum likelihood estimation and sample size, J. Seismol. , 1– 9, doi:10.1007/s10950-016-9589-1. Ogata Y., Katsura K., 1993. Analysis of temporal and spatial heterogeneity of magnitude frequency distribution inferred from earthquake catalogues, Geophys. J. Int. , 113, 727– 738. https://doi.org/10.1111/j.1365-246X.1993.tb04663.x Google Scholar CrossRef Search ADS   Page R., 1968. Aftershocks and microaftershocks, Bull. seism. Soc. Am. , 58, 1131– 1168. Pisarenko V.F., Sornette D., 2003. Characterization of the frequency of extreme Earthquake events by the generalized pareto distribution, Pure appl. Geophys. , 160, 2343– 2364. https://doi.org/10.1007/s00024-003-2397-x Google Scholar CrossRef Search ADS   Plenkers K., Schorlemmer D., Kwiatek G., 2011. On the probability of detecting Picoseismicity, Bull. seism. Soc. Am. , 101, 2579– 2591. https://doi.org/10.1785/0120110017 Google Scholar CrossRef Search ADS   Rydelek P.A., Sacks I.S., 1989. Testing the completeness of earthquake catalogues and the hypothesis of self-similarity, Nature , 337, 251– 253 https://doi.org/10.1038/337251a0 Google Scholar CrossRef Search ADS   Scholz C.H., 1968. Mechanism of creep in brittle rock, J. geophys. Res. , doi:10.1029/JB073i010p03295. https://doi.org/10.1029/JB073i010p03295 Schorlemmer D., Woessner J., 2008. Probability of detecting an earthquake, Bull. seism. Soc. Am. , 98, 2103– 2117. https://doi.org/10.1785/0120070105 Google Scholar CrossRef Search ADS   Serra I., Corral A., 2017. Deviation from power law of the global seismic moment distribution, Sci. Rep. , 7, 40045, doi:10.1038/srep40045. https://doi.org/10.1038/srep40045 PMID28053311 Google Scholar CrossRef Search ADS PubMed  Shi Y., Bolt B.A., 1982. The standard error of the magnitude frequency b-value, Bull. seism. Soc. Am. , 72, 1677– 1687. Taylor D.W.A., Snoke J.A., Sacks I.S., Takanami T., 1990. Non-linear frequency-magnitude relationship for the Hokkaido corner, Japan, Bull. seism. Soc. Am. , 80, 605– 609. Tinti S., Mulargia F., 1985. Effects of magnitude uncertainties on estimating the parameters in the Gutenberg-Richter frequency-magnitude law, Bull. seism. Soc. Am. , 75, 1681– 1697. Tinti S., Rimondi R., Mulargia F., 1987. On estimating frequency-magnitude relations from heterogeneous catalogs, Pageoph , 125, 1– 18. https://doi.org/10.1007/BF00878611 Google Scholar CrossRef Search ADS   Urban P., Lasocki S., Blascheck P., do Nascimento A.F., Van Giang N., Kwiatek G., 2015. Violations of Gutenberg– Richter relation in anthropogenic seismicity, Pure appl. Geophys. , doi:10.1007/s00024-015-1188-5. Utsu T., 1999. Representation and analysis of the Earthquake Size Distribution: a historical review and some new approaches, in Seismicity Patterns, Their Statistical Significance and Physical Meaning, Pageoph Topical Volumes . pp. 509– 535, eds Wyss M., Shimazaki K., Ito A., Birkhäuser, Basel. Wiemer S., Wyss M., 2000. Minimum magnitude of completeness in Earthquake catalogs: examples from Alaska, the Western United States, and Japan, Bull. seism. Soc. Am. , 90, 859– 869. https://doi.org/10.1785/0119990114 Google Scholar CrossRef Search ADS   Woessner J., Wiemer S., 2005. Assessing the quality of earthquake catalogues: estimating the magnitude of completeness and its uncertainty, Bull. seism. Soc. Am. , 95, 684– 698. https://doi.org/10.1785/0120040007 Google Scholar CrossRef Search ADS   Zhuang J.C., Touati S., 2015. Stochastic simulation of earthquake catalogs, in Community Online Resource for Statistical Seismicity Analysis , doi:10.5078/corssa-43806322. SUPPORTING INFORMATION Supplementary data are available at GJI online. Figure S1. Left: histogram of differences between b-values as they were obtained by the unbounded and truncated G-R law for 10 000 catalogues of different size and maximum magnitude, Mmax. Middle: the b-value differences as a function of maximum magnitude. Right: b-value differences as a function of catalogue size (N from 500 to 50 000). In all cases, the b-values estimated by the unlimited G-R were greater than or equal to the one derived from the upper bounded G-R, however the differences were very small (less than 0.01 for 95 per cent of the cases), indicating a very satisfactory approximation by the unbounded G-R. We may conclude that for adequately large data sets (N > ∼1000) and well-determined maximum magnitude, the two GR approaches lead to essentially identical results. Figure S2. Histograms of Mc values, after introducing magnitude errors to the original synthetic catalogue A, with μM = 0, σΜ = 0.2 (grey bars) and μM = 0.3 and σΜ = 0.2 (blue bars), respectively. Results were derived from 500 synthetic catalogues of 50 000 events each, drawn randomly with permutation from our original synthetic data set. Figure S3.b-value (solid lines) and its estimated standard error (dashed lines) as a function of magnitude, after introducing magnitude errors to the original synthetic catalogue, with μM = 0, σΜ = 0.2 (blue curves) and μM = 0.3, σΜ = 0.2 (green curves), respectively. Results were derived from 500 synthetic catalogues of 50 000 events each, drawn randomly by permutation from our original synthetic data set. Figure S4. Histograms of Mc obtained using the different methods for 1000 synthetic catalogues derived from the original synthetic catalogue A, after introducing Gaussian error with μM = 0, and σM = 0.1 (top frame), σM = 0.2 (central frame) and σM = 0.3 (bottom frame). Note the stable (but underestimated) values from GFT90 σM = 0.1 and 0.3 (99 per cent of results in one magnitude bin). This results to the narrow error bars in both Mc and b-values shown in Figs 4 and 5 of the manuscript. Figure S5. Left-hand column: subsets of varying sample size (300–50 000) are extracted from catalogue A and the completeness threshold is found by means of MaxC, GFT90, mGFT and MBS. The process is repeated 1000 times per sample size, and the points on the graph represent the average Mc. The error bars correspond to one standard deviation. Right-hand column: after contaminating catalogue A with Gaussian error (μM = 0 and σM = 0.2), the same process is followed (for all sample sizes) and the difference between the magnitude thresholds before and after the noise addition is found, for all four methods. Figure S6. Same as in Fig. S5, but now the results for the estimated of b-value. Figure S7. As Fig. S5, but data from catalogue B. Figure S8. As Fig. S6, but catalogue B. Table S1. Monte Carlo simulation results on b-value (mean and standard deviation) for catalogues A and B (after being contaminated with Gaussian noise) using the different approaches discussed in the main text. Table S2. Seismicity rate overestimation as a function of Mmin for different levels of noise. Cumulative number of events (second column) for each magnitude bin (first column) for a catalogue with a theoretical distribution with b = 1.0. Columns 3–5 correspond to the difference between column 2 and the cumulative number of events per magnitude bin of catalogues contaminated by errors with μM = 0 and σM = 0.1, 0.2 and 0.3, respectively). The numbers in the parentheses denote the difference as a percentage. Positive and negative values denote that the contaminated catalogues contain more or fewer events than the theoretical distribution, respectively. Please note: Oxford University Press is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the paper. © The Author(s) 2018. Published by Oxford University Press on behalf of The Royal Astronomical Society.

### Journal

Geophysical Journal InternationalOxford University Press

Published: May 1, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Search Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly ### Organize Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place. ### Access Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. DeepDyve ### Freelancer DeepDyve ### Pro Price FREE$49/month
\$360/year

Save searches from
PubMed

Create lists to

Export lists, citations