Joint modelling of flood peaks and volumes: A copula application for the Danube River

Flood frequency analysis is usually performed as a univariate analysis of flood peaks using a suitable theoretical probability distribution of the annual maximum flood peaks or peak over threshold values. However, other flood attributes, such as flood volume and duration, are also necessary for the design of hydrotechnical projects. In this study, the suitability of various copula families for a bivariate analysis of peak discharges and flood volumes has been tested. Streamflow data from selected gauging stations along the whole Danube River have been used. Kendall's rank correlation coefficient (tau) quantifies the dependence between flood peak discharge and flood volume settings. The methodology is applied to two different data samples: 1) annual maximum flood (AMF) peaks combined with annual maximum flow volumes of fixed durations of 5, 10, 15, 20, 25, 30 and 60 days, respectively (which can be regarded as a regime analysis of the dependence between the extremes of both variables in a given year), and 2) annual maximum flood (AMF) peaks with corresponding flood volumes (which is a typical choice for engineering studies). The bivariate modelling of the extracted peak discharge - flood volume couples is achieved with the use of the Ali-Mikhail-Haq (AMH), Clayton, Frank, Joe, Gumbel, Hüsler-Reiss, Galambos, Tawn, Normal, Plackett and FGM copula families. Scatterplots of the observed and simulated peak discharge - flood volume pairs and goodness-of-fit tests have been used to assess the overall applicability of the copulas and to observe any changes in suitable models along the Danube River.
The results indicate that for the second data sampling method, almost all of the considered Archimedean class copula families perform better than the other copula families selected for this study, and that for the first method, only the upper-tail-flat copulas excel (except for the AMH copula due to its inability to model stronger relationships).

Keywords: Bivariate frequency analysis; Copulas; Dependence of flood peaks and volumes; Kendall's rank correlation coefficient; Danube River.

INTRODUCTION

The design of flood protection structures where storage is involved requires an entire hydrograph or at least the flood volume/shape estimates related to the flood peaks. Therefore, the relationship between peaks of annual maximum floods (AMF) and their volumes is an interesting scientific research issue. From a broader perspective, however, the dependence between the annual extremes of both variables is also of interest when studied from the perspective of regimes. In the past, identical marginal distributions for both of these random variables have often been used for modelling their dependence in hydrology (e.g., Goel et al., 1998; Yue et al., 2001). As flood peaks are the most commonly used data in hydrological frequency analysis, the majority of studies have analyzed flood peak frequency curves (Cunnane, 1988, 1989; Dawdy et al., 2012; Groupe de recherche en hydrologie statistique (GREHYS), 1996; Laio et al., 2011; Mediero and Kjeldsen, 2014) in contrast to flood volumes, for which fewer studies are available (e.g., Bacová-Mitková, 2011; Mediero et al., 2010). Under the assumption that flood peaks and volumes have the same type of marginal distributions, several authors have used bivariate distributions for a frequency analysis of these variables (e.g., Singh and Singh, 1991; Shiau et al., 2006; Yue et al., 2001). This requirement, however, is seldom fulfilled in practice.
In the event that the flood peaks and volumes do not have the same type of marginal probability distribution, the copula approach provides a flexible solution (Gaál et al., 2015; Giustarini et al., 2010; Szolgay et al., 2015; Zhang and Singh, 2006). Copulas allow various marginal distributions to be combined into multivariate distributions (Favre et al., 2004; Genest and Favre, 2007; Nelsen, 2006; Salvadori and De Michele, 2004, 2010). Copulas have recently become a popular tool in hydrological analysis for modeling the relationship among hydrological characteristics. A methodology for using copulas in hydrology has been described, e.g., by Dupuis (2007) and Genest and Favre (2007). Recently, many studies have applied copulas in engineering practice (see, e.g., Aronica et al., 2012; Balistrocchi and Baldassarre, 2011; Bezak et al., 2016; De Michele and Salvadori, 2003; De Michele et al., 2005; Gaál et al., 2010; Zhang and Singh, 2007). Favre et al. (2004) were among the first authors to use a two-dimensional copula for describing the relationship between flood discharges and volumes. Zhang and Singh (2006) stressed that bivariate copula-based distributions of flood peaks vs. volumes and flood volumes vs. durations agree better with plotted frequency estimates than traditional distributions. Bacová-Mitková and Halmová (2014) applied parametric families of Archimedean copulas for an analysis of the relationship between volumes, peak discharges and durations at the Bratislava gauging station on the Danube River. Szolgay et al. (2012) used a joint analysis of maximum discharges and volumes through copulas for the estimation of design quantities. The analysis described a case study in the Vltava River Basin in the Czech Republic to estimate the design discharge for a return period of 10,000 years to assess the safety of the Orlík dam. The more recent study of Sraj et al.
(2015) applied a bivariate copula analysis to the annual maximum discharges and volumes from a gauging station on the Sava River in Slovenia. The results from three families were compared, and the Gumbel-Hougaard copula was found to be the most appropriate for modeling peak discharges vs. volume and volume vs. duration. The main objective of this study is to investigate the suitability and evaluate the applicability of the selected copula models for the flood peak-volume relationship along the Danube River using streamflow data from seven gauging stations. The following eleven copula families were used for the analysis: Ali-Mikhail-Haq, Clayton, Frank, Joe, Gumbel, Hüsler-Reiss, Galambos, Tawn, Normal, Plackett and FGM. In contrast to previous studies conducted in the Danube River area (e.g., Bacová-Mitková and Halmová, 2014; Gaál et al., 2015; Szolgay et al., 2015, 2016), the flood peak discharge and volume pairs are constructed by means of two essentially different approaches. In either case, flood peaks are represented by the AMF peak. The difference lies in the way the flood volumes are defined. In the first approach, the annual maxima of the volumes for different durations have been selected (which can be regarded as a regime analysis of the dependence of the extremes of both variables in a given year), while in the second approach, flood volumes corresponding to the annual maximum flood peaks have been used (which is a typical choice for engineering studies). A bivariate analysis has been applied to the extracted pairs of the flood peak discharge and flood volume data using the copulas to assess which copula family is the most suitable for reproducing the dependence structure of the variables selected and to observe changes in their applicability along the Danube River.

STUDY AREA

The Danube River originates in the Black Forest (Schwarzwald) in Germany at an altitude of about 678 m a.s.l.
at the confluence of the Brigach and Breg rivers in Donaueschingen. The Danube River flows through ten countries, and its delta is located on the Black Sea. The area of the entire watershed is 801,463 km² with a main channel length of 2830 km (ICPDR, 2009). The average discharge is 6,500 m³ s⁻¹ before the delta. The climatic regime over the river basin is dominated by a clear seasonality, which is influenced in the northern parts by the Atlantic Ocean and in the eastern parts by the continental climate. The total annual precipitation in the catchment is between 3000 mm at the high altitudes and 400 mm in the lowlands (Sommerwerk et al., 2009). The hydrological regime of the Danube River is highly influenced by precipitation patterns and the orographic structure of its catchment (Pekárová et al., 2008, 2013). The Danube Basin can be subdivided into three main parts. The Upper Danube Region, between the springs and the Devín Gate (mean elevation = 133 m a.s.l., catchment area = 131,338 km², mean annual discharge = 2,051 m³ s⁻¹ at the Devín-Bratislava gauge), is characterized by high precipitation in the west (up to 2500 mm of mean annual precipitation totals in the Alps), whereas the eastern regions have lower precipitation (700 mm of mean annual precipitation totals in the lowlands). The runoff regime corresponds to that of the Alpine tributaries with the maximum discharges in June and minimum discharges during the winter months. The Central Danube Region lies between the Devín Gate and the Iron Gate (60 m a.s.l., 444,894 km², 5,585 m³ s⁻¹ at the Turnu Severin/Orsova gauge). The runoff regime is characterized by two runoff peaks in June and April. The April runoff peak is mainly caused by the addition of waters from the snow melting in the Carpathians and from the early spring rains of the tributaries in the lowlands and low mountains of the area. The Lower Danube Region is located between the Iron Gate and the Danube's embouchure into the Black Sea (1 m a.s.l., 230,768 km², 6,499 m³ s⁻¹ at the Ceatal Izmail gauge). The Danube Delta is considered to be a region with temperate climate zones of a steppe character due to low precipitation totals (mean annual precipitation totals range from 600 mm to 300 mm) and high summer temperatures over 35°C (ICPDR, 2009). In this study, discharge data (annual maximum peak and daily discharges) from the following gauging stations were used: Hofkirchen (Germany), Achleiten (Germany) and Vienna (Austria) from the Upper Danube Region; Bratislava (Slovakia) and Nagymaros (Hungary) from the Central Danube Region; and Orsova (Romania) and Reni (Ukraine) from the Lower Danube Region (see Figure 1 and Table 1).

Fig. 1. The Danube River with the locations of the gauging stations analyzed.

Table 1. Gauging stations with the observation periods, catchment areas and elevation of the gauging stations used in the analysis.

Gauging station   Country    Length of the data series (years)   Catchment area (km²)   Gauging station elevation (m a.s.l.)
Hofkirchen        Germany    1901–2007                           47,496                 299.17
Achleiten         Germany    1901–2007                           76,653                 287.27
Vienna            Austria    1828–2006                           101,731                152
Bratislava        Slovakia   1876–2006                           131,338                132.86
Nagymaros         Hungary    1893–2007                           183,534                99.37
Orsova            Romania    1904–2009                           576,232                44.76
Reni              Ukraine    1931–1995 / 1997–2002               805,700                0.2

METHODOLOGY

This study applies a bivariate analysis of the AMF peak and volumes using various copula families. For all of the implementations of the bivariate analysis, the dependence of the two variables is assessed with Kendall's tau correlation coefficient. A "blanket" goodness-of-fit (GOF) test, which is based on Kendall's transformation with the Cramér-von Mises measure of distance (Genest et al., 2009), has been used to evaluate and subsequently rank the best fitted copula families.
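As a minimal illustration of the dependence measure named above, the following Python sketch computes Kendall's tau by directly counting concordant and discordant pairs. It is not the authors' implementation (the study uses the `acopula' package in R); the function name `kendall_tau`, the tau-a variant without tie correction, and the sample values for `peaks` and `volumes` are assumptions made purely for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Ties are counted as neither concordant nor discordant (no tie correction).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical annual maximum peaks (m^3/s) and flood volumes (10^6 m^3);
# these numbers are illustrative only and are not taken from the study.
peaks = [2100.0, 3400.0, 2800.0, 5200.0, 3900.0, 4500.0]
volumes = [610.0, 900.0, 950.0, 1500.0, 1100.0, 1050.0]

tau = kendall_tau(peaks, volumes)
print(f"Kendall's tau = {tau:.3f}")
```

Because the coefficient depends only on the ordering of the observations, any strictly increasing transformation of either variable leaves it unchanged, which is why it can capture monotone nonlinear dependence.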
Initially, all the univariate data were tested for the presence of temporal dependence by the Mann-Kendall test (Mann, 1945) and the Ljung-Box test (Ljung and Box, 1978), because a copula analysis requires that the data be independent and identically distributed (i.i.d.). Subsequently, the multivariate data of the flood peaks and volumes were tested for the significance of the (rank) correlation (Kendall, 1955) and for the asymmetry of their relationship (Genest et al., 2012) to justify the choice of modelling by (symmetrical) copulas. In the next subchapter the applied copula families are described in detail, followed by the methods of input data selection and the annual maximum flood peaks and flow volume separation.

Copulas

A copula is a function that allows for the modelling of the dependence structure between stochastic variables. The main advantage is that the copula approach can split the problem of constructing multivariate probability distributions into a part containing the marginal one-dimensional distribution functions and a part containing the dependence structure. These two parts can be studied and estimated independently and then put together into a joint distribution function. The bivariate case can be written as:

$F_{XY}(x,y) = C(F_X(x), F_Y(y))$, (1)

where $F_X$, $F_Y$ are the respective marginal distribution functions of the random variables $X$, $Y$. $F_{XY}$ represents the joint distribution function of the random vector $(X,Y)$, and the copula $C$ is a function $C: [0,1]^2 \to [0,1]$ that satisfies the boundary conditions $C(t,0) = C(0,t) = 0$ and $C(t,1) = C(1,t) = t$ (uniform margins) for any $t \in [0,1]$ and the so-called 2-increasing property. Thus a copula can be viewed as a standardized joint distribution function. Further details can be found in Nelsen (2006).

Table 2. List of the copulas applied in this study.
Class                    Copula families
Archimedean copulas      Ali-Mikhail-Haq, Clayton, Frank, Joe, Gumbel
Extreme-value copulas    Hüsler-Reiss, Galambos, Tawn
Other copulas            Normal, Plackett, FGM

The current study applies ten commonly used one-parametric families from several classes of copulas (Table 2, Appendix), namely, the Archimedean class (the Clayton, Frank, Gumbel-Hougaard, Joe and Ali-Mikhail-Haq (AMH) copulas), the extreme-value class (Gumbel-Hougaard, Galambos, Hüsler-Reiss), the elliptical class (normal or Gaussian family), and the unclassified Plackett and Farlie-Gumbel-Morgenstern (FGM) parametric families of copulas. Additionally, the three-parametric Tawn family (of the EV class) has also been used. The Archimedean copulas

$C(u,v) = \varphi^{-1}(\varphi(u) + \varphi(v)), \quad u,v \in [0,1]$, (2)

have the advantage of easy construction via a one-dimensional function $\varphi: [0,1] \to [0,\infty)$ called a generator. The extreme-value (EV) copulas

$C(u,v) = (uv)^{A(\log u / \log(uv))}, \quad u,v \in [0,1]$, (3)

are uniquely defined through the so-called Pickands dependence function $A: [0,1] \to [1/2,1]$ (see, e.g., Bacigál and Mesiar, 2012; Gudendorf and Segers, 2010; Tawn, 1988) and were theoretically derived to model the dependence between extremes of random variables, while the elliptical copulas are simply the copulas extracted from elliptically contoured distributions. All the Archimedean (except for the Frank) and EV copulas are non-symmetrical with respect to secondary diagonals; some accumulate more of a probability mass at points (0,0) or (1,1), which is denoted as a lower tail or upper tail dependence, respectively. The parameters of the copulas are estimated by maximizing the so-called pseudo-likelihood function:

$L(\theta) = \sum_{i=1}^{n} \log[c_\theta(U_i, V_i)]$, (4)

which, besides the copula density $c_\theta$, contains pseudo-observations $U_i$, $V_i$ ($i = 1, \dots, n$), which are a transformation of the $n$ real observations of the respective random variables $X$, $Y$ through a corresponding empirical distribution function, also known as a plotting position.

A "blanket" test (Genest et al., 2009) provides the goodness-of-fit test with the Cramér-von Mises measure of distance:

$S_n = \sum_{i=1}^{n} \{C(U_i,V_i) - C_n(U_i,V_i)\}^2$, (5)

between the parametric copula $C$ and the empirical copula, which is defined as

$C_n(u,v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(U_i < u)\, \mathbf{1}(V_i < v)$. (6)

Given the validity of the null hypothesis that $C$ fits well, the probability distribution of $S_n$ is unknown and needs to be calculated from bootstrap simulations. The correlation is quantified by Kendall's rank correlation coefficient $\tau \in [-1,1]$, which is a measure of concordance that is also able to detect nonlinear dependence (in contrast to the standard Pearson correlation coefficient). Recall that there is a direct relation between Kendall's $\tau$ and a copula,

$\tau = 4 \int_{[0,1]^2} C(u,v)\, \mathrm{d}C(u,v) - 1$. (7)

The analysis is performed with the `acopula' package (Bacigál, 2013) under GNU R (R Core Team, 2014). Supplementary details about the copulas used in this study can be found in the Appendix.

The annual maximum flood volumes of selected durations

In the first approach, the annual maximum flow volumes of fixed durations around the peak wave were selected separately, instead of defining the volume of each flood (thus conducting a process-oriented analysis of the hydrograph and defining the beginning and end of each runoff event). Here, this simplified algorithm for volume selection was adopted to emulate the volume of a wave, since this paper is more technically oriented towards studying the suitability of theoretical models for such a type of analysis. Both variables, the AMF and the flood volumes of the fixed duration, represent the annual maxima, but they are not necessarily linked to the same hydrological event. Such an approach for the construction of a data set is preferred among statisticians, since the selected bivariate sample satisfies the conditions for a rigorous adoption of the extreme-value class copulas. Moreover, in our case, it allows for studying the regime of extreme values within each year and defining the possibility that in a year with a high flood, there will also be a high flow volume of a predefined duration. For notational simplicity, we will subsequently refer to these flow volumes as flood volumes. The dataset of annual maximum flood volumes of fixed durations used in this study was prepared in the frame of the UNESCO project "Flood regime of rivers in the Danube River Basin" (Pekárová et al., 2008). The annual flow volumes were separated using daily discharge data and fixed around the flood peaks at seven different durations: 5, 10, 15, 20, 25, 30 and 60 days, respectively, using a method described in Bacová-Mitková (2002). According to this methodology, the flood volumes of a given duration were calculated with a moving window for all flood peaks in each year, and the largest annual flood volume values for the selected durations were then taken for the analysis. The mean daily streamflow data used in this approach are from the following Danube River stations: 1) Hofkirchen, 2) Achleiten, 3) Vienna, 4) Bratislava and 5) Nagymaros. Unfortunately, the annual maximum flood volume data of selected durations for the Orsova and Reni stations were not available within the frame of the mentioned UNESCO project. Next, the applicability of different copula families from the Archimedean class and extreme-value class as well as three other families (Normal, Plackett and FGM) for all of the above data combinations was investigated. Figure 2 presents the flowchart of the first approach.

Fig. 2. Flowchart of the first approach for the sample selection.

The flood volumes corresponding to the annual maximum flood peaks

The second approach refers to a bivariate analysis using the AMF peaks and their corresponding flood volumes for each station (Figure 3). In other words, this is an event-based analysis, since each year at the given site is represented by a single flood event with the largest peak discharge value (and its corresponding flood volume). This rather intuitive approach is preferred among engineering hydrologists in practical applications, and it enables the definition of conditional probabilities for design purposes. The mean daily streamflow data used for the flood volume separation in this approach are taken from the following Danube River stations: 1) Hofkirchen, 2) Bratislava, 3) Nagymaros, 4) Orsova and 5) Reni. A local-minimum method developed by Willems (2009) was used for the separation of the base flow, and for a better estimation of the beginning and end of the flood waves. Eleven different copula families (Table 2, Appendix) were investigated for their applicability to all of the various data combinations.

Fig. 3. Flowchart of the second approach for the sample selection: flood volumes corresponding to flood peaks.

RESULTS

In this section the results indicating the suitability of the various copula families for the bivariate analysis of the AMF discharges and flood volumes are presented. First, the preliminary data testing for the presence of temporal dependence, and the significance of (rank) correlation and asymmetry are discussed. Then, four different rank categories, which were created according to the GOF test, are described. The detailed results of the bivariate analyses of the AMF peaks and volumes are presented separately, according to the approach of volume separation, which was described in the previous section. The preliminary tests revealed only a few violations of the copula analysis prerequisites, namely, the original flood peak data at the Vienna station seemed to contain a trend. No autocorrelation was indicated in the peaks at any station. As for the volumes, within both approaches no volume data show a temporal dependence. Within the second approach a downward trend was detected at the Orsova and Reni stations due to decreasing volatility in recent decades. The correlation between the AMF peaks and all the flood volumes was quantified with Kendall's tau, and the results are summarized within the corresponding subsections. According to the GOF test statistic, which serves as a measure of the distance to the empirical distribution (the lower the better), four different rank categories were created: a) the first to third ranks, and b) the total rank, which refers to the frequency with which a copula family scored with one of the previous ranks. For instance, at the Hofkirchen station (see Table 4, column 3), copulas from the Frank family scored three times with the lowest GOF test statistic (1st rank) among all the copulas estimated for each fixed volume class, then two times with the second lowest value of the GOF test statistic (i.e., the 2nd rank), and just once with the third lowest value (the 3rd rank), a total of scoring 6 times in the copula competition, which is summarized in Table 5. The total rank then represents the number of the scores across all the fixed volume classes and all the stations. We adopted this ranking approach to be able to select the best models among all those applied, as more than one model was always suitable. The analysis was performed with a focus on the stations spread along the river, different flood volumes (annual maximum volumes of fixed durations and flood volumes corresponding to flood peaks), and diverse (yet commonly implemented) copula models. In the following paragraphs, a detailed analysis of the results of both approaches will be presented.

First approach: the annual maximum flood volumes of selected durations

One of the findings in this approach is that the (always significant) correlation between the flood variables (i.e., the discharge and volume) decreases as the flood's duration increases (Table 3). A notable decrease in the correlation can be observed in the Austrian reach, which would require further analysis, e.g., with respect to the lateral inflow in this reach. Overall the values (except for this reach) are more or less constant for all the durations. The result of the Kendall's tau correlation analysis for all the stations is that, although the upper tail dependence (reading from the properties of the best fitting copula) disappears when the flood duration increases, the lower tail dependence remains unchanged. The tests for asymmetry rejected the exchangeability of peaks and volumes in two out of 35 cases; thus the copula families considered are taken as appropriate in this respect.

Table 3. The Kendall's tau correlation coefficient values for both approaches and all the stations analysed (5D to 60D: annual maximum volumes of fixed durations; last column: volumes corresponding to the AMF).

Station      5D     10D    15D    20D    25D    30D    60D    Corresponding volumes to AMF
Hofkirchen   0.85   0.79   0.67   0.64   0.61   0.6    0.52   0.26
Achleiten    0.75   0.68   0.54   0.49   0.46   0.43   0.36   –
Vienna       0.75   0.68   0.53   0.49   0.46   0.44   0.36   –
Bratislava   0.85   0.78   0.64   0.61   0.59   0.57   0.46   0.36
Nagymaros    0.88   0.82   0.67   0.63   0.6    0.58   0.5    0.35
Orsova       –      –      –      –      –      –      –      0.41
Reni         –      –      –      –      –      –      –      0.27

Table 4. The rank values of the copula families according to the GOF test for both approaches.
Volume                    Rank   Hofkirchen   Achleiten      Vienna         Bratislava   Nagymaros
5D                        1st    Frank        Hüsler-Reiss   Hüsler-Reiss   Galambos     Plackett
5D                        2nd    Galambos     Galambos       Galambos       Gumbel       Normal
5D                        3rd    Gumbel       Tawn           Gumbel         Tawn         Gumbel
10D                       1st    Galambos     Plackett       Galambos       Tawn         Frank
10D                       2nd    Tawn         Clayton        Gumbel         Galambos     Tawn
10D                       3rd    Gumbel       Frank          Hüsler-Reiss   Gumbel       Plackett
15D                       1st    Frank        Clayton        Clayton        Plackett     Normal
15D                       2nd    Normal       Plackett       Normal         Tawn         Frank
15D                       3rd    Clayton      Frank          Frank          Gumbel       Tawn
20D                       1st    Plackett     Clayton        Clayton        Plackett     Plackett
20D                       2nd    Normal       Normal         Plackett       Normal       Normal
20D                       3rd    Frank        Frank          Frank          Clayton      Frank
25D                       1st    Clayton      Clayton        Clayton        Normal       Plackett
25D                       2nd    Frank        Frank          Plackett       Clayton      Frank
25D                       3rd    Normal       AMH            Frank          Frank        Normal
30D                       1st    Plackett     Clayton        Clayton        Clayton      Plackett
30D                       2nd    Frank        Normal         AMH            Normal       Clayton
30D                       3rd    Clayton      AMH            Frank          Frank        Frank
60D                       1st    Frank        Clayton        Clayton        Clayton      Clayton
60D                       2nd    Plackett     AMH            AMH            Normal       Normal
60D                       3rd    Normal       Plackett       FGM            Frank        Frank
Corresp. volumes to AMF   1st    AMH          –              –              Frank        AMH
Corresp. volumes to AMF   2nd    Frank        –              –              Normal       Frank
Corresp. volumes to AMF   3rd    Clayton      –              –              AMH          Clayton

Table 5. The frequency (total) and relative frequency (total %) of the overall rankings according to the GOF test for the data samples created on the basis of the first approach.

Copula family   Hofkirchen   Achleiten   Vienna   Bratislava   Nagymaros   Total   Total (%)
AMH             0            3           2        0            0           5       5
Clayton         3            6           5        4            2           20      19
Frank           6            4           4        3            6           23      22
Joe             0            0           0        0            0           0       0
Gumbel          2            0           2        3            1           8       8
Hüsler-Reiss    0            1           2        0            0           3       3
Galambos        2            1           2        2            0           7       7
Tawn            1            1           0        3            2           7       7
Normal          4            2           1        4            5           16      15
Plackett        3            3           2        2            5           15      14
FGM             0            0           1        0            0           1       1

Fig. 4. The relative frequency of the ranks (in percentages) of different copula families as a measure of their applicability to all the stations (first approach).

Fig. 5. Scatter plots of 500 pair simulations according to the fitted copulas (grey color) and measured data (black color) for the data sample representing 60 days' fixed volume duration.
a) Bratislava Station, the Clayton fit, GOF p-value 0.042, Kendall's tau correlation coefficient 0.464; b) Hofkirchen Station, the Frank fit, GOF p-value 0.061, Kendall's tau correlation coefficient 0.521.

Fig. 6. Scatter plots of the pseudo-observations for all the stations (the AMF peak vs. the corresponding flood volume) with the Kendall's tau correlation coefficient (in parentheses).

Fig. 7. Ranking of the relative frequency (in percentages) of the applicability of different copula families to all the stations for the data samples created based on the second approach.

The copula family rankings are presented in Table 4, where, in the category of the first rank, the Clayton copula prevails with 14 counts, N = 35 (a 40% relative frequency) (Figure 4b, Table 4). At each of the Achleiten and Vienna stations, the Clayton family has five counts with N = 7 (a 71% relative frequency). In the second rank, the Normal family dominates with 11 counts, N = 35 (a 31% relative frequency) (Figure 4c, Table 4), and at each of the Bratislava and Nagymaros stations, the same family has three counts with N = 7 (a 43% relative frequency). Moreover, in the third rank, the leading copula is that of the Frank family, having 14 counts with N = 35 (a 40% relative frequency) (Figure 4d, Table 4), and at the Vienna station, the same family has four counts with N = 7 (a 57% relative frequency). Finally, the GOF test indicates that for the total ranking, the Frank copula performs better than the others with 23 counts, N = 105 (a 22% relative frequency) (Figure 4a, Table 5). The Clayton copula is in second place in the total ranking with 20 counts (a 19% relative frequency), and the Normal copula is in third place with 16 counts (a 15% relative frequency) (Figure 4a, Table 5). In Figure 5 two scatter plot examples of the simulated/observed data can be seen.
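The rank counts quoted above can be reproduced by tallying the first-approach entries of Table 4. The following Python sketch is only an illustration of that bookkeeping, not code from the study; the names `TABLE4_FIRST_APPROACH`, `tally` and `rank_counts` are hypothetical, and the entries were transcribed from Table 4 (seven fixed durations per station, each with its 1st, 2nd and 3rd ranked family).

```python
from collections import Counter

# First-approach part of Table 4: for each station, the (1st, 2nd, 3rd) ranked
# copula families for the 5, 10, 15, 20, 25, 30 and 60 day volumes, in order.
TABLE4_FIRST_APPROACH = {
    "Hofkirchen": [
        ("Frank", "Galambos", "Gumbel"), ("Galambos", "Tawn", "Gumbel"),
        ("Frank", "Normal", "Clayton"), ("Plackett", "Normal", "Frank"),
        ("Clayton", "Frank", "Normal"), ("Plackett", "Frank", "Clayton"),
        ("Frank", "Plackett", "Normal"),
    ],
    "Achleiten": [
        ("Hüsler-Reiss", "Galambos", "Tawn"), ("Plackett", "Clayton", "Frank"),
        ("Clayton", "Plackett", "Frank"), ("Clayton", "Normal", "Frank"),
        ("Clayton", "Frank", "AMH"), ("Clayton", "Normal", "AMH"),
        ("Clayton", "AMH", "Plackett"),
    ],
    "Vienna": [
        ("Hüsler-Reiss", "Galambos", "Gumbel"), ("Galambos", "Gumbel", "Hüsler-Reiss"),
        ("Clayton", "Normal", "Frank"), ("Clayton", "Plackett", "Frank"),
        ("Clayton", "Plackett", "Frank"), ("Clayton", "AMH", "Frank"),
        ("Clayton", "AMH", "FGM"),
    ],
    "Bratislava": [
        ("Galambos", "Gumbel", "Tawn"), ("Tawn", "Galambos", "Gumbel"),
        ("Plackett", "Tawn", "Gumbel"), ("Plackett", "Normal", "Clayton"),
        ("Normal", "Clayton", "Frank"), ("Clayton", "Normal", "Frank"),
        ("Clayton", "Normal", "Frank"),
    ],
    "Nagymaros": [
        ("Plackett", "Normal", "Gumbel"), ("Frank", "Tawn", "Plackett"),
        ("Normal", "Frank", "Tawn"), ("Plackett", "Normal", "Frank"),
        ("Plackett", "Frank", "Normal"), ("Plackett", "Clayton", "Frank"),
        ("Clayton", "Normal", "Frank"),
    ],
}


def tally(table):
    """Count how often each family appears in ranks 1-3, per station and overall."""
    per_station = {
        station: Counter(family for triple in rows for family in triple)
        for station, rows in table.items()
    }
    total = Counter()
    for counts in per_station.values():
        total.update(counts)
    return per_station, total


def rank_counts(table, rank):
    """Count families at one rank position (0 = first, 1 = second, 2 = third)."""
    return Counter(triple[rank] for rows in table.values() for triple in rows)


per_station, total = tally(TABLE4_FIRST_APPROACH)
n_scores = sum(total.values())  # 5 stations x 7 durations x 3 ranks
for family, count in total.most_common(3):
    print(f"{family}: {count}/{n_scores} = {100 * count / n_scores:.0f}%")
```

Dividing each count by the number of available scores (N = 105 for the total rank, N = 35 for a single rank across all stations, N = 7 for a single rank at one station) gives the relative frequencies reported in the text and in Table 5.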
Second approach: the flood volumes corresponding to annual maximum flood peaks

Concerning the second approach, according to Kendall's tau correlation coefficient (which is again always significant), the correlation between the AMF peak and the corresponding flood volume increases along the Danube River (downstream), except for the Reni station (Figure 6, Table 3). No relation appears to be asymmetric with respect to the main diagonal (as suggested by the asymmetry test). Another finding from this analysis is that there is an increase in the upper tail dependence at the river stations located in the lower part of the river (see Figure 6), which is supported by the GOF test preference of extreme-value copulas at the Orsova and Reni stations (while at the Hofkirchen, Bratislava and Nagymaros stations, no copula with an upper tail dependence modelling capability was preferred), see Table 4. Furthermore, the analysis of the GOF test indicates that for the total rank, the Frank copula with four counts performs better than the others, N = 15 (a 27% relative frequency), while the AMH is in second place with three counts, N = 15 (a 20% relative frequency) (Figure 7, Table 4). In Figure 8 two scatter plot examples of the simulated/observed data can be seen. In comparison with the first approach, the correlation of the AMF peak and the corresponding flood volumes is notably lower (see the p-values in Figure 8). Moreover, a result that follows from the analysis is that in the low correlation samples (the Hofkirchen, Bratislava and Nagymaros stations) the AMH copula family is mostly preferred (the Reni station is an exception due to the upper tail behaviour) (Figure 6, Table 3, Table 4).

DISCUSSION AND CONCLUSIONS

In this study we provided a bivariate analysis of the streamflow data (the AMF peak and flood volumes) for modeling extreme flood events with the use of various copula families.
First, the annual maximum flood volumes of the fixed durations were separated from the flood waves, which were not necessarily linked to the same hydrological event. This approach leads to the construction of the data set allowing us to study the regime of extreme values within each year and to investigate the chance that in a year with a high flood, there would also be a flood with a high volume. Since this paper is more technically oriented towards studying the suitability of theoretical models for such a type of analysis, this simplified volume selection was adopted instead of defining the volume of each flood (thus conducting a process-oriented analysis of the hydrograph and defining the beginning and end of each runoff event). In the second approach to flood volume separation, the AMF peaks and their corresponding flood volumes for each station were sampled. This approach is mostly preferred in engineering studies in practical applications, since it enables the definition of conditional probabilities for design purposes. The positive and rather significant dependence of the AMF and annual maximum flood volumes of a fixed duration indicates that a detailed comparative analysis of such peaks and volumes along the river and between distinct periods bears a potential for new information on the regime of extremes and its changes.

Fig. 8. Scatter plots of 500 pair simulations according to the fitted copula (grey color) and measured data (black color), for the data samples representing annual flood peaks and the corresponding flood volumes. a) Bratislava Station, the Frank fit, GOF p-value 0.045, Kendall's tau correlation coefficient 0.359; b) Hofkirchen Station, the AMH fit, GOF p-value 0.04, Kendall's tau correlation coefficient 0.262.
As for the technical part, the main outcome of the analysis is that for all of the applications, the most favored copula family is the Frank, and the least preferred is the Joe, which failed to give any good fit (Table 5, Figures 4 and 7). The AMH family is mostly favored in the low correlation samples. In the first approach, the families with the best fit for all of the applications are: 1) Frank, 2) Clayton and 3) Normal. Furthermore, the most favored copula families, taking into account their ranks, are: first rank, the Clayton; second rank, the Normal; and third rank, the Frank. The correlation analysis showed that there is: a) a certain decline in the correlation between the AMF peak and the flood volumes of fixed durations when the flood duration increases, and b) the persistence of a lower tail dependence with an increasing flood volume. When focusing the analysis on the stations, the Clayton family is the most applicable for the stations at Achleiten and Vienna. The second approach to the flood peak - volume sample construction shows that the Frank and AMH are the best fitted families. The correlation of the AMF peak and corresponding flood volume pairs rises as one moves downstream along the Danube River. The result that the Frank copula is the most favored family agrees with the study of Reddy and Ganguli (2012), which, on the basis of the GOF p-value, identified the Frank copula as the most suitable in terms of the best fit for the flood peak-volume pairs. Moreover, in the studies of Chowdhary et al. (2011) and Bacová-Mitková and Halmová (2014), the Clayton copula family was reported as the most suitable choice for simulation of the flood peak-volume pairs. The unacceptable fitting performance of the Joe copula family supports the results of Szolgay et al. (2015). Finally, a valuable outcome from the results of this study, which agree with the results of Favre et al. (2004) and Szolgay et al.
(2015, 2016), could be that a further investigation of the choice of the "best" copula families for the flood peak-volume pairs is essential. For a more comprehensive and complete analysis of flood characteristics (peaks, volumes and durations), the following steps could be implemented in the near future: 1) a seasonal analysis splitting the year into several logical periods followed by applying the same methods of constructing the volumes and copula-fitting approach as in this study, 2) assessing the impacts of hydrotechnical projects on the study area (i.e., an examination of the correlation patterns between discharges ­ volumes due to dam construction), and 3) the use of partial duration series for a peak-over-threshold analysis using bivariate or trivariate copulas. Acknowledgement. This research was funded by the "COST Action ES0901, European procedures for flood frequency estimation (FloodFreq)" research program, within the Short-Term Scientific Mission collaboration of the University of Thessaly, Department of Civil Engineering, and the Slovak University of Technology in Bratislava, Faculty of Civil Engineering, Department of Land and Water Resources Management. This financial support is gratefully acknowledged. Furthermore, we would like to acknowledge the support of the ERC "FloodChange" Advanced Grant and the Slovak Grant Agency under VEGA Project Nos. 1/0776/13 and 1/0710/15 for their financial support. The authors would like to thank Katarína Jeneiová, PhD., for her help with data processing. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Hydrology and Hydromechanics de Gruyter

Joint modelling of flood peaks and volumes: A copula application for the Danube River



Publisher
de Gruyter
Copyright
Copyright © 2016 by the
ISSN
0042-790X
eISSN
0042-790X
DOI
10.1515/johh-2016-0049

Abstract

Flood frequency analysis is usually performed as a univariate analysis of flood peaks using a suitable theoretical probability distribution of the annual maximum flood peaks or peak over threshold values. However, other flood attributes, such as flood volume and duration, are necessary for the design of hydrotechnical projects, too. In this study, the suitability of various copula families for a bivariate analysis of peak discharges and flood volumes has been tested. Streamflow data from selected gauging stations along the whole Danube River have been used. Kendall's rank correlation coefficient (tau) quantifies the dependence between flood peak discharge and flood volume settings. The methodology is applied to two different data samples: 1) annual maximum flood (AMF) peaks combined with annual maximum flow volumes of fixed durations at 5, 10, 15, 20, 25, 30 and 60 days, respectively (which can be regarded as a regime analysis of the dependence between the extremes of both variables in a given year), and 2) annual maximum flood (AMF) peaks with corresponding flood volumes (which is a typical choice for engineering studies). The bivariate modelling of the extracted peak discharge - flood volume couples is achieved with the use of the Ali-Mikhail-Haq (AMH), Clayton, Frank, Joe, Gumbel, Hüsler-Reiss, Galambos, Tawn, Normal, Plackett and FGM copula families. Scatterplots of the observed and simulated peak discharge - flood volume pairs and goodness-of-fit tests have been used to assess the overall applicability of the copulas as well as observing any changes in suitable models along the Danube River. The results indicate that for the second data sampling method, almost all of the considered Archimedean class copula families perform better than the other copula families selected for this study, and that for the first method, only the upper-tail-flat copulas excel (except for the AMH copula due to its inability to model stronger relationships). 
Keywords: Bivariate frequency analysis; Copulas; Dependence of flood peaks and volumes; Kendall's rank correlation coefficient; Danube River.

INTRODUCTION

The design of flood protection structures where storage is involved requires an entire hydrograph or at least the flood volume/shape estimates related to the flood peaks. Therefore, the relationship between peaks of annual maximum floods (AMF) and their volumes is an interesting scientific research issue. From a broader perspective, however, the dependence between the annual extremes of both variables is also of interest when studied from the perspective of regimes. In the past, identical marginal distributions for both of these random variables have often been used for modelling their dependence in hydrology (e.g., Goel et al., 1998; Yue et al., 2001). As flood peaks are the most commonly used data in hydrological frequency analysis, the majority of studies have analyzed flood peak frequency curves (Cunnane, 1988, 1989; Dawdy et al., 2012; Groupe de recherche en hydrologie statistique (GREHYS), 1996; Laio et al., 2011; Mediero and Kjeldsen, 2014) in contrast to flood volumes, for which fewer studies are available (e.g., Bacová-Mitková, 2011; Mediero et al., 2010). Under the assumption that flood peaks and volumes have the same type of marginal distributions, several authors have used bivariate distributions for a frequency analysis of these variables (e.g., Singh and Singh, 1991; Shiau et al., 2006; Yue et al., 2001). This requirement, however, is seldom fulfilled in practice. In the event that the flood peaks and volumes do not have the same type of marginal probability distribution, the copula approach provides a flexible solution (Gaál et al., 2015; Giustarini et al., 2010; Szolgay et al., 2015; Zhang and Singh, 2006). Copulas allow for the combining of various marginal distributions into multivariate distributions (Favre et al., 2004; Genest and Favre, 2007; Nelsen, 2006; Salvadori and De Michele, 2004, 2010).
Copulas have recently become a popular tool in hydrological analysis for modeling the relationship among hydrological characteristics. A methodology for using copulas in hydrology has been described, e.g., by Dupuis (2007) and Genest and Favre (2007). Recently, many studies have applied copulas in engineering practice (see, e.g., Aronica et al., 2012; Balistrocchi and Baldassarre, 2011; Bezak et al., 2016; De Michele and Salvadori, 2003; De Michele et al., 2005; Gaál et al., 2010; Zhang and Singh, 2007). Favre et al. (2004) were among the first authors to use a two-dimensional copula for describing the relationship between flood discharges and volumes. Zhang and Singh (2006) stressed that bivariate copula-based distributions of flood peaks vs. volumes and flood volumes vs. durations agree better with plotted frequency estimates than traditional distributions. Bacová-Mitková and Halmová (2014) applied parametric families of Archimedean copulas for an analysis of the relationship between volumes, peak discharges and durations at the Bratislava gauging station on the Danube River. Szolgay et al. (2012) used a joint analysis of maximum discharges and volumes through copulas for the estimation of design quantities. The analysis described a case study in the Vltava River Basin in the Czech Republic to estimate the design discharge for a return period of 10,000 years to assess the safety of the Orlík dam. The more recent study of Šraj et al. (2015) applied a bivariate copula analysis to the annual maximum discharges and volumes from a gauging station on the Sava River in Slovenia. The results from three families were compared, and the Gumbel-Hougaard copula was found to be the most appropriate for modeling peak discharges vs. volume and volume vs. duration.
The main objective of this study is to investigate the suitability and evaluate the applicability of the selected copula models for the flood peak-volume relationship along the Danube River using streamflow data from seven gauging stations. The following eleven copula families were used for the analysis: Ali-Mikhail-Haq, Clayton, Frank, Joe, Gumbel, Hüsler-Reiss, Galambos, Tawn, Normal, Plackett and FGM. In contrast to previous studies conducted in the Danube River area (e.g., Bacová-Mitková and Halmová, 2014; Gaál et al., 2015; Szolgay et al., 2015, 2016), the flood peak discharge and volume pairs are constructed by means of two essentially different approaches. In either case, flood peaks are represented by the AMF peak. The difference appears in the way the flood volumes are defined. In the first approach, the annual maxima of the volumes for different durations have been selected (which can be regarded as a regime analysis of the dependence of the extremes of both variables in a given year), while in the second approach, flood volumes corresponding to the annual maximum flood peaks have been used (which is a typical choice for engineering studies). A bivariate analysis has been applied to the extracted pairs of the flood peak discharge and flood volume data using the copulas to assess which copula family is the most suitable for reproducing the dependence structure of the variables selected as well as observing changes in their applicability along the Danube River.

STUDY AREA

The Danube River originates in the Black Forest (Schwarzwald) in Germany at an altitude of about 678 m a.s.l. at the confluence of the Brigach and Breg rivers in Donaueschingen. The Danube River flows through ten countries, and its delta is located in the Black Sea. The area of the entire watershed is 801,463 km² with a main channel length of 2830 km (ICPDR, 2009). The average discharge is 6,500 m³ s⁻¹ before the delta.
The climatic regime over the river basin is dominated by a clear seasonality, which is influenced in the northern parts by the Atlantic Ocean and in the eastern parts by the continental climate. The total annual precipitation in the catchment is between 3000 mm at the high altitudes and 400 mm in the lowlands (Sommerwerk et al., 2009). The hydrological regime of the Danube River is highly influenced by precipitation patterns and the orographic structure of its catchment (Pekárová et al., 2008, 2013).

The Danube Basin can be subdivided into three main parts. The Upper Danube Region lies between the springs and the Devín Gate (mean elevation = 133 m a.s.l., catchment area = 131,338 km², mean annual discharge = 2,051 m³ s⁻¹ at the Devín-Bratislava gauge). It is characterized by high precipitation in the west (up to 2500 mm of mean annual precipitation totals in the Alps), whereas the eastern regions have lower precipitation (700 mm of mean annual precipitation totals in the lowlands). The runoff regime corresponds to that of the Alpine tributaries with the maximum discharges in June and minimum discharges during the winter months. The Central Danube Region lies between the Devín Gate and the Iron Gate (60 m a.s.l., 444,894 km², 5,585 m³ s⁻¹ at the Turnu Severin/Orsova gauge). The runoff regime is characterized by two runoff peaks in June and April. The April runoff peak is mainly caused by the addition of waters from the snow melting in the Carpathians and from the early spring rains of the tributaries in the lowlands and low mountains of the area. The Lower Danube Region is located between the Iron Gate and the Danube's embouchure into the Black Sea (1 m a.s.l., 230,768 km², 6,499 m³ s⁻¹ at the Ceatal Izmail gauge). The Danube Delta is considered to be a region with temperate climate zones of a steppe character due to low precipitation totals (mean annual precipitation totals range from 600 mm to 300 mm) and high summer temperatures of over 35°C (ICPDR, 2009).

In this study the discharge data (annual maximum peak and daily discharges) of the following gauging stations were used: Hofkirchen (Germany), Achleiten (Germany) and Vienna (Austria) from the Upper Danube Region; Bratislava (Slovakia) and Nagymaros (Hungary) from the Central Danube Region; and Orsova (Romania) and Reni (Ukraine) from the Lower Danube Region (see Figure 1 and Table 1).

Fig. 1. The Danube River with the locations of the gauging stations analyzed.

Table 1. Gauging stations with the observation periods, catchment areas and elevation of the gauging stations used in the analysis.

| Gauging station | Country | Length of the data series (years) | Catchment area (km²) | Gauging station elevation (m a.s.l.) |
|---|---|---|---|---|
| Hofkirchen | Germany | 1901–2007 | 47,496 | 299.17 |
| Achleiten | Germany | 1901–2007 | 76,653 | 287.27 |
| Vienna | Austria | 1828–2006 | 101,731 | 152 |
| Bratislava | Slovakia | 1876–2006 | 131,338 | 132.86 |
| Nagymaros | Hungary | 1893–2007 | 183,534 | 99.37 |
| Orsova | Romania | 1904–2009 | 576,232 | 44.76 |
| Reni | Ukraine | 1931–1995 / 1997–2002 | 805,700 | 0.2 |

METHODOLOGY

This study applies a bivariate analysis of the AMF peaks and volumes using various copula families. For all of the implementations of the bivariate analysis, the dependence of the two variables is assessed with Kendall's tau correlation coefficient. A "blanket" goodness-of-fit (GOF) test, which is based on Kendall's transformation with the Cramér-von Mises measure of distance (Genest et al., 2009), has been used to evaluate and subsequently rank the best fitted copula families. Initially, all the univariate data were tested for the presence of temporal dependence by the Mann-Kendall test (Mann, 1945) and the Ljung-Box test (Ljung and Box, 1978), because a copula analysis requires that the data be independent and identically distributed (i.i.d.).
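As an illustration of the two rank-based checks named above, the following minimal Python sketch computes Kendall's tau for a bivariate sample and a two-sided Mann-Kendall trend p-value for a single series. It is not the code used in the study (which relied on R); the function names are hypothetical, the tau shown is the simple tau-a form, and the Mann-Kendall variance omits the correction for ties, so it should be read only as a sketch under those assumptions.

```python
import math


def sign(a):
    """Return -1, 0 or 1 according to the sign of a."""
    return (a > 0) - (a < 0)


def kendall_s(x, y):
    """Concordant minus discordant pairs between two equally long sequences."""
    n = len(x)
    return sum(
        sign(x[j] - x[i]) * sign(y[j] - y[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )


def kendall_tau(x, y):
    """Kendall's tau-a (assumes no ties): S divided by the number of pairs."""
    n = len(x)
    return kendall_s(x, y) / (n * (n - 1) / 2)


def mann_kendall_p_value(series):
    """Two-sided Mann-Kendall trend test, normal approximation, no ties correction."""
    n = len(series)
    # The Mann-Kendall S statistic is Kendall's S between time order and values.
    s = kendall_s(list(range(n)), series)
    if s == 0:
        return 1.0
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - sign(s)) / math.sqrt(var_s)  # continuity correction
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    peaks = [1.0, 2.0, 3.0, 4.0]      # illustrative numbers, not Danube data
    volumes = [1.0, 3.0, 2.0, 4.0]
    print("tau:", kendall_tau(peaks, volumes))
    print("trend p-value:", mann_kendall_p_value(list(range(10))))
```

A small p-value from the trend test would flag a series as a possible violation of the i.i.d. prerequisite; a tau close to zero would indicate that a dependence model is hardly needed.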
Subsequently, the multivariate data of the flood peaks and volumes were tested for the significance of the (rank) correlation (Kendall, 1955) and for the asymmetry of their relationship (Genest et al., 2012) to justify the choice of modelling by (symmetrical) copulas. In the next subchapter the applied copula families are described in detail, followed by the methods of input data selection and the annual maximum flood peaks and flow volume separation.

Copulas

A copula is a function that allows for the modelling of the dependence structure between stochastic variables. The main advantage is that the copula approach can split the problem of constructing multivariate probability distributions into a part containing the marginal one-dimensional distribution functions and a part containing the dependence structure. These two parts can be studied and estimated independently and then put together into a joint distribution function. The bivariate case can be written as:

$$F_{XY}(x,y) = C\big(F_X(x), F_Y(y)\big), \tag{1}$$

where $F_X$, $F_Y$ are the respective marginal distribution functions of the random variables $X$, $Y$. $F_{XY}$ represents the joint distribution function of the random vector $(X,Y)$, and the copula $C$ is a function $C:[0,1]^2 \to [0,1]$ that satisfies the boundary conditions $C(t,0) = C(0,t) = 0$ and $C(t,1) = C(1,t) = t$ (uniform margins) for any $t \in [0,1]$ and the so-called 2-increasing property. Thus a copula can be viewed as a standardized joint distribution function. Further details can be found in Nelsen (2006).

Table 2. List of the copulas applied in this study.
| Class | Copula families |
|---|---|
| Archimedean copulas | Ali-Mikhail-Haq, Clayton, Frank, Joe, Gumbel |
| Extreme-value copulas | Hüsler-Reiss, Galambos, Tawn |
| Other copulas | Normal, Plackett, FGM |

The current study applies ten commonly used one-parameter families from several classes of copulas (Table 2, Appendix), namely, the Archimedean class (the Clayton, Frank, Gumbel-Hougaard, Joe and Ali-Mikhail-Haq (AMH) copulas), the extreme-value class (Gumbel-Hougaard, Galambos, Hüsler-Reiss), the elliptical class (normal or Gaussian family), and the unclassified Plackett and Farlie-Gumbel-Morgenstern (FGM) parametric families of copulas. Additionally, the three-parameter Tawn family (of the EV class) has also been used. The Archimedean copulas

$$C(u,v) = \varphi^{(-1)}\big(\varphi(u) + \varphi(v)\big), \quad u,v \in [0,1], \tag{2}$$

have the advantage of easy construction via a one-dimensional function $\varphi:[0,1] \to [0,\infty)$ called a generator. The extreme-value (EV) copulas

$$C(u,v) = (uv)^{A\left(\log(u)/\log(uv)\right)}, \quad u,v \in [0,1], \tag{3}$$

are uniquely defined through the so-called Pickands dependence function $A:[0,1] \to [1/2,1]$ (see, e.g., Bacigál and Mesiar, 2012; Gudendorf and Segers, 2010; Tawn, 1988) and were theoretically derived to model the dependence between extremes of random variables, while the elliptical copulas are simply the copulas extracted from elliptically contoured distributions. All the Archimedean (except for the Frank) and EV copulas are non-symmetrical with respect to the secondary diagonal; some accumulate more of a probability mass at points (0,0) or (1,1), which is denoted as a lower tail or upper tail dependence, respectively.

The parameters of the copulas are estimated by maximizing the so-called pseudo-likelihood function:

$$L(\theta) = \sum_{i=1}^{n} \log\big[c_\theta(U_i, V_i)\big], \tag{4}$$

which, besides the copula density $c_\theta$, contains pseudo-observations $U_i$, $V_i$ ($i = 1, \dots, n$), which are a transformation of the $n$ real observations of the respective random variables $X$, $Y$ through the corresponding empirical distribution function, also known as a plotting position.

A "blanket" test (Genest et al., 2009) provides the goodness-of-fit test with the Cramér-von Mises measure of distance:

$$S_n = \sum_{i=1}^{n} \big\{C_\theta(U_i,V_i) - C_n(U_i,V_i)\big\}^2, \tag{5}$$

between the parametric copula $C_\theta$ and the empirical copula, which is defined as

$$C_n(u,v) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(U_i \le u)\,\mathbf{1}(V_i \le v). \tag{6}$$

Given the validity of the null hypothesis that $C_\theta$ fits well, the probability distribution of $S_n$ is unknown and needs to be calculated from bootstrap simulations. The correlation is quantified by Kendall's rank correlation coefficient $\tau \in [-1,1]$, which is a measure of concordance that is also able to detect nonlinear dependence (in contrast to the standard Pearson correlation coefficient). Recall that there is a direct relation between Kendall's $\tau$ and a copula,

$$\tau = 4\iint_{[0,1]^2} C(u,v)\,\mathrm{d}C(u,v) - 1. \tag{7}$$

The analysis is performed with the `acopula' package (Bacigál, 2013) under GNU R (R Core Team, 2014). Supplementary details about the copulas used in this study can be found in the Appendix.

The annual maximum flood volumes of selected durations

In the first approach, the annual maximum flow volumes of fixed durations around the peak of the wave were selected separately, instead of defining the volume of each flood (thus conducting a process-oriented analysis of the hydrograph and defining the beginning and end of each runoff event). Here, this simplified algorithm for volume selection was adopted to emulate the volume of a wave, since this paper is more technically oriented towards studying the suitability of theoretical models for such a type of analysis. Both variables, the AMF peaks and the flood volumes of the fixed duration, represent the annual maxima, but they are not necessarily linked to the same hydrological event. Such an approach to the construction of a data set is preferred among statisticians, since the selected bivariate sample satisfies the conditions for a rigorous adoption of the extreme-value class copulas. Moreover, in our case, it allows for studying the regime of extreme values within each year and assessing the possibility that in a year with a high flood, there will also be a high flow volume of a predefined duration. For notational simplicity, we will subsequently refer to these flow volumes as flood volumes.

The dataset of annual maximum flood volumes of fixed durations used in this study was prepared in the frame of the UNESCO project "Flood regime of rivers in the Danube River Basin" (Pekárová et al., 2008). The annual flow volumes were separated using daily discharge data and fixed around the flood peaks at seven different durations: 5, 10, 15, 20, 25, 30 and 60 days, respectively, using a method described in Bacová-Mitková (2002). According to this methodology, the flood volumes of a certain duration were calculated with a moving window for all flood peaks in each year, and the largest annual flood volume values for the selected durations were then taken for the analysis. The mean daily streamflow data used in this approach are from the following Danube River stations: 1) Hofkirchen, 2) Achleiten, 3) Vienna, 4) Bratislava and 5) Nagymaros. Unfortunately, the annual maximum flood volume data of selected durations for the Orsova and Reni stations were not available within the frame of the mentioned UNESCO project. Next, the applicability of different copula families from the Archimedean class and extreme-value class as well as three other families (Normal, Plackett and FGM) for all of the above data combinations was investigated. Figure 2 presents the flowchart of the first approach.

Fig. 2. Flowchart of the first approach for the sample selection.

The flood volumes corresponding to the annual maximum flood peaks

The second approach refers to a bivariate analysis using the AMF peaks and their corresponding flood volumes for each station (Figure 3). In other words, this is an event-based analysis, since each year at the given site is represented by a single flood event with the largest peak discharge value (and its corresponding flood volume). This rather intuitive approach is preferred among engineering hydrologists in practical applications, and it enables the definition of conditional probabilities for design purposes. The mean daily streamflow data used for the flood volume separation in this approach are taken from the following Danube River stations: 1) Hofkirchen, 2) Bratislava, 3) Nagymaros, 4) Orsova and 5) Reni. A local-minimum method developed by Willems (2009) was used for the separation of the base flow, and for a better estimation of the beginning and end of the flood waves. Eleven different copula families (Table 2, Appendix) for all of the various data combinations were investigated for their applicability.

Fig. 3. Flowchart of the second approach for the sample selection: flood volumes corresponding to flood peaks.

RESULTS

In this section the results indicating the suitability of the various copula families for the bivariate analysis of the AMF discharges and flood volumes are presented. First, the preliminary data testing for the presence of temporal dependence, and the significance of (rank) correlation and asymmetry are discussed. Then, four different rank categories, which were created according to the GOF test, are described. The detailed results of the bivariate analyses of the AMF peaks and volumes are presented separately, according to the approach of volume separation, which was described in the previous section.

The preliminary tests revealed only a few violations of the copula analysis prerequisites, namely, the original flood peak data at the Vienna station seemed to contain a trend. No autocorrelation was indicated in the peaks at any station. As for the volumes, no volume data show a temporal dependence within either approach. Within the second approach a downward trend was detected at the Orsova and Reni stations due to decreasing volatility in recent decades. The correlation between the AMF peaks and all the flood volumes was quantified with Kendall's tau, and the results are summarized within the corresponding subsections.

According to the GOF test statistic, which serves as a measure of the distance to the empirical distribution (the lower the better), four different rank categories were created: a) the first to third ranks, and b) the total rank, which refers to the frequency with which a copula family scored with one of the previous ranks. For instance, at the Hofkirchen station (see Table 4, column 3), copulas from the Frank family scored three times with the lowest GOF test statistic (1st rank) among all the copulas estimated for each fixed volume class, then two times with the second lowest value of the GOF test statistic (i.e., the 2nd rank), and just once with the third lowest value (the 3rd rank), a total of scoring 6 times in the copula competition, which is summarized in Table 5. The total rank then represents the number of the scores across all the fixed volume classes and all the stations. We adopted this ranking approach to be able to select the best models among all those applied, as more than one model was always suitable. The analysis was performed with a focus on the stations spread along the river, different flood volumes (annual maximum volumes of fixed durations and flood volumes corresponding to flood peaks), and diverse (yet commonly implemented) copula models. In the following paragraphs, a detailed analysis of the results of both approaches will be presented.

First approach: the annual maximum flood volumes of selected durations

One of the findings in this approach is that the (always significant) correlation between the flood variables (i.e., the discharge and volume) decreases as the flood's duration increases (Table 3). A notable decrease in the correlation can be observed in the Austrian reach, which would require further analysis, e.g., with respect to the lateral inflow in this reach. Overall the values (except for this reach) are more or less constant for all the durations. The Kendall's tau correlation analysis for all the stations shows that although the upper tail dependence (reading from the properties of the best fitting copula) disappears when the floods' durations increase, the lower tail dependence remains unchanged. The tests for asymmetry rejected the exchangeability of peaks and volumes in two out of 35 cases; thus the copula families considered are taken as appropriate in this respect.

Table 3. The Kendall's tau correlation coefficient values for both approaches and all the stations analysed.

| Station | 5D | 10D | 15D | 20D | 25D | 30D | 60D | Volumes corresponding to AMF |
|---|---|---|---|---|---|---|---|---|
| Hofkirchen | 0.85 | 0.79 | 0.67 | 0.64 | 0.61 | 0.6 | 0.52 | 0.26 |
| Achleiten | 0.75 | 0.68 | 0.54 | 0.49 | 0.46 | 0.43 | 0.36 | – |
| Vienna | 0.75 | 0.68 | 0.53 | 0.49 | 0.46 | 0.44 | 0.36 | – |
| Bratislava | 0.85 | 0.78 | 0.64 | 0.61 | 0.59 | 0.57 | 0.46 | 0.36 |
| Nagymaros | 0.88 | 0.82 | 0.67 | 0.63 | 0.6 | 0.58 | 0.5 | 0.35 |
| Orsova | – | – | – | – | – | – | – | 0.41 |
| Reni | – | – | – | – | – | – | – | 0.27 |

Table 4. The rank values of the copula families according to the GOF test for both approaches.
Each cell lists the copula families in the 1st, 2nd and 3rd rank, in that order.

| Volume | Hofkirchen | Achleiten | Vienna | Bratislava | Nagymaros |
|---|---|---|---|---|---|
| 5D | Frank, Galambos, Gumbel | Hüsler-Reiss, Galambos, Tawn | Hüsler-Reiss, Galambos, Gumbel | Galambos, Gumbel, Tawn | Plackett, Normal, Gumbel |
| 10D | Galambos, Tawn, Gumbel | Plackett, Clayton, Frank | Galambos, Gumbel, Hüsler-Reiss | Tawn, Galambos, Gumbel | Frank, Tawn, Plackett |
| 15D | Frank, Normal, Clayton | Clayton, Plackett, Frank | Clayton, Normal, Frank | Plackett, Tawn, Gumbel | Normal, Frank, Tawn |
| 20D | Plackett, Normal, Frank | Clayton, Normal, Frank | Clayton, Plackett, Frank | Plackett, Normal, Clayton | Plackett, Normal, Frank |
| 25D | Clayton, Frank, Normal | Clayton, Frank, AMH | Clayton, Plackett, Frank | Normal, Clayton, Frank | Plackett, Frank, Normal |
| 30D | Plackett, Frank, Clayton | Clayton, Normal, AMH | Clayton, AMH, Frank | Clayton, Normal, Frank | Plackett, Clayton, Frank |
| 60D | Frank, Plackett, Normal | Clayton, AMH, Plackett | Clayton, AMH, FGM | Clayton, Normal, Frank | Clayton, Normal, Frank |
| Corresp. volumes to AMF | AMH, Frank, Clayton | – | – | Frank, Normal, AMH | AMH, Frank, Clayton |

Table 5. The frequency (total) and relative frequency (total %) of the overall rankings according to the GOF test for the data samples created on the basis of the first approach.

| Copula family | Hofkirchen | Achleiten | Vienna | Bratislava | Nagymaros | Total | Total (%) |
|---|---|---|---|---|---|---|---|
| AMH | 0 | 3 | 2 | 0 | 0 | 5 | 5 |
| Clayton | 3 | 6 | 5 | 4 | 2 | 20 | 19 |
| Frank | 6 | 4 | 4 | 3 | 6 | 23 | 22 |
| Joe | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Gumbel | 2 | 0 | 2 | 3 | 1 | 8 | 8 |
| Hüsler-Reiss | 0 | 1 | 2 | 0 | 0 | 3 | 3 |
| Galambos | 2 | 1 | 2 | 2 | 0 | 7 | 7 |
| Tawn | 1 | 1 | 0 | 3 | 2 | 7 | 7 |
| Normal | 4 | 2 | 1 | 4 | 5 | 16 | 15 |
| Plackett | 3 | 3 | 2 | 2 | 5 | 15 | 14 |
| FGM | 0 | 0 | 1 | 0 | 0 | 1 | 1 |

Fig. 4. The relative frequency of the ranks (in percentages) of different copula families as a measure of their applicability to all the stations (first approach).

Fig. 5. Scatter plots of 500 pair simulations according to the fitted copulas (grey color) and measured data (black color) for the data sample representing 60 days' fixed volume duration. a) Bratislava Station, the Clayton fit, GOF p-value 0.042, Kendall's tau correlation coefficient 0.464; b) Hofkirchen Station, the Frank fit, GOF p-value 0.061, Kendall's tau correlation coefficient 0.521.

Fig. 6. Scatter plots of the pseudo-observations for all the stations (the AMF peak vs. the corresponding flood volume) with the Kendall's tau correlation coefficient (in parentheses).

Fig. 7. Ranking of the relative frequency (in percentages) of the applicability of different copula families to all the stations for the data samples created based on the second approach.

The copula family rankings are presented in Table 4, where, in the category of the first rank, the Clayton copula prevails with 14 counts, N = 35 (a 40% relative frequency) (Figure 4b, Table 4). At the Achleiten and Vienna stations, the Clayton family has five counts with N = 7 (a 71% relative frequency). In the second rank, the Normal family, which dominates in the ranking, has 11 counts with N = 35 (a 31% relative frequency) (Figure 4c, Table 4), and at the stations of Bratislava and Nagymaros, the same family has three counts with N = 7 (a 43% relative frequency). Moreover, in the third rank, the leading copula is that of the Frank family, having 14 counts with N = 35 (a 40% relative frequency) (Figure 4d, Table 4), and at the Vienna station, the same family has four counts with N = 7 (a 57% relative frequency). Finally, the GOF test indicates that for the total ranking, the Frank copula performs better than the others with 23 counts, N = 105 (a 22% relative frequency) (Figure 4a, Table 5). The Clayton copula is in second place in the total ranking with 20 counts (a 19% relative frequency), and the Normal copula is in third place with 16 counts (a 15% relative frequency) (Figure 4a, Table 5). In Figure 5 two scatter plot examples of the simulated/observed data can be seen.
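To make the idea behind such simulated scatter plots concrete, the sketch below draws 500 pairs from a Clayton copula in Python. It rests on several assumptions that are not taken from the study: the Clayton parameter is obtained by inverting the known relation tau = theta / (theta + 2) at the reported tau of 0.464 (the study itself estimates parameters by maximum pseudo-likelihood in R), the pairs are generated by conditional inversion, and all function names are hypothetical.

```python
import random


def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2), valid for the Clayton family with theta > 0."""
    return 2.0 * tau / (1.0 - tau)


def simulate_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()  # conditional probability level
        # Solve dC(u, v)/du = w for v.
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Sample Kendall's tau (tau-a) of a list of (u, v) pairs."""
    n = len(pairs)
    s = 0
    for i in range(n - 1):
        ui, vi = pairs[i]
        for j in range(i + 1, n):
            uj, vj = pairs[j]
            s += ((uj > ui) - (uj < ui)) * ((vj > vi) - (vj < vi))
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    theta = clayton_theta_from_tau(0.464)
    sample = simulate_clayton(500, theta, rng)
    print("theta:", round(theta, 4))
    print("sample tau:", round(kendall_tau(sample), 3))
```

The simulated pairs live on the unit square; to compare them with measured peaks and volumes as in the scatter plots, they would still have to be mapped back through the inverse marginal distributions, which this sketch leaves out.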
Second approach: the flood volumes corresponding to annual maximum flood peaks

Concerning the second approach, according to Kendall's tau correlation coefficient (which is again always significant), the correlation between the AMF peak and the corresponding flood volume increases along the Danube River (downstream), except for the Reni station (Figure 6, Table 3). No relation appears to be asymmetric with respect to the main diagonal (as suggested by the asymmetry test). Another finding from this analysis is that there is an increase in the upper tail dependence at the river stations located in the lower part of the river (see Figure 6), which is supported by the GOF test preference of extreme-value copulas at the Orsova and Reni stations (while at the Hofkirchen, Bratislava and Nagymaros stations, no copula with an upper tail dependence modelling capability was preferred), see Table 4. Furthermore, the analysis of the GOF test indicates that for the total rank, the Frank copula with four counts performs better than the others, N = 15 (a 27% relative frequency), while the AMH is in second place with three counts, N = 15 (a 20% relative frequency) (Figure 7, Table 4). In Figure 8 two scatter plot examples of the simulated/observed data can be seen. In comparison with the first approach, the correlation of the AMF peak and the corresponding flood volumes is notably lower (see the p-values in Figure 8). Moreover, a result that follows from the analysis is that in the low correlation samples (the Hofkirchen, Bratislava and Nagymaros stations) the AMH copula family is mostly preferred (the Reni station is an exception due to the upper tail behaviour) (Figure 6, Table 3, Table 4).

DISCUSSION AND CONCLUSIONS

In this study we provided a bivariate analysis of the streamflow data (the AMF peak and flood volumes) for modeling extreme flood events with the use of various copula families.
First, the annual maximum flood volumes of the fixed durations were separated from the flood waves, which were not necessarily linked to the same hydrological event. This approach leads to the construction of the data set allowing us to study the regime of extreme values within each year and to investigate the chance that in a year with a high flood, there would also be a flood with a high volume. Since this paper is more technically oriented towards studying the suitability of theoretical models for such a type of analysis, this simplified volume selection was adopted instead of defining the volume of each flood (thus conducting a process-oriented analysis of the hydrograph and defining the beginning and end of each runoff event). In the second approach to flood volume separation, the AMF peaks and their corresponding flood volumes for each station were sampled. This approach is mostly preferred in engineering studies in practical applications, since it enables the definition of conditional probabilities for design purposes. The positive and rather significant dependence of the AMF and annual maximum flood volumes of a fixed duration indicates that a detailed comparative analysis of such peaks and volumes along the river and between distinct periods bears a potential for new information on the regime of extremes and its changes.

Fig. 8. Scatter plots of 500 pair simulations according to the fitted copula (grey color) and measured data (black color), for the data samples representing annual flood peaks and the corresponding flood volumes. a) Bratislava Station, the Frank fit, GOF p-value 0.045, Kendall's tau correlation coefficient 0.359; b) Hofkirchen Station, the AMH fit, GOF p-value 0.04, Kendall's tau correlation coefficient 0.262.
As for the technical part, the main outcome of the analysis is that, over all of the applications, the most favored copula family is the Frank one, and the least preferred is the Joe family, which failed to give any good fit (Table 5, Figures 4 and 7). The AMH family is mostly favored in the low-correlation samples. In the first approach, the families with the best fit for all of the applications are: 1) Frank, 2) Clayton and 3) Normal. Furthermore, the most favored copula families, taking into account their ranks, are: first rank, the Clayton; second rank, the Normal; and third rank, the Frank. The correlation analysis showed: a) a certain decline in the correlation between the AMF peak and the flood volumes of fixed durations when the flood duration increases, and b) the persistence of a lower tail dependence with an increasing flood volume. When focusing the analysis on the stations, the Clayton family is the most applicable for the stations at Achleiten and Vienna. The second approach to the flood peak-volume sample construction shows that the Frank and AMH are the best fitted families. The correlation of the AMF peak and corresponding flood volume pairs rises in the downstream direction along the Danube River. The result that the Frank copula is the most favored family agrees with the study of Reddy and Ganguli (2012), which, based on the GOF p-value, identified the Frank copula as the most suitable in terms of the best fit for the flood peak-volume pairs. Moreover, in the studies of Chowdhary et al. (2011) and Bacová-Mitková and Halmová (2014), the Clayton copula family was reported as the most suitable choice for simulation of the flood peak-volume pairs. The unacceptable fitting performance of the Joe copula family supports the results of Szolgay et al. (2015). Finally, a valuable outcome of this study, in agreement with the results of Favre et al. (2004) and Szolgay et al. (2015, 2016), is that a further investigation of the choice of the "best" copula families for the flood peak-volume pairs is essential.
For a more comprehensive and complete analysis of flood characteristics (peaks, volumes and durations), the following steps could be implemented in the near future: 1) a seasonal analysis splitting the year into several logical periods, followed by applying the same methods of constructing the volumes and the same copula-fitting approach as in this study, 2) an assessment of the impacts of hydrotechnical projects on the study area (i.e., an examination of the correlation patterns between discharges and volumes due to dam construction), and 3) the use of partial duration series for a peak-over-threshold analysis using bivariate or trivariate copulas.
Acknowledgement. This research was funded by the "COST Action ES0901, European procedures for flood frequency estimation (FloodFreq)" research program, within the Short-Term Scientific Mission collaboration of the University of Thessaly, Department of Civil Engineering, and the Slovak University of Technology in Bratislava, Faculty of Civil Engineering, Department of Land and Water Resources Management. This financial support is gratefully acknowledged. Furthermore, we would like to acknowledge the financial support of the ERC "FloodChange" Advanced Grant and the Slovak Grant Agency under VEGA Project Nos. 1/0776/13 and 1/0710/15. The authors would like to thank Katarína Jeneiová, PhD., for her help with data processing.

Journal

Journal of Hydrology and Hydromechanics, de Gruyter

Published: Dec 1, 2016
