# Inferring tsunami flow depth and flow speed from sediment deposits based on Ensemble Kalman Filtering

Inferring tsunami flow depth and flow speed from sediment deposits based on Ensemble Kalman... SUMMARY Flood hazards caused by tsunamis can cause enormous amounts of casualties and property losses. However, these hazards often occur at unexpected locations, which poses challenges for direct observations of flood source characteristics (e.g. flow speed and water depth in tsunamis-induced inundations). For this reason, inverse modelling methods based on the information contained in the deposit are indispensable for deciphering the quantitative characteristics of the flood sources. In this work, we propose an inversion scheme based on Ensemble Kalman Filtering (EnKF) to infer tsunami flow characteristics from sediment deposits. In contrast to traditional data assimilation methods using EnKF, in the current work the system state is augmented by including both the physical states (sediment fluxes) that are observable and the unknown parameters (flow speed and flow depth) to be inferred. A novelty of this work lies in formulating the tsunami inverse problem in a statistically rigorous way. Based on the Bayesian inference theory, the inversion scheme provides quantified uncertainties on the inferred quantities, which clearly distinguishes the present method with existing schemes for tsunami inversion. Two test cases with synthetic observation data are used to verify the proposed inversion scheme. Numerical results show that the tsunami flow variables inferred from the sediment deposit information agree with the synthetic truths very well, which clearly demonstrated the merits of the proposed inversion scheme. Moreover, a realistic application of the proposed inversion scheme with the field data from the 2006 South Java Tsunami is studied, and the results are validated against both previous inversion results and field data reported in the literature. The comparisons show excellent performance of the proposed approach. Tsunamis, Inverse theory, Numerical modelling, Probability distributions 1 INTRODUCTION Historically and in the modern era, human civilization and economic activities have prospered near oceans. This is partly due to the abundant availability of water for transportation and industrial activities. Coastal cities have become important nodes in global economic network. Furthermore, the increased economic relevance of cities, such as Los Angeles, Singapore or Hong Kong (to mention only three of many), has caused their population to significantly grow, including projections that even more people will be attracted to these power houses. NASA estimated that over one-third the total human population live within 100 km of a coastline, and this number will increase in the coming decades. A direct consequence of the tendency of settling near oceans is that throughout human history people had to battle with anthropogenic and naturally occurring coastal disasters, such as tsunamis, and to manage their threat with engineering infrastructure. The flooding hazards caused by tsunamis have caused enormous amounts of casualties and property losses (Nwe & Tokuzo 2010). It is thus crucial to understand the initiation and propagation of these floods and to predict their impact on infrastructure and human settlements (Tuck & Hwang 1972; Xiao et al. 2010; Young et al. 2010). On the other hand, tsunamis have been relatively rare, which is fortunate for these cities but poses great challenges for the scientific understanding of these events due to the sparseness of field data (Tappin et al. 2014). Moreover, the tsunami events often occur at unexpected times and locations along stretched coastlines, which makes direct observations infeasible. This is in spite of the recent advances in field monitoring techniques and the ever-increasing availability of sensors. In other words, the tsunami flow characteristics (e.g. flow speed and water depth) are volatile information. Fortunately, the sediment deposit and erosion patterns left behind by these floods are less volatile—they are often preserved for a long time. In fact, sediment deposits are indispensable records for the study of palaeotsunamis (Bourgeois et al. 1988; Dawson & Stewart 2007; Spiske et al. 2013). They are also critical information to obtain in post-disaster surveys (Richmond et al. 2012), which can be used to infer the tsunami characteristics along with other records (e.g. visual observations and oral histories). Indeed, researchers in the coastal hazard community have systematically pursued the inversion of water depth and flow velocities of tsunami-induced inundation from sediment deposits. Various models have been developed to estimate the flow conditions at the time of sediment depositions. These models include Moore’s Advection Model (Moore et al. 2007), Soulsby’s Model (Soulsby et al. 2007) and TsuSedMod (Jaffe & Gelfenbuam 2007), which have been successfully used in a wide range of applications (Spiske et al. 2010; Jaffe et al. 2011, 2012; Witter et al. 2012; Spiske et al. 2013). They all used the simple idea of target shooting. That is, a forward model of sedimentation, which computes the sediment deposition from given flow speed and water depth, is iteratively invoked until the sediment deposit thickness and grain size distribution that match the observation are obtained. Despite their simplicity and effectiveness, these existing models all have a major limitation in that they are not able to account for the uncertainties in the sediment deposit and the forward models. Uncertainties in sediment deposits can stem from various sources. First, the sampled sediment record can be very thin, and thus significant errors can occur in the sample collection and measurement processes. Second, a given sediment record can be the result of successive actions by multiple events, possibly of different natures (e.g. tsunamis and storm surges). This makes the interrogation of sediment record much more challenging. In addition to the uncertainties in sediment deposits, which are the input to the inverse problem, the modelling assumptions in the forward models also introduces significant uncertainties. For example, most numerical models of tsunami-induced sediment transport are based on Saint-Venant equations, which are computationally cheap but highly simplified (Xiao et al. 2010). They do not explicitly represent the small-scale turbulence processes and the local flow features within the boundary layer, but these features can significantly influence the sediment transport and deposition processes. In view of the large uncertainties in both the input and the model, it is imperative to understand them and to quantify the uncertainties they introduce in the inferred quantities. To this end, a statistical inference framework that can provide more rigorous confidence intervals is needed. Most widely used statistically rigorous inversion frameworks with quantified uncertainties are based on Bayes theorem. Markov Chain Monte Carlo (MCMC) sampling algorithm is the most accurate pathway to perform Bayesian inference (Law & Stuart 2012). However, it requires a large number of forward model evaluations, which is prohibitively expensive for complex forward simulations. Therefore, the use of approximate Bayesian approaches is necessary. There are a number of approximate Bayesian inference approaches, for example, Maximum a posteriori (MAP) (Ferraiuolo et al. 2004), variational methods (Caya et al. 2005), filtering-based methods (Evensen 1994). Among these approaches, the Ensemble Kalman Filtering (EnKF) method has its own advantages (Evensen 2009). First, it is a derivative-free inference method, making it easy to implement. It does not need to derive the complex adjoint equation that is required in the derivative-based methods. Second, the sequential filtering-based method can be used for the time-varying state (e.g. sediment flux). Third, EnKF is a non-intrusive method and thus can handle the high-dimensional, nonlinear system. Finally, the accuracy of the mean value of the inferred quantities by EnKF is comparable to that of the exact Bayesian approach based on the MCMC sampling, which is the golden standard of Bayesian inference (Law & Stuart 2012). Therefore, in this contribution, we propose a Bayesian scheme based on EnKF for the inference of flood source characteristics from sediment deposits. Specifically, this study is concerned with the inference of flow speed and water depth of a tsunami from its sediment deposits, which is referred to as ‘tsunami inversion’ following the conventions in the literature. However, we expect that the proposed method can be straightforwardly extended and adapted for the inference of source characteristics in other floods, for example, the breaching initiation and discharge rate history in riverine floods. These will be pursued in future studies. Data assimilation methods based on EnKF are widely used in geosciences applications (Evensen 2009). For example, in numerical weather forecasting, EnKF is used to infer the initial conditions (present states of the system, e.g. pressure, velocity, and temperature), which are not known precisely and thus must be inferred from observations. In subsurface flows, EnKF has been used to infer the permeability fields of soils and rocks, which have large uncertainties but are not amenable to full-field measurements due to the large size of the domain (Elsheikh et al. 2012). In addition, EnKF is also used to estimate CO2 emission fluxes from trace gas observations (Peters et al. 2005). A novelty of this work lies in formulating the tsunami inverse problem from sediment deposits by using a statistically rigorous approach, that is, EnKF-based data assimilation method. To the authors’ knowledge, our work represents the first attempt to use EnKF for tsunami inversion. This contribution can be seen as a proof of concept to demonstrate the feasibility of the approach. A more comprehensive study consisting of an evaluation of the proposed method, a demonstration of its practical relevance, and a parametric sensitivity study has been presented in a separate companion paper (Tang et al. 2016). The proposed method would open up possibilities for statistically rigorous inference of flood source characteristics from sediment deposits, where the inferred quantities are given with quantified uncertainties. The rest of the paper is organized as follows. Section 2 introduces the forward model of the sediment deposition by a tsunami wave and the formulation of the inverse problem. Section 3 presents the computational setup of three numerical examples of increasing difficulty levels used to demonstrate the merits of the proposed method. The detailed numerical results are presented and discussed in Section 4. Finally, Section 5 concludes the paper. 2 METHODOLOGY The objective of this work is to infer tsunami flow characteristics from the tsunami deposits, which is an inverse problem. As with most inversion algorithm, the proposed method for the inverse problem involves solving the forward problem repeatedly, that is, computing the tsunami deposit from given tsunami flow characteristics. Therefore, in this section the formulations, assumptions, and solution algorithm of the forward problem are first presented in Section 2.1. Subsequently, the formulation and solution algorithm for the inverse problem are introduced in Section 2.2. 2.1 Formulation of the forward problem and solution algorithm Given the tsunami flow characteristics (i.e. flow speed u and flow depth h), the forward problem is to compute the sediment deposition characteristics (i.e. the depth of the tsunami deposit and the particle size composition at any height of the sediment column). An example output of the forward problem is shown in Fig. 1(c), which shows the thickness of the tsunami deposit, the grain-size distribution in each layer. Figure 1. View largeDownload slide A schematic illustration of the algorithm for solving the forward problem, that is, computing the tsunami deposit thickness and the grain-size distribution in each layer from given tsunami characteristics (flow speed and flow depth). An exemplary output of the algorithm is presented in Fig. 2. Figure 1. View largeDownload slide A schematic illustration of the algorithm for solving the forward problem, that is, computing the tsunami deposit thickness and the grain-size distribution in each layer from given tsunami characteristics (flow speed and flow depth). An exemplary output of the algorithm is presented in Fig. 2. The forward model adopted in this study is based on the simplified sedimentation models of Jaffe & Gelfenbuam (2007) and Tang & Weiss (2015). In these models it is assumed that the tsunami deposit is formed solely by the steady sedimentation of the particles in the water column. The effects of bed load transport and the acceleration of sediment particles during the settling process are neglected. This assumption can be justified because the forward model is to calculate the flow speed near the peak inundation of the tsunami, during which, the formation of the sediment deposits is dominated by the suspended load. It is further assumed that the sediment concentration and the fluid velocity vary only vertically in the water column, and their horizontal gradients and temporal changes are neglected. With the assumptions above, the flow velocity profile u(z) along the water column can be parameterized by the shear velocity u*, where z is the elevation above the bed. Consequently, the depth-averaged flow velocity can be obtained from the following integral over the entire water column:   $$U = \int _{z_0}^{h}\frac{u_*^2}{K(z)} \text{d}z ,$$ (1)in which z0 is the total roughness of the bed and h is the flow depth. The eddy viscosity K can be calculated as following (Gelfenbaum & Smith 1986):   $$K(z) = \kappa \, u_* \, z \exp \left[ {\frac{-z}{h} - 3.2 \left( \frac{z}{h} \right)^2 + \frac{2}{3} \times 3.2 \left( \frac{z}{h} \right)^3} \right],$$ (2)where κ = 0.41 is the von Karman constant. The suspended sediment concentration for each grain size in the water column is assumed to follow the Rouse profile (Jaffe & Gelfenbuam 2007):   $$C_i (z) = C_{i, 0} \, \exp \left[ {w_{i} \int _{z_0}^{z} {\frac{1}{K(z)}}} \right]$$ (3)in which wi is the settling velocity of particles in the ith grain-size class, which depends on the mean particle diameter Di of the class; Ci, 0 denotes sediment concentration of the ith grain-size class at the bed, which depends on the shear velocity u* and the fraction of bed sediment in class i, among other parameters (Madsen et al. 1993). Based on eqs (1) and (3), the velocity and concentration profiles can be uniquely parameterized by two scalar quantities, the shear velocity u* and flow depth h. Therefore, from here on we consider u* and h the characteristics that define a tsunami. The objective of a tsunami inversion is thus to infer u* and h from a tsunami deposit record. Eq. (3) suggests that the sediment concentration for each sediment grain size can be determined by the tsunami characteristics (i.e. shear velocity u* and flow depth h) and the particle diameter Di. According to the convention in the sedimentology, the grain size is represented in ϕ scale (the logarithm of the particle diameter to the base 2), ϕi = −log2(Di/Dref), in which Dref = 1 mm is the reference particle diameter to ensure dimensional consistency. Consequently, the tsunami deposit thickness and grain-size distribution can be obtained by integrating the concentration curve Ci(z) for each grain size class. The integration is performed with the following algorithm, which is illustrated in Fig. 1 with two grain-size classes as an example. Discretization of time. Assuming that T is the total time taken for all the sediment in the water column (including all grain-size classes) to settle, we divide the time T to N time steps of size Δt such that T = N Δt. Discretization of water column. For grain-size class i, the water column can be divided to N layers, numbered sequentially upward from the bottom (see Fig. 1a), such that the sediment at the top of layer l arrives at the bed at time l Δt. Since the particles are assumed to have no acceleration (e.g. at constant velocity wi) during the sedimentation, the layers of the water column have a uniform thickness of Δzi = wi Δt. Note that the layer thickness can be different among different grain-size classes since the terminal velocity wi is larger for coarse grains than for fine grains. Consequently, for a coarse grain-size class the discretized water column layer thickness Δzi = wiΔt is larger and thus the number of discretized water column layers is smaller. This can be seen by comparing Figs 1(a) and (b). Accumulation of tsunami deposit. With the discretization of the water column, it can be seen that the obtained tsunami deposit has N layers as well (numbered in the same way as for the water column; see Fig. 1c). The sediment in layer l consists of the sediment in the lth layer of the water column for all grain-size classes (indicted in colours/patterns in Fig. 1c). The tsunami deposit thickness Δηl of the lth layer is computed by summing up the sediment volume in the corresponding lth water column layers for all grain-size classes:   $$\Delta \eta _l = \frac{1}{C_0} \left( \sum _{i = 1}^{n} {\overline{C}_{i, l} \, \Delta z_{i}} \right)$$ (4)in which C0 is the total sediment concentration at the bed including size classes, n is the number of grain-size classes, and $$\overline{C}_{i, l}$$ is the average concentration of grain-size class i in the water column layer l, which can be obtained by a simple integration:   $$\bar{C}_{i, l} = \frac{1}{\Delta z_i} \int _z^{z + \Delta z_i} \, C_i (z) \, \text{d}z.$$ (5) Post-processing for grain-size distribution. The fraction fi, l of each grain-size class i in the lth layer in the sediment column is   $$f_{i, l} = \frac{1}{C_0} \frac{\overline{C}_{i, l} \, \Delta z_{i, l}}{\Delta \eta _l}.$$ (6) The thickness of sediment layers Δηl and the fraction fi, l for each grain-size class in each layer are used to produce the plots presented in Figs 1(c) and 2, which is the final output of the forward problem. Figure 2. View largeDownload slide Hypothetical 0.2 m thick tsunami deposit for verification cases: vertical grading in grain-size distributions (blue line) and mean grain-size (red line) for sediment intervals. The grain-size distribution of the entire tsunami deposit is a log-norm distribution. Figure 2. View largeDownload slide Hypothetical 0.2 m thick tsunami deposit for verification cases: vertical grading in grain-size distributions (blue line) and mean grain-size (red line) for sediment intervals. The grain-size distribution of the entire tsunami deposit is a log-norm distribution. The algorithm above for the forward problem produces some key information, which is the time stamp of tsunami deposit layers and the sediment flux at each time step. Specifically, the time, at which the sediment layer l finished the deposition at time l Δt, can be considered the time stamp on the sediment layer (see Fig. 1b). Admittedly, the sediment cores obtained in the field do not come with time stamps. However, by comparing sediment cores obtained at several locations with cross-shore offsets, it is possible to infer the time stamps and thus the sediment fluxes at discretized time intervals (Dearing et al. 1981). To facilitate the filtering procedure to be used in the inversion, we adopt an alternative approach of computing tsunami deposit thickness by considering the time sequence of the sedimentation process. Specifically, in contrast to eq. (4), the deposition thickness can be computed from the average sediment flux ζi, l of grain-size class i at time step l when the lth layer of the sediment deposit is formed. That is,   \begin{eqnarray} \Delta \eta _l = \frac{1}{C_0} \left( \sum _{i = 1}^{n} \zeta _{i, l} \, \Delta t \right), \quad \textrm{with} \end{eqnarray} (7a)  \begin{eqnarray} \zeta _{i, l} & =& \frac{\overline{C}_{i, l} \, \Delta z_{i}}{\Delta t}, \quad \textrm{or equivalently,} \nonumber\\ \zeta _{i, l} & =& \overline{C}_{i, l} \, w_i \quad \textrm{in which} \quad w_i = \frac{\Delta z_{i}}{\Delta t} . \end{eqnarray} (7b) Note that the same subscript l that is used above as the indices of the water column layer (Figs 1a and b) and sediment layer (Fig. 1c) is used to denote the time step index here. This choice of notation is justified by the assumption that the lth sediment layer is formed by the deposition of all sediment grain classes in the lth water column layer at time step l. Simply substituting eq. (7b) to eq. (7a) yields eq. (4), and thus the two formulations in eqs (4) and (7) are equivalent. By adopting the flux-based formulation, we are modelling the sediment deposition process as the evolution of a dynamical system with the sediment flux ζi for each grain-size class i as the system state. Based on the assumptions from eq. (7b), it can be seen that for any grain-size class i the state ζi(t) is uniquely determined by the concentration profile Ci(z) and the settling velocity wi, which in turn depends on the tsunami characteristics (shear velocity u* and flow depth h) and the grain-size ϕi. Therefore, the forward model $$\mathcal {F}$$ is thus formulated as to compute sediment deposition flux ζi(t) from known shear velocity u* and flow depth h, that is, $$\mathcal {F}: (u_*, h) \mapsto \zeta _{i} (t)$$. The sediment thickness and grain-size distribution are considered auxiliary quantities obtained by post-processing the time-series of the system state ζi(t). 2.2 Formulation of the inverse problem and solution algorithm The objective of the inverse problem is to infer the tsunami characteristics (depth-averaged flow velocity and flow depth) from tsunami deposit. In the formulation of the forward problem above, the flow velocity profile and the depth-averaged velocity are parameterized by the shear velocity as in eq. (1), and the sediment flux is the state variable of the system. Therefore, the inverse problem is recast as inferring the shear velocity and flow depth from the sediment flux. The sediment flux, which is the input to the inverse problem, can be obtained by analysing the tsunami deposit from the field. Specifically, when tsunami deposit samples are cored, they are first divided to a number of layers and to obtain the time stamp for each layer by utilizing the spatial information of the samples. Particle-size analysis is then performed on each layer, that is, by using sieve analysis or other sedimentation techniques (Barth 1984), which leads to the average sediment flux ζi at a few discrete time steps. The inverse problem needs to be formulated so that shear velocity u* and flow depth h can be inferred from the sediment flux. As such, we consider u* and h the unobservable parameters of the dynamical system $$\mathcal {F}(u_*, h; \zeta _i (t))$$. They will be inferred from observations of the system state, that is, the sediment flux ζi(t). The inversion of shear velocity and flow depth from observed sediment flux is challenging for at least two reasons. First, the observation is inevitably sparse and noisy, because the sediment core can only be divided to a few layers to ensure each layer has enough sediment mass, and the measurement has large errors. Second, the forward model describing the sedimentation process is based on high simplified assumptions and thus does not faithfully represent the exact system dynamics. In this work we use the Ensemble Kalman Filtering (EnKF) to perform the inversion (Evensen 2003, 2009; Iglesias et al. 2013), which is widely used in data assimilations, particularly in numerical weather forecasting. When used to solve the tsunami deposit inversion problem, the system state is first augmented to include both the physical state ζi(t), which are observable, and parameters u* and h, which are unobservable (from the sediment core) and are to be inferred. The augmented system state x(t) is written as a vector formed by stacking the unknown parameters and the sediment flux ζi(t):   $$\mathbf {x} = [\zeta _1, \ldots , \zeta _n, u_*, h]^{\prime },$$ (8)in which ΄ indicates vector transpose. Given the prior distributions for parameters (u* and h) to be inferred and the covariance matrix R of the sediment flux observations $$\zeta _{\rm i}^{\rm obs}$$, the inversion algorithm proceeds as follows: Sampling of prior distribution. From the prior distributions of the parameters, M samples are drawn. Each sample consists of a combination of values for u* and h. Propagation. The sediment fluxes $$\hat{\zeta }_{i}$$ for all grain-size classes are computed from eq. (7b) by using the updated parameters u* and h from the previous analysis step (or from the initial sampling if this is the first propagation step). The propagation is performed for ΔN time steps, where ΔN is the number of time steps in the time interval ΔT between two consecutive data assimilation operations. The $$\hat{\cdot }$$ indicates predicted quantities that will be corrected in the analysis step below. The propagation is performed for each sample in the ensemble, leading to the propagated ensemble $${\lbrace \hat{\mathbf {x}}_j\rbrace _{j = 1}^M}$$. Each sample $$\hat{\mathbf {x}}_j$$ is a vector containing a realization of shear velocity and flow depth as well as the corresponding sediment flux (see eq. 8). The mean $$\bar{\mathbf {x}}$$ and covariance P of the propagated ensemble are computed (see eq. A1b in Appendix A). Analysis/correction. The computed sediment fluxes $$\hat{\zeta }_{i}$$ for all grain-size classes are compared with observations $$\zeta _{i}^{\text{obs}}$$ corresponding to the current time step l. The ensemble covariance P and the error covariance R are used to compute the Kalman gain matrix K. Each sample is corrected as follows:   $$\mathbf {x}_j = \hat{\mathbf {x}}_j + \mathbf {K} ({\boldsymbol{\zeta }_j} - \mathbf {H} \hat{\mathbf {x}}_j),$$ (9)where superscript xj is the corrected system state; $$\boldsymbol{\zeta } = [\zeta _1, \ldots , \zeta _n]^{\prime }$$ are the sediment fluxes, the part of the system state vector that can be observed; H is the observation matrix. After the correction, the analysed state contains updated fluxes and parameters.  It should be remarked that: (1) the analysis scheme above suggest that the corrected state (i.e. the analysis) is a linear combination of the prediction and observations, with the Kalman gain matrix K being the weight of the observations; (2) the observation matrix $$\mathbf {H}: \mathbb {R}^{n+2} \mapsto \mathbb {R}^n$$ has a size of n × (n + 2), which maps a vector in the n + 2 dimensional state space to a vector in the n-dimensional observation space. The first n columns of H are ones and the last two columns are zeros, indicating that the parameters are not observed. Repeat propagation and analysis steps 2–3 for next data assimilation time t + ΔT to incorporate next observation until all observations are assimilated. The EnKF algorithm for inferring tsunami characteristics from tsunami deposit is summarized in Fig. 3. The detailed algorithm is presented in Appendix A. Figure 3. View largeDownload slide Schematic of the inverse modelling approach based on the state augmentation. The system state is augmented to include both the physical state (sediment flux ζi(t)) and the parameters to be inferred (shear velocity u* and flow depth h). An ensemble representative of the augmented state is propagated via the forward model. The propagated ensemble is then updated in the analysis process based on the observation data. The updated state (physical quantities and model parameters) is then propagated to the next time step, and this loop continues until all observations are assimilated. Figure 3. View largeDownload slide Schematic of the inverse modelling approach based on the state augmentation. The system state is augmented to include both the physical state (sediment flux ζi(t)) and the parameters to be inferred (shear velocity u* and flow depth h). An ensemble representative of the augmented state is propagated via the forward model. The propagated ensemble is then updated in the analysis process based on the observation data. The updated state (physical quantities and model parameters) is then propagated to the next time step, and this loop continues until all observations are assimilated. It can be seen that the observations arrive sequentially in the EnKF data assimilation procedure above, which is typical for applications such as numerical weather forecasting. In this work we formulate the tsunami inversion problem with a sequential streaming of data to take advantage of the widely used EnKF algorithm. In this method, the filtering procedure finds an optimal correction at each assimilation step based on the latest observation and the latest prediction ensemble (see eq. 9). We note that it can be preferable to use another algorithm that is closely related to EnKF, namely the Ensemble Kalman Smoothing method, which finds optimal correction in light of all past observations (Evensen & Van Leeuwen 2000). This method will be investigated in future work. 3 COMPUTATIONAL SETUP OF CASES While EnKF-based Bayesian inferences have been widely used in other communities of geosciences, the present contribution represents the first attempt in using it for tsunami inversion. To establish confidence in the proposed framework for tsunami inversion based on sediment deposits, we construct a series of verification cases with synthetic truths to assess the performance of the proposed inversion scheme. Furthermore, we test the proposed framework by using a set of field data of the real tsunami deposits from the 2006 South Java Tsunami (sections Bunton, sample JTR 6 (Spiske et al. 2010)). A synthetic case can be generated by running the forward model described in Section 2.1 on given a set of tsunami and sediment characteristics (i.e. shear velocity $$\tilde{u}_*$$ and flow depth $$\tilde{h}$$, the range of particle sizes $$\tilde{\phi }_i$$, where $$\tilde{\cdot }$$ indicates synthetic truths). The corresponding tsunami deposits including the thickness and the grain-size distribution as shown in Fig. 2 can thus be obtained. Subsequently, the sediment flux can be obtained by post-processing the tsunami deposit information with the procedure explained in Section 2.2. In fact, for the synthetic cases the sediment fluxes are part of the forward model simulation output, and thus a post-processing procedure is not required. Synthetic observations are then generated by adding Gaussian random noises of standard deviation σi to the true sediment fluxes, which represent the measurement and sampling errors in sediment coring operations in the field. The decision to use synthetic cases instead of realistic cases to verify the proposed method is justified by the fact that the true tsunami characteristics corresponding to actual field samples are usually unknown, which make them not ideal for verification purposes. Even if the truth for tsunami characteristics were known, for example, when the flow speed and flow depth were measured from independent sources, it would be difficult to distinguish the errors due to the forward model inadequacy and those due to the inversion procedure. Therefore, using the synthetic cases allow us to focus on assessing the performance of the proposed inversion procedure. The merits of the inversion scheme can be assessed by its capability to reproduce the synthetic truths $$\tilde{u}_*$$ and $$\tilde{h}$$, to which the inversion scheme is blind. With the established confidence from the verification cases, field data of deposits from the 2006 Java tsunami event are used as the observations to demonstrate the capability of the proposed framework in realistic applications. For simplicity, we assume that the shear velocity u* and flow depth h, which are to be inferred, are assumed time-invariant. This is because the temporal gradients in mean flow speed during much of the tsunami inundation landward are small. Near the peak of the tsunami, as the tsunami periods are very long, the time change in flow speed is small and the flow can be approximated as steady. However, the proposed scheme can be straightforwardly extended to time-varying shear velocity u*(t) by incorporating iterations in each time step. As a proof-of-concept study, the tsunami deposit at only one onshore location is utilized. For the problem where sediment deposits are available at multiple locations, the proposed scheme also can be applied by expanding the state vector to include fluxes at different locations. In this case, in addition to the local tsunami flow quantities inferred in this study, other information also can be obtained, for example, wavelength, period, number of waves and duration of the tsunami. Two verification cases with synthetic observations of increasing difficulty levels are defined. In case 1, the sediment has a single grain size ϕ = 2.0, and the only unknown parameter to be inferred is the shear velocity u* while the flow depth h is given. In case 2, which is more challenging, the sediment has a log-normal grain-size distribution with 0 < ϕ < 3.25. Both u* and h are unknown and are to be inferred. The synthetic truths for u* and h, the prior ensemble means ($$\bar{u}_*^{0}$$ for both cases and $$\bar{h}^{0}$$ for case 2 only) are the same for both synthetic cases. For the realistic case with field data (referred to as case 3), the mean grain size, largest grain size and smallest grain size are ϕ = 2.5, ϕ = 0.0 and ϕ = 5.25, respectively. The mean values of prior ensembles are estimated by TSUFLIND (Tang & Weiss 2015). All other parameters, including the forward model time step Δt, the data assimilation interval ΔN, and the number of samples M are the same for all 3 cases. The parameters are summarized in Table 1. The prior ensembles for both parameters are uniformly distributed in the ranges specified in Table 1, which is representative of the lack of knowledge on the quantities to be inferred in practical tsunami inversions. Since the truths of the quantities to be inferred, $$\tilde{u}_*$$ and $$\tilde{h}$$, are unknown when performing the inversion, the mean values of prior ensemble $$\bar{u}_*^0$$ and $$\bar{h}^0$$ for both quantities are likely to have biases compared to the synthetic truth. This is reflected in the choice of ensemble mean as shown in Table 1. The time step Δt = 0.5 s is chosen for all cases, and the observations of sediment fluxes are assimilated every ΔN = 10 time steps. The error covariance matrix is chosen as $$\mathbf {R} = \textrm{diag}[\sigma _1^2, \ldots , \sigma _n^2]$$ with σi being the standard deviation of the white noise added to the true sediment fluxes for each grain size i. The choice of standard deviation of noises shown in Table 1 is based on the error model detailed in the companion paper (Tang et al. 2016). Table 1. Physical and computational parameters for test cases. Parameters  Case 1  Case 2  Case 3  Inferred parameters  u*  u* and h  u* and h  Grain size ϕ ( = −log2[D/Dref])  2.0  0 < ϕ < 3.25  0 < ϕ < 5.25  Δt  0.5 s  Data assimilation interval ΔN  10  Number of samples M  1000  std. σi of observation errora  ε + 0.01$$\tilde{\zeta }_i$$  ε + 0.05$$\tilde{\zeta }_i$$  Synthetic truth $$\tilde{u}_*$$ for u*  0.5 m s−1  −  Range of prior ensemble for u*b  0.4–1.2 m s−1  0.15–0.45 m s−1  Mean $$\bar{u}_*^{0}$$ of prior ensemble for u*  0.8 m s−1  0.3 m s−1  Synthetic truth $$\tilde{h}$$ for h  3 m  3 m  —  Range of prior ensemble for h  —  2.5–7.5 m  6–10 m  Mean $$\bar{h}^0$$ of prior ensemble for h  —  5 m  8 m  Parameters  Case 1  Case 2  Case 3  Inferred parameters  u*  u* and h  u* and h  Grain size ϕ ( = −log2[D/Dref])  2.0  0 < ϕ < 3.25  0 < ϕ < 5.25  Δt  0.5 s  Data assimilation interval ΔN  10  Number of samples M  1000  std. σi of observation errora  ε + 0.01$$\tilde{\zeta }_i$$  ε + 0.05$$\tilde{\zeta }_i$$  Synthetic truth $$\tilde{u}_*$$ for u*  0.5 m s−1  −  Range of prior ensemble for u*b  0.4–1.2 m s−1  0.15–0.45 m s−1  Mean $$\bar{u}_*^{0}$$ of prior ensemble for u*  0.8 m s−1  0.3 m s−1  Synthetic truth $$\tilde{h}$$ for h  3 m  3 m  —  Range of prior ensemble for h  —  2.5–7.5 m  6–10 m  Mean $$\bar{h}^0$$ of prior ensemble for h  —  5 m  8 m  astd. denotes standard deviation; the standard deviation of noise added to the observation depend on the grain-size class; ε = 0.125 mm3 s−1 is a fixed constant, and $$\tilde{\zeta }$$ is the synthetic truth of the sediment flux, which depends on the grain size and vary with time. bA smaller range of prior ensemble for u* from 0.7 to 0.9 m s−1 is also investigated in case 1. View Large 4 RESULTS AND DISCUSSION 4.1 Case 1: a single grain-size class and one unknown parameter First, a case with simple conditions is studied where all tsunami deposits are in the same grain-size class and the shear velocity u* is the only unknown parameter to be inferred. Although this highly simplified scenario is likely to be uncommon in reality, a case with controlled conditions is able to provide a reasonable initial assessment of the proposed method. The time-series ζ(t) of the sediment flux, which is the physical state of the system, is presented in Fig. 4. The sediment fluxes for all 1000 samples are shown along with the synthetic truth corresponding to the true shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. It can be seen that for all samples (i.e. prior guesses of shear velocities) and the synthetic truth show a general trend that the magnitude of sediment flux decreases with time, which is typical of a Rouse sediment concentration profile assumed in the forward model as in eq. (3). Figure 4. View largeDownload slide Time-series of the sediment fluxes ζ(t), which is the physical state of the system, for a single grain-size class ϕ = 2.0 during the sedimentation process. The green (light grey) lines show M = 1000 sediment flux samples, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth of the shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. (The 2σo observation error bar is also plotted in the figure.) The synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. Figure 4. View largeDownload slide Time-series of the sediment fluxes ζ(t), which is the physical state of the system, for a single grain-size class ϕ = 2.0 during the sedimentation process. The green (light grey) lines show M = 1000 sediment flux samples, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth of the shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. (The 2σo observation error bar is also plotted in the figure.) The synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. The convergence history of the inferred parameter, the shear velocity u*, is shown in Fig. 5(a). By assimilating the observation data as shown in Fig. 4, the shear velocity of all samples and the sample mean gradually converge to the synthetic truth $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$, regardless of their values in the initial ensemble. The range of sample scattering for u*, which is 0.8 m s−1 (with a range from 0.4 to 1.2 m s−1) in the prior ensemble, shrinks to 0.04 m s−1 at the end of the inference. Since the scattering of samples in EnKF-based inference is indicative of the uncertainty in the inferred results, the reduction of sample scattering represents the reduction of uncertainties and correspondingly the increase of confidence in the quantity to be inferred. Figure 5. View largeDownload slide Convergence history of the inferred parameter, the shear velocity u*, for the single grain-size case. The plot shows (a) the convergence of the ensemble and the sample mean as well as (b) the evolution of the probability density function of the shear velocity u* in the inversion process. (c) The corresponding cumulative distribution function for the final inference results. Figure 5. View largeDownload slide Convergence history of the inferred parameter, the shear velocity u*, for the single grain-size case. The plot shows (a) the convergence of the ensemble and the sample mean as well as (b) the evolution of the probability density function of the shear velocity u* in the inversion process. (c) The corresponding cumulative distribution function for the final inference results. The probability density function of the shear velocity u* corresponding to the ensemble at time steps 0 (initial), 30, 50, 100 and 200 (final) are presented in Fig. 5(b), highlighting the evolution of uncertainty in the inferred quantity. Note that the changes of shear velocity u* at different time steps does not mean that u* is time-varying. Rather, the shear velocity is updated because the inference is formulated as a filtering problem. That is, the velocity ensemble is updated whenever new ‘observations’ are streamed in. We can see that the shear velocity u* is equally likely between 0.4 and 1.2 m s−1 because of the non-informative, uniform prior distribution that is chosen. At time step 50 (after five observations assimilated) the samples are scattered between approximately 0.47 and 0.54 m s−1. Moreover, the bias of the ensemble mean compared to the truth has been largely corrected. As the data assimilation continues, the distribution of u* continues to shrink and concentrate towards the truth ($$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$). The final, posterior distribution of u* at time step 200 is Gaussian based on the plot shown in Fig. 5(b). While this could well be the true distribution, it should be interpreted with caution, since it is well known that the EnKF procedure is derived under Gaussian assumptions (Evensen 2009). The cumulative distribution function of the posterior distribution is presented in Fig. 5(c). It shows that 95 per cent of ensembles have shear velocities u* between 0.490 and 0.508 m s−1. In other words, the inference suggest that there is 95 per cent probability that the shear velocity u* falls within the range above. This credible interval is based on both the prior distribution and the observation data. The uncertainties represented as credible intervals that are obtained in the inversion process is based on the Bayesian inference theory, which clearly distinguishes the present method with existing schemes for tsunami inversion. The evolution of the inference uncertainty over time as shown in Figs 5(a) and (b) illustrates the Bayesian nature of the EnKF-based data assimilation method. In the inference, one starts with a subjective prior distribution on the unknown parameters, which is usually non-informative (see the wide distribution at time step 0) and is represented with an ensemble. As observation data are assimilated, the distribution becomes narrower. Meanwhile, the importance of the prior distribution diminishes as more and more data are assimilated. The remaining uncertainties in the inference results stems from the uncertainties in the observation data, which are inevitable in field measurement and are represented with random noise in this study. The proposed approach provides a statistical pathway to consider the uncertainties from various sources, for example, model inputs, numerical model, and observation data. However, the estimated uncertainties are approximate, because the EnKF is an approximate Bayesian approach based on Gaussian assumption. Further investigations are still needed for assessing the accuracy of the estimated uncertainties. In order to demonstrate the robustness of the inversion procedure with respect to the specified prior distribution, we also tested a prior ensemble (i.e. initial guesses of the parameter to be inferred) with u* in the range of 0.7 and 0.9 m s−1. In contrast to the case presented above, this range does not cover the truth $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. Even with this overly confident prior distribution, almost identical inversion results were obtained. In fact, the differences between the convergence history of u* disappear after a few assimilation steps. As such, detailed results for this case are omitted here for brevity. The relative inference error for the shear velocity is presented in Fig. 6, which is defined as the L2 norm $$| (\bar{u}_* - \tilde{u}_*) / \tilde{u}_*|$$ of the difference between the ensemble mean $$\bar{u}_*$$ and the synthetic truth $$\tilde{u}_*$$. It can be seen that the inference error decreases dramatically within the first few data assimilations steps, from 0.6 for the initial prior ensemble (time step 0) to 0.025 after five observations are assimilated (at time step 50). This finding is consistent with the decrease of the ensemble scattering (indicating inference uncertainties) as shown in Fig. 5(a). Figure 6. View largeDownload slide The L2 norm of the inference error for the shear velocity, defined as the difference $$| (\bar{u}_* - \tilde{u}_*) / \tilde{u}_* |$$ between the ensemble mean $$\bar{u}_*$$ and the synthetic truth $$\tilde{u}_*$$. Figure 6. View largeDownload slide The L2 norm of the inference error for the shear velocity, defined as the difference $$| (\bar{u}_* - \tilde{u}_*) / \tilde{u}_* |$$ between the ensemble mean $$\bar{u}_*$$ and the synthetic truth $$\tilde{u}_*$$. 4.2 Case 2: multiple grain-size classes and two unknown parameters A tsunami inversion problem with wide grain-size distribution is studied to demonstrate the capability of the proposed method in a more realistic scenario. Both the shear velocity u* and the flow depth h are unknown and the two parameters must be inferred simultaneously. The input to the tsunami inversion procedure is the analysed results of tsunami deposit column as shown in Fig. 2. The grain-size distribution along the depth of the sediment column is obtained from a forward simulation with shear velocity and flow depth as specified in Table 1. The particle sizes range from 0 to 3.25 in ϕ scale, or equivalently from 1 mm to 0.105 mm. The range is divided into 10 grain-size classes, ϕi with i = 1, …, 10, equally spaced in ϕ scale. The superscripts are indices of the grain-size classes. The time-series of sediment fluxes for six representative grain-size classes are shown in Fig. 7, that is, ϕ2 = 0.65, ϕ3 = 0.975, ϕ4 = 1.3, ϕ5 = 1.625, ϕ6 = 1.95, and ϕ7 = 2.275. As in the case with a single grain-size class, the sediment fluxes ζi(t) for all grain-size classes decrease as the deposition proceeds, and the uncertainties represented by the ensemble scattering are reduced as the observations are assimilated. Moreover, the sediment fluxes for the coarse grains (corresponding to smaller ϕ values, for example, ϕ2 = 0.65, D = 0.637 mm as shown in Fig. 7a) decrease more rapidly than those for the finer grains with larger ϕ values. It can be seen from Fig. 7(a) that the coarsest (ϕ2 = 0.65) among the six grain-size classes completed sedimentation in 60 time steps (i.e. 30 s since Δt = 0.5 s). In contrast, the finest grain-size class (ϕ7 = 2.275, D = 0.105 mm) among the six has a more uniform sediment flux throughout the entire sedimentation process. Again, this is attributed to the assumed Rouse profile. It is also noted that the relative uncertainties (ensemble scattering) in the sediment fluxes at the end of the inversion are larger for the fine grains than for the coarse grains. This is due to the random noises added to the true sediment flux when generating synthetic observations. As shown in Table 1, the standard deviation σi of the noise has a fixed component (ε) and a component $$0.01 \tilde{\eta }_i$$ proportional to the truth. Since the sediment flux for the finer grains are smaller in absolute value, as can be seen from the different vertical axis ranges in panels (a)–(f), the relative observation uncertainties are thus larger for the sediment fluxes of the finer grain sizes. Consequently, the uncertainties in ensemble directly reflect the uncertainties in the observations. Figure 7. View largeDownload slide Time-series of sediment fluxes ζ(t) for six (among a total of 10) grain-size classes during the sedimentation: (a) ϕ2 = 0.65, (b) ϕ3 = 0.975, (c) ϕ4 = 1.3, (d) ϕ5 = 1.625, (e) ϕ6 = 1.95 and (f) ϕ7 = 2.275. The green (light grey) lines show M = 1000 samples of sediment flux time-series, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth, that is, shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$ and flow depth h = 3 m. synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. Figure 7. View largeDownload slide Time-series of sediment fluxes ζ(t) for six (among a total of 10) grain-size classes during the sedimentation: (a) ϕ2 = 0.65, (b) ϕ3 = 0.975, (c) ϕ4 = 1.3, (d) ϕ5 = 1.625, (e) ϕ6 = 1.95 and (f) ϕ7 = 2.275. The green (light grey) lines show M = 1000 samples of sediment flux time-series, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth, that is, shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$ and flow depth h = 3 m. synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. The convergence history of the two parameters to be inferred, shear velocity and flow depth, are presented in Fig. 8. It can be seen from Figs 8(a) and (b) that the sample means for both parameters (red dotted lines) converge to the synthetic truths (horizontal black dashed lines) within 50 time steps, that is, after five observations are assimilated, although the prior ensemble means deviate significantly from the truths at time step 0. A more comprehensive parameter study (Tang et al. 2016) also shows that the prediction results are quite robust with different prior ensembles, clearly suggesting that the system is rather well-posed. Moreover, the uncertainties as indicated by the ensemble scattering are reduced significantly. We note two points here. First, comparison between Figs 8(a) and (b) suggests that the shear velocity converges to the truth faster than the flow depth does. Since EnKF inversion procedure depends on correlation to make inferences, this seems to indicate that the sediment fluxes, which are the observed physical state, are more sensitive to the shear velocity than that to the flow depth. Second, the convergence of shear velocity in the multiple grain-size case as shown in Fig. 8(b) is slightly faster than that in the single grain-size class case in Fig. 5(a). A major difference between the two cases is that the state in the multiple grain-size case is the sediment flux for all grain-size classes, that is, a vector of size 10 with one component corresponding to each grain-size class. Accordingly, an observation in the multiple grain-size case is also a vector of size 10. This is much more information than in the single grain-size class case, where the state and observations are only scalars. More information in the observation leads to more accurate inference. Intuitively, one would expect the multiple grain-size class case to be more difficult, particularly considering the fact that there are two unknown parameters. Indeed, this apparently counterintuitive finding suggests that the setup here is not entirely realistic in that we used similar levels of relative error for both the single and multiple grain-size cases. In the field, when a sediment core of a given size is divided to yield grain-size distributions for multiple grain-size classes, the relative measurement error in the obtained grain-size distribution would inevitably increase compared to a single grain-size class case. Therefore, it would have been more realistic to use a larger observation error for the multiple grain-size case. The effect of observation errors on the inference results and the optimization of number of grain-size classes from a given sediment core are topics of future work. The decrease of uncertainties in the inferred parameters can be clearly seen in the probability density functions shown in Figs 8(c) and (d). For both parameters, the probability density functions shrink continuously towards the truth, indicating the gain of confidence as more observations are assimilated. Overall, in this more realistic and challenging test case, we found that the proposed method has equally satisfactory performance compared to that in the single grain-size case. Figure 8. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. Panels (c) and (d) show the evolution of probability density functions for the shear velocity and the flow depth, respectively, during the inversion process. In panel (c), a zoomed-in view is presented as inset to show the detailed results in the first few steps. Figure 8. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. Panels (c) and (d) show the evolution of probability density functions for the shear velocity and the flow depth, respectively, during the inversion process. In panel (c), a zoomed-in view is presented as inset to show the detailed results in the first few steps. 4.3 Case 3: realistic application For this case, a deposit column with a 0.12 m thickness from the 2006 South Java Tsunami is used as the observation (Spiske et al. 2010). We simultaneously infer both unknown parameters, shear velocity u* and flow depth h. The particle sizes range from 0 to 5.25 in ϕ scale, or equivalently from 1 mm to 0.0263 mm in diameter. This range was divided into 15 grain-size classes ϕi with i = 1, …, 15, equally spaced in ϕ scale. The convergence history of shear velocity and flow depth is presented in Fig. 9. We can see that the scattering of both parameters is reduced. Within 200 steps, flow depths in most of the samples converges to the range of 6 to 8 m from initial range of 4–12 m. Meanwhile, shear velocities in the samples converges to a small interval around 0.235 m s−1. Similar to what has been shown in synthetic case 2, the shear velocity converges faster than the flow depth does. The scattering of shear velocity samples is significantly reduced after 200 steps, while the flow depth samples are still largely scattered, indicating a relatively large posterior inference uncertainty. This is because the sediment fluxes are not sensitive to the flow depth, especially when the flow depth is large. Specifically, the Rouse profiles depicted in Fig. 1 shows that the sediment concentration is low in the upper region of the water column. When the flow is deep, the change of its depth does not significantly affects the sediment concentration in the upper layers of the water, and thus the sediment fluxes are not sensitive to the flow depth. The probability density functions of the flow depth and shear velocity in the last time step (posterior) are shown in Figs 9(c) and (d), respectively. The posterior distributions of flow depth and shear velocity are approximately Gaussian with mean values of 7.0 m and 0.236 m s−1, respectively. Figure 9. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h, for case 3. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. The green (light grey) lines denote samples and the red dashed lines denote sample means. Panels (c) and (d) show the probability density functions for the inferred shear velocity and the flow depth, respectively, at the final time step (200). The sample mean is denoted with red dashed line. Figure 9. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h, for case 3. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. The green (light grey) lines denote samples and the red dashed lines denote sample means. Panels (c) and (d) show the probability density functions for the inferred shear velocity and the flow depth, respectively, at the final time step (200). The sample mean is denoted with red dashed line. Since there are no ground truths of flow depth and shear velocity available for validation in this realistic case, we have to validate the inference results indirectly. Specifically, the depth-averaged velocity is computed by using the inferred shear velocity and flow depth based on eq. (1) and is compared with that obtained by Spiske et al. (2010) with the inversion model TsuSedMod. The comparison is shown in Fig. 10(a). We can see that the depth-averaged velocity from our inference is distributed from 7.8 to 8.2 m s−1 with a mean value of 8.0 m s−1. This posterior credible interval covers the inference result (8.06 m s−1) of Spiske et al. (2010), although the sample mean from our inference is slightly smaller. A possible explanation of the discrepancy in the two mean estimates is that Spiske used the entire deposit for the TsuSedMod model to perform the inversion, but the sediments in the bottom layers of the deposit may be transported by bed load. Since the TsuSedMod model assumes the sediments is formed only by suspended load (Jaffe & Gelfenbuam 2007), the velocity obtained by Spiske may have been overestimated. In order to compare the inversion results with the field data, we perform forward simulations of sedimentation with the inferred parameters, that is, mean values of posterior flow depth and shear velocity. Fig. 10(b) compares the grain-size distribution obtained based on the inferred parameters with that from the field data. It can be seen that the inferred grain-size distribution agrees very well with the field data, which shows a satisfactory performance of the proposed inversion scheme in the realistic case. However, it should be noted that this indirect validation of the inference does not guarantee the inferred quantities (i.e. flow depth and shear velocity) are accurate, since the model error of the forward model affects the accuracy of the inversion. Using a forward model of higher fidelity and accuracy can lead to further improved inferences. Finally, we briefly comment on the computational cost of the proposed inversion framework compared to the previous tsunami inversion schemes. Previously developed inversion schemes all used the optimization-based target shooting. Their computational costs depend on the choice of optimization scheme. Although gradient-based optimization methods are more efficient, obtaining the gradient information is very difficult for complex nonlinear physical system. Without the gradient information and a reduced model, the brutal-force data fitting scheme involves a large number of forward-model evaluations, which might be much more computationally expensive. In particular, the computational costs could exponentially increase when the dimension of parameter space increases. The currently adopted EnKF method is a Monte Carlo based, derivative-free inference approach. It is easy to implement and without the need of deriving adjoint equations. In this paper, 1000 samples (see Table 1) are used, and thus 1000 evaluations of the forward model are involved. However, the ensemble size can be significantly reduced based on a comprehensive parameter study (Tang et al. 2016), which shows that only 40 samples can achieve the same level of inference accuracy. Note that the forward model evaluations can be done in parallel, that is, 40 forward sedimentation simulations were run simultaneously on 40 CPU cores. As a result, the wall time of the inference is approximately the same as the forward sedimentation simulation. Figure 10. View largeDownload slide Comparison of the two inferred quantities, the depth-averaged velocity and the grain-size distribution, with the previous inference of Spiske et al. (2010) (depth-averaged velocity) and field data (grain-size distribution), respectively. Panel (a) shows the comparison of depth-averaged velocity calculated based on the inferred shear velocity u* and water depth h from the proposed inversion scheme with that of TsuSedMod results (Spiske et al. 2010). Panel (b) compares the grain-size distribution obtained from the forward simulation with the inferred parameter u* and h with the field data. Figure 10. View largeDownload slide Comparison of the two inferred quantities, the depth-averaged velocity and the grain-size distribution, with the previous inference of Spiske et al. (2010) (depth-averaged velocity) and field data (grain-size distribution), respectively. Panel (a) shows the comparison of depth-averaged velocity calculated based on the inferred shear velocity u* and water depth h from the proposed inversion scheme with that of TsuSedMod results (Spiske et al. 2010). Panel (b) compares the grain-size distribution obtained from the forward simulation with the inferred parameter u* and h with the field data. 5 CONCLUSION In this work, we proposed a novel inference scheme based on EnKF to infer tsunami flow speed and flow depth from tsunami deposits. In contrast to traditional data assimilation methods using EnKF, an important novelty of the current work is that the system state is augmented to include both the physical variables (sediment fluxes) and the unknown parameters to be inferred, i.e. shear velocity and flow depth. Consequently, the unknown tsunami flow characteristics are inferred in a rigorous Bayesian way with quantified uncertainties, which clearly distinguishes our method with existing tsunami inversion schemes. Two test cases with synthetic observation data are used to verify the proposed inversion scheme. Numerical results show that the tsunami characteristics inferred from the tsunami deposit information have favourable agreement with the synthetic truths, which demonstrated the merits of the proposed tsunami inversion scheme. A realistic case with field data is studied, and the results are compared to those obtained with a previous inversion model TsuSedMod and are validated by the field data. The comparisons indicate a satisfactory performance of the proposed inference scheme on realistic applications. It should be noted that current work focuses on inferring tsunami flow characteristics at a single location, which are affected by both large-scale tsunami wave characteristics and local geological conditions, for example, surface roughness and bathymetry. To further consider the local geological features and to infer the large-scale wave information, the unknown geological parameters also can be augmented into the state to be inferred, and a more comprehensive 2-D sediment transport model needs to be adopted to assimilate deposits from multiple locations simultaneously. In summary, the proposed scheme is a promising tool for the study of palaeotsunamis in the interrogation of sediment records to infer tsunami characteristics. ACKNOWLEDGEMENTS JXW gratefully acknowledge partial funding of graduate research assistantship from the Institute for Critical Technology and Applied Science (ICTAS, grant number 175258) in this effort. The authors would like to acknowledge the two reviewers, Prof David Tappin and Dr Ruoqian Wang, and the editor, Prof Paul Martin Mai, whose comments helped improving the clarity and quality of the manuscript. REFERENCES Barth H.G., 1984. Modern Methods of Particle Size Analysis , vol. 73, John Wiley & Sons. Bourgeois J., Hansen T.A., Wiberg P.L., Kauffman E.G., 1988. A tsunami deposit at the cretaceous-tertiary boundary in Texas, Science , 241( 4865), 567– 570. Google Scholar CrossRef Search ADS PubMed  Caya A., Sun J., Snyder C., 2005. A comparison between the 4DVAR and the ensemble Kalman filter techniques for radar data assimilation, Mon. Weather Rev. , 133( 11), 3081– 3094. Google Scholar CrossRef Search ADS   Dawson A.G., Stewart I., 2007. Tsunami deposits in the geological record, Sedimentary Geol. , 200( 3), 166– 183. Google Scholar CrossRef Search ADS   Dearing J.A., Elner J.K., Happey-Wood C.M., 1981. Recent sediment flux and erosional processes in a Welsh upland lake-catchment based on magnetic susceptibility measurements, Quat. Res. , 16( 3), 356– 372. Google Scholar CrossRef Search ADS   Elsheikh A.H., Pain C.C., Fang F., Gomes J.L.M.A., Navon I.M., 2012. Parameter estimation of subsurface flow models using iterative regularized ensemble Kalman filter, Stochastic Env. Res. Risk Assess. , 27( 4), 877– 897. Google Scholar CrossRef Search ADS   Evensen G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. geophys. Res. , 99( C5), 10 143– 10 162. Google Scholar CrossRef Search ADS   Evensen G., 2003. The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dyn. , 53( 4), 343– 367. Google Scholar CrossRef Search ADS   Evensen G., 2009. Data Assimilation: the Ensemble Kalman Filter , Springer Science & Business Media. Evensen G., Van Leeuwen P.J., 2000. An ensemble Kalman smoother for nonlinear dynamics, Mon. Weather Rev. , 128( 6), 1852– 1867. Google Scholar CrossRef Search ADS   Ferraiuolo G., Pascazio V., Schirinzi G., 2004. Maximum a posteriori estimation of height profiles in InSAR imaging, IEEE Geosci. Remote Sens. Lett. , 1( 2), 66– 70. Google Scholar CrossRef Search ADS   Gelfenbaum G., Smith J.D., 1986. Experimental evaluation of a generalized suspended-sediment transport theory, Shelf Sands and Sandstones. Memoir , 11, 133– 144. Iglesias M.A., Law K.J., Stuart A.M., 2013. Ensemble Kalman methods for inverse problems, Inverse Probl. , 29( 4), 045001. Google Scholar CrossRef Search ADS   Jaffe B.E., Gelfenbuam G., 2007. A simple model for calculating tsunami flow speed from tsunami deposits, Sedimentary Geol. , 200( 3), 347– 361. Google Scholar CrossRef Search ADS   Jaffe B.E. et al.  , 2011. Flow speed estimated by inverse modeling of sandy sediment deposited by the 29 September 2009 tsunami near Satitoa, East Upolu, Samoa, Earth-Sci. Rev. , 107( 1–2), 23– 37. Google Scholar CrossRef Search ADS   Jaffe B.E., Goto K., Sugawara D., Richmond B.M., Fujino S., Nishimura Y., 2012. Flow speed estimated by inverse modeling of sandy tsunami deposits: results from the 11 March 2011 tsunami on the coastal plain near the Sendai Airport, Honshu, Japan, Sedimentary Geol. , 282, 90– 109. Google Scholar CrossRef Search ADS   Law K.J., Stuart A.M., 2012. Evaluating data assimilation algorithms, Mon. Weather Rev. , 140( 11), 3757– 3782. Google Scholar CrossRef Search ADS   Madsen O., Wright L., Boon J., Chisholm T., 1993. Wind stress, bed roughness and sediment suspension on the inner shelf during an extreme storm event, Cont. Shelf Res. , 13( 11), 1303– 1324. Google Scholar CrossRef Search ADS   Moore A.L., McAdoo B.G., Ruffman A., 2007. Landward fining from multiple sources in a sand sheet deposited by the 1929 Grand Banks tsunami, Newfoundland, Sedimentary Geol. , 200( 3), 336– 346. Google Scholar CrossRef Search ADS   Nwe T.T., Tokuzo H., 2010. Sediment transport due to flooding from levee breach: a numerical application of the 2004 Niigata flood, Int. J. River Basin Manage. , 8( 1), 3– 14. Google Scholar CrossRef Search ADS   Peters W. et al.  , 2005. An ensemble data assimilation system to estimate CO2 surface fluxes from atmospheric trace gas observations, J geophys. Res. , 110, D24304, doi:10.1029/2005JD006157. Google Scholar CrossRef Search ADS   Richmond B.M. et al.  , 2012. Erosion, deposition and landscape change on the Sendai coastal plain, Japan, resulting from the March 11, 2011 Tohoku-oki tsunami, Sedimentary Geol. , 282, 27– 39. Google Scholar CrossRef Search ADS   Soulsby R.L., Smith D.E., Ruffman A., 2007. Reconstructing tsunami run-up from sedimentary characteristics–a simple mathematical model, in Coastal Sediments'07 , pp. 1075– 1088, eds Kraus N.C., Rosati J.D., American Society of Civil Engineers. Spiske M., Weiss R., Bahlburg H., Roskosch J., Amijaya H., 2010. The TsuSedMod inversion model applied to the deposits of the 2004 Sumatra and 2006 Java tsunami and implications for estimating flow parameters of palaeo-tsunami, Sedimentary Geol. , 224( 14), 29– 37. Google Scholar CrossRef Search ADS   Spiske M., Piepenbreier J., Benavente C., Kunz A., Bahlburg H., Steffahn J., 2013. Historical tsunami deposits in Peru: Sedimentology, inverse modeling and optically stimulated luminescence dating, Quat. Int. , 305, 31– 44. Google Scholar CrossRef Search ADS   Tang H., Weiss R., 2015. A model for TSUnami FLow INversion from deposits (TSUFLIND), Mar. Geol. , 99( C5), 10 143– 10 162. Tang H., Wang J., Weiss R., Xiao H., 2016. TSUFLIND-EnKF: Inversion of tsunami flow depth and flow speed from deposits with quantified uncertainties, Mar. Geol. , doi:10.1016/j.margeo.2016.11.009. Tappin D.R. et al.  , 2014. Did a submarine landslide contribute to the 2011 Tohoku tsunami, Marine Geol. , 357, 344– 361. Google Scholar CrossRef Search ADS   Tuck E.O., Hwang L., 1972. Long wave generation on a sloping beach, J. Fluid Mech. , 51( 03), 449– 461. Google Scholar CrossRef Search ADS   Witter R., Jaffe B., Zhang Y., Priest G., 2012. Reconstructing hydrodynamic flow parameters of the 1700 tsunami at Cannon Beach, Oregon, USA, Nat. Hazards , 63( 1), 223– 240. Google Scholar CrossRef Search ADS   Xiao H., Young Y.L., Prevost J.H., 2010. Hydro- and morpho-dynamic modeling of breaking solitary waves over a fine sand beach. Part II: Numerical simulation, Mar. Geol. , 269( 3), 119– 131. Google Scholar CrossRef Search ADS   Young Y.L., Xiao H., Maddux T.B., 2010. Hyro- and morpho-dynamic modeling of breaking solitary waves over sand beach. Part I: experimental study, Mar. Geol. , 269( 3), 107– 118. Google Scholar CrossRef Search ADS   APPENDIX A: DETAILED ALGORITHM FOR ENSEMBLE KALMAN FILTERING The algorithm of the ensemble Kalman filtering for data assimilation and inverse modelling is summarized below. Given the prior distribution of the parameters to be inferred (shear velocity u* and flow depth h) and sediment flux observations with error covariance matrix R, the following steps are performed: Sampling step. Generate initial ensemble $$\lbrace {\mathbf {x}_j}\rbrace _{j = 1}^M$$ of size M, where the augmented system state is   \begin{equation*} \mathbf {x} = [\zeta _1, \ldots , \zeta _n, u_*, h]^{\prime }. \end{equation*} Prediction step. Propagate the state from current state at time t to the next assimilation step t + ΔT with the forward model TSUFLIND, indicated as $$\mathcal {F}$$,   \begin{eqnarray*} \hat{\mathbf {x}}_j^{(t+\Delta T)} &=& \mathcal {F} [ \mathbf {x}_j^{(t)} ] \end{eqnarray*} in which ΔT = ΔNΔt, indicating that the observation data is assimilated every ΔN time steps. Estimate the mean $$\bar{\mathbf {x}}$$ and covariance P(n + 1) of the ensemble as   \begin{eqnarray} \bar{\mathbf {x}}^{(t+\Delta T)} = \frac{1}{M}\sum _{j=1}^N{\hat{\mathbf {x}}^{(t+\Delta T)}_j} \end{eqnarray} (A1a)  \begin{eqnarray} \mathbf {P}^{(t+\Delta T)} = \frac{1}{M-1} \sum _{j = 1}^{N} {\left( \hat{\mathbf {x}}_j\hat{\mathbf {x}}_j^{\prime } - \bar{\mathbf {x}}\bar{\mathbf {x}}^{\prime } \right)^{(t+\Delta T)}}. \end{eqnarray} (A1b) (iii) Analysis step. Compute the Kalman gain matrix as   \begin{equation*} \mathbf {K}^{(t+\Delta T)} = \mathbf {P}^{(t+\Delta T)} \mathbf {H}^{\prime } (\mathbf {H} \mathbf {P}^{(t+\Delta T)} \mathbf {H}^{\prime } + \mathbf {R})^{-1}. \end{equation*} Update each sample in the predicted ensemble as follows:   \begin{eqnarray*} \mathbf {x}_j^{(t+\Delta T)} &=& \hat{\mathbf {x}}_j^{(t+\Delta T)} + \mathbf {K} (\boldsymbol{\zeta }_j - \mathbf {H} \hat{\mathbf {x}}_j^{(t+\Delta T)}). \end{eqnarray*} (iv) Repeat the prediction and analysis steps until all observations are assimilated. APPENDIX B: NOTATION $$h$$  water depth  $$n$$  number of grain-size classes  $$U$$  depth-averaged flow speed  $$u_{\ast}$$  shear velocity  $$w$$  settling velocity of particles  $$z$$  elevation above the bed  $$z_{0}$$  total roughness of the bed  X  augmented system state  C(z)  vertical profile of sediment concentration  Ci,0  sediment concentration of class i at the bed  C0  total sediment concentration at the bed  D  mean particle diameter  $$\cal{F}$$  forward model for sedimentation  K  eddy viscosity  M  number of samples  N  number of time steps  T  total time for all sediments to settle  U  depth-averaged flow velocity  R  observation covariance matrix  $$\mathbb{R}^{n}$$  space of n-dimensional real-valued vectors  P  ensemble covariance matrix  K  Kalman gain matrix  H  observation matrix  $$h$$  water depth  $$n$$  number of grain-size classes  $$U$$  depth-averaged flow speed  $$u_{\ast}$$  shear velocity  $$w$$  settling velocity of particles  $$z$$  elevation above the bed  $$z_{0}$$  total roughness of the bed  X  augmented system state  C(z)  vertical profile of sediment concentration  Ci,0  sediment concentration of class i at the bed  C0  total sediment concentration at the bed  D  mean particle diameter  $$\cal{F}$$  forward model for sedimentation  K  eddy viscosity  M  number of samples  N  number of time steps  T  total time for all sediments to settle  U  depth-averaged flow velocity  R  observation covariance matrix  $$\mathbb{R}^{n}$$  space of n-dimensional real-valued vectors  P  ensemble covariance matrix  K  Kalman gain matrix  H  observation matrix  View Large Greek letters ζ  sedimentation flux  κ  Von Karman constant  φ  logarithmic scale of particle diameter  σi  standard deviation of observation noise for the ith grain-size class  Δt  time step  Δz  water column layer thickness  Δη  deposit layer thickness  ΔN  time steps between two observations  ΔT  time interval between two observations  ζ  sedimentation flux  κ  Von Karman constant  φ  logarithmic scale of particle diameter  σi  standard deviation of observation noise for the ith grain-size class  Δt  time step  Δz  water column layer thickness  Δη  deposit layer thickness  ΔN  time steps between two observations  ΔT  time interval between two observations  View Large Subscripts/Superscripts i  index of grain-size class  j  index of ensemble member  l  indices of time step, sediment layer, and water column layer  0  initial ensemble  i  index of grain-size class  j  index of ensemble member  l  indices of time step, sediment layer, and water column layer  0  initial ensemble  View Large Decorative symbols $$\widetilde{\square }$$  synthetic truth  $$\overline{\square }$$  mean  $$\widehat{\square }$$  forecast state in EnKF  $${\square }\prime$$  vector/matrix transpose  $$\widetilde{\square }$$  synthetic truth  $$\overline{\square }$$  mean  $$\widehat{\square }$$  forecast state in EnKF  $${\square }\prime$$  vector/matrix transpose  View Large © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Geophysical Journal International Oxford University Press

# Inferring tsunami flow depth and flow speed from sediment deposits based on Ensemble Kalman Filtering

, Volume 212 (1) – Jan 1, 2018
13 pages

/lp/ou_press/inferring-tsunami-flow-depth-and-flow-speed-from-sediment-deposits-2HE09ZiWtd
Publisher
The Royal Astronomical Society
ISSN
0956-540X
eISSN
1365-246X
D.O.I.
10.1093/gji/ggx435
Publisher site
See Article on Publisher Site

### Abstract

SUMMARY Flood hazards caused by tsunamis can cause enormous amounts of casualties and property losses. However, these hazards often occur at unexpected locations, which poses challenges for direct observations of flood source characteristics (e.g. flow speed and water depth in tsunamis-induced inundations). For this reason, inverse modelling methods based on the information contained in the deposit are indispensable for deciphering the quantitative characteristics of the flood sources. In this work, we propose an inversion scheme based on Ensemble Kalman Filtering (EnKF) to infer tsunami flow characteristics from sediment deposits. In contrast to traditional data assimilation methods using EnKF, in the current work the system state is augmented by including both the physical states (sediment fluxes) that are observable and the unknown parameters (flow speed and flow depth) to be inferred. A novelty of this work lies in formulating the tsunami inverse problem in a statistically rigorous way. Based on the Bayesian inference theory, the inversion scheme provides quantified uncertainties on the inferred quantities, which clearly distinguishes the present method with existing schemes for tsunami inversion. Two test cases with synthetic observation data are used to verify the proposed inversion scheme. Numerical results show that the tsunami flow variables inferred from the sediment deposit information agree with the synthetic truths very well, which clearly demonstrated the merits of the proposed inversion scheme. Moreover, a realistic application of the proposed inversion scheme with the field data from the 2006 South Java Tsunami is studied, and the results are validated against both previous inversion results and field data reported in the literature. The comparisons show excellent performance of the proposed approach. Tsunamis, Inverse theory, Numerical modelling, Probability distributions 1 INTRODUCTION Historically and in the modern era, human civilization and economic activities have prospered near oceans. This is partly due to the abundant availability of water for transportation and industrial activities. Coastal cities have become important nodes in global economic network. Furthermore, the increased economic relevance of cities, such as Los Angeles, Singapore or Hong Kong (to mention only three of many), has caused their population to significantly grow, including projections that even more people will be attracted to these power houses. NASA estimated that over one-third the total human population live within 100 km of a coastline, and this number will increase in the coming decades. A direct consequence of the tendency of settling near oceans is that throughout human history people had to battle with anthropogenic and naturally occurring coastal disasters, such as tsunamis, and to manage their threat with engineering infrastructure. The flooding hazards caused by tsunamis have caused enormous amounts of casualties and property losses (Nwe & Tokuzo 2010). It is thus crucial to understand the initiation and propagation of these floods and to predict their impact on infrastructure and human settlements (Tuck & Hwang 1972; Xiao et al. 2010; Young et al. 2010). On the other hand, tsunamis have been relatively rare, which is fortunate for these cities but poses great challenges for the scientific understanding of these events due to the sparseness of field data (Tappin et al. 2014). Moreover, the tsunami events often occur at unexpected times and locations along stretched coastlines, which makes direct observations infeasible. This is in spite of the recent advances in field monitoring techniques and the ever-increasing availability of sensors. In other words, the tsunami flow characteristics (e.g. flow speed and water depth) are volatile information. Fortunately, the sediment deposit and erosion patterns left behind by these floods are less volatile—they are often preserved for a long time. In fact, sediment deposits are indispensable records for the study of palaeotsunamis (Bourgeois et al. 1988; Dawson & Stewart 2007; Spiske et al. 2013). They are also critical information to obtain in post-disaster surveys (Richmond et al. 2012), which can be used to infer the tsunami characteristics along with other records (e.g. visual observations and oral histories). Indeed, researchers in the coastal hazard community have systematically pursued the inversion of water depth and flow velocities of tsunami-induced inundation from sediment deposits. Various models have been developed to estimate the flow conditions at the time of sediment depositions. These models include Moore’s Advection Model (Moore et al. 2007), Soulsby’s Model (Soulsby et al. 2007) and TsuSedMod (Jaffe & Gelfenbuam 2007), which have been successfully used in a wide range of applications (Spiske et al. 2010; Jaffe et al. 2011, 2012; Witter et al. 2012; Spiske et al. 2013). They all used the simple idea of target shooting. That is, a forward model of sedimentation, which computes the sediment deposition from given flow speed and water depth, is iteratively invoked until the sediment deposit thickness and grain size distribution that match the observation are obtained. Despite their simplicity and effectiveness, these existing models all have a major limitation in that they are not able to account for the uncertainties in the sediment deposit and the forward models. Uncertainties in sediment deposits can stem from various sources. First, the sampled sediment record can be very thin, and thus significant errors can occur in the sample collection and measurement processes. Second, a given sediment record can be the result of successive actions by multiple events, possibly of different natures (e.g. tsunamis and storm surges). This makes the interrogation of sediment record much more challenging. In addition to the uncertainties in sediment deposits, which are the input to the inverse problem, the modelling assumptions in the forward models also introduces significant uncertainties. For example, most numerical models of tsunami-induced sediment transport are based on Saint-Venant equations, which are computationally cheap but highly simplified (Xiao et al. 2010). They do not explicitly represent the small-scale turbulence processes and the local flow features within the boundary layer, but these features can significantly influence the sediment transport and deposition processes. In view of the large uncertainties in both the input and the model, it is imperative to understand them and to quantify the uncertainties they introduce in the inferred quantities. To this end, a statistical inference framework that can provide more rigorous confidence intervals is needed. Most widely used statistically rigorous inversion frameworks with quantified uncertainties are based on Bayes theorem. Markov Chain Monte Carlo (MCMC) sampling algorithm is the most accurate pathway to perform Bayesian inference (Law & Stuart 2012). However, it requires a large number of forward model evaluations, which is prohibitively expensive for complex forward simulations. Therefore, the use of approximate Bayesian approaches is necessary. There are a number of approximate Bayesian inference approaches, for example, Maximum a posteriori (MAP) (Ferraiuolo et al. 2004), variational methods (Caya et al. 2005), filtering-based methods (Evensen 1994). Among these approaches, the Ensemble Kalman Filtering (EnKF) method has its own advantages (Evensen 2009). First, it is a derivative-free inference method, making it easy to implement. It does not need to derive the complex adjoint equation that is required in the derivative-based methods. Second, the sequential filtering-based method can be used for the time-varying state (e.g. sediment flux). Third, EnKF is a non-intrusive method and thus can handle the high-dimensional, nonlinear system. Finally, the accuracy of the mean value of the inferred quantities by EnKF is comparable to that of the exact Bayesian approach based on the MCMC sampling, which is the golden standard of Bayesian inference (Law & Stuart 2012). Therefore, in this contribution, we propose a Bayesian scheme based on EnKF for the inference of flood source characteristics from sediment deposits. Specifically, this study is concerned with the inference of flow speed and water depth of a tsunami from its sediment deposits, which is referred to as ‘tsunami inversion’ following the conventions in the literature. However, we expect that the proposed method can be straightforwardly extended and adapted for the inference of source characteristics in other floods, for example, the breaching initiation and discharge rate history in riverine floods. These will be pursued in future studies. Data assimilation methods based on EnKF are widely used in geosciences applications (Evensen 2009). For example, in numerical weather forecasting, EnKF is used to infer the initial conditions (present states of the system, e.g. pressure, velocity, and temperature), which are not known precisely and thus must be inferred from observations. In subsurface flows, EnKF has been used to infer the permeability fields of soils and rocks, which have large uncertainties but are not amenable to full-field measurements due to the large size of the domain (Elsheikh et al. 2012). In addition, EnKF is also used to estimate CO2 emission fluxes from trace gas observations (Peters et al. 2005). A novelty of this work lies in formulating the tsunami inverse problem from sediment deposits by using a statistically rigorous approach, that is, EnKF-based data assimilation method. To the authors’ knowledge, our work represents the first attempt to use EnKF for tsunami inversion. This contribution can be seen as a proof of concept to demonstrate the feasibility of the approach. A more comprehensive study consisting of an evaluation of the proposed method, a demonstration of its practical relevance, and a parametric sensitivity study has been presented in a separate companion paper (Tang et al. 2016). The proposed method would open up possibilities for statistically rigorous inference of flood source characteristics from sediment deposits, where the inferred quantities are given with quantified uncertainties. The rest of the paper is organized as follows. Section 2 introduces the forward model of the sediment deposition by a tsunami wave and the formulation of the inverse problem. Section 3 presents the computational setup of three numerical examples of increasing difficulty levels used to demonstrate the merits of the proposed method. The detailed numerical results are presented and discussed in Section 4. Finally, Section 5 concludes the paper. 2 METHODOLOGY The objective of this work is to infer tsunami flow characteristics from the tsunami deposits, which is an inverse problem. As with most inversion algorithm, the proposed method for the inverse problem involves solving the forward problem repeatedly, that is, computing the tsunami deposit from given tsunami flow characteristics. Therefore, in this section the formulations, assumptions, and solution algorithm of the forward problem are first presented in Section 2.1. Subsequently, the formulation and solution algorithm for the inverse problem are introduced in Section 2.2. 2.1 Formulation of the forward problem and solution algorithm Given the tsunami flow characteristics (i.e. flow speed u and flow depth h), the forward problem is to compute the sediment deposition characteristics (i.e. the depth of the tsunami deposit and the particle size composition at any height of the sediment column). An example output of the forward problem is shown in Fig. 1(c), which shows the thickness of the tsunami deposit, the grain-size distribution in each layer. Figure 1. View largeDownload slide A schematic illustration of the algorithm for solving the forward problem, that is, computing the tsunami deposit thickness and the grain-size distribution in each layer from given tsunami characteristics (flow speed and flow depth). An exemplary output of the algorithm is presented in Fig. 2. Figure 1. View largeDownload slide A schematic illustration of the algorithm for solving the forward problem, that is, computing the tsunami deposit thickness and the grain-size distribution in each layer from given tsunami characteristics (flow speed and flow depth). An exemplary output of the algorithm is presented in Fig. 2. The forward model adopted in this study is based on the simplified sedimentation models of Jaffe & Gelfenbuam (2007) and Tang & Weiss (2015). In these models it is assumed that the tsunami deposit is formed solely by the steady sedimentation of the particles in the water column. The effects of bed load transport and the acceleration of sediment particles during the settling process are neglected. This assumption can be justified because the forward model is to calculate the flow speed near the peak inundation of the tsunami, during which, the formation of the sediment deposits is dominated by the suspended load. It is further assumed that the sediment concentration and the fluid velocity vary only vertically in the water column, and their horizontal gradients and temporal changes are neglected. With the assumptions above, the flow velocity profile u(z) along the water column can be parameterized by the shear velocity u*, where z is the elevation above the bed. Consequently, the depth-averaged flow velocity can be obtained from the following integral over the entire water column:   $$U = \int _{z_0}^{h}\frac{u_*^2}{K(z)} \text{d}z ,$$ (1)in which z0 is the total roughness of the bed and h is the flow depth. The eddy viscosity K can be calculated as following (Gelfenbaum & Smith 1986):   $$K(z) = \kappa \, u_* \, z \exp \left[ {\frac{-z}{h} - 3.2 \left( \frac{z}{h} \right)^2 + \frac{2}{3} \times 3.2 \left( \frac{z}{h} \right)^3} \right],$$ (2)where κ = 0.41 is the von Karman constant. The suspended sediment concentration for each grain size in the water column is assumed to follow the Rouse profile (Jaffe & Gelfenbuam 2007):   $$C_i (z) = C_{i, 0} \, \exp \left[ {w_{i} \int _{z_0}^{z} {\frac{1}{K(z)}}} \right]$$ (3)in which wi is the settling velocity of particles in the ith grain-size class, which depends on the mean particle diameter Di of the class; Ci, 0 denotes sediment concentration of the ith grain-size class at the bed, which depends on the shear velocity u* and the fraction of bed sediment in class i, among other parameters (Madsen et al. 1993). Based on eqs (1) and (3), the velocity and concentration profiles can be uniquely parameterized by two scalar quantities, the shear velocity u* and flow depth h. Therefore, from here on we consider u* and h the characteristics that define a tsunami. The objective of a tsunami inversion is thus to infer u* and h from a tsunami deposit record. Eq. (3) suggests that the sediment concentration for each sediment grain size can be determined by the tsunami characteristics (i.e. shear velocity u* and flow depth h) and the particle diameter Di. According to the convention in the sedimentology, the grain size is represented in ϕ scale (the logarithm of the particle diameter to the base 2), ϕi = −log2(Di/Dref), in which Dref = 1 mm is the reference particle diameter to ensure dimensional consistency. Consequently, the tsunami deposit thickness and grain-size distribution can be obtained by integrating the concentration curve Ci(z) for each grain size class. The integration is performed with the following algorithm, which is illustrated in Fig. 1 with two grain-size classes as an example. Discretization of time. Assuming that T is the total time taken for all the sediment in the water column (including all grain-size classes) to settle, we divide the time T to N time steps of size Δt such that T = N Δt. Discretization of water column. For grain-size class i, the water column can be divided to N layers, numbered sequentially upward from the bottom (see Fig. 1a), such that the sediment at the top of layer l arrives at the bed at time l Δt. Since the particles are assumed to have no acceleration (e.g. at constant velocity wi) during the sedimentation, the layers of the water column have a uniform thickness of Δzi = wi Δt. Note that the layer thickness can be different among different grain-size classes since the terminal velocity wi is larger for coarse grains than for fine grains. Consequently, for a coarse grain-size class the discretized water column layer thickness Δzi = wiΔt is larger and thus the number of discretized water column layers is smaller. This can be seen by comparing Figs 1(a) and (b). Accumulation of tsunami deposit. With the discretization of the water column, it can be seen that the obtained tsunami deposit has N layers as well (numbered in the same way as for the water column; see Fig. 1c). The sediment in layer l consists of the sediment in the lth layer of the water column for all grain-size classes (indicted in colours/patterns in Fig. 1c). The tsunami deposit thickness Δηl of the lth layer is computed by summing up the sediment volume in the corresponding lth water column layers for all grain-size classes:   $$\Delta \eta _l = \frac{1}{C_0} \left( \sum _{i = 1}^{n} {\overline{C}_{i, l} \, \Delta z_{i}} \right)$$ (4)in which C0 is the total sediment concentration at the bed including size classes, n is the number of grain-size classes, and $$\overline{C}_{i, l}$$ is the average concentration of grain-size class i in the water column layer l, which can be obtained by a simple integration:   $$\bar{C}_{i, l} = \frac{1}{\Delta z_i} \int _z^{z + \Delta z_i} \, C_i (z) \, \text{d}z.$$ (5) Post-processing for grain-size distribution. The fraction fi, l of each grain-size class i in the lth layer in the sediment column is   $$f_{i, l} = \frac{1}{C_0} \frac{\overline{C}_{i, l} \, \Delta z_{i, l}}{\Delta \eta _l}.$$ (6) The thickness of sediment layers Δηl and the fraction fi, l for each grain-size class in each layer are used to produce the plots presented in Figs 1(c) and 2, which is the final output of the forward problem. Figure 2. View largeDownload slide Hypothetical 0.2 m thick tsunami deposit for verification cases: vertical grading in grain-size distributions (blue line) and mean grain-size (red line) for sediment intervals. The grain-size distribution of the entire tsunami deposit is a log-norm distribution. Figure 2. View largeDownload slide Hypothetical 0.2 m thick tsunami deposit for verification cases: vertical grading in grain-size distributions (blue line) and mean grain-size (red line) for sediment intervals. The grain-size distribution of the entire tsunami deposit is a log-norm distribution. The algorithm above for the forward problem produces some key information, which is the time stamp of tsunami deposit layers and the sediment flux at each time step. Specifically, the time, at which the sediment layer l finished the deposition at time l Δt, can be considered the time stamp on the sediment layer (see Fig. 1b). Admittedly, the sediment cores obtained in the field do not come with time stamps. However, by comparing sediment cores obtained at several locations with cross-shore offsets, it is possible to infer the time stamps and thus the sediment fluxes at discretized time intervals (Dearing et al. 1981). To facilitate the filtering procedure to be used in the inversion, we adopt an alternative approach of computing tsunami deposit thickness by considering the time sequence of the sedimentation process. Specifically, in contrast to eq. (4), the deposition thickness can be computed from the average sediment flux ζi, l of grain-size class i at time step l when the lth layer of the sediment deposit is formed. That is,   \begin{eqnarray} \Delta \eta _l = \frac{1}{C_0} \left( \sum _{i = 1}^{n} \zeta _{i, l} \, \Delta t \right), \quad \textrm{with} \end{eqnarray} (7a)  \begin{eqnarray} \zeta _{i, l} & =& \frac{\overline{C}_{i, l} \, \Delta z_{i}}{\Delta t}, \quad \textrm{or equivalently,} \nonumber\\ \zeta _{i, l} & =& \overline{C}_{i, l} \, w_i \quad \textrm{in which} \quad w_i = \frac{\Delta z_{i}}{\Delta t} . \end{eqnarray} (7b) Note that the same subscript l that is used above as the indices of the water column layer (Figs 1a and b) and sediment layer (Fig. 1c) is used to denote the time step index here. This choice of notation is justified by the assumption that the lth sediment layer is formed by the deposition of all sediment grain classes in the lth water column layer at time step l. Simply substituting eq. (7b) to eq. (7a) yields eq. (4), and thus the two formulations in eqs (4) and (7) are equivalent. By adopting the flux-based formulation, we are modelling the sediment deposition process as the evolution of a dynamical system with the sediment flux ζi for each grain-size class i as the system state. Based on the assumptions from eq. (7b), it can be seen that for any grain-size class i the state ζi(t) is uniquely determined by the concentration profile Ci(z) and the settling velocity wi, which in turn depends on the tsunami characteristics (shear velocity u* and flow depth h) and the grain-size ϕi. Therefore, the forward model $$\mathcal {F}$$ is thus formulated as to compute sediment deposition flux ζi(t) from known shear velocity u* and flow depth h, that is, $$\mathcal {F}: (u_*, h) \mapsto \zeta _{i} (t)$$. The sediment thickness and grain-size distribution are considered auxiliary quantities obtained by post-processing the time-series of the system state ζi(t). 2.2 Formulation of the inverse problem and solution algorithm The objective of the inverse problem is to infer the tsunami characteristics (depth-averaged flow velocity and flow depth) from tsunami deposit. In the formulation of the forward problem above, the flow velocity profile and the depth-averaged velocity are parameterized by the shear velocity as in eq. (1), and the sediment flux is the state variable of the system. Therefore, the inverse problem is recast as inferring the shear velocity and flow depth from the sediment flux. The sediment flux, which is the input to the inverse problem, can be obtained by analysing the tsunami deposit from the field. Specifically, when tsunami deposit samples are cored, they are first divided to a number of layers and to obtain the time stamp for each layer by utilizing the spatial information of the samples. Particle-size analysis is then performed on each layer, that is, by using sieve analysis or other sedimentation techniques (Barth 1984), which leads to the average sediment flux ζi at a few discrete time steps. The inverse problem needs to be formulated so that shear velocity u* and flow depth h can be inferred from the sediment flux. As such, we consider u* and h the unobservable parameters of the dynamical system $$\mathcal {F}(u_*, h; \zeta _i (t))$$. They will be inferred from observations of the system state, that is, the sediment flux ζi(t). The inversion of shear velocity and flow depth from observed sediment flux is challenging for at least two reasons. First, the observation is inevitably sparse and noisy, because the sediment core can only be divided to a few layers to ensure each layer has enough sediment mass, and the measurement has large errors. Second, the forward model describing the sedimentation process is based on high simplified assumptions and thus does not faithfully represent the exact system dynamics. In this work we use the Ensemble Kalman Filtering (EnKF) to perform the inversion (Evensen 2003, 2009; Iglesias et al. 2013), which is widely used in data assimilations, particularly in numerical weather forecasting. When used to solve the tsunami deposit inversion problem, the system state is first augmented to include both the physical state ζi(t), which are observable, and parameters u* and h, which are unobservable (from the sediment core) and are to be inferred. The augmented system state x(t) is written as a vector formed by stacking the unknown parameters and the sediment flux ζi(t):   $$\mathbf {x} = [\zeta _1, \ldots , \zeta _n, u_*, h]^{\prime },$$ (8)in which ΄ indicates vector transpose. Given the prior distributions for parameters (u* and h) to be inferred and the covariance matrix R of the sediment flux observations $$\zeta _{\rm i}^{\rm obs}$$, the inversion algorithm proceeds as follows: Sampling of prior distribution. From the prior distributions of the parameters, M samples are drawn. Each sample consists of a combination of values for u* and h. Propagation. The sediment fluxes $$\hat{\zeta }_{i}$$ for all grain-size classes are computed from eq. (7b) by using the updated parameters u* and h from the previous analysis step (or from the initial sampling if this is the first propagation step). The propagation is performed for ΔN time steps, where ΔN is the number of time steps in the time interval ΔT between two consecutive data assimilation operations. The $$\hat{\cdot }$$ indicates predicted quantities that will be corrected in the analysis step below. The propagation is performed for each sample in the ensemble, leading to the propagated ensemble $${\lbrace \hat{\mathbf {x}}_j\rbrace _{j = 1}^M}$$. Each sample $$\hat{\mathbf {x}}_j$$ is a vector containing a realization of shear velocity and flow depth as well as the corresponding sediment flux (see eq. 8). The mean $$\bar{\mathbf {x}}$$ and covariance P of the propagated ensemble are computed (see eq. A1b in Appendix A). Analysis/correction. The computed sediment fluxes $$\hat{\zeta }_{i}$$ for all grain-size classes are compared with observations $$\zeta _{i}^{\text{obs}}$$ corresponding to the current time step l. The ensemble covariance P and the error covariance R are used to compute the Kalman gain matrix K. Each sample is corrected as follows:   $$\mathbf {x}_j = \hat{\mathbf {x}}_j + \mathbf {K} ({\boldsymbol{\zeta }_j} - \mathbf {H} \hat{\mathbf {x}}_j),$$ (9)where superscript xj is the corrected system state; $$\boldsymbol{\zeta } = [\zeta _1, \ldots , \zeta _n]^{\prime }$$ are the sediment fluxes, the part of the system state vector that can be observed; H is the observation matrix. After the correction, the analysed state contains updated fluxes and parameters.  It should be remarked that: (1) the analysis scheme above suggest that the corrected state (i.e. the analysis) is a linear combination of the prediction and observations, with the Kalman gain matrix K being the weight of the observations; (2) the observation matrix $$\mathbf {H}: \mathbb {R}^{n+2} \mapsto \mathbb {R}^n$$ has a size of n × (n + 2), which maps a vector in the n + 2 dimensional state space to a vector in the n-dimensional observation space. The first n columns of H are ones and the last two columns are zeros, indicating that the parameters are not observed. Repeat propagation and analysis steps 2–3 for next data assimilation time t + ΔT to incorporate next observation until all observations are assimilated. The EnKF algorithm for inferring tsunami characteristics from tsunami deposit is summarized in Fig. 3. The detailed algorithm is presented in Appendix A. Figure 3. View largeDownload slide Schematic of the inverse modelling approach based on the state augmentation. The system state is augmented to include both the physical state (sediment flux ζi(t)) and the parameters to be inferred (shear velocity u* and flow depth h). An ensemble representative of the augmented state is propagated via the forward model. The propagated ensemble is then updated in the analysis process based on the observation data. The updated state (physical quantities and model parameters) is then propagated to the next time step, and this loop continues until all observations are assimilated. Figure 3. View largeDownload slide Schematic of the inverse modelling approach based on the state augmentation. The system state is augmented to include both the physical state (sediment flux ζi(t)) and the parameters to be inferred (shear velocity u* and flow depth h). An ensemble representative of the augmented state is propagated via the forward model. The propagated ensemble is then updated in the analysis process based on the observation data. The updated state (physical quantities and model parameters) is then propagated to the next time step, and this loop continues until all observations are assimilated. It can be seen that the observations arrive sequentially in the EnKF data assimilation procedure above, which is typical for applications such as numerical weather forecasting. In this work we formulate the tsunami inversion problem with a sequential streaming of data to take advantage of the widely used EnKF algorithm. In this method, the filtering procedure finds an optimal correction at each assimilation step based on the latest observation and the latest prediction ensemble (see eq. 9). We note that it can be preferable to use another algorithm that is closely related to EnKF, namely the Ensemble Kalman Smoothing method, which finds optimal correction in light of all past observations (Evensen & Van Leeuwen 2000). This method will be investigated in future work. 3 COMPUTATIONAL SETUP OF CASES While EnKF-based Bayesian inferences have been widely used in other communities of geosciences, the present contribution represents the first attempt in using it for tsunami inversion. To establish confidence in the proposed framework for tsunami inversion based on sediment deposits, we construct a series of verification cases with synthetic truths to assess the performance of the proposed inversion scheme. Furthermore, we test the proposed framework by using a set of field data of the real tsunami deposits from the 2006 South Java Tsunami (sections Bunton, sample JTR 6 (Spiske et al. 2010)). A synthetic case can be generated by running the forward model described in Section 2.1 on given a set of tsunami and sediment characteristics (i.e. shear velocity $$\tilde{u}_*$$ and flow depth $$\tilde{h}$$, the range of particle sizes $$\tilde{\phi }_i$$, where $$\tilde{\cdot }$$ indicates synthetic truths). The corresponding tsunami deposits including the thickness and the grain-size distribution as shown in Fig. 2 can thus be obtained. Subsequently, the sediment flux can be obtained by post-processing the tsunami deposit information with the procedure explained in Section 2.2. In fact, for the synthetic cases the sediment fluxes are part of the forward model simulation output, and thus a post-processing procedure is not required. Synthetic observations are then generated by adding Gaussian random noises of standard deviation σi to the true sediment fluxes, which represent the measurement and sampling errors in sediment coring operations in the field. The decision to use synthetic cases instead of realistic cases to verify the proposed method is justified by the fact that the true tsunami characteristics corresponding to actual field samples are usually unknown, which make them not ideal for verification purposes. Even if the truth for tsunami characteristics were known, for example, when the flow speed and flow depth were measured from independent sources, it would be difficult to distinguish the errors due to the forward model inadequacy and those due to the inversion procedure. Therefore, using the synthetic cases allow us to focus on assessing the performance of the proposed inversion procedure. The merits of the inversion scheme can be assessed by its capability to reproduce the synthetic truths $$\tilde{u}_*$$ and $$\tilde{h}$$, to which the inversion scheme is blind. With the established confidence from the verification cases, field data of deposits from the 2006 Java tsunami event are used as the observations to demonstrate the capability of the proposed framework in realistic applications. For simplicity, we assume that the shear velocity u* and flow depth h, which are to be inferred, are assumed time-invariant. This is because the temporal gradients in mean flow speed during much of the tsunami inundation landward are small. Near the peak of the tsunami, as the tsunami periods are very long, the time change in flow speed is small and the flow can be approximated as steady. However, the proposed scheme can be straightforwardly extended to time-varying shear velocity u*(t) by incorporating iterations in each time step. As a proof-of-concept study, the tsunami deposit at only one onshore location is utilized. For the problem where sediment deposits are available at multiple locations, the proposed scheme also can be applied by expanding the state vector to include fluxes at different locations. In this case, in addition to the local tsunami flow quantities inferred in this study, other information also can be obtained, for example, wavelength, period, number of waves and duration of the tsunami. Two verification cases with synthetic observations of increasing difficulty levels are defined. In case 1, the sediment has a single grain size ϕ = 2.0, and the only unknown parameter to be inferred is the shear velocity u* while the flow depth h is given. In case 2, which is more challenging, the sediment has a log-normal grain-size distribution with 0 < ϕ < 3.25. Both u* and h are unknown and are to be inferred. The synthetic truths for u* and h, the prior ensemble means ($$\bar{u}_*^{0}$$ for both cases and $$\bar{h}^{0}$$ for case 2 only) are the same for both synthetic cases. For the realistic case with field data (referred to as case 3), the mean grain size, largest grain size and smallest grain size are ϕ = 2.5, ϕ = 0.0 and ϕ = 5.25, respectively. The mean values of prior ensembles are estimated by TSUFLIND (Tang & Weiss 2015). All other parameters, including the forward model time step Δt, the data assimilation interval ΔN, and the number of samples M are the same for all 3 cases. The parameters are summarized in Table 1. The prior ensembles for both parameters are uniformly distributed in the ranges specified in Table 1, which is representative of the lack of knowledge on the quantities to be inferred in practical tsunami inversions. Since the truths of the quantities to be inferred, $$\tilde{u}_*$$ and $$\tilde{h}$$, are unknown when performing the inversion, the mean values of prior ensemble $$\bar{u}_*^0$$ and $$\bar{h}^0$$ for both quantities are likely to have biases compared to the synthetic truth. This is reflected in the choice of ensemble mean as shown in Table 1. The time step Δt = 0.5 s is chosen for all cases, and the observations of sediment fluxes are assimilated every ΔN = 10 time steps. The error covariance matrix is chosen as $$\mathbf {R} = \textrm{diag}[\sigma _1^2, \ldots , \sigma _n^2]$$ with σi being the standard deviation of the white noise added to the true sediment fluxes for each grain size i. The choice of standard deviation of noises shown in Table 1 is based on the error model detailed in the companion paper (Tang et al. 2016). Table 1. Physical and computational parameters for test cases. Parameters  Case 1  Case 2  Case 3  Inferred parameters  u*  u* and h  u* and h  Grain size ϕ ( = −log2[D/Dref])  2.0  0 < ϕ < 3.25  0 < ϕ < 5.25  Δt  0.5 s  Data assimilation interval ΔN  10  Number of samples M  1000  std. σi of observation errora  ε + 0.01$$\tilde{\zeta }_i$$  ε + 0.05$$\tilde{\zeta }_i$$  Synthetic truth $$\tilde{u}_*$$ for u*  0.5 m s−1  −  Range of prior ensemble for u*b  0.4–1.2 m s−1  0.15–0.45 m s−1  Mean $$\bar{u}_*^{0}$$ of prior ensemble for u*  0.8 m s−1  0.3 m s−1  Synthetic truth $$\tilde{h}$$ for h  3 m  3 m  —  Range of prior ensemble for h  —  2.5–7.5 m  6–10 m  Mean $$\bar{h}^0$$ of prior ensemble for h  —  5 m  8 m  Parameters  Case 1  Case 2  Case 3  Inferred parameters  u*  u* and h  u* and h  Grain size ϕ ( = −log2[D/Dref])  2.0  0 < ϕ < 3.25  0 < ϕ < 5.25  Δt  0.5 s  Data assimilation interval ΔN  10  Number of samples M  1000  std. σi of observation errora  ε + 0.01$$\tilde{\zeta }_i$$  ε + 0.05$$\tilde{\zeta }_i$$  Synthetic truth $$\tilde{u}_*$$ for u*  0.5 m s−1  −  Range of prior ensemble for u*b  0.4–1.2 m s−1  0.15–0.45 m s−1  Mean $$\bar{u}_*^{0}$$ of prior ensemble for u*  0.8 m s−1  0.3 m s−1  Synthetic truth $$\tilde{h}$$ for h  3 m  3 m  —  Range of prior ensemble for h  —  2.5–7.5 m  6–10 m  Mean $$\bar{h}^0$$ of prior ensemble for h  —  5 m  8 m  astd. denotes standard deviation; the standard deviation of noise added to the observation depend on the grain-size class; ε = 0.125 mm3 s−1 is a fixed constant, and $$\tilde{\zeta }$$ is the synthetic truth of the sediment flux, which depends on the grain size and vary with time. bA smaller range of prior ensemble for u* from 0.7 to 0.9 m s−1 is also investigated in case 1. View Large 4 RESULTS AND DISCUSSION 4.1 Case 1: a single grain-size class and one unknown parameter First, a case with simple conditions is studied where all tsunami deposits are in the same grain-size class and the shear velocity u* is the only unknown parameter to be inferred. Although this highly simplified scenario is likely to be uncommon in reality, a case with controlled conditions is able to provide a reasonable initial assessment of the proposed method. The time-series ζ(t) of the sediment flux, which is the physical state of the system, is presented in Fig. 4. The sediment fluxes for all 1000 samples are shown along with the synthetic truth corresponding to the true shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. It can be seen that for all samples (i.e. prior guesses of shear velocities) and the synthetic truth show a general trend that the magnitude of sediment flux decreases with time, which is typical of a Rouse sediment concentration profile assumed in the forward model as in eq. (3). Figure 4. View largeDownload slide Time-series of the sediment fluxes ζ(t), which is the physical state of the system, for a single grain-size class ϕ = 2.0 during the sedimentation process. The green (light grey) lines show M = 1000 sediment flux samples, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth of the shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. (The 2σo observation error bar is also plotted in the figure.) The synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. Figure 4. View largeDownload slide Time-series of the sediment fluxes ζ(t), which is the physical state of the system, for a single grain-size class ϕ = 2.0 during the sedimentation process. The green (light grey) lines show M = 1000 sediment flux samples, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth of the shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. (The 2σo observation error bar is also plotted in the figure.) The synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. The convergence history of the inferred parameter, the shear velocity u*, is shown in Fig. 5(a). By assimilating the observation data as shown in Fig. 4, the shear velocity of all samples and the sample mean gradually converge to the synthetic truth $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$, regardless of their values in the initial ensemble. The range of sample scattering for u*, which is 0.8 m s−1 (with a range from 0.4 to 1.2 m s−1) in the prior ensemble, shrinks to 0.04 m s−1 at the end of the inference. Since the scattering of samples in EnKF-based inference is indicative of the uncertainty in the inferred results, the reduction of sample scattering represents the reduction of uncertainties and correspondingly the increase of confidence in the quantity to be inferred. Figure 5. View largeDownload slide Convergence history of the inferred parameter, the shear velocity u*, for the single grain-size case. The plot shows (a) the convergence of the ensemble and the sample mean as well as (b) the evolution of the probability density function of the shear velocity u* in the inversion process. (c) The corresponding cumulative distribution function for the final inference results. Figure 5. View largeDownload slide Convergence history of the inferred parameter, the shear velocity u*, for the single grain-size case. The plot shows (a) the convergence of the ensemble and the sample mean as well as (b) the evolution of the probability density function of the shear velocity u* in the inversion process. (c) The corresponding cumulative distribution function for the final inference results. The probability density function of the shear velocity u* corresponding to the ensemble at time steps 0 (initial), 30, 50, 100 and 200 (final) are presented in Fig. 5(b), highlighting the evolution of uncertainty in the inferred quantity. Note that the changes of shear velocity u* at different time steps does not mean that u* is time-varying. Rather, the shear velocity is updated because the inference is formulated as a filtering problem. That is, the velocity ensemble is updated whenever new ‘observations’ are streamed in. We can see that the shear velocity u* is equally likely between 0.4 and 1.2 m s−1 because of the non-informative, uniform prior distribution that is chosen. At time step 50 (after five observations assimilated) the samples are scattered between approximately 0.47 and 0.54 m s−1. Moreover, the bias of the ensemble mean compared to the truth has been largely corrected. As the data assimilation continues, the distribution of u* continues to shrink and concentrate towards the truth ($$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$). The final, posterior distribution of u* at time step 200 is Gaussian based on the plot shown in Fig. 5(b). While this could well be the true distribution, it should be interpreted with caution, since it is well known that the EnKF procedure is derived under Gaussian assumptions (Evensen 2009). The cumulative distribution function of the posterior distribution is presented in Fig. 5(c). It shows that 95 per cent of ensembles have shear velocities u* between 0.490 and 0.508 m s−1. In other words, the inference suggest that there is 95 per cent probability that the shear velocity u* falls within the range above. This credible interval is based on both the prior distribution and the observation data. The uncertainties represented as credible intervals that are obtained in the inversion process is based on the Bayesian inference theory, which clearly distinguishes the present method with existing schemes for tsunami inversion. The evolution of the inference uncertainty over time as shown in Figs 5(a) and (b) illustrates the Bayesian nature of the EnKF-based data assimilation method. In the inference, one starts with a subjective prior distribution on the unknown parameters, which is usually non-informative (see the wide distribution at time step 0) and is represented with an ensemble. As observation data are assimilated, the distribution becomes narrower. Meanwhile, the importance of the prior distribution diminishes as more and more data are assimilated. The remaining uncertainties in the inference results stems from the uncertainties in the observation data, which are inevitable in field measurement and are represented with random noise in this study. The proposed approach provides a statistical pathway to consider the uncertainties from various sources, for example, model inputs, numerical model, and observation data. However, the estimated uncertainties are approximate, because the EnKF is an approximate Bayesian approach based on Gaussian assumption. Further investigations are still needed for assessing the accuracy of the estimated uncertainties. In order to demonstrate the robustness of the inversion procedure with respect to the specified prior distribution, we also tested a prior ensemble (i.e. initial guesses of the parameter to be inferred) with u* in the range of 0.7 and 0.9 m s−1. In contrast to the case presented above, this range does not cover the truth $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$. Even with this overly confident prior distribution, almost identical inversion results were obtained. In fact, the differences between the convergence history of u* disappear after a few assimilation steps. As such, detailed results for this case are omitted here for brevity. The relative inference error for the shear velocity is presented in Fig. 6, which is defined as the L2 norm $$| (\bar{u}_* - \tilde{u}_*) / \tilde{u}_*|$$ of the difference between the ensemble mean $$\bar{u}_*$$ and the synthetic truth $$\tilde{u}_*$$. It can be seen that the inference error decreases dramatically within the first few data assimilations steps, from 0.6 for the initial prior ensemble (time step 0) to 0.025 after five observations are assimilated (at time step 50). This finding is consistent with the decrease of the ensemble scattering (indicating inference uncertainties) as shown in Fig. 5(a). Figure 6. View largeDownload slide The L2 norm of the inference error for the shear velocity, defined as the difference $$| (\bar{u}_* - \tilde{u}_*) / \tilde{u}_* |$$ between the ensemble mean $$\bar{u}_*$$ and the synthetic truth $$\tilde{u}_*$$. Figure 6. View largeDownload slide The L2 norm of the inference error for the shear velocity, defined as the difference $$| (\bar{u}_* - \tilde{u}_*) / \tilde{u}_* |$$ between the ensemble mean $$\bar{u}_*$$ and the synthetic truth $$\tilde{u}_*$$. 4.2 Case 2: multiple grain-size classes and two unknown parameters A tsunami inversion problem with wide grain-size distribution is studied to demonstrate the capability of the proposed method in a more realistic scenario. Both the shear velocity u* and the flow depth h are unknown and the two parameters must be inferred simultaneously. The input to the tsunami inversion procedure is the analysed results of tsunami deposit column as shown in Fig. 2. The grain-size distribution along the depth of the sediment column is obtained from a forward simulation with shear velocity and flow depth as specified in Table 1. The particle sizes range from 0 to 3.25 in ϕ scale, or equivalently from 1 mm to 0.105 mm. The range is divided into 10 grain-size classes, ϕi with i = 1, …, 10, equally spaced in ϕ scale. The superscripts are indices of the grain-size classes. The time-series of sediment fluxes for six representative grain-size classes are shown in Fig. 7, that is, ϕ2 = 0.65, ϕ3 = 0.975, ϕ4 = 1.3, ϕ5 = 1.625, ϕ6 = 1.95, and ϕ7 = 2.275. As in the case with a single grain-size class, the sediment fluxes ζi(t) for all grain-size classes decrease as the deposition proceeds, and the uncertainties represented by the ensemble scattering are reduced as the observations are assimilated. Moreover, the sediment fluxes for the coarse grains (corresponding to smaller ϕ values, for example, ϕ2 = 0.65, D = 0.637 mm as shown in Fig. 7a) decrease more rapidly than those for the finer grains with larger ϕ values. It can be seen from Fig. 7(a) that the coarsest (ϕ2 = 0.65) among the six grain-size classes completed sedimentation in 60 time steps (i.e. 30 s since Δt = 0.5 s). In contrast, the finest grain-size class (ϕ7 = 2.275, D = 0.105 mm) among the six has a more uniform sediment flux throughout the entire sedimentation process. Again, this is attributed to the assumed Rouse profile. It is also noted that the relative uncertainties (ensemble scattering) in the sediment fluxes at the end of the inversion are larger for the fine grains than for the coarse grains. This is due to the random noises added to the true sediment flux when generating synthetic observations. As shown in Table 1, the standard deviation σi of the noise has a fixed component (ε) and a component $$0.01 \tilde{\eta }_i$$ proportional to the truth. Since the sediment flux for the finer grains are smaller in absolute value, as can be seen from the different vertical axis ranges in panels (a)–(f), the relative observation uncertainties are thus larger for the sediment fluxes of the finer grain sizes. Consequently, the uncertainties in ensemble directly reflect the uncertainties in the observations. Figure 7. View largeDownload slide Time-series of sediment fluxes ζ(t) for six (among a total of 10) grain-size classes during the sedimentation: (a) ϕ2 = 0.65, (b) ϕ3 = 0.975, (c) ϕ4 = 1.3, (d) ϕ5 = 1.625, (e) ϕ6 = 1.95 and (f) ϕ7 = 2.275. The green (light grey) lines show M = 1000 samples of sediment flux time-series, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth, that is, shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$ and flow depth h = 3 m. synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. Figure 7. View largeDownload slide Time-series of sediment fluxes ζ(t) for six (among a total of 10) grain-size classes during the sedimentation: (a) ϕ2 = 0.65, (b) ϕ3 = 0.975, (c) ϕ4 = 1.3, (d) ϕ5 = 1.625, (e) ϕ6 = 1.95 and (f) ϕ7 = 2.275. The green (light grey) lines show M = 1000 samples of sediment flux time-series, and the yellow (filled) circles indicate the observed sediment flux corresponding to the synthetic truth, that is, shear velocity $$\tilde{u}_* = 0.5\ \mathrm{m\, s}^{-1}$$ and flow depth h = 3 m. synthetic observation data of sediment fluxes are assimilated to the simulation every 10 times in the EnKF-based inversion procedure. The convergence history of the two parameters to be inferred, shear velocity and flow depth, are presented in Fig. 8. It can be seen from Figs 8(a) and (b) that the sample means for both parameters (red dotted lines) converge to the synthetic truths (horizontal black dashed lines) within 50 time steps, that is, after five observations are assimilated, although the prior ensemble means deviate significantly from the truths at time step 0. A more comprehensive parameter study (Tang et al. 2016) also shows that the prediction results are quite robust with different prior ensembles, clearly suggesting that the system is rather well-posed. Moreover, the uncertainties as indicated by the ensemble scattering are reduced significantly. We note two points here. First, comparison between Figs 8(a) and (b) suggests that the shear velocity converges to the truth faster than the flow depth does. Since EnKF inversion procedure depends on correlation to make inferences, this seems to indicate that the sediment fluxes, which are the observed physical state, are more sensitive to the shear velocity than that to the flow depth. Second, the convergence of shear velocity in the multiple grain-size case as shown in Fig. 8(b) is slightly faster than that in the single grain-size class case in Fig. 5(a). A major difference between the two cases is that the state in the multiple grain-size case is the sediment flux for all grain-size classes, that is, a vector of size 10 with one component corresponding to each grain-size class. Accordingly, an observation in the multiple grain-size case is also a vector of size 10. This is much more information than in the single grain-size class case, where the state and observations are only scalars. More information in the observation leads to more accurate inference. Intuitively, one would expect the multiple grain-size class case to be more difficult, particularly considering the fact that there are two unknown parameters. Indeed, this apparently counterintuitive finding suggests that the setup here is not entirely realistic in that we used similar levels of relative error for both the single and multiple grain-size cases. In the field, when a sediment core of a given size is divided to yield grain-size distributions for multiple grain-size classes, the relative measurement error in the obtained grain-size distribution would inevitably increase compared to a single grain-size class case. Therefore, it would have been more realistic to use a larger observation error for the multiple grain-size case. The effect of observation errors on the inference results and the optimization of number of grain-size classes from a given sediment core are topics of future work. The decrease of uncertainties in the inferred parameters can be clearly seen in the probability density functions shown in Figs 8(c) and (d). For both parameters, the probability density functions shrink continuously towards the truth, indicating the gain of confidence as more observations are assimilated. Overall, in this more realistic and challenging test case, we found that the proposed method has equally satisfactory performance compared to that in the single grain-size case. Figure 8. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. Panels (c) and (d) show the evolution of probability density functions for the shear velocity and the flow depth, respectively, during the inversion process. In panel (c), a zoomed-in view is presented as inset to show the detailed results in the first few steps. Figure 8. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. Panels (c) and (d) show the evolution of probability density functions for the shear velocity and the flow depth, respectively, during the inversion process. In panel (c), a zoomed-in view is presented as inset to show the detailed results in the first few steps. 4.3 Case 3: realistic application For this case, a deposit column with a 0.12 m thickness from the 2006 South Java Tsunami is used as the observation (Spiske et al. 2010). We simultaneously infer both unknown parameters, shear velocity u* and flow depth h. The particle sizes range from 0 to 5.25 in ϕ scale, or equivalently from 1 mm to 0.0263 mm in diameter. This range was divided into 15 grain-size classes ϕi with i = 1, …, 15, equally spaced in ϕ scale. The convergence history of shear velocity and flow depth is presented in Fig. 9. We can see that the scattering of both parameters is reduced. Within 200 steps, flow depths in most of the samples converges to the range of 6 to 8 m from initial range of 4–12 m. Meanwhile, shear velocities in the samples converges to a small interval around 0.235 m s−1. Similar to what has been shown in synthetic case 2, the shear velocity converges faster than the flow depth does. The scattering of shear velocity samples is significantly reduced after 200 steps, while the flow depth samples are still largely scattered, indicating a relatively large posterior inference uncertainty. This is because the sediment fluxes are not sensitive to the flow depth, especially when the flow depth is large. Specifically, the Rouse profiles depicted in Fig. 1 shows that the sediment concentration is low in the upper region of the water column. When the flow is deep, the change of its depth does not significantly affects the sediment concentration in the upper layers of the water, and thus the sediment fluxes are not sensitive to the flow depth. The probability density functions of the flow depth and shear velocity in the last time step (posterior) are shown in Figs 9(c) and (d), respectively. The posterior distributions of flow depth and shear velocity are approximately Gaussian with mean values of 7.0 m and 0.236 m s−1, respectively. Figure 9. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h, for case 3. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. The green (light grey) lines denote samples and the red dashed lines denote sample means. Panels (c) and (d) show the probability density functions for the inferred shear velocity and the flow depth, respectively, at the final time step (200). The sample mean is denoted with red dashed line. Figure 9. View largeDownload slide Convergence history of the inferred parameters, the shear velocity u* and the flow depth h, for case 3. Panels (a) and (b) show the evolution of the ensemble and the sample mean for the shear velocity and the flow depth, respectively. The green (light grey) lines denote samples and the red dashed lines denote sample means. Panels (c) and (d) show the probability density functions for the inferred shear velocity and the flow depth, respectively, at the final time step (200). The sample mean is denoted with red dashed line. Since there are no ground truths of flow depth and shear velocity available for validation in this realistic case, we have to validate the inference results indirectly. Specifically, the depth-averaged velocity is computed by using the inferred shear velocity and flow depth based on eq. (1) and is compared with that obtained by Spiske et al. (2010) with the inversion model TsuSedMod. The comparison is shown in Fig. 10(a). We can see that the depth-averaged velocity from our inference is distributed from 7.8 to 8.2 m s−1 with a mean value of 8.0 m s−1. This posterior credible interval covers the inference result (8.06 m s−1) of Spiske et al. (2010), although the sample mean from our inference is slightly smaller. A possible explanation of the discrepancy in the two mean estimates is that Spiske used the entire deposit for the TsuSedMod model to perform the inversion, but the sediments in the bottom layers of the deposit may be transported by bed load. Since the TsuSedMod model assumes the sediments is formed only by suspended load (Jaffe & Gelfenbuam 2007), the velocity obtained by Spiske may have been overestimated. In order to compare the inversion results with the field data, we perform forward simulations of sedimentation with the inferred parameters, that is, mean values of posterior flow depth and shear velocity. Fig. 10(b) compares the grain-size distribution obtained based on the inferred parameters with that from the field data. It can be seen that the inferred grain-size distribution agrees very well with the field data, which shows a satisfactory performance of the proposed inversion scheme in the realistic case. However, it should be noted that this indirect validation of the inference does not guarantee the inferred quantities (i.e. flow depth and shear velocity) are accurate, since the model error of the forward model affects the accuracy of the inversion. Using a forward model of higher fidelity and accuracy can lead to further improved inferences. Finally, we briefly comment on the computational cost of the proposed inversion framework compared to the previous tsunami inversion schemes. Previously developed inversion schemes all used the optimization-based target shooting. Their computational costs depend on the choice of optimization scheme. Although gradient-based optimization methods are more efficient, obtaining the gradient information is very difficult for complex nonlinear physical system. Without the gradient information and a reduced model, the brutal-force data fitting scheme involves a large number of forward-model evaluations, which might be much more computationally expensive. In particular, the computational costs could exponentially increase when the dimension of parameter space increases. The currently adopted EnKF method is a Monte Carlo based, derivative-free inference approach. It is easy to implement and without the need of deriving adjoint equations. In this paper, 1000 samples (see Table 1) are used, and thus 1000 evaluations of the forward model are involved. However, the ensemble size can be significantly reduced based on a comprehensive parameter study (Tang et al. 2016), which shows that only 40 samples can achieve the same level of inference accuracy. Note that the forward model evaluations can be done in parallel, that is, 40 forward sedimentation simulations were run simultaneously on 40 CPU cores. As a result, the wall time of the inference is approximately the same as the forward sedimentation simulation. Figure 10. View largeDownload slide Comparison of the two inferred quantities, the depth-averaged velocity and the grain-size distribution, with the previous inference of Spiske et al. (2010) (depth-averaged velocity) and field data (grain-size distribution), respectively. Panel (a) shows the comparison of depth-averaged velocity calculated based on the inferred shear velocity u* and water depth h from the proposed inversion scheme with that of TsuSedMod results (Spiske et al. 2010). Panel (b) compares the grain-size distribution obtained from the forward simulation with the inferred parameter u* and h with the field data. Figure 10. View largeDownload slide Comparison of the two inferred quantities, the depth-averaged velocity and the grain-size distribution, with the previous inference of Spiske et al. (2010) (depth-averaged velocity) and field data (grain-size distribution), respectively. Panel (a) shows the comparison of depth-averaged velocity calculated based on the inferred shear velocity u* and water depth h from the proposed inversion scheme with that of TsuSedMod results (Spiske et al. 2010). Panel (b) compares the grain-size distribution obtained from the forward simulation with the inferred parameter u* and h with the field data. 5 CONCLUSION In this work, we proposed a novel inference scheme based on EnKF to infer tsunami flow speed and flow depth from tsunami deposits. In contrast to traditional data assimilation methods using EnKF, an important novelty of the current work is that the system state is augmented to include both the physical variables (sediment fluxes) and the unknown parameters to be inferred, i.e. shear velocity and flow depth. Consequently, the unknown tsunami flow characteristics are inferred in a rigorous Bayesian way with quantified uncertainties, which clearly distinguishes our method with existing tsunami inversion schemes. Two test cases with synthetic observation data are used to verify the proposed inversion scheme. Numerical results show that the tsunami characteristics inferred from the tsunami deposit information have favourable agreement with the synthetic truths, which demonstrated the merits of the proposed tsunami inversion scheme. A realistic case with field data is studied, and the results are compared to those obtained with a previous inversion model TsuSedMod and are validated by the field data. The comparisons indicate a satisfactory performance of the proposed inference scheme on realistic applications. It should be noted that current work focuses on inferring tsunami flow characteristics at a single location, which are affected by both large-scale tsunami wave characteristics and local geological conditions, for example, surface roughness and bathymetry. To further consider the local geological features and to infer the large-scale wave information, the unknown geological parameters also can be augmented into the state to be inferred, and a more comprehensive 2-D sediment transport model needs to be adopted to assimilate deposits from multiple locations simultaneously. In summary, the proposed scheme is a promising tool for the study of palaeotsunamis in the interrogation of sediment records to infer tsunami characteristics. ACKNOWLEDGEMENTS JXW gratefully acknowledge partial funding of graduate research assistantship from the Institute for Critical Technology and Applied Science (ICTAS, grant number 175258) in this effort. The authors would like to acknowledge the two reviewers, Prof David Tappin and Dr Ruoqian Wang, and the editor, Prof Paul Martin Mai, whose comments helped improving the clarity and quality of the manuscript. REFERENCES Barth H.G., 1984. Modern Methods of Particle Size Analysis , vol. 73, John Wiley & Sons. Bourgeois J., Hansen T.A., Wiberg P.L., Kauffman E.G., 1988. A tsunami deposit at the cretaceous-tertiary boundary in Texas, Science , 241( 4865), 567– 570. Google Scholar CrossRef Search ADS PubMed  Caya A., Sun J., Snyder C., 2005. A comparison between the 4DVAR and the ensemble Kalman filter techniques for radar data assimilation, Mon. Weather Rev. , 133( 11), 3081– 3094. Google Scholar CrossRef Search ADS   Dawson A.G., Stewart I., 2007. Tsunami deposits in the geological record, Sedimentary Geol. , 200( 3), 166– 183. Google Scholar CrossRef Search ADS   Dearing J.A., Elner J.K., Happey-Wood C.M., 1981. Recent sediment flux and erosional processes in a Welsh upland lake-catchment based on magnetic susceptibility measurements, Quat. Res. , 16( 3), 356– 372. Google Scholar CrossRef Search ADS   Elsheikh A.H., Pain C.C., Fang F., Gomes J.L.M.A., Navon I.M., 2012. Parameter estimation of subsurface flow models using iterative regularized ensemble Kalman filter, Stochastic Env. Res. Risk Assess. , 27( 4), 877– 897. Google Scholar CrossRef Search ADS   Evensen G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. geophys. Res. , 99( C5), 10 143– 10 162. Google Scholar CrossRef Search ADS   Evensen G., 2003. The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dyn. , 53( 4), 343– 367. Google Scholar CrossRef Search ADS   Evensen G., 2009. Data Assimilation: the Ensemble Kalman Filter , Springer Science & Business Media. Evensen G., Van Leeuwen P.J., 2000. An ensemble Kalman smoother for nonlinear dynamics, Mon. Weather Rev. , 128( 6), 1852– 1867. Google Scholar CrossRef Search ADS   Ferraiuolo G., Pascazio V., Schirinzi G., 2004. Maximum a posteriori estimation of height profiles in InSAR imaging, IEEE Geosci. Remote Sens. Lett. , 1( 2), 66– 70. Google Scholar CrossRef Search ADS   Gelfenbaum G., Smith J.D., 1986. Experimental evaluation of a generalized suspended-sediment transport theory, Shelf Sands and Sandstones. Memoir , 11, 133– 144. Iglesias M.A., Law K.J., Stuart A.M., 2013. Ensemble Kalman methods for inverse problems, Inverse Probl. , 29( 4), 045001. Google Scholar CrossRef Search ADS   Jaffe B.E., Gelfenbuam G., 2007. A simple model for calculating tsunami flow speed from tsunami deposits, Sedimentary Geol. , 200( 3), 347– 361. Google Scholar CrossRef Search ADS   Jaffe B.E. et al.  , 2011. Flow speed estimated by inverse modeling of sandy sediment deposited by the 29 September 2009 tsunami near Satitoa, East Upolu, Samoa, Earth-Sci. Rev. , 107( 1–2), 23– 37. Google Scholar CrossRef Search ADS   Jaffe B.E., Goto K., Sugawara D., Richmond B.M., Fujino S., Nishimura Y., 2012. Flow speed estimated by inverse modeling of sandy tsunami deposits: results from the 11 March 2011 tsunami on the coastal plain near the Sendai Airport, Honshu, Japan, Sedimentary Geol. , 282, 90– 109. Google Scholar CrossRef Search ADS   Law K.J., Stuart A.M., 2012. Evaluating data assimilation algorithms, Mon. Weather Rev. , 140( 11), 3757– 3782. Google Scholar CrossRef Search ADS   Madsen O., Wright L., Boon J., Chisholm T., 1993. Wind stress, bed roughness and sediment suspension on the inner shelf during an extreme storm event, Cont. Shelf Res. , 13( 11), 1303– 1324. Google Scholar CrossRef Search ADS   Moore A.L., McAdoo B.G., Ruffman A., 2007. Landward fining from multiple sources in a sand sheet deposited by the 1929 Grand Banks tsunami, Newfoundland, Sedimentary Geol. , 200( 3), 336– 346. Google Scholar CrossRef Search ADS   Nwe T.T., Tokuzo H., 2010. Sediment transport due to flooding from levee breach: a numerical application of the 2004 Niigata flood, Int. J. River Basin Manage. , 8( 1), 3– 14. Google Scholar CrossRef Search ADS   Peters W. et al.  , 2005. An ensemble data assimilation system to estimate CO2 surface fluxes from atmospheric trace gas observations, J geophys. Res. , 110, D24304, doi:10.1029/2005JD006157. Google Scholar CrossRef Search ADS   Richmond B.M. et al.  , 2012. Erosion, deposition and landscape change on the Sendai coastal plain, Japan, resulting from the March 11, 2011 Tohoku-oki tsunami, Sedimentary Geol. , 282, 27– 39. Google Scholar CrossRef Search ADS   Soulsby R.L., Smith D.E., Ruffman A., 2007. Reconstructing tsunami run-up from sedimentary characteristics–a simple mathematical model, in Coastal Sediments'07 , pp. 1075– 1088, eds Kraus N.C., Rosati J.D., American Society of Civil Engineers. Spiske M., Weiss R., Bahlburg H., Roskosch J., Amijaya H., 2010. The TsuSedMod inversion model applied to the deposits of the 2004 Sumatra and 2006 Java tsunami and implications for estimating flow parameters of palaeo-tsunami, Sedimentary Geol. , 224( 14), 29– 37. Google Scholar CrossRef Search ADS   Spiske M., Piepenbreier J., Benavente C., Kunz A., Bahlburg H., Steffahn J., 2013. Historical tsunami deposits in Peru: Sedimentology, inverse modeling and optically stimulated luminescence dating, Quat. Int. , 305, 31– 44. Google Scholar CrossRef Search ADS   Tang H., Weiss R., 2015. A model for TSUnami FLow INversion from deposits (TSUFLIND), Mar. Geol. , 99( C5), 10 143– 10 162. Tang H., Wang J., Weiss R., Xiao H., 2016. TSUFLIND-EnKF: Inversion of tsunami flow depth and flow speed from deposits with quantified uncertainties, Mar. Geol. , doi:10.1016/j.margeo.2016.11.009. Tappin D.R. et al.  , 2014. Did a submarine landslide contribute to the 2011 Tohoku tsunami, Marine Geol. , 357, 344– 361. Google Scholar CrossRef Search ADS   Tuck E.O., Hwang L., 1972. Long wave generation on a sloping beach, J. Fluid Mech. , 51( 03), 449– 461. Google Scholar CrossRef Search ADS   Witter R., Jaffe B., Zhang Y., Priest G., 2012. Reconstructing hydrodynamic flow parameters of the 1700 tsunami at Cannon Beach, Oregon, USA, Nat. Hazards , 63( 1), 223– 240. Google Scholar CrossRef Search ADS   Xiao H., Young Y.L., Prevost J.H., 2010. Hydro- and morpho-dynamic modeling of breaking solitary waves over a fine sand beach. Part II: Numerical simulation, Mar. Geol. , 269( 3), 119– 131. Google Scholar CrossRef Search ADS   Young Y.L., Xiao H., Maddux T.B., 2010. Hyro- and morpho-dynamic modeling of breaking solitary waves over sand beach. Part I: experimental study, Mar. Geol. , 269( 3), 107– 118. Google Scholar CrossRef Search ADS   APPENDIX A: DETAILED ALGORITHM FOR ENSEMBLE KALMAN FILTERING The algorithm of the ensemble Kalman filtering for data assimilation and inverse modelling is summarized below. Given the prior distribution of the parameters to be inferred (shear velocity u* and flow depth h) and sediment flux observations with error covariance matrix R, the following steps are performed: Sampling step. Generate initial ensemble $$\lbrace {\mathbf {x}_j}\rbrace _{j = 1}^M$$ of size M, where the augmented system state is   \begin{equation*} \mathbf {x} = [\zeta _1, \ldots , \zeta _n, u_*, h]^{\prime }. \end{equation*} Prediction step. Propagate the state from current state at time t to the next assimilation step t + ΔT with the forward model TSUFLIND, indicated as $$\mathcal {F}$$,   \begin{eqnarray*} \hat{\mathbf {x}}_j^{(t+\Delta T)} &=& \mathcal {F} [ \mathbf {x}_j^{(t)} ] \end{eqnarray*} in which ΔT = ΔNΔt, indicating that the observation data is assimilated every ΔN time steps. Estimate the mean $$\bar{\mathbf {x}}$$ and covariance P(n + 1) of the ensemble as   \begin{eqnarray} \bar{\mathbf {x}}^{(t+\Delta T)} = \frac{1}{M}\sum _{j=1}^N{\hat{\mathbf {x}}^{(t+\Delta T)}_j} \end{eqnarray} (A1a)  \begin{eqnarray} \mathbf {P}^{(t+\Delta T)} = \frac{1}{M-1} \sum _{j = 1}^{N} {\left( \hat{\mathbf {x}}_j\hat{\mathbf {x}}_j^{\prime } - \bar{\mathbf {x}}\bar{\mathbf {x}}^{\prime } \right)^{(t+\Delta T)}}. \end{eqnarray} (A1b) (iii) Analysis step. Compute the Kalman gain matrix as   \begin{equation*} \mathbf {K}^{(t+\Delta T)} = \mathbf {P}^{(t+\Delta T)} \mathbf {H}^{\prime } (\mathbf {H} \mathbf {P}^{(t+\Delta T)} \mathbf {H}^{\prime } + \mathbf {R})^{-1}. \end{equation*} Update each sample in the predicted ensemble as follows:   \begin{eqnarray*} \mathbf {x}_j^{(t+\Delta T)} &=& \hat{\mathbf {x}}_j^{(t+\Delta T)} + \mathbf {K} (\boldsymbol{\zeta }_j - \mathbf {H} \hat{\mathbf {x}}_j^{(t+\Delta T)}). \end{eqnarray*} (iv) Repeat the prediction and analysis steps until all observations are assimilated. APPENDIX B: NOTATION $$h$$  water depth  $$n$$  number of grain-size classes  $$U$$  depth-averaged flow speed  $$u_{\ast}$$  shear velocity  $$w$$  settling velocity of particles  $$z$$  elevation above the bed  $$z_{0}$$  total roughness of the bed  X  augmented system state  C(z)  vertical profile of sediment concentration  Ci,0  sediment concentration of class i at the bed  C0  total sediment concentration at the bed  D  mean particle diameter  $$\cal{F}$$  forward model for sedimentation  K  eddy viscosity  M  number of samples  N  number of time steps  T  total time for all sediments to settle  U  depth-averaged flow velocity  R  observation covariance matrix  $$\mathbb{R}^{n}$$  space of n-dimensional real-valued vectors  P  ensemble covariance matrix  K  Kalman gain matrix  H  observation matrix  $$h$$  water depth  $$n$$  number of grain-size classes  $$U$$  depth-averaged flow speed  $$u_{\ast}$$  shear velocity  $$w$$  settling velocity of particles  $$z$$  elevation above the bed  $$z_{0}$$  total roughness of the bed  X  augmented system state  C(z)  vertical profile of sediment concentration  Ci,0  sediment concentration of class i at the bed  C0  total sediment concentration at the bed  D  mean particle diameter  $$\cal{F}$$  forward model for sedimentation  K  eddy viscosity  M  number of samples  N  number of time steps  T  total time for all sediments to settle  U  depth-averaged flow velocity  R  observation covariance matrix  $$\mathbb{R}^{n}$$  space of n-dimensional real-valued vectors  P  ensemble covariance matrix  K  Kalman gain matrix  H  observation matrix  View Large Greek letters ζ  sedimentation flux  κ  Von Karman constant  φ  logarithmic scale of particle diameter  σi  standard deviation of observation noise for the ith grain-size class  Δt  time step  Δz  water column layer thickness  Δη  deposit layer thickness  ΔN  time steps between two observations  ΔT  time interval between two observations  ζ  sedimentation flux  κ  Von Karman constant  φ  logarithmic scale of particle diameter  σi  standard deviation of observation noise for the ith grain-size class  Δt  time step  Δz  water column layer thickness  Δη  deposit layer thickness  ΔN  time steps between two observations  ΔT  time interval between two observations  View Large Subscripts/Superscripts i  index of grain-size class  j  index of ensemble member  l  indices of time step, sediment layer, and water column layer  0  initial ensemble  i  index of grain-size class  j  index of ensemble member  l  indices of time step, sediment layer, and water column layer  0  initial ensemble  View Large Decorative symbols $$\widetilde{\square }$$  synthetic truth  $$\overline{\square }$$  mean  $$\widehat{\square }$$  forecast state in EnKF  $${\square }\prime$$  vector/matrix transpose  $$\widetilde{\square }$$  synthetic truth  $$\overline{\square }$$  mean  $$\widehat{\square }$$  forecast state in EnKF  $${\square }\prime$$  vector/matrix transpose  View Large © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.

### Journal

Geophysical Journal InternationalOxford University Press

Published: Jan 1, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 12 million articles from more than
10,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Unlimited reading Read as many articles as you need. Full articles with original layout, charts and figures. Read online, from anywhere. ### Stay up to date Keep up with your field with Personalized Recommendations and Follow Journals to get automatic updates. ### Organize your research It’s easy to organize your research with our built-in tools. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. ### Monthly Plan • Read unlimited articles • Personalized recommendations • No expiration • Print 20 pages per month • 20% off on PDF purchases • Organize your research • Get updates on your journals and topic searches$49/month

14-day Free Trial

Best Deal — 39% off

### Annual Plan

• All the features of the Professional Plan, but for 39% off!
• Billed annually
• No expiration
• For the normal price of 10 articles elsewhere, you get one full year of unlimited access to articles.

$588$360/year

billed annually

14-day Free Trial