Towards full waveform ambient noise inversion

Towards full waveform ambient noise inversion SUMMARY In this work we investigate fundamentals of a method—referred to as full waveform ambient noise inversion—that improves the resolution of tomographic images by extracting waveform information from interstation correlation functions that cannot be used without knowing the distribution of noise sources. The fundamental idea is to drop the principle of Green function retrieval and to establish correlation functions as self-consistent observables in seismology. This involves the following steps: (1) We introduce an operator-based formulation of the forward problem of computing correlation functions. It is valid for arbitrary distributions of noise sources in both space and frequency, and for any type of medium, including 3-D elastic, heterogeneous and attenuating media. In addition, the formulation allows us to keep the derivations independent of time and frequency domain and it facilitates the application of adjoint techniques, which we use to derive efficient expressions to compute first and also second derivatives. The latter are essential for a resolution analysis that accounts for intra- and interparameter trade-offs. (2) In a forward modelling study we investigate the effect of noise sources and structure on different observables. Traveltimes are hardly affected by heterogeneous noise source distributions. On the other hand, the amplitude asymmetry of correlations is at least to first order insensitive to unmodelled Earth structure. Energy and waveform differences are sensitive to both structure and the distribution of noise sources. (3) We design and implement an appropriate inversion scheme, where the extraction of waveform information is successively increased. We demonstrate that full waveform ambient noise inversion has the potential to go beyond ambient noise tomography based on Green function retrieval and to refine noise source location, which is essential for a better understanding of noise generation. Inherent trade-offs between source and structure are quantified using Hessian-vector products. Interferometry, Inverse theory, Computational seismology, Seismic noise, Seismic tomography, Theoretical seismology 1 INTRODUCTION The ambient noise field was long regarded as ‘not of great seismological interest in itself ’ (Agnew & Berger 1978), but has in recent years attracted considerable attention as a valuable source of information on 3-D Earth structure. This paradigm shift was possible due to the raising awareness in seismology that the system response can be extracted from noise recordings. As pointed out by Snieder & Larose (2013) the underlying principle actually dates back more than a century to the paper by Einstein (1906) on Brownian motion. Although Aki (1957) and Claerbout (1968) used similar approaches to study surface waves propagating in the near surface or to relate the reflection response to the transmission response, this field of research only gained momentum after the work by Lobkis & Weaver (2001), who derived and demonstrated in a laboratory experiment that the interstation correlation of recordings of a diffuse field with random and uncorrelated modes is proportional to the Green function between the respective receivers. Campillo & Paul (2003) applied the idea first to earthquake coda and extracted the Green tensor for surface waves. Follow-up studies then confirmed the emergence of surface waveforms from correlations of microseismic noise (Shapiro & Campillo 2004; Sabra et al. 2005b) and prompted a large number of regional and also global tomographic studies (Shapiro et al. 2005; Sabra et al. 2005a; Nishida et al. 2009). While seismic tomography based on ambient noise flourished, theoretical derivations established that the equality of the interstation correlation and the Green function only holds under specific conditions, including wavefield diffusivity and equipartitioning (Lobkis & Weaver 2001; Weaver & Lobkis 2004; Sánchez-Sesma & Campillo 2006; Snieder et al. 2007; Tsai 2009) or the isotropic distribution of both mono- and dipolar uncorrelated noise sources (Wapenaar 2004; Wapenaar & Fokkema 2006; Snieder 2007). These assumptions are typically not satisfied on Earth and the ambient seismic noise field at any frequency is excited by a heterogeneous and non-stationary distribution of noise sources (Rhie & Romanowicz 2004; Nishida & Fukao 2007; Ardhuin et al. 2011; Stutzmann et al. 2012; Gualtieri et al. 2013; Ardhuin et al. 2015; Ermert et al. 2016). Despite the strong empirical evidence that at least fundamental-mode surface waves can be extracted and although tomographic images are broadly consistent with models based on earthquake data (Nishida et al. 2009; Stehly et al. 2009; Haned et al. 2016), breaking these assumptions has various consequences. Amplitudes of noise correlation functions are difficult to interpret and complicate attenuation studies (Harmon et al. 2010; Prieto et al. 2011; Tsai 2011; Stehly & Boué 2017). Heterogeneous noise source distributions affect traveltime measurements (Tsai 2009; Weaver et al. 2009; Yao & van der Hilst 2009; Froment et al. 2010; Harmon et al. 2010; Delaney et al. 2017) and introduce spurious arrivals, which can be mistaken as sound information (Snieder et al. 2006). In addition, certain seismic phases such as body waves and higher-mode surfaces waves are not excited correctly (Halliday & Curtis 2008; Kimman & Trampert 2010), which leads to a poor depth resolution in tomographic images. Besides correction factors presented in some of the studies above, numerous processing and stacking schemes have been proposed to mitigate the difference between reality and the assumptions necessary for Green function retrieval (Bensen et al. 2007; Groos et al. 2012). A common approach is to select correlation functions that resemble plausible Green functions and to discard the rest, which may affect up to 70 per cent of a data set (Stehly et al. 2009). Correlation functions that are strongly altered by heterogeneous noise source distributions may therefore be excluded although they contain valuable information on 3-D Earth structure (Landès et al. 2010). For example, strong noise sources perpendicular to the line connecting two stations manifest themselves in wave packets around zero lag, which cannot be interpreted reasonably as a feature of a Green function. Other processing steps vary for each data set and are usually guided by the objective of the respective study. The specific choices change the actual waveforms in correlation functions, which unavoidably leaves an imprint on the sensitivity to noise sources and Earth structure (Fichtner 2014; Fichtner et al. 2017). During the last decade, great progress has been made in ambient noise tomography based on the principle of Green function retrieval. The discussed shortcomings and corresponding ways to circumvent them, however, reflect that this principle cannot always be readily invoked and ambient noise tomography reaches a limit. Modern tomographic methods exploiting details in waveforms for the benefit of improved resolution (Pratt 1999; Chen et al. 2007; Tape et al. 2009; Fichtner et al. 2009) cannot be applied directly, since Green function retrieval does not account for the influence of the distribution of noise sources on the actual waveforms in correlation functions. Motivation and outline In order to address these limitations, we attempt to develop a method—in the following referred to as full waveform ambient noise inversion—that is capable of accounting for the distribution of noise sources in space and frequency, 3-D heterogeneous Earth structure and the full seismic wave propagation physics. The fundamental idea is to drop the principle of Green function retrieval and to extract information directly from correlation functions and establish them as self-consistent observables in seismology. A similar approach is used in helioseismology to study solar structure and dynamics (Woodard 1997; Gizon & Birch 2002; Hanasoge et al. 2011) and has recently been applied to terrestrial seismology (Tromp et al. 2010; Basini et al. 2013; Hanasoge 2013, 2014; Fichtner 2014). In the following, we present a more general and extended formulation of the theory that allows us not only to compute first, but also second derivatives, which are necessary for an uncertainty analysis accounting for intra- and interparameter trade-offs. In addition, we develop an inversion scheme to simultaneously constrain noise sources and Earth structure, and demonstrate in a 2-D example that this approach has the potential to improve resolution and the location of noise sources substantially. 2 FORWARD THEORY To forward model interstation correlations of ambient noise, we adopt the method developed by Woodard (1997) in helioseismology and recently transported to terrestrial seismology by Tromp et al. (2010). We start with the definition of the frequency-domain Green function $$G_{i,n}({\bf x},\boldsymbol{\xi })$$ as the solution of the governing equations when the right-hand side equals a point-localized force at position $$\boldsymbol{\xi }$$ in n-direction, that is,   \begin{equation} \mathcal {L}[\, G_{i,n}({\bf x},\boldsymbol{\xi })\, ] = \delta _{in} \delta ({\bf x}-\boldsymbol{\xi }), \end{equation} (1)with the Kronecker delta δin and the forward modelling operator $$\mathcal {L}$$. In the interest of a condensed notation, dependencies are omitted wherever possible, here on the angular frequency ω. The operator notation allows us to develop a general theory independent of the specific choice of the forward problem. In seismology, for example, it could stand for any type of wave equation. We limit ourselves, however, to operators that are linear in their argument. The operator contains all model parameters m, in our case the material parameters, that determine the response of the physical system to an external force given by the right hand side in eq. (1). The i-component of the seismic displacement field ui(x) is connected with the Green function $$G_{i,n}({\bf x},\boldsymbol{\xi })$$ and the forcing term $$N_n(\boldsymbol{\xi })$$, in our case the source field of ambient noise, via an integral over the volume ⊕ of the Earth, known as representation theorem (Aki & Richards 2002)   \begin{equation} u_i({\bf x}) = \int \limits _\oplus G_{i,n}({\bf x},\boldsymbol{\xi }) N_n(\boldsymbol{\xi })\, \text{d}\boldsymbol{\xi }. \end{equation} (2)Einstein’s summation convention applies to repeated lower case subscripts. The cross-correlation of the noise fields ui(x1) and uj(x2) is in the frequency domain given by   \begin{equation} \mathcal {C}_{ij}({\bf x}_1,{\bf x}_2) = u_i({\bf x}_1) u_j^*({\bf x}_2) = \int \limits _\oplus \int \limits _\oplus G_{i,n}({\bf x}_1,\boldsymbol{\xi }_1) N_n(\boldsymbol{\xi }_1) G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }_2) N_m^*(\boldsymbol{\xi }_2)\, \text{d}\boldsymbol{\xi }_1\, \text{d}\boldsymbol{\xi }_2. \end{equation} (3)We then consider the average $$\overline{\mathcal {C}_{ij}({\bf x}_1,{\bf x}_2)}$$ over many realizations   \begin{equation} C_{ij}({\bf x}_1,{\bf x}_2) = \overline{\mathcal {C}_{ij}({\bf x}_1,{\bf x}_2)} = \int \limits _\oplus \int \limits _\oplus G_{i,n}({\bf x}_1,\boldsymbol{\xi }_1) G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }_2)\ \overline{N_n(\boldsymbol{\xi }_1) N_m^*(\boldsymbol{\xi }_2)} \ \text{d}\boldsymbol{\xi }_1\, \text{d}\boldsymbol{\xi }_2, \end{equation} (4)which is estimated in standard processing of interstation correlation functions by stacking over many time intervals (Bensen et al. 2007). As pointed out by Snieder et al. (2010) dissipation effectively resets the clock after a characteristic decay time so that consecutive wavefield samples are practically independent. The evaluation of eq. (4) is prohibitively expensive, not only because of the double integral, but also since many realizations of the noise sources are required. This has only been done for simplified wave propagation scenarios (Cupillard & Capdeville 2010; van Driel et al. 2015; Fichtner et al. 2017). To reduce computational costs, we assume that the correlation between different noise sources decays relatively quickly with distance compared to the seismic wavelength and that the contribution from collocated noise sources to the correlation function is dominant. We therefore approximate $$\overline{N_n(\boldsymbol{\xi }_1)N_m^*(\boldsymbol{\xi }_2)}$$ with a delta function in space, that is,   \begin{equation} \overline{N_n(\boldsymbol{\xi }_1)N_m^*(\boldsymbol{\xi }_2)}=S_{nm}(\boldsymbol{\xi }_1)\delta (\boldsymbol{\xi }_1-\boldsymbol{\xi }_2), \end{equation} (5)with the source power-spectral density $$S_{nm}(\boldsymbol{\xi })$$. This assumption is frequently made (Woodard 1997; Tromp et al. 2010; Hanasoge 2013, 2014; Fichtner 2014; Nishida 2014) and its quality has to be assessed for each data set. In any case, the correlation length in space will be finite and the forward theory will remain computable, since the double integral has to be evaluated only for a small region in space. A detailed discussion can be found in Section 5. For uncorrelated noise sources eq. (3) condenses to   \begin{equation} C_{ij}({\bf x}_1,{\bf x}_2) = \int \limits _\oplus G_{i,n}({\bf x}_1,\boldsymbol{\xi }) G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }) S_{nm}(\boldsymbol{\xi })\, \text{d}\boldsymbol{\xi }. \end{equation} (6)Comparing (6) to the representation theorem (2) we see that Cij(x1, x2) can be interpreted as the i-component of a correlation wavefield Cj(x, x2)   \begin{equation} C_{ij}({\bf x},{\bf x}_2)=\int \limits _\oplus G_{i,n}({\bf x},\boldsymbol{\xi })\, \Bigl [G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }) S_{nm}(\boldsymbol{\xi })\Bigr ]\, \text{d}\boldsymbol{\xi }, \end{equation} (7)evaluated at position x = x1. Its numerical computation can be described with the following recipe: step 1: Using source-receiver reciprocity, compute the Green function $$G_{m,j}(\boldsymbol{\xi },{\bf x}_2)$$ with source at x2. step 2: Multiply its complex conjugate with the noise source power-spectral density $$S_{nm}(\boldsymbol{\xi })$$. step 3: Model the correlation wavefield as a solution of the wave equation with $$G_{m,j}^*(\boldsymbol{\xi },{\bf x}_2) S_{nm}(\boldsymbol{\xi })$$ as distributed source and sample the correlation wavefield at any position where another receiver is located. We now have a forward model for the computation of interstation correlation functions for arbitrary distributions of the power-spectral density in frequency and space. It is important to note that this approach does not require any assumptions on wavefield diffusivity and equipartitioning (Lobkis & Weaver 2001; Weaver & Lobkis 2004; Sánchez-Sesma & Campillo 2006) or the isotropic distribution of both mono- and dipolar uncorrelated noise sources (Wapenaar 2004; Wapenaar & Fokkema 2006; Snieder 2007) that would be necessary to equate correlations with Green functions. 2.1 General formulation of the forward theory To facilitate further developments regarding sensitivity kernels and Hessian-vector products and, in addition, to be independent of the choice of time or frequency domain, we propose the following two partial differential equations (PDE) for a general formulation describing the modelling of interstation correlation functions according to the recipe above:   \begin{equation} \underbrace{\mathcal {L}[\, G_{m,j}({\bf x},{\bf x}_2)\, ] = \delta _{mj}\, D(\, 0, {\bf x}_2\, )}_{{\bf step 1}} \end{equation} (8)  \begin{equation} \underbrace{\mathcal {L}[\, C_{ij}({\bf x}, {\bf x}_2)\, ] = \underbrace{\delta _{in}\, P[\, S_{nm}({\bf x}), G_{m,j}({\bf x}, {\bf x}_2)\, ]}_{{\bf step 2}}}_{{\bf step 3}}, \end{equation} (9)where D( ts, xs) represents two delta functions, one at time ts or the respective frequency domain equivalent and the other in space at xs. P denotes a bilinear function acting on Snm(x) and Gm, j(x, x2). In the frequency domain P can simply be a multiplication of Snm(x, ω) and $$G_{m,j}^*({\bf x}, {\bf x}_2,\omega )$$, thereby again invoking Einstein’s summation convention. This would translate to a correlation between Gm, j(x, x2, t) and the time domain equivalent Snm(x, t) of the power-spectral density, and summing over m. These are only examples and other definitions are possible (see next subsection). 2.2 Example for membrane waves After each paragraph covering theoretical developments, we apply the main findings to membrane waves (Tanimoto 1990; Peter et al. 2007; Tape et al. 2007) to give a better understanding of the concepts. Since they can be interpreted as an analogue to fundamental mode surface waves, we also refer to them as surface waves in the following. The operator $$\mathcal {L}$$ describing membrane waves travelling in the x − y plane with vertical displacement u is in the time domain given by   \begin{equation} \mathcal {L}(\circ ) = \rho \, \partial _{tt} \circ -\, \partial _{\bf x}\bigl [\mu \, \partial _{\bf x}\, \circ \bigr ], \end{equation} (10)with density ρ, shear modulus μ and the (second) partial derivatives with respective to space ∂x and time ∂tt. The symbol ○ represents a place holder, in this case for a scalar wavefield. Eqs (8) and (9) take the specific form   \begin{equation} \rho \, \partial _{tt} G({\bf x}, {\bf x}_2, t) -\, \partial _{\bf x}\bigl [\mu \, \partial _{\bf x} G({\bf x}, {\bf x}_2, t) \bigr ] = \delta (t) \delta ({\bf x}-{\bf x}_2), \end{equation} (11)  \begin{equation} \rho \, \partial _{tt} C({\bf x},{\bf x}_2,t) - \partial _{\bf x}\bigl [\mu \, \partial _{\bf x} C({\bf x},{\bf x}_2,t)\bigr ] = \underbrace{\frac{1}{2\pi } \rm {Re} \int \limits _{-\infty }^{\infty } S({\bf x}, \omega )\, \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t)\, e^{-i\omega t}\, \text{d}t\, \Bigr ]^* e^{i\omega t} \, \text{d}\omega }_{=:P[\, S({\bf x}),\, G({\bf x}, {\bf x}_2)\, ]}. \end{equation} (12)The innermost integral on the right hand side of eq. (12), bracketed with [.], is an approximation of the Fourier transform (see Appendix A) of the Green function for a finite time interval τ = [t1, t2]. The interval is chosen such that (t2 − t1)−1 is much smaller than the minimum frequency of interest in order to avoid contamination of the Fourier spectrum. The complex conjugate of the Fourier transformed Green function, equivalent to its time reversed version, is multiplied with the power-spectral density S(x, ω) and the product is transformed back to the time domain. The final forcing term of the correlation wavefield is a time-dependent quantity that is distributed in space. In a realistic inverse problem, however, a continuous and infinite-dimensional frequency dependence cannot be resolved. We therefore choose a parametrization of the power-spectral density in terms of L basis functions Sl(x) that are constant within a period band given by ωl and ωl + 1 and zero outside. The integral over frequency is thus split up in a sum of L integrals, each covering the frequency band of the respective basis function. Eq. (12) changes accordingly to   \begin{equation} \rho \, \partial _{tt} C({\bf x},{\bf x}_2,t) - \partial _{\bf x}\bigl [\mu \, \partial _{\bf x} C({\bf x},{\bf x}_2,t)\bigr ] = \underbrace{\frac{1}{2\pi } \textrm{Re} \sum _l S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t)\, e^{-i\omega t}\, \text{d}t\, \Bigr ]^* e^{i\omega t} \, \text{d}\omega }_{=:\,P[\, S({\bf x}),\, G({\bf x}, {\bf x}_2)\, ]}. \end{equation} (13)This represents only one possibility to adapt P[ S(x), G(x, x2) ] for inversions and other definitions might be more feasible, but for clarity we base the following examples on eq. (13). Correlation functions for one frequency band are shown in Fig. 1 for different distributions in space. The two heterogeneous noise source distributions lead to asymmetric correlations functions with waves arriving earlier than the surface wave, which cannot be interpreted reasonably by Green function retrieval. Figure 1. View largeDownload slide Correlation functions for different noise source distributions computed for a frequency band of 0.05–0.18 Hz and a homogeneous velocity model with v = 4000 km s−1. The noise source distributions for (b) and (c) are shown on the right. Each correlation function is normalized. Figure 1. View largeDownload slide Correlation functions for different noise source distributions computed for a frequency band of 0.05–0.18 Hz and a homogeneous velocity model with v = 4000 km s−1. The noise source distributions for (b) and (c) are shown on the right. Each correlation function is normalized. 3 THE CONTINUOUS ADJOINT METHOD FOR FIRST AND SECOND DERIVATIVES 3.1 First derivatives The objective of full waveform ambient noise inversion is to find material parameters m and a power-spectral density distribution Snm, which minimizes a misfit $$\mathcal {X}$$ used to quantify the deviation of synthetic correlation functions and their corresponding observations. Since $$\mathcal {X}$$ is generally a non-linear function of m and Snm, we approximate the optimum with the help of iterative minimization algorithms. In order to obtain an efficient formulation to compute directional derivatives of $$\mathcal {X}$$ with respect to the power-spectral density and Earth structure, we apply the continuous adjoint method in the following (Tarantola 1988; Tromp et al. 2005; Fichtner et al. 2006) to the PDEs (8) and (9) describing the modelling of correlation functions. For the sake of a clean notation we present the derivation only for scalar wavefields. The extension to elastic media follows the same steps. 3.1.1 Source kernel The choice of the misfit functional depends on the specific problem and influences to a large extent the results of an inversion. Since the formalism described in the following, however, can be applied to any differentiable misfit functional, we write the misfit in a generic form by two integrals, which we denote by 〈.〉. One integral is over space $${\bf x} \in \oplus \subset \mathbb {R}^3$$ and the other over time t ∈ τ = [t1, t2] or equivalently over angular frequency ω ∈ W = [ω1, ω2] (Fichtner 2010). We therefore write   \begin{equation} \mathcal {X}({\bf m},S) = \Bigl \langle \chi (C ({\bf m},S) ) \Bigr \rangle , \end{equation} (14)where χ is a space and time dependent evaluation of the discrepancy between synthetics and observations. For observations at specific locations, for example, at seismic stations, χ can be defined by introducing δ functions in space. Without loss of generality, we consider a time interval τ well suited for correlation functions, which is centred around time zero, that is, τ = [−tc, tc], where tc is the maximum lag of the correlation function. An improvement of the fit translates to a decrease of the misfit functional $$\mathcal {X}$$. In order to invert for the distribution of noise sources in space, we are therefore interested in the directional derivative of $$\mathcal {X}$$ with respect to S. Applying the chain rule we get   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S = \nabla _C\mathcal {X}\, \delta _S C = \Bigl \langle \nabla _C\chi \, \delta _S C \Bigr \rangle , \end{equation} (15)where   \begin{equation} \delta _S C = \nabla _S C\, \delta S \end{equation} (16)represents the first-order change of the correlation C for a variation in the source distribution δS. The difficulty of eq. (15) lies in the appearance of δSC that is computationally expensive to evaluate for all possible directions δS necessary for a complete evaluation of the directional derivative. A first-order finite-difference approximation of $$\nabla _{S}\mathcal {X}$$ would mean to evaluate C(m, S + δS) for all possible directions δS, which becomes infeasible for expensive forward calculations or large model domains. Thus, the following considerations are directed to eliminate δSC in eq. (15). To achieve this, we differentiate eq. (9) with respect to S, which gives   \begin{equation} \mathcal {L}(\delta _{S}C) - P(\, \delta S, G\, ) = 0, \end{equation} (17)where we use that $$\mathcal {L}$$ is linear in C and P linear in S, that is,   \begin{equation} \nabla _C\mathcal {L}(C)\, \delta _{S}C = \mathcal {L}(\delta _{S}C), \end{equation} (18)and   \begin{equation} \nabla _S \, P(\, S, G\, )\, \delta S = P(\, \delta S, G\, ). \end{equation} (19)Multiplying eq. (17) by an arbitrary test function u†, applying the integrals 〈.〉 and adding it to eq. (15) yields   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S = \Bigl \langle \nabla _C\chi \, \delta _S C + u^\dagger \, \mathcal {L}( \delta _{S}C ) - u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle . \end{equation} (20)Invoking the adjoint operators defined by   \begin{equation} \Bigl \langle \nabla _C\chi \, \delta _S C \Bigr \rangle = \Bigl \langle \delta _{S} C\, \nabla _C\chi ^\dagger \Bigr \rangle \end{equation} (21)and   \begin{equation} \Bigl \langle u^\dagger \, \mathcal {L}(\delta _{S}C)\Bigr \rangle = \Bigl \langle \delta _{S}C\, \mathcal {L}^\dagger (u^\dagger )\Bigr \rangle \end{equation} (22)for any u† and δSC, we obtain   \begin{eqnarray} \nabla _{S}\mathcal {X} \, \delta S & = & \Bigl \langle \delta _{S} C\, \nabla _C\chi ^\dagger + \delta _{S}C\, \mathcal {L}^\dagger (u^\dagger ) - u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle \nonumber\\ & = & \Bigl \langle \delta _{S} C \, [\, \nabla _C\chi ^\dagger + \mathcal {L}^\dagger (u^\dagger )\, ] - u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle . \end{eqnarray} (23)We can now eliminate δSC when a field u† can be found such that the adjoint equation   \begin{equation} \mathcal {L}^\dagger ( u^\dagger ) = -\nabla _C\chi ^\dagger \end{equation} (24)is satisfied. The right hand side of eq. (24) is usually referred to as adjoint source time function and we denote it by f in the example sections below. The directional derivative of $$\mathcal {X}$$ in eq. (15) then simplifies to   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S = -\Bigl \langle u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle . \end{equation} (25)Choosing the frequency domain for illustration, eq. (25) can be written as   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S({\bf x},\omega ) = -\int \limits _\oplus \int \limits _W u^\dagger ({\bf x},\omega )\, P\bigl [\, \delta S({\bf x},\omega ), G({\bf x}, {\bf x}_2, \omega )\, \bigr ] \, \text{d}{\bf x}\, \text{d}\omega = \int \limits _\oplus \int \limits _W K(x, \omega )\, \delta S({\bf x},\omega ) \, \text{d}{\bf x}\, \text{d}\omega , \end{equation} (26)where we define the frequency dependent noise source kernel as   \begin{equation} K(x, \omega ) = - u^\dagger ({\bf x},\omega )\, P\bigl [\, \circ , G({\bf x}, {\bf x}_2, \omega )\, \bigr ]. \end{equation} (27)In the time domain, any definition of P includes an integral over angular frequency ω. Consequently, a noise source kernel can be defined in a similar way in the time domain. A kernel represents the first-order change of the measured misfit and tells us in our case where changes in the source distribution have an influence on the misfit. By construction, we can now compute $$\nabla _{S}\mathcal {X} \, \delta S$$ for any direction δS without the explicit knowledge of δSC. The price we have to pay is to find the adjoint operator $$\mathcal {L}^\dagger$$ and to solve the adjoint problem according to eq. (24). In non-dissipative media, the wave equation operator is self-adjoint, meaning that $$\mathcal {L}=\mathcal {L}^\dagger$$. The derivation of $$\mathcal {L}^\dagger$$ can be found, for example, in Fichtner (2010). In the time domain, the adjoint eq. (24) has to be solved subject to terminal conditions, requiring that the adjoint wavefield is zero at time tc. In practice, it is solved backwards in time, such that the terminal conditions act as initial conditions for the numerical simulation (Tarantola 1988; Tromp et al. 2005; Plessix 2006; Fichtner et al. 2006). Example for membrane waves After the derivation of the source kernel, we come back to the example introduced in Section 2.2. Suppose we have a perfect Earth model for a certain study area and we calculate synthetic correlation functions for a homogeneous distribution of noise sources. Since noise sources at any frequency are heterogeneously distributed and non-stationary (Ardhuin et al. 2011; Stutzmann et al. 2012; Ardhuin et al. 2015), there is a discrepancy between synthetics and observations. We are therefore interested in how each basis function Sl(x) has to be changed to improve the fit between synthetics and observations. According to eq. (25) this is given by   \begin{eqnarray} \nabla _{S_l}\chi \, \delta S_l & = & -\Bigl \langle u^\dagger \, P(\, \delta {S_l}, G\, )\Bigr \rangle \nonumber\\ & = & - \frac{1}{2\pi } \textrm{Re} \int \limits _\oplus \int \limits _\tau u^\dagger ({\bf x}, t)\, \delta S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t^{\prime })\, e^{-i\omega t^{\prime }} \text{d}t^{\prime } \Bigr ]^* e^{i\omega t} \, \text{d}\omega \, \text{d}t\, \text{d}{\bf x}. \end{eqnarray} (28)Since Sl is constant in the corresponding frequency band, the kernel Kl can be defined as   \begin{equation} K_l({\bf x}) = - \frac{1}{2\pi } \textrm{Re} \int \limits _\tau u^\dagger ({\bf x},t) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t^{\prime })\, e^{-i\omega t^{\prime }} \, \text{d}t^{\prime }\, \Bigr ]^* e^{i\omega t} \, \text{d}\omega \, \text{d}t, \end{equation} (29)such that   \begin{equation} \nabla _{S_l}\chi \, \delta S_l = \int \limits _\oplus K_l({\bf x}) \delta S_l({\bf x})\, \text{d}{\bf x}. \end{equation} (30)The frequency dependence of the noise source is in this example honoured by L noise source kernels, one for each basis function. The adjoint wavefield u†(x) can be interpreted as a time reversed Green function G(x, x1, −t), with its source at the receiver x1 where the measurement is taken, convolved ⋆ with the adjoint source time function f(t). We rewrite eq. (29) accordingly and obtain   \begin{equation} K_l({\bf x}) = - \int \limits _\tau f(t) \star G({\bf x}, {\bf x}_1, -t)\, G_l({\bf x}, {\bf x}_2, -t)\, \text{d}t, \end{equation} (31)where we abbreviate the complex conjugate of the finite Fourier transform and its band-limited inverse, covering the angular frequencies of the corresponding basis function, with Gl(x, x2, −t). Since the time reversed Green function is purely acausal, the kernel has to be evaluated only for times t ≤ 0. Examples of source kernels are presented in Fig. 2. The general shape is governed by the interaction of two Green functions, one at the reference station x2 and the other at the receiver x1, leading to a superposition of hyperbolas with varying eccentricity (Hanasoge 2013; Fichtner 2015; Ermert et al. 2016). Geometrically, a hyperbola can be defined as a set of points, for which the difference between any point and two foci, in our case the two receivers, is constant. Therefore, any source along one hyperbola contributes to the correlation waveform at a specific traveltime. Taking a measurement in a specific time window thus selects a set of hyperbolas and the measurement type can be interpreted as a weighting factor. The alternating signs of the hyperbolas indicate different Fresnel zones. For infinite frequency, the kernels collapse into rays starting at one focal point, directed away from the second. Figure 2. View largeDownload slide Effect of windowing and adjoint source time function on source kernels for a frequency band of 0.02–0.2 Hz. The general shape of the source kernel, that is, the hyperbolas (top left) are governed by two Green functions. Windowing the correlation function, selects different parts of the sensitivity kernel, for example, for the surface wave signal from 73 to 127 s (bottom left) or times around zero lag from −10 to 10 s (top right). Whereas the first three kernels are calculated for waveform differences, the last (bottom right) corresponds to an asymmetry measurement, both computed without data. For details about the specific misfit functionals see Section 4. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. Figure 2. View largeDownload slide Effect of windowing and adjoint source time function on source kernels for a frequency band of 0.02–0.2 Hz. The general shape of the source kernel, that is, the hyperbolas (top left) are governed by two Green functions. Windowing the correlation function, selects different parts of the sensitivity kernel, for example, for the surface wave signal from 73 to 127 s (bottom left) or times around zero lag from −10 to 10 s (top right). Whereas the first three kernels are calculated for waveform differences, the last (bottom right) corresponds to an asymmetry measurement, both computed without data. For details about the specific misfit functionals see Section 4. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. 3.1.2 Structure kernel The assumption of having a perfect structure model as in the example above will, of course, never be fulfilled and missing Earth structure in the model m will contribute to discrepancies between synthetics and observations as well. Therefore, we follow the procedure described in Section 3.1.1 in order to derive sensitivity kernels for material parameters m. The directional derivative of the misfit functional $$\mathcal {X}$$ with respect to δm is given by   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \nabla _C\mathcal {X}\, \delta _{\bf m} C = \Bigl \langle \nabla _C \chi \, \delta _{{\bf m}} C \Bigr \rangle , \end{equation} (32)where   \begin{equation} \delta _{\bf m} C = \nabla _{\bf m} C\, \delta {\bf m} \end{equation} (33)denotes the derivative of the correlation function C with respect to m in the direction of δm. The problem in eq. (32) is this time the appearance of δmC, which is computationally expensive to evaluate for all possible directions δm. In order to again eliminate the unpleasant term, we differentiate eq. (9) with respect to m  \begin{equation} \nabla _{{\bf m}} \mathcal {L}(C) \, \delta {\bf m} + \mathcal {L}(\delta _{{\bf m}}C)- P(S, \delta _{{\bf m}}G\, ) = 0, \end{equation} (34)where   \begin{equation} \delta _{\bf m} G = \nabla _{\bf m} G\, \delta {\bf m} \end{equation} (35)represents the derivative of the Green function G with respect to m in the direction of δm. Please note that we obtain three terms in eq. (34), because not only the correlation function C and the Green function G depend on the material parameters m, but also the operator $$\mathcal {L}$$. Multiplying eq. (34) with the adjoint wavefield u† defined in eq. (24), applying 〈.〉 and adding the term to eq. (32) leads to   \begin{equation} \delta _{{\bf m}}\chi = \Bigl \langle \nabla _C \chi \, \delta _{{\bf m}} C + u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + u^\dagger \, \mathcal {L}( \delta _{{\bf m}}C ) - u^\dagger \, P(S, \delta _{{\bf m}}G\, ) \Bigr \rangle . \end{equation} (36)With the same set of adjoint operators defined before in eqs (21) and (22), we eliminate terms one and three due to the definition of u† and obtain   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} - u^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle . \end{equation} (37)Since the source term for the correlation wavefield also depends on the material parameters m in terms of the Green function G, we still have the term δmG in eq. (37) and the directional derivative cannot be evaluated for arbitrary directions δm. Trying to eliminate δmG, we consider the derivative of eq. (8) with respect to m in the direction δm:   \begin{equation} \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + \mathcal {L}( \delta _{{\bf m}}G ) = 0 \end{equation} (38)Following the procedure above, that is, multiplying eq. (38) with a test field c†, applying the integrals 〈.〉 and adding the resulting term to eq. (37), yields   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} - u^\dagger \, P( S, \delta _{{\bf m}}G )+ c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + c^\dagger \mathcal {L}( \delta _{{\bf m}}G ) \Bigr \rangle . \end{equation} (39)Now invoking a second pair of adjoint operators   \begin{equation} \Bigl \langle u^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle = \Bigl \langle \delta _{{\bf m}}G\, P^\dagger ( S, u^\dagger ) \Bigr \rangle , \end{equation} (40)  \begin{equation} \Bigl \langle c^\dagger \, \mathcal {L}( \delta _{{\bf m}}G ) \Bigr \rangle = \Bigl \langle \delta _{{\bf m}}G\, \mathcal {L}^\dagger ( c^\dagger ) \Bigr \rangle , \end{equation} (41)we can alter eq. (39) to   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} - \delta _{{\bf m}}G\, [ P^\dagger ( S, u^\dagger ) - \mathcal {L}^\dagger (c^\dagger ) ] \Bigr \rangle . \end{equation} (42)When a field c† can be found that satisfies the second adjoint equation given by   \begin{equation} \mathcal {L}^\dagger ( c^\dagger ) = P^\dagger ( S, u^\dagger ), \end{equation} (43)the term δmG is eliminated. The derivative of the misfit functional $$\mathcal {X}$$ with respect to material parameters m can then be computed for any direction δm by   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} \Bigr \rangle = \int \limits _\oplus K({\bf x})\, \delta {\bf m}\, \text{d}{\bf x}, \end{equation} (44)where the sensitivity kernel K is in the time domain defined as   \begin{equation} K({\bf x}) = \int \limits _{\tau } u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C) + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \text{d}t. \end{equation} (45)For the computation of sensitivity kernels for Earth structure, two adjoint equations given by eqs (24) and (43) have to be solved. The steps in solving the adjoint equations follow the recipe introduced in Section 2 to forward model correlation functions: First, a wavefield u† is computed, which is then combined with the power-spectral density distribution according to P†( S, u† ). The combination then drives in the last step a second wavefield c† as distributed source following eq. (43). For a further interpretation of the second adjoint wavefield, we first consider the right hand side of eq. (43) in the frequency domain and for a self-adjoint operator P, meaning that P = P†. We can then write   \begin{equation} P^\dagger \bigl [ S({\bf x},\omega ), u^\dagger ({\bf x},\omega ) \bigr ] = S({\bf x},\omega )\, u^{\dagger *}({\bf x},\omega ). \end{equation} (46)Using eq. (46) as the source term in the scalar version of the representation theorem in eq. (2) and writing again u† = f(ω) G*(x, x1, ω), we obtain   \begin{equation} c^\dagger ({\bf x},\omega ) = f^*(\omega ) \int \limits _\oplus G^*({\bf x},\boldsymbol{\xi },\omega ) \Bigl [S({\bf x},\omega ) G({\bf x}_1, \boldsymbol{\xi }, \omega )\Bigr ]\, \text{d}\boldsymbol{\xi }, \end{equation} (47)where the first Green function $$G^*({\bf x},\boldsymbol{\xi })$$ is time reversed, since c† is an adjoint wavefield. Comparing eq. (47) to the correlation wavefield given by expression (7), c† can be interpreted as an adjoint correlation wavefield with a reference station at x1, modulated with the adjoint source time function f(ω). The link between the derivation presented here and by Fichtner (2015) is presented in Appendix B. Example for membrane waves Similar to the derivation of the adjoint operator $$\mathcal {L}^\dagger$$, the explicit form of P† depends on the definition of P. A derivation for our example case is given in Appendix C. In addition, we apply the general expression for the derivative of the objective function $$\mathcal {X}$$ in direction δm (eq. (44)) to membrane waves and, for simplicity, only to variations in density ρ. Using the wave operator defined in eq. (10), we obtain for the derivative of the wave equation operator with respect to ρ   \begin{equation} \nabla _{\rho } \mathcal {L}(\circ ) \, \delta \rho = \delta \rho \, \partial _{tt}\circ . \end{equation} (48)Consequently, we find   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \partial _{tt}{C}\, \delta \rho + c^\dagger \, \partial _{tt}{G}\, \delta \rho \Bigr \rangle = \int \limits _\oplus K_{\rho }({\bf x})\, \delta \rho \, \text{d}{\bf x}. \end{equation} (49)Integration by parts and using the terminal conditions for the adjoint wavefield provides a more symmetric version of the density kernel, which is convenient for numerical simulations, and is given by   \begin{equation} K_\rho ({\bf x}) = \int \limits _\tau \partial _{t}u^\dagger ({\bf x},t)\, \partial _{t}C({\bf x},t)\, + \partial _{t}c^\dagger ({\bf x},t)\, \partial _{t}G({\bf x},t)\, \text{d}t. \end{equation} (50)Following the same recipe, kernels can be derived for the elastic parameters or seismic velocities. Examples for structure kernels for different source distributions are illustrated in Fig. 3. Inherent in each structure kernel are elliptical features. For any point along an ellipse, the sum of the distances to two foci is constant. A change of the medium at any of these points therefore affects the waveforms at a specific traveltime. For a point source and two receivers, the structure kernel is composed of ellipses, each with one focal point at the source position and the other at one of the receivers. Any other source distribution can be explained by a superposition of point sources. For the special case of a homogeneous distribution of noise sources, the sensitivity to each noise source location vanishes and the two focal points are located at the receivers. The kernel then resembles finite-frequency kernels for earthquake tomography (Dahlen et al. 2000; Dahlen & Baig 2002; Zhou et al. 2004; Tromp et al. 2005). The kernel width is again determined by the frequency content. Figure 3. View largeDownload slide Structure kernels for four different source distributions: homogeneous (top left), Gaussian anomaly (bottom left) and two point sources (right). In the last three scenarios, the source anomalies are indicated in green colours. All kernels are calculated for a frequency band of 0.02 to 0.2 Hz using waveform differences as misfit functional, computed without data. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. Figure 3. View largeDownload slide Structure kernels for four different source distributions: homogeneous (top left), Gaussian anomaly (bottom left) and two point sources (right). In the last three scenarios, the source anomalies are indicated in green colours. All kernels are calculated for a frequency band of 0.02 to 0.2 Hz using waveform differences as misfit functional, computed without data. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. 3.2 Second derivatives With the developments above, we can attribute differences in synthetic and observed correlation functions either to the material parameters m or to the power-spectral density distribution Snm. In order to establish the foundation of a resolution analysis that accounts for the resulting trade-offs between noise sources and Earth structure, we extend the theory to the computation of second derivatives. In the context of probabilistic inverse problems, the inverse Hessian matrix in the vicinity of the global model and under the assumption of Gaussian statistics can be interpreted as an approximation of the posterior covariance matrix (Tarantola 2005; Fichtner & Trampert 2011). The Hessian matrix therefore contains all the information on resolution and trade-offs that we are trying to retrieve. Since mixed derivatives are most important for the characterization of trade-offs, we focus on them in the following and show second derivatives only with respect to structure Hmm or source Hss in Appendix D. For the sake of a clean notation we again present the derivation only for the scalar case, while the same recipe applies for the extension to elastic media. Mixed source-structure derivative Hms The basic principles of the adjoint method applied above to get an efficient formulation for the computation of first derivatives, are reused in the following to compute second derivatives. We start with the directional derivative of the misfit functional $$\mathcal {X}$$ with respect to δm, which is given by   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \nabla _C\mathcal {X}\, \delta _{\bf m} C = \Bigl \langle \nabla _C \chi \, \delta _{{\bf m}} C \Bigr \rangle . \end{equation} (51)Since we are interested in mixed derivatives, we differentiate eq. (51) again, but now with respect to S, and we get   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \nabla _{S}\nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} \, \delta S = \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + \nabla _{C}\chi \, \delta _{{\bf m}S} C \Bigr \rangle , \end{equation} (52)where   \begin{equation} \delta _{{\bf m}S} C = \nabla _{S}\nabla _{\bf m}\, C\, \delta {\bf m}\, \delta S \end{equation} (53)denotes the second-order variation of the correlation function C with a perturbation in m and in S, and Hms( δm, δS ) denotes the Hessian with mixed derivatives applied to the directions δm and δS. The problem in eq. (52) is not only the combined appearance of δmC and δSC, but also the second derivative of the correlation function δmSC, which would be even more expensive to evaluate for all possible directions δm and δS. Trying to eliminate the latter, we differentiate eq. (9) with respect to S and then again with respect to m, yielding   \begin{equation} \mathcal {L}(\delta _{S}C) - P( \delta S, G ) = 0, \end{equation} (54)  \begin{equation} \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} + \mathcal {L}( \delta _{{\bf m}S}C ) - P( \delta S, \delta _{{\bf m}} G ) = 0. \end{equation} (55)Using again the adjoint wavefield u† and the two integrals over space and time/frequency 〈.〉 introduced in eq. (14) in the same manner as before, but now with eq. (52) leads to   \begin{eqnarray} H_{ms}( \delta {\bf m}, \delta S ) &=& \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + \nabla _{C}\chi \, \delta _{{\bf m}S} C \nonumber\\ && + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} + u^\dagger \, \mathcal {L}( \delta _{{\bf m}S}C ) - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle . \end{eqnarray} (56)With the set of adjoint operators defined in eqs (21) and (22), we can eliminate terms two and four due to the definition of u† in eq. (24) and obtain   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle , \end{equation} (57)where only first derivatives of the correlation function and the Green function are involved. Unfortunately, the choice of the first direction determines the effect of the second and it is therefore not possible to get a general expression for the Hessian operator that can be evaluated easily for all possible directions for δm and for δS. Our strategy is therefore to choose one direction δm and try to get an expression for Hms( δm, ○ ), also known as Hessian-vector product, which can be evaluated for all possible directions δS. To achieve this, we define the adjoint wavefield p† by   \begin{equation} \mathcal {L}^\dagger (p^\dagger ) = - \nabla _{C}\nabla _{C}\chi ^\dagger \, \delta _{{\bf m}} C - \nabla _{{\bf m}}\mathcal {L}^\dagger (u^\dagger ), \end{equation} (58)and invoke the adjoint operators   \begin{equation} \Bigl \langle [\nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C]\, \delta _{S} C= \delta _{S} C\, [\nabla _{C}\nabla _{C}\chi ^\dagger \, \delta _{{\bf m}} C]\Bigr \rangle \end{equation} (59)and   \begin{equation} \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C ) = \delta _{S}C\, \nabla _{{\bf m}}\mathcal {L}^\dagger (u^\dagger )\Bigr \rangle . \end{equation} (60)With the definition of p† and the two adjoint operators, we can change eq. (57) to   \begin{equation} H_{ms}(\delta \textbf{m}, \delta S) = \Bigl\langle -\delta_{S} C\, \mathcal{L}^\dagger(p^\dagger) - u^\dagger\,P(\delta S, \delta_{\textbf{m}} G) \Bigr\rangle = \Bigl\langle -p^\dagger\,\mathcal{L}(\delta_{S} C) - u^\dagger\,P(\delta S, \delta_{\textbf{m}} G) \Bigr\rangle. \end{equation} (61)With eq. (54) we remove $$\mathcal {L}(\delta _{S} C)$$ from eq. (61) and replace it with P( δS, G ). Since δS then only appears explicitly, we write   \begin{equation} H_{ms}( \delta \textbf{m}, \circ) = \Bigl\langle -p^\dagger\, P(\circ, G) - u^\dagger\,P(\circ, \delta_{\textbf{m}} G) \Bigr\rangle \end{equation} (62)to illustrate that it can be evaluated for all possible directions δS. Following the idea used in expression (26), a Hessian kernel can be defined in a similar way. The Hessian-vector product Hms( δm, ○ ) can be interpreted as the first-order change in the sensitivity to the distribution of noise sources caused by a perturbation δm in the structure model, which we can think of as a scatterer without loss of generality. The source kernel according to eq. (27) is composed of two components: the adjoint wavefield u† and the Green function G. The change of either one of them is accounted for by the two terms in eq. (62). The wavefield p† describes the change of the adjoint wavefield u† caused by scattering of the adjoint and the correlation wavefield, where the latter affects the misfit that drives the adjoint wavefield u†. The second term in eq. (62) accounts for the scattering of the Green function. A direct interaction of the change in the adjoint wavefield and the change in the Green function would lead to higher order terms. 4 INVERSION With the theoretical developments presented above, we now try to understand the physics of the problem in a numerical model in 2-D and study, how structure and heterogeneous noise source distributions express themselves in noise correlation functions in terms of measurable quantities. We then use the 2-D examples in order to find suitable joint inversion schemes and measurements that are likely to be useful in 3-D real-data problems. The feasibility of the developed strategy is demonstrated in a synthetic inversion. In the following, we use a 2-D finite-difference discretization of the PDE describing membrane waves (eq. 10). The correlations are computed in the time domain for a frequency band from 0.02 to 0.2 Hz. 4.1 Misfit study A key element for the inversion of both noise sources and structure is the design of misfit functionals that decouple the inversion as much as possible and the awareness which quantities can only be recovered jointly. We therefore start our developments by studying the effect of heterogeneous noise source distributions and of structure on noise correlation functions in terms of four different misfit functionals. The first one is based on the causal/acausal asymmetry of noise correlations, proposed by Ermert et al. (2016) for an inference of the distribution of noise sources with the supposition that it is to first order insensitive to unmodelled Earth structure. The asymmetry A is defined by   \begin{equation} A = \ln \frac{ {\int }_\tau [w_+(t) C(t)]^2 \text{d}t}{ {\int }_\tau [w_-(t) C(t)]^2 \text{d}t}, \end{equation} (63)where w+(t) and w−(t) are time windows centred around signals on the positive and negative lag, respectively. The discrepancy between asymmetries of synthetic and observed correlations can be quantified in terms of an L2-misfit. A closely related measurement, referred to as energy differences E in the following, directly compares the absolute energy of windowed signals in synthetics and observations and is given by   \begin{equation} E = \frac{ {\int }_\tau [w(t) C(t)]^2 \text{d}t - {\int }_\tau [w(t) C^o(t)]^2 \text{d}t}{{\int }_\tau [w(t) C^o(t)]^2 \text{d}t}, \end{equation} (64)where the superscript o denotes observations. These two misfits are complemented with waveform differences using the L2-norm and cross-correlation traveltime measurements. Following Luo & Schuster (1991) and Dahlen et al. (2000), we define the latter as the traveltime shift, where the time domain cross-correlation between a synthetic and an observed signal reaches its global maximum. With the exception of waveform differences, the measurements are, for now, only performed on the surface wave arrival. In this study, the measurement process is kept simplistic and information from other signals should be exploited for an application with real data. First, we compile a synthetic reference data set for an array of 16 stations for a homogeneous velocity model and a homogeneous distribution of noise sources, and generate random models with varying degree of complexity following the approach by Igel & Gudmundsson (1997). The strength of the anomalies is chosen such that scattering potential remains constant, that is, size of the anomaly squared times strength. With this scaling the induced traveltime shift of each anomaly is approximately the same independent of the complexity of the medium. The specific choice for the scaling does not alter the conclusion drawn from this study. The array and three velocity models with varying degree of complexity are shown in Fig. 4. We then perform two sets of experiments: Figure 4. View largeDownload slide Source distribution (top left) and velocity model with lowest (top right), moderate (bottom left) and highest complexity (bottom right) used in the misfit study. Receivers are indicated with black triangles and absorbing boundaries with dashed lines. The source distribution and the model with highest complexity are reused in the inversion example as target model. Figure 4. View largeDownload slide Source distribution (top left) and velocity model with lowest (top right), moderate (bottom left) and highest complexity (bottom right) used in the misfit study. Receivers are indicated with black triangles and absorbing boundaries with dashed lines. The source distribution and the model with highest complexity are reused in the inversion example as target model. Series I: Effect of structure We evaluate the effect of structural perturbations alone by forward modelling correlations for the different velocity models with a homogeneous source distribution and compute the discrepancy to the reference data set in terms of the four misfit functionals. The respective values are shown as red lines in Fig. 5. Figure 5. View largeDownload slide Effect of structure (red line) and the combined effect structure and noise sources (blue line) for four different misfit functions: (a) asymmetry, (b) traveltimes, (c) energy differences and (d) waveform differences. The misfit values are normalized by the number of recordings and their units are given in the titles. The average size of the anomalies in the velocity models is given in terms of the dominant wavelength. Figure 5. View largeDownload slide Effect of structure (red line) and the combined effect structure and noise sources (blue line) for four different misfit functions: (a) asymmetry, (b) traveltimes, (c) energy differences and (d) waveform differences. The misfit values are normalized by the number of recordings and their units are given in the titles. The average size of the anomalies in the velocity models is given in terms of the dominant wavelength. Series II: Effect of source We repeat this procedure with the same velocity models, but now with a Gaussian-shaped source anomaly superimposed on the homogeneous background. The Gaussian anomaly is placed on the left side of the array (see Fig. 4), which mimics a frequent configuration with strong sources near coasts (Stehly et al. 2006; Yang & Ritzwoller 2008; Tian & Ritzwoller 2015). The misfit values from this series of models show the combined effect of both noise sources and structure on the measurements and are shown in Fig. 5 as blue line. The difference between misfits caused by structure perturbations only (red lines) and those caused by combined structure and source perturbations (blue lines) provides information on the effect of the heterogeneous noise source. The asymmetry measurement is mostly affected by the heterogeneous noise source distribution and is insensitive, at least to first order, to Earth structure. We thus confirm the plausibility argument by Ermert et al. (2016) that a misfit based on the asymmetry of correlation functions is well suited to study the distribution of noise sources. In contrast, primarily velocity anomalies lead to traveltime differences and, as expected by previous studies (Tsai 2009; Weaver et al. 2009; Froment et al. 2010), the heterogeneous noise source has no profound effect for time-independent tomography. Monitoring applications, which investigate small velocity changes over small timescales, typically use coda waves of correlation functions (Brenguier et al. 2008; Obermann et al. 2013, 2014), aiming at reducing the effect of source variability. In cases where this is not possible, for example due to strong attenuation or the lack of sufficiently strong scatterers, the effect of heterogeneous noise distributions on traveltimes can, however, become significant or even dominant (Zhan et al. 2013; Delaney et al. 2017). The behaviour of the energy measurement changes with the complexity of the structure. For smooth velocity media, the energy of the signal is essentially controlled by the noise source. In strongly heterogeneous media, however, energy is altered through scattering, reflections and focusing effects. These phenomena also lead to more and more complex waveforms, which is expressed by the increase of the misfit based on waveform differences with higher complexity of the velocity model. The shift in magnitude introduced by the source anomaly reflects the resulting amplitude change in the correlation functions. Inversion strategy Based on the findings above, we propose the following inversion strategy. Since the influence of heterogeneous noise sources on traveltimes is small, a wave equation traveltime inversion (Luo & Schuster 1991) can be performed first, assuming a homogeneous distribution of noise sources. The resulting model can then be taken as background model for a source inversion based on the asymmetry of noise correlation functions, because this misfit is hardly affected by unmodelled Earth structure. Alternatively, the model inferred from traveltimes serves as initial model for a joint inversion for noise sources and structure using energy and waveform differences. 4.2 Inversion example To explore the potential and limitations of the proposed strategy, we extend the forward solver with an iterative inversion framework that is based on limited-memory BFGS (L-BFGS) (Nocedal & Wright 2006). In order to deal with the ill-posedness of the inverse problem and to avoid high oscillations in the inverted model, a Tikhonov regularization term (Tikhonov 1963) is added to the misfit function. After running several inversions with different scaling factors for the regularization term, we choose an appropriate value using a pragmatic L-curve approach (Hansen 1992). In addition, we include a Gaussian smoothing operator in the parametrization and choose the dominant wavelength as standard deviation. The inversion is stopped, when the norm of the gradient is reduced by a factor of 10−3, which can be achieved with reasonable computational effort and is close to the first-order necessary condition of optimality (Nocedal & Wright 2006). For the joint inversion of both noise sources and Earth structure, the two misfit functionals, that is, the energy measurement and waveform differences, are weighted using a convex combination. The weight is chosen based on the magnitude of the individual misfit functions, but no significant changes are observed in the inversion results by varying the relative weight by an order of magnitude. Combining different misfits aims at reducing the null space and at making the misfit functional locally convex, which can lead to a higher convergence rate. In our case, a weight favouring waveform differences does not exploit information about the noise source distribution that energy differences provide, thus the inversion converges slower. To compute synthetic data, we use the heterogeneous noise source distribution and the most complex structure model generated for the misfit study (Fig. 4). The typical size of the heterogeneities is approximately 1.5 times the dominant wavelength. Following the proposed strategy, first traveltimes are exploited by a wave equation traveltime inversion assuming a homogeneous distribution of noise sources. The inversion result is presented in Fig. 6. A smooth representation of the velocity structure is recovered and small scale features are lost. This demonstrates the limitation of traveltime tomography, but also its robustness to retrieve the big picture, even when heterogeneous noise source distributions are ignored. Figure 6. View largeDownload slide Target velocity model (left) and inverted model (right) obtained by a wave equation traveltime tomography with a homogeneous distribution of noise sources. To facilitate their comparison, the velocity models are shaded outside of the array, where low resolution is expected. Figure 6. View largeDownload slide Target velocity model (left) and inverted model (right) obtained by a wave equation traveltime tomography with a homogeneous distribution of noise sources. To facilitate their comparison, the velocity models are shaded outside of the array, where low resolution is expected. Using the velocity model obtained from traveltimes as background model, we invert for the distribution of noise sources based on the asymmetry measurement (Fig. 7). It is possible to locate the strong anomaly on the left side of the array. In regions with only the background noise level, the energy is decreased. By dividing the energy on the causal/acausal branch, the asymmetry of noise correlation functions is not sensitive to the absolute energy level. In our example, the asymmetry can therefore be explained by either introducing a corresponding change in the power-spectral density or with a similar anomaly, but smaller in magnitude and in combination with a reduction of the background noise level. Figure 7. View largeDownload slide Inverted source distribution using the asymmetry of noise correlation functions. The velocity model from the wave equation traveltime inversion is used as a background model. Figure 7. View largeDownload slide Inverted source distribution using the asymmetry of noise correlation functions. The velocity model from the wave equation traveltime inversion is used as a background model. Instead of only focusing on the source distribution, we finally perform a joint inversion starting with a homogeneous noise source distribution and the velocity model obtained from traveltimes (Fig. 8). Since the source inversion above based on the asymmetry measurement is not sensitive to the absolute energy level and we use a misfit function for the joint inversion that also exploits amplitude information, we do not take the inverted source model as initial model and start from a homogeneous distribution of noise sources. Within the array, a high resolution image of the target velocity model is recovered and the noise source anomaly is located accurately. Figure 8. View largeDownload slide Source distribution (left) and velocity model (right) recovered by the joint inversion. Figure 8. View largeDownload slide Source distribution (left) and velocity model (right) recovered by the joint inversion. 4.3 Trade-offs 4.3.1 Trade-offs and resolution for joint inversion model In order to assess resolution and trade-offs in the joint inversion, we apply the theoretical developments regarding second derivatives to the model obtained by the joint inversion for the source distribution and velocity model presented above. Introducing a perturbation in the structure model changes the sensitivity to Earth structure. Close to the vicinity of the optimum model, this change in the sensitivity would be the first update in an iterative inversion procedure. It therefore gives a conservative estimate of the resolution in terms of a point spread function, also referred to as spatial intra-parameter trade-off. In Fig. 9, we introduce a scatterer in the velocity model of approximately one third of the dominant wavelength in the centre of the array and compute the corresponding Hessian-vector product for the misfit function used in the joint inversion, that is, the convex combination of waveform and energy differences (right panel in Fig. 9). We infer an approximate width of the point spread function of three times the size of the scatterer, that is, one dominant wavelength, and a smearing of the perturbation in diagonal direction. Figure 9. View largeDownload slide Hessian applied to a perturbation in the velocity model. Introducing a perturbation in the velocity model in the centre of the array (green dots) changes the source kernel (left) and the structure kernel (right). Figure 9. View largeDownload slide Hessian applied to a perturbation in the velocity model. Introducing a perturbation in the velocity model in the centre of the array (green dots) changes the source kernel (left) and the structure kernel (right). At the same time, the perturbation of the velocity model changes the sensitivity to the distribution of noise sources (left panel in Fig. 9). Following the line of arguments above, the change of the source kernel indicates how noise sources can compensate for variations in Earth structure. It is interesting to note that no change in the source distribution coinciding with the introduced scatterer is proposed. 4.3.2 Source-structure trade-offs for different measurements For a further investigation of the trade-offs for the asymmetry measurement and energy differences, we compute Hessian-vector products for both measurements, but now for a perturbation in the source distribution at the centre of the array (Fig. 10). Comparing the ratios between the change of the structure kernel and the change of the source kernel is an indicator for the extent of trade-offs. The ratio for the asymmetry measurement is approximately one order of magnitude smaller and thus confirming its capability to better decouple the inverse problem than energy differences. Figure 10. View largeDownload slide Hessian applied to a perturbation in the source distribution (green dots) for the asymmetry measurement (top) and energy differences (bottom). Please note that absolute amplitudes of Hessian-vector products are difficult to interpret. Figure 10. View largeDownload slide Hessian applied to a perturbation in the source distribution (green dots) for the asymmetry measurement (top) and energy differences (bottom). Please note that absolute amplitudes of Hessian-vector products are difficult to interpret. 5 DISCUSSION AND OUTLOOK In the following paragraphs we discuss further details of the developed method, including benefits and limitations, as well as possible directions for further research. We end this section by giving a short outlook to the application to observed correlation functions. 5.1 Theory Due to the general formulation of the problem in Section 2.1 the derivations are equally applicable to time and frequency domain, and various wave propagation physics, without changing notation. The latter includes any type of medium, including 3-D elastic, heterogeneous and attenuating media. We cannot only reproduce specific derivations in the frequency domain (Hanasoge 2014; Fichtner 2014; Ermert et al. 2016), but also derive corresponding expressions in the time domain, well suited for implementation in modern 3-D wave propagation codes, such as the open-source waveform modelling and inversion package Salvus (Afanasiev et al. 2017) or SPECFEM (Komatitsch & Tromp 2002a,b; Peter et al. 2011). While the last point is also addressed by Tromp et al. (2010), the derivation of sensitivity kernels for the distribution of noise sources and structure is more compact in our approach. It furthermore renders the extension to second derivatives possible. Since the formulation is based on principles applied in conventional source-receiver full waveform inversion, it provides a foundation for collaboration and supports the re-use of already developed methods and tools. Beyond that, the theory unifies the earthquake-based two station method and ambient noise correlations, because we do not impose assumptions on the nature of the wavefield source in eq. (2). Without any processing that mitigates the effect of energetic signals, the sensitivity to Earth structure will be dominated by large earthquakes (Fichtner et al. 2017) and it might be more efficient to apply conventional two station workflows in such cases. Therefore, the unification primarily loosens the requirement in ambient noise tomography to exclude transient signals in order to be closer to the assumption of diffuse wavefields. In addition, with a function P in eq. (9) that is not bilinear in Snm(x) and Gm, j(x, x2), the same formulation can be applied to problems such as deconvolution (Snieder & Şafak 2006; Vasconcelos & Snieder 2008a,b) and provides as such a possible framework to generalize different approaches in interferometry. The assumption on uncorrelated noise sources introduced in eq. (5) facilitates the forward computation of noise correlation functions. It is, however, as such not crucial for the application of the method. The correlation length is finite and the evaluation of the double integral in eq. (4) remains feasible. The basic requirement for the theory to be applicable is to get close to the assumption made on the correlation length in space. In a real data application, one possibility to assess the quality of a specific choice might be to check if the data can be explained to within the observational uncertainties or if a change of the correlation length, maybe in terms of an additional inversion parameter, is required. To the best of the authors’ knowledge there is no observation yet about the quantitative correlation length of different noise sources and since we work with seismic wavelengths that are order of magnitudes larger than ocean waves that mostly generate noise, we use the assumption of uncorrelated noise sources until further insight becomes available. A further constraint on the correlation function itself in terms of convergence is not necessary. The presented theory is only strictly valid for linearly processed data. The goal for suitable processing is therefore inherently different from commonly applied schemes focusing mainly on improving Green function recovery (Bensen et al. 2007). One possibility is either to follow the approach proposed by Fichtner et al. (2017) and to account for non-linear processing or to keep the processing to a minimum and as linear as possible. 5.2 Inversion strategy and example We estimate the effect of heterogeneous noise sources and structure on four different misfit functionals and then propose an inversion strategy. The misfit study is not meant to be exhaustive and to cover all possible configurations. By the specific choice of the heterogeneous noise source distribution we intend to include the most common configuration with strong noise sources near coasts. To validate the main finding regarding the capability of the asymmetry measurement and energy differences to map noise sources, two related Hessian-vector products are included. After obtaining a model from a wave equation traveltime inversion assuming a homogeneous distribution of noise sources, further steps depend on the respective target. Either a source inversion can be performed using the asymmetry measurement or a joint inversion based on energy and waveform differences. In the latter, we include amplitude information, which highly depends on the applied processing and whose information content is in general up for debate (Prieto et al. 2011; Tsai 2011). If one is, however, not specifically interested in the absolute energy level of the noise sources, amplitude based misfit functionals can be used, since relative amplitudes within a carefully processed data set carry information. A possible alternative for the inversion of observed correlation functions is a time- and frequency-dependent phase misfit (Fichtner et al. 2008), which has been applied successfully in conventional source-receiver full waveform inversion (Fichtner et al. 2009, 2013; Fichtner & Villaseñor 2015; Simutė et al. 2016). In general, an important component of a successful application of full waveform ambient noise inversion is to explore more misfits and possibly develop new measurements. One approach could be to extract ‘too-early-to-be-physical’ arrivals. Under perfect conditions, we would retrieve the Green function by correlating noise traces and the first arrivals would be governed by the seismic velocity. Since we never fulfil the required assumptions, spurious arrivals are introduced even before the first possible arrivals. These contain valuable information on noise source locations that could improve source inversions. Further research in that direction is needed to assess its potential. The presented joint inversion example illustrates that it is possible to go beyond traditional noise tomography, which mostly exploits fundamental-mode surface waves. A good inference of the distribution of noise sources is essential, which is analogous to traditional source-receiver methods. We expect that the benefit of full waveform ambient noise inversion will be even larger in 3-D, since higher-mode surface waves are not excited correctly (Halliday & Curtis 2008; Kimman & Trampert 2010), which is necessary for their exploitation in methods that assume a perfect retrieval of the Green function. The presented inversion is performed only for one frequency band and we therefore only include one distribution of noise sources. Since noise sources are frequency dependent, a more complete approach would be to allow several distributions in space, for example in terms of L basis functions as presented in the examples above, and invert for them at the same time. The asymmetry measurement and also energy differences, however, do not carry enough frequency information and the same asymmetry or energy can be obtained by changing any basis function. For the application to data, we will for now honour the frequency dependence by inverting for noise sources in subsequent frequency bands. A joint approach is the topic for further research. 5.3 Trade-offs Traditional ambient noise tomography works under the assumption of a homogeneous distribution of noise sources and thereby seemingly avoids the necessity to constrain the noise source distribution and to characterize trade-offs. The concept is simplistic, but to some extent oversimplified, since it fails to describe the physical system where both the distribution of noise sources and Earth structure determine the actual waveforms in correlation functions. The proposed method allows us to jointly invert for the distribution of noise sources and structure. Similar to earthquake tomography, errors in the inference of source properties map into tomographic images of Earth structure (Valentine & Woodhouse 2010). Trade-offs between source and structure can never be avoided completely, but to honour and to be able to interpret results obtained by joint inversion they have to be quantified. Exactly for that purpose, we present a second-order adjoint approach, enabling us to perform a resolution analysis that accounts for intra- and interparameter trade-offs. The trade-off kernel in Fig. 9 indicates indeed that a change in the source distribution can compensate for a scatterer introduced in the structure. In the inversion example, this trade-off seems to be manageable, but these trade-offs are likely to be larger when scattering is dominant. A detailed and quantitative analysis of trade-offs will, however, only be possible with data taking measurement errors into account and is subject for further research. The same concepts can be applied to develop and characterize different misfit functionals aiming at decoupling the inversion as much as possible. 5.4 Application to data For the application of the developed method to observed correlation functions, we are currently extending the spectral element solver Salvus (http://salvus.io) to be able to compute correlation functions in 3-D media with heterogeneous noise sources at the surface and the corresponding sensitivity kernels for the distribution of noise sources and Earth structure. The first goal is a global inversion for the Hum period band. The application to regional scales poses further challenges, because noise sources and structure outside of the domain of interest affect the correlation functions. Further research is needed to solve this problem. 6 CONCLUSIONS In the previous sections we extended the theoretical background for full waveform ambient noise inversion that goes beyond Green function retrieval. It accounts for the distribution of noise sources in space and frequency, operates for any type of medium, including a 3-D elastic, heterogeneous Earth with attenuation and anisotropy, and models the full seismic wave propagation physics. The general formulation of the forward problem of computing correlation functions facilitates the application of adjoint techniques, which enables us to compute first and also second derivatives efficiently. The numerical study revealed that the sensitivity of different observables to the distribution of noise sources and to structure varies. Based on that, we designed and implemented an appropriate inversion scheme in order to constrain both of them. The presented joint inversion example demonstrates that full waveform ambient noise inversion has the potential to improve the resolution of tomographic images compared to traditional ambient noise tomography and, in addition, to refine noise source location. An improvement of the latter is essential for a better understanding of noise generation. Inherent trade-offs between source and structure are quantified using Hessian-vector products. ACKNOWLEDGEMENTS The authors would like to thank Editor Jean Virieux and two anonymous reviewers for their valuable comments and excellent suggestions to improve the paper. In addition, the authors wish to express their gratitude towards Laurent Stehly and Evan Delaney for useful discussions and brainstorming sessions. This research was supported by the Swiss National Supercomputing Center (CSCS) in the form of the GeoScale and ch1/s741 projects, and by the Swiss National Science Foundation (SNSF) under grant 200021_149143. The authors acknowledge support from and discussions within TIDES COST Action ES1401. REFERENCES Afanasiev M., Boehm C., van Driel M., Krischer L., May D.A., Rietmann M., Fichtner A., 2017. Salvus: an open-source high-performance package for full waveform modelling and inversion from laboratory to global scales , doi:10.5905/ethz-1007-88. Agnew D.C., Berger J., 1978. Vertical seismic noise at very low frequencies, J. geophys. Res. , 83( B11), 5420– 5424. Google Scholar CrossRef Search ADS   Aki K., 1957. Space and time spectra of stationary stochastic waves with special reference to microtremors, Bull. Earthq. Res. Inst. , 35, 415– 456. Aki K., Richards P., 2002. Quantitative Seismology , University Science Books. Ardhuin F., Stutzmann E., Schimmel M., Mangeney A., 2011. Ocean wave sources of seismic noise, J. geophys. Res. , 116, doi:10.1029/2011JC006952. Ardhuin F., Gualtieri L., Stutzmann E., 2015. How ocean waves rock the earth: two mechanisms explain microseisms with periods 3 to 300s, Geophys. Res. Lett. , 42( 3), 765– 772. Google Scholar CrossRef Search ADS   Basini P., Nissen-Meyer T., Boschi L., Casarotti E., Verbeke J., Schenk O., Giardini D., 2013. The influence of nonuniform ambient noise on crustal tomography in Europe, Geochem. Geophys. Geosyst. , 14, 1471– 1492. Google Scholar CrossRef Search ADS   Bensen G.D., Ritzwoller M.H., Barmin M.P., Levshin A.L., Lin F., Moschetti M.P., Shapiro N.M., Yang Y., 2007. Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements, Geophys. J. Int. , 169, 1239– 1260. Google Scholar CrossRef Search ADS   Brenguier F., Campillo M., Hadziioannou C., Shapiro N.M., Nadeau R.M., Larose E., 2008. Postseismic relaxation along the san andreas fault at Parkfield from continuous seismological observations, Science , 321( 5895), 1478– 1481. Google Scholar CrossRef Search ADS PubMed  Campillo M., Paul A., 2003. Long-range correlations in the diffuse seismic coda, Science , 299( 5606), 547– 549. Google Scholar CrossRef Search ADS PubMed  Chen P., Zhao L., Jordan T.H., 2007. Full 3D tomography for the crustal structure of the Los Angeles region., Bull. seism. Soc. Am. , 97, 1094– 1120. Google Scholar CrossRef Search ADS   Claerbout J.F., 1968. Synthesis of a layered medium from its acoustic transmission response, Geophysics , 33( 2), 264– 269. Google Scholar CrossRef Search ADS   Cupillard P., Capdeville Y., 2010. On the amplitude of surface waves obtained by noise correlation and the capability to recover the attenuation: a numerical approach, Geophys. J. Int. , 181, 1687– 1700. Dahlen F., Hung S.-H., Nolet G., 2000. Fréchet kernels for finite-frequency traveltimes – I. Theory, Geophys. J. Int. , 141, 157– 174. Google Scholar CrossRef Search ADS   Dahlen F.A., Baig F.A., 2002. Fréchet kernels for body wave amplitudes, Geophys. J. Int. , 150, 440– 466. Google Scholar CrossRef Search ADS   Delaney E., Ermert L., Sager K., Kritski A., Bussat S., Fichtner A., 2017. Passive seismic monitoring with nonstationary noise sources, Geophysics , 82( 4), KS57– KS70. Google Scholar CrossRef Search ADS   Einstein A., 1906. Zur Theorie der Brownschen Bewegung, Ann. Phys. , 4( 19), 371– 381. Google Scholar CrossRef Search ADS   Ermert L., Villaseñor A., Fichtner A., 2016. Cross-correlation imaging of ambient noise sources, Geophys. J. Int. , 204( 1), 347– 364. Google Scholar CrossRef Search ADS   Fichtner A., 2010. Full Seismic Waveform Modelling and Inversion. , Springer. Fichtner A., 2014. Source and processing effects on noise correlations, Geophys. J. Int. , 197, 1527– 1531. Google Scholar CrossRef Search ADS   Fichtner A., 2015. Source-structure trade-offs in ambient noise correlations, Geophys. J. Int. , 202( 1), 678– 694. Google Scholar CrossRef Search ADS   Fichtner A., Trampert J., 2011. Hessian kernels of seismic data functionals based upon adjoint techniques, Geophys. J. Int. , 185, 775– 798. Google Scholar CrossRef Search ADS   Fichtner A., Villaseñor A., 2015. Crust and upper mantle of the western Mediterranean - Constraints from full-waveform inversion, Earth planet. Sci. Lett. , 428, 52– 62. Google Scholar CrossRef Search ADS   Fichtner A., Bunge H.-P., Igel H., 2006. The adjoint method in seismology - I. Theory, Phys. Earth planet. Inter. , 157, 86– 104. Google Scholar CrossRef Search ADS   Fichtner A., Kennett B.L.N., Igel H., Bunge H.-P., 2008. Theoretical background for continental- and global-scale full-waveform inversion in the time-frequency domain., Geophys. J. Int. , 175, 665– 685. Google Scholar CrossRef Search ADS   Fichtner A., Kennett B.L.N., Igel H., Bunge H.-P., 2009. Full seismic waveform tomography for upper-mantle structure in the Australasian region using adjoint methods., Geophys. J. Int. , 179, 1703– 1725. Google Scholar CrossRef Search ADS   Fichtner A., Saygin E., Taymaz T., Cupillard P., Capdeville Y., Trampert J., 2013. The deep structure of the North Anatolian Fault Zone, Earth planet. Sci. Lett. , 373, 109– 117. Google Scholar CrossRef Search ADS   Fichtner A., Stehly L., Ermert L., Boehm C., 2017. Generalized interferometry - I: theory for interstation correlations, Geophys. J. Int. , 208( 2), 603– 638. Google Scholar CrossRef Search ADS   Froment B., Campillo M., Roux P., Gouédard P., Verdel A., Weaver R.L., 2010. Estimation of the effect of nonisotropically distributed energy on the apparent arrival time in correlations, Geophysics , 75, SA85– SA93. Google Scholar CrossRef Search ADS   Gizon L., Birch A.C., 2002. Time-distance helioseismology: The forward problem for random distributed sources, Astrophys. J. , 571( 2), 966. Google Scholar CrossRef Search ADS   Groos J.C., Bussat S., Ritter J.R.R., 2012. Performance of different processing schemes in seismic noise cross-correlations, Geophys. J. Int. , 188, 498– 512. Google Scholar CrossRef Search ADS   Gualtieri L., Stutzmann E., Capdeville Y., Ardhuin F. M., Schimmel A.M., Morelli A., 2013. Modeling secondary microseismic noise by normal mode summation, Geophys. J. Int. , 193, 1732– 1745. Google Scholar CrossRef Search ADS   Halliday D., Curtis A., 2008. Seismic interferometry, surface waves and source distribution, Geophys. J. Int. , 175, 1067– 1087. Google Scholar CrossRef Search ADS   Hanasoge S.M., 2013. The influence of noise sources on cross-correlation amplitudes, Geophys. J. Int. , 192, 295– 309. Google Scholar CrossRef Search ADS   Hanasoge S.M., 2014. Measurements and kernels for source-structure inversions in noise tomography, Geophys. J. Int. , 192, 971– 985. Google Scholar CrossRef Search ADS   Hanasoge S.M., Birch A., Gizon L., Tromp J., 2011. The adjoint method applied to time-distance helioseismology, Astrophys. J. , 738, doi:10.1088/0004–637X/738/1/100. Haned A., Stutzmann E., Schimmel M., Kiselev S., Davaille A., Yelles-Chaouche A., 2016. Global tomography using seismic hum, Geophys. J. Int. , 204( 2), 1222– 1236. Google Scholar CrossRef Search ADS   Hansen P.C., 1992. Analysis of discrete ill-posed problems by means of the L-curve, SIAM Rev. , 34( 4), 561– 580. Google Scholar CrossRef Search ADS   Harmon N., Rychert C., Gerstoft P., 2010. Distribution of noise sources for seismic interferometry, Geophys. J. Int. , 183( 3), 1470– 1484. Google Scholar CrossRef Search ADS   Igel H., Gudmundsson O., 1997. Frequency-dependent effects on travel times and waveforms of long-period S and SS waves, Phys. Earth. planet. Int. , 104, 229– 246. Google Scholar CrossRef Search ADS   Kimman W., Trampert J., 2010. Approximations in seismic interferometry and their effects on surface waves, Geophys. J. Int. , 182, 461– 476. Komatitsch D., Tromp J., 2002a. Spectral-element simulations of global seismic wave propagation, part II: 3-D models, oceans, rotation, and gravity, Geophys. J. Int. , 150, 303– 318. Google Scholar CrossRef Search ADS   Komatitsch D., Tromp J., 2002b. Spectral-element simulations of global seismic wave propagation, part I: validation, Geophys. J. Int. , 149, 390– 412. Google Scholar CrossRef Search ADS   Landès M., Hubans F., Shapiro N.M., Paul A., Campillo M., 2010. Origin of deep ocean microseisms by using teleseismic body waves, J. geophys. Res. , 115( B5), B05302, doi:10.1029/2009JB006918. Google Scholar CrossRef Search ADS   Lobkis O.I., Weaver R.L., 2001. On the emergence of the Green’s function in the correlations of a diffuse field, J. acoust. Soc. Am. , 110, 3011– 3017. Google Scholar CrossRef Search ADS   Luo Y., Schuster G.T., 1991. Wave-equation traveltime inversion., Geophysics , 56, 645– 653. Google Scholar CrossRef Search ADS   Nishida K., 2014. Source spectra of seismic hum, Geophys. J. Int. , 199( 1), 416– 429. Google Scholar CrossRef Search ADS   Nishida K., Fukao Y., 2007. Source distribution of Earth’s background free oscillations, J. geophys. Res. , 112( B6), B06306, doi:10.1029/2006JB004720. Google Scholar CrossRef Search ADS   Nishida K., Montagner J.-P., Kawakatsu H., 2009. Global surface wave tomography using seismic hum, Science , 326( 5949), 112. Nocedal J., Wright S., 2006. Numerical Optimization , Springer Science & Business Media. Obermann A., Planès T., Larose E., Campillo M., 2013. Imaging preeruptive and coeruptive structural and mechanical changes of a volcano with ambient seismic noise, J. geophys. Res. , 118( 12), 6285– 6294. Google Scholar CrossRef Search ADS   Obermann A., Froment B., Campillo M., Larose E., Planès T., Valette B., Chen J.H., Liu Q.Y., 2014. Seismic noise correlations to image structural and mechanical changes associated with the Mw 7.9 2008 Wenchuan earthquake, J. geophys. Res. , 119( 4), 3155– 3168. Google Scholar CrossRef Search ADS   Peter D., Tape C., Boschi L., Woodhouse J.H., 2007. Surface wave tomography: global membrane waves and adjoint methods, Geophys. J. Int. , 171( 3), 1098– 1117. Google Scholar CrossRef Search ADS   Peter D. et al.  , 2011. Forward and adjoint simulations of seismic wave propagation on fully unstructured hexahedral meshes, Geophys. J. Int. , 186, 721– 739. Google Scholar CrossRef Search ADS   Plessix R.-E., 2006. A review of the adjoint-state method for computing the gradient of a functional with geophysical applications, Geophys. J. Int. , 167, 495– 503. Google Scholar CrossRef Search ADS   Pratt R.G., 1999. Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model, Geophysics , 64, 888– 901. Google Scholar CrossRef Search ADS   Prieto G.A., Denolle M., Lawrence J.F., Beroza G.C., 2011. On amplitude information carried by the ambient seismic field, C. R. Geosci. , 343( 8-9), 600– 614. Google Scholar CrossRef Search ADS   Rhie J., Romanowicz B., 2004. Excitation of Earth’s continuous free oscillations by atmosphere–ocean–seafloor coupling, Nature , 431( 7008), 552– 556. Google Scholar CrossRef Search ADS PubMed  Sabra K.G., Gerstoft P., Roux P., Kuperman W.A., 2005a. Surface wave tomography from microseisms in Southern California, Geophys. Res. Lett. , 32, doi:10.1029/2005GL023155. Sabra K.G., Gerstoft P., Roux P., Kuperman W.A., Fehler M.C., 2005b. Extracting time-domain Green’s function estimates from ambient seismic noise, Geophys. Res. Lett. , 32( 3), L03310, doi:10.1029/2004GL021862. Google Scholar CrossRef Search ADS   Sánchez-Sesma F.J., Campillo M., 2006. Retrieval of the Green’s function from cross correlation: The canonical elastic problem, Bull. seism. Soc. Am. , 96, 1182– 1191. Google Scholar CrossRef Search ADS   Shapiro N.M., Campillo M., 2004. Emergence of broadband rayleigh waves from correlations of the ambient seismic noise, Geophys. Res. Lett. , 31( 7), L07614, doi:10.1029/2004GL019491. Google Scholar CrossRef Search ADS   Shapiro N.M., Campillo M., Stehly L., Ritzwoller M., 2005. High resolution surface wave tomography from ambient seismic noise, Science , 307, 1615– 1618. Google Scholar CrossRef Search ADS PubMed  Simutė S., Steptoe H., Cobden L., Gokhberg A., Fichtner A., 2016. Full-waveform inversion of the japanese islands region, J. geophys. Res. , 121( 5), 3722– 3741. Google Scholar CrossRef Search ADS   Snieder R., 2007. Extracting the Green’s function of attenuating heterogeneous acoustic media from uncorrelated waves, J. acoust. Soc. Am. , 121( 5), 2637– 2643. Google Scholar CrossRef Search ADS PubMed  Snieder R., Larose E., 2013. Extracting earth’s elastic wave response from noise measurements, Annu. Rev. Earth Planet. Sci. , 41( 1), 183– 206. Google Scholar CrossRef Search ADS   Snieder R., Şafak E., 2006. Extracting the building response using seismic interferometry: Theory and application to the Millikan Library in Pasadena, California, Bull. seism. Soc. Am. , 96, 586– 598. Google Scholar CrossRef Search ADS   Snieder R., Wapenaar K., Larner K., 2006. Spurious multiples in seismic interferometry of primaries, Geophysics , 71, SI111– SI124. Google Scholar CrossRef Search ADS   Snieder R., Wapenaar K., Wegler U., 2007. Unified Green’s function retrieval by cross-correlation; connection with energy principles, Phys. Rev. E , 75, 036103, doi:10.1103/PhysRevE.75.036103. Google Scholar CrossRef Search ADS   Snieder R., Fan Y., Slob E., Wapenaar K., 2010. Equipartitioning is not sufficient for Green’s function extraction, Earthq. Sci. , 23, 403– 415. Google Scholar CrossRef Search ADS   Stehly L., Boué P., 2017. On the interpretation of the amplitude decay of noise correlations computed along a line of receivers, Geophys. J. Int. , 209( 1), 358– 372. Stehly L., Campillo M., Shapiro N.M., 2006. A study of the seismic noise from its long-range correlation properties, J. Geophys. Res. , 111( B10), B10306, doi:10.1029/2005JB004237. Google Scholar CrossRef Search ADS   Stehly L., Fry B., Campillo M., Shapiro N., Guilbert J., Boschi L., Giardini D., 2009. Tomography of the Alpine region from observations of seismic ambient noise, Geophys. J. Int. , 178( 1), 338– 350. Google Scholar CrossRef Search ADS   Stutzmann E., Ardhuin F., Schimmel M., Mangeney A., Patau G., 2012. Modelling long-term seismic noise in various environments, Geophys. J. Int. , 191( 2), 707– 722. Google Scholar CrossRef Search ADS   Tanimoto T., 1990. Modelling curved surface wave paths: membrane surface wave synthetics, Geophys. J. Int. , 102( 1), 89– 100. Google Scholar CrossRef Search ADS   Tape C., Liu Q., Tromp J., 2007. Finite-frequency tomography using adjoint methods - methodology and examples using membrane surface waves, Geophys. J. Int. , 168, 1105– 1129. Google Scholar CrossRef Search ADS   Tape C., Liu Q., Maggi A., Tromp J., 2009. Adjoint tomography of the southern California crust, Science , 325, 988– 992. Google Scholar CrossRef Search ADS PubMed  Tarantola A., 1988. Theoretical background for the inversion of seismic waveforms, including elasticity and attenuation, Pure appl. Geophys. , 128, 365– 399. Google Scholar CrossRef Search ADS   Tarantola A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation , 2nd edn, Society for Industrial and Applied Mathematics. Google Scholar CrossRef Search ADS   Tian Y., Ritzwoller M.H., 2015. Directionality of ambient noise on the Juan de Fuca plate: implications for source locations of the primary and secondary microseisms, Geophys. J. Int. , 201( 1), 429– 443. Google Scholar CrossRef Search ADS   Tikhonov A., 1963. Solution of incorrectly formulated problems and the regularization method, Sov. Math. Dokl. , 5, 1035– 1038. Tromp J., Tape C., Liu Q., 2005. Seismic tomography, adjoint methods, time reversal and banana-doughnut kernels, Geophys. J. Int. , 160, 195– 216. Google Scholar CrossRef Search ADS   Tromp J., Luo Y., Hanasoge S., Peter D., 2010. Noise cross-correlation sensitivity kernels, Geophys. J. Int. , 183, 791– 819. Google Scholar CrossRef Search ADS   Tsai V.C., 2009. On establishing the accuracy of noise tomography traveltime measurements in a realistic medium, Geophys. J. Int. , 178, 1555– 1564. Google Scholar CrossRef Search ADS   Tsai V.C., 2011. Understanding the amplitudes of noise correlation measurements, J. geophys. Res. , 116, doi:10.1029/2011JB008483. Valentine A.P., Woodhouse J.H., 2010. Reducing errors in seismic tomography: combined inversion for sources and structure, Geophys. J. Int. , 180( 2), 847– 857. Google Scholar CrossRef Search ADS   van Driel M., Krischer L., Stähler S.C., Hosseini K., Nissen-Meyer T., 2015. Instaseis: instant global seismograms based on a broadband waveform database, Solid Earth , 6( 2), 701– 717. Google Scholar CrossRef Search ADS   Vasconcelos I., Snieder R., 2008a. Interferometry by deconvolution, Part 1 – Theory for acoustic waves and numerical examples, Geophysics , 73, S129– S141. Google Scholar CrossRef Search ADS   Vasconcelos I., Snieder R., 2008b. Interferometry by deconvolution, Part 2 – Theory for elastic waves and application to drill-bit seismic imaging, Geophysics , 73, S115– S128. Google Scholar CrossRef Search ADS   Wapenaar K., 2004. Retrieving the elastodynamic Green’s function of an arbitrary inhomogeneous medium by cross correlation, Phys. Rev. Lett. , 93, 254301– 254304. Google Scholar CrossRef Search ADS PubMed  Wapenaar K., Fokkema J., 2006. Green’s function representations for seismic interferometry, Geophysics , 71, S133– S146. Google Scholar CrossRef Search ADS   Weaver R., Froment B., Campillo M., 2009. On the correlation of non-isotropically distributed ballistic scalar diffuse waves, J. acoust. Soc. Am. , 126( 4), 1817– 1826. Google Scholar CrossRef Search ADS PubMed  Weaver R.L., Lobkis O.I., 2004. Diffuse fields in open systems and the emergence of Green’s function, J. acoust. Soc. Am. , 116, 2731– 2734. Google Scholar CrossRef Search ADS   Woodard M.F., 1997. Implications of localized, acoustic absorption for heliotomographic analysis of sunspots, Astrophys. J. , 485, 890– 894. Google Scholar CrossRef Search ADS   Yang Y., Ritzwoller M.H., 2008. Characteristics of ambient seismic noise as a source for surface wave tomography, Geochemistry, Geophysics, Geosystems , 9( 2), Q02008, doi:10.1029/2007GC001814. Google Scholar CrossRef Search ADS   Yao H., van der Hilst R.D., 2009. Analysis of ambient noise energy distribution and phase velocity bias in ambient noise tomography, with application to SE Tibet, Geophys. J. Int. , 179( 2), 1113– 1132. Google Scholar CrossRef Search ADS   Zhan Z., Tsai V.C., Clayton R.W., 2013. Spurious velocity changes caused by temporal variations in ambient noise frequency content, Geophys. J. Int. , 194( 3), 1574– 1581. Google Scholar CrossRef Search ADS   Zhou Y., Dahlen F.A., Nolet G., 2004. Three-dimensional sensitivity kernels for surface wave observables, Geophys. J. Int. , 158, 142– 168. Google Scholar CrossRef Search ADS   APPENDIX A: FOURIER TRANSFORM For the Fourier transform and its inverse we use   \begin{equation} y(\omega ) = \frac{1}{\sqrt{2\pi }} \int _{-\infty }^\infty y(t)\, e^{-i \omega t} \, \text{d}t \end{equation} (A1)  \begin{equation} y(t) = \frac{1}{\sqrt{2\pi }} \int _{-\infty }^\infty y(\omega )\, e^{i \omega t} \, \text{d}\omega . \end{equation} (A2)We distinguish between time and frequency domain only by argument or context. APPENDIX B: LINK TO FORMULATION BY FICHTNER (2015) For a link between the derivation presented in Section 3.1.2 and in Fichtner (2015), we start from eq. (44)   \begin{equation} \nabla _{{\bf m}} \mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger _{(2)}\, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} \Bigr \rangle , \end{equation} (B1)where we refer to $$u^\dagger _{(n)}$$ as the adjoint wavefield that corresponds to the correlation wavefield at reference station xn. In the frequency domain it can be written as   \begin{equation} \nabla _{{\bf m}} \mathcal {X} \, \delta {\bf m} = \underbrace{ \int \limits _\oplus \int \limits _{-\infty }^{\infty } u^\dagger _{(2)}({\bf x},\omega )\, \nabla _{\bf m}\mathcal {L}[C({\bf x},{\bf x}_2,\omega )]\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x} }_{(1)} + \underbrace{ \int \limits _\oplus \int \limits _{-\infty }^{\infty } c^\dagger ({\bf x},\omega )\, \nabla _{\bf m}\mathcal {L}[G({\bf x},{\bf x}_2,\omega )]\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x} }_{(2)}. \end{equation} (B2)With the definition of the adjoint correlation wavefield c† according to eq. (47), which is defined by   \begin{equation} c^\dagger ({\bf x},\omega ) = \int \limits _\oplus G^*({\bf x},\mathbf {\xi },\omega )\, S(\mathbf {\xi },\omega )\, G(\mathbf {\xi },{\bf x}_1,\omega)\, f^*(\omega ) \, \text{d}\mathbf {\xi }, \end{equation} (B3)the second part of eq. (B2) is given by   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } \Bigl[\int \limits _\oplus G^*({\bf x},\mathbf {\xi },\omega )\, S(\mathbf {\xi },\omega )\, G(\mathbf {\xi },{\bf x}_1,\omega)\, f^*(\omega )\, \text{d}\mathbf {\xi } \Bigr ] \, \nabla _{\bf m}\mathcal {L}(G({\bf x},{\bf x}_2,\omega ))\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}. \end{equation} (B4)Since $$\nabla _{\bf m}\mathcal {L}$$ is linear, we write   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } \Bigl [\int \limits _\oplus G^*({\bf x},\mathbf {\xi },\omega )\, S(\mathbf {\xi },\omega )\, G(\mathbf {\xi },{\bf x}_1,\omega)\, \text{d}\mathbf {\xi } \Bigr ] \, \nabla _{\bf m}\mathcal {L}(f^*(\omega ) G({\bf x},{\bf x}_2,\omega ))\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}, \end{equation} (B5)and observe that $$\nabla _{\bf m}\mathcal {L}$$ is acting on f*(ω)G(x, x2, ω), which is the adjoint wavefield $$u^{\dagger *}_{(1)}({\bf x},\omega )$$ for the reference station at x1 and the corresponding measurement at station x2. Eq. (B5) thus changes to   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } C^*({\bf x},{\bf x}_1,\omega ) \, \nabla _{\bf m}\mathcal {L}( u^{\dagger *}_{(1)}({\bf x},\omega ) )\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}. \end{equation} (B6)Using the adjoint operator defined in eq. (22), we obtain   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } u^{\dagger *}_{(1)}({\bf x},\omega )\, \nabla _{\bf m}\mathcal {L}^{\dagger }( C^*({\bf x},{\bf x}_1,\omega ) )\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}. \end{equation} (B7)Part 1 of eq. (B2) together with expression (B7) correspond to eq. (32) in Fichtner (2015). As expected due to the nature of correlation functions, the final expression is Hermitian in the exchange of x1 and x2. For a full evaluation of the gradient according to the expression by Fichtner (2015), the wavefield for all reference stations has to be computed. The derivation in this paper allows us to optimize the array in terms of simulations and coverage, convenient due to the high computational costs of full waveform ambient noise inversion. APPENDIX C: DERIVATION OF P† FOR EXAMPLE CASE For a better understanding of the developments, we derive the adjoint operator P†, invoked in eq. (40), for the example case in the following. Starting from expression (40), we use the definition of P in eq. (13) and write   \begin{equation} \Bigl \langle u^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle = - \frac{1}{2\pi } \textrm{Re} \int \limits _\oplus \int \limits _{\tau } u^\dagger ({\bf x}, t)\, \sum _l S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime })\, e^{-i\omega t^{\prime }} \text{d}t^{\prime } \Bigr ]^* e^{i\omega t} \, \text{d}\omega \, \text{d}t\, \text{d}{\bf x}. \end{equation} (C1)Reordering the integrals and the summation over basis functions, we obtain   \begin{equation} - \frac{1}{2\pi } \textrm{Re} \sum _l \int \limits _\oplus S_l({\bf x}) \int \limits _{\tau } \int \limits _{w_l}^{w_{l+1}} \int \limits _{\tau }^{} u^\dagger ({\bf x}, t)\, \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime }) e^{i\omega t^{\prime }} e^{i\omega t}\, \text{d}t^{\prime }\, \text{d}\omega \, \text{d}t\, \text{d}{\bf x}. \end{equation} (C2)Isolating the adjoint wavefield u† together with the exponential function eiωt, eq. (C2) changes to   \begin{equation} - \frac{1}{2\pi } \textrm{Re} \sum _l \int \limits _\oplus S_l({\bf x}) \int \limits _{\tau } \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime }) \int \limits _{w_l}^{w_{l+1}} \int \limits _{\tau }^{} u^\dagger ({\bf x}, t) e^{i\omega t}\, \text{d}t\, e^{i\omega t^{\prime }}\, \text{d}\omega \, \text{d}t^{\prime }\, \text{d}{\bf x}. \end{equation} (C3)The innermost integral is a Fourier transform of the adjoint wavefield u†. If we take the complex conjugate of the Fourier transform, taking into account that u†(x, t) is real, and reordering the integrals and summation, we get   \begin{equation} - \frac{1}{2\pi } \textrm{Re} \int \limits _\oplus \int \limits _{\tau } \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime })\, \sum _l S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} u^\dagger ({\bf x}, t)\, e^{-i\omega t} \text{d}t \Bigr ]^* e^{i\omega t^{\prime }} \, \text{d}\omega \, \text{d}t^{\prime }\, \text{d}{\bf x} \end{equation} (C4)  \begin{equation} = \Bigl \langle \delta _{{\bf m}}G\, P^\dagger ( S, u^\dagger ) \Bigr \rangle . \end{equation} (C5)For the last step, we compare eq. (C4) to eq. (C1) and observe that this is exactly what we try to retrieve. In this case, the definitions of P and P† are the same and P is therefore self-adjoint. APPENDIX D: SECOND DERIVATIVES D1 Mixed source-structure derivative Hms In the derivation of the mixed source-structure derivative presented in Section 3.2, we obtained as an intermediate result eq. (57), given by   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle , \end{equation} (D1)and realized that it is not possible to get an expression for Hms that is easy to compute for all possible directions δS and δm. We therefore chose a perturbation δm and got an expression for Hms( δm, ○) that can be evaluated for all possible directions δS. It is, however, also possible to introduce a perturbation in S and derive Hms(○, δS ) for all possible directions δm. To achieve this, we define the adjoint wavefield p† now as   \begin{equation} \mathcal {L}^\dagger ( p^\dagger ) = - \nabla _{C}\nabla _{C}\chi ^\dagger \, \delta _{S} C. \end{equation} (D2)Following the procedure in Section 3.2, we yield   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle - p^\dagger \, \mathcal {L}( \delta _{{\bf m}} C ) + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle . \end{equation} (D3)Replacing $$\mathcal {L}( \delta _{{\bf m}} C )$$ in the first term using the first derivative of eq. (9) with respect to structure, changes the expression for Hms accordingly to   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle p^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - p^\dagger \, P( S, \delta _{{\bf m}}G ) - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle . \end{equation} (D4)Term three and four in eq. (D4) prohibit the evaluation of Hms( ○, δS ) for all possible directions δm. To eliminate the third term, we multiply the first derivative of the PDE describing the computation of the Green function in eq. (8) with the adjoint wavefield z†, apply 〈.〉 and add it to eq. (D4). To facilitate the understanding, we only write down the terms of interest, which are given by   \begin{eqnarray} \Bigl \langle - p^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle & = & \Bigl \langle - p^\dagger \, P( S, \delta _{{\bf m}}G ) + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + z^\dagger \, \mathcal {L}( \delta _{{\bf m}}G ) \Bigr \rangle \nonumber\\ & = & \Bigl \langle - \delta _{{\bf m}}G\, P^\dagger ( S, p^\dagger ) + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + \delta _{{\bf m}}G\, \mathcal {L}^\dagger ( z^\dagger ) \Bigr \rangle , \end{eqnarray} (D5)where we invoke again the adjoint operators in the second line defined in eqs (40) and (41). If we can now find a wavefield z† that satisfies   \begin{equation} \mathcal {L}^\dagger ( z^\dagger ) = P^\dagger ( S, p^\dagger ), \end{equation} (D6)we can eliminate the first appearance of δmG in eq. (D4) and the third term is replaced by $$z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}$$. Following the exact same procedure for the fourth term, now naming the adjoint wavefield q†, we obtain   \begin{equation} \Bigl \langle - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle = \Bigl \langle - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) + q^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + q^\dagger \, \mathcal {L}(\delta _{{\bf m}}G) \Bigr \rangle , \end{equation} (D7)and the unpleasant term δmG vanishes, if a wavefield can be found that satisfies   \begin{equation} \mathcal {L}^\dagger (q^\dagger ) = P^\dagger ( \delta S, u^\dagger ). \end{equation} (D8)Bringing it all together, eq. (D4) changes to   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle p^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}(\delta _{S}C)\, \delta {\bf m} + z^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + q^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} \Bigr \rangle , \end{equation} (D9)and the Hessian-vector product Hms( ○, δS ) can be defined as   \begin{equation} H_{ms}( \circ , \delta S ) = \Bigl \langle p^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \circ \Bigr \rangle + \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}(\delta _{S}C)\, \circ \Bigr \rangle + \Bigl \langle z^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \circ \Bigr \rangle + \Bigl \langle q^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \circ \Bigr \rangle . \end{equation} (D10)To interpret Hms(○, δS ), we first study the expression (44) for the computation of the structure kernel. In the first term, the adjoint wavefield u† interacts with the correlation wavefield C and, in the second term, the adjoint correlation wavefield c† with the Green function G. As a consequence, the first-order change of the structure kernel given by eq. (D10), has four terms: interaction of (1) C with the change p† of the adjoint wavefield due to the altered misfit functional, (2) the adjoint wavefield with the altered correlation wavefield δSC, (3) G with the modified adjoint correlation wavefield q† because of the change of its forcing term in terms of p† (eq. D6) and (4) G with the modified adjoint correlation wavefield due to a change in S (eq. D8). Since both z† and q† interact with the Green function G and due to the linearity of the wave equation to the right-hand side, the two adjoint wavefields can be simulated simultaneously. D2 Second derivative with respect to the distribution of sources In order to derive second derivatives only with respect to source, we start with the directional derivative of the misfit function $$\mathcal {X}$$ with respect to S in the direction of δS1  \begin{equation} \nabla _{S}\mathcal {X} \, \delta S_1 = \Bigl \langle \nabla _C\chi \, \delta _{S_1} C \Bigr \rangle . \end{equation} (D11)We differentiate eq. (D11) again with respect to S, but now in the second direction δS2, which is given by   \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \nabla _{S}\nabla _{S}\mathcal {X} \, \delta S_1\, \delta S_2 = \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{S_1} C\, \delta _{S_2} C + \nabla _C\chi \, \delta _{S_1 S_2} C \Bigr \rangle , \end{equation} (D12)where   \begin{equation} \delta _{S_1 S_2} C = \nabla _{S} \nabla _{S}\, C\, \delta S_1\, \delta S_2 \end{equation} (D13)denotes the second-order variation of the correlation function C due to two perturbations δS1 and δS2. Since the correlation function is linear in S, we have   \begin{equation} \delta _{S_1 S_2} C = 0, \end{equation} (D14)and eq. (D12) simplifies to   \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{S_1} C\, \delta _{S_2} C \Bigr \rangle . \end{equation} (D15)Invoking the adjoint operator in eq. (59) and inserting the adjoint wavefield p†, defined by   \begin{equation} \mathcal {L}^\dagger ( p^\dagger ) = - \nabla _C\nabla _C\chi ^\dagger \, \delta _{S_1} C, \end{equation} (D16)into eq. (D15), we obtain for Hss applied to δS1 and δS2  \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \Bigl \langle - \delta _{S_2} C\, \mathcal {L}^\dagger (p^\dagger ) \Bigr \rangle . \end{equation} (D17)Using the adjoint operator in eq. (22) and the first variation of the PDE describing the modelling of the correlation wavefield due to δS (eq. (17)), Hss( δS1, δS2 ) changes to   \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \Bigl \langle - p^\dagger \, \mathcal {L}( \delta _{S_2} C ) \Bigr \rangle = \Bigl \langle - p^\dagger \, P( \delta S_2, G )\Bigr \rangle , \end{equation} (D18)and the Hessian-vector product is given by   \begin{equation} H_{ss}( \delta S_1, \circ ) = \Bigl \langle - p^\dagger \, P( \circ , G )\Bigr \rangle . \end{equation} (D19)By design, eq. (D19) allows us to compute the change of the source kernel due to a perturbation δS1 in the source distribution. The only term in eq. (27) describing the computation of the source kernel that is changed by a perturbation in the source distribution is the adjoint wavefield u†. The source perturbation δS1 changes the correlation function C to $$\delta _{S_1}C$$, which alters the misfit and consequently also the adjoint wavefield. The latter is accounted for in eq. (D19) by the adjoint wavefield p†, driven by the change of the misfit $$\nabla _C\nabla _C\chi \, \delta _{S_1}C$$. A convenient consequence is that the routines implemented for the computation of the source kernel can be reused for eq. (D19) and only the adjoint source time function has to be adapted accordingly. D3 Second derivative with respect to structure The last piece that is missing for a full assembly of the Hessian matrix is the computation of second derivatives only with respect to structure. We start again with the first derivative of the misfit function with respect to m in the direction δm1  \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m}_1 = \Bigl \langle \nabla _C\chi \, \delta _{{\bf m}_1} C \Bigr \rangle , \end{equation} (D20)and differentiate a second time, but now in the direction of δm2 to obtain   \begin{equation} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) = \nabla _{{\bf m}}\nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m}_1\, \delta {\bf m}_2 = \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + \nabla _C\chi \, \delta _{{\bf m}_1{\bf m}_2} C \Bigr \rangle , \end{equation} (D21)where   \begin{equation} \delta _{{\bf m}_1{\bf m}_2} C = \nabla _{{\bf m}}\nabla _{{\bf m}} C\, \delta {\bf m}_1\, \delta {\bf m}_2 \end{equation} (D22)is the second-order variation of the correlation C due to δm1 and δm2. In the following, we try again to eliminate unpleasant terms and therefore prepare convenient tools in terms of the first and second derivative of the PDE describing the computation of correlation functions C, given by   \begin{equation} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1 + \mathcal {L}( \delta _{{\bf m}_1}C ) - P( S, \delta _{{\bf m}_1}G ) = 0, \end{equation} (D23)  \begin{equation} \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2}C )\, \delta {\bf m}_1 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}C )\, \delta {\bf m}_2 + \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2}C ) - P( S, \delta _{{\bf m}_1{\bf m}_2}G ) = 0, \end{equation} (D24)and the same for the PDE concerning the modelling of the Green function G, that is,   \begin{equation} \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1 + \mathcal {L}( \delta _{{\bf m}_1} G ) = 0. \end{equation} (D25)  \begin{equation} \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1\, \delta {\bf m}_2 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2}G )\, \delta {\bf m}_1 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 + \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2} G ) = 0. \end{equation} (D26)We first multiply eq. (D24) with the adjoint wavefield u†, apply 〈.〉 and add it to eq. (D21), which yields   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + \nabla _C\chi \, \delta _{{\bf m}_1{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 + u^\dagger \, \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2} C )\nonumber\\ && -\, u^\dagger \, P( S, \delta _{{\bf m}_1{\bf m}_2}G )\Bigr \rangle . \end{eqnarray} (D27)Invoking again two adjoint operators for terms two and six in eq. (D27), we can eliminate $$\delta _{{\bf m}_1{\bf m}_2} C$$ due to the definition of u† and eq. (D27) simplifies to   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 - u^\dagger \, P( S, \delta _{{\bf m}_1{\bf m}_2}G )\Bigr \rangle . \end{eqnarray} (D28)Repeating the same procedure using eq. (D26) with the adjoint wavefield c† defined in eq. (43), we get   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 - u^\dagger \, P( S, \delta _{{\bf m}_1{\bf m}_2}G ) \nonumber\\ && +\, c^\dagger \nabla _{{\bf m}}\nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1\, \delta {\bf m}_2 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 \nonumber\\ && +\, c^\dagger \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2} G ) \Bigr \rangle . \end{eqnarray} (D29)Defining two adjoint operators for terms five and nine, we get rid of $$\delta _{{\bf m}_1{\bf m}_2} G$$, because of the definition of c† and eq. (D29) changes to   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 \nonumber\\ && +\, c^\dagger \nabla _{{\bf m}}\nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1\, \delta {\bf m}_2 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 \Bigr \rangle . \end{eqnarray} (D30)Until now, we have not made any assumptions on the forward modelling operator $$\mathcal {L}$$. For simplicity, we consider in the following only operators that are linear in m. Therefore, the two terms in the expression for Hmm( δm1, δm2 ) containing second derivatives of the operator $$\mathcal {L}$$ vanish, leading to   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 \Bigr \rangle . \end{eqnarray} (D31)For an incorporation and interpretation of non-linear operators, we refer the reader to Fichtner & Trampert (2011). The first three terms in eq. (D31) are unpleasant, because the direction δm2 is contained implicitly in terms of $$\delta _{{\bf m}_2} C$$. Trying to change that, we define two adjoint wavefields p† and w† by   \begin{equation} \mathcal {L}^\dagger ( p^\dagger ) = - \nabla _C\nabla _C\chi ^\dagger \, \delta _{{\bf m}_1} C - \nabla _{{\bf m}}\mathcal {L}^\dagger ( u^\dagger ) \, \delta {\bf m}_1, \end{equation} (D32)  \begin{equation} \mathcal {L}^\dagger ( w^\dagger ) = - \nabla _{{\bf m}}\mathcal {L}^\dagger ( c^\dagger ) \, \delta {\bf m}_1. \end{equation} (D33)Applying the principles used above several times, the first three terms in eq. (D31) can be altered in the following way:   \begin{eqnarray} &&{ \Bigl \langle c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 + \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 \Bigr \rangle }\nonumber\\ && = \Bigl \langle - \delta _{{\bf m}_2}G\, \mathcal {L}^\dagger ( w^\dagger ) - \delta _{{\bf m}_2}C\, \mathcal {L}^\dagger ( p^\dagger ) \Bigr \rangle = \Bigl \langle - w^\dagger \mathcal {L}( \delta _{{\bf m}_2} G ) - p^\dagger \mathcal {L}( \delta _{{\bf m}_2} C ) \Bigr \rangle . \end{eqnarray} (D34)Replacing $$-w^\dagger \mathcal {L}( \delta _{{\bf m}_2} G )$$ and $$- p^\dagger \mathcal {L}( \delta _{{\bf m}_2} C )$$ using eqs (D25) and (D23), respectively, and then adding expression (D25), multiplied with z† and 〈.〉 applied, eq. (D34) changes to   \begin{eqnarray} &&{ \Bigl \langle - w^\dagger \mathcal {L}( \delta _{{\bf m}_2} G ) - p^\dagger \mathcal {L}( \delta _{{\bf m}_2} C ) \Bigr \rangle = \Bigl \langle w^\dagger \, \nabla _{{\bf m}}\mathcal {L}(G)\, \delta {\bf m}_2 + p^\dagger \, \nabla _{{\bf m}}\mathcal {L}(C)\, \delta {\bf m}_2 - p^\dagger \, P( S, \delta _{{\bf m}_2}G ) \Bigr \rangle }\nonumber\\ && = \Bigl \langle w^\dagger \nabla _{{\bf m}}\mathcal {L}(G)\, \delta {\bf m}_2 + p^\dagger \nabla _{{\bf m}}\mathcal {L}(C)\, \delta {\bf m}_2 - p^\dagger \, P( S, \delta _{{\bf m}_2}G ) + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_2 + z^\dagger \, \mathcal {L}( \delta _{{\bf m}_2}G ) \Bigr \rangle . \end{eqnarray} (D35)If we now define z† as the solution to   \begin{equation} \mathcal {L}^\dagger ( z^\dagger ) = P^\dagger ( S, p^\dagger ), \end{equation} (D36)we can write down the final expression for Hmm( δm1, δm2 ) as   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}C )\, \delta {\bf m}_2 + c^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}G )\, \delta {\bf m}_2 + w^\dagger \, \nabla _{{\bf m}}\mathcal {L}(G)\, \delta {\bf m}_2 \nonumber\\ && +\, p^\dagger \, \nabla _{{\bf m}}\mathcal {L}(C)\, \delta {\bf m}_2 + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_2 \Bigr \rangle . \end{eqnarray} (D37)The Hessian operator applied to the model perturbation δm1 is thus given by   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \circ ) &=& \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}C )\, \circ \Bigr \rangle + \Bigl \langle c^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}G )\, \circ \Bigr \rangle + \Bigl \langle w^\dagger \, \nabla _{{\bf m}}\mathcal {L}(G)\, \circ \Bigr \rangle \nonumber\\ && +\, \Bigl \langle p^\dagger \, \nabla _{{\bf m}}\mathcal {L}(C)\, \circ \Bigr \rangle + \Bigl \langle z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \circ \Bigr \rangle . \end{eqnarray} (D38)The first two terms account for changes in the forward wavefields C and G caused by the perturbation δm1, the last three terms for changes in the adjoint wavefields u† and c†, caused by scattering and an altered adjoint source time function. The two adjoint wavefields w† and z† can be computed simultaneously, since both interact with the Green function G and the operator $$\mathcal {L}$$ is linear in its forcing term. © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Geophysical Journal International Oxford University Press

Towards full waveform ambient noise inversion

Loading next page...
 
/lp/ou_press/towards-full-waveform-ambient-noise-inversion-H9TxAvkAo0
Publisher
The Royal Astronomical Society
Copyright
© The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.
ISSN
0956-540X
eISSN
1365-246X
D.O.I.
10.1093/gji/ggx429
Publisher site
See Article on Publisher Site

Abstract

SUMMARY In this work we investigate fundamentals of a method—referred to as full waveform ambient noise inversion—that improves the resolution of tomographic images by extracting waveform information from interstation correlation functions that cannot be used without knowing the distribution of noise sources. The fundamental idea is to drop the principle of Green function retrieval and to establish correlation functions as self-consistent observables in seismology. This involves the following steps: (1) We introduce an operator-based formulation of the forward problem of computing correlation functions. It is valid for arbitrary distributions of noise sources in both space and frequency, and for any type of medium, including 3-D elastic, heterogeneous and attenuating media. In addition, the formulation allows us to keep the derivations independent of time and frequency domain and it facilitates the application of adjoint techniques, which we use to derive efficient expressions to compute first and also second derivatives. The latter are essential for a resolution analysis that accounts for intra- and interparameter trade-offs. (2) In a forward modelling study we investigate the effect of noise sources and structure on different observables. Traveltimes are hardly affected by heterogeneous noise source distributions. On the other hand, the amplitude asymmetry of correlations is at least to first order insensitive to unmodelled Earth structure. Energy and waveform differences are sensitive to both structure and the distribution of noise sources. (3) We design and implement an appropriate inversion scheme, where the extraction of waveform information is successively increased. We demonstrate that full waveform ambient noise inversion has the potential to go beyond ambient noise tomography based on Green function retrieval and to refine noise source location, which is essential for a better understanding of noise generation. Inherent trade-offs between source and structure are quantified using Hessian-vector products. Interferometry, Inverse theory, Computational seismology, Seismic noise, Seismic tomography, Theoretical seismology 1 INTRODUCTION The ambient noise field was long regarded as ‘not of great seismological interest in itself ’ (Agnew & Berger 1978), but has in recent years attracted considerable attention as a valuable source of information on 3-D Earth structure. This paradigm shift was possible due to the raising awareness in seismology that the system response can be extracted from noise recordings. As pointed out by Snieder & Larose (2013) the underlying principle actually dates back more than a century to the paper by Einstein (1906) on Brownian motion. Although Aki (1957) and Claerbout (1968) used similar approaches to study surface waves propagating in the near surface or to relate the reflection response to the transmission response, this field of research only gained momentum after the work by Lobkis & Weaver (2001), who derived and demonstrated in a laboratory experiment that the interstation correlation of recordings of a diffuse field with random and uncorrelated modes is proportional to the Green function between the respective receivers. Campillo & Paul (2003) applied the idea first to earthquake coda and extracted the Green tensor for surface waves. Follow-up studies then confirmed the emergence of surface waveforms from correlations of microseismic noise (Shapiro & Campillo 2004; Sabra et al. 2005b) and prompted a large number of regional and also global tomographic studies (Shapiro et al. 2005; Sabra et al. 2005a; Nishida et al. 2009). While seismic tomography based on ambient noise flourished, theoretical derivations established that the equality of the interstation correlation and the Green function only holds under specific conditions, including wavefield diffusivity and equipartitioning (Lobkis & Weaver 2001; Weaver & Lobkis 2004; Sánchez-Sesma & Campillo 2006; Snieder et al. 2007; Tsai 2009) or the isotropic distribution of both mono- and dipolar uncorrelated noise sources (Wapenaar 2004; Wapenaar & Fokkema 2006; Snieder 2007). These assumptions are typically not satisfied on Earth and the ambient seismic noise field at any frequency is excited by a heterogeneous and non-stationary distribution of noise sources (Rhie & Romanowicz 2004; Nishida & Fukao 2007; Ardhuin et al. 2011; Stutzmann et al. 2012; Gualtieri et al. 2013; Ardhuin et al. 2015; Ermert et al. 2016). Despite the strong empirical evidence that at least fundamental-mode surface waves can be extracted and although tomographic images are broadly consistent with models based on earthquake data (Nishida et al. 2009; Stehly et al. 2009; Haned et al. 2016), breaking these assumptions has various consequences. Amplitudes of noise correlation functions are difficult to interpret and complicate attenuation studies (Harmon et al. 2010; Prieto et al. 2011; Tsai 2011; Stehly & Boué 2017). Heterogeneous noise source distributions affect traveltime measurements (Tsai 2009; Weaver et al. 2009; Yao & van der Hilst 2009; Froment et al. 2010; Harmon et al. 2010; Delaney et al. 2017) and introduce spurious arrivals, which can be mistaken as sound information (Snieder et al. 2006). In addition, certain seismic phases such as body waves and higher-mode surfaces waves are not excited correctly (Halliday & Curtis 2008; Kimman & Trampert 2010), which leads to a poor depth resolution in tomographic images. Besides correction factors presented in some of the studies above, numerous processing and stacking schemes have been proposed to mitigate the difference between reality and the assumptions necessary for Green function retrieval (Bensen et al. 2007; Groos et al. 2012). A common approach is to select correlation functions that resemble plausible Green functions and to discard the rest, which may affect up to 70 per cent of a data set (Stehly et al. 2009). Correlation functions that are strongly altered by heterogeneous noise source distributions may therefore be excluded although they contain valuable information on 3-D Earth structure (Landès et al. 2010). For example, strong noise sources perpendicular to the line connecting two stations manifest themselves in wave packets around zero lag, which cannot be interpreted reasonably as a feature of a Green function. Other processing steps vary for each data set and are usually guided by the objective of the respective study. The specific choices change the actual waveforms in correlation functions, which unavoidably leaves an imprint on the sensitivity to noise sources and Earth structure (Fichtner 2014; Fichtner et al. 2017). During the last decade, great progress has been made in ambient noise tomography based on the principle of Green function retrieval. The discussed shortcomings and corresponding ways to circumvent them, however, reflect that this principle cannot always be readily invoked and ambient noise tomography reaches a limit. Modern tomographic methods exploiting details in waveforms for the benefit of improved resolution (Pratt 1999; Chen et al. 2007; Tape et al. 2009; Fichtner et al. 2009) cannot be applied directly, since Green function retrieval does not account for the influence of the distribution of noise sources on the actual waveforms in correlation functions. Motivation and outline In order to address these limitations, we attempt to develop a method—in the following referred to as full waveform ambient noise inversion—that is capable of accounting for the distribution of noise sources in space and frequency, 3-D heterogeneous Earth structure and the full seismic wave propagation physics. The fundamental idea is to drop the principle of Green function retrieval and to extract information directly from correlation functions and establish them as self-consistent observables in seismology. A similar approach is used in helioseismology to study solar structure and dynamics (Woodard 1997; Gizon & Birch 2002; Hanasoge et al. 2011) and has recently been applied to terrestrial seismology (Tromp et al. 2010; Basini et al. 2013; Hanasoge 2013, 2014; Fichtner 2014). In the following, we present a more general and extended formulation of the theory that allows us not only to compute first, but also second derivatives, which are necessary for an uncertainty analysis accounting for intra- and interparameter trade-offs. In addition, we develop an inversion scheme to simultaneously constrain noise sources and Earth structure, and demonstrate in a 2-D example that this approach has the potential to improve resolution and the location of noise sources substantially. 2 FORWARD THEORY To forward model interstation correlations of ambient noise, we adopt the method developed by Woodard (1997) in helioseismology and recently transported to terrestrial seismology by Tromp et al. (2010). We start with the definition of the frequency-domain Green function $$G_{i,n}({\bf x},\boldsymbol{\xi })$$ as the solution of the governing equations when the right-hand side equals a point-localized force at position $$\boldsymbol{\xi }$$ in n-direction, that is,   \begin{equation} \mathcal {L}[\, G_{i,n}({\bf x},\boldsymbol{\xi })\, ] = \delta _{in} \delta ({\bf x}-\boldsymbol{\xi }), \end{equation} (1)with the Kronecker delta δin and the forward modelling operator $$\mathcal {L}$$. In the interest of a condensed notation, dependencies are omitted wherever possible, here on the angular frequency ω. The operator notation allows us to develop a general theory independent of the specific choice of the forward problem. In seismology, for example, it could stand for any type of wave equation. We limit ourselves, however, to operators that are linear in their argument. The operator contains all model parameters m, in our case the material parameters, that determine the response of the physical system to an external force given by the right hand side in eq. (1). The i-component of the seismic displacement field ui(x) is connected with the Green function $$G_{i,n}({\bf x},\boldsymbol{\xi })$$ and the forcing term $$N_n(\boldsymbol{\xi })$$, in our case the source field of ambient noise, via an integral over the volume ⊕ of the Earth, known as representation theorem (Aki & Richards 2002)   \begin{equation} u_i({\bf x}) = \int \limits _\oplus G_{i,n}({\bf x},\boldsymbol{\xi }) N_n(\boldsymbol{\xi })\, \text{d}\boldsymbol{\xi }. \end{equation} (2)Einstein’s summation convention applies to repeated lower case subscripts. The cross-correlation of the noise fields ui(x1) and uj(x2) is in the frequency domain given by   \begin{equation} \mathcal {C}_{ij}({\bf x}_1,{\bf x}_2) = u_i({\bf x}_1) u_j^*({\bf x}_2) = \int \limits _\oplus \int \limits _\oplus G_{i,n}({\bf x}_1,\boldsymbol{\xi }_1) N_n(\boldsymbol{\xi }_1) G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }_2) N_m^*(\boldsymbol{\xi }_2)\, \text{d}\boldsymbol{\xi }_1\, \text{d}\boldsymbol{\xi }_2. \end{equation} (3)We then consider the average $$\overline{\mathcal {C}_{ij}({\bf x}_1,{\bf x}_2)}$$ over many realizations   \begin{equation} C_{ij}({\bf x}_1,{\bf x}_2) = \overline{\mathcal {C}_{ij}({\bf x}_1,{\bf x}_2)} = \int \limits _\oplus \int \limits _\oplus G_{i,n}({\bf x}_1,\boldsymbol{\xi }_1) G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }_2)\ \overline{N_n(\boldsymbol{\xi }_1) N_m^*(\boldsymbol{\xi }_2)} \ \text{d}\boldsymbol{\xi }_1\, \text{d}\boldsymbol{\xi }_2, \end{equation} (4)which is estimated in standard processing of interstation correlation functions by stacking over many time intervals (Bensen et al. 2007). As pointed out by Snieder et al. (2010) dissipation effectively resets the clock after a characteristic decay time so that consecutive wavefield samples are practically independent. The evaluation of eq. (4) is prohibitively expensive, not only because of the double integral, but also since many realizations of the noise sources are required. This has only been done for simplified wave propagation scenarios (Cupillard & Capdeville 2010; van Driel et al. 2015; Fichtner et al. 2017). To reduce computational costs, we assume that the correlation between different noise sources decays relatively quickly with distance compared to the seismic wavelength and that the contribution from collocated noise sources to the correlation function is dominant. We therefore approximate $$\overline{N_n(\boldsymbol{\xi }_1)N_m^*(\boldsymbol{\xi }_2)}$$ with a delta function in space, that is,   \begin{equation} \overline{N_n(\boldsymbol{\xi }_1)N_m^*(\boldsymbol{\xi }_2)}=S_{nm}(\boldsymbol{\xi }_1)\delta (\boldsymbol{\xi }_1-\boldsymbol{\xi }_2), \end{equation} (5)with the source power-spectral density $$S_{nm}(\boldsymbol{\xi })$$. This assumption is frequently made (Woodard 1997; Tromp et al. 2010; Hanasoge 2013, 2014; Fichtner 2014; Nishida 2014) and its quality has to be assessed for each data set. In any case, the correlation length in space will be finite and the forward theory will remain computable, since the double integral has to be evaluated only for a small region in space. A detailed discussion can be found in Section 5. For uncorrelated noise sources eq. (3) condenses to   \begin{equation} C_{ij}({\bf x}_1,{\bf x}_2) = \int \limits _\oplus G_{i,n}({\bf x}_1,\boldsymbol{\xi }) G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }) S_{nm}(\boldsymbol{\xi })\, \text{d}\boldsymbol{\xi }. \end{equation} (6)Comparing (6) to the representation theorem (2) we see that Cij(x1, x2) can be interpreted as the i-component of a correlation wavefield Cj(x, x2)   \begin{equation} C_{ij}({\bf x},{\bf x}_2)=\int \limits _\oplus G_{i,n}({\bf x},\boldsymbol{\xi })\, \Bigl [G_{j,m}^*({\bf x}_2,\boldsymbol{\xi }) S_{nm}(\boldsymbol{\xi })\Bigr ]\, \text{d}\boldsymbol{\xi }, \end{equation} (7)evaluated at position x = x1. Its numerical computation can be described with the following recipe: step 1: Using source-receiver reciprocity, compute the Green function $$G_{m,j}(\boldsymbol{\xi },{\bf x}_2)$$ with source at x2. step 2: Multiply its complex conjugate with the noise source power-spectral density $$S_{nm}(\boldsymbol{\xi })$$. step 3: Model the correlation wavefield as a solution of the wave equation with $$G_{m,j}^*(\boldsymbol{\xi },{\bf x}_2) S_{nm}(\boldsymbol{\xi })$$ as distributed source and sample the correlation wavefield at any position where another receiver is located. We now have a forward model for the computation of interstation correlation functions for arbitrary distributions of the power-spectral density in frequency and space. It is important to note that this approach does not require any assumptions on wavefield diffusivity and equipartitioning (Lobkis & Weaver 2001; Weaver & Lobkis 2004; Sánchez-Sesma & Campillo 2006) or the isotropic distribution of both mono- and dipolar uncorrelated noise sources (Wapenaar 2004; Wapenaar & Fokkema 2006; Snieder 2007) that would be necessary to equate correlations with Green functions. 2.1 General formulation of the forward theory To facilitate further developments regarding sensitivity kernels and Hessian-vector products and, in addition, to be independent of the choice of time or frequency domain, we propose the following two partial differential equations (PDE) for a general formulation describing the modelling of interstation correlation functions according to the recipe above:   \begin{equation} \underbrace{\mathcal {L}[\, G_{m,j}({\bf x},{\bf x}_2)\, ] = \delta _{mj}\, D(\, 0, {\bf x}_2\, )}_{{\bf step 1}} \end{equation} (8)  \begin{equation} \underbrace{\mathcal {L}[\, C_{ij}({\bf x}, {\bf x}_2)\, ] = \underbrace{\delta _{in}\, P[\, S_{nm}({\bf x}), G_{m,j}({\bf x}, {\bf x}_2)\, ]}_{{\bf step 2}}}_{{\bf step 3}}, \end{equation} (9)where D( ts, xs) represents two delta functions, one at time ts or the respective frequency domain equivalent and the other in space at xs. P denotes a bilinear function acting on Snm(x) and Gm, j(x, x2). In the frequency domain P can simply be a multiplication of Snm(x, ω) and $$G_{m,j}^*({\bf x}, {\bf x}_2,\omega )$$, thereby again invoking Einstein’s summation convention. This would translate to a correlation between Gm, j(x, x2, t) and the time domain equivalent Snm(x, t) of the power-spectral density, and summing over m. These are only examples and other definitions are possible (see next subsection). 2.2 Example for membrane waves After each paragraph covering theoretical developments, we apply the main findings to membrane waves (Tanimoto 1990; Peter et al. 2007; Tape et al. 2007) to give a better understanding of the concepts. Since they can be interpreted as an analogue to fundamental mode surface waves, we also refer to them as surface waves in the following. The operator $$\mathcal {L}$$ describing membrane waves travelling in the x − y plane with vertical displacement u is in the time domain given by   \begin{equation} \mathcal {L}(\circ ) = \rho \, \partial _{tt} \circ -\, \partial _{\bf x}\bigl [\mu \, \partial _{\bf x}\, \circ \bigr ], \end{equation} (10)with density ρ, shear modulus μ and the (second) partial derivatives with respective to space ∂x and time ∂tt. The symbol ○ represents a place holder, in this case for a scalar wavefield. Eqs (8) and (9) take the specific form   \begin{equation} \rho \, \partial _{tt} G({\bf x}, {\bf x}_2, t) -\, \partial _{\bf x}\bigl [\mu \, \partial _{\bf x} G({\bf x}, {\bf x}_2, t) \bigr ] = \delta (t) \delta ({\bf x}-{\bf x}_2), \end{equation} (11)  \begin{equation} \rho \, \partial _{tt} C({\bf x},{\bf x}_2,t) - \partial _{\bf x}\bigl [\mu \, \partial _{\bf x} C({\bf x},{\bf x}_2,t)\bigr ] = \underbrace{\frac{1}{2\pi } \rm {Re} \int \limits _{-\infty }^{\infty } S({\bf x}, \omega )\, \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t)\, e^{-i\omega t}\, \text{d}t\, \Bigr ]^* e^{i\omega t} \, \text{d}\omega }_{=:P[\, S({\bf x}),\, G({\bf x}, {\bf x}_2)\, ]}. \end{equation} (12)The innermost integral on the right hand side of eq. (12), bracketed with [.], is an approximation of the Fourier transform (see Appendix A) of the Green function for a finite time interval τ = [t1, t2]. The interval is chosen such that (t2 − t1)−1 is much smaller than the minimum frequency of interest in order to avoid contamination of the Fourier spectrum. The complex conjugate of the Fourier transformed Green function, equivalent to its time reversed version, is multiplied with the power-spectral density S(x, ω) and the product is transformed back to the time domain. The final forcing term of the correlation wavefield is a time-dependent quantity that is distributed in space. In a realistic inverse problem, however, a continuous and infinite-dimensional frequency dependence cannot be resolved. We therefore choose a parametrization of the power-spectral density in terms of L basis functions Sl(x) that are constant within a period band given by ωl and ωl + 1 and zero outside. The integral over frequency is thus split up in a sum of L integrals, each covering the frequency band of the respective basis function. Eq. (12) changes accordingly to   \begin{equation} \rho \, \partial _{tt} C({\bf x},{\bf x}_2,t) - \partial _{\bf x}\bigl [\mu \, \partial _{\bf x} C({\bf x},{\bf x}_2,t)\bigr ] = \underbrace{\frac{1}{2\pi } \textrm{Re} \sum _l S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t)\, e^{-i\omega t}\, \text{d}t\, \Bigr ]^* e^{i\omega t} \, \text{d}\omega }_{=:\,P[\, S({\bf x}),\, G({\bf x}, {\bf x}_2)\, ]}. \end{equation} (13)This represents only one possibility to adapt P[ S(x), G(x, x2) ] for inversions and other definitions might be more feasible, but for clarity we base the following examples on eq. (13). Correlation functions for one frequency band are shown in Fig. 1 for different distributions in space. The two heterogeneous noise source distributions lead to asymmetric correlations functions with waves arriving earlier than the surface wave, which cannot be interpreted reasonably by Green function retrieval. Figure 1. View largeDownload slide Correlation functions for different noise source distributions computed for a frequency band of 0.05–0.18 Hz and a homogeneous velocity model with v = 4000 km s−1. The noise source distributions for (b) and (c) are shown on the right. Each correlation function is normalized. Figure 1. View largeDownload slide Correlation functions for different noise source distributions computed for a frequency band of 0.05–0.18 Hz and a homogeneous velocity model with v = 4000 km s−1. The noise source distributions for (b) and (c) are shown on the right. Each correlation function is normalized. 3 THE CONTINUOUS ADJOINT METHOD FOR FIRST AND SECOND DERIVATIVES 3.1 First derivatives The objective of full waveform ambient noise inversion is to find material parameters m and a power-spectral density distribution Snm, which minimizes a misfit $$\mathcal {X}$$ used to quantify the deviation of synthetic correlation functions and their corresponding observations. Since $$\mathcal {X}$$ is generally a non-linear function of m and Snm, we approximate the optimum with the help of iterative minimization algorithms. In order to obtain an efficient formulation to compute directional derivatives of $$\mathcal {X}$$ with respect to the power-spectral density and Earth structure, we apply the continuous adjoint method in the following (Tarantola 1988; Tromp et al. 2005; Fichtner et al. 2006) to the PDEs (8) and (9) describing the modelling of correlation functions. For the sake of a clean notation we present the derivation only for scalar wavefields. The extension to elastic media follows the same steps. 3.1.1 Source kernel The choice of the misfit functional depends on the specific problem and influences to a large extent the results of an inversion. Since the formalism described in the following, however, can be applied to any differentiable misfit functional, we write the misfit in a generic form by two integrals, which we denote by 〈.〉. One integral is over space $${\bf x} \in \oplus \subset \mathbb {R}^3$$ and the other over time t ∈ τ = [t1, t2] or equivalently over angular frequency ω ∈ W = [ω1, ω2] (Fichtner 2010). We therefore write   \begin{equation} \mathcal {X}({\bf m},S) = \Bigl \langle \chi (C ({\bf m},S) ) \Bigr \rangle , \end{equation} (14)where χ is a space and time dependent evaluation of the discrepancy between synthetics and observations. For observations at specific locations, for example, at seismic stations, χ can be defined by introducing δ functions in space. Without loss of generality, we consider a time interval τ well suited for correlation functions, which is centred around time zero, that is, τ = [−tc, tc], where tc is the maximum lag of the correlation function. An improvement of the fit translates to a decrease of the misfit functional $$\mathcal {X}$$. In order to invert for the distribution of noise sources in space, we are therefore interested in the directional derivative of $$\mathcal {X}$$ with respect to S. Applying the chain rule we get   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S = \nabla _C\mathcal {X}\, \delta _S C = \Bigl \langle \nabla _C\chi \, \delta _S C \Bigr \rangle , \end{equation} (15)where   \begin{equation} \delta _S C = \nabla _S C\, \delta S \end{equation} (16)represents the first-order change of the correlation C for a variation in the source distribution δS. The difficulty of eq. (15) lies in the appearance of δSC that is computationally expensive to evaluate for all possible directions δS necessary for a complete evaluation of the directional derivative. A first-order finite-difference approximation of $$\nabla _{S}\mathcal {X}$$ would mean to evaluate C(m, S + δS) for all possible directions δS, which becomes infeasible for expensive forward calculations or large model domains. Thus, the following considerations are directed to eliminate δSC in eq. (15). To achieve this, we differentiate eq. (9) with respect to S, which gives   \begin{equation} \mathcal {L}(\delta _{S}C) - P(\, \delta S, G\, ) = 0, \end{equation} (17)where we use that $$\mathcal {L}$$ is linear in C and P linear in S, that is,   \begin{equation} \nabla _C\mathcal {L}(C)\, \delta _{S}C = \mathcal {L}(\delta _{S}C), \end{equation} (18)and   \begin{equation} \nabla _S \, P(\, S, G\, )\, \delta S = P(\, \delta S, G\, ). \end{equation} (19)Multiplying eq. (17) by an arbitrary test function u†, applying the integrals 〈.〉 and adding it to eq. (15) yields   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S = \Bigl \langle \nabla _C\chi \, \delta _S C + u^\dagger \, \mathcal {L}( \delta _{S}C ) - u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle . \end{equation} (20)Invoking the adjoint operators defined by   \begin{equation} \Bigl \langle \nabla _C\chi \, \delta _S C \Bigr \rangle = \Bigl \langle \delta _{S} C\, \nabla _C\chi ^\dagger \Bigr \rangle \end{equation} (21)and   \begin{equation} \Bigl \langle u^\dagger \, \mathcal {L}(\delta _{S}C)\Bigr \rangle = \Bigl \langle \delta _{S}C\, \mathcal {L}^\dagger (u^\dagger )\Bigr \rangle \end{equation} (22)for any u† and δSC, we obtain   \begin{eqnarray} \nabla _{S}\mathcal {X} \, \delta S & = & \Bigl \langle \delta _{S} C\, \nabla _C\chi ^\dagger + \delta _{S}C\, \mathcal {L}^\dagger (u^\dagger ) - u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle \nonumber\\ & = & \Bigl \langle \delta _{S} C \, [\, \nabla _C\chi ^\dagger + \mathcal {L}^\dagger (u^\dagger )\, ] - u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle . \end{eqnarray} (23)We can now eliminate δSC when a field u† can be found such that the adjoint equation   \begin{equation} \mathcal {L}^\dagger ( u^\dagger ) = -\nabla _C\chi ^\dagger \end{equation} (24)is satisfied. The right hand side of eq. (24) is usually referred to as adjoint source time function and we denote it by f in the example sections below. The directional derivative of $$\mathcal {X}$$ in eq. (15) then simplifies to   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S = -\Bigl \langle u^\dagger \, P(\, \delta S, G\, )\Bigr \rangle . \end{equation} (25)Choosing the frequency domain for illustration, eq. (25) can be written as   \begin{equation} \nabla _{S}\mathcal {X} \, \delta S({\bf x},\omega ) = -\int \limits _\oplus \int \limits _W u^\dagger ({\bf x},\omega )\, P\bigl [\, \delta S({\bf x},\omega ), G({\bf x}, {\bf x}_2, \omega )\, \bigr ] \, \text{d}{\bf x}\, \text{d}\omega = \int \limits _\oplus \int \limits _W K(x, \omega )\, \delta S({\bf x},\omega ) \, \text{d}{\bf x}\, \text{d}\omega , \end{equation} (26)where we define the frequency dependent noise source kernel as   \begin{equation} K(x, \omega ) = - u^\dagger ({\bf x},\omega )\, P\bigl [\, \circ , G({\bf x}, {\bf x}_2, \omega )\, \bigr ]. \end{equation} (27)In the time domain, any definition of P includes an integral over angular frequency ω. Consequently, a noise source kernel can be defined in a similar way in the time domain. A kernel represents the first-order change of the measured misfit and tells us in our case where changes in the source distribution have an influence on the misfit. By construction, we can now compute $$\nabla _{S}\mathcal {X} \, \delta S$$ for any direction δS without the explicit knowledge of δSC. The price we have to pay is to find the adjoint operator $$\mathcal {L}^\dagger$$ and to solve the adjoint problem according to eq. (24). In non-dissipative media, the wave equation operator is self-adjoint, meaning that $$\mathcal {L}=\mathcal {L}^\dagger$$. The derivation of $$\mathcal {L}^\dagger$$ can be found, for example, in Fichtner (2010). In the time domain, the adjoint eq. (24) has to be solved subject to terminal conditions, requiring that the adjoint wavefield is zero at time tc. In practice, it is solved backwards in time, such that the terminal conditions act as initial conditions for the numerical simulation (Tarantola 1988; Tromp et al. 2005; Plessix 2006; Fichtner et al. 2006). Example for membrane waves After the derivation of the source kernel, we come back to the example introduced in Section 2.2. Suppose we have a perfect Earth model for a certain study area and we calculate synthetic correlation functions for a homogeneous distribution of noise sources. Since noise sources at any frequency are heterogeneously distributed and non-stationary (Ardhuin et al. 2011; Stutzmann et al. 2012; Ardhuin et al. 2015), there is a discrepancy between synthetics and observations. We are therefore interested in how each basis function Sl(x) has to be changed to improve the fit between synthetics and observations. According to eq. (25) this is given by   \begin{eqnarray} \nabla _{S_l}\chi \, \delta S_l & = & -\Bigl \langle u^\dagger \, P(\, \delta {S_l}, G\, )\Bigr \rangle \nonumber\\ & = & - \frac{1}{2\pi } \textrm{Re} \int \limits _\oplus \int \limits _\tau u^\dagger ({\bf x}, t)\, \delta S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t^{\prime })\, e^{-i\omega t^{\prime }} \text{d}t^{\prime } \Bigr ]^* e^{i\omega t} \, \text{d}\omega \, \text{d}t\, \text{d}{\bf x}. \end{eqnarray} (28)Since Sl is constant in the corresponding frequency band, the kernel Kl can be defined as   \begin{equation} K_l({\bf x}) = - \frac{1}{2\pi } \textrm{Re} \int \limits _\tau u^\dagger ({\bf x},t) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} G({\bf x}, {\bf x}_2, t^{\prime })\, e^{-i\omega t^{\prime }} \, \text{d}t^{\prime }\, \Bigr ]^* e^{i\omega t} \, \text{d}\omega \, \text{d}t, \end{equation} (29)such that   \begin{equation} \nabla _{S_l}\chi \, \delta S_l = \int \limits _\oplus K_l({\bf x}) \delta S_l({\bf x})\, \text{d}{\bf x}. \end{equation} (30)The frequency dependence of the noise source is in this example honoured by L noise source kernels, one for each basis function. The adjoint wavefield u†(x) can be interpreted as a time reversed Green function G(x, x1, −t), with its source at the receiver x1 where the measurement is taken, convolved ⋆ with the adjoint source time function f(t). We rewrite eq. (29) accordingly and obtain   \begin{equation} K_l({\bf x}) = - \int \limits _\tau f(t) \star G({\bf x}, {\bf x}_1, -t)\, G_l({\bf x}, {\bf x}_2, -t)\, \text{d}t, \end{equation} (31)where we abbreviate the complex conjugate of the finite Fourier transform and its band-limited inverse, covering the angular frequencies of the corresponding basis function, with Gl(x, x2, −t). Since the time reversed Green function is purely acausal, the kernel has to be evaluated only for times t ≤ 0. Examples of source kernels are presented in Fig. 2. The general shape is governed by the interaction of two Green functions, one at the reference station x2 and the other at the receiver x1, leading to a superposition of hyperbolas with varying eccentricity (Hanasoge 2013; Fichtner 2015; Ermert et al. 2016). Geometrically, a hyperbola can be defined as a set of points, for which the difference between any point and two foci, in our case the two receivers, is constant. Therefore, any source along one hyperbola contributes to the correlation waveform at a specific traveltime. Taking a measurement in a specific time window thus selects a set of hyperbolas and the measurement type can be interpreted as a weighting factor. The alternating signs of the hyperbolas indicate different Fresnel zones. For infinite frequency, the kernels collapse into rays starting at one focal point, directed away from the second. Figure 2. View largeDownload slide Effect of windowing and adjoint source time function on source kernels for a frequency band of 0.02–0.2 Hz. The general shape of the source kernel, that is, the hyperbolas (top left) are governed by two Green functions. Windowing the correlation function, selects different parts of the sensitivity kernel, for example, for the surface wave signal from 73 to 127 s (bottom left) or times around zero lag from −10 to 10 s (top right). Whereas the first three kernels are calculated for waveform differences, the last (bottom right) corresponds to an asymmetry measurement, both computed without data. For details about the specific misfit functionals see Section 4. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. Figure 2. View largeDownload slide Effect of windowing and adjoint source time function on source kernels for a frequency band of 0.02–0.2 Hz. The general shape of the source kernel, that is, the hyperbolas (top left) are governed by two Green functions. Windowing the correlation function, selects different parts of the sensitivity kernel, for example, for the surface wave signal from 73 to 127 s (bottom left) or times around zero lag from −10 to 10 s (top right). Whereas the first three kernels are calculated for waveform differences, the last (bottom right) corresponds to an asymmetry measurement, both computed without data. For details about the specific misfit functionals see Section 4. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. 3.1.2 Structure kernel The assumption of having a perfect structure model as in the example above will, of course, never be fulfilled and missing Earth structure in the model m will contribute to discrepancies between synthetics and observations as well. Therefore, we follow the procedure described in Section 3.1.1 in order to derive sensitivity kernels for material parameters m. The directional derivative of the misfit functional $$\mathcal {X}$$ with respect to δm is given by   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \nabla _C\mathcal {X}\, \delta _{\bf m} C = \Bigl \langle \nabla _C \chi \, \delta _{{\bf m}} C \Bigr \rangle , \end{equation} (32)where   \begin{equation} \delta _{\bf m} C = \nabla _{\bf m} C\, \delta {\bf m} \end{equation} (33)denotes the derivative of the correlation function C with respect to m in the direction of δm. The problem in eq. (32) is this time the appearance of δmC, which is computationally expensive to evaluate for all possible directions δm. In order to again eliminate the unpleasant term, we differentiate eq. (9) with respect to m  \begin{equation} \nabla _{{\bf m}} \mathcal {L}(C) \, \delta {\bf m} + \mathcal {L}(\delta _{{\bf m}}C)- P(S, \delta _{{\bf m}}G\, ) = 0, \end{equation} (34)where   \begin{equation} \delta _{\bf m} G = \nabla _{\bf m} G\, \delta {\bf m} \end{equation} (35)represents the derivative of the Green function G with respect to m in the direction of δm. Please note that we obtain three terms in eq. (34), because not only the correlation function C and the Green function G depend on the material parameters m, but also the operator $$\mathcal {L}$$. Multiplying eq. (34) with the adjoint wavefield u† defined in eq. (24), applying 〈.〉 and adding the term to eq. (32) leads to   \begin{equation} \delta _{{\bf m}}\chi = \Bigl \langle \nabla _C \chi \, \delta _{{\bf m}} C + u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + u^\dagger \, \mathcal {L}( \delta _{{\bf m}}C ) - u^\dagger \, P(S, \delta _{{\bf m}}G\, ) \Bigr \rangle . \end{equation} (36)With the same set of adjoint operators defined before in eqs (21) and (22), we eliminate terms one and three due to the definition of u† and obtain   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} - u^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle . \end{equation} (37)Since the source term for the correlation wavefield also depends on the material parameters m in terms of the Green function G, we still have the term δmG in eq. (37) and the directional derivative cannot be evaluated for arbitrary directions δm. Trying to eliminate δmG, we consider the derivative of eq. (8) with respect to m in the direction δm:   \begin{equation} \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + \mathcal {L}( \delta _{{\bf m}}G ) = 0 \end{equation} (38)Following the procedure above, that is, multiplying eq. (38) with a test field c†, applying the integrals 〈.〉 and adding the resulting term to eq. (37), yields   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} - u^\dagger \, P( S, \delta _{{\bf m}}G )+ c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + c^\dagger \mathcal {L}( \delta _{{\bf m}}G ) \Bigr \rangle . \end{equation} (39)Now invoking a second pair of adjoint operators   \begin{equation} \Bigl \langle u^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle = \Bigl \langle \delta _{{\bf m}}G\, P^\dagger ( S, u^\dagger ) \Bigr \rangle , \end{equation} (40)  \begin{equation} \Bigl \langle c^\dagger \, \mathcal {L}( \delta _{{\bf m}}G ) \Bigr \rangle = \Bigl \langle \delta _{{\bf m}}G\, \mathcal {L}^\dagger ( c^\dagger ) \Bigr \rangle , \end{equation} (41)we can alter eq. (39) to   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} - \delta _{{\bf m}}G\, [ P^\dagger ( S, u^\dagger ) - \mathcal {L}^\dagger (c^\dagger ) ] \Bigr \rangle . \end{equation} (42)When a field c† can be found that satisfies the second adjoint equation given by   \begin{equation} \mathcal {L}^\dagger ( c^\dagger ) = P^\dagger ( S, u^\dagger ), \end{equation} (43)the term δmG is eliminated. The derivative of the misfit functional $$\mathcal {X}$$ with respect to material parameters m can then be computed for any direction δm by   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} \Bigr \rangle = \int \limits _\oplus K({\bf x})\, \delta {\bf m}\, \text{d}{\bf x}, \end{equation} (44)where the sensitivity kernel K is in the time domain defined as   \begin{equation} K({\bf x}) = \int \limits _{\tau } u^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C) + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \text{d}t. \end{equation} (45)For the computation of sensitivity kernels for Earth structure, two adjoint equations given by eqs (24) and (43) have to be solved. The steps in solving the adjoint equations follow the recipe introduced in Section 2 to forward model correlation functions: First, a wavefield u† is computed, which is then combined with the power-spectral density distribution according to P†( S, u† ). The combination then drives in the last step a second wavefield c† as distributed source following eq. (43). For a further interpretation of the second adjoint wavefield, we first consider the right hand side of eq. (43) in the frequency domain and for a self-adjoint operator P, meaning that P = P†. We can then write   \begin{equation} P^\dagger \bigl [ S({\bf x},\omega ), u^\dagger ({\bf x},\omega ) \bigr ] = S({\bf x},\omega )\, u^{\dagger *}({\bf x},\omega ). \end{equation} (46)Using eq. (46) as the source term in the scalar version of the representation theorem in eq. (2) and writing again u† = f(ω) G*(x, x1, ω), we obtain   \begin{equation} c^\dagger ({\bf x},\omega ) = f^*(\omega ) \int \limits _\oplus G^*({\bf x},\boldsymbol{\xi },\omega ) \Bigl [S({\bf x},\omega ) G({\bf x}_1, \boldsymbol{\xi }, \omega )\Bigr ]\, \text{d}\boldsymbol{\xi }, \end{equation} (47)where the first Green function $$G^*({\bf x},\boldsymbol{\xi })$$ is time reversed, since c† is an adjoint wavefield. Comparing eq. (47) to the correlation wavefield given by expression (7), c† can be interpreted as an adjoint correlation wavefield with a reference station at x1, modulated with the adjoint source time function f(ω). The link between the derivation presented here and by Fichtner (2015) is presented in Appendix B. Example for membrane waves Similar to the derivation of the adjoint operator $$\mathcal {L}^\dagger$$, the explicit form of P† depends on the definition of P. A derivation for our example case is given in Appendix C. In addition, we apply the general expression for the derivative of the objective function $$\mathcal {X}$$ in direction δm (eq. (44)) to membrane waves and, for simplicity, only to variations in density ρ. Using the wave operator defined in eq. (10), we obtain for the derivative of the wave equation operator with respect to ρ   \begin{equation} \nabla _{\rho } \mathcal {L}(\circ ) \, \delta \rho = \delta \rho \, \partial _{tt}\circ . \end{equation} (48)Consequently, we find   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger \, \partial _{tt}{C}\, \delta \rho + c^\dagger \, \partial _{tt}{G}\, \delta \rho \Bigr \rangle = \int \limits _\oplus K_{\rho }({\bf x})\, \delta \rho \, \text{d}{\bf x}. \end{equation} (49)Integration by parts and using the terminal conditions for the adjoint wavefield provides a more symmetric version of the density kernel, which is convenient for numerical simulations, and is given by   \begin{equation} K_\rho ({\bf x}) = \int \limits _\tau \partial _{t}u^\dagger ({\bf x},t)\, \partial _{t}C({\bf x},t)\, + \partial _{t}c^\dagger ({\bf x},t)\, \partial _{t}G({\bf x},t)\, \text{d}t. \end{equation} (50)Following the same recipe, kernels can be derived for the elastic parameters or seismic velocities. Examples for structure kernels for different source distributions are illustrated in Fig. 3. Inherent in each structure kernel are elliptical features. For any point along an ellipse, the sum of the distances to two foci is constant. A change of the medium at any of these points therefore affects the waveforms at a specific traveltime. For a point source and two receivers, the structure kernel is composed of ellipses, each with one focal point at the source position and the other at one of the receivers. Any other source distribution can be explained by a superposition of point sources. For the special case of a homogeneous distribution of noise sources, the sensitivity to each noise source location vanishes and the two focal points are located at the receivers. The kernel then resembles finite-frequency kernels for earthquake tomography (Dahlen et al. 2000; Dahlen & Baig 2002; Zhou et al. 2004; Tromp et al. 2005). The kernel width is again determined by the frequency content. Figure 3. View largeDownload slide Structure kernels for four different source distributions: homogeneous (top left), Gaussian anomaly (bottom left) and two point sources (right). In the last three scenarios, the source anomalies are indicated in green colours. All kernels are calculated for a frequency band of 0.02 to 0.2 Hz using waveform differences as misfit functional, computed without data. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. Figure 3. View largeDownload slide Structure kernels for four different source distributions: homogeneous (top left), Gaussian anomaly (bottom left) and two point sources (right). In the last three scenarios, the source anomalies are indicated in green colours. All kernels are calculated for a frequency band of 0.02 to 0.2 Hz using waveform differences as misfit functional, computed without data. Each source kernel is normalized and the colour scale clipped at 0.5 to highlight more Fresnel zones. 3.2 Second derivatives With the developments above, we can attribute differences in synthetic and observed correlation functions either to the material parameters m or to the power-spectral density distribution Snm. In order to establish the foundation of a resolution analysis that accounts for the resulting trade-offs between noise sources and Earth structure, we extend the theory to the computation of second derivatives. In the context of probabilistic inverse problems, the inverse Hessian matrix in the vicinity of the global model and under the assumption of Gaussian statistics can be interpreted as an approximation of the posterior covariance matrix (Tarantola 2005; Fichtner & Trampert 2011). The Hessian matrix therefore contains all the information on resolution and trade-offs that we are trying to retrieve. Since mixed derivatives are most important for the characterization of trade-offs, we focus on them in the following and show second derivatives only with respect to structure Hmm or source Hss in Appendix D. For the sake of a clean notation we again present the derivation only for the scalar case, while the same recipe applies for the extension to elastic media. Mixed source-structure derivative Hms The basic principles of the adjoint method applied above to get an efficient formulation for the computation of first derivatives, are reused in the following to compute second derivatives. We start with the directional derivative of the misfit functional $$\mathcal {X}$$ with respect to δm, which is given by   \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} = \nabla _C\mathcal {X}\, \delta _{\bf m} C = \Bigl \langle \nabla _C \chi \, \delta _{{\bf m}} C \Bigr \rangle . \end{equation} (51)Since we are interested in mixed derivatives, we differentiate eq. (51) again, but now with respect to S, and we get   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \nabla _{S}\nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m} \, \delta S = \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + \nabla _{C}\chi \, \delta _{{\bf m}S} C \Bigr \rangle , \end{equation} (52)where   \begin{equation} \delta _{{\bf m}S} C = \nabla _{S}\nabla _{\bf m}\, C\, \delta {\bf m}\, \delta S \end{equation} (53)denotes the second-order variation of the correlation function C with a perturbation in m and in S, and Hms( δm, δS ) denotes the Hessian with mixed derivatives applied to the directions δm and δS. The problem in eq. (52) is not only the combined appearance of δmC and δSC, but also the second derivative of the correlation function δmSC, which would be even more expensive to evaluate for all possible directions δm and δS. Trying to eliminate the latter, we differentiate eq. (9) with respect to S and then again with respect to m, yielding   \begin{equation} \mathcal {L}(\delta _{S}C) - P( \delta S, G ) = 0, \end{equation} (54)  \begin{equation} \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} + \mathcal {L}( \delta _{{\bf m}S}C ) - P( \delta S, \delta _{{\bf m}} G ) = 0. \end{equation} (55)Using again the adjoint wavefield u† and the two integrals over space and time/frequency 〈.〉 introduced in eq. (14) in the same manner as before, but now with eq. (52) leads to   \begin{eqnarray} H_{ms}( \delta {\bf m}, \delta S ) &=& \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + \nabla _{C}\chi \, \delta _{{\bf m}S} C \nonumber\\ && + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} + u^\dagger \, \mathcal {L}( \delta _{{\bf m}S}C ) - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle . \end{eqnarray} (56)With the set of adjoint operators defined in eqs (21) and (22), we can eliminate terms two and four due to the definition of u† in eq. (24) and obtain   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle , \end{equation} (57)where only first derivatives of the correlation function and the Green function are involved. Unfortunately, the choice of the first direction determines the effect of the second and it is therefore not possible to get a general expression for the Hessian operator that can be evaluated easily for all possible directions for δm and for δS. Our strategy is therefore to choose one direction δm and try to get an expression for Hms( δm, ○ ), also known as Hessian-vector product, which can be evaluated for all possible directions δS. To achieve this, we define the adjoint wavefield p† by   \begin{equation} \mathcal {L}^\dagger (p^\dagger ) = - \nabla _{C}\nabla _{C}\chi ^\dagger \, \delta _{{\bf m}} C - \nabla _{{\bf m}}\mathcal {L}^\dagger (u^\dagger ), \end{equation} (58)and invoke the adjoint operators   \begin{equation} \Bigl \langle [\nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C]\, \delta _{S} C= \delta _{S} C\, [\nabla _{C}\nabla _{C}\chi ^\dagger \, \delta _{{\bf m}} C]\Bigr \rangle \end{equation} (59)and   \begin{equation} \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C ) = \delta _{S}C\, \nabla _{{\bf m}}\mathcal {L}^\dagger (u^\dagger )\Bigr \rangle . \end{equation} (60)With the definition of p† and the two adjoint operators, we can change eq. (57) to   \begin{equation} H_{ms}(\delta \textbf{m}, \delta S) = \Bigl\langle -\delta_{S} C\, \mathcal{L}^\dagger(p^\dagger) - u^\dagger\,P(\delta S, \delta_{\textbf{m}} G) \Bigr\rangle = \Bigl\langle -p^\dagger\,\mathcal{L}(\delta_{S} C) - u^\dagger\,P(\delta S, \delta_{\textbf{m}} G) \Bigr\rangle. \end{equation} (61)With eq. (54) we remove $$\mathcal {L}(\delta _{S} C)$$ from eq. (61) and replace it with P( δS, G ). Since δS then only appears explicitly, we write   \begin{equation} H_{ms}( \delta \textbf{m}, \circ) = \Bigl\langle -p^\dagger\, P(\circ, G) - u^\dagger\,P(\circ, \delta_{\textbf{m}} G) \Bigr\rangle \end{equation} (62)to illustrate that it can be evaluated for all possible directions δS. Following the idea used in expression (26), a Hessian kernel can be defined in a similar way. The Hessian-vector product Hms( δm, ○ ) can be interpreted as the first-order change in the sensitivity to the distribution of noise sources caused by a perturbation δm in the structure model, which we can think of as a scatterer without loss of generality. The source kernel according to eq. (27) is composed of two components: the adjoint wavefield u† and the Green function G. The change of either one of them is accounted for by the two terms in eq. (62). The wavefield p† describes the change of the adjoint wavefield u† caused by scattering of the adjoint and the correlation wavefield, where the latter affects the misfit that drives the adjoint wavefield u†. The second term in eq. (62) accounts for the scattering of the Green function. A direct interaction of the change in the adjoint wavefield and the change in the Green function would lead to higher order terms. 4 INVERSION With the theoretical developments presented above, we now try to understand the physics of the problem in a numerical model in 2-D and study, how structure and heterogeneous noise source distributions express themselves in noise correlation functions in terms of measurable quantities. We then use the 2-D examples in order to find suitable joint inversion schemes and measurements that are likely to be useful in 3-D real-data problems. The feasibility of the developed strategy is demonstrated in a synthetic inversion. In the following, we use a 2-D finite-difference discretization of the PDE describing membrane waves (eq. 10). The correlations are computed in the time domain for a frequency band from 0.02 to 0.2 Hz. 4.1 Misfit study A key element for the inversion of both noise sources and structure is the design of misfit functionals that decouple the inversion as much as possible and the awareness which quantities can only be recovered jointly. We therefore start our developments by studying the effect of heterogeneous noise source distributions and of structure on noise correlation functions in terms of four different misfit functionals. The first one is based on the causal/acausal asymmetry of noise correlations, proposed by Ermert et al. (2016) for an inference of the distribution of noise sources with the supposition that it is to first order insensitive to unmodelled Earth structure. The asymmetry A is defined by   \begin{equation} A = \ln \frac{ {\int }_\tau [w_+(t) C(t)]^2 \text{d}t}{ {\int }_\tau [w_-(t) C(t)]^2 \text{d}t}, \end{equation} (63)where w+(t) and w−(t) are time windows centred around signals on the positive and negative lag, respectively. The discrepancy between asymmetries of synthetic and observed correlations can be quantified in terms of an L2-misfit. A closely related measurement, referred to as energy differences E in the following, directly compares the absolute energy of windowed signals in synthetics and observations and is given by   \begin{equation} E = \frac{ {\int }_\tau [w(t) C(t)]^2 \text{d}t - {\int }_\tau [w(t) C^o(t)]^2 \text{d}t}{{\int }_\tau [w(t) C^o(t)]^2 \text{d}t}, \end{equation} (64)where the superscript o denotes observations. These two misfits are complemented with waveform differences using the L2-norm and cross-correlation traveltime measurements. Following Luo & Schuster (1991) and Dahlen et al. (2000), we define the latter as the traveltime shift, where the time domain cross-correlation between a synthetic and an observed signal reaches its global maximum. With the exception of waveform differences, the measurements are, for now, only performed on the surface wave arrival. In this study, the measurement process is kept simplistic and information from other signals should be exploited for an application with real data. First, we compile a synthetic reference data set for an array of 16 stations for a homogeneous velocity model and a homogeneous distribution of noise sources, and generate random models with varying degree of complexity following the approach by Igel & Gudmundsson (1997). The strength of the anomalies is chosen such that scattering potential remains constant, that is, size of the anomaly squared times strength. With this scaling the induced traveltime shift of each anomaly is approximately the same independent of the complexity of the medium. The specific choice for the scaling does not alter the conclusion drawn from this study. The array and three velocity models with varying degree of complexity are shown in Fig. 4. We then perform two sets of experiments: Figure 4. View largeDownload slide Source distribution (top left) and velocity model with lowest (top right), moderate (bottom left) and highest complexity (bottom right) used in the misfit study. Receivers are indicated with black triangles and absorbing boundaries with dashed lines. The source distribution and the model with highest complexity are reused in the inversion example as target model. Figure 4. View largeDownload slide Source distribution (top left) and velocity model with lowest (top right), moderate (bottom left) and highest complexity (bottom right) used in the misfit study. Receivers are indicated with black triangles and absorbing boundaries with dashed lines. The source distribution and the model with highest complexity are reused in the inversion example as target model. Series I: Effect of structure We evaluate the effect of structural perturbations alone by forward modelling correlations for the different velocity models with a homogeneous source distribution and compute the discrepancy to the reference data set in terms of the four misfit functionals. The respective values are shown as red lines in Fig. 5. Figure 5. View largeDownload slide Effect of structure (red line) and the combined effect structure and noise sources (blue line) for four different misfit functions: (a) asymmetry, (b) traveltimes, (c) energy differences and (d) waveform differences. The misfit values are normalized by the number of recordings and their units are given in the titles. The average size of the anomalies in the velocity models is given in terms of the dominant wavelength. Figure 5. View largeDownload slide Effect of structure (red line) and the combined effect structure and noise sources (blue line) for four different misfit functions: (a) asymmetry, (b) traveltimes, (c) energy differences and (d) waveform differences. The misfit values are normalized by the number of recordings and their units are given in the titles. The average size of the anomalies in the velocity models is given in terms of the dominant wavelength. Series II: Effect of source We repeat this procedure with the same velocity models, but now with a Gaussian-shaped source anomaly superimposed on the homogeneous background. The Gaussian anomaly is placed on the left side of the array (see Fig. 4), which mimics a frequent configuration with strong sources near coasts (Stehly et al. 2006; Yang & Ritzwoller 2008; Tian & Ritzwoller 2015). The misfit values from this series of models show the combined effect of both noise sources and structure on the measurements and are shown in Fig. 5 as blue line. The difference between misfits caused by structure perturbations only (red lines) and those caused by combined structure and source perturbations (blue lines) provides information on the effect of the heterogeneous noise source. The asymmetry measurement is mostly affected by the heterogeneous noise source distribution and is insensitive, at least to first order, to Earth structure. We thus confirm the plausibility argument by Ermert et al. (2016) that a misfit based on the asymmetry of correlation functions is well suited to study the distribution of noise sources. In contrast, primarily velocity anomalies lead to traveltime differences and, as expected by previous studies (Tsai 2009; Weaver et al. 2009; Froment et al. 2010), the heterogeneous noise source has no profound effect for time-independent tomography. Monitoring applications, which investigate small velocity changes over small timescales, typically use coda waves of correlation functions (Brenguier et al. 2008; Obermann et al. 2013, 2014), aiming at reducing the effect of source variability. In cases where this is not possible, for example due to strong attenuation or the lack of sufficiently strong scatterers, the effect of heterogeneous noise distributions on traveltimes can, however, become significant or even dominant (Zhan et al. 2013; Delaney et al. 2017). The behaviour of the energy measurement changes with the complexity of the structure. For smooth velocity media, the energy of the signal is essentially controlled by the noise source. In strongly heterogeneous media, however, energy is altered through scattering, reflections and focusing effects. These phenomena also lead to more and more complex waveforms, which is expressed by the increase of the misfit based on waveform differences with higher complexity of the velocity model. The shift in magnitude introduced by the source anomaly reflects the resulting amplitude change in the correlation functions. Inversion strategy Based on the findings above, we propose the following inversion strategy. Since the influence of heterogeneous noise sources on traveltimes is small, a wave equation traveltime inversion (Luo & Schuster 1991) can be performed first, assuming a homogeneous distribution of noise sources. The resulting model can then be taken as background model for a source inversion based on the asymmetry of noise correlation functions, because this misfit is hardly affected by unmodelled Earth structure. Alternatively, the model inferred from traveltimes serves as initial model for a joint inversion for noise sources and structure using energy and waveform differences. 4.2 Inversion example To explore the potential and limitations of the proposed strategy, we extend the forward solver with an iterative inversion framework that is based on limited-memory BFGS (L-BFGS) (Nocedal & Wright 2006). In order to deal with the ill-posedness of the inverse problem and to avoid high oscillations in the inverted model, a Tikhonov regularization term (Tikhonov 1963) is added to the misfit function. After running several inversions with different scaling factors for the regularization term, we choose an appropriate value using a pragmatic L-curve approach (Hansen 1992). In addition, we include a Gaussian smoothing operator in the parametrization and choose the dominant wavelength as standard deviation. The inversion is stopped, when the norm of the gradient is reduced by a factor of 10−3, which can be achieved with reasonable computational effort and is close to the first-order necessary condition of optimality (Nocedal & Wright 2006). For the joint inversion of both noise sources and Earth structure, the two misfit functionals, that is, the energy measurement and waveform differences, are weighted using a convex combination. The weight is chosen based on the magnitude of the individual misfit functions, but no significant changes are observed in the inversion results by varying the relative weight by an order of magnitude. Combining different misfits aims at reducing the null space and at making the misfit functional locally convex, which can lead to a higher convergence rate. In our case, a weight favouring waveform differences does not exploit information about the noise source distribution that energy differences provide, thus the inversion converges slower. To compute synthetic data, we use the heterogeneous noise source distribution and the most complex structure model generated for the misfit study (Fig. 4). The typical size of the heterogeneities is approximately 1.5 times the dominant wavelength. Following the proposed strategy, first traveltimes are exploited by a wave equation traveltime inversion assuming a homogeneous distribution of noise sources. The inversion result is presented in Fig. 6. A smooth representation of the velocity structure is recovered and small scale features are lost. This demonstrates the limitation of traveltime tomography, but also its robustness to retrieve the big picture, even when heterogeneous noise source distributions are ignored. Figure 6. View largeDownload slide Target velocity model (left) and inverted model (right) obtained by a wave equation traveltime tomography with a homogeneous distribution of noise sources. To facilitate their comparison, the velocity models are shaded outside of the array, where low resolution is expected. Figure 6. View largeDownload slide Target velocity model (left) and inverted model (right) obtained by a wave equation traveltime tomography with a homogeneous distribution of noise sources. To facilitate their comparison, the velocity models are shaded outside of the array, where low resolution is expected. Using the velocity model obtained from traveltimes as background model, we invert for the distribution of noise sources based on the asymmetry measurement (Fig. 7). It is possible to locate the strong anomaly on the left side of the array. In regions with only the background noise level, the energy is decreased. By dividing the energy on the causal/acausal branch, the asymmetry of noise correlation functions is not sensitive to the absolute energy level. In our example, the asymmetry can therefore be explained by either introducing a corresponding change in the power-spectral density or with a similar anomaly, but smaller in magnitude and in combination with a reduction of the background noise level. Figure 7. View largeDownload slide Inverted source distribution using the asymmetry of noise correlation functions. The velocity model from the wave equation traveltime inversion is used as a background model. Figure 7. View largeDownload slide Inverted source distribution using the asymmetry of noise correlation functions. The velocity model from the wave equation traveltime inversion is used as a background model. Instead of only focusing on the source distribution, we finally perform a joint inversion starting with a homogeneous noise source distribution and the velocity model obtained from traveltimes (Fig. 8). Since the source inversion above based on the asymmetry measurement is not sensitive to the absolute energy level and we use a misfit function for the joint inversion that also exploits amplitude information, we do not take the inverted source model as initial model and start from a homogeneous distribution of noise sources. Within the array, a high resolution image of the target velocity model is recovered and the noise source anomaly is located accurately. Figure 8. View largeDownload slide Source distribution (left) and velocity model (right) recovered by the joint inversion. Figure 8. View largeDownload slide Source distribution (left) and velocity model (right) recovered by the joint inversion. 4.3 Trade-offs 4.3.1 Trade-offs and resolution for joint inversion model In order to assess resolution and trade-offs in the joint inversion, we apply the theoretical developments regarding second derivatives to the model obtained by the joint inversion for the source distribution and velocity model presented above. Introducing a perturbation in the structure model changes the sensitivity to Earth structure. Close to the vicinity of the optimum model, this change in the sensitivity would be the first update in an iterative inversion procedure. It therefore gives a conservative estimate of the resolution in terms of a point spread function, also referred to as spatial intra-parameter trade-off. In Fig. 9, we introduce a scatterer in the velocity model of approximately one third of the dominant wavelength in the centre of the array and compute the corresponding Hessian-vector product for the misfit function used in the joint inversion, that is, the convex combination of waveform and energy differences (right panel in Fig. 9). We infer an approximate width of the point spread function of three times the size of the scatterer, that is, one dominant wavelength, and a smearing of the perturbation in diagonal direction. Figure 9. View largeDownload slide Hessian applied to a perturbation in the velocity model. Introducing a perturbation in the velocity model in the centre of the array (green dots) changes the source kernel (left) and the structure kernel (right). Figure 9. View largeDownload slide Hessian applied to a perturbation in the velocity model. Introducing a perturbation in the velocity model in the centre of the array (green dots) changes the source kernel (left) and the structure kernel (right). At the same time, the perturbation of the velocity model changes the sensitivity to the distribution of noise sources (left panel in Fig. 9). Following the line of arguments above, the change of the source kernel indicates how noise sources can compensate for variations in Earth structure. It is interesting to note that no change in the source distribution coinciding with the introduced scatterer is proposed. 4.3.2 Source-structure trade-offs for different measurements For a further investigation of the trade-offs for the asymmetry measurement and energy differences, we compute Hessian-vector products for both measurements, but now for a perturbation in the source distribution at the centre of the array (Fig. 10). Comparing the ratios between the change of the structure kernel and the change of the source kernel is an indicator for the extent of trade-offs. The ratio for the asymmetry measurement is approximately one order of magnitude smaller and thus confirming its capability to better decouple the inverse problem than energy differences. Figure 10. View largeDownload slide Hessian applied to a perturbation in the source distribution (green dots) for the asymmetry measurement (top) and energy differences (bottom). Please note that absolute amplitudes of Hessian-vector products are difficult to interpret. Figure 10. View largeDownload slide Hessian applied to a perturbation in the source distribution (green dots) for the asymmetry measurement (top) and energy differences (bottom). Please note that absolute amplitudes of Hessian-vector products are difficult to interpret. 5 DISCUSSION AND OUTLOOK In the following paragraphs we discuss further details of the developed method, including benefits and limitations, as well as possible directions for further research. We end this section by giving a short outlook to the application to observed correlation functions. 5.1 Theory Due to the general formulation of the problem in Section 2.1 the derivations are equally applicable to time and frequency domain, and various wave propagation physics, without changing notation. The latter includes any type of medium, including 3-D elastic, heterogeneous and attenuating media. We cannot only reproduce specific derivations in the frequency domain (Hanasoge 2014; Fichtner 2014; Ermert et al. 2016), but also derive corresponding expressions in the time domain, well suited for implementation in modern 3-D wave propagation codes, such as the open-source waveform modelling and inversion package Salvus (Afanasiev et al. 2017) or SPECFEM (Komatitsch & Tromp 2002a,b; Peter et al. 2011). While the last point is also addressed by Tromp et al. (2010), the derivation of sensitivity kernels for the distribution of noise sources and structure is more compact in our approach. It furthermore renders the extension to second derivatives possible. Since the formulation is based on principles applied in conventional source-receiver full waveform inversion, it provides a foundation for collaboration and supports the re-use of already developed methods and tools. Beyond that, the theory unifies the earthquake-based two station method and ambient noise correlations, because we do not impose assumptions on the nature of the wavefield source in eq. (2). Without any processing that mitigates the effect of energetic signals, the sensitivity to Earth structure will be dominated by large earthquakes (Fichtner et al. 2017) and it might be more efficient to apply conventional two station workflows in such cases. Therefore, the unification primarily loosens the requirement in ambient noise tomography to exclude transient signals in order to be closer to the assumption of diffuse wavefields. In addition, with a function P in eq. (9) that is not bilinear in Snm(x) and Gm, j(x, x2), the same formulation can be applied to problems such as deconvolution (Snieder & Şafak 2006; Vasconcelos & Snieder 2008a,b) and provides as such a possible framework to generalize different approaches in interferometry. The assumption on uncorrelated noise sources introduced in eq. (5) facilitates the forward computation of noise correlation functions. It is, however, as such not crucial for the application of the method. The correlation length is finite and the evaluation of the double integral in eq. (4) remains feasible. The basic requirement for the theory to be applicable is to get close to the assumption made on the correlation length in space. In a real data application, one possibility to assess the quality of a specific choice might be to check if the data can be explained to within the observational uncertainties or if a change of the correlation length, maybe in terms of an additional inversion parameter, is required. To the best of the authors’ knowledge there is no observation yet about the quantitative correlation length of different noise sources and since we work with seismic wavelengths that are order of magnitudes larger than ocean waves that mostly generate noise, we use the assumption of uncorrelated noise sources until further insight becomes available. A further constraint on the correlation function itself in terms of convergence is not necessary. The presented theory is only strictly valid for linearly processed data. The goal for suitable processing is therefore inherently different from commonly applied schemes focusing mainly on improving Green function recovery (Bensen et al. 2007). One possibility is either to follow the approach proposed by Fichtner et al. (2017) and to account for non-linear processing or to keep the processing to a minimum and as linear as possible. 5.2 Inversion strategy and example We estimate the effect of heterogeneous noise sources and structure on four different misfit functionals and then propose an inversion strategy. The misfit study is not meant to be exhaustive and to cover all possible configurations. By the specific choice of the heterogeneous noise source distribution we intend to include the most common configuration with strong noise sources near coasts. To validate the main finding regarding the capability of the asymmetry measurement and energy differences to map noise sources, two related Hessian-vector products are included. After obtaining a model from a wave equation traveltime inversion assuming a homogeneous distribution of noise sources, further steps depend on the respective target. Either a source inversion can be performed using the asymmetry measurement or a joint inversion based on energy and waveform differences. In the latter, we include amplitude information, which highly depends on the applied processing and whose information content is in general up for debate (Prieto et al. 2011; Tsai 2011). If one is, however, not specifically interested in the absolute energy level of the noise sources, amplitude based misfit functionals can be used, since relative amplitudes within a carefully processed data set carry information. A possible alternative for the inversion of observed correlation functions is a time- and frequency-dependent phase misfit (Fichtner et al. 2008), which has been applied successfully in conventional source-receiver full waveform inversion (Fichtner et al. 2009, 2013; Fichtner & Villaseñor 2015; Simutė et al. 2016). In general, an important component of a successful application of full waveform ambient noise inversion is to explore more misfits and possibly develop new measurements. One approach could be to extract ‘too-early-to-be-physical’ arrivals. Under perfect conditions, we would retrieve the Green function by correlating noise traces and the first arrivals would be governed by the seismic velocity. Since we never fulfil the required assumptions, spurious arrivals are introduced even before the first possible arrivals. These contain valuable information on noise source locations that could improve source inversions. Further research in that direction is needed to assess its potential. The presented joint inversion example illustrates that it is possible to go beyond traditional noise tomography, which mostly exploits fundamental-mode surface waves. A good inference of the distribution of noise sources is essential, which is analogous to traditional source-receiver methods. We expect that the benefit of full waveform ambient noise inversion will be even larger in 3-D, since higher-mode surface waves are not excited correctly (Halliday & Curtis 2008; Kimman & Trampert 2010), which is necessary for their exploitation in methods that assume a perfect retrieval of the Green function. The presented inversion is performed only for one frequency band and we therefore only include one distribution of noise sources. Since noise sources are frequency dependent, a more complete approach would be to allow several distributions in space, for example in terms of L basis functions as presented in the examples above, and invert for them at the same time. The asymmetry measurement and also energy differences, however, do not carry enough frequency information and the same asymmetry or energy can be obtained by changing any basis function. For the application to data, we will for now honour the frequency dependence by inverting for noise sources in subsequent frequency bands. A joint approach is the topic for further research. 5.3 Trade-offs Traditional ambient noise tomography works under the assumption of a homogeneous distribution of noise sources and thereby seemingly avoids the necessity to constrain the noise source distribution and to characterize trade-offs. The concept is simplistic, but to some extent oversimplified, since it fails to describe the physical system where both the distribution of noise sources and Earth structure determine the actual waveforms in correlation functions. The proposed method allows us to jointly invert for the distribution of noise sources and structure. Similar to earthquake tomography, errors in the inference of source properties map into tomographic images of Earth structure (Valentine & Woodhouse 2010). Trade-offs between source and structure can never be avoided completely, but to honour and to be able to interpret results obtained by joint inversion they have to be quantified. Exactly for that purpose, we present a second-order adjoint approach, enabling us to perform a resolution analysis that accounts for intra- and interparameter trade-offs. The trade-off kernel in Fig. 9 indicates indeed that a change in the source distribution can compensate for a scatterer introduced in the structure. In the inversion example, this trade-off seems to be manageable, but these trade-offs are likely to be larger when scattering is dominant. A detailed and quantitative analysis of trade-offs will, however, only be possible with data taking measurement errors into account and is subject for further research. The same concepts can be applied to develop and characterize different misfit functionals aiming at decoupling the inversion as much as possible. 5.4 Application to data For the application of the developed method to observed correlation functions, we are currently extending the spectral element solver Salvus (http://salvus.io) to be able to compute correlation functions in 3-D media with heterogeneous noise sources at the surface and the corresponding sensitivity kernels for the distribution of noise sources and Earth structure. The first goal is a global inversion for the Hum period band. The application to regional scales poses further challenges, because noise sources and structure outside of the domain of interest affect the correlation functions. Further research is needed to solve this problem. 6 CONCLUSIONS In the previous sections we extended the theoretical background for full waveform ambient noise inversion that goes beyond Green function retrieval. It accounts for the distribution of noise sources in space and frequency, operates for any type of medium, including a 3-D elastic, heterogeneous Earth with attenuation and anisotropy, and models the full seismic wave propagation physics. The general formulation of the forward problem of computing correlation functions facilitates the application of adjoint techniques, which enables us to compute first and also second derivatives efficiently. The numerical study revealed that the sensitivity of different observables to the distribution of noise sources and to structure varies. Based on that, we designed and implemented an appropriate inversion scheme in order to constrain both of them. The presented joint inversion example demonstrates that full waveform ambient noise inversion has the potential to improve the resolution of tomographic images compared to traditional ambient noise tomography and, in addition, to refine noise source location. An improvement of the latter is essential for a better understanding of noise generation. Inherent trade-offs between source and structure are quantified using Hessian-vector products. ACKNOWLEDGEMENTS The authors would like to thank Editor Jean Virieux and two anonymous reviewers for their valuable comments and excellent suggestions to improve the paper. In addition, the authors wish to express their gratitude towards Laurent Stehly and Evan Delaney for useful discussions and brainstorming sessions. This research was supported by the Swiss National Supercomputing Center (CSCS) in the form of the GeoScale and ch1/s741 projects, and by the Swiss National Science Foundation (SNSF) under grant 200021_149143. The authors acknowledge support from and discussions within TIDES COST Action ES1401. REFERENCES Afanasiev M., Boehm C., van Driel M., Krischer L., May D.A., Rietmann M., Fichtner A., 2017. Salvus: an open-source high-performance package for full waveform modelling and inversion from laboratory to global scales , doi:10.5905/ethz-1007-88. Agnew D.C., Berger J., 1978. Vertical seismic noise at very low frequencies, J. geophys. Res. , 83( B11), 5420– 5424. Google Scholar CrossRef Search ADS   Aki K., 1957. Space and time spectra of stationary stochastic waves with special reference to microtremors, Bull. Earthq. Res. Inst. , 35, 415– 456. Aki K., Richards P., 2002. Quantitative Seismology , University Science Books. Ardhuin F., Stutzmann E., Schimmel M., Mangeney A., 2011. Ocean wave sources of seismic noise, J. geophys. Res. , 116, doi:10.1029/2011JC006952. Ardhuin F., Gualtieri L., Stutzmann E., 2015. How ocean waves rock the earth: two mechanisms explain microseisms with periods 3 to 300s, Geophys. Res. Lett. , 42( 3), 765– 772. Google Scholar CrossRef Search ADS   Basini P., Nissen-Meyer T., Boschi L., Casarotti E., Verbeke J., Schenk O., Giardini D., 2013. The influence of nonuniform ambient noise on crustal tomography in Europe, Geochem. Geophys. Geosyst. , 14, 1471– 1492. Google Scholar CrossRef Search ADS   Bensen G.D., Ritzwoller M.H., Barmin M.P., Levshin A.L., Lin F., Moschetti M.P., Shapiro N.M., Yang Y., 2007. Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements, Geophys. J. Int. , 169, 1239– 1260. Google Scholar CrossRef Search ADS   Brenguier F., Campillo M., Hadziioannou C., Shapiro N.M., Nadeau R.M., Larose E., 2008. Postseismic relaxation along the san andreas fault at Parkfield from continuous seismological observations, Science , 321( 5895), 1478– 1481. Google Scholar CrossRef Search ADS PubMed  Campillo M., Paul A., 2003. Long-range correlations in the diffuse seismic coda, Science , 299( 5606), 547– 549. Google Scholar CrossRef Search ADS PubMed  Chen P., Zhao L., Jordan T.H., 2007. Full 3D tomography for the crustal structure of the Los Angeles region., Bull. seism. Soc. Am. , 97, 1094– 1120. Google Scholar CrossRef Search ADS   Claerbout J.F., 1968. Synthesis of a layered medium from its acoustic transmission response, Geophysics , 33( 2), 264– 269. Google Scholar CrossRef Search ADS   Cupillard P., Capdeville Y., 2010. On the amplitude of surface waves obtained by noise correlation and the capability to recover the attenuation: a numerical approach, Geophys. J. Int. , 181, 1687– 1700. Dahlen F., Hung S.-H., Nolet G., 2000. Fréchet kernels for finite-frequency traveltimes – I. Theory, Geophys. J. Int. , 141, 157– 174. Google Scholar CrossRef Search ADS   Dahlen F.A., Baig F.A., 2002. Fréchet kernels for body wave amplitudes, Geophys. J. Int. , 150, 440– 466. Google Scholar CrossRef Search ADS   Delaney E., Ermert L., Sager K., Kritski A., Bussat S., Fichtner A., 2017. Passive seismic monitoring with nonstationary noise sources, Geophysics , 82( 4), KS57– KS70. Google Scholar CrossRef Search ADS   Einstein A., 1906. Zur Theorie der Brownschen Bewegung, Ann. Phys. , 4( 19), 371– 381. Google Scholar CrossRef Search ADS   Ermert L., Villaseñor A., Fichtner A., 2016. Cross-correlation imaging of ambient noise sources, Geophys. J. Int. , 204( 1), 347– 364. Google Scholar CrossRef Search ADS   Fichtner A., 2010. Full Seismic Waveform Modelling and Inversion. , Springer. Fichtner A., 2014. Source and processing effects on noise correlations, Geophys. J. Int. , 197, 1527– 1531. Google Scholar CrossRef Search ADS   Fichtner A., 2015. Source-structure trade-offs in ambient noise correlations, Geophys. J. Int. , 202( 1), 678– 694. Google Scholar CrossRef Search ADS   Fichtner A., Trampert J., 2011. Hessian kernels of seismic data functionals based upon adjoint techniques, Geophys. J. Int. , 185, 775– 798. Google Scholar CrossRef Search ADS   Fichtner A., Villaseñor A., 2015. Crust and upper mantle of the western Mediterranean - Constraints from full-waveform inversion, Earth planet. Sci. Lett. , 428, 52– 62. Google Scholar CrossRef Search ADS   Fichtner A., Bunge H.-P., Igel H., 2006. The adjoint method in seismology - I. Theory, Phys. Earth planet. Inter. , 157, 86– 104. Google Scholar CrossRef Search ADS   Fichtner A., Kennett B.L.N., Igel H., Bunge H.-P., 2008. Theoretical background for continental- and global-scale full-waveform inversion in the time-frequency domain., Geophys. J. Int. , 175, 665– 685. Google Scholar CrossRef Search ADS   Fichtner A., Kennett B.L.N., Igel H., Bunge H.-P., 2009. Full seismic waveform tomography for upper-mantle structure in the Australasian region using adjoint methods., Geophys. J. Int. , 179, 1703– 1725. Google Scholar CrossRef Search ADS   Fichtner A., Saygin E., Taymaz T., Cupillard P., Capdeville Y., Trampert J., 2013. The deep structure of the North Anatolian Fault Zone, Earth planet. Sci. Lett. , 373, 109– 117. Google Scholar CrossRef Search ADS   Fichtner A., Stehly L., Ermert L., Boehm C., 2017. Generalized interferometry - I: theory for interstation correlations, Geophys. J. Int. , 208( 2), 603– 638. Google Scholar CrossRef Search ADS   Froment B., Campillo M., Roux P., Gouédard P., Verdel A., Weaver R.L., 2010. Estimation of the effect of nonisotropically distributed energy on the apparent arrival time in correlations, Geophysics , 75, SA85– SA93. Google Scholar CrossRef Search ADS   Gizon L., Birch A.C., 2002. Time-distance helioseismology: The forward problem for random distributed sources, Astrophys. J. , 571( 2), 966. Google Scholar CrossRef Search ADS   Groos J.C., Bussat S., Ritter J.R.R., 2012. Performance of different processing schemes in seismic noise cross-correlations, Geophys. J. Int. , 188, 498– 512. Google Scholar CrossRef Search ADS   Gualtieri L., Stutzmann E., Capdeville Y., Ardhuin F. M., Schimmel A.M., Morelli A., 2013. Modeling secondary microseismic noise by normal mode summation, Geophys. J. Int. , 193, 1732– 1745. Google Scholar CrossRef Search ADS   Halliday D., Curtis A., 2008. Seismic interferometry, surface waves and source distribution, Geophys. J. Int. , 175, 1067– 1087. Google Scholar CrossRef Search ADS   Hanasoge S.M., 2013. The influence of noise sources on cross-correlation amplitudes, Geophys. J. Int. , 192, 295– 309. Google Scholar CrossRef Search ADS   Hanasoge S.M., 2014. Measurements and kernels for source-structure inversions in noise tomography, Geophys. J. Int. , 192, 971– 985. Google Scholar CrossRef Search ADS   Hanasoge S.M., Birch A., Gizon L., Tromp J., 2011. The adjoint method applied to time-distance helioseismology, Astrophys. J. , 738, doi:10.1088/0004–637X/738/1/100. Haned A., Stutzmann E., Schimmel M., Kiselev S., Davaille A., Yelles-Chaouche A., 2016. Global tomography using seismic hum, Geophys. J. Int. , 204( 2), 1222– 1236. Google Scholar CrossRef Search ADS   Hansen P.C., 1992. Analysis of discrete ill-posed problems by means of the L-curve, SIAM Rev. , 34( 4), 561– 580. Google Scholar CrossRef Search ADS   Harmon N., Rychert C., Gerstoft P., 2010. Distribution of noise sources for seismic interferometry, Geophys. J. Int. , 183( 3), 1470– 1484. Google Scholar CrossRef Search ADS   Igel H., Gudmundsson O., 1997. Frequency-dependent effects on travel times and waveforms of long-period S and SS waves, Phys. Earth. planet. Int. , 104, 229– 246. Google Scholar CrossRef Search ADS   Kimman W., Trampert J., 2010. Approximations in seismic interferometry and their effects on surface waves, Geophys. J. Int. , 182, 461– 476. Komatitsch D., Tromp J., 2002a. Spectral-element simulations of global seismic wave propagation, part II: 3-D models, oceans, rotation, and gravity, Geophys. J. Int. , 150, 303– 318. Google Scholar CrossRef Search ADS   Komatitsch D., Tromp J., 2002b. Spectral-element simulations of global seismic wave propagation, part I: validation, Geophys. J. Int. , 149, 390– 412. Google Scholar CrossRef Search ADS   Landès M., Hubans F., Shapiro N.M., Paul A., Campillo M., 2010. Origin of deep ocean microseisms by using teleseismic body waves, J. geophys. Res. , 115( B5), B05302, doi:10.1029/2009JB006918. Google Scholar CrossRef Search ADS   Lobkis O.I., Weaver R.L., 2001. On the emergence of the Green’s function in the correlations of a diffuse field, J. acoust. Soc. Am. , 110, 3011– 3017. Google Scholar CrossRef Search ADS   Luo Y., Schuster G.T., 1991. Wave-equation traveltime inversion., Geophysics , 56, 645– 653. Google Scholar CrossRef Search ADS   Nishida K., 2014. Source spectra of seismic hum, Geophys. J. Int. , 199( 1), 416– 429. Google Scholar CrossRef Search ADS   Nishida K., Fukao Y., 2007. Source distribution of Earth’s background free oscillations, J. geophys. Res. , 112( B6), B06306, doi:10.1029/2006JB004720. Google Scholar CrossRef Search ADS   Nishida K., Montagner J.-P., Kawakatsu H., 2009. Global surface wave tomography using seismic hum, Science , 326( 5949), 112. Nocedal J., Wright S., 2006. Numerical Optimization , Springer Science & Business Media. Obermann A., Planès T., Larose E., Campillo M., 2013. Imaging preeruptive and coeruptive structural and mechanical changes of a volcano with ambient seismic noise, J. geophys. Res. , 118( 12), 6285– 6294. Google Scholar CrossRef Search ADS   Obermann A., Froment B., Campillo M., Larose E., Planès T., Valette B., Chen J.H., Liu Q.Y., 2014. Seismic noise correlations to image structural and mechanical changes associated with the Mw 7.9 2008 Wenchuan earthquake, J. geophys. Res. , 119( 4), 3155– 3168. Google Scholar CrossRef Search ADS   Peter D., Tape C., Boschi L., Woodhouse J.H., 2007. Surface wave tomography: global membrane waves and adjoint methods, Geophys. J. Int. , 171( 3), 1098– 1117. Google Scholar CrossRef Search ADS   Peter D. et al.  , 2011. Forward and adjoint simulations of seismic wave propagation on fully unstructured hexahedral meshes, Geophys. J. Int. , 186, 721– 739. Google Scholar CrossRef Search ADS   Plessix R.-E., 2006. A review of the adjoint-state method for computing the gradient of a functional with geophysical applications, Geophys. J. Int. , 167, 495– 503. Google Scholar CrossRef Search ADS   Pratt R.G., 1999. Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model, Geophysics , 64, 888– 901. Google Scholar CrossRef Search ADS   Prieto G.A., Denolle M., Lawrence J.F., Beroza G.C., 2011. On amplitude information carried by the ambient seismic field, C. R. Geosci. , 343( 8-9), 600– 614. Google Scholar CrossRef Search ADS   Rhie J., Romanowicz B., 2004. Excitation of Earth’s continuous free oscillations by atmosphere–ocean–seafloor coupling, Nature , 431( 7008), 552– 556. Google Scholar CrossRef Search ADS PubMed  Sabra K.G., Gerstoft P., Roux P., Kuperman W.A., 2005a. Surface wave tomography from microseisms in Southern California, Geophys. Res. Lett. , 32, doi:10.1029/2005GL023155. Sabra K.G., Gerstoft P., Roux P., Kuperman W.A., Fehler M.C., 2005b. Extracting time-domain Green’s function estimates from ambient seismic noise, Geophys. Res. Lett. , 32( 3), L03310, doi:10.1029/2004GL021862. Google Scholar CrossRef Search ADS   Sánchez-Sesma F.J., Campillo M., 2006. Retrieval of the Green’s function from cross correlation: The canonical elastic problem, Bull. seism. Soc. Am. , 96, 1182– 1191. Google Scholar CrossRef Search ADS   Shapiro N.M., Campillo M., 2004. Emergence of broadband rayleigh waves from correlations of the ambient seismic noise, Geophys. Res. Lett. , 31( 7), L07614, doi:10.1029/2004GL019491. Google Scholar CrossRef Search ADS   Shapiro N.M., Campillo M., Stehly L., Ritzwoller M., 2005. High resolution surface wave tomography from ambient seismic noise, Science , 307, 1615– 1618. Google Scholar CrossRef Search ADS PubMed  Simutė S., Steptoe H., Cobden L., Gokhberg A., Fichtner A., 2016. Full-waveform inversion of the japanese islands region, J. geophys. Res. , 121( 5), 3722– 3741. Google Scholar CrossRef Search ADS   Snieder R., 2007. Extracting the Green’s function of attenuating heterogeneous acoustic media from uncorrelated waves, J. acoust. Soc. Am. , 121( 5), 2637– 2643. Google Scholar CrossRef Search ADS PubMed  Snieder R., Larose E., 2013. Extracting earth’s elastic wave response from noise measurements, Annu. Rev. Earth Planet. Sci. , 41( 1), 183– 206. Google Scholar CrossRef Search ADS   Snieder R., Şafak E., 2006. Extracting the building response using seismic interferometry: Theory and application to the Millikan Library in Pasadena, California, Bull. seism. Soc. Am. , 96, 586– 598. Google Scholar CrossRef Search ADS   Snieder R., Wapenaar K., Larner K., 2006. Spurious multiples in seismic interferometry of primaries, Geophysics , 71, SI111– SI124. Google Scholar CrossRef Search ADS   Snieder R., Wapenaar K., Wegler U., 2007. Unified Green’s function retrieval by cross-correlation; connection with energy principles, Phys. Rev. E , 75, 036103, doi:10.1103/PhysRevE.75.036103. Google Scholar CrossRef Search ADS   Snieder R., Fan Y., Slob E., Wapenaar K., 2010. Equipartitioning is not sufficient for Green’s function extraction, Earthq. Sci. , 23, 403– 415. Google Scholar CrossRef Search ADS   Stehly L., Boué P., 2017. On the interpretation of the amplitude decay of noise correlations computed along a line of receivers, Geophys. J. Int. , 209( 1), 358– 372. Stehly L., Campillo M., Shapiro N.M., 2006. A study of the seismic noise from its long-range correlation properties, J. Geophys. Res. , 111( B10), B10306, doi:10.1029/2005JB004237. Google Scholar CrossRef Search ADS   Stehly L., Fry B., Campillo M., Shapiro N., Guilbert J., Boschi L., Giardini D., 2009. Tomography of the Alpine region from observations of seismic ambient noise, Geophys. J. Int. , 178( 1), 338– 350. Google Scholar CrossRef Search ADS   Stutzmann E., Ardhuin F., Schimmel M., Mangeney A., Patau G., 2012. Modelling long-term seismic noise in various environments, Geophys. J. Int. , 191( 2), 707– 722. Google Scholar CrossRef Search ADS   Tanimoto T., 1990. Modelling curved surface wave paths: membrane surface wave synthetics, Geophys. J. Int. , 102( 1), 89– 100. Google Scholar CrossRef Search ADS   Tape C., Liu Q., Tromp J., 2007. Finite-frequency tomography using adjoint methods - methodology and examples using membrane surface waves, Geophys. J. Int. , 168, 1105– 1129. Google Scholar CrossRef Search ADS   Tape C., Liu Q., Maggi A., Tromp J., 2009. Adjoint tomography of the southern California crust, Science , 325, 988– 992. Google Scholar CrossRef Search ADS PubMed  Tarantola A., 1988. Theoretical background for the inversion of seismic waveforms, including elasticity and attenuation, Pure appl. Geophys. , 128, 365– 399. Google Scholar CrossRef Search ADS   Tarantola A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation , 2nd edn, Society for Industrial and Applied Mathematics. Google Scholar CrossRef Search ADS   Tian Y., Ritzwoller M.H., 2015. Directionality of ambient noise on the Juan de Fuca plate: implications for source locations of the primary and secondary microseisms, Geophys. J. Int. , 201( 1), 429– 443. Google Scholar CrossRef Search ADS   Tikhonov A., 1963. Solution of incorrectly formulated problems and the regularization method, Sov. Math. Dokl. , 5, 1035– 1038. Tromp J., Tape C., Liu Q., 2005. Seismic tomography, adjoint methods, time reversal and banana-doughnut kernels, Geophys. J. Int. , 160, 195– 216. Google Scholar CrossRef Search ADS   Tromp J., Luo Y., Hanasoge S., Peter D., 2010. Noise cross-correlation sensitivity kernels, Geophys. J. Int. , 183, 791– 819. Google Scholar CrossRef Search ADS   Tsai V.C., 2009. On establishing the accuracy of noise tomography traveltime measurements in a realistic medium, Geophys. J. Int. , 178, 1555– 1564. Google Scholar CrossRef Search ADS   Tsai V.C., 2011. Understanding the amplitudes of noise correlation measurements, J. geophys. Res. , 116, doi:10.1029/2011JB008483. Valentine A.P., Woodhouse J.H., 2010. Reducing errors in seismic tomography: combined inversion for sources and structure, Geophys. J. Int. , 180( 2), 847– 857. Google Scholar CrossRef Search ADS   van Driel M., Krischer L., Stähler S.C., Hosseini K., Nissen-Meyer T., 2015. Instaseis: instant global seismograms based on a broadband waveform database, Solid Earth , 6( 2), 701– 717. Google Scholar CrossRef Search ADS   Vasconcelos I., Snieder R., 2008a. Interferometry by deconvolution, Part 1 – Theory for acoustic waves and numerical examples, Geophysics , 73, S129– S141. Google Scholar CrossRef Search ADS   Vasconcelos I., Snieder R., 2008b. Interferometry by deconvolution, Part 2 – Theory for elastic waves and application to drill-bit seismic imaging, Geophysics , 73, S115– S128. Google Scholar CrossRef Search ADS   Wapenaar K., 2004. Retrieving the elastodynamic Green’s function of an arbitrary inhomogeneous medium by cross correlation, Phys. Rev. Lett. , 93, 254301– 254304. Google Scholar CrossRef Search ADS PubMed  Wapenaar K., Fokkema J., 2006. Green’s function representations for seismic interferometry, Geophysics , 71, S133– S146. Google Scholar CrossRef Search ADS   Weaver R., Froment B., Campillo M., 2009. On the correlation of non-isotropically distributed ballistic scalar diffuse waves, J. acoust. Soc. Am. , 126( 4), 1817– 1826. Google Scholar CrossRef Search ADS PubMed  Weaver R.L., Lobkis O.I., 2004. Diffuse fields in open systems and the emergence of Green’s function, J. acoust. Soc. Am. , 116, 2731– 2734. Google Scholar CrossRef Search ADS   Woodard M.F., 1997. Implications of localized, acoustic absorption for heliotomographic analysis of sunspots, Astrophys. J. , 485, 890– 894. Google Scholar CrossRef Search ADS   Yang Y., Ritzwoller M.H., 2008. Characteristics of ambient seismic noise as a source for surface wave tomography, Geochemistry, Geophysics, Geosystems , 9( 2), Q02008, doi:10.1029/2007GC001814. Google Scholar CrossRef Search ADS   Yao H., van der Hilst R.D., 2009. Analysis of ambient noise energy distribution and phase velocity bias in ambient noise tomography, with application to SE Tibet, Geophys. J. Int. , 179( 2), 1113– 1132. Google Scholar CrossRef Search ADS   Zhan Z., Tsai V.C., Clayton R.W., 2013. Spurious velocity changes caused by temporal variations in ambient noise frequency content, Geophys. J. Int. , 194( 3), 1574– 1581. Google Scholar CrossRef Search ADS   Zhou Y., Dahlen F.A., Nolet G., 2004. Three-dimensional sensitivity kernels for surface wave observables, Geophys. J. Int. , 158, 142– 168. Google Scholar CrossRef Search ADS   APPENDIX A: FOURIER TRANSFORM For the Fourier transform and its inverse we use   \begin{equation} y(\omega ) = \frac{1}{\sqrt{2\pi }} \int _{-\infty }^\infty y(t)\, e^{-i \omega t} \, \text{d}t \end{equation} (A1)  \begin{equation} y(t) = \frac{1}{\sqrt{2\pi }} \int _{-\infty }^\infty y(\omega )\, e^{i \omega t} \, \text{d}\omega . \end{equation} (A2)We distinguish between time and frequency domain only by argument or context. APPENDIX B: LINK TO FORMULATION BY FICHTNER (2015) For a link between the derivation presented in Section 3.1.2 and in Fichtner (2015), we start from eq. (44)   \begin{equation} \nabla _{{\bf m}} \mathcal {X} \, \delta {\bf m} = \Bigl \langle u^\dagger _{(2)}\, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + c^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} \Bigr \rangle , \end{equation} (B1)where we refer to $$u^\dagger _{(n)}$$ as the adjoint wavefield that corresponds to the correlation wavefield at reference station xn. In the frequency domain it can be written as   \begin{equation} \nabla _{{\bf m}} \mathcal {X} \, \delta {\bf m} = \underbrace{ \int \limits _\oplus \int \limits _{-\infty }^{\infty } u^\dagger _{(2)}({\bf x},\omega )\, \nabla _{\bf m}\mathcal {L}[C({\bf x},{\bf x}_2,\omega )]\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x} }_{(1)} + \underbrace{ \int \limits _\oplus \int \limits _{-\infty }^{\infty } c^\dagger ({\bf x},\omega )\, \nabla _{\bf m}\mathcal {L}[G({\bf x},{\bf x}_2,\omega )]\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x} }_{(2)}. \end{equation} (B2)With the definition of the adjoint correlation wavefield c† according to eq. (47), which is defined by   \begin{equation} c^\dagger ({\bf x},\omega ) = \int \limits _\oplus G^*({\bf x},\mathbf {\xi },\omega )\, S(\mathbf {\xi },\omega )\, G(\mathbf {\xi },{\bf x}_1,\omega)\, f^*(\omega ) \, \text{d}\mathbf {\xi }, \end{equation} (B3)the second part of eq. (B2) is given by   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } \Bigl[\int \limits _\oplus G^*({\bf x},\mathbf {\xi },\omega )\, S(\mathbf {\xi },\omega )\, G(\mathbf {\xi },{\bf x}_1,\omega)\, f^*(\omega )\, \text{d}\mathbf {\xi } \Bigr ] \, \nabla _{\bf m}\mathcal {L}(G({\bf x},{\bf x}_2,\omega ))\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}. \end{equation} (B4)Since $$\nabla _{\bf m}\mathcal {L}$$ is linear, we write   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } \Bigl [\int \limits _\oplus G^*({\bf x},\mathbf {\xi },\omega )\, S(\mathbf {\xi },\omega )\, G(\mathbf {\xi },{\bf x}_1,\omega)\, \text{d}\mathbf {\xi } \Bigr ] \, \nabla _{\bf m}\mathcal {L}(f^*(\omega ) G({\bf x},{\bf x}_2,\omega ))\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}, \end{equation} (B5)and observe that $$\nabla _{\bf m}\mathcal {L}$$ is acting on f*(ω)G(x, x2, ω), which is the adjoint wavefield $$u^{\dagger *}_{(1)}({\bf x},\omega )$$ for the reference station at x1 and the corresponding measurement at station x2. Eq. (B5) thus changes to   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } C^*({\bf x},{\bf x}_1,\omega ) \, \nabla _{\bf m}\mathcal {L}( u^{\dagger *}_{(1)}({\bf x},\omega ) )\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}. \end{equation} (B6)Using the adjoint operator defined in eq. (22), we obtain   \begin{equation} (2) = \int \limits _\oplus \int \limits _{-\infty }^{\infty } u^{\dagger *}_{(1)}({\bf x},\omega )\, \nabla _{\bf m}\mathcal {L}^{\dagger }( C^*({\bf x},{\bf x}_1,\omega ) )\, \delta {\bf m}\, \text{d}\omega \, \text{d}{\bf x}. \end{equation} (B7)Part 1 of eq. (B2) together with expression (B7) correspond to eq. (32) in Fichtner (2015). As expected due to the nature of correlation functions, the final expression is Hermitian in the exchange of x1 and x2. For a full evaluation of the gradient according to the expression by Fichtner (2015), the wavefield for all reference stations has to be computed. The derivation in this paper allows us to optimize the array in terms of simulations and coverage, convenient due to the high computational costs of full waveform ambient noise inversion. APPENDIX C: DERIVATION OF P† FOR EXAMPLE CASE For a better understanding of the developments, we derive the adjoint operator P†, invoked in eq. (40), for the example case in the following. Starting from expression (40), we use the definition of P in eq. (13) and write   \begin{equation} \Bigl \langle u^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle = - \frac{1}{2\pi } \textrm{Re} \int \limits _\oplus \int \limits _{\tau } u^\dagger ({\bf x}, t)\, \sum _l S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime })\, e^{-i\omega t^{\prime }} \text{d}t^{\prime } \Bigr ]^* e^{i\omega t} \, \text{d}\omega \, \text{d}t\, \text{d}{\bf x}. \end{equation} (C1)Reordering the integrals and the summation over basis functions, we obtain   \begin{equation} - \frac{1}{2\pi } \textrm{Re} \sum _l \int \limits _\oplus S_l({\bf x}) \int \limits _{\tau } \int \limits _{w_l}^{w_{l+1}} \int \limits _{\tau }^{} u^\dagger ({\bf x}, t)\, \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime }) e^{i\omega t^{\prime }} e^{i\omega t}\, \text{d}t^{\prime }\, \text{d}\omega \, \text{d}t\, \text{d}{\bf x}. \end{equation} (C2)Isolating the adjoint wavefield u† together with the exponential function eiωt, eq. (C2) changes to   \begin{equation} - \frac{1}{2\pi } \textrm{Re} \sum _l \int \limits _\oplus S_l({\bf x}) \int \limits _{\tau } \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime }) \int \limits _{w_l}^{w_{l+1}} \int \limits _{\tau }^{} u^\dagger ({\bf x}, t) e^{i\omega t}\, \text{d}t\, e^{i\omega t^{\prime }}\, \text{d}\omega \, \text{d}t^{\prime }\, \text{d}{\bf x}. \end{equation} (C3)The innermost integral is a Fourier transform of the adjoint wavefield u†. If we take the complex conjugate of the Fourier transform, taking into account that u†(x, t) is real, and reordering the integrals and summation, we get   \begin{equation} - \frac{1}{2\pi } \textrm{Re} \int \limits _\oplus \int \limits _{\tau } \delta _{\bf m}G({\bf x}, {\bf x}_2, t^{\prime })\, \sum _l S_l({\bf x}) \int \limits _{w_l}^{w_{l+1}} \Bigl [ \int \limits _{\tau }^{} u^\dagger ({\bf x}, t)\, e^{-i\omega t} \text{d}t \Bigr ]^* e^{i\omega t^{\prime }} \, \text{d}\omega \, \text{d}t^{\prime }\, \text{d}{\bf x} \end{equation} (C4)  \begin{equation} = \Bigl \langle \delta _{{\bf m}}G\, P^\dagger ( S, u^\dagger ) \Bigr \rangle . \end{equation} (C5)For the last step, we compare eq. (C4) to eq. (C1) and observe that this is exactly what we try to retrieve. In this case, the definitions of P and P† are the same and P is therefore self-adjoint. APPENDIX D: SECOND DERIVATIVES D1 Mixed source-structure derivative Hms In the derivation of the mixed source-structure derivative presented in Section 3.2, we obtained as an intermediate result eq. (57), given by   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle \nabla _{C}\nabla _{C}\chi \, \delta _{{\bf m}} C\, \delta _{S} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle , \end{equation} (D1)and realized that it is not possible to get an expression for Hms that is easy to compute for all possible directions δS and δm. We therefore chose a perturbation δm and got an expression for Hms( δm, ○) that can be evaluated for all possible directions δS. It is, however, also possible to introduce a perturbation in S and derive Hms(○, δS ) for all possible directions δm. To achieve this, we define the adjoint wavefield p† now as   \begin{equation} \mathcal {L}^\dagger ( p^\dagger ) = - \nabla _{C}\nabla _{C}\chi ^\dagger \, \delta _{S} C. \end{equation} (D2)Following the procedure in Section 3.2, we yield   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle - p^\dagger \, \mathcal {L}( \delta _{{\bf m}} C ) + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle . \end{equation} (D3)Replacing $$\mathcal {L}( \delta _{{\bf m}} C )$$ in the first term using the first derivative of eq. (9) with respect to structure, changes the expression for Hms accordingly to   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle p^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{S}C )\, \delta {\bf m} - p^\dagger \, P( S, \delta _{{\bf m}}G ) - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle . \end{equation} (D4)Term three and four in eq. (D4) prohibit the evaluation of Hms( ○, δS ) for all possible directions δm. To eliminate the third term, we multiply the first derivative of the PDE describing the computation of the Green function in eq. (8) with the adjoint wavefield z†, apply 〈.〉 and add it to eq. (D4). To facilitate the understanding, we only write down the terms of interest, which are given by   \begin{eqnarray} \Bigl \langle - p^\dagger \, P( S, \delta _{{\bf m}}G ) \Bigr \rangle & = & \Bigl \langle - p^\dagger \, P( S, \delta _{{\bf m}}G ) + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + z^\dagger \, \mathcal {L}( \delta _{{\bf m}}G ) \Bigr \rangle \nonumber\\ & = & \Bigl \langle - \delta _{{\bf m}}G\, P^\dagger ( S, p^\dagger ) + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + \delta _{{\bf m}}G\, \mathcal {L}^\dagger ( z^\dagger ) \Bigr \rangle , \end{eqnarray} (D5)where we invoke again the adjoint operators in the second line defined in eqs (40) and (41). If we can now find a wavefield z† that satisfies   \begin{equation} \mathcal {L}^\dagger ( z^\dagger ) = P^\dagger ( S, p^\dagger ), \end{equation} (D6)we can eliminate the first appearance of δmG in eq. (D4) and the third term is replaced by $$z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}$$. Following the exact same procedure for the fourth term, now naming the adjoint wavefield q†, we obtain   \begin{equation} \Bigl \langle - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) \Bigr \rangle = \Bigl \langle - u^\dagger \, P( \delta S, \delta _{{\bf m}} G ) + q^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + q^\dagger \, \mathcal {L}(\delta _{{\bf m}}G) \Bigr \rangle , \end{equation} (D7)and the unpleasant term δmG vanishes, if a wavefield can be found that satisfies   \begin{equation} \mathcal {L}^\dagger (q^\dagger ) = P^\dagger ( \delta S, u^\dagger ). \end{equation} (D8)Bringing it all together, eq. (D4) changes to   \begin{equation} H_{ms}( \delta {\bf m}, \delta S ) = \Bigl \langle p^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m} + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}(\delta _{S}C)\, \delta {\bf m} + z^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} + q^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m} \Bigr \rangle , \end{equation} (D9)and the Hessian-vector product Hms( ○, δS ) can be defined as   \begin{equation} H_{ms}( \circ , \delta S ) = \Bigl \langle p^\dagger \, \nabla _{{\bf m}} \mathcal {L}(C)\, \circ \Bigr \rangle + \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}(\delta _{S}C)\, \circ \Bigr \rangle + \Bigl \langle z^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \circ \Bigr \rangle + \Bigl \langle q^\dagger \nabla _{{\bf m}} \mathcal {L}(G)\, \circ \Bigr \rangle . \end{equation} (D10)To interpret Hms(○, δS ), we first study the expression (44) for the computation of the structure kernel. In the first term, the adjoint wavefield u† interacts with the correlation wavefield C and, in the second term, the adjoint correlation wavefield c† with the Green function G. As a consequence, the first-order change of the structure kernel given by eq. (D10), has four terms: interaction of (1) C with the change p† of the adjoint wavefield due to the altered misfit functional, (2) the adjoint wavefield with the altered correlation wavefield δSC, (3) G with the modified adjoint correlation wavefield q† because of the change of its forcing term in terms of p† (eq. D6) and (4) G with the modified adjoint correlation wavefield due to a change in S (eq. D8). Since both z† and q† interact with the Green function G and due to the linearity of the wave equation to the right-hand side, the two adjoint wavefields can be simulated simultaneously. D2 Second derivative with respect to the distribution of sources In order to derive second derivatives only with respect to source, we start with the directional derivative of the misfit function $$\mathcal {X}$$ with respect to S in the direction of δS1  \begin{equation} \nabla _{S}\mathcal {X} \, \delta S_1 = \Bigl \langle \nabla _C\chi \, \delta _{S_1} C \Bigr \rangle . \end{equation} (D11)We differentiate eq. (D11) again with respect to S, but now in the second direction δS2, which is given by   \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \nabla _{S}\nabla _{S}\mathcal {X} \, \delta S_1\, \delta S_2 = \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{S_1} C\, \delta _{S_2} C + \nabla _C\chi \, \delta _{S_1 S_2} C \Bigr \rangle , \end{equation} (D12)where   \begin{equation} \delta _{S_1 S_2} C = \nabla _{S} \nabla _{S}\, C\, \delta S_1\, \delta S_2 \end{equation} (D13)denotes the second-order variation of the correlation function C due to two perturbations δS1 and δS2. Since the correlation function is linear in S, we have   \begin{equation} \delta _{S_1 S_2} C = 0, \end{equation} (D14)and eq. (D12) simplifies to   \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{S_1} C\, \delta _{S_2} C \Bigr \rangle . \end{equation} (D15)Invoking the adjoint operator in eq. (59) and inserting the adjoint wavefield p†, defined by   \begin{equation} \mathcal {L}^\dagger ( p^\dagger ) = - \nabla _C\nabla _C\chi ^\dagger \, \delta _{S_1} C, \end{equation} (D16)into eq. (D15), we obtain for Hss applied to δS1 and δS2  \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \Bigl \langle - \delta _{S_2} C\, \mathcal {L}^\dagger (p^\dagger ) \Bigr \rangle . \end{equation} (D17)Using the adjoint operator in eq. (22) and the first variation of the PDE describing the modelling of the correlation wavefield due to δS (eq. (17)), Hss( δS1, δS2 ) changes to   \begin{equation} H_{ss}( \delta S_1, \delta S_2 ) = \Bigl \langle - p^\dagger \, \mathcal {L}( \delta _{S_2} C ) \Bigr \rangle = \Bigl \langle - p^\dagger \, P( \delta S_2, G )\Bigr \rangle , \end{equation} (D18)and the Hessian-vector product is given by   \begin{equation} H_{ss}( \delta S_1, \circ ) = \Bigl \langle - p^\dagger \, P( \circ , G )\Bigr \rangle . \end{equation} (D19)By design, eq. (D19) allows us to compute the change of the source kernel due to a perturbation δS1 in the source distribution. The only term in eq. (27) describing the computation of the source kernel that is changed by a perturbation in the source distribution is the adjoint wavefield u†. The source perturbation δS1 changes the correlation function C to $$\delta _{S_1}C$$, which alters the misfit and consequently also the adjoint wavefield. The latter is accounted for in eq. (D19) by the adjoint wavefield p†, driven by the change of the misfit $$\nabla _C\nabla _C\chi \, \delta _{S_1}C$$. A convenient consequence is that the routines implemented for the computation of the source kernel can be reused for eq. (D19) and only the adjoint source time function has to be adapted accordingly. D3 Second derivative with respect to structure The last piece that is missing for a full assembly of the Hessian matrix is the computation of second derivatives only with respect to structure. We start again with the first derivative of the misfit function with respect to m in the direction δm1  \begin{equation} \nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m}_1 = \Bigl \langle \nabla _C\chi \, \delta _{{\bf m}_1} C \Bigr \rangle , \end{equation} (D20)and differentiate a second time, but now in the direction of δm2 to obtain   \begin{equation} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) = \nabla _{{\bf m}}\nabla _{{\bf m}}\mathcal {X} \, \delta {\bf m}_1\, \delta {\bf m}_2 = \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + \nabla _C\chi \, \delta _{{\bf m}_1{\bf m}_2} C \Bigr \rangle , \end{equation} (D21)where   \begin{equation} \delta _{{\bf m}_1{\bf m}_2} C = \nabla _{{\bf m}}\nabla _{{\bf m}} C\, \delta {\bf m}_1\, \delta {\bf m}_2 \end{equation} (D22)is the second-order variation of the correlation C due to δm1 and δm2. In the following, we try again to eliminate unpleasant terms and therefore prepare convenient tools in terms of the first and second derivative of the PDE describing the computation of correlation functions C, given by   \begin{equation} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1 + \mathcal {L}( \delta _{{\bf m}_1}C ) - P( S, \delta _{{\bf m}_1}G ) = 0, \end{equation} (D23)  \begin{equation} \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2}C )\, \delta {\bf m}_1 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}C )\, \delta {\bf m}_2 + \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2}C ) - P( S, \delta _{{\bf m}_1{\bf m}_2}G ) = 0, \end{equation} (D24)and the same for the PDE concerning the modelling of the Green function G, that is,   \begin{equation} \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1 + \mathcal {L}( \delta _{{\bf m}_1} G ) = 0. \end{equation} (D25)  \begin{equation} \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1\, \delta {\bf m}_2 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2}G )\, \delta {\bf m}_1 + \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 + \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2} G ) = 0. \end{equation} (D26)We first multiply eq. (D24) with the adjoint wavefield u†, apply 〈.〉 and add it to eq. (D21), which yields   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + \nabla _C\chi \, \delta _{{\bf m}_1{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 + u^\dagger \, \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2} C )\nonumber\\ && -\, u^\dagger \, P( S, \delta _{{\bf m}_1{\bf m}_2}G )\Bigr \rangle . \end{eqnarray} (D27)Invoking again two adjoint operators for terms two and six in eq. (D27), we can eliminate $$\delta _{{\bf m}_1{\bf m}_2} C$$ due to the definition of u† and eq. (D27) simplifies to   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 - u^\dagger \, P( S, \delta _{{\bf m}_1{\bf m}_2}G )\Bigr \rangle . \end{eqnarray} (D28)Repeating the same procedure using eq. (D26) with the adjoint wavefield c† defined in eq. (43), we get   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 - u^\dagger \, P( S, \delta _{{\bf m}_1{\bf m}_2}G ) \nonumber\\ && +\, c^\dagger \nabla _{{\bf m}}\nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1\, \delta {\bf m}_2 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 \nonumber\\ && +\, c^\dagger \mathcal {L}( \delta _{{\bf m}_1{\bf m}_2} G ) \Bigr \rangle . \end{eqnarray} (D29)Defining two adjoint operators for terms five and nine, we get rid of $$\delta _{{\bf m}_1{\bf m}_2} G$$, because of the definition of c† and eq. (D29) changes to   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}} \nabla _{{\bf m}} \mathcal {L}(C)\, \delta {\bf m}_1\, \delta {\bf m}_2 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 \nonumber\\ && +\, c^\dagger \nabla _{{\bf m}}\nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_1\, \delta {\bf m}_2 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 \Bigr \rangle . \end{eqnarray} (D30)Until now, we have not made any assumptions on the forward modelling operator $$\mathcal {L}$$. For simplicity, we consider in the following only operators that are linear in m. Therefore, the two terms in the expression for Hmm( δm1, δm2 ) containing second derivatives of the operator $$\mathcal {L}$$ vanish, leading to   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 \nonumber\\ && +\, u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} C )\, \delta {\bf m}_2 + c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1} G )\, \delta {\bf m}_2 \Bigr \rangle . \end{eqnarray} (D31)For an incorporation and interpretation of non-linear operators, we refer the reader to Fichtner & Trampert (2011). The first three terms in eq. (D31) are unpleasant, because the direction δm2 is contained implicitly in terms of $$\delta _{{\bf m}_2} C$$. Trying to change that, we define two adjoint wavefields p† and w† by   \begin{equation} \mathcal {L}^\dagger ( p^\dagger ) = - \nabla _C\nabla _C\chi ^\dagger \, \delta _{{\bf m}_1} C - \nabla _{{\bf m}}\mathcal {L}^\dagger ( u^\dagger ) \, \delta {\bf m}_1, \end{equation} (D32)  \begin{equation} \mathcal {L}^\dagger ( w^\dagger ) = - \nabla _{{\bf m}}\mathcal {L}^\dagger ( c^\dagger ) \, \delta {\bf m}_1. \end{equation} (D33)Applying the principles used above several times, the first three terms in eq. (D31) can be altered in the following way:   \begin{eqnarray} &&{ \Bigl \langle c^\dagger \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} G )\, \delta {\bf m}_1 + \nabla _C\nabla _C\chi \, \delta _{{\bf m}_1} C\, \delta _{{\bf m}_2} C + u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_2} C ) \, \delta {\bf m}_1 \Bigr \rangle }\nonumber\\ && = \Bigl \langle - \delta _{{\bf m}_2}G\, \mathcal {L}^\dagger ( w^\dagger ) - \delta _{{\bf m}_2}C\, \mathcal {L}^\dagger ( p^\dagger ) \Bigr \rangle = \Bigl \langle - w^\dagger \mathcal {L}( \delta _{{\bf m}_2} G ) - p^\dagger \mathcal {L}( \delta _{{\bf m}_2} C ) \Bigr \rangle . \end{eqnarray} (D34)Replacing $$-w^\dagger \mathcal {L}( \delta _{{\bf m}_2} G )$$ and $$- p^\dagger \mathcal {L}( \delta _{{\bf m}_2} C )$$ using eqs (D25) and (D23), respectively, and then adding expression (D25), multiplied with z† and 〈.〉 applied, eq. (D34) changes to   \begin{eqnarray} &&{ \Bigl \langle - w^\dagger \mathcal {L}( \delta _{{\bf m}_2} G ) - p^\dagger \mathcal {L}( \delta _{{\bf m}_2} C ) \Bigr \rangle = \Bigl \langle w^\dagger \, \nabla _{{\bf m}}\mathcal {L}(G)\, \delta {\bf m}_2 + p^\dagger \, \nabla _{{\bf m}}\mathcal {L}(C)\, \delta {\bf m}_2 - p^\dagger \, P( S, \delta _{{\bf m}_2}G ) \Bigr \rangle }\nonumber\\ && = \Bigl \langle w^\dagger \nabla _{{\bf m}}\mathcal {L}(G)\, \delta {\bf m}_2 + p^\dagger \nabla _{{\bf m}}\mathcal {L}(C)\, \delta {\bf m}_2 - p^\dagger \, P( S, \delta _{{\bf m}_2}G ) + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_2 + z^\dagger \, \mathcal {L}( \delta _{{\bf m}_2}G ) \Bigr \rangle . \end{eqnarray} (D35)If we now define z† as the solution to   \begin{equation} \mathcal {L}^\dagger ( z^\dagger ) = P^\dagger ( S, p^\dagger ), \end{equation} (D36)we can write down the final expression for Hmm( δm1, δm2 ) as   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \delta {\bf m}_2 ) &=& \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}C )\, \delta {\bf m}_2 + c^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}G )\, \delta {\bf m}_2 + w^\dagger \, \nabla _{{\bf m}}\mathcal {L}(G)\, \delta {\bf m}_2 \nonumber\\ && +\, p^\dagger \, \nabla _{{\bf m}}\mathcal {L}(C)\, \delta {\bf m}_2 + z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \delta {\bf m}_2 \Bigr \rangle . \end{eqnarray} (D37)The Hessian operator applied to the model perturbation δm1 is thus given by   \begin{eqnarray} H_{mm}( \delta {\bf m}_1, \circ ) &=& \Bigl \langle u^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}C )\, \circ \Bigr \rangle + \Bigl \langle c^\dagger \, \nabla _{{\bf m}}\mathcal {L}( \delta _{{\bf m}_1}G )\, \circ \Bigr \rangle + \Bigl \langle w^\dagger \, \nabla _{{\bf m}}\mathcal {L}(G)\, \circ \Bigr \rangle \nonumber\\ && +\, \Bigl \langle p^\dagger \, \nabla _{{\bf m}}\mathcal {L}(C)\, \circ \Bigr \rangle + \Bigl \langle z^\dagger \, \nabla _{{\bf m}} \mathcal {L}(G)\, \circ \Bigr \rangle . \end{eqnarray} (D38)The first two terms account for changes in the forward wavefields C and G caused by the perturbation δm1, the last three terms for changes in the adjoint wavefields u† and c†, caused by scattering and an altered adjoint source time function. The two adjoint wavefields w† and z† can be computed simultaneously, since both interact with the Green function G and the operator $$\mathcal {L}$$ is linear in its forcing term. © The Authors 2017. Published by Oxford University Press on behalf of The Royal Astronomical Society.

Journal

Geophysical Journal InternationalOxford University Press

Published: Jan 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off