Lu, Yongming

Abstract. Although successful applications of deep neural networks (DNNs) have been shown in many research fields, their application to Fresnel zone picking for Kirchhoff-type migration has not been explored. We investigate the application of DNNs for automatically identifying the left and right boundaries of the Fresnel zone in the dip-angle domain, which leads to an optimal summation for imaging. We use a pair of 1D dip-angle gathers in the inline and crossline directions as the input data to train the DNNs in a supervised way. The goal is to minimize the cost as a function of weights and biases. The trained DNNs can be used to automatically extract a set of marks for different dip-angle values from a large number of sub-images in dip-angle gathers. Through experiment, we show that a four-layer DNN is enough to extract the features of our training data while avoiding overfitting in estimating the Fresnel zone range. Finally, we adopt two field data examples, from the Dashen and Xudong oil fields in China, to demonstrate the effectiveness and generalization ability of DNN-based Fresnel zone picking on dip-angle gathers. The results illustrate that the proposed DNN is an efficient, automatic, data-driven picker needing little human participation.

Keywords: Fresnel zone picking, deep neural networks (DNNs), seismic data, high-resolution imaging

1. Introduction

Kirchhoff-type migration remains a popular tool in seismic imaging because of its flexibility and efficiency, which come from the fact that Kirchhoff migration processes each trace separately and can easily be implemented in parallel (Leveille et al. 2011). In theory, Kirchhoff migration is based on a diffraction-stack integral. The diffraction extends to infinite time and distance, which implies an infinite migration aperture. One drawback of this method is that the energy is distributed over a wide area.
However, only part of the signal within the aperture is contributive, while the data outside that part are considered unwanted noise: the outside part may not contribute, or may even contribute negatively, to the final imaging result (Schleicher et al. 1997). Schleicher et al. (1997) first proposed a method of projected Fresnel zones to limit the aperture and thus improve imaging quality. Since then, different methods have been proposed for picking the optimal apertures (Sun & Schuster 2001; Spinner & Mann 2007; Klokov & Fomel 2013; Zhang et al. 2016, 2017; Xu & Zhang 2017). The dip-angle domain has proved to be an ideal domain in which to estimate the range of the Fresnel zone for optimal summation (Zhang et al. 2016, 2017; Xu & Zhang 2017). Both the central position of the aperture and its width are clearly identifiable in the dip-angle domain. In the dip-angle domain, a reflection event is visible as a concave shape with the reflector dip at its apex. The Fresnel zone is then the area whose traveltime does not exceed that of the apex by more than half the prevailing period. In practice, a good estimate of the aperture range should be neither so short that it harms the contributing part in the Fresnel zone, nor so long that the noise is not effectively suppressed. In conclusion, a good estimate of the Fresnel zone can reduce migration noise and enhance the resolution of the image (Klokov & Fomel 2013; Zhang et al. 2017). Picking a Fresnel zone range in the dip-angle domain is not difficult. Nevertheless, it is repetitive and quite time-consuming work requiring intensive manpower, especially for a large field dataset that may contain many types of noise. Moreover, the picking results depend mainly on the experience of the data-processing expert. Deep learning, embodied in a deep neural network (DNN), is a kind of artificial intelligence (AI) technology imitating the way in which the human brain thinks and processes data.
Deep learning consists of multiple layers of nonlinearly activated nodes that simulate signal transformations in the brain. With the development of algorithms and computing devices, the deep-learning method has demonstrated its importance in many aspects of society (Suzuki 2017; Webb 2018). It has proved to be good at discovering hidden features in high-dimensional data (Lecun et al. 2015). Conventional machine-learning methods are relatively weak at handling raw data and often require expertise to extract features before the learning process. Deep-learning methods, by contrast, greatly reduce the demand for specialized knowledge; in general, we can enhance the performance of training simply by feeding in more data. Applications of deep learning in the field of seismology have become a hot research topic in recent years. Many DNN-based methods have been proposed, such as first-break picking (Yuan et al. 2018a), tomography (Araya-Polo et al. 2018) and full-waveform inversion (Lewis & Vigh 2017), which show that deep-learning technology has great potential to transfer a lot of human knowledge into a machine-interpretable form in seismology. We investigate a novel method to pick the Fresnel zone in the dip-angle domain automatically using DNNs. First, we introduce the Fresnel zone in the dip-angle domain, the data preprocessing and the architecture of the neural network. Second, we discuss the cost function and optimization method. Third, two field datasets are adopted to illustrate the performance of our deep learning-based picking method. Finally, conclusions are drawn and future work is discussed.

2. Theory

This part is composed of three subsections that illustrate the methodology of the DNN-based automatic Fresnel zone picking algorithm. First, we introduce the 1D inline and crossline dip-angle gathers, the input data fed to the neural network. Second, we discuss the data preprocessing.
Finally, the architecture of the DNN is discussed in detail.

2.1. Fresnel zone in the dip-angle domain

The (first) Fresnel zone is defined by the image points whose traveltime difference with the specular ray does not exceed half of the dominant period of the data, as illustrated in figure 1. For a source-receiver pair |$( {S,\ G} )$| and a monofrequency elementary wave with period T, the region of the (first) Fresnel zone relative to an arbitrary (reflecting/transmitting) interface |$\sum F$| can be expressed as (Schleicher et al. 1997; Sun & Schuster 2001)
\begin{equation} \left| {\tau \left( {S,\bar{M}} \right) + \tau \left( {\bar{M},G} \right) - \tau \left( {S,G} \right)} \right| \le \frac{1}{2}T, \end{equation} (1)
where |$\bar{M}$| represents all the points on |$\sum F$| that satisfy Eq. (1) and |$\tau ( {A,\ B} )$| is the traveltime between points A and B. The event in the Fresnel zone contains the main energy.

Figure 1. The (first) Fresnel zone for a monofrequency elementary wave on a reflecting surface, |$\sum F$|, encountered by the ray |$SG$|. If the traveltime difference between |$SMG$| and |$S\bar{M}G$| is less than half of the wave period, |$T/2$|, the indicated point |$\bar{M}$| on |$\sum F$| belongs to the Fresnel zone of M.

The Fresnel zone in the dip-angle domain, depending on whether it is 2D or 3D, is seen as a concave curve or a concave surface with the reflector dip at the apex. The imaging result is produced by the event summation in the dip-angle direction with a stationary-point principle.
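The condition in Eq. (1) is easy to check numerically. The sketch below assumes a constant-velocity medium, so that traveltimes reduce to distance over velocity; the geometry and numbers are purely hypothetical, not taken from the paper.

```python
import math

def in_first_fresnel_zone(S, G, M_bar, t_SG, v, T):
    """Eq. (1): |tau(S, M') + tau(M', G) - tau(S, G)| <= T/2.
    Assumes a constant-velocity medium, so tau(A, B) = |AB| / v."""
    t_detour = (math.dist(S, M_bar) + math.dist(M_bar, G)) / v
    return abs(t_detour - t_SG) <= T / 2.0

# Hypothetical geometry: flat reflector at 1000 m depth, v = 2000 m/s,
# source at (-500, 0), receiver at (500, 0), dominant period T = 25 ms.
S, G, v, T = (-500.0, 0.0), (500.0, 0.0), 2000.0, 0.025
t_SG = 2 * math.dist(S, (0.0, 1000.0)) / v  # specular two-way traveltime
```

The specular point (0, 1000) trivially satisfies the condition, while points far along the reflector fall outside the zone.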
For a 3D imaging volume, it is hard to extract the 2D Fresnel zone directly because of the huge memory demand of 2D dip-angle gathers. For this reason, Zhang et al. (2016) proposed a decomposition method: they adopt a pair of 1D dip-angle gathers, inline and crossline, to represent the 2D dip-angle gathers, which greatly reduces the memory consumption. Figure 2 depicts a pair of dip-angle gathers, in which the Fresnel zone is the part between the two red lines. The merit of this method is that the data distributions of the two separated 1D dip-angle gathers are similar, so we can use both of them simultaneously as input data for the DNN. In fact, similarities in seismic data have been used as evaluation criteria or constraint conditions and successfully applied to seismic data processing, inversion and interpretation (Yuan et al. 2018b). In addition, the trained DNN can be used to solve both 2D and 3D problems without changing parameters, which is a special advantage of this neural-network method: the speed of problem solving is no longer limited by the scale of the problem.

Figure 2. A pair of dip-angle gathers: (a) inline and (b) crossline. The planar reflector has inline and crossline dip angles of |$15^{\circ} $| and |$- 10^{\circ} $|, respectively. The Fresnel zone is the part between the red lines.

In the dip-angle domain, the x-axis represents the angle and the z-axis represents the traveltime. We choose the common depth point (CDP) at each time slice as the input samples and the angle value as the output. The angle value is continuous and ordered, which makes this a regression problem.
It can be seen from figure 2 that the Fresnel zone has two boundaries: left and right. Our solution is to train a pair of DNNs to identify the left and right boundaries separately.

2.2. Data preprocessing

Data preprocessing is critical and has a huge impact on the performance of a neural network. Transformation and standardization are common data-preprocessing methods. In general, transformation is used to improve the interpretability of the data, and standardization is used to accelerate the training process by giving the data zero mean and unit variance. Through data preprocessing, we can accelerate the training process and avoid overexpressing a particular feature: if a particular feature is significantly larger than the others, the DNN will be more likely to focus on that feature. The magnitude of energy in the dip-angle gather can vary greatly. The magnitude of energy inside the Fresnel zone can range from |$- {10^n}$| to |${10^n}$| depending on the stacking number and the source energy, while the energy outside the Fresnel zone is smaller by several orders of magnitude. Thus, the variance of the raw data is extremely large, and training a DNN on such raw data would be very inefficient. In addition, the trained DNN would not generalize, because the stacking number and the source energy may vary between field datasets. Data preprocessing is therefore necessary for the DNN Fresnel zone picker. Considering that the range of the Fresnel zone in the dip-angle gather is independent of the sign of the event, we first take the absolute value of the data. Then we apply a log transformation to decrease the variance. Finally, a standardization transformation is adopted to satisfy the generalization requirement.
The preprocessing procedures are expressed as
\begin{equation} {\boldsymbol{X}}' = \log \left( {{\rm{abs}}\left( {\boldsymbol{X}} \right) + \epsilon } \right), \end{equation} (2)
\begin{equation} {\boldsymbol{Z}} = \frac{{{\boldsymbol{X}}' - \mu }}{\delta }, \end{equation} (3)
where |${\boldsymbol{X}}$|, |${\boldsymbol{X}}'$| and |${\boldsymbol{Z}}$| are the vectors of raw data, transformed data and standardized data, respectively, |$\epsilon $| is a minimal positive real number, and |$\mu $| and |$\delta $| are the mean and standard deviation of |${\boldsymbol{X}}'$|, respectively. Theoretically, the range of dip-angle values is from −90° to 90°. However, storing the whole range of dip values would take up a huge amount of memory, and the reflection energy in the two side regions is approximately zero. A typical approach is to store the energy in a limited range of dip values. The dip-angle range of our DNN is from −90° to 90° with a dip-angle interval of 1°. If the dip range of a given dataset is narrower than the size of the DNN input layer, we employ zero padding before data preprocessing. As can be seen from figure 2, the output has two boundaries: left and right. The two boundaries are trained by two separate DNNs. We normalize the two boundaries into the range |$( {0,\ 1} )$| by dividing by |$\pm 90$| to obtain the output values of the two separate DNNs.

2.3. DNN architecture

The architecture of a DNN is a set of layers of nonlinear processing units for feature extraction and transformation (Goodfellow et al. 2016).
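As a concrete illustration, the transform and standardization of Eqs. (2) and (3), together with zero padding up to the 181-sample input width, might look like the following NumPy sketch. The symmetric padding convention and the value of ε are our assumptions, not specified in the text.

```python
import numpy as np

def preprocess(x, eps=1e-12, width=181):
    """Eq. (2): X' = log(abs(X) + eps); Eq. (3): Z = (X' - mu) / delta.
    Gathers narrower than the DNN input layer are zero-padded first."""
    x = np.asarray(x, dtype=float)
    pad = width - x.size
    if pad > 0:
        x = np.pad(x, (pad // 2, pad - pad // 2))  # symmetric padding (assumed)
    xp = np.log(np.abs(x) + eps)                   # Eq. (2)
    return (xp - xp.mean()) / xp.std()             # Eq. (3)
```

After this step every input trace has the same width, zero mean and unit variance, regardless of the source energy or stacking number of the original dataset.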
The information flow between adjacent layers is through an affine transformation and a nonlinear activation, which can be expressed as
\begin{equation} {{\boldsymbol{h}}^{\left( i \right)}} = {g^{\left( i \right)}}\left( {{{\boldsymbol{w}}^{\left( i \right)}}^T{{\boldsymbol{h}}^{\left( {i - 1} \right)}} + {b^{\left( i \right)}}} \right), \end{equation} (4)
where i is the layer index, |${{\boldsymbol{w}}^{( i )}} \in {\mathbb{R}^{{D_i} \times {D_{i - 1}}}}$| is the weight matrix, |${{\boldsymbol{h}}^{( {i - 1} )}} \in {\mathbb{R}^{{D_{i - 1}}}}$| is the vector of layer-node values, |${b^{( i )}} \in {\mathbb{R}^{{D_i}}}$| is the bias, |${D_i}$| is the width of the |$i$|th layer and |$g()$| is the nonlinear activation function. The DNNs we employ consist of three types of layer: the input, hidden and output layers, which are shown in figure 3. The preprocessed data are fed into the DNN as the input layer and are trained by two independent DNNs to derive the normalized angle values for the left and right boundaries, respectively.

Figure 3. The architecture of the DNN. Blue, red and yellow dots represent the input, hidden and output layers, respectively. The widths of the input layer, the three hidden layers and the output layer are 181, 64, 32, 16 and 1, respectively, for each DNN. The DNNs for the left and right boundaries are trained separately and have different cost functions.

Aside from the input layer and the output layer, which have fixed sizes, the size of the hidden layers is flexible. The depth and width of the hidden layers determine the learning ability of a DNN.
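Under one reading of the architecture in figure 3 (ReLU on the early transforms and a sigmoid on the final one so the prediction stays in (0, 1); the random weight initialization is purely illustrative), the forward pass of Eq. (4) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths from figure 3: 181 -> 64 -> 32 -> 16 -> 1.
widths = [181, 64, 32, 16, 1]
params = [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(z, params):
    """Eq. (4) applied layer by layer: h_i = g_i(W_i^T h_{i-1} + b_i).
    ReLU on all but the last transform; sigmoid on the last (assumed)."""
    h = z
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)             # ReLU
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))      # sigmoid
```

Feeding in a preprocessed 181-sample trace yields a single value in (0, 1), matching the normalized boundary encoding described above.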
A wider and deeper DNN has a stronger learning ability, and vice versa. However, a proper DNN should be neither too complex nor too simple. If the DNN is too simple, it is not sufficient to fit the training samples, which causes underfitting. On the other hand, an overcomplicated DNN will learn the noise in the training data and lose generalization ability, which results in overfitting. Furthermore, training a very deep DNN is time consuming. To judge whether a DNN model is overfitting or underfitting, we can compare the training error and the validation error (Goodfellow et al. 2016), which assess the cost on the training set and the validation set, respectively. If both the training error and the validation error are large, the trained DNN is underfitting and we need to increase the depth or width of the hidden layers. If the training error is small and the validation error is large, the DNN is overfitting and we need to reduce its complexity. An empirical way of designing the architecture of a DNN is to increase the depth and width of the hidden layers step by step: when the validation error stops decreasing and begins to increase again, we stop increasing the complexity and take the DNN at that point as the ideal one. Figure 4 shows the relationship between the two errors and the depth of the DNN for our Fresnel zone picking problem. It can be seen that a four-layer (three hidden layers plus the output layer) DNN has the best performance. The widths of the three hidden layers are 64, 32 and 16, respectively.

Figure 4. Training cost, validation cost and training time for DNNs with different numbers of layers. It can be seen from the result that a four-layer DNN is enough to extract the features relatively quickly.
The activation function |$g()$| in Eq. (4) is a nonlinear function representing the rate of action-potential firing in a cell. With activation functions, a neural network can simulate complex nonlinear functions. Following AlexNet (Krizhevsky et al. 2012), we adopt the Rectified Linear Unit (ReLU) (Nair & Hinton 2010) on the first three layers (the input layer and the first two hidden layers) and the sigmoid function (Han & Moraga 1995) on the last hidden layer. The ReLU function is |$f( x ) = {\rm{max}}( {0,\ x} )$|. It is a simple function with several advantages. First, it does not lead to vanishing gradients. Second, owing to its simplicity, it is computationally cheaper than sigmoid-like functions. Last, networks with ReLU tend to show better convergence than those with sigmoid (Glorot et al. 2011). However, ReLU does not constrain the output of the neuron and may let the activation grow very large, so it is not suitable as the activation function of the last hidden layer. Thus, ReLU is adopted only for the first three layers. The sigmoid function is |$S( x ) = \frac{1}{{1 + {e^{ - x}}}}$|. It has the advantage of not blowing up the activation: it constrains the output to the range 0 to 1 and almost always generates a non-zero value, resulting in dense representations. For these reasons, the sigmoid function is adopted as the activation function of the last hidden layer.

3. Cost function and optimization

This part is composed of two subsections. First, we discuss how we designed the cost function. Second, we show the optimization method for our DNN.

3.1. Cost function

The aim of training is to optimize the parameters (i.e. weights |${\boldsymbol{w}}$| and biases b) to minimize the cost function. The cost function defines the error structure of a particular problem.
Thus, it is quite important for model estimation and evaluation (Christoffersen & Jacobs 2004). A cost function has two parts, a loss term and a regularization term, which are expressed as
\begin{equation} J\left( {\boldsymbol{\theta }} \right) = \frac{1}{N}\mathop \sum \nolimits_{x \in \bf{X}} L\left( {y,\hat{y}\left( {x;{\boldsymbol{\theta }}} \right)} \right) + \lambda R\left( {\boldsymbol{\theta }} \right), \end{equation} (5)
where |$J( {\boldsymbol{\theta }} )$| is the cost function, |$L( {y,\ \hat{y}( {x;\ {\boldsymbol{\theta }}} )} )$| the loss term, |$R( {\boldsymbol{\theta }} )$| the regularization term, N the sample size, |${\boldsymbol{\theta }} = \{ {{\boldsymbol{w}},\ b} \}$| the parameters to learn, |$y \in ( {0,1} )$| the true value, |$\hat{y} \in ( {0,1} )$| the predicted value, |$x \in {\mathbb{R}^{181}}$| the input nodes and |$\lambda $| the regularization factor. Typically, the cost function is designed on a case-by-case basis. In our DNN, we use a pair of asymmetric quadratic loss terms and |${L_2}$|-norm regularization. In the following, we discuss how we designed the loss term and the regularization term. For the loss term, we considered that the loss from harming the event inside the Fresnel zone is much higher than the loss from leaving noise uncut, which calls for an asymmetric loss function. We therefore designed a pair of asymmetric quadratic loss functions for the left and right boundaries separately, trying to retain the contributive part as much as possible.
The loss functions of the left and right boundaries are expressed, respectively, as
\begin{equation} {L_{{\rm{left}}}}\left( {y,\ \hat{y}} \right) = {\left( {y - \hat{y}} \right)^2} + \alpha *{\left( {\max \left( {0,\ \hat{y} - y} \right)} \right)^2}, \end{equation} (6)
\begin{equation} {L_{{\rm{right}}}}\left( {y,\hat{y}} \right) = {\left( {y - \hat{y}} \right)^2} + \alpha *{\left( {\max \left( {0,\ y - \hat{y}} \right)} \right)^2}, \end{equation} (7)
where |$\alpha $| is the additional penalty factor. The first term on the right-hand side is the quadratic loss and the second term is an additional penalty on over-cutting the Fresnel zone. The factor |$\alpha $| adjusts the degree of asymmetry: the penalty for over-cutting increases as |$\alpha $| increases. However, if |$\alpha $| becomes too large, the DNN will pay little attention to the Fresnel zone and estimate a very long aperture. Figure 5 shows the loss functions for the left and right boundaries using different penalty factors. We suggest that |$\alpha $| should be less than 1.

Figure 5. A pair of asymmetric quadratic loss functions designed for (a) the left boundary and (b) the right boundary separately, trying to retain the contributive part as much as possible. The factor |$\alpha $| adjusts the degree of asymmetry; the penalty for over-cutting increases as |$\alpha $| increases.

The regularization term is used to make the model less prone to overfitting. It is intended to reduce the generalization error but not the training error.
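Equations (6) and (7) are straightforward to implement; the sketch below uses the paper's value α = 0.8 as the default.

```python
import numpy as np

def loss_left(y, y_hat, alpha=0.8):
    """Eq. (6): quadratic loss plus an extra penalty when the predicted
    left boundary over-cuts into the Fresnel zone (y_hat > y)."""
    return (y - y_hat) ** 2 + alpha * np.maximum(0.0, y_hat - y) ** 2

def loss_right(y, y_hat, alpha=0.8):
    """Eq. (7): the mirror image, penalizing y_hat < y."""
    return (y - y_hat) ** 2 + alpha * np.maximum(0.0, y - y_hat) ** 2
```

For the left boundary, for instance, a prediction of 0.6 against a label of 0.5 costs 0.01 + 0.8 · 0.01 = 0.018, while the symmetric miss of 0.4 costs only 0.01, which is exactly the asymmetry the text describes.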
The |${L_1}$|-norm and the |${L_2}$|-norm are the two most common forms of regularization. The |${L_1}$|-norm minimizes the sum of the absolute values of the parameters; it drives many parameters to zero, making the parameter vector sparse, which is suitable for feature selection. The |${L_2}$|-norm encourages all parameters to be small, producing small, distributed parameter values, which is suitable for avoiding overfitting. Generally, the |${L_2}$|-norm is suitable for low-dimensional DNN models while the |${L_1}$|-norm tends to work better for high-dimensional models. Considering that our DNN is relatively simple, the |${L_2}$|-norm is chosen, which can be written as
\begin{equation} R\left( \theta \right) = \Vert\theta \Vert_2^2. \end{equation} (8)
The regularization parameter |$\lambda $| in Eq. (5) adjusts the importance of the regularization term. An appropriate regularization parameter can be found by trying different values of |$\lambda $| and observing the validation loss and test loss. Figure 6 illustrates the validation losses and test losses for different regularization parameters. Both the validation loss and the test loss are small when |$\lambda < 0.2$|.

Figure 6. The validation losses and the test losses using different regularization parameters. Both the validation loss and the test loss are small when |$\lambda < 0.2$|.
To summarize, the cost functions for our DNN Fresnel zone picker are as follows:
\begin{eqnarray} {J_{{\rm{left}}}}( {\boldsymbol{\theta }} ) &=& \frac{1}{N}\mathop \sum \nolimits_{x \in \bf{X}} \left( {{( {y - \hat{y}} )}^2 }\right.\nonumber\\ &&\left.+\, \alpha *{{\left( {\max ( {0,\ \hat{y} - y} )} \right)}^2} \right) + \lambda \Vert\theta\Vert_2^2, \end{eqnarray} (9)
\begin{eqnarray} {J_{{\rm{right}}}}( {\boldsymbol{\theta }} ) &=& \frac{1}{N} \mathop \sum \nolimits_{x \in \bf{X}} \left( {{( {y - \hat{y}} )}^2}\right.\nonumber\\ &&\left.+\, \alpha *{{\left( {\max ( {0,\ y - \hat{y}} )} \right)}^2} \right) + \lambda \Vert\theta \Vert_2^2, \end{eqnarray} (10)
where the penalty factor |$\alpha $| and the regularization parameter |$\lambda $| are the two hyperparameters. In our DNN, |$\alpha $| is 0.8 and |$\lambda $| is 0.02.

3.2. Optimization

A good optimization algorithm can greatly accelerate the training process, especially in large-scale machine learning (Bottou et al. 2016; Schmidhuber 2015). Gradient descent (GD) methods are the dominant optimization methods; they can be expressed as
\begin{equation} {\theta ^{i + 1}} = {\theta ^i} - \mu \frac{{\partial J\left( \theta \right)}}{{\partial {\theta ^i}}}, \end{equation} (11)
where |$\mu $| is the learning rate. Mini-batch gradient descent (MBGD) is one of the most popular GD methods; it uses a mini-batch of several samples to compute the gradient. However, it introduces a hyperparameter, the batch size, whose choice largely depends on the computational capabilities of the devices. We use MBGD with a batch size of 64. MBGD alone does not guarantee good convergence, and the learning rate is a decisive parameter controlling training performance. Choosing a proper learning rate |$\mu $| is difficult.
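Putting the loss and regularization terms together, Eq. (9) for one mini-batch might be written as follows; Eq. (10) differs only in the sign inside the max. Flattening all weights and biases into one vector `theta` is an implementation convenience of this sketch, not something the paper prescribes.

```python
import numpy as np

def cost_left(y, y_hat, theta, alpha=0.8, lam=0.02):
    """Eq. (9): mean asymmetric quadratic loss over the batch plus
    L2 regularization (Eq. (8)) on the flattened parameters theta."""
    loss = np.mean((y - y_hat) ** 2 + alpha * np.maximum(0.0, y_hat - y) ** 2)
    return loss + lam * np.sum(theta ** 2)
```

With the paper's hyperparameters (α = 0.8, λ = 0.02), the regularization term contributes 0.02 per unit of squared parameter norm on top of the averaged picking loss.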
Too small a learning rate leads to extremely slow convergence, while too large a learning rate causes the cost function to fluctuate around the minimum or even to diverge. Many optimization algorithms (Ruder 2016) have been proposed to reduce the difficulty of choosing the learning rate and increase the rate of convergence. We adopt the Adaptive Moment Estimation (Adam) method (Kingma & Ba 2014) in our DNN with the default initial learning rate of 0.001.

4. Field data examples

First, we use a field dataset from the Dashen oil field in China to illustrate the performance of our DNN-based Fresnel zone picking method. The number of labeled CDPs is 837 in total. We randomly divide them into a training set, a validation set and a test set of sizes 600, 138 and 99, respectively. The range of CDPs is 70-1200, with a dip angle ranging from |$- 50^{\circ} $| to |$50^{\circ} $| at a dip-angle interval of |$1^{\circ} $|. The number of time slices is 2351 and the time interval is 0.002 s. Six predicted CDPs overlaid on the dip-angle gather from the test set are shown in figure 7.

Figure 7. Picking results on the test set. The white, green and yellow lines represent manual picking, the DNN prediction and the DNN prediction after the moving average, respectively.

It can be seen from the result that the trained DNN gives a good estimate of the Fresnel zone. First, our DNN gives wider apertures than the manually picked ones in most parts, which indicates that the designed cost functions are in accordance with the original intention. Second, we consider that the difference between the DNN picking and manual picking is caused by the variance of the human labels.
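For reference, a single Adam update for a parameter vector, with the default hyperparameters of Kingma & Ba (2014) and the initial learning rate of 0.001 used here, can be sketched as:

```python
import numpy as np

def adam_step(theta, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first- and second-moment
    estimates rescale the raw gradient step of Eq. (11)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad           # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# First step on f(theta) = theta^2 from theta = 1 (gradient 2 * theta):
theta, state = np.array([1.0]), (np.zeros(1), np.zeros(1), 0)
theta, state = adam_step(theta, 2 * theta, state)
```

Because the bias-corrected moments normalize the gradient, the very first step moves the parameter by almost exactly the learning rate, independent of the gradient's scale.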
The labeling work was accomplished by several people, and different people have different labeling styles. The DNN combines these labeling results in different styles and gives averaged inferences. Third, the fluctuations between adjacent time slices are small, indicating that the DNN is stable and convincing. Last but not least, the DNN method is much faster than manual picking, which takes several weeks. The conventional imaging result, the imaging result using optimal summation with our DNN picker and the imaging result using manual picking are shown in figure 8. It can be seen that the events are clearer and many artifacts are eliminated while the primary events are untouched. The DNN method has achieved human-level performance.

Figure 8. Comparison of (a) the imaging result with manual-picking optimal aperture summation, (b) the imaging result with DNN-predicted optimal aperture summation and (c) the conventional imaging result without optimal aperture summation. The yellow circles indicate where diffraction artifacts are eliminated. We can see that the DNN picking results are no worse than the manual picking results.

Second, we use another field dataset, from Xudong in China, to validate the generalization ability of our DNN. This dataset is different from the previous one and was not used in training. Its noise is stronger and its range of dip angles is narrower. The range of CDPs is 620-1220, with a dip angle ranging from |$- 40^{\circ} $| to |$40^{\circ} $| at a dip-angle interval of |$1^{\circ} $|.
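The figures show the DNN predictions after a moving average along the time slices. A simple edge-padded version of that smoothing looks like the sketch below; the window length is our assumption, as the paper does not state it.

```python
import numpy as np

def smooth_boundary(picks, win=11):
    """Moving-average smoothing of the picked boundary values along the
    time axis; edge padding keeps one smoothed pick per time slice."""
    pad = np.pad(np.asarray(picks, dtype=float), win // 2, mode='edge')
    return np.convolve(pad, np.ones(win) / win, mode='valid')
```

Smoothing suppresses slice-to-slice jitter in the predicted boundary while preserving its overall trend, which is consistent with the small fluctuations between adjacent time slices noted above.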
The number of time slices is 4000 and the time interval is 0.001 s. Three predicted CDPs are shown in figure 9. It can be seen that the noise in these CDPs is decreased to some extent and the energy inside the Fresnel zone is not harmed. Figure 10 shows the comparison between the imaging result with optimal summation by our DNN and the conventional imaging result. A lot of noise is eliminated and the imaging result is clearer.

Figure 9. The picking results on the dataset from Xudong in China. This dataset was not used in training, in order to validate the generalization ability of our method. The green and yellow lines represent the DNN prediction and the DNN prediction after the moving average, respectively.

Figure 10. Comparison of (a) the imaging result with DNN-predicted optimal aperture summation and (b) the conventional imaging result without optimal aperture summation. It can be seen that noise is partly removed and the signal-to-noise ratio is improved while the primary events are untouched.

5. Conclusion

We have presented a method based on DNNs to estimate the optimal aperture in the dip-angle domain automatically.
The proposed method predicts the left and right boundaries separately using a pair of DNNs. We first adopt the decomposition method to generate a pair of 1D dip-angle gathers, inline and crossline, to represent the 2D dip-angle gathers. The 1D dip-angle gathers are similar, and the trained DNN can solve both 2D and 3D problems with one set of parameters. Then, we use the event at each time slice in the 1D dip-angle gather as the input and the two normalized boundaries as the output. We find that a relatively simple DNN with four layers is enough for the Fresnel zone picking problem. Last, we use two field datasets to validate the performance and generalization ability of our method. However, the imaging quality of Kirchhoff-type migration is limited; it cannot handle complex media. Reverse time migration (RTM) provides much better imaging results than Kirchhoff migration. In the future, a DNN-based Fresnel zone picker for RTM is worth studying.

Acknowledgements

The authors are grateful to the National Natural Science Foundation of China (under grant 41804129) and the China Postdoctoral Science Foundation (under grant 2018T110137). Conflict of interest statement: none declared.

References

Araya-Polo M., Jennings J., Adler A., Dahlke T., 2018. Deep-learning tomography, Leading Edge, 37, 58-66.
Bottou L., Curtis F.E., Nocedal J., 2016. Optimization methods for large-scale machine learning, SIAM Review, 60, 223-311.
Christoffersen P., Jacobs K., 2004. The importance of the loss function in option valuation, Journal of Financial Economics, 72, 291-318.
Glorot X., Bordes A., Bengio Y., 2011. Deep sparse rectifier neural networks, in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 15, 315-323.
Goodfellow I., Bengio Y., Courville A., 2016.
Deep Learning, MIT Press.
Han J., Moraga C., 1995. The influence of the sigmoid function parameters on the speed of backpropagation learning, From Natural to Artificial Neural Computation, 930, 195–201.
Kingma D.P., Ba J., 2014. Adam: a method for stochastic optimization, arXiv:1412.6980.
Klokov A., Fomel S., 2013. Selecting an optimal aperture in Kirchhoff migration using dip-angle images, Geophysics, 78, S243–S254.
Krizhevsky A., Sutskever I., Hinton G.E., 2012. ImageNet classification with deep convolutional neural networks, in Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 1, 1097–1105.
LeCun Y., Bengio Y., Hinton G., 2015. Deep learning, Nature, 521, 436–444.
Leveille J.P., Jones I.F., Zhou Z.Z., Wang B., Liu F., 2011. Subsalt imaging for exploration, production, and development: a review, Geophysics, 76, WB3–WB20.
Lewis W., Vigh D., 2017. Deep learning prior models from seismic images for full-waveform inversion, SEG Technical Program Expanded Abstracts 2017, Society of Exploration Geophysicists, 1512–1517.
Nair V., Hinton G.E., 2010. Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning, 807–814.
Ruder S., 2016. An overview of gradient descent optimization algorithms, arXiv:1609.04747.
Schleicher J., Hubral P., Tygel M., Jaya M.S., 1997. Minimum apertures and Fresnel zones in migration and demigration, Geophysics, 62, 183–194.
Schmidhuber J., 2015. Deep learning in neural networks, Neural Networks, 61, 85–117.
Spinner M., Mann J., 2007.
CRS-based minimum-aperture time migration: a 2D land-data case study, SEG Technical Program Expanded Abstracts, Society of Exploration Geophysicists, 2354–2358.
Sun H., Schuster G.T., 2001. Wavepath migration, Geophysics, 66, 1528–1537.
Suzuki K., 2017. Overview of deep learning in medical imaging, Radiological Physics and Technology, 10, 257–273.
Webb S., 2018. Deep learning for biology, Nature, 554, 555–557.
Xu J., Zhang J., 2017. Prestack time migration of nonplanar data: improving topography prestack time migration with dip-angle domain stationary-phase filtering and effective velocity inversion, Geophysics, 82, S235–S246.
Yuan S., Liu J., Wang S., Wang T., Shi P., 2018a. Seismic waveform classification and first-break picking using convolution neural networks, IEEE Geoscience and Remote Sensing Letters, 15, 272–276.
Yuan S., Su Y., Wang T., Wang J., Wang S., 2018b. Geosteering phase attributes: a new detector for the discontinuities of seismic images, IEEE Geoscience and Remote Sensing Letters, 1–5, doi: 10.1109/LGRS.2018.2866419.
Zhang H., Zhang J., Li Z., Zhang J., Xiao J., 2017. Improving the imaging resolution of 3D PSTM in VTI media using optimal summation within the Fresnel zone, Journal of Seismic Exploration, 26, 311–330.
Zhang J., Li Z., Liu L., Wang J., Xu J., 2016. High-resolution imaging: an approach by incorporating stationary-phase implementation into deabsorption prestack time migration, Geophysics, 81, S317–S331.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Sinopec Geophysical Research Institute.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Automatic Fresnel zone picking in the dip-angle domain using deep neural networks, Journal of Geophysics and Engineering, 16(1), 136, 2019-02-01. doi: 10.1093/jge/gxy012.