TY - JOUR
AU - Muto, Shunsuke
AB - In this article, we demonstrate that a convolutional neural network (CNN) can be effectively used to determine the presence of twins in atomic resolution scanning transmission electron microscopy (STEM) images of catalytic Au nanoparticles. In particular, CNN screening of Hough transformed images resulted in significantly higher accuracy rates than those obtained by applying the technique to the raw STEM images. The proposed method can be utilized for evaluating the statistical twinning fraction of Au nanoparticles, which strongly affects their catalytic activity.
KW - convolutional neural network (CNN)
KW - gold nanoparticle
KW - Z-contrast image
KW - crystal structure
KW - Hough transform

Introduction

Catalytic Au nanoparticles with diameters of <5 nm have received significant attention since the discovery of their unusually high catalytic activity towards CO oxidation [1]. In particular, the activity of twinned Au nanoparticles is significantly larger than that of single-crystal Au nanoparticles [2,3]. Moreover, a supported Au catalyst containing mostly twinned nanoparticles exhibited higher catalytic activity toward CO oxidation than a catalyst predominantly consisting of single-crystal nanoparticles [4]. Hence, it is important to estimate the fraction of twins in supported Au nanoparticle catalysts, since it strongly affects the catalytic activity of these materials.

Typically, the presence of twins in an Au nanocatalyst is detected by transmission electron microscopy (TEM), which can identify the characteristic butterfly contrast in a dark-field (DF) image [5]. However, this method is not always applicable to supported nanoparticle catalysts because the contrast originating from the thick substrate often dominates the butterfly contrast of twinned particles. Atomic resolution imaging using high-angle annular DF scanning TEM (HAADF-STEM) with aberration correction is an alternative promising method for detecting twinned nanoparticles through their Z-contrast (contrast proportional to the atomic number) [4,6–9]. However, it is not practical to count the particles and distinguish the twinned from the untwinned ones by visual observation, since Au nanoparticles are not always oriented with a low-order zone axis parallel to the incident electron beam; as a result, the obtained HAADF images are typically not well resolved on the atomic level. To solve this problem, we propose combining the Hough transform of HAADF images with a convolutional neural network (CNN), implemented in the Matlab commercial software package, to efficiently and automatically detect the presence of twins in the crystal structure of nanoparticles and to evaluate the fraction of twinned particles from the experimentally obtained HAADF images.

Modern machine learning techniques that solve the image recognition problem using artificial intelligence (AI) rely on either neural networks (NNs) or deep learning (DL) methods. Usually, a NN contains a few layers formed by artificial neurons (called nodes). Each node in a layer is connected to the nodes of the neighboring layers, and the connecting weights must be manually adjusted to obtain the best result.
NNs have been successfully used in the field of electron microscopy: for example, as an automatic tracking marker in electron tomography [10–12] capable of correctly aligning tilted image data. DL techniques, on the other hand, design the recognition model itself, using a multi-layer network automatically optimized by AI. The CNN approach, one of the DL algorithms developed in the 1980s, is characterized by the presence of convolution and pooling layers in a multi-layer network [13]. Unlike conventional NNs, CNNs extract various features from localized areas of the analyzed image using multiple filters, which ensures the 'translational invariance' of pattern recognition. Since a CNN learns iteratively, increasing the correct answer rate by successively updating the connecting weights between layers, it has been successfully applied to the automatic recognition of handwritten figures [14]. In the present study, we demonstrate that repeated learning layers in a CNN effectively improve the correct answer rate and can thus be used to determine the presence of twins in Au nanoparticles from experimentally obtained HAADF-STEM images. Furthermore, the correct answer rate is dramatically improved by applying the CNN approach to a Hough transformed dataset (as compared to the results obtained for the unprocessed images).

Methods

Sample preparation and STEM observations

Au nanoparticles supported on TiO2 and Nb2O5 substrates (Au loading: 1 wt.%) were prepared by a deposition–precipitation method, which is described in detail elsewhere [2]. Both the TiO2 (JRC-TIO8 supplied by the Catalysis Society of Japan) and Nb2O5 (Nb2O5·nH2O supplied by CBMM) substrates were calcined at 400°C before use. An aqueous NaOH solution was added to a suspension containing 1 mM HAuCl4 and a specified amount of the substrate to adjust its pH to 7. After stirring for 1 h, the suspension was centrifuged, and the obtained slurry was washed with water five times. After drying, the powder was treated either under a CO gas flow (4 vol.% CO/Ar, 10 mL min−1) at room temperature for 0.5 h or under a H2 flow (3 vol.% H2/N2, 100 mL min−1) at 473 K for 0.5 h. The structural variations of the Au nanoparticles have been investigated in detail elsewhere [4], and representative images are shown in the Supplemental material. Atomic resolution STEM images were obtained using a Cs-corrected STEM instrument (JEOL) equipped with a thermal field-emission gun operated at 200 kV. HAADF-STEM images were taken at a resolution of 0.021 nm/pixel.

Pre-processing of STEM images

In the present study, only in-focus nanoparticle regions of the STEM images were selected for the subsequent CNN analysis. The images of in-focus Au nanoparticles were trimmed to full square-shaped frames (to exclude the ceramic support visible in the original Z-contrast images) and then resized to 180 × 180 pixels. All in-focus Au nanoparticles were processed without any further arbitrary selection; a minimal pre-processing sketch is given below.
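This trimming and resizing step can be sketched with standard Image Processing Toolbox calls; the file names, folder layout and crop rectangle below are hypothetical placeholders (the actual square frames were chosen manually around each particle), so this is an illustration of the procedure rather than the authors' script.

% Minimal pre-processing sketch (assumed file names and crop region).
% Each in-focus particle is cropped to a square frame and resized
% to the 180 x 180 pixel input size used by the CNN.
raw = imread('haadf_frame_001.tif');      % hypothetical raw Z-contrast image
if size(raw, 3) == 3
    raw = rgb2gray(raw);                  % ensure an 8-bit greyscale palette
end
roi = [241 318 260 260];                  % [x y w h]: manually chosen square around one particle
particle = imcrop(raw, roi);              % trim away the ceramic support
particle = imresize(particle, [180 180]); % resize to the CNN input resolution
imwrite(particle, 'particles/single/particle_001.png');  % store under its class label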
Each image was labeled as a single-crystal or a twinned one by comparing the experimental images with datasets theoretically simulated using the kinematic HAADF-STEM image simulation package KINE-HAADF [15], which quickly generates reasonably accurate simulated images of small particles (as compared to more precise full simulation packages). Three different models were considered: single-crystal, twinned and multiply twinned particle (MTP). The standard octahedral and decahedral structures consisting of 147 Au atoms each were used as the single-crystal and twinned models, respectively. As the MTP model, the decahedral structure (called Ino's model [16]) consisting of 181 Au atoms was utilized. The Gaussian spread of each atomic column (corresponding to the full width at half maximum, FWHM) was set to 0.07 nm, taking the actual size of the electron probe into account [6]. All simulated images were prepared by rotating the particles in all directions between 0 and 90° around the x, y and z axes in 1° steps, saving the image file obtained for each model. The experimental HAADF-STEM images in which the lattice fringes contained kinks (see, for example, Figure 1(c)) were easily labeled as twinned, while the remaining images were classified as either twinned or single-crystal according to the characteristic lattice fringe patterns obtained by the simulations (typical examples are shown in Figure 1). More than 70% of the nanoparticles in this study were identified as either single-crystal or twinned; the others were not included in the CNN analysis because their catalytic activities had not been discussed in the previous work [4]. As a result, 397 images were prepared for analysis.

Fig. 1. Comparison between the experimental and simulated HAADF images of Au nanoparticles with and without twins. Upper row: the experimental HAADF images of (a), (b) single-crystal, (c) twinned and (d) presumably multiply twinned particles (MTPs). Bottom row: the simulated images based on the (e), (f) single-crystal, (g) twinned and (h) MTP models corresponding to the experimental images depicted in panels (a)−(d), respectively.

For comparison, an additional dataset was prepared by applying the Hough transformation (described in the following section) to the above-mentioned Z-contrast images after a Sobel edge-detecting operation [17]. Each dataset was randomly divided into two groups containing 70% (training data) and 30% (proof data) of the images.

Hough transformation

The Hough transform is a digital image processing method used for detecting straight lines in an image. It maps a straight line in the conventional x−y Cartesian coordinates to a pair (ρ, θ), where ρ is the distance from the origin to the closest point on the line and θ is the angle between the x-axis and the line connecting the origin with that closest point [18], expressed by the equation

ρ = x cos θ + y sin θ.
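The Sobel edge detection followed by the Hough transform used to build the second dataset can be sketched with standard Image Processing Toolbox calls. The file paths are hypothetical, and rescaling the accumulator back to the CNN input size is an assumption about the authors' workflow.

% Sketch of the Hough transformed dataset preparation (assumed file names).
% Straight lattice fringes in the Z-contrast image become localized
% peaks in the (theta, rho) accumulator.
img = imread('particles/single/particle_001.png');   % 180 x 180 greyscale input
bw  = edge(img, 'sobel');                 % Sobel edge-detecting operation [17]
[H, theta, rho] = hough(bw);              % accumulator of rho = x*cos(theta) + y*sin(theta)
houghImg = imresize(mat2gray(H), [180 180]);  % normalize and rescale to the CNN input size
imwrite(houghImg, 'hough/single/particle_001.png');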
Figure 2 shows three typical model images: (a) a single crystal containing only one-directional lattice fringes, (b) a single crystal containing two-directional lattice fringes, (c) a twinned crystal and (d) an MTP; the corresponding Hough transformed patterns are depicted below each model in panels (e)−(h). The sharp line-contrast segments arranged in tandem in panel (e) correspond to the lattice fringes depicted in panel (a); the two sets of line segments in panel (f) correspond to the two sets of lattice fringes with different orientations; the broader and weaker contrast features in panel (g) result from the shorter fringes of the twinned crystal in panel (c); and the contrast caused by the fringes of the MTP (panels (d) and (h)) is even more difficult to interpret because the twin boundaries are buried in the background. Figure 3 shows the Hough transformed patterns obtained from Figures 1(a)−(d). From these results, it can be concluded that the Hough transformed patterns of atomic resolution HAADF-STEM images of Au particles are more suitable for distinguishing between single crystals and twinned particles than the original HAADF-STEM images.

Fig. 2. Schematic diagrams of typical lattice fringe patterns and their corresponding Hough transforms. (a) A single lattice fringe pattern, (b) a two-directional lattice fringe pattern, (c) a twinned crystal and (d) a multi-twinned particle. (e)−(h) Hough transforms of the images depicted in panels (a)−(d) with their characteristic patterns.

Fig. 3. Hough transformed patterns corresponding to the images depicted in Figures 1(a)−(d), respectively.

CNN construction

The CNN utilized in this work was constructed using the Machine Learning Toolbox of the MATLAB R2017b (Mathworks) platform, installed on a workstation equipped with two GeForce GTX 1080 Ti (NVIDIA) graphics processing units (GPUs), each containing 3584 CUDA cores and 11 GB of memory. Figure 4 shows the schematic diagram of the constructed CNN containing 100 channels (filters) with a convolution layer size of 7 × 7. A CNN with the standard configuration is constructed in the order of the presented dataflow, starting with the input layer and followed by the convolution layer, rectified linear unit (ReLU) layer, max-pooling layer, full-connect layer, softmax layer and classification layer. The feature map issued by each layer successively decomposes the image features and isolates the various unit components expressing the entire group of original images (which, unfortunately, constitutes a 'black box' inaccessible to users of the present Matlab platform).

Fig. 4. Schematic diagram of the CNN with a convolution layer containing 100 filters with sizes of 7 × 7.

The input layer is constructed from input images with sizes of 180 × 180 × 1 pixels (the '×1' indicates that the image contains an 8-bit greyscale palette rather than an RGB colored one ('×3')).
The second layer, the 'convolution layer', operates as a feature extractor by applying convolution operations of the 'filter' matrices to the input layer. A convolution filter is represented by a square matrix of specified size, whose entries are initialized with random numbers drawn with a mean of zero and a standard deviation of 0.01 and are subsequently updated after each learning cycle. The number of filters N was set to 40, 70, 100, 130, 160 and 190, and the filter matrix dimension F assumed the values of 5, 7, 9, 11, 13 and 15. The other parameters, including the stride S (the step size with which the filter traverses the input image) and the padding P (the entries in the extended edge areas of the input image), were left at their default values of 1 and 0, respectively, resulting in feature maps of size (180 + 2P − F)/S + 1 and depth N; for example, F = 7 with S = 1 and P = 0 yields 174 × 174 feature maps. The next layer, the 'ReLU layer', is applied before the pooling operation [19]. It replaces negative values with 0, thus accelerating the learning process and maintaining a high degree of robustness of the CNN. The max-pooling layer downsizes the feature maps by a factor of nine by dividing each map into small windows of 3 × 3 pixels and keeping the maximum value in each window, which also makes the learning process robust against the translational ambiguity of the images. The full-connect layer combines all the filtered images of the previous layer, taking into account their individual contributions through weights updated during each learning cycle, starting from random numbers generated by a procedure similar to that used for the convolution filters. Finally, the softmax and classification layers are located after the full-connect layer to provide an output. The activation ('softmax') function assigns probability values to all nodes of the previous full-connect layer, and those values are passed to the classification layer to determine whether the answer provided by the network is correct (unity) or not (zero). The number of nodes in the classification layer corresponds to the total number of classes (single, twinned), which is two in this study.

A mini-batch learning procedure was utilized to increase the processing rate and provide better results than updating the filter weights over all images at once [20]. A specified number of images (here, the mini-batch size was set to 16) were randomly selected from the training dataset, and the resulting subset was used for one learning cycle; the procedure was repeated until the number of epochs reached a specified value, where one epoch corresponds to the number of iterations required for all the training images to update the weights once. The maximum number of epochs was set to 120 (typical values range between 10 and 200). If too few epochs are used for training, the accuracy of the obtained results is poor due to significant under-fitting; if the number of epochs is too large, the network 'memorizes' the desired outputs for the training inputs (overfitting). The CNN constructed in this work was first trained using the training dataset. All training runs in this study finished within 10 min, or even less after decreasing the number of nodes in the training layers.
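A minimal MATLAB sketch of the baseline network of Figure 4 and its training is given below, assuming the pre-processed images are stored in folders named after their class labels (a hypothetical layout); the solver choice is also an assumption, since the paper does not name it. The layer types and training options are standard MATLAB deep learning API calls, not the authors' actual script.

% Sketch of the baseline CNN of Figure 4 (100 filters of size 7 x 7),
% assuming the images are stored in folders named after their labels.
imds = imageDatastore('hough', 'IncludeSubfolders', true, ...
                      'LabelSource', 'foldernames');         % classes: single, twinned
[trainSet, proofSet] = splitEachLabel(imds, 0.7, 'randomized'); % 70/30 split

layers = [
    imageInputLayer([180 180 1])                  % 8-bit greyscale input
    convolution2dLayer(7, 100, 'Stride', 1, 'Padding', 0)  % feature extractor
    reluLayer                                     % negative values replaced by 0
    maxPooling2dLayer(3, 'Stride', 3)             % 3 x 3 windows, downsizing by a factor of nine
    fullyConnectedLayer(2)                        % two classes: single, twinned
    softmaxLayer                                  % class probabilities
    classificationLayer];                         % output layer

options = trainingOptions('sgdm', ...             % assumed solver; mini-batch SGD
    'MiniBatchSize', 16, ...                      % mini-batch size used in this study
    'MaxEpochs', 120);                            % maximum number of epochs
net = trainNetwork(trainSet, layers, options);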
Afterwards, a confusion matrix was produced using the proof dataset, from which the correct answer rate (called 'accuracy') was calculated and tabulated as a function of the number and size of the applied filters.

Results

The accuracies determined using the CNN with various numbers and sizes of filters applied to the Hough transformed patterns are listed in Table 1. None of the obtained values exceed 70%. Since the angles of the image lattice fringes are the key parameters of the Hough transformed patterns used for distinguishing image features, the number of max-pooling layers located after the convolution layer was increased up to four to improve the translational invariance of the system (see Figure 5). The accuracies obtained with 2, 3 and 4 max-pooling layers are listed in Tables 2(a)−(c), respectively. They show that as the number of repeated max-pooling layers increases, the resulting accuracy increases as well.

Table 1. Accuracy rates (%) of the CNN constructed according to the diagram depicted in Figure 4 with various numbers and sizes of filters, applied to the Hough transformed dataset.

Number of      Size of filters
filters        5      7      9      11     13     15
40             60.5   54.62  55.46  64.71  60.5   60.5
70             63.87  60.5   57.98  59.66  57.14  63.03
100            58.82  63.87  67.23  65.55  68.07  63.87
130            63.87  63.03  67.23  67.23  62.18  68.07
160            61.34  62.18  64.71  64.71  66.39  62.18
190            61.34  67.23  65.55  65.55  66.39  69.75
Table 2. Accuracy rates (%) of the CNN constructed by varying the size and number of filters, applied to the Hough transformed dataset, with the number of max-pooling layers equal to (a) 2, (b) 3 and (c) 4.

(a) Two max-pooling layers
Number of      Size of filters
filters        5      7      9      11     13     15
40             69.75  64.71  63.03  57.14  57.14  68.07
70             64.71  69.75  67.23  64.71  61.34  65.55
100            65.55  74.79  69.75  68.91  63.03  67.23
130            68.91  68.91  66.39  63.87  63.87  61.34
160            70.59  70.59  68.07  68.91  61.34  59.66
190            66.39  66.39  67.23  63.03  67.23  68.91

(b) Three max-pooling layers
Number of      Size of filters
filters        5      7      9      11     13     15
40             71.43  73.11  73.11  79.83  79.83  76.47
70             76.74  78.15  76.47  68.91  73.95  68.07
100            73.95  78.99  73.95  75.63  75.63  73.95
130            77.31  78.15  75.63  77.31  76.47  73.95
160            75.63  73.11  77.31  79.83  74.79  73.95
190            73.11  77.31  74.79  74.79  77.31  78.15

(c) Four max-pooling layers
Number of      Size of filters
filters        5      7      9      11     13     15
40             79.83  81.51  78.99  79.83  76.47  79.83
70             80.67  76.47  80.67  78.99  78.99  77.31
100            75.63  81.51  81.51  80.67  79.83  79.83
130            79.83  79.83  78.15  81.51  82.35  80.67
160            78.15  79.83  80.67  80.67  77.31  79.83
190            77.31  79.83  78.99  81.51  81.51  75.63
Fig. 5. Schematic diagram of the CNN with a convolution layer containing 100 filters with sizes of 7 × 7 and four max-pooling layers. The points added to Figure 4 are highlighted in red.

An alternative way to improve the CNN accuracy is to increase the numbers of convolution, ReLU and max-pooling layers up to three (see Figure 6). The corresponding accuracy values obtained when the numbers of these layers are equal to 2 and 3 are listed in Tables 3(a) and (b), respectively. These results show that increasing the number of repeated layers increases the accuracy of the constructed CNN. In particular, the network containing 100 filters with sizes of 7 × 7 and three convolution, three ReLU and three max-pooling layers achieved the highest accuracy of around 85%, which represents the current upper limit of applying CNNs to actual datasets. In this study, the smaller the size of the feature map connected to the full-connect layer, the higher the correct answer rate; a sketch of this deeper variant is given below.
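A minimal sketch of the deeper network of Figure 6 follows, assuming (since the paper does not spell out the per-block configuration) that each of the three blocks uses 100 filters of size 7 × 7 and a 3 × 3 max-pooling.

% Sketch of the deeper network of Figure 6: the convolution/ReLU/max-pooling
% block is repeated three times (assumed: 100 filters of 7 x 7 per block).
layersDeep = [
    imageInputLayer([180 180 1])
    convolution2dLayer(7, 100)            % block 1
    reluLayer
    maxPooling2dLayer(3, 'Stride', 3)
    convolution2dLayer(7, 100)            % block 2
    reluLayer
    maxPooling2dLayer(3, 'Stride', 3)
    convolution2dLayer(7, 100)            % block 3
    reluLayer
    maxPooling2dLayer(3, 'Stride', 3)     % shrinks the final feature map to 3 x 3
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];
netDeep = trainNetwork(trainSet, layersDeep, options);  % same options as above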
Table 3. Accuracy rates (%) of the CNN constructed by varying the size and number of filters, applied to the Hough transformed dataset, with the numbers of convolution, ReLU and max-pooling layers equal to (a) 2 and (b) 3.

(a) Two convolution, ReLU and max-pooling layers
Number of      Size of filters
filters        5      7      9      11     13     15
40             73.11  74.79  69.75  70.59  77.31  73.95
70             72.27  70.59  75.63  73.11  74.79  72.27
100            72.27  69.75  75.63  72.27  68.91  75.63
130            72.27  74.79  75.63  68.07  74.79  74.79
160            74.79  73.11  73.11  69.75  73.95  75.63
190            70.59  72.27  70.59  79.83  69.75  80.67

(b) Three convolution, ReLU and max-pooling layers
Number of      Size of filters
filters        5      7      9      11
40             65.55  77.31  80.67  74.79
70             73.95  82.35  81.51  80.67
100            80.67  84.87  81.51  77.31
130            79.83  79.83  78.15  79.83
160            81.51  79.83  81.51  81.51
190            80.67  81.51  80.67  78.15

Fig. 6. Schematic diagram of the CNN with multiple convolution layers containing 100 filters with sizes of 7 × 7 and three ReLU and max-pooling layers. The points added to Figure 4 are highlighted in red.

For comparison, the confusion matrices obtained using the networks of Figures 4–6 with the trimmed atomic resolution Z-contrast images of Au nanoparticles as the input (without Hough transforms) are listed in Tables 4(a)−(c), respectively; the confusion matrix corresponding to the highest accuracy in Table 3(b) is shown in Table 4(d). It should be noted that all the images analyzed in Tables 4(b) and (c) were identified as twinned, suggesting that these models were unsuitable for the raw-image datasets used in this work. On the other hand, the results listed in Table 4(a), while not fully biased, exhibit much lower accuracy than those provided in Table 4(d).
Therefore, the Hough transformation can be effectively utilized for the automatic identification of the crystal structure of Au nanoparticles using CNNs because it converts features spread over the entire image into a set of localized signature spots arranged in tandem.

Table 4. Confusion matrices of the solutions provided by the CNN constructed using the models depicted in Figures (a) 4, (b) 5 and (c) 6, applied to the trimmed atomic resolution Z-contrast images. Part (d) contains the results obtained by applying the model depicted in Figure 6 to the Hough transformed dataset.

(a) Figure 4 model, raw Z-contrast images
                                 Labeled by CNN
Labeled in proof dataset         Single   Twinned
Single                           17       31
Twinned                          24       47
Accuracy (%)                     53.8

(b) Figure 5 model, raw Z-contrast images
                                 Labeled by CNN
Labeled in proof dataset         Single   Twinned
Single                           0        48
Twinned                          0        71
Accuracy (%)                     59.7

(c) Figure 6 model, raw Z-contrast images
                                 Labeled by CNN
Labeled in proof dataset         Single   Twinned
Single                           0        48
Twinned                          0        71
Accuracy (%)                     59.7

(d) Figure 6 model, Hough transformed dataset
                                 Labeled by CNN
Labeled in proof dataset         Single   Twinned
Single                           38       10
Twinned                          8        63
Accuracy (%)                     84.9
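In outline, such a confusion matrix and its accuracy can be obtained by classifying the proof dataset with the trained network; the variable names follow the earlier sketches and are assumptions rather than the authors' code.

% Sketch of the proof-set evaluation producing a confusion matrix and
% the accuracy (correct answer rate), following the earlier sketches.
predicted = classify(netDeep, proofSet);      % labels assigned by the CNN
actual    = proofSet.Labels;                  % labels from the proof dataset
cm        = confusionmat(actual, predicted)   % rows: actual, columns: predicted
accuracy  = 100 * mean(predicted == actual)   % e.g. 84.9% in Table 4(d)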
Discussion

To evaluate the highest obtained accuracy rate of 85%, the historical development of the CNN technique should be reviewed using the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), a worldwide annual competition of image recognition systems operating on the supervised image data of ImageNet, a large-scale open-source database available online [21]. Before 2011, the NN-based systems exhibited the highest correct answer rate of around 74%, reducing the error rate by around 1% each year. Since 2012, however, the newly developed CNN (later called Alex-net) succeeded in reducing the error rate to around 15%, an improvement of more than 10% over the results obtained in the previous year [20]. Alex-net, implemented on multiple modern GPUs, contained 8 learning layers (5 convolution and 3 full-connect layers; 23 layers in total), which could not have been executed within a realistic timeframe, or within the available memory capacity, on a contemporary high-performance workstation without GPUs. Since Alex-net represented a significant milestone in the development of machine learning algorithms for image recognition, almost all ILSVRC systems have been CNN-based since that time. Moreover, two ILSVRC systems achieved error rates of less than 5%, better than the conceivable human error rate [22,23]. In 2017, a modified ILSVRC system containing a Squeeze-and-Excitation block, characterized by an error rate of 2.25%, was constructed [24].

The CNN used in this study more closely resembles Alex-net than the modified systems described in Refs. [22–24], with a smaller total number of learning layers. Its accuracy rate of 85% is very close to that of the Alex-net system developed in 2012, as well as to the results obtained by applying CNNs to actual datasets, including complex networks for different tasks [25], recognition of colored images [26], recognition of infrared images [27] and weather prediction performed using various climate datasets [28]. The correct answer rate could be increased using systems similar to those described in Refs. [22–24], without re-designing the entire network structure, once they become part of the MATLAB software, owing to their lower cost and reasonable processing time. From the results of this work, it can be concluded that the constructed CNN may be utilized for the fast screening of STEM images containing several hundred nanoparticles synthesized under various conditions. Since the number of analyzed image data points typically does not exceed a few hundred, the obtained accuracy rate of 85% is considered relatively high (and can be further improved by analyzing a larger number of experimental images). The current correct answer rate is sufficient for determining the effect of the crystal structure on the catalytic activity, because the difference in catalytic activity toward CO oxidation remains significant even after taking the 15% error rate into account [4]. The catalytic activity may be affected not only by the presence of twins but also by the number of twins in a particle or by the MTP type, which would require more advanced pattern recognition methods; this topic is, however, beyond the scope of the present study.

Conclusion

In this work, the CNN technique was successfully applied to hundreds of HAADF-STEM images of Au nanoparticles to automatically detect the presence of twins among the analyzed structures. The correct answer rates obtained using the Hough transformed images were much higher than those achieved using the raw Z-contrast images as the input. The highest correct answer rate was obtained by constructing a CNN with the largest number of repeated max-pooling layers. The proposed scheme can be used for screening the images of nanoparticle catalysts synthesized under various conditions and examining the statistical correlation between their structural parameters and catalytic activity.

Acknowledgements

The authors thank Prof. J. Yuan and his colleagues from the University of York for providing the KINE-HAADF simulation code.
One of the co-authors (Y.Y.) also thanks Dr Kyoichi Sawabe for supplying the Au nanoparticle model structures used in the KINE-HAADF simulations.

Funding

This work was partially supported by the 'Elements Strategy Initiative to Form Core Research Center' program and by Grants-in-Aid for Young Scientists (A) (16H06131) and for Challenging Exploratory Research (16K14476) funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

References

1. Haruta M, Yamada N, Kobayashi T, and Iijima S (1989) Gold catalysts prepared by coprecipitation for low-temperature oxidation of hydrogen and of carbon monoxide. J. Catal. 115: 301–309.
2. Cunningham D A H, Vogel W, Sanchez R M T, Tanaka K, and Haruta M (1999) Structural analysis of Au/TiO2 catalysts by Debye function analysis. J. Catal. 183: 24–31.
3. Pandey A D, Guttel R, Leoni M, Schuth F, and Weidenthaler C (2010) Influence of the microstructure of gold–zirconia yolk–shell catalysts on the CO oxidation activity. J. Phys. Chem. C 114: 19386–19394.
4. Ohyama J, Koketsu T, Yamamoto Y, Arai S, and Satsuma A (2015) Preparation of TiO2-supported twinned gold nanoparticles by CO treatment and their CO oxidation activity. Chem. Commun. 51: 15823–15826.
5. Ino S (1966) Epitaxial growth of metals on rocksalt faces cleaved in vacuum. II. Orientation and structure of gold particles formed in ultrahigh vacuum. J. Phys. Soc. Jpn. 21: 346–362.
6. Yamamoto Y, Arai S, Esaki A, Ohyama J, Satsuma A, and Tanaka N (2014) Statistical distribution of single atoms and clusters of supported Au catalyst analysed by global high-resolution HAADF-STEM observation with morphological image-processing operation. Microscopy 63: 209–218.
7. Ohyama J, Esaki E, Koketsu T, Yamamoto Y, Arai S, and Satsuma A (2016) Atomic-scale insight into the structural effect of a supported Au catalyst based on a size-distribution analysis using Cs-STEM and morphological image-processing. J. Catal. 335: 24–35.
8. Chen C, Hu Z, Li Y, Liu L, Mori H, and Wang Z (2016) In-situ high-resolution transmission electron microscopy investigation of overheating of Cu nanoparticles. Sci. Rep. 6: 19545.
9. Yoshida K, Kon K, and Shimizu K (2016) Atomic-resolution HAADF-STEM study of Ag/Al2O3 catalysts for borrowing-hydrogen and acceptorless dehydrogenative coupling reactions of alcohols. Top. Catal. 59: 1740–1747.
10. Ogura T, and Sato C (2001) An automatic particle pickup method using a neural network applicable to low-contrast electron micrographs. J. Struct. Biol. 136: 227–238.
11. Ogura T, and Sato C (2004) Automatic particle pickup method using a neural network has high accuracy by applying an initial weight derived from eigenimages: a new reference free method for single-particle analysis. J. Struct. Biol. 145: 63–75.
12. Ogura T, and Sato C (2004) Auto-accumulation method using simulated annealing enables fully automatic particle pickup completely free from a matching template or learning data. J. Struct. Biol. 146: 344–358.
13. Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36: 193–202.
14. LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, and Jackel L D (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput. 1: 541–551.
15. He D S, Li Z Y, and Yuan J (2015) Kinematic HAADF-STEM image simulation of small nanoparticles. Micron 74: 47–53.
16. Ino S (1969) Stability of multiply-twinned particles. J. Phys. Soc. Jpn. 27: 941–953.
17. Gupta S, and Mazumdar S G (2013) Sobel edge detection algorithm. Int. J. Comput. Sci. Manage. Res. 2: 1578–1583.
18. Duda R O, and Hart P E (1972) Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 15: 11–15.
19. Nair V, and Hinton G E (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML'10), 807–814.
20. Krizhevsky A, Sutskever I, and Hinton G E (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12), 1: 1097–1105.
21. Deng J, Dong W, Socher R, Li L J, Li K, and Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009).
22. He K, Zhang X, Ren S, and Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV), 1026–1034.
23. Ioffe S, and Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML'15), 37: 448–456.
24. Hu J, Shen L, and Sun G (2017) Squeeze-and-excitation networks. Computer Vision and Pattern Recognition (in press).
25. Razavian A S, Azizpour H, Sullivan J, and Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'14), 512–519.
26. Xiao T, Xu Y, Yang K, Zhang J, Peng Y, and Zhang Z (2015) The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 842–850.
27. Zhang M M, Choi J, Daniilidis K, Wolf M T, and Kanan C (2015) VAIS: a dataset for recognizing maritime imagery in the visible and infrared spectrums. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 512–519.
28. Liu Y, Racah E, Prabhat, Correa J, Khosrowshahi A, Lavers D, Kunkel K, Wehner M F, and Collins W D (2016) Application of deep convolutional neural networks for detecting extreme weather in climate datasets. In: International Conference on Advances in Big Data Analytics, 81–88.

© The Author(s) 2018. Published by Oxford University Press on behalf of The Japanese Society of Microscopy. All rights reserved.
TI - Twinned/untwinned catalytic gold nanoparticles identified by applying a convolutional neural network to their Hough transformed Z-contrast images
JF - Microscopy
DO - 10.1093/jmicro/dfy036
DA - 2018-12-01
UR - https://www.deepdyve.com/lp/oxford-university-press/twinned-untwinned-catalytic-gold-nanoparticles-identified-by-applying-dQA7U9q7hi
SP - 321
VL - 67
IS - 6
DP - DeepDyve
ER -