1. Introduction
With the development of society, large-scale workpieces have been widely utilized in aerospace, shipbuilding, automotive, and other fields. To meet increasingly strict application requirements, higher demands have been placed on the detection and assembly processes. The workpiece surface, as the object of detection, profoundly influences these processes owing to differences in surface quality [1–3]. Typically, the finished workpiece surface deviates from its design requirements due to machining errors and other factors. During service, the surface also suffers damage from factors such as fatigue, wear, and dust. In these circumstances, the working accuracy and efficiency of the workpiece decrease, shortening its service life and greatly increasing the probability of safety risks associated with the use of defective workpieces [4]. Therefore, defect detection of the workpiece surface, a crucial step in production [5, 6], can identify surface quality issues in time and largely avoid the risk of accidents during usage. Traditional defect detection of the workpiece surface is typically carried out by quality inspection personnel. During prolonged inspection, inspectors become exhausted and lose concentration under pressure, leading to a decline in human performance; consequently, detection accuracy and efficiency cannot meet the required standards. Moreover, manual detection is costly, and significant variations in the skills and experience of quality inspectors can bias detection results. To avoid the low accuracy, high cost, and inefficiency of manual detection, machine-based detection methods have been introduced into the detection process.
Since the first coordinate measuring machine was developed by the Ferranti company in Scotland, coordinate measuring machines have been widely used to acquire three-dimensional metrological shape information and surface quality information of physical geometric features [7, 8]. These machines have become increasingly important in industrial detection, replacing part of manual inspection and greatly improving detection efficiency and accuracy [9, 10]. In recent years, to achieve high precision and accuracy in surface defect detection, many experts and scholars have conducted extensive research on workpiece surface defect detection. Currently, the accuracy of contact measurement can reach the micron level, with a probe deviation of less than 10 nm [11]. Although the accuracy is high, the use of a contact probe during measurement can cause secondary damage to the workpiece surface, leading to detection errors and a decrease in accuracy. Additionally, specialized fixtures are needed for the detection of certain complex surfaces, and the design and manufacturing of these fixtures require extra time and expense [12]. Furthermore, the contact probe cannot capture the overall workpiece shape; it can only measure specific reference areas. Compared with traditional coordinate measuring machines, non-contact optical measurement systems are non-destructive, highly sensitive, and highly precise. These systems can quickly obtain accurate and dense spatial surface geometry points without touching the workpiece [13], which are used to describe the geometric features of the measured surface [14, 15]. Feng et al.
[16] used a handheld 3D laser scanner to obtain the three-dimensional geometric defects of an I-beam cross-section specimen (represented in point cloud form) and proposed a new method to establish a digital geometric model of the specimen based on the point cloud data, with a scanning accuracy of up to 0.5 mm. On a traditional three-coordinate measuring machine platform, Chao et al. [17] replaced the contact measuring head with a non-contact laser sensor to achieve 3D scanning; the measurement errors were all less than 0.03 mm. Wang et al. [18] developed a handheld 3D laser scanning system for real-time on-site measurement of large workpieces. The measurement results indicated that this system could overcome occlusion and range limitations, obtaining data for the entire surface of the workpiece. However, for the measurement methods and systems above, the acquired surface information and initial geometric defects still require inspection personnel to identify and classify defects in order to select qualified workpieces. With the development of computer technology, traditional detection is being integrated with computing, and new methods are being proposed for detecting workpiece surface defects. He et al. [19] proposed an end-to-end CNN model for detecting surface defects in steel strips. The results indicated that the CNN model could be used in real-time detection, yet its recognition of edge defects was less satisfactory. Zong et al. [20] introduced an intelligent automatic three-dimensional defect detection system, which transformed defect detection from simple image detection to 3D detection. The results show that the system can be used for quantitative three-dimensional estimation and feature classification of material surface defects; however, it cannot achieve real-time defect detection. Xu et al.
[21] proposed a deep learning defect detection method based on an improved YOLOv5 model for metal surface defect detection, which improved detection efficiency. Shen et al. [22] proposed an improved method based on the GA-Faster-RCNN network to raise the detection accuracy for small objects on flexible printed circuit (FPC) surfaces. He et al. [23] proposed a new element feature fusion network (EFF Net) for online defect detection of display panels. Although there has been progress in workpiece surface defect detection, the detection methods mentioned above obtain images of the physical workpiece surface with an industrial camera. This approach is easily affected by environmental lighting conditions, resulting in unclear images that fail to reflect the true surface state and thereby impacting the accuracy of detection [24]. Besides, owing to the complexity of workpiece geometry, traditional 2D imaging is difficult to apply to the acquisition of complex workpiece surfaces. The characteristics of digital twin models are summarized as interoperability, scalability, and high fidelity [25]. These characteristics ensure that the model can accurately reflect the true state of the physical workpiece surface and complete the detection task on its behalf. This detection process transfers the object of inspection from the physical workpiece to the digital twin model, which improves the efficiency and accuracy of the entire cycle from design to production to detection, enhancing overall management. Therefore, a new surface defect detection system is proposed by combining a three-dimensional automated scanning system with an improved YOLOv5 defect detection model. High-fidelity, high-precision digital twin models are used for the first time to detect surface defects instead of physical workpieces.
In this way, the workpiece surface can be scanned and reconstructed quickly, so that the characteristic surface can be obtained faster through virtual camera technology. Finally, the improved YOLOv5 model realizes intelligent identification of surface defects, locating and classifying them.
2. The process of surface defect detection
This experiment and research were carried out in the Key Laboratory of Fundamental Science for National Defense of Shenyang Aerospace University in China. The researchers involved were all laboratory workers, so the equipment required for the experiments was licensed. The process for identifying workpiece surface defects is illustrated in Fig 1. Firstly, a non-contact three-dimensional automated scanning detection system is employed to inspect the workpiece surface: the geometric coordinates of the physical surface are obtained through laser scanning. The acquired point cloud data undergo preprocessing operations, including noise filtering, smoothing, and data simplification. The processed point cloud data are used for reconstruction, yielding a digital twin model of the physical workpiece. Subsequently, a virtual camera is used to capture surface information images of the digital twin model. Finally, the surface defects of the digital twin model are identified and classified using a defect detection model. Based on the quality requirements of the workpiece surface, qualified workpieces are screened out.
Fig 1. Identification process of workpiece surface defects. https://doi.org/10.1371/journal.pone.0302419.g001
2.1 The surface information acquisition
The three-dimensional automated scanning system is shown in Fig 2. A robotic arm equipped with scanner equipment serves as the motion execution unit and is driven by the control system to obtain surface point cloud data.
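The noise-filtering step of the preprocessing named above can be illustrated with a minimal statistical outlier-removal sketch. This is a common point-cloud filter, not the scanning system's own post-processing software (which is not described in detail); the function name and the parameters `k` and `std_ratio` are illustrative assumptions.

```python
import numpy as np

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours
    exceeds (global mean + std_ratio * global std) of that statistic."""
    # Full pairwise distance matrix; fine for small clouds,
    # use a KD-tree for real scans with millions of points.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    knn_mean = d[:, 1:k + 1].mean(axis=1)      # column 0 is the self-distance
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= thresh]

# A synthetic flat patch of 100 points plus one far-away noise point.
rng = np.random.default_rng(0)
cloud = rng.uniform(0, 1, size=(100, 3))
cloud = np.vstack([cloud, [[50.0, 50.0, 50.0]]])
filtered = remove_statistical_outliers(cloud)
print(filtered.shape[0])  # 100: the isolated outlier is removed
```

Smoothing and data simplification (e.g. voxel downsampling) would follow the same pattern of per-point neighbourhood statistics before surface reconstruction.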
At the same time, the rotating table collaborates with the robotic arm to complete the scanning task, and the on-site computer is used to observe the scanning process in real time. The system is also equipped with dedicated software for point cloud post-processing and reconstruction of the digital twin model. The accuracy of this scanning system can reach 0.02 mm, meeting the conditions for using a digital twin model instead of the physical workpiece. Moreover, this process requires no manual intervention, greatly enhancing the efficiency of data collection.
Fig 2. The three-dimensional automated scanning system. https://doi.org/10.1371/journal.pone.0302419.g002
Due to various disturbance factors during the acquisition of comprehensive point cloud data of the workpiece surface, the acquired point cloud will contain holes, outliers, noise points, and similar artifacts, resulting in significant deviations from the true workpiece surface. Therefore, it is necessary to denoise and smooth the point cloud data. The processed point cloud is then reconstructed into a digital twin model that can replace the physical workpiece for detection. Surface image information of the digital twin model is captured using a virtual camera in Blender. Fig 3 illustrates a schematic diagram of capturing photos via a virtual camera. Virtual camera photography is an essential technique in computer-generated imagery that simulates the workings of a real camera. By adjusting the virtual camera's parameters, position, and orientation, the camera is strategically aimed at the target area, achieving the desired surface perspective.
Fig 3.
Schematic diagram of surface information obtained by means of a virtual camera. https://doi.org/10.1371/journal.pone.0302419.g003
2.2 The principle and structure of YOLOv5
In recent years, Convolutional Neural Network (CNN) models have significantly improved the performance of visual tasks related to workpiece detection, making it possible to handle complex datasets with different inputs [26]. Simultaneously, major breakthroughs in the CNN field have enhanced inference speed and accuracy, making real-time object detection and recognition feasible. The core of the defect detection process is the application of the YOLOv5 model. YOLOv5 is a state-of-the-art model for real-time detection on workpiece surfaces, which incorporates the essence of ResNet, DenseNet, and the Feature Pyramid Network (FPN) [27]. The schematic representation of the identification process is shown in Fig 4. The principle of defect detection in the YOLOv5 model involves dividing the input image into an S×S grid, where the detected target areas fall within the grid. Each grid cell is responsible for detecting defects that fall into its area: if the coordinates of the center of a defect target fall within a cell, that cell is responsible for recognizing the defect target, continuing until the entire defect is identified.
Fig 4. The schematic representation of the identification process for YOLOv5. https://doi.org/10.1371/journal.pone.0302419.g004
The overall identification principle of the YOLOv5 model has been outlined above, but internally it possesses a complex network structure. The network structure of YOLOv5 is depicted in Fig 5, consisting of four main parts: the input end, the backbone network, the neck network, and the output end.
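The grid-responsibility rule above can be sketched in a few lines. This is a toy illustration only; the function name and the single grid size `S=20` are assumptions for the example (YOLOv5 in fact predicts at several scales, e.g. 80×80, 40×40, and 20×20 cells for a 640×640 input).

```python
def grid_cell(cx, cy, img_w, img_h, S):
    """Return the (row, col) of the SxS grid cell responsible for a
    defect whose centre is at pixel (cx, cy) in an img_w x img_h image."""
    col = min(int(cx / img_w * S), S - 1)  # clamp so cx == img_w stays in range
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# A defect centred at (320, 96) in a 640x640 image, on a 20x20 grid.
print(grid_cell(320, 96, 640, 640, S=20))  # (3, 10)
```

The cell at that (row, col) is the one whose predictions are matched against this defect during training.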
The YOLOv5 network model allows for flexible configuration of different complexity levels by modifying depth and width parameters, thereby changing the size and number of convolutional kernels and achieving end-to-end detection with a single CNN pass. Firstly, the backbone network is primarily responsible for extracting defect feature information from the input image. Then, the neck network fuses these features. Finally, the detection head utilizes the defect feature information to identify the surface defects of the workpiece.
Fig 5. The network structure of YOLOv5. https://doi.org/10.1371/journal.pone.0302419.g005
At the input stage, the surface information images of the digital twin models undergo preprocessing, namely resizing to 640×640. Additionally, the YOLOv5 model employs Mosaic data augmentation by randomly scaling, cropping, and arranging surface information images of the workpieces during training. This enriches the detection dataset, enhances training speed, and improves the ability to detect minor defects. The backbone network is responsible for extracting defect features from the surface information images of the workpieces, comprising the Focus module, CBL module, CSP module, and SPP module as shown in Fig 6. The most fundamental module is the CBL module, which is composed of convolution (Conv), batch normalization (BN), and ReLU activation operations. The CBL module is also applied in the neck network of the YOLOv5 model to realize cross-stage fusion of different feature layers.
Fig 6. The backbone network of YOLOv5 model: (a) Focus module; (b) Focus operation; (c) SPP module; (d) CSP module.
https://doi.org/10.1371/journal.pone.0302419.g006
As shown in Fig 6(A), the Focus module first performs a slicing operation on the input surface image with defects, then concatenates the results, and finally obtains the output through the CBL module. The principle of the slicing operation is illustrated in Fig 6(B): taking a value at every other pixel of the input image yields four subsampled images. The Focus structure prevents the image from losing information in the subsequent subsampling process, retaining more complete information for defect feature extraction. The SPP module is a spatial pyramid pooling module designed to enlarge the receptive field. In the YOLOv5 model, expanding the receptive field helps the model better capture image information, which improves its ability to detect small defects, as shown in Fig 6(C). Initially, the input feature map passes through a CBL module to reduce the channel count by half. Subsequently, three different kernel sizes are employed for max-pooling subsampling. The results of these three pooling operations are then concatenated with the input feature map along the channel dimension. The merged feature map has twice the channel count of the original, allowing for a significant expansion of the receptive field at relatively low computational cost. The CSP module plays a crucial role in effectively improving the model's accuracy, as shown in Fig 6(D). CSP1_X divides the input into two parts: one part passes through a CBL module, followed by X residual structures and a convolution; the other part undergoes convolution directly. The results of the two parts are then tensor-concatenated, followed by batch normalization (BN) and activation operations; finally, another CBL operation is applied. The CSP2_X structure is similar to CSP1_X, with the only difference being the replacement of the residual Bottleneck block with two CBL modules.
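The Focus slicing operation described above (every-other-pixel sampling in four phase offsets, then channel concatenation) can be sketched with NumPy. This is an illustrative stand-in for the tensor operation in the actual network; the function name is hypothetical.

```python
import numpy as np

def focus_slice(img):
    """Focus slicing: sample every other pixel in four phase offsets and
    stack the four subimages along the channel axis.
    (H, W, C) -> (H/2, W/2, 4C), with no pixel values discarded."""
    return np.concatenate([
        img[0::2, 0::2],   # even rows, even cols
        img[1::2, 0::2],   # odd rows,  even cols
        img[0::2, 1::2],   # even rows, odd cols
        img[1::2, 1::2],   # odd rows,  odd cols
    ], axis=-1)

img = np.arange(4 * 4 * 3).reshape(4, 4, 3)   # toy 4x4 RGB image
out = focus_slice(img)
print(out.shape)  # (2, 2, 12): spatial resolution halved, channels x4
```

Because `out.size == img.size`, the halved spatial resolution loses no information, which is exactly the property the text attributes to the Focus structure.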
The Bottleneck utilizes two CBL modules to first reduce and then expand the channel count, aiming to extract feature information. The neck network consists of the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN), as shown in Fig 7. The FPN performs top-down upsampling, transmitting strong semantic features from the upper layers of the pyramid downwards; the PAN performs bottom-up subsampling, propagating strong localization features from the lower layers upwards. This complementary structure of FPN and PAN addresses the issue of multi-scale feature fusion.
Fig 7. FPN structure and PAN structure of YOLOv5 model. https://doi.org/10.1371/journal.pone.0302419.g007
At the output stage, the generated feature maps are processed by additional convolutional layers to predict the positions and categories of the objects. Multiple prediction boxes are generated for each object; to avoid redundant predictions, Non-Maximum Suppression (NMS) is applied to retain the prediction box with the highest confidence score, thus completing the object detection process. The loss function measures the proximity between the network's predicted output and the desired output: the smaller the loss value, the closer the predicted output is to the expected output. The loss function in YOLOv5 consists of three parts: classification loss, prediction box confidence loss, and prediction box localization loss. Both the classification loss and the confidence loss are binary cross-entropy losses, while the localization loss is calculated using the Complete Intersection over Union (CIOU). To address the non-differentiability of the IOU loss when the ground truth box and predicted box do not intersect, the YOLOv5 algorithm uses the CIOU loss as the bounding box regression loss function.
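As a concrete numeric sketch of the CIOU loss just introduced (boxes in corner format `(x1, y1, x2, y2)`; a minimal reference version, not the authors' or Ultralytics' implementation):

```python
import math

def ciou_loss(box_p, box_g):
    """CIOU loss = 1 - IoU + rho^2/c^2 + alpha*v for two axis-aligned boxes."""
    # Intersection over union
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # Squared centre distance rho^2 and enclosing-box diagonal c^2
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxg, cyg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency v and its trade-off weight alpha
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

# Identical boxes give zero loss; disjoint boxes still yield a finite,
# differentiable penalty via the centre-distance term.
print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
```

Note that for two disjoint boxes the IoU term is zero, yet `rho2 / c2` still depends on the box positions, which is precisely why CIOU remains a useful regression signal where plain IoU does not.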
The formulas for the CIOU loss function are as follows:
\( \mathrm{CIOU} = \mathrm{IOU} - \frac{\rho^2(b, b_{gt})}{c^2} - \alpha v \) (1)
\( L_{CIOU} = 1 - \mathrm{CIOU} \) (2)
\( \alpha = \frac{v}{(1 - \mathrm{IOU}) + v} \) (3)
\( v = \frac{4}{\pi^2}\left(\arctan\frac{w_1}{h_1} - \arctan\frac{w_p}{h_p}\right)^2 \) (4)
Here, ρ represents the Euclidean distance between the center points of the predicted box and the ground truth box, and c represents the diagonal length of the smallest box enclosing the two. α serves as a balancing parameter for the position and size errors of the predicted box, and v acts as a correction factor that reduces the overlap differences between large and small objects. w1 and h1 represent the width and height of the ground truth box, while wp and hp represent the width and height of the predicted box. In addition, the YOLOv5 algorithm adopts cross-grid matching rules to increase the number of positive anchor boxes, thereby speeding up model convergence. For each detection branch, the aspect ratio between the predicted box and the anchor box of the current layer is calculated. When this ratio exceeds a certain threshold, the match between the predicted box and the anchor box is considered insufficient, and the corresponding region is treated as background in the predictions of that branch. For the remaining predicted boxes, the grid cell where the center point lies is determined, and the two nearest cells are also considered responsible for predicting that bounding box. The anchor matching strategy in YOLOv5 is illustrated in Fig 8.
Fig 8. Anchor matching strategy in YOLOv5. https://doi.org/10.1371/journal.pone.0302419.g008
During surface defect detection, both precision and recall are considered. Precision refers to the ratio between the number of defects of a given class correctly classified as positive samples and the total number of defects predicted as positive samples of that class by the model. The calculation of precision (P) is shown in Eq (5).
Recall (R) refers to the ratio between the number of defects of a given class correctly classified as positive samples and the total number of defects of that class that are actually positive samples. The calculation of recall is shown in Eq (6).
\( P = \frac{TP}{TP + FP} \) (5)
\( R = \frac{TP}{TP + FN} \) (6)
In these formulas, TP represents the number of true positives, i.e., defective targets correctly identified as such. FP represents the number of false positives, where background is mistakenly identified as a defective target. FN represents the number of false negatives, where defective targets are not recognized. By considering the relationship between P and R, a precision-recall curve (P-R curve) can be plotted. The average precision (AP) of a class represents the area enclosed by its P-R curve and the axes; its calculation is shown in Eq (7). The mean average precision (mAP) is obtained by calculating the AP for each class and taking the mean, and is commonly used to evaluate the performance of a detection model: a higher mAP indicates better performance. The calculation of mAP is shown in Eq (8).
\( AP = \int_0^1 P(R)\,dR \) (7)
\( mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \) (8)
2.3 Improved YOLOv5 model
In order to improve the defect detection accuracy of the YOLOv5 model, enable the model to focus on the target area during training, and suppress the expression of unimportant information, the Convolutional Block Attention Module (CBAM) proposed by Woo et al. [28] was introduced into the YOLOv5 model. CBAM is an attention mechanism module used to enhance the performance of convolutional neural networks (CNNs).
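Before turning to the improved model, the evaluation metrics of Eqs (5)–(8) can be sketched numerically. This is a simplified illustration: the AP integral is approximated by a piecewise-constant sum over sampled (R, P) points, and the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Eq (5)/(6): P = TP/(TP+FP), R = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions):
    """Eq (7): area under a piecewise-constant P-R curve.
    Assumes recalls are sorted ascending; a simplified approximation."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# One defect class: 6 true positives, 2 false positives, 2 missed defects.
p, r = precision_recall(tp=6, fp=2, fn=2)
print(p, r)  # 0.75 0.75
# Eq (8): mAP would simply average per-class AP values, e.g. sum(aps)/len(aps).
```

In practice these counts are accumulated per confidence threshold to trace out the full P-R curve, and mAP averages the resulting per-class AP values.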
The main goal of the CBAM mechanism is to improve the perceptual ability of the model by introducing channel attention and spatial attention into the CNN, thereby improving performance without increasing network complexity. The principle of the CBAM module is shown in Fig 9.
Fig 9. The diagram of CBAM mechanism module. https://doi.org/10.1371/journal.pone.0302419.g009
The CBAM module is divided into a channel attention module and a spatial attention module. First, the input feature is passed to the channel attention module, which outputs a corresponding attention map; the input feature is multiplied by this attention map, and the result is passed through the spatial attention module, where the same operation is performed to obtain the final enhanced output features. By combining channel attention and spatial attention, the CBAM module can capture correlations between features in different dimensions, improving the feature expression ability of the convolutional neural network and thereby the performance of image recognition tasks. The BiFPN module was proposed by Tan et al. [29]; it enhances the ability to process higher-level feature fusion along the path. Through weighted feature fusion, each bidirectional path (top-down and bottom-up) is treated as one feature network layer. Fig 10(A) and 10(B) show the FPN and PAN feature fusion structures used in the original YOLOv5 network: FPN performs multi-scale feature fusion in a top-down manner, and PAN adds a bottom-up path on top of FPN. Fig 10(C) shows the BiFPN module, the feature fusion part of the EfficientDet network. Five effective feature layers P3-P7 are received from the backbone feature extraction network, and upsampling and downsampling feature fusion is performed on these layers in sequence. Each node is assigned a weight to balance features of different scales, which improves detection accuracy.
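The per-node weighting just described can be sketched using the "fast normalized fusion" form from EfficientDet's BiFPN. The weights here are illustrative constants standing in for learned parameters, and the function name is hypothetical.

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """BiFPN fast normalized fusion: O = sum_i(w_i * I_i) / (sum_j w_j + eps),
    with weights kept non-negative via ReLU before normalization."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU on raw weights
    w = w / (w.sum() + eps)                                # normalize to ~1
    return sum(wi * f for wi, f in zip(w, features))

# Two same-shape feature maps fused at one BiFPN node.
f_top_down = np.full((4, 4), 2.0)   # upsampled higher-level feature
f_lateral = np.full((4, 4), 4.0)    # same-scale backbone feature
fused = weighted_fusion([f_top_down, f_lateral], weights=[1.0, 3.0])
print(fused[0, 0])  # ~3.5, i.e. roughly 0.25*2 + 0.75*4
```

During training the raw weights are learned per node, letting the network decide how much each input scale contributes, which is the source of the accuracy gain claimed for BiFPN over the unweighted FPN+PAN path.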
Fig 10. The FPN, PAN, and BiFPN network structure: (a) FPN; (b) PAN; (c) BiFPN. https://doi.org/10.1371/journal.pone.0302419.g010
The CBAM module is added to the head of the original YOLOv5 model. This targets workpiece surfaces with dense defects and multiple target features: by integrating the CBAM module, the improved YOLOv5 model pays more attention to target areas containing important information, suppresses irrelevant information, and improves the overall accuracy of target detection. At the same time, BiFPN performs better than the original FPN+PAN structure in multi-scale feature fusion. Therefore, the original neck module is replaced with the BiFPN network module, improving the accuracy of position estimation and category classification. The improved network structure is shown in Fig 11.
Fig 11. The network structure of improved YOLOv5 model. https://doi.org/10.1371/journal.pone.0302419.g011
Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 2. The three-dimensional automated scanning system. https://doi.org/10.1371/journal.pone.0302419.g002 Due to various disturbance factors during the process of acquiring comprehensive point cloud data of the workpiece surface, the acquired point cloud data will be disturbed and generate with blank points, outliers, noise points, etc, resulting in significant deviations from the true workpiece surface. Therefore, it is necessary to perform denoising and smoothing on the point cloud data. The processed point cloud data is then reconstructed to be a digital twin model that can replace the physical workpiece for detection. Surface image information of the digital twin model is captured using a virtual camera by Blender software. Fig 3 illustrates a schematic diagram of capturing photos via a virtual camera. Virtual camera photography is an essential technique in computer-generated images that simulates the workings of a real camera. A virtual camera is applied to capture surface image information. By adjusting virtual camera parameters, positioning, and orientation, the virtual camera is strategically used in the capture of target area, achieving the desired surface perspective. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 3. Schematic diagram of surface information obtained by means of a virtual camera. https://doi.org/10.1371/journal.pone.0302419.g003 2.2 The principle and structure of YOLOv5 In recent years, Convolutional Neural Network (CNN) models have significantly improved the performance of visual tasks related to workpiece detection, providing possibilities for handling complex datasets with different inputs [26]. Simultaneously, major breakthroughs in the CNN field have enhanced the inference speed and accuracy, making real-time object detection and recognition feasible. The core of the defect detection is the application of the YOLOv5 model. 
The YOLOv5 model is the state-of-the-art detection model for the real-time detection of workpiece surface, which incorporats the essence of ResNet, DenseNet, and Feature Pyramid Network (FPN) [27]. The schematic representation of the identification process is shown in Fig 4. The principle of defect detection in the YOLOv5 model involves dividing the input image into an S×S grid, where the detected target areas fall within the grid. Each grid is responsible for detecting defects that fall into its area. If the coordinates of the center of a defect target fall within a grid, that grid is responsible for recognizing the defect target, continuing until the entire defect is identified. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 4. The schematic representation of the identification process for YOLOv5. https://doi.org/10.1371/journal.pone.0302419.g004 The overall identification principle of the YOLOv5 model has been outlined in the previous text, but internally it also possesses a complex network structure. The network structure of YOLOv5 is depicted in Fig 5, consisting of four main parts: the input end, the backbone network, the neck network, and the output end. The YOLOv5 network model allows for flexible configuration of different complexity levels by modifying depth and width parameters, thereby changing the size and quantity of convolutional kernels, and achieving end-to-end detection with a single CNN operation. Firstly, the backbone network is primarily responsible for the task of extracting feature information of defects from the input image. Then, the neck network is involved in the fusion. Finally, the detection part utilizes this defect feature information to identify the surface defect of the workpiece. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 5. The network structure of YOLOv5. 
https://doi.org/10.1371/journal.pone.0302419.g005 At the input stage, the surface information images of the digital twin models undergo preprocessing, namely re-sizing to a size of 640×640. Additionally, YOLOv5 model employs Mosaic data augmentation by randomly scaling, cropping, and arranging surface information images of the workpieces during training. This enriches the detection dataset, enhances training speed, and improves the ability to detect minor defects. The backbone network is responsible for extracting defect features from the surface information images of the workpieces, comprising the Focus module, CBL module, CSP module, and SPP module as shown in Fig 6. CBL module is the most basic module, which is mainly composed of Conv, BN and ReLU activation function operations. The most fundamental module is the CBL module, which is primarily operated by convolution (Conv), batch normalization (BN), and the ReLU activation function. At the same time, CBL module is also applied to the neck network of YOLOv5 model to realize the cross-stage level fusion of different feature layers. Download: PPT PowerPoint slide PNG larger image TIFF original image Fig 6. The backbone network of YOLOv5 model: (a) Focus module; (b) Focus operation; (c) SPP module; (d) CSP module. https://doi.org/10.1371/journal.pone.0302419.g006 As shown in Fig 6(A), the Focus module firstly performs a slicing operation on the input surface image with defects, then concatenates the results, and finally obtains the output data through the CBL module. The principle of the slicing operation is illustrated in Fig 6(B), which takes a value for every other pixel of the input image, getting four subsampled images. The Focus structure prevents the image from losing information in the subsequent subsampling process, retaining more complete information for subsequent defect feature extraction. The SPP module is a spatial pyramid pooling module designed to enlarge the receptive field. 
In the YOLOv5 model, expanding the receptive field helps the model capture image information better and improves its ability to detect small defects, as shown in Fig 6(C). Initially, the input feature map passes through a CBL module that halves its channel count. Subsequently, max-pooling subsampling is performed with three different kernel sizes, and the results of the three pooling operations are concatenated with the input feature map along the channel dimension. The merged feature map has twice the channel count of the original, allowing a significant expansion of the receptive field at relatively low computational cost. The CSP module plays a crucial role in improving the model’s accuracy, as shown in Fig 6(D). CSP1_X divides the input into two parts: one part passes through a CBL module, followed by several residual structures and a convolution; the other part undergoes a convolution directly. The results of the two parts are tensor-concatenated, followed by batch normalization (BN) and an activation function; finally, another CBL operation is applied. The CSP2_X structure is similar to CSP1_X, the only difference being that the residual block (Bottleneck) is replaced with two CBL modules. The Bottleneck uses two CBL modules to first reduce and then expand the channel count in order to extract feature information. The neck network consists of the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN), as shown in Fig 7. The FPN performs top-down upsampling, transmitting strong semantic features from the upper layers of the pyramid downwards. The PAN performs bottom-up subsampling, propagating strong localization features from the lower layers of the pyramid upwards. The complementary structure of FPN and PAN addresses the issue of multi-scale feature fusion.
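The SPP fusion described earlier in this section (Fig 6(C)) can be sketched in numpy. The kernel sizes 5, 9, and 13 follow a common YOLOv5 configuration and are an assumption here; the feature map entering `spp` is the one already channel-halved by the preceding CBL, so the concatenation quadruples its channels, i.e. doubles the pre-halving count.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# A sketch of SPP (Fig 6(c)): three stride-1 max-poolings with growing
# kernels are concatenated with the input along the channel axis.
def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on an (H, W, C) map."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    win = sliding_window_view(xp, (k, k), axis=(0, 1))  # (H, W, C, k, k)
    return win.max(axis=(-2, -1))

def spp(x, kernels=(5, 9, 13)):   # kernel sizes are an assumed config
    pooled = [max_pool_same(x, k) for k in kernels]
    return np.concatenate([x] + pooled, axis=-1)

feat = np.random.rand(20, 20, 128)   # feature map after the halving CBL
print(spp(feat).shape)               # (20, 20, 512)
```

The spatial size is unchanged while each larger kernel summarizes a wider neighborhood, which is how the receptive field grows at low cost.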
Fig 7. FPN structure and PAN structure of YOLOv5 model. https://doi.org/10.1371/journal.pone.0302419.g007 At the output stage, the generated feature maps are processed by additional convolutional layers to predict the positions and categories of the objects. Multiple prediction boxes are generated; to avoid redundant predictions, Non-Maximum Suppression (NMS) is applied to retain the prediction box with the highest confidence score, completing the object detection process. The loss function measures the proximity between the network’s predicted output and the desired output: the smaller the loss value, the closer the prediction is to the expected output. The loss function in YOLOv5 consists of three parts: classification loss, prediction-box confidence loss, and prediction-box localization loss. Both the classification loss and the confidence loss are binary cross-entropy losses, while the localization loss is calculated using the Complete Intersection over Union (CIOU). Because the IOU loss provides no useful gradient when the ground truth box and the predicted box do not intersect, the YOLOv5 algorithm uses the CIOU loss as the bounding box regression loss function. The CIOU loss is defined as follows:

\[ IoU = \frac{|B \cap B_{gt}|}{|B \cup B_{gt}|} \tag{1} \]

\[ L_{CIOU} = 1 - IoU + \frac{\rho^2(b, b_{gt})}{c^2} + \alpha v \tag{2} \]

\[ \alpha = \frac{v}{(1 - IoU) + v} \tag{3} \]

\[ v = \frac{4}{\pi^2}\left(\arctan\frac{w_1}{h_1} - \arctan\frac{w_p}{h_p}\right)^2 \tag{4} \]

Here, ρ represents the Euclidean distance between the center points of the predicted box and the ground truth box. c is the diagonal length of the smallest box enclosing both boxes, which normalizes the center distance. α serves as a balancing parameter weighting the position and size errors of the predicted box. v acts as a correction factor that reduces the aspect-ratio differences between large and small objects. w1 and h1 represent the width and height of the ground truth box, respectively, while wp and hp represent the width and height of the predicted box, respectively.
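As a numeric check, the CIOU loss described above can be computed directly from two boxes; the corner-coordinate (x1, y1, x2, y2) box format used below is an assumption for illustration.

```python
import math

# A numeric sketch of the CIOU loss.  Boxes use (x1, y1, x2, y2) corner
# coordinates; this format is an assumption for illustration.
def ciou_loss(gt, pred):
    gx1, gy1, gx2, gy2 = gt
    px1, py1, px2, py2 = pred
    # IoU of the two boxes
    iw = max(0.0, min(gx2, px2) - max(gx1, px1))
    ih = max(0.0, min(gy2, py2) - max(gy1, py1))
    inter = iw * ih
    area_g = (gx2 - gx1) * (gy2 - gy1)
    area_p = (px2 - px1) * (py2 - py1)
    iou = inter / (area_g + area_p - inter)
    # squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((gx1 + gx2) / 2 - (px1 + px2) / 2) ** 2 + \
           ((gy1 + gy2) / 2 - (py1 + py2) / 2) ** 2
    c2 = (max(gx2, px2) - min(gx1, px1)) ** 2 + \
         (max(gy2, py2) - min(gy1, py1)) ** 2
    # aspect-ratio consistency v and trade-off parameter alpha
    v = 4 / math.pi ** 2 * (math.atan((gx2 - gx1) / (gy2 - gy1))
                            - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)   # epsilon guards the identical case
    return 1 - iou + rho2 / c2 + alpha * v

# Identical boxes give zero loss; offset boxes are penalized.
print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
```

Unlike a plain IoU loss, the center-distance term still produces a nonzero penalty (and gradient) when the two boxes do not overlap at all.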
In addition, the YOLOv5 algorithm adopts cross-grid matching rules to increase the number of positive anchor boxes, speeding up model convergence. For each detection branch, the aspect ratio between the predicted box and the anchor box of the current layer is calculated. When this ratio exceeds a certain threshold, the match between the predicted box and the anchor box is insufficient, and the corresponding region is treated as background in the predictions of that branch. For the remaining predicted boxes, the grid cell containing the center point is determined, and the two nearest neighboring cells are also made responsible for predicting that bounding box. The anchor matching strategy in YOLOv5 is illustrated in Fig 8. Fig 8. Anchor matching strategy in YOLOv5. https://doi.org/10.1371/journal.pone.0302419.g008 During surface defect detection, both precision and recall are considered. Precision (P) is the ratio of the number of defects of a given class correctly classified as positive samples to the total number of defects predicted as positive samples of that class by the model, as shown in Eq (5). Recall (R) is the ratio of the number of defects of a given class correctly classified as positive samples to the total number of actual positive samples of that class, as shown in Eq (6).

\[ P = \frac{TP}{TP + FP} \tag{5} \]

\[ R = \frac{TP}{TP + FN} \tag{6} \]

In these formulas, TP represents the number of true positive samples, which are correctly identified as positive; in the context of this article, these are correctly identified defective targets. FP represents the number of false positive samples, where the background is mistakenly identified as a defective target.
FN represents the number of false negative samples, where defective targets are not recognized. From the relationship between P and R, a precision-recall curve (P-R curve) can be plotted. The average precision (AP) of a class is the area enclosed by its P-R curve and the axes, as shown in Eq (7). The mean average precision (mAP) is the mean of the AP values over all classes and is commonly used to evaluate the performance of a detection model: a higher mAP indicates better performance. The calculation of mAP is shown in Eq (8).

\[ AP = \int_0^1 P(R)\, dR \tag{7} \]

\[ mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \tag{8} \]

2.3 Improved YOLOv5 model In order to improve the defect detection accuracy of the YOLOv5 model, enable it to focus on the target area during training, and suppress the expression of unimportant information, the Convolutional Block Attention Module (CBAM) proposed by Woo et al. [28] is introduced into the YOLOv5 model. CBAM is an attention mechanism module used to enhance the performance of convolutional neural networks (CNNs). Its main goal is to improve the perceptual ability of the model by introducing channel attention and spatial attention into the CNN, thereby improving performance without increasing network complexity. The principle of the CBAM module is shown in Fig 9. Fig 9. The diagram of CBAM mechanism module. https://doi.org/10.1371/journal.pone.0302419.g009 The CBAM module is divided into a channel attention module and a spatial attention module. First, the input feature is passed through the channel attention module, which outputs a channel attention map; the input feature is multiplied by this attention map, and the result is passed to the spatial attention module.
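The channel-then-spatial data flow of CBAM can be sketched with a simplified numpy example. The shared MLP and the 7×7 convolution of the real module are replaced by identity mappings here, so only the pooling and weighting flow is illustrated, not a faithful implementation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A simplified sketch of CBAM (Fig 9).  The learned layers of the real
# module are omitted; only the avg/max pooling and reweighting flow remain.
def channel_attention(x):
    """x: (H, W, C) -> (C,) per-channel weights in (0, 1)."""
    avg = x.mean(axis=(0, 1))          # global average pooling
    mx = x.max(axis=(0, 1))            # global max pooling
    return sigmoid(avg + mx)

def spatial_attention(x):
    """x: (H, W, C) -> (H, W, 1) per-pixel weights in (0, 1)."""
    avg = x.mean(axis=-1, keepdims=True)   # channel-wise average pooling
    mx = x.max(axis=-1, keepdims=True)     # channel-wise max pooling
    return sigmoid(avg + mx)

def cbam(x):
    x = x * channel_attention(x)       # channel attention first ...
    return x * spatial_attention(x)    # ... then spatial attention

feat = np.random.rand(8, 8, 16)
print(cbam(feat).shape)  # (8, 8, 16)
```

The output keeps the input shape; the two attention maps simply rescale channels and pixels, which is how CBAM emphasizes informative features without changing the network topology.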
In the spatial attention module, the same weighting operation is performed, and the enhanced output features are finally obtained. By combining channel attention and spatial attention, the CBAM module can capture the correlation between features in different dimensions and improve the feature expression ability of the convolutional neural network, thereby improving performance on image recognition tasks. The BiFPN module was proposed by Tan et al. [29] and enhances the fusion of higher-level features along the path: through weighted feature fusion, each bidirectional path (top-down and bottom-up) is treated as a feature network layer. Fig 10(A) and 10(B) show the FPN and PAN feature fusion structures used in the original YOLOv5 network: FPN performs multi-scale feature fusion in a top-down manner, and PAN adds a bottom-up path on top of FPN. Fig 10(C) shows the BiFPN module, the feature fusion part of the EfficientDet network. It receives five effective feature layers P3-P7 from the backbone feature extraction network and performs upsampling and downsampling feature fusion on these layers in sequence; each node is assigned learnable weights to balance features of different scales, which improves detection accuracy. Fig 10. The FPN, PAN, and BiFPN network structure: (a) FPN; (b) PAN; (c) BiFPN. https://doi.org/10.1371/journal.pone.0302419.g010 The CBAM module is added to the head of the original YOLOv5 model, targeting workpiece surfaces with dense defects and multiple target features. By integrating the CBAM module, the improved YOLOv5 model pays more attention to target areas containing important information, suppresses irrelevant information, and improves the overall accuracy of target detection. At the same time, BiFPN performs better than the original FPN+PAN structure in multi-scale feature fusion.
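The weighted ("fast normalized") fusion at a single BiFPN node can be sketched as follows; the weight values shown are hypothetical, since in training they are learned per node.

```python
import numpy as np

# A sketch of BiFPN's fast normalized fusion: each incoming feature map
# f_i gets a non-negative learnable weight w_i, and the node output is
# sum(w_i * f_i) / (eps + sum(w_i)).  Weights here are hypothetical.
def fused(features, weights, eps=1e-4):
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # keep w_i >= 0
    num = sum(wi * f for wi, f in zip(w, features))
    return num / (eps + w.sum())

# Two same-shape feature maps entering one BiFPN node:
f1 = np.ones((4, 4)) * 2.0   # e.g. the top-down path input
f2 = np.ones((4, 4)) * 6.0   # e.g. the lateral input
out = fused([f1, f2], [1.0, 3.0])
print(round(out[0, 0], 3))   # ~5.0, pulled toward the higher-weight input
```

Normalizing by the weight sum keeps the fused values on the same scale as the inputs, which is what lets each node balance features of different resolutions.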
Therefore, the original neck module is replaced with the BiFPN network module, improving the accuracy of position estimation and category classification. The improved network structure is shown in Fig 11. Fig 11. The network structure of improved YOLOv5 model. https://doi.org/10.1371/journal.pone.0302419.g011 3 Experiment and discussion The combination of the 3D automated scanning system with the YOLOv5 model is employed during surface detection. Initially, the digital twin model of the workpiece is acquired by scanning equipment with an accuracy of 0.02 mm. Simultaneously, surface information images of the digital twin model are obtained using a virtual camera and compiled into a dataset for training the original and improved YOLOv5 models. Subsequently, the dataset undergoes defect annotation. The two YOLOv5 models are trained to intelligently identify and classify the defects presented in the images and to localize them effectively. Finally, the detection results based on the improved YOLOv5 model are analyzed in detail and compared with the detection results of the original model and other algorithms. 3.1 The acquisition and processing of the dataset In the manufacturing industry, workpiece failures often stem from inherent material defects, structural design issues, process defects resulting from incorrect processing or manufacturing, and fatigue defects caused by incorrect maintenance during service. This study primarily focuses on process defects present on the workpiece. The workpieces used in the dataset came from a repair factory and include engine castings, machined aircraft parts, and sheet metal forming structures. Their surface defects include pitting, unevenness, holes, and so on. Comprehensive physical surface point cloud data acquired via 3D scanning equipment is used to reconstruct the digital twin model.
Subsequently, the surface of the digital twin model is captured using a virtual camera. This process, with a scanning accuracy of 0.02 mm, obtains the digital twin surface of the entire workpiece with high precision while capturing the defective surfaces and forming the dataset. To improve the quality of dataset training, a series of dataset augmentation techniques is applied during the process. The random augmentation techniques include flipping, rotation, and variation of image brightness, producing an expanded image set as shown in Fig 12. In this way, the YOLOv5 model can learn object features at different scales, brightness levels, and angles, thereby enhancing its generalization performance on unseen data. Fig 12. Dataset augmentation methods: (a) Original image; (b) Flip; (c) Rotation; (d) Brightness. https://doi.org/10.1371/journal.pone.0302419.g012 Before model training, the samples in the dataset need to be manually annotated for defects using labeling software. The dataset comprises various types of samples with surface defects, including images with a single defect, images with multiple defects, and defect-free images. The defects encompass four typical categories: Inclusion, Perforation, Pitted surface, and Rolled-in scale. The detailed features and examples of these typical defects are presented in Fig 13. Inclusion presents an obvious interface boundary with the shape of flakes or blocks. Perforation presents irregular holes on the surface. Pitted surface presents an uneven area. Rolled-in scale presents the features of small spots, fish scales, strips, and lumps. Fig 13. Induction of various surface defects in the dataset: (a) Inclusion; (b) Rolled-in scale; (c) Perforation; (d) Pitted surface.
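The random augmentations of Fig 12 can be sketched with numpy; the flip probability, the 90-degree rotation, and the brightness factor range below are assumptions for illustration.

```python
import numpy as np

# A sketch of the random augmentations in Fig 12: horizontal flip,
# rotation, and brightness variation.  Probabilities and the brightness
# range are assumed values, drawn per image by the generator `rng`.
def augment(img, rng):
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                  # (b) horizontal flip
    if rng.random() < 0.5:
        img = np.rot90(img, k=1, axes=(0, 1))  # (c) rotation by 90 degrees
    factor = rng.uniform(0.7, 1.3)             # (d) brightness variation
    return np.clip(img * factor, 0, 255)

rng = np.random.default_rng(0)
img = np.full((64, 64, 3), 128.0)
aug = augment(img, rng)
print(aug.shape)  # (64, 64, 3)
```

In a detection setting, the annotated bounding boxes must of course be transformed consistently with the image under flips and rotations.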
https://doi.org/10.1371/journal.pone.0302419.g013 3.2 Improved YOLOv5 model training results As mentioned in Section 2.3, the improved YOLOv5 model can be better used for defect identification, classification, and localization. The improved YOLOv5 model is trained on the surface defect dataset. The training results are shown in Fig 14, which presents the training and validation curves of the bounding box loss, the objectness loss, and the classification loss. The results show that, despite some fluctuations, the cls_loss values in training and validation both converge close to 0, and the category and location of the defects can be correctly detected. In this case, efficient classification can be performed during defect detection. Fig 14. The improved YOLOv5 model training results. https://doi.org/10.1371/journal.pone.0302419.g014 The P-R curves describe the change in defect detection results before and after the improvement, as shown in Fig 15(A) and 15(B). Although the accuracy for Perforation and Pitted surface decreased by 0.1% and 2.9% respectively, the accuracy for Rolled-in scale, which had poor detection results before, improved greatly by 3.3%, and the accuracy for Inclusion also increased by 0.7%. Compared with the original model, the mAP value of the improved model increased from 77.4% to 77.6%. Fig 15. Comparison of training results before and after improving the YOLOv5 model: (a) Original model; (b) Improved model. https://doi.org/10.1371/journal.pone.0302419.g015 3.3 Detection results visualization To verify the actual detection effect of the improved model, the original YOLOv5 model and the improved YOLOv5 model were used to detect images with surface defects. Fig 16 shows the bounding box output for the detection of the four defect categories.
From the test results it can be seen that the improved YOLOv5 model identifies and classifies defects better, and the inspection is more complete. The original YOLOv5 model has low confidence when inspecting Rolled-in scale defects, and some Rolled-in scale defects are missed. Since the improved model introduces the CBAM mechanism, it focuses on the defect area during feature extraction and enhances the expression of defect features. At the same time, the BiFPN module improves feature fusion and fully integrates the surface features, thus improving the detection accuracy for Inclusion and Rolled-in scale. For defects of the same type, the improved model has a better detection effect, and the confidence of the detected defects is higher than before the improvement. Fig 16. Detection results for each defect category: (a) Inclusion; (b) Perforation; (c) Pitted surface; (d) Rolled-in scale. https://doi.org/10.1371/journal.pone.0302419.g016 3.4 Models comparison for surface defect detection Based on the same dataset split, the improved model in this paper is compared with the SSD, Faster R-CNN, YOLOv3, and YOLOv4 models, with the results shown in Table 1. The data in the table show that the AP values for Perforation, Inclusion, Pitted surface, and Rolled-in scale are higher than those of the other models. Overall, the mAP value of the improved model reached 77.6%, which is 11.7%, 3.4%, 6.2%, and 33.5% higher than that of the SSD, Faster R-CNN, YOLOv3, and YOLOv4 models, respectively. Table 1. Comparison results of different models.
https://doi.org/10.1371/journal.pone.0302419.t001
4. Conclusion Surface defect detection on workpieces is an essential part of intelligent manufacturing, so research on industrial product surface defect detection holds strong practical significance. To achieve efficient and accurate detection of workpiece surface defects, this study proposes a surface defect detection system based on an improved YOLOv5 model, targeted at digital twin surfaces. The improvement adds a CBAM module to the head of YOLOv5 to enhance the feature expression ability, and replaces the original neck module with the BiFPN network to improve the feature fusion ability of the model. The results show that the improved YOLOv5 model can identify and classify surface defects, including Inclusion, Perforation, Pitted surface, and Rolled-in scale, within the detection system. Compared with the original YOLOv5 model, the mAP value of the improved YOLOv5 model increased by 0.2%, and the model has high precision. On the same dataset, the improved YOLOv5 model has higher recognition accuracy than the other models, improving by 11.7%, 3.4%, 6.2%, and 33.5% over SSD, Faster R-CNN, YOLOv3, and YOLOv4, respectively.
Therefore, this study provides a practical and systematic detection method for digital twin model surfaces during intelligent production, realizing the rapid screening of defective workpieces. Acknowledgments We would like to acknowledge Han Xie for the detailed revision of the manuscript. TI - Identification and classification of surface defects for digital twin models of the workpiece JO - PLoS ONE DO - 10.1371/journal.pone.0302419 DA - 2024-04-30 UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/identification-and-classification-of-surface-defects-for-digital-twin-x00xLfsqop SP - e0302419 VL - 19 IS - 4 DP - DeepDyve ER -