TY - JOUR
AU - Kadry, Seifedine
AB - Introduction
Surveillance is described as observing, tracking, documenting, and controlling the behavior of individuals, objects, and events to regulate activity in today's social life [1]. To be safe and secure is one of the basic human needs [2]. Unexpected data patterns cause problems that are referred to as anomalies; however, due to the wide variety of specific contexts, unusual event detection is a difficult problem to solve [3]. Theft, robbery, shoplifting, snatching, automobile burglary, arson, and running arbitrarily are all examples of security threats [4,5]. These security threats can be minimized by implementing modern techniques [6,7] of computer vision [8] and deep learning. Identifying anomalous frames and categorizing suspicious images is complicated and time-consuming [9]. Numerous applications use anomaly recognition, such as intrusion detection, visual monitoring, and recognition of suspicious or anomalous behavior. Surveillance is usually performed by different types of cameras, and all public and private places need proper surveillance [10]. Detection of anomalous patterns in surveillance [11] is now much more challenging and complex because of massive and instantaneous data-handling requirements. Human beings have limitations and cannot examine enormous volumes of data for anomaly recognition in surveillance scenes [12]. Prior automated warning of violent activity in surveillance scenes could significantly minimize the risk of any dangerous activity. Anomalous incident recognition, a well-known research area, is a sub-field of intelligent surveillance using computer vision techniques. The outcomes of this classification can be used in various applications, including avoiding security threats in public places and other routine surveillance at various locations [10]. In today's smart-city era, video monitoring has become extremely vital. Large numbers of surveillance cameras are installed in public and private locations to monitor infrastructure and public safety. Separating anomalous frames from normal footage consumes much time and effort [13]. The categorization of suspicious images is complicated by variations in shape, texture, people's orientation, size, and background. Anomalous image classification is also recognized as an important issue in the domain of surveillance [14]. Some techniques and algorithms are utilized under the umbrella of indirect supervision for anomaly recognition [15]. This paper addresses issues of accuracy, robustness, and the shortcomings of previous approaches by presenting a unique framework for anomaly identification in surveillance systems. The following are this work's main contributions:
Introduction of Up-to-the-Minute-Net: This research proposes a new 63-layer DCNN named Up-to-the-Minute-Net, specifically designed for feature extraction in anomaly detection tasks. This model is optimized to handle complex image data, enhancing its effectiveness in identifying suspicious activities in surveillance footage.
Dual Feature Extraction Mechanism: This approach combines the feature extraction capabilities of the proposed Up-to-the-Minute-Net with the well-established Inception-ResNet-v2 model. By fusing features from both networks, we create a rich and diverse set of features that improves the model's ability to detect subtle anomalies.
Advanced Feature Selection Techniques: To further refine the feature set, we introduce two state-of-the-art optimization algorithms, Dragonfly and GA, for feature selection. These methods ensure that only the most relevant features are used, significantly enhancing the model's performance and reducing computational complexity.
Extensive Cross-Validation: We rigorously evaluate our model using 5-fold and 10-fold cross-validation, demonstrating its robustness and generalizability. The proposed framework achieves 99.9% accuracy in these experiments, substantially improving on the existing technique.
Benchmark Dataset and Real-World Application Potential: The model is tested on the Suspicious Activity Recognition (SAR) benchmark dataset, and its performance surpasses existing state-of-the-art methods. Our model demonstrates strong potential for practical, real-world applications, particularly enhancing public safety through more accurate and reliable surveillance systems.
The manuscript has six sections. Section 1 introduces the topic. Section 2 reviews related work. Section 3 covers the proposed deep neural network-based methodology together with the feature optimizers used in the introduced process. Section 4 presents the experimental outcomes, Section 5 covers the discussion and limitations, and Section 6 presents the conclusion and future directions.
Nomenclature used in the paper
Several methods are discussed in this paper, and a list of the nomenclature used throughout is provided in Table 1.
Table 1. A list of abbreviations used in this paper.
Related work
Any recognition system mainly follows four important steps, notwithstanding their complexities and complications: 1. pre-processing [16], 2. feature extraction [17,18], 3. optimal feature selection [19,20], and 4. classification [21,22]. Before being used in the construction and application of models, images must undergo preprocessing [23]. This includes, but is not limited to, scaling, organizing, and color modification. The 3D Scale-Invariant Feature Transform (3DSIFT) [24] finds local elements in an image. A one-dimensional convolutional layer (Conv1D) is employed to limit noise and to obtain relevant data from the embedding produced by the spatial feature extractor [25]. Deep neural networks give exceptional outcomes in the area of surveillance for anomaly detection, with methodologies proposed by experts to detect and segment individuals in images and videos [26–28]. Features are extracted from images of suspicious actions, and methods such as MPPCA [29], MPPCA+SFA [30], Conv-AE [31], ConvLSTM-AE [32], Deep Generic [33], and GANs [34] have been proposed for feature extraction. Heuristic and hand-crafted features, such as 3D extensions of HOG [35] and HOF [36], provide excellent outcomes. These features are also assessed utilizing diverse deep-learning models.
Some pre-trained feature extraction models, such as VGG-19 [37], Inception-v3 [38], ResNet-50 [39], GoogLeNet [40], ResNet-18 [41], and SqueezeNet [42], are utilized as DL strategies for feature extraction. Further, with feature selection techniques, knowledge can be discovered and mined, which allows for the elimination of redundant and outdated features while maintaining the core details [43]. Irrelevant features can be further eliminated using dimension-reduction methods to improve the system's effectiveness and performance [44–46]. Principal Component Analysis (PCA) [47] attempts to represent all n data vectors as linear combinations of a few eigenvectors while ignoring sample points that do not match the norm [48]. Object identification [49], object tracking [50], and object recognition [51] are the three phases of surveillance. Object classification would ultimately allow for distinction not only between humans and vehicles [52] but also between various types of behavior, such as drunk drivers and suspicious humans [52]. Members of the intoxicated-driver class are likely to trigger an accident, so classifying them indirectly performs behavior prediction; members of the suspicious-human class discovered in a parking lot may attempt to steal a car [53]. In the domain of surveillance, pedestrian classification is a significant object recognition concern [54]. DL has recently seen considerable advancement in image tasks such as object recognition and pose estimation, among other tasks exploiting non-linear relations in high-dimensional data [55,56]. In Generative Adversarial Networks (GANs), a CNN-based LSTM [57,58], in light of its capacity to learn sequential information, is utilized for encoding temporal data in images [59]. Some methods used for object detection lead on to object classification; for example, YOLO is proposed for object detection [60]. Table 2 presents a comparative analysis of state-of-the-art deep learning approaches for anomaly identification in surveillance.
Table 2. Comparative analysis of existing approaches in anomaly recognition for surveillance systems.
Motivation
The proposed research is motivated by the critical need to enhance surveillance systems to ensure public safety in an increasingly complex environment where manual monitoring is no longer feasible. Current DL methods for anomaly recognition face challenges in achieving high accuracy, efficient feature extraction, and scalability for real-world applications. To bridge these gaps, this paper introduces a novel framework that combines two DCNNs, the custom-built Up-to-the-Minute-Net and the established Inception-ResNet-v2, for comprehensive feature extraction. This dual approach is complemented by a hybrid optimization technique using Dragonfly and GA for feature selection, which notably enhances the model's performance by concentrating on the most pertinent features. The model's ability to achieve 99.9% accuracy with 5-fold cross-validation demonstrates its superiority over existing methods. By advancing the state of the art in both feature extraction and selection, this study not only advances the theoretical knowledge of anomaly recognition but also offers a practical solution that is scalable and applicable to real-world surveillance scenarios.
Proposed methodology
The main objective of this study is to introduce a method for detecting anomalies in surveillance footage and identifying suspicious activities. The overall architecture of the proposed anomaly detection framework is illustrated in Fig 1. It shows the integration of the various stages, including the initial image preprocessing step using contrast enhancement [67] and histogram equalization. In this manuscript, a DCNN named Up-to-the-Minute-Net is proposed. The other significant phases are feature extraction from the proposed network and from the pre-trained deep network Inception-ResNet-v2 [68]. This visualization helps in understanding how each component contributes to the overall anomaly detection process, from preprocessing to feature extraction and classification.
Fig 1. Flowchart of the proposed approach.
Extracted features from both networks are combined in the fusion phase and then passed to two feature optimizers, Dragonfly and GA. After classification, the methodology produces improved outcomes compared with much other research. The optimized features are transferred to distinct classifiers for categorization. The experiments are performed with two cross-validation settings; 5 and 10 folds are selected for this purpose.
Data preprocessing
Before processing the images through the networks, the data undergoes several preprocessing steps. Initially, contrast enhancement [67] and histogram equalization are applied to improve image quality. Noise is suppressed using the contrast enhancement technique, and histogram equalization improves the contrast of images by redistributing their intensity levels, making features more distinguishable.
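The paper implements this pipeline in MATLAB and does not name the exact contrast enhancement operator, so the following Python/OpenCV sketch of the preprocessing stage is only illustrative: the min-max contrast stretch and the luminance-only equalization are assumed choices, not the authors' confirmed settings.

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray, side: int = 227) -> np.ndarray:
    """Illustrative preprocessing: contrast stretch, then histogram equalization."""
    # Contrast enhancement: linearly stretch intensities to the full [0, 255]
    # range (an assumed operator; the paper does not specify its technique).
    stretched = cv2.normalize(image_bgr, None, 0, 255, cv2.NORM_MINMAX)

    # Histogram equalization on the luminance channel only, preserving color.
    ycrcb = cv2.cvtColor(stretched, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    equalized = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

    # Resize to the network input size: 227 for Up-to-the-Minute-Net,
    # 299 for Inception-ResNet-v2.
    return cv2.resize(equalized, (side, side))
```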
Feature extraction
Feature extraction is integral to the execution of the classification procedure. A major contribution of this research is the fusion of features extracted from the proposed Up-to-the-Minute-Net and the pre-trained deep Inception-ResNet-v2 [68] network. Both networks are used for feature extraction from the SAR [69] dataset: Up-to-the-Minute-Net yields 4096 features and Inception-ResNet-v2 yields 1536.
Proposed Deep CNN Up-to-the-Minute-Net.
A new CNN-based deep neural network named Up-to-the-Minute-Net is proposed. First, CIFAR-100 [70] is used to train the proposed CNN model; this dataset contains 100 classes of images belonging to different categories. After converting the proposed network into a pre-trained model, features are extracted from the SAR dataset [69]. The proposed Up-to-the-Minute-Net is a branched network. Table 3 details the FMS, FD, ST, PD, and pooling window size of each layer.
Table 3. Detailed layers' configuration of the proposed Up-to-the-Minute-Net.
The proposed deep CNN starts from the input layer, whose FMS is 227 × 227 × 3 and whose FD is 11 × 11 × 3 × 96. Following the input, the network has CN1, its first CN layer. The network has a total of 63 layers. The features are extracted from FC20 in the form 1 × 1 × 4096. The detailed architecture is depicted in Fig 2. The network has 20 CN layers. It also contains 7 R layers, 2 D layers, 3 FC layers, and one SoftMax layer, as well as 14 BN layers, 5 P layers, 4 ADD layers, 4 GCN layers, and 1 final CL layer for classification purposes.
Fig 2. Detailed layered architecture of the proposed Up-to-the-Minute-Net.
The network's output type is classification. Some parameters are fixed and have to be set at implementation time. Table 4 describes the hyperparameters and their values; these hyperparameters play a major role in the training of the network.
Table 4. Hyperparameters and their values for the proposed Up-to-the-Minute-Net.
Features for all dataset images are obtained from the FC20 layer, 4096 per image, so the overall feature matrix grows to 13250 × 4096. The first network used for feature extraction is thus the proposed Up-to-the-Minute-Net.
Inception-ResNet-v2.
The second network used for feature extraction is Inception-ResNet-v2 [68]. This DCNN is trained on more than 1,000,000 images from the ImageNet [71] dataset and is used here without its FC layer. The network is 164 layers deep and can categorize images into 1000 object classes; it has therefore acquired rich feature representations for a wide range of images. The network has an image input size of 299 × 299. In the Inception-ResNet block, CN filters of several sizes are combined with residual connections. Inception-ResNet-v2 has a total of 824 layers and functions. Its input type is image, and its output type is classification. The features are extracted from the average P layer. The network yields 1536 features per image, and since the SAR dataset [69] contains 13250 images, the achieved feature matrix is 13250 × 1536.
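Up-to-the-Minute-Net is not publicly released, so the sketch below stands in a lightweight, hypothetical placeholder (`UpToTheMinuteStub`) for its 4096-D FC20 output and uses timm's pre-trained `inception_resnet_v2` for the second branch. The concatenation step is an assumption consistent with the 4096 + 1536 = 5632 fused dimensionality described in the next subsection; the paper itself works in MATLAB.

```python
import timm
import torch
import torch.nn as nn

class UpToTheMinuteStub(nn.Module):
    """Hypothetical stand-in producing a 4096-D 'FC20'-style feature vector."""
    def __init__(self):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(8)        # (N, 3, 8, 8)
        self.fc20 = nn.Linear(3 * 8 * 8, 4096)     # plays the role of FC20

    def forward(self, x):                          # x: (N, 3, 227, 227)
        return self.fc20(self.pool(x).flatten(1))  # (N, 4096)

custom_net = UpToTheMinuteStub().eval()
# Pre-trained Inception-ResNet-v2; num_classes=0 makes timm return the 1536-D
# average-pooled features instead of class logits.
irv2 = timm.create_model("inception_resnet_v2", pretrained=True, num_classes=0).eval()

@torch.no_grad()
def fused_features(batch_227: torch.Tensor, batch_299: torch.Tensor) -> torch.Tensor:
    f1 = custom_net(batch_227)         # (N, 4096) "FC20" features
    f2 = irv2(batch_299)               # (N, 1536) average-pool features
    # Serial (concatenation) fusion; over the whole SAR dataset this yields
    # the 13250 x 5632 matrix described below.
    return torch.cat([f1, f2], dim=1)  # (N, 5632)
```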
Feature fusion
Feature fusion aims to consolidate source images of a similar scene into one composite representation that holds a more exact depiction of the view than any of the individual sources. Before merging, salient features present in all source images are extracted using a suitable feature extraction method. A major contribution of this research is the fusion of the features obtained from the proposed Up-to-the-Minute-Net and the pre-trained deep CNN Inception-ResNet-v2. Both networks are used for feature extraction, and fusion is then performed on the features extracted from these networks. Inception-ResNet-v2 contributes 1536 features and Up-to-the-Minute-Net contributes 4096; fused, they form a collective compound of 13250 × 5632 features. The salient features are first recognized in each source, and the salience of a component is computed as a coefficient. The feature fusion process ensures that the majority of the predominant features are integrated.
Feature optimization
An immense number of features is retrieved in the feature extraction phase; after fusion, this makes a very large combination of 5632 features. The training phase for classification can be slowed down by this many features. Additionally, concerns such as indefinite features and the curse of dimensionality can emerge, causing a decline in performance. To resolve this issue, the fused features are delivered to the Dragonfly [72] and GA [73] feature optimizers. These algorithms were chosen because of their demonstrated effectiveness in solving high-dimensional optimization problems, which is essential in the context of anomaly recognition and feature selection from large datasets.
Dragonfly algorithm.
The mathematical formulation of the Dragonfly algorithm follows Reynolds, whose overall swarm behavior obeys the three primitive principles of separation, alignment, and cohesion [74]. While calculating the separation, $X$ is the position of the current individual, $X_k$ is the position of the kth neighboring individual, and $N$ is the number of neighboring individuals:

$$S = -\sum_{k=1}^{N} (X - X_k) \tag{1}$$

While calculating the alignment, $V_k$ denotes the velocity of the kth neighboring individual:

$$A = \frac{\sum_{k=1}^{N} V_k}{N} \tag{2}$$

The cohesion is calculated as follows:

$$C = \frac{\sum_{k=1}^{N} X_k}{N} - X \tag{3}$$

where $X$ is the position of the current individual [72]. These principles are incorporated into the Dragonfly algorithm to guide the search process toward optimal solutions in a high-dimensional space.
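As a compact illustration of Eqs. (1)–(3), the sketch below computes the three swarm primitives for one dragonfly. The weighting of these terms, the food-attraction and enemy-distraction terms, and the binarization used for feature selection are omitted, so this is a fragment of the algorithm, not the authors' implementation.

```python
import numpy as np

def swarm_primitives(X: np.ndarray, V: np.ndarray, i: int, nbrs: np.ndarray):
    """Separation (1), alignment (2), and cohesion (3) for individual i.

    X: (P, d) positions of all P dragonflies, V: (P, d) step vectors,
    nbrs: indices of the N individuals inside dragonfly i's neighborhood radius.
    """
    S = -np.sum(X[i] - X[nbrs], axis=0)   # Eq. (1): repel from crowded neighbors
    A = V[nbrs].mean(axis=0)              # Eq. (2): match neighbors' velocities
    C = X[nbrs].mean(axis=0) - X[i]       # Eq. (3): move toward neighborhood center
    return S, A, C

# In the full algorithm these terms are weighted, combined with attraction to a
# food source and distraction from an enemy, and (for feature selection) passed
# through a transfer function that flips bits of a binary feature mask.
```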
Genetic algorithm.
GA reaches an optimal solution through the probabilities of crossover and mutation, continuously reshaping the search space; in this formulation, e denotes the current generation and E the total number of evolutionary generations set for the population [75]. While performing crossover, two parents are chosen for mating: one parent donates part of its genetic material, and the corresponding part of the other parent contributes to the offspring [76]. Dragonfly and GA were chosen due to their demonstrated suitability for feature selection in complex, high-dimensional datasets. Both algorithms are well suited to balancing exploration and exploitation, which is critical for ensuring that the most relevant features are selected from the fused feature set. The numbers of optimal features selected by the Dragonfly optimizer are 950 and 1250, and the numbers of optimal features chosen by the GA optimizer are 500 and 2500.
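The paper does not publish its GA encoding or operator settings, so the following is a toy sketch under common assumptions: candidates are binary feature masks, fitness is cross-validated classifier accuracy, selection is by truncation, crossover is one-point, and mutation is a bit flip. The population size, generation count, and mutation rate are placeholders, not the values of Table 6.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(mask: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Fitness of one candidate subset: mean 5-fold CV accuracy of an SVM."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=5).mean()

def ga_select(X, y, pop_size=20, generations=25, p_mut=0.01):
    d = X.shape[1]
    population = rng.random((pop_size, d)) < 0.5            # random binary masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop_size // 2]]        # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)                        # one-point crossover:
            child = np.concatenate([a[:cut], b[cut:]])      # head of one parent,
            flip = rng.random(d) < p_mut                    # tail of the other
            child[flip] = ~child[flip]                      # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    best = population[np.argmax([fitness(m, X, y) for m in population])]
    return np.flatnonzero(best)                             # kept feature indices
```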
Classification
The chosen features are passed on to different classifiers, all variants of SVM [77]: Quad-SVM [78], Linear-SVM [79], FineG-SVM [80], CoarG-SVM, MedG-SVM, and Cu-SVM [81]. The SVM classifier and its kernels are described in [82–84]. The classifiers are evaluated on several performance evaluation metrics. The detailed results and trials are presented in the results section. The results are taken with two different cross-validation fold counts and the two feature optimizers, Dragonfly and GA.
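These classifier names correspond to MATLAB Classification Learner SVM presets. The sketch below shows rough scikit-learn analogues; the fine/medium/coarse gamma values mirror MATLAB's kernel-scale heuristic (sqrt(P)/4, sqrt(P), 4·sqrt(P) for P features) and are assumptions rather than the exact configuration used in the experiments.

```python
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def svm_variants(n_features: int) -> dict:
    """Approximate analogues of the six SVM presets named in the text."""
    ks = n_features ** 0.5                       # "medium" kernel scale
    return {
        "Linear-SVM": SVC(kernel="linear"),
        "Quad-SVM":   SVC(kernel="poly", degree=2),
        "Cu-SVM":     SVC(kernel="poly", degree=3),
        "FineG-SVM":  SVC(kernel="rbf", gamma=1 / (ks / 4) ** 2),
        "MedG-SVM":   SVC(kernel="rbf", gamma=1 / ks ** 2),
        "CoarG-SVM":  SVC(kernel="rbf", gamma=1 / (4 * ks) ** 2),
    }

def evaluate(X, y, folds=5):
    # Accuracy, precision, recall, and F1 over k-fold CV (k = 5 or 10 in
    # the paper), matching the reported performance measures.
    for name, svc in svm_variants(X.shape[1]).items():
        model = make_pipeline(StandardScaler(), svc)
        res = cross_validate(model, X, y, cv=folds,
                             scoring=("accuracy", "precision_macro",
                                      "recall_macro", "f1_macro"))
        print(name, {k: round(float(v.mean()), 4)
                     for k, v in res.items() if k.startswith("test_")})
```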
Experimental results and discussion
A total of 8 experiments are performed: 4 with the Dragonfly optimizer and, likewise, 4 with GA. The dataset used for implementation is SAR [69]. The features are extracted from two DCNNs, the proposed Up-to-the-Minute-Net and the pre-trained Inception-ResNet-v2, and the extracted features from both DCNNs are fused into a combination of 5632.
Results are taken with 5- and 10-fold cross-validation. The numbers of optimal features selected are 500, 950, 1250, and 2500. The total number of iterations is 25 for each experiment. The presented research is implemented using MATLAB R2021a on the Microsoft Windows 10 Pro operating system. The system has an Intel(R) Core(TM) i5-2520M CPU @ 2.50 GHz (2501 MHz, 2 cores, 4 logical processors), BIOS version Hewlett-Packard 68SCF Ver. F.67, and 8.00 GB of installed physical memory (RAM). The detailed results of all experiments are described in section 4.2. The performance measures used are Precision (Pre) [85], Recall (Rec) [86], Accuracy [87], F1-Sc [88], ROC, and AUC [89]. The performance criteria are the same for all experiments.
Dataset used for implementation
A dataset containing five anomaly classes is assembled by acquiring four classes ((a) Falling, (b) Fighting, (c) Firing, and (d) Running) from HMDB51 (https://www.kaggle.com/datasets/easonlll/hmdb51) [79] and one class (Fire) from the AIDER (https://zenodo.org/records/3888300#.XvCPQUUzaUk) [80] dataset. The dataset is named SAR (Suspicious Action Recognition). It is also used in [59].
Simulation and data details
The SAR dataset is essential for assessing how well the suggested anomaly recognition methodology works. It comprises 13,250 images covering different scenarios and conditions, making it suitable for testing the robustness and accuracy of the proposed models. The total number of original images in the dataset is 6625, and the augmented images number 13,250. The dataset distribution between original and augmented images is depicted in Fig 3, where the blue bars represent the original images and the yellow bars the augmented images. The figure shows how the dataset has been expanded through augmentation to create a larger, more diverse dataset that improves the model's training and reduces overfitting.
Fig 3. Graphical representation of original and augmented images.
The combination obtained after the fusion process is 13250 × 5632. The k-fold cross-validation method is used for the validation of this multiclass classification.
Experiments performed based on feature optimizer and no. of folds
Two different feature optimizers are used to perform the experiments: first, the Dragonfly optimizer is selected for optimal feature selection, and second, the GA optimizer. The parameters selected for the Dragonfly feature optimizer are given in Table 5. A total of 4 experiments were executed based on Dragonfly optimization: two with 5-fold and two with 10-fold cross-validation. Different fold counts are selected to check whether changing the number of validation folds makes a difference.
Table 5. Parameter values used for Dragonfly optimization.
Two experiments are performed with 950 selected features under 5- and 10-fold cross-validation, and two experiments with 1250 selected features on both fold settings. The model is also tested with the GA feature optimizer to attain the best results. In this category, a total of 4 experiments are performed based on the GA feature optimizer.
Two experiments are performed with 5 folds and two with 10 folds. The parameter values used for these experiments are given in Table 6.
Table 6. Parameter values used for GA optimization.
Two experiments are performed with 500 selected features under 5- and 10-fold cross-validation, and two experiments with 2500 features on both fold settings. The outcomes of all 8 experiments, comprising Pre, Rec, F1-Sc, and Accuracy, are given in Table 7.
Table 7. Performance evaluation of all experiments.
The highest accuracy achieved is 99.9%, with a Pre of 1.00, Rec of 1.00, and F1-Sc of 1.00. In this experiment, the number of optimal features selected is 2500 out of 5632 total features. The CM in Fig 4 shows how well the model classified the different anomaly classes in the SAR dataset; it presents the CM of the experiment with the highest accuracy and details the model's performance in classifying different anomaly types through counts such as TP, FP, TN, and FN.
Fig 4. CM of the highest results obtained.
The ROC curve in Fig 5 visualizes the trade-off between the TP rate (sensitivity) and the FP rate across different classification thresholds. An AUC value close to 1.0 signifies that the model is highly effective at distinguishing between anomaly and non-anomaly cases. The curve shows that the model's predictive power is excellent, achieving near-perfect classification with minimal FPs and FNs.
Fig 5. ROC of the highest results obtained.
Fig 6 shows the performance metrics for Experiment 7, which achieved the highest accuracy. It includes graphical representations of Accuracy, Pre, Rec, and F1-Sc for the model trained with 2500 features using GA optimization and 5-fold cross-validation. The figure emphasizes the model's superior performance across all metrics, with perfect scores indicating that the model is highly effective at detecting anomalies. The detailed performance metrics underscore the impact of optimal feature selection and the effectiveness of the GA optimization technique.
Fig 6. Graphical representation of Accuracy, Pre, Rec, and F1-Sc on 2500 features with GA optimization and 5-fold cross-validation.
The number of folds on which the highest accuracy is achieved is 5. In this experiment, the data is split into a 20:80 testing-to-training ratio, and the specific portions of testing and training data change after every iteration.
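For reference, the reported Pre, Rec, and F1-Sc can be derived directly from the confusion-matrix counts named above. A minimal sketch, assuming rows index the true class and columns the predicted class:

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Derive per-class Pre, Rec, and F1-Sc plus overall accuracy from a
    confusion matrix (rows = true class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp            # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp            # members of the class that were missed
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    acc = tp.sum() / cm.sum()           # overall accuracy
    return pre, rec, f1, acc
```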
Accuracy over selected features
The selection of optimal features has a great impact on accuracy. Four feature counts are selected for experimentation: 500, 950, 1250, and 2500. Table 8 describes the association between the number of features and accuracy.
Table 8. Accuracy over selected features.
These optimal feature counts are sampled from the 5632 fused features in total. The careful selection of features shown in Fig 7, rather than simply using all available features, leads to significant improvements in accuracy. The performance improvements as the number of features increases show that the model can leverage additional information up to a point, after which further increases offer limited benefit.
Fig 7. The relational graph between the number of features and accuracy.
The highest accuracy achieved is 99.9%, on 2500 features with Cu-SVM, GA optimization, and 5-fold cross-validation.
Comparison with existing results
This section compares the proposed approach with the previously published work of [69]. In that work, five experiments were performed, with 100, 250, 500, 450, and 1000 selected features, all under 5-fold cross-validation. ASO was used for feature optimization in all experiments, and the highest result achieved was 99.3% on Cu-SVM with 500 selected features. Table 9 provides a wide-ranging evaluation of the proposed work against the prevailing work.
Table 9. A widespread comparison of the proposed work with existing work.
In the proposed work, a total of 8 experiments are executed with 500, 950, 1250, and 2500 selected features. The experiments are carried out, and results obtained, with two feature optimizers, the Dragonfly algorithm and GA, using 5-fold and 10-fold cross-validation. The highest accuracy acquired is 99.9%, from 2500 selected features on Cu-SVM with 5-fold cross-validation. While [69] achieved 99.3% accuracy with 500 features using entropy-coded ant colony optimization and L4-branched-ActionNet, our model achieved the same accuracy with 950 features using Up-to-the-Minute-Net and the GA optimizer. However, our model was designed to explore the impact of using a larger feature set (up to 2500 features), ultimately achieving 99.9% accuracy. This suggests that while [69] may be more efficient with fewer features, our approach excels when higher feature counts yield better performance. Future work will focus on optimizing feature selection to balance accuracy and computational efficiency for real-world applications.
Statistical analysis
To ensure the robustness and reliability of our model, we employed rigorous statistical methods throughout the study. We utilized 5-fold and 10-fold cross-validation to validate the model's efficiency, ensuring generalizability across different data splits. Additionally, we implemented two distinct feature optimization techniques, Dragonfly and GA, to select the most relevant features, testing different feature subsets to optimize accuracy and reliability. The models were evaluated using key performance metrics such as Pre, Rec, F1-Sc, and accuracy across multiple iterations and parameter settings, confirming the consistency and validity of our findings. These comprehensive steps ensure that the statistical analysis conducted is thorough and methodologically sound. The base paper achieved a prediction speed of 650 obs/sec, while the proposed model achieved 87 obs/sec. This speed discrepancy can largely be attributed to the computational power available in the respective environments.
The base paper benefited from the use of a dedicated NVIDIA GTX 1070 GPU, which is designed to handle intensive computational tasks such as model training and prediction at a much faster rate than a CPU. In contrast, the proposed model was run on a CPU-based system without GPU acceleration, which naturally results in slower processing speeds. Although the prediction speed of the proposed model is lower, the focus of our research is on improving accuracy, achieving 99.9% compared with the 99.3% reported in [69]. The increase in accuracy, despite the computational limitations, demonstrates the effectiveness of our feature extraction and selection approach, making the trade-off between speed and accuracy a key consideration. For applications where high accuracy is more critical than prediction speed, the proposed model offers a valuable alternative. Based on our feature selection process and the trends observed with 500, 950, 1250, and 2500 features, we anticipate only a marginal increase in accuracy beyond 99.9% when using the full feature set. The primary goal of our feature selection was to find an optimal balance between computational efficiency and accuracy, as the marginal increase in accuracy with significantly more features is unlikely to justify the additional computational cost. In real-time applications where speed is crucial, a model with fewer features might be preferable, whereas in other cases the higher accuracy of the proposed model may justify the additional computational time.
Discussion and limitations
The importance of anomaly recognition in surveillance has been discussed above. For this purpose, a model is designed and tested in different experiments. This section recounts the experiments and results of the proposed approach and the comparison with existing techniques. The experiments are conducted on the standard benchmark Suspicious Activity Recognition (SAR) dataset. In the paper, a list of the nomenclature is presented in Table 1, and a comparative analysis of existing DL techniques for detecting anomalies in surveillance is presented in Table 2. Fig 1 shows the flowchart of the proposed approach. Table 3 shows the detailed layer configuration of the proposed Up-to-the-Minute-Net, and Fig 2 depicts its detailed layered architecture. Table 4 shows the hyperparameters and their values for the proposed Up-to-the-Minute-Net. The augmented dataset is used for experimentation, and a graphical illustration of the dataset is shown in Fig 3.
The extracted features from both DNNs are fused and passed to two feature optimizers to attain the best results. The hyperparameters set for the Dragonfly-based experiments are given in Table 5, and those for the GA-based experiments in Table 6. The selected features were evaluated under two validation settings, 5- and 10-fold cross-validation. The outcomes of all 8 experiments, comprising Pre, Rec, F1-Sc, and Accuracy, are given in Table 7. The highest results were achieved in experiment 7, on Cu-SVM with 5-fold cross-validation and 2500 features selected by the GA optimizer. The CM of the highest-accuracy experiment is shown in Fig 4, and its ROC in Fig 5. The Pre, Rec, F1-Sc, and accuracy of experiment 7 are represented graphically in Fig 6. The number of selected features has a great impact on the results; the highest accuracies obtained for the selected feature counts are given in Table 8 and depicted graphically in Fig 7. For evaluation purposes, a comprehensive comparison of the proposed work with existing work is given in Table 9. While our proposed framework demonstrates significant improvements in anomaly recognition accuracy, there are several limitations to consider. The method's performance may vary when applied to larger or more diverse datasets beyond the benchmark used in this study. Additionally, the computational complexity of the DL models and optimization algorithms may pose challenges in real-time applications or on resource-constrained devices. Further research is needed to evaluate the scalability and robustness of our approach in different operational environments and to address these practical constraints.
Conclusion and future work
In this manuscript, a branched DCNN model named Up-to-the-Minute-Net is proposed. The benchmark SAR dataset, consisting of 5 anomaly classes, is used for the experiments. After preprocessing of the images, features are extracted from the proposed DCNN, and another pre-trained deep neural network, Inception-ResNet-v2, is also used for feature extraction. The extracted features from both DCNNs are fused. In this research, we applied two optimization methodologies: experiments are performed on the fused features with two different feature optimizers, the Dragonfly algorithm and GA, for the selection of optimal features. Results are taken with both 5 and 10 folds to examine the variance. The total number of experiments performed is 8: four with the Dragonfly feature optimizer and four with the GA feature optimizer. The numbers of features selected for the different experiments are 500, 950, 1250, and 2500. The highest accuracy, 99.9%, is achieved on Cu-SVM with 2500 selected features and 5-fold cross-validation based on GA optimization. While the GA optimizer with 2500 features provided the highest accuracy, it is important to consider the computational requirements associated with this feature set. In real-world applications, the increased number of features may result in higher computational overhead, affecting processing time and memory usage. A comparison between GA with 2500 features and GA with 500 features highlights this trade-off: the model with 500 features, although slightly lower in accuracy (99.2%), may be more suitable for resource-constrained environments.
Future work will focus on evaluating these computational aspects to optimize the framework for practical scenarios.
TI - Anomaly recognition in surveillance based on feature optimizer using deep learning
JO - PLoS ONE
DO - 10.1371/journal.pone.0313692
DA - 2025-05-19
UR - https://www.deepdyve.com/lp/public-library-of-science-plos-journal/anomaly-recognition-in-surveillance-based-on-feature-optimizer-using-2Y2yiFxUpJ
SP - e0313692
VL - 20
IS - 5
DP - DeepDyve
ER -