A Yolo-Based Model for Breast Cancer Detection in Mammograms

This work aims to implement an automated data-driven model for breast cancer detection in mammograms to support physicians’ decision process within a breast cancer screening or detection program. The publicly available CBIS-DDSM and INbreast datasets were used as sources to implement the transfer learning technique on a proprietary full-field digital mammography dataset. The proprietary dataset reflects a real heterogeneous case study, consisting of 190 masses, 46 asymmetries, and 71 distortions. Several Yolo architectures were compared, including YoloV3, YoloV5, and YoloV5-Transformer. In addition, Eigen-CAM was implemented for model introspection and output explanation by highlighting all the suspicious regions of interest within the mammogram. The small YoloV5 model proved the best developed solution, obtaining an mAP of 0.621 on the proprietary dataset. The saliency maps computed via Eigen-CAM proved a capable solution, reporting all regions of interest even in incorrect prediction scenarios. In particular, Eigen-CAM produces a substantial reduction in the incidence of false negatives, although accompanied by an increase in false positives. Despite the presence of hard-to-recognize anomalies such as asymmetries and distortions in the proprietary dataset, the trained model showed encouraging detection capabilities. The combination of Yolo predictions and the generated saliency maps represents two complementary outputs for the reduction of false negatives. Nevertheless, it is imperative to regard these outputs as qualitative tools that invariably necessitate clinical radiologic evaluation. In this view, the model represents a trusted predictive system to support cognitive and decision-making processes, encouraging its integration into real clinical practice.

Keywords Breast cancer detection · Explainable AI · YoloV5 · Transfer learning · Proprietary dataset

* Francesco Prinzi, [email protected]
Marco Insalaco, [email protected]
Alessia Orlando, [email protected]
Salvatore Gaglio, [email protected]
Salvatore Vitabile, [email protected]

1 Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University of Palermo, Palermo, Italy
2 Section of Radiology, Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University Hospital “Paolo Giaccone”, Palermo, Italy
3 Department of Engineering, University of Palermo, Palermo, Italy
4 Institute for High-Performance Computing and Networking, National Research Council (ICAR-CNR), Palermo, Italy

Introduction

Breast cancer is the most common worldwide tumor in the female population [1]. Previous randomized trials and incidence-based mortality studies have demonstrated a significant reduction in breast cancer mortality associated with participation in breast screening programs [2]. However, the problem of false positives and false negatives persists as a concern. Most of these errors can be attributed to dense breasts (masking effect), as well as human factors such as radiologist perception and erroneous decision-making behaviors. Additionally, the inherent imaging characteristics of tumors contribute to the issue, with benign masses often resembling malignant ones and malignant masses sometimes mimicking benign ones [3]. During the breast cancer diagnosis process, the physician aims to detect all the regions of interest (ROIs) in the whole mammogram: masses, calcifications, distortions, etc. Detection in the early stage of the disease is critical for planning new examinations, therapies, or lines of intervention.
A missed detection, on the other hand, may result in irreversible injury to the patient. For this reason, breast cancer detection is the most complicated but also the most important task. Unfortunately, several proposed solutions in the literature do not aim to analyze the entire image, but rather limit detection to patch classification: the ROIs are first manually selected and cropped, and then classifiers are trained to distinguish the crops. However, to support and imitate the physician’s diagnostic process, an architecture capable of detecting all ROIs within the whole mammogram is required. Faster R-CNN, RetinaNet, and Yolo have encouraged the development of systems for breast cancer detection [4–7]. These frameworks certainly introduce two main difficulties: (1) the models have to learn the features of the whole mammogram, and the image resizing required for training may result in the loss of critical details; (2) since the model has to detect all ROIs among all patches of healthy tissue (i.e., non-ROIs), an unavoidable increase in the error rate must be faced. However, Yolo has proven to be an excellent tool in numerous scenarios, achieving higher accuracy and inference speed rates than its object detector competitors [8].

In [9], a comparison and evaluation of YoloV5 nano, small, medium, and large models using the CBIS-DDSM and INbreast datasets was performed. However, several aspects have not yet been considered. The issue of explainability was not addressed; nevertheless, in critical domains like medical applications, ensuring model explainability is an essential prerequisite. Furthermore, it has not been examined whether deeper architectures such as YoloV3 can enhance detection performance in the case of small datasets.
Additionally, the potential advantages of incorporating a Transformer block into Yolo, considering its generalization capability, have not been investigated. In this work, a YoloV5-based model was proposed for breast cancer detection to support the physician’s diagnostic process. A comparison with other feature extractors, such as the Darknet53 proposed in YoloV3 [10] and the Vision Transformer [11], was performed.

Given the need for large databases to facilitate deep training [12], the transfer learning (TL) technique was used. In fact, it has recently been shown that training with small datasets by exploiting pre-trainings represents a future direction to provide a trusted system supporting cognitive and decision-making processes in the medical domain [13]. For this reason, the CBIS-DDSM [14] and INbreast [15] datasets were used as source datasets and a proprietary dataset as target. In contrast to CBIS-DDSM and INbreast, the proprietary dataset includes lesions that are more challenging to recognize, such as asymmetries and distortions, which hold significant clinical importance [16]. The proprietary dataset was acquired and annotated at the Radiology Section of the University Hospital “Paolo Giaccone” (Palermo, Italy). The workflow of the experiments performed is shown in Fig. 1.

Fig. 1 The overall architecture. The CBIS-DDSM dataset was used as source to evaluate several Yolo-based architectures (YoloV3, YoloV5 (n, s, m, l), and YoloV5-Transformer) on the INbreast target dataset. Then, the best trained architecture (YoloV5s) was used for mass detection on a proprietary dataset. A data augmentation procedure was performed before the training phase for class balancing, as well as during the training. The output comprises bounding-box predictions and a heat map that highlights all the ROIs within the mammogram.

However, despite the high performance of deep learning models, their actual use is inhibited by their black-box nature, i.e., the internal logic is incomprehensible to users [17]. This has raised critical issues about their use, such as legal aspects, user acceptance, and trust [18, 19]. For this reason, in order to encourage the integration of these systems into real clinical practice, the problem of their explainability needs to be addressed. The gradient-free method Eigen-CAM [20] was used for saliency map computation and compared with the occlusion sensitivity method. The saliency maps were employed to verify the learning model and to highlight the most important pixels involved in the prediction process. We believe that reporting regions in the form of heat maps can guide the physician’s attention much more than ROI prediction: ROIs are predicted and shown only above a certain confidence threshold, and the hardest-to-find regions may not exceed this threshold. In this way, the complicated, tedious, and exhausting process of mammogram evaluation can be supported by guiding the physician’s attention to different ROIs.

The main contributions of the current manuscript are as follows:

• The first novelty falls within the field of explainable artificial intelligence (XAI). While data-driven methods have demonstrated high performance in various medical scenarios, their lack of transparency creates skepticism among both physicians and patients regarding these new technologies. This skepticism is particularly prominent in the development of clinical decision support systems (CDSS), where understanding the decision-making process and ensuring system reliability are crucial prerequisites for facilitating the diagnostic process. Conventional machine learning approaches are inadequate in meeting these demands and fail to provide justifications for the decisions made by the systems. Introducing explainability for breast cancer detection is of utmost importance due to the potential for early detection of invasive diseases in mammography screening. Quite frequently, these lesions may not be readily apparent and may fail to meet the confidence threshold established in Yolo to return the detection. Conversely, gradient-free XAI methods remain unaffected by the final output and can provide valuable assistance in the diagnostic process, even in situations involving inaccurate or low-confidence predictions. The saliency maps have been proposed as a valuable tool to enhance the predictions of YoloV5.

• A proprietary dataset was acquired during daily clinical sessions at the Radiology Section of the University Hospital “Paolo Giaccone” (Palermo, Italy) for model evaluation. Unlike CBIS-DDSM and INbreast, this dataset comprises a real clinical dataset containing numerous lesions that present greater complexity in recognition, including asymmetries and distortions. These challenging cases hold an important clinical significance [16]. Furthermore, the training process involved the utilization of three datasets, enabling the final model to incorporate the knowledge acquired from the CBIS-DDSM and INbreast datasets.
• The article presents a comparison of several Yolo-based models. In addition, we evaluated the integration of Transformers [11] inside Yolo. Transformers have had an enormous impact on large language models and computer vision tasks. However, the authors of [11] acknowledge that Transformers lack certain inherent biases found in convolutional neural networks (CNNs), such as translation equivariance and locality. Consequently, Transformers may not generalize well when trained on limited amounts of data. This phenomenon is starting to be discussed in other studies [21]. In the context of mammograms and transfer learning, the generalizability of these findings remains uncertain.

This article is organized as follows: “Related Work” provides the related works on breast cancer classification, both using patch-based classification and exploiting the whole mammogram. “Materials and Methods” describes the open-source CBIS-DDSM, the INbreast, and the proprietary datasets; the same section explains the three main Yolo architectures used, their training, and the methods for saliency map computation. “Results” shows the achieved results, and “Discussion” provides their discussion. Finally, in “Conclusions”, the main conclusions are reported.

Related Work

Given the incidence of breast cancer, many works have been proposed to support the physician’s diagnostic process. Muduli et al. [22] and Mahmood et al. [23] have compared their own CNN architectures with state-of-the-art networks for malignant and benign ROI classification. Soulami et al. [24] have also proposed a CNN, called CapsNet, to address the classification of ROIs. They showed that the classification of breast masses into normal, benign, and malignant is certainly more complex than a binary classification of masses into normal and abnormal. Also, Ragab et al. [25] have addressed breast cancer classification at patch level, using AlexNet, GoogleNet, and ResNet-18-50-101 as feature extractors and a support vector machine as classifier. They also evaluated classification through deep feature fusion and a subsequent application of principal component analysis. Yu et al. [26] have explored several methods and CNN architectures for tumor or normal ROI classification. Two deep fusion models based on VGG16 were used to classify different patches extracted from the original ROI, obtaining the final prediction using majority voting. In Agarwal et al. [27], a sliding-window approach is used to scan the whole breast and extract all the possible cancer patches from the image. Several patch-based CNNs (VGG16, ResNet50, and InceptionV3) were trained for breast cancer detection, that is, the classification between positive and negative patches.

The aforementioned works train convolutional models that can distinguish ROIs, without dealing with recognizing them. However, at the breast screening stage, it is crucial to detect all ROIs and subsequently plan new lines of intervention. Jung et al. [7] used RetinaNet as object detector for the automatic localization of masses (both benign and malignant) in the whole mammogram. A dual-view deep convolutional neural network (DV-DCNN) for matching detected masses was proposed by AlGhamdi and Abdel-Mottaleb [28]. The authors used RetinaNet [29] for mass detection and the DV-DCNN architecture to determine if two patches from the craniocaudal (CC) and mediolateral oblique (MLO) views of the same breast represent the same mass, i.e., a positive pair. In [4], a Yolo-based Computer-Aided Diagnosis (CAD) system was proposed for mass detection and classification, proving that the system works even where the masses lie over the pectoral muscles or dense regions. Aly et al. [5] describe the evaluation process of screening mammograms as very monotonous, tiring, lengthy, costly, and significantly prone to errors for human readers; they proposed a YoloV3 model for mass detection and classification and obtained the fairest and most accurate performance using an augmented dataset.

In this work, new feature extractors for breast cancer detection were considered. The YoloV5 architecture was compared with the previous YoloV3 model, also considering the Vision Transformer block. In addition, Eigen-CAM was used as explainable AI algorithm [30, 31] to provide a post hoc explanation. The Eigen-CAM method was compared with occlusion sensitivity. The generated saliency maps were used for two main reasons: (1) as an explanatory debugging tool for preventing inadequate outputs [32, 33] and (2) to guide physicians’ attention even in incorrect prediction scenarios.

Materials and Methods

Datasets

The CBIS-DDSM Dataset

The CBIS-DDSM dataset [14] is the curated version of the Digital Database for Screening Mammography (DDSM) dataset and is composed of scanned film mammograms. Focusing on masses, 1514 images with a total of 1618 lesions (850 benign and 768 malignant) were included. Of the total 1696 lesions, 78 were discarded due to a mismatch between the size of the image and its mask, generating ROIs that did not match a lesion.

The INbreast Dataset

The INbreast [15] dataset consists of 410 full-field digital mammograms (FFDM) classified into normal, benign, and malignant.
Only the 107 positive images were selected, and lesions with BI-RADS > 3 were considered malignant; the others were labeled as benign. Considering that some images contain multiple lesions, a total of 40 benign and 75 malignant ROIs were identified.

The Proprietary Dataset

The dataset consists of 278 FFDMs containing a total of 307 lesions, annotated by expert radiologists dealing with the identification of abnormal regions. The images were acquired by a Fujifilm full-field digital mammography system at the Radiology Section of the University Hospital “Paolo Giaccone” (Palermo, Italy). Images have a spatial resolution of 5928 × 4728 and a pixel size of 50 µm. The image annotations were saved in grayscale softcopy presentation state (GSPS) format, compliant with the DICOM standard. All ROIs identified by the radiologists were annotated with a circumscribed circle, and the coordinates of the bounding boxes used as Yolo input were then calculated as the coordinates of the square circumscribing the circle.
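For illustration, the following minimal sketch converts such a circular annotation into a YOLO-format label line (class, normalized center, and size); the function and variable names are ours and not part of the described annotation pipeline.

```python
# Minimal sketch: convert a circular GSPS annotation into a YOLO-format label
# line; names are illustrative, not the authors' annotation code.
def circle_to_yolo(cx, cy, r, img_w, img_h, class_id=0):
    """Square circumscribing the circle, normalized to [0, 1] as YOLO expects."""
    side = 2 * r
    return (f"{class_id} {cx / img_w:.6f} {cy / img_h:.6f} "
            f"{side / img_w:.6f} {side / img_h:.6f}")

# example: a lesion circle centered at (2950, 2400) with radius 180 px on a
# 4728 x 5928 (width x height) proprietary image
print(circle_to_yolo(2950, 2400, 180, img_w=4728, img_h=5928))
```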
The dataset used in our study was obtained from real clinical practice at the University Hospital “Paolo Giaccone” (Palermo, Italy). Specifically, the data was collected from the outpatient breast clinic, which specializes in second-level diagnostics. As a result, the acquired case series is heavily skewed towards more severe breast cancer lesions, including distortions and asymmetries. Detecting and diagnosing distortions can be particularly challenging, as they are characterized by the presence of spicules radiating from a point, focal retractions, or straightening at the edges of the parenchyma [34]. Consequently, distortions are among the most commonly overlooked abnormalities [35]. Asymmetries refer to unilateral deposits of fibroglandular tissue that do not meet the criteria for being classified as masses. They can be further categorized as asymmetry, focal asymmetry, global asymmetry, or developing asymmetry. It has been estimated that around 20% of asymmetry cases are associated with malignancy, making them an important area of research [16]. The benign lesions represent 17.6% of the dataset (54 samples), and 82.4% (253 samples) are malignant. The dataset reflects a real clinical scenario; in fact, it is composed of masses (62%), asymmetries (15%), and distortions (23%). Given the large class imbalance, the proprietary dataset was used only for detection.

Data Pre-Processing

For the CBIS-DDSM and INbreast datasets, the coordinates of the ROI bounding boxes required for Yolo training were calculated considering the coordinates of the smallest rectangle containing the segmented lesion. Instead, the ROI coordinates for the proprietary dataset were computed from the square region that inscribes the circle containing the ROI. The CBIS-DDSM dataset has an acceptable size for deep learning architecture training. However, it is composed of scanned film mammograms, much noisier and less detailed than FFDM. For this reason, only for the CBIS-DDSM dataset, contrast limited adaptive histogram equalization (CLAHE) was applied for image enhancement [23], with the following settings: 1 as contrast limit and 2 × 2 as grid size, followed by a 3 × 3 Gaussian filter. For all datasets, the gray levels were scaled in the range 0–255, and the images were resized to 640 × 640 using the Lanczos filter [36, 37]. The CBIS-DDSM dataset was split randomly considering 70% training, 15% validation, and 15% test set. Conversely, the INbreast and the proprietary datasets were split into training (80%) and test (20%) sets, respectively. Considering the small size of the two datasets and the class imbalance, the next section, “Data Augmentation”, discusses data augmentation for class balancing and generation of the validation set (“Techniques for Class Balancing Before the Training Phase”), as well as the procedure to improve the training (“Techniques Used During the Training Phase”).
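As an illustration of the pre-processing steps above, the following sketch applies CLAHE with the reported settings, Gaussian smoothing, gray-level scaling, and Lanczos resizing using OpenCV; the file path and the CLAHE switch are assumptions of the example, not the authors’ code.

```python
import cv2
import numpy as np

# Minimal sketch of the described pre-processing (OpenCV).
def preprocess(path, apply_clahe=False):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # keep the original bit depth
    if apply_clahe:  # used for CBIS-DDSM only, per the text
        clahe = cv2.createCLAHE(clipLimit=1.0, tileGridSize=(2, 2))
        img = clahe.apply(img)
        img = cv2.GaussianBlur(img, (3, 3), 0)
    # scale gray levels to 0-255 and resize with the Lanczos filter
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.resize(img, (640, 640), interpolation=cv2.INTER_LANCZOS4)
```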
Data Augmentation

Techniques for Class Balancing Before the Training Phase

Due to the excessive class imbalance of the INbreast and proprietary datasets, the minority class images (benign) of the training set were augmented. Although the main purpose of the work is to evaluate the detection performance on the proprietary dataset (regardless of lesion class), the following data augmentation procedure was applied to the proprietary dataset before the training phase. Figure 2 summarizes the transformations considered. In particular, 180° rotation and 180° rotation + flip upper-down (UD) were applied to benign images. The other transformations were applied during the training of Yolo, as discussed in the next subsection, “Techniques Used During the Training Phase”. In addition, according to [5], the remaining test dataset was augmented to obtain the validation set. In fact, flip UD, 180° rotation + flip UD, flip left-right (LR), and 180° rotation were applied to benign images, and flip LR to malignant images. Considering the smaller difference between the classes, on INbreast, 180° rotations for malignant masses were also considered [9]. This procedure resulted in the generation of a balanced validation set. In addition, the discussed procedure for the INbreast and proprietary datasets was repeated considering 5 different splittings of training and test sets (5-fold cross-validation).

Fig. 2 Transformations for class balancing and validation set creation. The procedure was repeated implementing the 5-fold cross-validation.

Techniques Used During the Training Phase

Transformations not considered in the previous step were performed during Yolo training. In particular, three different data augmentation configurations were chosen: low, medium, and high. In all cases, image translation, rotation, scale, shear, flip UD, flip LR, and also HSV augmentation were considered. In addition, although it is a common scenario for breast cancer, all three datasets contain few multi-lesion images. Therefore, to improve the model’s capability to detect multiple lesions in the same image, the mosaic technique was used (a minimal sketch is given below). The mosaic augmentation method consists of the generation of a 2 × 2 grid image, containing the considered image and three random images of the dataset. The mosaic technique improves training for two main reasons: (1) merging 4 images results in multiple ROIs in the same image, and the model improves in recognizing multiple ROIs simultaneously; (2) to achieve the same input size, the 4 merged images and their respective ROIs are downsized, improving the detection of smaller lesions.
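The following is a minimal sketch of the 2 × 2 mosaic idea, a simplified stand-in for the augmentation implemented in the Yolo training pipeline; the fixed tiling and helper names are our assumptions.

```python
import cv2
import numpy as np

# Minimal sketch of 2x2 mosaic augmentation.
def mosaic_2x2(imgs, boxes, out_size=640):
    """imgs: four HxW arrays; boxes: per-image lists of [xc, yc, w, h] in [0, 1]."""
    half = out_size // 2
    canvas = np.zeros((out_size, out_size), dtype=imgs[0].dtype)
    merged = []
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # (row, col) corners
    for img, bxs, (oy, ox) in zip(imgs, boxes, offsets):
        canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
        for xc, yc, w, h in bxs:
            # each ROI shrinks by a factor of 2 and shifts into its tile
            merged.append([(xc * half + ox) / out_size,
                           (yc * half + oy) / out_size,
                           w / 2, h / 2])
    return canvas, merged
```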
Table 1 shows the parameter set for each configuration. The values reported for HSV, translation, rotation, scale, and shear indicate the range considered for the random transformation. For flip and mosaic, the value indicates the probability of performing the transformation, so 0.5 is considered a higher level of augmentation because both augmented and non-augmented images are considered for training.

Table 1 Settings for data augmentation during the training phase

Level | H, S, V | Translation | Rotation | Scale | Shear | Flip (UD, LR) | Mosaic
Low | 0.0, 0.0, 0.0 | 0.1 | 5.0 | 0.1 | 5.0 | (0.5, 0.5) | 0.0
Med | 0.007, 0.35, 0.2 | 0.3 | 10.0 | 0.3 | 5.0 | (0.5, 0.5) | 1.0
High | 0.015, 0.7, 0.4 | 0.3 | 20.0 | 0.3 | 10.0 | (0.5, 0.5) | 0.5

Yolo Architectures Training

Like other single-stage object detectors, Yolo consists of three parts: backbone, neck, and head. The backbone is a CNN that extracts and aggregates image features. The neck allows for feature extraction optimized for small, medium, and large object detection. In the end, the three feature maps for small, medium, and large object detection are given as input to the head, composed of convolutional layers for the final prediction. Yolo requires that the image is divided into a grid, then makes a prediction for each grid cell. The prediction consists of a 6-tuple y = (p_c, b_x, b_y, b_h, b_w, c), where (b_x, b_y, b_h, b_w) identify the coordinates (x, y) and sizes (height, width) of the predicted bounding box, p_c represents the probability that there is an object in the cell, and c represents the predicted class. The mechanism of anchors is also used, to allow multiple object detection in the same grid cell. For this reason, the prediction is the 6-tuple discussed for each specified anchor. Each version of Yolo has its own peculiarities, which mainly concern the structure of the feature extractor, that is, the backbone.

YoloV3 Model

YoloV3 is much deeper than the previous two versions and is more accurate but requires more time and data for training. In YoloV3, Darknet53 was used as backbone [10]. Darknet53 is a hybrid approach between Darknet19 (used in YoloV2 [38]) and residual network elements (e.g., BottleNeck) [39], proposed to improve Darknet19 and the efficiency of ResNet-101/152. The short-cut connections allow getting more fine-grained information, leading to better performance for small objects. The feature pyramid network (FPN) [40] is used as neck, allowing the model to learn objects of different sizes: it specializes in detecting large and small objects. In addition, non-maximum suppression is used to select one bounding box out of many overlapping bounding boxes.

YoloV5 Model

YoloV5 uses CSPDarknet53 as its backbone: it exploits the Darknet53 architecture proposed in [10] and employs a CSPNet [41] strategy to partition the feature map of the base layer into two parts and then merge them through a cross-stage hierarchy. In the neck part, PAnet [42] is used to generate the feature pyramid network (FPN) and allow the extraction of multi-scale feature maps. This structure allows the extraction of features optimized for small, medium, and large object detection. YoloV5 was released in nano, small, medium, large, and extra-large versions. The versions differ in the number of convolutional kernels used and thus the number of parameters. In this paper, a comparison between the nano, small, medium, and large versions was performed.

YoloV5-Transformer

In contrast to convolutional networks, Transformers are able to model the relationships among various small patches in the image. The Transformer block assumes the image is split into a sequence of patches, where each patch is flattened to a vector. These flattened image patches are used to create lower-dimensional linear embeddings and fed into a Transformer encoder, composed of multi-head attention, to find local and global dependencies in the image. It has been shown that the introduction of a Transformer block into convolutional networks can improve efficiency and overall accuracy [43]. In YoloV5, the Transformer block was embedded in the penultimate layer of the backbone, that is, among the three convolutional layers preceding the spatial pyramid pooling layer.

Models Training

Considering the small size of both the INbreast and proprietary datasets, training a deep architecture such as Yolo may harm the reliability of the trained models. Therefore, despite being composed of scanned film mammograms, the CBIS-DDSM is employed as source dataset for initial training. The above setup allows the TL technique on the INbreast and proprietary target datasets. Considering that both source and target datasets are labeled, the performed TL was inductive transfer learning [44]. Since Yolo simultaneously solves a regression task to predict bounding-box coordinates and two classification tasks to predict objectness and class score, two different loss functions were employed. For regression, the complete Intersection over Union (IoU) loss was used; for classification, the binary cross-entropy with logits loss function was used in both cases.
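The composite objective can be sketched as follows, assuming torchvision ≥ 0.13 for the complete IoU loss; the loss weights are illustrative, and the real YoloV5 loss differs in detail (per-scale gains, anchor matching).

```python
import torch.nn as nn
from torchvision.ops import complete_box_iou_loss  # torchvision >= 0.13

bce = nn.BCEWithLogitsLoss()

# Minimal sketch of the composite objective described above.
def yolo_loss(pred_boxes, true_boxes, pred_obj, true_obj, pred_cls, true_cls,
              w_box=0.05, w_obj=1.0, w_cls=0.5):
    # regression: complete IoU on (x1, y1, x2, y2) boxes
    box_loss = complete_box_iou_loss(pred_boxes, true_boxes, reduction="mean")
    obj_loss = bce(pred_obj, true_obj)  # objectness: is a lesion present here?
    cls_loss = bce(pred_cls, true_cls)  # class score: benign vs. malignant
    return w_box * box_loss + w_obj * obj_loss + w_cls * cls_loss
```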
Performance Evaluation

The results obtained were presented considering the most common indexes for object detection tasks, such as precision, recall, and average precision. The average precision (AP) is defined as the area under the precision-recall curve. The IoU was set to 0.5. For the CBIS-DDSM and INbreast datasets, AP was calculated for detecting malignant (M AP) and benign (B AP) lesions separately, as well as the mean of the two classes (mAP).

Models Explanation: Eigen-CAM

Examining trained models is essential before incorporating them into actual clinical practice. As a result, our system produces prediction explanations as a second output to fulfill this requirement. Saliency maps have the capability to reveal the pixels or regions that played a significant role in the decision-making process of the system. This effectively highlights all potential ROIs to the physician. Several gradient-based methods such as CAM [45], Grad-CAM [46], and Grad-CAM++ [47] have been proposed to implement interpretability and transparency of deep learning models. In particular, they are class-discriminative visualization methods and require the class probability score for the gradient computations. However, gradient-based methods suffer from this problem: backpropagating any quantity requires additional computational overhead and assumes that the classifier produced correct decisions; whenever a wrong decision is made, all the mentioned methods will produce wrong or distorted visualizations [20]. For this reason, the localization accuracy of the above methods remains weak, especially in the case of incorrect predictions. In addition, while traditional CNNs provide class distributions for each sample, YOLO’s output includes bounding-box coordinates, object presence probabilities in each cell, and class distributions. These issues often make the output non-differentiable and impractical for gradient-based algorithms. As a result, many object detection studies employing Yolo rely on Eigen-CAM for architecture interpretation [48–50]. Eigen-CAM is preferred due to its gradient-free nature and its use of the principal components of the extracted feature maps. It should be noted that gradient-based methods, which rely on output and activation maps, can produce distorted visualizations when predictions are incorrect. To address these issues, this study presents Eigen-CAM for saliency map computation and compares it with the occlusion sensitivity method.

Eigen-CAM is a gradient-free method that computes and visualizes the principal components of the learned features/representations from the convolutional layers, resulting in a method that is intuitive and compatible with all deep learning models. In Eigen-CAM, it is assumed that all relevant spatial features learned over the hierarchy of the CNN model will be preserved during the optimization process, and non-relevant features will be regularized or smoothed out. The Eigen-CAM is computed considering the input image I of size i × j projected onto the last convolutional layer L = K, given by O_{L=K} = W_{L=K} I. The matrix O_{L=K} = U Σ V^T is factorized using the singular value decomposition to obtain the principal components. The activation map is given by the projection on the first eigenvector, L_{Eigen-CAM} = O_{L=K} V_1, where V_1 is the first eigenvector (first column) of the matrix V. Similar to Eigen-CAM, occlusion sensitivity can be linked to image detection tasks, and it is gradient-free and independent of the specific architecture used. It assesses changes in activations resulting from occluding different regions of the image [51].
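A minimal sketch of the Eigen-CAM computation on a captured activation tensor follows; the forward-hook mechanics are omitted, and the centering before the SVD follows common implementations.

```python
import numpy as np

# Minimal sketch of Eigen-CAM; `acts` would come from a forward hook on the
# chosen backbone layer (hook code omitted).
def eigen_cam(acts):
    """acts: numpy array (C, H, W) from the last convolutional layer."""
    c, h, w = acts.shape
    flat = acts.reshape(c, h * w).T     # (H*W, C) observation matrix
    flat = flat - flat.mean(axis=0)     # centering, as in common implementations
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    cam = (flat @ vt[0]).reshape(h, w)  # project on the first eigenvector V_1
    cam = np.maximum(cam, 0)            # keep positive activations only
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```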
The saliency maps have been proposed as a valuable tool to enhance the predictions of YoloV5, which can assist physicians in the diagnostic process, especially when the model fails to make accurate predictions. YoloV5 only provides predictions if they surpass a certain confidence threshold. The purpose of saliency maps is to identify all ROIs and mitigate false negative issues. It has been observed that many cancer types progress to an invasive stage due to the failure of early prediction, even with preliminary signs. Therefore, in contrast to YoloV5’s predictions, saliency maps offer all potential ROIs, even with low confidence. This inevitably leads to an increase in false positives. Considering this, physicians receive two outputs: first, the conventional YoloV5 output, which balances precision and recall, providing only ROIs that exceed a certain confidence level. In addition, saliency maps propose all potential ROIs, which may serve as early cancer indications, even if their probability of being lesions (i.e., not exceeding the threshold) is low. Thus, a simple predictive model transforms into a decision-support system, as physicians receive not only a definitive decision but also suggestions of lesions that the system recommends paying attention to.

Results

The experiments were performed in Google Colaboratory Pro, using a Python 3 environment. The PyTorch implementation proposed by Ultralytics [52] was exploited, and the Weights & Biases platform [53] was used to monitor the training process. The trainings were performed for 100 epochs with a batch size of 16. The validation mAP was used for model selection, considering the best model as a weighted combination of the [email protected] and [email protected]:0.95 metrics, with weights of 0.9 and 0.1, respectively (see the sketch below).
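The checkpoint-selection criterion can be sketched as follows; the weights follow the text, and the dataset-configuration file name in the comment is hypothetical.

```python
# A typical fine-tuning run with the Ultralytics YoloV5 repository would be
# launched as, e.g.:
#   python train.py --img 640 --batch 16 --epochs 100 \
#       --data breast.yaml --weights yolov5s.pt
# where breast.yaml is a hypothetical dataset-configuration file.

def fitness(map50, map50_95):
    # weighted combination used for model selection, per the text
    return 0.9 * map50 + 0.1 * map50_95

# toy example: pick the best epoch from logged validation metrics
epoch_metrics = [{"map50": 0.38, "map50_95": 0.17},
                 {"map50": 0.40, "map50_95": 0.18}]
best = max(epoch_metrics, key=lambda m: fitness(m["map50"], m["map50_95"]))
```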
CBIS-DDSM Results and Data-Augmentation Improvements

The CBIS-DDSM dataset was used to evaluate the optimal YoloV5 architecture and for hyperparameter optimization, considering the nano, small, medium, and large versions. Then, it was exploited as source dataset to implement inductive TL and improve the generalization capabilities on INbreast and proprietary FFDM images. For this reason, given the huge amount of hyperparameters, an initial analysis was performed using all the proposed default values for each model. Table 2 shows the achieved results for each version of YoloV5. The nano and large versions have a lower mAP than the small and medium versions. Conversely, the small model, compared with the medium model, results in a more balanced precision and recall pair, while it contains about one-third of its parameters. Therefore, all subsequent experiments were carried out considering only the small model.

Table 2 Comparison of the nano, small, medium, and large architectures of YoloV5 on the CBIS-DDSM dataset, considering all default hyperparameters

Model | B AP | M AP | Precision | Recall | mAP
n | 0.257 | 0.479 | 0.473 | 0.408 | 0.368
s | 0.257 | 0.518 | 0.447 | 0.427 | 0.387
m | 0.280 | 0.514 | 0.489 | 0.403 | 0.397
l | 0.239 | 0.488 | 0.491 | 0.377 | 0.364

Table 3 shows that the histogram equalization specified in the data pre-processing section improves the model performance. In addition, the Adam optimizer with a learning rate of 0.001 outperforms the default stochastic gradient descent (SGD) optimizer with a learning rate of 0.01. Therefore, experiments to evaluate the impact of data augmentation were carried out using the equalized dataset and the Adam optimizer. Table 3 shows how the results improve as data augmentation increases. The extensive data augmentation employed emphasizes the necessity for substantial amounts of data when training this deep architecture, confirming the choice of using the CBIS-DDSM dataset to perform TL on the INbreast and proprietary datasets.

Table 3 Performance of the YoloV5 small version, considering the equalized CBIS-DDSM dataset, the Adam optimizer, and the three data augmentation configurations

Hyps | B AP | M AP | Precision | Recall | mAP
Equal | 0.300 | 0.501 | 0.487 | 0.408 | 0.400
Adam+equal | 0.321 | 0.555 | 0.487 | 0.464 | 0.438
aug-low | 0.241 | 0.49 | 0.46 | 0.394 | 0.366
aug-med | 0.337 | 0.549 | 0.497 | 0.487 | 0.433
aug-high | 0.361 | 0.634 | 0.566 | 0.482 | 0.498

INbreast Results and Transfer Learning Evaluation

Exploiting the optimized hyperparameters for the CBIS-DDSM dataset, the YoloV3 and YoloV5-Transformer models were also trained on the CBIS-DDSM dataset, to implement the TL technique on the INbreast target dataset. Table 4 shows the achieved results. Considering the dataset size, the performance was calculated in 5-fold cross-validation, and the mean and standard deviation were reported for each metric. The best training protocol for the CBIS-DDSM, that is, Adam optimizer, high data augmentation, and a batch size of 16, was used for all the experiments. In addition, INbreast was also trained from scratch to show the difference in accuracy with and without TL. The YoloV5s model outperforms its previous version YoloV3 and also the YoloV5-Transformer. YoloV3 contains a feature extractor with more parameters than YoloV5s and the Transformer (about 61 vs. 7 million) and therefore needs a larger amount of data for training. In addition, the YoloV5-Transformer version showed lower performance while having a comparable number of parameters to YoloV5s. Comparing YoloV5s training from scratch and with TL on INbreast, an increase of 0.061 in mAP and 0.119 in B AP was calculated. The imbalance of the dataset clearly reflects the model performance: the detection rate for benign lesions, the minority class, is lower than for malignant lesions for each considered model.

Table 4 5-fold results for the three used architectures on the INbreast dataset (Tr is for Transformer; NoTL is the training without transfer learning)

Model | B AP | M AP | Precision | Recall | mAP
YoloV3 | 0.585 ± 0.093 | 0.890 ± 0.036 | 0.785 ± 0.012 | 0.695 ± 0.104 | 0.738 ± 0.061
YoloV5s-Tr | 0.642 ± 0.060 | 0.894 ± 0.054 | 0.799 ± 0.118 | 0.742 ± 0.146 | 0.771 ± 0.048
YoloV5s-NoTL | 0.652 ± 0.051 | 0.890 ± 0.047 | 0.835 ± 0.059 | 0.713 ± 0.770 | 0.771 ± 0.038
YoloV5s | 0.771 ± 0.131 | 0.898 ± 0.069 | 0.854 ± 0.097 | 0.729 ± 0.100 | 0.835 ± 0.098
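For reference, a minimal sketch of the AP metric reported in the tables above (area under the precision-recall curve at IoU = 0.5) is given below; the monotone precision envelope is omitted for brevity, and all names are illustrative.

```python
import numpy as np

# Minimal sketch of AP; `is_tp` marks detections that matched a ground-truth
# lesion with IoU >= 0.5, and `n_gt` is the number of ground-truth lesions.
def average_precision(scores, is_tp, n_gt):
    order = np.argsort(-np.asarray(scores))        # sort by confidence
    tp = np.cumsum(np.asarray(is_tp, float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, float)[order])
    recall = tp / n_gt
    precision = tp / (tp + fp)
    return float(np.trapz(precision, recall))      # area under the PR curve
```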
Proprietary Dataset Results and Transfer Learning Evaluation

The YoloV5s model was the most accurate for the two open-source datasets and was used for lesion detection on the proprietary dataset. The model trained using the CBIS-DDSM as source dataset and INbreast as target dataset was the checkpoint to start training on the proprietary dataset. For this reason, the model trained on the proprietary dataset brings the knowledge learned on CBIS-DDSM and INbreast. Figure 3 shows the difference in validation mAP calculated during training with and without transfer learning. In particular, a higher initial mAP, faster mAP growth in the early epochs, and a higher mAP asymptote were calculated using transfer learning [54]. The result was confirmed on the test set, with an mAP of 0.561 and 0.61 without and with transfer learning, respectively. Table 5 shows the results computed within the 5-fold cross-validation strategy.

Fig. 3 Training performance with (green) and without (red) transfer learning on the proprietary dataset.

Table 5 5-fold results on the proprietary dataset, considering the training with and without transfer learning

Model | Precision | Recall | mAP
YoloV5s no-TL | 0.665 ± 0.054 | 0.541 ± 0.043 | 0.561 ± 0.053
YoloV5s TL | 0.726 ± 0.110 | 0.591 ± 0.063 | 0.621 ± 0.035

Explainability Results

To evaluate the performance using XAI methods, we conducted a manual analysis on a proprietary dataset subset consisting of 50 images and 56 lesions. No healthy images were considered. Our focus was evaluating the differences in false positives and false negatives using two XAI techniques: Eigen-CAM and occlusion sensitivity. Through a qualitative analysis, the generated saliency maps do not exhibit complete overlap, as shown in Figs. 4 and 5. However, a poor overlap between saliency maps calculated through different methods has been widely shown in the literature [55–57]. More specifically, it has been observed that, with occlusion sensitivity, the regions linked to lesions appear only slightly illuminated compared to Eigen-CAM, where they are more prominently highlighted. In addition, the quantitative analysis showed the superiority of Eigen-CAM for this object detection task in mammography. Table 6 summarizes the results. In the selected subset, the Yolo model correctly detected 41 lesions but missed 15 lesions (false negatives) and incorrectly identified 19 non-existent lesions (false positives). However, when we employed Eigen-CAM, we observed better results: out of the 56 lesions, 52 were correctly detected, reducing the false negatives to just 4. However, the use of Eigen-CAM led to an increase in false positives, with a total of 34. On the other hand, the occlusion sensitivity method did not perform as well as Eigen-CAM, showing an increase in false negatives to 20 and false positives to 55.

Table 6 Performance variation through the use of saliency maps

Model | Lesions # | TP | FP | FN
Yolo-based | 56 | 41 | 19 | 15
Eigen-CAM | 56 | 52 | 34 | 4
OS | 56 | 36 | 55 | 20
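For reference, a minimal sketch of the occlusion sensitivity procedure compared above is given below; `predict_conf` stands in for a call to the trained detector and, like the patch and stride settings, is an assumption of the example.

```python
import numpy as np

# Minimal sketch of occlusion sensitivity for a detector: slide a patch over
# the image and record how much the top prediction confidence drops.
def occlusion_map(img, predict_conf, patch=64, stride=64, fill=0):
    base = predict_conf(img)
    h, w = img.shape[:2]
    rows, cols = (h - 1) // stride + 1, (w - 1) // stride + 1
    heat = np.zeros((rows, cols))
    for i, y in enumerate(range(0, h, stride)):
        for j, x in enumerate(range(0, w, stride)):
            occluded = img.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - predict_conf(occluded)  # large drop = salient
    return heat
```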
Discussion

Performance and Transfer Learning Importance

The proposed work for breast cancer detection introduces several novelties and advantages. Three different datasets were considered. The CBIS-DDSM is the largest and therefore the most appropriate for deep training. However, it is composed of scanned film mammograms, resulting in images that are notably distinct from FFDM images. Conversely, the INbreast and the proprietary FFDM datasets can be considered a good benchmark for testing Yolo on real clinical practice images. For this reason, the CBIS-DDSM dataset was used to obtain an optimized pre-training compared with the common COCO dataset (the usual benchmark for Yolo). In fact, the COCO dataset is used for the recognition of objects, cars, people, etc., in real-life images, in each case with a significantly different distribution than breast cancer in mammograms. Then, for all experiments, the transfer learning technique was exploited using the CBIS-DDSM as source dataset, and different Yolo architectures were compared. Considering that Yolo architectures evolve to improve both accuracy and inference speed, it was not obvious that YoloV5 would prove more accurate than YoloV3. Moreover, among the various versions of YoloV5, the small version was the most accurate, also compared with the YoloV5s-Transformer. The performance obtained on the proprietary dataset was lower than on INbreast. However, our dataset contains three times the number of lesions, allowing for a more accurate evaluation of the models. Also, although both are datasets for breast cancer analysis, it is natural that the distributions, and consequently the trainings, differ. In fact, INbreast was acquired with a MammoNovation Siemens FFDM machine with a pixel size of 70 µm, and our dataset with a Fujifilm FFDM machine with a pixel size of 50 µm. The spatial resolution is also very different: 3328×4084 or 2560×3328 for INbreast and 5928×4728 for the proprietary dataset. Moreover, the main difference lies in the heterogeneity of the datasets. In fact, for INbreast, the 107 considered abnormalities are only masses, with 2 asymmetries. In contrast, our dataset is mainly composed of masses (62%), but also of asymmetries (15%) and distortions (23%). The presence of these types of lesions, which account for 38% of our dataset, poses an additional challenge for accurate detection. In fact, according to BI-RADS [58], the term architectural distortion (AD) is used when the normal architecture is distorted with no definite visible mass. AD is not always a sign of cancer and may represent different benign processes and high-risk lesions [59], and it is responsible for 12 to 45% of breast cancers missed during screening [60].
Asymmetries are areas of fibroglandular tissue visible on only one mammographic projection, mostly caused by the superimposition of normal breast tissue. There are different types of asymmetries: for example, the developing asymmetry has a 15% risk of malignancy [61], while the global asymmetry is mostly a normal variant. Although this introduces a significant level of complexity, it moves the system towards the real-world clinical scenario. For this reason, the achieved results are encouraging and demonstrate that breast cancer detection can be addressed without reducing the task to patch classification.

Comparison

An accurate comparison with other studies is complex because of different datasets, pre-processing, and training protocols. However, Table 7 shows some similar works. In [62], the OPTIMAM dataset (OMI-H), composed of about 5300 mammograms, was used as source dataset to perform TL on the INbreast dataset. Using the Faster R-CNN architecture, they obtained an AUC-ROC of 0.79 and 0.95 for benign and malignant lesion detection. YoloV1 was used in [4], resulting in 99.5 and 99.9 for benign and malignant lesion detection in the DDSM dataset. Yolo9000 (i.e., YoloV2) is used in [63]: in contrast to our system, localization and classification performance were evaluated separately on the INbreast dataset. In particular, first the lesions are localized, and then only the localized ones are classified, resulting in a detection accuracy of 97.2 and a classification accuracy of 95.3. The most similar work to ours in terms of evaluation protocol and workflow was proposed by Aly et al. [5]. Using YoloV3, they obtained an AP of 94.2 and 84.6 for benign and malignant detection, respectively. However, their reported best results are computed using a higher image spatial resolution (832×832 vs. our 640×640), and the results were reported in 5-fold cross-validation only for the 448×448 spatial resolution. In fact, comparing our result on the best fold with their result on 608×608 images, we obtained an AP of 88.5 (vs. their 87.5) and 92.2 (vs. their 80.8) for benign and malignant detection, respectively. As illustrated in Aly et al. [5], increasing the image size proves beneficial for the learning process. However, the disparity between experiments conducted with sizes of 448×448 vs. 608×608 is quite substantial, but it diminishes significantly when considering the size of 832×832. This finding suggests that larger image sizes may yield slightly improved results, while the increased complexity of the models and the associated optimization could pose a considerably increased computational cost.

Table 7 Comparison between the proposed and other breast cancer detection works, considering the INbreast dataset. (Det, detection; Cls, classification; Acc, accuracy; AP, average precision; → is for TL from dataset1 to dataset2)

Paper | Architecture | Dataset | Performance
[62] | Faster R-CNN | Optimam → INbreast | AUC B: 0.79; M: 0.95
[4] | YoloV1 | DDSM | AUC B: 99.5; M: 99.9
[63] | YoloV2 | DDSM & INbreast | Det. Acc: 97.2; Cls Acc (AUC): 95.3
[5] | YoloV3 | INbreast | AP B: 94.2; M: 84.6
Our | YoloV5s | CBIS-DDSM → INbreast | AP B: 0.771 ± 0.131; M: 0.898 ± 0.069

Explainability Discussion

Despite the encouraging performance, the system must be both accurate and trusted by physicians for its integration into real clinical practice. Therefore, an introspection and explanation of the trained model were conducted via Eigen-CAM. Figures 4 and 5 show two saliency maps generated via the Eigen-CAM and occlusion sensitivity methods. In particular, the former image represents a correct prediction and the latter an incorrect prediction. In Fig. 4, the Eigen-CAM heat map results brightest around the predicted lesion, but it is suggested that the physician should also pay attention to other areas of the image. In Fig. 5, instead, the model makes an error in prediction (missed detection). In this figure, the advantage of using a gradient-free method can be seen. In fact, the generated Eigen-CAM heat map identifies several salient areas that demand the physician’s attention.

Fig. 4 Example of a bounding-box prediction on the left and the respective saliency map on the right. The ROI is correctly predicted with a confidence index of 0.6. However, other suspicious areas are also highlighted on the saliency map.

Fig. 5 Example of a wrong prediction on the left and the respective saliency map on the right. Despite the error, the saliency map calculated via Eigen-CAM provides several suspicious ROIs, as well as the miss-detected lesion (marked with the white bounding box).

In addition, the saliency maps depicted in Figs. 4 and 5 indicate that the activations primarily concentrate on the breast region. Any minimal activations observed outside this area (in the Eigen-CAM maps) can be attributed to artifacts and are not considered confounding factors for the physician. It is possible to speculate that the slight activations at the black edges of the images might assist in aligning the coordinates of the bounding boxes predicted in the opposite area of the image, where only the background is present. The obtained saliency maps are class-independent, as confirmed by clinical literature findings, where mammography is typically employed as a screening examination aimed at identifying certain abnormalities. On the other hand, other examination modalities, such as MRI, are more informative for characterization purposes and are thus considered secondary examinations [12, 64].
Based on these findings, Eigen-CAM proves to be the more suitable method with respect to occlusion sensitivity for generating saliency maps in object detection tasks. Despite the unavoidable increase in false positives, the reduction in false negatives was significant. This reduction is particularly important from a clinical perspective, as it enables early diagnosis and facilitates the scheduling of further examinations by ruling out the growth of invasive lesions. Considering these factors, we believe that saliency maps should complement, rather than replace, the outputs of the Yolo model. In fact, Yolo’s predictions are strict, with a small number of false positives, while Eigen-CAM’s predictions are more conservative, with a minimal number of false negatives. Above all, these outputs should be seen as a qualitative tool that always requires clinical radiologic evaluation. For this reason, it is the responsibility of the physician to determine which areas necessitate additional examination.

Conclusions

In this work, a Yolo-based model was proposed for breast cancer detection. Although the CBIS-DDSM dataset is composed of scanned film mammograms, the use of the transfer learning technique improves the models’ generalization capabilities when Yolo is fine-tuned with FFDM images (the INbreast and proprietary datasets). The results obtained on the INbreast dataset were exploited to train YoloV5 on the proprietary dataset. The performance obtained is very encouraging, also considering the heterogeneity of the proprietary dataset, which is composed of particularly difficult-to-recognize lesions such as asymmetries and distortions. In addition, the use of the saliency maps makes the internal process of deep learning models transparent and encourages the integration of our model within a clinical decision support system. In fact, the gradient-free Eigen-CAM method highlights all the suspicious ROIs, also in incorrect prediction scenarios. For this reason, it represents the enhanced output of our model. The proposed model represents a trusted predictive system to support cognitive, decision-making, and control processes in clinical practice. In addition, the XAI results pave the way for a prospective study in which the diagnostic performance of physicians is evaluated with and without the support of both the Yolo and Eigen-CAM outputs, using an external data cohort. This represents a step towards the integration of data-driven systems into real clinical practice.

Funding Open access funding provided by Università degli Studi di Palermo within the CRUI-CARE Agreement. This work was partially supported by the University of Palermo Grant EUROSTART, CUP B79J21038330001, Project TRUSTAI4NCDI.

Data Availability Data will be made available on reasonable request.

Declarations

Ethical Approval Retrospective data collection was approved by the local ethics committee.

Consent to Participate The requirement for evidence of informed consent was waived because of the retrospective nature of our study.

Conflict of Interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians. 2021;71(3):209–249. https://doi.org/10.3322/caac.21660.
2. Duffy SW, Tabár L, Yen AM-F, Dean PB, Smith RA, Jonsson H, Törnberg S, Chen SL-S, Chiu SY-H, Fann JC-Y, Ku MM-S, Wu WY-Y, Hsu C-Y, Chen Y-C, Svane G, Azavedo E, Grundström H, Sundén P, Leifland K, Frodis E, Ramos J, Epstein B, Åkerlund A, Sundbom A, Bordás P, Wallin H, Starck L, Björkgren A, Carlson S, Fredriksson I, Ahlgren J, Öhman D, Holmberg L, Chen TH-H. Mammography screening reduces rates of advanced and fatal breast cancers: results in 549,091 women. Cancer. 2020;126(13):2971–2979. https://doi.org/10.1002/cncr.32859.
3. Ekpo EU, Alakhras M, Brennan P. Errors in mammography cannot be solved through technology alone. Asian Pac J Cancer Prev: APJCP. 2018;19(2):291. https://doi.org/10.22034/APJCP.2018.19.2.291.
4. Al-Masni MA, Al-Antari MA, Park J-M, Gi G, Kim T-Y, Rivera P, Valarezo E, Choi M-T, Han S-M, Kim T-S. Simultaneous detection and classification of breast masses in digital mammograms via a deep learning Yolo-based CAD system. Comput Methods Programs Biomed. 2018;157:85–94. https://doi.org/10.1016/j.cmpb.2018.01.017.
5. Aly GH, Marey M, El-Sayed SA, Tolba MF. Yolo based breast masses detection and classification in full-field digital mammograms. Comput Methods Programs Biomed. 2021;200:105823. https://doi.org/10.1016/j.cmpb.2020.105823.
6. Baccouche A, Garcia-Zapirain B, Olea CC, Elmaghraby AS. Breast lesions detection and classification via Yolo-based fusion models. Comput Mater Contin. 2021;69:1407–1425. https://doi.org/10.32604/cmc.2021.018461.
7. Jung H, Kim B, Lee I, Yoo M, Lee J, Ham S, Woo O, Kang J. Detection of masses in mammograms using a one-stage object detector based on a deep convolutional neural network. PLoS One. 2018;13(9):e0203355. https://doi.org/10.1371/journal.pone.0203355.
8. Darma IWAS, Suciati N, Siahaan D. A performance comparison of balinese carving motif detection and recognition using YOLOv5 and mask R-CNN. In: 2021 5th International Conference on Informatics and Computational Sciences (ICICoS). 2021. pp. 52–57. https://doi.org/10.1109/ICICoS53627.2021.9651855.
9. Prinzi F, Insalaco M, Gaglio S, Vitabile S. Breast cancer localization and classification in mammograms using YoloV5. In: Esposito A, Faundez-Zanuy M, Morabito FC, Pasero E, editors. Applications of artificial intelligence and neural systems to data science. Smart innovation, systems and technologies. Vol. 360. Singapore: Springer; 2023. https://doi.org/10.1007/978-981-99-3592-5_7.
10. Redmon J, Farhadi A. YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767. 2018. https://doi.org/10.48550/arXiv.1804.02767.
11. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020. https://doi.org/10.48550/arXiv.2010.11929.
12. Prinzi F, Orlando A, Gaglio S, Midiri M, Vitabile S. ML-based radiomics analysis for breast cancer classification in DCE-MRI. In: Applied Intelligence and Informatics: Second International Conference, AII 2022, Reggio Calabria, Italy, September 1–3, 2022, Proceedings. Springer; 2023. pp. 144–158. https://doi.org/10.1007/978-3-031-24801-6_11.
13. Chugh G, Kumar S, Singh N. Survey on machine learning and deep learning applications in breast cancer diagnosis. Cognit Comput. 2021:1–20. https://doi.org/10.1007/s12559-020-09813-6.
14. Lee RS, Gimenez F, Hoogi A, Miyake KK, Gorovoy M, Rubin DL. A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific Data. 2017;4(1):1–9. https://doi.org/10.1038/sdata.2017.177.
15. Moreira IC, Amaral I, Domingues I, Cardoso A, Cardoso MJ, Cardoso JS. INbreast: toward a full-field digital mammographic database. Acad Radiol. 2012;19(2):236–48. https://doi.org/10.1016/j.acra.2011.09.014.
16. Abdelrahman L, Al Ghamdi M, Collado-Mesa F, Abdel-Mottaleb M. Convolutional neural networks for breast cancer detection in mammography: a survey. Comput Biol Med. 2021;131. https://doi.org/10.1016/j.compbiomed.2021.104248.
17. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv (CSUR). 2018;51(5):1–42. https://doi.org/10.1145/3236009.
18. Lipton ZC. The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue. 2018;16(3):31–57. https://doi.org/10.1145/3236386.3241340.
19. Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang G-Z. XAI-explainable artificial intelligence. Sci Robot. 2019;4(37). https://doi.org/10.1126/scirobotics.aay7120.
20. Muhammad MB, Yeasin M. Eigen-CAM: class activation map using principal components. In: 2020 International Joint Conference on Neural Networks (IJCNN). 2020. pp. 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626.
21. Zhu H, Chen B, Yang C. Understanding why ViT trains badly on small datasets: an intuitive perspective. arXiv preprint arXiv:2302.03751. 2023.
22. Muduli D, Dash R, Majhi B. Automated diagnosis of breast cancer using multi-modal datasets: a deep convolution neural network based approach. Biomed Signal Process Control. 2022;71. https://doi.org/10.1016/j.bspc.2021.102825.
23. Mahmood T, Li J, Pei Y, Akhtar F, Rehman MU, Wasti SH. Breast lesions classifications of mammographic images using a deep convolutional neural network-based approach. PLoS One. 2022;17(1):e0263126. https://doi.org/10.1371/journal.pone.0263126.
24. Soulami KB, Kaabouch N, Saidi MN. Breast cancer: classification of suspicious regions in digital mammograms based on capsule network. Biomed Signal Process Control. 2022;76. https://doi.org/10.1016/j.bspc.2022.103696.
25. Ragab DA, Attallah O, Sharkas M, Ren J, Marshall S. A framework for breast cancer classification using multi-DCNNs. Comput Biol Med. 2021;131. https://doi.org/10.1016/j.compbiomed.2021.104245.
26. Yu X, Pang W, Xu Q, Liang M. Mammographic image classification with deep fusion learning. Sci Rep. 2020;10(1):1–11. https://doi.org/10.1038/s41598-020-71431-x.
27. Agarwal R, Diaz O, Lladó X, Yap MH, Martí R. Automatic mass detection in mammograms using deep convolutional neural networks. J Med Imaging. 2019;6(3):031409. https://doi.org/10.1117/1.JMI.6.3.031409.
28. AlGhamdi M, Abdel-Mottaleb M. DV-DCNN: dual-view deep convolutional neural network for matching detected masses in mammograms. Comput Methods Programs Biomed. 2021;207. https://doi.org/10.1016/j.cmpb.2021.106152.
29. Lin T-Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell. 2020;42(2):318–27. https://doi.org/10.1109/TPAMI.2018.2858826.
30. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digit Signal Process. 2018;73:1–15. https://doi.org/10.1016/j.dsp.2017.10.011.
31. Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García S, Gil-López S, Molina D, Benjamins R, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. 2020;58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
32. Kulesza T, Burnett M, Wong W-K, Stumpf S. Principles of explanatory debugging to personalize interactive machine learning. In: Proceedings of the 20th International Conference on Intelligent User Interfaces. 2015. pp. 126–137. https://doi.org/10.1145/2678025.2701399.
33. Pocevičiūtė M, Eilertsen G, Lundström C. Survey of XAI in digital pathology. In: Holzinger A, Goebel R, Mengel M, Müller H, editors. Cham: Springer; 2020. pp. 56–88. https://doi.org/10.1007/978-3-030-50402-1_4.
acra. 2011. 09. 014. R., Mengel, M., Müller, H. (eds.) Survey of XAI in digital pathol- 16. Abdelrahman L, Al Ghamdi M, Collado-Mesa F, Abdel-Mottaleb ogy. 2020;pp. 56–88. Springer, Cham. https:// doi. org/ 10. 1007/ M. Convolutional neural networks for breast cancer detection in 978-3- 030- 50402-1_4. mammography: a survey. Comput Biol Med. 2021;131. https:// 34. Durand MA, Wang S, Hooley RJ, Raghu M, Philpotts LE. doi. org/ 10. 1016/j. compb iomed. 2021. 104248. Tomosynthesis-detected architectural distortion: management 17. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, algorithm with radiologic-pathologic correlation. Radiographics. Pedreschi D. A survey of methods for explaining black box mod- 2016;36(2):311–21. https:// doi. org/ 10. 1148/ rg. 20161 50093. els. ACM Comput Surv (CSUR). 2018;51(5):1–42. https:// doi. 35. Oyelade ON, Ezugwu AE-S. A state-of-the-art survey on deep org/ 10. 1145/ 32360 09. learning methods for detection of architectural distortion from 18. Lipton ZC. The mythos of model interpretability: in machine digital mammography. IEEE Access. 2020;8:148644–76. https:// learning, the concept of interpretability is both important and slip-doi. org/ 10. 1109/ ACCESS. 2020. 30162 23. pery. Queue. 2018;16(3):31–57. https://doi. or g/10. 1145/ 32363 86. 36. Al-Dhabyani W, Gomaa M, Khaled H, Aly F. Deep learning 32413 40. approaches for data augmentation and classification of breast 19. Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang G-Z. masses using ultrasound images. Int J Adv Comput Sci Appl. Xai-explainable artificial intelligence Science robotics. 2019;10(5):1–11. https://doi. or g/10. 14569/ IJ ACSA.2019. 01005 79 . 2019;4(37):7120. https:// doi. org/ 10. 1126/ sciro botics. aay71 20. 37. Kyono T, Gilbert FJ, van der Schaar M. MAMMO: a deep learn- 20. Muhammad MB, Yeasin M. Eigen-CAM: class activation map ing solution for facilitating radiologist-machine collaboration in using principal components. In: 2020 International Joint Confer- breast cancer diagnosis. arXiv preprint arXiv:1811.02661. 2018. ence on Neural Networks (IJCNN), 2020;pp. 1–7. https://doi. or g/ https:// doi. org/ 10. 48550/ arXiv. 1811. 02661. 10. 1109/ IJCNN 48605. 2020. 92066 26. 38. Redmon J, Farhadi A. Yolo9000: better, faster, stronger. In: Pro- 21. Zhu H, Chen B, Yang C. Understanding why ViT trains badly ceedings of the IEEE Conference on Computer Vision and Pattern on small datasets: an intuitive perspective. arXiv preprint Recognition. 2017;pp. 7263–7271. https://doi. or g/10. 1109/ CVPR. arXiv:2302.03751. 2023.2017. 690. 22. Muduli D, Dash R, Majhi B. Automated diagnosis of breast cancer 39. He K, Zhang X, Ren S, Sun J. Deep residual learning for image using multi-modal datasets: a deep convolution neural network recognition. In: 2016 IEEE Conference on Computer Vision and based approach. Biomed Signal Process Control. 2022;71. https:// Pattern Recognition (CVPR), 2016;pp. 770–778. https:// doi. org/ doi. org/ 10. 1016/j. bspc. 2021. 102825.10. 1109/ CVPR. 2016. 90. 40. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. 23. Mahmood T, Li J, Pei Y, Akhtar F, Rehman MU, Wasti SH. Feature pyramid networks for object detection. In: Proceedings Breast lesions classifications of mammographic images using 1 3 Cognitive Computation of the IEEE Conference on Computer Vision and Pattern Rec- 52. Ultralytics: YoloV5 Ultralytics Github. 2022. (Last accessed ognition. 2017;pp. 2117–2125. https:// doi. org/ 10. 48550/ arXiv. 24-Jan-2023). https:// github. 
com/ ultra lytics/ yolov5. 1612. 03144. 53. wandb: Weights & Biases. 2022. (Last accessed 24-Jan-2023). 41. Wang C-Y, MarkLiaoH-Y, Wu Y-H, Chen P-Y, Hsieh J-W, Yeh https:// github. com/ wandb/ wandb. I-H. CSPNet: a new backbone that can enhance learning capability 54. Torrey L, Shavlik J. Chapter 11 transfer learning. In: Handbook of of CNN. In: 2020 IEEE/CVF Conference on Computer Vision and research on machine learning applications and trends: algorithms, Pattern Recognition Workshops (CVPRW). 2020;pp. 1571–1580. methods, and techniques. 2010;pp. 242–264. https:// doi. org/ 10. https:// doi. org/ 10. 1109/ CVPRW 50498. 2020. 00203.4018/ 978-1- 60566- 766-9. 42. Liu S, Qi L, Qin H, Shi J, Jia J. Path aggregation network for 55. Zhang J, Chao H, Kalra MK, Wang G, Yan P. Overlooked trust- instance segmentation. In: 2018 IEEE/CVF Conference on Com- worthiness of explainability in medical AI. medRxiv. 2021. puter Vision and Pattern Recognition. 2018;pp. 8759–8768. 56. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of https:// doi. org/ 10. 1109/ CVPR. 2018. 00913. current approaches to explainable artificial intelligence in health 43. Wu B, Xu C, Dai X, Wan A, Zhang P, Yan Z, Tomizuka M, care. Lancet Digital Health. 2021;3(11):745–50. https:// doi. org/ Gonzalez J, Keutzer K, Vajda P. Visual transformers: token-10. 1016/ S2589- 7500(21) 00208-9. based image representation and processing for computer vision. 57. Bodria F, Giannotti F, Guidotti R, Naretto F, Pedreschi D, Rinzivillo arXiv preprint arXiv:2006.03677. 2020. https:// doi. or g/ 10. S. Benchmarking and survey of explanation methods for black box 48550/ arXiv. 2006. 03677. models. arXiv preprint arXiv:2102.13076. 2021. 44. Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer 58. ACR: American college of radiology et.al: ACR BI-RADS Atlas: learning. J Big Data. 2016;3(1):1–40. https:// doi. org/ 10. 1186/ breast imaging reporting and data system. Reston, VA: American s40537- 016- 0043-6. College of Radiology 2014. 2013;pp. 37–78. 45. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning 59. Babkina TM, Gurando AV, Kozarenko TM, Gurando VR, Telniy VV, deep features for discriminative localization. In: Proceedings of the Pominchuk DV. Detection of breast cancers represented as architec- IEEE Conference on Computer Vision and Pattern Recognition. tural distortion: a comparison of full-field digital mammography and 2016;pp. 2921–2929. https:// doi. org/ 10. 1109/ CVPR. 2016. 319. digital breast tomosynthesis. Wiad Lek. 2021;74(7):1674–9. https:// 46. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, doi. org/ 10. 36740/ WLek2 02107 121. Batra D. Grad-CAM: visual explanations from deep networks 60. Rangayyan RM, Banik S, Desautels J. Computer-aided detection via gradient-based localization. In: 2017 IEEE International of architectural distortion in prior mammograms of interval can- Conference on Computer Vision (ICCV). 2017;pp. 618–626. cer. J Digit Imaging. 2010;23(5):611–31. https://doi. or g/10. 1007/ https:// doi. org/ 10. 1109/ ICCV. 2017. 74.s10278- 009- 9257-x. 47. Chattopadhay A, Sarkar A, Howlader P, Balasubramanian VN. 61. Arian A, Dinas K, Pratilas GC, Alipour S. The breast imaging- Grad-CAM++: generalized gradient-based visual explanations for reporting and data system (BI-RADS) made easy. Iran J Radiol. deep convolutional networks. In: 2018 IEEE Winter Conference 2022;19(1). https:// doi. org/ 10. 5812/ iranj radiol- 121155. on Applications of Computer Vision (WACV). 2018;pp. 839–847. 62. 
Agarwal R, Díaz O, Yap MH, Lladó X, Martí R. Deep learning https:// doi. org/ 10. 1109/ WACV. 2018. 00097. for mass detection in full field digital mammograms. Comput Biol 48. Tan Q, Xie W, Tang H, Li Y. Multi-scale attention adaptive net- Med. 2020;121:103774. https://do i. org/ 10. 1016/j. compb iomed. work for object detection in remote sensing images. In: 2022 5th 2020. 103774. International Conference on Information Communication and 63. Al-Antari MA, Han S-M, Kim T-S. Evaluation of deep learning Signal Processing (ICICSP). 2022;pp. 218–223. https:// doi. org/ detection and classification towards computer-aided diagnosis of 10. 1109/ ICICS P55539. 2022. 10050 627. IEEE. breast lesions in digital X-ray mammograms. Comput Methods 49. Li W, Huang L. YOLOSA: object detection based on 2D local Programs Biomed. 2020;196. https:// doi. or g/ 10. 1016/j. cm pb. feature superimposed self-attention. Pattern Recognition Letters. 2020. 105584. 2023;168:86–92. https:// doi. org/ 10. 1016/j. patrec. 2023. 03. 003. 64. Militello C, Rundo L, Dimarco M, Orlando A, Woitek R, 50. Qiu M, Christopher LA, Chien S, Chen Y. Attention mechanism D’Angelo I, Russo G, Bartolotta TV. 3D DCE-MRI radiomic improves YOLOv5x for detecting vehicles on surveillance videos. analysis for malignant lesion prediction in breast cancer patients. In: 2022 IEEE Applied Imagery Pattern Recognition Workshop Acad Radiol. 2022;29(6):830–40. https:// doi. org/ 10. 1016/j. acra. (AIPR), 2022;pp. 1–8. https://doi. or g/10. 1109/ AIPR5 7179. 2022. 2021. 08. 024. 10092 237. IEEE. 51. Zeiler MD, Fergus R. Visualizing and understanding convolu- Publisher's Note Springer Nature remains neutral with regard to tional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, jurisdictional claims in published maps and institutional affiliations. T. (eds.) Computer Vision – ECCV 2014. 2014;pp. 818–833. Springer, Cham. https://doi. or g/10. 1007/ 978-3- 319- 10590-1_ 53 . 1 3 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Cognitive Computation Springer Journals

A missed detection, on the other hand, may result in irreversible injury to the patient. For this reason, breast cancer detection is the most complicated but also the most important task. Unfortunately, several solutions proposed in the literature do not aim to analyze the entire image, but rather limit detection to patch classification: the ROIs are first manually selected and cropped, and then classifiers are trained to distinguish the crops. However, to support and imitate the physician's diagnostic process, an architecture capable of detecting all ROIs within the whole mammogram is required. Faster R-CNN, RetinaNet, and Yolo have encouraged the development of systems for breast cancer detection [4-7]. These frameworks introduce two main difficulties: (1) the models have to learn the features of the whole mammogram, and the image resizing required for training may result in the loss of critical details; (2) since the model has to detect all ROIs among all patches of healthy tissue (i.e., non-ROIs), an unavoidable increase in the error rate must be faced. However, Yolo has proven to be an excellent tool in numerous scenarios, achieving higher accuracy and inference speed than its object detector competitors [8].

In [9], a comparison and evaluation of the YoloV5 nano, small, medium, and large models using the CBIS-DDSM and INbreast datasets was performed. However, several aspects had not yet been considered. The issue of explainability was not addressed; nevertheless, in critical domains like medical applications, ensuring model explainability is an essential prerequisite. Furthermore, it had not been examined whether deeper architectures such as YoloV3 can enhance detection performance in the case of small datasets. Additionally, the potential advantages of incorporating a Transformer block into Yolo, given its generalization capability, had not been investigated. In this work, a YoloV5-based model is proposed for breast cancer detection to support the physician's diagnostic process. A comparison with other feature extractors, such as the Darknet53 proposed in YoloV3 [10] and the Vision Transformer [11], was performed.

Given the need for large databases to facilitate deep training [12], the transfer learning (TL) technique was used. Indeed, it has recently been shown that training with small datasets by exploiting pre-trainings represents a future direction to provide trusted systems supporting cognitive and decision-making processes in the medical domain [13]. For this reason, the CBIS-DDSM [14] and INbreast [15] datasets were used as source datasets and a proprietary dataset as target. In contrast to CBIS-DDSM and INbreast, the proprietary dataset includes lesions that are more challenging to recognize, such as asymmetries and distortions, which hold significant clinical importance [16]. The proprietary dataset was acquired and annotated at the Radiology Section of the University Hospital "Paolo Giaccone" (Palermo, Italy). The workflow of the experiments is shown in Fig. 1.

Fig. 1 The overall architecture. The CBIS-DDSM dataset was used as source to evaluate several Yolo-based architectures (YoloV3, YoloV5 (n, s, m, l), and YoloV5-Transformer) on the INbreast target dataset. Then, the best trained architecture (YoloV5s) was used for mass detection on a proprietary dataset. A data augmentation procedure was performed before the training phase for class balancing, as well as during the training. The output comprises bounding-box predictions and a heat map that highlights all the ROIs within the mammogram
However, despite the high performance of deep learning models, their actual use is inhibited by their black-box nature, i.e., their internal logic is incomprehensible to users [17]. This has raised critical issues about their use, such as legal aspects, user acceptance, and trust [18, 19]. For this reason, in order to encourage the integration of these systems into real clinical practice, the problem of their explainability needs to be addressed. The gradient-free method Eigen-CAM [20] was used for saliency map computation and compared with the occlusion sensitivity method. The saliency maps were employed to verify the learned model and to highlight the most important pixels involved in the prediction process. We believe that reporting regions in the form of heat maps can guide the physician's attention much more than ROI prediction alone: ROIs are predicted and shown only above a certain confidence threshold, and the hardest-to-find regions may not exceed this threshold. In this way, the complicated, tedious, and exhausting process of mammogram evaluation can be supported by guiding the physician's attention to the different ROIs.

The main contributions of the current manuscript are as follows:

• The first novelty falls within the field of explainable artificial intelligence (XAI). While data-driven methods have demonstrated high performance in various medical scenarios, their lack of transparency creates skepticism among both physicians and patients regarding these new technologies. This skepticism is particularly prominent in the development of clinical decision support systems (CDSS), where understanding the decision-making process and ensuring system reliability are crucial prerequisites for facilitating the diagnostic process. Conventional machine learning approaches are inadequate in meeting these demands and fail to provide justifications for the decisions made by the systems. Introducing explainability for breast cancer detection is of utmost importance due to the potential for early detection of invasive diseases in mammography screening. Quite frequently, these lesions may not be readily apparent and may fail to meet the confidence threshold established in Yolo to return the detection. Conversely, gradient-free XAI methods remain unaffected by the final output and can provide valuable assistance in the diagnostic process, even in situations involving inaccurate or low-confidence predictions. The saliency maps are therefore proposed as a valuable tool to enhance the predictions of YoloV5.

• A proprietary dataset was acquired during daily clinical sessions at the Radiology Section of the University Hospital "Paolo Giaccone" (Palermo, Italy) for model evaluation. Unlike CBIS-DDSM and INbreast, it is a real clinical dataset containing numerous lesions that present greater complexity in recognition, including asymmetries and distortions. These challenging cases hold important clinical significance [16]. Furthermore, the training process involved the utilization of three datasets, enabling the final model to incorporate the knowledge acquired from the CBIS-DDSM and INbreast datasets.
Section “Results” shows the involved the utilization of three datasets, enabling the achieved results, and “Discussion”, their discussion. Finally, final model to incorporate the knowledge acquired from in “Conclusions”, the main conclusions are reported. CBIS-DDSM and INbreast datasets. The article presents a comparison of several Yolo-based models. In addition, we evaluated the integration of Related Work Transformers [11] inside Yolo. Transformers have had an enormous impact on large language models and computer Given the incidence of breast cancer, many works have been vision tasks. However, the authors [11] acknowledge that proposed to support the physician’s diagnostic process. Transformers lack certain inherent biases found in con- Muduli et al. [22] and Mahmood et al. [23] have compared volutional neural networks (CNNs), such as translation their own CNN architecture with state-of-the-art networks equivariance and locality. Consequently, Transformers for malignant and benign ROIs classification. Soulami et al. may not generalize well when trained on limited amounts [24] have also proposed a CNN, called CapsNet, to address of data. This phenomenon is starting to be discussed in the classification of ROIs. They showed that the classifica- other studies [21]. In the context of mammograms and tion of breast masses into normal, benign, and malignant transfer learning, the generalizability of these findings is certainly more complex than a binary classification of remains uncertain. masses into normal and abnormal. Also, Ragab et al. [25] have addressed breast cancer classification at patch-level, This article is organized as follows: “Related Work” provides using AlexNet, GoogleNet, and ResNet-18-50-101 as feature the related works on breast cancer classification both using extractors and a support vector machine as classifier. They patch-based classification and exploiting the whole mam- also evaluated classification through deep feature fusion mogram. Section “Materials and Methods” describes the and a subsequent application of principal component analy- open-source CBIS-DDSM, the INbreast, and the proprietary sis. Yu et al. [26] have explored several methods and CNN 1 3 Cognitive Computation architectures for tumor or normal ROIs classification. Two dataset and is composed of scanned film mammograms. deep fusion models based on VGG16 were used to classify Focusing on masses, 1514 images with a total of 1618 different patches extracted from the original ROI, to obtain lesions (850 benign and 768 malignant) were included. Of the final prediction using a majority voting. In Agarwal et al. the total 1696 lesions, 78 were discarded due to a mismatch [27], a sliding window approach is used to scan the whole between the size of the image and its mask, generating ROIs breast and extract all the possible cancer patches from the that did not match a lesion. image. Several patch-based CNN (VGG16, ResNet50, and InceptionV3) were trained for breast cancer detection, that The INbreast Dataset is the classification between positive and negative patches. The aforementioned works train convolutional models The INbreast [15] dataset consists of 410 full-field digital that can distinguish ROIs, without dealing with recognizing mammograms (FFDM) classified into normal, benign, and them. However, at the breast screening stage, it is crucial to malignant. 
Only the 107 positive images were selected, and detect all ROIs and subsequently plan new lines of interven- lesions with Bi-Rads > 3 were considered malignant; the tion. Jung et al. [7] used RetinaNet as object detector for the others were labeled as benign. Considering that some images automatic localization of masses (both benign and malig- contain multiple lesions, a total of 40 benign and 75 malig- nant) in the whole mammogram. A dual-view deep convo- nant ROIs were identified. lutional neural network (DV-DCNN) for matching detected masses was proposed by AlGhamdi and Abdel-Mottaleb The Proprietary Dataset [28]. The authors used RetinaNet [29] for mass detection and the DV-DCNN architecture to determine if two patches The dataset consists of 278 FFDMs containing a total of from the craniocaudal (CC) and mediolateral oblique (MLO) 307 lesions, annotated by expert radiologists dealing with views of the same breast represent the same mass, i.e., a the identification of abnormal regions. The images were positive pair. In [4] a Yolo-based Computer-Aided Diagnosis acquired by a Fujifilm Full Field Digital at the Radiol- (CAD) was proposed for mass detection and classification, ogy Section of the University Hospital “Paolo Giaccone” proving that the system works also where the masses exist (Palermo, Italy). Images have spatial resolution and pixel over the pectoral muscles or dense regions. Aly et al. [5] size of 5928 × 4728 and 50 µm, respectively. The image define the evaluation process of screening mammograms as annotations were saved in grayscale softcopy presentation very monotonous, tiring, lengthy, costly, and significantly state (GSPS) format, compliant with the DICOM standard. prone to errors for human readers. In fact, a YoloV3 model All identified by radiologist ROIs were annotated by a cir- was proposed for mass detection and classification. They cumscribed circle, and then the coordinates of the bounding- obtained the fairest and most accurate performance using boxes used for Yolo input were calculated as the coordinates an augmented dataset. of the square circumscribed by the circle. The dataset used In this work, new feature extractors for breast cancer in our study was obtained from the real clinical practice at detection were considered. The YoloV5 architecture was University Hospital “Paolo Giaccone” (Palermo, Italy). Spe- compared with the previous YoloV3 model and considering cifically, the data was collected from the outpatient breast also the Vision Transformer block. In addition, Eigen-CAM clinic, which specializes in second-level diagnostics. As a was used as explainable AI algorithm [30, 31] to provide a result, the acquired case series are heavily skewed towards post hoc explanation. The Eigen-CAM method was com- more severe breast cancer lesions including distortions and pared with occlusion sensitivity. The generated saliency asymmetries. Detecting and diagnosing distortions can be maps were used for two main reasons: (1) as explanatory particularly challenging, as they are characterized by the debugging tool for preventing inadequate outputs [32, 33] presence of spicules radiating from a point, focal retractions, and (2) to guide physicians’ attention even on incorrect pre- or straightening at the edges of the parenchyma [34]. Con- diction scenarios. sequently, distortions are among the most commonly over- looked abnormalities [35]. 
The INbreast Dataset

The INbreast dataset [15] consists of 410 full-field digital mammograms (FFDM) classified into normal, benign, and malignant. Only the 107 positive images were selected, and lesions with BI-RADS > 3 were considered malignant; the others were labeled as benign. Considering that some images contain multiple lesions, a total of 40 benign and 75 malignant ROIs were identified.

The Proprietary Dataset

The dataset consists of 278 FFDMs containing a total of 307 lesions, annotated by expert radiologists dealing with the identification of abnormal regions. The images were acquired with a Fujifilm full-field digital mammography system at the Radiology Section of the University Hospital "Paolo Giaccone" (Palermo, Italy). The images have a spatial resolution of 5928 × 4728 pixels and a pixel size of 50 µm. The image annotations were saved in grayscale softcopy presentation state (GSPS) format, compliant with the DICOM standard. All ROIs identified by the radiologists were annotated with a circumscribed circle, and the coordinates of the bounding boxes used as Yolo input were then calculated as the coordinates of the square circumscribing the circle. The dataset was collected during real clinical practice, specifically from the outpatient breast clinic, which specializes in second-level diagnostics. As a result, the acquired case series is heavily skewed towards more severe breast cancer lesions, including distortions and asymmetries. Detecting and diagnosing distortions can be particularly challenging, as they are characterized by the presence of spicules radiating from a point, focal retractions, or straightening at the edges of the parenchyma [34]. Consequently, distortions are among the most commonly overlooked abnormalities [35]. Asymmetries refer to unilateral deposits of fibroglandular tissue that do not meet the criteria for being classified as masses. They can be further categorized as asymmetry, focal asymmetry, global asymmetry, or developing asymmetry. It has been estimated that around 20% of asymmetry cases are associated with malignancy, making them an important area of research [16]. Benign lesions represent 17.6% of the dataset (54 samples), and 82.4% (253 samples) are malignant. The dataset reflects a real clinical scenario; in fact, it is composed of masses (62%), asymmetries (15%), and distortions (23%). Given the large class imbalance, the proprietary dataset was used only for detection.
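For illustration only, the sketch below (function and variable names are ours, not part of the original pipeline) converts such a circular annotation into a YOLO-format label: the square circumscribing a circle of radius r has side 2r, and YOLO expects a class id followed by normalized center coordinates and box size.

```python
def circle_to_yolo_label(cx, cy, r, img_w, img_h, class_id=0):
    """Convert a circular ROI (center and radius, in pixels) into a
    YOLO-format label line. The square circumscribing the circle has
    side 2r; YOLO expects normalized (x_center, y_center, width, height)."""
    side = 2 * r
    return (f"{class_id} {cx / img_w:.6f} {cy / img_h:.6f} "
            f"{side / img_w:.6f} {side / img_h:.6f}")

# Example: a circle of radius 150 px centered at (2300, 2900) in a
# mammogram of 4728 x 5928 pixels (width x height).
print(circle_to_yolo_label(2300, 2900, 150, img_w=4728, img_h=5928))
```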
In particular, three differ- unbalanced issue, the next “Data Augmentation” discusses ent data augmentation configurations were chosen: low, data augmentation for class balancing and generation of the medium, and high. In all cases, image translation, rotation, validation set “Techniques for Class Balancing Before the scale, shear, flip UD, flip LR, and also HSV augmentation Training Phase”, as well as the procedure to improve the were considered. In addition, although it is a common sce- training “Techniques Used During the Training Phase”. nario for breast cancer, all three datasets contain few multi- lesion images. Therefore, to improve the model’s capability Data Augmentation to detect multiple lesions in the same image, the mosaic technique was used. The mosaic augmentation method con- Techniques for Class Balancing Before the Training Phase sists of the generation of a 2 × 2 grid image, containing the considered image and three random images of the dataset. Due to the excessive imbalance classes for the INbreast and The mosaic technique improves training for two main rea- proprietary dataset, the minority class images (benign) of sons: (1) merging 4 images results in multiple ROIs in the Fig. 2 Transformations for class balancing and validation set creation. The procedure was repeated implementing the 5-fold cross-validation 1 3 Cognitive Computation same image, and the model improves in recognizing multiple pyramids network (FPN) [40] is used as neck, allowing to ROIs simultaneously; (2) to achieve the same input size, the learn objects of different sizes: it specializes in detecting 4 merged images and their respective ROIs are downsized, large and small objects. In addition, the non-maximum sup- improving the detection of smaller lesions. pression to select one bounding box out of many overlap- Table 1 shows the parameter set for each configuration. ping bounding boxes is used. The values reported HSV, translation, rotation, scale, and shear indicate the range considered for the random transfor- YoloV5 Model mation. For flip and mosaic, the value indicates the prob- ability of performing the transformation, so 0.5 is considered YoloV5 uses CSPDarknet53 as its backbone: it exploits a higher level of augmentation because both augmented and the architecture Darknet53 proposed in [10] and employs non-augmented images are considered for training. a CSPNet [41] strategy to partition the feature map of the base layer into two parts and then merges them through a Yolo Architectures Training cross-stage hierarchy. In the neck part, PAnet [42] is used to generate the feature pyramids network (FPN) and allow the Like other single-stage object detectors, Yolo consists of extraction of multi-scale feature maps. This structure allows three parts: backbone, neck, and head. The backbone part the extraction of features optimized for small, medium, and is a CNN that extracts and aggregates image features. The large object detection. YoloV5 was released in nano, small, neck part allows for features extraction optimized for small, medium, large, and extra-large versions. The versions differ medium, and large object detection. In the end, the three in the number of convolutional kernels used and thus the feature maps for small, medium, and large object detec- number of parameters. In this paper, a comparison between tion are given as input to the head part, thus composed of nano, small, medium, and large versions was performed. convolutional layers for the final prediction. 
Data Augmentation

Techniques for Class Balancing Before the Training Phase

Due to the excessive class imbalance of the INbreast and proprietary datasets, the minority-class images (benign) of the training set were augmented. Although the main purpose of the work is to evaluate the detection performance on the proprietary dataset (regardless of lesion class), the following data augmentation procedure was applied to the proprietary dataset before the training phase. Figure 2 summarizes the transformations considered. In particular, 180° rotation and 180° rotation + upper-down (UD) flip were applied to benign images. The other transformations were applied during the training of Yolo, as discussed in the next subsection "Techniques Used During the Training Phase". In addition, according to [5], the remaining test dataset was augmented to obtain the validation set. In fact, UD flip, 180° rotation + UD flip, left-right (LR) flip, and 180° rotation were applied to benign images, and LR flip to malignant images. Considering the smaller difference between the classes, on INbreast, 180° rotations were also considered for malignant masses [9]. This procedure resulted in the generation of a balanced validation set. In addition, the discussed procedure for the INbreast and proprietary datasets was repeated considering 5 different splittings of the training and test sets (5-fold cross-validation).

Fig. 2 Transformations for class balancing and validation set creation. The procedure was repeated implementing the 5-fold cross-validation

Techniques Used During the Training Phase

Transformations not considered in the previous step were performed during Yolo training. In particular, three different data augmentation configurations were chosen: low, medium, and high. In all cases, image translation, rotation, scale, shear, UD flip, LR flip, and also HSV augmentation were considered. In addition, although it is a common scenario for breast cancer, all three datasets contain few multi-lesion images. Therefore, to improve the model's capability to detect multiple lesions in the same image, the mosaic technique was used. The mosaic augmentation method consists of the generation of a 2 × 2 grid image containing the considered image and three random images of the dataset. The mosaic technique improves training for two main reasons: (1) merging 4 images results in multiple ROIs in the same image, so the model improves in recognizing multiple ROIs simultaneously; (2) to achieve the same input size, the 4 merged images and their respective ROIs are downsized, improving the detection of smaller lesions.

Table 1 shows the parameter set for each configuration. The values reported for HSV, translation, rotation, scale, and shear indicate the range considered for the random transformation. For flip and mosaic, the value indicates the probability of performing the transformation; for mosaic, 0.5 is considered a higher level of augmentation because both augmented and non-augmented images are then seen during training.

Table 1 Settings for data augmentation during the training phase

Level  H, S, V           Translation  Rotation  Scale  Shear  Flip (UD, LR)  Mosaic
Low    0.0, 0.0, 0.0     0.1          5.0       0.1    5.0    (0.5, 0.5)     0.0
Med    0.007, 0.35, 0.2  0.3          10.0      0.3    5.0    (0.5, 0.5)     1.0
High   0.015, 0.7, 0.4   0.3          20.0      0.3    10.0   (0.5, 0.5)     0.5
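The keys below follow the hyperparameter files of the Ultralytics YoloV5 repository [52]; the mapping of the "high" configuration of Table 1 onto those keys is our reading, and a complete hyperparameter file would also carry optimizer and loss settings (lr0, momentum, and so on) that are omitted here.

```python
import yaml

# "High" augmentation setting of Table 1, expressed with the
# augmentation keys used by the Ultralytics YoloV5 hyp files [52].
hyp_high = {
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,  # H, S, V gains
    "degrees": 20.0,                             # rotation range
    "translate": 0.3,                            # translation range
    "scale": 0.3,                                # scale range
    "shear": 10.0,                               # shear range
    "flipud": 0.5, "fliplr": 0.5,                # flip probabilities
    "mosaic": 0.5,                               # mosaic probability
}

with open("hyp.high-aug.yaml", "w") as f:   # hypothetical file name
    yaml.safe_dump(hyp_high, f)
```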
Yolo Architectures Training

Like other single-stage object detectors, Yolo consists of three parts: backbone, neck, and head. The backbone is a CNN that extracts and aggregates image features. The neck allows for feature extraction optimized for small, medium, and large object detection. Finally, the three feature maps for small, medium, and large object detection are given as input to the head, which is composed of convolutional layers for the final prediction. Yolo requires the image to be divided into a grid and then makes a prediction for each grid cell. The prediction consists of a 6-tuple y = (p_c, b_x, b_y, b_h, b_w, c), where (b_x, b_y, b_h, b_w) identify the coordinates (x, y) and sizes (height, width) of the predicted bounding box, p_c represents the probability that there is an object in the cell, and c represents the predicted class. The mechanism of anchors is also used to allow multiple object detections in the same grid cell: the prediction consists of the discussed 6-tuple for each specified anchor. Each version of Yolo has its own peculiarities, which mainly concern the structure of the feature extractor, that is, the backbone.

YoloV3 Model

YoloV3 is much deeper than the previous two versions and is more accurate, but it requires more time and data for training. In YoloV3, Darknet53 was used as backbone [10]. Darknet53 is a hybrid approach between Darknet19 (used in YoloV2 [38]) and residual network elements (e.g., BottleNeck) [39], proposed to improve on Darknet19 and on the efficiency of ResNet-101/152. The shortcut connections allow getting more fine-grained information, leading to better performance on small objects. The feature pyramid network (FPN) [40] is used as neck, allowing the model to learn objects of different sizes: it specializes in detecting large and small objects. In addition, non-maximum suppression is used to select one bounding box out of many overlapping bounding boxes.

YoloV5 Model

YoloV5 uses CSPDarknet53 as its backbone: it exploits the Darknet53 architecture proposed in [10] and employs the CSPNet [41] strategy to partition the feature map of the base layer into two parts and then merge them through a cross-stage hierarchy. In the neck, PANet [42] is used to generate the feature pyramid network and allows the extraction of multi-scale feature maps. This structure allows the extraction of features optimized for small, medium, and large object detection. YoloV5 was released in nano, small, medium, large, and extra-large versions. The versions differ in the number of convolutional kernels used and thus in the number of parameters. In this paper, a comparison between the nano, small, medium, and large versions was performed.

YoloV5-Transformer

In contrast to convolutional networks, Transformers are able to model the relationships among the various small patches of an image. The Transformer block assumes the image is split into a sequence of patches, where each patch is flattened to a vector. These flattened image patches are used to create lower-dimensional linear embeddings and are fed into a Transformer encoder, composed of multi-head attention, to find local and global dependencies in the image. It has been shown that the introduction of a Transformer block into convolutional networks can improve efficiency and overall accuracy [43]. In YoloV5, the Transformer block was embedded in the penultimate layer of the backbone, that is, among the three convolutional layers preceding the spatial pyramid pooling layer.

Models Training

Considering the small size of both the INbreast and proprietary datasets, training a deep architecture such as Yolo from scratch may harm the reliability of the trained models. Therefore, despite being composed of scanned film mammograms, CBIS-DDSM was employed as source dataset for the initial training. This setup enables the TL technique on the INbreast and proprietary target datasets. Considering that both source and target datasets are labeled, the performed TL was inductive transfer learning [44]. Since Yolo simultaneously solves a regression task, to predict the bounding box coordinates, and two classification tasks, to predict the objectness and the class score, two different loss functions were employed: for regression, the complete Intersection over Union (CIoU) loss; for both classification tasks, the binary cross-entropy with logits loss.
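A schematic PyTorch rendering of this composition is sketched below; it is an illustration, not the Ultralytics implementation, the loss weights are placeholders, and complete_box_iou_loss requires a recent torchvision release.

```python
import torch
from torch import nn
from torchvision.ops import complete_box_iou_loss

bce = nn.BCEWithLogitsLoss()

def detection_loss(pred_boxes, tgt_boxes, pred_obj, tgt_obj,
                   pred_cls, tgt_cls, w_box=0.05, w_obj=1.0, w_cls=0.5):
    """Composite objective: CIoU loss for the matched box pairs
    (in (x1, y1, x2, y2) format) and binary cross-entropy with logits
    for the objectness and class scores (raw, pre-sigmoid outputs)."""
    box_loss = complete_box_iou_loss(pred_boxes, tgt_boxes, reduction="mean")
    obj_loss = bce(pred_obj, tgt_obj)
    cls_loss = bce(pred_cls, tgt_cls)
    return w_box * box_loss + w_obj * obj_loss + w_cls * cls_loss
```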
Performance Evaluation

The obtained results are presented considering the most common indexes for object detection tasks, such as precision, recall, and average precision. The average precision (AP) is defined as the area under the precision-recall curve. The IoU threshold was set to 0.5. For the CBIS-DDSM and INbreast datasets, AP was calculated separately for the detection of malignant (M AP) and benign (B AP) lesions, as well as the mean over the two classes (mAP).

Models Explanation: Eigen-CAM

Examining trained models is essential before incorporating them into actual clinical practice. For this reason, our system produces prediction explanations as a second output. Saliency maps have the capability to reveal the pixels or regions that played a significant role in the decision-making process of the system, effectively highlighting all potential ROIs to the physician. Several gradient-based methods, such as CAM [45], Grad-CAM [46], and Grad-CAM++ [47], have been proposed to implement interpretability and transparency of deep learning models. In particular, they are class-discriminative visualization methods and require the class probability score for the gradient computations. However, gradient-based methods suffer from the following problem: backpropagating any quantity requires additional computational overhead and assumes that the classifier produced correct decisions, so whenever a wrong decision is made, all the mentioned methods will produce wrong or distorted visualizations [20]. For this reason, the localization accuracy of the above methods remains weak, especially in the case of incorrect predictions. In addition, while traditional CNNs provide class distributions for each sample, YOLO's output includes bounding box coordinates, object presence probabilities in each cell, and class distributions. These issues often make the output non-differentiable and impractical for gradient-based algorithms. As a result, many object detection studies employing Yolo rely on Eigen-CAM for architecture interpretation [48-50], preferred due to its gradient-free nature and its use of the principal components of the extracted feature maps. To address these issues, this study adopts Eigen-CAM for saliency map computation and compares it with the occlusion sensitivity method.

Eigen-CAM is a gradient-free method that computes and visualizes the principal components of the features/representations learned by the convolutional layers, resulting in a method that is intuitive and compatible with all deep learning models. In Eigen-CAM, it is assumed that all relevant spatial features learned over the hierarchy of the CNN model are preserved during the optimization process, while non-relevant features are regularized or smoothed out. The Eigen-CAM is computed by considering the input image I of size i × j projected onto the last convolutional layer L = K, which gives O_(L=K) = W_(L=K) I. The matrix O_(L=K) = U Σ V^T is then factorized using the singular value decomposition to obtain the principal components, and the activation map is given by the projection onto the first eigenvector, L_(Eigen-CAM) = O_(L=K) V_1, where V_1 is the first column of the V matrix. Similar to Eigen-CAM, occlusion sensitivity can be applied to image detection tasks, and it is gradient-free and independent of the specific architecture used. It assesses the changes in the activations resulting from occluding different regions of the image [51].

The saliency maps have been proposed as a valuable tool to enhance the predictions of YoloV5 and can assist physicians in the diagnostic process, especially when the model fails to make accurate predictions. YoloV5 only provides predictions if they surpass a certain confidence threshold; the purpose of the saliency maps is to identify all ROIs and mitigate false negatives. It has been observed that many cancer types progress to an invasive stage due to the failure of early prediction, even in the presence of preliminary signs. Therefore, in contrast to YoloV5's predictions, the saliency maps offer all potential ROIs, even at low confidence. This inevitably leads to an increase in false positives. Considering this, physicians receive two outputs: first, the conventional YoloV5 output, which balances precision and recall, providing only the ROIs that exceed a certain confidence level; in addition, the saliency maps, which propose all potential ROIs that may serve as early cancer indications, even if their probability of being lesions (i.e., not exceeding the threshold) is low. Thus, a simple predictive model is transformed into a decision-support system, as physicians receive not only a definitive decision but also suggestions of lesions that the system recommends paying attention to.
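Under these definitions, a minimal NumPy sketch of the Eigen-CAM computation for a single activation tensor could read as follows (upsampling to the input resolution and image overlay are left out):

```python
import numpy as np

def eigen_cam(activations: np.ndarray) -> np.ndarray:
    """Gradient-free Eigen-CAM in the spirit of [20]: the activations
    O of a convolutional layer, of shape (H, W, C), are flattened,
    factorized with the SVD (O = U S Vt), and projected onto the first
    right-singular vector V1. The map is min-max normalized to [0, 1]."""
    h, w, c = activations.shape
    O = activations.reshape(h * w, c)
    _, _, Vt = np.linalg.svd(O, full_matrices=False)
    cam = (O @ Vt[0]).reshape(h, w)   # projection on the first eigenvector
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```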
Results

The experiments were performed in Google Colaboratory Pro, using a Python 3 environment. The PyTorch implementation proposed by Ultralytics [52] was exploited, and the Weights & Biases platform [53] was used to monitor the training process. The trainings were performed for 100 epochs with a batch size of 16. The validation mAP was used for model selection, considering as best model the one maximizing a weighted combination of the mAP@0.5 and mAP@0.5:0.95 metrics, with weights 0.9 and 0.1, respectively.

CBIS-DDSM Results and Data-Augmentation Improvements

The CBIS-DDSM dataset was used to evaluate the optimal YoloV5 architecture and for hyperparameter optimization, considering the nano, small, medium, and large versions. It was then exploited as source dataset to implement inductive TL and improve the generalization capabilities on the INbreast and proprietary FFDM images. Given the huge number of hyperparameters, an initial analysis was performed using all the proposed default values for each model. Table 2 shows the achieved results for each version of YoloV5. The nano and large versions have a lower mAP than the small and medium versions. The small model, compared with the medium model, results in a more balanced precision-recall pair while containing about one-third of its parameters. Therefore, all subsequent experiments were carried out considering only the small model.

Table 2 Comparison of the nano, small, medium, and large architectures of YoloV5 on the CBIS-DDSM dataset, considering all default hyperparameters

Model  B AP   M AP   Precision  Recall  mAP
n      0.257  0.479  0.473      0.408   0.368
s      0.257  0.518  0.447      0.427   0.387
m      0.280  0.514  0.489      0.403   0.397
l      0.239  0.488  0.491      0.377   0.364

Table 3 shows that the histogram equalization specified in the data pre-processing section improves the model performance. In addition, the Adam optimizer with a learning rate of 0.001 outperforms the default stochastic gradient descent (SGD) optimizer with a learning rate of 0.01. Therefore, the experiments to evaluate the impact of data augmentation were carried out using the equalized dataset and the Adam optimizer. Table 3 shows how the results improve as the data augmentation increases. The effectiveness of the extensive data augmentation emphasizes the necessity of substantial amounts of data when training this deep architecture, confirming the choice of using the CBIS-DDSM dataset to perform TL on the INbreast and proprietary datasets.

Table 3 Performance of the YoloV5 small version, considering the equalized CBIS-DDSM dataset, the Adam optimizer, and the three data augmentation configurations

Hyps        B AP   M AP   Precision  Recall  mAP
Equal       0.300  0.501  0.487      0.408   0.400
Adam+equal  0.321  0.555  0.487      0.464   0.438
aug-low     0.241  0.490  0.460      0.394   0.366
aug-med     0.337  0.549  0.497      0.487   0.433
aug-high    0.361  0.634  0.566      0.482   0.498
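For reference, a fine-tuning run of this kind could be launched as sketched below with the Ultralytics repository [52]; the dataset YAML and checkpoint names are hypothetical placeholders, and the flag spellings may vary slightly across repository versions. The model-selection score described above is also shown.

```python
import subprocess

# Transfer learning run in the style of the Ultralytics train.py CLI [52].
subprocess.run([
    "python", "train.py",
    "--img", "640", "--batch-size", "16", "--epochs", "100",
    "--data", "inbreast.yaml",          # hypothetical target-dataset YAML
    "--weights", "cbis_ddsm_best.pt",   # hypothetical source checkpoint (TL)
    "--hyp", "hyp.high-aug.yaml",       # augmentation setting of Table 1
    "--optimizer", "Adam",              # flag available in recent versions
], check=True)

def selection_score(map50: float, map50_95: float) -> float:
    """Validation score used for model selection in this work:
    0.9 * mAP@0.5 + 0.1 * mAP@0.5:0.95."""
    return 0.9 * map50 + 0.1 * map50_95
```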
INbreast Results and Transfer Learning Evaluation

Exploiting the hyperparameters optimized on the CBIS-DDSM dataset, the YoloV3 and YoloV5-Transformer models were also trained on CBIS-DDSM, in order to implement the TL technique on the INbreast target dataset. Table 4 shows the achieved results. Considering the dataset size, the performance was computed in 5-fold cross-validation, and the mean and standard deviation are reported for each metric. The best training protocol found for CBIS-DDSM, that is, Adam optimizer, high data augmentation, and batch size 16, was used for all the experiments. In addition, INbreast was also trained from scratch to show the difference in accuracy with and without TL. The YoloV5s model outperforms its previous version YoloV3 and also the YoloV5-Transformer. YoloV3 contains a feature extractor with more parameters than YoloV5s and the Transformer version (about 61 vs. 7 million) and therefore needs a larger amount of data for training. In addition, the YoloV5-Transformer version showed lower performance while having a number of parameters comparable to YoloV5s. Comparing the YoloV5s trainings from scratch and with TL on INbreast, an increase of 0.061 in mAP and of 0.119 in B AP was measured. The imbalance of the dataset clearly reflects on the model performance: the detection rate for benign lesions, the minority class, is lower than that for malignant lesions for every considered model.

Table 4 5-fold results for the three used architectures on the INbreast dataset (Tr is for Transformer; NoTL is the training without transfer learning)

Model         B AP           M AP           Precision      Recall         mAP
YoloV3        0.585 ± 0.093  0.890 ± 0.036  0.785 ± 0.012  0.695 ± 0.104  0.738 ± 0.061
YoloV5s-Tr    0.642 ± 0.060  0.894 ± 0.054  0.799 ± 0.118  0.742 ± 0.146  0.771 ± 0.048
YoloV5s-NoTL  0.652 ± 0.051  0.890 ± 0.047  0.835 ± 0.059  0.713 ± 0.770  0.771 ± 0.038
YoloV5s       0.771 ± 0.131  0.898 ± 0.069  0.854 ± 0.097  0.729 ± 0.100  0.835 ± 0.098

Proprietary Dataset Results and Transfer Learning Evaluation

The YoloV5s model was the most accurate on the two open-source datasets and was therefore used for lesion detection on the proprietary dataset. The model trained using CBIS-DDSM as source dataset and INbreast as target dataset was the checkpoint used to start the training on the proprietary dataset. For this reason, the model trained on the proprietary dataset carries the knowledge learned on CBIS-DDSM and INbreast. Figure 3 shows the difference in validation mAP computed during training with and without transfer learning. In particular, a higher initial mAP, a faster mAP growth in the early epochs, and a higher mAP asymptote were measured using transfer learning [54]. The result was confirmed on the test set, with an mAP of 0.561 without and 0.621 with transfer learning. Table 5 shows the results computed with the 5-fold cross-validation strategy.

Fig. 3 Training performance with (green) and without (red) transfer learning on the proprietary dataset

Table 5 5-fold results on the proprietary dataset, considering the training with and without transfer learning

Model          Precision      Recall         mAP
YoloV5s no-TL  0.665 ± 0.054  0.541 ± 0.043  0.561 ± 0.053
YoloV5s TL     0.726 ± 0.110  0.591 ± 0.063  0.621 ± 0.035
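A checkpoint trained in this way can then be queried, for instance, through the YoloV5 hub interface [52]; the weight and image file names below are hypothetical placeholders.

```python
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="proprietary_best.pt")  # hypothetical weights
model.conf = 0.25   # confidence threshold below which ROIs are suppressed
model.iou = 0.5     # IoU threshold used by non-maximum suppression

results = model("mammogram.png")   # pre-processed full-field mammogram
boxes = results.xyxy[0]            # (N, 6): x1, y1, x2, y2, confidence, class
```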
Explainability Results

To evaluate the performance of the XAI methods, we conducted a manual analysis on a subset of the proprietary dataset consisting of 50 images and 56 lesions. No healthy images were considered. Our focus was evaluating the differences in false positives and false negatives using two XAI techniques: Eigen-CAM and occlusion sensitivity. In a qualitative analysis, the generated saliency maps do not exhibit complete overlap, as shown in Figs. 4 and 5. However, a poor overlap between saliency maps calculated through different methods has been widely shown in the literature [55-57]. More specifically, it was observed that with occlusion sensitivity the regions linked to lesions appear only slightly illuminated, whereas with Eigen-CAM they are more prominently highlighted. In addition, the quantitative analysis showed the superiority of Eigen-CAM for this object detection task in mammography. Table 6 summarizes the results. On the selected subset, the Yolo model correctly detected 41 lesions, but missed 15 lesions (false negatives) and incorrectly identified 19 non-existent lesions (false positives). When we employed Eigen-CAM, we observed better results: of the 56 lesions, 52 were correctly detected, reducing the false negatives to just 4. However, the use of Eigen-CAM led to an increase in false positives, with a total of 34. On the other hand, the occlusion sensitivity method did not perform as well as Eigen-CAM, showing an increase in false negatives to 20 and false positives to 55.

Table 6 Performance variation through the use of saliency maps (OS, occlusion sensitivity)

Method      Lesions #  TP  FP  FN
Yolo-based  56         41  19  15
Eigen-CAM   56         52  34  4
OS          56         36  55  20
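The manual analysis was performed by visual inspection. Purely as an illustration, one plausible automated counterpart (our assumption, not the procedure used in the paper) would threshold the normalized saliency map into blobs and match them against the ground-truth boxes:

```python
import numpy as np
from scipy import ndimage

def count_hits(cam: np.ndarray, gt_boxes, threshold=0.5):
    """Label the salient blobs of a normalized map and count a TP for
    every ground-truth box (x1, y1, x2, y2, in map coordinates) that
    overlaps a blob; unmatched blobs are FPs, unmatched boxes FNs."""
    blobs, n_blobs = ndimage.label(cam >= threshold)
    tp, matched = 0, set()
    for x1, y1, x2, y2 in gt_boxes:
        ids = np.unique(blobs[int(y1):int(y2), int(x1):int(x2)])
        ids = ids[ids > 0]
        if ids.size:
            tp += 1
            matched.update(ids.tolist())
    return tp, n_blobs - len(matched), len(gt_boxes) - tp  # TP, FP, FN
```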
Discussion

Performance and Transfer Learning Importance

The proposed work for breast cancer detection introduces several novelties and advantages. Three different datasets were considered. The CBIS-DDSM is the largest and therefore the most appropriate for deep training. However, it is composed of scanned film mammograms, resulting in images that are notably distinct from FFDM images. Conversely, the INbreast and proprietary FFDM datasets can be considered a good benchmark for testing Yolo on real clinical practice images. For this reason, the CBIS-DDSM dataset was used to obtain a pre-training more targeted than the one on the common COCO dataset (the usual benchmark for Yolo). In fact, the COCO dataset is used for the recognition of objects, cars, people, etc., in real-life images, in each case with a significantly different distribution from that of breast cancer in mammograms. Then, for all the experiments, the transfer learning technique was exploited using CBIS-DDSM as source dataset, and different Yolo architectures were compared. Considering that Yolo architectures evolve to improve both accuracy and inference speed, it was not obvious that YoloV5 would prove more accurate than YoloV3. Moreover, among the various versions of YoloV5, the small version was the most accurate, also compared with the YoloV5s-Transformer.

The performance obtained on the proprietary dataset was lower than on INbreast. However, our dataset contains three times the number of lesions, allowing for a more accurate evaluation of the models. Also, although both are datasets for breast cancer analysis, it is natural that the distributions, and consequently the trainings, differ. INbreast was acquired with a MammoNovation Siemens FFDM system with a pixel size of 70 µm, and our dataset with a Fujifilm FFDM system with a pixel size of 50 µm. The spatial resolution is also very different: 3328 × 4084 or 2560 × 3328 for INbreast, and 5928 × 4728 for the proprietary dataset. Moreover, the main difference lies in the heterogeneity of the datasets. For INbreast, the 107 considered abnormalities are only masses, with 2 asymmetries. In contrast, our dataset is mainly composed of masses (62%), but also of asymmetries (15%) and distortions (23%). The presence of these types of lesions, which account for 38% of our dataset, poses an additional challenge for accurate detection. According to BI-RADS [58], the term architectural distortion (AD) is used when the normal architecture is distorted with no definite visible mass. AD is not always a sign of cancer, may represent different benign processes and high-risk lesions [59], and is responsible for 12 to 45% of the breast cancers missed during screening [60]. Asymmetries are areas of fibroglandular tissue visible on only one mammographic projection, mostly caused by the superimposition of normal breast tissue. There are different types of asymmetries: for example, the developing asymmetry carries a 15% risk of malignancy [61], while the global asymmetry is mostly a normal variant. Although this introduces a significant level of complexity, it moves the system towards the real-world clinical scenario. For this reason, the achieved results are encouraging and demonstrate that breast cancer detection can be addressed without reducing the task to patch classification.

Comparison

An accurate comparison with other studies is complex because of the different datasets, pre-processing, and training protocols. However, Table 7 reports some similar works. In [62], the OPTIMAM dataset (OMI-H), composed of about 5300 mammograms, was used as source dataset to perform TL on the INbreast dataset. Using the Faster R-CNN architecture, the authors obtained an AUC-ROC of 0.79 and 0.95 for benign and malignant lesion detection, respectively. YoloV1 was used in [4], resulting in 99.5 and 99.9 for benign and malignant lesion detection on the DDSM dataset. Yolo9000 (i.e., YoloV2) was used in [63]: in contrast to our system, localization and classification performance were evaluated separately on the INbreast dataset. In particular, the lesions are first localized, and then only the localized ones are classified, resulting in a detection accuracy of 97.2 and a classification accuracy of 95.3. The work most similar to ours in terms of evaluation protocol and workflow was proposed by Aly et al. [5]. Using YoloV3, they obtained an AP of 94.2 and 84.6 for benign and malignant detection, respectively. However, their reported best results are computed using a higher image spatial resolution (832 × 832 vs. our 640 × 640), and the results in 5-fold cross-validation were reported only for the 448 × 448 resolution. In fact, comparing our result on the best fold with their result on 608 × 608 images, we obtained an AP of 88.5 (vs. their 87.5) and 92.2 (vs. their 80.8) for benign and malignant detection, respectively. As illustrated in Aly et al. [5], increasing the image size proves beneficial for the learning process. However, the disparity between the experiments conducted at 448 × 448 and 608 × 608 is quite substantial, but it diminishes significantly when considering the 832 × 832 size. This finding suggests that larger image sizes may yield slightly improved results, while the increased complexity of the models and of the associated optimization could pose a considerably increased computational cost.

Table 7 Comparison between the proposed and other breast cancer detection works, considering the INbreast dataset (Det, detection; Cls, classification; Acc, accuracy; AP, average precision; → is for TL from dataset1 to dataset2)

Paper  Architecture  Dataset               Performance
[62]   Faster R-CNN  Optimam → INbreast    AUC B: 0.79; M: 0.95
[4]    YoloV1        DDSM                  AUC B: 99.5; M: 99.9
[63]   YoloV2        DDSM & INbreast       Det. Acc: 97.2; Cls Acc (AUC): 95.3
[5]    YoloV3        INbreast              AP B: 94.2; M: 84.6
Our    YoloV5s       CBIS-DDSM → INbreast  AP B: 0.771 ± 0.131; M: 0.898 ± 0.069

Explainability Discussion

Despite the encouraging performance, the system must be both accurate and trusted by physicians for its integration into real clinical practice. Therefore, an introspection and explanation of the trained model were conducted via Eigen-CAM. Figures 4 and 5 show two saliency maps generated via the Eigen-CAM and occlusion sensitivity methods; the former image represents a correct prediction and the latter an incorrect one. In Fig. 4, the Eigen-CAM heat map is brightest around the predicted lesion, but it suggests that the physician should also pay attention to other areas of the image. In Fig. 5, instead, the model makes a prediction error (missed detection). In this figure, the advantage of using a gradient-free method can be seen: the generated Eigen-CAM heat map identifies several salient areas that demand the physician's attention.

Fig. 4 Example of a bounding-box prediction on the left and the respective saliency map on the right. The ROI is correctly predicted with a confidence index of 0.6. However, other suspicious areas are also highlighted on the saliency map

Fig. 5 Example of a wrong prediction on the left and the respective saliency map on the right. Despite the error, the saliency map calculated via Eigen-CAM provides several suspicious ROIs, as well as the miss-detected lesion (marked with the white bounding box)

In addition, the saliency maps depicted in Figs. 4 and 5 indicate that the activations primarily concentrate on the breast region. Any minimal activations observed outside this area (in the Eigen-CAM maps) can be attributed to artifacts and are not considered confounding factors for the physician. It is possible to speculate that the slight activations at the black edges of the images might assist in aligning the coordinates of the bounding boxes predicted in the opposite area of the image, where only the background is present. The obtained saliency maps are class-independent, consistent with clinical practice, where mammography is typically employed as a screening examination aimed at identifying abnormalities, while other examination modalities, such as MRI, are more informative for characterization purposes and are thus considered secondary examinations [12, 64].
Table 6 Performance variation through the use of saliency maps (OS, occlusion sensitivity)

Model       Lesions #  TP  FP  FN
Yolo-based  56         41  19  15
Eigen-CAM   56         52  34   4
OS          56         36  55  20

Based on these findings (Table 6), Eigen-CAM proves to be the more suitable method, with respect to occlusion sensitivity, for generating saliency maps in object detection tasks. Despite the unavoidable increase in false positives, the reduction in false negatives was significant. This reduction is particularly important from a clinical perspective, as it enables early diagnosis and facilitates the scheduling of further examinations by ruling out the growth of invasive lesions. Considering these factors, we believe that saliency maps should complement, rather than replace, the outputs of the Yolo model. In fact, Yolo's predictions proved strict, with a small number of false positives, while Eigen-CAM's predictions are more conservative, with a minimal number of false negatives. Above all, these outputs should be seen as qualitative tools that always require clinical radiologic evaluation. For this reason, it is the responsibility of the physician to determine which areas necessitate additional examination.

Conclusions

In this work, a Yolo-based model was proposed for breast cancer detection. Although the CBIS-DDSM dataset is composed of scanned film mammograms, the use of the transfer learning technique improves the models' generalization capabilities when Yolo is fine-tuned with FFDM images (INbreast and proprietary datasets). The results obtained on the INbreast dataset were exploited to train YoloV5 on the proprietary dataset. The performance obtained is very encouraging, also considering the heterogeneity of the proprietary dataset, which is composed of particularly difficult-to-recognize lesions such as asymmetries and distortions. In addition, the use of the saliency maps makes the internal process of deep learning models transparent and encourages the integration of our model within a clinical decision support system. In fact, the gradient-free Eigen-CAM method highlights all the suspicious ROIs, also in incorrect prediction scenarios. For this reason, it represents the enhanced output of our model. The proposed model represents a trusted predictive system to support cognitive, decision-making, and control processes in clinical practice. In addition, the XAI results pave the way for a prospective study in which the diagnostic performance of physicians is evaluated with and without the support of both the Yolo and Eigen-CAM outputs, using an external data cohort. This represents a step towards the integration of data-driven systems into real clinical practice.

Funding Open access funding provided by Università degli Studi di Palermo within the CRUI-CARE Agreement. This work was partially supported by the University of Palermo Grant EUROSTART, CUP B79J21038330001, Project TRUSTAI4NCDI.

Data Availability Data will be made available on reasonable request.

Declarations

Ethical Approval Retrospective data collection was approved by the local ethics committee.

Consent to Participate The requirement for evidence of informed consent was waived because of the retrospective nature of our study.

Conflict of Interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians. 2021;71(3):209–249. https://doi.org/10.3322/caac.21660.
2. Duffy SW, Tabár L, Yen AM-F, Dean PB, Smith RA, Jonsson H, Törnberg S, Chen SL-S, Chiu SY-H, Fann JC-Y, Ku MM-S, Wu WY-Y, Hsu C-Y, Chen Y-C, Svane G, Azavedo E, Grundström H, Sundén P, Leifland K, Frodis E, Ramos J, Epstein B, Åkerlund A, Sundbom A, Bordás P, Wallin H, Starck L, Björkgren A, Carlson S, Fredriksson I, Ahlgren J, Öhman D, Holmberg L, Chen TH-H. Mammography screening reduces rates of advanced and fatal breast cancers: results in 549,091 women. Cancer. 2020;126(13):2971–2979. https://doi.org/10.1002/cncr.32859.
3. Ekpo EU, Alakhras M, Brennan P. Errors in mammography cannot be solved through technology alone. Asian Pac J Cancer Prev: APJCP. 2018;19(2):291. https://doi.org/10.22034/APJCP.2018.19.2.291.
4. Al-Masni MA, Al-Antari MA, Park J-M, Gi G, Kim T-Y, Rivera P, Valarezo E, Choi M-T, Han S-M, Kim T-S. Simultaneous detection and classification of breast masses in digital mammograms via a deep learning Yolo-based cad system. Comput Methods Programs Biomed. 2018;157:85–94. https://doi.org/10.1016/j.cmpb.2018.01.017.
5. Aly GH, Marey M, El-Sayed SA, Tolba MF. Yolo based breast masses detection and classification in full-field digital mammograms. Comput Methods Programs Biomed. 2021;200:105823. https://doi.org/10.1016/j.cmpb.2020.105823.
6. Baccouche A, Garcia-Zapirain B, Olea CC, Elmaghraby AS. Breast lesions detection and classification via Yolo-based fusion models. Comput Mater Contin. 2021;69:1407–1425. https://doi.org/10.32604/cmc.2021.018461.
7. Jung H, Kim B, Lee I, Yoo M, Lee J, Ham S, Woo O, Kang J. Detection of masses in mammograms using a one-stage object detector based on a deep convolutional neural network. PLoS One. 2018;13(9):e0203355. https://doi.org/10.1371/journal.pone.0203355.
8. Darma IWAS, Suciati N, Siahaan D. A performance comparison of balinese carving motif detection and recognition using YOLOv5 and mask R-CNN. In: 2021 5th International Conference on Informatics and Computational Sciences (ICICoS). 2021. pp. 52–57. https://doi.org/10.1109/ICICoS53627.2021.9651855.
9. Prinzi F, Insalaco M, Gaglio S, Vitabile S. Breast cancer localization and classification in mammograms using YoloV5. In: Esposito A, Faundez-Zanuy M, Morabito FC, Pasero E, editors. Applications of artificial intelligence and neural systems to data science. Smart innovation, systems and technologies. Vol. 360. Singapore: Springer; 2023. https://doi.org/10.1007/978-981-99-3592-5_7.
10. Redmon J, Farhadi A. YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767. 2018. https://doi.org/10.48550/arXiv.1804.02767.
11. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. 2020. https://doi.org/10.48550/arXiv.2010.11929.
12. Prinzi F, Orlando A, Gaglio S, Midiri M, Vitabile S. ML-based radiomics analysis for breast cancer classification in DCE-MRI. In: Applied Intelligence and Informatics: Second International Conference, AII 2022, Reggio Calabria, Italy, September 1–3, 2022, Proceedings. Springer; 2023. pp. 144–158. https://doi.org/10.1007/978-3-031-24801-6_11.
13. Chugh G, Kumar S, Singh N. Survey on machine learning and deep learning applications in breast cancer diagnosis. Cognit Comput. 2021;1–20. https://doi.org/10.1007/s12559-020-09813-6.
14. Lee RS, Gimenez F, Hoogi A, Miyake KK, Gorovoy M, Rubin DL. A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific Data. 2017;4(1):1–9. https://doi.org/10.1038/sdata.2017.177.
15. Moreira IC, Amaral I, Domingues I, Cardoso A, Cardoso MJ, Cardoso JS. INbreast: toward a full-field digital mammographic database. Acad Radiol. 2012;19(2):236–48. https://doi.org/10.1016/j.acra.2011.09.014.
16. Abdelrahman L, Al Ghamdi M, Collado-Mesa F, Abdel-Mottaleb M. Convolutional neural networks for breast cancer detection in mammography: a survey. Comput Biol Med. 2021;131:104248. https://doi.org/10.1016/j.compbiomed.2021.104248.
17. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv (CSUR). 2018;51(5):1–42. https://doi.org/10.1145/3236009.
18. Lipton ZC. The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue. 2018;16(3):31–57. https://doi.org/10.1145/3236386.3241340.
19. Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang G-Z. XAI-explainable artificial intelligence. Science Robotics. 2019;4(37):eaay7120. https://doi.org/10.1126/scirobotics.aay7120.
20. Muhammad MB, Yeasin M. Eigen-CAM: class activation map using principal components. In: 2020 International Joint Conference on Neural Networks (IJCNN). 2020. pp. 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626.
21. Zhu H, Chen B, Yang C. Understanding why ViT trains badly on small datasets: an intuitive perspective. arXiv preprint arXiv:2302.03751. 2023.
22. Muduli D, Dash R, Majhi B. Automated diagnosis of breast cancer using multi-modal datasets: a deep convolution neural network based approach. Biomed Signal Process Control. 2022;71:102825. https://doi.org/10.1016/j.bspc.2021.102825.
23. Mahmood T, Li J, Pei Y, Akhtar F, Rehman MU, Wasti SH. Breast lesions classifications of mammographic images using a deep convolutional neural network-based approach. PLoS One. 2022;17(1):e0263126. https://doi.org/10.1371/journal.pone.0263126.
24. Soulami KB, Kaabouch N, Saidi MN. Breast cancer: classification of suspicious regions in digital mammograms based on capsule network. Biomed Signal Process Control. 2022;76:103696. https://doi.org/10.1016/j.bspc.2022.103696.
25. Ragab DA, Attallah O, Sharkas M, Ren J, Marshall S. A framework for breast cancer classification using multi-DCNNs. Comput Biol Med. 2021;131:104245. https://doi.org/10.1016/j.compbiomed.2021.104245.
26. Yu X, Pang W, Xu Q, Liang M. Mammographic image classification with deep fusion learning. Sci Rep. 2020;10(1):1–11. https://doi.org/10.1038/s41598-020-71431-x.
27. Agarwal R, Diaz O, Lladó X, Yap MH, Martí R. Automatic mass detection in mammograms using deep convolutional neural networks. J Med Imaging. 2019;6(3):031409. https://doi.org/10.1117/1.JMI.6.3.031409.
28. AlGhamdi M, Abdel-Mottaleb M. DV-DCNN: dual-view deep convolutional neural network for matching detected masses in mammograms. Comput Methods Programs Biomed. 2021;207:106152. https://doi.org/10.1016/j.cmpb.2021.106152.
29. Lin T-Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell. 2020;42(2):318–27. https://doi.org/10.1109/TPAMI.2018.2858826.
30. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digit Signal Process. 2018;73:1–15. https://doi.org/10.1016/j.dsp.2017.10.011.
31. Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García S, Gil-López S, Molina D, Benjamins R, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. 2020;58:82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
32. Kulesza T, Burnett M, Wong W-K, Stumpf S. Principles of explanatory debugging to personalize interactive machine learning. In: Proceedings of the 20th International Conference on Intelligent User Interfaces. 2015. pp. 126–137. https://doi.org/10.1145/2678025.2701399.
33. Pocevičiūtė M, Eilertsen G, Lundström C. Survey of XAI in digital pathology. In: Holzinger A, Goebel R, Mengel M, Müller H, editors. Artificial intelligence and machine learning for digital pathology. Cham: Springer; 2020. pp. 56–88. https://doi.org/10.1007/978-3-030-50402-1_4.
34. Durand MA, Wang S, Hooley RJ, Raghu M, Philpotts LE. Tomosynthesis-detected architectural distortion: management algorithm with radiologic-pathologic correlation. Radiographics. 2016;36(2):311–21. https://doi.org/10.1148/rg.2016150093.
35. Oyelade ON, Ezugwu AE-S. A state-of-the-art survey on deep learning methods for detection of architectural distortion from digital mammography. IEEE Access. 2020;8:148644–76. https://doi.org/10.1109/ACCESS.2020.3016223.
36. Al-Dhabyani W, Gomaa M, Khaled H, Aly F. Deep learning approaches for data augmentation and classification of breast masses using ultrasound images. Int J Adv Comput Sci Appl. 2019;10(5):1–11. https://doi.org/10.14569/IJACSA.2019.0100579.
37. Kyono T, Gilbert FJ, van der Schaar M. MAMMO: a deep learning solution for facilitating radiologist-machine collaboration in breast cancer diagnosis. arXiv preprint arXiv:1811.02661. 2018. https://doi.org/10.48550/arXiv.1811.02661.
38. Redmon J, Farhadi A. Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 7263–7271. https://doi.org/10.1109/CVPR.2017.690.
39. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. pp. 770–778. https://doi.org/10.1109/CVPR.2016.90.
40. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 2117–2125. https://doi.org/10.48550/arXiv.1612.03144.
41. Wang C-Y, Liao H-YM, Wu Y-H, Chen P-Y, Hsieh J-W, Yeh I-H. CSPNet: a new backbone that can enhance learning capability of CNN. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2020. pp. 1571–1580. https://doi.org/10.1109/CVPRW50498.2020.00203.
42. Liu S, Qi L, Qin H, Shi J, Jia J. Path aggregation network for instance segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. pp. 8759–8768. https://doi.org/10.1109/CVPR.2018.00913.
43. Wu B, Xu C, Dai X, Wan A, Zhang P, Yan Z, Tomizuka M, Gonzalez J, Keutzer K, Vajda P. Visual transformers: token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677. 2020. https://doi.org/10.48550/arXiv.2006.03677.
44. Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big Data. 2016;3(1):1–40. https://doi.org/10.1186/s40537-016-0043-6.
45. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 2921–2929. https://doi.org/10.1109/CVPR.2016.319.
46. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). 2017. pp. 618–626. https://doi.org/10.1109/ICCV.2017.74.
47. Chattopadhay A, Sarkar A, Howlader P, Balasubramanian VN. Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). 2018. pp. 839–847. https://doi.org/10.1109/WACV.2018.00097.
48. Tan Q, Xie W, Tang H, Li Y. Multi-scale attention adaptive network for object detection in remote sensing images. In: 2022 5th International Conference on Information Communication and Signal Processing (ICICSP). IEEE; 2022. pp. 218–223. https://doi.org/10.1109/ICICSP55539.2022.10050627.
49. Li W, Huang L. YOLOSA: object detection based on 2D local feature superimposed self-attention. Pattern Recognition Letters. 2023;168:86–92. https://doi.org/10.1016/j.patrec.2023.03.003.
50. Qiu M, Christopher LA, Chien S, Chen Y. Attention mechanism improves YOLOv5x for detecting vehicles on surveillance videos. In: 2022 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). IEEE; 2022. pp. 1–8. https://doi.org/10.1109/AIPR57179.2022.10092237.
51. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer Vision – ECCV 2014. Cham: Springer; 2014. pp. 818–833. https://doi.org/10.1007/978-3-319-10590-1_53.
52. Ultralytics. YoloV5 Ultralytics GitHub. 2022. https://github.com/ultralytics/yolov5. Last accessed 24 Jan 2023.
53. wandb. Weights & Biases. 2022. https://github.com/wandb/wandb. Last accessed 24 Jan 2023.
54. Torrey L, Shavlik J. Transfer learning. In: Handbook of research on machine learning applications and trends: algorithms, methods, and techniques. 2010. pp. 242–264. https://doi.org/10.4018/978-1-60566-766-9.
55. Zhang J, Chao H, Kalra MK, Wang G, Yan P. Overlooked trustworthiness of explainability in medical AI. medRxiv. 2021.
56. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digital Health. 2021;3(11):745–50. https://doi.org/10.1016/S2589-7500(21)00208-9.
57. Bodria F, Giannotti F, Guidotti R, Naretto F, Pedreschi D, Rinzivillo S. Benchmarking and survey of explanation methods for black box models. arXiv preprint arXiv:2102.13076. 2021.
58. American College of Radiology. ACR BI-RADS Atlas: breast imaging reporting and data system. Reston, VA: American College of Radiology; 2013. pp. 37–78.
59. Babkina TM, Gurando AV, Kozarenko TM, Gurando VR, Telniy VV, Pominchuk DV. Detection of breast cancers represented as architectural distortion: a comparison of full-field digital mammography and digital breast tomosynthesis. Wiad Lek. 2021;74(7):1674–9. https://doi.org/10.36740/WLek202107121.
60. Rangayyan RM, Banik S, Desautels J. Computer-aided detection of architectural distortion in prior mammograms of interval cancer. J Digit Imaging. 2010;23(5):611–31. https://doi.org/10.1007/s10278-009-9257-x.
61. Arian A, Dinas K, Pratilas GC, Alipour S. The breast imaging-reporting and data system (BI-RADS) made easy. Iran J Radiol. 2022;19(1). https://doi.org/10.5812/iranjradiol-121155.
62. Agarwal R, Díaz O, Yap MH, Lladó X, Martí R. Deep learning for mass detection in full field digital mammograms. Comput Biol Med. 2020;121:103774. https://doi.org/10.1016/j.compbiomed.2020.103774.
63. Al-Antari MA, Han S-M, Kim T-S. Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput Methods Programs Biomed. 2020;196:105584. https://doi.org/10.1016/j.cmpb.2020.105584.
64. Militello C, Rundo L, Dimarco M, Orlando A, Woitek R, D'Angelo I, Russo G, Bartolotta TV. 3D DCE-MRI radiomic analysis for malignant lesion prediction in breast cancer patients. Acad Radiol. 2022;29(6):830–40. https://doi.org/10.1016/j.acra.2021.08.024.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
