TY - JOUR
AU - Wei, Leyi
AB - Abstract The spatial distribution of the proteome at the subcellular level provides clues to protein function and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at the subcellular level. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is quite a challenging task. Here we developed a criterion learning strategy to exploit label–attribute relevance and label–label relevance; the criterion used to determine the final label set is obtained automatically during the learning procedure. We also identified the CNN architecture that gave the best results. Moreover, experiments show that, compared with hand-crafted features, the deep features yield more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

Keywords: protein subcellular location, deep neural network, criterion learning, phenotype features

1 Introduction

The spatial distribution of proteins at the subcellular level reveals valuable information for understanding how proteins work [16]. Subcellular locations provide a specific chemical environment and a set of interaction partners that enable proteins to fulfill their functions [29]. Abnormal subcellular locations of proteins have been reported to be associated with cellular dysfunction and disease [15, 19]. Exact protein location prediction at the subcellular level can therefore improve tasks such as target identification in drug discovery [13], especially for anticancer research [17]. Determining the subcellular localization of a protein by biological experiments and visual inspection is reliable but labor- and time-intensive, so computational approaches have been developed for automated subcellular localization of proteins. In terms of data source, some computational methods predict subcellular location from the 1D protein amino acid sequence [34, 41]. Compared with sequence data, 2D images that present proteins or their subcellular locations as distinct patterns are more intuitive and interpretable [38]. In particular, with the advancement of imaging technologies, considerable progress has been made in developing imaging-based methods that automatically determine protein subcellular locations [12, 18]. A large number of hand-crafted image features have been employed to predict the subcellular location of proteins [1, 14]. Texture features, which describe the spatial arrangement of color or intensity in an image, are among the most popular feature types used in subcellular location identification. In a number of studies, such as Boland and Murphy [1], Tahir and Khan [26] and Chebira et al. [2], Haralick texture features, calculated from gray-level co-occurrence matrices in different orientations, were extracted to recognize subcellular localization.
Local binary pattern (LBP), which encodes the local structure around each pixel as a decimal number, is another type of texture feature that has been used extensively to predict protein subcellular localization [28, 37, 40]. Extensions of LBP such as local ternary pattern (LTP) and local quinary pattern (LQP) features were employed in [40] to extract texture information for location classification. Texture features such as Gabor or wavelet features were also employed in some studies to recognize protein subcellular location [9]. A second commonly used type of feature is the Zernike feature; owing to its invariance to image translation and rotation, works such as [1, 27] extracted Zernike features to localize proteins at the subcellular level. Other types of features, including morphological features (characteristics such as number, intensity, shape and position) [2, 3, 8], DNA distribution features (the DNA spatial distribution of a human cell) [37, 40] and texton-based features [26], have also been widely used. However, most current studies focus on localizing proteins with single labels: each protein is assumed to correspond to only one subcellular location. In fact, at least 20% of human proteins exist in more than two subcellular locations [42]. Some studies have proposed algorithms to label proteins with multiple subcellular structures. In Xu et al. [37], binary relevance (BR) was used to build the multi-label predictor, treating the multi-location problem as multiple independent binary classification problems; however, this method did not consider correlations among classes. Xu et al. addressed this with a Bayesian graph model, using it both to guide the order in which the binary classifiers were trained and as additional features [36]. Wang and Li [33] extended the BR method and learned label correlations through feature-space transformation: for each label, multiple binary classifiers were learned by randomly selecting several labels as additional input features, and these classifiers were then aggregated by majority voting. Yang et al. [39] used frequency features and a chained prediction model to handle the multi-label problem. However, prediction performance still has room for improvement, as the best accuracy among all these methods remains below 70%. Deep learning has shown impressive performance in a wide range of areas [10, 25]. Compared with hand-crafted features, which depend on prior knowledge and manual design, deep learning automatically learns the features used for classification. As far as we know, only a handful of studies have employed deep learning for protein subcellular location identification. Pärnamaa and Parts [20] trained an 11-layer neural network to classify fluorescent protein subcellular locations in yeast cells. Shao et al. [23] applied features extracted from the last fully connected layer of convolutional neural networks (CNNs), selected the most discriminative features with a Lasso model and used the selected features for the final classification. Winsnes et al. [35] used deep neural networks (DNNs) to predict subcellular localization from images of U2OS cell lines. Deep learning-based methods are thus expected to bring further progress in the prediction of protein subcellular location.
In this study, we proposed a protein subcellular location prediction approach that handles both single-label and multi-label samples based on deep features extracted from CNNs. Five popular CNNs were employed as feature extractors, each producing a 128-dimensional feature vector. We evaluated the five networks separately and identified the optimal one. Next, we obtained a more compact feature subset through feature selection and performed classification on these optimal features. Multi-label localization in particular is a challenging task: most current methods do not sufficiently explore label relevance, which limits their accuracy. To overcome these shortcomings, we proposed an approach that automatically learns both label–attribute relevance and label–label relevance. First, we generated a probability vector containing the predicted probability of each class. Then, inspired by Xu et al. [37], we determined the label set of each sample by counting the number of 'maximal probabilities'. Unlike Xu et al.'s method, which depends on manual settings, the counting procedure of the proposed method is driven by an automatically learned criterion: we learn a threshold value for this criterion to determine the labels. We designed a series of experiments to validate the proposed method, and the results demonstrate its effectiveness. The implementation is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

2 Materials and methods

2.1 Overview of the proposed approach

The overall framework of the proposed method is shown in Figure 1 and involves three phases: Phase A covers data collection and pre-processing; feature extraction and selection are conducted in Phase B; and in Phase C, the labels of the samples are predicted, with single-label and multi-label samples handled separately.

Fig. 1 Overview of the proposed method. Phase A: data collection and pre-processing; Phase B: feature extraction and selection; Phase C: subcellular location prediction. In Phase C, we deal with single-label samples (top) and multi-label samples (bottom) separately.

2.2 Phase A: data collection and pre-processing

We used immunohistochemistry (IHC) microscopy images extracted from the Human Protein Atlas (HPA) (http://www.proteinatlas.org/) for training and validation [31]. The HPA consists of six parts: the Tissue Atlas, the Cell Atlas, the Pathology Atlas, the Blood Atlas, the Brain Atlas and the Metabolic Atlas. We extracted IHC images from the Tissue Atlas, which presents the distribution of proteins across all major tissues and organs of the human body. Image labels were obtained from the HPA and the Universal Protein Resource (UniProt; https://www.uniprot.org/) [5]. We discarded IHC images whose protein expression level was scored 'not detected'. Both single-label data and mixed-label data (containing both single-label and multi-label samples) were used in our study.
The single-label data contains 14 antibody proteins with a total of 1386 IHC images belonging to 7 subcellular locations: endoplasmic reticulum (ER), cytoskeleton, Golgi apparatus, mitochondria, nucleolus, nucleus and vesicles. In addition to these 14 antibody proteins, the mixed-label data includes another 24 antibody proteins and contains a total of 3129 IHC images (531 images of multi-label proteins and 1212 new single-label images besides the original 1386). These proteins are distributed across nine subcellular locations, adding lysosome and cytoplasm to the seven above. We selected proteins with high and medium reliability scores. To increase the volume of training data and avoid over-fitting, we performed data augmentation by flipping (up/down and left/right) and rotation ($\pi/6$, $\pi/3$, $\pi/2$, $2\pi/3$, $5\pi/6$ and $\pi$). In total, we obtained 30 910 images for the single-label data and 27 492 images for the mixed-label data. Details of the data can be found in Supplementary File 1. The HPA images include two channels: a DNA channel (purple) and a protein channel (brown). We separated the protein channel from the DNA channel using the linear spectral separation (LIN) algorithm [37].

2.3 Phase B: feature extraction and selection

In our study, deep CNNs worked as feature extractors to learn rich and discriminative information for protein subcellular location prediction. We extracted five sets of features from five popular networks: AlexNet [11], VggNet [24], Xception [4], ResNet [6] and DenseNet [7]. For AlexNet, VggNet and ResNet, we appended three fully connected layers with 1024, 128 and 7 nodes (9 nodes for multi-label classification); for Xception and DenseNet, we appended two fully connected layers with 128 and 7 nodes (9 for multi-label classification). Training was stopped when the loss decayed below a magnitude of 1%. We extracted features from the second-to-last fully connected layer, so each architecture produced a 128-dimensional feature vector, and each protein expression image was represented by this vector under each CNN framework. We then performed feature selection to improve computational efficiency, using the minimum-redundancy maximum-relevance (mRMR) algorithm [22] combined with backward feature elimination (BFE) to reduce feature redundancy. mRMR ranks the features by mutual information, and BFE generates candidate subsets by eliminating the lowest-ranked features; the best-performing subset is picked as the optimal selected feature set.
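For readers who want to reproduce the selection loop, the following is a minimal sketch of the mRMR/BFE procedure described above. It assumes a feature matrix `X` (n_samples x 128) and a label vector `y`; the paper itself uses the pymrmr package, whereas this sketch approximates the mRMR ranking with scikit-learn's mutual-information estimator (relevance only, without the redundancy penalty):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def mrmr_bfe_select(X, y, cv=5):
    """Rank features by mutual information with the class labels
    (a relevance-only stand-in for mRMR), then run backward feature
    elimination: repeatedly drop the lowest-ranked remaining feature
    and keep the subset with the best cross-validated SVM accuracy."""
    order = np.argsort(mutual_info_classif(X, y))[::-1]  # best-ranked first
    kept = list(order)
    best_acc, best_subset = -1.0, list(kept)
    while kept:
        acc = cross_val_score(SVC(), X[:, kept], y, cv=cv).mean()
        if acc > best_acc:
            best_acc, best_subset = acc, list(kept)
        kept.pop()  # eliminate the currently lowest-ranked feature
    return best_subset, best_acc
```

A loop of this kind, run per network, is what produces the compact subsets reported in the results below (the FN columns of the tables).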
2.4 Phase C: subcellular location prediction

For single-label samples, the prediction follows an ordinary routine: a support vector machine (SVM) is used to distinguish the subcellular locations once the optimal feature subset is determined. Unlike single-label classification, multi-label predictors must consider the relationships between labels and features as well as the dependencies between labels arising from label co-occurrence [30]. Here we used a criterion learning strategy to capture both. The iLocator proposed by Xu et al. [37] employed a top criterion and a threshold criterion to decide the final multi-label set; however, these two criteria needed to be adjusted manually to obtain an ideal result. We proposed a multi-label prediction approach that decides the label set automatically, without human intervention: only one criterion is used to determine the label set, and its threshold is learned automatically.

For multi-label prediction, each sample may be assigned one or more labels. We denote the label set of a sample as $L=\{l_1,l_2,...,l_9\}$; $l_i$ is set to 1 if label $i$ belongs to the sample and to 0 otherwise. We trained an SVM classifier on the training images. For each image, the SVM outputs nine scores $P=\{p_1,p_2,...,p_9\}$ representing the probability of each class. The label with the maximum probability $p_{\textrm{max}}$ is always assigned to the sample, so the position in $L$ corresponding to $p_{\textrm{max}}$ is set to 1. We then seek probability values similar to $p_{\textrm{max}}$. The rationale is that true labels should have similar probabilities, all larger than those of the false labels. Therefore, the $i$th label is also assigned to the image if its probability is close to $p_{\textrm{max}}$, i.e. if the difference between the two probability scores is sufficiently small. We defined a parameter $\theta$ that works as the criterion determining the final labels: by comparing $\theta$ with the probability differences, the labels are obtained. The value of $l_i$ is determined as follows:

$$l_i=\begin{cases} 1 & \textrm{if}\ p_i=p_{\textrm{max}},\\ 1 & \textrm{if}\ p_i \ne p_{\textrm{max}}\ \textrm{and}\ p_{\textrm{dif},i}=p_{\textrm{max}}-p_i<\theta,\\ 0 & \textrm{if}\ p_i \ne p_{\textrm{max}}\ \textrm{and}\ p_{\textrm{dif},i}=p_{\textrm{max}}-p_i \geqslant \theta, \end{cases}$$ (1)

where $P_{\textrm{dif}}=\{p_{\textrm{dif},1},p_{\textrm{dif},2},...,p_{\textrm{dif},9}\}$ is the probability difference vector. If all differences between $p_{\textrm{max}}$ and the other probabilities are at least $\theta$, i.e. no element of $P_{\textrm{dif}}$ other than the $p_{\textrm{max}}$ position is smaller than $\theta$, then eight entries of $L$ are 0 and the sample is considered single-label. An example of the labeling strategy is shown in Figure 2.

Fig. 2 The labeling strategy of the proposed method. Sample 1 is predicted to have three labels and Sample 2 is predicted to be single-label. The orange boxes represent $p_{\textrm{max}}$, the dark blue boxes show probability values similar to $p_{\textrm{max}}$ and the light blue boxes represent probability values far from $p_{\textrm{max}}$. The obtained labels $L$ are listed at the top and bottom, respectively.
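As a concrete illustration, the following sketch implements Eq. (1) together with the grid search over $\theta$ described below. `P_val` and `Y_val` are hypothetical validation-set probability vectors and true binary label sets, and subset accuracy is used here as the selection score (the paper picks $\theta$ where its performance curves peak):

```python
import numpy as np

def assign_labels(p, theta):
    """Eq. (1): keep the maximum-probability label, plus every label
    whose probability is within theta of the maximum."""
    p = np.asarray(p, dtype=float)
    keep = (p == p.max()) | ((p.max() - p) < theta)  # p_dif,i < theta
    return keep.astype(int)

def learn_theta(P_val, Y_val, thetas=np.arange(0.05, 1.05, 0.05)):
    """Grid-search the criterion theta (step size 0.05, as in the paper)
    against true label sets, scoring by subset accuracy."""
    best_theta, best_acc = thetas[0], -1.0
    Y_val = np.asarray(Y_val)
    for t in thetas:
        pred = np.array([assign_labels(p, t) for p in P_val])
        acc = (pred == Y_val).all(axis=1).mean()  # exact-match ratio
        if acc > best_acc:
            best_theta, best_acc = t, acc
    return best_theta, best_acc
```

Note that the maximum-probability label is always kept, so the predicted label set is never empty regardless of $\theta$.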
In contrast to Xu et al.'s study, which set the two criteria manually, we employed only one criterion, $\theta$, whose value was obtained by grid search. Since the probabilities range from 0 to 1, $\theta$ was tuned from 0 to 1 with a step size of 0.05, and the optimal $\theta$ was picked where performance was best. The pseudo-code of the proposed method for predicting multi-label samples is shown in Algorithm 1.

Algorithm 1 Construction of the model predicting a protein's multiple subcellular locations
Input: IHC images of proteins.
Output: Model predicting protein subcellular locations.
1: Separate the two channels of the IHC images and use the protein channel images.
2: Augment all images so that their volume is sufficient for training.
3: Split the images into C_train and C_test.
4: Feed C_train into the different CNNs and test on C_test.
5: Extract the 128-dimensional feature vector F_D from the second-to-last fully connected layer for all images.
6: Rank F_D by mRMR (increasing order); denote the ranking as F_r and set F_update = F_r.
7: Split the images into I_train, I_test and I_val.
8: for j = 1; j <= 128; j++ do
9:   Train an SVM on I_train.
10:  Predict subcellular locations from F_update using the SVM on I_test; record the results as acc[j], sen[j], etc.
11:  F_update <- F_r - F_r[j].
12: end for
13: Choose the feature subset with the highest accuracy as the final selected subset f_s.
14: Train an SVM on I_train + I_test using f_s and output the probability vector P; find p_max.
15: Calculate the difference vector P_dif and tune theta.
16: Test and validate the model on I_val.

Table 1 Hyper-parameter settings of the five deep networks

Network    LR¹    BS²   BN³   Optimizer
AlexNet    0.01   128   Yes   Adam
VggNet     0.01   128   Yes   Adadelta
Xception   0.01   128   Yes   Adam
ResNet     0.01   64    Yes   Adadelta
DenseNet   0.01   32    Yes   Momentum

¹LR: learning rate. ²BS: batch size. ³BN: batch normalization.

2.5 Evaluation metrics

We used four metrics to evaluate the proposed framework: accuracy (ACC), sensitivity (SEN), specificity (SPE) and F1-score (F1). They are defined as follows:

$$\textrm{ACC}=\frac{\textrm{TP}+\textrm{TN}}{\textrm{TP}+\textrm{TN}+\textrm{FP}+\textrm{FN}}$$ (2)

$$\textrm{SEN}=\frac{\textrm{TP}}{\textrm{TP}+\textrm{FN}}$$ (3)

$$\textrm{SPE}=\frac{\textrm{TN}}{\textrm{TN}+\textrm{FP}}$$ (4)

$$\textrm{F1}=\frac{2\,\textrm{TP}}{2\,\textrm{TP}+\textrm{FP}+\textrm{FN}}$$ (5)

where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives, respectively.
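Eqs. (2)-(5) translate directly into code. The one-vs-rest counting shown here for the multi-class case is our reading, stated as an assumption:

```python
import numpy as np

def ovr_counts(y_true, y_pred, cls):
    """One-vs-rest confusion counts for class `cls`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == cls) & (y_pred == cls)))
    fn = int(np.sum((y_true == cls) & (y_pred != cls)))
    fp = int(np.sum((y_true != cls) & (y_pred == cls)))
    tn = int(np.sum((y_true != cls) & (y_pred != cls)))
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Eqs. (2)-(5)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)               # sensitivity (recall)
    spe = tn / (tn + fp)               # specificity
    f1 = 2 * tp / (2 * tp + fp + fn)   # F1-score
    return acc, sen, spe, f1
```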
For multi-label classification, subset accuracy ($\textrm{ACC}_{\textrm{sub}}$), label accuracy ($\textrm{ACC}_{\textrm{lab}}$) and average label accuracy ($\textrm{ACC}_{\textrm{avelab}}$) were also used to provide a more stringent measurement. $\textrm{ACC}_{\textrm{sub}}$ is the fraction of samples whose predicted label set is exactly the same as the true label set. $\textrm{ACC}_{\textrm{lab}}$ evaluates the prediction accuracy for each label and can be used to measure how difficult each label is to recognize [37]. $\textrm{ACC}_{\textrm{avelab}}$ is the average of $\textrm{ACC}_{\textrm{lab}}$ across all labels. In addition, we used receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) to evaluate performance. We used 10-fold cross-validation to validate the model.
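Under our reading of these definitions (taking $\textrm{ACC}_{\textrm{lab}}$ as per-label binary accuracy; the exact convention follows [37]), the three multi-label accuracies can be computed as:

```python
import numpy as np

def multilabel_accuracies(Y_true, Y_pred):
    """Y_true, Y_pred: (n_samples, n_labels) binary arrays.
    Returns subset accuracy, per-label accuracy and its average."""
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    acc_sub = (Y_true == Y_pred).all(axis=1).mean()  # exact-match ratio
    acc_lab = (Y_true == Y_pred).mean(axis=0)        # one accuracy per label
    return acc_sub, acc_lab, acc_lab.mean()
```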
3 Results

3.1 Prediction of single-label locations

We first tested the performance of the proposed method on the single-label proteins. We trained AlexNet, VggNet, Xception, ResNet and DenseNet and extracted 128-dimensional deep features from each. A total of 24 750 images were used for training the CNNs and 6160 images for testing. The networks were initialized randomly, and their settings are shown in Table 1. We then conducted feature selection with mRMR/BFE and carried out the classifications using an SVM. The intermediate results of feature selection are shown in Figure 3, which indicates that similar or better performance can be obtained with a smaller number of features.

Fig. 3 The intermediate results of feature selection for single-label prediction, taking Xception (left) and VggNet (right) as examples; they selected 22 and 120 features, respectively.

After picking the optimal feature subsets, the model was trained on the optimal features. The performance of each network on the test data is shown in Table 2. Xception achieves the best performance: with only 22 deep features, it reaches an ACC of 92.1% and an AUC of 0.9918. After Xception, AlexNet, VggNet, DenseNet and ResNet follow in decreasing order of ACC. The ROC curves are shown in Figure 4: AlexNet, VggNet, Xception and DenseNet have similar AUC values, while ResNet stays below the other four networks. The confusion matrix of the best-performing network, Xception, is shown in Table 3: prediction of mitochondrion reaches 100%, while vesicles are not predicted as accurately as the other subcellular locations. The confusion matrices of AlexNet, VggNet, DenseNet and ResNet can be found in Supplementary File 2.

Table 2 Performance of the five deep networks on single-label samples

Network    FN¹   ACC (%)   SEN (%)   SPE (%)   AUC    F1 (%)
AlexNet    95    91.4      91.2      98.5      0.99   91.7
VggNet     120   89.9      89.3      98.3      0.99   89.7
Xception   22    92.1      91.9      98.7      0.99   91.9
ResNet     108   75.7      75.3      95.9      0.96   75.1
DenseNet   51    89.2      89.6      98.2      0.99   89.8

¹FN: feature number.

Table 3 Confusion matrix of Xception (rows: true locations; columns: predicted locations; values in %)

                  Golgi app.  Mitochondrion  Vesicles  ER    Nucleolus  Nucleus  Cytoskeleton
Golgi apparatus   92.9        0              0         0     7.1        0        0
Mitochondrion     0           100            0         0     0          0        0
Vesicles          0           0              81.8      4.5   0          4.5      9.1
ER                0           0              4.3       95.7  0          0        0
Nucleolus         0           0              0         0     90.9       9.1      0
Nucleus           0           0              6.7       0     6.7        86.7     0
Cytoskeleton      0           4.8            0         0     0          0        95.2

Fig. 4 The ROC curves of the five deep networks for classifying single-label samples.

3.2 Performance of the proposed predictor on mixed-label locations

3.2.1 Discrimination of multi-label and single-label samples

The proposed approach must be able to discriminate single-label from multi-label proteins when mixed-label samples are provided. In our design, whether a sample is single-label or multi-label is judged from the number of elements of the probability difference vector $P_{\textrm{dif}}$ that are smaller than $\theta$ (excluding the $p_{\textrm{max}}$ position). If this number is 0, all other probability scores are far from $p_{\textrm{max}}$ and the sample is single-label; if it is larger than 0, at least one probability score besides $p_{\textrm{max}}$ is similar to $p_{\textrm{max}}$ and the sample is multi-label. Labeling multi-label as 1 and single-label as 0, we examined how well the proposed method distinguishes single-label from multi-label samples on the mixed-label data set. The results are shown in Table 4.
Table 4 Performance of the proposed method in distinguishing single-label and multi-label samples

Network    FN¹   Total N²   N(n=0)³   N(n>=1)⁴   ACC (%)
AlexNet    116   315        243       43         90.8
VggNet     116   315        244       49         93.0
Xception   22    315        186       16         64.1
ResNet     107   315        219       32         79.7
DenseNet   13    315        219       27         78.1

¹FN: feature number. ²Total N: total number of images. ³N(n=0): number of correctly predicted images whose difference vector has no element with $p_{\textrm{dif}}<\theta$ besides the $p_{\textrm{max}}$ position, i.e. the number of correctly detected single-label samples (true negatives).
⁴N(n>=1): number of correctly predicted images whose difference vector has at least one element with $p_{\textrm{dif}}<\theta$ besides the $p_{\textrm{max}}$ position, i.e. the number of correctly detected multi-label samples (true positives).

Table 4 shows that VggNet reaches the highest accuracy, 93.0%, followed by AlexNet, ResNet, DenseNet and Xception. This differs from the single-label classification, where Xception performed best.

3.2.2 Prediction of mixed-label locations

Having established that the proposed method can discriminate single-label from multi-label images, we tested its performance in predicting mixed-label samples. A total of 21 984 images were used to train the CNNs and 5508 images for testing. The mixed-label classification used the selected features and the probability vector to learn $\theta$, and this $\theta$ determines the final label set. The value of $\theta$ was obtained through learning: starting from 0.1, $\theta$ was incremented in steps of 0.05 until reaching 1. The performance curves for learning $\theta$ are shown in Figure 5; we took the value of $\theta$ at which the curves peaked.

Fig. 5 The learning of $\theta$. The starting value of $\theta$ is 0.1 and the step size is 0.05. The curves peak when $\theta$ equals 0.1.

Next, we tested the performance of the five networks, shown in Table 5 and Figure 6 (for the ROC, we obtained one curve per class and present the micro-average over all classes). Table 5 shows that VggNet achieves the highest subset accuracy, accuracy and AUC, whereas Xception performs worst, with the lowest subset accuracy, accuracy and AUC. In Figure 6, VggNet stays top-leftmost while Xception stays below the other curves. We further show the performance of recognizing each subcellular location in Figure 7, in order to observe which locations are easier to detect: 'Golgi apparatus', 'Cytoskeleton' and 'Nucleolus' stay close together and above the other locations, meaning these three locations are easier to recognize, while 'Mitochondrion' and 'Cytoplasm' are the most difficult to identify among the nine locations.

Table 5 Performance of the five deep networks on mixed-label data

Network    FN¹   θ      ACC_sub (%)   ACC_avelab (%)   ACC (%)   AUC    F1 (%)
AlexNet    116   0.15   80.3          95.3             83.0      0.95   83.1
VggNet     116   0.1    84.1          95.2             85.9      0.97   82.7
Xception   22    0.1    29.8          80.6             39.5      0.77   40.6
ResNet     107   0.1    67.3          92.6             74.1      0.95   73.5
DenseNet   13    0.1    57.1          89.8             65.5      0.92   69.1

¹FN: feature number.
Fig. 6 The ROC curves of the five deep networks for classifying mixed-label samples.

Fig. 7 The performance of recognizing each subcellular location across the five deep architectures.

3.3 Comparison with hand-crafted features

Most current protein subcellular localization approaches extract hand-crafted features and make predictions based on them. As comparison studies, we extracted the four types of popular texture features that were also used in Yang et al.'s study: 8126 Haralick texture features, 28 LBP features, 72 LTP features and 288 LQP features, i.e. 8514 hand-crafted features in total. Following the same procedure as in the proposed method, we selected the optimal feature subset and made predictions. Details of the four feature types can be found in Supplementary File 3.

3.3.1 Visualization and representation of features

To gain insight into the features, we applied t-distributed stochastic neighbor embedding (t-SNE) [32] to visualize the distribution of the deep features and of the hand-crafted features (denoted as Yang et al.'s method [40]); the plots are shown in Figure 8. Xception separates the classes quite well in the low-dimensional space, whereas VggNet's features are more mixed. Yang et al.'s features almost align along a single line, and the seven categories are difficult to distinguish even in low dimension. This shows that the deep features are more discriminative than the hand-crafted features.
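The visualization can be reproduced along these lines; `X` and `y` are assumed to hold the selected feature matrix and the location labels for the single-label data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(X, y):
    """Project the feature matrix X to 2D with t-SNE and color by label."""
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    y = np.asarray(y)
    for cls in np.unique(y):
        m = y == cls
        plt.scatter(emb[m, 0], emb[m, 1], s=5, label=str(cls))
    plt.legend(markerscale=3)
    plt.show()
```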
Table 6 Comparison between the deep features and Yang et al.'s features on single-label samples. We list the best- and worst-performing CNNs, Xception and ResNet, for comparison

Method          FN¹   ACC (%)   SEN (%)   SPE (%)   AUC    F1 (%)
Xception        22    92.1      91.9      98.7      0.99   91.9
ResNet          108   75.7      75.3      95.9      0.96   75.1
Yang et al.'s   500   71.2      70.8      95.2      0.93   70.7

¹FN: feature number.

Fig. 8 t-SNE plots of the deep features and hand-crafted features, applied to the single-label data.

Fig. 9 The comparison between the deep features and Yang et al.'s features for classifying single-label samples.

3.3.2 Comparison with hand-crafted features on single-label samples

We compared the performance of the deep features and Yang et al.'s features on single-label samples. We extracted the features, selected the optimal feature subset with mRMR/BFE and predicted the labels using an SVM. The results are shown in Table 6 and Figure 9. Yang et al.'s approach reached an ACC of 71.2%, about 20% lower than Xception and even lower than the worst-performing CNN, ResNet. Moreover, Yang et al.'s accuracy required 500 selected features, a much larger subset than those of the CNNs.

3.3.3 Comparison with hand-crafted features on mixed-label samples

We also compared the proposed method with the hand-crafted features for mixed-label subcellular location prediction. The results are shown in Table 7 and Figure 10.

Table 7 Comparison between the deep features and Yang et al.'s features on mixed-label samples. We list the best- and worst-performing CNNs, VggNet and Xception, for comparison

Method          FN¹   ACC (%)   SEN (%)   SPE (%)   AUC    F1 (%)
VggNet          116   85.9      88.6      96.2      0.97   82.7
Xception        22    39.5      45.9      84.4      0.77   40.6
Yang et al.'s   480   54.9      59.2      90.9      0.91   57.4

¹FN: feature number.

Fig. 10 The comparison between the deep features and Yang et al.'s features for classifying mixed-label samples.

Fig. 11 The comparison between the ordinary cut-off method and the proposed method on mixed-label samples. FN represents feature number. We take VggNet and Xception as examples; (c) shows the average across the five deep networks, each column representing the mean value over the five architectures. Comparisons of AlexNet, ResNet and DenseNet can be found in Supplementary File 4.
From Table 7 and Figure 10, Yang et al.'s method performs between the best and worst deep features in terms of accuracy, while using the largest number of features among the three methods. Thus, for mixed-label samples as well, the CNN features achieve more competitive results than the hand-crafted features.

3.4 Multi-label learning strategy

In the multi-label prediction, the SVM outputs a probability vector. We take the label with the maximal probability as the first predicted label and calculate the difference vector; we then learn a criterion $\theta$, and by comparing the difference vector with this criterion, the other labels are determined. We compared the proposed strategy with the ordinary cut-off method, in which a label is set to 1 if its probability is larger than 0.5 and to 0 otherwise. The results are shown in Table 8, and the comparison between the ordinary method and the proposed method is shown in Figure 11. Generally speaking, the proposed multi-label learning outperforms the ordinary cut-off on most metrics, showing that learning a proper criterion is more accurate than a pre-set threshold value.

Table 8 Performance of the five deep networks on mixed-label data using the ordinary cut-off method

Network    FN¹   ACC_sub (%)   ACC_avelab (%)   ACC (%)   AUC    F1 (%)
AlexNet    109   79.4          96.3             79.9      0.96   85.1
VggNet     101   82.5          97.2             83.0      0.97   89.3
Xception   15    20.6          85.9             23.7      0.72   29.2
ResNet     108   61.6          94.5             65.6      0.95   77.9
DenseNet   13    53.3          92.6             57.0      0.91   70.0

¹FN: feature number.
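For reference, the cut-off baseline is a one-liner. Unlike the criterion method, it can return an empty label set when no probability exceeds the threshold, which may be one reason the learned criterion behaves better on this task (our observation, not the paper's):

```python
import numpy as np

def cutoff_labels(p, threshold=0.5):
    """Ordinary cut-off: a label is on iff its probability exceeds 0.5.
    Note this may yield an all-zero label set, whereas the criterion
    method of Eq. (1) always keeps the maximum-probability label."""
    return (np.asarray(p) > threshold).astype(int)
```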
The CNNs were implemented under the TensorFlow framework (v1.14.0) in Python (v3.5.2). The classification analysis was performed with the Python (v3.6) based scikit-learn (v0.19.1) [21], feature selection with pymrmr (v0.1.8) and the hand-crafted feature extraction with MATLAB (v2018a).

4 Conclusions and discussion

In this study, we proposed an imaging-based method to predict protein subcellular location from IHC images. CNNs, among the most popular deep learning approaches for imaging tasks, worked as feature extractors to automatically learn the most informative features for prediction. To date, only a handful of studies have tried deep features for protein localization prediction; this study shows that deep features are more discriminative than the traditionally used hand-crafted features. The proposed method can handle proteins located at both single and multiple locations. We exploited label–label relevance, which is ignored in many multi-label classification methods, and made more accurate predictions. Unlike previous studies, which set two criteria manually to decide the final label set after the probability vector was generated, we used a learning strategy to automatically find a proper criterion value, avoiding human intervention.

We tested five popular CNNs. For single-label samples, Xception achieved the best performance, while for mixed-label samples, VggNet reached the highest accuracy. Compared with the hand-crafted features, the CNNs also showed more competitive results. However, the proposed method has some limitations. First, we tested only five CNN architectures; the results might be further improved if more architectures were employed. Second, only the protein channel was used. Although most current studies indicate that the protein channel is more effective, it would be more thorough to also test the DNA channel and the combination of both channels. In the future, we aim to try more architectures and more channels to further improve the results. We will also combine diverse sources of information, such as evolutionary and physicochemical information, to develop a multi-modality model for this problem.

Key Points
- We proposed a protein subcellular location prediction method based on phenotype features; immunohistochemistry images were used to predict the subcellular locations of proteins.
- Although deep learning has shown impressive performance in imaging tasks, only a handful of studies have applied it to protein subcellular localization. We report which architecture gives optimal results.
- Both single-label and multi-label proteins can be handled. We developed, for the first time, a multi-label prediction method using a criterion learning strategy, enabling the learning of label–label relevance in mixed-label classification.

Acknowledgments
National Natural Science Foundation of China (Grant Nos. 62072329, 62071278, 61702361 and 61701340); Natural Science Foundation of Tianjin (Nos. 18JCQNJC00800 and 18JCQNJC00500).

Conflict of interest
The authors declare that they have no conflict of interest related to this work.

Ran Su received her PhD degree in computer science from The University of New South Wales, Australia, in 2013. She worked as a research fellow at the Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR) from 2013 to 2016.
She joined the College of Intelligence and Computing, Tianjin University, in 2016 as an associate professor and was awarded the 'Tianjin Youth Talented 1000 Experts' title. Her research areas are bioinformatics and biomedical image processing.

Linlin He received her bachelor's degree from the School of Software, Tianjin University, China. She is currently a master's student in the School of Computer Software, College of Intelligence and Computing, Tianjin University, China. Her research interests are bioinformatics and machine learning.

Tianling Liu received his bachelor's degree from the School of Software, Liaocheng University, China. He is currently a master's student in the School of Computer Software, College of Intelligence and Computing, Tianjin University, China. His research interest is medical imaging.

Xiaofeng Liu is currently an assistant professor at Tianjin Medical University Cancer Institute and Hospital. He received his PhD in bioengineering and biomedical engineering from Nanyang Technological University.

Leyi Wei received his PhD in computer science from Xiamen University, China. He is currently a professor in the School of Software at Shandong University, China. His research interests include machine learning and its applications to bioinformatics.

References
1. Boland MV, Murphy RF. A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells. Bioinformatics 2001;17(12):1213-23.
2. Chebira A, Barbotin Y, Jackson C, et al. A multiresolution approach to automated classification of protein subcellular location images. BMC Bioinform 2007;8:210.
3. Chen S-C, Zhao T, Gordon GJ, et al. Automated image analysis of protein localization in budding yeast. Bioinformatics 2007;23(13):i66-71.
4. Chollet F. Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. IEEE, 2017, 1800-1807.
5. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 2019;47(D1):D506-15.
6. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016, 770-8.
7. Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. IEEE, 2017, 2261-2269.
8. Huang K, Murphy RF. Automated classification of subcellular patterns in multicell images without segmentation into single cells. In: Proceedings of the 2004 IEEE International Symposium on Biomedical Imaging. IEEE, 2004, 1139-42.
9. Huang K, Murphy RF. Boosting accuracy of automated classification of fluorescence microscope images for location proteomics. BMC Bioinform 2004;5:78.
10. Jin Q, Meng Z, Pham TD, et al. DUNet: a deformable network for retinal vessel segmentation. Knowl Based Syst 2019;178:149-62.
11. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. Curran Associates, 2012, 1097-105.
12. Lin C-C, Tsai Y-S, Lin Y-S, et al. Boosting multiclass learning with repeating codes and weak detectors for protein subcellular localization. Bioinformatics 2007;23(24):3374-81.
13. Lomenick B, Olsen RW, Huang J. Identification of direct protein targets of small molecules. ACS Chem Biol 2011;6(1):34-6.
14. Lu AX, Chong YT, Hsu IS, et al. Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins. Elife 2018;7:e31872.
15. Hung M-C, Link W. Protein localization in disease and therapy. J Cell Sci 2011;124(20):3381-92.
16. Murphy RF, Boland M, Velliste M. Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. In: Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, La Jolla/San Diego, CA, USA, Vol. 8. AAAI Press, 2000, 251-259.
17. Murphy RF. Automated interpretation of protein subcellular location patterns: implications for early cancer detection and assessment. Ann N Y Acad Sci 2004;1020(1):124-31.
18. Murphy RF, Velliste M, Porreca G. Robust numerical features for description and classification of subcellular location patterns in fluorescence microscope images. J VLSI Signal Process Syst Signal Image Video Technol 2003;35:311-21.
19. Park S, Yang J-S, Shin Y-E, et al. Protein localization as a principal feature of the etiology and comorbidity of genetic diseases. Mol Syst Biol 2011;7:494.
20. Pärnamaa T, Parts L. Accurate classification of protein subcellular localization from high-throughput microscopy images using deep learning. G3 (Bethesda) 2017;7(5):1385-92.
21. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825-30.
22. Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 2005;27(8):1226-38.
23. Shao W, Ding Y, Shen H-B, et al. Deep model-based feature extraction for predicting protein subcellular localizations from bio-images. Front Comput Sci 2017;11(2):243-52.
24. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations, 2015.
25. Su R, Wu H, Liu X, et al. Predicting drug-induced hepatotoxicity based on biological feature maps and diverse classification strategies. Brief Bioinform 2020.
26.
Tahir M, Khan A. Protein subcellular localization of fluorescence microscopy images: employing new statistical and Texton based image features and SVM based ensemble classification. Inform Sci 2016;345:65-80.
27. Tahir M, Khan A, Majid A. Protein subcellular localization of fluorescence imagery using spatial and transform domain features. Bioinformatics 2012;28(1):91-7.
28. Tahir M, Khan A, Majid A, et al. Subcellular localization using fluorescence imagery: utilizing ensemble classification with diverse feature extraction strategies and data balancing. Appl Soft Comput 2013;13(11):4231-43.
29. Thul PJ, Akesson L, Wiking M, et al. A subcellular map of the human proteome. Science 2017;356(6340):eaal3321.
30. Tsai C-P, Lee H-Y. Adversarial learning of label dependency: a novel framework for multi-class classification. In: Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom. IEEE, 2019, 3847-3851.
31. Uhlen M, Cheng Z, Sjöstedt E, et al. A pathology atlas of the human cancer transcriptome. Science 2017;357(6352):eaan2507.
32. van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. J Mach Learn Res 2008;9:2579-605.
33. Wang X, Li G-Z. Multilabel learning via random label selection for protein subcellular multilocations prediction. IEEE/ACM Trans Comput Biol Bioinform 2013;10(2):436-46.
34. Wei L, Ding Y, Su R, et al. Prediction of human protein subcellular localization using deep learning. J Parallel Distrib Comput 2018;117:212-7.
35. Winsnes C, Sullivan DP, Smith K, et al. Multi-label prediction of subcellular localization in confocal images using deep neural networks. Mol Biol Cell 2016;27.
36. Xu Y-Y, Yang F, Shen H-B. Incorporating organelle correlations into semi-supervised learning for protein subcellular localization prediction. Bioinformatics 2016;32(14):2184-92.
37. Xu Y-Y, Yang F, Zhang Y, et al. An image-based multi-label human protein subcellular localization predictor (iLocator) reveals protein mislocalizations in cancer tissues. Bioinformatics 2013;29(16):2032-40.
38. Xu Y-Y, Yao L-X, Shen H-B. Bioimage-based protein subcellular location prediction: a comprehensive review. Front Comput Sci 2018;12:26-39.
39. Yang F, Yang L, Wang Y, et al. MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy. BMC Bioinform 2019;20:522.
40. Yang F, Xu Y-Y, Wang S-T, et al. Image-based classification of protein subcellular location patterns in human reproductive tissue by ensemble learning global and local features.
Neurocomputing 2014;131:113-23.
41. Zhou H, Yang Y, Shen H-B. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features. Bioinformatics 2017;33(6):843-53.
42. Lin Z, Yang J, Shen H-B. Multi label learning for prediction of human protein subcellular localizations. Protein J 2009;28(9-10):384-90.

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com

TI - Protein subcellular localization based on deep image features and criterion learning strategy
JF - Briefings in Bioinformatics
DO - 10.1093/bib/bbaa313
DA - 2020-12-16
UR - https://www.deepdyve.com/lp/oxford-university-press/protein-subcellular-localization-based-on-deep-image-features-and-x5btEaKuWb
SP - 1
EP - 1
VL - Advance Article
IS -
DP - DeepDyve
ER -