TY - JOUR
AU - Bao, Wenzheng
AB - Abstract
Emerging evidence indicates that the abnormal expression of miRNAs is involved in the evolution and progression of various complex human diseases. Identifying disease-related miRNAs as new biomarkers can promote the development of disease pathology and clinical medicine. However, designing biological experiments to validate disease-related miRNAs is usually time-consuming and expensive. Therefore, there is an urgent need for effective computational methods that predict potential miRNA-disease associations. Inspired by the great progress of graph neural networks in link prediction, we propose a novel graph auto-encoder model, named GAEMDA, to identify potential miRNA-disease associations in an end-to-end manner. More specifically, the GAEMDA model applies a graph neural network-based encoder, which contains an aggregator function and a multi-layer perceptron for aggregating nodes' neighborhood information, to generate low-dimensional embeddings of miRNA and disease nodes and to fuse heterogeneous information effectively. Then, the embeddings of miRNA and disease nodes are fed into a bilinear decoder to identify the potential links between miRNA and disease nodes. The experimental results indicate that GAEMDA achieves an average area under the curve (AUC) of |$93.56\pm 0.44\%$| under 5-fold cross-validation. In addition, we carried out case studies on colon neoplasms, esophageal neoplasms and kidney neoplasms. As a result, 48 of the top 50 predicted miRNAs associated with these diseases are confirmed by the database of differentially expressed miRNAs in human cancers (dbDEMC) and the database for microRNA deregulation in human disease (miR2Disease), respectively. The satisfactory prediction performance suggests that the GAEMDA model could serve as a reliable tool to guide subsequent research on the regulatory role of miRNAs. The source code is available at https://github.com/chimianbuhetang/GAEMDA.
Keywords: miRNA, complex disease, miRNA-disease associations prediction, heterogeneous graph, graph auto-encoder, graph neural networks

Introduction
MicroRNAs (miRNAs) are a class of small, endogenous, noncoding single-stranded RNA molecules with a length of about 22 nucleotides, which can regulate gene expression at the posttranscriptional level [1, 2]. It has been more than two decades since the first two miRNAs, lin-4 and let-7, were discovered in Caenorhabditis elegans [3, 4]. During this period, a growing number of studies have shown that miRNAs play a critical role in various complex biological processes such as cell proliferation, differentiation, signal transduction and viral infection [5]. Furthermore, emerging experimental evidence also suggests that mutations or aberrant expression of miRNAs often lead to the evolution and progression of numerous complex human diseases [6]. For instance, it was confirmed that overexpression of hsa-mir-449a in CL1-0 increases irradiation-induced DNA damage and apoptosis, changes the cell cycle distribution and ultimately sensitizes CL1-0 to irradiation [7]. Besides, hsa-mir-195 and hsa-mir-497 were verified to play key inhibitory roles in breast cancer malignancy and may even serve as potential diagnostic targets [8]. Therefore, adopting appropriate experimental or computational methods to explore the associations between miRNAs and diseases could help establish miRNAs as tumor suppressors or biomarkers, and help medical personnel gain insight into the pathological mechanisms of various complex diseases and develop related new drugs from the molecular perspective [9]. Traditional experimental methods used for identifying potential miRNA-disease associations mainly include reverse transcription polymerase chain reaction (PCR) [10], northern blotting [11] and microarray profiling [12].
In general, experimental methods tend to be inefficient and require a significant investment of time and money. However, owing to the reliability of experimental methods, researchers have established many authoritative bioinformatic databases to store experimentally confirmed miRNA-disease associations, such as the database of differentially expressed miRNAs in human cancers (dbDEMC) [13], the human microRNA disease database (HMDD) [14] and the database for microRNA deregulation in human disease (miR2Disease) [15]. At the same time, the computing and storage performance of computers has greatly improved. Therefore, it has become possible to design effective computational methods to predict potential miRNA-disease associations. Computational methods are often efficient and economical, and can prioritize miRNAs based on their potential impact on a specific disease. They can thus point researchers to the top-ranked miRNAs and prompt them to carry out relevant experiments to further validate these associations. Over the past decade, numerous computational methods for predicting potential miRNA-disease associations have been proposed. Among them, similarity measure-based methods form a classical family, which prioritizes disease-related miRNAs based on the assumption that miRNAs with similar functions tend to be related to phenotypically similar diseases [16]. For instance, Jiang et al. [17] first developed a computational method to examine the relationships between functionally related miRNAs and phenotypically similar diseases based on a human phenome-microRNAome network. However, the predictive performance of this method was restricted by the high false positive and false negative rates in miRNA-target associations. Later, Chen et al.
[18] integrated Gaussian interaction profile kernel similarity into the calculation of miRNA and disease similarity to predict related miRNAs for diseases without any known related miRNAs (WBSMDA). Note that WBSMDA is a global ranking method, which allows miRNAs to be prioritized for all diseases simultaneously. Che et al. [19] designed a new algorithm to calculate miRNA functional similarity based on the Levenshtein distance between two miRNA sequences, and proposed the LFEMDA method to predict potential associations. Besides, Zhang et al. [20] developed the FLNSNLI method to predict potential miRNA-disease associations. This method first expressed miRNAs and diseases as association profiles and then applied a fast linear neighborhood similarity measure to calculate miRNA and disease similarity. Finally, FLNSNLI adopted a label propagation algorithm on these two kinds of similarity and utilized a weighted average strategy to obtain the final prediction scores. By integrating multiple data sources including disease gene information, miRNA target gene information and gene similarity information, Ma et al. [21] constructed novel similarity matrices for miRNAs and diseases, and applied a kernel neighborhood similarity algorithm to calculate kernel neighbor similarity for miRNAs and diseases. Finally, they applied a bidirectional propagation algorithm to obtain the prediction scores. Machine learning-based methods are another commonly used family of computational methods for predicting potential miRNA-disease associations. To predict different types of miRNA-disease associations, Chen et al. [22] utilized a restricted Boltzmann machine as the classifier and presented the RBMMMDA method. Then, Chen et al. [23] proposed the RFMDA method, which combines a filter-based feature selection strategy with a random forest classifier to enhance the prediction performance. Later, motivated by RFMDA, Yao et al. [24] further designed the IRFMDA method to optimize the model's prediction ability.
IRFMDA utilized a novel feature selection strategy based on the variable importance score of random forest and adopted random forest regression to predict unknown associations. Besides, Yan et al. [25] proposed the DNRLMF-MDA method, which utilized logistic matrix factorization and dynamic neighborhood regularization to compute the probability of miRNA-disease associations. Peng et al. [26] presented the MDA-CNN method, which adopted a three-layer network and an auto-encoder to capture significant miRNA-disease feature combinations, and employed a convolutional neural network to obtain the final prediction scores. Zheng et al. [27] developed MLMDA to make full use of miRNA sequence information based on a k-mer sparse matrix and adopted a random forest classifier to obtain the prediction probability. Zhou et al. [28] proposed a novel method for miRNA-disease associations prediction called GBDT-LR, which first screened negative samples by applying k-means clustering to unknown miRNA-disease associations, then applied a gradient boosting decision tree to extract more discriminative features, and finally employed a logistic regression model to obtain the prediction scores. Meanwhile, by integrating the interactions among miRNA, disease, lncRNA, drug and protein into a heterogeneous network, Ji et al. [29] employed the learning graph representations with global structural information (GraRep) method to obtain the integrated features of miRNAs and diseases and adopted a random forest classifier for prediction. Influenced by the tremendous progress made by graph neural networks on graph-structured data, such as the Cora [30], MovieLens [31], Reddit [32] and Protein–Protein Interactions (PPI) [33] datasets, numerous graph neural network-based methods have emerged to address the prediction of potential miRNA-disease associations. For example, Li et al.
[34] proposed the HGCNMDA method to infer disease-related miRNAs, which adopted the node2vec algorithm and graph convolutional networks on a PPI network to obtain cross-features of miRNAs and diseases, and designed an edge feature extraction component for predicting potential associations. In order to obtain more valuable features of miRNAs and diseases, Li et al. [35] applied graph convolutional networks to the miRNA similarity network and the disease similarity network, respectively, and proposed the NIMCGCN method to generate miRNA-disease associations based on neural inductive matrix completion. By combining miRNA similarity and disease similarity into a fully connected homogeneous graph, Li et al. [36] presented a fully connected graph convolutional network for potential miRNA-disease associations prediction. In this paper, we propose a novel graph auto-encoder model named GAEMDA for potential miRNA-disease associations prediction. Specifically, we first constructed a miRNA-disease bipartite graph to formulate the associations between miRNAs and diseases, in which each node is represented by the corresponding similarity information, and each link represents the corresponding association. Secondly, considering the heterogeneity of miRNA and disease nodes, we designed node-type transformation matrices to project the miRNA and disease nodes into the same vector space. Thirdly, in order to fully explore the rich miRNA-disease interaction information, we generated the embeddings of nodes by aggregating their heterogeneous neighborhood features into their original features through the graph neural network-based encoder. Fourthly, the embeddings of miRNA and disease nodes were fed into a bilinear decoder to reconstruct the links between miRNA nodes and disease nodes. Then, the cross-entropy loss and the back propagation algorithm were utilized to train the whole model in an end-to-end manner.
Moreover, we evaluated the prediction performance of the GAEMDA model based on 5-fold cross-validation. GAEMDA achieved an average area under the curve (AUC) of |$93.56\pm 0.44\%$|, Accuracy of |$84.93\pm 0.95\%$|, Precision of |$81.37\pm 1.98\%$|, Recall of |$90.70\pm 1.27\%$| and F1-score of |$85.75\pm 0.76\%$|. To further verify the performance of GAEMDA in predicting potentially related miRNAs for certain diseases, case studies on colon neoplasms, esophageal neoplasms and kidney neoplasms were carried out. The results showed that 48 of the top 50 predicted miRNAs for these neoplasms can be confirmed by dbDEMC and miR2Disease, respectively. Our model provides a novel perspective on using existing miRNA-disease interaction information to address the miRNA-disease associations prediction task with graph neural networks. All the results show that GAEMDA can be used as a powerful tool to guide subsequent research on the regulatory role of miRNAs.

Materials and methods

Human miRNA-disease associations
In this study, we adopted HMDD v2.0 as the benchmark dataset and directly downloaded the experimentally verified miRNA-disease associations from https://www.cuilab.cn/hmdd [14]. In total, we obtained 5430 experimentally verified miRNA-disease associations between 383 diseases and 495 miRNAs. For convenience, we utilized a binary matrix |$\mathrm{DM}$| with 383 rows and 495 columns to store the associations. If a disease is associated with a miRNA, then the element at the corresponding position of matrix |$\mathrm{DM}$| is set to 1, otherwise 0. Note that all the experimentally verified associations were selected as positive samples in our following experiments.
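As a minimal sketch of how a binary matrix such as |$\mathrm{DM}$| can be assembled, the following Python snippet fills a disease-by-miRNA matrix from a list of association pairs. The function name `build_association_matrix` and the toy identifiers are illustrative assumptions and are not taken from the released GAEMDA code; with the HMDD v2.0 data the same procedure would yield a 383 x 495 matrix containing 5430 ones.

```python
def build_association_matrix(pairs, diseases, mirnas):
    """Return a binary disease-by-miRNA matrix (list of rows), analogous to DM."""
    disease_index = {name: i for i, name in enumerate(diseases)}
    mirna_index = {name: j for j, name in enumerate(mirnas)}
    dm = [[0] * len(mirnas) for _ in diseases]
    for disease, mirna in pairs:
        # A verified association sets the corresponding element to 1.
        dm[disease_index[disease]][mirna_index[mirna]] = 1
    return dm


# Toy placeholder identifiers (hypothetical, not taken from HMDD).
diseases = ["disease_A", "disease_B"]
mirnas = ["mir_1", "mir_2", "mir_3"]
pairs = [("disease_A", "mir_1"), ("disease_A", "mir_3"), ("disease_B", "mir_2")]

dm = build_association_matrix(pairs, diseases, mirnas)
n_positive = sum(map(sum, dm))  # number of known associations (positive samples)
```

Rows index diseases and columns index miRNAs, matching the orientation used for the interaction profiles later in this section.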
MiRNA functional similarity On the basis of the calculation of miRNA functional similarity provided by Wang et al., assuming that diseases with similar phenotypes are more likely to be linked with functional similar miRNAs, and vice versa, we can obtain the miRNA functional similarity from https://www.cuilab.cn/files/images/cuilab/misim.zip [37]. Here, we built matrix |$\mathrm{MFSM}$| with 495 rows and 495 columns to store the miRNA functional similarity, in which |$\mathrm{MFSM}({m}_i,{m}_j)$| denotes the miRNA functional similarity score between miRNA |${m}_i$| and |${m}_j$|⁠. Disease semantic similarity Based on previous study [38], disease semantic similarity can be calculated based on the medical subject headings (MeSH) descriptors, which is available on https://www.ncbi.nlm.nih.gov/. Here, we formulated each disease as a directed acyclic graph (DAG). Concretely, we can adopt |$\mathrm{DAG}({d}_i)=({d}_i,T({d}_i),E({d}_i))$| to describe disease |${d}_i$|⁠, in which |$T({d}_i)$| represents a set of nodes composed of both node |${d}_i$| and its ancestor nodes, |$E({d}_i)$| denotes the corresponding edge set containing direct links from parent nodes to child nodes. Then, we can calculate the semantic contribution of disease |${d}_k$| to |${d}_i$| as follows: $$\begin{equation} D{1}_{d_i}\left({d}_k\right)=\left\{\begin{array}{@{}cc}1& if\ {d}_k={d}_i\\{}\max \left\{\varDelta \ast D{1}_{d_i}\left({d}_k^{\prime}\right)|{d}_k^{\prime}\in \mathrm{children}\ \mathrm{of}\ {d}_k\right\}& if\ {d}_k\ne{d}_i \end{array},\right. \end{equation}$$(1) where |$\varDelta$| represents the semantic contribution decay factor and we set it as 0.5 according to previous study [37]. The semantic contribution value of disease |${d}_i$| to itself is 1, and the semantic contribution value of disease |${d}_k$| to disease |${d}_i$| will decrease as the distance between them increases. 
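To make equation (1) concrete, the sketch below computes the semantic contributions of all ancestors of a disease on a toy DAG. It is an illustration under stated assumptions: the DAG is given as a hypothetical `parents` dictionary (child to list of parents), the node names are placeholders rather than MeSH descriptors, and the function name `semantic_contributions` is not from the original work.

```python
DELTA = 0.5  # semantic contribution decay factor used in equation (1)


def semantic_contributions(disease, parents, delta=DELTA):
    """Return {node: D1 value} for `disease` and all of its ancestors."""
    contrib = {disease: 1.0}  # a disease contributes 1 to itself
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the maximum over all children, as in equation (1).
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


# Toy DAG: C has parents A and B, and B has parent A.
parents = {"C": ["A", "B"], "B": ["A"]}
contrib = semantic_contributions("C", parents)
```

Here the ancestor A is reachable both directly (one step) and through B (two steps); the maximum in equation (1) keeps the contribution of the shorter path.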
Therefore, the semantic value of disease |${d}_i$| can be defined as below: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle DS1\left({d}_i\right)=\sum_{d_k\in T\left({d}_i\right)}D{1}_{d_i}\left({d}_k\right)\end{array}} \end{equation}$$(2) Based on the supposition that two diseases can be considered more similar if they share larger parts of their DAGs, we can obtain the disease semantic similarity |$\mathrm{DSSM}1({d}_i,{d}_j)$| between disease |${d}_i$| and |${d}_j$| as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\mathrm{DSSM}1\left({d}_i,{d}_j\right)=\dfrac{\sum_{d_t\in T\left({d}_i\right)\bigcap T\left({d}_j\right)}\left(D{1}_{d_i}\left({d}_t\right)+D{1}_{d_j}\left({d}_t\right)\right)}{DS1\left({d}_i\right)+ DS1\left({d}_j\right)},\end{array}} \end{equation}$$(3) where |$\mathrm{DSSM}1$| is a |$383\times 383$| matrix storing the first kind of disease semantic similarity. However, considering that diseases appearing in more DAGs may be more common, diseases appearing in less DAGs may be more specific, the semantic contribution value of diseases in the same layer of DAG should be different. Thus, we adopted another method to calculate the disease semantic similarity on the basis of previous study [39]. Here, the semantic contribution of disease |${d}_k$| to |${d}_i$| can be described as below: $$\begin{equation} {\displaystyle \begin{array}{c}D{2}_{d_i}\left({d}_k\right)=-\log \left(\dfrac{\mathrm{the}\ \mathrm{number}\ \mathrm{of}\ \mathrm{DAGs}\ \mathrm{including}\ {d}_k}{\mathrm{the}\ \mathrm{number}\ \mathrm{of}\ \mathrm{diseases}}\right)\end{array}} \end{equation}$$(4) Correspondingly, we can obtain the semantic value of disease |${d}_i$| from equation (5), and the disease semantic similarity |$\mathrm{DSSM}2({d}_i,{d}_j)$| between disease |${d}_i$| and |${d}_j$| from equation (6). 
$$\begin{equation} {\displaystyle \begin{array}{c} \displaystyle DS2\left({d}_i\right)=\sum_{d_k\in T\left({d}_i\right)}D{2}_{d_i}\left({d}_k\right)\end{array}} \end{equation}$$(5) $$\begin{equation} {\displaystyle \begin{array}{c}\mathrm{DSSM}2\left({d}_i,{d}_j\right)=\dfrac{\sum_{d_t\in T\left({d}_i\right)\cap T\left({d}_j\right)}\left(D{2}_{d_i}\left({d}_t\right)+D{2}_{d_j}\left({d}_t\right)\right)}{DS2\left({d}_i\right)+ DS2\left({d}_j\right)},\end{array}} \end{equation}$$(6) where |$\mathrm{DSSM}2$| is a |$383\times 383$| matrix storing the second kind of disease semantic similarity. In order to obtain a more reasonable semantic similarity of diseases, we synthesized these two kinds of disease semantic similarity to calculate the final disease semantic similarity based on previous study [18]. At last, we can obtain the disease semantic similarity |$\mathrm{DSSM}({d}_i,{d}_j)$| between disease |${d}_i$| and |${d}_j$| according to the following equation: $$\begin{equation} {\displaystyle \begin{array}{c}\mathrm{DSSM}\left({d}_i,{d}_j\right)=\dfrac{\mathrm{DSSM}1\left({d}_i,{d}_j\right)+\mathrm{DSSM}2\left({d}_i,{d}_j\right)}{2}\end{array}} \end{equation}$$(7) Gaussian interaction profile kernel similarity for miRNAs and diseases Based on previous research, the Gaussian interaction profile kernel similarity can be calculated through a hypothesis that similar miRNAs are more likely to be related with similar diseases [18]. Concretely, a binary vector |$\mathrm{IP}({m}_i)$|⁠, located in the ith column of matrix |$\mathrm{DM}$|⁠, is constructed to represent associations between miRNA |${m}_i$| and each disease. 
Then, the Gaussian interaction profile kernel similarity for miRNAs |$\mathrm{MGSM}({m}_i,{m}_j)$| between miRNA |${m}_i$| and |${m}_j$| can be calculated as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle\displaystyle\mathrm{MGSM}\left({m}_i,{m}_j\right)=\mathit{\exp}\left(-{r}_m\parallel IP\left({m}_i\right)- IP\left({m}_j\right){\parallel}^2\right),\end{array}} \end{equation}$$(8) where |${r}_m$| is applied for controlling the bandwidth of kernel. Here, we can calculate |${r}_m$| according to normalizing the original kernel bandwidth |${r}_m^{\prime }$| as below: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{r}_m={r}_m^{\prime }/\left(\dfrac{1}{nm}\sum_{i=1}^{nm}\parallel \mathrm{IP}\left({m}_i\right){\parallel}^2\right),\end{array}} \end{equation}$$(9) where |$nm$| represents the number of all miRNAs, which is equal to 495 in our study, and |${r}_m^{\prime }$| is set to 1 referring to previous study [18]. Similarly, the Gaussian interaction profile kernel similarity for diseases |$\mathrm{DGSM}({d}_i,{d}_j)$| between disease |${d}_i$| and |${d}_j$| can be calculated according to the following two equations: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle \mathrm{DGSM}\left({d}_i,{d}_j\right)=\exp \left(-{r}_d\parallel \mathrm{IP}\left({d}_i\right)-\mathrm{IP}\left({d}_j\right){\parallel}^2\right)\end{array}} \end{equation}$$(10) $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{r}_d={r}_d^{\prime }/\left(\frac{1}{nd}\sum_{i=1}^{nd}\parallel \mathrm{IP}\left({d}_i\right){\parallel}^2\right),\end{array}} \end{equation}$$(11) where binary vector |$\mathrm{IP}({d}_i)$|⁠, located in the ith row of matrix |$\mathrm{DM}$|⁠, represents associations between disease |${d}_i$| and each miRNA, |$nd$| denotes the number of all diseases, which is equal to 383 in our study and |${r}_d^{\prime }$| is set to 1 accordingly. 
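Equations (8) to (11) can be sketched in a few lines of plain Python. The snippet below is a minimal illustration on a toy association matrix, not the authors' implementation; the function name `gip_kernel_similarity` is an assumption, and the code presumes that at least one interaction profile is non-zero so that the bandwidth normalization is defined.

```python
import math


def gip_kernel_similarity(profiles, r_prime=1.0):
    """Gaussian interaction profile kernel similarity between all profile pairs."""
    n = len(profiles)
    # Normalize the bandwidth by the mean squared norm of the profiles (eqs. 9 and 11).
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    r = r_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-r * sq_dist)  # eqs. 8 and 10
    return sim


# Toy association matrix: 2 diseases (rows) x 3 miRNAs (columns).
dm = [[1, 0, 1],
      [0, 1, 1]]
disease_profiles = dm                             # IP(d_i) is the ith row
mirna_profiles = [list(c) for c in zip(*dm)]      # IP(m_i) is the ith column

dgsm = gip_kernel_similarity(disease_profiles)
mgsm = gip_kernel_similarity(mirna_profiles)
```

The same function serves both node types because only the orientation of the interaction profiles differs: rows of the matrix for diseases and columns for miRNAs.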
Integrated similarity for miRNAs and diseases Considering that there exist a lot of sparse values in the obtained miRNA functional similarity matrix and disease semantic similarity matrix, we incorporated the Gaussian interaction profile kernel similarity into the miRNA and disease similarity matrices. Based on Chen’s study [18], the integrated similarity for miRNAs |$IM({m}_i,{m}_j)$| between miRNA |${m}_i$| and |${m}_j$| was calculated as equation (12), the integrated similarity for diseases |$ID({d}_i,{d}_j)$| between disease |${d}_i$| and |${d}_j$| was calculated as equation (13). $$\begin{equation} {\displaystyle \begin{array}{c}\mathrm{IM}\left({m}_i,{m}_j\right)=\left\{\begin{array}{@{}cc}\mathrm{MFSM}\left({m}_i,{m}_j\right)& if\ {m}_i\ \mathrm{and}\ {m}_j\ \mathrm{have}\ \mathrm{functional}\\ & \mathrm{similarity}\\{}\mathrm{MGSM}\left({m}_i,{m}_j\right)& \mathrm{otherwise}\end{array}\right.\end{array}} \end{equation}$$(12) $$\begin{equation} {\displaystyle \begin{array}{c} ID\left({d}_i,{d}_j\right)=\left\{\begin{array}{cc}\mathrm{DSSM}\left({d}_i,{d}_j\right)& \mathrm{if}\ {d}_i\ and\ {d}_j\ \mathrm{have}\ \mathrm{semantic}\\ & \mathrm{similarity}\\{}\mathrm{DGSM}\left({d}_i,{d}_j\right)& \mathrm{otherwise}\end{array}\right.\end{array}} \end{equation}$$(13) GAEMDA Motivated by the great progress of graph neural networks in link prediction task [31, 40–42], we proposed a graph auto-encoder model, which combined a graph neural networks-based encoder and a bilinear decoder for potential miRNA-disease associations prediction (GAEMDA). 
GAEMDA can be described in five steps (see Figure 1): (i) construct a miRNA-disease bipartite graph, (ii) project miRNA and disease nodes into the same vector space, (iii) apply the graph neural network-based encoder to generate the embeddings of miRNA and disease nodes, (iv) apply the bilinear decoder to reconstruct links in the bipartite graph and (v) apply the cross-entropy loss function to train the whole model in an end-to-end manner. Next, we discuss the implementation details of each step.

Figure 1 Flowchart of GAEMDA model for predicting potential miRNA-disease associations.

In step 1, we integrated multiple data sources into a miRNA-disease bipartite graph, which contained 495 miRNA nodes and 383 disease nodes. As noted above, there are 5430 experimentally validated miRNA-disease associations in HMDD v2.0 [14]. Here, we treated all these 5430 associations as positive links between miRNA nodes and disease nodes. Besides, to better train the model, we needed to construct an equal number of negative links to balance the sample set. Considering that the number of unknown associations between miRNAs and diseases is much larger than the number of known associations, we randomly selected 5430 associations from the unknown associations as negative associations and added them as negative links to the miRNA-disease bipartite graph. Then, all the positive links were labeled as 1 and all the negative links were labeled as 0 for the following model training. Besides, we treated the integrated similarity for miRNAs and diseases as miRNA features and disease features, respectively.
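The balanced sampling of negative links in step 1 can be sketched as follows. This is a hedged illustration on a toy matrix: the function name `sample_balanced_links`, the fixed random seed and the `(disease, miRNA, label)` tuple layout are assumptions made for the example and are not prescribed by the original text.

```python
import random


def sample_balanced_links(dm, seed=0):
    """Return labeled links: all known associations plus equally many random unknown pairs."""
    rng = random.Random(seed)
    positives = [(i, j) for i, row in enumerate(dm) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(dm) for j, v in enumerate(row) if v == 0]
    # Draw, without replacement, as many unknown pairs as there are positive links.
    negatives = rng.sample(unknown, len(positives))
    return [(i, j, 1) for i, j in positives] + [(i, j, 0) for i, j in negatives]


# Toy association matrix with 4 known and 8 unknown associations.
dm = [[1, 0, 0, 1],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]
samples = sample_balanced_links(dm)
```

With the HMDD v2.0 data this procedure would produce 5430 positive and 5430 negative links, that is, 10860 labeled samples in total.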
Concretely, miRNA |${m}_i$| can be described as a 495-dimensions vector |${F}_{m_i}$| as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{F}_{m_i}=\left({v}_1,{v}_2,{v}_3,\dots, {v}_{494},{v}_{495}\right),\end{array}} \end{equation}$$(14) where |${F}_{m_i}$| represents the ith row of matrix |$IM$| and |${v}_j$| represents the integrated similarity value between miRNA |${m}_i$| and |${m}_j$|⁠. Similarly, disease |${d}_i$| can be described as a 383-dimensions vector |${F}_{d_i}$| as below: $$\begin{equation} {\displaystyle \begin{array}{c}{F}_{d_i}=\left({w}_1,{w}_2,{w}_3,\dots, {w}_{382},{w}_{383}\right),\end{array}} \end{equation}$$(15) where |${F}_{d_i}$| denotes the ith row of matrix |$\mathrm{ID}$| and |${w}_j$| represents the integrated similarity value between disease |${d}_i$| and |${d}_j$|⁠. Then, the 495-dimensions miRNA features |${F}_m$| were added to miRNA nodes, and the 383-dimensions disease features |${F}_d$| were added to disease nodes in the miRNA-disease bipartite graph, respectively. In step 2, we projected the heterogeneous miRNA nodes and disease nodes into the same vector space. Due to the heterogeneity of nodes in miRNA-disease bipartite graph, miRNA nodes and disease nodes belong to different feature spaces through step 1. To facilitate subsequent calculations, we designed node-type transformation matrices to project the features of miRNA nodes and disease nodes into the same vector space. The projection process of miRNA nodes can be described as follows: $$\begin{equation} {\displaystyle \begin{array}{c}{H}_m={W}_{\varnothing_m}\cdot{F}_m,\end{array}} \end{equation}$$(16) where |${H}_m$| represents the projected features of miRNA nodes, |${F}_m$| denotes the original features of miRNA nodes, and |${W}_{\varnothing_m}$| denotes a linear transformation matrix to project the 495-dimensional miRNA nodes into E-dimensional space. 
Similarly, the projection process of disease nodes can be shown as below: $$\begin{equation} {\displaystyle \begin{array}{c}{H}_d={W}_{\varnothing_d}\cdot{F}_d,\end{array}} \end{equation}$$(17) where |${H}_d$| denotes the projected features of disease nodes, |${F}_d$| denotes the original features of disease nodes, and |${W}_{\varnothing_d}$| denotes a linear transformation matrix to project the 383-dimensional disease nodes into E-dimensional space. Then, both miRNA and disease nodes features are in E-dimensional vector space. In step 3, we generated the embeddings of miRNA and disease nodes with their direct neighbors’ information using the graph neural networks-based encoder. For example, for miRNA node |${m}_i$|⁠, we first calculated the aggregation of its direct neighbors’ features as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{H}_{m_i}^a=\frac{1}{D_{m_i}}g\left({H}_{d_1},{H}_{d_2},\dots \right),\end{array}} \end{equation}$$(18) where |${H}_{m_i}^a$| denotes the aggregation of neighbors features of node |${m}_i$|⁠. |$g(\cdot )$| represents the aggregator function such as |$\mathrm{sum}(\cdot )$| i.e. element-wise summation of all coming messages, |$\max (\cdot )$| i.e. element-wise max-pooling of all coming messages, or |$\mathrm{mean}(\cdot )$| i.e. element-wise mean-pooling of all coming messages. Note that the |$\mathrm{sum}(\cdot)$| function was set as the default aggregator. Diseases |$\{{d}_1,{d}_2,\dots \}$| represent the direct neighbors of miRNA node |${m}_i$|⁠. |${D}_{m_i}$| is a normalization constant, which we chose to be the degree value of node |${m}_i$|⁠. 
Then, in order to fuse the aggregated features into the original features of miRNA node |${m}_i$|⁠, we concatenated features |${H}_{m_i}$| with |${H}_{m_i}^a$|⁠, and applied multi-layer perceptron (MLP) to update the features of node |${m}_i$| as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{H}_{m_i}^{\prime }=\mathrm{LeakyReLU}\left(f\left({H}_{m_i}\bigoplus{H}_{m_i}^a\right)\right),\end{array}} \end{equation}$$(19) where |${H}_{m_i}^{\prime }$| represents the updated features of node |${m}_i$|⁠, |$\bigoplus$| denotes the concatenate operation and |$f(\cdot)$| represents a single MLP layer with |$E$| outputs, which are equal to the projection dimensions, |$\mathrm{LeakyReLU}(\cdot)$| is nonlinearity activation function with negative input slope 0.2. In an analogous way, we can update the features of disease node |${d}_j$| through the following two equations: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{H}_{d_j}^a=\frac{1}{D_{d_j}}g\left({H}_{m_1},{H}_{m_2},\dots \right)\end{array}} \end{equation}$$(20) $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{H}_{d_j}^{\prime }=\mathrm{LeakyReLU}\left(f\left({H}_{d_j}\bigoplus{H}_{d_j}^a\right)\right),\end{array}} \end{equation}$$(21) where |${H}_{d_j}^a$| denotes the aggregation of neighbors features of node |${d}_j$|⁠, miRNAs |$\{{m}_1,{m}_2,\dots \}$| are the direct neighbors of disease node |${d}_j$|⁠, |${D}_{d_j}$| denotes the degree value of node |${d}_j$|⁠, and |${H}_{d_j}^{\prime }$| represents the updated features of node |${d}_j$|⁠. Note that the above operations could be applied to all miRNA and disease nodes simultaneously. We would refer these operations as a single-layer graph neural networks-based encoder. 
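The projection and single-layer update described in steps 2 and 3 can be sketched for one miRNA node as follows. This is a minimal, dependency-free illustration rather than the released Deep Graph Library implementation: the helper names, the toy weights, the bias term of the MLP layer and the zero message for isolated nodes are all assumptions made for the example.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def leaky_relu(x, slope=0.2):
    return [v if v > 0 else slope * v for v in x]


def aggregate(neighbor_feats, dim):
    """Sum aggregator divided by the node degree, as in equations (18) and (20)."""
    degree = len(neighbor_feats)
    if degree == 0:
        return [0.0] * dim  # assumption: isolated nodes receive a zero message
    return [sum(column) / degree for column in zip(*neighbor_feats)]


def update_node(h_self, neighbor_feats, weights, bias):
    """Concatenate own and aggregated features, apply one linear layer and LeakyReLU."""
    concat = h_self + aggregate(neighbor_feats, len(h_self))
    linear = [z + b for z, b in zip(matvec(weights, concat), bias)]
    return leaky_relu(linear)


# Toy setting with E = 2: project a 3-dimensional miRNA feature vector (eq. 16).
w_phi_m = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 2.0]]
h_m = matvec(w_phi_m, [1.0, 0.0, 1.0])             # projected miRNA features
h_neighbors = [[2.0, 4.0], [4.0, 8.0]]             # projected features of two linked diseases
w_mlp = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, -1.0]]                    # E x 2E weights of the MLP layer
h_m_updated = update_node(h_m, h_neighbors, w_mlp, [0.0, 0.0])
```

Because the output has the same dimension |$E$| as the input, the same update can be applied repeatedly to stack several encoder layers.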
Considering that the input and the output of this single-layer graph neural network-based encoder are both |$E$|-dimensional vectors, we can stack |$L$| encoder layers to aggregate the features of multi-hop neighbors and enhance the feature representation ability of miRNA and disease nodes. Therefore, we can obtain the final embeddings of miRNA nodes |${H}_m^L$| and of disease nodes |${H}_d^L$| through the |$L$|-layer graph neural network-based encoder. In step 4, we adopted a bilinear decoder to reconstruct the links between miRNA and disease nodes. Because the sigmoid activation function is well suited to binary classification problems, we introduced a bilinear operation followed by a sigmoid function to predict the probability |${\hat{y}}_{ij}$| that a miRNA node |${m}_i$| is linked with a disease node |${d}_j$| as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle{\hat{y}}_{ij}=\mathrm{sigmoid}\left({H}_{d_j}^LQ{\left({H}_{m_i}^L\right)}^T\right),\end{array}} \end{equation}$$(22) where |$Q$| is a trainable parameter matrix with |$E\times E$| dimensions, and the sigmoid function is defined as |$\mathrm{sigmoid}(x)=1/(1+{e}^{-x})$|. In step 5, we applied the cross-entropy loss |$\mathrm{LOSS}$| over all training samples to optimize the model parameters as follows: $$\begin{equation} {\displaystyle \begin{array}{c}\displaystyle\mathrm{LOSS}=-\sum_{i,j\in \mathcal{Y}\bigcup{\mathcal{Y}}^{-}}\left({y}_{ij}\log{\hat{y}}_{ij}+\left(1-{y}_{ij}\right)\log\left(1-{\hat{y}}_{ij}\right)\right),\end{array}} \end{equation}$$(23) where |${y}_{ij}$| represents the true label of the link, which is 1 or 0, and |$\mathcal{Y}$| and |${\mathcal{Y}}^{-}$| denote the sets of node pairs contained in the positive link set and the negative link set, respectively. Then, we can train the whole model via the back propagation algorithm in an end-to-end manner.
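The bilinear decoder of equation (22) and the loss of equation (23) can be illustrated with the short sketch below. The function names and the toy embeddings are assumptions made for this example; in the actual model |$Q$| is learned by back propagation, whereas here it is fixed to the identity matrix purely for demonstration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_score(h_d, q, h_m):
    """Probability of a link: sigmoid(h_d Q h_m^T), as in equation (22)."""
    q_h_m = [sum(w * v for w, v in zip(row, h_m)) for row in q]
    return sigmoid(sum(a * b for a, b in zip(h_d, q_h_m)))


def cross_entropy_loss(labels, probabilities):
    """Summed binary cross-entropy over all labeled links, as in equation (23)."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probabilities))


# Toy embeddings with E = 2 and an identity matrix standing in for Q.
q = [[1.0, 0.0],
     [0.0, 1.0]]
p_orthogonal = bilinear_score([1.0, 0.0], q, [0.0, 1.0])  # score 0 gives probability 0.5
p_aligned = bilinear_score([1.0, 1.0], q, [1.0, 1.0])     # score 2 gives a higher probability
loss = cross_entropy_loss([1, 0], [p_orthogonal, p_orthogonal])
```

Minimizing this loss pushes the predicted probability toward 1 for positive links and toward 0 for negative links.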
Results

Implementation details and evaluation metrics
We implemented the GAEMDA model based on Deep Graph Library with the MXNet backend [43]. In the training stage, we randomly initialized the model parameters with Xavier initialization [44] and optimized them with Adam [45]. Besides, we adopted grid search to find the optimal hyperparameters and set the learning rate to 0.001 and the weight decay to 1e-3. To avoid over-fitting, we randomly dropped hidden units after the projection operations and after each MLP layer. We searched for the optimal dropout rate in |$\{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9\}$| and set it to 0.7 in our experiment. We trained the model for 1000 epochs and printed the test set results every 10 epochs. All the samples were constructed based on HMDD v2.0 [14]. The experiment was carried out on an Nvidia Tesla P100 cluster. All the code and datasets are available at https://github.com/chimianbuhetang/GAEMDA. To ensure impartial comparisons, 5-fold cross-validation was used to evaluate the performance of GAEMDA. Specifically, all the samples were divided into five equal parts; in turn, each part was treated as the testing set and the other four parts were treated as the training set. Therefore, there is no overlap between training and testing sets, and each sample is tested exactly once under 5-fold cross-validation. Moreover, four commonly used evaluation metrics were adopted to measure the performance of GAEMDA: Accuracy (Acc.), Precision (Prec.), Recall and F1-score. Meanwhile, we plotted the receiver operating characteristic (ROC) curves to display the performance of our model intuitively, and utilized the AUC to evaluate the model comprehensively. In general, higher AUC values imply better prediction performance, and an AUC value of 0.5 signifies random classification ability.
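The evaluation protocol above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' evaluation script: the function names, the shuffling seed and the toy labels are hypothetical, and the predictions are assumed to be already binarized.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def classification_metrics(labels, predictions):
    """Accuracy, Precision, Recall and F1-score from binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision,
            "recall": recall, "f1": f1}


splits = list(five_fold_indices(10))
metrics = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

In this toy case two of the three positive samples are recovered and one negative sample is misclassified, so Accuracy, Precision, Recall and F1-score all equal 2/3.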
Performance evaluation

We refer to the model with the |$\mathrm{sum}(\cdot)$| aggregator as GAEMDA, the model with the |$\max (\cdot)$| aggregator as GAEMDA-max, and the model with the |$\mathrm{mean}(\cdot)$| aggregator as GAEMDA-mean. In our preliminary experiments, these models performed best when the projection dimension |$E$| was 256 and the number of encoder layers |$L$| was 2. As shown in Figure 2, GAEMDA achieves a mean AUC of |$93.56\pm 0.44\%$|, which is the average of |$93.21$|, |$93.59$|, |$94.34$|, |$93.57$| and |$93.07\%$| over the five folds. Table 1 shows that GAEMDA obtains an average Acc. of |$84.93\%$|, Prec. of |$81.37\%$|, Recall of |$90.70\%$| and F1-score of |$85.75\%$| under 5-fold cross-validation. We also summarized the average results of the three models on five evaluation metrics based on HMDD v2.0 in Table 2. GAEMDA achieves the highest Recall, F1-score and AUC values among these models. In particular, its Recall (|$90.70\%$|) is more than five percentage points higher than that of GAEMDA-mean (|$85.24\%$|) and GAEMDA-max (|$85.63\%$|), which means that GAEMDA identifies a larger share of the positive samples. In terms of F1-score and AUC, which measure performance more comprehensively, GAEMDA is marginally better than GAEMDA-max and GAEMDA-mean. Although GAEMDA-max shows the highest Acc. value and GAEMDA-mean obtains the highest Prec. value, we adopt the |$\mathrm{sum}(\cdot)$| function as our default aggregator and apply the GAEMDA model to predict potential miRNA-disease associations.

Figure 2. ROC curves of GAEMDA based on HMDD v2.0.

Table 1. 5-fold cross-validation results of GAEMDA based on HMDD v2.0

| Testing set | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
| --- | --- | --- | --- | --- |
| 1 | 84.30 | 80.43 | 90.35 | 85.10 |
| 2 | 85.36 | 80.96 | 92.10 | 86.17 |
| 3 | 86.23 | 84.74 | 88.41 | 86.54 |
| 4 | 85.27 | 81.94 | 91.25 | 86.35 |
| 5 | 83.47 | 78.77 | 91.39 | 84.61 |
| Average | 84.93 ± 0.95 | 81.37 ± 1.98 | 90.70 ± 1.27 | 85.75 ± 0.76 |

Table 2. Comparison of GAEMDA, GAEMDA-mean and GAEMDA-max based on HMDD v2.0

| Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | AUC (%) |
| --- | --- | --- | --- | --- | --- |
| GAEMDA-mean | 85.69 | 86.06 | 85.24 | 85.63 | 93.45 |
| GAEMDA-max | 85.76 | 85.87 | 85.63 | 85.73 | 93.50 |
| GAEMDA | 84.93 | 81.37 | 90.70 | 85.75 | 93.56 |

Effect of projection dimensions

We compared the effect of several projection dimensions |$E$| on the performance of GAEMDA with a 2-layer encoder under 5-fold cross-validation. The results are shown in Figure 3. As the projection dimension increases, the Recall and AUC values improve significantly, the Acc. and F1-score values remain almost unchanged, and only the Prec. value shows a downward trend. Considering that the AUC value reflects the prediction performance of a model more comprehensively, we also compared the AUC values of GAEMDA, GAEMDA-mean and GAEMDA-max under different projection dimensions |$E$| under 5-fold cross-validation. Figure 4 shows that the AUC values of all three models rise as the projection dimension increases, and that GAEMDA achieves better AUC values than the other two models at each of the four projection dimensions tested. We also tried a projection dimension of 512, but the gradient vanishing problem appeared during training. Therefore, we chose 256 as our default projection dimension.

Figure 3. The average accuracy, precision, recall, F1-score and AUC values of GAEMDA under different projection dimensions upon 5-fold cross-validation.
Figure 4. The AUC comparisons of GAEMDA, GAEMDA-mean and GAEMDA-max under different projection dimensions upon 5-fold cross-validation.

Effect of the number of encoder layers

In our models, the graph neural networks-based encoder layers can in principle be stacked to an arbitrary depth to enhance the prediction performance. Here, we set the projection dimension to 256 and used only AUC values to quantify the predictive performance of the models. We compared how the predictive ability of GAEMDA, GAEMDA-mean and GAEMDA-max changes with the number of encoder layers under 5-fold cross-validation. The results are shown in Figure 5. When the number of encoder layers |$L$| is set to 2, all three models achieve their best prediction performance, and as the number of encoder layers increases further, the prediction performance of all three models shows a downward trend. Note that GAEMDA still obtains the highest AUC value with the 2-layer encoder. In addition, in our initial experiments, we found that when the number of encoder layers is larger than 7, all three models become hard to train and the AUC values drop sharply to 0.5. Therefore, we did not consider more than 7 encoder layers.

Figure 5. The AUC comparisons of GAEMDA, GAEMDA-mean and GAEMDA-max under different number of encoder layers upon 5-fold cross-validation.
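The degradation with deeper encoders is consistent with over-smoothing, a known issue of deep graph neural networks in which repeated neighborhood averaging makes node embeddings increasingly similar. The toy Python sketch below illustrates only that effect. It is not the GAEMDA encoder: it uses a parameter-free mean aggregator with self-loops, a hypothetical five-node miRNA-disease graph and made-up features, and it omits the projection, the MLP and any trained weights.

```python
def mean_aggregate(H, neighbors):
    """One parameter-free layer: each node averages its own and its neighbors' features."""
    out = []
    for v, nbrs in enumerate(neighbors):
        group = [v] + nbrs  # self-loop plus direct neighbors
        dim = len(H[v])
        out.append([sum(H[u][c] for u in group) / len(group) for c in range(dim)])
    return out


def spread(H):
    """Largest per-feature range (max - min) over all nodes; 0 means identical embeddings."""
    dim = len(H[0])
    return max(max(row[c] for row in H) - min(row[c] for row in H) for c in range(dim))


# Toy bipartite graph: nodes 0-2 are miRNAs, nodes 3-4 are diseases (hypothetical links).
neighbors = [[3], [3, 4], [4], [0, 1], [1, 2]]
# Made-up 2-dimensional input features, one row per node.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]

# Apply 1 to 8 averaging layers and record how far apart the embeddings remain.
spreads = []
for layer in range(1, 9):
    H = mean_aggregate(H, neighbors)
    spreads.append(spread(H))
```

On this toy graph the spread never grows from one layer to the next and is strictly smaller after eight layers than after one, so the node embeddings become progressively harder to distinguish. A trained encoder with nonlinear layers behaves differently in detail, which is why this should be read as intuition rather than as an explanation of the reported AUC values.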
Comparison of GAEMDA with other related models

To further demonstrate the performance of our model, we compared the prediction performance of GAEMDA with that of seven state-of-the-art models: WBSMDA [18], RFMDA [23], PBMDA [46], LLCMDA [47], EDTMDA [48], GBDT-LR [28] and MCLPMDA [49]. For a fair comparison, all of these models were evaluated under 5-fold cross-validation based on HMDD v2.0 [14]. Because these models adopted a variety of different evaluation metrics, we used only the AUC value to measure their prediction performance. The AUC values of the other models are the best values reported in their papers. The comparison results are summarized in Table 3. Our model achieves the highest AUC value among these eight models, 0.36 percentage points higher than that of MCLPMDA, the second-best model, which was shown to be a top predictive model in a benchmarking test carried out by Huang et al. [50]. We attribute the superior performance of GAEMDA to the graph neural networks-based encoder and the end-to-end training manner.

Table 3. Comparison of GAEMDA with other related models upon 5-fold cross-validation based on HMDD v2.0

| Models | AUC (%) |
| --- | --- |
| WBSMDA | 81.85 |
| RFMDA | 88.18 |
| PBMDA | 91.72 |
| LLCMDA | 91.90 |
| EDTMDA | 91.92 |
| GBDT-LR | 92.74 |
| MCLPMDA | 93.20 |
| GAEMDA | 93.56 |

Case studies

To further validate the performance of GAEMDA in predicting potentially related miRNAs for specific diseases, we performed case studies on three important neoplasms: colon neoplasms, esophageal neoplasms and kidney neoplasms. Specifically, we constructed training samples containing 5430 experimentally validated positive miRNA-disease associations and 5430 randomly selected negative miRNA-disease associations, which excluded the specific disease investigated in the case study. The associations between the specific disease and the remaining miRNAs were then used to construct the testing samples. We trained the GAEMDA model on the training samples and used the trained model to encode the miRNA and disease embedding features. The predicted probabilities were then generated by the decoder of the trained model for the testing samples. Finally, we ranked the predicted probabilities and checked the top 50 candidate miRNAs against the dbDEMC [13] and miR2Disease [15] databases.

Colon neoplasms are common tumors of the intestines, with an incidence rate second only to gastric and esophageal cancers [51]. Therefore, we designed the first case study to prioritize miRNAs potentially related to colon neoplasms. The results are shown in Table 4. We can see that 48 of the top 50 colon neoplasms-related miRNAs are confirmed by dbDEMC or miR2Disease.

Esophageal neoplasms, another high-incidence disease, are malignant tumors of the esophageal tissue whose etiology is related to chronic nitrosamine stimulation, inflammation and the trace element content of common foods [52]. We selected esophageal neoplasms as our second case study and listed the top 50 predicted miRNAs in Table 5.
As a result, 48 of the top 50 miRNAs are confirmed by dbDEMC or miR2Disease.

Kidney neoplasms are among the most common human urogenital tumors, with complex pathological types and unusual clinical manifestations, and they have a high incidence in European and American countries [53]. Thus, we selected kidney neoplasms for our third case study. Table 6 shows that 48 of the top 50 related miRNAs prioritized by GAEMDA are confirmed by dbDEMC or miR2Disease. In summary, GAEMDA performs satisfactorily in prioritizing candidate miRNAs for specific diseases.

Table 4. Top 50 colon neoplasms-related miRNAs predicted by GAEMDA based on HMDD v2.0

| miRNA | dbDEMC | miR2Disease | miRNA | dbDEMC | miR2Disease |
| --- | --- | --- | --- | --- | --- |
| hsa-mir-21 | Confirmed | Confirmed | hsa-mir-206 | Confirmed | Unconfirmed |
| hsa-mir-155 | Confirmed | Confirmed | hsa-mir-31 | Confirmed | Confirmed |
| hsa-mir-146a | Confirmed | Unconfirmed | hsa-mir-181a | Confirmed | Confirmed |
| hsa-mir-16 | Confirmed | Unconfirmed | hsa-mir-23a | Confirmed | Confirmed |
| hsa-mir-150 | Confirmed | Unconfirmed | hsa-mir-199a | Unconfirmed | Unconfirmed |
| hsa-mir-125b | Confirmed | Unconfirmed | hsa-mir-195 | Confirmed | Confirmed |
| hsa-mir-221 | Confirmed | Confirmed | hsa-let-7a | Confirmed | Confirmed |
| hsa-mir-1 | Confirmed | Confirmed | hsa-mir-24 | Confirmed | Confirmed |
| hsa-mir-29a | Confirmed | Confirmed | hsa-mir-29c | Confirmed | Unconfirmed |
| hsa-mir-15a | Confirmed | Unconfirmed | hsa-mir-133b | Confirmed | Confirmed |
| hsa-mir-20a | Confirmed | Confirmed | hsa-mir-106b | Confirmed | Confirmed |
| hsa-mir-122 | Confirmed | Unconfirmed | hsa-mir-9 | Confirmed | Confirmed |
| hsa-mir-19a | Confirmed | Confirmed | hsa-mir-181b | Confirmed | Confirmed |
| hsa-mir-19b | Confirmed | Confirmed | hsa-mir-196a | Confirmed | Confirmed |
| hsa-mir-133a | Confirmed | Confirmed | hsa-mir-210 | Confirmed | Unconfirmed |
| hsa-mir-223 | Confirmed | Confirmed | hsa-mir-200c | Confirmed | Confirmed |
| hsa-mir-92a | Confirmed | Unconfirmed | hsa-let-7b | Confirmed | Confirmed |
| hsa-mir-222 | Confirmed | Unconfirmed | hsa-mir-203 | Confirmed | Confirmed |
| hsa-mir-18a | Confirmed | Confirmed | hsa-mir-200b | Confirmed | Unconfirmed |
| hsa-mir-34a | Confirmed | Confirmed | hsa-mir-182 | Confirmed | Confirmed |
| hsa-mir-29b | Confirmed | Confirmed | hsa-let-7e | Confirmed | Unconfirmed |
| hsa-mir-26a | Confirmed | Confirmed | hsa-mir-146b | Confirmed | Unconfirmed |
| hsa-mir-142 | Unconfirmed | Unconfirmed | hsa-mir-214 | Confirmed | Unconfirmed |
| hsa-mir-143 | Confirmed | Confirmed | hsa-mir-34c | Unconfirmed | Confirmed |
| hsa-mir-15b | Confirmed | Confirmed | hsa-mir-124 | Confirmed | Unconfirmed |

Table 5. Top 50 esophageal neoplasms-related miRNAs predicted by GAEMDA based on HMDD v2.0
| miRNA | dbDEMC | miR2Disease | miRNA | dbDEMC | miR2Disease |
| --- | --- | --- | --- | --- | --- |
| hsa-mir-17 | Confirmed | Unconfirmed | hsa-mir-142 | Confirmed | Unconfirmed |
| hsa-mir-18a | Confirmed | Unconfirmed | hsa-mir-30a | Confirmed | Unconfirmed |
| hsa-mir-19b | Confirmed | Unconfirmed | hsa-mir-106a | Confirmed | Unconfirmed |
| hsa-mir-221 | Confirmed | Unconfirmed | hsa-mir-218 | Confirmed | Unconfirmed |
| hsa-mir-29a | Confirmed | Unconfirmed | hsa-mir-133b | Confirmed | Unconfirmed |
| hsa-mir-16 | Confirmed | Unconfirmed | hsa-mir-7 | Confirmed | Unconfirmed |
| hsa-let-7e | Confirmed | Unconfirmed | hsa-mir-93 | Confirmed | Unconfirmed |
| hsa-mir-29b | Confirmed | Unconfirmed | hsa-mir-182 | Confirmed | Unconfirmed |
| hsa-mir-125b | Confirmed | Unconfirmed | hsa-mir-20b | Confirmed | Unconfirmed |
| hsa-let-7d | Confirmed | Unconfirmed | hsa-mir-429 | Confirmed | Unconfirmed |
| hsa-mir-222 | Confirmed | Unconfirmed | hsa-mir-107 | Confirmed | Confirmed |
| hsa-mir-9 | Confirmed | Unconfirmed | hsa-mir-206 | Confirmed | Unconfirmed |
| hsa-let-7f | Confirmed | Unconfirmed | hsa-mir-15b | Confirmed | Unconfirmed |
| hsa-let-7i | Confirmed | Unconfirmed | hsa-mir-30c | Confirmed | Unconfirmed |
| hsa-mir-1 | Confirmed | Unconfirmed | hsa-mir-132 | Confirmed | Unconfirmed |
| hsa-mir-200b | Confirmed | Unconfirmed | hsa-mir-124 | Confirmed | Unconfirmed |
| hsa-mir-24 | Confirmed | Unconfirmed | hsa-mir-18b | Confirmed | Unconfirmed |
| hsa-let-7g | Confirmed | Unconfirmed | hsa-mir-199b | Confirmed | Unconfirmed |
| hsa-mir-106b | Confirmed | Unconfirmed | hsa-mir-122 | Unconfirmed | Unconfirmed |
| hsa-mir-195 | Confirmed | Unconfirmed | hsa-mir-127 | Confirmed | Unconfirmed |
| hsa-mir-181a | Confirmed | Unconfirmed | hsa-mir-30e | Unconfirmed | Unconfirmed |
| hsa-mir-146b | Confirmed | Unconfirmed | hsa-mir-191 | Confirmed | Unconfirmed |
| hsa-mir-181b | Confirmed | Unconfirmed | hsa-mir-373 | Confirmed | Confirmed |
| hsa-mir-125a | Confirmed | Unconfirmed | hsa-mir-302b | Confirmed | Unconfirmed |
| hsa-mir-10b | Confirmed | Unconfirmed | hsa-mir-30d | Confirmed | Unconfirmed |

Table 6. Top 50 kidney neoplasms-related miRNAs predicted by GAEMDA based on HMDD v2.0
| miRNA | dbDEMC | miR2Disease | miRNA | dbDEMC | miR2Disease |
| --- | --- | --- | --- | --- | --- |
| hsa-mir-155 | Confirmed | Unconfirmed | hsa-mir-210 | Confirmed | Confirmed |
| hsa-mir-146a | Confirmed | Unconfirmed | hsa-mir-106b | Confirmed | Confirmed |
| hsa-mir-125b | Confirmed | Unconfirmed | hsa-mir-9 | Confirmed | Unconfirmed |
| hsa-mir-17 | Confirmed | Confirmed | hsa-mir-206 | Confirmed | Unconfirmed |
| hsa-mir-20a | Confirmed | Confirmed | hsa-mir-200b | Confirmed | Confirmed |
| hsa-mir-34a | Confirmed | Unconfirmed | hsa-mir-31 | Confirmed | Unconfirmed |
| hsa-mir-16 | Confirmed | Unconfirmed | hsa-let-7a | Confirmed | Unconfirmed |
| hsa-mir-221 | Confirmed | Unconfirmed | hsa-mir-181a | Confirmed | Unconfirmed |
| hsa-mir-19a | Confirmed | Unconfirmed | hsa-mir-34c | Confirmed | Unconfirmed |
| hsa-mir-29a | Confirmed | Confirmed | hsa-mir-181b | Confirmed | Unconfirmed |
| hsa-mir-19b | Confirmed | Confirmed | hsa-mir-15b | Confirmed | Unconfirmed |
| hsa-mir-18a | Confirmed | Unconfirmed | hsa-mir-142 | Unconfirmed | Unconfirmed |
| hsa-mir-223 | Confirmed | Unconfirmed | hsa-mir-24 | Confirmed | Unconfirmed |
| hsa-mir-1 | Confirmed | Unconfirmed | hsa-mir-203 | Confirmed | Unconfirmed |
| hsa-mir-126 | Confirmed | Confirmed | hsa-mir-214 | Confirmed | Confirmed |
| hsa-mir-222 | Confirmed | Unconfirmed | hsa-mir-146b | Confirmed | Unconfirmed |
| hsa-mir-145 | Confirmed | Unconfirmed | hsa-mir-196a | Confirmed | Unconfirmed |
| hsa-mir-92a | Confirmed | Unconfirmed | hsa-mir-200a | Confirmed | Unconfirmed |
| hsa-mir-150 | Confirmed | Confirmed | hsa-mir-34b | Confirmed | Unconfirmed |
| hsa-mir-29b | Confirmed | Confirmed | hsa-let-7b | Confirmed | Unconfirmed |
| hsa-mir-199a | Confirmed | Confirmed | hsa-mir-29c | Confirmed | Confirmed |
| hsa-mir-133a | Confirmed | Unconfirmed | hsa-let-7e | Unconfirmed | Unconfirmed |
| hsa-mir-143 | Confirmed | Unconfirmed | hsa-let-7c | Confirmed | Unconfirmed |
| hsa-mir-122 | Confirmed | Confirmed | hsa-let-7d | Confirmed | Unconfirmed |
| hsa-mir-195 | Confirmed | Unconfirmed | hsa-mir-7 | Confirmed | Confirmed |

Discussion

In general, heterogeneous miRNA and disease features lie in spaces of different dimensions, which makes it hard to operate on them jointly. Therefore, we projected the heterogeneous miRNA and disease features into the same vector space.
Then, the heterogeneous features can be operated on in that shared space. Meanwhile, considering that graph neural networks can effectively aggregate neighbor features in a graph and enhance the feature representation of nodes, we designed a graph neural network-based encoder to explore the rich miRNA-disease interaction information and generate effective embeddings of miRNA and disease nodes. Then, a bilinear decoder followed by a sigmoid activation function was applied to obtain the miRNA-disease association scores. Besides, because our model is trained end to end on the specific link prediction task, it can better predict potential miRNA-disease associations. However, our encoder is hard to extend to very deep layers due to the over-smoothing issue of graph neural networks.

Conclusion

Abnormal miRNA expression has been widely observed in the evolution of numerous complex human diseases. The identification of disease-related miRNAs can facilitate the pathological study of related diseases and promote the development of clinical medicine. In this paper, we designed a novel graph auto-encoder model, called GAEMDA, to predict the potential associations between miRNAs and diseases. GAEMDA adopts a graph neural network-based encoder to generate the embeddings of miRNA and disease features, and then applies a bilinear decoder to reconstruct the links between miRNAs and diseases. Furthermore, several evaluation metrics under 5-fold cross-validation and case studies on three common complex diseases all demonstrated the satisfactory prediction performance of GAEMDA. Therefore, GAEMDA can serve as a powerful tool to guide researchers in studying the regulatory role of related miRNAs.
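
The pipeline summarized above (projection into a shared vector space, aggregation of neighbor features, and a bilinear decoder followed by a sigmoid) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the mean aggregator, the single ReLU layer, the encoder weights shared by both node types, the random untrained weights, the toy dimensions and all function and variable names are assumptions made only for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def mean_vectors(vectors, dim):
    """Mean aggregator (assumed here); returns zeros for a node with no neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def encode(own, neighbors, W, dim):
    """Add the aggregated neighbor features to the node's own features,
    then apply one linear layer with ReLU (a one-layer stand-in for the MLP)."""
    agg = mean_vectors(neighbors, dim)
    combined = [a + b for a, b in zip(own, agg)]
    return relu(matvec(W, combined))


def bilinear_score(z_m, Q, z_d):
    """Bilinear decoder: sigmoid(z_m^T Q z_d)."""
    return sigmoid(sum(a * b for a, b in zip(z_m, matvec(Q, z_d))))


rng = random.Random(0)
K = 5  # shared embedding dimension (toy value)

# Toy heterogeneous features: 3 miRNAs (4-dim) and 2 diseases (3-dim).
mirna_feats = [[rng.random() for _ in range(4)] for _ in range(3)]
disease_feats = [[rng.random() for _ in range(3)] for _ in range(2)]
known = {(0, 0), (1, 0), (2, 1)}  # hypothetical known (miRNA, disease) links

# Step 1: project both feature types into the same K-dimensional space.
P_m = random_matrix(K, 4, rng)
P_d = random_matrix(K, 3, rng)
h_m = [matvec(P_m, x) for x in mirna_feats]
h_d = [matvec(P_d, x) for x in disease_feats]

# Step 2: each node aggregates the projected features of its direct neighbors.
W_enc = random_matrix(K, K, rng)
z_m = [encode(h_m[i], [h_d[j] for j in range(2) if (i, j) in known], W_enc, K)
       for i in range(3)]
z_d = [encode(h_d[j], [h_m[i] for i in range(3) if (i, j) in known], W_enc, K)
       for j in range(2)]

# Step 3: score every miRNA-disease pair with the bilinear decoder.
Q = random_matrix(K, K, rng)
scores = [[bilinear_score(z_m[i], Q, z_d[j]) for j in range(2)] for i in range(3)]

for i, row in enumerate(scores):
    print(i, [round(s, 3) for s in row])
```

Because the weights here are random, the printed scores carry no biological meaning; in an end-to-end setting the projection, encoder and decoder weights would all be fitted jointly against the known associations.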
As for limitations, the proportion of validated associations between miRNAs and diseases is very small, and our model relies heavily on the direct neighbor information between heterogeneous nodes. We therefore plan to seek ways to expand the scope of aggregated information, such as aggregating information from multi-hop neighbors, to further enhance the prediction performance. Besides, embedding more biological information, such as miRNA sequence or target information, and designing more effective aggregators, such as an attention mechanism, are also the focus of our future research.

Key Points

Aberrant miRNA expression is closely associated with the evolution of multiple complex human diseases.
Based on experimentally validated databases, computational methods can prioritize potentially related miRNAs for specific diseases and prompt researchers to carry out relevant wet-lab experiments for further validation.
In this paper, we designed a graph auto-encoder model (GAEMDA) that combines a graph neural network-based encoder and a bilinear decoder to predict potential miRNA-disease associations in an end-to-end manner.
Results on several common evaluation metrics all showed superior performance of GAEMDA under 5-fold cross-validation based on HMDD v2.0. Besides, case studies on three important diseases also showed satisfactory results.
GAEMDA is a successful attempt to utilize the rich miRNA-disease interaction information to generate effective embeddings and can serve as a reliable tool to guide subsequent research.

Acknowledgments

The authors would like to thank all anonymous reviewers for their constructive advice.

Funding

National Natural Science Foundation of China (61722212, 61873270, 61902337, 61732012 and 61972399).

Zhengwei Li, PhD, is an associate professor at the Engineering Research Center of Mine Digitalization of Ministry of Education and the School of Computer Science and Technology, China University of Mining and Technology.
His research interests include disease and noncoding RNAs, network pharmacology, complex networks and machine learning.
Jiashu Li is a full-time master's student at the School of Computer Science and Technology, China University of Mining and Technology. His research interests include disease and microRNAs and deep learning.
Ru Nie, PhD, is an associate professor at the School of Computer Science and Technology, China University of Mining and Technology. Her research interests include disease and noncoding RNAs, complex networks and pattern recognition.
Zhu-Hong You, PhD, is a professor at the Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences. His research interests include disease and noncoding RNAs, network pharmacology and machine learning.
Wenzheng Bao, PhD, is a lecturer at the School of Information Engineering, Xuzhou University of Technology. His research interests include disease and noncoding RNAs, machine learning and pattern recognition.

References

1. Ambros V. The functions of animal microRNAs. Nature 2004;431:350–5.
2. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism and function. Cell 2004;116:281–97.
3. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993;75:843–54.
4. Reinhart BJ, Slack FJ, Basson M, et al. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 2000;403:901–6.
5. Xu P, Guo M, Hay BA. MicroRNAs and the regulation of cell death. Trends Genet 2004;20:617–24.
6. Meltzer PS. Cancer genomics: small RNAs with big impacts. Nature 2005;435:745–6.
7. Liu YJ, Lin YF, Chen YF, et al. MicroRNA-449a enhances radiosensitivity in CL1-0 lung adenocarcinoma cells. PLoS One 2013;8:e62383.
8. Li D, Zhao Y, Liu C, et al. Analysis of MiR-195 and MiR-497 expression, regulation and role in breast cancer. Clin Cancer Res 2011;17:1722–30.
9. Chen CZ. MicroRNAs as oncogenes and tumor suppressors. N Engl J Med 2005;353:1768–71.
10. Freeman WM, Walker SJ, Vrana KE. Quantitative RT-PCR: pitfalls and potential. Biotechniques 1999;26:112–22 124-115.
11. Várallyay E, Burgyán J, Havelda Z. MicroRNA detection by northern blotting using locked nucleic acid probes. Nat Protoc 2008;3:190–6.
12. Baskerville S, Bartel DP. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005;11:241–7.
13. Yang Z, Ren F, Liu C, et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics 2010;11:S5–5.
14. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2013;42:D1070–4.
15. Jiang Q, Wang Y, Hao Y, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 2009;37:D98–104.
16. Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20:515–39.
17. Jiang Q, Hao Y, Wang G, et al. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol 2010;4:S2.
18. Chen X, Yan CC, Zhang X, et al. WBSMDA: within and between score for MiRNA-disease association prediction. Sci Rep 2016;6:21106.
19. Che K, Guo M, Wang C, et al. Predicting MiRNA-disease association by latent feature extraction with positive samples. Genes (Basel) 2019;10.
20. Zhang W, Li Z, Guo W, et al. A fast linear neighborhood similarity-based network link inference method to predict microRNA-disease associations. IEEE/ACM Trans Comput Biol Bioinform 2019.
21. Ma Y, He T, Ge L, et al. MiRNA-disease interaction prediction based on kernel neighborhood similarity and multi-network bidirectional propagation. BMC Med Genomics 2019;12:185.
22. Chen X, Yan CC, Zhang X, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep 2015;5:13877.
23. Chen X, Wang CC, Yin J, et al. Novel human miRNA-disease association inference based on random forest. Mol Ther Nucleic Acids 2018;13:568–79.
24. Yao D, Zhan X, Kwoh CK. An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinformatics 2019;20:624.
25. Yan C, Wang J, Ni P, et al. DNRLMF-MDA: predicting microRNA-disease associations based on similarities of microRNAs and diseases. IEEE/ACM Trans Comput Biol Bioinform 2019;16:233–43.
26. Peng J, Hui W, Li Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics 2019;35:4364–71.
27. Zheng K, You ZH, Wang L, et al. MLMDA: a machine learning approach to predict and validate MicroRNA-disease associations by integrating of heterogenous information sources. J Transl Med 2019;17:260.
28. Zhou S, Wang S, Wu Q, et al. Predicting potential miRNA-disease associations by combining gradient boosting decision tree with logistic regression. Comput Biol Chem 2020;85:107200.
29. Ji BY, You ZH, Cheng L, et al. Predicting miRNA-disease association from heterogeneous information network with GraRep embedding model. Sci Rep 2020;10:6658.
30. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv e-prints. 2016, arXiv:1609.02907.
31. van den Berg R, Kipf TN, Welling M. Graph convolutional matrix completion. arXiv e-prints. 2017, arXiv:1706.02263.
32. Hamilton WL, Ying R, Leskovec J. Inductive representation learning on large graphs. arXiv e-prints. 2017, arXiv:1706.02216.
33. Veličković P, Cucurull G, Casanova A, et al. Graph attention networks. arXiv e-prints. 2017, arXiv:1710.10903.
34. Li C, Liu H, Hu Q, et al. A novel computational model for predicting microRNA-disease associations based on heterogeneous graph convolutional networks. Cells 2019;8:977.
35. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36:2538–46.
36. Li J, Li Z, Nie R, et al. FCGCNMDA: predicting miRNA-disease associations by applying fully connected graph convolutional networks. Mol Genet Genomics 2020;295:1197–209.
37. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26:1644–50.
38. Xuan P, Han K, Guo M, et al. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS One 2013;8:e70204.
39. Pasquier C, Gardès J. Prediction of miRNA-disease associations with a vector space model. Sci Rep 2016;6:27036.
40. Fan S, Zhu J, Han X, et al. Metapath-guided heterogeneous graph neural network for intent recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Anchorage, AK, USA: Association for Computing Machinery, 2019, 2478–86.
41. Schlichtkrull M, Kipf TN, Bloem P, et al. Modeling relational data with graph convolutional networks. arXiv e-prints. 2017, arXiv:1703.06103.
42. Kipf TN, Welling M. Variational graph auto-encoders. arXiv e-prints. 2016, arXiv:1611.07308.
43. Wang M, Yu L, Zheng D, et al. Deep graph library: towards efficient and scalable deep learning on graphs. arXiv e-prints. 2019, arXiv:1909.01315.
44. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, 249–56.
45. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv e-prints. 2014, arXiv:1412.6980.
46. You ZH, Huang ZA, Zhu Z, et al. PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol 2017;13:e1005455.
47. Qu Y, Zhang H, Lyu C, et al. LLCMDA: a novel method for predicting miRNA gene and disease relationship based on locality-constrained linear coding. Front Genet 2018;9:576.
48. Chen X, Zhu CC, Yin J. Ensemble of decision tree reveals potential miRNA-disease associations. PLoS Comput Biol 2019;15:e1007209.
49. Yu SP, Liang C, Xiao Q, et al. MCLPMDA: a novel method for miRNA-disease association prediction based on matrix completion and label propagation. J Cell Mol Med 2019;23:1427–38.
50. Huang Z, Liu L, Gao Y, et al. Benchmark of computational methods for predicting microRNA-disease associations. Genome Biol 2019;20:202.
51. Pita-Fernández S, Pértega-Díaz S, López-Calviño B, et al. Diagnostic and treatment delay, quality of life and satisfaction with care in colorectal cancer patients: a study protocol. Health Qual Life Outcomes 2013;11:117.
52. Kollarova H, Machova L, Horakova D, et al. Epidemiology of esophageal cancer--an overview article. Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub 2007;151:17–20.
53. Shephard E, Neal R, Rose P, et al. Clinical features of kidney cancer in primary care: a case-control study using primary care records. Br J Gen Pract 2013;63:e250–5.

© The Author(s) 2020. Published by Oxford University Press. All rights reserved.
For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) TI - A graph auto-encoder model for miRNA-disease associations prediction JF - Briefings in Bioinformatics DO - 10.1093/bib/bbaa240 DA - 2020-10-19 UR - https://www.deepdyve.com/lp/oxford-university-press/a-graph-auto-encoder-model-for-mirna-disease-associations-prediction-TIoqI11KFF SP - 1 EP - 1 VL - Advance Article IS - DP - DeepDyve ER -