Summary: DincRNA aims to provide a comprehensive web-based bioinformatics toolkit to elucidate the entangled relationships among diseases and non-coding RNAs (ncRNAs) from the perspective of disease similarity. Quantitative measures of the relationship between a pair of diseases depend on the diseases' molecular mechanisms and on the structure of the directed acyclic graph of Disease Ontology (DO). The corresponding methods for calculating the similarity of pair-wise diseases include Resnik's, Lin's, Wang's, PSB and SemFunSim methods. Recently, disease similarity was validated as suitable for calculating functional similarities of ncRNAs and prioritizing ncRNA–disease pairs, and it has been widely applied to predicting ncRNA function because wet lab experiments have so far yielded limited biological knowledge of these RNAs. For this purpose, a large number of algorithms and prior knowledge need to be integrated, e.g. the 'pair-wise best, pairs-average' (PBPA) and 'pair-wise all, pairs-maximum' (PAPM) methods for calculating functional similarities of ncRNAs, and the random walk with restart (RWR) method for prioritizing ncRNA–disease pairs. To facilitate the exploration of disease associations and ncRNA function, DincRNA implements all eight of the above algorithms based on DO and disease-related genes. Currently, it provides functions to query disease similarity scores, miRNA and lncRNA functional similarity scores, and the prioritization scores of lncRNA–disease and miRNA–disease pairs.

Availability and implementation: http://bio-annotation.cn:18080/DincRNAClient/

Contact: biofomeng@hotmail.com or qhjiang@hit.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

1 Introduction

Disease similarity research includes the design of quantitative measures to calculate the similarity of pair-wise diseases (SPWD) and the similarity of pair-wise disease sets (SPWDS). This domain has attracted growing attention because of its wide application in establishing functional similarity networks (FSNs) of non-coding RNAs (ncRNAs) (Sun et al., 2014; Wang et al., 2010) and in predicting ncRNA–disease associations (Sun et al., 2014).

The development of Disease Ontology (DO) (Kibbe et al., 2015) provides a way to investigate disease similarity using systems biology methods, because it is the first vocabulary established around disease names. The state-of-the-art methods for calculating the similarity of pair-wise DO terms include Wang's, Resnik's, Lin's, process-similarity based (PSB) and SemFunSim methods (Cheng et al., 2014; Lin, 1998; Mathur and Dinakarpandian, 2012; Resnik, 1995; Wang et al., 2007). In addition, the commonly used method for calculating SPWDS based on SPWD is 'pair-wise best, pairs-average' (PBPA) (Sun et al., 2014), and an alternative method is 'pair-wise all, pairs-maximum' (PAPM) (Pesquita et al., 2009). Recently, disease similarity has been widely used for predicting ncRNA function and for prioritizing ncRNA–disease pairs based on the random walk with restart (RWR) method, because these RNAs lack protein products. First, the functional similarity of a pair of ncRNAs is converted to the similarity of the disease sets associated with them, which can be calculated with the PBPA and PAPM methods. Then the similarities of all ncRNA pairs are used to construct an ncRNA FSN, in which each RNA is a node and the functional similarity score is the weight of the edge. Finally, the network is used to prioritize ncRNA–disease pairs with the RWR method. The details of these methods are described in the 'Supplementary Methods' section of the Supplementary Material.

Although SPWDS is widely applied in computing the functional similarity of ncRNAs and predicting novel ncRNA–disease associations, no tools have so far been designed for this purpose. In this paper, we present a comprehensive toolkit to explore disease associations and ncRNA function from the perspective of disease similarity. The toolkit provides tools for retrieving SPWD, SPWDS, lncRNA functional similarity (LFS), miRNA functional similarity (MFS), prioritization of lncRNA–disease pairs and prioritization of miRNA–disease pairs. Our toolkit is freely available at http://bio-annotation.cn:18080/DincRNAClient/.

2 Materials and methods

2.1 Prior semantic associations between diseases

Semantic associations between diseases were extracted from DO. Currently, DO contains 7124 'IS_A' relationships between 6920 disease terms.

2.2 Datasets for calculating disease similarity

Disease-related genes, gene-related biological processes (BPs) and the human gene functional network are used for calculating disease similarity. Prior disease–gene associations are scattered across multiple manually curated databases. The main ones are Gene Reference into Function (GeneRIF) (Mitchell et al., 2003), Online Mendelian Inheritance in Man (OMIM) (Amberger et al., 2011), Genetic Association Database (GAD) (Becker et al., 2004) and Comparative Toxicogenomics Database (CTD) (Davis et al., 2013). After mapping the disease names in these four databases to DO terms based on SIDD (Cheng et al., 2013), the integrated disease–gene associations are obtained. In addition, gene-related BPs are from GO Annotation (GOA) (Camon, 2004), and the human gene functional network is from HumanNet (Lee et al., 2011).

2.3 Prior ncRNA–disease associations

Here ncRNA–disease associations comprise lncRNA–disease associations and miRNA–disease associations. lncRNA–disease associations are from LncRNADisease (Chen et al., 2013) and Lnc2Cancer (Ning et al., 2016). miRNA–disease associations are from HMDD v2.0 (Li et al., 2014). All of the disease names in these databases were manually mapped to DO terms. As a result, we obtained 5710 associations between 265 diseases and 556 miRNAs, and 596 associations between 161 diseases and 343 lncRNAs.

2.4 Design and implementation of the system

The three-layer architecture, consisting of the DATABASE, ALGORITHM and TOOLS layers, is shown in Figure 1. All the datasets are stored in the relational database management system MySQL 5.5 as the DATABASE layer. Eight methods (Wang's, Resnik's, Lin's, PSB, SemFunSim, PBPA, PAPM and RWR) are implemented in the ALGORITHM layer. DincRNA is implemented on a JavaEE framework and runs on a web server [2-core (2.26 GHz) processors] of UCloud (Sqalli et al., 2012). It provides a web application following a typical browser/server model in an Apache Tomcat container. The query results are packed into JSON objects for transfer and display. In the front end of DincRNA, the Asynchronous JavaScript and XML (AJAX) technique is adopted to exchange data asynchronously between the browser and the server and so avoid full page reloads.

Fig. 1. System overview of DincRNA

3 Results

3.1 Web interface

DincRNA provides web pages for users to query disease similarity scores, ncRNA functional similarity scores and ncRNA–disease prioritization scores. All of these scores can be downloaded from the 'Resources' page or the result page.

DincRNA provides a search engine on the input page to query entities including disease names, DOIDs, lncRNA symbols and miRNA symbols. The autocomplete function of the page helps users enter the diseases and ncRNA names of interest easily. To support the online calculation of similarities for multiple pairs of disease sets, DincRNA also provides a batch processing function that allows users to input disease sets as files (Supplementary Fig. S2).

Each entry of the query results contains the names of the pair-wise entities and their similarity score or prioritization score. The result page provides paging and sorting functions to show all of the entries. Each page can show 20, 50 or 100 entries as required (Supplementary Fig. S2). Entries can be displayed in descending or ascending order of similarity score or prioritization score (Supplementary Fig. S2). In addition, all the query results can be downloaded in JSON and CSV formats (Supplementary Fig. S2) so that they can be browsed locally. Detailed usage of DincRNA is described in the Supplementary Material.

3.2 Performance evaluation of the prioritization of ncRNA–disease pairs

Since SemFunSim, PSB, Wang's, Lin's and Resnik's methods can each be combined with the PBPA or PAPM method for prioritizing ncRNA–disease pairs using the RWR method, it is not easy for users to choose the most suitable combination. Hence, we evaluated the performance of each combination in prioritizing lncRNA–disease and miRNA–disease pairs by leave-one-out cross validation.

In total, 525 prior lncRNA–disease associations, covering 88 diseases with at least two lncRNAs each, were used for this assessment. The receiver operating characteristic (ROC) curves of prioritizing lncRNA–disease pairs based on the lncRNA FSN (LFSN) constructed by the PBPA method with different algorithms for calculating SPWD are shown in Figure 2A. The areas under the ROC curve (AUCs) for the combinations of SemFunSim, PSB, Wang's, Lin's and Resnik's methods with the PBPA method are 0.978, 0.985, 0.961, 0.967 and 0.990, respectively. A further evaluation of accuracy at low false positive rates (FPRs) is shown in Table 1. When the FPR was set to 15%, the accuracies for the combinations of SemFunSim, PSB, Wang's, Lin's and Resnik's methods with the PBPA method were 0.975, 0.973, 0.934, 0.939 and 0.996, respectively. The evaluation results for the different combinations are all very good and close to each other. Similar evaluation results were obtained for the combinations of SemFunSim, PSB, Wang's, Lin's and Resnik's methods with the PAPM method, as shown in Figure 2B. These results show that the PBPA and PAPM methods combined with the other algorithms are very suitable for prioritizing lncRNA–disease pairs.

Table 1. Accuracy of prioritizing lncRNA–disease pairs based on PBPA method for specified false positive rate

Method       Accuracy (FPR = 0.05)   Accuracy (FPR = 0.10)   Accuracy (FPR = 0.15)
Lin          0.831                   0.903                   0.939
PSB          0.913                   0.958                   0.973
Resnik       0.953                   0.977                   0.996
SemFunSim    0.825                   0.941                   0.975
Wang         0.729                   0.896                   0.934

Fig. 2. Performance evaluation results. (A) The ROC curve of prioritizing lncRNA–disease pairs based on PBPA method. (B) The performance evaluation of prioritizing ncRNA–disease pairs based on different methods

Analogously, 5661 prior miRNA–disease associations of 261 diseases were used for leave-one-out cross validation. The AUCs based on the PBPA and PAPM methods are shown in Figure 2B. Overall, the evaluation results for miRNAs are lower than those for lncRNAs using the PBPA method. For example, the AUCs of lncRNA–disease and miRNA–disease pairs based on the combination of SemFunSim and PBPA methods are 0.978 and 0.875, respectively. This may be caused by the difference in sample size between lncRNAs and miRNAs. Intuitively, the assessment should be more comprehensive for a larger sample, and the measured performance should correspondingly be more accurate. Although the AUCs based on the PBPA method decline slightly as the sample size increases, its performance is still very good given the high AUC. In comparison with the prioritization of lncRNA–disease pairs, the AUCs of prioritizing miRNA–disease pairs using PAPM decline dramatically. The decline in performance with the larger sample may be caused by the greater noise in that sample. These results also show that the performance of PAPM is unstable. That may be the reason why PBPA is applied more widely than PAPM (Chen et al., 2015; Sun et al., 2014; Wang et al., 2010).

The AUCs based on SemFunSim, PSB, Wang's, Lin's and Resnik's methods are close to each other according to Figure 2B. For example, the maximum and minimum AUCs of prioritizing miRNA–disease pairs based on PBPA are 0.892 and 0.852, respectively. These stable results validate that existing methods for calculating SPWD are mature.

Acknowledgements

We thank Dr. Jiajie Peng for discussion of the system design. We also thank Dr. Hengqianq Zhao for DincRNA testing.

Funding

This work was supported by the National Natural Science Foundation of China [Grant No. 61502125], Heilongjiang Postdoctoral Fund [LBH-Z15179] and China Postdoctoral Science Foundation [2016M590291].

Conflict of Interest: none declared.

References

Amberger,J. et al. (2011) A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®). Hum. Mutat., 32, 564–567.
Becker,K.G. et al. (2004) The genetic association database. Nat. Genet., 36, 431–432.
Camon,E. (2004) The gene ontology annotation (goa) database: sharing knowledge in uniprot with gene ontology. Nucleic Acids Res., 32, D262–D266.
Chen,G. et al. (2013) LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res., 41, D983–D986.
Chen,X. et al. (2015) Constructing lncRNA functional similarity network based on lncRNA–disease associations and disease semantic similarity. Sci. Rep., 5, 11338.
Cheng,L. et al. (2014) SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association. PLoS One, 9, e99415.
Cheng,L. et al. (2013) SIDD: a semantically integrated database towards a global view of human disease. PLoS One, 8, e75504.
Davis,A.P. et al. (2013) The comparative toxicogenomics database: update 2013. Nucleic Acids Res., 41, D1104–D1114.
Kibbe,W.A. et al. (2015) Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res., 43, D1071–D1078.
Lee,I. et al. (2011) Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res., 21, 1109–1121.
Li,Y. et al. (2014) HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res., 42, D1070–D1074.
Lin,D. (1998) An information-theoretic definition of similarity. ICML, pp. 296–304.
Mathur,S. and Dinakarpandian,D. (2012) Finding disease similarity based on implicit semantic similarity. J. Biomed. Inf., 45, 363–371.
Mitchell,J.A. et al. (2003) Gene indexing: characterization and analysis of NLM's GeneRIFs. In: AMIA ... Annual Symposium proceedings/AMIA Symposium. AMIA Symposium, pp. 460–464.
Ning,S. et al. (2016) Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res., 44, D980–D985.
Pesquita,C. et al. (2009) Semantic similarity in biomedical ontologies. PLoS Comput. Biol., 5, e1000443.
Resnik,P. (1995) Using information content to evaluate semantic similarity in a taxonomy, arXiv preprint cmp-lg/9511007.
Sqalli,M.H. et al. (2012) UCloud: a simulated Hybrid Cloud for a university environment. In: 2012 IEEE 1st International Conference on Cloud Networking (CLOUDNET), IEEE, pp. 170–172.
Sun,J. et al. (2014) Inferring novel lncRNA–disease associations based on a random walk model of a lncRNA functional similarity network. Mol. bioSystems, 10, 2074–2081.
Wang,D. et al. (2010) Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics, 26, 1644–1650.
Wang,J.Z. et al. (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics, 23, 1274–1281.
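
Illustrative sketch (not part of the published article). The three-step pipeline described in the Introduction (disease-set similarity via PBPA or PAPM, an ncRNA functional similarity network, then RWR) might be sketched in Python as follows. The article defers the exact formulas to its Supplementary Methods, so everything here is an assumption: PBPA is read as the best-match average over both sets, PAPM as the maximum over all cross-set pairs, and RWR as the usual iteration p ← (1 − r)·W·p + r·p0 on a column-normalised weight matrix. The names `pbpa`, `papm`, `rwr`, `toy_sim`, the restart value 0.7 and all toy scores are hypothetical and are not taken from DincRNA.

```python
from typing import Callable, List, Sequence, Set

Sim = Callable[[str, str], float]

# Hypothetical pair-wise disease similarities (SPWD); real values would come
# from Resnik's, Lin's, Wang's, PSB or SemFunSim methods.
TOY_SPWD = {("d1", "d2"): 0.5, ("d1", "d3"): 0.2, ("d2", "d3"): 0.4}


def toy_sim(x: str, y: str) -> float:
    """Symmetric lookup into the toy table; identical terms score 1.0."""
    if x == y:
        return 1.0
    return TOY_SPWD.get((x, y), TOY_SPWD.get((y, x), 0.0))


def pbpa(set_a: Sequence[str], set_b: Sequence[str], sim: Sim) -> float:
    """Assumed PBPA: each term takes its best match in the other set,
    and the best-match scores of both sets are averaged."""
    if not set_a or not set_b:
        return 0.0
    best_a = sum(max(sim(a, b) for b in set_b) for a in set_a)
    best_b = sum(max(sim(a, b) for a in set_a) for b in set_b)
    return (best_a + best_b) / (len(set_a) + len(set_b))


def papm(set_a: Sequence[str], set_b: Sequence[str], sim: Sim) -> float:
    """Assumed PAPM: the maximum similarity over all cross-set pairs."""
    if not set_a or not set_b:
        return 0.0
    return max(sim(a, b) for a in set_a for b in set_b)


def rwr(weights: List[List[float]], seeds: Set[int], restart: float = 0.7,
        tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Random walk with restart on a weighted network.

    weights[i][j] is the edge weight between nodes i and j; seeds are the
    indices of ncRNAs already associated with the query disease.
    """
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(
                weights[i][j] / col_sums[j] * p[j]
                for j in range(n) if col_sums[j] > 0
            ) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy functional similarity network of three ncRNAs (hypothetical weights).
TOY_FSN = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]

if __name__ == "__main__":
    print(pbpa(["d1", "d2"], ["d2", "d3"], toy_sim))
    print(papm(["d1", "d2"], ["d2", "d3"], toy_sim))
    print(rwr(TOY_FSN, seeds={0}))
```

In this reading, the edge weights of the network would be the PBPA or PAPM scores between the disease sets of each ncRNA pair, and the stationary RWR scores of the non-seed nodes would rank candidate ncRNAs for the query disease. Because PAPM keeps only the single best pair, one spurious high-scoring disease pair can dominate the result, which is at least consistent with the instability the article reports for it.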
Bioinformatics – Oxford University Press
Published: Jan 22, 2018