F. Ferrè, A. Colantoni, M. Helmer-Citterich (2015)
Revealing protein–lncRNA interaction. Briefings in Bioinformatics, 17
D. Rumelhart, Geoffrey Hinton, Ronald Williams (1986)
Learning representations by back-propagating errors. Nature, 323
Xingli Guo, Lin Gao, Q. Liao, Hui Xiao, Xiaoke Ma, Xiaofei Yang, Haitao Luo, Guoguang Zhao, Dechao Bu, F. Jiao, Qixiang Shao, Runsheng Chen, Yi Zhao (2012)
Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucleic Acids Research, 41
Hanghang Tong, C. Faloutsos, Jia-Yu Pan (2006)
Fast Random Walk with Restart and Its Applications. Sixth International Conference on Data Mining (ICDM'06)
Yajing Hao, Wei Wu, Hui Li, Jiao Yuan, Jianjun Luo, Yi Zhao, Runsheng Chen (2016)
NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database: The Journal of Biological Databases and Curation, 2016
M. Turner, Alison Galloway, E. Vigorito (2014)
Noncoding RNA and its associated proteins as regulatory elements of the immune system. Nature Immunology, 15
Q. Zou, Jinjin Li, Li Song, Xiangxiang Zeng, Guohua Wang (2015)
Similarity computation strategies in the microRNA-disease network: a survey. Briefings in Functional Genomics, 15(1)
J. Mazar, Amy Rosado, J. Shelley, John Marchica, Tamarah Westmoreland (2016)
The long non-coding RNA GAS5 differentially regulates cell cycle arrest and apoptosis through activation of BRCA1 and p53 in human neuroblastoma. Oncotarget, 8
Yasunobu Okamura, Yuichi Aoki, T. Obayashi, Shu Tadaka, S. Ito, Takafumi Narise, K. Kinoshita (2014)
COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Research, 43
Jun Li, Meng Zhang, G. An, Q. Ma (2016)
LncRNA TUG1 acts as a tumor suppressor in human glioma by promoting cell apoptosis. Experimental Biology and Medicine, 241
Homin Lee, Amy Hsu, J. Sajdak, J. Qin, P. Pavlidis (2004)
Coexpression analysis of human genes across many microarray data sets. Genome Research, 14(6)
Chao Fan, Diwei Liu, Rui Huang, Zhigang Chen, L. Deng (2016)
PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility. BMC Bioinformatics, 17
Wei Tang, Z. Liao, Q. Zou (2016)
Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget, 7
Jingpu Zhang, Zuping Zhang, Zhigang Chen, L. Deng (2019)
Integrating Multiple Heterogeneous Networks for Novel LncRNA-Disease Association Inference. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16
Q. Zou, Jinjin Li, Qingqi Hong, Ziyu Lin, Yun Wu, Hua Shi, Y. Ju (2015)
Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods. BioMed Research International, 2015
A. Necsulea, M. Soumillon, Maria Warnefors, A. Liechti, Tasman Daish, U. Zeller, J. Baker, F. Grützner, H. Kaessmann (2014)
The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature, 505
Chao Xie, Jiao Yuan, Hui Li, Ming Li, Guoguang Zhao, Dechao Bu, Weimin Zhu, Wei Wu, Runsheng Chen, Yi Zhao (2013)
NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Research, 42
T. Mercer, M. Dinger, J. Mattick (2009)
Long non-coding RNAs: insights into functions. Nature Reviews Genetics, 10
C. Schneider, R. King, L. Philipson (1988)
Genes specifically expressed at growth arrest of mammalian cells. Cell, 54
Guiyou Liu, Fang Zhang, Yongshuai Jiang, Yang Hu, Z. Gong, Shoufeng Liu, Xiuju Chen, Qinghua Jiang, J. Hao (2017)
Integrating genome-wide association studies and gene expression data highlights dysregulated multiple sclerosis risk pathways. Multiple Sclerosis Journal, 23
Xueqing Zhang, S. Weissman, P. Newburger (2014)
Long intergenic non-coding RNA HOTAIRM1 regulates cell cycle progression during myeloid maturation in NB4 human promyelocytic leukemia cells. RNA Biology, 11
Hyunghoon Cho, B. Berger, Jian Peng (2015)
Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks. Research in Computational Molecular Biology (RECOMB), 9029
M. Paraskevopoulou, A. Hatzigeorgiou (2016)
Analyzing MiRNA-LncRNA Interactions. Methods in Molecular Biology, 1402
R. Cerri, Rodrigo Barros, A. Carvalho (2015)
Hierarchical classification of Gene Ontology-based protein functions with neural networks. 2015 International Joint Conference on Neural Networks (IJCNN)
Sheng Wang, Hyunghoon Cho, ChengXiang Zhai, B. Berger, Jian Peng (2015)
Exploiting ontology graph for predicting sparsely annotated gene function. Bioinformatics, 31
José Garzón, L. Deng, D. Murray, S. Shapira, Donald Petrey, B. Honig (2016)
A computational interactome and functional annotation for the human proteome. eLife, 5
Orly Wapinski, Howard Chang (2011)
Long noncoding RNAs and human disease. Trends in Cell Biology, 21(6)
M. Pickard, M. Mourtada-Maarabouni, Gwyn Williams (2013)
Long non-coding RNA GAS5 regulates apoptosis in prostate cancer cell lines. Biochimica et Biophysica Acta, 1832(10)
R. Cerri, Rodrigo Barros, A. Carvalho (2014)
Hierarchical multi-label classification using local neural networks. J. Comput. Syst. Sci., 80
Xueqing Zhang, Z. Lian, C. Padden, M. Gerstein, J. Rozowsky, Michael Snyder, T. Gingeras, P. Kapranov, S. Weissman, P. Newburger (2009)
A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster. Blood, 113(11)
A. Mortazavi, B. Williams, K. McCue, Lorian Schaeffer, B. Wold (2008)
Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 5
H. Chua, L. Wong (2012)
Predicting Protein Functions from Protein Interaction Networks. Int. J. Knowl. Discov. Bioinform., 3
L. Deng, Zhigang Chen (2015)
An Integrated Framework for Functional Annotation of Protein Structural Domains. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12
D.B. Marina (2015)
The lincRNA HOTAIRM1, located in the HOXA genomic region, is expressed in acute myeloid leukemia, impacts prognosis in patients in the intermediate-risk cytogenetic category, and is associated with a distinctive microRNA signature. Oncotarget, 6
K. Morris, J. Mattick (2014)
The rise of regulatory RNA. Nature Reviews Genetics, 15
Qinghua Jiang, Rui Ma, Jixuan Wang, Xiaoliang Wu, Shuilin Jin, Jiajie Peng, Renjie Tan, Tianjiao Zhang, Yu Li, Yadong Wang (2015)
LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data. BMC Genomics, 16
Q. Liao, Changning Liu, Xiongying Yuan, Shuli Kang, Ruoyu Miao, Hui Xiao, Guoguang Zhao, Haitao Luo, Dechao Bu, Haitao Zhao, G. Skogerbø, Zhongdao Wu, Yi Zhao (2011)
Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic Acids Research, 39
E. Birney, J. Stamatoyannopoulos, A. Dutta, R. Guigó, T. Gingeras, E. Margulies, Z. Weng, M. Snyder, E. Dermitzakis, R. Thurman, Michael Kuehn, Christopher Taylor, Shane Neph, Christoph Koch, S. Asthana, A. Malhotra, I. Adzhubei, J. Greenbaum, R. Andrews, Paul Flicek, Patrick Boyle, Hua Cao, N. Carter, Gayle Clelland, S. Davis, Nathan Day, P. Dhami, Shane Dillon, M. Dorschner, H. Fiegler, P. Giresi, J. Goldy, M. Hawrylycz, A. Haydock, R. Humbert, K. James, Brett Johnson, Ericka Johnson, Tristan Frum, Elizabeth Rosenzweig, N. Karnani, K. Lee, Gregory Lefebvre, P. Navas, Fidencio Neri, Stephen Parker, P. Sabo, R. Sandstrom, A. Shafer, D. Vetrie, M. Weaver, S. Wilcox, Man Yu, F. Collins, J. Dekker, J. Lieb, T. Tullius, G. Crawford, S. Sunyaev, W. Noble, I. Dunham, F. Denoeud, A. Reymond, P. Kapranov, J. Rozowsky, D. Zheng, R. Castelo, A. Frankish, J. Harrow, Srinka Ghosh, A. Sandelin, I. Hofacker, R. Baertsch, Damian Keefe, S. Dike, Jill Cheng, H. Hirsch, E. Sekinger, Julien Lagarde, J. Abril, A. Shahab, C. Flamm, C. Fried, J. Hackermüller, Jana Hertel, Manja Lindemeyer, Kristin Missal, Andrea Tanzer, Stefan Washietl, J. Korbel, O. Emanuelsson, J. Pedersen, N. Holroyd, Ruth Taylor, D. Swarbreck, N. Matthews, M. Dickson, D. Thomas, M. Weirauch, J. Gilbert, J. Drenkow, I. Bell, X. Zhao, K. Srinivasan, W. Sung, H. Ooi, K. Chiu, S. Foissac, T. Alioto, M. Brent, L. Pachter, M. Tress, A. Valencia, S. Choo, C. Choo, C. Ucla, C. Manzano, Carine Wyss, E. Cheung, T. Clark, James Brown, M. Ganesh, Sandeep Patel, H. Tammana, Jacqueline Chrast, C. Henrichsen, C. Kai, J. Kawai, U. Nagalakshmi, Jiaqian Wu, Z. Lian, Jin Lian, P. Newburger, Xueqing Zhang, P. Bickel, J. Mattick, Piero Carninci, Y. Hayashizaki, S. Weissman, T. Hubbard, R. Myers, J. Rogers, P. Stadler, T. Lowe, Chia-Lin Wei, Y. Ruan, K. Struhl, M. Gerstein, S. Antonarakis, Yutao Fu, E. Green, Ulas Karaöz, A. Siepel, James Taylor, L. Liefer, K. Wetterstrand, P. Good, E. Feingold, M. Guyer, G. Cooper, G. 
Asimenos, Colin Dewey, Minmei Hou, S. Nikolaev, J. Montoya-Burgos, A. Löytynoja, S. Whelan, F. Pardi, Tim Massingham, Haiyan Huang, Na Zhang, I. Holmes, Jim Mullikin, A. Ureta-Vidal, B. Paten, Michael Seringhaus, D. Church, K. Rosenbloom, W. Kent, Eric Stone, S. Batzoglou, N. Goldman, R. Hardison, D. Haussler, Webb Miller, A. Sidow, N. Trinklein, Zhengdong Zhang, Leah Barrera, R. Stuart, D. King, A. Ameur, Stefan Enroth, M. Bieda, Jonghwan Kim, A. Bhinge, N. Jiang, Jun Liu, Fei Yao, V. Vega, C. Lee, Patrick Ng, A. Yang, Z. Moqtaderi, Zhou Zhu, Xiaoqin Xu, S. Squazzo, M. Oberley, David Inman, Michael Singer, T. Richmond, Kyle Munn, Á. Rada-Iglesias, O. Wallerman, J. Komorowski, J. Fowler, Phillippe Couttet, Alexander Bruce, O. Dovey, P. Ellis, C. Langford, D. Nix, G. Euskirchen, S. Hartman, A. Urban, P. Kraus, Sara Calcar, Nathaniel Heintzman, Tae Kim, Kun Wang, Chunxu Qu, G. Hon, R. Luna, C. Glass, M. Rosenfeld, S. Aldred, S. Cooper, Anason Halees, J. Lin, H. Shulha, Xiaoling Zhang, Mousheng Xu, Jaafar Haidar, Yon-Jong Yu, V. Iyer, Roland Green, C. Wadelius, P. Farnham, B. Ren, R. Harte, A. Hinrichs, Heather Trumbower, H. Clawson, J. Hillman-Jackson, A. Zweig, Kayla Smith, Archana Thakkapallayil, G. Barber, R. Kuhn, D. Karolchik, L. Armengol, C. Bird, P. Bakker, Andrew Kern, N. López-Bigas, Joel Martin, B. Stranger, A. Woodroffe, Eugene Davydov, A. Dimas, E. Eyras, Ingileif Hallgrímsdóttir, J. Huppert, M. Zody, G. Abecasis, X. Estivill, G. Bouffard, Xiaobin Guan, N. Hansen, J. Idol, V. Maduro, Baishali Maskeri, Jennifer Mcdowell, Morgan Park, Pamela Thomas, Alice Young, R. Blakesley, D. Muzny, E. Sodergren, D. Wheeler, K. Worley, Huaiyang Jiang, G. Weinstock, R. Gibbs, T. Graves, R. Fulton, E. Mardis, R. Wilson, M. Clamp, James Cuff, S. Gnerre, D. Jaffe, Jean Chang, K. Lindblad-Toh, E. Lander, M. Koriabine, M. Nefedov, K. Osoegawa, Y. Yoshinaga, B. Zhu, P. Jong (2007)
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447
P. Rocca-Serra, A. Brazma, H. Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, S. Contrino, J. Vilo, Niran Abeygunawardena, Gaurab Mukherjee, Ele Holloway, M. Kapushesky, P. Kemmeren, G. Lara, A. Oezcimen, Susanna-Assunta Sansone (2003)
ArrayExpress: a public database of gene expression data at EBI. Comptes Rendus Biologies, 326(10-11)
T. Derrien, Rory Johnson, G. Bussotti, Andrea Tanzer, S. Djebali, Hagen Tilgner, G. Guernec, David Martin, A. Merkel, David Knowles, Julien Lagarde, Lavanya Veeravalli, Xiaoan Ruan, Y. Ruan, T. Lassmann, Piero Carninci, James Brown, L. Lipovich, J. Gonzalez, Mark Thomas, C. Davis, R. Shiekhattar, T. Gingeras, T. Hubbard, C. Notredame, J. Harrow, R. Guigó (2012)
The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Research, 22
M. Guttman, I. Amit, Manuel Garber, Courtney French, Michael Lin, D. Feldser, Maite Huarte, O. Zuk, B. Carey, John Cassady, M. Cabili, R. Jaenisch, T. Mikkelsen, T. Jacks, N. Hacohen, B. Bernstein, Manolis Kellis, A. Regev, J. Rinn, E. Lander (2009)
Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 458
G. Raho, V. Barone, D. Rossi, L. Philipson, V. Sorrentino (2000)
The gas5 gene shows four alternative splicing patterns without coding for a protein. Gene, 256(1-2)
A. Dupuy, E. Caron (2008)
Integrin-dependent phagocytosis – spreading from microadhesion to new concepts. Journal of Cell Science, 121
M. Simon (2013)
Capture hybridization analysis of RNA targets (CHART). Current Protocols in Molecular Biology, Chapter 21
Ci Chu, Jeffrey Quinn, Howard Chang (2012)
Chromatin Isolation by RNA Purification (ChIRP). Journal of Visualized Experiments (JoVE)
R. Cerri, Rodrigo Barros, A. Carvalho, Yaochu Jin (2016)
Reduction strategies for hierarchical multi-label classification in protein function prediction. BMC Bioinformatics, 17
Margaret Ebert, P. Sharp (2010)
Emerging Roles for Natural MicroRNA Sponges. Current Biology, 20
T. Barrett, D. Troup, S. Wilhite, Pierre Ledoux, D. Rudnev, Carlos Evangelista, Irene Kim, Alexandra Soboleva, M. Tomashevsky, Ron Edgar (2006)
NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic Acids Research, 35
Guoxian Yu, Guangyuan Fu, Jun Wang, Yingwen Zhao (2018)
NewGOA: Predicting New GO Annotations of Proteins by Bi-Random Walks on a Hybrid Graph. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15
T. Mercer, J. Mattick (2013)
Structure and function of long noncoding RNAs in epigenetic regulation. Nature Structural & Molecular Biology, 20
Motivation: Long non-coding RNAs (lncRNAs) are an enormous collection of functional non-coding RNAs. Over the past decades, a large number of novel lncRNA genes have been identified. However, most lncRNAs remain functionally uncharacterized at present. Computational approaches provide new insight into the potential functional implications of lncRNAs.

Results: Considering that each lncRNA may have multiple functions and a function may be further specialized into sub-functions, here we describe NeuraNetL2GO, a computational ontological function prediction approach for lncRNAs that uses a hierarchical multi-label classification strategy based on multiple neural networks. The neural networks are trained incrementally, level by level, each performing the prediction of gene ontology (GO) terms belonging to a given level. In NeuraNetL2GO, we use topological features of the lncRNA similarity network as the input of the neural networks and employ the output results to annotate the lncRNAs. We show that NeuraNetL2GO achieves the best performance and an overall advantage in maximum F-measure and coverage on the manually annotated lncRNA2GO-55 dataset compared to other state-of-the-art methods.

Availability and implementation: The source code and data are available at http://denglab.org/NeuraNetL2GO/.

Contact: leideng@csu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

1 Introduction

Long non-coding RNAs (lncRNAs), which have little or no potential to encode for functional proteins (Mercer et al., 2009), have a wide distribution in organisms. Large numbers of lncRNAs have been recognized in many organisms along with the development of DNA sequencing technologies. The number of lncRNAs has increased significantly in recent years with the extensive utilization of experimental technologies to annotate the transcriptome (Mortazavi et al., 2008). Large-scale analyses of the transcriptome have revealed that the types and number of lncRNAs are far greater than those of protein-coding transcripts (Birney et al., 2007). Accumulating evidence shows that lncRNAs are involved in many biological processes, such as immune response, development, differentiation and gene imprinting (Morris and Mattick, 2014; Tang et al., 2016; Turner et al., 2014), and are associated with diseases and cancers (Wapinski and Chang, 2011; Zhang et al., 2017; Zou et al., 2015a, b). However, the functions of most lncRNAs and the underlying molecular mechanisms of gene regulation remain unclear. Hence, the annotation of lncRNA functions has become an area of focus in the fields of biology and bioinformatics.

Currently, some biological schemes for determining the functions of lncRNAs are as follows: analysis of lncRNA high-throughput expression profiles (Liu et al., 2017), verification of high-throughput data, and exploring lncRNA function through interactions. The high-throughput analysis of lncRNA expression profiles is performed through microarrays and RNA-seq (Mortazavi et al., 2008). Li et al. (2016) utilized quantitative RT-PCR to detect the expression profiles of lncRNA TUG1 in glioma tissues and conducted correlation analysis to reveal the relationship between TUG1 expression and different clinicopathologic parameters. They also investigated its function and found the influence of TUG1 on apoptosis and cell proliferation.

In addition to expression-based methods, the application of next-generation sequencing technology opens up a new way to construct genome-wide interaction maps for biomolecules (Garzón et al., 2016). The biological function of lncRNAs in the cell can be considered a function of the biological interactions mediated by lncRNAs with other biomolecules (e.g. DNAs, RNAs, proteins). Some well-characterized lncRNAs (e.g. HOTTIP, HOTAIR) carry out their function by interacting with DNA (Mercer and Mattick, 2013). Many experimental methods to investigate RNA–DNA interactions have been proposed in recent years, such as chromatin isolation by RNA purification, developed by Chu and co-workers (Chu et al., 2012), and capture hybridization analysis of RNA targets, designed by Simon (2013). Apart from interacting with DNA, lncRNAs have been demonstrated to interact with RNAs. Among the various types of RNA, interactions with miRNAs are the most well studied (Paraskevopoulou and Hatzigeorgiou, 2016); for example, a lncRNA can act as a sponge to regulate the behavior of regulatory miRNAs (Ebert and Sharp, 2010). Besides DNA and RNA, interactions with proteins are pervasive, and protein–RNA interactions are crucial aspects of many cellular processes (Yu et al., 2017). Ferrè et al. addressed the approaches to reveal lncRNA–protein interactions (Ferrè et al., 2016). To explore the function of lncRNAs, it is usually necessary to combine one or more of the above-described interactions.

Experimentally identifying the functions of lncRNAs is usually expensive and progresses slowly. Computational methods for predicting lncRNA function are therefore becoming more and more important. Since genes with identical or similar functions tend to have similar expression patterns across multiple different tissues (Lee et al., 2004), it is an efficient approach to analyze the role of lncRNAs by analyzing the co-expression patterns shared with their neighboring counterparts (Necsulea et al., 2014). Guttman et al. (2009) identified some lincRNAs and then computed functional associations using gene set enrichment analysis (GSEA). GSEA was based on co-expression patterns, but the authors did not build a complete co-expression network. In another study, the researchers constructed a coding and non-coding gene co-expression network according to the abundant expression profiles in the GEO database, then predicted the functions of more than 300 mouse lncRNAs based on co-expression and genomic co-location (Liao et al., 2011). Guo et al. (2013) developed an approach named lnc-GFP to predict functions for 1625 lncRNAs. In lnc-GFP, a bi-colored biological network was constructed that took into account both coding and non-coding co-expression profiles and protein–protein interactions. In 2015, Jiang et al. computed the Pearson correlation coefficients (PCCs) of all lncRNA-mRNA gene pairs according to the expression of all human lncRNAs and mRNAs in 19 tissues and then annotated 9625 human lncRNAs by employing the hypergeometric test (Jiang et al., 2015).

In this work, we propose NeuraNetL2GO, which uses multiple neural networks to annotate the probable functions of lncRNAs on a large scale (Cerri et al., 2014; Cerri et al., 2015; Cerri et al., 2016). First, we construct a lncRNA-lncRNA biological network according to lncRNA co-expression data. Second, we generate the topological feature vectors of the co-expression network by running random walks with restart (RWR; Tong et al., 2006). Finally, we build multiple neural networks in which the topological feature vectors are used as inputs and the gene ontology (GO) terms are the output labels. We generate 13 neural networks in total, since the GO terms are distributed over the 13 levels of the directed acyclic graph (DAG) hierarchy of GO, and each neural network corresponds to the GO terms of one level. In the independent test, we achieve a maximum F-measure of 0.336 on the manually annotated 55 lncRNAs with 129 GO terms, which is significantly better than that of the other two state-of-the-art methods: lnc-GFP (Guo et al., 2013) and LncRNA2Function (Jiang et al., 2015).

2 Materials and methods

As an overview, the flowchart of our method is depicted in Figure 1. The primary processing is composed of several steps: (i) construct the lncRNA similarity network according to the lncRNA expression profiles; (ii) adopt diffusion component analysis (DCA) (Cho et al., 2015) to obtain a low-dimensional vector representation of each node in the lncRNA similarity network; (iii) build the training GO annotation dataset using the neighbor counting method (Chua and Wong, 2012); (iv) train the multi-layer networks incrementally, level by level, and apply the neural networks to the independent test dataset and the human genome.

Fig. 1. Flowchart of NeuraNetL2GO. It includes four steps: (A) Construct the lncRNA similarity network. (B) Extract topological features in the network with the DCA approach. (C) Build the training dataset by employing the neighbor counting method. (D) Train the multi-layer neural networks.

2.1 Construct the lncRNA similarity network

The construction of the lncRNA similarity network is based on the assumption that transcripts which have similar expression patterns have similar functions or share related biological pathways.
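The pairwise-correlation construction described here can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the toy expression matrix are hypothetical, and each pairwise PCC simply becomes an edge weight:

```python
import numpy as np

def pcc_similarity_network(expr: np.ndarray) -> np.ndarray:
    """Build a weighted similarity network from an (n_lncRNAs x n_tissues)
    expression matrix: the PCC of each pair of rows is the edge weight."""
    sim = np.corrcoef(expr)      # pairwise Pearson correlation between rows
    np.fill_diagonal(sim, 0.0)   # no self-loops in the network
    return sim

# toy example: 4 lncRNAs measured across 5 tissues
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with row 0
    [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with row 0
    [1.0, 0.0, 2.0, 0.5, 1.5],
])
sim = pcc_similarity_network(expr)
```

In practice the weights could further be thresholded or shifted to keep only positively correlated pairs; the paper uses the raw PCC scores as weights.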
We cal- culate the PCC between the expression profiles of each pair of lncRNAs. The PCC values are used as the weights of the similarity networks. 2.2 Obtain a low-dimensional vector representation In the lncRNA co-expression network, the nodes (lncRNAs) that are similar in the topological structure may have similar functions. We employ the DCA strategy to extract the low-dimensional topological Fig. 2. Architecture of the multiple neural networks for a three-level hierarchy. information of lncRNAs. First, the RWR algorithm is performed on (A) Train a neural network for the first level. (B) Train a neural network for the each node in the lncRNA similarity network. Since considering the second level, the input of which includes the features of the instances and the local and global topological information in the network, RWR can output of the neural network of level 1. (C) Use the output of the neural net- work of level 2 to augment the feature vectors for training the neural network select the relevant or similar nodes in the network. of level 3 Let P be a vector in which the i-th entry holds the probability of the i-th node being visited at step t. The probability vector at step learning model is split into simpler models. The multiple neural net- t þ 1 can be decided by works are trained sequentially, level by level. After the training proc- p ¼ðÞ 1 r M p þ rp : (1) tþ1 t 0 ess of one neural network for a certain level, both the predictions of the network and the feature information extracted from the instan- The parameter r is the restart probability which is a balancing param- ces belonging to the next level are employed to train the next neural eter determining the importance of local and global topological infor- network. The procedure keeps on until the last level. mation; M is the transition probability of the network and M is the The architecture of the multiple neural networks for a three-level transpose matrix of M. 
After a large number of steps, the probabilities hierarchy is illustrated in Figure 2.Here, X is the feature vectors of will reach to a steady distribution which is called as the ‘diffusion the instances corresponding to classes from level l; H and O repre- l l state’. The steady probability provides a measurement of the proxim- sent the hidden layer and output layer of the neural network at level l, ity to the seed nodes. When the two nodes harbor similar diffusion respectively. The architecture of the neural network at each level states, it suggests that the two nodes are in similar positions in the net- includes one hidden layer and one output layer to be trained. In our work to other nodes. This means they are similar in function. work, the feature vectors of the instances are the low-dimensional However, the diffusion states are high-dimensional when the network vector-space representations for all nodes in the lncRNA similarity is large and may have noise information. To solve this problem, we network. Since there are multiple GO terms at each level of the DAG use singular value decomposition to reduce the dimensionality of dif- hierarchy, each output neuron corresponds to one class, namely one fusion states (Cho et al.,2015; Wang et al.,2015). GO term. First, the neural network corresponding to the first level 1 is trained (Fig. 2A), then the second neural network at level 2 follows. 2.3 Build the training dataset Besides the feature vectors of the instances that are assigned to the At present, the fact that there are no public GO annotations of classes belonging to level 2 (X ), the output of the neural network of lncRNAs limits the application of machine learning (Fan et al., level 1 is also the input of the network of level 2 (Fig. 2B). In the same 2016). Hence, we use the neighbor counting method to annotate way, the third neural network at level 3 is trained after the training of some lncRNAs according to the known GO annotations of protein. 
the neural network at level 2 is finished (Fig. 2C). The procedure of This approach is based on the fact that the target lncRNA may have incremental training does not end until the neural network at the last very similar functions as that of the direct neighbor proteins in the level is trained. It should be pointed out that when a neural network lncRNA-protein association network. at level l is trained, the neural networks at the previous levels are not For each target lncRNA l in the lncRNA-protein association net- re-trained since these networks have already been trained in the pre- work, the frequency of appearance of each function f 2 F is calcu- vious steps. The output of the network at level l – 1 only acts as a por- lated based on the direct neighbors of l,where F is the set of functions tion of the input for the network at level l. In our networks, we use owned by all direct neighbors of l. The function is as follows: the quadratic cost function and sigmoid function as the cost function S ðÞ L ¼ InðÞ ; f ; (2) and activation function, respectively. In training, the famous back- n2N propagation is used to train the neural networks (Rumelhart et al., where InðÞ ; f ¼ 1 if the neighbor n has the function f, 0 otherwise. 1986). The pseudocodes for the training procedures are presented in N is the set of direct neighbors of lncRNA l in the lncRNA-protein l Algorithm 1. We use a top-down strategy to predict the GO terms association network. A proper minimum threshold frequency needs according to the test instances. The test examples act as the input for to be selected to adjust the prediction for lncRNA. the neural network at the first level, and then the output from the first network, combining the feature vector of test example, is fed to the 2.4 Train the multi-layer networks second neural network. 
The output values from the second network Since GO functions are organized as a DAG hierarchy (Deng and will augment the feature vector of test example belonging to the third level once again. The procedure is continued until the last network is Chen, 2015), the prediction of lncRNA functions can be considered as a hierarchical multi-label classification. We associate one neural reached. The output values for each level are achieved in sequence. network to each level of the class hierarchy. In this way, the complex When the procedure is finished, the output values of the output layers NeuraNetL2GO 1753 of the neural networks fall in the range [0, 1] since the sigmoid func- Algorithm 1 NeuralNETLGO algorithm tion is employed as activation function in the neurons. Different thresholds are applied to the output neurons of the networks to Require: X: feature matrix of training instances, y: label obtain the predicted GO terms for each level. If the output value of a matrix of annotations, dataValid: feature matrix of validating neuron in the output layer is equal to or larger than a given threshold, instances, Levels: level number of the GO hierarchy, epochs: the corresponding position of the class vector is set to 1. Otherwise, number of training epochs, g learning rate in Back-propaga- the position is assigned 0. tion, a: momentum factor in Back-propagation Ensure: W: weights of the multiple neural networks 3 Results //Initialize the weights of the multiple neural networks InitializeRandomWeights(W) 3.1 Datasets and pre-processing for i ¼ 1to epochs 3.1.1 lncRNA co-expression similarity for l ¼ 1to Levels We extract the lncRNA expression profiles from NONCODE2016 for x in X database (Xie et al., 2014), which provides the expression profiles of //r is activation function 90 062 lncRNAs in 24 human tissues or cell types. 
PCC between the expression profiles of each pair of lncRNAs is computed, and then the lncRNA similarity network is built according to the PCC scores between lncRNAs. The training procedure of the level-wise neural networks uses back-propagation with momentum (σ denotes the activation function, α the momentum factor, η the learning rate and ⊙ the elementwise product):

    a_1^1 = σ(W_1^1 x^1)
    a_2^1 = σ(W_2^1 h^1)
    if l = 1
        // Calculate error
        err = y − a_2^1
        // Calculate gradients
        δ_2 = err ⊙ σ′(W_2^1 h^1)
        δ_1 = ((W_2^1)^T δ_2) ⊙ σ′(W_1^1 x^1)
        // Update weights
        ΔW_2^1 = α (ΔW_2^1)^{i−1} + η (δ_2 a_1^1)
        ΔW_1^1 = α (ΔW_1^1)^{i−1} + η (δ_1 x^1)
        W_2^1 = W_2^1 + ΔW_2^1
        W_1^1 = W_1^1 + ΔW_1^1
    else
        // Feedforward from level 2 to l
        for j = 2 to l
            x^j = x^j ⊕ a^{j−1}    // concatenate vectors
            a_1^j = σ(W_1^j x^j)
            a_2^j = σ(W_2^j a_1^j)
        end for
        err = y − a_2^l
        δ_2 = err ⊙ σ′(W_2^l h^l)
        δ_1 = ((W_2^l)^T δ_2) ⊙ σ′(W_1^l x^l)
        // Update weights in level l
        ΔW_2^l = α (ΔW_2^l)^{i−1} + η (δ_2 a_1^l)
        ΔW_1^l = α (ΔW_1^l)^{i−1} + η (δ_1 x^l)
        W_2^l = W_2^l + ΔW_2^l
        W_1^l = W_1^l + ΔW_1^l
    end if
    end for
    end for
    Measure = validate(W, dataValid)
    if Measure > bestMeasure
        bestMeasure = Measure
    else
        earlyStop++
        if earlyStop == maxEpochs
            break
        end if
    end if
    end for
    Return W

3.1.2 lncRNA-protein associations
The lncRNA-protein associations are computed based on the co-expression data and the interactions between lncRNAs and proteins. We downloaded all human lncRNA genes and protein-coding genes from GENCODE Release 24 (Derrien et al., 2012) and extracted a total of 15 941 lncRNA genes and 20 284 protein-coding genes. Then, we calculated the co-expressions and interactions:

1. Co-expression data from COXPRESdb (Okamura et al., 2015). We extracted three preprocessed co-expression datasets (Hsa.c4-1, Hsa2.c2-0 and Hsa3.c1-0) with pre-calculated pairwise PCC values for human from COXPRESdb. The correlations are calculated as follows:

   C_C(l, p) = 1 − ∏_{d=1}^{D} (1 − C_d(l, p))  if C_d(l, p) > 0,   (3)

   where C_C(l, p) is the overall correlation between gene l (lncRNA) and protein-coding gene p, C_d(l, p) is the correlation score between l and p in dataset d, and D is the number of datasets in which l and p have a positive correlation score.

2. Co-expression data from ArrayExpress (Rocca-Serra et al., 2003) and GEO (Barrett et al., 2007). We obtained the co-expression data from the work of Jiang et al. (2015). PCC values (denoted as C_J) are used to evaluate the co-expression of lncRNA-protein pairs.

3. LncRNA-protein interaction data. We extracted human lncRNA-protein interactions from NPInter 3.0 (Hao et al., 2016). The score I(l, p) is 1 if there exists an interaction between lncRNA l and protein p; otherwise the score is 0.

Finally, we computed the overall association score for each lncRNA-protein pair by combining the three sources of co-expression and interactions:

   A(l, p) = 1 − (1 − C_C)(1 − C_J)(1 − I).   (4)

1754 J.Zhang et al.

3.1.3 Benchmarks
The neural networks are trained using a predicted GOA-lncRNA dataset, which includes more than 4000 lncRNAs, obtained by applying the neighbor counting method to the lncRNA–protein interactions. An lncRNA is annotated with a GO term if the number of its protein neighbors annotated with that GO term is larger than a threshold N. In this paper, N is assigned 20. In the end, a total of 4031 lncRNAs are annotated, 70% of which are randomly chosen as the training data (2821 lncRNAs) and the rest are used as validation data (1210 lncRNAs).
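As a concrete illustration, the association score of Eq. (4) and the neighbor-counting labelling just described can be sketched in a few lines of Python. The function names, toy identifiers and data layout are illustrative assumptions, not the authors' code:

```python
def association_score(c_c: float, c_j: float, i: int) -> float:
    """Eq. (4): combine the two co-expression scores and the
    interaction flag into one association score A(l, p)."""
    return 1.0 - (1.0 - c_c) * (1.0 - c_j) * (1.0 - i)

def annotate_by_neighbors(interactions, protein_go, n_threshold=20):
    """Label an lncRNA with a GO term when more than `n_threshold`
    of its interacting proteins carry that term (neighbor counting)."""
    annotations = {}
    for lnc, proteins in interactions.items():
        counts = {}
        for p in proteins:
            for term in protein_go.get(p, ()):
                counts[term] = counts.get(term, 0) + 1
        # Strictly greater than the threshold, as in the text.
        terms = {t for t, n in counts.items() if n > n_threshold}
        if terms:
            annotations[lnc] = terms
    return annotations
```

With a lowered threshold for the sake of the toy data, `annotate_by_neighbors({'lnc1': ['p1', 'p2', 'p3']}, {'p1': {'GO:0008219'}, 'p2': {'GO:0008219'}, 'p3': {'GO:0008219'}}, n_threshold=2)` labels `lnc1` with `GO:0008219`, since three of its protein neighbors carry the term.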
In our method, each class needs to be assigned to a definite level. However, in DAG structures, the level a class belongs to depends on the hierarchical path chosen from the root node to the class. In our method, the longest path (the deepest hierarchy) from the class to the root node is treated as the level of the class in the DAG structure. In this way, when a class is assigned to a level l, all its superclasses are assigned to levels shallower than l.

Since there is no available public database of lncRNA function annotations, we manually curated an independent test set of 55 lncRNAs with 129 GO terms (lncRNA2GO-55) based on references. The lncRNA2GO-55 dataset only includes lncRNAs that have been functionally characterized through knockdown or over-expression experiments.

3.2 Evaluation measures
The output of our method for each term in the GO is a score in the interval [0, 1]. Hence, a threshold value is applied to determine the final predictions. For a given example, if the output for a class is equal to or greater than the threshold, the example is considered to belong to the class; otherwise it is not. We use t to denote the threshold value, P_i(t) to denote the set of predicted terms for lncRNA i, and T_i to denote the set of its experimentally determined GO terms. TP, FP and FN represent the numbers of true positives, false positives and false negatives, respectively. For each lncRNA i and threshold t, they are given by

   TP_i = Σ_{f ∈ O} I(f ∈ P_i(t) ∧ f ∈ T_i)   (5)

   FP_i = Σ_{f ∈ O} I(f ∈ P_i(t) ∧ f ∉ T_i)   (6)

   FN_i = Σ_{f ∈ O} I(f ∈ T_i ∧ f ∉ P_i(t))   (7)

Here, f is a GO term and O is the set of GO terms in our experiment. I(x) is an indicator function defined as:

   I(x) = 1 if x = true, 0 if x = false.   (8)

For a given threshold t, the overall precision and recall over all examples are defined as:

   Prec = Σ_i TP_i / Σ_i (TP_i + FP_i)   (9)

   Rec = Σ_i TP_i / Σ_i (TP_i + FN_i)   (10)

A low threshold assigns many GO terms to each example and therefore brings about high recall and low precision. Conversely, a large threshold assigns few GO terms to each example and brings about high precision and low recall. To cope with this problem and provide a single score for the overall evaluation of different methods, the maximum F-measure over all thresholds (all points on the precision-recall curve) is calculated. The F_max is written as:

   F_max = max_t [ 2 · Prec(t) · Rec(t) / (Prec(t) + Rec(t)) ]   (11)

Also, coverage is used to evaluate our method and compare it with other methods. It is defined as the ratio of the number of lncRNAs annotated with GO annotations to the total number of lncRNAs.

3.3 Post processing
In NeuraNetL2GO, each neural network gives the predictions for the examples at one level. Namely, the prediction value at a level is not determined by the output of the neural networks at other levels. Hence, classification inconsistencies may occur in the predictions, i.e. a subclass is predicted but its superclass is not. Figure 3 shows an example of the classification inconsistencies. Figure 3A illustrates a small part of the GO hierarchy taxonomy. The digits in the circles are the indices of the classes in our experiment, and the GO terms next to the circles correspond to the indices, respectively. The vector of prediction values is shown in Figure 3B, and the vector of predicted classes is obtained after a threshold value of 0.5 is applied (Fig. 3C). The class corresponding to index 196 is assigned 1 (shown in red), but its superclass corresponding to index 73 is assigned 0. This case is considered an inconsistency. Therefore, the value of that position is corrected to 0 (Fig. 3D).

Fig. 3. Example of the classification inconsistencies. (A) an example of the GO hierarchy. (B) the vector of predicted values. (C) the binary vector of predicted classes when a threshold value of 0.5 is used. (D) the binary vector of predicted classes after post processing

Another post-processing step needs to be highlighted. In a DAG, there may be multiple paths from an ancestor to one descendant node; for instance, there are three paths from the node with index 1 to the node with index 361 in Figure 3A: 1->19->73->195->361, 1->19->74->195->361 and 1->19->73->196->361. If one path has been correctly predicted, all the superclasses along it will be set to 1. For example, if 1->19->73->195->361 is correctly predicted, the superclasses of the node with index 361, namely all the nodes in Figure 3A, will be assigned 1.

3.4 Parameter selection
There are many hyper-parameters to be optimized, and parameter optimization is a complicated problem to solve. The hyper-parameters utilized in our method are selected without exhaustive experiments. The hyper-parameters to be optimized are as follows:

i. Number of hidden neurons in each neural network. There are 13 neural networks in all, each corresponding to one level of the GO hierarchy.
ii. Momentum factor and learning rate used in the back-propagation with momentum algorithm.
iii. Initializations for the weights and biases in the neural networks.
iv. Number of lncRNA features (N_feature), namely, the number of feature dimensions of the low-dimensional vector representation of each node in the lncRNA similarity network.
v. The threshold N_neighbor used to determine whether an lncRNA has the function of a GO term (the number of neighbors in the lncRNA–protein interactions corresponding to a GO term).

Table 1. The F_max values when using different combinations of the two hyper-parameters

  N_neighbor    N_feature = 50    N_feature = 100    N_feature = 200
  10            0.1455            0.1458             0.1841
  20            0.3361            0.2799             0.2309

The first three categories of hyper-parameters are used in the neural networks. We utilize the same values as Cerri et al.'s research (Cerri et al., 2015). As the depth of the hierarchy in the DAG increases, the number of training samples gradually decreases. Therefore, to reduce overfitting, we gradually decrease the number of hidden neurons as the level gets deeper. The number of hidden neurons is determined by setting the ratio of hidden neurons to the number of neurons in the input layer. These ratios are as follows: 0.65/0.65/0.6/0.55/0.5/0.45/0.4/0.35/0.3/0.25/0.2/0.15/0.1. The learning rates and momentum factors utilized in back-propagation for the hidden layers and output layers are 0.05, 0.03 and 0.03, 0.01, respectively. The initial values of the weights and biases in the neural networks in our experiment are selected randomly from the range [−0.1, 0.1].
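The ratio-based layer sizing and random initialization just described can be sketched as follows, assuming NumPy; `build_level_layers`, the truncating size rule and the fixed output size are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Per-level ratios of hidden neurons to input neurons (Section 3.4).
RATIOS = [0.65, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1]

def build_level_layers(n_input, n_output, rng=None):
    """Return (W1, W2) weight matrices for each of the 13 level-wise
    networks: hidden sizes follow RATIOS, and weights are drawn
    uniformly from [-0.1, 0.1] as stated in the text."""
    if rng is None:
        rng = np.random.default_rng(0)
    layers = []
    for ratio in RATIOS:
        n_hidden = max(1, int(ratio * n_input))  # truncate, at least 1
        w1 = rng.uniform(-0.1, 0.1, size=(n_hidden, n_input))
        w2 = rng.uniform(-0.1, 0.1, size=(n_output, n_hidden))
        layers.append((w1, w2))
    return layers
```

Because the ratios shrink with depth, the hidden layers get monotonically smaller for deeper (more specific) GO levels, which is the overfitting control the text motivates.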
Besides the hyper-parameters related to the neural networks, the number of lncRNA features (N_feature) and the threshold N_neighbor also have a significant influence on the prediction performance. In order to evaluate the impact of these two hyper-parameters on the functional annotation of lncRNAs, we vary their values and perform the independent test using the lncRNA2GO-55 dataset. Table 1 shows the comparison of the F_max when the two hyper-parameters are assigned different values. It can be observed that the F_max value reaches its maximum when the number of lncRNA features (N_feature) and the threshold N_neighbor are set to 50 and 20, respectively. Hence, the two parameters, N_feature and N_neighbor, are set to 50 and 20, respectively, in this work.

3.5 Performance
As described earlier, the computational methods that investigate the functions of lncRNAs are mainly based on 'guilt-by-association' from co-expression patterns shared with their protein-coding counterparts. Among these methods, Liao et al.'s module-based method is based on a local strategy, and only 340 lncRNAs have been functionally characterized by it (Qi et al., 2011). Lnc-GFP (Guo et al., 2013) is an important method that can annotate probable functions for lncRNAs on a large scale. In Lnc-GFP, a coding-non-coding bi-colored biological network is constructed according to gene expression data and protein–protein interaction data. Then a global propagation algorithm on the bi-colored network is used to predict putative functions for lncRNAs based on the known functions of proteins. LncRNA2Function (Jiang et al., 2015) is a method based on statistics. In LncRNA2Function, the hypergeometric test is employed to infer the functions of lncRNAs of interest according to the expression correlation between lncRNAs and protein-coding genes across 19 human normal tissues. Our NeuraNetL2GO approach is based on machine learning. We constructed multiple neural networks to predict probable functions for all the lncRNAs characterized in the lncRNA co-expression network.

In order to examine our method level by level, we calculated the maximum F-measure when predicting function classes of lncRNAs at different hierarchical levels. As shown in Figure 4, the performance at level 1 is the best, with a maximum F-measure of 0.745. As the depth of the hierarchy increases, the performance gradually deteriorates. In the GO hierarchy, the parent terms are more generalized and the child terms are more specific.

Fig. 4. Performance comparison of different levels in the GO hierarchy

In this paper, we compare the performance of our method with the two state-of-the-art methods (lnc-GFP and LncRNA2Function) on the lncRNA2GO-55 dataset by an independent test. The GO classifies functions into three aspects: molecular function, cellular component and biological process. In our experiment, we compare the biological process aspect with the other two methods, since many lncRNAs participate in biological processes through lncRNA–protein interactions and most annotations in lncRNA2GO-55 are biological process terms. The performance comparison of the three methods is shown in Figure 5. Our NeuraNetL2GO method shows a much better performance, with a maximum F-measure of 0.336; lnc-GFP and LncRNA2Function follow with maximum F-measures of 0.225 and 0.161. In precision and recall, our method also gains competitive scores of 0.250 and 0.513, respectively.

Fig. 5. Performance comparison with the methods of lnc-GFP and LncRNA2Function
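The pooled precision, recall and maximum F-measure used in this comparison (Eqs 5-11) can be sketched as follows; the function name, toy data and threshold grid are illustrative assumptions, not the authors' code:

```python
def fmax(pred_scores, truth, thresholds=None):
    """Maximum F-measure over thresholds (Eqs 5-11): TP/FP/FN are
    pooled over all examples at each threshold t, then precision,
    recall and F are computed from the pooled counts."""
    thresholds = thresholds or [i / 100 for i in range(1, 100)]
    best = 0.0
    for t in thresholds:
        tp = fp = fn = 0
        for name, scores in pred_scores.items():
            predicted = {f for f, s in scores.items() if s >= t}
            true_terms = truth.get(name, set())
            tp += len(predicted & true_terms)   # Eq. (5)
            fp += len(predicted - true_terms)   # Eq. (6)
            fn += len(true_terms - predicted)   # Eq. (7)
        if tp == 0:
            continue  # precision/recall undefined or zero at this t
        prec = tp / (tp + fp)                   # Eq. (9)
        rec = tp / (tp + fn)                    # Eq. (10)
        best = max(best, 2 * prec * rec / (prec + rec))  # Eq. (11)
    return best
```

The `s >= t` test mirrors the text's rule that an example belongs to a class when its output is equal to or greater than the threshold.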
Also, we calculate the number of lncRNAs that are annotated with at least one biological process GO term (excluding the root GO:0008150) by each of the three methods. As shown in Figure 6, 50 lncRNAs are annotated correctly by our method. The coverage of NeuraNetL2GO is much higher than that of lnc-GFP and LncRNA2Function.

Fig. 6. The numbers of lncRNAs that are annotated correctly by the three methods, respectively

3.6 Case studies
In this section, two lncRNAs are used as instances to further demonstrate the predictive performance and show the application of our method. For each lncRNA, the predicted GO terms, GO names and GO paths are listed in the Supplementary tables.

Case study 1: HOTAIRM1. HOTAIRM1 is an lncRNA located between the HOXA1 and HOXA2 genes in humans, and it is transcribed antisense by RNA polymerase II. Researchers have demonstrated that HOTAIRM1 may play a regulatory role in myeloid transcriptional regulation and quantitatively impairs expression of the genes HOXA1 and HOXA4 (Zhang et al., 2009). Marina et al. (2015) found that HOTAIRM1 plays a role in normal hematopoiesis and leukemogenesis, including miR-196b. Current studies have revealed that members of the integrin families take part in phagocytosis, leukocyte trafficking and signal transduction and are regulated by HOX genes (Dupuy and Caron, 2008). Also, HOTAIRM1 regulates genes encoding cell adhesion receptors. Zhang et al. (2014) revealed that E2Fs, HOTAIRM1 and perhaps protein-coding HOX genes might serve as a network to regulate cell cycle progression during differentiation, and their results suggest that an HOTAIRM1-regulated integrin switch mechanism involving CD11c and CD49d may regulate cell growth in NB4 acute promyelocytic leukemia cells and hence modulate NB4 cell maturation. We use NeuraNetL2GO to predict the functions of HOTAIRM1. The GO terms assigned to the lncRNA HOTAIRM1 are shown in Supplementary Table S1. Most of them are related to biological regulation, signal transduction and cellular process. These functions have been demonstrated by the previous studies. The results show that NeuraNetL2GO can successfully infer the functions of the lncRNA HOTAIRM1.

Case study 2: GAS5. GAS5 (growth arrest-specific transcript 5) was originally identified in NIH3T3 cells using subtraction hybridization (Schneider et al., 1988). There exist many different patterns of alternative splicing in GAS5 transcripts. The open reading frame in GAS5 exons is small and poorly conserved over even relatively short periods of evolution (Schneider et al., 1988; Raho et al., 2000). Some studies have shown that GAS5 is related to apoptosis and could play a role in the progression of numerous human cancers. For example, GAS5 has been shown to be a key regulator of prostate cell survival, and its cellular levels are quantitatively related to cell death (Pickard et al., 2013). Mazar et al. (2017) found multiple novel splice variants by further analysis of sequenced GAS5 clones, two of which were called Full-Length (FL) and Clone 2 (C2). The FL variant further promoted cell proliferation by rescuing cell cycle arrest, while the C2 variant had only a minimal effect on apoptosis. They also demonstrated that GAS5 expression has a significant impact on neuroblastoma cell biology. To further assess the performance, we ran NeuraNetL2GO on the lncRNA GAS5 with the trained parameters. The predicted GO terms are listed in Supplementary Table S2. As expected, some of them concern the apoptotic process, some the regulation of cellular process, some the cell cycle and so on. These predicted functions of GAS5 are consistent with the experimental results described earlier.

4 Discussion and conclusion
A huge number of lncRNAs have been recognized in the past few years. However, most lncRNAs remain poorly functionally characterized. In this study, we propose a hierarchical multi-label classification strategy to annotate the functions of lncRNAs. First, we constructed an lncRNA similarity network according to the lncRNA expression profiles and extracted a low-dimensional vector representation of each node by running RWR on the network. Then multiple neural networks are trained with the low-dimensional vector representations as input features and GO terms as outputs. After training these neural networks, the lncRNA2GO-55 dataset is employed to evaluate the performance independently. Regarding the experimental results, our NeuraNetL2GO method achieves the best prediction results when compared to the other two state-of-the-art methods, lnc-GFP and LncRNA2Function. Moreover, 50 of the 55 manually annotated lncRNAs are correctly annotated with at least one GO term, which substantially outperforms the other two methods.

We would like to point out that our NeuraNetL2GO method may have some limitations. First, because of the lack of experimentally determined lncRNA function annotations, we have to employ the neighbor counting method to annotate some lncRNAs in order to train the neural networks. This may introduce a bias with respect to the correct annotations. Second, the low-dimensional vector representation of each node is extracted from the structure of the lncRNA similarity network; this representation is inexact, since the expression profiles of many lncRNAs are missing. Third, it is challenging to set so many hyper-parameters to proper values. In the future, we will integrate more biological data and efficient machine learning algorithms to better predict lncRNA functions.

Funding
This work was supported by National Natural Science Foundation of China under grant nos. 61672541 and 61379109, Shanghai Key Laboratory of Intelligent Information Processing under grant no. IIPL-2014-002, Scientific Research Fund of Hunan Province Education Department under grant no. 16B244, Natural Science Foundation of Hunan Province under grant no. 2017JJ3287, and Natural Science Foundation of Zhejiang under grant no. LY13F020038.

Conflict of Interest: none declared.

References
Barrett,T. et al. (2007) NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucl. Acids Res., 35, D760–D765.
Birney,E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447, 799–816.
Cerri,R. et al. (2014) Hierarchical multi-label classification using local neural networks. J. Comput. Syst. Sci., 80, 39–56.
Cerri,R. et al. (2015) Hierarchical classification of gene ontology-based protein functions with neural networks. In: International Joint Conference on Neural Networks, pp. 1–8.
Cho,H. et al. (2015) Diffusion component analysis: unraveling functional topology in biological networks. Comput. Sci., 9029, 62–64.
Deng,L. and Chen,Z. (2015) An integrated framework for functional annotation of protein structural domains. IEEE/ACM Trans. Comput. Biol. Bioinform., 12, 902–913.
Derrien,T. et al. (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res., 22, 1775–1789.
Dupuy,A.G. and Caron,E. (2008) Integrin-dependent phagocytosis: spreading from microadhesion to new concepts. J. Cell Sci., 121, 1773–1783.
Ebert,M.S. and Sharp,P.A. (2010) Emerging roles for natural microRNA sponges. Curr. Biol., 20, 858–861.
Fan,C. et al. (2016) PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility. BMC Bioinformatics, 17, 8.
Ferré,F. et al. (2016) Revealing protein-lncRNA interaction. Brief. Bioinform., 17, 106–116.
Garzón,J.I. et al. (2016) A computational interactome and functional annotation for the human proteome. Elife, 5, e18715.
Guo,X. et al. (2013) Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucl. Acids Res., 41, e35.
Guttman,M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 458, 223.
Hao,Y. et al. (2016) NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database J. Biol. Databases Curat., 2016, baw057.
Jeffrey,Q. and Chang,H.Y. (2012) Chromatin isolation by RNA purification (ChIRP). J. Vis. Exp., 61, 3912.
Jiang,Q. et al. (2015) LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data. BMC Genomics, 16(Suppl 3), S2.
Lee,H.K. et al. (2004) Coexpression analysis of human genes across many microarray data sets. Genome Res., 14, 1085.
Li,J. et al. (2016) LncRNA TUG1 acts as a tumor suppressor in human glioma by promoting cell apoptosis. Exp. Biol. Med., 241, 644–649.
Liu,G. et al. (2017) Integrating genome-wide association studies and gene expression data highlights dysregulated multiple sclerosis risk pathways. Mult. Scler., 23, 205.
Marina,D.B. et al. (2015) The lincRNA HOTAIRM1, located in the HOXA genomic region, is expressed in acute myeloid leukemia, impacts prognosis in patients in the intermediate-risk cytogenetic category, and is associated with a distinctive microRNA signature. Oncotarget, 6, 31613–31627.
Mazar,J. et al. (2017) The long non-coding RNA GAS5 differentially regulates cell cycle arrest and apoptosis through activation of BRCA1 and p53 in human neuroblastoma. Oncotarget, 8, 6589–6607.
Mercer,T.R. et al. (2009) Long non-coding RNAs: insights into functions. Nat. Rev. Genet., 10, 155.
Mercer,T.R. and Mattick,J.S. (2013) Structure and function of long noncoding RNAs in epigenetic regulation. Nat. Struct. Mol. Biol., 20, 300.
Morris,K.V. and Mattick,J.S. (2014) The rise of regulatory RNA. Nat. Rev. Genet., 15, 423–437.
Mortazavi,A. et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods, 5, 621.
Necsulea,A. et al. (2014) The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature, 505, 635–640.
Okamura,Y. et al. (2015) COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucl. Acids Res., 43, 82–86.
Paraskevopoulou,M.D. and Hatzigeorgiou,A.G. (2016) Analyzing miRNA-lncRNA interactions. Methods Mol. Biol., 1402, 271.
Pickard,M.R. et al. (2013) Long non-coding RNA GAS5 regulates apoptosis in prostate cancer cell lines. Biochim. Biophys. Acta, 1832, 1613–1623.
Qi,L. et al. (2011) Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network. Nucl. Acids Res., 39.
Raho,G. et al. (2000) The gas5 gene shows four alternative splicing patterns without coding for a protein. Gene, 256, 13–17.
Ricardo,C. et al. (2016) Reduction strategies for hierarchical multi-label classification in protein function prediction. BMC Bioinform., 17, 373.
Rocca-Serra,P. et al. (2003) ArrayExpress: a public database of gene expression data at EBI. C. R. Biol., 326, 1075.
Rumelhart,D.E. et al. (1986) Learning representations by back-propagating errors. Nature, 323, 533–536.
Schneider,C. et al. (1988) Genes specifically expressed at growth arrest of mammalian cells. Cell, 54, 787–793.
Simon,M.D. (2013) Capture Hybridization Analysis of RNA Targets (CHART). John Wiley & Sons, Inc., Hoboken, New Jersey.
Tang,W. et al. (2016) Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget, 7, 85613–85623.
Tong,H. et al. (2006) Fast random walk with restart and its applications. In: International Conference on Data Mining, pp. 613–622.
Turner,M. et al. (2014) Noncoding RNA and its associated proteins as regulatory elements of the immune system. Nat. Immunol., 15, 484–491.
Wang,S. et al. (2015) Exploiting ontology graph for predicting sparsely annotated gene function. Bioinformatics, 31, 357–364.
Wapinski,O. and Chang,H.Y. (2011) Long noncoding RNAs and human disease. Trends Cell Biol., 21, 354–361.
Wong,L. and Chua,H.N. (2012) Predicting Protein Functions from Protein Interaction Networks. IGI Global.
Xie,C. et al. (2014) NONCODEv4: exploring the world of long non-coding RNA genes. Nucl. Acids Res., 42, D98.
Yu,G. et al. (2017) NewGOA: predicting new GO annotations of proteins by bi-random walks on a hybrid graph. IEEE/ACM Trans. Comput. Biol. Bioinform., doi:10.1109/TCBB.2017.2715842.
Zhang,X. et al. (2009) A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster. Blood, 113, 2526–2534.
Zhang,X. et al. (2014) Long intergenic non-coding RNA HOTAIRM1 regulates cell cycle progression during myeloid maturation in NB4 human promyelocytic leukemia cells. RNA Biology, 11, 777–787.
Zhang,J. et al. (2017) Integrating multiple heterogeneous networks for novel lncRNA-disease association inference. IEEE/ACM Trans. Comput. Biol. Bioinform., doi:10.1109/TCBB.2017.2701379.
Zou,Q. et al. (2015a) Prediction of microRNA-disease associations based on social network analysis methods. Biomed. Res. Int., 2015, 810514.
Zou,Q. et al. (2015b) Similarity computation strategies in the microRNA-disease network: a survey. Brief. Funct. Genomics, 15, 55–64.
Bioinformatics – Oxford University Press
Published: Dec 23, 2017