Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin

Abstract

Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.

Keywords: graph comparison, local network alignment, global network alignment, biological networks

Introduction

Molecular biology has deeply investigated the role of biologically relevant molecules within cells. After the first phase in which most of the efforts focused on the discovery of such molecules, more recently, the interests of researchers have concentrated on the identification of a complex set of relations among molecules, under the hypothesis that genes, proteins, and other molecules rarely work alone but instead form a complex network of interactions [1]. This new systems-level point of view of molecular biology has affected both technologies used to analyse cells as well as methods and models used to manage resulting data.
Namely, this new perspective has encouraged the development of high-throughput techniques for determining relations between biomolecules [2]. Consequently, this has led to the accumulation of large amounts of biomolecular interaction data, such as protein–protein interaction networks (PINs) [3, 4], collected in publicly available databases (see [5, 6] for an extensive review). The systems-level data collection, in turn, has raised the need to develop novel computational tools and algorithms that are able to accurately and efficiently model, query and analyse the network data [7].

Of all biological network types, our key focus is on PINs. Yet, our discussion is applicable to other biological network types, such as gene co-expression, metabolic or gene regulatory networks. For PINs, the most widely used formalism to manage and analyse the data has been adopted from graph theory [8]. Consequently, a PIN, or interactome, is modelled as a graph G = (V, E), where V is the set of nodes representing proteins, and E is the set of edges representing (typically physical) protein–protein interactions (PPIs). Such a synergetic view of cellular functioning gives researchers the opportunity to analyse the complex joint effect of the interplay among molecules, which is more realistic than analysing only the effects of individual molecules in isolation [1].

There exist a variety of research problems related to PIN analysis. For instance, Panni and Rombo [9] consider three important aspects: ‘network alignment’, ‘network querying’ and ‘network motif extraction’. Additional aspects are ‘study of network properties’ [1], ‘network clustering’ [10, 11] and ‘dynamic network analysis’ [12]. Here, we focus on the problem of network alignment, i.e. the comparison of different PINs typically corresponding to different species. Such analysis is the counterpart of the alignment of (linear) sequences of genes or proteins.
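The graph model G = (V, E) described above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the function names (`build_pin`, `edge_set`) and the toy protein identifiers are hypothetical and do not come from any of the surveyed tools or databases.

```python
from collections import defaultdict


def build_pin(interactions):
    """Build an undirected graph G = (V, E) as a dictionary of adjacency sets.

    `interactions` is an iterable of (protein, protein) pairs (hypothetical input format).
    """
    adjacency = defaultdict(set)
    for u, v in interactions:
        if u == v:
            continue  # self-interactions are ignored in this simple sketch
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def edge_set(adjacency):
    """Return E as a set of unordered node pairs, so each PPI is counted once."""
    return {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}


# Toy interaction list; the protein names are placeholders, not real data.
toy_ppis = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4")]
pin = build_pin(toy_ppis)
nodes = set(pin)        # V: proteins
edges = edge_set(pin)   # E: interactions
print(len(nodes), "proteins,", len(edges), "interactions")
```

Adjacency sets are used here because the alignment sketches later in the text mostly need fast neighbour look-ups; an edge list or matrix would serve equally well for small networks.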
In the basic formulation, network alignment aims to find a good node mapping between PINs of two or more species that identifies common interaction patterns between the compared networks, which are then hypothesized to correspond to functionally conserved network regions between the species. Therefore, just as sequence alignment has been used to define sequence-based homology, network alignment can be used to define network-based homology or similarity. However, unlike sequence alignment, network alignment is computationally intractable (i.e. NP-hard), owing to the NP-completeness of the underlying subgraph isomorphism problem. Therefore, heuristic approaches for solving the network alignment problem need to be sought.

Analogous to sequence alignment, there exist two different instances of the network alignment problem: local and global [13, 14]. From a computational perspective, local network alignment (LNA) [15] searches for highly similar network regions that likely represent conserved functional structures, which often results in relatively small mapped subnetworks and in some network regions not being a part of the alignment (Figure 1). Instead, global network alignment (GNA) [13] looks for the best superimposition of the whole input networks (i.e. an alignment that optimizes a cost function defined over the entire networks), which typically results in large but suboptimally conserved mapped subnetworks (Figure 1). Both kinds of network alignment may compare two or more networks, corresponding to pairwise and multiple alignment, respectively. From a biological perspective, LNA looks for evolutionarily conserved building blocks of the cellular machinery, disregarding the overall similarity between the networks. Instead, GNA searches for a single comprehensive mapping of the whole sets of protein interactions from different species.

Figure 1. LNA (left) versus GNA (right). LNA finds small local regions of high similarity and often admits a many-to-many mapping between nodes of the compared networks. GNA finds an injective (one-to-one) global mapping between the nodes at the expense of suboptimally matching local network regions. Nodes from the compared networks that are aligned to each other are indicated with broken lines.

Formally, given two input networks G1 and G2 (let us suppose that G1 has at most as many nodes as G2), the problem of finding an alignment between the two networks corresponds to the search for a mapping between nodes of G1 and nodes of G2 that optimizes a given cost function (i.e. maximizes the quality of the alignment). The size of this search space is large, as it consists of all possible mappings between nodes of the compared networks. The computational intractability of the above problem, which arises from the NP-completeness of the underlying subgraph isomorphism problem [16], requires the development of heuristics (approximate approaches) to solve the problem. Thus, all existing LNA and GNA methods are heuristics. GNA produces an injective (one-to-one) matching, i.e. each node of G1 is mapped to a unique corresponding node in G2, while LNA gives a mapping of a subset of nodes in G1 to a subset of nodes in G2 (even admitting many-to-many node correspondences in some cases). There is a clear connection between LNA and GNA.
For example, both aim to find topological and functional similarities between the compared networks to allow for the transfer of biological knowledge from well-studied species to poorly studied species between the conserved (aligned) network regions. Yet, researchers in these two subfields have produced independent algorithms. Consequently, there are many LNA algorithms and many GNA algorithms that rely on different assumptions and use different approaches that maximize different cost functions [17]. For instance, many algorithms optimize cost functions based mainly on topology, while many others are tailored to enhance the functional relevance of alignments. Therefore, a direct comparison of an LNA method and a GNA method is non-trivial. Consequently, when a new method is proposed, it is compared only against existing network alignment methods from the same category (i.e. LNA or GNA), even though LNA and GNA have the same goal of across-species transfer of biological knowledge between aligned network regions [18]. Therefore, the ultimate question is which one to use: LNA, GNA or a hybrid approach that would reconcile the two? The possible reconciliation between these two aspects of network alignment is an open research problem that should be investigated in depth in the future.

In this survey, after presenting prominent algorithms for each of LNA and GNA (which are summarized in Table 1), we discuss recent first steps towards reconciling the two. In particular, we point to a comprehensive evaluation study, the first ever comparison of LNA and GNA, which showed that LNA and GNA are complementary: the former results in high functional but low topological alignment quality, while the latter results in high topological but low functional alignment quality, and the two lead to different biological predictions.
As both PIN topological information and functional information are valuable sources of biological knowledge, this important finding about complementarity of LNA and GNA further highlights the need for reconciling the two approach types. Towards the end of our discussion below, we comment on a recent first step in this direction.

Table 1. Summary of the existing LNA and GNA methods

| Name | LNA or GNA? | PNA or MNA? | Algorithmic principle(s) | Software availability (website) |
|---|---|---|---|---|
| MOPHY | LNA | PNA | Mine-and-merge AS | N/A |
| Jancura et al. | LNA | PNA | Mine-and-merge AS | N/A |
| MODULA | LNA | PNA | Mine-and-merge AS | N/A |
| MaWISh | LNA | PNA | Merge-and-mine AS | http://compbio.case.edu/koyuturk/software/mawish/ |
| AlignNemo | LNA | PNA | Merge-and-mine AS | http://sourceforge.net/p/alignnemo |
| AlignMCL | LNA | PNA | Merge-and-mine AS | https://sites.google.com/site/alignmcl |
| NetworkBlast | LNA | PNA | Merge-and-mine AS | http://cs.tau.ac.il/~bnet/networkblast.htm |
| NetworkBlast-M | LNA | MNA | Merge-and-mine AS | http://cs.tau.ac.il/~bnet/networkblast.htm |
| NetAligner | LNA | PNA | Merge-and-mine AS | http://netaligner.irbbarcelona.org/ |
| GASOLINE | LNA | MNA | Greedy stochastic optimal alignment | http://ferrolab.dmi.unict.it/gasoline/ |
| IsoRank | GNA | PNA | PageRank-like NCF, greedy AS | https://groups.csail.mit.edu/cb/mna/ |
| MI-Iso | GNA | PNA | MI-GRAAL’s NCF, IsoRankN’s AS | N/A |
| MI-GRAAL | GNA | PNA | Graphlet-based NCF, seed and extend AS | http://www0.cs.ucl.ac.uk/staff/natasa/MI-GRAAL |
| L-GRAAL | GNA | PNA | Lagrangian optimization of node and edge conservation | http://www0.cs.ucl.ac.uk/staff/natasa/L-GRAAL |
| GHOST | GNA | PNA | Spectral signature-based NCF, seed and extend AS | http://www.cs.cmu.edu/~ckingsf/software/ghost/ |
| NETAL | GNA | PNA | Iterative updating of NCF, greedy programming | http://www.bioinf.cs.ipm.ir/software/netal |
| SUMONA | GNA | PNA | Integration of OptNetAlign with other algorithms | N/A |
| NABEECO | GNA | PNA | Artificial bee colony-based alignment | http://nabeeco.mpi-inf.mpg.de/ |
| PISWAP | GNA | PNA | Iterative local optimization of GNA | http://groups.csail.mit.edu/cb/piswap/webserver/ |
| GEDEVO | GNA | PNA | Evolutionary alignment optimization | http://gedevo.mpi-inf.mpg.de/ |
| OptNetAlign | GNA | PNA | Evolutionary optimization of alignment | https://github.com/crclark/optnetaligncpp/ |
| SANA | GNA | PNA | Simulated annealing-based AS | http://sana.ics.uci.edu |
| Natalie 2.0 | GNA | PNA | Quadratic assignment | http://mi.fu-berlin.de/w/LiSA/Natalie |
| MAGNA ++ | GNA | PNA | Evolutionary optimization of node and edge conservation | http://nd.edu/~cone/MAGNA++/ |
| WAVE | GNA | PNA | Seed and extend optimization of node and edge conservation | http://nd.edu/~cone/WAVE/WAVE.zip |
| SMETANA | GNA | PNA | Semi-Markov random walk probabilistic alignment | http://ece.tamu.edu/~bjyoon/SMETANA/ |
| ModuleAlign | GNA | PNA | Alignment via module matching | http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html |
| IsoRankN | GNA | MNA | IsoRank’s MNA counterpart | https://groups.csail.mit.edu/cb/mna/ |
| SMAL | GNA | MNA | Scaffold-based MNA | http://haddock6.sfsu.edu/smal/ |
| GEDEVO-M | GNA | MNA | GEDEVO’s MNA counterpart | http://gedevo.mpi-inf.mpg.de/multiple-network-alignment/ |
| multiMAGNA ++ | GNA | MNA | MAGNA ++’s MNA counterpart | http://nd.edu/~cone/multiMAGNA++/ |
| BEAMS | GNA | MNA | Backbone extraction and merge strategy | http://webprs.khas.edu.tr/~cesim/BEAMS.tar.gz |
| FUSE | GNA | MNA | Non-negative matrix tri-factorization | http://www0.cs.ucl.ac.uk/staff/natasa/FUSE/ |
| IGLOO | Both | PNA | Integrating LNA and GNA | Available on request |

Note. Each of the two method categories is further divided into pairwise (PNA) and multiple (MNA) methods. For each method, its general algorithmic principle is stated, along with the potential availability of its software implementation.
In the table, AS is alignment strategy, and NCF is node cost function (see the text for details).

Local network alignment

In recent years, several algorithms have been proposed to solve the LNA problem. They may be categorized by their alignment approach or by how they score the alignment. Here, we group them into two broad classes according to their main steps: mine-and-merge and merge-and-mine.

Algorithms belonging to the mine-and-merge class first analyse each network separately (e.g. to identify small dense subgraphs or communities), and then they build the alignment solution by comparing the pre-identified modules across the networks [19–21]. Examples of mine-and-merge methods are MOPHY [19], the algorithm by Jancura et al. [20] and MODULA [22]. Briefly, these methods work as follows. The MOPHY algorithm, proposed by Erten et al. [19], is based on the identification of modular network components from each network separately by applying graph clustering techniques. Then, each module is projected onto the other networks using sequence similarity. The conservation of each module is used to assess the similarity between the input graphs. The method proposed by Jancura et al. [20] is tailored to the study of the conservation of protein complexes (i.e. sets of mutually interacting proteins) among different species. The algorithm first identifies protein complexes in each input network; then, orthology relationships are used to link corresponding protein complexes. The MODULA algorithm uses ClusterOne [23] to separate modules in each input network. Then, the modules are matched using semantic similarity [24]. High-scoring pairs of modules are included in the alignment, and the algorithm iterates for as long as the alignment grows.

The algorithms from the mine-and-merge class have to be able to efficiently merge subnetworks of the input networks, which may be computationally expensive, as it is related to the (sub)graph isomorphism problem.
In particular, they have to consider even approximate matching among subgraphs and possible many-to-many mappings among subgraphs of two networks. Conversely, merge-and-mine algorithms alleviate these difficulties by avoiding the pairwise comparison of the topologies of the generated modules, but they are more sensitive to noise in the input data, as they rely on an initial node mapping list derived from biological considerations (homology relationships, i.e. sequence similarities, or, more recently, semantic similarities). The initial node mapping usually contains many-to-many relations, and algorithms may prune this initial list using some heuristics. Without any additional information on the input node mapping (i.e. the possible orthologs), the LNA problem is complex, as all possible combinations of protein pairs would have to be considered.

Merge-and-mine algorithms analyse PINs together, usually after the alignment or merging of the PINs into a single graph [25, 26]. Several merge-and-mine algorithms follow a similar processing scheme: merge all input data (PINs, putative homologies) into a single weighted undirected graph, generally referred to as an alignment graph, and then apply a mining heuristic on top of it to identify conserved modules. Alignment graphs are built using many different models. From a conceptual point of view, the simplest formulation of the alignment graph is a graph whose nodes correspond to pairs of matched nodes (i.e. orthologous nodes) and whose edges link pairs of matched nodes (such an edge corresponds to, e.g., a potentially conserved interaction). Such a formulation is, unfortunately, sensitive to noise in the input data, i.e. missing or wrong interactions or homology relationships, as evidenced by Mina and Guzzi [26]. Consequently, the algorithms discussed below each proposed a different formulation of the alignment graph that is more robust to noise. For instance, the NetworkBlast algorithm [27] introduced a relaxation of the constraints related to edges.
In its formulation, there exists an edge between two nodes of the alignment graph whenever the corresponding proteins are at distance less than or equal to a given threshold k in one input network and are adjacent in the other. One of the drawbacks of this approach is that it might introduce many unreliable edges in the alignment graph because of the presence of false positives in the input networks. The MaWISh algorithm [28] takes this problem into account by setting the threshold k to 2 and weighting the edges of the alignment graph by the similarity scores of the putative orthologs. This approach avoids the generation of dense graphs, but it fails to recover large subgraphs, and it is more sensitive to missing interactions, as noted by Ciriello et al. [25]. More recently, NetAligner [29] proposed a two-step strategy for building the alignment graph. First, a relatively small alignment graph is built, considering only the most reliable interactions, and it is used to divide both input networks into candidate seeds. Second, the alignment graph is extended by predicting potentially conserved interactions.

NetworkBlast, MaWISh and NetAligner use the concept of shortest path distance between nodes of the input networks to propose a new connection among nodes of the alignment graph. AlignNemo extends the idea of distance by considering the number of paths connecting input nodes. For each pair of orthologs, the number of short paths (of length at most 2) connecting them is used to evaluate how likely the orthologs are to be connected in both species. In this process, AlignNemo also takes into consideration the degree of each protein, penalizing paths passing through hubs and high-degree proteins. Consequently, the impact of false positives in AlignNemo is significantly reduced, on the grounds that it is unlikely that many false interactions consistently participate in short redundant paths between two proteins.
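The alignment graph constructions discussed above can be illustrated with a minimal sketch. The Python fragment below builds the simplest (strict) alignment graph, in which two ortholog pairs are linked only if the proteins interact directly in both networks, and a relaxed variant that also accepts paths of length 2. It is a simplified illustration, not the construction used by NetworkBlast, MaWISh, NetAligner or AlignNemo: the function names, the `relaxed` flag, the choice of the minimum path count as edge weight and the toy data are all assumptions, and ortholog pairs that share a protein are simply skipped.

```python
from itertools import combinations


def to_adjacency(edges):
    """Turn an edge list into a dictionary of adjacency sets (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def short_path_count(adj, a, b):
    """Count paths of length 1 or 2 between two distinct nodes a and b."""
    direct = 1 if b in adj.get(a, set()) else 0
    via_common_neighbour = len(adj.get(a, set()) & adj.get(b, set()))
    return direct + via_common_neighbour


def alignment_graph(edges1, edges2, ortholog_pairs, relaxed=False):
    """Return {(pair_a, pair_b): weight} for linked ortholog pairs.

    Strict mode: both interactions must be direct edges.
    Relaxed mode: a path of length <= 2 in each network suffices
    (hypothetical weighting: the smaller of the two path counts).
    """
    adj1, adj2 = to_adjacency(edges1), to_adjacency(edges2)
    graph_edges = {}
    for (u1, v1), (u2, v2) in combinations(sorted(set(ortholog_pairs)), 2):
        if u1 == u2 or v1 == v2:
            continue  # pairs sharing a protein are skipped in this sketch
        if relaxed:
            s1 = short_path_count(adj1, u1, u2)
            s2 = short_path_count(adj2, v1, v2)
        else:
            s1 = 1 if u2 in adj1.get(u1, set()) else 0
            s2 = 1 if v2 in adj2.get(v1, set()) else 0
        if s1 and s2:
            graph_edges[((u1, v1), (u2, v2))] = min(s1, s2)
    return graph_edges


# Toy networks and putative orthologs (placeholders, not real data).
net1 = [("a", "b"), ("b", "c")]
net2 = [("x", "y"), ("x", "w"), ("w", "z")]
orthologs = [("a", "x"), ("b", "y"), ("c", "z")]

strict = alignment_graph(net1, net2, orthologs)
relaxed = alignment_graph(net1, net2, orthologs, relaxed=True)
print(sorted(strict))
print(sorted(relaxed))
```

In the toy input, the pairs (a, x) and (c, z) are linked only in the relaxed graph, because a and c, and likewise x and z, are connected through a single intermediate protein rather than directly. This mirrors how relaxed formulations tolerate a missing interaction at the cost of admitting more, and possibly unreliable, edges.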
AlignMCL [30] uses the same strategy as AlignNemo to build the alignment graph.

Almost all of the above merge-and-mine approaches assign scores to the edges and nodes of the alignment graph. AlignNemo and AlignMCL assign higher scores to edges involving protein pairs connected by multiple short paths in the input networks. MaWISh uses the sequence similarities of connected proteins to weigh the edges. NetAligner uses a more sophisticated model based on an evolutionary and a probabilistic assumption: interacting proteins evolve at rates significantly closer than expected by chance. Edge scores of the alignment graph are probabilities, and they are calculated by taking into account the difference of evolutionary distances between the corresponding ortholog pairs.

Once the alignment graph is built, each algorithm uses a different strategy to obtain the alignment. MaWISh casts the alignment problem as a graph optimization problem, namely the Maximum Weight Induced Subgraph problem (which is where the algorithm name, MaWISh, comes from). Starting from the alignment graph, MaWISh extracts a subset of nodes that induces a maximum weight subgraph. NetworkBlast is based on a search algorithm that first identifies high-scoring subnetwork seeds and then expands them in a greedy fashion. The scoring is based on a probabilistic model, and the algorithm iterates as long as the significance of the identified subnetwork remains high. AlignNemo extracts all connected subgraphs of a given size from the alignment graph and ranks them according to weights on nodes and edges. Top-ranking fully connected subgraphs are used as seeds for the alignment solution. Each seed is expanded in an iterative fashion by adding multiple subgraphs at each step, which allows the network context of a solution to be explored beyond its immediate neighbours. Finally, AlignMCL uses Markov clustering to separate small dense subgraphs.
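To give a feel for the mining step, the following Python sketch grows a single module from a weighted alignment graph in a greedy seed-and-extend manner. It is a generic heuristic written for illustration only: it does not reproduce the probabilistic scoring of NetworkBlast, the induced-subgraph optimization of MaWISh, or the clustering of AlignMCL. The function name `greedy_module`, the `min_gain` stopping threshold and the toy weights are assumptions.

```python
def greedy_module(weighted_edges, min_gain=0.0):
    """Grow one module from the heaviest edge of a weighted alignment graph.

    `weighted_edges` maps (node_a, node_b) to a weight. At each step the
    frontier node with the largest total weight towards the current module is
    added, provided that this gain exceeds `min_gain` (hypothetical threshold).
    Returns the module (a set of nodes) and its accumulated score.
    """
    if not weighted_edges:
        return set(), 0.0

    # Symmetric weighted adjacency structure.
    adj = {}
    for (a, b), w in weighted_edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w

    # Seed: the single heaviest edge.
    (seed_a, seed_b), seed_w = max(weighted_edges.items(), key=lambda item: item[1])
    module, score = {seed_a, seed_b}, seed_w

    while True:
        frontier = {n for m in module for n in adj[m]} - module
        best_node, best_gain = None, min_gain
        for cand in sorted(frontier):  # sorted for deterministic tie-breaking
            gain = sum(w for nbr, w in adj[cand].items() if nbr in module)
            if gain > best_gain:
                best_node, best_gain = cand, gain
        if best_node is None:
            return module, score  # no candidate improves the module enough
        module.add(best_node)
        score += best_gain


# Toy alignment graph; in practice nodes would be ortholog pairs.
toy_graph = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7, ("C", "D"): 0.1}
module, score = greedy_module(toy_graph, min_gain=0.5)
print(sorted(module), round(score, 2))
```

With all-positive weights and no threshold, such a procedure would absorb an entire connected component, which is why practical methods rely on statistical significance, penalties or a fixed module size to decide when to stop.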
All the approaches discussed above can align only two networks and cannot handle multiple networks. Only NetworkBlast has been extended, as NetworkBlast-M [31], to align multiple networks. Other algorithms, such as GASOLINE [32], have been designed specifically for the multiple local alignment problem.

Global network alignment

Next, we review existing work on GNA [13, 14]. Traditional GNA uses a two-stage procedure: (1) use a cost function to compute pairwise similarities between nodes of the compared networks and (2) use an alignment strategy to rapidly identify, from all possible alignments, a high-scoring alignment with respect to the total similarity over all aligned nodes. A typical cost function aims to quantify topological similarities between nodes, meaning that two nodes whose (extended) network neighbourhoods match well will have high topological similarity. Yet, most existing GNA methods allow for integrating sequence information (i.e. sequence-based node similarities) into the cost function. A typical alignment strategy uses the precomputed node similarities (resulting from the cost function) to produce an alignment, often in an iterative seed-and-extend fashion.

Prominent examples of two-stage GNA methods are the IsoRank [33, 34] and GRAAL [35–37] approach families, as well as GHOST [38]. These approaches vary in their cost functions. Namely, IsoRank, GRAAL and GHOST compute similarities between two nodes with respect to the nodes’ PageRank-like [33, 34], graphlet-based [39] and spectral signature-based [38] ‘topological signatures’, respectively. These approaches differ in their alignment strategies as well; for details, see [13]. What the traditional two-stage GNA methods have in common is that their alignment strategies use precomputed node similarities without updating the node similarities while iteratively building on top of the current alignment.
However, nodes that are already aligned at a given iteration of the alignment strategy might convey valuable information for guiding the remaining iterations of the strategy. Therefore, it could be desirable to update the node similarities (i.e. cost function) in each iteration [13]. Consequently, newer approaches have been designed to allow for this, such as NETAL [40] and WAVE [41].

Regardless of whether the existing two-stage GNA methods allow for updating their cost functions during the alignment construction process, most of them have their own cost functions as well as alignment strategies. Consequently, when a given approach is found to be superior to another, it is not clear whether this superiority comes from the winner’s cost function, its alignment strategy or both. So, to fairly evaluate different two-stage GNA methods, their different cost functions should be compared under the same alignment strategy, and their different alignment strategies should be compared under the same cost function [42, 43]. It was shown by mixing and matching the algorithmic components of MI-GRAAL [37] and IsoRankN [34], as well as of MI-GRAAL and GHOST, that the combination of the cost function of one method and the alignment strategy of another method typically beats each original method [42, 43]. Another key finding was that, of all cost functions, the graphlet-based node similarities [39] lead to the best alignments.

Traditional GNA methods, as discussed up to this point, identify from possible alignments the high-scoring alignments with respect to the overall node similarity (or node conservation). However, they then evaluate the accuracy of the alignments with some other measure that is different from the node similarity used to construct the alignments. Typically, one measures the amount of conserved edges. Thus, the traditional methods align similar nodes between networks hoping to conserve many edges, but edge conservation is assessed only after the alignments are constructed.
Instead, an effort was made to directly maximize the amount of conserved edges during the alignment construction process, which indeed improved alignment quality. The resulting approach is called MAGNA [44]. The core algorithm of MAGNA was incorporated into a user-friendly and parallelized GNA framework called MAGNA++ [45], which allows for optimizing both node and edge conservation during the alignment construction process. This further increased alignment quality compared with optimizing node conservation only (as the traditional methods do) or edge conservation only (as MAGNA does). Unlike the two-stage GNA approaches, MAGNA++ (and thus MAGNA) is a search-based aligner, meaning that it can directly ‘optimize’ any alignment quality measure (i.e. cost function) via a strategy such as a genetic algorithm (which is what MAGNA and MAGNA++ use) or simulated annealing (which is what an alternative approach called NetCoffee uses [46]). Several additional search-based aligners have appeared recently, including NABEECO [47], GEDEVO [48] and OptNetAlign [49]. Additional GNA efforts related to improving edge conservation during alignment construction include WAVE [41] and GREAT [50]. Both of these approaches optimize both node and edge conservation, just like MAGNA++, but, unlike MAGNA++, they also weight each conserved edge to favour conserved edges that are topologically similar over conserved edges that are topologically dissimilar. MAGNA++ differs from the other two approaches in that it is search-based, whereas WAVE and GREAT are two-stage approaches. WAVE is unique in the sense that it is only the alignment strategy of a GNA algorithm, and, thus, it can be combined with any cost function. It was shown to work best under the graphlet-based cost function.
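The idea behind search-based alignment discussed above, namely directly optimizing an alignment quality measure over the space of alignments, can be sketched as follows. This is an illustration only: it uses a simple randomized local search rather than the genetic algorithm of MAGNA and MAGNA++ or the simulated annealing of NetCoffee, it optimizes the raw number of conserved edges, and the names `conserved_edges` and `local_search_align` as well as the `steps` and `seed` parameters are hypothetical. Networks are assumed to be undirected, given as dictionaries from each node to the set of its neighbours, with mutually comparable node labels.

```python
import random


def conserved_edges(g1, g2, alignment):
    """Count edges (u, v) of g1 whose aligned images are adjacent in g2."""
    return sum(
        1
        for u in g1
        for v in g1[u]
        if u < v and alignment[v] in g2[alignment[u]]  # count each edge once
    )


def local_search_align(g1, g2, steps=2000, seed=0):
    """Randomized local search over injective alignments from g1 into g2."""
    if len(g1) > len(g2):
        raise ValueError("g1 must not have more nodes than g2")
    rng = random.Random(seed)
    nodes1 = sorted(g1)
    pool = sorted(g2)
    rng.shuffle(pool)
    alignment = dict(zip(nodes1, pool))  # random injective starting point
    unused = pool[len(nodes1):]          # nodes of g2 not currently aligned
    best = conserved_edges(g1, g2, alignment)
    for _ in range(steps):
        candidate, spare = dict(alignment), list(unused)
        u = rng.choice(nodes1)
        if spare and rng.random() < 0.5:
            # Move: re-align u to a currently unaligned node of g2.
            i = rng.randrange(len(spare))
            candidate[u], spare[i] = spare[i], candidate[u]
        else:
            # Move: swap the images of two nodes of g1.
            w = rng.choice(nodes1)
            candidate[u], candidate[w] = candidate[w], candidate[u]
        score = conserved_edges(g1, g2, candidate)
        if score >= best:  # accept moves that do not reduce the objective
            alignment, unused, best = candidate, spare, score
    return alignment, best


# Toy networks: a triangle, and a triangle with one extra pendant node.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
kite = {"w": {"x"}, "x": {"w", "y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
search_alignment, search_score = local_search_align(triangle, kite)
```

Because the objective is passed through a single scoring function, the same loop could in principle optimize a different measure (for example, a combination of node and edge conservation) by replacing `conserved_edges`.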
GREAT is unique in the sense that it approaches the GNA problem from a novel perspective, by first aligning edges well between networks to improve the cost function that is then used to align nodes well between the networks. Yet, this makes GREAT slower than both WAVE and MAGNA++, and more efforts are needed to speed it up to make it practically useful for large networks. Some existing GNA approaches do not fit any of the above GNA categorizations explicitly. So, we acknowledge such methods by briefly commenting on their unique feature(s), without explaining their algorithmic principles in detail, as follows. Unlike traditional GNA, a method introduced in [51] uses a notion of an alignment graph, just as LNA does. Also unlike traditional GNA, a method introduced in [52] starts its alignment construction process from an alignment that is optimal based purely on sequence data and uses topological information only to compensate for matching proteins whose sequences are not particularly similar to one another. The above discussion has focused on contrasting the algorithmic principles of different existing GNA methods. Another important aspect of their differences is whether they can align only two networks at once or more than two (multiple) networks. Prominent recent pairwise methods (many of which are already discussed above) include GHOST, NETAL, MAGNA++, WAVE, OptNetAlign, L-GRAAL [36], Natalie 2.0 [53], SUMONA [54], ModuleAlign [55] and SANA [56], while prominent recent multiple methods include IsoRankN, MI-Iso [42], SMETANA [57], GEDEVO-M [58], BEAMS [59], FUSE [60], SMAL [61] and multiMAGNA++ [62]. For a systematic comparison of recent pairwise GNA methods, see [18]. For a systematic comparison of recent multiple GNA methods, see [62]. Note that some of these recent methods are new, as they have appeared in parallel to each other as well as to this survey and were, thus, not directly evaluated against each other.
Such an evaluation will clearly have to be done in the near future to properly position these newest methods relative to each other and to the older ones. We conclude our discussion of the existing GNA methods by commenting on what it means for an alignment to be of high quality. Conceptually, one wants to measure topological and functional alignment quality. In terms of topological alignment quality, one wants to measure node correctness, i.e. how correctly the nodes have been mapped under the true node mapping, when such a mapping is known beforehand but is hidden during the alignment construction process [18]. More formally, node correctness is the percentage of nodes in the smaller of the aligned networks that are correctly aligned according to the true node mapping. Unfortunately, in real-life applications, such a true mapping is typically unknown. So, an alternative measure of topological alignment quality is the amount of conserved edges. A popular measure quantifying this is the symmetric substructure score (S3) [18, 44] or its successor, generalized S3 (GS3) [18]. More formally, S3 is the ratio of the number of conserved (aligned) edges to the number of both conserved and non-conserved edges. Here, an edge is conserved in the following sense: if u and v are nodes in one network with an edge between them, u' and v' are nodes in another network with an edge between them, u has been aligned to u' and v has been aligned to v', then the edge (u,v) has been aligned to the edge (u',v') and is, consequently, conserved. Although S3 is restricted to one-to-one (injective) GNA, GS3 was recently proposed as an intuitive extension of S3 that can deal with both LNA and GNA (this is important for reasons discussed in the following section). For details on GS3, see [18].
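The two topological measures defined above can be made concrete with a short Python sketch. It rests on stated assumptions: the alignment is a one-to-one dictionary covering every node of the smaller network, networks are undirected dictionaries from each node to the set of its neighbours without self-loops, and the denominator of S3 is taken to be the edges of the smaller network plus the edges of the larger network induced on the aligned nodes, with the conserved edges counted once. The function names are hypothetical.

```python
def edge_set(g):
    """Undirected edges of a network as a set of two-node frozensets."""
    return {frozenset((u, v)) for u in g for v in g[u]}


def node_correctness(g1, alignment, true_mapping):
    """Fraction of nodes of the smaller network g1 aligned as in the true mapping."""
    correct = sum(
        1 for u in g1 if u in true_mapping and alignment.get(u) == true_mapping[u]
    )
    return correct / len(g1)


def s3(g1, g2, alignment):
    """Conserved edges divided by conserved plus non-conserved edges."""
    e1, e2 = edge_set(g1), edge_set(g2)
    image = set(alignment.values())
    # An edge of g1 is conserved if its aligned endpoints are adjacent in g2.
    conserved = sum(1 for e in e1 if frozenset(alignment[x] for x in e) in e2)
    # Edges of g2 whose endpoints are both images of aligned nodes.
    induced = sum(1 for e in e2 if e <= image)
    denominator = len(e1) + induced - conserved
    return conserved / denominator if denominator else 0.0


# Toy example: a path a-b-c aligned into a triangle x-y-z with a pendant node w.
small = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
large = {"w": {"x"}, "x": {"w", "y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
example_alignment = {"a": "x", "b": "y", "c": "z"}
assumed_truth = {"a": "x", "b": "y", "c": "w"}  # hypothetical true mapping

nc_value = node_correctness(small, example_alignment, assumed_truth)
s3_value = s3(small, large, example_alignment)
```

In this toy example, both edges of the path are conserved, but the aligned region of the larger network contains a third edge (x,z) with no counterpart, so S3 is 2/3; node correctness is also 2/3 because one of the three nodes disagrees with the assumed true mapping.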
An alternative measure quantifying edge conservation is the size of the common subgraph [18], which does not just quantify the amount of conserved edges (as S3 or GS3 does) but also accounts for how the conserved edges are organized, i.e. whether they form large, dense and contiguous conserved network regions (which is desired). Finally, in terms of topological quality, especially when an alignment is not one-to-one (see the following section), one wants to measure node coverage (NCV), which captures the percentage of all nodes across all of the aligned networks that are a part of the alignment. In terms of functional alignment quality, one wants to measure whether the aligned nodes share similar biological functions; popular measures quantifying this are gene ontology (GO) correctness, GO semantic similarity, normalized mean entropy or accuracy of protein function prediction (where the latter is measured in terms of precision, P-PF; recall, R-PF; and F-score, F-PF) [18, 62]. For more details on these and other measures of alignment quality for pairwise and multiple GNA, see [18] and [62], respectively.

Reconciliation of local and global network alignment

Next, we address in more detail the question of which is better, LNA or GNA. Only recently was a study conducted towards answering this question, representing the first ever comparison of the two NA method categories; it evaluated 10 prominent LNA and GNA methods [18]. In the process, as LNA and GNA result in different output types (partial many-to-many node mapping versus one-to-one injective matching, respectively), the same study introduced new measures of alignment quality that allow for fair comparison of the different output types.
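Two of the measures mentioned above that apply to both one-to-one and many-to-many alignments, node coverage and the F-score of protein function prediction, can be sketched as follows. The sketch makes assumptions that the text does not spell out: an alignment is represented as a set of aligned node pairs, and the F-score is taken to be the usual harmonic mean of precision and recall. The function names are hypothetical; for the exact definitions used in the evaluation study, see [18].

```python
def node_coverage(g1, g2, pairs):
    """Fraction of all nodes in both networks that appear in some aligned pair."""
    covered1 = {u for u, _ in pairs}
    covered2 = {v for _, v in pairs}
    return (len(covered1) + len(covered2)) / (len(g1) + len(g2))


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy many-to-many alignment between a 3-node and a 4-node network.
net1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
net2 = {"w": {"x"}, "x": {"w", "y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
aligned_pairs = {("a", "x"), ("a", "y"), ("b", "z")}

ncv_value = node_coverage(net1, net2, aligned_pairs)  # 5 of 7 nodes are covered
f_value = f_score(0.5, 0.25)                          # illustrative P-PF and R-PF
```

Here node a is aligned to two nodes of the second network, which a one-to-one alignment would not allow; node coverage still counts a once.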
The 10 considered methods were evaluated comprehensively on both synthetic networks with known true node mapping and real-world networks with unknown true node mapping. The evaluation examined the impact on the results of using different PPI types and PPIs of varying confidence, as well as of using only topological node similarities in the cost function versus also including protein sequence information, and it assessed both topological and functional alignment quality. Key results of this evaluation study were as follows [18]. When using only topological information in the cost function, GNA outperformed LNA both topologically and functionally (note that when parallelizable methods are run on multiple cores, GNA is also faster than LNA). When sequence information was also included, GNA was superior to LNA in terms of topological alignment quality, while LNA was superior to GNA in terms of functional alignment quality, indicating the complementarity of the two NA categories (note that when parallelizable methods are run on multiple cores, GNA is at least as fast as LNA). The results were overall robust to the choice of PIN data, meaning that both different PPI types and confidence levels led to consistent results in all cases topologically and in most cases functionally. Importantly, when the 10 NA methods were used to predict novel protein functional knowledge via across-species knowledge transfer, LNA and GNA produced different predictions, which further confirmed their complementarity. An important observation of this study [18] and some others [17, 38, 42] was that different measures of topological alignment quality tended to correlate well with each other, and different measures of functional alignment quality tended to correlate well with each other, while topological measures and functional measures typically did not correlate well. This observation held for both LNA and GNA [18].
This indicates that the topological and functional fits between aligned networks conflict to a larger extent than previously realized. An explanation could be that the discovery of the current experimental biological knowledge may have been mostly guided by sequence-based analyses and not by network-based analyses. Findings from [18] support this, as do findings of an alternative study [52]. If the current experimental biological knowledge is indeed biased towards sequence data, given that sequences and network topology can lead to complementary biological insights [63], the above observation should not be surprising. Importantly, network topology is a valuable source of biological knowledge that can lead to novel insights compared with sequence data alone and can, thus, be used to redefine the traditional notion of sequence-based homology to a new notion of network-based homology [18]. As LNA has high functional but low topological alignment quality, while GNA has high topological but low functional alignment quality, a recent approach called IGLOO was proposed that integrates algorithmic components from both LNA and GNA in the hope of reconciling the two NA types [64]. That is, IGLOO aims to inherit the high functional quality of LNA and the high topological quality of GNA. IGLOO’s inputs are two networks and pairwise similarity scores between their nodes, where the scores are computed via some cost function. This is the same as the input of existing LNA and GNA methods. Then, IGLOO produces an alignment in a similar way as GNA does, by first identifying a high-scoring seed alignment and then expanding around the seed via an alignment strategy. However, IGLOO differs from GNA as follows. Although GNA uses as its seed just a single pair of nodes from the compared networks, IGLOO’s seed is a local alignment (or a part of it) of high functional quality generated by an existing LNA method.
Given the seed alignment, IGLOO expands around it via an existing alignment strategy to increase topological quality of the alignment. The difference between IGLOO and LNA is that IGLOO builds on top of the given local alignment to improve its topological quality. The difference between IGLOO and GNA is that IGLOO uses as the seed a local alignment of high functional quality (or a part of it) rather than just a single node pair, to improve functional quality of GNA. Namely, IGLOO varies the size of the seed from the entire local alignment (i.e. 100% of it) on one extreme (this version, denoted as IGLOO 4, is expected to resemble LNA the most) to only a single node pair (i.e. 0% of the local alignment) on the other extreme (this version, denoted as IGLOO 0, is expected to resemble GNA the most), with several intermediate versions of IGLOO that use as the seed a certain portion (strictly between 0 and 100%) of the local alignment (these versions, denoted by IGLOO 3 to IGLOO 1, are expected to balance between high functional quality of LNA and high topological quality of GNA). As a result, IGLOO’s alignment is local in the sense that it allows for many-to-many mapping between nodes of the compared networks, just as LNA does. Yet, its alignment is global in the sense that it allows for mapping large conserved subgraphs across the compared networks, just as GNA does. IGLOO was evaluated comprehensively [64], against the same 10 existing NA methods, on the same data, and using the same alignment quality measures as in the above evaluation study [18]. IGLOO produced a better trade-off between topological and functional alignment quality than the existing LNA and GNA methods (Figure 2). Namely, across all NA methods and network pairs, IGLOO was comparable or superior to the existing methods both functionally and topologically in 62% of all cases. Figure 2.
Topological and functional alignment quality for existing prominent LNA methods (triangles), existing prominent GNA methods (stars) and IGLOO versions (circles), when aligning yeast and human PINs with yeast two-hybrid PPIs. The measure of topological alignment quality used in the figure is NCV combined with GS3, and the measure of functional alignment quality used in the figure is P-PF and R-PF of protein function prediction combined into F-PF; see the text for description of these measures, which are proven evaluation criteria for both LNA and GNA that can compare the two fairly [18, 65]. In general, LNA results in high functional but low topological alignment quality, while GNA results in high topological but low functional alignment quality [18, 65]. As IGLOO (in particular, IGLOO 4 and also IGLOO 3) is superior to each LNA method both topologically and functionally (these findings hold for all analysed PINs), as its minimum contribution, IGLOO is a new state-of-the-art LNA method. At the same time, IGLOO (the same two IGLOO versions) drastically increases functional alignment quality of all GNA methods without significantly lowering their topological alignment quality. Over all network data sets and all GNA methods, IGLOO’s average improvement in functional quality is 331%, while IGLOO’s average decrease in topological alignment quality is only 40%. The figure is taken from [65].

Conclusion and future directions

This survey has presented a systematic overview of current algorithms for LNA and GNA. In particular, we have mainly focused on the following issue. Although LNA aims to find highly functionally conserved modules, the modules that the current LNA methods can find are small (i.e. topologically suboptimal). On the other hand, while GNA aims to find overall (global) regions of topological similarity between the compared networks, the regions that the current GNA methods can find are typically poorly functionally conserved. Hence, we have asked whether it is possible to find both topologically large and highly functionally conserved regions of network similarity, which would still satisfy design/application goals of each of LNA and GNA but would also overcome their drawbacks.
That is, we have asked: (1) How much can one expand around the small highly functionally conserved network regions that the current LNA methods can find to improve their topological quality while still preserving their functional quality? and (2) How much does one need to shrink the large regions of topological similarity that the current GNA methods can find to improve their functional quality while still preserving (or at least without drastically decreasing) their topological quality? We have presented early evidence [18, 64] that: (1) the small highly functionally conserved modules found by the existing LNA methods can be drastically enlarged to improve their topological quality without decreasing (or while even increasing) their functional quality; and (2) the large regions of topological similarity found by the existing GNA methods are typically not functionally meaningful but can be refined to only functionally meaningful regions, which can drastically improve their functional quality without drastically decreasing their topological quality. Such an efficient integration of LNA and GNA, which preserves the original goals of each of LNA and GNA while at the same time improving both, is a win–win for both scientists who aim to find highly functionally conserved local network regions (in which case they would find larger but equally functionally meaningful regions than the existing LNA methods can find) and scientists who aim to find highly topologically conserved global network regions (in which case they would find slightly smaller but much more biologically meaningful regions than the existing GNA methods can find). Importantly, balancing between the size (i.e. topological quality) of an aligned network region and its functional quality might need to be regulated with a parameter of the given (existing or future) LNA-GNA integrative method.
For example, IGLOO (the only current such integrative method) has a parameter that allows for this, as illustrated in Figure 2; certain values of this parameter are intended to compete with the existing LNA methods, while other values of this parameter are intended to compete with the existing GNA methods. The choice of this parameter value might be application-dependent. But a single method (e.g. IGLOO) should be able to achieve both goals (high topological quality and high functional quality), whereas none of the existing LNA or GNA methods allows for this. Hence, the key point of our article is the need to develop more such LNA-GNA integrative methods, which can compete with both LNA and GNA. In addition to the above issue, several other questions remain open, as follows. From a practical point of view, one of the central questions to be addressed is: Which method should a biological scientist use? From a biological perspective, GNA and LNA have similarities to global and local sequence alignment. Global sequence alignment is used to compare whole genomes, looking to highlight (dis)similarities among species, while local sequence alignment is used to find conserved functional motifs (e.g. promoter or regulatory sequences). In a similar way, GNA tries to find the best overall alignment among compared networks, and it can be used to compare interactomes of different species, while LNA may be used to analyse conserved functional network modules (e.g. pathways or protein complexes). However, this distinction between application goals of LNA and GNA is not rigid. Namely, while LNA in general cannot recover large regions of similarity [65], GNA has been reported in the literature as able to recover biologically meaningful pathways or evolutionarily conserved protein complexes [35, 66]. Moreover, the answer to the above question is related to other issues such as: How to compare two different alignments? and How to measure the quality of an alignment?
To the best of our knowledge, this problem has been only partially addressed in other works. In [42], the authors investigated GNA methods and proposed a novel measure for the quality of the alignment. In [26], the authors discussed LNA methods and proposed a framework to improve the robustness of the local alignments. More recently, in [18, 64], the authors discussed the integration of LNA and GNA and compared the two. It should be noted that the comparison is complex even among aligners of the same type (i.e. either LNA or GNA), and it is especially hard when comparing aligners of different classes (i.e. LNA and GNA). Therefore, in [18], the authors proposed new measures of alignment quality that are applicable to both LNA and GNA. Yet, the question of proper evaluation is far from being settled. The existing metrics are either based only on network topology or use functional considerations. Nevertheless, aligners that produce good alignments regarding functional quality are usually not equally good regarding topological quality, and vice versa. Therefore, there is a need to unify the existing measures into a single consistent framework. Consequently, we emphasize again that one of the future challenges is the definition of a formal way to represent and analyse the output of the network aligners, similarly to other fields related to graph analysis. For instance, the comparison of clustering and community detection algorithms relies on some accepted metrics and gold standards that are still absent in the context of the comparison of network alignment algorithms [10]. In the scenario we envision, each obtained alignment should be measured using some accepted metrics, and all the aligners should be benchmarked on the same group of gold-standard networks. Such a comparison is hard. For instance, PINs evolve continuously.
Consequently, aligners of these networks have been evaluated using different data sets, which makes their comparison hard [26]. The existence of a common framework to compare aligners may also open the possibility to expand their usage. For instance, network aligners could be used to measure the (dis)similarity among two or more networks, going beyond the well-known graph edit distance [67]. Moreover, as aligners are usually tailored to the structure of the analysed networks, they may also reveal the (dis)similarity of subnetworks (e.g. protein modules or complexes), highlighting the changes in such modular structures among different networks or conditions. Although in our survey we focused on the alignment of PINs, network alignment has been applied to other biological network types, such as metabolic [68] or gene co-expression [69] networks. Moreover, other emerging fields of applications, e.g. the comparison of networks representing the structural connections within the brain, pose many interesting questions to researchers. For example, is it possible to simply apply existing algorithms used in the comparison of biological networks to networks in other domains, or do significant changes need to be made to the existing methods to customize them to the given applied problem at hand? LNA methods are usually tightly coupled to the biological question addressed. For instance, some of them try to identify conserved protein complexes among organisms [70]. Specifically, many LNA approaches (e.g. MaWiSH, NetworkBlast and AlignNemo) rely on biological models of protein complexes, i.e. of the small aligned regions that are the output of the alignment. Consequently, the application of such aligners in other fields poses some challenges that need to be solved. On the other hand, GNA methods usually rely more on topology during the alignment steps; therefore, their application to other fields may be simpler than that of LNA approaches.
Another important question related to the applicability of the existing methods for biological network alignment to other domains, such as social networks [71–73], is that of scalability. Namely, as social networks are typically much larger (with millions of nodes) than biological ones (with thousands of nodes), the existing biological network alignment methods might need to be redesigned to allow for aligning such large social networks. Although some of the existing methods, such as the GNA methods MAGNA++ and multiMAGNA++, exploit the benefits of parallelism and multithreading, there is still significant room for improvement in the efficiency of some algorithms that suffer from known scalability problems. Development of cloud-based aligners that would exploit the benefit of elastic computing may constitute a possible research trend in the future.

Key Points

The article discusses the topic of biological network alignment, which, analogous to genomic sequence alignment, has a potential to revolutionize our understanding of cellular functioning.
The article discusses recent approaches for local and global network alignment and contrasts the two approach types, indicating their complementarity in uncovering new biological knowledge.
The article discusses a possible reconciliation of the two complementary approach types, a topic that has been explored only recently and that, thus, remains an open research problem for future exploration.
The article outlines additional future research directions, such as the definition of novel theories for a better understanding of the network alignment output, which are needed for a better comparison of network alignment methods, for guiding future method development by computational scientists and for guiding applied (e.g. biological) scientists on how to use the network alignment output to learn new biology.
The article outlines the possibility of the use of existing network alignment methods in other emerging fields.
Funding

The United States Air Force Office of Scientific Research (AFOSR) Young Investigator Research Program (YIP) (grant number FA9550-16-1-0147 to T.M.), National Science Foundation (NSF) (grant number CCF-1319469 to T.M.), The Italian Ministry of Education and Research (MIUR): BA2Know-Business Analytics to Know (grant number PON03PE_00001_1 to P.H.G.).

Pietro H. Guzzi is an associate professor of Computer Science Engineering at the University ‘Magna Græcia’ of Catanzaro, Italy. His research interests comprise semantic-based and network-based analysis of biological and clinical data.
Tijana Milenković is an associate professor of Computer Science and Engineering at the University of Notre Dame. She researches network science and computational biology. She won NSF CAREER 2015 and AFOSR YIP 2016 awards, among others. Milenković is an Associate Editor of IEEE/ACM TCBB.

References

1 Barabasi A-L, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet 2004;5:101–13.
2 Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet 2007;8:450–61.
3 Gavin AEA. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002;415:141–7.
4 Calin GA, Croce CM. MicroRNA-cancer connection: the beginning of a new tale. Cancer Res 2006;66:7390–4.
5 Tuncbag N, Kar G, Keskin O, et al. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform 2008;10:217–32.
6 Cannataro M, Guzzi PH, Veltri P. Protein-to-protein interactions: Technologies, databases, and algorithms. ACM Comput Surv 2010;43(1):1.
7 Bertolazzi P, Bock ME, Guerra C.
On the functional and structural characterization of hubs in protein–protein interaction networks. Biotechnol Adv 2013;31:274–86.
8 Aittokallio T, Schwikowski B. Graph-based methods for analysing networks in cell biology. Brief Bioinform 2006;7:243–55.
9 Panni S, Rombo SE. Searching for repetitions in biological networks: methods, resources and tools. Brief Bioinform 2015;16:118–36.
10 Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms. Phys Rev E 2008;78:046110.
11 Fortunato S. Community detection in graphs. Phys Rep 2010;486:75–174.
12 Faisal FE, Milenković T. Dynamic networks reveal key players in aging. Bioinformatics 2014;30:1721–9.
13 Faisal FE, Meng L, Crawford J, et al. The post-genomic era of biological network alignment. EURASIP J Bioinform Syst Biol 2015;2015:1–19.
14 Elmsallati A, Clark C, Kalita J. Global alignment of protein-protein interaction networks: a survey. IEEE/ACM Trans Comput Biol Bioinform 2015;13:689–705.
15 Sharan R, Suthram S, Kelley RM, et al. Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA 2005;102:1974–9.
16 Cook SA. The complexity of theorem-proving procedures. In: STOC '71 Proceedings of the third annual ACM symposium on Theory of computing, ACM Press, NY, 1971, pp. 151–8.
17 Clark C, Kalita J. A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics 2014;30:2351–9.
18 Meng L, Striegel A, Milenković T.
Local versus global biological network alignment. Bioinformatics 2016;32:3155–64.
19 Erten S, Li X, Bebek G, et al. Phylogenetic analysis of modularity in protein interaction networks. BMC Bioinformatics 2009;10:333.
20 Jancura P, Mavridou E, Carrillo-de Santa Pau E, et al. A methodology for detecting the orthology signal in a PPI network at a functional complex level. BMC Bioinformatics 2012;13(Suppl 1):S18.
21 Lancichinetti A, Fortunato S. Community detection algorithms: a comparative analysis. Phys Rev E 2009;80:056117.
22 Guzzi PH, Veltri P, Roy S, et al. MODULA: a network module based local protein interaction network alignment method. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE Press, NY, 2015, pp. 1620–3.
23 Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods 2012;9:471–72.
24 Guzzi PH, Mina M, Guerra C, et al. Semantic similarity analysis of protein data: assessment with biological features and issues. Brief Bioinform 2012;13:569–85.
25 Ciriello G, Mina M, Guzzi PH, et al. AlignNemo: a local network alignment method to integrate homology and topology. PloS One 2012;7:e38107.
26 Mina M, Guzzi PH. Improving the robustness of local network alignment: design and extensive assessment of a Markov clustering-based approach. IEEE/ACM Trans Comput Biol Bioinform 2014;11:561–72.
27 Kalaev M, Smoot M, Ideker T, et al. NetworkBLAST: comparative analysis of protein networks. Bioinformatics 2008;24:594–6.
Google Scholar CrossRef Search ADS PubMed 28 Koyutrk M , Kim Y , Topkara U , et al. Pairwise alignment of protein interaction networks . J Comput Biol 2006 ; 13 : 182 – 99 . Google Scholar CrossRef Search ADS PubMed 29 Pache RA , Ceol A , Aloy P. NetAligner—a network alignment server to compare complexes, pathways and whole interactomes . Nucleic Acids Res 2012 ; 40 : W157 – 61 . Google Scholar CrossRef Search ADS PubMed 30 Mina M , Guzzi PH. AlignMCL: comparative analysis of protein interaction networks through Markov clustering. In: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), IEEE Press, NY, 2012 , pp, 174–81. 31 Kalaev M , Bafna V , Sharan R. Fast and accurate alignment of multiple protein networks . J Comput Biol 2009 ; 16 : 989 – 99 . Google Scholar CrossRef Search ADS PubMed 32 Micale G , Pulvirenti A , Giugno R , et al. GASOLINE: a greedy and stochastic algorithm for optimal local multiple alignment of interaction networks . PloS One 2014 ; 9 : e98750. Google Scholar CrossRef Search ADS PubMed 33 Singh R , Xu J , Berger B. Global alignment of multiple protein interaction networks with application to functional orthology detection . Proc Nat Acad Sci USA 2008 ; 105 : 12763 – 68 . Google Scholar CrossRef Search ADS PubMed 34 Liao C-S , Lu K , Baym M , et al. IsoRankN: spectral methods for global alignment of multiple protein networks . Bioinformatics 2009 ; 25 : i253 – 8 . Google Scholar CrossRef Search ADS PubMed 35 Kuchaiev O , Milenković T , Memišević V , et al. Topological network alignment uncovers biological function and phylogeny . J R So Interface 2010 ; 7 : 1341 – 54 . Google Scholar CrossRef Search ADS 36 Malod-Dognin N , Pržulj N. L-GRAAL: Lagrangian Graphlet-based network aligner . Bioinformatics 2015 ; 31 : 2182 – 9 . Google Scholar CrossRef Search ADS PubMed 37 Kuchaiev O , Pržulj N. Integrative network alignment reveals large regions of global network similarity in yeast and human . 
Bioinformatics 2011 ; 27 : 1390 – 6 . Google Scholar CrossRef Search ADS PubMed 38 Patro R , Kingsford C. Global network alignment using multiscale spectral signatures . Bioinformatics 2012 ; 28 : 3105 – 14 . Google Scholar CrossRef Search ADS PubMed 39 Milenković T , Pržulj N. Uncovering biological network function via graphlet degree signatures . Cancer Inform 2008 ; 6 : 257 – 73 . Google Scholar CrossRef Search ADS PubMed 40 Neyshabur B , Khadem A , Hashemifar S , et al. NETAL: a new graph-based method for global alignment of protein–protein interaction networks . Bioinformatics 2013 ; 29 : 1654 – 62 . Google Scholar CrossRef Search ADS PubMed 41 Sun Y , Crawford J , Tang J , et al. Simultaneous optimization of both node and edge conservation in network alignment via WAVE. In: Algorithms in Bioinformatics, Volume 9289 of the series Lecture Notes in Computer Science, Springer Verlag, 2015 , pp. 16 – 39 . 42 Faisal F , Zhao H , Milenković T. Global network alignment in the context of aging . IEEE/ACM Trans Comput Biol Bioinform 2014 ; 12 : 40 – 52 . Google Scholar CrossRef Search ADS 43 Crawford J , Sun Y , Milenković T. Fair evaluation of global network aligners . Algorithms for Molecular Biology 2015 ; 10 : 19 . Google Scholar CrossRef Search ADS PubMed 44 Saraph V , Milenković T. MAGNA: Maximizing Accuracy in Global Network Alignment . Bioinformatics 2014 ; 30 : 2931 – 40 . Google Scholar CrossRef Search ADS PubMed 45 Vijayan V , Saraph V , Milenković T. MAGNA ++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation . Bioinformatics 2015 ; 31 : 2409 – 11 . Google Scholar CrossRef Search ADS PubMed 46 Hu J , Kehr B , Reinert K. NetCoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks . Bioinformatics 2014 ; 30 : 540 – 8 . Google Scholar CrossRef Search ADS PubMed 47 Ibragimov R , Malek M , Guo J , et al. 
NABEECO: biological network alignment with bee colony optimization algorithm. In: GECCO '13 Companion Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, ACM Press, NY, 2013 , pp. 43–44. 48 Ibragimov R , Malek M , Guo J , et al. GEDEVO: an evolutionary graph edit distance algorithm for biological network alignment . German Conf Bioinformatics (GCB) 2013 ; 34 : 68 – 79 . 49 Clark C , Kalita J. A multiobjective memetic algorithm for PPI network alignment . Bioinformatics 2015 ; 31 : 1988 – 98 . Google Scholar CrossRef Search ADS PubMed 50 Crawford J , Milenković T. GREAT: GRaphlet Edge-based network AlignmenT. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2015 ; 220 – 227 . 51 Klau G. A new graph-based method for pairwise global network alignment . BMC Bioinformatics 2009 ; 10 : S59. Google Scholar CrossRef Search ADS PubMed 52 Chindelevitch L , Liao C-S , Berger B. Local optimization for global alignment of protein interaction networks . Pac Symp Biocomput 2010 ; 123 – 32 . 53 El-Kebir M , Heringa J , Klau GW. Natalie 2.0—sparse global network alignment as a special case of quadratic assignment . Algorithms 2015 ; 8 : 1035 – 51 . Google Scholar CrossRef Search ADS 54 Tuncay EG , Can T. SUMONA: a supervised method for optimizing network alignment . Comput Biol Chem 2016 ; 63 : 41 – 51 . Google Scholar CrossRef Search ADS PubMed 55 Hashemifar S , Ma J , Naveed H , et al. ModuleAlign: module-based global alignment of protein–protein interaction networks . Bioinformatics 2016 ; 32 : i658 – 64 . Google Scholar CrossRef Search ADS PubMed 56 Mamano N , Hayes W. SANA: simulated annealing network alignment applied to biological networks. arXiv 2016 ;q-bio.MN. 57 Sahraeian SME , Yoon B-J. SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks . PloS One 2013 ; 8 : e67995. Google Scholar CrossRef Search ADS PubMed 58 Ibragimov R , Malek M , Guo J , et al. 
Multiple graph edit distance - simultaneous topological alignment of multiple protein-protein interaction networks with an evolutionary algorithm. In: GECCO '14 Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, IEEE Press, NY, 2014 , pp. 277–84. 59 Alkan F , Erten C. BEAMS: backbone extraction and merge strategy for the global many-to-many alignment of multiple PPI networks . Bioinformatics 2014 ; 30 : 531 – 9 . Google Scholar CrossRef Search ADS PubMed 60 Gligorijević V , Malod-Dognin N , Pržulj N. FUSE: multiple network alignment via data fusion . Bioinformatics 2015 ; 32 : 1195 – 203 . Google Scholar CrossRef Search ADS PubMed 61 Dohrmann J , Singh R. The SMAL web server: global multiple network alignment from pairwise alignments . Bioinformatics 2016 ; 32 : 3330 – 2 . Google Scholar CrossRef Search ADS PubMed 62 Vijayan V , Milenković T. Multiple network alignment via multiMAGNA. arXiv:1604.01740 [q-bio.MN] 2016. 63 Memišević V , Milenković T , Pržulj N. Complementarity of network and sequence information in homologous proteins . J Integr Bioinform 2010 ; 7 ( 3 ): 135 . 64 Meng L , Crawford J , Striegel A , et al. IGLOO: integrating global and local biological network alignment. In: 12th International Workshop on Mining and Learning with Graphs (MLG) 2016 . 65 Kuchaiev O , Milenković T , Memišević V , et al. Topological network alignment uncovers biological function and phylogeny . J R Soc Interface 2010 ; 7 : 1341 – 54 . Google Scholar CrossRef Search ADS PubMed 66 Kelley BP , Sharan R , Karp RM , et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment . Proc Natl Acad Sci USA 2003 ; 100 : 11394 – 9 . Google Scholar CrossRef Search ADS PubMed 67 Zager L , Verghese G. Graph similarity scoring and matching . Appl Math Lett 2008 ; 21 : 86 – 94 . Google Scholar CrossRef Search ADS 68 Pah AR , Guimerà R , Mustoe AM , et al. 
Use of a global metabolic network to curate organismal metabolic networks . Sci Rep 2013 ; 3 : 1695. Google Scholar CrossRef Search ADS PubMed 69 Ma C-Y , Lin S-H , Lee C-C , et al. Reconstruction of phyletic trees by global alignment of multiple metabolic networks . BMC Bioinformatics 2013 ; 14 : S12. Google Scholar CrossRef Search ADS PubMed 70 Cannataro M , Guzzi PH , Veltri P. IMPRECO: distributed prediction of protein complexes . Future Gener Comput Syst 2010 ; 26 : 434 – 40 . Google Scholar CrossRef Search ADS 71 Narayanan A , Shi E , Rubinstein BIP. Link prediction by de-anonymization: how we won the Kaggle Social Network challenge. In: 2011 International Joint Conference on Neural Networks (IJCNN 2011—San Jose), IEEE Press, NY, 2011 , pp. 1825–1834. 72 Zhang Y , Tang J , Yang Z , et al. COSNET: connecting heterogeneous social networks with local and global consistency. In: Kdd '15 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, NY, 2015 , pp. 1485–1494. 73 Zhang J , Yu PS. Multiple anonymized social networks alignment. In: 2015 IEEE International Conference on Data Mining (ICDM), ACM Press, NY, 2015 , pp. 599–608. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Briefings in Bioinformatics Oxford University Press

Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin

Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1467-5463
eISSN
1477-4054
D.O.I.
10.1093/bib/bbw132

Abstract

Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.

Keywords: graph comparison, local network alignment, global network alignment, biological networks

Introduction

Molecular biology has deeply investigated the role of biologically relevant molecules within cells. After the first phase in which most of the efforts focused on the discovery of such molecules, more recently, the interests of researchers have concentrated on the identification of a complex set of relations among molecules, under the hypothesis that genes, proteins, and other molecules rarely work alone but instead form a complex network of interactions [1]. This new systems-level point of view of molecular biology has affected both technologies used to analyse cells as well as methods and models used to manage resulting data. Namely, this new perspective has encouraged the development of high-throughput techniques for determining relations between biomolecules [2].
Consequently, this has led to the accumulation of large amounts of biomolecular interaction data, such as protein–protein interaction networks (PINs) [3, 4], collected in publicly available databases (see [5, 6] for an extensive review). The systems-level data collection, in turn, has raised the need to develop novel computational tools and algorithms that are able to accurately and efficiently model, query and analyse the network data [7].

Of all biological network types, our key focus is on PINs. Yet, our discussion is applicable to other biological network types, such as gene co-expression, metabolic or gene regulatory networks. For PINs, the most widely used formalism to manage and analyse the data has been adopted from graph theory [8]. Consequently, a PIN, or interactome, is modelled as a graph G = (V,E), where V is the set of nodes representing proteins, and E is the set of edges representing (typically physical) protein–protein interactions (PPIs). Such a synergetic view of cellular functioning gives researchers the opportunity to analyse the complex joint effect of the interplay among molecules, which is more realistic than analysing only the effects of individual molecules in isolation [1].

There exist a variety of research problems related to PIN analysis. For instance, Panni and Rombo [9] consider three important aspects: ‘network alignment’, ‘network querying’ and ‘network motif extraction’. Additional aspects are ‘study of network properties’ [1], ‘network clustering’ [10, 11] and ‘dynamic network analysis’ [12]. Here, we focus on the problem of network alignment, i.e. the comparison of different PINs, typically corresponding to different species. Such analysis is the counterpart of the alignment of (linear) sequences of genes or proteins.
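As a small illustration of the graph model G = (V,E) described above, the following Python sketch stores a PIN as an undirected adjacency map. The protein identifiers and the helper names `build_pin` and `edge_set` are hypothetical and chosen only for this example; real PINs would be loaded from one of the interaction databases mentioned above.

```python
from collections import defaultdict

def build_pin(interactions):
    """Build an undirected adjacency map {protein: set of neighbours}.

    `interactions` is an iterable of (protein_a, protein_b) pairs.
    Self-interactions are skipped in this sketch (an assumption, not a rule).
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

def edge_set(adjacency):
    """Return E as a set of unordered pairs, so each PPI is counted once."""
    return {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}

# Toy data with made-up protein names.
toy_ppis = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4")]
pin = build_pin(toy_ppis)
V = set(pin)          # node set: proteins
E = edge_set(pin)     # edge set: interactions
```

Storing neighbours as sets keeps adjacency tests cheap, which matters because alignment heuristics repeatedly ask whether two proteins interact.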
In the basic formulation, network alignment aims to find a good node mapping between PINs of two or more species that identifies common interaction patterns between the compared networks, which are then hypothesized to correspond to functionally conserved network regions between the species. Therefore, just as sequence alignment has been used to define sequence-based homology, network alignment can be used to define network-based homology or similarity. However, unlike sequence alignment, network alignment is computationally intractable (i.e. NP-hard), owing to the NP-completeness of the underlying subgraph isomorphism problem. Therefore, heuristic approaches for solving the network alignment problem need to be sought.

Analogous to sequence alignment, there exist two different instances of the network alignment problem: local and global [13, 14]. From a computational perspective, local network alignment (LNA) [15] searches for highly similar network regions that likely represent conserved functional structures, which often results in relatively small mapped subnetworks and in some network regions not being a part of the alignment (Figure 1). In contrast, global network alignment (GNA) [13] looks for the best superimposition of the whole input networks (i.e. an alignment that optimizes a given objective function), which typically results in large but suboptimally conserved mapped subnetworks (Figure 1). Both kinds of network alignment may compare two or more networks, corresponding to pairwise and multiple alignment, respectively. From a biological perspective, LNA looks for evolutionarily conserved building blocks of the cellular machinery, disregarding the overall similarity between the networks. In contrast, GNA searches for a single comprehensive mapping of the whole sets of protein interactions from different species.

Figure 1. LNA (left) versus GNA (right). LNA finds small local regions of high similarity and often admits a many-to-many mapping between nodes of the compared networks. GNA finds an injective (one-to-one) global mapping between the nodes at the expense of suboptimally matching local network regions. Nodes from the compared networks that are aligned to each other are indicated with broken lines.

Formally, given two input networks G1 and G2 (let us suppose that G1 has at most as many nodes as G2), the problem of finding an alignment between the two networks corresponds to the search for a mapping between nodes of G1 and nodes of G2 that maximizes a given objective function (the quality of the alignment). The size of this search space is large, as it consists of all possible mappings between nodes of the compared networks. The computational intractability of the above problem, which arises from the NP-completeness of the underlying subgraph isomorphism problem [16], requires the development of heuristics (approximate approaches) to solve the problem. Thus, all existing LNA and GNA methods are heuristics. GNA produces an injective (one-to-one) matching, i.e. each node of G1 is mapped to a unique corresponding node in G2, while LNA gives a mapping of a subset of nodes in G1 to a subset of nodes in G2 (even admitting many-to-many node correspondences in some cases).

There is a clear connection between LNA and GNA.
For example, both aim to find topological and functional similarities between the compared networks to allow for the transfer of biological knowledge from well-studied species to poorly studied species between the conserved (aligned) network regions. Yet, researchers in these two subfields have produced independent algorithms. Consequently, there are many LNA algorithms and many GNA algorithms that rely on different assumptions and use different approaches that maximize different cost functions [17]. For instance, many algorithms try to optimize cost functions based mainly on topology, while many others are tailored to enhance the functional relevance of alignments. Therefore, a direct comparison of an LNA method and a GNA method is non-trivial. Consequently, when a new method is proposed, it is only compared against the existing network alignment methods from the same category (i.e. LNA or GNA), even though LNA and GNA have the same goal of across-species transfer of biological knowledge between aligned network regions [18]. Therefore, the ultimate question is which one to use: LNA, GNA or a hybrid approach that would reconcile the two? The possible reconciliation between these two aspects of network alignment is an open research problem that should be investigated in depth in the future.

In this survey, after presenting prominent algorithms for each of LNA and GNA (which are summarized in Table 1), we discuss recent first steps towards reconciling the two. In particular, we point to a comprehensive evaluation study, the first ever comparison of LNA and GNA, which showed that LNA and GNA are complementary: the former results in high functional but low topological quality, while the latter results in high topological but low functional quality, and the two lead to different biological predictions.
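To make the formal definition and the notion of a topology-based cost function more concrete, the sketch below checks that a node mapping from G1 to G2 is injective, scores it by the fraction of G1 edges that it maps onto edges of G2, and counts the injective mappings in the search space. The score is one simple, illustrative choice of topological measure and is an assumption of this example, not the objective of any particular method surveyed here; all function and variable names are hypothetical.

```python
from math import perm

def is_injective(mapping):
    """True if no two nodes of G1 are mapped to the same node of G2."""
    return len(set(mapping.values())) == len(mapping)

def conserved_edge_fraction(edges1, edges2, mapping):
    """Fraction of G1 edges whose mapped endpoints are adjacent in G2.

    Illustrative topological score only (an assumption of this sketch).
    """
    e2 = {frozenset(e) for e in edges2}
    conserved = sum(
        1 for u, v in edges1 if frozenset((mapping[u], mapping[v])) in e2
    )
    return conserved / len(edges1)

def search_space_size(n1, n2):
    """Number of injective mappings from n1 nodes into n2 nodes: n2!/(n2-n1)!."""
    return perm(n2, n1)

# Toy networks with made-up node names; G1 has fewer nodes than G2.
g1_edges = [("a", "b"), ("b", "c")]
g2_edges = [("x", "y"), ("y", "z"), ("z", "w")]
alignment = {"a": "x", "b": "y", "c": "w"}

score = conserved_edge_fraction(g1_edges, g2_edges, alignment)
n_mappings = search_space_size(3, 4)
```

Even for this toy case there are 24 candidate mappings; the count grows factorially with network size, which is why exhaustive search is infeasible for real PINs with thousands of proteins.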
As both PIN topological information and functional information are valuable sources of biological knowledge, the finding that LNA and GNA are complementary further highlights the need for reconciling the two approach types. Towards the end of our discussion below, we comment on a recent first step in this direction.

Table 1. Summary of the existing LNA and GNA methods

| Name | LNA or GNA? | PNA or MNA? | Algorithmic principle(s) | Software availability (website) |
|---|---|---|---|---|
| MOPHY | LNA | PNA | Mine-and-merge AS | N/A |
| Jancura et al. | LNA | PNA | Mine-and-merge AS | N/A |
| MODULA | LNA | PNA | Mine-and-merge AS | N/A |
| MaWiSH | LNA | PNA | Merge-and-mine AS | http://compbio.case.edu/koyuturk/software/mawish/ |
| AlignNemo | LNA | PNA | Merge-and-mine AS | http://sourceforge.net/p/alignnemo |
| AlignMCL | LNA | PNA | Merge-and-mine AS | https://sites.google.com/site/alignmcl |
| NetworkBlast | LNA | PNA | Merge-and-mine AS | http://cs.tau.ac.il/~bnet/networkblast.htm |
| NetworkBlast-M | LNA | MNA | Merge-and-mine AS | http://cs.tau.ac.il/~bnet/networkblast.htm |
| NetAligner | LNA | PNA | Merge-and-mine AS | http://netaligner.irbbarcelona.org/ |
| GASOLINE | LNA | MNA | Greedy stochastic optimal alignment | http://ferrolab.dmi.unict.it/gasoline/ |
| IsoRank | GNA | PNA | PageRank-like NCF, greedy AS | https://groups.csail.mit.edu/cb/mna/ |
| MI-Iso | GNA | PNA | MI-GRAAL’s NCF, IsoRankN’s AS | N/A |
| MI-GRAAL | GNA | PNA | Graphlet-based NCF, seed and extend AS | http://www0.cs.ucl.ac.uk/staff/natasa/MI-GRAAL |
| L-GRAAL | GNA | PNA | Lagrangian optimization of node and edge conservation | http://www0.cs.ucl.ac.uk/staff/natasa/L-GRAAL |
| GHOST | GNA | PNA | Spectral signature-based NCF, seed and extend AS | http://www.cs.cmu.edu/~ckingsf/software/ghost/ |
| NETAL | GNA | PNA | Iterative updating of NCF, greedy programming | http://www.bioinf.cs.ipm.ir/software/netal |
| SUMONA | GNA | PNA | Integration of Optnetalign with other algorithms | N/A |
| NABEECO | GNA | PNA | Artificial bee colony-based alignment | http://nabeeco.mpi-inf.mpg.de/ |
| PISWAP | GNA | PNA | Iterative local optimization of GNA | http://groups.csail.mit.edu/cb/piswap/webserver/ |
| GEDEVO | GNA | PNA | Evolutionary alignment optimization | http://gedevo.mpi-inf.mpg.de/ |
| Optnetalign | GNA | PNA | Evolutionary optimization of alignment | https://github.com/crclark/optnetaligncpp/ |
| SANA | GNA | PNA | Simulated annealing-based AS | http://sana.ics.uci.edu |
| NATALIE 2.0 | GNA | PNA | Quadratic assignment | http://mi.fu-berlin.de/w/LiSA/Natalie |
| MAGNA++ | GNA | PNA | Evolutionary optimization of node and edge conservation | http://nd.edu/~cone/MAGNA++/ |
| WAVE | GNA | PNA | Seed and extend optimization of node and edge conservation | http://nd.edu/~cone/WAVE/WAVE.zip |
| SMETANA | GNA | PNA | Semi-Markov random walk probabilistic alignment | http://ece.tamu.edu/~bjyoon/SMETANA/ |
| moduleAlign | GNA | PNA | Alignment via module matching | http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html |
| IsoRankN | GNA | MNA | IsoRank’s MNA counterpart | https://groups.csail.mit.edu/cb/mna/ |
| SMAL | GNA | MNA | Scaffold-based MNA | http://haddock6.sfsu.edu/smal/ |
| GEDEVO-M | GNA | MNA | GEDEVO’s MNA counterpart | http://gedevo.mpi-inf.mpg.de/multiple-network-alignment/ |
| multiMAGNA++ | GNA | MNA | MAGNA++’s MNA counterpart | http://nd.edu/~cone/multiMAGNA++/ |
| BEAMS | GNA | MNA | Backbone extraction and merge strategy | http://webprs.khas.edu.tr/~cesim/BEAMS.tar.gz |
| FUSE | GNA | MNA | Non-negative matrix tri-factorization | http://www0.cs.ucl.ac.uk/staff/natasa/FUSE/ |
| IGLOO | Both | PNA | Integrating LNA and GNA | Available on request |

Note. Each of the two method categories is further divided into pairwise (PNA) and multiple (MNA) methods. For each method, its general algorithmic principle is stated, along with the potential availability of its software implementation. In the table, AS is alignment strategy, and NCF is node cost function (see the text for details).

Local network alignment

In recent years, several algorithms have been proposed to solve the LNA problem. They may be categorized according to their approach and the way they score the alignment. Here, we group them into two broad classes according to their main steps: mine-and-merge and merge-and-mine.

Algorithms belonging to the mine-and-merge class first analyse each network separately (e.g. to identify small dense subgraphs or communities), and then they build the alignment solution by comparing the pre-identified modules across the networks [19–21]. Examples of mine-and-merge methods are MOPHY [19], the algorithm by Jancura et al. [20] and MODULA [22]. Briefly, these methods work as follows.

The MOPHY algorithm, proposed by Erten et al. [19], is based on the identification of modular network components from each network separately by applying graph clustering techniques. Then, each module is projected onto other networks using sequence similarity. The conservation of each module is used to assess the similarity between input graphs.

The method proposed by Jancura et al. [20] is tailored to the study of the conservation of protein complexes (i.e. sets of mutually interacting proteins) among different species. The algorithm first identifies protein complexes in each input network; then, orthology relationships are used to link corresponding protein complexes.

The MODULA algorithm uses ClusterOne [23] to separate modules in each input network. Then, the modules are matched using semantic similarity [24]. High-scoring pairs of modules are included in the alignment, and the algorithm iterates for as long as the alignment grows.

The algorithms from the mine-and-merge class have to be able to efficiently merge subnetworks of the input networks, which may be computationally expensive, as it is related to the (sub)graph isomorphism problem.
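The two phases of the mine-and-merge scheme described above can be sketched as follows. This is only an illustration under strong simplifying assumptions: connected components stand in for the graph clustering step (the surveyed methods use dedicated clustering algorithms such as ClusterOne), and modules are matched by the fraction of their proteins covered by a given list of putative ortholog pairs. The scoring rule, the threshold `min_score` and all names are hypothetical and do not reproduce MOPHY, the method of Jancura et al. or MODULA.

```python
def connected_components(adjacency):
    """Mine step (stand-in for clustering): one module per connected component."""
    seen, modules = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, module = [start], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(adjacency[node] - module)
        seen |= module
        modules.append(module)
    return modules

def match_modules(modules1, modules2, orthologs, min_score=0.5):
    """Merge step: pair modules across networks using ortholog coverage.

    The score is the fraction of proteins in both modules that take part in
    an ortholog pair linking the two modules (an illustrative choice).
    """
    matches = []
    for m1 in modules1:
        for m2 in modules2:
            links = [(p, q) for p, q in orthologs if p in m1 and q in m2]
            covered1 = {p for p, _ in links}
            covered2 = {q for _, q in links}
            score = (len(covered1) + len(covered2)) / (len(m1) + len(m2))
            if score >= min_score:
                matches.append((m1, m2, score))
    return matches

# Toy networks and ortholog pairs with made-up names.
net1 = {"a1": {"a2"}, "a2": {"a1"}, "a3": {"a4"}, "a4": {"a3"}}
net2 = {"b1": {"b2"}, "b2": {"b1"}, "b3": set()}
orthologs = {("a1", "b1"), ("a2", "b2")}

matches = match_modules(
    connected_components(net1), connected_components(net2), orthologs
)
```

Note that every module of one network is compared with every module of the other, which hints at why the merge phase can become expensive when many modules are mined.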
In particular, they have to consider even approximate matching among subgraphs and possible many-to-many mappings among subgraphs of two networks. Conversely, merge-and-mine algorithms alleviate these difficulties by avoiding one-to-one comparison of the topologies of the generated modules, but they are more sensitive to noise in the input data, as they use an initial node mapping list derived from biological considerations (homology relationships, i.e. sequence similarities, or, more recently, semantic similarities). The initial node mapping usually contains many-to-many relations, and algorithms may prune this initial list using some heuristics. Without any additional information on the input node mapping (i.e. the possible orthologs), the LNA problem is complex, as all possible combinations of protein pairs should be considered. Merge-and-mine algorithms analyse PINs together, usually after the alignment or merging of the PINs into a single graph [25, 26]. Several merge-and-mine algorithms follow a similar processing scheme: merge all input data (PINs, putative homologies) into a single weighted undirected graph, generally referred to as an alignment graph, and then apply a mining heuristic on top of it to identify conserved modules. Alignment graphs are built using many different models. From a conceptual point of view, the simplest formulation of the alignment graph is a graph whose nodes correspond to pairs of matched nodes (i.e. orthologous nodes) and whose edges link pairs of matched nodes (such an edge corresponds to, e.g., a potentially conserved interaction). Such a formulation is, unfortunately, sensitive to noise in the input data, i.e. missing or wrong interactions or homology relationships, as evidenced by Mina and Guzzi [26]. Consequently, the proposed algorithms use different formulations of the alignment graph that are more robust to noise. For instance, the NetworkBlast algorithm [27] relaxes the constraints related to edges.
In its formulation, there exists an edge between two nodes of the alignment graph whenever the corresponding nodes of the input networks are at distance less than or equal to a given threshold k in one network and are adjacent in the other one. One of the drawbacks of this approach is that it might introduce many unreliable edges in the alignment graph because of the presence of false positives in the input networks. The MaWish algorithm [28] addresses this problem by setting the threshold k to 2 and weighting the edges of the alignment graph by the similarity scores of the putative orthologs. This approach avoids the generation of dense graphs, but it fails to recover large subgraphs, and it is more sensitive to missing interactions, as noted by Ciriello et al. [25]. More recently, NetAligner [29] proposed a two-step strategy for building the alignment graph. First, a relatively small alignment graph is built, considering only the most reliable interactions, and it is used to divide both input networks into candidate seeds. Second, the alignment graph is extended by predicting potentially conserved interactions. NetworkBlast, MaWish and NetAligner use the concept of shortest path distance between nodes of the input networks to propose a new connection among nodes of the alignment graph. AlignNemo extends the idea of distance by considering the number of paths connecting input nodes. For each pair of orthologs, the number of short paths (of length at most 2) connecting them is used to evaluate how likely the orthologs are to be connected in both species. In this process, AlignNemo also takes into consideration the degree of each protein, penalizing paths passing through hubs and high-degree proteins. Consequently, the impact of false positives in AlignNemo is significantly reduced, under the consideration that it is unlikely that many false interactions consistently participate in short redundant paths between two proteins.
Align-MCL [30] uses the same strategy as AlignNemo to build the alignment graph. Almost all of the above merge-and-mine approaches give scores to the edges and nodes of the alignment graph. AlignNemo and Align-MCL assign higher scores to edges involving protein pairs connected by multiple short paths in the input networks. MaWish uses the sequence similarities of connected proteins to weigh the edges. NetAligner uses a more sophisticated model based on an evolutionary and probabilistic assumption: interacting proteins evolve at rates significantly closer than expected by chance. Edge scores of the alignment graph are probabilities, and they are calculated by taking into account the difference of evolutionary distances between the corresponding ortholog pairs. Once the alignment graph is built, each algorithm uses a different strategy to obtain the alignment. MaWish uses the alignment graph to cast the alignment problem as a graph optimization problem, namely the Maximum Weight Induced Subgraph problem (which is where the algorithm name, MaWISh, comes from). Starting from the alignment graph, MaWish extracts a subset of nodes that induce a maximum weighted subgraph. NetworkBlast is based on a search algorithm that first identifies high-scoring subnetwork seeds and then expands them in a greedy fashion. The scoring is based on a probabilistic model, and the algorithm iterates as long as the significance of the identified subnetwork is high. AlignNemo extracts all connected subgraphs of a given size from the alignment graph and ranks them according to weights on nodes and edges. Top-ranking fully connected subgraphs are used as seeds for the alignment solution. Each seed is expanded in an iterative fashion by adding multiple subgraphs at each step, which allows the network context of a solution to be explored beyond its immediate neighbours. Finally, Align-MCL uses Markov clustering to separate small dense subgraphs.
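To make the notion of an alignment graph more concrete, the following is a minimal Python sketch, not the implementation of any of the cited tools. It assumes one reading of the relaxed edge rule described above: two ortholog pairs are linked when their members are adjacent in one input network and within distance k in the other, so that k = 1 gives the simplest formulation (both interactions present). The function names, the adjacency-dictionary input format and the choice to skip pairs that share a protein are illustrative assumptions; edge weighting and the mining step are omitted.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, max_dist):
    """Return {node: hop distance} for all nodes within max_dist of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand beyond the distance threshold
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_alignment_graph(adj1, adj2, ortholog_pairs, k=1):
    """Build the edge set of a simple alignment graph (hypothetical helper).

    Nodes of the alignment graph are the putative ortholog pairs (u1, u2).
    Two pairs are linked when their members are adjacent in one network
    and at distance <= k in the other (assumed reading of the relaxed rule).
    """
    near1 = {u: bfs_distances(adj1, u, k) for u, _ in ortholog_pairs}
    near2 = {v: bfs_distances(adj2, v, k) for _, v in ortholog_pairs}
    edges = set()
    for (a1, a2), (b1, b2) in combinations(ortholog_pairs, 2):
        if a1 == b1 or a2 == b2:
            continue  # simplification: ignore pairs sharing a protein
        d1 = near1[a1].get(b1)
        d2 = near2[a2].get(b2)
        if d1 is None or d2 is None:
            continue  # too far apart in at least one network
        if (d1 == 1 and d2 <= k) or (d2 == 1 and d1 <= k):
            edges.add(((a1, a2), (b1, b2)))
    return edges
```

With k = 1 only interactions present in both networks produce an edge, whereas k = 2 also tolerates one missing interaction in one of the networks, at the price of admitting more unreliable edges.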
All the approaches discussed above can align only two networks and cannot deal with multiple networks. Only NetworkBlast has been extended, into NetworkBlast-M [31], to align multiple networks. Other algorithms, such as GASOLINE [32], have been designed specifically for the multiple local alignment problem.
Global network alignment
Next, we review existing work on GNA [13, 14]. Traditional GNA uses a two-stage procedure: (1) use a cost function to compute pairwise similarities between nodes of the compared networks and (2) use an alignment strategy to rapidly identify, from all possible alignments, a high-scoring alignment with respect to the total similarity over all aligned nodes. A typical cost function aims to quantify topological similarities between nodes, meaning that two nodes whose (extended) network neighbourhoods match well will have high topological similarity. In addition, most existing GNA methods allow for integrating sequence information (i.e. sequence-based node similarities) into the cost function. A typical alignment strategy uses the precomputed node similarities (resulting from the cost function) to produce an alignment, often in an iterative seed-and-extend fashion. Prominent examples of two-stage GNA methods are the IsoRank [33, 34] and GRAAL [35–37] approach families, as well as GHOST [38]. These approaches vary in their cost functions. Namely, IsoRank, GRAAL and GHOST compute similarities between two nodes with respect to the nodes’ PageRank-like [33, 34], graphlet-based [39] and spectral signature-based [38] ‘topological signatures’, respectively. These approaches differ in their alignment strategies as well; for details, see [13]. What the traditional two-stage GNA methods have in common is that their alignment strategies use precomputed node similarities without updating the node similarities while iteratively building on top of the current alignment.
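The two-stage procedure can be illustrated with a minimal Python sketch, given here under strong simplifying assumptions and not as the method of any cited aligner. The cost function uses a toy degree-ratio similarity as a stand-in for the PageRank-like, graphlet-based or spectral signatures named above, optionally blended with sequence similarity through a hypothetical weight `alpha`; the alignment strategy is a plain greedy selection over the precomputed scores, which are never updated. All names are illustrative.

```python
def node_similarity(adj1, adj2, seq_sim=None, alpha=1.0):
    """Stage 1 (cost function): score every node pair across the networks.

    topo is a toy degree-ratio similarity in [0, 1]; alpha weighs topology
    against optional sequence similarity (both are illustrative assumptions).
    """
    seq_sim = seq_sim or {}
    sims = {}
    for u, nbrs_u in adj1.items():
        for v, nbrs_v in adj2.items():
            du, dv = len(nbrs_u), len(nbrs_v)
            topo = 1.0 if max(du, dv) == 0 else min(du, dv) / max(du, dv)
            sims[(u, v)] = alpha * topo + (1 - alpha) * seq_sim.get((u, v), 0.0)
    return sims


def greedy_align(sims):
    """Stage 2 (alignment strategy): greedy one-to-one mapping.

    Pairs are visited from most to least similar (ties broken by name) and
    accepted when both nodes are still unaligned. The precomputed
    similarities are used as is and are not updated along the way.
    """
    alignment = {}
    used = set()
    for (u, v), _score in sorted(sims.items(), key=lambda kv: (-kv[1], kv[0])):
        if u not in alignment and v not in used:
            alignment[u] = v
            used.add(v)
    return alignment
```

For example, `greedy_align(node_similarity(adj1, adj2))` returns a dictionary mapping each node of the first network to a distinct node of the second network, provided the second network has at least as many nodes.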
However, nodes that are already aligned at a given iteration of the alignment strategy might convey valuable information for guiding the remaining iterations of the strategy. Therefore, it could be desirable to update the node similarities (i.e. cost function) in each iteration [13]. Consequently, newer approaches have been designed to allow for this, such as NETAL [40] and WAVE [41]. Regardless of whether the existing two-stage GNA methods allow for updating their cost functions during the alignment construction process, most of them have their own cost functions as well as alignment strategies. Consequently, when a given approach is found to be superior to another, it is not clear whether this superiority comes from the winner’s cost function, its alignment strategy or both. So, to fairly evaluate different two-stage GNA methods, their different cost functions should be compared under the same alignment strategy, and their different alignment strategies should be compared under the same cost function [42, 43]. It was shown by mixing and matching the algorithmic components of MI-GRAAL [37] and IsoRankN [34], as well as of MI-GRAAL and GHOST, that the combination of the cost function of one method and the alignment strategy of another method typically beats each original method [42, 43]. Another key finding was that, of all cost functions, the graphlet-based node similarities [39] lead to the best alignments. Traditional GNA methods, as discussed up to this point, identify from possible alignments the high-scoring alignments with respect to the overall node similarity (or node conservation). However, they then evaluate the accuracy of the alignments with some other measure that is different from the node similarity used to construct the alignments. Typically, one measures the amount of conserved edges. Thus, the traditional methods align similar nodes between networks hoping to conserve many edges, but edge conservation is measured only after the alignments are constructed.
Instead, an effort was made to directly maximize the amount of conserved edges during the alignment construction process, which indeed improved alignment quality. The resulting approach is called MAGNA [44]. The core algorithm of MAGNA was incorporated into a user-friendly and parallelized GNA framework called MAGNA++ [45], which allows for optimizing both node and edge conservation during the alignment construction process. This further increased alignment quality compared with optimizing node conservation only (as the traditional methods do) or edge conservation only (as MAGNA does). Unlike the two-stage GNA approaches, MAGNA++ (and thus MAGNA) is a search-based aligner, meaning that it can directly ‘optimize’ any alignment quality measure (i.e. cost function) via a strategy such as a genetic algorithm (which is what MAGNA and MAGNA++ use) or simulated annealing (which is what an alternative approach called NetCoffee uses [46]). Several additional search-based aligners have appeared recently, including NABEECO [47], GEDEVO [48] and Optnetalign [49]. Additional GNA efforts related to improving edge conservation during alignment construction include WAVE [41] and GREAT [50]. Both of these approaches optimize both node and edge conservation, just like MAGNA++, but, unlike MAGNA++, they also weigh each conserved edge to favour conserved edges that are topologically similar over conserved edges that are topologically dissimilar. MAGNA++ differs from the other two approaches in that it is search-based, while WAVE and GREAT are two-stage approaches. WAVE is unique in the sense that it is only the alignment strategy of a GNA algorithm, and, thus, it can be combined with any cost function. It was shown to work the best under the graphlet-based cost function.
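The idea behind search-based alignment, i.e. directly optimizing an alignment quality measure rather than precomputed node similarities, can be illustrated with a deliberately simple Python sketch. It is not the genetic algorithm of MAGNA or MAGNA++ nor simulated annealing: as a swapped-in technique, it uses a deterministic swap-based hill climber, and its objective is the raw count of conserved edges. The function names and the input format (edge lists and a starting one-to-one alignment) are assumptions made for illustration.

```python
from itertools import combinations


def conserved_edges(edges1, edges2, alignment):
    """Count edges (u, v) of network 1 whose images form an edge in network 2."""
    return sum(
        1 for u, v in edges1
        if frozenset((alignment[u], alignment[v])) in edges2
    )


def hill_climb(edges1, edges2, alignment):
    """Improve an alignment by swapping the images of two aligned nodes.

    A swap is kept only if it strictly increases the number of conserved
    edges; the search stops when no single swap helps (a local optimum).
    Nodes of network 2 outside the starting alignment are never considered,
    which is a simplification of real search-based aligners.
    """
    edges2 = {frozenset(e) for e in edges2}
    current = dict(alignment)
    best = conserved_edges(edges1, edges2, current)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(sorted(current), 2):
            current[u], current[v] = current[v], current[u]
            score = conserved_edges(edges1, edges2, current)
            if score > best:
                best = score
                improved = True
            else:
                # Undo the swap when it does not help.
                current[u], current[v] = current[v], current[u]
    return current, best
```

Replacing `conserved_edges` with another objective (e.g. a combination of node and edge conservation) changes what is optimized without changing the search loop, which is the flexibility that the text attributes to search-based aligners.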
GREAT is unique in the sense that it approaches the GNA problem from a novel perspective, by first aligning edges well between networks to improve the cost function that is then needed to align nodes well between the networks. Yet, this makes GREAT slower than both WAVE and MAGNA++, and more efforts are needed to speed it up to make it practically useful for large networks. Some existing GNA approaches do not necessarily fit any of the above GNA categorizations explicitly. So, we acknowledge such methods by briefly commenting on their unique feature(s), without explaining their algorithmic principles in detail, as follows. Unlike traditional GNA, a method introduced in [51] uses a notion of an alignment graph, just as LNA does. Also unlike traditional GNA, a method introduced in [52] starts its alignment construction process from an alignment that is optimal based purely on sequence data and uses topological information only to compensate for matching proteins whose sequences are not particularly similar to one another. The above discussion has focused on contrasting the algorithmic principles of different existing GNA methods. Another important aspect of their differences is whether they can align only two networks at once or more than two (multiple) networks. Prominent recent pairwise methods (many of which are already discussed above) include GHOST, NETAL, MAGNA++, WAVE, OptNetAlign, L-GRAAL [36], Natalie 2.0 [53], SUMONA [54], ModuleAlign [55] and SANA [56], while prominent recent multiple methods include IsoRankN, MI-Iso [42], SMETANA [57], GEDEVO-M [58], BEAMS [59], FUSE [60], SMAL [61] and multiMAGNA++ [62]. For a systematic comparison of recent pairwise GNA methods, see [18]. For a systematic comparison of recent multiple GNA methods, see [62]. Note that some of these recent methods are new, as they have appeared in parallel to each other as well as to this survey and were, thus, not directly evaluated against each other.
Such an evaluation will clearly have to be done in the near future to properly position these newest methods compared with each other and the others. We conclude our discussion of the existing GNA methods by commenting on what it means for an alignment to be of high quality. Conceptually, one wants to measure topological and functional alignment quality. In terms of topological alignment quality, one wants to measure node correctness, i.e. how correctly the nodes have been mapped under the true node mapping, when such a mapping is known beforehand but is hidden during the alignment construction process [18]. More formally, node correctness is the percentage of nodes in the smaller of the aligned networks that are correctly aligned according to the true node mapping. Unfortunately, in real-life applications, such a true mapping is typically unknown. So, an alternative measure of topological alignment quality is the amount of conserved edges. A popular measure quantifying this is symmetric substructure score (S3) [18, 44] or its successor generalized S3 (GS3) [18]. More formally, S3 is the ratio of the number of conserved (aligned) edges to the number of both conserved and non-conserved edges. Here, an edge is conserved in the following sense: if u and v are nodes in one network and u’ and v’ are nodes in another network, if there is an edge between u and v and an edge between u’ and v’, and if u has been aligned to u’ and v has been aligned to v’, then the edge (u,v) has been aligned to the edge (u’,v’), and, consequently, this edge is conserved. Although S3 is restricted to one-to-one (injective) GNA, GS3 was recently proposed as an intuitive extension of S3 that can deal with both LNA and GNA (this is important for reasons discussed in the following section). For details on GS3, see [18].
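The two topological measures defined above can be written down in a few lines of Python. The sketch below is illustrative and assumes a one-to-one alignment given as a dictionary from nodes of the smaller network to nodes of the larger one. For S3 it uses the commonly cited formulation in which the denominator counts the edges of the smaller network plus the edges of the larger network induced on the aligned nodes, minus the conserved edges (so that each conserved edge is counted once); this specific denominator is stated here as an assumption about what "conserved and non-conserved edges" means, and the function names are hypothetical. GS3 is not covered.

```python
def edge_set(adj):
    """Undirected edge set of an adjacency dictionary, as sorted tuples."""
    return {tuple(sorted((u, v))) for u, nbrs in adj.items() for v in nbrs if u != v}


def node_correctness(alignment, true_mapping, num_nodes_smaller):
    """Fraction of nodes of the smaller network that are aligned correctly."""
    correct = sum(1 for u, v in alignment.items() if true_mapping.get(u) == v)
    return correct / num_nodes_smaller


def s3(adj1, adj2, alignment):
    """Symmetric substructure score of a one-to-one alignment (sketch).

    conserved: edges of network 1 whose aligned images are edges of network 2.
    induced:   edges of network 2 between nodes that are images of network 1.
    """
    e1, e2 = edge_set(adj1), edge_set(adj2)
    image = set(alignment.values())
    conserved = sum(
        1 for u, v in e1
        if u in alignment and v in alignment
        and tuple(sorted((alignment[u], alignment[v]))) in e2
    )
    induced = sum(1 for x, y in e2 if x in image and y in image)
    denominator = len(e1) + induced - conserved
    return conserved / denominator if denominator else 0.0
```

Aligning a three-node path onto a triangle, for instance, conserves both path edges but leaves one triangle edge unmatched, which gives S3 = 2 / (2 + 3 − 2) = 2/3.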
An alternative measure quantifying edge conservation is the size of the common subgraph [18], which does not just quantify the amount of conserved edges (as S3 or GS3 does), but it also accounts for how the conserved edges are organized, i.e. whether they form large, dense and contiguous conserved network regions (which is desired). Finally, in terms of topological quality, especially when an alignment is not one-to-one (see the following section), one wants to measure node coverage (NCV), which captures the percentage of all nodes across all of the aligned networks that are a part of the alignment. In terms of functional alignment quality, one wants to measure whether the aligned nodes share similar biological functions; popular measures quantifying this are gene ontology (GO) correctness, GO semantic similarity, normalized mean entropy or accuracy of protein function prediction (where the latter is measured in terms of precision (P-PF), recall (R-PF) and F-score (F-PF)) [18, 62]. For more details on these and other measures of alignment quality for pairwise and multiple GNA, see [18] and [62], respectively.
Reconciliation of local and global network alignment
Next, we touch in more detail on the question of which one is better: LNA or GNA? Only recently, a study was conducted towards answering this question, representing the first ever comparison of the two NA method categories, which evaluated 10 prominent LNA and GNA methods [18]. In the process, as LNA and GNA result in different output types (partial many-to-many node mapping versus one-to-one injective matching, respectively), the same study introduced new measures of alignment quality that allow for fair comparison of the different output types.
The 10 considered methods were evaluated comprehensively, on both synthetic networks with known true node mapping and real-world networks with unknown true node mapping, examining in the process the impact on the results of using different PPI types and PPIs of varying confidence, as well as using only topological node similarities in the cost function versus including also protein sequence information, and evaluating both topological and functional alignment quality. Key results of this evaluation study were as follows [18]. When using only topological information in the cost function, GNA outperformed LNA both topologically and functionally (note that when parallelizable methods are run on multiple cores, GNA is also faster than LNA). When sequence information was also included, GNA was superior to LNA in terms of topological alignment quality, while LNA was superior to GNA in terms of functional alignment quality, indicating the complementarity of the two NA categories (note that when parallelizable methods are run on multiple cores, GNA is at least as fast as LNA). The results were overall robust to the choice of PIN data, meaning that both different PPI types and confidence levels led to consistent results in all cases topologically and in most cases functionally. Importantly, when the 10 NA methods were used to predict novel protein functional knowledge via across-species knowledge transfer, LNA and GNA produced different predictions, which further confirmed their complementarity. An important observation of this study[18] and some others [17, 38, 42] was that different measures of topological alignment quality tended to correlate well with each other, and different measures of functional alignment quality tended to correlate well with each other, while topological measures and functional measures typically did not correlate well. This observation held for both LNA and GNA [18]. 
This indicates that the topological and functional fits between aligned networks conflict to a larger extent than previously realized. An explanation could be that the discovery of the current experimental biological knowledge may have been mostly guided by sequence-based analyses and not by network-based analyses. Findings from [18] support this, as do findings of an alternative study [52]. If the current experimental biological knowledge is indeed biased towards sequence data, given that sequences and network topology can lead to complementary biological insights [63], the above observation should not be surprising. Importantly, network topology is a valuable source of biological knowledge that can lead to novel insights compared with sequence data alone and can, thus, be used to redefine the traditional notion of sequence-based homology to a new notion of network-based homology [18]. As LNA has high functional but low topological alignment quality, while GNA has high topological but low functional alignment quality, a recent approach called IGLOO was proposed that integrates algorithmic components from both LNA and GNA in the hope of reconciling the two NA types [64]. That is, IGLOO aims to inherit the high functional quality of LNA and the high topological quality of GNA. IGLOO’s inputs are two networks and pairwise similarity scores between their nodes, where the scores are computed via some cost function. This is the same as the input of existing LNA and GNA methods. Then, IGLOO produces an alignment in a similar way as GNA does, by first identifying a high-scoring seed alignment and then expanding around the seed via an alignment strategy. However, IGLOO differs from GNA as follows. Although GNA uses as its seed just a single pair of nodes from the compared networks, IGLOO’s seed is a local alignment (or a part of it) of high functional quality generated by an existing LNA method.
Given the seed alignment, IGLOO expands around it via an existing alignment strategy to increase the topological quality of the alignment. The difference between IGLOO and LNA is that IGLOO builds on top of the given local alignment to improve its topological quality. The difference between IGLOO and GNA is that IGLOO uses as the seed a local alignment of high functional quality (or a part of it) rather than just a single node pair, to improve the functional quality of GNA. Namely, IGLOO varies the size of the seed from the entire local alignment (i.e. 100% of it) on one extreme (this version, denoted as IGLOO 4, is expected to resemble LNA the most) to only a single node pair (i.e. 0% of the local alignment) on the other extreme (this version, denoted as IGLOO 0, is expected to resemble GNA the most), with several in-between versions of IGLOO that use as the seed a certain portion (between 100 and 0%, exclusively) of the local alignment (these versions, denoted by IGLOO 3 to IGLOO 1, are expected to balance between the high functional quality of LNA and the high topological quality of GNA). As a result, IGLOO’s alignment is local in the sense that it allows for many-to-many mapping between nodes of the compared networks, just as LNA does. Yet, its alignment is global in the sense that it allows for mapping large conserved subgraphs across the compared networks, just as GNA does. IGLOO was evaluated comprehensively [64], against the same 10 existing NA methods, on the same data, and using the same alignment quality measures as in the above evaluation study [18]. IGLOO produced a better trade-off between topological and functional alignment quality than the existing LNA and GNA methods (Figure 2). Namely, across all NA methods and network pairs, IGLOO was comparable or superior to the existing methods both functionally and topologically in 62% of all cases. Figure 2.
Topological and functional alignment quality for existing prominent LNA methods (triangles), existing prominent GNA methods (stars) and IGLOO versions (circles), when aligning yeast and human PINs with yeast two-hybrid PPIs. The measure of topological alignment quality used in the figure is NCV combined with GS3, and the measure of functional alignment quality used in the figure is P-PF and R-PF of protein function prediction combined into F-PF; see the text for description of these measures, which are proven evaluation criteria for both LNA and GNA that can compare the two fairly [18, 65]. In general, LNA results in high functional but low topological alignment quality, while GNA results in high topological but low functional alignment quality [18, 65]. As IGLOO (in particular, IGLOO 4 and also IGLOO 3) is superior to each LNA method both topologically and functionally (these findings hold for all analysed PINs), as its minimum contribution, IGLOO is a new state-of-the-art LNA method. At the same time, IGLOO (the same two IGLOO versions) drastically increases functional alignment quality of all GNA methods without significantly lowering their topological alignment quality. Over all network data sets and all GNA methods, IGLOO’s average improvement in functional quality is 331%, while IGLOO’s average decrease in topological alignment quality is only 40%. The figure is taken from [65].
Conclusion and future directions
This survey has presented a systematic overview of current algorithms for LNA and GNA. In particular, we have mainly focused on the following issue. Although LNA aims to find highly functionally conserved modules, the modules that the current LNA methods can find are small (i.e. topologically suboptimal). On the other hand, while GNA aims to find overall (global) regions of topological similarity between the compared networks, the regions that the current GNA methods can find are typically poorly functionally conserved. Hence, we have asked whether it is possible to find both topologically large and highly functionally conserved regions of network similarity, which would still satisfy design/application goals of each of LNA and GNA but would also overcome their drawbacks.
That is, we have asked: (1) How much can one expand around the small highly functionally conserved network regions that the current LNA methods can find to improve their topological quality while still preserving their functional quality? and (2) How much does one need to shrink the large regions of topological similarity that the current GNA methods can find to improve their functional quality while still preserving (or at least without drastically decreasing) their topological quality? We have presented early evidence [18, 64] that: (1) small highly functionally conserved modules found by the existing LNA methods can be drastically enlarged to improve their topological quality without decreasing (or while also increasing!) their functional quality and (2) large regions of topological similarity found by the existing GNA methods are typically not functionally meaningful but can be refined to only functionally meaningful regions, which can drastically improve functional quality without drastically decreasing their topological quality. Such an efficient integration of LNA and GNA, which preserves the original goals of each of LNA and GNA while at the same time improving both, is a win–win for both scientists who aim to find highly functionally conserved local network regions (in which case they would find larger but equally functionally meaningful regions than the existing LNA methods can find) and scientists who aim to find highly topologically conserved global network regions (in which case they would find slightly smaller but much more biologically meaningful regions than the existing GNA methods can find). Importantly, balancing between the size (i.e. topological quality) of an aligned network region and its functional quality might need to be regulated with a parameter of the given (existing or future) LNA-GNA integrative method.
For example, IGLOO (the only current such integrative method) has a parameter that allows for this, as illustrated in Figure 2; certain values of this parameter are intended to compete with the existing LNA methods, while other values of this parameter are intended to compete with the existing GNA methods. The choice of this parameter value might be application-dependent. But a single method (e.g. IGLOO) should be able to achieve both goals (high topological quality and high functional quality), whereas none of the existing LNA or GNA methods allows for this. So, the need to develop more such LNA-GNA integrative methods, which can compete with both LNA and GNA, is the key point of our article. In addition to the above issue, several other questions remain open, as follows. From a practical point of view, one of the central questions to be addressed is: Which method should a biological scientist use? From a biological perspective, GNA and LNA have similarities to global and local sequence alignment. The former is used to compare whole genomes, looking to highlight (dis)similarities among species, while the latter is used to find conserved functional motifs (e.g. promoter or regulatory sequences). In a similar way, GNA tries to find the best overall alignment among compared networks, and it can be used to compare interactomes of different species, while LNA may be used to analyse conserved functional network modules (e.g. pathways or protein complexes). However, this distinction between application goals of LNA and GNA is not rigid. Namely, while LNA in general cannot recover large regions of similarity [65], GNA has been reported in the literature as able to recover biologically meaningful pathways or evolutionarily conserved protein complexes [35, 66]. Moreover, the answer to the above question is related to other issues such as: How to compare two different alignments? and How to measure the quality of an alignment?
To the best of our knowledge, this problem has been partially addressed in other works. In [42], the authors investigated GNA methods, and they proposed a novel measure for the quality of the alignment. In [26], the authors discussed LNA methods, and they proposed a framework to improve the robustness of the local alignments. More recently, in [18, 64] the authors discussed the integration of LNA and GNA, and they proposed a comparison among them. It should be noted that the comparison is complex even among aligners of the same type (i.e. either LNA or GNA), and it is especially hard when comparing aligners of the different classes (i.e. LNA and GNA). Therefore, in [18], the authors proposed new measures of alignment quality that are applicable to both LNA and GNA. Yet, the question of proper evaluation is far from being settled. The existing metrics are either based only on network topology or use functional considerations. Nevertheless, aligners that produce good alignments regarding functional quality are usually not equally good regarding topological quality, and vice versa. Therefore, there is the need to unify the existing measures into a single consistent framework. Consequently, we emphasize again that one of the future challenges is the definition of a formal way to represent and analyse the output of the network aligners, similarly to other graph analysis related fields. For instance, the comparison of clustering and community detection algorithms relies on some accepted metrics and gold standards that are still absent in the context of the comparison of network alignment algorithms [10]. In the scenario we envision, each obtained alignment should be measured using some accepted metrics, and all the aligners should be benchmarked on the same group of gold-standard networks. Such comparison is hard. For instance, PINs evolve continuously. 
Consequently, aligners of these networks have been evaluated on different data sets, which makes their comparison hard [26]. A common framework for comparing aligners may also open the possibility of expanding their usage. For instance, network aligners could be used to measure the (dis)similarity among two or more networks, going beyond the well-known graph edit distance [67]. Moreover, as aligners are usually tailored to the structure of the analysed networks, they may also reveal the (dis)similarity of subnetworks (e.g. protein modules or complexes), highlighting changes in such modular structures across different networks or conditions.

Although in our survey we focused on the alignment of PINs, network alignment has also been applied to other biological network types, such as metabolic [68] or gene co-expression [69] networks. Moreover, other emerging application fields, e.g. the comparison of networks representing the structural connections within the brain, pose many interesting questions to researchers. For example, can existing algorithms for comparing biological networks simply be applied to networks in other domains, or do the existing methods need significant changes to customize them to the applied problem at hand? LNA methods are usually tightly coupled to the biological question addressed. For instance, some of them aim to identify conserved protein complexes among organisms [70]. Specifically, many LNA approaches (e.g. MaWiSH, NetworkBLAST and AlignNemo) rely on biological models of protein complexes, i.e. of the small aligned regions that are the output of the alignment. Consequently, applying such aligners in other fields poses challenges that need to be solved. On the other hand, GNA methods usually rely more on topology during the alignment steps; therefore, their application to other fields may be simpler than that of LNA approaches.
Another important question related to the applicability of the existing biological network alignment methods to other domains, such as social networks [71–73], is that of scalability. Namely, as social networks are typically much larger (millions of nodes) than biological ones (thousands of nodes), the existing biological network alignment methods might need to be redesigned to align such large networks. Although some of the existing methods, such as the GNA methods MAGNA++ and multiMAGNA++, exploit parallelism and multithreading, there is still significant room for improving the efficiency of algorithms that suffer from known scalability problems. The development of cloud-based aligners that exploit elastic computing may be a possible future research trend.

Key Points

The article discusses the topic of biological network alignment, which, analogous to genomic sequence alignment, has the potential to revolutionize our understanding of cellular functioning.
The article discusses recent approaches for LNA and GNA and contrasts the two approach types, indicating their complementarity in uncovering new biological knowledge.
The article discusses a possible reconciliation of the two complementary approach types, a topic that has been explored only recently and that thus remains an open research problem.
The article outlines additional future research directions, such as the definition of novel theories for a better understanding of the network alignment output, which are needed for a better comparison of network alignment methods, for guiding future method development by computational scientists and for guiding applied (e.g. biological) scientists on how to use the network alignment output to learn new biology.
The article outlines the possibility of using existing network alignment methods in other emerging fields.
Funding

The United States Air Force Office of Scientific Research (AFOSR) Young Investigator Research Program (YIP) (grant number FA9550-16-1-0147 to T.M.), National Science Foundation (NSF) (grant number CCF-1319469 to T.M.), The Italian Ministry of Education and Research (MIUR): BA2Know-Business Analytics to Know (grant number PON03PE_00001_1 to P.H.G.).

Pietro H. Guzzi is an associate professor of Computer Science Engineering at the University ‘Magna Græcia’ of Catanzaro, Italy. His research interests comprise semantic-based and network-based analysis of biological and clinical data.

Tijana Milenković is an associate professor of Computer Science and Engineering at the University of Notre Dame. She researches network science and computational biology. She won NSF CAREER 2015 and AFOSR YIP 2016 awards, among others. Milenković is an Associate Editor of IEEE/ACM TCBB.

References

1 Barabasi A-L, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet 2004;5:101–13.
2 Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet 2007;8:450–61.
3 Gavin A-C, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002;415:141–7.
4 Calin GA, Croce CM. MicroRNA-cancer connection: the beginning of a new tale. Cancer Res 2006;66:7390–4.
5 Tuncbag N, Kar G, Keskin O, et al. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinform 2008;10:217–32.
6 Cannataro M, Guzzi PH, Veltri P. Protein-to-protein interactions: Technologies, databases, and algorithms. ACM Comput Surv 2010;43(1):1.
7 Bertolazzi P, Bock ME, Guerra C. On the functional and structural characterization of hubs in protein–protein interaction networks. Biotechnol Adv 2013;31:274–86.
8 Aittokallio T, Schwikowski B. Graph-based methods for analysing networks in cell biology. Brief Bioinform 2006;7:243–55.
9 Panni S, Rombo SE. Searching for repetitions in biological networks: methods, resources and tools. Brief Bioinform 2015;16:118–36.
10 Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms. Phys Rev E 2008;78:046110.
11 Fortunato S. Community detection in graphs. Phys Rep 2010;486:75–174.
12 Faisal FE, Milenković T. Dynamic networks reveal key players in aging. Bioinformatics 2014;30:1721–9.
13 Faisal FE, Meng L, Crawford J, et al. The post-genomic era of biological network alignment. EURASIP J Bioinform Syst Biol 2015;2015:1–19.
14 Elmsallati A, Clark C, Kalita J. Global alignment of protein-protein interaction networks: a survey. IEEE/ACM Trans Comput Biol Bioinform 2015;13:689–705.
15 Sharan R, Suthram S, Kelley RM, et al. Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA 2005;102:1974–9.
16 Cook SA. The complexity of theorem-proving procedures. In: STOC '71 Proceedings of the third annual ACM symposium on Theory of computing, ACM Press, NY, 1971, pp. 151–8.
17 Clark C, Kalita J. A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics 2014;30:2351–9.
18 Meng L, Striegel A, Milenković T. Local versus global biological network alignment. Bioinformatics 2016;32:3155–64.
19 Erten S, Li X, Bebek G, et al. Phylogenetic analysis of modularity in protein interaction networks. BMC Bioinformatics 2009;10:333.
20 Jancura P, Mavridou E, Carrillo-de Santa Pau E, et al. A methodology for detecting the orthology signal in a PPI network at a functional complex level. BMC Bioinformatics 2012;13(Suppl 1):S18.
21 Lancichinetti A, Fortunato S. Community detection algorithms: a comparative analysis. Phys Rev E 2009;80:056117.
22 Guzzi PH, Veltri P, Roy S, et al. MODULA: a network module based local protein interaction network alignment method. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE Press, NY, 2015, pp. 1620–3.
23 Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods 2012;9:471–72.
24 Guzzi PH, Mina M, Guerra C, et al. Semantic similarity analysis of protein data: assessment with biological features and issues. Brief Bioinform 2012;13:569–85.
25 Ciriello G, Mina M, Guzzi PH, et al. AlignNemo: a local network alignment method to integrate homology and topology. PloS One 2012;7:e38107.
26 Mina M, Guzzi PH. Improving the robustness of local network alignment: design and extensive assessment of a Markov clustering-based approach. IEEE/ACM Trans Comput Biol Bioinform 2014;11:561–72.
27 Kalaev M, Smoot M, Ideker T, et al. NetworkBLAST: comparative analysis of protein networks. Bioinformatics 2008;24:594–6.
28 Koyutürk M, Kim Y, Topkara U, et al. Pairwise alignment of protein interaction networks. J Comput Biol 2006;13:182–99.
29 Pache RA, Ceol A, Aloy P. NetAligner - a network alignment server to compare complexes, pathways and whole interactomes. Nucleic Acids Res 2012;40:W157–61.
30 Mina M, Guzzi PH. AlignMCL: comparative analysis of protein interaction networks through Markov clustering. In: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), IEEE Press, NY, 2012, pp. 174–81.
31 Kalaev M, Bafna V, Sharan R. Fast and accurate alignment of multiple protein networks. J Comput Biol 2009;16:989–99.
32 Micale G, Pulvirenti A, Giugno R, et al. GASOLINE: a greedy and stochastic algorithm for optimal local multiple alignment of interaction networks. PloS One 2014;9:e98750.
33 Singh R, Xu J, Berger B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci USA 2008;105:12763–68.
34 Liao C-S, Lu K, Baym M, et al. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 2009;25:i253–8.
35 Kuchaiev O, Milenković T, Memišević V, et al. Topological network alignment uncovers biological function and phylogeny. J R Soc Interface 2010;7:1341–54.
36 Malod-Dognin N, Pržulj N. L-GRAAL: Lagrangian Graphlet-based network aligner. Bioinformatics 2015;31:2182–9.
37 Kuchaiev O, Pržulj N. Integrative network alignment reveals large regions of global network similarity in yeast and human. Bioinformatics 2011;27:1390–6.
38 Patro R, Kingsford C. Global network alignment using multiscale spectral signatures. Bioinformatics 2012;28:3105–14.
39 Milenković T, Pržulj N. Uncovering biological network function via graphlet degree signatures. Cancer Inform 2008;6:257–73.
40 Neyshabur B, Khadem A, Hashemifar S, et al. NETAL: a new graph-based method for global alignment of protein–protein interaction networks. Bioinformatics 2013;29:1654–62.
41 Sun Y, Crawford J, Tang J, et al. Simultaneous optimization of both node and edge conservation in network alignment via WAVE. In: Algorithms in Bioinformatics, Volume 9289 of the series Lecture Notes in Computer Science, Springer Verlag, 2015, pp. 16–39.
42 Faisal F, Zhao H, Milenković T. Global network alignment in the context of aging. IEEE/ACM Trans Comput Biol Bioinform 2014;12:40–52.
43 Crawford J, Sun Y, Milenković T. Fair evaluation of global network aligners. Algorithms for Molecular Biology 2015;10:19.
44 Saraph V, Milenković T. MAGNA: Maximizing Accuracy in Global Network Alignment. Bioinformatics 2014;30:2931–40.
45 Vijayan V, Saraph V, Milenković T. MAGNA++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation. Bioinformatics 2015;31:2409–11.
46 Hu J, Kehr B, Reinert K. NetCoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks. Bioinformatics 2014;30:540–8.
47 Ibragimov R, Malek M, Guo J, et al. NABEECO: biological network alignment with bee colony optimization algorithm. In: GECCO '13 Companion Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, ACM Press, NY, 2013, pp. 43–44.
48 Ibragimov R, Malek M, Guo J, et al. GEDEVO: an evolutionary graph edit distance algorithm for biological network alignment. German Conf Bioinformatics (GCB) 2013;34:68–79.
49 Clark C, Kalita J. A multiobjective memetic algorithm for PPI network alignment. Bioinformatics 2015;31:1988–98.
50 Crawford J, Milenković T. GREAT: GRaphlet Edge-based network AlignmenT. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015, pp. 220–227.
51 Klau G. A new graph-based method for pairwise global network alignment. BMC Bioinformatics 2009;10:S59.
52 Chindelevitch L, Liao C-S, Berger B. Local optimization for global alignment of protein interaction networks. Pac Symp Biocomput 2010;123–32.
53 El-Kebir M, Heringa J, Klau GW. Natalie 2.0: sparse global network alignment as a special case of quadratic assignment. Algorithms 2015;8:1035–51.
54 Tuncay EG, Can T. SUMONA: a supervised method for optimizing network alignment. Comput Biol Chem 2016;63:41–51.
55 Hashemifar S, Ma J, Naveed H, et al. ModuleAlign: module-based global alignment of protein–protein interaction networks. Bioinformatics 2016;32:i658–64.
56 Mamano N, Hayes W. SANA: simulated annealing network alignment applied to biological networks. arXiv 2016;q-bio.MN.
57 Sahraeian SME, Yoon B-J. SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks. PloS One 2013;8:e67995.
58 Ibragimov R, Malek M, Guo J, et al. Multiple graph edit distance - simultaneous topological alignment of multiple protein-protein interaction networks with an evolutionary algorithm. In: GECCO '14 Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, IEEE Press, NY, 2014, pp. 277–84.
59 Alkan F, Erten C. BEAMS: backbone extraction and merge strategy for the global many-to-many alignment of multiple PPI networks. Bioinformatics 2014;30:531–9.
60 Gligorijević V, Malod-Dognin N, Pržulj N. FUSE: multiple network alignment via data fusion. Bioinformatics 2015;32:1195–203.
61 Dohrmann J, Singh R. The SMAL web server: global multiple network alignment from pairwise alignments. Bioinformatics 2016;32:3330–2.
62 Vijayan V, Milenković T. Multiple network alignment via multiMAGNA. arXiv:1604.01740 [q-bio.MN] 2016.
63 Memišević V, Milenković T, Pržulj N. Complementarity of network and sequence information in homologous proteins. J Integr Bioinform 2010;7(3):135.
64 Meng L, Crawford J, Striegel A, et al. IGLOO: integrating global and local biological network alignment. In: 12th International Workshop on Mining and Learning with Graphs (MLG) 2016.
65 Kuchaiev O, Milenković T, Memišević V, et al. Topological network alignment uncovers biological function and phylogeny. J R Soc Interface 2010;7:1341–54.
66 Kelley BP, Sharan R, Karp RM, et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA 2003;100:11394–9.
67 Zager L, Verghese G. Graph similarity scoring and matching. Appl Math Lett 2008;21:86–94.
68 Pah AR, Guimerà R, Mustoe AM, et al. Use of a global metabolic network to curate organismal metabolic networks. Sci Rep 2013;3:1695.
69 Ma C-Y, Lin S-H, Lee C-C, et al. Reconstruction of phyletic trees by global alignment of multiple metabolic networks. BMC Bioinformatics 2013;14:S12.
70 Cannataro M, Guzzi PH, Veltri P. IMPRECO: distributed prediction of protein complexes. Future Gener Comput Syst 2010;26:434–40.
71 Narayanan A, Shi E, Rubinstein BIP. Link prediction by de-anonymization: how we won the Kaggle Social Network challenge. In: 2011 International Joint Conference on Neural Networks (IJCNN 2011 - San Jose), IEEE Press, NY, 2011, pp. 1825–1834.
72 Zhang Y, Tang J, Yang Z, et al. COSNET: connecting heterogeneous social networks with local and global consistency. In: KDD '15 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, NY, 2015, pp. 1485–1494.
73 Zhang J, Yu PS. Multiple anonymized social networks alignment. In: 2015 IEEE International Conference on Data Mining (ICDM), ACM Press, NY, 2015, pp. 599–608.

© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Briefings in Bioinformatics, Oxford University Press

Published: Jan 5, 2017
